Parameterized Duplication in Software
Identifiers & constants become parameters
Hash each line of code into a sybmol of S + 0 or more parameters
Use hashing-based suffix tree
Linear time: O(|T|+m(t,T)), but m(t,T) < |T| - O(|T|).
With post-processing, 106 in 7 minutes
- 20% involved in parameterized duplication of ? 30 lines