Software reuse approaches are known to enable considerable savings in effort and cost when developing a group of software systems with significant overlap in functionality. In practice, however, the need for systematic reuse often becomes apparent only after a number of product variants have already been delivered. The existing literature and an industry survey performed in the context of this dissertation indicate that new product variants are often created by cloning the code of an existing product and modifying it according to the new requirements. In the long term, this practice often leads to significant maintenance problems.
To counteract such maintenance problems, a systematic reuse approach can be introduced retroactively by transforming the implementation of the cloned product variants. However, such a transformation is challenging because it requires precise and detailed information about how implementation similarity is distributed across the product variants. This information is usually not available, as the product variants were modified independently of each other. The motivation for this dissertation is hence to provide the needed similarity information and thus support the migration of existing system variants towards systematic software reuse.
The main contribution of this dissertation is a reverse engineering approach for obtaining information about the source code similarity of existing product variants. Compared to existing approaches, it delivers more detailed similarity information, reduces the analysis effort, and improves the correctness with which the similarity information is understood. The approach models the product variants as hierarchical, intersecting sets of uniquely identifiable elements and expresses their similarity using set algebra. The resulting similarity information is available at any level of abstraction, from a single line of code to an entire product. The approach provides a generic analysis framework that can be used with diverse system representations, diverse similarity detection algorithms, and diverse definitions of element similarity. Hence, the approach can be instantiated in various contexts and adapted to a specific analysis goal.
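As a rough illustration of the set-algebraic view described above, the following Python sketch assumes that element matching has already been performed, so that each variant is simply a set of element identifiers. It further assumes, hypothetically, that an element's position in the hierarchy is encoded as a tuple such as (component, file, line_id), so that any hierarchy node corresponds to a tuple prefix. The function names `under` and `similarity` and the result keys are illustrative only and are not the dissertation's actual framework.

```python
from functools import reduce

# Hypothetical element identifier: (component, file, line_id).
# Identity across variants is assumed to be established beforehand.
Element = tuple[str, str, str]


def under(elements: set, prefix: tuple) -> set:
    """Restrict a variant's elements to those below a hierarchy node (tuple prefix)."""
    return {e for e in elements if e[:len(prefix)] == prefix}


def similarity(variants: dict[str, set], prefix: tuple = ()) -> dict:
    """Compare variants at the hierarchy level given by `prefix` using set algebra."""
    scoped = {name: under(elems, prefix) for name, elems in variants.items()}
    # Union: every element present in at least one variant.
    all_elems = reduce(set.union, scoped.values(), set())
    # Intersection: elements shared by all variants.
    common = reduce(set.intersection, scoped.values()) if scoped else set()
    # Difference: elements found only in one particular variant.
    unique = {
        name: elems - reduce(set.union, (o for n, o in scoped.items() if n != name), set())
        for name, elems in scoped.items()
    }
    ratio = len(common) / len(all_elems) if all_elems else 0.0
    return {"common": common, "unique": unique, "all": all_elems, "shared_ratio": ratio}


# Illustrative (made-up) variants A, B, C.
variants = {
    "A": {("core", "io.c", "l1"), ("core", "io.c", "l2"), ("ui", "main.c", "l1")},
    "B": {("core", "io.c", "l1"), ("core", "io.c", "l2"), ("ui", "main.c", "l9")},
    "C": {("core", "io.c", "l1"), ("ui", "main.c", "l1")},
}

whole_product = similarity(variants)              # product level
core_only = similarity(variants, ("core",))       # component level
```

Because the same operations apply to any prefix, moving from a whole product down to a component, a file, or a single line only changes the scope of the sets. Other regions of the variants' overlap, such as elements shared by A and B but absent from C, follow directly from the same operations (e.g. `(A & B) - C`).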
The contributed approach supports the simultaneous analysis of multiple source code variants and proposes visualization concepts that make the analysis results easy to interpret, even for large systems and a high number of variants. The benefits of the approach are evaluated empirically by means of a controlled experiment and an industrial case study, and analytically on a reference set of cloned system variants. Furthermore, practical applications of the approach in an industrial context are briefly presented.