Cross-Language Refactoring (CLARE)

Purpose

The CLARE research project investigates refactoring across language boundaries.

Funding

The CLARE project is funded by the Deutsche Forschungsgemeinschaft (DFG) under grant STE906/4-1 (two research assistants for two years).

Problem

Refactoring is the meaning-preserving transformation of existing code, usually performed with the aim of improving some non-functional quality such as the code’s readability or changeability.

Due to the complexity of programming languages such as Java or C#, even simple refactorings such as moving a class to another package can be difficult to get right. In fact, as of this writing, three major Java IDEs we have tested cannot move one of the two classes

class A { B b; }

class B {}

(both residing in the same package) to another package without introducing a compile error. Because bugs of this kind surface as compile errors and can therefore be noticed and fixed, they are benign; bugs that silently change the meaning of a program (e.g., through a change of binding) are malign by comparison. Benign or malign, we contend that most, if not all, bugs of this kind can be traced back to what developers would likely call corner cases: supposedly rare constellations of language constructs that are nevertheless legal and that do occur in real programs. To be correct, a refactoring tool must consider all constellations, no matter how infrequent they are.
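To illustrate what a malign bug can look like, the following sketch shows a silent change of binding in Java. It assumes a hypothetical faulty refactoring that reduces the accessibility of one overload to private; the class names ServiceBefore, ServiceAfter and Client are made up for this illustration and do not come from any of the tested IDEs. Since inaccessible methods are not considered during overload resolution, the client call still compiles but binds to a different method.

```java
// Before the refactoring: both overloads are accessible to clients.
class ServiceBefore {
    String m(Object o) { return "Object"; }
    String m(String s) { return "String"; }
}

// After a hypothetical faulty refactoring that made m(String) private.
class ServiceAfter {
    String m(Object o) { return "Object"; }
    private String m(String s) { return "String"; }
}

class Client {
    // Binds to m(String), the most specific accessible overload.
    static String callBefore() {
        return new ServiceBefore().m("x");
    }

    // m(String) is no longer accessible from Client, so the same call
    // now binds to m(Object) without any compile error.
    static String callAfter() {
        return new ServiceAfter().m("x");
    }

    public static void main(String[] args) {
        System.out.println(callBefore()); // prints "String"
        System.out.println(callAfter());  // prints "Object"
    }
}
```

The program compiles in both versions, so nothing alerts the developer that its meaning has changed.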

If creating correct refactoring tools for a single programming language is difficult, creating correct refactoring tools for programs written in more than one programming language will be even more so. Indeed, given what has been said above, we must assume that no “common language core” solution will lead to acceptable results, since the common core (as a simplification) will be insufficient even for the refactoring of programs written in any single language. It seems more likely that a “common denominator” of the different involved languages will form the basis of cross-language refactoring, which, unless one language is a subset (“divisor”) of the other, will be greater than the basis of correct refactoring in any one of the involved languages.

Approach

CLARE builds on prior work on constraint-based type refactoring by Frank Tip et al., and on our own work using constraints to handle declared accessibility in refactorings. With constraint-based refactoring, the syntactic and static semantic (binding) rules of a programming language are captured in the form of so-called constraint rules. The antecedents of these rules check for the presence of certain conditions in a program to be refactored (such as a reference binding to a declared entity, or one type being declared a subtype of another). Their consequents introduce constraints, i.e., conditions on constraint variables, that must be respected when changing the program in the course of the refactoring. The constraint variables represent certain changeable properties of the program (such as the declared accessibility of an entity, or the type in which the entity is declared). The set of constraints generated by applying the constraint rules to a program to be refactored constitutes a constraint system whose solution space (i.e., the variable assignments satisfying all constraints) represents all correct (i.e., meaning-preserving) refactorings.
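The idea of a constraint rule can be sketched in a few lines of Java. This is a minimal illustration under strong simplifying assumptions, not CLARE's actual rule set: the only program fact modelled is a reference to a top-level type, the only constraint variable is that type's declared accessibility, and all names (Access, TypeReference, AtLeast, ConstraintSketch) are hypothetical. The scenario is the two-class example from the Problem section, with B declared in a package p.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Declared accessibility levels, ordered from most to least restrictive. */
enum Access { PRIVATE, PACKAGE, PROTECTED, PUBLIC }

/** Program fact (simplified): code in fromPackage refers to a top-level type. */
final class TypeReference {
    final String fromPackage, target, targetPackage;
    TypeReference(String fromPackage, String target, String targetPackage) {
        this.fromPackage = fromPackage;
        this.target = target;
        this.targetPackage = targetPackage;
    }
}

/** Constraint on the variable access(target): it must be at least minimum. */
final class AtLeast {
    final String target;
    final Access minimum;
    AtLeast(String target, Access minimum) {
        this.target = target;
        this.minimum = minimum;
    }
    boolean isSatisfiedBy(Access value) {
        return value.compareTo(minimum) >= 0;
    }
}

final class ConstraintSketch {
    /** Top-level Java types can only be package-private or public. */
    static final Set<Access> TOP_LEVEL_DOMAIN = EnumSet.of(Access.PACKAGE, Access.PUBLIC);

    /** Rule: antecedent = a reference binds to a type; consequent = a lower bound on its accessibility. */
    static AtLeast accessibilityRule(TypeReference ref) {
        boolean samePackage = ref.fromPackage.equals(ref.targetPackage);
        return new AtLeast(ref.target, samePackage ? Access.PACKAGE : Access.PUBLIC);
    }

    /** Solution space for one variable: all domain values satisfying every constraint on it. */
    static Set<Access> solutions(String target, List<AtLeast> constraints) {
        Set<Access> result = EnumSet.noneOf(Access.class);
        for (Access candidate : TOP_LEVEL_DOMAIN) {
            boolean ok = true;
            for (AtLeast c : constraints) {
                if (c.target.equals(target) && !c.isSatisfiedBy(candidate)) ok = false;
            }
            if (ok) result.add(candidate);
        }
        return result;
    }

    /** Class A { B b; } lives in packageOfA; class B stays in package "p". */
    static Set<Access> solutionsForB(String packageOfA) {
        List<AtLeast> constraints = new ArrayList<>();
        constraints.add(accessibilityRule(new TypeReference(packageOfA, "B", "p")));
        return solutions("B", constraints);
    }

    public static void main(String[] args) {
        System.out.println(solutionsForB("p")); // prints [PACKAGE, PUBLIC]
        System.out.println(solutionsForB("q")); // prints [PUBLIC]
    }
}
```

In this sketch, moving A to a package q shrinks the solution space for B's accessibility to PUBLIC, which is the change a correct Move Class refactoring would have to make alongside the move.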

Compared to imperative approaches, the constraint-based approach to refactoring has the advantage that it captures the language specifications in a modular manner: each constraint rule is completely independent of all others. Different combinations of language constructs, including all corner cases, need not be considered explicitly (e.g., in deeply nested case analyses), but are implicitly handled by the joint solution of the constraints generated by the individual rules. Thus, all the complexity is in the constraint solution process, for which standard algorithms exist. Also, the correctness of refactorings depends on the correctness and completeness of the constraint rules used (completeness in the sense that the constraint rules capture all relevant parts of the language specification). Both, however, can be tested independently of any particular refactoring.
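The modularity argument can be made concrete with a small sketch. It assumes two drastically simplified Java rules that know nothing of each other, and it uses brute-force enumeration as a stand-in for a real constraint solver; all names (Level, JointSolving, the variable names "Super.m" and "Sub.m") are hypothetical. The interaction between the rules emerges only from solving their constraints together.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Declared accessibility levels, ordered from most to least restrictive. */
enum Level { PRIVATE, PACKAGE, PROTECTED, PUBLIC }

final class JointSolving {
    /** Rule 1 (simplified): a method called from an unrelated class in another package must be public. */
    static Predicate<Map<String, Level>> calledFromOtherPackage(String method) {
        return assignment -> assignment.get(method) == Level.PUBLIC;
    }

    /** Rule 2 (simplified): an overriding method must be at least as accessible as the overridden one. */
    static Predicate<Map<String, Level>> overrides(String overriding, String overridden) {
        return assignment -> assignment.get(overriding).compareTo(assignment.get(overridden)) >= 0;
    }

    /** Brute-force stand-in for a solver: keep every assignment that satisfies all constraints. */
    static List<Map<String, Level>> solve(List<String> variables,
                                          List<Predicate<Map<String, Level>>> constraints) {
        List<Map<String, Level>> solutions = new ArrayList<>();
        enumerate(variables, 0, new LinkedHashMap<String, Level>(), constraints, solutions);
        return solutions;
    }

    private static void enumerate(List<String> variables, int index, Map<String, Level> partial,
                                  List<Predicate<Map<String, Level>>> constraints,
                                  List<Map<String, Level>> out) {
        if (index == variables.size()) {
            for (Predicate<Map<String, Level>> c : constraints) {
                if (!c.test(partial)) return; // some constraint is violated
            }
            out.add(new LinkedHashMap<>(partial));
            return;
        }
        for (Level value : Level.values()) {
            partial.put(variables.get(index), value);
            enumerate(variables, index + 1, partial, constraints, out);
        }
        partial.remove(variables.get(index));
    }

    /** Sub.m overrides Super.m, and Super.m is called from another package. */
    static List<Map<String, Level>> demo() {
        List<Predicate<Map<String, Level>>> constraints = new ArrayList<>();
        constraints.add(calledFromOtherPackage("Super.m"));
        constraints.add(overrides("Sub.m", "Super.m"));
        return solve(Arrays.asList("Super.m", "Sub.m"), constraints);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [{Super.m=PUBLIC, Sub.m=PUBLIC}]
    }
}
```

Neither rule mentions the other, yet the joint solution forbids reducing the accessibility of Sub.m because of a call to Super.m elsewhere in the program. An imperative implementation would have to anticipate this combination explicitly.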

Since constraint rules correspond closely to the rules of a language, different languages will require different constraint rules. In fact, the different sets of constraint rules required for the different languages mirror commonalities and differences of the language specifications in a concise way. It is an open research question whether cross-language refactorings will require additional constraint rules reflecting the conditions of accessing program elements across specific language boundaries, or whether the sharing of constraint variables (including a mapping of their values to different domains) suffices. We expect that the number of constraint rules required for cross-language refactoring will be a linear function of the numbers of constraint rules required for each individual language.

Related earlier projects

The following refactoring tools of ours build on constraint-based refactoring:

Publications

F Steimann “The Infer Type refactoring and its use for interface-based programming” Journal of Object Technology 6:2 (2007) 67–89.
H Kegel Constraint-basierte Typinferenz für Java 5 (Diplomarbeit, Lehrgebiet Programmiersysteme, Fernuniversität in Hagen, 2007).
H Kegel, F Steimann “Systematically refactoring inheritance to delegation in Java” in: Proc. of ICSE (2008) 431–440.
F Steimann, A Thies “From public to private to absent: Refactoring Java programs under constrained accessibility” in: Proc. of ECOOP (2009) 419–443.
F Steimann, A Thies “From behaviour preservation to behaviour modification: Constraint-based mutant generation” in: Proc. of ICSE (2010) to appear.

Team

Andreas Thies
Christian Kollee (DFG)
Jens von Pilgrim (DFG)
Friedrich Steimann