Cross-Language Refactoring: The CLaRe Research Project

Purpose

The CLaRe research project investigates refactoring across programming language boundaries.

Funding

The CLaRe project is funded by the Deutsche Forschungsgemeinschaft (DFG) under grant STE906/4-1 (two research assistants for two years).

Problem

Refactoring is the meaning-preserving transformation of existing code, usually performed with the aim of improving some non-functional quality such as the code’s readability or changeability.

Correct refactoring   Due to the complexity of programming languages such as Java or C#, even simple refactorings can be difficult to get right. In fact, as of this writing, none of the three major Java IDEs we have tested can move either of the two classes

class A { B b; }
class B {}

(both residing in the same package) to another package without introducing a compile error. It is commonly argued that refactoring bugs of this kind are benign, because the compile error they produce is reported and can be responded to, and that the only real threat comes from refactoring bugs that silently change the meaning of a program (eg, through a change of binding) and escape regression testing (malign bugs). However, we contend that if refactoring is to be accepted as a discipline, neither kind of bug can be tolerated. One also frequently hears that most, if not all, bugs in refactoring tools can be traced back to what developers would likely call corner cases: supposedly rare constellations of language constructs. We contend that such constellations are nevertheless legal and do occur in real programs, and that a correct refactoring tool must handle all of them, no matter how infrequent they are.
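
A minimal sketch of why the move fails, assuming only Java's rule that a top-level class not declared public is accessible solely within its own package (nested types, imports, and protected access are ignored). The class MoveClassCheck, its records ClassDecl and Reference, and the package names p and q are hypothetical placeholders for illustration. Moving A to q while B stays package-private breaks the reference in the field declaration B b; additionally declaring B public repairs it (the same holds if B is moved instead of A).

```java
import java.util.List;

// Hypothetical, minimal model of the two-class example: each class has a
// package and a public flag; a Reference records that one class uses another.
public class MoveClassCheck {

    record ClassDecl(String name, String pkg, boolean isPublic) {}

    record Reference(String from, String to) {}

    // Simplified Java rule for top-level classes: the target must be public
    // or declared in the same package as the referencing class.
    static boolean accessible(ClassDecl from, ClassDecl to) {
        return to.isPublic() || from.pkg().equals(to.pkg());
    }

    static ClassDecl find(List<ClassDecl> classes, String name) {
        return classes.stream()
                .filter(c -> c.name().equals(name))
                .findFirst()
                .orElseThrow();
    }

    // The program "compiles" (in this model) if every reference is accessible.
    static boolean compiles(List<ClassDecl> classes, List<Reference> refs) {
        for (Reference r : refs) {
            if (!accessible(find(classes, r.from()), find(classes, r.to()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The field declaration "B b;" in class A is a reference from A to B.
        List<Reference> refs = List.of(new Reference("A", "B"));

        List<ClassDecl> original = List.of(
                new ClassDecl("A", "p", false), new ClassDecl("B", "p", false));
        // Naive move: A goes to package q, B keeps its package-private access.
        List<ClassDecl> naiveMove = List.of(
                new ClassDecl("A", "q", false), new ClassDecl("B", "p", false));
        // Correct move: B must additionally be declared public.
        List<ClassDecl> correctMove = List.of(
                new ClassDecl("A", "q", false), new ClassDecl("B", "p", true));

        System.out.println(compiles(original, refs));    // true
        System.out.println(compiles(naiveMove, refs));   // false
        System.out.println(compiles(correctMove, refs)); // true
    }
}
```

The model checks a single access rule; a real tool must of course consider every rule of the language that the move can affect, which is exactly where corner cases arise.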

Correct cross-language refactoring   If creating correct refactoring tools for a single programming language is difficult, creating them for programs written in more than one programming language will be even more so. Indeed, in light of what has been said above about the correctness of refactoring tools, we must assume that no “common language core” (“greatest common divisor”) solution will lead to acceptable results, since the common core, being a simplification of each language, would be insufficient even for refactoring programs written in any single one of them. It seems more likely that a “least common denominator” of the involved languages will have to form the basis of cross-language refactoring; unless one language is a subset (“divisor”) of the other, this basis will be larger than the basis of correct refactoring in any one of the involved languages.

Approach

CLaRe builds on prior work on constraint-based type refactoring by Frank Tip et al., and on our own work using constraints to handle declared accessibility in refactorings. In constraint-based refactoring, the syntactic and static semantic (binding) rules of a programming language are captured as so-called constraint rules. The antecedent of a constraint rule checks for the presence of certain conditions in the program to be refactored (such as a reference binding to a declared entity, or one type being declared a subtype of another); its consequent introduces constraints, ie, conditions on constraint variables that must be respected when the program is changed in the course of the refactoring. The constraint variables represent changeable properties of the program, such as the declared accessibility of an entity or the type in which the entity is declared. Applying the constraint rules to a program to be refactored yields a constraint system whose solution space (ie, the set of variable assignments satisfying all constraints) represents all correct (ie, meaning-preserving) refactorings.
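
The two-class example can be recast in this form. The following is only a sketch under strong simplifying assumptions, not the rule set or solver used in CLaRe: the class ConstraintSketch and its names are hypothetical, the only constraint variables are each class's package and a flag for whether it is public, and the "solver" simply enumerates all 16 assignments and keeps one with the fewest changes. One constraint rule matches the reference from A to B and contributes the constraint pkg(A) == pkg(B) or public(B); the refactoring intent contributes pkg(A) == q.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Sketch of constraint-based Move Class for the two-class example.
// Hypothetical simplifications: the only constraint variables are each
// class's package and its public flag; the solver enumerates all assignments.
public class ConstraintSketch {

    // One assignment of values to all constraint variables.
    record Assignment(Map<String, String> pkg, Map<String, Boolean> pub) {

        // Number of variables whose value differs from another assignment.
        int changesFrom(Assignment other) {
            int n = 0;
            for (String c : pkg.keySet()) {
                if (!pkg.get(c).equals(other.pkg().get(c))) n++;
                if (!pub.get(c).equals(other.pub().get(c))) n++;
            }
            return n;
        }
    }

    // Consequent of the constraint rule for "class `from` references class `to`":
    // pkg(from) == pkg(to) OR pub(to).
    static Predicate<Assignment> referenceConstraint(String from, String to) {
        return a -> a.pkg().get(from).equals(a.pkg().get(to)) || a.pub().get(to);
    }

    // All assignments for classes A and B over packages {p, q} (16 in total).
    static List<Assignment> allAssignments() {
        List<Assignment> result = new ArrayList<>();
        String[] pkgs = {"p", "q"};
        boolean[] flags = {false, true};
        for (String pa : pkgs)
            for (String pb : pkgs)
                for (boolean ua : flags)
                    for (boolean ub : flags)
                        result.add(new Assignment(
                                Map.of("A", pa, "B", pb), Map.of("A", ua, "B", ub)));
        return result;
    }

    // Returns a solution with the fewest changes, or null if none exists.
    static Assignment solve(Assignment original, List<Predicate<Assignment>> constraints) {
        Assignment best = null;
        for (Assignment a : allAssignments()) {
            boolean satisfied = constraints.stream().allMatch(c -> c.test(a));
            if (satisfied && (best == null || a.changesFrom(original) < best.changesFrom(original))) {
                best = a;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Assignment original = new Assignment(
                Map.of("A", "p", "B", "p"), Map.of("A", false, "B", false));
        Predicate<Assignment> moveAToQ = a -> a.pkg().get("A").equals("q"); // refactoring intent
        List<Predicate<Assignment>> constraints = List.of(
                referenceConstraint("A", "B"), // generated from "B b;" in class A
                moveAToQ);
        Assignment s = solve(original, constraints);
        System.out.println(s.pkg().get("A") + " " + s.pkg().get("B") + " " + s.pub().get("B"));
        // prints "q p true"
    }
}
```

Two solutions need exactly two changes: making B public, or moving B along with A. Both are correct refactorings in the sense above; the sketch simply returns the first one it enumerates, so choosing among them is a matter of the solver's objective.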

Compared to imperative approaches, the constraint-based approach to refactoring has the advantage that it captures the language specifications in a modular manner: each constraint rule is completely independent of all others. Different combinations of language constructs, including all corner cases, need not be considered explicitly (eg, in deeply nested case analyses); instead, they are handled implicitly by the joint solution of the constraints generated by the individual rules. Thus, all the complexity lies in the constraint solution process, for which standard algorithms exist. The correctness of refactorings then depends on the correctness and completeness of the constraint rules used (completeness in the sense that the rules capture all relevant parts of the language specification), and both properties can be checked (eg, tested) independently of any particular refactoring.
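
To illustrate the modularity argument, the following sketch represents each constraint rule as an independent function from a program fact to constraints; all names (ModularRules, Fact, Rule, AtLeast, Sub.m, Super.m) are hypothetical. A second rule, reflecting Java's requirement that an overriding method must not have weaker access than the method it overrides, is added simply by listing it; the reference rule is not touched, and no case analysis over combinations is written. Because every constraint here is a lower bound on the ordered accessibility levels, a simple fixpoint propagation serves as the solver; this is a stand-in chosen for brevity, not a general-purpose constraint solver. The Reference fact carries a precomputed minimum accessibility, which a real rule would derive from where the reference occurs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: constraint rules as independent functions from program facts to
// constraints. All names are hypothetical; accessibility is the only variable kind.
public class ModularRules {

    // Java's accessibility levels, ordered from most to least restrictive.
    enum Access { PRIVATE, PACKAGE, PROTECTED, PUBLIC }

    // Program facts a rule may match on.
    interface Fact {}
    // A reference to `member` from a site that needs at least `required` access.
    record Reference(String member, Access required) implements Fact {}
    // Method `sub` overrides method `sup`.
    record Overrides(String sub, String sup) implements Fact {}

    // Constraint: acc(variable) >= floor and, if `other` is non-null, acc(variable) >= acc(other).
    record AtLeast(String variable, Access floor, String other) {}

    interface Rule { List<AtLeast> apply(Fact f); }

    // Rule 1: a reference requires sufficient accessibility of its target.
    static List<AtLeast> referenceRule(Fact f) {
        if (f instanceof Reference r) return List.of(new AtLeast(r.member(), r.required(), null));
        return List.of();
    }

    // Rule 2: an overriding method must not have weaker access than the overridden one.
    static List<AtLeast> overrideRule(Fact f) {
        if (f instanceof Overrides o) return List.of(new AtLeast(o.sub(), Access.PRIVATE, o.sup()));
        return List.of();
    }

    static Access max(Access a, Access b) {
        return a.compareTo(b) >= 0 ? a : b;
    }

    // Generate constraints from every (fact, rule) pair, then raise variables until
    // all lower bounds hold (terminates: values only increase in a finite domain).
    static Map<String, Access> solve(List<Fact> facts, List<Rule> rules, Map<String, Access> start) {
        List<AtLeast> constraints = new ArrayList<>();
        for (Fact f : facts)
            for (Rule rule : rules)
                constraints.addAll(rule.apply(f));

        Map<String, Access> acc = new HashMap<>(start);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (AtLeast c : constraints) {
                Access lower = c.floor();
                if (c.other() != null) lower = max(lower, acc.getOrDefault(c.other(), Access.PRIVATE));
                if (lower.compareTo(acc.getOrDefault(c.variable(), Access.PRIVATE)) > 0) {
                    acc.put(c.variable(), lower);
                    changed = true;
                }
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        List<Fact> facts = List.of(
                new Reference("Sub.m", Access.PACKAGE),
                new Overrides("Sub.m", "Super.m"));
        List<Rule> rules = List.of(ModularRules::referenceRule, ModularRules::overrideRule);
        // Original accessibilities, with the intent (Super.m becomes public) applied.
        Map<String, Access> start = Map.of("Super.m", Access.PUBLIC, "Sub.m", Access.PROTECTED);
        System.out.println(solve(facts, rules, start).get("Sub.m")); // PUBLIC
    }
}
```

Raising Super.m to public forces Sub.m to public through the override rule alone; with only the reference rule listed, Sub.m would stay protected.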

Since constraint rules correspond closely to the rules of a language, different languages will require different constraint rules; in fact, the sets of constraint rules required for the different languages concisely mirror the commonalities and differences of the language specifications. It is an open research question whether cross-language refactorings will require additional constraint rules reflecting the conditions for accessing program elements across specific language boundaries, or whether sharing constraint variables (including a mapping of their values to the different domains required by the different languages) suffices. We expect the number of constraint rules required for cross-language refactoring to be a linear function of the numbers of constraint rules required for the individual languages.
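
The second of the two options named above, sharing a constraint variable across languages together with a mapping of its values, can be sketched as follows. Everything here is hypothetical: the class SharedVariableSketch, a second language Y whose only visibility levels are HIDDEN and EXPORTED, and the mapping that sends Java's public to EXPORTED and every other level to HIDDEN. Constraints from each language are stated over that language's own domain; the solver searches the domain of the shared Java variable and translates each candidate through the mapping before checking Y's constraints. The sketch does not settle the open question; it only shows what the variable-sharing alternative amounts to in the simplest case.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Sketch of one cross-language option: a constraint variable shared between
// Java and a hypothetical language Y, with its values mapped into Y's domain.
public class SharedVariableSketch {

    enum JavaAccess { PRIVATE, PACKAGE, PROTECTED, PUBLIC }

    // Hypothetical visibility domain of language Y.
    enum YVisibility { HIDDEN, EXPORTED }

    // Hypothetical mapping of the shared variable's values into Y's domain.
    static final Function<JavaAccess, YVisibility> TO_Y =
            a -> a == JavaAccess.PUBLIC ? YVisibility.EXPORTED : YVisibility.HIDDEN;

    // Returns the most restrictive value of the shared variable that satisfies the
    // Java-side constraints directly and the Y-side constraints via the mapping,
    // or null if there is none.
    static JavaAccess solve(List<Predicate<JavaAccess>> javaSide,
                            List<Predicate<YVisibility>> ySide) {
        for (JavaAccess candidate : JavaAccess.values()) { // declaration order: most restrictive first
            boolean ok = javaSide.stream().allMatch(c -> c.test(candidate))
                    && ySide.stream().allMatch(c -> c.test(TO_Y.apply(candidate)));
            if (ok) return candidate;
        }
        return null;
    }

    public static void main(String[] args) {
        // From Java's rules: a reference from another class in the same package
        // needs at least package access.
        List<Predicate<JavaAccess>> javaSide = List.of(a -> a.compareTo(JavaAccess.PACKAGE) >= 0);
        // From Y's rules: a reference in Y code needs the member to be exported.
        List<Predicate<YVisibility>> ySide = List.of(v -> v == YVisibility.EXPORTED);

        System.out.println(solve(javaSide, List.of())); // PACKAGE
        System.out.println(solve(javaSide, ySide));     // PUBLIC
    }
}
```

In this toy setting no additional cross-language rule is needed: the Y-side reference alone pushes the shared Java variable from package to public through the mapping.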

Related projects

The following earlier refactoring tools of ours build on constraint-based refactoring:

Related work

Publications

Team