Architecture
Comparison Process
The comparison process of EMF Compare is divided into six phases. Each of them is briefly defined in the Overview, and a much more in-depth explanation is given below, in our explanations of the default behavior of EMF Compare.
Project Architecture
EMF Compare is built on top of the Eclipse platform. We depend on the Eclipse Modeling Framework (EMF), the Eclipse Compare framework and, finally, Eclipse Team, the framework upon which the repository providers (EGit, CVS, Subversive...) are built.
The EMF Compare extensions target specific extensions of the modeling framework: UML, the Graphical Modeling Framework (and its own extensions, Papyrus, EcoreTools, ...).
Whilst we are built atop bricks that are tightly coupled with the Eclipse platform, it should be noted that the core of EMF Compare can be run in a standalone application with no runtime dependencies on Eclipse, as can EMF itself.
The Comparison Model
EMF Compare uses a single model, whose root is a Comparison object, to represent all of the information regarding the comparison: matched objects, matched resources, detected differences, links between these differences, etc. The root Comparison is created at the beginning of the Match process, and undergoes a set of successive refinements during the remainder of the comparison: Diff, Equivalence, Dependencies... will all add their own information to the Comparison.
So how exactly is all of this information represented in the Comparison model, and how can we make sense of it?
Match
A Match element is how we represent that the n compared versions have elements that are basically the same. For example, if we are comparing two different versions v1 and v2 of a given model which look like:
[Figure: the two compared versions, v1 and v2, shown side by side (panel labels: "Master", "Borrowables").]
Comparing these two models, we'll have a Comparison model containing three matches:
- library <-> library
- Book <-> Novel
- title <-> title
In other words, the comparison model contains an aggregate of the two or three compared models, in the form of Match elements linking the elements of all versions together. Differences will then be detected on these Match elements and added under them, thus allowing us to know both:
- what the difference is (for example, "attribute name has been changed from Book to Novel"), and
- what the original elements were.
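As a rough illustration of the structure described above, here is a minimal, self-contained sketch of the three matches of the example. The types SketchMatch and SketchMatchExample are hypothetical stand-ins written for this guide, not the EMF Compare API, and the nesting of the matches (library, then Book/Novel, then title) is an assumption that simply follows the containment of the example model.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for a Match; not the EMF Compare class. */
final class SketchMatch {
    final String v1Element; // the element as it appears in version v1
    final String v2Element; // the element as it appears in version v2
    final List<SketchMatch> submatches = new ArrayList<>();
    final List<String> differences = new ArrayList<>(); // differences detected on this match

    SketchMatch(String v1Element, String v2Element) {
        this.v1Element = v1Element;
        this.v2Element = v2Element;
    }

    /** Registers a nested match and returns it, so that calls can be chained. */
    SketchMatch addSubmatch(SketchMatch child) {
        submatches.add(child);
        return child;
    }

    /** Counts this match plus all nested matches. */
    int countMatches() {
        int count = 1;
        for (SketchMatch child : submatches) {
            count += child.countMatches();
        }
        return count;
    }
}

final class SketchMatchExample {
    /** Builds the three matches of the example: library, Book/Novel and title. */
    static SketchMatch build() {
        SketchMatch library = new SketchMatch("library", "library");
        SketchMatch book = library.addSubmatch(new SketchMatch("Book", "Novel"));
        book.addSubmatch(new SketchMatch("title", "title"));
        // The difference is stored under the match it was detected on.
        book.differences.add("attribute name has been changed from Book to Novel");
        return library;
    }

    public static void main(String[] args) {
        SketchMatch root = build();
        System.out.println(root.countMatches() + " matches");
    }
}
```

Because the difference is stored under its match, both pieces of information listed above stay reachable from a single object: the description of the change and the original elements it relates to.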
Diff
Diff elements are created during the differencing process in order to represent the actual modifications that can be detected within the source model(s). The Diff concept itself is only there as the super-class of the three main kinds of differences EMF Compare can detect in a model, namely ReferenceChange, AttributeChange and ResourceAttachmentChange. We'll come back to these three sub-classes in a short while.
Whatever their type, the differences share a number of common elements:
- a parent match: differences are detected on a given Match. Having a difference basically means that one of the elements paired through this Match differs from its "reference" side (see the description of source below).
- a source: differences are detected on one side of their match. The source only really holds meaning in three-way comparisons, where a difference can be detected in either right or left. All differences detected through two-way comparisons have their source in the left side, because we always compare according to a "reference" side. During two-way comparisons, the reference side is the right: differences will always be detected on the left side as compared with the right side. During three-way comparisons, differences can be detected on either the left or the right side as compared with their common ancestor, but never by comparing left and right with each other (in other words, this is roughly equivalent to two two-way comparisons: first the left as compared with the origin, then the right as compared with the origin).
- a current state: all differences start off in their initial unresolved state. The user can then choose to:
  - merge the difference (towards either right or left, applying or reverting the difference in the process), in which case the difference becomes merged, or
  - discard it, thus marking the change as discarded. For example, if there is a conflicting edit of a textual attribute, the user can decide that neither right nor left is satisfactory, and instead settle for a mix of the two.
- a kind: this is used by the engine to describe the type of difference it detected. Differences can be of four general types:
  - Add: there are two distinct things that EMF Compare considers as an "addition". First, adding a new element within the values of a multi-valued feature is undeniably an addition. Second, any change in a containment reference, even if that reference is mono-valued, that represents a "new" element in the model is considered to be an addition. Note that this second case is an exception to the rule for change differences outlined below.
  - Delete: this is used as the counterpart of add differences, and it presents the same exception for mono-valued containment references.
  - Change: any modification to a mono-valued feature is considered as a change difference by the engine. Take note that containment references are an exception to this rule: no change will ever be detected on those.
  - Move: once again, two distinct things are represented as move differences in the comparison model. First, reordering the values of a multi-valued feature is considered as a series of move differences: one difference for each moved value (EMF Compare computes the smallest number of differences needed between the two sides' values). Second, moving an object from one container to another (changing the containing feature of the EObject) will be detected as a move.
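To make the source and state descriptions above more concrete, here is a minimal sketch restricted to a single mono-valued, non-containment attribute (so every difference it reports is a change). The names SketchDiff, SketchDiffEngine and the three enums are hypothetical stand-ins for this illustration and are not the EMF Compare API; the actual engine works on matched EObjects rather than on plain values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

enum SketchSource { LEFT, RIGHT }
enum SketchState { UNRESOLVED, MERGED, DISCARDED }
enum SketchKind { ADD, DELETE, CHANGE, MOVE }

/** Hypothetical, simplified stand-in for a Diff; not the EMF Compare class. */
final class SketchDiff {
    final SketchKind kind;
    final SketchSource source;
    private SketchState state = SketchState.UNRESOLVED; // every difference starts unresolved

    SketchDiff(SketchKind kind, SketchSource source) {
        this.kind = kind;
        this.source = source;
    }

    SketchState state() { return state; }

    void merge() { state = SketchState.MERGED; }

    void discard() { state = SketchState.DISCARDED; }
}

final class SketchDiffEngine {
    /** Two-way: the right side is the reference, so a change is always reported on the left. */
    static List<SketchDiff> diffTwoWay(Object left, Object right) {
        List<SketchDiff> diffs = new ArrayList<>();
        if (!Objects.equals(left, right)) {
            diffs.add(new SketchDiff(SketchKind.CHANGE, SketchSource.LEFT));
        }
        return diffs;
    }

    /** Three-way: each side is compared with the origin, never with the other side. */
    static List<SketchDiff> diffThreeWay(Object origin, Object left, Object right) {
        List<SketchDiff> diffs = new ArrayList<>();
        if (!Objects.equals(left, origin)) {
            diffs.add(new SketchDiff(SketchKind.CHANGE, SketchSource.LEFT));
        }
        if (!Objects.equals(right, origin)) {
            diffs.add(new SketchDiff(SketchKind.CHANGE, SketchSource.RIGHT));
        }
        return diffs;
    }

    public static void main(String[] args) {
        // "Book" renamed to "Novel" on the left and to "Essay" on the right.
        for (SketchDiff diff : diffThreeWay("Book", "Novel", "Essay")) {
            System.out.println(diff.kind + " detected on " + diff.source);
        }
    }
}
```

Note how diffThreeWay is literally two two-way checks against the origin, which is the equivalence the description of source points out.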
In order to ensure that the model stays coherent through individual merge operations, we've also decided to link differences together through a number of associations and references. For example, there are times when one difference cannot be merged without first merging another, or when some differences are exactly equivalent to one another. In no specific order:
- dependency: EMF Compare uses two opposite references in order to track dependencies between differences. Namely, requires and requiredBy represent the two ends of this association. If the user has added a package P1, then added a new Class C1 within this package, we will detect both differences. However, the addition of C1 cannot be merged without first adding its container P1. In such a case, the addition of C1 requires the addition of P1, and the latter is requiredBy the former.
- refinement: this link is mainly used by extensions of EMF Compare in order to create high-level differences that hide the complexity of the comparison model. For example, this is used by the UML extension of EMF Compare to tell that the three differences "adding an association A1", "adding a property P1 in association A1" and "adding a property P2 in association A1" are actually one single high-level difference, "adding an association A1". This high-level difference is refinedBy the others, each of which refines it.
- equivalence: this association is used by the comparison engine in order to link together differences which are equivalent in terms of merging. For example, Ecore has a concept of eOpposite references. Updating one of the two sides of an eOpposite will automatically update the other. In such an event, EMF Compare will detect each side as an individual difference. However, merging one of the two will trigger the update of the other side of the eOpposite as well. In such cases, the two differences are set to be equivalent to one another. Merging one difference that is part of an equivalence relationship will automatically mark all of the others as merged (see state above).
- implication: implications are a special kind of "directed equivalence". A difference D1 that is linked as "implied by" another D2 means that merging D1 requires us to merge D2 instead. In other words, D2 will be automatically merged if we merge D1, but D1 will not be automatically merged if we merge D2. Implications are mostly used with UML models, where subsets and supersets may trigger such linked changes.
- conflict: during three-way comparisons, we compare two versions of a given model with their common ancestor. We can thus detect changes that were made in either the left or the right side (see the description of source above). However, there are cases when changes in the left conflict with changes in the right. For example, a class named "Book" in the origin model may have been renamed to "Novel" in the left model whereas it has been renamed to "Essay" in the right model. In such a case, the two differences will be marked as being in conflict with one another.
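The dependency link from the list above can be sketched as follows. This is only an illustration under simplifying assumptions: DependentDiff and DependencyMergeSketch are hypothetical names, not EMF Compare classes, and the sketch ignores the merge direction, which a real merger has to take into account when deciding which of requires or requiredBy to follow.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical difference holding both ends of the dependency association. */
final class DependentDiff {
    final String label;
    final List<DependentDiff> requires = new ArrayList<>();
    final List<DependentDiff> requiredBy = new ArrayList<>();
    boolean merged;

    DependentDiff(String label) {
        this.label = label;
    }

    /** Records that this difference requires the given one, keeping both ends in sync. */
    void addRequirement(DependentDiff required) {
        requires.add(required);
        required.requiredBy.add(this);
    }
}

final class DependencyMergeSketch {
    /**
     * Merges the given difference after everything it requires.
     * The labels of the merged differences are appended to 'log' in merge order.
     */
    static void merge(DependentDiff diff, List<String> log) {
        if (diff.merged) {
            return;
        }
        diff.merged = true; // flagged first so that a dependency cycle cannot recurse forever
        for (DependentDiff required : diff.requires) {
            merge(required, log);
        }
        log.add(diff.label);
    }

    public static void main(String[] args) {
        DependentDiff addP1 = new DependentDiff("add package P1");
        DependentDiff addC1 = new DependentDiff("add class C1");
        addC1.addRequirement(addP1);

        List<String> log = new ArrayList<>();
        merge(addC1, log);
        System.out.println(log);
    }
}
```

Asking for the merge of the addition of C1 therefore merges the addition of P1 first, which is what keeps the model coherent after an individual merge operation.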
As mentioned above, there are only three kinds of differences that EMF Compare will detect, which will be sufficient for all use cases.
ReferenceChange differences will be detected for every value of a reference for which we detect a change: the value was either added, deleted, or moved (within the reference or between distinct references). AttributeChange differences are the same, but for attributes instead of references. Lastly, ResourceAttachmentChange differences, though very much like the ReferenceChanges we create for containment references, are specifically aimed at describing changes within the roots of one of the compared resources.
Conflict
Conflicts will only be detected during three-way comparisons. There can only be "conflicts" when we are comparing two different versions of the same model along with their common ancestor. In other words, we need to be able to compare two versions of a common element with a "reference" version of that element.
There are many different kinds of conflicts; to name a few:
- changing an element on one side (in any way, for example, renaming it) whilst that element has been removed from the other side
- changing the same attribute of an element on both sides, to different values (for example, renaming "Book" to "Novel" on the left while it is renamed to "Essay" on the right)
- creating a new reference to an element on one side whilst it had been deleted from the other side
Conflicts can be of two kinds. We call PSEUDO conflict a conflict where the two sides of a comparison have changed as compared with their common ancestor, but where the two sides are actually now equal. In other words, the end result is that the left is now equal to the right, even though they are both different from their ancestor. This is the opposite of a REAL conflict, where the value on all three sides is different. In terms of merging, pseudo conflicts do not need any particular action, whilst real conflicts actually need resolution.
There can be more than two differences conflicting with each other. For example, the deletion of an element from one side will most likely conflict with a number of differences from the other side.
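The distinction between the two kinds of conflict can be illustrated on a single attribute value. The sketch below is a simplification under stated assumptions: ConflictSketch and SketchConflictKind are hypothetical names, not EMF Compare types, and only the "same attribute changed on both sides" case from the list above is covered (deletions and reference changes are left out).

```java
import java.util.Objects;
import java.util.Optional;

enum SketchConflictKind { PSEUDO, REAL }

final class ConflictSketch {
    /**
     * Classifies the values of one attribute on the three sides of a comparison.
     * Returns an empty Optional when at most one side changed, since a conflict
     * needs a change on both sides as compared with the common ancestor.
     */
    static Optional<SketchConflictKind> classify(Object origin, Object left, Object right) {
        boolean leftChanged = !Objects.equals(left, origin);
        boolean rightChanged = !Objects.equals(right, origin);
        if (!leftChanged || !rightChanged) {
            return Optional.empty();
        }
        // Both sides changed: equal results make a pseudo conflict, different ones a real conflict.
        return Optional.of(Objects.equals(left, right)
                ? SketchConflictKind.PSEUDO
                : SketchConflictKind.REAL);
    }

    public static void main(String[] args) {
        System.out.println(classify("Book", "Novel", "Essay"));
        System.out.println(classify("Book", "Novel", "Novel"));
        System.out.println(classify("Book", "Novel", "Book"));
    }
}
```

Only the first call describes a situation that needs a resolution from the user; the second needs no particular action, and the third is a plain left-side change with no conflict at all.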
Equivalence
EMF Compare uses Equivalence elements in order to link together a number of differences which can ultimately be considered to be the same. For example, Ecore's eOpposite references will be maintained in sync with one another. As such, modifying one of the two references will automatically update the second one accordingly. The manual modification and the automatic update are two distinct modifications of the model, resulting in two differences detected. However, merging any of these two differences will automatically merge the other one. Therefore both are marked as being equivalent to each other.
There can be more than two differences equivalent to each other, in which case all will be added to a single Equivalence object, representing their relations.
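A single shared object holding all equivalent differences could look like the following sketch. EquivalentDiff, EquivalenceGroup and EquivalenceMergeSketch are hypothetical names introduced for this illustration and do not reproduce the EMF Compare API; the counter returned by merge is only there to show that the model is modified once for the whole group.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical difference that may belong to one equivalence group. */
final class EquivalentDiff {
    final String label;
    boolean merged;
    EquivalenceGroup equivalence; // null when the difference has no equivalent

    EquivalentDiff(String label) {
        this.label = label;
    }
}

/** Hypothetical stand-in for an Equivalence: one object shared by all equivalent differences. */
final class EquivalenceGroup {
    final List<EquivalentDiff> differences = new ArrayList<>();

    void add(EquivalentDiff diff) {
        differences.add(diff);
        diff.equivalence = this;
    }
}

final class EquivalenceMergeSketch {
    /**
     * Merges the given difference and returns the number of model modifications
     * that had to be applied: 1 for a fresh merge, 0 if the difference was
     * already merged, for instance through one of its equivalents.
     */
    static int merge(EquivalentDiff diff) {
        if (diff.merged) {
            return 0;
        }
        diff.merged = true; // this is where the change itself would be applied to the model
        if (diff.equivalence != null) {
            for (EquivalentDiff other : diff.equivalence.differences) {
                other.merged = true; // already covered by the merge above, only the state changes
            }
        }
        return 1;
    }

    public static void main(String[] args) {
        EquivalentDiff manual = new EquivalentDiff("manual update of one reference");
        EquivalentDiff automatic = new EquivalentDiff("automatic update of its eOpposite");
        EquivalenceGroup group = new EquivalenceGroup();
        group.add(manual);
        group.add(automatic);

        System.out.println(merge(manual) + merge(automatic) + " modification(s) applied");
    }
}
```

Keeping one group object rather than pairwise links means that adding a third or fourth equivalent difference does not change the merge logic.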