|  | Rerere | 
|  | ====== | 
|  |  | 
|  | This document describes the rerere logic. | 
|  |  | 
|  | Conflict normalization | 
|  | ---------------------- | 
|  |  | 
|  | To ensure recorded conflict resolutions can be looked up in the rerere | 
|  | database, even when branches are merged in a different order, | 
|  | different branches are merged that result in the same conflict, or | 
|  | when different conflict style settings are used, rerere normalizes the | 
|  | conflicts before writing them to the rerere database. | 
|  |  | 
|  | Different conflict styles and branch names are normalized by stripping | 
|  | the labels from the conflict markers, and removing the common ancestor | 
|  | version from the `diff3` conflict style. Branches that are merged | 
|  | in different order are normalized by sorting the conflict hunks.  More | 
|  | on each of those steps in the following sections. | 
|  |  | 
|  | Once these two normalization operations are applied, a conflict ID is | 
|  | calculated based on the normalized conflict, which is later used by | 
|  | rerere to look up the conflict in the rerere database. | 
|  |  | 
|  | Removing the common ancestor version | 
|  | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 
|  |  | 
|  | Say we have three branches AB, AC and AC2.  The common ancestor of | 
|  | these branches has a file with a line containing the string "A" (for | 
|  | brevity this is called "line A" in the rest of the document).  In | 
|  | branch AB this line is changed to "B", in AC, this line is changed to | 
|  | "C", and branch AC2 is forked off of AC, after the line was changed to | 
|  | "C". | 
|  |  | 
|  | Forking a branch ABAC off of branch AB and then merging AC into it, we | 
|  | get a conflict like the following: | 
|  |  | 
|  | <<<<<<< HEAD | 
|  | B | 
|  | ======= | 
|  | C | 
|  | >>>>>>> AC | 
|  |  | 
|  | Doing the analogous with AC2 (forking a branch ABAC2 off of branch AB | 
|  | and then merging branch AC2 into it), using the diff3 conflict style, | 
|  | we get a conflict like the following: | 
|  |  | 
|  | <<<<<<< HEAD | 
|  | B | 
|  | ||||||| merged common ancestors | 
|  | A | 
|  | ======= | 
|  | C | 
|  | >>>>>>> AC2 | 
|  |  | 
|  | By resolving this conflict, to leave line D, the user declares: | 
|  |  | 
|  | After examining what branches AB and AC did, I believe that making | 
|  | line A into line D is the best thing to do that is compatible with | 
|  | what AB and AC wanted to do. | 
|  |  | 
|  | As branch AC2 refers to the same commit as AC, the above implies that | 
|  | this is also compatible what AB and AC2 wanted to do. | 
|  |  | 
|  | By extension, this means that rerere should recognize that the above | 
|  | conflicts are the same.  To do this, the labels on the conflict | 
|  | markers are stripped, and the common ancestor version is removed.  The above | 
|  | examples would both result in the following normalized conflict: | 
|  |  | 
|  | <<<<<<< | 
|  | B | 
|  | ======= | 
|  | C | 
|  | >>>>>>> | 
|  |  | 
|  | Sorting hunks | 
|  | ~~~~~~~~~~~~~ | 
|  |  | 
|  | As before, lets imagine that a common ancestor had a file with line A | 
|  | its early part, and line X in its late part.  And then four branches | 
|  | are forked that do these things: | 
|  |  | 
|  | - AB: changes A to B | 
|  | - AC: changes A to C | 
|  | - XY: changes X to Y | 
|  | - XZ: changes X to Z | 
|  |  | 
|  | Now, forking a branch ABAC off of branch AB and then merging AC into | 
|  | it, and forking a branch ACAB off of branch AC and then merging AB | 
|  | into it, would yield the conflict in a different order.  The former | 
|  | would say "A became B or C, what now?" while the latter would say "A | 
|  | became C or B, what now?" | 
|  |  | 
|  | As a reminder, the act of merging AC into ABAC and resolving the | 
|  | conflict to leave line D means that the user declares: | 
|  |  | 
|  | After examining what branches AB and AC did, I believe that | 
|  | making line A into line D is the best thing to do that is | 
|  | compatible with what AB and AC wanted to do. | 
|  |  | 
|  | So the conflict we would see when merging AB into ACAB should be | 
|  | resolved the same way---it is the resolution that is in line with that | 
|  | declaration. | 
|  |  | 
|  | Imagine that similarly previously a branch XYXZ was forked from XY, | 
|  | and XZ was merged into it, and resolved "X became Y or Z" into "X | 
|  | became W". | 
|  |  | 
|  | Now, if a branch ABXY was forked from AB and then merged XY, then ABXY | 
|  | would have line B in its early part and line Y in its later part. | 
|  | Such a merge would be quite clean.  We can construct 4 combinations | 
|  | using these four branches ((AB, AC) x (XY, XZ)). | 
|  |  | 
|  | Merging ABXY and ACXZ would make "an early A became B or C, a late X | 
|  | became Y or Z" conflict, while merging ACXY and ABXZ would make "an | 
|  | early A became C or B, a late X became Y or Z".  We can see there are | 
|  | 4 combinations of ("B or C", "C or B") x ("X or Y", "Y or X"). | 
|  |  | 
|  | By sorting, the conflict is given its canonical name, namely, "an | 
|  | early part became B or C, a late part becames X or Y", and whenever | 
|  | any of these four patterns appear, and we can get to the same conflict | 
|  | and resolution that we saw earlier. | 
|  |  | 
|  | Without the sorting, we'd have to somehow find a previous resolution | 
|  | from combinatorial explosion. | 
|  |  | 
|  | Conflict ID calculation | 
|  | ~~~~~~~~~~~~~~~~~~~~~~~ | 
|  |  | 
|  | Once the conflict normalization is done, the conflict ID is calculated | 
|  | as the sha1 hash of the conflict hunks appended to each other, | 
|  | separated by <NUL> characters.  The conflict markers are stripped out | 
|  | before the sha1 is calculated.  So in the example above, where we | 
|  | merge branch AC which changes line A to line C, into branch AB, which | 
|  | changes line A to line C, the conflict ID would be | 
|  | SHA1('B<NUL>C<NUL>'). | 
|  |  | 
|  | If there are multiple conflicts in one file, the sha1 is calculated | 
|  | the same way with all hunks appended to each other, in the order in | 
|  | which they appear in the file, separated by a <NUL> character. | 
|  |  | 
|  | Nested conflicts | 
|  | ~~~~~~~~~~~~~~~~ | 
|  |  | 
|  | Nested conflicts are handled very similarly to "simple" conflicts. | 
|  | Similar to simple conflicts, the conflict is first normalized by | 
|  | stripping the labels from conflict markers, stripping the common ancestor | 
|  | version, and the sorting the conflict hunks, both for the outer and the | 
|  | inner conflict.  This is done recursively, so any number of nested | 
|  | conflicts can be handled. | 
|  |  | 
|  | Note that this only works for conflict markers that "cleanly nest".  If | 
|  | there are any unmatched conflict markers, rerere will fail to handle | 
|  | the conflict and record a conflict resolution. | 
|  |  | 
|  | The only difference is in how the conflict ID is calculated.  For the | 
|  | inner conflict, the conflict markers themselves are not stripped out | 
|  | before calculating the sha1. | 
|  |  | 
|  | Say we have the following conflict for example: | 
|  |  | 
|  | <<<<<<< HEAD | 
|  | 1 | 
|  | ======= | 
|  | <<<<<<< HEAD | 
|  | 3 | 
|  | ======= | 
|  | 2 | 
|  | >>>>>>> branch-2 | 
|  | >>>>>>> branch-3~ | 
|  |  | 
|  | After stripping out the labels of the conflict markers, and sorting | 
|  | the hunks, the conflict would look as follows: | 
|  |  | 
|  | <<<<<<< | 
|  | 1 | 
|  | ======= | 
|  | <<<<<<< | 
|  | 2 | 
|  | ======= | 
|  | 3 | 
|  | >>>>>>> | 
|  | >>>>>>> | 
|  |  | 
|  | and finally the conflict ID would be calculated as: | 
|  | `sha1('1<NUL><<<<<<<\n3\n=======\n2\n>>>>>>><NUL>')` |