| Rerere | 
 | ====== | 
 |  | 
 | This document describes the rerere logic. | 
 |  | 
 | Conflict normalization | 
 | ---------------------- | 
 |  | 
 | To ensure recorded conflict resolutions can be looked up in the rerere | 
 | database, even when branches are merged in a different order, | 
 | when different branches are merged that result in the same conflict, or | 
 | when different conflict style settings are used, rerere normalizes the | 
 | conflicts before writing them to the rerere database. | 
 |  | 
 | Different conflict styles and branch names are normalized by stripping | 
 | the labels from the conflict markers, and removing the common ancestor | 
 | version from the `diff3` conflict style. Branches that are merged | 
 | in a different order are normalized by sorting the conflict hunks.  Each | 
 | of those steps is described in the following sections. | 
 |  | 
 | Once these two normalization operations are applied, a conflict ID is | 
 | calculated based on the normalized conflict, which is later used by | 
 | rerere to look up the conflict in the rerere database. | 
 |  | 
 | Removing the common ancestor version | 
 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 
 |  | 
 | Say we have three branches AB, AC and AC2.  The common ancestor of | 
 | these branches has a file with a line containing the string "A" (for | 
 | brevity this is called "line A" in the rest of the document).  In | 
 | branch AB this line is changed to "B", in AC, this line is changed to | 
 | "C", and branch AC2 is forked off of AC, after the line was changed to | 
 | "C". | 
 |  | 
 | Forking a branch ABAC off of branch AB and then merging AC into it, we | 
 | get a conflict like the following: | 
 |  | 
 |     <<<<<<< HEAD | 
 |     B | 
 |     ======= | 
 |     C | 
 |     >>>>>>> AC | 
 |  | 
 | Doing the analogous with AC2 (forking a branch ABAC2 off of branch AB | 
 | and then merging branch AC2 into it), using the diff3 conflict style, | 
 | we get a conflict like the following: | 
 |  | 
 |     <<<<<<< HEAD | 
 |     B | 
 |     ||||||| merged common ancestors | 
 |     A | 
 |     ======= | 
 |     C | 
 |     >>>>>>> AC2 | 
 |  | 
 | By resolving this conflict to leave line D, the user declares: | 
 |  | 
 |     After examining what branches AB and AC did, I believe that making | 
 |     line A into line D is the best thing to do that is compatible with | 
 |     what AB and AC wanted to do. | 
 |  | 
 | As branch AC2 refers to the same commit as AC, the above implies that | 
 | this is also compatible with what AB and AC2 wanted to do. | 
 |  | 
 | By extension, this means that rerere should recognize that the above | 
 | conflicts are the same.  To do this, the labels on the conflict | 
 | markers are stripped, and the common ancestor version is removed.  The above | 
 | examples would both result in the following normalized conflict: | 
 |  | 
 |     <<<<<<< | 
 |     B | 
 |     ======= | 
 |     C | 
 |     >>>>>>> | 
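The two steps above can be sketched in a few lines of Python.  This is
only an illustration under stated assumptions, not rerere's actual code:
the names `is_marker` and `strip_labels_and_ancestor` are hypothetical,
the conflict is given as a list of lines without trailing newlines,
markers are assumed to be exactly seven characters long, and nested
conflicts are not considered.

```python
MARKER_SIZE = 7  # assumed default conflict marker length


def is_marker(line, ch, size=MARKER_SIZE):
    """True if line is `size` copies of ch, optionally followed by a label."""
    return line.startswith(ch * size) and line[size:size + 1] in ("", " ")


def strip_labels_and_ancestor(lines):
    """Drop marker labels and the diff3 common-ancestor section (hypothetical helper)."""
    out = []
    in_ancestor = False
    for line in lines:
        if is_marker(line, "<"):
            out.append("<" * MARKER_SIZE)   # label such as "HEAD" is dropped
        elif is_marker(line, "|"):
            in_ancestor = True              # skip everything up to '======='
        elif is_marker(line, "="):
            in_ancestor = False
            out.append("=" * MARKER_SIZE)
        elif is_marker(line, ">"):
            out.append(">" * MARKER_SIZE)   # label such as "AC" is dropped
        elif not in_ancestor:
            out.append(line)
    return out


merge_style = ["<<<<<<< HEAD", "B", "=======", "C", ">>>>>>> AC"]
diff3_style = ["<<<<<<< HEAD", "B", "||||||| merged common ancestors", "A",
               "=======", "C", ">>>>>>> AC2"]
```

Under these assumptions both `merge_style` and `diff3_style` reduce to
the same list of lines, which is what allows a single database entry to
serve both conflicts.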
 |  | 
 | Sorting hunks | 
 | ~~~~~~~~~~~~~ | 
 |  | 
 | As before, let's imagine that a common ancestor had a file with line A | 
 | in its early part, and line X in its late part.  Then four branches | 
 | are forked that do these things: | 
 |  | 
 |     - AB: changes A to B | 
 |     - AC: changes A to C | 
 |     - XY: changes X to Y | 
 |     - XZ: changes X to Z | 
 |  | 
 | Now, forking a branch ABAC off of branch AB and then merging AC into | 
 | it, and forking a branch ACAB off of branch AC and then merging AB | 
 | into it, would yield the conflict in a different order.  The former | 
 | would say "A became B or C, what now?" while the latter would say "A | 
 | became C or B, what now?" | 
 |  | 
 | As a reminder, the act of merging AC into ABAC and resolving the | 
 | conflict to leave line D means that the user declares: | 
 |  | 
 |     After examining what branches AB and AC did, I believe that | 
 |     making line A into line D is the best thing to do that is | 
 |     compatible with what AB and AC wanted to do. | 
 |  | 
 | So the conflict we would see when merging AB into ACAB should be | 
 | resolved the same way---it is the resolution that is in line with that | 
 | declaration. | 
 |  | 
 | Similarly, imagine that a branch XYXZ was previously forked from XY, | 
 | that XZ was merged into it, and that the conflict "X became Y or Z" | 
 | was resolved into "X became W". | 
 |  | 
 | Now, if a branch ABXY was forked from AB and XY was then merged into | 
 | it, ABXY would have line B in its early part and line Y in its later part. | 
 | Such a merge would be quite clean.  We can construct 4 combinations | 
 | using these four branches ((AB, AC) x (XY, XZ)). | 
 |  | 
 | Merging ABXY and ACXZ would make "an early A became B or C, a late X | 
 | became Y or Z" conflict, while merging ACXY and ABXZ would make "an | 
 | early A became C or B, a late X became Y or Z".  We can see there are | 
 | 4 combinations of ("B or C", "C or B") x ("Y or Z", "Z or Y"). | 
 |  | 
 | By sorting, the conflict is given its canonical name, namely, "an | 
 | early part became B or C, a late part became Y or Z", and whenever | 
 | any of these four patterns appears, we can get to the same conflict | 
 | and resolution that we saw earlier. | 
 |  | 
 | Without the sorting, we would have to search through a combinatorially | 
 | growing number of hunk orderings to find a previous resolution. | 
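As a rough illustration of why sorting collapses the four patterns into
one, the sketch below models each hunk as a pair of sides and sorts the
two sides of every hunk.  The helper names `sort_hunk` and `normalize`
are hypothetical, and plain Python string comparison is an assumption
standing in for whatever ordering rerere applies internally.

```python
from itertools import product


def sort_hunk(side_one, side_two):
    """Return the two sides of one hunk in sorted order."""
    # Assumption: ordinary string comparison decides which side comes first.
    return tuple(sorted((side_one, side_two)))


def normalize(hunks):
    """Sort the sides of every hunk, keeping the hunks in file order."""
    return [sort_hunk(one, two) for one, two in hunks]


# The ways the early (B/C) and late (Y/Z) hunks can be presented,
# depending on which branch was merged into which.
early = [("B", "C"), ("C", "B")]
late = [("Y", "Z"), ("Z", "Y")]
variants = [[e, l] for e, l in product(early, late)]

normalized = [normalize(v) for v in variants]
```

All four entries of `normalized` are identical, so a resolution recorded
for any one of the four orderings can be found again from the others.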
 |  | 
 | Conflict ID calculation | 
 | ~~~~~~~~~~~~~~~~~~~~~~~ | 
 |  | 
 | Once the conflict normalization is done, the conflict ID is calculated | 
 | as the sha1 hash of the conflict hunks appended to each other, | 
 | separated by <NUL> characters.  The conflict markers are stripped out | 
 | before the sha1 is calculated.  So in the example above, where we | 
 | merge branch AC, which changes line A to line C, into branch AB, which | 
 | changes line A to line B, the conflict ID would be | 
 | SHA1('B<NUL>C<NUL>'). | 
 |  | 
 | If there are multiple conflicts in one file, the sha1 is calculated | 
 | the same way with all hunks appended to each other, in the order in | 
 | which they appear in the file, separated by a <NUL> character. | 
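The calculation described above might look roughly like the following
sketch.  It follows this document's simplified notation, in which each
side is written without a trailing newline; the function name
`conflict_id` and the representation of hunks as pairs of strings are
assumptions made for the example.

```python
import hashlib


def conflict_id(hunks):
    """Hash already-normalized hunks, each side terminated by a NUL byte."""
    sha = hashlib.sha1()
    for side_one, side_two in hunks:   # hunks in the order they appear in the file
        sha.update(side_one.encode() + b"\0")
        sha.update(side_two.encode() + b"\0")
    return sha.hexdigest()


single = conflict_id([("B", "C")])                 # SHA1('B<NUL>C<NUL>')
multiple = conflict_id([("B", "C"), ("Y", "Z")])   # two hunks in one file
```

Because the markers and labels never reach the hash, the ID depends only
on the content of the two sides of each hunk.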
 |  | 
 | Nested conflicts | 
 | ~~~~~~~~~~~~~~~~ | 
 |  | 
 | Nested conflicts are handled very similarly to "simple" conflicts. | 
 | Similar to simple conflicts, the conflict is first normalized by | 
 | stripping the labels from conflict markers, stripping the common ancestor | 
 | version, and then sorting the conflict hunks, both for the outer and the | 
 | inner conflict.  This is done recursively, so any number of nested | 
 | conflicts can be handled. | 
 |  | 
 | Note that this only works for conflict markers that "cleanly nest".  If | 
 | there are any unmatched conflict markers, rerere will fail to handle | 
 | the conflict and will not record a conflict resolution. | 
 |  | 
 | The only difference is in how the conflict ID is calculated.  For the | 
 | inner conflict, the conflict markers themselves are not stripped out | 
 | before calculating the sha1. | 
 |  | 
 | Say we have the following conflict for example: | 
 |  | 
 |     <<<<<<< HEAD | 
 |     1 | 
 |     ======= | 
 |     <<<<<<< HEAD | 
 |     3 | 
 |     ======= | 
 |     2 | 
 |     >>>>>>> branch-2 | 
 |     >>>>>>> branch-3~ | 
 |  | 
 | After stripping out the labels of the conflict markers, and sorting | 
 | the hunks, the conflict would look as follows: | 
 |  | 
 |     <<<<<<< | 
 |     1 | 
 |     ======= | 
 |     <<<<<<< | 
 |     2 | 
 |     ======= | 
 |     3 | 
 |     >>>>>>> | 
 |     >>>>>>> | 
 |  | 
 | and finally the conflict ID would be calculated as: | 
 | `sha1('1<NUL><<<<<<<\n2\n=======\n3\n>>>>>>><NUL>')` | 
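The recursive handling can be sketched as follows.  This is a minimal
illustration, not rerere's implementation: the names `parse_conflict`
and `nested_hash_input` are hypothetical, markers are assumed to be
seven characters long, sides are joined without a trailing newline to
match the notation used in this document, and the inner conflict is
assumed to be embedded in its normalized (sorted) form with its markers
kept.

```python
import hashlib

MARKER = 7  # assumed conflict marker length


def is_marker(line, ch):
    return line.startswith(ch * MARKER) and line[MARKER:MARKER + 1] in ("", " ")


def parse_conflict(lines, pos):
    """Parse the conflict whose opening marker precedes lines[pos].

    Returns (side_one, side_two, next_pos); sides are lists of lines,
    already sorted, with nested conflicts normalized and embedded.
    """
    sides = [[], []]
    current = 0
    in_ancestor = False
    while pos < len(lines):
        line = lines[pos]
        pos += 1
        if is_marker(line, "<"):
            one, two, pos = parse_conflict(lines, pos)   # recurse into inner conflict
            if not in_ancestor:
                sides[current] += ["<" * MARKER, *one, "=" * MARKER, *two, ">" * MARKER]
        elif is_marker(line, "|"):
            in_ancestor = True       # common ancestor version is dropped
        elif is_marker(line, "="):
            in_ancestor = False
            current = 1
        elif is_marker(line, ">"):
            one, two = sorted(sides, key="\n".join)
            return one, two, pos
        elif not in_ancestor:
            sides[current].append(line)
    raise ValueError("unmatched conflict marker")


def nested_hash_input(lines):
    """Bytes fed to the hash for a single outer conflict starting at lines[0]."""
    one, two, _ = parse_conflict(lines, 1)
    return "\n".join(one).encode() + b"\0" + "\n".join(two).encode() + b"\0"


example = ["<<<<<<< HEAD", "1", "=======", "<<<<<<< HEAD", "3", "=======",
           "2", ">>>>>>> branch-2", ">>>>>>> branch-3~"]
example_id = hashlib.sha1(nested_hash_input(example)).hexdigest()
```

The `ValueError` raised for input that does not nest cleanly mirrors the
case described above in which no resolution is recorded.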