Documentation/technical/rerere.txt - git - Git at Google

 Rerere
 ======

 This document describes the rerere logic.

 Conflict normalization
 ----------------------

 To ensure recorded conflict resolutions can be looked up in the rerere
 database, even when branches are merged in a different order,
 different branches are merged that result in the same conflict, or
 when different conflict style settings are used, rerere normalizes the
 conflicts before writing them to the rerere database.

 Different conflict styles and branch names are normalized by stripping
 the labels from the conflict markers, and removing the common ancestor
 version from the `diff3` conflict style. Branches that are merged
 in different order are normalized by sorting the conflict hunks.  More
 on each of those steps in the following sections.

 Once these two normalization operations are applied, a conflict ID is
 calculated based on the normalized conflict, which is later used by
 rerere to look up the conflict in the rerere database.

 Removing the common ancestor version
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Say we have three branches AB, AC and AC2.  The common ancestor of
 these branches has a file with a line containing the string "A" (for
 brevity this is called "line A" in the rest of the document).  In
 branch AB this line is changed to "B", in AC, this line is changed to
 "C", and branch AC2 is forked off of AC, after the line was changed to
 "C".

 Forking a branch ABAC off of branch AB and then merging AC into it, we
 get a conflict like the following:

     <<<<<<< HEAD
     B
     =======
     C
     >>>>>>> AC

 Doing the analogous with AC2 (forking a branch ABAC2 off of branch AB
 and then merging branch AC2 into it), using the diff3 conflict style,
 we get a conflict like the following:

     <<<<<<< HEAD
     B
     ||||||| merged common ancestors
     A
     =======
     C
     >>>>>>> AC2

 By resolving this conflict, to leave line D, the user declares:

     After examining what branches AB and AC did, I believe that making
     line A into line D is the best thing to do that is compatible with
     what AB and AC wanted to do.

 As branch AC2 refers to the same commit as AC, the above implies that
 this is also compatible what AB and AC2 wanted to do.

 By extension, this means that rerere should recognize that the above
 conflicts are the same.  To do this, the labels on the conflict
 markers are stripped, and the common ancestor version is removed.  The above
 examples would both result in the following normalized conflict:

     <<<<<<<
     B
     =======
     C
     >>>>>>>

 Sorting hunks
 ~~~~~~~~~~~~~

 As before, lets imagine that a common ancestor had a file with line A
 its early part, and line X in its late part.  And then four branches
 are forked that do these things:

     - AB: changes A to B
     - AC: changes A to C
     - XY: changes X to Y
     - XZ: changes X to Z

 Now, forking a branch ABAC off of branch AB and then merging AC into
 it, and forking a branch ACAB off of branch AC and then merging AB
 into it, would yield the conflict in a different order.  The former
 would say "A became B or C, what now?" while the latter would say "A
 became C or B, what now?"

 As a reminder, the act of merging AC into ABAC and resolving the
 conflict to leave line D means that the user declares:

     After examining what branches AB and AC did, I believe that
     making line A into line D is the best thing to do that is
     compatible with what AB and AC wanted to do.

 So the conflict we would see when merging AB into ACAB should be
 resolved the same way---it is the resolution that is in line with that
 declaration.

 Imagine that similarly previously a branch XYXZ was forked from XY,
 and XZ was merged into it, and resolved "X became Y or Z" into "X
 became W".

 Now, if a branch ABXY was forked from AB and then merged XY, then ABXY
 would have line B in its early part and line Y in its later part.
 Such a merge would be quite clean.  We can construct 4 combinations
 using these four branches ((AB, AC) x (XY, XZ)).

 Merging ABXY and ACXZ would make "an early A became B or C, a late X
 became Y or Z" conflict, while merging ACXY and ABXZ would make "an
 early A became C or B, a late X became Y or Z".  We can see there are
 4 combinations of ("B or C", "C or B") x ("X or Y", "Y or X").

 By sorting, the conflict is given its canonical name, namely, "an
 early part became B or C, a late part becames X or Y", and whenever
 any of these four patterns appear, and we can get to the same conflict
 and resolution that we saw earlier.

 Without the sorting, we'd have to somehow find a previous resolution
 from combinatorial explosion.

 Conflict ID calculation
 ~~~~~~~~~~~~~~~~~~~~~~~

 Once the conflict normalization is done, the conflict ID is calculated
 as the sha1 hash of the conflict hunks appended to each other,
 separated by <NUL> characters.  The conflict markers are stripped out
 before the sha1 is calculated.  So in the example above, where we
 merge branch AC which changes line A to line C, into branch AB, which
 changes line A to line C, the conflict ID would be
 SHA1('B<NUL>C<NUL>').

 If there are multiple conflicts in one file, the sha1 is calculated
 the same way with all hunks appended to each other, in the order in
 which they appear in the file, separated by a <NUL> character.

 Nested conflicts
 ~~~~~~~~~~~~~~~~

 Nested conflicts are handled very similarly to "simple" conflicts.
 Similar to simple conflicts, the conflict is first normalized by
 stripping the labels from conflict markers, stripping the common ancestor
 version, and the sorting the conflict hunks, both for the outer and the
 inner conflict.  This is done recursively, so any number of nested
 conflicts can be handled.

 Note that this only works for conflict markers that "cleanly nest".  If
 there are any unmatched conflict markers, rerere will fail to handle
 the conflict and record a conflict resolution.

 The only difference is in how the conflict ID is calculated.  For the
 inner conflict, the conflict markers themselves are not stripped out
 before calculating the sha1.

 Say we have the following conflict for example:

     <<<<<<< HEAD
     1
     =======
     <<<<<<< HEAD
     3
     =======
     2
     >>>>>>> branch-2
     >>>>>>> branch-3~

 After stripping out the labels of the conflict markers, and sorting
 the hunks, the conflict would look as follows:

     <<<<<<<
     1
     =======
     <<<<<<<
     2
     =======
     3
     >>>>>>>
     >>>>>>>

 and finally the conflict ID would be calculated as:
 `sha1('1<NUL><<<<<<<\n3\n=======\n2\n>>>>>>><NUL>')`
	Rerere
	======

	This document describes the rerere logic.

	Conflict normalization
	----------------------

	To ensure recorded conflict resolutions can be looked up in the rerere
	database, even when branches are merged in a different order,
	different branches are merged that result in the same conflict, or
	when different conflict style settings are used, rerere normalizes the
	conflicts before writing them to the rerere database.

	Different conflict styles and branch names are normalized by stripping
	the labels from the conflict markers, and removing the common ancestor
	version from the `diff3` conflict style. Branches that are merged
	in different order are normalized by sorting the conflict hunks. More
	on each of those steps in the following sections.

	Once these two normalization operations are applied, a conflict ID is
	calculated based on the normalized conflict, which is later used by
	rerere to look up the conflict in the rerere database.

	Removing the common ancestor version
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	Say we have three branches AB, AC and AC2. The common ancestor of
	these branches has a file with a line containing the string "A" (for
	brevity this is called "line A" in the rest of the document). In
	branch AB this line is changed to "B", in AC, this line is changed to
	"C", and branch AC2 is forked off of AC, after the line was changed to
	"C".

	Forking a branch ABAC off of branch AB and then merging AC into it, we
	get a conflict like the following:

	<<<<<<< HEAD
	B
	=======
	C
	>>>>>>> AC

	Doing the analogous with AC2 (forking a branch ABAC2 off of branch AB
	and then merging branch AC2 into it), using the diff3 conflict style,
	we get a conflict like the following:

	<<<<<<< HEAD
	B
	\|\|\|\|\|\|\| merged common ancestors
	A
	=======
	C
	>>>>>>> AC2

	By resolving this conflict, to leave line D, the user declares:

	After examining what branches AB and AC did, I believe that making
	line A into line D is the best thing to do that is compatible with
	what AB and AC wanted to do.

	As branch AC2 refers to the same commit as AC, the above implies that
	this is also compatible what AB and AC2 wanted to do.

	By extension, this means that rerere should recognize that the above
	conflicts are the same. To do this, the labels on the conflict
	markers are stripped, and the common ancestor version is removed. The above
	examples would both result in the following normalized conflict:

	<<<<<<<
	B
	=======
	C
	>>>>>>>

	Sorting hunks
	~~~~~~~~~~~~~

	As before, lets imagine that a common ancestor had a file with line A
	its early part, and line X in its late part. And then four branches
	are forked that do these things:

	- AB: changes A to B
	- AC: changes A to C
	- XY: changes X to Y
	- XZ: changes X to Z

	Now, forking a branch ABAC off of branch AB and then merging AC into
	it, and forking a branch ACAB off of branch AC and then merging AB
	into it, would yield the conflict in a different order. The former
	would say "A became B or C, what now?" while the latter would say "A
	became C or B, what now?"

	As a reminder, the act of merging AC into ABAC and resolving the
	conflict to leave line D means that the user declares:

	After examining what branches AB and AC did, I believe that
	making line A into line D is the best thing to do that is
	compatible with what AB and AC wanted to do.

	So the conflict we would see when merging AB into ACAB should be
	resolved the same way---it is the resolution that is in line with that
	declaration.

	Imagine that similarly previously a branch XYXZ was forked from XY,
	and XZ was merged into it, and resolved "X became Y or Z" into "X
	became W".

	Now, if a branch ABXY was forked from AB and then merged XY, then ABXY
	would have line B in its early part and line Y in its later part.
	Such a merge would be quite clean. We can construct 4 combinations
	using these four branches ((AB, AC) x (XY, XZ)).

	Merging ABXY and ACXZ would make "an early A became B or C, a late X
	became Y or Z" conflict, while merging ACXY and ABXZ would make "an
	early A became C or B, a late X became Y or Z". We can see there are
	4 combinations of ("B or C", "C or B") x ("X or Y", "Y or X").

	By sorting, the conflict is given its canonical name, namely, "an
	early part became B or C, a late part becames X or Y", and whenever
	any of these four patterns appear, and we can get to the same conflict
	and resolution that we saw earlier.

	Without the sorting, we'd have to somehow find a previous resolution
	from combinatorial explosion.

	Conflict ID calculation
	~~~~~~~~~~~~~~~~~~~~~~~

	Once the conflict normalization is done, the conflict ID is calculated
	as the sha1 hash of the conflict hunks appended to each other,
	separated by <NUL> characters. The conflict markers are stripped out
	before the sha1 is calculated. So in the example above, where we
	merge branch AC which changes line A to line C, into branch AB, which
	changes line A to line C, the conflict ID would be
	SHA1('B<NUL>C<NUL>').

	If there are multiple conflicts in one file, the sha1 is calculated
	the same way with all hunks appended to each other, in the order in
	which they appear in the file, separated by a <NUL> character.

	Nested conflicts
	~~~~~~~~~~~~~~~~

	Nested conflicts are handled very similarly to "simple" conflicts.
	Similar to simple conflicts, the conflict is first normalized by
	stripping the labels from conflict markers, stripping the common ancestor
	version, and the sorting the conflict hunks, both for the outer and the
	inner conflict. This is done recursively, so any number of nested
	conflicts can be handled.

	Note that this only works for conflict markers that "cleanly nest". If
	there are any unmatched conflict markers, rerere will fail to handle
	the conflict and record a conflict resolution.

	The only difference is in how the conflict ID is calculated. For the
	inner conflict, the conflict markers themselves are not stripped out
	before calculating the sha1.

	Say we have the following conflict for example:

	<<<<<<< HEAD
	1
	=======
	<<<<<<< HEAD
	3
	=======
	2
	>>>>>>> branch-2
	>>>>>>> branch-3~

	After stripping out the labels of the conflict markers, and sorting
	the hunks, the conflict would look as follows:

	<<<<<<<
	1
	=======
	<<<<<<<
	2
	=======
	3
	>>>>>>>
	>>>>>>>

	and finally the conflict ID would be calculated as:
	`sha1('1<NUL><<<<<<<\n3\n=======\n2\n>>>>>>><NUL>')`