xdiff: optimize patience diff's LCS search

The find_longest_common_sequence() function in patience diff is
inefficient as it calls binary_search() for every unique line it
encounters when deciding where to put it in the sequence. From
instrumentation (using xctrace) on popular repositories, binary_search()
takes up 50-60% of the run time within patience_diff() when performing a
diff.

To optimize this, add a boundary condition check before binary_search()
is called to see if the encountered unique line is located after the
entire currently tracked longest subsequence. If so, skip the
unnecessary binary search and simply append the entry to the end of
sequence. Given that most files compared in a diff are usually quite
similar to each other, this condition is very common, and should be hit
much more frequently than the binary search.

Below are some end-to-end performance results by timing `git log
--shortstat --oneline -500 --patience` on different repositories with
the old and new code. Generally speaking this seems to give at least
8-10% speed up. The "binary search hit %" column describes how often the
algorithm enters the binary search path instead of the new faster path.
Even in the WebKit case we can see that it's quite rare (1.46%).

| Repo     | Speed difference | binary search hit % |
|----------|------------------|---------------------|
| vim      | 1.27x            | 0.01%               |
| pytorch  | 1.16x            | 0.02%               |
| cpython  | 1.14x            | 0.06%               |
| ripgrep  | 1.14x            | 0.03%               |
| git      | 1.13x            | 0.12%               |
| vscode   | 1.09x            | 0.10%               |
| WebKit   | 1.08x            | 1.46%               |

The benchmarks were done using hyperfine, on an Apple M1 Max laptop,
with git compiled with `-O3 -flto`.

Signed-off-by: Yee Cheng Chin <ychin.git@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
1 file changed
tree: 1a6147153f6e098a572237105e4ad07dffe3d004
  1. .github/
  2. bin-wrappers/
  3. block-sha1/
  4. builtin/
  5. ci/
  6. compat/
  7. compiler-tricks/
  8. contrib/
  9. Documentation/
  10. ewah/
  11. git-gui/
  12. gitk-git/
  13. gitweb/
  14. mergetools/
  15. negotiator/
  16. oss-fuzz/
  17. perl/
  18. po/
  19. refs/
  20. reftable/
  21. sha1/
  22. sha1dc/
  23. sha256/
  24. subprojects/
  25. t/
  26. templates/
  27. trace2/
  28. xdiff/
  29. .cirrus.yml
  30. .clang-format
  31. .editorconfig
  32. .gitattributes
  33. .gitignore
  34. .gitlab-ci.yml
  35. .gitmodules
  36. .mailmap
  37. .tsan-suppressions
  38. abspath.c
  39. abspath.h
  40. aclocal.m4
  41. add-interactive.c
  42. add-interactive.h
  43. add-patch.c
  44. advice.c
  45. advice.h
  46. alias.c
  47. alias.h
  48. alloc.c
  49. alloc.h
  50. apply.c
  51. apply.h
  52. archive-tar.c
  53. archive-zip.c
  54. archive.c
  55. archive.h
  56. attr.c
  57. attr.h
  58. banned.h
  59. base85.c
  60. base85.h
  61. bisect.c
  62. bisect.h
  63. blame.c
  64. blame.h
  65. blob.c
  66. blob.h
  67. bloom.c
  68. bloom.h
  69. branch.c
  70. branch.h
  71. builtin.h
  72. bulk-checkin.c
  73. bulk-checkin.h
  74. bundle-uri.c
  75. bundle-uri.h
  76. bundle.c
  77. bundle.h
  78. cache-tree.c
  79. cache-tree.h
  80. cbtree.c
  81. cbtree.h
  82. chdir-notify.c
  83. chdir-notify.h
  84. check-builtins.sh
  85. checkout.c
  86. checkout.h
  87. chunk-format.c
  88. chunk-format.h
  89. CODE_OF_CONDUCT.md
  90. color.c
  91. color.h
  92. column.c
  93. column.h
  94. combine-diff.c
  95. command-list.txt
  96. commit-graph.c
  97. commit-graph.h
  98. commit-reach.c
  99. commit-reach.h
  100. commit-slab-decl.h
  101. commit-slab-impl.h
  102. commit-slab.h
  103. commit.c
  104. commit.h
  105. common-exit.c
  106. common-init.c
  107. common-init.h
  108. common-main.c
  109. config.c
  110. config.h
  111. config.mak.dev
  112. config.mak.in
  113. config.mak.uname
  114. configure.ac
  115. connect.c
  116. connect.h
  117. connected.c
  118. connected.h
  119. convert.c
  120. convert.h
  121. copy.c
  122. copy.h
  123. COPYING
  124. credential.c
  125. credential.h
  126. csum-file.c
  127. csum-file.h
  128. ctype.c
  129. daemon.c
  130. date.c
  131. date.h
  132. decorate.c
  133. decorate.h
  134. delta-islands.c
  135. delta-islands.h
  136. delta.h
  137. detect-compiler
  138. diagnose.c
  139. diagnose.h
  140. diff-delta.c
  141. diff-lib.c
  142. diff-merges.c
  143. diff-merges.h
  144. diff-no-index.c
  145. diff.c
  146. diff.h
  147. diffcore-break.c
  148. diffcore-delta.c
  149. diffcore-order.c
  150. diffcore-pickaxe.c
  151. diffcore-rename.c
  152. diffcore-rotate.c
  153. diffcore.h
  154. dir-iterator.c
  155. dir-iterator.h
  156. dir.c
  157. dir.h
  158. editor.c
  159. editor.h
  160. entry.c
  161. entry.h
  162. environment.c
  163. environment.h
  164. exec-cmd.c
  165. exec-cmd.h
  166. fetch-negotiator.c
  167. fetch-negotiator.h
  168. fetch-pack.c
  169. fetch-pack.h
  170. fmt-merge-msg.c
  171. fmt-merge-msg.h
  172. fsck.c
  173. fsck.h
  174. fsmonitor--daemon.h
  175. fsmonitor-ipc.c
  176. fsmonitor-ipc.h
  177. fsmonitor-ll.h
  178. fsmonitor-path-utils.h
  179. fsmonitor-settings.c
  180. fsmonitor-settings.h
  181. fsmonitor.c
  182. fsmonitor.h
  183. generate-cmdlist.sh
  184. generate-configlist.sh
  185. generate-hooklist.sh
  186. generate-perl.sh
  187. generate-python.sh
  188. generate-script.sh
  189. gettext.c
  190. gettext.h
  191. git-archimport.perl
  192. GIT-BUILD-OPTIONS.in
  193. git-compat-util.h
  194. git-curl-compat.h
  195. git-cvsexportcommit.perl
  196. git-cvsimport.perl
  197. git-cvsserver.perl
  198. git-difftool--helper.sh
  199. git-filter-branch.sh
  200. git-instaweb.sh
  201. git-merge-octopus.sh
  202. git-merge-one-file.sh
  203. git-merge-resolve.sh
  204. git-mergetool--lib.sh
  205. git-mergetool.sh
  206. git-p4.py
  207. git-quiltimport.sh
  208. git-request-pull.sh
  209. git-send-email.perl
  210. git-sh-i18n.sh
  211. git-sh-setup.sh
  212. git-submodule.sh
  213. git-svn.perl
  214. GIT-VERSION-FILE.in
  215. GIT-VERSION-GEN
  216. git-web--browse.sh
  217. git-zlib.c
  218. git-zlib.h
  219. git.c
  220. git.rc.in
  221. gpg-interface.c
  222. gpg-interface.h
  223. graph.c
  224. graph.h
  225. grep.c
  226. grep.h
  227. hash-lookup.c
  228. hash-lookup.h
  229. hash.c
  230. hash.h
  231. hashmap.c
  232. hashmap.h
  233. help.c
  234. help.h
  235. hex-ll.c
  236. hex-ll.h
  237. hex.c
  238. hex.h
  239. hook.c
  240. hook.h
  241. http-backend.c
  242. http-fetch.c
  243. http-push.c
  244. http-walker.c
  245. http.c
  246. http.h
  247. ident.c
  248. ident.h
  249. imap-send.c
  250. INSTALL
  251. iterator.h
  252. json-writer.c
  253. json-writer.h
  254. khash.h
  255. kwset.c
  256. kwset.h
  257. levenshtein.c
  258. levenshtein.h
  259. LGPL-2.1
  260. line-log.c
  261. line-log.h
  262. line-range.c
  263. line-range.h
  264. linear-assignment.c
  265. linear-assignment.h
  266. list-objects-filter-options.c
  267. list-objects-filter-options.h
  268. list-objects-filter.c
  269. list-objects-filter.h
  270. list-objects.c
  271. list-objects.h
  272. list.h
  273. lockfile.c
  274. lockfile.h
  275. log-tree.c
  276. log-tree.h
  277. loose.c
  278. loose.h
  279. ls-refs.c
  280. ls-refs.h
  281. mailinfo.c
  282. mailinfo.h
  283. mailmap.c
  284. mailmap.h
  285. Makefile
  286. match-trees.c
  287. match-trees.h
  288. mem-pool.c
  289. mem-pool.h
  290. merge-blobs.c
  291. merge-blobs.h
  292. merge-ll.c
  293. merge-ll.h
  294. merge-ort-wrappers.c
  295. merge-ort-wrappers.h
  296. merge-ort.c
  297. merge-ort.h
  298. merge.c
  299. merge.h
  300. mergesort.h
  301. meson.build
  302. meson_options.txt
  303. midx-write.c
  304. midx.c
  305. midx.h
  306. name-hash.c
  307. name-hash.h
  308. notes-cache.c
  309. notes-cache.h
  310. notes-merge.c
  311. notes-merge.h
  312. notes-utils.c
  313. notes-utils.h
  314. notes.c
  315. notes.h
  316. object-file-convert.c
  317. object-file-convert.h
  318. object-file.c
  319. object-file.h
  320. object-name.c
  321. object-name.h
  322. object-store.c
  323. object-store.h
  324. object.c
  325. object.h
  326. oid-array.c
  327. oid-array.h
  328. oidmap.c
  329. oidmap.h
  330. oidset.c
  331. oidset.h
  332. oidtree.c
  333. oidtree.h
  334. pack-bitmap-write.c
  335. pack-bitmap.c
  336. pack-bitmap.h
  337. pack-check.c
  338. pack-mtimes.c
  339. pack-mtimes.h
  340. pack-objects.c
  341. pack-objects.h
  342. pack-revindex.c
  343. pack-revindex.h
  344. pack-write.c
  345. pack.h
  346. packfile.c
  347. packfile.h
  348. pager.c
  349. pager.h
  350. parallel-checkout.c
  351. parallel-checkout.h
  352. parse-options-cb.c
  353. parse-options.c
  354. parse-options.h
  355. parse.c
  356. parse.h
  357. patch-delta.c
  358. patch-ids.c
  359. patch-ids.h
  360. path-walk.c
  361. path-walk.h
  362. path.c
  363. path.h
  364. pathspec.c
  365. pathspec.h
  366. pkt-line.c
  367. pkt-line.h
  368. preload-index.c
  369. preload-index.h
  370. pretty.c
  371. pretty.h
  372. prio-queue.c
  373. prio-queue.h
  374. progress.c
  375. progress.h
  376. promisor-remote.c
  377. promisor-remote.h
  378. prompt.c
  379. prompt.h
  380. protocol-caps.c
  381. protocol-caps.h
  382. protocol.c
  383. protocol.h
  384. prune-packed.c
  385. prune-packed.h
  386. pseudo-merge.c
  387. pseudo-merge.h
  388. quote.c
  389. quote.h
  390. range-diff.c
  391. range-diff.h
  392. reachable.c
  393. reachable.h
  394. read-cache-ll.h
  395. read-cache.c
  396. read-cache.h
  397. README.md
  398. rebase-interactive.c
  399. rebase-interactive.h
  400. rebase.c
  401. rebase.h
  402. ref-filter.c
  403. ref-filter.h
  404. reflog-walk.c
  405. reflog-walk.h
  406. reflog.c
  407. reflog.h
  408. refs.c
  409. refs.h
  410. refspec.c
  411. refspec.h
  412. remote-curl.c
  413. remote.c
  414. remote.h
  415. replace-object.c
  416. replace-object.h
  417. repo-settings.c
  418. repo-settings.h
  419. repository.c
  420. repository.h
  421. rerere.c
  422. rerere.h
  423. reset.c
  424. reset.h
  425. resolve-undo.c
  426. resolve-undo.h
  427. revision.c
  428. revision.h
  429. run-command.c
  430. run-command.h
  431. sane-ctype.h
  432. scalar.c
  433. SECURITY.md
  434. send-pack.c
  435. send-pack.h
  436. sequencer.c
  437. sequencer.h
  438. serve.c
  439. serve.h
  440. server-info.c
  441. server-info.h
  442. setup.c
  443. setup.h
  444. sh-i18n--envsubst.c
  445. sha1dc_git.c
  446. sha1dc_git.h
  447. shallow.c
  448. shallow.h
  449. shared.mak
  450. shell.c
  451. shortlog.h
  452. sideband.c
  453. sideband.h
  454. sigchain.c
  455. sigchain.h
  456. simple-ipc.h
  457. sparse-index.c
  458. sparse-index.h
  459. split-index.c
  460. split-index.h
  461. stable-qsort.c
  462. statinfo.c
  463. statinfo.h
  464. strbuf.c
  465. strbuf.h
  466. streaming.c
  467. streaming.h
  468. string-list.c
  469. string-list.h
  470. strmap.c
  471. strmap.h
  472. strvec.c
  473. strvec.h
  474. sub-process.c
  475. sub-process.h
  476. submodule-config.c
  477. submodule-config.h
  478. submodule.c
  479. submodule.h
  480. symlinks.c
  481. symlinks.h
  482. tag.c
  483. tag.h
  484. tar.h
  485. tempfile.c
  486. tempfile.h
  487. thread-utils.c
  488. thread-utils.h
  489. tmp-objdir.c
  490. tmp-objdir.h
  491. trace.c
  492. trace.h
  493. trace2.c
  494. trace2.h
  495. trailer.c
  496. trailer.h
  497. transport-helper.c
  498. transport-internal.h
  499. transport.c
  500. transport.h
  501. tree-diff.c
  502. tree-walk.c
  503. tree-walk.h
  504. tree.c
  505. tree.h
  506. unicode-width.h
  507. unimplemented.sh
  508. unix-socket.c
  509. unix-socket.h
  510. unix-stream-server.c
  511. unix-stream-server.h
  512. unpack-trees.c
  513. unpack-trees.h
  514. upload-pack.c
  515. upload-pack.h
  516. url.c
  517. url.h
  518. urlmatch.c
  519. urlmatch.h
  520. usage.c
  521. userdiff.c
  522. userdiff.h
  523. utf8.c
  524. utf8.h
  525. varint.c
  526. varint.h
  527. version-def.h.in
  528. version.c
  529. version.h
  530. versioncmp.c
  531. versioncmp.h
  532. walker.c
  533. walker.h
  534. wildmatch.c
  535. wildmatch.h
  536. worktree.c
  537. worktree.h
  538. wrapper.c
  539. wrapper.h
  540. write-or-die.c
  541. write-or-die.h
  542. ws.c
  543. ws.h
  544. wt-status.c
  545. wt-status.h
  546. xdiff-interface.c
  547. xdiff-interface.h
README.md

Build status

Git - fast, scalable, distributed revision control system

Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.

Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with help of a group of hackers around the net.

Please read the file INSTALL for installation instructions.

Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git related tools.

See Documentation/gittutorial.adoc to get started, then see Documentation/giteveryday.adoc for a useful minimum set of commands, and Documentation/git-<commandname>.adoc for documentation of each command. If git has been correctly installed, then the tutorial can also be read with man gittutorial or git help tutorial, and the documentation of each command with man git-<commandname> or git help <commandname>.

CVS users may also want to read Documentation/gitcvs-migration.adoc (man gitcvs-migration or git help cvs-migration if git is installed).

The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission and Documentation/CodingGuidelines).

Those wishing to help with error message, usage and informational message string translations (localization l10) should see po/README.md (a po file is a Portable Object file that holds the translations).

To subscribe to the list, send an email to git+subscribe@vger.kernel.org (see https://subspace.kernel.org/subscribing.html for details). The mailing list archives are available at https://lore.kernel.org/git/, https://marc.info/?l=git and other archival sites.

Issues which are security relevant should be disclosed privately to the Git Security mailing list git-security@googlegroups.com.

The maintainer frequently sends the “What's cooking” reports that list the current status of various development topics to the mailing list. The discussion following them give a good reference for project status, development direction and remaining tasks.

The name “git” was given by Linus Torvalds when he wrote the very first version. He described the tool as “the stupid content tracker” and the name as (depending on your mood):

  • random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of “get” may or may not be relevant.
  • stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
  • “global information tracker”: you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
  • “goddamn idiotic truckload of sh*t”: when it breaks