midx: don't reuse corrupt MIDXs when writing

When writing a new multi-pack index, Git tries to reuse as much of the
data from an existing MIDX as possible, such as object offsets. This
avoids re-opening many *.idx files unnecessarily, but can lead to
problems if the data we are reusing is corrupt.

That's because we reuse data from an existing MIDX without first
checking that its trailing checksum is valid. So if memory is corrupted
while writing a MIDX, or the disk corrupts it in the period between
writing and reuse, we'll blindly propagate those bad values forward.

Suppose memory corruption while writing a MIDX causes us to write an
incorrect object offset (or, alternatively, the disk corrupts the data
after it is written but before it is reused). Then when we go to write a
new MIDX, we'll reuse the bad object offset without checking its
validity. The MIDX we just wrote is broken, but its trailing checksum is
intact, since that checksum was computed over the already-corrupt values
we copied in.

In this scenario, running "git multi-pack-index verify" before writing
would have caught the problem, but writing a new MIDX doesn't notice
anything wrong and blindly carries the corrupt offset forward.

Individual pack indexes check their validity by verifying the crc32
attached to each entry when carrying data forward during a repack.
We could solve this problem for MIDXs in the same way, but per-entry
crc32s don't make much sense there, since MIDX entries are so small that
a checksum for each one would be a large fraction of the entry itself.
Likewise, checking the whole file on every read may be prohibitively
expensive if a repository has a lot of objects, packs, or both.

But we can check an existing MIDX's trailing checksum before reusing it
to write a new one. And a corrupt MIDX need not stop us from writing a
new one: we can simply avoid reusing the existing MIDX at all and write
the new one as if from scratch.

Suggested-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2 files changed
tree: 22b8ed8b8995c49dc4e566290a424ae359f205fd
README.md


Git - fast, scalable, distributed revision control system

Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.

Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with the help of a group of hackers around the net.

Please read the file INSTALL for installation instructions.

Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git-related tools.

See Documentation/gittutorial.txt to get started, then see Documentation/giteveryday.txt for a useful minimum set of commands, and Documentation/git-<commandname>.txt for documentation of each command. If git has been correctly installed, then the tutorial can also be read with man gittutorial or git help tutorial, and the documentation of each command with man git-<commandname> or git help <commandname>.

CVS users may also want to read Documentation/gitcvs-migration.txt (man gitcvs-migration or git help cvs-migration if git is installed).

The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission). To subscribe to the list, send an email with just “subscribe git” in the body to majordomo@vger.kernel.org. The mailing list archives are available at https://lore.kernel.org/git/, http://marc.info/?l=git and other archival sites.

Issues which are security relevant should be disclosed privately to the Git Security mailing list git-security@googlegroups.com.

The maintainer frequently sends the “What's cooking” reports that list the current status of various development topics to the mailing list. The discussion following them gives a good reference for project status, development direction and remaining tasks.

The name “git” was given by Linus Torvalds when he wrote the very first version. He described the tool as “the stupid content tracker” and the name as (depending on your mood):

  • random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of “get” may or may not be relevant.
  • stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
  • “global information tracker”: you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
  • “goddamn idiotic truckload of sh*t”: when it breaks