)]}'
{
  "commit": "1f34bf3e082741e053d25b76a0ffe31d9d967594",
  "tree": "71df2b3ec744d115a11b34e37626a43cf8f105f1",
  "parents": [
    "320572c43d7bc5afbcb8e5faf83b6eccfe6f4e32"
  ],
  "author": {
    "name": "Patrick Steinhardt",
    "email": "ps@pks.im",
    "time": "Wed May 28 14:24:11 2025 +0200"
  },
  "committer": {
    "name": "Junio C Hamano",
    "email": "gitster@pobox.com",
    "time": "Wed May 28 07:56:29 2025 -0700"
  },
  "message": "midx: stop repeatedly looking up nonexistent packfiles\n\nThe multi-pack index acts as a cache across a set of packfiles so that\nwe can quickly look up which of those packfiles contains a given object.\nAs such, the multi-pack index naturally needs to be updated every time\none of the packfiles goes away, or otherwise the multi-pack index has\ngrown stale.\n\nA stale multi-pack index should be handled gracefully by Git though, and\nin fact it is: if the indexed pack cannot be found we simply ignore it\nand eventually we fall back to doing the object lookup by just iterating\nthrough all packs, even if those aren\u0027t indexed.\n\nBut while this fallback works, it has one significant downside: we don\u0027t\ncache the fact that a pack has vanished. This leads to us repeatedly\ntrying to look up the same pack only to realize that it (still) doesn\u0027t\nexist.\n\nThis issue can be easily demonstrated by creating a repository with a\nstale multi-pack index and a couple of objects. We do so by creating a\nrepository with two packfiles, both of which are indexed by the\nmulti-pack index, and then repack those two packfiles. Note that we have\nto move the multi-pack-index before doing the final repack, as Git knows\nto delete it otherwise.\n\n    $ git init repo\n    $ cd repo/\n    $ git config set maintenance.auto false\n    $ for i in $(seq 1000); do printf \"%d-original\" $i \u003efile-$i; done\n    $ git add .\n    $ git commit -moriginal\n    $ git repack -dl\n    $ for i in $(seq 1000); do printf \"%d-modified\" $i \u003efile-$i; done\n    $ git commit -a -mmodified\n    $ git repack -dl\n    $ git multi-pack-index write\n    $ mv .git/objects/pack/multi-pack-index .\n    $ git repack -Adl\n    $ mv multi-pack-index .git/objects/pack/\n\nCommands that cause a lot of objects lookups will now repeatedly invoke\n`add_packed_git()`, which leads to three failed access(3p) calls as well\nas one failed stat(3p) call. The following strace for example is done\nfor `git log --patch` in the above repository:\n\n    % time     seconds  usecs/call     calls    errors syscall\n    ------ ----------- ----------- --------- --------- ----------------\n     74.67    0.024693           1     18038     18031 access\n     25.33    0.008378           1      6045      6017 newfstatat\n    ------ ----------- ----------- --------- --------- ----------------\n    100.00    0.033071           1     24083     24048 total\n\nFix the issue by introducing a negative lookup cache for indexed packs.\nThis cache works by simply storing an invalid pointer for a missing pack\nwhen `prepare_midx_pack()` fails to look up the pack. Most users of the\n`packs` array don\u0027t need to be adjusted, either, as they all know to\ncall `prepare_midx_pack()` before accessing the array.\n\nWith this change in place we can now see a significantly reduced number\nof syscalls:\n\n    % time     seconds  usecs/call     calls    errors syscall\n    ------ ----------- ----------- --------- --------- ----------------\n     73.58    0.000323           5        60        28 newfstatat\n     26.42    0.000116           5        23        16 access\n    ------ ----------- ----------- --------- --------- ----------------\n    100.00    0.000439           5        83        44 total\n\nFurthermore, this change also results in a speedup:\n\n    Benchmark 1: git log --patch (revision \u003d HEAD~)\n      Time (mean ± σ):      50.4 ms ±   2.5 ms    [User: 22.0 ms, System: 24.4 ms]\n      Range (min … max):    45.4 ms …  54.9 ms    53 runs\n\n    Benchmark 2: git log --patch (revision \u003d HEAD)\n      Time (mean ± σ):      12.7 ms ±   0.4 ms    [User: 11.1 ms, System: 1.6 ms]\n      Range (min … max):    12.4 ms …  15.0 ms    191 runs\n\n    Summary\n      git log --patch (revision \u003d HEAD) ran\n        3.96 ± 0.22 times faster than git log --patch (revision \u003d HEAD~)\n\nIn the end, it should in theory never be necessary to have this negative\nlookup cache given that we know to update the multi-pack index together\nwith repacks. But as the change is quite contained and as the speedup\ncan be significant as demonstrated above, it does feel sensible to have\nthe negative lookup cache regardless.\n\nBased-on-patch-by: Jeff King \u003cpeff@peff.net\u003e\nSigned-off-by: Patrick Steinhardt \u003cps@pks.im\u003e\nSigned-off-by: Junio C Hamano \u003cgitster@pobox.com\u003e\n",
  "tree_diff": [
    {
      "type": "modify",
      "old_id": "3d0015f782818c2037b90bc94fb35551f20c2812",
      "old_mode": 33188,
      "old_path": "midx.c",
      "new_id": "cd6e766ce2b15821995dbaf0dc5534136ed9969d",
      "new_mode": 33188,
      "new_path": "midx.c"
    }
  ]
}
