)]}'
{
  "commit": "0e5bba53af7d6f911e99589d736cdf06badad0fe",
  "tree": "1171cd062d102e691283ac27fbc5e17e9b44e1ab",
  "parents": [
    "1fb2b636c672fea06fdc5f50d5c0ed44117ae45a"
  ],
  "author": {
    "name": "Jeff King",
    "email": "peff@peff.net",
    "time": "Fri Sep 08 02:38:41 2017 -0400"
  },
  "committer": {
    "name": "Junio C Hamano",
    "email": "gitster@pobox.com",
    "time": "Fri Sep 08 15:43:17 2017 +0900"
  },
  "message": "add UNLEAK annotation for reducing leak false positives\n\nIt\u0027s a common pattern in git commands to allocate some\nmemory that should last for the lifetime of the program and\nthen not bother to free it, relying on the OS to throw it\naway.\n\nThis keeps the code simple, and it\u0027s fast (we don\u0027t waste\ntime traversing structures or calling free at the end of the\nprogram). But it also triggers warnings from memory-leak\ncheckers like valgrind or LSAN. They know that the memory\nwas still allocated at program exit, but they don\u0027t know\n_when_ the leaked memory stopped being useful. If it was\nearly in the program, then it\u0027s probably a real and\nimportant leak. But if it was used right up until program\nexit, it\u0027s not an interesting leak and we\u0027d like to suppress\nit so that we can see the real leaks.\n\nThis patch introduces an UNLEAK() macro that lets us do so.\nTo understand its design, let\u0027s first look at some of the\nalternatives.\n\nUnfortunately the suppression systems offered by\nleak-checking tools don\u0027t quite do what we want. A\nleak-checker basically knows two things:\n\n  1. Which blocks were allocated via malloc, and the\n     callstack during the allocation.\n\n  2. Which blocks were left un-freed at the end of the\n     program (and which are unreachable, but more on that\n     later).\n\nTheir suppressions work by mentioning the function or\ncallstack of a particular allocation, and marking it as OK\nto leak.  So imagine you have code like this:\n\n  int cmd_foo(...)\n  {\n\t/* this allocates some memory */\n\tchar *p \u003d some_function();\n\tprintf(\"%s\", p);\n\treturn 0;\n  }\n\nYou can say \"ignore allocations from some_function(),\nthey\u0027re not leaks\". But that\u0027s not right. That function may\nbe called elsewhere, too, and we would potentially want to\nknow about those leaks.\n\nSo you can say \"ignore the callstack when main calls\nsome_function\".  That works, but your annotations are\nbrittle. In this case it\u0027s only two functions, but you can\nimagine that the actual allocation is much deeper. If any of\nthe intermediate code changes, you have to update the\nsuppression.\n\nWhat we _really_ want to say is that \"the value assigned to\np at the end of the function is not a real leak\". But\nleak-checkers can\u0027t understand that; they don\u0027t know about\n\"p\" in the first place.\n\nHowever, we can do something a little bit tricky if we make\nsome assumptions about how leak-checkers work. They\ngenerally don\u0027t just report all un-freed blocks. That would\nreport even globals which are still accessible when the\nleak-check is run.  Instead they take some set of memory\n(like BSS) as a root and mark it as \"reachable\". Then they\nscan the reachable blocks for anything that looks like a\npointer to a malloc\u0027d block, and consider that block\nreachable. And then they scan those blocks, and so on,\ntransitively marking anything reachable from a global as\n\"not leaked\" (or at least leaked in a different category).\n\nSo we can mark the value of \"p\" as reachable by putting it\ninto a variable with program lifetime. One way to do that is\nto just mark \"p\" as static. But that actually affects the\nrun-time behavior if the function is called twice (you\naren\u0027t likely to call main() twice, but some of our cmd_*()\nfunctions are called from other commands).\n\nInstead, we can trick the leak-checker by putting the value\ninto _any_ reachable bytes. This patch keeps a global\nlinked-list of bytes copied from \"unleaked\" variables. That\nlist is reachable even at program exit, which confers\nrecursive reachability on whatever values we unleak.\n\nIn other words, you can do:\n\n  int cmd_foo(...)\n  {\n\tchar *p \u003d some_function();\n\tprintf(\"%s\", p);\n\tUNLEAK(p);\n\treturn 0;\n  }\n\nto annotate \"p\" and suppress the leak report.\n\nBut wait, couldn\u0027t we just say \"free(p)\"? In this toy\nexample, yes. But UNLEAK()\u0027s byte-copying strategy has\nseveral advantages over actually freeing the memory:\n\n  1. It\u0027s recursive across structures. In many cases our \"p\"\n     is not just a pointer, but a complex struct whose\n     fields may have been allocated by a sub-function. And\n     in some cases (e.g., dir_struct) we don\u0027t even have a\n     function which knows how to free all of the struct\n     members.\n\n     By marking the struct itself as reachable, that confers\n     reachability on any pointers it contains (including those\n     found in embedded structs, or reachable by walking\n     heap blocks recursively.\n\n  2. It works on cases where we\u0027re not sure if the value is\n     allocated or not. For example:\n\n       char *p \u003d argc \u003e 1 ? argv[1] : some_function();\n\n     It\u0027s safe to use UNLEAK(p) here, because it\u0027s not\n     freeing any memory. In the case that we\u0027re pointing to\n     argv here, the reachability checker will just ignore\n     our bytes.\n\n  3. Likewise, it works even if the variable has _already_\n     been freed. We\u0027re just copying the pointer bytes. If\n     the block has been freed, the leak-checker will skip\n     over those bytes as uninteresting.\n\n  4. Because it\u0027s not actually freeing memory, you can\n     UNLEAK() before we are finished accessing the variable.\n     This is helpful in cases like this:\n\n       char *p \u003d some_function();\n       return another_function(p);\n\n     Writing this with free() requires:\n\n       int ret;\n       char *p \u003d some_function();\n       ret \u003d another_function(p);\n       free(p);\n       return ret;\n\n     But with unleak we can just write:\n\n       char *p \u003d some_function();\n       UNLEAK(p);\n       return another_function(p);\n\nThis patch adds the UNLEAK() macro and enables it\nautomatically when Git is compiled with SANITIZE\u003dleak.  In\nnormal builds it\u0027s a noop, so we pay no runtime cost.\n\nIt also adds some UNLEAK() annotations to show off how the\nfeature works. On top of other recent leak fixes, these are\nenough to get t0000 and t0001 to pass when compiled with\nLSAN.\n\nNote the case in commit.c which actually converts a\nstrbuf_release() into an UNLEAK. This code was already\nnon-leaky, but the free didn\u0027t do anything useful, since\nwe\u0027re exiting. Converting it to an annotation means that\nnon-leak-checking builds pay no runtime cost. The cost is\nminimal enough that it\u0027s probably not worth going on a\ncrusade to convert these kinds of frees to UNLEAKS. I did it\nhere for consistency with the \"sb\" leak (though it would\nhave been equally correct to go the other way, and turn them\nboth into strbuf_release() calls).\n\nSigned-off-by: Jeff King \u003cpeff@peff.net\u003e\nSigned-off-by: Junio C Hamano \u003cgitster@pobox.com\u003e\n",
  "tree_diff": [
    {
      "type": "modify",
      "old_id": "f2bb7f2f63e458af6261b4a151d87ef9e973222e",
      "old_mode": 33188,
      "old_path": "Makefile",
      "new_id": "c052f09bbae7936b9c9378f9e314e6916a257998",
      "new_mode": 33188,
      "new_path": "Makefile"
    },
    {
      "type": "modify",
      "old_id": "ef625e3fb8b4f880f85e8a0283ecc18392ce19b6",
      "old_mode": 33188,
      "old_path": "builtin/add.c",
      "new_id": "a648cf4c56c9f56eb60a97ca50ccf0b191dc50ff",
      "new_mode": 33188,
      "new_path": "builtin/add.c"
    },
    {
      "type": "modify",
      "old_id": "b3b04f5dd3a94d1661e877c5019cc56ac46854ef",
      "old_mode": 33188,
      "old_path": "builtin/commit.c",
      "new_id": "58f9747c2f2d99d2ab4d78f62f5e9c0928a12f1f",
      "new_mode": 33188,
      "new_path": "builtin/commit.c"
    },
    {
      "type": "modify",
      "old_id": "52a4606243e3e39108436feff0319f4c9cc8d990",
      "old_mode": 33188,
      "old_path": "builtin/config.c",
      "new_id": "d13daeeb55927758ceec816f39412123d2ce5846",
      "new_mode": 33188,
      "new_path": "builtin/config.c"
    },
    {
      "type": "modify",
      "old_id": "47823f9aa4452edfa684b042dd5fbaa935df1d39",
      "old_mode": 33188,
      "old_path": "builtin/init-db.c",
      "new_id": "c9b7946bade9e10f799942137480e71ee3233701",
      "new_mode": 33188,
      "new_path": "builtin/init-db.c"
    },
    {
      "type": "modify",
      "old_id": "e1339e6d17d2bf592b800d3e5a686cbc91c12331",
      "old_mode": 33188,
      "old_path": "builtin/ls-files.c",
      "new_id": "8c713c47acccf0a3a50ec8a0b822aafae8d6bada",
      "new_mode": 33188,
      "new_path": "builtin/ls-files.c"
    },
    {
      "type": "modify",
      "old_id": "c98e2ce5f57c1f41ccaf90041a1a22233ae75551",
      "old_mode": 33188,
      "old_path": "builtin/worktree.c",
      "new_id": "de26849f5551a12aac16e4121c7024ebb40502d2",
      "new_mode": 33188,
      "new_path": "builtin/worktree.c"
    },
    {
      "type": "modify",
      "old_id": "6678b488cc73851d468094deb5352d311ab310d0",
      "old_mode": 33188,
      "old_path": "git-compat-util.h",
      "new_id": "003e444c46edce65f04413d732c22d71d708fd4f",
      "new_mode": 33188,
      "new_path": "git-compat-util.h"
    },
    {
      "type": "modify",
      "old_id": "1ea7df9a202339972ee59f35a5ba8852502c915f",
      "old_mode": 33188,
      "old_path": "usage.c",
      "new_id": "cdd534c9dfc4bd38ce112da62643ab2dc7b4fbe9",
      "new_mode": 33188,
      "new_path": "usage.c"
    }
  ]
}
