Makefile: use compat regex with SANITIZE=address

Recent versions of the gcc and clang Address Sanitizer produce test
failures related to regexec(). This triggers with gcc-10 and clang-8
(but not with gcc-9 or clang-7). Running:

  make CC=gcc-10 SANITIZE=address test

results in failures in t4018, t3206, and t4062.

The cause seems to be that when built with ASan, we use a different
version of regexec() than normal. And this version doesn't understand
the REG_STARTEND flag. Here's my evidence supporting that.

The failure in t4062 is an ASan warning:

  expecting success of 4062.2 '-G matches':
  	git diff --name-only -G "^(0{64}){64}$" HEAD^ >out &&
  	test 4096-zeroes.txt = "$(cat out)"

  =================================================================
  ==672994==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7fa76f672000 at pc 0x7fa7726f75b6 bp 0x7ffe41bdda70 sp 0x7ffe41bdd220
  READ of size 4097 at 0x7fa76f672000 thread T0
      #0 0x7fa7726f75b5  (/lib/x86_64-linux-gnu/libasan.so.6+0x4f5b5)
      #1 0x562ae0c9c40e in regexec_buf /home/peff/compile/git/git-compat-util.h:1117
      #2 0x562ae0c9c40e in diff_grep /home/peff/compile/git/diffcore-pickaxe.c:52
      #3 0x562ae0c9cc28 in pickaxe_match /home/peff/compile/git/diffcore-pickaxe.c:166
      [...]

In this case we're looking in a buffer which was mmap'd via
reuse_worktree_file(), and whose size is 4096 bytes. But libasan's
regex tries to look at byte 4097 anyway! If we tweak Git like this:

  diff --git a/diff.c b/diff.c
  index 8e2914c031..cfae60c120 100644
  --- a/diff.c
  +++ b/diff.c
  @@ -3880,7 +3880,7 @@ static int reuse_worktree_file(struct index_state *istate,
           */
          if (ce_uptodate(ce) ||
              (!lstat(name, &st) && !ie_match_stat(istate, ce, &st, 0)))
  -               return 1;
  +               return 0;

          return 0;
   }

to use a regular buffer (with a trailing NUL) instead of an mmap, then
the complaint goes away.
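To make the difference between the two buffers concrete, here is a minimal sketch. The static arrays are only stand-ins for the mmap'd region and for the regular buffer (the real code paths are in diff.c), and every function name below is hypothetical. It shows that a 4096-byte region filled with '0' has no NUL anywhere inside it, so any reader that expects a NUL-terminated string has to look past the end, whereas a copy with one extra trailing byte terminates within bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if a NUL byte exists within buf[0..size), 0 otherwise. */
int has_nul_within(const char *buf, size_t size)
{
	assert(buf != NULL);
	return memchr(buf, '\0', size) != NULL;
}

/*
 * Stand-in for the mmap'd file from t4062: exactly 4096 bytes of '0',
 * with nothing after them that belongs to us.
 */
int demo_exact_fit(void)
{
	static char page[4096];

	memset(page, '0', sizeof(page));
	return has_nul_within(page, sizeof(page));
}

/*
 * Stand-in for the regular buffer: the same 4096 bytes plus one
 * trailing NUL that is part of the allocation.
 */
int demo_with_terminator(void)
{
	static char copy[4097];

	memset(copy, '0', sizeof(copy) - 1);
	copy[sizeof(copy) - 1] = '\0';
	return has_nul_within(copy, sizeof(copy));
}
```

Here demo_exact_fit() reports that no terminator is reachable inside the region, while demo_with_terminator() finds one. That fits the theory that the complaint comes from code treating the buffer as a C string rather than honoring the length we passed.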

The other failures are actually diff output with an incorrect funcname
header. If I instrument xdiff to show the funcname matching like so:

  diff --git a/xdiff-interface.c b/xdiff-interface.c
  index 8509f9ea22..f6c3dc1986 100644
  --- a/xdiff-interface.c
  +++ b/xdiff-interface.c
  @@ -197,6 +197,7 @@ struct ff_regs {
   	struct ff_reg {
   		regex_t re;
   		int negate;
  +		char *printable;
   	} *array;
   };

  @@ -218,7 +219,12 @@ static long ff_regexp(const char *line, long len,

   	for (i = 0; i < regs->nr; i++) {
   		struct ff_reg *reg = regs->array + i;
  -		if (!regexec_buf(&reg->re, line, len, 2, pmatch, 0)) {
  +		int ret = regexec_buf(&reg->re, line, len, 2, pmatch, 0);
  +		warning("regexec %s:\n  regex: %s\n  buf: %.*s",
  +			ret == 0 ? "matched" : "did not match",
  +			reg->printable,
  +			(int)len, line);
  +		if (!ret) {
   			if (reg->negate)
   				return -1;
   			break;
  @@ -264,6 +270,7 @@ void xdiff_set_find_func(xdemitconf_t *xecfg, const char *value, int cflags)
   			expression = value;
   		if (regcomp(&reg->re, expression, cflags))
   			die("Invalid regexp to look for hunk header: %s", expression);
  +		reg->printable = xstrdup(expression);
   		free(buffer);
   		value = ep + 1;
   	}

then when compiling with ASan and gcc-10, running the diff from t4018.66
produces this:

  $ git diff -U1 cpp-skip-access-specifiers
  warning: regexec did not match:
    regex: ^[     ]*[A-Za-z_][A-Za-z_0-9]*:[[:space:]]*($|/[/*])
    buf: private:
  warning: regexec matched:
    regex: ^((::[[:space:]]*)?[A-Za-z_].*)$
    buf: private:
  diff --git a/cpp-skip-access-specifiers b/cpp-skip-access-specifiers
  index 4d4a9db..ebd6f42 100644
  --- a/cpp-skip-access-specifiers
  +++ b/cpp-skip-access-specifiers
  @@ -6,3 +6,3 @@ private:
          void DoSomething();
  -       int ChangeMe;
  +       int IWasChanged;
   };

That first regex should match (and is negated, so it should be telling
us _not_ to match "private:"). But it wouldn't if regexec() is looking
at the whole buffer, and not just the length-limited line we've fed to
regexec_buf(). So this is consistent again with REG_STARTEND being
ignored.
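The effect can be reproduced outside of Git with a small sketch. It assumes a libc whose regexec() honors the REG_STARTEND extension (glibc and the BSDs do). The helper names are hypothetical, and the buffer is a shortened stand-in for the test file, not its real contents. The first regex from above is run against a buffer in which "private:" is followed by more lines, once limited to the 8-byte line and once over the whole NUL-terminated buffer.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Match only buf[0..size), in the spirit of regexec_buf(). */
int match_limited(const regex_t *re, const char *buf, size_t size)
{
	regmatch_t pmatch[1];

	assert(re != NULL && buf != NULL);
	pmatch[0].rm_so = 0;
	pmatch[0].rm_eo = (regoff_t)size;
	return regexec(re, buf, 1, pmatch, REG_STARTEND);
}

/*
 * Returns 0 on a match and REG_NOMATCH otherwise (or -1 if the regex
 * fails to compile). With limited != 0 only the first line is visible
 * to the regex engine; otherwise it sees the whole buffer.
 */
int demo_match(int limited)
{
	const char *buf = "private:\n\tvoid DoSomething();\n";
	const size_t line_len = 8; /* strlen("private:") */
	regex_t re;
	int ret;

	if (regcomp(&re,
		    "^[ \t]*[A-Za-z_][A-Za-z_0-9]*:[[:space:]]*($|/[/*])",
		    REG_EXTENDED))
		return -1;

	if (limited)
		ret = match_limited(&re, buf, line_len);
	else
		ret = regexec(&re, buf, 0, NULL, 0);

	regfree(&re);
	return ret;
}
```

With the limit in place, "$" can match right after the colon. Without it, "$" would have to match at the very end of the buffer, so the regex fails, which is the same "did not match" result shown above.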

The correct output (compiling without ASan, or gcc-9 with ASan) looks
like this:

  warning: regexec matched:
    regex: ^[     ]*[A-Za-z_][A-Za-z_0-9]*:[[:space:]]*($|/[/*])
    buf: private:
  [...more lines that we end up not using...]
  warning: regexec matched:
    regex: ^((::[[:space:]]*)?[A-Za-z_].*)$
    buf: class RIGHT : public Baseclass
  diff --git a/cpp-skip-access-specifiers b/cpp-skip-access-specifiers
  index 4d4a9db..ebd6f42 100644
  --- a/cpp-skip-access-specifiers
  +++ b/cpp-skip-access-specifiers
  @@ -6,3 +6,3 @@ class RIGHT : public Baseclass
          void DoSomething();
  -       int ChangeMe;
  +       int IWasChanged;
   };

So it really does seem like libasan's regex engine is ignoring
REG_STARTEND. We should be able to work around it by compiling with
NO_REGEX, which would use our local regexec(). But to make matters even
more interesting, this isn't enough by itself.

Because ASan has support from the compiler, it doesn't seem to intercept
our call to regexec() at the dynamic library level. It actually
recognizes when we are compiling a call to regexec() and replaces it
with ASan-specific code at that point. And unlike most of our other
compat code, where we might have git_mmap() or similar, the actual
symbol name in the compiled compat/regex code is regexec(). So just
compiling with NO_REGEX isn't enough; we still end up in libasan!

We can work around that by having the preprocessor replace regexec with
git_regexec (both in the callers and in the actual implementation), and
we truly end up with a call to our custom regex code, even when
compiling with ASan. That's probably a good thing to do anyway, as it
means anybody looking at the symbols later (e.g., in a debugger) would
have a better indication of which function is which. So we'll do the
same for the other common regex functions (even though just regexec() is
enough to fix this ASan problem).
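As an illustration of the renaming trick (a sketch only, not the actual patch), the following single file plays both roles: the "implementation" and the "caller" each spell the function regexec(), but because of the #define the compiler only ever sees git_regexec. The stand-in body and the COMPAT_SENTINEL value are hypothetical and exist only so the caller can tell which function it reached.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical marker value; the real compat code returns normal results. */
#define COMPAT_SENTINEL 4242

/* From here on, the preprocessor rewrites every "regexec" token. */
#define regexec git_regexec

/*
 * "Implementation": written as regexec(), but compiled as the symbol
 * git_regexec, so it cannot be confused with the libc or libasan one.
 */
int regexec(const regex_t *preg, const char *str, size_t nmatch,
	    regmatch_t pmatch[], int eflags)
{
	(void)preg;
	(void)nmatch;
	(void)pmatch;
	(void)eflags;
	assert(str != NULL);
	return COMPAT_SENTINEL;
}

/* "Caller": also written as regexec(), and also rewritten. */
int caller_uses_compat(void)
{
	return regexec(NULL, "", 0, NULL, 0) == COMPAT_SENTINEL;
}
```

Since no call to a function literally named regexec survives preprocessing, there is nothing left for the compiler to recognize and replace, and a symbol listing of the object file would show git_regexec.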

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2 files changed
tree: 24b63df21a163fd51b77449d1700dfe4602684ba
README.md

Git - fast, scalable, distributed revision control system

Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.

Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with the help of a group of hackers around the net.

Please read the file INSTALL for installation instructions.

Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git related tools.

See Documentation/gittutorial.txt to get started, then see Documentation/giteveryday.txt for a useful minimum set of commands, and Documentation/git-<commandname>.txt for documentation of each command. If git has been correctly installed, then the tutorial can also be read with man gittutorial or git help tutorial, and the documentation of each command with man git-<commandname> or git help <commandname>.

CVS users may also want to read Documentation/gitcvs-migration.txt (man gitcvs-migration or git help cvs-migration if git is installed).

The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission). To subscribe to the list, send an email with just “subscribe git” in the body to majordomo@vger.kernel.org. The mailing list archives are available at https://public-inbox.org/git/, http://marc.info/?l=git and other archival sites.

Issues which are security relevant should be disclosed privately to the Git Security mailing list git-security@googlegroups.com.

The maintainer frequently sends the “What's cooking” reports that list the current status of various development topics to the mailing list. The discussions following them give a good reference for project status, development direction and remaining tasks.

The name “git” was given by Linus Torvalds when he wrote the very first version. He described the tool as “the stupid content tracker” and the name as (depending on your mood):

  • random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of “get” may or may not be relevant.
  • stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
  • “global information tracker”: you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
  • “goddamn idiotic truckload of sh*t”: when it breaks