| commit | e5cb1e3f093986815945b85e61de112b8e3ccdd6 | [log] [tgz] |
|---|---|---|
| author | Jeff King <peff@peff.net> | Sun Aug 13 12:24:40 2023 -0400 |
| committer | Junio C Hamano <gitster@pobox.com> | Sun Aug 13 16:30:36 2023 -0700 |
| tree | b76c3861affedd94a50c0dcb5b3b94be9bcd2f4e | |
| parent | 231e86c10c674235cf28447a8486f7955d5f4dd9 [diff] |
t4053: avoid writing to unopened pipe
This fixes an occasional hang I see when running t4053 with
--verbose-log using dash.
Commit 1e3f26542a (diff --no-index: support reading from named pipes,
2023-07-05) added a test that "diff --no-index" will complain when
comparing a named pipe and a directory. The minimum we need to test this
is to mkfifo the pipe, and then run "git diff --no-index pipe some_dir".
But the test does one thing more: it spawns a background shell process
that opens the pipe for writing, like this:
{
(>pipe) &
} &&
This extra writer _could_ be useful if Git misbehaves and tries to open
the pipe for reading. Without the writer, Git would block indefinitely
and the test would never end. But since we do not have such a bug, Git
does not open the pipe and it is the writing process which will block
indefinitely, since there are no readers. The test addresses this by
running "kill $!" in a test_when_finished block. Since the writer should
be blocking forever, this kill command will reliably find it waiting.
However, this seems to be somewhat racy, in that the writing process
sometimes hangs around even after the "kill". In a normal run of the
test script without options, this doesn't have any effect; the
main test script completes anyway. But with --verbose-log, we spawn a
"tee" process that reads the script output, and it won't end until all
descriptors pointing to its input pipe are closed. And the background
process that is hanging around still has its stderr, etc, pointed into
that pipe.
You can reproduce the situation like this:
cd t
./t4053-diff-no-index.sh --verbose-log --stress
Let that run for a few minutes, and then you'll find that some of the
runs have hung. For example, at 11:53, I ran:
$ ps xk start o pid,start,command | grep tee | head
713459 11:48:06 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-9.out
713527 11:48:06 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-15.out
719434 11:48:07 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-1.out
728117 11:48:08 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-5.out
738738 11:48:09 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-31.out
739457 11:48:09 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-27.out
744432 11:48:10 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-21.out
744471 11:48:10 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-29.out
761961 11:48:12 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-0.out
812299 11:48:19 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-8.out
All of these have been hung for several minutes. We can investigate one
and see that it's waiting to get EOF on its input:
$ strace -p 713459
strace: Process 713459 attached
read(0,
^C
Who else has that descriptor open?
$ lsof -a -p 713459 -d 0 +E
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
tee 713459 peff 0r FIFO 0,13 0t0 3943636 pipe 719203,sh,5w 719203,sh,7w 719203,sh,12w 719203,sh,13w
sh 719203 peff 5w FIFO 0,13 0t0 3943636 pipe 713459,tee,0r 719203,sh,7w 719203,sh,12w 719203,sh,13w
sh 719203 peff 7w FIFO 0,13 0t0 3943636 pipe 713459,tee,0r 719203,sh,5w 719203,sh,12w 719203,sh,13w
sh 719203 peff 12w FIFO 0,13 0t0 3943636 pipe 713459,tee,0r 719203,sh,5w 719203,sh,7w 719203,sh,13w
sh 719203 peff 13w FIFO 0,13 0t0 3943636 pipe 713459,tee,0r 719203,sh,5w 719203,sh,7w 719203,sh,12w
It's a shell, presumably a subshell spawned by the main script. Though
it may seem odd, having the same descriptor open several times is not
unreasonable (they're all basically the original stdout/stderr of the
script that has been copied). And they should all close when the process
exits. So what's it doing? Curiously, it will exit as soon as we strace
it:
$ strace -s 64 -p 719203
strace: Process 719203 attached
openat(AT_FDCWD, "pipe", O_WRONLY|O_CREAT|O_TRUNC, 0666) = -1 ENOENT (No such file or directory)
write(2, "./t4053-diff-no-index.sh: 7: eval: ", 35) = 35
write(2, "cannot create pipe: Directory nonexistent", 41) = 41
write(2, "\n", 1) = 1
exit_group(2) = ?
+++ exited with 2 +++
I think what happens is this:
- it is blocking in the openat() call for the pipe, as we expect (so
this is definitely the backgrounded subshell mentioned above)
- strace sends signals (probably STOP/CONT); those cause the kernel to
stop blocking, but libc will restart the system call automatically
- by this time, the "pipe" fifo is gone, so we'll actually try to
create a regular file. But of course the surrounding directory is
gone, too! So we get ENOENT, and then exit as normal.
So the blocking is something we expect to happen. But what we didn't
expect is for the process to still exist at all! It should have been
killed earlier when the parent process called "kill", but it wasn't. And
we can't catch the race at this point, because it happened much earlier.
One can guess, though, that there is some race with the shell setting up
the signal handling in the backgrounded subshell, and possibly blocking
or ignoring signals at the time that the "kill" is received. Curiously,
the race does not seem to happen if I use "bash" instead of "dash", so
presumably bash's setup here is more atomic.
One fix might be to try killing the subshell more aggressively, either
using SIGKILL, or looping on kill/wait. But that seems complex and
likely to introduce new problems/races. Instead, we can observe that the
writer is not needed at all. Git will notice the pipe via stat() before
it is ever opened. So we can simply drop the writer subshell entirely.
If we ever changed Git to open the path and fstat() it, this would
result in the test hanging. But we're not likely to do that. After all,
we have to stat() paths to see if they are openable at all (e.g., it
could be a directory), so this seems like a low risk. And anybody who
does make such a change will immediately see the issue, as Git would
hang consistently.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.
Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with help of a group of hackers around the net.
Please read the file INSTALL for installation instructions.
Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git related tools.
See Documentation/gittutorial.txt to get started, then see Documentation/giteveryday.txt for a useful minimum set of commands, and Documentation/git-<commandname>.txt for documentation of each command. If git has been correctly installed, then the tutorial can also be read with man gittutorial or git help tutorial, and the documentation of each command with man git-<commandname> or git help <commandname>.
CVS users may also want to read Documentation/gitcvs-migration.txt (man gitcvs-migration or git help cvs-migration if git is installed).
The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission and Documentation/CodingGuidelines).
Those wishing to help with error message, usage and informational message string translations (localization l10) should see po/README.md (a po file is a Portable Object file that holds the translations).
To subscribe to the list, send an email with just “subscribe git” in the body to majordomo@vger.kernel.org (not the Git list). The mailing list archives are available at https://lore.kernel.org/git/, http://marc.info/?l=git and other archival sites.
Issues which are security relevant should be disclosed privately to the Git Security mailing list git-security@googlegroups.com.
The maintainer frequently sends the “What's cooking” reports that list the current status of various development topics to the mailing list. The discussion following them give a good reference for project status, development direction and remaining tasks.
The name “git” was given by Linus Torvalds when he wrote the very first version. He described the tool as “the stupid content tracker” and the name as (depending on your mood):