10.5.[01] broke ptys

Hey, I think I found a kernel bug. It's preventing the "Phosphor" screen saver (and others) from working properly on 10.5. As far as I can tell, if you have a pipe, and the process on the other end exits, the pipe flushes: all bytes that have been written to the pipe from the child but not yet read by the parent vanish. I reported it to Apple (5606018); no response yet.



/* gcc test.c -lutil */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <util.h>

static int
do_fork(void)
{
  int fd = -1;
  pid_t pid;

  if ((pid = forkpty(&fd, NULL, NULL, NULL)) < 0)
    perror("forkpty");
  else if (!pid)
    {
      printf ("0123456789\n");

      /* #### Uncommenting this makes it work! */
      /* sleep(20); */

      exit (0);
    }

  return fd;
}


int
main (int argc, char **argv)
{
  char s[1024];
  int n;
  int fd = do_fork();

  /* On 10.4, this prints the whole 10 character line, 1 char per second.
     On 10.5, it prints 1 character and stops.
   */
  do {
    n = read (fd, s, 1);
    if (n > 0) fprintf (stderr, "%c", *s);
    sleep (1);
  } while (n > 0);

  return 0;
}


Update, 2 Sep 2009: Still broken in exactly the same way in 10.6.
(And also 10.7.)
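
The commented-out sleep(20) suggests the unread bytes survive as long as the child is still alive. Here is a minimal sketch of that idea, with the sleep replaced by a pipe that the child blocks on until the parent has read the line. The name read_line_from_child and the "gate" pipe are made up for illustration, and this has not been verified as a workaround on 10.5: it only makes the "keep the child around" observation deterministic. The header choice is an assumption too: util.h on the Mac, pty.h on Linux (where older glibc also wants -lutil).

```c
/* gcc sketch.c -lutil */

#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#if defined(__APPLE__)
# include <util.h>
#else
# include <pty.h>   /* assumption: Linux; other systems may differ */
#endif

/* Hypothetical helper: fork a child on a pty, have it write one line, and
   read that line back one byte at a time.  The child stays alive until the
   parent closes the write end of the "gate" pipe.
   Returns the number of bytes read, or -1 if the pipe or pty setup failed. */
static int
read_line_from_child(char *out, size_t cap)
{
  static const char msg[] = "0123456789\n";
  int fd = -1;
  int gate[2];
  pid_t pid;
  size_t total = 0;

  assert(cap >= 2);
  if (pipe(gate) < 0)
    return -1;

  pid = forkpty(&fd, NULL, NULL, NULL);
  if (pid < 0) {
    close(gate[0]);
    close(gate[1]);
    return -1;
  }

  if (pid == 0) {
    char c;
    close(gate[1]);
    if (write(STDOUT_FILENO, msg, strlen(msg)) < 0)
      _exit(1);
    /* Block here until the parent closes its end of the gate. */
    while (read(gate[0], &c, 1) > 0)
      ;
    _exit(0);
  }

  close(gate[0]);
  while (total + 1 < cap) {
    ssize_t n = read(fd, out + total, 1);
    if (n <= 0) break;                 /* EOF or error */
    if (out[total++] == '\n') break;   /* got the whole line */
  }
  out[total] = '\0';

  close(gate[1]);                      /* now the child may exit */
  waitpid(pid, NULL, 0);
  close(fd);
  return (int) total;
}
```

Call it as read_line_from_child(buf, sizeof buf). The tty layer usually turns the "\n" into "\r\n", so expect the returned line to end that way. This only helps when you control the child's code, which is not the case for a screen saver running an arbitrary program.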


15 Responses:

  1. dasht_brk says:

    It's a pty, really.

    I wonder if it's one of those cases like on old modem-based BBS' where the hang-up signal of the remote modem could be delivered before some of the type-ahe^D

  2. ciphergoth says:

    Doubtless you've done this test already, but it does what you want it to under Linux. Well, under Ubuntu Gutsy anyway.

  3. duskwuff says:

    For some reason, it works just fine on 10.5 when running under dtruss. I don't even claim to understand this.

    FWIW, here's an even more minimal test case that doesn't depend on forking:

    /* gcc test.c -lutil */

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <util.h>

    int main (int argc, char **argv) {
      int mfd, sfd;
      if(openpty(&mfd, &sfd, NULL, NULL, NULL) < 0) {
        perror("openpty");
        exit(1);
      }

      write(sfd, "test", 4);
      close(sfd);

      char buf[64];
      int n = read(mfd, buf, 1);
      printf("read returned %d\n", n);

      exit(0);
    }

    • jwz says:

      Wow, that test does something horrible under 10.4.11 as well. With your code, the read() blocks. But if I comment out the close(), then it hangs after printing "read returned 1", and by "hangs" I mean "kill -9 doesn't work". WTF.

    • pnendick says:

      This works fine for my 10.5.1 install (changing last argument to read to 4 returns whole string).

  4. sweh says:

    Shades of an old old SunOS 4 bug. When the writing end of a pty closed, the reader only had a limited length of time to read the buffer before the kernel would flush it. Annoying. Dunno if Sun ever did fix it. Haven't tested under Solaris 2 'cos I didn't need to do the same sort of work there.

  5. pnendick says:

    SIGCHLD being interpreted as SIGINT perhaps? SIG_DFL'ing SIGCHLD might prove interesting. Python Popen* does this to me sometimes and it really winds me up.

    • jwz says:

      That would kill the parent, which isn't happening. The pty is getting flushed.

      • pnendick says:

        Perhaps you have exceeded your errant punctuation quota.

        Feel free to pay me in Basil Hayden my next visit to the bay.

        • pnendick says:

          BTW I was joking. I see the bug and am sufficiently bored at work to poke at it. Sec...
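
For what it's worth, the experiment suggested in this thread (resetting SIGCHLD to its default disposition) can be sketched as below. The default action for SIGCHLD is to discard the signal, so the parent should survive the child's exit and simply collect its status. The helper name exit_status_with_default_sigchld is made up for illustration, and this says nothing about the pty flush itself.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: put SIGCHLD back to its default disposition, fork a
   child that exits right away with `code`, and reap it.  Returns the child's
   exit status, or -1 on error.  Getting a return value at all means the
   parent was not killed by the SIGCHLD. */
static int
exit_status_with_default_sigchld(int code)
{
  pid_t pid;
  int status = 0;

  assert(code >= 0 && code <= 255);
  if (signal(SIGCHLD, SIG_DFL) == SIG_ERR)
    return -1;

  pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0)
    _exit(code);                 /* child: exit immediately */

  if (waitpid(pid, &status, 0) != pid)
    return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling exit_status_with_default_sigchld(7) should hand back 7 in a parent that is still very much alive.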

  6. babbage says:

    Try this:


    /* gcc test.c -lutil */

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <errno.h>
    #include <pty.h> /* or util.h, whatever */


    static void crapout(int errfd, const char *what)
    {
      int save_errno = errno;
      dup2(errfd, STDERR_FILENO); /* restore stderr */
      errno = save_errno;
      perror(what);
      _exit(1);
    }


    static int
    do_fork(void)
    {
      int fd = -1;
      pid_t pid;
      char slave[42]; /* Can't realistically use PATH_MAX even if it is defined. */
      int errfd = dup(STDERR_FILENO);
      int save_errno;

      fflush(NULL); /* Avoid doubly flushing the same data on exit */
                    /* but still use _exit() to avoid atexit() weirdness */
      slave[0] = 0;
      if ((pid = forkpty(&fd, slave, NULL, NULL)) < 0) {
        crapout(errfd, "forkpty");
      } else if (0 == pid) {
        if (NULL == freopen(slave, "w", stdout))
        {
          crapout(errfd, "freopen");
        } else {
          close(errfd);
          printf("0123456789\n");
          _exit(0);
        }
      } else {
        close(errfd);
        return fd; /* parent */
      }
      return -2;
    }


    int
    main (int argc, char **argv)
    {
      char s[1];
      int n;
      int fd = do_fork();

      /* On 10.4, this prints the whole 10 character line, 1 char per second.
         On 10.5, it prints 1 character and stops.
       */
      do {
        n = read (fd, s, 1);
        if (n > 0) fprintf (stderr, "%c", *s);
        sleep (1); /* could skip this if '\r'==*s */
      } while (n > 0);

      return 0;
    }

    • pnendick says:

      It also works fine if you don't do per-byte reads. I'm still not convinced they haven't sinned on this one...

      (keeping the same do_fork()):

      int main (int argc, char **argv) {
        char s[1024];
        int fd = do_fork();
        ssize_t n = read(fd, s, sizeof(s) - 1);  /* leave room for the NUL */

        if (n != -1) {
          s[n] = 0;  /* read() does not NUL-terminate */
          fprintf(stderr, "child wrote to fd #%d:\n%s", fd, s);
        } else {
          perror("read from child borked");
        }
        close(fd);
        return 0;
      }

  7. haqr_spice says:

    Are you sure this happens on pipes? Your code uses PTYs.

    Are you sure that data on PTYs is supposed to hang around indefinitely, according to POSIX, or is that an implementation detail on the other systems?
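
On the pipe-versus-pty question: a plain pipe(2) does keep unread bytes around after the writing end is closed. The reader gets the data and then EOF. Below is a minimal sketch of the same one-byte-at-a-time read against a real pipe, for comparison with the pty tests in this thread. The helper name pipe_bytes_after_close is made up for illustration.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write msg into a pipe, close the write end, then
   read it back one byte at a time.  Returns the number of bytes recovered,
   or -1 on error.  msg must be short enough to fit in the pipe buffer. */
static int
pipe_bytes_after_close(const char *msg, char *out, size_t cap)
{
  int fds[2];
  size_t total = 0;
  size_t len = strlen(msg);

  assert(cap > len);            /* room for the data plus a NUL */
  if (pipe(fds) < 0)
    return -1;

  if (write(fds[1], msg, len) != (ssize_t) len) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }
  close(fds[1]);                /* the writer goes away before any read */

  while (total + 1 < cap) {
    ssize_t n = read(fds[0], out + total, 1);
    if (n <= 0) break;          /* 0 is EOF, -1 is an error */
    total++;
  }
  out[total] = '\0';
  close(fds[0]);
  return (int) total;
}
```

With the message "0123456789\n" this returns all 11 bytes, since a pipe does no newline translation and keeps its contents until they are read.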

  8. jcalderone says:

    10.4 has some broken behavior in this area as well. I had hoped 10.5 would fix it, not break it more.

    Here's a unit test which *intermittently* reproduces the problem:

    http://twistedmatrix.com/trac/browser/trunk/twisted/test/test_stdio.py?rev=21558#L125

    I never managed to produce the behavior with a simple C program. I'm glad you figured it out and reported a bug.