Anyone have a theory on how Apple broke LWP?

Dear Lazyweb,

The MacOS 10.9.1 /usr/bin/perl truncates LWP-loads at the first packet, or something. Check this shit out:

$ cmd='use LWP::UserAgent;
  print $LWP::UserAgent::VERSION, " $] \n";
  for (my $i = 0; $i < 10; $i++) {
    print length(LWP::UserAgent->new->get
      ("http://images.shutterslut.com/Bootie/Bootie-1-Jan-2014")
      ->decoded_content), "\n"; }'


$ /usr/bin/perl -e "$cmd"
6.05 5.016002
504
507
500
501
500
503
503
502
927
504

$ /opt/local/bin/perl -e "$cmd"
6.05 5.012004
18948
18949
18948
18949
18949
18949
18949
18948
18949
18949

How can I make /usr/bin/perl work properly again?

13 Responses:

  1. Cameo Wood says:

    Hey Maciej Stachowiak. You know the answer?

  2. Nick says:

    Every freaking release, Apple breaks system perl. Give up, use perlbrew (or MacPorts, as you are), stop dealing with Apple cocking it up every time.

  3. dinatural says:

    Upgrade the HTML::Parser module from CPAN.

    Same problem as here:
    http://stackoverflow.com/questions/14740365/why-cant-lwpuseragent-get-this-site-entirely
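Before upgrading anything, it may help to confirm that the two interpreters really load different module versions. The following is only a sketch: `module_version` is a hypothetical helper, and the list of modules to check is an assumption about what might be relevant (the original test only printed the LWP::UserAgent version).

```perl
use strict;
use warnings;

# Hypothetical helper: return the version of a module as seen by the
# interpreter running this script, or undef if it cannot be loaded.
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; $module->VERSION };
}

# Assumption: these are the modules worth comparing between the two perls.
for my $module (qw(LWP::UserAgent HTML::Parser HTML::HeadParser HTTP::Headers)) {
    my $version = module_version($module);
    printf "%-18s %s\n", $module, defined $version ? $version : 'not installed';
}
```

Saved to a file and run once with /usr/bin/perl and once with /opt/local/bin/perl, this would show whether the two installs differ in anything besides the Perl version itself.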

  4. Kyzer says:

    I have LWP::UserAgent 6.00 on Perl 5.10 (this is Mac OS 10.6) and the same problem happens: different responses, none of them right. Looking at Wireshark, it appears that that specific website's response to the LWP request is haywire.

    Ask for this (as LWP does) and you get an indecipherable response:

    GET /Bootie/Bootie-1-Jan-2014 HTTP/1.1
    TE: deflate,gzip;q=0.3
    Connection: TE, close
    Host: images.shutterslut.com
    User-Agent: LWP::Simple/6.00 libwww-perl/6.05

    Ask for this, you get a reasonable response:

    GET /Bootie/Bootie-1-Jan-2014 HTTP/1.1
    User-Agent: Wget/1.14 (darwin10.8.0)
    Accept: */*
    Host: images.shutterslut.com
    Connection: Keep-Alive

    However, I tried to narrow it down.

    When _perl_ asks for this, it fails:

    GET /Bootie/Bootie-1-Jan-2014 HTTP/1.1
    Connection: close
    Host: images.shutterslut.com
    User-Agent: LWP::Simple/6.00 libwww-perl/6.04

    (That is the request LWP sends once the TE header is switched off, like so:)

    /usr/bin/perl -e "@LWP::Protocol::http::EXTRA_SOCK_OPTS=(SendTE=>0); $cmd"

    When _curl_ asks for EXACTLY THE SAME THING, it succeeds every time.

    curl -v -D headers.txt -H 'Accept:' -H 'Connection: close' -A 'LWP::Simple/6.00 libwww-perl/6.04' http://images.shutterslut.com/Bootie/Bootie-1-Jan-2014 -o/dev/null

    Why? Why? Why?
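If an identical request works from curl, one thing worth checking is what LWP itself says about the response. LWP records client-side failures in special response headers such as `X-Died` and `Client-Aborted`, and `parse_head` controls whether it runs the HTML head parser over the body while downloading. The sketch below is a guess at a useful diagnostic, not a confirmed explanation: `abort_markers` and `fetch_report` are hypothetical names, and whether head parsing is involved here is an assumption.

```perl
use strict;
use warnings;

# Collect the headers LWP adds on the client side when something goes
# wrong mid-download. Works on anything with a header() method,
# such as an HTTP::Response.
sub abort_markers {
    my ($res) = @_;
    my %found;
    for my $name ('Client-Aborted', 'X-Died', 'Client-Warning') {
        my $value = $res->header($name);
        $found{$name} = $value if defined $value;
    }
    return \%found;
}

# Fetch a URL with head parsing on or off and summarise the result.
# Note this counts raw bytes (content), not decoded_content.
sub fetch_report {
    my ($url, $parse_head) = @_;
    require LWP::UserAgent;
    my $ua = LWP::UserAgent->new;
    $ua->parse_head($parse_head);
    my $res = $ua->get($url);
    return {
        status  => $res->status_line,
        bytes   => length($res->content),
        markers => abort_markers($res),
    };
}

# Usage (needs network access), comparing the two settings:
#   my $with    = fetch_report($url, 1);
#   my $without = fetch_report($url, 0);
```

If the byte count only comes up short with head parsing on, and an `X-Died` message appears, that would point at the parser on the client rather than at the server.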

  5. With some testing, it seems that: (a) LWP is definitely truncating the load of this site; (b) at the same time, tcpflow shows the full contents of the page being sent as the response (and the truncation doesn't appear to be at a packet boundary); and (c) LWP works as expected on pages on other sites. curl also fetches the page fine. One weird thing I noticed: it always seems to die in the middle of the tag. Maybe that's the problem. I don't know if it is possible or desirable to remove it from the site. The best workaround I can suggest, if you're really just doing simple GETs, is to shell out to 'curl'.
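For what it's worth, shelling out to curl from Perl could look roughly like this. It is a minimal sketch that assumes a `curl` binary on the PATH; `curl_command` and `fetch_with_curl` are made-up names, and the flags are just one plausible choice.

```perl
use strict;
use warnings;

# Build the argument list for curl. Passing a list (rather than one
# string) avoids the shell, so the URL needs no quoting.
sub curl_command {
    my ($url) = @_;
    return ('curl', '--silent', '--show-error', '--fail', '--location', $url);
}

# Run curl and return the response body, dying if curl fails.
sub fetch_with_curl {
    my ($url) = @_;
    my @cmd = curl_command($url);
    open(my $fh, '-|', @cmd) or die "cannot run curl: $!\n";
    binmode $fh;
    my $body = do { local $/; <$fh> };   # slurp everything curl prints
    close($fh) or die "curl exited with status " . ($? >> 8) . "\n";
    return $body;
}

# Usage (needs network access):
#   print length(fetch_with_curl($url)), "\n";
```

This only covers plain GETs; anything involving cookies, POST bodies or custom headers would need more flags.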