The Mac OS X 10.9.1 /usr/bin/perl truncates LWP fetches at the first packet, or something like that. Check this shit out:
$ cmd='use LWP::UserAgent;
print $LWP::UserAgent::VERSION, " $] \n";
for (my $i = 0; $i < 10; $i++) {
print length(LWP::UserAgent->new->get
("http://images.shutterslut.com/Bootie/Bootie-1-Jan-2014")
->decoded_content), "\n"; }'
$ /usr/bin/perl -e "$cmd"
6.05 5.016002
504
507
500
501
500
503
503
502
927
504
$ /opt/local/bin/perl -e "$cmd"
6.05 5.012004
18948
18949
18948
18949
18949
18949
18949
18948
18949
18949
How can I make /usr/bin/perl work properly again?
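A hypothetical standalone version of the test above (not from the original post) makes the truncation explicit. It fetches the URL several times and compares the raw body length with the server's Content-Length header. It assumes the server sends Content-Length (a chunked response won't have one). LWP::UserAgent is loaded only when a URL is given, so the looks_truncated helper can be exercised on its own.

```perl
#!/usr/bin/perl
# Hypothetical diagnostic: fetch a URL several times and compare the bytes
# received with the length the server declared in Content-Length.
use strict;
use warnings;

# True when a numeric Content-Length was declared and the body is shorter.
# Chunked responses carry no Content-Length, so they can't be judged here.
sub looks_truncated {
    my ($declared, $received) = @_;
    return 0 unless defined $declared && $declared =~ /^\d+$/;
    return $received < $declared ? 1 : 0;
}

sub main {
    my ($url, $tries) = @_;
    $tries ||= 10;
    require LWP::UserAgent;    # loaded lazily; looks_truncated() needs no LWP
    printf "LWP %s, perl %s\n", LWP::UserAgent->VERSION, $];
    my $ua = LWP::UserAgent->new;
    for my $i (1 .. $tries) {
        my $res = $ua->get($url);
        # ->content is the raw (still-encoded) body, which is what
        # Content-Length counts; ->decoded_content can differ in length.
        my $got      = length $res->content;
        my $declared = $res->header('Content-Length');
        printf "%2d: status %s, received %d, declared %s%s\n",
            $i, $res->code, $got,
            defined $declared ? $declared : '(none)',
            looks_truncated($declared, $got) ? '  <-- TRUNCATED' : '';
    }
}

# Only touch the network when a URL is passed on the command line.
main(@ARGV) if @ARGV;
```

Running it once under /usr/bin/perl and once under /opt/local/bin/perl with the same URL should show whether the short lengths line up with a longer declared Content-Length.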
Hey Maciej Stachowiak. You know the answer?
Every freaking release, Apple breaks the system perl. Give up, use perlbrew (or MacPorts, as you already are), and stop dealing with Apple cocking it up every time.
Obviously I already have a second, working Perl installed, and obviously I don't consider that a complete solution or I wouldn't have asked.
But hey, thanks for saying nothing of use.
Enjoy wasting time on broken shit.
You should also sarcastically tell carpenters to enjoy working with wood, or electricians to have fun with all those wires.
My favorite part of Lazyweb posts is the catfights!
You just described my entire livelihood.
Upgrade the HTML::Parser module from CPAN.
Same problem as here:
http://stackoverflow.com/questions/14740365/why-cant-lwpuseragent-get-this-site-entirely
That fixed it. Thanks!
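Because the fix turned out to be a newer HTML::Parser, it helps to see which module versions a given perl actually loads. The sketch below works the same under /usr/bin/perl or /opt/local/bin/perl. module_version is a hypothetical helper name, and it simply reports "not installed" for anything that fails to load.

```perl
#!/usr/bin/perl
# Report the versions of the modules involved, for whichever perl runs this.
use strict;
use warnings;

# Load a module by name and return its version, or undef if it can't be loaded.
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return unless eval { require $file; 1 };
    return $module->VERSION;    # UNIVERSAL::VERSION; undef if the module has none
}

print "perl $]\n";
for my $m (qw(LWP::UserAgent HTML::Parser HTML::HeadParser)) {
    my $v = module_version($m);
    printf "%-16s %s\n", $m, defined $v ? $v : 'not installed';
}
```

If upgrading isn't an option, LWP::UserAgent also accepts parse_head => 0 in its constructor, which turns off its parsing of the HTML <head> section (done via HTML::HeadParser, part of the HTML::Parser distribution). Whether that sidesteps this particular bug wasn't tested in this thread, so treat it as an untested guess.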
And for the curious, here's the underlying bug
Damn you, Apple, for not being directly at fault!
I have LWP::UserAgent 6.00 on Perl 5.10 (this is Mac OS X 10.6), and the same problem happens: different responses, none of them right. Looking at Wireshark, it appears that this specific website's response to the LWP request is haywire.
Ask for this (as LWP does) and you get an indecipherable response:
GET /Bootie/Bootie-1-Jan-2014 HTTP/1.1
TE: deflate,gzip;q=0.3
Connection: TE, close
Host: images.shutterslut.com
User-Agent: LWP::Simple/6.00 libwww-perl/6.05
Ask for this and you get a reasonable response:
GET /Bootie/Bootie-1-Jan-2014 HTTP/1.1
User-Agent: Wget/1.14 (darwin10.8.0)
Accept: */*
Host: images.shutterslut.com
Connection: Keep-Alive
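To take LWP out of the picture entirely, one could send the exact request bytes over a plain socket and count what comes back. This is a hypothetical sketch using only the core IO::Socket::INET module. build_request and fetch_raw are illustrative names, and it assumes plain HTTP on port 80 with the server closing the connection when done (Connection: close). The headers are the ones from the LWP request above.

```perl
#!/usr/bin/perl
# Hypothetical helper: send a hand-written HTTP/1.1 request over a plain
# socket and report how many bytes come back, bypassing LWP entirely.
use strict;
use warnings;
use IO::Socket::INET;

# Build the request text. HTTP needs CRLF line endings and a blank line
# after the last header.
sub build_request {
    my ($path, @headers) = @_;    # @headers: list of [Name => value] pairs
    my $req = "GET $path HTTP/1.1\r\n";
    $req .= "$_->[0]: $_->[1]\r\n" for @headers;
    return $req . "\r\n";
}

# Send the request and read until the server closes the connection.
sub fetch_raw {
    my ($host, $request) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host, PeerPort => 80, Proto => 'tcp', Timeout => 10,
    ) or die "cannot connect to $host: $@";
    print {$sock} $request;       # IO::Socket handles autoflush by default
    local $/;                     # slurp mode: read everything until EOF
    my $raw = <$sock>;
    close $sock;
    return defined $raw ? $raw : '';
}

# Usage: perl raw-get.pl HOST PATH
if (@ARGV == 2) {
    my ($host, $path) = @ARGV;
    my $req = build_request($path,
        [ 'TE'         => 'deflate,gzip;q=0.3' ],
        [ 'Connection' => 'TE, close' ],
        [ 'Host'       => $host ],
        [ 'User-Agent' => 'LWP::Simple/6.00 libwww-perl/6.05' ],
    );
    printf "%d bytes received (headers + body)\n", length fetch_raw($host, $req);
}
```

Swapping the header list for the Wget one gives a like-for-like comparison of what the server actually sends back for each request.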
However, when I tried to narrow it down, the TE header turned out not to be the culprit.
When _perl_ asks for this, it fails:
GET /Bootie/Bootie-1-Jan-2014 HTTP/1.1
Connection: close
Host: images.shutterslut.com
User-Agent: LWP::Simple/6.00 libwww-perl/6.04
/usr/bin/perl -e "@LWP::Protocol::http::EXTRA_SOCK_OPTS=(SendTE=>0); $cmd"
When _curl_ asks for EXACTLY THE SAME THING, it succeeds every time.
curl -v -D headers.txt -H 'Accept:' -H 'Connection: close' -A 'LWP::Simple/6.00 libwww-perl/6.04' http://images.shutterslut.com/Bootie/Bootie-1-Jan-2014 -o/dev/null
Why? Why? Why?
With some testing, it seems that: (a) LWP is definitely truncating the load of this site; (b) at the same time, tcpflow shows the full contents of the page being sent as the response (and the truncation doesn't appear to be at a packet boundary); and (c) LWP works as expected on pages from other sites. Also, curl fetches the page fine. One weird thing I noticed: it always seems to die in the middle of the tag. Maybe that's the problem. I don't know whether it is possible or desirable to remove it from the site. The best workaround I can suggest, if you're really just doing simple GETs, is to shell out to 'curl'.
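A minimal sketch of the curl workaround, assuming curl is on the PATH. curl_argv and fetch_with_curl are hypothetical names. The list form of open runs curl directly without a shell, so URLs containing characters like & or ; need no extra quoting.

```perl
#!/usr/bin/perl
# Hypothetical wrapper for the "shell out to curl" workaround.
use strict;
use warnings;

# Argument vector for curl: -s hides the progress meter, -S still prints
# errors, -f makes HTTP 4xx/5xx exit non-zero, -L follows redirects.
sub curl_argv {
    my ($url) = @_;
    return ('curl', '-sSfL', $url);
}

# Run curl with no shell involved and return the body as raw bytes.
sub fetch_with_curl {
    my ($url) = @_;
    open(my $fh, '-|', curl_argv($url)) or die "cannot run curl: $!";
    binmode $fh;
    my $body = do { local $/; <$fh> };
    close $fh or die "curl failed for $url (exit status " . ($? >> 8) . ")\n";
    return defined $body ? $body : '';
}

# Usage: perl curl-fetch.pl URL [URL ...]  (prints each body's length)
if (@ARGV) {
    print length(fetch_with_curl($_)), "\n" for @ARGV;
}
```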