
It turns out that if you divide the download into pieces and load them all in parallel, each of those loads gets rate limited, but not in aggregate, so spanking their servers with 30 parallel connections makes it load 30× faster.
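To make the trick concrete, here is a minimal Python sketch of the same idea -- not youtubedown's actual code. It splits the file into byte ranges and fetches them on separate connections, so each connection gets its own throttle bucket. The URL, the chunk count, and the use of HTTP Range headers are all assumptions here.

import concurrent.futures
import requests

VIDEO_URL = "https://example.googlevideo.com/videoplayback"  # hypothetical media URL
CONNECTIONS = 30

def fetch_range(start, end):
    # Each range goes out on its own connection, so it gets its own rate limit.
    resp = requests.get(VIDEO_URL, headers={"Range": f"bytes={start}-{end}"}, timeout=60)
    resp.raise_for_status()
    return start, resp.content

def parallel_download(total_size, out_path="video.mp4"):
    step = total_size // CONNECTIONS
    ranges = [(i * step, total_size - 1 if i == CONNECTIONS - 1 else (i + 1) * step - 1)
              for i in range(CONNECTIONS)]
    with open(out_path, "wb") as out, \
            concurrent.futures.ThreadPoolExecutor(max_workers=CONNECTIONS) as pool:
        for start, data in pool.map(lambda r: fetch_range(*r), ranges):
            out.seek(start)   # each chunk lands at its own offset in the file
            out.write(data)

# The total size would come from a HEAD request, e.g.:
#   parallel_download(int(requests.head(VIDEO_URL).headers["Content-Length"]))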
Let's see how long this lasts.
(It's possible that rate limiting is related to your IP address; some people report getting no limits at all.)
I did not want to write this code. Everything about the modern tech ecosystem is just... so... exhausting.
I recently started seeing this when using youtube-dl. A fork project, yt-dlp, seems to have worked around this problem.
It looks like YouTube has added extra challenges to the download process that you have to jump through if you want full bandwidth; see the links at https://github.com/ytdl-org/youtube-dl/issues/29326#issuecomment-974052706
> I did not want to write this code. Everything about the modern tech ecosystem is just... so... exhausting.
I keep having to update the HTTP user agent that my feed reader uses when fetching RSS/Atom feeds. A whole bunch of sites (and their caching layers) start rejecting HTTP requests if you're using something that doesn't look like a reasonably modern web browser. I've given up; I just paste in my Firefox UA every time I start seeing lots of failures, and then magically things work again.
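For what it's worth, the workaround is a one-liner in most HTTP libraries. A minimal Python sketch, where the feed URL and UA string are placeholders and the UA has to be bumped whenever browsers rev their versions:

import urllib.request

FEED_URL = "https://example.com/feed.xml"  # hypothetical feed
BROWSER_UA = ("Mozilla/5.0 (X11; Linux x86_64; rv:109.0) "
              "Gecko/20100101 Firefox/115.0")  # paste your real browser's UA here

# Present a browser-looking User-Agent so the site's caching layer lets us through.
req = urllib.request.Request(FEED_URL, headers={"User-Agent": BROWSER_UA})
with urllib.request.urlopen(req, timeout=30) as resp:
    feed_xml = resp.read()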
UA sniffing - bad idea then, worse idea now. I know, let's do more of that!
C.
Speaking of, I am amused that jwz.org rejects requests based on UA:
$ torify wget -q -S https://www.jwz.org/hacks/youtubedown 2>&1 | head -n1
HTTP/1.1 403 Forbidden
$ torify wget -q -S -U shibboleth https://www.jwz.org/hacks/youtubedown 2>&1 | head -n1
HTTP/1.1 200 OK
You would not believe how many children decide that my web site is a good test case for Baby's First Web Crawler.
If you don't know how to change the UA, this is too much 'puter for you.
I was going to say, it's probably understandable in jwz's case. Even without being a test case for that, if it did no checking it would probably be auto-downloaded ten million times a day by people (and cron jobs) who want to make sure they always have the latest version running, just in case YT has changed something since 5 hours ago.
I can't imagine how many of the commercial "download from youtube" services are just a CGI wrapper around youtubedown. Probably most :P
C.
Modern tech, making life increasingly difficult
I saw this in youtube-dl as well. BTW, having worked inside the Goo, this is more or less how all their distributed systems work: you add throughput by adding more parallel independent readers. Since leaving I've gone back to working with normal machines (i.e., a single server with lots of fast bits) and it's so much nicer.
What's worse is that software architecture outside the Gooballs now insists on operating lots of shitty horizontal scaling rather than realising that an average 2U server now goes vertically a long, long way, which means you don't need a hundred production support - oops, sorry, SRE - minions for every system. Sigh.
If you start getting 429 errors when making a lot of connections, maybe you could use https://metacpan.org/release/SORTIZ/JSP-1.02/view/lib/JSP.pm to interpret the n-parameter challenge in the media URLs?
You'd have to extend the existing sig processing to find the function (today it's zha()) in the player JS that transforms the n parameter of a media URL into a value that unthrottles the download, then evaluate it. Examples are linked from the post @Hales linked above.
Or you can POST with Android client data, but you may not get the full set of resolutions.
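The same flow sketched in Python rather than Perl's JSP.pm, purely to show the shape of it: pull the transform function out of the player JS, run it through a JS engine, and substitute the result back into the media URL's n parameter. The function name, the regex, and the js2py engine are all assumptions; the real player JS is minified, gets renamed regularly, and needs proper brace matching rather than a lazy regex.

import re
import urllib.parse

import js2py  # any embeddable JS engine would do; js2py is just one option

def unthrottle(media_url, player_js):
    # Locate the n-transform function in the player source. "zha" is only
    # today's name (per the comment above); in practice you have to discover
    # the current name first, and a non-greedy regex will not survive nested
    # braces in the real minified player.
    match = re.search(r"zha\s*=\s*(function\(a\)\{.*?\})", player_js, re.S)
    if not match:
        raise RuntimeError("n-transform function not found; name changed again?")

    ctx = js2py.EvalJs()
    ctx.execute("var transform = " + match.group(1) + ";")

    parts = urllib.parse.urlparse(media_url)
    query = urllib.parse.parse_qs(parts.query)
    query["n"] = [str(ctx.transform(query["n"][0]))]  # run the JS transform on n

    return urllib.parse.urlunparse(
        parts._replace(query=urllib.parse.urlencode(query, doseq=True)))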
youtube-dl was being affected by the throttling. yt-dlp generally is not, though sometimes it is. I'll add this and see how consistent it is.
You guys, I'm starting to think that having the entire internet controlled by two advertising companies was maybe not a good idea.
Wonder when they'll finally decide they're fed up with adblockers...
The Weyland-Yutani & Tyrell corporations frown on this suggestion
Google is just an advertising company, or at least mostly. Facebook is that but also seems to be something a whole lot worse.
[I am not supporting Google here: thing 2 being hugely worse than thing 1 does not make thing 1 OK.]
What could possibly go wrong?
The whole GAFAM gang controls maybe 80% of this adweb, while more than 60% of the traffic consists of H.264 AVC, a 20+-year-old codec, and some 20% of 25+-year-old JPEG.
Is that innovation?
Frankly, the future was better in the past.
To be fair, at least G is actively pushing more efficient video codecs such as VP9 and AV1, but thanks to "this old smart TV that can't even get new firmware anymore" it will likely take ages until H.264 can be fully dropped. And encoding every video with three codecs is expensive.
You're doing $DEITY's work, thank you!