First it loads a low resolution version of the image, then fills that in with progressively higher resolutions; and in each of those images, the YUV channels come in one at a time. Y is "luma", or brightness, which is a grayscale image; and then U and V are "chroma", which encode RGB using two numbers instead of three.
The "U" axis runs roughly from blue to yellow, and the "V" axis roughly from red to green. It's a strange encoding.
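To make that concrete, here's a sketch of one common RGB-to-YUV conversion (the BT.601 coefficients; real codecs differ in scaling, offsets, and bit depth, so treat this as illustrative rather than what any particular decoder does):

```python
def rgb_to_yuv(r, g, b):
    """Convert RGB (each 0.0-1.0) to YUV using BT.601-style coefficients."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luma: a weighted grayscale brightness
    u = 0.492 * (b - y)                    # chroma: roughly blue-vs-yellow
    v = 0.877 * (r - y)                    # chroma: roughly red-vs-green
    return y, u, v
```

Note that for any gray pixel (r == g == b), both chroma values come out zero, which is why the luma channel alone gives you a perfectly watchable black-and-white image.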
All of this came out of the development of color television, where for backward compatibility with black and white TVs back in like 1938, they had to leave the monochrome signal alone and find a way to tack the color information onto a subcarrier that older displays would ignore. NTSC and PAL are ridiculous kludges intended to avoid a flag-day where everyone would have needed to buy new TVs -- after all there were thousands of them deployed already! And we've been dealing with the fallout of that for nearly a century.
"NTSC" stands for "Never Twice the Same Color".
But at least it was a good-faith attempt to encode video, unlike HDMI, which is first a restraint and only secondarily a means of moving images from point A to B. A sensible design for video transport would have the design priority of "try really hard to get bits on the screen in the face of unreliable connections". But HDMI's prime directive is, "Under no circumstances display something unpermitted; all other considerations secondary; crew expendable."
But I digress. Here's a video.
(I considered rendering this out as an anim GIF, which would then have been auto-converted to an MP4 by my blog image resizer, but that would have been just too many layers for good taste.)
I grabbed an image taken in our photo booth on Saturday (chosen for its explicatory color palette, obviously) and slowed it way down. It starts with the Y (luminance) channel, then U (the yellow-ish channel) comes in at about 0:08, and V (the red-ish channel) comes in at about 0:10. The complete low-rez image is there by around 0:12, and then you see a verrrrry slow top-to-bottom pass of increasing resolution (you may have to squint to see it; watch the chunky aliasing on the black and white stripes on the dazzle pattern).
When displaying the original bandwidth-throttled image, Firefox, Safari and Opera all display it pretty much as you see here, but oddly, Chrome does not: it displays the first frame, and then waits for the entire image to arrive before displaying anything else.
One of the things that we did in Netscape 1.0 (and I think we were the first to do it?) was to do this kind of progressive display with interlaced GIFs. When people were browsing the web on 14.4kbps modems, that mattered. In the early betas, we would display the scan lines as they came in, which gave it a Venetian blind kind of effect: first you'd see single-pixel slices of the image come in, every 8 or 16 lines, and then more would fill in. By v1.0 (I think) we had changed that to interpolate the lines that hadn't arrived yet, so it looked more like "blocky, low resolution image gets less blurry with time". It looked a lot better. But since we were running this code on Pentiums, which had literally dozens of megahertz, managing to re-write the whole image several times a second was kind of a big deal.
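For the curious, the order those scan lines arrive in can be sketched like this (a toy helper of my own naming, but the four passes and their offsets/strides are straight out of the GIF89a spec): every 8th row first, then the rows halfway between, and so on until the image is complete.

```python
def gif_interlace_order(height):
    """Return the row indices of an interlaced GIF in arrival order."""
    order = []
    # GIF89a's four interlace passes: (starting row, row stride)
    for start, step in ((0, 8), (4, 8), (2, 4), (1, 2)):
        order.extend(range(start, height, step))
    return order
```

The "Venetian blind" effect falls out of drawing each row at its true position as it arrives; the later, better-looking approach is just filling the not-yet-arrived rows with a copy of the nearest row that has.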