Principal Component Analysis and Fashion

With a bunch of components like these, we can reduce an image from, e.g., 60,000 data points (pixel values) to just a handful of numbers. [...] Let's recreate this dress from its components.

The data for the dress now looks like this: [-17541.81, -12749.33, -3766.29, 2005.28, 4193.08, 6832.55, -6704.90, -2135.51, 1112.27, 7627.80].
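
As a rough illustration of what "recreating an image from its components" means, here is a minimal NumPy sketch. It assumes the usual PCA recipe (subtract the mean image, project onto the top principal components, then add the weighted components back). The image data is synthetic and the sizes (50 images, 600 pixels, 10 components) are made up for the example; the names `encode` and `decode` are hypothetical and not taken from the original article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for real photos: 50 "images" of 600 pixels each (hypothetical sizes).
n_images, n_pixels, k = 50, 600, 10

# Give the synthetic data low-rank structure so a few components describe it well.
latent = rng.normal(size=(n_images, 5))
mixing = rng.normal(size=(5, n_pixels))
images = latent @ mixing + 0.01 * rng.normal(size=(n_images, n_pixels))

# PCA via SVD of the mean-centered data; the rows of vt are the principal components.
mean_image = images.mean(axis=0)
_, _, vt = np.linalg.svd(images - mean_image, full_matrices=False)
components = vt[:k]  # shape (k, n_pixels)


def encode(image):
    """Reduce one image to k numbers: its weights on the top k components."""
    return components @ (image - mean_image)


def decode(weights):
    """Rebuild an approximate image from the k weights."""
    return mean_image + weights @ components


weights = encode(images[0])
reconstruction = decode(weights)
relative_error = np.linalg.norm(images[0] - reconstruction) / np.linalg.norm(images[0])

print(weights.shape)         # k numbers instead of n_pixels values
print(reconstruction.shape)  # back to a full-size image
print(relative_error)
```

Note that `decode` needs `mean_image` and `components` as well as the weights, so the short list of numbers is only meaningful to someone who already holds those arrays.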


5 Responses:

  1. Karellen says:

    "So, if you have a million pictures you need to store, you can save a whole lot of space by saving just the component values instead of the values of every pixel of every dress."

    Yeah, I don't buy that. I don't see how you can recreate a picture if you don't have the values of every pixel of every dress in your training set. I mean, sure, you can use the component values as a shorthand with someone else who already has the entire training set, but you have to send them the whole training set.

    Also, while the author writes an example involving 10 component values, the resulting image is very poor, even for one already in the training set. All the other examples use the entire training set, which is 807 images in size, so would require an 807-component value. Given that the component values in the example range from -17541.81 to 7627.80, a 16-bit value is not going to be enough to hold each component value. Which means that's 3228 bytes needed for the components alone (excluding the training set data). I'm wondering how good a 3228 byte jpeg would be in comparison - which wouldn't even be dress specific or require pre-sharing the training set.
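
The storage arithmetic above can be checked with a short sketch. It assumes one value per training image (807) and compares 32-bit floats with two 16-bit options. Whether 16 bits is enough depends on how much precision the reconstruction needs, which the post does not say, so the sketch only reports the rounding error for the largest value quoted.

```python
import numpy as np

n_components = 807  # one component value per training image, as in the comment

# Storage for the component values alone at two widths.
bytes_float32 = n_components * np.dtype(np.float32).itemsize
bytes_float16 = n_components * np.dtype(np.float16).itemsize

# The largest-magnitude component value quoted in the post.
value = -17541.81

# A 16-bit float can hold the magnitude but loses the low-order digits.
as_float16 = float(np.float16(value))
float16_error = abs(as_float16 - value)

# A signed 16-bit integer covers -32768..32767, so the rounded value fits.
as_int16 = int(np.int16(round(value)))
int16_error = abs(as_int16 - value)

print(bytes_float32, bytes_float16)
print(as_float16, float16_error)
print(as_int16, int16_error)
```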

    Yes, using the tech to predict whether you'll like or not like a dress, and the ability to "create new dresses", are interesting novel uses. I think focussing on exploring those applications in more depth would be an improvement.

    • jwz says:

      I agree that it doesn't look particularly useful, but I was kind of surprised it worked even as well as it did.

      • latemodel says:

        I really don't know why the author talks about compressing and storing images, because there are better ways to do that. But showing the reconstructions and the components is useful, because it demonstrates that the very compact PCA representation is preserving relevant information about the images. This basic technique (extract features, then classify) is the backbone of modern computer vision/machine learning/pattern recognition systems. This is a very simple example (PCA + logistic regression is about as basic as it gets), but it's a very reasonable place to start.
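
A minimal sketch of the "extract features, then classify" pattern, assuming synthetic data and NumPy only. The logistic regression here is a plain gradient-descent stand-in, not the original author's code, and the sizes, learning rate, and the `extract_features` and `fit_logistic` names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in: 200 "images" of 300 pixels, two classes (liked / not liked)
# that differ along one hidden direction in pixel space.
n, d, k = 200, 300, 5
labels = rng.integers(0, 2, size=n)
direction = rng.normal(size=d)
X = rng.normal(size=(n, d)) + np.outer(2.0 * labels - 1.0, direction)

X_train, y_train = X[:150], labels[:150]
X_test, y_test = X[150:], labels[150:]

# Step 1: PCA fitted on the training images only.
mean = X_train.mean(axis=0)
_, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
components = vt[:k]
scale = ((X_train - mean) @ components.T).std(axis=0)


def extract_features(images):
    """Project images onto the top k components and standardize each feature."""
    return ((images - mean) @ components.T) / scale


def fit_logistic(features, targets, lr=0.1, steps=500):
    """Logistic regression trained with plain batch gradient descent."""
    w = np.zeros(features.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        w -= lr * features.T @ (p - targets) / len(targets)
        b -= lr * np.mean(p - targets)
    return w, b


# Step 2: classify using the k PCA features instead of the raw pixels.
w, b = fit_logistic(extract_features(X_train), y_train)
predicted = (extract_features(X_test) @ w + b > 0).astype(int)
accuracy = float(np.mean(predicted == y_test))
print(accuracy)
```

The classifier only ever sees `k` numbers per image, and accuracy is measured on images held out of both the PCA fit and the training step.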

        I suspect that there are two reasons that it seems to work so well. The first is that some of the images are from the training set; we call that cheating. The pattern in the image you linked probably occupies several of the first 70 principal components. But the technique works relatively well on unseen images as well, so it's legit.

        The second reason is that, despite the author's statement that there's no "standard pose", the images are pretty standardized. The photos are all the same size, the models are all in the same place in the image, they're the same height, they have the same aspect ratio, and they're more or less facing the same way. That sort of scaling and shifting is exactly what you have to do to get eigenfaces and similar things to work. Things would probably work even better if she switched to a different color space (Lab) and did some contrast normalization.

    • crowding says:

      It's a didactic illustration of what PCA does, not a complete proposal for a compression scheme. PCA is a decent choice for constructing part of a compression scheme, though: specifically, the part that calls for a correlation-removing transform.

      Turning it into a complete compression scheme would take a few more additions, such as:

      * Discretize the coefficients (with the resolution of discretization based on some notion of perceptual just-noticeable difference)
      * Write the coefficients with Huffman coding
      * Encode images blockwise rather than all at once

      which gets you pretty much to something at the JPEG level.
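
The first two additions in the list above can be sketched with the standard library alone. This is a toy under stated assumptions: the quantization step of 500 is an arbitrary placeholder rather than a perceptual threshold, the coefficients are the ten values quoted in the post, and blockwise encoding is left out. The helper names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count

# The ten component values quoted in the post.
coefficients = [-17541.81, -12749.33, -3766.29, 2005.28, 4193.08,
                6832.55, -6704.90, -2135.51, 1112.27, 7627.80]

STEP = 500.0  # hypothetical quantization step, not a perceptual threshold


def quantize(values, step):
    """Map each coefficient to the index of its nearest multiple of step."""
    return [round(v / step) for v in values]


def dequantize(symbols, step):
    """Recover approximate coefficients from the quantized indices."""
    return [s * step for s in symbols]


def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so the dicts are never compared
    heap = [(n, next(tiebreak), {sym: ""}) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


def encode(symbols, code):
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    inverse = {bitstring: sym for sym, bitstring in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


symbols = quantize(coefficients, STEP)
code = huffman_code(symbols)
bits = encode(symbols, code)
restored = dequantize(decode(bits, code), STEP)
max_error = max(abs(a - b) for a, b in zip(coefficients, restored))

print(len(bits), "bits for", len(coefficients), "coefficients")
print(max_error)
```

With only ten distinct symbols the Huffman table gives no real saving, and the table itself would have to be stored or shared. The gain appears when many images are coded and small quantized values repeat often.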

  2. margaret says:

    a bunch of old people sitting around in the retirement home - every once in a while one calls out a number and they start laughing. the visitor says "wtf?". it's explained that these are retired comedians and they don't have the time or energy to say the whole joke so they gave each a number.
