We train an artificial neural network by showing it millions of training examples and gradually adjusting the network parameters until it gives the classifications we want. The network typically consists of 10-30 stacked layers of artificial neurons. Each image is fed into the input layer, which then talks to the next layer, until eventually the "output" layer is reached. The network's "answer" comes from this final output layer.
So then they reach inside to one of the layers and spin the knob randomly to fuck it up. Lower layers are edges and curves. Higher layers are faces, eyes and shoggoth ovipositors.
So here's one surprise: neural networks that were trained to discriminate between different kinds of images have quite a bit of the information needed to generate images too.
That part, they kinda gloss over...
But the best part is not when they just glitch an image -- which is a fun kind of embossing at one end, and the "extra eyes" filter at the other -- but is when they take a net trained on some particular set of objects and feed it static, then zoom in, and feed the output back in repeatedly. That's when you converge upon the platonic ideal of those objects, which -- it turns out -- tend to be Giger nightmare landscapes. Who knew. (I knew.)
PS: Inception sucked.