Datasaurus Dozen

Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing

Anscombe's Quartet is a set of four datasets, where each produces the same summary statistics (mean, standard deviation, and correlation), which could lead one to believe the datasets are quite similar. [...]
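The claim above is easy to check. Below is a small sketch that computes the named statistics for Anscombe's published quartet values and rounds each to two decimals. It assumes that "standard deviation" means the sample (n − 1) version and "correlation" means Pearson's r. The helper name `summary` is illustrative only.

```python
import math

# Anscombe's quartet: datasets I-III share the same x values; IV has its own.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summary(xs, ys):
    """Mean x, mean y, sample std of x and y, Pearson r; rounded to 2 decimals."""
    n = len(xs)
    mx, my = math.fsum(xs) / n, math.fsum(ys) / n
    sxx = math.fsum((x - mx) ** 2 for x in xs)
    syy = math.fsum((y - my) ** 2 for y in ys)
    sxy = math.fsum((x - mx) * (y - my) for x, y in zip(xs, ys))
    stats = (mx, my, math.sqrt(sxx / (n - 1)), math.sqrt(syy / (n - 1)),
             sxy / math.sqrt(sxx * syy))
    return tuple(round(v, 2) for v in stats)


for name, (xs, ys) in QUARTET.items():
    print(name, summary(xs, ys))  # each line: (9.0, 7.5, 3.32, 2.03, 0.82)
```

All four rows print identical numbers, even though scatter plots of the four datasets look nothing alike.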

Recently, Alberto Cairo created the Datasaurus dataset, which urges people to "never trust summary statistics alone; always visualize your data", because although the data exhibits normal-seeming statistics, plotting it reveals a picture of a dinosaur. Inspired by Anscombe's Quartet and the Datasaurus, we present the Datasaurus Dozen.
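The subtitle names simulated annealing as the method. Below is a minimal sketch of that idea, and it is not the authors' code. On each step it nudges one random point by a small Gaussian step. It keeps the move only if the means, standard deviations and correlation are unchanged at two decimal places, and it prefers moves that bring the point closer to a target shape. Occasionally it also accepts a worse move, with a probability (the "temperature") that cools linearly. Several details are hypothetical choices made for illustration: the circle target, the step size, the temperature schedule, and the names `anneal` and `dist_to_circle`.

```python
import math
import random


def summary(xs, ys):
    """Mean x, mean y, sample std of x and y, Pearson r; rounded to 2 decimals."""
    n = len(xs)
    mx, my = math.fsum(xs) / n, math.fsum(ys) / n
    sxx = math.fsum((x - mx) ** 2 for x in xs)
    syy = math.fsum((y - my) ** 2 for y in ys)
    sxy = math.fsum((x - mx) * (y - my) for x, y in zip(xs, ys))
    stats = (mx, my, math.sqrt(sxx / (n - 1)), math.sqrt(syy / (n - 1)),
             sxy / math.sqrt(sxx * syy))
    return tuple(round(v, 2) for v in stats)


def dist_to_circle(x, y, cx=50.0, cy=50.0, radius=30.0):
    """Hypothetical target shape: distance from (x, y) to a circle's outline."""
    return abs(math.hypot(x - cx, y - cy) - radius)


def anneal(xs, ys, iterations=20000, step=0.1, t_start=0.4, t_end=0.01, seed=0):
    """Nudge points toward the target shape while keeping summary() fixed."""
    rng = random.Random(seed)
    xs, ys = list(xs), list(ys)  # work on copies; inputs stay untouched
    target = summary(xs, ys)
    for i in range(iterations):
        # Temperature cools linearly; it is the chance of taking a worse move.
        temp = t_start + (t_end - t_start) * i / iterations
        k = rng.randrange(len(xs))
        nx, ny = xs[k] + rng.gauss(0, step), ys[k] + rng.gauss(0, step)
        better = dist_to_circle(nx, ny) < dist_to_circle(xs[k], ys[k])
        if not (better or rng.random() < temp):
            continue
        old = (xs[k], ys[k])
        xs[k], ys[k] = nx, ny
        if summary(xs, ys) != target:
            xs[k], ys[k] = old  # undo: a rounded statistic changed


    return xs, ys


rng = random.Random(1)
xs0 = [rng.gauss(50, 15) for _ in range(50)]
ys0 = [rng.gauss(50, 15) for _ in range(50)]
xs1, ys1 = anneal(xs0, ys0)
print(summary(xs0, ys0) == summary(xs1, ys1))  # True: every kept move passed the check
```

Checking the statistics after every move, instead of correcting them at the end, guarantees by construction that the rounded statistics never change. The temperature lets the points escape poor local arrangements early on, while the search still settles toward the shape as it cools.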



2 Responses:

  1. David Thompson says:

    This kind of technique looks like a great way to generate falsified data that fits your hypothesis but still looks just random enough when visualised.

  2. James says:

    Today my customer's purportedly reliable Gaussian normal data is all bifurcated and I don't know why.

    At least if it looked like a dinosaur I would know that someone was fucking with me.

    Shakes fist at data.