Datasaurus Dozen

Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing

Anscombe's Quartet is a set of four datasets, where each produces the same summary statistics (mean, standard deviation, and correlation), which could lead one to believe the datasets are quite similar. [...]
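As a quick check of that claim, here is a minimal Python sketch that computes the mean, sample standard deviation, and Pearson correlation for each of the four datasets and rounds them to two decimal places. The values are Anscombe's published 1973 figures as commonly reproduced. The names `QUARTET`, `pearson`, `summarize`, and `SUMMARIES` are made up for this illustration and are not from the paper.

```python
from statistics import mean, stdev

# Anscombe's quartet (1973). Datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed by hand."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summarize(xs, ys):
    """(mean x, mean y, stdev x, stdev y, correlation), rounded to 2 places."""
    return (
        round(mean(xs), 2),
        round(mean(ys), 2),
        round(stdev(xs), 2),
        round(stdev(ys), 2),
        round(pearson(xs, ys), 2),
    )


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, stats in SUMMARIES.items():
    print(name, stats)
```

At two decimal places the four rows print the same tuple, yet a scatter plot of each dataset looks nothing like the others.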

Recently, Alberto Cairo created the Datasaurus dataset, which urges people to "never trust summary statistics alone; always visualize your data", since, while the data exhibits normal-seeming statistics, plotting the data reveals a picture of a dinosaur. Inspired by Anscombe's Quartet and the Datasaurus, we present The Datasaurus Dozen:



2 Responses:

  1. David Thompson says:

    This kind of technique looks like a great way to generate falsified data that fits your hypothesis but still looks just random enough when visualised.

  2. James says:

    Today my customer's purportedly reliable Gaussian normal data is all bifurcated and I don't know why.

    At least if it looked like a dinosaur I would know that someone was fucking with me.

    Shakes fist at data.
