I dunno, if that database query takes a month to run, maybe Amazon should look into contracting with a cloud provider?
Anyway, once you get the download link, guess what -- it's not a download link. It's seventy-one separate download buttons. Is there a Download All button? Hahahahahaha.
My data was 115 MB spread over 10,988 mostly-CSV files. A quick stroll didn't reveal anything too surprising. It includes every email they ever sent you, and, amusingly, the transcript of every conversation you've had with their "support" chatbots.
Presumably this is the result of them doing the absolute bare minimum in order to comply with some jurisdiction's new or pending legislation. You can tell from the loving care they put into it.
I wonder if each of those 71 separate zip files had its own project team.
Probably. That's only 568 Amazon employees out of over 1.5 million.
Modern big tech companies are so siloed, and so many employees are more or less clueless about what's going on in the company, that there must be a kind of "dark academia" field of study about how to build software systems while deceiving the grunt teams about what they're actually building.
Once upon a time part of the point of insults like "tool" was to discourage young idiots from becoming the kinds of suckers who will work on anything without knowing what it is and what it can be used for. Only a tool would work in the dark the way those workforces today seem to.
Advice to young, talented hackers: You can solve problems real good and you like to show that off at every opportunity to everyone who issues you a "challenge". Don't.
This was basically the punchline of Cube. I think about that ending often.
That "deceiving the grunt teams" item was a subplot in Snow Crash, where YT's mom wrote part of the virus in her honed-to-identical-shape-cog job.
I'm pretty sure there is. Although "academia" may be too broad a term. My brother is a programmer at a defense contractor. He has no idea what it is he's building. I don't know if they all operate that way. But since we know for a fact that the parent company's products do eventually get released and appear to work, there must be people somewhere who know, and a hierarchy below who know less and less.
For a process this absurd to ever result in anything not fundamentally broken or disconnected from a spec most engineers never saw (we will assume here they don't rely on the thousand-monkeys theory or the "I feel lucky" button), there have to be some strictly formalized protocols, maybe not far removed from cryptography, which incidentally originates from the same family of contractors and government agencies.
It's easy to see why they would want that, and since stripping a spec of all meaning and purpose while retaining functionality certainly can't happen by accident, I'll go out on a limb and claim: yes, it definitely has been theorized, because we see it being applied.
Although, we also see military contracts balloon decades and billions over projections, so maybe there's still work to be done in that field.
GDPR! From the same award winning legal team that brought you "Do you mind if we set a cookie?"
I'm not sure what they could have done better. Forcing people to ask before using tools (cookies) to track you is certainly inconvenient. So is having to authenticate yourself before you get to take all the money out of your bank account. Are you saying that there should be some class of 'innocuous' cookies which they don't need to ask about? Because people are not going to abuse those, are they?
Similarly forcing companies to make your data available is, in fact, a good thing. Forcing them to make getting it easy is probably too much to ask.
And yes, the legal system can probably never win against big tech companies (or sovereign individuals, or vampires). We probably are not going to win against Putin or climate change: does that mean we should not try?
Oddly enough, a recent finding says those cookie popups are in breach of GDPR...
http://www.rt.ie popping up "accept all cookies/manage cookies" before I could read that article on mobile is the cherry on top.
funny you should mention that.
(Also, RTE.ie rather than RT.ie. Joke about state-managed propaganda, potatoes, etc. here)
The easiest way to comply with the GDPR is DON'T FUCKING TRACK PEOPLE
GDPR allows technically essential functionality. You can run a web shop... with cookies! No consent required! But you can't then analyze the logs to track each user's shopping habits, because you didn't get consent for that non-essential purpose.
That's why businesses are so in-your-face with it; they hate being transparent. They show you the Scary Modal Dialog so you'll click to make it go away without even daring to find out what you just consented to let them do... it's a big list of them giving themselves and all their friends permission to track and datamine the shit out of you.
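The essential-vs-consented distinction above boils down to a one-line check. A minimal sketch, with made-up purpose names and a hypothetical `may_process` helper (nothing here is from any real consent library):

```python
# Hypothetical consent gate: "essential" purposes (session, cart) need no
# consent under GDPR; anything else (analytics, profiling) needs an opt-in.
ESSENTIAL_PURPOSES = {"session", "cart", "csrf"}

def may_process(purpose: str, consents: set[str]) -> bool:
    """True if this purpose is essential or the user explicitly opted in."""
    return purpose in ESSENTIAL_PURPOSES or purpose in consents

# The shop can always set its cart cookie...
assert may_process("cart", consents=set())
# ...but mining the logs for shopping habits needs an opt-in.
assert not may_process("analytics", consents=set())
assert may_process("analytics", consents={"analytics"})
```

The point being: the compliant path is trivial to implement; the Scary Modal Dialog exists to pad out that `consents` set, not because the law is hard to follow.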
Exactly this. The clear intent of the law was to kneecap the surveillance advertising model. It has obviously failed at that. Whether that's because it's poorly written or poorly enforced, I don't know, but defending those modal cookie dialogs with "well what else were they supposed to do" is bullshit, like all those weird nerds who cape for Musk and Bezos.
In practice, GDPR has managed to re-invent Prop 65: nearly every building in California has a sign at the door saying "This facility contains chemicals known to the State of California to cause cancer and birth defects or other reproductive harm." Why? Because if it's true, not having that sign can cost you money. But if it's not true, it costs you nothing. So everyone posts the sign. It's the "everybody gets a prize trophy for participating!" of public health policy.
Yep, that pretty much nails it.
There is some good done by GDPR, which is that it encourages some product specifications to avoid storing some classes of data just because it's a major pain in the arse to do the GDPR dance. Also, lots of products just don't bother launching in Europe until they're ready to deal with it... which probably isn't good.
It's a major ouroboros of unnecessary frustration for users and software developers, one bound by the legal Necronomicon to endlessly click on consent dialogs, the other perpetually adding mostly worthless "features" to their software so they can cough up encyclopedias of private user data that only a forensic data scientist would really have the patience to go through. If you hadn't guessed already, the 30-day query duration is enshrined as one of your "data rights" as an EU citizen in the GDPR, and nothing to do with how long it takes to pull a laundry list of csv files together.
It takes a major institution-wide audit to know the gap between the data $BIG_COMPANY stores and the data they cough up for a GDPR request. Some of them must have rogue or transient teams that calculate it's more of a risk or hassle to put their data out there than to be sued under the GDPR. Absent some whistleblower determined to ruin their career, you'll never really know what data they hold on you, which is basically the position you're in without the GDPR. And if they do get caught, I'm sure they could point to their best efforts at GDPR compliance in doing an exhaustive job letting you pull all the other data, and cop a smaller legal settlement.
Not everybody who falls under GDPR is a website. For example, I work in pharma and we handle clinical data; the data is deidentified, but we still need to comply with GDPR requests and with people who revoke consent for a clinical trial. GDPR is pretty onerous and gets in the way of us providing better health products without protecting patients that much, whereas consent revocation is an essential process for ethical research.
Respectfully, I don't believe data can be properly "deidentified", and there are a lot of "reidentification" risks. Saying it can be "deidentified" is handwavy (as is asserting it can always be reidentified, but hopefully I got the point across).
I'd rather data, whether on a website or not, was treated like toxic waste; the more you have of it, the more dangerous it is. You always need a plan for what you're doing with it, who you're exposing it to, what you're going to do if any of it leaks, and so on. Even though I too have to go through piles of GDPR bureaucracy, I love that GDPR is starting to make these questions about data happen _before_ collecting it, rather than after or never.
Your toxic waste metaphor is excellent. The vast majority of data should have an expiry date after which it is unconditionally purged. For example there's no functional reason to retain the details of a customer's order after it has been shipped and the return window has passed.
Companies that promptly purge data after it becomes irrelevant to the reason it was collected will then need to spend less to comply with GDPR. Those who retain data longer will be subject to increased regulatory oversight and associated fees. They're also exposed to greater legal liability if the data leaks or is misused.
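The expiry-date idea above is easy to sketch. A toy retention-policy purge against an in-memory SQLite database; the table name, column names, and 90-day window are all made up for illustration:

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical policy: purge orders once the return window has passed.
RETURN_WINDOW = timedelta(days=90)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, shipped_at TEXT)")
now = datetime(2024, 6, 1)
# One stale order (200 days old) and one recent one (10 days old).
conn.execute("INSERT INTO orders VALUES (1, ?)",
             ((now - timedelta(days=200)).isoformat(),))
conn.execute("INSERT INTO orders VALUES (2, ?)",
             ((now - timedelta(days=10)).isoformat(),))

# Unconditionally purge anything older than the return window.
cutoff = (now - RETURN_WINDOW).isoformat()
conn.execute("DELETE FROM orders WHERE shipped_at < ?", (cutoff,))
remaining = [row[0] for row in conn.execute("SELECT id FROM orders")]
print(remaining)  # [2] -- only the recent order survives
```

Run on a schedule, a purge like this shrinks both the GDPR-request surface and the blast radius of a leak.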
Facebook is even more thoughtful. They make your data secure by requiring that you create a Facebook account in order to see the data they've collected.
And friends wonder why I deleted my account years back and refuse to get an Alexa?
Amazon has many tentacles aside from their retail empire, so they might still be gathering info from your web surfing. Not all of their web presence is clearly marked as Amazon.
Perhaps, but I have no interest in creating a new account just to find out. I'm glad to be done with them the same way I'm glad to be done with FB and Twitter.
(I'd love to kill off my Google account since I really only use it for YouTube, but a lot of my actual work requires Google Docs for my clients, so I have to stick with it for now.)
"Usually, this should not take more than a month."
"unsubscribe" requests are also often claimed to take days or weeks. Shouldn't deletion of a few DB records be &*$%ing instantaneous? Even if they're replicating to spread out the DB load, a deletion shouldn't take more than a few seconds to propagate.
But worse are the marketing mails that periodically come back from the dead after unsubscribe. This isn't limited to email though: the same occurs with telemarketer "do not call" lists and physical junk mail. It is almost as if the people responsible for these systems intentionally implement it so poorly that it breaks often. "Oops! The opt-out table was dropped because it wasn't on the migrate list."
This doesn't happen because writing code that automatically and permanently deletes things is TERRIFYING. Sure you should have backups and continually test your restore procedures, but I have never met an organization that's prepared to do partial restores from backups when bad code clobbers 0.05% of the records in your database.
So you do what everyone does which is to add a tombstone bit on the DB record which you flip when the record is "deleted". If you have a middleware layer, this can be transparent to 99% of your code. Then you have a reaper process which actually deletes things on that "days or weeks" schedule alluded to above.
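The tombstone-plus-reaper pattern just described can be sketched in a few lines. A toy in-memory version, with illustrative names and a made-up 30-day grace period:

```python
from datetime import datetime, timedelta

# Soft delete: flip a tombstone timestamp instead of removing the row;
# a reaper purges for real only after a grace period, so bad code that
# clobbers records can still be rolled back.
GRACE = timedelta(days=30)

records = {
    "a": {"email": "a@example.com", "deleted_at": None},
    "b": {"email": "b@example.com", "deleted_at": None},
}

def soft_delete(key: str, now: datetime) -> None:
    records[key]["deleted_at"] = now  # flip the tombstone bit

def visible(key: str) -> bool:
    # The middleware layer hides tombstoned rows from the rest of the code.
    return records[key]["deleted_at"] is None

def reap(now: datetime) -> None:
    # The reaper actually deletes, but only past the grace period.
    for key in list(records):
        t = records[key]["deleted_at"]
        if t is not None and now - t >= GRACE:
            del records[key]

t0 = datetime(2024, 1, 1)
soft_delete("a", t0)
reap(t0 + timedelta(days=7))   # too soon: record kept, just hidden
assert "a" in records and not visible("a")
reap(t0 + timedelta(days=31))  # grace period over: gone for real
assert "a" not in records and "b" in records
```

Which is exactly why "deleted" and *deleted* can be weeks apart from the user's point of view.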
Implementation details about databases and backups have (checks notes) absolutely nothing to do with the claims made by lobbyists and marketing managers about how hard it is to do the particular thing that happens to be the thing that makes them lose money.
But it sure is convenient for them to use that as an excuse.
Ah, now I see that the previous commenter was referring to email marketing unsubscribe requests. I have zero sympathy for the marketing managers (or lobbyists, or programmers) on that account.
To the original post, I have wondered how anyone writes software against any of these data exports from large companies. For Google Takeout alone there are hundreds of projects on GitHub. Are the schema definitions published anywhere? Are there announcements prior to changes? I imagine not.
Here's the reality in one workflow I know:
An unsubscribe request is taken verbally over the phone. A rep enters the request into a form and submits it. This goes into a database table.
Every Wednesday this gets rolled up into a flat file and placed on an internal FTP site.
This file sits until another batch process comes along to collect the file, encrypt it, and FTP it to a vendor.
The vendor then decrypts the file and adds it to their database.
If any of the batch processes fail for any reason, the file is held until the following week.
Meanwhile, some of the mass mailings are staged up to a week in advance with the opt-outs scrubbed at staging.
As a result, if the request comes in at just the right/wrong time, it could take two weeks or more for the requests to "kick in".
This is not to judge the reasonableness of any timeframe in question but merely to illustrate a real-world example of how this process is anything but instantaneous.
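Back-of-the-envelope arithmetic on the workflow above (every number comes from that description, and the "just missed it, plus one failure" scenario is an assumed worst case):

```python
# Worst case: the request just misses Wednesday's roll-up, the next
# batch run fails once ("held until the following week"), and the
# mailing was staged a week in advance with opt-outs already scrubbed.
days_until_next_rollup = 7
retry_after_batch_failure = 7
mailing_staged_in_advance = 7

worst_case_days = (days_until_next_rollup
                   + retry_after_batch_failure
                   + mailing_staged_in_advance)
print(worst_case_days)  # 21 -- "two weeks or more" is, if anything, optimistic
```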