This is probably going straight down the memory hole. ASCII art is already hard to archive properly; preserving the experience of scrolling through that web page will be even harder. Especially since Twitter uses that break-the-entire-web "#!" JavaScript nonsense.
Cute, though.
Until now, I never understood or even noticed this #! business. I think I was happier before.
Twitter seemed to be interested in removing hashbangs last year, at least. I haven't heard anything about it since, which is a shame.
Speaking of memory holes, you might want to update your backup page. In current rsync, -vaxAX isn't a valid option any more.
I'm currently using:
sudo rsync -vaxEh --progress --delete --ignore-errors
--progress because some of my backups are over the network and it's handy to know if it's doing 3MB/s vs. 20MB/s.
You have that backwards: rsync 3.0.6 had -X (--xattrs) and -A (--acls) but 2.6.9 had -E (--extended-attributes) and no ACL support at all.
You're right; somehow I have 2.6.9. It seems to work OK, at least.
Yeah, the old -E and the new -X do the same thing, and I don't think any systems use ACLs by default. ACLs also don't travel well across hosts; any time I've run into them, I just get a bunch of annoying errors from rsync anyway.
But someday you're going to upgrade by accident, and -E will stop doing what you expect: in rsync 3.x it means --executability.
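If you can't be sure which rsync a given machine has, a small wrapper could pick the attribute flags from `rsync --version` output. This is only a Python sketch: `xattr_flags` and `backup_command` are made-up names, the parsing assumes the usual `rsync  version X.Y.Z  protocol version N` first line, the pre-3.0 branch assumes Apple's patched 2.6.9 (where -E means extended attributes), and the paths are placeholders.

```python
import re


def xattr_flags(version_output: str) -> list[str]:
    """Choose xattr/ACL flags from `rsync --version` output (hypothetical helper)."""
    match = re.search(r"version\s+(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no rsync version number found")
    if int(match.group(1)) >= 3:
        # rsync 3.x: -X is --xattrs and -A is --acls (-E now means --executability)
        return ["-X", "-A"]
    # Assumes Apple's patched 2.6.9: -E is --extended-attributes, no ACL support
    return ["-E"]


def backup_command(src: str, dest: str, version_output: str) -> list[str]:
    """Build the backup command line; src and dest are placeholder paths."""
    return ["sudo", "rsync", "-vaxh", *xattr_flags(version_output),
            "--progress", "--delete", "--ignore-errors", src, dest]


print(backup_command("/path/to/src/", "/path/to/dest/",
                     "rsync  version 2.6.9  protocol version 29"))
```

You'd hand the resulting list to `subprocess.run`. Drop the -A if ACLs only ever give you errors anyway.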
The idea that #! "breaks the web" is just nonsense. It simply doesn't. That's the academic silliness of people who think that it's not valid to use anything in a way that they didn't think of first. (Search engines can't spider it? Surprise: they can't spider any page that's generated by script, even if they use the new History API. #! has nothing at all to do with that.)
The phrase "breaks the web" refers to changes to browsers (and specs) which break backwards-compatibility with existing content. It's not even applicable to web pages.
Search engines have to use special tricks, where the site responds to them with different content than it serves to the user. This is already broken: there shouldn't be any special "search engine-only" content; that's not how web search works or should work. What's more, not everyone implements that workaround, which means that some pages are, in essence, uncrawlable by any means.
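For the curious, the usual trick for #! pages was (as far as I know) Google's AJAX crawling scheme, since deprecated: a crawler seeing a `#!` URL requests the same page with the fragment moved into an `_escaped_fragment_` query parameter, and the site is expected to answer with a pre-rendered snapshot. A simplified Python sketch of just the URL mapping; the function name is made up, and the real scheme's escaping rules are only approximated with `urllib.parse.quote`.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escaped_fragment_url(url: str) -> str:
    """Map a #! URL to the ?_escaped_fragment_= URL a crawler would fetch instead.

    Simplified: percent-encodes everything in the fragment except "/" and "=",
    which approximates (but is not identical to) the scheme's escaping rules.
    """
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        return url  # not a hashbang URL, nothing to map
    param = "_escaped_fragment_=" + quote(parts.fragment[1:], safe="/=")
    query = f"{parts.query}&{param}" if parts.query else param
    # The fragment itself is dropped: HTTP clients never send it to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(escaped_fragment_url("http://twitter.com/#!/jmtd"))
# http://twitter.com/?_escaped_fragment_=/jmtd
```

The site then has to recognise that parameter and serve something useful, which is exactly the "not everyone implements it" problem.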
Now, try to look up that page (or any #! page, for that matter) on web.archive.org. Doesn't work? OK. Try using wget to download the page. Doesn't work either? The best you can do is to manually load the part of the timeline you want in the browser and, again, manually save it. This also breaks all non-text content in tweets, because it's loaded with some AJAX-ey crap.
There are cases where the so-called "web application" paradigm is applicable and cases where it isn't.
For Google Docs it is; it makes no sense to save or crawl Docs pages (I mean the app itself, not the prerendered ones), and you can save your data anyway.
For a site which presents itself as a content provider, it definitely isn't. Twitter isn't a content provider. Twitter is a provider of funny shit which you will read, laugh at for a second and then forget forever. The lifespan of a tweet is less than a few days, and usually just a few hours.
Ironically, this corresponds to the attention span of an average Twitter user, so everything balances itself out.
That doesn't "break the web". The web works fine; the pages work as expected. If you want to choose between making it hard for search engines to find your content or jumping through hoops to make it work, that's perfectly fine and up to you (the author). Everything is about tradeoffs, and it's a perfectly valid decision to opt for client-side rendering (with its many benefits, such as faster responsiveness and easier implementation of interactive interfaces) at the expense of other things (such as every page being accessible as a static URL).
(And again, this has nothing at all to do with "#!"; the History API allows the same features that people are using "#!" for without repurposing anchors.)
What most people complaining that "#! breaks the web" are really trying to say is typically closer to "it makes some features that I like not work, and you should prioritize what I want over what you want".
Jesus Christ. There was a period of about five years, roughly 2003 to 2008, when it seemed that, for once, the prevalent consensus as to how HTML worked, what good practice was et cetera was actually sane and corresponded to the way in which the language and the Web itself had been designed. Unfortunately it seems that day has passed and we're back to "if it works okay on the browsers i have installed then it's fine", assuming that you are, sadly, a professional website writer ("Web developer").
The Web changes; the design and practices from one decade, and their underlying rationale, aren't going to apply for all time. You might have missed it, but JavaScript engines and APIs have seen orders-of-magnitude improvements in capability, speed, robustness and interoperability in the last decade, which means things are possible today that weren't in 2003. "You should do it this way because it's how things were done a decade ago (or two decades ago)" is a non-argument.
> "if it works okay on the browsers i have installed then it's fine"
Nobody is even trying to claim that "client-side page rendering doesn't work cross-browser", so I don't know where this came from.
> Nobody is even trying to claim that "client-side page rendering doesn't work cross-browser", so I don't know where this came from.
Erm, several people have pointed out that there are user agents other than desktop Web browsers interested in consuming HTML. Nobody should have to resort to some site-provided API to consume what is - and let's be quite clear about this - 140 characters of plain text updated on an irregular basis. Twitter has broken that expectation.
> things are possible today that weren't in 2003
Yes. Apparently people can get jobs writing websites without understanding how the static Web works (or, increasingly, worked). That is not a positive development.
> Erm, several people have pointed out that there are user agents other than desktop Web browsers interested in consuming HTML.
Sure. There are the variations of WebKit found on mobile devices, which--surprise!--support JavaScript just fine. Web authors can jump through the extra hoops for the likely sub-2% of users without JavaScript, but it's certainly not their responsibility to do so.
> Yes. Apparently people can get jobs writing websites without understanding how the static Web works (or, increasingly, worked).
If you start convincing yourself that everyone making decisions which you disagree with is incompetent, you're going to have a grossly skewed view of the world.
> Sure. There are the variations of WebKit found on mobile devices, which--surprise!--support JavaScript just fine. Web authors can jump through the extra hoops for the likely sub-2% of users without JavaScript, but it's certainly not their responsibility to do so.
My bad. When I said "other than desktop Web browsers" I'd assumed for some reason that you had the experience to relate that to things other than "graphical browsers for current smartphones and tablets", especially seeing as multiple people have already brought up the Google search bot, which is neither. But I'm forgetting quite how little one actually needs to know to get a gig in the occupation in question.
> If you start convincing yourself that everyone making decisions which you disagree with is incompetent, you're going to have a grossly skewed view of the world.
The people making decisions here, at least in terms of implementation, are not incompetent. The aim of the people making the decisions at Twitter is to monetise a simple service through charging people to build on it. They've done that very well. The incompetents are those who have to accept that as part of the nature of their work and feel the need to rationalise it to feel better about themselves, without any particular need to dwell upon why the approach in question broke every existing understanding of how to consume content intended to be permanently addressable.
> When I said "other than desktop Web browsers" I'd assumed for some reason that you had the experience to relate that
Troll stench detected. Done talking.
Yes, yes, a thousand times yes to everything Chris Cunningham has said here.
In particular, I usually abbreviate his entire rant as, "Pop quiz, what's the 'U' in 'URL' supposed to mean?" and people like Glenn here pretend to not understand what I'm talking about.
I must live in a very strange demographic, with fantastic bandwidth and limited CPU grunt (academic networks and low-profile laptops). The consequence? Traditional, server-side rendered pages load fast; new, client-side rendered pages load slow.
Faster responsiveness? Maybe, but definitely not in the case of Twitter. It loads literally megabytes of poorly cached assets (around 5 MB), is HORRIBLY SLOW on netbooks, and isn't any faster than a traditional interface. So Twitter's UX is bad on anything except a very recent, JavaScript-enabled browser on a fast machine with a broadband connection. This is just fucked up.
Also, you've mentioned the History API. The entire point of the History API is fixing the mess with #!. Fragment-generated URLs all look the same to the server; History API-generated URLs degrade gracefully. Look at Github: they use the History API and you can just wget any page. Twitter could take a lesson or two.
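The "all look the same to the server" point is easy to demonstrate: the fragment, including everything after `#!`, never goes into the HTTP request, so wget and the server only ever see `/`. A Python sketch with illustrative helper names; `strip_hashbang` assumes the site also serves the same path without the `#!`, which holds for Twitter's status URLs.

```python
from urllib.parse import urlsplit, urlunsplit


def request_target(url: str) -> str:
    """The path and query an HTTP client actually puts in its request line."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target  # parts.fragment is never sent to the server


def strip_hashbang(url: str) -> str:
    """Turn http://host/#!/a/b into http://host/a/b (assumes the site serves both)."""
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        return url
    path = parts.path.rstrip("/") + "/" + parts.fragment[1:].lstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


for url in ("http://twitter.com/#!/jmtd",
            "http://twitter.com/#!/jmtd/status/190136568932077569"):
    print(request_target(url), "->", request_target(strip_hashbang(url)))
# / -> /jmtd
# / -> /jmtd/status/190136568932077569
```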
Surely you're not using Twitter as the gold standard.
The History API is nice, but it doesn't magically make client-generated pages easy to spider; wget still won't work with pages generated via JS.
Why should I use something so bad as a gold standard? Seriously, I'd rather pick Github for it.
Of course not. Github provides both JS-generated and server-generated versions of pages and makes sure they are the same; History API makes these URLs sensible to link to while ensuring they can be modified by JavaScript.
And it's their decision to spend the extra (significant!) development time to do that. It's just as valid a decision not to do so, if the time it takes to do that sort of thing isn't worthwhile. Making the choice not to do so may well annoy people who want to be able to wget pages, but it's just a cost/value decision; breaking a feature that somebody wants is not "breaking the web".
On which note, it is utterly unacceptable that it takes my recent computer on a fast connection upwards of thirty seconds to load a single ``tweet'' when we could transmit a 140-character message in four seconds over an acoustically-coupled modem in the fucking 1960s.
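The back-of-the-envelope number roughly holds up. Assuming a 300 bit/s Bell 103-class modem (introduced in 1962) and 8N1 framing, i.e. 10 bits on the wire per character, a quick Python check:

```python
def transmit_seconds(chars: int, bits_per_second: float, bits_per_char: int = 10) -> float:
    """Time to send `chars` characters over a serial link.

    Default framing is 8N1: 1 start bit + 8 data bits + 1 stop bit = 10 bits.
    """
    return chars * bits_per_char / bits_per_second


print(f"{transmit_seconds(140, 300):.2f} s")      # 4.67 s (300 bit/s, 8N1)
print(f"{transmit_seconds(140, 110, 11):.1f} s")  # 14.0 s (110 bit/s, 11-bit framing)
```

So a little under five seconds at 300 bit/s; on a 110 bit/s teleprinter line with 11-bit characters it would be closer to 14 seconds.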
If you want to archive tweets, the best thing I've found is a combination of ifttt.com and a Python script which translates their mail into vastly simpler mails of the form
From: jmtd
Date: Wed, 11 Apr 2012 17:57:00 +0000
Subject: I wonder if apple have patched that samba vuln in osx <= snow leopard yet
X-URL: http://twitter.com/jmtd/status/190136568932077569
(note also the alternative #!-less URL for the tweet)
You then back them up in the same way you back up your mail.
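The script itself isn't shown, and I don't know what ifttt's mails look like, so this Python sketch only covers the output side: given a tweet's author, text, timestamp and ID (however you extract them), build a message in the simple format above. `tweet_to_mail` and its arguments are hypothetical.

```python
from datetime import datetime, timezone
from email.message import EmailMessage
from email.utils import format_datetime


def tweet_to_mail(user: str, text: str, created: datetime, status_id: int) -> EmailMessage:
    """Build a minimal archive mail for one tweet (hypothetical helper)."""
    msg = EmailMessage()
    msg["From"] = user
    msg["Date"] = format_datetime(created)  # RFC 2822 date, e.g. "... +0000"
    # Header values can't contain newlines, so fold the tweet onto one line.
    msg["Subject"] = " ".join(text.split())
    # The #!-less status URL, same shape as the X-URL example above.
    msg["X-URL"] = f"http://twitter.com/{user}/status/{status_id}"
    msg.set_content(text)
    return msg


mail = tweet_to_mail(
    "jmtd",
    "I wonder if apple have patched that samba vuln in osx <= snow leopard yet",
    datetime(2012, 4, 11, 17, 57, tzinfo=timezone.utc),
    190136568932077569,
)
print(mail["Date"])  # Wed, 11 Apr 2012 17:57:00 +0000
```

Writing `mail.as_bytes()` into an mbox or Maildir (for example with the standard `mailbox` module) is then enough for your normal mail backups to pick it up.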
That still won't help you get tweets past the 4k API limit. They are probably gone forever.
That's true. My next solution-in-the-works is to have a "window" of X tweets, perhaps X=100 or 1000, and to (automatically) delete everything older than that. The ≥ 4k tweets will gradually move into range of the API that way. (I see the value in archiving tweets. I don't see the value in other people seeing what I've tweeted, Y years in the future, with few exceptions.)
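The windowing part is simple enough to sketch. This Python version works on a hypothetical list of (status ID, created-at) pairs you've already fetched; it only decides which IDs fall outside the newest `keep` tweets and leaves the actual deleting (via whatever API call you use) out.

```python
from datetime import datetime


def tweets_to_delete(tweets: list[tuple[int, datetime]], keep: int) -> list[int]:
    """IDs of every tweet older than the newest `keep` ones (hypothetical helper)."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    newest_first = sorted(tweets, key=lambda t: t[1], reverse=True)
    return [status_id for status_id, _ in newest_first[keep:]]


tweets = [(1, datetime(2012, 1, 1)), (2, datetime(2012, 2, 1)),
          (3, datetime(2012, 3, 1)), (4, datetime(2012, 4, 1))]
print(tweets_to_delete(tweets, keep=2))  # [2, 1]
```

Sorting by timestamp rather than trusting ID order keeps the sketch independent of how IDs happen to be assigned.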
But this way you would destroy other people's favorites, retweets and links to your old tweets. Not good.
I'm using pinboard.in's tweet archiving functionality, and while I can't be bothered to go back and see whether it works retroactively for tweets from before I turned it on, it's certainly been great since then. Just as another option.
> Surprise: they can't spider *any* page that's generated by script
Were that true, Google would be boned. But luckily it isn't.
Unrelated, but relevant to your retrocomputing interests:
http://m.wired.com/gamelife/2012/04/prince-of-persia-source-code/