Many of you have already begun composing a comment starting with, "Instead of Amazon, you know what you oughta do" and -- thank you but no, please stop. It is done. I solicited your opinions on this a year ago, you had your chance.
You can be helpful in the following ways:
- Tell me if you notice anything broken or acting weird on either jwz.org or dnalounge.com. The switch involved upgrading from CentOS 6 to 7, Apache 2.2 to 2.4, and literally every other piece of software, plus sending email from a new IP address, so it's bound to be weird for a while.
- I have been led to believe (by this thing) that if I pay for a year up front, my price goes down almost by half. I'd like to do that, but I can't figure out how. Apparently it's not simply a billing option, but somehow tied to an instance? Or scalpers? Please tell me what buttons to press to give them my money but less of it.
- If you have suggestions of simple ways to optimize my new setup for price and/or performance, let me know. Emphasis on simple rather than completely re-engineer decades of history.
I have a single instance, r5a.2xlarge (8 CPUs, 64 GB RAM), CentOS 7, us-east-2b, plus a 2TB EBS sc1 encrypted "cold storage" volume. I am running my own httpd, mysqld, postfix and a zillion cron jobs on this instance, and serving my web sites out of EC2 using the EBS volume as just a drive, not serving from its own pseudo-web site. Outbound bandwidth is probably somewhere between 1 and 2TB/month.
Please note that:
- Any solution that involves any URL changing is unacceptable. "Just leave behind 301 redirects to the new URLs" is unacceptable.
- If you're going to suggest, "Just run five more layers of proxies and CDNs in front of it", that suggestion needs to come with numbers on how much that's going to cost, and how much that's going to save, before it's a real suggestion. Because I can't tell.
- I understand that you all think that I should be running 30 different instances that aren't up all the time, instead of one monolithic one pretending to be a real computer. But there is almost nothing that any of my cron jobs do that is sufficiently standalone that that makes any sense to me. I can't wrap my brain around that and I don't have the time or desire to rewrite literally everything.
- I have an "Elastic IP" that is the A record of my web sites. Apparently I'm paying $14/month for it. Someone said, "Oh, be aware that they just go and change those on you sometimes". This can't possibly be true, right? If so, what exactly am I paying for?
- I'm in us-east-2b because that was cheapest. This seems weird, since all of my customers are in San Francisco. Should I care? I'm guessing I shouldn't care.
- Am I correct in assuming that my 2TB EBS volume is never going to "fail"? I don't really have to worry about making AWS "snapshots" of it if I don't want to? (They seem to charge $107 for a 5% change/month snapshot of the 2TB drive! That's a lot of money for a slow-assed Time Machine. It's twice the price of the 2TB drive itself, so that pricing makes no sense.)
Tim Bray works at Amazon. If asked, he will hopefully be willing to help connect you to people who know the answers. I don't know him personally so YMMV.
the guy who took over the insidious big brother database works at amazon. that's somehow appropriate...
1. If you have an Elastic IP, it won't get changed on you.
2. I believe us-west-2 (Oregon) prices should be the same as us-east-1, no? Should save some amount of latency. I don't think r5a-type instances are available in us-west-1 (SFO).
3. You are not correct - from the docs, "Amazon EBS volumes are designed for an annual failure rate (AFR) of between 0.1% - 0.2%, where failure refers to a complete or partial loss of the volume, depending on the size and performance of the volume.". Taking snapshots mitigates this risk, though if you don't ever expect to need to restore from a snapshot past some older date (and that easy availability is part of what you're paying for), you could rotate older data into cold storage.
Hmm, yeah it does look like us-west-2 is the same price as us-east-2. Is it worthwhile to switch?
Can I switch without changing my Elastic IP? Or are those attached to a region?
Am I right in thinking that "switch" means: make a snapshot of my old root drive; make a new instance; restore the old snapshot on top of the new instance's root drive? (Or can you not restore a snapshot into an instance's root?)
If you're serving mostly west-coast clientele, I would suggest moving to us-west-2. us-east-2 is probably fine. Avoid us-east-1 at all costs. us-east-1 is the largest AWS region by far and historically, all of the major downtime incidents have been in that region. Part of this is the size, part of it is that they often roll out new features to that region first, so it often undergoes a lot more experimentation.
You can't move an Elastic IP to another region (at the moment - that could change next week).
You would take a snapshot of your EBS volume, transfer that snapshot to the new region, then create a new volume from that snapshot, associate it with the new instance, then mount it.
Do you have a single 2TB volume, or a boot volume + data volume? If it's a single volume, you can create an AMI (which is just a special type of snapshot) and transfer that to another region.
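For the record, that snapshot-copy dance maps onto three `aws` CLI calls; a sketch in which every ID is a hypothetical placeholder:

```shell
# Snapshot the volume in the old region (all IDs here are placeholders).
aws ec2 create-snapshot --region us-east-2 \
    --volume-id vol-0123456789abcdef0 --description "pre-move backup"

# Once the snapshot's state shows "completed", copy it to the new region.
# Note: copy-snapshot is run against the *destination* region.
aws ec2 copy-snapshot --region us-west-2 --source-region us-east-2 \
    --source-snapshot-id snap-0123456789abcdef0

# In the new region, create a volume from the copied snapshot,
# then attach it to the new instance from the console or attach-volume.
aws ec2 create-volume --region us-west-2 --availability-zone us-west-2a \
    --snapshot-id snap-0fedcba9876543210 --volume-type sc1
```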
under Be Helpful #2: you want to go to the EC2 console and create a "reserved instance"; see Reserved Instances.
To expand on this - think of reserved instances as a match against your running instances. It's not tied to a particular instance per se; but it's tied to (for example) "1 r5a.2xlarge running in us-east-2b". So if you run 2 of them, the discount applies to only 1, etc.
Also worth noting, there is an aftermarket for reserved instances, which Amazon administers. Folks sometimes buy, say, three years ahead of time then their business goes belly-up and they're stuck with 35 months of reserved instance they can't use. This will show up in the market for (probably) cheaper than buying direct from Amazon, if you find the right reserved instance for your vhost type and a desperate seller.
As cf says below, and it's worth reiterating, reserved instances are purchased against the instance type, so take some time to be 100% sure you won't need to upgrade or downgrade from an r5a.2xlarge, else you're stuck with a reserved instance you can't use, and you then become the seller (likely at a loss) in the scenario described in my previous paragraph.
There's AWS Reserved Instance Flexibility, so you do not have to get the 'exact' instance type to get the savings; see: https://aws.amazon.com/blogs/aws/new-instance-size-flexibility-for-ec2-reserved-instances/
Does "reserved instance" just mean "I'm paying for the instance I already have differently"? Because the name sounds like "I'm buying a different instance than the one that is already running".
It's a credit system. You can start out paying the pay as you go "rack rate", as you are now. Then you can buy the reserved instance credit for the type and location you're using and it will apply to your bill from that point onward.
That's pretty much it. From the reserved instance page:
You just buy "reserved" instances that match the type/size of the "real" instances you're running.
Only reax: EBS is surprisingly expensive for a couple TB over time. Can you stash any big datasets in S3?
Related and probably under Be Helpful #3:
If you have large static content that needs to be web-available and is on that EBS volume, you can do:
1. move the content to a non-public S3 volume
2. set up apache to reverse proxy to the S3 volume
This is the _simplest_ optimization I can think of to save you some immediate money on storage costs; anything else I can think of will be more complex.
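As a sketch of step 2, assuming Apache 2.4 with mod_proxy and mod_proxy_http loaded, and a hypothetical bucket whose objects are readable by the proxy (a truly private bucket needs request signing, which plain mod_proxy can't do; a bucket policy restricted to the instance's IP is the usual workaround):

```apache
# Proxy /images/ through to the S3 bucket so the public URLs never change.
# The bucket name and region are hypothetical placeholders.
SSLProxyEngine on
ProxyPass        "/images/" "https://my-bucket.s3.us-east-2.amazonaws.com/images/"
ProxyPassReverse "/images/" "https://my-bucket.s3.us-east-2.amazonaws.com/images/"
```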
EBS Cold HDD is $0.025/GB-month while Standard S3 is $0.023/GB-month (in us-east and us-west Oregon), not that different. I'm guessing a lot of what's in the EBS Cold is web-published images and video, not something that should go on S3 Infrequent Access unless a CDN can make them rarely retrieved.
I wouldn't commit to a reserved instance until determining that an r5a.2xlarge is required for what it's doing. Everything seems more expensive in us-west Northern California, but not in the Oregon region. I don't see an advantage for you to pick Ohio or Virginia over Oregon, but there definitely is a latency disadvantage to picking a us-east region. Virginia has the most AWS services and gets new ones first, but you're barely using what AWS offers as it is, so that's not relevant (Oregon has the 2nd most complete set of services anyway).
If email, especially bulk and transactional email, is being sent from the EC2 instance's IP address, I'd be worried about it not being delivered. AWS has Simple Email Service (SES); it looks cheap (maybe even free, depending on your level of use), and I think it requires nothing weirder than postfix conf changes.
Given that I have already done the work to get my DKIM and SPF crud set up, I don't see what benefit there would be to paying more to use SES as one more relay hop (given that I don't need any of its other features). Perhaps there is some benefit to this but I don't know what it is, do you?
Depends on whether blacklisting whole AWS subnets based on spam from a single IP, or treating anything from EC2-specific IP space as suspect, are common practices.
Well apparently neither of us knows the answer to that question.
There are tools available to look up IP addresses in various spam blacklists. I've been using mxtoolbox because I still make questionable life decisions like "running my own email server" and sometimes have to move ISPs, which means IP address changes, which are always fun.
There are definitely RBL services that provide lists of "all IP addresses on AWS or some other VPS hosting service" the same way they used to provide lists of "all IP addresses on cable modems" and "all IP addresses on dialup". Individual mail servers can be configured to use these to increase the probability of rejecting mail (if not rejected outright, it will push the output of a scoring algorithm in the 'spam' direction).
Machine-learning mail filters pretty quickly correlate "looks like an AWS instance IP" with "gee these things seem to send us a lot of spam, maybe we should just start killing them on sight."
The question is whether someone you want to send mail to has one of those services configured in their mail server. The answer is "it depends, but your probability of successful delivery goes up slightly if you use a service where IP reputation is part of their business model."
AWS publishes its public IP space, and there are some providers that simply grab that list and block anything coming from those IPs.
If you have a bunch of code that talks directly to postfix, you can configure postfix to relay via SES. In addition to better delivery rates, you would also have the ability to track delivery, bounces & complaints.
And if you're sending in volume using port 25, you will eventually get blocked or throttled.
Much of amazon's IP space is in spam blacklists and the like and if you care about that it's likely to bite you on the arse when customers etc. don't get email you thought they did. Of course amazon could deal with this but they have no desire to do so because their shoddy service makes their email relay more valuable.
I'm not a huge fan of SES (or indeed anything from amazon for that matter) but if you're concerned about being black-holed it may make sense, and it's difficult to know in advance whether you're going to have a problem or not. It would be a shame if something bad were to happen to all those nice smtp sessions, eh? If this sounds like a racket it's because it is, though not as bad as PKI (thanks for that, btw). Other non-amazon mail relays may be cheaper - I haven't looked.
$ork's servers have the relevant postfix config to relay via SES which I can dig out for you if you're happy to use copy pasta and don't care how your email server works (I'm sure I don't need to tell you that sane people spend the least amount of time managing smtp that they can get away with). It's a half-dozen lines or so.
Bear in mind that SES isn't just an smtp relay. It's designed in large part for spammers to use and includes such 'features' relevant to their business model as blocking all of your outbound email if too many of the messages you send bounce.
Alternatively find some way to route outbound smtp through a non-pathological IP and keep your existing hard-won DKIM & SPF.
It looks like "set SES as your postfix relayhost" will only cost me about $8/month, I think, and given that Yahoo and AT&T are still blocking me, I suppose I might as well, at least for now.
However, I'm completely at a loss as to how to do it. I see several tutorials out there, but I don't get it. I know how to set a relayhost in postfix, but the Amazon side doesn't make sense to me. They seem to be saying that I need to individually verify every email address that might be in a From line, meaning I'm going to have to dick around with this every time I hire a new employee? Do I need to figure out what IAM is? I can't tell. Also SES doesn't seem to be available in East-2?
Have any of you actually done this, and can you explain how?
Not postfix, but sendmail and exim.
There's a doc from amazon at https://docs.aws.amazon.com/ses/latest/DeveloperGuide/postfix.html that seems reasonable.
A couple of steps are needed first. You'll need to create your credentials using the SES console - do this under "SMTP Settings".
And the big gotcha: SES is in "sandbox mode" by default. You will need to open a ticket to request production access. Details at https://docs.aws.amazon.com/ses/latest/DeveloperGuide/request-production-access.html (In sandbox mode, you can only send to or from verified email addresses - this goes away in production mode.)
I should probably have read my way through before commenting myself...
This is the meat of main.cf in a postfix which is entirely unmodified from its default centos 7 configuration.
# Adjust the relayhost region to match where you set up SES:
relayhost = email-smtp.us-west-2.amazonaws.com:25
smtp_use_tls = yes
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = static:PUBKEY:SECKEY
smtp_sasl_security_options = noanonymous
smtp_tls_note_starttls_offer = yes
smtp_tls_security_level = encrypt
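For what it's worth, instead of the inline `static:` map above, the more common arrangement keeps the credentials in a hash map outside main.cf; a sketch, with PUBKEY:SECKEY standing in for the SMTP credentials the SES console generates:

```shell
# Store the SES SMTP credentials in a postfix lookup table instead of main.cf.
# PUBKEY:SECKEY are placeholders for the credentials from the SES console.
echo "email-smtp.us-west-2.amazonaws.com PUBKEY:SECKEY" > /etc/postfix/sasl_passwd
postmap hash:/etc/postfix/sasl_passwd
chmod 600 /etc/postfix/sasl_passwd /etc/postfix/sasl_passwd.db
postconf -e 'smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd'
postfix reload
```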
The part I'm not clear on is the SASL auth. Is this one magic "dnalounge.com" user and password and that covers all of my relaying, or does every address that can appear in a From: line need to be individually configured?
I think this covers the lot. I would have to check our config but I don't think the from line matters.
I would hope, but have little faith in amazon, that at most each domain would need to be configured.
Sadly it looks like that may be the case. A scan of our mail log doesn't show any from address which isn't listed in the addresses configured in SES.
That said, we do have domains configured in SES for which there are no addresses listed. I presume some part of the business sends emails from those addresses. I don't know how that works (there are something like 5 subsidiaries in addition to the one I care for, all stuffed into the same amazon account).
I had no part in configuring the aws part of this mess thank god, that was done before I arrived. I only configured the postfix part. That said, we do send from our servers using that configuration from multiple addresses, so you can at least use a single postfix config (and single amazon key) for multiple addresses. I hope you don't need to add each from address individually but I wouldn't be surprised.
ps. Apologies for spamming your comments. I've been lulled to sleep by all this new fangled old technology that's springing up and have become used to being able to edit or delete history with impunity.
You can definitely whitelist an entire domain for SES relay; you do not need to pre-verify every possible address.
EBS volumes do fail; it's just ridiculously rare. Keep a backup of your data in an S3 bucket. You can let an instance write one if you set up an "instance profile". Just make sure the bucket is writable only by the instance.
You might want to check out Amazon Lightsail. It's their DigitalOcean-like attempt. Sometimes cheaper.
2TB of Cold HDD is $51/month.
2TB of "S3 infrequent access" is $26/month.
But a "snapshot" of that Cold HDD is $102/month (per this)
What the hell are these "snapshots" about? Why are they twice as expensive as the thing they are snapshotting??
I can't tell how Lightsail differs from EC2 or why both exist.
Lightsail offers pre-paid, bundled specs that are more easily compared to more traditional VPS providers. They can be cheaper than the usual AWS pay for what you use model. It also serves as an introduction to the AWS ecosystem.
In the EBS snapshot field of the calculator, what are you entering? I think it’s supposed to be only the GB changed in a month but it automatically includes all the space needed for the initial snapshot which is the same size as the EBS volume.
S3 with versioning might be cheaper if most of the files don’t change; S3 is highly durable so you don’t need multiple cloud copies and versioning can protect against fuckups (but then requires managing the versions that are created, discarding unneeded ones). I don’t know what would be the simplest way to use it instead.
I just saw the other comment about avoiding mounting S3 as a filesystem.
You might still use S3 by doing something like using mod_rewrite to turn all requests for files in the relevant directories to point to the files within an S3 bucket instead. That would mean more work for the EC2 and miss out on the load-handling features of S3 but still have its other features. My main concern would be its possible effect on time to deliver the files; that could be tested by, say, copying one photo gallery to a bucket and trying a mod_rewrite rule just for those URLs. If a significant portion of the EBS Cold is for audio or video, for which time to first byte is much less important, perhaps you could move just those to S3 and use a much smaller, cheaper EBS Cold for images and the rest.
In case it wasn't already obvious, I'm just brainstorming and armchair sysadmining, not giving advice based on experience doing this.
Lightsail is also mysteriously much, much cheaper for the first several TB of outbound bandwidth. This is a big deal for my use case. The $5/month Lightsail 1 GB RAM instances include 2 TB of combined in/out bandwidth, after which they charge the (normal?) 9 cents per GB. I believe in normal EC2 you pay for all that bandwidth, so 2 TB of outbound transfer would cost you $180/month.
In your case, Lightsail only goes up to a 32GB of RAM instance at $160/month, including 7 TB of data transfer. If you can split your services into 2 or more instances, Lightsail might or might not be cheaper - it looks like r5a.2xlarge is $194/month for a 1 year reserved instance, plus $90-$180 for 1-2TB of outbound bandwidth.
Nothing mysterious; the usual $0.09/GB for AWS data out is very overpriced. You could call Lightsail a loss leader to get people to try AWS, but I bet it's still profitable on its own.
> I can't tell how Lightsail differs from EC2 or why both exist.
That's how they get you. Basically everything in aws is built on EC2 instances (the servers behind serverless computing) so unless you're doing clown computing (you're not - you're doing real computing in the clown) EC2 is almost certain to be cheaper or close-run enough as makes no difference.
1) Reserved Instances (RIs) - from the EC2 console, click "Reserved Instances" then at the top pick the blue purchase button. Pick r5a.2xlarge, all-upfront and search. You get a bit of an additional discount for standard instead of convertible, but less flexibility. If you're sure that you won't change it in the next year, then go for the standard. That should be it - it doesn't affect your running instance, it's just a billing construct that applies to a running instance. All upfront will save 41% vs on-demand prices. But I'd hold off on this until you're convinced that you're not going to change anything.
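If the console flow is opaque, the same purchase can be made from the `aws` CLI; a sketch, where the offering ID in the second command is a placeholder you'd copy out of the first command's output:

```shell
# List All-Upfront standard RI offerings matching the running instance type.
aws ec2 describe-reserved-instances-offerings --region us-east-2 \
    --instance-type r5a.2xlarge --offering-class standard \
    --offering-type "All Upfront" --product-description "Linux/UNIX"

# Buy one, using an offering ID from the listing above (placeholder here).
aws ec2 purchase-reserved-instances-offering --region us-east-2 \
    --reserved-instances-offering-id 12345678-aaaa-bbbb-cccc-0123456789ab \
    --instance-count 1
```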
2) You could probably save a bit on content if you put cloudfront in front of your web server. And more importantly, if you serve your static content out of S3 (instead of off your EBS volume) you'll have cheaper storage and less traffic on the web server, meaning you can go with a smaller instance.
1) You do not get charged for an elastic IP. If you launched with the option to have a public IP, then that will get changed every time you restart your instance. You only pay for Elastic IP if it's not associated with a running instance.
2) You probably don't care.
3) EBS volumes can fail. It's unlikely, but in doing AWS for 6 years I've seen it twice. So back up to S3. And since your content is going to be there anyways, you may as well serve it from there.
I'd be happy to answer any additional questions you might have.
People are fond of saying things like this without explaining what it is that I am saving or how, or what the downsides are.
And I have to change all of my URLs, which I was very clear is a non-starter.
The cost for a GB of transfer from EC2 is $0.09/GB; for Cloudfront it's $0.085. Not a huge savings. The benefit would be reducing load on your web server, so you could go with a smaller instance saving money there.
Downsides? If you update static content, you have to wait for cache to expire or tell cloudfront to invalidate its caches, which costs money after the first 1000 requests/month.
When you use Cloudfront, you do NOT have to change your URLs. So if all of your images are under /images/ you'd point that path to the S3 bucket holding your images, and have your default origin continue to point to your web server. DNS points to the cloudfront origin. This is much easier to do if you have your DNS in Route53, but that is not a requirement.
Don't bother. EBS won't fail any more frequently than your own server's drives (because that's what they are) so just stick to whatever _offsite_ backup strategy you already employ.
As for the clown fairy thing, maybe it would allow you to reduce your aws costs enough to pay for the reverse proxy but I suspect not enough to pay for the headache of having Yet Another Service to maintain.
In the Good Old Days (by which I mean Saturday) I had two backups: one backup to a second drive attached to the colo machine; and another backup to offsite. This was fine and good because it meant that should the colo machine's internal drive fail, I was not in the situation of needing to upload 2TB of data over a DSL line.
On the subject of simple ways to optimize your new setup for price and/or performance: Cloudflare?
If you don't show your math, you're not helping.
I would recommend a CDN in general if you do want to cut down bandwidth costs. Cloudflare, as recommended by the commenter above, is probably the simplest and cheapest way of setting up a CDN, but it has quite a few downsides.
In my personal experience, Cloudflare serves static content from my server 50-60% of the time, so in theory, you can do some napkin math and just subtract 50% of your static-content's bandwidth costs for a quick estimate. Also, they cache the resources on their edge servers, which allows a visitor to get your content served on a server geographically close to them, which negates the need for region hopping in many situations. This is all with the Cloudflare free plan, though it has some really fucking weird stipulations, which might turn you off from it:
- You have to use Cloudflare as your DNS provider
- You can't use your current SSL certificate, but Cloudflare will gladly generate you one for you, and put it on their edge. This has the side effect of allowing them to MitM all your "SSL" traffic, and they can basically see it un-encrypted.
If you can get over those hurdles, there aren't many other technical considerations. Generally, you copy over your DNS records into Cloudflare, switch over your domain's nameservers, and it works right out of the box. I don't know what you use for anti-spam currently within WordPress, but you'll need to reconfigure that to not use the default PHP global variable for visitor IP (most plugins these days have an option for Cloudflare's HTTP header with the visitor's IP). In WordPress as well, you might get a redirect loop when you first set it up, but switching Cloudflare to use "Full SSL" fixes it. Also, you might have to screw around with fail2ban and such that same way, and you won't be able to SSH directly to the hostname proxied by Cloudflare, and will have to use the direct IP.
I'm not a fan of Cloudflare ethically, in regards to them handling an enormous share of all of the internet's traffic, much of which they are able to decrypt as well. That said, I use them myself and have saved quite a lot of money (as I've said, it cuts my bandwidth down by 50% usually). They have paid plans with some apparently rather good features (custom SSL, edge-based apache-config-like routing rules), but I have not used them.
If you want to stay within the AWS ecosystem, AWS has their own Cloudflare-like product, CloudFront, but it's not quite as easy to set up and use, and not free. It might be a better fit for you, but I'm not as familiar with it, so I won't comment further on it.
> Cloudflare serves static content from my server 50-60% of the time, so in theory, you can do some napkin math and just subtract 50% of your static-content's bandwidth costs for a quick estimate.
This BOTE math suggests that Cloudflare is completely free, which seems unlikely. A few comments above they say Cloudfront is only a tiny margin cheaper than EC2 bandwidth.
If you just want to use Cloudflare as a CDN, it is entirely free, with the stipulations mentioned above (they become your DNS and MitM your SSL traffic). I’ve put some really high traffic websites behind Cloudflare, and Cloudflare still doesn’t even have my credit card info, nor have they asked. Cloudflare doesn’t charge for bandwidth, but rather for various features that are unnecessary for jwz’s purposes. They don’t ration their bandwidth, it’s not what they make money off of. Cloudfront (which is AWS’ Cloudflare competitor) doesn’t offer the additional features that Cloudflare does within their Cloudfront product, so they charge Cloudfront by bandwidth usage.
You don't get charged for a single elastic IP address that is connected to a live instance. You get charged if you have an additional elastic IP for the same EC2 instance or if you have one that you're not using [citation] (for roughly the same reason that some buffets charge you extra for food you take and don't eat). If I saw this on my bill, I'd check for any unattached elastic IPs.
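A quick way to do that audit from the `aws` CLI (the region here is just an example):

```shell
# List every Elastic IP in the region. An address with no instance or
# association listed is sitting idle, and idle Elastic IPs are the kind
# that show up on the bill.
aws ec2 describe-addresses --region us-east-2 --output table
```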
Elastic IPs don't change over time; that's the entire point of them. The person who said "be aware that they just go and change those on you sometimes" may have been thinking of elastic load balancers. Those have IP addresses that can and do change over time, and they come with an A record, which Amazon recommends you make a CNAME to. It doesn't sound like this is something you're using in your setup.
That is not true, and an IP address which does not change is exactly what you are paying for. I suspect your friend may have been confusing that with the dynamically-assigned external IP addresses of ec2-classic (you don't care what that means, trust me) instances, which are DHCP-assigned and sometimes do change.
Meh. The odds of the extra few ms latency of a coast-to-coast round trip being critical to your customers are basically zero. At some point when you have some spare cycles you could investigate putting all of your site's larger static elements (pngs, videos, etc) into S3 and serving them via Amazons' CDN (CloudFront) but again: meh.
EBS failures are rare, but they're not "never" -- there have definitely been data loss disasters at that layer. That said, as you've noticed, snapshots are expensive: a small script that makes a tarball of your web data directory and then copies that tarball to S3 every night would probably more than suffice for your use case.
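A minimal sketch of that nightly script, assuming the `aws` CLI is installed; the bucket name and data directory are placeholders:

```shell
#!/bin/sh
# Nightly tarball-to-S3 backup sketch; bucket and directory are placeholders.
BUCKET="s3://my-backup-bucket"
DATADIR="/var/www"
STAMP=$(date +%Y%m%d)
# Stream the tarball straight to S3: "aws s3 cp -" reads from stdin,
# so no multi-TB local scratch space is needed.
tar czf - "$DATADIR" | aws s3 cp - "$BUCKET/web-backup-$STAMP.tar.gz"
```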
Since S3 is 25% of the price of a "snapshot", why would anyone ever do their backups using a snapshot? Why do they even exist? There's got to be a catch.
s3 is not a mountable fs, it's an object store that uses https for access.
snapshots can be attached as a disk volume.
And yet fuse-s3fs is a thing that exists. Does it suck?
Yes. It's dog slow and not reliable. It maps a block store onto an object store, leading to further inscrutability when things go wrong. Avoid.
I've never used it. At a certain point, though, you'll be sysadmining your cargo-cult architecture.
It's good enough from what I've experienced - used it just fine in multi-TB read situations (pulling over 500MB/s via multiple connections to different objects). You may need to do a bit of tuning if you want that level of perf, but it's not terrible out of the box.
If your use case is backup & restore, it's just fine. Just don't use it expecting to host a file system that needs to be super-low latency (eg: one constantly dynamically loading executables, etc..). And try to use build 1.84. Older versions are less ... ideal.
(and it maps a file per object - so it's actually possible to understand).
My 2c: we use s3 to host around 60GB of media served both by a web app and ftp daemon using s3fs/fuse.
Since the web app does do access-control before serving the data to end users, we could not go the route of using the native http access to S3. Also, it's an old beast of an app, so rewriting its storage layer to use the S3 api instead of plain filesystem calls was not a solution.
The end results are:
- latency is quite high. Running `ls` on a dir with 1000 files takes ages
- if you frequently access a lot of files from s3fs, you want to disable its 'cache' features (via mount options), as it seems to have unbounded RAM usage and can lead to instance crashes
- the s3fs mount is still unreliable. It sometimes dismounts, sometimes gets stuck in a situation where all calls take 10 seconds.
We ended up with a cron job running every minute that does a simple `ls` on one known s3fs-mounted dir, and if it fails or takes more than 5 secs to respond, we dismount and remount it, which seems to fix all problems.
Using the latest version of s3fs instead of the one from debian packages might help...
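That watchdog cron job might look something like this sketch (the mount point and bucket name are placeholders, and s3fs option names vary by version):

```shell
#!/bin/sh
# Cron watchdog for a flaky s3fs mount: if a simple ls fails or takes
# more than 5 seconds, unmount and remount. Paths/bucket are placeholders.
MOUNTPOINT="/mnt/s3media"
BUCKET="my-media-bucket"
if ! timeout 5 ls "$MOUNTPOINT" >/dev/null 2>&1; then
    # Lazy-unmount as a fallback if the FUSE unmount is stuck.
    fusermount -u "$MOUNTPOINT" 2>/dev/null || umount -l "$MOUNTPOINT"
    s3fs "$BUCKET" "$MOUNTPOINT" -o allow_other
fi
```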
Balls through a straw. S3 does not expose an API suitable for faking POSIX filesystem semantics. We've almost finished exorcising it from our backend and I couldn't be happier about it.
To paraphrase Franz Kafka: there is a catch, but not for you. Or "it really depends a lot on your use case."
S3 is not a filesystem, not even a little bit. It doesn't preserve any posix metadata (owner/group/perms), it doesn't even have directories (/ characters are just part of the key name; some tools pretend it's an actual directory, but that's a UI frill), and while their availability SLA is fantastic (all the nines!) the actual latency can be...highly variable. It is just a big-ass key-value store, where the key is something that looks like a filename, and the value is a blob of data with a content-type header.
If "make a tarball of all of your important stuff and copy that to S3", with the implicit restore strategy of "copy back that entire tarball to online storage and extract stuff manually" is a reasonable plan for you (and it almost certainly is), then great: no need to spend the money on snapshots.
The use cases where snapshots start becoming interesting are:
1- You are making so much $/minute that multiple hours of outage are unacceptable: you can then maintain a rolling window of snapshots. Restoring one to a live, mountable drive is an O(n) operation vs the size of the snapshot modulo how many IOPS you're willing to pay $$$$$$ for and you can then mount that drive without further alteration and you're off and running again.
You don't care about this.
2- The data you're backing up requires strong internal consistency guarantees that you can't get with "tar zcf"-- in other words you're backing up a Real Database. EBS snapshots are O(1) to create (but O(n) to get access to: CoW magic happening behind the scenes), so you can just freeze writes to the partition for a second, create the snapshot, unfreeze writes and move on.
You definitely don't care about this: use mysqldump to create a backup file and include that in your tarball.
3- If you're creating an autoscaling group of instances for your newfangled multi-tier web application, you can create a single snapshot of the data partition and each newly created instance will get instantiated with a data partition restored from that snapshot.
Oh my god do you not care about this.
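The mysqldump-plus-tarball plan above can be a single cron-driven function. This is only a sketch: the bucket name is made up, and it assumes MySQL credentials in ~/.my.cnf and a configured awscli:

```shell
# Nightly backup sketch: consistent MySQL dump + tarball of site dirs, shipped to S3.
# "my-backups" is a hypothetical bucket name.
backup_nightly() {
  outdir="$1"; shift               # staging dir; remaining args are dirs to back up
  stamp=$(date +%Y%m%d)
  # --single-transaction: consistent InnoDB snapshot without locking tables
  mysqldump --all-databases --single-transaction > "$outdir/db-$stamp.sql"
  tar czf "$outdir/backup-$stamp.tar.gz" "$outdir/db-$stamp.sql" "$@"
  aws s3 cp "$outdir/backup-$stamp.tar.gz" "s3://my-backups/$stamp/"
}
# e.g. from cron:  backup_nightly /backup /var/www /etc/httpd
```

Restore is then the dumb, reliable inverse: copy the tarball back, extract, and feed the .sql file to mysql.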
A small postscript here:
In a few weeks/months, once you've gotten a little more comfortable with Life In The Clown, you should probably look seriously at Amazon's managed MySQL service:
* RDS instances have an API to automatically restore to any point in time (down to the second) in your automatic backup window (which can be up to 30 days) by replaying the transaction log
* Snapshots actually kinda make sense as a database backup strategy in a way that they almost certainly don't for a random webserver. Paying for just 10 MB of snapshot for your database rather than gigabytes for your server's entire data partition is much more cost-effective
* If you ever find yourself IO constrained, it will almost certainly be felt at the DB layer first, which means if you're ever going to buy reserved IOPS, it'll be most cost-effective to only target mysql, and see previous point.
* Ditto CPU: moving the database away from your server instance would probably let you run a much less expensive EC2 instance type (YMMV)
* Never again having to ask yourself "is it worth taking my website down to apply this mysql update": priceless
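For what it's worth, the point-in-time restore in the first bullet is a single API call. Instance identifiers here are made up, and note that it creates a new instance rather than rewinding the existing one:

```shell
# Restore an RDS instance to a specific second within its backup window.
# You then repoint your app at (or rename) the new instance.
rds_restore_pit() {
  aws rds restore-db-instance-to-point-in-time \
    --source-db-instance-identifier "$1" \
    --target-db-instance-identifier "$2" \
    --restore-time "$3"    # e.g. 2019-02-01T03:15:00Z
}
# e.g.: rds_restore_pit jwz-db jwz-db-restored 2019-02-01T03:15:00Z
```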
Doctor Memory provides good info. The hope with RDS is that you can shrink your ec2 instance down to a smaller / cheaper size, plus save your time not having to do many of the mundane DBA tasks. The catch with this is if you've bought an RI for your r5a.2xl and you shrink your instance, you won't get the full value of your RI purchase.
So, I have basically no idea how to determine this.
I chose 8 cores, 64 GB because that's what my colo machine was. I arrived at that configuration over years, where every now and then there would be some crisis fire drill (I post something popular, or we have a big show, or the Y Donginators get themselves all engorged, or the Great Firewall of China turns on me) and the machine starts to fall over and I upgrade it. (Plus periodically thinking, "Geez, I wish this perl script / ffmpeg encoding would run faster.")
So I know what to do when the machine is falling apart: add more.
I don't know how to answer the question, "how much less machine can I get away with" because I don't know how to test peak load. Other than by severely underpowering the machine and then forcing repeated fire drills, which sounds awful all around.
For the "Geez, I wish this perl script / ffmpeg encoding would run faster" at least you do have in AWS the option to temporarily buy more of whatever you wish you had more of, rather than feeling the need to permanently upgrade your "whole server".
From Amazon's point of view 8 cores for 6 hours is not really very different from 48 cores for 1 hour. So you can buy the latter and save yourself five hours waiting. Assuming, of course, that whatever you're doing is embarrassingly parallel.
With Reserved Instances you don't really want to scale the "main" system up and down, since that's getting the RI discount at its current size, but you can spin something new up temporarily, run the process and get rid of it. AWS Lambda does all this for you (billing you only for the time spent actually doing work) if you have the right shape of problem. The example they give is a script that runs when there's a photo put into an S3 bucket, the script makes a thumbnail etcetera and puts it into a different bucket: if you upload sixty photos at once does it run sixty scripts on sixty virtual machines? Maybe, you get billed the same either way so who cares? We use this to process phishing scam data.
I missed the thread a year ago. I suspect that if you don't have the time/money/will to refactor a lot of your code to be 'more cloudy', Amazon's infrastructure is going to be an expensive and frustrating home for you. There are far cheaper ways to get bare-metal ping/power/cooling that will be just as (if not more) reliable than AWS and 'us-tirefire-1' (although you did say you were in east-2, which I've heard is an improvement, but not by much), without having to radically refactor your entire environment.
That said, it sounds like you're committed to this course of action, at least until your second or third unplanned outage or 'holy f*ck where did that come from?!' AWS bill. Your users would definitely have lower latency if you migrated to us-west-1 or us-west-2, as well as greater reliability. You could also reasonably expect a web-based performance boost by getting a Cloudflare account and dropping that in front of your server. They have a free tier (https://www.cloudflare.com/plans/) with other plans coming in at $20 or $200/mo.
Reserved Instances (RIs) will get you the price reduction in AWS that you seek. Keeping in mind that AWS bills by the hour, the process is to basically purchase an RI for the size of EC2 instance ('VM') that you want. Then for every hour that server is up, instead of accumulating billable hours in the 'to invoice JWZ on the 1st' column, it subtracts it from the 'how many hours is left in this RI' column. Don't do a 3 year RI, the performance improvements and price reductions AWS gives year over year make it a lousy financial choice.
EBS instances ('your hard drive') do just up and vanish on occasion. If you're lucky, AWS will give you a few hours' notice to the tune of "Dear customer, we're experiencing instability on your EBS volume's master host. We'll be decommissioning it in 4 hours, please take appropriate action or prepare for downtime". If a snapshot-backup is too rich for your blood, consider using a CLI tool like 'duplicity' to throw rsync-style backups to S3 (or Backblaze's B2) block-storage thinger, and at least be able to recover from a catastrophic outage that way.
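A duplicity run of the sort described can be this simple. The bucket name is hypothetical, AWS credentials need to be exported for duplicity's S3 backend, and --no-encryption is only appropriate if you trust the transport and the bucket ACLs:

```shell
# Incremental, rsync-style backup to S3 with duplicity, plus pruning of old sets.
run_duplicity_backup() {
  src="$1"                  # e.g. /export/data
  dest="$2"                 # e.g. s3://my-backup-bucket/host1
  duplicity --full-if-older-than 30D --no-encryption "$src" "$dest"
  duplicity remove-all-but-n-full 2 --force "$dest"   # keep 2 full chains
}
```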
Good luck, and welcome to the
us-east-2b is a brand new data center in Ohio, picking up some of the load from the Ashburn, VA (us-east-1x) one that we all realized was too centralized to be useful after the Derecho wiped out power in the DC area for almost a week back in '12. That downtime cost Amazon a fortune in refunds to Netflix, and just about any service built on Heroku was dead. (talk about yelling at the cloud...)
So being brand new, it is under-utilized and that's likely why it is currently the cheapest option. Amazon will likely move a major player (probably Heroku) out there in the next year to free up space needed for the government operations they intend to run out of the new HQ-2 in Crystal City.
> I was served an eviction notice from my colo host
Why was that? Is colo just going away for smaller customers?
Don't know or care. But clearly nobody is in that line of work any more who isn't doing it as a hobby.
For what it's worth, should you decide that The AWS Way doesn't suit your style, the company formerly known as Softlayer (formerly known as ThePlanet, formerly known as EV1) and now known as "IBM Cloud" still does somewhat more classical single-instance hosting (virtual or dedicated iron) that's a bit more old school: you can actually get a KVM session with your server's serial console if you really want.
I ended up as their customer via the aforementioned series of acquisitions, and they're basically fine: IBM isn't what it once was, but their hobbies are still larger than most other companies' yearly profit margin.
A couple of years ago I was part of a team that used AWS for a VoIP/messaging platform. We had ~100 EC2 instances running (mostly t2.micro ones - probably way more than we really needed, but that's how the architecture panned out). Some random notes from that:
* Your EC2 instances will go down at random with no warning, and if you're unlucky it'll enter some weird broken state where the console refuses to let you restart it. We lost one every couple of months. AWS will sometimes email you to tell you that your EC2 instance is "degraded" and needs to be migrated to new hardware (by restarting it via the AWS console), but in at least one case we had an instance disappear off the network a few hours before AWS emailed us.
* AWS have been known to do mass-restarts of EC2 instances to patch whatever hypervisor exploit is doing the rounds.
* An autoscaling group of size 1 is excellent for ensuring that an EC2 instance always exists. We used those for instances that provided SSH tunneling or NAT routing with some scripting that ensured replacement instances took over the Elastic IP of the previous instance.
* S3 storage never failed for us. We used it to directly serve our website and demo apps with our own FQDN.
* We used one of the us-east zones and didn't notice any issues with accessing that from the UK.
So when Amazon decides to torpedo your EC2 instance what do you do? Make a new one and restore the root drive from backup? How precisely do you do that? What kind of backup?
> So when Amazon decides to torpedo your EC2 instance what do you do?
if you have advance notice, you reboot. if it's a failure, honestly, the default pattern is to use redundant system architectures, which you've made clear don't work for you.
> Make a new one and restore the root drive from backup?
Attach a snapshot instead of the failed volume to the existing instance.
or deploy a new instance from a custom AMI.
There aren't any default services that give you anything like backup/restore from tape.
because your db is on the volume, you'll have to make sure that if you mount a snapshot, the db data is usable.
You set an autoscaling group of 1. That means a new ec2 instance will pop up automatically, attach to the same ebs, and go
Note that while this is the correct answer, it's not a simple "click here to make it happen" kind of thing.
Specifically, you can't, within amazon's tooling, attach an EBS volume to an autoscaled instance: EBS devices can only be mounted on one instance at a time (under the hood these are analogous to an iSCSI SAN LUN, not an NFS mount) and even if you've set your autoscaling limits to 1/1/1, EBS doesn't know that you won't change that tomorrow.
The options are:
1. Mount the volume as part of the instance startup script (basically rc.local, except defined in AWS's API rather than locally on the instance); this will require ensuring that the instance itself has permission to make the ec2:AttachVolume API call.
2. Instead of using EBS, use...nfs! Party like it's 1989! Amazon offers a managed NFS service and you can just mount that on your instance. Also needs some fiddling with the startup scripts almost certainly, and of course running a database on nfs requires some steely nerves.
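A sketch of option 1, with placeholder volume ID, device name and mount point; it assumes the instance profile grants ec2:AttachVolume and that awscli is installed:

```shell
# Boot-time script: attach the data EBS volume to this instance, wait for the
# device node to appear, then mount it. Run from user-data or rc.local.
attach_data_volume() {
  vol="$1"; dev="$2"; mnt="$3"   # e.g. vol-0abc123 /dev/sdf /export
  # instance metadata service tells us who we are
  iid=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
  aws ec2 attach-volume --volume-id "$vol" --instance-id "$iid" --device "$dev"
  n=0
  while [ ! -b "$dev" ] && [ "$n" -lt 60 ]; do   # give the kernel up to ~60s
    sleep 1; n=$((n + 1))
  done
  mount "$dev" "$mnt"
}
```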
Our main solution is probably not helpful for you (sorry): we were using AWS Elastic Beanstalk which automates deploying your application to instances running some standardish server stack (in our case, NodeJS). This only really works for things like web servers where all the actual data is elsewhere.
We had a different solution for some instances that may be less unhelpful: for things which didn't use Elastic Beanstalk, we used a preconfigured EC2 machine image (AWS calls these AMIs) that had some software already installed along with an autoscaling group configured to run a script on newly-started instances to perform instance-specific configuration. Our scripts varied: some just called AWS APIs to grab resources like Elastic IPs, while others installed extra packages and configured them. I think the script format AWS used let us directly list packages to install and files to replace (i.e. instead of running "yum install foo", there was a "packages" section that we could list "foo" in) - it's been a couple of years since I last did anything with them.
We also had a few entirely manually-managed instances, and our way to handle outages on those was to rebuild the replacement by hand. These instances didn't store data themselves (we used AWS' managed database service) so a restore for those was purely a configuration exercise.
This may be a usable approach for you: create your own AMI that's configured how you want, with all the software you need already installed. Then you can either write an instance script to handle AWS resources (Elastic IP, EBS volumes) and add it to an autoscaling group of size 1 (much like Doctor Memory suggested below), or you could do the restore yourself by manually launching a new instance and configuring it by hand. This assumes that your data is not stored on the instance root volume but lives elsewhere (EBS volume, NFS server, S3) and so will survive your EC2 instance evaporating.
Anecdotally, micros seem to go down much more frequently than large instance sizes. My team had 100+ instances and I got a notice about one going down maybe once a year.
I get notices from our other accounts (~2000 instances) every few months, but I'm unsure how those are used. A lot of them are relatively ephemeral.
Your mileage may vary. Objects in mirror may be closer than they appear.
Possibly related to your server migration: Hotmail (or whatever MS calls it these days) decided to drop the "new comment" emails from your blog into the Junk folder. Presumably this will sort itself out once Hotmail realizes your server has moved.
> Your EC2 instances will go down at random with no warning, and if you're unlucky it'll enter some weird broken state where the console refuses to let you restart it.
Bullshit, on both counts. VMs and their hosts don't crash any more than would be expected from linux and the console works as well as minimum-wage outsourced developers can make it.
> AWS have been known to do mass-restarts of EC2 instances to patch whatever hypervisor exploit is doing the rounds.
Never experienced this without prior warning. I smell bullshit.
> An autoscaling group of size 1 is excellent for ensuring that an EC2 instance always exists.
So is not using autoscaling at all. More bullshit.
> S3 storage never failed for us.
The storage is reliable but the API is shite and the front-end fails occasionally.
> We used one of the us-east zones and didn't notice any issues with accessing that from the UK.
There is a significant amount of latency initiating a TCP connection to EC2 instances in at least us-west-2 from everywhere in Europe I've tried, and Brexit notwithstanding, the UK is and will always be in Europe.
Honestly, this is just ridiculous. I think Amazon look at 90s Microsoft with admiration as they fleece you for every penny they can while pulling the wool over your eyes, selling decades-old technology as new and shiny, and I'm first in line to shit all over them -- but the problems you describe simply don't exist.
On location: us-west-2 (Oregon) should be the same price as us-east-*
On cost optimization:
The spot prices for many of the instance types are quite low. You can submit a relatively high bid (say, the price of an on-demand instance) and you'll basically get an instance for 50+% off. I've been doing this with a few t2.micros for almost a year, and the savings are huge, but YMMV depending on instance type. From what I've seen, the market is not dynamic and basically sits at whatever floor price Amazon sets. I'd assume Amazon preempts the spot instances for basically no reason from time to time simply to prevent people from camping them, but I have not observed this behavior.
This strategy works if you can tolerate short periods of downtime, since you can simply start an on-demand instance to replace your preempted spot instance. If you want to be l33t, you could set up an AWS Lambda function to listen for the spot instance preemption notice, then request an on-demand instance to replace it. While I haven't needed to automate the failover yet (my application can tolerate downtime and I've yet to observe any from spot preemption), I'm using ECS, which does make it easier, since it automates launching my app and connecting it to the load balancer if necessary whenever a new instance enters the ECS cluster.
What about anything I described made it sound like I think it's cool for my web site to ever be down? I'm running a business.
* Do not buy any reserved instances (RI) until you're sure you're not going to make any changes that would no longer match the reserved instance. If you decide to move from Ohio to Oregon, for example, you'll still have to pay for the Ohio RI.
The secondary RI marketplace will not recoup your full spend. There are RIs that are a little more flexible, but the more flexible RI, the lower the discount. RIs are not simple.
* AWS actually bills by the second, but it will only really matter if you stop your instance. In your case, with a single instance, it won't matter.
* snapshots aren't as expensive as you're calculating, but it's really hard to project costs.
* you will need backups (on a single instance, something will go wrong), but if you are disciplined about deleting them when they're no longer needed, you can keep the costs minimized.
* Fundamentally, the only other way to reduce costs would be to re-architect and leverage more/different AWS services, like S3.
If snapshots aren't as expensive as I think, then how expensive are they and how do I tell?
More generally: how do I avoid losing my data? What's the done thing?
there are no done things for the single monolithic server in the clown. sorry, man.
I see snapshot costs being less than 10% of the EBS costs on a monthly basis. But those volumes aren't holding db data, so no idea how the de-dupe and compression and so on will work for you.
Redundancy is the done thing. Multiple servers, multiple backups, multiple vendors, automatic failover and recovery, online transaction replication. Quite a few of the things you need are products you can just buy and use transparently with a few URL proxying (not 301 redirection) rules; however, nobody offers "jwz.org as a service" because it's just too many things at once.
What did you do when hardware failed at your colo? You can probably start with that, but automate the "replace with less-broken hardware and restore backups" part.
When hardware failed at colo, it was easy to understand:
1) Disk failure? I had a backup drive attached to the machine. Replace failed drive, copy it all back.
2) Hardware failure? Put old disk in new pizza box and go.
So far everyone seems to be saying, "When Amazon fails, the only guarantee they make is that they will wipe your root drive with no way to restore it."
> When Amazon fails, the only guarantee they make is that they will wipe your root drive with no way to restore it.
Pretty much spot on. Since you're serving the old fashioned (read: simple and reliable) way, back up and restore your data the old fashioned way. Trust amazon as much as you'd trust any other third party with your precious data, except now "your" hard drive is actually theirs.
You may or may not be able to run two active nodes simultaneously (if it's cost-effective) but you should at least have no trouble simulating a setup where your server has two discs which are periodically synced. Basically what the EBS snapshots should be but probably cheaper and with less amazon smoke-and-mirrors (aka: their business model) so you don't need to figure out their snake oil while your website is down.
Is "Create an AMI instead of a snapshot" the thing that I am looking for? Or does that have some horrid downside too?
I am backing up my EC2 root drive to my sc1 drive, but I'd sure like the process of "machine died, un-die it" to be simpler than what that implies.
An AMI is a snapshot with metadata. In fact, before I joined this particular flavour of hell, new servers were launched by taking a snapshot of the most recently hacked server (beware: the "turn it off" option is enabled by default), turning that into an AMI, and then launching a new instance from the new AMI.
I'd like to think you can attach two EBS volumes to act like two individual real drives but I'm sure amazon have their own unique caveats on that. My use-case doesn't involve the decades of legacy that yours does so I dealt with this particular problem by ensuring that I could always (automatically) build our servers from scratch. Sorry.
When I used EC2 I wanted to make it as much like a regular machine as possible. I used an EBS backed root device set to persistent mode.
If it failed, I just force terminated the instance and started it again (both in the web console). It never failed in the 3 or 4 years it was running except for the times when I was deliberately testing that it worked.
I made backups using dump piped via dd over SSH. I'd have done a full restore by mounting a volume on /mnt on a temporary instance, running restore, and then reattaching the volume as the root of an instance. I never did this and might have expected some shenanigans with kernels and boot loaders.. :(
The EBS instance worked just like a virtual hard disk and everything was reliable.
It was initially fiddly to set everything up (a couple of days, no previous aws or clown experience) but no hassle after that and I lived entirely in the VM and didn't need to go back to the AWS tools for day-to-day operations. I only went there for firewall fussing and restarting the instance.
Write web content to S3 and serve from there via CloudFront. Storing 2TB in S3 = ~$46/mo. Serving 2TB/mo via CloudFront over HTTPS, plus the S3 storage charges, comes to $238/mo using the AWS Simple Monthly Calculator, assuming traffic is split evenly 50/50 to the US and Canada. There's Terraform here to configure such a CDN: https://github.com/cloudposse/terraform-aws-cloudfront-s3-cdn.
I think you just said "change all your URLs."
To be fair, you could run your webserver doing reverse-proxying with URL rewriting to hide the CloudFront URLs.
If you're going to do that, I highly recommend also introducing a monkey pedaling a penny farthing to inflate a balloon until it becomes large enough to impale itself on a spike and burst, thereby causing a chicken to become frightened and lay an egg as part of the architecture.
Naively, if you "just" throw up a CloudFront distribution with, say, all the club photos, this does "change all your URLs", which is unacceptable.
But you can just do the moral equivalent with the right URLs, S3 is API accessible, you can shove things you will rarely serve but don't want to ever lose into an S3 bucket and then fetch them on demand rather than keeping them forever on the expensive local storage. The URL doesn't change, just how you implement it, instead of https://www.jwz.org/images/2013/yellcloud.jpg being a file named "yellcloud.jpg" in a directory named "2013" and so on, you'd write some code (in Perl? I guess?) that handles all GETs of /images/ and has, say, a cache indexed on the path name and when people ask for things that aren't in the cache you get them out of S3 and optionally put them in the cache.
Maybe that's in your "Not simple" category, it's hard to tell from where I'm standing. If the club photos are just scattered randomly on a hard disk in amongst other important stuff it may not be practical at all. Likewise if it turns out that every photo is accessed on average say, once per week, it may turn out you do some math and shoving them into S3 is a bad idea. I can't do arithmetic for your business, just pointing out that "the URL must not change" does not bind you to storing everything in actual files on an imaginary local disk rather than keeping things in S3 buckets.
No Perl required--local disk cache and http reverse proxy (including rewriting the host server's URLs in HTML pages to match the original URLs) are all standard Apache 2.4 features. i.e. the server sees someone does a GET /images/foo.jpg and Apache fetches https://some.other.url/garbletoken/jwz-images/foo.jpg for the client (ProxyPassReverse), optionally cached to disk locally.
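A minimal sketch of that Apache 2.4 arrangement follows; the S3 origin hostname is made up, and mod_proxy, mod_proxy_http, mod_ssl, mod_cache and mod_cache_disk all need to be loaded:

```apache
# Serve /images/ from an S3-hosted copy without changing any public URL.
# "my-archive-bucket.s3.amazonaws.com" is a placeholder origin.
SSLProxyEngine on
ProxyPass        "/images/" "https://my-archive-bucket.s3.amazonaws.com/images/"
ProxyPassReverse "/images/" "https://my-archive-bucket.s3.amazonaws.com/images/"

# Optional local disk cache so repeat hits never leave the box
CacheEnable disk "/images/"
CacheRoot   "/var/cache/httpd/proxy"
```

Clients keep requesting https://www.jwz.org/images/..., and Apache quietly fetches (and caches) the bytes from the bucket.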
This lets you make the EBS smaller, assuming that most of the stuff in EBS is static media. That means a shorter recovery time if the EBS explodes, savings in EBS costs (possibly offset by increases in S3 and bandwidth costs), fewer downtime and data-loss events, etc. I don't know the details of the money tradeoffs here, but the sysadmin time tradeoff might be worth it for this specific customer.
Of course, whatever solution you use, you should probably keep a copy of your data in your basement, or on Backblaze, or anywhere that Amazon can't fuck with it, because accidentally losing customer data is what all storage providers eventually do.
Is your actual goal here:
A. Never change a URL that an external user might have bookmarked or written upon a piece of paper?
B. Never change your apache/nginx/whatever configuration's virtualhost/hostname/path configurations?
Both of these are reasonable goals, but you can potentially achieve the former while still using CloudFront (or another CDN) for caching and S3 for hosting some or all of your static files. (For example: dnalounge.com/flyers/ seems to be entirely a hierarchical tree of jpg/png/gif files -- stitching together a CloudFront configuration that fetched those from an S3 bucket without touching how the rest of your sites are arranged would be straightforward, and CloudFront+S3's data transfer pricing is still a touch cheaper than direct-from-EC2, and of course you could potentially serve from a much smaller EC2 instance and auto-rotate the files to even cheaper S3 storage tiers after set amounts of time.)
I don't actually advise that you do this now, but once you're a little more acclimatized to the Clown, it might be worth your while to poke at it or solicit a volunteer to do said poking.
Never change an existing URL, because I believe the "U" in that acronym actually means something.
However nobody has made the case that doing all this work to deal with CloudFront and all that it entails would actually save me a non-trivial amount of money.
All of this is obviously speculation without seeing your AWS bill and like a month of your cacti graphs, but my semi-informed guess is:
- in terms of bandwidth/storage charges, you could save a little money but probably not enough to make it worth the hassle of doing on its own. (3TB of xfer out of EC2: $270; out of cloudfront it's $255 plus a penny per https request; I suspect it's a wash.)
- but you've provisioned a really pretty beefy instance type there: $330/mo in order to run all of your everythings (apache, mod_perl, postfix, mysqld) on it. If you offload a lot of your web requests to a cache that's sometimes serving out of S3 rather than hitting your server at all (and presumably the vast majority of the traffic for dnalounge.com comes from the SF bay area so you'd only be serving through one or two cdn POPs and your hit rate would be good) you could potentially reduce the traffic to your webserver by 50-80% and be able to reasonably decide to downsize to, say, a t2.xlarge at $121/month.
Like I said elsewhere: this is probably not the time to get complicated and clever. Spend a few months getting the lay of the land here and then you can see if there's any low-hanging fruit to pluck pricewise. But there might be some!
Dude, seriously. For this kind of stuff you need to go to the source: "Administering Very High Volume Internet Services" by Dan Mosedale, William Foss, and Rob McCool, Netscape Communications Corporation, September 12, 1995. From TFA: "this paper represents our many iterations through the watch the load increase; see various failures; fix what's broken loop." Link:
@Chris - you posted a link with no link-text. Direct link to the PDF - https://www.usenix.org/legacy/publications/library/proceedings/lisa95/full_papers/mosedale.pdf
@JWZ - Looks like the third-party authorization for posting comments may currently be broken (currently on Chrome/Windows) - if I choose either WordPress or G+ authentication I get a popup window with the number 0 in it.
Currently I don't see the FB "Like" option on the front page of /blog/ entries too.
On the broken/weird front:
I started getting Data Transfer Interrupts in my usual browser a couple of days ago when trying to access this site. Some caveats: I'd been having to use the "start with http" trick to access it beforehand, and given the nature of my browser (not exactly modern), I don't know how much you should care.
(This is not the first site I've hit that error, and I'm kinda curious what causes it.)
Be careful with sc1: monitor the "Burst Balance" metric from the AWS console, or set up a CloudWatch alert for when the balance drops under 50% or so. I've been burned by a weekly backup at 3am on such a volume, and the performance is garbage until you modify the volume to st1 or gp2 (luckily you can do this on the fly, once every 6 hours, at no charge, and the performance increase is immediate; you can also go back to a cheaper volume type later).
A 2TB sc1 will at least have a pretty high IOPS/bandwidth threshold, so this might not happen to you, but if you have any scripts that read through the entire drive, you might wake up to an absurdly throttled server in the morning.
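Setting up that alert is one CLI call; the volume ID and SNS topic ARN below are placeholders:

```shell
# CloudWatch alarm: fire when the sc1 volume's BurstBalance drops below 50%.
create_burst_alarm() {
  aws cloudwatch put-metric-alarm \
    --alarm-name "ebs-burst-balance-low" \
    --namespace AWS/EBS --metric-name BurstBalance \
    --dimensions Name=VolumeId,Value="$1" \
    --statistic Average --period 300 \
    --comparison-operator LessThanThreshold --threshold 50 \
    --evaluation-periods 1 \
    --alarm-actions "$2"   # SNS topic ARN to notify
}
# e.g.: create_burst_alarm vol-0abc123 arn:aws:sns:us-east-2:123456789012:alerts
```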
I've never had an EBS volume fail (knocking on wood till my knuckles bleed). They claim it is replicated. I hope it isn't a lie. Maybe a tarball backup to S3 is a good idea, even if it is only slightly cheaper than an sc1 volume.
sc1 provides burst of up to 80 MB/s per TB (so your 2TB volume will be able to burst to 160 MB/s), but your baseline will be only 12 MB/s per TB (24 MB/s total).
So as long as you're not doing a lot of I/O you should be fine. But eventually, you're gonna get a burst of web traffic that will exceed that baseline, which will increase the load on the volume, consume all capacity, and then it'll slow right down, and your site is going to be unresponsive for a while. httpd will wait for I/O to return, more processes will spin up, and eventually you hit process limits or run out of memory & the whole system crashes.
Can't you just stick a computer under your desk where the Ann Arbor terminal was?
--Internet peanut gallery
seamus: "Provision three ships with 50 stout men and I'll sail 'round the Horn and return to ye laden with the finest gold, silks and spices..."
jwz: "I'm moving my colo server to AWS, you old fool..."
seamus: "Is that right, arrr...? Okay, give me five minutes."
Something to compare with for next year: https://azure.com/e/91e93a6e80c748da9af6e08378c47998
You can mount Azure File Storage as a CIFS filesystem, so you could dump your content there: https://docs.microsoft.com/en-us/azure/storage/files/storage-how-to-use-files-linux
If at some point URL rewrite is possible, you can integrate the Azure CDN with the storage account: https://docs.microsoft.com/en-us/azure/cdn/cdn-create-a-storage-account-with-cdn
compare the CDN features here: https://docs.microsoft.com/en-us/azure/cdn/cdn-features - 2TB served via Verizon from US-WEST is about US$165/mo . Given you aren't charging for this content it probably isn't worth it? maybe only if you start losing people because they are impatient?
The equivalent from AWS would be EFS: https://aws.amazon.com/efs/features/ but the pricing isn't great (~US$600/mo for 2TB storage). Alternatively, you could use your EC2 instance to tar everything up and send it to S3 (and then Glacier) via the CLI: https://docs.aws.amazon.com/cli/latest/reference/s3api/put-object.html but you'd be paying for the S3 storage, so get it to Glacier as fast as possible. See https://docs.aws.amazon.com/AmazonS3/latest/dev/lifecycle-configuration-examples.html#lifecycle-config-conceptual-ex3 (API), https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-lifecycle.html (console), https://superuser.com/questions/555929/moving-ebs-snapshots-to-glacier (third answer).
Sewer rat may taste like pumpkin pie, but I'll never know.
You must be new here. JWZ has been boycotting all things Microsoft for the past two decades.
I had a comment with prices and links, but it got eaten - can post again if needed.
CDN isn't worth it imo because you aren't charging for the content, only if you start losing people due to latency would it even possibly be worthwhile.
Under hopefully helpful comments:
I'd suggest switching from on-instance database serving to Amazon RDS, and downscaling the instance you currently have. RDS mainly brings quality-of-life improvements on Amazon, but it's solid, well optimized, and offers HA that you don't have to manage.
For static assets, I recommend moving them off to S3 with a proxy if you don't want redirects (arguably redirects are best practice there but you know your needs better than me).
Unless you need the IOPS I'd make the disk smaller, and probably use separate EBS disks (with regular snapshots) for your data/code and for the system. I personally haven't trusted EBS ever since the infamous crash when an intern took down a whole region, with data loss involved. EBS's guarantees are much lower than S3's.
Consider building a custom AMI with your software stack preconfigured and sticking it in an AutoScalingGroup of size 1 - this will ensure that if anything happens to the instance, it will be quickly restarted, even if it would need reinitialization.
Regarding EIP - I recommend using Amazon's Route53 to host a DNS record for it, using their proprietary "ALIAS" extension - it ensures that the A record follows any IP change in the EIP. CNAME your existing record to it.
Moving to RDS and where possible to S3 and using that to reduce the size of instance and EBS volume should provide some concrete cost savings while also ensuring higher availability and lower operational load (especially when using something like terraform, which for the specified config isn't hard).
From my experience, using hosted SMTP is pretty much a must these days. Unfortunately I can't give directions for SES, but I personally use SendGrid on multiple projects and have a working Postfix configuration whose task is "make it work with any braindamaged software, including stuff that calls out to sendmail using system()". It works and I don't have to worry about RBLs and the like.
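For what it's worth, the Postfix side of that usually amounts to a handful of main.cf lines. A sketch (the relay host and port are SendGrid's documented ones; the password map path is just the conventional spot):

```
# main.cf: relay all outbound mail through a hosted SMTP provider
relayhost = [smtp.sendgrid.net]:587
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_security_options = noanonymous
smtp_tls_security_level = encrypt
```

/etc/postfix/sasl_passwd pairs the relay host with your credentials and gets run through postmap(1).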
A very unoptimized comparison:
old - https://calculator.s3.amazonaws.com/index.html#r=CMH&s=EC2&key=calc-8BD51984-AFE0-4FEE-B414-26C31A9230B1
new - https://calculator.s3.amazonaws.com/index.html#key=calc-81714324-D90C-46D1-97DE-FAEEF18192A2
I believe further optimizations can be done, but would require knowing more about workloads involved.
Well that was fast:
So normally, "Stop" means "shut down but your root drive is not wiped". But it sounds to me like they are actually saying "yes, we're wiping your drive". Am I reading that correctly?
What is my safest and simplest course of action here, to get the machine back up without having to re-do all the manual configuration I have already done?
Short answer: you're fine as you're using an EBS volume. Just stop/shutdown the instance then start it again. The EIP is associated with the instance so will be retained.
Longer answer: It's referring to instance store volumes only. Those are ephemeral and so you lose any data on them when you stop the instance. The reason for that if you're curious is they're local to the system the instance is running on. When you start the instance again, there's no guarantee it'll be running on the same underlying host. You said you were using EBS volumes so are safe here; these are completely independent of the physical host the instance is running on.
But they call everything "EBS volumes", even the magic root volumes, don't they?
I thought that the root volume, a default 8 GB gp2 on /dev/sda1 that was created along with the Instance, was the local physical disk on the same box that my CPUs are on, in a way that other volumes are not? And that's why it gets wiped when I "terminate" (but does not get wiped when I reboot or "stop" which seems to mean "shutdown").
So are they saying that if I "stop" and "start" the instance, that's going to cause my root volume to be physically copied to a new machine before it starts up again?
You said you were running an r5a type instance and those only support EBS storage. You may have multiple EBS volumes attached with different mountpoints but being EBS volumes they're still independent of the instance itself. The root device volume is somewhat special in that it's the volume the instance boots from but it's still just an EBS volume. You may find this article insightful.
The whole stop versus terminate thing is confusing due to the weird terminology but all it really comes down to is "Stop" is equivalent to shutting down the instance. The instance is still there, it's just not running/turned on, but you can power it back up whenever you want. Any persistent resources will remain intact (EBS storage, Elastic IPs, etc...). "Terminate" implies stopping the instance but will also destroy it, and potentially EBS storage attached to it if the "Delete on terminate" setting is enabled. Other resources such as elastic IPs which are attached to the instance will be disassociated, and can be attached to another instance or also removed outright.
The email you've got is saying the physical hardware backing the instance is dying. So all you need to do is stop the instance at a convenient time, then start it again. Doing so will boot the instance on new (hopefully not also defective) hardware. It should come back up exactly as it is now as the EBS volumes will still be attached as will your elastic IP. EC2 presumably has no concept of "live migration" between physical servers, so they require customers to power-off their instances on a defective server and start them again, the latter implicitly moving them to a new server. If you ignore it, they'll do it for you eventually, possibly at an inconvenient time, possibly with data loss if you're using instance storage (but that doesn't apply to you).
AWS is ridiculously complicated, as is Azure for that matter. I have no experience with Google Cloud but assume it's similar. They're all geared towards supporting complex deployments right up to a full "datacenter in the cloud" environment. They work for simple server hosting requirements, but you still have to deal with the ridiculous amount of complexity, as you're witnessing firsthand. Even a single server means EC2 for compute, EBS for storage, possibly VPC for networking, possibly Route 53 for DNS, nevermind the insane pricing models with on-demand, reserved, spot, and so on pricing.
I'd encourage you to consider a host like DigitalOcean which I'd suggest is far more straightforward to setup and manage, while sacrificing no features you actually care about, but it may be you can't be arsed given the time already sunk into AWS ;)
TL,DR: go to the EC2 console, click on your instance. In the detail view, there will be a line for "root device". Click on it. There will be a line that says "root device type". If it says "EBS", you're golden: just stop/start the machine or run /sbin/reboot from the command line.
If it says anything else, you're potentially in a bit of a jam if you've made local modifications on the root partition (e.g. /etc/apache2). But I don't think r5a instances even support the potentially dangerous configuration here, so I suspect you're fine.
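If you'd rather check from the shell than the console, the same field is available via the CLI (the instance ID below is a placeholder; assumes the AWS CLI is configured):

```shell
# Prints "ebs" or "instance-store" for the (hypothetical) instance.
aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
    --query 'Reservations[0].Instances[0].RootDeviceType' --output text
```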
EC2 has two kinds of storage:
- EBS, which is a block device mounted from some hellish datacenter-spanning SAN-ish thing
- "Instance store", which is a bit of scratch space on the local hypervisor which is provisioned at VM boot time
EBS is persistent -- the volume lives somewhere else, and can potentially outlive the instance itself. Any EBS volume that you provisioned separately from the instance will live past the termination of any instance it's attached to, and volumes that are auto-provisioned with the instance can be configured to be preserved post-termination (but that's not the default). EBS can be slow: it's a network hop or two away from your CPU and you're sharing bandwidth with possibly noisy neighbors, so Amazon will happily charge you out the wazoo for dedicated IOPS if that's your jam.
Instance store volumes are ephemeral: if they're used as data volumes, they're for scratch data only and might not persist across a reboot and will definitely not persist across a stop/start. If your root device is kept on instance store, a stop/start will re-copy it from the AMI to wherever your instance lands, so goodbye to any hand customizations you've done in /etc. But the storage is local (or at least only going through several layers of virtualization on your local hardware rather than throwing a virtualized SAN into the mix) so it's often speedier than EBS and it's definitely cheaper.
And whether an instance uses IS or EBS for its root partition is a question of (a) whether the instance type even supports instance store volumes (some don't) and (b) whether the AMI (the base instance image) has been configured to use IS or EBS for its root. So that's fun and not unnecessarily complicated at all.
Strong suggestion: if you haven't already, enable "Termination protection" for your instance, which will require a two-step confirmation cycle before allowing it to be accidentally terminated.
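The same toggle exists in the CLI, if that's easier (placeholder instance ID again):

```shell
# Turn on termination protection; terminate requests will fail until
# it's switched back off with --no-disable-api-termination.
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 \
    --disable-api-termination
```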
Ok, it's EBS, so I'm good.
This was very confusing because when I was experimenting early on, I saw that doing "terminate" on an instance and then re-creating it required me to start with a from-scratch root partition, and I didn't see any way to say, "for this new instance, re-use the root partition from my now-terminated previous instance".
TL,DR: in general "terminate" really means "erase" and if all you want is to "power down" an instance and reboot it later (either to force it to jump from a physical host that's being de-provisioned as is happening now, or to change something about it like the instance type that can't be done while it's fully booted), what you want to do is "stop" and then "start" it. In day to day usage, "terminate, but then re-create, but with the same root partition" is really not something you'd ever do.
But if you really really wanna...
There are two ways to do that, both somewhat non-obvious:
1. Select your instance in the EC2 console. Pull down the "Actions" menu, and select Images -> Create Image. This will create an AMI ("Amazon Machine Image", predictably enough), which is basically a point-in-time snapshot of your instance's root partition that you can then use as a template to create as many clones of your original machine as you want. You get charged for storage of any AMIs you create, so don't, like, use this as a backup strategy.
2. Drill through to the EBS volume that is your root partition, and create a snapshot of it. Now you can do a number of interesting things: you can manually assemble an AMI using the snapshot as the source for its root partition. You can also, if you're insane, create a new EBS volume from the snapshot, stop a running instance, attach your from-the-snapshot volume to the stopped instance as /dev/xvda, and then re-start the instance and there: it's suddenly got your old root volume.
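Both routes have one-line CLI equivalents (IDs and names below are placeholders):

```shell
# Route 1: bake the instance's root volume into an AMI.
aws ec2 create-image --instance-id i-0123456789abcdef0 \
    --name "pre-migration-clone" --description "point-in-time clone source"

# Route 2: snapshot just the root EBS volume.
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 \
    --description "root partition snapshot"
```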
This may be the longest thread in the history of JWZ.
Not even close. Not so far. (And I'm pretty sure there are much longer ones than those.)
That was a very enjoyable correction, thank you.
Aw, have you switched to SNI certs as a consequence of the move? My creaking Android 4.x old-man-yells-at-cloud-phone can no longer access the site (for which my WhatsApp chums are probably eternally grateful as they are now no longer subjected to shares from the "doomed" or "perversions" tags). It gives me a security error as it does for many sites and I've always assumed that's the issue...
No, it's required SNI for years. I did recently turn off TLSv1, however.
Boy, you sure have a lot of fans that are some kinda aws expert
Your RSS feed doesn't seem to be updating. The most recent article in the feed I see is your post of the video of 10,000 zombies vs. the giant blender from 11/16.
wget disagrees with you.
The error my RSS reader reports:
2018-11-29 23:37:56 -0500: Download Error: Domain: SSL Type: error code: -9836 URL: https://www.jwz.org/blog/feed/
No idea what that means.
Maybe it means your RSS reader still insists upon TLSv1 which is the only SSL-related thing I've changed recently.
Also wasn't sure where to put this but thought it kind of fit here? sort of