When Amazon began their project to act as the world’s online all-in-one shop, they knew they’d need to build one hell of a data center operation to cope with the demand. And they did it. Quite brilliantly, in fact.
Then they realized that by building worldwide data centers sufficient to cope with the worst of the peak demand (think: Christmas), they’d inevitably be overbuilding, leaving 95% of the capacity free most of the time. Why not do something with all that excess data center capacity…like, rent it out to other folks?
That, so the legend goes, was the genesis of what became known as Amazon Web Services (AWS), which has now grown to encompass countless services and computers spread across numerous connected data centers around the world. Their services now power everything from e-commerce to Dropbox to the Department of Defense. Indeed, if AWS ever does suffer one of their very rare outages (the last I recall was a brief outage affecting their Virginia data center a year or so ago), it brings down significant parts of the internet.
We became a customer of AWS almost a decade ago, to help us serve up the installer images and picture disks in ComicBase over their “S3” (“Simple Storage Service”) platform. Then, when I made the decision to move my family to Nashville and we had to split the IT operations in our California office, we decided to move our rack of web, database, and email servers up to Amazon’s cloud. AWS promised to let us spin up virtual servers and databases–essentially renting time on their hardware–and assign as many or as few resources as it took to get the job done.
It took us about a month to get the move done, and it was terrifying when we turned off the power to our local server rack (it felt like we were shutting down the business). But to our great relief, we were able to walk over to a computer in our office outside our now-silent server room, fire up a web browser, go to www.comicbase.com, and see everything working just the way it should, hosted by Amazon’s extraordinary EC2 (“Elastic Compute Cloud”) and RDS (“Relational Database Service”). After a few weeks of making sure all was well, my family and I packed ourselves into a car, drove to Nashville, and the business carried on the entire time. We were living in the future.
So why, 3 years later, did I just spend the better part of a month moving all our infrastructure back down to our own servers again? Basically, it came down to cost, speed, and the ability to grow.
Bandwidth and Storage Costs
S3 (storing files up on Amazon’s virtual drives) is pretty cheap; what isn’t cheap is the bandwidth required to serve them up. If you download a full set of Archive Edition installers, for instance, it costs us a couple of bucks in bandwidth alone. Multiply by thousands, and things start adding up. The real killer, however, was the massive amount of web traffic caused by the combination of cover downloading and serving up image requests to image-heavy websites like ComicBase.com and AtomicAvenue.com. In a typical month, our data transfer is measured in terabytes–and the bandwidth portion of our Amazon bill definitely had moved into “ouch!” territory.
We were also paying the price for the promise we’d made to give each of our customers 2GB of allocated cloud storage to store database backups. When we were buying the hard drives ourselves, this wasn’t a super expensive proposition. But when we were now renting the space on a monthly basis from Amazon, we wound up effectively paying the price of the physical hardware many times over during the course of a year.
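For anyone weighing the same rent-versus-buy question, here is a minimal back-of-the-envelope sketch of the arithmetic. Every number in it is a made-up placeholder (not an actual AWS rate and not a figure from our bill), and the function names are purely illustrative; plug in the values from your own invoice.

```python
"""Rough rent-vs-buy arithmetic for cloud storage and bandwidth.

All figures used here are hypothetical placeholders, not real AWS
prices and not ComicBase's actual numbers.
"""


def monthly_cloud_cost(stored_gb, storage_rate, transfer_gb, transfer_rate):
    """Monthly bill: storage rent plus outbound data transfer (rates are per GB)."""
    return stored_gb * storage_rate + transfer_gb * transfer_rate


def months_to_break_even(hardware_cost, monthly_rent):
    """Months of rent that add up to the one-time hardware price."""
    if monthly_rent <= 0:
        raise ValueError("monthly_rent must be positive")
    return hardware_cost / monthly_rent


def times_paid_per_year(hardware_cost, monthly_rent):
    """How many times over a year of rent would have bought the hardware."""
    if hardware_cost <= 0:
        raise ValueError("hardware_cost must be positive")
    return 12 * monthly_rent / hardware_cost


if __name__ == "__main__":
    # Placeholder inputs: 5 TB stored, 10 TB served per month.
    rent = monthly_cloud_cost(
        stored_gb=5000, storage_rate=0.02,
        transfer_gb=10000, transfer_rate=0.09,
    )
    hardware = 3000  # hypothetical one-time price of equivalent drives
    print(f"monthly rent:        ${rent:,.2f}")
    print(f"break-even (months): {months_to_break_even(hardware, rent):.1f}")
    print(f"paid per year:       {times_paid_per_year(hardware, rent):.1f}x")
```

The sketch deliberately ignores power, connectivity, and the cost of your own time, all of which belong on the "buy" side of the ledger in a real comparison.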
The Need for Speed
Our situation got tougher when we decided to add the ability to have ComicBase Pro and Archive Edition automatically generate reports for mobile use each time users saved a backup to the cloud. This let us give customers the ability to always have their data ready when they viewed their collection on their mobile devices, without needing to remember to save their reports ahead of time. It’s a cool feature–one which I use all the time to view my own collection–but it required a whole new set of constantly-running infrastructure to pull off.
Specifically, we had to create a back-end reporting process (“Jimmy,” after Jimmy Olsen, the intrepid reporter of Superman fame). Jimmy’s job is to watch for new databases that have been backed up, look through them, and generate any requested reports–many for users with tens of thousands of comics in their collections. Just getting all the picture references together to embed into one of these massive reports could take 20 minutes on the virtualized Amazon systems.
Even with the “c4 large” compute-oriented server instances we wound up upgrading our Amazon account to, this was a terribly long time, and often left us with dozens of reports backed up awaiting processing. We could of course upgrade to more powerful computing instances, faster IO throughput allocations, etc., but only at an alarming increase in our already considerable monthly spend.
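To make Jimmy’s job a bit more concrete, here is a rough sketch of the general “watch for new backups, then generate reports” pattern. This is not Jimmy’s actual code: the folder layout, the `.bak` and `.report` file extensions, and the `generate_report` callback are all assumptions made up for illustration.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def pending_backups(backup_dir: Path, report_dir: Path) -> list[Path]:
    """Return backups with no report yet, or a report older than the backup.

    Results are sorted oldest-first so long-waiting users are served first.
    The ".bak" / ".report" naming scheme is hypothetical.
    """
    pending = []
    for backup in backup_dir.glob("*.bak"):
        report = report_dir / (backup.stem + ".report")
        if not report.exists() or report.stat().st_mtime < backup.stat().st_mtime:
            pending.append(backup)
    return sorted(pending, key=lambda p: p.stat().st_mtime)


def run_once(
    backup_dir: Path,
    report_dir: Path,
    generate_report: Callable[[Path], str],
) -> int:
    """Process every pending backup once; return how many reports were written.

    generate_report stands in for the expensive step (the one that could
    take 20 minutes per large collection on the virtualized servers).
    """
    written = 0
    for backup in pending_backups(backup_dir, report_dir):
        report = report_dir / (backup.stem + ".report")
        report.write_text(generate_report(backup))
        written += 1
    return written
```

A real service would call something like `run_once` in a loop with a short sleep between passes. With a single worker and a slow report step, the pending list grows whenever backups arrive faster than reports finish, which is exactly the kind of backlog described above.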
With terabytes of stored data, an escalating bandwidth bill, and all our plans for the future requiring far more resources than we were already using, it was time to start looking for alternatives.
Do it Yourself
When we launched ComicBase 2020 just before this past Halloween, we tried a very brief experiment in at least moving the new download images off Amazon and hosting them on a Dropbox share to save on the bandwidth bill.
The first attempt at this ended less than a day after it was begun, when I awakened to numerous complaints that our download site was offline, and a note from Dropbox letting us know that we’d (very quickly) exceeded a 200 GB/day bandwidth limit we hadn’t ever realized was part of the Dropbox service rules. (I could definitely see their point: they were also paying for S3 storage and AWS bandwidth to power their service–albeit at much lower rates than ours, thanks to bulk discounts they get on the astonishing amount of data they move on a daily basis.) Unfortunately, there was no way to buy more bandwidth from Dropbox, so after one more day of “maybe it’s just a fluke since we just launched” thinking–followed a day later by getting cut off by Dropbox again–we abandoned that experiment.
After a couple of days of moving the download images back up to S3 (and gulping as we contemplated the bandwidth bill implications), we wound up installing a new dedicated internet connection without any data caps, and quickly moved a web server to it whose sole purpose was to distribute disk image downloads.
Very quickly, however, we started the work to build custom data servers, based off the fastest hardware on the market, and stuffed full of ultra-fast NVMe SSDs (in RAID configuration, no less), as well as redundant deep storage, on-premise storage arrays, and off-premise emergency backup storage. All the money for this hardware wound up going on my Amazon Visa card, and ironically, I wound up with a ton of Amazon Rewards points to spend at Christmas time, courtesy of the huge hardware spend.
After that began the work of moving first the database, then the email, web, and FTP servers down to the new hardware. I’ll spare you the horrific details here, but if anyone’s undergoing a similar move and wants tips and/or war stories, feel free to reach out. The whole thing from start to end took about 3 solid weeks, including a set of all-nighters and late-nighters over this past long weekend to do the final switch-over.
As of this morning at 2AM, we’d moved the last of the servers off of Amazon’s cloud, and are doing all our business once again on our own hardware. Just before sitting down to write this, I scared myself silly once more as I shut down the remote computer which had been hosting ComicBase.com and AtomicAvenue.com on Amazon’s cloud. And once again, I started to breathe normally when I was able to successfully fire up a web browser in the office and see that the sites–and the business–were still running, this time on our own hardware.
So far, things seem like they’re going pretty well. The new hardware is tearing through the reporting tasks in a fraction of the time it used to take; sites are loading dramatically faster; and the only real technical issues we encountered were a few minor permission and site configuration glitches that so far have been quickly resolved.
Unless it all goes horribly pear-shaped in the next few days, I’ll be deleting our Amazon server instances entirely. While I’m definitely appreciating the new speed and flexibility the new servers are giving us (and I’m looking forward to not writing what had become our business’ biggest single check of each month), I still have to hand it to the folks at AWS: you guys do a heck of a job, and you provided a world class service when we needed you most. I also love that a little Mom-n-Pop shop like ourselves could access a data center operation that would be the envy of the largest corporate environments I’ve ever worked in. With the incredible array of services you now provide, it wouldn’t surprise me in the least if we wound up doing business again in the future.