So, last month I decided to bow to peer (well, “industry”) pressure and check out Amazon Web Services for myself.
That’s right, before the year 2014 was out, I finally started my own personal “journey” to THE CLOUD(TM).
I wish I could say that I experienced a “Road to Damascus” moment and that all the major (i.e. “showstopper”) concerns I had with actually having to migrate databases to THE CLOUD(TM) magically disappeared once I had actually used it myself. Maybe I had been wrong this whole time?
“Just try it and you’ll see”…
Unfortunately, like a cigarette to a teenager, it was exactly what I had expected. No more, no less.
There is no real benefit to using it aside from looking cool.
Amazon’s Free Tier
Anyway, back to the story. A few weeks ago, I signed up for Amazon’s AWS Free Tier. This allows you to run one “micro” database instance for an entire month (750 hours) under their RDS offering. They also offer other services (storage, applications) in a similar tiered pricing structure.
I spun up two very small databases – MySQL and PostgreSQL – both using the “free” editions, and I configured them so I “wouldn’t” find myself with an unexpected charge at the end of the month. For the MySQL instance, I used:
• MySQL 5.6.21 Community Edition
• NOT configured for Production
• No managed standby database in a different location
• No provisioned storage or IOPS
• 5 GB of magnetic storage
• db.t2.micro instance with 1 virtual CPU and 1GiB of RAM
• Default parameter group
• Default option group
I configured my PostgreSQL instance in a similar fashion. As you can see, I went for the absolute minimum – no bells or whistles.
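For reference, the same bare-bones MySQL instance could be created from the AWS CLI with something like the following sketch – the instance identifier, username and password are placeholders of my own, not values from my actual setup:

```shell
# Sketch: create a Free Tier-sized MySQL instance via the AWS CLI.
# Identifier, username and password below are made-up placeholders.
aws rds create-db-instance \
    --db-instance-identifier my-test-mysql \
    --db-instance-class db.t2.micro \
    --engine mysql \
    --engine-version 5.6.21 \
    --allocated-storage 5 \
    --storage-type standard \
    --master-username admin \
    --master-user-password 'change-me' \
    --no-multi-az \
    --backup-retention-period 0
```

Note that `--storage-type standard` is the magnetic option and `--no-multi-az` skips the managed standby, matching the list above.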
Connecting to the Databases
To begin with, I found it tremendously frustrating to actually access the databases. You are given an endpoint, a port number and you set your own instance name, user name and password. Easy, no?
No, actually. It took a disappointing amount of messing around – including allowing access to all IP addresses – before I could get a client to connect. I’m not a networking genius by any stretch of the imagination, but I should know enough to be able to get basic access to work.
Maybe I don’t and I’ve been kidding myself. Either way, it was certainly not a “one-click”, “smooth”, “problem-free” exercise …
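For anyone fighting the same battle: in my experience the sticking point is usually the security group attached to the instance. A sketch of the two steps that finally get a client connected – every identifier, address and endpoint here is a made-up placeholder:

```shell
# Sketch: open the instance's security group to a single client IP,
# then connect using the endpoint and port shown in the RDS console.
# Group ID, CIDR and endpoint are all placeholders.
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 3306 \
    --cidr 203.0.113.42/32

mysql -h my-instance.abcdefghijkl.us-east-1.rds.amazonaws.com \
      -P 3306 -u admin -p
```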
Using the Databases
With that unpleasantness out of the way, I connected to my cute little databases and checked the basics. Naturally, when you have a minuscule database on a VM which derives its specs from Gateway desktops circa 1999, I/O is not going to be blazingly fast. To be honest, nothing IS – but especially not I/O.
And so it proved. Even though the data volume was very small, I certainly wasn’t going to set any data-speed records with this puppy, not that I was expecting to. After all, that’s life in the cheap (free) seats, right?
After, at most, an hour of playing around, I shut down both of my databases and had the occasional daydream about how I might actually use one of them for a hobby project when I had more time available. I set up alerts through the AWS dashboard to warn me when I had exceeded my Free Tier usage, but as everything was shut down, I really didn’t think any more of it.
Imagine my surprise when, a couple of weeks later, I was notified that AWS had charged my credit card for exceeding the usage limits. Surely the hour of tinkering around didn’t somehow equate to 750 hours – there must be some mistake?
There wasn’t. It turns out that your “usage” starts from whenever you CREATE your database (or other service) and only stops when you DELETE it entirely.
Whether you have it up or down doesn’t matter, Amazon consider you to be “using” the service and charge you accordingly.
I created two databases for a reason – I wanted to test the impact running two databases had on my “usage”. Not altogether surprisingly, it turns out that you get 750 hours of COMBINED usage with this plan. I had been “using” TWO databases for a couple of weeks by this point, so I should have divided those 750 hours by 2 to get my “real” usage limit, which was 375 hours each (or 2 weeks round-the-clock).
This, coupled with the unique redefinition of “usage”, caused me to incur charges at around the two week mark even though the databases were down 99.9% of the time.
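The arithmetic behind that billing surprise is simple once you accept the rules: the allowance is shared across instances, and the meter runs around the clock whether the database is up or not. A quick sketch:

```python
# Free Tier billing maths: 750 "usage" hours per month, shared across
# all instances that merely EXIST (up or down, it makes no difference).
FREE_TIER_HOURS = 750
HOURS_PER_DAY = 24

def days_until_charges(instance_count: int) -> float:
    """Days before the combined allowance runs out, given a 24/7 meter."""
    return FREE_TIER_HOURS / instance_count / HOURS_PER_DAY

print(days_until_charges(1))  # 31.25 days: one instance lasts the month
print(days_until_charges(2))  # 15.625 days: two instances bill at ~2 weeks
```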
I caught this quickly thanks to Amazon’s own monitoring and the charge was small ($8), but I was still less than amused. This was a “micro instance” (actually, two of them…) which I had barely used, after all.
Imagine how careful you’d have to be if you moved an actual working database to THE CLOUD(TM). Not only would you have to be aware of “usage”, but you’d have to keep track of bandwidth and I/O, AND you’d have to multiply all of this if you wanted your data in multiple availability zones.
• Now imagine doing that with 10 databases?
• What about 20?
• How does 50 databases and 10 application servers sound?
RDS = No DBA?
The RDS service “handles” backups and minor upgrades “for you” in that you schedule windows for both. However, there’s no sanity check to stop you from stating that backups must run between 3am and 3.15am only every Sunday morning. So what? Well, imagine you’re only keeping one month of backups for your small-to-medium, but very important database, but not a single one has completed successfully because it needed more than 15 minutes to run.
That’s right – no usable backups of your database.
At least you can blame the vendor, right? No, they just followed YOUR instructions. Ultimately, YOU still have to support your systems, whether they’re in THE CLOUD(TM), in a co-located facility or in your own data center.
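Configuring exactly that kind of foot-gun takes one command. A sketch, with a placeholder instance identifier:

```shell
# Sketch: ask RDS for a very tight daily backup window and 30 days of
# retention. Nothing warns you if the backup needs longer than the window.
aws rds modify-db-instance \
    --db-instance-identifier my-important-db \
    --preferred-backup-window 03:00-03:30 \
    --backup-retention-period 30 \
    --apply-immediately
```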
What if you wanted to process any kind of real-world workload? The largest database currently available in AWS is called the “4XL” instance type and can use 3 TB of storage on a server with 8 virtual CPUs.
Not very impressive, is it?
Imagine you have a performance problem processing data. Who do you usually go to first?
Why, the Default Blame Acceptors, of course. In this case, though, there’s not much they can do because Amazon configures the database using a very small set of “configuration bundles” or parameter settings typical of your database “size”.
The DBA has no ability to tune the database beyond changing its configuration bundle from a “small database” to a “medium database”. You can’t tweak the optimizer index cost settings to suit your workload – it’s one-size-fits-all.
Partitioning is a really neat feature, isn’t it? How about RAC? Database In-Memory? What if you need to use Advanced Security because of compliance regulations?
Well, you’re out of luck with RDS as none of these options are available to you. To illustrate quite how archaic the technology is, note that AWS recently announced they would make Statspack (not AWR) an “option” for customers going forward.
Maybe there’s more to this “DBA” job than those sales types might have you believe after all…?
Of course, that’s the point: cloud vendors work on the basis that everything can be simplified and standardized to realize the cost savings. They have to treat everything the same so they can make money.
Taking Our Heads out of the Clouds
The drawback to this over-simplification is that the real world isn’t like that. How many application managers do you think would accept being told “we can’t tune the database for your application, you have to tune the application for the database”? How many databases do you have in your company which are – or could be – set up identically? How many do you have which fall under some sort of regulatory compliance?
Small companies may have no choice but to use THE CLOUD(TM). They can’t afford their own data center or co-lo and they sometimes have to go without some of the features they actually need because they simply don’t have the money. For such companies, perhaps THE CLOUD(TM) might be a viable option.
I can see that larger organizations might leverage THE CLOUD(TM) for test or development databases to save time “waiting on the DBA team to refresh the environment”. The theory is that you can “spin up” systems “on demand”, so if you need a throwaway environment for a short while, it could be useful.
Where does the “gold image” to use as a template come from, though? You can’t just start up a copy of Production because it contains live data – you have to cleanse it first. Performance will be different on THE CLOUD(TM) than on a physical server, so you’ll need to provision a Production-like QA/UAT environment in your data center for performance testing.
And how do you deal with the age-old developer complaint of “we HAVE to test in Production, that’s the only database big enough/with the right data“?
What Would YOU Do?
Do you trust THE CLOUD(TM) to be secure, to provide cost certainty or to be available when you need it?
Would you put your personal financial data on it? If not, why are you considering moving your company’s databases there?
THE CLOUD(TM) is great for me as an INDIVIDUAL. I can avoid creating a virtual machine on my laptop to use as a sandbox and use a small managed database instead. I can run a small application server as a service or store reference documents and scripts which I can get at anywhere. I can use it for consistent application configuration and for small, “hobby” projects. I don’t put anything sensitive there.
To advocate it as a silver bullet for your IT department’s “frivolous” spending is to try and force an enormous square peg into the tiniest of triangular holes. Is data a business asset or purely a business cost? If it’s the latter, are you sure?
When it comes down to the details, using THE CLOUD(TM) to host business data doesn’t make sense for anything other than small, niche environments.
I still have no idea who came up with the idea that moving enterprise data to THE CLOUD(TM) was a good idea or why they did it.