Over the last six weeks or so, I’ve been voraciously reading about Big Data, NoSQL, Hadoop and other emerging data management technologies which are clearly The Next Big Thing™ in IT. Personally, I find it fascinating to learn and think about how we will integrate different volumes and scales of data – both structured and unstructured, big and small – to give our organizations a clear, competitive edge in their industries.
As with all Next Big Things™, there are a lot of tremendous blogs and white papers written by some extremely smart and insightful people – and there are a lot of articles in “trade mags” which do nothing but fuel the runaway hype.
You can usually spot the paid advertisements masquerading unconvincingly as “industry” articles a mile off. They’re just as obvious as the “buzzword bingo” blog posts and are often liberally sprinkled with “free”, “cheap” and “open source”, along with a hearty and conspicuous implication that you can replace an expensive Oracle environment with a “solution” that only costs you the price of some cheap, commodity hardware.
The perfect example of this is an article posted today by The Register, “Oracle’s NoSQL nightmare MongoDB goes to version 2.6”.
“Nightmare”? Interesting. Nothing like using a heavily pejorative term as the third word in your article…
While NoSQL is now the talk of the town (but hasn’t yet established its bona fides), it’s probably a little bit on the early side to consider MongoDB as Oracle’s nemesis – especially when you actually read the article:
“Before delving further into the release, it’s worth pointing out that MongoDB currently has database-wide write locking, which means the entire system can accept only a single write at a time. This is a bad thing, as it means if the database has a very high rate of access, then multiple concurrent writes end up being serialized.
The company says it hopes to make “massive improvements to concurrency” in MongoDB 2.8, so admins keen to gain this capability will be waiting for a while longer.”
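To make that limitation concrete, here is a toy Python sketch of my own. It is NOT MongoDB's actual implementation; the `ToyStore` class and its `per_collection_locks` flag are hypothetical names invented purely for illustration. With one database-wide lock, a write to one collection has to queue behind a write to a completely unrelated collection; with finer-grained locks, those writes can proceed independently.

```python
import threading


class ToyStore:
    """Hypothetical in-memory store used only to illustrate lock granularity."""

    def __init__(self, collections, per_collection_locks=False):
        self.data = {name: [] for name in collections}
        global_lock = threading.Lock()
        # Either every collection shares one lock (database-wide locking),
        # or each collection gets its own lock (finer-grained locking).
        self.locks = {
            name: threading.Lock() if per_collection_locks else global_lock
            for name in collections
        }

    def write(self, collection, doc):
        # Only one writer at a time may hold the lock guarding this collection.
        with self.locks[collection]:
            self.data[collection].append(doc)

    def shares_lock(self, a, b):
        # True when writes to a and b are forced to take turns.
        return self.locks[a] is self.locks[b]


def run_writers(store, writes_per_collection=100):
    """Start one writer thread per collection and wait for all of them."""

    def writer(name):
        for i in range(writes_per_collection):
            store.write(name, {"i": i})

    threads = [threading.Thread(target=writer, args=(name,)) for name in store.data]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


coarse = ToyStore(["orders", "users"])
fine = ToyStore(["orders", "users"], per_collection_locks=True)
run_writers(coarse)
run_writers(fine)
```

Both stores end up with the same data; the difference is that `coarse.shares_lock("orders", "users")` is true, so its two writers were serialized even though they never touched the same collection.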
Keep reading and you may notice this:
“One particularly new powerful query feature is “index intersection”, which means “MongoDB can use the intersection of multiple indexes to fulfill queries,” according to a FAQ. Previously, MongoDB was mostly restricted to single indexes for most queries”.
As a relational DBA (I like Oracle, I admit it!), it’s difficult not to read these quotes with a large helping of schadenfreude.
To do so is to miss the point, though – MongoDB isn’t (yet) targeting the relational database market at all.
It’s a bit unfair to compare Oracle’s 10+ years’ worth of supporting index joins with MongoDB’s new “powerful … index intersection” feature, but it does illustrate the fallacy of some people’s assumption/hope that NoSQL and Hadoop are going to entirely replace all of their Oracle systems and save them a LOT of license fees.
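For anyone who hasn't met the idea before, "index intersection" boils down to something like the toy Python sketch below. This is my own simplified illustration, not how MongoDB or Oracle implement it: `build_index` and `find_with_intersection` are hypothetical helpers, and each "index" is just a dictionary mapping a field value to the set of matching document positions.

```python
from collections import defaultdict


def build_index(docs, field):
    """Map each value of `field` to the set of positions of docs holding it."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        index[doc[field]].add(doc_id)
    return index


def find_with_intersection(docs, indexes, criteria):
    """Answer an equality query by intersecting one single-field index per field."""
    candidate_ids = None
    for field, value in criteria.items():
        ids = indexes[field].get(value, set())
        # The first index seeds the candidates; each further index narrows them.
        candidate_ids = ids if candidate_ids is None else candidate_ids & ids
    return [docs[i] for i in sorted(candidate_ids or ())]


docs = [
    {"status": "A", "region": "EU"},
    {"status": "A", "region": "US"},
    {"status": "B", "region": "EU"},
]
indexes = {field: build_index(docs, field) for field in ("status", "region")}
matches = find_with_intersection(docs, indexes, {"status": "A", "region": "EU"})
```

Neither index alone answers the query, but the intersection of the two sets of positions does, without scanning every document.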
Oracle IS worried about NoSQL (that’s why they offer their own distribution) but MongoDB isn’t likely to do what Teradata, IBM and Microsoft have so far failed to – besting Oracle at large parallel processing of relational data. Unless you have large in-house teams who develop and support their OWN data management systems (see the proponents of “WebScaleSQL” for examples), it’s going to be very hard to entirely get rid of licensing relational database software any time soon.
I may be proven wrong, but I believe a lot of the Big Data hype is just vaporware (there’s a tenuous link to the cloud there somewhere). Absolutely, we are looking at data volumes orders of magnitude larger than what we’ve dealt with before, and the standard relational database model is ill-suited to processing what we currently think of as “Big Data”.
On the other hand, does every company HAVE to go “Big”? Probably not. First of all, we still need to do what we always have: determine the business requirements for Big Data, architect the infrastructure to meet those needs, ensure scalability, and turn that raw data into something of value to the business.
When considering infrastructure, there is the assumption, which I think is premature, that “Oracle databases won’t be able to handle Big Data”. It’s true that how we CONFIGURE databases as of 2014 won’t work, but that’s not to say that within the next 2-3 years, we won’t have the option to tune them to better handle large volumes of unstructured data and/or the structured data that we see today.
A case in point is Exadata. It’s no secret that Oracle had data warehousing in mind when they first released their Exadata machine – I spent 2.5 years working on a V1, so trust me when I say it wasn’t meant to handle OLTP workloads. As such, a lot of people, especially those who worked with it at the start, think “Exadata = data warehouse”.
A reasonable opinion, but with the substantial annual hardware upgrades to Exadata, Oracle have been able to frame Exadata machines as consolidation platforms for a combination of OLTP, DW or hybrid workloads, depending on how you decide to tune them.
Exadata IS expensive, but with Exadata Hybrid Columnar Compression (EHCC), Write-Back Flash Cache, Flash Cache Compression and some judicious pinning of objects, you CAN store your entire database in the flash memory of the storage cells AND you still get the “standard” benefits of Exadata, such as the storage cell offloading.
Though I expect this to change, the hardware required for a viable, in-memory, highly-available database is ALSO expensive today. Is it as expensive as Exadata? I don’t know – it probably isn’t – but, on top of this hardware cost, you also need to factor in the “soft costs” such as replacing that hardware, licensing for the software on top of it and your support teams’ ability to support the system.
When it comes down to it, I’m not sure that this in-memory solution (for the time being) is all that much cheaper than an Exadata machine. Not only that, but “cost” and “value” can be very different things indeed.
Oracle’s flagship database has seen some major changes within the last five years (Exadata, multi-tenancy, in-memory options) and their strategy with Big Data is impressively cohesive. Look at this week’s press releases from Pivotal and Teradata and you’ll see that other major vendors agree with Oracle about the potential in this approach.
To discount using Oracle software because of an assumption it will only ever provide what it provides now is short-sighted, just as Oracle would be if they WEREN’T evolving with this “Big” new world.
Relational data isn’t going to magically disappear. Instead, it will be complemented by unstructured data and integrated with it into an organization’s strategic data management environment. Whichever tool does the job and integrates the best will win.
Advocates of both “sides” can be guilty of falling into the mental trap of “SQL VERSUS NoSQL” and “structured VERSUS unstructured”.
As someone who hopes to be involved in data management for some time into the future, I find it far more exciting – and accurate – to think instead of “SQL AND NoSQL” and “structured AND unstructured”. The two disciplines are not mutually exclusive!