Oracle today announced their in-memory database option: Oracle Database In-Memory (ODIM).
Not to be overly dramatic (but…) – if this is for real, this could be a very big deal indeed for Oracle. I was somewhat unconvinced by some of the pre-release hype, but it looks like it might have been justified.
The performance improvement potentially dwarves the difference that Exadata made when it arrived on the scene. Of course, this was a super-slick presentation from Larry Ellison and the proof is always in the pudding. However, there are some SERIOUS players in the industry who are confirming that this IS as good as it sounds, if not better.
A datasheet comparing ODIM and SAP HANA – which, before today, was the current gold standard for in-memory databases – can be found here.
Being the curious type (?), it wasn’t long before I had a bunch of questions:
- How does the ODIM impact Exadata and its storage cell offloading?
- Is the columnar compression a sort of “lite” version of the Exadata Hybrid Columnar Compression?
- Where does this leave the Exalytics In-Memory Machine? Is the column cache a part of the SGA like the row (buffer) cache? Will it be put in HugePages in Linux?
Here are some observations I made while watching the presentation:
- Can be deployed against any 12c database with the “In-Memory” cost option.
- Available “next month” across all hardware platforms.
- Can be deployed on a single-instance database, a RAC clustered database or an engineered system, such as Exadata.
Row and Columnar Formatting
- Maintains standard row format (good for OLTP) and also presents a columnar format in memory (VERY good for OLAP).
- Changes to the columnar data do NOT get stored, so no extra write overhead required.
- Changes to the row data gets stored like they do today, ensuring ACID, etc.
- Memory is divided into “row cache” (buffer cache) and “column cache” (for columnar processing).
- 2x-20x compression enables more data to be stored in the column cache.
- Each CPU core can scan local in-memory columns at a rate of billions of rows per second.
- Ideal for “mixed workloads” – OLAP queries being run against an OLTP system.The expected improvement (the “goal”) is to make OLAP queries at least 100x faster.
- This is a conservative estimate as a number of use cases result in 1000x improvement.
- Customers should be “very disappointed” if they “only” improve OLAP queries by 10x.
- This improvement allows for real-time analytical queries.
- A business which gets faster answers to their questions can ask MORE questions MORE OFTEN (and more “on the fly”).
- Also allows for “instantaneous” queries on OLTP AND EDW processing.
- Because of the improvement for OLAP queries, there is no longer a need for analytical indexes. In fact, Oracle expects that you remove these entirely.
- Removing these analytical indexes will improve OLTP processing by 1-3x – mileage will no doubt vary and assumes a mixed workload.
Fault Tolerance, High Availability and Disaster Recovery
- The column cache is duplicated across at least 2 nodes, making it fault tolerant – the system can lose one node and continue processing (unlike other in-memory databases).
- Other in-memory databases are NOT fault tolerant – the loss of one node will bring the system down or, at least, prevent the user accessing some of the data.
- Some in-memory offerings are single-node only, but Oracle’s In-Memory can be scaled out.
- If In-Memory is used on a RAC and Data Guard environment, it automatically runs Oracle’s Maximum Availability Architecture, allowing it to make use of the resilience of RAC and the DR capabilities of Data Guard.
The Big Memory Machine
- M6-32 Big Memory Machine.
- Huge. Massive. Beast.
- If I’m REALLY good this year, will Santa put this in my stocking?
- 32Tb of RAM.
- Can achieve processing speeds of 3Tb/sec.
- Set a THREE separate world records for data processing in ONE query.
Implementation of In-Memory Database
- Allocate memory for the column cache.
- Identify which objects should be cached in the column cache.
- Remove all analytical indexes.
- Need to “eventually” re-organize the column cache.
- NO CHANGE is required to the application code – it is entirely transparent to the application.
- “uniquely achieves better performance with no changes”
- “from batch to real-time systems”
- “allows your business to become a ‘real-time enterprise’”
- “cloud-ready” – as long as you use 12c’s multitenancy option (which, by the way, is a COST option)
- 1.9bn-row analytical batch job improved by 257x – from 58 hours to 13 minutes.
- PeopleSoft Financial Analyzer processed 290m ledger lines across 250 business units through 7 steps 1300x faster – from 4.3 hours to 11.5 seconds.
- OTM batch job data on 145m status records, 60m shipment data records and 16k drivers 1030x faster – from 16 minutes to sub-second.
- 104m JD Edwards sales order lines were processed 1700x faster – from 22 minutes to sub-second.