In the last couple of months, Microsoft has repackaged its Parallel Data Warehouse appliance and now offers a new Analytics Platform System (APS) instead.
APS is a turnkey/black-box appliance which offers the PDW, a Hadoop cluster (HDInsight, based on the Hortonworks distribution) and an integration feature called PolyBase – similar to Oracle’s Big Data SQL – that allows the PDW to query the Hadoop data.
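From what Microsoft has published, PolyBase works by exposing HDFS files as external tables that T-SQL can then query and join against ordinary relational tables. A rough sketch of the shape this takes – the cluster address, file layout, and table/column names here are hypothetical, and the exact DDL differs between APS versions:

```sql
-- Point the PDW at the Hadoop cluster (address is hypothetical)
CREATE EXTERNAL DATA SOURCE HadoopCluster WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://10.10.10.10:8020'
);

-- Describe how the raw files are laid out
CREATE EXTERNAL FILE FORMAT TextDelimited WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',')
);

-- Expose an HDFS directory as a table the PDW can query
CREATE EXTERNAL TABLE dbo.WebClicks (
    UserId    INT,
    ClickTime DATETIME,
    Url       NVARCHAR(400)
) WITH (
    LOCATION    = '/data/clicks/',
    DATA_SOURCE = HadoopCluster,
    FILE_FORMAT = TextDelimited
);

-- Join DW sales history against the Hadoop click data in one query
SELECT s.CustomerId, COUNT(*) AS Clicks
FROM dbo.Sales AS s
JOIN dbo.WebClicks AS w ON w.UserId = s.CustomerId
GROUP BY s.CustomerId;
```

The appeal is that analysts keep writing T-SQL and the engine decides what to push down to Hadoop.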
The PDW part still uses a shared-nothing architecture like Teradata’s, which I expect will provide linear scalability, but it also requires a precise data model and means that access to certain parts of your data may not always be available in the event of a compute node crash (unless you have a spare compute node standing by, etc.).
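By “precise data model” I mean that, as with Teradata, you have to choose a distribution key for each large table so that rows spread evenly across compute nodes and big joins stay node-local. In PDW’s T-SQL that looks roughly like this sketch (table and column names are made up):

```sql
-- Distribute the fact table by a high-cardinality key so rows spread
-- evenly across compute nodes; a skewed key undermines linear scaling.
CREATE TABLE dbo.Sales
(
    SaleId     BIGINT,
    CustomerId INT,
    SaleDate   DATE,
    Amount     MONEY
)
WITH (DISTRIBUTION = HASH(CustomerId));

-- Small dimension tables are typically replicated to every node instead,
-- so joins against them need no data movement between nodes.
CREATE TABLE dbo.Region
(
    RegionId INT,
    Name     NVARCHAR(100)
)
WITH (DISTRIBUTION = REPLICATE);
```

Get the distribution key wrong and queries pay for data shuffling between nodes, which is where the “precise data model” requirement bites.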
Three vendors sell the appliance: HP, Dell and Quanta.
“No tuning is required because the appliance is already built and tuned to balance CPU, memory, I/O, storage, network, and other resources. Simply configure PDW for your own network environment, and it is ready to use.”
“PDW’s cost-based query optimizer is the “secret sauce” that makes parallel queries run fast and return accurate results.”
It seems a bit underwhelming that the “secret sauce” is a cost-based optimizer. Exadata’s “secret sauce(s)” are the SmartScans/storage indexes, whereas Teradata’s differentiator is the linear scalability of its shared-nothing architecture.
I suspect the real “secret sauce” for APS is also, in fact, the linear scalability, but the marketing types obviously know that they need something that separates their offering from Teradata.
It’s great that they let users query the Hadoop data via SQL through their DW, but here’s what’s a bit odd: both the PDW data and the Hadoop data live in the same appliance.
There is no ability to scale out your Hadoop cluster independently of how you scale out your DW system – you have to buy another APS appliance.
If you just want more Hadoop, you have to buy another appliance, whether you need the extra DW capacity or not.
If you just want more DW power, you have to buy another appliance, whether you need the extra Hadoop capacity or not.
You’d think that this would change in the future, but right now, it seems like this could put people off as it eliminates the key selling point of Hadoop: unstructured data on cheap, scalable hardware.
It could be a useful product for companies that have small data warehouses and expect to have small “Big Data” implementations – such as a company that wants to capture data from Facebook, retain its sales history in a DW, and pull reports/analytics across both. Maybe that’s the target market for this?