09.24.2008 :: Ben Werther
Category:: September
Product Perspective: What is Oracle Exadata? Just Oracle RAC and a whole lot of marketing!
Oracle has struggled to find a competitive foothold against data warehousing leaders Teradata, Netezza and Greenplum. With Exadata, Oracle is claiming to have a solution, but the details tell another story. While they’ve cut EMC out of the picture, the new Oracle and HP solution is weighed down by the same overly complex shared-disk architecture that Oracle RAC customers know well. Oracle knows the marketing story they want to tell, but they are no closer to building a proven shared-nothing architecture like Greenplum’s that can economically scale from terabytes to petabytes with massively parallel query performance.
Oracle has been getting beaten badly in the high-end warehousing space. They've had a number of issues working against them:
- Oracle is an OLTP database at heart. It has never been good at running large parallel analytical queries. Anyone with experience trying to implement a 50TB Oracle warehouse will tell you that it takes rocket-science tuning to get it to behave.
- Oracle's scale-out story is RAC, which requires a big and expensive shared-disk infrastructure (i.e. a SAN) for coordination. The problem is that SANs only deliver ~4 GB/s of outbound bandwidth, so I/O scalability is inherently capped. RAC is also a bear to configure and has been known to have some stability problems.
- Oracle is competing against leading vendors such as Teradata, Netezza and Greenplum, who are masters of the massively parallel shared-nothing architecture -- the only architecture that's known to work at these scales. Oracle has the wrong architecture, and they just can't compete technically.
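To put the bandwidth cap in perspective, here is a rough back-of-envelope sketch in Python. Only the 50TB warehouse size and the ~4 GB/s SAN figure come from the points above. The node count and per-node disk bandwidth for the shared-nothing case are made-up illustrative numbers, not measurements of any product, and the function names are hypothetical.

```python
# Back-of-envelope full-scan times.
# From the post: 50 TB warehouse, ~4 GB/s SAN bandwidth.
# ASSUMPTIONS (illustrative only): 40 nodes, 1 GB/s of local disk bandwidth each.

def scan_seconds(data_tb: float, bandwidth_gb_per_s: float) -> float:
    """Seconds to read data_tb terabytes at the given bandwidth (1 TB = 1000 GB)."""
    return data_tb * 1000 / bandwidth_gb_per_s


def shared_nothing_bandwidth(nodes: int, per_node_gb_per_s: float) -> float:
    """Aggregate scan bandwidth when every node reads only its own local disks."""
    return nodes * per_node_gb_per_s


# Shared disk: every database server pulls data through the same ~4 GB/s pipe.
san = scan_seconds(50, 4.0)

# Shared nothing: bandwidth grows with the number of nodes (hypothetical figures).
mpp = scan_seconds(50, shared_nothing_bandwidth(40, 1.0))

print(f"shared SAN:     {san / 3600:.1f} hours")    # 3.5 hours
print(f"shared-nothing: {mpp / 60:.1f} minutes")    # 20.8 minutes
```

The exact numbers matter less than the shape: in the shared-disk case, adding database servers does not change the first figure, while in the shared-nothing case, adding nodes shrinks the second.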
So what has changed with the Exadata announcement? Once you cut through the marketing, this is really about swapping out EMC storage for HP commodity gear, taking money from EMC's pocket and putting it in Oracle's. Sure, there’s a little more I/O bandwidth, and very simple queries (e.g. basic table scans) may be faster. However, this is just the low-hanging fruit. The rest of RAC is unchanged, with significant parallel query planning and execution problems, and it is sorely in need of re-architecting so that multiple servers can work together effectively to plan and execute queries with complex joins and aggregations. Expect extensive tuning of query plans to be required to keep it from melting down on anything mildly complex.
Oracle RAC has a fundamental limitation due to its shared-disk architecture: it requires that the Oracle Database servers be separated from the storage (i.e. a SAN or Exadata Storage Servers), because each database server needs to be able to see all the data in order to function (figure 1). This means that the processing happens on an entirely different box from the one accessing the data on disk. Oracle’s ‘intelligent storage’ story (i.e. predicate pushdown) is a baby step toward reducing this deficiency by pushing around less data, but it doesn’t address the core issue.
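As a rough illustration of what predicate pushdown buys, here is a toy Python model that counts how many rows travel from storage to the database server with and without it. This is a sketch of the general idea only, not how Exadata is implemented. The table, the predicate, and every function name are hypothetical.

```python
# Toy model of predicate pushdown. All names and data are hypothetical.
# We count how many rows cross the wire from storage to the database server.

ROWS = [
    {"id": i, "region": "EU" if i % 10 == 0 else "US", "amount": i}
    for i in range(1000)
]


def scan_without_pushdown(predicate):
    """Storage ships every row; the database server does the filtering."""
    shipped = list(ROWS)
    result = [row for row in shipped if predicate(row)]
    return result, len(shipped)


def scan_with_pushdown(predicate):
    """Storage applies the filter first and ships only the matching rows."""
    shipped = [row for row in ROWS if predicate(row)]
    return shipped, len(shipped)


def is_eu(row):
    return row["region"] == "EU"


plain, plain_shipped = scan_without_pushdown(is_eu)
pushed, pushed_shipped = scan_with_pushdown(is_eu)

print(plain == pushed)                  # True: same answer either way
print(plain_shipped, pushed_shipped)    # 1000 100
```

In this model the filter is the only work that moves to the storage side. Anything downstream of it, such as joins and aggregations, would still run on the separate database tier against whatever rows were shipped.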
By contrast, Greenplum’s shared-nothing architecture means that the database lives on the storage nodes (figure 2). This allows it to fully parallelize SQL, MapReduce and R (for statistical analysis), and do state-of-the-art analysis directly against the data, which makes Oracle’s ‘intelligent storage’ look awfully dumb by comparison. Until Oracle is able to run the entire database directly against their disks (i.e. a shared-nothing architecture), they are going to be living with fundamental processing limitations.
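The shared-nothing pattern can be sketched in a few lines of Python. This is a generic illustration of distributing rows across segments, aggregating locally, and merging small partial results. It is not Greenplum code: the function names are hypothetical, and the segments here run one after another in a single process, where a real system would run them in parallel on separate machines.

```python
from collections import Counter, defaultdict

# Generic shared-nothing aggregation sketch. All names are hypothetical.


def distribute(rows, num_segments, key):
    """Hash-distribute rows so that each segment owns a disjoint slice of the data."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[hash(row[key]) % num_segments].append(row)
    return segments


def local_aggregate(segment):
    """Runs where the data lives: SUM(amount) GROUP BY region over local rows only."""
    totals = defaultdict(int)
    for row in segment:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def merge(partials):
    """The coordinator combines small partial results, never the raw rows."""
    final = Counter()
    for partial in partials:
        final.update(partial)   # Counter.update adds values key by key
    return dict(final)


rows = [
    {"id": i, "region": "EU" if i % 10 == 0 else "US", "amount": i}
    for i in range(1000)
]

segments = distribute(rows, num_segments=4, key="id")
result = merge(local_aggregate(segment) for segment in segments)

print(result["EU"], result["US"])   # 49500 450000
```

Each segment sends back at most one small row per region, so the traffic to the coordinator stays tiny no matter how many raw rows each segment holds.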
We welcome Oracle to the game, and we agree with their fundamental premise: that the traditional Oracle approach doesn’t cut it and customers need a highly scalable solution, built on commodity hardware, that has been architected from the ground up for data warehousing. And while Oracle’s new solution just puts a new face on the old problems of Oracle RAC, Greenplum is delivering on this today with real solutions to address real customer needs -- scaling from terabytes to petabytes for some of the most demanding and sophisticated customers in the world.


tpc-interested Does Greenplum publish TPC-H benchmarks?
Ben Werther No, we don't publish TPC-H benchmarks. We are focused on real-world workloads, and our view is that showing performance on a customer's own queries is far more important than proving a point with a synthetic benchmark.
Anon Friend It is more than RAC. You may want to re-analyze your blog post or refresh it. The Exadata Storage software has processing capabilities.
BTW, I don't think customers would or should care about shared vs. unshared architecture.
The price point of Oracle is HUGE and the technical requirement of Oracle 11 DB is a long shot. In the short term, and in essence, you don't have to worry about comparing architectures.
Anoop Dwivedi I sort of agree with your assessment that this solution from Oracle only addresses the I/O bandwidth issues they had with traditional DWH implementations.
The Exadata storage cells work as intelligent storage devices that can run some minor queries and return the result sets to the DB server.
It is still not an MPP solution as advertised. It is still based on Wintel boxes, and you can dissect the storage cell architecture or the database machine architecture. It is still not a true shared-nothing solution, and as such the usual Oracle DB performance issues will still hold true. It has the same DBMS, the same optimizer/executor, and nothing has changed at the core of Oracle 11g. RAC and ASM have been difficult to manage in the past. I am waiting for real results from the field to see if it really increases performance by 10X.
Price will always be an issue, but Oracle can surely negotiate something under enterprise licenses.