Beyond Rows and Columns: Greenplum’s Polymorphic Data Storage™ -- Part 2
Rather than join the chorus on one side or another, we’ve been hard at work building in the flexibility so that customers can choose the right strategy for the job at hand. We call this Polymorphic Data Storage™. For each table (or partition of a table), the DBA can select the storage, execution and compression settings that suit the way that table will be accessed. With Polymorphic Data Storage™, the database transparently abstracts the details of any table or partition, allowing a wide variety of underlying models:
- Read/Write Optimized — Traditional ‘slotted page’ row-oriented table (based on PostgreSQL’s native table type), optimized for fine-grained CRUD operations.
- Row-Oriented / Read-Mostly Optimized — A row-structured storage layout using large densely-packed blocks and eliminating the need for PostgreSQL’s per-row IDs and MVCC concurrency control. Optimized for read-mostly scans and bulk append loads. DDL allows optional compression ranging from fast/light to deep/archival.
- Column-Oriented / Read-Mostly Optimized — Added as a feature in Greenplum’s latest 3.3.4 release, providing a true column-store just by specifying ‘WITH (orientation=column)’ on a table. Data is vertically partitioned, and each column is stored in a series of large densely-packed blocks that can be efficiently compressed from fast/light to deep/archival (and tend to see notably higher compression ratios than row-oriented tables). Performance is excellent for those workloads suited to column-store — Greenplum’s implementation only scans those columns required by the query, doesn’t have the overhead of per-tuple IDs, and does efficient early materialization using an optimized ‘columnar append’ operator.
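As a rough sketch of how the three models above might be declared, the DDL below creates one table of each kind. Only `orientation=column` is quoted in this post; the other storage options (`appendonly`, `compresstype`, `compresslevel`) are assumed from Greenplum's `CREATE TABLE` syntax and may differ between releases, and the table and column names are purely hypothetical.

```sql
-- Hypothetical tables for illustration; names and columns are not from the post.

-- Read/write optimized: a regular PostgreSQL-style heap table (the default).
CREATE TABLE orders_heap (
    order_id   bigint,
    order_date date,
    region     text,
    amount     numeric(12,2)
)
DISTRIBUTED BY (order_id);

-- Row-oriented, read-mostly: append-only storage with fast/light compression.
-- (appendonly / compresstype / compresslevel are assumed option names.)
CREATE TABLE orders_row_ao (
    order_id   bigint,
    order_date date,
    region     text,
    amount     numeric(12,2)
)
WITH (appendonly=true, orientation=row, compresstype=zlib, compresslevel=1)
DISTRIBUTED BY (order_id);

-- Column-oriented, read-mostly: same columns, stored column by column,
-- here with deep/archival compression.
CREATE TABLE orders_col_ao (
    order_id   bigint,
    order_date date,
    region     text,
    amount     numeric(12,2)
)
WITH (appendonly=true, orientation=column, compresstype=zlib, compresslevel=9)
DISTRIBUTED BY (order_id);
```

Queries against all three tables look identical; only the `WITH` clause changes how the data is laid out on disk.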
Greenplum’s Polymorphic Data Storage™ really shines when combined with Greenplum’s multi-level table partitioning. Some of our customers have fact tables with trillions of rows, and they access newer data very differently from older historical data. Generally they’ll define the table as partitioned by date (e.g. daily or monthly) and potentially also by region or other values. With Polymorphic Data Storage™ they can tune the storage types and compression settings of different partitions within the same table. For example, a single partitioned table could have older data stored as ‘column-oriented with deep/archival compression’, more recent data as ‘column-oriented with fast/light compression’, and the most recent data as ‘read/write optimized’ to support fast updates and deletes.
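To make the mixed-storage idea concrete, here is a minimal sketch of a date-partitioned fact table whose partitions use different storage settings. The table, column and partition names and the date ranges are hypothetical, and the per-partition `WITH` clause and option names are assumed from Greenplum's partitioning syntax, so check them against the documentation for your release.

```sql
-- Hypothetical fact table: one logical table, three storage strategies.
CREATE TABLE sales_fact (
    sale_id   bigint,
    sale_date date,
    region    text,
    amount    numeric(12,2)
)
DISTRIBUTED BY (sale_id)
PARTITION BY RANGE (sale_date)
(
    -- Older data: column-oriented with deep/archival compression.
    PARTITION p_archive START (date '2008-01-01') END (date '2009-01-01')
        WITH (appendonly=true, orientation=column,
              compresstype=zlib, compresslevel=9),

    -- More recent data: column-oriented with fast/light compression.
    PARTITION p_recent START (date '2009-01-01') END (date '2010-01-01')
        WITH (appendonly=true, orientation=column,
              compresstype=zlib, compresslevel=1),

    -- Most recent data: no WITH clause, so it stays a regular
    -- read/write optimized table that supports fast updates and deletes.
    PARTITION p_latest START (date '2010-01-01') END (date '2011-01-01')
);
```

A query such as `SELECT region, sum(amount) FROM sales_fact GROUP BY region` runs against the parent table as usual; the database hides which storage model each partition uses.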
This puts an end to many of the compromises and religious battles that customers have had to put up with until now. Column-oriented tables get great compression and work well for many workloads (particularly when selecting a limited subset of the table’s columns), while there are plenty of workload types where row-oriented tables are a clear win. Now there’s no mystery — in minutes a customer can try both and determine which works best for their workload. Choice is a good thing!
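One way to "try both" could look like the sketch below: copy an existing table into a row-oriented and a column-oriented version, then compare size on disk and query time. The source table `sales_fact` and its columns are hypothetical, and the storage options are assumed from Greenplum's `CREATE TABLE AS` syntax; treat it as a starting point rather than a benchmark recipe.

```sql
-- Row-oriented, append-only copy of a hypothetical source table.
CREATE TABLE sales_row
WITH (appendonly=true, orientation=row, compresstype=zlib, compresslevel=1)
AS SELECT * FROM sales_fact
DISTRIBUTED BY (sale_id);

-- Column-oriented copy with the same compression settings.
CREATE TABLE sales_col
WITH (appendonly=true, orientation=column, compresstype=zlib, compresslevel=1)
AS SELECT * FROM sales_fact
DISTRIBUTED BY (sale_id);

-- Compare the on-disk footprint of the two layouts.
SELECT pg_size_pretty(pg_relation_size('sales_row')) AS row_size,
       pg_size_pretty(pg_relation_size('sales_col')) AS column_size;

-- Compare run time for a query that touches only two columns.
EXPLAIN ANALYZE SELECT region, sum(amount) FROM sales_row GROUP BY region;
EXPLAIN ANALYZE SELECT region, sum(amount) FROM sales_col GROUP BY region;
```

Running the same comparison with a wide `SELECT *` query or an update-heavy workload shows the other side of the trade-off, which is the point of being able to choose per table or per partition.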