Polymorphic Data Storage™

Traditionally relational data has been stored in rows - i.e. as a sequence of tuples in which all the columns of each tuple are stored together on disk. This has a long heritage back to early OLTP systems that introduced the 'slotted page' layout that is still in common use today. However analytical databases tend to have different access patterns than OLTP systems. Instead of seeing many single-row reads and writes, analytical databases must process larger more complex queries that touch much larger volumes of data - i.e. read-mostly with big scanning reads and infrequent batch appends of data.

Vendors have taken a number of different approaches to meeting these needs. Some have optimized their disk layouts to eliminate the OLTP fat and do smarter disk scans. Others have turned their storage sideways (literally) with column-stores - i.e. the decades old idea of 'vertical decomposition' demonstrated to good success by Sybase IQ, and now reimagined by a raft of newer vendors. Each of these approaches has proven to have sweet-spots where they shine, and others where they do a less admirable job.

Rather than advocating for one approach or the other, we've built in the flexibility so that customers can choose the right strategy for the job at hand. We call this Polymorphic Data Storage™. For each table (or partition of a table), the DBA can select the storage, execution and compression settings that suit the way that table will be accessed. With Polymorphic Data Storage™, the database transparently abstracts the details of any table or partition, allowing a wide variety of underlying models:

  • Read/Write Optimized — Traditional 'slotted page' row-oriented table (based on PostgreSQL's native table type), optimized for fine-grained CRUD operations.

  • Row-Oriented / Read-Mostly Optimized -- Optimized for read-mostly scans and bulk append loads. DDL allows optional compression ranging from fast/light to deep/archival.

  • Column-Oriented / Read-Mostly Optimized — Added as a feature in Greenplum's latest 3.3.4 release, providing a true column-store just by specifying 'WITH (orientation=column)' on a table. Data is vertically partitioned, and each column is stored in a series of large densely-packed blocks that can be efficiently compressed from fast/light to deep/archival (and tend to see notably higher compression ratios than row-oriented tables). Performance is excellent for those workloads suited to column-store — Greenplum's implementation only scans those columns required by the query, doesn't have the overhead of per-tuple IDs, and does efficient early materialization using an optimized 'columnar append' operator.

Greenplum's Polymorphic Data Storage™ really shines when combined with Greenplum's multi-level table partitioning. With Polymorphic Data Storage™, customers can tune the storage types and compression settings of different partitions within the same table. I.e. A single partitioned table could (for example) have older data stored as 'column-oriented with deep/archival compression', more recent data as 'column-oriented with fast/light compression', and the most recent data as 'read/write optimized' to support fast updates and deletes.