Greenplum Database: Industry-Leading Massively Parallel Processing Performance


Big Data Analytics, Management, and Storage

Greenplum Database is the industry-leading massively parallel processing (MPP) database offering, regarded as the most scalable mission-critical analytical database. The Greenplum Database architecture provides automatic parallelization of data loading and queries.

Fastest Data Loading

Leveraging Scatter/Gather Streaming technology, Greenplum Database provides the world's fastest data loading capabilities.

Greenplum Database

Built to support Big Data analytics, Greenplum Database manages, stores, and analyzes terabytes to petabytes of data. Users can see 10 to 100 times better performance than with traditional RDBMS products, a result of Greenplum’s shared-nothing MPP architecture, high-performance parallel dataflow engine, and advanced gNet software interconnect technology.

Greenplum Database was conceived, designed, and engineered to allow customers to take advantage of large clusters of increasingly powerful, increasingly inexpensive commodity servers, storage, and Ethernet switches. Greenplum customers can gain immediate benefit from deploying the latest commodity hardware innovations.


Product Highlights


Massively Parallel Processing Architecture for Loading and Query Processing

The Greenplum Database architecture provides automatic parallelization of data loading and queries. High-performance loading uses Scatter/Gather Streaming technology and supports loading speeds greater than 10 terabytes per hour, per rack, with linear scalability. All data is automatically distributed across all nodes of the system, and queries are planned and executed with all nodes working together in a highly coordinated fashion.
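To make automatic distribution and coordinated execution concrete, here is a minimal Python sketch of a shared-nothing layout. It assumes a hypothetical four-segment cluster, uses CRC32 of a distribution key as the hash, and models each segment's local work as a thread. Greenplum's actual hash function, segment count, and executor are not described here, so every name in the sketch is illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_SEGMENTS = 4  # hypothetical cluster size, for illustration only


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Map a distribution-key value to a segment with a deterministic hash."""
    return crc32(key.encode("utf-8")) % num_segments


def distribute(rows, key_column, num_segments=NUM_SEGMENTS):
    """Scatter rows into per-segment buckets by hashing the key column."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[segment_for(row[key_column], num_segments)].append(row)
    return segments


def local_sum(segment_rows, value_column):
    """Work each segment does on its own rows only (shared-nothing)."""
    return sum(row[value_column] for row in segment_rows)


def parallel_sum(segments, value_column):
    """Run every segment's step concurrently, then gather and combine the partials."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        return sum(pool.map(lambda seg: local_sum(seg, value_column), segments))


if __name__ == "__main__":
    orders = [{"customer": f"c{i}", "amount": i} for i in range(1, 101)]
    segments = distribute(orders, "customer")
    print([len(seg) for seg in segments])    # how many rows landed on each segment
    print(parallel_sum(segments, "amount"))  # 5050, same as a serial sum
```

Threads only stand in for separate servers here; in an MPP system each segment runs as its own process on its own hardware, which is where the real parallel speedup comes from.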

Polymorphic Data Storage: MultiStorage/SSD Support

Greenplum Database introduced Polymorphic Data Storage™, including tunable compression and support for both row- and column-oriented storage within a single database. This capability has been extended to allow data to be placed on specific storage types, such as SSD media or NAS archival stores, so customers can combine multiple storage technologies to strike the right balance between performance and cost.
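The difference between row and column orientation, and why per-column compression helps analytic scans, can be sketched in a few lines of Python. This is a conceptual model under stated assumptions: JSON-serialized, zlib-compressed byte strings stand in for on-disk blocks, and the zlib `level` argument stands in for a tunable compression setting. It is not Greenplum's storage format, and the function names are hypothetical.

```python
import json
import zlib


def store_rows(rows, level=6):
    """Row orientation: whole records are serialized and compressed together."""
    return zlib.compress(json.dumps(rows).encode("utf-8"), level)


def store_columns(rows, level=6):
    """Column orientation: each column is serialized and compressed separately."""
    columns = {name: [row[name] for row in rows] for name in rows[0]}
    return {
        name: zlib.compress(json.dumps(values).encode("utf-8"), level)
        for name, values in columns.items()
    }


def read_all_rows(row_block):
    """Any query against the row layout must decompress every column."""
    return json.loads(zlib.decompress(row_block))


def read_column(column_blocks, name):
    """A query touching one column decompresses only that column's block."""
    return json.loads(zlib.decompress(column_blocks[name]))


if __name__ == "__main__":
    sales = [{"region": "east" if i % 2 else "west", "amount": i} for i in range(1000)]
    rows_block = store_rows(sales, level=9)
    col_blocks = store_columns(sales, level=9)
    # Compressed sizes: one row block vs. one block per column.
    print(len(rows_block), {name: len(b) for name, b in col_blocks.items()})
    print(sum(read_column(col_blocks, "amount")))  # 499500
```

Storing similar values together (the repetitive `region` column, for instance) tends to compress well, and a query that sums only `amount` never has to decompress `region` at all.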

Multi-level Partitioning with Dynamic Partition Elimination

Tables can be partitioned flexibly by date, range, or value. Partitioning is specified in DDL and supports an arbitrary number of levels. Dynamic Partition Elimination skips partitions that are irrelevant to a query, significantly reducing the amount of data scanned and speeding up query execution.
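A minimal Python sketch of two-level partitioning and partition elimination follows. The layout is a hypothetical example (level 1 is a monthly date range, level 2 is a list of region values), and the filter values are literals. "Dynamic" elimination generally means applying the same overlap test at execution time, when filter values only become known then (for example, from a join); this sketch shows only the pruning test itself, not Greenplum's planner.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Partition:
    start: date   # inclusive lower bound of the date range (level 1)
    end: date     # exclusive upper bound of the date range (level 1)
    region: str   # list-partition value (level 2)


def build_partitions(year, regions):
    """Create one leaf partition per (month, region) pair."""
    parts = []
    for month in range(1, 13):
        start = date(year, month, 1)
        end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        parts.extend(Partition(start, end, r) for r in regions)
    return parts


def eliminate(partitions, lo, hi, region=None):
    """Keep only partitions whose range overlaps [lo, hi) and whose region matches."""
    return [
        p for p in partitions
        if p.start < hi and lo < p.end and (region is None or p.region == region)
    ]


if __name__ == "__main__":
    parts = build_partitions(2012, ["east", "west", "central"])
    scanned = eliminate(parts, date(2012, 3, 1), date(2012, 5, 1), region="west")
    print(f"scanning {len(scanned)} of {len(parts)} partitions")  # scanning 2 of 36 partitions
```

Because each level prunes independently, adding levels multiplies the savings: the date filter keeps 2 of 12 months and the region filter keeps 1 of 3 regions.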

Out-of-the-Box Support for Big Data Analytics


Greenplum Database delivers an agile, extensible platform for in-database analytics, leveraging the system’s massively parallel architecture. It natively runs MapReduce programs within its parallel engine and ensures automatic installation and updates of functional extensions, such as in-database GeoSpatial functions, PL/R, PL/Java, PL/Python, and PL/Perl.

High Performance gNet™ for Hadoop

Greenplum Database enables high-performance parallel import and export of compressed and uncompressed data to and from Hadoop clusters using gNet for Hadoop, a parallel communications transport with the industry's first direct query interoperability between Greenplum Database nodes and the corresponding Hadoop nodes. To further streamline resource consumption during loading, custom-format data in Hadoop (binary, Pig, Hive, etc.) can be converted to GPDB format via MapReduce and then imported into Greenplum Database. This high-speed direct integration provides efficient, flexible data exchange between Greenplum Database and Hadoop. gNet for Hadoop is available for both Greenplum HD Community Edition and Enterprise Edition.

Analytics and Language Support


Greenplum Database provides analytical functions (t-statistics, p-values, and Naïve Bayes) for advanced in-database analytics. These functions supply the metrics needed for variable selection, which improves the quality of a regression model, and make it easier to understand and reason about edge cases. Greenplum Database also offers parallel analysis capabilities for mathematicians and statisticians, including support for R, linear algebra, and machine-learning primitives.
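For readers unfamiliar with how a t-statistic supports variable selection, here is the textbook calculation for the slope of a simple linear regression in plain Python. It is an illustrative sketch, not Greenplum's in-database function; the name `slope_t_statistic` is hypothetical, and the p-value step is left out because it needs a Student's t CDF from a statistics library.

```python
from math import sqrt


def slope_t_statistic(xs, ys):
    """Return (slope, t-statistic) for the least-squares fit y = a + b*x.

    The t-statistic tests whether the slope b differs from zero; under the usual
    regression assumptions it follows Student's t distribution with n - 2
    degrees of freedom, from which a p-value can be read.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # least-squares slope
    a = mean_y - b * mean_x            # least-squares intercept
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # residual sum of squares
    se_b = sqrt(sse / (n - 2) / sxx)   # standard error of the slope
    if se_b == 0:
        raise ValueError("zero residual error: t-statistic is not finite")
    return b, b / se_b


if __name__ == "__main__":
    slope, t = slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
    print(f"slope={slope:.3f} t={t:.3f}")  # slope=0.600 t=2.121
```

A variable whose coefficient has a small |t| (and therefore a large p-value) is a candidate for removal from the model; an in-database version would compute the same sums in parallel across segments.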

Dynamic Query Prioritization


Greenplum’s Advanced Workload Management is extended with patent-pending technology that provides continuous real-time balancing of the resources of the entire cluster across all running queries. This gives DBAs the controls they need to meet workload service-level agreements in complex, mixed-workload environments.
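The general idea of splitting a cluster's resources across all running queries by priority can be illustrated with a simple proportional-share calculation. This is a hypothetical sketch, not Greenplum's patent-pending algorithm: the priority names and weights are invented for illustration, and "continuous" rebalancing is approximated by recomputing shares whenever a query starts or finishes.

```python
# Hypothetical priority levels and weights, chosen only for illustration.
PRIORITY_WEIGHTS = {"low": 1, "medium": 2, "high": 4}


def cpu_shares(running, capacity=1.0):
    """Split `capacity` across running queries in proportion to their priority weights.

    `running` maps a query id to its priority level. Calling this again whenever
    a query starts or finishes redistributes the whole cluster's capacity.
    """
    total = sum(PRIORITY_WEIGHTS[level] for level in running.values())
    if total == 0:
        return {}
    return {
        qid: capacity * PRIORITY_WEIGHTS[level] / total
        for qid, level in running.items()
    }


if __name__ == "__main__":
    running = {"etl": "low", "adhoc": "medium", "report": "high"}
    print(cpu_shares(running))  # report gets 4/7 of the cluster
    del running["etl"]          # a query finishes...
    print(cpu_shares(running))  # ...and its share is redistributed immediately
```

Proportional sharing keeps the whole cluster busy: a low-priority query is slowed rather than blocked, and capacity freed by a finishing query flows to the remaining queries instead of sitting idle.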

Self-Healing Fault Tolerance and Online Segment Rebalancing

Greenplum's fault-tolerance capabilities provide intelligent fault detection and fast online differential recovery, lowering TCO and enabling cloud-scale systems with the highest levels of availability. Greenplum Database can also rebalance segments after recovery without taking the database offline. All client sessions remain connected, so there is no downtime, and the database remains functional while the system is restored to an optimal state.
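Differential recovery is fast because it copies only what changed, not the whole segment. The Python sketch below models that idea under a loudly stated assumption: while a mirror is down, the primary records which block numbers it writes (change tracking), and recovery resynchronizes just those blocks. The `SegmentPair` class is hypothetical and is not Greenplum's implementation.

```python
class SegmentPair:
    """A primary segment and its mirror, each modeled as a list of blocks (hypothetical)."""

    def __init__(self, blocks):
        self.primary = list(blocks)
        self.mirror = list(blocks)
        self.mirror_up = True
        self.changed = set()  # block numbers written while the mirror was down

    def fail_mirror(self):
        """Simulate losing the mirror; the primary keeps serving writes."""
        self.mirror_up = False

    def write(self, block_no, value):
        self.primary[block_no] = value
        if self.mirror_up:
            self.mirror[block_no] = value   # normal synchronous mirroring
        else:
            self.changed.add(block_no)      # change tracking while degraded

    def recover(self):
        """Resynchronize only the changed blocks; return how many were copied."""
        for block_no in sorted(self.changed):
            self.mirror[block_no] = self.primary[block_no]
        copied = len(self.changed)
        self.changed.clear()
        self.mirror_up = True
        return copied


if __name__ == "__main__":
    pair = SegmentPair(["a"] * 100)
    pair.fail_mirror()
    pair.write(3, "x")
    pair.write(42, "y")
    pair.write(3, "z")      # rewriting a block is still tracked once
    print(pair.recover())   # 2 blocks copied instead of 100
```

Because the primary never stops accepting writes, sessions stay connected throughout; a production system would also have to handle writes that arrive while the resync is in progress, which this sketch ignores.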

Simpler, Scalable Backup with Data Domain Boost

Greenplum Database includes advanced integration with EMC Data Domain deduplication storage systems via EMC Data Domain Boost for faster, more efficient backup. This integration distributes parts of the deduplication process to the Greenplum Database servers, so they send only unique data to the Data Domain system. This dramatically increases aggregate throughput, reduces the amount of data transferred over the network, and eliminates the need for NFS mount management.
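Source-side deduplication can be sketched as follows: the sender fingerprints each chunk and transmits only chunks the target has not already stored. This minimal Python model is an assumption-laden illustration, not the DD Boost protocol. It uses fixed 4 KiB chunks and SHA-256 fingerprints, and a plain dict stands in for the Data Domain system's index; production deduplication systems commonly use content-defined (variable-size) chunking instead. All function names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size


def chunk_fingerprints(data, chunk_size=CHUNK_SIZE):
    """Split data into chunks and fingerprint each chunk with SHA-256."""
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        yield hashlib.sha256(chunk).hexdigest(), chunk


def backup(data, store, chunk_size=CHUNK_SIZE):
    """Send only chunks whose fingerprint the store has not seen.

    `store` is a dict standing in for the deduplicating target. Returns the
    recipe (ordered fingerprints needed to rebuild the data) and the bytes sent.
    """
    recipe, sent = [], 0
    for fp, chunk in chunk_fingerprints(data, chunk_size):
        if fp not in store:
            store[fp] = chunk   # unique data crosses the network
            sent += len(chunk)
        recipe.append(fp)       # duplicates are sent as references only
    return recipe, sent


def restore(recipe, store):
    """Rebuild the original data from its recipe."""
    return b"".join(store[fp] for fp in recipe)


if __name__ == "__main__":
    store = {}
    data = b"A" * (CHUNK_SIZE * 3) + b"B" * CHUNK_SIZE
    recipe, sent = backup(data, store)
    print(len(recipe), sent)  # 4 8192: four chunks referenced, two unique chunks sent
    _, sent_again = backup(data, store)
    print(sent_again)         # 0: a repeat backup sends no chunk data
```

Running the fingerprinting on each database server is what spreads the deduplication work across the cluster and cuts network traffic, since repeated chunks travel only as short fingerprints.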

Health Monitoring and Alerting

Greenplum Database provides integrated email and SNMP notifications for any event that needs IT attention. The system can also be configured to call home to EMC support for automatic event notification and advanced support capabilities.

Greenplum Database Technologies

Data Sheet

Greenplum Database

Product overview of Greenplum Database, a massively parallel processing (MPP) database built to support the next generation of “Big Data” warehousing and large-scale analytics processing.

Whitepaper

Greenplum Database: Critical Mass Innovation

The continuing explosion in data sources and volumes strains and exceeds the scalability of traditional data management and analytical architectures.

Whitepaper

EMC Greenplum Management Enabled by Aginity Workbench

This white paper discusses the features, benefits, and use of Aginity Workbench for EMC Greenplum, a comprehensive management and development tool tailored specifically to the features and architecture of EMC Greenplum Database.


Whitepaper

Advanced Cyber Analytics with Greenplum Database

Massively scalable data warehousing meets large-scale analytics processing

Whitepaper

Universal PMML Plug-In for EMC Greenplum Database

As advanced analytics becomes pervasive across the enterprise to drive better business decisions, efficient execution of predictive models is paramount. Zementis and Greenplum have joined forces to help companies bring predictive models into their database and score huge amounts of data in place and in parallel.

Case Study

NYSE Euronext

With daily data volumes growing at 200% per year and the need to carefully monitor Exchange performance, NYSE required a robust, large-scale solution.

Case Study

Zions Bancorporation

Zions Bank leverages EMC Greenplum Database software and predictive analytics to drive greater revenue and minimize customer churn.


Case Study

Tagged.com

Popular social network runs highly advanced analytics on "big data" sets to get insight at the speed of business.

Case Study

doubleIQ

doubleIQ's new cloud analytics warehouse processes analytics queries up to 300 percent faster.

Greenplum Database - Functionality Demonstration: Workload Management

A functional demonstration of workload management and how Greenplum Database controls it.