Greenplum Database: Industry-Leading Massively Parallel Processing Performance
Built to support Big Data Analytics, Greenplum Database manages, stores, and analyzes terabytes to petabytes of data. Users experience 10 to 100 times better performance than with traditional RDBMS products, a result of Greenplum’s shared-nothing MPP architecture, high-performance parallel dataflow engine, and advanced gNet software interconnect technology.
Massively Parallel Processing Architecture for Loading and Query Processing
The Greenplum Database architecture provides automatic parallelization of data loading and queries. High-performance loading uses Scatter/Gather Streaming technology and supports load speeds greater than 10 terabytes per hour per rack, with linear scalability. All data is automatically partitioned across all nodes of the system, and queries are planned and executed using all nodes working together in a highly coordinated fashion.
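As a rough illustration of how automatic partitioning across nodes enables parallel query execution, the following Python sketch hashes each row's key to one of several segments, lets each segment compute a partial result, and combines the partials. The segment count, the use of CRC32 as the hash, and the function names are assumptions made for this example; they are not Greenplum's actual distribution algorithm or API.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical number of segments (nodes)


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Map a distribution key to a segment using a stable hash.

    CRC32 is a stand-in; the real hash function is not described here.
    """
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, key_field, num_segments=NUM_SEGMENTS):
    """Assign every row to exactly one segment based on its key."""
    segments = defaultdict(list)
    for row in rows:
        seg = segment_for(str(row[key_field]), num_segments)
        segments[seg].append(row)
    return segments


def parallel_sum(segments, value_field):
    """Each segment sums its own rows; a coordinator adds the partials."""
    partials = [sum(r[value_field] for r in rows) for rows in segments.values()]
    return sum(partials)
```

Because each row lives on exactly one segment, the partial sums can be computed independently and the combined result matches a single-node scan.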
Polymorphic Data Storage-MultiStorage/SSD Support
Greenplum Database introduced Polymorphic Data Storage™, including tunable compression and support for both row- and column-oriented storage within a database. With Greenplum Database, this capability is extended to allow the placement of data on specific storage types, such as SSD media or NAS archival stores. Customers can easily leverage multiple storage technologies to enable the ideal balance between performance and cost.
Multi-level Partitioning with Dynamic Partition Elimination
Tables can be partitioned flexibly by date, range, or value. Partitioning is specified using DDL and supports an arbitrary number of levels. Dynamic Partition Elimination disregards irrelevant partitions in a table, which significantly reduces the amount of data scanned and results in faster query execution.
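The idea behind partition elimination can be sketched in a few lines of Python: given date-range partitions and query bounds that may only be known at run time, skip every partition whose range cannot overlap the query. The `Partition` class and the `prune` and `scan` functions are hypothetical names for this illustration, not part of Greenplum Database.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Partition:
    start: date  # inclusive lower bound
    end: date    # exclusive upper bound
    rows: list = field(default_factory=list)


def prune(partitions, lo, hi):
    """Keep only partitions whose [start, end) overlaps the query range [lo, hi)."""
    return [p for p in partitions if p.start < hi and lo < p.end]


def scan(partitions, lo, hi):
    """Scan the surviving partitions only, then filter their rows."""
    hits = []
    for p in prune(partitions, lo, hi):
        hits.extend(r for r in p.rows if lo <= r["day"] < hi)
    return hits
```

With twelve monthly partitions, a query covering a few weeks touches at most two of them, so the remaining partitions are never read.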
Out-of-the-Box Support for Big Data Analytics
Greenplum Database delivers an agile, extensible platform for in-database analytics, leveraging the system’s massively parallel architecture. It natively runs MapReduce programs within its parallel engine and ensures automatic installation and updates of functional extensions, such as in-database GeoSpatial functions, PL/R, PL/Java, PL/Python, and PL/Perl.
High Performance gNet™ for Hadoop
Greenplum Database enables high performance parallel import and export of compressed and uncompressed data from Hadoop clusters using gNet for Hadoop, a parallel communications transport with the industry's first direct query interoperability between Greenplum Database nodes and corresponding Hadoop nodes. To further streamline resource consumption during load times, custom-format data (binary, Pig, Hive, etc.) in Hadoop can be converted to GPDB Format via MapReduce, and then imported into Greenplum Database. This is a high-speed direct integration option that provides an efficient and flexible data exchange between Greenplum Database and Hadoop. gNet for Hadoop is available for both Greenplum HD Community Edition and Enterprise Edition.
Analytics and Language Support
Greenplum Database provides analytical functions (t-statistics, p-values, and Naïve Bayes) for advanced in-database analytics. These functions provide the metrics needed for variable selection to improve the quality of a regression model, and they make it easier to understand and reason about edge cases. Greenplum Database also supports a new level of parallel analysis capabilities for mathematicians and statisticians, with support for R, linear algebra, and machine-learning primitives.
Dynamic Query Prioritization
Greenplum’s Advanced Workload Management is extended with patent-pending technology that provides continuous real-time balancing of the resources of the entire cluster across all running queries. This gives DBAs the controls they need to meet workload service-level agreements in complex, mixed-workload environments.
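One simple way to picture continuous rebalancing across running queries is proportional sharing: each running query holds a priority weight, and its share of cluster resources is recomputed whenever a query starts or finishes. The sketch below assumes that model; the weights and the `resource_shares` function are hypothetical, and the source does not describe the actual algorithm.

```python
def resource_shares(running):
    """Return each query's fraction of cluster resources.

    `running` maps a query id to a positive priority weight (an assumed
    model). Shares are proportional to weight and always sum to 1.
    """
    total = sum(running.values())
    if total == 0:
        return {}
    return {query: weight / total for query, weight in running.items()}


# A high-priority report runs alongside a low-priority ETL job.
running = {"etl": 1, "report": 3}
before = resource_shares(running)

# When the report finishes, the remaining query absorbs the freed share.
del running["report"]
after = resource_shares(running)
```

Recomputing the shares on every arrival and departure keeps the whole cluster busy while still favoring higher-priority work.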
Self-Healing Fault Tolerance and Online Segment Rebalancing
Greenplum's fault-tolerance capabilities provide intelligent fault detection and fast online differential recovery, lowering TCO and allowing cloud-scale systems with the highest levels of availability. Greenplum Database can also perform post-recovery segment rebalancing without taking the database offline. All client sessions remain connected, so there is no downtime, and the database remains functional while the system is recovered back into an optimal state.
Simpler, Scalable Backup with Data Domain Boost
Greenplum Database includes advanced integration with EMC Data Domain deduplication storage systems via EMC Data Domain Boost for faster, more efficient backup. This integration distributes parts of the deduplication process to Greenplum Database servers, enabling them to send only unique data to the Data Domain system. This dramatically increases aggregate throughput, reduces the amount of data transferred over the network, and eliminates the need for NFS mount management.
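The principle of sending only unique data can be illustrated with a small source-side deduplication sketch in Python. The sender fingerprints each chunk and transfers it only if the store has not seen that fingerprint before. Fixed-size chunks, SHA-256 fingerprints, and the `DedupStore`, `backup`, and `restore` names are assumptions for this example; they do not represent the Data Domain Boost protocol or API.

```python
import hashlib

CHUNK_SIZE = 4  # deliberately tiny so the example is easy to follow


class DedupStore:
    """Stand-in for a deduplicating backup target."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> chunk bytes

    def has(self, digest):
        return digest in self.chunks

    def put(self, digest, data):
        self.chunks[digest] = data


def backup(data: bytes, store: DedupStore, chunk_size: int = CHUNK_SIZE):
    """Send only chunks the store lacks; return (recipe, bytes_sent)."""
    recipe, sent = [], 0
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if not store.has(digest):  # unique data: transfer it
            store.put(digest, chunk)
            sent += len(chunk)
        recipe.append(digest)  # the recipe alone is enough to rebuild
    return recipe, sent


def restore(recipe, store: DedupStore) -> bytes:
    """Rebuild the original data from its list of fingerprints."""
    return b"".join(store.chunks[d] for d in recipe)
```

Backing up `b"AAAABBBBAAAABBBB"` transfers 8 of its 16 bytes the first time and none on a repeat backup, which is how deduplication at the source reduces network traffic.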
Health Monitoring and Alerting
Greenplum Database provides integrated email and SNMP notification for any event that needs IT attention. The system can also be configured to call home to EMC support for automatic event notification and advanced support capabilities.
Product overview of Greenplum Database, a massively parallel processing (MPP) database built to support the next generation of “Big Data” warehousing and large-scale analytics processing.
How Greenplum Database and Hadoop work together
The continuing explosion in data sources and volumes strains and exceeds the scalability of traditional data management and analytical architectures.
This white paper discusses the features, benefits, and use of Aginity Workbench for EMC Greenplum - a comprehensive management and development tool, specially tailored for the features and architecture of the EMC Greenplum Database.
Massively scalable data warehousing meets large-scale analytics processing
As advanced analytics becomes pervasive across the enterprise to drive better business decisions, the need for efficient execution of predictive models is paramount. Zementis and Greenplum join forces to help companies easily bring predictive models into their database and score in-place and in-parallel huge amounts of data.
With daily data volumes growing at 200% per year and the need to carefully monitor Exchange performance, NYSE required a robust, large-scale solution.
Zions Bank leverages EMC Greenplum Database software and predictive analytics to drive greater revenue and minimize customer churn.
Popular social network runs highly advanced analytics on "big data" sets to get insight at the speed of business.
doubleIQ's new cloud analytics warehouse processes analytics queries up to 300 percent faster.
A functional demonstration of Workload Management and how it is controlled by the Greenplum Database.