INTRODUCING THE GREENPLUM MODULAR DATA COMPUTING APPLIANCE
The industry's first complete big data analytics appliance
Unified for Big Data Analytics
An advanced all-in-one appliance that delivers a fast-loading, highly scalable data co-processing platform for Big Data Analytics.
Purpose-built modular appliance
A modular solution that includes Greenplum Database for structured data, Greenplum HD for unstructured data, and DIA Modules for Greenplum partner applications such as business intelligence (BI) and extract, transform, and load (ETL) applications, all configured into one appliance cluster via a high-speed, high-performance, low-latency interconnect.
Greenplum DCA
The EMC® Greenplum® Data Computing Appliance (DCA) offers the power of a massively parallel processing (MPP) architecture while delivering the fastest data-loading rate and the best price/performance ratio in the industry, without the complexity and constraints of proprietary hardware. It is a unified Big Data analytics appliance: a modular solution for structured data, unstructured data, and Greenplum partner applications such as business intelligence (BI) and extract, transform, and load (ETL). Enterprises can grow their DCAs as their demand for processing capacity grows or as their analytics requirements evolve. It is easy to start with a single primary rack, which includes a Greenplum Database Module (Standard or High Capacity), and expand the appliance in quarter-rack increments using the Greenplum Database Standard Module, Greenplum Database High Capacity Module, Greenplum HD (Hadoop) Module, or Greenplum Data Integration Accelerator Module in any order and quantity, up to 12 racks total. All modules are linked via a high-speed, high-performance, low-latency interconnect.
With the Greenplum DCA, your organization can embrace Big Data analytics quickly and easily. You can get results faster by using an integrated appliance that offers optimized performance, ease of deployment, increased system monitoring and manageability, and a reduced footprint. The Greenplum DCA modules greatly simplify the expansion of capacity and performance of the Greenplum Database (analytic database) and Greenplum HD (Apache Hadoop) portions of the systems. This data management appliance delivers maximum flexibility and scalability for organizations that are tackling terabyte- to petabyte-scale data opportunities.
The Greenplum Data Computing Appliance (DCA) modules are:
Greenplum Database Standard Module
The Greenplum Database Standard Module is a purpose-built, highly scalable data-analytics appliance module that architecturally integrates database, computing, storage, and network into an enterprise-class, easy-to-implement system. This module is the industry leader in price and performance.
Greenplum Database High Capacity Module
The Greenplum Database High Capacity Module is designed to host multiple petabytes of data without requiring additional floor space, driving up power consumption, or increasing costs. For businesses that require detailed analysis of extremely large amounts of data, or those looking for a longer-term archive, this module offers the lowest cost-per-unit data warehouse.
Greenplum HD Module
The Greenplum HD Module is the world’s first high-performance data co-processing Hadoop appliance module. The DCA fuses Hadoop with the Greenplum Database, allowing the co-processing of both structured and unstructured data within a single, seamless solution.
Greenplum Data Integration Accelerator (DIA) Module
The Greenplum Data Integration Accelerator (DIA) Module is designed to host partner analytics applications and to integrate them quickly with the Greenplum Data Computing Appliance. For example, it addresses the challenges of data loading with a parallel and scalable model, whether the goal is to shorten batch loads or to implement micro-batch loading.
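As a rough illustration of the micro-batch idea mentioned above, the following Python sketch groups an incoming stream of records into small fixed-size batches, each of which could then be handed to a loader. The function name micro_batches and the batch size are hypothetical; this is not the interface of the DIA Module or of any Greenplum loading utility.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists holding at most batch_size records each.

    Hypothetical helper for illustration only: it shows how a large load
    can be broken into small batches instead of one long-running job.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        # Pull the next batch_size records (fewer at the end of the stream).
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    for batch in micro_batches(range(7), 3):
        print(batch)
```

Smaller batches reduce the delay before new data becomes available for queries, at the cost of more load operations; the right batch size depends on the workload.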
Product Highlights
Extreme and Predictable Performance with Elastic Scalability
At the heart of the Greenplum Data Computing Appliance (DCA) is the Greenplum® Database, with a shared-nothing, massively parallel processing (MPP) architecture that has been designed for business intelligence and analytical processing. The core principle of Greenplum software is to move processing dramatically closer to the data and its users. This effectively enables computational resources to process every query in a fully parallel manner, use all storage connections simultaneously, and flow data efficiently between resources as the query plan dictates. The result is a wide variety of complex processing that can be pushed down in close proximity to the data for maximum processing efficiency and unparalleled expressiveness.
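The shared-nothing idea described above can be sketched in a few lines of Python. This is a simplified illustration, not Greenplum's implementation: rows are hash-distributed across a hypothetical number of segments, each segment aggregates only its own rows, and a coordinating step merges the partial results. The names distribute, local_sum, and parallel_sum_by_key, and the use of CRC32 as the hash, are assumptions made for this example.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, float]  # (distribution key, amount)


def distribute(rows: Iterable[Row], num_segments: int) -> List[List[Row]]:
    """Assign each row to one segment by hashing its distribution key."""
    segments: List[List[Row]] = [[] for _ in range(num_segments)]
    for key, amount in rows:
        segment_id = zlib.crc32(key.encode("utf-8")) % num_segments
        segments[segment_id].append((key, amount))
    return segments


def local_sum(segment_rows: List[Row]) -> Dict[str, float]:
    """Aggregate one segment's rows without touching any other segment."""
    totals: Dict[str, float] = defaultdict(float)
    for key, amount in segment_rows:
        totals[key] += amount
    return dict(totals)


def parallel_sum_by_key(rows: Iterable[Row], num_segments: int = 4) -> Dict[str, float]:
    """Run the per-segment aggregation concurrently, then merge the results."""
    segments = distribute(rows, num_segments)
    # Threads stand in for independent segment servers in this sketch.
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        partials = list(pool.map(local_sum, segments))
    merged: Dict[str, float] = {}
    for partial in partials:
        # All rows for a key hash to the same segment, so keys never collide.
        merged.update(partial)
    return merged


if __name__ == "__main__":
    sales = [("east", 10.0), ("west", 5.0), ("east", 2.5), ("north", 1.0)]
    print(parallel_sum_by_key(sales))
```

Because every row for a given key lands on the same segment, the aggregation needs no data exchange between segments; only the small partial results travel to the coordinating step.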
Modular for Data Co-Processing
The Greenplum DCA offers a modular solution that includes Greenplum Database for structured data, Greenplum HD for unstructured data, and the DIA Module for Greenplum partner applications such as business intelligence (BI) and extract, transform, and load (ETL) applications, all configured into one appliance cluster via a high-speed, high-performance, low-latency interconnect.
Enterprise High Availability
The Greenplum DCA meets the reliability requirements of the most mission-critical enterprises by delivering multi-level, self-healing fault tolerance, which includes automated failover, fully online self-healing resynchronization, and multiple levels of redundancy and integrity checking. Data availability is provided by hardware RAID protection at the disk level and by data mirroring between servers. This design ensures that no data is lost when a disk or server goes down.
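To make the server-level mirroring idea concrete, here is a minimal Python sketch under stated assumptions. It places the mirror copy of each data segment on a different server from its primary copy (a simple ring layout chosen for illustration) and checks that every segment still has a copy after any single server fails. The layout rule, the function names, and the host names are hypothetical and do not describe the DCA's actual mirror placement.

```python
from typing import Dict, List, Tuple

Layout = Dict[int, Tuple[str, str]]  # segment id -> (primary host, mirror host)


def place_segments(hosts: List[str], segments_per_host: int) -> Layout:
    """Put each segment's mirror on the next host in a ring (illustrative rule)."""
    if len(hosts) < 2:
        raise ValueError("mirroring across servers needs at least two hosts")
    layout: Layout = {}
    segment_id = 0
    for index, host in enumerate(hosts):
        mirror_host = hosts[(index + 1) % len(hosts)]
        for _ in range(segments_per_host):
            layout[segment_id] = (host, mirror_host)
            segment_id += 1
    return layout


def survives_host_failure(layout: Layout, failed_host: str) -> bool:
    """True if every segment keeps at least one copy on a surviving host."""
    return all(
        primary != failed_host or mirror != failed_host
        for primary, mirror in layout.values()
    )


if __name__ == "__main__":
    hosts = ["host1", "host2", "host3", "host4"]  # hypothetical server names
    layout = place_segments(hosts, segments_per_host=2)
    for host in hosts:
        print(host, "failure survivable:", survives_host_failure(layout, host))
```

The check passes for any single failure because a primary and its mirror never share a server; disk-level RAID, as described above, is a separate layer of protection that this sketch does not model.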
Reliable Backup and Disaster Recovery
The Greenplum DCA uses Data Domain and DCA SAN Mirror solutions to provide robust and reliable remote data protection for the DCA data analytics environment. With EMC Data Domain’s deduplication and backup technology, the Greenplum DCA can achieve fast, reliable data recovery with backup throughput of up to 14 TB/hour. Data Domain wide-area replication has also been qualified to remotely replicate a Greenplum database. The Greenplum DCA SAN Mirror Solution uses EMC Symmetrix VMAX, TimeFinder/Snap, and Symmetrix Remote Data Facility (SRDF) for advanced storage and data replication between two sites in synchronous mode.
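The quoted throughput figure lends itself to a quick back-of-the-envelope estimate. The Python sketch below computes the shortest possible backup window for a given data volume, treating 14 TB/hour as a best-case rate (the text says "up to"); the function name and the 70 TB example volume are hypothetical.

```python
def min_backup_hours(data_tb: float, throughput_tb_per_hour: float = 14.0) -> float:
    """Lower bound on backup time at the stated peak throughput.

    Real backups may take longer: 14 TB/hour is an upper limit on the
    rate, so the result is a best case, not a prediction.
    """
    if data_tb < 0:
        raise ValueError("data_tb must not be negative")
    if throughput_tb_per_hour <= 0:
        raise ValueError("throughput_tb_per_hour must be positive")
    return data_tb / throughput_tb_per_hour


if __name__ == "__main__":
    # Hypothetical 70 TB database at the peak rate: 70 / 14 = 5 hours.
    print(min_backup_hours(70.0))
```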
Proactive EMC One Support Structure
Customer Support Services provide the resources and services to quickly and proactively resolve solution-related issues and questions. This ensures business continuity and a highly available data environment. EMC’s global maintenance and support is available around-the-clock via 24x7 online support tools, including live chat and online service request management, live telephone support, and onsite support through the industry’s leading global field service organization.
In addition, the Greenplum DCA is enabled with Secure Remote Support (dial-home). Through this feature, the appliance provides around-the-clock remote and pre-emptive troubleshooting by automatically alerting the EMC Support Center of critical hardware and software errors. The EMC Support Center then remotely diagnoses the issue to prevent or shorten system downtime, and automatically dispatches customer engineers to accelerate hardware problem resolution.
DCA Cluster Configuration
You can expand the Greenplum DCA cluster by connecting up to twelve cabinets in total, with automatic data distribution and greater performance for analyst queries. Each primary rack contains two master servers, one Greenplum Database Module (either Standard or High Capacity), and up to three optional modules. In a multi-rack configuration, the expansion racks require neither master servers nor a Greenplum Database Module. A cluster that contains a Greenplum HD Module requires one Administration Module in the primary rack. The Greenplum Database master servers are responsible for authentication, query optimization, balancing the workload among the servers, and managing the data fault-tolerance mechanism.
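The configuration rules in the paragraph above can be expressed as a small validation sketch in Python. Everything here is a hypothetical model written for illustration, not an EMC tool: the module identifiers, the Rack class, and the choice to track the Administration Module as a flag separate from the four quarter-rack slots are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List

MAX_RACKS = 12        # "up to twelve" cabinets in one cluster
MODULES_PER_RACK = 4  # a rack is filled in quarter-rack increments

# Hypothetical identifiers for the module types named in the text.
GPDB_MODULES = {"gpdb_standard", "gpdb_high_capacity"}
ALL_MODULES = GPDB_MODULES | {"hd", "dia"}


@dataclass
class Rack:
    modules: List[str] = field(default_factory=list)
    # Assumption: the Administration Module is tracked outside the four slots.
    has_admin_module: bool = False


def validate_cluster(racks: List[Rack]) -> List[str]:
    """Return rule violations; an empty list means the layout passes.

    The first rack in the list is treated as the primary rack.
    """
    if not racks:
        return ["cluster needs a primary rack"]
    errors: List[str] = []
    if len(racks) > MAX_RACKS:
        errors.append(f"too many racks: {len(racks)} > {MAX_RACKS}")
    for index, rack in enumerate(racks):
        if len(rack.modules) > MODULES_PER_RACK:
            errors.append(f"rack {index} holds more than {MODULES_PER_RACK} modules")
        unknown = [m for m in rack.modules if m not in ALL_MODULES]
        if unknown:
            errors.append(f"rack {index} has unknown modules: {unknown}")
    primary = racks[0]
    if not any(m in GPDB_MODULES for m in primary.modules):
        errors.append("primary rack needs a Greenplum Database Module")
    uses_hd = any(m == "hd" for rack in racks for m in rack.modules)
    if uses_hd and not primary.has_admin_module:
        errors.append("Greenplum HD requires an Administration Module in the primary rack")
    return errors


if __name__ == "__main__":
    cluster = [
        Rack(["gpdb_standard", "hd"], has_admin_module=True),  # primary rack
        Rack(["hd", "dia", "gpdb_high_capacity"]),             # expansion rack
    ]
    print(validate_cluster(cluster))
```

Returning a list of violations rather than raising on the first one lets a planner see every problem with a proposed layout at once.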
Greenplum Data Computing Appliance
Product overview of the modular Greenplum Data Computing Appliance - purpose-built, highly scalable data warehousing hardware and software that architecturally integrates Greenplum Database, Greenplum HD, and third-party applications.
Product overview of Greenplum Database - massively parallel processing (MPP) database built to support the next generation of “Big Data” warehousing and large-scale analytics processing.
Data Sheet
Greenplum Data Integration Accelerator
Overview of the Greenplum Data Integration Accelerator, an add-on module that solves the challenges of data loading in a parallel and scalable model.
Press Release
EMC Delivers Industry's First Unified Big Data Analytics Appliance
Greenplum’s Scalable, Modular System Combines Shared-Nothing MPP Relational Database with Enterprise-Class Apache Hadoop for Structured and Unstructured Data Co-processing
EMC Greenplum Data Integration Accelerator
This white paper describes the EMC Greenplum Data Integration Accelerator (DIA), and how it can be used for fast data loading, using the included gpfdist utility, as well as using a popular data integration (DI) tool called Informatica.
Case Study
Greenplum Data Computing Appliance reduces data latency, puts data to work faster, and drives competitive advantage.
Whitepaper
The Greenplum Modular Data Computing Appliance
Decades-old legacy architecture for data management and analytics is inherently unfit for scaling to today's big data volumes. Competitive advantage in this environment requires more sophistication, including predictive modeling and large-scale data mining for revenue enhancement, cost reduction, and breakthrough innovation.
Whitepaper
Data Computing Appliance Architectural Overview
An architectural overview of the EMC Greenplum Data Computing Appliance (DCA).
Whitepaper
Backup and Recovery of the Greenplum DCA Using EMC Data Domain
Insight into how EMC Data Domain de-duplication storage systems effectively deal with data growth, retention requirements and recovery service levels essential to businesses.
Analyst Report
EMC Greenplum is positioned as a leader in Gartner's Magic Quadrant for Data Warehouse Database Management Systems. The evaluation was based on completeness of vision and ability to execute.