Greenplum HD: Enterprise-Ready Apache Hadoop

Advanced Distribution for Hadoop

Get full data protection with no single point of failure

Faster Performance

Experience 2-5 times the speed of standard Apache Hadoop

API Compatibility

Use a complete distribution that is 100 percent API-compatible with Apache Hadoop (MapReduce, HDFS, and HBase)

Greenplum MR

Based on MapR’s M5 Distribution, Greenplum MR enables you to increase the performance and availability of Hadoop via breakthrough innovations. With Greenplum MR, Hadoop is faster, more dependable, and easier to use.

Experienced Hadoop users often find that issues with ease of use, performance, and manageability dramatically slow their internal development while tying up valuable resources. To address these core Hadoop problems, Greenplum provides a distribution that finally meets the needs of the most advanced users. Greenplum MR is a complete distribution, re-engineered from the file system up through the analytics layers for industry-leading performance. These technological advances broaden the scope of how and where Hadoop can be used.


Innovation with Application Portability

Greenplum MR is a 100 percent interface-compatible implementation of the Apache Hadoop stack. By maintaining Hadoop interface compatibility, Greenplum MR provides seamless application portability while delivering the advanced features that organizations require.
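As an illustration of what interface compatibility means for portability, here is a minimal word-count job written in the style of Hadoop Streaming, where the mapper and reducer exchange tab-separated lines over stdin/stdout. It assumes submission through the standard Apache Hadoop Streaming jar (not shown); the function names `mapper`, `reducer`, and `run_locally` are hypothetical. A job like this depends only on the Hadoop interfaces, so it should run unchanged on any distribution that keeps them intact.

```python
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one "word<TAB>1" record per word, the format Hadoop Streaming expects."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each word; Hadoop delivers reducer input sorted by key."""
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines: Iterable[str]) -> list[str]:
    """Simulate map -> shuffle/sort -> reduce in a single process for testing."""
    return list(reducer(sorted(mapper(lines))))
```

On a cluster, thin wrapper scripts would call `mapper(sys.stdin)` and `reducer(sys.stdin)` and print each result line; the framework performs the sort between the two stages that `run_locally` imitates with `sorted`.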

Faster Performance

Greenplum MR achieves up to two times the performance of standard Apache Hadoop and can reduce your equipment costs by more than half.
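A back-of-the-envelope view of how a speedup translates into hardware savings, assuming a fixed workload $W$, a per-node throughput $t$, and equipment cost proportional to the node count $N$ (simplifying assumptions, not figures from Greenplum):

```latex
N = \frac{W}{t}, \qquad N' = \frac{W}{2t} = \frac{N}{2}
```

Under this model, doubling per-node throughput halves the number of nodes, and therefore the equipment cost, needed for the same workload.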

Greenplum MR Direct Shuffle uses the distributed NameNode to improve Reduce-phase performance dramatically. Other Hadoop distributions write shuffle data to the local file system and transport it over HTTP; in Greenplum MR, shuffle data is readable directly from anywhere on the network. Shuffle data is stored in a shared system that eliminates the contention and overhead of transporting and retrieving it. Automatic, transparent client-side compression reduces both network overhead and disk footprint, while direct block-device I/O delivers throughput at hardware speed with no additional overhead. As an additional performance boost, files can be read while they are still being written.
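To show why compressing data on the client cuts both network traffic and disk usage, the following sketch compresses a batch of intermediate key/value records before they would be sent or stored, and decompresses them on the reading side. It uses the standard-library `zlib` codec purely as a stand-in; the codec Greenplum MR actually uses is not stated here, and `pack_records`/`unpack_records` are hypothetical helpers.

```python
import zlib


def pack_records(records: list[tuple[str, int]]) -> bytes:
    """Serialize key/value records as tab-separated lines and compress them on the client."""
    payload = "".join(f"{key}\t{value}\n" for key, value in records).encode("utf-8")
    return zlib.compress(payload, level=6)


def unpack_records(blob: bytes) -> list[tuple[str, int]]:
    """Decompress and parse records on the reading side."""
    text = zlib.decompress(blob).decode("utf-8")
    records = []
    for line in text.splitlines():
        key, value = line.split("\t", 1)
        records.append((key, int(value)))
    return records


# Repetitive intermediate data, typical of map output, compresses well.
records = [(f"user{i % 10}", i % 3) for i in range(1000)]
raw_size = len("".join(f"{k}\t{v}\n" for k, v in records).encode("utf-8"))
blob = pack_records(records)
print(f"raw={raw_size} bytes, compressed={len(blob)} bytes")
```

Because the compressed bytes are what cross the network and land on disk, every stage downstream of the writer benefits without the application changing.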

Improved Data Access

NFS access makes Hadoop radically easier and less expensive to use. Greenplum MR allows files to be modified and overwritten, and enables multiple concurrent reads and writes on any file.
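Because the cluster is exposed over NFS, ordinary POSIX file operations apply, including overwriting bytes in the middle of an existing file, which plain HDFS (write-once, append-only) does not support. A minimal sketch, assuming the cluster is mounted at a path such as `/mapr/my.cluster.com` (a hypothetical mount point); `overwrite_at` is a hypothetical helper.

```python
import os


def overwrite_at(path: str, offset: int, data: bytes) -> None:
    """Overwrite len(data) bytes of an existing file starting at offset, leaving the rest intact."""
    # "r+b" opens for reading and writing without truncating the file.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the change through to the server


# Hypothetical usage against a file under the cluster's NFS mount:
# overwrite_at("/mapr/my.cluster.com/data/records.bin", 128, b"\x00" * 16)
```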

Users can simply browse files, automatically open associated applications with a mouse click, or drag and drop files and directories into and out of the cluster. Standard command-line tools and UNIX utilities (such as grep, tar, sort, and tail) can also be used directly on data in the cluster. With other Hadoop distributions, users must copy data out of the cluster before they can use standard tools.
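The same applies to programs in any language: since cluster data appears as ordinary files, a script can scan it with standard library calls and no Hadoop client. This sketch does in Python what `grep PATTERN FILE | tail -n N` does in the shell; `last_matches` and the file path in the usage comment are hypothetical.

```python
import re
from collections import deque
from typing import Iterable


def last_matches(lines: Iterable[str], pattern: str, n: int = 10) -> list[str]:
    """Return the last n lines matching a regex, like `grep PATTERN | tail -n N`."""
    regex = re.compile(pattern)
    tail: deque[str] = deque(maxlen=n)  # keeps only the n most recent matches in memory
    for line in lines:
        if regex.search(line):
            tail.append(line.rstrip("\n"))
    return list(tail)


# Hypothetical usage on a log file under the cluster's NFS mount:
# with open("/mapr/my.cluster.com/logs/app.log", encoding="utf-8") as f:
#     print(last_matches(f, r"ERROR", n=3))
```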

Intuitive Resource Management

Greenplum MR includes the MapR Control System (MCS), which enables full visibility into cluster resources and activity. The tool provides visual insight into node health, service status, and resource utilization, organized by cluster topology (such as datacenters and racks).

Designed to manage large clusters with thousands of nodes, the MapR Heatmap shows the health of the entire cluster at a glance. Because the number of nodes, files, and volumes can be very high, MCS provides filters for selecting specific components and group actions for performing administrative tasks on them directly. In addition to the graphical interface, MCS offers CLI and REST access.
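The filter-then-act pattern behind group actions can be sketched as follows: select nodes by topology and health, group them by rack, then apply one action to the whole selection. The `Node` record, its fields, and the topology path format are hypothetical stand-ins, not the MCS data model; in practice the node list would come from the MCS CLI or REST interface.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Node:
    hostname: str
    topology: str  # hypothetical path such as "/dc1/rack2"
    healthy: bool


def unhealthy_by_rack(nodes: list[Node], datacenter: str) -> dict[str, list[str]]:
    """Filter to one datacenter's unhealthy nodes and group their hostnames by rack path."""
    prefix = f"/{datacenter}/"
    groups: dict[str, list[str]] = defaultdict(list)
    for node in nodes:
        if node.topology.startswith(prefix) and not node.healthy:
            groups[node.topology].append(node.hostname)
    return dict(groups)


def apply_group_action(hostnames: list[str], action: Callable[[str], str]) -> list[str]:
    """Run one caller-supplied administrative action on every selected node."""
    return [action(host) for host in hostnames]
```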

Data Sheet

Greenplum MR

Product overview of Greenplum MR: High-Performance Hadoop.

Whitepaper

Cisco and Greenplum Partner to Deliver High-Performance Hadoop Reference Configurations

Greenplum MR, together with the Cisco Unified Computing System, provides companies with an integrated Hadoop solution that delivers advanced performance, full data protection, no single point of failure, and improved data access features that can expedite the implementation of big data analytics environments.

Solution Brief

High-Performance Hadoop Configuration Solution Brief