Greenplum HD: Enterprise-Ready Apache Hadoop
Advanced Distribution for Hadoop
Get full data protection with no single point of failure
Faster Performance
Experience 2-5 times the speed of standard Apache Hadoop
API Compatibility
Use a complete distribution that is 100 percent API-compatible with Apache Hadoop (MapReduce, HDFS, and HBase)
Greenplum MR
Based on MapR’s M5 Distribution, Greenplum MR enables you to increase the performance and availability of Hadoop via breakthrough innovations. With Greenplum MR, Hadoop is faster, more dependable, and easier to use.
Experienced Hadoop users often find that issues with ease of use, performance, and manageability dramatically slow their internal development while tying up valuable resources. To address these core Hadoop problems, Greenplum provides a distribution that meets the needs of even the most advanced users. Greenplum MR is a complete distribution. It has been re-engineered from the file system up through the analytics layers to deliver the fastest performance in the industry. These technological advances broaden the scope of how and where Hadoop can be used.
Innovation with Application Portability
Greenplum MR is a 100 percent interface-compatible implementation of the Apache Hadoop stack. By maintaining Hadoop interface compatibility, Greenplum MR provides seamless application portability while delivering the advanced features that organizations require.
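Interface compatibility means that a job written against the standard Hadoop interfaces should run without modification. As an illustration, here is a minimal word-count job for Hadoop Streaming. Hadoop Streaming is the standard Hadoop utility that runs any executable as a mapper or reducer over stdin/stdout. This is a sketch: the file name `wordcount.py` and the `map`/`reduce` arguments are our own conventions, and nothing in it is specific to Greenplum MR.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Map phase: emit one tab-separated "word<TAB>1" record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(records: Iterable[str]) -> Iterator[str]:
    """Reduce phase: sum the counts for each word.

    Hadoop Streaming hands the reducer its input sorted by key, so all
    records for one word arrive consecutively and can be grouped.
    """
    parsed = (rec.rstrip("\n").split("\t", 1) for rec in records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Invoked as "wordcount.py map" or "wordcount.py reduce" (our convention).
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

Before submitting the job through the Streaming jar, you can dry-run it locally with a shell pipeline such as `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`. Here `sort` stands in for Hadoop's shuffle-and-sort step.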
Faster Performance
Greenplum MR achieves up to two times the performance of standard Apache Hadoop and can reduce your equipment costs by more than half.
Greenplum MR Direct Shuffle uses the distributed NameNode to drastically improve Reduce-phase performance. Other Hadoop distributions write shuffle data to each node's local file system and transport it over HTTP. Greenplum MR instead stores shuffle data in its shared distributed file system, so reducers can read it directly from anywhere on the network. This eliminates the contention and overhead of transporting and retrieving that data.
Automatic, transparent client-side compression reduces both network overhead and on-disk footprint. Direct block device I/O delivers throughput at hardware speed with no additional overhead. As a further performance boost, files can be read while they are still being written.
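The difference between an HTTP-based shuffle and a shuffle through shared storage can be sketched in a few lines. The toy, single-process model below assumes that a local directory stands in for the cluster-wide file system. Mappers compress each reducer's partition on the client side and write it straight into shared storage. Each reducer then reads its partition directly from there, instead of fetching segments from every mapper over HTTP. The function names and the `part-N/map-M.z` layout are hypothetical. This is not Greenplum MR's actual implementation.

```python
import json
import tempfile
import zlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Tuple


def partition_for(key: str, num_reducers: int) -> int:
    """Deterministic hash partitioner (Python's built-in hash() is salted per process)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def write_map_output(shared: Path, map_id: int,
                     pairs: List[Tuple[str, int]], num_reducers: int) -> None:
    """Map side: bucket output by reducer, compress each bucket client-side,
    and write it into the shared store rather than the mapper's local disk.
    (Hypothetical layout: shared/part-<reducer>/map-<mapper>.z)"""
    buckets: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for key, value in pairs:
        buckets[partition_for(key, num_reducers)].append((key, value))
    for reducer_id, bucket in buckets.items():
        part_dir = shared / f"part-{reducer_id}"
        part_dir.mkdir(parents=True, exist_ok=True)
        payload = json.dumps(bucket).encode("utf-8")
        (part_dir / f"map-{map_id}.z").write_bytes(zlib.compress(payload))


def read_reduce_input(shared: Path, reducer_id: int) -> Dict[str, int]:
    """Reduce side: read every mapper's segment for this partition directly
    from the shared store (no per-mapper HTTP fetch) and sum values per key."""
    part_dir = shared / f"part-{reducer_id}"
    if not part_dir.exists():
        return {}
    totals: Dict[str, int] = defaultdict(int)
    for segment in sorted(part_dir.glob("map-*.z")):
        for key, value in json.loads(zlib.decompress(segment.read_bytes())):
            totals[key] += value
    return dict(totals)


if __name__ == "__main__":
    # Two mappers and two reducers sharing one temporary "cluster" directory.
    with tempfile.TemporaryDirectory() as tmp:
        shared = Path(tmp)
        write_map_output(shared, 0, [("a", 1), ("b", 1)], num_reducers=2)
        write_map_output(shared, 1, [("a", 1), ("c", 1)], num_reducers=2)
        for r in range(2):
            print(r, read_reduce_input(shared, r))
```

The sketch uses a CRC32-based partitioner rather than Python's built-in `hash()`. This keeps partition assignment stable across processes, which a real shuffle requires because mappers and reducers run in different processes.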
Improved Data Access
NFS access makes Hadoop radically easier and less expensive to use. Greenplum MR allows files to be modified and overwritten, and enables multiple concurrent reads and writes on any file.
Users can simply browse files, open them in their associated applications with a mouse click, or drag and drop files and directories into and out of the cluster. Standard command-line tools and UNIX applications and utilities (such as grep, tar, sort, or tail) can also be used directly on data in the cluster. With other Hadoop distributions, users must copy the data out of the cluster to use standard tools.
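Because the cluster appears as an ordinary mounted file system, programs need no Hadoop client library to work with its data. The sketch below assumes the cluster is NFS-mounted at a hypothetical path (`/mapr/my.cluster.com`), which can be overridden through a `CLUSTER_ROOT` environment variable. It uses only standard Python file I/O for two tasks. The first is a recursive search equivalent to `grep -rn`. The second is an in-place overwrite of bytes inside an existing file, which the text says Greenplum MR permits. Standard HDFS files cannot be modified in place once written.

```python
import os
import re
from pathlib import Path
from typing import Iterator, Tuple

# Hypothetical NFS mount point for the cluster; adjust for your environment.
CLUSTER_ROOT = Path(os.environ.get("CLUSTER_ROOT", "/mapr/my.cluster.com"))


def grep_tree(root: Path, pattern: str) -> Iterator[Tuple[Path, int, str]]:
    """Equivalent of `grep -rn`: yield (file, line number, line) for each match."""
    regex = re.compile(pattern)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        with path.open("r", encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if regex.search(line):
                    yield path, lineno, line.rstrip("\n")


def overwrite_in_place(path: Path, offset: int, data: bytes) -> None:
    """Overwrite bytes at a given offset without rewriting the whole file."""
    with path.open("r+b") as fh:
        fh.seek(offset)
        fh.write(data)


if __name__ == "__main__":
    # Only meaningful on a machine where the cluster is actually mounted.
    logs = CLUSTER_ROOT / "logs"  # hypothetical directory
    if logs.exists():
        for path, lineno, line in grep_tree(logs, r"ERROR"):
            print(f"{path}:{lineno}:{line}")
```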
Intuitive Resource Management
Greenplum MR includes the MapR Control System (MCS), which enables full visibility into cluster resources and activity. The tool provides visual insight into node health, service status, and resource utilization, organized by cluster topology (such as datacenters and racks).
Designed for large clusters with thousands of nodes, the MapR Heatmap shows the health of the entire cluster at a glance. Because the number of nodes, files, and volumes can be very high, filters and group actions let administrators select specific components and perform administrative actions on them directly. The same capabilities are also available through CLI and REST interfaces.
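The filter-then-act workflow can be sketched independently of any particular interface. The code below assumes that node records have already been retrieved as JSON through the CLI or REST interface. It groups node health by topology location, giving one heatmap cell per rack, and selects the hostnames that a group action would target. The record fields (`hostname`, `topology`, `health`) and the example values are hypothetical, not the documented MCS schema.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List

# Hypothetical record shape (not the real MCS schema), e.g.:
#   {"hostname": "n1", "topology": "/dc1/rack1", "health": "healthy"}


def heatmap_by_rack(nodes: Iterable[dict]) -> Dict[str, Counter]:
    """Count nodes per health state within each topology location (one heatmap cell each)."""
    grid: Dict[str, Counter] = defaultdict(Counter)
    for node in nodes:
        grid[node["topology"]][node["health"]] += 1
    return dict(grid)


def select(nodes: Iterable[dict], **criteria: str) -> List[str]:
    """Filter step: hostnames of nodes whose fields match every criterion."""
    return [
        n["hostname"]
        for n in nodes
        if all(n.get(field) == value for field, value in criteria.items())
    ]
```

The list returned by `select` is the input that a group action would then operate on. For example, an administrator might restart a service on every degraded node in one rack.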
Whitepaper
Cisco and Greenplum Partner to Deliver High-Performance Hadoop Reference Configurations
Greenplum MR, together with the Cisco Unified Computing System, provides companies with an integrated Hadoop solution that delivers advanced performance, full data protection, no single point of failure, and improved data access features that can expedite the implementation of big data analytics environments.
High-Performance Hadoop Configuration Solution Brief