Greenplum Analytics Workbench: Scale-Out Development Environment for Big Data Innovation and Research
Accelerate Hadoop Technology
The Greenplum Analytics Workbench will enable the Apache Hadoop open source community to validate code at scale on a regular, ongoing basis. With contributions certified at scale, enterprises can run them with confidence.
Big Data Application Innovation
Greenplum will use the Analytics Workbench not only to test the limits of scale-out infrastructure technology but also to redefine the models for applying Big Data analytics.
Training & Certification
A unique aspect of Greenplum’s Hadoop training program is that each participant who successfully completes Greenplum’s training and certification process will be granted access to the 1,000-node cluster as a sandbox environment.
Greenplum Analytics Workbench
The Greenplum Analytics Workbench, a 1,000-node cluster that will act as a lab environment for accelerating the pace of Big Data innovation, is now live. One of its primary uses will be scale validation of the Apache Hadoop code base. Greenplum is actively working with the Apache Software Foundation to ensure that all results from the Analytics Workbench are available to the open source community, so that the Workbench's resources further accelerate the development of Hadoop as a revolutionary technology for Big Data. The Analytics Workbench consists of technology from some of the world's leading software and hardware manufacturers, providing the infrastructure needed to fuel the progression of Big Data analytics.
Hadoop innovation and development rely on contributions made by open source developers. However, the Apache Hadoop community has consistently faced the challenge of provisioning the resources required to validate new releases of the open source software. Without access to a large cluster for scale validation, the Apache community and enterprise users must wait for Hadoop user communities to sponsor an effort to run scale validations. This happens very infrequently, and a lot of time is spent stabilizing releases for enterprise adoption.
With an aggressive plan for testing on the Apache Hadoop trunk and its continuing releases, EMC is excited to contribute to the Hadoop open source community by providing the testing resources the community lacks, so that bugs can be identified quickly, new releases stabilized, and hardware configurations optimized, all in an effort to speed up the innovation of Hadoop. EMC plans to provide test results to the Apache Software Foundation and open source community, and EMC’s testing will be planned in coordination with the Apache Hadoop project.
The Greenplum Analytics Workbench is the result of a collaboration among several hardware and software vendors, including Intel, VMware, Micron, Seagate, Supermicro, Switch, and Mellanox Technologies.
The test bed cluster consists of 1,000-plus hardware nodes (10,000 nodes with the addition of virtual machines) and features 24 petabytes of physical storage. This is the equivalent of nearly half of the entire written works of mankind, from the beginning of recorded history.
Scale-Out Hadoop Validation
The Greenplum Analytics Workbench will be used for regular integration tests on Apache Hadoop. The 1,000-plus node test bed cluster incorporates technology from the world’s leading software and hardware manufacturers with the intention of providing the infrastructure needed to facilitate Apache Hadoop innovation. With the availability of a large-scale test bed, developers can have their contributions validated at scale, and enterprises can confidently deploy new releases in a production environment.
Innovative Applications of Big Data Analytics
Greenplum will use the Analytics Workbench not only to test the limits of scale-out infrastructure technology but also to redefine the models for applying Big Data analytics. Whether that involves working with visionary academic institutions on data-intensive research studies or collaborating with big data application developers, Greenplum plans to provide the most innovative thinkers in the data space with access to the Analytics Workbench.
Greenplum Training and Certification
The 1,000-node cluster will also be made available to members of Greenplum’s training and certification classes for Hadoop. With the first publicly available courses launching this summer, Greenplum will offer organizations and individuals a set of comprehensive Hadoop training programs designed to provide participants with the knowledge and programming skills required to succeed with Hadoop. A unique aspect of Greenplum’s Hadoop training program is that each participant who successfully completes Greenplum’s training and certification process will be granted access to the 1,000-node cluster as a sandbox environment.
This whitepaper details how the Greenplum Analytics Workbench was designed and built to validate Apache Hadoop code at scale, as well as to provide a large-scale experimentation environment for mixed-mode development that includes various SQL and non-SQL execution environments.
Greenplum and Partners Launch 1,000-Node Platform to Accelerate Hadoop® Testing and Development
EMC and industry-leading companies including Intel, VMware, Micron, Seagate, Supermicro, Switch, and Mellanox Technologies partner to deliver the Greenplum Analytics Workbench™ analytic computing platform