MapReduce has been proven as a technique for high-scale data analysis by Internet leaders such as Google and Yahoo. Greenplum gives enterprises the best of both worlds – MapReduce for programmers and SQL for DBAs – and will execute both MapReduce and SQL directly within Greenplum’s parallel dataflow engine, which is at the heart of the Greenplum Database.
Greenplum MapReduce enables programmers to run analytics against petabyte-scale datasets stored both inside and outside the Greenplum Database. It pairs the benefits of an increasingly standard programming model with the reliability and familiarity of the relational database, extending the Greenplum Database to run MapReduce programs.
Parallel Dataflow Engine: MapReduce + SQL
“Greenplum has been mastering the use of a parallel dataflow engine as the heart of our core product, Greenplum Database. With several years of experience developing and deploying Greenplum Database at companies large and small, we know how to build highly scalable data solutions. Adding MapReduce means our customers will be able to use a leading-edge new technology on a stable, reliable foundation.”
Luke Lonergan, CTO, Greenplum
Key Driver: Need for Petabyte-Scale Data Analytics
Greenplum customers have been taking part in an early-access program that uses Greenplum MapReduce for advanced analytics. For example, LinkedIn is using Greenplum Database for innovative social networking features such as “People You May Know” and is evaluating Greenplum MapReduce as a way to develop compelling analytics products faster. A primary benefit of the new capability is that customers can combine SQL queries and MapReduce programs into unified tasks that are executed in parallel across hundreds or thousands of cores.
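For readers new to the model, the idea of combining a SQL query with a MapReduce program can be sketched in a few lines of plain Python. This is only an illustration, not Greenplum's interface: it uses Python's built-in sqlite3 module as a stand-in database, runs on a single core, and the table `documents` and the functions `map_words`, `reduce_counts`, and `word_counts` are hypothetical names chosen for the example.

```python
import sqlite3
from collections import defaultdict


def map_words(row):
    """Map step: emit a (word, 1) pair for every word in a document body."""
    (body,) = row
    for word in body.lower().split():
        yield word, 1


def reduce_counts(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def word_counts(conn, source):
    """Use SQL to select the input rows, then map and reduce over them."""
    rows = conn.execute(
        "SELECT body FROM documents WHERE source = ?", (source,)
    )
    pairs = (pair for row in rows for pair in map_words(row))
    return reduce_counts(pairs)


# Stand-in data: an in-memory SQLite table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (source TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [
        ("blog", "parallel data analysis"),
        ("blog", "parallel dataflow engine"),
        ("news", "data warehouse"),
    ],
)

counts = word_counts(conn, "blog")
print(counts)
```

Here the SQL query narrows the input to the rows of interest and the map and reduce functions carry the custom logic. In a parallel engine, the map and reduce steps would be distributed across many cores rather than run in a single process as they are in this sketch.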
“The integration of MapReduce into Greenplum Database creates new ways to manage our text analysis efforts. What previously would require us to take data out of the database or write complex SQL queries can now be simplified into a few lines of code.”
Roger Magoulas, Research Director, O’Reilly Media
"The most exciting aspect of MapReduce is the excitement it is generating. It's attracting talented programmers -- many of whom don't want to buy or use SQL databases -- and enabling them to wrangle enormous data sets without leaving their familiar programming paradigms. Any movement that brings that much compute power to a larger talent base has the potential to produce game-changing results."
Joe Hellerstein, Professor, UC Berkeley
Key Features of Greenplum MapReduce
- Combine SQL & MapReduce
- Process any type of data wherever it lives
- Enterprise-level integration and support