MPP Scatter/Gather Streaming™ Technology

Greenplum's new MPP Scatter/Gather Streaming™ (SG Streaming™) technology eliminates the bottlenecks associated with other approaches to data loading, enabling lightning-fast flow of data into the Greenplum Database for large-scale analytics and data warehousing. Greenplum customers are achieving production loading speeds of over four terabytes per hour with negligible impact on concurrent database operations.

  • Scatter/Gather Streaming™:
    • manages the flow of data into all nodes of the database
    • does not require additional software or systems
    • takes advantage of the same Parallel Dataflow Engine nodes in Greenplum Database

    Greenplum utilizes a 'parallel-everywhere' approach to loading, in which data flows from one or more source systems to every node of the database without any sequential choke points. This differs from the traditional 'bulk loading' technologies used by most mainstream database and MPP appliance vendors. Those technologies push data from a single source, often over one or a small number of parallel channels, which results in fundamental bottlenecks and ever-increasing load times. Greenplum's approach also avoids the 'loader' tier of servers required by some other MPP database vendors. Such a tier can add significant complexity and cost while effectively bottlenecking the bandwidth and parallelism of communication into the database.
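To make the contrast with single-channel bulk loading concrete, here is a minimal Python sketch of 'parallel-everywhere' fan-out. Every source routes its rows directly to every node, so the number of streams is (sources × nodes) and no loader server sits in between. This is an illustrative model only, not Greenplum's implementation. The `scatter` function, the sample sources, and the round-robin routing policy are hypothetical assumptions. The sketch also models only *which* node each row reaches, not the concurrency of the streams.

```python
from collections import defaultdict
from itertools import cycle


def scatter(sources, num_nodes):
    """Route rows from every source to every node (hypothetical round-robin policy).

    sources: dict mapping a source name to an iterable of rows (illustrative input).
    Returns a dict: node_id -> list of (source_name, row) pairs that node received.
    """
    received = defaultdict(list)
    for name, rows in sources.items():
        # Each source has its own stream to each node, so there is no
        # single shared channel or loader tier acting as a choke point.
        targets = cycle(range(num_nodes))
        for row in rows:
            received[next(targets)].append((name, row))
    return dict(received)


# Hypothetical example: two source systems feeding a four-node cluster.
sources = {
    "src_a": [f"a{i}" for i in range(8)],
    "src_b": [f"b{i}" for i in range(8)],
}
NUM_NODES = 4
by_node = scatter(sources, NUM_NODES)
streams = len(sources) * NUM_NODES  # every source talks to every node: 8 streams
```

With two sources and four nodes, each node receives two rows from each source. Adding nodes adds streams rather than lengthening a single queue.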

    Greenplum's SG Streaming™ technology ensures parallelism by 'scattering' data from all source systems across hundreds or thousands of parallel streams that simultaneously flow to all nodes of the Greenplum Database. Performance scales with the number of Greenplum Database nodes. The technology supports both large batch and continuous near-real-time loading patterns with negligible impact on concurrent database operations. Data can be transformed and processed in flight, using all nodes of the database in parallel, for extremely high-performance ELT (extract-load-transform) and ETLT (extract-transform-load-transform) loading pipelines. The final 'gathering' and storage of data to disk takes place on all nodes simultaneously, with data automatically partitioned across nodes and optionally compressed. This technology is exposed to the DBA through a flexible and programmable 'external table' interface and a traditional command-line loading interface.
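The scatter, in-flight transform, and gather stages described above can be modeled in a short Python sketch. Each node transforms the chunk it receives and buckets rows by an owning node. The owners then collect their buckets and serialize them, with optional compression, all at the same time. This is a conceptual model under stated assumptions, not Greenplum's code or API. In the product, this is driven through the external-table and command-line interfaces, not Python. Everything named here is a hypothetical stand-in: `NUM_NODES`, `owner`, `transform`, the round-robin chunking, CRC32 hash partitioning, and JSON serialization.

```python
import json
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 4  # hypothetical cluster size


def owner(key: str, num_nodes: int = NUM_NODES) -> int:
    # Hypothetical hash partitioning; crc32 is stable across processes,
    # unlike Python's built-in hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def transform(row: dict) -> dict:
    # Hypothetical in-flight transformation applied while data streams in.
    return {**row, "name": row["name"].strip().upper()}


def scatter_stage(node_id: int, chunk: list) -> dict:
    """On one node: transform the incoming chunk and bucket rows by owning node."""
    buckets = {n: [] for n in range(NUM_NODES)}
    for row in chunk:
        row = transform(row)
        buckets[owner(row["id"])].append(row)
    return buckets


def gather_stage(rows: list, compress: bool = True) -> bytes:
    """On the owning node: serialize its partition and optionally compress it."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return zlib.compress(payload) if compress else payload


def load(rows: list, compress: bool = True) -> dict:
    # Scatter: one chunk per node (round-robin split is an assumption).
    chunks = [rows[i::NUM_NODES] for i in range(NUM_NODES)]
    with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:
        bucket_sets = list(pool.map(scatter_stage, range(NUM_NODES), chunks))
    # Redistribute: each owner collects its bucket from every node.
    per_owner = {n: [r for b in bucket_sets for r in b[n]] for n in range(NUM_NODES)}
    # Gather: all owners store their partitions concurrently.
    with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:
        stored = list(pool.map(lambda n: gather_stage(per_owner[n], compress),
                               range(NUM_NODES)))
    return dict(zip(range(NUM_NODES), stored))


# Usage with illustrative rows; decompress to inspect each node's partition.
rows = [{"id": f"k{i}", "name": f" item {i} "} for i in range(20)]
stored = load(rows)
restored = {n: json.loads(zlib.decompress(blob)) for n, blob in stored.items()}
```

Hash partitioning on a key sends every row with the same key to the same node. That is why the gather step can proceed on all nodes at once without further coordination.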
