Data Warehouse Database Features


Greenplum Database allows our customers to build petabyte-scale databases on commodity hardware and query them at high speed. With the introduction of MapReduce, in-database compression, programmable parallel analytics, and enhanced monitoring, Greenplum Database 3.2 handles unprecedented volumes of data.

Shared-nothing massively parallel processing architecture
Distributes data and executes queries in parallel across a cluster of machines. Includes the ability to leverage tens, hundreds, or thousands of processing cores with full parallelism.
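
As a rough illustration of the shared-nothing idea, the sketch below hashes each row to exactly one segment, lets every segment aggregate its own rows independently, and then combines the partial results. It is a minimal sketch under stated assumptions, not Greenplum's implementation: the segment count, the CRC32 hash, the thread pool standing in for separate machines, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_SEGMENTS = 4  # hypothetical cluster size, chosen for illustration


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Pick a segment by hashing the distribution key (CRC32 is an assumption)."""
    return crc32(key.encode("utf-8")) % num_segments


def distribute(rows, num_segments: int = NUM_SEGMENTS):
    """Place each (key, amount) row on exactly one segment."""
    segments = [[] for _ in range(num_segments)]
    for key, amount in rows:
        segments[segment_for(key, num_segments)].append((key, amount))
    return segments


def local_sum(segment_rows):
    """Work a segment does on its own rows only, with no shared state."""
    return sum(amount for _, amount in segment_rows)


def parallel_total(segments):
    """Run the local step on every segment at once, then combine the partials."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(pool.map(local_sum, segments))
    return sum(partials)


rows = [(f"customer-{i}", i) for i in range(1000)]
segments = distribute(rows)
total = parallel_total(segments)
print("rows per segment:", [len(s) for s in segments])
print("total:", total)
```

Because no segment reads another segment's rows, adding segments divides the local work without adding contention, which is the property the feature description relies on.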

Software solution leverages commodity hardware
Software is easily installed on commodity x86-based servers from a range of tier-1 vendors, and runs on both Linux and Solaris.

Fault tolerance and advanced replication
No single point of failure. Internally the system utilizes log shipping and segment-level replication to achieve redundancy, and provides automated failover.

Linear scalability
Shared-nothing architecture and parallel query optimization ensure that performance and capacity increase linearly to hundreds of nodes and thousands of processing cores.

MapReduce support
MapReduce has been proven as a technique for high-scale data analysis by Internet leaders such as Google and Yahoo. With Greenplum, this capability is available in-house to enterprises.
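
For readers new to the technique, the generic MapReduce pattern can be sketched in a few lines: a map step emits key/value pairs, a shuffle step groups values by key, and a reduce step folds each group into a result. This is a single-process sketch of the general pattern only; it does not use or represent Greenplum's MapReduce interface, and every name in it is hypothetical.

```python
from collections import defaultdict


def map_phase(document: str):
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as a cluster would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Fold the values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    """Run map over every document, shuffle, then reduce each key."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


documents = ["big data big queries", "data warehouse"]
counts = map_reduce(documents)
print(counts)
```

In a parallel system the map and reduce calls for different documents and keys are independent, so they can run on different nodes at the same time.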

SQL standard
Comprehensive SQL-92 and SQL-99 support with SQL 2003 OLAP extensions. All queries are parallelized and executed across the entire system.

Unified analytical processing
All queries and analysis (SQL, MapReduce, R, etc.) are executed on the same parallel dataflow engine, allowing analysts, developers, and statisticians to analyze data using a common infrastructure.

Programmable parallel analytics
Offers a new level of parallel analysis capabilities for mathematicians and statisticians, with support for R, linear algebra and machine learning primitives.

In-database compression
Utilizes industry-leading compression technology to increase performance and dramatically reduce the space required to store data. Customers can expect to see a 3-10x disk space reduction with a corresponding increase in effective I/O performance.
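
The link between disk space reduction and effective I/O can be worked through with simple arithmetic. The sketch below uses the 3x and 10x figures quoted above; the 100 TB raw volume and 200 MB/s disk rate are hypothetical inputs, and the model assumes decompression is never the bottleneck.

```python
RAW_TB = 100.0          # hypothetical uncompressed data volume
DISK_MB_PER_S = 200.0   # hypothetical physical scan rate of one disk


def compressed_size_tb(raw_tb: float, ratio: float) -> float:
    """Disk space needed after compressing raw_tb at the given ratio."""
    return raw_tb / ratio


def effective_scan_rate(disk_mb_per_s: float, ratio: float) -> float:
    """Logical MB scanned per second, assuming decompression keeps up."""
    return disk_mb_per_s * ratio


for ratio in (3, 10):
    size = compressed_size_tb(RAW_TB, ratio)
    rate = effective_scan_rate(DISK_MB_PER_S, ratio)
    print(f"{ratio}x: {size:.1f} TB on disk, {rate:.0f} MB/s effective scan rate")
```

Each physical byte read from disk expands into several logical bytes, so the same hardware scans proportionally more data per second.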

Petabyte-scale loading
A high-performance parallel data loader runs simultaneously across all cluster nodes, supporting load rates in excess of 4.5 TB/hr.
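
To put the quoted load rate in context, the arithmetic below converts 4.5 TB/hr into an aggregate throughput and an estimated time to load one petabyte. It is a back-of-the-envelope sketch: it uses decimal units (1 PB = 1000 TB, 1 TB = 1,000,000 MB), and the 40-node cluster size is a hypothetical figure chosen only to show the per-node share.

```python
LOAD_RATE_TB_PER_HR = 4.5   # figure quoted in the feature description
MB_PER_TB = 1_000_000       # decimal units
SECONDS_PER_HOUR = 3600


def hours_to_load(volume_tb: float, rate_tb_per_hr: float = LOAD_RATE_TB_PER_HR) -> float:
    """Hours needed to load volume_tb at a sustained aggregate rate."""
    return volume_tb / rate_tb_per_hr


def aggregate_mb_per_s(rate_tb_per_hr: float = LOAD_RATE_TB_PER_HR) -> float:
    """Cluster-wide throughput in MB/s for a rate given in TB/hr."""
    return rate_tb_per_hr * MB_PER_TB / SECONDS_PER_HOUR


def per_node_mb_per_s(num_nodes: int, rate_tb_per_hr: float = LOAD_RATE_TB_PER_HR) -> float:
    """Share of the aggregate throughput each node must sustain."""
    return aggregate_mb_per_s(rate_tb_per_hr) / num_nodes


hours = hours_to_load(1000.0)  # one petabyte
print(f"1 PB at 4.5 TB/hr: {hours:.1f} hours ({hours / 24:.2f} days)")
print(f"aggregate throughput: {aggregate_mb_per_s():.0f} MB/s")
print(f"per node on 40 nodes (hypothetical): {per_node_mb_per_s(40):.2f} MB/s")
```

Spreading the load across all nodes keeps the per-node rate modest even when the aggregate rate is high.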

Anywhere data access
Allows queries to be executed from the database against external data sources, returning data in parallel regardless of the sources' location, format, or storage medium.

Dynamic expansion
Allows companies to easily add data warehouse capacity in small or large increments, and avoid costly appliance or SMP server upgrades.

Advanced gNet interconnect technology
Utilizes pipelining techniques and redistributes data among nodes for high performance execution of complex joins.

Workload management
Allows administrators to create role-based resource queues to divide up resources and manage the load on the system.

Centralized administration
Provides cluster-wide management tools and utilities that allow administrators to manage the database as if it were a single system.

Performance monitoring
Graphical performance monitoring allows users to profile running and historical queries and track system utilization and resources.

Support for indexes
Greenplum supports B-tree, hash, bitmap, GiST, and GIN indexes, giving data architects the tools necessary to implement the optimal design.

Industry standard interfaces
Supports standard database interfaces (SQL, ODBC, JDBC, DBI) and is interoperable with market-leading business intelligence and extract/transform/load (ETL) tools.

 



