HAWQ: The New Benchmark for SQL on Hadoop


Business data analytics has changed tremendously in recent years. When enterprise datasets consisted entirely of structured data generated from ERP, CRM and other operational databases, businesses would typically use heavyweight ETL processes to load data into enterprise data marts or EDW systems.


Managing Hot and Cold Data Using a Unified Storage System

Photo by Daniel Hurst. (Getty Images)

In my previous blog post, “Hadoop and Disparate Data Stores”, I introduced a project Greenplum is working on that abstracts the various storage options within an organization under a unified layer, referred to as the Unified Storage System (USS). One advantage of USS is that it can help with storage tiering, a concept that has been around for some time but is unfamiliar to some.


Towards a Unified In-Situ Analytics System

Photo by Michael Mandiberg via Flickr. (CC BY-SA 2.0)

With ever-growing data sets produced by user-generated online content and activity, and growing volumes of machine-generated data from server logging and network traffic monitoring, enterprise customers want the best of both worlds. They want to perform complicated interactive queries and sophisticated reporting easily, using existing BI tool sets.


Can Anyone Become a Data Scientist? Oxdata Believes So

Visualization for Popular Science magazine by Jer Thorp via Flickr. CC BY 2.0 license.

Data science is a sophisticated and complex discipline, but since it’s still an emerging field, its practitioners come from a wide variety of backgrounds. Typically, though, a background in working with large data sets in a research setting is advantageous. This is why you may find yourself mingling with a former physicist or immunologist at the next data hackathon you attend.


Hadoop and Disparate Data Stores


Through our experience working with customers on Big Data platforms, we’ve noticed that there are fundamentally two types of Hadoop users. The first type is “Hadoop-centric” users, who are building platforms entirely on Hadoop and no longer want to use relational database technologies for analytics; these tend to be the early adopters of Hadoop. The second type is users who are using Hadoop to augment existing systems and are focused on integrating it with existing analytical databases and workflows; these tend to be the later adopters, who are still building their Hadoop skills internally.
