Making Hadoop MapReduce Work with a Redis Cluster
Redis is a very cool open-source key-value store that can add instant value to your Hadoop installation. Because Redis keeps its data in memory and supports rich value types (strings, hashes, lists, sets and sorted sets), it can serve as a front end for data coming out of Hadoop, caching your ‘hot’ pieces of data in memory for fast access when they are needed again.
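The "hot data" caching described above is usually implemented as the cache-aside pattern: check the fast store first, and only on a miss read from the slow backend and populate the cache. A minimal sketch follows, assuming a hypothetical `load_from_hadoop` callable for the slow path and a dict-backed `InMemoryStore` standing in for a Redis client. A real client would issue Redis GET and SET commands against the cluster instead.

```python
from typing import Callable, Dict, Optional


class InMemoryStore:
    """Hypothetical stand-in for a Redis client: a dict exposing get/set.

    In a real deployment these calls would become Redis GET/SET commands.
    """

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def cached_lookup(
    key: str,
    cache: InMemoryStore,
    load_from_hadoop: Callable[[str], str],
) -> str:
    """Cache-aside read: try the in-memory store first, fall back to Hadoop."""
    value = cache.get(key)
    if value is not None:
        return value  # cache hit: served from memory
    value = load_from_hadoop(key)  # cache miss: slow path (assumed Hadoop read)
    cache.set(key, value)  # keep the now-hot value for the next request
    return value


if __name__ == "__main__":
    calls = []

    def fake_hadoop_read(key: str) -> str:
        # Hypothetical loader that records how often the slow path runs.
        calls.append(key)
        return f"result-for-{key}"

    store = InMemoryStore()
    print(cached_lookup("user:42", store, fake_hadoop_read))
    print(cached_lookup("user:42", store, fake_hadoop_read))
    print(len(calls))  # 1: the second lookup never touched the slow path
```

A production version would normally also give cached keys an expiry (Redis SET accepts an EX option) so stale results age out; that is omitted here to keep the sketch short.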
The History of Hadoop: From Small Starts to Big Data
Named after a toy elephant belonging to developer Doug Cutting’s son, Hadoop has proven over the past decade to be the little platform that could. From its humble beginnings as an open-source search engine project created by Cutting and Mike Cafarella, Hadoop has evolved into a robust platform for Big Data storage and analysis.
Disruptive Data Science – Transforming Your Company into a Data Science-Driven Enterprise
Big Data is the latest technology wave affecting C-level executives across all areas of business, but amid the hype, there remains confusion about what it all means. The name emphasizes the exponential growth of data volumes worldwide (collectively 2.5 exabytes per day, in the latest estimate I saw from IDC), but more nuanced definitions of Big Data incorporate three key tenets: diversification, low latency, and ubiquity.
Hadoop Vaidya: Performance Advisor for Hadoop Map/Reduce Jobs
It’s been a few years since I open-sourced Hadoop Vaidya as a “contrib” project under Apache Hadoop. It is a rule-based performance diagnostic framework for MapReduce jobs: each rule (aka diagnostic test) identifies a specific problem with the job’s performance or scalability, or a violation of best practices, and suggests a solution.
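To make the rule-based idea concrete, here is a small hypothetical Python illustration. It is not Vaidya’s actual implementation or API: each rule computes an impact score from job counters and, when the score reaches the rule’s threshold, contributes its prescription to the report. The counter names, the thresholds, and both sample rules are assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Job-level counters keyed by name; these names are illustrative assumptions.
JobStats = Dict[str, float]


@dataclass
class Rule:
    """One diagnostic test: scores a problem and carries a suggested fix."""
    name: str
    impact: Callable[[JobStats], float]  # 0.0 = no problem, 1.0 = severe
    threshold: float  # report the rule when impact >= threshold
    prescription: str


def diagnose(stats: JobStats, rules: List[Rule]) -> List[str]:
    """Run every rule against the job's stats and collect those that fire."""
    report = []
    for rule in rules:
        score = rule.impact(stats)
        if score >= rule.threshold:
            report.append(f"{rule.name} (impact {score:.2f}): {rule.prescription}")
    return report


def spill_impact(s: JobStats) -> float:
    # Assumption: about one spill per map output record is the baseline;
    # a higher ratio suggests extra spill/merge passes and extra disk I/O.
    ratio = s["spilled_records"] / max(1.0, s["map_output_records"])
    return min(1.0, max(0.0, ratio - 1.0))


def tiny_maps_impact(s: JobStats) -> float:
    # Assumption: flag jobs whose map tasks read under 16 MiB each on average.
    avg_bytes = s["map_input_bytes"] / max(1.0, s["num_maps"])
    return 1.0 if avg_bytes < 16 * 2**20 else 0.0


RULES = [
    Rule("Excessive map-side spilling", spill_impact, 0.5,
         "Enlarge the map-side sort buffer or shrink map output."),
    Rule("Many tiny map tasks", tiny_maps_impact, 1.0,
         "Combine small input files or raise the minimum split size."),
]

if __name__ == "__main__":
    job = {"spilled_records": 3000, "map_output_records": 1000,
           "map_input_bytes": 100 * 2**20, "num_maps": 50}
    for line in diagnose(job, RULES):
        print(line)
```

Keeping each rule as data (a scoring function, a threshold and a prescription) means new diagnostics can be added without touching the `diagnose` loop, which is the main appeal of a rule-based design.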
Meet the “Team of Rivals” Building Greenplum HD
When our company was acquired by EMC in July of 2010, we could have easily been scooped up and monetized as a pretty nice data warehousing business for our parent company. They decided to do the opposite. EMC’s leadership believed in our team and our vision for leading the Big Data analytics industry and decided to double down on their investment.
Towards a Unified In-Situ Analytics System
With ever-growing data sets produced by user-generated online content and activity, plus rising volumes of machine-generated data from server logs and network traffic monitoring, enterprise customers want the best of both worlds. They want to run complex interactive queries and sophisticated reports easily, using their existing BI tool sets.