Hadoop is an open-source software framework for running large batch-oriented information analytics jobs across clusters of commodity servers. Inspired by work at Google and Yahoo!, Hadoop exposes the MapReduce programming paradigm and can also be used with open-source tools from Apache including Hbase, Pig and Hive. Unstructured data stored in Hadoop is distributed and stored multiple times across the Hadoop machine cluster to optimize both performance and reliability.