Hadoop
News & Blogs
Infographic: How Big Would All the World's Big Data Be?
Business Insider
October 3, 2012
Have you ever wondered just how big all the data in the world would be if translated into physical terms? It turns out that you now have an answer. According to a new infographic just published by PC Wholesale, the answer is... huge! Some 10 trillion gigabytes of data are generated each and every year. In fact, YouTube alone receives a whopping eight years' worth of video uploaded to its servers every day.
For the future of big data, look to Facebook
GigaOm
October 1, 2012
Facebook knows something about big data — it collects more data and has built more tools than almost anybody else. Here, Facebook’s Jay Parikh and Accel Partners’ Ping Li talk about what lessons big data startups can take from Facebook to build businesses that can succeed.
Hadoop MapReduce Can Transform How You Build Top-Ten Lists
Greenplum Blog
September 26, 2012
It seems like websites, magazines, and TV shows everywhere are building top-ten lists (or top-k lists) these days: the top ten science fiction movies of all time, the best places to live, and so on. Top-ten lists are fun because they tap into our seemingly primal need to create categories and hierarchies, but they can also be a useful way to analyze your data.
Often, the most interesting records in your data set are the ones with the most extreme values. It’s mind-expanding to think about building a top ten from billions or trillions of records, and it’s also a remarkable achievement for the records that make the list. Here’s a design pattern you can use to develop a MapReduce job that produces a top-ten list from your data.
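The general idea behind a top-k MapReduce job can be illustrated with a minimal, framework-free sketch. This is not the Greenplum post's code: it simulates Hadoop in plain Python, assuming in-memory lists stand in for input splits and hypothetical (key, score) tuples stand in for records. The function names map_top_k, reduce_top_k, and top_k_job are also hypothetical. Each "mapper" keeps only the k highest-scoring records from its split, and a single "reducer" merges those local winners into the global top k.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical record shape for illustration: (key, score)
Record = Tuple[str, int]


def map_top_k(split: Iterable[Record], k: int) -> List[Record]:
    """Mapper phase: keep only the k highest-scoring records from one input split."""
    # heapq.nlargest avoids fully sorting large inputs by tracking k candidates
    return heapq.nlargest(k, split, key=lambda rec: rec[1])


def reduce_top_k(candidates: Iterable[Record], k: int) -> List[Record]:
    """Single-reducer phase: merge every mapper's local top k into the global top k."""
    return heapq.nlargest(k, candidates, key=lambda rec: rec[1])


def top_k_job(splits: List[List[Record]], k: int = 10) -> List[Record]:
    """Simulate the whole job: run the mapper per split, then one reducer."""
    # Each mapper emits at most k records, so the lone reducer sees at most
    # k * len(splits) records, no matter how large the full data set is.
    local_winners = [rec for split in splits for rec in map_top_k(split, k)]
    return reduce_top_k(local_winners, k)


if __name__ == "__main__":
    splits = [
        [("a", 5), ("b", 42), ("c", 7)],
        [("d", 99), ("e", 1)],
        [("f", 13), ("g", 8)],
    ]
    print(top_k_job(splits, k=3))  # [('d', 99), ('b', 42), ('f', 13)]
```

The design works because a record that is not in its own split's top k cannot be in the global top k, so discarding it early is safe. In an actual Hadoop job, the mapper would typically accumulate its local top k and emit it in its cleanup step, and the job would be configured with a single reducer; those framework details are omitted here.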
Big Data: Polluter or Environmentalist?
Silicon Angle
September 25, 2012
Here’s what we know. Data volumes are growing exponentially – a.k.a. Big Data. And we need hardware – lots of hardware – to store it, crunch it and deliver all that data to hungry end-users – both business users and consumers. The problem, as pointed out in a New York Times piece published over the weekend, is that all that hardware requires significant power to operate, including industrial cooling equipment and back-up generators that spew diesel exhaust and other pollutants into the atmosphere. Meanwhile, datacenter operators that prize continuous uptime over energy efficiency routinely run their operations at full power even while many servers sit idle or significantly underutilized.
Data Analysts Seek to Make Social Media More Useful
Bloomberg Businessweek
September 19, 2012
It’s not easy turning the Mayberry Police Department into the team from CSI, or turning an idea for a new type of social network analysis into something like Klout on steroids, but those types of transformations are becoming ever more realistic. The world’s universities and research institutions are hard at work figuring out ways to make the mountains of social data generated every day more useful and, hopefully, make us realize there’s more to social data than just figuring out whose digital voice is the loudest.
5 ideas to help everyone make the most of big data
GigaOm
September 17, 2012
Big data is going mainstream, but there are still plenty of lessons to be learned from Silicon Valley data scientists whose businesses depend on data to survive. Although their use cases don’t always align with what more-traditional businesses are doing, they know enough about the science and technology to save big-data newcomers a lot of frustration.
Information Management
September 14, 2012
This fall, prestigious Columbia University in New York City is offering a course entitled “Introduction to Data Science,” taught by a team under the direction of Google statistician and Columbia assistant professor Rachel Schutt. The class is an outgrowth of the recently created Institute for Data Sciences and Engineering, a joint initiative between Columbia and New York City.
Big Data Analytics: Thinking Outside of Hadoop
Cloud Computing Journal
September 12, 2012
In its recently released 'Hype Cycle for Emerging Technologies, 2012,' research firm Gartner evaluated a range of technologies to identify those likely to dominate the future. "Big Data"-related technologies form a significant portion of the list; in particular, the following technologies revolve around the concept and usage of Big Data.
Everyone is Talking about Big Data
eHealth
September 7, 2012
Several new publications about Big Data in healthcare offer good analysis of this emerging field. The starting point for healthcare organizations is “setting the company’s technology strategy and designing the architecture for internal systems,” among other tasks.
Etsy unveils its infrastructure (and its Supermicro love)
GigaOm
September 5, 2012
Etsy shared the details of its hardware architecture on Friday, showing the world a whole lot of Supermicro servers running everything from web servers to Hadoop. At this point, software is the name of the game at web scale, so hardware openness is simply a welcome community service.