04.30.2009 :: Luke Lonergan
Category:: April

CTO View: Astounding scale at eBay

It's great to see that Curt Monash has dug into some of the details of eBay's massive data warehouse project on the DBMS2 blog. This reminds me of some of the airplane projects I've worked on – lots of very cool stuff we can't talk about, but we all enjoy what the analysts say about our work. Quoting Curt Monash:

So far as I can tell, eBay uses Greenplum to manage one kind of data — web and network event logs. These seem to be managed primarily at two levels of detail — Oliver said that the 17 trillion event detail records reduce to 1 trillion real event records. When I asked where the 17:1 ratio comes from, Oliver explained that a single web page click — which is what is memorialized in an event record — resulted in 50-150 details. That leaves a missing factor of 3-8X, but perhaps other less complex kinds of events are also mixed in.

The Greenplum metrics I quoted above represent over 100 days of data. Ultimately, eBay expects to keep 90-180 days of ultimate detail, and >1 year of event data. The 6 1/2 petabyte figure comes from dividing 2 petabytes of compressed data by (100%-70%). Since that all fits on a 4 1/2 petabyte system, I presume there’s only one level of mirroring (duh), not much temp space, and even less in the way of indexes.
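For readers who want to check the arithmetic in the quote, here is a rough back-of-the-envelope sketch in Python. It is not an official sizing model. The function names are made up for illustration, the compressed figure is taken as roughly 2 petabytes (the units have to be petabytes for the quoted 6 1/2 petabytes to follow), and "one level of mirroring" is assumed to mean two copies of every block.

```python
# Back-of-the-envelope check of the figures quoted above.
# All names are illustrative; inputs are the rounded numbers from the quote.

def uncompressed_size(compressed_pb: float, compression: float) -> float:
    """Size before compression, given the fraction of space compression saves."""
    return compressed_pb / (1.0 - compression)


def missing_factor(details_per_event: float, observed_ratio: float) -> float:
    """How far the observed detail:event ratio falls short of the expected one."""
    return details_per_event / observed_ratio


# Storage: ~2 PB compressed at 70% compression.
compressed_pb = 2.0   # assumption: rounded compressed size, in petabytes
compression = 0.70    # 70% of the raw size is saved
user_data_pb = uncompressed_size(compressed_pb, compression)

# Assumption: one level of mirroring means two copies, so half the raw
# capacity of the 4.5 PB system is usable for primary data.
raw_capacity_pb = 4.5
usable_pb = raw_capacity_pb / 2

# Records: 17 trillion detail records reduce to 1 trillion event records.
observed_ratio = 17e12 / 1e12
low = missing_factor(50, observed_ratio)    # if a click yields 50 details
high = missing_factor(150, observed_ratio)  # if a click yields 150 details

print(f"Uncompressed user data: about {user_data_pb:.1f} PB")
print(f"Usable capacity with one mirror: {usable_pb:.2f} PB")
print(f"Observed detail:event ratio: {observed_ratio:.0f}:1")
print(f"Missing factor: roughly {low:.1f}x to {high:.1f}x")
```

With the rounded 2 petabyte input the uncompressed size comes out slightly above 6 1/2 petabytes, which suggests the actual compressed figure was a little under 2 petabytes.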

This is an important example of what can be done by companies who see the value of their data and are willing to take risks to move ahead – I'm totally excited about what we are doing with eBay and think there's no end to what is possible there :-)
