<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Datastream</title>
	<atom:link href="http://www.greenplum.com/blog/feed" rel="self" type="application/rss+xml" />
	<link>http://www.greenplum.com/blog</link>
	<description>The Official Greenplum Blog</description>
	<lastBuildDate>Mon, 22 Apr 2013 17:48:34 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Beginning Wednesday, Join Us at the Pivotal Blog</title>
		<link>http://www.greenplum.com/blog/dive-in/upstream/beginning-wednesday-join-us-at-the-pivotal-blog</link>
		<comments>http://www.greenplum.com/blog/dive-in/upstream/beginning-wednesday-join-us-at-the-pivotal-blog#comments</comments>
		<pubDate>Mon, 22 Apr 2013 17:08:40 +0000</pubDate>
		<dc:creator>Paul M. Davis</dc:creator>
				<category><![CDATA[upstream]]></category>
		<category><![CDATA[datastream]]></category>
		<category><![CDATA[pivotal]]></category>

		<guid isPermaLink="false">http://www.greenplum.com/blog/?p=1936</guid>
		<description><![CDATA[
<p>The Pivotal era kicks off this Wednesday, April 24th, with Pivotal: A New Platform for a New Era, a live streaming event unveiling this exciting new company. Bringing together selected technology, people and programs from EMC and VMware, Pivotal will unite Greenplum&#8217;s products and services with those from Cloud Foundry, Spring, GemFire and other products from the VMware vFabric Suite, Cetas, and Pivotal Labs. <a href="http://www.greenplum.com/blog/dive-in/upstream/beginning-wednesday-join-us-at-the-pivotal-blog" class="read_more"><br /><br />Read more &#187;</a></p>]]></description>
				<content:encoded><![CDATA[<p><img src="http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/04/1365796232-150x150.jpg" alt="1365796232.jpg" width="150" height="150" class="alignleft size-thumbnail wp-image-1894" /></p>
<p>The Pivotal era kicks off this Wednesday, April 24th, with <a href="http://gopivotal.com">Pivotal: A New Platform for a New Era</a>, a live streaming event unveiling this exciting new company. Bringing together selected technology, people and programs from EMC and VMware, Pivotal will unite Greenplum&#8217;s products and services with those from Cloud Foundry, Spring, GemFire and other products from the VMware vFabric Suite, Cetas, and Pivotal Labs.</p>
<p>Beginning on Wednesday, <a href="http://gopivotal.com">we&#8217;ll be moving to the Pivotal blog</a>. There you&#8217;ll find the Hadoop and data science news, technical deep dives, and articles you&#8217;ve come to expect from Datastream, as well as the latest on agile development and the cloud.</p>
<p>Join us on Wednesday, April 24th at <a href="http://gopivotal.com">gopivotal.com</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.greenplum.com/blog/dive-in/upstream/beginning-wednesday-join-us-at-the-pivotal-blog/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
	<ib_show><![CDATA[1]]></ib_show>
	<ib_modified><![CDATA[1366652914]]></ib_modified>
	<ib_summary><![CDATA[The Pivotal era kicks off this Wednesday, April 24th, with Pivotal: A New Platform for a New Era, a live streaming event unveiling this exciting new company. Beginning on Wednesday, we'll be moving to the Pivotal blog. There you'll find the Hadoop and data science news, technical deep dives, and articles you've come to expect from Datastream, as well as the latest on agile development and the cloud.]]></ib_summary>
	<ib_category><![CDATA[22]]></ib_category>
	<ib_topics><![CDATA[5]]></ib_topics>
	<ib_image><![CDATA[http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/04/1365796232.jpg]]></ib_image>	</item>
		<item>
		<title>Making Hadoop MapReduce Work with a Redis Cluster</title>
		<link>http://www.greenplum.com/blog/topics/hadoop/making-hadoop-mapreduce-work-with-a-redis-cluster</link>
		<comments>http://www.greenplum.com/blog/topics/hadoop/making-hadoop-mapreduce-work-with-a-redis-cluster#comments</comments>
		<pubDate>Thu, 18 Apr 2013 16:34:02 +0000</pubDate>
		<dc:creator>Adam Shook</dc:creator>
				<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[technical]]></category>
		<category><![CDATA[IO]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Jedis]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Redis]]></category>

		<guid isPermaLink="false">http://www.greenplum.com/blog/?p=1911</guid>
		<description><![CDATA[
<p><i>Redis</i> is a very cool open-source key-value store that can add instant value to your Hadoop installation. Since Redis values can be strings, hashes, lists, sets, or sorted sets, Redis can be used as a front end to serve data out of Hadoop, caching your ‘hot’ pieces of data in memory for fast access when they are needed again. <a href="http://www.greenplum.com/blog/topics/hadoop/making-hadoop-mapreduce-work-with-a-redis-cluster" class="read_more"><br /><br />Read more &#187;</a></p>]]></description>
				<content:encoded><![CDATA[<p><img src="http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/04/redis_logo-150x150.png" alt="redis_logo" width="150" height="150" class="alignleft size-thumbnail wp-image-1926" /></p>
<p><a href="http://www.redis.io"><i>Redis</i></a> is a very cool open-source key-value store that can add instant value to your Hadoop installation. Since Redis values can be strings, hashes, lists, sets, or sorted sets, Redis can be used as a front end to serve data out of Hadoop, caching your ‘hot’ pieces of data in memory for fast access when they are needed again. With a Java client called <a href="https://github.com/xetorthio/jedis"><i>Jedis</i></a>, you can write data to and read data from Redis. Combining this simple client with the power of MapReduce lets you do both in parallel.</p>
<p>In the code below, we use MapReduce to pull and push key/value pairs to any number of standalone Redis instances. We will be writing to, and reading from, a Redis <i>hash</i>, which maps string fields to string values, much like a Java HashMap. Each hash is uniquely identified by a <i>hash key</i>, similar to a table name. Each input and output format has two core configuration parameters: a CSV list of hostnames running a Redis instance, and the hash key. As with Hadoop’s default HashPartitioner, the Redis instance a key is written to is chosen by (key.hashCode() % number of Redis instances). Because the assignment depends only on the key’s hash, data spreads evenly across the instances so long as your key space isn’t skewed – but solving that problem is a topic for another post.</p>
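<p>To get a feel for the routing rule, here is a minimal, self-contained sketch that needs neither Hadoop nor Redis. It assumes plain String keys. Hadoop’s Text computes its hash code from the UTF-8 bytes, so real assignments would differ. It uses Math.floorMod so that a negative hash code still maps to an index between 0 and n - 1. RedisShardRouter and its methods are hypothetical names used only for illustration.</p>

```java
// Hypothetical helper: routes a key to one of N standalone Redis instances using
// the modulo-of-hash idea that Hadoop's HashPartitioner also uses.
class RedisShardRouter {

    // Math.floorMod returns a value in [0, numInstances) even when hashCode() is
    // negative; Math.abs(Integer.MIN_VALUE) would still be negative.
    static int shardFor(String key, int numInstances) {
        if (!(numInstances > 0)) {
            throw new IllegalArgumentException("numInstances must be positive");
        }
        return Math.floorMod(key.hashCode(), numInstances);
    }

    // Counts how many of the given keys land on each instance.
    static int[] countPerShard(String[] keys, int numInstances) {
        int[] counts = new int[numInstances];
        for (String key : keys) {
            counts[shardFor(key, numInstances)]++;
        }
        return counts;
    }
}
```

<p>Running countPerShard over a sample of your real keys is a cheap way to spot a skewed key space before loading data. Lopsided counts mean that one instance will receive more than its share.</p>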
<div id="attachment_1922" class="wp-caption alignleft" style="width: 627px"><a href="http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/04/output.png"><img src="http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/04/output.png" alt="Four map tasks outputting to three standalone Redis hash instances.  Each record writer has a connection to each Redis instance." width="617" height="500" class="size-large wp-image-1922" /></a><p class="wp-caption-text">Four map tasks outputting to three standalone Redis hash instances.  Each record writer has a connection to each Redis instance.</p></div>
<p>With that said, let’s take a look at all the code. Pay attention to the comments, as they’ll tell you what is going on. First up is an implementation of <a href="http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/outputformat.html"><i>OutputFormat</i></a>. This class defines the key/value data types and behavior for writing to Redis instances via Jedis.</p>
<pre class="brush: java; title: ; wrap-lines: true; notranslate">// Imports assumed in the enclosing source file:
// import java.io.IOException;
// import java.util.HashMap;
// import org.apache.hadoop.io.Text;
// import org.apache.hadoop.mapreduce.*;
// import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
// import redis.clients.jedis.Jedis;

// This output format class is parameterized to accept a key and value of type Text
public static class RedisHashOutputFormat extends OutputFormat&lt;Text, Text&gt; {

    // These static conf variables and methods are used to modify the job configuration.
    // This is a common pattern for MapReduce-related classes to avoid the magic string problem
    public static final String REDIS_HOSTS_CONF = &quot;mapred.redishashoutputformat.hosts&quot;;
    public static final String REDIS_HASH_KEY_CONF = &quot;mapred.redishashoutputformat.key&quot;;

    public static void setRedisHosts(Job job, String hosts) {
        job.getConfiguration().set(REDIS_HOSTS_CONF, hosts);
    }

    public static void setRedisHashKey(Job job, String hashKey) {
        job.getConfiguration().set(REDIS_HASH_KEY_CONF, hashKey);
    }

    // This method returns an instance of a RecordWriter for the task.
    // Note how we are pulling the variables set by the static methods during configuration
    @Override
    public RecordWriter&lt;Text, Text&gt; getRecordWriter(TaskAttemptContext job)
            throws IOException, InterruptedException {
        String hashKey = job.getConfiguration().get(REDIS_HASH_KEY_CONF);
        String csvHosts = job.getConfiguration().get(REDIS_HOSTS_CONF);
        return new RedisHashRecordWriter(hashKey, csvHosts);
    }

    // This method is used on the front-end prior to job submission to ensure everything is configured correctly
    @Override
    public void checkOutputSpecs(JobContext job) throws IOException {
        String hosts = job.getConfiguration().get(REDIS_HOSTS_CONF);
        if (hosts == null || hosts.isEmpty()) {
            throw new IOException(REDIS_HOSTS_CONF + &quot; is not set in configuration.&quot;);
        }

        String hashKey = job.getConfiguration().get(REDIS_HASH_KEY_CONF);
        if (hashKey == null || hashKey.isEmpty()) {
            throw new IOException(REDIS_HASH_KEY_CONF + &quot; is not set in configuration.&quot;);
        }
    }

    // The output committer is used on the back-end to, well, commit output.
    // Discussion of this class is out of scope here; see the OutputCommitter Javadoc for details
    @Override
    public OutputCommitter getOutputCommitter(TaskAttemptContext context)
            throws IOException, InterruptedException {
        // Use NullOutputFormat's no-op committer, since there are no files to commit:
        // the RecordWriter writes records straight to Redis
        return (new NullOutputFormat&lt;Text, Text&gt;()).getOutputCommitter(context);
    }

    public static class RedisHashRecordWriter extends RecordWriter&lt;Text, Text&gt; {
        // implementation of this static nested class is shown shortly
    }
} // end RedisHashOutputFormat
</pre>
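<p>You can try the static-constant configuration pattern above without a cluster. The sketch below is a hypothetical stand-in that uses java.util.Properties in place of Hadoop’s Configuration and keeps the same fail-fast checks as checkOutputSpecs. It also gives the output format its own “redishashoutputformat” property names. The assumption is that a job may use the Redis input and output formats together, and neither should overwrite the other’s hash key.</p>

```java
import java.io.IOException;
import java.util.Properties;

// Hypothetical stand-in for the configuration pattern: Properties plays the role
// of Hadoop's Configuration so this runs without Hadoop on the classpath.
class RedisFormatConfig {

    // Output-specific property names, distinct from any input format's names.
    static final String OUTPUT_HOSTS_CONF = "mapred.redishashoutputformat.hosts";
    static final String OUTPUT_HASH_KEY_CONF = "mapred.redishashoutputformat.key";

    static void setRedisHosts(Properties conf, String hosts) {
        conf.setProperty(OUTPUT_HOSTS_CONF, hosts);
    }

    static void setRedisHashKey(Properties conf, String hashKey) {
        conf.setProperty(OUTPUT_HASH_KEY_CONF, hashKey);
    }

    // Mirrors checkOutputSpecs: fail before submission if a setting is missing.
    static void checkOutputSpecs(Properties conf) throws IOException {
        requireNonEmpty(conf, OUTPUT_HOSTS_CONF);
        requireNonEmpty(conf, OUTPUT_HASH_KEY_CONF);
    }

    private static void requireNonEmpty(Properties conf, String name) throws IOException {
        String value = conf.getProperty(name);
        if (value == null || value.isEmpty()) {
            throw new IOException(name + " is not set in configuration.");
        }
    }
}
```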
<p>The role of OutputFormat is to properly configure the job, ensuring that the RecordWriter implementation has everything it needs to work correctly. Once configured, the RecordWriter is what actually writes key/value pairs wherever you want them to go. A common practice is to make your RecordWriter (or reader) a static nested class, but that isn’t required. Let’s take a look at an implementation of <a href="http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/recordwriter.html"><i>RecordWriter</i></a>:</p>
<pre class="brush: java; title: ; wrap-lines: true; notranslate">
// This class is parameterized to write only Text keys and Text values
public static class RedisHashRecordWriter extends RecordWriter&lt;Text, Text&gt; {

    // This map is used to map an integer to a Jedis instance
    private HashMap&lt;Integer, Jedis&gt; jedisMap = new HashMap&lt;Integer, Jedis&gt;();

    // This is the name of the Redis hash
    private String hashKey = null;

    public RedisHashRecordWriter(String hashKey, String hosts) {
        this.hashKey = hashKey;

        // Create a connection to Redis for each host
        // Map an integer 0-(numRedisInstances - 1) to the instance
        int i = 0;
        for (String host : hosts.split(&quot;,&quot;)) {
            Jedis jedis = new Jedis(host);
            jedis.connect();
            jedisMap.put(i++, jedis);
        }
    }

    // The write method is what will actually write the key/value pairs out to Redis
    @Override
    public void write(Text key, Text value) throws IOException, InterruptedException {
        // Get the Jedis instance that this key/value pair will be written to.
        // Masking the sign bit (as Hadoop's HashPartitioner does) keeps the index non-negative;
        // Math.abs(Integer.MIN_VALUE) is still negative and would look up a null Jedis
        Jedis j = jedisMap.get((key.hashCode() &amp; Integer.MAX_VALUE) % jedisMap.size());

        // Write the key/value pair
        j.hset(hashKey, key.toString(), value.toString());
    }

    @Override
    public void close(TaskAttemptContext context)
            throws IOException, InterruptedException {
        // For each jedis instance, disconnect it
        for (Jedis jedis : jedisMap.values()) {
            jedis.disconnect();
        }
    }
} // end RedisHashRecordWriter
</pre>
<p>This code demonstrates how simple it is to hook into external hosts for output. Such lightweight interfaces open up many possibilities, so long as the external system can handle the parallel load of many map or reduce tasks writing to it at once.</p>
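<p>Here is a rough way to size that load. In the writer above, every record writer opens one connection to every Redis instance, so the connection count grows with the product of tasks and instances. The helper below is a hypothetical back-of-the-envelope sketch and is not part of the output format.</p>

```java
// Hypothetical back-of-the-envelope helper for the output design above,
// where each record writer connects to every Redis instance.
class RedisConnectionLoad {

    // Total connections opened across the job by all record writers.
    static long totalWriterConnections(int writerTasks, int redisInstances) {
        return (long) writerTasks * redisInstances;
    }

    // Connections a single Redis instance must accept if all writers run at once.
    static int connectionsPerInstance(int writerTasks) {
        return writerTasks;
    }
}
```

<p>For the four map tasks and three instances in the figure above, totalWriterConnections(4, 3) is 12, and each instance handles up to 4 concurrent connections. A job with hundreds of reduce tasks writing to a handful of instances scales those numbers accordingly. Check the result against each instance’s client limit (the Redis maxclients setting).</p>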
<p>Next up, let’s take a look at the <a href="http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/inputformat.html"><i>InputFormat</i></a> code to pull data out of our Redis instances. This is a bit more complex, as we’ll use a custom InputSplit implementation as well.</p>
<div id="attachment_1921" class="wp-caption alignleft" style="width: 627px"><a href="http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/04/input.png"><img src="http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/04/input.png" alt="Three map tasks are created, and each record reader pulls from its assigned Redis instance." width="617" height="500" class="size-large wp-image-1921" /></a><p class="wp-caption-text">Three map tasks are created, and each record reader pulls from its assigned Redis instance.</p></div>
<p>We create an InputSplit for each Redis host, and the framework creates one map task per InputSplit. Each map task pulls all of the data from its assigned Redis instance.</p>
<pre class="brush: java; title: ; wrap-lines: true; notranslate">
// Imports assumed in the enclosing source file:
// import java.io.DataInput;
// import java.io.DataOutput;
// import java.io.IOException;
// import java.util.ArrayList;
// import java.util.Iterator;
// import java.util.List;
// import java.util.Map;
// import java.util.Map.Entry;
// import org.apache.commons.logging.Log;
// import org.apache.commons.logging.LogFactory;
// import org.apache.hadoop.io.Text;
// import org.apache.hadoop.io.Writable;
// import org.apache.hadoop.mapreduce.*;
// import redis.clients.jedis.Jedis;

// This input format will read all the data from a given set of Redis hosts
public static class RedisHashInputFormat extends InputFormat&lt;Text, Text&gt; {

    // Again, static variables and setters for the CSV list of hosts and the hash key
    public static final String REDIS_HOSTS_CONF = &quot;mapred.redishashinputformat.hosts&quot;;
    public static final String REDIS_HASH_KEY_CONF = &quot;mapred.redishashinputformat.key&quot;;

    public static void setRedisHosts(Job job, String hosts) {
        job.getConfiguration().set(REDIS_HOSTS_CONF, hosts);
    }

    public static void setRedisHashKey(Job job, String hashKey) {
        job.getConfiguration().set(REDIS_HASH_KEY_CONF, hashKey);
    }

    // This method will return a list of InputSplit objects.
    // The framework uses this to create an equivalent number of map tasks
    @Override
    public List&lt;InputSplit&gt; getSplits(JobContext job) throws IOException {

        // Get our configuration values and ensure they are set
        String hosts = job.getConfiguration().get(REDIS_HOSTS_CONF);
        if (hosts == null || hosts.isEmpty()) {
            throw new IOException(REDIS_HOSTS_CONF + &quot; is not set in configuration.&quot;);
        }

        String hashKey = job.getConfiguration().get(REDIS_HASH_KEY_CONF);
        if (hashKey == null || hashKey.isEmpty()) {
            throw new IOException(REDIS_HASH_KEY_CONF + &quot; is not set in configuration.&quot;);
        }

        // Create an input split for each Redis instance
        // More on this custom split later; just know that one is created per host
        List&lt;InputSplit&gt; splits = new ArrayList&lt;InputSplit&gt;();
        for (String host : hosts.split(&quot;,&quot;)) {
            splits.add(new RedisHashInputSplit(host, hashKey));
        }

        return splits;
    }

    // This method creates an instance of our RedisHashRecordReader;
    // the framework then calls its initialize method with the split
    @Override
    public RecordReader&lt;Text, Text&gt; createRecordReader(InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        return new RedisHashRecordReader();
    }

    public static class RedisHashRecordReader extends RecordReader&lt;Text, Text&gt; {
        // implementation of this static nested class is shown shortly
    }

    public static class RedisHashInputSplit extends InputSplit implements Writable {
        // implementation of this static nested class is shown shortly
    }

} // end RedisHashInputFormat
</pre>
<p>InputFormat declares only two abstract methods: getSplits and createRecordReader. The example above demonstrates how simple it is to hook into external sources for input. The remaining static methods and variables are used to configure the job for the needs of the InputFormat and <a href="http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/recordreader.html"><i>RecordReader</i></a> implementations.</p>
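<p>The split-per-host logic in getSplits boils down to a few lines. Below is a Hadoop-free sketch of it. SplitSpec is a hypothetical, minimal stand-in for RedisHashInputSplit and does not implement Writable. Unlike the code above, it trims whitespace and skips empty entries. This is an assumption that lets a value such as “redis1, redis2,” produce two clean splits instead of a split for a host named “ redis2”.</p>

```java
import java.util.Arrays;

// Hypothetical, Hadoop-free version of the getSplits logic: one split
// (host + hash key) per entry in a CSV list of Redis hosts.
class RedisSplitPlanner {

    // Minimal stand-in for RedisHashInputSplit (no Writable support here).
    static final class SplitSpec {
        final String host;
        final String hashKey;

        SplitSpec(String host, String hashKey) {
            this.host = host;
            this.hashKey = hashKey;
        }
    }

    static SplitSpec[] plan(String csvHosts, String hashKey) {
        if (hashKey == null || hashKey.isEmpty()) {
            throw new IllegalArgumentException("hash key is not set");
        }
        if (csvHosts == null) {
            throw new IllegalArgumentException("Redis hosts are not set");
        }
        String[] parts = csvHosts.split(",");
        SplitSpec[] specs = new SplitSpec[parts.length];
        int count = 0;
        for (String part : parts) {
            String host = part.trim(); // assumption: tolerate "a, b" spacing
            if (!host.isEmpty()) {
                specs[count++] = new SplitSpec(host, hashKey);
            }
        }
        if (count == 0) {
            throw new IllegalArgumentException("no usable Redis hosts in: " + csvHosts);
        }
        // One split per host; the framework would start one map task per split
        return Arrays.copyOf(specs, count);
    }
}
```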
<pre class="brush: java; title: ; wrap-lines: true; notranslate">
// This custom RecordReader will pull in all key/value pairs from a Redis instance for a given hash
public static class RedisHashRecordReader extends RecordReader&lt;Text, Text&gt; {

    // Logger used in initialize (commons-logging, which Hadoop depends on)
    private static final Log LOG = LogFactory.getLog(RedisHashRecordReader.class);

    // A number of member variables to iterate and store key/value pairs from Redis
    private Iterator&lt;Entry&lt;String, String&gt;&gt; keyValueMapIter = null;
    private Text key = new Text(), value = new Text();
    private float processedKVs = 0, totalKVs = 0;
    private Entry&lt;String, String&gt; currentEntry = null;

    // Initialize is called by the framework and given an InputSplit to process
    @Override
    public void initialize(InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {

        // Get the host location and hash key from the InputSplit
        String host = split.getLocations()[0];
        String hashKey = ((RedisHashInputSplit) split).getHashKey();

        // Create a new connection to Redis, and always disconnect when done
        Jedis jedis = new Jedis(host);
        try {
            jedis.connect();
            jedis.getClient().setTimeoutInfinite();

            // Get all the key/value pairs from the Redis instance and store them in memory.
            // Taking the total from the fetched map (rather than a separate HLEN call)
            // keeps it consistent with what was actually read
            Map&lt;String, String&gt; entries = jedis.hgetAll(hashKey);
            totalKVs = entries.size();
            keyValueMapIter = entries.entrySet().iterator();
            LOG.info(&quot;Got &quot; + totalKVs + &quot; key/value pairs from &quot; + hashKey);
        } finally {
            jedis.disconnect();
        }
    }

    // This method is called by Mapper’s run method until it returns false, so all key/value pairs are read
    @Override
    public boolean nextKeyValue() throws IOException, InterruptedException {
        if (keyValueMapIter.hasNext()) {
            // Get the current entry and set the Text objects to the entry
            currentEntry = keyValueMapIter.next();
            key.set(currentEntry.getKey());
            value.set(currentEntry.getValue());
            // Count the pair so getProgress reports more than zero
            processedKVs++;
            return true;
        } else {
            return false;
        }
    }

    // The next two methods return the current key/value pair.
    // Best practice is to re-use objects rather than create new ones, i.e. don’t use “new”
    @Override
    public Text getCurrentKey() throws IOException, InterruptedException {
        return key;
    }

    @Override
    public Text getCurrentValue() throws IOException, InterruptedException {
        return value;
    }

    // This method is used to report the progress metric back to the framework.
    // It is not required to have a true implementation, but it is recommended.
    @Override
    public float getProgress() throws IOException, InterruptedException {
        // An empty hash is reported as complete rather than dividing 0 by 0 (NaN)
        return totalKVs == 0 ? 1.0f : processedKVs / totalKVs;
    }

    @Override
    public void close() throws IOException {
        // Nothing to do: the Redis connection was already closed in initialize
    }

} // end RedisHashRecordReader
</pre>
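<p>The framework relies on a simple contract here. It calls nextKeyValue until that returns false, reads getCurrentKey and getCurrentValue after each true, and polls getProgress for status. The hypothetical in-memory model below uses a Properties object to stand in for the result of HGETALL. It shows that contract, and why the processed count must be incremented for progress to move past zero.</p>

```java
import java.util.Properties;

// Hypothetical in-memory model of the RecordReader contract; a Properties object
// stands in for the field/value map that HGETALL would return.
class InMemoryHashReader {

    private final Properties hash;
    private final String[] fields;
    private int processed = 0;
    private String currentKey;
    private String currentValue;

    InMemoryHashReader(Properties hash) {
        this.hash = hash;
        this.fields = hash.stringPropertyNames().toArray(new String[0]);
    }

    // Advances to the next pair; returns false once every field has been read.
    boolean nextKeyValue() {
        if (processed == fields.length) {
            return false;
        }
        currentKey = fields[processed];
        currentValue = hash.getProperty(currentKey);
        processed++; // without this, progress would stay at 0
        return true;
    }

    String getCurrentKey() {
        return currentKey;
    }

    String getCurrentValue() {
        return currentValue;
    }

    // Fraction of pairs consumed; an empty hash reports 1.0 instead of 0/0 = NaN.
    float getProgress() {
        return fields.length == 0 ? 1.0f : (float) processed / fields.length;
    }
}
```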
<p>Now that we’ve implemented a RecordReader, we need to determine what data it reads. That is defined by an InputSplit implementation, which the framework passes to the reader’s initialize method. The number of input splits determines the number of map tasks created by the framework.</p>
<p>This is a pretty simple task with Redis. We will create one map task for each Redis instance, which hosts a shard of our total data set. Each mapper will then connect to a single Redis instance and pull all of the data for the hash. This will all happen in parallel, similar to how MapReduce reads a file in parallel by reading its blocks. This split means we won’t overload a single Redis instance with too many connections. This is where InputSplit comes in.<sup><a href="#fn1" id="ref1">1</a></sup></p>
<pre class="brush: java; title: ; wrap-lines: true; notranslate">
public static class RedisHashInputSplit extends InputSplit implements Writable {

    // Two member variables, the hostname and the hash key (table name)
    private String location = null;
    private String hashKey = null;

    public RedisHashInputSplit() {
        // Default constructor required for reflection
    }

    public RedisHashInputSplit(String redisHost, String hash) {
        this.location = redisHost;
        this.hashKey = hash;
    }

    public String getHashKey() {
        return this.hashKey;
    }

    // The following two methods are used to serialize the input information for an individual task
    @Override
    public void readFields(DataInput in) throws IOException {
        this.location = in.readUTF();
        this.hashKey = in.readUTF();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(location);
        out.writeUTF(hashKey);
    }

    // This gets the size of the split so the framework can sort them by size.
    // This isn’t that important here, but we could query a Redis instance and get the bytes if we desired
    @Override
    public long getLength() throws IOException, InterruptedException {
        return 0;
    }

    // This method returns hints to the framework of where to launch a task for data locality
    @Override
    public String[] getLocations() throws IOException, InterruptedException {
        return new String[] { location };
    }

} // end RedisHashInputSplit
</pre>
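<p>The main rule for write and readFields is symmetry: fields must be read back in exactly the order they were written. The sketch below exercises that round trip with plain java.io streams standing in for the DataOutput and DataInput the framework supplies. SplitSerializationDemo is a hypothetical name and is not part of Hadoop.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Round-trips a split's two fields the way write()/readFields() do:
// writeUTF in a fixed order, then readUTF in the same order.
class SplitSerializationDemo {

    static byte[] serialize(String location, String hashKey) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(location);
            out.writeUTF(hashKey);
        }
        return bytes.toByteArray();
    }

    static String[] deserialize(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String location = in.readUTF(); // must match the write order
            String hashKey = in.readUTF();
            return new String[] { location, hashKey };
        }
    }
}
```

<p>One caveat: writeUTF throws a NullPointerException for a null string. A split must therefore have both its host and hash key set before the framework serializes it.</p>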
<p>This demonstrates how to customize input and output using the MapReduce framework for Redis. Though it’s often overlooked, customizing I/O is a useful way to make MapReduce more flexible. If I’ve piqued your interest and you want to know more about customizing MapReduce I/O, check out chapter seven of <a href="http://shop.oreilly.com/product/0636920025122.do"><i>MapReduce Design Patterns</i></a> (O’Reilly 2012), “Input and Output Patterns.”</p>
<p>When implementing custom formats for other external sources, be mindful of how well those sources scale, and of what happens if a task fails and is retried. In some cases, such as this one, retries don’t really matter. Data already written to Redis is simply overwritten with a new copy, and data pulled from Redis is simply pulled again on the next attempt. Other cases, such as writing to a Redis list rather than a hash, require a little more effort, because task retries would add duplicate entries to the list. Rolling back committed entries takes additional engineering, but it is worth the effort to make external outputs more fault tolerant.</p>
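<p>A small in-memory model shows the difference between the hash and list cases. In this hypothetical sketch, a Properties object stands in for a Redis hash, where HSET overwrites a field. A plain array stands in for a Redis list, where RPUSH appends. Replaying the same task twice leaves the hash unchanged but doubles the list.</p>

```java
import java.util.Properties;

// Hypothetical model of a task that is retried after writing its output:
// Properties stands in for a Redis hash, an array for a Redis list.
class RetrySemanticsDemo {

    // Returns {entries in the hash, entries in the list} after running the
    // same task over the same records the given number of times.
    static int[] replay(String[][] records, int attempts) {
        Properties hash = new Properties();
        String[] list = new String[records.length * attempts];
        int listLength = 0;
        for (int attempt = 0; attempt != attempts; attempt++) {
            for (String[] record : records) {
                hash.setProperty(record[0], record[1]);           // like HSET: overwrite
                list[listLength++] = record[0] + "=" + record[1]; // like RPUSH: append
            }
        }
        return new int[] { hash.size(), listLength };
    }
}
```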
<p>Now get out there and write some custom formats of your own!</p>
<p><sup id="fn1"><br />
<hr /></sup></p>
<p>1. Note that custom <a href="http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/inputsplit.html"><i>InputSplits</i></a> must implement <a href="http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/writable.html"><i>Writable</i></a> in order for the framework to serialize them.<a href="#ref1" title="Jump back to footnote 1 in the text.">↩</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.greenplum.com/blog/topics/hadoop/making-hadoop-mapreduce-work-with-a-redis-cluster/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
	<ib_show><![CDATA[1]]></ib_show>
	<ib_modified><![CDATA[1366302199]]></ib_modified>
	<ib_summary><![CDATA[Redis is a very cool open-source key-value store that can add instant value to your Hadoop installation. Since Redis values can be strings, hashes, lists, sets, or sorted sets, Redis can be used as a front end to serve data out of Hadoop, caching your ‘hot’ pieces of data in memory for fast access when they are needed again.]]></ib_summary>
	<ib_category><![CDATA[22]]></ib_category>
	<ib_topics><![CDATA[38]]></ib_topics>
	<ib_image><![CDATA[http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/04/redis_logo.png]]></ib_image>	</item>
		<item>
		<title>Pivotal: A New Platform for a New Era</title>
		<link>http://www.greenplum.com/blog/topics/big-data-topics/pivotal-a-new-platform-for-a-new-era</link>
		<comments>http://www.greenplum.com/blog/topics/big-data-topics/pivotal-a-new-platform-for-a-new-era#comments</comments>
		<pubDate>Mon, 15 Apr 2013 16:56:26 +0000</pubDate>
		<dc:creator>Paul M. Davis</dc:creator>
				<category><![CDATA[agile]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[events]]></category>
		<category><![CDATA[predictive analytics]]></category>
		<category><![CDATA[the enterprise]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Pivotal Initiative]]></category>

		<guid isPermaLink="false">http://www.greenplum.com/blog/?p=1884</guid>
		<description><![CDATA[
<p>Just as consumer-facing web giants such as Google and Facebook have done, enterprise companies increasingly need to store and analyze massive amounts of data cost-effectively, ingest huge numbers of events in real time, reason over the data, and react rapidly. To meet this need, Pivotal will host a live streaming event on April 24th with a special announcement and an unveiling of its plans to build a platform that makes the consumer-grade enterprise a reality. <a href="http://www.greenplum.com/blog/topics/big-data-topics/pivotal-a-new-platform-for-a-new-era" class="read_more"><br /><br />Read more &#187;</a></p>]]></description>
				<content:encoded><![CDATA[<p><img src="http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/04/1365796104.jpg" class="alignleft size-full" width="300" height="211" /></p>
<p>Just as consumer-facing web giants such as Google and Facebook have done, enterprise companies increasingly need to store and analyze massive amounts of data cost-effectively, ingest huge numbers of events in real time, reason over the data, and react rapidly. To meet this need, Pivotal will host a live streaming event on April 24th with a special announcement and an unveiling of its plans to build a platform that makes <a href="http://www.greenplum.com/blog/topics/data-science/paul-maritz-calls-for-a-consumer-grade-enterprise-platform">the consumer-grade enterprise</a> a reality.</p>
<p>Billed as &quot;A New Platform for a New Era&quot;, the Pivotal platform will unite data, application, and cloud fabrics. Prioritizing big and fast data, agile development, and cloud independence, the Pivotal platform will enable enterprises to understand more, react quicker, build smarter, and achieve at an even greater scale. To this end, the new company brings together a prodigious set of technologies and talent from a number of EMC and VMware entities, which include Greenplum, Cloud Foundry, Spring, GemFire and other products from the VMware vFabric Suite, Cetas, and Pivotal Labs. </p>
<p>Paul Maritz, the Pivotal Leadership Team, and special guests will unveil this platform, and make a special announcement during a live streaming event on Wednesday, April 24th at 10:00 am Pacific/1:00 pm Eastern. Sign up for the event at <a href="http://gopivotal.com/" target="_blank">gopivotal.com</a> and follow <a href="http://twitter.com/gopivotal">@gopivotal</a> on Twitter for updates.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.greenplum.com/blog/topics/big-data-topics/pivotal-a-new-platform-for-a-new-era/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
	<ib_show><![CDATA[1]]></ib_show>
	<ib_modified><![CDATA[1366044986]]></ib_modified>
	<ib_summary><![CDATA[Pivotal will host a live streaming event on April 24th with a special announcement and an unveiling of its plans to build a platform that makes the consumer-grade enterprise a reality.]]></ib_summary>
	<ib_category><![CDATA[22]]></ib_category>
	<ib_topics><![CDATA[42|1]]></ib_topics>
	<ib_image><![CDATA[http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/04/1365796232.jpg]]></ib_image>	</item>
		<item>
		<title>Effective Data Visualization Techniques, from Business to Social Advocacy</title>
		<link>http://www.greenplum.com/blog/topics/data-for-good/effective-data-visualization-techniques-from-business-to-social-advocacy</link>
		<comments>http://www.greenplum.com/blog/topics/data-for-good/effective-data-visualization-techniques-from-business-to-social-advocacy#comments</comments>
		<pubDate>Tue, 09 Apr 2013 18:56:36 +0000</pubDate>
		<dc:creator>Paul M. Davis</dc:creator>
				<category><![CDATA[data for good]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[predictive analytics]]></category>
		<category><![CDATA[upstream]]></category>
		<category><![CDATA[visualization]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[best practices]]></category>
		<category><![CDATA[dashboard]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[Periscopic]]></category>
		<category><![CDATA[Procter & Gamble]]></category>
		<category><![CDATA[storytelling]]></category>

		<guid isPermaLink="false">http://www.greenplum.com/blog/?p=1874</guid>
		<description><![CDATA[
<p>No matter how much data you amass, or how ingenious your models may be, your efforts are only as effective as how you communicate the insights revealed. Data visualization is not a new art, but one which has grown significantly more important in recent years, as organizations must respond to an increasing amount of data faster than ever. <a href="http://www.greenplum.com/blog/topics/data-for-good/effective-data-visualization-techniques-from-business-to-social-advocacy" class="read_more"><br /><br />Read more &#187;</a></p>]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/04/periscopic_cultural_assets.png"><img src="http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/04/periscopic_cultural_assets-300x200.png" alt="periscopic_cultural_assets" width="300" height="200" class="alignleft size-medium wp-image-1875" /></a></p>
<p>No matter how much data you amass, or how ingenious your models may be, your efforts are only as effective as how you communicate the insights revealed. Data visualization is not a new art, but one which has grown significantly more important in recent years, as organizations must respond to an increasing amount of data faster than ever. </p>
<p>Two recent articles demonstrate the wide range of uses for data visualization — from business intelligence to social advocacy — and how form and function vary widely depending on the data visualized and the intended purpose. <a href="http://www.fastcodesign.com/1672207/whats-the-secret-to-great-infographics">Co.Design features Periscopic</a>, an information design studio that uses visualization as a persuasive storytelling form ripe for innovation and experimentation. In contrast, <a href="http://blogs.hbr.org/cs/2013/04/how_p_and_g_presents_data.html">a recent Harvard Business Review article</a> details Procter &#038; Gamble&#8217;s flexible yet robust framework of visualization tools, designed to rapidly deliver actionable business insight to decision makers.  </p>
<p>Despite the ascent of the form, the notion of telling stories with data visualization was a tougher sell when <a href="http://www.periscopic.com/">Periscopic</a> formed in 2004. At the time, co-founder Kim Rees explains to Co.Design, &#8220;We were seeing more interest in large databases, but there was a lag in the sense-making part of it.&#8221; Still, Rees and her partner Dino Citraro had a sense of the coming data deluge, and knew they were on to something. &#8220;If you store a lot of data it does you no good unless you can understand it,” she says. </p>
<p><a href="http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/04/periscopic_guns.png"><img src="http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/04/periscopic_guns-300x210.png" alt="periscopic_guns" width="300" height="210" class="alignright size-medium wp-image-1876" /></a></p>
<p>Periscopic became an in-demand information design firm and advocacy group by building tools that demonstrated their vision, telling stories out of complex data sets through interactive interfaces that fostered agency and engagement among users. Rees tells Co.Design that Periscopic&#8217;s &#8220;goal is to communicate what the data is saying and allow people to have their own ownership of that data.&#8221; Conversely, the studio produces starkly simple interfaces when the topic demands such an approach. One recent example is Periscopic&#8217;s <a href="http://guns.periscopic.com">sobering animation of the years lost to gun deaths in the U.S.</a>, which drives home the human toll of each individual death rather than abstracting the statistics.</p>
<p>In subject matter, methods, and intent, Procter &#038; Gamble&#8217;s robust analytics platform differs widely from Periscopic&#8217;s work. Its &#8220;Decision Cockpit&#8221; is accessible to over 50,000 decision makers within the company, clearly presenting a profusion of critical information on business operations. The dashboard is instantly accessible and highly visible in management meeting rooms in over 50 Procter &#038; Gamble offices.</p>
<p><a href="http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/04/png_cockpit.png"><img src="http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/04/png_cockpit-300x224.png" alt="png_cockpit" width="300" height="224" class="alignleft size-medium wp-image-1877" /></a></p>
<p>The company&#8217;s CIO tells Harvard Business Review that the purpose of its analytics platform is to turn data into knowledge and results by &#8220;getting beyond the what to the why and the how.&#8221; Because Procter &#038; Gamble is a global company with a huge portfolio of products, its managers need tools that let them drill deep into the minutiae of a particular market without losing sight of the bigger picture. IT Director Guy Peri explains, &#8220;With visual analytics, we are able to quickly focus business decision makers on the businesses issues that are material.&#8221;</p>
<p>Despite the vast differences, both Periscopic and Procter &#038; Gamble demonstrate that there is no single set of techniques for effective data visualization. Form must not only follow function, but also context and intent. Read more at <a href="http://www.fastcodesign.com/1672207/whats-the-secret-to-great-infographics#1">Co.Design</a> and <a href="http://blogs.hbr.org/cs/2013/04/how_p_and_g_presents_data.html">Harvard Business Review</a>, which is running <a href="http://hbr.org/special-collections/insight/visualizing-data">a month-long series on visualizing data.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.greenplum.com/blog/topics/data-for-good/effective-data-visualization-techniques-from-business-to-social-advocacy/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
	<ib_show><![CDATA[1]]></ib_show>
	<ib_modified><![CDATA[1365544335]]></ib_modified>
	<ib_summary><![CDATA[Periscopic and Procter & Gamble demonstrate the wide range of uses for data visualization — from business intelligence to social advocacy — and how form and function vary widely depending on the data visualized and the intended purpose.]]></ib_summary>
	<ib_category><![CDATA[22]]></ib_category>
	<ib_topics><![CDATA[5]]></ib_topics>
	<ib_image><![CDATA[http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/04/periscopic_cultural_assets.png]]></ib_image>	</item>
		<item>
		<title>A Robot Gave Me My Job!</title>
		<link>http://www.greenplum.com/blog/topics/big-data-topics/a-robot-gave-me-my-job</link>
		<comments>http://www.greenplum.com/blog/topics/big-data-topics/a-robot-gave-me-my-job#comments</comments>
		<pubDate>Fri, 05 Apr 2013 19:13:20 +0000</pubDate>
		<dc:creator>Paul M. Davis</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[predictive analytics]]></category>
		<category><![CDATA[the enterprise]]></category>
		<category><![CDATA[upstream]]></category>
		<category><![CDATA[data analytics]]></category>
		<category><![CDATA[data mining]]></category>
		<category><![CDATA[employment]]></category>
		<category><![CDATA[hiring]]></category>
		<category><![CDATA[human resources]]></category>
		<category><![CDATA[jobs]]></category>
		<category><![CDATA[predictive algorithms]]></category>

		<guid isPermaLink="false">http://www.greenplum.com/blog/?p=1866</guid>
		<description><![CDATA[<p></p>
<p>&#8220;A robot took my job!&#8221; The familiar refrain gets at a long-running cultural anxiety that machines will render human workers redundant. &#8220;My father had worked for the same firm for twelve years,&#8221; begins a memorable gag from Woody Allen&#8217;s early standup career. <a href="http://www.greenplum.com/blog/topics/big-data-topics/a-robot-gave-me-my-job" class="read_more"><br /><br />Read more &#187;</a></p>]]></description>
				<content:encoded><![CDATA[<p><img src="http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/04/phone-brazil-300x178.jpg" alt="phone-brazil" width="300" height="178" class="alignleft size-medium wp-image-1869" /></p>
<p>&#8220;A robot took my job!&#8221; The familiar refrain gets at a long-running cultural anxiety that machines will render human workers redundant. &#8220;My father had worked for the same firm for twelve years,&#8221; begins a memorable gag from Woody Allen&#8217;s early standup career. &#8220;They replaced him with a tiny gadget, this big, that does everything my father does, only it does it much better. The depressing thing is, my mother ran out and bought one.&#8221; But according to <a href="http://www.economist.com/news/business/21575820-how-software-helps-firms-hire-workers-more-efficiently-robot-recruiters">a recent <em>Economist</em> article</a> about the role of big data and predictive analytics in Human Resources departments, an increasing number of workers may instead have reason to declare, &#8220;A robot gave me my job!&#8221;</p>
<p>The notion of machine-aided HR departments might seem troubling to some job seekers, evoking a future where algorithms cull through countless resumes for candidates that fulfill a pre-selected set of metrics. Yet the approach is revealing surprising correlations that not only identify promising job candidates, but may benefit prospective employees in unexpected ways. </p>
<p>Some of the correlations are intriguing yet untested. For example, candidates who apply for jobs using a web browser that did not come pre-installed, such as Chrome or Firefox, tend to perform better and stay at jobs longer. But deep analysis of a wealth of applicant and HR data can also reveal insights that meaningfully contradict deeply ingrained HR biases:</p>
<blockquote>
<p>For instance, firms routinely cull job candidates with a criminal record. Yet the data suggest that for certain jobs there is no correlation with work performance…Likewise, many HR departments automatically eliminate candidates who have hopped from job to job. But a recent analysis of 100,000 call-centre workers showed that those who had job-hopped in the past were no more likely to quit quickly than those who had not.</p>
</blockquote>
<p>Other insights seem to make sly, intuitive sense — on the whole, honest people tend to be more productive and dependable employees, but less effective salespeople. And as the article details, biases and human error can still taint the accuracy of results. But for many job seekers, data mining and algorithms may increase, rather than limit, new opportunities. <a href="http://www.economist.com/news/business/21575820-how-software-helps-firms-hire-workers-more-efficiently-robot-recruiters">Read more at <em>The Economist</em></a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.greenplum.com/blog/topics/big-data-topics/a-robot-gave-me-my-job/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
	<ib_show><![CDATA[1]]></ib_show>
	<ib_modified><![CDATA[1365196547]]></ib_modified>
	<ib_summary><![CDATA[“A robot took my job!” The familiar refrain gets at a long-running cultural anxiety that machines will render human workers redundant. But according to a recent Economist article about the role of big data and predictive analytics in Human Resources departments, an increasing number of workers may instead have reason to declare, “A robot gave me my job!”]]></ib_summary>
	<ib_category><![CDATA[22]]></ib_category>
	<ib_topics><![CDATA[1]]></ib_topics>
	<ib_image><![CDATA[http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/04/phone-brazil.jpg]]></ib_image>	</item>
		<item>
		<title>Pivotal: All Systems Go</title>
		<link>http://www.greenplum.com/blog/topics/data-science/pivotal-all-systems-go</link>
		<comments>http://www.greenplum.com/blog/topics/data-science/pivotal-all-systems-go#comments</comments>
		<pubDate>Tue, 02 Apr 2013 17:11:18 +0000</pubDate>
		<dc:creator>Paul M. Davis</dc:creator>
				<category><![CDATA[agile]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[the enterprise]]></category>
		<category><![CDATA[upstream]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[Paul Maritz]]></category>
		<category><![CDATA[Pivotal HD]]></category>
		<category><![CDATA[Pivotal Initiative]]></category>
		<category><![CDATA[Pivotal Labs]]></category>
		<category><![CDATA[VMWare]]></category>

		<guid isPermaLink="false">http://www.greenplum.com/blog/?p=1844</guid>
		<description><![CDATA[<p></p>
<p>Pivotal, the new company uniting EMC Greenplum’s decade of big data R&#38;D, agile development trailblazers Pivotal Labs, and VMware’s cloud services and app framework acumen, is ready to launch. In a letter to employees sent Monday, April 1st, CEO Paul Maritz announced Pivotal as &#8220;a new and exciting company with great promise,&#8221; and outlined the Pivotal vision. <a href="http://www.greenplum.com/blog/topics/data-science/pivotal-all-systems-go" class="read_more"><br /><br />Read more &#187;</a></p>]]></description>
				<content:encoded><![CDATA[<p><img title="gopivotal.jpg" src="http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/04/gopivotal.jpg" alt="Gopivotal" width="916" height="516" border="0" /></p>
<p>Pivotal, the new company uniting EMC Greenplum’s decade of big data R&amp;D, agile development trailblazers Pivotal Labs, and VMware’s cloud services and app framework acumen, is ready to launch. In a letter to employees sent Monday, April 1st, CEO Paul Maritz announced Pivotal as &#8220;a new and exciting company with great promise,&#8221; and outlined the Pivotal vision. In the letter, titled “Pivotal: One Team. One Platform. One Company,” Maritz wrote:</p>
<blockquote><p>As you recall, on December 4, 2012 we shared with you our intentions to bring together products and people from EMC and VMware to form a “new entity” in an effort to pursue a unique opportunity in the marketplace. On March 13th, we publicly outlined our intentions to the analyst community. We knew that getting this far would take effort and we still have much work to do, however, at this time I would like to recognize everyone’s hard work, patience, and spirit that has brought us to this day. Thank you.</p>
<p>When I think of Pivotal, one word consistently comes to mind: privilege.</p>
<p>We are all privileged to have the opportunity to work on a big and important idea: to enable customers to build a new class of applications, leveraging big and fast data, and do all of this with the power of cloud-independence. Building a new platform for a new era is an incredible challenge and opportunity. If we can succeed we will have something to be rightly proud of.</p>
</blockquote>
<p>On April 29th, Pivotal will launch, promising a unified platform to enable “<a href="http://www.greenplum.com/blog/topics/data-science/paul-maritz-calls-for-a-consumer-grade-enterprise-platform">the consumer-grade enterprise</a>,” boasting robust data analytics and applications fabrics, cloud independence, and the tools and skills to develop customer experiences in a rapid, iterative fashion. For the time being, you can learn more about Pivotal at <a href="http://gopivotal.com/">gopivotal.com</a>, and get more details in the <a href="http://gopivotal.com/faq.php">Pivotal FAQ</a>.</p>
<p>In <a href="http://www.crn.com/news/applications-os/240152130/maritz-pivotal-platform-will-sidestep-amazon-tax-for-big-data-apps.htm">an interview with CRN</a>, Maritz emphasizes the cloud independence aspect of the Pivotal platform:</p>
<blockquote>
<p>&#8220;&#8216;We don&#8217;t want this world to be like the bad old days of the mainframe: when you wrote a COBOL CICS app, you were condemned to pay IBM (NYSE:IBM) a tax for all eternity,&#8217; said Maritz, who is slated to publicly unveil the Pivotal Initiative in an April 29 press conference. &#8216;We don&#8217;t want to make it so when you write an app in Amazon you are condemned to pay Amazon a tax for all eternity.&#8217;&#8221;</p>
<p>(snip)</p>
<p>&#8220;Thus, Pivotal is aiming to deliver a platform as a service on top of a wide array of infrastructure-as-a-service offerings, from Amazon&#8217;s EC2 Web Service to Microsoft&#8217;s Azure and other platforms too. &#8216;If infrastructure as a service is the new hardware, we are the new OS on top of it,&#8217; said Maritz.&#8221;</p>
<p>&#8220;&#8216;At the end of the day, we intend to be a platform provider,&#8217; said Maritz. &#8216;I hesitate to use the word [OS] because it isn&#8217;t a really good analogy, [but] we want to provide the operating system for the cloud era.&#8217;&#8221;</p>
</blockquote>
<p>Some additional responses in the press:</p>
<p><a href="http://bits.blogs.nytimes.com/2013/04/01/emcs-amazon-challenger-comes-out/"><strong>EMC’s Amazon Challenger Comes Out | NYTimes.com</strong></a></p>
<blockquote>
<p>“Pivotal has drawn talent from both companies, in particular a division of EMC specialized in data analysis and prediction, and another group that works on writing software applications within cloud computing.</p>
<p>“In a letter to employees, Mr. Maritz talked about Pivotal’s goal ‘to enable customers to build a new class of applications, leveraging big and fast data, and do all of this with the power of cloud independence.’ Those applications would be running on privately run clouds rich in EMC and VMware products.</p>
<p>(snip)</p>
<p>“Mr. Maritz’s last great act at VMware was spending $1.26 billion on a networking software company that needs a lot of private clouds to succeed. Pivotal, if it can supply lots of cloud applications and services faster than Amazon can make them, could be a way of ensuring that future.”</p>
</blockquote>
<p><a href="http://www.eweek.com/cloud/emc-vmware-love-child-pivotal-poses-challenge-to-aws/"><strong>EMC, VMware Love Child Pivotal Poses Challenge to AWS | eWeek</strong></a></p>
<blockquote>
<p>&#8220;Does Amazon and the Web services sector need another competitor? Of course they would say no, but it&#8217;s a big Internet world out there. With the financial and development muscle of EMC—a $50 billion per year company—and VMware (nearly $5 billion), to go with the know-how and experience of Maritz, Pivotal stands as good a chance as any to make a real mark.&#8221;</p>
<p>&#8220;One supposes that in the marketplace of the Tier 1 Web service providers, with all things being relatively equal, it might simply come down trust and security. EMC, with its RSA Security arm and its generation-old data protection background, brings a lot to the table with Pivotal in that case.&#8221;</p>
</blockquote>
<p><a href="http://www.sys-con.com/node/2599482"><strong>The Pivotal Initiative – Aiming for the PaaS Crown | SYS-CON MEDIA</strong></a></p>
<blockquote><p>“The Pivotal Initiative will aim to deliver the market a data analysis platform capable of capturing large volumes of data, quickly addressing and querying it and then producing near real time answers that can be stored in a large scale-out storage system. It would be naïve to think this is an initiative aimed just at existing VMware customers. This is an attempt to not only enter but also become relevant in the software led infrastructure arena that competes with the likes of Amazon.</p>
<p>“In essence the Pivotal Initiative is a brave yet necessary move from both EMC and VMware to embrace the challenge of change as the legacy of traditional infrastructure faces the daunting prospect of new software paradigms. Whether the Pivotal Initiative can be successful and achieve it’s $1bn rate in its projected five years depends on a number of factors. One thing is certain is that the first challenge to remaining relevant in the IT industry is to acknowledge and adapt to change. The masters behind the Pivotal Initiative have already achieved that.”</p>
</blockquote>
<p><a href="http://seekingalpha.com/article/1313621-emc-big-strides-ahead?source=google_news"><strong>EMC: Big Strides Ahead | Seeking Alpha</strong></a></p>
<blockquote>
<p>“The Pivotal venture is expected to be spun-off in the future and possibly made a public company, majority-owned by EMC (69%). It is expected to accrue $1 billion in revenue by 2017, according to this presentation. I believe this new development will help accelerate growth prospects for all these three entities.”</p>
</blockquote>
<p>And some of the reactions on Twitter:</p>
<blockquote class="twitter-tweet">
<p>Pivotal: A New Platform for a New Era. Ready. Set. <a href="https://twitter.com/search/%23gopivotal">#gopivotal</a> <a title="http://gopivotal.com" href="http://t.co/NsL5KbOMqe">gopivotal.com</a> Not much to say, but I like the promo</p>
<p>— Doug Henschen (@DHenschen) <a href="https://twitter.com/DHenschen/status/318840161012166656">April 1, 2013</a></p></blockquote>
<p><script type="text/javascript" src="//platform.twitter.com/widgets.js"></script></p>
<blockquote class="twitter-tweet">
<p>Day 1 of Pivotal: A New Cloud Platform for a New Era. Ready. Set. <a href="https://twitter.com/search/%23gopivotal">#gopivotal</a> <a title="http://gopivotal.com" href="http://t.co/8YIRSIaK73">gopivotal.com</a></p>
<p>— Brent Byrnes (@Brent_Byrnes) <a href="https://twitter.com/Brent_Byrnes/status/318866656304697346">April 1, 2013</a></p></blockquote>
<p>Learn more at <a href="http://gopivotal.com/">gopivotal.com</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.greenplum.com/blog/topics/data-science/pivotal-all-systems-go/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
	<ib_show><![CDATA[1]]></ib_show>
	<ib_modified><![CDATA[1365027463]]></ib_modified>
	<ib_summary><![CDATA[Pivotal, the new company uniting EMC Greenplum’s decade of big data R&D, agile development trailblazers Pivotal Labs, and VMware’s cloud services and app framework acumen, is ready to launch. In a letter to employees sent Monday, April 1st, CEO Paul Maritz announced Pivotal as "a new and exciting company with great promise," and outlined the Pivotal vision.]]></ib_summary>
	<ib_category><![CDATA[22]]></ib_category>
	<ib_topics><![CDATA[5]]></ib_topics>
	<ib_image><![CDATA[http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/04/pivotal_logo.jpg]]></ib_image>	</item>
		<item>
		<title>Chorus In Action at Data Science London</title>
		<link>http://www.greenplum.com/blog/topics/data-science/chorus-in-action-at-data-science-london</link>
		<comments>http://www.greenplum.com/blog/topics/data-science/chorus-in-action-at-data-science-london#comments</comments>
		<pubDate>Wed, 27 Mar 2013 22:55:46 +0000</pubDate>
		<dc:creator>Logan Lee</dc:creator>
				<category><![CDATA[data science]]></category>
		<category><![CDATA[events]]></category>
		<category><![CDATA[features]]></category>
		<category><![CDATA[reflections]]></category>
		<category><![CDATA[Chorus]]></category>
		<category><![CDATA[Data Science London]]></category>
		<category><![CDATA[event]]></category>
		<category><![CDATA[workflows]]></category>
		<category><![CDATA[workshops]]></category>

		<guid isPermaLink="false">http://www.greenplum.com/blog/?p=1830</guid>
		<description><![CDATA[<p>Over the past few years, organizations like Data Science London have sprung up in major metropolitan cities all over the world. This is yet another sign of increasing momentum behind the data science community. It’s a community capable of transformative impact, one enabled by dramatic improvements in technologies for analyzing and modeling massive data volumes. <a href="http://www.greenplum.com/blog/topics/data-science/chorus-in-action-at-data-science-london" class="read_more"><br /><br />Read more &#187;</a></p>]]></description>
				<content:encoded><![CDATA[<div id="attachment_1834" class="wp-caption alignleft" style="width: 310px"><a href="http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/03/datascilondon1.jpg"><img src="http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/03/datascilondon1-300x225.jpg" alt="Logan Lee at Data Science London. Image via Data Science London (@ds_ldn)" width="300" height="225" class="size-medium wp-image-1834" /></a><p class="wp-caption-text">Logan Lee at Data Science London. Image via Data Science London (<a href="https://twitter.com/ds_ldn">@ds_ldn</a>)</p></div>
<p>Over the past few years, organizations like <a target="_blank" href="http://datasciencelondon.org/">Data Science London</a> have sprung up in major metropolitan cities all over the world. This is yet another sign of increasing momentum behind the data science community. It’s a community capable of transformative impact, one enabled by dramatic improvements in technologies for analyzing and modeling massive data volumes. Greenplum feels privileged to be invited from time to time to present information about our products and learnings, and to participate in collaborative discussion with a group representative of the markets we serve.</p>
<p>Recently, I had the opportunity to present a workshop on <a href="http://www.greenplum.com/products/chorus">Greenplum Chorus</a> at <a target="_blank" href="http://datasciencelondon.org/">Data Science London</a>. The group is one of the largest data science communities in Europe, with over 1,600 data professionals from a diverse range of organizations and companies. They meet regularly to discuss data science concepts and technologies used to analyze large-scale data, extract predictive insight, and exploit business opportunities from data products. </p>
<p>On this night, I introduced the three primary concepts of Chorus:</p>
<ul>
<li>The first concept is self-service access for data scientists to data from disparate sources, to speed up their work.</li>
<li>The second concept is how to avoid over-investing in each new data science project by providing a standard analytical environment to prove out the idea.</li>
<li>The third concept is the value of disseminating knowledge about data sources and datasets across an organization to speed up each team’s ability to work with the data they need.</li>
</ul>
<blockquote><p>The workshop was built around a simulated data science project, where participants randomly formed teams and played the roles of Data Miner, Data Scientist, or Subject Matter Expert (SME). The Data Miner&#8217;s unique skill was the technical aptitude to access and prepare datasets. The Data Scientists were the only team members able to create a statistical model using various programming languages. The SMEs possessed critical domain knowledge of the project&#8217;s business constraints and a detailed understanding of the source data that materially affected what data should be used and how the model should be built.</p>
<p>The roles of data miner, data scientist, and subject matter expert are real-life roles, but data scientists (with unique statistical and computer science skills) are often forced to play all three, spending up to 80% of their time on tasks others could have performed. Effectively bringing others into a collaborative and secure environment is a principal element of how we see data science transforming.</p>
</blockquote>
<div id="attachment_1833" class="wp-caption alignright" style="width: 310px"><a href="http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/03/datascilondon2.jpg"><img src="http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/03/datascilondon2-300x225.jpg" alt="Image via Data Science London (@ds_ldn)" width="300" height="225" class="size-medium wp-image-1833" /></a><p class="wp-caption-text">Image via Data Science London (<a href="https://twitter.com/ds_ldn">@ds_ldn</a>)</p></div>
<p>Using Chorus, the Data Miner on the project was given instructions to identify and prepare training data for a model from a range of datasets stored within a Greenplum database. Instructions were intentionally vague, and the Data Miner did not have all of the knowledge needed to prepare the data on their own. Through Chorus&#8217; data management and annotation capabilities, the Data Miner was able to surface potential training sets, solicit input from the Data Scientist and SMEs on the project, and quickly revise the training data based on the input received from the other two roles. All the data preparation work was done without shuttling files back and forth or setting up individual environments to inspect and evaluate data stored in the database. With each team member navigating through the virtual workspace to the appropriate objects, Chorus made accessing the source of truth easy.</p>
<p>Once the training data was prepared, the Data Scientist on the team was called into action to apply their unique statistical skills to create sophisticated models that predict the desired outcome. Chorus includes an integrated console for developing and versioning code that runs against the Greenplum data platform. Because datasets relevant to the project are readily available through the Chorus web interface, specifying what data to interact with is simple and intuitive. Here too, Chorus facilitated easy verification by SMEs, who provided key points related to the project&#8217;s goals or the nature of the underlying data that materially affected the data scientist&#8217;s choices. Chorus allowed all members of the team to run the unfinished model, saving iterations as new versions to be discussed and quickly improved.</p>
<p>These activities highlighted the value Chorus offers in streamlining the process of &#8220;baton-handing&#8221; between team members, an important capability as organizations scale their data science activities. I was happy to see that even though the participants did not know which other people in the room were their team members, they were able to collaborate effectively in Chorus&#8217; web interface through the phases of their work.</p>
<p>Although each team worked in an independent workspace within Chorus, the contextual knowledge they accumulated and disseminated among themselves was exposed across teams, due to the use of common source datasets. A few astute teams quickly realized they could make use of the knowledge derived by other teams to improve their own results. This drove home the third important Chorus concept: making knowledge accessible to others. By wrapping the questions, answers, comments, and feedback around the data assets, or the modeling code itself, users can discover and use that knowledge as they search or browse for objects relevant to their work.</p>
<p>Feedback at the end of our workshop was positive, and I was glad to see participants thinking about how they could apply the concepts Chorus enables in their own organizations. As Chorus&#8217; Director of Product Management, I found it extremely useful to gain direct feedback on how first-time users use our product, and on where to invest effectively in our goal to further transform the practice of data science for the better.</p>
<p>If you run a data science group and are interested in learning more about Chorus or Greenplum, please contact us. We&#8217;d like to learn more about you as well.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.greenplum.com/blog/topics/data-science/chorus-in-action-at-data-science-london/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
	<ib_show><![CDATA[1]]></ib_show>
	<ib_modified><![CDATA[1364489511]]></ib_modified>
	<ib_summary><![CDATA[Over the past few years, organizations like Data Science London have sprung up in major metropolitan cities all over the world. Greenplum feels privileged to be invited from time to time to present information about our products and learnings, and to participate in collaborative discussion with a group representative of the markets we serve. Recently, I had the opportunity to present a workshop on Greenplum Chorus at Data Science London, where I introduced the three primary concepts of Chorus.]]></ib_summary>
	<ib_category><![CDATA[15]]></ib_category>
	<ib_topics><![CDATA[42|4|5]]></ib_topics>
	<ib_image><![CDATA[http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/03/datascilondon1.jpg]]></ib_image>	</item>
		<item>
		<title>eBay&#8217;s Beth Axelrod: Predictive Analytics is Transforming Human Resources</title>
		<link>http://www.greenplum.com/blog/topics/big-data-topics/ebays-beth-axelrod-predictive-analytics-is-transforming-human-resources</link>
		<comments>http://www.greenplum.com/blog/topics/big-data-topics/ebays-beth-axelrod-predictive-analytics-is-transforming-human-resources#comments</comments>
		<pubDate>Tue, 26 Mar 2013 19:55:21 +0000</pubDate>
		<dc:creator>Paul M. Davis</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[predictive analytics]]></category>
		<category><![CDATA[the enterprise]]></category>
		<category><![CDATA[upstream]]></category>
		<category><![CDATA[companies]]></category>
		<category><![CDATA[hiring]]></category>
		<category><![CDATA[human resources]]></category>
		<category><![CDATA[practioners]]></category>
		<category><![CDATA[recruitment]]></category>

		<guid isPermaLink="false">http://www.greenplum.com/blog/?p=1816</guid>
		<description><![CDATA[<p>The challenges of hiring skilled data scientists are well documented, but what of Big Data&#8217;s impact on talent recruitment within other fields? According to Beth Axelrod, eBay&#8217;s Senior Vice President of Human Resources and a co-author of the book <em>The War for Talent</em>, Human Resources is the latest department to be significantly disrupted within the data-driven enterprise. <a href="http://www.greenplum.com/blog/topics/big-data-topics/ebays-beth-axelrod-predictive-analytics-is-transforming-human-resources" class="read_more"><br /><br />Read more &#187;</a></p>]]></description>
				<content:encoded><![CDATA[<div id="attachment_1820" class="wp-caption alignleft" style="width: 310px"><img src="http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/03/beth_axelrod-300x168.jpeg" alt="eBay&#039;s Beth Axelrod" width="300" height="168" class="size-medium wp-image-1820" /><p class="wp-caption-text">eBay&#8217;s Beth Axelrod</p></div>
<p>The challenges of hiring skilled data scientists are well documented, but what of Big Data&#8217;s impact on talent recruitment within other fields? According to Beth Axelrod, eBay&#8217;s Senior Vice President of Human Resources and a co-author of the book <em>The War for Talent</em>, Human Resources is the latest department to be significantly disrupted within the data-driven enterprise.</p>
<p>In <a target="_blank" href="http://www.forbes.com/sites/dorieclark/2013/03/08/how-big-data-is-transforming-the-hunt-for-talent/">an interview at Forbes</a>, Axelrod explains, &#8220;There’s a lot of value to be created and added through data analytics…Whether it’s doing a better job spotting talent outside to attract to the company, or doing predictive analysis of who is likely to leave and what are the factors, so you can intervene before that point is reached to try to change the trajectory.&#8221; She points to recruitment startup <a target="_blank" href="http://www.gild.com/">Gild</a> as exemplifying the changes to come. Not unlike a Kaggle in reverse, the service analyzes and evaluates open source code available on the web, and then connects prospective employers with the developers writing the top-ranked code.</p>
<p>Axelrod notes that the greatest barrier for companies embracing data-driven employee recruitment is often the folks in the Human Resources department, who may lack the skills or desire to perform sophisticated analytics. This mismatch should encourage companies to reach outside of their HR departments for recruitment efforts, she says, and engage in-house analytics talent when considering new hires.</p>
<p><a target="_blank" href="http://www.forbes.com/sites/dorieclark/2013/03/08/how-big-data-is-transforming-the-hunt-for-talent/">Read more at Forbes</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.greenplum.com/blog/topics/big-data-topics/ebays-beth-axelrod-predictive-analytics-is-transforming-human-resources/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
	<ib_show><![CDATA[1]]></ib_show>
	<ib_modified><![CDATA[1364336446]]></ib_modified>
	<ib_summary><![CDATA[The challenges of hiring skilled data scientists are well-documented, but what of Big Data’s impact on talent recruitment within other fields? According to Beth Axelrod, eBay’s Senior Vice President of Human Resources and the co-author of the book The War for Talent, Human Resources is the latest department to be significantly disrupted within the data-driven enterprise.
]]></ib_summary>
	<ib_category><![CDATA[22]]></ib_category>
	<ib_topics><![CDATA[26]]></ib_topics>
	<ib_image><![CDATA[http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/03/beth_axelrod.jpeg]]></ib_image>	</item>
		<item>
		<title>Paul Maritz Calls for a &#8220;Consumer-Grade Enterprise&#8221; Platform</title>
		<link>http://www.greenplum.com/blog/topics/data-science/paul-maritz-calls-for-a-consumer-grade-enterprise-platform</link>
		<comments>http://www.greenplum.com/blog/topics/data-science/paul-maritz-calls-for-a-consumer-grade-enterprise-platform#comments</comments>
		<pubDate>Fri, 22 Mar 2013 19:37:49 +0000</pubDate>
		<dc:creator>Paul M. Davis</dc:creator>
				<category><![CDATA[agile]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[data warehousing]]></category>
		<category><![CDATA[predictive analytics]]></category>
		<category><![CDATA[the enterprise]]></category>
		<category><![CDATA[upstream]]></category>
		<category><![CDATA[GigaOm]]></category>
		<category><![CDATA[Internet of Things]]></category>
		<category><![CDATA[Paul Maritz]]></category>
		<category><![CDATA[Pivotal HD]]></category>
		<category><![CDATA[Pivotal Initiative]]></category>

		<guid isPermaLink="false">http://www.greenplum.com/blog/?p=1810</guid>
		<description><![CDATA[<p></p>
<p>Over the past decade, consumer-facing web giants such as Google and Facebook have invested heavily in user data — storing it, analyzing it, and using the insights revealed to rapidly iterate on products. These companies recognized early that user data is a treasure trove, and devoted significant in-house resources to reap its value. <a href="http://www.greenplum.com/blog/topics/data-science/paul-maritz-calls-for-a-consumer-grade-enterprise-platform" class="read_more"><br /><br />Read more &#187;</a></p>]]></description>
				<content:encoded><![CDATA[<p><img src="http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/03/paul_maritz-300x175.jpg" alt="paul_maritz.jpg" width="300" height="175" class="alignleft size-medium wp-image-1809" /></p>
<p>Over the past decade, consumer-facing web giants such as Google and Facebook have invested heavily in user data — storing it, analyzing it, and using the insights revealed to rapidly iterate on products. These companies recognized early that user data is a treasure trove, and devoted significant in-house resources to reap its value. By comparison, enterprises, even as they become increasingly data-driven and data-dependent, are playing catch-up. In <a target="_blank" href="http://gigaom.com/2013/03/19/the-world-is-ready-for-the-consumer-grade-enterprise/">a post at GigaOm</a>, Pivotal Initiative leader Paul Maritz articulates his vision for &#8220;the consumer-grade enterprise,&#8221; which takes its cues not from the offerings of traditional enterprise vendors, but from the likes of Google, Facebook, and Amazon.</p>
<p>The factors driving this shift, Maritz explains, include the ascent of the cloud and the profusion of data, which will only increase in the years ahead as consumers adopt &#8220;Internet of Things&#8221; technologies. As the amount of data grows, enterprises will need not only to store and analyze it, but also to transform it into actionable insight ever faster. He writes: </p>
<blockquote><p>Enterprise companies will need ways to store and analyze massive amounts of data cost-effectively, ingest huge numbers of events in real time, reason over the data and events, and react in real time. Teams will need to be able to develop rapidly the new solutions that exploit these underlying capabilities. The need for these capabilities can be seen across a wider set of industries — from industrial control to telecommunications to retail, and even to modern agriculture.</p></blockquote>
<p>To address these challenges, Maritz calls for a new cloud-enabled platform that is free from vendor lock-in, capable of handling massive amounts of data, and able to deliver insights in real time. <a target="_blank" href="http://gigaom.com/2013/03/19/the-world-is-ready-for-the-consumer-grade-enterprise/">Read more about Maritz&#8217;s vision at GigaOm</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.greenplum.com/blog/topics/data-science/paul-maritz-calls-for-a-consumer-grade-enterprise-platform/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
	<ib_show><![CDATA[1]]></ib_show>
	<ib_modified><![CDATA[1363986928]]></ib_modified>
	<ib_summary><![CDATA[Over the past decade, consumer-facing web giants such as Google and Facebook have invested heavily in user data — storing it, analyzing it, and using the insights revealed to rapidly iterate on products. In a post at GigaOm, Pivotal Initiative leader Paul Maritz articulates his vision for “the consumer-grade enterprise,” taking its cues not from the offerings of traditional enterprise vendors, but instead from the likes of Google, Facebook, and Amazon.
]]></ib_summary>
	<ib_category><![CDATA[22]]></ib_category>
	<ib_topics><![CDATA[1]]></ib_topics>
	<ib_image><![CDATA[http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/03/paul_maritz.jpg]]></ib_image>	</item>
		<item>
		<title>Finding Problem Solvers Who Provide Big Answers: An Interview with Harper Reed</title>
		<link>http://www.greenplum.com/blog/topics/government/finding-problem-solvers-who-provide-big-answers-an-interview-with-harper-reed</link>
		<comments>http://www.greenplum.com/blog/topics/government/finding-problem-solvers-who-provide-big-answers-an-interview-with-harper-reed#comments</comments>
		<pubDate>Tue, 19 Mar 2013 20:48:59 +0000</pubDate>
		<dc:creator>Paul M. Davis</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[features]]></category>
		<category><![CDATA[government]]></category>
		<category><![CDATA[practitioners]]></category>
		<category><![CDATA[predictive analytics]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Harper Reed]]></category>
		<category><![CDATA[interview]]></category>
		<category><![CDATA[iteration]]></category>
		<category><![CDATA[Obama campaign]]></category>

		<guid isPermaLink="false">http://www.greenplum.com/blog/?p=1797</guid>
		<description><![CDATA[<p>Harper Reed, CTO for the Obama 2012 Presidential Campaign, delivered the keynote speech at EMC Greenplum’s Hadoop: The Foundation for Change event on Monday, February 25. Reed delivered what he called “A Big Data intervention,” urging the audience to move the conversation beyond Big Data, toward what he called “Big Answers.” He noted that technologists are “often bad at listening when it comes to data,” and said that practitioners “should be using these insights from data to do more listening.” He stated technologists must ask themselves, “‘How do we use targeting to have a conversation?’”</p>
<p></p>
<p>Running the most data-driven Presidential campaign to date presented unique challenges. <a href="http://www.greenplum.com/blog/topics/government/finding-problem-solvers-who-provide-big-answers-an-interview-with-harper-reed" class="read_more"><br /><br />Read more &#187;</a></p>]]></description>
				<content:encoded><![CDATA[<p>Harper Reed, CTO for the Obama 2012 Presidential Campaign, delivered the keynote speech at EMC Greenplum’s <a href="http://www.greenplum.com/hadoop-the-foundation-for-change">Hadoop: The Foundation for Change</a> event on Monday, February 25. Reed delivered what he called “A Big Data intervention,” urging the audience to move the conversation beyond Big Data, toward what he called “Big Answers.” He noted that technologists are “often bad at listening when it comes to data,” and said that practitioners “should be using these insights from data to do more listening.” He stated technologists must ask themselves, “‘How do we use targeting to have a conversation?’”</p>
<p><img src="http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/03/harper_powerofdata_1.jpg" alt="Harper powerofdata 1" title="harper_powerofdata_1.jpg" border="0" width="621" height="465" /></p>
<p>Running the most data-driven Presidential campaign to date presented unique challenges. The high-stakes contest of a Presidential campaign demands constant iteration. Reed stated that multivariate user testing and a focus on user experience were key, requiring his team to reap insights from a huge amount of data. This required people who thrived in a high-pressure, constantly changing work environment, facing a hard deadline with no room for error. It demanded a technology infrastructure that wouldn’t go down on election day. And it required that the campaign run analytics on a wealth of data from many sources (social media, SMS, volunteer canvassing, poll results) and deliver actionable insights in real time.</p>
<p>Before the keynote, Reed spoke with Datastream about how the campaign found the right problem solvers for the job, and his reflections on using data to push the conversation closer to individual voters.</p>
<h2>Building a Team of Problem Solvers</h2>
<p>“Working for the President, every conversation was sort of like a mic drop. People would ask, ‘What are you doing,’ and I’d say, ‘well, I&#8217;m working for the President,’ and they&#8217;d drop the phone and show up. For the most part, people wanted the job, so it wasn&#8217;t very tricky to hire — people were excited about the job and the problems. What we did to make sure we got the right people is that we really celebrated the problem solving and what we were trying to do. We had 18 months to build a giant platform, it was going to be this billion dollar company, and then we were going to shut it down, and it couldn&#8217;t fail or go down.”</p>
<p>“Oftentimes, when you read job descriptions, they talk about your responsibilities. That doesn&#8217;t really tell you: ‘What are the awesome problems? What is the culture of problem solving? Why are we here? What are we trying to accomplish?’ We had this big architecture, and we&#8217;d see it was going to go down in six months, so we had to make sure it didn&#8217;t do that. So how were we going to do that? We used that to sort of lure them in.”</p>
<p>“There&#8217;s a book by Daniel Pink called <em>Drive</em> that talks about the motivations of problem solvers and really smart people. What we found was that by creating a culture where they could be autonomous, and very specifically not trying to control how they solved problems, instead making sure that we had goals that they could achieve — for example, &#8216;this needs to be highly available&#8217; — one team might build that differently than another team.”</p>
<p>“We had to have frameworks for them to do that. One example of what not to do is say, &#8216;you have to use this tool to solve this problem,&#8217; because then you find people who know the tools, but not how to solve the problems. We focused aggressively on &#8216;not the tools&#8217;. So if you wanted to use Rails, that was great, but if you happened to be really good at Python, there was a whole team that was solving problems in Python, and you could easily do that. It was more about enabling them to do what they needed to do to solve the problem, and not focusing on asking, &#8216;are they solving it the right way?’”</p>
<p>“That was hard, because anyone in this world can name 100 reasons why that&#8217;s risky. You&#8217;re segmenting your technology, making it harder to maintain, all these things. We had a very scaled-back operation, but like any business, we couldn&#8217;t fail.”</p>
<blockquote class="twitter-tweet"><p><a href="http://t.co/oRlS2xi7Kf" title="http://twitpic.com/c6tew1">twitpic.com/c6tew1</a> @<a href="https://twitter.com/harper">harper</a> reed, Obama 2012 Campaign CTO, crushing it as usual. <a href="https://twitter.com/search/%23hadoopSF">#hadoopSF</a> @<a href="https://twitter.com/greenplum">greenplum</a></p>
<p>&mdash; Kim Bassett (@kimbrob) <a href="https://twitter.com/kimbrob/status/306117212656267264">February 25, 2013</a></p></blockquote>
<p><script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<h2>Big Answers Rather Than Big Data</h2>
<p>“It seems to me that the conversation about Big Data came out of it being hard to store. Greenplum is in this business, EMC is in this business. But the thing is, storage doesn&#8217;t matter, because these and other companies have solved this problem. Because it&#8217;s largely solved, it pushes us to this idea that I think people invest less in, which is Big Answers: How do you couple this data with answers? I think people say Big Data when they really mean answers. This should be a conversation of how you get the best answers.”</p>
<p>“On the campaign, it was very important to us that we focused on the answers, so our analytics team was all about giving us answers. Every day they would give us a brief that would say, &#8216;we need to put more people in Florida&#8217; or &#8216;we need to do more media here,&#8217; or &#8216;when you&#8217;re on the TV, here is who your audience is.&#8217; It was about giving us actual information that we could react to and act upon, answers to an actual question. The question wasn&#8217;t &#8216;how big is your database?&#8217; The questions we focus on as technologists, we forget that the reason we&#8217;re here is to get better answers and insights.”</p>
<h2>From Many-to-Many to Many-to-One: How Data Can Push Conversations Closer to Individuals</h2>
<p>“I realized that my entire career has been about asking, &#8216;how do we push the conversation closer and closer to an individual?&#8217; The idea of microlistening came from Tim O&#8217;Reilly. I was at a Foo Camp, and Tim O&#8217;Reilly said, &#8216;I&#8217;m tired of targeting, I want more listening.&#8217; I started looking closely at all the listening we were doing, whether it was on Twitter or knocking on doors.” </p>
<p>“If you knock on a door and somebody says, &#8216;I&#8217;m really interested in health care,&#8217; then when you ask, &#8216;who&#8217;s interested in health care in this area,&#8217; you have that person and you have something to react to, and you can move that conversation closer. You can then target them to have a loop, a conversation.”</p>
<p>“People are focusing more on, &#8216;how can I show them an ad?&#8217; They&#8217;ll say that a person needs to see it 12 times or 10 times or whatever to impact them, and that&#8217;s great — we still need ads and microtargeting is cool — but I think what&#8217;s important is asking, &#8216;how do we have a conversation, and how do we make it so it&#8217;s on an individual level?’ So for example, it&#8217;s between EMC and Harper Reed, EMC and a programmer, EMC and a data guy — that&#8217;s the advertising cycle, that is the loop.”</p>
<p><a href="http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/03/harper_powerofdata_2.jpg"><img src="http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/03/harper_powerofdata_2-617x500.jpg" alt="harper_powerofdata_2" width="617" height="500" class="alignleft size-large wp-image-1799" /></a></p>
<h2>Trusting Your Users</h2>
<p>“If you have microtargeting in your organization, how do you reflect that into listening? First of all, you have to trust your users. You have to want to have that conversation. In a campaign, that is the most important thing, for us to listen to people, because we&#8217;re representing people — we&#8217;re trying to participate in a representative government.”</p>
<p>“That&#8217;s where I think it gets interesting, and technology is bringing us towards that, where we can target so specifically that you start to have microconversations and these tiny, many-to-one interactions. Twitter obviously does this on a grand social scale, and we can take advantage of that, but there&#8217;s a lot of opportunities there that people are missing.”</p>
<p>Watch Harper Reed’s keynote on The Power of Data, from <a href="http://www.greenplum.com/hadoop-the-foundation-for-change">Hadoop: The Foundation for Change</a>.</p>
<p><a href="http://bitcast-a.v1.o1.sjc1.bitgravity.com/greenplum/VIDEO/Hadoop_Live_Webcast/KeyNote_HaperReed_final_640x360.m4v" class="lightbox-video"><img src="http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/03/Screen-Shot-2013-03-19-at-3.56.29-PM-617x371.jpg" alt="Harper Reed video" width="617" height="371" class="alignnone size-large wp-image-1804" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.greenplum.com/blog/topics/government/finding-problem-solvers-who-provide-big-answers-an-interview-with-harper-reed/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
	<ib_show><![CDATA[1]]></ib_show>
	<ib_modified><![CDATA[1363798940]]></ib_modified>
	<ib_summary><![CDATA[Harper Reed, CTO for the Obama 2012 Presidential Campaign, delivered the keynote speech at EMC Greenplum’s Hadoop: The Foundation for Change event on Monday, February 25. Before the keynote, Reed spoke with Datastream about how the campaign found the right problem solvers for the job, and his reflections on using data to push the conversation closer to individual voters.]]></ib_summary>
	<ib_category><![CDATA[15]]></ib_category>
	<ib_topics><![CDATA[1]]></ib_topics>
	<ib_image><![CDATA[http://www.greenplum.com/sites/default/wp/blog/wp-content/uploads/2013/03/harper_powerofdata_1.jpg]]></ib_image>	</item>
	</channel>
</rss>
