Big Data as a source of competitive advantage at Bank of America – Abhishek Mehta at Hadoop World 2010
Data as the new competitive advantage
My favorite insight from the presentation follows:
The last piece of it, which kind of goes with the fact that Data Quants [or Data Scientists] are really undervalued in the market today, was the fact that Modeling Quants are overvalued …
Our learning was this: [Algorithms] are only as good as what you feed in them. So if you truly had a discipline built around data – collected over multiple sources, structured and unstructured, many variables going back many years – the [algorithms] become less important. Because simple models over Big Data are more powerful than the most complex model using some approximation behind it. And we fully believe in it.
Secondly, with this massive democratization of data and the tools around it – and the next phase of it being the algorithms also probably following the same track. Most of the best [algorithms] that you need to use are already open-sourced – they have been written. So the algorithms, or the art of writing algorithms, is no longer proprietary. That is NOT your competitive advantage. The advantage you have is going to be how you apply the algorithm to a particular business problem, and that’s going to be the competitive differentiator.
Now that’s very interesting stuff.
In reference to the above slide, Mehta has this to say:
As an example, the graph algorithm – the same algorithm:
- Powers the People You May Know app at LinkedIn
- Can be used and deployed to classify people into behavioral tribes, or behavioral networks
- And the same algorithm can be used to look at risk concentration ratios
The algorithm has already been written in MapReduce. … But which problem you apply it to – between people you may know, marketing tribes or risk concentrations – is what is going to be the competitive advantage.
So we spend a lot more time in building what I call Data Algorithms, or data modeling, than actually in writing algorithms. And that is a massive change in the science around data, and big data.
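To make the "same algorithm, different problem" point concrete, here is a minimal Python sketch. Mehta does not say which graph algorithm he has in mind, and his version runs in MapReduce. The common-neighbor count below, the function names, and the toy data are all my own assumptions for illustration, not anything from the talk.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Turn a list of undirected (a, b) edges into a node -> neighbors map."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def common_neighbor_scores(edges):
    """Count shared neighbors for every pair of nodes not already linked."""
    adj = build_adjacency(edges)
    scores = defaultdict(int)
    for neighbors in adj.values():
        # Every pair of this node's neighbors shares this node as a neighbor.
        for u, v in combinations(sorted(neighbors), 2):
            if v not in adj[u]:
                scores[(u, v)] += 1
    return dict(scores)


# Hypothetical social graph: who is already connected to whom.
social_edges = [("ann", "bob"), ("ann", "cat"), ("bob", "dan"), ("cat", "dan")]
people_you_may_know = common_neighbor_scores(social_edges)

# Hypothetical exposure graph: which loans lean on which guarantor.
exposure_edges = [
    ("loan_a", "guarantor_x"),
    ("loan_b", "guarantor_x"),
    ("loan_c", "guarantor_y"),
]
shared_risk = common_neighbor_scores(exposure_edges)

print(people_you_may_know)
print(shared_risk)
```

The function never changes between the two calls. Only the data fed into it, and the business question asked of the result, differ: in the first case a high score suggests an introduction, in the second it flags loans that depend on the same guarantor.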
Data Factories – the next Industrial Revolution
The closing segment of Mehta’s talk begins with the following slide:
And he has some provocative thoughts on the topic:
I believe that we are witnessing the birth of the next industrial revolution. It’s going to be powered by data. It’s already begun. It’s still very early, but it’s already begun.
And the concept around data factories, of the ability to take data in, automate – just like factories do today – automate the data pipeline, and produce data products that can then be fed to solve multiple problems, is truly game changing. What we are building at Bank of America is the first data factory in financial services – to do exactly that.
Now data factories exist today. … Google and Facebook are some of the well-known data factories, as I classify them. Some of the not-so-well-known ones are comScore and Zynga. They do the exact same thing. Data is their core asset. They know how to monetize it, and they’ve tried to build an automated process to take raw data and push it forward.
… Buy into the fact that [Big Data] is going to change the world, and massively disrupt existing economic models. I look at Hadoop today as Linux was 20 years ago. We all have seen what Linux has done in the enterprise software space. It’s been massively disruptive. Hadoop will do the same. It’s not a question of if, it’s a question of when … across all verticals, not just in web properties.