Long before Steve Jobs or Bill Gates were even twinkles in their fathers’ eyes, the word computer was a job title for someone who computes, or performs mathematical calculations. Depending on which online resource you trust most, its use dates back to the 1600s. Not until much later, sometime in the 1800s, did it come to refer to a device rather than a human. From what I can gather, the word calculator underwent a similar evolution.
I’ve taken you on this little jaunt back in time in part because I’m under the influence of a book that I’m currently reading, The Etymologicon, but mostly to make the point that another title – “data scientist” – is likely to follow the same trajectory.
Data scientist has come to refer to people with a specific skill set that includes (but may not be limited to) quantitative modeling expertise (e.g. econometrics, applied statistics, theoretical physics), programming abilities (e.g. R, Python, SAS), and a good working knowledge of Big Data (including its enabling technologies such as Hadoop and Spark), all complemented by some general business acumen. It’s a neologism of much more significance than the usual buzzwords like “growth hacker.” (One day soon I may still decide to mount my trusty soapbox to rail against that one.)
Much as human computers once possessed an uncommon skill set (e.g. mathematics), so too do data scientists today. (OK, I’m stretching well beyond the boundaries of my actual knowledge here to speculate that knowledge of mathematics was necessarily uncommon in the 1600s, but humor me for the sake of conversation. Why let the truth get in the way of a good story?)
Today there is an imbalance between the supply of and demand for data scientists, but I wouldn’t expect that situation to persist for long. As one recent McKinsey Quarterly report put it, “Skilled employees across the spectrum of data-analytics roles are in short supply, so aggressive actions to address this problem are critical. Our study found that 15 percent of operating-profit increases from big data analytics were linked to the hiring of data and analytics experts.”
In the short term, companies like DataScience are taking an “as a service” approach to satisfying the unmet market need for data scientists. In the medium term, any ambitious young soul who’s currently picking a major and wants to find a job in the sexy tech sector would be well advised to direct her (but all too often his) energies toward data science. The money is good and the openings plentiful, and a few graduating classes hence, we are likely to close the gap, even as demand for data scientists continues to grow in sectors other than tech.
The other reason I expect the imbalance to be somewhat short-lived is the technological advancement that will “productize” this specialized skill set. Consider the evolution from the abacus of antiquity to the slide rule, then to the HP-12C, and finally to the spreadsheet of today. Technology has allowed us to abstract away complex mathematical concepts behind a simple Excel function, and it has empowered line managers to run their own analyses on a computing device rather than having to rely on a human computer.
That day is coming for data science too and, because the rate of technological advancement continues to accelerate, on a much shorter timescale than ever before. Companies like Arimo and DataRobot are already applying machine learning to the problem. Make no mistake, an AI that can take over data science from humans may be a moonshot, but by offering something today that increases the productivity of human data scientists, DataRobot and its ilk could use the crowd to train their artificial data scientists in a manner similar to Facebook M. This is an area where the first-mover advantage could be extremely difficult to overcome.
One day, I believe, data science will be “point-and-click” easy, and democratizing access will unlock the tremendous latent value behind all the hype. The aforementioned McKinsey Quarterly article estimates contemporaneous return multiples between 1.1 and 1.9, similar to the 1980s era of computer investment. I suspect, without any real grounds for it of course, that 1.9 will be the low end of the range after the next wave of investment and adoption.
To quote the article again, “Although companies struggle to roll out big data initiatives across the whole organization, these results suggest that efforts to democratize usage—getting analytics tools in the hands of as many different kinds of frontline employees as possible—will yield broad performance improvements. . . With a broad range of talent, [early adopters] can use data analytics to address the current challenges of their functional areas while developing forward-facing applications to stay ahead of competitors.”