So what is data science? There’s a joke that it is simply what they call data analytics in Silicon Valley. And to some extent that’s true – much of data science existed long before the phrase did. But the exponential growth of data availability [1] and the power of computers (both in terms of raw grunt and of machine learning techniques) mean that it is evolving rapidly.
The best exposition I’ve seen is this Venn diagram, which outlines the three distinct skill sets that make for great data science.
- Statistical understanding. This is the ability to understand how maths and statistics can help. It means fluency with the techniques available, and with their shortcomings and pitfalls. These techniques are evolving rapidly and, as I’ve posted before, even the simplest concepts can trip people up – experts included.
- Domain knowledge. This refers to the understanding of the field or domain where the data exists. Despite references to machine learning, and unsupervised data techniques, we are still a long way from being able to just let the system tell us the answer. The ability to frame, interpret, and communicate is critical. As the Sloan Management Review put it:
“it may be easier for domain experts, with deep knowledge of the business in which they are engaged and the requisite interpersonal skills, to obtain sufficient knowledge about data analysis to act as the translator for data scientists than for data scientists to gain enough knowledge about the domain, especially the language of that domain.”
- Technical skills. Despite being where the lay observer would think all the value is, this is probably the easiest of the three (although R will definitely have you pulling your hair out, and experts can get even Excel catastrophically wrong), and the area in which there is least differentiation. The technologies are tools, and the value lies in knowing the right tool for the job rather than in knowing a particular toolkit intimately. Google and StackOverflow mean that the answers are easily found, if you know the right question. A good data scientist needs to be comfortable with at least some of:
- R or Python for analytics,
- Hadoop for huge data sets,
- Tableau for visualisation and communication,
- Excel and SQL, the traditional two workhorses of data analytics.
Only if all of those are brought together can a data scientist capture, manipulate, interrogate, interpret and communicate the information available in a way that gives accurate insight. It’s no surprise that this role, while “sexy”, and able to bring about transformative change, is not easily filled.
[1] I’m not sure it’s really exponential. There’s a statistic going around that 90% of the data in the world was captured in the last 2 years. I’ve seen that claim dated 2013, 2014, 2015 and, according to a new report from IBM Marketing Cloud, 2016 – and if it’s consistently true, then yes, that would be exponential. However, I think that it’s probably a lazy generalisation, and a great example of how a meme travels faster, and with more resilience, than fact. Whatever the details – it’s rapidly growing.
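For anyone who wants the arithmetic behind that footnote, here is a rough sketch in Python. It assumes the "90% in the last 2 years" claim holds exactly at every point in time, which is precisely the part I doubt; the function names are my own and purely illustrative.

```python
import math


def implied_growth_factor(recent_share: float, window_years: float) -> float:
    """Annual growth factor implied if `recent_share` of all data ever
    captured always falls within the most recent `window_years`.

    Assumption: the share is constant over time, so growth is geometric.
    """
    # If a share s of the total is recent, the older data is (1 - s) of the
    # total, so the total has multiplied by 1 / (1 - s) over the window.
    growth_over_window = 1.0 / (1.0 - recent_share)
    # Spread that multiplication evenly across the years in the window.
    return growth_over_window ** (1.0 / window_years)


def doubling_time_years(annual_factor: float) -> float:
    """Years needed for the total to double at a constant annual factor."""
    return math.log(2) / math.log(annual_factor)


factor = implied_growth_factor(0.90, 2)
print(f"Implied growth: {factor:.2f}x per year")
print(f"Doubling time: {doubling_time_years(factor):.2f} years")
```

Under that assumption the total stock of data multiplies tenfold every two years, or a little over threefold every year. A constant multiplier per year is what exponential growth means, so the claim only supports the word "exponential" if it really is true year after year.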