Why is data preparation so hard and are we getting worse at it?

Some statistics as an aperitif:

“Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in this more mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets.” New York Times, 2014. No source for the statistics.

“Analysts will still spend up to 80% of their time just trying to create the data set to draw insights.” Forrester, 2015. No source for the statistics.

“Since the popular emergence of data science as a field, its practitioners have asserted that 80% of the work involved is acquiring and preparing data.” Harvard Business Review, reprinting the statistic from Forbes in 2016. Forbes, in turn, notes that a “survey of about 80 data scientists was conducted for the second year in a row by CrowdFlower.”
