Data quality assurance requires real users

A hard lesson that I've learned over the past few months is that data quality assurance requires real data users.

I wrote a handful of data processing scripts two years ago to batch-parse literature citations. The workflow seemed to be functioning well, and I even ran tests on several different kinds of references to catch bugs.
I've recently returned to these scripts and started processing new sets of data, and I've found that the resulting datasets are riddled with problems. Some references are skipped entirely by the workflow, with nothing to flag the omission, and other features I excitedly added to the scripts are now broken.
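For illustration, here's a rough sketch of the kind of flagging I have in mind (hypothetical names and a toy citation format, not my actual scripts): rather than silently dropping a reference that fails to parse, the workflow records the failure so the gaps are visible in the output.

```python
import re

# Toy citation pattern: "Author (Year) Title." -- purely illustrative;
# a real citation parser would need a far richer grammar.
CITATION_RE = re.compile(r"^(?P<author>.+?) \((?P<year>\d{4})\) (?P<title>.+)$")

def parse_citations(lines):
    """Parse citations, flagging failures instead of silently skipping them."""
    parsed, skipped = [], []
    for i, line in enumerate(lines, start=1):
        match = CITATION_RE.match(line.strip())
        if match:
            parsed.append(match.groupdict())
        else:
            # Record the failure so the gap is visible downstream.
            skipped.append({"line": i, "text": line.strip()})
    return parsed, skipped

refs = [
    "Smith, J. (2019) A study of things.",
    "malformed reference with no year",
]
parsed, skipped = parse_citations(refs)
print(f"parsed {len(parsed)}, skipped {len(skipped)}")
for entry in skipped:
    print(f"  line {entry['line']}: {entry['text']!r}")
```

Even a report this crude would have told me that records were going missing, instead of leaving me to discover it in the final datasets.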

This lesson has surfaced in my work a number of times, including with software and workflows written by other, highly competent folks. You just can't predict what the problems with your workflow will be until it has real users. What is the solution here? I hope to find some clarity soon.
