Data visualization: biodiversity science vs. business

Since my transition from full-time business analytics back into biodiversity informatics, I've accumulated a few thoughts on the subtle but significant differences in the challenges that data visualization poses in the two fields.

Big data continues to pile up in every domain. While scientists have long collected and presented large sets of data as part of their research pipeline, new players are entering the data deluge realm. From small online companies to monolithic technology companies (e.g., Google, Apple, Facebook, Amazon), many businesses are now clients of easy-to-use visualization tools that take data directly from their data stores to beautiful live dashboards or linked stories (e.g., Tableau, Looker, Google Analytics). These are the sorts of tools I used as an analyst in the private sector. More available visualization tools can only mean good things for scientists, right?

Not quite. Scientific datasets, visualization goals, and audiences differ from those of technology businesses, and these differences are major stumbling blocks to scientists adopting and using data visualization software.

I haven't entirely figured out how I can use the many offerings of the data visualization sector to build better tools for biodiversity scientists. This post is an attempt to sketch out the major differences, in the hope that a better understanding will lead to actionable clarity.

Dataset nature

Important business metrics (e.g., gross sales, cost of goods, number of active users) are often similar and predictable in their structure. They are temporal and quantitative, and their biases can be understood because they are collected by the data user in question (i.e., the company).

Biodiversity data come from natural history museums
Photo by Chip Clark, via the Smithsonian

Conversely, scientific data are complex, of uncertain quality, and collected in various ways. For biodiversity science in particular, the data are often mined from various sources external to the data user. No two biodiversity datasets are the same: each is a different combination of taxa and distribution, taxonomy, natural history, and conservation data. This makes it difficult to rely on the default visualizations that data visualization software provides.

Data visualization goals

For business folks, keeping a pulse on the health of the company is the primary goal. Data need to be presented with clarity and in real time. This is achieved using simple visualization techniques (e.g., bar charts and line graphs) with connections to live databases.

This primary goal, along with secondary goals such as optimizing company functions, targeting business decisions, and supporting fundraising, is shared by many other clients of data visualization software. Because these goals are so common, the software supports them out of the box, and meeting them is quick and straightforward. For example, building out a forecast of a given business metric, based on historical data, can be achieved with the click of a button in Tableau.
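To make "a forecast based on historical data" concrete, here is a minimal sketch of the idea: fit a straight-line trend to past values and extend it forward. This is an illustration only and not how Tableau computes its forecasts. The function name `linear_forecast` and the sales figures are made up for the example.

```python
def linear_forecast(history, periods):
    """Extend a series by fitting an ordinary least-squares straight line.

    history: metric values at evenly spaced time steps (hypothetical data).
    periods: number of future time steps to predict.
    """
    n = len(history)
    if n < 2:
        raise ValueError("need at least two historical points")

    xs = range(n)  # time steps 0, 1, ..., n-1
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n

    # Least-squares slope and intercept of the trend line.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x

    # Predict the next `periods` time steps after the end of the history.
    return [intercept + slope * (n + k) for k in range(periods)]


# Made-up monthly gross sales that grow by 10 each month.
monthly_sales = [100.0, 110.0, 120.0, 130.0]
print(linear_forecast(monthly_sales, 2))
```

A business metric collected by the company itself, at regular intervals, fits this kind of one-size-fits-all treatment well, which is part of why it can sit behind a single button.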

Business dashboards are often useful for all employees
Dashboard from Tableau Public uploaded by user Ebere Igbojekwe

For biodiversity informaticians, the goal of data visualization is assuredly not to measure real-time sales using simple bar charts. Goals are more difficult to pin down because the users of a given data visualization are so varied: citizen scientists, the general public, taxonomists, ecologists, and others. How can one visualization, or one set of visualizations, accomplish a multitude of goals that may change drastically from one dataset to the next?

Data visualization audience

Business and biodiversity data users are distinctly different. The message offered by a data visualization for business needs to be short, clear, and obvious. The audience isn't picky, and details are not important. Business users have little time to spare. There is typically one message for all users within a business; needs differ only slightly from one user to the next.

For scientists, both the message and the analytical details are important. However, a visualization that meets this complex, nuanced requirement doesn't work well for a layperson. Because the audience of a visualization built for biodiversity science is so diverse, one visualization cannot serve all.

To be refined...

I will be attending a seminar in February 2018 on the topic of data visualization, so I hope to add more to this post soon :)
