Posts

Tableau for exploration of biodiversity datasets

A tutorial with Tableau
Tableau is an intuitive and fast data visualization program. In this tutorial, I'll be using Tableau Public, a free alternative to Tableau Desktop, to show you how you can quickly explore biodiversity datasets visually.
The need for visualization
Effective data visualization software can make a huge difference in modern informatics analysis pipelines. But why is data visualization important, and will it continue to be important?
[Photo caption: What do the numbers all mean? Photo by Mika Baumeister on Unsplash]
Firstly, data is accumulating at a speed never seen before. Technological advances have made data acquisition easier (Internet of Things, social and mobile networks, etc.), data processing fast, data storage cheap and data display devices ubiquitous. We are drowning in data and need faster ways to make sense of it.
Secondly, humans are visual. While computers are adept at making sense of large sets of data quickly, humans struggle t

A few thoughts on community

It is widely accepted that in the modern era, scientists cannot work in isolation. Collaboration is essential to achieving the level of productivity expected of a 21st-century researcher. I believe this is a beneficial development of modern science. Several fields, such as economics and psychology, have concluded that groups are smarter than individuals (see the popular book The Wisdom of Crowds by journalist James Surowiecki for a synthesis). As a budding scientist (I'm still getting comfortable calling myself this, since I'm not too sure what I really am), I understand the importance of building a strong network of collaborators. They are essential in helping to refine your thinking, providing productive criticism to improve your work, and keeping your enthusiasm for your discipline alive. This is why I was thrilled that my employer agreed to send me to the 2018 meeting of the Biodiversity Information Standards group (TDWG). TDWG is, to my knowledge, the on

Existing in fast and slow moving spaces

Ottawa has a thriving technology meet-up community. In fact, I am fortunate to live about a 5-minute bike ride away from the hosting venue of the Ottawa Machine Learning Meetup group. Tonight's lecture was "An Introduction to Natural Language Processing (NLP)". It struck me, through a combination of comments made during the talk and my discussion with a young Computer Systems Engineering student, that working in the technology space means adjusting your frame of reference for what counts as a new or an old idea.
[Image caption: Botany, a discipline as old as time. Image via www.twenty20.com]
"Classical" NLP techniques date back a decade. An "old" and developed Data Science company is one that has existed for 5 years. These frames of reference are in stark contrast to my regular existence in the Biology, Systematics and Taxonomy world, disciplines pioneered over 100 years ago. "New" techniques in systematics include DNA sequencing, first used over 40 y

What I learned at Data Science Bootcamp

[Photo caption: Ottawa is where I attended the bootcamp. Photo by Shanta Rohse via Wikipedia]
About a week ago I was fortunate enough to attend a Data Science Bootcamp hosted by a gracious and affable local Data Scientist. It turns out that many people within her network of family and friends were desperate to learn more about Data Science. What better way to educate all the interested people than to bring them all into one room? Here in Ottawa, Ontario, the federal government is the largest employer, with two major universities supplying many graduates for the public servant workforce. This resulted in a crowd composed mostly of government workers; incidentally, many had a Biology background (one biologist friend brings another biologist friend, and so on). Our host went around the room to assess our goals for attending her workshop, and it appeared as though we were all keen learners who simply wanted to learn more! She gave three wonderful presentations, and invited a handful of us up

Data quality assurance requires real users

A hard lesson that I've learned over the past few months is that data quality assurance requires real data users. Two years ago I wrote a handful of data processing scripts to batch parse literature citations. The workflow was functioning well, and I even did multiple test runs on different kinds of references to find bugs. I've since returned to these scripts and started processing new sets of data, and I've found that the resulting datasets are riddled with problems. Some references are skipped entirely by the workflow, with no flagging system, and other features I excitedly added to the scripts are now broken. This lesson has come to bear on my work a number of times, including with software and workflows written by other, highly competent folks. You just can't predict what the problems with your workflow will be until you have real users. What is the solution here? I hope I will find some clarity soon.
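
One small, partial mitigation for the silent-skip problem is to make the batch step account for every input line, so that anything it cannot parse is flagged rather than dropped. The following is a minimal Python sketch under stated assumptions, not the scripts described above: the "Author. Year. Title." pattern and the names `CITATION_RE`, `BatchResult` and `parse_batch` are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical pattern: assumes citations shaped like "Author. Year. Title."
CITATION_RE = re.compile(
    r"^(?P<author>[^.]+)\.\s+(?P<year>\d{4})\.\s+(?P<title>.+?)\.?$"
)


@dataclass
class BatchResult:
    parsed: list = field(default_factory=list)   # dicts with author/year/title
    flagged: list = field(default_factory=list)  # (line_number, raw_text, reason)


def parse_batch(lines):
    """Parse each line; every input ends up in either parsed or flagged."""
    result = BatchResult()
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            result.flagged.append((number, raw, "empty line"))
            continue
        match = CITATION_RE.match(text)
        if match is None:
            # Record the failure instead of silently skipping the reference.
            result.flagged.append((number, raw, "no pattern matched"))
            continue
        result.parsed.append(match.groupdict())
    return result
```

Because every line lands in exactly one of the two lists, `len(result.parsed) + len(result.flagged)` should always equal the number of input lines, which gives a cheap check to run on each new dataset. It does not replace real users, but it makes the skipped references visible to them.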

Data visualization: biodiversity science vs. business

Since my transition from full-time business analytics back into biodiversity informatics, I've accumulated a few thoughts on the subtle but significant differences in the challenges posed by data visualization. Big data continues to pile up in every domain. While scientists have long collected and presented large sets of data as part of their research pipeline, new players are entering the data deluge realm. From small online companies to monolithic technology companies (e.g., Google, Apple, Facebook, Amazon), many businesses are now clients of easy-to-use visualization tools that take data directly from its data store to beautiful live dashboards or linked stories (e.g., Tableau, Looker, Google Analytics). These are the sorts of tools I used as an analyst in the private sector. More available visualization tools can only mean good things for scientists, right? The differences between scientific datasets, data visualization goals and audiences and the visualization goals an

Web design for science dummies

I'm currently working on a project to bring an old publication to new life on the web. We want it to be a dynamic, structured and integrated version of the paper publication. To have maximum value for researchers, it should be modern and user-friendly.
Web Design
So, I've been diving into modern web design. The beautiful, easy-to-read and clean-looking sites you see these days can be distilled into a surprisingly small number of principles. Fonts, centering, spacing, and images all have a huge impact. Jeremy Thomas took me on a 4-minute journey into the world of design. Thanks Jeremy! The look and feel of a modern webpage can be distilled into the following set of [paraphrased] rules:
Focus on content
Use clear readable text
Utilize text colour for emphasis
Add bold images
Of course, the world of web design seems to be as fickle as fashion. Trends come and go, things become outdated and need refreshing. I believe the above principles, however, sho