What I learned at Data Science Bootcamp

Ottawa is where I attended the bootcamp.
Photo by Shanta Rohse via Wikipedia
About a week ago I was fortunate enough to attend a Data Science Bootcamp hosted by a gracious and affable local Data Scientist. It turns out that many people within her network of family and friends were desperate to learn more about Data Science. What better way to educate all the interested people than to bring them all into one room?

Here in Ottawa, Ontario, the federal government is the largest employer, and two major universities supply many graduates to the public service workforce. As a result, the crowd was mostly government workers; incidentally, many had a Biology background (one biologist friend brings another biologist friend, and so on). Our host went around the room to ask about our goals for attending her workshop, and it appeared that we were all keen learners who simply wanted to learn more!

She gave three wonderful presentations, and invited a handful of us up to give lightning talks interspersed throughout her training. The focus of the workshop was not only to teach the attendees about Data Science generally, but also to introduce them to the Data Science freelance lifestyle. Below are the main items I took away from the bootcamp. This post is mostly a way for me to cement my own learning.


Gartner's Hype Cycle


Gartner's Hype Cycle
By Jeremy Kemp via Wikipedia
This was the most impactful thing I learned at bootcamp. We are in a Data Science hype cycle right now, and this phenomenon is nothing new. Specifically, we are on the incline and may be about to reach the peak of inflated expectations. Unfortunately, the next phase of the cycle is the trough of disillusionment (think AI winter). Well, it's better to be prepared than surprised! (Also, keep a keen eye out for the next hype train.)

I find myself easily getting swept up in the hype. What helped to ground me was a timely talk by one of the attendees, who has been a software developer for more than four decades. He has seen many hype cycles and focused his talk around discussing the fundamentals of data science.

Databases and structured data management have been around since the 80s. Similarly, the fundamental skillsets required to utilize structured data have been around for a long time (statistics and algorithms since the Enlightenment; programming and the requisite libraries for as long as there has been data to process). No, neural networks are not new, he explained!

The Data Science Pipeline and Roles of a Data Scientist


After discussing what Data Science is, we jumped into the trenches to discover what a typical pipeline looks like. Yes, there is grungy work (and yes, this comprises most of the work). A useful (and approximate) figure I learned was that in fact 80% of the work is in the "trenches". 

The hardest part of building your data pipeline is in fact getting the data to flow through it. This is a tricky balance of neither over- nor under-engineering your solutions, and of understanding where automation is appropriate. As lowly humans we must strive to automate as much as possible (as my former DevOps boss would repeat to me on a regular basis).
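To make the idea of data "flowing" through a pipeline concrete, here is a minimal sketch of my own in Python. It was not shown at the bootcamp. The sample data, the column names and the function names (`load`, `clean`, `summarize`, `run_pipeline`) are all made up for illustration, and a real pipeline would likely need far more cleaning than this.

```python
import csv
import io
import statistics

# Hypothetical raw data: note the blank and junk measurements.
RAW_CSV = """sample,length_mm
a,12.5
b,
c,14.0
d,not_measured
e,13.5
"""

def load(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Keep only the rows whose length parses as a number."""
    cleaned = []
    for row in rows:
        try:
            length = float(row["length_mm"])
        except (TypeError, ValueError):
            continue  # the "trench" work: drop blanks and junk values
        cleaned.append({"sample": row["sample"], "length_mm": length})
    return cleaned

def summarize(rows):
    """Reduce the cleaned rows to a small summary."""
    lengths = [row["length_mm"] for row in rows]
    return {"n": len(lengths), "mean_length_mm": statistics.mean(lengths)}

def run_pipeline(text, steps=(load, clean, summarize)):
    """Pass the data through each step in order."""
    data = text
    for step in steps:
        data = step(data)
    return data

summary = run_pipeline(RAW_CSV)
print(summary)
```

Keeping each step as a small function is one way to avoid over-engineering: a step can be swapped out or automated later without touching the others.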

The most exciting thing I learned is that a Data Scientist has many hats to wear, and this diversity of roles is something you can leverage to build a Data Science dream team (DS is a team sport!). I am the sort of person who enjoys many things, and the diversity of roles is really motivating!


The IKEA Metaphor



Building furniture is like building Data Science pipelines.
Photo by Jeff Sheldon on Unsplash
As I mentioned, most of us had a background in Biology. Hence, we are accustomed to carefully and diligently collecting our experimental data, cleaning it by hand (my former thesis supervisor would MANUALLY verify his DNA sequence alignments, and this was the only technique he trusted), developing our analytical procedure via thorough literature searches, and spending hours considering and exhaustively discussing the results and conclusions. This is the equivalent of building beautiful artisan furniture.

However, when freelancing in Data Science, you are building IKEA furniture. The underlying structure of an IKEA chair is equivalent to that of an artisan piece, but the process used is much less meticulous and more efficient.

The Four Literacies

Data Scientists are a unique breed. They possess (1) Computer Literacy, (2) Math/Stats Literacy, (3) Data Literacy and (4) Graphical Literacy. This is likely why we see so many Data Scientists emerging from a scientific training background. The scientific ivory tower is a great place to learn the fundamental math and statistics literacy required for the Data Science role. However, the other three literacies are just as important. I believe that having the four literacies laid out for me in a practical manner will help me focus my training efforts on places where I'm lacking or need more experience.


Client Relationships



Clients are people who love data.
Photo by Alejandro Escamilla on Unsplash
Having had experience building reports for internal clients at a small business, I can relate to the problems that freelance Data Scientists encounter (albeit on a much smaller scale). Our host, along with freelance attendees, gave very actionable advice at the end of our bootcamp (thank you!) on how to manage client relationships. It was an open, freeform discussion that led to some really interesting conversations. Here are some of the highlights of the advice given:

It is important to acknowledge that clients are a part of the team. Specifically, they are the domain experts on your Data Science team. Your role on the team is to utilize this domain knowledge and your Data Science knowledge to build something great.

Ottawa by night.
Photo by Shawn M. Kent via Wikipedia 
The client may not know exactly what they want; as the Data Science expert, it is your job to understand what they need. Ask clients, "What do you want to accomplish?"

Clients will want a presentation at the end of the job, and this is something they usually don't realize until the end of a project.

Clients know the buzzwords. It is your job to know them as well. You should seek to understand (1) the language, (2) application frameworks, libraries and implementations, (3) protocols and (4) techniques, algorithms and theories. This understanding can be achieved quickly using Wikipedia.

Be explicit with contracts. Have the client sign and agree to the extent of the project, to avoid scope creep.


Data Science Tips

Throughout the bootcamp, a number of miscellaneous tips stood out for me. To end this post on an actionable note, here is a list of my favs:


  • Be willing to take risks
  • Learn to have a thick skin
  • Slot in learning hours (not billable)
  • It takes 3 days to learn a programming language, but one year to master it. Use it immediately in projects or else you will forget it.
  • One-offs are unlikely
  • Data Science is "hypothesis discovery"
  • There is more than one way to do all things
  • There are many options to each piece of a Data Scientist's "stack"
  • Many Data Scientists specialize in certain types of data and techniques




