What I learned at Data Science Bootcamp
Ottawa is where I attended the bootcamp. Photo by Shanta Rohse via Wikipedia
Here in Ottawa, Ontario, the federal government is the largest employer, with two major universities supplying many graduates for the public service workforce. This resulted in a crowd made up mostly of government workers; incidentally, many had a Biology background (one biologist friend brings another biologist friend, and so on). Our host went around the room to assess our goals for attending her workshop, and it appeared as though we were all keen learners who simply wanted to learn more!
She gave three wonderful presentations, and invited a handful of us up to give lightning talks interspersed throughout her training. The focus of the workshop was not only to teach the attendees about Data Science generally, but also to introduce them to the Data Science freelance lifestyle. Below are the main items I took away from the bootcamp. This post is mostly a way for me to cement my own learning.
Gartner's Hype Cycle
Gartner's Hype Cycle. Image by Jeremy Kemp via Wikipedia
I find myself easily getting swept up in the hype. What helped to ground me was a timely talk by one of the attendees, who has been a software developer for more than four decades. He has seen many hype cycles and focused his talk on the fundamentals of data science.
Databases and structured data management have been around since the 80s. Similarly, the fundamental skill sets required to use structured data have been around for a long time: statistics and algorithms since the Enlightenment, and programming and the requisite libraries for as long as data has existed. No, neural networks are not new, he explained!
The Data Science Pipeline and Roles of a Data Scientist
After discussing what Data Science is, we jumped into the trenches to discover what a typical pipeline looks like. Yes, there is grungy work (and yes, this comprises most of the work). A useful (and approximate) figure I learned was that in fact 80% of the work is in the "trenches".
The hardest part of building your data pipeline is in fact getting the data to flow through it. This is a tricky balance of neither over- nor under-engineering your solutions, and understanding where automation is appropriate. As lowly humans we must strive to automate as much as possible (as my former DevOps boss would repeat to me on a regular basis).
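As a rough illustration of what "getting the data to flow" can mean, here is a minimal Python sketch. It is not from the bootcamp; the step names (`load`, `clean`, `summarize`) and the sample rows are made up for this example. The idea is simply that each step is a small function, and one call runs them all in order.

```python
from functools import reduce


def load(rows):
    """Parse raw 'name,value' strings into (name, value-string) pairs."""
    return [tuple(part.strip() for part in row.split(",")) for row in rows]


def clean(records):
    """Keep only records whose value parses as a number."""
    cleaned = []
    for name, value in records:
        try:
            cleaned.append((name, float(value)))
        except ValueError:
            continue  # skip values such as "n/a"
    return cleaned


def summarize(records):
    """Return the mean value, or None if nothing survived cleaning."""
    values = [value for _, value in records]
    return sum(values) / len(values) if values else None


def run_pipeline(data, steps):
    """Pass the data through each step in order."""
    return reduce(lambda acc, step: step(acc), steps, data)


# Hypothetical sample input; the second row is deliberately dirty.
raw = ["a, 1.0", "b, n/a", "c, 3.0"]
result = run_pipeline(raw, [load, clean, summarize])
print(result)
```

Keeping the steps as plain functions is one way to avoid over-engineering: each piece can be tested alone, and the whole run is automated by a single call.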
The most exciting thing I learned is that a Data Scientist has many hats to wear, and this diversity of roles is something you can leverage to build a Data Science dream team (DS is a team sport!). I am the sort of person who enjoys many things, and the diversity of roles is really motivating!
The IKEA Metaphor
Building furniture is like building Data Science pipelines. Photo by Jeff Sheldon on Unsplash
However, when freelancing in Data Science, you are building IKEA furniture. The underlying structure of an IKEA chair is equivalent to that of an artisan piece, but the process used is much less meticulous and more efficient.
The Four Literacies
Data Scientists are a unique breed. They possess (1) Computer Literacy, (2) Math/Stats Literacy, (3) Data Literacy and (4) Graphical Literacy. This is likely why we see so many Data Scientists emerging from a science training background. The scientific ivory tower is a great place to learn the fundamental math and statistics literacy required for the Data Science role. However, the other three literacies are just as important. I believe that having the four literacies laid out for me in a practical manner will help me focus my training efforts on places where I'm lacking or need more experience.
Client Relationships
Clients are people who love data. Photo by Alejandro Escamilla on Unsplash
It is important to acknowledge that clients are a part of the team. Specifically, they are the domain experts on your Data Science team. Your role on the team is to utilize this domain knowledge and your Data Science knowledge to build something great.
Ottawa by night. Photo by Shawn M. Kent via Wikipedia
Clients will want a presentation at the end of the job, and this is something they usually don't realize until the end of a project.
Clients know the buzzwords. It is your job to know them as well. You should seek to understand (1) the language, (2) application frameworks, libraries and implementations, (3) protocols and (4) techniques, algorithms and theories. This understanding can be achieved quickly using Wikipedia.
Be explicit with contracts. Have the client agree to and sign off on the extent of the project, to avoid scope creep.
Data Science Tips
Throughout the bootcamp, a number of miscellaneous tips stood out for me. To end this post on an actionable note, here is a list of my favs:
- Be willing to take risks
- Learn to have a thick skin
- Slot in learning hours (not billable)
- It takes 3 days to learn a programming language, but one year to master it. Use it immediately in projects or else you will forget it.
- One-offs are unlikely
- Data Science is "hypothesis discovery"
- There is more than one way to do all things
- There are many options to each piece of a Data Scientist's "stack"
- Many Data Scientists specialize in certain types of data and techniques