To get hired as a Data Scientist, Data Analyst, or in any other data role, you need to stand out. That means being great at explaining technical concepts, showing that you’re thoughtful and introspective, and building interesting projects.
At SharpestMinds, we try to help our mentees with all three. We took some of the best community events we've hosted for our mentors and mentees and turned them into a course that you can take to build job-ready skills that schools and bootcamps don’t teach!
"Oh wow, how cool!"
-You, after finishing the course, hopefully
Decision trees: the most important model
Tree-based models: half the model zoo!
Support vector machines: not nearly as useful as most people think
Model evaluation metrics: MAE, MSE, precision, recall, and ENTROPY!
Dimensionality reduction: PCA and t-SNE using handwaving!
Feature selection: what most aspiring data scientists don't know
K-means and hierarchical clustering with no math
Word embeddings and word vectors: doing math with language!
How to design a project that will actually get companies interested
How I solve new data science problems from scratch
How to handle behavioral interviews
Product (or business) interviews: my go-to method
Deploying your ML model with Flask and Heroku
Basic tips on writing production-ready code (for Flask machine learning apps!)
Part 1 - Decision trees: the most important model
It's easy to get lost in the zoo of machine learning models and end up knowing a little bit about many different algorithms. But that's a bad strategy: interviewers will expect you to understand at least *some* models very deeply.
It's normal to know only a little about most algorithms, but you should choose one or two to understand deeply. Decision trees are a great option: they're easy to interpret and simple to use, and tree-based models make up about 40% of the model zoo!
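At its core, a decision tree repeatedly picks the split that makes its child nodes as "pure" as possible. Here's a minimal from-scratch sketch of that one step, assuming a single numeric feature: it scores every candidate threshold by weighted Gini impurity (entropy is a common alternative) and keeps the best one. The function names `gini` and `best_split` are illustrative, not taken from the course.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(xs, ys):
    """Return (threshold, weighted_impurity) for the best split of the form x <= threshold."""
    best = (None, float("inf"))
    n = len(xs)
    # Candidate thresholds: midpoints between consecutive distinct sorted values
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        # Weight each child's impurity by its share of the samples
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (threshold, score)
    return best


xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [0, 0, 0, 1, 1, 1]
print(best_split(xs, ys))  # (6.5, 0.0): a perfectly pure split
```

A full tree just applies this step recursively to each child until the nodes are pure or a depth limit is reached. In practice you'd use scikit-learn's `DecisionTreeClassifier` rather than writing this yourself, but being able to explain this loop is exactly the kind of depth interviewers look for.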
Part 2 - Tree-based models: half the model zoo!
A disproportionate fraction of the most popular machine learning models for data science are tree-based. In this session, we look at how random forests and gradient boosting can make decision trees much more effective.
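To make "much more effective" concrete, here's a toy sketch of gradient boosting for regression: start from the mean, then repeatedly fit a depth-1 tree (a "stump") to the current residuals and add a shrunken copy of it to the ensemble. This is an illustrative version under simplifying assumptions (one numeric feature with at least two distinct values, squared-error loss, fixed learning rate), not how production libraries implement it. Random forests take the other route (many deep trees trained on bootstrap samples, then averaged) and aren't shown here.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree (one split) that minimizes squared error."""
    best = None
    values = sorted(set(xs))
    # Try a threshold midway between each pair of consecutive distinct x values
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Boosted stumps for squared error: each new stump fits what the ensemble still gets wrong."""
    base = sum(ys) / len(ys)  # start from a constant prediction: the mean
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(n_rounds):
        # For squared error, the negative gradient is (proportional to) the residual
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, 1, 5, 5, 5]
model = gradient_boost(xs, ys)
print(round(model(2), 2), round(model(5), 2))  # 1.01 4.99
```

A single stump can only output two values, and the small learning rate means each one corrects just part of the remaining error; it's the accumulation of many weak, shrunken trees that makes the ensemble accurate.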
Part 3 - Support vector machines: not nearly as useful as most people think
Support vector machines (SVMs) are a well-known class of machine learning models that interviewers love to ask about. But the truth is, they're rarely the best algorithm for the job. You can almost always find a model that performs better, and is more interpretable, than an SVM.
Still, because this video series is about preparing you for interviews, this is our obligatory dive into SVMs!
Part 4 - Model evaluation metrics: MAE, MSE, precision, recall, and ENTROPY!
One of the easiest ways to tell a beginner data scientist apart from a pro is to ask them about model evaluation metrics. Learning how to identify the right metric for your data science problem is a critical step towards becoming proficient enough to pass interviews.
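Here are the metrics from the title written out by hand, so the definitions are explicit. We assume "entropy" here means cross-entropy (log loss), the usual entropy-based metric for probabilistic classifiers. The function names are illustrative; scikit-learn provides the same quantities (e.g. `mean_absolute_error`, `precision_score`, `log_loss`), but writing them out once is a good interview exercise.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: the average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error: squaring makes large misses dominate."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def log_loss(y_true, p_pred, eps=1e-15):
    """Binary cross-entropy; p_pred holds predicted probabilities of class 1."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


print(mae([3, 5, 2], [2, 5, 4]))  # 1.0
print(mse([3, 5, 2], [2, 5, 4]))  # 1.6666666666666667
print(precision_recall([1, 0, 1, 1], [1, 1, 0, 1]))  # (0.6666666666666666, 0.6666666666666666)
```

Note how, in the MSE example, the single miss of 2 contributes 4 of the 5 squared-error units: that's why MSE is the better choice when big errors are especially costly, and MAE when they aren't.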
Part 5 - Dimensionality reduction: PCA and t-SNE using handwaving!
Dimensionality reduction is one of those things you just have to understand if you're aiming for data science or analytics roles. In this video, we'll explore how the two fundamental dimensionality reduction techniques, PCA and t-SNE, actually work, using nothing but hand-waving!
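If you'd like something a little more concrete than hand-waving for PCA, here's a minimal sketch of its core idea, assuming 2-D input: center the points, build the covariance matrix, and find the direction of greatest variance (the first principal component) with power iteration. Projecting onto that direction squeezes each point down to one number while keeping as much of the spread as possible. t-SNE is an iterative, neighbor-preserving method that doesn't reduce to a few lines, so it isn't sketched here.

```python
import math


def first_principal_component(points, n_iter=100):
    """Return the unit direction of greatest variance for 2-D points, plus the centered points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the 2x2 sample covariance matrix [[cxx, cxy], [cxy, cyy]]
    cxx = sum(x * x for x, _ in centered) / (n - 1)
    cyy = sum(y * y for _, y in centered) / (n - 1)
    cxy = sum(x * y for x, y in centered) / (n - 1)
    # Power iteration: repeatedly multiplying by the covariance matrix and renormalizing
    # converges to its top eigenvector (assumes the arbitrary start isn't orthogonal to it)
    vx, vy = 1.0, 0.5
    for _ in range(n_iter):
        vx, vy = cxx * vx + cxy * vy, cxy * vx + cyy * vy
        norm = math.hypot(vx, vy)
        vx, vy = vx / norm, vy / norm
    return (vx, vy), centered


def project(centered, direction):
    """Reduce each 2-D point to one number: its coordinate along `direction`."""
    return [x * direction[0] + y * direction[1] for x, y in centered]


points = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]  # toy data, roughly y = 2x
direction, centered = first_principal_component(points)
print(direction)  # close to (1, 2) / sqrt(5), i.e. about (0.45, 0.89)
print(project(centered, direction))
```

In real projects you'd use scikit-learn's `PCA` (in `sklearn.decomposition`) and `TSNE` (in `sklearn.manifold`), which handle any number of dimensions.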
Part 6 - Feature selection: what most aspiring data scientists don't know
Practically every dataset has too many features. But how do you choose which features to keep, and which to remove? That's what this video is all about.
Feature selection is a lot like model evaluation metrics: absolutely crucial if you want to make a good impression on take-home tests or interviews, and badly undervalued by job seekers. This is a must-have skill!
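As a taste of the simplest family of techniques, here's a sketch of a univariate filter: score each feature by the absolute Pearson correlation with the target and keep the top `k`. The function names and the toy housing-style data are hypothetical, chosen only for illustration; this is one basic approach, not necessarily the one the session recommends.

```python
import math


def pearson(a, b):
    """Pearson correlation; returns 0.0 for a constant column, which carries no information."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 0.0
    return cov / (sa * sb)


def select_top_k(features, target, k):
    """features maps name -> column values; return the k names most correlated with target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data
features = {
    "sqft": [800, 1200, 1500, 2000, 2400],
    "constant": [1, 1, 1, 1, 1],
    "noise": [3, 1, 4, 1, 5],
}
price = [160, 240, 290, 410, 470]
print(select_top_k(features, price, k=1))  # ['sqft']
```

Filters like this are fast, but they look at one feature at a time, so they miss interactions and happily keep redundant features. Wrapper methods (such as recursive feature elimination) and embedded methods (such as L1 regularization) address those gaps.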
Part 7 - K-means and hierarchical clustering with no math
Just because you don't have training labels or target variables doesn't mean you can't do machine learning! Unsupervised learning is an essential part of a data scientist's toolbox.
Here's a math-free walkthrough of the two most important clustering algorithms. We'll cover not only how they work, but also when you might want to use them and which use cases suit each one best.
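For readers who want to see the k-means loop itself, here's a minimal sketch of Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat. We assume the starting centroids are passed in, to keep the example deterministic; real implementations usually pick them randomly or with k-means++.

```python
import math


def kmeans(points, centroids, n_iter=10):
    """Lloyd's algorithm: alternate between assigning points and recomputing centroids."""
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assignment step: nearest centroid by Euclidean distance
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its points
        # (an empty cluster keeps its old centroid)
        centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]  # two obvious blobs
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(centroids)  # roughly [(1.17, 1.17), (8.5, 8.5)]
```

Hierarchical (agglomerative) clustering works differently: it starts with every point in its own cluster and repeatedly merges the two closest clusters, producing a tree (a dendrogram) that you can cut at any level. That's why you don't have to choose the number of clusters up front, as you do with k-means.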
Part 8 - Word embeddings and word vectors: doing math with language!
Machine learning algorithms all work the same way: they take in a list of numbers (an array, or a row in a dataset) and they return an output. But what if your data isn't just a list of numbers? What if it's a bunch of text? Can you still use those fancy algorithms - those random forests, that PCA - on it?
Yes - but you'll have to figure out how to turn it into a list of numbers first. In this session, we'll look at how you can do just that, by using an algorithm called word2vec. Math with words!
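word2vec is the part that *learns* word vectors from text. The sketch below skips training entirely and uses tiny, hand-made 3-dimensional vectors (hypothetical values chosen for illustration; real learned vectors typically have hundreds of dimensions) just to show what "math with words" means: king - man + woman lands closest to queen under cosine similarity.

```python
import math

# Hypothetical toy embeddings, NOT output from word2vec
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.5, 0.5],
}


def cosine(a, b):
    """Cosine similarity: compares direction and ignores vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' by computing b - a + c and finding its nearest word."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the query words themselves, as is standard for analogy tasks
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("man", "king", "woman"))  # queen
```

Once every word is a vector like this, the fancy algorithms mentioned above can take it as input, just like any other row of numbers.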
Part 9 - How to design a project that will actually get companies interested
Covering your data science foundations is critical. But if you want to get hired, it's also not enough: somehow, you'll need to get companies interested in you.
And that means building a project that proves to them that you can solve the kinds of problems they have.
Part 10 - How I solve new data science problems from scratch
WARNING: There *is no* one single "right" data science process that you can follow to solve a new problem from scratch.
With that said, it's also important to know how the different pieces of the data science lifecycle fit together. So in this session, Jeremie explains his thought process as he tackles new data science problems from the ground up, including how much time he spends on each step.
Part 11 - How to handle behavioral interviews
Whether you're aware of it or not, every interview is a behavioral interview (at least to some extent).
Getting good at behavioral interviews is a big part of getting good at interviews overall. It can be (and often is) make-or-break. But most entry-level applicants don't understand how to approach behavioral interviews, or they get defensive when talking about their weaknesses or personal challenges.
Let's talk about how not to do that!
Part 12 - Product (or business) interviews: my go-to method
For data scientists without on-the-job experience, business- and product-related questions can be difficult to navigate.
But they don't have to be: in this session, we'll talk about how you can get a much better idea of how to tackle these questions by thinking of companies in terms of their pipelines.
Part 13 - Deploying your ML model with Flask and Heroku
So you have a cool machine learning project and you want to share it with the world? This tutorial will walk through all the steps to get a basic web app hosting your machine learning model using Flask and Heroku!
Part 14 - Basic tips on writing production-ready code (for Flask machine learning apps!)
In this session, we provide some simple steps you can take to start refactoring that hobby project into production-ready code, following best practices!