The problem with data science job postings

Jeremie Harris

Every once in a while, you notice something that you realize you probably should have noticed a long time ago. You start to see it everywhere. You wonder why more people aren’t talking about it.

For me, “every once in a while” was yesterday when I was scrolling through the #jobs channel in the SharpestMinds Slack workspace, and the “something” is a big problem in the data science industry that I really don’t think we’re taking seriously enough: the vast majority of data science job descriptions do not convey the actual requirements of the position they’re advertising.

How do I know this? For one, quite a few of the jobs posted to our internal board included notes from the users (usually SharpestMinds mentors) who posted them, saying things like, “I know the posting says they’re looking for X and Y, but they’re actually fine with Z.” As often as not, I’d also get direct messages from them saying the same thing.

In other words, when senior data scientists are called upon to recruit “for real”, their first move is often to throw away the job posting altogether.

This is not good, for several reasons. First, a misleading job description means that recruiters get a ton of irrelevant applications, and that candidates waste a ton of time applying to irrelevant positions. But there’s another problem: job descriptions are the training labels that any good aspiring data scientist will use to prioritize their personal and technical skills development. When those labels are noisy, candidates end up investing in the wrong skills.

Despite the obvious downsides of these mangled job postings, companies keep putting them out there, so a very natural question to ask is: why? Why are job postings so confusing (in that they fail to clearly specify the skills they expect from a candidate), or so outrageously over-reaching (“looking for a machine learning engineer with 10 years’ experience in deep learning…”)?

There are many reasons. For one, companies make hiring decisions based on a candidate’s (perceived) ability to solve a real problem that the company actually has. Because there are many ways to solve any given data science problem, it can be hard to narrow down the job description to a specific set of technical skills or libraries. That’s why it usually makes sense to apply to a company if you think you can solve the problems it has, even if you don’t know the specific tools it asks for.

Another possible reason is that many companies, especially those with relatively new data science teams, don’t actually know what they want. Either the early stage of their data science effort forces everyone to be a jack of all trades, or they lack the expertise they need to even know what problems they have, and who can help solve them. If you come across an oddly non-specific posting, it’s worth taking the time to figure out which bucket it belongs to: the former can be a great experience builder, whereas the latter can be a recipe for disaster.

But perhaps the most important reason is that job postings are often written by recruiters who aren’t remotely technical. This sometimes results in incoherent asks (“Must have 10+ years’ experience with deep learning…”, “…including natural language toolkits, such as OpenCV…”) or asks that no human being could possibly satisfy.

The net result of this job qualifications circus is that I regularly get questions from our mentees about whether they’re qualified for an opening, despite their having read all the information available on the internet about that position. Those questions are actually surprisingly consistent — so much so that I think it’s worth listing the answers to the most common ones here, in the form of simple rules you can follow to make sure you’re applying to the right roles (and not being scared away by fake requirements):

None of these rules are universally applicable, of course: the odd company will insist on hiring only candidates who meet all their stated requirements, and others will be particularly interested in people who know framework X, and will disregard people who can solve similar problems with different tools. But because there’s no way to know that from job descriptions alone (unless they’re explicit about it), your best move is almost always to bet on yourself and throw your hat in the ring.

If you want to connect, you can find me on Twitter at @jeremiecharris!
