The complete guide to hiring a data scientist

Data scientist is one of the most sought-after jobs of the 21st century. But how do you hire a data scientist who fits the bill?

According to Firstround.com, strong candidates in a competitive field like Data Science often receive 3 or more offers, so hiring success rates are typically below 50%. The key is to move prospective candidates through the recruiting process quickly, which helps recruiters close data scientist positions faster. This is possible only if the right objective is set before hiring starts.

Organizational use cases

It is imperative for your organization to set the right expectations for its Data Science platform and to align your hiring needs with them. You could have a large amount of data and no idea what to do with it. In most cases, organizations aim to achieve the following with Data Science:

Provide Business Intelligence

Business intelligence is about data management: arranging data and producing information from it through dashboards. These business insights play an important role in any organization’s decision-making process.

Solve optimization problems

Simply put, this means reshaping processes by analyzing data. For example, a logistics company can optimize its supply chain so that delivery drivers use less fuel and reach customers faster.
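
As a toy illustration of the logistics example, the sketch below brute-forces the shortest round trip from a depot through a few delivery stops. It assumes fuel use is roughly proportional to straight-line distance, and the function names and coordinates are hypothetical. Exhaustive search only works for a handful of stops, because the number of orderings grows factorially; real routing problems use road networks and dedicated solvers.

```python
from itertools import permutations
from math import dist


def route_length(route):
    """Total straight-line distance of visiting points in order, then returning to the start."""
    # Pair each point with the next one, wrapping the last point back to the first.
    return sum(dist(a, b) for a, b in zip(route, route[1:] + route[:1]))


def best_route(depot, stops):
    """Try every ordering of the stops and keep the shortest round trip from the depot.

    Assumption: fuel is proportional to distance, so the shortest trip is the cheapest.
    """
    best_order = min(permutations(stops), key=lambda order: route_length([depot, *order]))
    route = [depot, *best_order]
    return route, route_length(route)


if __name__ == "__main__":
    # Hypothetical depot and stops, listed in an inefficient order.
    stops = [(2, 2), (0, 2), (2, 0)]
    print("as listed:", route_length([(0, 0), *stops]))
    print("optimized:", best_route((0, 0), stops))
```

Visiting the stops in the listed order covers about 9.66 units, while the best ordering traces the 8-unit perimeter of the square.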

Provide recommendations

This means using data to build predictive models that help companies better understand their target customers. E-commerce companies use such models to recommend products based on consumer buying behavior and to monitor stock levels in warehouses.
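
A very simple form of "customers who bought this also bought" can be built from co-purchase counts. The sketch below is a hypothetical illustration with made-up baskets, not how any particular e-commerce company does it; production recommenders typically rely on collaborative filtering or learned models.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_purchase_counts(baskets):
    """For every product, count how often each other product shared a basket with it."""
    counts = defaultdict(Counter)
    for basket in baskets:
        # set() ignores repeated items in one basket; sorted() makes pair order deterministic.
        for a, b in combinations(sorted(set(basket)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, product, k=2):
    """Return up to k products most often bought together with `product`."""
    # .get() avoids inserting an empty entry for products never seen before.
    return [item for item, _ in counts.get(product, Counter()).most_common(k)]


# Hypothetical purchase history, one list of products per order.
BASKETS = [
    ["laptop", "mouse"],
    ["laptop", "mouse", "bag"],
    ["laptop", "mouse"],
    ["mouse", "pad"],
]

if __name__ == "__main__":
    counts = co_purchase_counts(BASKETS)
    print(recommend(counts, "laptop"))
```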

Project Life Cycle

The most crucial steps in any data science project are the “problem specification” phase, where you figure out what needs to be solved, and the “experimentation and validation” phase, where you check whether an approach really works. Evaluating a candidate’s skills for these important phases can be tedious without the right platform. In fact, in a traditional hiring process, most hiring managers feel fortunate if their evaluation accuracy is as high as 50%. The ongoing effort that traditional hiring requires can easily consume 20% or more of a Data Science team’s time. This is where a technical recruitment platform like HackerEarth comes to the rescue.

Job Parameters

Now that the goal of Data Science within your organization has been set, hiring managers need to look for certain skills that are important for data scientists to have.

Statistics and linear algebra

These skills underpin data-driven decision-making. Prospective candidates should be good at collecting, analyzing, and drawing inferences from data.

Machine Learning

This covers building models that learn from data in order to classify, group, or predict. An ideal data scientist should be able to use big data technologies to create pipelines that feed Machine Learning algorithms.

Data mining

This refers to handling and cleaning raw data and finding patterns in it. A data scientist should be able to visualize and mine raw data to derive meaningful insights from it.

Technical skills

Every data scientist should be well versed in the following:

– Programming and query languages such as R, Python, Scala, JavaScript, SQL, C, and C++

– Libraries such as pandas, NumPy, scikit-learn, OpenCV, and Matplotlib

– Data structures and algorithms, plus tools such as Excel, Tableau, Hadoop, Spark, and SAS

Types

Let us now look at whom to hire. Data scientists broadly fall into two types: researchers and engineers. For any organization, it is good to have a mix of both.

Things to look out for when hiring a researcher

Data researchers have a strong background in math or statistics. They should be able to develop custom algorithms that make the most of the data, and be inquisitive enough to dig for answers in it. They should be well-versed in technical skills such as R, Python, and SQL. To pull data, candidates should understand relational databases. Querying data with SQL is a must, and experience storing data in NoSQL databases is a plus.
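
A quick way to check the SQL skill described above is a short query exercise. The sketch below uses Python’s built-in sqlite3 module and a hypothetical orders table; it asks for repeat customers ranked by total spend, which exercises GROUP BY, HAVING, and ORDER BY. The table, rows, and function names are illustrative only.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (customer, amount) VALUES (?, ?)",
        [("ana", 30.0), ("ana", 20.0), ("ben", 100.0), ("cy", 10.0), ("cy", 15.0), ("cy", 5.0)],
    )
    return conn


def repeat_customers_by_spend(conn, min_orders):
    """Return (customer, total_spend) rows for customers with at least min_orders orders."""
    query = """
        SELECT customer, SUM(amount) AS total_spend
        FROM orders
        GROUP BY customer
        HAVING COUNT(*) >= ?      -- filters groups, not individual rows
        ORDER BY total_spend DESC
    """
    return conn.execute(query, (min_orders,)).fetchall()


if __name__ == "__main__":
    print(repeat_customers_by_spend(build_sample_db(), 2))
```

A candidate who can explain why ben drops out (a single order, despite the highest spend) understands the difference between filtering rows with WHERE and filtering groups with HAVING.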

Things to look out for when hiring a data engineer

Data engineers typically have a stronger coding background. They should be capable of structuring things well and prototyping quickly. They should be well-versed in data tools and languages such as Python, Scala, Java, and MATLAB. To make the extracted data usable, engineers should be able to create a visualization or build a Machine Learning model.

Skills to assess

Finding the right candidate for a data scientist role can be challenging. This article will help you understand what Data Science is and which skill sets to look for when hiring a data scientist.

Data science is an interdisciplinary field that uses a blend of data inference and algorithm development to solve complex analytical problems. An ideal candidate has skills in the following 3 fields:

  • Mathematics and statistics
  • Machine Learning and programming
  • Business/domain knowledge

Mathematics and statistics

A candidate applying for the role of a data scientist should have a good understanding of certain mathematical concepts. This includes topics like statistics (both descriptive and inferential), linear algebra, probability, and differential calculus.
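
To probe the difference between descriptive and inferential statistics, an interviewer might ask a candidate to summarize a sample and then estimate the population mean. The sketch below is one hypothetical answer using only Python’s statistics module. It uses a normal-approximation confidence interval to stay within the standard library; for small samples like this one, a t-based interval would be more appropriate.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def describe(sample):
    """Descriptive statistics: summarize the sample itself."""
    return {"n": len(sample), "mean": mean(sample), "stdev": stdev(sample)}


def mean_confidence_interval(sample, confidence=0.95):
    """Inferential statistics: approximate confidence interval for the population mean.

    Uses the normal approximation (z-score), which is reasonable for large samples.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * stdev(sample) / sqrt(len(sample))
    m = mean(sample)
    return m - half_width, m + half_width


if __name__ == "__main__":
    # Hypothetical sample, e.g. minutes per user session.
    sessions = [4, 6, 5, 7, 3, 5, 6, 4]
    print(describe(sessions))
    print(mean_confidence_interval(sessions))
```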

Machine Learning and programming

Any candidate applying for the role of a data scientist should have strong programming skills. The candidate must have a good understanding of basic programming concepts, data structures such as trees and graphs, and the most commonly used algorithms. The candidate should be able to code in at least one of Python or R, the most widely used languages in Data Science.

Business/domain knowledge

Candidates should have a basic understanding of the business or industry in which they are applying to work as data scientists. They should be able to understand the problem from the perspective of the company’s business, translate that problem into a Data Science problem, and solve it using the skill sets described above. Finally, they should be able to present insights from the solution effectively. However, keep in mind that the depth of a candidate’s business or domain knowledge will depend on their experience.

Salaries

According to Glassdoor, the national average salary for a Data Scientist in the United States is $117,345.

Data Science salaries depend on the following factors –

  1. Experience – People with more experience in data science, engineering, or analytics are paid more than those with less experience. Data Scientists in managerial roles also tend to be paid more.
  2. Academic achievement – Data Scientists with PhDs make more on average than those with bachelor’s degrees.
  3. Company size – A Data Scientist’s salary also depends on the size of the hiring organization. Although many startups hire Data Scientists at competitive salaries, a lot of smaller startups pay less than the industry average.

Sourcing Talent

Tech communities are full of potential hires waiting to be discovered. Here are three communities where you can source talent for free.

Hiring Data Scientists from GitHub

GitHub is one of the world’s largest code hosts, with close to 31 million developers. It’s like a tech recruiter’s dream. A developer’s GitHub profile gives you a wealth of information.

Before you start shortlisting profiles on GitHub, make sure that the Data Scientist is open to recruiters approaching them with jobs. Once this is sorted, follow these steps to find the best talent on GitHub:

  • The first step is to create a profile on GitHub.
  • Once the profile is created, run a search using three parameters: language, location, and followers.
  • By default, GitHub shows matching repositories. Switch the results to users from the menu on the left-hand side. You now have a list of developers you can reach out to.
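
If you prefer to script this search, GitHub’s user search accepts the same three filters as search qualifiers. The sketch below assumes the REST endpoint https://api.github.com/search/users and the language:, location:, followers: and type:user qualifiers; the function name and example values are hypothetical. It only builds the URL; sending the request and authenticating (for higher rate limits) are left out.

```python
from urllib.parse import urlencode


def github_user_search_url(language, location, min_followers):
    """Build a GitHub user-search API URL from language, location, and follower count."""
    qualifiers = [
        f"language:{language}",
        f'location:"{location}"',          # quotes keep multi-word places as one term
        f"followers:>={min_followers}",
        "type:user",                       # exclude organizations from the results
    ]
    # urlencode percent-encodes the qualifiers and turns spaces into '+'.
    return "https://api.github.com/search/users?" + urlencode({"q": " ".join(qualifiers)})


if __name__ == "__main__":
    print(github_user_search_url("python", "San Francisco", 50))
```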

Here are a few things to remember before you connect with potential Data Scientists.

  • Check their repositories to familiarize yourself with their work. This also helps you filter out candidates who you think will not fit the role on offer.
  • Cross-reference their profiles on LinkedIn or Twitter to be doubly sure whether they would be a good fit.
  • Don’t judge profiles by how active or complete they are. Some developers do not share code publicly, for example for security reasons. Also, a small social following says nothing about how good their tech skills are.

For more info, download our in-depth e-book on hiring GitHub developers.

Hiring Data Scientists from Stack Overflow

Stack Overflow is a Q&A site for professional and enthusiast programmers. Just like GitHub, Stack Overflow is a great platform for hiring Data Science talent.

The process of shortlisting Data Science profiles is similar to GitHub’s. However, here are a few things to remember before connecting with your first Data Scientist via Stack Overflow:

  • Because Stack Overflow is a Q&A site where developers post and answer technical questions, look at how candidates answer specific questions to see whether they fit your requirements.
  • Users are distinguished by their badges and reputation scores. An ideal candidate ranks high on both.
  • Every question has tags associated with it. You can use these tags to find users who fit the bill.
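
The tag, reputation, and badge criteria above can be sketched as a simple filter. The Profile fields, thresholds, and sample profiles below are hypothetical; in practice you would read these values from candidates’ Stack Overflow profiles. Counting only gold badges is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    # Hypothetical fields summarizing a Stack Overflow profile.
    name: str
    reputation: int
    gold_badges: int
    top_tags: set


def shortlist(profiles, required_tag, min_reputation, min_gold_badges):
    """Keep profiles active in the tag that rank high on both reputation and badges."""
    matches = [
        p for p in profiles
        if required_tag in p.top_tags
        and p.reputation >= min_reputation
        and p.gold_badges >= min_gold_badges
    ]
    # Best candidates first: reputation, then badges as a tie-breaker.
    return sorted(matches, key=lambda p: (p.reputation, p.gold_badges), reverse=True)


# Hypothetical profiles for illustration only.
SAMPLE_PROFILES = [
    Profile("alice", 12000, 3, {"pandas", "python"}),
    Profile("bob", 25000, 0, {"r"}),
    Profile("carol", 8000, 1, {"pandas", "scikit-learn"}),
    Profile("dave", 500, 0, {"pandas"}),
]

if __name__ == "__main__":
    print([p.name for p in shortlist(SAMPLE_PROFILES, "pandas", 5000, 1)])
```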

Other places to find great developer talent include HackerEarth, Reddit, and Kaggle.

Hiring Data Scientists from Machine Learning challenges and hackathons

Hackathons and coding challenges are great ways for candidates to show their skills in action. When you are hiring top Data Science talent, testing candidates on real-time problem-solving skills can boost your recruitment efforts.

Job Description

Data Scientist Job Description

Company Introduction

HackerEarth provides enterprise software solutions that help organisations with their innovation management and technical recruitment needs. HackerEarth has conducted 1000+ hackathons and 10,000+ programming challenges to date. Since its inception, HackerEarth has built a developer base of over 2 million.

Job description

As a Data Science Engineer, you will contribute significantly to identifying best-fit architectural solutions for one or more projects, apply data science techniques to analyze large amounts of data, present data insights through high-impact visualizations, and provide regular support and guidance to project teams on complex coding, issue resolution, and execution. You will collaborate with some of the best talent in the industry to create and implement innovative, high-quality solutions. You will be part of a learning culture, where teamwork and collaboration are encouraged, excellence is rewarded, and diversity is respected and valued.

Qualifications

  • Bachelor’s degree or foreign equivalent required; Master’s in Statistics, Mathematics, Computer Science, or another quantitative field preferred.
  • At least 4 years of experience and an excellent understanding of machine learning techniques and algorithms for classification, clustering, and prediction, such as Neural Networks, Naive Bayes, SVM, and Decision Forests, as well as NLP and text analytics technologies.
  • Experience with common data science toolkits such as Python data science libraries, R, and MATLAB. Excellence in Python is highly desirable, and the ability to enhance standard algorithms is expected.
  • Experience developing algorithms, testing them on real data sets, and fine-tuning them to ensure business objectives are met.
  • Experience implementing ML algorithms in production, integrating them with the necessary data sources to address specific business problems, and extending them with custom algorithms.
  • Experience with big data technologies such as HDFS, Hive, Spark, and Scala; data visualization tools such as Tableau; and query languages such as SQL and HiveQL.
  • Good applied statistics skills, such as distributions, statistical testing, regression, etc.

Roles and Responsibilities

  • You will be a core member of a team that does whatever it takes to delight customers and takes an iterative, results-oriented approach to software development. In this position, you will provide best-fit architectural solutions for multi-product, multi-project, multi-industry portfolios, provide technology consultation, and help define the scope and sizing of work.
  • You will be responsible for delivering high-value next-generation products on aggressive deadlines and will be required to write high-quality, highly optimized/high-performance and maintainable code that your fellow developers love.
  • You will be the anchor in Proof of Concept developments, support opportunity identification and pursuit processes, and evangelize the HackerEarth brand
  • You will collaborate with some of the best talent in the industry to create and implement innovative high quality solutions, lead and participate in sales and pursuits focused on our clients’ business needs
  • You will be part of a learning culture, where teamwork and collaboration are encouraged, excellence is rewarded, and diversity is respected and valued
  • The role involves high-end technology and therefore requires you to be an expert in coding.

Recruiter Email Template

Outreach Email

Subject – Join our amazing Data Science team at <Company name>

Dear <First_Name>,

I am <Name>, and I work as a Recruiter for <Company name>. I came across your profile on <Social media or Job board> and was very impressed with your skills, especially <describe a project or a particular programming skill set>.

We are currently looking for a Data Scientist to join our amazing team and I think you would be a great fit. Here are some of the cool projects that we are working on currently – <provide a link to projects at your organization>

If this is something that interests you, please write back to me and I will be happy to explain more over a call.

Have a great day, and I hope to hear back from you soon!

Best,

<Your name>

Follow-up Email

Subject – Following up!

Hi <First_Name>,

Hope you are doing great!

Have you had a chance to read my previous mail?

We are looking for some super talented Data Scientists to join our team at <Company name> and I thought you would be a great fit.  

Our Data Science team has been working on some cool projects <link some of your work> and I thought you would find them interesting.

And if you are wondering what it is like to work for <Company name>, here is a short video of what our employees think – <Include an employer branding video>

If you are interested in this opportunity, do drop me an email so we can take this forward. Have a great day!

Best,

<Name>

Interview Questions

We asked a couple of Data Scientists on Reddit what they would like to be quizzed on. This is what they said –

According to Towards Data Science, these are the top interview questions most hiring managers ask to test candidates’ Data Science skills –

  • What is the difference between supervised and unsupervised Machine Learning?
  • What is the bias-variance trade-off?
  • What are exploding gradients?
  • What is a confusion matrix?
  • Explain how a ROC curve works.
  • What is selection bias?
  • Explain the SVM machine learning algorithm in detail.
  • What are support vectors in SVM?
  • What are the different kernel functions in SVM?
  • Explain the decision tree algorithm in detail.
  • What are entropy and information gain in a decision tree algorithm?
  • What is pruning in a decision tree?
  • What is ensemble learning?
  • What is a random forest? How does it work?
  • What cross-validation technique would you use on a time series data set?
  • What is logistic regression? Or, give an example of when you have used logistic regression recently.
  • What do you understand by the term Normal Distribution?
  • What is a Box-Cox transformation?
  • How would you decide the number of clusters in a clustering algorithm?
  • What is deep learning?
  • What are Recurrent Neural Networks (RNNs)?
  • What is the difference between Machine Learning and Deep Learning?
  • What is reinforcement learning?
  • Explain what regularisation is and why it is useful.
  • What is TF/IDF vectorization?
  • What are recommender systems?
  • What is the difference between regression and classification ML techniques?
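
Interviewers who want a reference point for a few of these questions can keep short sketches like the one below. It is a minimal, hypothetical answer key in plain Python covering the confusion matrix, entropy and information gain, and a walk-forward split for time series cross-validation (similar in spirit to scikit-learn’s TimeSeriesSplit, but not a copy of it). Function names and sample labels are illustrative only.

```python
import math
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 is positive and 0 is negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def entropy(labels):
    """Shannon entropy in bits: sum of p * log2(1/p) over the class proportions p."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes after a split."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


def walk_forward_splits(n_samples, n_splits):
    """Yield (train, test) index lists where every test fold comes after its training data.

    Trailing samples that do not fill a whole fold are left unused.
    """
    fold = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train = list(range(0, k * fold))            # everything before the test window
        test = list(range(k * fold, (k + 1) * fold))
        yield train, test


if __name__ == "__main__":
    print(confusion_matrix([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
    print(information_gain(["yes", "yes", "no", "no"], [["yes", "yes"], ["no", "no"]]))
    print(list(walk_forward_splits(6, 2)))
```

The walk-forward split answers the time series question: shuffled k-fold cross-validation would let the model train on future data, so each test window is kept strictly after its training window.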
