Data Science and Machine Learning Interview tips and questions for new grads.

Virajdatt Kohir
9 min read · Dec 31, 2022


Photo by Christina @ wocintechchat.com on Unsplash

I recently graduated with a master's degree in Data Science. For the past year I have been preparing for interviews, first for internships and then for full-time roles, and along the way I collected notes on what helped me improve in interviews and land a job. In this article, I summarize what I learned and share tips for new grads who want to break into the Data Science and Machine Learning field. Along with the tips, I also share interview questions that I faced during my search for internships and full-time positions. I hope you find this guide helpful during your preparation.

Note:
I won’t cover how to prepare a resume in this guide. There is already a lot of material out there on that topic.

These are the typical steps that follow once your resume is picked up:

  1. Recruiter screening call or Online Assessment.
  2. Behavioral round.
  3. Technical rounds (2–5 rounds)
  4. Final Negotiation round.

SECTION 1: THE TIPS

Tip 1: Work on Data Science projects in varying domains/industries:

The preparation to land a job starts with your coursework and the projects you work on during your course. Before I started my master’s, I was given this tip: work on data science projects in different domains such as finance, healthcare, and eCommerce. It really helped me tailor my resume for various industries. This is great preparation you can do during your coursework, but if you missed that window, you can still follow this tip at any time during the job application process.

I had prepared a bunch of resumes with projects specific to the job and the industry. For example:

  1. VDK_Resume_DS_health_care: For data science positions focused on health care.
  2. VDK_Resume_DS_NLP: For data science positions focused on Natural Language Processing.

And so on.

Tip 2: Prepare for Python and SQL assessments early in your job search:

Many companies send out online assessment(s) when you apply for their Data Science positions. These assessments can contain Python coding questions, SQL queries, or a take-home data science project. In the early days of my job search, I didn’t focus much on coding and SQL and prepared exclusively for Data Science interview questions. I soon realized this was a bad strategy, because coding and SQL questions also often make up the first of the technical rounds.

So my advice is to focus on coding (Python) and SQL from the start of your preparation and not to put them off.

For Python, I focused on string, array, and hashmap problems from LeetCode (around 100 easy and medium questions) and eventually moved on to data structures and algorithms. Focusing on these problem types built solid skills and got me through many online assessments and coding sessions in technical interviews. In a couple of interviews I was asked to write simple unit tests, so I recommend practicing unit testing as well.
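To give a feel for this category, here is a minimal sketch of a classic hashmap problem together with a simple unit test. The problem choice (two-sum) and the names `two_sum` and `TwoSumTest` are my own illustration, not questions from a specific interview.

```python
import unittest


def two_sum(nums, target):
    """Return indices (i, j) with nums[i] + nums[j] == target, or None."""
    seen = {}  # maps a value to the index where it was first seen
    for i, value in enumerate(nums):
        complement = target - value
        if complement in seen:
            return seen[complement], i
        seen[value] = i
    return None


class TwoSumTest(unittest.TestCase):
    def test_finds_pair(self):
        self.assertEqual(two_sum([2, 7, 11, 15], 9), (0, 1))

    def test_returns_none_when_no_pair(self):
        self.assertIsNone(two_sum([1, 2, 3], 100))

    def test_handles_duplicate_values(self):
        self.assertEqual(two_sum([3, 3], 6), (0, 1))
```

Saved as a file, the tests can be run with `python -m unittest <file>`. The dictionary lookup is what turns the quadratic brute-force approach into a single pass, which is usually the point the interviewer wants you to explain.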

For SQL, I practiced easy and medium problems from DataLemur and HackerRank. I solved all of these problems twice over the course of a month and became comfortable with SQL questions. If you are just starting out in SQL, I would recommend solving the problems on these platforms twice.

To develop coding and SQL skills, along with the thought process required for coding rounds and online assessments, I recommend being consistent. Set aside 1–2 hours a day, 3–4 times a week.
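If you want to practice SQL locally, one option is Python's built-in `sqlite3` module. The sketch below is only an illustration: the `orders` table and its rows are made up, and the query shows a typical easy-level pattern (aggregate per group, then filter the groups with HAVING).

```python
import sqlite3

# In-memory database with a small, made-up orders table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer TEXT,
        amount   REAL
    );
    INSERT INTO orders (customer, amount) VALUES
        ('alice', 120.0), ('alice', 80.0), ('bob', 40.0),
        ('carol', 300.0), ('bob', 25.0);
""")

# Customers whose total spend exceeds 100, highest spender first.
query = """
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total_spent
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 100
    ORDER BY total_spent DESC;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

WHERE filters rows before grouping, while HAVING filters the groups after aggregation. That distinction comes up often in practice problems.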

Take home Data Science Project:

Some companies send out take-home data science projects as part of their online assessment. I found the following article a great help in preparing for this kind of assessment:

Tackling the Take-Home Challenge: link

This kind of assessment may seem challenging at first but as you keep working on data science projects you will get comfortable and will eventually prefer such assessments.

Tip 3: Sell a very good project you have worked on to the recruiter who calls for a screening round.

If you are not sent an online assessment, the most likely next step after your resume is picked up is a recruiter reaching out for a screening call. This can be a video call, but usually it's a phone call. The recruiter is your friend: they are the one who will push your resume to the technical manager hiring for the role. So this is an important call in the process.

The preparation for this round:

  1. Research about the company (their projects, services, products, and values):
    This is a step you may skip because you are applying for a lot of jobs and have calls lined up with several companies. But in my opinion, researching a company is vital to selling yourself better when you get that recruiter call.
    Go to the company website and read up on their projects, services, products, and company values. During this step, also familiarize yourself with the job description.
  2. Narrative to introduce yourself:
    Once the recruiter calls, they will give you some background on the company, the team, and the projects for the role you applied to. After that, the first thing the recruiter will ask is for you to introduce yourself. When you do, be sure to mention the skills you have that appear in the job description, and align your soft skills with the company values. Communicate how your education, background, skills, and projects make you a good fit for the position.
    For example, if you are applying to a company that works in the finance domain: “I am enthusiastic about the possibilities of Data Science in the field of fintech. I have worked on xxx projects to predict xxx in the finance industry. XXX courses really helped deepen my understanding of math concepts that are important for the role of…”
  3. Sell the projects you worked on (focus on outcomes rather than technicality):
    It is always best to talk about a project that aligns with the company's domain (talk about a healthcare project if a healthcare company reaches out). When you talk about the project, focus on communicating and selling the impact, outcome, and end results rather than technical prowess. Yes, you can mention the programming language and packages you used, but remember that the recruiter isn’t looking for technical expertise at this point in the process.
    For example: “The XXX project led to an accuracy of XX%. These results can be used to make better decisions that are backed by historical data.”
    Practice your project pitch and try to keep it under 2–3 minutes. This is a skill that improves with practice.
  4. Behavioral Questions:
    The recruiter may also ask a couple of behavioral questions, so be sure to prepare for those. Some typical questions I was asked during initial screening rounds:
    - What are your strengths?
    - Why this role and this company?
    Here is an article that I used to prepare my answers for behavioral rounds: link
  5. Finally, be enthusiastic about the company and the role you are applying for, and ask the recruiter questions to show your genuine interest. Some example questions I typically like to ask are:
    - What is the most challenging project the team is working on?
    - What career development opportunities exist within the team and the company?

Once you are past the online assessment and/or recruiter call, the next steps in the Data Science recruitment process are the technical rounds (some of which may include behavioral aspects).

Tip 4: Interviewers in the coding round are there to help so don’t hesitate to ask them questions.

Coding rounds, whether programming or SQL, can be daunting given the sheer number of problems you may be asked to solve. But don’t sweat it too much: these rounds typically assess your thought process in arriving at a solution (out of the many possible ways of solving the problem). If you have doubts or are confused about the problem, my advice is to talk it through with the interviewer. The clarity they provide will be very helpful in developing your solution. In my first few coding rounds, I hesitated to ask questions and get more clarity on the problem, and this approach didn’t work out. With time I realized that asking questions and getting clarity is part of the problem itself. The interviewer wants to assess how you navigate a problem, especially when it’s new to you, so asking questions and voicing your thought process will not only help you arrive at a solution but also show the interviewer how you approach unfamiliar situations.

Tip 5: Sell your project to the technical recruiter the same way but this time cover the typical data science project life cycle.

You already prepared to sell the results of one of your projects for the recruiter round. During the technical round, when you are asked to talk about one of your projects, take an additional couple of minutes to walk through the life cycle of the project. Include the steps from data acquisition through the training and validation strategy, and explain why you chose certain metrics to evaluate the model(s).

Tip 6: Prepare for pandas, sklearn, numpy, tensors

In some interviews, the coding questions I was asked were related to pandas, sklearn, numpy, and tensors. With these questions, you are typically expected to write optimized, vectorized code.
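As an illustration of what "vectorized" means here, the sketch below computes Euclidean distances from a set of points to a query point twice: once with explicit Python loops and once with NumPy broadcasting. The function names and the sample data are made up for this example.

```python
import numpy as np


def distances_loop(points, query):
    """Euclidean distance from each point to the query, with explicit loops."""
    out = []
    for p in points:
        total = 0.0
        for a, b in zip(p, query):
            total += (a - b) ** 2
        out.append(total ** 0.5)
    return out


def distances_vectorized(points, query):
    """Same computation, expressed as array operations."""
    diff = points - query  # broadcasting: shape (n, d) minus shape (d,)
    return np.sqrt((diff ** 2).sum(axis=1))


points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
query = np.array([0.0, 0.0])

print(distances_loop(points, query))
print(distances_vectorized(points, query))
```

Both versions return the same values. The vectorized one pushes the loops into NumPy's compiled code, which is usually much faster on large arrays and is what interviewers tend to look for.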

Here is an article that can help you get started with preparation in this direction: link

Sklearn Interview Questions:

Pandas Interview Questions:

SECTION 2: Data Science and Machine Learning Interview Questions

Data Science interview questions range from statistics, probability, and machine learning algorithms to evaluation methods. In this article, I focus on Machine Learning algorithms and some of the associated questions and concepts.

Questions on Machine Learning Algorithms

One interesting thing I observed across interviews is that interviewers usually ask about the Machine Learning algorithms you have mentioned in your resume. There are tons of Machine Learning algorithms out there, and you don’t need to feel daunted by the idea of learning them all for every interview. But be thorough, without fail, with the algorithms you have used and listed on your resume.

For every algorithm, the questions will typically cover the following:

  1. Intuitive explanation of the working of the algorithm.
  2. Pros and Cons of the Algorithm.
  3. Hyper-parameters of the algorithm and how you tune them (be able to list the most important hyperparameters for the algorithm).
  4. Assumptions of the algorithm (e.g., Linear Regression assumes a linear relationship between the feature(s) and the target variable).

For most of my interviews I prepared for the following algorithms, and here are links to some interesting questions on them:

1. Linear Regression:
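
For the "intuitive explanation" type of question, it can help to be able to write the single-feature case from scratch. Below is a minimal sketch of ordinary least squares with one feature, where the slope is the covariance of x and y divided by the variance of x. The function name and the toy data are my own illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx  # assumes xs are not all identical
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept)
```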

Interview questions on Linear Regression:

2. Decision Tree:
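
A common follow-up on decision trees is how a split is chosen. The sketch below shows Gini impurity, one widely used splitting criterion, and the size-weighted impurity of a candidate split. The helper names are hypothetical, and this is only the scoring step, not a full tree implementation.

```python
from collections import Counter


def gini_impurity(labels):
    """1 minus the sum of squared class proportions. 0.0 means a pure node."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def weighted_gini(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


parent = ["a", "a", "b", "b"]
good_split = weighted_gini(["a", "a"], ["b", "b"])  # separates the classes
bad_split = weighted_gini(["a", "b"], ["a", "b"])   # leaves both children mixed
print(gini_impurity(parent), good_split, bad_split)
```

A tree that uses this criterion picks, among the candidate splits, the one with the lowest weighted impurity.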

Interview Questions:

3. Random Forest:

How does it work?:

Interview Questions

4. XGBoost:

Interview Questions:

Evaluation Metrics:

1. Classification metrics:

  • When do you use accuracy as your metric?
  • When do you use the ROC curve and AUC?
  • When do you optimize for Precision and Recall? Ans: link
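
To make these questions concrete, here is a small sketch that computes accuracy, precision, and recall by hand on a made-up imbalanced dataset (5 positives, 95 negatives). The function name, the data, and the convention of reporting 0.0 when a denominator is zero are assumptions for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, and recall for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Assumed convention: report 0.0 when the denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100                           # never predicts the positive class
catches_some = [1, 1, 1, 0, 0] + [1] * 5 + [0] * 90   # 3 hits, 5 false alarms

baseline = binary_metrics(y_true, always_negative)
model = binary_metrics(y_true, catches_some)
print(baseline)
print(model)
```

The always-negative baseline reaches 95% accuracy while finding no positives at all, and the second model has lower accuracy but non-zero precision and recall. This is the usual argument for not relying on accuracy alone when classes are imbalanced.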

2. Regression metrics:
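
As a quick refresher, the sketch below computes three commonly discussed regression metrics from scratch: mean absolute error (MAE), root mean squared error (RMSE), and the coefficient of determination (R²). The function name and the toy values are made up for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R^2 (assumes y_true is not constant)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

RMSE squares the errors before averaging, so it penalizes large errors more heavily than MAE does. That difference is a typical talking point when you are asked why you chose one over the other.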

Following are links to additional concepts and questions that I have faced in interviews:

Additional Questions and Topics to prepare for:

Concepts on handling Missing Values:

https://towardsdatascience.com/7-ways-to-handle-missing-values-in-machine-learning-1a6326adf79e

https://towardsdatascience.com/6-different-ways-to-compensate-for-missing-values-data-imputation-with-examples-6022d9ca0779
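
As a small illustration of the simplest strategy discussed in such articles, here is a sketch of mean and median imputation in plain Python. Representing missing values as `None`, the function name `impute`, and the sample ages are assumptions for this example.

```python
from statistics import mean, median


def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


ages = [22, None, 35, 29, None, 120]
mean_filled = impute(ages, "mean")
median_filled = impute(ages, "median")
print(mean_filled)
print(median_filled)
```

The extreme value 120 pulls the mean fill well above the median fill, which is why the median is often preferred for skewed columns.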

Concepts on handling Outliers:

https://medium.com/analytics-vidhya/how-to-remove-outliers-for-machine-learning-24620c4657e8

https://www.analyticsvidhya.com/blog/2021/05/why-you-shouldnt-just-delete-outliers/
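
One rule that often comes up in this context is the interquartile range (IQR) rule. The sketch below flags values outside 1.5 × IQR from the quartiles. The 1.5 multiplier is the conventional default, and the function name and sample data are made up for illustration.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


readings = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(readings)
print(outliers)
```

Flagging a value is not the same as deleting it. Whether to drop, cap, or keep a flagged value depends on whether it is a data error or a genuine rare observation.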

Principal Component Analysis:

  • Working of PCA: link
  • Interview Questions PCA: link
  • Pros and Cons of PCA: link
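
To complement the reading on how PCA works, here is a minimal NumPy sketch: center the data, take the singular value decomposition, and project onto the top components. The function name `pca` and the toy matrix are my own, and this is a teaching sketch rather than a replacement for a library implementation.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its top principal components via SVD."""
    X_centered = X - X.mean(axis=0)  # PCA assumes centered features
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]   # rows are principal directions
    explained_variance = (S ** 2) / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    projected = X_centered @ components.T
    return projected, components, ratio[:n_components]


# Two strongly correlated features: the second is roughly twice the first.
X = np.array([[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]])
projected, components, ratio = pca(X, n_components=1)
print(projected.shape, ratio)
```

Because the two features are almost perfectly correlated, a single component captures nearly all of the variance. This is the intuition behind using PCA for dimensionality reduction.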

Standardization and Normalization in Machine Learning:
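
A short sketch of the two transformations, assuming the usual definitions: standardization rescales to zero mean and unit standard deviation (z-scores), and min-max normalization rescales to the range [0, 1]. The function names and the handling of constant columns are my own choices for illustration.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score: subtract the mean, divide by the population std deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mu) / sigma for v in values]


def min_max_normalize(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


values = [2.0, 4.0, 6.0, 8.0]
z_scores = standardize(values)
scaled = min_max_normalize(values)
print(z_scores)
print(scaled)
```

In a real project, the mean, standard deviation, minimum, and maximum should be computed on the training split only and then reused on validation and test data to avoid leakage.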

Handling imbalanced data:
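
One simple technique worth knowing here is class weighting, where errors on the rare class count more during training. The sketch below computes weights as n_samples / (n_classes × class_count), which is the heuristic scikit-learn documents for `class_weight="balanced"`. The function name and the labels are made up for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


labels = [0] * 90 + [1] * 10  # 90% majority class, 10% minority class
weights = balanced_class_weights(labels)
print(weights)
```

Other options you may be asked about include oversampling the minority class, undersampling the majority class, and choosing metrics such as precision and recall instead of accuracy.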

How to choose k in k-fold cross-validation?
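
To reason about the choice of k, it helps to see what the folds actually look like. The sketch below generates train and validation indices for k-fold cross-validation without shuffling. The function name is hypothetical, and a real project would usually shuffle or stratify first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(10, 3))
for train, validation in folds:
    print(len(train), validation)
```

A larger k means each model trains on more data but you fit more models. Values such as 5 or 10 are common defaults, and the right choice depends on dataset size and compute budget.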

Thank you for reading. This article is the culmination of my recent experience preparing for interviews (Data Science and Machine Learning Engineer roles). I will also be sharing more questions on Probability and Statistics, along with some interesting questions and concepts in Machine Learning. Please follow me to get notified when I publish my next article.
