Pandas Essentials: Dealing with categorical variables using the value_counts method

Virajdatt Kohir
Apr 3, 2022


Photo by Nicholas Cappello on Unsplash

Pandas is an amazing data manipulation and data wrangling library that makes an aspiring data scientist feel superhuman. I still remember my early days with Python and pandas: it was the appeal of pandas, along with sklearn, that led me to take up Python over R (which I had been working with before).

Even after 5 years of using pandas, I still come across methods and tricks that surprise me. The tips and tricks I have learned over the years have greatly increased my efficiency and have added new dimensions to my data analysis and model-building skills.

In this short article, I want to share five tricks with the pandas value_counts method that I have learned over the past few months and have been using extensively when dealing with categorical variables.

So what is value_counts() in pandas? In short, it returns the count of each unique value in a pandas column (more specifically, a pandas Series), sorted in decreasing order of frequency, so the most frequently occurring values come first. If you want to learn more about the value_counts method, you can read about it in the pandas documentation. Here are the tricks:

1. Get the rows/data points for which the frequency of a certain category is greater than the given threshold:

Imagine a categorical column with 100 or more categories/types, i.e. a lot of categories. To find the frequency of each category in this column, you use value_counts.

You can then use those frequencies to select only the rows whose category occurs more often than the threshold you want.
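Here is a minimal sketch of this idea, assuming a DataFrame named df with a categorical column 'color'; the toy data and the threshold value are illustrative only. It computes the frequencies with value_counts, keeps the categories above the threshold, and filters the rows with isin.

```python
import pandas as pd

# Toy data standing in for a high-cardinality categorical column (illustrative)
df = pd.DataFrame({'color': ['red', 'blue', 'red', 'green', 'red', 'blue', 'black']})

threshold = 1  # hypothetical threshold: keep categories seen more than once

# Frequency of each category, most frequent first
counts = df['color'].value_counts()

# Categories whose frequency is greater than the threshold
frequent = counts[counts > threshold].index

# Rows that belong to one of those frequent categories
filtered = df[df['color'].isin(frequent)]
print(filtered)
```

An equivalent one-liner is df[df['color'].map(counts) > threshold], which maps each row's category to its frequency before comparing it with the threshold.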

2. Capture just the categories of a categorical column in decreasing order of frequency with value_counts

If you want to retrieve only the categories of your categorical column while preserving their order (decreasing frequency in this case), you can use the following code:

color_decreasing_frequency = df['color'].value_counts().index.to_list()
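A small self-contained version of the line above, with a made-up df for illustration, might look like this. It also contrasts the result with unique(), which returns the categories in order of first appearance rather than by frequency.

```python
import pandas as pd

# Illustrative data: 'green' appears first but is the least frequent
df = pd.DataFrame({'color': ['green', 'blue', 'red', 'red', 'blue', 'red']})

# Categories ordered by decreasing frequency (red: 3, blue: 2, green: 1)
color_decreasing_frequency = df['color'].value_counts().index.to_list()

# For comparison: categories in order of first appearance
first_appearance = df['color'].unique().tolist()

print(color_decreasing_frequency)  # ['red', 'blue', 'green']
print(first_appearance)            # ['green', 'blue', 'red']
```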

3. Make value_counts show the count of NaN in your report:

This one has been a great time-saver. By default, value_counts does not count the NaN values in your column, but if you need them for your analysis, you can include them by passing dropna=False to the value_counts method.
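A minimal sketch of the difference, using an illustrative 'color' column with a few missing values:

```python
import numpy as np
import pandas as pd

# Illustrative column with three missing values
df = pd.DataFrame({'color': ['red', np.nan, 'blue', 'red', np.nan, np.nan]})

# Default (dropna=True): NaN rows are silently left out of the counts
without_nan = df['color'].value_counts()

# dropna=False: NaN gets its own entry, sorted by frequency like any other value
with_nan = df['color'].value_counts(dropna=False)

print(without_nan)
print(with_nan)
```

Comparing without_nan.sum() with len(df) is a quick way to see how many rows the default call ignored.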

4. value_counts with the entire data frame:

Interesting results come out when you run value_counts on a data frame instead of a series (that is, when you operate on more than one column instead of a single one). It returns the frequency of each unique combination of values across the columns, for example across 2 columns.
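Here is a minimal sketch, assuming a small illustrative DataFrame. Note that DataFrame.value_counts was added in pandas 1.1.0; its subset parameter limits the counting to the listed columns, so the extra 'store' column below is ignored. The result is a Series indexed by a MultiIndex of (color, size) pairs.

```python
import pandas as pd

# Illustrative data; 'store' is an extra column we do not want to count on
df = pd.DataFrame({
    'color': ['red', 'red', 'blue', 'red', 'blue'],
    'size':  ['S', 'S', 'M', 'L', 'M'],
    'store': ['A', 'B', 'A', 'A', 'B'],
})

# Frequency of each unique (color, size) combination (requires pandas >= 1.1)
pair_counts = df.value_counts(subset=['color', 'size'])
print(pair_counts)

# Equivalent to df[['color', 'size']].value_counts()
```

Before this method existed, a common way to get the same counts was df.groupby(['color', 'size']).size(), which is unsorted by default.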

This method has saved me a great deal of time when inspecting the frequency of values across 2 columns. At one point, I had written a loop to get this functionality; now that I know this method, I am much more productive.

5. Make a data frame of your value_counts results:

Finally, this has been an absolute favorite find, and I only discovered it very recently: we can convert the results of value_counts into a data frame. This is useful when you want to further manipulate the data after retrieving the frequencies.
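One way to do this is sketched below, assuming an illustrative 'color' column. Calling reset_index() directly works too, but the resulting column names changed in pandas 2.0, so this sketch sets them explicitly with rename_axis and reset_index(name=...) to get the same 'color' and 'count' columns on older and newer versions. The 'share' column is just an example of further manipulation.

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({'color': ['red', 'blue', 'red', 'green', 'red', 'blue']})

counts_df = (
    df['color']
    .value_counts()
    .rename_axis('color')       # name the index so it becomes the 'color' column
    .reset_index(name='count')  # turn the Series into a two-column DataFrame
)

# Example follow-up manipulation: each category's share of all rows
counts_df['share'] = counts_df['count'] / counts_df['count'].sum()
print(counts_df)
```

If you only need the proportions, value_counts(normalize=True) returns them directly.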


Thank you for reading, that’s all for this article. More content to follow. Please clap if the article was helpful to you and comment if you have any questions.

Also, I regularly develop and publish content here on Medium based on my experience and learnings. I write about various topics in Data Science, Machine Learning, and Deep Learning. Please follow me here on Medium if these areas interest you; let's learn together.

If you want to connect with me, learn and grow with me, or collaborate, you can reach me at any of the following:

LinkedIn: https://www.linkedin.com/in/virajdatt-kohir/
Twitter: https://twitter.com/kvirajdatt
GitHub: https://github.com/Virajdatt
Goodreads: https://www.goodreads.com/user/show/114768501-virajdatt-kohir

