Create complex calculated columns using applymap()

Sharing is caring!

Last Updated on July 14, 2022 by Jay

We previously talked about how to create calculated columns in pandas, and walked through some simple examples. We can create calculated columns easily most of the time by assigning an expression to a new column like df['new column'] = expression. However, there are times when we need to create fairly complex calculated columns. Today, we’ll learn how to do that efficiently.

This tutorial is part of the “Integrate Python with Excel” series, you can find the table of content here for easier navigation.

Prepare a dataframe for demo

Let’s take a look at the following example, we have a list of students’ school averages in percentages, and we want to convert those into alphabetical grades (i.e. A, B, C, D, F, etc). The mark thresholds are the following:

  • A: >= 90
  • B: 80 <= and < 90
  • C: 70 <= and < 80
  • D: 50 <= and < 70
  • F: < 50

Let’s create our hypothetical students and their school averages. We’ll randomly generate numbers between 1 to 100 for the students’ marks. (Sorry those poor guys who failed due to bad luck!)

import pandas as pd
import random

>>> df = pd.DataFrame({'Student ID':[f'student_{i}' for i in range(1,11)], 'Class A Average':[random.randint(1,100) for i in range(1,11)],'Class B Average':[random.randint(1,100) for i in range(1,11)],'Class C Average':[random.randint(1,100) for i in range(1,11)]})
>>> df
   Student ID  Class A Average  Class B Average  Class C Average
0   student_1               71               87               99
1   student_2                8               89               91
2   student_3               25               38               63
3   student_4               29               24               14
4   student_5               86               92               76
5   student_6               33               16               47
6   student_7               61               14               68
7   student_8               14               32               43
8   student_9               46               17               74
9  student_10               13               74               11

Create a helper function

Now let’s create a function that takes an average and process/convert it into a letter grade.

>>> def letter_grade(x):
	if x >= 90:
		return 'A'
	elif x >= 80:
		return 'B'
	elif x >= 70:
		return 'C'
	elif x >= 50:
		return 'D'
		return 'F'

>>> letter_grade(90)
>>> letter_grade(70)

Now we just gotta apply this function to every students. So do a loop through each student in the column? Big No No! Remember that we should never loop through a pandas dataframe/series! Because it’s inefficient to do so if we have a large dataset.

pandas applymap() method

pandas provides an easy way to apply a customized function to a column or the entire dataframe. And that is the .applymap() method. It’s kind of similar to what the map() function does. Note the below code, we are applying the function on only the three columns contain averages. Since we know the first column contains string, and we’ll likely hit an error if we try to apply the letter_grade() function on string data.

>>> df[['Class A Average','Class B Average','Class C Average']].applymap(letter_grade)
  Class A Average Class B Average Class C Average
0               C               B               A
1               F               B               A
2               F               F               D
3               F               F               F
4               B               A               C
5               F               F               F
6               D               F               D
7               F               F               F
8               F               F               C
9               F               C               F

We can still use the map() function to convert the class grades. However, we’ll need to use map() separately on each of the three columns. Whereas applymap() takes care of the whole dataframe (multiple columns).

Leave a Reply

Your email address will not be published. Required fields are marked *