Last Updated on July 14, 2022 by Jay
We previously talked about how to create calculated columns in pandas, and walked through some simple examples. We can create calculated columns easily most of the time by assigning an expression to a new column like df['new column'] = expression
. However, there are times when we need to create fairly complex calculated columns. Today, we’ll learn how to do that efficiently.
This tutorial is part of the “Integrate Python with Excel” series, you can find the table of content here for easier navigation.
Prepare a dataframe for demo
Let’s take a look at the following example, we have a list of students’ school averages in percentages, and we want to convert those into alphabetical grades (i.e. A, B, C, D, F, etc). The mark thresholds are the following:
- A: >= 90
- B: 80 <= and < 90
- C: 70 <= and < 80
- D: 50 <= and < 70
- F: < 50
Let’s create our hypothetical students and their school averages. We’ll randomly generate numbers between 1 to 100 for the students’ marks. (Sorry those poor guys who failed due to bad luck!)
import pandas as pd
import random
>>> df = pd.DataFrame({'Student ID':[f'student_{i}' for i in range(1,11)], 'Class A Average':[random.randint(1,100) for i in range(1,11)],'Class B Average':[random.randint(1,100) for i in range(1,11)],'Class C Average':[random.randint(1,100) for i in range(1,11)]})
>>> df
Student ID Class A Average Class B Average Class C Average
0 student_1 71 87 99
1 student_2 8 89 91
2 student_3 25 38 63
3 student_4 29 24 14
4 student_5 86 92 76
5 student_6 33 16 47
6 student_7 61 14 68
7 student_8 14 32 43
8 student_9 46 17 74
9 student_10 13 74 11
Create a helper function
Now let’s create a function that takes an average and process/convert it into a letter grade.
>>> def letter_grade(x):
if x >= 90:
return 'A'
elif x >= 80:
return 'B'
elif x >= 70:
return 'C'
elif x >= 50:
return 'D'
else:
return 'F'
>>> letter_grade(90)
'A'
>>> letter_grade(70)
'C'
Now we just gotta apply this function to every students. So do a loop through each student in the column? Big No No! Remember that we should never loop through a pandas
dataframe/series! Because it’s inefficient to do so if we have a large dataset.
pandas applymap()
method
pandas
provides an easy way to apply a customized function to a column or the entire dataframe. And that is the .applymap()
method. It’s kind of similar to what the map()
function does. Note the below code, we are applying the function on only the three columns contain averages. Since we know the first column contains string, and we’ll likely hit an error if we try to apply the letter_grade()
function on string data.
>>> df[['Class A Average','Class B Average','Class C Average']].applymap(letter_grade)
Class A Average Class B Average Class C Average
0 C B A
1 F B A
2 F F D
3 F F F
4 B A C
5 F F F
6 D F D
7 F F F
8 F F C
9 F C F
We can still use the map()
function to convert the class grades. However, we’ll need to use map()
separately on each of the three columns. Whereas applymap()
takes care of the whole dataframe (multiple columns).