Last Updated on July 14, 2022 by Jay
This is the second part of the Filter a pandas dataframe tutorial. Today we’ll be talking about advanced filter in pandas dataframe, involving OR, AND, NOT logic.
This tutorial is part of the “Integrate Python with Excel” series, you can find the table of content here for easier navigation.
Prepare a dataframe for demo
We’ll be using the S&P 500 company dataset for this tutorial. First, we’ll fire up pandas and load the data from Wikipedia. Also attached below a screenshot of the data table for easy reference.
import pandas as pd
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')[0]
Combination of things
We use OR logic when one of the conditions need to be satisfied. For example, to get all “Health Care” and “Information Technology” companies means we want the Sector to be either Health Care or IT. Note that:
- The regular
or
logic operator does not work in this case, and we have to use the bitwise logic operator “|
“, which means “or”. - Each criteria needs to be wrapped with a pair of parentheses.
df_1 = df.loc[(df['GICS Sector'] == 'Health Care') | (df['GICS Sector'] == 'Information Technology')]
>>> df_1['Symbol'].count()
133
In the above code, we have two boolean index in the .loc[]
. The below is a simplified Excel example to demonstrate what the |
operator means.
Intersection of things
We use AND logic when both (or more) conditions need to be satisfied. For example, We can find out how many companies are in “Health Care” sector and also operate in “Health Care Equipment” sub-sector.
df_2 = df.loc[(df['GICS Sector'] == 'Health Care') & (df['GICS Sub Industry'] == 'Health Care Equipment')]
>>> df_2['Symbol'].count()
20
Again, we need to use the bitwise AND operator “&
“, and wrap around the conditions with a pair of parentheses. An Excel example below shows what the &
operator means.
Opposite of things
To select the opposite of something, we need to use the NOT logic operator. The bitwise NOT is “~
“. An Excel example is below.
To select all companies other than “Information Technology”. We can do the following:
df_3 = df.loc[ ~(df['Symbol'] == 'Information Technology')]
#an equivalent way is:
df_3 = df.loc[df['Symbol'] != 'Information Technology']