Filter a pandas dataframe – OR, AND, NOT

Sharing is caring!

Last Updated on July 14, 2022 by Jay

This is the second part of the Filter a pandas dataframe tutorial. Today we’ll be talking about advanced filter in pandas dataframe, involving OR, AND, NOT logic.

This tutorial is part of the “Integrate Python with Excel” series, you can find the table of content here for easier navigation.

Prepare a dataframe for demo

We’ll be using the S&P 500 company dataset for this tutorial. First, we’ll fire up pandas and load the data from Wikipedia. Also attached below a screenshot of the data table for easy reference.

import pandas as pd
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')[0]
sp500-companies-wikipedia
sp500-companies-wikipedia

Combination of things

We use OR logic when one of the conditions need to be satisfied. For example, to get all “Health Care” and “Information Technology” companies means we want the Sector to be either Health Care or IT. Note that:

  • The regular or logic operator does not work in this case, and we have to use the bitwise logic operator “|“, which means “or”.
  • Each criteria needs to be wrapped with a pair of parentheses.
df_1 = df.loc[(df['GICS Sector'] == 'Health Care') | (df['GICS Sector'] == 'Information Technology')]

>>> df_1['Symbol'].count()
133

In the above code, we have two boolean index in the .loc[]. The below is a simplified Excel example to demonstrate what the | operator means.

OR Operation Example in Excel

Intersection of things

We use AND logic when both (or more) conditions need to be satisfied. For example, We can find out how many companies are in “Health Care” sector and also operate in “Health Care Equipment” sub-sector.

df_2 = df.loc[(df['GICS Sector'] == 'Health Care') & (df['GICS Sub Industry'] == 'Health Care Equipment')]

>>> df_2['Symbol'].count()
20

Again, we need to use the bitwise AND operator “&“, and wrap around the conditions with a pair of parentheses. An Excel example below shows what the & operator means.

AND Operation Example in Excel

Opposite of things

To select the opposite of something, we need to use the NOT logic operator. The bitwise NOT is “~“. An Excel example is below.

NOT operation

To select all companies other than “Information Technology”. We can do the following:

df_3 = df.loc[ ~(df['Symbol'] == 'Information Technology')]

#an equivalent way is:
df_3 = df.loc[df['Symbol'] != 'Information Technology']

Leave a Reply

Your email address will not be published. Required fields are marked *