Pandas cheat sheet

Sharing is caring!

Last Updated on February 24, 2022 by Jay

This is a pandas cheat sheet that I recently created. Feel free to grab it. I’m still looking for a way to make a good-looking cheat sheet. If you have any recommendations, please leave a comment below.

This cheat sheet is an ongoing effort and I plan to keep improving it.

Download the pandas cheat sheet here:

Or if you prefer just browsing on the webpage, keep scrolling.

Import Data

pd.DataFrame()create dataframe
pd.read_csv(filename)import from csv
pd.read_excel(filename)import from Excel
pd.read_sql(query,connection_object)import from a SQL database
pd.read_html(url)import table from a url (website) or html file
pd.read_json(json_string)import from json format
pd.read_table(filename)import from a csv

Export Data

df.to_csv(filename)export to csv
df.to_excel(filename)export from Excel
df.to_sql(table_name,connection_object)export to a SQL database
df.to_json(filename)export to json file

Data Sorting

df.sort_index(axis=0, ascending=True/False)sort index/row by ascending or descending order
df.sort_index(axis=1, ascending=True/False)sort columns by ascending or descending order
df.sort_values('col1')sort entire df by col1 values
df['col1'].sort_values()sort only column col1

Data Grouping

df.groupby('col1').sum()group df by col1 values then show sum for all other columns
df.groupby('col1').agg({'col2':'count','col3':'mean'})group df by col1 values then show count for col2 and mean for col3

Combine Data

df1.append(df2)add df2 at the bottom of df1. deprecated, use concat
df1.merge(df2)combine df1 and df2 using unique key, similar to vlookup
pd.concat([df1,df2], axis = 0)axis=0 -> append, axis=1 -> merge

Data Selection

Select 1 Row

df.loc[row_name]row_name is row name
df.iloc[row_index]row_index is the row index position

Select Multiple Rows

df.loc[row_name_start:row_name_end]row_name is the row name
df.iloc[row_index_start:row_index_end]row_index is the row index position

Select 1 Column

df[col]col is the column name
df.col_namecol_name is the column name
df.loc[:, col_name]col_name is the column name
df.iloc[:, col_index]col_index is the column index

Select Multiple Columns

df[[col1,col2,col3]]col1 is the column name
df.loc[:, col_name_start:col_name_end]col_name is the column name
df.iloc[:, col_index_start:col_index_end]col_index is the column index position

Get Cells

df[col][row]col and row are column and row names
df.loc[row_name, col_name]use row and col names
df.iloc[row_index, col_index]use row and col index

Data Cleanup

df.dropna()drop rows which contain missing values
df.dropna(axis = 1)drop columns which contain missing values
df.fillna(value='a')fill NA/NaN values using a for entire df
df[col].fillna(value='a')fill NA/NaN values using a for column col
df[col].astype(int/str/float)cast/convert to certain datatype
df[col].replace('a','b')replace all ‘a’ values with ‘b’ for column col
df.rename(columns={'old_name':'new_name'})cast/convert to certain datatype
df.set_index('col')set existing column col as the index
df.reset_index()reset index, use 0,1,2,3,etc as the index

Data Exploration

df.head(n)show the first n rows of data
df.tail(n)show the last n rows of data index dtype, columns and memory usage
df.describe()descriptive statistics such as count, mean, std, etc
df.isnull().any()shows if df contains any null value
df.shapeshows the # of rows and columns of df
df.columnsshows the df column names
df.indexshows the df row/index names

Leave a Reply

Your email address will not be published. Required fields are marked *