How To Add A New Column in pandas DataFrame

Last Updated on July 14, 2022 by Jay

In this tutorial, we’ll learn how to add a new column to a pandas DataFrame. We are going to see a few different approaches for inserting columns into a pandas dataframe.

This article is part of the “Integrate Python with Excel” series; you can find the table of contents here for easier navigation.

.insert() method

The quickest way is to use the .insert() method provided by pandas. The method takes the following arguments:

  • loc – the index number for insertion
  • column – a column name
  • value – the data to be inserted

Let’s use the example from before to demonstrate this. Our goal is to insert a new column filled with 100s right after the first column. Note that the .insert() method modifies the original df in place rather than returning a new dataframe.

import pandas as pd
df = pd.DataFrame({'col 1':[1,2,3,4,5], 
                   'col 2':[6,7,8,9,10],
                   'col 3':[11,12,13,14,15],
                 })

>>> df
   col 1  col 2  col 3
0      1      6     11
1      2      7     12
2      3      8     13
3      4      9     14
4      5     10     15

>>> df.insert(1, 'new col', [100,100,100,100,100])

>>> df
   col 1  new col  col 2  col 3
0      1      100      6     11
1      2      100      7     12
2      3      100      8     13
3      4      100      9     14
4      5      100     10     15
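Two details of .insert() are worth seeing in action. The sketch below uses a smaller, made-up dataframe and shows that a single scalar is broadcast to every row, that the method returns None because it changes df in place, and that inserting a column name that already exists raises a ValueError by default. The variable names result and duplicate_rejected are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col 1': [1, 2, 3, 4, 5],
                   'col 2': [6, 7, 8, 9, 10]})

# A scalar is broadcast to every row, so a list of five 100s is optional.
result = df.insert(1, 'new col', 100)

# insert() changes df in place and returns None.
print(result)
print(list(df.columns))

# Inserting a column name that already exists raises ValueError by default.
try:
    df.insert(0, 'new col', 0)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(duplicate_rejected)
```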

Square bracket method

Ok, I made up this name. I don’t know what it’s called, but we use square brackets here so I’m calling it the square bracket method. This is my favorite method for adding new columns to a dataframe due to its simplicity.

Remember from the other post how we can reference a column in a dataframe? This is almost the same thing, except that we are now assigning values to a column instead of referencing it. Continuing with the previous example:

>>> df['sqbracket method'] = df['new col'] * 2
>>> df

   col 1  new col  col 2  col 3    sqbracket method
0      1      100      6     11                 200
1      2      100      7     12                 200
2      3      100      8     13                 200
3      4      100      9     14                 200
4      5      100     10     15                 200

See how easy that was to create a calculated column? Note that this method also modifies the original df in place by adding a new column to it, which is what we want. However, with this method you cannot choose where the new column goes. It will always be added to the end of the dataframe.
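If the position matters, one possible workaround is to add the column with square brackets and then reorder the columns by selecting them in the order you want. This is only a sketch on a small made-up dataframe, and 'total' is a hypothetical column name.

```python
import pandas as pd

df = pd.DataFrame({'col 1': [1, 2, 3],
                   'col 2': [4, 5, 6]})

# Square bracket assignment always appends the new column at the end.
df['total'] = df['col 1'] + df['col 2']
print(list(df.columns))

# Selecting the columns in a new order returns a reordered dataframe.
df = df[['col 1', 'total', 'col 2']]
print(list(df.columns))
```

When the position is known up front, .insert() does the same thing in a single step.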

.assign() method

Let’s add a new column using the df.assign() method. The name of the new column is passed as the keyword argument name, and its data as the keyword argument value. We can assign some data (e.g. a list) to the new column, or we can add a new calculated column as shown below, starting again from the original three-column df:

>>> df.assign(col_1_x_2=df['col 1']*2)

   col 1  col 2  col 3  col_1_x_2
0      1      6     11          2
1      2      7     12          4
2      3      8     13          6
3      4      9     14          8
4      5     10     15         10

Note that the .assign() method returns a new dataframe and leaves the original dataframe unchanged. To keep the new column, assign the result to a variable, e.g. df = df.assign(...).
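Here is a minimal sketch of that behavior on a one-column dataframe. It also shows one way to create a column whose name contains spaces: since such a name is not a valid Python keyword, it can be passed through dictionary unpacking instead. The names new_df and spaced_df are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col 1': [1, 2, 3, 4, 5]})

# assign() returns a new dataframe; df itself is left untouched.
new_df = df.assign(col_1_x_2=df['col 1'] * 2)
print(list(df.columns))
print(list(new_df.columns))

# A column name with spaces cannot be written as a keyword argument,
# so unpack a dictionary instead.
spaced_df = df.assign(**{'col 1 x 2': df['col 1'] * 2})
print(list(spaced_df.columns))
```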

Dictionary & .map() method

Imagine we have a dictionary whose keys are values already in the dataframe. We can then bring the dictionary values into the dataframe pretty easily. Note that .map() here is a pandas Series method (i.e. it only works on a Series), not the Python built-in map() function.

num_to_letter = {1:'one',2:'two',3:'three',4:'four',5:'five',6:'six'}

>>> df['letter'] = df['col 1'].map(num_to_letter)
>>> df
   col 1  col 2  col 3 letter
0      1      6     11    one
1      2      7     12    two
2      3      8     13  three
3      4      9     14   four
4      5     10     15   five
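One thing to watch for: a value with no matching key in the dictionary becomes NaN in the new column. The sketch below uses a made-up dataframe in which 7 has no entry in the dictionary, and then fills the gap with a placeholder using .fillna(). The 'unknown' label and the 'letter_filled' column name are just illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({'col 1': [1, 2, 3, 7]})
num_to_letter = {1: 'one', 2: 'two', 3: 'three'}

# 7 is not a key in the dictionary, so its row gets NaN.
df['letter'] = df['col 1'].map(num_to_letter)
print(df)

# Optionally replace the missing values with a placeholder.
df['letter_filled'] = df['col 1'].map(num_to_letter).fillna('unknown')
print(df)
```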
