Replicate Excel VLOOKUP, HLOOKUP, XLOOKUP in Python (DAY 30!!)

Excel LOOKUP formulas are probably one of the most used formulas (at least for me). So today we’ll replicate the xlookup formula in Python. In fact, we can use the same technique to replicate any of VLOOKUP, HLOOKUP, XLOOKUP, or INDEX/MATCH in Python!

This is the day 30 of my 30-day challenge! I did it! I find myself also learning a great deal during this journey. I’ll be sharing another post on the statistics of the articles posted during the 30-day challenge.

UPDATED Nov 15 2020: included the optional argument if_not_found in the Python code.

The example

We have two Excel tables, one contains some basic customer information, and the other contains customer order information. Our task is to bring some data from one table to another. Does this situation sound familiar? Link here to the Excel example if you want to follow along.

Excel solution

To tackle this problem, we can use either: lookup or INDEX/MATCH formula. VLOOKUP is probably the most commonly used, but it’s restricted by the table format – the lookup key has to be on the left-most column of the array of data that we are performing the lookup for. In other words, if the values we are trying to bring is on the left side of the lookup key, VLOOKUP will not work. On the other hand, we can use INDEX/MATCH combination, but it requires more typing (I’m lazy!!).

To make the lazy people happier, Microsoft’s solution is the XLOOKUP formula, but it’s only available in Office 2016 or Office 365. So we’ll use the xlookup formula to solve this problem. In the below screenshot, column F “purchase” is what we wanted to bring over from the second table (down), and column G shows the formula used for column F. Although table 2 contains multiple entries for the same customers, for demonstration purpose, we’ll only use the value from the first entry. E.g. for Harry, we want to bring the purchase “Kill la Kill”.

Excel XLOOKUP example
Excel XLOOKUP example

Replicate xlookup in Python

We’ll be using the pandas library, which is almost equivalent to a spreadsheet app for Python, to replicate the Excel formula. If you are new to Python or pandas, find out how to install them here.

pandas offers a wide selection of tools so we can replicate the xlookup function in several ways. we’ll walk through one way here which is a combination of filters and apply().

First let’s load the tables into our “Python spreadsheet app”!

import pandas as pd

df1 = pd.read_excel('users.xlsx', sheet_name = 'User_info')
df2 = pd.read_excel('users.xlsx', sheet_name = 'purchase')


>>> df1
      User Name Country      City Gender  Age
0  Forrest Gump     USA  New York      M   50
1     Mary Jane  CANADA   Tornoto      F   30
2  Harry Porter      UK    London      M   20
3     Jean Grey   CHINA  Shanghai      F   30

>>> df2
    ID            purchase       Date      Customer
0  101        Dragon Ball  2020-08-12  Forrest Gump
1  102          Evangelion 2020-01-01     Mary Jane
2  103        Kill la Kill 2020-08-01  Harry Porter
3  104        Dragon Ball  1999-01-01     Jean Grey
4  105          Evangelion 2019-12-31     Mary Jane
5  106  Ghost in the Shell 2020-01-01  Harry Porter
6  107          Evangelion 2018-04-01     Jean Grey

Thought process

The idea behind the XLOOKUP function is similar to INDEX/MATCH but less typing. Given a lookup_value, we find the position of it in the lookup_array, then we return the value at the same position from the return_array. Below is the available arguments from the Excel xlookup formula. We’ll write the Python function using the same names for the arguments so it’s easier to compare with the Excel XLOOKUP formula.

Excel XLOOKUP syntax and arguments
Excel XLOOKUP syntax and arguments

Python replication

We can use a pandas filter to achieve this. In addition to the three required arguments, we’ll also implement the two optional arguments if_not_found and search_mode (to be updated later). Here is the Python code:

def xlookup(lookup_value, lookup_array, return_array, if_not_found:str = ''):
    match_value = return_array.loc[lookup_array == lookup_value]
    if match_value.empty:
        return f'"{lookup_value}" not found!' if if_not_found == '' else if_not_found

    else:
        return match_value.tolist()[0]

Okay, a lot is going on in the above few lines of code. This is why I love Python – it’s simple yet it can express complex logic! Let’s break down the above code.

In the first line, we are defining a function called xlookup with some arguments.

  • lookup_value: the value we are interested, this will be a string value
  • lookup_array: this is a column inside the source pandas dataframe, we are looking for the “lookup_value” inside this array/column
  • return_array: this is a column inside the source pandas dataframe, we want to return values from this column
  • if_not_found: will be returned if the “lookup_value” is not found

In the subsequent lines:

  • lookup_array == lookup_value returns a boolean index, which is used by pandas to filtering results.
  • return_array.loc[] returns a pandas Series with the value(s) based on the above boolean index, only True values are returned.
  • One nice thing about pandas Series is its .empty attribute, which tells us if the Series contains value or empty, if the match_value turns out to be empty, then we know there’s no match found! Then we can inform the user that no lookup_value is found in the data.
  • On the contrary, if the match_value is not empty, then we know some value has been found. We can convert match_value (which is a pandas Series) into a list by .tolist()
  • Finally, because we want to keep only the first value (if there are multiple entries), we pick the first element by specifying [0] from the returned list

Let’s test the function, seems to be working fine! Note that df1 is the table we want to bring values to, and df2 is the source table we are looking up values from, and we are passing two dataframe columns into the function for lookup_array and return_array.

>>> xlookup('Mary Jane', df2['Customer'],df2['purchase'])
'Evangelion'

>>> xlookup('Forrest Gump', df2['Customer'],df2['purchase'])
'Dragon Ball'

>>> xlookup('Forrest Gumfp', df2['Customer'],df2['purchase'])
'"Forrest Gumfp" not found!'

Formula complete, now “drag down”

Well, since we are doing everything in code and there’s no GUI, we can’t just simply double click on something to “drag down” the formula. But essentially the “drag down” is the looping part – we just need to apply the xlookup function to every single row of the table df1. And remember, we should never loop through a dataframe using the for loop.

apply() method instead of for loop

It turns out that pandas provides a method to do exactly this, and its name is .apply()! Let’s look at its syntax. Below is a simplified list of arguments, if you prefer to see the full list of arguments, check out the official pandas documentation on apply.

dataframe.apply(func, axis = 0, args=())

  • func: the function we are applying
  • axis: we can apply the function both rows or columns. By default it’s = 0, which is rows. axis=1 means columns
  • args=(): this is a tuple that contain the positional arguments we want to pass into the func

Here’s how we can apply the xlookup function on the entire column of a dataframe.

df1['purchase'] = df1['User Name'].apply(xlookup, args = (df2['Customer'], df2['purchase']))

One thing to pay attention to is how apply() passes arguments into the original func, which is xlookup in our case. By design, apply will automatically pass all data from the caller dataframe (series). In our example, apply() will pass df1['User Name'] as the very first argument into the function xlookup. However, our xlookup takes three arguments in total. That’s where the args=() argument becomes handy. Note that we need to pass these arguments in the correct order.

pandas apply() method arguments
pandas apply() method arguments

Let’s have another look at the Excel solution vs Python solution side by side:

>>> df1
      User Name Country      City Gender  Age      purchase
0  Forrest Gump     USA  New York      M   50  Dragon Ball 
1     Mary Jane  CANADA   Tornoto      F   30    Evangelion
2  Harry Porter      UK    London      M   20  Kill la Kill
3     Jean Grey   CHINA  Shanghai      F   30  Dragon Ball 
Excel XLOOKUP example
Excel XLOOKUP example

6 comments

  1. I loved this post so much.

    I’m compiling that kind of handy tools in a little package at work and that xlookup its so elegant and well explained that will replace my 10-lines approach.

    I’m suscribed to your blog!

    PS: is there any social media to follow you on?

    1. Hi Fernando,

      Glad that you found it helpful 🙂

      I’m curious what “little package” are you compiling? Is it a collection of Python tools to replicate Excel functions?

      I do have a twitter account @PythonInOffice, but I should post more often!

      Regards,
      Jay

      1. Yes it is!!

        It’s almost all ‘tkinter’, ‘pandas’, and ‘os’ modules snippets I find useful in my office job.

        For Excel stuff I’m just starting so there is not much to take by now. Feel free to suggest or take whatever go there.

        It is divided in scripts and helpers: scripts are speciffic functions and helpers are functions that I use a lot to build my scripts quickly.

        If you know spanish, the README file should be auto-explanatory.

        Here is the GitHub link: https://github.com/FARBEX97/scripts_oficina

        BTW I couldn’t find your Twitter account by searching @PythonInOffice

        Good luck with the blog!

  2. Hi Jay,

    Love love love your content, really helping me loads having tasked to automate things via python at work!!! Can’t wait for you to uplaod more.

    I was wondering if you will be uploading content that
    1) will cover dealing with multiple different sources with different unique key identifier e.g.
    – file 1 has client name & ID number, file 2 has client name & volumes => unique identifier here is the client name where I can then map client name, ID number, and volumes
    – file 3 has only ID number & pricing => unique identifier thus now is ID number
    eventually I only want one output of file with client name, ID number, volumes, and pricing

    2) I have a file with thousands of lines of purchases made in the month of September. I want to aggregate the total value spent by each customer e.g. Harry buys a product on 1st September, 15th September and 30th September spending $15 each time. I want to generate a report that says Harry spent $45 in the month of September and so on for other customers. => A report of unique client names and their total spending of that month.

    Looking forward to more content! Really a lifesaver.

    PS. Some of your examples sheets/folders are not linked!

    Thank you.

    1. Hi Liliana,

      Thank you for the kind words 🙂 And I’m glad that you find the content helpful.

      I’ll definitely be uploading more tutorials – I’ve been working on some projects which are taking most of my free time. I will be sharing some things that I’ve learned from those projects! Just a sneak peak – I’m getting a programmable drone which can be controlled by Python in real time. I feel this will be a little fun section to add, just a place to get away from the work-related stuff.

      For the two types of content you are interested in:
      1) Coincidentally, I think what you are looking for is explained in this post: https://pythoninoffice.com/merge-multiple-excel-files-in-python/
      2) This can be solved by either groupby or a pivot table. Both topics are on my to-do list, I just need to find some time to write.

      Thanks for pointing out that there are missing links, I will try to go through post by post to fix them.

      Happy automating!

  3. I hope with all the technology they can still preserve the books and libraries in towns. There is still a calming feeling getting in a place with lots of books.

Comments are closed.