Last Updated on July 14, 2022 by Jay
Excel LOOKUP formulas are probably one of the most used formulas (at least for me). So today we’ll replicate the xlookup formula in Python. In fact, we can use the same technique to replicate any of VLOOKUP, HLOOKUP, XLOOKUP, or INDEX/MATCH in Python!
This is the day 30 of my 30-day challenge! I did it! I find myself also learning a great deal during this journey. I’ll be sharing another post on the statistics of the articles posted during the 30-day challenge.
UPDATED Nov 15 2020: included the optional argument if_not_found in the Python code.
The example
We have two Excel tables, one contains some basic customer information, and the other contains customer order information. Our task is to bring some data from one table to another. Does this situation sound familiar? Link here to the Excel example if you want to follow along.
Excel solution
To tackle this problem, we can use either: lookup or INDEX/MATCH formula. VLOOKUP is probably the most commonly used, but it’s restricted by the table format – the lookup key has to be on the left-most column of the array of data that we are performing the lookup for. In other words, if the values we are trying to bring is on the left side of the lookup key, VLOOKUP will not work. On the other hand, we can use INDEX/MATCH combination, but it requires more typing (I’m lazy!!).
To make the lazy people happier, Microsoft’s solution is the XLOOKUP formula, but it’s only available in Office 2016 or Office 365. So we’ll use the xlookup formula to solve this problem. In the below screenshot, column F “purchase” is what we wanted to bring over from the second table (down), and column G shows the formula used for column F. Although table 2 contains multiple entries for the same customers, for demonstration purpose, we’ll only use the value from the first entry. E.g. for Harry, we want to bring the purchase “Kill la Kill”.
Replicate xlookup in Python
We’ll be using the pandas library, which is almost equivalent to a spreadsheet app for Python, to replicate the Excel formula. If you are new to Python or pandas, find out how to install them here.
pandas offers a wide selection of tools so we can replicate the xlookup function in several ways. we’ll walk through one way here which is a combination of filters and apply()
.
First let’s load the tables into our “Python spreadsheet app”!
import pandas as pd
df1 = pd.read_excel('users.xlsx', sheet_name = 'User_info')
df2 = pd.read_excel('users.xlsx', sheet_name = 'purchase')
>>> df1
User Name Country City Gender Age
0 Forrest Gump USA New York M 50
1 Mary Jane CANADA Tornoto F 30
2 Harry Porter UK London M 20
3 Jean Grey CHINA Shanghai F 30
>>> df2
ID purchase Date Customer
0 101 Dragon Ball 2020-08-12 Forrest Gump
1 102 Evangelion 2020-01-01 Mary Jane
2 103 Kill la Kill 2020-08-01 Harry Porter
3 104 Dragon Ball 1999-01-01 Jean Grey
4 105 Evangelion 2019-12-31 Mary Jane
5 106 Ghost in the Shell 2020-01-01 Harry Porter
6 107 Evangelion 2018-04-01 Jean Grey
Thought process
The idea behind the XLOOKUP function is similar to INDEX/MATCH but less typing. Given a lookup_value, we find the position of it in the lookup_array, then we return the value at the same position from the return_array. Below is the available arguments from the Excel xlookup formula. We’ll write the Python function using the same names for the arguments so it’s easier to compare with the Excel XLOOKUP formula.
Python replication
We can use a pandas filter to achieve this. In addition to the three required arguments, we’ll also implement the two optional arguments if_not_found and search_mode (to be updated later). Here is the Python code:
def xlookup(lookup_value, lookup_array, return_array, if_not_found:str = ''):
match_value = return_array.loc[lookup_array == lookup_value]
if match_value.empty:
return f'"{lookup_value}" not found!' if if_not_found == '' else if_not_found
else:
return match_value.tolist()[0]
Okay, a lot is going on in the above few lines of code. This is why I love Python – it’s simple yet it can express complex logic! Let’s break down the above code.
In the first line, we are defining a function called xlookup
with some arguments.
lookup_value
: the value we are interested, this will be a string valuelookup_array
:
this is a column inside the source pandas dataframe, we are looking for the “lookup_value” inside this array/columnreturn_array
: this is a column inside the source pandas dataframe, we want to return values from this columnif_not_found
: will be returned if the “lookup_value” is not found
In the subsequent lines:
lookup_array == lookup_value
returns a boolean index, which is used by pandas to filtering results.return_array.loc[]
returns a pandas Series with the value(s) based on the above boolean index, only True values are returned.- One nice thing about pandas Series is its
.empty
attribute, which tells us if the Series contains value or empty, if thematch_value
turns out to be empty, then we know there’s no match found! Then we can inform the user that no lookup_value is found in the data. - On the contrary, if the match_value is not empty, then we know some value has been found. We can convert match_value (which is a pandas Series) into a list by
.tolist()
- Finally, because we want to keep only the first value (if there are multiple entries), we pick the first element by specifying
[0]
from the returned list
Let’s test the function, seems to be working fine! Note that df1
is the table we want to bring values to, and df2
is the source table we are looking up values from, and we are passing two dataframe columns into the function for lookup_array
and return_array
.
>>> xlookup('Mary Jane', df2['Customer'],df2['purchase'])
'Evangelion'
>>> xlookup('Forrest Gump', df2['Customer'],df2['purchase'])
'Dragon Ball'
>>> xlookup('Forrest Gumfp', df2['Customer'],df2['purchase'])
'"Forrest Gumfp" not found!'
Formula complete, now “drag down”
Well, since we are doing everything in code and there’s no GUI, we can’t just simply double click on something to “drag down” the formula. But essentially the “drag down” is the looping part – we just need to apply the xlookup function to every single row of the table df1. And remember, we should never loop through a dataframe using the for
loop.
apply()
method instead of for
loop
It turns out that pandas provides a method to do exactly this, and its name is .apply()
! Let’s look at its syntax. Below is a simplified list of arguments, if you prefer to see the full list of arguments, check out the official pandas documentation on apply.
dataframe.apply(func, axis = 0, args=())
func
: the function we are applyingaxis
: we can apply the function both rows or columns. By default it’s = 0, which is rows. axis=1 means columnsargs=()
: this is a tuple that contain the positional arguments we want to pass into thefunc
Here’s how we can apply the xlookup function on the entire column of a dataframe.
df1['purchase'] = df1['User Name'].apply(xlookup, args = (df2['Customer'], df2['purchase']))
One thing to pay attention to is how apply()
passes arguments into the original func
, which is xlookup
in our case. By design, apply will automatically pass all data from the caller dataframe (series). In our example, apply()
will pass df1['User Name']
as the very first argument into the function xlookup
. However, our xlookup takes three arguments in total. That’s where the args=()
argument becomes handy. Note that we need to pass these arguments in the correct order.
Let’s have another look at the Excel solution vs Python solution side by side:
>>> df1
User Name Country City Gender Age purchase
0 Forrest Gump USA New York M 50 Dragon Ball
1 Mary Jane CANADA Tornoto F 30 Evangelion
2 Harry Porter UK London M 20 Kill la Kill
3 Jean Grey CHINA Shanghai F 30 Dragon Ball
I loved this post so much.
I’m compiling that kind of handy tools in a little package at work and that xlookup its so elegant and well explained that will replace my 10-lines approach.
I’m suscribed to your blog!
PS: is there any social media to follow you on?
Hi Fernando,
Glad that you found it helpful 🙂
I’m curious what “little package” are you compiling? Is it a collection of Python tools to replicate Excel functions?
I do have a twitter account @PythonInOffice, but I should post more often!
Regards,
Jay
Yes it is!!
It’s almost all ‘tkinter’, ‘pandas’, and ‘os’ modules snippets I find useful in my office job.
For Excel stuff I’m just starting so there is not much to take by now. Feel free to suggest or take whatever go there.
It is divided in scripts and helpers: scripts are speciffic functions and helpers are functions that I use a lot to build my scripts quickly.
If you know spanish, the README file should be auto-explanatory.
Here is the GitHub link: https://github.com/FARBEX97/scripts_oficina
BTW I couldn’t find your Twitter account by searching @PythonInOffice
Good luck with the blog!
Hi Jay,
Love love love your content, really helping me loads having tasked to automate things via python at work!!! Can’t wait for you to uplaod more.
I was wondering if you will be uploading content that
1) will cover dealing with multiple different sources with different unique key identifier e.g.
– file 1 has client name & ID number, file 2 has client name & volumes => unique identifier here is the client name where I can then map client name, ID number, and volumes
– file 3 has only ID number & pricing => unique identifier thus now is ID number
eventually I only want one output of file with client name, ID number, volumes, and pricing
2) I have a file with thousands of lines of purchases made in the month of September. I want to aggregate the total value spent by each customer e.g. Harry buys a product on 1st September, 15th September and 30th September spending $15 each time. I want to generate a report that says Harry spent $45 in the month of September and so on for other customers. => A report of unique client names and their total spending of that month.
Looking forward to more content! Really a lifesaver.
PS. Some of your examples sheets/folders are not linked!
Thank you.
Hi Liliana,
Thank you for the kind words 🙂 And I’m glad that you find the content helpful.
I’ll definitely be uploading more tutorials – I’ve been working on some projects which are taking most of my free time. I will be sharing some things that I’ve learned from those projects! Just a sneak peak – I’m getting a programmable drone which can be controlled by Python in real time. I feel this will be a little fun section to add, just a place to get away from the work-related stuff.
For the two types of content you are interested in:
1) Coincidentally, I think what you are looking for is explained in this post: https://pythoninoffice.com/merge-multiple-excel-files-in-python/
2) This can be solved by either groupby or a pivot table. Both topics are on my to-do list, I just need to find some time to write.
Thanks for pointing out that there are missing links, I will try to go through post by post to fix them.
Happy automating!
I hope with all the technology they can still preserve the books and libraries in towns. There is still a calming feeling getting in a place with lots of books.
agreed! sometimes I go to the library to pick things up for my mom and it’s so peaceful in there.
I sure hope we preserve books and libraries too! I spent many many hours in libraries studying for my professional exams during my first few years out of university.
Hello Jay,
is there a place where i can get the complete Python program for “Replicate Excel VLOOKUP, HLOOKUP, XLOOKUP in Python”?
Hi Uday,
The complete code is in the article, but separated into piece since it’s easier to explain how they work in small chunks. If you copy all the code from the article then you’ll have the complete code. Let me know if you need any help.
Hello Jay,
Thank you, good stuff. I am wondering if your xlookup can take more multiple criteria.
Thank you
Ludek