Last Updated on June 10, 2022 by Jay
It’s surprisingly easy to run Python code in the R programming language. You might be wondering why anyone would bother running Python in R. I had the same doubt until the day I had to bring two programs together.
Both programs had tens of thousands of lines of code, one in Python and the other in R. My team wanted to use the R program as the main engine, but rewriting the entire Python program in R was clearly a non-trivial task. To our surprise, it turned out we could run Python code directly in R!
Please note this tutorial will use a combination of Python and R code. However, all the example code should be run inside an R environment (e.g. RStudio) unless stated otherwise.
R Library
The reticulate R library lets us use Python and R together. To install it in R, type the following in the R Console:
> install.packages("reticulate")
Once the installation is done, we should check whether R can find Python. By default, this is the Python found on the system’s PATH.
> Sys.which('python')
python
"C:\\PROGRA~2\\PYTHON~1\\python.exe"
To load the reticulate library into R:
> library(reticulate)
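Sys.which() only reports what is on PATH. As a rough cross-check from the Python side, the sketch below looks up an interpreter on PATH using the standard library and prints the interpreter that is actually running the code. The helper name find_python_on_path is made up for illustration and is not part of reticulate.

```python
import shutil
import sys


def find_python_on_path(names=("python", "python3")):
    """Return the first interpreter found on PATH, or None.

    Hypothetical helper for illustration; it only mirrors what
    Sys.which('python') does on the R side.
    """
    for name in names:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


on_path = find_python_on_path()
print("first Python on PATH:", on_path)
print("interpreter running this code:", sys.executable)
```

If the two paths differ, the code is running under a different interpreter than the one PATH points to, which is worth knowing before installing any libraries.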
Install Python Library
We can install Python libraries in either of the following ways:
- pip install in cmd/powershell/terminal
- py_install() in R console
For example, to install the pandas Python library:
#type this in cmd/powershell/terminal
pip install pandas
#type this in R console
py_install("pandas")
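After installing, it can help to confirm that the interpreter really sees the library. The following is a minimal sketch using only the Python standard library; the helper name is_installed is an assumption made for this example, not a reticulate or pandas function.

```python
import importlib.util


def is_installed(module_name):
    """Return True if this interpreter can locate `module_name`.

    Hypothetical helper: it only checks that the module can be found,
    it does not import it.
    """
    return importlib.util.find_spec(module_name) is not None


# "json" ships with Python; "pandas" is only present if it was installed.
for name in ("json", "pandas"):
    status = "available" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If pandas shows up as missing here, the install most likely went to a different Python than the one running this check.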
Run Python Code With R Syntax
Let’s create a simple pandas dataframe using Python, then return it as an R object in the R coding environment.
Note the following differences from writing pure Python code:
- assignment uses R’s usual <- symbol; here we assigned the pandas library to the name "pd"
- import("library_name") needs quotes around the library name
- to access class attributes or methods, use the $ symbol instead of the . symbol
- the Python dataframe object is automatically converted into an R data.frame object
##the following code runs in R environment
> pd <- import("pandas")
> df <- pd$DataFrame(list(col1 = c(1,2,3), col2 =c('hello','world','python')))
> df
col1 col2
1 1 hello
2 2 world
3 3 python
> class(df)
[1] "data.frame"
Run Pure Python Code In R (Python syntax)
The above example still uses R syntax, which may feel unnatural if you are used to Python. We can also write and run pure Python-style code in R.
First, we’ll write some Python code as text in R. We’ll store that text in an R variable called py_code. Then we call the py_run_string(py_code) R function to run the Python code.
We can confirm this is a pandas dataframe object by using the Python type() built-in function.
##the following code runs in R environment
> py_code <- "import pandas as pd
+ df = pd.DataFrame({'col1':[1,2,3], 'col2':['hello','world','python']})
+ print(df)
+ print(type(df))
+ "
> py_run_string(py_code)
col1 col2
0 1 hello
1 2 world
2 3 python
<class 'pandas.core.frame.DataFrame'>
Access Variables Created by Python
py is an R object that exposes the Python __main__ module, including any variables created by Python code. To access those variables, use the $ symbol again. For example, py$df accesses the dataframe we created using Python.
> py
Module(__main__)
> py$df
col1 col2
1 1 hello
2 2 world
3 3 python
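To build some intuition for why variables from a code string show up under py, here is a rough analogy in plain Python. It is not how reticulate is implemented; it only illustrates the idea that top-level variables from executed code end up in a module's namespace. The module name main_like and the variables numbers and total are made up for this sketch.

```python
import types

# A code string, similar in spirit to py_code in the R example.
py_code = """
numbers = [1, 2, 3]
total = sum(numbers)
"""

# Stand-in for the __main__ module that R sees as `py` (analogy only).
main_like = types.ModuleType("main_like")

# Run the string so its top-level variables land in the module's namespace.
exec(py_code, main_like.__dict__)

# Attribute access on the module plays the role of py$total in R.
print(main_like.total)    # 6
print(main_like.numbers)  # [1, 2, 3]
```

In R, the $ symbol does the job that the dot does here.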
Run A Python Script In R
The R function py_run_file(file_path) is particularly useful when we want to run an entire Python script instead of just a few lines of Python code.
Let’s look at the following Python code, which is saved as a .py script named “eg.py”. This script:
- asks the user to enter a URL, then
- attempts to scrape the first table it can find at that URL and returns it
## The following code is saved in a .py script named "eg.py"
import pandas as pd

def read_web(url):
    # read_html returns a list of DataFrames; keep the first table
    df = pd.read_html(url)[0]
    return df

url = input("please enter a url:")
dd = read_web(url)
Now let’s run this script from R. Note that in the Python code above, we stored the dataframe in a Python variable called dd. To access it in R, simply use py$dd. Note that this is now an R object!
> py_run_file("C:/Users/jay/Desktop/PythonInOffice/r_reticulate/eg.py")
please enter a url:https://en.wikipedia.org/wiki/List_of_S%26P_500_companies
> head(py$dd)
Symbol Security SEC filings GICS Sector GICS Sub-Industry Headquarters Location Date first added CIK
1 MMM 3M reports Industrials Industrial Conglomerates Saint Paul, Minnesota 1976-08-09 66740
2 AOS A. O. Smith reports Industrials Building Products Milwaukee, Wisconsin 2017-07-26 91142
3 ABT Abbott reports Health Care Health Care Equipment North Chicago, Illinois 1964-03-31 1800
4 ABBV AbbVie reports Health Care Pharmaceuticals North Chicago, Illinois 2012-12-31 1551152
5 ABMD Abiomed reports Health Care Health Care Equipment Danvers, Massachusetts 2018-05-31 815094
6 ACN Accenture reports Information Technology IT Consulting & Other Services Dublin, Ireland 2011-07-06 1467373
Founded
1 1902
2 1916
3 1888
4 2013 (1888)
5 1981
6 1989
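The same "run a script, then read its variables" pattern can be tried in plain Python with the standard-library runpy module. This is only an analogy for what py_run_file() followed by py$dd gives us in R, not a description of reticulate's internals. The script name eg_demo.py and the function double are invented for this sketch, and the script avoids input() and network access so it can run anywhere.

```python
import os
import runpy
import tempfile

# A tiny stand-in script; like eg.py, it leaves its result in a variable called dd.
script_source = """
def double(values):
    return [v * 2 for v in values]

dd = double([1, 2, 3])
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    script_path = os.path.join(tmp_dir, "eg_demo.py")
    with open(script_path, "w", encoding="utf-8") as handle:
        handle.write(script_source)

    # run_path executes the file and returns its top-level variables as a dict.
    script_globals = runpy.run_path(script_path)

# Reading script_globals["dd"] plays the role of py$dd in R.
print(script_globals["dd"])  # [2, 4, 6]
```

One practical takeaway for scripts meant to be run from R: leave results in clearly named top-level variables, as eg.py does with dd, so they are easy to pick up afterwards.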