Run Python Code In R

Last Updated on June 10, 2022 by Jay

It’s surprisingly easy to run Python code from the R programming language. You might wonder why anyone would bother running Python in R. I wondered the same thing, until the day I had to bring two programs together.

Both programs had tens of thousands of lines of code, one in Python and the other in R. My team wanted to use the R program as the main engine, but rewriting the entire Python program in R would have been a huge task. To our surprise, we could run the Python code directly in R!

Run A Python Script from RStudio

Please note this tutorial will use a combination of Python and R code. However, all the example code should be run inside an R environment (e.g. RStudio) unless stated otherwise.

R Library

The reticulate R library lets us use Python and R together. To install it in R, type the following in the R Console:

> install.packages("reticulate")

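Once the installation is done, we should check that R can find Python. By default, this is the Python found on the system’s PATH variable, although reticulate may pick a different interpreter if one has been configured. After loading reticulate, py_config() reports the interpreter actually in use.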

> Sys.which('python')
                              python 
"C:\\PROGRA~2\\PYTHON~1\\python.exe"

To load the reticulate library into R:

> library(reticulate)
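
With reticulate loaded, we can also ask Python itself where it lives. The snippet below is a minimal sketch rather than part of the original setup: the function name interpreter_info is made up for illustration. It can be run as ordinary Python, or from R with the functions shown later in this tutorial.

```python
import platform
import sys


def interpreter_info():
    """Return the install location and version of the running Python."""
    return {
        "prefix": sys.prefix,                   # root folder of this Python (or virtual environment)
        "version": platform.python_version(),   # version string such as "3.x.y"
    }


info = interpreter_info()
print("Python prefix :", info["prefix"])
print("Python version:", info["version"])
```

If the reported location is not the Python you expected, that is a hint that R is picking up a different interpreter than the one you use in the terminal.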

Install Python Library

We can install Python libraries using either of the following ways:

  1. pip install in cmd/powershell/terminal
  2. py_install() in R console

For example, to install the pandas Python library:

#type this in cmd/powershell/terminal
pip install pandas    

#type this in R console
py_install("pandas")  
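
After installing, it’s worth confirming that the interpreter can actually see the library, since pip in a terminal and py_install() in R may target different Python environments. Here is a small sketch using only the Python standard library. The helper name is_installed is hypothetical, and lxml is checked only because pandas.read_html (used later in this tutorial) relies on an HTML parser such as lxml.

```python
import importlib.util


def is_installed(module_name):
    """Return True if this interpreter can find the top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Report on the libraries this tutorial relies on.
for name in ("pandas", "lxml"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```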

Run Python Code With R Syntax

Let’s create a simple pandas dataframe using Python, then return it as an R object in the R coding environment.

Note the following differences from writing pure Python code:

  • assignment uses R's usual <- symbol; here we assign the pandas library to the name "pd"
  • import("library_name") needs quotes around the library name
  • to access class attributes or methods, use the $ symbol instead of the . symbol
  • the Python dataframe object is automatically converted into an R data.frame object

##the following code runs in R environment
> pd <- import("pandas")
> df <- pd$DataFrame(list(col1 = c(1, 2, 3), col2 = c('hello', 'world', 'python')))
> df
  col1   col2
1    1  hello
2    2  world
3    3 python
> class(df)
[1] "data.frame"
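
It may help to see what Python receives from the R call above. Going by reticulate's documented conversion rules, a named R list arrives as a Python dict and a multi-element R vector as a Python list. Since c(1,2,3) is a vector of doubles in R, the numbers should arrive as floats. The sketch below writes that argument out in plain Python (no pandas needed) so its shape is easy to inspect. The variable names are illustrative only.

```python
# What pd.DataFrame() is assumed to receive from the R call
# list(col1 = c(1,2,3), col2 = c('hello','world','python')).
data = {
    "col1": [1.0, 2.0, 3.0],               # R doubles become Python floats
    "col2": ["hello", "world", "python"],  # R character vector becomes a list of str
}

# Turn the column-oriented dict into row tuples, just to visualise the table.
rows = list(zip(*data.values()))
for row in rows:
    print(row)
```

To send integers rather than floats, R's integer literals (for example c(1L, 2L, 3L)) can be used on the R side.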

Run Pure Python Code In R (Python syntax)

The example above still uses R syntax, which can feel odd if you are used to Python. We can also write and run pure Python-style code in R.

First, we’ll write some Python code as text in R. We’ll store that text in an R variable called py_code. Then we call the py_run_string(py_code) R function to run the Python code.

We can confirm this is a pandas dataframe object by using the Python type() built-in function.

##the following code runs in R environment
> py_code <- "import pandas as pd
+ df = pd.DataFrame({'col1':[1,2,3], 'col2':['hello','world','python']})
+ print(df)
+ print(type(df))
+ "
> py_run_string(py_code)

   col1    col2
0     1   hello
1     2   world
2     3  python
<class 'pandas.core.frame.DataFrame'>

Access Variables Created by Python

py is an R object that exposes the Python __main__ module, including every variable created by the Python code we run. To access those variables, use the $ symbol again. For example, py$df accesses the dataframe we created using Python and returns it as an R data.frame.

> py
Module(__main__)

> py$df
  col1    col2
1    1   hello
2    2   world
3    3  python
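
One way to picture py_run_string() and py$df: Python code run from a string leaves its variables in a module's namespace, and those names are then looked up on that module. The snippet below is a rough pure-Python analogy, not reticulate's actual implementation. The module fake_main is a hypothetical stand-in for __main__.

```python
import types

# Hypothetical stand-in for Python's __main__ module, which reticulate exposes as `py`.
fake_main = types.ModuleType("fake_main")

py_code = """
values = [1, 2, 3]
total = sum(values)
"""

# Analogous to py_run_string(py_code): execute the text inside the module's namespace.
exec(py_code, fake_main.__dict__)

# Analogous to py$total in R: look the variable up on the module.
print(fake_main.total)  # 6
```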

Run A Python Script In R

The R function py_run_file(file_path) is particularly useful when we want to run an entire Python script instead of just a few lines of Python code.

Let’s look at the following Python code, which is saved as a .py script named "eg.py". This script:

  1. asks the user to enter a website URL, then
  2. attempts to scrape the first table it can find at that URL and stores it in the variable dd

## The following code is saved in a .py script named "eg.py"

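import pandas as pd  # pd.read_html also needs an HTML parser such as lxml

def read_web(url):
    # read_html returns a list of DataFrames, one per table found; keep the first
    df = pd.read_html(url)[0]
    return df


# prompt for the address, then store the scraped table in dd
url = input("please enter a url:")
dd = read_web(url)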

Now let’s run this script from R. In the Python code above, we stored the dataframe in a Python variable called dd. To access it in R, simply use py$dd. Note that what we get back is an R object!

> py_run_file("C:/Users/jay/Desktop/PythonInOffice/r_reticulate/eg.py")
please enter a url:https://en.wikipedia.org/wiki/List_of_S%26P_500_companies

> head(py$dd)
  Symbol    Security SEC filings            GICS Sector              GICS Sub-Industry   Headquarters Location Date first added     CIK
1    MMM          3M     reports            Industrials       Industrial Conglomerates   Saint Paul, Minnesota       1976-08-09   66740
2    AOS A. O. Smith     reports            Industrials              Building Products    Milwaukee, Wisconsin       2017-07-26   91142
3    ABT      Abbott     reports            Health Care          Health Care Equipment North Chicago, Illinois       1964-03-31    1800
4   ABBV      AbbVie     reports            Health Care                Pharmaceuticals North Chicago, Illinois       2012-12-31 1551152
5   ABMD     Abiomed     reports            Health Care          Health Care Equipment  Danvers, Massachusetts       2018-05-31  815094
6    ACN   Accenture     reports Information Technology IT Consulting & Other Services         Dublin, Ireland       2011-07-06 1467373
      Founded
1        1902
2        1916
3        1888
4 2013 (1888)
5        1981
6        1989
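
Because eg.py calls input(), it stops and waits for someone to type a URL, which is awkward when the R program is meant to drive everything. One alternative, sketched below under assumptions, is to keep only the function in the script and let R pass the URL in, for example by calling py$read_web(url) after py_run_file(). The reader parameter is a hypothetical addition so the function can be exercised offline. It is not part of the original script.

```python
def read_web(url, reader=None):
    """Return the first table found at `url`.

    `reader` is a hypothetical hook: any callable that takes a URL and
    returns a list of tables. It defaults to pandas.read_html.
    """
    if reader is None:
        import pandas as pd  # imported lazily, only when the default is needed
        reader = pd.read_html
    tables = reader(url)
    if not tables:
        raise ValueError(f"no tables found at {url}")
    return tables[0]


# Offline demonstration with a stand-in reader instead of a real web request.
fake_tables = [["first table"], ["second table"]]
first = read_web("http://example.invalid", reader=lambda _url: fake_tables)
print(first)  # ['first table']
```

With this layout, the script only defines read_web when it is run, and the R side decides which URL to scrape and when.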

Additional Resources

Get Data From Website Using Python Pandas
