Extracting Bitcoin Blockchain Data With Python, RPC, Bitcoind, And Bitcoin Core

Last Updated on July 14, 2022 by Jay

Wondering how to get Bitcoin blockchain data with Python and Bitcoind/RPC? Look no further. You’ll find the answer here!

It took me quite a while to figure this out. However, the process is surprisingly simple. It was about connecting the dots. “You don’t know what you don’t know.” When we don’t know stuff, it’s hard to make things happen.

But don’t worry, I’m here to help you connect the dots so you can get your own copy of authentic Bitcoin Blockchain data.

NOTHING on this blog is financial advice. We only talk about technical stuff here 🙂

What We Need

  • Bitcoin Core Client
  • Python
  • python-bitcoinrpc library

Why Do We Need Bitcoin Data?

I wanted to validate/replicate the stock-to-flow model, which predicts Bitcoin price in relation to the stock-to-flow ratio. I have a complete guide on how to prepare data and re-create the price prediction here: Predicting Bitcoin Price With Stock To Flow Using Python.

Download The Bitcoin Core Client & Blockchain Data

To download the full blockchain, head to the Bitcoin official site: https://bitcoin.org/en/download. Download the Bitcoin Core installation file for your operating system.

What Is Bitcoin Core?

Bitcoin Core is the reference client software for the Bitcoin network. The installer itself is small; once it runs, Bitcoin Core downloads and validates the full Bitcoin blockchain data from the network.

**WARNING** DO NOT ever download a blockchain directly from any link! The whole idea of Bitcoin is decentralization – if you download a blockchain from a single source, it’s likely untrustworthy.

Note that although the official site says you need 7GB for storage, the actual blockchain size is a lot larger than that. We want the full (unpruned) blockchain so that we have all the information. On my computer, the blockchain data is around 414 GB and counting. Make sure you have enough disk space before you start downloading.

Huge Size For Bitcoin Blockchain

What is Bitcoind?

Bitcoind is a command-line based program that’s bundled as a part of Bitcoin Core. There’s a “daemon” folder inside the Bitcoin Core installation folder, and we can find the bitcoind program there.

The bitcoind program provides a JSON-RPC interface that allows us to communicate directly with Bitcoin Core and the Bitcoin blockchain!

Bitcoind program

Bitcoin Core Settings

Once we install Bitcoin Core and run it for the first time, it will ask us to select some settings. Do not check the “Prune block storage to” box. Pruning throws away old block data to save disk space, and we want the full blockchain. With a fast Internet connection and a fast computer, it will probably take 12-24 hours to download the whole blockchain, so just be patient with it. I tried to use my slow laptop to download it, and it took over a week…

This next step is important: we need it to communicate with the Bitcoin Core program using Python. In Settings -> Options, we need to add the configuration file, and we can do this while the blockchain is downloading. Click on “Open Configuration File” and a text editor (Notepad on Windows) will pop up. Enter the following text inside the file. Note that I literally just used “username” for the username and “password” for the password. This will be a local connection (127.0.0.1), so for this tutorial it doesn’t much matter which username and password we set. Restart Bitcoin Core after saving the file so the new settings take effect.

Bitcoin Core Settings
server = 1
rpcbind = 127.0.0.1:8332
rpcuser = username
rpcpassword = password
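
To avoid typing the credentials a second time in Python, the same settings can be read back from the configuration text. Here is a minimal sketch, assuming the file only contains simple key = value lines like the ones above. Note that parse_conf and rpc_url are hypothetical helper names made up for this example; they are not part of Bitcoin Core or python-bitcoinrpc.

```python
def parse_conf(text):
    """Parse simple key=value lines, ignoring blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def rpc_url(settings):
    """Build a connection URL from the parsed settings (hypothetical helper)."""
    # Assumption: rpcbind holds host:port; fall back to the local default otherwise.
    host_port = settings.get("rpcbind", "127.0.0.1:8332")
    return "http://%s:%s@%s" % (settings["rpcuser"], settings["rpcpassword"], host_port)


# The same four lines we entered in the configuration file.
example = """
server = 1
rpcbind = 127.0.0.1:8332
rpcuser = username
rpcpassword = password
"""
url = rpc_url(parse_conf(example))
```

In a real script, the text would come from reading the configuration file itself instead of the example string.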

What Is RPC?

What we just set up is the configuration for RPC (Remote Procedure Call), the interface that lets other programs send commands to Bitcoin Core and get answers back. With RPC, we can query blockchain information such as blocks and transactions. We can also send Bitcoins, but that’s for another tutorial.
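
To make the idea less abstract, here is a rough sketch of what one RPC call looks like on the wire, using only the Python standard library. It assumes the username, password and port from the configuration file above. The helper names build_rpc_request and call_rpc are made up for this illustration, and call_rpc only works while Bitcoin Core is running.

```python
import base64
import json
import urllib.request


def build_rpc_request(method, params=None, request_id=0):
    """Return the JSON body for one JSON-RPC call, encoded as bytes."""
    payload = {
        "jsonrpc": "1.0",
        "id": request_id,
        "method": method,
        "params": params or [],
    }
    return json.dumps(payload).encode("utf-8")


def call_rpc(method, params=None, user="username", password="password",
             url="http://127.0.0.1:8332/"):
    """POST one call to a running node and return the "result" field."""
    # HTTP basic authentication: base64 of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        url,
        data=build_rpc_request(method, params),
        headers={"Authorization": "Basic " + token,
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["result"]


# Building the request body needs no running node, so we can inspect it safely.
body = build_rpc_request("getblockstats", [1])
```

A library such as python-bitcoinrpc wraps this request/response cycle for us, so a call like call_rpc("getblockcount") becomes a plain method call.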

Python Module & Connect to Bitcoin RPC

We need the module python-bitcoinrpc for communicating with the Bitcoin client/blockchain with Python.

pip install python-bitcoinrpc

While the Bitcoin Core client is open, run the following code in your Python IDE to test the connection first. Note the “username” and “password” need to match the values we just set in the RPC configuration file.

from bitcoinrpc.authproxy import AuthServiceProxy, JSONRPCException
rpc_connection = AuthServiceProxy("http://%s:%s@127.0.0.1:8332"%("username", "password"))
num_blocks = rpc_connection.getblockcount()

If the Bitcoin Core program isn’t open, there will be an error message that looks like this:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\jay\Desktop\PythonInOffice\bitcoin_price_prediction\venv\lib\site-packages\bitcoinrpc\authproxy.py", line 132, in __call__
    self.__conn.request('POST', self.__url.path, postdata,
  File "C:\Program Files (x86)\Python397\lib\http\client.py", line 1279, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Program Files (x86)\Python397\lib\http\client.py", line 1290, in _send_request
    self.putrequest(method, url, **skips)
  File "C:\Program Files (x86)\Python397\lib\http\client.py", line 1116, in putrequest
    raise CannotSendRequest(self.__state)
http.client.CannotSendRequest: Request-sent
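
If you’d rather get a friendly message than a traceback, the connection test can be wrapped in a small helper. This is only a sketch: try_block_count is a hypothetical name, and it assumes the failure shows up as an OSError (for example a refused connection) or as an http.client error like the one in the traceback above. The _OfflineNode class is just a stand-in so the example runs without Bitcoin Core.

```python
import http.client


def try_block_count(rpc):
    """Return the block height, or None if the node could not be reached.

    `rpc` is any object with a getblockcount() method, such as the
    rpc_connection object created earlier.
    """
    try:
        return rpc.getblockcount()
    except (OSError, http.client.HTTPException) as err:
        # ConnectionRefusedError is an OSError; CannotSendRequest is an HTTPException.
        print(f"Could not reach Bitcoin Core: {err!r}")
        return None


class _OfflineNode:
    """Stand-in that fails the way a closed Bitcoin Core might."""

    def getblockcount(self):
        raise ConnectionRefusedError("connection refused")


offline_result = try_block_count(_OfflineNode())
```

With the real connection, try_block_count(rpc_connection) would return the current block height while Bitcoin Core is open.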

At the time of writing, the latest block was #720,775. I use this site to check blockchain information. It means that our num_blocks should also equal 720,775 once Bitcoin Core has finished downloading the blockchain.

Bitcoin Explorer

If num_blocks matches the latest block number on the blockchain explorer website, congratulations! You have successfully established a connection with the Bitcoin RPC.

Collect Data From Bitcoin Core Using Python & RPC/Bitcoind

As you can see, there are 700k+ blocks at the time of writing. With Python, Bitcoin Core, Bitcoind and RPC, we will be able to extract the Bitcoin blockchain data easily.

It would be a memory-intensive task if we were to download all the block data at once. Also, if our code failed or crashed for whatever reason during the download, we would have to start over again.

So we are going to download the Bitcoin blockchain data in chunks. I’m just picking 50,000 blocks per chunk. Feel free to set your own number that fits your machine.

chunk_size = 50000
chunks = int(num_blocks / chunk_size)

Our calculation gives chunks = 14, meaning that we need to download 50,000 blocks 14 times, plus the remaining ~20,775 blocks.
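
The chunk arithmetic can be double-checked with a tiny generator. This is just an illustrative sketch, and chunk_ranges is a made-up helper name. It yields inclusive (start, end) block heights and starts at height 1, like the download code in this post.

```python
def chunk_ranges(first_block, last_block, chunk_size):
    """Yield (start, end) pairs, both inclusive, covering first_block..last_block."""
    start = first_block
    while start <= last_block:
        end = min(start + chunk_size - 1, last_block)
        yield start, end
        start = end + 1


ranges = list(chunk_ranges(1, 720775, 50000))
# Chunks that contain exactly 50,000 blocks.
full_chunks = [r for r in ranges if r[1] - r[0] + 1 == 50000]
# Size of the final, partial chunk.
remainder = ranges[-1][1] - ranges[-1][0] + 1
```

For 720,775 blocks this gives 14 full chunks plus one last range holding the remaining 20,775 blocks.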

I’m storing the data into a SQLite database for efficiency & simplicity reasons. Check out this tutorial to learn how to work with a SQLite database.

You can also save the data into a .csv file. But personally, I feel CSV is slow when retrieving the data, and there’s a chance that the file becomes hard to open once it gets too large. I strongly recommend using a real database for storing data. SQLite is a beginner-friendly and easy-to-use database application.

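import sqlite3
import pandas as pd

def initial_load():
    with sqlite3.connect('bitcoin_blockchain.db') as conn:
        # Only loop over the full chunks; update_chain() picks up the remainder
        for c in range(chunks):
            # Start at height 1 and include heights that are multiples of chunk_size
            first = max(1, c*chunk_size)
            block_stats = [rpc_connection.getblockstats(i) for i in range(first, (c+1)*chunk_size)]

            df = pd.DataFrame(block_stats)
            # SQLite cannot store a Python list, so save the percentiles as text
            df['feerate_percentiles'] = df['feerate_percentiles'].astype(str)
            df.to_sql('blockchain', conn, if_exists='append')
            print(f'finished blocks up to {(c+1)*chunk_size - 1}')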

The above function gets up to 50,000 block stats per iteration and loops 14 times, for a total of 699,999 blocks. Inside the for loop, .getblockstats(i) downloads the stats for the block at height i. Then we put the info into a pandas dataframe to organize it for easy access, and upload it into the SQLite database.
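
One detail worth a quick look is the feerate_percentiles column. Each block’s value is a list, which SQLite can’t store directly, so the code converts it to text first. The small sketch below shows the round trip, assuming the list contains only integers. The numbers are made up for illustration.

```python
import json

# Example value in the shape of a feerate_percentiles entry
# (the numbers here are made up for illustration).
percentiles = [1, 2, 5, 10, 20]

stored = str(percentiles)        # the text that astype(str) produces
restored = json.loads(stored)    # turn the stored text back into a list
```

So after reading the table back later, json.loads can turn the text column into lists again.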

Function For Future Updates

But aren’t there 720,775 blocks (and counting…) in total? Indeed, that’s why we need another function to grab the rest of the blocks. Another reason for having this next function is to help with future downloads. Let’s say today we get all 720,775 blocks. A month later there will be several thousand new blocks, and we need an easy way to get those. Of course, feel free to combine the future updates and initial download into one function. I just chose to separate them.

def update_chain(start_block):
    num_blocks = rpc_connection.getblockcount()
    block_stats = [rpc_connection.getblockstats(i) for i in range(start_block, num_blocks+1)]
    df = pd.DataFrame(block_stats)
    df['feerate_percentiles'] = df['feerate_percentiles'].astype(str)
    with sqlite3.connect('bitcoin_blockchain.db') as conn:
        df.to_sql('blockchain', conn,if_exists='append') 

update_chain(700000)

The above code will start downloading block data from #700,000 and grab everything onwards. It looks very similar to our first function. In hindsight, it probably makes sense to generalize the first function better so we don’t need to write the same code twice. But whatever, it was way faster to change a few things to come up with the 2nd function instead of writing perfect, generalized code that considers every scenario. If you are up for the challenge, create a function that’s capable of downloading the full blockchain as well as making future updates, and feel free to share your code in the comments!
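
As a small step towards that challenge, here is one possible building block: a helper that looks up the highest block already saved, so the next run knows where to resume. It’s only a sketch. The name last_stored_height is made up, and it assumes the table has a height column, which is one of the fields getblockstats returns. The demo uses an in-memory database so it runs without the real .db file.

```python
import sqlite3


def last_stored_height(conn, table="blockchain"):
    """Return the highest block height already saved, or 0 if nothing is saved."""
    try:
        row = conn.execute(f"SELECT MAX(height) FROM {table}").fetchone()
    except sqlite3.OperationalError:
        # The table does not exist yet, so no blocks have been downloaded.
        return 0
    return row[0] if row[0] is not None else 0


# Demonstration with an in-memory database standing in for bitcoin_blockchain.db.
demo = sqlite3.connect(":memory:")
empty_height = last_stored_height(demo)
demo.execute("CREATE TABLE blockchain (height INTEGER)")
demo.executemany("INSERT INTO blockchain VALUES (?)", [(1,), (2,), (3,)])
resume_from = last_stored_height(demo) + 1
```

With the real database, calling update_chain(last_stored_height(conn) + 1) should pick up wherever the previous run stopped, whether that was the first run or the tenth.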

It can take a little time to download the data for all 700k blocks, so be patient. The final product is a single .db file of about 500MB.

Putting It Together

Here’s all the code we need to extract Bitcoin blockchain data using Python, Bitcoin Core, Bitcoind and RPC. Feel free to tweak it and combine the two download functions into one 🙂

from bitcoinrpc.authproxy import AuthServiceProxy, JSONRPCException
import pandas as pd
import sqlite3

# rpc_user and rpc_password are set in the bitcoin.conf file
rpc_connection = AuthServiceProxy("http://%s:%s@127.0.0.1:8332"%('username', 'password'))

# Current height of our local copy of the blockchain
num_blocks = rpc_connection.getblockcount()

chunk_size = 50000
chunks = num_blocks // chunk_size  # number of full chunks

def initial_load():
    # Only loop over the full chunks; update_chain() picks up the remainder
    for c in range(chunks):
        # Start at height 1 and include heights that are multiples of chunk_size
        first = max(1, c*chunk_size)
        block_stats = [rpc_connection.getblockstats(i) for i in range(first, (c+1)*chunk_size)]

        df = pd.DataFrame(block_stats)
        # SQLite cannot store a Python list, so save the percentiles as text
        df['feerate_percentiles'] = df['feerate_percentiles'].astype(str)

        with sqlite3.connect('bitcoin_blockchain.db') as conn:
            df.to_sql('blockchain', conn, if_exists='append')
        print(f'finished blocks up to {(c+1)*chunk_size - 1}')


def update_chain(start_block):
    # Re-check the height so blocks added since the script started are included
    num_blocks = rpc_connection.getblockcount()
    block_stats = [rpc_connection.getblockstats(i) for i in range(start_block, num_blocks+1)]
    df = pd.DataFrame(block_stats)
    df['feerate_percentiles'] = df['feerate_percentiles'].astype(str)
    with sqlite3.connect('bitcoin_blockchain.db') as conn:
        df.to_sql('blockchain', conn, if_exists='append')


initial_load()
# initial_load() stops just before this height (700000 in our case), so continue from here
start = max(1, chunks*chunk_size)
update_chain(start)
