When I first got started using the platform Quantopian, I was like a kid in a candy store. I could pretty quickly test different trading ideas that I couldn’t previously do very easily – especially when it came to using intraday equity prices.
There was one thing, however, that I really struggled with at first. Quantopian gives you access to daily bars and one-minute bars. That’s great, but I wanted to look at 30-minute and 60-minute bars. How do I do that? I knew I should be able to take the one-minute bars and resize them to be larger multi-minute bars (called downsampling). For some reason, how to do this wasn’t being shared (as far as I could tell). It took me longer than it should’ve to figure this out, so hopefully I can save you some time.
The best method that I found to solve this problem was using the Pandas DataFrame resample function. One reason for this is that when we call the data.history function in Quantopian to get one-minute bars, it already returns a pandas DataFrame object.
As of this writing, pandas 0.18.1 is the current version being used by Quantopian (though there are newer versions available). Here’s a link to the resample function’s documentation. When I look at most documentation like this, I want to pull my hair out. Why do they make it so complicated?
I read this, and I still don’t know how to use the function. And it even gives an example of downsampling one-minute data to 3-minute bins. What it doesn’t do, is give an example showing how all the input parameters work. And that’s the problem. Because the input that is most important in our case is the ‘base’ parameter, which states:
For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals. For example, for ‘5min’ frequency, base could range from 0 through 4. Defaults to 0
That’s the best description they could come up with? I still don’t know what the heck that means, and I’m a nerd.
So, I approached this like I do engineering problems – I ran some experiments to figure it out for myself. I setup a test algorithm to do the following:
- Have a variable to enter the desired intraday bar that I want (e.g. 5 minute)
- For simplicity, only look at data for a single equity (e.g. AAPL for Apple)
- Log every one-minute bar for the equity, so I can easily check if the intraday bars are correct
- Resample and log the custom minute bars
The initialize function
So the first step was to write the initialize function that gets called at the very start of the algorithm. See below that I set the desired intraday bar to the variable context.intraday_bar and the stock that I want to track to the variable context.asset. In this example, I used 5 minutes as the intraday bar since it would be easy to check this manually (comparing each bar output to the previous 5 one-minute bars).
The tricky thing to do here was run a function every 5 minutes of the trading day to calculate the custom intraday bars? The US equity markets are open from 9:30am to 4:00pm EST. That’s 6.5 hours or 390 minutes. So I created a for loop, iterating the variable i from 1 to 390 (line 24).
Side note, why do programmers like to use the variable i so much for looping integers? I wondered the same thing, and this was the best answer I found. I’m so glad I learned Python and never used Fortran again after college.
Back on task – i loops from 1 to 390 (total number of minutes in the trading day). The easy way to get it to call a function every 5 minutes is to calculate the modulus (or remainder) of the quotient i/5. To get the value of the modulus only, use i % 5. What we want is when the modulus equals 0, meaning that 5 divides evenly into the value of i. This will be true for i = 5, 10, 15, etc, and is the number of minutes after the market open that we need to calculate a new intraday bar. Line 25 checks for a zero remainder, and when it’s true, line 26 sets up a scheduled function to run that specific minute after the open. The function that is called is one we’ll define later, get_intraday_bar.
""" Intraday bar test By: Aaron Eller aaron@ExcelInTrading.com www.ExcelInTrading.com """ import pandas as pd def initialize(context): """ Quantopian function called once at the start of the algorithm. """ # Verify version of Pandas used log.info('Current version of pandas is {}'.format(pd.__version__)) # Asset to trade context.asset = sid(24) # AAPL # Number of minutes for the desired intraday bars context.intraday_bar = 5 # Update bars every x minutes of trading day for i in range(1, 390): # Loop through max of 390 minutes in a trading day if i % context.intraday_bar == 0: schedule_function(get_intraday_bar, date_rules.every_day(), time_rules.market_open(minutes=i))
The handle_data function
The Quantopian built-in handle_data function is automatically called every minute of the trading day. We want this function to get the past one-minute bar data for our stock and log the results. This data is saved to the variable ohlcv using the data.current function in line 35. Line 37 logs the past one-minute bar’s open, high, low, close, and volume.
def handle_data(context, data): """ Quantopian function called every minute throughout the trading day. """ # Print one minute bars for the asset to check calculations ohlcv = data.current(context.asset, ['open', 'high', 'low', 'close','volume']) log.info('{}: open={}, high={}, low={}, close={}, volume={}'.format( context.asset.symbol, ohlcv['open'], ohlcv['high'], ohlcv['low'], ohlcv['close'], ohlcv['volume']))
Custom get_intraday_bar function
This function is going to get enough past one-minute bars to resample them into the desired 5-minute bars. It gets the past one-minute bars via the data.history function in line 48 and saves them as a pandas DataFrame that I called df.
In the intro, I mentioned that I’ll use the DataFrame resample function to get the custom intraday bars. To do that, I first need to set the rule or the target conversion period. I don’t know why, but they use T to represent minutes. Here’s a list of some of the time series frequencies available. I save the desired rule to the variable resample_period in line 52. Note that the rule must be a string type.
Next, I added a print statement for the resample function in its default form in line 53. If you uncomment this line, you will see the following from the PRINT statements:
DatetimeIndexResampler [freq=<5 * Minutes>, axis=0, closed=left, label=left, convention=start, base=0]
After doing some more digging into how to use this function, I figured out that it can be used with other functions, particularly numpy array functions, including sum, max, min, first, and last. Assume that you have 5 one-minute bars and you want to turn that into a single five-minute bar. The result you want is the following:
- open = first open value
- high = max of all high values
- low = min of all low values
- close = last close value
- volume = sum of all volume values
From here, I created another pandas DataFrame object (result) with the resampled intraday bars. Notice in line 55, I initially create the object, then in lines 56-60 I redefine each column of data. The one thing that tripped me up was the default value of that input parameter base is 0. If I left out base, or set it to 0, it ended up printing the last one-minute bar’s values. This didn’t make sense to me, so I just experimented with the values of base. In the documentation, it stated that for a five-minute bin, base could be a value of 0 to 4. So I tried all of these values and noticed how it changed the end result. It didn’t take long to conclude that what I wanted was base=1. This clearly set the “origin” of the intervals to the first one-minute bar.
def get_intraday_bar(context, data): """ Function calculates historical ohlcv bars for a custom intraday period. """ # Get enough data to form the past 3 intraday bars bar_count = context.intraday_bar * 3 df = data.history(context.asset, ['open', 'high', 'low', 'close', 'volume'], bar_count, '1m') # returns a pandas DataFrame # Resample dataframe for desired intraday bar resample_period = str(context.intraday_bar)+'T' # T = minute frequency #print(df.resample(resample_period)) # print to see resample default values result = df.resample(resample_period, base=1).first() result['open'] = df['open'].resample(resample_period, base=1).first() result['high'] = df['high'].resample(resample_period, base=1).max() result['low'] = df['low'].resample(resample_period, base=1).min() result['close'] = df['close'].resample(resample_period, base=1).last() result['volume'] = df['volume'].resample(resample_period, base=1).sum() # Log the results log.info('{} {} minute bar: O={}, H={}, L={}, C={}, Vol={}'.format( context.asset.symbol, context.intraday_bar, result['open'][-1], result['high'][-1], result['low'][-1], result['close'][-1], result['volume'][-1]))
Summary
Resampling the one-minute bars into custom intraday bars is an easy task, once you figure out how to properly use the DataFrame.resample function. I found the documentation and examples of how to do this to be cumbersome. That’s why I wanted to share this this with you. If you have other algorithmic trading or Quantopian tasks that you find difficult, please reach out to me (aaron (at) excelintrading.com) and I’ll try to help.