Does pairs trading still work (if you’re not a hedge fund)? [2021]

Does pairs trading still work (if you’re not a hedge fund)? [2021]

In this article, we will look at a popular algorithm for hedge funds and high-frequency traders called pairs trading. In its most basic form, you select a pair of stocks that historically move together and trade when they diverge. The expectation is that as they return to a normal state, you will make money.

This is still a workable strategy for high frequency hedge fund traders who can co-locate giant server farms in exchange datacenters, but can it still work for average traders who don’t have these resources? Or are all of these statistical arbitrage opportunities gone by the time an average trader can execute? Can this strategy be run at a daily/weekly frequency anymore or does it have to be run in milliseconds?

We will run some tests on the basic strategy to see if it can work today.

Historical profitability of pairs trading

Pairs trading has been in use for decades, and has been profitable — even at weekly frequencies. In 2006, a study was done by Yale on its effectiveness. The study is worth reading: Pairs Trading: Performance of a Relative Value Arbitrage Rule.

We find that trading suitably formed pairs of stocks exhibits profits, which are robust to
conservative estimates of transaction costs. These profits are uncorrelated to the S&P 500,
however they do exhibit low sensitivity to the spreads between small and large stocks and
between value and growth stocks in addition to the spread between high grade and intermediate
grade corporate bonds and shifts in the yield curve. In addition to risk and transactions cost, we
rule out several explanations for the pairs trading profits, including mean-reversion as previously
documented in the literature, unrealized bankruptcy risk, and the inability of arbitrageurs to take
advantage of the profits due to short-sale constraints.

Pairs Trading: Performance of a Relative Value Arbitrage Rule

However, this study does find that opportunities are much less frequent and much less profitable in 2000- timeframe.

One view of the lower profitability of pairs trading in recent year is that returns are
competed away by increased hedge fund activity. The alternative view, taken in this paper, is that
abnormal returns to pairs strategies are a compensation to arbitrageurs for enforcing the “Law of
One Price”.

Pairs Trading: Performance of a Relative Value Arbitrage Rule

Building a pairs trading model

To determine whether pairs trading still works, we will model a basic pairs trading strategy in Python using historical data, and then see if we can use that strategy to generate excess return in the next period.

We will use a lot of loading code that we wrote in the Getting Started article, which you should read before this one.

Our first task is loading the data. Our data comes from Quandl in a set of CSV files. We will load them all into a DataFrame using pandas:

import pandas as pd  
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import quandl
import pickle
import csv
import os
from datetime import date
import sys
import statsmodels.api as sm
import scipy.optimize as sco
plt.style.use('fivethirtyeight')
np.random.seed(777)


alldata = pd.DataFrame()

try: 
    ticker_data = pd.read_csv("tickers.csv")
except FileNotFoundError as e:
    pass

for t in ticker_data['ticker']:
    
    try:
        data = pd.read_csv("d:\\stockdata\\" + t + ".csv", 
                            header = None,
                            usecols = [0, 1, 12],
                            names = ['ticker', 'date', 'adj_close'])
    
        alldata = alldata.append(data)
        
    except FileNotFoundError as e:
        pass

Our ticker CSV file contains a lot of tickers that we don’t want to look at, such as delisted ones and OTC stocks, so we will filter it before loading:

ticker_data = ticker_data.loc[(ticker_data['table'] == 'SF1') & (ticker_data['isdelisted'] == 'N') & (ticker_data['currency'] == 'USD') & ((ticker_data.exchange == 'NYSE') | (ticker_data.exchange == 'NYSEMKT') | (ticker_data.exchange == 'NASDAQ'))]

Now, we have all of the relevant tickers in a DataFrame. The next step is cleaning the data so that we can build a correlation matrix. We will use daily data at first. Later, we can move to longer timeframes.

df = alldata.set_index('date')
table = df.pivot(columns='ticker')
table.columns = [col[1] for col in table.columns]

table.index = pd.to_datetime(table.index)
table = table['2018-01-01':'2020-01-01']

table = table.loc[:, (table.isnull().sum(axis=0) <= 30)]
table = table.dropna(axis='rows')

Finally, generating a correlation matrix is very simple with Pandas:

             A       AAL      AAME       AAN      AAON       AAP      AAPL  \
A     1.000000 -0.411075 -0.423860  0.548792  0.677535  0.439531  0.515594   
AAL  -0.411075  1.000000  0.762728 -0.675839 -0.625932 -0.729448 -0.526901   
AAME -0.423860  0.762728  1.000000 -0.544478 -0.579686 -0.578142 -0.637582   
AAN   0.548792 -0.675839 -0.544478  1.000000  0.857118  0.498967  0.648432   
AAON  0.677535 -0.625932 -0.579686  0.857118  1.000000  0.484853  0.624596   
       ...       ...       ...       ...       ...       ...       ...   
ZIOP  0.296149 -0.085815 -0.192385  0.472562  0.574997 -0.224872  0.353245   
ZIXI  0.440557 -0.699289 -0.566404  0.679136  0.761478  0.583238  0.284436   
ZNGA  0.631641 -0.730482 -0.702510  0.844649  0.898749  0.453662  0.625134   
ZUMZ  0.562601 -0.421862 -0.498312  0.619821  0.548948  0.285115  0.796054   
ZVO  -0.423629  0.493856  0.469000 -0.674794 -0.682076 -0.058555 -0.525241   

Finding pair trading opportunities

With the correlation matrix, we can get a lot of information. Using a simple sort, we can get the top correlations:

XEL   AEP     0.996523
V     MA      0.994791
NEE   ETR     0.994646
WEC   XEL     0.994444
  
OPTT  AEE    -0.972316
SHIP  AMT    -0.972656
SO    DYNT   -0.972725
AEE   FCEL   -0.973476
OGE   OPTT   -0.976275

As we can see, this is consistent with what the paper found: the most correlated stocks are for energy companies. Let’s look at one of these on a chart:

Comparison of XEL and AEP pairs trading charts. Credit: Yahoo
Comparison of XEL and AEP pairs trading charts. Credit: Yahoo

To determine the signal for when a trade should occur, we look at the difference between the two stocks:

In [22]: diff = table['XEL']-table['AEP']

In [23]: diff.mean()
Out[23]: -24.10590632758745

In [24]: diff.std()
Out[24]: 3.148618069119123

Now, we can say with a reasonably high level of confidence that if the two stocks diverge by more than 2 standard deviations (or ~ 6.3), there is a trading opportunity. Let’s see if that happens in the period following this one:


> next_period = df.pivot(columns='ticker')
> next_period.columns = [col[1] for col in next_period.columns]
> next_period.index = pd.to_datetime(next_period.index)

> next_period = next_period['2020-01-01':]
> next_diff = next_period['XEL']-next_period['AEP']
> next_diff.loc[next_diff < -30.4]
date
2020-01-15   -30.736753
2020-01-16   -30.877687
2020-01-17   -31.762922
2020-01-21   -31.903163
2020-01-22   -32.340503
2020-01-23   -32.607094
2020-01-24   -33.502785
2020-01-27   -33.378315
2020-01-28   -33.482761
2020-01-29   -33.890905
2020-01-30   -33.828830
2020-01-31   -33.521868
2020-02-03   -33.265412
2020-02-04   -32.095119
2020-02-05   -31.362759
2020-02-06   -30.806166
2020-02-07   -31.866599
2020-02-10   -32.091640
2020-02-11   -32.485245
2020-02-12   -31.993412
2020-02-13   -32.256399
2020-02-14   -32.717131
2020-02-18   -32.389837
2020-02-19   -31.938717
2020-02-20   -30.955869

So according to the pairs trading algorithm provided, we would open the trade on January 15, 2020 (XEL @ 63.59, AEP @ 94.33). The algorithm closes the trade when the two stocks cross their mean difference again. This happens on March 11, 2020 (XEL @ 65.90, AEP @ 88.21). This trade would therefore earn us $8.43, ignoring transaction costs.

A real test of the pairs trading algorithm

So in the one example, we did earn a positive return by following the pairs trading algorithm. The question is: does this generalize, and can we beat the market with this strategy? We will now look at a much larger number of correlated pairs.

First, we need to codify the trading algorithm. For readability, we will iterate over the DataFrames, although in practice there are much better ways to do this. We define a function to calculate per-trade profit

def calc_trade_profit(ticker1, ticker2, start_date, end_date):
    if next_period.loc[start_date][ticker1] &gt; next_period.loc[start_date][ticker2]:
        high_ticker = ticker1  
        low_ticker = ticker2

    else:
        high_ticker = ticker2
        low_ticker = ticker1
        
    short_profit = next_period.loc[start_date][high_ticker] - next_period.loc[end_date][high_ticker]
    long_profit = next_period.loc[end_date][low_ticker] - next_period.loc[start_date][low_ticker]

    return long_profit + short_profit

Now, we iterate over each day in the period. For each day, we will iterate over the correlated pairs to find whether they trade on that day. If the difference between the prices of the two stocks is greater than the threshold (and was less yesterday), we open the trade. If the trade is open, and the difference between the prices goes below the mean, we close the trade.

Here’s our code:

print("starting trades")
for index, row in next_period.iterrows():
    # iterate over the days of the next period
    
    for sindex, srow in corrdata.iterrows():
        # iterate over the correlated pairs
        
        ticker1 = sindex[0]
        ticker2 = sindex[1]
        
        trade = ticker1 + '-' + ticker2
        
        mean = srow['mean']
        std = srow['std']
        
        if not trade in trades:
            trades[trade] = { 'open' : False, 'start' : None, 'earned' : 0. }
        
        threshold = abs(mean) + 2*std
        print(trade + " " + str(row[ticker1]) + " " + str(row[ticker2]) + " " + str(threshold) + " " + str(abs(row[ticker1] - row[ticker2])))

        if ticker1 in prevrow:
            if abs(row[ticker1] - row[ticker2]) &gt; threshold and abs(prevrow[ticker1] - prevrow[ticker2] &lt;= threshold):
                #open a trade if not open
                if not trades[trade]['open']:
                    print("opening trade " + ticker1 + "-" + ticker2 + ": " + str(index))
                    trades[trade]['start'] = index
                    trades[trade]['open'] = True
            
            if abs(row[ticker1] - row[ticker2]) &lt; abs(mean) and abs(prevrow[ticker1] - prevrow[ticker2]) &gt;= abs(mean):
                #close an open trade
                if trades[trade]['open']:
                    print("closing trade " + ticker1 + "-" + ticker2 + ": " + str(index))
                    trades[trade]['earned'] = trades[trade]['earned'] + calc_trade_profit(ticker1, ticker2, trades[trade]['start'], index)
                    trades[trade]['start'] = None
                    trades[trade]['open'] = False
                
    prevrow = row

Finally, we need to close all open trades on the last day (these are the ones that may be losers):

#now close all of the open trades on the last day of the period
last_day = next_period.index[-1]

for sindex, srow in corrdata.iterrows():
    # iterate over the correlated pairs
    
    ticker1 = sindex[0]
    ticker2 = sindex[1]
    
    trade = ticker1 + '-' + ticker2
    if (trade in trades) and (trades[trade]['open']):
        print("last day: closing trade " + ticker1 + "-" + ticker2)
        trades[trade]['earned'] = trades[trade]['earned'] + calc_trade_profit(ticker1, ticker2, trades[trade]['start'], last_day)
        trades[trade]['start'] = None
        trades[trade]['open'] = False
        
    
    
print("total earned: " + str(functools.reduce(lambda acc,v: acc+trades[v]['earned'], trades, 0)))

Now, we run this algorithm on the top 100 correlated pairs:

XEL   AEP     0.996523
V     MA      0.994791
NEE   ETR     0.994646
WEC   XEL     0.994444
      CMS     0.994384
SUI   ELS     0.993536
CMS   XEL     0.993018
AWK   XEL     0.992827
UDR   AVB     0.992741
      AIV     0.992524
WEC   AWK     0.992214
HE    AEP     0.992158
SUI   ETR     0.991948
      MAA     0.991770
CMS   LNT     0.991266
ADC   O       0.991101
HE    XEL     0.991046
OPTT  FCEL    0.990808
AON   AJG     0.990615
WELL  PEAK    0.990414
AIV   AVB     0.990408
TRNO  PLD     0.990302
NEE   LNT     0.990216
AEP   AWK     0.989953
POR   AEP     0.989822
CMS   AEP     0.989687
O     AIV     0.989058
      UDR     0.989025
POR   XEL     0.988931
PEAK  O       0.988882
DRE   EGP     0.988872
SO    ETR     0.988799
DRE   FR      0.988766
AWK   AMT     0.988565
PNM   XEL     0.988536
HE    AWK     0.988447
ELS   ETR     0.988435
O     NNN     0.988409
EGP   ELS     0.988395
MSFT  V       0.988302
SUI   NEE     0.987992
CMS   POR     0.987979
CVV   BLIN    0.987850
OGE   AEE     0.987712
AEP   WEC     0.987688
CMS   AWK     0.987651
TRNO  SUI     0.987628
PEAK  ADC     0.987585
LNT   ETR     0.987326
AEP   PNM     0.987310
PEAK  UDR     0.987200
POR   AVB     0.987179
AVB   CPT     0.987089
EQR   ESS     0.986930
LNT   WEC     0.986684
USM   TDS     0.986674
WPC   PSB     0.986629
NWE   DTE     0.986558
CPT   EQR     0.986507
AEE   DTE     0.986407
XEL   AMT     0.986397
EGP   SUI     0.986270
ATO   CMS     0.986256
NEE   FE      0.986243
AEE   NWE     0.986101
CMS   EQR     0.986014
EGP   STWD    0.985946
MA    MSFT    0.985910
CMS   FE      0.985907
NWE   POR     0.985815
NEE   ELS     0.985728
CCI   AMT     0.985701
MT    HAL     0.985365
EQR   UDR     0.985344
MAA   ETR     0.985262
MPW   O       0.985165
EQR   AVB     0.985094
BAH   ARCC    0.984924
MA    FICO    0.984884
HE    WEC     0.984878
AEP   AMT     0.984747
SUI   SO      0.984630
NEE   SO      0.984616
WEC   ETR     0.984588
PNM   AWK     0.984569
FR    TRNO    0.984532
HE    PNM     0.984259
BKH   NWE     0.984165
STE   BAH     0.984014
AWR   WEC     0.983979
V     ARCC    0.983937
UDR   ADC     0.983922
MAA   TRNO    0.983697
PSB   WELL    0.983626
MAA   NEE     0.983566
VRSK  MSI     0.983533
BAH   V       0.983492
HE    AMT     0.983375
AWK   POR     0.983327
FE    LNT     0.983292

Results of the pairs trading algorithm

From the period of January 1, 2020 to August 18, 2020, the algorithm would have earned a profit of $284, by trading one long and one short share at each signal. Since each long is paired with a larger short, you could achieve this with essentially $0 initial investment (ignoring trading costs), making the return essentially infinite. Moreover, if you run the model from January 1, 2020 to April 1, 2020, you would have seen a significant profit, even while the S&P 500 had dropped by almost 25%.

However, in reality, whenever you open a short position, you assume a liability.

Conclusion: does pairs trading still work in 2021?

Our model shows that pairs trading can work and be profitable in 2021. The question of how profitable, and whether it is worth the risk is something we will tackle in a future article.

0

No Comments

No comments yet

Leave a Reply

Your email address will not be published.