{"id":299,"date":"2021-01-25T13:55:16","date_gmt":"2021-01-25T13:55:16","guid":{"rendered":"https:\/\/firemymoneymanager.com\/?p=299"},"modified":"2021-08-05T14:02:10","modified_gmt":"2021-08-05T14:02:10","slug":"does-pairs-trading-still-work","status":"publish","type":"post","link":"https:\/\/firemymoneymanager.com\/does-pairs-trading-still-work\/","title":{"rendered":"Does pairs trading still work (if you’re not a hedge fund)? [2021]"},"content":{"rendered":"\n

In this article, we will look at a popular algorithm for hedge funds and high-frequency traders called pairs trading. In its most basic form, you select a pair of stocks that historically move together and trade when they diverge. The expectation is that as they return to a normal state, you will make money.<\/p>\n\n\n\n

This is still a workable strategy for high-frequency hedge fund traders who can co-locate giant server farms in exchange datacenters, but can it still work for average traders who don’t have these resources? Or are all of these statistical arbitrage opportunities gone by the time an average trader can execute? Can this strategy still be run at a daily or weekly frequency, or does it have to be run in milliseconds?<\/p>\n\n\n\n

We will run some tests on the basic strategy to see if it can work today.<\/p>\n\n\n\n

Historical profitability of pairs trading<\/h2>\n\n\n\n

Pairs trading has been in use for decades, and has been profitable — even at weekly frequencies. In 2006, a study was done by Yale on its effectiveness. The study is worth reading: Pairs Trading: Performance of a Relative Value Arbitrage Rule<\/a>.<\/p>\n\n\n\n

We find that trading suitably formed pairs of stocks exhibits profits, which are robust to
conservative estimates of transaction costs. These profits are uncorrelated to the S&P 500,
however they do exhibit low sensitivity to the spreads between small and large stocks and
between value and growth stocks in addition to the spread between high grade and intermediate
grade corporate bonds and shifts in the yield curve. In addition to risk and transactions cost, we
rule out several explanations for the pairs trading profits, including mean-reversion as previously
documented in the literature, unrealized bankruptcy risk, and the inability of arbitrageurs to take
advantage of the profits due to short-sale constraints.<\/p>
Pairs Trading: Performance of a Relative Value Arbitrage Rule<\/a><\/cite><\/blockquote>\n\n\n\n

However, the study also finds that opportunities are much less frequent and much less profitable in the timeframe from 2000 onward.<\/p>\n\n\n\n

One view of the lower profitability of pairs trading in recent year is that returns are
competed away by increased hedge fund activity. The alternative view, taken in this paper, is that
abnormal returns to pairs strategies are a compensation to arbitrageurs for enforcing the \u201cLaw of
One Price\u201d.<\/p>
Pairs Trading: Performance of a Relative Value Arbitrage Rule<\/a><\/cite><\/blockquote>\n\n\n\n

Building a pairs trading model<\/h2>\n\n\n\n

To determine whether pairs trading still works, we will model a basic pairs trading strategy in Python using historical data, and then see if we can use that strategy to generate excess return in the next period.<\/p>\n\n\n\n

We will use a lot of loading code that we wrote in the Getting Started<\/a> article, which you should read before this one.<\/p>\n\n\n\n

Our first task is loading the data. Our data comes from Quandl in a set of CSV files. We will load them all into a DataFrame using pandas:<\/p>\n\n\n\n

import pandas as pd  \nimport numpy as np\nimport matplotlib.pyplot as plt\nimport seaborn as sns\nimport quandl\nimport pickle\nimport csv\nimport os\nfrom datetime import date\nimport sys\nimport statsmodels.api as sm\nimport scipy.optimize as sco\nplt.style.use('fivethirtyeight')\nnp.random.seed(777)\n\n\n# tickers.csv is required, so let a missing file raise here instead of failing later\nticker_data = pd.read_csv(\"tickers.csv\")\n\n# collect one DataFrame per ticker, then concatenate once at the end\nframes = []\n\nfor t in ticker_data['ticker']:\n    \n    try:\n        data = pd.read_csv(\"d:\\\\stockdata\\\\\" + t + \".csv\", \n                            header = None,\n                            usecols = [0, 1, 12],\n                            names = ['ticker', 'date', 'adj_close'])\n    \n        frames.append(data)\n        \n    except FileNotFoundError as e:\n        # skip tickers that have no price file\n        pass\n\n# DataFrame.append was removed in pandas 2.0, so use pd.concat\nalldata = pd.concat(frames, ignore_index=True)\n<\/pre>\n\n\n\n

Our ticker CSV file contains a lot of tickers that we don’t want to look at, such as delisted ones and OTC stocks, so we will filter it before loading:<\/p>\n\n\n\n

ticker_data = ticker_data.loc[(ticker_data['table'] == 'SF1') &amp; (ticker_data['isdelisted'] == 'N') &amp; (ticker_data['currency'] == 'USD') &amp; ((ticker_data.exchange == 'NYSE') | (ticker_data.exchange == 'NYSEMKT') | (ticker_data.exchange == 'NASDAQ'))]\n<\/pre>\n\n\n\n

Now, we have all of the relevant tickers in a DataFrame. The next step is cleaning the data so that we can build a correlation matrix. We will use daily data at first. Later, we can move to longer timeframes.<\/p>\n\n\n\n

df = alldata.set_index('date')\ntable = df.pivot(columns='ticker')\ntable.columns = [col[1] for col in table.columns]\n\ntable.index = pd.to_datetime(table.index)\ntable = table['2018-01-01':'2020-01-01']\n\ntable = table.loc[:, (table.isnull().sum(axis=0) &lt;= 30)]\ntable = table.dropna(axis='rows')<\/pre>\n\n\n\n

Finally, generating a correlation matrix is very simple with Pandas:<\/p>\n\n\n\n

             A       AAL      AAME       AAN      AAON       AAP      AAPL  \\\nA     1.000000 -0.411075 -0.423860  0.548792  0.677535  0.439531  0.515594   \nAAL  -0.411075  1.000000  0.762728 -0.675839 -0.625932 -0.729448 -0.526901   \nAAME -0.423860  0.762728  1.000000 -0.544478 -0.579686 -0.578142 -0.637582   \nAAN   0.548792 -0.675839 -0.544478  1.000000  0.857118  0.498967  0.648432   \nAAON  0.677535 -0.625932 -0.579686  0.857118  1.000000  0.484853  0.624596   \n       ...       ...       ...       ...       ...       ...       ...   \nZIOP  0.296149 -0.085815 -0.192385  0.472562  0.574997 -0.224872  0.353245   \nZIXI  0.440557 -0.699289 -0.566404  0.679136  0.761478  0.583238  0.284436   \nZNGA  0.631641 -0.730482 -0.702510  0.844649  0.898749  0.453662  0.625134   \nZUMZ  0.562601 -0.421862 -0.498312  0.619821  0.548948  0.285115  0.796054   \nZVO  -0.423629  0.493856  0.469000 -0.674794 -0.682076 -0.058555 -0.525241   <\/code><\/pre>\n\n\n\n

Finding pair trading opportunities<\/h2>\n\n\n\n

With the correlation matrix, we can get a lot of information. Using a simple sort, we can get the top correlations:<\/p>\n\n\n\n

XEL   AEP     0.996523\nV     MA      0.994791\nNEE   ETR     0.994646\nWEC   XEL     0.994444\n  \nOPTT  AEE    -0.972316\nSHIP  AMT    -0.972656\nSO    DYNT   -0.972725\nAEE   FCEL   -0.973476\nOGE   OPTT   -0.976275<\/code><\/pre>\n\n\n\n
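The code that produces the correlation matrix and the sorted list is not shown above. As a rough sketch of how it could be done, the block below builds a small synthetic price table (the tickers AAA, BBB and CCC and all of the numbers are made up for illustration), calls the pandas corr method, and keeps only the upper triangle so that each pair is listed once. This is one possible approach, not necessarily the one used to generate the output above.

```python
import numpy as np
import pandas as pd

# Toy price table standing in for the article's `table` (rows are days,
# columns are tickers). All tickers and values here are invented.
rng = np.random.default_rng(0)
base = rng.normal(0, 1, 250).cumsum() + 100
toy = pd.DataFrame({
    'AAA': base,
    'BBB': base * 1.5 + rng.normal(0, 0.1, 250),  # tracks AAA closely
    'CCC': rng.normal(0, 1, 250).cumsum() + 50,   # independent random walk
})

# Pairwise Pearson correlation of the price columns.
corr = toy.corr()

# Keep the strict upper triangle so each pair appears once and the
# diagonal (every ticker against itself) is dropped.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna().sort_values(ascending=False)

print(pairs)
```

The result is a Series indexed by ticker pairs, which is the same shape as the sorted list shown above. Using head and tail on it gives the most positively and most negatively correlated pairs.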

As we can see, this is consistent with what the paper found: the most correlated stocks are mostly those of energy companies. Let’s look at one of these on a chart:<\/p>\n\n\n\n

\"Comparison
Comparison of XEL and AEP pairs trading charts. Credit: Yahoo<\/figcaption><\/figure>\n\n\n\n

To determine the signal for when a trade should occur, we look at the difference between the two stocks:<\/p>\n\n\n\n

In [22]: diff = table['XEL']-table['AEP']\n\nIn [23]: diff.mean()\nOut[23]: -24.10590632758745\n\nIn [24]: diff.std()\nOut[24]: 3.148618069119123<\/pre>\n\n\n\n

Now, we can say with a reasonably high level of confidence that if the difference between the two stocks moves more than 2 standard deviations (about 6.3) beyond its mean of about -24.1, that is, below roughly -30.4, there is a trading opportunity. Let’s see if that happens in the period following this one:<\/p>\n\n\n\n

\n> next_period = df.pivot(columns='ticker')\n> next_period.columns = [col[1] for col in next_period.columns]\n> next_period.index = pd.to_datetime(next_period.index)\n\n> next_period = next_period['2020-01-01':]\n> next_diff = next_period['XEL']-next_period['AEP']\n> next_diff.loc[next_diff < -30.4]\ndate\n2020-01-15   -30.736753\n2020-01-16   -30.877687\n2020-01-17   -31.762922\n2020-01-21   -31.903163\n2020-01-22   -32.340503\n2020-01-23   -32.607094\n2020-01-24   -33.502785\n2020-01-27   -33.378315\n2020-01-28   -33.482761\n2020-01-29   -33.890905\n2020-01-30   -33.828830\n2020-01-31   -33.521868\n2020-02-03   -33.265412\n2020-02-04   -32.095119\n2020-02-05   -31.362759\n2020-02-06   -30.806166\n2020-02-07   -31.866599\n2020-02-10   -32.091640\n2020-02-11   -32.485245\n2020-02-12   -31.993412\n2020-02-13   -32.256399\n2020-02-14   -32.717131\n2020-02-18   -32.389837\n2020-02-19   -31.938717\n2020-02-20   -30.955869\n<\/code><\/pre>\n\n\n\n

So according to the pairs trading algorithm provided, we would open the trade on January 15, 2020 (XEL @ 63.59, AEP @ 94.33). The algorithm closes the trade when the two stocks cross their mean difference again. This happens on March 11, 2020 (XEL @ 65.90, AEP @ 88.21). This trade would therefore earn us $8.43, ignoring transaction costs.<\/p>\n\n\n\n
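The arithmetic behind that figure can be checked directly. The short sketch below uses only the four prices quoted above and assumes one share per leg, with the higher-priced stock (AEP) sold short and the lower-priced one (XEL) bought, which is how the profit function later in the article assigns the legs.

```python
# Prices quoted in the text, one share per leg.
xel_open, aep_open = 63.59, 94.33     # January 15, 2020
xel_close, aep_close = 65.90, 88.21   # March 11, 2020

# XEL is the lower-priced leg, so it is bought; AEP is sold short.
long_profit = xel_close - xel_open    # gain on the long XEL share
short_profit = aep_open - aep_close   # gain on the short AEP share
total = round(long_profit + short_profit, 2)

print(round(long_profit, 2), round(short_profit, 2), total)
# prints: 2.31 6.12 8.43
```

Most of the profit in this trade comes from the short leg: AEP fell by 6.12 while XEL rose by 2.31.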

A real test of the pairs trading algorithm<\/h2>\n\n\n\n

So in the one example, we did earn a positive return by following the pairs trading algorithm. The question is: does this generalize, and can we beat the market with this strategy? We will now look at a much larger number of correlated pairs.<\/p>\n\n\n\n

First, we need to codify the trading algorithm. For readability, we will iterate over the DataFrames, although in practice there are much better ways to do this. We define a function to calculate per-trade profit:<\/p>\n\n\n\n

def calc_trade_profit(ticker1, ticker2, start_date, end_date):\n    if next_period.loc[start_date][ticker1] &gt; next_period.loc[start_date][ticker2]:\n        high_ticker = ticker1  \n        low_ticker = ticker2\n\n    else:\n        high_ticker = ticker2\n        low_ticker = ticker1\n        \n    short_profit = next_period.loc[start_date][high_ticker] - next_period.loc[end_date][high_ticker]\n    long_profit = next_period.loc[end_date][low_ticker] - next_period.loc[start_date][low_ticker]\n\n    return long_profit + short_profit<\/pre>\n\n\n\n

Now, we iterate over each day in the period. For each day, we will iterate over the correlated pairs to find whether they trade on that day. If the difference between the prices of the two stocks is greater than the threshold (and was less yesterday), we open the trade. If the trade is open, and the difference between the prices goes below the mean, we close the trade. <\/p>\n\n\n\n

Here’s our code:<\/p>\n\n\n\n

trades = {}    # per-pair trade state, keyed by 'TICKER1-TICKER2'\nprevrow = {}   # prices from the previous day; empty on the first day\n\nprint(\"starting trades\")\nfor index, row in next_period.iterrows():\n    # iterate over the days of the next period\n    \n    for sindex, srow in corrdata.iterrows():\n        # iterate over the correlated pairs\n        \n        ticker1 = sindex[0]\n        ticker2 = sindex[1]\n        \n        trade = ticker1 + '-' + ticker2\n        \n        mean = srow['mean']\n        std = srow['std']\n        \n        if not trade in trades:\n            trades[trade] = { 'open' : False, 'start' : None, 'earned' : 0. }\n        \n        threshold = abs(mean) + 2*std\n        print(trade + \" \" + str(row[ticker1]) + \" \" + str(row[ticker2]) + \" \" + str(threshold) + \" \" + str(abs(row[ticker1] - row[ticker2])))\n\n        if ticker1 in prevrow:\n            # the gap crossed above the threshold today (it was at or below it yesterday)\n            if abs(row[ticker1] - row[ticker2]) &gt; threshold and abs(prevrow[ticker1] - prevrow[ticker2]) &lt;= threshold:\n                #open a trade if not open\n                if not trades[trade]['open']:\n                    print(\"opening trade \" + ticker1 + \"-\" + ticker2 + \": \" + str(index))\n                    trades[trade]['start'] = index\n                    trades[trade]['open'] = True\n            \n            # the gap crossed back below the historical mean today\n            if abs(row[ticker1] - row[ticker2]) &lt; abs(mean) and abs(prevrow[ticker1] - prevrow[ticker2]) &gt;= abs(mean):\n                #close an open trade\n                if trades[trade]['open']:\n                    print(\"closing trade \" + ticker1 + \"-\" + ticker2 + \": \" + str(index))\n                    trades[trade]['earned'] = trades[trade]['earned'] + calc_trade_profit(ticker1, ticker2, trades[trade]['start'], index)\n                    trades[trade]['start'] = None\n                    trades[trade]['open'] = False\n                \n    prevrow = row<\/pre>\n\n\n\n
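The open and close rules used in the loop can also be written without iterating row by row. The sketch below is one possible vectorized version for a single pair, using the pandas shift method to compare each day with the previous one. The function name crossing_signals, the parameter k and the toy spread values are illustrative assumptions and do not come from the article's data.

```python
import pandas as pd

def crossing_signals(spread, mean, std, k=2.0):
    # spread: Series of price differences (ticker1 - ticker2) for one pair
    # mean, std: statistics of the spread over the earlier formation period
    gap = spread.abs()
    prev = gap.shift(1)                # yesterday's gap; NaN on the first day
    threshold = abs(mean) + k * std
    # open: the gap is above the threshold today and was not yesterday
    opens = (gap > threshold) & (prev <= threshold)
    # close: the gap is below the mean today and was not yesterday
    closes = (gap < abs(mean)) & (prev >= abs(mean))
    return opens, closes

# Toy spread, with mean and std rounded from the XEL and AEP example.
spread = pd.Series([-24.0, -29.0, -31.0, -32.0, -26.0, -23.5, -24.5])
opens, closes = crossing_signals(spread, mean=-24.1, std=3.15)

print(list(opens[opens].index), list(closes[closes].index))
# prints: [2] [5]
```

Comparisons against the NaN produced by shift evaluate to False, so no signal can fire on the first day, which matches the behavior of the prevrow check in the loop. Pairing each open with the next close still needs a small amount of state, so this only replaces the signal detection, not the bookkeeping.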

Finally, we need to close all open trades on the last day (these are the ones that may be losers):<\/p>\n\n\n\n

#now close all of the open trades on the last day of the period\nlast_day = next_period.index[-1]\n\nfor sindex, srow in corrdata.iterrows():\n    # iterate over the correlated pairs\n    \n    ticker1 = sindex[0]\n    ticker2 = sindex[1]\n    \n    trade = ticker1 + '-' + ticker2\n    if (trade in trades) and (trades[trade]['open']):\n        print(\"last day: closing trade \" + ticker1 + \"-\" + ticker2)\n        trades[trade]['earned'] = trades[trade]['earned'] + calc_trade_profit(ticker1, ticker2, trades[trade]['start'], last_day)\n        trades[trade]['start'] = None\n        trades[trade]['open'] = False\n        \n    \n    \n# sum the per-pair earnings\nprint(\"total earned: \" + str(sum(t['earned'] for t in trades.values())))\n<\/pre>\n\n\n\n

Now, we run this algorithm on the top 100 correlated pairs:<\/p>\n\n\n\n

XEL   AEP     0.996523\nV     MA      0.994791\nNEE   ETR     0.994646\nWEC   XEL     0.994444\n      CMS     0.994384\nSUI   ELS     0.993536\nCMS   XEL     0.993018\nAWK   XEL     0.992827\nUDR   AVB     0.992741\n      AIV     0.992524\nWEC   AWK     0.992214\nHE    AEP     0.992158\nSUI   ETR     0.991948\n      MAA     0.991770\nCMS   LNT     0.991266\nADC   O       0.991101\nHE    XEL     0.991046\nOPTT  FCEL    0.990808\nAON   AJG     0.990615\nWELL  PEAK    0.990414\nAIV   AVB     0.990408\nTRNO  PLD     0.990302\nNEE   LNT     0.990216\nAEP   AWK     0.989953\nPOR   AEP     0.989822\nCMS   AEP     0.989687\nO     AIV     0.989058\n      UDR     0.989025\nPOR   XEL     0.988931\nPEAK  O       0.988882\nDRE   EGP     0.988872\nSO    ETR     0.988799\nDRE   FR      0.988766\nAWK   AMT     0.988565\nPNM   XEL     0.988536\nHE    AWK     0.988447\nELS   ETR     0.988435\nO     NNN     0.988409\nEGP   ELS     0.988395\nMSFT  V       0.988302\nSUI   NEE     0.987992\nCMS   POR     0.987979\nCVV   BLIN    0.987850\nOGE   AEE     0.987712\nAEP   WEC     0.987688\nCMS   AWK     0.987651\nTRNO  SUI     0.987628\nPEAK  ADC     0.987585\nLNT   ETR     0.987326\nAEP   PNM     0.987310\nPEAK  UDR     0.987200\nPOR   AVB     0.987179\nAVB   CPT     0.987089\nEQR   ESS     0.986930\nLNT   WEC     0.986684\nUSM   TDS     0.986674\nWPC   PSB     0.986629\nNWE   DTE     0.986558\nCPT   EQR     0.986507\nAEE   DTE     0.986407\nXEL   AMT     0.986397\nEGP   SUI     0.986270\nATO   CMS     0.986256\nNEE   FE      0.986243\nAEE   NWE     0.986101\nCMS   EQR     0.986014\nEGP   STWD    0.985946\nMA    MSFT    0.985910\nCMS   FE      0.985907\nNWE   POR     0.985815\nNEE   ELS     0.985728\nCCI   AMT     0.985701\nMT    HAL     0.985365\nEQR   UDR     0.985344\nMAA   ETR     0.985262\nMPW   O       0.985165\nEQR   AVB     0.985094\nBAH   ARCC    0.984924\nMA    FICO    0.984884\nHE    WEC     0.984878\nAEP   AMT     0.984747\nSUI   SO      0.984630\nNEE   SO      0.984616\nWEC   
ETR     0.984588\nPNM   AWK     0.984569\nFR    TRNO    0.984532\nHE    PNM     0.984259\nBKH   NWE     0.984165\nSTE   BAH     0.984014\nAWR   WEC     0.983979\nV     ARCC    0.983937\nUDR   ADC     0.983922\nMAA   TRNO    0.983697\nPSB   WELL    0.983626\nMAA   NEE     0.983566\nVRSK  MSI     0.983533\nBAH   V       0.983492\nHE    AMT     0.983375\nAWK   POR     0.983327\nFE    LNT     0.983292<\/code><\/pre>\n\n\n\n
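The trading loop reads a mean and a std for each pair from a table called corrdata, but the article does not show how that table is built. The following is a minimal sketch of one way it could be assembled, assuming corrdata is indexed by ticker pairs and holds the mean and standard deviation of the price difference over the formation period. The helper name pair_stats and the toy prices are hypothetical.

```python
import pandas as pd

def pair_stats(prices, pairs):
    # prices: DataFrame of prices (rows are days, columns are tickers)
    # pairs: list of (ticker1, ticker2) tuples, e.g. the most correlated pairs
    records = []
    for t1, t2 in pairs:
        diff = prices[t1] - prices[t2]
        records.append({'mean': diff.mean(), 'std': diff.std()})
    index = pd.MultiIndex.from_tuples(pairs, names=['ticker1', 'ticker2'])
    return pd.DataFrame(records, index=index)

# Toy formation-period prices (invented values).
prices = pd.DataFrame({
    'XEL': [60.0, 62.0, 64.0],
    'AEP': [90.0, 91.0, 92.0],
})
stats = pair_stats(prices, [('XEL', 'AEP')])

print(stats)
```

Iterating over a table like this with iterrows yields the pair as a tuple and a row with mean and std fields, which is the shape the loop expects from sindex and srow.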

Results of the pairs trading algorithm<\/h3>\n\n\n\n

Over the period from January 1, 2020 to August 18, 2020, the algorithm would have earned a profit of $284 by trading one long and one short share at each signal. Since each long is paired with a larger short, you could achieve this with essentially $0 initial investment (ignoring trading costs), making the return essentially infinite. Moreover, if you run the model from January 1, 2020 to April 1, 2020, you would have seen a significant profit, even while the S&P 500 dropped by almost 25%.<\/p>\n\n\n\n

However, in reality, whenever you open a short position, you assume a liability, and your broker will require margin to cover it, so the capital required is not actually zero.<\/p>\n\n\n\n

Conclusion: does pairs trading still work in 2021?<\/h2>\n\n\n\n

Our model shows that pairs trading can work and be profitable in 2021. The question of how profitable, and whether it is worth the risk, is something we will tackle in a future article.<\/p>\n","protected":false},"excerpt":{"rendered":"

In this article, we will look at a popular algorithm for hedge funds and high-frequency traders called pairs trading. In its most basic form, you select a pair of stocks that historically move together and trade when they diverge. The expectation is that as they return to a normal state, you will make money. This […]<\/p>\n","protected":false},"author":1,"featured_media":324,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[15],"tags":[17,18],"_links":{"self":[{"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/posts\/299"}],"collection":[{"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/comments?post=299"}],"version-history":[{"count":0,"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/posts\/299\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/media\/324"}],"wp:attachment":[{"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/media?parent=299"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/categories?post=299"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/tags?post=299"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}