Getting Started article, which you should read before this one.

Our first task is loading the data. Our data comes from Quandl in a set of CSV files. We will load them all into a DataFrame using pandas:
```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

plt.style.use('fivethirtyeight')
np.random.seed(777)

# Load the ticker list; a missing file should stop the script here
# rather than cause a NameError further down
ticker_data = pd.read_csv("tickers.csv")

frames = []
for t in ticker_data['ticker']:
    try:
        # Keep columns 0, 1 and 12 of each price file: ticker, date, adjusted close
        data = pd.read_csv("d:\\stockdata\\" + t + ".csv",
                           header=None,
                           usecols=[0, 1, 12],
                           names=['ticker', 'date', 'adj_close'])
        frames.append(data)
    except FileNotFoundError:
        # Skip tickers that have no price file
        pass

# DataFrame.append was removed in pandas 2.0, so collect the frames and concatenate once
alldata = pd.concat(frames, ignore_index=True)
```

Our ticker CSV file contains a lot of tickers that we don’t want to look at, such as delisted ones and OTC stocks, so we filter it before loading the prices (run this line right after reading tickers.csv, before the loop above):
```python
# Keep active, USD-denominated SF1 tickers listed on NYSE, NYSE MKT or NASDAQ
ticker_data = ticker_data.loc[(ticker_data['table'] == 'SF1')
                              & (ticker_data['isdelisted'] == 'N')
                              & (ticker_data['currency'] == 'USD')
                              & ticker_data['exchange'].isin(['NYSE', 'NYSEMKT', 'NASDAQ'])]
```

Now we have the prices of all the relevant tickers in a DataFrame. The next step is cleaning the data so that we can build a correlation matrix. We will start with daily data and can move to longer timeframes later.
```python
# One column of adjusted closes per ticker, one row per date
df = alldata.set_index('date')
table = df.pivot(columns='ticker', values='adj_close')

# Formation period: 2018 and 2019
table.index = pd.to_datetime(table.index)
table = table.loc['2018-01-01':'2020-01-01']

# Drop tickers with more than 30 missing days, then drop any remaining incomplete days
table = table.loc[:, table.isnull().sum(axis=0) <= 30]
table = table.dropna(axis='rows')
```

Finally, generating a correlation matrix is very simple with pandas:
```
             A       AAL      AAME       AAN      AAON       AAP      AAPL  \
A     1.000000 -0.411075 -0.423860  0.548792  0.677535  0.439531  0.515594
AAL  -0.411075  1.000000  0.762728 -0.675839 -0.625932 -0.729448 -0.526901
AAME -0.423860  0.762728  1.000000 -0.544478 -0.579686 -0.578142 -0.637582
AAN   0.548792 -0.675839 -0.544478  1.000000  0.857118  0.498967  0.648432
AAON  0.677535 -0.625932 -0.579686  0.857118  1.000000  0.484853  0.624596
...        ...       ...       ...       ...       ...       ...       ...
ZIOP  0.296149 -0.085815 -0.192385  0.472562  0.574997 -0.224872  0.353245
ZIXI  0.440557 -0.699289 -0.566404  0.679136  0.761478  0.583238  0.284436
ZNGA  0.631641 -0.730482 -0.702510  0.844649  0.898749  0.453662  0.625134
ZUMZ  0.562601 -0.421862 -0.498312  0.619821  0.548948  0.285115  0.796054
ZVO  -0.423629  0.493856  0.469000 -0.674794 -0.682076 -0.058555 -0.525241
```

Finding pairs trading opportunities

The correlation matrix contains a lot of information. With a simple sort, we can list the strongest positive and negative correlations:
```
XEL   AEP    0.996523
V     MA     0.994791
NEE   ETR    0.994646
WEC   XEL    0.994444

OPTT  AEE   -0.972316
SHIP  AMT   -0.972656
SO    DYNT  -0.972725
AEE   FCEL  -0.973476
OGE   OPTT  -0.976275
```

As we can see, this is consistent with what the paper found: the most correlated stocks are mostly energy utilities. Let’s look at one of these on a chart:

[Figure: Comparison of XEL and AEP pairs trading charts. Credit: Yahoo]

To determine when a trade should occur, we look at the difference between the two stocks’ prices:
```
In [22]: diff = table['XEL'] - table['AEP']

In [23]: diff.mean()
Out[23]: -24.10590632758745

In [24]: diff.std()
Out[24]: 3.148618069119123
```

If the difference were roughly normally distributed, it would stray more than 2 standard deviations (about 6.3) from its mean on only about 5% of days. We therefore treat a divergence of that size, i.e. a difference below about -30.4, as a trading opportunity. Let’s see whether that happens in the period following this one:
```
> next_period = df.pivot(columns='ticker')
> next_period.columns = [col[1] for col in next_period.columns]
> next_period.index = pd.to_datetime(next_period.index)

> next_period = next_period['2020-01-01':]
> next_diff = next_period['XEL'] - next_period['AEP']
> next_diff.loc[next_diff < -30.4]
date
2020-01-15   -30.736753
2020-01-16   -30.877687
2020-01-17   -31.762922
2020-01-21   -31.903163
2020-01-22   -32.340503
2020-01-23   -32.607094
2020-01-24   -33.502785
2020-01-27   -33.378315
2020-01-28   -33.482761
2020-01-29   -33.890905
2020-01-30   -33.828830
2020-01-31   -33.521868
2020-02-03   -33.265412
2020-02-04   -32.095119
2020-02-05   -31.362759
2020-02-06   -30.806166
2020-02-07   -31.866599
2020-02-10   -32.091640
2020-02-11   -32.485245
2020-02-12   -31.993412
2020-02-13   -32.256399
2020-02-14   -32.717131
2020-02-18   -32.389837
2020-02-19   -31.938717
2020-02-20   -30.955869
```

So, according to this rule, we would open the trade on January 15, 2020 (XEL @ 63.59, AEP @ 94.33), shorting the higher-priced AEP and buying XEL. The trade closes when the difference crosses back over its mean, which happens on March 11, 2020 (XEL @ 65.90, AEP @ 88.21). This trade would therefore earn us $8.43 per pair of shares, ignoring transaction costs.
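The threshold and the profit in this example can be checked with a few lines of arithmetic. This is a minimal sketch: the statistics and prices are copied from the text above, and pair_trade_profit is a hypothetical helper that mirrors the logic of calc_trade_profit below (short one share of the higher-priced stock, buy one share of the lower-priced one).

```python
# Statistics of diff = XEL - AEP over 2018-2019, copied from the output above
mean_diff = -24.10590632758745
std_diff = 3.148618069119123

# Entry threshold: two standard deviations below the mean difference
entry_threshold = mean_diff - 2 * std_diff
print(f"enter when XEL - AEP < {entry_threshold:.2f}")  # → enter when XEL - AEP < -30.40


def pair_trade_profit(high_open, high_close, low_open, low_close):
    """Hypothetical helper: one share short the higher-priced stock,
    one share long the lower-priced stock."""
    short_profit = high_open - high_close  # a short gains when the price falls
    long_profit = low_close - low_open     # a long gains when the price rises
    return short_profit + long_profit


# Open 2020-01-15 (AEP @ 94.33, XEL @ 63.59); close 2020-03-11 (AEP @ 88.21, XEL @ 65.90)
profit = pair_trade_profit(high_open=94.33, high_close=88.21,
                           low_open=63.59, low_close=65.90)
print(f"profit per pair of shares: ${profit:.2f}")  # → profit per pair of shares: $8.43
```

The short AEP leg contributes about $6.12 and the long XEL leg about $2.31.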
A real test of the pairs trading algorithm

So in this one example, following the pairs trading rule earned a positive return. The question is whether this generalizes: can we beat the market with this strategy? To find out, we will look at a much larger number of correlated pairs.
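The backtest below reads a table called corrdata (one row per pair, indexed by (ticker1, ticker2), with mean and std columns for the pair’s price difference over the formation period), but its construction is not shown. The following is a minimal sketch of one way to build it, assuming pairs are ranked by the Pearson correlation of prices (the default of pandas’ DataFrame.corr) and only the n most positively correlated pairs are kept. The function name build_corrdata and the synthetic prices are hypothetical, and the pair ordering it produces (alphabetical by column) may differ from the author’s.

```python
import numpy as np
import pandas as pd


def build_corrdata(table: pd.DataFrame, n: int = 100) -> pd.DataFrame:
    """Hypothetical helper: rank ticker pairs by the correlation of their prices and
    record the mean and standard deviation of each pair's price difference."""
    corr = table.corr()  # Pearson correlation between every pair of price columns

    # Upper triangle only (k=1 skips the diagonal), so each unordered pair appears once
    i, j = np.triu_indices(len(corr.columns), k=1)
    pairs = pd.Series(corr.to_numpy()[i, j],
                      index=pd.MultiIndex.from_arrays([corr.columns[i], corr.columns[j]]))

    # Keep the n most positively correlated pairs
    top = pairs.sort_values(ascending=False).head(n)

    rows = []
    for (ticker1, ticker2), c in top.items():
        diff = table[ticker1] - table[ticker2]
        rows.append({'ticker1': ticker1, 'ticker2': ticker2, 'corr': c,
                     'mean': diff.mean(), 'std': diff.std()})
    return pd.DataFrame(rows).set_index(['ticker1', 'ticker2'])


# Usage with synthetic prices: B tracks A closely, C is independent noise
rng = np.random.default_rng(0)
dates = pd.date_range('2018-01-01', periods=250, freq='B')
a = 50 + np.cumsum(rng.normal(0, 1, len(dates)))
prices = pd.DataFrame({'A': a,
                       'B': a + 20 + rng.normal(0, 0.5, len(dates)),
                       'C': 30 + rng.normal(0, 1, len(dates))},
                      index=dates)
corrdata = build_corrdata(prices, n=2)
print(corrdata)
```

With the real price table, a call such as build_corrdata(table, n=100) would give a frame in the shape the loop below iterates over with iterrows().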
First, we need to codify the trading algorithm. For readability, we will iterate over the DataFrames row by row, although vectorized pandas operations would be much faster in practice. We start with a function that calculates the profit of a single trade:
```python
def calc_trade_profit(ticker1, ticker2, start_date, end_date):
    # Short the stock with the higher price at entry and buy the lower-priced one
    if next_period.at[start_date, ticker1] > next_period.at[start_date, ticker2]:
        high_ticker, low_ticker = ticker1, ticker2
    else:
        high_ticker, low_ticker = ticker2, ticker1

    # A short position gains when the price falls; a long position gains when it rises
    short_profit = next_period.at[start_date, high_ticker] - next_period.at[end_date, high_ticker]
    long_profit = next_period.at[end_date, low_ticker] - next_period.at[start_date, low_ticker]

    return long_profit + short_profit
```

Now, we iterate over each day in the period. For each day, we iterate over the correlated pairs to check whether they trade on that day. If the absolute difference between the two stocks’ prices is greater than the threshold (and was at or below it the day before), we open a trade. If a trade is open and the absolute difference falls back below the absolute mean difference, we close it.
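The open/close rule just described can be isolated for a single pair. This is a minimal sketch, not the article’s code: find_trades is a hypothetical helper that works on positions in a list of daily differences and, like the full loop, closes any trade still open on the last day. It compares absolute differences, which assumes the difference keeps the same sign throughout the period.

```python
def find_trades(diffs, mean, std):
    """Hypothetical helper: return (open_position, close_position) tuples for one pair.

    diffs holds daily price differences (ticker1 - ticker2). A trade opens when
    |diff| rises above |mean| + 2*std and closes when |diff| falls back below |mean|.
    """
    threshold = abs(mean) + 2 * std
    trades = []
    open_at = None
    for i in range(1, len(diffs)):
        today, yesterday = abs(diffs[i]), abs(diffs[i - 1])
        if open_at is None and today > threshold and yesterday <= threshold:
            open_at = i  # divergence crossed the threshold today
        elif open_at is not None and today < abs(mean) and yesterday >= abs(mean):
            trades.append((open_at, i))  # difference crossed back over the mean
            open_at = None
    if open_at is not None:
        trades.append((open_at, len(diffs) - 1))  # force-close on the last day
    return trades


# Usage: made-up differences loosely modelled on XEL - AEP (mean -24.1, std 3.15)
diffs = [-25.0, -29.0, -30.7, -33.5, -31.0, -26.0, -22.3, -23.0, -31.5, -32.0]
print(find_trades(diffs, mean=-24.1, std=3.15))  # → [(2, 6), (8, 9)]
```

Each (open, close) pair of positions could then be mapped back to dates and passed to calc_trade_profit.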
Here’s the code for the full backtest loop:
```python
print("starting trades")

trades = {}     # per-pair state: whether a trade is open, when it opened, total profit
prevrow = None  # prices on the previous trading day

for index, row in next_period.iterrows():
    # iterate over the days of the next period

    for sindex, srow in corrdata.iterrows():
        # iterate over the correlated pairs; corrdata is indexed by (ticker1, ticker2)
        ticker1, ticker2 = sindex
        trade = ticker1 + '-' + ticker2

        mean = srow['mean']
        std = srow['std']

        if trade not in trades:
            trades[trade] = {'open': False, 'start': None, 'earned': 0.}

        threshold = abs(mean) + 2 * std

        if prevrow is None:
            continue  # no previous day to compare against yet

        spread = abs(row[ticker1] - row[ticker2])
        prev_spread = abs(prevrow[ticker1] - prevrow[ticker2])

        if spread > threshold and prev_spread <= threshold:
            # open a trade if one is not already open
            if not trades[trade]['open']:
                print("opening trade " + trade + ": " + str(index))
                trades[trade]['start'] = index
                trades[trade]['open'] = True

        if spread < abs(mean) and prev_spread >= abs(mean):
            # close an open trade
            if trades[trade]['open']:
                print("closing trade " + trade + ": " + str(index))
                trades[trade]['earned'] += calc_trade_profit(ticker1, ticker2, trades[trade]['start'], index)
                trades[trade]['start'] = None
                trades[trade]['open'] = False

    prevrow = row
```

Finally, we need to close all trades that are still open on the last day (these are the ones that may be losers):
```python
# now close all of the open trades on the last day of the period
last_day = next_period.index[-1]

for sindex, srow in corrdata.iterrows():
    # iterate over the correlated pairs
    ticker1, ticker2 = sindex
    trade = ticker1 + '-' + ticker2

    if trade in trades and trades[trade]['open']:
        print("last day: closing trade " + trade)
        trades[trade]['earned'] += calc_trade_profit(ticker1, ticker2, trades[trade]['start'], last_day)
        trades[trade]['start'] = None
        trades[trade]['open'] = False

print("total earned: " + str(sum(t['earned'] for t in trades.values())))
```

Now, we run this algorithm on the top 100 correlated pairs:
```
XEL   AEP    0.996523
V     MA     0.994791
NEE   ETR    0.994646
WEC   XEL    0.994444
WEC   CMS    0.994384
SUI   ELS    0.993536
CMS   XEL    0.993018
AWK   XEL    0.992827
UDR   AVB    0.992741
UDR   AIV    0.992524
WEC   AWK    0.992214
HE    AEP    0.992158
SUI   ETR    0.991948
SUI   MAA    0.991770
CMS   LNT    0.991266
ADC   O      0.991101
HE    XEL    0.991046
OPTT  FCEL   0.990808
AON   AJG    0.990615
WELL  PEAK   0.990414
AIV   AVB    0.990408
TRNO  PLD    0.990302
NEE   LNT    0.990216
AEP   AWK    0.989953
POR   AEP    0.989822
CMS   AEP    0.989687
O     AIV    0.989058
O     UDR    0.989025
POR   XEL    0.988931
PEAK  O      0.988882
DRE   EGP    0.988872
SO    ETR    0.988799
DRE   FR     0.988766
AWK   AMT    0.988565
PNM   XEL    0.988536
HE    AWK    0.988447
ELS   ETR    0.988435
O     NNN    0.988409
EGP   ELS    0.988395
MSFT  V      0.988302
SUI   NEE    0.987992
CMS   POR    0.987979
CVV   BLIN   0.987850
OGE   AEE    0.987712
AEP   WEC    0.987688
CMS   AWK    0.987651
TRNO  SUI    0.987628
PEAK  ADC    0.987585
LNT   ETR    0.987326
AEP   PNM    0.987310
PEAK  UDR    0.987200
POR   AVB    0.987179
AVB   CPT    0.987089
EQR   ESS    0.986930
LNT   WEC    0.986684
USM   TDS    0.986674
WPC   PSB    0.986629
NWE   DTE    0.986558
CPT   EQR    0.986507
AEE   DTE    0.986407
XEL   AMT    0.986397
EGP   SUI    0.986270
ATO   CMS    0.986256
NEE   FE     0.986243
AEE   NWE    0.986101
CMS   EQR    0.986014
EGP   STWD   0.985946
MA    MSFT   0.985910
CMS   FE     0.985907
NWE   POR    0.985815
NEE   ELS    0.985728
CCI   AMT    0.985701
MT    HAL    0.985365
EQR   UDR    0.985344
MAA   ETR    0.985262
MPW   O      0.985165
EQR   AVB    0.985094
BAH   ARCC   0.984924
MA    FICO   0.984884
HE    WEC    0.984878
AEP   AMT    0.984747
SUI   SO     0.984630
NEE   SO     0.984616
WEC   ETR    0.984588
PNM   AWK    0.984569
FR    TRNO   0.984532
HE    PNM    0.984259
BKH   NWE    0.984165
STE   BAH    0.984014
AWR   WEC    0.983979
V     ARCC   0.983937
UDR   ADC    0.983922
MAA   TRNO   0.983697
PSB   WELL   0.983626
MAA   NEE    0.983566
VRSK  MSI    0.983533
BAH   V      0.983492
HE    AMT    0.983375
AWK   POR    0.983327
FE    LNT    0.983292
```

Results of the pairs trading algorithm

From January 1, 2020 to August 18, 2020, the algorithm would have earned a profit of $284, trading one long and one short share at each signal. Because each long leg is paired with a higher-priced short leg, the short-sale proceeds more than cover the purchases, so on paper the strategy needs essentially no initial investment (ignoring trading costs), which would make the return essentially infinite. Moreover, running the model from January 1, 2020 to April 1, 2020 would have produced a significant profit, even while the S&P 500 dropped by almost 25%.
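However, in reality, whenever you open a short position, you assume a liability: your broker requires collateral (margin) against it, so the capital tied up is not zero and the return is finite.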
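The point can be made concrete with the XEL/AEP trade from earlier. This is a minimal sketch, assuming a hypothetical broker that holds collateral equal to a fixed fraction of gross exposure (long value plus short value); the function name return_on_margin is hypothetical, and the 50% rate is an illustrative input only, since real margin rules vary by broker and regulator.

```python
def return_on_margin(profit, long_value, short_value, margin_rate):
    """Hypothetical helper: profit divided by the collateral tied up, where the
    collateral is assumed to be margin_rate times the gross exposure."""
    gross_exposure = long_value + short_value
    capital = margin_rate * gross_exposure
    return profit / capital


# XEL/AEP example: long XEL @ 63.59, short AEP @ 94.33, profit $8.43 per pair of shares.
# margin_rate=0.5 is an assumed, illustrative value.
r = return_on_margin(8.43, long_value=63.59, short_value=94.33, margin_rate=0.5)
print(f"return on capital: {r:.1%}")  # → return on capital: 10.7%
```

Under this assumption about $78.96 of collateral backs the trade, so the same $8.43 is a large but finite return.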
Conclusion: does pairs trading still work in 2021?

Our backtest over January to August 2020 suggests that pairs trading can still work and be profitable in 2021. How profitable it is, and whether it is worth the risk, are questions we will tackle in a future article.