{"id":221,"date":"2021-01-08T16:14:36","date_gmt":"2021-01-08T16:14:36","guid":{"rendered":"https:\/\/firemymoneymanager.com\/?p=221"},"modified":"2022-04-01T02:01:21","modified_gmt":"2022-04-01T02:01:21","slug":"getting-started-using-python-to-find-alpha","status":"publish","type":"post","link":"https:\/\/firemymoneymanager.com\/getting-started-using-python-to-find-alpha\/","title":{"rendered":"Getting started: using Python to find alpha [2021]"},"content":{"rendered":"\n<p class=\"has-drop-cap\">In this article, we get started examining the CAPM and Fama\/French alphas by calculating their values for real stocks.  Understanding this procedure allows us to build on these models in other articles.<\/p>\n\n\n\n<p>Basu and Fama\/French provided important methods for modeling excess returns based on factors beyond the standard Capital Asset Pricing Model.  Unfortunately, not all of their papers are easily available online.  However, there are plenty of summaries of their work, which are useful reading.<\/p>\n\n\n\n<h2>The basis for alpha: related reading<\/h2>\n\n\n\n<p>The following papers do a good job of summarizing the work of Basu and Fama\/French, and they have the added benefit of being freely available online:<\/p>\n\n\n\n<ul><li><a href=\"https:\/\/digitalcommons.usu.edu\/cgi\/viewcontent.cgi?article=1326&amp;context=gradreports\" class=\"rank-math-link\" target=\"_blank\" rel=\"noopener\">Investment Performance and Price-Earnings Ratios: Basu 1977 Revisited<\/a><\/li><li><a href=\"https:\/\/digitalcommons.usu.edu\/cgi\/viewcontent.cgi?article=1662&amp;context=gradreports\" class=\"rank-math-link\" target=\"_blank\" rel=\"noopener\">Investment Performance of Common Stock in Relation to their Price-Earnings Ratios<\/a><\/li><li><a 
href=\"https:\/\/poseidon01.ssrn.com\/delivery.php?ID=136001008002099090080087017003121026030075090031022078086024068076127104098093005120059099053118019039062118008007079111069023080092085020084095092089080121095006012054006002076089075115028108086095126100086118021080017068021085106003000002009069&amp;EXT=pdf&amp;INDEX=TRUE\" class=\"rank-math-link\" target=\"_blank\" rel=\"noopener\">The Volatility Effect: Lower Risk without Lower Return<\/a><\/li><li><a href=\"https:\/\/poseidon01.ssrn.com\/delivery.php?ID=330022115101127017082031010080067096023021008018084032024021085066022001106121083124004013059048123122011029106119089009069067047050057008076080016095084026098102019082024069081076099121001004112122113072084018006114110068101073095077105069122093105&amp;EXT=pdf&amp;INDEX=TRUE\" class=\"rank-math-link\" target=\"_blank\" rel=\"noopener\">Five Concerns with the Five-Factor Model<\/a><\/li><li><a href=\"https:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.464.6520&amp;rep=rep1&amp;type=pdf\" class=\"rank-math-link\" target=\"_blank\" rel=\"noopener\">Minimum-Variance Portfolios in the U.S. Equity Market<\/a><\/li><\/ul>\n\n\n\n<p><\/p>\n\n\n\n<p>The concepts of alpha and beta assume that equities follow a normal distribution.  We look into whether that is true in <a href=\"https:\/\/firemymoneymanager.com\/do-equities-really-follow-a-normal-distribution\/\" class=\"rank-math-link\">this article<\/a>.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<h2>Step 1: loading the data using Pandas<\/h2>\n\n\n\n<p>Let&#8217;s look at Russell 2000 stocks.  
These should give us a good dataset with which we can work.<\/p>\n\n\n\n<p>We begin by loading the price data for every ticker in the Russell 2000 list.  The data comes from Quandl; the code below reads it from one CSV file per ticker.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import pandas as pd\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport seaborn as sns\nimport quandl\nimport pickle\nimport csv\nfrom datetime import date\nimport sys\nimport statsmodels.api as sm\nimport scipy.optimize as sco\nplt.style.use('fivethirtyeight')\nnp.random.seed(777)\n\n\n\npd.set_option('display.max_rows', 100)\npd.set_option('display.max_columns', 100)\n\n\ntickers = []\nused_tickers = []\n\n# read the ticker symbols from the first column of the list\nwith open(\"russell2000tick.csv\") as csvfile:\n    reader = csv.reader(csvfile)\n    for row in reader:\n        tickers.append(row[0])\n    \nframes = []\n\nfor ticker in tickers:\n\n    try:\n        # parse the date column so it can later be matched against the factor dates\n        data = pd.read_csv(\"stockdata\/\" + ticker + \".csv\", \n                            header = None,\n                            usecols = [0, 1, 12],\n                            names = ['ticker', 'date', 'adj_close'],\n                            parse_dates = ['date'])\n    \n        frames.append(data)\n        used_tickers.append(ticker)\n        print(ticker)\n        \n    except FileNotFoundError:\n        # skip tickers that have no price file\n        pass\n\n# DataFrame.append was removed in pandas 2.0, so concatenate once instead\nalldata = pd.concat(frames, ignore_index=True)\ntickers = used_tickers<\/pre>\n\n\n\n<p>Now, we can get a table with columns for each ticker and date rows:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">df = alldata.set_index('date')\ntable = df.pivot(columns='ticker')\ntable.columns = [col[1] for col in table.columns]\n<\/pre>\n\n\n\n<p><\/p>\n\n\n\n<h2>Step 2: Cleaning the data for 
analysis<\/h2>\n\n\n\n<p>There are a lot of <em>NaN <\/em>values in this dataset, so we can clean them out by first removing columns with more than 30 <em>NaNs<\/em>.  Holidays also have <em>NaN <\/em>values, so we need to remove the dates with holidays.  We can do this by adding this code:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">table = table['2010-01-04':'2020-08-18']\n\n# we want the percentage change for comparison\ntable = table.pct_change()\ntable = table.loc[:, (table.isnull().sum(axis=0) &lt;= 30)]\ntable = table.dropna(axis='rows')\n<\/pre>\n\n\n\n<p>Now, we are ready to find the CAPM \u03b1 and \u03b2. <\/p>\n\n\n\n<p><\/p>\n\n\n\n<h2>Step 3: Alpha and the CAPM equation<\/h2>\n\n\n\n<p>The basic CAPM equation is <\/p>\n\n\n\n<p class=\"has-text-align-center\"><em>R<sub>i<\/sub> &#8211; R<sub>f<\/sub> = \u03b1<sub>i <\/sub>+ \u03b2<sub>i<\/sub>(R<sub>m<\/sub> &#8211; R<sub>f<\/sub>) + \u03b5<sub>i<\/sub><\/em><\/p>\n\n\n\n<p>Where <\/p>\n\n\n\n<p><em>R<sub>i<\/sub><\/em> is the return on the stock in the given time period<\/p>\n\n\n\n<p><em>R<sub>f<\/sub><\/em> is the risk-free return<\/p>\n\n\n\n<p><em>\u03b1<sub>i<\/sub><\/em> is the CAPM alpha<\/p>\n\n\n\n<p><em>\u03b2<sub>i<\/sub><\/em> is the CAPM beta<\/p>\n\n\n\n<p><em>R<sub>m<\/sub><\/em> is the market return<\/p>\n\n\n\n<p><em>\u03b5<sub>i<\/sub><\/em> is a stochastic error term<\/p>\n\n\n\n<h3>Determining CAPM market return and risk free rates<\/h3>\n\n\n\n<p>Fama and French (from the paper linked above) keep a data library with calculations of the market return and risk free rate, which can be accessed from <a href=\"https:\/\/mba.tuck.dartmouth.edu\/pages\/faculty\/ken.french\/data_library.html\" class=\"rank-math-link\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/p>\n\n\n\n<p>The 
CSV doesn&#8217;t have a header for the first column, so we edit the header and name the first column &#8216;Date&#8217;.  Then we can load and format the CSV like this:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">factors = pd.read_csv(\"famafactors.csv\")\nfactors['Date'] = pd.to_datetime(factors['Date'],format='%Y%m%d')\n# the data library reports returns in percent, so convert to decimals\nfactors['Mkt-RF'] = factors['Mkt-RF'] \/ 100\nfactors = factors.set_index('Date')<\/pre>\n\n\n\n<p>Now, we combine the factors and the returns tables:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">factors = factors['2010-01-04':'2020-08-18']\ntable = factors.merge(table, left_index=True, right_index=True, how='inner')\n<\/pre>\n\n\n\n<p><\/p>\n\n\n\n<h2>Step 4: Running regressions to find alpha<\/h2>\n\n\n\n<p>At this point, it&#8217;s easy to plot regression lines for individual stocks.  If we want to see a plot for NL, with the market excess return on the x-axis and the stock&#8217;s return on the y-axis, we can do this:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">sns.regplot(x='Mkt-RF', y='NL', data=table)<\/pre>\n\n\n\n<p>Now, we can start running regressions.  For the first test, we will do a regression on the daily data.  This will likely result in a low r-squared, due to the amount of noise in the data.  
However, it gives us a baseline from which we can do further analysis.<\/p>\n\n\n\n<p>To do this, we use the following code:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">regressions = pd.DataFrame(columns = ['ticker', 'alpha', 'beta', 'rsquared'])\n\n# start at 6 to skip date and fama factors\nfor i in range(6, len(table.columns)):\n    \n    # for calculating y, we must subtract the risk free rate\n    y = table[table.columns[i]] - table['_RF']\n    X = table['Mkt-RF']\n\n    ro = y.between(y.quantile(.01), y.quantile(.99))\n    y = y[ro]\n    X = X[ro]\n    \n    X = sm.add_constant(X)\n    \n   \n    model = sm.OLS(y, X).fit()\n    \n    regressions.loc[i-6] = [ \n        table.columns[i],\n        model.params['const'],\n        model.params['Mkt-RF'],\n        model.rsquared\n    ]\n                    <\/pre>\n\n\n\n<p>Note that we filtered out outliers here that are in the top or bottom 1%.  It&#8217;s up to you whether you think that makes sense in this particular context.<\/p>\n\n\n\n<p>For a sanity check we can first run the regressions against a list of ETFs.  This should give us the broader market ETFs (S&amp;P 500 ETFs for example) first, and they should be nearly 100% correlated to the Mkt-RF.  Here is the output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>124    CORP -0.002448  0.000005  1.215519e-10\n1008    SHY -0.002567 -0.000072  5.407849e-08\n505    IBND -0.002573  0.000222  1.326580e-07\n758     NOM -0.002519  0.001175  9.270459e-07\n653     LQD -0.002432  0.000688  2.020755e-06\n    ...       ...       ...           
...\n1071    SSO -0.002655  1.920029  9.420202e-01\n1060   SPXS -0.002690 -2.859140  9.637690e-01\n1061   SPXU -0.002657 -2.854800  9.639438e-01\n1059   SPXL -0.002696  2.873916  9.641320e-01\n1134   UPRO -0.002690  2.881056  9.644571e-01<\/code><\/pre>\n\n\n\n<p>As you can see, the best correlated are UPRO and SPXL, which are both triple-long S&amp;P 500 ETFs.  The beta for both of these tickers is nearly 3, which makes sense.  SPXU and SPXS, the triple-short S&amp;P 500 ETFs, are next, with betas of about -2.86.  At the other end of the list, the least correlated ticker is CORP, a corporate bond ETF.<\/p>\n\n\n\n<p>Now, we can return to using stock tickers.  After running the code again, regressions is a table of all tickers with their daily alpha, beta, and r-squared values.  We can sort it by r-squared to get the least correlated and the most correlated tickers:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>     ticker     alpha      beta      rsquared\n1015    GFI -0.002056  0.003114  8.205919e-07\n2578   WHLM -0.002060 -0.009481  6.090840e-06\n650     CVR -0.002334  0.006751  1.254616e-05\n1948   PRPH -0.001908  0.010896  1.521615e-05\n2355    THM -0.002350 -0.019028  1.782250e-05\n    ...       ...       ...           ...\n32      ACN -0.002495  0.974155  5.241445e-01\n843      EV -0.002993  1.272874  5.278900e-01\n2401   TROW -0.002700  1.121679  5.570539e-01\n132     AMP -0.002748  1.392772  5.767456e-01\n337     BLK -0.002516  1.229884  5.860815e-01<\/code><\/pre>\n\n\n\n<p>You can see that the least correlated ticker is GFI, which is Gold Fields Limited, one of the largest gold mining firms.  Gold-related instruments tend to move very differently from most equities.  
The most closely correlated is BlackRock.<\/p>\n\n\n\n<p>We can look at these on a chart (the S&amp;P 500 is in orange, BlackRock is in blue, and GFI is in green):<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" width=\"1024\" height=\"566\" src=\"https:\/\/firemymoneymanager.com\/wp-content\/uploads\/2021\/02\/screenshot-2021-02-25-09-56-36-1024x566.png\" alt=\"\" class=\"wp-image-402\" srcset=\"https:\/\/firemymoneymanager.com\/wp-content\/uploads\/2021\/02\/screenshot-2021-02-25-09-56-36-1024x566.png 1024w, https:\/\/firemymoneymanager.com\/wp-content\/uploads\/2021\/02\/screenshot-2021-02-25-09-56-36-300x166.png 300w, https:\/\/firemymoneymanager.com\/wp-content\/uploads\/2021\/02\/screenshot-2021-02-25-09-56-36-768x425.png 768w, https:\/\/firemymoneymanager.com\/wp-content\/uploads\/2021\/02\/screenshot-2021-02-25-09-56-36-620x343.png 620w, https:\/\/firemymoneymanager.com\/wp-content\/uploads\/2021\/02\/screenshot-2021-02-25-09-56-36-800x442.png 800w, https:\/\/firemymoneymanager.com\/wp-content\/uploads\/2021\/02\/screenshot-2021-02-25-09-56-36-50x28.png 50w, https:\/\/firemymoneymanager.com\/wp-content\/uploads\/2021\/02\/screenshot-2021-02-25-09-56-36.png 1340w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption>Chart of alpha, beta correlations for BlackRock, GFI, and the S&amp;P 500.  From Yahoo Charts.<\/figcaption><\/figure>\n\n\n\n<h3>Resampling to Monthly<\/h3>\n\n\n\n<p>Running these regressions on daily data may be interesting, but not particularly useful.  Due to the error inherent in CAPM, we need to look at longer time periods.  
So we now resample the data to monthly.<\/p>\n\n\n\n<p>First, we load the monthly factor data, which is in a slightly different format.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">factors = pd.read_csv(\"d:\\\\famafactorsmonthly.csv\")\nfactors['Date'] = pd.to_datetime(factors['Date'],format='%Y%m')\nfactors['Mkt-RF'] = factors['Mkt-RF'] \/ 100\nfactors = factors.set_index('Date')<\/pre>\n\n\n\n<p>The monthly factor data is the data from the end of each month.  Unfortunately, this loads the monthly data on the first of each month.  So we need to change the dates to end of month:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">factors.index = factors.index.to_period('M').to_timestamp('M')\n<\/pre>\n\n\n\n<p>Now, we can run the ETF test again on the monthly time scale.  And here are our results:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>     ticker     alpha      beta  rsquared\n1012    SJB -0.059726  0.005377  0.000010\n96      CCZ -0.049075 -0.020540  0.000076\n426    GDXJ -0.066987  0.096735  0.000570\n753     NMI -0.053687  0.055460  0.000775\n1266    YCL -0.065084  0.057360  0.000893\n    ...       ...       ...       ...\n1105   TQQQ -0.051655  3.366393  0.625342\n1131   UMDD -0.068596  3.494975  0.648961\n684    MIDU -0.069756  3.542325  0.658577\n1059   SPXL -0.061149  3.023759  0.666595\n1134   UPRO -0.060995  3.033314  0.667091<\/code><\/pre>\n\n\n\n<p>SJB is a short bond ETF, so it makes sense that it would be uncorrelated to equities.  
Meanwhile, we see the same ETFs at the top of the list.<\/p>\n\n\n\n<p>Now, let&#8217;s look at the list of equities, this time sorting by alpha:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>     ticker     alpha      beta  rsquared\n2380   TOPS -0.209049  1.177885  0.028528\n448     CEI -0.189496  1.496256  0.035676\n684    DCTH -0.184497  0.085347  0.000182\n1012   GEVO -0.184126  2.639587  0.261636\n1709   NSPR -0.181267  2.010658  0.112558\n    ...       ...       ...       ...\n784    EHTH -0.008732 -0.119886  0.000704\n2306    TAL -0.006959  0.486709  0.023229\n847     EVI -0.006877  0.974036  0.030031\n1253   INSG -0.004456  1.306819  0.047048\n728      DQ -0.003193  1.698251  0.076006<\/code><\/pre>\n\n\n\n<p>So here we go, the alphas of all stocks &#8212; and there&#8217;s one thing we notice immediately: they&#8217;re all less than 0.  <\/p>\n\n\n\n<h2>Testing the Fama-French Model<\/h2>\n\n\n\n<p>One thing we learn from this is that the CAPM model is not a very good fit for current stock prices.  When we run a regression, our r-squared value maxes out at less than .5 for non-ETFs, meaning the fit is not very good.<\/p>\n\n\n\n<p>There is another popular and more recent model called the <a href=\"https:\/\/mba.tuck.dartmouth.edu\/pages\/faculty\/ken.french\/data_library.html\" class=\"rank-math-link\" target=\"_blank\" rel=\"noopener\">Fama-French<\/a> model.  This model adds additional variables to the CAPM equation.  
The Fama-French equation is:<\/p>\n\n\n\n<p class=\"has-text-align-center\"><em>R<sub>i<\/sub> &#8211; R<sub>f<\/sub> = \u03b1<sub>i <\/sub>+ \u03b2<sub>i<\/sub>(R<sub>m<\/sub> &#8211; R<sub>f<\/sub>) + s<sub>i<\/sub>SMB + h<sub>i<\/sub>HML + \u03b5<sub>i<\/sub><\/em><\/p>\n\n\n\n<p>Where the alpha, beta, and epsilon terms remain the same, but two new terms are added:<\/p>\n\n\n\n<p><em>s<sub>i<\/sub>SMB<\/em>, which is a coefficient <em>s<sub>i<\/sub><\/em> multiplied by SMB (small minus big), the precalculated difference between the returns of small-cap and large-cap portfolios.<\/p>\n\n\n\n<p><em>h<sub>i<\/sub>HML<\/em>, which is a coefficient <em>h<sub>i<\/sub><\/em> multiplied by HML (high minus low), the precalculated difference between the returns of portfolios with high and low book-to-market ratios.<\/p>\n\n\n\n<p>The creators of this model publish the values of HML and SMB at the link above.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<h2>Conclusion<\/h2>\n\n\n\n<p>There we have it: we have used Python and pandas to find alphas for each stock in our dataset.  From here, we can start looking into using these values for strategies such as mean-variance optimization and basic statistical arbitrage.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this article, we get started examining the CAPM and Fama\/French alphas by calculating their values for real stocks. Understanding this procedure allows us to build on these models in other articles. Basu and Fama\/French provided important methods for modeling excess returns based on factors beyond the standard Capital Asset Pricing Model. 
Unfortunately, not all [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":402,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[6],"tags":[7,13,12,11,10,14],"_links":{"self":[{"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/posts\/221"}],"collection":[{"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/comments?post=221"}],"version-history":[{"count":1,"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/posts\/221\/revisions"}],"predecessor-version":[{"id":1197,"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/posts\/221\/revisions\/1197"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/media\/402"}],"wp:attachment":[{"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/media?parent=221"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/categories?post=221"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/tags?post=221"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}