{"id":981,"date":"2021-10-21T15:13:13","date_gmt":"2021-10-21T15:13:13","guid":{"rendered":"https:\/\/firemymoneymanager.com\/?p=981"},"modified":"2022-04-01T01:53:39","modified_gmt":"2022-04-01T01:53:39","slug":"principal-component-analysis-predict-stock-returns","status":"publish","type":"post","link":"https:\/\/firemymoneymanager.com\/principal-component-analysis-predict-stock-returns\/","title":{"rendered":"Can principal component analysis predict stock returns? [2021]"},"content":{"rendered":"\n<p>In this article we take a look at principal component analysis (PCA), a tool used in many disciplines to find patterns in data.  It can be used as part of a machine learning pipeline or on its own.<\/p>\n\n\n\n<h2>What is principal component analysis?<\/h2>\n\n\n\n<p>Wikipedia defines principal component analysis like this:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote\"><p><strong>Principal component analysis<\/strong>&nbsp;(<strong>PCA<\/strong>) is the process of computing the principal components and using them to perform a&nbsp;<a href=\"https:\/\/en.wikipedia.org\/wiki\/Change_of_basis\" target=\"_blank\" rel=\"noopener\">change of basis<\/a>&nbsp;on the data, sometimes using only the first few principal components and ignoring the rest.<\/p><cite>Wikipedia<\/cite><\/blockquote>\n\n\n\n<p>Essentially, PCA uses the eigenvectors and eigenvalues of the data&#8217;s covariance matrix to find a small set of orthogonal directions that together capture most of the variance in the data.  We won&#8217;t go deep into the math, but we link to some useful articles below.<\/p>\n\n\n\n<p>Several academic papers have suggested that this type of analysis can generate factors that predict asset prices.  
In this article we will determine whether that&#8217;s still true.<\/p>\n\n\n\n<h2>Suggested readings<\/h2>\n\n\n\n<ul><li><a href=\"https:\/\/core.ac.uk\/download\/pdf\/297024999.pdf\" target=\"_blank\" rel=\"noopener\">Performance measurement with the arbitrage pricing theory<\/a><\/li><li><a href=\"http:\/\/lib.cufe.edu.cn\/upload_files\/other\/3_20140520102837_Rick%20and%20Return%20In%20an%20Equilibrium%20Apt%20Application%20of%20a%20New%20Test%20Methodology.pdf\" target=\"_blank\" rel=\"noopener\">Risk and return in equilibrium APT<\/a><\/li><li><a href=\"http:\/\/cs229.stanford.edu\/section\/cs229-linalg.pdf\" target=\"_blank\" rel=\"noopener\">Linear algebra review (if you want to understand the math)<\/a><\/li><li><a href=\"https:\/\/firemymoneymanager.com\/getting-started-using-python-to-find-alpha\/\" data-type=\"post\" data-id=\"221\">Getting started: using Python to find alpha [2021]<\/a><\/li><li><a href=\"https:\/\/firemymoneymanager.com\/do-capm-efficient-portfolios-really-outperform-random-ones\/\" data-type=\"post\" data-id=\"829\">Do CAPM efficient portfolios really outperform random ones? [2021]<\/a><\/li><\/ul>\n\n\n\n<p><\/p>\n\n\n\n<h2>Modeling stock returns with 2-factor PCA<\/h2>\n\n\n\n<p>We begin with a basic model of stock returns.  We will limit this model to three months: two input months and one month to test the results.<\/p>\n\n\n\n<p>There are different ways to use PCA on stock data.  We can use the stocks as features and the stock prices at certain dates as samples, or we can use the dates as features and each stock as a sample.  Each approach provides valuable information.<\/p>\n\n\n\n<p>In the papers linked above, the authors build a matrix with time periods as columns and stocks as rows to perform their PCA.  
We use the following code to create a similar matrix:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"> \nimport pandas as pd\n\npd.set_option('display.max_rows', 100)\npd.set_option('display.max_columns', 100)\n\n# Keep listed stocks on the major US exchanges\ntickers = pd.read_csv(\"tickers.csv\")\ntickers = tickers.loc[tickers.exchange.isin(['NYSE', 'NYSEARCA', 'NYSEMKT', 'NASDAQ'])]\ntickers = tickers.loc[tickers.isdelisted == 'N']\ntickers = tickers.loc[tickers.table == 'SF1']\n\n# Load one price file per ticker, skipping tickers with no file\nframes = []\nfor ticker in tickers['ticker']:\n    print(ticker)\n    try:\n        data = pd.read_csv(ticker + \".csv\",\n                           header=None,\n                           usecols=[0, 1, 12],\n                           names=['ticker', 'date', 'adj_close'])\n        frames.append(data)\n    except FileNotFoundError:\n        pass\n\n# DataFrame.append was removed in pandas 2.0, so combine with concat\nalldata = pd.concat(frames)\ndf = alldata.set_index('date')\n\n# One row per date, one column per ticker\ntable = df.pivot(columns='ticker', values='adj_close')\n\n#table = table['2010-01-04':'2020-08-18']\ntable = table['2014-01-01':]\n\ntable.index = pd.to_datetime(table.index)\n# Missing prices are filled with a placeholder value of 1\ntable.fillna(1, inplace=True)\n\n# Month-end prices to monthly percent changes, with stocks as rows\nt = table.resample('BM').last().pct_change().transpose()\n\nt = t[['2019-06-28','2019-07-31','2019-08-30']]\n\n# Two input months and one test month\nx = t[['2019-06-28','2019-07-31']]\ny = t['2019-08-30']<\/pre>\n\n\n\n<p>This gives us one row per stock and one column for each of the three month-end dates in 2019, with monthly percent changes as the values.  Now let&#8217;s use sklearn&#8217;s PCA class to run a 2-component PCA. 
In this code we first scale the data using the StandardScaler, then run the PCA to get an array holding each stock&#8217;s coordinates along the two principal components:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">from sklearn import decomposition as dec, preprocessing as pre\n\n# Standardize each input month, then project onto 2 principal components\nx = pre.StandardScaler().fit_transform(x)\npca = dec.PCA(n_components=2)\nvectors = pca.fit_transform(x)<\/pre>\n\n\n\n<p>That&#8217;s it! The actual PCA is done.  The <em>vectors<\/em> array now holds, for each stock, its coordinates along the 2 principal components that PCA found; the component directions themselves are available in <em>pca.components_<\/em>.  Because <em>x<\/em> has only two columns, the two components together retain all of its variance, so this PCA amounts to a rotation of the scaled data.<\/p>\n\n\n\n<p>We would really like (at least with this first test) to visualize the data to see what exactly the PCA function did.  Let&#8217;s plot the two components (on the x and y axes) along with whether the returns are positive (shown in red) or negative (shown in black) over the period.  If the PCA function successfully found explanatory components, we should see some separation of the data:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" width=\"800\" height=\"800\" src=\"https:\/\/firemymoneymanager.com\/wp-content\/uploads\/2021\/10\/Figure_1.png\" alt=\"\" class=\"wp-image-993\" srcset=\"https:\/\/firemymoneymanager.com\/wp-content\/uploads\/2021\/10\/Figure_1.png 800w, https:\/\/firemymoneymanager.com\/wp-content\/uploads\/2021\/10\/Figure_1-300x300.png 300w, https:\/\/firemymoneymanager.com\/wp-content\/uploads\/2021\/10\/Figure_1-150x150.png 150w, https:\/\/firemymoneymanager.com\/wp-content\/uploads\/2021\/10\/Figure_1-768x768.png 768w, https:\/\/firemymoneymanager.com\/wp-content\/uploads\/2021\/10\/Figure_1-620x620.png 620w, https:\/\/firemymoneymanager.com\/wp-content\/uploads\/2021\/10\/Figure_1-50x50.png 50w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><figcaption>Using principal component analysis to predict stock returns: a basic test of a 2 factor 
PCA<\/figcaption><\/figure>\n\n\n\n<p>Hm&#8230; this does not look too promising.  The red and black points are almost completely intermingled.  The PCA was not able to separate the data along either component in a way that predicts the stock returns.<\/p>\n\n\n\n<p>At this point, we could feed the extracted factors into a predictive tool such as regression or another machine learning model.  Unfortunately, PCA doesn&#8217;t help us achieve a better prediction.<\/p>\n\n\n\n<h2>Can we do better with more factors?<\/h2>\n\n\n\n<p>We can add more factors and more samples, but the predictive value does not improve.  Whether we use dates as samples and stocks as features, or stocks as samples and dates as features, PCA does not show statistically significant predictive value.<\/p>\n\n\n\n<h2>Conclusion<\/h2>\n\n\n\n<p>Principal component analysis is extremely useful for analyzing data in hindsight.  However, for predicting stock prices, it isn&#8217;t particularly helpful.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Can principal component analysis predict stock prices in 2021? 
We use historical returns to determine if this type of analysis still works.<\/p>\n","protected":false},"author":1,"featured_media":993,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[32],"tags":[5,11,34,35,33,10,4],"_links":{"self":[{"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/posts\/981"}],"collection":[{"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/comments?post=981"}],"version-history":[{"count":27,"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/posts\/981\/revisions"}],"predecessor-version":[{"id":1024,"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/posts\/981\/revisions\/1024"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/media\/993"}],"wp:attachment":[{"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/media?parent=981"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/categories?post=981"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/firemymoneymanager.com\/wp-json\/wp\/v2\/tags?post=981"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}