Friday, November 17, 2017

Optimizing trading strategies without overfitting

By Ernest Chan and Ray Ng

===

Optimizing the parameters of a trading strategy via backtesting has one major problem: there are typically not enough historical trades to achieve statistical significance. Whatever optimal parameters one finds are likely to suffer from data snooping bias, and there may be nothing optimal about them in the out-of-sample period. That's why parameter optimization of trading strategies often adds no value. On the other hand, optimizing the parameters of a time series model (such as a maximum likelihood fit to an autoregressive or GARCH model) is more robust, since the input data are prices, not trades, and we have plenty of prices. Fortunately, it turns out that there are clever ways to take advantage of the ease of optimizing time series models in order to optimize the parameters of a trading strategy.

One elegant way to optimize a trading strategy is to utilize the methods of stochastic optimal control theory - elegant, that is, if you are mathematically sophisticated and able to analytically solve the Hamilton-Jacobi-Bellman (HJB) equation (see Cartea et al.). Even then, this will only work when the underlying time series is a well-known one, such as the continuous Ornstein-Uhlenbeck (OU) process that underlies all mean-reverting price series. This OU process is neatly represented by a stochastic differential equation. Furthermore, the HJB equations can typically be solved exactly only if the objective function is of a simple form, such as a linear function. If your price series happens to be neatly represented by an OU process, and your objective is profit maximization, which happens to be a linear function of the price series, then stochastic optimal control theory will give you the analytically optimal trading strategy, with exact entry and exit thresholds given as functions of the parameters of the OU process. There is no more need to find such optimal thresholds by trial and error during a tedious backtest, a process that invites overfitting to a sparse number of trades. As we indicated above, the parameters of the OU process can be fitted quite robustly to prices, and in fact there is an analytical maximum likelihood solution to this fit, given in Leung et al.

But what if you want something more sophisticated than the OU process to model your price series or require a more sophisticated objective function? What if, for example, you want to include a GARCH model to deal with time-varying volatility and optimize the Sharpe ratio instead? In many such cases, there is no representation as a continuous stochastic differential equation, and thus there is no HJB equation to solve. Fortunately, there is still a way to optimize without overfitting.

In many optimization problems, when an analytical optimal solution does not exist, one often turns to simulations. Examples of such methods include simulated annealing and Markov Chain Monte Carlo (MCMC). Here we shall do the same: if we cannot find an analytical solution to our optimal trading strategy, but can fit our underlying price series quite well to a standard discrete time series model such as ARMA, then we can simply simulate many instances of the underlying price series. We shall backtest our trading strategy on each instance of the simulated price series, and find the best trading parameters that most frequently generate the highest Sharpe ratio. This process is much more robust than applying a backtest to the real time series, because there is only one real price series, but we can simulate as many price series (all following the same ARMA process) as we want. That means we can simulate as many trades as we want and obtain optimal trading parameters with as high a precision as we like. This is almost as good as an analytical solution. (See the flow chart below that illustrates this procedure - click to enlarge.)

Optimizing a trading strategy using simulated time series

Here is a somewhat trivial example of this procedure. We want to find an optimal strategy that trades AUDCAD on an hourly basis. First, we fit an AR(1)+GARCH(1,1) model to the data using log midprices. The maximum likelihood fit is done using a one-year moving window of historical prices, and the model is refitted every month. We use MATLAB's Econometrics Toolbox for this fit. Once the sequence of monthly models is found, we can use them to predict both the log midprice at the end of each hourly bar, as well as the expected variance of log returns. So a simple trading strategy can be tested: if the expected log return in the next bar is higher than K times the expected volatility (square root of variance) of log returns, buy AUDCAD and hold for one bar, and vice versa for shorts. But what is the optimal K?
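To make this concrete, here is a minimal Python sketch of the model fit and the one-bar-ahead signal. Our own implementation used MATLAB's Econometrics Toolbox; the arch package, the scaling of returns to percent, and the function below are illustrative assumptions, not our production code.

    import numpy as np
    from arch import arch_model

    def next_bar_signal(log_mid, K=0.5):
        """Fit AR(1)+GARCH(1,1) to a rolling window of hourly log midprices and return
        a +1/-1/0 signal for the next bar: long if the forecast log return exceeds
        K times the forecast volatility, short if it falls below -K times it."""
        r = 100.0 * np.diff(log_mid)              # log returns, in percent (helps the optimizer)
        am = arch_model(r, mean="AR", lags=1, vol="GARCH", p=1, q=1, dist="normal")
        res = am.fit(disp="off")
        fc = res.forecast(horizon=1)
        mu = fc.mean.iloc[-1, 0]                  # expected log return of the next bar
        sigma = np.sqrt(fc.variance.iloc[-1, 0])  # expected volatility of the next bar
        if mu > K * sigma:
            return 1
        if mu < -K * sigma:
            return -1
        return 0

In our setup this fit would be refreshed once a month on a one-year moving window, and the signal computed at the close of every hourly bar.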

Following the procedure outlined above, each time we fit a new AR(1)+GARCH(1,1) model, we use it to simulate the log prices for the next month's worth of hourly bars. In fact, we simulate this 1,000 times, generating 1,000 time series, each with the same number of hourly bars as in a month. Then we simply iterate through all reasonable values of K and record which K generates the highest Sharpe ratio for each simulated time series. We pick the K that most often results in the best Sharpe ratio among the 1,000 simulated time series (i.e. we pick the mode of the distribution of optimal K's across the simulated series). This is the sequence of K's (one for each month) that we use for our final backtest. Below is a sample distribution of optimal K's for a particular month, and the corresponding distribution of Sharpe ratios:
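Continuing the Python sketch above, the simulation loop might look like the following. The K grid and the per-bar Sharpe ratio used for ranking are illustrative; again, this is a sketch rather than our actual MATLAB code.

    import numpy as np

    def best_K_by_simulation(am, res, n_bars, K_grid=np.arange(0.0, 2.01, 0.1), n_paths=1000):
        """am, res: the fitted arch model and its fit result from the previous sketch.
        Simulate n_paths months of n_bars hourly bars and return the K that most often
        produces the highest Sharpe ratio."""
        c, phi = res.params.iloc[0], res.params.iloc[1]     # AR(1) constant and coefficient
        winners = []
        for _ in range(n_paths):
            sim = am.simulate(res.params, nobs=n_bars)      # one simulated month
            r = sim["data"].to_numpy()                      # simulated log returns
            vol = sim["volatility"].to_numpy()              # conditional volatility of each bar
            mu = c + phi * np.concatenate(([0.0], r[:-1]))  # conditional mean of each bar
            best_K, best_sharpe = K_grid[0], -np.inf
            for K in K_grid:
                pos = np.where(mu > K * vol, 1, np.where(mu < -K * vol, -1, 0))
                pnl = pos * r                               # one-bar holding period, no costs
                if pnl.std() > 0 and pnl.mean() / pnl.std() > best_sharpe:
                    best_K, best_sharpe = K, pnl.mean() / pnl.std()
            winners.append(best_K)
        values, counts = np.unique(winners, return_counts=True)
        return values[np.argmax(counts)]                    # the mode of the optimal K's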

Histogram of optimal K and corresponding Sharpe ratio for 1,000 simulated price series

Interestingly, the mode of the optimal K is 0 for every month. That certainly makes for a simple trading strategy: just buy whenever the expected log return is positive, and vice versa for shorts. The CAGR is about 4.5%, assuming zero transaction costs and midprice executions. Here is the cumulative returns curve:


You may exclaim: "This can't be optimal, because I am able to trade AUDCAD hourly bars with much better returns and Sharpe ratio!" Of course, optimal in this case only means optimal within a certain universe of strategies, and assuming an underlying AR(1)+GARCH(1, 1) price series model. Our universe of strategies is a pretty simplistic one: just buy or sell based on whether the expected return exceeds a multiple of the expected volatility. But this procedure can be extended to whatever price series model you assume, and whatever universe of strategies you can come up with. In every case, it greatly reduces the chance of overfitting.

P.S. We devised this procedure for our own use a few months ago, borrowing similar ideas from Dr. Ng's computational research in condensed matter physics systems (see Ng et al here or here). But later on, we found that a similar procedure has already been described in a paper by Carr et al.

===

About the authors: Ernest Chan is the managing member of QTS Capital Management, LLC. Ray Ng is a quantitative strategist at QTS. He received his Ph.D. in theoretical condensed matter physics from McMaster University. 

===

Upcoming Workshops by Dr. Ernie Chan

November 18 and December 2:  Cryptocurrency Trading with Python

I will be moderating this online workshop for Nick Kirk, a noted cryptocurrency trader and fund manager, who taught this widely acclaimed course here and at CQF in London.

February 24 and March 3: Algorithmic Options Strategies

This online course focuses on backtesting intraday and portfolio option strategies. No pesky options pricing theories will be discussed, as the emphasis is on arbitrage trading.



Thursday, September 07, 2017

StockTwits Sentiment Analysis


By Colton Smith
===

Exploring alternative datasets to augment financial trading models is currently the hot trend among the quantitative community. With so much social media data out there, its place in financial models has become a popular research discussion. Surely the stock market's performance influences the reactions of the public; but if the converse is true, i.e. if social media sentiment can be used to predict movements in the stock market, then this would be a very valuable dataset for a variety of financial firms and institutions.

When I began this project as a consultant for QTS Capital Management, I did an extensive literature review of the social media sentiment providers and academic research. The main approach is to take the social media firehose, filter it down by source credibility, apply natural language processing (NLP), and create a variety of metrics that capture sentiment, volume, dispersion, etc. The best results have come from using Twitter or StockTwits as the source. A feature of StockTwits that distinguishes it from Twitter is that in late 2012 the option to label your tweet as bullish or bearish was added. If these labels accurately capture sentiment and are used frequently enough, then it would be possible to avoid using NLP. Most tweets are not labeled as seen in Figure 1 below, but the percentage is increasing.

Figure 1: Percentage of Labeled StockTwits Tweets by Year

This blog post will compare the use of just the labeled tweets versus the use of all tweets with NLP. To begin, I did some basic data analysis to better understand the nature of the data. In Figure 2 below, the number of labeled tweets per hour is shown. As expected there are spikes around market open and close.

Figure 2: Number of Tweets Per Hour of the Day

The overall market sentiment can be estimated by aggregating the number of bullish and bearish labeled tweets each day. Based on the previous literature, I expected a significant bullish bias. This is confirmed in Figure 3 below, with the daily mean percentage of bullish tweets being 79%.

Figure 3: Percentage of Bullish Tweets Each Day

When writing a StockTwits tweet, users can tag multiple symbols, so it is possible that the sentiment label could apply to more than one symbol. Tagging more than one symbol would likely indicate less specific sentiment and less predictive potential, so I hoped to find that most tweets only tag a single symbol. Looking at Figure 4 below, over 90% of the tweets tag a single symbol, and a very small percentage tag 5+.

Figure 4: Relative Frequency Histogram of the Number of Symbols Mentioned Per Tweet

The time period of data used in my analysis is from 2012-11-01 to 2016-12-31. In Figure 5 below, the top symbols, industries, and sectors by total labeled tweet count are shown. By far the most tweeted-about industries were biotechnology and ETFs. This makes sense because of how volatile these industries are, which hopefully means that they would be the best to trade based on social media sentiment data.

Figure 5: Top Symbols, Industries, and Sectors by Total Tweet Count

Now I needed to determine how I would create the sentiment score to best encompass the predictive potential of the data. Though there are obstacles to trading an open to close strategy including slippage, liquidity, and transaction costs, analyzing how well the sentiment score immediately before market open predicts open to close returns is a valuable sanity check to see if it would be useful in a larger factor model. The sentiment score for each day was calculated using the tweets from the previous market day’s open until the current day’s open:

S-Score = (#Bullish - #Bearish) / (#Bullish + #Bearish)

This S-Score then needs to be normalized to detect the significance of a specific day’s sentiment with respect to the symbol’s historic sentiment trend. To do this, a rolling z-score is applied to the series. By changing the length of the lookback window the sensitivity can be adjusted. Additionally, since the data is quite sparse, days without any tweets for a symbol are given an S-Score of 0. At the market open each day, symbols with an S-Score above the positive threshold are entered long and symbols with an S-Score below the negative threshold are entered short. Equal dollar weight is applied to the long and short legs. These positions are assumed to be liquidated at the day’s market close. The first test is on the universe of equities with previous day closing prices > $5. With a relatively small long-short portfolio of ~250 stocks, its performance can be seen in Figure 6 below (click on chart to enlarge).
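Here is a minimal Python sketch of this signal construction. The column names, the 20-day lookback window, and the thresholds are illustrative placeholders, not the exact values used in my tests.

    import numpy as np
    import pandas as pd

    def sscore_signals(tweets, window=20, long_thresh=1.0, short_thresh=-1.5):
        """tweets: DataFrame with columns ['date', 'symbol', 'label'], where label is
        'bullish' or 'bearish' and each row is already bucketed into the open-to-open
        day it belongs to. Returns +1 (long), -1 (short) or 0 (neutral) per date/symbol."""
        bull = (tweets[tweets.label == "bullish"]
                .groupby(["date", "symbol"]).size().unstack(fill_value=0))
        bear = (tweets[tweets.label == "bearish"]
                .groupby(["date", "symbol"]).size().unstack(fill_value=0))
        bull, bear = bull.align(bear, fill_value=0)
        sscore = ((bull - bear) / (bull + bear)).fillna(0.0)    # no tweets -> S-Score of 0
        # rolling z-score: how extreme is today's sentiment vs this symbol's recent history?
        z = (sscore - sscore.rolling(window).mean()) / sscore.rolling(window).std()
        z = z.replace([np.inf, -np.inf], 0.0).fillna(0.0)
        signal = pd.DataFrame(0, index=z.index, columns=z.columns)
        signal[z >= long_thresh] = 1      # enter long at the open, exit at the close
        signal[z <= short_thresh] = -1
        return signal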

Figure 6: Price > $5 Universe Open to Close Cumulative Returns

The thresholds were cherry-picked to show the potential of a 2.11 Sharpe ratio, but the results vary depending on the thresholds used. This sensitivity is likely due to the lack of tweet volume on most symbols. Also, the long and short thresholds are not equal, in an attempt to maintain a roughly equal number of stocks in each leg. The neutral basket contains all of the stocks in the universe that do not have an S-Score extreme enough to generate a long or short signal. Using the same thresholds as above, the test was run on a liquidity universe, defined as the top quartile of stocks by 50-day average dollar volume. As seen in Figure 7 below, the Sharpe ratio drops to 1.24 but is still very encouraging.

Figure 7: Liquidity Universe Open to Close Cumulative Returns

The sensitivity of these results needs to be further inspected by performing analysis on separate train and test sets but I was very pleased with the returns that could be potentially generated from just labeled StockTwits data.

In July, I began working for Social Market Analytics, the leading social media sentiment provider. Here at SMA, we run all the StockTwits tweets through our proprietary NLP engine to determine their sentiment scores. Using sentiment data from 9:10 EST, which looks at an exponentially weighted sentiment aggregation over the last 24 hours, the open to close simulation can be run on the price > $5 universe. Each stock is separated into its respective quintile based on its S-Score in relation to the universe's percentiles that day. A long-short portfolio is constructed in a similar fashion as previously, with long positions in the top quintile stocks and short positions in the bottom quintile stocks. In Figure 8 below you can see that the results are much better than when only using sentiment-labeled data.
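A sketch of how such a quintile portfolio can be formed each day follows; the weights and function names are illustrative and this is not SMA's production code.

    import pandas as pd

    def quintile_portfolio(sscore_row):
        """sscore_row: Series of 9:10 S-Scores indexed by symbol for one day.
        Returns target weights: positive for the top quintile, negative for the
        bottom quintile, zero otherwise, with equal dollar weight in each leg."""
        q = pd.qcut(sscore_row.rank(method="first"), 5, labels=False)  # 0 = bottom ... 4 = top
        longs, shorts = sscore_row.index[q == 4], sscore_row.index[q == 0]
        w = pd.Series(0.0, index=sscore_row.index)
        w[longs] = 0.5 / max(len(longs), 1)     # half the book long ...
        w[shorts] = -0.5 / max(len(shorts), 1)  # ... half short, dollar neutral
        return w                                # hold open to close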

Figure 8: SMA Open to Close Cumulative Returns Using StockTwits Data

The predictive power is there as the long-short boasts an impressive 4.5 Sharpe ratio. Due to having more data, the results are much less sensitive to long-short portfolio construction. To avoid the high turnover of an open-to-close strategy, we have been exploring possible long-term strategies. Deutsche Bank’s Quantitative Research Team recently released a paper about strategies that solely use our SMA data which includes a longer-term strategy. Additionally, I’ve recently developed a strong weekly rebalance strategy that attempts to capture weekly sentiment momentum.

Though it is just the beginning, my dive into social media sentiment data and its application in finance over the course of my time consulting for QTS has been very insightful. It is arguable that by just using the labeled StockTwits tweets we may be able to generate predictive signals, but by including all the tweets for sentiment analysis, a much stronger signal is found. If you have questions please contact me at coltonsmith321@gmail.com.

Colton Smith is a recent graduate of the University of Washington where he majored in Industrial and Systems Engineering and minored in Applied Math. He now lives in Chicago and works for Social Market Analytics. He has a passion for data science and is excited about his developing quantitative finance career. LinkedIn: https://www.linkedin.com/in/coltonfsmith/
===
Upcoming Workshops by Dr. Ernie Chan

September 11-15: City of London workshops

These intense 8-16 hour workshops cover Algorithmic Options Strategies, Quantitative Momentum Strategies, and Intraday Trading and Market Microstructure. Typical class size is under 10. They may qualify for CFA Institute continuing education credits.

November 18 and December 2:  Cryptocurrency Trading with Python

I will be moderating this online workshop for Nick Kirk, a noted cryptocurrency trader and fund manager, who taught this widely acclaimed course here and at CQF in London.

Friday, July 21, 2017

Building an Insider Trading Database and Predicting Future Equity Returns

By John Ryle, CFA
===
I’ve long been interested in the behavior of corporate insiders and how their actions may impact their company’s stock. I had done some research on this in the past, albeit in a very low-tech way, using mostly Excel. It’s a highly compelling subject, intuitively aligned with a company’s equity performance - if those individuals most in-the-know are buying, it seems sensible that the stock should perform well. If insiders are selling, the opposite is implied. While reality proves more complex than that, a tremendous amount of literature has been written on the topic, and it has been shown to be predictive in prior studies.

In generating my thesis to complete Northwestern’s MS in Predictive Analytics program, I figured employing some of the more prominent machine learning algorithms to insider trading could be an interesting exercise. I was concerned, however, that, as the market has gotten smarter, returns from insider trading signals may have decayed, as is often the case with strategies exposed to a wide audience over time. Information is more readily available now than at any time in the past. Not too long ago, investors needed to visit SEC offices to obtain insider filings. The standard filing document, the Form 4, has only required electronic submission since 2003. Now anyone can obtain it freely via the SEC’s EDGAR website. If all this data is just sitting out there, can it continue to offer value?

I decided to find out by gathering the filings directly, scraping the EDGAR site. While there are numerous data providers available (at a cost), I wanted to parse the raw data directly, as this would allow for greater “intimacy” with the underlying data. I’ve spent much of my career as a database developer/administrator, so working with raw text/XML and transforming it into a database structure seemed like fun. Also, since I wanted this to be a true end-to-end data science project, including the often ugly data wrangling that makes up 80% of the real effort was an important requirement. That being said, mining and cleansing the data was a monstrous amount of work. It took several weekends to work through the code and finally download 2.4 million unique files. I relied heavily on PowerShell scripts to first parse through the files and shred the XML into database tables in MS SQL Server.

With data from the years 2005 to 2015, the initial 2.4 million records were filtered down to 650,000 insider equity buy transactions. I focused on buys rather than sells because the signal can be a bit murkier with sells. Insider selling happens for a great many innocent reasons, including diversification and paying living expenses. Also, I focused on equity trades rather than derivatives for similar reasons - it can be difficult to interpret the motivations behind various derivative trades. Open market buy orders, however, are generally quite clear.

After some careful cleansing, I had 11 years’ worth of useful SEC data, but in addition I needed pricing and market capitalization data, ideally data that would account for survivorship bias (dead companies). Zacks Equity Prices and Sharadar’s Core US Fundamentals data sets did the trick, respectively, and I could obtain both via Quandl at reasonable cost (about $350 per quarter).

For exploratory data analysis and model building, I used the R programming language. The models I utilized were linear regression, recursive partitioning, random forest and multivariate adaptive regression splines (MARS). I intended to make use of support vector machine (SVM) models as well, but experienced a great many performance issues when running on my laptop with a mere 4 cores. SVMs have trouble with scaling to large datasets. I failed to overcome this issue and abandoned the effort after 10-12 crashes, unfortunately.

For the recursive partitioning and random forest models I used functions from Microsoft’s RevoScaleR package, which allows for impressive scalability versus standard tree-based packages such as rpart and randomForest. Similar results can be expected, but the RevoScaleR packages take great advantage of multiple cores. I split my data into a training set for 2005-2011, a validation set for 2012-2013, and a test set for 2014-2015. Overall, the performance of each of the algorithms tested was fairly similar, but in the end, the random forest prevailed.
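For readers who prefer Python, a rough scikit-learn analogue of this time-based split and random forest fit is sketched below. My actual work was done in R with RevoScaleR; the column names and hyperparameters here are illustrative assumptions.

    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor

    def fit_insider_rf(df, feature_cols, target_col="fwd_3m_rel_return"):
        """df: one row per insider buy, with a datetime 'filing_date' column, the
        predictor columns, and the 3-month return relative to the Russell 3000 as target."""
        train = df[(df.filing_date >= "2005-01-01") & (df.filing_date <= "2011-12-31")]
        valid = df[(df.filing_date >= "2012-01-01") & (df.filing_date <= "2013-12-31")]
        test  = df[(df.filing_date >= "2014-01-01") & (df.filing_date <= "2015-12-31")]
        rf = RandomForestRegressor(n_estimators=500, min_samples_leaf=50,
                                   n_jobs=-1, random_state=0)
        rf.fit(train[feature_cols], train[target_col])
        return rf, rf.predict(valid[feature_cols]), rf.predict(test[feature_cols])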

For my response variable, I used 3-month relative returns vs the Russell 3000 index. For predictors, I utilized a handful of attributes directly from the filings and from related company information. The models proved quite predictive in the validation set, as can be seen in Exhibit 4.10 of the paper, reproduced below:
The random forest’s predicted returns were significantly better for quintile 5, the highest predicted return grouping, relative to quintile 1 (the lowest). Quintiles 2 through 4 also lined up perfectly - actual performance correlated nicely with grouped predicted performance. The results in validation seemed very promising!

However, when I ran the random forest model on the test set (2014-2015), the relationship broke down substantially, as can be seen in the paper’s Exhibit 5.2, reproduced below:


Fortunately, the predicted 1st decile was in fact the lowest performing actual return grouping. However, the actual returns on all remaining prediction deciles appeared no better than random. In addition, relative returns were negative for every decile.

While disappointing, it is important to recognize that when modeling time-dependent financial data, as the time-distance moves further away from the training set’s time-frame, performance of the model tends to decay. All market regimes, gradually or abruptly, end. This represents a partial (yet unsatisfying) explanation for this relative decrease in performance. Other effects that may have impaired prediction include the use of price, as well as market cap, as predictor variables. These factors certainly underperformed during the period used for the test set. Had I excluded these, and refined the filing specific features more deeply, perhaps I would have obtained a clearer signal in the test set.

In any event, this was a fun exercise where I learned a great deal about insider trading and its impact on future returns. Perhaps we can conclude that this signal has weakened over time, as the market has absorbed the informational value of insider trading data. However, perhaps further study, additional feature engineering and clever consideration of additional algorithms is worth pursuing in the future.

John J Ryle, CFA lives in the Boston area with his wife and two children. He is a software developer at a hedge fund, a graduate of Northwestern’s Master’s in Predictive Analytics program (2017), a huge tennis fan, and a machine learning enthusiast. He can be reached at john@jryle.com. 

===
Upcoming Workshops by Dr. Ernie Chan

July 29 and August 5: Mean Reversion Strategies

In the last few years, mean reversion strategies have proven to be the most consistent winner. However, not all mean reversion strategies work in all markets at all times. This workshop will equip you with basic statistical techniques to discover mean reverting markets on your own, and describe the detailed mechanics of trading some of them. 

September 11-15: City of London workshops

These intense 8-16 hour workshops cover Algorithmic Options Strategies, Quantitative Momentum Strategies, and Intraday Trading and Market Microstructure. Typical class size is under 10. They may qualify for CFA Institute continuing education credits.

===
Industry updates
  • scriptmaker.net allows users to record order book data for backtesting.
  • Pair Trading Lab offers a web-based platform for easy backtesting of pairs strategies.


Thursday, May 04, 2017

Paradox Resolved: Why Risk Decreases Expected Log Return But Not Expected Wealth

I have been troubled by the following paradox in the past few years. If a stock's log returns (i.e. change in log price per unit time) follow a Gaussian distribution, and if its net returns (i.e. percent change in price per unit time) have mean m and standard deviation s, then many finance students know that the mean log return is m - s²/2. That is, the compound growth rate of the stock is m - s²/2. This can be derived by applying Ito's lemma to the log price process (see e.g. Hull), and is intuitively satisfying because it is saying that the expected compound growth rate is lowered by risk ("volatility"). OK, we get that - risk is bad for the growth of our wealth.

However, let's find out what the expected price of the stock is at time t. If we invest our entire wealth in one stock, that is really asking what our expected wealth is at time t. To compute that, it is easier to first find out what the expected log price of the stock is at time t, because that is just the expected value of the sum of the log returns in each time interval, and is of course equal to the sum of the expected values of the log returns when we assume a geometric random walk. So the expected value of the log price at time t is just t*(m - s²/2). But what is the expected price (not log price) at time t? It isn't correct to say exp(t*(m - s²/2)), because the expected value of the exponential function of a normal variable is not equal to the exponential function of the expected value of that normal variable, i.e. E[exp(x)] != exp(E[x]). Instead, E[exp(x)] = exp(μ + σ²/2), where μ and σ are the mean and standard deviation of the normal variable (see Ruppert). In our case, the normal variable is the log price, and thus μ = t*(m - s²/2) and σ² = t*s². Hence the expected price at time t is exp(t*m). Note that it doesn't involve the volatility s. Risk doesn't affect the expected wealth at time t. But we just argued in the previous paragraph that the expected compound growth rate is lowered by risk. What gives?

This brings us to a famous recent paper by Peters and Gell-Mann. (For the physicists among you, this is the Gell-Mann who won the Nobel prize in physics for inventing quarks, the fundamental building blocks of matter.) This happens to be the most read paper in the Chaos Journal in 2016, and basically demolishes the use of the utility function in economics, in agreement with John Kelly, Ed Thorp, Claude Shannon, Nassim Taleb, etc., and against the entire academic economics profession. (See Fortune's Formula for a history of this controversy. And just to be clear which side I am on: I hate utility functions.) To make a long story short, the error we have made in computing the expected stock price (or wealth) at time t, is that the expectation value there is ill-defined. It is ill-defined because wealth is not an "ergodic" variable: its finite-time average is not equal to its "ensemble average". Finite-time average of wealth is what a specific investor would experience up to time t, for large t. Ensemble average is the average wealth of many millions of similar investors up to time t. Naturally, since we are just one specific investor, the finite-time average is much more relevant to us. What we have computed above, unfortunately, is the ensemble average.  Peters and Gell-Mann exhort us (and other economists) to only compute expected values of ergodic variables, and log return (as opposed to log price) is happily an ergodic variable. Hence our average log return is computed correctly - risk is bad. Paradox resolved!
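A quick simulation makes the distinction concrete: the ensemble average of wealth grows at rate m, while the time-average (per-path) compound growth rate is only m - s²/2. The Python sketch below uses illustrative parameter values.

    import numpy as np

    m, s, T, n_paths = 0.10, 0.20, 30.0, 1_000_000   # drift, volatility, years, "investors"
    rng = np.random.default_rng(0)

    # log wealth at time T is Gaussian with mean (m - s**2/2)*T and variance s**2*T
    log_w = rng.normal((m - 0.5 * s**2) * T, s * np.sqrt(T), size=n_paths)

    ensemble_growth = np.log(np.exp(log_w).mean()) / T  # ~ m: risk drops out of the ensemble average
    time_avg_growth = log_w.mean() / T                  # ~ m - s**2/2: risk hurts the typical path
    print(round(ensemble_growth, 3), round(time_avg_growth, 3))  # approximately 0.10 and 0.08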

===

My Upcoming Workshops

May 13 and 20: Artificial Intelligence Techniques for Traders

I will discuss in detail AI techniques as applied to trading strategies, with plenty of in-class exercises, and with emphasis on the nuances and pitfalls of these techniques.

June 5-9: London in-person workshops

I will teach 3 courses there: Quantitative Momentum, Algorithmic Options Strategies, and Intraday Trading and Market Microstructure.

(The London courses may qualify for continuing education credits for CFA Institute members.)


Friday, March 03, 2017

More Data or Fewer Predictors: Which is a Better Cure for Overfitting?

One of the perennial problems in building trading models is the sparseness of data and the attendant danger of overfitting. Fortunately, there are systematic methods of dealing with both ends of the problem. These methods are well-known in machine learning, though most traditional machine learning applications have a lot more data than we traders are used to. (E.g. Google used 10 million YouTube videos to train a deep learning network to recognize cats' faces.)

To create more training data out of thin air, we can resample (perhaps more vividly, oversample) our existing data. This is called bagging. Let's illustrate this using a fundamental factor model described in my new book. It uses 27 factor loadings such as P/E, P/B, Asset Turnover, etc. for each stock. (Note that I call cross-sectional factors, i.e. factors that depend on each stock, "factor loadings" instead of "factors", by convention.) These factor loadings are collected from the quarterly financial statements of S&P 500 companies, and are available from Sharadar's Core US Fundamentals database (as well as from more expensive sources like Compustat). The factor model is very simple: it is just a multiple linear regression model with the next quarter's return of a stock as the dependent (target) variable, and the 27 factor loadings as the independent (predictor) variables. Training consists of finding the regression coefficients of these 27 predictors. The trading strategy based on this predictive factor model is equally simple: if the predicted next-quarter return is positive, buy the stock and hold for a quarter. Vice versa for shorts.
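Here is a minimal Python sketch of this pooled regression and the resulting signal. The column names and data layout are illustrative assumptions; the actual code examples live in the book.

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    def fit_factor_model(panel, factor_cols, target_col="next_qtr_return"):
        """panel: one row per (stock, quarter), holding the 27 factor loadings and the
        next quarter's return. One set of coefficients is shared by all stocks."""
        model = LinearRegression()
        model.fit(panel[factor_cols].to_numpy(), panel[target_col].to_numpy())
        return model

    def quarterly_positions(model, latest, factor_cols):
        """Long (+1) the stocks with positive predicted next-quarter returns, short (-1) the rest."""
        pred = model.predict(latest[factor_cols].to_numpy())
        return pd.Series(np.sign(pred), index=latest.index)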

Note there is already a step taken in curing data sparseness: we do not try to build a separate model with a different set of regression coefficients for each stock. We constrain the model such that the same regression coefficients apply to all the stocks. Otherwise, the training data that we use from 200701-201112 will only have 1,260 rows, instead of 1,260 x 500 = 630,000 rows.

The result of this baseline trading model isn't bad: it has a CAGR of 14.7% and Sharpe ratio of 1.8 in the out-of-sample period 201201-201401. (Caution: this portfolio is not necessarily market or dollar neutral. Hence the return could be due to a long bias enjoying the bull market in the test period. Interested readers can certainly test a market-neutral version of this strategy hedged with SPY.) I plotted the equity curve below.




Next, we resample the data by randomly picking N (=630,000) data points with replacement to form a new training set (a "bag"), and we repeat this K (=100) times to form K bags. For each bag, we train a new regression model. At the end, we average over the predicted returns of these K models to serve as our official predicted returns. This results in a marginal improvement of the CAGR to 15.1%, with no change in Sharpe ratio.
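In code, the bagging step is just a resampling loop. A sketch, assuming the pooled training data are held in numpy arrays (parameters are illustrative):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def bagged_predictions(X_train, y_train, X_test, n_bags=100, seed=0):
        """Resample the pooled rows with replacement n_bags times, fit one regression
        per bag, and average the predictions to get the official predicted returns."""
        rng = np.random.default_rng(seed)
        n = len(X_train)
        preds = []
        for _ in range(n_bags):
            rows = rng.integers(0, n, size=n)            # N draws with replacement
            model = LinearRegression().fit(X_train[rows], y_train[rows])
            preds.append(model.predict(X_test))
        return np.mean(preds, axis=0)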

Now, we try to reduce the predictor set. We use a method called "random subspace": we randomly pick half of the original predictors to train a model, and repeat this K=100 times. Once again, we average over the predicted returns of all these models. Combined with bagging, this results in a further marginal improvement of the CAGR to 15.1%, again with little change in Sharpe ratio.
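The random subspace method adds one step to the loop above: each model also sees only a randomly chosen half of the predictors. A sketch:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def subspace_bagged_predictions(X_train, y_train, X_test, n_models=100, seed=0):
        """Each model is trained on a bootstrap sample of the rows and a random half
        of the predictors; predictions are averaged across all models."""
        rng = np.random.default_rng(seed)
        n, p = X_train.shape
        preds = []
        for _ in range(n_models):
            rows = rng.integers(0, n, size=n)                   # bagging: rows with replacement
            cols = rng.choice(p, size=p // 2, replace=False)    # random subspace: half the predictors
            model = LinearRegression().fit(X_train[np.ix_(rows, cols)], y_train[rows])
            preds.append(model.predict(X_test[:, cols]))
        return np.mean(preds, axis=0)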

The improvements from either method may not seem large so far, but at least it shows that the original model is robust with respect to randomization.

But there is another method of reducing the number of predictors. It is called stepwise regression. The idea is simple: we pick one predictor from the original set at a time, and add it to the model only if the BIC (Bayesian Information Criterion) decreases. The BIC is essentially the negative log likelihood of the training data based on the regression model, plus a penalty term proportional to the number of predictors. That is, if two models have the same log likelihood, the one with the larger number of parameters will have a larger BIC and is thus penalized. Once we reach the minimum BIC, we then try to remove one predictor from the model at a time, until the BIC cannot decrease any further. Applying this to our fundamental factor loadings, we achieve a quite significant improvement of the CAGR over the base model: 19.1% vs. 14.7%, with the same Sharpe ratio.
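A sketch of this forward-then-backward stepwise selection, scoring each candidate model by its BIC (statsmodels supplies the BIC of an OLS fit; everything else below is an illustrative implementation, not the exact code behind the results above):

    import numpy as np
    import statsmodels.api as sm

    def stepwise_bic(X, y, names):
        """Forward: add the predictor that lowers BIC most, while any addition helps.
        Backward: then drop predictors one at a time while that lowers BIC further."""
        def bic(cols):
            design = np.ones((len(y), 1)) if not cols else sm.add_constant(X[:, cols])
            return sm.OLS(y, design).fit().bic

        selected, remaining = [], list(range(X.shape[1]))
        best = bic(selected)
        improved = True
        while improved and remaining:                       # forward pass
            improved = False
            cand = min(remaining, key=lambda j: bic(selected + [j]))
            if bic(selected + [cand]) < best:
                selected.append(cand)
                remaining.remove(cand)
                best = bic(selected)
                improved = True
        improved = True
        while improved and len(selected) > 1:               # backward pass
            improved = False
            drop = min(selected, key=lambda j: bic([k for k in selected if k != j]))
            if bic([k for k in selected if k != drop]) < best:
                selected.remove(drop)
                best = bic(selected)
                improved = True
        return [names[j] for j in selected]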

It is also satisfying that the stepwise regression model picked only two variables out of the original 27. Let that sink in for a moment: just two variables account for all of the predictive power of a quarterly financial report! As to which two variables these are - I will reveal that in my talk at QuantCon 2017 on April 29.

===

My Upcoming Workshops

March 11 and 18: Cryptocurrency Trading with Python

I will be moderating this online workshop for my friend Nick Kirk, who taught a similar course at CQF in London to wide acclaim.

May 13 and 20: Artificial Intelligence Techniques for Traders

I will discuss in detail AI techniques such as those described above, with other examples and in-class exercises. As usual, nuances and pitfalls will be covered.