**number of shares** of stocks A and B fixed, in the ratio hA:hB, and short this spread when it is much higher than average, and long this spread when it is much lower. On the other hand, for a stationary log price spread hA*log(yA)-hB*log(yB), we need to keep the **market values** of stocks A and B fixed, in the ratio hA:hB, which means that at the end of every bar, we need to rebalance the shares of A and B due to price changes.
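The difference between the two bookkeeping rules can be sketched in a few lines of Python. This is only an illustration: the function names `fixed_share_position` and `rebalanced_shares`, the capital figure, and the prices are all made up, and both hedge ratios are assumed positive.

```python
# Illustrative sketch only: hedge ratios, capital, and prices are made-up numbers.

def fixed_share_position(h_a, h_b, units=100):
    """Price spread hA*yA - hB*yB: hold shares in the ratio hA:hB (long A, short B)."""
    return units * h_a, -units * h_b


def rebalanced_shares(h_a, h_b, price_a, price_b, capital=10_000.0):
    """Log price spread: hold market values in the ratio hA:hB (long A, short B).

    The share counts depend on current prices, so they must be recomputed
    at the end of every bar.
    """
    value_a = capital * h_a / (h_a + h_b)   # dollars long A
    value_b = capital * h_b / (h_a + h_b)   # dollars short B
    return value_a / price_a, -value_b / price_b


# Long the spread with hA:hB = 1:2 over two bars in which prices move.
bars = [(50.0, 20.0), (55.0, 19.0)]
shares_by_bar = [rebalanced_shares(1.0, 2.0, pa, pb) for pa, pb in bars]
for (pa, pb), (sa, sb) in zip(bars, shares_by_bar):
    print(f"yA={pa}, yB={pb}: shares A={sa:.2f}, shares B={sb:.2f}")
```

With fixed shares the position never changes after entry, whereas the market-value rule produces a different share count at each bar.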

For most cointegrating pairs that I have studied, both the price spreads and the log price spreads are stationary, so it doesn't matter which one we use for our trading strategy. However, for an unusual pair whose log price spread cointegrates but whose price spread does not (Hat tip: Adam G. for drawing my attention to one such example), the implication is quite significant. A stationary price spread means that price differences are mean-reverting; a stationary log price spread means that return differences are mean-reverting. For example, if stock A typically grows 2 times as fast as B, but has been growing 2.5 times as fast recently, we can expect the growth rate differential to decrease going forward. We would still short A and long B, but we would exit this position when the growth rates of A vs B return to a 2:1 ratio, and not when the price spread of A vs B returns to a historical mean. In fact, the price spread of A vs B should continue to increase over the long term.

This much is easy to understand. But thanks to a reader Ferenc F. who referred me to a paper by Fernholz and Maguire, I realize there is a simple mathematical relationship between stock A and B in order for their log prices to cointegrate.

Let us start with a formula derived by these authors for the change in log market value P of a portfolio of 2 stocks: d(logP) = hA*d(log(yA))+hB*d(log(yB))+gamma*dt.

The gamma in this equation is

gamma = 1/2*(hA*varA + hB*varB), where varA is the variance of stock A __minus__ the variance of the portfolio market value, and ditto for varB.

Note that this formula holds for a portfolio of any two stocks, not just when they are cointegrating. But if they are in fact cointegrating, and if hA and hB are the weights which create the stationary portfolio P, we know that d(logP) cannot have a non-zero long term drift term represented by gamma*dt. So gamma must be zero. Now in order for gamma to be zero, the **covariance** of the two stocks must be positive (no surprise here) and equal to the **average of the variances** of the two stocks. I invite the reader to verify this conclusion by expressing the variance of the portfolio market value in terms of the variances of the individual stocks and their covariance, and also to extend it to a portfolio with N stocks. This cointegration test for log prices is certainly simpler than the usual CADF or Johansen tests! (The price to pay for this simplicity? We must assume normal distributions of returns.)
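One way to carry out the verification is sketched below. It assumes the weights sum to one (hA + hB = 1), as for portfolio weights, and both are non-zero. The symbols are introduced here for illustration only: sigma_A^2 and sigma_B^2 are the variances of the two stocks, sigma_AB is their covariance, and sigma_P^2 is the variance of the portfolio market value.

```latex
\begin{aligned}
\sigma_P^2 &= h_A^2\sigma_A^2 + h_B^2\sigma_B^2 + 2h_Ah_B\sigma_{AB},\\
\gamma &= \tfrac12\left[h_A(\sigma_A^2-\sigma_P^2) + h_B(\sigma_B^2-\sigma_P^2)\right]
        = \tfrac12\left[h_A\sigma_A^2 + h_B\sigma_B^2 - \sigma_P^2\right]\\
       &= \tfrac12\left[h_A(1-h_A)\sigma_A^2 + h_B(1-h_B)\sigma_B^2 - 2h_Ah_B\sigma_{AB}\right]\\
       &= \tfrac12\,h_Ah_B\left[\sigma_A^2 + \sigma_B^2 - 2\sigma_{AB}\right].
\end{aligned}
```

Under these assumptions, gamma vanishes exactly when sigma_AB = (sigma_A^2 + sigma_B^2)/2, which is the condition stated above.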

===

My online Quantitative Momentum Strategies workshop will be offered on December 2-4. Please visit epchan.com/my-workshops for registration details.

## 69 comments:

Ernie,

You wrote,

"For example, if stock A typically grows 2 times as fast as B, but has been growing 2.5 times as fast recently, we can expect the growth rate differential to decrease going forward. We would still short A and long B, but we would exit this position when the growth rates of A vs B return to a 2:1 ratio, and not when the price spread of A vs B returns to a historical mean. In fact, the price spread of A vs B should continue to increase over the long term."

I believe this explanation is incorrect (no offense!).

Let's assume A and B have fixed long-term growth rates, but that each has an instantaneous growth rate that fluctuates randomly around the mean. If stock A typically grows twice as fast as stock B, then the log(A) price series will grow, on average, at twice the linear rate of the log(B) price series. So, viewed in log space, A and B will both tend to rise linearly with some fluctuations around the best fit line, but log(A)'s best fit line will have twice the slope of log(B)'s. Therefore, log(A) will diverge from log(B) over time, and therefore they cannot be cointegrated.

In order for log(A) and log(B) to be cointegrated, A and B must have the same long-term growth rate. Consider the example of two classes of common stock for a single company that trade on different exchanges in different countries. After accounting for forex effects, these two stocks must grow at the same rate since they fundamentally represent the same company. Therefore their log-price series will grow at the same rate and their log-prices will be cointegrated. But over a very long period of time both price series should grow exponentially, so their raw price series will diverge, because even a small percentage difference between the two will correspond to a large absolute difference compared to their initial values.

- aagold (Adam G.)

Adam,

For 2 stocks with growth rates in the ratio of 2:1, we merely have to keep the ratio of their market values to be 1:2 so that their positions will have the same long-term growth rate.

As I wrote, if log prices are cointegrating, we need to constantly rebalance these positions so that their market values are always in 1:2 ratio.

Ernie

Ernie,

Let's separate the discussion of a trading strategy from the discussion of cointegration definition. Let's see if we can agree on the following definitions.

1) If stocks A and B are cointegrated in raw price space with hedge ratio h, then the difference A - h*B will fluctuate randomly around 0 with no drift.

2) If stocks A and B are cointegrated in log price space with hedge ratio h, then the difference log(A) - log(h*B) will fluctuate randomly around 0 with no drift.

My claim is that any two stocks A & B which satisfy definition #2 must have the same long-term growth rate. Consider this example: A=exp(alpha0*t) and B=exp(alpha1*t). The difference in their logs is (alpha0 - alpha1)*t, which has no drift only when alpha0 = alpha1 (i.e., same growth rate).

- Adam

Adam,

I believe your definition of cointegration of log prices is incorrect.

The spread in this case is defined as log(A)-h*log(B), not log(A)-log(h*B) as you wrote.

Ernie

Ernie,

Ok, I see your point. Defined this way, stocks with different growth rates can be cointegrated in log space. The hedge ratio h compensates for the different growth rates.

However, I think the most interesting real life examples where log prices are cointegrated, but raw prices are not, occur when h=1 (i.e., same growth rate). At least that's the case for any real-world examples I can think of.

Regards,

Adam
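The point settled in this exchange can be checked with a noise-free toy example. The sketch below uses Adam's deterministic setup, A = exp(alpha0*t) and B = exp(alpha1*t), with made-up growth rates in a 2:1 ratio. The variable names are hypothetical.

```python
import math

# Hypothetical growth rates: A grows twice as fast as B (numbers are made up).
g_b = 0.05
g_a = 2 * g_b
h = g_a / g_b  # hedge ratio that offsets the different growth rates

times = range(0, 50, 10)
log_a = [g_a * t for t in times]   # log(A) for A = exp(g_a * t)
log_b = [g_b * t for t in times]   # log(B) for B = exp(g_b * t)

# Spread as Ernie defines it: log(A) - h*log(B). It stays at zero.
spread_h = [la - h * lb for la, lb in zip(log_a, log_b)]

# Spread as Adam first wrote it: log(A) - log(h*B). It drifts with t.
spread_alt = [la - (math.log(h) + lb) for la, lb in zip(log_a, log_b)]

print(spread_h)
print(spread_alt)
```

The first spread has no drift even though the two growth rates differ, while the second grows linearly in t.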

Ernie,

Sorry if I'm beating this to death, but in the example you cited with A and B the hedge ratio would be 2. So the log spread would be log(A) - 2*log(B).

You wrote we would exit this position when the growth rate of A reverts to 2 times the growth rate of B, but I think this is incorrect. We should exit the position when the *ratio* A/B^2 reverts to its historical mean (which is equivalent to the log spread returning to its historical mean).

The analogous statement for raw prices is, we would exit the position when the *difference* A - 2*B reverts to its historical mean.

- Adam

Adam,

You are right that I was being imprecise when I said entry/exit signals should be based on differential "growth rates". By growth rates, I don't mean the instantaneous growth rates d(log(P))/dt, but the average growth rate log(P)/t, where t is the time since some distant past at the beginning of our backtest period. Since t is the same for both stocks, the difference in average growth rates is essentially the same as the difference in log prices.

Ernie

Ernie,

On the topic of Kelly leverage: I'm following the example in your book (Quantitative Trading, p. 99), and for SPY I calculate a leverage of 21.08 using the last 252 returns. Is that also what you get?

Anon,

Yes, that's about what I get too.

Ernie

Anon,

I certainly hope you're not planning on investing on SPY with a leverage factor of 21.08! You know that would be absolutely insane, right? Even half-Kelly at 10.5 would be insane.

It might be ok to estimate future variance using the past 252 daily returns, but it's certainly not correct to estimate future expected returns that way.

I use a half-Kelly model to determine how much stock market exposure I should have with my real-life portfolio, and right now it's saying I should be 70% in the US stock market and 30% in cash. My estimate of the market's future daily mean return is 6.05% per year, annualized standard deviation is 15.4%, and risk-free interest rate is 2.71%.

- Adam
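The 70% figure in the comment above is consistent with the single-asset Kelly formula, leverage = excess return / variance, applied to the quoted inputs. A minimal sketch, using only the three numbers given in the comment (the variable names are hypothetical):

```python
# Inputs quoted in the comment above (annualized).
mean_return = 0.0605   # estimated market return
risk_free = 0.0271     # risk-free rate
volatility = 0.154     # annualized standard deviation

# Kelly leverage for a single asset: excess return divided by variance.
kelly = (mean_return - risk_free) / volatility ** 2
half_kelly = kelly / 2

print(f"Kelly leverage: {kelly:.2f}, half-Kelly: {half_kelly:.2f}")
```

The same formula produces a very large leverage when the mean is estimated from a short, favorable window of returns, which is the concern raised in the comment.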

Hi Ernie,

It seems Yahoo Finance starts to provide real-time tick data (no delay).

Do you hear any good news or comments about that?

Hi Anon,

Indeed Yahoo Finance now offers real-time data, but only from Nasdaq. So if a stock like IBM is primarily traded on NYSE, the Nasdaq price may be slightly different.

Ernie

Hi Ernie,

Is that ok to use IB real-time data stream to do pairs trading in US markets?

Hi Ernie

After reading your books, I found a pair of futures contracts traded on the Shanghai Futures Exchange that is great to extract roll return from. The question is, if one is in backwardation and the other is in contango, how do you determine the hedge ratio between them? Are we supposed to balance their spot return fluctuations? Right now I optimize the backtest Sharpe ratio to determine the ratio. Any advice?

Ruan Xun

Hi Anon,

I find the real-time feed of IB too noisy for pair trading stocks. I recommend IQFeed or even Yahoo Realtime instead.

Ernie

Hi Ruan,

You can use linear regression on their prices (or log prices) to determine the optimal hedge ratio.

You will always be long the one in backwardation, and short the one in contango if you want to extract roll return.

Ernie
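As a rough illustration of the regression suggested above, the sketch below fits a least-squares line through made-up prices and uses the slope as the hedge ratio. The helper `ols_slope` and the price lists are hypothetical, and the data are constructed so that the true slope is known.

```python
def ols_slope(x, y):
    """Slope of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    return cov_xy / var_x


# Made-up prices for two contracts, constructed so the true slope is 2.
price_b = [10.0, 10.5, 11.0, 10.8, 11.4, 11.9]
price_a = [2.0 * p + 1.0 for p in price_b]

hedge_ratio = ols_slope(price_b, price_a)    # units of B per unit of A
spread = [a - hedge_ratio * b for a, b in zip(price_a, price_b)]
print(hedge_ratio)
```

To regress log prices instead, pass the logarithm of each series to the same function.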

Hi Ernie

So should I use continuous futures prices of these 2 contracts, or should I use the 2 underlying spot prices instead?

Ruan,

You should not use continuous futures, nor spot prices.

You should be using individual futures contracts to test for roll returns.

Ernie

Ernie,

I tested the price spread you talked about in your book, A - h*B, for stationarity, and I find that for a lot of pairs the stationarity keeps flipping from true to false and vice versa if I retest the pair every month (with varying lookback windows). Even a stable pair like GLD-GDX that you mentioned isn't very stationary month to month. How does one interpret this result?

Anon,

Stationarity tests should involve at least 1 year of daily data. Are you saying that even with 1 year of lookback, the test statistic changes greatly from month to month?

Ernie

Yes. Using a 1 year lookback and testing for stationarity every month, I notice that stationarity flips from True to False and vice versa on the majority of the pairs.

Is this something you notice too ?

Anon,

Usually a stationarity test is more stable than that. Perhaps increasing your lookback to 3 years would help. If not, it simply indicates those pairs are not really stationary.

Ernie

Hi Ernie

do you use fundamental data such as PE and ROE etc? Any good service/api recommendation? thanks

Paul

Hi Paul,

I don't currently use fundamental data. But I believe you can scrape the Yahoo Finance website for such info. IB's API also provides such data.

Ernie

Have you tried Quantum Mechanics trading?!

http://arxiv.org/abs/1307.6727

Hi Anon,

Thanks for the article. I find this article lacking in empirical support.

Ernie

Hi Ernie

I just read the paper of

"Optimal Pairs Trading: A Stochastic Control Approach"

http://www.nt.ntnu.no/users/skoge/prost/proceedings/acc08/data/papers/0479.pdf

As I am not familiar with the Ornstein–Uhlenbeck process and its application to pairs trading, I would like to seek your opinion.

Isn't it true that the OU process can model the spread and its mean-reverting behaviour dynamically in continuous time, which the cointegration approach cannot? The weakness of the OU process is that it does not tell us the weight of each stock in a pair. Thus, we have to use the stochastic control approach to get this weight, but we will have to set a final period T when we close the position. The cointegration approach, by contrast, explicitly gives the weight of each stock in a pair.

Hi Anon,

Indeed the OU process does not tell you the optimal hedge ratio. It is a model of one mean-reverting time series, not the cointegration of several series. The only use for an OU model for me is to extract the halflife of mean reversion from the regression coefficient.

Ernie
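The half-life extraction mentioned above can be sketched as follows: regress the change in the spread on its lagged level, dy(t) = lam * y(t-1) + mu + noise, and take half-life = -ln(2) / lam. The function name `half_life` and the toy spread are hypothetical, and the toy series is noise-free so the answer can be checked by hand.

```python
import math


def half_life(series):
    """Estimate mean-reversion half-life from the regression
    dy(t) = lam * y(t-1) + mu + noise, using half-life = -ln(2) / lam."""
    lagged = series[:-1]
    delta = [b - a for a, b in zip(series[:-1], series[1:])]
    n = len(lagged)
    mean_x = sum(lagged) / n
    mean_y = sum(delta) / n
    lam = (sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, delta))
           / sum((x - mean_x) ** 2 for x in lagged))
    return -math.log(2) / lam


# Noise-free toy spread that decays 10% of the way to zero each bar, so lam = -0.1.
toy_spread = [0.9 ** t for t in range(50)]
print(half_life(toy_spread))
```

The result is in units of bars. If the fitted lam is not negative, the series is not mean-reverting and the formula does not give a meaningful half-life.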

Hi Ernie

When you do simulation or backtesting for the pairs, do you use tick data or just daily High, Low, Open and Close prices?

Thx

James

Hi James,

That depends on whether the mean reverting strategy is daily, or intraday. In either case, you need bid-ask quotes: trade prices will inflate results.

Ernie

Hi Ernie,

I do not quite understand the concept of cointegration trading with log prices.

Assuming we have a cointegration relationship

log(A) - 0.5 log(B) = 0

If log(A) - 0.5 log(B) > 0, we will short A and long B.

I do not understand why we have to short A. Our assumption here is the growth rate of A may decrease, and this does not mean that the price of A will drop. Hence, we will make a loss if we short A, as the growth rate of A is still increasing.

In pair trading, whether using raw prices or log prices, we should expect one side to lose while the other side profit.

In your example, hopefully the gain in B will be more than enough to offset the loss in A.

Ernie

Dr Ernie,

For the stationarity test on USDCAD, you take its logarithm. I read that if we use prices like USDCAD, we should use the difference, and hence a random walk model. If we use returns, we should take the log, and an exponential random walk model is used. Does it matter whether we take the log or not?

Thank you

Leo

Dr Ernie,

I am trying to see the variance ratio test on your data of USDCAD (1min) for different time frame. If I want to look at 60minute time-frame, how do I choose the number of periods for the numerator and denominator based on the below formula? Is it sampling at 60 points for the top and 1 point for the bottom?

VR(k) = Variance(r_t(k)) / (k * Variance(r_t))

Thank you

Leo

Hi Leo,

No, you should just input the 60-min bars into the vratiotest function. This is a hypothesis test, so you can't just compute the ratio and see if it is 1.

Ernie
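For reference, the point estimate in Leo's formula can be computed directly, as in the sketch below. This gives only the raw ratio, not the significance test that vratiotest performs, so it does not replace the hypothesis test recommended above. The function name `variance_ratio`, the choice of non-overlapping k-period returns, and the toy data are assumptions made for illustration.

```python
from statistics import pvariance


def variance_ratio(log_prices, k):
    """Point estimate VR(k) = Var(k-period returns) / (k * Var(1-period returns)),
    using non-overlapping k-period returns."""
    one_period = [b - a for a, b in zip(log_prices[:-1], log_prices[1:])]
    k_period = [log_prices[i + k] - log_prices[i]
                for i in range(0, len(log_prices) - k, k)]
    return pvariance(k_period) / (k * pvariance(one_period))


# Toy mean-reverting series: the log price alternates between two levels.
toy_log_prices = [0.0, 1.0] * 20
print(variance_ratio(toy_log_prices, 2))
```

With 1-minute log prices, calling the function with k=60 would compare 60-minute returns against 1-minute returns.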

Dr Ernie,

This is because we can see the variance ratio <1 , =1, >1 and so know the "state" if we use the formula. But if we use vratio from matlab, it can only as you said reject RW with a probability. No way to form the equation with that 1min data for 60min time frame?

Thank you

Leo

maybe I'm missing something in this discussion, but for me a pair is cointegrated if there is a linear combo that is stationary. Stationarity is not the same as driftless. driftless can be non stationary and stationary processes can have drift, i.e., that which is implied by the OU process.

technically, if log(X) and log(Y) cointegrated, one gets a fairly nonlinear expression for the innovation of prices, owing to the fact that aLog(X)-bLog(Y)=Log(X^a/Y^b), then do OU process and express innovation of X in terms of Y. I get that it isn't as simple as what Ernie says in his opening comments.

Marc,

You are right that stationary processes can have polynomial dependence on t. We are only concerned with whether the residual is a bounded function in time.

And you are also right that a driftless process, such as a random walk, is not stationary, since the variance can increase without bound.

When we test for cointegration of log prices, what we are really testing for is whether the returns per period have a linear relationship to each other.

Ernie

Oh, yes, interesting. thanks Ernie Logs is looking nice for a number of reasons now....

Hi Leo,

The variance ratio is a t-statistic. You need critical values of the t-statistic to test if it is significant. vratiotest provides that (hypothesis) test.

Ernie

Dr Ernie,

In your book, you need holding period and look back period to check for trending. Any statistics to test for trending using lookback period? like the way variance ratio test?

Thank you very much

Leo

Hi Ernie,

I have read the post in detail and find it very interesting.

I just wanted to clarify something... can you please define exactly what you mean by "growth rate" of a stock. And also "growth rate differential".

Thanks.

We assume that stock prices grow exponentially. Thus the growth rate is the exponent. This is also ordinarily known as the CAGR (compounded annual growth rate) in financial literature.

Growth rate differentials is the difference of this exponent for 2 different stocks.

Ernie

Hello Ernie, I'm building a mean reversion algorithm based on H1 data, and depending on which dates I use to check the cointegration I get good or bad results. So, in live trading, do I have to check at each new candle whether the pair is still cointegrated? I'm a little bit confused about this topic, because with 1000 candles the pair may be cointegrated, but when I check 1500 it is not.

I think something like this happened in the GLD-GDX case: the cointegration broke and then resumed.

Another question: do I have to recalculate my parameters at each candle, or is it better to do so once a day or week?

Thanks.

Greetings

Hi Carlos,

Typically, cointegration needs to be determined only using daily data. It doesn't help to use intraday data, unless you want to liquidate your positions at the close each day.

Even if you were to use daily data, there is no need to update your cointegration test daily. Monthly update is good enough. You can run that on the most recent 3 years of data.

Ernie

Hi Ernie,

In Pairs Trading by Vidyamurthy on page 83, the author describes an elementary example of trading with log prices. However, he seems to use the cointegration coefficient to indicate the ratio of shares to hold rather than to indicate the relative market value of positions (as you state above). As I'm sure you have read this book, can you reconcile this discrepancy? From the book, with a cointegration coefficient of 1.5, he states "At time t, buy shares of A and short shares of B in the ratio 1:1.5" on page 83. Would very much appreciate your input.

Best,

Flapjacks

Hi Flapjacks,

I have not read that book. But based on your description, I can only say that I disagree with his interpretation. For a mathematical justification of my interpretation, please read p. 65 of my book Algorithmic Trading.

Ernie

Hi Dr Ernie:

I am your reader. However, I find the allocation ratio for a pair trade under log prices difficult to comprehend.

Why do we want to keep the market values of assets A and B in the fixed ratio Ha:Hb?

Shouldn't their ratio always be proportional to Ha*Log(Pa)/Hb*Log(Pb)?

Hi Chen,

I don't necessarily recommend a fixed market cap ratio. You can choose to keep the ratio of shares to be fixed instead - in this case you won't have to perform rebalancing each day. However, this runs into the danger that your portfolio may have a net exposure over time.

If you do want fixed market cap ratio, the hedge ratio b is determined by a linear regression of their log prices. The ratio you displayed does not seem right.

Ernie

Dr Ernie,

I am trying to develop a mean reversion strategy with FX intraday data (1H) between NZDUSD - CHFUSD using log(rt/rt-1) and Johansen. If one log price grows to some number of standard deviations above the mean, should I enter my positions according to the signs from the Johansen test? As I read it, you expect to get a profit in one pair and a loss in the other? Should I use only daily prices? Does cointegration not make sense with intraday data such as 15m, 30m, 1h?

Hi Camilo,

The Johansen test determines the hedge ratio between your 2 instruments. So yes, you should enter into a position based on the hedge ratio (which is signed).

Both sides may win, lose, or one-side win and the other lose.

The optimal frequency of your data is something you need to backtest.

Cointegration is not particularly useful for short time frame, but you can still use it to obtain your hedge ratios (and the half-life of mean reversion.)

For the best way to trade mean reversion, see Cartea, 2015, which I referenced in a recent Tweet.

Ernie

Hi Ernie,

I am reading your "Algorithmic Trading", and in page 65 chapter 3, you mentioned that when h1=-h2 in y = h1y1+h2y2, then log(y1/y2) and y1/y2 are indeed stationary...

I can't see why, could you please elaborate more?

By the way, does the "stationary" you mentioned here mean that the Hurst exponent does not equal 0.5? Or do you mean that the mean and auto-covariance are independent of time?

Thank you very much!

Hi Tianyi,

Actually there is a typo: I was referring to Equation 3.3, not 3.1. I.e. if the log price of y1 and y2 are cointegrated with hedge ratio = -1, then y1/y2 are stationary.

The stationarity definition I used is that the mean and covariance is independent of time.

Ernie

Dr. Chan,

Thank you for putting together such an excellent book, and providing such diligent blog responses. They've been a tremendous help!

A variant of this question seems to have been posted earlier, but here goes:

When forming a stationary time series using log prices, as you outline in your book, you advocate holding market values as opposed to ratio of shares. However, while researching this subject, I came across two authors, Professor Ruey Tsay (Booth) and Ganapathy Vidyamurthy (author of Pairs Trading 2004), who seemingly advocate contradictory advice. Mainly, that one should hold a ratio of shares regardless of the logged price levels. As I'm just beginning my exploration on this topic, I'm hoping that you might add a bit more color to the topic and perhaps specify why the market value method is superior.

I've provided a link to Professor Tsay's online lecture slides (Page 10 and 11), and linked the below unanswered stackexchange thread referencing Vidyamurthy.

http://faculty.chicagobooth.edu/ruey.tsay/teaching/bs41202/sp2012/lec10-12.pdf

http://quant.stackexchange.com/questions/19340/what-does-the-cointegration-coefficient-represent-in-pairs-trading-when-cointegr?newreg=52e62b15fddf4c7ea1c826fa106d3401

Best,

Gyllenhaal

Hi Gyllenhaal,

Thank you for your kind words on my books and blog!

Using ratio for trading pairs has the virtue that one does not have to adapt the hedge ratio constantly. It is also equivalent to fixing the hedge ratio for log prices at 1. But that also means that one must adjust the market value of the two legs regularly. Also, if the growth rates of the two legs aren't the same, a hedge ratio of 1 won't be optimal.

Ernie

Hi Ernie,

First of all, thanks for writing such a great blog and those books. Your books and blog opened the quantitative trading world door to me.

I'm currently using linear regression to calculate the optimal hedge ratio between stocks with raw prices. As stocks have earnings seasons, the price processes could be more volatile during those periods. To eliminate the noise during those periods, I wanted to remove the price data from those periods in the optimal hedge ratio calculations. However, due to the autoregressive component of the time series data, I cannot simply remove the data in between and join the remaining data. What would be the best method of doing this? Or should I even be doing this? My reason for doing this is that I don't want the optimal hedge ratios to be fitted on data with too much noise.

I'm considering convert the time series into return space or log return space. Remove the data that fall into the time period , run regression on return or log return. However, I'm not sure the result of the regression on return/log return will give me the correct optimal hedge ratio.

Thanks again for your work and effort.

Sincerely,

Simon

Hi Simon,

Thanks for your kind words on my writings.

I don't see what's wrong with simply removing those days within the earnings season from your regression fit of prices. Linear regression does not assume the lack or presence of autocorrelations between different data points.

Ernie

Hi Ernie,

Thanks for the prompt reply. I see your point. Linear regression does not require that assumption. However, in the case of a Johansen test, should I be worried about the data being cut off during the earnings seasons? Because the change from the last price before the cut-off to the first price after the cut-off could be misleading, just like the rolling over of futures contracts.

Thanks again for your work.

Sincerely,

Simon Z

Hi Simon,

Indeed, you can't just remove those prices from a Johansen test.

If you want to perform Johansen test on a continuous price series while removing those that are in earnings season, you have to adjust for the price gap like the way people piece together futures contracts to form continuous contract. Please see http://epchan.blogspot.com/2015/07/time-series-analysis-and-data-gaps.html

Ernie

Hi Ernie,

Thanks for the reply again. I shall do that instead.

Another question that I wanted to ask is why we don't run a linear regression on returns/log returns to determine the optimal hedge ratio. I noticed in your post that you are comparing price vs log price. I have tried running linear regressions using price, log price, return, and log return for one pair using 1-min bars. After that I ran an ADF test on the price residuals of the regressions (obtaining the optimal hedge ratios and calculating the residuals in terms of price). I realized that although all of the residuals passed the ADF test with p-value < .01, the regressions using return or log return gave me a much more negative t-statistic than those using price/log price. The optimal hedge ratios calculated from these regressions are slightly different.

Well, let me try to answer that question and see if I'm on the right track.

If we convert a price series to a return series, it loses some information about its current level. The regression on return data will not know the current prices of the stocks. The regression is trying to calculate the instantaneous relationship between the two stocks' returns within a fixed interval. For example, if the optimal hedge ratio between stock A and B is 2:1 using daily data, it suggests that if stock A has an x% price increase in one day, we should expect stock B to have an x/2% price increase on the same day. If we notice that stock B over-performed or under-performed, we might put on a trade to take advantage of it. Am I wrong here?

Thanks for your time reading and answering.

Sincerely,

Simon

Hi Simon,

When we pair trade stocks, we are not interested in having the returns cancel each other as well as possible. If we find a hedge ratio based on returns, we are doing exactly that. In pair trading, we are interested in mean reversion of the spread. The spread is made of 2 price series, not returns series. We are trading the deviation from a straight line fitted through the scatter plot of the prices. Hence the best way to construct the spread is to use the slope of that straight line as the hedge ratio.

Ernie

Hi Ernie,

Thanks a lot! This definitely clears my confusion. Thanks again for the great work you put up together.

Hi Ernie,

In your blog above you wrote that for hA*yA-hB*yB, "We should just keep the number of shares of stocks A and B fixed, in the ratio hA:hB, and short this spread..."

I believe that this is only true if you are using a stationary hedge ratio. When you start using dynamically hedged ratios (e.g. with lookback n or kalman filter) and have an open position, you must rebalance your portfolio periodically to best match the new synthetic spread by holding hA in stock A and short hB in stock B?

Can you let me know if my understanding is correct.

Hi Vincent,

Yes, if you dynamically adapt the hedge ratio, you do need constant rebalancing.

Ernie

Hi Ernie,

I love your books and blog comments.

I see for pair trading people often use ADF instead of DF.

What is the rationale?

Ryo

Hi Ryo,

The Dickey-Fuller test assumes the underlying time series model is AR(1). But the Augmented Dickey-Fuller test assumes a more general error correction model with both lagged prices and lagged differences, which accommodates higher-order AR(p) dynamics.

So in general we should use ADF because of its generality.

Ernie

Thank you very much for your reply.

Please allow me for a few more questions.

When identifying a pair, do we need to run an ADF test on each security to check that they are integrated of the same order? Or do we just need to run the test on the pair?

I am guessing that we need same order otherwise we have spurious results.

Or is it not necessary so?

Regards,

Ryo

Ryo,

It is necessary to first run ADF on each price series to ensure that they are NOT I(0).

If it is I(0), then you don't need to trade it within a pair!

If it isn't I(0), then you should pair it with another price series and run cadf (not adf) test on the pair.

Ernie

Thank you very much!

Ryo
