I have been making my way through the EconTalk archives, which I feel are more educational hour-for-hour than a university degree in economics. Today I listened to finance professor Campbell Harvey talking about statistical significance in investment management. He shares a story that ties together a lot of what I wrote about in the last few “math classes”:
Harvey: A number of years ago I was shown some research, at a high-level meeting, at one of the top 3 investment banks in the world. And this person was presenting the research, and basically he had found a variable that looked highly significant in beating the market, with a regression analysis, as you said. And it turned out that this variable was the 17th monthly lag in U.S. Industrial Production.
Russ: Yeah. I’ve always known that’s an important factor. [sarcastic] But that’s the beauty of his approach: nobody knows it, but by his fabulous deep look at the data, he uncovered this secret relationship that no one else knows.
Harvey: So, 17th lag? That seems a little unusual. So, usually we think of maybe the 2nd because one month the data isn’t available because of the publication delay. Maybe the 3rd. But the 17th–where’s that coming from? And then he basically said, ‘Well, that’s the only one that worked.’
Harvey shouldn’t be surprised: he teaches at a business school. Business schools have to cover a lot of ground: leadership (organizing a party), marketing (telling people about the party), supply chain (buying the beer), accounting (Venmoing beer money) and networking (party). The MBAs who go on to become VPs at top investment banks have time for just one mandatory statistics class, which covers p < 0.05 and not much else.
Of course, “p-value” doesn’t sound impressive enough for finance professionals. Instead, an investment strategy is evaluated using the Sharpe Ratio: the ratio of excess return to volatility.
Taking a measurement (excess returns) and dividing it by the standard deviation (volatility) is simply calculating the test statistic, up to a factor of the square root of the number of periods observed. The test statistic is a normalized measure of how far the result is from the null – the null for investment is the return on a “risk free” asset like US treasuries. The test statistic determines the p-value, so the two carry the same information. Each (arbitrary and useless) p-value cutoff corresponds to a test statistic cutoff, which translates to a Sharpe Ratio cutoff above which an investment strategy is “significant” enough to brag about in front of a finance professor.
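To make the correspondence concrete, here is a rough sketch of going from a Sharpe Ratio to a test statistic to a p-value and back. It assumes an annualized Sharpe Ratio, a track record measured in years, and a normal approximation for the test statistic; the function names are mine, made up for illustration, and not anything Harvey or the banker used.

```python
import math
from statistics import NormalDist

def t_stat_from_sharpe(annual_sharpe: float, years: float) -> float:
    """Test statistic implied by an annualized Sharpe Ratio.

    Assumption: returns are independent across periods, so the
    standard error of the mean shrinks with sqrt(years).
    """
    return annual_sharpe * math.sqrt(years)

def two_sided_p_value(t_stat: float) -> float:
    """Two-sided p-value under a normal approximation (an assumption)."""
    return math.erfc(abs(t_stat) / math.sqrt(2))

def sharpe_cutoff(p_cutoff: float, years: float) -> float:
    """Annualized Sharpe Ratio needed to clear a given p-value cutoff."""
    z = NormalDist().inv_cdf(1 - p_cutoff / 2)
    return z / math.sqrt(years)

if __name__ == "__main__":
    t = t_stat_from_sharpe(annual_sharpe=1.0, years=4.0)
    print(f"test statistic: {t:.2f}")
    print(f"p-value: {two_sided_p_value(t):.4f}")
    print(f"Sharpe needed for p < 0.05 over 10 years: {sharpe_cutoff(0.05, 10):.2f}")
```

Under these assumptions the "over 2" in the transcript and the p < 0.05 from stats class are the same threshold wearing different clothes.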
Going from p-values to Sharpe Ratios only serves to obscure the problem of multiplicity – testing many models and picking the best one. The banker tested at least 17 hypotheses (1 month lag, 2 months lag, …, 17 months lag) until he found one that worked. It’s a lot easier (and more intuitive) to divide the p-value cutoff by 17 (the Bonferroni correction) and see if the result is still significant than it is to figure out how to adjust the Sharpe Ratio.
Traders talk about ratios, engineers about sigmas, doctors about confidence intervals – the underlying statistical logic to all these things is the same. And in almost every profession, multiplicity (even when it’s accidental) is the quickest way for this logic to blow up in your face.
Interestingly, after having a good laugh at the banker’s expense, Harvey continues the story:
Harvey: However, and this is kind of, I think interesting. My paper has been very well received by investment bankers and people designing these strategies. And actually it’s interesting because they actually don’t want to market a strategy that turns out to be a fluke. Because that means that it hurts their reputation. It reduces the amount of fees that they get. And it really, basically it could reduce their bonus directly. So there’s actually a strong incentive in terms of business practice to get it right. So, within the practitioner community, at least, there are strong incentives to reduce the impact of data mining, so that you can develop a good reputation.
However, on the academic side, it’s not as clear. As you said, there’s minimal replication in some fields. And the editors don’t see all of the hocus-pocus going on before the paper actually is submitted for scientific review.
Russ: Yeah. When you were in that meeting at the investment bank and the person said it was significant and you said, ‘Well, how many did you run?’ and he said, ‘Well, 26, 24’, whatever it was, and you said, ‘That’s not significant’: Nobody around the table said, ‘So what? Doesn’t matter. We’ll be able to sell it because it’s over 2.’
Harvey: No. People, I’m sure: They do not want to do this. So that damages the reputation hugely. So, everything is reputation in terms of kind of street finance. And you want to do the right thing.
In the last post I wrote that the two reasons to do data analysis are: 1 – Make the right decision, and 2 – get published in an academic journal. I wasn’t being (entirely) facetious.
I don’t believe that academics have less integrity or intellectual honesty than investment bankers, and I don’t think that Harvey implies it. Instead, we both believe in the core maxim of economics: that people will follow their incentives.
What makes the difference in this case is that a bank has different roles for the people who make the models and the people who sell them. As long as the person who creates the investment strategy gets their bonus based on how well the strategy performs, and not based on how easy it is to convince clients of its significance, their incentives will keep them honest.
I wonder if one of the root causes of the bad research published in several disciplines is that the same person makes (i.e. designs and runs the experiment) and sells (to a journal).
People on a diet grab a cookie because near-term desires (taste the cookie right now) easily overwhelm long-term goals (being skinny next summer). A scientist has the long-term goal of discovering enduring truths, but in the short term the journal editor wants p-values and positive results.
As long as the cookies are on the table, it’s hard not to take a bite.