Data Mining in Index Construction: Why Investors Need to Be Cautious

This is an example of data mining in index construction, and of why investors need to tread carefully when looking at the performance of such indices.

Published: March 9, 2020 at 11:32 am

The world of finance is beset with an illness so widespread that even “experts” do not seem to take it as seriously as they should. The illness, which shows no sign of receding anytime soon, is known as ‘data mining’. Here is how it affects index construction and why we need to be careful. This is a guest post by an expert in the financial markets who wishes to remain anonymous for personal reasons.

Many readers from a ‘tech’ background have a positive opinion of data mining, and rightly so: in several fields, data and data mining have worked wonders, from things as simple as understanding customer behaviour to boost sales, to analysing weather trends to make forecasts. However, in the context of finance and investment management, ‘data mining’ is a plague.

In the context of finance and investment management, let me define data mining. Data mining is searching past data for patterns, in particular ‘superior’ performance, without any economic or intuitive rationale. Given the growth in computing power and the large-scale availability of intraday data, it is not very difficult for a half-decent programmer to write simple scripts that run thousands, if not millions, of backtests and come up with some stellar results. However, both professionals and investors conveniently forget the most central tenet of investing, “past performance is not indicative of future results”, even though this statement is thrown around by everyone who has ever bought a single stock or a single unit of a mutual fund.
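To see why a stellar backtest proves little on its own, here is a minimal Python sketch (not from the original post) under deliberately artificial assumptions: every ‘strategy’ is pure noise, with daily returns drawn from a normal distribution with zero mean and 1% daily volatility, and we keep whichever of 500 such strategies did best over the first five years. The function and parameter names (`mine_backtests`, `n_strategies` and so on) are hypothetical. Because none of the strategies has any real edge, whatever ‘alpha’ the winner shows in-sample is luck.

```python
import random
import statistics

TRADING_DAYS = 252  # assumption: trading days per year, used to annualise


def annualised_mean(daily_returns):
    """Annualised arithmetic mean of a list of daily returns."""
    return statistics.fmean(daily_returns) * TRADING_DAYS


def t_stat(daily_returns):
    """t-statistic of the mean daily return against zero."""
    n = len(daily_returns)
    return statistics.fmean(daily_returns) / (statistics.stdev(daily_returns) / n ** 0.5)


def mine_backtests(n_strategies=500, days_in=1260, days_out=1260,
                   daily_vol=0.01, seed=7):
    """Backtest many zero-edge 'strategies' and keep the best in-sample one.

    Hypothetical set-up: each strategy's daily returns are i.i.d. normal
    with mean 0, so any in-sample outperformance is pure chance.
    """
    rng = random.Random(seed)
    best_score, best = float("-inf"), None
    for _ in range(n_strategies):
        in_sample = [rng.gauss(0.0, daily_vol) for _ in range(days_in)]
        out_sample = [rng.gauss(0.0, daily_vol) for _ in range(days_out)]
        # The 'data mining' step: rank strategies purely by past return.
        score = annualised_mean(in_sample)
        if score > best_score:
            best_score, best = score, (in_sample, out_sample)
    in_sample, out_sample = best
    return {
        "in_sample_return": annualised_mean(in_sample),
        "in_sample_t": t_stat(in_sample),
        "out_sample_return": annualised_mean(out_sample),
        "out_sample_t": t_stat(out_sample),
    }


if __name__ == "__main__":
    for name, value in mine_backtests().items():
        print(f"{name}: {value:.3f}")
```

With these settings the selected winner typically shows an in-sample t-statistic of around 3, enough to pass most significance tests, while its out-of-sample return is just another draw centred on zero. Raising `n_strategies` makes the in-sample winner look even better without making it any more real.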

Here is an illustration of data mining in action. MSCI, the world’s largest index provider, with trillions of dollars either tracking or benchmarked against its indices, has three different ‘Value’ indices: the MSCI Value index, the MSCI Value Weighted index and the MSCI ‘Enhanced’ Value index. Any logical person would ask: Why are there three different value indices from the same provider? Which one should I invest in? How do they differ? How is one better than the other? MSCI Value, the oldest member of the family, has been live since 1997; the Value Weighted index was launched in December 2010 and Enhanced Value in April 2015. Of course, the newly launched indices outperform the old ones in the backtests, and that is the “enhancement”.

The following picture plots the NAV ratio of each of the three value indices against the broad market index. The NAV ratio, for those who do not know, is simply the NAV of one index divided by the NAV of another. Its economic interpretation is the performance of a long-short portfolio that goes ‘long’ the numerator index/portfolio and ‘short’ the denominator index/portfolio. So, when the NAV ratio goes up, the numerator index is outperforming the denominator index (the benchmark in this case), and when it goes down, the numerator index is underperforming.

As you can see, the newer indices outperform the older ones by a significant margin, particularly in the backtests. It is also interesting that each new index was launched after a prolonged spell of poor performance by its predecessors. It doesn’t take a team of forensic analysts and investigative journalists to put two and two together. Now look at what happened to the indices once they were launched and went live: that is the result of data mining. Non-robust backtests plagued by data mining will sooner or later reveal their true colours. The fact is that the academic value factor has been underperforming for more than a decade, and no amount of data mining can change that. Whichever way we look at value, there is no escaping it. However, fantastic past performance is what sells. A guy’s gotta eat, and to eat, he’s gotta sell, so…!

NAV evolution of MSCI Value Indices with annotation
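The NAV ratio described above is easy to compute yourself. Below is a minimal Python sketch, assuming the two NAV series are already aligned date by date; the function names and the NAV values in the example are hypothetical. The ratio is rebased to 1.0 on the first date, so a reading of 1.20 means the numerator index’s value relative to the benchmark has risen 20% since the start.

```python
def nav_ratio(numerator_navs, benchmark_navs):
    """NAV(numerator) / NAV(benchmark) on each date, rebased to 1.0 on day one."""
    if len(numerator_navs) != len(benchmark_navs) or not numerator_navs:
        raise ValueError("NAV series must be non-empty and aligned date by date")
    base = numerator_navs[0] / benchmark_navs[0]
    return [(n / b) / base for n, b in zip(numerator_navs, benchmark_navs)]


def relative_moves(ratio, tol=1e-9):
    """Label each step: a rising ratio means the numerator beat the benchmark."""
    moves = []
    for prev, curr in zip(ratio, ratio[1:]):
        if curr > prev + tol:
            moves.append("outperformed")
        elif curr < prev - tol:
            moves.append("underperformed")
        else:
            moves.append("in line")
    return moves


if __name__ == "__main__":
    value_index = [100.0, 110.0, 121.0, 115.0]   # hypothetical NAVs
    market_index = [100.0, 105.0, 115.5, 121.0]  # hypothetical NAVs
    ratio = nav_ratio(value_index, market_index)
    print([round(r, 4) for r in ratio])  # [1.0, 1.0476, 1.0476, 0.9504]
    print(relative_moves(ratio))         # ['outperformed', 'in line', 'underperformed']
```

In the second step both hypothetical indices rise 10%, so the ratio stays flat even though both made money: the ratio measures only relative performance.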

One might wonder how we can be so sure there is data mining. Why can’t we give them the benefit of the doubt? Well, it is out in the open in their methodology documents. The following is an excerpt on how MSCI chooses the variables, and their weights, in constructing its factors. They blatantly admit that they overweight variables that have shown better returns and return-to-volatility ratios (IRs) in the backtests. That’s the textbook definition of data mining, and they say so openly. It can mean only one of two things: (1) they don’t even know they are data mining, or (2) they simply don’t care. I don’t know which of the two is more dangerous.

This is a screenshot from page 8 of the MSCI FaCS Methodology document:

Screenshot from page 8 of MSCI FaCS Methodology document

The text is reproduced below for clarity:

When assigning weights to factors in each factor group, we put more emphasis on factor returns and IRs for Systematic Equity Strategy (SES) factor groups (quality and value), and more emphasis on factor volatilities, t-stats and cross-validated (CV) R² gain for the non-SES factor groups (i.e., the size and volatility factor groups).
Within the quality group, for example, our multi-variate cross-sectional regression statistics, provided in Exhibit 2, also show that the three SES quality factors of earnings quality, investment quality and profitability generated significantly higher factor returns and IRs than the non-SES quality factors of leverage and earnings variability.
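To make concrete what “more emphasis on factor returns and IRs” can mean in practice, here is a hypothetical Python sketch of one simple scheme: weight each factor in proportion to its backtested information ratio (mean return divided by volatility), with negative IRs floored at zero. This is not MSCI’s actual procedure, which the excerpt does not spell out; the function names and the toy return series are made up. The point is that the weights depend on past performance alone, so the better a factor happened to do in the backtest, the more weight it gets.

```python
import statistics


def information_ratio(returns):
    """Mean periodic return divided by its standard deviation (not annualised)."""
    return statistics.fmean(returns) / statistics.stdev(returns)


def backtest_ir_weights(factor_returns):
    """Weight factors in proportion to their positive in-sample IRs (hypothetical scheme)."""
    irs = {name: max(information_ratio(r), 0.0) for name, r in factor_returns.items()}
    total = sum(irs.values())
    if total == 0:
        raise ValueError("no factor has a positive backtested IR")
    return {name: ir / total for name, ir in irs.items()}


if __name__ == "__main__":
    toy_history = {  # hypothetical monthly factor returns, not real data
        "factor_a": [0.02, 0.01, 0.03],   # IR = 2.0
        "factor_b": [0.01, -0.01, 0.03],  # IR = 0.5
        "factor_c": [-0.01, -0.02, 0.0],  # IR = -1.0, floored to 0
    }
    weights = backtest_ir_weights(toy_history)
    print({name: round(w, 2) for name, w in weights.items()})
    # {'factor_a': 0.8, 'factor_b': 0.2, 'factor_c': 0.0}
```

Nothing in this weighting step asks whether there is an economic reason for a factor to keep working; feed it noise, as in a pure simulation, and it will still hand the largest weight to whichever series got luckiest.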

Readers will ask: this is US data, US indices and a US provider; I simply invest in mutual funds in India, so why should I care? If the problem is this blatant in indices, whose backtest track record, construction methodology, launch date and live track record are all public, imagine its scale in your favourite active funds, for which you have access to none of this. There is zero transparency. Indices are rules-based and systematic, whereas active mutual funds are totally discretionary. I cannot even fathom how widespread data mining would be in the mutual fund industry. Thank goodness SEBI came up with rules restricting the number of funds in each category.

This is not to say that we should never backtest anything or never look at backtest performance. Of course not. Past data is the only information we have for making decisions; we should simply take it with a pinch of salt. As Pattu sir says, “Cherry picking best past returns is wrong. Cherry picking worst past risk is prudence”. That’s pretty much it: a one-line summary of what data mining is and is not. That is how we as investors should treat backtests, and past data in general: as a way to understand risks. As for the industry, there is no hope.
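One way to apply “cherry picking worst past risk” is to ask how badly an index or fund did at its worst, rather than how well it did at its best. Below is a minimal Python sketch of two such measures: maximum drawdown (the deepest peak-to-trough fall) and the worst return over any fixed-length window. The function names and the NAV series in the example are hypothetical, and these are only two of many possible risk measures.

```python
def max_drawdown(navs):
    """Deepest peak-to-trough fall in a NAV series, as a positive fraction."""
    peak, worst = navs[0], 0.0
    for nav in navs:
        peak = max(peak, nav)                    # highest NAV seen so far
        worst = max(worst, (peak - nav) / peak)  # fall from that peak
    return worst


def worst_rolling_return(navs, window):
    """Worst simple return over any `window` consecutive periods."""
    if not 1 <= window < len(navs):
        raise ValueError("window must be at least 1 and shorter than the series")
    return min(navs[i + window] / navs[i] - 1 for i in range(len(navs) - window))


if __name__ == "__main__":
    navs = [100.0, 120.0, 90.0, 130.0, 104.0]  # hypothetical NAV history
    print(round(max_drawdown(navs), 3))             # 0.25  (120 -> 90)
    print(round(worst_rolling_return(navs, 2), 3))  # -0.1  (100 -> 90)
```

Unlike picking the best backtest, these numbers are not improved by searching over more strategies: they describe how much pain a single, fixed history actually delivered.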

About the Author: M. Pattabiraman (PhD) is the founder, managing editor and primary author of freefincal. He has been an associate professor at the Indian Institute of Technology, Madras, since Aug 2006. Pattabiraman has co-authored two print books, You Can Be Rich Too with Goal-Based Investing (CNBC TV18) and Gamechanger, and seven other free e-books on various topics of money management. He is a patron and co-founder of “Fee-only India”, an organisation that promotes unbiased, commission-free investment advice. He conducts free money management sessions for corporates and associations. Previous engagements include the World Bank, RBI, BHEL, Asian Paints, Cognizant, Madras Atomic Power Station, Honeywell and the Tamil Nadu Investors Association. For speaking engagements, write to pattu [at] freefincal [dot] com.
About freefincal and its content policy: Freefincal is a news media organisation dedicated to providing original analysis, reports, reviews and insights on developments in mutual funds, stocks, investing, retirement and personal finance. We do so without conflict of interest or bias. Freefincal serves more than one million readers a year (2.5 million page views) with articles based only on factual information and detailed analysis by its authors. All statements made are verified from credible and knowledgeable sources before publication. Freefincal does not publish paid articles, promotions or PR, satire, or opinions without data. All opinions presented are inferences backed by verifiable, reproducible evidence/data. Contact information: letters {at} freefincal {dot} com (sponsored posts or paid collaborations will not be entertained).