Data Mining in Index Construction: Why Investors Need to Be Cautious

This is an example of data mining in index construction and why investors need to tread carefully when looking at the performance of such indices.

Published: March 9, 2020 at 11:32 am

The world of finance is beset with an illness so widespread that even “experts” do not seem to take it seriously enough. The illness, which shows no sign of receding anytime soon, is known as ‘data mining’. Here is how it affects index construction and why we need to be careful. This is a guest post by an expert in the financial markets who wishes to remain anonymous for personal reasons.

Many readers from a ‘tech’ background have a positive opinion of data mining, and rightly so. In several fields, data and data mining have worked wonders, from understanding customer behaviour to boost sales to analysing weather trends for forecasting. However, in the context of finance and investment management, ‘data mining’ is a plague.

Let me define what data mining means in the context of finance and investment management. It is nothing but searching past data for patterns, in particular ‘superior’ performance, without any economic or intuitive rationale. Given the growth in computing power and the large-scale availability of intraday data, it is not very difficult for a half-decent programmer to write simple scripts that run thousands, if not millions, of backtests until, by chance alone, some of them show stellar results. However, both professionals and investors conveniently forget the most central tenet of investing, “Past is not indicative of Future”, despite this statement being thrown around by everyone who has ever bought a single stock or mutual fund unit.
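To see why a large number of backtests is dangerous on its own, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the function name `simulate_backtests`, the 4% monthly volatility, the 120-month history and the count of 1,000 strategies are not taken from any index provider. Each "strategy" is pure noise with zero true edge. We pick the one with the best first-half ("in-sample") average return and then look at how the same strategy did in the second half.

```python
import random
import statistics


def simulate_backtests(n_strategies=1000, n_periods=120, seed=0):
    """Pick the best in-sample strategy among zero-edge random strategies.

    Returns (in_sample_mean, out_of_sample_mean) for the winning strategy.
    All parameters are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    half = n_periods // 2
    best = None
    for _ in range(n_strategies):
        # Monthly excess returns with a true mean of zero: no real skill.
        returns = [rng.gauss(0.0, 0.04) for _ in range(n_periods)]
        in_sample = statistics.mean(returns[:half])
        out_of_sample = statistics.mean(returns[half:])
        # Keep whichever strategy looked best in the "backtest" half.
        if best is None or in_sample > best[0]:
            best = (in_sample, out_of_sample)
    return best


best_in, best_out = simulate_backtests()
print(f"Winner's in-sample monthly mean:     {best_in:.4f}")
print(f"Winner's out-of-sample monthly mean: {best_out:.4f}")
```

The winner's in-sample number looks impressive only because it is the maximum of a thousand random draws. Its out-of-sample number is just another draw from a distribution centred on zero, which is what a mined backtest should be expected to deliver once it goes live.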

Here is an illustration of data mining in action. MSCI, one of the world’s largest index providers, with trillions of dollars either tracking or benchmarked against its indices, has three different ‘Value’ indices: the MSCI Value index, the MSCI Value Weighted index and the MSCI ‘Enhanced’ Value index. Any logical person would ask the following questions: Why are there three different value indices from the same provider? Which one should I invest in? What are the differences between them? How is one better than the other? MSCI Value, the oldest member of the family, has been live since 1997; the Value Weighted index was launched in December 2010 and Enhanced Value in April 2015. Of course, the newly launched indices outperform the older ones in the backtests, and that is the “enhancement”.


The following picture plots the NAV ratio of all three value indices against the broad market index. The NAV ratio, for those who do not know, is just one index NAV divided by another index NAV. Its economic interpretation is the performance of a long-short portfolio in which we go ‘long’ on the numerator index/portfolio and ‘short’ on the denominator index/portfolio. So, if the NAV ratio goes up, the numerator index is outperforming the denominator index (the benchmark in this case), and when it goes down, the numerator index is underperforming the denominator.

As you can see, the newer indices outperform the old ones by a significant margin, particularly in the backtests. It is also interesting that new indices are launched following prolonged bad performance by their predecessors. It doesn’t take a team of forensic analysts and investigative journalists to put two and two together. Now look at what happened to the indices once they were launched and went live: that is the result of data mining. Non-robust backtests plagued by data mining will sooner or later reveal their true colours. The fact is that the academic value factor has been underperforming for more than a decade. No amount of data mining can change that fact. Whatever way we look at value, there is no escaping it. However, a fantastic past performance is what sells. A guy’s gotta eat, and to eat he’s gotta sell, so ..!

NAV evolution of MSCI Value Indices with annotation
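
The NAV ratio described above can be sketched in a few lines of Python. The NAV figures below are made-up numbers rebased to 100, not MSCI data, and the function name `nav_ratio` is hypothetical; the point is only to show how a rising or falling ratio maps to out- or underperformance against the benchmark.

```python
def nav_ratio(index_nav, benchmark_nav):
    """Divide one NAV series by another, period by period."""
    if len(index_nav) != len(benchmark_nav):
        raise ValueError("both NAV series must cover the same periods")
    return [i / b for i, b in zip(index_nav, benchmark_nav)]


# Hypothetical NAV series, both rebased to 100 at the start.
value_index = [100.0, 104.0, 108.0, 105.0]
broad_market = [100.0, 102.0, 108.0, 110.0]

ratios = nav_ratio(value_index, broad_market)

# A rising ratio means the numerator index beat the benchmark in that period.
labels = []
for prev, curr in zip(ratios, ratios[1:]):
    if curr > prev:
        labels.append("outperformed")
    elif curr < prev:
        labels.append("underperformed")
    else:
        labels.append("matched")

for period, (ratio, label) in enumerate(zip(ratios[1:], labels), start=1):
    print(f"Period {period}: NAV ratio {ratio:.4f}, value index {label}")
```

Note that in the second period the value index itself rose from 104 to 108, yet the ratio fell because the benchmark rose faster. The ratio measures relative performance, not whether the index made money.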

One might wonder how we can be so sure there is data mining. Why can’t we give them the benefit of the doubt? Well, it is out in the open in their methodology documents. The following is an excerpt on how MSCI chooses several variables and their weights in constructing its factors. They blatantly admit that they overweight variables that have shown better returns/volatility in the backtests. That’s the textbook definition of data mining, and they openly say they do it. It can mean only one of two things: 1. They don’t even know they are data mining. 2. They simply don’t care. I don’t know which of the two is more dangerous.

This is a screenshot from page 8 of the MSCI FaCS Methodology document:


The text is reproduced below for clarity:

When assigning weights to factors in each factor group, we put more emphasis on factor returns and IRs for Systematic Equity Strategy (SES) factor groups (quality and value), and more emphasis on factor volatilities, t-stats and cross-validated (CV) R^2 gain for the non-SES factor groups (i.e., the size and volatility factor groups).
Within the quality group, for example, our multi-variate cross-sectional regression statistics, provided in Exhibit 2, also show that the three SES quality factors of earnings quality, investment quality and profitability generated significantly higher factor returns and IRs than the non-SES quality factors of leverage and earnings variability.

Readers will ask: this is US data, US indices and a US provider; I am simply investing in mutual funds in India, so why should I care? If the problem is this blatant in indices, whose backtest track record, construction methodology, launch date and live track record are all public, imagine the scale of the problem in your favorite active funds, for which you have access to none of this. There is zero transparency. Indices are rules-based and systematic, whereas active mutual funds are totally discretionary. I cannot fathom the scale at which data mining would be prevalent in the mutual fund industry. Thank goodness SEBI came up with rules to restrict the number of funds in each category.

This is not to say that we should never backtest anything or never look at backtest performance. Of course not. Past data is the only information available to us for making decisions. We should just take it with a pinch of salt. As Pattu sir says, “Cherry picking best past returns is wrong. Cherry picking worst past risk is prudence”. That’s pretty much it: a one-line summary of what data mining is and is not. That’s how we as investors should treat backtests, or past data in general: as a way to understand risks. As for the industry, there is no hope.
