Last Updated on January 30, 2022
The world of finance is beset with an illness so widespread that even “experts” do not take it as seriously as they should. The illness, which shows no sign of going away anytime soon, is known as ‘data mining’. Here is how it affects index construction and why we need to be careful. This is a guest post by an expert in the financial markets who wishes to remain anonymous for personal reasons.
Many readers from a ‘tech’ background have a positive opinion of data mining, and rightly so: in several fields, data and data mining have worked wonders, from understanding customer behaviour to boost sales to analyzing weather trends for forecasting. However, in the context of finance and investment management, ‘data mining’ is a plague.
Let me define data mining in the context of finance and investment management. Data mining is looking at past data, without any economic or intuitive rationale, purely to find patterns, in particular patterns of ‘superior’ performance. Given the growth in computing power and the large-scale availability of intraday data, it is not difficult for a half-decent programmer to write simple scripts that run thousands, if not millions, of backtests and surface a few with stellar results. However, both professionals and investors conveniently forget the most central tenet of investing, “the past is not indicative of the future”, despite this statement being thrown around by everyone who has ever bought a single stock or mutual fund unit.
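To make the point about running thousands of backtests concrete, here is a minimal sketch in Python. It is not anyone's actual methodology: the number of strategies, the 120-month window and the 4% monthly volatility are all assumptions chosen for illustration. Every simulated "strategy" is pure noise with zero true edge, yet the best one in the backtest period still looks impressive, and nothing ties that result to the period that follows.

```python
import random
import statistics


def simulate_backtests(n_strategies=1000, n_months=120, monthly_vol=0.04, seed=0):
    """Simulate strategies with NO real edge (hypothetical parameters).

    Each strategy's monthly excess return is zero-mean Gaussian noise.
    Returns a list of (backtest_mean, live_mean) pairs, one per strategy.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_strategies):
        backtest = [rng.gauss(0.0, monthly_vol) for _ in range(n_months)]
        live = [rng.gauss(0.0, monthly_vol) for _ in range(n_months)]
        results.append((statistics.mean(backtest), statistics.mean(live)))
    return results


results = simulate_backtests()

# "Data mining": pick the strategy with the best backtest and ignore the rest.
best_backtest, best_live = max(results, key=lambda pair: pair[0])
average_backtest = statistics.mean(pair[0] for pair in results)

print(f"Average backtest excess return (monthly): {average_backtest:.4%}")
print(f"Best backtest excess return (monthly):    {best_backtest:.4%}")
print(f"Same strategy after 'launch' (monthly):   {best_live:.4%}")
```

Because the backtest and live periods are generated independently, the selected strategy's live number is just another draw from noise. The gap between the best backtest and the average backtest is created entirely by the act of selecting.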
Here is an illustration of data mining in action. MSCI, the world’s largest index provider, with trillions of dollars either tracking its indices or benchmarked against them, has three different ‘Value’ indices: the MSCI Value index, the MSCI Value Weighted index and the MSCI ‘Enhanced’ Value index. Any logical person would ask the following questions: Why are there three different value indices from the same provider? Which one should I invest in? What are the differences between them? How is one better than the other? MSCI Value, the oldest member of the family, has been live since 1997; the Value Weighted index was launched in December 2010 and Enhanced Value in April 2015. Of course, the newly launched indices outperform the older ones in the backtests, and that is the “enhancement”.
The following picture plots the NAV ratio of all three value indices against the broad market index. The NAV ratio, for those who do not know, is simply one index’s NAV divided by another index’s NAV. The economic interpretation of the ratio is the performance of a long-short portfolio in which we go ‘long’ the numerator index/portfolio and ‘short’ the denominator index/portfolio. So, if the NAV ratio goes up, the numerator index is outperforming the denominator index (the benchmark in this case), and when it goes down, the numerator index is underperforming the denominator. As you can see, the newer indices outperform the old ones by a significant margin, particularly in the backtests. It is also interesting that new indices are launched following a prolonged spell of bad performance by their predecessors. It doesn’t take a team of forensic analysts and investigative journalists to put two and two together. And once the indices were launched and went live, what happened to them? That is the result of data mining. Non-robust backtests plagued by data mining will sooner or later reveal their true colours. The fact is that the academic value factor has been underperforming for more than a decade. No amount of data mining can change that fact. Whichever way we look at value, there is no escaping it. However, a fantastic past performance is what sells. A guy’s gotta eat, and to eat he’s gotta sell, so ..!
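For readers who want to compute a NAV ratio themselves, here is a minimal sketch in Python. The NAV values are made-up numbers for illustration only, not MSCI data, and the function name `nav_ratio` is a hypothetical helper.

```python
def nav_ratio(index_nav, benchmark_nav):
    """Divide one NAV series by another, date by date.

    A rising ratio means the numerator index is outperforming the
    denominator (benchmark); a falling ratio means it is underperforming.
    """
    if len(index_nav) != len(benchmark_nav):
        raise ValueError("Both NAV series must cover the same dates")
    return [i / b for i, b in zip(index_nav, benchmark_nav)]


# Hypothetical NAVs, both rebased to 100 on the first date.
value_nav = [100.0, 110.0, 121.0, 115.0]
market_nav = [100.0, 105.0, 110.0, 115.0]

ratio = nav_ratio(value_nav, market_nav)
for period, r in enumerate(ratio):
    print(f"Period {period}: NAV ratio = {r:.4f}")
```

In this toy series the ratio rises for two periods (the value index outperforms) and then falls back to 1.0, meaning the value index has given up all of its lead over the benchmark even though its own NAV is still above 100.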

One might wonder: how are we so sure there is data mining? Why can’t we give them the benefit of the doubt? Well, it is out in the open in their methodology documents. The following is an excerpt on how MSCI chooses the variables, and their weights, used in constructing its factors. They openly admit that they overweight variables that have shown better returns or volatility in the backtests. That is the textbook definition of data mining, and they say in plain words that they do it. It can mean only one of two things: 1. They don’t even know they are data mining. 2. They simply don’t care. I don’t know which of the two is more dangerous.
This is a screenshot from page 8 of the MSCI FaCS Methodology document.

The text is reproduced below for clarity:
When assigning weights to factors in each factor group, we put more emphasis on factor returns and IRs for Systematic Equity Strategy (SES)^2 factor groups (quality and value), and more emphasis on factor volatilities, t-stats and cross-validated (CV) R^2 gain for the non-SES factor groups (i.e., the size and volatility factor groups).
Within the quality group, for example, our multi-variate cross-sectional regression statistics, provided in Exhibit 2, also show that the three SES quality factors of earnings quality, investment quality and profitability generated significantly higher factor returns and IRs than the non-SES quality factors of leverage and earnings variability
Readers will ask: this is US data, US indices and a US provider; I am simply investing in mutual funds in India, so why should I care? If the problem is this blatant in indices, whose backtest track record, construction methodology, launch date and live track record are all public, imagine the scale of the problem in your favourite active funds, for which you have access to none of this. There is zero transparency. Indices are rules-based and systematic, whereas active mutual funds are entirely discretionary. I cannot fathom the scale at which data mining must be prevalent in the mutual fund industry. Thank goodness SEBI came up with rules to restrict the number of funds in each category.
This is not to say that we should never backtest anything or never look at backtest performance. Of course not. Past data is the only information available to us for making decisions. We should just take it with a pinch of salt. As Pattu sir says, “Cherry picking best past returns is wrong. Cherry picking worst past risk is prudence”. That’s pretty much it: a one-line summary of what data mining is and is not. That’s how we as investors should treat backtests, or past data in general: as a way to understand risks. As for the industry, there is no hope.
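One way to read past data for its worst risk rather than its best return is to compute the maximum drawdown, the largest peak-to-trough fall in NAV. The sketch below is only an illustration of that idea: the NAV series is made up, and `max_drawdown` is a hypothetical helper, not a tool from this site.

```python
def max_drawdown(nav):
    """Return the worst peak-to-trough fall as a negative fraction.

    For example, -0.25 means the NAV fell 25% from an earlier peak.
    """
    if not nav:
        raise ValueError("NAV series is empty")
    peak = nav[0]
    worst = 0.0
    for value in nav:
        peak = max(peak, value)                  # highest NAV seen so far
        worst = min(worst, value / peak - 1.0)   # fall from that peak
    return worst


# Hypothetical NAV history.
nav_history = [100.0, 120.0, 90.0, 130.0, 104.0]
worst_fall = max_drawdown(nav_history)
print(f"Worst past drawdown: {worst_fall:.0%}")  # prints "Worst past drawdown: -25%"
```

The series ends higher than it started, yet an investor had to sit through a 25% fall along the way. That is the kind of information a backtest can honestly provide.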