Friday, March 2, 2018

Tinkering with Data until it looks Pretty

Experienced data scientists know very well that if you beat data long enough, it will say whatever you want. The recent playbook in India's economic sector shows how you can get data to speak the truth and lie at the same time.

In the last 4 years, the methodology for calculating a few key metrics has been changed.

Metric | Changed in | What changed?
Gross Domestic Product (GDP) | 2015 | Base year changed from 2004-05 to 2011-12
Index of Industrial Production (IIP) | | Base year changed from 2004-05 to 2011-12
Wholesale Price Index (WPI) | | Base year changed from 2004-05 to 2011-12
Consumer Price Index (CPI) | Changed in 2015 and now in 2018 | Base year changed to 2010 and then to 2012
Base Year | Plan is to make the base year 2017-18 | Min. of Statistics and Programme Implementation is planning to change the base year for calculations of most national statistics
The risk is that such revisions are NOT accompanied by backward calculations, or a back series. A hypothetical example: if you redefine 1 inch from 2.54 cm to 4 cm starting today, do not revise old measurements, and use 1 in = 4 cm for every measurement from tomorrow, there is going to be a lot of confusion.
Also, changing the base year to 2017-18 by skipping a few years doesn’t add up.
And since key policy decisions are based on these national statistics, ambiguity about the trend will not help in taking accurate measures.
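To make the back-series problem concrete, here is a minimal Python sketch. All index values are invented for illustration and are not real IIP, WPI, CPI or GDP figures, and the function names are hypothetical. It uses simple ratio splicing at one overlap year, which is only one rough way to link two series; an official back series would normally recompute old years with the new weights and data sources.

```python
# Hypothetical index values, invented purely for illustration.
old_series = {2010: 140.0, 2011: 150.0, 2012: 165.0}  # old base year = 100
new_series = {2012: 110.0, 2013: 121.0}               # new base year = 100


def growth_pct(previous, current):
    """Percentage change from previous to current."""
    return (current / previous - 1) * 100


def splice_back_series(old, new, link_year):
    """Rescale the old series so it matches the new one in link_year.

    This is plain ratio splicing, an assumption made for this sketch,
    not the method any statistical agency is known to use.
    """
    factor = new[link_year] / old[link_year]
    back = {year: value * factor for year, value in old.items() if year < link_year}
    back.update(new)
    return dict(sorted(back.items()))


# Mixing the two series: a 2011 value on the old base against a 2013 value
# on the new base. The result is not a growth rate of anything.
mixed = growth_pct(old_series[2011], new_series[2013])

# With a spliced back series, both years sit on the same scale.
spliced = splice_back_series(old_series, new_series, link_year=2012)
consistent = growth_pct(spliced[2011], spliced[2013])

print(f"mixed bases:  {mixed:.1f}%")
print(f"back series:  {consistent:.1f}%")
```

With these made-up numbers the mixed comparison suggests a fall of about 19%, while the spliced series shows a rise of about 21% over the same two years. The direction of the error depends entirely on the numbers chosen; the point is only that without a back series, comparisons across the revision are not meaningful.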
[In case someone is unaware: these metrics are calculated by the government (NSO, MoSPI), as only the government has the most comprehensive data.]
Among the mainstream print and electronic media, only Business Standard seemed to highlight this technical anomaly:

"... without the back series it becomes difficult if not impossible to make substantial and meaningful assessments about the changes. Moreover, if such changes are brought about too frequently they can also cause a fair amount of confusion and make it very hard to read economic trends and understand policy impact."

This is what the establishment seems to want. The GDP methodology was revised in 2015, and the data for earlier years under the new methodology was not released. The GDP numbers calculated from then on were based on the new methodology, which seemed to give them a little upward push. Comparing them with GDP numbers from before 2015 is now pointless, as the two were calculated differently; the new numbers look more robust and deceive the naked eye.

Secondly, terms like CPI/WPI/IIP are Greek and Latin to almost 98% of the populace. In election campaigns, numbers that are invariably higher than the old ones will be touted, and the fact that the method used to calculate them has changed will be lost on everyone. As they say, the devil lies in the details: not many will bother to look under the hood, and those who do will be relegated to academic journals and blogs like this.

It is a far-fetched dream that the masses will choose their leaders based on their proposed economic policies and question them on the financial feasibility of those policies, especially in India, where caste and religion drive voting patterns. Still, it is the responsibility of the educated and of the fourth pillar of democracy to at least raise these questions and bring them into the public discourse.

