Autofilling the Data Gaps

My macroeconomics professor at The University at Buffalo told our class, at semester’s end, that people in his profession “had a lot to be humble about.” I loved that line and have used it hundreds of times since, to describe his and other professions, too.

I thought of the professor today when reading this recent post in Retraction Watch: No data? No problem! Undisclosed tinkering in Excel behind economics paper.

Last year, a new study on green innovations and patents in 27 countries left one reader slack-jawed. The findings were no surprise. What was baffling was how the authors, two professors of economics in Europe, had pulled off the research in the first place. 

The reader, a PhD student in economics, was working with the same data described in the paper. He knew they were riddled with holes – sometimes big ones: For several countries, observations for some of the variables the study tracked were completely absent. The authors made no mention of how they dealt with this problem. On the contrary, they wrote they had “balanced panel data,” which in economic parlance means a dataset with no gaps.

“I was dumbstruck for a week,” said the student …

The student wrote the article’s coauthor asking for an explanation and found out from him that Excel’s autofill function had “mended the data.” The program “filled in the blanks. If the new numbers turned negative, [the coauthors] replaced them with the last positive value Excel had spit out.” 

Replacing missing observations with substitute values – an operation known in statistics as imputation – is a common but controversial technique in economics that allows certain types of analyses to be carried out on incomplete data. Researchers have established methods for the practice; each comes with its own drawbacks that affect how the results are interpreted. As far as the student knew, Excel’s autofill function was not among these methods, especially not when applied in a haphazard way without clear justification.

But it got worse. [In] several instances … there were no observations to use for the autofill operation. … [The authors] had filled in thousands of empty cells in the dataset – well over one in 10 – including missing values for the study’s outcome variables. 
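To make concrete what the article is describing, here is a rough sketch in Python with pandas (my stand-in for Excel, since the paper's actual spreadsheet steps aren't public). It contrasts a disclosed linear interpolation with an autofill-style extrapolation that projects a fitted trend over the gaps and swaps any negative projections for the last positive value. The series, years, and variable name are made up purely for illustration; this is not the authors' procedure, just a plausible reconstruction of the kind of fill being described.

```python
import numpy as np
import pandas as pd

# Hypothetical gappy series, standing in for one country's panel variable.
patents = pd.Series([30.0, 22.0, np.nan, np.nan, 6.0, np.nan, np.nan],
                    index=range(2015, 2022), name="green_patents")

# A disclosed, standard option: linear interpolation, but only between
# observed points (trailing gaps stay empty, since there's nothing to anchor them).
interior_filled = patents.interpolate(method="linear", limit_area="inside")

# An autofill-style alternative: fit a straight line to the observed points
# and project it over every gap, including years beyond the last observation.
obs = patents.dropna()
slope, intercept = np.polyfit(obs.index, obs.values, 1)
projected = pd.Series(slope * np.asarray(patents.index) + intercept,
                      index=patents.index)
filled = patents.fillna(projected)

# "If the new numbers turned negative, replace them with the last positive
# value": blank out negatives, then carry the most recent value forward.
filled = filled.mask(filled < 0).ffill()

print(pd.DataFrame({"raw": patents,
                    "interpolated": interior_filled,
                    "autofill_style": filled}))
```

In this toy series the projected values for the last two years come out negative and get quietly replaced by the last observed figure, so the "balanced" panel ends up padded with numbers that were never measured. Which brings us to the question of disclosure.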

Interpolating data this way tends to be bad practice, but economists still do it. It's not "cheating," though, as long as you explain to your readers that this is what you did. Had the authors done so, however, their paper likely would not have been published in the first place.

I have been reading Retraction Watch for years, but literally every week it publishes something that stuns me. There’s a lot of mayhem in academic publishing.
