A Few Thoughts on ML in Economic Predictions

I was recently talking to a friend about my work in economic forecasting and why the field hasn’t been influenced much by modern or trendy machine learning algorithms. Most economic forecasting uses a class of models called time-series models. Unlike structural models, they work by identifying the statistical dynamics of how a series changes over time, based on its own past data and its dynamic relationship with related series. For example, if you want to model GDP, you would start with only GDP data and explain how it evolves based on its past values.
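To make the GDP example concrete, here is a minimal sketch of about the simplest model of this kind: an AR(1), where the current value depends on one lagged value, fit by ordinary least squares. The series is simulated rather than real GDP data, and `fit_ar1`, `forecast_ar1`, and `simulate_ar1` are hypothetical helper names used only for illustration.

```python
import random

def fit_ar1(series):
    """Estimate y_t = c + phi * y_{t-1} + e_t by ordinary least squares."""
    lagged = series[:-1]   # y_{t-1}
    current = series[1:]   # y_t
    n = len(lagged)
    mean_x = sum(lagged) / n
    mean_y = sum(current) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lagged, current))
    var_x = sum((a - mean_x) ** 2 for a in lagged)
    phi = cov_xy / var_x
    c = mean_y - phi * mean_x
    return c, phi

def forecast_ar1(last_value, c, phi, steps):
    """Iterate the fitted equation forward, feeding each forecast back in."""
    path = []
    value = last_value
    for _ in range(steps):
        value = c + phi * value
        path.append(value)
    return path

def simulate_ar1(c, phi, n, sigma, seed):
    """Generate synthetic data (an assumption; not real GDP figures)."""
    rng = random.Random(seed)
    series = [c / (1 - phi)]  # start at the long-run mean
    for _ in range(n - 1):
        series.append(c + phi * series[-1] + rng.gauss(0, sigma))
    return series

growth = simulate_ar1(c=1.0, phi=0.5, n=2000, sigma=0.5, seed=42)
c_hat, phi_hat = fit_ar1(growth)
next_four = forecast_ar1(growth[-1], c_hat, phi_hat, steps=4)
print(c_hat, phi_hat, next_four)
```

The only inputs are the series and its own past, which is the point: nothing in the fit knows anything about how the economy works.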

There are a few challenges with a large class of interesting economic time-series models. For one, the model you specify depends heavily on economic theory. As an example, we never use more than one lagged value with stock market data, because by definition a stock market embeds expectations of the future conditional on the information known at the time. So adding older data is not only meaningless, but misleading. Secondly, this is compounded by the fact that economic time-series data often contains little meaningful information. The amount of data in a model shouldn’t be measured just by its sample size, but by the amount of meaningful variation available to build a mathematical connection between your variables.

I ran into this problem when trying to forecast house prices in Seattle based on Zillow data. Their data goes back about 20 years. That’s not much data: there are some economic issues around 2001-2003, a huge price increase up to 2007, a housing crash, and then a new price increase. Not only is this not much variation (4 or 5 data points?), but the relationship between the variables is also changing structurally. Is the relationship between the macro-economy and Seattle house prices in 2004 the same as in the modern tech-boom Seattle?
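A toy illustration of why a structural change matters, using made-up numbers rather than Zillow data: if the slope linking a macro index to house prices shifts partway through the sample, a single regression over the whole period recovers neither regime. The series and the helper `ols_slope` are assumptions for the sake of the sketch.

```python
def ols_slope(x, y):
    """Slope of a one-variable least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x

# Made-up data: prices respond to the macro index with slope 0.5 in the
# first ten periods and slope 2.0 in the last ten.
macro = list(range(20))
prices = [0.5 * m if m < 10 else 4.5 + 2.0 * (m - 9) for m in macro]

slope_early = ols_slope(macro[:10], prices[:10])
slope_late = ols_slope(macro[10:], prices[10:])
slope_pooled = ols_slope(macro, prices)
print(slope_early, slope_late, slope_pooled)
```

The pooled slope lands somewhere between the two regimes, so it describes neither the early period nor the one you actually need to forecast. Here the break date is known because the data were built that way; with real data, deciding where (and whether) the break happened is itself a judgment that leans on economic reasoning.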

How do you solve this? If you’re really, really good, you can build great priors based on economic theory. My economic priors come from years of reading game theory textbooks, economic principles, history, and the philosophy of economics and science. Those books aggregate knowledge of human behavior from centuries of observing, documenting, and modelling human action and markets.

That’s the cool part of economics: many complex systems can be understood from observations that our brains can filter down to a small number of dimensions, which people often call economic theory. If I drop $30 on the ground, I strongly expect the first person who walks by to pick it up. I expect that for two reasons. One, I know it’s what I would do, and as a human I can simulate what another human will probably do. Two, I’ve read a lot about other humans that suggests they would pick up the $30.

Let’s take this back to time-series modelling. The best economic models, before they are even estimated, are hypothesized by a human who combines the specifics of the problem with knowledge of economic theory. How do you get a machine to learn the right model? To replace the human, it needs to understand economic theory and the structure of an economy as laid out in a textbook, be able to simulate human behavior, and understand how all of this interacts with the context of the specific problem.

Not only does this type of problem require massive, high-dimensional data, but because the models are about other humans, we are naturally suited to simulating what it’s like to be another human and how their choices could evolve dynamically in the future.

In a sense, the objective function we minimize for a given time-series model is only an approximation of the “true” objective function, conditional on our having chosen the right model based on our prior knowledge of economic systems.

Based on this, I get the feeling that solving economic models with advanced ML methods (where the model is able to incorporate prior information about economic systems) would require an AI-complete solution, one that is able to read textbooks and human history and simulate human behavior. There will definitely be smaller steps along the way, particularly with respect to letting a machine search very high-dimensional datasets for useful predictors.
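As a rough sketch of what that smaller step could look like, the snippet below screens a pile of candidate series by their correlation with a target on one half of the sample, then checks the picks on the other half. Everything here is simulated, the two "true" drivers are planted by construction, and `correlation` and `screen` are hypothetical helpers. Real predictor search would more likely use something like a penalized regression, so treat this as a stand-in for the idea rather than a recommended method.

```python
import random

def correlation(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def screen(candidates, target, top_k):
    """Return the indices of the top_k candidates by absolute correlation."""
    scores = [abs(correlation(series, target)) for series in candidates]
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]

rng = random.Random(0)
n_obs, n_candidates = 200, 100
candidates = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_candidates)]

# Simulated target: driven by candidates 3 and 17 plus noise (by construction).
target = [candidates[3][t] + candidates[17][t] + rng.gauss(0, 0.5)
          for t in range(n_obs)]

split = n_obs // 2
train = [series[:split] for series in candidates]
selected = screen(train, target[:split], top_k=2)

# Check that the selected predictors still correlate with the target out of sample.
holdout = {i: correlation(candidates[i][split:], target[split:]) for i in selected}
print(sorted(selected), holdout)
```

This works here only because the relationship is stable and strong by design. With a short, structurally changing sample like the housing example, a purely statistical screen has far less to go on, and the out-of-sample check matters a lot more.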

I have to think about this in a more structured way, but my thought now is that the more a model relies on economic theory, the less useful ML models will end up being.