Abstract
We show how state-of-the-art large language models (LLMs), seemingly inapplicable to the small samples typical of macroeconomics, can be trained to learn the language of the macroeconomy. We estimate a large-scale dynamic stochastic general equilibrium (DSGE) model on an initial segment of the data and obtain a posterior distribution over structural parameters. We sample from this posterior to generate millions of theory-consistent synthetic panels that, when mixed with actual macroeconomic data, form the training corpus for a time-series transformer with attention. The trained model is then used to forecast out of sample through 2025. The results show that this hybrid forecaster, which combines the theoretical coherence of DSGE models with the representational power of modern LLMs, successfully learns the macroeconomic language.
Summary
This paper addresses the challenge of macroeconomic forecasting with limited historical data by leveraging large language models (LLMs). The central idea is to train a time-series transformer with attention on a hybrid dataset comprising both actual macroeconomic data and synthetic data generated from a dynamic stochastic general equilibrium (DSGE) model. The DSGE model, estimated on an initial segment of historical data (1947:Q3-1959:Q4), yields a posterior distribution over structural parameters. Sampling from this posterior produces millions of theory-consistent synthetic panels, which are mixed with real macroeconomic data (1960:Q1-2017:Q3) to form the training corpus. The trained model is then used to forecast out of sample (2017:Q4-2025:Q2). The key finding is that this hybrid approach combines the theoretical coherence of DSGE models with the representational power of modern LLMs, enabling the model to learn the "macroeconomic language" and achieve strong predictive performance.

The authors' central move against the small-sample problem is to use the DSGE model not for direct forecasting but as a structured data generator, effectively encoding economic theory into the training data. The paper also presents architectural adaptations of the transformer: a separate-then-concatenate embedding strategy for multivariate macroeconomic data and a modular design with a specialized transformer for each target variable. Mixed-batch training, in which real and synthetic data are combined in a controlled proportion (10% real, 90% synthetic), ensures that empirical data influence learning from the outset, while the synthetic data stabilize estimation and mitigate overfitting.

This methodology demonstrates how simulation-based models and observational data can be integrated within deep-learning frameworks, offering a potentially generalizable recipe for domains where structural simulators exist but real data are scarce.
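To make the data-generation step concrete, here is a minimal Python sketch of the posterior-sampling pipeline. The functions `sample_posterior_draws` and `simulate_panel` are hypothetical stand-ins (the paper's actual generator solves the estimated DSGE model; here illustrative distributions and AR(1) dynamics play that role), but the structure mirrors the described workflow: draw structural parameters from the posterior, simulate one theory-consistent panel per draw, and repeat at scale.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_posterior_draws(n_draws: int) -> np.ndarray:
    """Stand-in (hypothetical) for draws from the DSGE posterior over
    structural parameters, estimated on 1947:Q3-1959:Q4 data in the paper.
    Here: a persistence and a shock-volatility parameter with
    illustrative distributions."""
    rho = rng.beta(20, 4, size=n_draws)          # persistence in (0, 1)
    sigma = rng.gamma(2.0, 0.005, size=n_draws)  # shock std. dev.
    return np.column_stack([rho, sigma])

def simulate_panel(theta: np.ndarray, n_vars: int = 7, T: int = 200) -> np.ndarray:
    """Stand-in (hypothetical) for solving and simulating the DSGE model
    at one parameter draw; a real implementation would use the model's
    state-space solution. Here: simple AR(1) paths per variable."""
    rho, sigma = theta
    x = np.zeros((T, n_vars))
    shocks = rng.normal(scale=sigma, size=(T, n_vars))
    for t in range(1, T):
        x[t] = rho * x[t - 1] + shocks[t]
    return x

# The paper uses millions of panels; a handful here for illustration.
panels = [simulate_panel(theta) for theta in sample_posterior_draws(1000)]
```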
Key Insights
- Novel Hybrid Forecasting Approach: Combines DSGE-generated synthetic data with real macroeconomic data to train a transformer model, overcoming the small-sample limitations of macroeconomic forecasting.
- Architectural Adaptations: Includes a "separate-then-concatenate" embedding strategy for multivariate macroeconomic data and a modular design with a specialized transformer for each target variable, reducing the parameter count and improving computational efficiency (sketched in code after this list).
- Mixed-Batch Training: Each training batch contains a fixed mixture of real and synthetic data (10% real, 90% synthetic), balancing theoretical priors from the DSGE model against empirical evidence from real-world observations (see the mixed-batch sketch below).
- Performance on Persistent Variables: Forecasts persistent level variables (hours worked, inflation, the interest rate) more accurately than volatile growth-rate variables (output, consumption, investment), in line with theoretical expectations about autocorrelation and mean reversion.
- Tokenization Strategy: Employs a percentile-based tokenization scheme that discretizes continuous macroeconomic time series into economically meaningful bins, preserving distributional information and acting as a noise-reduction mechanism (see the tokenization sketch below).
- Compact Model Size: Achieves strong predictive performance with a small transformer architecture (approximately 50,000 parameters), suggesting that massive models may not be necessary for macroeconomic forecasting, given the inherent structure and theoretical constraints of economic data.
- Bayesian Interpretation: The mixed training ratio (alpha) has a natural Bayesian interpretation as a hyperparameter controlling the prior-to-likelihood weight, analogous to an effective sample size in Bayesian inference (formalized briefly after the code sketches below).
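A minimal PyTorch sketch of the two architectural ideas as we read them from the summary: per-variable embedding tables whose outputs are concatenated along the feature axis, and a small independent transformer tower plus output head per target variable. All dimensions (vocabulary size, embedding width, head count, layer depth) are illustrative assumptions, not the paper's reported settings.

```python
import torch
import torch.nn as nn

class SeparateThenConcatEmbed(nn.Module):
    """Embed each macro series with its own token-embedding table,
    then concatenate (our reading of 'separate-then-concatenate')."""
    def __init__(self, n_vars: int = 7, vocab: int = 100, d_var: int = 8):
        super().__init__()
        self.embeds = nn.ModuleList([nn.Embedding(vocab, d_var) for _ in range(n_vars)])

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, time, n_vars) integer bin indices
        parts = [emb(tokens[..., i]) for i, emb in enumerate(self.embeds)]
        return torch.cat(parts, dim=-1)  # (batch, time, n_vars * d_var)

class PerVariableForecaster(nn.Module):
    """Modular design: one small transformer encoder and output head per
    target variable, sharing only the embedding (an assumption)."""
    def __init__(self, n_vars: int = 7, vocab: int = 100, d_var: int = 8):
        super().__init__()
        d_model = n_vars * d_var
        self.embed = SeparateThenConcatEmbed(n_vars, vocab, d_var)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4,
                                           dim_feedforward=64, batch_first=True)
        # nn.TransformerEncoder deep-copies the layer, so towers are independent.
        self.towers = nn.ModuleList([nn.TransformerEncoder(layer, num_layers=1)
                                     for _ in range(n_vars)])
        self.heads = nn.ModuleList([nn.Linear(d_model, vocab) for _ in range(n_vars)])

    def forward(self, tokens: torch.Tensor) -> list:
        h = self.embed(tokens)
        # Next-step logits over token bins, one tensor per target variable.
        return [head(tower(h)[:, -1]) for tower, head in zip(self.towers, self.heads)]

# Usage: 32 sequences of 40 quarters over 7 tokenized series.
tok = torch.randint(0, 100, (32, 40, 7))
logits = PerVariableForecaster()(tok)  # list of 7 tensors, each (32, 100)
```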
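The mixed-batch rule is straightforward to sketch, assuming batches are assembled by fixed-proportion sampling (the 10%/90% split comes from the summary; the batch size and array shapes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def mixed_batch(real: np.ndarray, synthetic: np.ndarray,
                batch_size: int = 128, alpha: float = 0.10) -> np.ndarray:
    """Draw one training batch with a fixed real/synthetic mix
    (alpha = 0.10 matches the 10% real / 90% synthetic split)."""
    n_real = int(round(alpha * batch_size))
    idx_r = rng.integers(0, len(real), size=n_real)
    idx_s = rng.integers(0, len(synthetic), size=batch_size - n_real)
    batch = np.concatenate([real[idx_r], synthetic[idx_s]])
    rng.shuffle(batch)  # in-place shuffle along the batch axis
    return batch

# Usage with toy arrays of shape (N, T, n_vars):
real = rng.normal(size=(200, 40, 7))
synthetic = rng.normal(size=(10_000, 40, 7))
batch = mixed_batch(real, synthetic)  # (128, 40, 7)
```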
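A sketch of percentile-based tokenization under the natural reading: bin edges sit at equally spaced percentiles of the training data, so each token covers an equal share of the empirical distribution, and each observation maps to its bin index. The bin count is illustrative.

```python
import numpy as np

def fit_percentile_bins(series: np.ndarray, n_bins: int = 100) -> np.ndarray:
    """Bin edges at equally spaced percentiles of the training series."""
    qs = np.linspace(0.0, 100.0, n_bins + 1)[1:-1]  # interior edges only
    return np.percentile(series, qs)

def tokenize(series: np.ndarray, edges: np.ndarray) -> np.ndarray:
    """Map each observation to its bin index; quantization discards
    within-bin variation, the noise-reduction effect described above."""
    return np.searchsorted(edges, series)

# Example: tokenize simulated quarterly inflation readings.
infl = np.random.default_rng(2).normal(2.0, 1.0, size=400)
edges = fit_percentile_bins(infl, n_bins=10)
tokens = tokenize(infl, edges)  # integers in {0, ..., 9}
```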
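One way to formalize the Bayesian reading of the mixing ratio (our gloss, not necessarily the paper's exact derivation): treat the synthetic draws in each batch as prior pseudo-observations.

```latex
% With N_r real and N_s synthetic observations per batch,
% the mixing ratio and implied effective prior sample size are:
\alpha = \frac{N_r}{N_r + N_s}
\quad\Longrightarrow\quad
N_s = \frac{1-\alpha}{\alpha}\, N_r .
```

Under this reading, alpha = 0.10 weights the DSGE prior as if it contributed nine pseudo-observations per real data point; raising alpha shifts weight from the prior toward the likelihood, i.e., the real data.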
Practical Implications
- Improved Macroeconomic Forecasting: The hybrid approach can potentially improve the accuracy and reliability of macroeconomic forecasts, particularly in data-scarce environments.
- Applications for Central Banks and Policymakers: The trained transformer can be used to generate forecasts and conduct scenario analysis under alternative policy rules.
- Data Augmentation Strategy: Using DSGE models as structured data generators can be extended to other domains where simulation-based models exist, enabling modern machine-learning techniques in data-limited settings.
- Development of Macro-Foundation Models: The framework can support general-purpose "macro-foundation models": compact, theory-aligned forecasting systems that serve as economic counterparts to language-foundation architectures.
- Future Research Directions: The work opens avenues for multi-horizon forecasting, ensemble approaches combining multiple DSGE specifications, and differentiable likelihood approximations for simulation-based inference.