
Hybrid LSTM and PPO Networks for Dynamic Portfolio Optimization

Nov 22, 2025 · 10:53
Machine Learning · Artificial Intelligence · Portfolio Management

Abstract

This paper introduces a hybrid framework for portfolio optimization that fuses Long Short-Term Memory (LSTM) forecasting with a Proximal Policy Optimization (PPO) reinforcement learning strategy. The proposed system leverages the predictive power of deep recurrent networks to capture temporal dependencies, while the PPO agent adaptively refines portfolio allocations in continuous action spaces, allowing the system to anticipate trends while adjusting dynamically to market shifts. Using multi-asset datasets covering U.S. and Indonesian equities, U.S. Treasuries, and major cryptocurrencies from January 2018 to December 2024, the framework is benchmarked against equal-weighted, index-based, and single-model baselines (LSTM-only and PPO-only) using annualized return, volatility, Sharpe ratio, and maximum drawdown, each adjusted for transaction costs. The results indicate that the hybrid architecture delivers higher returns and stronger resilience under non-stationary market regimes, suggesting its promise as a robust, AI-driven framework for dynamic portfolio optimization.

Summary

This paper investigates a hybrid portfolio optimization framework that combines Long Short-Term Memory (LSTM) networks for return forecasting with Proximal Policy Optimization (PPO) for dynamic asset allocation. The core idea is to leverage LSTM's ability to capture temporal dependencies in financial time series and then use PPO to adaptively adjust portfolio weights based on these predictions. The model is trained and tested on a diverse multi-asset dataset including U.S. and Indonesian equities, U.S. Treasuries, and major cryptocurrencies, spanning from January 2018 to December 2024. The performance is evaluated using standard metrics such as annualized return, volatility, Sharpe ratio, and maximum drawdown, all adjusted for transaction costs. The key finding is that the hybrid LSTM+PPO architecture outperforms both single-model baselines (LSTM-only and PPO-only) and traditional benchmarks like equal-weight portfolios and the S&P 500 index, particularly in terms of annualized return. While the PPO-only model achieved a higher Sharpe ratio, the hybrid model demonstrates a better balance between return and stability, showcasing its resilience under non-stationary market conditions. The ablation study further highlights the impact of LSTM, indicating it shifts the policy frontier from a stable, low-volatility regime (PPO-only) to a high-growth, higher-volatility regime (Hybrid LSTM+PPO). This research contributes to the growing body of work on AI-driven portfolio management by demonstrating the benefits of integrating predictive models with reinforcement learning for improved adaptability and performance in dynamic financial environments.
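
The evaluation metrics above can be reproduced from a backtest's per-period portfolio returns. Below is a minimal Python sketch, assuming weekly returns that are already net of transaction costs and a zero risk-free rate; the function name and annualization factor are illustrative and not taken from the paper.

```python
import numpy as np

def evaluate_portfolio(net_returns, periods_per_year=52):
    """Annualized return, volatility, Sharpe ratio, and max drawdown
    from per-period portfolio returns that are already net of costs."""
    r = np.asarray(net_returns, dtype=float)

    # Equity curve: growth of 1 unit of capital over the backtest.
    equity = np.cumprod(1.0 + r)

    # Geometric annualized return and annualized volatility.
    ann_return = equity[-1] ** (periods_per_year / len(r)) - 1.0
    ann_vol = r.std(ddof=1) * np.sqrt(periods_per_year)

    # Sharpe ratio with the risk-free rate set to zero for simplicity.
    sharpe = ann_return / ann_vol if ann_vol > 0 else float("nan")

    # Maximum drawdown: worst peak-to-trough decline of the equity curve.
    peak = np.maximum.accumulate(equity)
    max_dd = ((equity - peak) / peak).min()

    return {"ann_return": ann_return, "ann_vol": ann_vol,
            "sharpe": sharpe, "max_drawdown": max_dd}
```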

Key Insights

  • The hybrid LSTM+PPO model achieves a higher annualized return (25.4% for Top-5) compared to PPO-only (5.75% for Top-5), LSTM-only (-3.03% for Top-5), and benchmark strategies, indicating the benefit of combining predictive and adaptive approaches.
  • While the hybrid model boosts returns, it also increases volatility and maximum drawdown compared to the PPO-only model, suggesting a trade-off between risk and reward when incorporating LSTM forecasts. For example, the Top-5 hybrid model has a maximum drawdown of -13.7% compared to -7.19% for the PPO-only Top-5 model.
  • The PPO-only model achieves the highest Sharpe ratio (1.02 for Top-10), demonstrating that reinforcement learning can effectively learn allocation policies directly from market data, even without explicit predictive signals.
  • The paper employs a Top-K softmax projection with a threshold to generate sparse portfolio weights, allowing for a sensitivity analysis of diversification levels (K = 5, 10, 30). Higher K values lead to reduced maximum drawdowns, confirming the benefits of diversification (a projection sketch follows this list).
  • The study includes a transaction cost of 0.1% per unit turnover in the reward function of the PPO agent, making the results more realistic and applicable to real-world trading scenarios (a reward sketch follows this list).
  • The LSTM module uses a 30-week lookback window and is trained with early stopping and weight decay to prevent overfitting, demonstrating careful consideration of model training and generalization (a training sketch follows this list).
  • The research highlights the limitations of static allocation strategies (like LSTM-only) compared to adaptive approaches (PPO-only and Hybrid LSTM+PPO), especially in volatile market conditions.
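
The Top-K softmax projection referenced above maps raw policy scores to sparse, long-only portfolio weights. Here is a minimal sketch, assuming the agent outputs one score per asset; the threshold value and renormalization details are illustrative rather than the paper's exact procedure.

```python
import numpy as np

def top_k_softmax_weights(scores, k=5, threshold=1e-3):
    """Keep the k highest-scoring assets, softmax over them, drop
    negligible positions, and renormalize to a long-only allocation."""
    scores = np.asarray(scores, dtype=float)
    weights = np.zeros_like(scores)

    # Indices of the k assets with the largest policy scores.
    top_idx = np.argsort(scores)[-k:]

    # Numerically stable softmax restricted to the selected assets.
    shifted = scores[top_idx] - scores[top_idx].max()
    weights[top_idx] = np.exp(shifted) / np.exp(shifted).sum()

    # Zero out tiny positions, then renormalize so the weights sum to one.
    weights[weights < threshold] = 0.0
    total = weights.sum()
    return weights / total if total > 0 else weights
```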
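
The transaction-cost adjustment can be folded into the PPO reward as a penalty proportional to turnover. A minimal sketch assuming a 0.1% cost per unit of turnover; the function and variable names are illustrative, not the paper's implementation.

```python
import numpy as np

def step_reward(prev_weights, new_weights, asset_returns, cost_rate=0.001):
    """One-step reward: portfolio return minus 0.1% per unit of turnover."""
    turnover = np.abs(np.asarray(new_weights) - np.asarray(prev_weights)).sum()
    gross = float(np.dot(new_weights, asset_returns))
    return gross - cost_rate * turnover
```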
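
For the forecasting module, the paper reports a 30-week lookback window trained with weight decay and early stopping. A minimal PyTorch sketch of such a forecaster is shown below; the hidden size, feature count, and optimizer settings are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

LOOKBACK = 30  # weeks of history per input window, as reported in the paper

class ReturnForecaster(nn.Module):
    """LSTM mapping a (batch, LOOKBACK, n_features) window to next-period returns."""
    def __init__(self, n_features, n_assets, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, n_assets)

    def forward(self, x):
        out, _ = self.lstm(x)          # out: (batch, LOOKBACK, hidden_size)
        return self.head(out[:, -1])   # forecast from the final hidden state

model = ReturnForecaster(n_features=8, n_assets=30)

# Weight decay provides the regularization mentioned above; early stopping
# would be implemented by monitoring validation loss across training epochs.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```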

Practical Implications

  • The hybrid LSTM+PPO framework can be used by portfolio managers and financial institutions to develop AI-driven investment strategies that adapt to changing market conditions and aim for higher returns.
  • The modular design of the framework allows for future extensions, such as incorporating multi-frequency forecasting, volatility-aware reward functions, and cross-asset transfer learning, making it a flexible platform for further research and development.
  • Practitioners can use the findings to guide the development of hybrid AI systems for portfolio optimization, carefully considering the trade-off between return and risk when incorporating predictive models. The hyperparameter tuning and model architecture details provided in the paper offer a starting point for implementation.
  • Future research can focus on improving the robustness and stability of the hybrid model by exploring uncertainty quantification, macroeconomic feature integration, and risk-sensitive policy regularization. This could lead to more reliable and institutionally palatable AI-driven portfolio management solutions.
  • The results suggest that predictive signals from LSTM can be used to enhance reinforcement learning-based portfolio control, providing a valuable direction for future research in financial machine learning.

Authors

Jun Kevin and Pujianto Yugopuspito

Cite This Paper

Year: 2025
Category: cs.LG
APA

Kevin, J., & Yugopuspito, P. (2025). Hybrid LSTM and PPO Networks for Dynamic Portfolio Optimization. arXiv preprint arXiv:2511.17963.

MLA

Kevin, Jun, and Pujianto Yugopuspito. "Hybrid LSTM and PPO Networks for Dynamic Portfolio Optimization." arXiv preprint arXiv:2511.17963 (2025).