
Reinforcement Learning for Portfolio Optimization with a Financial Goal and Defined Time Horizons

Nov 22, 2025 · 7:25
Portfolio Management · Artificial Intelligence · Machine Learning

Abstract

This research proposes an enhancement to an innovative portfolio optimization approach based on the G-Learning algorithm, combined with parametric optimization via the GIRL algorithm (the G-Learning approach in the setting of inverse reinforcement learning), as presented in prior work. The goal is to maximize portfolio value by a target date while minimizing the investor's periodic contributions. The model operates in a highly volatile market with a well-diversified portfolio, ensuring a low risk level for the investor, and leverages reinforcement learning to dynamically adjust portfolio positions over time. Results show an improvement in the Sharpe Ratio from 0.42, as reported by recent studies using the same approach, to 0.483, a notable achievement in highly volatile markets with diversified portfolios. The comparison between G-Learning and GIRL reveals that while GIRL optimizes the reward function parameters (e.g., lambda = 0.0012 compared to 0.002), its impact on portfolio performance remains marginal. This suggests that reinforcement learning methods such as G-Learning already enable robust optimization. This research contributes to the growing body of reinforcement learning applications in financial decision-making, demonstrating that probabilistic learning algorithms can effectively align portfolio management strategies with investor needs.
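For reference, the Sharpe Ratio figures quoted above (0.42 vs. 0.483) are computed in the standard way, as mean excess return over its volatility. A minimal sketch, assuming daily returns and an annualization factor of 252 (neither is stated in the abstract):

```python
import numpy as np

def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio: mean excess return over its volatility.

    `returns` are per-period simple returns; `risk_free` is the
    per-period risk-free rate (assumed constant here for simplicity).
    """
    excess = np.asarray(returns, dtype=float) - risk_free
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)
```

A higher value means more excess return earned per unit of volatility, which is why the paper reports it as its headline risk-adjusted performance metric.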

Summary

This paper introduces an enhanced portfolio optimization approach using the G-Learning algorithm, combined with parametric optimization via the GIRL algorithm. The central problem is to maximize portfolio value by a target date while minimizing investor contributions, operating in a volatile market with a diversified, low-risk portfolio. The core methodology uses reinforcement learning (specifically G-Learning and GIRL) to dynamically adjust portfolio positions. The key findings show an improvement in the Sharpe Ratio from 0.42 to 0.483, which the authors claim is a significant achievement in volatile markets. The paper also highlights that while GIRL optimizes reward function parameters (e.g., lambda from 0.002 to 0.0012), its impact on portfolio performance is marginal, suggesting that G-Learning alone already provides robust optimization. The paper contributes to the growing body of research applying reinforcement learning to financial decision-making, demonstrating the ability of probabilistic learning algorithms to align portfolio management with investor needs, specifically when a target financial goal and regular contributions are involved. The investigation into the utility of GIRL in conjunction with G-Learning highlights an important practical consideration: whether the added complexity of inverse reinforcement learning provides a significant performance boost over simply relying on G-Learning with appropriately tuned reward functions. This distinction is crucial for practitioners seeking to implement these algorithms in real-world scenarios.
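The dynamic position adjustment described here rests on G-Learning's entropy-regularized ("soft") value update, in which actions are weighted against a reference (prior) policy rather than hard-maximized. A minimal tabular sketch under our own notation and illustrative parameters, not the paper's portfolio state space:

```python
import numpy as np

def free_energy(G_row, pi0_row, beta):
    """Soft maximum over actions relative to the prior pi0:
    F(s) = (1/beta) * log sum_a pi0(a|s) * exp(beta * G(s, a))."""
    m = np.max(beta * G_row)  # log-sum-exp stabilization
    return (m + np.log(np.sum(pi0_row * np.exp(beta * G_row - m)))) / beta

def g_learning_backup(G, r, P, pi0, gamma=0.95, beta=10.0):
    """One synchronous G-Learning backup over all (state, action) pairs:
    G(s, a) <- r(s, a) + gamma * E_{s'}[F(s')].
    Shapes: G, r, pi0 are (S, A); P is (S, A, S) transition probabilities."""
    F = np.array([free_energy(G[s], pi0[s], beta) for s in range(G.shape[0])])
    return r + gamma * np.einsum('sap,p->sa', P, F)

def policy(G, pi0, beta):
    """Boltzmann policy tilted by the prior: pi(a|s) ∝ pi0(a|s) * exp(beta * G)."""
    w = pi0 * np.exp(beta * (G - G.max(axis=1, keepdims=True)))
    return w / w.sum(axis=1, keepdims=True)
```

The inverse-temperature `beta` controls how far the learned policy may deviate from the prior; as `beta` grows, the update approaches standard Q-learning's hard maximum.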

Key Insights

  • The paper demonstrates an improvement in Sharpe Ratio from 0.42 to 0.483 using G-Learning, suggesting its effectiveness in volatile markets with diversified portfolios.
  • GIRL optimizes the reward function's lambda parameter (λ) from 0.002 to 0.0012.
  • The impact of GIRL on portfolio performance is described as "marginal," indicating that G-Learning provides robust optimization even without parameter tuning via inverse reinforcement learning.
  • The paper frames the portfolio optimization problem as a Markov Decision Process (MDP), explicitly considering investor-specific objectives (target date and contribution minimization).
  • The reward function incorporates a penalty for underperformance relative to a target portfolio value, a transaction cost term, and the cost of contributions, aligning with the stated optimization goals.
  • The target value for the portfolio at each time step is defined as a linear combination of a benchmark and a proportional increase in the current portfolio value, providing a flexible and intuitive way to set intermediate goals.
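The target and reward described in the last two insights can be sketched as follows. The functional form follows the verbal description above, not the paper's equations, and the parameter names (`eta`, `rho`, `kappa`, `mu`) are illustrative; only the `lam` scale matches the lambda values quoted earlier:

```python
def target_value(V_t, benchmark_t, eta=0.5, rho=0.02):
    """Intermediate goal at each step: a linear mix of a benchmark and a
    proportional increase in the current portfolio value V_t."""
    return eta * benchmark_t + (1.0 - eta) * (1.0 + rho) * V_t

def step_reward(V_next, target_next, trade_cost, contribution,
                lam=0.0012, kappa=1.0, mu=1.0):
    """Per-step reward: penalize underperformance relative to the target,
    plus transaction costs and the cost of the investor's contribution."""
    shortfall = max(0.0, target_next - V_next)
    return -(lam * shortfall ** 2 + kappa * trade_cost + mu * contribution)
```

Under this shape, the agent is rewarded (less penalized) for tracking the target with small trades and small contributions, which is exactly the stated goal of reaching the target date value while minimizing what the investor pays in.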

Practical Implications

  • The research offers a practical reinforcement learning framework for goal-based wealth management, potentially applicable to retirement planning, major asset acquisition, or other financial goals.
  • Financial institutions and individual investors can use G-Learning to develop dynamic portfolio management strategies that adapt to market volatility and investor-specific needs.
  • The findings suggest that practitioners may achieve substantial performance improvements using G-Learning without necessarily needing to implement the more complex GIRL algorithm for reward function parameter optimization.
  • Future research could investigate the sensitivity of G-Learning performance to different reward function parameters and explore methods for efficiently tuning these parameters in real-world financial markets.
  • The algorithms could be extended to incorporate additional factors, such as investor risk preferences, tax implications, and other real-world constraints, to further enhance their practical applicability.

Cite This Paper

Year: 2025
Category: q-fin.PM
APA

Leukam, F., Koffi, R. S., & Djagba, P. (2025). Reinforcement Learning for Portfolio Optimization with a Financial Goal and Defined Time Horizons. arXiv preprint arXiv:2511.18076.

MLA

Leukam, Fermat, Rock Stephane Koffi, and Prudence Djagba. "Reinforcement Learning for Portfolio Optimization with a Financial Goal and Defined Time Horizons." arXiv preprint arXiv:2511.18076 (2025).