The Dynamical Anatomy of Anderson Acceleration: From Adaptive Momentum to Variable-Mass ODEs

Dec 24, 2025 · 7:09
Optimization and Control · math.NA

Abstract

This paper provides a rigorous derivation and analysis of accelerated optimization algorithms through the lens of High-Resolution Ordinary Differential Equations (ODEs). While classical Nesterov acceleration is well understood via asymptotic vanishing damping, the dynamics of Anderson Acceleration (AA) remain less transparent. This work makes significant theoretical contributions to AA by bridging discrete acceleration algorithms with continuous dynamical systems, while also providing practical algorithmic innovations. Our work addresses fundamental questions about the physical nature of Anderson Acceleration that have remained unanswered since its introduction in 1965. First, we prove that AA can be exactly rewritten as an adaptive momentum method and, in the high-resolution limit, converges to a second-order ODE with Variable Effective Mass. Through a Lyapunov energy analysis, we reveal the specific instability mechanism of standard AA: unchecked growth in effective mass acts as negative damping, physically injecting energy into the system and violating dissipation constraints. Conversely, high-resolution analysis identifies an implicit Hessian-driven damping term that provides stabilization in stiff regimes. Leveraging these dynamical insights, we then propose Energy-Guarded Anderson Acceleration (EG-AA), an algorithm that acts as an inertial governor to enforce thermodynamic consistency. Moreover, our convergence analysis, formulated via the Acceleration Gain Factor, proves that EG-AA improves upon gradient descent by maximizing the geometric contraction of the linear subspace projection while actively suppressing nonlinear approximation errors. Theoretical bounds confirm that EG-AA is no worse than standard AA, and numerical experiments demonstrate strictly improved convergence stability and rates in ill-conditioned convex composite problems compared to standard Anderson mixing.

Summary

This paper tackles the problem of understanding and improving Anderson Acceleration (AA), a widely used optimization technique, particularly in scientific computing. While Nesterov acceleration is well-understood through the lens of Ordinary Differential Equations (ODEs) with asymptotic vanishing damping, the dynamics of AA have remained opaque. The authors address this by rigorously deriving the continuous-time limit of AA, showing that it converges to a second-order ODE with a variable effective mass. They identify that unchecked growth in this effective mass leads to instability by injecting energy into the system, violating dissipation constraints. Conversely, they identify an implicit Hessian-driven damping term that provides stabilization in stiff regimes. Based on these dynamical insights, the authors propose Energy-Guarded Anderson Acceleration (EG-AA), an algorithm that acts as an inertial governor to control the effective mass and enforce thermodynamic consistency. They provide a convergence analysis using the Acceleration Gain Factor, demonstrating that EG-AA improves upon gradient descent by maximizing the geometric contraction of the linear subspace projection while suppressing nonlinear approximation errors. Theoretical bounds confirm that EG-AA is no worse than standard AA, and numerical experiments demonstrate improved convergence stability and rates in ill-conditioned convex composite problems. This work bridges the gap between discrete acceleration algorithms and continuous dynamical systems, providing valuable insights into the physical nature of AA and paving the way for more robust and efficient optimization methods.
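To make the continuous-time picture concrete, the limiting dynamics can be written schematically in the style of high-resolution ODE analyses. The form below is a notational sketch, not the paper's stated equation: `m(t)` is the variable effective mass (driven by the second moment of the mixing coefficients), `gamma(t)` is a damping coefficient, and `h` is the discretization step.

```latex
% Schematic high-resolution limit of AA (illustrative notation only):
\[
  m(t)\,\ddot{x}(t) + \gamma(t)\,\dot{x}(t)
    + \sqrt{h}\,\nabla^2 f\bigl(x(t)\bigr)\,\dot{x}(t)
    + \nabla f\bigl(x(t)\bigr) = 0
\]
% Dissipation requires the total damping to stay positive; rapid growth of
% m(t) acts as negative damping and injects energy, which is the instability
% mechanism identified in the paper.
```

The `\sqrt{h}`-scaled Hessian term is the implicit Hessian-driven damping that the authors credit for AA's robustness in stiff regimes.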

Key Insights

  • AA can be exactly rewritten as an adaptive momentum method, providing a direct connection to classical acceleration techniques like Nesterov's method.
  • The high-resolution limit of AA converges to a second-order ODE with Variable Effective Mass, where the mass is dynamically determined by the second moment of the mixing coefficients.
  • Instability in AA arises when the algorithm increases effective mass too rapidly, acting as "negative damping" and injecting energy into the system.
  • High-resolution analysis reveals an implicit Hessian-driven damping term (proportional to `sqrt(h) * Hessian * velocity`) that stabilizes AA in stiff regimes, explaining its superior robustness compared to standard Nesterov acceleration.
  • Energy-Guarded Anderson Acceleration (EG-AA) improves stability by explicitly controlling the growth of the effective mass to ensure continuous energy dissipation: per step, mass growth is clamped to at most `delta * sqrt(h)` (a minimal sketch follows this list).
  • In the paper's Acceleration Gain Factor analysis, EG-AA improves upon gradient descent by maximizing the geometric contraction of the linear subspace projection while actively suppressing nonlinear approximation errors.
  • Theoretical bounds guarantee that EG-AA is no worse than standard AA, with numerical experiments showing strictly improved convergence stability and rates in ill-conditioned convex composite problems.
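To ground the clamping rule referenced above, here is a minimal Python sketch of an energy-guarded Anderson iteration. It is an illustration under stated assumptions, not the paper's Algorithm 1: the fixed-point map `g(x) = x - h * grad(x)`, the Tikhonov-damped least-squares mixing, and the exact guard rule (shrinking the coefficients whenever the effective-mass proxy would grow by more than `delta * sqrt(h)`) are all choices made for this sketch.

```python
import numpy as np


def eg_aa(grad, x0, h=0.1, m=5, delta=1.0, eta=1e-3, iters=200):
    """Energy-guarded Anderson acceleration -- a minimal sketch, not the
    paper's Algorithm 1. We accelerate the fixed-point map
    g(x) = x - h * grad(x), whose residual is r(x) = -h * grad(x)."""
    x = np.asarray(x0, dtype=float).copy()
    X, R = [], []            # windows of recent iterates and residuals
    mass_prev = 0.0          # previous effective-mass proxy
    for _ in range(iters):
        r = -h * grad(x)     # residual of the fixed-point map
        X.append(x.copy())
        R.append(r.copy())
        X, R = X[-(m + 1):], R[-(m + 1):]
        if len(R) > 1:
            # Damped least squares: min_a ||r - dR a||^2 + eta ||a||^2,
            # where dR stacks successive residual differences columnwise.
            dR = np.stack([R[i + 1] - R[i] for i in range(len(R) - 1)], axis=1)
            a = np.linalg.solve(dR.T @ dR + eta * np.eye(dR.shape[1]), dR.T @ r)
            # Effective-mass proxy: second moment of the mixing coefficients.
            mass = float(a @ a)
            # Energy guard ("inertial governor"): cap per-step mass growth
            # at delta * sqrt(h) by shrinking the coefficients.
            cap = mass_prev + delta * np.sqrt(h)
            if mass > cap:
                a *= np.sqrt(cap / mass)
                mass = cap
            mass_prev = mass
            dX = np.stack([X[i + 1] - X[i] for i in range(len(X) - 1)], axis=1)
            x = x + r - (dX + dR) @ a   # Anderson (type-II) update
        else:
            x = x + r                   # plain gradient / fixed-point step
    return x
```

When the guard never fires, the update reduces to standard windowed Anderson mixing; the clamp only intervenes when the second moment of the coefficients, and hence the effective mass, would grow faster than the dissipation budget allows.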

Practical Implications

  • EG-AA offers a more robust and stable alternative to standard AA, particularly for ill-conditioned optimization problems and those arising in scientific computing (e.g., computational chemistry, electronic structure calculations).
  • Practitioners can adopt EG-AA by implementing Algorithm 1 of the paper and tuning the mass growth limit `delta` and the damping factor `eta` for their application; these two parameters constitute the inertial governor (a usage sketch follows this list).
  • The insights from the variable-mass ODE analysis can inform the design of other accelerated optimization algorithms, focusing on controlling energy dissipation and leveraging Hessian information.
  • Future research directions include exploring adaptive strategies for tuning the mass growth limit `delta` and damping factor `eta`, as well as extending the analysis to non-convex optimization problems. This involves understanding how the Hessian-driven damping interacts with negative curvature.
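As a concrete illustration of the tuning advice above, the following hypothetical usage drives the `eg_aa` sketch from the Key Insights section on an ill-conditioned quadratic; the values of `delta` and `eta` are illustrative starting points, not recommendations from the paper.

```python
import numpy as np

# Ill-conditioned quadratic: f(x) = 0.5 * sum_i d_i * x_i**2, condition 1e4.
d = np.logspace(0, 4, 50)

def grad(x):
    return d * x

x0 = np.ones(50)
# h = 1/L keeps the underlying gradient map contractive; delta caps the
# effective-mass growth and eta damps the least-squares mixing problem.
x = eg_aa(grad, x0, h=1.0 / d.max(), m=5, delta=1.0, eta=1e-3, iters=300)
print("final gradient norm:", np.linalg.norm(grad(x)))
```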
