Mapping Still Matters: Coarse-Graining with Machine Learning Potentials
Abstract
Coarse-grained (CG) modeling enables molecular simulations to reach time and length scales inaccessible to fully atomistic methods. For classical CG models, the choice of mapping, that is, how atoms are grouped into CG sites, is a major determinant of accuracy and transferability. At the same time, the emergence of machine learning potentials (MLPs) offers new opportunities to build CG models that can in principle learn the true potential of mean force for any mapping. In this work, we systematically investigate how the choice of mapping influences the representations learned by equivariant MLPs by studying liquid hexane, amino acids, and polyalanine. We find that when the length scales of bonded and nonbonded interactions overlap, unphysical bond permutations can occur. We also demonstrate that correctly encoding species and maintaining stereochemistry are crucial, as neglecting either introduces unphysical symmetries. Our findings provide practical guidance for selecting CG mappings compatible with modern architectures and guide the development of transferable CG models.
Summary
This paper investigates how coarse-graining (CG) mappings affect the performance of equivariant machine learning potentials (MLPs) in molecular simulations. The central question is how the mapping scheme, which dictates how atoms are grouped into CG sites, affects the ability of an MLP to represent the potential of mean force. The authors use MACE, a representative E(3)-equivariant MLP architecture, and compare it against classical CG potentials on three systems: liquid hexane, single amino acids (capped alanine), and a 15-mer polyalanine peptide. All CG potentials are trained by force matching against atomistic reference simulations. The key finding is that the choice of mapping significantly influences the learned representation. When the length scales of bonded and nonbonded interactions overlap (as in the two-site hexane model or the Cα polyalanine model), unphysical bond permutations can occur and lead to instabilities. For amino acids, neglecting stereochemistry or using ambiguous species encodings introduces unphysical symmetries, which result in enantiomerization or incorrect secondary structure formation. The paper therefore emphasizes that a faithful representation must be preserved through careful CG mapping selection, even when using powerful MLPs. The findings provide practical guidance for choosing CG mappings compatible with modern MLP architectures and for developing transferable CG models. The work matters because it demonstrates the limitations of current MLPs in CG modeling, particularly when topology is critical, and underscores the need to incorporate topological information or regularization into these models.
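The force-matching setup mentioned in the summary can be pictured with a minimal NumPy sketch. It assumes a center-of-mass mapping with non-overlapping atom groups, for which summing the atomistic forces within each site is one common choice of mapped reference force. The united-atom masses, the two-site grouping of hexane, the random stand-in data, and all function names are illustrative assumptions, not the paper's or chemtrain's actual implementation.

```python
import numpy as np

def com_mapping_matrix(masses, groups):
    """Build an (n_sites, n_atoms) center-of-mass mapping matrix."""
    M = np.zeros((len(groups), len(masses)))
    for site, atom_idx in enumerate(groups):
        # Each atom contributes with its mass fraction within the site.
        M[site, atom_idx] = masses[atom_idx] / masses[atom_idx].sum()
    return M

def map_forces(forces, groups):
    """Mapped reference force per site: sum of atomistic forces in the group."""
    return np.stack([forces[idx].sum(axis=0) for idx in groups])

def force_matching_loss(predicted, reference):
    """Mean squared error between predicted and mapped reference forces."""
    return float(np.mean((predicted - reference) ** 2))

# Hypothetical united-atom hexane: CH3-CH2-CH2-CH2-CH2-CH3, two CG sites.
masses = np.array([15.0, 14.0, 14.0, 14.0, 14.0, 15.0])
groups = [np.array([0, 1, 2]), np.array([3, 4, 5])]

# Random stand-ins for one atomistic frame (not real simulation data).
rng = np.random.default_rng(0)
positions = rng.normal(size=(6, 3))
forces = rng.normal(size=(6, 3))

M = com_mapping_matrix(masses, groups)
cg_positions = M @ positions            # shape (2, 3)
cg_forces = map_forces(forces, groups)  # shape (2, 3)

# A model that always predicts zero force gives a nonzero loss.
loss = force_matching_loss(np.zeros_like(cg_forces), cg_forces)
```

In practice the `predicted` array would come from the CG potential (classical or MLP) evaluated at `cg_positions`, and the loss would be averaged over many reference frames.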
Key Insights
- Overlapping length scales of bonded and nonbonded interactions in CG mappings (e.g., two-site hexane) can lead to unphysical bond permutations and simulation instabilities, particularly when using MLPs that rely solely on geometric information for neighbor identification.
- Neglecting stereochemistry in CG mappings of amino acids (e.g., removing the Cβ carbon) introduces point symmetries, allowing the model to freely switch between L- and D-enantiomers, resulting in incorrect free energy surfaces. The observed free energy barrier for chiral inversion decreased when the Cα hydrogen was removed.
- Ambiguous species encoding in CG models can lead to a failure to capture the directionality of molecules, resulting in further symmetries and incorrect representations of local environments.
- Equivariant MLPs, while exhibiting superior expressivity and data efficiency compared to classical potentials, are not immune to mapping-induced artifacts and require careful consideration of the CG mapping scheme to ensure accurate representation of the potential of mean force.
- Increasing the correlation order (ν) or cutoff radius (rcut) in MACE models generally reduces force and RDF errors and improves stability, but at the cost of increased computational time.
- Classical potentials struggle to capture many-body correlations present in more complex CG mappings (e.g., three- and four-site hexane models) due to their fixed functional forms, whereas MLPs can learn these interactions directly from data.
- In the polyalanine simulations, the Cα mapping produced a completely symmetric helicity index, indicating no preference for helix handedness, and the Core mapping also showed increased sampling of incorrect left-handed helices.
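The stereochemistry point above can be illustrated with a small geometric sketch. The signed volume spanned by three substituents around a center changes sign under mirror reflection, whereas pairwise distances do not, and a three-site fragment lying in a plane has no handedness left to distinguish. The idealized tetrahedral coordinates and the site labels (CA, N, C, CB) are hypothetical and chosen only for illustration; this is not the analysis used in the paper.

```python
import numpy as np

def signed_volume(center, a, b, c):
    """Triple product of three substituent vectors; its sign encodes handedness."""
    return float(np.dot(a - center, np.cross(b - center, c - center)))

def pairwise_distances(x):
    """All pairwise distances, a reflection-invariant description of the sites."""
    diff = x[:, None, :] - x[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def reflect_through_plane(points, origin, normal):
    """Mirror points through the plane containing `origin` with the given normal."""
    normal = normal / np.linalg.norm(normal)
    shifted = points - origin
    return points - 2.0 * np.outer(shifted @ normal, normal)

# Idealized, hypothetical coordinates: CA at the origin, substituents on cube corners.
ca = np.zeros(3)
n = np.array([1.0, 1.0, 1.0])
c = np.array([1.0, -1.0, -1.0])
cb = np.array([-1.0, 1.0, -1.0])

sites = np.stack([ca, n, c, cb])
plane_normal = np.cross(n - ca, c - ca)  # normal of the N-CA-C plane
mirrored = reflect_through_plane(sites, ca, plane_normal)

# With CB present, the two mirror images differ in the sign of the volume...
vol_original = signed_volume(*sites)
vol_mirrored = signed_volume(*mirrored)

# ...but their pairwise distances are identical.
same_distances = np.allclose(pairwise_distances(sites), pairwise_distances(mirrored))

# Without CB, the remaining three sites are their own mirror image.
backbone_only = sites[:3]
backbone_unchanged = np.allclose(
    reflect_through_plane(backbone_only, ca, plane_normal), backbone_only
)
```

Tracking the sign of such a volume along a CG trajectory is one simple way to detect enantiomerization events, provided the mapping retains enough sites to define it.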
Practical Implications
- When developing CG models with MLPs, practitioners should carefully consider the choice of mapping to avoid overlapping length scales and unphysical symmetries, particularly for systems where topology is critical, such as proteins.
- For protein CG models, current MLPs may be best suited for implicit solvent or heavy-atom models, while coarser representations require additional topological information or regularization through prior energy terms.
- The findings provide guidance for selecting appropriate CG mappings for different systems and highlight the importance of validating the learned representation through structural analysis and stability tests.
- The code and data supporting the study, as well as the chemtrain framework, are publicly available, enabling researchers to reproduce the results and apply the methods to other systems.
- Future research should focus on developing MLP architectures that can incorporate topological information or regularization techniques to overcome the limitations identified in this study and enable the use of coarser CG mappings for complex systems.
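One way to screen a candidate mapping for the overlapping length scales discussed above is to compare the distributions of bonded and nonbonded CG site distances taken from the mapped reference trajectory. The sketch below computes a histogram overlap coefficient between two distance samples. The function name, the bin count, and the synthetic distances (in arbitrary length units) are assumptions for illustration and do not come from the paper.

```python
import numpy as np

def length_scale_overlap(bonded, nonbonded, bins=50):
    """Histogram overlap coefficient in [0, 1] between two distance samples."""
    lo = min(bonded.min(), nonbonded.min())
    hi = max(bonded.max(), nonbonded.max())
    # Shared bin edges so both densities are compared on the same grid.
    p, edges = np.histogram(bonded, bins=bins, range=(lo, hi), density=True)
    q, _ = np.histogram(nonbonded, bins=edges, density=True)
    width = np.diff(edges)
    return float(np.sum(np.minimum(p, q) * width))

# Synthetic, hypothetical distance samples (not simulation data).
bonded = np.linspace(0.20, 0.30, 100)
nonbonded_separated = np.linspace(0.50, 0.70, 100)
nonbonded_overlapping = np.linspace(0.25, 0.45, 100)

overlap_separated = length_scale_overlap(bonded, nonbonded_separated)
overlap_overlapping = length_scale_overlap(bonded, nonbonded_overlapping)
```

A value near zero suggests that bonded and nonbonded neighbors can be told apart by distance alone, while a clearly nonzero value flags a mapping where a purely geometric model may confuse them. Any threshold for acting on this number would need to be chosen per system.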
Cite This Paper
Görlich, F., Zavadlav, J. (2025). Mapping Still Matters: Coarse-Graining with Machine Learning Potentials. arXiv preprint arXiv:2512.07692.
Franz Görlich and Julija Zavadlav. "Mapping Still Matters: Coarse-Graining with Machine Learning Potentials." arXiv preprint arXiv:2512.07692 (2025).