BUT Systems for Environmental Sound Deepfake Detection in the ESDD 2026 Challenge

Dec 9, 2025
eess.AS

Abstract

This paper describes the BUT submission to the ESDD 2026 Challenge, specifically focusing on Track 1: Environmental Sound Deepfake Detection with Unseen Generators. To address the critical challenge of generalizing to audio generated by unseen synthesis algorithms, we propose a robust ensemble framework leveraging diverse Self-Supervised Learning (SSL) models. We conduct a comprehensive analysis of general audio SSL models (including BEATs, EAT, and Dasheng) and speech-specific SSLs. These front-ends are coupled with a lightweight Multi-Head Factorized Attention (MHFA) back-end to capture discriminative representations. Furthermore, we introduce a feature domain augmentation strategy based on distribution uncertainty modeling to enhance model robustness against unseen spectral distortions. All models are trained exclusively on the official EnvSDD data, without using any external resources. Experimental results demonstrate the effectiveness of our approach: our best single system achieved Equal Error Rates (EER) of 0.00%, 4.60%, and 4.80% on the Development, Progress (Track 1), and Final Evaluation sets, respectively. The fusion system further improved generalization, yielding EERs of 0.00%, 3.52%, and 4.38% across the same partitions.
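
For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how an Equal Error Rate can be computed from detection scores. It is not the challenge's official scoring script. The function name `compute_eer` and the convention that higher scores mean "more likely real" with label 1 for real audio and 0 for fake are assumptions made for illustration only.

```python
import numpy as np

def compute_eer(scores, labels):
    """Approximate the Equal Error Rate (EER) by a threshold sweep.

    Assumed convention (not taken from the paper): label 1 = real audio,
    label 0 = fake audio, and a higher score means "more likely real".
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)

    real = scores[labels == 1]
    fake = scores[labels == 0]

    # Candidate thresholds: every observed score, plus one above the maximum
    # so that the "reject everything" operating point is also covered.
    thresholds = np.unique(scores)
    thresholds = np.concatenate([thresholds, [thresholds[-1] + 1.0]])

    # A trial is accepted as real when its score is >= threshold.
    false_accept = np.array([(fake >= t).mean() for t in thresholds])
    false_reject = np.array([(real < t).mean() for t in thresholds])

    # The EER is where the two error rates cross; on a finite sweep we take
    # the closest point and average the two rates there.
    idx = np.argmin(np.abs(false_accept - false_reject))
    return (false_accept[idx] + false_reject[idx]) / 2.0

if __name__ == "__main__":
    # Toy scores: one fake trial (0.6) outscores one real trial (0.4).
    print(compute_eer([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

An EER of 0.00% therefore means some threshold separates the two classes perfectly on that partition, while 4.38% means that at the balanced operating point roughly 4.38% of fake clips are accepted and the same share of real clips is rejected.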

Cite This Paper

Year: 2025
Category: eess.AS
APA

Peng, J., Zhang, L., Li, J., Plchot, O., & Cernocky, J. (2025). BUT Systems for Environmental Sound Deepfake Detection in the ESDD 2026 Challenge. arXiv preprint arXiv:2512.08319.

MLA

Junyi Peng, Lin Zhang, Jin Li, Oldrich Plchot, and Jan Cernocky. "BUT Systems for Environmental Sound Deepfake Detection in the ESDD 2026 Challenge." arXiv preprint arXiv:2512.08319 (2025).