Publication date:
2025
Abstract:
The audio forensics field has recently faced a new challenge: singing voice deepfake detection. Current approaches to this problem borrow methods originally developed for the more established task of speech deepfake detection, often simply retraining those systems on singing voice data. However, techniques that are effective for speech do not necessarily perform well on singing voice, and there has been limited research on the factors that improve detection specifically in the singing domain. This paper investigates the effectiveness of various audio representations and features for discriminating between real and synthetically generated singing voice signals. We evaluate two Convolutional Neural Network (CNN)-based detection systems using a wide range of audio representations, including handcrafted, learning-based, and pre-trained features. Through a systematic analysis, we aim to understand the key factors that can improve the performance of deepfake detection methods for singing voices. Additionally, we investigate the differences between singing voice and speech detection, highlighting the implications of the feature sets considered. Our results offer insights and guidance for developing more effective singing voice deepfake detection systems.
CRIS type:
4.1 Contribution in conference proceedings
Keywords:
audio forensics; deepfake; singing voice
Authors:
Gohari, Mahyar; Salvi, Davide; Bestagini, Paolo; Adami, Nicola
Book title:
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings