Audio Features Investigation for Singing Voice Deepfake Detection

Conference proceedings contribution

Publication date:

2025

Abstract:

The audio forensics field has recently faced a new challenge: singing voice deepfake detection. Current approaches to tackle this problem have borrowed methods initially developed for the more established task of speech deepfake detection, often simply retraining these systems on singing voice data. However, effective speech detection techniques may not necessarily perform well on singing voice, and there has been limited research on identifying the factors that can improve detection specifically in the singing domain. This paper investigates the effectiveness of various audio representations and features for discriminating real and synthetically generated singing voice signals. We evaluate two Convolutional Neural Network (CNN)-based detection systems using a wide range of audio representations, including handcrafted, learning-based, and pre-trained features. Through a systematic analysis, we aim to understand the key factors that can improve the performance of deepfake detection methods for singing voices. Additionally, we investigate the differences between singing voice and speech detection, highlighting the implications of the feature sets considered. Our results offer valuable insights and guidance for developing more advanced and effective singing voice deepfake detection systems in the future.

CRIS type:

4.1 Conference proceedings contribution

Keywords:

audio forensics; deepfake; singing voice

Author list:

Gohari, Mahyar; Salvi, Davide; Bestagini, Paolo; Adami, Nicola

University authors:

ADAMI Nicola

Link to the full record:

https://iris.unibs.it/handle/11379/633787

Book title:

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Published in:

PROCEEDINGS OF THE ... IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING
