Developing an Effective Training Dataset to Enhance the Performance of AI-based Speaker Separation Systems

Authors

  • Rawad Melhem
  • Assef Jafar, Damascus University
  • Oumayma Al Dakkak, Damascus University

Keywords:

Speaker Separation, Training Set, Ground Truth, Mixture Signal

Abstract

This paper addresses the challenge of speaker separation, which remains an active research topic despite the promising results achieved in recent years. These results, however, often degrade in real recording conditions due to the presence of noise, echo, and other interference. This is because neural models are typically trained on synthetic datasets consisting of mixed audio signals and their corresponding ground truths, which are generated using computer software and do not fully represent the complexities of real-world recording scenarios. The lack of realistic training sets for speaker separation remains a major hurdle, as obtaining the individual sounds from mixed audio signals is a non-trivial task. To address this issue, we propose a novel method for constructing a realistic training set that includes mixture signals and the corresponding ground truth for each speaker. We evaluate this dataset with a deep learning model and compare it to a synthetic dataset, obtaining a 1.65 dB improvement in scale-invariant signal-to-distortion ratio (SI-SDR) for speaker separation on realistic mixtures. Our findings highlight the potential of realistic training sets for enhancing the performance of speaker separation models in real-world scenarios.
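For readers unfamiliar with the metric, the following is a minimal sketch of how SI-SDR is commonly computed for one estimated speaker signal against its ground truth. It follows the widely used definition (project the estimate onto the reference, then compare the energy of the projection with the energy of the residual) and is not taken from the paper; the function name `si_sdr`, the `eps` stabilizer, and the mean removal step are assumptions made for illustration.

```python
import math

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two equal-length sample sequences.

    Illustrative sketch only: the name, the eps term, and the zero-mean
    step are assumptions, not details from the paper.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")

    # Remove the mean so a constant offset does not affect the score.
    est_mean = math.fsum(estimate) / len(estimate)
    ref_mean = math.fsum(reference) / len(reference)
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Optimal scaling of the reference: <est, ref> / ||ref||^2.
    dot = math.fsum(e * r for e, r in zip(est, ref))
    ref_energy = math.fsum(r * r for r in ref)
    scale = dot / (ref_energy + eps)

    # Target component (scaled reference) and residual distortion.
    target = [scale * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]

    target_energy = math.fsum(t * t for t in target)
    residual_energy = math.fsum(d * d for d in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))
```

Because the reference is rescaled before the comparison, multiplying the estimate by a constant gain leaves the score essentially unchanged, which is what makes the metric scale invariant. An improvement such as the 1.65 dB reported above would correspond to the difference between two such scores, averaged over a test set.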

Published

2023-12-06

How to Cite

Developing an Effective Training Dataset to Enhance the Performance of AI-based Speaker Separation Systems. (2023). Damascus University Journal for Engineering Sciences, 39(4). https://journal.damascusuniversity.edu.sy/index.php/engj/article/view/10868