Developing a Training Corpus to Improve the Performance of AI-Based Speaker Separation Systems

Rawad Melhem; Dr. Eng. Assef Jafar; Dr. Eng. Oumayma Al Dakkak

Authors

Rawad Melhem
Assef Jafar, Damascus University
Oumayma Al Dakkak, Damascus University

Keywords:

Speaker Separation, Training Set, Ground Truth, Mixture Signal

Abstract

This paper addresses the challenge of speaker separation, which remains an active research topic despite the promising results achieved in recent years. These results, however, often degrade in real recording conditions because of noise, echo, and other interference. The degradation arises because neural models are typically trained on synthetic datasets of mixed audio signals and their corresponding ground truths, which are generated in software and do not fully represent the complexities of real-world recording scenarios. The lack of realistic training sets for speaker separation remains a major hurdle, because obtaining each speaker's individual signal from a mixed audio signal is a non-trivial task. To address this issue, we propose a novel method for constructing a realistic training set that includes mixture signals and the corresponding ground truth for each speaker. We evaluate this dataset with a deep learning model and compare it against a synthetic dataset, obtaining a 1.65 dB improvement in Scale-Invariant Signal-to-Distortion Ratio (SI-SDR) for speaker separation on realistic mixtures. Our findings highlight the potential of realistic training sets for enhancing the performance of speaker separation models in real-world scenarios.
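For readers unfamiliar with the metric, the following is a minimal sketch of how SI-SDR is commonly computed, based on the standard definition of the measure rather than on the authors' evaluation code. The function name `si_sdr`, the `eps` stabiliser, and the toy sine and cosine signals are assumptions made purely for illustration.

```python
import math


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB.

    Hypothetical helper: both inputs are equal-length sequences of samples.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    n = len(reference)
    # Remove the mean so a constant offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [e - est_mean for e in estimate]
    ref = [r - ref_mean for r in reference]
    # Project the estimate onto the reference to obtain the scaled target.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    alpha = dot / (ref_energy + eps)
    target = [alpha * r for r in ref]
    # Everything in the estimate that is not the scaled target counts as distortion.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(v * v for v in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


# Toy signals (assumed for illustration): a clean "speaker" and an interferer.
N = 1000
reference = [math.sin(2 * math.pi * 5 * t / N) for t in range(N)]
interferer = [math.cos(2 * math.pi * 13 * t / N) for t in range(N)]
estimate = [r + 0.1 * i for r, i in zip(reference, interferer)]

score = si_sdr(estimate, reference)
# Rescaling the estimate leaves the score unchanged (scale invariance).
score_scaled = si_sdr([2.0 * e for e in estimate], reference)
print(f"SI-SDR: {score:.2f} dB, after rescaling: {score_scaled:.2f} dB")
```

Because the estimate is projected onto the reference before the energies are compared, a separation model is not penalised for producing the correct waveform at a different gain. An improvement such as the one reported above is the difference between two such scores in dB.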

Developing an Effective Training Dataset to Enhance the Performance of AI-based Speaker Separation Systems
