Building a Keyword Spotting Model for Arabic Language Using Self-Supervised Learning Approach

Authors

  • Osama Ibrahim Deeb, PhD student, Eng, Higher Institute for Applied Sciences and Technology (HIAST), telecommunication engineering
  • Oumayma Al-Dakkak, Professor, Dr, Higher Institute for Applied Sciences and Technology (HIAST)
  • Assef Jafar, Professor, Dr, Higher Institute for Applied Sciences and Technology (HIAST), AI and Signal Processing

Keywords:

Contextual Representation, Self-Supervised Learning, Keyword Spotting, HuBERT, Dataset

Abstract

This paper investigates how effectively contextual representation models trained via self-supervised learning support keyword spotting (KWS) in Arabic, with the aim of reducing the amount of training data required while maintaining high KWS accuracy. We employed a Hidden Unit Bidirectional Encoder Representations from Transformers (HuBERT) model pre-trained on Arabic data to extract contextual representations of the speech signal, and developed a head model for the downstream KWS task. This head model was fine-tuned on the Arabic Speech Command dataset, and multiple experiments were conducted to determine the minimum number of training samples required to attain a given level of accuracy.

Remarkably, with only ten training samples per word, detection accuracy exceeded 98.5%, and with more than 11 training samples it rose to 99.7%. The model was also evaluated on English-language data, with similar outcomes in both accuracy and the number of training samples needed. The results demonstrate that self-supervised learning reduces the number of training samples required for Arabic KWS and suggest potential for broader applications in speech processing.
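
The experiments on the minimum number of training samples imply a per-word subsampling step: a fixed number of clips per keyword is used for fine-tuning, and the remainder is held out. The sketch below illustrates one way such a split could be drawn. It is not the authors' code; the function name `split_k_per_word`, the `(clip_id, keyword)` data layout, and the toy corpus are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_k_per_word(samples, k, seed=0):
    """Pick k training clips per keyword; all other clips are held out.

    `samples` is a list of (clip_id, keyword) pairs. The function name and
    this data layout are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible

    # Group clip ids by their keyword label.
    by_word = defaultdict(list)
    for clip_id, word in samples:
        by_word[word].append(clip_id)

    train, held_out = [], []
    for word in sorted(by_word):
        clips = sorted(by_word[word])
        if len(clips) <= k:
            raise ValueError(f"need more than {k} clips for {word!r}")
        rng.shuffle(clips)
        train += [(c, word) for c in clips[:k]]
        held_out += [(c, word) for c in clips[k:]]
    return train, held_out


# Toy stand-in for a speech-command corpus: 3 keywords, 30 clips each.
toy = [(f"{w}_{i:02d}", w) for w in ("yes", "no", "stop") for i in range(30)]
train, held_out = split_k_per_word(toy, k=10)
```

Repeating the split for several values of `k` and fine-tuning the head model on each `train` set would give an accuracy-versus-sample-count curve of the kind the abstract summarizes.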


Published

2025-08-04

How to Cite

Building a Keyword Spotting Model for Arabic Language Using Self-Supervised Learning Approach. (2025). Damascus University Journal for Engineering Sciences, 40(4). https://journal.damascusuniversity.edu.sy/index.php/engj/article/view/11529