Enhancing Robustness and Accuracy of Bone-Conducted Speech Emotion Recognition via Transformer Models
Published in 10th International Conference on Electrical Engineering and Informatics (ICEEI2025), Malaysia, 2025
Speech Emotion Recognition (SER) enhances human-computer interaction by enabling systems to identify and respond to emotions in vocal expressions. This research presents a high-performance SER model based on the Wav2Vec2.0 transformer framework, fine-tuned with a custom dataset named audio EmoBon, created with bone-conducted (BC) speech from Malaysian speakers. The dataset features eight emotional categories and has undergone rigorous validation for high recording quality and accurate emotional representation. Our model takes raw audio as input and uses a self-supervised transformer to automatically extract rich acoustic representations, eliminating manual feature engineering and enhancing generalizability across diverse acoustic conditions. Additionally, the audio EmoBon dataset boosts emotional authenticity by simulating speech transmission through bone conduction. Our system achieves 99.06% accuracy, surpassing existing models on similar tasks. It performs excellently across all evaluation metrics, including macro and weighted precision, recall, and F1-score. ROC curve and confusion matrix analyses validate its ability to classify emotional states accurately while reducing misclassification. This study advances the SER field by integrating transformer-based learning with culturally relevant and physiologically informed speech data. The findings indicate that these models are feasible for Southeast Asian populations and practical applications such as affective computing, mental health diagnostics, and intelligent virtual agents.
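The abstract reports both macro and weighted averages of precision, recall, and F1. For readers unfamiliar with the distinction, the sketch below (illustrative only, not the paper's evaluation code) computes both from predicted labels: macro averaging weights each emotion class equally, while weighted averaging scales each class by its support, which matters when the eight emotion categories are imbalanced.

```python
from collections import Counter

def per_class_prf(y_true, y_pred, labels):
    """Per-class (precision, recall, F1) from parallel label lists."""
    stats = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        stats[c] = (prec, rec, f1)
    return stats

def averaged(stats, y_true, scheme="macro"):
    """Aggregate per-class scores.

    'macro'    -> unweighted mean over classes
    'weighted' -> mean weighted by each class's true-label support
    Returns (precision, recall, f1).
    """
    support = Counter(y_true)
    total = len(y_true)
    out = []
    for i in range(3):  # index 0=precision, 1=recall, 2=f1
        if scheme == "macro":
            out.append(sum(s[i] for s in stats.values()) / len(stats))
        else:
            out.append(sum(stats[c][i] * support[c] / total for c in stats))
    return tuple(out)
```

With imbalanced classes the two schemes diverge: for `y_true = ["a","a","a","b"]` and `y_pred = ["a","a","b","b"]`, macro precision is 0.75 while weighted precision is 0.875, since the majority class "a" is classified perfectly. The paper's near-identical macro and weighted scores suggest the model performs consistently across all eight emotion categories.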
Recommended citation: M. R. Hossen, K. A. A. Bakar, M. U. Mia, M. N. Hossain and M. S. Hosain, "Enhancing Robustness and Accuracy of Bone-Conducted Speech Emotion Recognition via Transformer Models." 2025 International Conference on Electrical Engineering and Informatics (ICEEI), Kuching, Malaysia, 2025, pp. 1-6, doi: 10.1109/ICEEI68459.2025.11330456.
Download Paper
