Journal ID : TRKU-24-03-2020-10592
[This article belongs to Volume - 62, Issue - 03]

Title : A Systematic Review of Speaker Recognition Using Deep Learning on Research Trends, Datasets and Methods

Abstract :

Speaker recognition remains an active and challenging research topic. Open problems include noise, poor performance, short utterance duration, spoofing, and inconsistency. Researchers have applied a range of models, from traditional methods such as the Gaussian Mixture Model (GMM), Support Vector Machine (SVM), and Hidden Markov Model (HMM) to deep learning methods such as the Deep Neural Network (DNN) and Convolutional Neural Network (CNN), as well as various hybrid deep learning methods. Papers that use these methods are difficult to compare with one another, which makes it hard to identify novelty and research directions in speaker recognition. A Systematic Literature Review (SLR) helps identify and interpret the findings in a research field in order to answer predetermined research questions. This paper uses an SLR to identify the research trends, datasets, feature extraction methods, classification methods, and evaluation techniques used in speaker recognition with deep learning. The review covers 82 primary journal studies published from 2011 to 2019. Of these, 20% focus on speaker verification, and 11.5% each focus on Speaker Recognition in Noisy Conditions, Speaker Emotion Recognition, and Short and Mismatch Utterance Duration. Public datasets were used in 90% of the studies and private datasets in 10%. MFCC is a frequently used feature extraction method, although I-vector and X-vector methods are starting to be used in deep learning. The Deep Neural Network is a frequently used classification method in speaker recognition. Among the evaluation techniques, Equal Error Rate (EER) accounts for 31%, Word Error Rate (WER) for 29%, and other measures for 40%, including Accuracy, Root Mean Square Error (RMSE), Signal to Noise Ratio (SNR), Character Error Rate (CER), Phone Error Rate (PER), and Speech Separation Performance (SSP).
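
As an illustration of the Equal Error Rate named among the evaluation techniques, the following is a minimal sketch rather than a method taken from the reviewed studies. It assumes each verification trial yields a similarity score, with higher scores meaning "same speaker". The function name `equal_error_rate` and the sample scores are hypothetical. The sketch sweeps every observed score as a threshold and reports the point where the false acceptance rate and false rejection rate are closest.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping thresholds over all observed scores.

    genuine_scores: scores from trials where the claimed speaker is correct.
    impostor_scores: scores from trials where the claimed speaker is wrong.
    Assumption: a trial is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection rate: genuine trials scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance rate: impostor trials scored at or above it.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Report the midpoint, since FAR and FRR rarely coincide exactly.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical scores, for illustration only.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine, impostor))  # 0.25
```

A lower EER indicates better separation between genuine and impostor trials. Production toolkits typically interpolate along the ROC curve instead of using this discrete sweep, so their values can differ slightly on small trial sets.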
