CN108877784B - Robust speech recognition method based on accent recognition - Google Patents

Robust speech recognition method based on accent recognition

Info

Publication number
CN108877784B
CN108877784B CN201811030962.6A
Authority
CN
China
Prior art keywords
accent
training
target speaker
acoustic model
model
Prior art date
Legal status
Active
Application number
CN201811030962.6A
Other languages
Chinese (zh)
Other versions
CN108877784A (en)
Inventor
Lü Yong (吕勇)
Current Assignee
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date
Filing date
Publication date
Application filed by Hohai University HHU filed Critical Hohai University HHU
Priority to CN201811030962.6A priority Critical patent/CN108877784B/en
Publication of CN108877784A publication Critical patent/CN108877784A/en
Application granted granted Critical
Publication of CN108877784B publication Critical patent/CN108877784B/en

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/08: Speech classification or search
    • G10L15/14: Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142: Hidden Markov Models [HMMs]
    • G10L15/144: Training of HMMs

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses a robust speech recognition method based on accent recognition, which predicts the pronunciation characteristics of a target speaker from the acoustic models of several accent classes. In the training stage, accents with similar pronunciation characteristics are merged into one class, and a Gaussian mixture model and a set of hidden Markov models are trained for each accent class. In the testing stage, formants are first extracted from the target speaker's test speech; the speaker's accent is then identified from these formant features, the acoustic model of the identified accent class is selected, and its parameters are adapted to match the target speaker's pronunciation characteristics; finally, the adapted acoustic model is used to recognize the test-speech feature vectors and obtain the recognition result. The method reduces the influence of accent on a speech recognition system and improves the accuracy of model adaptation under accent variation.

Description

Robust speech recognition method based on accent recognition
Technical Field
The invention belongs to the field of speech recognition, and in particular relates to a robust speech recognition method that describes the formant-vector distribution of each accent with a Gaussian mixture model, uses the pre-trained mixture models to identify the accent of the test speech in the test environment, selects the acoustic model that best matches the current speaker's accent, and performs speaker adaptation on its parameters to obtain an acoustic model for the test environment.
Background
Speech recognition systems generally use Mel-frequency cepstral coefficients (MFCCs) as feature vectors and hidden Markov models (HMMs) as acoustic models. To capture the characteristics of unseen target speakers, the acoustic model is usually trained on speech from a large number of speakers. However, it is very difficult to reduce the impact of speaker variation simply by adding more training speech: speaking styles differ from person to person, and the population of possible speakers is far too large to cover in the training stage. Moreover, training on too many speakers flattens the acoustic model, widening the mismatch between the model and any individual speaker and lowering the system's recognition rate.
Currently, most speech recognition systems achieve high recognition rates on standard Mandarin Chinese pronunciation. In real life, however, few people speak perfectly standard Mandarin; most pronounce with a more or less pronounced regional accent. Speaker adaptation transforms the parameters of a pre-trained acoustic model using a small amount of test speech so that the model matches the test environment as closely as possible. The true mapping between the training and test environments is unknown and nonlinear, but for ease of implementation, speaker adaptation usually assumes the mapping is a linear transformation. This can leave a large difference between the adapted acoustic model and the ideal one, and the difference is especially pronounced when the pronunciation characteristics of the training speech and the target speaker differ significantly.
Disclosure of Invention
Purpose of the invention: to address the problems of the prior art, the invention provides a robust speech recognition method based on accent recognition.
The technical scheme is as follows: in the training stage, accents with similar pronunciation characteristics are merged into one class, and a Gaussian mixture model (GMM) and a set of hidden Markov models are trained for each accent class. In the testing stage, formants are first extracted from the target speaker's test speech; the speaker's accent is then identified from these formant features, the acoustic model of the identified accent class is selected, and its parameters are adapted to match the target speaker's pronunciation characteristics; finally, the adapted acoustic model is used to recognize the test-speech feature vectors and obtain the recognition result.
The method comprises the following specific steps:
(1) Obtain training speech for the various accent classes;
(2) Window and frame the training speech of each accent class to obtain frame signals;
(3) Extract formants from the voiced frame signals of each class's training speech, and combine the first three formants into a formant vector;
(4) Perform GMM training on the formant vectors of each class's training speech to obtain that accent class's GMM;
(5) Extract features from each class's training speech to obtain Mel-frequency cepstral coefficients (MFCCs), and perform HMM training to obtain an HMM (acoustic model) for each speech unit of that accent class;
(6) Window and frame the target speaker's test speech to obtain its frame signals;
(7) Extract formant vectors from the target speaker's voiced frame signals;
(8) Perform accent recognition on the target speaker's formant vectors with the pre-trained GMMs to obtain the target speaker's accent information;
(9) Select the acoustic model of the identified accent class according to the target speaker's accent information, and adjust its parameters to match the target speaker's pronunciation characteristics, yielding an adapted acoustic model; the matching is approximate rather than exact: adaptation improves the degree of match, and with it the recognition rate, without achieving a complete match;
(10) Extract features from the target speaker's frame signals to obtain the target speaker's MFCCs;
(11) Acoustically decode the target speaker's MFCCs with the adapted acoustic model to obtain the recognition result.
This technical scheme provides the following beneficial effects: the method reduces the influence of accent on the speech recognition system, improves the accuracy of model adaptation under accent variation, and enhances the system's recognition performance.
Drawings
Fig. 1 is a general framework diagram of a robust speech recognition method based on accent recognition according to an embodiment of the present invention.
Detailed Description
The present invention is further illustrated by the following examples. These examples are purely illustrative and do not limit the scope of the invention; upon reading this disclosure, those skilled in the art may make various equivalent modifications, which fall within the scope of the appended claims.
A robust speech recognition method based on accent recognition mainly comprises preprocessing, formant extraction, GMM training, feature extraction, HMM training, accent recognition, model adaptation and acoustic decoding.
1. Preprocessing
In the training and testing stages, the training speech and test speech are respectively windowed and framed to produce frame signals. The sampling frequency of the speech signal is 8000 Hz, the window function is a Hamming window, the frame length is 256 samples, and the frame shift is 128 samples.
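The preprocessing step can be sketched in a few lines of NumPy. The parameters below (8000 Hz sampling, Hamming window, frame length 256, frame shift 128) are the ones given in the text; the random signal is only a stand-in for real speech.

```python
import numpy as np

def frame_signal(x, frame_len=256, frame_shift=128):
    """Split a speech signal into overlapping Hamming-windowed frames."""
    window = np.hamming(frame_len)
    n_frames = 1 + max(0, (len(x) - frame_len) // frame_shift)
    frames = np.empty((n_frames, frame_len))
    for i in range(n_frames):
        start = i * frame_shift
        frames[i] = x[start:start + frame_len] * window
    return frames

# One second of 8000 Hz audio yields 61 frames of 256 samples each
# (32 ms frames with a 16 ms shift).
x = np.random.default_rng(0).standard_normal(8000)
frames = frame_signal(x)
```

Both training and test speech pass through this same routine before formant and MFCC extraction.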
2. Formant extraction
In the training and testing stages, formants are extracted from the voiced frame signals of the training speech and the test speech respectively, and the first three formants are combined into a formant vector.
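The patent does not fix a particular formant-extraction algorithm; one common technique is linear predictive coding (LPC) root-finding, sketched below with NumPy. The LPC coefficients come from the autocorrelation (Yule-Walker) normal equations, and formant frequencies are read off from the angles of the complex roots of the LPC polynomial. The test frame is a synthetic signal with resonances placed at known frequencies.

```python
import numpy as np

def formant_vector(frame, fs=8000, lpc_order=10):
    """Estimate the first three formants of a voiced frame via LPC roots."""
    # Autocorrelation method: solve the normal equations R a = r.
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(lpc_order)]
                  for i in range(lpc_order)])
    a = np.linalg.solve(R, r[1:lpc_order + 1])
    # Roots of the prediction-error polynomial A(z) = 1 - sum a_k z^-k.
    roots = np.roots(np.concatenate(([1.0], -a)))
    # Keep one root per conjugate pair (positive imaginary part).
    roots = roots[np.imag(roots) > 0.01]
    freqs = np.sort(np.angle(roots) * fs / (2 * np.pi))
    return freqs[:3]  # first three formants, in Hz

# Synthetic voiced frame with resonances near 500, 1500 and 2500 Hz.
t = np.arange(256) / 8000
frame = sum(np.exp(-50 * t) * np.cos(2 * np.pi * f * t)
            for f in (500, 1500, 2500))
fv = formant_vector(frame, lpc_order=6)
```

With the order matched to the number of resonances, the three estimated frequencies land close to the true resonance locations.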
3. Feature extraction
In the training and testing stages, each frame of the training and test speech is passed through a fast Fourier transform, Mel filtering, logarithmic transform and discrete cosine transform to produce the Mel-frequency cepstral coefficients (MFCCs).
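The four-stage MFCC pipeline just described can be sketched as follows. The filterbank size (26) and cepstrum dimension (13) are common defaults, not values fixed by the patent, and SciPy's `dct` is assumed for the final transform.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frame, fs=8000, n_fft=256, n_filters=26, n_ceps=13):
    """MFCC of one windowed frame: FFT -> mel filterbank -> log -> DCT."""
    power = np.abs(np.fft.rfft(frame, n_fft)) ** 2
    # Triangular mel-spaced filterbank between 0 Hz and fs/2.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(fs / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fbank[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[i - 1, k] = (r - k) / max(r - c, 1)
    energies = np.maximum(fbank @ power, 1e-10)  # floor before the log
    return dct(np.log(energies), type=2, norm='ortho')[:n_ceps]

frame = np.hamming(256) * np.random.default_rng(1).standard_normal(256)
c = mfcc(frame)
```

One such 13-dimensional vector is produced per frame; stacking them over an utterance gives the feature sequence fed to the HMMs.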
4. GMM training
GMM training is performed on all training-speech formant vectors of each accent class to produce that accent class's GMM.
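A minimal sketch of this step, assuming scikit-learn is available: pool the 3-dimensional formant vectors of one accent class and fit a GMM to them. The mixture size (4 components) and diagonal covariances are modelling choices not fixed by the patent, and the data below are synthetic stand-ins for real formant vectors.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic formant vectors (F1, F2, F3 in Hz) for one accent class.
rng = np.random.default_rng(0)
formants = rng.normal([600, 1700, 2600], 80, size=(500, 3))

gmm = GaussianMixture(n_components=4, covariance_type="diag",
                      random_state=0).fit(formants)
avg_loglik = gmm.score(formants)  # average per-vector log-likelihood
```

Repeating the fit once per accent class yields the bank of GMMs used later for accent recognition.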
5. HMM training
HMM training is performed on the MFCCs of all training speech for each speech unit of each accent class, yielding an HMM for that unit. The HMMs of all speech units of an accent class together form that class's acoustic model.
6. Accent recognition
The formant vectors of the target speaker's test speech are input to the GMM of each accent class, and each GMM's output probability is computed. The accent whose GMM yields the maximum output probability is taken as the target speaker's accent.
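The maximum-output-probability decision can be sketched as below, again assuming scikit-learn and using synthetic accent clusters (the accent names and mean formant values are illustrative, not from the patent). Each accent's GMM scores the speaker's formant vectors, per-frame log-likelihoods are summed, and the argmax is taken.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Train one GMM per (synthetic) accent class on its formant vectors.
rng = np.random.default_rng(1)
train = {
    "accent_a": rng.normal([600, 1700, 2600], 80, size=(400, 3)),
    "accent_b": rng.normal([450, 1300, 2300], 80, size=(400, 3)),
}
gmms = {name: GaussianMixture(n_components=2, random_state=0).fit(X)
        for name, X in train.items()}

def recognize_accent(test_vectors):
    """Pick the accent whose GMM gives the highest total log-likelihood."""
    scores = {name: g.score_samples(test_vectors).sum()
              for name, g in gmms.items()}
    return max(scores, key=scores.get)

# Test speaker whose formants resemble accent_b.
speaker = rng.normal([455, 1310, 2290], 80, size=(50, 3))
```

Calling `recognize_accent(speaker)` selects accent_b, whose acoustic model is then handed to the adaptation step.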
7. Model adaptation
According to the accent information obtained by accent recognition, the acoustic model of that accent class is selected, and its parameters are transformed with the maximum likelihood linear regression (MLLR) algorithm so that they better match the target speaker's pronunciation characteristics.
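A minimal sketch of a global MLLR mean transform, under simplifying assumptions not stated in the patent: every Gaussian mean is moved by one shared affine map mu' = A mu + b, each adaptation frame is already aligned to one Gaussian, and covariances are identity, which reduces the maximum-likelihood estimate to ordinary least squares on extended mean vectors. All data below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
means = rng.normal(0, 1, size=(8, 3))            # 8 Gaussian means, dim 3

# Simulated target speaker: a "true" affine mismatch plus noise.
A_true = np.diag([1.1, 0.9, 1.05])
b_true = np.array([0.3, -0.2, 0.1])
align = rng.integers(0, 8, size=400)             # frame-to-Gaussian alignment
obs = means[align] @ A_true.T + b_true + rng.normal(0, 0.05, size=(400, 3))

# Least-squares estimate of W = [b; A] using extended means [1, mu].
ext = np.hstack([np.ones((400, 1)), means[align]])
W, *_ = np.linalg.lstsq(ext, obs, rcond=None)    # shape (4, 3)

# Adapted means: apply the shared transform to every Gaussian mean.
adapted_means = np.hstack([np.ones((8, 1)), means]) @ W
```

Full MLLR additionally weights the estimate by state occupancies and covariances, and may use several regression classes instead of one global transform; this sketch keeps only the core idea of re-estimating all means through one shared affine map.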
8. Acoustic decoding
The adapted acoustic model of the identified accent is used to acoustically decode the target speaker's MFCCs, producing the recognition result.
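At the heart of HMM acoustic decoding is the Viterbi recursion, sketched below in NumPy on a toy two-state left-to-right model. Real decoding searches over networks of phone and word HMMs with a language model; the per-frame state log-likelihoods here are made-up numbers chosen so the first two frames favour state 0 and the last two favour state 1.

```python
import numpy as np

def viterbi(log_b, log_pi, log_A):
    """Most likely state path given per-frame state log-likelihoods.

    log_b: (T, N) frame-by-state log-likelihoods
    log_pi: (N,) initial-state log-probabilities
    log_A: (N, N) transition log-probabilities
    """
    T, N = log_b.shape
    delta = log_pi + log_b[0]
    psi = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        trans = delta[:, None] + log_A            # trans[i, j]: i -> j
        psi[t] = np.argmax(trans, axis=0)         # best predecessor per state
        delta = trans[psi[t], np.arange(N)] + log_b[t]
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(T - 2, -1, -1):                # backtrace
        path[t] = psi[t + 1][path[t + 1]]
    return path, float(np.max(delta))

log_b = np.log(np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]))
log_pi = np.log(np.array([0.99, 0.01]))
log_A = np.log(np.array([[0.7, 0.3], [1e-9, 1.0 - 1e-9]]))  # left-to-right
path, score = viterbi(log_b, log_pi, log_A)  # path is [0, 0, 1, 1]
```

In the full system the same recursion, run over each candidate unit sequence with the adapted HMM parameters, selects the hypothesis with the highest likelihood as the recognition result.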

Claims (2)

1. A robust speech recognition method based on accent recognition, characterized in that, in the training stage, training speech of various accents is obtained, and a Gaussian mixture model (GMM) and a set of hidden Markov models are trained for each accent class; in the testing stage, formants are first extracted from the test speech of a target speaker; the speaker's accent is then identified from the formant features, the acoustic model corresponding to that accent is selected according to the identification result, and the parameters of the acoustic model are adjusted to match the target speaker's pronunciation characteristics, yielding an adapted acoustic model; finally, the adapted acoustic model is used to recognize the test-speech feature vectors to obtain a recognition result;
the specific method for generating a GMM model and an HMM model for each type of accent training comprises the following steps:
(1) Merge accents with similar pronunciation characteristics into one class, obtaining training speech for the various accent classes;
(2) Window and frame the training speech of each accent class to obtain frame signals;
(3) Extract formants from the voiced frame signals of each class's training speech, and combine the first three formants into a formant vector;
(4) Perform GMM training on the formant vectors of each class's training speech to obtain that accent class's GMM;
(5) Extract features from each class's training speech to obtain Mel-frequency cepstral coefficients (MFCCs), and perform HMM training to obtain an HMM for each speech unit of that accent class, the HMMs constituting the acoustic model;
in the testing stage, recognition from the target speaker's test speech proceeds as follows:
(1) Window and frame the target speaker's test speech to obtain its frame signals;
(2) Extract formant vectors from the target speaker's voiced frame signals;
(3) Perform accent recognition on the target speaker's formant vectors with the pre-trained GMMs to obtain the target speaker's accent information;
(4) Select the acoustic model of that accent according to the target speaker's accent information, and adjust its parameters to obtain an adapted acoustic model matched to the target speaker's pronunciation characteristics;
(5) Extract features from the target speaker's frame signals to obtain the target speaker's MFCCs;
(6) Acoustically decode the target speaker's MFCCs with the adapted acoustic model to obtain the recognition result.
2. The robust speech recognition method based on accent recognition of claim 1, characterized in that the sampling frequency of both the training and test speech signals is 8000 Hz, the window function is a Hamming window, the frame length is 256 samples, and the frame shift is 128 samples.
CN201811030962.6A 2018-09-05 2018-09-05 Robust speech recognition method based on accent recognition Active CN108877784B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811030962.6A CN108877784B (en) 2018-09-05 2018-09-05 Robust speech recognition method based on accent recognition


Publications (2)

Publication Number Publication Date
CN108877784A CN108877784A (en) 2018-11-23
CN108877784B true CN108877784B (en) 2022-12-06

Family

ID=64323254

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811030962.6A Active CN108877784B (en) 2018-09-05 2018-09-05 Robust speech recognition method based on accent recognition

Country Status (1)

Country Link
CN (1) CN108877784B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220189463A1 (en) * 2020-12-16 2022-06-16 Samsung Electronics Co., Ltd. Electronic device and operation method thereof

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109961794B (en) * 2019-01-14 2021-07-06 湘潭大学 Method for improving speaker recognition efficiency based on model clustering
CN112116909A (en) * 2019-06-20 2020-12-22 杭州海康威视数字技术股份有限公司 Voice recognition method, device and system
CN110648654A (en) * 2019-10-09 2020-01-03 国家电网有限公司客户服务中心 Speech recognition enhancement method and device introducing language vectors
CN112233659A (en) * 2020-10-14 2021-01-15 河海大学 Quick speech recognition method based on double-layer acoustic model
CN112466056B (en) * 2020-12-01 2022-04-05 上海旷日网络科技有限公司 Self-service cabinet pickup system and method based on voice recognition
CN112599118B (en) * 2020-12-30 2024-02-13 中国科学技术大学 Speech recognition method, device, electronic equipment and storage medium

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080147404A1 (en) * 2000-05-15 2008-06-19 Nusuara Technologies Sdn Bhd System and methods for accent classification and adaptation
CN101123648B (en) * 2006-08-11 2010-05-12 中国科学院声学研究所 Self-adapted method in phone voice recognition
CN102881284B (en) * 2012-09-03 2014-07-09 江苏大学 Unspecific human voice and emotion recognition method and system
CN103474061A (en) * 2013-09-12 2013-12-25 河海大学 Automatic distinguishing method based on integration of classifier for Chinese dialects
CN104392718B (en) * 2014-11-26 2017-11-24 河海大学 A kind of robust speech recognition methods based on acoustic model array
CN104485108A (en) * 2014-11-26 2015-04-01 河海大学 Noise and speaker combined compensation method based on multi-speaker model
CN105355198B (en) * 2015-10-20 2019-03-12 河海大学 It is a kind of based on multiple adaptive model compensation audio recognition method
CN105632501B (en) * 2015-12-30 2019-09-03 中国科学院自动化研究所 A kind of automatic accent classification method and device based on depth learning technology
CN106251859B (en) * 2016-07-22 2019-05-31 百度在线网络技术(北京)有限公司 Voice recognition processing method and apparatus
CN106531157B (en) * 2016-10-28 2019-10-22 中国科学院自动化研究所 Regularization accent adaptive approach in speech recognition
CN107919115B (en) * 2017-11-13 2021-07-27 河海大学 Characteristic compensation method based on nonlinear spectral transformation


Also Published As

Publication number Publication date
CN108877784A (en) 2018-11-23

Similar Documents

Publication Publication Date Title
CN108877784B (en) Robust speech recognition method based on accent recognition
KR100679051B1 (en) Apparatus and method for speech recognition using a plurality of confidence score estimation algorithms
Zen et al. Continuous stochastic feature mapping based on trajectory HMMs
Aggarwal et al. Performance evaluation of sequentially combined heterogeneous feature streams for Hindi speech recognition system
EP1675102A2 (en) Method for extracting feature vectors for speech recognition
Das et al. Bangladeshi dialect recognition using Mel frequency cepstral coefficient, delta, delta-delta and Gaussian mixture model
Nanavare et al. Recognition of human emotions from speech processing
Ranjan et al. Isolated word recognition using HMM for Maithili dialect
Bhukya Effect of gender on improving speech recognition system
KR101236539B1 (en) Apparatus and Method For Feature Compensation Using Weighted Auto-Regressive Moving Average Filter and Global Cepstral Mean and Variance Normalization
US11929058B2 (en) Systems and methods for adapting human speaker embeddings in speech synthesis
Hachkar et al. A comparison of DHMM and DTW for isolated digits recognition system of Arabic language
CN107919115B (en) Characteristic compensation method based on nonlinear spectral transformation
Singh et al. A critical review on automatic speaker recognition
Manjutha et al. Automated speech recognition system—A literature review
Vuppala et al. Recognition of consonant-vowel (CV) units under background noise using combined temporal and spectral preprocessing
Jayanna et al. Multiple frame size and rate analysis for speaker recognition under limited data condition
Dey et al. Content normalization for text-dependent speaker verification
Chakroun et al. An improved approach for text-independent speaker recognition
Salman et al. Speaker verification using boosted cepstral features with gaussian distributions
Biswas et al. Speaker identification using Cepstral based features and discrete Hidden Markov Model
Khalifa et al. Statistical modeling for speech recognition
Galić et al. Speaker dependent recognition of whispered speech based on MLLR adaptation
CN108986794B (en) Speaker compensation method based on power function frequency transformation
Kishimoto et al. Model training using parallel data with mismatched pause positions in statistical esophageal speech enhancement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant