CN102290048A - Robust voice recognition method based on MFCC (Mel frequency cepstral coefficient) long-distance difference - Google Patents

Robust voice recognition method based on MFCC (Mel frequency cepstral coefficient) long-distance difference

Info

Publication number
CN102290048A
CN102290048A CN2011102588847A CN201110258884A
Authority
CN
China
Prior art keywords
mfcc
parameter
remote
voice recognition
differences
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011102588847A
Other languages
Chinese (zh)
Other versions
CN102290048B (en)
Inventor
赵斯培
邱小军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN201110258884A priority Critical patent/CN102290048B/en
Publication of CN102290048A publication Critical patent/CN102290048A/en
Application granted granted Critical
Publication of CN102290048B publication Critical patent/CN102290048B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a robust voice recognition method based on MFCC (Mel frequency cepstral coefficient) long-distance differences. Its distinguishing feature is that the long-distance differences of the MFCC over four sampling points and six sampling points are used as the voice recognition feature parameters. With essentially no increase in computation or storage, the recognition rate of the robust voice recognition system is 20-40 percentage points higher than with the MFCC parameters and their first-order difference coefficients commonly used in the field as feature parameters.

Description

A robust speech recognition method based on long-distance differences of the MFCC
One. Technical Field
The present invention relates to the field of speech recognition technology. It proposes a robust speech recognition method that uses long-distance differences of the Mel frequency cepstral coefficient (MFCC) as the feature parameters.
Two. Background Art
The main reason for the performance degradation of speech recognition systems in noisy environments is the mismatch between clean training data and noise-corrupted test data; finding a feature parameter that reduces this mismatch is an important way to improve the recognition rate of a speech recognition system on noisy speech. The feature parameters commonly used at present are the Mel frequency cepstral coefficient (MFCC) and the linear predictive cepstral coefficient (LPCC). The MFCC matches the auditory characteristics of the human ear and has good noise immunity. It is computed as follows: the speech signal is first preprocessed by endpoint detection, pre-emphasis, framing and windowing; each frame is then transformed with the fast Fourier transform (FFT) and the squared magnitude is taken to obtain the power spectrum; the power spectrum is filtered with a bank of 24 Mel filters; the filtered energies are log-transformed; and finally a discrete cosine transform (DCT) yields the MFCC parameters. For the detailed computation see, e.g., Han Jiqing, Zhang Lei, Zheng Tieran. Speech Signal Processing [M]. Beijing: Tsinghua University Press, 2004.
The LPCC is based on a model of human speech production: the vocal tract is assumed to be an all-pole model, so that the speech at the current time can be represented as a linear combination of the speech at several preceding times. The linear prediction coefficients are obtained with the minimum mean square error criterion and the correlation method, and the linear predictive cepstral coefficients (LPCC) then follow by homomorphic processing. For the detailed computation see the same reference (Han Jiqing, Zhang Lei, Zheng Tieran. Speech Signal Processing [M]. Beijing: Tsinghua University Press, 2004.).
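For concreteness, the MFCC pipeline described above (power spectrum, 24-filter Mel filter bank, log, DCT) can be sketched in Python with NumPy. This is an illustrative sketch rather than the patent's implementation; the filter-edge handling and the number of retained coefficients (`n_ceps`) are assumptions.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale (24 filters in the text)."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):           # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):          # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(frames, sr=8000, n_filters=24, n_ceps=12):
    """frames: (n_frames, frame_len) array of pre-emphasized, windowed speech frames."""
    n_fft = frames.shape[1]
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2           # power spectrum per frame
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T  # Mel filterbank energies
    log_e = np.log(np.maximum(energies, 1e-10))               # log-transform
    # DCT-II basis to decorrelate, keeping the first n_ceps coefficients
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2.0 * n_filters)))
    return log_e @ basis.T
```

Each row of the returned array is the MFCC vector of one speech frame.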
Extensive experiments (e.g., Steven B. Davis, Paul Mermelstein. Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences [J]. IEEE Trans. on ASSP, 1980, 28(4): 357-366; and Shang-Ming Lee, Shi-hau Fang, Jeih-weih Hung, Lin-Shan Lee. Improved MFCC feature extraction by PCA-optimized filter-bank for speech recognition [J]. IEEE Automatic Speech Recognition and Understanding, 2001: 49-52) show that the MFCC has better noise robustness than the LPCC, but the MFCC still cannot achieve satisfactory results in robust speech recognition (Yeganeh H., Ahadi S.M., Ziaei A. A new MFCC improvement method for robust ASR [J]. IEEE ICSP, 2008: 643-646).
One approach (Shang-Ming Lee, Shi-hau Fang, Jeih-weih Hung, Lin-Shan Lee. Improved MFCC feature extraction by PCA-optimized filter-bank for speech recognition [J]. IEEE Automatic Speech Recognition and Understanding, 2001: 49-52) uses principal component analysis (PCA) to optimize the Mel filter bank and thereby improve robustness. Another (Yeganeh H., Ahadi S.M., Ziaei A. A new MFCC improvement method for robust ASR [J]. IEEE ICSP, 2008: 643-646) first performs Mel sub-band spectral subtraction, then estimates the signal-to-noise ratio of each sub-band and weights the parameters according to this estimate, giving larger weights to the parameters less affected by noise, which improves the robustness of the speech recognition system in noisy environments. Korean patent KR100893154B1 uses weighted MFCC coefficients for speech gender recognition, and US patent US2009177466 replaces the full power spectrum with the energies at the spectral peaks of the speech when extracting the Mel frequency cepstral coefficients, improving the noise robustness of speech recognition without increasing the feature dimension.
A distinguishing feature of the present invention is that the long-distance differences of the MFCC are used as the speech recognition feature parameters, abandoning the traditional combination of the MFCC parameters themselves and their first-order difference coefficients. Experiments show that when the 4-sampling-point and 6-sampling-point long-distance differences of the MFCC are chosen as the feature parameters, the speech recognition system has the best noise robustness.
Three. Summary of the Invention
1. Objective: to propose a robust speech recognition method based on long-distance differences of the MFCC. The method adopts the 4-sampling-point and 6-sampling-point long-distance differences of the MFCC as the feature parameters, abandoning the traditional MFCC parameters themselves and their first-order difference coefficients.
2. Technical scheme: to achieve the above objective, the proposed algorithm computes, on the basis of the MFCC parameters, their 4-sampling-point and 6-sampling-point long-distance differences, and uses these as the speech recognition feature parameters for both training and recognition.
The standard MFCC parameters are computed as follows: the speech signal is first preprocessed (endpoint detection, pre-emphasis, framing, windowing); for each frame the FFT is computed and its squared magnitude taken to obtain the power spectrum; the power spectrum is filtered with the Mel filter bank; the logarithm of the filtered energies is taken; and finally the DCT is computed to obtain the standard MFCC parameters. See, e.g., Feng Yun et al. Application of an improved MFCC feature algorithm in speech recognition [J]. Computer Engineering and Science, 2009, 31(12): 146-148.
The 2-sampling-point difference of the MFCC is computed as:
Δ2MFCC(i) = MFCC(i+1) - MFCC(i-1)    (1)
Similarly, the 4-sampling-point long-distance difference of the MFCC is computed as:
Δ4MFCC(i) = MFCC(i+2) - MFCC(i-2)    (2)
and the 6-sampling-point long-distance difference of the MFCC as:
Δ6MFCC(i) = MFCC(i+3) - MFCC(i-3)    (3)
where MFCC(i) is the MFCC parameter of the i-th speech frame, and Δ2MFCC, Δ4MFCC and Δ6MFCC are the 2-sampling-point difference and the 4- and 6-sampling-point long-distance differences of the MFCC, respectively.
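The three differences above amount to a single vectorized operation on the MFCC frame sequence. A NumPy sketch follows; the edge-frame treatment (repeating the first/last frame) is an assumption, since the text leaves edge handling unspecified.

```python
import numpy as np

def long_distance_diff(mfcc, span):
    """Long-distance difference of an MFCC sequence, one row per frame:
    Delta MFCC(i) = MFCC(i + span) - MFCC(i - span).

    span=1 reproduces the ordinary 2-sampling-point difference (Eq. 1);
    span=2 and span=3 give the 4- and 6-sampling-point long-distance
    differences of the invention (Eqs. 2 and 3). Edge frames are handled
    by repeating the first/last frame (an assumption).
    """
    padded = np.pad(mfcc, ((span, span), (0, 0)), mode='edge')
    return padded[2 * span:] - padded[:-2 * span]
```

For a (T, D) array of T frames with D coefficients each, the result keeps the same shape, so the feature stream aligns frame-for-frame with the original MFCCs.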
A concrete speech recognition system may (but need not) adopt a hidden Markov model (HMM) as the system model for the chosen feature parameters (the 4-sampling-point and 6-sampling-point long-distance differences of the MFCC disclosed by the invention); the training process may (but need not) use the Baum-Welch algorithm, and the recognition process may (but need not) use the Viterbi decoding algorithm. For a concrete speech recognition algorithm flow see, e.g., He Ying et al. MATLAB Extended Programming [M]. Beijing: Tsinghua University Press, 2002.
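As an illustration of the recognition step named above, a minimal log-domain Viterbi decoder is sketched below. The patent's system uses continuous Gaussian-mixture emissions; the discrete emission matrix here is a simplifying assumption for brevity.

```python
import numpy as np

def viterbi(log_A, log_B, log_pi, obs):
    """Most likely state path through one HMM for an observation sequence.

    log_A: (S, S) log transition matrix; log_B: (S, V) log emission matrix
    (discrete symbols, a simplification); log_pi: (S,) log initial distribution;
    obs: sequence of symbol indices. In isolated-word recognition, the word
    model whose best path scores highest wins.
    """
    S = log_A.shape[0]
    T = len(obs)
    delta = log_pi + log_B[:, obs[0]]          # best log-score ending in each state
    back = np.zeros((T, S), dtype=int)         # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_A        # scores[from_state, to_state]
        back[t] = np.argmax(scores, axis=0)
        delta = scores[back[t], np.arange(S)] + log_B[:, obs[t]]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):              # trace backpointers
        path.append(int(back[t, path[-1]]))
    return float(np.max(delta)), path[::-1]
```

Baum-Welch training estimates `log_A`, `log_B` and `log_pi` from data; here they are taken as given.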
3. Beneficial effects: the remarkable advantage of the present invention is that the 4-sampling-point and 6-sampling-point long-distance differences of the MFCC are used as the speech recognition feature parameters; with essentially no increase in computation or storage, the recognition rate on noisy speech is 20-40 percentage points higher than with the combination of the MFCC parameters themselves and their first-order difference coefficients commonly used in the field.
Four. Description of the Drawings
Fig. 1 is a block diagram of the computation of the 4-sampling-point long-distance difference of the MFCC.
Fig. 2 is a block diagram of the computation of the 6-sampling-point long-distance difference of the MFCC.
Five. Detailed Embodiment
The characteristic of the proposed algorithm is that long-distance differences of the MFCC are used as the speech recognition feature parameters, abandoning the traditional combination of the MFCC parameters themselves and their first-order difference coefficients. Taking an isolated-word robust speech recognition system as an example, the implementation is described in detail below.
The isolated-word robust speech recognition system adopts a hidden Markov model (HMM) as the system model; the training process uses the Baum-Welch algorithm and the recognition process uses the Viterbi decoding algorithm. The speech data are sampled at 8 kHz with 16-bit quantization; the frame length is 256 samples with a frame shift of 128, and a Hamming window is used. In the preprocessing stage, endpoint detection uses the classical short-time energy / zero-crossing rate double-threshold method. For a concrete HMM algorithm flow see, e.g., He Ying et al. MATLAB Extended Programming [M]. Beijing: Tsinghua University Press, 2002. The detailed procedure is as follows:
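The preprocessing just described (256-sample frames, 128-sample shift, Hamming window, energy/zero-crossing endpointing) can be sketched as follows. The thresholds `e_ratio` and `z_thresh` are illustrative assumptions rather than values from the patent, and the double-threshold logic is deliberately simplified.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split speech into overlapping Hamming-windowed frames
    (frame length 256, frame shift 128, as in the text)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    frames = np.stack([x[i * hop: i * hop + frame_len] for i in range(n)])
    return frames * np.hamming(frame_len)

def endpoint_detect(x, frame_len=256, hop=128, e_ratio=0.1, z_thresh=0.25):
    """Simplified energy / zero-crossing-rate endpointing: a frame counts as
    speech if its short-time energy exceeds a fraction of the maximum frame
    energy or its zero-crossing rate is high. Returns (first, last) speech
    frame indices, or None if no speech frame is found."""
    frames = frame_signal(x, frame_len, hop)
    energy = np.sum(frames ** 2, axis=1)                              # short-time energy
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)  # zero-crossing rate
    active = (energy > e_ratio * energy.max()) | (zcr > z_thresh)
    idx = np.flatnonzero(active)
    if idx.size == 0:
        return None
    return int(idx[0]), int(idx[-1])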
1. Compute the 4-sampling-point and 6-sampling-point long-distance differences of the MFCC as feature parameters
The speech signal is first preprocessed (endpoint detection, pre-emphasis, framing, windowing); for each frame the FFT is computed and its squared magnitude taken to obtain the power spectrum; the power spectrum is filtered with the Mel filter bank; the logarithm of the filtered energies is taken; and the DCT is computed to obtain the standard MFCC parameters. Finally, the 4-sampling-point and 6-sampling-point long-distance differences of the MFCC are computed as described above and used as the feature parameters.
2. Train the HMM models with clean speech
Before recognition can be performed, the model parameters must be trained. Here the 4-sampling-point and 6-sampling-point long-distance differences of the MFCC of clean speech from 120 speakers (63 male / 57 female) are used as the speech recognition feature parameters and fed into the HMMs for training. The HMMs use continuous probability density functions; each HMM has 4 states, and each state is a mixture of 3 Gaussian components.
3. Test with noisy speech
Tests with speech at various signal-to-noise ratios from 51 speakers (31 male / 20 female) show that choosing the 4-sampling-point and 6-sampling-point long-distance differences of the MFCC as feature parameters yields a recognition rate 20-40 percentage points higher than the MFCC parameters themselves with their first-order difference coefficients, as commonly used in the field. Detailed results are given in Tables 1-4.
Table 1. Recognition rates for speech at different SNRs with different feature parameters (Gaussian noise)
Table 2. Recognition rates for speech at different SNRs with different feature parameters (supermarket noise)
Table 3. Recognition rates for speech at different SNRs with different feature parameters (noise inside a subway carriage)
Table 4. Recognition rates for speech at different SNRs with different feature parameters (Hunan Road traffic noise)

Claims (3)

1. A robust speech recognition method based on long-distance differences of the Mel frequency cepstral coefficient (MFCC), characterized in that the 4-sampling-point and 6-sampling-point long-distance differences of the MFCC are adopted as the feature parameters.
2. The method of computing the 4-sampling-point long-distance difference of the MFCC as claimed in claim 1, characterized in that:
Δ4MFCC(i) = MFCC(i+2) - MFCC(i-2),
where MFCC(i) is the MFCC parameter of the i-th speech frame and Δ4MFCC is the 4-sampling-point long-distance difference of the MFCC.
3. The method of computing the 6-sampling-point long-distance difference of the MFCC as claimed in claim 1, characterized in that:
Δ6MFCC(i) = MFCC(i+3) - MFCC(i-3),
where MFCC(i) is the MFCC parameter of the i-th speech frame and Δ6MFCC is the 6-sampling-point long-distance difference of the MFCC.
CN201110258884A 2011-09-05 2011-09-05 Robust voice recognition method based on MFCC (Mel frequency cepstral coefficient) long-distance difference Expired - Fee Related CN102290048B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110258884A CN102290048B (en) 2011-09-05 2011-09-05 Robust voice recognition method based on MFCC (Mel frequency cepstral coefficient) long-distance difference

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110258884A CN102290048B (en) 2011-09-05 2011-09-05 Robust voice recognition method based on MFCC (Mel frequency cepstral coefficient) long-distance difference

Publications (2)

Publication Number Publication Date
CN102290048A true CN102290048A (en) 2011-12-21
CN102290048B CN102290048B (en) 2012-10-24

Family

ID=45336411

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110258884A Expired - Fee Related CN102290048B (en) 2011-09-05 2011-09-05 Robust voice recognition method based on MFCC (Mel frequency cepstral coefficient) long-distance difference

Country Status (1)

Country Link
CN (1) CN102290048B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105845141A (en) * 2016-03-23 2016-08-10 广州势必可赢网络科技有限公司 Speaker confirmation model, speaker confirmation method and speaker confirmation device based on channel robustness
CN106373559A (en) * 2016-09-08 2017-02-01 河海大学 Robustness feature extraction method based on logarithmic spectrum noise-to-signal weighting
CN108175436A (en) * 2017-12-28 2018-06-19 北京航空航天大学 A kind of gurgling sound intelligence automatic identifying method
CN112951245A (en) * 2021-03-09 2021-06-11 江苏开放大学(江苏城市职业学院) Dynamic voiceprint feature extraction method integrated with static component

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100893154B1 (en) * 2008-10-13 2009-04-16 한국과학기술연구원 A method and an apparatus for recognizing a gender of an speech signal
US20090177466A1 (en) * 2007-12-20 2009-07-09 Kabushiki Kaisha Toshiba Detection of speech spectral peaks and speech recognition method and system
CN101546555A (en) * 2009-04-14 2009-09-30 清华大学 Constraint heteroscedasticity linear discriminant analysis method for language identification

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090177466A1 (en) * 2007-12-20 2009-07-09 Kabushiki Kaisha Toshiba Detection of speech spectral peaks and speech recognition method and system
KR100893154B1 (en) * 2008-10-13 2009-04-16 한국과학기술연구원 A method and an apparatus for recognizing a gender of an speech signal
CN101546555A (en) * 2009-04-14 2009-09-30 清华大学 Constraint heteroscedasticity linear discriminant analysis method for language identification

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Computer Engineering and Science (《计算机工程与科学》), 2009-12-31, Feng Yun et al., "Application of an improved MFCC feature algorithm in speech recognition", pp. 146-148, Vol. 31, No. 12 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105845141A (en) * 2016-03-23 2016-08-10 广州势必可赢网络科技有限公司 Speaker confirmation model, speaker confirmation method and speaker confirmation device based on channel robustness
CN106373559A (en) * 2016-09-08 2017-02-01 河海大学 Robustness feature extraction method based on logarithmic spectrum noise-to-signal weighting
CN106373559B (en) * 2016-09-08 2019-12-10 河海大学 Robust feature extraction method based on log-spectrum signal-to-noise ratio weighting
CN108175436A (en) * 2017-12-28 2018-06-19 北京航空航天大学 A kind of gurgling sound intelligence automatic identifying method
CN112951245A (en) * 2021-03-09 2021-06-11 江苏开放大学(江苏城市职业学院) Dynamic voiceprint feature extraction method integrated with static component
CN112951245B (en) * 2021-03-09 2023-06-16 江苏开放大学(江苏城市职业学院) Dynamic voiceprint feature extraction method integrated with static component

Also Published As

Publication number Publication date
CN102290048B (en) 2012-10-24

Similar Documents

Publication Publication Date Title
CN108922541B (en) Multi-dimensional characteristic parameter voiceprint recognition method based on DTW and GMM models
CN102436809B (en) Network speech recognition method in English oral language machine examination system
CN109192200B (en) Speech recognition method
WO2014153800A1 (en) Voice recognition system
CN104485103A (en) Vector Taylor series-based multi-environment model isolated word identifying method
CN105654947B (en) Method and system for acquiring road condition information in traffic broadcast voice
CN102290048B (en) Robust voice recognition method based on MFCC (Mel frequency cepstral coefficient) long-distance difference
Mallidi et al. Autoencoder based multi-stream combination for noise robust speech recognition.
CN102237083A (en) Portable interpretation system based on WinCE platform and language recognition method thereof
AboElenein et al. Improved text-independent speaker identification system for real time applications
CN113744725B (en) Training method of voice endpoint detection model and voice noise reduction method
Bocchieri et al. Investigating deep neural network based transforms of robust audio features for lvcsr
Chowdhury et al. Text-independent distributed speaker identification and verification using GMM-UBM speaker models for mobile communications
Alam et al. A study of low-variance multi-taper features for distributed speech recognition
Sorin et al. The ETSI extended distributed speech recognition (DSR) standards: client side processing and tonal language recognition evaluation
Kalamani et al. Review of Speech Segmentation Algorithms for Speech Recognition
Fujimoto et al. Frame-wise model re-estimation method based on Gaussian pruning with weight normalization for noise robust voice activity detection
Zhao et al. Efficient feature extraction of speaker identification using phoneme mean F-ratio for Chinese
Dimitriadis et al. An alternative front-end for the AT&T WATSON LV-CSR system
Narayanan et al. Coupling binary masking and robust ASR
Yue et al. Speaker age recognition based on isolated words by using SVM
Maragakis et al. Region-based vocal tract length normalization for ASR.
Haghani et al. Robust voice activity detection using feature combination
Saha et al. Modified mel-frequency cepstral coefficient
Liu et al. Efficient voice activity detection algorithm based on sub-band temporal envelope and sub-band long-term signal variability

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20121024