CN102290048A - Robust voice recognition method based on MFCC (Mel frequency cepstral coefficient) long-distance difference - Google Patents
- Publication number: CN102290048A
- Application number: CN201110258884A
- Authority: CN (China)
- Prior art keywords: mfcc, parameter, remote, voice recognition, differences
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.): Granted
Abstract
The invention discloses a robust speech recognition method based on the long-distance difference of MFCCs (Mel frequency cepstral coefficients). Its distinguishing feature is that the 4-sample-point and 6-sample-point long-distance differences of the MFCC are used as the speech recognition feature parameters. With essentially no increase in computation or storage, the recognition rate of a robust speech recognition system can be raised by 20 to 40 percentage points compared with the combination of the MFCC parameters and their first-order difference coefficients that is commonly used as feature parameters in the field.
Description
One, technical field
The present invention relates to the field of speech recognition technology. It proposes a robust speech recognition method that uses the long-distance difference of the Mel frequency cepstral coefficients (MFCC) as the feature parameter.
Two, background technology
The main reason the performance of a speech recognition system degrades in a noisy environment is the mismatch between the clean training data and the noise-corrupted test data. Finding a feature parameter that reduces this mismatch is therefore an important way to improve the recognition rate for noisy speech. The feature parameters commonly used in speech recognition at present are the Mel frequency cepstral coefficient (MFCC) and the linear prediction cepstral coefficient (LPCC). MFCC matches the auditory characteristics of the human ear and has comparatively good noise immunity. It is computed as follows: the speech signal is first preprocessed by endpoint detection, pre-emphasis, framing and windowing; a fast Fourier transform (FFT) is then applied to each frame and the squared magnitude is taken to obtain the power spectrum; the power spectrum is filtered with a bank of 24 Mel filters; the logarithm of the filtered energies is taken; finally, a discrete cosine transform (DCT) yields the MFCC parameters. For the detailed computation see, for example, Han Jiqing, Zhang Lei, Zheng Tieran. Speech Signal Processing [M]. Beijing: Tsinghua University Press, 2004.
LPCC is based on a model of human speech production. It assumes that this model is an all-pole model, so that the speech sample at the current instant can be expressed as a linear combination of the speech samples at several preceding instants. The linear prediction coefficients of this model can be obtained with the minimum mean square error criterion and the autocorrelation method, and the linear prediction cepstral coefficients (LPCC) can then be derived from them by homomorphic processing. For the detailed computation see the same reference (Han Jiqing, Zhang Lei, Zheng Tieran. Speech Signal Processing [M]. Beijing: Tsinghua University Press, 2004.).
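As a hedged illustration of the all-pole assumption described above, the textbook form of linear prediction and of the cepstral recursion is given here. The prediction order $p$, the coefficients $a_k$, the error $e(n)$ and the cepstral coefficients $c_n$ are symbols assumed for this sketch; the patent text itself does not define them.

$$\hat{s}(n) = \sum_{k=1}^{p} a_k\, s(n-k), \qquad e(n) = s(n) - \hat{s}(n)$$

$$c_1 = a_1, \qquad c_n = a_n + \sum_{k=1}^{n-1} \frac{k}{n}\, c_k\, a_{n-k}, \quad 1 < n \le p$$

The coefficients $a_k$ are chosen to minimize the mean square of $e(n)$, and the recursion then converts them to cepstral coefficients under the sign convention $H(z) = 1 / \bigl(1 - \sum_{k=1}^{p} a_k z^{-k}\bigr)$.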
A large number of experiments (e.g. Steven B. Davis, Paul Mermelstein. Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences [J]. IEEE Trans. on ASSP, 1980, 28(4): 357-366; and Shang-Ming Lee, Shi-hau Fang, Jeih-weih Hung and Lin-Shan Lee. Improved MFCC feature extraction by PCA-optimized filter-bank for speech recognition [J]. IEEE Automatic Speech Recognition and Understanding, 2001, 49-52) show that MFCC is more robust to noise than LPCC. Even so, MFCC still does not give satisfactory results in robust speech recognition (Yeganeh H., Ahadi S.M., Ziaei A. A new MFCC improvement method for robust ASR [J]. IEEE ICSP, 2008, 643-646).
One document (Shang-Ming Lee, Shi-hau Fang, Jeih-weih Hung and Lin-Shan Lee. Improved MFCC feature extraction by PCA-optimized filter-bank for speech recognition [J]. IEEE Automatic Speech Recognition and Understanding, 2001, 49-52) optimizes the Mel filter bank with principal component analysis (PCA) to improve robustness. Another (Yeganeh H., Ahadi S.M., Ziaei A. A new MFCC improvement method for robust ASR [J]. IEEE ICSP, 2008, 643-646) first performs spectral subtraction on the Mel sub-bands, then estimates the signal-to-noise ratio of each sub-band and weights the parameters according to that estimate, giving larger weights to the parameters that are less affected by noise, and thereby improves the robustness of the speech recognition system in noisy environments. Korean patent KR100893154B1 uses weighted MFCC coefficients to recognize the gender of a speaker. US patent application US2009177466 extracts the Mel frequency cepstral coefficients from the energy of the speech spectral peaks instead of the whole power spectrum, which improves the noise robustness of speech recognition without increasing the dimension of the speech features.
The distinguishing feature of the present invention is that it uses the long-distance difference of the MFCC as the speech recognition feature parameter and abandons the traditional combination of the MFCC parameters themselves and their first-order difference coefficients. Experiments show that the speech recognition system has the best noise robustness when the 4-sample-point and 6-sample-point long-distance differences of the MFCC are selected as the feature parameters.
Three, summary of the invention
1. Object of the invention: to propose a robust speech recognition method based on the long-distance difference of the MFCC. The method selects the 4-sample-point and 6-sample-point long-distance differences of the MFCC as feature parameters and abandons the traditional MFCC parameters themselves and their first-order difference coefficients.
2. Technical scheme: to achieve the above object, the proposed algorithm first calculates the MFCC parameters, then obtains their 4-sample-point and 6-sample-point long-distance differences, and uses these as the speech recognition feature parameters for training and recognition.
The standard MFCC parameters are calculated as follows: the speech signal is first preprocessed (endpoint detection, pre-emphasis, framing, windowing); the FFT of each speech frame is then computed and its squared magnitude is taken to obtain the power spectrum; the power spectrum is filtered with the Mel filter bank; the logarithm of the filter outputs is taken; and the DCT is computed to obtain the standard MFCC parameters. For details see Feng Yun, Jing Xinxing, Ye Mao. Application of an improved MFCC feature algorithm in speech recognition [J]. Computer Engineering and Science, 2009, 31(12): 146-148.
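The standard pipeline described above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the patent's implementation: endpoint detection is omitted, the pre-emphasis factor 0.97, the 12 cepstral coefficients, the triangular filter shape and the Mel mapping 2595·log10(1 + f/700) are common textbook choices that the patent does not specify, and all function names are hypothetical. The 8 kHz rate, 256-sample frames, 128-sample shift, Hamming window and 24 Mel filters follow the values given elsewhere in the text.

```python
import numpy as np

def hz_to_mel(f):
    # Common Mel-scale mapping (an assumption; the patent gives no formula).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    # Triangular filters whose edges are equally spaced on the Mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):
            fb[m - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fb[m - 1, k] = (right - k) / max(right - centre, 1)
    return fb

def mfcc(signal, fs=8000, frame_len=256, frame_shift=128,
         n_filters=24, n_ceps=12, preemph=0.97):
    # Requires len(signal) >= frame_len. Returns an array (n_frames, n_ceps).
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis: y[n] = x[n] - preemph * x[n-1].
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    # Framing and Hamming windowing.
    n_frames = 1 + (len(emphasized) - frame_len) // frame_shift
    idx = (np.arange(frame_len)[None, :]
           + frame_shift * np.arange(n_frames)[:, None])
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum: squared magnitude of the FFT of each frame.
    power = np.abs(np.fft.rfft(frames, n=frame_len, axis=1)) ** 2
    # Mel filtering followed by the logarithm (small offset avoids log(0)).
    log_energy = np.log(power @ mel_filterbank(n_filters, frame_len, fs).T + 1e-10)
    # DCT-II of the log filter-bank energies, keeping coefficients 1..n_ceps.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(1, n_ceps + 1)[:, None] * (n + 0.5) / n_filters)
    return log_energy @ basis.T
```

Each row of the returned array is the MFCC vector of one frame, which is the quantity written MFCC(i) in the formulas of this section.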
The 2-sample-point difference of the MFCC is calculated as follows:
$$\Delta_{2}\mathrm{MFCC}(i) = \mathrm{MFCC}(i+1) - \mathrm{MFCC}(i-1) \tag{1}$$
Similarly, the 4-sample-point long-distance difference of the MFCC is calculated as follows:
$$\Delta_{4}\mathrm{MFCC}(i) = \mathrm{MFCC}(i+2) - \mathrm{MFCC}(i-2) \tag{2}$$
The 6-sample-point long-distance difference of the MFCC is calculated as follows:
$$\Delta_{6}\mathrm{MFCC}(i) = \mathrm{MFCC}(i+3) - \mathrm{MFCC}(i-3) \tag{3}$$
where $\mathrm{MFCC}(i)$ is the MFCC parameter of the $i$-th frame of the speech signal, $\Delta_{2}\mathrm{MFCC}$ is the 2-sample-point difference of the MFCC, $\Delta_{4}\mathrm{MFCC}$ is the 4-sample-point long-distance difference of the MFCC, and $\Delta_{6}\mathrm{MFCC}$ is the 6-sample-point long-distance difference of the MFCC.
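The difference formulas above translate directly into array slicing. The sketch below is illustrative only: the function names are hypothetical, the handling of the first and last frames (repeating the edge frame) is an assumption because the patent does not say what happens when i-2 or i+3 falls outside the utterance, and stacking the two differences into one feature vector per frame is likewise an assumed layout.

```python
import numpy as np

def long_distance_difference(mfcc_frames, span):
    """Return MFCC(i+span) - MFCC(i-span) for every frame i.

    mfcc_frames has shape (n_frames, n_ceps); span=1, 2, 3 give the
    2-, 4- and 6-sample-point differences respectively.
    """
    c = np.asarray(mfcc_frames, dtype=float)
    # Assumption: repeat the first/last frame so every i has both neighbours.
    padded = np.pad(c, ((span, span), (0, 0)), mode="edge")
    return padded[2 * span:] - padded[:-2 * span]

def delta4_delta6_features(mfcc_frames):
    # Assumed layout: concatenate the 4- and 6-point differences per frame.
    d4 = long_distance_difference(mfcc_frames, 2)
    d6 = long_distance_difference(mfcc_frames, 3)
    return np.hstack([d4, d6])
```

For a 12-coefficient MFCC this yields 24 values per frame, the same dimension as MFCC plus its first-order difference, which is consistent with the statement that storage does not grow.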
The speech recognition system itself may use, for example but without limitation, a hidden Markov model (HMM) as the system model. With the selected feature parameters (the 4-sample-point and 6-sample-point long-distance differences of the MFCC disclosed by the invention), training may use, without limitation, the Baum-Welch algorithm, and recognition may use, without limitation, the Viterbi decoding algorithm. For the algorithm flow of such a speech recognition system see He Qiang, He Ying. MATLAB Extended Programming [M]. Beijing: Tsinghua University Press, 2002.
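To make the recognition step concrete, here is a minimal log-domain Viterbi decoder. It is a generic textbook sketch, not code from the patent or from the cited book; the function name and argument layout are assumptions. It takes the initial, transition and frame-wise emission log-probabilities of one HMM and returns the score of the best state sequence together with that sequence.

```python
import numpy as np

def viterbi_log(log_pi, log_A, log_B):
    """Best-path decoding for one HMM.

    log_pi: (N,)   initial state log-probabilities
    log_A:  (N, N) transition log-probabilities, log_A[i, j] for i -> j
    log_B:  (T, N) emission log-likelihood of each frame under each state
    Returns (best log score, state path of length T).
    """
    T, N = log_B.shape
    delta = log_pi + log_B[0]
    psi = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A          # scores[i, j]: arrive in j from i
        psi[t] = np.argmax(scores, axis=0)       # best predecessor of each state
        delta = scores[psi[t], np.arange(N)] + log_B[t]
    # Backtrack from the best final state.
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(T - 1, 0, -1):
        path[t - 1] = psi[t, path[t]]
    return float(np.max(delta)), path
```

In an isolated-word recognizer of the kind described, one would typically run this once per word model and pick the word whose model gives the highest score.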
3. Beneficial effects: the notable advantage of the invention is that, by selecting the 4-sample-point and 6-sample-point long-distance differences of the MFCC as the speech recognition feature parameters, the recognition rate for noisy speech is raised by 20 to 40 percentage points over the combination of the MFCC parameters themselves and their first-order difference coefficients commonly used in the field, with essentially no increase in computation or storage.
Four, description of drawings
Fig. 1 is a block diagram of the calculation of the 4-sample-point long-distance difference of the MFCC.
Fig. 2 is a block diagram of the calculation of the 6-sample-point long-distance difference of the MFCC.
Five, embodiment
The proposed algorithm is characterized in that it selects the long-distance difference of the MFCC as the speech recognition feature parameter and abandons the traditional combination of the MFCC parameters themselves and their first-order difference coefficients. Its implementation is described in detail below, taking an isolated-word robust speech recognition system as an example.
The isolated-word robust speech recognition system uses a hidden Markov model (HMM) as the system model; training uses the Baum-Welch algorithm and recognition uses the Viterbi decoding algorithm. The speech data are sampled at 8 kHz with 16-bit quantization; the frame length is 256 samples, the frame shift is 128 samples, and a Hamming window is used. In the preprocessing stage, endpoint detection uses the classical double-threshold method based on short-time energy and zero-crossing rate. For the HMM algorithm flow see He Qiang, He Ying. MATLAB Extended Programming [M]. Beijing: Tsinghua University Press, 2002. The detailed process is as follows:
1. Calculate the 4-sample-point and 6-sample-point long-distance differences of the MFCC as feature parameters
The speech signal is first preprocessed (endpoint detection, pre-emphasis, framing, windowing); the FFT of each speech frame is then computed and its squared magnitude is taken to obtain the power spectrum; the power spectrum is filtered with the Mel filter bank; the logarithm of the filter outputs is taken; and the DCT is computed to obtain the standard MFCC parameters. Finally, the 4-sample-point and 6-sample-point long-distance differences of the MFCC are calculated by the method described above and used as the feature parameters.
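The endpoint detection named in the system description relies on two per-frame quantities, short-time energy and zero-crossing rate. The sketch below computes both with the 256-sample frames and 128-sample shift given above and applies a deliberately simplified double-threshold rule. The thresholds are hypothetical parameters the caller must choose, the function names are assumptions, and the classical method has refinements (for example minimum speech and silence durations) that are not reproduced here.

```python
import numpy as np

def frame_signal(x, frame_len=256, frame_shift=128):
    # Split the signal into overlapping frames, one frame per row.
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // frame_shift
    idx = (np.arange(frame_len)[None, :]
           + frame_shift * np.arange(n_frames)[:, None])
    return x[idx]

def short_time_energy(frames):
    return np.sum(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    # Number of sign changes per frame; exact zeros are treated as positive.
    signs = np.sign(frames)
    signs[signs == 0] = 1
    return np.sum(signs[:, 1:] != signs[:, :-1], axis=1)

def detect_endpoints(x, energy_high, energy_low, zcr_thresh,
                     frame_len=256, frame_shift=128):
    """Simplified double-threshold detection; returns (first, last) frame or None."""
    frames = frame_signal(x, frame_len, frame_shift)
    e = short_time_energy(frames)
    z = zero_crossing_rate(frames)
    strong = np.flatnonzero(e > energy_high)     # frames that are surely speech
    if strong.size == 0:
        return None
    start, end = strong[0], strong[-1]
    # Extend outward while a neighbouring frame passes the low energy
    # threshold or has a high zero-crossing rate (weak or unvoiced speech).
    while start > 0 and (e[start - 1] > energy_low or z[start - 1] > zcr_thresh):
        start -= 1
    while end < len(e) - 1 and (e[end + 1] > energy_low or z[end + 1] > zcr_thresh):
        end += 1
    return int(start), int(end)
```

Only the frames between the returned indices would then be passed on to pre-emphasis, windowing and MFCC extraction.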
2. Train the HMM with clean speech
Before recognition, the model parameters must be trained with the HMM. Here the 4-sample-point and 6-sample-point long-distance differences of the MFCC of clean speech from 120 speakers (63 male, 57 female) are fed into the HMM as the speech recognition feature parameters for training. The HMM uses continuous probability density functions; each HMM has 4 states, and each state is a mixture of 3 Gaussian components.
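The continuous-density model described above (4 states, 3 Gaussian components per state) needs, for every frame and state, the log-likelihood of the feature vector under that state's mixture. A minimal sketch follows. Diagonal covariance matrices are an assumption, since the patent does not state the covariance type, and the function names and the (weights, means, variances) tuple layout are hypothetical.

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Log-density of one feature vector under a diagonal-covariance GMM.

    x: (D,), weights: (M,), means: (M, D), variances: (M, D)
    """
    x = np.asarray(x, dtype=float)
    diff = x - means
    log_norm = -0.5 * np.log(2.0 * np.pi * variances).sum(axis=1)
    log_comp = np.log(weights) + log_norm - 0.5 * (diff ** 2 / variances).sum(axis=1)
    # Log-sum-exp over the mixture components for numerical stability.
    peak = log_comp.max()
    return float(peak + np.log(np.exp(log_comp - peak).sum()))

def emission_log_matrix(features, state_gmms):
    """Rows are frames, columns are states (4 columns for a 4-state HMM)."""
    return np.array([[gmm_log_likelihood(f, *g) for g in state_gmms]
                     for f in features])
```

An array of this shape is what Baum-Welch re-estimation and Viterbi decoding consume as frame-wise emission scores; the mixture parameters themselves would come out of training, which is not shown.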
3. Test with noisy speech
Speech from 51 speakers (31 male, 20 female) at different signal-to-noise ratios is used for testing. The recognition rate obtained with the 4-sample-point and 6-sample-point long-distance differences of the MFCC as feature parameters is found to be 20 to 40 percentage points higher than that obtained with the MFCC parameters themselves and their first-order difference coefficients normally used in the field. The detailed results are shown in Tables 1 to 4.
Table 1. Speech recognition rates of the different feature parameters at different signal-to-noise ratios (Gaussian noise)
Table 2. Speech recognition rates of the different feature parameters at different signal-to-noise ratios (Formocarbam supermarket noise)
Table 3. Speech recognition rates of the different feature parameters at different signal-to-noise ratios (noise in a subway carriage)
Table 4. Speech recognition rates of the different feature parameters at different signal-to-noise ratios (Hunan Road traffic noise)
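The patent does not say how the noisy test speech of step 3 was prepared. One common way to obtain speech at a prescribed signal-to-noise ratio, shown here purely as an assumed procedure with hypothetical function names, is to scale a noise recording so that the ratio of clean-speech power to noise power matches the target, and then to score the recognizer as the percentage of correctly recognized words.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean speech at the requested SNR in dB.

    The noise recording must be at least as long as the clean signal.
    """
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)[:len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Gain g such that p_clean / (g**2 * p_noise) equals 10**(snr_db / 10).
    gain = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise

def recognition_rate(predicted, reference):
    # Percentage of test utterances whose recognized word matches the label.
    predicted = np.asarray(predicted)
    reference = np.asarray(reference)
    return 100.0 * np.mean(predicted == reference)
```

A gain in "percentage points", as reported in the text, is the plain difference between two such rates.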
Claims (3)
1. A robust speech recognition method based on the long-distance difference of Mel frequency cepstral coefficients (MFCC), characterized in that the 4-sample-point and 6-sample-point long-distance differences of the MFCC are used as feature parameters.
2. The method for calculating the 4-sample-point long-distance difference of the MFCC as claimed in claim 1, characterized in that:
$$\Delta_{4}\mathrm{MFCC}(i) = \mathrm{MFCC}(i+2) - \mathrm{MFCC}(i-2),$$
where $\mathrm{MFCC}(i)$ is the MFCC parameter of the $i$-th frame of the speech signal and $\Delta_{4}\mathrm{MFCC}$ is the 4-sample-point long-distance difference of the MFCC.
3. The method for calculating the 6-sample-point long-distance difference of the MFCC as claimed in claim 1, characterized in that:
$$\Delta_{6}\mathrm{MFCC}(i) = \mathrm{MFCC}(i+3) - \mathrm{MFCC}(i-3),$$
where $\mathrm{MFCC}(i)$ is the MFCC parameter of the $i$-th frame of the speech signal and $\Delta_{6}\mathrm{MFCC}$ is the 6-sample-point long-distance difference of the MFCC.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110258884A CN102290048B (en) | 2011-09-05 | 2011-09-05 | Robust voice recognition method based on MFCC (Mel frequency cepstral coefficient) long-distance difference |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102290048A | 2011-12-21 |
CN102290048B | 2012-10-24 |
Family
ID=45336411
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201110258884A | Expired - Fee Related | CN102290048B (en) | 2011-09-05 | 2011-09-05 | Robust voice recognition method based on MFCC (Mel frequency cepstral coefficient) long-distance difference |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102290048B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100893154B1 (en) * | 2008-10-13 | 2009-04-16 | 한국과학기술연구원 | A method and an apparatus for recognizing a gender of an speech signal |
US20090177466A1 (en) * | 2007-12-20 | 2009-07-09 | Kabushiki Kaisha Toshiba | Detection of speech spectral peaks and speech recognition method and system |
CN101546555A (en) * | 2009-04-14 | 2009-09-30 | 清华大学 | Constraint heteroscedasticity linear discriminant analysis method for language identification |
Non-Patent Citations (1)
Title |
---|
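Feng Yun et al. Application of an improved MFCC feature algorithm in speech recognition. Computer Engineering and Science (《计算机工程与科学》), 2009-12-31, vol. 31, no. 12, pp. 146-148 * |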
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105845141A (en) * | 2016-03-23 | 2016-08-10 | 广州势必可赢网络科技有限公司 | Speaker confirmation model, speaker confirmation method and speaker confirmation device based on channel robustness |
CN106373559A (en) * | 2016-09-08 | 2017-02-01 | 河海大学 | Robustness feature extraction method based on logarithmic spectrum noise-to-signal weighting |
CN106373559B (en) * | 2016-09-08 | 2019-12-10 | 河海大学 | Robust feature extraction method based on log-spectrum signal-to-noise ratio weighting |
CN108175436A (en) * | 2017-12-28 | 2018-06-19 | 北京航空航天大学 | A kind of gurgling sound intelligence automatic identifying method |
CN112951245A (en) * | 2021-03-09 | 2021-06-11 | 江苏开放大学(江苏城市职业学院) | Dynamic voiceprint feature extraction method integrated with static component |
CN112951245B (en) * | 2021-03-09 | 2023-06-16 | 江苏开放大学(江苏城市职业学院) | Dynamic voiceprint feature extraction method integrated with static component |
Also Published As
Publication number | Publication date |
---|---|
CN102290048B (en) | 2012-10-24 |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20121024 |