CN111489763A - Adaptive method for speaker recognition in complex environment based on GMM model - Google Patents
- Publication number: CN111489763A (application number CN202010284977.6A)
- Authority: CN (China)
- Prior art keywords: voice, model, speaker recognition, mfcc, gmm
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being the cepstrum
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use, for comparison or discrimination
Abstract
The invention relates to signal processing technology, in particular to an adaptive method for speaker recognition in a complex environment based on a GMM model. The method comprises a GMM-based speaker recognition model construction stage: the voice signal is preprocessed by low-pass filtering, pre-emphasis, windowing and framing, then filtered and denoised by a Gammatone filter, and GMFCC combined characteristic parameters are extracted. It also comprises a speaker recognition and adaptation stage: the voice characteristic parameters of the speaker to be recognized are extracted and the original model is adaptively adjusted, completing speaker recognition. The method overcomes defects such as low speaker recognition accuracy caused by illness or a complex environment, and provides a novel combined characteristic parameter method that analyzes different features jointly and effectively compensates for errors caused by voice changes under different speaker conditions, thereby improving recognition accuracy.
Description
Technical Field
The invention belongs to the technical field of signal processing, and particularly relates to a GMM model-based speaker recognition adaptive method in a complex environment.
Background
Speaker recognition extracts features from collected voice signals of a speaker, analyzes and processes those signals, and then identifies or verifies the speaker. With the rapid development of the internet and information technology, speaker recognition is used in ever more fields. As a leading-edge technology, it is widely applied in smart home, judicial criminal investigation, identity verification, and related areas.
With the progress of speaker recognition research, key technologies mainly develop around the problems of noise elimination, feature extraction, pattern matching and the like.
How to extract the speaker's individual features from the speech signal is the key to voiceprint recognition. A voice signal mixes the characteristics of the uttered speech with the individual characteristics of the speaker. The characteristic parameters extracted from a speaker's speech signal should satisfy certain criteria: robustness to external factors (the speaker's health and emotional state, dialect, imitation by others, and so on), stability over long periods, and ease of extraction from the speech signal.
At the acoustic level, sound feature parameters can be divided into two categories. The first comprises inherent characteristics related to the speaker's physiology, embodied mainly in the spectral structure of the voice: spectral-envelope information reflecting vocal-tract resonance, and fine-structure information reflecting sound-source excitation properties such as vocal-cord vibration. Typical parameters include the pitch period coefficient and the formants; these are not easy to imitate but are easily affected by health. The second category mainly reflects the dynamic characteristics of vocal-tract activity (pronunciation manner, pronunciation habits, and so on), embodied in how the audio structure changes over time; such features, the representative Mel cepstral coefficients among them, are relatively stable but easier to imitate. Objectively weighting and fusing the two categories combines their complementary strengths.
Meanwhile, the captured sound also contains interference from surrounding noise, so effective noise removal is another important factor in achieving high recognition accuracy.
Currently, adaptive techniques are also becoming mature. By the self-adaptive technology, the model parameters can be adjusted according to the speaking characteristics of the testers, and the identification accuracy is improved.
Disclosure of Invention
The invention aims to provide an adaptive method that analyzes different characteristics jointly and effectively compensates for errors caused by voice changes due to illness or noise.
In order to achieve the purpose, the invention adopts the technical scheme that the speaker recognition self-adaptive method under the complex environment based on the GMM model comprises the following steps:
step 1, constructing a speaker recognition model based on GMM;
step 1.1, collecting a certain amount of voice data as training voice data for speaker recognition, and preprocessing the extracted voice data;
step 1.2, extracting a pitch period coefficient of the preprocessed voice signal by a cepstrum method;
step 1.3, extracting MFCC coefficients of the voice signals preprocessed in step 1.1, and filtering the voice signals through a Gammatone filter;
step 1.4, processing the MFCC coefficients to obtain first-order and second-order differences of the MFCC coefficients, and adding pitch period coefficients to obtain a GMFCC combined feature vector;
step 1.5, training a Gaussian mixture model by using the acoustic spectrum characteristics of a part of voice data;
step 2, speaker identification and self-adaptation;
step 2.1, preprocessing the voice to be recognized, extracting pitch period coefficients and MFCC coefficients from the voice data to be recognized, and obtaining GMFCC characteristics of the voice to be recognized after processing;
step 2.2, self-adaptive adjustment of the GMM model is carried out through the maximum posterior probability model;
and 2.3, identifying by using the adjusted model.
In the adaptive method for speaker recognition in a complex environment based on the GMM model, the step 1.1 is implemented by the following specific steps:
step 1.1.1, collecting a certain amount of voice data to make a corpus as training voice data for speaker recognition;
step 1.1.2, performing low-pass filtering on the obtained voice signal, retaining frequencies below 1000 Hz, and simultaneously performing windowing and framing to obtain frame signals;
and 1.1.3, performing least square method de-trend processing on each frame of signal, and eliminating noise in the voice signal by using spectral subtraction.
In the adaptive method for speaker recognition in a complex environment based on the GMM model, the implementation of step 1.2 includes the following specific steps:
step 1.2.1, analyzing the preprocessed signals to obtain a linear prediction model:

$$\hat{x}_i(m)=\sum_{l=1}^{p}a_l\,x_i(m-l)$$

wherein $a_l$ represents the $l$-th LPC coefficient of the $i$-th frame of speech, $x_i(m-l)$ represents the $(m-l)$-th sample of the frame, and $\hat{x}_i(m)$ represents the predicted $m$-th sample;

step 1.2.2, deriving the transfer function of the prediction error:

$$A(z)=1-\sum_{l=1}^{p}a_l z^{-l}$$
step 1.2.3, eliminating the influence of a formant by utilizing a linear prediction method;
step 1.2.4, replacing spike samples in the voice signal with the median of their neighboring samples using a median filtering algorithm, eliminating the influence of spikes in the voice on voice analysis;
and step 1.2.5, detecting the pitch period of the processed voice signal by using a cepstrum method, and calculating a pitch period coefficient.
In the adaptive method for speaker recognition in a complex environment based on the GMM model, the implementation of step 1.3 includes the following specific steps:
step 1.3.1, preprocessing the voice data obtained in the step 1.1, and distributing the voice data according to Mel frequency through a triangular filter bank configured with M band-pass filters;
step 1.3.2, carrying out logarithmic energy processing on the data output by each filter bank in the step 1.3.1;
and step 1.3.3, obtaining the MFCC parameters after the data obtained in the step 1.3.2 is subjected to Discrete Cosine Transform (DCT).
In the adaptive method for speaker recognition in a complex environment based on the GMM model, the implementation of step 1.4 includes the following specific steps:
step 1.4.1, after the MFCC parameters of the voice signals are extracted in step 1.3, extracting the first-order and second-order difference MFCC parameters by the following equations:

$$d_t=\frac{\sum_{\theta=1}^{\Theta}\theta\,(c_{t+\theta}-c_{t-\theta})}{2\sum_{\theta=1}^{\Theta}\theta^{2}},\qquad 1\le t\le T$$

$$S_m=\mathrm{MFCC}+\Delta\mathrm{MFCC}+\Delta\Delta\mathrm{MFCC}$$

wherein $d_t$ is the first-difference cepstral coefficient, $T$ is the cepstral coefficient dimension, $\theta$ is the time offset of the first derivative, taken as 1 or 2, and $c_t$ is the $t$-th cepstral coefficient;

step 1.4.2, taking the pitch period parameter $P$ extracted in the previous step and the obtained MFCC parameter $S_m$ as the feature parameters of the test voice file, and normalizing the two vectors so that $P'$ and $S_m'$ become data between 0 and 1:

$$P'=\frac{P}{\max(P)},\qquad S_m'=\frac{S_m}{\max(S_m)}$$

wherein $P$ represents the pitch period parameter, $\max$ represents its maximum value, and $P'$ and $S_m'$ represent the normalized pitch period parameter and MFCC parameter;

step 1.4.3, calculating the influence degree factors $C_1$ and $C_2$ of the two parameters by the entropy weight method to form the new combined parameter GMFCC:

$$\mathrm{GMFCC}=[\,C_1P',\;C_2S_m'\,]$$
in the adaptive method for speaker recognition in a complex environment based on the GMM model, the implementation of step 1.5 includes the following specific steps:
and step 1.5.1, obtaining the GMM corresponding to each sample by using an EM algorithm, wherein each GMM corresponds to a respective mean value, covariance and weight.
In the adaptive method for speaker recognition in a complex environment based on the GMM model, the step 2.1 is implemented by the following specific steps:
step 2.1.1, preprocessing the voice to be recognized, including low-pass filtering, de-trending, framing, windowing and end point detection;
step 2.1.2, filtering the voice signals by utilizing a Gammatone filter;
and 2.1.3, extracting pitch period coefficients and MFCC coefficients of the voice to be recognized through a cepstrum method, and calculating first-order MFCC and second-order MFCC parameters to form GMFCC combined parameters.
In the adaptive method for speaker recognition in a complex environment based on the GMM model, the step 2.2 is implemented by performing adaptive speaker transformation on the original model according to the parameters of the speech to be recognized by using the maximum posterior probability model to obtain the adaptive model related to the speaker.
In the adaptive method for speaker recognition in a complex environment based on the GMM model, step 2.3 is implemented by calculating the probability value P(Z|a) of the speech to be recognized against each trained model via the GMM formula, wherein Z is the speech data to be recognized and a is one of the trained models; the model with the maximum probability value is selected to label the speech to be recognized as that speaker.
The invention has the beneficial effects that: (1) two kinds of voice parameters are used for recognition; adding the pitch period parameter avoids the drop in recognition rate caused by voice changes due to illness or different emotions, while the MFCC parameters reflect the dynamic characteristics of vocal-tract activity, giving the features a certain stability.
(2) The original voice data are filtered with a Gammatone filter to remove the ambient noise of the complex environment that would otherwise reduce recognition accuracy.
(3) The original GMM model is modified with the maximum a posteriori probability model according to the parameter characteristics of the voice data to be recognized, realizing model adaptation and effectively improving recognition accuracy.
Drawings
FIG. 1 is a general flow chart of one embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
To overcome drawbacks such as reduced speaker recognition accuracy caused by illness or a complex environment, this embodiment provides a novel combined characteristic parameter method that analyzes different features jointly, effectively compensates for errors caused by voice changes due to illness or noise, and improves recognition accuracy.
A speaker recognition adaptive method in a complex environment based on a GMM model comprises the following stages. A GMM-based speaker recognition model construction stage: the voice signals are preprocessed by low-pass filtering, pre-emphasis, windowing and framing, then filtered and denoised by a Gammatone filter, and GMFCC combined characteristic parameters are extracted. A speaker identification and adaptation stage: speaker identification is completed by extracting the voice characteristic parameters of the speaker to be identified and adaptively adjusting the original model.
The construction stage of the GMM-based speaker recognition model specifically comprises the following steps:
and step S1, collecting a certain amount of voice data as training voice data for speaker recognition, and preprocessing the extracted voice data.
In step S2, pitch period coefficients of the preprocessed speech signal are extracted by a cepstrum method.
In step S3, MFCC coefficients are extracted from the preprocessed speech information, and filtering is performed by a Gammatone filter.
Step S4, the MFCC coefficients are processed to obtain the first and second order differences of the MFCC coefficients, and the pitch period coefficients are added to obtain the GMFCC combined feature vector.
And step S5, training a Gaussian mixture model by using the acoustic spectrum characteristics of a part of voice data.
The speaker recognition and adaptation stage specifically comprises the following steps:
step S6, preprocessing the speech to be recognized, extracting pitch period coefficients and MFCC coefficients from the speech data to be recognized, and obtaining GMFCC characteristics of the speech to be recognized after processing.
In step S7, the GMM model is adaptively adjusted by the maximum a posteriori probability model.
In step S8, recognition is performed using the adjusted model.
In specific implementation, as shown in fig. 1, the embodiment is a speaker recognition adaptive method in a complex environment based on a GMM model, and includes 7 functional modules: a data preprocessing module, a Gammatone filtering module, a pitch period parameter extraction module, an MFCC parameter extraction module, a GMFCC combined parameter module, a GMM module, and an adaptive module. The data preprocessing module performs end-point detection, pre-emphasis, framing, and windowing on the original voice data. The Gammatone filtering module filters and denoises the original voice signal, highlighting the speaker's voice. The pitch period parameter extraction module extracts the pitch period coefficients of the original voice as characteristic parameters for later training and recognition. The MFCC parameter extraction module extracts the MFCC parameters and the first-order and second-order MFCC parameters of the voice. The GMFCC combined parameter module processes the pitch period parameters and MFCCs and concatenates them into a high-dimensional combined parameter, GMFCC. The GMM module trains on the extracted characteristic parameters; each speaker's training samples yield a corresponding GMM matching model via the EM algorithm. The adaptive module adjusts the original model parameters according to the acoustic characteristics of a new speaker using the MAP algorithm, realizing adaptation.
The method of the embodiment comprises the following steps: a GMM-based speaker recognition model construction stage and a speaker recognition and self-adaptation stage.
The construction stage of the GMM-based speaker recognition model specifically comprises the following steps:
the step S1 specifically has the following substeps:
step S11, a certain amount of speech data is collected to make a corpus as training speech data for speaker recognition.
And step S12, low-pass filtering is performed on the obtained voice signal, keeping only frequencies below 1000 Hz, and windowing and framing are performed simultaneously to obtain frame signals.
And step S13, a least-squares detrending process is performed on each frame signal, and noise in the speech signal is removed using spectral subtraction.
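The framing, windowing, and least-squares detrending of the two steps above can be sketched as follows; the frame length, hop size, and use of a Hamming window are illustrative assumptions, not values specified by the patent:

```python
import numpy as np

def frame_and_detrend(signal, frame_len=256, hop=128):
    """Sketch of steps S12-S13: split the signal into overlapping frames,
    remove each frame's linear trend via a least-squares fit, and apply a
    Hamming window. Frame length and hop size are illustrative choices."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len].astype(float)
                       for i in range(n_frames)])
    t = np.arange(frame_len)
    window = np.hamming(frame_len)
    for i in range(n_frames):
        # Least-squares linear trend (slope, intercept) removed per frame
        slope, intercept = np.polyfit(t, frames[i], 1)
        frames[i] = (frames[i] - (slope * t + intercept)) * window
    return frames
```

Spectral subtraction would then operate on the magnitude spectra of these frames, subtracting a noise estimate taken from non-speech segments.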
The substeps of step S2 are as follows:
Step S21, analyzing the preprocessed signals to obtain a linear prediction model:

$$\hat{x}_i(m)=\sum_{l=1}^{p}a_l\,x_i(m-l)$$

wherein $a_l$ represents the $l$-th LPC coefficient of the $i$-th frame of speech, $x_i(m-l)$ represents the $(m-l)$-th sample of the frame, and $\hat{x}_i(m)$ represents the predicted $m$-th sample.

Step S22, deriving the transfer function of the prediction error:

$$A(z)=1-\sum_{l=1}^{p}a_l z^{-l}$$
In step S23, the influence of the formants is eliminated by the linear prediction method.
And step S24, spike samples in the voice signal are replaced with the median of their neighboring samples using a median filtering algorithm, eliminating the influence of spikes in the voice on voice analysis.
Step S25, the pitch period of the processed voice signal is detected using the cepstrum method, and the pitch period coefficient is calculated.
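The cepstral pitch detection of step S25 can be sketched as below; the sampling rate and the 60-500 Hz pitch search range are assumptions for illustration (the patent does not fix these values):

```python
import numpy as np

def pitch_period_cepstrum(frame, fs=8000, fmin=60.0, fmax=500.0):
    """Sketch of step S25: the real cepstrum is the inverse FFT of the log
    magnitude spectrum; a voiced frame shows a peak at a quefrency equal to
    the pitch period in samples. fs, fmin, fmax are illustrative."""
    spectrum = np.abs(np.fft.rfft(frame))
    cepstrum = np.fft.irfft(np.log(spectrum + 1e-10))
    # Search only quefrencies corresponding to plausible pitch frequencies
    lo, hi = int(fs / fmax), int(fs / fmin)
    peak = lo + int(np.argmax(cepstrum[lo:hi]))
    return peak / fs  # pitch period in seconds
```

For a harmonic-rich 200 Hz tone at 8 kHz, the peak falls near quefrency 40 samples, i.e. a 5 ms period.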
the substeps of step S3 are as follows:
step S31, obtaining the processed voice data after the preprocessing of step S1, and passing the processed voice data through a triangular filter bank configured with M band-pass filters to distribute the voice data according to Mel frequency so as to satisfy the requirement of auditory habits of human ears.
In step S32, logarithmic energy processing is performed on the data output from each filter bank in step S31.
And step S33, obtaining the MFCC parameters after the data obtained in step S32 is subjected to Discrete Cosine Transform (DCT).
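Steps S31-S33 (mel triangular filterbank, log energy, DCT) can be sketched for a single frame as follows; the filter count, cepstral dimension, and FFT conventions are common choices assumed here, not values given by the patent:

```python
import numpy as np

def mfcc_from_frame(frame, fs=8000, n_filters=26, n_ceps=13):
    """Sketch of steps S31-S33: power spectrum -> mel triangular
    filterbank -> log energies -> DCT-II. Parameter defaults are
    illustrative assumptions."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    power = np.abs(np.fft.rfft(frame)) ** 2
    n_bins = len(power)
    # Step S31: triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), n_filters + 2)
    bin_pts = np.floor((n_bins - 1) * mel_to_hz(mel_pts) / (fs / 2)).astype(int)
    fbank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        l, c, r = bin_pts[i], bin_pts[i + 1], bin_pts[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_e = np.log(fbank @ power + 1e-10)          # step S32: log energy
    # Step S33: DCT-II of the log filterbank energies
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return dct @ log_e
```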
The substeps of step S4 are as follows:
In step S41, after the MFCC parameters of the speech signal are extracted in step S3, the first-order and second-order difference MFCC parameters can be extracted by the following equations:

$$d_t=\frac{\sum_{\theta=1}^{\Theta}\theta\,(c_{t+\theta}-c_{t-\theta})}{2\sum_{\theta=1}^{\Theta}\theta^{2}},\qquad 1\le t\le T$$

$$S_m=\mathrm{MFCC}+\Delta\mathrm{MFCC}+\Delta\Delta\mathrm{MFCC}$$

wherein $d_t$ is the first-difference cepstral coefficient, $T$ is the cepstral coefficient dimension, $\theta$ is the time offset of the first derivative, taken as 1 or 2, and $c_t$ is the $t$-th cepstral coefficient.

Step S42, the pitch period parameter $P$ extracted in the previous step and the obtained MFCC parameter $S_m$ are taken as the feature parameters of the test voice file, and the two vectors are normalized so that $P'$ and $S_m'$ become data between 0 and 1:

$$P'=\frac{P}{\max(P)},\qquad S_m'=\frac{S_m}{\max(S_m)}$$

wherein $P$ represents the pitch period parameter, $\max$ represents its maximum value, and $P'$ and $S_m'$ represent the normalized pitch period parameter and MFCC parameter.

Step S43, the influence degree factors $C_1$ and $C_2$ of the two parameters are calculated by the entropy weight method to form the new combined parameter GMFCC:

$$\mathrm{GMFCC}=[\,C_1P',\;C_2S_m'\,]$$
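The delta computation of step S41 and the normalize-weight-concatenate combination of steps S42-S43 can be sketched as below. The entropy-weight computation here is a simplified stand-in (the patent does not spell out its formulation), and per-frame concatenation of the weighted pitch scalar with the weighted MFCC vector is an assumption about how GMFCC is assembled:

```python
import numpy as np

def delta(ceps, theta=2):
    """First-order difference of cepstral coefficients via the standard
    regression formula (step S41); ceps has shape (T, D)."""
    T = len(ceps)
    padded = np.pad(ceps, ((theta, theta), (0, 0)), mode='edge')
    num = sum(th * (padded[theta + th: theta + th + T] -
                    padded[theta - th: theta - th + T])
              for th in range(1, theta + 1))
    return num / (2 * sum(th ** 2 for th in range(1, theta + 1)))

def gmfcc(pitch, mfcc):
    """Sketch of steps S42-S43: max-normalize both streams, weight them
    with entropy-derived factors, concatenate per frame. The entropy
    weighting below is a compact illustrative approximation."""
    p = pitch / np.max(np.abs(pitch))
    s = mfcc / np.max(np.abs(mfcc))
    def norm_entropy(x):
        q = np.abs(x).ravel() + 1e-10
        q = q / q.sum()
        return -np.sum(q * np.log(q)) / np.log(len(q))
    # Lower-entropy (more discriminative) stream gets the larger factor
    c1, c2 = 1.0 - norm_entropy(p), 1.0 - norm_entropy(s)
    c1, c2 = c1 / (c1 + c2), c2 / (c1 + c2)
    return np.hstack([c1 * p[:, None], c2 * s])
```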
the substeps of step S5 are as follows:
step S51, the mean, covariance and weight of each GMM model corresponding to each sample are obtained by EM algorithm.
The speaker recognition and adaptation stage specifically comprises the following steps:
the substeps of step S6 are as follows:
step S61, pre-process the speech to be recognized, including low-pass filtering, de-trending, framing, windowing, end-point detection, etc.
In step S62, the speech signal is filtered by the Gammatone filter.
Step S63, extracting pitch period coefficient and MFCC coefficient of the speech to be recognized by cepstrum method, and calculating first order MFCC and second order MFCC parameters to form GMFCC combined parameters.
A substep of step S7;
step S71 is to perform adaptive adjustment of the GMM model through the maximum a posteriori probability model, i.e. to perform adaptive transformation of the speaker on the original model according to the parameters of the speech to be recognized by using MAP (maximum a posteriori probability model), so as to obtain the adaptive model related to the speaker.
A substep of step S8;
step S81 identifies the voice to be identified and the probability value P (Z | a) of the original training (Z is the voice data to be identified and a is one of the training data) by GMM formula, and selects the model with the highest probability value, and then labels the voice to be identified as the speaker.
It should be understood that parts of the specification not set forth in detail are well within the prior art.
Although specific embodiments of the present invention have been described above with reference to the accompanying drawings, it will be appreciated by those skilled in the art that these are merely illustrative and that various changes or modifications may be made to these embodiments without departing from the principles and spirit of the invention. The scope of the invention is only limited by the appended claims.
Claims (9)
1. A speaker recognition self-adaptive method in a complex environment based on a GMM model is characterized by comprising the following steps:
step 1, constructing a speaker recognition model based on GMM;
step 1.1, collecting a certain amount of voice data as training voice data for speaker recognition, and preprocessing the extracted voice data;
step 1.2, extracting a pitch period coefficient of the preprocessed voice signal by a cepstrum method;
step 1.3, extracting MFCC coefficients of the voice signals preprocessed in step 1.1, and filtering the voice signals through a Gammatone filter;
step 1.4, processing the MFCC coefficients to obtain first-order and second-order differences of the MFCC coefficients, and adding pitch period coefficients to obtain a GMFCC combined feature vector;
step 1.5, training a Gaussian mixture model by using the acoustic spectrum characteristics of a part of voice data;
step 2, speaker identification and self-adaptation;
step 2.1, preprocessing the voice to be recognized, extracting pitch period coefficients and MFCC coefficients from the voice data to be recognized, and obtaining GMFCC characteristics of the voice to be recognized after processing;
step 2.2, self-adaptive adjustment of the GMM model is carried out through the maximum posterior probability model;
and 2.3, identifying by using the adjusted model.
2. The adaptive method for speaker recognition in a complex environment based on GMM model as claimed in claim 1, wherein the step 1.1 is implemented by the following steps:
step 1.1.1, collecting a certain amount of voice data to make a corpus as training voice data for speaker recognition;
step 1.1.2, performing low-pass filtering on the obtained voice signal, retaining frequencies below 1000 Hz, and simultaneously performing windowing and framing to obtain frame signals;
and 1.1.3, performing least square method de-trend processing on each frame of signal, and eliminating noise in the voice signal by using spectral subtraction.
3. The adaptive method for speaker recognition in a complex environment based on GMM model as claimed in claim 1, wherein the step 1.2 is implemented by the following steps:
step 1.2.1, analyzing the preprocessed signals to obtain a linear prediction model:

$$\hat{x}_i(m)=\sum_{l=1}^{p}a_l\,x_i(m-l)$$

wherein $a_l$ represents the $l$-th LPC coefficient of the $i$-th frame of speech, $x_i(m-l)$ represents the $(m-l)$-th sample of the frame, and $\hat{x}_i(m)$ represents the predicted $m$-th sample;

step 1.2.2, deriving the transfer function of the prediction error:

$$A(z)=1-\sum_{l=1}^{p}a_l z^{-l}$$
step 1.2.3, eliminating the influence of a formant by utilizing a linear prediction method;
step 1.2.4, replacing spike samples in the voice signal with the median of their neighboring samples using a median filtering algorithm, eliminating the influence of spikes in the voice on voice analysis;
and step 1.2.5, detecting the pitch period of the processed voice signal by using a cepstrum method, and calculating a pitch period coefficient.
4. The adaptive method for speaker recognition in a complex environment based on GMM model as claimed in claim 1, wherein the step 1.3 is implemented by the following steps:
step 1.3.1, preprocessing the voice data obtained in the step 1.1, and distributing the voice data according to Mel frequency through a triangular filter bank configured with M band-pass filters;
step 1.3.2, carrying out logarithmic energy processing on the data output by each filter bank in the step 1.3.1;
and step 1.3.3, obtaining the MFCC parameters after the data obtained in the step 1.3.2 is subjected to Discrete Cosine Transform (DCT).
5. The adaptive method for speaker recognition in a complex environment based on GMM model as claimed in claim 1, wherein the step 1.4 is implemented by the following steps:
step 1.4.1, after the MFCC parameters of the voice signals are extracted in step 1.3, extracting the first-order and second-order difference MFCC parameters by the following equations:

$$d_t=\frac{\sum_{\theta=1}^{\Theta}\theta\,(c_{t+\theta}-c_{t-\theta})}{2\sum_{\theta=1}^{\Theta}\theta^{2}},\qquad 1\le t\le T$$

$$S_m=\mathrm{MFCC}+\Delta\mathrm{MFCC}+\Delta\Delta\mathrm{MFCC}$$

wherein $d_t$ is the first-difference cepstral coefficient, $T$ is the cepstral coefficient dimension, $\theta$ is the time offset of the first derivative, taken as 1 or 2, and $c_t$ is the $t$-th cepstral coefficient;

step 1.4.2, taking the pitch period parameter $P$ extracted in the previous step and the obtained MFCC parameter $S_m$ as the feature parameters of the test voice file, and normalizing the two vectors so that $P'$ and $S_m'$ become data between 0 and 1:

$$P'=\frac{P}{\max(P)},\qquad S_m'=\frac{S_m}{\max(S_m)}$$

wherein $P$ represents the pitch period parameter, $\max$ represents its maximum value, and $P'$ and $S_m'$ represent the normalized pitch period parameter and MFCC parameter;

step 1.4.3, calculating the influence degree factors $C_1$ and $C_2$ of the two parameters by the entropy weight method to form the new combined parameter GMFCC:

$$\mathrm{GMFCC}=[\,C_1P',\;C_2S_m'\,]$$
6. the adaptive method for speaker recognition in a complex environment based on GMM model as claimed in claim 1, wherein the step 1.5 is implemented by the following steps:
and step 1.5.1, obtaining the GMM corresponding to each sample by using an EM algorithm, wherein each GMM corresponds to a respective mean value, covariance and weight.
7. The adaptive method for speaker recognition in a complex environment based on GMM model as claimed in claim 1, wherein the step 2.1 is implemented by the following steps:
step 2.1.1, preprocessing the voice to be recognized, including low-pass filtering, de-trending, framing, windowing and end point detection;
step 2.1.2, filtering the voice signals by utilizing a Gammatone filter;
and 2.1.3, extracting pitch period coefficients and MFCC coefficients of the voice to be recognized through a cepstrum method, and calculating first-order MFCC and second-order MFCC parameters to form GMFCC combined parameters.
8. The adaptive method for speaker recognition in a complex environment based on GMM model as claimed in claim 1, wherein the step 2.2 is implemented by using a maximum a posteriori probability model to perform speaker adaptive transformation on the original model according to the parameters of the speech to be recognized, so as to obtain the speaker dependent adaptive model.
9. The adaptive method for speaker recognition in a complex environment based on the GMM model as claimed in claim 1, wherein the step 2.3 is implemented by calculating, through the GMM formula, the probability values P(Z|A) of the speech to be recognized against each of the original training models, where Z is the speech data to be recognized and A is a model in the training data, and selecting the model with the highest probability value to label the speech to be recognized with that speaker.
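The scoring rule of claim 9, choosing the model A that maximizes P(Z|A), can be sketched as an average log-likelihood comparison over all enrolled models:

```python
import numpy as np

def gmm_log_likelihood(X, w, means, var):
    """Average per-frame log P(Z|A) of frames X under a diagonal GMM A."""
    X = np.asarray(X, dtype=float)
    logp = (-0.5 * (((X[:, None, :] - means) ** 2) / var
                    + np.log(2 * np.pi * var)).sum(axis=2) + np.log(w))
    m = logp.max(axis=1)
    # log-sum-exp over components, averaged over frames
    return float(np.mean(m + np.log(np.exp(logp - m[:, None]).sum(axis=1))))

def identify_speaker(X, models):
    """models: dict label -> (weights, means, variances).
    Returns the label whose model scores X highest."""
    return max(models, key=lambda a: gmm_log_likelihood(X, *models[a]))
```

Log-likelihoods are compared instead of raw probabilities for numerical stability; the argmax is unchanged because the logarithm is monotone.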
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010284977.6A CN111489763B (en) | 2020-04-13 | 2020-04-13 | GMM model-based speaker recognition self-adaption method in complex environment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010284977.6A CN111489763B (en) | 2020-04-13 | 2020-04-13 | GMM model-based speaker recognition self-adaption method in complex environment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111489763A true CN111489763A (en) | 2020-08-04 |
CN111489763B CN111489763B (en) | 2023-06-20 |
Family
ID=71812744
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010284977.6A Active CN111489763B (en) | 2020-04-13 | 2020-04-13 | GMM model-based speaker recognition self-adaption method in complex environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111489763B (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102324232A (en) * | 2011-09-12 | 2012-01-18 | 辽宁工业大学 | Voiceprint recognition method and system based on Gaussian mixture models |
JP2016143043A (en) * | 2015-02-05 | 2016-08-08 | 日本電信電話株式会社 | Speech model learning method, noise suppression method, speech model learning system, noise suppression system, speech model learning program, and noise suppression program |
CN104900235A (en) * | 2015-05-25 | 2015-09-09 | 重庆大学 | Voiceprint recognition method based on pitch period mixed characteristic parameters |
CN105679312A (en) * | 2016-03-04 | 2016-06-15 | 重庆邮电大学 | Phonetic feature processing method of voiceprint identification in noise environment |
CN106782500A (en) * | 2016-12-23 | 2017-05-31 | 电子科技大学 | Fusion feature parameter extraction method based on pitch period and MFCC |
CN107369440A (en) * | 2017-08-02 | 2017-11-21 | 北京灵伴未来科技有限公司 | Training method and device of a speaker recognition model for short speech |
CN110400565A (en) * | 2019-08-20 | 2019-11-01 | 广州国音智能科技有限公司 | Speaker recognition method, system and computer-readable storage medium |
Non-Patent Citations (1)
Title |
---|
叶寒生 et al.: "Speaker Recognition Based on Feature Information Fusion in a Noisy Environment" (噪声环境下基于特征信息融合的说话人识别), Computer Simulation (《计算机仿真》) * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112951245A (en) * | 2021-03-09 | 2021-06-11 | 江苏开放大学(江苏城市职业学院) | Dynamic voiceprint feature extraction method integrated with static component |
WO2022205249A1 (en) * | 2021-03-31 | 2022-10-06 | 华为技术有限公司 | Audio feature compensation method, audio recognition method, and related product |
CN113567969A (en) * | 2021-09-23 | 2021-10-29 | 江苏禹治流域管理技术研究院有限公司 | Illegal sand dredger automatic monitoring method and system based on underwater acoustic signals |
CN113567969B (en) * | 2021-09-23 | 2021-12-17 | 江苏禹治流域管理技术研究院有限公司 | Illegal sand dredger automatic monitoring method and system based on underwater acoustic signals |
Also Published As
Publication number | Publication date |
---|---|
CN111489763B (en) | 2023-06-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105513605B (en) | Speech enhancement system and method for a mobile microphone | |
CN106486131B (en) | Speech denoising method and device | |
Cai et al. | Sensor network for the monitoring of ecosystem: Bird species recognition | |
CN105023573B (en) | Speech syllable/vowel/phone boundary detection using auditory attention cues | |
JP4802135B2 (en) | Speaker authentication registration and confirmation method and apparatus | |
Liu et al. | Bone-conducted speech enhancement using deep denoising autoencoder | |
CN111489763B (en) | GMM model-based speaker recognition self-adaption method in complex environment | |
CN102968990B (en) | Speaker identifying method and system | |
CN111816218A (en) | Voice endpoint detection method, device, equipment and storage medium | |
CN108922541A (en) | Multidimensional feature parameter voiceprint recognition method based on DTW and GMM models | |
CN113012720B (en) | Depression detection method by multi-voice feature fusion under spectral subtraction noise reduction | |
Hui et al. | Convolutional maxout neural networks for speech separation | |
Liang et al. | Real-time speech enhancement algorithm based on attention LSTM | |
CN112397074A (en) | Voiceprint recognition method based on MFCC (Mel frequency cepstrum coefficient) and vector element learning | |
Chiu et al. | Learning-based auditory encoding for robust speech recognition | |
CN112466276A (en) | Speech synthesis system training method and device and readable storage medium | |
CN110415707B (en) | Speaker recognition method based on voice feature fusion and GMM | |
CN111524520A (en) | Voiceprint recognition method based on error reverse propagation neural network | |
Kaminski et al. | Automatic speaker recognition using a unique personal feature vector and Gaussian Mixture Models | |
CN111862991A (en) | Method and system for identifying baby crying | |
Cai et al. | The best input feature when using convolutional neural network for cough recognition | |
Wang et al. | Robust Text-independent Speaker Identification in a Time-varying Noisy Environment. | |
Tzudir et al. | Low-resource dialect identification in Ao using noise robust mean Hilbert envelope coefficients | |
CN114302301A (en) | Frequency response correction method and related product | |
CN107993666B (en) | Speech recognition method, speech recognition device, computer equipment and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||