CN111489763A - Adaptive method for speaker recognition in complex environment based on GMM model - Google Patents

Adaptive method for speaker recognition in complex environment based on GMM model

Info

Publication number
CN111489763A
CN111489763A (application CN202010284977.6A)
Authority
CN
China
Prior art keywords
voice
model
speaker recognition
mfcc
gmm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010284977.6A
Other languages
Chinese (zh)
Other versions
CN111489763B (en)
Inventor
郭雨欣
宋雨佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202010284977.6A priority Critical patent/CN111489763B/en
Publication of CN111489763A publication Critical patent/CN111489763A/en
Application granted granted Critical
Publication of CN111489763B publication Critical patent/CN111489763B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use, for comparison or discrimination

Abstract

The invention relates to signal processing technology, in particular to an adaptive method for speaker recognition in a complex environment based on a GMM model. The method comprises a GMM-based speaker recognition model construction stage: after the voice signal is preprocessed by low-pass filtering, pre-emphasis, windowing, framing and the like, it is filtered and denoised through a Gammatone filter, and GMFCC combined characteristic parameters are extracted. It also comprises a speaker recognition and adaptation stage: speaker recognition is completed by extracting the voice characteristic parameters of the speaker to be recognized and adaptively adjusting the original model. The method overcomes defects such as low speaker recognition accuracy caused by illness or a complex environment, and provides a novel combined-characteristic-parameter method that can jointly analyze different characteristics and effectively compensate for errors caused by voice changes under differing speaker conditions, thereby improving recognition accuracy.

Description

Adaptive method for speaker recognition in complex environment based on GMM model
Technical Field
The invention belongs to the technical field of signal processing, and particularly relates to a GMM model-based speaker recognition adaptive method in a complex environment.
Background
Speaker recognition is a method of extracting features from collected voice signals of a speaker, analyzing and processing those signals, and then identifying or verifying the speaker. With the rapid development of the internet and information technology, speaker recognition is used in more and more related fields. As a leading-edge technology, it is widely applied in smart home, judicial criminal investigation, identity verification and other fields.
With the progress of speaker recognition research, key technologies mainly develop around the problems of noise elimination, feature extraction, pattern matching and the like.
How to extract the individual features of the speaker from the speech signal is the key to voiceprint recognition. A voice signal is a mixture of the characteristics of the uttered speech and the personal characteristics of the speaker. The characteristic parameters extracted from the speaker's speech signal should satisfy certain criteria: they should be robust to external factors (the speaker's health condition and emotion, dialect, imitation by others, and the like), remain stable over a long period, and be easy to extract from the speech signal.
From the acoustic level, sound feature parameters can be roughly divided into two categories. The first comprises inherent characteristics related to the speaker's physiological structure, mainly embodied in the spectral structure of the voice, including spectral envelope information that reflects vocal tract resonance and fine structural information that reflects sound-source excitation properties such as vocal cord vibration; typical parameters are the pitch period coefficient and the formants, which are not easy to imitate but are easily affected by health condition. The second mainly reflects the dynamic characteristics of vocal tract activity, namely pronunciation manner and pronunciation habits, embodied in how the audio structure changes over time; these features, such as the representative Mel-frequency cepstral coefficients, are relatively stable but easier to imitate. If the two categories are objectively weighted and fused, their complementary strengths can be combined.
Meanwhile, the captured sound is also subject to interference from surrounding noise, so effective noise removal is another important factor in achieving high speaker recognition accuracy.
Currently, adaptive techniques are also maturing. With adaptation, the model parameters can be adjusted according to the speech characteristics of the test speakers, improving recognition accuracy.
Disclosure of Invention
The invention aims to provide an adaptive method that jointly analyzes different characteristics and effectively compensates for errors caused by voice changes due to illness or noise.
In order to achieve the purpose, the invention adopts the technical scheme that the speaker recognition self-adaptive method under the complex environment based on the GMM model comprises the following steps:
step 1, constructing a speaker recognition model based on GMM;
step 1.1, collecting a certain amount of voice data as training voice data for speaker recognition, and preprocessing the extracted voice data;
step 1.2, extracting a pitch period coefficient of the preprocessed voice signal by a cepstrum method;
step 1.3, extracting MFCC coefficients of the voice signals preprocessed in step 1.1, and filtering the voice signals through a Gammatone filter;
step 1.4, processing the MFCC coefficients to obtain first-order and second-order differences of the MFCC coefficients, and adding pitch period coefficients to obtain a GMFCC combined feature vector;
step 1.5, training a Gaussian mixture model by using the acoustic spectrum characteristics of a part of voice data;
step 2, speaker identification and self-adaptation;
step 2.1, preprocessing the voice to be recognized, extracting pitch period coefficients and MFCC coefficients from the voice data to be recognized, and obtaining GMFCC characteristics of the voice to be recognized after processing;
step 2.2, self-adaptive adjustment of the GMM model is carried out through the maximum posterior probability model;
and 2.3, identifying by using the adjusted model.
In the adaptive method for speaker recognition in a complex environment based on the GMM model, the step 1.1 is implemented by the following specific steps:
step 1.1.1, collecting a certain amount of voice data to make a corpus as training voice data for speaker recognition;
step 1.1.2, carrying out low-pass filtering on the obtained voice signal, retaining frequencies below 1000 Hz, and simultaneously carrying out windowing and framing to obtain frame signals;
and 1.1.3, performing least square method de-trend processing on each frame of signal, and eliminating noise in the voice signal by using spectral subtraction.
In the adaptive method for speaker recognition in a complex environment based on the GMM model, the implementation of step 1.2 includes the following specific steps:
step 1.2.1, analyzing the preprocessed signals to obtain a linear prediction model:
$$\hat{x}_i(m) = \sum_{l=1}^{p} a_l^{(i)}\, x_i(m-l)$$
wherein a_l^{(i)} denotes the l-th LPC coefficient of the i-th frame of speech, x_i(m-l) denotes the (m-l)-th sample of the frame, and x̂_i(m) denotes the predicted m-th sample;
step 1.2.2, deducing a transfer function of a prediction error:
$$A_i(z) = 1 - \sum_{l=1}^{p} a_l^{(i)} z^{-l}$$
wherein a_l^{(i)} denotes the l-th LPC coefficient of the i-th frame of speech;
step 1.2.3, eliminating the influence of a formant by utilizing a linear prediction method;
step 1.2.4, replacing spike ('burr') values in the voice signal with the median of the neighboring points using a median filtering algorithm, eliminating the influence of spikes in the voice on voice analysis;
and step 1.2.5, detecting the pitch period of the processed voice signal by using a cepstrum method, and calculating a pitch period coefficient.
In the adaptive method for speaker recognition in a complex environment based on the GMM model, the implementation of step 1.3 includes the following specific steps:
step 1.3.1, preprocessing the voice data obtained in the step 1.1, and distributing the voice data according to Mel frequency through a triangular filter bank configured with M band-pass filters;
step 1.3.2, carrying out logarithmic energy processing on the data output by each filter bank in the step 1.3.1;
and step 1.3.3, obtaining the MFCC parameters after the data obtained in the step 1.3.2 is subjected to Discrete Cosine Transform (DCT).
In the adaptive method for speaker recognition in a complex environment based on the GMM model, the implementation of step 1.4 includes the following specific steps:
step 1.4.1, after the MFCC parameters of the voice signals are extracted in the step 1.3, extracting first-order MFCC and second-order MFCC parameters by using the following equations;
$$d_t = \frac{\sum_{\theta=1}^{\Theta} \theta\,(c_{t+\theta} - c_{t-\theta})}{2 \sum_{\theta=1}^{\Theta} \theta^{2}}$$
$$S_m = \mathrm{MFCC} + \Delta \mathrm{MFCC} + \Delta\Delta \mathrm{MFCC}$$
wherein d_t is the first-order difference cepstral coefficient at frame t, T is the cepstral coefficient dimension (t = 1, ..., T), Θ is the time span of the first derivative, taken as 1 or 2, and c_t is the t-th cepstral coefficient;
step 1.4.2, the pitch period parameter T̂ extracted in the previous step and the MFCC parameter S_m obtained above, which are used in computing the posterior probability value of the test voice file, are normalized so that T̂' and S_m' become data between 0 and 1:
$$\hat{T}' = \hat{T} / \max(\hat{T}), \qquad S_m' = S_m / \max(S_m)$$
wherein T̂ denotes the pitch period parameter, max denotes the maximum value of the corresponding vector, and T̂' and S_m' denote the normalized pitch period parameter and MFCC parameter;
step 1.4.3, calculating influence degree factors C1 and C2 of the two parameters by using an entropy weight method to form a new combined parameter GMFCC:
$$\mathrm{GMFCC} = [\,C_1 \hat{T}',\; C_2 S_m'\,]$$
in the adaptive method for speaker recognition in a complex environment based on the GMM model, the implementation of step 1.5 includes the following specific steps:
and step 1.5.1, obtaining the GMM corresponding to each sample by using an EM algorithm, wherein each GMM corresponds to a respective mean value, covariance and weight.
In the adaptive method for speaker recognition in a complex environment based on the GMM model, the step 2.1 is implemented by the following specific steps:
step 2.1.1, preprocessing the voice to be recognized, including low-pass filtering, de-trending, framing, windowing and end point detection;
step 2.1.2, filtering the voice signals using a Gammatone filter;
and 2.1.3, extracting pitch period coefficients and MFCC coefficients of the voice to be recognized through a cepstrum method, and calculating first-order MFCC and second-order MFCC parameters to form GMFCC combined parameters.
In the adaptive method for speaker recognition in a complex environment based on the GMM model, the step 2.2 is implemented by performing adaptive speaker transformation on the original model according to the parameters of the speech to be recognized by using the maximum posterior probability model to obtain the adaptive model related to the speaker.
In the adaptive method for speaker recognition in a complex environment based on the GMM model, the step 2.3 is implemented by calculating, through the GMM formula, the probability value P(Z|a) of the speech to be recognized under each trained model, wherein Z is the speech data to be recognized and a is one of the trained speaker models; the model with the maximum probability value is selected and the speech to be recognized is labeled as that speaker.
The invention has the beneficial effects that: (1) two types of voice parameters are used for recognition; adding the pitch period parameter avoids the drop in recognition rate caused by voice changes due to illness or differing emotional states, while the MFCC parameters reflect the dynamic characteristics of vocal tract activity and provide a degree of stability.
(2) The original voice data are filtered with a Gammatone filter, removing the ambient noise of the complex environment that would otherwise reduce recognition accuracy.
(3) The original GMM is modified with the maximum a posteriori probability model according to the parameter characteristics of the voice data to be recognized, realizing model adaptation and effectively improving recognition accuracy.
Drawings
FIG. 1 is a general flow chart of one embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
To overcome defects such as reduced speaker recognition accuracy caused by illness or a complex environment, this embodiment provides a novel combined-characteristic-parameter method that can jointly analyze different characteristics, effectively compensate for errors caused by voice changes due to illness or noise, and improve recognition accuracy.
A speaker recognition adaptive method in a complex environment based on a GMM model comprises the following stages. In the GMM-based speaker recognition model construction stage, after the voice signals are preprocessed by low-pass filtering, pre-emphasis, windowing, framing and the like, they are filtered and denoised through a Gammatone filter, and GMFCC combined characteristic parameters are extracted. In the speaker identification and adaptation stage, speaker identification is completed by extracting the voice characteristic parameters of the speaker to be identified and adaptively adjusting the original model.
The construction stage of the GMM-based speaker recognition model specifically comprises the following steps:
and step S1, collecting a certain amount of voice data as training voice data for speaker recognition, and preprocessing the extracted voice data.
In step S2, pitch period coefficients of the preprocessed speech signal are extracted by a cepstrum method.
In step S3, MFCC coefficients are extracted from the preprocessed speech information, and filtering is performed by a Gammatone filter.
Step S4, the MFCC coefficients are processed to obtain the first and second order differences of the MFCC coefficients, and the pitch period coefficients are added to obtain the GMFCC combined feature vector.
And step S5, training a Gaussian mixture model by using the acoustic spectrum characteristics of a part of voice data.
The speaker recognition and adaptation stage specifically comprises the following steps:
step S6, preprocessing the speech to be recognized, extracting pitch period coefficients and MFCC coefficients from the speech data to be recognized, and obtaining GMFCC characteristics of the speech to be recognized after processing.
In step S7, the GMM model is adaptively adjusted by the maximum a posteriori probability model.
In step S8, recognition is performed using the adjusted model.
In specific implementation, as shown in fig. 1, the embodiment is a speaker recognition adaptive method in a complex environment based on a GMM model, and includes 7 functional modules: a data preprocessing module, a Gammatone filtering module, a pitch period parameter extraction module, an MFCC parameter extraction module, a GMFCC combined parameter module, a GMM module and an adaptation module. The data preprocessing module performs endpoint detection, pre-emphasis, framing and windowing on the original voice data using signal processing. The Gammatone filtering module filters and denoises the original voice signal, highlighting the speaker's voice. The pitch period parameter extraction module extracts the pitch period coefficient of the original voice, which is used as a voice characteristic parameter for later training and recognition. The MFCC parameter extraction module extracts the MFCC, first-order MFCC and second-order MFCC parameters of the voice. The GMFCC combined parameter module processes the pitch period parameters and MFCCs and concatenates them into a high-dimensional combined parameter, the GMFCC. The GMM module trains on the extracted characteristic parameters; the training samples of each speaker yield a corresponding GMM matching model through the EM algorithm. The adaptation module adjusts the original model parameters according to the acoustic characteristics of a new speaker using the MAP algorithm to realize adaptation.
The method of the embodiment comprises the following steps: a GMM-based speaker recognition model construction stage and a speaker recognition and self-adaptation stage.
The construction stage of the GMM-based speaker recognition model specifically comprises the following steps:
the step S1 specifically has the following substeps:
step S11, a certain amount of speech data is collected to make a corpus as training speech data for speaker recognition.
And step S12, performing low-pass filtering on the obtained voice signal, keeping only frequencies below 1000 Hz, and simultaneously performing windowing and framing to obtain frame signals.
In step S13, a least-squares de-trending process is performed on each frame of signal, and noise in the speech signal is removed using spectral subtraction, as illustrated in the sketch below.
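For illustration only (not part of the patent text), the following is a minimal Python sketch of the preprocessing chain in steps S11-S13; the filter order, 25 ms/10 ms framing, and Hamming window are assumed values, since the patent does not specify them.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, detrend

def preprocess(signal, sr, frame_len=0.025, frame_shift=0.010):
    # S12: low-pass filter, keeping only content below 1000 Hz.
    sos = butter(8, 1000, btype="low", fs=sr, output="sos")
    filtered = sosfiltfilt(sos, signal)

    # S12: split into overlapping frames and apply a Hamming window
    # (the window type is an assumption).
    n, step = int(frame_len * sr), int(frame_shift * sr)
    frames = np.stack([filtered[i:i + n]
                       for i in range(0, len(filtered) - n + 1, step)])
    frames = frames * np.hamming(n)

    # S13: least-squares (linear) de-trending of each frame.
    return detrend(frames, axis=1, type="linear")

def spectral_subtract(frames, noise_mag):
    # S13: spectral subtraction - subtract an estimated noise magnitude
    # spectrum (e.g. averaged over non-speech frames) and floor at zero.
    spec = np.fft.rfft(frames, axis=1)
    mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
    return np.fft.irfft(mag * np.exp(1j * np.angle(spec)), axis=1)
```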
The substeps of step S2 are as follows:
step S21, analyzing the preprocessed signals to obtain a linear prediction model thereof:
$$\hat{x}_i(m) = \sum_{l=1}^{p} a_l^{(i)}\, x_i(m-l)$$
wherein a_l^{(i)} denotes the l-th LPC coefficient of the i-th frame of speech, x_i(m-l) denotes the (m-l)-th sample of the frame, and x̂_i(m) denotes the predicted m-th sample.
Step S22, deriving a transfer function of the prediction error:
$$A_i(z) = 1 - \sum_{l=1}^{p} a_l^{(i)} z^{-l}$$
wherein a_l^{(i)} denotes the l-th LPC coefficient of the i-th frame of speech.
In step S23, the influence of the formants is eliminated by the linear prediction method.
And step S24, replacing spike ('burr') values in the voice signal with the median of the neighboring points using a median filtering algorithm, eliminating the influence of spikes in the voice on voice analysis.
Step S25, the pitch period of the processed voice signal is detected using the cepstrum method and the pitch period coefficient is calculated, as in the sketch below.
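As a hedged illustration of steps S24-S25 (the patent gives no parameter values), the sketch below median-filters a frame and reads the pitch period off the dominant cepstral peak; the 50-400 Hz search band and the kernel size are assumptions.

```python
import numpy as np
from scipy.signal import medfilt

def pitch_period(frame, sr, fmin=50, fmax=400):
    # S24: median filtering to suppress spike ("burr") points.
    frame = medfilt(frame, kernel_size=5)
    # Real cepstrum: inverse FFT of the log magnitude spectrum.
    spec = np.abs(np.fft.rfft(frame)) + 1e-12
    ceps = np.fft.irfft(np.log(spec))
    # S25: the pitch period appears as a peak in the quefrency range of
    # plausible fundamentals; the frame must exceed sr/fmin samples.
    qmin, qmax = int(sr / fmax), int(sr / fmin)
    return (qmin + np.argmax(ceps[qmin:qmax])) / sr  # period in seconds
```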
the substeps of step S3 are as follows:
step S31, obtaining the processed voice data after the preprocessing of step S1, and passing the processed voice data through a triangular filter bank configured with M band-pass filters to distribute the voice data according to Mel frequency so as to satisfy the requirement of auditory habits of human ears.
In step S32, logarithmic energy processing is performed on the data output from each filter bank in step S31.
And step S33, the MFCC parameters are obtained after the data from step S32 undergo the Discrete Cosine Transform (DCT); a sketch of steps S31-S33 follows.
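A sketch of steps S31-S33 using librosa (the library choice, M = 26 mel filters, and 13 coefficients are all assumptions); the Gammatone denoising that the patent applies around this stage is omitted here.

```python
import librosa

def extract_mfcc(signal, sr, n_mels=26, n_mfcc=13):
    # S31: energies of a mel-spaced triangular band-pass filter bank.
    mel = librosa.feature.melspectrogram(y=signal, sr=sr, n_mels=n_mels)
    # S32: log energy; S33: DCT of the log energies yields the MFCCs.
    return librosa.feature.mfcc(S=librosa.power_to_db(mel), n_mfcc=n_mfcc)
```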
The substeps of step S4 are as follows:
in step S41, after the MFCC parameters of the speech signal are extracted in step S3, the first-order MFCC and the second-order MFCC parameters can be extracted by the following equations.
$$d_t = \frac{\sum_{\theta=1}^{\Theta} \theta\,(c_{t+\theta} - c_{t-\theta})}{2 \sum_{\theta=1}^{\Theta} \theta^{2}}$$
$$S_m = \mathrm{MFCC} + \Delta \mathrm{MFCC} + \Delta\Delta \mathrm{MFCC}$$
wherein d_t is the first-order difference cepstral coefficient at frame t, T is the cepstral coefficient dimension (t = 1, ..., T), Θ is the time span of the first derivative, taken as 1 or 2, and c_t is the t-th cepstral coefficient.
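To make the difference formula concrete, here is a sketch of the Δ operator with Θ = 2 (the edge padding is an assumption); ΔΔMFCC is obtained by applying the same operator to the ΔMFCC sequence, and reading the '+' in S_m as concatenation gives the combined static-plus-dynamic vector.

```python
import numpy as np

def delta(ceps, big_theta=2):
    """ceps: (n_frames, n_coeffs) MFCC matrix; returns same-shape deltas."""
    padded = np.pad(ceps, ((big_theta, big_theta), (0, 0)), mode="edge")
    num = sum(th * (padded[big_theta + th: big_theta + th + len(ceps)]
                    - padded[big_theta - th: big_theta - th + len(ceps)])
              for th in range(1, big_theta + 1))
    return num / (2 * sum(th ** 2 for th in range(1, big_theta + 1)))

# With frames as rows (transpose librosa's output if needed):
# S_m = np.hstack([mfcc, delta(mfcc), delta(delta(mfcc))])
```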
Step S42, the pitch period parameter T̂ extracted in the previous step and the MFCC parameter S_m obtained above, which are used in computing the posterior probability value of the test voice file, are normalized so that T̂' and S_m' become data between 0 and 1:
$$\hat{T}' = \hat{T} / \max(\hat{T}), \qquad S_m' = S_m / \max(S_m)$$
wherein T̂ denotes the pitch period parameter, max denotes the maximum value of the corresponding vector, and T̂' and S_m' denote the normalized pitch period parameter and MFCC parameters.
Step S43, calculating the influence degree factors C1 and C2 of the two parameters by using an entropy weight method, and forming a new combination parameter GMFCC:
$$\mathrm{GMFCC} = [\,C_1 \hat{T}',\; C_2 S_m'\,]$$
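The patent does not spell out its entropy-weight computation; the sketch below follows the common formulation as one plausible reading, producing influence factors C1 and C2 that sum to one.

```python
import numpy as np

def entropy_weights(features):
    """features: list of 1-D non-negative arrays already scaled to [0, 1]."""
    entropies = []
    for f in features:
        p = f / (f.sum() + 1e-12)                 # each sample's proportion
        h = -(p * np.log(p + 1e-12)).sum() / np.log(len(f))  # entropy in [0, 1]
        entropies.append(h)
    d = 1.0 - np.array(entropies)                 # degree of divergence
    return d / d.sum()                            # weights summing to one

# Hypothetical usage with the normalized parameters from step S42:
# C1, C2 = entropy_weights([pitch_norm, mfcc_norm.ravel()])
# gmfcc = np.concatenate([C1 * pitch_norm, C2 * mfcc_norm.ravel()])
```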
the substeps of step S5 are as follows:
step S51, the mean, covariance and weight of each GMM model corresponding to each sample are obtained by EM algorithm.
The speaker recognition and adaptation stage specifically comprises the following steps:
the substeps of step S6 are as follows:
step S61, pre-process the speech to be recognized, including low-pass filtering, de-trending, framing, windowing, end-point detection, etc.
In step S62, the speech signal is filtered by the Gammatone filter.
Step S63, extracting pitch period coefficient and MFCC coefficient of the speech to be recognized by cepstrum method, and calculating first order MFCC and second order MFCC parameters to form GMFCC combined parameters.
The substep of step S7 is as follows:
step S71 is to perform adaptive adjustment of the GMM model through the maximum a posteriori probability model, i.e. to perform adaptive transformation of the speaker on the original model according to the parameters of the speech to be recognized by using MAP (maximum a posteriori probability model), so as to obtain the adaptive model related to the speaker.
The substep of step S8 is as follows:
step S81 identifies the voice to be identified and the probability value P (Z | a) of the original training (Z is the voice data to be identified and a is one of the training data) by GMM formula, and selects the model with the highest probability value, and then labels the voice to be identified as the speaker.
It should be understood that parts of the specification not set forth in detail are well within the prior art.
Although specific embodiments of the present invention have been described above with reference to the accompanying drawings, it will be appreciated by those skilled in the art that these are merely illustrative and that various changes or modifications may be made to these embodiments without departing from the principles and spirit of the invention. The scope of the invention is only limited by the appended claims.

Claims (9)

1. A speaker recognition self-adaptive method in a complex environment based on a GMM model is characterized by comprising the following steps:
step 1, constructing a speaker recognition model based on GMM;
step 1.1, collecting a certain amount of voice data as training voice data for speaker recognition, and preprocessing the extracted voice data;
step 1.2, extracting a pitch period coefficient of the preprocessed voice signal by a cepstrum method;
step 1.3, extracting MFCC coefficients of the voice signals preprocessed in step 1.1, and filtering the voice signals through a Gammatone filter;
step 1.4, processing the MFCC coefficients to obtain first-order and second-order differences of the MFCC coefficients, and adding pitch period coefficients to obtain a GMFCC combined feature vector;
step 1.5, training a Gaussian mixture model by using the acoustic spectrum characteristics of a part of voice data;
step 2, speaker identification and self-adaptation;
step 2.1, preprocessing the voice to be recognized, extracting pitch period coefficients and MFCC coefficients from the voice data to be recognized, and obtaining GMFCC characteristics of the voice to be recognized after processing;
step 2.2, self-adaptive adjustment of the GMM model is carried out through the maximum posterior probability model;
and 2.3, identifying by using the adjusted model.
2. The adaptive method for speaker recognition in a complex environment based on GMM model as claimed in claim 1, wherein the step 1.1 is implemented by the following steps:
step 1.1.1, collecting a certain amount of voice data to make a corpus as training voice data for speaker recognition;
step 1.1.2, carrying out low-pass filtering on the obtained voice signal, retaining frequencies below 1000 Hz, and simultaneously carrying out windowing and framing to obtain frame signals;
and 1.1.3, performing least square method de-trend processing on each frame of signal, and eliminating noise in the voice signal by using spectral subtraction.
3. The adaptive method for speaker recognition in a complex environment based on GMM model as claimed in claim 1, wherein the step 1.2 is implemented by the following steps:
step 1.2.1, analyzing the preprocessed signals to obtain a linear prediction model:
$$\hat{x}_i(m) = \sum_{l=1}^{p} a_l^{(i)}\, x_i(m-l)$$
wherein a_l^{(i)} denotes the l-th LPC coefficient of the i-th frame of speech, x_i(m-l) denotes the (m-l)-th sample of the frame, and x̂_i(m) denotes the predicted m-th sample;
step 1.2.2, deducing a transfer function of a prediction error:
$$A_i(z) = 1 - \sum_{l=1}^{p} a_l^{(i)} z^{-l}$$
wherein a_l^{(i)} denotes the l-th LPC coefficient of the i-th frame of speech;
step 1.2.3, eliminating the influence of a formant by utilizing a linear prediction method;
step 1.2.4, replacing spike ('burr') values in the voice signal with the median of the neighboring points using a median filtering algorithm, eliminating the influence of spikes in the voice on voice analysis;
and step 1.2.5, detecting the pitch period of the processed voice signal by using a cepstrum method, and calculating a pitch period coefficient.
4. The adaptive method for speaker recognition in a complex environment based on GMM model as claimed in claim 1, wherein the step 1.3 is implemented by the following steps:
step 1.3.1, preprocessing the voice data obtained in the step 1.1, and distributing the voice data according to Mel frequency through a triangular filter bank configured with M band-pass filters;
step 1.3.2, carrying out logarithmic energy processing on the data output by each filter bank in the step 1.3.1;
and step 1.3.3, obtaining the MFCC parameters after the data obtained in the step 1.3.2 is subjected to Discrete Cosine Transform (DCT).
5. The adaptive method for speaker recognition in a complex environment based on GMM model as claimed in claim 1, wherein the step 1.4 is implemented by the following steps:
step 1.4.1, after the MFCC parameters of the voice signals are extracted in the step 1.3, extracting first-order MFCC and second-order MFCC parameters by using the following equations;
$$d_t = \frac{\sum_{\theta=1}^{\Theta} \theta\,(c_{t+\theta} - c_{t-\theta})}{2 \sum_{\theta=1}^{\Theta} \theta^{2}}$$
$$S_m = \mathrm{MFCC} + \Delta \mathrm{MFCC} + \Delta\Delta \mathrm{MFCC}$$
wherein d_t is the first-order difference cepstral coefficient at frame t, T is the cepstral coefficient dimension (t = 1, ..., T), Θ is the time span of the first derivative, taken as 1 or 2, and c_t is the t-th cepstral coefficient;
step 1.4.2, the pitch period parameter T̂ extracted in the previous step and the MFCC parameter S_m obtained above, which are used in computing the posterior probability value of the test voice file, are normalized so that T̂' and S_m' become data between 0 and 1:
$$\hat{T}' = \hat{T} / \max(\hat{T}), \qquad S_m' = S_m / \max(S_m)$$
wherein T̂ denotes the pitch period parameter, max denotes the maximum value of the corresponding vector, and T̂' and S_m' denote the normalized pitch period parameter and MFCC parameter;
step 1.4.3, calculating influence degree factors C1 and C2 of the two parameters by using an entropy weight method to form a new combined parameter GMFCC:
$$\mathrm{GMFCC} = [\,C_1 \hat{T}',\; C_2 S_m'\,]$$
6. the adaptive method for speaker recognition in a complex environment based on GMM model as claimed in claim 1, wherein the step 1.5 is implemented by the following steps:
and step 1.5.1, obtaining the GMM corresponding to each sample by using an EM algorithm, wherein each GMM corresponds to a respective mean value, covariance and weight.
7. The adaptive method for speaker recognition in a complex environment based on GMM model as claimed in claim 1, wherein the step 2.1 is implemented by the following steps:
step 2.1.1, preprocessing the voice to be recognized, including low-pass filtering, de-trending, framing, windowing and end point detection;
step 2.1.2, filtering the voice signals using a Gammatone filter;
and 2.1.3, extracting pitch period coefficients and MFCC coefficients of the voice to be recognized through a cepstrum method, and calculating first-order MFCC and second-order MFCC parameters to form GMFCC combined parameters.
8. The adaptive method for speaker recognition in a complex environment based on GMM model as claimed in claim 1, wherein the step 2.2 is implemented by using a maximum a posteriori probability model to perform speaker adaptive transformation on the original model according to the parameters of the speech to be recognized, so as to obtain the speaker dependent adaptive model.
9. The adaptive method for speaker recognition in a complex environment based on the GMM model as claimed in claim 1, wherein the step 2.3 is implemented by calculating, through the GMM formula, the probability value P(Z|a) of the speech to be recognized under each trained model, Z being the speech data to be recognized and a being one of the trained models, and selecting the model with the highest probability value to label the speech to be recognized as that speaker.
CN202010284977.6A 2020-04-13 2020-04-13 GMM model-based speaker recognition self-adaption method in complex environment Active CN111489763B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010284977.6A CN111489763B (en) 2020-04-13 2020-04-13 GMM model-based speaker recognition self-adaption method in complex environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010284977.6A CN111489763B (en) 2020-04-13 2020-04-13 GMM model-based speaker recognition self-adaption method in complex environment

Publications (2)

Publication Number Publication Date
CN111489763A (en) 2020-08-04
CN111489763B CN111489763B (en) 2023-06-20

Family

ID=71812744

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010284977.6A Active CN111489763B (en) 2020-04-13 2020-04-13 GMM model-based speaker recognition self-adaption method in complex environment

Country Status (1)

Country Link
CN (1) CN111489763B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102324232A (en) * 2011-09-12 2012-01-18 辽宁工业大学 Method for recognizing sound-groove and system based on gauss hybrid models
JP2016143043A (en) * 2015-02-05 2016-08-08 日本電信電話株式会社 Speech model learning method, noise suppression method, speech model learning system, noise suppression system, speech model learning program, and noise suppression program
CN104900235A (en) * 2015-05-25 2015-09-09 重庆大学 Voiceprint recognition method based on pitch period mixed characteristic parameters
CN105679312A (en) * 2016-03-04 2016-06-15 重庆邮电大学 Phonetic feature processing method of voiceprint identification in noise environment
CN106782500A (en) * 2016-12-23 2017-05-31 电子科技大学 A kind of fusion feature parameter extracting method based on pitch period and MFCC
CN107369440A (en) * 2017-08-02 2017-11-21 北京灵伴未来科技有限公司 The training method and device of a kind of Speaker Identification model for phrase sound
CN110400565A (en) * 2019-08-20 2019-11-01 广州国音智能科技有限公司 Method for distinguishing speek person, system and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
叶寒生 (Ye Hansheng) et al., "Speaker Recognition Based on Feature Information Fusion in a Noisy Environment", Computer Simulation (《计算机仿真》) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112951245A (en) * 2021-03-09 2021-06-11 江苏开放大学(江苏城市职业学院) Dynamic voiceprint feature extraction method integrated with static component
WO2022205249A1 (en) * 2021-03-31 2022-10-06 华为技术有限公司 Audio feature compensation method, audio recognition method, and related product
CN113567969A (en) * 2021-09-23 2021-10-29 江苏禹治流域管理技术研究院有限公司 Illegal sand dredger automatic monitoring method and system based on underwater acoustic signals
CN113567969B (en) * 2021-09-23 2021-12-17 江苏禹治流域管理技术研究院有限公司 Illegal sand dredger automatic monitoring method and system based on underwater acoustic signals

Also Published As

Publication number Publication date
CN111489763B (en) 2023-06-20

Similar Documents

Publication Publication Date Title
CN105513605B (en) The speech-enhancement system and sound enhancement method of mobile microphone
CN106486131B (en) A kind of method and device of speech de-noising
Cai et al. Sensor network for the monitoring of ecosystem: Bird species recognition
CN105023573B (en) It is detected using speech syllable/vowel/phone boundary of auditory attention clue
JP4802135B2 (en) Speaker authentication registration and confirmation method and apparatus
Liu et al. Bone-conducted speech enhancement using deep denoising autoencoder
CN111489763B (en) GMM model-based speaker recognition self-adaption method in complex environment
CN102968990B (en) Speaker identifying method and system
CN111816218A (en) Voice endpoint detection method, device, equipment and storage medium
CN108922541A (en) Multidimensional characteristic parameter method for recognizing sound-groove based on DTW and GMM model
CN113012720B (en) Depression detection method by multi-voice feature fusion under spectral subtraction noise reduction
Hui et al. Convolutional maxout neural networks for speech separation
Liang et al. Real-time speech enhancement algorithm based on attention LSTM
CN112397074A (en) Voiceprint recognition method based on MFCC (Mel frequency cepstrum coefficient) and vector element learning
Chiu et al. Learning-based auditory encoding for robust speech recognition
CN112466276A (en) Speech synthesis system training method and device and readable storage medium
CN110415707B (en) Speaker recognition method based on voice feature fusion and GMM
CN111524520A (en) Voiceprint recognition method based on error reverse propagation neural network
Kaminski et al. Automatic speaker recognition using a unique personal feature vector and Gaussian Mixture Models
CN111862991A (en) Method and system for identifying baby crying
Cai et al. The best input feature when using convolutional neural network for cough recognition
Wang et al. Robust Text-independent Speaker Identification in a Time-varying Noisy Environment.
Tzudir et al. Low-resource dialect identification in Ao using noise robust mean Hilbert envelope coefficients
CN114302301A (en) Frequency response correction method and related product
CN107993666B (en) Speech recognition method, speech recognition device, computer equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant