CN105096955A - Speaker rapid identification method and system based on growing and clustering algorithm of models - Google Patents


Info

Publication number
CN105096955A
Authority
CN
China
Legal status: Granted
Application number
CN201510563935.5A
Other languages
Chinese (zh)
Other versions
CN105096955B (en)
Inventor
张晶 (Zhang Jing)
陈晓梅 (Chen Xiaomei)
郑党 (Zheng Dang)
Current Assignee
Guangdong University of Foreign Studies
Original Assignee
Guangdong University of Foreign Studies
Priority date
Filing date
Publication date
Application filed by Guangdong University of Foreign Studies
Priority to CN201510563935.5A
Publication of CN105096955A
Application granted
Publication of CN105096955B
Status: Expired - Fee Related

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a rapid speaker identification method and system based on a model-growing clustering algorithm. The method comprises a model-training process and a model-identification process. Model training comprises acquiring voiceprint signals from multiple persons, including the target speakers; preprocessing every voiceprint signal and extracting voiceprint feature parameters to form multiple models; and adaptively classifying all models with the model-growing clustering algorithm. Model identification comprises acquiring the speech signal of a speaker; preprocessing it and extracting its voiceprint feature parameters; computing the likelihood of the extracted features against each class representative; selecting the class with the maximum likelihood; computing likelihood scores against all models in the selected class; and taking the model with the highest score as the identification result. With this scheme the features to be identified need not be matched against every model, so matching time is short and real-time performance is good, and the method adapts well to large model libraries.

Description

Rapid speaker identification method and system based on model-growing clustering
Technical field
The present invention relates to the field of voiceprint recognition, and more particularly to a rapid speaker identification method and system based on model-growing clustering.
Background technology
In embedded operating systems, identifying a speaker's identity by voice usually requires preprocessing the input voiceprint, transmitting the data to a server, generating a voiceprint model, matching against stored models, and finally outputting and displaying the result. Here the voiceprint model is a Gaussian mixture model (GMM), trained with the EM algorithm; a GMM is compactly written as the triple λ = (ω, μ, Σ). A GMM describes a speaker's speech as a weighted combination of several Gaussian components, and the expectation-maximization (EM) algorithm iteratively updates the parameters toward a local maximum of the likelihood, yielding a statistical model of the speech. The book "Speaker Identification Models and Methods" by Wu Chaohui and Yang Yingchun describes the GMM and EM algorithms in detail. Traditional recognition methods must match the speech features to be identified against every model in the model library; once the library grows large, matching takes longer and longer, recognition slows down, the system may even be overwhelmed, and real-time operation cannot be guaranteed.
Summary of the invention
The present invention aims to solve the above technical problems at least to some extent.
The primary object of the present invention is to overcome the long matching time and poor real-time performance of the prior art described above by providing a rapid speaker identification method based on model-growing clustering whose matching time is short and whose real-time performance is good.
A further object of the present invention is to provide a rapid speaker identification system based on model-growing clustering with the same short matching time and good real-time performance.
To solve the above technical problems, the technical scheme of the present invention is as follows:
A rapid speaker identification method based on model-growing clustering comprises model training and model identification.
Model training comprises the following steps:
S1: Acquire the voiceprint signals of multiple persons, including the target speakers;
S2: Preprocess each voiceprint signal; preprocessing comprises, in order, pre-emphasis, framing, windowing, and endpoint detection;
S3: Extract voiceprint feature parameters from each voiceprint signal to form multiple models;
S4: Adaptively classify all models with the model-growing clustering algorithm; adaptive classification comprises class-representative initialization, class-representative authorization, and class-representative election.
Model Identification comprises the following steps:
S5: Acquire the speech signal of a speaker; this is the speech signal to be identified;
S6: Preprocess the speech signal to be identified and extract its voiceprint feature parameters;
S7: Compute the likelihood of the extracted feature parameters against each class representative, select the class with the maximum likelihood, then compute likelihood scores against all models in the selected class; the model with the highest score is the identification result.
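The two-stage matching of step S7 can be sketched as follows. This is a minimal illustration, not the patent's implementation: avg_log_likelihood uses a single diagonal-covariance Gaussian as a stand-in for GMM scoring, and all names and data are hypothetical.

```python
import numpy as np

def avg_log_likelihood(features, model):
    # Average per-frame log-likelihood under a single diagonal-covariance
    # Gaussian (a simplified stand-in for the patent's GMM likelihood).
    mean, var = model
    ll = -0.5 * (np.log(2.0 * np.pi * var) + (features - mean) ** 2 / var)
    return float(ll.sum(axis=1).mean())

def two_stage_identify(features, class_reps, class_members):
    # Stage 1 (class selection): score the features against every class
    # representative and keep the class with the maximum likelihood.
    best_class = max(class_reps,
                     key=lambda c: avg_log_likelihood(features, class_reps[c]))
    # Stage 2 (model selection): score only the models inside that class.
    models = class_members[best_class]
    best_model = max(models,
                     key=lambda m: avg_log_likelihood(features, models[m]))
    return best_class, best_model
```

Only the class representatives plus the members of one class are ever scored, which is the source of the claimed speed-up over exhaustive matching.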
In a preferred scheme, the preprocessing of each voiceprint signal in step S2 comprises the following steps:
S2.1: Pre-emphasis. The voiceprint signal is passed through a filter that shifts emphasis to the appropriate frequency range.
The transfer function is H(z) = 1 - 0.9375z^-1,
and the resulting signal is s~(n) = s(n) - 0.9375s(n-1);
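A minimal sketch of the pre-emphasis step, assuming the signal is held as a NumPy array; the coefficient 0.9375 is the one given above.

```python
import numpy as np

def pre_emphasis(s, alpha=0.9375):
    # s~(n) = s(n) - alpha * s(n-1), i.e. the filter H(z) = 1 - alpha * z^-1.
    # The first sample is passed through unchanged (s(-1) taken as 0).
    out = np.asarray(s, dtype=float).copy()
    out[1:] -= alpha * out[:-1]
    return out
```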
S2.2: Framing. The voiceprint signal is divided into frames at intervals of 10-20 ms, one frame being the basic unit. A voiceprint signal changes from instant to instant but is quasi-stationary over 10-20 ms, so the signal within such a relatively stable interval can be treated as one basic unit: a frame.
S2.3: Windowing. To avoid the truncation error that a rectangular window introduces into the LPC (linear prediction coefficient) analysis, each frame is weighted with the Hamming window function w(n) = 0.54 - 0.46cos(2πn/(N-1)), 0 ≤ n ≤ N-1;
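Framing and Hamming windowing can be sketched together. The frame length and hop below are illustrative (at a 16 kHz sampling rate, 10 to 20 ms corresponds to 160 to 320 samples); the patent does not fix a hop size.

```python
import numpy as np

def frame_and_window(signal, frame_len, hop):
    # Split the signal into frames of frame_len samples taken every hop
    # samples, then weight each frame with the Hamming window
    # w(n) = 0.54 - 0.46 * cos(2*pi*n / (N-1)), 0 <= n <= N-1.
    n = np.arange(frame_len)
    window = 0.54 - 0.46 * np.cos(2.0 * np.pi * n / (frame_len - 1))
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len]
                       for i in range(n_frames)])
    return frames * window
```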
S2.4: Endpoint detection. Endpoints are detected from the short-time energy coefficient and the short-time zero-crossing-rate coefficient of the voiceprint signal, defined as follows:
Short-time energy coefficient: e(i) = Σ_{n=1..N} |x_i(n)|,
Short-time zero-crossing-rate coefficient: ZCR(i) = Σ_{n=1..N-1} |x_i(n) - x_i(n+1)|.
The purpose of endpoint detection is to detect the presence of voiceprint within a segment that also contains silence, i.e. to determine the start and end points of the voiceprint. Effective endpoint detection not only minimizes processing time but also removes the noise of the silent segments, giving the recognition system good performance.
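A sketch of endpoint detection built on the two coefficients above; the thresholds and the rule of keeping the first and last active frames are illustrative assumptions, not the patent's exact decision logic.

```python
import numpy as np

def short_time_energy(frame):
    # e(i) = sum_n |x_i(n)|
    return float(np.abs(frame).sum())

def zero_crossing_rate(frame):
    # ZCR(i) = sum_n |x_i(n) - x_i(n+1)|  (the patent's difference form)
    return float(np.abs(np.diff(frame)).sum())

def detect_endpoints(frames, e_thresh, z_thresh):
    # A frame counts as "active" when either coefficient exceeds its
    # threshold; the endpoints are the first and last active frames.
    active = [i for i, f in enumerate(frames)
              if short_time_energy(f) > e_thresh
              or zero_crossing_rate(f) > z_thresh]
    return (active[0], active[-1]) if active else None
```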
In a preferred scheme, the feature parameters in step S3 are MFCC (mel-frequency cepstral coefficient) parameters, and extracting the voiceprint feature parameters from each voiceprint signal comprises the following steps:
S3.1: Apply the fast Fourier transform (FFT) to the voiceprint signal to obtain its energy spectrum;
S3.2: Multiply the energy spectrum by a bank of N triangular band-pass filters and take the logarithm of each filter's output to obtain N log energies E_k. The N filters are evenly spaced on the mel-frequency scale, where the mel frequency mel(f) is related to the ordinary frequency f by:
mel(f) = 2595*log10(1 + f/700);
S3.3: Apply the discrete cosine transform (DCT) to the N log energies E_k to obtain the mel-scale cepstrum of order L, i.e. L cepstral coefficients, with L = 12. The DCT formula is:
C_m = Σ_{k=1..N} cos[m(k - 0.5)π/N]·E_k, m = 1, 2, ..., L;
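The mel mapping and the DCT of the filter-bank log energies can be sketched as follows; the filter bank itself is omitted and the log energies are taken as given.

```python
import numpy as np

def mel(f):
    # mel(f) = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_cepstra(log_energies, L=12):
    # C_m = sum_{k=1..N} cos(m * (k - 0.5) * pi / N) * E_k,  m = 1..L
    # (the DCT of the N filter-bank log energies, keeping L coefficients).
    E = np.asarray(log_energies, dtype=float)
    N = len(E)
    k = np.arange(1, N + 1)
    return np.array([np.sum(np.cos(m * (k - 0.5) * np.pi / N) * E)
                     for m in range(1, L + 1)])
```

Note that a constant filter-bank output yields all-zero cepstra, since each DCT basis vector sums to zero over a full period.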
S3.4: Extract the log energy of each voiceprint frame, defined as 10 times the base-10 logarithm of the sum of the squared samples in the frame. The energy of a frame is also an important voiceprint feature, so adding it gives each frame a basic 13-dimensional feature vector: 1 log energy plus 12 cepstral coefficients;
S3.5: Extract the delta cepstral coefficients (delta cepstrum) of the voiceprint signal. The delta cepstrum is the slope of the cepstral coefficients with respect to time, i.e. their dynamic change over time. Although 13 feature parameters have already been obtained, the delta cepstra are added in voiceprint recognition to capture this temporal variation. The formula is:
ΔC_m(t) = [Σ_{τ=-M..M} τ·C_m(t+τ)] / [Σ_{τ=-M..M} τ²] = [Σ_{τ=1..M} τ·(C_m(t+τ) - C_m(t-τ))] / [2·Σ_{τ=1..M} τ²], m = 1, 2, ..., L,
where M is 2 or 3, t is the frame index, and C_m(t) is the m-th cepstral coefficient of frame t.
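A sketch of the delta-cepstrum computation; padding the edges by repeating the first and last frames is an illustrative choice the patent does not specify.

```python
import numpy as np

def delta_cepstra(C, M=2):
    # dC_m(t) = sum_{tau=1..M} tau * (C_m(t+tau) - C_m(t-tau))
    #           / (2 * sum_{tau=1..M} tau^2)
    # C has shape (T, L); edge frames are padded by repetition so the
    # slope is defined at t = 0 and t = T-1 as well.
    C = np.asarray(C, dtype=float)
    T = len(C)
    padded = np.concatenate([np.repeat(C[:1], M, axis=0), C,
                             np.repeat(C[-1:], M, axis=0)])
    denom = 2.0 * sum(tau * tau for tau in range(1, M + 1))
    num = np.zeros_like(C)
    for tau in range(1, M + 1):
        num += tau * (padded[M + tau: M + tau + T]
                      - padded[M - tau: M - tau + T])
    return num / denom
```

On a linear ramp of cepstra the interior deltas come out as exactly the slope, which is a quick sanity check of the formula.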
In a preferred scheme, the adaptive classification of all models with the model-growing clustering algorithm in step S4 comprises the following steps:
S4.1: Class-representative initialization:
Randomly select one model from all models as the first initial class representative R0;
Compute the approximate entropy D of each remaining model with respect to R0 in turn until D > θ; that model is appointed the second initial class representative R1, giving the representative set A0 = {R0, R1}, where θ is a preset threshold;
Compute the approximate entropy of the remaining models with respect to both R0 and R1; a model whose values both exceed θ is appointed the third initial representative R2, and so on until k representatives are obtained, where k is the preset number of classes, i.e. A0 = {R0, R1, ..., Rk-1}; class-representative initialization is then complete.
The choice of initial representatives directly affects the efficiency of the clustering algorithm. The initial representatives of the present invention satisfy two conditions: each is produced, directly or indirectly, from the model set, and the pairwise approximate entropy between any two initial representatives must exceed the preset threshold θ.
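The initialization above can be sketched as a greedy selection; `dissimilarity` is a placeholder for the approximate-entropy measure between models, which the patent does not define in code.

```python
def init_class_representatives(models, dissimilarity, theta, k):
    # Greedy initialization (step S4.1): the first model is R0; each
    # further model is appointed a new representative only if its
    # dissimilarity to EVERY existing representative exceeds theta,
    # stopping once k representatives have been found.
    reps = [models[0]]
    for m in models[1:]:
        if len(reps) == k:
            break
        if all(dissimilarity(m, r) > theta for r in reps):
            reps.append(m)
    return reps
```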
S4.2: Class-representative authorization:
Because the clustering induced by the initial representatives often violates the constraints on class membership, the representatives must be authorized, i.e. existing representatives revoked or new ones created.
For each class ω whose member count γ exceeds γ_max, compute the density value of every member model and sort the members in descending order of density; the member with the highest density is directly appointed the new class representative, after which γ_new additional representatives are generated by the initialization method of step S4.1, where γ_new satisfies:
1 ≤ γ_new ≤ γ/γ_max.
Authorize all class representatives in turn and reclassify the models, repeating until no class representative is updated;
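One authorization pass can be sketched as follows; the reclassification of members and the spawning of up to γ/γ_max extra representatives via S4.1 are noted but omitted, and `density` is a placeholder for the patent's model-density value.

```python
def authorize_pass(classes, density, gamma_max):
    # One authorization pass (step S4.2): a class whose member count
    # exceeds gamma_max has its densest member promoted to class
    # representative; other classes keep their current representative.
    updated = {}
    for rep, members in classes.items():
        if len(members) > gamma_max:
            updated[max(members, key=density)] = members
        else:
            updated[rep] = members
    return updated
```

In the full algorithm this pass is repeated, with reclassification in between, until no representative changes.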
S4.3: Class-representative election:
After clustering, all models are divided into k classes. The features of the models in each class are then retrained into a class GMM (Gaussian mixture model) that serves as that class's representative. Because this GMM is elected from all models in the class, it represents the class more accurately.
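Step S4.3 amounts to pooling the features of every model in a class and retraining a GMM on the pooled data. A sketch using scikit-learn's GaussianMixture, assuming that library is available; the component count is illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture  # assumed available

def elect_class_representative(member_features, n_components=4):
    # Pool the training features of every model in the class and fit a
    # fresh GMM on the pooled data; this retrained GMM is the elected
    # class representative.
    pooled = np.vstack(member_features)
    return GaussianMixture(n_components=n_components,
                           covariance_type="diag",
                           random_state=0).fit(pooled)
```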
A rapid speaker identification system based on model-growing clustering comprises a client, a network connection module, and a server, the client and server being connected through the network connection module;
The client comprises:
Voiceprint acquisition module: acquires the voiceprint signals of multiple persons, including the target speakers, and outputs them to the preprocessing module;
The server comprises:
Preprocessing module: comprises a pre-emphasis unit, a framing unit, a windowing unit, and an endpoint-detection unit connected in sequence, which perform pre-emphasis, framing, windowing, and endpoint detection on each voiceprint signal delivered to the server through the network connection module;
Voiceprint feature extraction module: extracts voiceprint feature parameters from each voiceprint signal to form multiple models;
Adaptive classification module: adaptively classifies all models with the model-growing clustering algorithm; adaptive classification comprises class-representative initialization, authorization, and election;
Voiceprint identification module: computes the likelihood of the feature parameters of the speech signal to be identified against each class representative, selects the class with the maximum likelihood, then computes likelihood scores against all models in the selected class; the model with the highest score is the identification result.
In a preferred scheme, the server can receive identification requests from multiple clients simultaneously; it creates one new thread per identification request and responds to each user's request over the wireless network.
In a preferred scheme, the client is an Android client.
Compared with the prior art, the beneficial effects of the technical scheme of the present invention are as follows. The invention discloses a rapid speaker identification method based on model-growing clustering. Model training comprises acquiring the voiceprint signals of multiple persons including the target speakers; preprocessing each voiceprint signal and extracting voiceprint feature parameters to form multiple models; and adaptively classifying all models with the model-growing clustering algorithm. Model identification comprises acquiring the speech signal of a speaker, preprocessing it and extracting its voiceprint feature parameters, computing the likelihood of the features against each class representative, selecting the class with the maximum likelihood, then computing likelihood scores against all models in the selected class; the model with the highest score is the identification result. The method need not match the speech features to be identified against every model, so matching time is short, real-time performance is good, and it adapts well to large model libraries.
The present invention also discloses a rapid speaker identification system based on model-growing clustering. The system is the hardware foundation on which the method runs; together, the method and system achieve fast, real-time speaker identification.
Brief description of the drawings
Fig. 1 is the flowchart of the rapid speaker identification method based on model-growing clustering.
Fig. 2 is the flowchart of adaptive classification.
Fig. 3 is the schematic diagram of the rapid speaker identification system based on model-growing clustering.
Fig. 4 is the functional schematic of the rapid speaker identification system based on model-growing clustering.
Detailed description of the embodiments
The accompanying drawings are for illustration only and shall not be construed as limiting this patent.
To better illustrate the embodiments, some parts of the drawings are omitted, enlarged, or reduced and do not represent the dimensions of the actual product; it will be understood by those skilled in the art that some well-known structures in the drawings, and their descriptions, may be omitted.
The technical scheme of the present invention is further described below with reference to the drawings and embodiments.
Embodiment 1
As shown in Fig. 1, a rapid speaker identification method based on model-growing clustering comprises model training and model identification;
Model training comprises the following steps:
S1: Acquire the voiceprint signals, i.e. speech signals, of multiple persons including the target speakers;
S2: Preprocess and denoise each voiceprint signal; preprocessing comprises, in order, pre-emphasis, framing, windowing, and endpoint detection;
In a specific implementation, the preprocessing of each voiceprint signal in step S2 comprises the following steps:
S2.1: Pre-emphasis. The voiceprint signal is passed through a filter that shifts emphasis to the appropriate frequency range.
The transfer function is H(z) = 1 - 0.9375z^-1,
and the resulting signal is s~(n) = s(n) - 0.9375s(n-1);
S2.2: Framing. The voiceprint signal is divided into frames at intervals of 10-20 ms, one frame being the basic unit. A voiceprint signal changes from instant to instant but is quasi-stationary over 10-20 ms, so the signal within such a relatively stable interval can be treated as one basic unit: a frame.
S2.3: Windowing. To avoid the truncation error that a rectangular window introduces into the LPC analysis, each frame is weighted with the Hamming window function w(n) = 0.54 - 0.46cos(2πn/(N-1)), 0 ≤ n ≤ N-1;
S2.4: Endpoint detection. Endpoints are detected from the short-time energy coefficient and the short-time zero-crossing-rate coefficient of the voiceprint signal, defined as follows:
Short-time energy coefficient: e(i) = Σ_{n=1..N} |x_i(n)|,
Short-time zero-crossing-rate coefficient: ZCR(i) = Σ_{n=1..N-1} |x_i(n) - x_i(n+1)|.
The purpose of endpoint detection is to detect the presence of voiceprint within a segment that also contains silence, i.e. to determine the start and end points of the voiceprint. Effective endpoint detection not only minimizes processing time but also removes the noise of the silent segments, giving the recognition system good performance.
S3: Extract voiceprint feature parameters from each voiceprint signal to form multiple models;
In a specific implementation, the feature parameters in step S3 are MFCC parameters, and extracting the voiceprint feature parameters from each voiceprint signal comprises the following steps:
S3.1: Apply the fast Fourier transform (FFT) to the voiceprint signal to obtain its energy spectrum;
S3.2: Multiply the energy spectrum by a bank of N triangular band-pass filters evenly spaced on the mel-frequency scale and take the logarithm of each filter's output to obtain N log energies E_k, the mel frequency being mel(f) = 2595*log10(1 + f/700);
S3.3: Apply the discrete cosine transform to the N log energies E_k to obtain the mel-scale cepstrum of order L, i.e. L cepstral coefficients, with L = 12. The DCT formula is:
C_m = Σ_{k=1..N} cos[m(k - 0.5)π/N]·E_k, m = 1, 2, ..., L;
S3.4: Extract the log energy of each voiceprint frame, defined as 10 times the base-10 logarithm of the sum of the squared samples in the frame. The energy of a frame is also an important voiceprint feature, so adding it gives each frame a basic 13-dimensional feature vector: 1 log energy plus 12 cepstral coefficients;
S3.5: Extract the delta cepstral coefficients of the voiceprint signal. The delta cepstrum is the slope of the cepstral coefficients with respect to time, i.e. their dynamic change over time. Although 13 feature parameters have already been obtained, the delta cepstra are added in voiceprint recognition to capture this temporal variation. The formula is:
ΔC_m(t) = [Σ_{τ=-M..M} τ·C_m(t+τ)] / [Σ_{τ=-M..M} τ²] = [Σ_{τ=1..M} τ·(C_m(t+τ) - C_m(t-τ))] / [2·Σ_{τ=1..M} τ²], m = 1, 2, ..., L,
where M is 2 or 3, t is the frame index, and C_m(t) is the m-th cepstral coefficient of frame t.
S4: Adaptively classify all models with the model-growing clustering algorithm; adaptive classification comprises class-representative initialization, class-representative authorization, and class-representative election;
As shown in Fig. 2, in a specific implementation, the adaptive classification of all models with the model-growing clustering algorithm in step S4 comprises the following steps:
S4.1: Class-representative initialization:
Randomly select one model from the model bank as the first initial class representative R0;
Compute the approximate entropy D of each remaining model with respect to R0 in turn until D > θ; that model is appointed the second initial class representative R1, giving the representative set A0 = {R0, R1}, where θ is a preset threshold;
Compute the approximate entropy of the remaining models with respect to both R0 and R1; a model whose values both exceed θ is appointed the third initial representative R2, and so on until k representatives are obtained, where k is the preset number of classes, i.e. A0 = {R0, R1, ..., Rk-1}; class-representative initialization is then complete, after which the models are classified;
The choice of initial representatives directly affects the efficiency of the clustering algorithm. The initial representatives of the present invention satisfy two conditions: each is produced, directly or indirectly, from the model set, and the pairwise approximate entropy between any two initial representatives must exceed the preset threshold θ.
S4.2: Class-representative authorization:
Because the clustering induced by the initial representatives often violates the constraints on class membership, the representatives must be authorized, i.e. existing representatives revoked or new ones created.
For each class ω whose member count γ exceeds γ_max, compute the density value of every member model and sort the members in descending order of density; the member with the highest density is directly appointed the new class representative, after which γ_new additional representatives are generated by the initialization method of step S4.1, where γ_new satisfies:
1 ≤ γ_new ≤ γ/γ_max.
Authorize all class representatives in turn and reclassify the models, repeating until no class representative is updated;
S4.3: Class-representative election:
After clustering, all models are divided into k classes. The features of the models in each class are then retrained into a class GMM that serves as that class's representative and is saved to the database. Because this GMM is elected from all models in the class, it represents the class more accurately.
Model Identification comprises the following steps:
S5: Acquire the speech signal of a speaker; this is the speech signal to be identified;
S6: Preprocess and denoise the speech signal to be identified and extract its voiceprint feature parameters;
S7: Compute the likelihood of the extracted feature parameters against each class representative, select the class with the maximum likelihood, then compute likelihood scores against all models in the selected class; the model with the highest score is the identification result, which is finally output.
The present embodiment provides a rapid speaker identification method based on model-growing clustering. Model training comprises acquiring the voiceprint signals of multiple persons including the target speakers; preprocessing each voiceprint signal and extracting voiceprint feature parameters to form multiple models; and adaptively classifying all models with the model-growing clustering algorithm. Model identification comprises acquiring the speech signal of a speaker, preprocessing it and extracting its voiceprint feature parameters, computing the likelihood of the features against each class representative, selecting the class with the maximum likelihood, then computing likelihood scores against all models in the selected class; the model with the highest score is the identification result. The method need not match the speech features to be identified against every model, so matching time is short, real-time performance is good, and it adapts well to large model libraries.
Embodiment 2
As shown in Fig. 3, a rapid speaker identification system based on model-growing clustering comprises a client, a network connection module, and a server, the client and server being connected through the network connection module;
The client comprises:
Voiceprint acquisition module: acquires the voiceprint signals of multiple persons, including the target speakers, and outputs them to the preprocessing module;
The server comprises:
Preprocessing module: comprises a pre-emphasis unit, a framing unit, a windowing unit, and an endpoint-detection unit connected in sequence, which perform pre-emphasis, framing, windowing, and endpoint detection on each voiceprint signal delivered to the server through the network connection module;
Voiceprint feature extraction module: extracts voiceprint feature parameters from each voiceprint signal to form multiple models;
Adaptive classification module: adaptively classifies all models with the model-growing clustering algorithm; adaptive classification comprises class-representative initialization, authorization, and election;
Voiceprint identification module: computes the likelihood of the feature parameters of the speech signal to be identified against each class representative, selects the class with the maximum likelihood, then computes likelihood scores against all models in the selected class; the model with the highest score is the identification result.
As shown in Fig. 4, in a specific implementation, the server receives identification requests from multiple clients and users simultaneously; it creates one new thread per identification request and responds to each user's request over the wireless network.
In a specific implementation, the client is an Android client, and the voice acquisition module is implemented with android.media.AudioRecord of the Android system, which yields the PCM speech data.
In the present invention, the client acquires the speech signal and the server performs the signal-processing logic; the two exchange data over the HTTP protocol. The client performs no mathematical processing, so the system places no special hardware requirements on it; the server's data-processing capacity far exceeds the client's, so model training, classification, clustering, and matching are all handled by the server, keeping the client responsive.
After the user selects a function in the client module and sets the parameters, speech is acquired and sent to the server in a network request. The network connection module selects the transport protocol, sets the data format, and handles request and response timeouts. On receiving a request, the server parses out the speech data, preprocesses it, and then performs the operation corresponding to the selected function, which is one of three: model training, model clustering, or model identification; finally the result is returned for display on the client panel.
The present embodiment provides a rapid speaker identification system based on model-growing clustering. The system is the hardware foundation on which the method runs; together, the method and system achieve fast, real-time speaker identification.
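The one-thread-per-request server described in this embodiment can be sketched as follows. This is a raw-TCP sketch using Python's ThreadingTCPServer; the patent's system exchanges data over HTTP, but the threading pattern is the same, and identify() is a stub standing in for the two-stage matching.

```python
import socket
import socketserver
import threading

def identify(payload):
    # Placeholder for step S7: a real server would extract features from
    # the payload and score them against the class representatives and
    # then the models of the selected class.
    return b"speaker-0"

class RecognitionHandler(socketserver.BaseRequestHandler):
    # ThreadingTCPServer runs handle() on a fresh thread per connection,
    # mirroring the one-thread-per-identification-request design.
    def handle(self):
        data = self.request.recv(4096)
        self.request.sendall(identify(data))

def start_server(host="127.0.0.1", port=0):
    # port=0 asks the OS for a free port; the chosen address is available
    # afterwards as server.server_address.
    server = socketserver.ThreadingTCPServer((host, port), RecognitionHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```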
Obviously, the above embodiments are merely examples given for clarity of description and are not a limitation on the embodiments of the present invention. Those of ordinary skill in the art can make other variations in different forms on the basis of the above description; an exhaustive enumeration of all embodiments is neither necessary nor possible here. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (7)

1. A rapid speaker identification method based on model-growing clustering, characterized in that it comprises model training and model identification;
Model training comprises the following steps:
S1: Acquire the voiceprint signals of multiple persons, including the target speakers;
S2: Preprocess each voiceprint signal; preprocessing comprises, in order, pre-emphasis, framing, windowing, and endpoint detection;
S3: Extract voiceprint feature parameters from each voiceprint signal to form multiple models;
S4: Adaptively classify all models with the model-growing clustering algorithm; adaptive classification comprises class-representative initialization, class-representative authorization, and class-representative election;
Model Identification comprises the following steps:
S5: the voice signal gathering speaker;
S6: pre-service is carried out to voice signal to be identified and extracts vocal print characteristic parameter;
S7: the likelihood score of characteristic parameter to all kinds of representative calculating voice signal to be identified, the class belonging to selecting with the maximum principle of likelihood score, and then calculate Likelihood Score with all models in the class selected, the model that score is the highest is recognition result.
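As a non-authoritative illustration, the two-stage lookup of step S7 can be sketched as below. The names `score`, `class_reps` and `class_members` are hypothetical placeholders (not taken from the patent) for the likelihood function, the class representatives, and the per-class lists of (speaker id, model) pairs:

```python
def identify(features, class_reps, class_members, score):
    """Two-stage identification: pick the best class by its representative,
    then score only the models inside that class."""
    # Stage 1: likelihood against each class representative only.
    best_class = max(range(len(class_reps)),
                     key=lambda i: score(features, class_reps[i]))
    # Stage 2: full likelihood scoring restricted to the selected class;
    # the highest-scoring model is the identification result.
    best_id, _ = max(class_members[best_class],
                     key=lambda pair: score(features, pair[1]))
    return best_id
```

Because stage 1 touches only the k class representatives, the features never need to be matched against every model in the library, which is the source of the claimed speed-up.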
2. The speaker rapid identification method based on model growing and clustering according to claim 1, characterized in that in step S2 the preprocessing of each voiceprint signal specifically comprises the following steps:
S2.1: pre-emphasis, with transfer function H(z) = 1 - 0.9375z^(-1), yielding the pre-emphasized signal S~(n) = S(n) - 0.9375·S(n-1);
S2.2: framing, dividing the voiceprint signal into frames at intervals of 10~20 ms, one frame being the basic processing unit;
S2.3: windowing, applying a Hamming window function w(n) = 0.54 - 0.46·cos(2πn/(N-1)), 0 ≤ n ≤ N-1;
S2.4: endpoint detection, detecting endpoints from the short-time energy coefficient and the short-time zero-crossing-rate coefficient of the voiceprint signal, computed as:
short-time energy coefficient: e(i) = Σ_{n=1}^{N} |x_i(n)|,
short-time zero-crossing-rate coefficient: ZCR(i) = Σ_{n=1}^{N-1} |x_i(n) - x_i(n+1)|.
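A minimal sketch of the claim-2 preprocessing chain follows. The frame length, the non-overlapping framing, and the NumPy representation are illustrative assumptions; the claim itself fixes only the 0.9375 pre-emphasis coefficient, the Hamming window, and the two endpoint-detection coefficients:

```python
import numpy as np

def preprocess(signal, frame_len=256, alpha=0.9375):
    """Pre-emphasis, framing, Hamming windowing, and the two
    endpoint-detection coefficients from claim 2 (steps S2.1-S2.4)."""
    # S2.1 pre-emphasis: s~(n) = s(n) - 0.9375*s(n-1)
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # S2.2 framing: split into non-overlapping frames of frame_len samples
    n_frames = len(emphasized) // frame_len
    frames = emphasized[:n_frames * frame_len].reshape(n_frames, frame_len)
    # S2.3 windowing: w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1))
    n = np.arange(frame_len)
    frames = frames * (0.54 - 0.46 * np.cos(2 * np.pi * n / (frame_len - 1)))
    # S2.4 endpoint-detection coefficients, exactly as stated in the claim
    energy = np.sum(np.abs(frames), axis=1)                # e(i)
    zcr = np.sum(np.abs(np.diff(frames, axis=1)), axis=1)  # ZCR(i)
    return frames, energy, zcr
```

Speech/non-speech endpoints would then be decided by thresholding `energy` and `zcr` per frame; the thresholds are not specified in the claim.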
3. The speaker rapid identification method based on model growing and clustering according to claim 1, characterized in that in step S3 the characteristic parameters are MFCC parameters, and the voiceprint characteristic parameter extraction for each voiceprint signal specifically comprises the following steps:
S3.1: applying a fast Fourier transform to the voiceprint signal to obtain its energy spectrum;
S3.2: multiplying the energy spectrum by a bank of N triangular bandpass filters and computing the log energy E_k output by each filter, the N triangular bandpass filters being evenly spaced on the mel-frequency scale, where mel frequency mel(f) and ordinary frequency f are related by mel(f) = 2595·log10(1 + f/700);
S3.3: applying a discrete cosine transform to the N log energies E_k to obtain the L-order Mel-scale cepstrum, i.e. L cepstral parameters, the transform being:
C_m = Σ_{k=1}^{N} cos[m·(k - 0.5)·π/N]·E_k, m = 1, 2, ..., L;
S3.4: extracting the log energy of each voiceprint signal frame, defined as 10 times the base-10 logarithm of the sum of squares of the signal within the frame;
S3.5: extracting the delta cepstral parameters of the voiceprint signal, which represent the slope of the cepstral parameters with respect to time:
ΔC_m(t) = [Σ_{τ=-M}^{M} τ·C_m(t+τ)] / [Σ_{τ=-M}^{M} τ²] = [Σ_{τ=1}^{M} τ·(C_m(t+τ) - C_m(t-τ))] / [2·Σ_{τ=1}^{M} τ²], m = 1, 2, ..., L,
where M takes the value 2 or 3, t is the frame index, and C_m(t) is the cepstral parameter of frame t.
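Steps S3.1–S3.3 might be sketched as below for a single windowed frame. The sampling rate, filter count N and cepstrum order L are illustrative choices, the filter-bin placement is one common convention rather than the patent's own, and the frame log-energy (S3.4) and delta cepstrum (S3.5) are omitted for brevity:

```python
import numpy as np

def mfcc_frame(frame, sr=16000, n_filters=24, n_ceps=12):
    """MFCC sketch for one windowed frame: FFT energy spectrum,
    triangular mel filterbank, log energies, DCT (claim 3, S3.1-S3.3)."""
    # S3.1: FFT -> energy spectrum
    spec = np.abs(np.fft.rfft(frame)) ** 2
    # S3.2: N triangular filters evenly spaced on the mel scale,
    # mel(f) = 2595*log10(1 + f/700)
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = inv_mel(np.linspace(0.0, mel(sr / 2.0), n_filters + 2))
    bins = np.floor(len(frame) * edges / sr).astype(int)
    fbank = np.zeros((n_filters, len(spec)))
    for j in range(1, n_filters + 1):
        lo, ctr, hi = bins[j - 1], bins[j], bins[j + 1]
        fbank[j - 1, lo:ctr] = (np.arange(lo, ctr) - lo) / max(ctr - lo, 1)
        fbank[j - 1, ctr:hi] = (hi - np.arange(ctr, hi)) / max(hi - ctr, 1)
    log_e = np.log(fbank @ spec + 1e-10)  # log filter energies E_k
    # S3.3: DCT -> L cepstral parameters,
    # C_m = sum_k cos(m*(k-0.5)*pi/N) * E_k
    m = np.arange(1, n_ceps + 1)[:, None]
    k = np.arange(1, n_filters + 1)[None, :]
    return (np.cos(np.pi * m * (k - 0.5) / n_filters) * log_e).sum(axis=1)
```

The returned vector is one frame's L-dimensional cepstral parameter; per S3.4/S3.5, the frame log-energy and delta coefficients would be appended to it in a full feature extractor.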
4. The speaker rapid identification method based on model growing and clustering according to claim 1, characterized in that in step S4 the adaptive classification of all the models with the model growing and clustering algorithm specifically comprises the following steps:
S4.1: class-representative initialization:
randomly selecting one model from all the models as the first initial class representative R0;
computing in turn the approximate entropy D of each remaining model with respect to R0 until D > θ, and designating that model as the second initial class representative R1, the class-representative set then being A0 = {R0, R1}, where θ is a preset threshold;
computing the approximate entropy of each remaining model with respect to both R0 and R1, and designating a model for which both values exceed θ as the third initial class representative R2; repeating in this way until k class representatives are obtained, k being the preset number of classes, i.e. A0 = {R0, R1, ..., Rk-1}, whereupon the class-representative initialization is complete;
S4.2: class-representative authorization:
for each class ω whose member count γ exceeds γ_max, computing the density values of all member models and sorting them in descending order; the member with the highest density value is directly appointed as a new class representative, and γ_new further class representatives are then generated by the initialization method of step S4.1, the range of γ_new being determined by:
1 ≤ γ_new ≤ γ/γ_max;
authorizing all class representatives in turn and reclassifying, until no class representative is updated;
S4.3: class-representative election:
after clustering, all the models are divided into k classes; the features of each class are then retrained into a class GMM model, which serves as the representative of that class.
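The class-representative initialization of step S4.1 can be sketched as follows. Here `distance` stands in for the claim's model-to-model measure D (whose exact definition the claim leaves open), and for determinism the first representative is taken as the first model rather than a random one:

```python
def init_representatives(models, distance, theta, k):
    """S4.1 sketch: grow up to k initial class representatives.
    A model becomes a new representative only when its measure D to
    every current representative exceeds the threshold theta."""
    reps = [models[0]]  # in the claim this first pick is random
    for m in models[1:]:
        if len(reps) == k:
            break  # k representatives found; initialization complete
        if all(distance(m, r) > theta for r in reps):
            reps.append(m)
    return reps
```

After initialization, each remaining model is assigned to its nearest representative; oversized classes (γ > γ_max) are then split in the authorization step S4.2 by the same growth rule.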
5. A speaker rapid identification system based on model growing and clustering, characterized in that it comprises a client, a network connection module and a server, the client being connected to the server through the network connection module;
The client comprises:
a voiceprint acquisition module for collecting the voiceprint signals of a plurality of persons including the speaker and outputting them, through the network connection module, to the preprocessing module;
The server comprises:
a preprocessing module comprising a pre-emphasis unit, a framing unit, a windowing unit and an endpoint-detection unit connected in sequence, for applying pre-emphasis, framing, windowing and endpoint detection to each voiceprint signal in turn;
a voiceprint feature extraction module for extracting voiceprint characteristic parameters from each voiceprint signal to form a plurality of models;
an adaptive classification module for adaptively classifying all the models with the model growing and clustering algorithm, the adaptive classification comprising class-representative initialization, class-representative authorization and class-representative election;
a voiceprint identification module for computing the likelihood of the characteristic parameters of the voice signal to be identified against each class representative, selecting the class with the maximum likelihood, then computing likelihood scores against all the models in the selected class, and taking the model with the highest score as the identification result.
6. The speaker rapid identification system based on model growing and clustering according to claim 5, characterized in that the server simultaneously receives identification requests from multiple clients, creates a new thread for each identification request, and responds to each user's identification request over the wireless network.
7. The speaker rapid identification system based on model growing and clustering according to claim 5, characterized in that the client is an Android client.
CN201510563935.5A 2015-09-06 2015-09-06 Speaker rapid identification method and system based on model growing and clustering Expired - Fee Related CN105096955B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510563935.5A CN105096955B (en) 2015-09-06 2015-09-06 Speaker rapid identification method and system based on model growing and clustering

Publications (2)

Publication Number Publication Date
CN105096955A true CN105096955A (en) 2015-11-25
CN105096955B CN105096955B (en) 2019-02-01

Family

ID=54577238


Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106971711A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 A kind of adaptive method for recognizing sound-groove and system
CN107799114A (en) * 2017-04-26 2018-03-13 珠海智牧互联科技有限公司 A kind of pig cough sound recognition methods and system
CN108417217A (en) * 2018-01-11 2018-08-17 苏州思必驰信息科技有限公司 Speaker Identification network model training method, method for distinguishing speek person and system
WO2018166187A1 (en) * 2017-03-13 2018-09-20 平安科技(深圳)有限公司 Server, identity verification method and system, and a computer-readable storage medium
CN108922538A (en) * 2018-05-29 2018-11-30 平安科技(深圳)有限公司 Conferencing information recording method, device, computer equipment and storage medium
CN108922543A (en) * 2018-06-11 2018-11-30 平安科技(深圳)有限公司 Model library method for building up, audio recognition method, device, equipment and medium
CN108962229A (en) * 2018-07-26 2018-12-07 汕头大学 A kind of target speaker's voice extraction method based on single channel, unsupervised formula
CN109461441A (en) * 2018-09-30 2019-03-12 汕头大学 A kind of Activities for Teaching Intellisense method of adaptive, unsupervised formula
CN109887496A (en) * 2019-01-22 2019-06-14 浙江大学 Orientation confrontation audio generation method and system under a kind of black box scene
CN109961794A (en) * 2019-01-14 2019-07-02 湘潭大学 A kind of layering method for distinguishing speek person of model-based clustering
CN113697321A (en) * 2021-09-16 2021-11-26 安徽世绿环保科技有限公司 Garbage bag coding system for garbage classification station

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5857169A (en) * 1995-08-28 1999-01-05 U.S. Philips Corporation Method and system for pattern recognition based on tree organized probability densities
CN1403953A (en) * 2002-09-06 2003-03-19 浙江大学 Palm acoustic-print verifying system
CN101226742A (en) * 2007-12-05 2008-07-23 浙江大学 Method for recognizing sound-groove based on affection compensation
CN102194455A (en) * 2010-03-17 2011-09-21 博石金(北京)信息技术有限公司 Voiceprint identification method irrelevant to speak content
EP2808866A1 (en) * 2013-05-31 2014-12-03 Nuance Communications, Inc. Method and apparatus for automatic speaker-based speech clustering
CN104732972A (en) * 2015-03-12 2015-06-24 广东外语外贸大学 HMM voiceprint recognition signing-in method and system based on grouping statistics


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xiong Huaqiao, "Research on speaker recognition methods based on model clustering", China Master's Theses Full-text Database *



Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190201

Termination date: 20190906