CN105989843A - Method and device of realizing missing feature reconstruction - Google Patents
- Publication number: CN105989843A
- Application number: CN201510044910.4A
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
Abstract
The present invention discloses a method and device for realizing missing feature reconstruction. The method comprises: obtaining in advance the weights, mean vectors and covariance matrices of two or more Gaussian mixture clusters; dividing the test speech into two or more speech frames and, for each speech frame of the test speech, using the IMCRA algorithm to calculate the signal-to-noise ratio of each dimension of the speech frame; dividing the speech frame into a reliable part and an unreliable part according to the signal-to-noise ratio of each dimension; determining, according to the unreliable part, that the speech frame needs missing feature reconstruction; and performing missing feature reconstruction on the unreliable part of the speech frame according to the weights, mean vectors and covariance matrices of the Gaussian mixture clusters and the reliable part of the speech frame. The scheme of the present invention improves the precision of missing feature reconstruction.
Description
Technical field
The present invention relates to voiceprint recognition (VPR, Voiceprint Recognition) technology, and in particular to a method and device for realizing missing feature reconstruction.
Background art
The recognition performance of a voiceprint recognition system drops sharply as environmental noise increases. To improve the recognition rate in noisy environments, missing feature reconstruction, an effective front-end processing method from the field of speech recognition, has been applied to voiceprint recognition and has achieved good results under experimental conditions.
The existing method for realizing missing feature reconstruction generally comprises:
Dividing the test speech into two or more speech frames; for each speech frame of the test speech, using the spectral subtraction algorithm to obtain the training speech frame corresponding to the speech frame, and calculating the signal-to-noise ratio of each dimension of the speech frame according to the obtained training speech frame; dividing the speech frame into a reliable part and an unreliable part according to the signal-to-noise ratio of each dimension; obtaining the weight, mean vector and covariance matrix corresponding to each training speech frame; obtaining, from these, the weight, mean vector and covariance matrix corresponding to the speech frame of the test speech; and, according to the weight, mean vector and covariance matrix corresponding to the speech frame and the reliable part of the speech frame, using maximum a posteriori estimation to perform missing feature reconstruction on the unreliable part of the speech frame.
In the existing method, the spectral subtraction algorithm assumes by default that the noise is stationary, whereas real noise is non-stationary, so the missing feature reconstruction has a large error.
Summary of the invention
To solve the above problem, the present invention proposes a method and device for realizing missing feature reconstruction, which can reduce the error and thereby improve the precision of missing feature reconstruction.
To achieve the above object, the present invention proposes a method for realizing missing feature reconstruction, comprising:
obtaining in advance the weights, mean vectors and covariance matrices of two or more Gaussian mixture clusters;
dividing the test speech into two or more speech frames and, for each speech frame of the test speech, using the improved minima controlled recursive averaging (IMCRA) algorithm to calculate the signal-to-noise ratio of each dimension of the speech frame;
dividing the speech frame of the test speech into a reliable part and an unreliable part according to the signal-to-noise ratio of each dimension of the speech frame; and, having judged according to the unreliable part that the speech frame needs missing feature reconstruction, performing missing feature reconstruction on the unreliable part of the speech frame according to the obtained weights, mean vectors and covariance matrices of the Gaussian mixture clusters and the reliable part of the speech frame.
Preferably, when it is judged according to the unreliable part of the speech frame of the test speech that the speech frame does not need missing feature reconstruction, the method further comprises: discarding the speech frame of the test speech.
Preferably, obtaining in advance the weights, mean vectors and covariance matrices of two or more Gaussian mixture clusters comprises:
obtaining two or more training speech signals in advance and dividing each training speech signal into two or more speech frames;
obtaining the Mel-domain log power spectrum feature vector of each speech frame of the training speech, and obtaining the weight, mean vector and covariance matrix of each Gaussian mixture cluster according to the Mel-domain log power spectrum feature vectors of the speech frames of the training speech.
Preferably, obtaining the Mel-domain log power spectrum feature vector of each speech frame of the training speech comprises:
applying a Fourier transform to the speech frame of the training speech and taking the modulus of the transformed speech frame to obtain the amplitude spectrum of the speech frame;
squaring the amplitude spectrum of the speech frame to obtain the power spectrum of the speech frame;
passing the power spectrum of the speech frame through a Mel filter bank to obtain the Mel-domain power spectrum feature vector of the speech frame, and taking the logarithm of the Mel-domain power spectrum feature vector to obtain the Mel-domain log power spectrum feature vector of the speech frame.
Preferably, obtaining the weight, mean vector and covariance matrix of each Gaussian mixture cluster according to the Mel-domain log power spectrum feature vectors of the speech frames of the training speech comprises:
setting the number of Gaussian mixture clusters, and initializing the mean vector, covariance matrix and weight of each Gaussian mixture cluster;
using a Gaussian mixture clustering algorithm to obtain the weight, mean vector and covariance matrix of each Gaussian mixture cluster from the initialized mean vectors, covariance matrices and weights and the Mel-domain log power spectrum feature vectors of the speech frames of the training speech.
Preferably, using the IMCRA algorithm to calculate the signal-to-noise ratio of each dimension of the speech frame of the test speech comprises:
obtaining the Mel-domain power spectrum feature vector of the speech frame of the test speech;
using the IMCRA algorithm to calculate the signal-to-noise ratio of each dimension of the speech frame according to the Mel-domain power spectrum feature vector of the speech frame.
Preferably, obtaining the Mel-domain power spectrum feature vector of the speech frame of the test speech comprises:
applying a Fourier transform to the speech frame of the test speech and taking the modulus of the transformed speech frame to obtain the amplitude spectrum of the speech frame;
squaring the amplitude spectrum of the speech frame to obtain the power spectrum of the speech frame, and passing the power spectrum through a Mel filter bank to obtain the Mel-domain power spectrum feature vector of the speech frame of the test speech.
Preferably, using the IMCRA algorithm to calculate the signal-to-noise ratio of each dimension of the speech frame of the test speech according to the Mel-domain power spectrum feature vector of the speech frame comprises:
calculating the noise power of each dimension of the speech frame of the test speech according to the formula
$$D^2(\lambda, k_2) = \alpha_d(\lambda, k_2)\, D^2(\lambda - 1, k_2) + [1 - \alpha_d(\lambda, k_2)]\, Y^2(\lambda, k_2)$$
and calculating the signal-to-noise ratio of each dimension of the speech frame of the test speech according to the formula
$$\mathrm{SNR}(\lambda, k_2) = 20\log_{10}\big(Y(\lambda, k_2) - D(\lambda, k_2)\big) - 20\log_{10} D(\lambda, k_2)$$
where $D^2(\lambda, k_2)$ is the value of the k2-th dimension of the Mel-domain noise power of the λ-th speech frame of the test speech; k2 is the dimension index of the Mel-domain power spectrum feature vector of the speech frame; λ is the speech frame index of the test speech; $\alpha_d$ is the smoothing parameter; Y is the value of the k2-th dimension of the Mel-domain power spectrum feature vector of the speech frame; and $Y^2(\lambda, k_2)$ is the value of the k2-th dimension of the Mel-domain power spectrum feature vector of the λ-th speech frame of the test speech.
Preferably, dividing the speech frame of the test speech into a reliable part and an unreliable part according to the signal-to-noise ratio of each dimension of the speech frame comprises:
when the signal-to-noise ratio of a dimension of the speech frame is judged to be greater than a preset threshold, determining that this dimension belongs to the reliable part of the speech frame;
when the signal-to-noise ratio of a dimension of the speech frame is judged to be less than or equal to the preset threshold, determining that this dimension belongs to the unreliable part of the speech frame.
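The thresholding rule above can be sketched as a boolean mask over the dimensions of one frame. This is a minimal illustration, not the patent's implementation: the function name `reliable_mask` and the default threshold of 0 dB are hypothetical, since the source only speaks of a "preset threshold".

```python
import numpy as np

def reliable_mask(snr_frame, threshold_db=0.0):
    """Return True for reliable dimensions (SNR > threshold), False otherwise.

    threshold_db is a hypothetical default; the source gives no value.
    """
    return np.asarray(snr_frame, dtype=float) > threshold_db

# Per-dimension SNR values (dB) of one frame; a value equal to the
# threshold counts as unreliable, as in the rule above.
mask = reliable_mask([5.0, -3.0, 0.0, 12.0])
```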
Preferably, before missing feature reconstruction is performed on the unreliable part of the speech frame of the test speech, the method further comprises: judging, according to the unreliable part of the speech frame, whether the speech frame needs missing feature reconstruction, comprising:
when the ratio of the number of dimensions in the unreliable part of the speech frame to the total number of dimensions of the speech frame is greater than or equal to a preset ratio, judging that the speech frame needs missing feature reconstruction;
when the ratio of the number of dimensions in the unreliable part of the speech frame to the total number of dimensions of the speech frame is less than the preset ratio, judging that the speech frame does not need missing feature reconstruction.
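The ratio test above reduces to counting unreliable dimensions. The sketch below assumes the reliable/unreliable split is available as a boolean mask (True for reliable); the function name and the default `preset_ratio` of 0.5 are hypothetical, as the source does not give a value.

```python
import numpy as np

def needs_reconstruction(mask, preset_ratio=0.5):
    """Apply the ratio rule: unreliable / total >= preset_ratio."""
    mask = np.asarray(mask, dtype=bool)
    unreliable_ratio = np.count_nonzero(~mask) / mask.size
    return unreliable_ratio >= preset_ratio

half_unreliable = needs_reconstruction([True, False, False, True])   # ratio 0.5
mostly_reliable = needs_reconstruction([True, True, True, False])    # ratio 0.25
```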
Preferably, performing missing feature reconstruction on the unreliable part of the speech frame of the test speech according to the obtained weights, mean vectors and covariance matrices of the Gaussian mixture clusters and the reliable part of the speech frame comprises:
determining the Gaussian mixture cluster to which the speech frame of the test speech belongs according to the weight, mean vector and covariance matrix of each Gaussian mixture cluster;
performing missing feature reconstruction on the unreliable part of the speech frame according to the weight, mean vector and covariance matrix of the Gaussian mixture cluster to which the speech frame belongs and the formula
$$\hat{X}_m = U_{km} + \theta_{kmo}\,\theta_{koo}^{-1}\,(X_o - U_{ko})$$
where $\hat{X}_m$ is the unreliable part of the speech frame of the test speech; $X_o$ is the reliable part of the speech frame; $U_{km}$ is the part of the mean vector of the Gaussian mixture cluster to which the speech frame belongs that corresponds to the unreliable part of the speech frame; $U_{ko}$ is the part of that mean vector that corresponds to the reliable part of the speech frame; $\theta_{kmo}$ is the matrix formed by the elements of the covariance matrix of that Gaussian mixture cluster at the intersections of the rows corresponding to the unreliable part and the columns corresponding to the reliable part; and $\theta_{koo}$ is the matrix formed by the elements of that covariance matrix at the intersections of the rows corresponding to the reliable part and the columns corresponding to the reliable part.
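The quantities defined above match the conditional mean of a Gaussian, so the reconstruction can be sketched as follows. This assumes the reconstruction formula is the conditional-mean form, that is, mean of the unreliable part plus θ_kmo θ_koo⁻¹ times the deviation of the reliable part from its mean; the function and argument names are hypothetical.

```python
import numpy as np

def reconstruct_unreliable(x, mask, mean, cov):
    """Replace unreliable dimensions of x by their Gaussian conditional mean.

    x    : feature vector of one frame
    mask : True for reliable dimensions, False for unreliable ones
    mean : mean vector of the cluster the frame belongs to
    cov  : covariance matrix of that cluster
    """
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    o = np.where(mask)[0]        # reliable (observed) indices
    m = np.where(~mask)[0]       # unreliable (missing) indices
    cov_mo = cov[np.ix_(m, o)]   # rows: unreliable, columns: reliable
    cov_oo = cov[np.ix_(o, o)]   # rows and columns: reliable
    x_hat = x.copy()
    # solve() avoids forming the explicit inverse of cov_oo
    x_hat[m] = mean[m] + cov_mo @ np.linalg.solve(cov_oo, x[o] - mean[o])
    return x_hat

# Two correlated dimensions; the second one is unreliable.
x_hat = reconstruct_unreliable(
    x=[2.0, 99.0], mask=[True, False],
    mean=[0.0, 0.0], cov=[[1.0, 0.5], [0.5, 1.0]],
)
```

In this toy case the unreliable value 99.0 is replaced by 0 + 0.5 · (2 − 0) = 1.0, while the reliable dimension is left untouched.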
Preferably, determining the Gaussian mixture cluster to which the speech frame of the test speech belongs according to the weight, mean vector and covariance matrix of each Gaussian mixture cluster comprises:
determining the Gaussian mixture cluster to which the speech frame of the test speech belongs according to the formula
$$\hat{k}_4 = \arg\max_{k_4} P(X \mid \lambda_{k_4})$$
where X is the Mel-domain log power spectrum feature vector of the speech frame of the test speech; $\lambda_{k_4}$ is the k4-th Gaussian mixture cluster; $\hat{k}_4$ is the value of k4 corresponding to the maximum likelihood value; P is the likelihood value between X and $\lambda_{k_4}$; and argmax gives the value of k4 at which P is maximal;
and where
$$P(X \mid \lambda_{k_4}) = \omega_{k_4} \prod_{y} \frac{1}{\sqrt{2\pi\,\sigma_{k_4,y}}} \exp\!\left(-\frac{(X_y - \mu_{k_4,y})^2}{2\,\sigma_{k_4,y}}\right)$$
where y is one dimension of X; $\omega_{k_4}$ is the weight of the Gaussian mixture cluster $\lambda_{k_4}$; $\mu_{k_4,y}$ is the mean of $\lambda_{k_4}$ corresponding to dimension y; and $\sigma_{k_4,y}$ is the diagonal value of the covariance matrix of $\lambda_{k_4}$ corresponding to dimension y.
The likelihood values $P(X \mid \lambda_{k_4})$ of X under each Gaussian mixture cluster are compared, and the Gaussian mixture cluster $\lambda_{k_4}$ with the largest likelihood value is selected.
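The cluster selection can be sketched as an argmax over per-cluster likelihoods. This is an illustration under stated assumptions: the likelihood is taken to be the cluster weight times a diagonal-covariance Gaussian density, it is evaluated in the log domain for numerical stability, and the optional `dims` argument (restricting the evaluation to chosen dimensions, for example the reliable ones) is a hypothetical addition not described in the source.

```python
import numpy as np

def select_cluster(x, weights, means, variances, dims=None):
    """Return the index of the cluster with the largest (log-)likelihood.

    weights   : (M,) cluster weights
    means     : (M, d) cluster mean vectors
    variances : (M, d) diagonal entries of the covariance matrices
    dims      : optional index array of dimensions to use (default: all)
    """
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    if dims is None:
        dims = np.arange(x.size)
    diff = x[dims] - means[:, dims]
    var = variances[:, dims]
    # log of: weight * product over dimensions of 1-D Gaussian densities
    log_lik = (np.log(weights)
               - 0.5 * np.sum(np.log(2.0 * np.pi * var), axis=1)
               - 0.5 * np.sum(diff ** 2 / var, axis=1))
    return int(np.argmax(log_lik)), log_lik

k_hat, log_lik = select_cluster(
    x=[9.0, 11.0],
    weights=[0.5, 0.5],
    means=[[0.0, 0.0], [10.0, 10.0]],
    variances=[[1.0, 1.0], [1.0, 1.0]],
)
```

Because the logarithm is monotonic, the argmax over log-likelihoods selects the same cluster as the argmax over likelihoods.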
The present invention also proposes a device for realizing missing feature reconstruction, comprising at least:
an acquisition module, configured to obtain in advance the weights, mean vectors and covariance matrices of two or more Gaussian mixture clusters;
a computing module, configured to divide the test speech into two or more speech frames and, for each speech frame of the test speech, use the improved minima controlled recursive averaging (IMCRA) algorithm to calculate the signal-to-noise ratio of each dimension of the speech frame;
a reconstruction module, configured to divide the speech frame of the test speech into a reliable part and an unreliable part according to the signal-to-noise ratio of each dimension of the speech frame and, having judged according to the unreliable part that the speech frame needs missing feature reconstruction, perform missing feature reconstruction on the unreliable part of the speech frame according to the obtained weights, mean vectors and covariance matrices of the Gaussian mixture clusters and the reliable part of the speech frame.
Preferably, the reconstruction module is further configured to:
discard the speech frame of the test speech when it is judged according to the unreliable part of the speech frame that the speech frame does not need missing feature reconstruction.
Preferably, the acquisition module is specifically configured to:
obtain two or more training speech signals in advance and divide each training speech signal into two or more speech frames; obtain the Mel-domain log power spectrum feature vector of each speech frame of the training speech; and obtain the weight, mean vector and covariance matrix of each Gaussian mixture cluster according to the Mel-domain log power spectrum feature vectors of the speech frames of the training speech.
Preferably, the computing module is specifically configured to:
divide the test speech into two or more speech frames and, for each speech frame of the test speech, obtain the Mel-domain power spectrum feature vector of the speech frame; and use the IMCRA algorithm to calculate the signal-to-noise ratio of each dimension of the speech frame according to the Mel-domain power spectrum feature vector of the speech frame.
Preferably, the reconstruction module is specifically configured to:
when the signal-to-noise ratio of a dimension of the speech frame of the test speech is judged to be greater than a preset threshold, determine that this dimension belongs to the reliable part of the speech frame;
when the signal-to-noise ratio of a dimension of the speech frame of the test speech is judged to be less than or equal to the preset threshold, determine that this dimension belongs to the unreliable part of the speech frame;
having judged according to the unreliable part of the speech frame that the speech frame needs missing feature reconstruction, perform missing feature reconstruction on the unreliable part of the speech frame according to the obtained weights, mean vectors and covariance matrices of the Gaussian mixture clusters and the reliable part of the speech frame.
Preferably, the reconstruction module is specifically configured to:
divide the speech frame of the test speech into a reliable part and an unreliable part according to the signal-to-noise ratio of each dimension of the speech frame;
when the ratio of the number of dimensions in the unreliable part of the speech frame to the total number of dimensions of the speech frame is greater than or equal to a preset ratio, judge that the speech frame needs missing feature reconstruction;
perform missing feature reconstruction on the unreliable part of the speech frame according to the obtained weights, mean vectors and covariance matrices of the Gaussian mixture clusters and the reliable part of the speech frame.
Preferably, the reconstruction module is specifically configured to:
divide the speech frame of the test speech into a reliable part and an unreliable part according to the signal-to-noise ratio of each dimension of the speech frame, and judge according to the unreliable part that the speech frame needs missing feature reconstruction;
determine the Gaussian mixture cluster to which the speech frame of the test speech belongs according to the weight, mean vector and covariance matrix of each Gaussian mixture cluster;
perform missing feature reconstruction on the unreliable part of the speech frame according to the weight, mean vector and covariance matrix of the Gaussian mixture cluster to which the speech frame belongs and the formula
$$\hat{X}_m = U_{km} + \theta_{kmo}\,\theta_{koo}^{-1}\,(X_o - U_{ko})$$
where $\hat{X}_m$ is the unreliable part of the speech frame of the test speech; $X_o$ is the reliable part of the speech frame; $U_{km}$ is the part of the mean vector of the Gaussian mixture cluster to which the speech frame belongs that corresponds to the unreliable part of the speech frame; $U_{ko}$ is the part of that mean vector that corresponds to the reliable part of the speech frame; $\theta_{kmo}$ is the matrix formed by the elements of the covariance matrix of that Gaussian mixture cluster at the intersections of the rows corresponding to the unreliable part and the columns corresponding to the reliable part; and $\theta_{koo}$ is the matrix formed by the elements of that covariance matrix at the intersections of the rows corresponding to the reliable part and the columns corresponding to the reliable part.
Compared with the prior art, the present invention comprises: obtaining in advance the weights, mean vectors and covariance matrices of two or more Gaussian mixture clusters; dividing the test speech into two or more speech frames and, for each speech frame of the test speech, using the IMCRA algorithm to calculate the signal-to-noise ratio of each dimension of the speech frame; dividing the speech frame into a reliable part and an unreliable part according to the signal-to-noise ratio of each dimension; and, having judged according to the unreliable part that the speech frame needs missing feature reconstruction, performing missing feature reconstruction on the unreliable part of the speech frame according to the obtained weights, mean vectors and covariance matrices of the Gaussian mixture clusters and the reliable part of the speech frame. Because the IMCRA algorithm can track noise effectively in both stationary and non-stationary noise environments, the scheme of the present invention improves the precision of missing feature reconstruction in both kinds of environment, and thereby improves the recognition rate of the voiceprint recognition system.
Brief description of the drawings
The drawings of the embodiments of the present invention are described below. The drawings provide a further understanding of the invention and, together with the description, serve to explain it; they do not limit the scope of protection of the invention.
Fig. 1 is a flow chart of the method for realizing missing feature reconstruction of the present invention;
Fig. 2 is a schematic structural diagram of the device for realizing missing feature reconstruction of the present invention.
Detailed description of the invention
To facilitate understanding by those skilled in the art, the present invention is further described below with reference to the drawings; this description is not intended to limit the scope of protection of the invention. It should be noted that, provided there is no conflict, the embodiments of this application and the various modes in the embodiments may be combined with each other.
Referring to Fig. 1, the present invention proposes a method for realizing missing feature reconstruction, comprising:
Step 100: obtain in advance the weights, mean vectors and covariance matrices of two or more Gaussian mixture clusters. Specifically:
Obtain two or more training speech signals in advance and divide each training speech signal into two or more speech frames; obtain the Mel-domain log power spectrum feature vector of each speech frame of the training speech; and obtain the weight, mean vector and covariance matrix of each Gaussian mixture cluster according to the Mel-domain log power spectrum feature vectors of the speech frames of the training speech.
When the training speech is divided into two or more speech frames, the length of each speech frame may be between 20 milliseconds (ms) and 30 ms, and adjacent speech frames of the training speech may overlap by 25% to 50%.
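The framing described above can be sketched as follows. This is a minimal illustration: the function name is hypothetical, the 25 ms frame length and 50% overlap are arbitrary picks from the ranges given in the text, and the 8000 Hz sampling rate is the typical value mentioned later in the description.

```python
import numpy as np

def split_into_frames(signal, sample_rate=8000, frame_ms=25, overlap=0.5):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    signal = np.asarray(signal, dtype=float)
    frame_len = int(sample_rate * frame_ms / 1000)   # samples per frame
    hop = int(frame_len * (1.0 - overlap))           # step between frame starts
    if signal.size < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (signal.size - frame_len) // hop
    # Row r selects samples r*hop .. r*hop + frame_len - 1
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

# One second of dummy samples at 8 kHz: 200-sample frames, 100-sample hop.
frames = split_into_frames(np.arange(8000), frame_ms=25, overlap=0.5)
```

Trailing samples that do not fill a complete frame are dropped in this sketch; padding the last frame is an equally valid choice.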
Obtaining the Mel-domain log power spectrum feature vector of each speech frame of the training speech comprises:
Applying a Fourier transform to the speech frame of the training speech; taking the modulus of the transformed speech frame to obtain the amplitude spectrum of the speech frame; squaring the amplitude spectrum to obtain the power spectrum of the speech frame; passing the power spectrum through a Mel filter bank to obtain the Mel-domain power spectrum feature vector of the speech frame; and taking the logarithm of the Mel-domain power spectrum feature vector to obtain the Mel-domain log power spectrum feature vector of the speech frame.
The Mel filter bank may consist of triangular filters or hyperbolic filters.
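The feature extraction chain above (Fourier transform, modulus, square, Mel filter bank, logarithm) can be sketched as below. The function name, the use of the natural logarithm, and the small `eps` floor that keeps the logarithm defined are assumptions; the filter bank is passed in as a ready-made matrix with one filter per row.

```python
import numpy as np

def mel_power_features(frames, mel_filters, eps=1e-10):
    """Return (mel_power, log_mel_power) for frames of shape (n_frames, n).

    mel_filters has shape (K, n // 2 + 1): one filter per row, one column
    per FFT bin of the one-sided spectrum.
    """
    spectrum = np.fft.rfft(frames, axis=1)   # Fourier transform of each frame
    magnitude = np.abs(spectrum)             # modulus -> amplitude spectrum
    power = magnitude ** 2                   # square -> power spectrum
    mel_power = power @ mel_filters.T        # Mel filter bank -> (n_frames, K)
    log_mel_power = np.log(mel_power + eps)  # logarithm (natural log assumed)
    return mel_power, log_mel_power

# Toy check: constant frames have all their energy in FFT bin 0.
toy_frames = np.ones((2, 8))
toy_filters = np.eye(5)[:3]                  # 3 "filters", each picking one bin
mel_power, log_mel_power = mel_power_features(toy_frames, toy_filters)
```

The same function covers the test speech: the Mel-domain power spectrum is the first return value, and the log version is only needed for clustering and cluster selection.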
The triangular filters may be designed as follows.
The time-domain frequency is converted to the Mel-domain frequency according to formula (1):
$$f_{mel} = 2595\,\log_{10}\!\left(1 + \frac{f}{700}\right) \tag{1}$$
where $f$ is the time-domain frequency and $f_{mel}$ is the Mel-domain frequency.
The maximum frequency of the speech signal is then calculated:
$$f_g = \frac{f_s}{2} \tag{2}$$
where $f_s$ is the sampling frequency, typically 8000 Hz, and $f_g$ is the maximum of the original frequency of the speech signal.
Combining formula (1) and formula (2), the maximum Mel-domain frequency of the speech is obtained by setting $f = f_g$:
$$f_{max} = 2595\,\log_{10}\!\left(1 + \frac{f_g}{700}\right) \tag{3}$$
where $f_{max}$ is the maximum Mel-domain frequency.
For a K-order Mel-domain power spectrum feature vector:
$$\Delta mel = \frac{f_{max}}{K + 1} \tag{4}$$
Using the inverse function of formula (1), the vector $M = \{\Delta mel, 2\Delta mel, 3\Delta mel, \ldots, (K+1)\Delta mel\}$ is converted to time-domain frequencies, giving $f_{center} = \{f_1, f_2, f_3, \ldots, f_{K+1}\}$; finally, $f_{center}$ is used to design the triangular filters. The designed triangular filters must have a gain of 1. The center frequency of each triangle is $f_n$, $n \in \{1, \ldots, K\}$; the bandwidth to the left of the center frequency is $f_n - f_{n-1}$ and the bandwidth to the right is $f_{n+1} - f_n$. The left bandwidth of the first triangle is $f_1$.
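The filter design above can be sketched as follows. Several points are assumptions rather than statements from the source: the Mel conversion uses the common form 2595·log10(1 + f/700), the Mel spacing is taken as f_max/(K + 1) so that the last edge coincides with f_s/2, "gain of 1" is read as a peak value of 1 at each center frequency, and all function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    """Common Mel conversion (assumed form of formula (1))."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def triangular_mel_filters(n_filters, n_fft, sample_rate=8000):
    """Return (filters, edges): K triangular filters over the FFT bins."""
    f_g = sample_rate / 2.0                        # maximum signal frequency
    delta_mel = hz_to_mel(f_g) / (n_filters + 1)   # assumed Mel spacing
    # edges[0] = 0 Hz, edges[n] = f_n for n = 1 .. K + 1
    edges = mel_to_hz(delta_mel * np.arange(n_filters + 2))
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    filters = np.zeros((n_filters, bin_freqs.size))
    for n in range(1, n_filters + 1):
        left, center, right = edges[n - 1], edges[n], edges[n + 1]
        rising = (bin_freqs - left) / (center - left)     # left slope
        falling = (right - bin_freqs) / (right - center)  # right slope
        # Triangle = lower of the two slopes, clipped to [0, 1]
        filters[n - 1] = np.clip(np.minimum(rising, falling), 0.0, 1.0)
    return filters, edges

filters, edges = triangular_mel_filters(n_filters=8, n_fft=256)
```

Each row of `filters` can be applied to a one-sided power spectrum with a matrix product to obtain one Mel-domain dimension.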
Obtaining the weight, mean vector and covariance matrix of each Gaussian mixture cluster according to the Mel-domain log power spectrum feature vectors of the speech frames of the training speech comprises:
Setting the number of Gaussian mixture clusters; initializing the mean vector, covariance matrix and weight of each Gaussian mixture cluster; and using a Gaussian mixture clustering algorithm to obtain the weight, mean vector and covariance matrix of each Gaussian mixture cluster from the initialized mean vectors, covariance matrices and weights and the Mel-domain log power spectrum feature vectors of the speech frames of the training speech.
The number of Gaussian mixture clusters may be set empirically to 128.
The mean vector of each Gaussian mixture cluster may be initialized with the LBG vector quantization (VQ, Vector Quantization) algorithm. Its implementation is common knowledge to those skilled in the art, does not limit the scope of protection of the invention, and is not described further here.
The covariance matrix of each Gaussian mixture cluster may be initialized with random numbers between 0 and 2.
When the weights of the Gaussian mixture clusters are randomly initialized, the weights of all Gaussian mixture clusters must sum to 1.
The Gaussian mixture clustering algorithm may be the EM algorithm. The EM algorithm is an existing algorithm, and its concrete implementation does not limit the scope of protection of the invention. The EM algorithm is implemented as follows:
For each Gaussian mixture cluster, formula (5) to formula (9) are executed in a loop,
where i is the index of the speech frame of the training speech; N1 is the Gaussian function; $\omega_{k1}$ is the weight of the k1-th Gaussian mixture cluster; $x_i$ is the Mel-domain log power spectrum feature vector of the i-th speech frame of the training speech; $\mu_{k1}$ is the mean vector of the k1-th Gaussian mixture cluster; $\theta_{k1}$ is the covariance matrix of the k1-th Gaussian mixture cluster; $\mu_j$ is the mean vector of the j-th Gaussian mixture cluster; $\theta_j$ is the covariance matrix of the j-th Gaussian mixture cluster; j and k1 are indices of Gaussian mixture clusters; M is the number of Gaussian mixture clusters; and n is the number of speech frames of the training speech.
The number of loop iterations can be preset; the more iterations, the higher the precision. For example, it may be set to 10.
The Gaussian function is
$$N1(X; U, \theta) = \frac{1}{(2\pi)^{d/2}\,|\theta|^{1/2}} \exp\!\left(-\frac{1}{2}(X - U)^{T}\,\theta^{-1}\,(X - U)\right)$$
where X is the Mel-domain log power spectrum feature vector of a speech frame of the training speech; U is the mean vector of the Gaussian mixture cluster; θ is the covariance matrix of the Gaussian mixture cluster; and d is the dimension of the Mel-domain log power spectrum feature vector of the speech frame of the training speech.
The 128 groups of ω, μ and θ obtained after the 10 iterations are saved; these parameters are taken to represent the common characteristics of human voices.
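The training loop can be illustrated with the textbook EM updates for a Gaussian mixture. This is a sketch under stated assumptions: formulas (5) to (9) are not reproduced in this text, so the updates below are the standard ones and may differ in form from the patent's; the small `reg` term added to each covariance matrix and all function names are hypothetical. The demo uses 2 clusters on synthetic 2-D data instead of 128 clusters on speech features.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Multivariate Gaussian density evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    inv = np.linalg.inv(cov)
    expo = -0.5 * np.einsum("ij,jk,ik->i", diff, inv, diff)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(expo) / norm

def em_gmm(X, means, covs, weights, n_iter=10, reg=1e-6):
    """Run n_iter standard EM iterations; returns (weights, means, covs)."""
    X = np.asarray(X, dtype=float)
    means = np.array(means, dtype=float)
    covs = np.array(covs, dtype=float)
    weights = np.array(weights, dtype=float)
    n, d = X.shape
    M = weights.size
    for _ in range(n_iter):
        # E-step: responsibility of each cluster for each frame
        dens = np.stack(
            [weights[k] * gaussian_pdf(X, means[k], covs[k]) for k in range(M)],
            axis=1,
        )
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate means, covariances and weights
        nk = resp.sum(axis=0)
        means = (resp.T @ X) / nk[:, None]
        for k in range(M):
            diff = X - means[k]
            covs[k] = (resp[:, k, None] * diff).T @ diff / nk[k] + reg * np.eye(d)
        weights = nk / n
    return weights, means, covs

# Synthetic demo: two well-separated 2-D clusters.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(10.0, 1.0, (200, 2))])
w, mu, theta = em_gmm(
    data,
    means=[[1.0, 1.0], [9.0, 9.0]],
    covs=[np.eye(2), np.eye(2)],
    weights=[0.5, 0.5],
    n_iter=10,
)
```

For high-dimensional features, evaluating densities in the log domain would be more robust against underflow than the direct form used here.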
Step 101: divide the test speech into two or more speech frames and, for each speech frame of the test speech, use the improved minima controlled recursive averaging (IMCRA, Improved Minima Controlled Recursive Averaging) algorithm to calculate the signal-to-noise ratio of each dimension of the speech frame.
In this step, one dimension of a speech frame of the test speech refers to one subband of the Mel-domain power spectrum feature vector extracted from the speech frame. For example, when the Mel-domain power spectrum feature vector of the test speech is [2, 3, 4], the speech frame has three dimensions, with values 2, 3 and 4 respectively.
In this step, when the test speech is divided into two or more speech frames, the length of each speech frame may be between 20 ms and 30 ms, and adjacent speech frames of the test speech may overlap by 25% to 50%.
In this step, using the IMCRA algorithm to calculate the signal-to-noise ratio
of each dimension of a speech frame of the test speech includes:
obtaining the Mel-domain power spectrum feature vector of the speech frame of
the test speech, and using the IMCRA algorithm to calculate, from that feature
vector, the signal-to-noise ratio of each dimension of the speech frame.
Obtaining the Mel-domain power spectrum feature vector of a speech frame of the
test speech includes:
applying a Fourier transform to the speech frame of the test speech; taking the
modulus of the transformed speech frame to obtain its amplitude spectrum;
squaring the amplitude spectrum to obtain the power spectrum of the speech
frame; and passing the power spectrum through a Mel comb filter to obtain the
Mel-domain power spectrum feature vector of the speech frame of the test speech.
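The chain Fourier transform, modulus, square, Mel filtering can be sketched as below. This is only an illustration under stated assumptions: a naive DFT stands in for an FFT, and `toy_filter_bank` is a hypothetical three-band set of weights, not a real Mel filter design, which the text does not specify.

```python
import cmath


def power_spectrum(frame):
    """Naive DFT, then modulus (amplitude spectrum), then square (power spectrum).

    Only bins 0..N/2 are kept, since the input is real-valued.
    """
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(frame))
        amplitude = abs(coeff)
        spectrum.append(amplitude ** 2)
    return spectrum


def mel_power_features(frame, filter_bank):
    """Apply each filter (a list of per-bin weights) to the power spectrum."""
    spec = power_spectrum(frame)
    return [sum(w * p for w, p in zip(weights, spec)) for weights in filter_bank]


# A pure tone that falls exactly on DFT bin 2 of an 8-sample frame.
frame = [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]
# Hypothetical overlapping filters over the 5 retained bins.
toy_filter_bank = [
    [1.0, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.5, 1.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.5, 1.0],
]
features = mel_power_features(frame, toy_filter_bank)
```

Each entry of `features` is one dimension (subband) of the frame in the sense used in this step. For the training speech, the logarithm of these values would additionally be taken.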
Using the IMCRA algorithm to calculate, from the Mel-domain power spectrum
feature vector of a speech frame of the test speech, the signal-to-noise ratio
of each dimension of the speech frame includes:
calculating the noise power of each dimension of the speech frame according to
the formula
$D^2(\lambda,k2)=\alpha_d(\lambda,k2)\,D^2(\lambda-1,k2)+[1-\alpha_d(\lambda,k2)]\,Y^2(\lambda,k2)$
and calculating the signal-to-noise ratio of each dimension of the speech frame
according to the formula
$SNR(\lambda,k2)=20\log_{10}\big(Y(\lambda,k2)-D(\lambda,k2)\big)-20\log_{10}D(\lambda,k2)$
where $D^2(\lambda,k2)$ is the value of the k2-th dimension of the Mel-domain
noise power of the λ-th speech frame of the test speech, k2 is the dimension
index of the Mel-domain power spectrum feature vector of the speech frame, λ is
the speech frame index of the test speech, $\alpha_d$ is the smoothing
parameter, Y is the value of the k2-th dimension of the Mel-domain power
spectrum feature vector of the speech frame, and $Y^2(\lambda,k2)$ is the value
of the k2-th dimension of the Mel-domain power spectrum feature vector of the
λ-th speech frame of the test speech.
where
$\alpha_d(\lambda,k2)=\alpha+(1-\alpha)\,p(\lambda,k2)$ (11)
where α is a constant and p(λ, k2) is the probability that speech is present in
the k2-th dimension of the λ-th speech frame of the test speech.
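The noise recursion, formula (11), and the signal-to-noise ratio formula can be sketched together as follows. This is a hedged illustration: the speech presence probability is taken as an input rather than computed, the default `alpha=0.85` and the `floor` guard against taking the logarithm of a non-positive number are assumptions not stated in the text, and the function names are hypothetical.

```python
import math


def update_noise_power(prev_noise_power, observed_power, speech_prob, alpha=0.85):
    """One recursion step for D^2(lambda, k2) in a single dimension.

    alpha_d follows formula (11); alpha=0.85 is an assumed constant.
    """
    alpha_d = alpha + (1.0 - alpha) * speech_prob
    return alpha_d * prev_noise_power + (1.0 - alpha_d) * observed_power


def snr_db(observed_amp, noise_amp, floor=1e-10):
    """SNR = 20*log10(Y - D) - 20*log10(D), with an assumed floor for safety."""
    diff = max(observed_amp - noise_amp, floor)
    return 20.0 * math.log10(diff) - 20.0 * math.log10(max(noise_amp, floor))


# Speech certainly present: the noise estimate is frozen.
noise_when_speech = update_noise_power(1.0, 4.0, speech_prob=1.0)
# Speech certainly absent: the estimate moves toward the observed power.
noise_when_silence = update_noise_power(1.0, 4.0, speech_prob=0.0)
example_snr = snr_db(observed_amp=11.0, noise_amp=1.0)
```

The higher the speech presence probability, the closer $\alpha_d$ is to 1 and the less the current frame contributes to the noise estimate, which is what keeps speech energy out of the tracked noise.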
where q(λ, k2) is the probability that speech is absent in the k2-th dimension
of the λ-th speech frame of the test speech, γ(λ, k2) is the a posteriori
signal-to-noise ratio at the k2-th dimension of the Mel-domain power spectrum
feature vector of the λ-th speech frame of the test speech, and ζ(λ, k2) is the
a priori signal-to-noise ratio at the k2-th dimension of the Mel-domain power
spectrum feature vector of the λ-th speech frame of the test speech.
where $B_{min}$ is the bias factor and $S_{min}$ is the minimum of the values
S(λ, k2-1-k3) through S(λ, k2-1). k3 can be preset.
where
$S(\lambda,k)=\alpha_S\,S(\lambda-1,k)+(1-\alpha_S)\,S_f(\lambda,k)$ (15)
where $\alpha_S$ is a constant smoothing factor, and w(i), used in computing
$S_f(\lambda,k)$, is a Hanning window function of window length $2L_w+1$.
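A sketch of the two smoothing stages follows. Formula (15) is taken from the text; the form of $S_f$ is not legible in the source, so the version here (a normalized Hanning-weighted average over neighbouring dimensions, with indices clamped at the edges) follows the usual IMCRA formulation and should be read as an assumption. The values `half_width=1` and `alpha_s=0.9` are likewise illustrative.

```python
import math


def hann_window(half_width):
    """Normalized Hanning window of length 2*half_width + 1 (nonzero endpoints)."""
    size = 2 * half_width + 1
    raw = [0.5 - 0.5 * math.cos(2.0 * math.pi * (i + 1) / (size + 1))
           for i in range(size)]
    total = sum(raw)
    return [v / total for v in raw]


def smooth_across_dims(power, half_width=1):
    """Assumed S_f: window-weighted average of neighbouring dimensions."""
    w = hann_window(half_width)
    n = len(power)
    out = []
    for k in range(n):
        acc = 0.0
        for i in range(-half_width, half_width + 1):
            idx = min(max(k - i, 0), n - 1)  # clamp at the edges (assumption)
            acc += w[i + half_width] * power[idx]
        out.append(acc)
    return out


def smooth_over_time(prev_s, s_f, alpha_s=0.9):
    """Formula (15): first-order recursive averaging across frames."""
    return [alpha_s * p + (1.0 - alpha_s) * f for p, f in zip(prev_s, s_f)]


s_f = smooth_across_dims([0.0, 4.0, 0.0], half_width=1)
s = smooth_over_time([1.0, 1.0, 1.0], s_f, alpha_s=0.9)
```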
where $\gamma_1$ is a constant threshold.
where $\gamma_0$ and $\zeta_0$ are constant thresholds.
Step 102: according to the signal-to-noise ratio of each dimension of the speech
frame of the test speech, divide the speech frame into a reliable part and an
unreliable part; determine, from the unreliable part of the speech frame, that
the speech frame needs missing feature reconstruction; and perform missing
feature reconstruction on the unreliable part of the speech frame according to
the obtained weight, mean vector and covariance matrix of each Gaussian mixture
cluster and the reliable part of the speech frame.
In this step, when it is determined from the unreliable part of the speech frame
of the test speech that the speech frame does not need missing feature
reconstruction, the speech frame of the test speech is discarded.
In this step, dividing the speech frame of the test speech into a reliable part
and an unreliable part according to the signal-to-noise ratio of each dimension
of the speech frame includes:
when the signal-to-noise ratio of a dimension of the speech frame is greater
than a preset threshold, determining that this dimension belongs to the reliable
part of the speech frame; when the signal-to-noise ratio of a dimension of the
speech frame is less than or equal to the preset threshold, determining that
this dimension belongs to the unreliable part of the speech frame.
The speech frame of the test speech may be divided into the reliable part and
the unreliable part by labelling: a label value M(λ, k2) is assigned to each
dimension by comparing SNR(λ, k2) with L, where L is the preset threshold and
M(λ, k2) is the label value.
Determining, from the unreliable part of the speech frame of the test speech,
whether the speech frame needs missing feature reconstruction includes:
when the ratio of the number of dimensions in the unreliable part of the speech
frame to the total number of dimensions of the speech frame is greater than or
equal to a preset ratio, determining that the speech frame needs missing feature
reconstruction; when this ratio is less than the preset ratio, determining that
the speech frame does not need missing feature reconstruction.
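The labelling and the ratio test can be sketched as follows. The boolean encoding of the label (True for reliable, False for unreliable), the threshold of 0 dB, and the preset ratio of 0.5 are illustrative assumptions; the text fixes only the comparisons themselves.

```python
def reliability_mask(snr_per_dim, threshold):
    """Label each dimension: True if SNR > threshold (reliable), else False."""
    return [snr > threshold for snr in snr_per_dim]


def needs_reconstruction(mask, preset_ratio):
    """Reconstruct when unreliable dims / total dims >= preset_ratio.

    Per the text, a frame for which this returns False is discarded.
    """
    unreliable = sum(1 for reliable in mask if not reliable)
    return unreliable / len(mask) >= preset_ratio


# Hypothetical per-dimension SNR values (dB) for one 6-dimensional frame.
snr = [12.0, -3.0, 8.0, 0.0, 15.0, -1.0]
mask = reliability_mask(snr, threshold=0.0)
decision = needs_reconstruction(mask, preset_ratio=0.5)
```

Note that a dimension whose signal-to-noise ratio equals the threshold is labelled unreliable, matching the "less than or equal to" condition above.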
In this step, performing missing feature reconstruction on the unreliable part
of the speech frame of the test speech according to the obtained weight, mean
vector and covariance matrix of each Gaussian mixture cluster and the reliable
part of the speech frame includes:
determining, according to the weight, mean vector and covariance matrix of each
Gaussian mixture cluster, the Gaussian mixture cluster to which the speech frame
belongs, and performing missing feature reconstruction on the unreliable part of
the speech frame according to the weight, mean vector and covariance matrix of
that Gaussian mixture cluster and the formula
$\hat{X}_m=U_{km}+\theta_{kmo}\,\theta_{koo}^{-1}\,(X_o-U_{ko})$
where $\hat{X}_m$ is the unreliable part of the speech frame of the test speech;
$X_o$ is the reliable part of the speech frame; $U_{km}$ is the part of the mean
vector of the Gaussian mixture cluster to which the speech frame belongs that
corresponds to the unreliable part of the speech frame; $U_{ko}$ is the part of
that mean vector that corresponds to the reliable part of the speech frame;
$\theta_{kmo}$ is the matrix formed by the elements of the covariance matrix of
that Gaussian mixture cluster at the intersections of the rows corresponding to
the unreliable part and the columns corresponding to the reliable part; and
$\theta_{koo}$ is the matrix formed by the elements of that covariance matrix at
the intersections of the rows and the columns that both correspond to the
reliable part.
That is, once the Gaussian mixture cluster to which the speech frame of the test
speech belongs has been determined, the mean vector of that cluster can be
rearranged as $U_k=[U_{ko},U_{km}]$ and its covariance matrix can be rearranged
as $\theta_k=\begin{bmatrix}\theta_{koo}&\theta_{kom}\\ \theta_{kmo}&\theta_{kmm}\end{bmatrix}$.
For example, suppose the speech frame of the test speech has 6 dimensions in
total, of which dimensions 1, 3 and 5 form the reliable part and dimensions 2, 4
and 6 form the unreliable part. Then $U_{ko}$ consists of dimensions 1, 3 and 5
of the mean vector of the Gaussian mixture cluster to which the speech frame
belongs; $\theta_{kmo}$ is the matrix formed by the elements at the
intersections of rows 2, 4, 6 and columns 1, 3, 5 of the covariance matrix of
that cluster; and $\theta_{koo}$ is the matrix formed by the elements at the
intersections of rows 1, 3, 5 and columns 1, 3, 5 of that covariance matrix.
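The six-dimensional example can be worked through with the conditional-mean form $\hat{X}_m=U_{km}+\theta_{kmo}\theta_{koo}^{-1}(X_o-U_{ko})$. The sketch below is illustrative only: the cluster mean, the block-structured covariance, and the observed values are made-up numbers, the helper names are hypothetical, and $\theta_{koo}$ is assumed to be invertible.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def reconstruct_frame(x, mask, mean, cov):
    """Replace unreliable dimensions of x by their conditional mean."""
    obs = [i for i, ok in enumerate(mask) if ok]       # reliable indices
    mis = [i for i, ok in enumerate(mask) if not ok]   # unreliable indices
    cov_oo = [[cov[r][c] for c in obs] for r in obs]   # theta_koo
    residual = [x[i] - mean[i] for i in obs]           # X_o - U_ko
    z = solve(cov_oo, residual)                        # theta_koo^{-1} (X_o - U_ko)
    out = list(x)
    for m in mis:
        # Row m of theta_kmo times z, added to the matching entry of U_km.
        out[m] = mean[m] + sum(cov[m][o] * zi for o, zi in zip(obs, z))
    return out


# Dimensions 1, 3, 5 (1-based) reliable; 2, 4, 6 unreliable.
mask = [True, False, True, False, True, False]
mean = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# Made-up covariance: each unreliable dim correlates with the dim before it.
cov = [[0.0] * 6 for _ in range(6)]
for i in range(6):
    cov[i][i] = 1.0
for i in (0, 2, 4):
    cov[i][i + 1] = cov[i + 1][i] = 0.5
x = [2.0, 0.0, 3.0, 0.0, 7.0, 0.0]  # values at unreliable dims are ignored
reconstructed = reconstruct_frame(x, mask, mean, cov)
```

Selecting rows and columns by index, as done here, is equivalent to the rearrangement of the mean vector and covariance matrix into reliable and unreliable blocks.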
Determining, according to the weight, mean vector and covariance matrix of each
Gaussian mixture cluster, the Gaussian mixture cluster to which the speech frame
of the test speech belongs includes:
determining the Gaussian mixture cluster to which the speech frame belongs
according to the formula $\hat{k}=\arg\max_{k4}P(X\mid\lambda_{k4})$
where X is the Mel-domain log power spectrum feature vector of the speech frame
of the test speech, $\lambda_{k4}$ is the k4-th Gaussian mixture cluster,
$\hat{k}$ is the value of k4 corresponding to the maximum likelihood value, P is
the likelihood value between X and $\lambda_{k4}$, and argmax gives the value of
k4 at which P is maximal.
In the expression for $P(X\mid\lambda_{k4})$, y denotes one dimension of X,
$\omega_{k4}$ is the weight of the Gaussian mixture cluster $\lambda_{k4}$,
$\mu_{k4,y}$ is the mean of $\lambda_{k4}$ corresponding to dimension y, and
$\sigma_{k4,y}$ is the diagonal value of the covariance matrix of the Gaussian
mixture cluster $\lambda_{k4}$ corresponding to dimension y.
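Cluster selection by maximum likelihood can be sketched as below. The exact expression for $P(X\mid\lambda_{k4})$ is not legible in the source; this sketch assumes a weighted diagonal-covariance Gaussian, which is consistent with the per-dimension mean and diagonal covariance values defined above, and it works in the log domain for numerical stability. All numbers and function names are hypothetical.

```python
import math


def log_likelihood(x, weight, mean, var):
    """log of weight * product over dims of a 1-D Gaussian (assumed form)."""
    total = math.log(weight)
    for xy, mu, s in zip(x, mean, var):
        total += -0.5 * math.log(2.0 * math.pi * s) - (xy - mu) ** 2 / (2.0 * s)
    return total


def select_cluster(x, weights, means, variances):
    """Return the index k4 whose cluster gives the largest likelihood."""
    scores = [log_likelihood(x, w, m, v)
              for w, m, v in zip(weights, means, variances)]
    return max(range(len(scores)), key=scores.__getitem__)


# Two hypothetical clusters in a 2-dimensional feature space.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [1.0, 2.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
best = select_cluster([0.9, 2.1], weights, means, variances)
```

Because the logarithm is monotonic, maximizing the log-likelihood selects the same cluster as maximizing P itself.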
With the method of the present invention, because the IMCRA algorithm can track
noise effectively in non-stationary noise environments, the precision of missing
feature reconstruction is improved.
Referring to Fig. 2, the present invention also proposes a device for realizing
missing feature reconstruction, which at least includes:
an acquisition module, configured to obtain in advance the weights, mean vectors
and covariance matrices of two or more Gaussian mixture clusters;
a computing module, configured to divide the test speech into two or more speech
frames and, for each speech frame of the test speech, use the IMCRA algorithm to
calculate the signal-to-noise ratio of each dimension of the speech frame;
a reconstruction module, configured to divide the speech frame of the test
speech into a reliable part and an unreliable part according to the
signal-to-noise ratio of each dimension of the speech frame, determine from the
unreliable part of the speech frame that the speech frame needs missing feature
reconstruction, and perform missing feature reconstruction on the unreliable
part of the speech frame according to the obtained weight, mean vector and
covariance matrix of each Gaussian mixture cluster and the reliable part of the
speech frame.
In the device of the present invention, the reconstruction module is further
configured to:
discard the speech frame of the test speech when it is determined from the
unreliable part of the speech frame that the speech frame does not need missing
feature reconstruction.
In the device of the present invention, the acquisition module is specifically
configured to:
obtain two or more training speeches in advance and divide each training speech
into two or more speech frames; obtain the Mel-domain log power spectrum feature
vector of each speech frame of each training speech; and obtain the weight, mean
vector and covariance matrix of each Gaussian mixture cluster from the
Mel-domain log power spectrum feature vectors of the speech frames of the
training speeches.
In the device of the present invention, the computing module is specifically
configured to:
divide the test speech into two or more speech frames; for each speech frame of
the test speech, obtain the Mel-domain power spectrum feature vector of the
speech frame; and use the IMCRA algorithm to calculate, from that feature
vector, the signal-to-noise ratio of each dimension of the speech frame.
In the device of the present invention, the reconstruction module is
specifically configured to:
when the signal-to-noise ratio of a dimension of the speech frame of the test
speech is greater than a preset threshold, determine that this dimension belongs
to the reliable part of the speech frame;
when the signal-to-noise ratio of a dimension of the speech frame of the test
speech is less than or equal to the preset threshold, determine that this
dimension belongs to the unreliable part of the speech frame;
determine from the unreliable part of the speech frame that the speech frame
needs missing feature reconstruction, and perform missing feature reconstruction
on the unreliable part of the speech frame according to the obtained weight,
mean vector and covariance matrix of each Gaussian mixture cluster and the
reliable part of the speech frame.
In the device of the present invention, the reconstruction module is
specifically configured to:
divide the speech frame of the test speech into a reliable part and an
unreliable part according to the signal-to-noise ratio of each dimension of the
speech frame;
when the ratio of the number of dimensions in the unreliable part of the speech
frame to the total number of dimensions of the speech frame is greater than or
equal to a preset ratio, determine that the speech frame needs missing feature
reconstruction;
perform missing feature reconstruction on the unreliable part of the speech
frame according to the obtained weight, mean vector and covariance matrix of
each Gaussian mixture cluster and the reliable part of the speech frame.
In the device of the present invention, the reconstruction module is
specifically configured to:
divide the speech frame of the test speech into a reliable part and an
unreliable part according to the signal-to-noise ratio of each dimension of the
speech frame, and determine from the unreliable part of the speech frame that
the speech frame needs missing feature reconstruction;
determine, according to the weight, mean vector and covariance matrix of each
Gaussian mixture cluster, the Gaussian mixture cluster to which the speech frame
belongs;
perform missing feature reconstruction on the unreliable part of the speech
frame according to the weight, mean vector and covariance matrix of the Gaussian
mixture cluster to which the speech frame belongs and the formula
$\hat{X}_m=U_{km}+\theta_{kmo}\,\theta_{koo}^{-1}\,(X_o-U_{ko})$;
where $\hat{X}_m$ is the unreliable part of the speech frame of the test speech;
$X_o$ is the reliable part of the speech frame; $U_{km}$ is the part of the mean
vector of the Gaussian mixture cluster to which the speech frame belongs that
corresponds to the unreliable part of the speech frame; $U_{ko}$ is the part of
that mean vector that corresponds to the reliable part of the speech frame;
$\theta_{kmo}$ is the matrix formed by the elements of the covariance matrix of
that Gaussian mixture cluster at the intersections of the rows corresponding to
the unreliable part and the columns corresponding to the reliable part; and
$\theta_{koo}$ is the matrix formed by the elements of that covariance matrix at
the intersections of the rows and the columns that both correspond to the
reliable part.
It should be noted that the embodiments described above are intended only to
help those skilled in the art understand the invention and do not limit its
scope of protection. Any obvious substitution or improvement made by those
skilled in the art without departing from the inventive concept of the present
invention falls within the scope of protection of the present invention.
Claims (19)
1. A method for realizing missing feature reconstruction, characterized by
comprising:
obtaining in advance the weights, mean vectors and covariance matrices of two or
more Gaussian mixture clusters;
dividing the test speech into two or more speech frames and, for each speech
frame of the test speech, using the improved minima controlled recursive
averaging (IMCRA) algorithm to calculate the signal-to-noise ratio of each
dimension of the speech frame;
dividing the speech frame of the test speech into a reliable part and an
unreliable part according to the signal-to-noise ratio of each dimension of the
speech frame, determining from the unreliable part of the speech frame that the
speech frame needs missing feature reconstruction, and performing missing
feature reconstruction on the unreliable part of the speech frame according to
the obtained weight, mean vector and covariance matrix of each Gaussian mixture
cluster and the reliable part of the speech frame.
2. The method according to claim 1, characterized in that, when it is determined
from the unreliable part of the speech frame of the test speech that the speech
frame does not need missing feature reconstruction, the method further
comprises: discarding the speech frame of the test speech.
3. The method according to claim 1 or 2, characterized in that obtaining in
advance the weights, mean vectors and covariance matrices of two or more
Gaussian mixture clusters comprises:
obtaining two or more training speeches in advance and dividing each training
speech into two or more speech frames;
obtaining the Mel-domain log power spectrum feature vector of each speech frame
of each training speech, and obtaining the weight, mean vector and covariance
matrix of each Gaussian mixture cluster from the Mel-domain log power spectrum
feature vectors of the speech frames of the training speeches.
4. The method according to claim 3, characterized in that obtaining the
Mel-domain log power spectrum feature vector of each speech frame of each
training speech comprises:
applying a Fourier transform to the speech frame of the training speech, and
taking the modulus of the transformed speech frame to obtain the amplitude
spectrum of the speech frame of the training speech;
squaring the amplitude spectrum of the speech frame of the training speech to
obtain the power spectrum of the speech frame;
passing the power spectrum of the speech frame of the training speech through a
Mel comb filter to obtain the Mel-domain power spectrum feature vector of the
speech frame, and taking the logarithm of the Mel-domain power spectrum feature
vector to obtain the Mel-domain log power spectrum feature vector of the speech
frame of the training speech.
5. The method according to claim 3, characterized in that obtaining the weight,
mean vector and covariance matrix of each Gaussian mixture cluster from the
Mel-domain log power spectrum feature vectors of the speech frames of the
training speeches comprises:
setting the number of Gaussian mixture clusters and initializing the mean
vector, covariance matrix and weight of each Gaussian mixture cluster;
obtaining the weight, mean vector and covariance matrix of each Gaussian mixture
cluster by applying a Gaussian mixture clustering algorithm to the initialized
mean vector, covariance matrix and weight of each Gaussian mixture cluster and
the Mel-domain log power spectrum feature vectors of the speech frames of the
training speeches.
6. The method according to claim 1 or 2, characterized in that using the IMCRA
algorithm to calculate the signal-to-noise ratio of each dimension of the speech
frame of the test speech comprises:
obtaining the Mel-domain power spectrum feature vector of the speech frame of
the test speech;
using the IMCRA algorithm to calculate, from the Mel-domain power spectrum
feature vector of the speech frame of the test speech, the signal-to-noise ratio
of each dimension of the speech frame.
7. The method according to claim 6, characterized in that obtaining the
Mel-domain power spectrum feature vector of the speech frame of the test speech
comprises:
applying a Fourier transform to the speech frame of the test speech, and taking
the modulus of the transformed speech frame to obtain the amplitude spectrum of
the speech frame of the test speech;
squaring the amplitude spectrum of the speech frame of the test speech to obtain
the power spectrum of the speech frame, and passing the power spectrum through a
Mel comb filter to obtain the Mel-domain power spectrum feature vector of the
speech frame of the test speech.
8. The method according to claim 6, characterized in that using the IMCRA
algorithm to calculate, from the Mel-domain power spectrum feature vector of the
speech frame of the test speech, the signal-to-noise ratio of each dimension of
the speech frame comprises:
calculating the noise power of each dimension of the speech frame of the test
speech according to the formula
$D^2(\lambda,k2)=\alpha_d(\lambda,k2)\,D^2(\lambda-1,k2)+[1-\alpha_d(\lambda,k2)]\,Y^2(\lambda,k2)$
and calculating the signal-to-noise ratio of each dimension of the speech frame
of the test speech according to the formula
$SNR(\lambda,k2)=20\log_{10}\big(Y(\lambda,k2)-D(\lambda,k2)\big)-20\log_{10}D(\lambda,k2)$;
where $D^2(\lambda,k2)$ is the value of the k2-th dimension of the Mel-domain
noise power of the λ-th speech frame of the test speech, k2 is the dimension
index of the Mel-domain power spectrum feature vector of the speech frame of the
test speech, λ is the speech frame index of the test speech, $\alpha_d$ is the
smoothing parameter, Y is the value of the k2-th dimension of the Mel-domain
power spectrum feature vector of the speech frame of the test speech, and
$Y^2(\lambda,k2)$ is the value of the k2-th dimension of the Mel-domain power
spectrum feature vector of the λ-th speech frame of the test speech.
9. The method according to claim 1 or 2, characterized in that dividing the
speech frame of the test speech into a reliable part and an unreliable part
according to the signal-to-noise ratio of each dimension of the speech frame
comprises:
when the signal-to-noise ratio of a dimension of the speech frame of the test
speech is greater than a preset threshold, determining that this dimension
belongs to the reliable part of the speech frame of the test speech;
when the signal-to-noise ratio of a dimension of the speech frame of the test
speech is less than or equal to the preset threshold, determining that this
dimension belongs to the unreliable part of the speech frame of the test speech.
10. The method according to claim 1 or 2, characterized in that, before missing
feature reconstruction is performed on the unreliable part of the speech frame
of the test speech, the method further comprises: determining, from the
unreliable part of the speech frame of the test speech, whether the speech frame
needs missing feature reconstruction, including:
when the ratio of the number of dimensions in the unreliable part of the speech
frame of the test speech to the total number of dimensions of the speech frame
is greater than or equal to a preset ratio, determining that the speech frame
needs missing feature reconstruction;
when the ratio of the number of dimensions in the unreliable part of the speech
frame of the test speech to the total number of dimensions of the speech frame
is less than the preset ratio, determining that the speech frame does not need
missing feature reconstruction.
11. The method according to claim 1 or 2, characterized in that performing
missing feature reconstruction on the unreliable part of the speech frame of the
test speech according to the obtained weight, mean vector and covariance matrix
of each Gaussian mixture cluster and the reliable part of the speech frame
comprises:
determining, according to the weight, mean vector and covariance matrix of each
Gaussian mixture cluster, the Gaussian mixture cluster to which the speech frame
of the test speech belongs;
performing missing feature reconstruction on the unreliable part of the speech
frame of the test speech according to the weight, mean vector and covariance
matrix of the Gaussian mixture cluster to which the speech frame belongs and the
formula $\hat{X}_m=U_{km}+\theta_{kmo}\,\theta_{koo}^{-1}\,(X_o-U_{ko})$;
where $\hat{X}_m$ is the unreliable part of the speech frame of the test speech;
$X_o$ is the reliable part of the speech frame; $U_{km}$ is the part of the mean
vector of the Gaussian mixture cluster to which the speech frame belongs that
corresponds to the unreliable part of the speech frame; $U_{ko}$ is the part of
that mean vector that corresponds to the reliable part of the speech frame;
$\theta_{kmo}$ is the matrix formed by the elements of the covariance matrix of
that Gaussian mixture cluster at the intersections of the rows corresponding to
the unreliable part and the columns corresponding to the reliable part; and
$\theta_{koo}$ is the matrix formed by the elements of that covariance matrix at
the intersections of the rows and the columns that both correspond to the
reliable part.
12. The method according to claim 11, characterized in that determining,
according to the weight, mean vector and covariance matrix of each Gaussian
mixture cluster, the Gaussian mixture cluster to which the speech frame of the
test speech belongs comprises:
determining the Gaussian mixture cluster to which the speech frame of the test
speech belongs according to the formula
$\hat{k}=\arg\max_{k4}P(X\mid\lambda_{k4})$;
where X is the Mel-domain log power spectrum feature vector of the speech frame
of the test speech, $\lambda_{k4}$ is the k4-th Gaussian mixture cluster,
$\hat{k}$ is the value of k4 corresponding to the maximum likelihood value, P is
the likelihood value between X and $\lambda_{k4}$, and argmax gives the value of
k4 at which P is maximal;
where, in the expression for $P(X\mid\lambda_{k4})$, y denotes one dimension of
X, $\omega_{k4}$ is the weight of the Gaussian mixture cluster $\lambda_{k4}$,
$\mu_{k4,y}$ is the mean of $\lambda_{k4}$ corresponding to dimension y, and
$\sigma_{k4,y}$ is the diagonal value of the covariance matrix of $\lambda_{k4}$
corresponding to dimension y;
the likelihood values $P(X\mid\lambda_{k4})$ of X under each Gaussian mixture
cluster are compared, and the Gaussian mixture cluster $\lambda_{k4}$ with the
largest likelihood value is selected.
13. A device for realizing missing feature reconstruction, characterized by at
least comprising:
an acquisition module, configured to obtain in advance the weights, mean vectors
and covariance matrices of two or more Gaussian mixture clusters;
a computing module, configured to divide the test speech into two or more speech
frames and, for each speech frame of the test speech, use the improved minima
controlled recursive averaging (IMCRA) algorithm to calculate the
signal-to-noise ratio of each dimension of the speech frame;
a reconstruction module, configured to divide the speech frame of the test
speech into a reliable part and an unreliable part according to the
signal-to-noise ratio of each dimension of the speech frame, determine from the
unreliable part of the speech frame that the speech frame needs missing feature
reconstruction, and perform missing feature reconstruction on the unreliable
part of the speech frame according to the obtained weight, mean vector and
covariance matrix of each Gaussian mixture cluster and the reliable part of the
speech frame.
14. The device according to claim 13, characterized in that the reconstruction
module is further configured to:
discard the speech frame of the test speech when it is determined from the
unreliable part of the speech frame that the speech frame does not need missing
feature reconstruction.
15. The device according to claim 13 or 14, characterized in that the
acquisition module is specifically configured to:
obtain two or more training speeches in advance and divide each training speech
into two or more speech frames; obtain the Mel-domain log power spectrum feature
vector of each speech frame of each training speech; and obtain the weight, mean
vector and covariance matrix of each Gaussian mixture cluster from the
Mel-domain log power spectrum feature vectors of the speech frames of the
training speeches.
16. The device according to claim 13 or 14, characterized in that the computing
module is specifically configured to:
divide the test speech into two or more speech frames; for each speech frame of
the test speech, obtain the Mel-domain power spectrum feature vector of the
speech frame; and use the IMCRA algorithm to calculate, from the Mel-domain
power spectrum feature vector of the speech frame, the signal-to-noise ratio of
each dimension of the speech frame of the test speech.
17. The device according to claim 13 or 14, characterized in that the
reconstruction module is specifically configured to:
when the signal-to-noise ratio of a dimension of the speech frame of the test
speech is greater than a preset threshold, determine that this dimension belongs
to the reliable part of the speech frame of the test speech;
when the signal-to-noise ratio of a dimension of the speech frame of the test
speech is less than or equal to the preset threshold, determine that this
dimension belongs to the unreliable part of the speech frame of the test speech;
determine from the unreliable part of the speech frame that the speech frame
needs missing feature reconstruction, and perform missing feature reconstruction
on the unreliable part of the speech frame according to the obtained weight,
mean vector and covariance matrix of each Gaussian mixture cluster and the
reliable part of the speech frame.
18. The device according to claim 13 or 14, characterized in that the reconstruction module
is specifically configured to:
divide the speech frame of the test speech into a reliable part and an unreliable part
according to the signal-to-noise ratio of each dimension of the speech frame;
when the ratio of the number of dimensions in the unreliable part of the speech frame of
the test speech to the total number of dimensions of the speech frame is greater than or
equal to a preset ratio, judge that the speech frame of the test speech needs missing
feature reconstruction;
perform missing feature reconstruction on the unreliable part of the speech frame of the
test speech according to the obtained weight, mean vector, and covariance matrix of each
Gaussian mixture cluster and the reliable part of the speech frame of the test speech.
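The decision rule of claim 18 compares the share of unreliable dimensions with a preset ratio. The sketch below illustrates that comparison; the function name and the default ratio of 0.3 are hypothetical, since the claim does not give a value.

```python
import numpy as np

def needs_reconstruction(reliable, min_unreliable_ratio=0.3):
    """Decide whether a frame needs missing feature reconstruction.

    reliable is a boolean mask over the frame's dimensions (True means
    reliable). The frame is reconstructed when the fraction of
    unreliable dimensions is greater than or equal to the preset ratio.
    """
    reliable = np.asarray(reliable, dtype=bool)
    if reliable.size == 0:
        raise ValueError("frame has no dimensions")
    ratio = np.count_nonzero(~reliable) / reliable.size
    return ratio >= min_unreliable_ratio
```

For example, a frame with two unreliable dimensions out of four has a ratio of 0.5, so it is reconstructed under a preset ratio of 0.5 but not under 0.6.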
19. The device according to claim 13 or 14, characterized in that the reconstruction module
is specifically configured to:
divide the speech frame of the test speech into a reliable part and an unreliable part
according to the signal-to-noise ratio of each dimension of the speech frame, and judge,
according to the unreliable part of the speech frame of the test speech, that the speech
frame needs missing feature reconstruction;
determine the Gaussian mixture cluster to which the speech frame of the test speech belongs
according to the weight, mean vector, and covariance matrix of each Gaussian mixture
cluster;
perform missing feature reconstruction on the unreliable part of the speech frame of the
test speech according to the weight, mean vector, and covariance matrix of the Gaussian
mixture cluster to which the speech frame belongs and the formula
$\hat{X}_m = U_{km} + \theta_{kmo}\,\theta_{koo}^{-1}\,(X_o - U_{ko})$;
wherein $\hat{X}_m$ is the unreliable part of the speech frame of the test speech; $X_o$ is
the reliable part of the speech frame of the test speech; $U_{km}$ is the part of the mean
vector of the Gaussian mixture cluster to which the speech frame belongs that corresponds
to the unreliable part of the speech frame; $U_{ko}$ is the part of that mean vector that
corresponds to the reliable part of the speech frame; $\theta_{kmo}$ is the matrix formed,
in the covariance matrix of the Gaussian mixture cluster to which the speech frame belongs,
by the elements at the intersections of the rows corresponding to the unreliable part of
the speech frame and the columns corresponding to the reliable part of the speech frame;
and $\theta_{koo}$ is the matrix formed, in that covariance matrix, by the elements at the
intersections of the rows corresponding to the reliable part of the speech frame and the
columns corresponding to the reliable part of the speech frame.
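The reconstruction of claim 19 can be sketched as follows, under two labeled assumptions. First, the formula is taken to be the standard Gaussian conditional mean, $\hat{X}_m = U_{km} + \theta_{kmo}\,\theta_{koo}^{-1}\,(X_o - U_{ko})$, which is the form consistent with the symbol definitions in the claim. Second, the claim does not say how the cluster is chosen, so this sketch picks the cluster with the largest weighted likelihood of the reliable dimensions. The function names are hypothetical.

```python
import numpy as np

def log_gaussian(x, mean, cov):
    """Log density of a multivariate Gaussian at x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (x.size * np.log(2.0 * np.pi) + logdet + maha)

def reconstruct_frame(x, reliable, weights, means, covs):
    """Replace the unreliable dimensions of x by their conditional mean."""
    x = np.asarray(x, dtype=float)
    o = np.asarray(reliable, dtype=bool)   # reliable (observed) dimensions
    m = ~o                                 # unreliable (missing) dimensions
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    if not m.any() or not o.any():
        # Nothing to rebuild, or nothing reliable to condition on.
        return x.copy()
    # Assumed cluster choice: maximize weight times the marginal
    # likelihood of the reliable part of the frame.
    scores = [np.log(w) + log_gaussian(x[o], mu[o], cov[np.ix_(o, o)])
              for w, mu, cov in zip(weights, means, covs)]
    k = int(np.argmax(scores))
    theta_mo = covs[k][np.ix_(m, o)]       # rows: unreliable, cols: reliable
    theta_oo = covs[k][np.ix_(o, o)]       # rows and cols: reliable
    x_hat = x.copy()
    # Solve a linear system instead of forming the explicit inverse.
    x_hat[m] = means[k][m] + theta_mo @ np.linalg.solve(theta_oo, x[o] - means[k][o])
    return x_hat
```

Using `np.linalg.solve` rather than an explicit matrix inverse is a numerical choice of this sketch, not something the claim specifies.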
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510044910.4A CN105989843A (en) | 2015-01-28 | 2015-01-28 | Method and device of realizing missing feature reconstruction |
PCT/CN2015/093901 WO2016119501A1 (en) | 2015-01-28 | 2015-11-05 | Method and apparatus for implementing missing feature reconstruction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510044910.4A CN105989843A (en) | 2015-01-28 | 2015-01-28 | Method and device of realizing missing feature reconstruction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105989843A (en) | 2016-10-05 |
Family
ID=56542342
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510044910.4A Withdrawn CN105989843A (en) | 2015-01-28 | 2015-01-28 | Method and device of realizing missing feature reconstruction |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN105989843A (en) |
WO (1) | WO2016119501A1 (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1475987A (en) * | 2003-07-14 | 2004-02-18 | 中国科学院声学研究所 | Hiaden Markov model edge decipher data reconstitution method f speech sound identification |
CN1571012A (en) * | 2003-07-11 | 2005-01-26 | 中国科学院声学研究所 | Method for rebuilding probability weighted average deletion characteristic data of speech recognition |
CN101236742A (en) * | 2008-03-03 | 2008-08-06 | 中兴通讯股份有限公司 | Music/ non-music real-time detection method and device |
WO2009123387A1 (en) * | 2008-03-31 | 2009-10-08 | Transono Inc. | Procedure for processing noisy speech signals, and apparatus and computer program therefor |
CN101853661A (en) * | 2010-05-14 | 2010-10-06 | 中国科学院声学研究所 | Noise spectrum estimation and voice mobility detection method based on unsupervised learning |
CN102820033A (en) * | 2012-08-17 | 2012-12-12 | 南京大学 | Voiceprint identification method |
CN103456310A (en) * | 2013-08-28 | 2013-12-18 | 大连理工大学 | Transient noise suppression method based on spectrum estimation |
CN103778920A (en) * | 2014-02-12 | 2014-05-07 | 北京工业大学 | Speech enhancing and frequency response compensation fusion method in digital hearing-aid |
CN104143327A (en) * | 2013-07-10 | 2014-11-12 | 腾讯科技(深圳)有限公司 | Acoustic model training method and device |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20050071656A (en) * | 2002-11-05 | 2005-07-07 | 코닌클리케 필립스 일렉트로닉스 엔.브이. | Spectrogram reconstruction by means of a codebook |
EP1918910B1 (en) * | 2006-10-31 | 2009-03-11 | Harman Becker Automotive Systems GmbH | Model-based enhancement of speech signals |
CN103650040B (en) * | 2011-05-16 | 2017-08-25 | 谷歌公司 | Use the noise suppressing method and device of multiple features modeling analysis speech/noise possibility |
US9786275B2 (en) * | 2012-03-16 | 2017-10-10 | Yale University | System and method for anomaly detection and extraction |
2015
- 2015-01-28: CN application CN201510044910.4A, patent CN105989843A, not active (withdrawn)
- 2015-11-05: WO application PCT/CN2015/093901, patent WO2016119501A1, active (application filing)
Non-Patent Citations (2)
Title |
---|
尹海明,王金明,李欢欢: "基于信噪比估计的说话人识别前端处理", 《军事通信技术》 * |
王宁: "基于缺失特征重建的说话人识别", 《万方学术期刊数据库》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106653056A (en) * | 2016-11-16 | 2017-05-10 | 中国科学院自动化研究所 | Fundamental frequency extraction model based on LSTM recurrent neural network and training method thereof |
CN106653056B (en) * | 2016-11-16 | 2020-04-24 | 中国科学院自动化研究所 | Fundamental frequency extraction model and training method based on LSTM recurrent neural network |
CN108899032A (en) * | 2018-06-06 | 2018-11-27 | 平安科技(深圳)有限公司 | Method for recognizing sound-groove, device, computer equipment and storage medium |
WO2020034593A1 (en) * | 2018-08-13 | 2020-02-20 | 平安科技(深圳)有限公司 | Method and apparatus for processing missing feature in crowd performance feature prediction |
Also Published As
Publication number | Publication date |
---|---|
WO2016119501A1 (en) | 2016-08-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20161005 |