CN102708871A - Line spectrum-to-parameter dimensional reduction quantizing method based on conditional Gaussian mixture model - Google Patents

Line spectrum-to-parameter dimensional reduction quantizing method based on conditional Gaussian mixture model

Info

Publication number
CN102708871A
Authority
CN
China
Prior art keywords
parameter
sequence
dimension
vector
sigma
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012101400303A
Other languages
Chinese (zh)
Inventor
陈立伟
汤春明
廖艳萍
刘晴晴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Engineering University
Priority to CN2012101400303A
Publication of CN102708871A
Legal status: Pending

Abstract

The invention provides a line spectrum pair (LSP) parameter dimensionality-reduction quantization method based on a conditional Gaussian mixture model. The method comprises the following steps: the sampled speech signal is first divided into frames, and its LSP characteristic parameters are extracted and reduced in dimension; the characteristic parameter sequence is then split into sub-vectors; consecutive sub-vector parameter sequences are combined in pairs to form a joint sequence; a conditional Gaussian mixture model is trained on the joint sequence to obtain its parameters; the conditional probability densities, equal in number to the Gaussian components, are computed from the mean vectors, covariance matrices and other parameters of the model; the data are then grouped, with the current frame assigned to the group described by the Gaussian component with the largest conditional probability density; finally, a codebook is trained for each group with the LBG (Linde, Buzo and Gray) algorithm, and the resulting codebook is the vector quantization result for the speech signal. The method improves quantization performance, is simple to train, and has low computational complexity.

Description

Line spectrum pair (LSP) parameter dimensionality-reduction quantization method based on a conditional Gaussian mixture model
Technical field
The present invention relates to a parameter quantization method, specifically a line spectrum pair (LSP) parameter dimensionality-reduction quantization method based on a conditional Gaussian mixture model.
Background art
The LSP (line spectrum pair) parameters are important parameters in speech coding. They have a critical effect on the quality of the decoded speech, so their quantization is very important.
Quantization of the LSP parameters has been an active research topic since the 1970s. The main line of research has been improving the structure of the vector quantizer. The earliest such quantizer was the split vector quantizer, which splits the LSP parameter vector into several lower-dimensional sub-vectors and then trains a codebook for, and quantizes, each sub-vector separately. This method was a breakthrough at the time: it greatly reduced computational complexity while keeping the advantages of vector quantization. A direct development of the split vector quantizer is the switched split vector quantizer proposed by Stephen So, which classifies the LSP parameters before quantization so that each class can be quantized with a more targeted codebook. Lattice quantizers have also been used to quantize the LSP parameters, but a lattice quantizer works well mainly for Gaussian-distributed data, whereas the distribution of the LSP parameters is hard to determine. As Gaussian mixture models have gradually been introduced into the vector quantization of speech, research that combines Gaussian mixture models with lattice quantizers has attracted growing attention. Among the many research directions, exploiting the inter-frame correlation of the LSP parameters is also a focus: researchers try to use this redundancy between LSP parameters to reach transparent quantization with fewer bits.
Summary of the invention
The object of the present invention is to provide an LSP parameter dimensionality-reduction quantization method based on a conditional Gaussian mixture model that is simple to train and has low computational complexity.
The object of the invention is achieved as follows:
The LSP parameter dimensionality-reduction quantization method based on a conditional Gaussian mixture model comprises the following steps:
Step (1): divide the sampled speech signal into frames, extract the LSP characteristic parameters of the speech signal, and reduce the dimension of the characteristic parameters;
Step (2): split the characteristic parameter sequence to obtain sub-vectors;
Step (3): combine the sub-vector parameter sequences in pairs to build the joint sequence;
Step (4): train the conditional Gaussian mixture model on the joint sequence to obtain the parameters of the model;
Step (5): compute the conditional probability densities from the mean vectors, covariance matrices and other parameters of the conditional Gaussian mixture model; the number of conditional probability densities equals the number of Gaussian components;
Step (6): group the data, assigning the current frame to the group described by the Gaussian component with the largest conditional probability density;
Step (7): train a codebook for each group of data with the LBG algorithm;
Step (8): the codebook finally obtained is the vector quantization result for the speech signal.
Step (1) comprises the following steps:
1) divide the sampled speech signal into frames;
2) extract P-th order LSP parameters from each frame;
3) group L frames of speech into a superframe;
4) use compressed sensing theory to reduce the dimension of the high-dimensional LSP vector formed by the superframe, obtaining a low-dimensional measurement vector y, and then allocate a fixed number of measurements to each sub-vector.
Step (6) comprises the following steps:
1) Split the original sequence: the 16-dimensional LSP parameter vector is split into five sub-vectors of 3 + 3 + 3 + 3 + 4 dimensions; after splitting, there are 545,522 sub-vectors for each of the five positions;
2) Build the joint sequence: let the initial LSP sub-vector parameter sequence be $x_1, x_2, \ldots, x_n$, where each element x has dimension 3 or 4. Combining consecutive elements in pairs gives the new sequence $x_1x_2, x_2x_3, \ldots, x_{n-1}x_n$. Taking $x_1x_2$ as an example, this element has 6 or 8 dimensions: $x_1$ forms the first 3 or 4 dimensions and $x_2$ the last 3 or 4, where $x_2$ is the current subframe and $x_1$ is the previous subframe. The newly built sequence is called the joint sequence;
3) Train the joint GMM: set the number of GMM components. Let the GMM consist of m Gaussian components, where m = 4 or m = 8. Training yields m Gaussian component weights, 1 × 6 or 1 × 8 mean vectors, and 6 × 6 or 8 × 8 covariance matrices;
4) Classify the data: use the formula
$$f_{X|Y}(X|Y) = \sum_{i=1}^{m} \beta_i(Y)\, g(X|Y; M_i(Y), C_i)$$
to compute the probability value of the current subframe X, where X and Y are the d-dimensional LSP parameters of the current frame and the previous frame, d is the parameter dimension, and $M_i$, $C_i$ are the 2d-dimensional mean vector and the 2d × 2d covariance matrix of the corresponding Gaussian density function,
Figure BDA00001615009600022
$\alpha_i$ is the weight of each Gaussian component and satisfies $\alpha_i > 0$,
Figure BDA00001615009600023
$g(Y)$ is the d-dimensional Gaussian density function,
$$g(Y) = \frac{1}{(2\pi)^{d/2}\,|C_i|^{1/2}} \exp\left(-\frac{1}{2}(Y - M_i)^T C_i^{-1} (Y - M_i)\right),$$
where i is the index of the Gaussian component.
Because the GMM consists of m Gaussian components, m conditional probability values can be computed, and the frame is assigned to the group described by the component with the largest conditional probability value. Performing this step in order from the first frame finally yields m data classes;
5) Train the codebooks: for each sub-vector, train a codebook for each of the m grouped data classes with the LBG algorithm.
Step (4) comprises the following steps:
1) Obtain the parameters $\alpha_i$, $M_i$, $C_i$ of each Gaussian component, where $\alpha_i$ is the weight of each Gaussian component, $M_i$ and $C_i$ are the d-dimensional mean vector and the d × d covariance matrix of the corresponding Gaussian density function, and i is the index of the Gaussian component;
The EM iterative algorithm is used; it consists mainly of the following two steps:
1. E step, i.e. initial parameter estimation: use the training data to obtain a set of initial parameters $\theta = [\alpha_1, \alpha_2, \ldots, \alpha_m, M_1, M_2, \ldots, M_m, C_1, C_2, \ldots, C_m]$. One may set
Figure BDA00001615009600031
and use the K-means method to compute the cluster centers, taking them as the initial values of $M_1, M_2, \ldots, M_m$;
2. M step, i.e. maximization: using the parameters obtained in the E step, re-estimate the model parameters according to the maximum-likelihood criterion until the parameter values meet the preset requirement. The new parameters $\alpha_i'$, $M_i'$, $C_i'$ can be computed as:
$$\alpha_i' = \frac{1}{m}\sum_{j=1}^{m} h_i(x_j)$$
$$M_i' = \frac{\sum_{j=1}^{m} h_i(x_j)\, x_j}{\sum_{j=1}^{m} h_i(x_j)}$$
$$C_i' = \frac{1}{d}\,\frac{\sum_{j=1}^{m} h_i(x_j)\,(x_j - M_i')^T (x_j - M_i')}{\sum_{j=1}^{m} h_i(x_j)}$$
where $h_i(x_j)$ is the probability that the observed random vector $x_j$ was generated by the i-th Gaussian component; i indexes the Gaussian components and j indexes the observed vectors:
$$h_i(x_j) = \frac{\alpha_i\, g_i(x_j)}{\sum_{k=1}^{m} \alpha_k\, g_k(x_j)}$$
2) Let X and Y be the d-dimensional LSP parameters of the current frame and the previous frame. The joint probability density function of X and Y can be expressed as follows, where $f_{X,Y}(X,Y)$ is the joint probability density function of X and Y, $g_i(X,Y)$ is a 2d-dimensional Gaussian density function, and $M_i$, $C_i$ are the 2d-dimensional mean vector and the 2d × 2d covariance matrix of the corresponding Gaussian density function:
$$f_{X,Y}(X,Y) = \sum_{i=1}^{m} \alpha_i\, g_i(X,Y)$$
$$M_i = \begin{bmatrix} M_i^{X} \\ M_i^{Y} \end{bmatrix}$$
$$C_i = \begin{bmatrix} C_i^{XX} & C_i^{XY} \\ C_i^{YX} & C_i^{YY} \end{bmatrix}$$
The marginal probability density function of the previous-frame LSP parameter Y can then be obtained; it is a d-dimensional GMM:
$$f_Y(Y) = \sum_{i=1}^{m} \alpha_i\, f_i(Y)$$
Thus, when the previous-frame parameter Y is known, the conditional probability density of the current frame X can be expressed as
$$f_{X|Y}(X|Y) = \sum_{i=1}^{m} \beta_i(Y)\, g(X|Y; M_i(Y), C_i)$$
where
$$M_i(Y) = M_i^{X} + C_i^{XY}\left(C_i^{YY}\right)^{-1}\left(Y - M_i^{Y}\right)$$
$$C_i = C_i^{XX} - C_i^{XY}\left(C_i^{YY}\right)^{-1} C_i^{YX}$$
$$\beta_i(Y) = \frac{\alpha_i\, f_i(Y)}{\sum_{j=1}^{m} \alpha_j\, f_j(Y)}$$
The conditional covariance $C_i$ does not depend on the variable Y, so it can be computed in advance and stored; i and j are the indices of the Gaussian components.
Step (7) comprises the following steps:
1) Given the initial codebook size N, select the initial centroids by random selection or by the splitting method
Figure BDA00001615009600045
and set the initial average distortion $D_{-1} \to \infty$ and a stopping threshold $\varepsilon$, where $0 < \varepsilon < 1$;
2) Around the given codewords, partition the training sequence $X = \{x_1, x_2, \ldots, x_m\}$ into N non-overlapping regions according to the nearest-neighbor criterion, where m is the number of vectors in the training sequence. The nearest-neighbor criterion is:
$$S_j^n = \{\, x \mid d(x, c_j) \le d(x, c_i),\ i \ne j,\ c_i, c_j \in C_N^n,\ x \in X \,\}, \quad j = 1, 2, \ldots, N$$
where $S_j^n$ is the j-th cell obtained at the n-th iteration, $d(x, c_j)$ is the distance between x and $c_j$, $c_j$ is an element of $C_N^n$, $C_N^n$ is the set of centroids obtained at the n-th iteration, x is an element of the training sequence X, and i, j are the indices of the corresponding centroids;
3) Compute the average distortion and the relative distortion
The average distortion is
$$D_n = \frac{1}{m}\sum_{r=1}^{m} \min_{c_i \in C_N^n} d(x_r, c_i)$$
where $c_i$ is the centroid of the cell containing $x_r$, $D_n$ is the average distortion, n is the iteration number, m is the number of training vectors, $d(x_r, c_i)$ is the distance between $x_r$ and $c_i$, r is the index of the training vector, and i is the centroid index;
The relative distortion is
$$D_{rel}^n = \left|\frac{D_{n-1} - D_n}{D_n}\right|$$
If $D_{rel}^n \le \varepsilon$, the current centroids satisfy the distortion criterion and can be used as codewords, and the procedure ends. Otherwise, recompute the centroids and return to step 2) to continue iterating. The centroid is computed as:
$$c_i^n = \frac{1}{\lambda}\sum_{r=1}^{\lambda} x_r$$
where $\lambda$ is the number of training vectors contained in the i-th cell and i is the cell index.
The advantages of the invention are as follows:
Vector quantization is an important compression method that is widely used in speech coding, speech recognition and speech synthesis.
The line spectrum pair (LSP) parameters are important parameters in low-rate speech coding. They are frequency-domain parameters that are closely related to the peaks of the speech spectral envelope, and their quantization and interpolation properties are better than those of other parameters. The quantization quality of the LSP parameters directly affects the intelligibility of the synthesized speech, so studying efficient LSP parameter quantization algorithms is very important for speech coding.
Building on an analysis of compressed-sensing-based LSP parameter dimensionality reduction and of the conditional Gaussian mixture model, the present invention uses the conditional Gaussian mixture model to perform dimensionality-reduction quantization of the LSP parameters. First, a compressed-sensing-based dimensionality-reduction algorithm reduces the dimension of the LSP parameters, which lowers the amount of computation. Then the trained parameters are used to construct the conditional Gaussian mixture model of the current frame, which exploits the inter-frame correlation so that more targeted codebooks can be designed. The codebooks are better matched to the different LSP parameters, which improves quantization performance. Comparison with the ordinary split vector quantization algorithm shows the effectiveness of the method in three respects: spectral distortion, computational complexity and storage complexity.
The method was implemented by building an LSP parameter dimensionality-reduction quantization system based on the conditional Gaussian mixture model, and a series of targeted experiments was carried out on this system. The environment in which the system was implemented is as follows:
1. Hardware: Intel(R) Pentium(R) Dual processor, 1.60 GHz CPU; 1 GB memory; 256 MB video card; 80 GB hard disk.
2. Software: Windows XP operating system; Matlab 7.0 and VC++ 6.0 development environments.
The quantization performance of the system was evaluated. In the experiments, 545,522 frames of data were used to train the codebooks and 65,016 frames of data were used for testing. All LSP parameters were computed with the ITU-T G.722.2 speech codec, and the average spectral distortion was computed with the internationally used method. The 16-dimensional LSP parameters were divided into 5 sub-vectors as (3, 3, 3, 3, 4), and the number of bits allocated to each sub-vector is listed in the corresponding table. The number of Gaussian components was 4 and 8. The algorithm was analysed in terms of spectral distortion, computational complexity and storage complexity, and compared with the split vector quantization (SVQ) algorithm. The results show that on the main performance measure, spectral distortion (average spectral distortion and the 2-4 dB outlier rate), the method is clearly better than SVQ at the same bit rate; computational complexity rises only slightly, and storage complexity increases somewhat. Because the storage available on current processing chips keeps growing, trading storage complexity for better quantization performance is worthwhile.
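The two spectral-distortion figures named above (the average and the 2-4 dB outlier rate) can be summarized from per-frame distortion values roughly as follows. This is a minimal sketch: the function name `sd_statistics`, the extra "above 4 dB" rate and the sample values are illustrative assumptions, not data or code from the experiments.

```python
def sd_statistics(sd_db):
    """Summarize per-frame spectral distortion values given in dB."""
    n = len(sd_db)
    if n == 0:
        raise ValueError("need at least one frame")
    average = sum(sd_db) / n
    # Percentage of frames whose distortion falls in the 2-4 dB band.
    pct_2_4 = 100.0 * sum(1 for v in sd_db if 2.0 <= v <= 4.0) / n
    # Percentage of frames above 4 dB (an assumed, commonly reported companion figure).
    pct_gt_4 = 100.0 * sum(1 for v in sd_db if v > 4.0) / n
    return average, pct_2_4, pct_gt_4


# Made-up values, purely for illustration.
avg, pct_2_4, pct_gt_4 = sd_statistics([0.8, 1.1, 2.5, 0.9, 4.2])
```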
Description of the drawings
Fig. 1 is a flowchart of the LSP parameter dimensionality-reduction quantization method based on the conditional Gaussian mixture model.
Detailed description
The present invention is described in more detail below with reference to the accompanying drawing:
With reference to Fig. 1, LSP parameter dimensionality-reduction quantization based on the conditional Gaussian mixture model comprises the following steps:
(1) Input the speech signal and divide it into frames
Framing is performed by applying a Hamming window. The window function is defined as:
$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1$$
N is the window length, i.e. the frame length, and $w(n)$ is the window function. The windowed speech is:
$$s_w(n) = s(n)\,w(n)$$
where $s(n)$ is the original speech and $s_w(n)$ is the windowed speech.
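The windowing step can be sketched as below, using the standard Hamming window. The hop size between frames is an assumption, since the text does not state whether frames overlap, and the function names are illustrative.

```python
import math


def hamming(N):
    """Standard Hamming window of length N (N >= 2)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (N - 1)) for n in range(N)]


def frame_signal(s, N, hop):
    """Split s into frames of length N and apply s_w(n) = s(n) * w(n)."""
    w = hamming(N)
    frames = []
    for start in range(0, len(s) - N + 1, hop):
        frames.append([s[start + n] * w[n] for n in range(N)])
    return frames


# Toy constant signal; every windowed frame then equals the window itself.
signal = [1.0] * 10
frames = frame_signal(signal, N=4, hop=2)
```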
(2) Extract the line spectrum pair (LSP) characteristic parameters, as follows:
1. Perform P-th order linear prediction analysis on the speech signal, where P is the linear prediction order, to obtain P linear prediction coefficients $\alpha_i$, i = 1, 2, ..., P. Let
Figure BDA00001615009600062
where i is the index of the characteristic parameter and $A(z)$ is the linear prediction analysis filter.
2. Define two polynomials:
$$P(z) = A(z) + z^{-(p+1)} A(z^{-1})$$
$$Q(z) = A(z) - z^{-(p+1)} A(z^{-1})$$
which satisfy
$$A(z) = \frac{P(z) + Q(z)}{2}$$
where p is the linear prediction order. The zeros of these polynomials are complex numbers, and the frequencies given by their phases are the LSP parameters.
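The construction of the two polynomials can be illustrated with a short sketch that works directly on the coefficient list of A(z). The function name `lsp_polynomials` and the example coefficients are assumptions. The sketch stops before root finding: the LSP values are the angles of the zeros of P(z) and Q(z), and obtaining them needs a numerical root finder that is not shown here.

```python
def lsp_polynomials(a):
    """Build P(z) and Q(z) from A(z).

    a = [a_0, a_1, ..., a_p] holds the coefficients of A(z) in powers of z^-1
    (a_0 is normally 1). The returned lists have length p + 2 and hold the
    coefficients of P(z) = A(z) + z^-(p+1) A(z^-1) and
    Q(z) = A(z) - z^-(p+1) A(z^-1).
    """
    padded = list(a) + [0.0]            # A(z) extended to degree p + 1
    mirrored = [0.0] + list(a)[::-1]    # coefficients of z^-(p+1) A(z^-1)
    P = [x + y for x, y in zip(padded, mirrored)]
    Q = [x - y for x, y in zip(padded, mirrored)]
    return P, Q


# Illustrative second-order filter A(z) = 1 - 0.9 z^-1 + 0.2 z^-2.
P, Q = lsp_polynomials([1.0, -0.9, 0.2])
```

By construction P(z) has symmetric coefficients and Q(z) antisymmetric ones, and their average recovers A(z).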
3. Group L frames of speech into a superframe;
4. Use compressed sensing theory to reduce the dimension of the high-dimensional LSP vector formed by the superframe, obtaining a low-dimensional measurement vector y, and then allocate a fixed number of measurements to each sub-vector.
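A rough sketch of the superframe and measurement step is given below. The text does not specify the measurement matrix, so a random Gaussian matrix is assumed here, which is one common compressed-sensing choice. The sizes (L = 2 frames of 16 parameters reduced to 16 measurements) and all function names are likewise illustrative.

```python
import random


def build_superframe(lsp_frames):
    """Concatenate L frames of P LSP parameters into one L*P-dimensional vector."""
    return [v for frame in lsp_frames for v in frame]


def random_measurement_matrix(M, n, seed=0):
    """M x n matrix with i.i.d. Gaussian entries (an assumed choice of Phi)."""
    rng = random.Random(seed)
    scale = 1.0 / (M ** 0.5)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(M)]


def measure(phi, x):
    """Low-dimensional measurement y = Phi x."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


# Two made-up 16-dimensional LSP frames form a 32-dimensional superframe.
lsp_frames = [[0.1 * (k + 1) + 0.01 * f for k in range(16)] for f in range(2)]
x = build_superframe(lsp_frames)
phi = random_measurement_matrix(16, len(x))
y = measure(phi, x)
```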
(3) Split the characteristic parameter sequence of each frame to obtain sub-vectors
The 16-dimensional LSP parameter vector is split into five sub-vectors of 3 + 3 + 3 + 3 + 4 dimensions. After splitting, there are 545,522 sub-vectors for each of the five positions.
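The (3, 3, 3, 3, 4) split can be written in a few lines. The function name `split_vector` is an assumption for illustration.

```python
def split_vector(v, sizes=(3, 3, 3, 3, 4)):
    """Split one parameter vector into consecutive sub-vectors of the given sizes."""
    if len(v) != sum(sizes):
        raise ValueError("vector length must equal the sum of the sub-vector sizes")
    parts, start = [], 0
    for size in sizes:
        parts.append(v[start:start + size])
        start += size
    return parts


# A 16-dimensional vector with entries 0..15 makes the split easy to see.
parts = split_vector(list(range(16)))
```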
(4) Build the joint sequence
Let the initial LSP sub-vector parameter sequence be $x_1, x_2, \ldots, x_n$, where each element has dimension 3 (or 4). Combining consecutive elements in pairs gives the new sequence $x_1x_2, x_2x_3, \ldots, x_{n-1}x_n$. Taking $x_1x_2$ as an example, this element has 6 (or 8) dimensions: $x_1$ forms the first 3 (or 4) dimensions and $x_2$ the last 3 (or 4), where $x_2$ is the current subframe and $x_1$ is the previous subframe. The newly built sequence is called the joint sequence.
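Building the joint sequence amounts to concatenating each sub-vector with its successor, as in this small sketch (the function name is illustrative):

```python
def joint_sequence(subvectors):
    """Pair each sub-vector with the next one: [x1 x2], [x2 x3], ..., [x(n-1) xn].

    In every pair the previous subframe comes first and the current one second.
    """
    return [prev + cur for prev, cur in zip(subvectors, subvectors[1:])]


# Three made-up 3-dimensional sub-vectors from consecutive frames.
seq = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
joint = joint_sequence(seq)
```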
(5) Train the conditional Gaussian mixture model
1. Obtain the parameters $\alpha_i$, $M_i$, $C_i$ of each Gaussian component, where $\alpha_i$ is the weight of each Gaussian component, and $M_i$ and $C_i$ are the d-dimensional mean vector and the d × d covariance matrix of the corresponding Gaussian density function.
The EM iterative algorithm is used. It consists mainly of the following two steps.
E step, i.e. initial parameter estimation. Use the training data to obtain a set of initial parameters
$\theta = [\alpha_1, \alpha_2, \ldots, \alpha_m, M_1, M_2, \ldots, M_m, C_1, C_2, \ldots, C_m]$. One may set
Figure BDA00001615009600071
and use the K-means method to compute the cluster centers, taking them as the initial values of $M_1, M_2, \ldots, M_m$.
M step, i.e. maximization. Using the parameters obtained in the previous step, re-estimate the model parameters according to the maximum-likelihood criterion until the parameter values meet the preset requirement. The new parameters $\alpha_i'$, $M_i'$, $C_i'$ can be computed as:
$$\alpha_i' = \frac{1}{m}\sum_{j=1}^{m} h_i(x_j)$$
$$M_i' = \frac{\sum_{j=1}^{m} h_i(x_j)\, x_j}{\sum_{j=1}^{m} h_i(x_j)}$$
$$C_i' = \frac{1}{d}\,\frac{\sum_{j=1}^{m} h_i(x_j)\,(x_j - M_i')^T (x_j - M_i')}{\sum_{j=1}^{m} h_i(x_j)}$$
where $h_i(x_j)$ is the probability that the observed random vector $x_j$ was generated by the i-th Gaussian component:
$$h_i(x_j) = \frac{\alpha_i\, g_i(x_j)}{\sum_{k=1}^{m} \alpha_k\, g_k(x_j)}$$
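The iteration can be illustrated with a one-dimensional GMM. The sketch follows the textbook form of EM, in which the E-step computes the responsibilities $h_i(x_j)$ and the M-step re-estimates weights, means and variances from them. It is not a term-by-term transcription of the formulas above, and the variance floor, the toy data and the hand-picked initial means (standing in for a K-means initialization) are assumptions.

```python
import math


def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_step(data, weights, means, variances):
    """One EM iteration for a 1-D GMM; returns updated (weights, means, variances)."""
    m = len(weights)
    # E-step: responsibilities h[i][j] = P(component i | x_j).
    h = [[0.0] * len(data) for _ in range(m)]
    for j, x in enumerate(data):
        joint = [weights[i] * gauss_pdf(x, means[i], variances[i]) for i in range(m)]
        total = sum(joint)
        for i in range(m):
            h[i][j] = joint[i] / total
    # M-step: re-estimate each component from its responsibilities.
    new_w, new_mu, new_var = [], [], []
    for i in range(m):
        mass = sum(h[i])
        mu = sum(hij * x for hij, x in zip(h[i], data)) / mass
        var = sum(hij * (x - mu) ** 2 for hij, x in zip(h[i], data)) / mass
        new_w.append(mass / len(data))
        new_mu.append(mu)
        new_var.append(max(var, 1e-6))  # assumed floor to keep variances positive
    return new_w, new_mu, new_var


# Two well-separated toy clusters.
data = [0.0, 0.2, -0.1, 5.0, 5.1, 4.9]
w, mu, var = [0.5, 0.5], [0.0, 5.0], [1.0, 1.0]
for _ in range(10):
    w, mu, var = em_step(data, w, mu, var)
```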
2. Let X and Y be the d-dimensional LSP parameters of the current frame and the previous frame. The joint probability density function of X and Y can be expressed as
$$f_{X,Y}(X,Y) = \sum_{i=1}^{m} \alpha_i\, g_i(X,Y)$$
$$M_i = \begin{bmatrix} M_i^{X} \\ M_i^{Y} \end{bmatrix}$$
$$C_i = \begin{bmatrix} C_i^{XX} & C_i^{XY} \\ C_i^{YX} & C_i^{YY} \end{bmatrix}$$
where $g_i(X,Y)$ is a 2d-dimensional Gaussian density function, and $M_i$, $C_i$ are the 2d-dimensional mean vector and the 2d × 2d covariance matrix of the corresponding Gaussian density function.
(6) Compute the conditional probability density
Obtain the marginal probability density function of the previous-frame LSP parameter Y; it is a d-dimensional GMM.
$$f_Y(Y) = \sum_{i=1}^{m} \alpha_i\, f_i(Y)$$
Thus, when the previous-frame parameter Y is known, the conditional probability density of the current frame X can be expressed as
$$f_{X|Y}(X|Y) = \sum_{i=1}^{m} \beta_i(Y)\, g(X|Y; M_i(Y), C_i)$$
where
$$M_i(Y) = M_i^{X} + C_i^{XY}\left(C_i^{YY}\right)^{-1}\left(Y - M_i^{Y}\right)$$
$$C_i = C_i^{XX} - C_i^{XY}\left(C_i^{YY}\right)^{-1} C_i^{YX}$$
$$\beta_i(Y) = \frac{\alpha_i\, f_i(Y)}{\sum_{j=1}^{m} \alpha_j\, f_j(Y)}$$
The conditional covariance $C_i$ does not depend on the variable Y, so it can be computed in advance and stored.
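For scalar X and Y (d = 1) the conditional quantities reduce to ordinary arithmetic, which makes them easy to check by hand. The sketch below uses the standard Gaussian conditioning identity, in which the cross-covariance term is subtracted from $C^{XX}$. The function names and numbers are illustrative assumptions.

```python
import math


def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def conditional_component(y, mx, my, cxx, cxy, cyy):
    """Conditional mean M_i(y) and variance of X given Y = y for one component."""
    cond_mean = mx + cxy / cyy * (y - my)
    cond_var = cxx - cxy * cxy / cyy  # independent of y, so it can be precomputed
    return cond_mean, cond_var


def conditional_weights(y, alphas, my_list, cyy_list):
    """beta_i(y) = alpha_i f_i(y) / sum_j alpha_j f_j(y)."""
    terms = [a * gauss_pdf(y, my, cyy) for a, my, cyy in zip(alphas, my_list, cyy_list)]
    total = sum(terms)
    return [t / total for t in terms]


# One component with M^X = 1, M^Y = 2, C^XX = 2, C^XY = 1, C^YY = 4, observed y = 4.
mean, var = conditional_component(4.0, mx=1.0, my=2.0, cxx=2.0, cxy=1.0, cyy=4.0)
# Two equally weighted components whose Y-means are 0 and 4; y = 4 favours the second.
betas = conditional_weights(4.0, [0.5, 0.5], [0.0, 4.0], [1.0, 1.0])
```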
(7) Train the codebook with the LBG algorithm (a vector quantization algorithm proposed by Linde, Buzo and Gray in 1980; LBG stands for the initials of the three authors)
1. Given the initial codebook size N, select the initial centroids by random selection or by the splitting method
Figure BDA00001615009600084
and set the initial average distortion $D_{-1} \to \infty$ and a stopping threshold $\varepsilon$ ($0 < \varepsilon < 1$).
2. Around the given codewords, partition the training sequence $X = \{x_1, x_2, \ldots, x_m\}$ into N non-overlapping regions (cells) according to the nearest-neighbor criterion. The nearest-neighbor criterion is:
$$S_j^n = \{\, x \mid d(x, c_j) \le d(x, c_i),\ i \ne j,\ c_i, c_j \in C_N^n,\ x \in X \,\}, \quad j = 1, 2, \ldots, N$$
3. Compute the average distortion and the relative distortion.
The average distortion is
$$D_n = \frac{1}{m}\sum_{r=1}^{m} \min_{c_i \in C_N^n} d(x_r, c_i)$$
where $c_i$ is the centroid of the cell containing $x_r$.
The relative distortion is
$$D_{rel}^n = \left|\frac{D_{n-1} - D_n}{D_n}\right|$$
If $D_{rel}^n \le \varepsilon$, the current centroids satisfy the distortion criterion and can be used as codewords, and the procedure ends. Otherwise, recompute the centroids and return to step 2 to continue iterating. The centroid is computed as:
$$c_i^n = \frac{1}{\lambda}\sum_{r=1}^{\lambda} x_r$$
where $\lambda$ is the number of training vectors contained in the i-th cell.
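The loop of partitioning, distortion check and centroid update can be sketched as follows. Squared Euclidean distance is assumed for d(x, c) because the text only says "distance", the initial codebook is passed in rather than chosen randomly or by splitting, and the `max_iter` guard and the handling of empty cells are additions for safety.

```python
def sq_dist(a, b):
    """Squared Euclidean distance (an assumed choice for d)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lbg(training, codebook, eps=1e-3, max_iter=100):
    """Refine an initial codebook until the relative distortion is at most eps."""
    prev_d = float("inf")  # plays the role of the initial distortion D_{-1}
    d = prev_d
    for _ in range(max_iter):
        # Nearest-neighbor partition of the training vectors into cells.
        cells = [[] for _ in codebook]
        total = 0.0
        for x in training:
            dists = [sq_dist(x, c) for c in codebook]
            j = dists.index(min(dists))
            cells[j].append(x)
            total += dists[j]
        d = total / len(training)
        # Stop when the relative distortion is small enough.
        if d == 0.0 or abs(prev_d - d) / d <= eps:
            break
        prev_d = d
        # Centroid update: mean of each non-empty cell; empty cells keep their codeword.
        codebook = [
            [sum(col) / len(cell) for col in zip(*cell)] if cell else c
            for cell, c in zip(cells, codebook)
        ]
    return codebook, d


# Four toy 2-D training vectors and a 2-codeword initial codebook.
training = [[0, 0], [0, 1], [10, 10], [10, 11]]
codebook, distortion = lbg(training, [[0, 0], [10, 10]])
```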
The concrete implementation steps of the invention are:
1. First divide the sampled speech signal into frames, extract the line spectrum pair (LSP) characteristic parameters of the speech signal, and reduce the dimension of the characteristic parameters. This specifically comprises the following steps:
(1) divide the sampled speech signal into frames;
(2) extract P-th order LSP parameters from each frame (P is the order of the characteristic parameters);
(3) group L frames of speech into a superframe (L is the number of frames in a superframe);
(4) use compressed sensing theory to reduce the dimension of the high-dimensional LSP vector formed by the superframe, obtaining a low-dimensional measurement vector y, and then allocate a fixed number of measurements to each sub-vector.
2. Then split the characteristic parameter sequence to obtain sub-vectors;
3. Combine the sub-vector parameter sequences in pairs to build the joint sequence;
4. Train the conditional Gaussian mixture model on the joint sequence to obtain the parameters of the model. This specifically comprises the following steps:
(1) Obtain the parameters $\alpha_i$, $M_i$, $C_i$ of each Gaussian component, where $\alpha_i$ is the weight of each Gaussian component, $M_i$ and $C_i$ are the d-dimensional mean vector and the d × d covariance matrix of the corresponding Gaussian density function, and i is the index of the Gaussian component;
The EM (Expectation-Maximization) iterative algorithm is used; it consists mainly of the following two steps:
1. E step, i.e. initial parameter estimation: use the training data to obtain a set of initial parameters $\theta = [\alpha_1, \alpha_2, \ldots, \alpha_m, M_1, M_2, \ldots, M_m, C_1, C_2, \ldots, C_m]$. One may set
Figure BDA00001615009600091
and use the K-means method to compute the cluster centers, taking them as the initial values of $M_1, M_2, \ldots, M_m$;
2. M step, i.e. maximization: using the parameters obtained in step 1, re-estimate the model parameters according to the maximum-likelihood criterion until the parameter values meet the preset requirement. The new parameters $\alpha_i'$, $M_i'$, $C_i'$ can be computed as:
$$\alpha_i' = \frac{1}{m}\sum_{j=1}^{m} h_i(x_j)$$
$$M_i' = \frac{\sum_{j=1}^{m} h_i(x_j)\, x_j}{\sum_{j=1}^{m} h_i(x_j)}$$
$$C_i' = \frac{1}{d}\,\frac{\sum_{j=1}^{m} h_i(x_j)\,(x_j - M_i')^T (x_j - M_i')}{\sum_{j=1}^{m} h_i(x_j)}$$
where $h_i(x_j)$ is the probability that the observed random vector $x_j$ was generated by the i-th Gaussian component; i indexes the Gaussian components and j indexes the observed vectors:
$$h_i(x_j) = \frac{\alpha_i\, g_i(x_j)}{\sum_{k=1}^{m} \alpha_k\, g_k(x_j)}$$
(2) Let X and Y be the d-dimensional LSP parameters of the current frame and the previous frame. The joint probability density function of X and Y can be expressed as follows, where $f_{X,Y}(X,Y)$ is the joint probability density function of X and Y, $g_i(X,Y)$ is a 2d-dimensional Gaussian density function, and $M_i$, $C_i$ are the 2d-dimensional mean vector and the 2d × 2d covariance matrix of the corresponding Gaussian density function:
$$f_{X,Y}(X,Y) = \sum_{i=1}^{m} \alpha_i\, g_i(X,Y)$$
$$M_i = \begin{bmatrix} M_i^{X} \\ M_i^{Y} \end{bmatrix}$$
$$C_i = \begin{bmatrix} C_i^{XX} & C_i^{XY} \\ C_i^{YX} & C_i^{YY} \end{bmatrix}$$
The marginal probability density function of the previous-frame LSP parameter Y can then be obtained; it is a d-dimensional GMM.
$$f_Y(Y) = \sum_{i=1}^{m} \alpha_i\, f_i(Y)$$
Thus, when the previous-frame parameter Y is known, the conditional probability density of the current frame X can be expressed as
$$f_{X|Y}(X|Y) = \sum_{i=1}^{m} \beta_i(Y)\, g(X|Y; M_i(Y), C_i)$$
where
$$M_i(Y) = M_i^{X} + C_i^{XY}\left(C_i^{YY}\right)^{-1}\left(Y - M_i^{Y}\right)$$
$$C_i = C_i^{XX} - C_i^{XY}\left(C_i^{YY}\right)^{-1} C_i^{YX}$$
$$\beta_i(Y) = \frac{\alpha_i\, f_i(Y)}{\sum_{j=1}^{m} \alpha_j\, f_j(Y)}$$
The conditional covariance $C_i$ does not depend on the variable Y, so it can be computed in advance and stored. i and j are the indices of the Gaussian components.
5. Compute the conditional probability densities from the mean vectors, covariance matrices and other parameters of the conditional Gaussian mixture model; the number of conditional probability densities equals the number of Gaussian components;
6. Then group the data, assigning the frame being processed to the group described by the Gaussian component with the largest conditional probability density. This specifically comprises the following steps:
(1) Split the original sequence: the 16-dimensional LSP parameter vector is split into five sub-vectors of 3 + 3 + 3 + 3 + 4 dimensions; after splitting, there are 545,522 sub-vectors for each of the five positions;
(2) Build the joint sequence: let the initial LSP sub-vector parameter sequence be $x_1, x_2, \ldots, x_n$, where each element x has dimension 3 (or 4). Combining consecutive elements in pairs gives the new sequence $x_1x_2, x_2x_3, \ldots, x_{n-1}x_n$. Taking $x_1x_2$ as an example, this element has 6 (or 8) dimensions: $x_1$ forms the first 3 (or 4) dimensions and $x_2$ the last 3 (or 4), where $x_2$ is the current subframe and $x_1$ is the previous subframe. The newly built sequence is called the joint sequence;
(3) Train the joint GMM (Gaussian Mixture Model): set the number of GMM components. Let the GMM consist of m Gaussian components (m = 4 or m = 8). Training yields m Gaussian component weights, 1 × 6 (or 1 × 8) mean vectors, and 6 × 6 (or 8 × 8) covariance matrices;
(4) Classify the data. Use the formula
$$f_{X|Y}(X|Y) = \sum_{i=1}^{m} \beta_i(Y)\, g(X|Y; M_i(Y), C_i)$$
to compute the probability value of the current subframe X. X and Y are the d-dimensional LSP parameters of the current frame and the previous frame, d is the parameter dimension, and $M_i$, $C_i$ are the 2d-dimensional mean vector and the 2d × 2d covariance matrix of the corresponding Gaussian density function,
Figure BDA00001615009600111
$\alpha_i$ is the weight of each Gaussian component and satisfies $\alpha_i > 0$,
Figure BDA00001615009600112
Figure BDA00001615009600113
$g(Y)$ is the d-dimensional Gaussian density function,
$$g(Y) = \frac{1}{(2\pi)^{d/2}\,|C_i|^{1/2}} \exp\left(-\frac{1}{2}(Y - M_i)^T C_i^{-1} (Y - M_i)\right),$$
where $M_i$ and $C_i$ are the d-dimensional mean vector and the d × d covariance matrix of the corresponding Gaussian density function, and i is the index of the Gaussian component;
Because the GMM consists of m Gaussian components, m conditional probability values can be computed, and the frame is assigned to the group described by the component with the largest conditional probability value. Performing this step in order from the first frame finally yields m data classes;
(5) Train the codebooks: for each sub-vector, train a codebook for each of the m grouped data classes with the LBG algorithm.
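The grouping rule can be sketched for scalar frames (d = 1). The sketch assumes that the "conditional probability value" of component i is the i-th term $\beta_i(Y)\,g(X|Y; M_i(Y), C_i)$ of the conditional density, which the text does not state explicitly. The dictionary layout of the component parameters, the function names and the numbers are also illustrative.

```python
import math


def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def assign_group(x, y, components):
    """Index of the component with the largest beta_i(y) * g(x | y) (assumed score)."""
    prior = [c["alpha"] * gauss_pdf(y, c["my"], c["cyy"]) for c in components]
    total = sum(prior)
    scores = []
    for c, p in zip(components, prior):
        cond_mean = c["mx"] + c["cxy"] / c["cyy"] * (y - c["my"])
        cond_var = c["cxx"] - c["cxy"] ** 2 / c["cyy"]
        scores.append(p / total * gauss_pdf(x, cond_mean, cond_var))
    return scores.index(max(scores))


def group_frames(frames, components):
    """Sort each current frame x into one of m groups, given its previous frame y."""
    groups = [[] for _ in components]
    for y, x in zip(frames, frames[1:]):
        groups[assign_group(x, y, components)].append(x)
    return groups


# Two made-up components centred near 0 and near 5.
components = [
    {"alpha": 0.5, "mx": 0.0, "my": 0.0, "cxx": 1.0, "cxy": 0.5, "cyy": 1.0},
    {"alpha": 0.5, "mx": 5.0, "my": 5.0, "cxx": 1.0, "cxy": 0.5, "cyy": 1.0},
]
groups = group_frames([0.0, 0.1, 0.2], components)
```

Each resulting group would then be handed to the LBG training step for its own codebook.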
7. the data that will divide into groups are used LBG (Linde-Buzo-Gray, LBG) algorithm training code book respectively; , specifically may further comprise the steps:
(1) Given the initial codebook size N, select the initial centroids
Figure BDA00001615009600115
by the random selection method or the splitting method. Set the initial average distortion $D_{-1} \to \infty$ and choose a stopping threshold ε (0 < ε < 1);
(2) Around the given codewords, use the nearest-neighbor criterion to partition the training sequence $X = \{x_1, x_2, \ldots, x_m\}$ into N non-overlapping regions (cells), where m is the number of vectors in the training sequence. The nearest-neighbor criterion is:
$$S_j^n = \{\, x \mid d(x, c_j) \le d(x, c_i),\ i \ne j,\ c_i, c_j \in C_N^n,\ x \in X \,\}, \quad j = 1, 2, \ldots, N$$
where $S_j^n$ is the j-th cell obtained in the n-th iteration, $d(x, c_j)$ is the distance between x and $c_j$, $c_j$ is an element of $C_N^n$, $C_N^n$ is the set of centroids obtained in the n-th iteration, x is an element of the training sequence X, and i and j are the indices of the corresponding centroids;
(3) Compute the average distortion and the relative distortion.
The average distortion is
$$D_n = \frac{1}{m} \sum_{r=1}^{m} \min_{c_i \in C_N^n} d(x_r, c_i)$$
where $c_i$ is the centroid of the cell containing $x_r$, $D_n$ is the average distortion, n is the iteration number, m is the number of training vectors, $d(x_r, c_i)$ is the distance between $x_r$ and $c_i$, r is the index of the training vector, and i is the index of the centroid;
The relative distortion is
$$D_{rel}^{n} = \left| \frac{D_{n-1} - D_n}{D_n} \right|$$
If
Figure BDA000016150096001112
then the current centroids satisfy the distortion criterion and can be used as the codewords, and the procedure ends. Otherwise, recompute the centroids and return to step (2) to continue iterating. The centroids are computed as follows:
$$c_i^n = \frac{1}{\lambda} \sum_{r=1}^{\lambda} x_r$$
where λ is the number of training vectors contained in the i-th cell, and i is the index of the cell.
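The LBG iteration of steps (1) to (3) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the implementation of the method: the distance d is taken to be the squared Euclidean distance, the initial centroids are drawn by random selection (the splitting method is not shown), an empty cell simply keeps its previous codeword, and `max_iter` is a safety limit that the text does not mention. The names `sq_dist` and `lbg` are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def lbg(training, n_codewords, eps=1e-3, seed=0, max_iter=100):
    """Train a codebook of n_codewords vectors from a list of training vectors."""
    rng = random.Random(seed)
    # Step (1): random selection of the initial centroids, D_{-1} = infinity.
    codebook = [tuple(v) for v in rng.sample(training, n_codewords)]
    prev_distortion = math.inf
    for _ in range(max_iter):
        # Step (2): nearest-neighbor partition of the training set into cells.
        cells = [[] for _ in codebook]
        total = 0.0
        for x in training:
            dists = [sq_dist(x, c) for c in codebook]
            j = min(range(len(dists)), key=dists.__getitem__)
            cells[j].append(x)
            total += dists[j]
        # Step (3): average distortion and relative distortion test.
        distortion = total / len(training)
        if distortion == 0.0:
            break  # every vector coincides with a codeword
        if abs(prev_distortion - distortion) / distortion <= eps:
            break  # current centroids satisfy the distortion criterion
        prev_distortion = distortion
        # Recompute each centroid as the mean of its cell, then iterate again.
        codebook = [
            tuple(sum(col) / len(cell) for col in zip(*cell)) if cell else old
            for cell, old in zip(cells, codebook)
        ]
    return codebook


# Two well-separated made-up clusters; the codebook should hold their means.
training = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
codebook = lbg(training, 2)
```

In the method described here, one such codebook would be trained for each of the m data classes of each sub-vector.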
8. The codebook finally obtained is the vector quantization result of the speech signal.

Claims (5)

1. A line spectrum pair parameter dimensionality-reduction quantization method based on a conditional Gaussian mixture model, characterized by comprising the following steps:
Step (1): first divide the sampled speech signal into frames, extract the LSP characteristic parameters of the speech signal, and reduce the dimension of the characteristic parameters;
Step (2): then split the characteristic parameter sequence to obtain sub-vectors;
Step (3): combine the sub-vector parameter sequences in pairs to build the joint sequence;
Step (4): train the conditional Gaussian mixture model with the joint sequence to obtain the parameters of the conditional Gaussian mixture model;
Step (5): use the parameters of the conditional Gaussian mixture model, such as the mean vectors and covariance matrices, to compute the conditional probability densities, the number of which equals the number of Gaussian components;
Step (6): then group the data, assigning the data of the current frame to the group described by the Gaussian component with the largest conditional probability density value;
Step (7): train a codebook on each group of data with the LBG algorithm;
Step (8): the codebook finally obtained is the vector quantization result of the speech signal.
2. The line spectrum pair parameter dimensionality-reduction quantization method based on a conditional Gaussian mixture model according to claim 1, characterized in that said step (1) comprises the following steps:
1) divide the sampled speech signal into frames;
2) extract P-th order LSP parameters from every frame;
3) group L frames of speech into a superframe;
4) first apply compressed sensing theory to reduce the dimension of the high-dimensional line spectrum pair formed by the superframe, obtaining the low-dimensional measurements y, and then allocate a fixed number of measurements to each sub-vector.
3. The line spectrum pair parameter dimensionality-reduction quantization method based on a conditional Gaussian mixture model according to claim 2, characterized in that said step (6) comprises the following steps:
1) Split the original sequence. The 16-dimensional LSP parameters are split into five sub-vectors of 3 + 3 + 3 + 3 + 4 dimensions; after splitting, 545,522 sub-vectors of each kind are obtained;
2) Build the joint sequence. Let the original LSP sub-vector parameter sequence be $x_1, x_2, \ldots, x_n$, where each $x$ is an element of this sequence and every element has dimension 3 or 4. Combining each pair of consecutive elements yields a new sequence: $x_1x_2, x_2x_3, \ldots, x_{n-1}x_n$. Taking $x_1x_2$ as an example, this element has 6 or 8 dimensions: $x_1$ forms its first 3 or 4 dimensions and $x_2$ forms its last 3 or 4 dimensions, where $x_2$ is the current subframe and $x_1$ is the previous subframe. The newly built sequence is called the joint sequence;
3) Train the joint GMM. Set the number of components of the GMM: let the GMM consist of m Gaussian components, where m = 4 or m = 8. Training yields the m component weights, the 1 × 6 or 1 × 8 mean vectors, and the 6 × 6 or 8 × 8 covariance matrices;
4) Classify the data. Use the formula
$$f_{X|Y}(X \mid Y) = \sum_{i=1}^{m} \beta_i(Y)\, g(X \mid Y;\ M_i(Y),\ C_i)$$
to compute the conditional probability value of the current subframe X, where X and Y are the d-dimensional LSP parameters of the current frame and the previous frame respectively, d is the parameter dimension, and $M_i$ and $C_i$ are the 2d-dimensional mean vector and the 2d × 2d covariance matrix of the corresponding Gaussian density function,
Figure FDA00001615009500021
$\alpha_i$ is the weight of each Gaussian component and satisfies $\alpha_i > 0$,
Figure FDA00001615009500022
Figure FDA00001615009500023
g(Y) is a d-dimensional Gaussian density function,
$$g(Y) = \frac{1}{(2\pi)^{d/2}\,|C_i|^{1/2}} \exp\left(-\frac{1}{2}(Y - M_i)^T C_i^{-1} (Y - M_i)\right),$$
and i is the index of the Gaussian component;
Because the GMM consists of m Gaussian components, m conditional probability values can be computed. The frame is assigned to the group described by the component with the largest conditional probability value. Carrying out this step frame by frame, starting from the first frame, finally yields m data classes;
5) Train the codebooks. For each sub-vector, train a codebook with the LBG algorithm on each of the m data classes obtained by the grouping above.
4. The line spectrum pair parameter dimensionality-reduction quantization method based on a conditional Gaussian mixture model according to claim 3, characterized in that said step (4) comprises the following steps:
1) Obtain the parameters $\alpha_i$, $M_i$, $C_i$ of each Gaussian component, where $\alpha_i$ is the weight of each Gaussian component, $M_i$ and $C_i$ are the d-dimensional mean vector and the d × d covariance matrix of the corresponding Gaussian density function, and i is the index of the Gaussian component;
The EM iterative algorithm is adopted, which mainly consists of the following 2 steps:
1. E step, i.e. initial parameter estimation: use the training data to obtain a set of initial parameters $\theta = [\alpha_1, \alpha_2, \ldots, \alpha_m, M_1, M_2, \ldots, M_m, C_1, C_2, \ldots, C_m]$; one may set
Figure FDA00001615009500025
and use the K-means method to compute the cluster centers, which serve as the initial values of $M_1, M_2, \ldots, M_m$; here $\alpha_i$ is the weight of each Gaussian component, $M_i$ and $C_i$ are the d-dimensional mean vector and the d × d covariance matrix of the corresponding Gaussian density function, and i is the index of the Gaussian component;
2. M step, i.e. maximization: using the parameters obtained in the E step (1. above), re-estimate the model parameters according to the maximum-likelihood criterion until the parameter values meet the preset requirement. The new parameters $\alpha_i'$, $M_i'$, $C_i'$ can be computed with the following formulas:
$$\alpha_i' = \frac{1}{m} \sum_{j=1}^{m} h_i(x_j)$$
$$M_i' = \frac{\sum_{j=1}^{m} h_i(x_j)\, x_j}{\sum_{j=1}^{m} h_i(x_j)}$$
$$C_i' = \frac{1}{d} \cdot \frac{\sum_{j=1}^{m} h_i(x_j)\, (x_j - M_i')^T (x_j - M_i')}{\sum_{j=1}^{m} h_i(x_j)}$$
where $h_i(x_j)$ denotes the probability that the observed random vector $x_j$ was produced by the i-th Gaussian component, i is the index of the Gaussian component, and j is the index of the observed vector:
$$h_i(x_j) = \frac{\alpha_i\, g_i(x_j)}{\sum_{k=1}^{m} \alpha_k\, g_k(x_j)}$$
2) Let X and Y be the d-dimensional LSP parameters of the current frame and the previous frame respectively. The joint probability density function of X and Y can be expressed as
$$f_{X,Y}(X, Y) = \sum_{i=1}^{m} \alpha_i\, g_i(X, Y)$$
where $f_{X,Y}(X, Y)$ is the joint probability density function of X and Y, $g_i(X, Y)$ is a 2d-dimensional Gaussian density function, and $M_i$ and $C_i$ are the 2d-dimensional mean vector and the 2d × 2d covariance matrix of the corresponding Gaussian density function:
$$M_i = \begin{bmatrix} M_i^{X} \\ M_i^{Y} \end{bmatrix}, \qquad C_i = \begin{bmatrix} C_i^{XX} & C_i^{XY} \\ C_i^{YX} & C_i^{YY} \end{bmatrix}$$
The marginal probability density function of the previous-frame LSP parameter Y can then be obtained; it is a d-dimensional GMM:
$$f_Y(Y) = \sum_{i=1}^{m} \alpha_i\, f_i(Y)$$
In this way, when the previous-frame parameter Y is known, the conditional probability density of the current frame X can be expressed as
$$f_{X|Y}(X \mid Y) = \sum_{i=1}^{m} \beta_i(Y)\, g(X \mid Y;\ M_i(Y),\ C_i)$$
where:
$$M_i(Y) = M_i^{X} + C_i^{XY} (C_i^{YY})^{-1} (Y - M_i^{Y})$$
$$C_i = C_i^{XX} - C_i^{XY} (C_i^{YY})^{-1} C_i^{YX}$$
$$\beta_i(Y) = \frac{\alpha_i\, f_i(Y)}{\sum_{j=1}^{m} \alpha_j\, f_j(Y)}$$
The conditional covariance $C_i$ does not depend on the variable Y, so it can be computed in advance and stored; i and j are indices of the Gaussian components.
5. The line spectrum pair parameter dimensionality-reduction quantization method based on a conditional Gaussian mixture model according to claim 4, characterized in that said step (7) comprises the following steps:
1) Given the initial codebook size N, select the initial centroids
Figure FDA00001615009500039
by the random selection method or the splitting method. Set the initial average distortion $D_{-1} \to \infty$ and choose a stopping threshold ε, where 0 < ε < 1;
2) Around the given codewords, use the nearest-neighbor criterion to partition the training sequence $X = \{x_1, x_2, \ldots, x_m\}$ into N non-overlapping regions, where m is the number of vectors in the training sequence. The nearest-neighbor criterion is:
$$S_j^n = \{\, x \mid d(x, c_j) \le d(x, c_i),\ i \ne j,\ c_i, c_j \in C_N^n,\ x \in X \,\}, \quad j = 1, 2, \ldots, N$$
where $S_j^n$ is the j-th cell obtained in the n-th iteration, $d(x, c_j)$ is the distance between x and $c_j$, $c_j$ is an element of $C_N^n$, $C_N^n$ is the set of centroids obtained in the n-th iteration, x is an element of the training sequence X, and i and j are the indices of the corresponding centroids;
3) Compute the average distortion and the relative distortion.
The average distortion is
$$D_n = \frac{1}{m} \sum_{r=1}^{m} \min_{c_i \in C_N^n} d(x_r, c_i)$$
where $c_i$ is the centroid of the cell containing $x_r$, $D_n$ is the average distortion, n is the iteration number, m is the number of training vectors, $d(x_r, c_i)$ is the distance between $x_r$ and $c_i$, r is the index of the training vector, and i is the index of the centroid;
The relative distortion is
$$D_{rel}^{n} = \left| \frac{D_{n-1} - D_n}{D_n} \right|$$
If
Figure FDA00001615009500043
then the current centroids satisfy the distortion criterion and can be used as the codewords, and the procedure ends. Otherwise, recompute the centroids and return to step 2) to continue iterating. The centroids are computed as follows:
$$c_i^n = \frac{1}{\lambda} \sum_{r=1}^{\lambda} x_r$$
where λ is the number of training vectors contained in the i-th cell, and i is the index of the cell.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012101400303A CN102708871A (en) 2012-05-08 2012-05-08 Line spectrum-to-parameter dimensional reduction quantizing method based on conditional Gaussian mixture model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2012101400303A CN102708871A (en) 2012-05-08 2012-05-08 Line spectrum-to-parameter dimensional reduction quantizing method based on conditional Gaussian mixture model

Publications (1)

Publication Number Publication Date
CN102708871A true CN102708871A (en) 2012-10-03

Family

ID=46901572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012101400303A Pending CN102708871A (en) 2012-05-08 2012-05-08 Line spectrum-to-parameter dimensional reduction quantizing method based on conditional Gaussian mixture model

Country Status (1)

Country Link
CN (1) CN102708871A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040210436A1 (en) * 2000-04-19 2004-10-21 Microsoft Corporation Audio segmentation and classification
CN101188107A (en) * 2007-09-28 2008-05-28 中国民航大学 A voice recognition method based on wavelet decomposition and mixed Gauss model estimation
CN102034472A (en) * 2009-09-28 2011-04-27 戴红霞 Speaker recognition method based on Gaussian mixture model embedded with time delay neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
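Chen Liwei (陈立伟): "Split vector quantization method for wideband ISF parameters based on conditional PDF" (基于条件PDF的宽带ISF参数分裂矢量量化方法), 《应用科技》 (Applied Science and Technology) *
Bao Changchun (鲍长春): 《数字语音编码原理》 (Principles of Digital Speech Coding), 31 December 2007 *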

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678896A (en) * 2013-12-04 2014-03-26 南昌大学 CVB separation method for GMM parameters
CN104244017A (en) * 2014-09-19 2014-12-24 重庆邮电大学 Multi-level codebook vector quantitative method for compressed encoding of hyperspectral remote sensing image
CN104244018A (en) * 2014-09-19 2014-12-24 重庆邮电大学 Vector quantization method capable of rapidly compressing high-spectrum signals
CN104244017B (en) * 2014-09-19 2018-02-27 重庆邮电大学 The multi-level codebook vector quantization method of compressed encoding high-spectrum remote sensing
CN104244018B (en) * 2014-09-19 2018-04-27 重庆邮电大学 The vector quantization method of Fast Compression bloom spectrum signal
CN107580722B (en) * 2015-05-27 2022-01-14 英特尔公司 Gaussian mixture model accelerator with direct memory access engines corresponding to respective data streams
CN107580722A (en) * 2015-05-27 2018-01-12 英特尔公司 Gauss hybrid models accelerator with the direct memory access (DMA) engine corresponding to each data flow
CN106782510A (en) * 2016-12-19 2017-05-31 苏州金峰物联网技术有限公司 Place name voice signal recognition methods based on continuous mixed Gaussian HMM model
CN106782510B (en) * 2016-12-19 2020-06-02 苏州金峰物联网技术有限公司 Place name voice signal recognition method based on continuous Gaussian mixture HMM model
CN108109612A (en) * 2017-12-07 2018-06-01 苏州大学 A kind of speech recognition sorting technique based on self-adaptive reduced-dimensions
CN110019953A (en) * 2019-04-16 2019-07-16 中国科学院国家空间科学中心 A kind of real-time quick look system of payload image data
CN110019953B (en) * 2019-04-16 2021-03-30 中国科学院国家空间科学中心 Real-time quick-look system for effective load image data
CN111520615A (en) * 2020-04-28 2020-08-11 清华大学 Pipe network leakage identification and positioning method based on line spectrum pair and cubic interpolation search
CN111520615B (en) * 2020-04-28 2021-03-16 清华大学 Pipe network leakage identification and positioning method based on line spectrum pair and cubic interpolation search

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20121003