CN102708871A - Line spectrum pair parameter dimensionality-reduction quantization method based on a conditional Gaussian mixture model
- Publication number
- CN102708871A (CN2012101400303A, CN201210140030A)
- Authority
- CN
- China
- Prior art keywords
- parameter
- sequence
- dimension
- vector
- sigma
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention provides a line spectrum pair (LSP) parameter dimensionality-reduction quantization method based on a conditional Gaussian mixture model. The method comprises the following steps: first, the sampled speech signal is divided into frames, the LSP feature parameters of the speech signal are extracted, and their dimensionality is reduced; the feature parameter sequence is then split into sub-vectors; consecutive sub-vectors are combined in pairs to build a joint sequence; the joint sequence is used to train a conditional Gaussian mixture model and obtain its parameters; the mean vectors, covariance matrices and other parameters of the model are used to compute conditional probability densities, one for each Gaussian component; the data are then grouped, with the current frame assigned to the group described by the Gaussian component with the largest conditional probability density; finally, a codebook is trained on each group with the LBG (Linde, Buzo and Gray) algorithm, and the resulting codebook is the vector quantization result for the speech signal. The method improves quantization performance, is simple to train, and has low computational complexity.
Description
Technical field
The present invention relates to a parameter quantization method, specifically a line spectrum pair parameter dimensionality-reduction quantization method based on a conditional Gaussian mixture model.
Background art
The LSP (line spectrum pair) parameter is an important parameter in speech coding. It plays a critical role in the quality of the decoded speech, so the quantization of this parameter is very important.
Quantization of LSP parameters has been an active research topic since the 1970s. The main line of research has been the improvement of the vector quantizer structure. The earliest quantizer was the split vector quantizer, which splits the LSP parameter vector into several lower-dimensional sub-vectors and then trains a codebook for, and quantizes, each sub-vector separately. This was a breakthrough at the time: it greatly reduced computational complexity while retaining the advantages of vector quantization. A direct development of the split vector quantizer is the switched split vector quantizer proposed by Stephen So, which classifies the LSP parameters before quantization so that each class can be quantized with a more targeted codebook. Lattice quantizers have also been used to quantize LSP parameters, but lattice quantizers work well for Gaussian-distributed data, whereas the distribution of LSP parameters is hard to determine. As Gaussian mixture models have gradually been introduced into the vector quantization of speech, research that combines Gaussian mixture models with lattice quantizers has attracted growing attention. Among the many research directions, exploiting the inter-frame correlation of LSP parameters is another focus: researchers try to use this redundancy between LSP parameters to achieve transparent quantization with fewer bits.
Summary of the invention
The object of the present invention is to provide a line spectrum pair parameter dimensionality-reduction quantization method based on a conditional Gaussian mixture model that is simple to train and has low computational complexity.
The object of the invention is achieved as follows:
The line spectrum pair parameter dimensionality-reduction quantization method based on a conditional Gaussian mixture model comprises the following steps:
Step (1): divide the sampled speech signal into frames, extract the LSP feature parameters of the speech signal, and reduce the dimensionality of the feature parameters;
Step (2): split the feature parameter sequence into sub-vectors;
Step (3): combine the sub-vector parameter sequence in pairs to build the joint sequence;
Step (4): train the conditional Gaussian mixture model with the joint sequence to obtain the parameters of the conditional Gaussian mixture model;
Step (5): use the mean vectors, covariance matrices and other parameters of the conditional Gaussian mixture model to compute the conditional probability densities; the number of conditional probability densities equals the number of Gaussian components;
Step (6): group the data, assigning the current frame to the group described by the Gaussian component with the largest conditional probability density value;
Step (7): train a codebook on each group of data with the LBG algorithm;
Step (8): the codebook finally obtained is the vector quantization result for the speech signal.
Step (1) comprises the following steps:
1) divide the sampled speech signal into frames;
2) extract P-th order LSP parameters from each frame;
3) group L frames of speech into one superframe;
4) using compressed sensing theory, first reduce the dimensionality of the high-dimensional line spectrum pair vector formed by the superframe to obtain the low-dimensional measurement vector y, then allocate a fixed number of measurements to each sub-vector.
Step (6) comprises the following steps:
1) Split the original sequence. The 16-dimensional LSP parameter is split into five sub-vectors of 3 + 3 + 3 + 3 + 4 dimensions. After splitting, there are 545,522 sub-vectors of each kind.
2) Build the joint sequence. Let the initial LSP sub-vector parameter sequence be $x_1, x_2, \ldots, x_n$, where each x is an element of the sequence and has dimension 3 or 4. Combining each pair of consecutive elements yields a new sequence: $x_1x_2, x_2x_3, \ldots, x_{n-1}x_n$. Taking $x_1x_2$ as an example, this element has 6 or 8 dimensions: $x_1$ forms its first 3 or 4 dimensions and $x_2$ forms its last 3 or 4 dimensions, where $x_2$ is the current subframe and $x_1$ is the previous subframe. The newly built sequence is called the joint sequence.
3) Train the joint GMM. Set the number of GMM components: let the GMM consist of m Gaussian components, where m = 4 or m = 8. Training yields m Gaussian component weights, m mean vectors of size 1 × 6 or 1 × 8, and m covariance matrices of size 6 × 6 or 8 × 8.
4) Classify the data. Use the conditional probability density of the conditional Gaussian mixture model to calculate the probability value of the current subframe X. Here X and Y are the d-dimensional LSP parameters of the current frame and the previous frame, d is the dimension of the parameter, $M_i$ and $C_i$ are the 2d-dimensional mean vector and the 2d × 2d covariance matrix of the corresponding Gaussian density function, $\alpha_i$ is the weight of each Gaussian component and satisfies $\alpha_i > 0$ and $\sum_{i=1}^{m} \alpha_i = 1$, $g_i(Y)$ is a d-dimensional Gaussian density function, and i is the index of the Gaussian component.
Because the GMM consists of m Gaussian components, m conditional probability values can be computed. The frame is assigned to the group described by the component with the largest conditional probability value. Performing this step frame by frame, starting from the first frame, finally yields m data classes.
5) Train the codebooks. For each sub-vector, a codebook is trained with the LBG algorithm on each of the m grouped data classes.
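The grouping rule of sub-step 4) can be sketched as follows. This is a minimal illustration, assuming the m conditional probability values of every frame have already been computed; the function name `group_frames` and the nested-list layout of the scores are hypothetical and not taken from the patent.

```python
def group_frames(scores_per_frame):
    """Assign each frame to the Gaussian component with the largest score.

    scores_per_frame[k][i] is assumed to hold the conditional probability
    value of frame k under component i (computed elsewhere).
    Returns a list of m groups, each a list of frame indices.
    """
    m = len(scores_per_frame[0])
    groups = [[] for _ in range(m)]
    for k, scores in enumerate(scores_per_frame):
        # index of the component with the maximum conditional probability value
        best = max(range(m), key=lambda i: scores[i])
        groups[best].append(k)
    return groups


# Three frames scored against m = 2 components (made-up numbers)
groups = group_frames([[0.1, 0.9], [0.7, 0.3], [0.2, 0.8]])
```

Each resulting group would then be passed to the codebook training of sub-step 5).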
Step (4) comprises the following steps:
1) Obtain the parameters $\alpha_i$, $M_i$, $C_i$ of each Gaussian component, where $\alpha_i$ is the weight of each Gaussian component, $M_i$ and $C_i$ are the d-dimensional mean vector and the d × d covariance matrix of the corresponding Gaussian density function, and i is the index of the Gaussian component.
The EM iterative algorithm is used, which consists of the following two steps:
1. E step, i.e. initial parameter estimation. The training data are used to obtain a set of initial parameters $\theta = [\alpha_1, \alpha_2, \ldots, \alpha_m, M_1, M_2, \ldots, M_m, C_1, C_2, \ldots, C_m]$. Initial values are assigned, and the K-means method is used to compute the cluster centers, which serve as the initial values of $M_1, M_2, \ldots, M_m$.
2. M step, i.e. maximization. Using the parameters obtained in the E step, the model parameters are re-estimated according to the maximum likelihood criterion until the parameter values meet a preset requirement. The new parameters $\alpha_i'$, $M_i'$, $C_i'$ can be computed as
$$\alpha_i' = \frac{1}{n}\sum_{j=1}^{n} h_i(x_j), \qquad M_i' = \frac{\sum_{j=1}^{n} h_i(x_j)\,x_j}{\sum_{j=1}^{n} h_i(x_j)}, \qquad C_i' = \frac{\sum_{j=1}^{n} h_i(x_j)\,(x_j - M_i')(x_j - M_i')^{T}}{\sum_{j=1}^{n} h_i(x_j)}$$
where $h_i(x_j)$ is the probability that the observed random vector $x_j$ was generated by the i-th Gaussian component, n is the number of training vectors, i is the index of the Gaussian component, and j is the index of the observed vector.
2) Let X and Y be the d-dimensional LSP parameters of the current frame and the previous frame, respectively. The joint probability density function of X and Y can be expressed as
$$f_{X,Y}(X,Y) = \sum_{i=1}^{m} \alpha_i\, g_i(X,Y;\, M_i, C_i)$$
where $f_{X,Y}(X,Y)$ is the joint probability density function of X and Y, $g_i(X,Y)$ is a 2d-dimensional Gaussian density function, and $M_i$, $C_i$ are the 2d-dimensional mean vector and the 2d × 2d covariance matrix of the corresponding Gaussian density function, partitioned as
$$M_i = \begin{bmatrix} M_i^{X} \\ M_i^{Y} \end{bmatrix}, \qquad C_i = \begin{bmatrix} C_i^{XX} & C_i^{XY} \\ C_i^{YX} & C_i^{YY} \end{bmatrix}$$
The marginal probability density function of the previous-frame LSP parameter Y is then a d-dimensional GMM:
$$f_Y(Y) = \sum_{i=1}^{m} \alpha_i\, g_i(Y;\, M_i^{Y}, C_i^{YY})$$
Thus, when the previous-frame parameter Y is known, the conditional probability density of the current frame X can be expressed as
$$f_{X|Y}(X \mid Y) = \sum_{i=1}^{m} \beta_i(Y)\, g_i(X;\, M_i^{X|Y}, C_i^{X|Y})$$
where
$$\beta_i(Y) = \frac{\alpha_i\, g_i(Y;\, M_i^{Y}, C_i^{YY})}{\sum_{j=1}^{m} \alpha_j\, g_j(Y;\, M_j^{Y}, C_j^{YY})}$$
$$M_i^{X|Y} = M_i^{X} + C_i^{XY}\,(C_i^{YY})^{-1}\,(Y - M_i^{Y})$$
$$C_i^{X|Y} = C_i^{XX} - C_i^{XY}\,(C_i^{YY})^{-1}\,C_i^{YX}$$
The conditional covariance $C_i^{X|Y}$ does not depend on the variable Y and can be computed in advance and stored; i and j are indices of the Gaussian components.
Step (7) comprises the following steps:
1) Given the initial codebook size N, select the initial centroids $C^{(0)} = \{c_1^{(0)}, \ldots, c_N^{(0)}\}$ by random selection or by the splitting method, set the initial average distortion $D_{-1} \to \infty$, and choose a stopping threshold ε with 0 < ε < 1.
2) Around the given codewords, partition the training sequence $X = \{x_1, x_2, \ldots, x_m\}$ into N non-overlapping regions (cells) according to the nearest-neighbor criterion, where m is the number of vectors in the training sequence. The nearest-neighbor criterion is
$$S_j^{(n)} = \{\, x \in X : d(x, c_j) \le d(x, c_i)\ \ \forall\, i \ne j \,\}$$
3) Compute the average distortion and the relative distortion.
The average distortion is
$$D_n = \frac{1}{m}\sum_{r=1}^{m} d(x_r, c_i)$$
where $c_i$ is the centroid of the cell containing $x_r$, $D_n$ is the average distortion, n is the iteration number, m is the number of training vectors, $d(x_r, c_i)$ is the distance between $x_r$ and $c_i$, r is the index of the training vector, and i is the index of the centroid.
The relative distortion is
$$\tilde{D}_n = \left| \frac{D_{n-1} - D_n}{D_n} \right|$$
If $\tilde{D}_n \le \varepsilon$, the current centroids satisfy the distortion criterion; these centroids are used as the codewords and the procedure ends. Otherwise the centroids are recomputed and the iteration continues from step 2). The centroid is computed as
$$c_i = \frac{1}{\lambda}\sum_{x \in S_i} x$$
where λ is the number of training vectors contained in the i-th cell and i is the cell index.
The advantages of the invention are as follows:
Vector quantization is an important compression method that is widely used in speech coding, speech recognition and speech synthesis.
The line spectrum pair (LSP) parameter is an important parameter in low-rate speech coding. It is a frequency-domain parameter that is closely related to the peaks of the speech spectral envelope, and its quantization and interpolation properties are superior to those of other parameters. The quantization quality of the LSP parameters directly affects the intelligibility of the synthesized speech. Research on efficient LSP parameter quantization algorithms is therefore very important for speech coding.
Building on an analysis of compressed-sensing-based LSP parameter dimensionality reduction and of the conditional Gaussian mixture model, the invention uses the conditional Gaussian mixture model to perform dimensionality-reduction quantization of line spectrum pair parameters. First, a compressed-sensing-based dimensionality reduction algorithm reduces the dimensionality of the LSP parameters, which lowers the amount of computation. The training parameters are then used to construct the conditional Gaussian mixture model of the current frame, which exploits the inter-frame correlation so that more targeted codebooks can be designed. For different LSP parameters the codebook is better matched, which improves quantization performance. Compared with the ordinary split vector quantization algorithm, the effectiveness of the method is shown in three respects: spectral distortion, computational complexity and storage complexity.
The method was implemented by building an LSP parameter dimensionality-reduction quantization system based on the conditional Gaussian mixture model, and a series of targeted experiments was run on this system. The environment in which the system was implemented is as follows:
1. Hardware: Intel(R) Pentium(R) Dual CPU at 1.60 GHz; 1 GB memory; 256 MB graphics card; 80 GB hard disk.
2. Software: Windows XP operating system; Matlab 7.0 and VC++ 6.0 development environments.
The quantization performance of the system was evaluated. In the experiments, 545,522 frames of data were used to train the codebooks and 65,016 frames of data were used for testing. All LSP parameters were computed with the ITU-T G.722.2 speech codec, and the average spectral distortion was computed according to the internationally accepted method. The 16-dimensional LSP parameters were split into 5 sub-vectors of dimensions (3, 3, 3, 3, 4), and the number of bits allocated to each sub-vector is listed in the corresponding table. The number of Gaussian components was 4 and 8. The algorithm was analyzed in terms of spectral distortion, computational complexity and storage complexity, and compared with the SVQ (split vector quantization) algorithm. The results show that on the main performance measure, spectral distortion (average spectral distortion and the 2-4 dB outlier rate), the method is clearly better than SVQ at the same bit rate; computational complexity rises only slightly, while storage complexity increases somewhat. Because the storage available on current processing chips keeps growing, trading storage complexity for better quantization performance is acceptable.
Description of drawings
Fig. 1 is a flow chart of the line spectrum pair parameter dimensionality-reduction quantization method based on the conditional Gaussian mixture model.
Embodiment
The invention is described in more detail below with reference to the accompanying drawing.
With reference to Fig. 1, the line spectrum pair parameter dimensionality-reduction quantization based on the conditional Gaussian mixture model comprises the following steps:
(1) Input the speech signal and divide it into frames
Framing is performed by applying a Hamming window, whose window function is defined as
$$w(n) = 0.54 - 0.46\cos\!\left(\frac{2\pi n}{N-1}\right), \qquad 0 \le n \le N-1$$
where N is the window length, i.e. the frame length, and w(n) is the window function. The windowed speech becomes
$$s_w(n) = s(n)\,w(n)$$
where s(n) is the original speech and $s_w(n)$ is the windowed speech.
(2) Extract the line spectrum pair (LSP) feature parameters, comprising:
1. Perform P-th order linear prediction analysis on the speech signal, where P is the linear prediction order, to obtain P linear prediction coefficients $\alpha_i$, i = 1, 2, ..., P, where i is the index of the coefficient. Let A(z) be the linear prediction analysis filter formed from these coefficients.
2. Define two polynomials:
$$P(z) = A(z) + z^{-(P+1)}A(z^{-1})$$
$$Q(z) = A(z) - z^{-(P+1)}A(z^{-1})$$
which satisfy $A(z) = \tfrac{1}{2}\,[P(z) + Q(z)]$, where P in the exponent is the linear prediction order. The zeros of these polynomials are complex, and the frequencies given by the phases of the zeros are the LSP parameters.
3. Group L frames of speech into one superframe.
4. Using compressed sensing theory, first reduce the dimensionality of the high-dimensional line spectrum pair vector formed by the superframe to obtain the low-dimensional measurement vector y, then allocate a fixed number of measurements to each sub-vector.
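The dimensionality reduction in item 4 can be pictured as a linear measurement y = Φx of the superframe vector x. The sketch below is only one possible reading: the patent does not state which measurement matrix is used, so the random Gaussian matrix, the sizes (L = 2 frames of P = 16 parameters reduced to 16 measurements) and all function names are assumptions chosen for illustration.

```python
import random


def measurement_matrix(n_meas, n_dim, seed=0):
    """Random Gaussian measurement matrix (an assumed choice, not from the text)."""
    rng = random.Random(seed)
    scale = 1.0 / n_meas ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_dim)]
            for _ in range(n_meas)]


def measure(phi, x):
    """Low-dimensional measurement y = phi * x."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


# Hypothetical superframe: L frames of P LSP values each, concatenated
L, P = 2, 16
superframe = [0.01 * k for k in range(L * P)]   # placeholder values, not real LSPs
phi = measurement_matrix(16, L * P)
y = measure(phi, superframe)                    # 32 dimensions reduced to 16
```

The measurement vector y would then be divided among the sub-vectors with a fixed number of measurements each.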
(3) Split the feature parameter sequence of one frame into sub-vectors
The 16-dimensional LSP parameter is split into five sub-vectors of 3 + 3 + 3 + 3 + 4 dimensions. After splitting, there are 545,522 sub-vectors of each kind.
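A small sketch of the (3, 3, 3, 3, 4) split; the helper name `split_vector` is hypothetical.

```python
def split_vector(v, sizes=(3, 3, 3, 3, 4)):
    """Split one 16-dimensional parameter vector into consecutive sub-vectors."""
    if len(v) != sum(sizes):
        raise ValueError("vector length must equal the sum of the sub-vector sizes")
    parts, start = [], 0
    for size in sizes:
        parts.append(v[start:start + size])
        start += size
    return parts


# Indices 0..15 stand in for the 16 LSP values of one frame
parts = split_vector(list(range(16)))
```

Applying the split to every one of the 545,522 training frames gives five sequences of sub-vectors, one per position in the split.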
(4) Build the joint sequence
Let the initial LSP sub-vector parameter sequence be $x_1, x_2, \ldots, x_n$, where the dimension of each element is 3 (or 4). Combining each pair of consecutive elements yields a new sequence: $x_1x_2, x_2x_3, \ldots, x_{n-1}x_n$. Taking $x_1x_2$ as an example, this element has 6 (or 8) dimensions: $x_1$ forms its first 3 (or 4) dimensions and $x_2$ forms its last 3 (or 4) dimensions, where $x_2$ is the current subframe and $x_1$ is the previous subframe. The newly built sequence is called the joint sequence.
(5) Train the conditional Gaussian mixture model
1. Obtain the parameters $\alpha_i$, $M_i$, $C_i$ of each Gaussian component, where $\alpha_i$ is the weight of each Gaussian component and $M_i$ and $C_i$ are the d-dimensional mean vector and the d × d covariance matrix of the corresponding Gaussian density function.
The EM iterative algorithm is used. It consists of the following two steps.
E step, i.e. initial parameter estimation. The training data are used to obtain a set of initial parameters $\theta = [\alpha_1, \alpha_2, \ldots, \alpha_m, M_1, M_2, \ldots, M_m, C_1, C_2, \ldots, C_m]$. Initial values are assigned, and the K-means method is used to compute the cluster centers, which serve as the initial values of $M_1, M_2, \ldots, M_m$.
M step, i.e. maximization. Using the parameters obtained in the previous step, the model parameters are re-estimated according to the maximum likelihood criterion until the parameter values meet a preset requirement. The new parameters $\alpha_i'$, $M_i'$, $C_i'$ can be computed as
$$\alpha_i' = \frac{1}{n}\sum_{j=1}^{n} h_i(x_j), \qquad M_i' = \frac{\sum_{j=1}^{n} h_i(x_j)\,x_j}{\sum_{j=1}^{n} h_i(x_j)}, \qquad C_i' = \frac{\sum_{j=1}^{n} h_i(x_j)\,(x_j - M_i')(x_j - M_i')^{T}}{\sum_{j=1}^{n} h_i(x_j)}$$
where $h_i(x_j)$ is the probability that the observed random vector $x_j$ was generated by the i-th Gaussian component and n is the number of training vectors.
2. Let X and Y be the d-dimensional LSP parameters of the current frame and the previous frame, respectively. The joint probability density function of X and Y can be expressed as
$$f_{X,Y}(X,Y) = \sum_{i=1}^{m} \alpha_i\, g_i(X,Y;\, M_i, C_i)$$
where $g_i(X,Y)$ is a 2d-dimensional Gaussian density function, and $M_i$, $C_i$ are the 2d-dimensional mean vector and the 2d × 2d covariance matrix of the corresponding Gaussian density function, partitioned as
$$M_i = \begin{bmatrix} M_i^{X} \\ M_i^{Y} \end{bmatrix}, \qquad C_i = \begin{bmatrix} C_i^{XX} & C_i^{XY} \\ C_i^{YX} & C_i^{YY} \end{bmatrix}$$
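(6) Compute the conditional probability density
Let $M_i^{X}$, $M_i^{Y}$ denote the d-dimensional blocks of the mean vector $M_i$ that belong to X and Y, and let $C_i^{XX}$, $C_i^{XY}$, $C_i^{YX}$, $C_i^{YY}$ denote the corresponding d × d blocks of the covariance matrix $C_i$. The marginal probability density function of the previous-frame LSP parameter Y is a d-dimensional GMM:
$$f_Y(Y) = \sum_{i=1}^{m} \alpha_i\, g_i(Y;\, M_i^{Y}, C_i^{YY})$$
Thus, when the previous-frame parameter Y is known, the conditional probability density of the current frame X can be expressed as
$$f_{X|Y}(X \mid Y) = \sum_{i=1}^{m} \beta_i(Y)\, g_i(X;\, M_i^{X|Y}, C_i^{X|Y})$$
where
$$\beta_i(Y) = \frac{\alpha_i\, g_i(Y;\, M_i^{Y}, C_i^{YY})}{\sum_{j=1}^{m} \alpha_j\, g_j(Y;\, M_j^{Y}, C_j^{YY})}$$
$$M_i^{X|Y} = M_i^{X} + C_i^{XY}\,(C_i^{YY})^{-1}\,(Y - M_i^{Y})$$
$$C_i^{X|Y} = C_i^{XX} - C_i^{XY}\,(C_i^{YY})^{-1}\,C_i^{YX}$$
The conditional covariance $C_i^{X|Y}$ does not depend on the variable Y and can be computed in advance and stored.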
(7) Train the codebook with the LBG algorithm (a vector quantization algorithm proposed in 1980 by Linde, Buzo and Gray; LBG stands for the initials of the three names)
1. Given the initial codebook size N, select the initial centroids $C^{(0)} = \{c_1^{(0)}, \ldots, c_N^{(0)}\}$ by random selection or by the splitting method, set the initial average distortion $D_{-1} \to \infty$, and choose a stopping threshold ε (0 < ε < 1).
2. Around the given codewords, partition the training sequence $X = \{x_1, x_2, \ldots, x_m\}$ into N non-overlapping regions (cells) according to the nearest-neighbor criterion. The nearest-neighbor criterion is
$$S_j^{(n)} = \{\, x \in X : d(x, c_j) \le d(x, c_i)\ \ \forall\, i \ne j \,\}$$
3. Compute the average distortion and the relative distortion.
The average distortion is
$$D_n = \frac{1}{m}\sum_{r=1}^{m} d(x_r, c_i)$$
where $c_i$ is the centroid of the cell containing $x_r$.
The relative distortion is
$$\tilde{D}_n = \left| \frac{D_{n-1} - D_n}{D_n} \right|$$
If $\tilde{D}_n \le \varepsilon$, the current centroids satisfy the distortion criterion; these centroids are used as the codewords and the procedure ends. Otherwise the centroids are recomputed and the iteration continues from step 2. The centroid is computed as
$$c_i = \frac{1}{\lambda}\sum_{x \in S_i} x$$
where λ is the number of training vectors contained in the i-th cell.
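The LBG iteration described above can be sketched as follows. Several details are assumptions of this sketch rather than statements of the patent: squared Euclidean distance is used for d(x, c), the initial centroids are chosen by random selection, an empty cell keeps its old codeword, and `max_iter` is an added safeguard.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance (the distance measure assumed here)."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def lbg(train, n_codewords, eps=1e-3, seed=0, max_iter=100):
    """Train a codebook of n_codewords vectors on the list of vectors train."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(train, n_codewords)]  # random selection
    prev_dist = float("inf")                                      # D_{-1} -> infinity
    for _ in range(max_iter):
        # Step 2: nearest-neighbor partition into cells
        cells = [[] for _ in codebook]
        total = 0.0
        for x in train:
            dists = [sq_dist(x, c) for c in codebook]
            j = dists.index(min(dists))
            cells[j].append(x)
            total += dists[j]
        # Step 3: average and relative distortion
        dist = total / len(train)
        if dist == 0.0 or (prev_dist - dist) / dist <= eps:
            break                      # current centroids become the codewords
        prev_dist = dist
        # Centroid update; an empty cell keeps its previous codeword
        for j, cell in enumerate(cells):
            if cell:
                codebook[j] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook


# Two well-separated made-up clusters, codebook of size 2
train = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
         [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]]
codebook = sorted(lbg(train, 2))
```

In the method described here, one such codebook would be trained per sub-vector and per data group produced by the Gaussian-component classification.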
The concrete implementation steps of the invention are:
1. First divide the sampled speech signal into frames, extract the line spectrum pair (LSP) feature parameters of the speech signal, and reduce the dimensionality of the feature parameters, which specifically comprises the following steps:
(1) divide the sampled speech signal into frames;
(2) extract P-th order LSP parameters from each frame (P is the order of the feature parameters);
(3) group L frames of speech into one superframe (L is the number of frames contained in one superframe);
(4) using compressed sensing theory, first reduce the dimensionality of the high-dimensional line spectrum pair vector formed by the superframe to obtain the low-dimensional measurement vector y, then allocate a fixed number of measurements to each sub-vector.
2. Then split the feature parameter sequence into sub-vectors;
3. Combine the sub-vector parameter sequence in pairs to build the joint sequence;
4. Train the conditional Gaussian mixture model with the joint sequence to obtain the parameters of the conditional Gaussian mixture model, which specifically comprises the following steps:
(1) Obtain the parameters $\alpha_i$, $M_i$, $C_i$ of each Gaussian component, where $\alpha_i$ is the weight of each Gaussian component, $M_i$ and $C_i$ are the d-dimensional mean vector and the d × d covariance matrix of the corresponding Gaussian density function, and i is the index of the Gaussian component.
The EM (Expectation-Maximization) iterative algorithm is used, which consists of the following two steps:
1. E step, i.e. initial parameter estimation. The training data are used to obtain a set of initial parameters $\theta = [\alpha_1, \alpha_2, \ldots, \alpha_m, M_1, M_2, \ldots, M_m, C_1, C_2, \ldots, C_m]$. Initial values are assigned, and the K-means method is used to compute the cluster centers, which serve as the initial values of $M_1, M_2, \ldots, M_m$.
2. M step, i.e. maximization. Using the parameters obtained in the E step, the model parameters are re-estimated according to the maximum likelihood criterion until the parameter values meet a preset requirement. The new parameters $\alpha_i'$, $M_i'$, $C_i'$ can be computed as
$$\alpha_i' = \frac{1}{n}\sum_{j=1}^{n} h_i(x_j), \qquad M_i' = \frac{\sum_{j=1}^{n} h_i(x_j)\,x_j}{\sum_{j=1}^{n} h_i(x_j)}, \qquad C_i' = \frac{\sum_{j=1}^{n} h_i(x_j)\,(x_j - M_i')(x_j - M_i')^{T}}{\sum_{j=1}^{n} h_i(x_j)}$$
where $h_i(x_j)$ is the probability that the observed random vector $x_j$ was generated by the i-th Gaussian component, n is the number of training vectors, i is the index of the Gaussian component, and j is the index of the observed vector.
(2) Let X and Y be the d-dimensional LSP parameters of the current frame and the previous frame, respectively. The joint probability density function of X and Y can be expressed as
$$f_{X,Y}(X,Y) = \sum_{i=1}^{m} \alpha_i\, g_i(X,Y;\, M_i, C_i)$$
where $f_{X,Y}(X,Y)$ is the joint probability density function of X and Y, $g_i(X,Y)$ is a 2d-dimensional Gaussian density function, and $M_i$, $C_i$ are the 2d-dimensional mean vector and the 2d × 2d covariance matrix of the corresponding Gaussian density function, partitioned as
$$M_i = \begin{bmatrix} M_i^{X} \\ M_i^{Y} \end{bmatrix}, \qquad C_i = \begin{bmatrix} C_i^{XX} & C_i^{XY} \\ C_i^{YX} & C_i^{YY} \end{bmatrix}$$
The marginal probability density function of the previous-frame LSP parameter Y is then a d-dimensional GMM:
$$f_Y(Y) = \sum_{i=1}^{m} \alpha_i\, g_i(Y;\, M_i^{Y}, C_i^{YY})$$
Thus, when the previous-frame parameter Y is known, the conditional probability density of the current frame X can be expressed as
$$f_{X|Y}(X \mid Y) = \sum_{i=1}^{m} \beta_i(Y)\, g_i(X;\, M_i^{X|Y}, C_i^{X|Y})$$
where
$$\beta_i(Y) = \frac{\alpha_i\, g_i(Y;\, M_i^{Y}, C_i^{YY})}{\sum_{j=1}^{m} \alpha_j\, g_j(Y;\, M_j^{Y}, C_j^{YY})}$$
$$M_i^{X|Y} = M_i^{X} + C_i^{XY}\,(C_i^{YY})^{-1}\,(Y - M_i^{Y})$$
$$C_i^{X|Y} = C_i^{XX} - C_i^{XY}\,(C_i^{YY})^{-1}\,C_i^{YX}$$
The conditional covariance $C_i^{X|Y}$ does not depend on the variable Y and can be computed in advance and stored; i and j are indices of the Gaussian components.
5. Use the mean vectors, covariance matrices and other parameters of the conditional Gaussian mixture model to compute the conditional probability densities; the number of conditional probability densities equals the number of Gaussian components;
6. Then group the data, assigning the frame being processed to the group described by the Gaussian component with the largest conditional probability density value, which specifically comprises the following steps:
(1) Split the original sequence. The 16-dimensional LSP parameter is split into five sub-vectors of 3 + 3 + 3 + 3 + 4 dimensions. After splitting, there are 545,522 sub-vectors of each kind.
(2) Build the joint sequence. Let the initial LSP sub-vector parameter sequence be $x_1, x_2, \ldots, x_n$, where each x is an element of the sequence and has dimension 3 (or 4). Combining each pair of consecutive elements yields a new sequence: $x_1x_2, x_2x_3, \ldots, x_{n-1}x_n$. Taking $x_1x_2$ as an example, this element has 6 (or 8) dimensions: $x_1$ forms its first 3 (or 4) dimensions and $x_2$ forms its last 3 (or 4) dimensions, where $x_2$ is the current subframe and $x_1$ is the previous subframe. The newly built sequence is called the joint sequence.
(3) Train the joint GMM (Gaussian mixture model). Set the number of GMM components: let the GMM consist of m Gaussian components (m = 4 or m = 8). Training yields m Gaussian component weights, m mean vectors of size 1 × 6 (or 1 × 8), and m covariance matrices of size 6 × 6 (or 8 × 8).
(4) Classify the data. Use the conditional probability density of the conditional Gaussian mixture model to calculate the probability value of the current subframe X. Here X and Y are the d-dimensional LSP parameters of the current frame and the previous frame, d is the dimension of the parameter, $M_i$ and $C_i$ are the 2d-dimensional mean vector and the 2d × 2d covariance matrix of the corresponding joint Gaussian density function, $\alpha_i$ is the weight of each Gaussian component and satisfies $\alpha_i > 0$ and $\sum_{i=1}^{m} \alpha_i = 1$, $g_i(Y)$ is a d-dimensional Gaussian density function with a d-dimensional mean vector and a d × d covariance matrix, and i is the index of the Gaussian component.
Because the GMM consists of m Gaussian components, m conditional probability values can be computed. The frame is assigned to the group described by the component with the largest conditional probability value. Performing this step frame by frame, starting from the first frame, finally yields m data classes.
(5) Train the codebooks. For each sub-vector, a codebook is trained with the LBG algorithm on each of the m grouped data classes.
7. Train a codebook on each group of data with the LBG (Linde-Buzo-Gray) algorithm, which specifically comprises the following steps:
(1) Given the initial codebook size N, select the initial centroids $C^{(0)} = \{c_1^{(0)}, \ldots, c_N^{(0)}\}$ by random selection or by the splitting method, set the initial average distortion $D_{-1} \to \infty$, and choose a stopping threshold ε (0 < ε < 1).
(2) Around the given codewords, partition the training sequence $X = \{x_1, x_2, \ldots, x_m\}$ into N non-overlapping regions (cells) according to the nearest-neighbor criterion, where m is the number of vectors in the training sequence. The nearest-neighbor criterion is
$$S_j^{(n)} = \{\, x \in X : d(x, c_j) \le d(x, c_i)\ \ \forall\, i \ne j,\ c_i, c_j \in C^{(n)} \,\}$$
where $S_j^{(n)}$ is the j-th cell obtained in the n-th iteration, $d(x, c_j)$ is the distance between x and $c_j$, $c_j$ is an element of $C^{(n)}$, $C^{(n)}$ is the set of centroids obtained in the n-th iteration, x is an element of the training sequence X, and i, j are the indices of the corresponding centroids.
(3) Compute the average distortion and the relative distortion.
The average distortion is
$$D_n = \frac{1}{m}\sum_{r=1}^{m} d(x_r, c_i)$$
where $c_i$ is the centroid of the cell containing $x_r$, $D_n$ is the average distortion, n is the iteration number, m is the number of training vectors, $d(x_r, c_i)$ is the distance between $x_r$ and $c_i$, r is the index of the training vector, and i is the index of the centroid.
The relative distortion is
$$\tilde{D}_n = \left| \frac{D_{n-1} - D_n}{D_n} \right|$$
If $\tilde{D}_n \le \varepsilon$, the current centroids satisfy the distortion criterion; these centroids are used as the codewords and the procedure ends. Otherwise the centroids are recomputed and the iteration continues from step (2). The centroid is computed as
$$c_i = \frac{1}{\lambda}\sum_{x \in S_i} x$$
where λ is the number of training vectors contained in the i-th cell and i is the cell index.
8. The codebook finally obtained is the vector quantization result for the speech signal.
Claims (5)
1. A line spectrum pair parameter dimensionality-reduction quantization method based on a conditional Gaussian mixture model, characterized in that it comprises the following steps:
Step (1): divide the sampled speech signal into frames, extract the LSP feature parameters of the speech signal, and reduce the dimensionality of the feature parameters;
Step (2): split the feature parameter sequence into sub-vectors;
Step (3): combine the sub-vector parameter sequence in pairs to build the joint sequence;
Step (4): train the conditional Gaussian mixture model with the joint sequence to obtain the parameters of the conditional Gaussian mixture model;
Step (5): use the mean vectors, covariance matrices and other parameters of the conditional Gaussian mixture model to compute the conditional probability densities; the number of conditional probability densities equals the number of Gaussian components;
Step (6): group the data, assigning the current frame to the group described by the Gaussian component with the largest conditional probability density value;
Step (7): train a codebook on each group of data with the LBG algorithm;
Step (8): the codebook finally obtained is the vector quantization result for the speech signal.
2. The line spectrum pair parameter dimensionality-reduction quantization method based on a conditional Gaussian mixture model according to claim 1, characterized in that step (1) comprises the following steps:
1) divide the sampled speech signal into frames;
2) extract P-th order LSP parameters from each frame;
3) group L frames of speech into one superframe;
4) using compressed sensing theory, first reduce the dimensionality of the high-dimensional line spectrum pair vector formed by the superframe to obtain the low-dimensional measurement vector y, then allocate a fixed number of measurements to each sub-vector.
3. The line spectrum pair (LSP) parameter dimension-reduction quantization method based on a conditional Gaussian mixture model according to claim 2, characterized in that said step (6) comprises the following steps:
1) Split the original sequence: the 16-dimensional LSP parameter is split into five sub-vectors of 3 + 3 + 3 + 3 + 4 dimensions; after splitting, there are 545,522 sub-vectors of each kind;
2) Build the joint sequence: let the initial LSP sub-vector parameter sequence be $x_1, x_2, \ldots, x_n$, where x is an element of this parameter sequence and the dimension of each element is 3 or 4. Combining consecutive elements in pairs forms a new sequence set: $x_1x_2, x_2x_3, \ldots, x_{n-1}x_n$. Taking $x_1x_2$ as an example, this element has 6 or 8 dimensions; $x_1$ forms its first 3 or 4 dimensions and $x_2$ forms its last 3 or 4 dimensions, where $x_2$ is the current sub-frame and $x_1$ is the previous sub-frame. The newly built sequence is called the joint sequence;
3) Train the joint GMM: set the number of GMM components. Let the GMM consist of m Gaussian components, where m = 4 or m = 8 is the chosen number of Gaussian components. Training yields the weights of the m Gaussian components, together with their 1 × 6 or 1 × 8 mean vectors and 6 × 6 or 8 × 8 covariance matrices;
4) Classify the data: the conditional probability formula is used to calculate the probability value of the current sub-frame X, where X and Y are the d-dimensional LSP parameters of the current frame and the previous frame respectively, d is the dimension of the parameter, $M_i$ and $C_i$ are respectively the 2d-dimensional mean vector and the 2d × 2d covariance matrix of the corresponding Gaussian density function, $\alpha_i$ is the weight of each Gaussian component and satisfies $\alpha_i > 0$ and $\sum_{i=1}^{m} \alpha_i = 1$, G(Y) is a d-dimensional Gaussian density function, and i is the index of the Gaussian component;
Because the GMM consists of m Gaussian components, m conditional probability values can be calculated, and the frame is assigned to the group described by the component with the largest conditional probability value. Carrying out this step in order from the first frame finally yields m data classes;
5) Train the codebooks: for each sub-vector, a codebook is trained with the LBG algorithm on each of the m grouped data classes.
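Sub-steps 1), 2) and 4) can be sketched as below. This is a minimal illustration, not the patent's implementation: it assumes that the per-component conditional value is the weighted joint density $\alpha_i g_i$ divided by the marginal density of the previous sub-frame, so the division (common to all components) is skipped when taking the maximum. All function names are hypothetical.

```python
import numpy as np

SPLIT_SIZES = (3, 3, 3, 3, 4)  # 16 = 3 + 3 + 3 + 3 + 4, as in sub-step 1)


def split_subvectors(params, sizes=SPLIT_SIZES):
    """Split (n, 16) parameter vectors into five sub-vector sequences."""
    bounds = np.cumsum(sizes)[:-1]
    return np.split(params, bounds, axis=1)


def joint_sequence(sub):
    """Row k is [x_k, x_{k+1}]: previous sub-frame first, current one last."""
    return np.hstack([sub[:-1], sub[1:]])


def log_gaussian(z, mean, cov):
    """Log of the multivariate normal density at every row of z."""
    dim = mean.shape[0]
    diff = z - mean
    quad = np.einsum("nd,nd->n", diff, np.linalg.solve(cov, diff.T).T)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (quad + logdet + dim * np.log(2.0 * np.pi))


def classify(joint, weights, means, covs):
    """Index of the component with the largest alpha_i * g_i(joint vector).
    Dividing by the marginal of the previous sub-frame would not change
    the argmax, so it is skipped (assumption stated in the lead-in)."""
    scores = np.stack([
        np.log(weights[i]) + log_gaussian(joint, means[i], covs[i])
        for i in range(len(weights))
    ])
    return scores.argmax(axis=0)
```

Working in the log domain avoids underflow when the 6- or 8-dimensional densities become very small; the resulting labels select which of the m groups each frame joins before codebook training.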
4. The line spectrum pair (LSP) parameter dimension-reduction quantization method based on a conditional Gaussian mixture model according to claim 3, characterized in that said step (4) comprises the following steps:
1) Obtain the parameters $\alpha_i$, $M_i$, $C_i$ of each Gaussian component, where $\alpha_i$ is the weight of each Gaussian component, $M_i$ and $C_i$ are respectively the d-dimensional mean vector and the d × d covariance matrix of the corresponding Gaussian density function, and i is the index of the Gaussian component;
The EM iterative algorithm is used, which mainly consists of the following two steps:
① E step, i.e. initial parameter estimation: the training data are used to obtain a set of initial parameters $\theta = [\alpha_1, \alpha_2, \ldots, \alpha_m, M_1, M_2, \ldots, M_m, C_1, C_2, \ldots, C_m]$, with the K-means method used to calculate the cluster centres, which serve as the initial values of $M_1, M_2, \ldots, M_m$;
② M step, i.e. maximization: using the parameters obtained in the E step ①, the model parameters are re-estimated according to the maximum-likelihood criterion until the parameter values meet a preset requirement. The new parameters $\alpha_i'$, $M_i'$, $C_i'$ can be calculated with the following formulas:
$$\alpha_i' = \frac{1}{n}\sum_{j=1}^{n} h_i(x_j), \qquad M_i' = \frac{\sum_{j=1}^{n} h_i(x_j)\, x_j}{\sum_{j=1}^{n} h_i(x_j)}, \qquad C_i' = \frac{\sum_{j=1}^{n} h_i(x_j)\,(x_j - M_i')(x_j - M_i')^{T}}{\sum_{j=1}^{n} h_i(x_j)}$$
where n is the number of training vectors and
$$h_i(x_j) = \frac{\alpha_i\, g(x_j; M_i, C_i)}{\sum_{k=1}^{m} \alpha_k\, g(x_j; M_k, C_k)}$$
denotes the probability that the observed random vector $x_j$ was generated by the i-th Gaussian component; i is the index of the Gaussian component and j is the index of the observed vector;
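2) Let X and Y be the d-dimensional LSP parameters of the current frame and the previous frame respectively. The joint probability density function of X and Y can be expressed as
$$f_{X,Y}(X, Y) = \sum_{i=1}^{m} \alpha_i\, g_i(X, Y; M_i, C_i)$$
where $f_{X,Y}(X, Y)$ is the joint probability density function of X and Y, $g_i(X, Y)$ is a 2d-dimensional Gaussian density function, and $M_i$, $C_i$ are respectively the 2d-dimensional mean vector and the 2d × 2d covariance matrix of the corresponding Gaussian density function, partitioned conformally with X and Y into the sub-vectors $M_i^{X}$, $M_i^{Y}$ and the blocks $C_i^{XX}$, $C_i^{XY}$, $C_i^{YX}$, $C_i^{YY}$.
The marginal probability density function of the previous-frame LSP parameter Y can then be obtained; it is a d-dimensional GMM:
$$f_{Y}(Y) = \sum_{i=1}^{m} \alpha_i\, g_i(Y; M_i^{Y}, C_i^{YY})$$
Thus, when the previous-frame parameter Y is known, the conditional probability density of the current frame X can be expressed as
$$f_{X|Y}(X \mid Y) = \sum_{i=1}^{m} \beta_i(Y)\, g_i(X; M_i^{X|Y}, C_i^{X|Y})$$
where
$$\beta_i(Y) = \frac{\alpha_i\, g_i(Y; M_i^{Y}, C_i^{YY})}{\sum_{j=1}^{m} \alpha_j\, g_j(Y; M_j^{Y}, C_j^{YY})}, \qquad M_i^{X|Y} = M_i^{X} + C_i^{XY}\,(C_i^{YY})^{-1}\,(Y - M_i^{Y}), \qquad C_i^{X|Y} = C_i^{XX} - C_i^{XY}\,(C_i^{YY})^{-1}\,C_i^{YX}$$
The conditional covariance $C_i^{X|Y}$ does not depend on the variable Y and can be calculated in advance and stored; i and j are indices of the Gaussian components.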
5. The line spectrum pair (LSP) parameter dimension-reduction quantization method based on a conditional Gaussian mixture model according to claim 4, characterized in that said step (7) comprises the following steps:
1) Given the initial codebook size N, select the initial centroids $C^{(0)} = \{c_1^{(0)}, c_2^{(0)}, \ldots, c_N^{(0)}\}$ by random selection or by the splitting method. Set the initial average distortion $D_{-1} \to \infty$ and give the stopping threshold ε, where 0 < ε < 1;
2) Around the given codewords, partition the training sequence $X = \{x_1, x_2, \ldots, x_m\}$ into N non-overlapping regions according to the nearest-neighbour criterion, where m is the number of vectors in the training sequence. The nearest-neighbour criterion is:
$$S_j^{(n)} = \{\, x \in X : d(x, c_j) \le d(x, c_i),\ \forall i \ne j \,\}$$
where $S_j^{(n)}$ is the j-th cell obtained in the n-th iteration, $d(x, c_j)$ is the distance between x and $c_j$, $c_j$ is an element of $C^{(n)}$, $C^{(n)}$ is the set of centroids obtained in the n-th iteration, x is an element of the training sequence X, and i, j are the indices of the corresponding centroids;
3) Calculate the average distortion and the relative distortion.
The average distortion is
$$D_n = \frac{1}{m}\sum_{r=1}^{m} d(x_r, c_i)$$
where $c_i$ is the centroid of the cell containing $x_r$, $D_n$ is the average distortion, n is the iteration number, m is the number of training vectors, $d(x_r, c_i)$ is the distance between $x_r$ and $c_i$, r is the index of the training vector, and i is the centroid index;
The relative distortion is
$$\tilde{D}_n = \frac{D_{n-1} - D_n}{D_n}$$
If $\tilde{D}_n \le \varepsilon$, the current centroids satisfy the distortion criterion and can be used as the codewords, and the procedure ends. Otherwise the centroids are recalculated and the iteration continues from step 2). The centroid is calculated as:
$$c_i = \frac{1}{\lambda}\sum_{x \in S_i^{(n)}} x$$
where λ is the number of training vectors contained in the i-th cell, and i is the cell index.
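The three steps of claim 5 can be sketched as a short LBG loop. This is an illustration under assumptions, not the patent's code: random selection is used for the initial centroids, squared Euclidean distance stands in for the unspecified distance d, and `max_iter` is an added safety limit. The function name `lbg` is hypothetical.

```python
import numpy as np


def lbg(train, codebook_size, eps=1e-3, max_iter=100, seed=0):
    """Train a codebook of `codebook_size` codewords on the rows of `train`."""
    rng = np.random.default_rng(seed)
    # Step 1: random selection of the initial centroids, D_{-1} = infinity.
    start = rng.choice(len(train), size=codebook_size, replace=False)
    codebook = train[start].astype(float)
    previous = np.inf
    for _ in range(max_iter):
        # Step 2: nearest-neighbour partition (squared Euclidean distance).
        dist = ((train[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = dist.argmin(axis=1)
        # Step 3: average distortion and relative distortion test.
        distortion = dist[np.arange(len(train)), nearest].mean()
        if distortion == 0.0 or (previous - distortion) / distortion <= eps:
            break
        previous = distortion
        # Recompute each centroid as the mean of the vectors in its cell;
        # an empty cell keeps its old centroid.
        for i in range(codebook_size):
            members = train[nearest == i]
            if len(members) > 0:
                codebook[i] = members.mean(axis=0)
    return codebook, distortion
```

The full distance matrix is held in memory, which is fine for a small example; for a training set with hundreds of thousands of vectors it would need to be computed in batches.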
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012101400303A CN102708871A (en) | 2012-05-08 | 2012-05-08 | Line spectrum-to-parameter dimensional reduction quantizing method based on conditional Gaussian mixture model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012101400303A CN102708871A (en) | 2012-05-08 | 2012-05-08 | Line spectrum-to-parameter dimensional reduction quantizing method based on conditional Gaussian mixture model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102708871A true CN102708871A (en) | 2012-10-03 |
Family
ID=46901572
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2012101400303A Pending CN102708871A (en) | 2012-05-08 | 2012-05-08 | Line spectrum-to-parameter dimensional reduction quantizing method based on conditional Gaussian mixture model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102708871A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103678896A (en) * | 2013-12-04 | 2014-03-26 | 南昌大学 | CVB separation method for GMM parameters |
CN104244018A (en) * | 2014-09-19 | 2014-12-24 | 重庆邮电大学 | Vector quantization method capable of rapidly compressing high-spectrum signals |
CN104244017A (en) * | 2014-09-19 | 2014-12-24 | 重庆邮电大学 | Multi-level codebook vector quantitative method for compressed encoding of hyperspectral remote sensing image |
CN106782510A (en) * | 2016-12-19 | 2017-05-31 | 苏州金峰物联网技术有限公司 | Place name voice signal recognition methods based on continuous mixed Gaussian HMM model |
CN107580722A (en) * | 2015-05-27 | 2018-01-12 | 英特尔公司 | Gauss hybrid models accelerator with the direct memory access (DMA) engine corresponding to each data flow |
CN108109612A (en) * | 2017-12-07 | 2018-06-01 | 苏州大学 | Voice recognition classification method based on self-adaptive dimension reduction |
CN110019953A (en) * | 2019-04-16 | 2019-07-16 | 中国科学院国家空间科学中心 | A kind of real-time quick look system of payload image data |
CN111520615A (en) * | 2020-04-28 | 2020-08-11 | 清华大学 | Pipe network leakage identification and positioning method based on line spectrum pair and cubic interpolation search |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040210436A1 (en) * | 2000-04-19 | 2004-10-21 | Microsoft Corporation | Audio segmentation and classification |
CN101188107A (en) * | 2007-09-28 | 2008-05-28 | 中国民航大学 | A voice recognition method based on wavelet decomposition and mixed Gauss model estimation |
CN102034472A (en) * | 2009-09-28 | 2011-04-27 | 戴红霞 | Speaker recognition method based on Gaussian mixture model embedded with time delay neural network |
Non-Patent Citations (2)
Title |
---|
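Chen Liwei (陈立伟): "Split vector quantization method for wideband ISF parameters based on conditional PDF" (基于条件PDF的宽带ISF参数分裂矢量量化方法), Applied Science and Technology (《应用科技》) * |
Bao Changchun (鲍长春): Principles of Digital Speech Coding (《数字语音编码原理》), 31 December 2007 * |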
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103678896A (en) * | 2013-12-04 | 2014-03-26 | 南昌大学 | CVB separation method for GMM parameters |
CN104244018A (en) * | 2014-09-19 | 2014-12-24 | 重庆邮电大学 | Vector quantization method capable of rapidly compressing high-spectrum signals |
CN104244017A (en) * | 2014-09-19 | 2014-12-24 | 重庆邮电大学 | Multi-level codebook vector quantitative method for compressed encoding of hyperspectral remote sensing image |
CN104244017B (en) * | 2014-09-19 | 2018-02-27 | 重庆邮电大学 | The multi-level codebook vector quantization method of compressed encoding high-spectrum remote sensing |
CN104244018B (en) * | 2014-09-19 | 2018-04-27 | 重庆邮电大学 | The vector quantization method of Fast Compression bloom spectrum signal |
CN107580722B (en) * | 2015-05-27 | 2022-01-14 | 英特尔公司 | Gaussian mixture model accelerator with direct memory access engines corresponding to respective data streams |
CN107580722A (en) * | 2015-05-27 | 2018-01-12 | 英特尔公司 | Gauss hybrid models accelerator with the direct memory access (DMA) engine corresponding to each data flow |
CN106782510A (en) * | 2016-12-19 | 2017-05-31 | 苏州金峰物联网技术有限公司 | Place name voice signal recognition methods based on continuous mixed Gaussian HMM model |
CN106782510B (en) * | 2016-12-19 | 2020-06-02 | 苏州金峰物联网技术有限公司 | Place name voice signal recognition method based on continuous Gaussian mixture HMM model |
CN108109612A (en) * | 2017-12-07 | 2018-06-01 | 苏州大学 | Voice recognition classification method based on self-adaptive dimension reduction |
CN110019953A (en) * | 2019-04-16 | 2019-07-16 | 中国科学院国家空间科学中心 | A kind of real-time quick look system of payload image data |
CN110019953B (en) * | 2019-04-16 | 2021-03-30 | 中国科学院国家空间科学中心 | Real-time quick-look system for effective load image data |
CN111520615A (en) * | 2020-04-28 | 2020-08-11 | 清华大学 | Pipe network leakage identification and positioning method based on line spectrum pair and cubic interpolation search |
CN111520615B (en) * | 2020-04-28 | 2021-03-16 | 清华大学 | Pipe network leakage identification and positioning method based on line spectrum pair and cubic interpolation search |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102708871A (en) | Line spectrum-to-parameter dimensional reduction quantizing method based on conditional Gaussian mixture model | |
US6826526B1 (en) | Audio signal coding method, decoding method, audio signal coding apparatus, and decoding apparatus where first vector quantization is performed on a signal and second vector quantization is performed on an error component resulting from the first vector quantization | |
EP1953737B1 (en) | Transform coder and transform coding method | |
CN101057275B (en) | Vector conversion device and vector conversion method | |
US7243061B2 (en) | Multistage inverse quantization having a plurality of frequency bands | |
CN101371295B (en) | Apparatus and method for encoding and decoding signal | |
Ma et al. | Vector quantization of LSF parameters with a mixture of Dirichlet distributions | |
CN103050122B (en) | MELP-based (Mixed Excitation Linear Prediction-based) multi-frame joint quantization low-rate speech coding and decoding method | |
Boucheron et al. | Low bit-rate speech coding through quantization of mel-frequency cepstral coefficients | |
Ranjan | A discrete wavelet transform based approach to Hindi speech recognition | |
CN103918028B (en) | The audio coding/decoding effectively represented based on autoregressive coefficient | |
CN102714040A (en) | Encoding device, decoding device, spectrum fluctuation calculation method, and spectrum amplitude adjustment method | |
WO2009125588A1 (en) | Encoding device and encoding method | |
CN102436815B (en) | Voice identifying device applied to on-line test system of spoken English | |
CN102982807B (en) | Method and system for multi-stage vector quantization of speech signal LPC coefficients | |
Shin et al. | Audio coding based on spectral recovery by convolutional neural network | |
CN101159136A (en) | Low bit rate music signal coding method | |
CN102812512A (en) | Method and apparatus for processing an audio signal | |
Liu et al. | Spectral envelope estimation used for audio bandwidth extension based on RBF neural network | |
Wu et al. | A fused speech enhancement framework for robust speaker verification | |
Kang et al. | A High-Rate Extension to Soundstream | |
Sisman et al. | A new speech coding algorithm using zero cross and phoneme based SYMPES | |
CN117292694B (en) | Time-invariant-coding-based few-token neural voice encoding and decoding method and system | |
Chu | Embedded quantization of line spectral frequencies using a multistage tree-structured vector quantizer | |
Li et al. | Vector quantization for LSF coding and codebook storage |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20121003 |