CN1862661A

CN1862661A - Nonnegative matrix decomposition method for speech signal characteristic waveform

Info

Publication number: CN1862661A
Application number: CNA2006100122964A
Authority: CN
Inventors: 鲍长春; 张鹏
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2006-06-16
Filing date: 2006-06-16
Publication date: 2006-11-15

Abstract

The invention relates to a non-negative matrix factorization (NMF) method for the characteristic waveform of a speech signal, and belongs to the field of speech signal processing. The method comprises the following steps: first, the characteristic waveforms are divided into 9 classes according to the pitch period of the speech signal, and for each class a basis matrix W is trained with the iterative method of standard non-negative matrix factorization; then a given frame of characteristic waveform is classified by its pitch period, the trained basis matrix W for that class is retrieved, and the encoding matrix H for the frame is obtained iteratively, so that the frame's characteristic waveform is approximately decomposed into the product of the basis matrix W and the encoding matrix H.

Description

A non-negative matrix factorization method for speech signal characteristic waveforms

Technical field

The invention relates to a non-negative matrix factorization method for speech signal characteristic waveforms, and belongs to the field of speech signal processing.

Background art

With the rapid development of wireless mobile communications, secure voice communications and voice over IP (VoIP), demand for high-quality speech coding at rates below 4 kb/s is growing steadily. The main speech coding models currently used at rates of 4 kb/s and below are the binary-excitation linear prediction model (LPC-10, Linear Prediction Coding-10), the mixed excitation linear prediction model (MELP, Mixed Excitation Linear Prediction), the multi-band excitation model (MBE, Multi-Band Excitation) and the waveform interpolation model (WI, Waveform Interpolation). All of these models are based on the source-system model of speech production, in which an excitation source signal (modelling the airflow from the lungs) drives a linear time-varying filter (modelling the vocal tract) to produce the speech signal. Within this source-system framework, the extraction of vocal tract parameters has been highly successful; how to decompose and quantize the excitation signal (also called the "characteristic waveform") with high precision according to the different characteristics of speech is the current bottleneck for improving speech quality. The WI speech coding algorithm is the most promising low-rate speech coding algorithm, and its key problem is likewise how to decompose and quantize the characteristic waveform effectively (in a WI coder, the excitation source signal is called the "characteristic waveform").

In waveform interpolation speech coding, three methods currently exist for decomposing the characteristic waveform (CW, Characteristic Waveform): (1) decomposition by linear-phase filtering; (2) decomposition by wavelet transform; (3) singular value decomposition (SVD). Measured against three technical criteria, namely decomposition precision, computational complexity and additional delay, each of these methods has the following drawbacks:

(1) Linear-phase filtering: low decomposition precision; introduces 1 frame of additional delay.

(2) Wavelet transform: introduces 5 frames of additional delay.

(3) Singular value decomposition: very high computational complexity.

Summary of the invention

To solve the above problems, the invention provides a non-negative matrix factorization method for speech signal characteristic waveforms. The problem it addresses is to decompose the characteristic waveform in a waveform interpolation speech coder with high precision, low complexity and no additional delay.

Non-negative matrix factorization (NMF) has been widely used in many areas of signal processing. Its basic idea is as follows: for any given non-negative matrix V, NMF finds, in a finite number of iterations, a non-negative matrix W and a non-negative matrix H that satisfy the approximation V ≈ W * H, so that a non-negative matrix is approximately decomposed into the product of a left and a right non-negative matrix. The left matrix W (also called the "basis matrix") stores the local features of the kind of object that V represents, that is, the "parts" that make up V; a linear combination of these "parts" approximately resynthesizes the information in the original matrix V, and the combination coefficients are stored in the right matrix H (also called the "encoding matrix").
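The relation V ≈ W * H can be illustrated with a toy example. The sketch below is not taken from the patent: the matrices are hypothetical, chosen so that the product is exact, and the helper name `matmul` is an assumption. It only shows that every column of V is a weighted sum of the columns ("parts") of W, with the weights held in the matching column of H.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


# Hypothetical toy data: 4-dimensional columns built from 2 non-negative parts.
W = [[1.0, 0.0],
     [1.0, 0.0],
     [0.0, 1.0],
     [0.0, 1.0]]           # basis matrix: each column is one "part"
H = [[2.0, 0.0, 1.0],
     [0.0, 3.0, 1.0]]      # encoding matrix: one column of weights per column of V

# Column j of V equals H[0][j] * (column 0 of W) + H[1][j] * (column 1 of W).
V = matmul(W, H)
```

Here V has 4 x 3 = 12 entries, while H needs only 2 x 3 = 6 once W is known.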

The invention applies the NMF technique, which is widely used in signal processing, to decompose the characteristic waveform of the speech signal in a waveform interpolation speech coder. The basic idea is as follows. First, the basis matrix of the speech characteristic waveform is trained from about 10000 frames of experimental samples and stored. Because training is performed offline, its cost is not counted when the computational complexity of the method is assessed. Then each given frame of characteristic waveform is factorized: the trained basis matrix corresponding to the frame is retrieved, and the encoding matrix for the frame is obtained iteratively. At this point the frame's characteristic waveform has been approximately decomposed into the product of the basis matrix and the encoding matrix, which completes the non-negative matrix factorization.

The following describes how the standard NMF method is used to train the basis matrices and to obtain the encoding matrix.

A. Training the basis matrix

1) First, the speech characteristic waveforms are divided into 9 classes according to the pitch period (pitch) of the frame of speech signal; the classification criteria are given in Table 1:

Class 1: 20 ≤ pitch < 30	Class 2: 30 ≤ pitch < 40	Class 3: 40 ≤ pitch < 50
Class 4: 50 ≤ pitch < 60	Class 5: 60 ≤ pitch < 70	Class 6: 70 ≤ pitch < 80
Class 7: 80 ≤ pitch < 90	Class 8: 90 ≤ pitch < 100	Class 9: 100 ≤ pitch ≤ 120

Table 1. Classification of characteristic waveforms
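The classification in Table 1 can be sketched as a small lookup function. The function name `classify_pitch` and the choice to return `None` for pitch values outside 20 to 120 are assumptions made for illustration; the source does not say how out-of-range values are handled.

```python
def classify_pitch(pitch):
    """Return the Table 1 class index (1..9) for a pitch period, or None if out of range."""
    if pitch < 20 or pitch > 120:
        return None                  # not covered by Table 1 (assumed handling)
    if pitch >= 100:
        return 9                     # class 9 spans 100..120, upper bound inclusive
    return int(pitch // 10) - 1      # 20..29 -> 1, 30..39 -> 2, ..., 90..99 -> 8
```

For example, `classify_pitch(45)` falls into class 3, since 40 ≤ 45 < 50.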

2) Next, about 10000 frames of experimental samples are chosen for the class 1 characteristic waveform to form a matrix V, and the basis matrix W is trained by the iterative method of standard non-negative matrix factorization. The specific steps are as follows:

(1) Choose about 10000 frames of experimental samples of this class of characteristic waveform to form the matrix V;

(2) Set the maximum number of iterations to N, where N is greater than 10, and set the factorization order to M, where M is an integer between 8 and 32. The term "factorization order" follows the wording used by Seung and Lee when they proposed the standard NMF method; setting the factorization order to M means setting the number of columns of the basis matrix W to M;

(3) Initialize all elements of the matrices W and H with random numbers uniformly distributed on [0, 1];

(4) Normalize each column of W, that is, divide every element of a column by the sum of all elements of that column;

(5) Set the current iteration count to 1;

(6) If the current iteration count is less than or equal to the maximum number of iterations N, go to (7); otherwise go to (8);

(7) Update the encoding matrix H and then the basis matrix W as shown below, then increase the current iteration count by 1 and go to (6);

H_{a\mu} \leftarrow H_{a\mu} \sum_{i} W_{ia} \frac{V_{i\mu}}{(WH)_{i\mu}} \qquad (1)

W_{ia} \leftarrow W_{ia} \sum_{\mu} \frac{V_{i\mu}}{(WH)_{i\mu}} H_{a\mu} \qquad (2)

W_{ia} \leftarrow \frac{W_{ia}}{\sum_{j} W_{ja}} \qquad (3)

The symbols in formula (1) are as follows:

(a) H_{a\mu} denotes the element in row a, column μ of matrix H;

(b) W_{ia} denotes the element in row i, column a of matrix W;

(c) V_{i\mu} denotes the element in row i, column μ of matrix V;

(d) (WH)_{i\mu} denotes the element in row i, column μ of the product of matrices W and H;

(e) \frac{V_{i\mu}}{(WH)_{i\mu}} denotes the element in row i, column μ of matrix V divided by the element in row i, column μ of the product of matrices W and H;

(f) \sum_{i} W_{ia} \frac{V_{i\mu}}{(WH)_{i\mu}} denotes the inner product of column μ of the matrix formed by the quotients \frac{V_{i\mu}}{(WH)_{i\mu}} over all i and μ with column a of matrix W;

(g) The symbol "←" assigns the matrix element on its right to the corresponding position on its left, which completes the "update" of each element of H.

The symbols in formula (2) are as follows:

(a) W_{ia} denotes the element in row i, column a of matrix W;

(b) H_{a\mu} denotes the element in row a, column μ of matrix H;

(c) V_{i\mu} denotes the element in row i, column μ of matrix V;

(e) \frac{V_{i\mu}}{(WH)_{i\mu}} denotes the element in row i, column μ of matrix V divided by the element in row i, column μ of the product of matrices W and H;

(f) \sum_{\mu} \frac{V_{i\mu}}{(WH)_{i\mu}} H_{a\mu} denotes the inner product of row i of the matrix formed by the quotients \frac{V_{i\mu}}{(WH)_{i\mu}} over all i and μ with row a of matrix H;

(g) The symbol "←" assigns the matrix element on its right to the corresponding position on its left, which completes the "update" of each element of W.

The symbols in formula (3) are as follows:

(a) W_{ia} denotes the element in row i, column a of matrix W;

(b) \sum_{j} W_{ja} denotes the sum of all elements in column a of matrix W;

(c) \frac{W_{ia}}{\sum_{j} W_{ja}} denotes the element in row i, column a of matrix W divided by the sum of all elements in column a of W;

(d) The symbol "←" assigns the matrix element on its right to the corresponding position on its left, which completes the "normalization" of each element of W.

(8) End the loop and save the basis matrix W.

3) Repeat the above procedure to obtain the basis matrices W for the class 2 to class 9 characteristic waveforms, giving 9 basis matrices in total, one for each class of characteristic waveform;
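The training steps (3) to (8) for a single class can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, the `seed` parameter and the small constant `eps` that guards against division by zero are all additions, the training frames are assumed to be stored as the columns of V, and the product W * H is assumed to be recomputed after H is updated and before W is updated.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def normalize_columns(W, eps=1e-12):
    """Formula (3): divide each element of W by the sum of its column."""
    for a in range(len(W[0])):
        col_sum = sum(row[a] for row in W) + eps   # eps is an added guard, not in the source
        for row in W:
            row[a] /= col_sum


def quotient(V, W, H, eps=1e-12):
    """Element-wise V / (WH); eps is an added guard against division by zero."""
    WH = matmul(W, H)
    return [[V[i][u] / (WH[i][u] + eps) for u in range(len(V[0]))]
            for i in range(len(V))]


def train_basis(V, M, N, seed=0):
    """Sketch of steps (3)-(8) for one class; returns (W, H) with V approx W*H."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() for _ in range(M)] for _ in range(n)]   # step (3)
    H = [[rng.random() for _ in range(m)] for _ in range(M)]
    normalize_columns(W)                                       # step (4)
    for _ in range(N):                                         # steps (5)-(7)
        R = quotient(V, W, H)
        H = [[H[a][u] * sum(W[i][a] * R[i][u] for i in range(n))
              for u in range(m)] for a in range(M)]            # formula (1)
        R = quotient(V, W, H)                                  # uses the new H (assumption)
        W = [[W[i][a] * sum(R[i][u] * H[a][u] for u in range(m))
              for a in range(M)] for i in range(n)]            # formula (2)
        normalize_columns(W)                                   # formula (3)
    return W, H                                                # step (8)
```

Calling `train_basis` once per class on that class's sample matrix would yield the 9 basis matrices of step 3). Pure Python lists keep the sketch dependency-free; a matrix of about 10000 frames would in practice call for an array library.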

B. Obtaining the encoding matrix H of a given frame of characteristic waveform

1) A given frame of characteristic waveform is first assigned to one of the 9 classes according to its pitch period, using the same classification as in step 1) of part A;

2) The basis matrix W corresponding to that class of characteristic waveform is then retrieved from the 9 trained basis matrices, and the encoding matrix H is obtained with the existing NMF method. The specific steps for obtaining H are as follows:

(1) Treat this frame of characteristic waveform as the matrix V;

(2) Set the factorization order to M, where M is an integer between 8 and 32, and set the maximum number of iterations to 10;

(3) Initialize all elements of the matrix H with random numbers uniformly distributed on [0, 1];

(4) Set the current iteration count to 1;

(5) If the current iteration count is less than or equal to the maximum number of iterations N, where N is greater than 10, go to (6); otherwise go to (7);

(6) Update the encoding matrix H as shown below, then increase the current iteration count by 1 and go to (5);

H_{a\mu} \leftarrow H_{a\mu} \sum_{i} W_{ia} \frac{V_{i\mu}}{(WH)_{i\mu}} \qquad (4)

The symbols in formula (4) are as follows:

(a) H_{a\mu} denotes the element in row a, column μ of matrix H;

(b) W_{ia} denotes the element in row i, column a of matrix W;

(e) \frac{V_{i\mu}}{(WH)_{i\mu}} denotes the element in row i, column μ of matrix V divided by the element in row i, column μ of the product of matrices W and H;

(f) \sum_{i} W_{ia} \frac{V_{i\mu}}{(WH)_{i\mu}} denotes the inner product of column μ of the matrix formed by the quotients \frac{V_{i\mu}}{(WH)_{i\mu}} over all i and μ with column a of matrix W;

(g) The symbol "←" assigns the matrix element on its right to the corresponding position on its left, which completes the "update" of each element of H;

(7) End the loop and save the encoding matrix H. At this point, the frame's characteristic waveform V has been approximately decomposed into the product of two non-negative matrices W and H, that is, V ≈ W * H.
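The per-frame procedure, in which W stays fixed and only H is iterated with formula (4), can be sketched as below. The function name `encode_frame`, the `seed` parameter and the guard constant `eps` are assumptions for illustration; W is assumed to be a trained basis matrix whose columns each sum to 1, as produced by formula (3).

```python
import random


def encode_frame(V, W, iterations=10, seed=0, eps=1e-12):
    """Sketch of steps (1)-(7): iterate formula (4) with W fixed and return H."""
    n, m = len(V), len(V[0])
    M = len(W[0])                                   # factorization order = columns of W
    rng = random.Random(seed)
    H = [[rng.random() for _ in range(m)] for _ in range(M)]   # step (3)
    for _ in range(iterations):                                # steps (4)-(6)
        WH = [[sum(W[i][a] * H[a][u] for a in range(M)) for u in range(m)]
              for i in range(n)]
        # Formula (4); eps is an added guard against division by zero.
        H = [[H[a][u] * sum(W[i][a] * V[i][u] / (WH[i][u] + eps) for i in range(n))
              for u in range(m)] for a in range(M)]
    return H                                                   # step (7)
```

Only the current frame V and the stored W enter the computation, which matches the source's point that no future frames are needed.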

Beneficial effects of the invention:

1. Because the speech characteristic waveform is decomposed by NMF, only the current frame of the speech signal needs to be decomposed and no future frames are required, so the NMF of the characteristic waveform introduces no additional delay.

2. The decomposition precision of NMF of the characteristic waveform is comparatively high. The characteristic waveform resynthesized after NMF (Fig. 2) achieves a reconstruction quality close to that of the characteristic waveform synthesized from the second-order singular value after SVD (Fig. 3), and slightly better than that of the characteristic waveform synthesized from the first-order singular value after SVD (Fig. 4). The region enclosed by the dashed ellipse in Fig. 4 shows a larger reconstruction error than the same region of the original characteristic waveform (Fig. 1), so its reconstruction is worse than that of the characteristic waveform resynthesized after NMF (Fig. 2).

3. The computational complexity of NMF of the characteristic waveform is comparatively low. Table 2 compares the computational complexity of four characteristic waveform decomposition methods: linear-phase filtering, wavelet transform, SVD and NMF:

Decomposition method	Linear-phase filtering	Wavelet transform	Singular value decomposition	Non-negative matrix factorization
Computational complexity	O(mn)	O(mn)	O(mn²)	O(mn)

Table 2. Computational complexity of the four decomposition methods

In Table 2, m and n denote the number of rows and columns of the characteristic waveform matrix, respectively.

4. After NMF, the original high-dimensional characteristic waveform matrix can be approximately represented by the low-dimensional encoding matrix H, so NMF of the characteristic waveform also serves the purpose of data compression.
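The compression effect in item 4 can be made concrete by counting matrix elements. The sketch below assumes, as the source implies, that the basis matrix W is stored in advance and therefore not counted per frame; the function name and the example dimensions (a 64-row, 10-column characteristic waveform matrix) are hypothetical, since the source does not state the size of V.

```python
def encoded_size_ratio(rows, cols, order):
    """Ratio of element counts: V (rows x cols) versus H (order x cols)."""
    return (rows * cols) / (order * cols)


# Hypothetical dimensions with factorization order 16.
ratio = encoded_size_ratio(rows=64, cols=10, order=16)
```

The ratio reduces to rows / order, so H is smaller than V whenever the factorization order is below the number of rows of V.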

Brief description of the drawings

Fig. 1: Characteristic waveforms of four frames of speech signal before decomposition

Fig. 2: Characteristic waveform synthesized after non-negative matrix factorization

Fig. 3: Characteristic waveform synthesized from the second-order singular value after SVD

Fig. 4: Characteristic waveform synthesized from the first-order singular value after SVD

Fig. 5: Flow chart for training the basis matrix

Fig. 6: Flow chart for obtaining the encoding matrix

Embodiment

A specific embodiment of the invention is described below, covering how the standard NMF method is used to train the basis matrices and to obtain the encoding matrix.

A. Training the basis matrix

Table 1. Classification of characteristic waveforms

2) Next, about 10000 frames of experimental samples are chosen for every class of characteristic waveform to form a matrix V, and the basis matrix W is trained by the iterative method of standard non-negative matrix factorization (here the invention adopts the iterative scheme that minimizes the Euclidean distance between V and W * H). Because the characteristic waveforms have been divided into 9 classes according to pitch period, training consists of training the corresponding basis matrix W for each class. The specific steps for training the basis matrix W with the standard NMF method are as follows (see Fig. 5):

(2) Set the maximum number of iterations N to 1000; in this embodiment, this value ensures that both W and H have converged when the iteration ends. Set the factorization order to 16. The term "factorization order" follows the wording used by Seung and Lee when they proposed the standard NMF method; setting the factorization order to 16 means setting the number of columns of the basis matrix W to 16;

(5) Set the current iteration count to 1;

(6) If the current iteration count is less than or equal to the maximum number of iterations, 1000, go to (7); otherwise go to (8);

(7) Update the encoding matrix H and then the basis matrix W as shown below (this is the iterative method of the standard NMF method cited by the invention), then increase the current iteration count by 1 and go to (6);

H_{a\mu} \leftarrow H_{a\mu} \sum_{i} W_{ia} \frac{V_{i\mu}}{(WH)_{i\mu}} \qquad (1)

W_{ia} \leftarrow W_{ia} \sum_{\mu} \frac{V_{i\mu}}{(WH)_{i\mu}} H_{a\mu} \qquad (2)

W_{ia} \leftarrow \frac{W_{ia}}{\sum_{j} W_{ja}} \qquad (3)

The symbols in formula (1) are as follows:

(a) H_{a\mu} denotes the element in row a, column μ of matrix H;

(b) W_{ia} denotes the element in row i, column a of matrix W;

(f) \sum_{i} W_{ia} \frac{V_{i\mu}}{(WH)_{i\mu}} denotes the inner product of column μ of the matrix formed by the quotients \frac{V_{i\mu}}{(WH)_{i\mu}} over all i and μ with column a of matrix W.

The symbols in formula (2) are as follows:

(a) W_{ia} denotes the element in row i, column a of matrix W;

(b) H_{a\mu} denotes the element in row a, column μ of matrix H;

(e) \frac{V_{i\mu}}{(WH)_{i\mu}} denotes the element in row i, column μ of matrix V divided by the element in row i, column μ of the product of matrices W and H;

(f) \sum_{\mu} \frac{V_{i\mu}}{(WH)_{i\mu}} H_{a\mu} denotes the inner product of row i of the matrix formed by the quotients \frac{V_{i\mu}}{(WH)_{i\mu}} over all i and μ with row a of matrix H.

The symbols in formula (3) are as follows:

(a) W_{ia} denotes the element in row i, column a of matrix W;

(b) \sum_{j} W_{ja} denotes the sum of all elements in column a of matrix W;

(c) \frac{W_{ia}}{\sum_{j} W_{ja}} denotes the element in row i, column a of matrix W divided by the sum of all elements in column a of W.

(8) End the loop and save the basis matrix W.
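The embodiment relies on 1000 iterations being enough for W and H to converge. One way to check this in practice, which the source does not describe, is to evaluate the Euclidean distance between V and W * H after training. The function name `reconstruction_error` is an assumption, and the sketch only measures the distance; it makes no claim about how the distance behaves from one iteration to the next.

```python
import math


def reconstruction_error(V, W, H):
    """Euclidean (Frobenius) distance between V and the product W*H."""
    total = 0.0
    for i in range(len(V)):
        for u in range(len(V[0])):
            approx = sum(W[i][a] * H[a][u] for a in range(len(H)))
            total += (V[i][u] - approx) ** 2
    return math.sqrt(total)
```

A value of 0 means the factorization reproduces V exactly; comparing the value at two different iteration counts gives a rough indication of whether further iterations still change the result.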

B. Obtaining the encoding matrix of a given frame of characteristic waveform

1) A given frame of characteristic waveform is first classified according to its pitch period (pitch); the classification criteria are again those of Table 1;

2) The basis matrix obtained in training for that class of characteristic waveform is then retrieved, and the encoding matrix H is obtained with the existing NMF method. The specific steps for obtaining H are as follows (see Fig. 6):

(1) Treat this frame of characteristic waveform as the matrix V;

(2) Set the factorization order to 16 and the maximum number of iterations to 10;

(4) Set the current iteration count to 1;

(5) If the current iteration count is less than or equal to the maximum number of iterations, 10, go to (6); otherwise go to (7);

H_{a\mu} \leftarrow H_{a\mu} \sum_{i} W_{ia} \frac{V_{i\mu}}{(WH)_{i\mu}} \qquad (4)

The symbols in formula (4) are as follows:

(a) H_{a\mu} denotes the element in row a, column μ of matrix H;

(b) W_{ia} denotes the element in row i, column a of matrix W;

(f) \sum_{i} W_{ia} \frac{V_{i\mu}}{(WH)_{i\mu}} denotes the inner product of column μ of the matrix formed by the quotients \frac{V_{i\mu}}{(WH)_{i\mu}} over all i and μ with column a of matrix W.

(7) End the loop and save the encoding matrix H. At this point, the frame's characteristic waveform V has been approximately decomposed into the product of two non-negative matrices W and H, that is, V ≈ W * H.

Claims

1. A non-negative matrix factorization method for speech signal characteristic waveforms, characterized in that the method is carried out in the following steps:

A. Training the basis matrix W

1) First, the speech characteristic waveforms are divided into 9 classes according to the pitch period (pitch) of the frame of speech signal, as follows:

Class 1: 20 ≤ pitch < 30

Class 2: 30 ≤ pitch < 40

Class 3: 40 ≤ pitch < 50

Class 4: 50 ≤ pitch < 60

Class 5: 60 ≤ pitch < 70

Class 6: 70 ≤ pitch < 80

Class 7: 80 ≤ pitch < 90

Class 8: 90 ≤ pitch < 100

Class 9: 100 ≤ pitch ≤ 120

(5) Set the current iteration count to 1;

H_{a\mu} \leftarrow H_{a\mu} \sum_{i} W_{ia} \frac{V_{i\mu}}{(WH)_{i\mu}} \qquad (1)

W_{ia} \leftarrow W_{ia} \sum_{\mu} \frac{V_{i\mu}}{(WH)_{i\mu}} H_{a\mu} \qquad (2)

W_{ia} \leftarrow \frac{W_{ia}}{\sum_{j} W_{ja}} \qquad (3)

The symbols in formula (1) are as follows:

(a) H_{a\mu} denotes the element in row a, column μ of matrix H;

(b) W_{ia} denotes the element in row i, column a of matrix W;

(e) \frac{V_{i\mu}}{(WH)_{i\mu}} denotes the element in row i, column μ of matrix V divided by the element in row i, column μ of the product of matrices W and H.

The symbols in formula (2) are as follows:

(a) W_{ia} denotes the element in row i, column a of matrix W;

(b) H_{a\mu} denotes the element in row a, column μ of matrix H;

(e) \frac{V_{i\mu}}{(WH)_{i\mu}} denotes the element in row i, column μ of matrix V divided by the element in row i, column μ of the product of matrices W and H;

(f) \sum_{\mu} \frac{V_{i\mu}}{(WH)_{i\mu}} H_{a\mu} denotes the inner product of row i of the matrix formed by the quotients \frac{V_{i\mu}}{(WH)_{i\mu}} over all i and μ with row a of matrix H.

The symbols in formula (3) are as follows:

(a) W_{ia} denotes the element in row i, column a of matrix W;

(b) \sum_{j} W_{ja} denotes the sum of all elements in column a of matrix W;

(c) \frac{W_{ia}}{\sum_{j} W_{ja}} denotes the element in row i, column a of matrix W divided by the sum of all elements in column a of W;

(8) End the loop and save the basis matrix W.

B. The basis matrix W corresponding to this class of characteristic waveform is then retrieved from the 9 trained basis matrices, and the encoding matrix H is obtained with the existing NMF method. The specific steps for obtaining H are as follows:

(1) Treat this frame of characteristic waveform as the matrix V;

(4) Set the current iteration count to 1;

H_{a\mu} \leftarrow H_{a\mu} \sum_{i} W_{ia} \frac{V_{i\mu}}{(WH)_{i\mu}} \qquad (4)

The symbols in formula (4) are as follows:

(a) H_{a\mu} denotes the element in row a, column μ of matrix H;

(b) W_{ia} denotes the element in row i, column a of matrix W;

(e) \frac{V_{i\mu}}{(WH)_{i\mu}} denotes the element in row i, column μ of matrix V divided by the element in row i, column μ of the product of matrices W and H;

(f) \sum_{i} W_{ia} \frac{V_{i\mu}}{(WH)_{i\mu}} denotes the inner product of column μ of the matrix formed by the quotients \frac{V_{i\mu}}{(WH)_{i\mu}} over all i and μ with column a of matrix W;

(7) End the loop and save the encoding matrix H;

At this point, the frame's characteristic waveform V has been approximately decomposed into the product of two non-negative matrices W and H, that is, V ≈ W * H.