CN104795074A

CN104795074A - Multi-mode multi-stage codebook joint optimization method

Info

Publication number: CN104795074A
Application number: CN201510121820.0A
Authority: CN
Inventors: 徐敬德; 崔慧娟; 唐昆
Original assignee: XINRUIDI (BEIJING) SCIENCE & TECHNOLOGY Co Ltd; Tsinghua University
Current assignee: XINRUIDI (BEIJING) SCIENCE & TECHNOLOGY Co Ltd; Tsinghua University
Priority date: 2015-03-19
Filing date: 2015-03-19
Publication date: 2015-07-22
Anticipated expiration: 2035-03-19
Also published as: CN104795074B

Abstract

The invention discloses a multi-mode multi-stage codebook joint optimization method. It belongs to the technical field of vector quantization for low-bit-rate speech coding and addresses the technical problem that, in existing multi-mode multi-stage codebook training, bit errors strongly affect the system distortion. The method includes: outputting the vectors to be quantized; training on the vectors to be quantized to obtain the initial codebook of each stage in each mode; rearranging the codebook indices of each initial codebook to obtain a new codebook; for the vectors to be quantized, using the new codebook to find the optimal cells and quantization indices that minimize the system distortion; updating the optimal codewords of each mode at each stage from the input residual vectors and their corresponding quantization indices, and iterating a preset number of times; and taking the codebook produced by the last iteration.

Description

Multi-mode multi-stage codebook joint optimization method

Technical field

The present invention relates to the technical field of vector quantization for low-bit-rate speech coding, and in particular to a multi-mode multi-stage codebook joint optimization method.

Background technology

With the development of audio technology, low-bit-rate speech coding is widely used in fields such as radio communication and satellite communication.

In low-bit-rate speech coding, limits on storage and computation mean that constrained vector quantization is often used, including tree-structured vector quantization, classified vector quantization, multi-stage vector quantization and transform-domain vector quantization, to reduce the storage and computation required by the codebook. Of these, multi-stage vector quantization is the most common. Multi-mode vector quantization, on the other hand, increases storage somewhat but effectively reduces quantization error without spending additional bits. In many applications of low-bit-rate speech coding, bit errors are present much of the time. For multi-mode multi-stage vector quantization, once a bit error occurs, not only can the codebook index of any stage be wrong, but the mode selection can be wrong as well. The parameters synthesized at the decoder then often deviate considerably from the input parameters at the encoder, which seriously degrades the intelligibility and comfort of the synthesized speech.

Conventional multi-mode multi-stage vector quantization performs codebook training and joint codebook optimization under a source-optimal criterion. Because channel bit errors are not considered, the system distortion of source-optimal multi-mode multi-stage vector quantization is simply the quantization distortion. The optimal codewords trained this way are spread evenly over the whole quantization space, so once a bit error occurs the overall system distortion is large.

In summary, in existing multi-mode multi-stage codebook training, bit errors have a large effect on system distortion, which degrades the quality of the dequantized codewords and of the synthesized speech.

Summary of the invention

The object of the present invention is to provide a multi-mode multi-stage codebook joint optimization method that solves the technical problem that bit errors have a large effect on system distortion in existing multi-mode multi-stage codebook training.

The invention provides a multi-mode multi-stage codebook joint optimization method, comprising:

Step 1, outputting the vectors to be quantized;

Step 2, training on the vectors to be quantized to obtain the initial codebook of each stage in each mode, where the number of stages is M;

Step 3, rearranging the codebook indices of the initial codebook of each stage in each mode to obtain a new codebook;

Step 4, for the vectors to be quantized, using the new codebook to find the optimal cells and quantization indices that minimize the system distortion;

Step 5, for the m-th stage codebook, using the quantization indices corresponding to the input residual vectors of that stage to update the optimal codewords of each mode at that stage, where the initial value of m is 1;

Step 6, comparing m with M;

If m<M, increasing m by 1 and returning to step 4;

If m=M, proceeding to step 7;

Step 7, determining whether the iteration count t has reached the preset value T, where the initial value of t is 1;

If t<T, increasing t by 1, resetting m to 1, and returning to step 4;

If t=T, proceeding to step 8;

Step 8, taking the codebook produced by the last iteration.
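The control flow of steps 4 to 7 amounts to two nested loops: an outer loop over iterations t = 1..T and an inner loop over stages m = 1..M. The Python sketch below only illustrates that ordering. `find_cells` and `update_stage` are hypothetical callbacks standing in for steps 4 and 5; they are not names used by the patent.

```python
from typing import Callable, List, Tuple


def run_joint_optimization(
    num_stages: int,
    num_iterations: int,
    find_cells: Callable[[], None],
    update_stage: Callable[[int], None],
) -> List[Tuple[int, int]]:
    """Execute the step 4-7 loop and return the (t, m) pairs in visit order."""
    visited = []
    for t in range(1, num_iterations + 1):   # step 7: t runs from 1 to T
        for m in range(1, num_stages + 1):   # step 6: m runs from 1 to M, reset each round
            find_cells()                     # step 4: cells and indices from the current codebook
            update_stage(m)                  # step 5: update stage-m codewords of every mode
            visited.append((t, m))
    return visited                           # step 8 would read out the final codebook here


# Placeholder callbacks; M = 4 and T = 30 are the values used in the embodiment.
order = run_joint_optimization(4, 30, lambda: None, lambda m: None)
```

With M = 4 and T = 30, steps 4 and 5 are each executed 120 times.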

Preferably, step 1 is specifically:

In low-bit-rate speech coding, the speech database is divided into modes, with the quantization index of a parameter used as the mode number, and the vectors to be quantized of the parameter to be quantized are output for each mode.

The speech database is preferably a standard Chinese speech database with a set sampling frequency and a certain duration, containing multiple voice timbres. The set sampling frequency is preferably 8 kHz.

Preferably, step 2 is specifically:

The vectors to be quantized of each mode are used as the input vectors for codebook training and joint optimization. A codebook training method based on simulated annealing is applied to each mode separately, and multi-stage codebook training is carried out according to the number of bits allocated to each stage, yielding the initial codebook of each stage in each mode.

Preferably, step 3 is specifically:

For the initial codebook of each stage in each mode, a method based on tabu search is used to rearrange the codebook indices, yielding a new codebook.

Preferably, the value of T is between 20 and 40.

The invention brings the following beneficial effects: in the multi-mode multi-stage codebook joint optimization method provided by the invention, the optimal codebook of channel-optimal multi-mode multi-stage vector quantization is obtained through repeated iteration. Each codeword is not the centroid of its own cell; instead it is obtained as the average of the cell centroids weighted by the bit-error transition probabilities. Compared with existing source-optimal multi-mode multi-stage vector quantization, the distances between optimal codewords in the invention are smaller, so that when a bit error occurs the overall system distortion is significantly reduced. This reduces the effect of bit errors on system distortion and improves the error resilience of the system. Consequently, during multi-mode multi-stage vector quantization and transmission, the system distortion between the dequantized codewords and the input parameters under channel bit errors can be effectively reduced, and the quality of the synthesized speech under channel bit errors effectively improved.

Other features and advantages of the invention are set forth in the description that follows, and in part will become apparent from the description or be understood by practicing the invention. The objects and other advantages of the invention can be realized and obtained through the structure particularly pointed out in the description, the claims and the accompanying drawing.

Brief description of the drawings

To explain the technical solutions in the embodiments of the invention more clearly, the drawing required for describing the embodiments is briefly introduced below:

Fig. 1 is a flowchart of the multi-mode multi-stage codebook joint optimization method provided by an embodiment of the invention.

Detailed description of the embodiments

Embodiments of the invention are described in detail below with reference to the drawing and examples, so that the way the invention applies technical means to solve the technical problem and achieve the technical effect can be fully understood and put into practice. Note that, provided no conflict arises, the embodiments of the invention and the features within them can be combined with one another, and the resulting technical solutions all fall within the protection scope of the invention.

An embodiment of the invention provides a multi-mode multi-stage codebook joint optimization method applicable to low-bit-rate speech coding. As shown in Fig. 1, the method comprises:

S1: Output the vectors to be quantized.

Specifically, in low-bit-rate speech coding, the speech database is divided into modes, with the quantization index of a parameter used as the mode number, and the vectors to be quantized of the parameter to be quantized are output for each mode. The speech database is preferably a standard Chinese speech database with a set sampling frequency and a certain duration, containing multiple voice timbres.

The speech database used in this embodiment has a sampling frequency of 8 kHz and may last several hours, covering voices of different genders and ages. From this database, 10-dimensional line spectrum pair parameters are extracted every 20 ms; after prediction removal and mean removal they serve as the parameters to be quantized, and the quantization index of the voiced/unvoiced parameter serves as the mode number.
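The grouping in S1 can be sketched as follows. This is a minimal illustration, assuming the parameter vectors and their mode numbers have already been extracted; `split_by_mode` and `remove_mean` are hypothetical helper names. Prediction removal is left out because the text does not specify the predictor.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]


def split_by_mode(vectors: List[Vector], mode_numbers: List[int]) -> Dict[int, List[Vector]]:
    """Group training vectors by mode number (the quantization index of the mode parameter)."""
    groups: Dict[int, List[Vector]] = defaultdict(list)
    for vec, mode in zip(vectors, mode_numbers):
        groups[mode].append(vec)
    return dict(groups)


def remove_mean(vectors: List[Vector]) -> Tuple[List[Vector], Vector]:
    """Subtract the per-dimension mean; return the centered vectors and the mean."""
    dim = len(vectors[0])
    mean = [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]
    centered = [[v[k] - mean[k] for k in range(dim)] for v in vectors]
    return centered, mean


groups = split_by_mode([[1.0], [2.0], [3.0]], [0, 1, 0])
centered, mean = remove_mean([[1.0, 2.0], [3.0, 6.0]])
```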

S2: Train on the vectors to be quantized to obtain the initial codebook of each stage in each mode.

Specifically, the vectors to be quantized of each mode are used as the input vectors for codebook training and joint optimization. A codebook training method based on simulated annealing is applied to each mode separately, and multi-stage codebook training is carried out according to the number of bits allocated to each stage, yielding the initial codebook of each stage in each mode.
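For readers unfamiliar with multi-stage vector quantization, the sketch below shows how a set of stage codebooks is used once trained: each stage quantizes the residual left by the previous stages, and the decoder sums the selected codewords. It uses a plain sequential nearest-neighbour search and does not implement the simulated-annealing training named above; the function names are illustrative only.

```python
from typing import List, Tuple

Vector = List[float]
Codebook = List[Vector]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def msvq_encode(x: Vector, stage_codebooks: List[Codebook]) -> Tuple[List[int], Vector]:
    """Sequentially quantize x; return one index per stage and the final residual."""
    residual = list(x)
    indices = []
    for codebook in stage_codebooks:
        best = min(range(len(codebook)), key=lambda k: sq_dist(residual, codebook[k]))
        indices.append(best)
        residual = [r - c for r, c in zip(residual, codebook[best])]
    return indices, residual


def msvq_decode(indices: List[int], stage_codebooks: List[Codebook]) -> Vector:
    """Reconstruct the vector as the sum of the selected stage codewords."""
    out = [0.0] * len(stage_codebooks[0][0])
    for i, codebook in zip(indices, stage_codebooks):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out


books = [[[0.0, 0.0], [4.0, 4.0]], [[-1.0, 0.0], [1.0, 0.0]]]
idx, res = msvq_encode([5.0, 4.0], books)
rec = msvq_decode(idx, books)
```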

For a typical multi-mode multi-stage vector quantizer, a speech encoder Q with U modes and M stages maps a K-dimensional input vector (K=10) $x = [x_1, x_2, \ldots, x_K]$ to the corresponding mode u and index $I^{u}$, yielding the corresponding codebook entry $c_{I}^{u}$, where $c_{im}^{u}$ denotes the i-th codeword of stage m in mode u.

In this embodiment, for a speech encoder operating at 600 bps, four 20 ms frames form a superframe on which multi-stage vector quantization is performed. The voiced/unvoiced parameter is allocated 4 bits, giving 16 modes (U=16). The line spectrum pair parameters are quantized in 4 stages (M=4), with 8, 7, 7 and 6 bits allocated to the respective stages, and each stage performs vector quantization with its own bit allocation.
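The arithmetic behind this allocation can be checked with a few lines of Python. The bit counts, frame length and coder rate come from the text; the assumption that the mode and line spectrum pair indices are sent once per 80 ms superframe is an interpretation, and the variable names are illustrative.

```python
MODE_BITS = 4                  # voiced/unvoiced parameter (from the text)
STAGE_BITS = [8, 7, 7, 6]      # line spectrum pair stages 1..4 (from the text)
FRAME_MS = 20
FRAMES_PER_SUPERFRAME = 4
CODER_RATE_BPS = 600

num_modes = 2 ** MODE_BITS
stage_sizes = [2 ** b for b in STAGE_BITS]
superframe_ms = FRAME_MS * FRAMES_PER_SUPERFRAME

# Assumption: mode and stage indices are transmitted once per superframe.
bits_used = MODE_BITS + sum(STAGE_BITS)
bits_available = CODER_RATE_BPS * superframe_ms // 1000

# Codewords stored with one codebook per stage per mode.
codewords_multistage = num_modes * sum(stage_sizes)
# Codewords a single unstructured codebook per mode would need for the same bits.
codewords_single_stage = num_modes * 2 ** sum(STAGE_BITS)

print(num_modes, stage_sizes, bits_used, bits_available)
print(codewords_multistage, codewords_single_stage)
```

Under that assumption the mode and line spectrum pair indices take 32 of the 48 bits in each superframe, and the multi-stage structure stores 9216 codewords where an unstructured codebook of the same resolution would need more than four billion.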

In other embodiments, for a speech encoder operating at 2400 bps, multi-mode multi-stage vector quantization can degenerate into single-mode multi-stage vector quantization: the line spectrum pair parameters of each 20 ms frame are quantized in 4 stages, with 7, 6, 6 and 6 bits allocated to the respective stages.

S3: Rearrange the codebook indices of the initial codebook of each stage in each mode to obtain a new codebook.

Specifically, for the initial codebook of each stage in each mode, a method based on tabu search is used to rearrange the codebook indices and obtain a new codebook, thereby reducing the distortion between codewords without using additional bits.
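The text names tabu search but gives neither a cost function nor a move set, so the sketch below is only one plausible reading under stated assumptions: the cost is the summed squared distance between codewords whose indices differ in exactly one bit, a move swaps the codewords at two indices, and `iterations` and `tenure` are illustrative parameters.

```python
import itertools
from typing import List, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def neighbor_cost(codebook: List[Vector], bits: int) -> float:
    """Assumed cost: total squared distance between single-bit-error index pairs."""
    return sum(
        sq_dist(codebook[i], codebook[i ^ (1 << b)])
        for i in range(len(codebook))
        for b in range(bits)
    )


def tabu_reorder(codebook: List[Vector], bits: int,
                 iterations: int = 50, tenure: int = 5) -> Tuple[List[Vector], float]:
    """Reassign indices by pairwise swaps with a simple tabu list."""
    if len(codebook) != 2 ** bits:
        raise ValueError("codebook size must equal 2**bits")
    current = list(codebook)
    best, best_cost = list(current), neighbor_cost(current, bits)
    tabu = {}  # swap (a, b) -> last iteration during which it stays forbidden
    for it in range(iterations):
        candidate = None
        for a, b in itertools.combinations(range(len(current)), 2):
            current[a], current[b] = current[b], current[a]
            cost = neighbor_cost(current, bits)
            current[a], current[b] = current[b], current[a]  # undo trial swap
            # A tabu move is still allowed if it beats the best cost (aspiration).
            allowed = tabu.get((a, b), -1) < it or cost < best_cost
            if allowed and (candidate is None or cost < candidate[0]):
                candidate = (cost, a, b)
        if candidate is None:
            break
        cost, a, b = candidate
        current[a], current[b] = current[b], current[a]
        tabu[(a, b)] = it + tenure
        if cost < best_cost:
            best, best_cost = list(current), cost
    return best, best_cost


start = [[0.0], [3.0], [1.0], [2.0]]
reordered, final_cost = tabu_reorder(start, bits=2)
```

The reordering changes only which index points to which codeword; the set of codewords, and therefore the error-free quantization distortion, is unchanged.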

In this embodiment, after the codebook obtained in step S2 passes through a noisy channel, mode u becomes mode v and index $I^{u}$ becomes index $J^{v}$. Assuming the transition probability is $P(J^{v} \mid I^{u})$, the decoder Q' maps the received index back to the codebook entry $c_{J}^{v}$, where $c_{jm}^{v}$ denotes the j-th codeword of stage m in mode v.

S4: For the vectors to be quantized, use the new codebook to find the optimal cells and quantization indices that minimize the system distortion.

Specifically, for an input vector x of the multi-mode multi-stage vector quantizer, the distortion is expressed as:

$$D(x, c_{I}^{u}) = \sum_{v} \sum_{J} P(J^{v} \mid I^{u})\, d\Big(x, \sum_{m=1}^{M} c_{jm}^{v}\Big) \tag{1}$$

Over all input vectors, the system distortion is expressed as:

$$D = \sum_{u} \sum_{I} \sum_{v} \sum_{J} P(J^{v} \mid I^{u}) \int_{s_{I}^{u}} p(x)\, d\Big(x, \sum_{m=1}^{M} c_{jm}^{v}\Big)\, dx \tag{2}$$

where p(x) is the probability distribution of x.

Using the new codebook obtained in step S3, the optimal cell that minimizes the system distortion (that is, the channel-optimal cell) is expressed as:

$$s_{I}^{u} = \{\, x \mid D(x, c_{I}^{u}) \leq D(x, c_{L}^{u}),\ \text{for all } L \neq I \,\} \tag{3}$$

where L ranges over all possible indices other than I.
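Formulas (1) and (3) can be illustrated with a simplified single-mode, single-stage quantizer, in which the sum of stage codewords collapses to one codeword per index. This is a sketch under that simplification, with squared error assumed for d; `trans[i][j]` is a hypothetical table holding P(j | i).

```python
from typing import List

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def expected_distortion(x: Vector, i: int, codebook: List[Vector],
                        trans: List[List[float]]) -> float:
    """Formula (1), simplified: distortion averaged over the indices the decoder may receive."""
    return sum(trans[i][j] * sq_dist(x, codebook[j]) for j in range(len(codebook)))


def channel_optimal_index(x: Vector, codebook: List[Vector],
                          trans: List[List[float]]) -> int:
    """Formula (3), simplified: x belongs to the cell whose index minimizes formula (1)."""
    return min(range(len(codebook)),
               key=lambda i: expected_distortion(x, i, codebook, trans))


codebook = [[0.0], [1.0], [10.0]]
clean = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
noisy = [[1.0, 0.0, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]]

nearest = channel_optimal_index([0.6], codebook, clean)
robust = channel_optimal_index([0.6], codebook, noisy)
```

With an error-free channel the rule reduces to nearest-neighbour search and picks index 1. When index 1 is sometimes received as the distant index 2, the rule prefers index 0 even though its codeword is slightly farther from x.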

S5: Update the optimal codewords of each mode at stage m.

Specifically, for the m-th stage codebook, the quantization indices corresponding to the input residual vectors of that stage are used to update the optimal codewords of each mode at that stage, expressed as:

$$c_{jm}^{v} = \frac{\sum_{i_{m}^{u}} p(j_{m}^{v} \mid i_{m}^{u}) \int_{s_{i_{m}}^{u}} x_{m}^{u}\, p(x)\, dx}{\sum_{i_{m}^{u}} p(j_{m}^{v} \mid i_{m}^{u}) \int_{s_{i_{m}}^{u}} p(x)\, dx} \tag{4}$$

The initial value of m is 1; that is, the update starts with the stage-1 optimal codewords of each mode.
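An empirical reading of formula (4) replaces the integrals with sums over training vectors. The sketch below does this for a simplified single-mode, single-stage case; `cells`, `trans` and `old` are names chosen for this illustration, and keeping the previous codeword when the denominator is zero is an added assumption.

```python
from typing import List

Vector = List[float]


def update_codewords(cells: List[List[Vector]], trans: List[List[float]],
                     old: List[Vector]) -> List[Vector]:
    """cells[i]: training vectors assigned to index i; trans[i][j]: P(j | i)."""
    dim = len(old[0])
    updated = []
    for j in range(len(old)):
        num = [0.0] * dim
        den = 0.0
        for i, members in enumerate(cells):
            w = trans[i][j]
            for x in members:
                for k in range(dim):
                    num[k] += w * x[k]      # numerator of formula (4)
            den += w * len(members)         # denominator of formula (4)
        # Assumption: keep the previous codeword if no probability mass reaches j.
        updated.append([v / den for v in num] if den > 0 else list(old[j]))
    return updated


cells = [[[0.0]], [[10.0]]]
old = [[0.0], [10.0]]
clean = update_codewords(cells, [[1.0, 0.0], [0.0, 1.0]], old)
noisy = update_codewords(cells, [[0.9, 0.1], [0.1, 0.9]], old)
```

With an error-free channel the update returns the ordinary cell centroids, 0 and 10. With a 10% crossover the codewords move to about 1 and 9, which illustrates why the distance between channel-optimal codewords is smaller than between source-optimal ones.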

The input residual vector of stage m in mode u is expressed as:

$$x_{m}^{u} = x - \sum_{v} \sum_{J - jm} P(J^{v} \mid I^{u}) \sum_{n=1,\, n \neq m}^{M} c_{jn}^{v} \tag{5}$$
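As a consistency check, formula (5) can be evaluated for an error-free channel. The following is a sketch under the assumption, not stated explicitly in the text, that $P(J^{v} \mid I^{u})$ is the product of the mode term and the per-stage terms defined in formulas (6) to (8). With q = 0 those terms are 1 when the received mode and indices equal the transmitted ones and 0 otherwise, so the double sum collapses:

```latex
q = 0 \;\Rightarrow\;
P(J^{v} \mid I^{u}) =
\begin{cases}
1, & v = u \text{ and } J = I \\
0, & \text{otherwise}
\end{cases}
\quad\Rightarrow\quad
x_{m}^{u} = x - \sum_{n=1,\, n \neq m}^{M} c_{in}^{u}
```

In that limit the stage-m input is the original vector minus the codewords selected at all other stages, which is the residual used in ordinary jointly optimized multi-stage vector quantization.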

In formula (4), the bit-error transition probability from index $i_{m}^{u}$ to index $j_{m}^{v}$ is expressed as:

$$p(j_{m}^{v} \mid i_{m}^{u}) = p(v \mid u)\, p(j_{m} \mid i_{m}) \tag{6}$$

where p(v|u) is the bit-error transition probability from mode u to mode v. Because the channel bit error rate q is generally small, it can be simplified to:

$$p(v \mid u) = \begin{cases} r = q(1 - q)^{(C - 1)}, & h(u, v) = 1 \\ 0, & h(u, v) > 1 \\ 1 - Cr, & h(u, v) = 0 \end{cases} \tag{7}$$

where C is the number of bits of the mode index and h(u, v) is the Hamming distance between mode u and mode v.

In addition, $p(j_{m} \mid i_{m})$ in formula (6) is the stage-m index-to-index bit-error transition probability within the same mode. Because the channel bit error rate q is generally small, it can be simplified to:

$$p(j_{m} \mid i_{m}) = \begin{cases} q_{m}' = q(1 - q)^{(B_{m} - 1)}, & h_{m}(i_{m}, j_{m}) = 1 \\ 0, & h_{m}(i_{m}, j_{m}) > 1 \\ 1 - B_{m} q_{m}', & h_{m}(i_{m}, j_{m}) = 0 \end{cases} \tag{8}$$

where $B_{m}$ is the number of bits of the stage-m codeword index and $h_{m}(i_{m}, j_{m})$ is the Hamming distance between index $i_{m}$ and index $j_{m}$.
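Formulas (6) to (8) translate directly into code. The sketch below implements the simplified single-bit-error model as written: formulas (7) and (8) share one form and differ only in the number of index bits. The function names are illustrative, not taken from the patent.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two indices differ."""
    return bin(a ^ b).count("1")


def bit_error_prob(src: int, dst: int, nbits: int, q: float) -> float:
    """Formulas (7) and (8): at most one bit error in an nbits-wide index."""
    r = q * (1 - q) ** (nbits - 1)
    h = hamming(src, dst)
    if h == 0:
        return 1 - nbits * r   # no error: remaining probability mass
    if h == 1:
        return r               # exactly one flipped bit
    return 0.0                 # multiple errors are neglected for small q


def index_transition_prob(u: int, v: int, i_m: int, j_m: int,
                          mode_bits: int, stage_bits: int, q: float) -> float:
    """Formula (6): mode transition times stage-m index transition."""
    return bit_error_prob(u, v, mode_bits, q) * bit_error_prob(i_m, j_m, stage_bits, q)


# Probabilities out of a 4-bit mode index sum to 1 by construction.
total = sum(bit_error_prob(5, dst, 4, 0.01) for dst in range(16))
```

Each index has exactly as many Hamming-distance-1 neighbours as it has bits, which is why the no-error case in formulas (7) and (8) is 1 minus the bit count times the single-error probability.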

Note that updating the optimal codewords requires the channel bit error rate q in order to compute the bit-error transition probabilities. For a time-varying channel the bit error rate changes in real time; in that case the largest bit error rate q1 the channel is likely to exhibit can be used for the update. For multi-stage vector quantization combined with channel coding, an unequal error protection scheme is often adopted. Such a scheme typically applies stronger channel coding to the more important early stages, where the residual channel bit error rate is then lower, so a lower bit error rate q0 can be used to update the optimal codewords. The less important later stages receive weaker channel coding or none at all, so the residual bit error rate is higher and a higher bit error rate q1 can be used for the update.

S6: Compare m with M.

If m<M, m is increased by 1 and the method returns to step S4, where the new codebook is used to find the optimal cells that minimize the system distortion; step S5 then updates the optimal codewords of each mode at the next stage.

If m=M, the optimal codewords of each mode at every stage have all been updated, that is, one round of iteration is complete, and the method proceeds to step S7.

S7: Determine whether the iteration count t has reached the preset value T.

The initial value of t is 1, and T is preferably between 20 and 40; in this embodiment T can be set to 30.

If t<T, the preset number of iterations has not been reached; t is increased by 1, m is reset to 1, and the method returns to S4 to carry out the next round of iteration.

If t=T, the preset number of iterations has been reached, and the method proceeds to S8.

S8: Take the codebook produced by the last iteration.

After 30 iterations the procedure has essentially converged. The codebook of each mode obtained in the last iteration is taken as the final codebook, and the parameters to be quantized are quantized and transmitted using this final codebook.

In the multi-mode multi-stage codebook joint optimization method provided by the embodiment of the invention, the optimal codebook of channel-optimal multi-mode multi-stage vector quantization is obtained through repeated iteration. Each codeword is not the centroid of its own cell; instead it is obtained as the average of the cell centroids weighted by the bit-error transition probabilities. Compared with existing source-optimal multi-mode multi-stage vector quantization, the distances between optimal codewords in the embodiment are smaller, so that when a bit error occurs the overall system distortion is significantly reduced. This reduces the effect of bit errors on system distortion and improves the error resilience of the system. Consequently, during multi-mode multi-stage vector quantization and transmission, the system distortion between the dequantized codewords and the input parameters under channel bit errors can be effectively reduced, and the quality of the synthesized speech under channel bit errors effectively improved.

Although embodiments of the invention are disclosed above, they are described only to aid understanding of the invention and are not intended to limit it. Any person skilled in the art to which the invention belongs may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed by the invention, but the scope of patent protection of the invention remains as defined by the appended claims.

Claims

1. A multi-mode multi-stage codebook joint optimization method, characterized by comprising:

Step 1, outputting the vectors to be quantized;

Step 6, comparing m with M;

If m<M, increasing m by 1 and returning to step 4;

If m=M, proceeding to step 7;

If t=T, proceeding to step 8;

Step 8, taking the codebook produced by the last iteration.

2. The method according to claim 1, characterized in that step 1 is specifically:

3. The method according to claim 2, characterized in that the speech database is a standard Chinese speech database with a set sampling frequency and a certain duration, containing multiple voice timbres.

4. The method according to claim 3, characterized in that the set sampling frequency is 8 kHz.

5. The method according to claim 1, characterized in that step 2 is specifically:

6. The method according to claim 1, characterized in that step 3 is specifically:

7. The method according to claim 1, characterized in that the value of T is between 20 and 40.