CN104795074A - Multi-mode multi-stage codebook joint optimization method - Google Patents


Publication number
CN104795074A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510121820.0A
Other languages
Chinese (zh)
Other versions
CN104795074B (en)
Inventor
徐敬德
崔慧娟
唐昆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
XINRUIDI (BEIJING) SCIENCE & TECHNOLOGY Co Ltd
Tsinghua University
Original Assignee
XINRUIDI (BEIJING) SCIENCE & TECHNOLOGY Co Ltd
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by XINRUIDI (BEIJING) SCIENCE & TECHNOLOGY Co Ltd and Tsinghua University
Priority to CN201510121820.0A
Publication of CN104795074A
Application granted
Publication of CN104795074B
Legal status: Active
Anticipated expiration

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a multi-mode multi-stage codebook joint optimization method in the technical field of vector quantization for low-bit-rate speech coding. It addresses the technical problem that, in existing multi-mode multi-stage codebook training, bit errors have a large effect on system distortion. The method comprises: outputting the vectors to be quantized; training on the vectors to be quantized to obtain initial codebooks for each stage of each mode; rearranging the codebook indexes of each initial codebook to obtain new codebooks; for the vectors to be quantized, using the new codebooks to find the optimal cells and quantization indexes that minimize system distortion; updating the optimal codewords of each mode, stage by stage, from the input residual vectors and their corresponding quantization indexes, and iterating a preset number of times; and taking the codebooks produced by the last iteration as the result.

Description

Multi-mode multi-stage codebook joint optimization method
Technical field
The present invention relates to the technical field of vector quantization for low-rate speech coding, and in particular to a multi-mode multi-stage codebook joint optimization method.
Background
With the development of audio technology, low-rate speech coding is widely used in fields such as wireless communication and satellite communication.
In low-rate speech coding, limits on storage and computation mean that constrained vector quantization is often adopted, including tree-structured vector quantization, classified vector quantization, multi-stage vector quantization, and transform-domain vector quantization, to reduce the storage and computation required by the codebook. Of these, multi-stage vector quantization is the most common. Multi-mode vector quantization, for its part, increases storage somewhat but effectively reduces quantization error without spending additional bits. In many applications of low-rate speech coding, bit errors are present much of the time. For multi-mode multi-stage vector quantization, a bit error can corrupt not only the codebook indexes of the individual stages but also the mode selection. The parameters synthesized at the decoder then often deviate substantially from the input parameters at the encoder, seriously degrading the intelligibility and comfort of the synthesized speech.
Traditional multi-mode multi-stage vector quantization performs codebook training and joint codebook optimization under a source-optimal criterion. Because channel bit errors are not considered, the system distortion of a source-optimized multi-mode multi-stage vector quantizer consists of quantization distortion alone. The optimal codewords trained this way are spread evenly over the whole quantization space, so once a bit error occurs the overall system distortion is comparatively large.
In summary, in existing multi-mode multi-stage codebook training, bit errors have a large effect on system distortion, which degrades the quality of the dequantized codewords and of the synthesized speech.
Summary of the invention
The object of the present invention is to provide a multi-mode multi-stage codebook joint optimization method that solves the technical problem that, in existing multi-mode multi-stage codebook training, bit errors have a large effect on system distortion.
The invention provides a multi-mode multi-stage codebook joint optimization method, comprising:
Step 1: output the vectors to be quantized;
Step 2: train on the vectors to be quantized to obtain the initial codebooks for each stage of each mode, where the number of stages is M;
Step 3: rearrange the codebook indexes of the initial codebook of each stage of each mode to obtain new codebooks;
Step 4: for the vectors to be quantized, use the new codebooks to find the optimal cells and quantization indexes that minimize system distortion;
Step 5: for the m-th stage codebook, update the optimal codewords of each mode at that stage using the quantization indexes corresponding to that stage's input residual vectors, where the initial value of m is 1;
Step 6: compare m with M;
If m<M, increment m by 1 and return to Step 4;
If m=M, proceed to Step 7;
Step 7: determine whether the iteration count t has reached the preset value T, where the initial value of t is 1;
If t<T, increment t by 1, reset m to 1, and return to Step 4;
If t=T, proceed to Step 8;
Step 8: take the codebooks produced by the last iteration as the result.
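The control flow of Steps 4 to 8 can be sketched as two nested loops. This is a minimal illustration only: `find_cells` and `update_stage` are hypothetical callbacks standing in for Steps 4 and 5, and the `for` loops are assumed to be equivalent to the counter comparisons described in Steps 6 and 7.

```python
def joint_optimize(num_stages, num_iterations, find_cells, update_stage):
    """Run the Step 4-8 control flow and return the visited (t, m) pairs.

    find_cells and update_stage are hypothetical placeholders for Step 4
    (cell and index search) and Step 5 (codeword update at stage m).
    """
    trace = []
    for t in range(1, num_iterations + 1):      # Step 7: t = 1 .. T
        for m in range(1, num_stages + 1):      # Step 6: m = 1 .. M
            find_cells(m, t)                    # Step 4
            update_stage(m, t)                  # Step 5
            trace.append((t, m))
    return trace                                # Step 8: stop after round T


# Usage: M = 4 stages and T = 30 iterations give 120 stage updates.
visited = joint_optimize(4, 30, lambda m, t: None, lambda m, t: None)
```

Note that Step 4 is repeated before every stage update, not once per iteration, so the cells are recomputed each time any stage's codewords change.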
Preferably, Step 1 is specifically:
In low-rate speech coding, the speech database is split by mode, with the quantization index of a parameter used as the mode number, and the vectors to be quantized of the parameter to be quantized are output for each mode.
The speech database is preferably a standard Chinese speech database with a set sampling frequency, a certain duration, and multiple voice timbres. The set sampling frequency is preferably 8 kHz.
Preferably, Step 2 is specifically:
The vectors to be quantized of each mode are used as the input vectors for codebook training and joint optimization. A codebook training method based on simulated annealing is applied to each mode, and multi-stage codebook training is performed according to the number of bits allocated to each stage, yielding the initial codebooks for each stage of each mode.
Preferably, Step 3 is specifically:
For the initial codebook of each stage of each mode, a tabu-search-based method is used to rearrange the codebook indexes, yielding the new codebooks.
Preferably, T ranges from 20 to 40.
The invention brings the following beneficial effects. In the multi-mode multi-stage codebook joint optimization method provided by the invention, the optimal codebook of a channel-optimized multi-mode multi-stage vector quantizer is obtained through multiple iterations. Its codewords are not the centroids of their own cells; each is instead an average of the cell centroids weighted by the bit-error transition probabilities. Compared with existing source-optimized multi-mode multi-stage vector quantization, the optimal codewords in the invention lie closer together, so when a bit error occurs the overall system distortion is significantly smaller. This reduces the effect of bit errors on system distortion and improves the error resilience of the system. Consequently, during multi-mode multi-stage vector quantization and transmission, the system distortion between the dequantized codewords and the input parameters under channel bit errors is effectively reduced, and the quality of the synthesized speech under channel bit errors is effectively improved.
Other features and advantages of the invention are set forth in the description that follows; in part they will be apparent from the description or may be learned by practicing the invention. The objects and other advantages of the invention may be realized and attained by the structure particularly pointed out in the description, the claims, and the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the invention more clearly, the drawing needed for describing the embodiments is briefly introduced below:
Fig. 1 is a flowchart of the multi-mode multi-stage codebook joint optimization method provided by an embodiment of the invention.
Detailed description of the embodiments
Embodiments of the invention are described in detail below with reference to the drawing and examples, so that how the invention applies technical means to solve the technical problem and achieve the technical effect can be fully understood and implemented. Provided no conflict arises, the embodiments of the invention and the features within them may be combined with one another, and the resulting technical solutions all fall within the scope of protection of the invention.
An embodiment of the invention provides a multi-mode multi-stage codebook joint optimization method applicable to low-rate speech coding. As shown in Fig. 1, the method comprises:
S1: Output the vectors to be quantized.
Specifically, in low-rate speech coding, the speech database is split by mode, with the quantization index of a parameter used as the mode number, and the vectors to be quantized of the parameter to be quantized are output for each mode. The speech database is preferably a standard Chinese speech database with a set sampling frequency, a certain duration, and multiple voice timbres.
The speech database used in this embodiment has a sampling frequency of 8 kHz and may be several hours long, and it includes voices of different genders and ages. From this database, 10-dimensional line spectrum pair (LSP) parameters are extracted every 20 ms; after prediction removal and mean removal they serve as the parameters to be quantized, and the quantization index of the voiced/unvoiced parameter is used as the mode number.
S2: Train on the vectors to be quantized to obtain the initial codebooks for each stage of each mode.
Specifically, the vectors to be quantized of each mode are used as the input vectors for codebook training and joint optimization. A codebook training method based on simulated annealing is applied to each mode, and multi-stage codebook training is performed according to the number of bits allocated to each stage, yielding the initial codebooks for each stage of each mode.
For a typical multi-mode multi-stage vector quantizer, a speech encoder Q with U modes and M stages maps a K-dimensional (K=10) input vector $x = [x_1, x_2, \dots, x_K]$ to the corresponding mode $u$ and index $I^u$, giving the corresponding codeword $c_{I^u}$, where $c_{i_m}^{u}$ denotes the $i$-th codeword of stage $m$ in mode $u$.
In this embodiment, for a speech encoder running at 600 bps, four 20 ms frames form a superframe for multi-stage vector quantization. The voiced/unvoiced parameter may be allocated 4 bits, giving 16 modes (U=16). The LSP parameters are quantized in 4 stages (M=4), allocated 8, 7, 7, and 6 bits respectively, and each stage performs vector quantization with its own bit allocation.
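The arithmetic behind the 600 bps configuration can be checked with a few lines. This is only a worked example of the figures stated above; the variable names are illustrative, and the source does not itemize how the bits left over after the mode index and LSP stages are spent.

```python
FRAME_MS = 20                 # frame length from the embodiment
FRAMES_PER_SUPERFRAME = 4     # four frames form one superframe
BIT_RATE_BPS = 600            # encoder rate
MODE_BITS = 4                 # bits for the voiced/unvoiced mode index
STAGE_BITS = [8, 7, 7, 6]     # bits allocated to the M = 4 LSP stages

superframe_ms = FRAME_MS * FRAMES_PER_SUPERFRAME
bits_per_superframe = BIT_RATE_BPS * superframe_ms // 1000

num_modes = 2 ** MODE_BITS                       # U
stage_sizes = [2 ** b for b in STAGE_BITS]       # codewords per stage
lsp_bits = sum(STAGE_BITS)                       # bits spent on LSP indexes

# Storage: each mode keeps one codebook per stage.
codewords_per_mode = sum(stage_sizes)
total_codewords = num_modes * codewords_per_mode

# For comparison, a single-stage codebook addressed by the same number
# of LSP bits would need 2**lsp_bits codewords per mode.
single_stage_size = 2 ** lsp_bits
```

The comparison at the end illustrates why the multi-stage structure is used: the per-mode storage grows with the sum of the stage sizes instead of their product.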
In other embodiments, for a speech encoder running at 2400 bps, multi-mode multi-stage vector quantization may degenerate to single-mode multi-stage vector quantization: the LSP parameters of each 20 ms frame undergo 4-stage vector quantization, with 7, 6, 6, and 6 bits allocated to the stages respectively.
S3: Rearrange the codebook indexes of the initial codebook of each stage of each mode to obtain the new codebooks.
Specifically, for the initial codebook of each stage of each mode, a tabu-search-based method is used to rearrange the codebook indexes, yielding the new codebooks and thereby reducing the distortion between codewords without spending additional bits.
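A tabu-search index rearrangement might look like the sketch below. The source does not give its cost function or move set, so everything here is an assumption: the cost is the summed squared distance between codewords whose indexes differ in exactly one bit, a move swaps the codewords at two indexes, and `tenure` and `iterations` are illustrative parameters. It is exhaustive over swaps and therefore only practical for small codebooks.

```python
import itertools
from collections import deque


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def assignment_cost(order, codebook, bits):
    """Sum of distances between codewords at Hamming distance 1 (assumed cost)."""
    total = 0.0
    for i in range(len(order)):
        for b in range(bits):
            j = i ^ (1 << b)              # index differing from i in bit b
            if j > i:                     # count each pair once
                total += sq_dist(codebook[order[i]], codebook[order[j]])
    return total


def tabu_reorder(codebook, bits, iterations=50, tenure=5):
    """Return the codebook with its entries reordered by a simple tabu search."""
    order = list(range(len(codebook)))
    best, best_cost = order[:], assignment_cost(order, codebook, bits)
    tabu = deque(maxlen=tenure)           # recently used swaps
    for _ in range(iterations):
        candidates = []
        for a, b in itertools.combinations(range(len(order)), 2):
            trial = order[:]
            trial[a], trial[b] = trial[b], trial[a]
            cost = assignment_cost(trial, codebook, bits)
            # Tabu moves are allowed only if they beat the best so far.
            if (a, b) not in tabu or cost < best_cost:
                candidates.append((cost, (a, b), trial))
        if not candidates:
            break
        cost, move, order = min(candidates, key=lambda c: c[0])
        tabu.append(move)
        if cost < best_cost:
            best, best_cost = order[:], cost
    return [codebook[k] for k in best]


# Usage: four scalar codewords addressed by 2-bit indexes.
reordered = tabu_reorder([[0.0], [3.0], [1.0], [2.0]], bits=2)
```

The rearrangement changes only which index points at which codeword, so the quantization error without bit errors is unaffected.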
In this embodiment, after the mode and index obtained with the codebooks of step S2 pass through a noisy channel, the mode becomes $v$ and the index becomes $J^v$. Suppose the transition probability is $P(J^v \mid I^u)$; the decoder Q' inverse-maps the received index to the codeword $c_{J^v}$, where $c_{j_m}^{v}$ denotes the $j$-th codeword of stage $m$ in mode $v$.
S4: For the vectors to be quantized, use the new codebooks to find the optimal cells and quantization indexes that minimize system distortion.
Specifically, for an input vector $x$ of the multi-mode multi-stage vector quantizer, the distortion is expressed as:
$$D(x, c_{I^u}) = \sum_{v} \sum_{J} P(J^v \mid I^u)\, d\!\left(x, \sum_{m=1}^{M} c_{j_m}^{v}\right) \tag{1}$$
Over all input vectors, the system distortion is expressed as:
$$D = \sum_{u} \sum_{I} \sum_{v} \sum_{J} P(J^v \mid I^u) \int_{s_{I^u}} p(x)\, d\!\left(x, \sum_{m=1}^{M} c_{j_m}^{v}\right) dx \tag{2}$$
where $p(x)$ is the probability distribution of $x$.
Using the new codebooks obtained in step S3, the optimal cell that minimizes system distortion (that is, the channel-optimized cell) is expressed as:
$$s_{I^u} = \left\{ x \;\middle|\; D(x, c_{I^u}) \le D(x, c_{L^u}),\ \text{for all } L \ne I \right\} \tag{3}$$
where $L$ ranges over all possible indexes other than $I$.
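The cell rule of formula (3) amounts to encoding each input with the index whose expected distortion over the channel is smallest. The sketch below illustrates this for a deliberately reduced case: one mode, one stage, squared Euclidean distance as $d$ (the source does not name the distortion measure), and a single-bit-error transition model. All function names are hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two integer indexes."""
    return bin(a ^ b).count("1")


def transition_prob(i, j, bits, q):
    """P(j | i) under the assumption that at most one bit flips."""
    r = q * (1 - q) ** (bits - 1)
    h = hamming(i, j)
    if h == 0:
        return 1 - bits * r
    if h == 1:
        return r
    return 0.0


def sq_dist(x, c):
    """Squared Euclidean distance, assumed here as the measure d."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def expected_distortion(x, i, codebook, bits, q):
    """Single-stage analogue of formula (1): average over received indexes."""
    return sum(transition_prob(i, j, bits, q) * sq_dist(x, codebook[j])
               for j in range(len(codebook)))


def encode(x, codebook, bits, q):
    """Pick the index whose cell contains x in the sense of formula (3)."""
    return min(range(len(codebook)),
               key=lambda i: expected_distortion(x, i, codebook, bits, q))


# Usage: a 2-bit scalar codebook and a bit error rate of 1%.
index = encode([0.9], [[0.0], [1.0], [2.0], [3.0]], bits=2, q=0.01)
```

With `q = 0` the expected distortion collapses to the plain distance, and the rule reduces to ordinary nearest-neighbor (source-optimized) encoding.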
S5: Update the optimal codewords of each mode at stage m.
Specifically, for the m-th stage codebook, the optimal codewords of each mode at that stage are updated using the quantization indexes corresponding to that stage's input residual vectors:
$$c_{j_m}^{v} = \frac{\sum_{i_m^u} p(j_m^v \mid i_m^u) \int_{s_{i_m^u}} x_m^u\, p(x)\, dx}{\sum_{i_m^u} p(j_m^v \mid i_m^u) \int_{s_{i_m^u}} p(x)\, dx} \tag{4}$$
The initial value of m is 1; that is, the update starts with the stage-1 optimal codewords of each mode.
The input residual vector of stage m in mode u is expressed as:
$$x_m^u = x - \sum_{v} \sum_{J - j_m} P(J^v \mid I^u) \sum_{n=1,\, n \ne m}^{M} c_{j_n}^{v} \tag{5}$$
In formula (4), the bit-error transition probability from index $i_m^u$ to index $j_m^v$ is expressed as:
$$p(j_m^v \mid i_m^u) = p(v \mid u)\, p(j_m \mid i_m) \tag{6}$$
Here $p(v \mid u)$ denotes the bit-error transition probability from mode $u$ to mode $v$. Because the channel bit error rate $q$ is generally small, it can be simplified to:
$$p(v \mid u) = \begin{cases} r = q(1-q)^{C-1}, & h(u,v) = 1 \\ 0, & h(u,v) > 1 \\ 1 - Cr, & h(u,v) = 0 \end{cases} \tag{7}$$
where $C$ is the number of bits in the mode index and $h(u,v)$ is the Hamming distance between mode $u$ and mode $v$.
In addition, $p(j_m \mid i_m)$ in formula (6) denotes the bit-error transition probability from stage-m index $i_m$ to index $j_m$ within the same mode. Because the channel bit error rate $q$ is generally small, it can be simplified to:
$$p(j_m \mid i_m) = \begin{cases} q'_m = q(1-q)^{B_m-1}, & h_m(i_m, j_m) = 1 \\ 0, & h_m(i_m, j_m) > 1 \\ 1 - B_m q'_m, & h_m(i_m, j_m) = 0 \end{cases} \tag{8}$$
where $B_m$ is the number of bits in the stage-m codeword index and $h_m(i_m, j_m)$ is the Hamming distance between index $i_m$ and index $j_m$.
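The simplified transition probabilities of formulas (7) and (8) share one form, so a single helper can serve both, with formula (6) as their product. The sketch below is illustrative; the function names are hypothetical and the indexes are assumed to be plain integers whose binary representations are transmitted.

```python
def hamming(a, b):
    """Hamming distance between the binary forms of two integer indexes."""
    return bin(a ^ b).count("1")


def single_error_prob(sent, received, bits, q):
    """Formulas (7)/(8): only zero or one flipped bit has nonzero probability."""
    r = q * (1 - q) ** (bits - 1)      # exactly one given bit flips
    h = hamming(sent, received)
    if h == 0:
        return 1 - bits * r            # remaining mass: index received intact
    if h == 1:
        return r
    return 0.0                         # multi-bit errors are neglected


def joint_prob(u, v, i, j, mode_bits, stage_bits, q):
    """Formula (6): mode transition times stage-index transition."""
    return (single_error_prob(u, v, mode_bits, q)
            * single_error_prob(i, j, stage_bits, q))


# Usage: probability that mode 3 and stage index 5 both arrive unchanged
# with C = 4 mode bits, B_m = 8 index bits, and q = 0.01.
p_ok = joint_prob(3, 3, 5, 5, mode_bits=4, stage_bits=8, q=0.01)
```

Each index has exactly `bits` neighbors at Hamming distance 1, so the probabilities over all received indexes sum to 1 by construction.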
It should be noted that updating the optimal codewords requires the channel bit error rate q in order to compute the bit-error transition probabilities. For a time-varying channel the bit error rate changes in real time, and the maximum bit error rate q1 that the channel may exhibit can be used for the update. For multi-stage vector quantization combined with channel coding, an unequal error protection scheme is often used. Such schemes typically apply stronger channel coding to the more important early stages, where the residual bit error rate is then lower and a lower bit error rate q0 can be used for the update. The less important later stages receive weaker channel coding or none, so the residual bit error rate is higher and a higher bit error rate q1 can be used for the update.
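The codeword update of formula (4) can be sketched empirically by replacing the integrals with sums over training vectors. The example below is a reduced illustration under stated assumptions: one mode and one stage, so the training vectors play the role of the residuals $x_m^u$; `assignments` holds the index chosen for each vector in step S4; and the bit error rate `q` would be q0 or q1 depending on the stage, as discussed above. Function names are hypothetical.

```python
def transition_prob(i, j, bits, q):
    """P(j | i) with at most one flipped bit, as in formula (8)."""
    r = q * (1 - q) ** (bits - 1)
    h = bin(i ^ j).count("1")
    if h == 0:
        return 1 - bits * r
    return r if h == 1 else 0.0


def update_codebook(samples, assignments, codebook, bits, q):
    """Empirical form of formula (4) for a single mode and stage."""
    size, dim = len(codebook), len(codebook[0])
    sums = [[0.0] * dim for _ in range(size)]   # per-cell vector sums
    counts = [0] * size                         # per-cell populations
    for x, i in zip(samples, assignments):
        counts[i] += 1
        for k in range(dim):
            sums[i][k] += x[k]
    updated = []
    for j in range(size):
        num, den = [0.0] * dim, 0.0
        for i in range(size):
            p = transition_prob(i, j, bits, q)  # weight of cell i for codeword j
            den += p * counts[i]
            for k in range(dim):
                num[k] += p * sums[i][k]
        # Keep the old codeword if no weighted sample reaches index j.
        updated.append([n / den for n in num] if den > 0 else list(codebook[j]))
    return updated


# Usage: two scalar codewords addressed by a 1-bit index.
samples = [[0.0], [2.0], [10.0]]
assignments = [0, 0, 1]
clean = update_codebook(samples, assignments, [[1.0], [9.0]], bits=1, q=0.0)
noisy = update_codebook(samples, assignments, [[1.0], [9.0]], bits=1, q=0.1)
```

In this toy case the `q = 0` result is the ordinary cell centroids, while `q = 0.1` pulls the two codewords toward each other, which matches the behavior the description attributes to channel-optimized codewords.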
S6: Compare m with M.
If m<M, m is incremented by 1 and the method returns to step S4, where the new codebooks are used to obtain the optimal cells that minimize system distortion; step S5 then updates the optimal codewords of each mode at the next stage.
If m=M, the optimal codewords of every mode at every stage have been updated, that is, one round of iteration is complete, and the method proceeds to step S7.
S7: Determine whether the iteration count t has reached the preset value T.
The initial value of t is 1. T preferably ranges from 20 to 40; in this embodiment T may be set to 30.
If t<T, the preset number of iterations has not been reached; t is incremented by 1, m is reset to 1, and the method returns to S4 to carry out the next round of iteration.
If t=T, the preset number of iterations has been reached, and the method proceeds to S8.
S8: Take the codebooks produced by the last iteration as the result.
After 30 iterations the procedure has essentially converged. The per-mode codebooks obtained in the last iteration are taken as the final codebooks, and the parameters to be quantized are quantized and transmitted according to them.
In the multi-mode multi-stage codebook joint optimization method provided by the embodiment of the invention, the optimal codebook of a channel-optimized multi-mode multi-stage vector quantizer is obtained through multiple iterations. Its codewords are not the centroids of their own cells; each is instead an average of the cell centroids weighted by the bit-error transition probabilities. Compared with existing source-optimized multi-mode multi-stage vector quantization, the optimal codewords in the embodiment lie closer together, so when a bit error occurs the overall system distortion is significantly smaller. This reduces the effect of bit errors on system distortion and improves the error resilience of the system. Consequently, during multi-mode multi-stage vector quantization and transmission, the system distortion between the dequantized codewords and the input parameters under channel bit errors is effectively reduced, and the quality of the synthesized speech under channel bit errors is effectively improved.
Although the embodiments disclosed by the invention are as described above, they are presented only to facilitate understanding of the invention and are not intended to limit it. Any person skilled in the art to which the invention pertains may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed by the invention, but the scope of patent protection of the invention remains subject to the scope defined by the appended claims.

Claims (7)

1. A multi-mode multi-stage codebook joint optimization method, characterized by comprising:
Step 1: output the vectors to be quantized;
Step 2: train on the vectors to be quantized to obtain the initial codebooks for each stage of each mode, where the number of stages is M;
Step 3: rearrange the codebook indexes of the initial codebook of each stage of each mode to obtain new codebooks;
Step 4: for the vectors to be quantized, use the new codebooks to find the optimal cells and quantization indexes that minimize system distortion;
Step 5: for the m-th stage codebook, update the optimal codewords of each mode at that stage using the quantization indexes corresponding to that stage's input residual vectors, where the initial value of m is 1;
Step 6: compare m with M;
If m<M, increment m by 1 and return to Step 4;
If m=M, proceed to Step 7;
Step 7: determine whether the iteration count t has reached the preset value T, where the initial value of t is 1;
If t<T, increment t by 1, reset m to 1, and return to Step 4;
If t=T, proceed to Step 8;
Step 8: take the codebooks produced by the last iteration as the result.
2. The method according to claim 1, characterized in that Step 1 is specifically:
In low-rate speech coding, the speech database is split by mode, with the quantization index of a parameter used as the mode number, and the vectors to be quantized of the parameter to be quantized are output for each mode.
3. The method according to claim 2, characterized in that the speech database is a standard Chinese speech database with a set sampling frequency, a certain duration, and multiple voice timbres.
4. The method according to claim 3, characterized in that the set sampling frequency is 8 kHz.
5. The method according to claim 1, characterized in that Step 2 is specifically:
The vectors to be quantized of each mode are used as the input vectors for codebook training and joint optimization; a codebook training method based on simulated annealing is applied to each mode, and multi-stage codebook training is performed according to the number of bits allocated to each stage, yielding the initial codebooks for each stage of each mode.
6. The method according to claim 1, characterized in that Step 3 is specifically:
For the initial codebook of each stage of each mode, a tabu-search-based method is used to rearrange the codebook indexes, yielding the new codebooks.
7. The method according to claim 1, characterized in that T ranges from 20 to 40.
CN201510121820.0A 2015-03-19 2015-03-19 Multi-mode multi-stage codebooks combined optimization method Active CN104795074B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510121820.0A CN104795074B (en) 2015-03-19 2015-03-19 Multi-mode multi-stage codebooks combined optimization method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510121820.0A CN104795074B (en) 2015-03-19 2015-03-19 Multi-mode multi-stage codebooks combined optimization method

Publications (2)

Publication Number Publication Date
CN104795074A true CN104795074A (en) 2015-07-22
CN104795074B CN104795074B (en) 2019-01-04

Family

ID=53559832

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510121820.0A Active CN104795074B (en) 2015-03-19 2015-03-19 Multi-mode multi-stage codebooks combined optimization method

Country Status (1)

Country Link
CN (1) CN104795074B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113449126A (en) * 2020-03-24 2021-09-28 中移(成都)信息通信科技有限公司 Image retrieval method, image retrieval device, electronic equipment and computer storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1975861A (en) * 2006-12-15 2007-06-06 清华大学 Vocoder fundamental tone cycle parameter channel error code resisting method
CN101009098A (en) * 2007-01-26 2007-08-01 清华大学 Sound coder gain parameter division-mode anti-channel error code method
CN101261835A (en) * 2008-04-25 2008-09-10 清华大学 Joint optimization method for multi-vector and multi-code book size based on super frame mode
CN101295507A (en) * 2008-04-25 2008-10-29 清华大学 Superframe acoustic channel parameter multilevel vector quantization method with interstage estimation

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
J.S.PAN AND S.C.CHU: ""Non-redundant VQ channel coding using tabu search strategy"", 《ELECTRONICS LETTERS》 *
NARIMAN FARVARDIN: ""a study of vector quantization for noisy channels"", 《IEEE TRANSACTIONS ON INFORMATION THEORY》 *
徐敬德 等: ""基于码字特征的多模式多级矢量量化算法"", 《清华大学学报(自然科学版)》 *
徐敬德 等: ""多矢量多模式量化中码本尺寸联合优化算法"", 《清华大学学报(自然科学版)》 *

Also Published As

Publication number Publication date
CN104795074B (en) 2019-01-04

Similar Documents

Publication Publication Date Title
CN1906855B (en) Dimensional vector and variable resolution quantisation
US9171550B2 (en) Context-based arithmetic encoding apparatus and method and context-based arithmetic decoding apparatus and method
CN102859583B (en) Audio encoder, audio decoder, method for encoding and audio information, and method for decoding an audio information using a modification of a number representation of a numeric previous context value
CN101110214B (en) Speech coding method based on multiple description lattice type vector quantization technology
CN103778919B (en) Based on compressed sensing and the voice coding method of rarefaction representation
CN103620672B (en) For the apparatus and method of the error concealing in low delay associating voice and audio coding (USAC)
CN106203624A (en) Vector Quantization based on deep neural network and method
US20210201924A1 (en) Audio encoding/decoding based on an efficient representation of auto-regressive coefficients
CN101578508A (en) Method and device for coding transition frames in speech signals
CN110491400B (en) Speech signal reconstruction method based on depth self-encoder
CN110473557B (en) Speech signal coding and decoding method based on depth self-encoder
CN104517612A (en) Variable-bit-rate encoder, variable-bit-rate decoder, variable-bit-rate encoding method and variable-bit-rate decoding method based on AMR (adaptive multi-rate)-NB (narrow band) voice signals
US7283968B2 (en) Method for grouping short windows in audio encoding
CN103503320B (en) For reconstructing method and the decoder of source signal
CN104795074A (en) Multi-mode multi-stage codebook joint optimization method
Shin et al. Audio coding based on spectral recovery by convolutional neural network
EP2447943A1 (en) Coding method, decoding method, and device and program using the methods
US20110181449A1 (en) Encoding and Decoding Method and Device
CN103081007A (en) Quantization device and quantization method
Lee et al. KLT-based adaptive entropy-constrained quantization with universal arithmetic coding
CN105575401A (en) AACHuffman domain steganalysis method based on C-MAC characteristics
CN101004915A (en) Protection method for anti channel error code of voice coder in 2.4kb/s SELP low speed
CN2927247Y (en) Speech decoder
CN102740072A (en) Two-step learning vector quantization code book generating method for image reconstruction
CN106157960A (en) The self adaptation arithmetic coding/decoding of audio content

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant