CN103383846A

CN103383846A - Speech coding system to improve packet loss repairing quality

Info

Publication number: CN103383846A
Application number: CN2013102366677A
Authority: CN
Inventors: 高扬
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2006-12-26
Filing date: 2007-12-12
Publication date: 2013-11-06
Anticipated expiration: 2027-12-12
Also published as: CN101286319A; CN101286319B; CN103383846B

Abstract

The invention provides a speech coding system that improves packet loss repair quality, and a method that significantly reduces error propagation caused by voice packet loss while still profiting greatly from long-term pitch prediction. This is achieved by adaptively limiting the maximum value of the pitch gain for the first pitch cycle within a frame. In a speech coding system for encoding a speech signal, the speech frames are classified into a plurality of classes depending on whether the first pitch cycle falls within one subframe or spans several subframes. The pitch gain is set to a value significantly smaller than 1 for the subframes covering the first pitch cycle, and the pitch gain reduction is compensated by increasing the coded excitation codebook size or by adding one more stage of excitation for those subframes.

Description

Speech coding method for improving speech packet loss repair quality

Technical field

The invention belongs to the field of signal coding, specifically speech coding, and in particular aims to improve packet loss compensation performance when voice packets are transmitted.

Background technology

Traditionally, all parametric speech coding methods exploit the redundancy inherent in the speech signal to reduce the amount of information that must be transmitted, and estimate the parameters of the speech signal over short intervals. This redundancy arises mainly from the quasi-periodic repetition of the speech waveform and from the slowly varying spectral envelope.

The redundancy of the speech waveform takes different forms for different types of speech signal, such as voiced and unvoiced speech. For voiced speech, the signal is essentially periodic; however, this periodicity varies within a speech segment, and the periodic waveform changes slowly from segment to segment. Low-bit-rate speech coding benefits greatly from this periodicity. The period of voiced speech is called the pitch period (pitch), and prediction based on the pitch period is called long-term prediction. Unvoiced speech, by contrast, resembles random noise and has little periodicity.

In any case, parametric coding reduces the redundancy of the speech segments by separating the spectral envelope from the excitation. The slowly varying spectral envelope is described by linear prediction (also called short-term prediction). Low-bit-rate speech coding benefits from short-term prediction as well. The coding advantage comes from the slow variation of the parameters: their values are unlikely to change significantly within a few milliseconds. Therefore, at a sampling rate of 8 kHz or 16 kHz, speech coding algorithms treat a speech segment of 10 to 30 milliseconds as one frame, with 20 milliseconds being the most common frame length. In earlier well-known international standards such as G.723, G.729, EFR and AMR, the code-excited linear prediction (CELP) technique is widely adopted. CELP is usually understood as a combination of coded excitation, long-term prediction and short-term prediction. Speech coding algorithms based on CELP are very popular in the speech compression field.

Fig. 1 shows the initial CELP speech encoder, which uses the analysis-by-synthesis method to minimize (113) the weighted error 109 between the synthesized speech 102 and the original speech 101. W(z) is the weighting filter 110, which applies weighted filtering to the error signal 111. 1/B(z) is the long-term prediction filter 105, and 1/A(z) is the short-term prediction filter 103. The coded excitation 108, also called the fixed codebook excitation, is scaled by the gain G_c (106) before passing through the linear filters. The short-term linear prediction filter 103 is obtained by analyzing the original signal 101 and is represented by a set of linear prediction coefficients:

A(z) = 1 + \sum_{i=1}^{P} a_i \cdot z^{-i} \qquad (1)
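As a rough illustration of equation (1), the sketch below reads it in the usual form A(z) = 1 + a_1 z^-1 + ... + a_P z^-P, applies it as an analysis filter, and then undoes it with the synthesis filter 1/A(z) that appears as unit 103. The function names, the zero initial state and the coefficient values used to exercise it are hypothetical and are not taken from the patent.

```python
def lpc_residual(signal, a):
    """Filter `signal` through A(z) = 1 + sum_i a[i-1] * z^-i (zero initial state)."""
    residual = []
    for n, s in enumerate(signal):
        acc = s
        for i, a_i in enumerate(a, start=1):
            if n - i >= 0:
                acc += a_i * signal[n - i]
        residual.append(acc)
    return residual


def lpc_synthesis(residual, a):
    """Filter `residual` through 1/A(z), rebuilding the original samples."""
    out = []
    for n, r in enumerate(residual):
        acc = r
        for i, a_i in enumerate(a, start=1):
            if n - i >= 0:
                acc -= a_i * out[n - i]
        out.append(acc)
    return out
```

Feeding the output of `lpc_residual` into `lpc_synthesis` with the same coefficients returns the input samples up to rounding, which is the sense in which A(z) and 1/A(z) are inverse filters.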

The weighting filter 110 is derived from the short-term prediction filter above. A typical weighting filter can be expressed as:

W(z) = \frac{A(z/\alpha)}{A(z/\beta)} \qquad (2)

where β < α, 0 < β < 1 and 0 < α ≤ 1. The long-term prediction 105 depends on the pitch period and the pitch gain; the pitch period can be estimated from the original signal, the residual signal or the weighted original signal. The long-term prediction function can be expressed as follows, where β denotes the pitch gain and is not the β of the weighting filter in (2):

B(z) = 1 - \beta \cdot z^{-\mathrm{Pitch}} \qquad (3)
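The two filters in (2) and (3) can be sketched as follows. This is a minimal illustration under stated assumptions: A(z) is taken as 1 + a_1 z^-1 + ... + a_P z^-P, so that A(z/γ) simply scales a_i by γ^i; the pitch lag is an integer number of samples; all filters start from a zero state. The function names are hypothetical.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): a_i is scaled by gamma**i for i = 1..P."""
    return [a_i * gamma ** i for i, a_i in enumerate(a, start=1)]


def weighting_filter(signal, a, alpha, beta):
    """Apply W(z) = A(z/alpha) / A(z/beta) with zero initial state."""
    num = bandwidth_expand(a, alpha)
    den = bandwidth_expand(a, beta)
    out = []
    for n, x in enumerate(signal):
        acc = x
        for i in range(1, len(a) + 1):
            if n - i >= 0:
                acc += num[i - 1] * signal[n - i] - den[i - 1] * out[n - i]
        out.append(acc)
    return out


def pitch_synthesis(excitation, pitch, pitch_gain):
    """Apply 1/B(z), B(z) = 1 - pitch_gain * z^-pitch (integer lag, zero state)."""
    out = []
    for n, x in enumerate(excitation):
        past = out[n - pitch] if n - pitch >= 0 else 0.0
        out.append(x + pitch_gain * past)
    return out
```

Driving `pitch_synthesis` with a single impulse shows the role of the pitch gain directly: each repetition one pitch lag later is the previous one scaled by that gain.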

The coded excitation 108 usually consists of pulse-like or noise-like signals, which can be constructed mathematically in real time or stored in a codebook. Finally, the coded excitation index, the quantized gain index, the quantized long-term prediction parameter index and the quantized short-term prediction parameter index are transmitted to the decoder.

Fig. 2 shows the initial speech decoder, which adds a post-processing unit 207 after the synthesized speech. The decoder consists of several units: the coded excitation 201, the long-term prediction 203, the short-term prediction 205 and the post-processing 207. Except for the post-processing unit, every unit has the same definition as in the encoder of Fig. 1. The post-processing unit consists of short-term post-processing and long-term post-processing.

Fig. 3 shows the basic CELP encoder. Its only difference from Fig. 1 is that long-term prediction is implemented with an adaptive codebook 307 containing the past synthesized excitation 304. The pitch period of the speech is used to generate the corresponding adaptive excitation component, which is multiplied by a gain G_p (305), also called the pitch gain. The two gain-scaled excitation components are added together before passing through the short-term prediction filter 303. The two gains (G_p 305 and G_c 306) must then be sent to the decoder. The excitation component from the adaptive codebook 307 and the excitation component from the fixed codebook 308 are added together to produce the total excitation e(n).
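One common way to build the adaptive codebook contribution from the past excitation 304 is sketched below. It is a simplified illustration rather than the patent's implementation: the lag is an integer number of samples (practical codecs often use fractional lags with interpolation), and when the lag is shorter than the subframe the copied segment is repeated periodically. The function name is hypothetical.

```python
def adaptive_codebook_vector(past_excitation, pitch, subframe_len):
    """Return e_p(n) for one subframe by copying the excitation one pitch lag back.

    When pitch < subframe_len the freshly copied samples are reused, so the
    segment repeats periodically across the subframe.
    """
    if pitch <= 0 or pitch > len(past_excitation):
        raise ValueError("pitch must lie within the stored past excitation")
    buf = list(past_excitation)
    start = len(buf)
    for n in range(subframe_len):
        buf.append(buf[start + n - pitch])
    return buf[start:]
```

Because every sample of e_p(n) is read from earlier excitation, any corruption of that buffer at the decoder is copied forward, which is why the size of the gain applied to this vector matters after a packet loss.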

Fig. 4 shows the basic CELP decoder, which corresponds exactly to the encoder of Fig. 3 but adds a post-processing unit 408 after the synthesized speech 407. Apart from the adaptive codebook, this decoder is similar to that of Fig. 2. It also consists of several units, including the coded excitation 402, the adaptive codebook 401, the short-term prediction 406 and the post-processing 408. Except for the post-processing unit, each unit has the same definition as in the encoder of Fig. 3.

If the previous bitstream packet is lost while the pitch gain G_p is high, the incorrect estimate of the past synthesized excitation causes error propagation that lasts a long time; the propagation does not stop immediately even after the decoder receives correct bitstream packets again. The error propagation arises partly because the phase relationship between e_p(n) and e_c(n) is changed by the loss of the previous bitstream packet. A simple solution is to remove the pitch correlation between frames completely, in other words to set the gain G_p to zero in the encoder. Although this solves the error propagation problem, it also sacrifices quality when no packet is lost; the lost quality can only be recovered at a higher bit rate. A compromise solution is presented and explained below.
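A simplified model makes this persistence concrete. The assumptions here are illustrative and do not come from the patent: after the loss the decoder again receives the correct gains, lag and fixed codebook contribution; the pitch lag T (in samples) and the gain G_p stay constant; e_p(n) is taken as e(n - T); \tilde{e}(n) is the decoder excitation and d(n) its mismatch against the encoder excitation e(n).

```latex
\begin{aligned}
e(n) &= G_p\, e(n-T) + G_c\, e_c(n), \\
\tilde{e}(n) &= G_p\, \tilde{e}(n-T) + G_c\, e_c(n), \\
d(n) &= \tilde{e}(n) - e(n) = G_p\, d(n-T), \\
|d(n + kT)| &= G_p^{\,k}\, |d(n)|, \qquad k = 1, 2, \dots
\end{aligned}
```

Under this model a mismatch shrinks only by a factor G_p per pitch cycle, so with G_p close to 1 it is still present many cycles later, whereas G_p = 0 removes it within one cycle at the cost of giving up the prediction entirely.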

Summary of the invention

The object of the invention is to overcome the above shortcomings of the prior art by providing a speech coding system that improves speech packet loss repair quality, using a one-time long-term prediction technique to improve the quality of packet loss compensation.

The speech coding method of the invention for improving speech packet loss repair quality mainly uses the one-time long-term prediction technique to reduce the error propagation caused by voice packet loss. This is achieved by suitably limiting the maximum pitch gain (G_p) of the first pitch period in a frame, where a frame is assumed to contain a plurality of pitch periods.

The gain of the first pitch period is limited to a suitable value less than 1. To compensate for this lower pitch gain, the size of the excitation codebook is suitably increased, or one more stage of coded excitation is added, for the first pitch period.

The maximum pitch gain set for the first pitch period is about 0.5.

The pitch periods other than the first pitch period keep the conventional pitch gain and excitation codebook size.

The suitable maximum pitch gain for the first pitch period is applied to strongly voiced speech.

The speech codec system encodes and decodes a speech signal. The speech signal is divided into many frames, each frame containing a plurality of pitch periods. Each frame is assigned to one of several classes according to whether the first pitch period in the frame occupies one subframe or several subframes.

When a subframe covers the first pitch period, its pitch gain is limited to a suitable value less than 1. To compensate for this lower pitch gain, the size of the excitation codebook is suitably increased, or one more stage of coded excitation is added, for that subframe.

When a subframe covers the first pitch period, its maximum pitch gain is limited to about 0.5.

The subframes other than those covering the first pitch period keep the conventional pitch gain and excitation codebook size.

The maximum pitch gain is limited for strongly voiced speech.

By using the one-time long-term prediction technique, mainly by suitably limiting the maximum pitch gain (G_p) of the first pitch period in a frame, the speech coding system of the invention effectively reduces the error propagation caused by voice packet loss. It has the advantages of an ingenious design, a sound method and a marked improvement in fidelity.

Description of drawings

Fig. 1 is a block diagram of the initial CELP speech encoder;

Fig. 2 is a block diagram of the initial CELP speech decoder;

Fig. 3 is a block diagram of the basic CELP speech encoder;

Fig. 4 is a block diagram of the basic CELP speech decoder;

Fig. 5 is an example in which a pitch period 503 is shorter than the subframe length 502;

Fig. 6 is an example in which a pitch period 603 is longer than the subframe length 602 but shorter than half of the frame length.

Specific embodiments

The invention is further described below with reference to the accompanying drawings:

The following description contains details related to the code-excited linear prediction (CELP) technique. A person skilled in the art will recognize that the method can also be practiced in various other speech coding algorithms and is not limited to the application discussed here. In addition, to highlight the features of the invention, some details that are common knowledge in the art are not discussed.

The accompanying drawings and their description are only examples of the invention. For brevity, other embodiments that use the principles of the invention are not described in detail or illustrated individually.

Fig. 3 shows an example encoder that can illustrate the invention. Referring to Figs. 3 and 4, long-term prediction plays an important role in coding voiced speech because voiced speech is strongly periodic. Adjacent pitch periods are very similar, which makes the pitch gain G_p 305 in the following excitation expression numerically very high:

e(n) = G_p \cdot e_p(n) + G_c \cdot e_c(n) \qquad (4)

In the expression above, e_p(n) is one subframe of samples indexed by n, taken from the adaptive codebook 307 that contains the past excitation 304; e_c(n) comes from the coded excitation codebook 308 (also called the fixed codebook), which contributes to the current excitation. For voiced speech, the contribution of e_p(n) is dominant and the pitch gain G_p 305 is close to 1. The excitation is usually updated once per subframe. A typical frame length is 20 milliseconds and a typical subframe length is 5 milliseconds.
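Equation (4) and the frame sizes quoted above translate into a few lines of code. The sketch below assumes the 8 kHz sampling rate mentioned in the background section; the helper names and the vectors used to exercise them are hypothetical.

```python
SAMPLE_RATE_HZ = 8000  # assumption: one of the two rates named in the text
FRAME_MS = 20          # typical frame length from the text
SUBFRAME_MS = 5        # typical subframe length from the text


def samples(duration_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of samples in `duration_ms` milliseconds."""
    return duration_ms * sample_rate_hz // 1000


def total_excitation(e_p, e_c, g_p, g_c):
    """e(n) = G_p * e_p(n) + G_c * e_c(n) for one subframe."""
    if len(e_p) != len(e_c):
        raise ValueError("both excitation components must cover the same subframe")
    return [g_p * p + g_c * c for p, c in zip(e_p, e_c)]
```

With these constants a frame holds 160 samples and a subframe 40, so the excitation is rebuilt four times per frame.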

For most voiced speech, a frame contains two or more pitch periods. Fig. 5 gives an example in which a pitch period 503 in a frame 501 is shorter than the subframe length 502; Fig. 6 gives an example in which a pitch period 603 is longer than the subframe length 602 but shorter than half of the frame length 601. If the speech is strongly voiced, a compromise that avoids the error propagation caused by packet loss while still benefiting from long-term prediction is to limit the maximum pitch gain of the first pitch period of each frame. The speech signal can be classified and each class treated differently. In the following example, active speech is divided into four classes:

Class 1: (strongly voiced) and (pitch period <= subframe length). For this frame, the pitch gain of the first subframe is limited to a value much smaller than 1 (such as 0.5). For the first subframe, the coded excitation codebook should be larger than that of the other subframes in the same frame; alternatively, one more stage of coded excitation can be added in the first subframe to compensate for its lower pitch gain. For the other subframes, the conventional CELP algorithm is used. Because this is a strongly voiced frame, the pitch period and the pitch gain are stable within the frame, so they can be coded efficiently with fewer bits.

Class 2: (strongly voiced) and (pitch period > subframe length and pitch period <= half frame length). For this frame, the pitch gain of the first two subframes (half a frame) is limited to a value much smaller than 1 (such as 0.5). For these two subframes, the coded excitation codebook should be larger than that of the other subframes in the same frame; alternatively, one more stage of coded excitation can be added in these two subframes to compensate for their lower pitch gain. For the other subframes, the conventional CELP algorithm is used. Because this is a strongly voiced frame, the pitch period and the pitch gain are stable within the frame, so they can be coded efficiently with fewer bits.

Class 3: (strongly voiced) and (pitch period > half frame length). When the pitch period is long, the error propagation caused by long-term prediction is less significant than when the pitch period is short. For this class of frame, the pitch gain of the subframes covering the first pitch period can be limited to a value less than 1; the coded excitation codebook can be larger than the conventional size, or one more stage of coded excitation can be added to compensate for the lower pitch gain. Because a long pitch period causes less error propagation and long pitch periods occur less often, the conventional CELP algorithm can also be used for all subframes of the frame. Because this is a strongly voiced frame, the pitch period and the pitch gain are stable within the frame, so they can also be coded efficiently with fewer bits.

Class 4: all cases other than classes 1, 2 and 3. The conventional CELP algorithm is used.
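The four classes above can be sketched as a small decision function. This is an illustration under stated assumptions, not the patent's implementation: the pitch period and lengths are in samples, the strongly-voiced decision is supplied by the caller (the text does not say how it is made), the cap of 0.5 is the example value from the text, and for class 3 the sketch takes the optional limiting branch although the text also allows conventional CELP for every subframe. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FramePlan:
    frame_class: int        # 1..4, numbered as in the text
    limited_subframes: int  # leading subframes whose pitch gain is capped


def classify_frame(strongly_voiced, pitch, subframe_len, frame_len):
    """Pick the class and the number of leading subframes to cap."""
    if not strongly_voiced:
        return FramePlan(4, 0)
    if pitch <= subframe_len:
        return FramePlan(1, 1)
    if pitch <= frame_len // 2:
        return FramePlan(2, 2)
    # Class 3: cap the subframes that cover the first pitch period
    # (ceiling division), never more than the frame contains.
    covering = -(-pitch // subframe_len)
    return FramePlan(3, min(covering, frame_len // subframe_len))


def limit_pitch_gains(gains, plan, cap=0.5):
    """Cap the pitch gain of the leading subframes; leave the rest untouched."""
    return [min(g, cap) if i < plan.limited_subframes else g
            for i, g in enumerate(gains)]
```

For a 160-sample frame with 40-sample subframes, a strongly voiced frame with a 60-sample pitch period falls into class 2, so only the first two of its four per-subframe gains are capped. The compensation by a larger codebook or an extra excitation stage is outside the scope of this sketch.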

The class numbering above can be changed without affecting the result. For example, (strongly voiced) and (pitch period <= subframe length) could be defined as class 2 instead of class 1, (strongly voiced) and (pitch period > subframe length and pitch period <= half frame length) could be defined as class 3 instead of class 2, and so on.

In general, the error propagation caused by voice packet loss can be reduced by suitably reducing the correlation between pitch periods at the boundary between two frames, while keeping the significant contribution of long-term prediction.
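The effect of weakening the correlation only at frame boundaries can be explored with a toy simulation. The model is deliberately simplified and its assumptions are not from the patent: the excitation follows e(n) = g(n) * e(n - pitch) + innovation(n) with an integer lag, the gain is capped sample by sample over the first pitch period of each frame (which coincides with the first subframe when the pitch period equals the subframe length), and the decoder differs from the encoder only in its stored past excitation. All names are hypothetical.

```python
def synthesize(history, innovation, gains, pitch):
    """e(n) = gains[n] * e(n - pitch) + innovation[n], continuing from `history`."""
    buf = list(history)
    start = len(buf)
    for n, (c, g) in enumerate(zip(innovation, gains)):
        buf.append(g * buf[start + n - pitch] + c)
    return buf[start:]


def boundary_limited_gains(n_frames, frame_len, pitch, normal_gain, first_cycle_gain):
    """Per-sample gains: capped over the first pitch period of every frame."""
    gains = []
    for _ in range(n_frames):
        gains += [first_cycle_gain] * pitch + [normal_gain] * (frame_len - pitch)
    return gains


def mismatch(history_ok, history_bad, innovation, gains, pitch):
    """Absolute difference between runs that differ only in past excitation."""
    good = synthesize(history_ok, innovation, gains, pitch)
    bad = synthesize(history_bad, innovation, gains, pitch)
    return [abs(x - y) for x, y in zip(good, bad)]
```

With 160-sample frames, a 40-sample pitch period, a normal gain of 1.0 and a first-cycle gain of 0.5, a unit mismatch in the stored excitation is halved at every frame boundary, while with the gain left at 1.0 throughout it never decays. Inside each frame the later pitch periods still run at the normal gain, which is how the contribution of the prediction is kept.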

The invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The examples described are illustrative and not restrictive. The scope of the invention is therefore indicated by the appended claims rather than by the foregoing description. All changes within the meaning and range of equivalency of the claims are embraced within their scope.

Claims

1. A speech coding method for improving speech packet loss repair quality, the method comprising:

using an adaptive excitation component, which equals a pitch gain G_p multiplied by e_p(n), where e_p(n) is taken from an adaptive codebook containing the past excitation;

using a coded excitation component;

adding the adaptive excitation component and the coded excitation component to produce an excitation signal;

in the encoder, using the conventional CELP algorithm to determine a conventional value of the pitch gain for each subframe of a speech signal frame;

for a strongly voiced frame, limiting the pitch gain of the subframe covering the first pitch period of each frame to less than 1; and for the other subframes, keeping the conventional value of the pitch gain.

2. The method according to claim 1, characterized in that limiting the pitch gain of the subframe covering the first pitch period of each frame to less than 1 comprises:

limiting the pitch gain of the subframe covering the first pitch period of each frame to 0.5.

3. A speech coding method for improving speech packet loss repair quality, characterized in that the method comprises:

performing code-excited linear prediction (CELP) coding on a speech signal to obtain speech signal frames, the speech signal frames including a strongly voiced frame;

for the strongly voiced frame among the speech signal frames, limiting the maximum pitch gain of the subframe of the first pitch period in the strongly voiced frame;

for the other subframes, keeping the pitch gain obtained from the CELP coding.

4. The method according to claim 3, characterized in that limiting the maximum pitch gain of the first pitch period in the strongly voiced frame comprises:

limiting the pitch gain of the subframe of the first pitch period in the strongly voiced frame to less than 1.

5. The method according to claim 4, characterized in that limiting the maximum pitch gain of the subframe of the first pitch period in the strongly voiced frame to less than 1 comprises:

limiting the pitch gain of the subframe of the first pitch period in the strongly voiced frame to 0.5.

6. A speech coding method for improving speech packet loss repair quality, characterized in that a speech codec system encodes and decodes a speech signal; the speech signal is divided into many frames, each frame containing a plurality of pitch periods;

the method comprising:

performing code-excited linear prediction (CELP) coding on the speech signal to obtain speech signal frames, wherein the CELP coding comprises using an adaptive excitation component obtained by multiplying a pitch gain by an adaptive codebook vector containing the past excitation, using a coded excitation component, and adding the adaptive excitation component and the coded excitation component to produce an excitation signal;

determining, according to whether the first pitch period in the speech signal frame occupies one subframe or a plurality of subframes, whether to limit the pitch gain of the first subframe or of the first two subframes of the speech signal frame;

for the subframes of the speech signal frame other than the first subframe, or other than the first two subframes, keeping the pitch gain obtained from the CELP coding.

7. The method according to claim 6, characterized in that determining, according to whether the first pitch period in the speech signal frame occupies one subframe or a plurality of subframes, whether to limit the pitch gain of the first subframe or of the first two subframes of the speech signal frame comprises:

when the first subframe or the first two subframes of the speech signal frame cover the first pitch period, limiting the pitch gain of the first subframe or of the first two subframes to less than 1.

8. The method according to claim 7, characterized in that limiting the pitch gain of the first subframe or of the first two subframes to less than 1 comprises:

limiting the pitch gain of the first subframe or of the first two subframes to 0.5.

9. A speech coding method for improving speech packet loss repair quality, characterized in that a one-time long-term prediction technique is used to reduce the error propagation caused by voice packet loss, which is achieved by limiting the maximum pitch gain (G_p) of the first pitch period in a frame, the frame containing a plurality of pitch periods.

10. The speech coding system for improving speech packet loss repair quality according to claim 9, characterized in that the gain of the first pitch period is limited to less than 1; to compensate for this lower pitch gain, the size of the excitation codebook is suitably increased, or one more stage of coded excitation is added, for the first pitch period.

11. The speech coding system for improving speech packet loss repair quality according to claim 10, characterized in that the maximum pitch gain set for the first pitch period is 0.5.