WO2013016986A1

WO2013016986A1 - Compensation method and device for frame loss after voiced initial frame

Info

Publication number: WO2013016986A1
Application number: PCT/CN2012/077356
Authority: WO
Inventors: 关旭; 袁浩; 彭科; 黎家力
Original assignee: 中兴通讯股份有限公司
Priority date: 2011-07-31
Filing date: 2012-06-21
Publication date: 2013-02-07
Also published as: CN102915737B; CN102915737A

Abstract

A compensation method for frame loss after a voiced initial frame, comprising: if a first frame following a voiced initial frame is lost after the voiced initial frame is correctly received (101), choosing a fundamental tone delay inference method according to a stability condition of the voiced initial frame to infer a fundamental tone delay of the first lost frame (102); inferring an adaptive codebook gain of the first lost frame according to an adaptive codebook gain of one or more subframes received before the first lost frame, or inferring an adaptive codebook gain of the first lost frame according to an energy change of a time domain voice signal of the voiced initial frame (103); and compensating the first lost frame according to the inferred fundamental tone delay and adaptive codebook gain (104).

Description

Method and device for compensating frame loss after start frame of voiced sound

Technical field

The present invention relates to the field of speech codec technology, and in particular to a method and apparatus for compensating for frame loss after a voiced start frame.

Background technique

When voice frames are transmitted over a channel such as a wireless link or an IP network, frames may be lost at the receiver owing to the many factors involved in transmission, and the quality of the speech synthesized at the receiving end degrades seriously. The purpose of frame loss compensation is to reduce the degradation in speech quality caused by frame loss and thereby improve the listener's subjective experience.

CELP (Code Excited Linear Prediction) speech codecs are widely used in practical communication systems because they provide good speech quality at low and medium bit rates. A CELP codec is a prediction-based speech codec: the currently coded speech frame depends not only on the data of the current frame but also on the historical state of the codec, i.e. there is strong inter-frame correlation. Consequently, when any speech frame is lost, not only can the current frame not be synthesized correctly, but the error also propagates into subsequent frames, seriously degrading the quality of the synthesized speech. Providing a high-quality frame loss compensation method is therefore particularly important.

To improve the quality of frame loss compensation, one approach is to send additional "side information" from the encoder, which the decoder uses to recover lost speech frames; this approach obviously increases the bit rate and also introduces additional codec delay. Another approach is to classify the time-domain speech signal obtained by decoding each received frame into types such as unvoiced frames, unvoiced transition frames, voiced transition frames, voiced frames and voiced start frames, and to select a frame loss compensation method according to the type of the frames adjacent to and preceding the lost frame. However, a frame lost after a voiced start frame is usually compensated in much the same way as a frame lost after a voiced frame, so the quality of the compensated speech cannot be guaranteed when the loss occurs immediately after a voiced start frame.

Summary of the invention

The technical problem to be solved by the present invention is to provide a method and device for compensating for frame loss after a voiced start frame, so that a frame lost after a voiced start frame is compensated well and without added delay.

To solve the above technical problem, the present invention provides a method for compensating for a frame loss after a voiced start frame, the method comprising:

The voiced start frame is correctly received; when the first frame immediately following the voiced start frame is lost, a pitch delay inference method is selected according to the stability condition of the voiced start frame, and the pitch delay of the first lost frame is inferred with it; the adaptive codebook gain of the first lost frame is inferred either from the adaptive codebook gains of one or more subframes received before the first lost frame, or from the energy variation of the time-domain speech signal of the voiced start frame; and the first lost frame is compensated according to the inferred pitch delay and adaptive codebook gain.

To solve the above technical problem, the present invention further provides a device for compensating for frame loss after a voiced start frame, the device comprising a first pitch delay compensation module, a first adaptive codebook gain compensation module and a first compensation module, wherein:

The first pitch delay compensation module is configured to: when the voiced start frame is correctly received and the first frame immediately following it is lost, select a pitch delay inference method according to the stability condition of the voiced start frame and infer the pitch delay of the first lost frame with it;

The first adaptive codebook gain compensation module is configured to: infer the adaptive codebook gain of the first lost frame from the adaptive codebook gains of one or more subframes received before the first lost frame, or from the energy variation of the time-domain speech signal of the voiced start frame;

The first compensation module is configured to: compensate the first lost frame according to the inferred pitch delay and adaptive codebook gain.

Another technical problem to be solved by the present invention is to provide a method and apparatus for compensating for frame loss after a voiced start frame that reduce the error propagation caused by frame loss and control the energy of the synthesized speech.

To solve the above technical problem, the present invention provides a method for compensating for frame loss after a voiced start frame, the method comprising:

The voiced start frame is correctly received; when one or more frames immediately following the voiced start frame are lost, the pitch delay and the adaptive codebook gain of each lost frame are inferred, and the lost frame is compensated according to the inferred pitch delay and adaptive codebook gain; for the first correctly received frame after the voiced start frame, the adaptive codebook gain decoded for each subframe of that frame is multiplied by the second scale factor of the subframe to obtain a new adaptive codebook gain for the subframe, and the new adaptive codebook gain, instead of the decoded one, is used in speech synthesis.

To solve the above technical problem, the present invention further provides a device for compensating for frame loss after a voiced start frame, the device comprising a compensation module and an adaptive codebook gain adjustment module, wherein:

The compensation module is configured to: when the voiced start frame is correctly received and one or more frames immediately following it are lost, infer the pitch delay and the adaptive codebook gain of each lost frame, and compensate the lost frame according to the inferred pitch delay and adaptive codebook gain;

The adaptive codebook gain adjustment module is configured to: for the first correctly received frame after the voiced start frame, multiply the adaptive codebook gain decoded for each subframe of that frame by the second scale factor of the subframe to obtain a new adaptive codebook gain for the subframe, and use the new adaptive codebook gain, instead of the decoded one, in speech synthesis.

Brief description of the drawings

FIG. 1 is a flowchart of Embodiment 1 of the present invention;

FIG. 2 is a flowchart of a specific method of step 102 in Embodiment 1 of the present invention;

FIG. 3 is a flowchart of a specific method of step 103 in Embodiment 1 of the present invention;

FIG. 4 is a flowchart of Embodiment 3 of the present invention;

FIG. 5 is a flowchart of a second scale factor calculation method in Embodiment 4 of the present invention;

FIG. 6 is a schematic structural view of a compensation device in Embodiment 5 of the present invention;

FIG. 7 is a schematic structural view of a compensation device in Embodiment 6 of the present invention;

FIG. 8 is a schematic structural view of a compensation device in Embodiment 7 of the present invention;

FIG. 9 is a schematic structural view of a compensation device in Embodiment 8 of the present invention;

FIG. 10 is a schematic structural view of a second scale factor calculation module in Embodiment 8 of the present invention.

Preferred embodiment of the invention

Embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be noted that, provided there is no conflict, the embodiments in the present application and the features in the embodiments may be combined with each other arbitrarily. The following embodiments address the case where the voiced start frame is received normally and the frame or frames immediately following it are lost.

Example 1

This embodiment describes a method for compensating for the loss of the first frame immediately following a voiced start frame. As shown in FIG. 1, it includes the following steps:

Step 101: The voiced start frame is correctly received; determine whether the first frame immediately following the voiced start frame (hereinafter, the first lost frame) is lost. If it is lost, perform step 102; otherwise the process ends;

Step 102: Select a pitch delay inference method according to the stability condition of the voiced start frame, and infer the pitch delay of the first lost frame with it;

Specifically: if the voiced start frame meets the stability condition, the pitch delay of the first lost frame is inferred as follows: the integer part $T_{-1}$ of the pitch delay of the last subframe of the voiced start frame is used as the pitch delay of each subframe of the first lost frame;

If the voiced start frame does not meet the stability condition, the pitch delay of the first lost frame is inferred as follows: the integer part $T_{-1}$ of the pitch delay of the last subframe of the voiced start frame is corrected with a first correction amount to obtain a first correction value, and the first correction value is used as the pitch delay of each subframe of the first lost frame.

When the pitch delay so obtained is not an integer, the first correction value is preferably made an integer by rounding, which may be rounding up, rounding down or rounding to the nearest integer.

The first correction amount is obtained as follows: taking the subframe immediately before the first lost frame (the last subframe of the voiced start frame) as the reference, the pitch delay multiples of the two or more subframes before the first lost frame are eliminated; a correction factor for the pitch delay is determined from the pitch delays of those subframes after multiple elimination; a first scale factor for the pitch delay is determined from the correction factor and $T_{-1}$; and the first correction amount is the product of the correction factor and the first scale factor, the first scale factor representing the reliability of the correction factor. Specifically, the correction factor $f_m$ is the standard deviation of the integer parts of the pitch delays of the two or more subframes before the first lost frame after multiple elimination. The first scale factor is 1 minus the ratio of the correction factor to the integer part of the pitch delay of the last subframe of the voiced start frame, i.e. $f_s = 1 - f_m / T_{-1}$. In other embodiments, the first scale factor may take other values, such as a constant in $[0, 1]$.

Preferably, whether the voiced start frame meets the stability condition is determined as follows: a voiced start frame that satisfies any one of the following conditions meets the stability condition, and a voiced start frame that satisfies none of them does not:

The pitch-synchronous normalized autocorrelation coefficient of the voiced start frame is greater than a first threshold;

The adaptive codebook gain of the last subframe of the voiced start frame is greater than a second threshold $G_1$, and the adaptive codebook gain of the penultimate subframe of the voiced start frame is greater than a third threshold $G_2$;

The integer part of the pitch delay of the last subframe of the voiced start frame is equal to the integer part of the pitch delay of the penultimate subframe.

Step 102 of this embodiment is described in detail below, taking as an example a speech stream with a frame length of 20 ms, each frame divided into four subframes of 5 ms, and a sampling rate of 16 kHz; the method applies equally to other frame lengths and sampling rates. As shown in FIG. 2, it includes the following steps:

Step 102a: Determine whether the voiced start frame meets any of the following stability conditions. If so, perform step 102b; if none of the conditions is met, perform step 102c;

• The pitch-synchronous normalized autocorrelation coefficient $R_T$ of the voiced start frame is greater than the first threshold $R$;

where $0 \le R \le 1$; preferably, $R > 0.5$.

For any frame, the pitch-synchronous normalized correlation $R_T$ is the normalized autocorrelation coefficient of the last two consecutive pitch periods of the frame; it indicates how similar those two pitch periods are. It can be calculated as follows:

$$R_T = \begin{cases} C_N(T), & T > N \\ 0.5\,C_N(T) + 0.5\,C_N(2T), & T \le N \end{cases}$$

where $N$ is the subframe length and $T$ takes the following value:

$$T = \begin{cases} \mathrm{round}(T_3), & T_3 \le 3N/2 \\ \mathrm{round}(T_2 + T_3), & T_3 > 3N/2 \end{cases}$$

$\mathrm{round}(\cdot)$ denotes rounding to the nearest integer, and $T_2$ and $T_3$ denote the pitch delays of the 3rd and 4th subframes of the frame. $C_N(kT)$, $k = 1, 2$, in the above formula is calculated as follows:

$$C_N(kT) = \frac{\sum_{i=L-kT}^{L-1-(k-1)T} s(i)\,s(i-T)}{\sqrt{\sum_{i=L-kT}^{L-1-(k-1)T} s^2(i)\,\sum_{i=L-kT}^{L-1-(k-1)T} s^2(i-T)}}$$

where $L$ is the frame length and $s(i)$, $i = 0, \ldots, L-1$, is the time-domain speech signal of the frame synthesized by the decoder.

• $g_{p,-1}$ is greater than the second threshold $G_1$ and $g_{p,-2}$ is greater than the third threshold $G_2$;

where $g_{p,-1}$ and $g_{p,-2}$ are the adaptive codebook gains of the 4th subframe (the last subframe) and the 3rd subframe (the penultimate subframe) of the voiced start frame, respectively, and both thresholds are non-negative and less than 1.

• $T_{-1}$ is equal to $T_{-2}$;

where $T_{-1}$ and $T_{-2}$ are the integer parts of the pitch delays of the 4th subframe and the 3rd subframe of the voiced start frame, respectively.
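The stability test of step 102a can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the signal `s` is assumed to carry enough decoded history before the frame for the lagged samples to exist, the zero-energy guard is an addition, and the threshold values passed in the usage note are placeholders rather than values given in the text.

```python
import math


def norm_corr(s, T, k):
    """C_N(kT): normalized correlation of the k-th pitch period from the end.

    `s` is the decoded signal ending at the last sample of the frame; it may
    include earlier history so that s[i - T] exists for every index used.
    """
    end = len(s) - (k - 1) * T      # one past the upper summation limit
    start = end - T                 # lower summation limit
    if start - T < 0:
        raise ValueError("not enough history for this pitch delay")
    num = sum(s[i] * s[i - T] for i in range(start, end))
    e_cur = sum(s[i] ** 2 for i in range(start, end))
    e_lag = sum(s[i - T] ** 2 for i in range(start, end))
    den = math.sqrt(e_cur * e_lag)
    return num / den if den > 0.0 else 0.0   # zero-energy guard is an addition


def pitch_sync_corr(s, T, N):
    """R_T: one period if T > N, otherwise the mean over the last two periods."""
    if T > N:
        return norm_corr(s, T, 1)
    return 0.5 * norm_corr(s, T, 1) + 0.5 * norm_corr(s, T, 2)


def is_stable(r_t, g_last, g_prev, t_last, t_prev, r_thr, g1_thr, g2_thr):
    """True if any one of the three stability conditions holds."""
    return (r_t > r_thr
            or (g_last > g1_thr and g_prev > g2_thr)
            or t_last == t_prev)
```

For a perfectly periodic signal with a period of 40 samples, `pitch_sync_corr(s, 40, 80)` is close to 1, so `is_stable` returns true through the first condition for any threshold below 1.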

Step 102b: If the voiced start frame meets any of the above stability conditions, the integer part $T_{-1}$ of the pitch delay of the last subframe of the voiced start frame (the 4th subframe in this embodiment) is used as the pitch delay of each subframe of the first lost frame, and the process ends;

Step 102c: If the voiced start frame meets none of the above stability conditions, pitch delay multiples are eliminated from the integer parts $T_{-M}, \ldots, T_{-1}$ of the pitch delays of the $M$ (e.g. $M = 4$) subframes before the current lost frame; that is, taking the last subframe before the current lost frame as the reference, the pitch delay multiples of the two or more subframes before the current lost frame are eliminated as follows:

First set $T'_{-1} = T_{-1}$, where $T'_i$ denotes the pitch delay after multiple elimination; then, for $i$ from $-2$ to $-M$:

If $T_i$ is less than or equal to $T_{-1}$, $T'_i$ takes whichever of $T_i$ and $2T_i$ is closer to $T_{-1}$, i.e. the one whose difference from $T_{-1}$ has the smaller absolute value: if $|T_i - T_{-1}|$ is the smaller of $|T_i - T_{-1}|$ and $|2T_i - T_{-1}|$, then $T'_i = T_i$; if $|2T_i - T_{-1}|$ is the smaller, then $T'_i = 2T_i$;

If instead $T_i$ is greater than $T_{-1}$, $T'_i$ takes whichever of $T_i$ and $T_i/2$ is closer to $T_{-1}$: if $|T_i - T_{-1}|$ is the smaller of $|T_i - T_{-1}|$ and $|T_i/2 - T_{-1}|$, then $T'_i = T_i$; if $|T_i/2 - T_{-1}|$ is the smaller, then $T'_i = T_i/2$.
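The multiple-elimination rule of step 102c (each earlier pitch delay is kept, doubled or halved, whichever lands closest to the pitch delay of the last subframe) can be sketched as below. The function name and the list layout (oldest subframe first, reference subframe last) are assumptions made for this illustration; ties are resolved in favour of the unchanged value.

```python
def remove_multiples(lags):
    """Eliminate pitch-delay multiples relative to the last entry.

    `lags` holds the integer pitch delays of the M most recent subframes,
    oldest first, so lags[-1] is the reference (last subframe before the loss).
    """
    ref = lags[-1]
    cleaned = []
    for t in lags[:-1]:
        # A shorter delay may be a half-period error, a longer one a doubling.
        candidates = (t, 2 * t) if t <= ref else (t, t / 2)
        # min() keeps the first candidate on ties, i.e. the unchanged delay.
        cleaned.append(min(candidates, key=lambda c: abs(c - ref)))
    cleaned.append(ref)             # the reference itself is left untouched
    return cleaned


print(remove_multiples([50, 100, 26, 51]))
```

In the example call, 100 is halved and 26 is doubled because both results lie one sample from the reference delay 51, while 50 is already close and is kept.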

Step 102d: Determine the correction factor $f_m$ of the pitch delay and the first scale factor $f_s$, and take the first correction amount as the product of the first scale factor and the correction factor, i.e. $f_s \cdot f_m$. The correction factor $f_m$ is the standard deviation of $T'_{-M}, \ldots, T'_{-1}$, and the first scale factor $f_s$ represents the credibility of the correction factor, with the value:

$$f_s = 1 - \frac{f_m}{T_{-1}}$$

In the above, $T'_i$ is the value calculated in step 102c.

Step 102e: Using the integer part $T_{-1}$ of the pitch delay of the last subframe of the voiced start frame (the 4th subframe in this embodiment) as the base value of the pitch delay of each subframe of the first lost frame, apply the first correction to this base value with the correction factor and the first scale factor to obtain the first correction value $T_c = T_{-1} + f_s \cdot f_m$, which is used as the pitch delay of each subframe of the first lost frame.

When $T_{-1}$ is corrected with the first correction amount, the resulting first correction value $T_c$ must be kept within the allowed range of the pitch delay. Finally, $T_c$ is made an integer by rounding (in this embodiment, rounding to the nearest integer). In other embodiments, if the pitch delay obtained is already an integer, the rounding may be omitted.
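Steps 102d and 102e amount to a few lines of arithmetic, sketched below. Several details are assumptions rather than statements of the text: the standard deviation is taken as the population standard deviation, the allowed pitch delay range `t_min`..`t_max` is supplied by the caller (the values 34 and 231 in the usage lines are placeholders), and rounding to nearest is done half-up.

```python
import math
import statistics


def first_correction(cleaned_lags, t_min, t_max):
    """Return the rounded first correction value T_c.

    `cleaned_lags` are the pitch delays after multiple elimination, oldest
    first; the last entry is the integer pitch delay of the last subframe.
    """
    t_last = cleaned_lags[-1]
    f_m = statistics.pstdev(cleaned_lags)   # correction factor (assumed population std)
    f_s = 1.0 - f_m / t_last                # first scale factor: credibility of f_m
    t_c = t_last + f_s * f_m                # base value plus first correction amount
    t_c = min(max(t_c, t_min), t_max)       # keep inside the allowed pitch delay range
    return math.floor(t_c + 0.5)            # round to nearest, halves upward


print(first_correction([50, 50, 52, 51], 34, 231))
print(first_correction([60, 60, 60, 60], 34, 231))
```

When the recent pitch delays agree exactly, the standard deviation is zero and the last pitch delay is returned unchanged; the more they scatter, the larger the correction, damped by the scale factor.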

Step 103: Infer the adaptive codebook gain of the first lost frame from the adaptive codebook gains of the subframes received before the first lost frame (the number of subframes being an integer greater than or equal to 1), or from the energy variation of the time-domain speech signal of the voiced start frame, where that signal is the one synthesized by the decoder;

Specifically: if the following condition one is satisfied: the difference between the logarithmic energy within the pitch period of the voiced start frame and the long-term logarithmic energy within the pitch period is less than a fourth threshold $E_{thr}$, then the attenuated median of the adaptive codebook gains of the subframes received before the first lost frame is used as the inferred adaptive codebook gain $g_p$ of each subframe of the first lost frame, the attenuation coefficient being a constant in $[0, 1]$;

If condition one is not satisfied but the following condition two is satisfied: the adaptive codebook gain $g_{p,-1}$ of the last subframe of the voiced start frame lies within a predetermined range, then the attenuated $g_{p,-1}$ is used as the inferred adaptive codebook gain of each subframe of the first lost frame, the attenuation coefficient being a constant in $[0, 1]$;

If neither condition is satisfied, the energy ratios $R_{LT}$ and $R_{ST}$ are calculated, and the attenuated weighted average of $R_{LT}$ and $R_{ST}$ is used as the inferred adaptive codebook gain of each subframe of the first lost frame. Here $R_{LT}$ is the ratio of the energy of the time-domain speech signal of the voiced start frame synthesized by the decoder excluding its first pitch period to the energy of that signal excluding its last pitch period, and $R_{ST}$ is the ratio of the energy of the last pitch period of that signal to the energy of the pitch period preceding the last one. The pitch period is limited here: the pitch delay must not exceed half the frame length $L$, i.e. $T_{-1}$ is taken as $L/2$ when $T_{-1}$ is greater than $L/2$.

When the current frame is lost, the historical excitation signal is extended periodically, with the pitch delay obtained in step 102 as the period, to obtain the adaptive codebook excitation; the product of the adaptive codebook gain obtained in step 103 and the adaptive codebook excitation is used as the periodic part of the excitation signal of the current subframe of the current lost frame in speech synthesis.
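The periodic extension of the historical excitation described above can be illustrated as follows. This is a minimal sketch with an integer pitch delay and no interpolation; the function name and argument layout are hypothetical, and a real decoder would operate on its own excitation buffer.

```python
def periodic_excitation(past_exc, lag, gain, n):
    """Return n samples of the periodic part of the excitation.

    Each new sample repeats the sample `lag` positions earlier, so the past
    excitation is continued with period `lag`, then scaled by `gain`.
    """
    if lag <= 0 or lag > len(past_exc):
        raise ValueError("pitch delay must lie within the stored history")
    buf = list(past_exc)
    for _ in range(n):
        buf.append(buf[-lag])       # copy from one pitch period back
    extension = buf[len(past_exc):]
    return [gain * v for v in extension]


print(periodic_excitation([1.0, 2.0, 3.0, 4.0], lag=3, gain=0.5, n=5))
```

Because the buffer grows as samples are appended, delays shorter than the subframe length are handled naturally: the extension starts repeating its own output once it runs past the stored history.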

Step 103 is described in detail below, again taking as an example a speech stream with a frame length of 20 ms, each frame divided into four subframes of 5 ms, and a sampling rate of 16 kHz; the method applies equally to other frame lengths and sampling rates. As shown in FIG. 3, it includes the following steps:

• For the first subframe of the current lost frame:

Step 103a: If the following condition one is satisfied: the difference $dE_t$ between the logarithmic energy within the pitch period of the frame preceding the current lost frame (in this embodiment, the voiced start frame) and the long-term logarithmic energy within the pitch period is less than the fourth threshold $E_{thr}$ (usually $E_{thr}$ is negative), then the attenuated median of the adaptive codebook gains $g_{p,-M}, \ldots, g_{p,-1}$ of the $M$ subframes before the current lost frame (e.g. $M = 5$) is used as the inferred adaptive codebook gain of the first subframe of the current lost frame:

$$g_p = \alpha_p(n) \cdot \mathrm{median}(g_{p,-M}, \ldots, g_{p,-1})$$

and $g_p$ is limited to a suitable range, for example $[0.5, 0.95]$: if $g_p < 0.5$, take $g_p = 0.5$; if $g_p > 0.95$, take $g_p = 0.95$.

In the above formula, $n$ is the index of the current frame within the run of consecutively lost frames; for the first lost frame after a correctly received frame, $n = 1$. $\alpha_p(n)$ is the corresponding attenuation coefficient, with the following values:

$$\alpha_p(n) = \begin{cases} 1.0, & n = 1 \\ 0.95, & n = 2 \\ 0.6, & n \ge 3 \end{cases}$$

and $\mathrm{median}(\cdot)$ denotes taking the median.

For any frame, $dE_t$ is defined as the difference between the logarithmic energy within the pitch period and the long-term logarithmic energy within the pitch period, i.e.:

$$dE_t = E_t - \bar{E}_t$$

where $E_t$ is the logarithmic energy within the pitch period:

$$E_t = 10 \log_{10}\!\left(\frac{1}{T'} \sum_{i=0}^{T'-1} s^2(L - T' + i)\right)$$

where $L$ is the frame length and $T'$ is the pitch delay, with the value:

$$T' = \begin{cases} \mathrm{round}(0.5\,T_2 + 0.5\,T_3), & \mathrm{round}(0.5\,T_2 + 0.5\,T_3) \ge N \\ 2 \cdot \mathrm{round}(0.5\,T_2 + 0.5\,T_3), & \mathrm{round}(0.5\,T_2 + 0.5\,T_3) < N \end{cases}$$

$\bar{E}_t$ is the long-term logarithmic energy within the pitch period; it is updated when the frame type is a voiced frame (VOICED), as follows: $\bar{E}_t = 0.99\,\bar{E}_t + 0.01\,E_t$.
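The quantities behind condition one (the logarithmic energy within the pitch period, its long-term average, and their difference) can be sketched as below. The sketch assumes the energy is the mean squared sample value over the last pitch period of the frame, expressed in dB; the small floor inside the logarithm and all function names are additions for illustration only.

```python
import math


def energy_lag(t2, t3, n_sub):
    """Pitch delay used for the energy: rounded mean of the last two subframe
    delays, doubled when it is shorter than the subframe length n_sub."""
    t = math.floor(0.5 * t2 + 0.5 * t3 + 0.5)   # round to nearest, halves upward
    return t if t >= n_sub else 2 * t


def log_energy(s, lag):
    """Logarithmic energy (dB) over the last `lag` samples of the frame s."""
    frame_len = len(s)
    mean_sq = sum(s[frame_len - lag + i] ** 2 for i in range(lag)) / lag
    return 10.0 * math.log10(max(mean_sq, 1e-12))   # floor avoids log10(0)


def update_long_term(e_long, e_t):
    """Slow running average, applied only for frames classified as voiced."""
    return 0.99 * e_long + 0.01 * e_t


frame = [2.0] * 320                      # 20 ms at 16 kHz, constant amplitude
lag = energy_lag(40, 42, 80)             # short delay, so it is doubled
d_e = log_energy(frame, lag) - 0.0       # difference against a long-term value of 0 dB
print(lag, round(d_e, 3))
```

A sharply negative difference means the frame before the loss was much quieter than the running voiced-speech level, which is the situation condition one is meant to detect.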

Step 103b: If condition one in step 103a is not satisfied but the following condition two is satisfied: the adaptive codebook gain $g_{p,-1}$ of the subframe preceding the current lost frame (i.e. the last subframe of the voiced start frame) lies within a suitable range, for example $g_{p,-1}$ is between 0.8 and 1.1, then $g_{p,-1}$ is suitably attenuated to obtain the adaptive codebook gain $g_p$ of the first subframe of the current lost frame:

$$g_p = \alpha_p(n) \cdot g_{p,-1} \qquad (1)$$

where $\alpha_p(n)$ is the attenuation coefficient.

Step 103c: If neither of the conditions in steps 103a and 103b is satisfied, the adaptive codebook gain of the current lost frame is inferred from the energy variation of the time-domain speech signal of the voiced start frame synthesized by the decoder, as follows:

First, the energy ratios $R_{LT}$ and $R_{ST}$ are calculated. $R_{LT}$ is the ratio of the energy of the time-domain speech signal of the voiced start frame synthesized by the decoder excluding its first pitch period to the energy of that signal excluding its last pitch period; $R_{ST}$ is the ratio of the energy of the last pitch period of that signal to the energy of the pitch period preceding the last one. The pitch period is limited here to no more than $L/2$, i.e. $T_{-1}$ is taken as $L/2$ when $T_{-1}$ is greater than $L/2$. Here $L$ is the frame length, and $s(i)$, $i = 0, \ldots, L-1$, is the time-domain speech signal of the voiced start frame synthesized by the decoder;

Then the energy ratios $R_{LT}$ and $R_{ST}$ are combined in a weighted average and suitably attenuated, where $\alpha_p(n)$ is the attenuation coefficient:

$$g_p = \alpha_p(n) \cdot (0.5\,R_{LT} + 0.5\,R_{ST}) \qquad (2)$$
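The energy-based inference can be sketched as follows. The text defines the two ratios in words only, so the sketch assumes they are plain ratios of summed squared samples (no square root); that choice, the division guard and the function names are assumptions of this illustration, not the formulas of the original.

```python
def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)


def energy_ratios(s, lag):
    """Return (R_LT, R_ST) for the decoded voiced start frame `s`.

    R_LT: energy without the first pitch period over energy without the last.
    R_ST: energy of the last pitch period over that of the period before it.
    The pitch period is capped at half the frame length.
    """
    frame_len = len(s)
    lag = min(lag, frame_len // 2)
    tiny = 1e-12                                  # guard against division by zero
    r_lt = energy(s[lag:]) / max(energy(s[:frame_len - lag]), tiny)
    r_st = (energy(s[frame_len - lag:])
            / max(energy(s[frame_len - 2 * lag:frame_len - lag]), tiny))
    return r_lt, r_st


def gain_from_energy(s, lag, alpha):
    """Attenuated equal-weight average of the two ratios, as in formula (2)."""
    r_lt, r_st = energy_ratios(s, lag)
    return alpha * (0.5 * r_lt + 0.5 * r_st)


frame = [1.0] * 160 + [2.0] * 160                 # energy rises in the second half
print(energy_ratios(frame, 80), gain_from_energy(frame, 80, 1.0))
```

A frame whose energy is still rising, which is typical of a voiced onset, yields ratios above 1; the limiting applied in the subsequent step keeps the resulting gain in a usable range.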

Step 103d: The value obtained from formula (1) or (2) is used as the inferred adaptive codebook gain of the first subframe of the current lost frame, and this inferred value $g_p$ is then limited as follows:

If $g_p$ is greater than an upper threshold, for example 1, $g_p$ is set to that upper threshold; if $g_p$ is less than a lower threshold, for example 0.7, $g_p$ is set to that lower threshold; if $T_{-1}$ is equal to the first correction value $T_c$ inferred in step 102 ($T_c$ after rounding) and $g_p$ is greater than another upper threshold, for example 0.95, $g_p$ is set to that other upper threshold;

• For the subframes of the current lost frame other than the first, perform step 103e: the adaptive codebook gain inferred for the first subframe of the current lost frame is used directly as the inferred adaptive codebook gain of the subframe.
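Steps 103a to 103e form a three-way cascade followed by limiting, which the sketch below puts together using the example ranges quoted in the text (0.5 to 0.95 for the median branch, 0.8 to 1.1 for condition two, 0.7 to 1 and the 0.95 cap for the final limiting). The function signature is hypothetical, the energy difference and energy ratios are taken as precomputed inputs, and the value of the energy threshold passed in is a placeholder since the text only says it is usually negative.

```python
import statistics


def attenuation(n):
    """Attenuation coefficient for the n-th consecutive lost frame."""
    if n == 1:
        return 1.0
    return 0.95 if n == 2 else 0.6


def clamp(x, lo, hi):
    return min(max(x, lo), hi)


def infer_gains(n, d_e, e_thr, past_gains, r_lt, r_st, t_last, t_c, n_sub=4):
    """Return the inferred adaptive codebook gain for each subframe.

    past_gains: gains of the subframes before the loss, oldest first.
    t_last, t_c: last received integer pitch delay and the inferred delay.
    """
    a = attenuation(n)
    if d_e < e_thr:                                   # condition one (103a)
        g = clamp(a * statistics.median(past_gains), 0.5, 0.95)
    else:
        if 0.8 <= past_gains[-1] <= 1.1:              # condition two (103b)
            g = a * past_gains[-1]
        else:                                         # energy variation (103c)
            g = a * (0.5 * r_lt + 0.5 * r_st)
        g = clamp(g, 0.7, 1.0)                        # limiting (103d)
        if t_last == t_c and g > 0.95:
            g = 0.95
    return [g] * n_sub                                # same gain reused (103e)


print(infer_gains(1, -5.0, -3.0, [0.2, 0.4, 0.6, 0.8, 1.0], 1.0, 1.0, 50, 52))
```

The final limiting is applied only to the results of formulas (1) and (2), as in the text; the median branch carries its own range.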

Step 104: Compensate the first lost frame according to the inferred pitch delay and the adaptive codebook gain, that is, use the inferred pitch delay and the adaptive codebook gain to participate in the speech synthesis of the first lost frame.

The specific compensation method can be implemented by using the prior art, and will not be described in detail herein.

Example 2

The present embodiment describes a method of compensating for the first frame loss immediately after the voiced start frame, which is different from Embodiment 1 in that the second correction processing is added.

Step 201 is the same as step 101 in Embodiment 1;

Step 202: The main difference between this step and step 102 is that, when the voiced start frame does not meet the stability condition, after $T_{-1}$ (the integer part of the pitch delay of the last subframe of the voiced start frame) has been corrected with the first correction amount, the corrected value undergoes a second correction process, and the result of the second correction process is used as the final inferred pitch delay of each subframe of the first lost frame.

Specifically, the second correction process is as follows. If both of the following conditions are satisfied, $T_{-1}$ is taken as the intermediate value of the pitch delay. Condition 1: the absolute value of the difference between the corrected value $T_c = T_{-1} + f_s \cdot f_m$ (where $f_s \cdot f_m$ is the first correction amount) and $T_{-1}$ is greater than a fifth threshold $T_{thr1}$. Condition 2: the absolute value of the difference between $T_{-1}$ and the pitch delay $T_{-2}$ of the penultimate subframe of the voiced start frame is less than a sixth threshold $T_{thr2}$, where $0 < T_{thr2} < T_{thr1}$. If the two conditions are not both satisfied, the sum of $T_{-1}$ and the smaller of the fifth threshold and the first correction amount is taken as the intermediate value of the pitch delay. If the intermediate value is greater than $x$ times ($x > 1$, preferably $x = 1.7$) the pitch delay of the most recently correctly received voiced frame with a stable pitch delay, the intermediate value multiplied by 2 is the result of the second correction process; otherwise the intermediate value itself is the result. Preferably, a multiplier flag is set valid (e.g. 1) when the intermediate value is greater than $x$ times the pitch delay of the most recently correctly received voiced frame with a stable pitch delay, and invalid (e.g. 0) otherwise.

Step 203: The main difference between this step and step 103 is that condition one becomes: the difference between the logarithmic energy within the pitch period of the voiced start frame and the long-term logarithmic energy within the pitch period is less than the fourth threshold $E_{thr}$, or the multiplier flag set during pitch delay inference is valid (e.g. 1). The processing when condition one is satisfied, when condition one is not satisfied but condition two is, and when neither condition is satisfied is the same as in step 103.

Step 204 is the same as step 104 in Embodiment 1.

Step 202 of this embodiment is described in detail below, taking as an example a speech stream with a frame length of 20 ms, each frame divided into four subframes of 5 ms, and a sampling rate of 16 kHz; the method applies equally to other frame lengths and sampling rates.

Step 202a: Determine whether the voiced start frame meets any of the following stability conditions. If so, perform step 202b; if none of the conditions is met, perform step 202c;

• The pitch-synchronous normalized autocorrelation coefficient $R_T$ of the voiced start frame is greater than the first threshold $R$;

where $0 \le R \le 1$; preferably, $R > 0.5$.

For any frame, the pitch-synchronous normalized correlation $R_T$ is the normalized autocorrelation coefficient of the last two consecutive pitch periods of the frame; it indicates how similar those two pitch periods are. For the specific calculation, refer to step 102a; details are not repeated here.

• The adaptive codebook gain $g_{p,-1}$ of the last subframe of the voiced start frame is greater than the second threshold $G_1$, and the adaptive codebook gain $g_{p,-2}$ of the penultimate subframe is greater than the third threshold $G_2$;

• The integer part $T_{-1}$ of the pitch delay of the last subframe of the voiced start frame is equal to the integer part $T_{-2}$ of the pitch delay of the penultimate subframe;

Step 202b: If the voiced start frame meets any of the above stability conditions, the integer part $T_{-1}$ of the pitch delay of the last subframe of the voiced start frame (the 4th subframe in this embodiment) is used as the pitch delay of each subframe of the first lost frame, and the process ends;

Step 202c: If the voiced start frame meets none of the above stability conditions, pitch delay multiples are eliminated from the integer parts $T_{-M}, \ldots, T_{-1}$ of the pitch delays of the $M$ (e.g. $M = 4$) subframes before the current lost frame; that is, taking the last subframe before the current lost frame as the reference, the pitch delay multiples of the other subframes are eliminated as follows:

First set $T'_{-1} = T_{-1}$, where $T'_i$ denotes the pitch delay after multiple elimination; then, for $i$ from $-2$ to $-M$: if $T_i$ is less than or equal to $T_{-1}$, $T'_i$ takes whichever of $T_i$ and $2T_i$ has the smaller absolute difference from $T_{-1}$; if $T_i$ is greater than $T_{-1}$, $T'_i$ takes whichever of $T_i$ and $T_i/2$ has the smaller absolute difference from $T_{-1}$.

Here $M$ is the number of subframes before the first lost frame from which multiples are to be eliminated.

Step 202d: Determine the correction factor $f_m$ of the pitch delay and the first scale factor $f_s$, and take the first correction amount as the product of the first scale factor and the correction factor, i.e. $f_s \cdot f_m$. The correction factor $f_m$ is the standard deviation of $T'_{-M}, \ldots, T'_{-1}$, and the first scale factor $f_s$ represents the credibility of the correction factor, with the value:

$$f_s = 1 - \frac{f_m}{T_{-1}}$$

In the above, $T'_i$ is the value calculated in step 202c.

Step 202e: Using the integer part $T_{-1}$ of the pitch delay of the last subframe of the voiced start frame (the 4th subframe in this embodiment) as the base value of the pitch delay of each subframe of the first lost frame, apply the first correction to this base value with the correction factor and the first scale factor to obtain the first correction value $T_c = T_{-1} + f_s \cdot f_m$;

Step 202f, performing the following second correction processing on the first correction value:

If the absolute value of the difference between $T_c$ and $T_{-1}$ is greater than the fifth threshold $T_{thr1}$, and the absolute value of the difference between $T_{-1}$ and $T_{-2}$ is less than the sixth threshold $T_{thr2}$, then $T_c = T_{-1}$; otherwise (either of the above conditions is not satisfied) $T_c$ is taken as $T_{-1}$ plus the minimum of $f_s \cdot f_m$ and $T_{thr1}$, that is, $T_c = T_{-1} + \min(f_s \cdot f_m, T_{thr1})$; preferably, take the value

The resulting $T_c$ is compared with the pitch delay $T_s$ of the most recently correctly received voiced frame with a stable pitch delay: if $T_c$ is greater than $x$ times $T_s$, preferably $x = 1.7$, the new $T_c = T_c \times 2$ and the multiplier flag is set to 1; otherwise, $T_c$ is not updated and the multiplier flag is set to 0. Here $T_s$ needs to be updated whenever a frame is correctly received, and the update method is as follows:

Let $T_0$, $T_1$, $T_2$ and $T_3$ be the pitch delays of the first, second, third and fourth subframes of the frame respectively. If the currently correctly received frame is a voiced type frame (a voiced transition frame, a voiced frame, or a voiced onset frame) and the frame has a stable pitch period, for example it satisfies the conditions: … is not more than 1.4 times $T_3$, $T_3$ is not more than 1.4 times …, and the absolute value of the difference between $T_0$ and $T_2$ is not more than 10, then $T_s$ is updated to $T_3$; otherwise it is not updated.

Step 202g, using the rounded $T_c$ as the pitch delay of each subframe of the current lost frame, and ensuring that the rounded $T_c$ lies within the allowed range of the pitch delay, that is:

If $T_c > T_{max}$, take $T_c = T_{max}$;

If $T_c < T_{min}$, take $T_c = T_{min}$;

Where $T_{min}$ and $T_{max}$ are the minimum and maximum values allowed for the pitch delay, respectively.
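The pitch delay inference of steps 202c to 202g can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation: the function names, the threshold values `thr5` and `thr6`, the delay range `t_min`/`t_max`, and the population standard deviation (divisor M) are illustrative choices that the text does not fix; only x = 1.7 is taken from the text.

```python
import math

def eliminate_multiples(delays):
    """Step 202c: fold doubled/halved delays back toward the last subframe's delay."""
    ref = delays[-1]
    cleaned = []
    for t in delays[:-1]:
        # Compare T_i with 2*T_i (if T_i <= ref) or with T_i/2 (if T_i > ref).
        candidates = (t, 2 * t) if t <= ref else (t, t / 2)
        cleaned.append(min(candidates, key=lambda c: abs(c - ref)))
    cleaned.append(ref)
    return cleaned

def infer_pitch_delay(delays, t_s, thr5=8, thr6=4, x=1.7, t_min=34, t_max=231):
    """Return (inferred pitch delay, multiplier flag) for the first lost frame.

    delays: integer pitch delays of the M subframes before the lost frame.
    t_s:    pitch delay of the most recent voiced frame with a stable pitch.
    thr5, thr6, t_min, t_max are assumed values (0 < thr6 < thr5).
    """
    t_last, t_prev = delays[-1], delays[-2]
    cleaned = eliminate_multiples(delays)
    mean = sum(cleaned) / len(cleaned)
    # Step 202d: correction factor (standard deviation) and first scale factor.
    f_m = math.sqrt(sum((t - mean) ** 2 for t in cleaned) / len(cleaned))
    f_s = 1 - f_m / t_last
    # Step 202e: first correction value.
    t_c = t_last + f_s * f_m
    # Step 202f: second correction.
    if abs(t_c - t_last) > thr5 and abs(t_last - t_prev) < thr6:
        t_c = t_last
    else:
        t_c = t_last + min(f_s * f_m, thr5)
    doubled = t_c > x * t_s
    if doubled:
        t_c *= 2
    # Step 202g: round and clamp to the allowed range.
    t_c = max(t_min, min(t_max, round(t_c)))
    return t_c, doubled
```

For example, with subframe delays 50, 100, 50, 52 the doubled value 100 is folded back to 50 before the standard deviation is taken, so the outlier does not inflate the correction.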

Example 3

This embodiment describes a compensation method for the case in which two or more frames immediately following a voiced start frame are lost, where the lost frames include a first lost frame and one or more lost frames immediately following the first lost frame. As shown in FIG. 4, the method includes the following steps:

Step 301: Infer the pitch delay and the adaptive codebook gain of the first lost frame by using the method in Embodiment 1 or Embodiment 2;

Step 302: For one or more lost frames immediately following the first lost frame, use the pitch delay of the previous lost frame of the current lost frame as the pitch delay of the currently lost frame;

Step 303: Attenuate and interpolate the inferred value of the adaptive codebook gain of the last subframe of the previous lost frame of the current lost frame, and use the result as the adaptive codebook gain of each subframe in the current lost frame;

Specifically, for the current lost frame, the attenuated adaptive codebook gain of the last subframe of the previous lost frame (which may be the first lost frame or a lost frame after the first lost frame) is taken as the adaptive codebook gain $g_{p,end}$ of the last subframe of the current lost frame. The adaptive codebook gains of the other subframes of the current lost frame are obtained by linear interpolation between the processed $g_{p,end}$ and $g_{p,end}$ itself, where the processing moves $g_{p,end}$ toward 1, for example by taking the arithmetic square root, $g_{p,start} = \sqrt{g_{p,end}}$; a cube root can also be used.

Step 304: Compensate for the lost frame according to the inferred pitch delay and the adaptive codebook gain.

Taking a speech stream with a frame length of 20 ms, four subframes of 5 ms per frame, and a sampling rate of 16 kHz as an example, step 303 is described in detail below. The method is also applicable to other frame lengths and sampling rates.

The adaptive codebook gains of the 4 subframes of the current lost frame are denoted $g_{p,0}$, $g_{p,1}$, $g_{p,2}$, $g_{p,3}$, and the inferred value of the adaptive codebook gain of the last subframe of the previous lost frame of the current lost frame is denoted $g_{p,-1}$. $g_{p,0}$, $g_{p,1}$, $g_{p,2}$ and $g_{p,3}$ are calculated as follows:

First, let $W$ denote the sequence number of the current lost frame among the consecutive lost frames, and let $\alpha_W$ denote the corresponding attenuation coefficient; then

$g_{p,end} = \alpha_W \cdot g_{p,-1}$

Then, calculate the interpolation step size as $g_{p,step} = (g_{p,end} - g_{p,start}) / 4$, where $g_{p,start} = \sqrt{g_{p,end}}$.

Here 4 is the total number of subframes of the current lost frame. In other embodiments, if each frame contains a different number of subframes, that number replaces the "4" in the above formula. Thus, the values of $g_{p,0}$, $g_{p,1}$, $g_{p,2}$ and $g_{p,3}$ are as follows:

$g_{p,0} = g_{p,start} + g_{p,step}$,

$g_{p,1} = g_{p,0} + g_{p,step}$,

$g_{p,2} = g_{p,1} + g_{p,step}$,

$g_{p,3} = g_{p,2} + g_{p,step} = g_{p,end}$
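The attenuate-then-interpolate procedure of step 303 can be illustrated with a short Python sketch. It is a reading of the text under assumptions, not a definitive implementation: the function name, the attenuation value passed in, and the choice to start the interpolation one step after the square-rooted start value (so that the last subframe lands exactly on the attenuated gain) are assumptions.

```python
import math

def interpolate_lost_frame_gains(prev_last_gain, attenuation, num_subframes=4):
    """Adaptive codebook gains for each subframe of the current lost frame.

    prev_last_gain: inferred gain of the last subframe of the previous lost frame.
    attenuation:    attenuation coefficient for this consecutive loss (assumed input).
    """
    # The attenuated previous gain becomes the gain of the last subframe.
    g_end = attenuation * prev_last_gain
    # The square root moves the end gain toward 1 and gives the interpolation start.
    g_start = math.sqrt(g_end)
    step = (g_end - g_start) / num_subframes
    # Linear interpolation: subframe k gets g_start + (k + 1) * step.
    return [g_start + (k + 1) * step for k in range(num_subframes)]
```

With a previous gain of 0.81 and no attenuation, the sketch yields gains that fall linearly from about 0.8775 to 0.81 across the four subframes.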

Example 4

This embodiment describes the recovery processing applied to the first correctly received frame after frame loss following a voiced start frame. This embodiment may be used in combination with Embodiment 1, Embodiment 2 or Embodiment 3, or with an existing technique for compensating frame loss after a voiced start frame. It includes the following steps:

Step 401: The voiced start frame is correctly received. When one or more frames immediately following the voiced start frame are lost, the pitch delay and the adaptive codebook gain of the lost frame are inferred, and the lost frame is compensated according to the inferred pitch delay and adaptive codebook gain;

This step can be implemented by using the method in Embodiment 1, Embodiment 2 or Embodiment 3, or by using a compensation method in the prior art.

Step 402: For the first correctly received frame after the voiced start frame, multiply the adaptive codebook gain $g_p$ obtained by decoding each subframe in the frame by the second scale factor scale_fac to obtain a new adaptive codebook gain of each subframe, $g'_p = scale\_fac \cdot g_p$, and use the new adaptive codebook gain instead of the decoded adaptive codebook gain in speech synthesis.

Speech synthesis with the new adaptive codebook gain in place of the decoded adaptive codebook gain yields the time domain speech signal of the current frame.

The second scale factor scale_fac is used to control the adaptive codebook contribution of the first correctly received frame after the frame loss and the overall energy of the synthesized speech. When the pitch delay used in the compensation jumps relative to the pitch delay used in the current frame, the pitch delay used in the compensation is not very reliable, and the adaptive codebook contribution needs to be appropriately reduced to limit the error propagation caused by an erroneous adaptive codebook. In addition, controlling the second scale factor scale_fac keeps the energy of the first correctly received frame after the frame loss from increasing rapidly. As shown in FIG. 5, in this embodiment, the second scale factor of each subframe is calculated using the following method:

Step a, assigning the second scale factor scale_fac an initial value of 1;

Preferably, a step a1 is further included between steps a and b: if the absolute value of the difference between the inferred value of the pitch delay of the lost frame preceding the current frame and the pitch delay $T_0$ of the first subframe obtained by decoding the current frame is greater than a preset eighth threshold, for example greater than 10, a new second scale factor is recalculated as a linearly increasing function of the pitch-synchronous autocorrelation coefficient $R_T$ of the last correctly received frame before the frame loss, that is, the voiced start frame: $scale\_fac = a \cdot R_T + b$. Usually it is only necessary to take $a > 0$ to ensure that the second scale factor is an increasing function of $R_T$. The range of the new scale_fac can also be limited, for example, when scale_fac is greater than 1, take 1, and when it is less than 0.5, take 0.5.

Step b, multiplying the second scale factor scale_fac (either the initial value from step a or the new second scale factor from step a1) by the adaptive codebook gain $g_p$ obtained by decoding the current subframe, multiplying the result by the adaptive codebook of the current subframe, and using the obtained signal as the excitation signal of the current subframe;

Step c, using the excitation signal to perform speech pre-synthesis, during which the state values of the filters are not updated, and calculating the signal energy $E$ of the current subframe from the pre-synthesized speech signal;

Step d, if the arithmetic square root of the ratio of the signal energy $E$ of the current subframe to the signal energy $E_{-1}$ of the last subframe of the previous frame of the current frame exceeds the seventh threshold $K$ (preferably $1 < K < 1.5$), the second scale factor is updated to $K\sqrt{E_{-1}/E}$ times the current second scale factor: $scale\_fac = K \cdot \sqrt{E_{-1}/E} \cdot scale\_fac$; if not, it is not updated. The energy is calculated as $E = \sum_{i=1}^{N} s^2(i)$, where $N$ is the subframe length and $s(i)$, $i = 1, \ldots, N$, is the pre-synthesized speech signal or the speech signal of the previous frame of the current frame synthesized by the decoder.
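Steps a to d can be sketched in Python as follows. This is an illustrative sketch under assumptions: `presynthesize` is a hypothetical callable standing in for the decoder's synthesis filter run without updating its state, and the default values of `k`, `a` and `b` are placeholders, since the text only requires 1 < K < 1.5 and a > 0.

```python
import math

def subframe_energy(signal):
    """Sum of squared samples over one subframe."""
    return sum(s * s for s in signal)

def second_scale_factor(g_p, adaptive_codebook, presynthesize, prev_energy,
                        k=1.2, pitch_jump=False, r_t=0.0, a=0.5, b=0.5):
    """Compute scale_fac for one subframe of the first good frame after a loss.

    presynthesize: hypothetical function mapping an excitation to speech
                   samples without touching the real filter states.
    prev_energy:   energy of the last subframe of the previous frame.
    pitch_jump:    True if the pitch delay difference exceeds the eighth threshold.
    """
    scale_fac = 1.0                                   # step a
    if pitch_jump:                                    # step a1, limited to [0.5, 1]
        scale_fac = min(1.0, max(0.5, a * r_t + b))
    # Step b: scaled adaptive codebook contribution used as the excitation.
    excitation = [scale_fac * g_p * v for v in adaptive_codebook]
    # Step c: pre-synthesis and energy of the current subframe.
    energy = subframe_energy(presynthesize(excitation))
    # Step d: sqrt(E / E_prev) > K, written without a division by prev_energy.
    if energy > k * k * prev_energy:
        scale_fac *= k * math.sqrt(prev_energy / energy)
    return scale_fac
```

Writing the step d test as `energy > k * k * prev_energy` is equivalent to comparing the square root of the energy ratio with K, and it avoids dividing by a zero previous energy.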

Example 5

This embodiment describes a compensation apparatus for implementing the method of Embodiment 1, the apparatus comprising a first pitch delay compensation module, a first adaptive codebook gain compensation module, and a first compensation module, wherein:

The first pitch delay compensation module is configured to, when the voiced start frame is correctly received and the first frame immediately following the voiced start frame is lost, select the corresponding pitch delay inference method according to the stability condition of the voiced start frame to infer the pitch delay of the first lost frame;

The first adaptive codebook gain compensation module is configured to infer an adaptive codebook gain of the first lost frame according to an adaptive codebook gain of one or more subframes received before the first lost frame, or according to the energy variation of the time domain speech signal of the voiced start frame;

The first compensation module is configured to compensate the first lost frame according to the inferred pitch delay and the adaptive codebook gain.

Preferably, the first pitch delay compensation module is configured to select a corresponding pitch delay inference method according to the stability condition of the voiced start frame and infer the pitch delay of the first lost frame in the following manner:

If the voiced start frame satisfies any of the following conditions, the pitch delay of the first lost frame is inferred by the following pitch delay inference method: using the integer part of the pitch delay of the last subframe of the voiced start frame as the inferred value of the pitch delay of each subframe of the first lost frame;

If the voiced start frame satisfies none of the following conditions, the pitch delay of the first lost frame is inferred by the following pitch delay inference method: the integer part of the pitch delay of the last subframe of the voiced start frame is corrected using the first correction amount to obtain a first correction value, and the first correction value is used as the inferred value of the pitch delay of each subframe of the first lost frame;

The conditions are:

The adaptive codebook gain of the last subframe of the voiced start frame is greater than the second threshold, and the adaptive codebook gain of the second to last subframe of the voiced start frame is greater than the third threshold;

As shown in FIG. 6, the compensation device further includes a first correction amount calculation module, configured to obtain the first correction amount. The first correction amount calculation module may be provided separately, or may be provided in the first pitch delay compensation module. The first correction amount calculation module includes an elimination unit, a correction factor calculation unit, a first scale factor calculation unit, and a first correction amount calculation unit, wherein:

The elimination unit is configured to eliminate the pitch delay multiples of two or more subframes before the first lost frame, with the last subframe before the first lost frame as the reference;

The correction factor calculation unit is configured to determine a correction factor of the pitch delay in the following manner: the correction factor is the standard deviation of the integer parts of the pitch delays of the two or more subframes before the first lost frame after the pitch delay multiples are eliminated;

The first scale factor calculation unit is configured to determine a first scale factor of the pitch delay in the following manner: the first scale factor is 1 minus the ratio of the correction factor to the integer part of the pitch delay of the last subframe of the voiced start frame;

The first correction amount calculation unit is configured to calculate the first correction amount in the following manner: the first correction amount is the product of the correction factor and the first scale factor.
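As a hedged numerical illustration of the units above, take M = 4 multiple-free delays 48, 52, 48, 52, with 52 as the integer part of the pitch delay of the last subframe. The delays are made-up values, the symbols $f_m$ (correction factor), $f_s$ (first scale factor), $T'_i$ and $T_{-1}$ are shorthand, and a population standard deviation with divisor M is assumed, since the text only says "standard deviation".

```latex
\bar{T'} = \tfrac{1}{4}(48 + 52 + 48 + 52) = 50, \qquad
f_m = \sqrt{\tfrac{1}{4}\left(2^2 + 2^2 + 2^2 + 2^2\right)} = 2,
\\
f_s = 1 - \frac{f_m}{T_{-1}} = 1 - \frac{2}{52} \approx 0.962, \qquad
f_s \cdot f_m \approx 1.92
```

Under these assumptions the first correction amount is about 1.92, so the first correction value would be roughly 52 + 1.92 = 53.92 before any second correction or rounding.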

Preferably, the elimination unit is configured to eliminate the pitch delay multiples of the two or more subframes before the first lost frame, with the last subframe before the first lost frame as the reference, in the following manner:

First take $T'_{-1} = T_{-1}$, where $T'_i$ denotes the pitch delay after eliminating the multiple and $T_{-1}$ is the integer part of the pitch delay of the last subframe of the voiced start frame; if $T_i$ is less than or equal to $T_{-1}$, the elimination unit takes as $T'_i$ whichever of $T_i$ and $2T_i$ has the smallest absolute difference from $T_{-1}$; if $T_i$ is greater than $T_{-1}$, the elimination unit takes as $T'_i$ whichever of $T_i$ and $T_i/2$ has the smallest absolute difference from $T_{-1}$; where $i \in [-M, -2]$ and M is the number of subframes before the first lost frame that are subjected to the elimination operation.

Preferably, the first adaptive codebook gain compensation module is configured to infer the adaptive codebook gain of the first lost frame according to the adaptive codebook gain of one or more subframes received before the first lost frame, or according to the energy variation of the time domain speech signal of the voiced start frame, in the following manner:

The first adaptive codebook gain compensation module determines that if the following condition one is satisfied: the difference between the logarithmic energy within the pitch period of the voiced start frame and the long-term logarithmic energy within the pitch period is less than the fourth threshold, then the attenuated median of the adaptive codebook gains of one or more subframes before the first lost frame is used as the inferred value of the adaptive codebook gain of each subframe in the first lost frame;

The first adaptive codebook gain compensation module determines that if condition one is not satisfied but the following condition two is satisfied: the adaptive codebook gain of the last subframe in the voiced start frame is within a predetermined range, then the attenuated value of that gain is used as the inferred value of the adaptive codebook gain of each subframe in the first lost frame;

The first adaptive codebook gain compensation module determines that if neither condition one nor condition two is satisfied, the energy ratios $R_{LT}$ and $R_{ST}$ are calculated, and the weighted average of the attenuated $R_{LT}$ and $R_{ST}$ is used as the inferred value of the adaptive codebook gain of each subframe in the first lost frame; wherein $R_{LT}$ represents the ratio of the energy of the time domain speech signal of the voiced start frame synthesized by the decoder excluding the first pitch period to its energy excluding the last pitch period; $R_{ST}$ represents the ratio of the energy of the last pitch period of the time domain speech signal of the voiced start frame synthesized by the decoder to the energy of the pitch period preceding the last pitch period, the pitch period not exceeding half of the frame length.
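The two energy ratios and the fallback gain described above can be sketched in Python. This is a minimal sketch under assumptions: the function names, the attenuation value and the equal weighting are illustrative, because the text does not give the weights or the attenuation; the signal is assumed to have non-zero energy in each segment.

```python
def energy(samples):
    """Sum of squared samples."""
    return sum(v * v for v in samples)

def energy_ratios(signal, pitch):
    """Return (R_LT, R_ST) for the synthesized voiced start frame.

    signal: time domain samples of the voiced start frame.
    pitch:  pitch period in samples, at most half of the frame length.
    """
    if not 0 < pitch <= len(signal) // 2:
        raise ValueError("pitch period must be positive and at most half the frame")
    # R_LT: energy without the first pitch period over energy without the last one.
    r_lt = energy(signal[pitch:]) / energy(signal[:-pitch])
    # R_ST: energy of the last pitch period over energy of the one before it.
    r_st = energy(signal[-pitch:]) / energy(signal[-2 * pitch:-pitch])
    return r_lt, r_st

def fallback_gain(r_lt, r_st, attenuation=0.9, weight_lt=0.5):
    """Weighted average of the attenuated ratios (weights and attenuation assumed)."""
    return weight_lt * attenuation * r_lt + (1 - weight_lt) * attenuation * r_st
```

A rising onset gives ratios above 1 and a decaying one gives ratios below 1, which is how the energy trend of the voiced start frame carries over into the inferred gain.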

Example 6

This embodiment describes a compensation device for implementing the method of Embodiment 2. As shown in FIG. 7, the device adds a pitch delay compensation correction module to the device of Embodiment 5. After the first correction value is obtained, this module performs a second correction process on the first correction value and uses the corrected result as the inferred value of the pitch delay of each subframe of the first lost frame.

Further, the pitch delay compensation correction module is configured to perform the second correction process on the first correction value in the following manner:

The pitch delay compensation correction module determines that, if both of the following conditions are satisfied, the integer part of the pitch delay of the last subframe of the voiced start frame is taken as the intermediate value of the pitch delay: Condition 1: the absolute value of the difference between the first correction value and the integer part of the pitch delay of the last subframe of the voiced start frame is greater than the fifth threshold; Condition 2: the absolute value of the difference between the integer part of the pitch delay of the last subframe of the voiced start frame and the integer part of the pitch delay of the second-to-last subframe of the voiced start frame is less than the sixth threshold; wherein 0 < the sixth threshold < the fifth threshold. If either of the above conditions is not satisfied, the pitch delay compensation correction module takes, as the intermediate value of the pitch delay, the sum of the integer part of the pitch delay of the last subframe of the voiced start frame and the smaller of the first correction amount and the fifth threshold;

The pitch delay compensation correction module determines that, if the intermediate value of the pitch delay is greater than x times the pitch delay of the most recently correctly received voiced frame with a stable pitch delay, where x > 1, the intermediate value of the pitch delay multiplied by 2 is used as the result of the second correction process, and the multiplier flag is set to valid; if the intermediate value of the pitch delay is not greater than x times the pitch delay of the most recently correctly received voiced frame with a stable pitch delay, the intermediate value of the pitch delay is used as the result of the second correction process, and the multiplier flag is set to invalid.

In this embodiment, the first adaptive codebook gain compensation module is configured to infer the adaptive codebook gain of the first lost frame according to an adaptive codebook gain of one or more subframes received before the first lost frame, or according to the energy variation of the time domain speech signal of the voiced start frame, in the following manner:

The first adaptive codebook gain compensation module determines that if the following condition one is satisfied: the difference between the logarithmic energy within the pitch period of the voiced start frame and the long-term logarithmic energy within the pitch period is less than the fourth threshold, or the multiplier flag set during the pitch delay inference is valid, then the attenuated median of the adaptive codebook gains of one or more subframes before the first lost frame is used as the inferred value of the adaptive codebook gain of each subframe in the first lost frame;

Example 7

This embodiment describes a compensation device for implementing the method of Embodiment 3. As shown in FIG. 8, the device adds a second pitch delay compensation module, a second adaptive codebook gain compensation module and a second compensation module to the device of Embodiment 5 or Embodiment 6, wherein:

The second pitch delay compensation module is configured to, for one or more lost frames immediately following the first lost frame, use the inferred value of the pitch delay of the previous lost frame of the current lost frame as the pitch delay of the current lost frame;

The second adaptive codebook gain compensation module is configured to attenuate and interpolate the inferred value of the adaptive codebook gain of the last subframe of the previous lost frame of the current lost frame, and to use the resulting adaptive codebook gain values as the adaptive codebook gain of each subframe in the current lost frame;

The second compensation module is configured to compensate for the lost frame according to the inferred pitch delay and the adaptive codebook gain.

Preferably, the second adaptive codebook gain compensation module is configured to attenuate and interpolate the inferred value of the adaptive codebook gain of the last subframe of the previous lost frame of the current lost frame, and to use the resulting values as the adaptive codebook gain of each subframe in the current lost frame, in the following manner: the second adaptive codebook gain compensation module takes the attenuated adaptive codebook gain of the last subframe of the previous lost frame of the current lost frame as the adaptive codebook gain $g_{p,end}$ of the last subframe of the current lost frame;

The adaptive codebook gains of the other subframes of the current lost frame are obtained by linear interpolation between the processed $g_{p,end}$ and $g_{p,end}$, where the processing of $g_{p,end}$ is used to bring it closer to 1.

Example 8

This embodiment describes a compensation apparatus for implementing the method of Embodiment 4, as shown in FIG. 9, the apparatus includes a compensation module and an adaptive codebook gain adjustment module, wherein:

The compensation module is configured to, when the voiced start frame is correctly received and one or more frames immediately following the voiced start frame are lost, infer the pitch delay and the adaptive codebook gain of the lost frame, and compensate for the lost frame according to the inferred pitch delay and adaptive codebook gain; the compensation module can be implemented by using a compensation device as described in Embodiment 5, Embodiment 6 or Embodiment 7;

The adaptive codebook gain adjustment module is configured to, for the first correctly received frame after the voiced start frame, multiply the adaptive codebook gain obtained by decoding each subframe in the frame by the second scale factor of that subframe to obtain a new adaptive codebook gain of each subframe, and use the new adaptive codebook gain instead of the decoded adaptive codebook gain in speech synthesis.

Preferably, the compensation device further includes a second scale factor calculation module, configured to calculate the second scale factor of each subframe. The second scale factor calculation module may be provided separately, or may be provided in the adaptive codebook gain adjustment module. As shown in FIG. 10, the second scale factor calculation module includes an excitation signal acquisition unit, a pre-synthesis unit, and a second scale factor generation unit, where:

The excitation signal acquisition unit is configured to multiply the initial value of the second scale factor by the adaptive codebook gain obtained by decoding the current subframe, then multiply the result by the adaptive codebook of the current subframe, and use the obtained signal as the excitation signal of the current subframe;

The pre-synthesis unit is configured to perform speech pre-synthesis using the excitation signal, and calculate the signal energy of the current subframe according to the pre-synthesized speech signal;

The second scale factor generation unit is configured to, when determining that the arithmetic square root of the ratio of the signal energy of the current subframe to the signal energy of the last subframe of the previous frame exceeds the seventh threshold, update the second scale factor to Q times the current second scale factor, where Q is the product of the seventh threshold and the arithmetic square root of the ratio of the signal energy of the last subframe of the previous frame to the signal energy of the current subframe.

Preferably, the excitation signal acquisition unit is further configured to, before multiplying the initial value of the second scale factor by the adaptive codebook gain decoded for the current subframe, determine whether the absolute value of the difference between the inferred value of the pitch delay of the lost frame preceding the current frame and the pitch delay of the first subframe obtained by decoding the current frame is greater than the eighth threshold; if so, a new second scale factor is recalculated as a linearly increasing function of the pitch-synchronous autocorrelation coefficient of the voiced start frame, and the initial value of the second scale factor is replaced with the new second scale factor.

The threshold values used in the embodiments herein are empirical values and can be obtained by simulation.

One of ordinary skill in the art will appreciate that all or a portion of the above steps may be accomplished by a program instructing the associated hardware, and the program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk. Alternatively, all or part of the steps of the above embodiments may also be implemented using one or more integrated circuits. Correspondingly, each module/unit in the above embodiments may be implemented in the form of hardware or in the form of a software function module. The invention is not limited to any specific combination of hardware and software.

Of course, the invention may have various other embodiments. Corresponding changes and modifications may be made without departing from the spirit and essence of the invention, and such changes and modifications are intended to be included within the scope of the appended claims.

Industrial applicability

The embodiments of the present invention take full account of the ways in which a voiced start frame differs from an ordinary voiced frame. For the first lost frame immediately following the voiced start frame, the pitch delay is inferred by different means according to the stability characteristics of the voiced start frame, and the adaptive codebook gain is inferred either from the adaptive codebook gain of one or more subframes received before the first lost frame or from the energy variation of the time domain speech signal of the voiced start frame. This avoids compensating from the information of the frame preceding the lost frame alone, and choosing the compensation mode according to the stability characteristics of the voiced start frame improves the compensated sound quality. For one or more lost frames immediately following the first lost frame, the adaptive codebook gain of the lost frame is obtained by attenuation followed by interpolation, so that the speech energy during the lost frames decreases smoothly. For the first correctly received frame after the lost frames, the adaptive codebook gain is adjusted to reduce the error propagation caused by frame loss and to control the energy of the synthesized speech. In summary, the methods of the embodiments of the present invention can improve the quality of voice calls in an environment with frame loss.

Claims

1. A method for compensating for a frame loss after a voiced start frame, the method comprising:

2. The method of claim 1, wherein

Selecting a corresponding pitch delay inference manner according to the stability condition of the voiced start frame to infer the pitch delay of the first lost frame includes:

If the voiced start frame meets the stability condition, the pitch delay of the first lost frame is inferred by the following pitch delay inference method: using the integer part of the pitch delay of the last subframe of the voiced start frame as the inferred value of the pitch delay of each subframe of the first lost frame;

If the voiced start frame does not meet the stability condition, the pitch delay of the first lost frame is inferred by the following pitch delay inference method: the integer part of the pitch delay of the last subframe of the voiced start frame is corrected using the first correction amount to obtain a first correction value, and the first correction value is used as the inferred value of the pitch delay of each subframe of the first lost frame.

3. The method of claim 2, wherein

The following method is used to determine whether the voiced start frame meets the stability condition:

A voiced start frame that satisfies any of the following conditions meets the stability condition, and a voiced start frame that satisfies none of the following conditions does not meet the stability condition:

4. The method of claim 2, wherein

The first correction amount is obtained by the following method:

Eliminating, with the last subframe before the first lost frame as the reference, the pitch delay multiples of two or more subframes before the first lost frame; determining a correction factor of the pitch delay using the integer parts of the pitch delays of the two or more subframes before the first lost frame after the pitch delay multiples are eliminated; and determining a first scale factor of the pitch delay from the correction factor and the integer part of the pitch delay of the last subframe of the voiced start frame, the first correction amount being the product of the correction factor and the first scale factor.

5. The method of claim 4, wherein

The correction factor is: a standard deviation of a pitch delay integer part of two or more sub-frames before the first lost frame after canceling the pitch delay multiple;

The first scale factor is: 1 minus the ratio of the correction factor to the integer portion of the pitch delay of the last subframe of the voiced start frame.

6. The method according to claim 4 or 5, wherein

Eliminating the pitch delay multiples of the two or more subframes before the first lost frame, with the last subframe before the first lost frame as the reference, includes:

First, take $T'_{-1} = T_{-1}$, where $T'_i$ denotes the pitch delay after eliminating the multiple, and $T_{-1}$ is the integer part of the pitch delay of the last subframe of the voiced start frame;

If $T_i$ is less than or equal to $T_{-1}$, $T'_i$ takes whichever of $T_i$ and $2T_i$ has the smallest absolute difference from $T_{-1}$; if $T_i$ is greater than $T_{-1}$, $T'_i$ takes whichever of $T_i$ and $T_i/2$ has the smallest absolute difference from $T_{-1}$, where $i \in [-M, -2]$ and M is the number of subframes before the first lost frame that are subjected to the elimination operation.

7. The method of claim 2, wherein

Inferring the adaptive codebook gain of the first lost frame according to an adaptive codebook gain of one or more subframes received before the first lost frame, or according to the energy variation of the time domain speech signal of the voiced start frame, includes:

If the following condition one is satisfied: the difference between the logarithmic energy within the pitch period of the voiced start frame and the long-term logarithmic energy within the pitch period is less than the fourth threshold, then the attenuated median of the adaptive codebook gains of one or more subframes before the first lost frame is used as the inferred value of the adaptive codebook gain of each subframe in the first lost frame;

If condition one is not satisfied, but the following condition two is satisfied: the adaptive codebook gain of the last subframe in the voiced start frame is within a predetermined range, then the attenuated value of that gain is used as the inferred value of the adaptive codebook gain of each subframe in the first lost frame;

If neither condition one nor condition two is satisfied, the energy ratios $R_{LT}$ and $R_{ST}$ are calculated, and the weighted average of the attenuated $R_{LT}$ and $R_{ST}$ is used as the inferred value of the adaptive codebook gain of each subframe in the first lost frame; wherein $R_{LT}$ represents the ratio of the energy of the time domain speech signal of the voiced start frame synthesized by the decoder excluding the first pitch period to its energy excluding the last pitch period; $R_{ST}$ represents the ratio of the energy of the last pitch period of the time domain speech signal of the voiced start frame synthesized by the decoder to the energy of the pitch period preceding the last pitch period, the pitch period not exceeding half of the frame length.

8. The method of claim 2, wherein

After the first correction value is obtained, the method further includes:

The first correction value is subjected to a second correction process, and the result of the correction process is used as an inferred value of the pitch delay of each subframe of the first lost frame.

9. The method according to claim 8, wherein the performing the second correction processing on the first correction value comprises:

If both of the following conditions are satisfied, the integer part of the pitch delay of the last subframe of the voiced start frame is taken as the intermediate value of the pitch delay: Condition 1: the absolute value of the difference between the first correction value and the integer part of the pitch delay of the last subframe of the voiced start frame is greater than the fifth threshold; Condition 2: the absolute value of the difference between the integer part of the pitch delay of the last subframe of the voiced start frame and the integer part of the pitch delay of the second-to-last subframe of the voiced start frame is less than the sixth threshold; wherein 0 < the sixth threshold < the fifth threshold; if either of the above conditions is not satisfied, the sum of (a) the smaller of the first correction amount and the fifth threshold and (b) the integer part of the pitch delay of the last subframe of the voiced start frame is taken as the intermediate value of the pitch delay;

If the intermediate value of the pitch delay is greater than x times the pitch delay of the most recently received voiced frame with a stable pitch delay, where x > 1, the intermediate value of the pitch delay multiplied by 2 is used as the result of the second correction process, and the multiplier flag is set to valid; if the intermediate value of the pitch delay is not greater than x times the pitch delay of the most recently received voiced frame with a stable pitch delay, the intermediate value of the pitch delay is used as the result of the second correction process, and the multiplier flag is set to invalid.
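A minimal sketch of the second correction process follows, assuming hypothetical values for the fifth threshold, the sixth threshold and x (the claim only requires 0 < sixth threshold < fifth threshold and x > 1). The function and argument names are illustrative, and the factor of 2 is applied exactly as the claim words it.

```python
def second_correction(first_corrected, first_amount, t_last, t_second_last,
                      t_stable, thr5=10, thr6=3, x=1.7):
    """Return (pitch delay after second correction, multiplier flag).

    first_corrected : first correction value
    first_amount    : first correction amount
    t_last          : integer pitch delay, last subframe of the voiced start frame
    t_second_last   : integer pitch delay, second-to-last subframe
    t_stable        : pitch delay of the most recent voiced frame with stable pitch
    """
    cond1 = abs(first_corrected - t_last) > thr5
    cond2 = abs(t_last - t_second_last) < thr6
    if cond1 and cond2:
        # The correction looks too large for a locally stable pitch.
        intermediate = t_last
    else:
        # Limit the applied correction to the fifth threshold.
        intermediate = t_last + min(first_amount, thr5)
    if intermediate > x * t_stable:
        return 2 * intermediate, True
    return intermediate, False
```

The returned flag corresponds to the multiplier flag that later claims consult when choosing how to infer the adaptive codebook gain.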

10. The method of claim 9, wherein

If the following condition one is satisfied: the difference between the logarithmic energy in the pitch period of the voiced start frame and the long-term logarithmic energy in the pitch period is less than the fourth threshold, or the multiplier flag set in the pitch delay inference is valid, then the value of the median of the adaptive codebook gains of one or more subframes before the first lost frame is used as the inferred value of the adaptive codebook gain of each subframe in the first lost frame;

If condition one is not satisfied, but the following condition two is satisfied: the adaptive codebook gain of the last subframe in the voiced start frame is within a predetermined range, then the attenuated value of that gain is used as the inferred value of the adaptive codebook gain of each subframe in the first lost frame;

If neither condition one nor condition two is satisfied, the energy ratios R_LT and R_ST are calculated, and the weighted average of R_LT and R_ST, after attenuation, is used as the inferred value of the adaptive codebook gain of each subframe in the first lost frame, where R_LT represents the ratio of the energy of the time domain speech signal of the voiced start frame synthesized by the decoder excluding the first pitch period to its energy excluding the last pitch period; R_ST represents the ratio of the energy of the last pitch period of the time domain speech signal of the voiced start frame synthesized by the decoder to the energy of the pitch period preceding the last pitch period, the pitch period not exceeding half of the frame length.

11. The method of claim 1 or 7 or 10, wherein the method further comprises: for one or more lost frames immediately following the first lost frame, using the inferred value of the pitch delay of the lost frame preceding the current lost frame as the pitch delay of the current lost frame; attenuating and interpolating the inferred value of the adaptive codebook gain of the last subframe of the lost frame preceding the current lost frame, and using the resulting adaptive codebook gain values as the adaptive codebook gain of each subframe in the current lost frame; and compensating the lost frame according to the inferred pitch delay and adaptive codebook gain.

12. The method of claim 11 wherein

Attenuating and interpolating the inferred value of the adaptive codebook gain of the last subframe of the lost frame preceding the current lost frame, and using the resulting adaptive codebook gain values as the adaptive codebook gain of each subframe in the current lost frame, includes:

The attenuated adaptive codebook gain of the last subframe of the lost frame preceding the current lost frame is used as the adaptive codebook gain g of the last subframe of the current lost frame; the adaptive codebook gains of the other subframes of the current lost frame are obtained by linear interpolation between the processed g and g, where the processing of g is used to bring g closer to 1.

13. The method of claim 12, wherein

The processed g is the arithmetic square root of g before the processing.
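The attenuation and interpolation for a subsequent lost frame might look like the sketch below. The attenuation factor and the number of subframes are hypothetical, and the claims do not say exactly where the interpolation starts, so the first subframe is assumed here to take the square-rooted gain and the last subframe the attenuated gain itself.

```python
import math


def subsequent_lost_frame_gains(prev_last_gain, n_subframes=4, atten=0.9):
    """Return per-subframe adaptive codebook gains for a later lost frame.

    prev_last_gain : inferred gain of the last subframe of the previous lost frame
    """
    # Gain assigned to the last subframe of the current lost frame.
    g_end = atten * prev_last_gain
    # Square root pulls a non-negative gain toward 1 (the "processed" gain).
    g_start = math.sqrt(g_end)
    if n_subframes == 1:
        return [g_end]
    step = (g_end - g_start) / (n_subframes - 1)
    # Linear interpolation from the processed gain down (or up) to the gain.
    return [g_start + k * step for k in range(n_subframes)]
```

Because the square root of a value in (0, 1) is larger than the value, the gains fall gradually across the frame instead of dropping at the frame boundary.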

14. The method of claim 1, wherein the method further comprises:

For the first correctly received frame after the voiced start frame, the adaptive codebook gain obtained by decoding each subframe in the frame is multiplied by the second scale factor of the subframe to obtain a new adaptive codebook gain for each subframe. The new adaptive codebook gain is used instead of the decoded adaptive codebook gain to participate in speech synthesis.

15. The method of claim 11, wherein the method further comprises:

16. The method of claim 14 or 15, wherein the second scale factor of each subframe is calculated by:

Multiplying the initial value of the second scale factor by the adaptive codebook gain obtained by decoding the current subframe, multiplying the result by the adaptive codebook of the current subframe, and using the obtained signal as the excitation signal of the current subframe;

Performing voice pre-synthesis using the excitation signal, and calculating signal energy of the current subframe according to the pre-synthesized speech signal;

If the arithmetic square root of the ratio of the signal energy of the current subframe to the signal energy of the last subframe of the previous frame exceeds the seventh threshold, the second scale factor is updated to Q times the current second scale factor, where Q is the product of the arithmetic square root and the seventh threshold.

17. The method of claim 16, wherein before multiplying the initial value of the second scale factor by the adaptive codebook gain obtained by decoding the current subframe, the method further includes:

If the absolute value of the difference between the inferred value of the pitch delay of the lost frame preceding the current frame and the pitch delay of the first subframe obtained by decoding the current frame is greater than the eighth threshold, a new second scale factor is recalculated according to a linearly increasing function of the pitch-synchronous autocorrelation coefficient of the voiced start frame, and the new second scale factor replaces the initial value of the second scale factor.

18. A method for compensating a frame after a voiced start frame, the method comprising:

When the voiced start frame is correctly received and one or more frames immediately following the voiced start frame are lost, inferring the pitch delay and the adaptive codebook gain of the lost frame, and compensating the lost frame according to the inferred pitch delay and adaptive codebook gain;

19. The method of claim 18, wherein the second scale factor of each subframe is calculated using the following method:

If the arithmetic square root of the ratio of the signal energy of the current subframe to the signal energy of the last subframe of the previous frame of the current frame exceeds the seventh threshold, the second scale factor is updated to Q times the current second scale factor, where Q is the product of the arithmetic square root and the seventh threshold.

20. The method of claim 19, wherein

Before multiplying the initial value of the second scale factor by the adaptive codebook gain obtained by decoding the current subframe, the method further includes:

If the absolute value of the difference between the inferred value of the pitch delay of the lost frame preceding the current frame and the pitch delay of the first subframe obtained by decoding the current frame is greater than the eighth threshold, a new second scale factor is recalculated according to a linearly increasing function of the pitch-synchronous autocorrelation coefficient of the voiced start frame, and the new second scale factor replaces the initial value of the second scale factor.

21. The method of claim 18 or 19 or 20, wherein

Inferring the pitch delay and the adaptive codebook gain of the lost frame comprises: when the first frame immediately following the voiced start frame is lost, using the method according to any one of claims 1-10 to infer the pitch delay and adaptive codebook gain of the first lost frame immediately following the voiced start frame; or

When the first frame immediately following the voiced start frame is lost and one or more frames immediately following the first lost frame are also lost, using the method according to any one of claims 1-10 to infer the pitch delay and adaptive codebook gain of the first lost frame immediately following the voiced start frame, and using the method of any one of claims 11-17 to infer the pitch delay and adaptive codebook gain of the one or more lost frames immediately following the first lost frame.

22. A compensation device for frame loss after a voiced start frame, the device comprising a first pitch delay compensation module, a first adaptive codebook gain compensation module, and a first compensation module, wherein:

The first adaptive codebook gain compensation module is configured to: infer an adaptive codebook gain of the first lost frame according to an adaptive codebook gain of one or more subframes received before the first lost frame, or according to The energy variation of the time domain speech signal of the voiced start frame infers the adaptive codebook gain of the first lost frame;

The first compensation module is configured to: compensate the first lost frame according to the inferred pitch delay and the adaptive codebook gain.

23. The compensation device according to claim 22, wherein

The first pitch delay compensation module is configured to: infer the pitch delay of the first lost frame by selecting a corresponding pitch delay inference manner according to the stability condition of the voiced start frame, as follows: if the voiced start frame satisfies any of the following conditions, the pitch delay of the first lost frame is inferred by the following pitch delay inference method: using the integer part of the pitch delay of the last subframe of the voiced start frame as the inferred value of the pitch delay of each subframe of the first lost frame; if the voiced start frame satisfies none of the following conditions, the pitch delay of the first lost frame is inferred by the following pitch delay inference method: correcting the integer part of the pitch delay of the last subframe of the voiced start frame using the first correction amount to obtain a first correction value, and using the first correction value as the inferred value of the pitch delay of each subframe of the first lost frame;

The conditions are:

24. The compensation device according to claim 23, wherein

The compensation device further includes a first correction amount calculation module configured to obtain the first correction amount; the first correction amount calculation module includes an elimination unit, a correction factor calculation unit, a first scale factor calculation unit, and a first correction amount calculation unit, wherein:

The elimination unit is configured to: eliminate multiples in the pitch delays of two or more subframes before the first lost frame, taking the pitch delay of the last subframe before the first lost frame as the reference;

The correction factor calculation unit is configured to: determine a correction factor of the pitch delay in the following manner: the correction factor is the standard deviation of the integer parts of the pitch delays of the two or more subframes before the first lost frame after the pitch delay multiples have been eliminated;

The first scale factor calculation unit is configured to: determine a first scale factor of the pitch delay in the following manner: the first scale factor is 1 minus the ratio of the correction factor to the integer part of the pitch delay of the last subframe of the voiced start frame;

The first correction amount calculation unit is configured to: calculate the first correction amount in the following manner: The first correction amount is: a product of the correction factor and the first scale factor.
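The three calculation units of this claim chain together as in the sketch below. It assumes the input delays have already had their multiples eliminated, and it uses the population standard deviation; the claim does not say whether the population or the sample form is intended, and the function name is hypothetical.

```python
from statistics import pstdev


def first_correction_amount(cleaned_delays, t_last):
    """Return the first correction amount for the pitch delay.

    cleaned_delays : integer pitch delays of two or more subframes before
                     the first lost frame, after multiple elimination
    t_last         : integer pitch delay of the last subframe of the
                     voiced start frame
    """
    # Correction factor: spread of the recent pitch delays
    # (population standard deviation is an assumption).
    correction_factor = pstdev(cleaned_delays)
    # First scale factor: 1 minus the correction factor relative to t_last.
    first_scale = 1.0 - correction_factor / t_last
    # First correction amount: product of the two.
    return correction_factor * first_scale
```

With perfectly stable delays the standard deviation is zero, so the correction amount is zero and the last subframe's integer pitch delay is left unchanged.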

25. The compensation device according to claim 24, wherein

The elimination unit is configured to: eliminate the multiples in the pitch delays of the two or more subframes before the first lost frame, taking the pitch delay of the last subframe before the first lost frame as the reference, in the following manner: first take T'_{-1} = T_{-1}, where T'_i denotes the pitch delay after eliminating the multiple and T_{-1} is the integer part of the pitch delay of the last subframe of the voiced start frame; if T_i is less than or equal to T_{-1}, the elimination unit takes the one of T_i and 2*T_i with the smallest absolute value of the difference from T_{-1} as T'_i; if T_i is greater than T_{-1}, the elimination unit takes the one of T_i and T_i/2 with the smallest absolute value of the difference from T_{-1} as T'_i, where i = [-2, -M] and M is the number of subframes before the first lost frame that are subjected to the elimination operation.

26. The compensation device according to claim 23, wherein

The first adaptive codebook gain compensation module is configured to: infer the adaptive codebook gain of the first lost frame according to the adaptive codebook gain of one or more subframes received before the first lost frame, or infer the adaptive codebook gain of the first lost frame according to the energy variation of the time domain speech signal of the voiced start frame, in the following manner:

If it is judged that the following condition one is satisfied: the difference between the logarithmic energy in the pitch period of the voiced start frame and the long-term logarithmic energy in the pitch period is less than the fourth threshold, then the attenuated median of the adaptive codebook gains of one or more subframes before the first lost frame is used as the inferred value of the adaptive codebook gain of each subframe in the first lost frame;

If it is judged that condition one is not satisfied, but the following condition two is satisfied: the adaptive codebook gain of the last subframe in the voiced start frame is within a predetermined range, then the attenuated value of that gain is used as the inferred value of the adaptive codebook gain of each subframe in the first lost frame;

If it is judged that neither condition one nor condition two is satisfied, the energy ratios R_LT and R_ST are calculated, and the weighted average of R_LT and R_ST, after attenuation, is used as the inferred value of the adaptive codebook gain of each subframe in the first lost frame, where R_LT represents the ratio of the energy of the time domain speech signal of the voiced start frame synthesized by the decoder excluding the first pitch period to its energy excluding the last pitch period; R_ST represents the ratio of the energy of the last pitch period of the time domain speech signal of the voiced start frame synthesized by the decoder to the energy of the pitch period preceding the last pitch period, the pitch period not exceeding half of the frame length.

27. The compensation device according to claim 23, wherein

The compensation device further includes a pitch delay compensation correction module configured to: after the first correction value is obtained, perform a second correction process on the first correction value, and use the result of the correction as the final inferred value of the pitch delay of each subframe of the first lost frame.

28. The compensation device according to claim 27, wherein

The pitch delay compensation correction module is configured to: perform the second correction process on the first correction value in the following manner:

If it is judged that both of the following conditions are satisfied, the integer part of the pitch delay of the last subframe of the voiced start frame is taken as the intermediate value of the pitch delay: Condition 1: the absolute value of the difference between the first correction value and the integer part of the pitch delay of the last subframe of the voiced start frame is greater than the fifth threshold; Condition 2: the absolute value of the difference between the integer part of the pitch delay of the last subframe of the voiced start frame and the integer part of the pitch delay of the second-to-last subframe of the voiced start frame is less than the sixth threshold; wherein 0 < the sixth threshold < the fifth threshold; if the pitch delay compensation correction module judges that either of the above conditions is not satisfied, the sum of (a) the smaller of the first correction amount and the fifth threshold and (b) the integer part of the pitch delay of the last subframe of the voiced start frame is taken as the intermediate value of the pitch delay;

If the intermediate value of the pitch delay is greater than x times the pitch delay of the most recently received voiced frame with a stable pitch delay, where x > 1, the intermediate value of the pitch delay multiplied by 2 is used as the result of the second correction process, and the multiplier flag is set to valid; if the intermediate value of the pitch delay is not greater than x times the pitch delay of the most recently received voiced frame with a stable pitch delay, the intermediate value of the pitch delay is used as the result of the second correction process, and the multiplier flag is set to invalid.

29. The compensation device according to claim 28, wherein

If it is judged that the following condition one is satisfied: the difference between the logarithmic energy in the pitch period of the voiced start frame and the long-term logarithmic energy in the pitch period is less than the fourth threshold, or the multiplier flag set in the pitch delay inference is valid, then the value of the median of the adaptive codebook gains of one or more subframes before the first lost frame is used as the inferred value of the adaptive codebook gain of each subframe in the first lost frame;

If condition one is not satisfied, but the following condition two is satisfied: the adaptive codebook gain of the last subframe in the voiced start frame is within a predetermined range, then the attenuated value of that gain is used as the inferred value of the adaptive codebook gain of each subframe in the first lost frame;

30. The compensation device according to claim 22 or 26 or 29, wherein

The compensation device further includes a second pitch delay compensation module, a second adaptive codebook gain compensation module and a second compensation module, wherein:

The second pitch delay compensation module is configured to: for one or more lost frames immediately following the first lost frame, use the inferred value of the pitch delay of the lost frame preceding the current lost frame as the pitch delay of the current lost frame;

The second adaptive codebook gain compensation module is configured to: attenuate and interpolate the inferred value of the adaptive codebook gain of the last subframe of the lost frame preceding the current lost frame, and use the resulting adaptive codebook gain values as the adaptive codebook gain of each subframe in the current lost frame;

The second compensation module is configured to: compensate for the lost frame based on the inferred pitch delay and the adaptive codebook gain.

31. The compensation device according to claim 30, wherein

The second adaptive codebook gain compensation module is configured to: attenuate and interpolate the inferred value of the adaptive codebook gain of the last subframe of the lost frame preceding the current lost frame, and use the resulting adaptive codebook gain values as the adaptive codebook gain of each subframe in the current lost frame, in the following manner:

The attenuated adaptive codebook gain of the last subframe of the lost frame preceding the current lost frame is used as the adaptive codebook gain g of the last subframe of the current lost frame; the adaptive codebook gains of the other subframes of the current lost frame are obtained by linear interpolation between the processed g and g, where the processing of g is used to bring g closer to 1.

32. The compensation device according to claim 31, wherein the processed g is the arithmetic square root of g before the processing.

33. The compensation device according to claim 22, wherein

The compensation device further includes an adaptive codebook gain adjustment module and a third compensation module, wherein: the adaptive codebook gain adjustment module is configured to: for the first correctly received frame after the voiced start frame, multiply the adaptive codebook gain obtained by decoding each subframe in the frame by the second scale factor of the subframe to obtain a new adaptive codebook gain for each subframe;

The third compensation module is configured to: participate in speech synthesis using a new adaptive codebook gain instead of the decoded adaptive codebook gain.

34. The compensation device according to claim 30, wherein

35. The compensation device according to claim 33 or 34, wherein

The compensation device further includes a second scale factor calculation module configured to: calculate a second scale factor of each subframe, including an excitation signal acquisition unit, a pre-synthesis unit, and a second scale factor generation unit, where:

The excitation signal acquiring unit is configured to: multiply the initial value of the second scale factor by the adaptive codebook gain obtained by decoding the current subframe, multiply the result by the adaptive codebook of the current subframe, and use the obtained signal as the excitation signal of the current subframe;

The pre-synthesis unit is configured to: perform voice pre-synthesis using the excitation signal, and calculate a signal energy of the current subframe according to the pre-synthesized speech signal;

The second scale factor generating unit is configured to: when it determines that the arithmetic square root of the ratio of the signal energy of the current subframe to the signal energy of the last subframe of the previous frame exceeds the seventh threshold, update the second scale factor to Q times the current second scale factor, where Q is the product of the arithmetic square root and the seventh threshold.

36. The compensation device according to claim 35, wherein

The excitation signal acquiring unit is further configured to: before multiplying the initial value of the second scale factor by the adaptive codebook gain obtained by decoding the current subframe, when it determines that the absolute value of the difference between the inferred value of the pitch delay of the lost frame preceding the current frame and the pitch delay of the first subframe obtained by decoding the current frame is greater than the eighth threshold, recalculate a new second scale factor according to a linearly increasing function of the pitch-synchronous autocorrelation coefficient of the voiced start frame, and replace the initial value of the second scale factor with the new second scale factor.

37. A compensation device for a frame after a voiced start frame, the device comprising a compensation module and an adaptive codebook gain adjustment module, wherein: the compensation module is configured to: when the voiced start frame is correctly received and one or more frames immediately following the voiced start frame are lost, infer the pitch delay and the adaptive codebook gain of the lost frame, and compensate the lost frame according to the inferred pitch delay and adaptive codebook gain;

The adaptive codebook gain adjustment module is configured to: for the first correctly received frame after the voiced start frame, multiply the adaptive codebook gain obtained by decoding each subframe in the frame by the second scale factor of the subframe to obtain a new adaptive codebook gain for each subframe, and use the new adaptive codebook gain instead of the decoded adaptive codebook gain in speech synthesis.

38. The compensation device according to claim 37, wherein

39. The compensation device according to claim 38, wherein