WO2013016986A1 - Compensation method and device for frame loss after voiced initial frame - Google Patents


Info

Publication number
WO2013016986A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
adaptive codebook
subframe
codebook gain
pitch
Prior art date
Application number
PCT/CN2012/077356
Other languages
French (fr)
Chinese (zh)
Inventor
关旭 (Guan Xu)
袁浩 (Yuan Hao)
彭科 (Peng Ke)
黎家力 (Li Jiali)
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Priority date
Filing date
Publication date
Application filed by ZTE Corporation (中兴通讯股份有限公司)
Publication of WO2013016986A1


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/005 Correction of errors induced by the transmission channel, if related to the coding algorithm
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/09 Long term prediction, i.e. removing periodical redundancies, e.g. by using adaptive codebook or pitch predictor

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A compensation method for frame loss after a voiced initial frame is disclosed. If the voiced initial frame is received correctly and the first frame following it is lost (101), a pitch delay inference method is selected according to a stability condition of the voiced initial frame and used to infer the pitch delay of the first lost frame (102). The adaptive codebook gain of the first lost frame is inferred either from the adaptive codebook gains of one or more subframes received before the first lost frame, or from the energy change of the time-domain speech signal of the voiced initial frame (103). The first lost frame is then compensated according to the inferred pitch delay and adaptive codebook gain (104).

Description

Method and device for compensating for frame loss after a voiced start frame
Technical field
The present invention relates to the field of speech coding and decoding, and in particular to a method and device for compensating for frame loss after a voiced start frame.
Background
Speech frames may be transmitted over a channel such as a wireless link or an IP network. Various complex factors in the transmission process can cause frames to be lost at the receiver, which seriously degrades the quality of the speech synthesized at the receiving end. Frame loss compensation aims to reduce this degradation and so improve the subjective quality perceived by listeners.
CELP (Code Excited Linear Prediction) speech codecs are widely used in practical communication systems because they provide good speech quality at low and medium bit rates. A CELP codec is predictive: decoding the current speech frame depends not only on that frame's data but also on the codec's history state, so there is strong inter-frame correlation. When any speech frame is lost, the current frame cannot be synthesized correctly, and the error also propagates into several subsequent frames, seriously degrading the synthesized speech. A high-quality frame loss compensation method is therefore particularly important.
One way to improve frame loss compensation is for the encoder to send additional "side information", which the decoder uses to recover lost speech frames. This clearly increases the bit rate and introduces additional codec delay. Another approach classifies the time-domain speech signal obtained by decoding each received frame. The classes include unvoiced frames, unvoiced transition frames, voiced transition frames, voiced frames and voiced start frames. A different frame loss compensation method is then selected according to the class of the frame preceding the lost frame. However, a frame lost after a voiced start frame is usually compensated in much the same way as a frame lost after a voiced frame. As a result, the quality of the compensated speech is not guaranteed when the loss occurs right after a voiced start frame.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method and device for compensating for frame loss after a voiced start frame. The compensation should introduce no additional delay and give good quality.
To solve the above technical problem, the present invention provides a method for compensating for frame loss after a voiced start frame, the method comprising:
The voiced start frame is received correctly, and the first frame immediately following it is lost. In that case:
- a pitch delay inference method is selected according to the stability condition of the voiced start frame, and the pitch delay of the first lost frame is inferred;
- the adaptive codebook gain of the first lost frame is inferred, either from the adaptive codebook gains of one or more subframes received before the first lost frame, or from the energy variation of the time-domain speech signal of the voiced start frame;
- the first lost frame is compensated using the inferred pitch delay and adaptive codebook gain.
To solve the above technical problem, the present invention further provides a device for compensating for frame loss after a voiced start frame. The device comprises a first pitch delay compensation module, a first adaptive codebook gain compensation module and a first compensation module, wherein:
the first pitch delay compensation module is configured to: when the voiced start frame has been received correctly and the first frame immediately following it is lost, select a pitch delay inference method according to the stability condition of the voiced start frame and infer the pitch delay of the first lost frame;
the first adaptive codebook gain compensation module is configured to: infer the adaptive codebook gain of the first lost frame, either from the adaptive codebook gains of one or more subframes received before the first lost frame, or from the energy variation of the time-domain speech signal of the voiced start frame;
the first compensation module is configured to: compensate the first lost frame according to the inferred pitch delay and adaptive codebook gain.
Another technical problem to be solved by the present invention is to provide a method and device for compensating for frame loss after a voiced start frame that reduce the error propagation caused by frame loss and control the energy of the synthesized speech.
To solve the above technical problem, the present invention provides a method for compensating the frames following a voiced start frame, the method comprising:
The voiced start frame is received correctly. When one or more frames immediately following it are lost, the pitch delay and adaptive codebook gain of the lost frames are inferred, and the lost frames are compensated using the inferred values.
The first correctly received frame after the voiced start frame is then handled as follows. For each subframe of that frame, the decoded adaptive codebook gain is multiplied by the subframe's second scale factor, giving a new adaptive codebook gain. The new gain is used in speech synthesis in place of the decoded one.
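The per-subframe gain replacement described above can be sketched in Python. This is a minimal illustration, not the patent's implementation. The function name is hypothetical, and the second scale factors are taken as inputs because their calculation (Embodiment 4) is not reproduced in this section.

```python
def rescale_adaptive_gains(decoded_gains, second_scale_factors):
    """Per-subframe gain replacement for the first correctly received
    frame after a voiced start frame.

    decoded_gains:        adaptive codebook gains decoded for each subframe
    second_scale_factors: the matching second scale factors (taken as given)
    Returns the new gains that replace the decoded ones in synthesis.
    """
    if len(decoded_gains) != len(second_scale_factors):
        raise ValueError("need exactly one scale factor per subframe")
    return [g * f for g, f in zip(decoded_gains, second_scale_factors)]


# Example: four subframes with hypothetical scale factors
new_gains = rescale_adaptive_gains([0.9, 0.8, 1.0, 0.5], [1.0, 0.5, 0.5, 2.0])
```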
To solve the above technical problem, the present invention further provides a device for compensating the frames following a voiced start frame. The device comprises a compensation module and an adaptive codebook gain adjustment module, wherein:
the compensation module is configured to: when the voiced start frame has been received correctly and one or more frames immediately following it are lost, infer the pitch delay and adaptive codebook gain of the lost frames, and compensate the lost frames according to the inferred values;
the adaptive codebook gain adjustment module is configured to: for the first correctly received frame after the voiced start frame, multiply the adaptive codebook gain decoded for each subframe of that frame by the subframe's second scale factor to obtain a new adaptive codebook gain for that subframe, and use the new gain in speech synthesis in place of the decoded one.
Brief description of the drawings
Figure 1 is a flowchart of Embodiment 1 of the present invention;
Figure 2 is a flowchart of the specific method of step 102 in Embodiment 1 of the present invention;
Figure 3 is a flowchart of the specific method of step 103 in Embodiment 1 of the present invention;
Figure 4 is a flowchart of Embodiment 3 of the present invention;
Figure 5 is a flowchart of the method for calculating the second scale factor in Embodiment 4 of the present invention;
Figure 6 is a schematic structural diagram of the compensation device in Embodiment 5 of the present invention;
Figure 7 is a schematic structural diagram of the compensation device in Embodiment 6 of the present invention;
Figure 8 is a schematic structural diagram of the compensation device in Embodiment 7 of the present invention;
Figure 9 is a schematic structural diagram of the compensation device in Embodiment 8 of the present invention;
Figure 10 is a schematic structural diagram of the second scale factor calculation module in Embodiment 8 of the present invention.
Preferred embodiments of the invention
Embodiments of the present invention are described in detail below with reference to the accompanying drawings. Where no conflict arises, the embodiments of this application and their features may be combined with one another in any way. The following embodiments cover the case in which the voiced start frame is received normally and the frame immediately following it is lost.
Embodiment 1
This embodiment describes how to compensate when the first frame immediately following a voiced start frame is lost. As shown in Figure 1, the method comprises the following steps:
Step 101: the voiced start frame is received correctly. Determine whether the first frame immediately following it (hereinafter the first lost frame) is lost. If it is, go to step 102; otherwise the process ends.
Step 102: select a pitch delay inference method according to the stability condition of the voiced start frame, and infer the pitch delay of the first lost frame.
Specifically, if the voiced start frame meets the stability condition, the pitch delay of the first lost frame is inferred as follows: the integer part $T_{-1}$ of the pitch delay of the last subframe of the voiced start frame is used as the pitch delay of every subframe in the first lost frame;
If the voiced start frame does not meet the stability condition, the pitch delay of the first lost frame is inferred as follows: the integer part $T_{-1}$ of the pitch delay of the last subframe of the voiced start frame is corrected by a first correction amount to obtain a first corrected value, which is used as the pitch delay of every subframe in the first lost frame.
When the resulting pitch delay is not an integer, the first corrected value is preferably made an integer by rounding. The rounding may be implemented as rounding up, rounding down, or rounding to the nearest integer.
The first correction amount is obtained as follows:
1. Take the subframe immediately before the first lost frame (the last subframe of the voiced start frame) as the reference, and remove pitch-delay multiples from the pitch delays of two or more subframes preceding the first lost frame.
2. From the integer parts of these multiple-free pitch delays, determine a pitch delay correction factor $f_m$.
3. From $f_m$ and $T_{-1}$, determine a first scale factor $f_s$, which expresses how reliable the correction factor is.
4. The first correction amount is the product of the correction factor and the first scale factor.

Specifically, the correction factor is the standard deviation of the integer parts of the multiple-free pitch delays of the two or more subframes before the first lost frame. The first scale factor is 1 minus the ratio of the correction factor to the integer part of the pitch delay of the last subframe of the voiced start frame, $f_s = 1 - f_m / T_{-1}$. In other embodiments the first scale factor may take other values, for example a constant in $[0, 1]$.
Preferably, whether the voiced start frame meets the stability condition is determined as follows. A voiced start frame that satisfies any one of the following conditions meets the stability condition; one that satisfies none of them does not:
the pitch-synchronous autocorrelation coefficient of the voiced start frame is greater than a first threshold $R$;
the adaptive codebook gain of the last subframe of the voiced start frame is greater than a second threshold $G_1$, and the adaptive codebook gain of its second-to-last subframe is greater than a third threshold $G_2$;
the integer parts of the pitch delays of the last and second-to-last subframes of the voiced start frame are equal.
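The three stability conditions above amount to a logical OR, sketched here in Python. The function and parameter names are hypothetical. The threshold values in the usage example are assumptions for illustration only; the text fixes only that $R$ is preferably above 0.5.

```python
def meets_stability_condition(r_t, g_last, g_second_last,
                              t_last, t_second_last,
                              r_thr, g1_thr, g2_thr):
    """Return True if the voiced start frame satisfies at least one of the
    three stability conditions.

    r_t                    pitch-synchronous autocorrelation of the frame
    g_last, g_second_last  adaptive codebook gains of the last two subframes
    t_last, t_second_last  pitch delays of the last two subframes (positive)
    r_thr, g1_thr, g2_thr  first, second and third thresholds (R, G1, G2)
    """
    if r_t > r_thr:                                  # condition 1
        return True
    if g_last > g1_thr and g_second_last > g2_thr:   # condition 2
        return True
    # condition 3: int() truncates, which is the integer part for positive delays
    return int(t_last) == int(t_second_last)


# Usage with hypothetical thresholds R = 0.6, G1 = G2 = 0.8
stable = meets_stability_condition(0.1, 0.9, 0.85, 60, 70, 0.6, 0.8, 0.8)
```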
Step 102 of this embodiment is described in detail below for a speech stream with a 16 kHz sampling rate and a 20 ms frame length, each frame divided into four 5 ms subframes. A frame therefore has 320 samples and a subframe 80 samples. The method applies equally to other frame lengths and sampling rates. As shown in Figure 2, it comprises the following steps:
Step 102a: determine whether the voiced start frame meets any one of the following stability conditions. If it does, go to step 102b; if it meets none of them, go to step 102c.
• The pitch-synchronous autocorrelation coefficient $R_T$ of the voiced start frame is greater than the first threshold $R$,
where $0 \le R \le 1$; preferably $R > 0.5$.
For any frame, the pitch-synchronous normalized correlation $R_T$ is the normalized autocorrelation of the frame's last two consecutive pitch periods. It measures how similar those two periods are, and it can be computed as follows:
$$R_T = \begin{cases} C_N(T), & T > N \\ 0.5\,C_N(T) + 0.5\,C_N(2T), & T \le N \end{cases}$$
Here $N$ is the subframe length, and $T$ takes the following value:
$$T = \begin{cases} \mathrm{round}(T_3), & T_3 \le 3N/2 \\ \mathrm{round}\big((T_2 + T_3)/2\big), & T_3 > 3N/2 \end{cases}$$
$\mathrm{round}(\cdot)$ denotes rounding to the nearest integer, and $T_2$ and $T_3$ are the pitch delays of subframes 3 and 4 of the frame. For $k = 1, 2$, $C_N(kT)$ is computed as
$$C_N(kT) = \frac{\sum_{i=L-kT}^{L-1-(k-1)T} s(i)\,s(i-T)}{\sqrt{\sum_{i=L-kT}^{L-1-(k-1)T} s^2(i)\;\sum_{i=L-kT}^{L-1-(k-1)T} s^2(i-T)}}$$
where $L$ is the frame length and $s(i)$, $i = 0, \ldots, L-1$, is the time-domain speech signal of the frame synthesized by the decoder.
• The adaptive codebook gain $g_{p,-1}$ is greater than the second threshold $G_1$, and $g_{p,-2}$ is greater than the third threshold $G_2$;
where $g_{p,-1}$ and $g_{p,-2}$ are the adaptive codebook gains of subframe 4 (the last subframe) and subframe 3 (the second-to-last subframe) of the voiced start frame, respectively. Both thresholds $G_1$ and $G_2$ lie between 0 and 1.
• $T_{-1}$ is equal to $T_{-2}$;
where $T_{-1}$ and $T_{-2}$ are the integer parts of the pitch delays of subframes 4 and 3 of the voiced start frame, respectively.
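The $R_T$ computation can be sketched in Python for a decoded frame held as a list of samples. This is an illustrative sketch, not the codec's implementation, and the function names are hypothetical. It makes three assumptions:
- `round` is taken as round-half-up;
- the long-lag case uses the rounded mean of $T_2$ and $T_3$;
- a window with zero energy is reported as correlation 0.

```python
import math


def round_half_up(x):
    """Round to the nearest integer, halves upward (for x >= 0)."""
    return int(math.floor(x + 0.5))


def normalized_corr(s, T, k):
    """C_N(kT): normalized correlation of s(i) with s(i - T) over
    i = L - kT, ..., L - 1 - (k - 1)T."""
    L = len(s)
    lo, hi = L - k * T, L - 1 - (k - 1) * T
    if T <= 0 or lo - T < 0:
        raise ValueError("lag too long for this frame")
    idx = range(lo, hi + 1)
    num = sum(s[i] * s[i - T] for i in idx)
    e_cur = sum(s[i] ** 2 for i in idx)
    e_past = sum(s[i - T] ** 2 for i in idx)
    den = math.sqrt(e_cur * e_past)
    return num / den if den > 0 else 0.0  # silent window: no correlation


def pitch_sync_corr(s, t2, t3, N):
    """R_T for one frame; t2, t3 are the pitch delays of subframes 3 and 4."""
    if t3 <= 1.5 * N:
        T = round_half_up(t3)
    else:
        T = round_half_up(0.5 * t2 + 0.5 * t3)  # long-lag case (assumed mean)
    if T > N:
        return normalized_corr(s, T, 1)
    return 0.5 * normalized_corr(s, T, 1) + 0.5 * normalized_corr(s, T, 2)
```

For a perfectly periodic frame, such as a sine whose period equals the pitch delay, $R_T$ evaluates to 1.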
Step 102b: if the voiced start frame meets any of the above stability conditions, use the integer part $T_{-1}$ of the pitch delay of its last subframe (subframe 4 in this embodiment) as the pitch delay of every subframe of the first lost frame, and end.
Step 102c: if the voiced start frame meets none of the above stability conditions, remove pitch-delay multiples from the integer parts $T_{-M}, \ldots, T_{-1}$ of the pitch delays of the $M$ subframes preceding the current lost frame (for example $M = 4$). The last subframe before the current lost frame is the reference:
First set $T'_{-1} = T_{-1}$, where $T'_i$ denotes the pitch delay after removal of multiples. Then, for $i$ from $-2$ down to $-M$:
• If $T_i \le T_{-1}$, take whichever of $T_i$ and $2T_i$ is closer to $T_{-1}$, i.e. the one whose difference from $T_{-1}$ has the smaller absolute value. If $|T_i - T_{-1}|$ is the smaller of $|T_i - T_{-1}|$ and $|2T_i - T_{-1}|$, set $T'_i = T_i$; if $|2T_i - T_{-1}|$ is the smaller, set $T'_i = 2T_i$;
• Conversely, if $T_i > T_{-1}$, take whichever of $T_i$ and $T_i/2$ is closer to $T_{-1}$, i.e. the one whose difference from $T_{-1}$ has the smaller absolute value. If $|T_i - T_{-1}|$ is the smaller of $|T_i - T_{-1}|$ and $|T_i/2 - T_{-1}|$, set $T'_i = T_i$; if $|T_i/2 - T_{-1}|$ is the smaller, set $T'_i = T_i/2$.
Step 102d: determine the pitch delay correction factor $f_m$ and the first scale factor $f_s$; the first correction amount is their product, $f_s f_m$. The correction factor $f_m$ is the standard deviation of $T'_{-M}, \ldots, T'_{-1}$. The first scale factor $f_s$ expresses how reliable the correction factor is and takes the value
$$f_s = 1 - \frac{f_m}{T'_{-1}}$$
where $T'_{-1}$ is the value obtained in step 102c (equal to $T_{-1}$).
Step 102e: use the integer part $T_{-1}$ of the pitch delay of the last subframe of the voiced start frame (subframe 4 in this embodiment) as the base pitch delay for each subframe of the first lost frame. Correct this base value using the correction factor and the first scale factor, giving the first corrected value $T_{-1} + f_s f_m$. This value is used as the pitch delay of every subframe of the first lost frame.
When $T_{-1}$ is corrected by the first correction amount, the resulting first corrected value must lie within the allowed range of pitch delays. Finally, the first corrected value is made an integer by rounding (to the nearest integer in this embodiment). In other embodiments, if the pitch delay obtained is already an integer, the rounding may be skipped.
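Steps 102d and 102e can be combined into one Python function, sketched below with hypothetical names. The text does not say whether a population or sample standard deviation is meant, and this sketch assumes the population form (`statistics.pstdev`). The pitch range bounds are parameters because the codec's actual limits are not given here. Rounding is to the nearest integer, halves upward.

```python
import math
import statistics


def first_corrected_pitch(t_prime, t_last, pitch_min, pitch_max):
    """t_prime: multiple-free delays [T'_-M, ..., T'_-1] from step 102c.
    t_last:    T_-1, the integer pitch delay of the last received subframe.
    pitch_min, pitch_max: allowed pitch delay range (codec-specific, assumed)."""
    f_m = statistics.pstdev(t_prime)         # correction factor (population std)
    f_s = 1.0 - f_m / t_last                 # first scale factor
    value = t_last + f_s * f_m               # first corrected value
    value = min(max(value, pitch_min), pitch_max)   # stay inside pitch range
    return int(math.floor(value + 0.5))      # round to nearest integer


# Example with a hypothetical pitch range of 34..231 samples
pitch = first_corrected_pitch([60, 60, 62, 60], 60, 34, 231)
```

Here $f_m = \sqrt{0.75} \approx 0.866$ and $f_s \approx 0.986$, so the corrected value is about 60.85, which rounds to 61.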
Step 103: infer the adaptive codebook gain of the first lost frame, either from the adaptive codebook gains of the $M$ subframes received before it ($M$ is an integer greater than or equal to 1), or from the energy variation of the time-domain speech signal of the voiced start frame, which is synthesized by the decoder;
Specifically, condition 1 is satisfied when the log energy within the pitch period of the voiced start frame, minus the long-term log energy within the pitch period, is less than a fourth threshold. In that case, the median of the adaptive codebook gains of the $M$ subframes before the first lost frame is attenuated and used as the inferred adaptive codebook gain $g_p$ of each subframe of the first lost frame. The attenuation coefficient is a constant in $[0, 1]$;
Condition 2 is satisfied when the adaptive codebook gain $g_{p,-1}$ of the last subframe of the voiced start frame lies within a predetermined range. If condition 1 does not hold but condition 2 does, $g_{p,-1}$ is attenuated and used as the inferred adaptive codebook gain $g_p$ of each subframe of the first lost frame. Again, the attenuation coefficient is a constant in $[0, 1]$.

If neither condition holds, the energy ratios $R_{LT}$ and $R_{ST}$ are computed from the decoder-synthesized time-domain speech signal of the voiced start frame:
- $R_{LT}$ is the ratio of the signal's energy excluding its first pitch period to its energy excluding its last pitch period;
- $R_{ST}$ is the ratio of the energy of the signal's last pitch period to the energy of the pitch period before it.

The weighted average of the attenuated $R_{LT}$ and $R_{ST}$ is then used as the inferred value $g_p$. The pitch period (that is, the pitch delay $T_{-1}$) is limited to at most half the frame length $L$: if $T_{-1} > L/2$, $T_{-1} = L/2$ is used.
When the current frame is lost, the excitation history is periodically extended, with the pitch delay obtained in step 102 as the period, to obtain the adaptive codebook excitation. The adaptive codebook gain obtained in step 103 is multiplied by this excitation. The product is the periodic part of the excitation signal of the current subframe of the lost frame and is used in speech synthesis.
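The periodic extension described above can be sketched in Python, assuming an integer pitch delay. Real CELP decoders typically also support fractional delays through interpolation, which this sketch omits. The function names are hypothetical, and updating the excitation history after synthesis is left to the caller.

```python
def adaptive_codebook_vector(past_exc, pitch, subframe_len):
    """Periodically extend the excitation history with an integer period.

    past_exc: past excitation samples, most recent last.
    Each new sample copies the sample one pitch period earlier, so for
    subframe_len > pitch the extension reuses samples it has just produced.
    """
    if not 1 <= pitch <= len(past_exc):
        raise ValueError("pitch must be between 1 and len(past_exc)")
    buf = list(past_exc)                 # work on a copy; history is untouched
    start = len(buf)
    for n in range(subframe_len):
        buf.append(buf[start + n - pitch])
    return buf[start:]


def periodic_excitation(past_exc, pitch, gain, subframe_len):
    """Periodic part of a lost subframe's excitation: gain * adaptive codebook vector."""
    return [gain * v for v in adaptive_codebook_vector(past_exc, pitch, subframe_len)]


# Example: a 5-sample subframe extended from a 3-sample history with period 3
vec = adaptive_codebook_vector([1, 2, 3], 3, 5)   # [1, 2, 3, 1, 2]
```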
Step 103 is described in detail below for a speech stream with a 16 kHz sampling rate and a 20 ms frame length, each frame divided into four 5 ms subframes. The method applies equally to other frame lengths and sampling rates. As shown in Figure 3, it comprises the following steps:
* For subframe 1 of the current lost frame:
Step 103a: condition 1 is satisfied when the difference $dE_t$ is less than the threshold (usually a negative value). $dE_t$ is the log energy within the pitch period of the frame preceding the current lost frame (the voiced start frame in this embodiment), minus the long-term log energy within the pitch period. In that case, the adaptive codebook gains $g_{p,-M}, \ldots, g_{p,-1}$ of the $M$ subframes before the current lost frame (for example $M = 5$) are used. Their median, after attenuation, is the inferred adaptive codebook gain of subframe 1 of the current lost frame:
$$g_p = \alpha(n) \cdot \mathrm{median}(g_{p,-M}, \ldots, g_{p,-1})$$
$g_p$ is then limited to a suitable range, for example $[0.5, 0.95]$: if $g_p < 0.5$, set $g_p = 0.5$; if $g_p > 0.95$, set $g_p = 0.95$.
In the formula above, $n$ is the index of the current frame within the run of consecutive lost frames. Here the frame is the first one lost after a correctly received frame, so $n = 1$. $\alpha(n)$ is the corresponding attenuation coefficient:
$$\alpha(n) = \begin{cases} 1.0, & n = 1 \\ 0.95, & n = 2 \\ 0.6, & n \ge 3 \end{cases}$$
$\mathrm{median}(\cdot)$ denotes the median.
For any frame, $dE_t$ is defined as the log energy within the pitch period minus the long-term log energy within the pitch period:
$$dE_t = E_t - \bar{E}_t$$
where $E_t$ is the log energy within the pitch period,
$$E_t = 10 \log_{10}\Big(\frac{1}{T'} \sum_{i=0}^{T'-1} s^2(L - T' + i)\Big)$$
$L$ is the frame length and $T'$ is the pitch delay, taken as
$$T' = \begin{cases} \mathrm{round}(0.5T_2 + 0.5T_3), & \mathrm{round}(0.5T_2 + 0.5T_3) \ge N \\ 2\,\mathrm{round}(0.5T_2 + 0.5T_3), & \mathrm{round}(0.5T_2 + 0.5T_3) < N \end{cases}$$
$\bar{E}_t$ is the long-term log energy within the pitch period. It is updated whenever the frame type is voiced (VOICED): $\bar{E}_t \leftarrow 0.99\,\bar{E}_t + 0.01\,E_t$.
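The energy tracking behind $dE_t$ can be sketched in Python as follows. The function names are hypothetical, and the sketch rests on these assumptions:
- $E_t$ is the mean energy per sample over the last $T'$ samples, i.e. the $1/T'$ normalization;
- a small energy floor avoids $\log_{10}(0)$ on silent frames;
- $T'$ is capped at the frame length.

```python
import math


def pitch_log_energy(s, t2, t3, N):
    """E_t: log energy over the last T' samples of the decoded frame s."""
    base = int(math.floor(0.5 * t2 + 0.5 * t3 + 0.5))   # round(0.5*T2 + 0.5*T3)
    t_p = base if base >= N else 2 * base               # T'
    t_p = min(t_p, len(s))                              # added guard: stay inside the frame
    energy = sum(x * x for x in s[len(s) - t_p:]) / t_p  # mean energy (assumed 1/T')
    return 10.0 * math.log10(max(energy, 1e-12))        # added floor avoids log(0)


def update_long_term_energy(e_lt, e_t, frame_is_voiced):
    """Long-term pitch-period log energy, updated only for VOICED frames."""
    return 0.99 * e_lt + 0.01 * e_t if frame_is_voiced else e_lt


def d_et(s, t2, t3, N, e_lt):
    """dE_t = E_t minus the long-term log energy."""
    return pitch_log_energy(s, t2, t3, N) - e_lt
```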
Step 103b: Condition two is that the adaptive codebook gain $g_{p,-1}$ lies within a suitable range, for example between 0.8 and 1.1. Here $g_{p,-1}$ belongs to the subframe preceding the current lost frame, i.e., the last subframe of the voiced onset frame. If condition one in 103a is not satisfied but condition two is, $g_{p,-1}$ is suitably attenuated to obtain the adaptive codebook gain $g_p$ of the first subframe of the current lost frame:

$$g_p = \alpha_p(n)\cdot g_{p,-1} \qquad (1)$$

where $\alpha_p(n)$ is the attenuation coefficient.
Step 103c: If neither condition in 103a nor the condition in 103b is satisfied, the adaptive codebook gain of the current lost frame is inferred as follows. The inference uses the energy variation of the time-domain speech signal of the voiced onset frame synthesized by the decoder:
First, compute the energy ratios $R_{LT}$ and $R_{ST}$ of the decoder-synthesized time-domain speech signal of the voiced onset frame:

- $R_{LT}$ is the energy of the signal excluding its first pitch period, divided by the energy excluding its last pitch period.
- $R_{ST}$ is the energy of the last pitch period, divided by the energy of the pitch period immediately preceding it.

The pitch period $T$ may not exceed $L/2$; if $T > L/2$, take $T = L/2$. $R_{LT}$ and $R_{ST}$ are computed as:

$$R_{LT} = \frac{\sum_{i=T}^{L-1} s^2(i)}{\sum_{i=0}^{L-1-T} s^2(i)}, \qquad R_{ST} = \frac{\sum_{i=L-T}^{L-1} s^2(i)}{\sum_{i=L-2T}^{L-T-1} s^2(i)}$$

Here $L$ is the frame length, and $s(i)$, $i = 0,\ldots,L-1$, is the time-domain speech signal of the voiced onset frame synthesized by the decoder.
Then take a weighted average of $R_{LT}$ and $R_{ST}$ and attenuate it suitably:

$$g_p = \alpha_p(n)\cdot(0.5\,R_{LT} + 0.5\,R_{ST}) \qquad (2)$$
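Step 103c can be sketched in Python as follows. The sketch is illustrative only. It takes the synthesized onset frame `s` as a plain list and clips the pitch period `t` to half the frame length, as the text requires. The `eps` term in the denominators is an added assumption that keeps silent frames from dividing by zero. Function names are hypothetical.

```python
def energy_ratios(s: list[float], t: int, eps: float = 1e-12) -> tuple[float, float]:
    frame_len = len(s)
    t = min(t, frame_len // 2)  # pitch period limited to half the frame length
    if t <= 0:
        raise ValueError("pitch period must be positive")

    def energy(start: int, stop: int) -> float:
        return sum(x * x for x in s[start:stop])

    # R_LT: energy without the first pitch period / energy without the last one.
    r_lt = energy(t, frame_len) / (energy(0, frame_len - t) + eps)
    # R_ST: energy of the last pitch period / energy of the period just before it.
    r_st = energy(frame_len - t, frame_len) / (energy(frame_len - 2 * t, frame_len - t) + eps)
    return r_lt, r_st


def gain_from_energy(s: list[float], t: int, alpha_n: float) -> float:
    r_lt, r_st = energy_ratios(s, t)
    return alpha_n * (0.5 * r_lt + 0.5 * r_st)  # equation (2)
```

A rising onset, such as a frame whose second half is twice as loud as its first, gives ratios above 1. The resulting gain is later clipped in step 103d.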
Step 103d: The value $g_p$ estimated by equation (1) or (2) is limited as follows. The limited value is the inferred adaptive codebook gain of the first subframe of the current lost frame.

- If $g_p$ is greater than an upper threshold, e.g. 1, set $g_p$ to that upper threshold.
- If $g_p$ is less than a lower threshold, e.g. 0.7, set $g_p$ to that lower threshold.
- If the pitch delay equals the first correction value $T_c$ inferred in step 102 (after rounding), and $g_p$ is greater than another upper threshold, e.g. 0.95, set $g_p$ to that other upper threshold.
• Step 103e applies to every subframe of the current lost frame other than the first. The adaptive codebook gain $g_p$ inferred for the first subframe is reused directly as the inferred gain of each of these subframes.
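The decision logic of steps 103a to 103e can be sketched in Python as below. This is a non-authoritative sketch. The text does not specify $\alpha_p(n)$ in equations (1) and (2), so here it is assumed to equal $\alpha(n)$. The numeric limits are the example values from the text, and the function names are hypothetical.

```python
from statistics import median


def alpha(n: int) -> float:
    # Attenuation coefficient alpha(n) for the n-th consecutive lost frame.
    if n <= 1:
        return 1.0
    if n == 2:
        return 0.95
    return 0.6


def infer_lost_frame_gains(d_et, e_thr, past_gains, n, r_lt, r_st,
                           pitch_equals_tc=False, n_subframes=4):
    """past_gains holds g_{p,-M}..g_{p,-1}, oldest first (e.g. M = 5)."""
    a = alpha(n)  # assumption: alpha_p(n) == alpha(n)
    g_last = past_gains[-1]
    if d_et < e_thr:
        # 103a: attenuated median, limited to [0.5, 0.95].
        g = min(max(a * median(past_gains), 0.5), 0.95)
    else:
        if 0.8 <= g_last <= 1.1:
            g = a * g_last                         # 103b, equation (1)
        else:
            g = a * (0.5 * r_lt + 0.5 * r_st)      # 103c, equation (2)
        g = min(max(g, 0.7), 1.0)                  # 103d: example limits
        if pitch_equals_tc and g > 0.95:
            g = 0.95
    return [g] * n_subframes                       # 103e: same gain for every subframe
```

The clipping of step 103d is applied only on the (1)/(2) branch. Step 103a carries its own range limit.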
Step 104: Compensate the first lost frame with the inferred pitch delay and adaptive codebook gain, i.e., use them in the speech synthesis of the first lost frame.

The compensation itself can be implemented with existing techniques and is not described further here.
Embodiment 2

This embodiment describes a method of compensating for the loss of the first frame immediately following a voiced onset frame. It differs from Embodiment 1 by adding a second correction process.

Step 201: Same as step 101 in Embodiment 1.
Step 202: This step differs from step 102 when the voiced onset frame does not satisfy the stability conditions. In that case, $T_{-1}$ is first corrected with the first correction amount. A second correction process is then applied to the corrected value, and its result is the final inferred pitch delay of each subframe of the first lost frame.
Specifically, the second correction process is as follows.

- If both of the following conditions are satisfied, $T_{-1}$ is the intermediate pitch delay value:
  - Condition 1: the absolute difference between the corrected value ($T_c = T_{-1} + f_s f_m$) and $T_{-1}$ is greater than a fifth threshold $T_{thr1}$.
  - Condition 2: the absolute difference between $T_{-1}$ and $T_{-2}$ is less than a sixth threshold $T_{thr2}$. $T_{-2}$ is the integer part of the pitch delay of the second-to-last subframe of the voiced onset frame, and $0 < T_{thr2} < T_{thr1}$.
- If either condition is not satisfied, the intermediate value is $T_{-1}$ plus the smaller of the fifth threshold and the first correction amount.

The intermediate value is then compared with $x$ times the pitch delay of the most recently correctly received voiced frame with a stable pitch delay, where $x > 1$ (preferably $x = 1.7$). If it is greater, the intermediate value multiplied by 2 is the result of the second correction process. Otherwise, the intermediate value itself is the result. Preferably, a pitch-doubling flag records the comparison: it is set to valid (e.g., 1) when the intermediate value is greater, and to invalid (e.g., 0) otherwise.
Step 203: This step differs from step 103 only in condition one, which becomes: either of the following holds.

- The log energy within the pitch period of the voiced onset frame, minus the long-term log energy within the pitch period, is less than the fourth threshold $E_{thr}$.
- The pitch-doubling flag set during pitch delay inference is valid (e.g., 1).

Condition two is unchanged. The processing when condition one is satisfied, when only condition two is satisfied, and when neither is satisfied is the same as in step 103.

Step 204: Same as step 104 in Embodiment 1.
Step 202 of this embodiment is described in detail below using an example speech stream. The stream has a frame length of 20 ms, each frame is divided into four 5 ms subframes, and the sampling rate is 16 kHz. The method applies equally to other frame lengths and sampling rates.

Step 202a: Determine whether the voiced onset frame satisfies any of the following stability conditions. If it does, perform step 202b; if it satisfies none of them, perform step 202c.
• The pitch-synchronous autocorrelation coefficient $R_T$ of the voiced onset frame is greater than a first threshold. The first threshold lies between 0 and 1 and is preferably greater than 0.5.

For any frame, the pitch-synchronous normalized correlation $R_T$ is the normalized autocorrelation of the last two consecutive pitch periods of the frame. It characterizes how similar those two periods are. Its calculation is given in step 102a and is not repeated here.

• $g_{p,-1}$ is greater than a second threshold $G_1$, and $g_{p,-2}$ is greater than a third threshold $G_2$.

Here $g_{p,-1}$ and $g_{p,-2}$ are the adaptive codebook gains of the 4th (last) and 3rd (second-to-last) subframes of the voiced onset frame, and both thresholds lie between 0 and 1.

• $T_{-1}$ equals $T_{-2}$.

Here $T_{-1}$ and $T_{-2}$ are the integer parts of the pitch delays of the 4th and 3rd subframes of the voiced onset frame.
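The stability test of step 202a reduces to a single boolean expression, sketched below in Python. The function and parameter names are hypothetical. The text gives only a preferred lower bound (0.5) for the first threshold, so all three thresholds are plain parameters and the values in the usage check are illustrative.

```python
def onset_is_stable(r_t: float, g_last: float, g_prev: float,
                    t_last: int, t_prev: int,
                    r_thr: float, g1_thr: float, g2_thr: float) -> bool:
    # Any one of the three stability conditions is sufficient.
    return (
        r_t > r_thr                                # pitch-synchronous correlation
        or (g_last > g1_thr and g_prev > g2_thr)   # last two adaptive codebook gains
        or t_last == t_prev                        # equal integer pitch delays
    )
```

When this returns `True`, step 202b reuses $T_{-1}$ for every subframe of the lost frame.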
Step 202b: If the voiced onset frame satisfies any of the above stability conditions, use $T_{-1}$ as the pitch delay of each subframe of the first lost frame, and the process ends. $T_{-1}$ is the integer part of the pitch delay of the last subframe of the voiced onset frame (the 4th subframe in this embodiment).

Step 202c: If the voiced onset frame satisfies none of the above stability conditions, remove pitch-delay multiples as follows. The inputs are the integer parts $T_{-M},\ldots,T_{-1}$ of the pitch delays of the $M$ subframes preceding the current lost frame (e.g., $M=4$). The last subframe before the current lost frame serves as the reference, and multiples are removed from the pitch delays of the other subframes:
First set $T'_{-1} = T_{-1}$, where $T'$ denotes a pitch delay after multiple removal. For $i = -2,\ldots,-M$:

- If $T_i \le T_{-1}$, $T'_i$ is whichever of $T_i$ and $2T_i$ is closer to $T_{-1}$.
- Otherwise ($T_i > T_{-1}$), $T'_i$ is whichever of $T_i$ and $T_i/2$ is closer to $T_{-1}$.

$$T'_i = \begin{cases} \arg\min_{c\in\{T_i,\,2T_i\}} |c - T_{-1}|, & T_i \le T_{-1} \\ \arg\min_{c\in\{T_i,\,T_i/2\}} |c - T_{-1}|, & T_i > T_{-1} \end{cases} \qquad i = -2,\ldots,-M$$

$M$ is the number of subframes preceding the first lost frame on which the removal is performed.
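Step 202c can be sketched in Python as follows. It is illustrative only, and the function name is hypothetical. The input list is ordered oldest first, so its last element is the reference $T_{-1}$. The text does not say how ties are resolved; here `min` keeps the undoubled or unhalved value on a tie (an assumption).

```python
def remove_pitch_multiples(lags: list[float]) -> list[float]:
    """lags = [T_{-M}, ..., T_{-1}], oldest first; returns [T'_{-M}, ..., T'_{-1}]."""
    ref = lags[-1]                       # T'_{-1} = T_{-1}
    out = []
    for t in lags[:-1]:
        # Try doubling a shorter lag, or halving a longer one.
        candidates = (t, 2 * t) if t <= ref else (t, t / 2)
        # Keep whichever candidate lies closest to the reference lag.
        out.append(min(candidates, key=lambda c: abs(c - ref)))
    out.append(ref)
    return out
```

For example, with $T_{-1} = 80$, a lag of 40 becomes 80, a lag of 120 becomes 60, and a lag of 82 is kept.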
Step 202d: Determine the pitch-delay correction factor $f_m$ and the first scale factor $f_s$. The first correction amount is their product, $f_s f_m$. The correction factor $f_m$ is the standard deviation of $T'_{-M},\ldots,T'_{-1}$. The first scale factor $f_s$ expresses how far the correction factor can be trusted:

$$f_m = \sqrt{\frac{1}{M}\sum_{i=-M}^{-1}\left(T'_i - \bar{T}'\right)^2}, \qquad \bar{T}' = \frac{1}{M}\sum_{i=-M}^{-1} T'_i, \qquad f_s = 1 - \frac{f_m}{T_{-1}}$$

Here $T'_i$ are the values computed in step 202c.
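Step 202d can be sketched in Python as below. The sketch assumes the standard deviation uses $1/M$ normalization (population standard deviation, `statistics.pstdev`), matching the formula above. It also assumes $f_s = 1 - f_m/T_{-1}$, as described for the first scale factor calculation unit in Embodiment 5. The function name is hypothetical.

```python
from statistics import pstdev


def first_correction(t_prime: list[float], t_last: float) -> tuple[float, float, float]:
    """t_prime = [T'_{-M}, ..., T'_{-1}] after multiple removal; t_last = T_{-1}."""
    f_m = pstdev(t_prime)          # correction factor: population standard deviation
    f_s = 1.0 - f_m / t_last       # first scale factor (confidence in f_m)
    return f_s, f_m, f_s * f_m     # first correction amount = f_s * f_m
```

A perfectly steady pitch track gives $f_m = 0$, so the first correction amount vanishes and $T_c = T_{-1}$.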
Step 202e: Take $T_{-1}$ as the base pitch delay of each subframe of the first lost frame. $T_{-1}$ is the integer part of the pitch delay of the last subframe of the voiced onset frame (the 4th subframe in this embodiment). Apply a first correction to this base value with the correction factor and the first scale factor, giving the first correction value $T_c = T_{-1} + f_s f_m$.
Step 202f: Apply the following second correction to the first correction value:

- If $|T_c - T_{-1}|$ is greater than the fifth threshold $T_{thr1}$ and $|T_{-1} - T_{-2}|$ is less than the sixth threshold $T_{thr2}$, set $T_c = T_{-1}$.
- Otherwise (if either condition is not satisfied), set $T_c = T_{-1} + \min(f_s f_m, T_{thr1})$, i.e., $T_{-1}$ plus the smaller of $f_s f_m$ and $T_{thr1}$.

Preferably, the thresholds are taken as …
Compare the resulting $T_c$ with $T_s$, the pitch delay of the most recently correctly received voiced frame with a stable pitch delay. If $T_c$ is greater than $x$ times $T_s$ (preferably $x = 1.7$), update $T_c = 2T_c$ and set the pitch-doubling flag to 1. Otherwise, leave $T_c$ unchanged and set the flag to 0. $T_s$ must be updated whenever an information frame is correctly received, as follows:
Let $T_0$, $T_1$, $T_2$ and $T_3$ be the pitch delays of the 1st, 2nd, 3rd and 4th subframes of the frame. $T_s$ is updated to $T_3$ only if the currently correctly received frame meets both of the following; otherwise it is not updated.

- It is a voiced-type frame, i.e., a voiced transition frame, a voiced frame, or a voiced onset frame.
- It has a stable pitch period. For example, the relevant subframe pitch delay does not exceed 1.4 times $T_3$, $T_3$ does not exceed 1.4 times that pitch delay, and $|T_0 - T_2|$ does not exceed 10.
Step 202g: Use the rounded $T_c$ as the pitch delay of each subframe of the current lost frame. The rounded $T_c$ must lie within the allowed pitch-delay range:

If $T_c > T_{max}$, take $T_c = T_{max}$;
if $T_c < T_{min}$, take $T_c = T_{min}$;

where $T_{min}$ and $T_{max}$ are the minimum and maximum allowed pitch delays.
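Steps 202e to 202g can be sketched in Python as a single function. This is a non-authoritative sketch. The preferred threshold values are not legible in the source, so `t_thr1`, `t_thr2`, `t_min` and `t_max` are parameters, and the values in the checks below are hypothetical. Rounding is assumed to be round-half-up. The function name is hypothetical.

```python
import math


def infer_lost_pitch(t_last: int, t_prev: int, correction: float,
                     t_thr1: float, t_thr2: float, t_s: float,
                     t_min: int, t_max: int, x: float = 1.7) -> tuple[int, bool]:
    t_c = t_last + correction                                  # 202e: first correction
    # 202f: second correction
    if abs(t_c - t_last) > t_thr1 and abs(t_last - t_prev) < t_thr2:
        t_c = t_last
    else:
        t_c = t_last + min(correction, t_thr1)
    doubling_flag = t_c > x * t_s          # compare with stable voiced pitch T_s
    if doubling_flag:
        t_c *= 2
    t_c = math.floor(t_c + 0.5)            # 202g: rounding (half-up assumed)
    t_c = min(max(t_c, t_min), t_max)      # clamp to the allowed pitch range
    return t_c, doubling_flag
```

The returned flag is the pitch-doubling flag that step 203 checks as an alternative trigger for condition one.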
Embodiment 3

This embodiment describes a method of compensating for the loss of two or more consecutive frames immediately following a voiced onset frame. The lost frames comprise the first lost frame and one or more lost frames immediately following it. As shown in FIG. 4, the method includes the following steps:

Step 301: Infer the pitch delay and adaptive codebook gain of the first lost frame using the method of Embodiment 1 or Embodiment 2;

Step 302: For each of the one or more lost frames immediately following the first lost frame, use the pitch delay of the preceding lost frame as the pitch delay of the current lost frame;

Step 303: Take the inferred adaptive codebook gain of the last subframe of the preceding lost frame, then attenuate and interpolate it. The resulting values are the adaptive codebook gains of the subframes of the current lost frame;
Specifically, for the current lost frame:

- The last subframe's gain: take the adaptive codebook gain of the last subframe of the preceding lost frame, which may be the first lost frame or a later lost frame. After attenuation, it becomes the adaptive codebook gain $g_{p,\mathrm{end}}$ of the last subframe of the current lost frame.
- The other subframes' gains: obtained by linear interpolation between a processed value and $g_{p,\mathrm{end}}$. The processing moves the value toward 1, for example by taking the arithmetic square root $\sqrt{g_{p,\mathrm{end}}}$, or alternatively the cube root.

Step 304: Compensate for the lost frames using the inferred pitch delay and adaptive codebook gain.
Step 303 is described in detail below with the same example: a 20 ms frame divided into four 5 ms subframes, sampled at 16 kHz. The method applies equally to other frame lengths and sampling rates.
Denote the adaptive codebook gains of the four subframes of the current lost frame by $g_{p,0}$, $g_{p,1}$, $g_{p,2}$, $g_{p,3}$. Denote the inferred adaptive codebook gain of the last subframe of the preceding lost frame by $g_{p,-1}$. $g_{p,0},\ldots,g_{p,3}$ are computed as follows.

First, let

$$g_{p,\mathrm{end}} = \alpha(n)\cdot g_{p,-1}$$

where $n$ is the index of the current lost frame within the run of consecutive lost frames and $\alpha(n)$ is the corresponding attenuation coefficient.

Then compute the interpolation step:

$$g_{p,\mathrm{step}} = \frac{g_{p,\mathrm{end}} - g_{p,\mathrm{start}}}{4}, \qquad g_{p,\mathrm{start}} = \sqrt{g_{p,\mathrm{end}}}$$

Here 4 is the total number of subframes in the current lost frame. In other embodiments with a different number of subframes per frame, that number replaces "4" in the formula. The gains are then:

$$g_{p,0} = g_{p,\mathrm{start}} + g_{p,\mathrm{step}}, \quad g_{p,1} = g_{p,0} + g_{p,\mathrm{step}}, \quad g_{p,2} = g_{p,1} + g_{p,\mathrm{step}}, \quad g_{p,3} = g_{p,2} + g_{p,\mathrm{step}} = g_{p,\mathrm{end}}$$
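The gain interpolation of step 303 can be sketched in Python as below. The sketch rests on two assumptions. First, the attenuation coefficient is the same $\alpha(n)$ table as in Embodiment 1. Second, the start value is $\sqrt{g_{p,\mathrm{end}}}$, as in the reconstruction above; `root=3` selects the cube-root variant. Function names are hypothetical.

```python
def alpha(n: int) -> float:
    # Attenuation coefficient alpha(n), assumed identical to Embodiment 1.
    if n <= 1:
        return 1.0
    if n == 2:
        return 0.95
    return 0.6


def interpolate_gains(g_prev_last: float, n: int,
                      n_subframes: int = 4, root: float = 2.0) -> list[float]:
    g_end = alpha(n) * g_prev_last              # attenuated gain of the last subframe
    g_start = g_end ** (1.0 / root)             # assumed: value moved toward 1
    step = (g_end - g_start) / n_subframes      # interpolation step
    gains, g = [], g_start
    for _ in range(n_subframes):
        g += step
        gains.append(g)                         # the last entry equals g_end
    return gains
```

For a gain below 1, the root pushes the start value up, so the gains decay smoothly toward $g_{p,\mathrm{end}}$ within the frame rather than dropping abruptly.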
Embodiment 4

This embodiment describes recovery processing for the first correctly received frame after frames lost following a voiced onset frame have been compensated. It can be combined with Embodiment 1, 2 or 3 above, or with existing methods for compensating frame loss after a voiced onset frame. It includes the following steps:

Step 401: The voiced onset frame is correctly received. When one or more frames immediately following it are lost, infer the pitch delay and adaptive codebook gain of the lost frames, and compensate for the lost frames with the inferred values;
This step can be implemented with the method of Embodiment 1, 2 or 3, or with an existing compensation method.

Step 402: Process the first correctly received frame after the voiced onset frame as follows. For each subframe, multiply the decoded adaptive codebook gain $g_p$ by a second scale factor scale_fac, giving a new adaptive codebook gain $g_p = \mathrm{scale\_fac}\cdot g_p$. Use the new gain instead of the decoded one in speech synthesis.

During speech synthesis, the new adaptive codebook gain thus replaces the decoded one, yielding the time-domain speech signal of the current frame.
The second scale factor scale_fac serves two purposes in the first correctly received frame after frame loss. It controls the adaptive codebook contribution, and it controls the overall energy of the synthesized speech. If the pitch delay used for compensation jumps relative to the pitch delay used by the current frame, the compensation pitch delay is unreliable. The adaptive codebook contribution should then be reduced appropriately, to limit error propagation from the erroneous adaptive codebook. Controlling scale_fac also prevents the energy of the first correctly received frame after frame loss from rising rapidly. As shown in FIG. 5, in this embodiment the second scale factor of each subframe is computed as follows:

Step a: Initialize the second scale factor to 1;
Preferably, a step a1 may be inserted between steps a and b. Consider the absolute difference between the inferred pitch delay of the lost frame preceding the current frame and the decoded pitch delay $T_0$ of the first subframe of the current frame. If it is greater than a preset eighth threshold (e.g., 10), recompute the second scale factor as a linear increasing function of $R_T$: scale_fac $= a\cdot R_T + b$. $R_T$ is the pitch-synchronous autocorrelation coefficient of the last correctly received frame before the loss, i.e., the voiced onset frame. Taking $a > 0$ is sufficient to make scale_fac increasing in $R_T$. The new scale_fac may also be limited to a range, e.g., set to 1 if greater than 1 and to 0.5 if less than 0.5.
Step b: Multiply the second scale factor scale_fac by the decoded adaptive codebook gain $g_p$ of the current subframe. scale_fac is either the initial value from step a or the new value from step a1. Multiply the product by the adaptive codebook vector of the current subframe, and use the resulting signal as the excitation of the current subframe;

Step c: Pre-synthesize speech with this excitation, without updating the state of any filter after synthesis. Compute the signal energy $E$ of the current subframe from the pre-synthesized speech signal;
Step d: Let $E_{-1}$ be the signal energy of the last subframe of the frame preceding the current frame. If $\sqrt{E/E_{-1}}$ exceeds a seventh threshold $K$ (preferably $1 < K < 1.5$), update the second scale factor to $K\sqrt{E_{-1}/E}$ times its current value:

$$\mathrm{scale\_fac} = K\cdot\sqrt{E_{-1}/E}\cdot\mathrm{scale\_fac}$$

Otherwise, do not update it. The energy is computed as

$$E = \sum_{i=1}^{N} s^2(i)$$

Here $N$ is the subframe length. For $E$, $s(i)$, $i = 1,\ldots,N$, is the pre-synthesized speech signal. For $E_{-1}$, it is the decoder-synthesized speech signal of the frame preceding the current frame.
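Steps a to d can be sketched in Python as below. This is a non-authoritative sketch. `presynth` is a hypothetical callback standing in for steps b and c: it scales the adaptive codebook gain, builds the excitation, and runs the decoder's synthesis filters without updating their state, returning the pre-synthesized subframe. The defaults `a = 0.5`, `b = 0.5` and `k = 1.2` are hypothetical. The text only requires $a > 0$ and $1 < K < 1.5$, and gives 10 as an example eighth threshold.

```python
import math
from typing import Callable, Sequence


def subframe_energy(x: Sequence[float]) -> float:
    # E = sum of s(i)^2 over the subframe
    return sum(v * v for v in x)


def second_scale_factor(presynth: Callable[[float], Sequence[float]],
                        prev_last_subframe: Sequence[float],
                        t_lost: float, t0: float, r_t: float,
                        a: float = 0.5, b: float = 0.5,
                        thr8: float = 10.0, k: float = 1.2) -> float:
    scale = 1.0                                         # step a
    if abs(t_lost - t0) > thr8:                         # step a1: pitch jump detected
        scale = min(max(a * r_t + b, 0.5), 1.0)
    e = subframe_energy(presynth(scale))                # steps b and c (hypothetical callback)
    e_prev = subframe_energy(prev_last_subframe)
    if e > 0 and e_prev > 0 and math.sqrt(e / e_prev) > k:   # step d
        scale = k * math.sqrt(e_prev / e) * scale
    return scale
```

Step d caps the amplitude growth relative to the last subframe of the previous frame at a factor of $K$, which is what keeps the energy of the first good frame from rising abruptly.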
Embodiment 5

This embodiment describes a compensation apparatus that implements the method of Embodiment 1. The apparatus comprises a first pitch delay compensation module, a first adaptive codebook gain compensation module and a first compensation module, wherein:

The first pitch delay compensation module operates when the voiced onset frame is correctly received and the first frame immediately following it is lost. It selects a pitch delay inference mode according to the stability conditions of the voiced onset frame, and infers the pitch delay of the first lost frame;

The first adaptive codebook gain compensation module infers the adaptive codebook gain of the first lost frame in one of two ways: from the adaptive codebook gains of one or more subframes received before the first lost frame, or from the energy variation of the time-domain speech signal of the voiced onset frame;

The first compensation module is configured to compensate the first lost frame with the inferred pitch delay and adaptive codebook gain.
Preferably, the first pitch delay compensation module selects the pitch delay inference mode and infers the pitch delay of the first lost frame as follows:

- If the voiced onset frame satisfies any of the conditions listed below: take the integer part of the pitch delay of the last subframe of the voiced onset frame. This value is the inferred pitch delay of each subframe of the first lost frame.
- If the voiced onset frame satisfies none of those conditions: correct that integer pitch delay with the first correction amount, giving the first correction value. The first correction value is the inferred pitch delay of each subframe of the first lost frame.

The conditions are:

The pitch-synchronous autocorrelation coefficient of the voiced onset frame is greater than the first threshold;

The adaptive codebook gain of the last subframe of the voiced onset frame is greater than the second threshold, and the adaptive codebook gain of its second-to-last subframe is greater than the third threshold;

The integer parts of the pitch delays of the last and second-to-last subframes of the voiced onset frame are equal.
As shown in FIG. 6, the compensation apparatus further comprises a first correction amount calculation module for obtaining the first correction amount. This module may be provided separately or within the first pitch delay compensation module. It comprises a multiple removal unit, a correction factor calculation unit, a first scale factor calculation unit and a first correction amount calculation unit, wherein:

The multiple removal unit removes the multiples in the pitch delays of two or more subframes preceding the first lost frame. The last subframe before the first lost frame serves as the reference;

The correction factor calculation unit determines the pitch-delay correction factor. The correction factor is the standard deviation of the integer pitch delays of the two or more subframes preceding the first lost frame, after multiple removal;

The first scale factor calculation unit determines the first scale factor as 1 minus a ratio. The ratio is the correction factor divided by the integer part of the pitch delay of the last subframe of the voiced onset frame;

The first correction amount calculation unit computes the first correction amount as the product of the correction factor and the first scale factor.
Preferably, the multiple removal unit removes these multiples as follows, again taking the last subframe before the first lost frame as the reference:

First set $T'_{-1} = T_{-1}$. Here $T'$ denotes a pitch delay after multiple removal, and $T_{-1}$ is the integer part of the pitch delay of the last subframe of the voiced onset frame. For $i = -2,\ldots,-M$:

- If $T_i \le T_{-1}$, the unit takes as $T'_i$ whichever of $T_i$ and $2T_i$ is closer to $T_{-1}$.
- If $T_i > T_{-1}$, it takes as $T'_i$ whichever of $T_i$ and $T_i/2$ is closer to $T_{-1}$.

$M$ is the number of subframes preceding the first lost frame on which the removal is performed.
优选地, 该第一自适应码本增益补偿模块是用于釆用以下方式根据第一 丟失帧前接收的一个或两个以上子帧的自适应码本增益推断该第一丟失帧的 自适应码本增益, 或者根据浊音起始帧的时域语音信号的能量变化推断该第 一丟失帧的自适应码本增益: Preferably, the first adaptive codebook gain compensation module is used to The adaptive codebook gain of one or more subframes received before the lost frame infers the adaptive codebook gain of the first lost frame, or infers the first loss according to the energy variation of the time domain speech signal of the voiced start frame Adaptive codebook gain for frames:
第一自适应码本增益补偿模块判断如果满足以下条件一: 浊音起始帧的 基音周期内对数能量与长时基音周期内对数能量的差值小于第四阔值, 则将 衰减后的第一丟失帧之前一个或两个以上子帧的自适应码本增益的中位数的 值作为第一丟失帧中每个子帧的自适应码本增益的推断值;  The first adaptive codebook gain compensation module determines that if the following condition is satisfied: the difference between the logarithmic energy in the pitch period of the voiced start frame and the logarithmic energy in the long time pitch period is less than the fourth threshold, then the attenuation is The value of the median of the adaptive codebook gain of one or more subframes before the first lost frame as the inferred value of the adaptive codebook gain for each subframe in the first lost frame;
第一自适应码本增益补偿模块判断如果不满足条件一, 但满足以下条件 二: 浊音起始帧中最后一个子帧的自适应码本增益在预定范围内, 则将对其 衰减后的值作为第一丟失帧中每个子帧的自适应码本增益的推断值;  The first adaptive codebook gain compensation module determines that if condition one is not satisfied, but the following condition two is satisfied: the adaptive codebook gain of the last subframe in the voiced start frame is within a predetermined range, and the value to be attenuated An inferred value of the adaptive codebook gain for each subframe in the first lost frame;
If neither condition 1 nor condition 2 is satisfied, the module computes the energy ratios $R_{LT}$ and $R_{ST}$ and uses the weighted average of the attenuated $R_{LT}$ and $R_{ST}$ as the inferred adaptive codebook gain of every subframe in the first lost frame. Here $R_{LT}$ is the ratio of the energy of the decoder-synthesized time-domain speech signal of the voiced start frame excluding its first pitch period to the energy of that signal excluding its last pitch period, and $R_{ST}$ is the ratio of the energy of the last pitch period of that signal to the energy of the pitch period immediately before it; the pitch period does not exceed half the frame length.
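A minimal Python sketch of the three-way gain inference follows. The predetermined range, attenuation factor and averaging weight are not given in the text, so `gain_range`, `atten` and `w_lt` are hypothetical placeholders; the `doubling_flag` argument covers the Embodiment 6 variant of condition 1, and all names are illustrative.

```python
from statistics import median


def infer_first_lost_gain(prev_gains, log_energy_diff, r_lt, r_st, thr4,
                          gain_range=(0.5, 1.2), atten=0.9, w_lt=0.5,
                          doubling_flag=False):
    """Inferred adaptive codebook gain for every subframe of the first lost frame.

    prev_gains: gains of the subframes received before the first lost frame,
        oldest first, so prev_gains[-1] belongs to the last subframe of the
        voiced start frame.
    log_energy_diff: pitch-period log energy minus long-term pitch-period log energy.
    gain_range, atten, w_lt: hypothetical values for the unspecified
        predetermined range, attenuation factor and averaging weight.
    doubling_flag: only used by the Embodiment 6 variant of condition 1.
    """
    # Condition 1: attenuated median of the preceding subframe gains
    if log_energy_diff < thr4 or doubling_flag:
        return atten * median(prev_gains)
    # Condition 2: last subframe gain of the voiced start frame lies in range
    lo, hi = gain_range
    if lo <= prev_gains[-1] <= hi:
        return atten * prev_gains[-1]
    # Otherwise: weighted average of the attenuated energy ratios
    return w_lt * (atten * r_lt) + (1.0 - w_lt) * (atten * r_st)
```

For example, `infer_first_lost_gain([0.8, 1.0, 0.7], -1.0, 1.0, 0.8, 0.5)` takes the condition-1 branch and returns the attenuated median, 0.9 × 0.8 ≈ 0.72.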
Embodiment 6
This embodiment describes a compensation device implementing the method of Embodiment 2. As shown in Fig. 7, the device adds a pitch delay compensation correction module to the device of Embodiment 5. After the first correction value is obtained, this module applies a second correction to it and uses the corrected result as the final inferred pitch delay of each subframe of the first lost frame.
Specifically, the pitch delay compensation correction module performs the second correction on the first correction value as follows:
If both of the following conditions hold, the module takes the integer part of the pitch delay of the last subframe of the voiced start frame as the intermediate pitch delay. Condition 1: the absolute difference between the first correction value and the integer part of the pitch delay of the last subframe of the voiced start frame is greater than a fifth threshold. Condition 2: the absolute difference between the integer parts of the pitch delays of the last and second-to-last subframes of the voiced start frame is less than a sixth threshold, where 0 < sixth threshold < fifth threshold. If either condition fails, the module takes as the intermediate pitch delay the sum of the integer part of the pitch delay of the last subframe of the voiced start frame and the smaller of the first correction amount and the fifth threshold;
If the intermediate pitch delay is greater than $x$ times ($x > 1$) the pitch delay of the most recently correctly received voiced frame with a stable pitch delay, the module multiplies the intermediate pitch delay by 2, uses the product as the result of the second correction, and sets the frequency-doubling flag to valid. Otherwise, it uses the intermediate pitch delay itself as the result of the second correction and sets the frequency-doubling flag to invalid.
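The second correction can be written as a small Python function. This is a sketch under stated assumptions: `x = 1.5` is a hypothetical default for the multiplier $x > 1$, the function name is illustrative, and the doubling step is reproduced literally from the text above.

```python
def second_correction(first_value, first_amount, t_last, t_second_last,
                      t_stable, thr5, thr6, x=1.5):
    """Second correction of the first correction value (Embodiment 6).

    t_last, t_second_last: integer pitch delays of the last and second-to-last
        subframes of the voiced start frame.
    t_stable: pitch delay of the most recent correctly received voiced frame
        with a stable pitch delay.
    x: hypothetical multiplier (> 1); thr5, thr6: fifth and sixth thresholds.
    Returns (corrected pitch delay, frequency-doubling flag).
    """
    if not (0 < thr6 < thr5) or x <= 1:
        raise ValueError("require 0 < thr6 < thr5 and x > 1")
    cond1 = abs(first_value - t_last) > thr5
    cond2 = abs(t_last - t_second_last) < thr6
    if cond1 and cond2:
        intermediate = t_last
    else:
        intermediate = min(first_amount, thr5) + t_last
    # Doubling step applied exactly as described in the text
    if intermediate > x * t_stable:
        return 2 * intermediate, True
    return intermediate, False
```

The returned flag is what the Embodiment 6 variant of gain condition 1 consults.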
In this embodiment, the first adaptive codebook gain compensation module infers the adaptive codebook gain of the first lost frame either from the adaptive codebook gains of one or more subframes received before the first lost frame, or from the energy variation of the time-domain speech signal of the voiced start frame, as follows:
If condition 1 is satisfied, namely the difference between the logarithmic energy within the pitch period of the voiced start frame and the logarithmic energy within the long-term pitch period is less than a fourth threshold, or the frequency-doubling flag set during pitch delay inference is valid, the module attenuates the median of the adaptive codebook gains of the one or more subframes preceding the first lost frame and uses the result as the inferred adaptive codebook gain of every subframe in the first lost frame;
If condition 1 is not satisfied but condition 2 is, namely the adaptive codebook gain of the last subframe of the voiced start frame lies within a predetermined range, the module attenuates that gain and uses the result as the inferred adaptive codebook gain of every subframe in the first lost frame;
If neither condition 1 nor condition 2 is satisfied, the module computes the energy ratios $R_{LT}$ and $R_{ST}$ and uses the weighted average of the attenuated $R_{LT}$ and $R_{ST}$ as the inferred adaptive codebook gain of every subframe in the first lost frame. Here $R_{LT}$ is the ratio of the energy of the decoder-synthesized time-domain speech signal of the voiced start frame excluding its first pitch period to the energy of that signal excluding its last pitch period, and $R_{ST}$ is the ratio of the energy of the last pitch period of that signal to the energy of the pitch period immediately before it; the pitch period does not exceed half the frame length.
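The energy ratios $R_{LT}$ and $R_{ST}$ can be computed from the decoder-synthesized voiced start frame roughly as follows. The sketch assumes an integer pitch period in samples and a plain list of samples; the function names are hypothetical, and the `eps` guard against silent input is an addition not found in the text.

```python
def energy(samples):
    """Sum of squared samples."""
    return sum(s * s for s in samples)


def energy_ratios(x, pitch, eps=1e-12):
    """Return (R_LT, R_ST) for a synthesized frame x and an integer pitch period.

    R_LT: energy of x without its first pitch period over energy of x without
          its last pitch period.
    R_ST: energy of the last pitch period over energy of the one before it.
    eps guards against division by zero on silent input (an added safeguard).
    """
    n = len(x)
    if not 0 < pitch <= n // 2:
        raise ValueError("pitch period must be in (0, len(x) // 2]")
    r_lt = energy(x[pitch:]) / max(energy(x[:n - pitch]), eps)
    r_st = energy(x[n - pitch:]) / max(energy(x[n - 2 * pitch:n - pitch]), eps)
    return r_lt, r_st


print(energy_ratios([1, 1, 2, 2, 3, 3], 2))  # (2.6, 2.25)
```

The half-frame limit on the pitch period is what guarantees that two full pitch periods fit in the frame for $R_{ST}$.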
Embodiment 7

This embodiment describes a compensation device implementing the method of Embodiment 3. As shown in Fig. 8, the device adds a second pitch delay compensation module, a second adaptive codebook gain compensation module and a second compensation module to the device of Embodiment 5 or Embodiment 6, where:

The second pitch delay compensation module is configured to, for each of the one or more lost frames immediately following the first lost frame, use the inferred pitch delay of the lost frame preceding the current lost frame as the pitch delay of the current lost frame;
The second adaptive codebook gain compensation module is configured to attenuate and interpolate the inferred adaptive codebook gain of the last subframe of the lost frame preceding the current lost frame, and to use the resulting values as the adaptive codebook gains of the subframes of the current lost frame;
The second compensation module is configured to compensate for the lost frame according to the inferred pitch delay and adaptive codebook gain.
Preferably, the second adaptive codebook gain compensation module derives the adaptive codebook gains of the current lost frame as follows: the attenuated adaptive codebook gain of the last subframe of the preceding lost frame is used as the adaptive codebook gain of the last subframe of the current lost frame, and the adaptive codebook gains of the other subframes of the current lost frame are obtained by linear interpolation between a processed gain value and this last-subframe gain, where the processing moves the processed value closer to 1.
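One possible reading of this attenuate-then-interpolate rule is sketched in Python below. The attenuation factor `atten`, the choice of the square root of the preceding lost frame's last-subframe gain as the processed endpoint (claim 13 names the arithmetic square root as the processing), and the even interpolation grid are all assumptions; the function name is hypothetical.

```python
import math


def subsequent_lost_frame_gains(g_prev_last, n_subframes=4, atten=0.9):
    """Adaptive codebook gains for a lost frame that follows another lost frame.

    g_prev_last: inferred gain of the last subframe of the preceding lost frame.
    Assumptions: atten is a hypothetical attenuation factor, the processed
    endpoint is sqrt(g_prev_last), and the subframes are spaced evenly between
    that endpoint and the attenuated last-subframe gain.
    """
    g_last = atten * g_prev_last       # gain of the last subframe
    start = math.sqrt(g_prev_last)     # processed value, pulled toward 1
    gains = []
    for k in range(1, n_subframes + 1):
        frac = k / n_subframes         # equals 1.0 for the last subframe
        gains.append(start + (g_last - start) * frac)
    gains[-1] = g_last                 # avoid rounding drift on the endpoint
    return gains
```

With `g_prev_last = 0.64` this yields approximately `[0.744, 0.688, 0.632, 0.576]`, a smooth decay ending at the attenuated gain.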
Embodiment 8
This embodiment describes a compensation device implementing the method of Embodiment 4. As shown in Fig. 9, the device includes a compensation module and an adaptive codebook gain adjustment module, where:
The compensation module is configured to, when the voiced start frame has been correctly received and one or more frames immediately following it are lost, infer the pitch delay and adaptive codebook gain of the lost frames and compensate for the lost frames according to the inferred values; it may be implemented with the compensation device described in Embodiment 5, 6 or 7;
The adaptive codebook gain adjustment module is configured to, for the first correctly received frame after the voiced start frame, multiply the decoded adaptive codebook gain of each subframe of that frame by the second scale factor of the subframe to obtain a new adaptive codebook gain for each subframe; the new gain replaces the decoded gain in speech synthesis.
Preferably, the compensation device further includes a second scale factor calculation module for computing the second scale factor of each subframe; it may be a separate module or part of the adaptive codebook gain adjustment module. As shown in Fig. 10, the second scale factor calculation module includes an excitation signal acquisition unit, a pre-synthesis unit and a second scale factor generation unit, where:
The excitation signal acquisition unit is configured to multiply the initial value of the second scale factor by the decoded adaptive codebook gain of the current subframe and then by the adaptive codebook vector of the current subframe, and to use the resulting signal as the excitation signal of the current subframe;
The pre-synthesis unit is configured to pre-synthesize speech from the excitation signal and compute the signal energy of the current subframe from the pre-synthesized speech signal;
The second scale factor generation unit is configured to, when the arithmetic square root of the ratio of the signal energy of the current subframe to the signal energy of the last subframe of the frame preceding the current frame exceeds a seventh threshold, update the second scale factor to Q times its current value, where Q is the product of that square root and the seventh threshold.
Preferably, before multiplying the initial value of the second scale factor by the decoded adaptive codebook gain of the current subframe, the excitation signal acquisition unit also checks whether the absolute difference between the inferred pitch delay of the lost frame preceding the current frame and the decoded pitch delay of the first subframe of the current frame is greater than an eighth threshold. If so, it recomputes the second scale factor as a linearly increasing function of the pitch-synchronous autocorrelation coefficient of the voiced start frame and uses this new value in place of the initial value.
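The second scale factor computation for one subframe can be sketched as below. Several details are assumptions: the synthesis filter is a zero-state all-pole LPC filter, `lin` is a hypothetical offset and slope for the linear increasing function of the autocorrelation coefficient, and the update factor Q follows the text literally as the product of the square root and the seventh threshold. All names are illustrative.

```python
import math


def lpc_synthesis(excitation, a):
    """All-pole filter 1/A(z) with A(z) = 1 + a[0] z^-1 + a[1] z^-2 + ..., zero state."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]
        y.append(acc)
    return y


def second_scale_factor(gain, adaptive_cb, lpc, prev_energy, thr7,
                        init_factor=1.0, pitch_gap=0.0, thr8=math.inf,
                        rho=0.0, lin=(0.5, 0.5)):
    """Second scale factor of one subframe of the first correctly received frame.

    gain, adaptive_cb: decoded adaptive codebook gain and vector of the subframe.
    prev_energy: signal energy of the last subframe of the previous frame (> 0).
    pitch_gap: inferred pitch delay of the preceding lost frame minus the decoded
        pitch delay of the current frame's first subframe.
    lin: hypothetical (offset, slope > 0) of the linear increasing function of
        the pitch-synchronous autocorrelation coefficient rho.
    """
    factor = init_factor
    if abs(pitch_gap) > thr8:
        factor = lin[0] + lin[1] * rho        # replaces the initial value
    excitation = [factor * gain * v for v in adaptive_cb]
    cur_energy = sum(s * s for s in lpc_synthesis(excitation, lpc))
    root = math.sqrt(cur_energy / prev_energy)
    if root > thr7:
        factor *= root * thr7                 # Q taken literally from the text
    return factor
```

The returned factor multiplies the decoded gain of the subframe, as described for the adaptive codebook gain adjustment module.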
The thresholds used in the embodiments herein are empirical values and can be obtained through simulation.
Those of ordinary skill in the art will understand that all or some of the steps of the above method may be performed by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk or an optical disc. Alternatively, all or some of the steps of the above embodiments may be implemented with one or more integrated circuits. Accordingly, each module or unit in the above embodiments may be implemented in hardware or as a software functional module. The invention is not limited to any particular combination of hardware and software.
Of course, the invention may have various other embodiments. Those skilled in the art may make various corresponding changes and modifications in accordance with the invention without departing from its spirit and essence, and all such changes and modifications shall fall within the protection scope of the appended claims.
Industrial Applicability

The embodiments of the invention take into account that a voiced start frame differs from an ordinary voiced frame. For the first lost frame immediately following a voiced start frame, the pitch delay is inferred in different ways depending on the stability of the voiced start frame, and the adaptive codebook gain is inferred either from the adaptive codebook gains of one or more subframes received before the first lost frame or from the energy variation of the time-domain speech signal of the voiced start frame. Because compensation uses only information from frames preceding the lost frame, no compensation delay is introduced, and because the compensation method is chosen according to the stability of the voiced start frame, the quality of the compensated speech is maintained. For the one or more lost frames immediately following the first lost frame, the adaptive codebook gain is obtained by attenuation followed by interpolation, so that the speech energy decays smoothly during frame loss. For the first correctly received frame after the loss, its adaptive codebook gain is adjusted to reduce error propagation caused by the frame loss and to control the energy of the synthesized speech. In summary, the method of the embodiments improves voice call quality under frame loss.

Claims

1. A method for compensating for frame loss after a voiced start frame, the method comprising:
correctly receiving the voiced start frame; when the first frame immediately following the voiced start frame is lost, selecting a pitch delay inference mode according to a stability condition of the voiced start frame and inferring the pitch delay of the first lost frame with it; inferring the adaptive codebook gain of the first lost frame either from the adaptive codebook gains of one or more subframes received before the first lost frame or from the energy variation of the time-domain speech signal of the voiced start frame; and compensating for the first lost frame according to the inferred pitch delay and adaptive codebook gain.
2. The method according to claim 1, wherein
selecting a pitch delay inference mode according to the stability condition of the voiced start frame and inferring the pitch delay of the first lost frame comprises:
if the voiced start frame meets the stability condition, inferring the pitch delay of the first lost frame by using the integer part of the pitch delay of the last subframe of the voiced start frame as the inferred pitch delay of each subframe of the first lost frame;
if the voiced start frame does not meet the stability condition, inferring the pitch delay of the first lost frame by correcting the integer part of the pitch delay of the last subframe of the voiced start frame with a first correction amount to obtain a first correction value, and using the first correction value as the inferred pitch delay of each subframe of the first lost frame.
3. The method according to claim 2, wherein
whether the voiced start frame meets the stability condition is determined as follows:
a voiced start frame that satisfies any one of the following conditions meets the stability condition, and a voiced start frame that satisfies none of them does not:
the pitch-synchronous autocorrelation coefficient of the voiced start frame is greater than a first threshold;
the adaptive codebook gain of the last subframe of the voiced start frame is greater than a second threshold, and the adaptive codebook gain of the second-to-last subframe of the voiced start frame is greater than a third threshold;
the integer parts of the pitch delays of the last and second-to-last subframes of the voiced start frame are equal.
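The stability test of claim 3 and the resulting choice in claim 2 can be sketched as follows. The function names are hypothetical, and applying the first correction amount by simple addition to $T_{-1}$ is an assumption (the claims only say the integer pitch delay is "corrected").

```python
def is_stable(rho, g_last, g_second_last, t_last, t_second_last,
              thr1, thr2, thr3):
    """Stability condition of the voiced start frame (any one condition suffices).

    rho: pitch-synchronous autocorrelation coefficient of the voiced start frame.
    g_*, t_*: adaptive codebook gains and integer pitch delays of its last and
        second-to-last subframes.
    """
    return (rho > thr1
            or (g_last > thr2 and g_second_last > thr3)
            or t_last == t_second_last)


def infer_first_lost_pitch(stable, t_last, first_correction_amount):
    """Pitch delay used for every subframe of the first lost frame."""
    if stable:
        return t_last
    # Assumption: the correction is applied additively to T_{-1}
    return t_last + first_correction_amount
```

A frame with a high autocorrelation coefficient, or two strong final subframes, or an unchanged final pitch delay is treated as stable and simply repeats $T_{-1}$.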
4. The method according to claim 2, wherein
the first correction amount is obtained as follows:
taking the last subframe before the first lost frame as the reference, removing pitch delay multiples from two or more subframes before the first lost frame; determining a pitch delay correction factor from the integer parts of the pitch delays of those subframes after multiple removal; and determining a first scale factor of the pitch delay from the correction factor and the integer part of the pitch delay of the last subframe of the voiced start frame; the first correction amount being the product of the correction factor and the first scale factor.
5. The method according to claim 4, wherein
the correction factor is the standard deviation of the integer parts of the pitch delays of the two or more subframes before the first lost frame after multiple removal;
the first scale factor is 1 minus the ratio of the correction factor to the integer part of the pitch delay of the last subframe of the voiced start frame.
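Claims 4 and 5 reduce to a short computation once the multiple-free pitch delays are available. This sketch assumes the population standard deviation (the claims do not say population or sample) and a hypothetical function name.

```python
from statistics import pstdev


def first_correction_amount(demultiplied_delays, t_last):
    """First correction amount from the multiple-free integer pitch delays.

    demultiplied_delays: T'_{-1}, ..., T'_{-M} (two or more values).
    t_last: T_{-1}, integer pitch delay of the last subframe of the voiced start frame.
    """
    if len(demultiplied_delays) < 2:
        raise ValueError("need pitch delays of at least two subframes")
    sigma = pstdev(demultiplied_delays)   # correction factor (population std, assumed)
    scale = 1.0 - sigma / t_last          # first scale factor
    return sigma * scale                  # first correction amount


print(first_correction_amount([40, 42, 38, 40], 40))
```

For the delays `[40, 42, 38, 40]` with $T_{-1} = 40$, $\sigma = \sqrt{2}$, so the first correction amount is $\sqrt{2}(1 - \sqrt{2}/40) = \sqrt{2} - 0.05 \approx 1.364$.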
6. The method according to claim 4 or 5, wherein
removing pitch delay multiples from the two or more subframes before the first lost frame, taking the last subframe before the first lost frame as the reference, comprises:
first setting $T'_{-1} = T_{-1}$, where $T'_i$ denotes the pitch delay after multiple removal and $T_{-1}$ is the integer part of the pitch delay of the last subframe of the voiced start frame;
then, if $T_i \le T'_{-1}$, taking as $T'_i$ whichever of $T_i$ and $2T_i$ has the smaller absolute difference from $T'_{-1}$; otherwise, if $T_i > T'_{-1}$, taking as $T'_i$ whichever of $T_i$ and $T_i/2$ has the smaller absolute difference from $T'_{-1}$; where $i = -2, \ldots, -M$ and $M$ is the number of subframes before the first lost frame on which the elimination is performed.
7. The method according to claim 2, wherein
inferring the adaptive codebook gain of the first lost frame from the adaptive codebook gains of one or more subframes received before the first lost frame, or from the energy variation of the time-domain speech signal of the voiced start frame, comprises:
if condition 1 is satisfied, namely the difference between the logarithmic energy within the pitch period of the voiced start frame and the logarithmic energy within the long-term pitch period is less than a fourth threshold, attenuating the median of the adaptive codebook gains of the one or more subframes preceding the first lost frame and using the result as the inferred adaptive codebook gain of every subframe in the first lost frame;
if condition 1 is not satisfied but condition 2 is, namely the adaptive codebook gain of the last subframe of the voiced start frame lies within a predetermined range, attenuating that gain and using the result as the inferred adaptive codebook gain of every subframe in the first lost frame;
if neither condition 1 nor condition 2 is satisfied, computing the energy ratios $R_{LT}$ and $R_{ST}$ and using the weighted average of the attenuated $R_{LT}$ and $R_{ST}$ as the inferred adaptive codebook gain of every subframe in the first lost frame; where $R_{LT}$ is the ratio of the energy of the decoder-synthesized time-domain speech signal of the voiced start frame excluding its first pitch period to the energy of that signal excluding its last pitch period, and $R_{ST}$ is the ratio of the energy of the last pitch period of that signal to the energy of the pitch period immediately before it, the pitch period not exceeding half the frame length.
8. The method according to claim 2, wherein
after the first correction value is obtained, the method further comprises:
applying a second correction to the first correction value and using the corrected result as the final inferred pitch delay of each subframe of the first lost frame.
9. The method according to claim 8, wherein applying the second correction to the first correction value comprises:
if both of the following conditions hold, taking the integer part of the pitch delay of the last subframe of the voiced start frame as the intermediate pitch delay: condition 1, the absolute difference between the first correction value and the integer part of the pitch delay of the last subframe of the voiced start frame is greater than a fifth threshold; condition 2, the absolute difference between the integer parts of the pitch delays of the last and second-to-last subframes of the voiced start frame is less than a sixth threshold, where 0 < sixth threshold < fifth threshold; and, if either condition fails, taking as the intermediate pitch delay the sum of the integer part of the pitch delay of the last subframe of the voiced start frame and the smaller of the first correction amount and the fifth threshold;
if the intermediate pitch delay is greater than $x$ times ($x > 1$) the pitch delay of the most recently correctly received voiced frame with a stable pitch delay, multiplying the intermediate pitch delay by 2 as the result of the second correction and setting the frequency-doubling flag to valid; otherwise, using the intermediate pitch delay as the result of the second correction and setting the frequency-doubling flag to invalid.
10. The method according to claim 9, wherein
inferring the adaptive codebook gain of the first lost frame from the adaptive codebook gains of one or more subframes received before the first lost frame, or from the energy variation of the time-domain speech signal of the voiced start frame, comprises:
if condition 1 is satisfied, namely the difference between the logarithmic energy within the pitch period of the voiced start frame and the logarithmic energy within the long-term pitch period is less than a fourth threshold, or the frequency-doubling flag set during pitch delay inference is valid, attenuating the median of the adaptive codebook gains of the one or more subframes preceding the first lost frame and using the result as the inferred adaptive codebook gain of every subframe in the first lost frame;
if condition 1 is not satisfied but condition 2 is, namely the adaptive codebook gain of the last subframe of the voiced start frame lies within a predetermined range, attenuating that gain and using the result as the inferred adaptive codebook gain of every subframe in the first lost frame;
if neither condition 1 nor condition 2 is satisfied, computing the energy ratios $R_{LT}$ and $R_{ST}$ and using the weighted average of the attenuated $R_{LT}$ and $R_{ST}$ as the inferred adaptive codebook gain of every subframe in the first lost frame; where $R_{LT}$ is the ratio of the energy of the decoder-synthesized time-domain speech signal of the voiced start frame excluding its first pitch period to the energy of that signal excluding its last pitch period, and $R_{ST}$ is the ratio of the energy of the last pitch period of that signal to the energy of the pitch period immediately before it, the pitch period not exceeding half the frame length.
11. The method according to claim 1, 7 or 10, wherein the method further comprises: for each of the one or more lost frames immediately following the first lost frame, using the inferred pitch delay of the lost frame preceding the current lost frame as the pitch delay of the current lost frame; attenuating and interpolating the inferred adaptive codebook gain of the last subframe of the preceding lost frame and using the resulting values as the adaptive codebook gains of the subframes of the current lost frame; and compensating for the lost frame according to the inferred pitch delay and adaptive codebook gain.
12. The method according to claim 11, wherein
attenuating and interpolating the inferred adaptive codebook gain of the last subframe of the lost frame preceding the current lost frame and using the resulting values as the adaptive codebook gains of the subframes of the current lost frame comprises:
using the attenuated adaptive codebook gain of the last subframe of the preceding lost frame as the adaptive codebook gain of the last subframe of the current lost frame, and obtaining the adaptive codebook gains of the other subframes of the current lost frame by linear interpolation between a processed gain value and this last-subframe gain, the processing moving the processed value closer to 1.
13. The method according to claim 12, wherein
the processed gain value is the arithmetic square root of the unprocessed gain value.
14. The method according to claim 1, wherein the method further comprises:
for the first correctly received frame after the voiced onset frame, multiplying the adaptive codebook gain decoded for each subframe of that frame by the second scale factor of that subframe to obtain a new adaptive codebook gain for each subframe, and using the new adaptive codebook gain in place of the decoded one in speech synthesis.
15. The method according to claim 11, wherein the method further comprises:
for the first correctly received frame after the voiced onset frame, multiplying the adaptive codebook gain decoded for each subframe of that frame by the second scale factor of that subframe to obtain a new adaptive codebook gain for each subframe, and using the new adaptive codebook gain in place of the decoded one in speech synthesis.
16. The method according to claim 14 or 15, wherein the second scale factor of each subframe is calculated as follows:
multiplying the initial value of the second scale factor by the adaptive codebook gain decoded for the current subframe and then by the adaptive codebook of the current subframe, and using the resulting signal as the excitation signal of the current subframe;
performing speech pre-synthesis with the excitation signal, and calculating the signal energy of the current subframe from the pre-synthesized speech signal;
if the arithmetic square root of the ratio of the signal energy of the current subframe to the signal energy of the last subframe of the frame preceding the current frame exceeds a seventh threshold, updating the second scale factor to Q times the current second scale factor, where Q is the product of the arithmetic square root and the seventh threshold.
17. The method according to claim 16, wherein, before multiplying the initial value of the second scale factor by the adaptive codebook gain decoded for the current subframe, the method further comprises:
if the absolute value of the difference between the inferred pitch delay of the lost frame preceding the current frame and the pitch delay decoded for the first subframe of the current frame is greater than an eighth threshold, recalculating a new second scale factor as a linearly increasing function of the pitch-synchronous autocorrelation coefficient of the voiced onset frame, and using the new second scale factor in place of the initial value of the second scale factor.
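Claims 16 and 17 can be sketched as below. The LPC coefficients, the zero-state all-pole synthesis filter used for pre-synthesis, and the slope and offset of the linearly increasing function in claim 17 are hypothetical choices for illustration only. Q is computed exactly as the claim words it (the product of the square root and the seventh threshold).

```python
import math
from typing import Optional, Sequence


def lpc_synthesis(excitation: Sequence[float], lpc: Sequence[float]) -> list:
    """Zero-state all-pole filter 1/A(z), A(z) = 1 + a1*z^-1 + ... + ap*z^-p."""
    out = []
    for n, x in enumerate(excitation):
        y = x
        for i, a in enumerate(lpc, start=1):
            if n - i >= 0:
                y -= a * out[n - i]
        out.append(y)
    return out


def second_scale_factor(s_init: float, g_decoded: float, adaptive_cb: Sequence[float],
                        lpc: Sequence[float], e_prev_last: float, thr7: float,
                        pitch_jump: Optional[float] = None, thr8: Optional[float] = None,
                        pitch_corr: Optional[float] = None,
                        slope: float = 0.5, offset: float = 0.5) -> float:
    """pitch_jump: inferred pitch delay of the preceding lost frame minus the
    decoded pitch delay of the current frame's first subframe (optional)."""
    # Claim 17: on a large pitch jump, replace the initial value with a linearly
    # increasing function of the onset frame's pitch-synchronous autocorrelation
    # (slope and offset are hypothetical).
    if pitch_jump is not None and thr8 is not None and pitch_corr is not None:
        if abs(pitch_jump) > thr8:
            s_init = slope * pitch_corr + offset
    s = s_init
    # Claim 16: excitation = scale factor * decoded gain * adaptive codebook vector.
    excitation = [s * g_decoded * v for v in adaptive_cb]
    # Pre-synthesis, then the energy of the current subframe.
    synth = lpc_synthesis(excitation, lpc)
    e_cur = sum(y * y for y in synth)
    root = math.sqrt(e_cur / e_prev_last)
    if root > thr7:
        q = root * thr7  # Q as worded in the claim
        s *= q
    return s
```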
18. A method for compensating frames after a voiced onset frame, the method comprising:
when the voiced onset frame is correctly received and one or more frames immediately following it are lost, inferring the pitch delay and adaptive codebook gain of the lost frames, and compensating the lost frames according to the inferred pitch delay and adaptive codebook gain;
for the first correctly received frame after the voiced onset frame, multiplying the adaptive codebook gain decoded for each subframe of that frame by the second scale factor of that subframe to obtain a new adaptive codebook gain for each subframe, and using the new adaptive codebook gain in place of the decoded one in speech synthesis.
19. The method according to claim 18, wherein the second scale factor of each subframe is calculated as follows:
multiplying the initial value of the second scale factor by the adaptive codebook gain decoded for the current subframe and then by the adaptive codebook of the current subframe, and using the resulting signal as the excitation signal of the current subframe;
performing speech pre-synthesis with the excitation signal, and calculating the signal energy of the current subframe from the pre-synthesized speech signal;
if the arithmetic square root of the ratio of the signal energy of the current subframe to the signal energy of the last subframe of the frame preceding the current frame exceeds a seventh threshold, updating the second scale factor to Q times the current second scale factor, where Q is the product of the arithmetic square root and the seventh threshold.
20. The method according to claim 19, wherein,
before multiplying the initial value of the second scale factor by the adaptive codebook gain decoded for the current subframe, the method further comprises:
if the absolute value of the difference between the inferred pitch delay of the lost frame preceding the current frame and the pitch delay decoded for the first subframe of the current frame is greater than an eighth threshold, recalculating a new second scale factor as a linearly increasing function of the pitch-synchronous autocorrelation coefficient of the voiced onset frame, and using the new second scale factor in place of the initial value of the second scale factor.
21. The method according to claim 18, 19 or 20, wherein
inferring the pitch delay and adaptive codebook gain of the lost frames comprises: when the first frame immediately following the voiced onset frame is lost, inferring the pitch delay and adaptive codebook gain of that first lost frame using the method according to any one of claims 1 to 10; or
when the first frame immediately following the voiced onset frame is lost and one or more frames immediately following the first lost frame are also lost, inferring the pitch delay and adaptive codebook gain of the first lost frame using the method according to any one of claims 1 to 10, and inferring the pitch delay and adaptive codebook gain of the one or more lost frames following the first lost frame using the method according to any one of claims 11 to 17.
22. A device for compensating frame loss after a voiced onset frame, the device comprising a first pitch delay compensation module, a first adaptive codebook gain compensation module and a first compensation module, wherein:
the first pitch delay compensation module is configured to: when the voiced onset frame is correctly received and the first frame immediately following it is lost, select a pitch delay inference mode according to the stability condition of the voiced onset frame and infer the pitch delay of the first lost frame with that mode;
the first adaptive codebook gain compensation module is configured to: infer the adaptive codebook gain of the first lost frame from the adaptive codebook gains of one or more subframes received before the first lost frame, or from the energy variation of the time-domain speech signal of the voiced onset frame;
the first compensation module is configured to: compensate the first lost frame according to the inferred pitch delay and adaptive codebook gain.
23. The compensation device according to claim 22, wherein
the first pitch delay compensation module is configured to select the pitch delay inference mode according to the stability condition of the voiced onset frame and infer the pitch delay of the first lost frame as follows:
if the voiced onset frame satisfies any one of the following conditions, the integer part of the pitch delay of the last subframe of the voiced onset frame is used as the inferred pitch delay of each subframe of the first lost frame;
if the voiced onset frame satisfies none of the following conditions, the integer part of the pitch delay of the last subframe of the voiced onset frame is corrected with a first correction amount to obtain a first corrected value, and the first corrected value is used as the inferred pitch delay of each subframe of the first lost frame;
the conditions are:
the pitch-synchronous autocorrelation coefficient of the voiced onset frame is greater than a first threshold;
the adaptive codebook gain of the last subframe of the voiced onset frame is greater than a second threshold, and the adaptive codebook gain of the second-to-last subframe of the voiced onset frame is greater than a third threshold;
the integer parts of the pitch delays of the last subframe and the second-to-last subframe of the voiced onset frame are equal.
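The stability test above can be sketched as follows. The threshold values are placeholders, and applying the first correction amount by addition is an assumption (it matches the addition used for the intermediate pitch delay in claim 28).

```python
from typing import Sequence


def infer_first_lost_pitch(pitch_int: Sequence[int], gains: Sequence[float],
                           pitch_corr: float, first_correction: float,
                           thr1: float = 0.7, thr2: float = 0.8, thr3: float = 0.8,
                           n_subframes: int = 4) -> list:
    """pitch_int / gains: per-subframe integer pitch delays and adaptive codebook
    gains of the voiced onset frame, oldest first (at least two subframes)."""
    t_last = pitch_int[-1]
    stable = (
        pitch_corr > thr1                              # condition 1
        or (gains[-1] > thr2 and gains[-2] > thr3)     # condition 2
        or pitch_int[-1] == pitch_int[-2]              # condition 3
    )
    # Any condition met: reuse the last integer pitch delay.
    # No condition met: use the first corrected value (addition is assumed).
    value = t_last if stable else t_last + first_correction
    return [value] * n_subframes
```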
24. The compensation device according to claim 23, wherein
the compensation device further comprises a first correction amount calculation module configured to obtain the first correction amount, the first correction amount calculation module comprising an elimination unit, a correction factor calculation unit, a first scale factor calculation unit and a first correction amount calculation unit, wherein: the elimination unit is configured to eliminate pitch delay multiples in the two or more subframes preceding the first lost frame, taking the last subframe before the first lost frame as the reference;
the correction factor calculation unit is configured to determine the pitch delay correction factor as the standard deviation of the integer parts of the pitch delays of the two or more subframes preceding the first lost frame after multiple elimination;
the first scale factor calculation unit is configured to determine the first scale factor of the pitch delay as 1 minus the ratio of the correction factor to the integer part of the pitch delay of the last subframe of the voiced onset frame;
the first correction amount calculation unit is configured to compute the first correction amount as the product of the correction factor and the first scale factor.
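The first correction amount of claim 24 reduces to a few arithmetic steps. This sketch assumes the multiples have already been eliminated and uses the population standard deviation (`statistics.pstdev`); whether the population or the sample form is intended is not stated in the claim.

```python
import statistics
from typing import Sequence


def first_correction_amount(pitch_eliminated: Sequence[float]) -> float:
    """pitch_eliminated: integer pitch delays of two or more subframes before the
    first lost frame after multiple elimination, oldest first; the last element
    is T_{-1}, the pitch delay of the onset frame's last subframe."""
    t_last = pitch_eliminated[-1]
    # Correction factor: standard deviation (population form assumed).
    sigma = statistics.pstdev(pitch_eliminated)
    # First scale factor: 1 - sigma / T_{-1}.
    scale = 1.0 - sigma / t_last
    # First correction amount: product of the correction factor and scale factor.
    return sigma * scale
```

A perfectly steady pitch track yields a zero correction amount, while a scattered track yields a larger one, damped slightly by the scale factor.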
25. The compensation device according to claim 24, wherein
the elimination unit is configured to eliminate pitch delay multiples in the two or more subframes preceding the first lost frame, taking the last subframe before the first lost frame as the reference, as follows: first set $T'_{-1} = T_{-1}$, where $T'$ denotes a pitch delay after multiple elimination and $T_{-1}$ is the integer part of the pitch delay of the last subframe of the voiced onset frame; then, for each $i = -2, -3, \ldots, -M$, if $T_i \le T_{-1}$, the elimination unit takes as $T'_i$ whichever of $T_i$ and $2T_i$ has the smaller absolute difference from $T_{-1}$, and if $T_i > T_{-1}$, it takes as $T'_i$ whichever of $T_i$ and $T_i/2$ has the smaller absolute difference from $T_{-1}$, where $M$ is the number of subframes before the first lost frame on which the elimination is performed.
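A minimal sketch of the multiple elimination in claim 25, assuming the pitch delays are listed oldest first so that the last element is $T_{-1}$. Ties are resolved in favour of keeping $T_i$ unchanged, which the claim does not specify.

```python
from typing import Sequence


def eliminate_pitch_multiples(pitch_int: Sequence[int]) -> list:
    """pitch_int: integer pitch delays T_{-M}, ..., T_{-2}, T_{-1} (oldest first)."""
    ref = pitch_int[-1]                      # T'_{-1} = T_{-1}
    result = []
    for t in pitch_int[:-1]:                 # i = -M, ..., -2
        # Doubling candidate below the reference, halving candidate above it.
        candidates = (t, 2 * t) if t <= ref else (t, t / 2)
        # min() returns the first candidate (t itself) on a tie.
        result.append(min(candidates, key=lambda c: abs(c - ref)))
    result.append(ref)
    return result
```

For example, with $T_{-1} = 60$ a half-pitch value of 30 is mapped to 60 and a double-pitch value of 118 is mapped to 59, so the later standard deviation reflects genuine pitch drift rather than octave errors.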
26. The compensation device according to claim 23, wherein
the first adaptive codebook gain compensation module is configured to infer the adaptive codebook gain of the first lost frame, from the adaptive codebook gains of one or more subframes received before the first lost frame or from the energy variation of the time-domain speech signal of the voiced onset frame, as follows:
if condition one is satisfied, namely that the difference between the logarithmic energy within the pitch period of the voiced onset frame and the logarithmic energy within the long-term pitch period is less than a fourth threshold, the attenuated median of the adaptive codebook gains of the one or more subframes preceding the first lost frame is used as the inferred adaptive codebook gain of each subframe of the first lost frame;
if condition one is not satisfied but condition two is, namely that the adaptive codebook gain of the last subframe of the voiced onset frame lies within a predetermined range, the attenuated value of that gain is used as the inferred adaptive codebook gain of each subframe of the first lost frame;
if neither condition one nor condition two is satisfied, the energy ratios $R_{LT}$ and $R_{ST}$ are calculated, and the weighted average of the attenuated $R_{LT}$ and $R_{ST}$ is used as the inferred adaptive codebook gain of each subframe of the first lost frame, where $R_{LT}$ is the ratio of the energy of the decoder-synthesized time-domain speech signal of the voiced onset frame excluding its first pitch period to its energy excluding its last pitch period, and $R_{ST}$ is the ratio of the energy of the last pitch period of that signal to the energy of the pitch period immediately preceding it, the pitch period being no longer than half the frame length.
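The three-branch gain inference of claim 26, together with the $R_{LT}$ and $R_{ST}$ energy ratios, might look like the sketch below. The attenuation factor, the predetermined gain range, the weighting and the way the log-energy difference is obtained (passed in precomputed) are all assumptions. The optional `pitch_doubled` argument covers the claim 29 variant, in which a valid frequency-doubling flag also triggers the first branch.

```python
import statistics
from typing import Sequence, Tuple


def _energy(x: Sequence[float]) -> float:
    return sum(v * v for v in x)


def energy_ratios(synth: Sequence[float], pitch: int) -> Tuple[float, float]:
    """R_LT and R_ST of the decoder-synthesized onset frame (pitch <= half the frame)."""
    n = len(synth)
    if not 0 < pitch <= n // 2:
        raise ValueError("pitch period must be in (0, frame_length / 2]")
    # R_LT: energy without the first pitch period / energy without the last one.
    r_lt = _energy(synth[pitch:]) / _energy(synth[:n - pitch])
    # R_ST: last pitch period / the pitch period just before it.
    r_st = _energy(synth[n - pitch:]) / _energy(synth[n - 2 * pitch:n - pitch])
    return r_lt, r_st


def infer_first_lost_gain(prev_gains: Sequence[float], log_energy_diff: float,
                          synth: Sequence[float], pitch: int, thr4: float,
                          gain_range: Tuple[float, float] = (0.5, 1.0),
                          atten: float = 0.9, w_lt: float = 0.5,
                          pitch_doubled: bool = False) -> float:
    """prev_gains: adaptive codebook gains of subframes before the first lost frame,
    oldest first; the last one belongs to the onset frame's last subframe."""
    # Condition one (claim 29 also accepts a valid frequency-doubling flag).
    if log_energy_diff < thr4 or pitch_doubled:
        return atten * statistics.median(prev_gains)
    # Condition two: last-subframe gain inside the predetermined range.
    lo, hi = gain_range
    if lo <= prev_gains[-1] <= hi:
        return atten * prev_gains[-1]
    # Neither: weighted average of the attenuated energy ratios.
    r_lt, r_st = energy_ratios(synth, pitch)
    return w_lt * (atten * r_lt) + (1.0 - w_lt) * (atten * r_st)
```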
27. The compensation device according to claim 23, wherein
the compensation device further comprises a pitch delay compensation correction module configured to: after the first corrected value is obtained, apply a second correction to the first corrected value and use the result as the final inferred pitch delay of each subframe of the first lost frame.
28. The compensation device according to claim 27, wherein
the pitch delay compensation correction module is configured to apply the second correction to the first corrected value as follows:
if both of the following conditions are satisfied, the integer part of the pitch delay of the last subframe of the voiced onset frame is taken as the intermediate pitch delay. Condition 1: the absolute difference between the first corrected value and the integer part of the pitch delay of the last subframe of the voiced onset frame is greater than a fifth threshold. Condition 2: the absolute difference between the integer parts of the pitch delays of the last subframe and the second-to-last subframe of the voiced onset frame is less than a sixth threshold, where 0 < sixth threshold < fifth threshold. If either condition is not satisfied, the pitch delay compensation correction module takes as the intermediate pitch delay the integer part of the pitch delay of the last subframe of the voiced onset frame plus the smaller of the first correction amount and the fifth threshold;
if the intermediate pitch delay is greater than X times (X > 1) the pitch delay of the most recently correctly received voiced frame with a stable pitch delay, the intermediate pitch delay multiplied by 2 is taken as the result of the second correction and the frequency-doubling flag is set to valid; otherwise, the intermediate pitch delay itself is taken as the result of the second correction and the frequency-doubling flag is set to invalid.
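The second correction of claim 28 can be sketched as follows. The default values of X and of the fifth and sixth thresholds are placeholders chosen only to satisfy the constraints stated in the claim.

```python
from typing import Tuple


def second_correction(t_corrected: float, t_last: int, t_second_last: int,
                      first_correction: float, t_stable_voiced: float,
                      thr5: float = 10.0, thr6: float = 3.0,
                      x: float = 1.5) -> Tuple[float, bool]:
    """Returns (final pitch delay estimate, frequency-doubling flag)."""
    if not 0 < thr6 < thr5:
        raise ValueError("the claim requires 0 < sixth threshold < fifth threshold")
    if x <= 1:
        raise ValueError("X must be greater than 1")
    cond1 = abs(t_corrected - t_last) > thr5
    cond2 = abs(t_last - t_second_last) < thr6
    if cond1 and cond2:
        intermediate = t_last
    else:
        intermediate = t_last + min(first_correction, thr5)
    # Compare with the most recent correctly received voiced frame whose
    # pitch delay was stable.
    if intermediate > x * t_stable_voiced:
        return 2 * intermediate, True
    return intermediate, False
```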
29. The compensation device according to claim 28, wherein
the first adaptive codebook gain compensation module is configured to infer the adaptive codebook gain of the first lost frame, from the adaptive codebook gains of one or more subframes received before the first lost frame or from the energy variation of the time-domain speech signal of the voiced onset frame, as follows:
if condition one is satisfied, namely that the difference between the logarithmic energy within the pitch period of the voiced onset frame and the logarithmic energy within the long-term pitch period is less than a fourth threshold, or the frequency-doubling flag set during pitch delay inference is valid, the attenuated median of the adaptive codebook gains of the one or more subframes preceding the first lost frame is used as the inferred adaptive codebook gain of each subframe of the first lost frame;
if condition one is not satisfied but condition two is, namely that the adaptive codebook gain of the last subframe of the voiced onset frame lies within a predetermined range, the attenuated value of that gain is used as the inferred adaptive codebook gain of each subframe of the first lost frame;
if neither condition one nor condition two is satisfied, the energy ratios $R_{LT}$ and $R_{ST}$ are calculated, and the weighted average of the attenuated $R_{LT}$ and $R_{ST}$ is used as the inferred adaptive codebook gain of each subframe of the first lost frame, where $R_{LT}$ is the ratio of the energy of the decoder-synthesized time-domain speech signal of the voiced onset frame excluding its first pitch period to its energy excluding its last pitch period, and $R_{ST}$ is the ratio of the energy of the last pitch period of that signal to the energy of the pitch period immediately preceding it, the pitch period being no longer than half the frame length.
30. The compensation device according to claim 22, 26 or 29, wherein
the compensation device further comprises a second pitch delay compensation module, a second adaptive codebook gain compensation module and a second compensation module, wherein:
the second pitch delay compensation module is configured to: for each of the one or more lost frames immediately following the first lost frame, use the inferred pitch delay of the lost frame preceding the current lost frame as the pitch delay of the current lost frame;
the second adaptive codebook gain compensation module is configured to: attenuate and interpolate the inferred adaptive codebook gain of the last subframe of the preceding lost frame, and use the resulting values as the adaptive codebook gains of the subframes of the current lost frame;
the second compensation module is configured to: compensate the lost frame according to the inferred pitch delay and adaptive codebook gains.
31. The compensation device according to claim 30, wherein
the second adaptive codebook gain compensation module is configured to attenuate and interpolate the inferred adaptive codebook gain of the last subframe of the preceding lost frame, to obtain the adaptive codebook gains of the subframes of the current lost frame, as follows:
using the attenuated adaptive codebook gain of the last subframe of the preceding lost frame as the adaptive codebook gain of the last subframe of the current lost frame; and obtaining the adaptive codebook gains of the other subframes of the current lost frame by linear interpolation between the processed gain of the last subframe of the preceding lost frame and the adaptive codebook gain of the last subframe of the current lost frame, the processing serving to move the preceding frame's gain closer to 1.
32. The compensation device according to claim 31, wherein the processed gain is the arithmetic square root of the gain before processing.
33. The compensation device according to claim 22, wherein
the compensation device further comprises an adaptive codebook gain adjustment module and a third compensation module, wherein: the adaptive codebook gain adjustment module is configured to: for the first correctly received frame after the voiced onset frame, multiply the adaptive codebook gain decoded for each subframe of that frame by the second scale factor of that subframe to obtain a new adaptive codebook gain for each subframe;
the third compensation module is configured to: use the new adaptive codebook gain in place of the decoded one in speech synthesis.
34. The compensation device according to claim 30, wherein
the compensation device further comprises an adaptive codebook gain adjustment module and a third compensation module, wherein: the adaptive codebook gain adjustment module is configured to: for the first correctly received frame after the voiced onset frame, multiply the adaptive codebook gain decoded for each subframe of that frame by the second scale factor of that subframe to obtain a new adaptive codebook gain for each subframe;
the third compensation module is configured to: use the new adaptive codebook gain in place of the decoded one in speech synthesis.
35. The compensation device according to claim 33 or 34, wherein
the compensation device further comprises a second scale factor calculation module configured to calculate the second scale factor of each subframe, the module comprising an excitation signal acquisition unit, a pre-synthesis unit and a second scale factor generation unit, wherein:
the excitation signal acquisition unit is configured to: multiply the initial value of the second scale factor by the adaptive codebook gain decoded for the current subframe and then by the adaptive codebook of the current subframe, and use the resulting signal as the excitation signal of the current subframe;
the pre-synthesis unit is configured to: perform speech pre-synthesis with the excitation signal, and calculate the signal energy of the current subframe from the pre-synthesized speech signal;
the second scale factor generation unit is configured to: when the arithmetic square root of the ratio of the signal energy of the current subframe to the signal energy of the last subframe of the frame preceding the current frame exceeds a seventh threshold, update the second scale factor to Q times the current second scale factor, where Q is the product of the arithmetic square root and the seventh threshold.
36. The compensation device according to claim 35, wherein
the excitation signal acquisition unit is further configured to: before multiplying the initial value of the second scale factor by the adaptive codebook gain decoded for the current subframe, when the absolute value of the difference between the inferred pitch delay of the lost frame preceding the current frame and the pitch delay decoded for the first subframe of the current frame is greater than an eighth threshold, recalculate a new second scale factor as a linearly increasing function of the pitch-synchronous autocorrelation coefficient of the voiced onset frame, and use the new second scale factor in place of the initial value of the second scale factor.
37. A device for compensating frames after a voiced onset frame, the device comprising a compensation module and an adaptive codebook gain adjustment module, wherein: the compensation module is configured to: when the voiced onset frame is correctly received and one or more frames immediately following it are lost, infer the pitch delay and adaptive codebook gain of the lost frames, and compensate the lost frames according to the inferred pitch delay and adaptive codebook gain;
the adaptive codebook gain adjustment module is configured to: for the first correctly received frame after the voiced onset frame, multiply the adaptive codebook gain decoded for each subframe of that frame by the second scale factor of that subframe to obtain a new adaptive codebook gain for each subframe, and use the new adaptive codebook gain in place of the decoded one in speech synthesis.
38. The compensation device according to claim 37, wherein
the compensation device further comprises a second scale factor calculation module configured to calculate the second scale factor of each subframe, the module comprising an excitation signal acquisition unit, a pre-synthesis unit and a second scale factor generation unit, wherein:
the excitation signal acquisition unit is configured to: multiply the initial value of the second scale factor by the adaptive codebook gain decoded for the current subframe and then by the adaptive codebook of the current subframe, and use the resulting signal as the excitation signal of the current subframe;
the pre-synthesis unit is configured to: perform speech pre-synthesis with the excitation signal, and calculate the signal energy of the current subframe from the pre-synthesized speech signal;
the second scale factor generation unit is configured to: when the arithmetic square root of the ratio of the signal energy of the current subframe to the signal energy of the last subframe of the frame preceding the current frame exceeds a seventh threshold, update the second scale factor to Q times the current second scale factor, where Q is the product of the arithmetic square root and the seventh threshold.
39. The compensation device according to claim 38, wherein
the excitation signal acquisition unit is further configured to: before multiplying the initial value of the second scale factor by the adaptive codebook gain decoded for the current subframe, when the absolute value of the difference between the inferred pitch delay of the lost frame preceding the current frame and the pitch delay decoded for the first subframe of the current frame is greater than an eighth threshold, recalculate a new second scale factor as a linearly increasing function of the pitch-synchronous autocorrelation coefficient of the voiced onset frame, and use the new second scale factor in place of the initial value of the second scale factor.
PCT/CN2012/077356 2011-07-31 2012-06-21 Compensation method and device for frame loss after voiced initial frame WO2013016986A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201110216736.9A CN102915737B (en) 2011-07-31 2011-07-31 Compensation method and device for frame loss after a voiced onset frame
CN201110216736.9 2011-07-31

Publications (1)

Publication Number Publication Date
WO2013016986A1 2013-02-07

Family

ID=47614075

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/077356 WO2013016986A1 (en) 2011-07-31 2012-06-21 Compensation method and device for frame loss after voiced initial frame

Country Status (2)

Country Link
CN (1) CN102915737B (en)
WO (1) WO2013016986A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107818789B (en) 2013-07-16 2020-11-17 华为技术有限公司 Decoding method and decoding device
CN108364657B (en) 2013-07-16 2020-10-30 超清编解码有限公司 Method and decoder for processing lost frames
CN104978966B (en) * 2014-04-04 2019-08-06 腾讯科技(深圳)有限公司 Method and device for implementing frame loss compensation in an audio stream
CN105225666B (en) 2014-06-25 2016-12-28 华为技术有限公司 Method and apparatus for processing lost frames
CN113838453B (en) * 2021-08-17 2022-06-28 北京百度网讯科技有限公司 Voice processing method, device, equipment and computer storage medium

Citations (6)

Publication number Priority date Publication date Assignee Title
US6826527B1 (en) * 1999-11-23 2004-11-30 Texas Instruments Incorporated Concealment of frame erasures and method
CN1989548A (en) * 2004-07-20 2007-06-27 松下电器产业株式会社 Audio decoding device and compensation frame generation method
US20080154588A1 (en) * 2006-12-26 2008-06-26 Yang Gao Speech Coding System to Improve Packet Loss Concealment
CN101286319A (en) * 2006-12-26 2008-10-15 高扬 Speech coding system to improve packet loss repairing quality
CN101894558A (en) * 2010-08-04 2010-11-24 华为技术有限公司 Lost frame recovering method and equipment as well as speech enhancing method, equipment and system
CN102122511A (en) * 2007-11-05 2011-07-13 华为技术有限公司 Signal processing method and device as well as voice decoder

Cited By (26)

Publication number Priority date Publication date Assignee Title
US11031020B2 (en) 2014-03-21 2021-06-08 Huawei Technologies Co., Ltd. Speech/audio bitstream decoding method and apparatus
CN107369455A (en) * 2014-03-21 2017-11-21 华为技术有限公司 Speech/audio bitstream decoding method and device
US11832053B2 (en) 2015-04-30 2023-11-28 Shure Acquisition Holdings, Inc. Array microphone system and method of assembling the same
US11310592B2 (en) 2015-04-30 2022-04-19 Shure Acquisition Holdings, Inc. Array microphone system and method of assembling the same
US11678109B2 (en) 2015-04-30 2023-06-13 Shure Acquisition Holdings, Inc. Offset cartridge microphones
US11477327B2 (en) 2017-01-13 2022-10-18 Shure Acquisition Holdings, Inc. Post-mixing acoustic echo cancellation systems and methods
US10997982B2 (en) 2018-05-31 2021-05-04 Shure Acquisition Holdings, Inc. Systems and methods for intelligent voice activation for auto-mixing
US11798575B2 (en) 2018-05-31 2023-10-24 Shure Acquisition Holdings, Inc. Systems and methods for intelligent voice activation for auto-mixing
US11800281B2 (en) 2018-06-01 2023-10-24 Shure Acquisition Holdings, Inc. Pattern-forming microphone array
US11523212B2 (en) 2018-06-01 2022-12-06 Shure Acquisition Holdings, Inc. Pattern-forming microphone array
US11297423B2 (en) 2018-06-15 2022-04-05 Shure Acquisition Holdings, Inc. Endfire linear array microphone
US11770650B2 (en) 2018-06-15 2023-09-26 Shure Acquisition Holdings, Inc. Endfire linear array microphone
US11310596B2 (en) 2018-09-20 2022-04-19 Shure Acquisition Holdings, Inc. Adjustable lobe shape for array microphones
US11438691B2 (en) 2019-03-21 2022-09-06 Shure Acquisition Holdings, Inc. Auto focus, auto focus within regions, and auto placement of beamformed microphone lobes with inhibition functionality
US11558693B2 (en) 2019-03-21 2023-01-17 Shure Acquisition Holdings, Inc. Auto focus, auto focus within regions, and auto placement of beamformed microphone lobes with inhibition and voice activity detection functionality
US11303981B2 (en) 2019-03-21 2022-04-12 Shure Acquisition Holdings, Inc. Housings and associated design features for ceiling array microphones
US11778368B2 (en) 2019-03-21 2023-10-03 Shure Acquisition Holdings, Inc. Auto focus, auto focus within regions, and auto placement of beamformed microphone lobes with inhibition functionality
US11445294B2 (en) 2019-05-23 2022-09-13 Shure Acquisition Holdings, Inc. Steerable speaker array, system, and method for the same
US11800280B2 (en) 2019-05-23 2023-10-24 Shure Acquisition Holdings, Inc. Steerable speaker array, system and method for the same
US11302347B2 (en) 2019-05-31 2022-04-12 Shure Acquisition Holdings, Inc. Low latency automixer integrated with voice and noise activity detection
US11688418B2 (en) 2019-05-31 2023-06-27 Shure Acquisition Holdings, Inc. Low latency automixer integrated with voice and noise activity detection
US11750972B2 (en) 2019-08-23 2023-09-05 Shure Acquisition Holdings, Inc. One-dimensional array microphone with improved directivity
US11297426B2 (en) 2019-08-23 2022-04-05 Shure Acquisition Holdings, Inc. One-dimensional array microphone with improved directivity
US11552611B2 (en) 2020-02-07 2023-01-10 Shure Acquisition Holdings, Inc. System and method for automatic adjustment of reference gain
US11706562B2 (en) 2020-05-29 2023-07-18 Shure Acquisition Holdings, Inc. Transducer steering and configuration systems and methods using a local positioning system
US11785380B2 (en) 2021-01-28 2023-10-10 Shure Acquisition Holdings, Inc. Hybrid audio beamforming system

Also Published As

Publication number Publication date
CN102915737B (en) 2018-01-19
CN102915737A (en) 2013-02-06

Similar Documents

Publication Publication Date Title
WO2013016986A1 (en) Compensation method and device for frame loss after voiced initial frame
US10643624B2 (en) Apparatus and method for improved concealment of the adaptive codebook in ACELP-like concealment employing improved pulse resynchronization
US7519535B2 (en) Frame erasure concealment in voice communications
US9263049B2 (en) Artifact reduction in packet loss concealment
US11367453B2 (en) Apparatus and method for generating an error concealment signal using power compensation
US11410663B2 (en) Apparatus and method for improved concealment of the adaptive codebook in ACELP-like concealment employing improved pitch lag estimation
US20090037168A1 (en) Apparatus for Improving Packet Loss, Frame Erasure, or Jitter Concealment
KR101692659B1 (en) Comfort noise addition for modeling background noise at low bit-rates
WO2013060223A1 (en) Frame loss compensation method and apparatus for voice frame signal
WO2007143953A1 (en) Device and method for lost frame concealment
WO2017166800A1 (en) Frame loss compensation processing method and device
JP7167109B2 (en) Apparatus and method for generating error hidden signals using adaptive noise estimation
US20200273466A1 (en) Apparatus and method for generating an error concealment signal using individual replacement LPC representations for individual codebook information
EP2983171B1 (en) Decoding method and decoding device
CN109496333A Frame loss compensation method and device
CN106898356B (en) Packet loss concealment method and device for Bluetooth voice calls, and Bluetooth voice processing chip
JP3754819B2 (en) Voice communication method and voice communication apparatus

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
Ref document number: 12819832; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 12819832; Country of ref document: EP; Kind code of ref document: A1