JP2001154699A - Frame erasure concealment and method therefor

Info

Publication number: JP2001154699A
Application number: JP2000356459A
Authority: JP
Inventors: Takahiro Unno; ウンノタカヒロ
Original assignee: Texas Instruments Inc
Current assignee: Texas Instruments Inc
Priority date: 1999-11-23
Filing date: 2000-11-22
Publication date: 2001-06-08
Also published as: EP1103953A2; EP1103953B1; DE60030069T2; DE60030069D1; ATE336780T1; EP1103953A3

Abstract

PROBLEM TO BE SOLVED: To provide a method of concealing erased frames by frame repetition with improved performance. SOLUTION: The method uses a decoder for frames that are code-excited LP-encoded with excitation from both an adaptive codebook and a fixed codebook. To conceal erased frames, it uses a repeated excitation with muting, a repeated synthesis filter with threshold-adaptive bandwidth expansion, and a repeated pitch delay (T(m)) with jitter.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

TECHNICAL FIELD OF THE INVENTION The present invention relates to electronic devices, and more particularly to methods and circuits for speech coding, transmission, storage, and decoding/synthesis.

[0002]

PRIOR ART In current and foreseeable digital communications, the performance of digital speech systems using low bit rates has become increasingly important. Both dedicated-channel and packetized-over-network (e.g., voice over IP or voice over packet) transmissions benefit from compression of speech signals. The widely used linear prediction (LP) digital speech compression methods mimic human speech by modeling the vocal tract as a time-varying filter together with a time-varying excitation of that filter. Linear prediction analysis determines the LP coefficients a_i, i = 1, 2, ..., M, for an input frame of digital speech samples {s(n)} by setting

$$ r(n) = s(n) - \sum_{i=1}^{M} a_i \, s(n-i) \qquad (1) $$

and minimizing the energy Σ r(n)² of the residual r(n) in the frame. Typically, the order M of the linear prediction filter is about 10 to 12; the sampling rate used to form the samples s(n) is typically 8 kHz (the same as public switched telephone network sampling for digital transmission); and the number of samples {s(n)} in a frame is typically 80 or 160 (10 or 20 ms frames). A frame of samples may be generated by applying various windowing operations to the input speech samples. The name "linear prediction" arises from interpreting r(n) = s(n) - Σ a_i s(n-i) as the error in predicting s(n) by a linear combination of preceding speech samples, Σ a_i s(n-i). Thus, minimizing Σ r(n)² yields the {a_i} that give the best linear prediction for the frame. The coefficients {a_i} may be converted to line spectral frequencies (LSFs) for quantization and transmission or storage, and converted to line spectral pairs (LSPs) for interpolation between subframes.

[0003] {r(n)} is the LP residual for the frame. Ideally, the LP residual would be the excitation for the synthesis filter 1/A(z), where A(z) is the transfer function of equation (1). Of course, the LP residual is not available at the decoder. The task of the encoder is therefore to represent the LP residual so that the decoder can generate, from the encoded parameters, an excitation that emulates the LP residual. Physiologically, for voiced frames the excitation roughly takes the form of a series of pulses at the pitch frequency, and for unvoiced frames it roughly takes the form of white noise.
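The relationship between the residual and the synthesis filter can be sketched in a few lines. This is only an illustration, assuming the convention r(n) = s(n) - Σ a_i s(n-i) with samples before the frame taken as zero; the function names and the coefficient values are hypothetical and not from the patent.

```python
def lp_residual(s, a):
    """Analysis filter A(z): r(n) = s(n) - sum_{i=1..M} a[i-1] * s(n-i).

    Samples before the start of the frame are treated as zero (assumption).
    """
    M = len(a)
    r = []
    for n in range(len(s)):
        pred = sum(a[i - 1] * s[n - i] for i in range(1, M + 1) if n - i >= 0)
        r.append(s[n] - pred)
    return r


def lp_synthesize(excitation, a):
    """Synthesis filter 1/A(z): s(n) = e(n) + sum_{i=1..M} a[i-1] * s(n-i)."""
    M = len(a)
    s = []
    for n, e in enumerate(excitation):
        pred = sum(a[i - 1] * s[n - i] for i in range(1, M + 1) if n - i >= 0)
        s.append(e + pred)
    return s


frame = [0.0, 1.0, 0.5, -0.25, -0.75, 0.1, 0.6, 0.2]
coeffs = [0.9, -0.2]  # illustrative a_1, a_2 only
residual = lp_residual(frame, coeffs)
rebuilt = lp_synthesize(residual, coeffs)
```

Driving the synthesis filter with the exact residual reproduces the frame, which is why the decoder only needs an excitation that resembles the residual.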

[0004] The LP compression approach basically transmits/stores only updates for the (quantized) filter coefficients, the (quantized) residual (a waveform, or parameters such as pitch), and the (quantized) gain(s). A receiver decodes the transmitted/stored items and regenerates the input speech with the same perceptual characteristics. FIGS. 5 and 6 show the high-level blocks of an LP system. Because periodic updating of the quantized items requires fewer bits than a direct representation of the speech signal, a reasonable LP coder can operate at bit rates as low as 2 to 3 kb/s (kilobits per second).

[0005] However, the high error rates of wireless transmission and the large packet losses/delays of network transmission require an LP decoder to handle frames in which so many bits are corrupted that the frame is ignored (erased). To maintain speech quality and intelligibility for wireless or voice-over-packet applications when frames are erased, the decoder typically has a method to conceal such frame erasures. These methods can be categorized as interpolation-based or repetition-based. An interpolation-based concealment method interpolates the missing parameters using both future and past frame parameters. In general, interpolation-based concealment approximates the speech signal of the missing frame better than repetition-based concealment, which uses only past frame parameters. In applications such as wireless communications, interpolation-based concealment comes at the cost of additional delay to acquire the future frame. In voice-over-packet communications, future frames are available from the playout buffer, which compensates for packet arrival jitter; there, interpolation-based concealment mainly increases the size of the playout buffer. Repetition-based concealment, which simply repeats or modifies past frame parameters, is used in several CELP-based speech coders, including G.729, G.723.1, and GSM-EFR. The repetition-based concealment in these coders does not increase the delay or the playout buffer size, but the quality of the reconstructed speech in the presence of erased frames is poorer than with the interpolation-based approach, especially in environments with a high rate of erased frames or with bursty frame erasures.

[0006] In more detail, the ITU standard G.729 uses frames of 10 ms length (80 samples). To improve tracking of the pitch and gain parameters and to reduce codebook search complexity, each 10 ms frame is divided into two subframes of 5 ms (40 samples) each. Each subframe has an excitation represented by an adaptive codebook contribution and a fixed (algebraic) codebook contribution. The adaptive codebook contribution provides periodicity in the excitation; it is the product of a gain g_p and v(n), the excitation of the prior frame translated by the pitch lag of the current frame and interpolated. The algebraic codebook contribution approximates the difference between the actual residual and the adaptive codebook contribution with the product of a four-pulse vector c(n) and a gain g_c. Thus the excitation is u(n) = g_p v(n) + g_c c(n), where v(n) comes from the prior (decoded) frame, and g_p, g_c, and c(n) come from the parameters transmitted for the current frame. FIGS. 3 and 4 illustrate the encoding and decoding in block format. The postfilter essentially emphasizes any periodicity (e.g., vowels).
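The excitation formula u(n) = g_p v(n) + g_c c(n) can be illustrated with a small sketch. It assumes an integer pitch delay and omits the fractional-delay interpolation that a G.729-type coder applies; the helper names and the sample values are hypothetical.

```python
def adaptive_vector(past_excitation, delay, length):
    """Build v(n) = u(n - delay) from the past excitation (integer delay only).

    When delay < length, the vector repeats itself periodically because the
    samples just generated are reused. Requires 1 <= delay <= len(past_excitation).
    """
    buf = list(past_excitation)
    start = len(buf)
    for n in range(length):
        buf.append(buf[start + n - delay])
    return buf[start:]


def build_excitation(v, c, g_p, g_c):
    """u(n) = g_p * v(n) + g_c * c(n)."""
    return [g_p * vn + g_c * cn for vn, cn in zip(v, c)]


past = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # illustrative past excitation
v = adaptive_vector(past, delay=3, length=4)
c = [0, 1, 0, -1]                        # illustrative pulse vector
u = build_excitation(v, c, g_p=0.5, g_c=2.0)
```

With a delay of 3 the adaptive vector is [4, 5, 6, 4], so the periodic part of the new excitation is a scaled copy of what was generated three samples earlier.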

[0007] G.729 handles frame erasures by reconstruction based on previously received information, that is, by repetition-based concealment. Namely, it replaces the missing excitation signal with one of similar characteristics while gradually decaying its energy, using a voicing classifier based on the long-term prediction gain (which is computed as part of the long-term postfilter analysis). The long-term postfilter finds the long-term predictor for which the prediction gain is more than 3 dB by using a normalized correlation greater than 0.5 in the optimal delay determination. For the error concealment process, a 10 ms frame is declared periodic if at least one 5 ms subframe has a long-term prediction gain of more than 3 dB; otherwise the frame is declared nonperiodic. An erased frame inherits its class from the preceding (reconstructed) speech frame. Note that the voicing classification is continuously updated based on this reconstructed speech signal. The specific steps taken for an erased frame are as follows.

[0008] 1) Repetition of the synthesis filter parameters. The LP parameters of the last good frame are used.

[0009] 2) Attenuation of the adaptive and fixed codebook gains. The adaptive codebook gain is based on an attenuated version of the previous adaptive codebook gain: if the (m+1)-th frame is erased, use g_p^(m+1) = 0.9 g_p^(m). Similarly, the fixed codebook gain is based on an attenuated version of the previous fixed codebook gain: g_c^(m+1) = 0.98 g_c^(m).
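A brief sketch shows how quickly these gains decay over a burst. It assumes the same attenuation is applied once per erased frame throughout the burst; the function name and starting gains are illustrative only.

```python
def g729_attenuated_gains(g_p, g_c, erased_frames):
    """Return the (g_p, g_c) pair used for each successive erased frame."""
    gains = []
    for _ in range(erased_frames):
        g_p *= 0.9    # adaptive codebook gain: g_p(m+1) = 0.9 * g_p(m)
        g_c *= 0.98   # fixed codebook gain:    g_c(m+1) = 0.98 * g_c(m)
        gains.append((g_p, g_c))
    return gains


burst = g729_attenuated_gains(1.0, 1.0, erased_frames=3)
```

After three erased frames the adaptive gain has dropped to about 0.73 of its starting value while the fixed gain is still about 0.94, which is the in-loop attenuation that the preferred embodiments replace with muting outside the feedback loop.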

[0010] 3) Attenuation of the memory of the gain predictor. The gain predictor for the fixed codebook gain uses the energy of the previously selected algebraic codebook vectors c(n). To avoid transitional effects once good frames are received, the memory of the gain predictor is updated with an attenuated version of the average codebook energy over the four prior frames.

[0011] 4) Generation of the replacement excitation. The excitation used depends on the periodicity classification. If the last reconstructed frame was classified as periodic, the current frame is considered periodic as well. In that case only the adaptive codebook contribution is used, and the fixed codebook contribution is set to zero. The pitch delay is based on the integer part of the pitch delay of the previous frame and is repeated for each successive frame. To avoid excessive periodicity, the pitch delay value is increased by one for each subsequent frame, up to a limit of 143. In contrast, if the last reconstructed frame was classified as nonperiodic, the current frame is considered nonperiodic as well, and the adaptive codebook contribution is set to zero. The fixed codebook contribution is then generated by randomly selecting a codebook index and a sign index. The use of a classification allows different decay factors for the two types of excitation (e.g., 0.9 for periodic and 0.98 for nonperiodic gains). FIG. 2 illustrates a decoder with concealment parameters.
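The class-dependent choice of contributions and the incrementing pitch delay described in step 4) might be sketched as follows. This is a hedged illustration: the function names are hypothetical, and whether the first erased frame uses the unincremented integer delay is an assumption made here.

```python
def g729_concealment_delays(last_good_delay, erased_frames, max_delay=143):
    """Pitch delays for successive erased frames, G.729-style repetition."""
    delay = int(last_good_delay)   # integer part of the last good pitch delay
    delays = []
    for _ in range(erased_frames):
        delays.append(delay)
        delay = min(delay + 1, max_delay)   # +1 per frame, capped at 143
    return delays


def g729_contributions(periodic):
    """Return (use_adaptive_codebook, use_fixed_codebook) for an erased frame."""
    return (True, False) if periodic else (False, True)


delays = g729_concealment_delays(141.67, erased_frames=4)
```

Because each delay is derived from the previous concealment delay, any error in the last good delay estimate is carried and grown through the burst, which is the accumulation that the jittered delay of the preferred embodiments is meant to avoid.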

[0012] Leung et al., "Voice Frame Reconstruction Methods for CELP Speech Coders in Digital Cellular and Wireless Communications," Proc. Wireless 93 (July 1993), describes missing frame reconstruction using parametric extrapolation and interpolation for a low-complexity CELP coder using four subframes per frame. However, the results of the repetition-based concealment methods are poor.

[0013]

PROBLEM TO BE SOLVED BY THE INVENTION An object of the present invention is to provide a method of concealing erased frames by frame repetition with improved performance relative to existing repetition-based concealment methods.

[0014]

MEANS FOR SOLVING THE PROBLEM The present invention repeats frames together with one or more of the following: excitation signal muting, LP coefficient bandwidth expansion with a cutoff frequency, and pitch delay jittering.

[0015]

EMBODIMENTS OF THE INVENTION 1. Overview. The preferred-embodiment decoders and methods for concealing frame erasures in the transmission of CELP-encoded speech or other signals have one or more of the following three features: (1) muting the excitation outside of the feedback loop, which replaces the attenuation of the adaptive and fixed codebook gains; (2) expanding the bandwidth of the LP synthesis filter, with a threshold frequency selecting between differing expansion factors; and (3) jittering the pitch delay to avoid excessive periodicity in the repeated frames. Features (2) and (3) apply especially to bursty noise leading to frame erasures. FIG. 1 illustrates a preferred-embodiment decoder using all three concealment features; this contrasts with the concealment of the G.729 standard decoder illustrated in FIG. 2.

[0016] Preferred-embodiment systems (e.g., voice over IP or voice over packet) incorporate the preferred-embodiment concealment methods in their decoders.

[0017] 2. Encoder details. Some details of coding methods similar to G.729 are needed to explain the preferred embodiments. In particular, consider a speech encoder that uses LP coding with excitation contributions from both an adaptive codebook and an algebraic codebook; the preferred-embodiment concealment features affect the pitch delay, the codebook gains, and the LP synthesis filter. Encoding proceeds as follows.

[0018] (1) Sample an input speech signal (which may be preprocessed to filter out DC and low frequencies, etc.) at 8 kHz or 16 kHz to obtain a sequence of digital samples s(n). Partition the sample stream into frames, such as 80 samples or 160 samples (e.g., 10 ms frames) or another convenient size. The analysis and encoding may use subframes of various sizes or other intervals within the frames.

[0019] (2) For each frame (or subframe), apply linear prediction (LP) analysis to find the LP (and thus LSF/LSP) coefficients, and quantize the coefficients. In more detail, the LSFs are frequencies {f_1, f_2, f_3, ..., f_M} that increase monotonically between 0 and the Nyquist frequency (4 kHz or 8 kHz for sampling rates of 8 kHz or 16 kHz); that is, 0 < f_1 < f_2 < ... < f_M < f_samp/2, where M is the order of the linear prediction filter, typically in the range of 10 to 12. Quantize the LSFs for transmission/storage by vector quantizing the differences between the frequencies and fourth-order moving-average predictions of the frequencies.

[0020] (3) For each subframe, find a pitch delay by searching correlations of s(n) with s(n+k) over a windowed range; s(n) may be perceptually filtered prior to the search. The search may proceed in two stages: an open-loop search that uses correlations of s(n) to find a pitch delay, followed by a closed-loop search that refines the pitch delay by interpolation from maximizations of the normalized inner product <x|y> of the target speech x(n) in the (sub)frame with the speech y(n) generated by applying the (sub)frame's quantized LP synthesis filter to the prior (sub)frame's excitation. The pitch delay resolution may be a fraction of a sample, especially for smaller pitch delays. The adaptive codebook vector v(n) is then the prior (sub)frame's excitation translated by the refined pitch delay and interpolated.

[0021] (4) Determine the adaptive codebook gain g_p as the ratio of the inner product <x|y> to <y|y>, where x(n) is the target speech in the (sub)frame and y(n) is the (perceptually weighted) speech in the (sub)frame generated by applying the quantized LP synthesis filter to the adaptive codebook vector v(n) from step (3). Thus g_p v(n) is the adaptive codebook contribution to the excitation, and g_p y(n) is the adaptive codebook contribution to the speech in the (sub)frame.
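The gain in step (4) is a ratio of two inner products, which a short sketch makes concrete. The function names and test vectors are illustrative, and the zero-energy guard is an added assumption rather than part of the source.

```python
def inner(a, b):
    """Inner product <a|b> of two equal-length sample sequences."""
    return sum(x * y for x, y in zip(a, b))


def adaptive_codebook_gain(x, y):
    """g_p = <x|y> / <y|y>; returns 0.0 if y has no energy (assumed guard)."""
    energy = inner(y, y)
    return inner(x, y) / energy if energy > 0.0 else 0.0


x = [2.0, 4.0, 6.0]   # illustrative target speech
y = [1.0, 2.0, 3.0]   # illustrative filtered adaptive codebook vector
g_p = adaptive_codebook_gain(x, y)
```

Here the target is exactly twice the filtered vector, so the gain that best matches g_p y(n) to x(n) in the least-squares sense is 2.0.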

[0022] (5) For each (sub)frame, find the algebraic codebook vector c(n) by essentially maximizing the normalized correlation of c(n), filtered by the quantized LP synthesis filter, with x(n) - g_p y(n) as the target speech in the (sub)frame; that is, remove the adaptive codebook contribution to obtain a new target. In particular, search over the possible algebraic codebook vectors c(n) to maximize the ratio of the square of the correlation <x - g_p y|H|c> to the energy <c|H^T H|c>, where h(n) is the impulse response of the quantized LP synthesis filter (with perceptual filtering) and H is the lower triangular Toeplitz convolution matrix with diagonals h(0), h(1), .... When (sub)frames of 40 samples (5 ms) are used as the encoding granularity, the vector c(n) has 40 positions. The 40 samples are partitioned into four interleaved tracks, with one pulse placed in each track. Three of the tracks have 8 samples each, and one track has 16 samples.
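The search criterion of step (5) can be evaluated for one candidate vector as sketched below. Multiplying by the lower triangular Toeplitz matrix H is the same as convolving c(n) with h(n) and truncating to the (sub)frame length, so the sketch uses a direct convolution; the names and the toy 4-sample data are hypothetical.

```python
def convolve_truncated(c, h):
    """z = H c: convolve c(n) with h(n), truncated to len(c) samples."""
    n_samples = len(c)
    z = [0.0] * n_samples
    for i, ci in enumerate(c):
        if ci == 0:
            continue
        for k in range(min(len(h), n_samples - i)):
            z[i + k] += ci * h[k]
    return z


def search_criterion(target, c, h):
    """<target|H|c>^2 / <c|H^T H|c> for one candidate codebook vector."""
    z = convolve_truncated(c, h)
    energy = sum(v * v for v in z)
    corr = sum(t * v for t, v in zip(target, z))
    return corr * corr / energy if energy > 0.0 else 0.0


h = [1.0, 0.5]                  # illustrative impulse response
target = [0.0, 2.0, 1.0, 0.0]   # illustrative x(n) - g_p*y(n)
score_a = search_criterion(target, [0, 1, 0, 0], h)
score_b = search_criterion(target, [1, 0, 0, 0], h)
```

A full search would evaluate this ratio over all admissible pulse positions and signs and keep the candidate with the largest value; practical coders precompute correlations to avoid the brute-force cost.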

[0023] (6) Determine the algebraic codebook gain g_c by minimizing |x - g_p y - g_c z|, where, as before, x(n) is the target speech in the (sub)frame, g_p is the adaptive codebook gain, y(n) is the output of the quantized LP synthesis filter applied to v(n), and z(n) is the signal in the frame generated by applying the quantized LP synthesis filter to the algebraic codebook vector c(n).
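As a worked step, and assuming that |·| denotes the Euclidean norm and that g_p is held fixed at its value from step (4), the minimization in step (6) has a closed form obtained by setting the derivative of the squared error to zero:

$$
\begin{aligned}
E(g_c) &= \lVert x - g_p y - g_c z \rVert^2 \\
\frac{dE}{dg_c} &= -2\,\langle x - g_p y \mid z \rangle + 2\, g_c \,\langle z \mid z \rangle = 0 \\
g_c &= \frac{\langle x - g_p y \mid z \rangle}{\langle z \mid z \rangle}
\end{aligned}
$$

This has the same form as the adaptive codebook gain in step (4), with the target replaced by x - g_p y and the filtered vector replaced by z.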

[0024] (7) Quantize the gains g_p and g_c for insertion as part of the codeword. The algebraic codebook gain may be factored and predicted, and the gains may be jointly quantized with a vector quantization codebook. The excitation for the (sub)frame is then, with the quantized gains, u(n) = g_p v(n) + g_c c(n), and the excitation memory is updated for use with the next (sub)frame.

[0025] Note that all of the quantized items are typically differential values, with moving averages of the preceding frames' values used as predictors; that is, only the differences between the actual and the predicted values are encoded.

[0026] The final codeword encoding the (sub)frame includes bits for the quantized LSF coefficients, the adaptive codebook pitch delay, the algebraic codebook vector, and the quantized adaptive codebook and algebraic codebook gains.

[0027] 3. Decoder details. FIG. 1 illustrates the preferred-embodiment decoder and decoding method. These essentially reverse the encoding steps of the foregoing encoding method and, as described in the next section, provide repetition-based concealment features for reconstructing erased frames. FIG. 4 shows a decoder without concealment features, which proceeds as follows for the m-th (sub)frame:
(1) Decode the quantized LP coefficients a_j^(m). The coefficients may be in differential LSP form, so a moving average of the prior frames' decoded coefficients may be used. The LP coefficients may be interpolated every 20 samples (subframe) in the LSP domain.
(2) Decode the quantized adaptive codebook pitch delay T^(m), and apply this pitch delay (time translation plus interpolation) to the prior decoded (sub)frame's excitation u^(m-1)(n) to form the vector v^(m)(n). This is the feedback loop in FIG. 4.
(3) Decode the algebraic codebook vector c^(m)(n).
(4) Decode the quantized adaptive codebook and algebraic codebook gains, g_p^(m) and g_c^(m).
(5) Form the excitation for the m-th (sub)frame as u^(m)(n) = g_p^(m) v^(m)(n) + g_c^(m) c^(m)(n), using the items from steps (2)-(4).
(6) Synthesize speech by applying the LP synthesis filter from step (1) to the excitation from step (5).
(7) Apply any postfiltering and other shaping operations.

[0028] 4. Preferred-embodiment concealment. FIG. 1 illustrates the preferred-embodiment concealment features in a preferred-embodiment decoder, in contrast to FIG. 2. In particular, presume that the m-th frame was decoded but the (m+1)-th frame was erased, and likewise the (m+2)-th, ..., (m+j)-th, ... frames. The preferred-embodiment concealment features then construct the (m+j)-th frame with one or more of the following decoder steps.

[0029] (1) Define the LP synthesis filter by taking the (quantized) filter coefficients a_k^(m+j) to be a bandwidth-expanded version of the prior good frame's (quantized) coefficients a_k^(m). For successive erased frames j = 1, 2, ...,

$$ a_k^{(m+j)} = \left(\gamma^{(m+j)}\right)^{k} a_k^{(m)}, \qquad k = 1, 2, \ldots, M, $$

where the bandwidth expansion factor γ^(n) is confined to the range [0.8, 1.0]. FIG. 1 shows this bandwidth expansion applied to the synthesis filter. The decoder updates the bandwidth expansion factor every frame by an update rule (the equation itself is missing from this text) in which C_B is the bursty frame erasure counter, which counts the number of consecutive erased frames, and LSFBW_min is the minimum LSF bandwidth of the last good frame. The i-th LSF bandwidth (LSFBW_i) is defined as |f_{i+1} - f_i|. The smaller the LSF bandwidth, the sharper the corresponding peak (formant) of the LPC spectrum. Because LSFBW_min is the smallest of the LSFBW_i, the bandwidth expansion factor may decrease only when at least one pair of LSF frequencies is close together (a sharp formant). Note that as γ^(n) decreases, the poles of the synthesis filter move radially toward the origin, thereby broadening the formant peaks.
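The two quantities in this step can be sketched as follows, assuming the usual bandwidth-expansion form in which the k-th LP coefficient is scaled by γ^k (this is what moves every pole radially toward the origin by the factor γ). The function names and numeric values are illustrative, not taken from the patent.

```python
def expand_bandwidth(a, gamma):
    """Return the bandwidth-expanded coefficients gamma**k * a_k, k = 1..M."""
    return [(gamma ** k) * a_k for k, a_k in enumerate(a, start=1)]


def min_lsf_bandwidth(lsf_hz):
    """LSFBW_min: the smallest gap |f_{i+1} - f_i| between adjacent LSFs (Hz)."""
    return min(abs(f2 - f1) for f1, f2 in zip(lsf_hz, lsf_hz[1:]))


a_good = [1.0, -0.5]                       # illustrative a_1, a_2 of the last good frame
a_concealed = expand_bandwidth(a_good, 0.8)
lsfbw_min = min_lsf_bandwidth([250.0, 320.0, 900.0, 1500.0])
```

In the example the closest LSF pair is 70 Hz apart, which is below the 100 Hz threshold discussed in the text, so this frame would count as having a sharp formant.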

[0030] Thus, if the m-th frame is a good frame and the (m+1)-th frame is erased, then C_B = 1 and the updated expansion factor is γ^(m+1) = min(1.05 γ^(m), 1.0). (For γ^(m+1) = 1.05 γ^(m) ≤ 1 to hold, γ^(m) must have been at most about 0.953. This means that γ^(n) was decreased in at least one of the preceding four frames, which in turn implies two or more successive erased frames.) However, if the (m+2)-th or further frames are erased and LSFBW_min of the m-th frame is less than 100 Hz, then the factor γ^(m+j) gradually decreases to the limit of 0.8. This prevents any sharp formant (LSFBW_min < 100 Hz) of the m-th frame from degrading the synthesized quality of the concealment reconstruction for the (m+2)-th and subsequent successive erased frames. That is, the synthesis filter for concealing the erased (m+j)-th frame is

$$ \frac{1}{A^{(m)}\!\left(z/\gamma^{(m+j)}\right)}, $$

that is, 1/A(z) with the k-th coefficient taken as (γ^(m+j))^k a_k^(m), where the filter coefficients a_k^(m) are those of the last good frame.

[0031] Also, for good frames following a burst of frame erasures, γ^(m+j) is still applied to the decoded filter coefficients and is gradually increased to 1.0 by γ^(m+j+1) = min(1.05 γ^(m+j), 1.0) for smooth recovery from the frame erasures.
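The behavior of the expansion factor over a burst and the subsequent recovery can be sketched as below. The increase branch min(1.05 γ, 1.0), the 100 Hz threshold, and the 0.8 floor are from the text; the decrease factor is not given in this text, so `decay=0.95` is a purely hypothetical placeholder, as are the function and parameter names.

```python
def update_gamma(gamma, burst_count, lsfbw_min_hz, decay=0.95, threshold_hz=100.0):
    """One per-frame update of the bandwidth expansion factor.

    decay is an ASSUMED value; the source text only states that gamma
    gradually decreases to the limit 0.8 during a burst with a sharp formant.
    """
    if burst_count > 1 and lsfbw_min_hz < threshold_hz:
        return max(decay * gamma, 0.8)    # shrink toward the 0.8 floor
    return min(1.05 * gamma, 1.0)         # first erasure or good frame: recover


gamma = update_gamma(1.0, burst_count=1, lsfbw_min_hz=70.0)   # first erased frame
first_erased = gamma
for c_b in range(2, 12):                                      # long burst
    gamma = update_gamma(gamma, c_b, 70.0)
floor = gamma
good_frames = 0
while gamma < 1.0:                                            # good frames after the burst
    gamma = update_gamma(gamma, 0, 70.0)
    good_frames += 1
```

Starting from the 0.8 floor, the 1.05 growth factor needs five good frames to bring γ back to 1.0, which matches the remark that a value at or below about 0.953 implies a decrease within the preceding four frames.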

【００３２】(2) Set the quantized adaptive codebook pitch delay T^(m+1) for concealing the erased (m+1)-th frame equal to T^(m) from the preceding good m-th frame. For two or more successive erased frames, however, determine T^(m+j) (j = 2, 3, ...) by adding a random 3% jitter to T^(m). This avoids reconstructing an excessively periodic concealment signal, without accumulating the estimation error that can arise when T^(m+j+1) is set to exactly T^(m+j) + 1 as in G.729. The adaptive codebook vector v^(m+j)(n) is formed by applying this concealment pitch delay to the excitation u^(m)(n) of the previous (sub)frame. In short, T^(m+j) for successive erased frames is obtained by adding to T^(m) a random number in the range [-0.03 T^(m), 0.03 T^(m)] and rounding the result to the nearest 1/3 or integer, depending on the range. FIG. 1 shows the jitter, and the feedback loop shows the use of the previous frame's excitation.
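The jittering of step (2) can be sketched as below. The function name and the parameter `frac_limit` are hypothetical; the value 85 for the boundary between 1/3-sample and integer resolution is an assumption modeled on G.729 pitch delay coding and is not stated in the text above.

```python
import random


def jittered_pitch_delay(t_good, rng, frac_limit=85.0):
    """Return a concealment pitch delay for a second or later erased frame.

    t_good     -- pitch delay T^(m) of the last good frame
    rng        -- a random.Random instance (source of the jitter)
    frac_limit -- ASSUMED boundary: delays below it are rounded to the
                  nearest 1/3, delays at or above it to the nearest integer
    """
    # Random offset of at most 3% of the last good delay, either direction.
    t = t_good + rng.uniform(-0.03 * t_good, 0.03 * t_good)
    if t < frac_limit:
        return round(t * 3) / 3
    return float(round(t))
```

Because the jitter is always taken around T^(m) rather than around the previous concealed delay, the offsets do not accumulate over a long burst.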

【００３３】(3) Define the algebraic codebook vector c^(m+j)(n) as a random vector of the same type as c^(m)(n); that is, for G.729-type coding, the vector has four ±1 pulses among 40 otherwise zero components.
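A minimal sketch of such a random vector follows. The function name is hypothetical, and the pulse positions are drawn uniformly from all 40 positions as a simplifying assumption; an actual G.729 codebook restricts each pulse to its own set of positions, which this sketch ignores.

```python
import random


def random_fixed_codebook_vector(rng, length=40, pulses=4):
    """Return a vector of `length` zeros with `pulses` entries set to +1 or -1.

    rng is a random.Random instance; positions are distinct.
    """
    vec = [0] * length
    for pos in rng.sample(range(length), pulses):
        vec[pos] = rng.choice((-1, 1))
    return vec
```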

【００３４】(4) Set the quantized adaptive codebook gain g_p^(m+j) and algebraic codebook gain g_c^(m+j) simply equal to g_p^(m) and g_c^(m), except that g_p^(m+j) is capped at max(1.2 - 0.1(C_B - 1), 0.8). Here again C_B is the count of successive erased frames, that is, the burst length. This upper bound prevents an unpredicted surge of excitation signal energy. Using gains that are not attenuated in this way maintains the excitation energy; however, as described in step (5), the excitation is attenuated (muted) before synthesis by applying the factor g_E^(m+j).
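The gain repetition with its cap can be written in a few lines. This is a sketch under the stated rule only; the function and argument names are hypothetical.

```python
def concealment_gains(g_p_prev, g_c_prev, c_b):
    """Repeat the last good gains, capping the adaptive codebook gain.

    g_p_prev, g_c_prev -- gains g_p^(m), g_c^(m) of the last good frame
    c_b                -- number of successive erased frames so far
    """
    cap = max(1.2 - 0.1 * (c_b - 1), 0.8)
    return min(g_p_prev, cap), g_c_prev
```

The cap starts at 1.2 for the first erased frame, drops by 0.1 per additional erased frame, and stays at 0.8 from the fifth erased frame onward.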

【００３５】(5) Form the excitation for the erased (m+1)-th (sub)frame from the items of steps (2)-(4) as
u^(m+1)(n) = g_p^(m+1) v^(m+1)(n) + g_c^(m+1) c^(m+1)(n).
Next, apply the excitation muting factor g_E^(m+1) outside the adaptive codebook feedback loop, as shown in FIG. 1. This eliminates excessive decay of the excitation, yet still avoids surges of speech energy such as can occur when the erased frame follows a frame containing a vowel onset. The excitation muting factor g_E^(n) is updated every subframe (5 ms) and lies in the range [0.0, 1.0]. The update depends on a muting counter C_M, which is updated every frame (10 ms) as follows:
if C_B > 1, then C_M = 4;
else if g_p^(m+1) < 1.0 and C_M > 0, then decrement C_M by 1;
else C_M is unchanged.
Here again C_B is the burst counter that counts the number of successive erased frames, and g_p^(m+1) is the adaptive codebook gain from step (4). The update of g_E^(n) is then:
g_E^(n+1) = 0.95499 g_E^(n) if C_M^(n+1) > 0;
g_E^(n+1) = min(1.09648 g_E^(n), 1.0) otherwise.
The excitation for the synthesis filter is therefore g_E^(m+1) u^(m+1)(n). Similarly, for the (m+j)-th successive erased frame, use the corresponding g_p^(m+j) v^(m+j)(n) + g_c^(m+j) c^(m+j)(n) and mute it with g_E^(m+j).
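The muting logic of step (5) can be sketched as three small functions. All names are hypothetical, and the sketch assumes the excitation is held as plain Python lists. The point it illustrates is that the unmuted excitation is what feeds the adaptive codebook history, while only the copy sent to the synthesis filter is scaled by g_E.

```python
def update_muting_counter(c_m, c_b, g_p):
    """Per-frame update of the muting counter C_M."""
    if c_b > 1:
        return 4
    if g_p < 1.0 and c_m > 0:
        return c_m - 1
    return c_m


def update_muting_gain(g_e, c_m):
    """Per-subframe update of the muting factor g_E (stays within [0, 1])."""
    if c_m > 0:
        return 0.95499 * g_e
    return min(1.09648 * g_e, 1.0)


def form_excitation(v, c, g_p, g_c, g_e):
    """Return (u, muted): u goes back into the adaptive codebook history,
    muted = g_E * u is what the synthesis filter receives."""
    u = [g_p * vi + g_c * ci for vi, ci in zip(v, c)]
    muted = [g_e * ui for ui in u]
    return u, muted
```

Keeping the muting outside the feedback loop means the attenuation does not compound through the adaptive codebook from one subframe to the next.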

【００３６】(6) Synthesize the speech by applying the excitation of step (5) to the LP synthesis filter of step (1).
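The synthesis step can be illustrated with a direct-form all-pole filter. This is only a sketch: the function name is hypothetical, and the sign convention s(n) = e(n) - Σ a_k s(n-k), which corresponds to A(z) = 1 + Σ a_k z^(-k), is an assumption that should be checked against the definition of the LP coefficients earlier in the document.

```python
def synthesize(excitation, coeffs, history=None):
    """All-pole synthesis: s(n) = e(n) - sum_k a_k * s(n - k).

    excitation -- muted excitation samples e(n)
    coeffs     -- a_1 ... a_M (bandwidth-expanded for erased frames)
    history    -- last M output samples, most recent last (zeros if None)
    ASSUMPTION: sign convention A(z) = 1 + sum_k a_k z^-k.
    """
    past = list(history) if history is not None else [0.0] * len(coeffs)
    out = []
    for e in excitation:
        s = e - sum(a * past[-k] for k, a in enumerate(coeffs, start=1))
        out.append(s)
        past.append(s)
    return out
```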

【００３７】(7) Apply any optional post-filtering and other shaping operations.

【００３８】5. Alternative preferred embodiments
The alternative preferred embodiments perform only one or two of the three concealment features of the foregoing preferred embodiments. In particular, the bandwidth expansion of the LP coefficients for erased frames, and for good frames after a burst of erased frames, can be omitted. This changes only the synthesis filter and does not affect the excitation muting or the pitch delay jittering.

【００３９】Another alternative preferred embodiment omits the pitch delay jittering and instead uses the G.729-style increment for erased frames, together with the excitation muting and the bandwidth expansion of the LP coefficients.

【００４０】A further alternative preferred embodiment omits the excitation muting and uses the G.729 construction in its place, together with the pitch delay jittering and the bandwidth expansion of the synthesis filter coefficients.

【００４１】Finally, a preferred embodiment can use only one of the three features (excitation muting, pitch delay jittering, and bandwidth expansion of the synthesis filter coefficients) and follow G.729 in all other respects.

【００４２】6. System preferred embodiments
FIGS. 5 and 6 show, in functional block form, preferred embodiment systems that use the preferred embodiment encoding and decoding. This applies to speech and also to other signals that can be effectively CELP coded. The encoding and decoding can be performed with digital signal processors (DSPs), general-purpose programmable processors, or application-specific circuitry or systems on a chip, such as both a DSP and a RISC processor on the same chip with the RISC processor in control. Codebooks are stored in memory at both the encoder and the decoder. A program stored in onboard or external ROM, flash EEPROM, or within the DSP or programmable processor can perform the signal processing. Analog-to-digital and digital-to-analog converters provide the coupling to the real world, and modulators and demodulators (plus antennas for air interfaces) provide the coupling for transmission waveforms. The encoded speech can be packetized and transmitted over networks such as the Internet.

【００４３】7. Modifications
The preferred embodiments can be modified in various ways while retaining one or more of the features of erased frame concealment by bandwidth expansion of the synthesis filter coefficients, pitch delay jittering, and excitation muting. For example, the sizes of the periods (frames and subframes) and the sampling rate can be varied. The bandwidth expansion factor can be applied for C_B > 0 or for C_B > 2. The multipliers 0.95 and 1.05 and the limits 0.8 and 1.0 can be varied. The 100 Hz threshold can be varied. The pitch delay jitter can be a larger or smaller percentage of the pitch delay, and it can also be applied to the first erased frame. The size of the jitter can vary with the number of successive erased frames or with the erasure density. The excitation muting can vary nonlinearly with the number of successive erased frames or with the erasure density. The multipliers 0.95499 and 1.09648 can be varied.

【００４４】Cross-reference to related applications
This application claims priority from U.S. Provisional Patent Application No. 60/167,197, filed November 23, 1999.

【００４５】With respect to the above description, the following items are further disclosed.
(1) A method for decoding digital speech, comprising: (a) forming an excitation for an erased period of encoded digital speech as the sum of an adaptive codebook contribution and a fixed codebook contribution, wherein the adaptive codebook contribution is derived from an excitation, a pitch, and a first gain of a temporally preceding period of the encoded digital speech, and the fixed codebook contribution is derived from a second gain of the temporally preceding period; (b) muting the excitation; and (c) filtering the muted excitation.

【００４６】(2) The method for decoding digital speech according to item 1, wherein (a) the filtering includes synthesis with synthesis filter coefficients derived from the filter coefficients of the temporally preceding period.

【００４７】(3) A method for decoding digital speech, comprising: (a) forming a synthesis filter for an erased period of encoded digital speech by determining its filter coefficients from a bandwidth-expanded version of the filter coefficients of a temporally preceding period of the encoded digital speech; and (b) filtering an excitation for the erased period with the synthesis filter for the erased period.

【００４８】(4) The method for decoding digital speech according to item 3, wherein (a) the filter coefficients a_1, a_2, ..., a_M of the synthesis filter for the erased period are related to the filter coefficients b_1, b_2, ..., b_M of the synthesis filter for the temporally preceding period by a_1 = f b_1, a_2 = f^2 b_2, ..., a_M = f^M b_M, where f is a bandwidth expansion factor.

【００４９】(5) A method for decoding digital speech, comprising: (a) forming an excitation for an erased period of encoded digital speech as the sum of an adaptive codebook contribution and a fixed codebook contribution, wherein the adaptive codebook contribution is derived from an excitation, a pitch, and a first gain of a temporally preceding period of the encoded digital speech, and the fixed codebook contribution is derived from a second gain of the temporally preceding period; and (b) filtering the muted excitation.

【００５０】(6) The method for decoding digital speech according to item 5, wherein (a) the filtering includes muting followed by synthesis with synthesis filter coefficients derived from the synthesis filter coefficients of the temporally preceding period.

【００５１】(7) The method for decoding digital speech according to item 6, comprising (a) determining the synthesis filter coefficients for said period from a bandwidth-expanded version of the synthesis filter coefficients of a temporally preceding period of the encoded digital speech.

【００５２】(8) A decoder for CELP-encoded signals, comprising: (a) a fixed codebook vector decoder; (b) a fixed codebook gain decoder; (c) an adaptive codebook gain decoder; (d) an adaptive codebook pitch delay decoder; (e) an excitation generator coupled to each of said decoders; (f) a synthesis filter; and (g) a muting gain coupled between the output of the excitation generator and the input of the synthesis filter; (h) wherein, when a received frame is erased, each of said decoders generates a substitute output, the excitation generator generates a substitute excitation, the synthesis filter generates substitute filter coefficients, and the muting gain mutes the substitute excitation.

【００５３】(9) The decoder according to item 8, wherein (a) the fixed codebook decoder and the adaptive codebook decoder both generate their respective substitute outputs by repeating the outputs for a previous frame.

【００５４】(10) A decoder for frames that are code-excited and LP-encoded with both an adaptive codebook and a fixed codebook. To conceal erased frames, it uses muted repeated excitation, a threshold-adaptive bandwidth-expanded repeated synthesis filter, and a jittered repeated pitch delay.

[Brief description of the drawings]

FIG. 1 is a block diagram of a preferred embodiment decoder.

FIG. 2 shows a known decoder concealment method.

FIG. 3 is a block diagram of a known encoder.

FIG. 4 is a block diagram of a known decoder.

FIG. 5 shows an LP system.

FIG. 6 shows an LP system.

[Explanation of symbols]

g_p^(m): adaptive codebook gain
g_c^(m): fixed codebook gain
T^(m): pitch delay

Claims

[Claims]

1. A method for decoding digital speech, comprising: (a) forming an excitation for an erased period of encoded digital speech as the sum of an adaptive codebook contribution and a fixed codebook contribution, wherein the adaptive codebook contribution is derived from an excitation, a pitch, and a first gain of a temporally preceding period of the encoded digital speech, and the fixed codebook contribution is derived from a second gain of the temporally preceding period; (b) muting the excitation; and (c) filtering the muted excitation.

2. A decoder for CELP-encoded signals, comprising: (a) a fixed codebook vector decoder; (b) a fixed codebook gain decoder; (c) an adaptive codebook gain decoder; (d) an adaptive codebook pitch delay decoder; (e) an excitation generator coupled to each of said decoders; (f) a synthesis filter; and (g) a muting gain coupled between the output of the excitation generator and the input of the synthesis filter; (h) wherein, when a received frame is erased, each of said decoders generates a substitute output, the excitation generator generates a substitute excitation, the synthesis filter generates substitute filter coefficients, and the muting gain mutes the substitute excitation.