JPH0566800A

JPH0566800A - Speech coding and decoding method

Info

Publication number: JPH0566800A
Application number: JP3225843A
Authority: JP
Inventors: Hiroto Suda; 博人須田; Takehiro Moriya; 健弘守谷
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1991-09-05
Filing date: 1991-09-05
Publication date: 1993-03-19
Anticipated expiration: 2017-04-15
Also published as: JP3275249B2

Abstract

PURPOSE: To reduce distortion at the leading portion of each update frame. CONSTITUTION: The update frame of the pitch information, which is selected from an adaptive codebook, is made a half-length subframe of the update frame of the spectral envelope information and is synchronized with it. The update frame of the residual information (the noise codebook search frame) is given the same length as the update frame of the spectral envelope information, and successive residual frames are shifted so that they overlap one another. The original speech is passed through an LPC inverse filter to obtain a residual waveform. Using this waveform, several virtual codes corresponding to codes selected from the adaptive codebook are created, the noise code is quantized using each of them to obtain several candidates for each, and the candidate with the minimum distortion relative to the original speech is selected from all candidates to determine the noise code. The pitch information is quantized afterwards.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an encoding method that encodes speech at a low bit rate with high quality, and to a decoding method that decodes the resulting speech code.

[0002]

[Prior Art] Speech coding schemes that achieve high quality at low bit rates have been proposed and studied, including the scheme of M. R. Schroeder and B. S. Atal (Reference 1), "Code-excited linear prediction (CELP): high-quality speech at very low bit rate", IEEE Proc. of International Conf. Acoust., Speech & Signal Process., pp. 937-940 (1985), and the scheme of T. Moriya and M. Honda (Reference 2), "Transform Coding of Speech using a weighted vector quantizer", IEEE J. Selected Areas in Communications, JSAC-6, pp. 425-431 (1988). These schemes are called CELP schemes. They target coding rates of about 4.8 kbps to 8 kbps and have been studied widely. CELP uses a model in which speech is composed of spectral envelope information, pitch information, and residual information. The encoder therefore extracts and quantizes these three kinds of information, and the decoder reproduces the speech waveform from the three kinds of information transmitted to it.

[0003] Each of these kinds of information is extracted and quantized frame by frame, or subframe by subframe when a frame is divided into several subframes, and the process is repeated. FIG. 7 shows an example of the relationship, in the conventional method, among the speech waveform, the update frames of the spectral envelope information, the update frames of the pitch information, and the update frames of the residual information. In this example the update period of the spectral envelope information is long: it is twice the update period of the other information. One characteristic of conventional CELP is that instants at which the update-frame boundaries of all the information coincide occur periodically. In FIG. 7, points a, b, and c are such instants.

[0004]

[Problem to Be Solved by the Invention] When the boundaries of the update frames coincide in this way, distortion concentrates at the boundary, and degradation is especially likely to be large at the head of the next frame. The reason is that the range over which distortion is minimized when a quantized value is chosen is confined to a single frame or subframe. The quantized value of a sample near the center of a frame is determined, under the minimum-distortion criterion for the frame, from its relationship with both the preceding and the following samples. The quantized value of the last sample of a frame, however, is determined only from its relationship with the preceding samples, without regard to the following samples, which are the leading samples of the next frame. As a result, a value that increases the distortion as seen from the head of the next frame may be selected as the quantized value. This was the problem with the conventional method.

[0005]

[Means for Solving the Problem] To solve the above problem of conventional CELP, the present invention adopts a configuration in which adjacent quantization units (subframes) of the residual information overlap each other, as shown for example in FIG. 1, which corresponds to FIG. 7. Because the whole of each subframe shown in FIG. 1 is quantized at once, every part of the signal can be quantized with its relationship to both the preceding and the following samples taken into account. So that the signals of the overlapping subframes join smoothly, lapped transform processing is used, for example. Details are given in Reference 3: H. S. Malvar, "Lapped Transforms for Efficient Transform/Subband Coding", IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-38, pp. 969-978, June 1990.
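How overlapping blocks can join smoothly is easiest to see in a small numerical sketch. The code below is only an illustration, not the patent's implementation: it assumes a sine window and the common MDCT form of the modulated lapped transform, neither of which is specified in the text, and all function names are hypothetical. Blocks of 2n samples advance by n samples, each block is reduced to n coefficients, and overlap-adding the inverse transforms restores the signal when nothing is quantized.

```python
import math

def sine_window(n):
    """Window of length 2n with w[i]^2 + w[i+n]^2 = 1 (assumed window choice)."""
    return [math.sin(math.pi * (i + 0.5) / (2 * n)) for i in range(2 * n)]

def _basis(i, k, n):
    # Cosine basis of the MDCT form of the lapped transform.
    return math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))

def mlt(block, n):
    """Map a block of 2n samples to n coefficients (window applied here)."""
    w = sine_window(n)
    return [sum(w[i] * block[i] * _basis(i, k, n) for i in range(2 * n))
            for k in range(n)]

def inverse_mlt(coeffs, n):
    """Map n coefficients back to 2n windowed, time-aliased samples."""
    w = sine_window(n)
    return [2.0 / n * w[i] * sum(coeffs[k] * _basis(i, k, n) for k in range(n))
            for i in range(2 * n)]

def code_with_overlap(signal, n):
    """Transform overlapping blocks and overlap-add the inverse transforms.

    len(signal) is assumed to be a multiple of n.
    """
    padded = [0.0] * n + list(signal) + [0.0] * n
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n + 1, n):
        coeffs = mlt(padded[start:start + 2 * n], n)  # a quantizer would act here
        for i, value in enumerate(inverse_mlt(coeffs, n)):
            out[start + i] += value
    return out[n:n + len(signal)]

signal = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(32)]
rebuilt = code_with_overlap(signal, 8)
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
```

Each output sample is the sum of contributions from two neighboring blocks, so a quantization error made in one block is shaped by the window and fades toward the block edges instead of producing a step at a frame boundary.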

[0006] To keep quantization noise from accumulating from frame to frame, conventional CELP encodes the speech of the current frame so as to remove the influence of the quantization noise of past frames. Specifically, it quantizes the signal obtained by subtracting the influence of the previous frame's quantization noise (the zero-input response of the current frame's synthesis filter) from the speech of the current frame. This requires that quantization of the previous frame's speech be entirely complete before quantization of the current frame begins. In conventional CELP, where all processing is delimited frame by frame, that condition holds, so the zero-input response of the current frame's synthesis filter can be computed and the influence of the previous frame's quantization noise can be cancelled. In this invention, however, as FIG. 1 shows for example, quantization of the previous frame's speech is not entirely complete before quantization of the current frame begins. As things stand, the influence of the previous frame's quantization noise cannot be cancelled in the next frame, so the following approach is taken.
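The zero-input response subtraction used by conventional CELP can be sketched as follows. This is a minimal illustration under assumptions: an all-pole synthesis filter written as y[t] = e[t] - sum of a[i] * y[t-1-i], with hypothetical function names and toy coefficients that do not come from the patent. It shows why the filter memory left by the previous frame must be final before the current frame's target can be formed.

```python
def synthesize(excitation, a, memory):
    """All-pole synthesis y[t] = e[t] - sum_i a[i] * y[t-1-i].

    `memory` holds the latest outputs, newest first, one per coefficient.
    """
    mem = list(memory)
    out = []
    for e in excitation:
        y = e - sum(ai * mi for ai, mi in zip(a, mem))
        out.append(y)
        mem = [y] + mem[:-1]
    return out, mem

def zero_input_response(a, memory, length):
    """Filter output for a zero excitation, driven only by the old memory."""
    return synthesize([0.0] * length, a, memory)[0]

def quantization_target(speech, a, memory):
    """Current-frame speech minus the ringing left by the previous frame."""
    zir = zero_input_response(a, memory, len(speech))
    return [s - z for s, z in zip(speech, zir)]

a = [-0.9]            # toy first-order filter (assumption)
memory = [1.0]        # last decoded sample of the previous frame
excitation = [1.0, 0.0, -0.5]
full, _ = synthesize(excitation, a, memory)
from_rest, _ = synthesize(excitation, a, [0.0])
target = quantization_target(full, a, memory)
```

Because the filter is linear, `target` equals the output of the same excitation applied to a filter starting from rest. If the previous frame's residual is not yet quantized, `memory` is unknown and this target cannot be computed, which is the difficulty the paragraph describes.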

[0007] At the moment processing of a frame begins, the part of the previous frame's information that has not yet been quantized is the residual. The encoding of the frame proceeds on the assumption that this unquantized residual has been quantized under suitable conditions. However, in quantizing the period of the adaptive codebook and in quantizing the shape of the residual (the noise codebook index search), several suboptimal candidates are retained in addition to the optimal one, and the combination that finally minimizes the quantization noise is selected from among them. In conventional CELP, both the quantization of the adaptive codebook period and the quantization of the residual shape are decided in a single pass.

[0008] In short, the main concrete features of this invention are the use of lapped transform processing, and the retention of several suboptimal candidates during the quantization of the adaptive codebook period and of the residual shape, from which the combination that finally minimizes the quantization noise is selected.

[0009]

[Embodiment] FIG. 2 shows an embodiment of the decoding method according to the invention. An envelope code representing the spectral envelope information, a pitch code representing the pitch information, and a residual code representing the residual information are input, either as received codes or as codes read from a file. The residual code of the frame one before the current frame is given as a first index to the noise codebook 11. The noise waveform read from it is subjected to the inverse modified lapped transform in the converter 12. The transform output is two frames long; the second-half extraction circuit 13 cuts out the second half of the waveform and passes it to the gain circuit 14, whose output is supplied to the adder 15 as the first basic residual.

[0010] The residual code of the current frame is given as a second index to the noise codebook 16, which is identical to the noise codebook 11. The noise waveform read from it is subjected to the inverse modified lapped transform in the converter 17. The transform output is two frames long; the first-half extraction circuit 13 cuts out the first half of the waveform and passes it to the gain circuit 19, whose output is supplied to the adder 15 as the third basic residual.

[0011] The input pitch code is given to the adaptive codebook 21, which is read out repeatedly with the period indicated by the pitch code. The waveform read out is given a gain in the gain circuit 22 and supplied to the adder 15 as the second basic residual. The output of the adder 15 is stored in the adaptive codebook and is also supplied as the drive signal to the linear prediction synthesis filter 23. The filter coefficients of the synthesis filter 23 are controlled by the input envelope code. The decoded speech signal is obtained from the synthesis filter 23.
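The way the decoder assembles its drive signal from the three basic residuals can be sketched as below. The sketch assumes that the two-frame waveforms have already been produced by the inverse transform, and the function names, gain arguments, and list-based adaptive codebook are hypothetical simplifications; the text does not state how the gains are obtained.

```python
def periodic_extension(history, lag, length):
    """Read the newest `lag` samples of `history` repeatedly for `length` samples."""
    segment = history[-lag:]
    return [segment[i % lag] for i in range(length)]

def build_excitation(prev_wave, cur_wave, history, lag, g1, g2, g3):
    """Sum the three basic residuals for one frame.

    prev_wave, cur_wave: two-frame time-domain waveforms decoded from the
    previous and the current residual codes (assumed already inverse-transformed).
    history: past excitation samples, i.e. the adaptive codebook contents.
    """
    frame = len(cur_wave) // 2
    first = [g1 * v for v in prev_wave[frame:]]      # second half, previous code
    third = [g3 * v for v in cur_wave[:frame]]       # first half, current code
    second = [g2 * v for v in periodic_extension(history, lag, frame)]
    excitation = [a + b + c for a, b, c in zip(first, second, third)]
    return excitation, list(history) + excitation    # updated adaptive codebook

prev_wave = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
cur_wave = [2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0]
excitation, history = build_excitation(prev_wave, cur_wave, [5.0, 6.0], 2,
                                       g1=1.0, g2=0.5, g3=1.0)
```

The returned excitation would then drive the synthesis filter whose coefficients come from the envelope code; that filter is omitted here to keep the sketch short.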

[0012] FIG. 3 shows the timing relationship of the signals in the decoding process of this embodiment. Because the residual information of the previous frame and that of the current frame are each encoded with a half-frame overlap, the third and first basic residuals are continuous in time and together form the residual information while overlapping the preceding and following frames. For this reason the distortion at the head of a frame is small in this invention, in marked contrast to conventional CELP.

[0013] FIG. 4 outlines the processing of an embodiment of the encoding method according to the invention. In ordinary CELP, the lag of the adaptive codebook (corresponding to the pitch period) is determined first, and the index of the noise codebook is determined next. In the method of this invention the order is necessarily reversed: the index of the noise codebook has to be determined before the lag of the adaptive codebook. To prevent a mismatch when the noise codebook index, determined first, is combined with the adaptive codebook contribution (the second basic residual), determined later, the invention introduces a virtual second basic residual (an estimated sequence of the second basic residual) before the noise codebook index is determined.

[0014] Because it is difficult to obtain the optimal virtual second basic residual analytically, several candidates for the virtual second basic residual are created and an exhaustive search is carried out, with minimum distortion of the decoded speech signal as the evaluation criterion. Let N denote the number of candidates for the virtual second basic residual. In addition, M noise code candidates are retained for each candidate of the virtual second basic residual. This embodiment thus creates N × M decoded speech candidates and selects from them the candidate with the minimum distortion.
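The N × M exhaustive search amounts to a double loop over candidate decoded waveforms. The sketch below assumes a plain squared-error distortion, which the text does not specify, and uses hypothetical names and toy data.

```python
def squared_error(a, b):
    """Distortion measure (assumed here to be the plain squared error)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_candidate(original, candidates):
    """Return (distortion, n, m) of the closest decoded waveform.

    candidates[n][m] is the decoded speech for the n-th virtual second basic
    residual combined with its m-th retained noise code.
    """
    best = None
    for n, row in enumerate(candidates):
        for m, decoded in enumerate(row):
            d = squared_error(original, decoded)
            if best is None or d < best[0]:
                best = (d, n, m)
    return best

original = [1.0, 2.0, 3.0]
candidates = [
    [[0.0, 0.0, 0.0], [1.0, 2.0, 2.0]],   # n = 0, M = 2 noise-code candidates
    [[1.0, 2.0, 3.5], [3.0, 3.0, 3.0]],   # n = 1
]
distortion, n_best, m_best = best_candidate(original, candidates)
```

The cost grows with the product N × M, so the two counts trade search effort against the chance of finding a low-distortion combination.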

[0015] FIG. 5 shows an example of an encoder to which the encoding method of the invention is applied. Input speech from the input terminal 25 is supplied to the virtual second basic residual generation unit 26, the LPC inverse filter 27, and the optimum value search unit 28. The virtual second basic residual generation unit 26 creates N virtual second basic residuals, which are supplied to the noise-code-determining speech synthesis units 29_1 to 29_N. The LPC inverse filter 27 outputs the residual waveform obtained by removing the spectral envelope features from the input speech, and this residual waveform is supplied to all of the noise-code-determining speech synthesis units 29_1 to 29_N.

[0016] In the noise-code-determining speech synthesis unit 29_1, the subtractor 31 obtains the difference between the first of the virtual second basic residuals and the residual waveform from the LPC inverse filter 27. This yields the noise residual, that is, a signal whose second half corresponds to the first basic residual and whose first half corresponds to the third basic residual. The output of the subtractor 31 is subjected to the modified lapped transform in the transform unit 32, and the weighted vector quantization/inverse quantization unit 33 quantizes the transform output, with weighting, to one of the noise codes of the noise codebook 34, which is identical to the noise codebooks 11 and 16 in FIG. 2. M candidates are obtained at this step, so that for each frame M indices are obtained for the two-frame residual waveform whose second half is the first basic residual and whose first half is the third basic residual. Each index is inversely quantized into a noise waveform and supplied to the speech waveform synthesis units 35_1 to 35_M, where M speech waveforms are synthesized and supplied to the optimum value search unit 28. The noise-code-determining speech synthesis units 29_2 to 29_N are configured in the same way.

[0017] The above processing is now described in more detail.

Method of generating the virtual second basic residual

In ordinary CELP coding, the adaptive codebook for a given frame holds a signal obtained by cutting a suitable interval out of the residual waveform of the preceding frame and repeating it. In the method of this invention, the residual waveform of the frame preceding a given frame is the weighted sum of the first, second, and third basic residuals multiplied by a gain. Of these basic residuals, however, the quantization of the third is not yet finished (fixed). Consequently neither the residual waveform of the previous frame nor the second basic residual of the current frame, which is generated from it, can be fixed.

[0018] The estimated sequence of the second basic residual is therefore generated as follows. First, the LPC inverse filter 27 generates the residual that remains after its features are removed from the original speech of the frames up to and including the previous frame (called the first residual). The first residual contains no quantization noise: driving a synthesis filter with unquantized coefficients by the first residual yields a speech waveform identical to the original speech. Next, the sum of the quantized first and second basic residuals of the previous frame is obtained, and the quantized residual of the frames before the previous frame (the sum of the first, second, and third basic residuals) is connected to this signal so that the two are continuous in time. The result is called the second residual. The difference between the second residual and the first residual is the target waveform of the third basic residual (the third basic residual before quantization). Further, a weighted linear sum of the first residual and the second residual is obtained and called the third residual. Finally, the virtual second basic residual is the waveform generated, directly or by repetition, from a portion cut out of the third residual, or from a portion cut out after the third residual has been interpolated in time.
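The relationship among the first, second, and third residuals can be written out in a few lines. The sketch assumes weights of w and 1 - w for the linear sum and takes the target as the first residual minus the second; the text states only that a weighted linear sum and a difference are used, so both choices, like the function names, are assumptions.

```python
def third_residual(first, second, weight):
    """Weighted linear sum of the unquantized (first) and the partly
    quantized (second) residual; weights w and 1 - w are an assumption."""
    return [weight * f + (1.0 - weight) * s for f, s in zip(first, second)]

def third_basic_target(first, second, frame):
    """Target for the still-unquantized third basic residual of the previous
    frame: the part of the first residual not yet covered by the second
    (sign convention assumed)."""
    return [f - s for f, s in zip(first[-frame:], second[-frame:])]

first = [1.0, 2.0, 3.0, 4.0]     # residual of the original speech
second = [0.5, 2.0, 2.0, 3.0]    # quantized residual, last frame incomplete
mixed = third_residual(first, second, 0.5)
target = third_basic_target(first, second, frame=2)
```

With weight 1.0 the third residual is the unquantized residual, and with weight 0.0 it is the decoder-visible part only, so the weight sets how closely the estimate follows what the decoder will actually hold.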

[0019] The process of generating the virtual second basic residual waveform from the third residual is now described in detail. The cross-correlation between the third residual and the residual of the current frame is computed, and the time difference (pitch lag) that maximizes the cross-correlation is found. A segment whose length equals the pitch lag is cut from the temporally newest part of the third residual. The waveform is extended by repeating this segment until it has the same length as a frame. The result is a virtual second basic residual. To generate N virtual second basic residuals, the N pitch lags that give the largest cross-correlations, starting from the one that maximizes it, are retained, and a virtual second basic residual is generated by the above method for each pitch lag.

Simultaneous quantization of the third and first basic residuals

FIG. 6 shows the details of the noise-code-determining speech synthesis unit 29_1 of FIG. 5, that is, the simultaneous quantization of the third and first basic residuals and the search for the optimal synthesized speech waveform. The subtractor 31 generates the waveform obtained by subtracting the virtual second basic residual, generated as described above, from the residual waveform of the frame. Taking this waveform as the first basic residual, the combining circuit 36 connects it so that it follows, continuously in time, the basic residual obtained in the same way for the previous frame before quantization, that is, the third basic residual. The transform unit 32 transforms this connected waveform with the modulated lapped transform (MLT) algorithm (details are given in Reference 3, for example), yielding a frequency-domain waveform with half the number of data points. The quantization unit 33a quantizes this waveform to one of the noise codes of the noise codebook 34 by weighted vector quantization (WVQ; details are given in Reference 2). For each frame, the frequency-domain waveform is thus represented by one of the noise waveforms in the noise codebook 34. A feature of the invention is that several (M) candidates are retained at this step. The following processing is then carried out for each of the M retained candidates.
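The lag search and the cut-and-repeat construction of the N virtual second basic residuals can be sketched as follows. The sketch assumes an unnormalized correlation computed against the repeated segment and a caller-supplied lag range; the text says only that a cross-correlation is maximized, and the function names are hypothetical.

```python
def repeat_segment(past, lag, length):
    """Repeat the newest `lag` samples of `past` until `length` samples exist."""
    segment = past[-lag:]
    return [segment[i % lag] for i in range(length)]

def correlation(past, current, lag):
    """Unnormalized correlation between the current-frame residual and the
    repeated segment for this lag (one possible choice of measure)."""
    candidate = repeat_segment(past, lag, len(current))
    return sum(c * x for c, x in zip(candidate, current))

def virtual_second_residuals(third_residual, current_residual, lags, n_keep):
    """Keep the n_keep lags with the largest correlation and build one
    virtual second basic residual per retained lag."""
    ranked = sorted(
        lags,
        key=lambda lag: correlation(third_residual, current_residual, lag),
        reverse=True,
    )
    frame = len(current_residual)
    return [(lag, repeat_segment(third_residual, lag, frame))
            for lag in ranked[:n_keep]]

third = [1.0, 0.0, -1.0] * 4                    # toy past signal, period 3
current = [1.0, 0.0, -1.0, 1.0, 0.0, -1.0]      # toy current-frame residual
candidates = virtual_second_residuals(third, current, range(2, 6), 2)
```

For this toy signal the true period of 3 ranks first, and the next-best lag is kept as the second candidate, mirroring the retention of N lags described above.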

[0020]

(a) The inverse quantization unit 33b inversely quantizes the candidate using the noise codebook 34.
(b) The transform unit 37 applies the inverse modified lapped transform to the inverse quantization output.
(c) The speech waveform synthesis unit 55 synthesizes a speech waveform from the transform output.

This processing generates decoded speech for each of the M candidates.

Selecting the candidate with the minimum distance from the original speech waveform

As a result of the above processing, a total of N × M decoded speech candidates are generated. The search unit 28 computes the distance between each candidate decoded speech and the original speech waveform and determines the decoded speech with the minimum distance.
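The M candidates handled in steps (a) to (c) come from a codebook search that keeps several entries instead of a single best one. A minimal sketch of such an M-best search follows. It assumes a weighted squared-error distance and a toy codebook; the weighting of Reference 2 is not reproduced here, and the names are hypothetical.

```python
def weighted_distance(target, entry, weights):
    """Weighted squared error between a target vector and a codebook entry
    (an assumed form of the weighted vector quantization measure)."""
    return sum(w * (t - e) ** 2 for w, t, e in zip(weights, target, entry))

def m_best_indices(target, weights, codebook, m):
    """Return the indices of the m codebook entries closest to the target."""
    order = sorted(
        range(len(codebook)),
        key=lambda i: weighted_distance(target, codebook[i], weights),
    )
    return order[:m]

codebook = [[1.0, 1.0], [0.0, 0.0], [1.0, 0.5], [3.0, 0.0]]  # toy entries
kept = m_best_indices([1.0, 0.0], [1.0, 2.0], codebook, 2)
```

Each retained index would then pass through inverse quantization, the inverse transform, and speech synthesis, so the final choice is made on decoded speech and not on the coefficient-domain distance alone.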

[0021] Finally, the second basic residual is determined anew. For this purpose, the second basic residual determination unit 38 in FIG. 5 uses the quantized third and first basic residuals that give the decoded speech with the minimum distance (the corresponding signal in FIG. 4) to determine the second basic residual (the corresponding signal in FIG. 4). The encoding process of the invention is summarized as follows, with reference to FIG. 4.

[0022]

(1) Several (N) virtual second basic residuals are generated from the residual waveform obtained by LPC inverse filtering of the original speech waveform.
(2) For each virtual second basic residual, several (M) sets of quantized third and first basic residuals are obtained.
(3) Using the distance between the decoded speech and the original speech waveform as the selection measure, the optimal candidate is determined from the N × M decoded speech candidates.

[0023]

(4) The second basic residual is determined using the quantized third and first basic residuals that give the optimal candidate.

In FIG. 2, because the inverse MLT is performed quickly, a single noise codebook and a single inverse MLT circuit may take the place of the noise codebooks 11 and 16 and the inverse MLT circuits 12 and 17, so that, of the two-frame residuals converted to the time domain, the first half is used for the current frame's code and the second half for the previous frame's code. The description above overlaps one subframe, but several subframes may be overlapped.

[0024]

[Effect of the Invention] As described above, according to this invention the residual information, that is, the noise waveform, is quantized while the frames partly overlap. This suppresses the increase of quantization noise at frame boundaries, lowers the average quantization noise, and moreover requires fewer quantization bits.

[Brief description of drawings]

FIG. 1 is a time chart outlining an example of frame updating in the present invention.

FIG. 2 is a block diagram showing an example of a decoder to which the decoding method of the present invention is applied.

FIG. 3 is a time chart showing the processing flow of the decoder of FIG. 2.

FIG. 4 is a time chart showing an example of the encoding method according to the present invention.

FIG. 5 is a block diagram showing an example of an encoder to which the encoding method of the present invention is applied.

FIG. 6 is a block diagram showing a specific example of the noise-code-determining speech synthesis unit in FIG. 5.

FIG. 7 is a time chart showing frame updating in a conventional speech encoding method.

Claims

1. A speech coding method in which a speech signal is divided into frames, a feature parameter of each frame is extracted and encoded, the component represented by the encoded feature parameter is removed from the speech signal to obtain a residual, the residual is expressed as the sum of first, second, and third basic residuals, and each of these basic residuals is quantized, the method being characterized in that, before the first basic residual of each frame is quantized, a virtual second basic residual is generated; a virtual residual is generated by subtracting the virtual second basic residual from the residual of the frame; the quantized values of the first basic residual of the frame and of the third basic residual of the previous frame are determined simultaneously on the basis of the virtual residual and the information of the previous frame and earlier; and the final quantized value of the second basic residual is thereafter determined anew.

2. The speech coding method according to claim 1, wherein, in generating the virtual second basic residual, a residual remaining after the features are removed from the original speech of the frames up to and including the previous frame (called a first residual) is generated; the sum of the quantized first and second basic residuals of the previous frame is obtained, and the quantized residual of the frames before the previous frame (the sum of the first, second, and third basic residuals) is connected to this signal continuously in time to generate a signal (called a second residual); a signal (called a third residual) is constructed by linear processing from the first residual and the second residual; and a waveform generated, directly or by repetition, from a portion cut out of the third residual is used as the virtual second basic residual.

3. The speech coding method according to claim 1 or 2, wherein the first basic residual of the frame and the third basic residual of the frame preceding it are generated and quantized using lapped transform processing.

4. A speech decoding method in which an envelope code, a pitch code, and a noise code are input; a noise codebook is read with the noise code; the noise waveform read out is extended to two frames; the first half of the noise waveform extended in the current frame is taken out; the second half of the noise waveform extended in the previous frame is taken out; the waveform of an adaptive codebook is read out repeatedly according to the pitch code; the output of the adaptive codebook, the first half of the extended noise waveform, and the second half of the extended noise waveform are added; and the sum is supplied as a drive signal to a synthesis filter whose filter coefficients are controlled by the envelope code, to obtain a decoded speech signal.