JPH04291400A - Low-delay code-driven predictive encoding method - Google Patents

Low-delay code-driven predictive encoding method

Info

Publication number
JPH04291400A
JPH04291400A JP3056993A JP5699391A
Authority
JP
Japan
Prior art keywords
pitch
voice
frame
amplitude
encoding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP3056993A
Other languages
Japanese (ja)
Inventor
Akitoshi Kataoka
章俊 片岡
Takehiro Moriya
健弘 守谷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP3056993A priority Critical patent/JPH04291400A/en
Publication of JPH04291400A publication Critical patent/JPH04291400A/en
Pending legal-status Critical Current

Links

Landscapes

  • Complex Calculations (AREA)

Abstract

PURPOSE: To improve the quality of encoded speech. CONSTITUTION: A quantized amplitude is selected and given to a pitch period candidate chosen from a pitch excitation source, the resulting pitch signal drives a synthesis filter to synthesize speech, and the filter coefficients are set in the synthesis filter by linear prediction from the past decoded speech waveform. The pitch period candidate and the quantized amplitude value are selected so that the distortion of the synthesized speech with respect to the input speech becomes minimum, and the speech is thereby encoded. The autocorrelation coefficient of the past decoded speech is calculated, and one of two sets of four quantization gains, g0-g3 or g0'-g3', is selected according to whether the calculated coefficient is 0.6 or more, so that the amplitude of the pitch period candidate is determined adaptively.

Description

[Detailed Description of the Invention]

[0001]

[Industrial Field of Application] This invention relates to a low-delay code-driven predictive coding method that selects a pitch candidate from a pitch excitation source, gives the pitch candidate a quantized amplitude to drive a synthesis filter, sets the filter coefficients of that synthesis filter by linear prediction from the speech waveform decoded so far, uses the synthesis filter to determine the pitch candidate and the quantized amplitude value, and outputs codes in frames of a small number of samples, thereby encoding speech with little delay.

[0002]

[Prior Art] In fields such as digital mobile communication, various high-efficiency coding methods are used to make effective use of radio waves. Known methods for coding at an information rate of about 8 kbit/s include CELP (code-excited linear prediction), VSELP (vector-sum excited linear prediction), and multipulse coding.

[0003] In these systems, as shown in FIG. 3, a filter coefficient determination unit 11 computes prediction coefficients from a plurality of samples of input speech to determine filter coefficients, and sets those filter coefficients in a synthesis filter 12; A(Z) is the transfer function of the synthesis filter 12. A pitch period taken from the plural pitch period components (excitation candidates) of a pitch excitation source 13 and a candidate taken from the plural noise waveform vectors (for example random-number vectors; excitation candidates) of a codebook excitation source 14 are each given quantized gains (amplitudes) in gain units 15 and 16, then added and supplied to the synthesis filter 12 as a drive signal to synthesize speech. A power calculation unit 17 selects the excitation candidates in both excitation sources 13 and 14 and sets the gains of the gain units 15 and 16 so that the distortion of the synthesized speech with respect to the input speech becomes minimum. A code output unit 18 outputs as codes the prediction coefficients and the code numbers and gains selected for the pitch period component candidates and the codebook candidates.
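The candidate-and-gain search of FIG. 3 can be sketched as an analysis-by-synthesis loop. This is an illustrative simplification (a single excitation source and a brute-force search; the function names and the all-pole filter realization are assumptions of this sketch, not taken from the patent):

```python
def synthesize(a, excitation):
    """Drive an all-pole synthesis filter 1/A(z) with an excitation frame.
    a: prediction coefficients a[1..p] of A(z) = 1 - sum_k a_k z^-k."""
    out = []
    for n, e in enumerate(excitation):
        s = e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                s += ak * out[n - k]
        out.append(s)
    return out

def search_excitation(target, a, candidates, gains):
    """Pick the (candidate, gain) pair whose synthesized frame is closest
    to the target frame in squared error, as in the distortion
    minimization performed by the power calculation unit 17."""
    best = None
    for ci, cand in enumerate(candidates):
        synth_unit = synthesize(a, cand)
        for gi, g in enumerate(gains):
            d = sum((t - g * s) ** 2 for t, s in zip(target, synth_unit))
            if best is None or d < best[0]:
                best = (d, ci, gi)
    return best  # (distortion, candidate index, gain index)
```

Only the indices of the selected candidate and gain need to be transmitted, since the decoder holds the same candidate tables.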

[0004] The prediction coefficients that determine the filter coefficients of the synthesis filter 12 are obtained by analyzing the input speech. Processing is carried out with about 20-30 ms (usually 128 or 256 samples) as one frame. In a forward-prediction scheme of this kind, in which the prediction coefficients are obtained from the one frame of samples ahead of the sample to be encoded, the coded output is delayed by at least one frame. Because one frame is long in these methods, a large delay results.

[0005] At present, for applications such as personal communication, speech coding methods with little delay are in demand, and methods that cause a large delay as described above are undesirable. As a low-delay speech coding method, the LD-CELP (low-delay code-excited linear prediction) coding scheme at 16 kbit/s is known. This method uses backward-type pitch prediction and proximity prediction. That is, instead of using the signal in the frame currently being quantized to compute the prediction coefficients, the coded output is stored in a storage/decoding unit 19 as shown by the broken line in FIG. 3; this past code is decoded, the filter coefficient determination unit 11 applies a window to the decoded speech, and linear prediction including the pitch periodicity is performed via a correlation function. In other words, the waveform of past frames is decoded, the filter coefficients of the synthesis filter 12 are obtained from that waveform, and using that synthesis filter 1/A0(Z), the pitch parameter candidate in the pitch excitation source 13 and its quantized amplitude (gain) value are obtained, together with the shape vector (noise component) candidate in the codebook excitation source 14 and its quantized amplitude value; the codes of both candidates and the quantized values thus obtained are transmitted.

[0006] In this method, the previously decoded speech is available in common to both the coder and the decoder, so there is no need to transmit information on the prediction coefficients or the periodicity (pitch). The number of samples per frame can therefore be made small, for example 5 to 10 samples, the frame length can be shortened, and coding with little delay is realized.

[0007] However, since LD-CELP predicts the current frame only from past decoded sequences, the prediction error is larger than in the conventional forward-prediction type. Consequently, at coding rates of about 8 kbit/s the waveform distortion increases sharply and the quality deteriorates. To realize low-delay speech coding at an information rate of about 8 kbit/s, a technique has been proposed in which, unlike LD-CELP, the pitch periodicity is not included in the linear prediction, and the pitch period component is instead also extracted from the decoded speech.

[0008] In all of the conventional techniques, however, the quantization width of the pitch gain (amplitude) is a fixed value determined from the maximum dynamic range according to the target coding accuracy, so an appropriate quantization width for the pitch gain is not provided. That is, in searching the pitch period component candidates of the pitch excitation source 13, the g and C that minimize the distortion d with respect to the input speech, given by the following equation, are determined.

[0009] d = (X − g·H·C)²
where X is the input speech, g is the pitch gain, H is the impulse response of the synthesis filter 12, and C is the pitch component candidate. For a given candidate C, the optimum pitch gain g is given by the following equation.

[0010] g = (X, C)/|C|²
The distribution of the optimum gain of the pitch period component versus the correlation coefficient in each frame, obtained by analyzing about 1000 seconds of speech over 100 sentences, is shown in FIG. 4. In FIG. 4, for example, the entry for a correlation coefficient between 0.9 and 1.0 indicates that there were 1560 frames whose optimum gain fell between 0.9 and 1.0. As can be seen from FIG. 4, the distribution of the optimum gain (amplitude) differs with the value of the correlation coefficient, and the larger the correlation coefficient, the narrower the range of the optimum gain. Conventionally, however, this relationship was not known and a fixed quantization width was used, so good coding was not always achieved.
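Equation [0010] is the least-squares solution of the distortion in [0009]: setting the derivative of d with respect to g to zero gives g = (X, C)/|C|². A minimal sketch (here C is taken to be the candidate already filtered by H, which is an assumption about the notation):

```python
def optimal_pitch_gain(x, c):
    """g = (X, C) / |C|^2: the gain minimizing d = (X - g*C)^2
    for a (filtered) pitch candidate C against input speech X,
    with (X, C) the inner product and |C|^2 the candidate energy."""
    energy = sum(ck * ck for ck in c)
    if energy == 0.0:
        return 0.0  # degenerate candidate: any gain gives the same distortion
    return sum(xk * ck for xk, ck in zip(x, c)) / energy
```

For example, with x = [2, 4] and c = [1, 2], the gain is (2 + 8)/5 = 2.0, and the distortion at that gain is zero.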

[0011] An object of this invention is to provide a low-delay code-driven predictive coding method that can realize high-quality speech coding by adaptively controlling the quantization width of the pitch gain on the basis of information on past coded speech when coding speech with low delay.

[0012]

[Means for Solving the Problems] According to this invention, in a low-delay code-driven predictive coding method, the autocorrelation of the past decoded speech is obtained, and the quantization width of the pitch gain (amplitude) in the current frame is adaptively controlled with the correlation function value as a parameter. As this control parameter, the maximum value of the autocorrelation function, or the correlation coefficient at the period selected by conditional pitch prediction, can be considered. In the latter case there is, further, Method 1: a plurality of period candidates with large autocorrelation are selected, the period most desirable for pitch prediction of the current frame is found among them, and the quantization width of the gain is decided with the correlation coefficient corresponding to that period as a parameter.

[0013] Method 2: a plurality of period candidates with large autocorrelation are selected, the quantization width is decided for each candidate on the basis of the corresponding autocorrelation value, and the period and gain most desirable for pitch prediction of the current frame are decided as a pair. Method 3: the quantization width is decided on the basis of the correlation coefficient at the period used in the immediately preceding pitch prediction, and that width is used in common for all period candidates to quantize the current pitch gain. Other variations are also possible.

[0014]

[Embodiment] According to this invention, as described above, in the low-delay code-driven predictive coding method the autocorrelation of the past decoded speech is obtained, and the quantization width of the pitch gain in the current frame is adaptively controlled with the correlation function as a parameter. That is, as shown in FIG. 4, when the correlation is high the optimum gain is concentrated near 1, and as the correlation becomes lower the range of optimum gain values widens.

[0015] In this example, therefore, taking advantage of the high correlation between adjacent frames, the correlation coefficient value of the candidate selected in the previous frame is used, and as shown in FIG. 1 the pitch gain is quantized with one of two sets of four states, g0-g3 or g0'-g3', that is, with 2 bits, according to whether the correlation coefficient of the previous frame is at least 0.6 or below 0.6. Because the choice between these two sets of four states is decided by whether the previous frame's value, obtained from the autocorrelation of the past decoded speech, is 0.6 or more, the receiving side can make the same decision, and 2 bits suffice for quantizing the pitch gain.
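The two-table, 2-bit gain quantization described here can be sketched as follows. The actual gain values g0-g3 and g0'-g3' appear only in FIG. 1, so the table entries below are invented for illustration; only the ≥ 0.6 switching rule comes from the text:

```python
# Two 4-entry gain tables, indexed by whether the previous frame's
# correlation coefficient was >= 0.6. The entries are illustrative
# only -- the patent gives the real values in FIG. 1, not in the text.
GAINS_HIGH_CORR = [0.80, 0.90, 1.00, 1.10]   # narrow range near 1
GAINS_LOW_CORR  = [0.20, 0.50, 0.80, 1.10]   # wider range

def quantize_pitch_gain(g, prev_corr):
    """Return (2-bit index, quantized gain). Both coder and decoder can
    derive prev_corr from past decoded speech, so only the 2-bit index
    needs to be transmitted."""
    table = GAINS_HIGH_CORR if prev_corr >= 0.6 else GAINS_LOW_CORR
    idx = min(range(4), key=lambda i: abs(table[i] - g))
    return idx, table[idx]
```

Because the table selection depends only on information already available at the decoder, no side information beyond the 2-bit index is required, exactly as argued in the paragraph above.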

[0016] In the above, the quantization width was adaptively changed according to whether the correlation coefficient was 0.6 or more or less, but as shown in FIG. 2 the quantization width may be adapted with a still finer division. The example of FIG. 2 is a case of eight divisions, which allows quantization closer to the distribution of the optimum gain. In the two examples above, the value for the previous frame was used as the correlation coefficient value of the current frame. Alternatively, this value can be predicted by linear prediction using the sequence of correlation coefficient values of the candidates selected in the past. This makes it possible to quantize the pitch gain with a value close to the correlation coefficient value of the candidate that will be selected in the current frame.
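The linear prediction of the current frame's correlation value from the past sequence can be sketched as below; the predictor order and weights are assumptions for illustration, since the text only states that linear prediction over the past correlation values is used:

```python
def predict_correlation(past_corrs, weights=(0.7, 0.2, 0.1)):
    """Predict the current frame's correlation coefficient from the
    sequence of past selected candidates' correlation values.
    The fixed weights here are illustrative; the patent only states
    that linear prediction over the past sequence is used."""
    recent = past_corrs[-len(weights):][::-1]  # most recent value first
    r = sum(w * c for w, c in zip(weights, recent))
    return min(max(r, 0.0), 1.0)  # keep the prediction in [0, 1]
```

The predicted value would then drive the table selection (for instance the ≥ 0.6 rule of [0015], or a finer division as in FIG. 2) in place of the raw previous-frame value.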

[0017]

[Effects of the Invention] As described above, according to this invention the quality of coded speech can be improved by adaptively controlling the quantization width of the pitch gain in the current frame on the basis of information from past decoded speech.

[Brief Description of the Drawings]

[FIG. 1] A diagram showing a numerical example in which, in an embodiment of this invention, the quantization width of the pitch gain is changed according to whether the correlation coefficient is 0.6 or more or less than 0.6.

[FIG. 2] A diagram showing a numerical example in which, in an embodiment of this invention, the value of the correlation coefficient is divided into ten ranges and the quantization width is changed according to which range it falls in.

[FIG. 3] A block diagram showing the general configuration of a low-delay code-driven predictive coding method.

[FIG. 4] A diagram showing the relationship between the value of the autocorrelation coefficient and the optimum pitch gain.

Claims (1)

[Claims] [Claim 1] In a predictive coding method in which a speech signal is handled with a relatively small number of samples as one frame, a pitch candidate is selected from a pitch excitation source, a quantized amplitude is given to the pitch candidate to drive a synthesis filter, the filter coefficients of the synthesis filter are set by linear prediction from the speech waveform decoded up to one frame before, and the pitch candidate and the quantized value are determined using the synthesis filter and coded in units of one frame: a low-delay code-driven predictive coding method characterized in that a correlation function of the speech waveform decoded up to one frame before is obtained, and the quantization width of the pitch amplitude used for coding the current frame is adaptively controlled according to the value of that correlation function.
JP3056993A 1991-03-20 1991-03-20 Low delay code device type predictive encoding method Pending JPH04291400A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP3056993A JPH04291400A (en) 1991-03-20 1991-03-20 Low delay code device type predictive encoding method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP3056993A JPH04291400A (en) 1991-03-20 1991-03-20 Low delay code device type predictive encoding method

Publications (1)

Publication Number Publication Date
JPH04291400A true JPH04291400A (en) 1992-10-15

Family

ID=13043021

Family Applications (1)

Application Number Title Priority Date Filing Date
JP3056993A Pending JPH04291400A (en) 1991-03-20 1991-03-20 Low delay code device type predictive encoding method

Country Status (1)

Country Link
JP (1) JPH04291400A (en)

Similar Documents

Publication Publication Date Title
KR100304682B1 (en) Fast Excitation Coding for Speech Coders
JP3346765B2 (en) Audio decoding method and audio decoding device
US20020111800A1 (en) Voice encoding and voice decoding apparatus
US5727122A (en) Code excitation linear predictive (CELP) encoder and decoder and code excitation linear predictive coding method
US5953697A (en) Gain estimation scheme for LPC vocoders with a shape index based on signal envelopes
US5659659A (en) Speech compressor using trellis encoding and linear prediction
US6910009B1 (en) Speech signal decoding method and apparatus, speech signal encoding/decoding method and apparatus, and program product therefor
EP1005022B1 (en) Speech encoding method and speech encoding system
JP3416331B2 (en) Audio decoding device
JP2002268696A (en) Sound signal encoding method, method and device for decoding, program, and recording medium
US5313554A (en) Backward gain adaptation method in code excited linear prediction coders
CA2090205C (en) Speech coding system
JPH0944195A (en) Voice encoding device
US5719993A (en) Long term predictor
US6088667A (en) LSP prediction coding utilizing a determined best prediction matrix based upon past frame information
JPH04291400A (en) Low delay code device type predictive encoding method
JP2613503B2 (en) Speech excitation signal encoding / decoding method
JPH10207496A (en) Voice encoding device and voice decoding device
EP1355298B1 (en) Code Excitation linear prediction encoder and decoder
JPS6238500A (en) Highly efficient voice coding system and apparatus
JPH0519795A (en) Excitation signal encoding and decoding method for voice
JP2968109B2 (en) Code-excited linear prediction encoder and decoder
US5761635A (en) Method and apparatus for implementing a long-term synthesis filter
JPH08234795A (en) Voice encoding device
JPH0561499A (en) Voice encoding/decoding method