JPH04291400A - Low-delay code-driven predictive encoding method - Google Patents

Low-delay code-driven predictive encoding method

Info

Publication number
JPH04291400A
JPH04291400A JP3056993A JP5699391A
Authority
JP
Japan
Prior art keywords
pitch
voice
frame
amplitude
encoding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP3056993A
Other languages
Japanese (ja)
Inventor
Akitoshi Kataoka
章俊 片岡
Takehiro Moriya
健弘 守谷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP3056993A priority Critical patent/JPH04291400A/en
Publication of JPH04291400A publication Critical patent/JPH04291400A/en
Pending legal-status Critical Current

Links

Landscapes

  • Complex Calculations (AREA)

Abstract

PURPOSE: To improve the quality of encoded speech. CONSTITUTION: A quantized amplitude is selected and given to a pitch period candidate chosen from a pitch excitation source, the resulting pitch signal drives a synthesis filter to synthesize speech, and the filter coefficients are set in the synthesis filter by linear prediction from the past decoded speech waveform. The pitch period candidate and the quantized amplitude value are selected so that the distortion of the synthesized speech with respect to the input speech becomes minimum, and the speech is thereby encoded. The autocorrelation coefficient of the past decoded speech is calculated, and one of two sets of four quantization gains, g0-g3 or g0'-g3', is selected according to whether the calculated coefficient is 0.6 or more, so that the amplitude of the pitch period candidate is determined adaptively.

Description

[Detailed Description of the Invention]

[0001]

[Industrial Field of Application] This invention relates to a low-delay code-driven predictive coding method that selects a pitch candidate from a pitch excitation source, gives the pitch candidate a quantized amplitude to drive a synthesis filter, sets the filter coefficients of that synthesis filter by linear prediction from the speech waveform decoded so far, uses the synthesis filter to determine the pitch candidate and the quantized amplitude value, and outputs codes in frames of a small number of samples, thereby encoding speech with little delay.

[0002]

[Prior Art] In fields such as digital mobile communication, various high-efficiency coding methods are used to make effective use of radio waves. Known methods for coding at an information rate of about 8 kbit/s include CELP (code-excited linear prediction), VSELP (vector-sum excited linear prediction), and multipulse coding.

[0003] In these systems, as shown in FIG. 3, a filter coefficient determination unit 11 computes prediction coefficients from a plurality of samples of input speech to determine filter coefficients, and sets those filter coefficients in a synthesis filter 12; A(Z) is the transfer function of the synthesis filter 12. A pitch period taken from the plural pitch period components (excitation candidates) of a pitch excitation source 13 and a candidate taken from the plural noise waveform vectors (for example random-number vectors; excitation candidates) of a codebook excitation source 14 are each given quantized gains (amplitudes) in gain units 15 and 16, then added and supplied to the synthesis filter 12 as a drive signal to synthesize speech. A power calculation unit 17 selects the excitation candidates in both excitation sources 13 and 14 and sets the gains of the gain units 15 and 16 so that the distortion of the synthesized speech with respect to the input speech becomes minimum. A code output unit 18 outputs as codes the prediction coefficients and the code numbers and gains selected for the pitch period component candidates and the codebook candidates.
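The candidate-and-gain search of FIG. 3 can be sketched as an analysis-by-synthesis loop. This is an illustrative simplification (a single excitation source and a brute-force search; the function names and the all-pole filter realization are assumptions of this sketch, not taken from the patent):

```python
def synthesize(a, excitation):
    """Drive an all-pole synthesis filter 1/A(z) with an excitation frame.
    a: prediction coefficients a[1..p] of A(z) = 1 - sum_k a_k z^-k."""
    out = []
    for n, e in enumerate(excitation):
        s = e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                s += ak * out[n - k]
        out.append(s)
    return out

def search_excitation(target, a, candidates, gains):
    """Pick the (candidate, gain) pair whose synthesized frame is closest
    to the target frame in squared error, as in the distortion
    minimization performed by the power calculation unit 17."""
    best = None
    for ci, cand in enumerate(candidates):
        synth_unit = synthesize(a, cand)
        for gi, g in enumerate(gains):
            d = sum((t - g * s) ** 2 for t, s in zip(target, synth_unit))
            if best is None or d < best[0]:
                best = (d, ci, gi)
    return best  # (distortion, candidate index, gain index)
```

Only the indices of the selected candidate and gain need to be transmitted, since the decoder holds the same candidate tables.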

[0004] The prediction coefficients that determine the filter coefficients of the synthesis filter 12 are obtained by analyzing the input speech. Processing is carried out with about 20-30 ms (usually 128 or 256 samples) as one frame. In a forward-prediction scheme of this kind, in which the prediction coefficients are obtained from the one frame of samples ahead of the sample to be encoded, the coded output is delayed by at least one frame. Because one frame is long in these methods, a large delay results.

[0005] At present, for applications such as personal communication, speech coding methods with little delay are in demand, and methods that cause a large delay as described above are undesirable. As a low-delay speech coding method, the LD-CELP (low-delay code-excited linear prediction) coding scheme at 16 kbit/s is known. This method uses backward-type pitch prediction and proximity prediction. That is, instead of using the signal in the frame currently being quantized to compute the prediction coefficients, the coded output is stored in a storage/decoding unit 19 as shown by the broken line in FIG. 3; this past code is decoded, the filter coefficient determination unit 11 applies a window to the decoded speech, and linear prediction including the pitch periodicity is performed via a correlation function. In other words, the waveform of past frames is decoded, the filter coefficients of the synthesis filter 12 are obtained from that waveform, and using that synthesis filter 1/A0(Z), the pitch parameter candidate in the pitch excitation source 13 and its quantized amplitude (gain) value are obtained, together with the shape vector (noise component) candidate in the codebook excitation source 14 and its quantized amplitude value; the codes of both candidates and the quantized values thus obtained are transmitted.

[0006] In this method, the previously decoded speech is available in common to both the coder and the decoder, so there is no need to transmit information on the prediction coefficients or the periodicity (pitch). The number of samples per frame can therefore be made small, for example 5 to 10 samples, the frame length can be shortened, and coding with little delay is realized.

[0007] However, since LD-CELP predicts the current frame only from past decoded sequences, the prediction error is larger than in the conventional forward-prediction type. Consequently, at coding rates of about 8 kbit/s the waveform distortion increases sharply and the quality deteriorates. To realize low-delay speech coding at an information rate of about 8 kbit/s, a technique has been proposed in which, unlike LD-CELP, the pitch periodicity is not included in the linear prediction, and the pitch period component is instead also extracted from the decoded speech.

[0008] In all of the conventional techniques, however, the quantization width of the pitch gain (amplitude) is a fixed value determined from the maximum dynamic range according to the target coding accuracy, so an appropriate quantization width for the pitch gain is not provided. That is, in searching the pitch period component candidates of the pitch excitation source 13, the g and C that minimize the distortion d with respect to the input speech, given by the following equation, are determined.

[0009] d = (X − g·H·C)²
where X is the input speech, g is the pitch gain, H is the impulse response of the synthesis filter 12, and C is the pitch component candidate. For a given candidate C, the optimum pitch gain g is given by the following equation.

[0010] g = (X, C)/|C|²
The distribution of the optimum gain of the pitch period component versus the correlation coefficient in each frame, obtained by analyzing about 1000 seconds of speech over 100 sentences, is shown in FIG. 4. In FIG. 4, for example, the entry for a correlation coefficient between 0.9 and 1.0 indicates that there were 1560 frames whose optimum gain fell between 0.9 and 1.0. As can be seen from FIG. 4, the distribution of the optimum gain (amplitude) differs with the value of the correlation coefficient, and the larger the correlation coefficient, the narrower the range of the optimum gain. Conventionally, however, this relationship was not known and a fixed quantization width was used, so good coding was not always achieved.
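Equation [0010] is the least-squares solution of the distortion in [0009]: setting the derivative of d with respect to g to zero gives g = (X, C)/|C|². A minimal sketch (here C is taken to be the candidate already filtered by H, which is an assumption about the notation):

```python
def optimal_pitch_gain(x, c):
    """g = (X, C) / |C|^2: the gain minimizing d = (X - g*C)^2
    for a (filtered) pitch candidate C against input speech X,
    with (X, C) the inner product and |C|^2 the candidate energy."""
    energy = sum(ck * ck for ck in c)
    if energy == 0.0:
        return 0.0  # degenerate candidate: any gain gives the same distortion
    return sum(xk * ck for xk, ck in zip(x, c)) / energy
```

For example, with x = [2, 4] and c = [1, 2], the gain is (2 + 8)/5 = 2.0, and the distortion at that gain is zero.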

[0011] An object of this invention is to provide a low-delay code-driven predictive coding method that can realize high-quality speech coding by adaptively controlling the quantization width of the pitch gain on the basis of information on past coded speech when coding speech with low delay.

[0012]

[Means for Solving the Problems] According to this invention, in a low-delay code-driven predictive coding method, the autocorrelation of the past decoded speech is obtained, and the quantization width of the pitch gain (amplitude) in the current frame is adaptively controlled with the correlation function value as a parameter. As this control parameter, the maximum value of the autocorrelation function, or the correlation coefficient at the period selected by conditional pitch prediction, can be considered. In the latter case there is, further, Method 1: a plurality of period candidates with large autocorrelation are selected, the period most desirable for pitch prediction of the current frame is found among them, and the quantization width of the gain is decided with the correlation coefficient corresponding to that period as a parameter.

[0013] Method 2: a plurality of period candidates with large autocorrelation are selected, the quantization width is decided for each candidate on the basis of the corresponding autocorrelation value, and the period and gain most desirable for pitch prediction of the current frame are decided as a pair. Method 3: the quantization width is decided on the basis of the correlation coefficient at the period used in the immediately preceding pitch prediction, and that width is used in common for all period candidates to quantize the current pitch gain. Other variations are also possible.

[0014]

[Embodiment] According to this invention, as described above, in the low-delay code-driven predictive coding method the autocorrelation of the past decoded speech is obtained, and the quantization width of the pitch gain in the current frame is adaptively controlled with the correlation function as a parameter. That is, as shown in FIG. 4, when the correlation is high the optimum gain is concentrated near 1, and as the correlation becomes lower the range of optimum gain values widens.

[0015] In this example, therefore, taking advantage of the high correlation between adjacent frames, the correlation coefficient value of the candidate selected in the previous frame is used, and as shown in FIG. 1 the pitch gain is quantized with one of two sets of four states, g0-g3 or g0'-g3', that is, with 2 bits, according to whether the correlation coefficient of the previous frame is at least 0.6 or below 0.6. Because the choice between these two sets of four states is decided by whether the previous frame's value, obtained from the autocorrelation of the past decoded speech, is 0.6 or more, the receiving side can make the same decision, and 2 bits suffice for quantizing the pitch gain.
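The two-table, 2-bit gain quantization described here can be sketched as follows. The actual gain values g0-g3 and g0'-g3' appear only in FIG. 1, so the table entries below are invented for illustration; only the ≥ 0.6 switching rule comes from the text:

```python
# Two 4-entry gain tables, indexed by whether the previous frame's
# correlation coefficient was >= 0.6. The entries are illustrative
# only -- the patent gives the real values in FIG. 1, not in the text.
GAINS_HIGH_CORR = [0.80, 0.90, 1.00, 1.10]   # narrow range near 1
GAINS_LOW_CORR  = [0.20, 0.50, 0.80, 1.10]   # wider range

def quantize_pitch_gain(g, prev_corr):
    """Return (2-bit index, quantized gain). Both coder and decoder can
    derive prev_corr from past decoded speech, so only the 2-bit index
    needs to be transmitted."""
    table = GAINS_HIGH_CORR if prev_corr >= 0.6 else GAINS_LOW_CORR
    idx = min(range(4), key=lambda i: abs(table[i] - g))
    return idx, table[idx]
```

Because the table selection depends only on information already available at the decoder, no side information beyond the 2-bit index is required, exactly as argued in the paragraph above.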

[0016] In the above, the quantization width was adaptively changed according to whether the correlation coefficient was 0.6 or more or less, but as shown in FIG. 2 the quantization width may be adapted with a still finer division. The example of FIG. 2 is a case of eight divisions, which allows quantization closer to the distribution of the optimum gain. In the two examples above, the value for the previous frame was used as the correlation coefficient value of the current frame. Alternatively, this value can be predicted by linear prediction using the sequence of correlation coefficient values of the candidates selected in the past. This makes it possible to quantize the pitch gain with a value close to the correlation coefficient value of the candidate that will be selected in the current frame.
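The linear prediction of the current frame's correlation value from the past sequence can be sketched as below; the predictor order and weights are assumptions for illustration, since the text only states that linear prediction over the past correlation values is used:

```python
def predict_correlation(past_corrs, weights=(0.7, 0.2, 0.1)):
    """Predict the current frame's correlation coefficient from the
    sequence of past selected candidates' correlation values.
    The fixed weights here are illustrative; the patent only states
    that linear prediction over the past sequence is used."""
    recent = past_corrs[-len(weights):][::-1]  # most recent value first
    r = sum(w * c for w, c in zip(weights, recent))
    return min(max(r, 0.0), 1.0)  # keep the prediction in [0, 1]
```

The predicted value would then drive the table selection (for instance the ≥ 0.6 rule of [0015], or a finer division as in FIG. 2) in place of the raw previous-frame value.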

[0017]

[Effects of the Invention] As described above, according to this invention the quality of coded speech can be improved by adaptively controlling the quantization width of the pitch gain in the current frame on the basis of information from past decoded speech.

[Brief Description of the Drawings]

[FIG. 1] A diagram showing a numerical example in which, in an embodiment of this invention, the quantization width of the pitch gain is changed according to whether the correlation coefficient is 0.6 or more or less than 0.6.

[FIG. 2] A diagram showing a numerical example in which, in an embodiment of this invention, the value of the correlation coefficient is divided into ten ranges and the quantization width is changed according to which range it falls in.

[FIG. 3] A block diagram showing the general configuration of a low-delay code-driven predictive coding method.

[FIG. 4] A diagram showing the relationship between the value of the autocorrelation coefficient and the optimum pitch gain.

Claims (1)

[Claims] [Claim 1] In a predictive coding method in which a speech signal is handled with a relatively small number of samples as one frame, a pitch candidate is selected from a pitch excitation source, a quantized amplitude is given to the pitch candidate to drive a synthesis filter, the filter coefficients of the synthesis filter are set by linear prediction from the speech waveform decoded up to one frame before, and the pitch candidate and the quantized value are determined using the synthesis filter and coded in units of one frame: a low-delay code-driven predictive coding method characterized in that a correlation function of the speech waveform decoded up to one frame before is obtained, and the quantization width of the pitch amplitude used for coding the current frame is adaptively controlled according to the value of that correlation function.
JP3056993A 1991-03-20 1991-03-20 Low delay code device type predictive encoding method Pending JPH04291400A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP3056993A JPH04291400A (en) 1991-03-20 1991-03-20 Low delay code device type predictive encoding method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP3056993A JPH04291400A (en) 1991-03-20 1991-03-20 Low delay code device type predictive encoding method

Publications (1)

Publication Number Publication Date
JPH04291400A true JPH04291400A (en) 1992-10-15

Family

ID=13043021

Family Applications (1)

Application Number Title Priority Date Filing Date
JP3056993A Pending JPH04291400A (en) 1991-03-20 1991-03-20 Low delay code device type predictive encoding method

Country Status (1)

Country Link
JP (1) JPH04291400A (en)

Similar Documents

Publication Publication Date Title
KR100304682B1 (en) Fast Excitation Coding for Speech Coders
JP3346765B2 (en) Audio decoding method and audio decoding device
US20020111800A1 (en) Voice encoding and voice decoding apparatus
US5727122A (en) Code excitation linear predictive (CELP) encoder and decoder and code excitation linear predictive coding method
US5953697A (en) Gain estimation scheme for LPC vocoders with a shape index based on signal envelopes
US5659659A (en) Speech compressor using trellis encoding and linear prediction
US6910009B1 (en) Speech signal decoding method and apparatus, speech signal encoding/decoding method and apparatus, and program product therefor
EP1005022B1 (en) Speech encoding method and speech encoding system
JP3416331B2 (en) Audio decoding device
JP2002268696A (en) Sound signal encoding method, method and device for decoding, program, and recording medium
US5313554A (en) Backward gain adaptation method in code excited linear prediction coders
CA2090205C (en) Speech coding system
JPH0944195A (en) Voice encoding device
US5719993A (en) Long term predictor
US6088667A (en) LSP prediction coding utilizing a determined best prediction matrix based upon past frame information
JPH04291400A (en) Low delay code device type predictive encoding method
JP2613503B2 (en) Speech excitation signal encoding / decoding method
JPH10207496A (en) Voice encoding device and voice decoding device
EP1355298B1 (en) Code Excitation linear prediction encoder and decoder
JPS6238500A (en) Highly efficient voice coding system and apparatus
JPH0519795A (en) Excitation signal encoding and decoding method for voice
JP2968109B2 (en) Code-excited linear prediction encoder and decoder
US5761635A (en) Method and apparatus for implementing a long-term synthesis filter
JPH08234795A (en) Voice encoding device
JPH0561499A (en) Voice encoding/decoding method