JP3002299B2

JP3002299B2 - Audio coding device

Info

Publication number: JP3002299B2
Application number: JP3196521A
Authority: JP
Inventors: 智一森尾
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1991-08-06
Filing date: 1991-08-06
Publication date: 2000-01-24
Anticipated expiration: 2015-01-24
Also published as: JPH0540500A

Abstract

PURPOSE: To provide a speech coding device with a pitch predictor that requires little transmitted information yet accurately predicts signals with a high pitch frequency. CONSTITUTION: In CELP speech coding, pitch length information is transmitted once per plurality of subframes. To determine the pitch length, the input speech signal is predicted by a linear predictor 102 for each plurality of subframes, and a linear prediction residual signal is calculated. The pitch length is determined (121) from the past excitation signal in an adaptive codebook 104 and the linear prediction residual signal, using the cross-correlation coefficient as the evaluation criterion. To determine the pitch prediction coefficients, pitch prediction (123) is performed for each subframe, at the pitch length obtained above, over every entry of the pitch prediction coefficient codebook 124; the reproduced signal is auditorily weighted, and the coefficients that minimize the error against the input signal are selected.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a device that compresses a speech signal efficiently by exploiting the pitch structure of the signal through pitch prediction, and that transmits or stores the compressed signal.

[0002]

[Prior Art] Adjacent samples of a speech signal are highly correlated, and linear prediction from the sequence of neighboring samples is widely used to compress speech signals efficiently. A speech signal also has repetitive redundancy due to its pitch, and pitch prediction is widely used to raise the compression efficiency further.

[0003] Code-Excited Linear Prediction (hereinafter CELP) is a low-bit-rate speech coding method that uses both of these linear prediction techniques, and it is the subject of active research and development (see, for example, "Stochastic Coding of Speech Signals at Very Low Bit Rates: The Importance of Speech Perception", M. R. Schroeder and B. S. Atal, Speech Communication 4, 1985, pp. 155-162, North-Holland). CELP treats the excitation signal waveform as a vector of, for example, 40 samples and uses vector quantization to compress and encode it at a very low bit rate.

[0004] The basic CELP encoder is described with reference to FIG. 5. The pitch predictor used here is a closed-loop one-tap pitch predictor, which is a known technique.

[0005] In the CELP encoder shown in FIG. 5, the linear predictor 102 calculates linear prediction coefficients (α) from the speech signal waveform input at the input terminal 101 once per fixed sample length (for example, 160 samples; this is called a frame). The subsequent processing is performed in units of a subframe (for example, 40 samples) obtained by dividing the frame. The spectrum predictor 103 and the auditory weighting filter 111 are formed from the linear prediction coefficients (α). For each subframe, the linear prediction coefficients are set by interpolating between the coefficients of adjacent frames. This processing is known. The adaptive codebook 104 is a memory that holds the past input signal to the spectrum predictor 103. For each subframe, under the direction of the control unit 114, it outputs a pitch prediction signal for every pitch length (tau) in the assumed range (for example, tau = 40 to 167 at 8 kHz sampling). This is expressed by Equation 1, where pit[n] is the pitch prediction signal, exc[n - tau] is the past input signal to the spectrum predictor 103, and a is the pitch prediction coefficient.

[0006]

(Equation 1)
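
Equation 1 is not reproduced in this text. Judging from the symbol definitions in paragraph [0005], it is presumably the one-tap predictor below; the typeset form and the sample range n = 0 to 39 (one 40-sample subframe) are assumptions.

```latex
\mathrm{pit}[n] = a \cdot \mathrm{exc}[n - \mathrm{tau}], \qquad n = 0, 1, \ldots, 39
```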

[0007] When the pitch length (tau) is shorter than the subframe length, a technique called virtual search is known for generating the pitch prediction signal (see, for example, "Improved Speech Quality and Efficient Vector Quantization in SELP", W. B. Kleijn et al., ICASSP 1988, pp. 155-158).
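
As an illustration of the one-tap predictor and of the short-pitch case, the following is a minimal Python sketch. The function name `one_tap_pitch_prediction` and the list layout are hypothetical, and the periodic repetition used when tau is shorter than the subframe is only one common reading of the virtual search; the cited paper should be consulted for the actual procedure.

```python
def one_tap_pitch_prediction(past_exc, tau, a, subframe_len=40):
    """Return pit[n] = a * exc[n - tau] for n = 0 .. subframe_len - 1.

    past_exc holds past excitation samples, oldest first, so that
    past_exc[-1] is exc[-1]. When tau < subframe_len, the samples
    exc[n - tau] with n >= tau do not exist yet; here they are filled
    by repeating the last tau samples periodically (an assumption).
    """
    if tau < 1 or tau > len(past_exc):
        raise ValueError("tau must lie within the stored excitation")
    segment = past_exc[-tau:]  # exc[-tau] .. exc[-1]
    return [a * segment[n % tau] for n in range(subframe_len)]
```

With ten stored samples 1 to 10, a pitch length of 3 and a 7-sample subframe, the last three samples (8, 9, 10) are scaled by `a` and repeated to fill the subframe.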

[0008] The stochastic codebook 106 stores a plurality (for example, 1024) of excitation signal waveforms (called codewords) of a fixed sample length (for example, 40 samples), and outputs all of the codewords in turn under the direction of the control unit 114.

[0009] The optimal pitch length (tau) and its prediction coefficient (a), and the index of the optimal codeword (index) and its gain (b), are determined by analysis-by-synthesis. The sum of the gain-scaled pitch prediction signal and excitation signal is input to the linear predictor 102 to obtain a reproduced signal; the difference between the reproduced signal and the input signal waveform is spectrally shaped by the auditory weighting filter 111; and the parameters are chosen so that the energy of the shaped signal, computed by the energy calculator 113, is minimized. This processing is a known technique.

[0010] Next, the pitch predictors used in CELP are described.

[0011] In a predictor of the kind described by Equation 1 (called a one-tap pitch predictor), the pitch length (tau) can take only integer values. The pitch frequencies that can be represented are therefore very coarsely spaced at high pitch frequencies, which is a drawback. To solve this, multi-tap predictors (Equation 2 shows the three-tap case) and pitch predictors that use a non-integer pitch length in samples (called fractional delay), obtained by oversampling, have been proposed (see, for example, "Improved Pitch Prediction with Fractional Delays in CELP Coding", J. S. Marques et al., ICASSP 1990, pp. 665-668).

[0012]

(Equation 2)
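
Equation 2 is likewise not reproduced in this text. A three-tap pitch predictor is conventionally written as below; the tap indexing (a_{-1}, a_0, a_1 centered on the delay tau) is an assumption and may differ from the notation of the original figure.

```latex
\mathrm{pit}[n] = \sum_{k=-1}^{1} a_k \,\mathrm{exc}[n - \mathrm{tau} + k]
```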

[0013]

[Problems to Be Solved by the Invention] However, when the amount of transmitted information in CELP coding is to be reduced, the parameters that represent the pitch length account for a considerable share of it. In the CELP coder (4.8 kbps) established by the U.S. Department of Defense, for example, the share is about 20%. To obtain high-quality reproduced sound it is desirable to incorporate a pitch predictor that uses three taps or a non-integer delay, but it is then difficult to keep the amount of information transmitted for the pitch predictor low.

[0014]

[Means for Solving the Problems] To achieve the object of the present invention, there is provided a speech coding device comprising linear prediction means, pitch length calculation means, an adaptive codebook, a prediction coefficient codebook, three-tap pitch prediction means, a stochastic codebook, multiplication means, addition means, a filter, error calculation means, and control means, wherein: the linear prediction means calculates linear prediction coefficients from the input speech signal in units of a plurality of subframes; the pitch length calculation means calculates a pitch length from the input speech signal in units of a plurality of subframes; the adaptive codebook stores the past excitation signal and reads out an adaptive excitation signal based on the pitch length; the prediction coefficient codebook stores prediction coefficients corresponding to a first index and reads out prediction coefficients under the control of the control means; the three-tap pitch prediction means calculates a prediction signal from the adaptive excitation signal and the prediction coefficients; the stochastic codebook stores a plurality of stochastic excitation signals corresponding to a second index and reads out a stochastic excitation signal under the control of the control means; the multiplication means multiplies the read-out stochastic excitation signal by a multiplication coefficient and outputs the product; the addition means adds the pitch prediction signal and the output of the multiplication means and outputs an excitation signal; the filter filters the excitation signal based on the linear prediction coefficients; the error calculation means calculates the pitch prediction error energy, which is the error between the input speech signal and the filter output; and the control means determines, for each subframe, the first and second indices and the multiplication coefficient so that the pitch prediction error energy is minimized.

[0015]

[Operation] In CELP speech coding, the pitch length information is transmitted once per plurality of subframes while the pitch prediction coefficients are transmitted every subframe. To this end, the parameters of the three-tap pitch predictor that uses the adaptive codebook are determined as follows.

[0016] To determine the pitch length, the input speech signal is predicted by the linear predictor for each plurality of subframes, and a linear prediction residual signal is calculated. The pitch length is determined from the past excitation signal in the adaptive codebook and this linear prediction residual signal, using the cross-correlation coefficient as the evaluation criterion. The pitch prediction coefficients are determined for each subframe: at the pitch length obtained above, pitch prediction is performed for every entry of the pitch prediction coefficient codebook, the reproduced signal is auditorily weighted, and the coefficients that minimize the error against the input signal are selected. In this way the pitch length is transmitted once per plurality of subframes and the pitch prediction coefficients are transmitted every subframe.

[0017] In another invention, there is provided a speech coding device comprising linear prediction means, an adaptive codebook, a prediction coefficient codebook, three-tap pitch prediction means, a stochastic codebook, multiplication means, addition means, a filter, error calculation means, and control means, wherein: the linear prediction means calculates linear prediction coefficients from the input speech signal in units of a plurality of subframes; the adaptive codebook stores the past excitation signal and reads out an adaptive excitation signal based on the pitch length specified by the control means; the prediction coefficient codebook stores prediction coefficients corresponding to a first index and reads out prediction coefficients under the control of the control means; the three-tap pitch prediction means calculates a prediction signal from the adaptive excitation signal and the prediction coefficients; the stochastic codebook stores a plurality of stochastic excitation signals corresponding to a second index and reads out a stochastic excitation signal under the control of the control means; the multiplication means multiplies the read-out stochastic excitation signal by a multiplication coefficient and outputs the product; the addition means adds the pitch prediction signal and the output of the multiplication means and outputs an excitation signal; the filter filters the excitation signal based on the linear prediction coefficients; the error calculation means calculates the pitch prediction error energy, which is the error between the input speech signal and the filter output; and the control means specifies the pitch length to the adaptive codebook, determines the first and second indices and the multiplication coefficient for each subframe so that the pitch prediction error energy is minimized, and determines the pitch length in units of a plurality of subframes.

[0018]

[Embodiments] FIG. 1 and FIG. 2 show an embodiment of the encoder and of the decoder, respectively, of the first invention.

[0019] The device comprises an input terminal 101 for the input speech signal, a linear predictor 102, spectrum predictors 103 and 120, a pitch length calculator 121, a three-tap pitch predictor 123, a prediction coefficient codebook 124, an adaptive codebook 104, a stochastic codebook 106, a multiplier 107, adders 108 and 109, subtractors 110 and 122, auditory weighting filters 111 and 112, an energy calculator 113, a control unit 114, and an output terminal 201.

[0020] First, the operation of the encoder is described.

[0021] For each frame (for example, 160 samples), the linear predictor 102 calculates linear prediction coefficients (α) from the speech signal waveform input at the input terminal 101. The prediction coefficients of the spectrum predictors 103 and 120 and of the auditory weighting filters 111 and 112 are set from these linear prediction coefficients. For each subframe, the linear prediction coefficients are set by interpolating between the coefficients of adjacent frames.

[0022] The following processing is performed for each subframe (for example, 40 samples).

[0023] The input signal is passed through the input-side auditory weighting filter; the output is denoted Sw[n], n = 0, 1, ..., 39. The known superposition addition method is used to remove the influence of past subframes so that only the signal of the current subframe is processed. First the memories of the spectrum predictor 103 and the reproduction-side auditory weighting filter 112 are saved, then 40 zero samples are input, and the resulting output is subtracted from Sw[n]; the result is denoted T[n], n = 0, 1, ..., 39. The parameters of the pitch predictor and of the stochastic codebook are then determined, here by sequential optimization.
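
The removal of the past-subframe influence in paragraph [0023] can be sketched as follows. This is a simplified illustration, not the device itself: the cascade of the spectrum predictor 103 and the weighting filter 112 is stood in for by a single all-pole filter, and the names `all_pole_filter` and `remove_zero_input_response` as well as the sign convention of the recursion are assumptions.

```python
def all_pole_filter(x, coeffs, state):
    """Run y[n] = x[n] + sum_i coeffs[i] * y[n-1-i].

    state holds the past outputs, newest first. The caller's state is
    left untouched (this plays the role of "saving the memory").
    Returns (y, new_state).
    """
    mem = list(state)
    y = []
    for sample in x:
        out = sample + sum(c * m for c, m in zip(coeffs, mem))
        y.append(out)
        mem = [out] + mem[:-1]
    return y, mem


def remove_zero_input_response(sw, coeffs, state):
    """Return T[n] = Sw[n] minus the filter's response to zero input."""
    zero_input_response, _ = all_pole_filter([0.0] * len(sw), coeffs, state)
    return [s - z for s, z in zip(sw, zero_input_response)]
```

Because the filter is linear, subtracting its zero-input response lets every later codebook search run the filter from a cleared state, which is what paragraphs [0027] and [0031] rely on.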

[0024] First, optimization of the pitch prediction parameters is described. To limit the amount of transmitted information, a single pitch length parameter is transmitted for a plurality of subframes. The prediction coefficients are determined essentially by analysis-by-synthesis, with a full search of the prediction coefficient codebook. This is described below.

[0025] The pitch length is determined and transmitted once per plurality of subframes; here, for example, once every two subframes. First, the spectrum predictor 120 calculates a linear prediction residual signal (Z[n], n = 0, 1, ..., 79) from two subframes of the input signal. Within the assumed range of pitch lengths (for example, 40 to 167 samples), a search is made for the delay that maximizes the cross-correlation coefficient between this residual signal and the past excitation signal stored in the adaptive codebook 104 (exc[n], n = -1, -2, ..., -167). In this way a single pitch length that holds over the plurality of subframes is calculated.
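
The open-loop search of paragraph [0025] can be sketched in Python as below. The function name `find_pitch_length` is hypothetical, the normalization of the cross-correlation is one common choice, and the periodic repetition used for delays shorter than the 80-sample analysis window is an assumption, since the text does not say how those delays are handled.

```python
import math


def find_pitch_length(residual, past_exc, tau_min=40, tau_max=167):
    """Return the delay that maximizes the normalized cross-correlation
    between the residual Z[n] and the delayed past excitation.

    past_exc is ordered oldest first, so past_exc[-1] is exc[-1].
    """
    if len(past_exc) < tau_max:
        raise ValueError("past_exc must hold at least tau_max samples")
    best_tau, best_score = None, -math.inf
    for tau in range(tau_min, tau_max + 1):
        segment = past_exc[-tau:]
        # Delayed excitation, repeated periodically if tau < len(residual).
        delayed = [segment[n % tau] for n in range(len(residual))]
        energy = sum(d * d for d in delayed)
        if energy == 0.0:
            continue
        cross = sum(z * d for z, d in zip(residual, delayed))
        score = cross / math.sqrt(energy)
        if score > best_score:
            best_tau, best_score = tau, score
    return best_tau
```

The residual energy is the same for every candidate delay, so it is left out of the normalization without changing which delay wins.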

[0026] The speech coding device includes a three-tap pitch predictor as a component. A three-tap pitch predictor interpolates the pitch length and therefore predicts high pitch frequencies accurately. In addition, even when the pitch length is held fixed across several subframes as proposed here, it can still follow a moderate change in pitch length provided that the prediction coefficients are set correctly for each subframe ("Efficient Encoding of the Long-Term Predictor in Vector Excitation Coders", M. Yong and A. Gersho, Advances in Speech Coding, Kluwer Academic Publishers). A one-tap pitch predictor, by contrast, is fundamentally very sensitive to changes in pitch length even when a non-integer delay is used: if the pitch length is not represented correctly, the benefit of pitch prediction is almost entirely lost.

[0027] The pitch prediction coefficients are determined for each subframe. For every codeword in the prediction coefficient codebook prepared in advance, a prediction signal is obtained by three-tap pitch prediction. The prediction signal is input to the spectrum synthesis filter 103 and the auditory weighting filter 112, both independent of past state (memory cleared), to obtain the reproduced signal due to the pitch prediction signal, denoted P[n], n = 0, 1, ..., 39. The energy calculator 113 computes the energy of the error between P[n] and the T[n] obtained earlier, and the control unit 114 determines the index (Index1) of the optimal pitch prediction coefficients so as to minimize this error energy. This completes the procedure for calculating the pitch prediction parameters.
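
The closed-loop coefficient search of paragraph [0027] might look like the following sketch. All names are hypothetical. The `synth` argument stands in for the zero-state spectrum synthesis filter 103 followed by the weighting filter 112 and defaults to the identity only to keep the example short; the tap ordering and the periodic extension for short delays are assumptions.

```python
def delayed(past_exc, lag, length):
    """exc[n - lag] for n = 0 .. length - 1, repeated periodically if short."""
    segment = past_exc[-lag:]
    return [segment[n % lag] for n in range(length)]


def three_tap_prediction(past_exc, tau, taps, length=40):
    """taps = (a_-1, a_0, a_+1) weight exc[n-tau-1], exc[n-tau], exc[n-tau+1]."""
    rows = [delayed(past_exc, lag, length) for lag in (tau + 1, tau, tau - 1)]
    return [sum(a * row[n] for a, row in zip(taps, rows)) for n in range(length)]


def search_coefficient_codebook(target, past_exc, tau, codebook, synth=lambda x: x):
    """Return (Index1, error energy) of the codebook entry closest to target T[n]."""
    best_index, best_err = None, float("inf")
    for index, taps in enumerate(codebook):
        p = synth(three_tap_prediction(past_exc, tau, taps, len(target)))
        err = sum((t - v) ** 2 for t, v in zip(target, p))
        if err < best_err:
            best_index, best_err = index, err
    return best_index, best_err
```

Only the index is transmitted each subframe, while tau is shared by the whole group of subframes.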

[0028] Through the above processing, the pitch length information is transmitted once per plurality of subframes and the pitch prediction coefficients are transmitted every subframe.

[0029] The subsequent processing is the same as in the known technique and is described only briefly.

[0030] Next, the synthesized waveform P[n] of the optimal pitch prediction signal determined above is subtracted from T[n] to obtain the signal waveform of the component that the pitch component could not represent, denoted T2[n], n = 0, 1, ..., 39.

[0031] The parameters of the stochastic codebook (excitation signal vector) are optimized, like those of the pitch predictor, by full-search analysis-by-synthesis. The parameter searched exhaustively is the codebook index (Index2). Every codeword is input to the spectrum predictor 103 and the auditory weighting filter 112 with their internal memories cleared; the energy calculator 113 computes the energy of the error waveform between the output waveform (S[n], n = 0, 1, ..., 39) and T2[n]; and the control unit 114 selects the codebook parameters (Index2, b) that give the smallest error energy.
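
A minimal sketch of the stochastic codebook search in paragraph [0031] follows. The closed-form least-squares gain is the usual CELP practice and is an assumption here, since the text only states that Index2 and b are chosen to minimize the error energy; `search_stochastic_codebook` and `synth` (a stand-in for the zero-state filters 103 and 112) are hypothetical names.

```python
def search_stochastic_codebook(target2, codebook, synth=lambda x: x):
    """Return (Index2, gain b, error energy) minimizing |T2 - b * S|^2."""
    best = (None, 0.0, sum(t * t for t in target2))
    for index, codeword in enumerate(codebook):
        s = synth(codeword)
        energy = sum(v * v for v in s)
        if energy == 0.0:
            continue
        # Least-squares gain for this codeword.
        gain = sum(t * v for t, v in zip(target2, s)) / energy
        err = sum((t - gain * v) ** 2 for t, v in zip(target2, s))
        if err < best[2]:
            best = (index, gain, err)
    return best
```

If no codeword reduces the error below the energy of T2[n] itself, the sketch returns an index of `None` with zero gain.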

[0032] Once the optimal pitch prediction parameters, stochastic codebook parameters and so on have been determined, the device prepares for the next subframe. It computes the sum of the optimal pitch prediction signal waveform and the optimal stochastic codeword (excitation signal waveform) amplified by its optimal coefficient, denoted exc[n], n = 0, 1, ..., 39, and stores it as the memory of the pitch predictor 104 for the next subframe. It also restores the saved memories of the spectrum predictor 103 and the auditory weighting filter 112 and computes the synthesized waveform with exc[n] as input, so that the memories of these two filters are updated as well.

[0033] The above processing realizes the speech coding device proposed in the present invention.

[0034] Next, the operation of the decoder is described.

[0035] FIG. 2 shows the decoder. The information transmitted to the decoder is the linear prediction coefficients (α), the pitch length (tau), the index of the pitch prediction gain (index1), the index of the stochastic codebook (index2), and the gain of the excitation vector (b).

[0036] Because CELP encodes by analysis-by-synthesis, the decoding process is contained within the encoding process. The procedure is as follows.

[0037] First, the pitch predictor 123 outputs the pitch prediction signal based on the transmitted pitch length (tau) and pitch prediction gain index (index1). Next, the codebook 106 outputs the excitation signal selected by the transmitted codebook index (index2); the multiplier 107 amplifies it by the transmitted excitation vector gain (b) and outputs it to the adder 108.

[0038] The adder 108 adds these two signals, and the sum is stored in the adaptive codebook 104 as data for the next frame. The spectrum predictor 103 is configured from the transmitted linear prediction coefficients (α); the sum signal is input to it to compute the reproduced signal, which is output at the output terminal 201.
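
The decoder steps of paragraphs [0037] and [0038] can be sketched for one subframe as below. This is an illustration under stated assumptions: `decode_subframe` is a hypothetical name, the taps and codeword are passed in already looked up from index1 and index2, the synthesis filter uses the recursion y[n] = x[n] + sum of alpha[i] * y[n-1-i], and short delays are extended periodically.

```python
def delayed(past_exc, lag, length):
    """exc[n - lag] for n = 0 .. length - 1, repeated periodically if short."""
    segment = past_exc[-lag:]
    return [segment[n % lag] for n in range(length)]


def decode_subframe(past_exc, tau, taps, codeword, gain, alpha, synth_mem):
    """Return (speech, updated past excitation, updated filter memory)."""
    length = len(codeword)
    rows = [delayed(past_exc, lag, length) for lag in (tau + 1, tau, tau - 1)]
    # Pitch predictor 123: three-tap prediction from the adaptive codebook.
    pitch = [sum(a * row[n] for a, row in zip(taps, rows)) for n in range(length)]
    # Multiplier 107 and adder 108: excitation for this subframe.
    exc = [p + gain * c for p, c in zip(pitch, codeword)]
    # Adaptive codebook 104: keep the newest samples for the next subframe.
    new_past = (past_exc + exc)[-len(past_exc):]
    # Spectrum predictor 103: all-pole synthesis (memory is newest first).
    mem = list(synth_mem)
    speech = []
    for sample in exc:
        out = sample + sum(a * m for a, m in zip(alpha, mem))
        speech.append(out)
        mem = [out] + mem[:-1]
    return speech, new_past, mem
```

The encoder runs the same excitation update in paragraph [0032], which keeps the adaptive codebooks on both sides identical.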

[0039] Next, an embodiment of the second invention is described.

[0040] In the first invention, the pitch length spanning a plurality of subframes is extracted using the cross-correlation coefficient as the evaluation criterion. That method gives the optimal prediction when the signal of the plurality of subframes is predicted with prediction coefficients that are constant over the interval. As explained above, however, when a constant pitch length is used over a plurality of subframes and the prediction coefficients are set for each subframe, that method of determining the pitch length is not necessarily the best.

[0041] The method of this invention calculates the pitch prediction error energy of the spectrum prediction residual waveform for each subframe and determines the pitch length so that the error energy over all of the subframes is minimized. Two ways of determining the pitch length in this manner are described below.

[0042] In the first method, for a given pitch length, the optimal prediction coefficients of the three-tap pitch predictor are calculated for each subframe, the prediction error energy with those coefficients is calculated, and the per-subframe prediction error energies are accumulated over the plurality of subframes that form the unit for transmitting the pitch length. The pitch length that minimizes the total prediction error energy over these subframes is found by full search. FIG. 3 shows the flow of this processing. In the first method, the prediction coefficients used while determining the pitch length are the computed (unquantized) values. In practice, however, the pitch prediction coefficients are quantized and can take only the values stored in the prediction coefficient codebook 124. The second method takes this into account strictly and determines the pitch length with the prediction coefficients quantized as well. It is described next.
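
The first method of paragraph [0042] can be sketched as follows. This is not the flow of FIG. 3, which is not reproduced here. All names are hypothetical; the sketch predicts the residual from its own past (a stand-in for the excitation, which is not yet known for later subframes), solves the least-squares taps by normal equations, and adds a tiny ridge term purely for numerical safety. Each of these choices is an assumption.

```python
def solve_linear(matrix, rhs):
    """Solve a small dense system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def subframe_error(signal, begin, length, tau, ridge=1e-9):
    """Error energy of the best unquantized three-tap prediction."""
    target = signal[begin:begin + length]
    cols = [signal[begin - lag:begin - lag + length] for lag in (tau + 1, tau, tau - 1)]
    gram = [[sum(x * y for x, y in zip(ci, cj)) + (ridge if i == j else 0.0)
             for j, cj in enumerate(cols)] for i, ci in enumerate(cols)]
    rhs = [sum(x * t for x, t in zip(c, target)) for c in cols]
    taps = solve_linear(gram, rhs)
    pred = [sum(a * c[n] for a, c in zip(taps, cols)) for n in range(length)]
    return sum((t - p) ** 2 for t, p in zip(target, pred))


def pitch_length_method1(signal, start, n_sub=2, length=40, tau_min=40, tau_max=167):
    """Full search for the tau that minimizes the error summed over subframes."""
    if start < tau_max + 1:
        raise ValueError("need at least tau_max + 1 samples before start")
    best_tau, best_err = None, float("inf")
    for tau in range(tau_min, tau_max + 1):
        total = sum(subframe_error(signal, start + s * length, length, tau)
                    for s in range(n_sub))
        if total < best_err:
            best_tau, best_err = tau, total
    return best_tau
```

Because the three taps span the delays tau - 1 to tau + 1, a perfectly periodic signal is matched equally well by the true period and by its two neighbors, which illustrates the interpolating behavior described in paragraph [0026].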

[0043] For a given pitch length, pitch prediction is performed with every set of prediction coefficients for each subframe, and the set that minimizes the prediction error energy is found. This is done for each of the plurality of subframes that form the unit for transmitting the pitch length, and the prediction error energies of the subframes are accumulated. The pitch length that minimizes the total prediction error energy over these subframes is found by full search. FIG. 4 shows the flow of this processing.
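
The second method of paragraph [0043] replaces the least-squares taps with a search over the quantized entries. The sketch below is an illustration only and is not the flow of FIG. 4; `pitch_length_method2` is a hypothetical name, and predicting the residual from its own past is an assumption made to keep the example self-contained.

```python
def pitch_length_method2(signal, start, codebook, n_sub=2, length=40,
                         tau_min=40, tau_max=167):
    """Full search over tau; each subframe uses its best codebook entry."""
    if start < tau_max + 1:
        raise ValueError("need at least tau_max + 1 samples before start")
    best_tau, best_err = None, float("inf")
    for tau in range(tau_min, tau_max + 1):
        total = 0.0
        for s in range(n_sub):
            begin = start + s * length
            target = signal[begin:begin + length]
            cols = [signal[begin - lag:begin - lag + length]
                    for lag in (tau + 1, tau, tau - 1)]
            # Smallest error energy over all quantized tap sets.
            total += min(
                sum((t - sum(a * c[n] for a, c in zip(taps, cols))) ** 2
                    for n, t in enumerate(target))
                for taps in codebook)
        if total < best_err:
            best_tau, best_err = tau, total
    return best_tau
```

The cost grows with the product of the number of candidate delays, subframes and codebook entries, which is the processing load that paragraph [0044] refers to.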

[0044] The second method, however, requires a very large amount of processing. As an alternative to the full search over prediction coefficients, the optimal prediction coefficients of the three-tap pitch predictor can be calculated for each subframe as in the first method, the codebook entry nearest to them in the coefficient domain can be taken as the quantized value, and the prediction error energy can be calculated with that entry.

【００４５】[0045]

[Effects of the Invention] As is clear from the above, the invention applies to a code-excited linear prediction encoder comprising a linear predictor that calculates spectrum prediction parameters of an input speech signal, a spectrum predictor, a pitch predictor, a codebook storing a plurality of excitation signal waveform vectors, an auditory weighting filter that shapes the spectrum of the difference signal between the input speech signal and the reproduced signal, an energy calculator that calculates the energy of the auditorily weighted difference signal, and a control unit that optimally sets the parameters of the pitch predictor and the codebook so that the output value of the energy calculator is minimized. This encoder is provided with means for calculating a linear prediction residual signal, an adaptive codebook, a pitch length calculator, and a codebook of 3-tap pitch prediction coefficients. The pitch length is transmitted once per group of a plurality of subframes, while the pitch prediction coefficients are transmitted once per subframe. As a result, the amount of information transmitted for the pitch length is kept low, and high-frequency pitch signals can still be predicted accurately.
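
The saving in pitch-related side information can be illustrated with a small calculation. The bit widths and subframe counts below are hypothetical values chosen only for the example; they are not taken from this specification, and the function name is likewise an assumption.

```python
def pitch_bits_per_frame(n_subframes, lag_bits, coeff_bits, subframes_per_lag):
    """Bits per frame spent on pitch length plus pitch prediction coefficient indices.

    subframes_per_lag: number of subframes sharing one transmitted pitch length.
    """
    if n_subframes % subframes_per_lag != 0:
        raise ValueError("n_subframes must be a multiple of subframes_per_lag")
    lag_updates = n_subframes // subframes_per_lag
    # The coefficient index is still sent for every subframe.
    return lag_updates * lag_bits + n_subframes * coeff_bits


# Hypothetical example: 4 subframes, 7-bit pitch length, 5-bit coefficient index.
per_subframe = pitch_bits_per_frame(4, 7, 5, 1)  # pitch length sent every subframe
per_frame = pitch_bits_per_frame(4, 7, 5, 4)     # pitch length sent once per 4 subframes
```

With these assumed figures the pitch-related bits drop from 48 to 27 per frame, while the per-subframe coefficient update is retained.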

[Brief description of the drawings]

FIG. 1 is a block diagram of an embodiment of the encoder of the speech encoding device according to the present invention.

FIG. 2 is a block diagram of an embodiment of the decoder of the speech encoding device according to the present invention.

FIG. 3 is a diagram illustrating the first pitch length determination method according to the present invention.

FIG. 4 is a diagram illustrating the second pitch length determination method according to the present invention.

FIG. 5 is a block diagram of a conventional CELP speech encoding device.

[Explanation of symbols]

101 input terminal
102 linear predictor
103, 120 spectrum predictor
104, 123 pitch predictor
105, 107 multiplier
106 stochastic codebook
108, 109 adder
110, 122 subtractor
111, 112 auditory weighting filter
113 energy calculator
114 control unit
121 pitch length calculator
124 prediction coefficient codebook
301 output terminal

Claims

(57) [Claims]

1. A speech coding device comprising linear prediction means (102), pitch length calculation means (120-122), an adaptive codebook (104), a prediction coefficient codebook (124), 3-tap pitch prediction means (123), a stochastic codebook (106), multiplication means (107), addition means (108), filters (103, 109, 112), error calculation means (110, 113), and control means (114), wherein:
the linear prediction means (102) calculates linear prediction coefficients from an input speech signal in units of a plurality of subframes;
the pitch length calculation means (120-122) calculates a pitch length from the input speech signal in units of a plurality of subframes;
the adaptive codebook (104) stores past excitation signals and reads out an adaptive excitation signal based on the pitch length;
the prediction coefficient codebook (124) stores prediction coefficients corresponding to first index values, and a prediction coefficient is read out under the control of the control means (114);
the 3-tap pitch prediction means (123) calculates a pitch prediction signal based on the prediction coefficient;
the stochastic codebook (106) stores a plurality of stochastic excitation signals corresponding to second index values, and a stochastic excitation signal is read out under the control of the control means (114);
the multiplication means (107) multiplies the read stochastic excitation signal by a multiplication coefficient and outputs the result;
the addition means (108) adds the pitch prediction signal and the output of the multiplication means (107) and outputs an excitation signal;
the filters (103, 109, 112) filter the excitation signal based on the linear prediction coefficients;
the error calculation means (110, 113) calculates a pitch prediction error energy, which is the error of the output of the filters (103, 109, 112); and
the control means (114) determines the first and second index values and the multiplication coefficient in units of subframes so that the pitch prediction error energy is minimized.

2. A speech coding device comprising linear prediction means (102), an adaptive codebook (104), a prediction coefficient codebook (124), 3-tap pitch prediction means (123), a stochastic codebook (106), multiplication means (107), addition means (108), filters (103, 109, 112), error calculation means (110, 113), and control means (114), wherein:
the linear prediction means (102) calculates linear prediction coefficients from an input speech signal in units of a plurality of subframes;
the adaptive codebook (104) stores past excitation signals and reads out an adaptive excitation signal based on the pitch length specified by the control means (114);
the prediction coefficient codebook (124) stores prediction coefficients corresponding to first index values, and a prediction coefficient is read out under the control of the control means (114);
the 3-tap pitch prediction means (123) calculates a pitch prediction signal based on the prediction coefficient;
the stochastic codebook (106) stores a plurality of stochastic excitation signals corresponding to second index values, and a stochastic excitation signal is read out under the control of the control means (114);
the multiplication means (107) multiplies the read stochastic excitation signal by a multiplication coefficient and outputs the result;
the addition means (108) adds the pitch prediction signal and the output of the multiplication means (107) and outputs an excitation signal;
the filters (103, 109, 112) filter the excitation signal based on the linear prediction coefficients;
the error calculation means (110, 113) calculates a pitch prediction error energy, which is the error of the output of the filters (103, 109, 112); and
the control means (114) specifies the pitch length to the adaptive codebook (104) and, so that the pitch prediction error energy is minimized, determines the first and second index values and the multiplication coefficient in units of subframes and determines the pitch length in units of a plurality of subframes.