JPH09127985A

JPH09127985A - Signal coding method and device therefor

Info

Publication number: JPH09127985A
Application number: JP7302128A
Authority: JP
Inventors: Shiro Omori; Masayuki Nishiguchi; Atsushi Matsumoto; Kazuyuki Iijima
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1995-10-26
Filing date: 1995-10-26
Publication date: 1997-05-16

Abstract

PROBLEM TO BE SOLVED: To achieve high-quality reproduction at a low bit rate by band-dividing an input signal and coding the low band and the high band each with a method suited to it. SOLUTION: From the input signal at terminal 101, the high-band signal is extracted by an LPF (low-pass filter) 102 and a subtractor 106. This signal is FFT-processed by an FFT (fast Fourier transform) circuit 161 and shifted down to the low band by a frequency shift circuit 162. The signal inverse-FFT-processed by an inverse FFT circuit 163 is sent to an LPC (linear predictive coding) inverse filter circuit 171 and predictively coded, so that the predictive coding can be performed at a low sampling frequency.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a signal coding method and apparatus for band-dividing and coding an input signal such as a wideband speech signal.

[0002]

2. Description of the Related Art
Various coding methods are known that compress audio signals (including speech and acoustic signals) by exploiting their statistical properties in the time and frequency domains together with the characteristics of human hearing. Broadly, these methods include time-domain coding, frequency-domain coding, and analysis-synthesis coding.

[0003] Known examples of high-efficiency coding of speech and similar signals include sinusoidal analysis coding such as harmonic coding and MBE (multiband excitation) coding, SBC (sub-band coding), LPC (linear predictive coding), and transforms such as the DCT (discrete cosine transform), the MDCT (modified DCT), and the FFT (fast Fourier transform).

[0004]

PROBLEMS TO BE SOLVED BY THE INVENTION
Conventionally, one way to encode a wideband signal has been to divide it into bands and apply sub-band coding. In particular, when the signal is split into a low band and a high band, the split is commonly made at one quarter of the sampling frequency, fs/4. After splitting at fs/4, the high-band signal is down-sampled. As a result, the high-frequency components fold back into the low band as aliasing, so the high band can be handled as a frequency-converted low-band signal.
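
The folding described above can be checked numerically. The sketch below is an illustration, not part of the patent; it assumes fs = 16 kHz and uses a pure 6 kHz tone to stand in for a high-band component above fs/4. Keeping every second sample of the tone gives exactly a 2 kHz tone at the new 8 kHz rate with its sign flipped, i.e. the band above fs/4 folds into the low band in mirrored order.

```python
import math

FS = 16000      # original sampling rate in Hz (assumed for the example)
F_HIGH = 6000   # a component between fs/4 = 4 kHz and fs/2 = 8 kHz (assumed)

def decimate_by_two(x):
    """Keep every second sample; no anti-alias filter, on purpose."""
    return x[::2]

def alias_frequency(f, fs):
    """After decimation by 2 (new rate fs/2), a component at f with
    fs/4 < f < fs/2 reappears at fs/2 - f, folded about fs/4."""
    return fs / 2 - f

tone = [math.sin(2 * math.pi * F_HIGH * n / FS) for n in range(64)]
folded = decimate_by_two(tone)
f_alias = alias_frequency(F_HIGH, FS)
# The folded tone is a sign-inverted sine at f_alias, sampled at FS / 2.
expected = [-math.sin(2 * math.pi * f_alias * m / (FS / 2)) for m in range(len(folded))]
max_err = max(abs(a - b) for a, b in zip(folded, expected))
print(f_alias, max_err < 1e-9)
```

This mirrored folding is what lets the fs/4 split treat the decimated high band as a low-band signal.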

[0005] However, this method cannot be used when the split cannot be made at fs/4. Because the low-band signal is also to be reproduced on its own at a sampling frequency of fs/2, the low band does not extend all the way up to fs/4. The high band must therefore also cover this remaining region.

[0006] In recent years, there has also been demand for a method that encodes speech or music signals divided at such a band boundary and layers the result on top of the encoded low-band signal to create a scalable bitstream.

[0007] The present invention was made in view of these circumstances. Its object is to provide a signal coding method and a signal coding apparatus that band-divide an input signal and encode the low band and the high band each with a method suited to it.

[0008]

MEANS FOR SOLVING THE PROBLEMS
To solve the above problems, the signal coding method according to the present invention band-divides an input signal, frequency-converts at least one of the resulting high-band signals down to the low band, reduces the sampling rate of the down-converted signal, and predictively encodes the reduced-rate signal.

[0009] Likewise, to solve the above problems, the signal coding apparatus according to the present invention has band dividing means for band-dividing an input signal, frequency conversion means for converting at least one of the divided high-band signals down to the low band, sampling-rate reduction means for lowering the sampling rate of the down-converted signal, and means for predictively encoding the reduced-rate signal.

[0010]

BEST MODE FOR CARRYING OUT THE INVENTION
Preferred embodiments of the present invention are described below. First, FIG. 1 shows a wideband speech signal coding apparatus to which an embodiment of the signal coding method according to the present invention is applied.

[0011] The basic idea of the coding apparatus of FIG. 1 is to divide the input signal into several bands and code each band differently according to its signal characteristics. Specifically, the wideband input speech signal is divided into, for example, the telephone band, which alone gives speech sufficient intelligibility, and the band above it. For the low-side telephone band, short-term prediction such as LPC (linear predictive coding) analysis is followed by long-term prediction such as pitch prediction; the result is orthogonally transformed, and the transform coefficients are vector-quantized with perceptual weighting. Parameters representing the short-term prediction coefficients, such as the LPC coefficients, and information related to long-term prediction, such as pitch and pitch gain, are also quantized. For the band above the telephone band, the short-term prediction residual is vector-quantized directly on the time axis.

[0012] The MDCT (modified discrete cosine transform) is used as the orthogonal transform. A short transform length makes the weighting for vector quantization easy to apply, and making the length 2^N, that is, a power of two, allows the transform to be accelerated with the FFT (fast Fourier transform). The LPC coefficients used to compute the short-term prediction residual and the weights for vector-quantizing the orthogonal-transform coefficients (as well as those used by the post-filter) are obtained by smoothly interpolating between the coefficients found in the current frame and those found in the past frame, so that the optimum LPC coefficients are used in each analysis subframe. For long-term prediction, prediction or interpolation is performed several times per frame, and the resulting pitch lags and pitch gains are quantized either directly or as differences, or a flag indicating the interpolation method is transmitted. Furthermore, because the variance of the prediction residual decreases as the number (frequency) of predictions increases, multistage vector quantization is applied to the differences between the orthogonal-transform coefficients of those residuals; alternatively, only the parameters for one of the divided bands are used. In this way, all or part of a single encoded bitstream supports several decoding operations at different rates.
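
To make the transform concrete, the sketch below implements a direct O(N^2) MDCT and inverse MDCT and checks that overlap-adding half-overlapped blocks reconstructs the input (time-domain aliasing cancellation). The sine window, the 2/N inverse scaling, the block size N = 8, and the helper names are assumptions for illustration only; a practical coder would compute the same transform through an FFT when the length is a power of two, as noted above.

```python
import math

def sine_window(length):
    # Satisfies the Princen-Bradley condition w[n]^2 + w[n + N]^2 = 1.
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    """Windowed MDCT: 2N input samples -> N coefficients."""
    two_n = len(block)
    n_half = two_n // 2
    n0 = 0.5 + n_half / 2
    w = sine_window(two_n)
    return [sum(w[n] * block[n] * math.cos(math.pi / n_half * (n + n0) * (k + 0.5))
                for n in range(two_n))
            for k in range(n_half)]

def imdct(coeffs):
    """Windowed inverse MDCT: N coefficients -> 2N time-aliased samples."""
    n_half = len(coeffs)
    two_n = 2 * n_half
    n0 = 0.5 + n_half / 2
    w = sine_window(two_n)
    return [w[n] * (2.0 / n_half) * sum(c * math.cos(math.pi / n_half * (n + n0) * (k + 0.5))
                                        for k, c in enumerate(coeffs))
            for n in range(two_n)]

def analyse_synthesise(x, n_half):
    """Transform 2N-sample blocks at a hop of N, invert, and overlap-add."""
    out = [0.0] * len(x)
    for start in range(0, len(x) - 2 * n_half + 1, n_half):
        for i, v in enumerate(imdct(mdct(x[start:start + 2 * n_half]))):
            out[start + i] += v
    return out

N = 8  # transform length, a power of two (hypothetical value)
x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(4 * N)]
y = analyse_synthesise(x, N)
# Only samples covered by two overlapping blocks are fully reconstructed.
err = max(abs(a - b) for a, b in zip(x[N:3 * N], y[N:3 * N]))
```

Each block's time-domain aliasing is cancelled by its neighbour, which is why only the doubly covered middle span is compared.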

[0013] The following description refers to FIG. 1. Input terminal 101 of FIG. 1 is supplied with a wideband speech signal of roughly 0 to 8 kHz sampled at, for example, Fs = 16 kHz. The low-pass filter 102 and the subtractor 106 divide this wideband signal into a low band, for example the so-called telephone band of about 0 to 3.8 kHz, and a high band, for example about 3.8 to 8 kHz. The low-band signal is decimated by the sampling frequency converter 103, within the limits set by the sampling theorem, into a signal sampled at, for example, 8 kHz.
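
A minimal sketch of this band split, assuming a 16 kHz input and a Hamming-windowed sinc FIR standing in for the LPF 102 (its 63-tap length and window are assumptions, not values from the text). The input is delayed by the filter's group delay before the subtraction, so that low band plus high band reproduces the delayed input exactly. Because the FIR already limits the low band, the decimation to 8 kHz here simply drops every second sample.

```python
import math

FS = 16000        # input sampling rate (Hz)
CUTOFF = 3800.0   # crossover frequency taken from the text (Hz)
TAPS = 63         # hypothetical FIR length; odd, so the group delay is an integer

def lowpass_fir(cutoff, fs, taps):
    """Hamming-windowed sinc low-pass filter normalised to unity DC gain."""
    mid = (taps - 1) // 2
    fc = cutoff / fs
    h = []
    for n in range(taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        h.append(ideal * (0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))))
    scale = sum(h)
    return [c / scale for c in h]

def fir(x, h):
    """Causal convolution truncated to len(x)."""
    return [sum(h[j] * x[n - j] for j in range(len(h)) if n - j >= 0)
            for n in range(len(x))]

def split_bands(x):
    h = lowpass_fir(CUTOFF, FS, TAPS)
    delay = (TAPS - 1) // 2
    low = fir(x, h)                                   # role of LPF 102
    delayed = [0.0] * delay + x[:len(x) - delay]      # align with the filter delay
    high = [d - l for d, l in zip(delayed, low)]      # role of subtractor 106
    return low, high, delayed

x = [math.sin(2 * math.pi * 1000 * n / FS) + 0.5 * math.sin(2 * math.pi * 6000 * n / FS)
     for n in range(400)]
low, high, delayed = split_bands(x)
low_8k = low[::2]   # role of sampling frequency converter 103: 16 kHz -> 8 kHz
```

Deriving the high band by subtraction guarantees complementary bands regardless of the filter's exact shape, which matches the remark in [0028] that some energy below 3.8 kHz leaks into the subtractor output.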

[0014] On the low-band side, the LPC analysis/quantization unit 130 first applies a Hamming window with an analysis length of, for example, about 256 samples per block and computes LPC coefficients of about order 10, i.e. the α parameters; the LPC inverse filter 111 then computes the LPC residual. In this LPC analysis, 96 of the 256 samples in each analysis block overlap with the next block, so the frame interval is 160 samples, which is 20 ms at 8 kHz sampling. The LPC analysis/quantization unit 130 also converts the α parameters into LSP (line spectrum pair) parameters and transmits them in quantized form.

[0015] In the LPC analysis/quantization unit 130, the LPC analysis circuit 132, which receives the low-band signal from the sampling frequency converter 103, takes about 256 samples of the input waveform as one block, applies a Hamming window, and computes the linear prediction coefficients, the so-called α parameters, by the autocorrelation method. The framing interval, which is the unit of data output, is, for example, 160 samples, i.e. 20 ms.
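
The analysis of [0014] and [0015] can be sketched as follows: a 256-sample Hamming-windowed block, autocorrelation up to lag 10, and a recursion that solves the normal equations for the α parameters. The text names the autocorrelation method but not the solver; the Levinson-Durbin recursion used here is an assumption, as is the sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p.

```python
import math
import random

BLOCK = 256                     # analysis length (samples at 8 kHz)
OVERLAP = 96                    # samples shared with the next block
HOP = BLOCK - OVERLAP           # frame interval: 160 samples
FRAME_MS = 1000 * HOP / 8000    # 20.0 ms at 8 kHz sampling

def hamming(length):
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]

def autocorrelation(x, max_lag):
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return [1, a1, ..., a_order] for the inverse filter A(z) (assumed convention)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # remaining prediction error power
    return a

def lpc_analysis(block, order=10):
    """Hamming window + autocorrelation method -> alpha parameters."""
    w = hamming(len(block))
    r = autocorrelation([s * c for s, c in zip(block, w)], order)
    return levinson_durbin(r, order)

rng = random.Random(0)
noise_block = [rng.gauss(0.0, 1.0) for _ in range(BLOCK)]
alpha = lpc_analysis(noise_block)   # 11 values: 1.0 followed by a1..a10
```

For an autocorrelation sequence [1, 0.9, 0.81] (a first-order process with pole 0.9), the recursion returns [1, -0.9, 0], as expected.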

[0016] The α parameters from the LPC analysis circuit 132 are sent to the α→LSP conversion circuit 133 and converted into line spectrum pair (LSP) parameters. The α parameters, obtained as direct-form filter coefficients, are converted into, for example, 10 LSP parameters, i.e. 5 pairs. The conversion uses, for example, the Newton-Raphson method. LSP parameters are used because their interpolation characteristics are better than those of α parameters.
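
The α→LSP conversion can be sketched by forming the sum and difference polynomials P(z) = A(z) + z^-(p+1) A(z^-1) and Q(z) = A(z) - z^-(p+1) A(z^-1) and locating their roots on the unit circle. The text names the Newton-Raphson method; this sketch swaps in a plain grid search with bisection for simplicity. The grid size, the function name, and an even order p are assumptions.

```python
import math

def lpc_to_lsp(a, grid=2048):
    """Return the p LSP frequencies (radians, ascending) of A(z) = [1, a1, ..., ap].

    Roots of P and Q on the unit circle are found by scanning a frequency grid
    for sign changes and refining each bracket by bisection (not Newton-Raphson).
    """
    p = len(a) - 1
    ext = list(a) + [0.0]
    rev = ext[::-1]
    P = [ext[k] + rev[k] for k in range(p + 2)]   # symmetric polynomial
    Q = [ext[k] - rev[k] for k in range(p + 2)]   # antisymmetric polynomial
    m = (p + 1) / 2

    def fp(w):  # real-valued form of P(e^{jw}) after removing linear phase
        return sum(c * math.cos((k - m) * w) for k, c in enumerate(P))

    def fq(w):  # real-valued form of Q(e^{jw}) after removing linear phase
        return sum(c * math.sin((k - m) * w) for k, c in enumerate(Q))

    roots = []
    ws = [math.pi * (i + 0.5) / grid for i in range(grid)]   # open interval (0, pi)
    for f in (fp, fq):
        vals = [f(w) for w in ws]
        for i in range(grid - 1):
            if vals[i] * vals[i + 1] < 0:
                lo, hi, flo = ws[i], ws[i + 1], vals[i]
                for _ in range(50):
                    mid = 0.5 * (lo + hi)
                    fm = f(mid)
                    if flo * fm <= 0:
                        hi = mid
                    else:
                        lo, flo = mid, fm
                roots.append(0.5 * (lo + hi))
    return sorted(roots)

uniform = lpc_to_lsp([1.0] + [0.0] * 10)   # flat spectrum: LSPs at k*pi/11
```

The ascending order of the result interleaves the roots of P and Q, which is the property that makes LSPs convenient to interpolate and quantize.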

[0017] The LSP parameters from the α→LSP conversion circuit 133 are vector-quantized or matrix-quantized by the LSP quantizer 134. The inter-frame difference may be taken before vector quantization, or several frames may be matrix-quantized together. Here, one frame is 20 ms, and the LSP parameters computed every 20 ms are matrix-quantized two frames at a time.
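
Matrix quantization of two 20 ms frames can be sketched as a nearest-neighbour search over a codebook whose entries cover both frames at once. The codebook below is a hypothetical toy example. Its training, its size, and the squared-error distance are assumptions, since the text does not specify them.

```python
def matrix_quantize(frame_a, frame_b, codebook):
    """Jointly quantize two consecutive LSP vectors.

    The two p-dimensional vectors are stacked into one 2p-dimensional target,
    and the codebook entry with the smallest squared error is selected.
    Returns the index and the dequantized pair of frames.
    """
    target = list(frame_a) + list(frame_b)

    def dist(entry):
        return sum((t - e) ** 2 for t, e in zip(target, entry))

    index = min(range(len(codebook)), key=lambda i: dist(codebook[i]))
    p = len(frame_a)
    best = codebook[index]
    return index, best[:p], best[p:]

# Hypothetical toy codebook for order p = 2 (each entry spans two frames).
toy_codebook = [
    [0.5, 1.5, 0.5, 1.5],
    [0.6, 1.4, 0.7, 1.6],
    [1.0, 2.0, 1.0, 2.0],
]
idx, deq_a, deq_b = matrix_quantize([0.62, 1.41], [0.69, 1.58], toy_codebook)
```

Only the single index per two-frame pair would be transmitted through terminal 131; the dequantized pair feeds the interpolation described next.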

[0018] The quantized output of the LSP quantizer 134, i.e. the LSP vector-quantization index, is taken out via terminal 131, and the quantized LSP vector or dequantized output is sent to the LSP interpolation circuit 136.

[0019] The LSP interpolation circuit 136 interpolates between the previous-frame and current-frame LSP vectors, vector-quantized every 20 ms by the LSP quantizer 134, to produce the rates needed by later processing; in this example, 8 times and 5 times the frame rate. At 8 times the rate, the LSP vector is updated every 2.5 ms. The reason is that analysis-synthesis of the residual waveform produces a synthesized waveform with a very smooth envelope, so abrupt changes of the LPC coefficients every 20 ms can cause audible artifacts. Changing the LPC coefficients gradually every 2.5 ms prevents such artifacts.
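
The interpolation can be sketched as follows, assuming straight-line interpolation in the LSP domain. The text only says the vectors are interpolated smoothly, so the linear weights and the function name are assumptions. With a factor of 8 on a 20 ms frame each vector covers 2.5 ms, and with a factor of 5 each covers 4 ms.

```python
def interpolate_lsp(prev, cur, factor):
    """Return `factor` LSP vectors stepping linearly from `prev` to `cur`.

    Weight i / factor (i = 1..factor) is applied to the current frame, so the
    last vector equals the current frame's LSPs exactly.
    """
    out = []
    for i in range(1, factor + 1):
        t = i / factor
        out.append([(1 - t) * p + t * c for p, c in zip(prev, cur)])
    return out

prev_lsp = [0.3, 0.8, 1.5]   # hypothetical previous-frame LSPs (radians)
cur_lsp = [0.5, 1.0, 1.9]    # hypothetical current-frame LSPs
every_2_5_ms = interpolate_lsp(prev_lsp, cur_lsp, 8)
every_4_ms = interpolate_lsp(prev_lsp, cur_lsp, 5)
```

A convex combination of two ascending LSP vectors is itself ascending, so linear interpolation keeps the interpolated synthesis filters stable.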

[0020] To inverse-filter the input speech with these interpolated LSP vectors, updated every 2.5 ms, the LSP→α conversion circuit 137 converts the LSP parameters into α parameters, the coefficients of a direct-form filter of, for example, about order 10. The output of the LSP→α conversion circuit 137 is sent to the LPC inverse filter 111, which computes the LPC residual by inverse filtering with α parameters updated every 2.5 ms, so that a smooth output is obtained.
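
A sketch of an LPC inverse filter whose coefficients change every sub-block, assuming the convention A(z) = 1 + a1 z^-1 + ... and 20-sample sub-blocks (2.5 ms at 8 kHz). Past input samples carry over when the coefficient set switches, so no discontinuity is introduced at sub-block boundaries. The function name and argument layout are hypothetical.

```python
def lpc_inverse_filter(x, coeff_sets, sub_len):
    """Residual e[n] = x[n] + sum_i a_i * x[n - i].

    coeff_sets[j] = [1, a1, ..., ap] is used for samples j*sub_len ... and the
    last set is reused if the signal runs past the supplied sets. The filter
    memory is the input history itself, so it carries across switches.
    """
    e = []
    for n, xn in enumerate(x):
        a = coeff_sets[min(n // sub_len, len(coeff_sets) - 1)]
        acc = xn
        for i in range(1, len(a)):
            if n - i >= 0:
                acc += a[i] * x[n - i]
        e.append(acc)
    return e

SUB_LEN = 20   # 2.5 ms at 8 kHz sampling
# The impulse response of 1 / (1 - 0.9 z^-1) is whitened back to an impulse.
x = [0.9 ** n for n in range(40)]
residual = lpc_inverse_filter(x, [[1.0, -0.9], [1.0, -0.9]], SUB_LEN)
```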

[0021] The LSP coefficients interpolated at 5 times the frame rate by the LSP interpolation circuit 136, i.e. one set every 4 ms, are sent to the LSP→α conversion circuit 138, converted into α parameters, and passed to the VQ (vector quantization) weight calculation circuit 139, which computes the weights used to quantize the MDCT coefficients.

[0022] The output of the LPC inverse filter 111 is sent to the pitch inverse filters 112 and 122 for pitch prediction, which is a form of long-term prediction.

[0023] Next, long-term prediction is described. Long-term prediction obtains the pitch prediction residual by subtracting from the original waveform a copy of it shifted along the time axis by the pitch period, or pitch lag, found by pitch analysis. In this example, three-point pitch prediction is used. The pitch lag is the number of samples corresponding to the pitch period of the sampled time-domain data.
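
Three-point pitch prediction can be sketched as subtracting a weighted sum of the samples at lag - 1, lag, and lag + 1 in the past. The tap ordering (g_-1, g_0, g_+1) and the function name are assumptions. The lag is required to be at least 2 so that only past samples are used.

```python
def pitch_residual(x, lag, gains):
    """Three-point long-term prediction residual.

    r[n] = x[n] - sum_{k in (-1, 0, 1)} g_k * x[n - lag + k], where
    gains = (g_-1, g_0, g_+1). Samples before the start of x count as zero.
    """
    assert lag >= 2, "the prediction must use past samples only"
    r = []
    for n in range(len(x)):
        pred = 0.0
        for k, g in zip((-1, 0, 1), gains):
            m = n - lag + k
            if m >= 0:
                pred += g * x[m]
        r.append(x[n] - pred)
    return r

period = [0.0, 1.0, 0.5, -0.25, -1.0, 0.75]   # hypothetical 6-sample pitch period
x = period * 4
r = pitch_residual(x, lag=6, gains=(0.0, 1.0, 0.0))
```

For an exactly periodic input with the correct lag, the residual vanishes after the first period, leaving little for the following MDCT and vector quantization to code.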

[0024] The pitch analysis circuit 115 performs pitch analysis once per frame, i.e. with an analysis length of one frame. Of the analysis results, the pitch lag L₁ is sent to the pitch inverse filter 112 and output terminal 142, and the pitch gain is sent to the pitch gain VQ (vector quantization) circuit 116. The pitch gain VQ circuit 116 vector-quantizes the pitch gains at the three points corresponding to the three-point prediction; the codebook index g₁ is taken out from output terminal 143, and the representative vector or dequantized output is sent to the pitch inverse filter 112, the subtractor 117, and the adder 127. The pitch inverse filter 112 outputs the three-point pitch prediction residual based on the pitch analysis result. This residual is sent to an orthogonal transform means, for example the MDCT circuit 113, MDCT-processed, and then vector-quantized with perceptual weighting by the VQ (vector quantization) circuit 114, using the output of the VQ weight calculation circuit 139. The resulting index IdxVq₁ is taken out from output terminal 141.
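
The perceptually weighted search in the VQ circuit 114 can be sketched as minimising a weighted squared error. How the VQ weight calculation circuit 139 derives the weights from the α parameters is not detailed in this section, so the weights are simply taken as inputs here. The toy codebook and weights are hypothetical.

```python
def weighted_vq_search(target, codebook, weights):
    """Return the index minimising sum_k w_k * (t_k - c_k)^2."""
    def cost(entry):
        return sum(w * (t - c) ** 2 for w, t, c in zip(weights, target, entry))
    return min(range(len(codebook)), key=lambda i: cost(codebook[i]))

codebook = [[0.0, 0.0], [1.0, 0.8]]   # hypothetical 2-dimensional codebook
target = [1.0, 0.0]                   # hypothetical MDCT coefficients
plain = weighted_vq_search(target, codebook, [1.0, 1.0])      # unweighted choice
weighted = weighted_vq_search(target, codebook, [1.0, 4.0])   # emphasise coefficient 2
```

With equal weights the second entry wins (error 0.64 against 1.0), but weighting the second coefficient four times as heavily makes the first entry the better match, which is how perceptual weighting steers the quantization error toward less audible coefficients.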

[0025] To improve prediction accuracy, this example also provides a second pitch prediction system consisting of the pitch inverse filter 122, the pitch analysis circuit 125, and the pitch gain VQ circuit 126. Its analysis centres are placed midway between the per-frame analysis centres, so that the pitch analysis circuit 125 performs analysis every half frame. The pitch lag L₂ from the pitch analysis circuit 125 is sent to the pitch inverse filter 122 and output terminal 145, and the pitch gain is sent to the pitch gain VQ (vector quantization) circuit 126. The pitch gain VQ circuit 126 vector-quantizes the three-point pitch gain vector, sends the resulting pitch gain index g₂ to output terminal 144, and sends the representative vector or dequantized output to the subtractor 117. Because the pitch gain at the analysis centre of the original frame period is expected to be close to the pitch gain from the pitch gain VQ circuit 116, the pitch gain at this position is coded differentially: the subtractor 117 takes the difference between the dequantized outputs of the pitch gain VQ circuits 116 and 126, and the pitch gain VQ circuit 118 vector-quantizes this difference and sends the resulting pitch gain difference index g₁d to output terminal 146. The representative vector or dequantized output of the pitch gain difference is sent to the adder 127, where it is added to the representative vector or dequantized output from the pitch gain VQ circuit 126; the sum is sent to the pitch inverse filter 122 as the pitch gain. The pitch gain index g₂ obtained from output terminal 144 is the index of the pitch gain at the intermediate position. The pitch prediction residual from the pitch inverse filter 122 is MDCT-processed by the MDCT circuit 123 and sent to the subtractor 128, where the representative vector or dequantized output from the VQ (vector quantization) circuit 114 is subtracted. The difference is sent to the VQ circuit 124 and vector-quantized, and the resulting index IdxVq₂ is sent to output terminal 147. The VQ circuit 124 performs perceptually weighted vector quantization using the output of the VQ weight calculation circuit 139.
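
The two-stage arrangement of the VQ circuits 114 and 124 can be sketched as follows: the first stage quantizes the MDCT coefficients of the first pitch residual, and the second stage quantizes what remains after subtracting that reconstruction from the MDCT coefficients of the second residual. The codebooks are hypothetical toys, and the decoding step (adding both codewords) is an assumption consistent with the subtraction in the encoder.

```python
def nearest(target, codebook, weights):
    """Weighted nearest-neighbour index."""
    def cost(entry):
        return sum(w * (t - c) ** 2 for w, t, c in zip(weights, target, entry))
    return min(range(len(codebook)), key=lambda i: cost(codebook[i]))

def two_stage_encode(c1, c2, cb1, cb2, weights):
    i1 = nearest(c1, cb1, weights)                  # role of VQ circuit 114 -> IdxVq1
    diff = [a - b for a, b in zip(c2, cb1[i1])]     # role of subtractor 128
    i2 = nearest(diff, cb2, weights)                # role of VQ circuit 124 -> IdxVq2
    return i1, i2

def two_stage_decode(i1, i2, cb1, cb2):
    """Approximation of the second residual's MDCT coefficients (assumed decoder)."""
    return [a + b for a, b in zip(cb1[i1], cb2[i2])]

cb1 = [[0.0, 0.0], [1.0, 1.0]]                        # hypothetical first-stage codebook
cb2 = [[0.0, 0.0], [0.25, -0.25], [-0.25, 0.25]]      # hypothetical second-stage codebook
i1, i2 = two_stage_encode([0.9, 1.1], [1.2, 0.8], cb1, cb2, [1.0, 1.0])
c2_hat = two_stage_decode(i1, i2, cb1, cb2)
```

A decoder that receives only IdxVq₁ can still reconstruct the first stage, which is what allows the lower-rate decoding described in [0038] to [0041].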

[0026] Next, signal processing on the high-band side is described.

[0027] The high-band processing basically band-divides the input signal, frequency-converts at least one of the divided high-band signals down to the low band, reduces the sampling rate of the down-converted signal, and predictively encodes the reduced-rate signal.

[0028] The wideband signal supplied to input terminal 101 of FIG. 1 is input to the subtractor 106, which subtracts from it the low-band signal extracted by the LPF (low-pass filter) 102, for example the so-called telephone-band signal of about 0 to 3.8 kHz. The subtractor 106 therefore outputs the high-band signal, for example 3.8 to 8 kHz. Because a real LPF 102 is not ideal, however, a small amount of signal below 3.8 kHz remains in the output of the subtractor 106, so the high-band processing is applied to components above 3.5 kHz, or above 3.4 kHz.

[0029] The high-band signal from the subtractor 106, spanning for example 3.5 to 8 kHz, is 4.5 kHz wide. Because it is processed after being shifted or converted down to the low band by downsampling or the like, it must be narrowed to, for example, a width of 4 kHz. In view of the later recombination with the low band, the region around 3.5 to 4 kHz is perceptually sensitive and is therefore kept. Instead, the 0.5 kHz from 7.5 to 8 kHz, where speech has little energy and the effect on perception is small, is removed by an LPF or BPF (band-pass filter) 107.

[0030] Next, the signal is frequency-converted down to the low band. In this example, an orthogonal transform means, for example the FFT (fast Fourier transform) circuit 161, converts the signal into frequency-domain data, the frequency shift circuit 162 shifts this data, and the inverse FFT circuit 163, an inverse orthogonal transform means, performs the inverse FFT.

[0031] The inverse FFT circuit 163 outputs a signal in which the high-band portion of the input, for example 3.5 to 7.5 kHz, has been converted to the low band of 0 to 4 kHz. Since this signal can be represented at a sampling frequency of 8 kHz, the downsampling circuit 164 downsamples it to produce an 8 kHz-sampled signal representing the 3.5 to 7.5 kHz band. The output of the downsampling circuit 164 is sent to the LPC inverse filter 171 and to the LPC analysis circuit 182 of the LPC analysis/quantization unit 180.
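
The FFT, shift, inverse FFT, and downsampling chain of [0030] and [0031] can be sketched on a single frame. The assumptions are a 64-sample frame at 16 kHz (250 Hz per bin), no windowing or frame overlap (a practical system would need both), and a plain DFT in place of an FFT for brevity. Bins covering 3.5 to 7.5 kHz are moved down by 3.5 kHz while conjugate symmetry is kept, so the output is real and lies within 0 to 4 kHz and can be decimated to 8 kHz sampling.

```python
import cmath
import math

FS = 16000       # input sampling rate (Hz)
N = 64           # hypothetical frame length; bin spacing FS / N = 250 Hz
SHIFT_HZ = 3500  # lower edge of the high band, moved to 0 Hz
TOP_HZ = 7500    # upper edge kept after the LPF/BPF 107

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def shift_down(frame):
    """Move bins SHIFT_HZ..TOP_HZ to 0..(TOP_HZ - SHIFT_HZ) with conjugate symmetry."""
    n = len(frame)
    X = dft(frame)                          # role of FFT circuit 161
    lo = round(SHIFT_HZ * n / FS)
    hi = round(TOP_HZ * n / FS)
    Y = [0j] * n
    for k in range(lo, hi + 1):             # role of frequency shift circuit 162
        j = k - lo
        Y[j] = X[k]
        if j > 0:
            Y[n - j] = X[k].conjugate()     # mirror image keeps the output real
    return [v.real for v in idft(Y)]        # role of inverse FFT circuit 163

frame = [math.cos(2 * math.pi * 5000 * t / FS) for t in range(N)]
shifted = shift_down(frame)   # a 1.5 kHz tone, still at 16 kHz sampling
low_rate = shifted[::2]       # role of downsampling circuit 164: 8 kHz sampling
```

Unlike the fs/4 decimation trick of [0004], this shift keeps the spectrum in its original (non-mirrored) order and does not require the band edge to sit at fs/4.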

[0032] The LPC analysis/quantization unit 180 is configured almost identically to the low-band LPC analysis/quantization unit 130, so it is described only briefly.

[0033] In the LPC analysis/quantization unit 180, the LPC analysis circuit 182, which receives the down-converted signal from the downsampling circuit 164, takes about 256 samples of the input waveform as one block, applies a Hamming window, and computes the linear prediction coefficients, the so-called α parameters, by, for example, the autocorrelation method. The α parameters from the LPC analysis circuit 182 are sent to the α→LSP conversion circuit 183 and converted into line spectrum pair (LSP) parameters. The LSP parameters from the α→LSP conversion circuit 183 are vector-quantized or matrix-quantized by the LSP quantizer 184. Vector quantization may be applied after taking inter-frame differences, or several frames may be matrix-quantized together. Here, one frame is 20 ms, and the LSP parameters computed every 20 ms are vector-quantized.

[0034] The quantized output of the LSP quantizer 184, i.e. the LSP vector-quantization index LSPidx_H of the high-band signal, is taken out via terminal 181, and the quantized LSP vector or dequantized output is sent to the LSP interpolation circuit 186.

[0035] The LSP interpolation circuit 186 interpolates between the previous-frame and current-frame LSP vectors, vector-quantized every 20 ms by the LSP quantizer 184, to produce the rate needed by later processing; in this example, 4 times the frame rate.

[0036] To inverse-filter the input with these interpolated LSP vectors, updated every 5 ms, the LSP→α conversion circuit 187 converts the LSP parameters into α parameters, the coefficients of the LPC synthesis filter. The output of the LSP→α conversion circuit 187 is sent to the LPC inverse filter 171, which computes the LPC residual by inverse filtering with α parameters updated every 5 ms, so that a smooth output is obtained.

[0037] The LPC prediction residual from the LPC inverse filter 171 is sent to the LPC residual VQ (vector quantization) circuit 172 and vector-quantized, and the resulting LPC residual index LPCidx is taken out from output terminal 173.

[0038] In the signal coding apparatus configured as above, part of the low-band section can serve as the encoder of an independent codec, or the output bitstream can be switched between its whole and a part of it, so that signals can be transmitted and decoded at different bit rates.

[0039] When all the data from every output terminal in FIG. 1 is transmitted, the transmission bit rate is 16 kbps (kilobits per second); transmitting the data from only some of the terminals gives a transmission bit rate of 6 kbps.

[0040] Alternatively, all 16 kbps of data from all the terminals in FIG. 1 can be transmitted, i.e. sent or recorded. On the receiving or playback side, decoding all 16 kbps of data yields a high-quality 16 kbps speech signal, while decoding only the 6 kbps portion yields, with a simpler decoder, a speech signal of 6 kbps quality.

[0041] In the configuration of FIG. 1, the output data from output terminals 131 and 141 to 143 corresponds to the 6 kbps data, and adding the output data from output terminals 144 to 147, 173, and 181 gives the full 16 kbps data.
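
The layering can be sketched as selecting fields by output terminal. Only the grouping of terminals and the two rates come from the text. The per-terminal payloads, the function names, and a common 20 ms frame for the whole bitstream (giving 120 bits per frame at 6 kbps and 320 bits at 16 kbps) are assumptions.

```python
BASE_TERMINALS = {131, 141, 142, 143}                     # 6 kbps layer
ENHANCEMENT_TERMINALS = {144, 145, 146, 147, 173, 181}    # added for 16 kbps

FRAME_S = 0.020   # assumed common 20 ms frame for the whole bitstream

def bits_per_frame(kbps, frame_s=FRAME_S):
    return round(kbps * 1000 * frame_s)

def select_layers(frame_fields, rate_kbps):
    """Keep only the fields a decoder operating at rate_kbps needs.

    frame_fields is a hypothetical mapping from output-terminal number to the
    encoded payload for one frame.
    """
    if rate_kbps < 16:
        wanted = BASE_TERMINALS
    else:
        wanted = BASE_TERMINALS | ENHANCEMENT_TERMINALS
    return {t: v for t, v in frame_fields.items() if t in wanted}

frame = {t: b"\x00" for t in BASE_TERMINALS | ENHANCEMENT_TERMINALS}
base_only = select_layers(frame, 6)
full = select_layers(frame, 16)
```

Because the 6 kbps layer is a subset of the 16 kbps stream, a transmitter or receiver can drop the enhancement fields without re-encoding.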

[0042] Next, the signal decoding apparatus corresponding to the signal coding apparatus of FIG. 1 is described with reference to FIG. 2.

[0043] In FIG. 2, input terminal 200 is supplied with the vector-quantized LSP output corresponding to the output from output terminal 131 of FIG. 1, i.e. the so-called codebook index LSPidx.

[0044] The LSP index LSPidx goes to the LSP inverse VQ (inverse vector quantization) circuit 241 of the LPC parameter reproduction unit 240. There it is inverse vector quantized or inverse matrix quantized into LSP (line spectrum pair) data. The data are passed to the LSP interpolation circuit 242 for interpolation. The LSP→α conversion circuit 243 then converts them into α parameters, which are the LPC (linear predictive coding) coefficients. These α parameters are sent to the LPC synthesis filters 215 and 225 and to the pitch spectral post-filters 216 and 226.
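A minimal sketch of the LSP→α conversion performed by circuit 243. It assumes the standard sum/difference polynomial formulation of line spectrum pairs: P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z), with an even order p and LSPs given as frequencies in radians. The function names are hypothetical. The inverse direction (α→LSP, by numpy root finding) is included only so the round trip can be checked.

```python
import numpy as np


def lpc_to_lsf(a):
    """LPC coefficients [1, a1, ..., ap] (p even) -> sorted LSFs in (0, pi)."""
    a_ext = np.append(np.asarray(a, dtype=float), 0.0)
    p_poly = a_ext + a_ext[::-1]  # P(z) = A(z) + z^-(p+1) A(1/z)
    q_poly = a_ext - a_ext[::-1]  # Q(z) = A(z) - z^-(p+1) A(1/z)
    angles = np.angle(np.concatenate([np.roots(p_poly), np.roots(q_poly)]))
    eps = 1e-6
    # Keep one root of each conjugate pair; drop the trivial roots at z = +1 and z = -1.
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])


def lsf_to_lpc(lsf):
    """Sorted LSFs (p even) -> LPC coefficients [1, a1, ..., ap]."""
    lsf = np.sort(np.asarray(lsf, dtype=float))
    p_poly = np.array([1.0, 1.0])   # trivial factor (1 + z^-1) of P(z)
    q_poly = np.array([1.0, -1.0])  # trivial factor (1 - z^-1) of Q(z)
    for w in lsf[0::2]:             # odd-numbered LSFs w1, w3, ... are roots of P(z)
        p_poly = np.convolve(p_poly, [1.0, -2.0 * np.cos(w), 1.0])
    for w in lsf[1::2]:             # even-numbered LSFs w2, w4, ... are roots of Q(z)
        q_poly = np.convolve(q_poly, [1.0, -2.0 * np.cos(w), 1.0])
    # A(z) = (P(z) + Q(z)) / 2; the z^-(p+1) terms cancel, so drop the last entry.
    return (0.5 * (p_poly + q_poly))[:-1]
```

Because the LSFs of a stable filter are ordered and interlaced, the conversion needs only products of second-order factors and no root finding. This is why the decoder-side direction is cheap.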

[0045] Input terminals 201, 202 and 203 of FIG. 2 are fed from output terminals 141, 142 and 143 of FIG. 1. They receive, respectively, the vector quantization index IsxVq₁ of the MDCT coefficients, the pitch lag L₁ and the pitch gain g₁.

[0046] The MDCT coefficient VQ index IsxVq₁ from input terminal 201 is processed as follows:
- The inverse VQ circuit 211 inverse vector quantizes it.
- The inverse MDCT circuit 212 applies the inverse MDCT.
- The overlap-add circuit 213 overlap-adds the result and sends it to the pitch synthesis filter 214.

The pitch synthesis filter 214 also receives the pitch lag L₁ and the pitch gain g₁ from input terminals 202 and 203. It reverses the pitch predictive coding performed by the pitch inverse filter 112 of FIG. 1. Its output then goes to the LPC synthesis filter 215 for LPC synthesis. The pitch spectral post-filter 216 post-filters the LPC-synthesized output, which is taken out from output terminal 219 as a speech signal corresponding to the 6 kbps bit rate.
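The decoder's pitch synthesis filter 214 undoes the pitch prediction applied by the pitch inverse filter 112. Below is a minimal sketch of that pair. It assumes a 3-tap predictor with taps at lags L-1, L and L+1, which is consistent with the three-element gain vector (g0, g1, g2) discussed in [0089]. It also assumes a lag and gain set held constant over the block and zero initial filter state. The function names are hypothetical.

```python
import numpy as np


def pitch_inverse_filter(x, lag, gains):
    """Encoder side (cf. pitch inverse filter 112): remove the 3-tap pitch prediction."""
    assert lag >= 2, "taps at lag-1 must lie strictly in the past"
    x = np.asarray(x, dtype=float)
    r = x.copy()
    for n in range(len(x)):
        for j, g in enumerate(gains):
            m = n - (lag - 1 + j)  # taps at lag-1, lag, lag+1
            if m >= 0:
                r[n] -= g * x[m]
    return r


def pitch_synthesis_filter(r, lag, gains):
    """Decoder side (cf. pitch synthesis filter 214): recursive inverse of the above."""
    assert lag >= 2
    y = np.zeros(len(r))
    for n in range(len(r)):
        acc = r[n]
        for j, g in enumerate(gains):
            m = n - (lag - 1 + j)
            if m >= 0:
                acc += g * y[m]  # feed back already reconstructed samples
        y[n] = acc
    return y
```

The analysis filter is FIR and the synthesis filter is its all-pole inverse, so with identical lag and gains the round trip is exact.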

[0047] Input terminals 204, 205, 206 and 207 of FIG. 2 are fed from output terminals 144, 145, 146 and 147 of FIG. 1. They receive, respectively, the pitch gain g₂, the pitch lag L₂, the MDCT coefficient VQ index IsxVq₂ and the pitch gain g_1d.

[0048] The MDCT coefficient VQ index IsxVq₂ from input terminal 207 is processed as follows:
- The inverse VQ circuit 220 inverse vector quantizes it.
- The adder 221 adds the result to the inverse-quantized MDCT coefficients from the inverse VQ circuit 211.
- The inverse MDCT circuit 222 applies the inverse MDCT to the sum.
- The overlap-add circuit 223 overlap-adds the result and sends it to the pitch synthesis filter 224.

The pitch synthesis filter 224 receives the pitch lag L₁, the pitch gain g₂ and the pitch lag L₂ from input terminals 202, 204 and 205. It also receives the sum, formed by the adder 217, of the pitch gain g₁ from input terminal 203 and the pitch gain g_1d from input terminal 206. After pitch synthesis in the pitch synthesis filter 224, the signal goes to the LPC synthesis filter 225 for LPC synthesis. The pitch spectral post-filter 226 post-filters the LPC-synthesized output. The upsampling circuit 227 then raises its sampling frequency, for example from 8 kHz to 16 kHz, and the result is sent to the adder 228.

[0049] Input terminal 208 receives the high-band LSP index LSPidx_H from output terminal 181 of FIG. 1. This index is sent to the LSP inverse VQ circuit 246 of the LPC parameter reproduction unit 245, where it is inverse vector quantized into LSP data. The LSP interpolation circuit 247 interpolates these data, and the LSP→α conversion circuit 248 converts them into α parameters (LPC coefficients). The α parameters are sent to the high-band LPC synthesis filter 232.

[0050] Input terminal 209 receives the index LPCidx, which is the vector-quantized high-band LPC residual from output terminal 173 of FIG. 1. The high-band inverse VQ circuit 231 inverse vector quantizes it and sends it to the high-band LPC synthesis filter 232. The output of the high-band LPC synthesis filter 232 then passes through the following stages:
- The upsampling circuit 233 upsamples it, for example from 8 kHz to 16 kHz.
- The FFT circuit 234, which serves as the orthogonal transform means, converts it into a frequency-domain signal.
- The frequency shift circuit 235 shifts it up to the high band.
- The inverse FFT circuit 236 converts it back into a high-band time-domain signal.

This signal is sent to the adder 228 via the overlap-add circuit 237.
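The high-band path above (FFT 234, frequency shift 235, inverse FFT 236) moves the decoded band up in frequency by shifting FFT bins. Below is a minimal single-frame sketch. It assumes a real frame already sampled at 16 kHz and a shift that is a whole number of bins. The 4 kHz shift used in the check is a hypothetical value, not one given in this section. In practice each frame would be windowed and the outputs overlap-added, as circuit 237 does, because a bin shift treats each frame as periodic.

```python
import numpy as np


def shift_spectrum_up(frame, fs, shift_hz):
    """Shift the spectrum of a real frame up by `shift_hz` via an rFFT bin shift."""
    n = len(frame)
    k = int(round(shift_hz * n / fs))  # shift expressed in FFT bins
    spec = np.fft.rfft(frame)
    shifted = np.zeros_like(spec)
    # Content that would land above fs/2 is discarded.
    shifted[k:] = spec[:len(spec) - k]
    return np.fft.irfft(shifted, n=n)
```

For example, with fs = 16 kHz and a 256-sample frame, a 1 kHz tone (bin 16) shifted by 4 kHz (64 bins) comes out as a 5 kHz tone (bin 80).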

[0051] The adder 228 adds this signal to the signal from the upsampling circuit 227. The sum is taken out from output terminal 229 as a speech signal corresponding to part of the 16 kbps bit rate. The complete 16 kbps signal is obtained by also combining the signal from output terminal 219.

[0052] Scalability is described next. In the configurations of FIGS. 1 and 2, the two transmission bit rates of 6 kbps and 16 kbps use nearly the same coding and decoding scheme. This achieves scalability in which the 6 kbps bit stream is entirely contained within the 16 kbps bit stream. For a very different bit rate such as 2 kbps, however, such complete inclusion is difficult to obtain.

[0053] Even when the same coding and decoding scheme cannot be applied, it is preferable to provide scalability while sharing as much as possible.

[0054] For this reason, 2 kbps coding is performed by a coding apparatus configured as shown in FIG. 3. This apparatus shares as many parts, or as much data, as possible with the configuration of FIG. 1. Within the overall 16 kbps bit stream, the full 16 kbps, 6 kbps or 2 kbps is used, depending on the application.

[0055] Strictly speaking, as described later, the actual rates are as follows:
- The 2 kbps mode uses all 2 kbps of information.
- The 6 kbps mode uses 6 kbps when the frame (the coding unit) is voiced (V) and 5.65 kbps when it is unvoiced (UV).
- The 16 kbps mode uses 15.2 kbps for a voiced frame and 14.85 kbps for an unvoiced frame.

[0056] The configuration and operation of the 2 kbps coding apparatus of FIG. 3 are described next.

[0057] The coding apparatus of FIG. 3 is built around two coding units:
- A first coding unit 310 obtains a short-term prediction residual of the input speech signal, for example the LPC (linear predictive coding) residual. It then applies sinusoidal analysis coding, such as harmonic coding, to that residual.
- A second coding unit 320 codes the input speech signal by waveform coding that transmits phase.

The first coding unit 310 codes the voiced (V) portions of the input signal, and the second coding unit 320 codes the unvoiced (UV) portions.

[0058] The first coding unit 310 uses, for example, a configuration that applies sinusoidal analysis coding, such as harmonic coding or multiband excitation (MBE) coding, to the LPC residual. The second coding unit 320 uses, for example, a code-excited linear prediction (CELP) configuration. That configuration vector quantizes with a closed-loop search for the optimum vector using analysis by synthesis.

[0059] In the example of FIG. 3, the speech signal at input terminal 301 is sent to the LPC inverse filter 311 and to the LPC analysis/quantization unit 313 of the first coding unit 310. The first coding unit then works as follows:
- The LPC analysis/quantization unit 313 produces the LPC coefficients, the so-called α parameters, and sends them to the LPC inverse filter 311. The filter extracts the linear prediction residual (LPC residual) of the input speech signal.
- The LPC analysis/quantization unit 313 also outputs the quantized LSP (line spectrum pair) data to output terminal 302, as described later.
- The sinusoidal analysis coding unit 314 receives the LPC residual and performs pitch detection and spectral envelope amplitude calculation. Meanwhile, the V (voiced)/UV (unvoiced) decision unit 315 makes the V/UV decision.
- The vector quantization unit 316 vector quantizes the spectral envelope amplitude data from unit 314. Its codebook index goes to output terminal 303 via switch 317, and the output of unit 314 goes to output terminal 304 via switch 318.

The V/UV decision output of unit 315 is sent to output terminal 305 and also controls switches 317 and 318. For a voiced (V) frame, the index and the pitch are selected and taken out from output terminals 303 and 304, respectively.
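The LPC analysis of unit 313 and the residual extraction of inverse filter 311 can be sketched as follows. This section does not say how the α parameters are computed. The sketch therefore assumes the common autocorrelation method (Hamming window plus Levinson-Durbin recursion), and the function names and the predictor order are illustrative choices. The synthesis filter is included to show that 1/A(z) exactly undoes A(z).

```python
import numpy as np


def levinson_durbin(r, order):
    """Solve the normal equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        prev = a.copy()
        for j in range(1, i + 1):
            a[j] = prev[j] + k * prev[i - j]
        err *= 1.0 - k * k                  # remaining prediction error power
    return a, err


def lpc_analysis(frame, order=10):
    """Autocorrelation-method LPC of one frame (assumed method)."""
    w = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    r = np.array([np.dot(w[:len(w) - lag], w[lag:]) for lag in range(order + 1)])
    a, _ = levinson_durbin(r, order)
    return a


def lpc_inverse_filter(x, a):
    """LPC residual e[n] = x[n] + sum_k a[k] x[n-k] (FIR A(z), zero state)."""
    return np.convolve(x, a)[:len(x)]


def lpc_synthesis_filter(e, a):
    """All-pole 1/A(z): x[n] = e[n] - sum_k a[k] x[n-k] (zero state)."""
    x = np.zeros(len(e))
    for n in range(len(e)):
        acc = e[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * x[n - k]
        x[n] = acc
    return x
```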

[0060] In this example, the second coding unit 320 of FIG. 3 has a CELP (code-excited linear prediction) configuration. It works as follows:
- The weighted synthesis filter 322 synthesizes the output of the noise codebook 321 and sends the resulting weighted speech to the subtracter 323.
- The input speech at terminal 301 is passed through the perceptual weighting filter 325. The subtracter 323 takes the error between this weighted input and the synthesized speech.
- The distance calculation circuit 324 evaluates this error, and the noise codebook 321 is searched for the vector that minimizes it.

In other words, the time-domain waveform is vector quantized by a closed-loop search using analysis by synthesis. As stated above, this CELP coding is used for the unvoiced portions. The codebook index from the noise codebook 321, which forms the UV data, is taken out from output terminal 307 via switch 327. This switch is turned on when the V/UV decision from the V/UV decision unit 315 is unvoiced (UV).
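Below is a minimal sketch of the closed-loop (analysis-by-synthesis) codebook search of the second coding unit 320. It makes the following assumptions:
- The target is already perceptually weighted, standing in for filter 325.
- The weighted synthesis filter 322 is modeled as a plain all-pole filter 1/A(z) with zero state.
- Each codeword gets its own optimal gain.
- The random Gaussian codebook in the check is hypothetical.

```python
import numpy as np


def synth(excitation, a):
    """All-pole synthesis 1/A(z) with zero state (stands in for filter 322)."""
    y = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, len(a)):
            if n >= k:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y


def celp_search(target, codebook, a):
    """Return (index, gain) minimizing ||target - gain * synth(codebook[i])||^2."""
    best_index, best_gain, best_err = None, 0.0, np.inf
    for i, code in enumerate(codebook):
        y = synth(code, a)
        energy = np.dot(y, y)
        if energy == 0.0:
            continue
        corr = np.dot(target, y)
        gain = corr / energy                          # optimal gain for this codeword
        err = np.dot(target, target) - gain * corr    # error left at that gain
        if err < best_err:
            best_index, best_gain, best_err = i, gain, err
    return best_index, best_gain
```

Only the index (and, in a real coder, a quantized gain) would be transmitted. The decoder regenerates the excitation from the same codebook.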

[0061] The LPC analysis/quantization unit 313 of this coding apparatus can be shared as part of the LPC analysis/quantization unit 130 of FIG. 1. The output of terminal 302 can then be used directly as the output of output terminal 131 of FIG. 1. Part of the pitch data obtained by the sinusoidal analysis coding unit 314 can also serve as the output of the pitch analysis circuit 115 of FIG. 1. The pitch analysis circuit 115 can even be shared with the pitch output section of unit 314.

[0062] Thus, although the coding scheme of FIG. 3 differs from that of FIG. 1, both carry common information and provide the scalability shown in FIG. 4.

[0063] In FIG. 4, the internal structure of the 2 kbps bit stream S2 depends on whether the analysis/synthesis frame is V (voiced) or UV (unvoiced). The voiced stream S2v consists of two parts, S2ve and S2va. The unvoiced stream S2u consists of two parts, S2ue and S2ua. Rates below are written as bits per number of samples; for example, 1 bit per 160-sample frame is written 1 bit/160 samples.
- Part S2ve: pitch lag at 1 bit/160 samples and amplitudes Am at 15 bits/160 samples. Total 16 bits/160 samples, equivalent to 0.8 kbps at 8 kHz sampling.
- Part S2ue: LPC residual at 11 bits/80 samples plus 1 spare bit/160 samples. Total 23 bits/160 samples, equivalent to 1.15 kbps.

The remaining parts, S2va and S2ua, are shared with (common to) the 6 kbps and 16 kbps streams described above.
- Part S2va: LSP data at 32 bits/320 samples, V/UV decision data at 1 bit/160 samples and pitch lag at 7 bits/160 samples. Total 24 bits/160 samples, equivalent to 1.2 kbps.
- Part S2ua: LSP data at 32 bits/320 samples and V/UV decision data at 1 bit/160 samples. Total 17 bits/160 samples, equivalent to 0.85 kbps.

[0064] Like S2, the 6 kbps bit stream S6 has an internal structure that differs only partly between V and UV analysis frames. The voiced stream S6v consists of two parts, S6va and S6vb. The unvoiced stream S6u consists of two parts, S6ua and S6ub.
- Part S6va: same data content as part S2va, as noted above.
- Part S6vb: pitch gain at 6 bits/160 samples and pitch residual at 18 bits/32 samples. Total 96 bits/160 samples, equivalent to 4.8 kbps.
- Part S6ua: same data content as part S2ua.
- Part S6ub: same data content as part S6vb.

[0065] Likewise, the 16 kbps bit stream S16 has an internal structure that differs only partly between V and UV analysis frames. The voiced stream S16v consists of four parts, S16va, S16vb, S16vc and S16vd. The unvoiced stream S16u consists of four parts, S16ua, S16ub, S16uc and S16ud.
- Part S16va: same data content as parts S2va and S6va.
- Part S16vb: same data content as parts S6vb and S6ub.
- Part S16vc: pitch lag at 2 bits/160 samples, pitch gain at 11 bits/160 samples, pitch residual at 18 bits/32 samples and S/M mode data at 1 bit/160 samples. Total 104 bits/160 samples, equivalent to 5.2 kbps. The S/M mode data switch the VQ circuit 124 between two different codebooks, one for speech and one for music.
- Part S16vd: high-band LPC data at 5 bits/160 samples and high-band LPC residual at 15 bits/32 samples. Total 80 bits/160 samples, equivalent to 4 kbps.
- Part S16ua: same data content as parts S2ua and S6ua.
- Part S16ub: same data content as part S16vb, i.e., as parts S6vb and S6ub.
- Part S16uc: same data content as part S16vc.
- Part S16ud: same data content as part S16vd.
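The bit allocations in [0063] to [0065] can be checked with exact fraction arithmetic. Each field contributes bits × 160 / samples bits per frame. At 8 kHz sampling there are 8000 / 160 = 50 frames per second, so kbps = bits per frame × 50 / 1000. The part grouping below follows the text; the dictionary layout is only illustrative.

```python
from fractions import Fraction

FS = 8000    # low-band sampling rate in Hz
FRAME = 160  # samples per frame

# (bits, samples) per field, as listed in [0063]-[0065].
PARTS = {
    "S2ve": [(1, 160), (15, 160)],                       # pitch lag, amplitudes Am
    "S2ue": [(11, 80), (1, 160)],                        # LPC residual, spare bit
    "S2va": [(32, 320), (1, 160), (7, 160)],             # LSP, V/UV, pitch lag
    "S2ua": [(32, 320), (1, 160)],                       # LSP, V/UV
    "S6vb": [(6, 160), (18, 32)],                        # pitch gain, pitch residual
    "S16vc": [(2, 160), (11, 160), (18, 32), (1, 160)],  # lag, gain, residual, S/M mode
    "S16vd": [(5, 160), (15, 32)],                       # high-band LPC, high-band residual
}


def bits_per_frame(part):
    return sum(Fraction(bits, samples) * FRAME for bits, samples in PARTS[part])


def kbps(parts):
    return sum(bits_per_frame(p) for p in parts) * FS / FRAME / 1000


# Shared parts: S6ua = S2ua, S6ub = S16vb = S6vb, S16uc = S16vc, S16ud = S16vd.
STREAMS = {
    ("2k", "V"): ["S2ve", "S2va"],
    ("2k", "UV"): ["S2ue", "S2ua"],
    ("6k", "V"): ["S2va", "S6vb"],
    ("6k", "UV"): ["S2ua", "S6vb"],
    ("16k", "V"): ["S2va", "S6vb", "S16vc", "S16vd"],
    ("16k", "UV"): ["S2ua", "S6vb", "S16vc", "S16vd"],
}

for key, parts in STREAMS.items():
    print(key, float(kbps(parts)))  # 2.0, 2.0, 6.0, 5.65, 15.2, 14.85 in order
```

The totals reproduce the figures quoted in [0055]: 2 kbps in both 2 kbps modes, 6 and 5.65 kbps for the 6 kbps mode, and 15.2 and 14.85 kbps for the 16 kbps mode.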

[0066] FIG. 5 summarizes the configurations of FIGS. 1 and 3 that produce the bit streams described above.

[0067] In FIG. 5, input terminal 11 corresponds to input terminal 101 of FIG. 1 and input terminal 301 of FIG. 3. The input signal goes to the band splitting circuit 12, which splits it into a low band and a high band. This circuit corresponds to the LPF 102, sampling frequency converter 103, subtracter 106, BPF 107 and related circuits of FIG. 1. The low-band signal from circuit 12 goes to two units:
- the 2k coding unit 21, which corresponds to the configuration of FIG. 3;
- the common-part coding unit 22, which roughly corresponds to the LPC analysis/quantization unit 130 of FIG. 1 or the LPC analysis/quantization unit 313 of FIG. 3.

The pitch extraction section of the sinusoidal analysis coding unit of FIG. 3 and the pitch analysis circuit 115 of FIG. 1 may also be included in the common-part coding unit 22.

[0068] The low-band signal from the band splitting circuit 12 also goes to the 6k coding unit 23 and the 12k coding unit 24. The 6k coding unit 23 roughly corresponds to circuits 111 to 116 of FIG. 1. The 12k coding unit 24 roughly corresponds to circuits 117, 118 and 122 to 128 of FIG. 1.

[0069] The high-band signal from the band splitting circuit 12 goes to the high-band 4k coding unit 25. This unit roughly corresponds to circuits 161 to 164, 171 and 172 of FIG. 1.

[0070] The bit streams at output terminals 31 to 35 of FIG. 5 relate to the parts of FIG. 4 as follows:
- Terminal 31: the 2k coding unit 21 outputs the data of part S2ve or S2ue.
- Terminal 32: the common-part coding unit 22 outputs the data of part S2va (= S6va = S16va) or S2ua (= S6ua = S16ua).
- Terminal 33: the 6k coding unit 23 outputs the data of part S6vb (= S16vb) or S6ub (= S16ub).
- Terminal 34: the 12k coding unit 24 outputs the data of part S16vc or S16uc.
- Terminal 35: the high-band 4k coding unit 25 outputs the data of part S16vd or S16ud.

[0071] The scalability technique described above can be generalized as follows. A first coded signal is obtained by applying a first coding to an input signal. A second coded signal is obtained by applying a second coding to the same input signal. The second coding is independent of the first: it has a portion in common with only part of the first coding and a portion that is not in common with it. When the two are multiplexed, the first coded signal is combined only with the part of the second coded signal that excludes the portion common to the first.
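Below is a minimal sketch of the multiplexing rule just stated. It takes, for example, the FIG. 1 coder as the first coding and the FIG. 3 coder as the second, and borrows the FIG. 4 part names as field labels. Representing coded signals as dictionaries of named byte strings is an illustrative assumption, and so is the check that a shared field carries identical data in both coders. The function names are hypothetical.

```python
def multiplex(first, second):
    """Combine two coded signals, sending the common portion only once.

    `first` and `second` are hypothetical dicts: field name -> payload bytes.
    Fields present in both are the common portion; they are taken from `first`.
    """
    muxed = dict(first)
    for name, payload in second.items():
        if name in first:
            if first[name] != payload:
                raise ValueError(f"shared field {name!r} differs between coders")
            continue  # common portion: already carried by the first signal
        muxed[name] = payload
    return muxed


def demultiplex(muxed, wanted):
    """Recover one decoder's view, e.g. the 2 kbps fields, from the multiplex."""
    return {name: muxed[name] for name in wanted}
```

Each decoder pulls out just the fields it understands, so the shared LSP, V/UV and pitch-lag data serve the 2 kbps decoder and the 6/16 kbps decoders alike.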

[0072] This lets even essentially different coding schemes share as much as possible, which provides scalability.

[0073] The operation of each part of FIGS. 1 and 2 is now described in more detail.

[0074] First, consider the case shown in FIG. 6(A): the frame interval is N samples, for example 160 samples, and analysis is performed once per frame.

[0075] Take the pitch analysis center to be $t = kN$ ($k = 0, 1, 2, 3, \ldots$). Two $N$-dimensional vectors are formed from the LPC prediction residual of the LPC inverse filter 111:
- $\mathbf{X}$, covering $t = kN - N/2$ to $kN + N/2$;
- $\mathbf{X}_L$, covering $t = kN - N/2 - L$ to $kN + N/2 - L$, i.e., the same segment shifted $L$ samples earlier in time.

The lag $L = L_{opt}$ that minimizes $\|\mathbf{X} - g\mathbf{X}_L\|^2$ is searched for, and $L_{opt}$ is taken as the optimum pitch lag $L_1$ for this interval. Alternatively, to avoid abrupt pitch changes, the value obtained after pitch tracking may be used as $L_1$.
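The lag search above can be sketched directly. For a fixed L the best gain is g = (X·X_L) / (X_L·X_L). Minimizing the error over L is therefore the same as maximizing (X·X_L)² / (X_L·X_L). The search range and the function name are hypothetical, and pitch tracking is not included.

```python
import numpy as np


def open_loop_pitch(residual, center, n, lag_min, lag_max):
    """Return the lag L minimizing min_g ||X - g X_L||^2 around t = center."""
    lo, hi = center - n // 2, center + n // 2
    if lo - lag_max < 0 or hi > len(residual):
        raise ValueError("not enough residual history for this search range")
    x = residual[lo:hi]                      # X: N samples centred on t = kN
    best_lag, best_score = None, -np.inf
    for lag in range(lag_min, lag_max + 1):
        xl = residual[lo - lag:hi - lag]     # X_L: same window, L samples earlier
        energy = np.dot(xl, xl)
        if energy <= 0.0:
            continue
        score = np.dot(x, xl) ** 2 / energy  # error reduction at the optimal gain
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

For a residual that repeats exactly every 57 samples, the search over 20 to 100 returns 57, since that is the only lag in range at which X_L equals X.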

[0076]-[0080] Next, for this optimum pitch lag $L_1$, the set of $g_i$ that minimizes (Equation 1) is found by solving (Equation 2). This gives the pitch gain vector $\mathbf{g}_1$. The codebook index of the vector-quantized pitch gain vector $\mathbf{g}_1$ is denoted $g_1$.
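Equations 1 and 2 are not reproduced in this text. Assume Equation 1 is the squared error of a 3-tap pitch predictor with taps at L₁−1, L₁ and L₁+1, and Equation 2 is the corresponding set of normal equations. The gain vector can then be obtained by least squares as below. This is an illustrative reconstruction under that assumption, not the patent's exact formulation, and the function name is hypothetical.

```python
import numpy as np


def three_tap_gains(residual, center, n, lag):
    """Least-squares gains (g0, g1, g2) for taps at lag-1, lag, lag+1.

    Minimizes ||X - sum_i g_i X_{lag-1+i}||^2, i.e. solves the normal
    equations (A^T A) g = A^T X using numpy's least-squares routine.
    """
    lo, hi = center - n // 2, center + n // 2
    x = residual[lo:hi]
    cols = [residual[lo - d:hi - d] for d in (lag - 1, lag, lag + 1)]
    a = np.stack(cols, axis=1)              # shape (n, 3): one column per tap
    g, *_ = np.linalg.lstsq(a, x, rcond=None)
    return g
```

When the residual really is generated by such a predictor, the solve recovers its gains exactly. In a coder, the resulting vector would then be vector quantized to give the index g₁.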

[0081] Next, to improve prediction accuracy further, an additional analysis center is placed at $t = (k-1/2)N$. The pitch lags and pitch gains at $t = kN$ and $t = (k-1)N$ are assumed to have already been obtained.

[0082] The fundamental frequency of a speech signal is considered to change slowly. There should therefore be no large difference between the pitch lag $L(kN)$ at $t = kN$ and the pitch lag $L((k-1)N)$ at $t = (k-1)N$, and the change between them should be approximately linear. This makes it possible to restrict the values that the pitch lag $L((k-1/2)N)$ at $t = (k-1/2)N$ can take. In this example, the restriction is as follows.

[0083]

$$L\left((k-\tfrac{1}{2})N\right) \in \left\{ L(kN),\ \frac{L(kN) + L((k-1)N)}{2},\ L((k-1)N) \right\}$$

The value adopted is chosen by computing the power of the pitch residual for each of these lags.

[0084] Let $\mathbf{X}$ be the $N/2$-dimensional vector centered at $t = (k-1/2)N$, covering $t = (k-1/2)N - N/4$ to $(k-1/2)N + N/4$. Let $\mathbf{X}_0^{(0)}$, $\mathbf{X}_1^{(0)}$ and $\mathbf{X}_2^{(0)}$ be the $N/2$-dimensional vectors delayed by $L(kN)$, $(L(kN) + L((k-1)N))/2$ and $L((k-1)N)$, respectively. The vectors neighboring them are denoted as follows:
- $\mathbf{X}_0^{(-1)}$ and $\mathbf{X}_0^{(1)}$ for $\mathbf{X}_0^{(0)}$;
- $\mathbf{X}_1^{(-1)}$ and $\mathbf{X}_1^{(1)}$ for $\mathbf{X}_1^{(0)}$;
- $\mathbf{X}_2^{(-1)}$ and $\mathbf{X}_2^{(1)}$ for $\mathbf{X}_2^{(0)}$.

The pitch gains corresponding to $\mathbf{X}_0^{(i)}$, $\mathbf{X}_1^{(i)}$ and $\mathbf{X}_2^{(i)}$ ($i = -1, 0, 1$) are $g_0^{(i)}$, $g_1^{(i)}$ and $g_2^{(i)}$. With these gains, three error measures $D_j$ ($j = 0, 1, 2$) are computed by (Equation 3).

[0086] The lag giving the smallest of the three $D_j$ is taken as the optimum lag $L_2$ at $t = (k-1/2)N$. The pitch gains $g_j^{(i)}$ ($i = -1, 0, 1$) at that lag are vector quantized to obtain the pitch gain. Since $L_2$ can take only three values, all determined by the current and past $L_1$, it is enough to transmit a flag indicating the interpolation scheme as an interpolation index, instead of the lag value itself. When either $L(kN)$ or $L((k-1)N)$ is 0, no pitch is present and no pitch prediction gain can be obtained. In that case, $(L(kN) + L((k-1)N))/2$ is excluded from the candidates for $L((k-1/2)N)$.
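Below is a sketch of the half-frame lag choice of [0083] to [0086]. It builds the candidate list, evaluates each candidate with a 3-tap least-squares predictor on the N/2-dimensional vector, and keeps the index of the best one. That index is what would be sent as the interpolation index. The sketch relies on these assumptions beyond the text:
- "Neighboring" vectors are taken as ±1-sample shifts.
- The midpoint lag is rounded down.
- A zero lag is treated as "no prediction".
- The error measure stands in for Equation 3, which is not reproduced here.

All function names are hypothetical.

```python
import numpy as np


def lag_candidates(lag_now, lag_prev):
    """Candidates for L((k-1/2)N); the list position is the interpolation index."""
    if lag_now == 0 or lag_prev == 0:
        return [lag_now, lag_prev]          # midpoint excluded when either lag is 0
    mid = (lag_now + lag_prev) // 2         # rounding down is an assumption
    return [lag_now, mid, lag_prev]


def residual_power(signal, lo, hi, lag):
    """Power left after 3-tap least-squares pitch prediction with taps lag-1..lag+1."""
    x = signal[lo:hi]
    if lag < 2:                             # no pitch: nothing is predicted
        return float(np.dot(x, x))
    a = np.stack([signal[lo - d:hi - d] for d in (lag - 1, lag, lag + 1)], axis=1)
    g, *_ = np.linalg.lstsq(a, x, rcond=None)
    e = x - a @ g
    return float(np.dot(e, e))


def choose_half_frame_lag(signal, k, n, lag_now, lag_prev):
    """Return (index, lag) of the best candidate at t = (k - 1/2) N."""
    center = (2 * k - 1) * n // 2           # t = (k - 1/2) N
    lo, hi = center - n // 4, center + n // 4  # N/2-dimensional vector X
    cands = lag_candidates(lag_now, lag_prev)
    powers = [residual_power(signal, lo, hi, c) for c in cands]
    j = int(np.argmin(powers))
    return j, cands[j]
```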

[0087] When the dimension of the vector $\mathbf{X}$ used for the pitch lag calculation is halved to $N/2$ in this way, the lag $L_k$ for the analysis center $t = kN$ can be used unchanged. The gain, however, must be recalculated and its data transmitted again, even though a pitch gain has already been obtained from the $N$-dimensional analysis. To reduce the number of bits needed for this, (Equation 4) is used.

[0089] Among the elements $(g_0, g_1, g_2)$ of the vector $\mathbf{g}$, either $g_1$ is the largest and $g_0$ and $g_2$ are close to 0, or the reverse holds. Because the three elements of $\mathbf{g}$ are strongly correlated, the vector $\mathbf{g}_{1d}$ is expected to have a smaller variance than the original vector $\mathbf{g}$. It can therefore be quantized with fewer bits.

[0090] Accordingly, five pitch parameters are transmitted per frame: $L_1$, $g_1$, $L_2$, $g_2$ and $g_{1d}$.

[0091] Next, FIG. 5(B) shows the timing (phase) of the LPC coefficients interpolated at eight times the frame rate. These LPC coefficients are used by the LPC inverse filter 111 of FIG. 1 to calculate the prediction residual. They are also used by the LPC synthesis filters 215 and 225 and the pitch spectral post filters 216 and 226 of FIG. 2.

[0092] Next, the vector quantization of the pitch residual obtained using the pitch lag and pitch gain is described.

[0093] To perform the perceptual weighting of the vector quantization easily and accurately, the pitch residual is windowed with 50% overlap and transformed by the MDCT, and weighted vector quantization is performed in the MDCT domain. The transform length is arbitrary, but in this example a fairly small dimension is used, for the following reasons.

[0094] (1) High-dimensional vector quantization requires an enormous amount of computation, which forces splitting or rearrangement in the MDCT domain.
(2) When the vector is split, it is very difficult to allocate bits precisely among the split bands.
(3) When the dimension is not a power of 2, the fast FFT-based MDCT algorithm cannot be used.

[0095] Here the frame length is 20 ms (160 samples at 8 kHz), so dividing the frame into five subframes gives 160/5 = 32 = 2⁵ samples per subframe. Allowing for the 50% overlap, the MDCT transform size is set to 64, which resolves points (1) to (3) above.

[0096] The framing is as shown in FIG. 6(C).

[0097] That is, in FIG. 6(C), the pitch residual r_p(n) of a 20 ms (160-sample) frame is divided into five subframes. Here n = 0, 1, ..., 191, where n = 160, ..., 191 correspond to samples 0, ..., 31 of the next frame. The pitch residual of the i-th subframe (i = 0, 1, ..., 4) is defined as r_pi(n) = r_p(32i + n) (n = 0, 1, ..., 63, the 64-sample span of one MDCT block). The subframe residual r_pi(n) is multiplied by a window function w(n) that allows the MDCT aliasing to cancel, and the MDCT is applied to the product w(n)·r_pi(n). As the window function, for example, w(n) = √(1 − cos(2π(n + 0.5)/64)) may be used.
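The subframe layout and window can be sketched as follows. The sketch assumes that each windowed subframe spans the full 64-sample MDCT length (n = 0, ..., 63), which is what a 50%-overlap transform of size 64 over the 192-sample buffer requires. The constant and function names are hypothetical. Since 1 − cos θ = 2 sin²(θ/2), the window given above is √2 times a sine window. It therefore satisfies w²(n) + w²(n + 32) = 2, which is the Princen-Bradley condition for aliasing cancellation up to a constant scale.

```python
import math

FRAME = 160      # 20 ms at 8 kHz
HOP = 32         # 160 / 5 = 32 = 2**5 samples between subframe starts
MDCT_LEN = 64    # 50% overlap -> transform length 2 * 32


def window(n):
    """w(n) = sqrt(1 - cos(2*pi*(n + 0.5)/64)), i.e. sqrt(2) * sin(pi*(n + 0.5)/64)."""
    return math.sqrt(1.0 - math.cos(2.0 * math.pi * (n + 0.5) / MDCT_LEN))


def split_subframes(r_p):
    """Return the five windowed subframes w(n) * r_p(32i + n), n = 0..63.

    r_p holds 192 samples: the current frame (0..159) plus 32 look-ahead
    samples taken from the start of the next frame.
    """
    assert len(r_p) == FRAME + HOP
    return [[window(n) * r_p[HOP * i + n] for n in range(MDCT_LEN)] for i in range(5)]
```

The last subframe (i = 4) reads samples 128 to 191, which is why the buffer needs the 32 look-ahead samples.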

[0098] Because the MDCT transform length is 64 (= 2⁶), the MDCT can be computed with an FFT as follows.

[0099] (1) Let x(n) = w(n)·r_pi(n)·exp((−2πj/64)(n/2)).
(2) Apply a 64-point FFT to x(n), and call the result y(k).
(3) Take the real part of y(k)·exp((−2πj/64)(k + 1/2)(1/2 + 64/4)) as the MDCT coefficient c_i(k) (k = 0, 1, ..., 31).
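Steps (1) to (3) can be checked against the direct MDCT definition c(k) = Σₙ x_w(n) cos((2π/N)(n + n₀)(k + 1/2)), with N = 64 and n₀ = 1/2 + N/4. Multiplying by exp(−jπn/N) before the FFT and by exp(−j(2π/N)n₀(k + 1/2)) after it turns the exponent into −j(2π/N)(n + n₀)(k + 1/2), whose real part is the cosine kernel. The sketch below uses a small recursive radix-2 FFT written for illustration. It is a minimal sketch, not the patent's implementation.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def mdct_via_fft(xw):
    """Steps (1)-(3) of the text for a windowed block xw of length N (64 here)."""
    n_len = len(xw)
    n0 = 0.5 + n_len / 4  # the (1/2 + 64/4) phase offset
    x = [xw[n] * cmath.exp((-2j * math.pi / n_len) * (n / 2)) for n in range(n_len)]  # step (1)
    y = fft(x)                                                                         # step (2)
    return [(y[k] * cmath.exp((-2j * math.pi / n_len) * (k + 0.5) * n0)).real          # step (3)
            for k in range(n_len // 2)]


def mdct_direct(xw):
    """Reference O(N^2) MDCT: sum_n xw[n] cos(2*pi/N * (n + n0) * (k + 1/2))."""
    n_len = len(xw)
    n0 = 0.5 + n_len / 4
    return [sum(xw[n] * math.cos(2 * math.pi / n_len * (n + n0) * (k + 0.5)) for n in range(n_len))
            for k in range(n_len // 2)]
```

A 64-sample block produces 32 coefficients, which matches k = 0, 1, ..., 31 in step (3).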

[0100] Next, the weighting used when the MDCT coefficients c_i(k) of each subframe are vector-quantized is described.

[0101] Letting the pitch residual r_pi(n) be the vector r_i, the distance D after synthesis is

[0102]

(Equation 5)

[0103] Here, because of its properties, M can be regarded as diagonalizing HᵗH (where Hᵗ is the transpose of H), so

[0104]

(Equation 6)

[0105] where h_i is taken as the frequency response of the synthesis filter. Therefore,

[0106]

(Equation 7)

[0107] If h_k were used directly to weight the quantization of c_i(k), the noise after synthesis would be flat, which is so-called 100% noise shaping. The perceptual weighting W is therefore also applied, so that the noise is shaped to follow the formants.

[0108]

(Equation 8)

[0109] Note that h_i² and w_i² are obtained from the synthesis filter H(z) and the perceptual weighting filter W(z)

[0110]

(Equation 9)

[0111] as the FFT power spectra of their impulse responses.

[0112] Here, α_ij are the LPC coefficients for the i-th subframe, obtained from interpolated LSP coefficients. That is, LSP₀(j), obtained by analyzing the previous frame, and LSP₁(j), obtained from the current frame, are interpolated by internal division. In this example, the LSP of the i-th subframe is

[0113]

(Equation 10)

[0114] which gives LSP⁽ⁱ⁾(j). α_ij is then obtained by LSP-to-α conversion.

[0115] From the H and W thus obtained, W′ = WH is defined and used as the distance measure in the vector quantization.

[0116] The vector quantization is performed as shape-gain vector quantization. The optimum encoding and decoding conditions used during codebook training are described below.

[0117] At a given point in training, let s be the shape codebook and g the gain codebook. Let x be the training input, that is, the MDCT coefficients of each subframe, and W′ the weight for that subframe. The distortion power D² is then defined by the following equation.

[0118] D² = ‖W′(x − g s)‖²
The optimum encoding condition is to select the (g, s) that minimizes this D².
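Completing the square in the scalar g rewrites D² in a form that explains the two-step search in the paragraphs that follow. This is the standard shape-gain identity. It is offered as a plausible reading of the idea behind Equations 11 to 13, whose images are not reproduced here, and not as a transcription of them.

```latex
D^2 = \lVert W'x - g\,W's \rVert^2
    = \lVert W'x \rVert^2
    - \frac{\bigl(s^{\mathsf T} W'^{\mathsf T} W' x\bigr)^2}{\lVert W's \rVert^2}
    + \lVert W's \rVert^2
      \left( g - \frac{s^{\mathsf T} W'^{\mathsf T} W' x}{\lVert W's \rVert^2} \right)^2
```

The first term does not depend on the codebooks. The second depends only on s, so the shape search maximizes (sᵀW′ᵀW′x)²/‖W′s‖². For that shape, the third term is minimized by the gain-codebook entry closest to sᵀW′ᵀW′x/‖W′s‖².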

[0119]

(Equation 11)

[0120] Therefore, as the first step, the shape codebook is searched for the

[0121]

(Equation 12)

[0122] s_opt that maximizes this expression. For this s_opt, the gain codebook is then searched for the g_opt closest to

[0123]

(Equation 13)

[0124] This completes the optimum encoding.
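The two-step encoding search can be sketched as follows. The sketch assumes that W′ is diagonal, with one weight per MDCT coefficient, which is in the spirit of [0103] to [0108], where the weighting is expressed per frequency. The function names and the list-based codebooks are hypothetical.

```python
def weighted_dist2(w, x, g, s):
    """D^2 = ||W'(x - g s)||^2 with W' taken as diagonal (entries w)."""
    return sum((wi * (xi - g * si)) ** 2 for wi, xi, si in zip(w, x, s))


def encode_shape_gain(w, x, shapes, gains):
    """Return (shape index, gain index).

    Step 1: pick the shape maximizing (s^T W'^T W' x)^2 / ||W' s||^2.
    Step 2: pick the gain closest to s^T W'^T W' x / ||W' s||^2 for that shape.
    """
    def corr(s):
        return sum(wi * wi * si * xi for wi, si, xi in zip(w, s, x))

    def energy(s):
        return sum((wi * si) ** 2 for wi, si in zip(w, s))

    i_s = max(range(len(shapes)), key=lambda i: corr(shapes[i]) ** 2 / energy(shapes[i]))
    g_ideal = corr(shapes[i_s]) / energy(shapes[i_s])
    i_g = min(range(len(gains)), key=lambda i: abs(gains[i] - g_ideal))
    return i_s, i_g
```

Searching shapes first and gains second avoids evaluating every (shape, gain) pair. The cost is that the result can differ from a full joint search when the ideal gain is far from every entry of the gain codebook.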

[0125] Next, the optimum decoding condition is derived.

[0126] As the second step, consider the shape codebook at some point during training. Let x_k (k = 0, ..., N − 1) be the set of inputs x that were encoded to the shape-codebook entry s. The total distortion E_s is then

[0127]

(Equation 14)

[0128] Therefore, the s that minimizes this satisfies

[0129]

(Equation 15)

[0130] From this,

[0131]

(Equation 16)

[0132] is obtained.

[0133] For the gain codebook, let x_k be the set of inputs x encoded to the gain-codebook entry g, with weights W′_k and shapes s_k. The total distortion E_g is

[0134]

(Equation 17)

[0135] as shown above.

[0136] By repeating the first and second steps above, the shape and gain codebooks can be obtained with the GLA (generalized Lloyd algorithm).
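One GLA pass, alternating the encoding step with the centroid updates, can be sketched as follows. As in the encoding sketch, W′ is assumed diagonal. With that assumption, the shape centroid that minimizes Σₖ‖W′ₖ(xₖ − gₖs)‖² can be solved element by element, and the gain centroid is the scalar Σₖ sₖᵀW′ₖᵀW′ₖxₖ / Σₖ‖W′ₖsₖ‖². These are derived by setting the gradient to zero and are not taken from Equations 14 to 17. All names are hypothetical.

```python
def _corr(w, s, x):
    """s^T W'^T W' x, with W' diagonal (entries w)."""
    return sum(wi * wi * si * xi for wi, si, xi in zip(w, s, x))


def _energy(w, s):
    """||W' s||^2, with W' diagonal."""
    return sum((wi * si) ** 2 for wi, si in zip(w, s))


def _encode(x, w, shapes, gains):
    """Optimum encoding: best normalized-correlation shape, then nearest gain."""
    i_s = max(range(len(shapes)),
              key=lambda i: _corr(w, shapes[i], x) ** 2 / _energy(w, shapes[i]))
    g_ideal = _corr(w, shapes[i_s], x) / _energy(w, shapes[i_s])
    i_g = min(range(len(gains)), key=lambda i: abs(gains[i] - g_ideal))
    return i_s, i_g


def gla_iteration(data, shapes, gains):
    """One GLA pass. data: list of (x, w) pairs, w = diagonal of W' for x.
    Returns (new_shapes, new_gains); cells with no members keep their old entry."""
    assign = [_encode(x, w, shapes, gains) for x, w in data]
    dim = len(shapes[0])

    # Shape centroids: per element m, minimize sum_k w_km^2 (x_km - g_k s_m)^2.
    new_shapes = []
    for i, s_old in enumerate(shapes):
        num, den = [0.0] * dim, [0.0] * dim
        for (x, w), (i_s, i_g) in zip(data, assign):
            if i_s != i:
                continue
            g = gains[i_g]
            for m in range(dim):
                num[m] += g * w[m] ** 2 * x[m]
                den[m] += g * g * w[m] ** 2
        new_shapes.append([num[m] / den[m] if den[m] > 0 else s_old[m] for m in range(dim)])

    # Gain centroids: minimize sum_k ||W'_k (x_k - g s_k)||^2 over the scalar g,
    # using the shapes s_k the inputs were encoded with.
    new_gains = []
    for j, g_old in enumerate(gains):
        members = [(x, w, shapes[i_s]) for (x, w), (i_s, i_g) in zip(data, assign) if i_g == j]
        num = sum(_corr(w, s, x) for x, w, s in members)
        den = sum(_energy(w, s) for x, w, s in members)
        new_gains.append(num / den if den > 0 else g_old)
    return new_shapes, new_gains
```

Scaling each w by 1/‖x‖ before training would approximate the level-normalized weighting W′/‖x‖ mentioned in [0137].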

[0137] In this example, to give more weight to the noise when the signal level is low, training uses W′/‖x‖, that is, W′ weighted by the inverse of the signal level, rather than W′ itself.

[0138] The codebooks created in this way are used to vector-quantize the MDCT of the pitch residual. The resulting index is transmitted together with the LPC parameters (actually LSPs), the pitch and the pitch gain. On the decoding side, inverse vector quantization, pitch synthesis and LPC synthesis then produce the reproduced sound. In this example, however, the coder must also operate at a higher bit rate. To support this, the pitch lag and pitch gain are computed more frequently, as described above, and the vector quantization of the MDCT of the pitch residual is made multistage.

[0139] An example is shown in FIG. 7(A). There are two stages here, forming a sequential multistage vector quantizer. The input to the second stage is the more accurate pitch residual computed from L₂, g₂ and g_1d, minus the decoded result of the first stage. That is, the VQ circuit 114 vector-quantizes the output of the first-stage MDCT circuit 113, and the resulting representative vector (the dequantized output) is inverse-transformed by the inverse MDCT circuit 113a. The result is sent to a subtracter 128′, which subtracts it from the second-stage residual (the output of the pitch inverse filter 122 of FIG. 1). The output of the subtracter 128′ is transformed by an MDCT circuit 123′ and quantized by the VQ circuit 124. This arrangement can be replaced by the equivalent configuration of FIG. 7(B), in which the first-stage inverse MDCT is not performed, and FIG. 1 uses configuration (B).

[0140] When the decoder of FIG. 2 decodes using both MDCT coefficient indices IdxVq1 and IdxVq2, it first sums the inverse vector quantization results of IdxVq1 and IdxVq2. It then applies the inverse MDCT and overlap-add, followed by pitch synthesis and LPC synthesis, to obtain the reproduced sound. The pitch lag and pitch gain used in pitch synthesis are then naturally updated twice as often as in the single-stage case. In this embodiment, the pitch synthesis filter is therefore switched every 80 samples.

[0141] Next, the post filters 216 and 226 on the decoder side of FIG. 2 are described.

[0142] The post filters 216 and 226 realize the post-filter characteristic p(z) as a cascade of pitch-enhancement, high-frequency-emphasis and spectral-enhancement filters.

[0143]

(Equation 18)

[0144] In this equation, g_i and L are the pitch gain and pitch lag obtained by pitch prediction, and ν is a parameter controlling the degree of pitch enhancement (e.g., ν = 0.5). ν_b controls the high-frequency emphasis (e.g., ν_b = 0.4), and ν_n and ν_d control the degree of spectral enhancement (e.g., ν_n = 0.5, ν_d = 0.8).

[0145] Next, gain correction is performed using the output s(n) of the LPC synthesis filter and the output s_p(n) of the post filter. The correction coefficient k_adj is

[0146]

(Equation 19)

[0147] However, k_adj is not held fixed within a frame. It is passed through an LPF so that it changes sample by sample, as shown below, where p = 0.1, for example.

[0148] k_adj(n) = (1 − p)·k_adj(n − 1) + p·k_adj
Next, to smooth the transitions between frames, two pitch-enhancement filters are prepared as follows, and the final output is formed by crossfading their outputs.

[0149]

(Equation 20)

[0150]

(Equation 21)

[0151] From the outputs s_p0(n) and s_p(n) of the post filters thus constructed, the final output s_out(n) is formed as
s_out(n) = (1 − f(n))·s_p0(n) + f(n)·s_p(n),
where f(n) is a window such as that shown in FIG. 8. FIG. 8(A) shows the window used at the low rate and FIG. 8(B) the window used at the high rate. The 80-sample window of FIG. 8(B) is applied twice when synthesizing one 160-sample (20 ms) frame.
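The output stage, with per-sample smoothing of k_adj followed by the crossfade, can be sketched as follows. Two assumptions are made because Equation 19 and FIG. 8 are not reproduced here. First, the target k_adj is taken to be the RMS ratio √(Σs²/Σs_p²) over the frame, a common choice for postfilter gain control. Second, f(n) is taken to be a linear ramp from 0 to 1. Both are hypothetical stand-ins, as are the function names.

```python
import math


def frame_gain(s, s_p):
    """Hypothetical target k_adj: RMS ratio of the synthesis output s(n)
    to the post-filter output s_p(n) over the frame."""
    e_s = sum(v * v for v in s)
    e_p = sum(v * v for v in s_p)
    return math.sqrt(e_s / e_p) if e_p > 0 else 1.0


def gain_correct(s, s_p, k_prev, p=0.1):
    """Apply k_adj(n) = (1 - p) k_adj(n-1) + p k_adj sample by sample.
    Returns (corrected samples, last k_adj) so the state carries across frames."""
    k_target = frame_gain(s, s_p)
    out, k = [], k_prev
    for v in s_p:
        k = (1.0 - p) * k + p * k_target
        out.append(k * v)
    return out, k


def crossfade(s_p0, s_p):
    """s_out(n) = (1 - f(n)) s_p0(n) + f(n) s_p(n) with a hypothetical
    linear ramp f(n) from 0 to 1 over the block."""
    n_len = len(s_p)
    f = [n / (n_len - 1) for n in range(n_len)]
    return [(1.0 - fn) * a + fn * b for fn, a, b in zip(f, s_p0, s_p)]


def crossfade_blocks(s_p0, s_p, seg=80):
    """High-rate case: the 80-sample window is reused for each half of a 160-sample frame."""
    out = []
    for start in range(0, len(s_p), seg):
        out.extend(crossfade(s_p0[start:start + seg], s_p[start:start + seg]))
    return out
```

Carrying the last k_adj into the next call lets the gain glide across frame boundaries instead of jumping.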

[0152] Next, the VQ (vector quantization) circuit 124 on the encoder side of FIG. 1 is described.

[0153] The VQ circuit 124 has two different codebooks, one for speech and one for music, and switches between them according to the input signal.

[0154] That is, when speech and music signals are quantized by a quantizer of fixed structure, its codebook becomes optimal for the characteristics of the speech or music used in training. If the two are trained together and their characteristics differ greatly, the trained codebook ends up with the average characteristics of both. A quantizer built from a single codebook is therefore expected to perform only modestly, that is, its average S/N will not be very high.

[0155] In this example, therefore, a separate codebook is trained on data for each of several signal types with such different characteristics, and the quantizer's performance is improved by switching among these codebooks.

[0156] FIG. 9 shows the schematic configuration of a vector quantizer having two such codebooks, CB_A and CB_B.

[0157] In FIG. 9, the input signal supplied to an input terminal 501 is sent to vector quantizers 511 and 512, which have codebooks CB_A and CB_B, respectively. The representative vectors (dequantized outputs) from the vector quantizers 511 and 512 are sent to subtracters 513 and 514, respectively, which take their differences from the original input signal. These errors are sent to a comparator 515, which compares them. A changeover switch 516 then selects the index from whichever vector quantizer, 511 or 512, gives the smaller error and sends it to an output terminal 502.

[0158] Here, the switching period of the changeover switch 516 is made longer than the quantization unit (period) of the vector quantizers 511 and 512. For example, when the quantization unit is a subframe obtained by dividing a frame into eight, the changeover switch 516 is switched frame by frame.

[0159] As an example, suppose there are codebooks CB_A and CB_B of the same size N and the same dimension M, trained only on speech and only on music, respectively. Let the L-dimensional data X, consisting of the L samples of a frame, be vector-quantized with subframe length M (= L/n). Let E_A(k) be the distortion after quantizing subframe k with codebook CB_A, and E_B(k) the distortion with codebook CB_B. If indices i and j, respectively, are selected, these distortions are
E_A(k) = ‖W_k(X − C_Ai)‖
E_B(k) = ‖W_k(X − C_Bj)‖
where W_k is the weighting matrix of subframe k, and C_Ai and C_Bj are the representative vectors of codebooks CB_A and CB_B for indices i and j.

[0160] Given these two distortions, consider choosing the best codebook for each frame on the basis of the total distortion within the frame. Two selection methods are possible.

[0161] In the first method, every subframe is quantized once using only codebook CB_A and once using only codebook CB_B. The within-frame sums of distortion, Σ_k E_A(k) and Σ_k E_B(k), are computed, and whichever codebook, CB_A or CB_B, gives the smaller sum is used for the whole frame.
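The first method can be sketched as follows. The sketch assumes a diagonal weighting matrix W_k, an exhaustive search within each codebook, and the (unsquared) norm defined in [0159]. The tie-breaking rule and the names are hypothetical.

```python
import math


def weighted_err(w, x, c):
    """E = ||W (x - c)|| with a diagonal weighting matrix W (entries w)."""
    return math.sqrt(sum((wi * (xi - ci)) ** 2 for wi, xi, ci in zip(w, x, c)))


def best_err(w, x, codebook):
    """Distortion of the nearest codevector in one codebook (exhaustive search)."""
    return min(weighted_err(w, x, c) for c in codebook)


def select_codebook_sum(subframes, weights, cb_a, cb_b):
    """Method 1: quantize every subframe with CB_A only and with CB_B only,
    sum the distortions over the frame and keep the codebook with the
    smaller sum. Returns 0 for CB_A, 1 for CB_B (ties go to CB_A)."""
    e_a = sum(best_err(w, x, cb_a) for x, w in zip(subframes, weights))
    e_b = sum(best_err(w, x, cb_b) for x, w in zip(subframes, weights))
    return 0 if e_a <= e_b else 1
```

This method needs both codebooks to be searched for every subframe before the frame-level decision can be made.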

[0162] FIG. 10 shows a configuration that implements the first method. In FIG. 10, parts corresponding to those in FIG. 9 bear the same reference numerals, and the suffixes a, b, ... appended to a reference numeral correspond to subframe k. For codebook CB_A, an adder 517 sums over the frame the outputs of the subtracters 513a, 513b, ..., 513n, each of which gives the distortion of one subframe. For codebook CB_B, an adder 518 likewise sums the per-subframe distortions over the frame. A comparator 515 compares the two sums, and a control signal (selection flag) for codebook switching is output from a terminal 503.

[0163] The second method compares the distortions E_A(k) and E_B(k) in each subframe, then selects the codebook by a decision made over the comparison results of all subframes in the frame.

[0164] FIG. 11 shows an implementation of the second method. In FIG. 11, the outputs of the comparator 516, which compares the distortions in each subframe, are sent to a decision logic 519. The decision logic 519 makes the decision, for example by majority vote, and a 1-bit codebook selection flag is output from the terminal 503.
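The second method, a per-subframe comparison followed by a majority vote, reduces to a few lines. The vote rule follows the "for example, by majority vote" in the text. Sending ties to CB_A is a hypothetical choice for frames with an even number of subframes.

```python
def select_codebook_majority(e_a, e_b):
    """Method 2: compare E_A(k) and E_B(k) per subframe, then take a majority
    vote over the frame. Returns the 1-bit flag: 0 = CB_A, 1 = CB_B.
    Ties go to CB_A (an arbitrary choice)."""
    votes_b = sum(1 for a, b in zip(e_a, e_b) if b < a)
    return 1 if votes_b > len(e_a) / 2 else 0
```

Unlike the first method, a single subframe with a very large distortion cannot dominate the decision here.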

[0165] This selection flag is transmitted as the S/M (speech/music) mode data mentioned above.

[0166] In this way, several signals with different characteristics can be quantized efficiently by a single quantizer.

[0167] Next, the frequency conversion performed by the FFT circuit 161, the frequency shift circuit 162 and the inverse FFT circuit 163 of FIG. 1 is described.

[0168] This frequency conversion consists of four steps:
(1) a band extraction step, which extracts at least one band of the input signal;
(2) an orthogonal transform step, which converts the signal of each extracted band into a signal on the frequency axis;
(3) a shift step, which shifts the transformed signal along the frequency axis to another position or band;
(4) an inverse orthogonal transform step, which converts the shifted signal back into a signal on the time axis.

[0169] FIG. 12 shows the configuration for this frequency conversion in more detail, and parts corresponding to those in FIG. 1 bear the same reference numerals. In FIG. 12, the input terminal 101 receives a wideband speech signal, for example one sampled at 16 kHz with components from 0 to 8 kHz. An LPF (low-pass filter) 102 separates the band from, for example, 0 to 3.8 kHz as the low-band signal. A subtracter 151 subtracts this low-band signal from the original wideband signal, and the difference is taken as the high-band signal. The low-band and high-band signals are then processed independently.

[0170] The high-band signal obtained here spans 4.5 kHz, from 3.5 kHz to 8 kHz, because some content below the cutoff still passes through the LPF 102. Since it is downsampled for processing, however, its bandwidth must be narrowed to 4 kHz. In this example, the 0.5 kHz band from 7.5 kHz to 8 kHz is removed by a BPF (band-pass filter) 107 or an LPF.

[0171] Next, an FFT (fast Fourier transform), for example, is used for the frequency conversion to the low band. Before this, a frame division circuit 108 divides the signal into blocks whose length is a power of 2, for example 512 samples, as shown in FIG. 13(A). To simplify later processing, however, successive blocks are advanced by only 80 samples.

[0172] Next, a Hamming windowing circuit 109 applies a Hamming window 320 samples long. The length 320 is four times 80: because the frames are advanced by 80 samples at frame division, a 320-sample window lets four waveforms overlap and add during the later overlap-add frame synthesis, as shown in FIG. 13(B).
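The choice of a 320-sample window with an 80-sample advance can be checked numerically. Four overlapped Hamming windows spaced a quarter-length apart sum to a constant, 4 × 0.54 = 2.16, because the cosine terms cancel. The sketch assumes the periodic form of the Hamming window, since the text does not say which form is used, and the names are hypothetical.

```python
import math

HOP, WIN = 80, 320  # 80-sample advance, 320-sample Hamming window (4 x 80)


def hamming(n_len):
    """Periodic Hamming window (assumed form): 0.54 - 0.46 cos(2*pi*n/N)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / n_len) for n in range(n_len)]


def ola_envelope(n_frames=8):
    """Overlap-add n_frames copies of the window at HOP spacing and return the sum."""
    w = hamming(WIN)
    total = [0.0] * (HOP * (n_frames - 1) + WIN)
    for f in range(n_frames):
        for n in range(WIN):
            total[f * HOP + n] += w[n]
    return total
```

Wherever four frames overlap, the envelope is flat at 2.16. Only this constant needs to be divided out after resynthesis.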

[0173] Next, the FFT circuit 161 applies an FFT to these 512-sample data, converting them into frequency-domain data.

[0174] Next, the frequency shift circuit 162 shifts the data along the frequency axis to another position or band. FIG. 14 shows how this shift allows the sampling frequency to be lowered. The high-band signal in the hatched area of (A) is moved to the low band as in (B), and the result is then downsampled as in (C). In the move from FIG. 14(A) to (B), the components mirrored about fs/2 move in the opposite direction. As a result, if the subband bandwidth is no more than fs/2n, the sampling frequency can be lowered to fs/n.

[0175] As shown in FIG. 15, the frequency shift circuit 162 only needs to move the hatched data, which correspond to the high band on the frequency axis, to the positions (band) corresponding to the low band. Specifically, consider the 512 frequency-domain values obtained by an FFT of 512 time-domain samples. The 127 values from the 113th to the 239th are moved to positions 1 to 127, and the 127 values from the 273rd to the 399th are moved to positions 385 to 511. It is important not to move the 112th value to position 0. Position 0 of a frequency-domain signal is the DC component, which has no phase component, so the value there must be real, and a frequency component, which is generally complex, cannot be placed there. The 256th value, which represents fs/2 (in general, the N/2-th value), is likewise invalid and is not used. Strictly speaking, therefore, the 0 to 4 kHz range here means 0 < f < 4 kHz.
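The bin mapping for a 512-point spectrum can be sketched as follows. Bins 113 to 239 move down by 112 bins (112 × 31.25 Hz = 3.5 kHz) to positions 1 to 127. Their mirror images at bins 273 to 399 move to 385 to 511, so the shifted spectrum stays conjugate-symmetric and its inverse FFT stays real. Bin 0 (DC) and bin 256 (fs/2) are left at zero, as the text requires. The function name and default arguments are hypothetical.

```python
def shift_down(spec, lo=113, hi=239, shift=112):
    """Move bins lo..hi of an N-point spectrum down by `shift` bins and mirror
    them into the negative-frequency half, preserving conjugate symmetry.
    DC (bin 0) and Nyquist (bin N/2) are left at zero."""
    n_len = len(spec)
    out = [0j] * n_len
    for m in range(lo, hi + 1):
        out[m - shift] = spec[m]                    # 113..239 -> 1..127
        out[n_len - (m - shift)] = spec[n_len - m]  # 273..399 -> 385..511
    return out
```

After the inverse FFT, the signal occupies only 0 < f < 4 kHz at 16 kHz sampling, so it can be decimated by 2.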

[0176] Next, the inverse FFT circuit 163 performs an inverse FFT, returning the frequency-domain data to a time-domain signal, so that a 512-sample time-domain block is obtained each time. An overlap-add circuit 166 overlaps these 512-sample blocks at successive 80-sample offsets, as shown in FIG. 13(B), and adds the overlapping portions.

[0177] The signal from the overlap-add circuit 166 is band-limited to 0 to 4 kHz at 16 kHz sampling, so a downsampling circuit 164 downsamples it. This yields the frequency-shifted signal as a 0 to 4 kHz signal at 8 kHz sampling. It is output through an output terminal 169 and sent to the LPC analysis/quantization unit 130 and the LPC inverse filter 171 of FIG. 1.

[0178] Next, the restoration on the decoding side can be implemented with the configuration shown in FIG. 16.

[0179] The configuration of FIG. 16 corresponds to the part of FIG. 2 from the upsampling circuit 233 onward, and corresponding parts bear the same reference numerals. However, whereas FIG. 2 performs the upsampling before the FFT, the example of FIG. 16 performs it after the FFT.

[0180] In FIG. 16, a terminal 241 receives a high-band signal that has been shifted to 0 to 4 kHz at 8 kHz sampling, such as the output of the high-band LPC synthesis filter 232 of FIG. 2.

[0181] A frame division circuit 242 divides this signal into frames 256 samples long, advanced by 80 samples each. The reason is the same as for the frame division on the encoding side, but because the sampling frequency is halved, the number of samples is also halved. Likewise, as on the encoding side but with half the number of samples, a Hamming windowing circuit 243 applies a 160-sample Hamming window to the output of the frame division circuit 242.

[0182] Next, an FFT circuit 234 applies a 256-sample FFT, converting the time-domain signal into a frequency-domain signal. The following upsampling circuit 244 applies zero padding, as shown in FIG. 15(B), which effectively increases the frame length from 256 samples to 512 samples. This corresponds to the conversion from FIG. 14(C) to (B).
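Upsampling by zero padding in the frequency domain can be sketched as follows. The lower half of the N-point spectrum stays at the bottom of a 2N-point spectrum, the upper (negative-frequency) half moves to the top, and the bins in between are zero. Doubling the bin values compensates for the 1/(2N) factor of the longer inverse transform. For self-containment, the sketch uses a naive O(N²) DFT rather than an FFT, and it drops the fs/2 bin, which the text treats as unused. The function names are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive DFT (O(N^2)); the inverse includes the 1/N factor."""
    n_len = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[n] * cmath.exp(sign * 2j * math.pi * k * n / n_len) for n in range(n_len))
           for k in range(n_len)]
    return [v / n_len for v in out] if inverse else out


def upsample_by_zero_padding(x):
    """Double the sampling rate of a real block by zero-padding its spectrum.
    Bins 0..N/2-1 stay at the bottom and bins N/2+1..N-1 move to the top of a
    2N-point spectrum; the Nyquist bin N/2 is dropped (treated as unused)."""
    n_len = len(x)
    spec = dft(x)
    big = [0j] * (2 * n_len)
    for k in range(n_len // 2):
        big[k] = 2 * spec[k]           # factor 2 offsets the 1/(2N) of the longer inverse
    for k in range(n_len // 2 + 1, n_len):
        big[k + n_len] = 2 * spec[k]
    return [v.real for v in dft(big, inverse=True)]
```

In the decoder, the +3.5 kHz shift is then applied to the 512-point spectrum before the inverse transform. That shift is the reverse of the encoder-side bin move.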

[0183] Next, a frequency shift circuit 235 shifts the data along the frequency axis to another position or band, applying a frequency shift of +3.5 kHz. This corresponds to the conversion from FIG. 14(B) to (A).

[0184] An inverse FFT circuit 236 applies an inverse FFT to the resulting frequency-domain signal, returning it to the time domain. The output of the inverse FFT circuit 236 occupies 3.5 kHz to 7.5 kHz at 16 kHz sampling.

[0185] The following overlap-add circuit 237 overlaps the 512-sample frames at 80-sample offsets and adds them, restoring a continuous time-domain signal. The high-band signal thus obtained is added to the low-band signal by the adder 228 and output from the output terminal 229.
For each frame of samples, 80 samples are overlapped and added together to restore a continuous time axis signal. The high frequency side signal thus obtained is added to the low frequency side signal by the adder 228 and is taken out from the output terminal 229.

[0186] In such frequency conversion, the specific numerical values are not limited to the above example, and more than one band may be shifted.

[0187]

For example, as shown in FIG. 17, suppose that with 16 kHz sampling the narrowband signal covers 300 Hz to 3.4 kHz and the wideband signal covers 0 to 7 kHz. The components not contained in the narrowband signal are then the low band from 0 to 300 Hz and the high band from 3.4 kHz to 7 kHz. If the high band is moved to 300 Hz to 3.9 kHz so that it adjoins the low band, the result is a signal from 0 to 3.9 kHz, and, as above, the sampling frequency fs can be halved to 8 kHz.
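The band arithmetic of the FIG. 17 example can be checked directly:

```latex
\begin{aligned}
\text{high-band width} &= 7.0 - 3.4 = 3.6\ \mathrm{kHz},\\
[3.4,\ 7.0]\ \mathrm{kHz} &\;\to\; [0.3,\ 0.3 + 3.6] = [0.3,\ 3.9]\ \mathrm{kHz},\\
f_s &\ge 2 \times 3.9 = 7.8\ \mathrm{kHz} \;<\; 8\ \mathrm{kHz}.
\end{aligned}
```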

[0188]

Generalizing, when a wideband signal is multiplexed with a narrowband signal that lies within it, the high-band component of what remains after subtracting the narrowband signal from the wideband signal is shifted toward the low band, which lowers the required sampling rate.

[0189]

In this way, subbands spanning any frequency range can be formed and moved, and the result can be processed at a sampling frequency equal to twice the sum of the subband widths, so the method adapts flexibly to different applications.
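The rule that the packed signal needs a sampling frequency of twice the summed subband widths can be written as a small helper. This is a hypothetical sketch (`pack_bands` does not appear in the source) that computes only the frequency mapping; it does not move any spectral data.

```python
def pack_bands(bands):
    """Map each (low_hz, high_hz) band onto contiguous slots starting at 0 Hz.

    Returns (mapping, fs_min): mapping lists (source_band, target_band) pairs
    in ascending order of source band, and fs_min is twice the total
    bandwidth, the theoretical minimum sampling frequency for the packed signal.
    """
    mapping = []
    edge = 0  # lower edge (Hz) of the next free slot
    for low, high in sorted(bands):
        width = high - low
        mapping.append(((low, high), (edge, edge + width)))
        edge += width
    return mapping, 2 * edge
```

Applied to the two residual bands of FIG. 17, `pack_bands([(0, 300), (3400, 7000)])` maps 3.4 kHz to 7 kHz onto 300 Hz to 3.9 kHz and returns a minimum sampling frequency of 7800 Hz.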

[0190]

A further advantage is that, when the quantization error is large at a low bit rate, the aliasing noise that would normally arise near the division frequency when a QMF (quadrature mirror filter) bank is used can be avoided with this frequency conversion method.

[0191]

The present invention is not limited to the embodiment described above. For example, although each part of the speech encoding (encoder-side) configuration of FIG. 1 and the speech decoding (decoder-side) configuration of FIG. 2 is described as hardware, these configurations can also be realized by a software program running on a so-called DSP (digital signal processor) or the like. Instead of vector quantization, the data of a plurality of frames may be grouped together and matrix-quantized. Furthermore, the speech encoding and decoding methods to which the present invention is applied are not limited to the encoding/decoding method described above; the invention can be applied to various speech encoding/decoding methods, and its uses are not limited to transmission and recording/playback but extend to pitch conversion, speed conversion, rule-based speech synthesis, noise suppression, and the like.

[0192]

Effects of the Invention: As is apparent from the above description, according to the speech coding method of the present invention, the input signal is band-divided, at least one of the divided high-band signals is frequency-converted to the low-band side, the sampling rate of the converted signal is lowered, and the signal with the lowered sampling rate is predictively coded. This improves coding efficiency and enables high-quality coding at a low bit rate.
In addition, because the low band and the high band are coded separately, wideband signal reproduction is possible.
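For the encoding direction summarized here and in claims 3 and 4, a hypothetical per-frame sketch: a 512-point FFT of a 16 kHz frame, a -3.5 kHz shift (112 bins) that moves 3.5 kHz to 7.5 kHz down to 0 to 4 kHz, and a 256-point inverse FFT that yields samples at 8 kHz. It assumes NumPy's real FFT and omits windowing, overlap-add, and the subsequent LPC (predictive) coding stage.

```python
import numpy as np

N_IN = 512          # frame length at 16 kHz
N_OUT = N_IN // 2   # frame length at 8 kHz
SHIFT_BINS = 112    # 3.5 kHz / 31.25 Hz per bin

def encode_high_band_frame(frame_16k: np.ndarray) -> np.ndarray:
    """Shift the 3.5-7.5 kHz band of a 16 kHz frame down to 0-4 kHz at 8 kHz."""
    spec = np.fft.rfft(frame_16k)                       # 257 bins, 31.25 Hz apart
    low = spec[SHIFT_BINS:SHIFT_BINS + N_OUT // 2 + 1]  # bins 112..240 -> 129 bins
    # A 256-point inverse FFT halves the sampling rate; 0.5 offsets the shorter length.
    return 0.5 * np.fft.irfft(low, n=N_OUT)
```

Usage: a 4.5 kHz cosine sampled at 16 kHz comes out as a 1 kHz cosine sampled at 8 kHz, which the predictive coder can then process at the lower rate.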

[Brief description of the drawings]

FIG. 1 is a block diagram showing the basic configuration of a speech signal encoding device to which an embodiment of the speech coding method according to the present invention is applied.

FIG. 2 is a block diagram showing the basic configuration of a speech signal decoding device.

FIG. 3 is a block diagram showing the configuration of another speech signal encoding device.

FIG. 4 is a diagram for explaining the scalability of the bit stream of transmitted encoded data.

FIG. 5 is a block diagram schematically showing the entire encoding-side system to which the present invention can be applied.

FIG. 6 is a diagram for explaining the periods and phase relationships of the main encoding and decoding operations.

FIG. 7 is a diagram showing an example configuration for vector quantization of MDCT (modified discrete cosine transform) coefficients.

FIG. 8 is a diagram showing an example of the window function applied to the post-filter output.

FIG. 9 is a diagram showing an example of a vector quantization device having two types of codebooks.

FIG. 10 is a diagram showing a specific example of a vector quantization device having two types of codebooks.

FIG. 11 is a diagram showing another specific example of a vector quantization device having two types of codebooks.

FIG. 12 is a block diagram showing the encoder-side configuration for frequency conversion.

FIG. 13 is a diagram for explaining frame division and overlap-add processing.

FIG. 14 is a diagram showing another example of frequency shifting on the frequency axis.

FIG. 15 is a diagram showing the shift processing of data on the frequency axis.

FIG. 16 is a block diagram showing the decoder-side configuration for frequency conversion.

FIG. 17 is a diagram showing another example of frequency shifting on the frequency axis.

[Explanation of symbols]

111, 171: LPC inverse filter
112, 122: pitch inverse filter
113, 123: MDCT (modified discrete cosine transform) circuit
114, 124: VQ (vector quantization) circuit
115, 125: pitch analysis circuit
116, 118, 126: pitch gain VQ circuit
130, 180: LPC analysis/quantization unit
161, 234: FFT (fast Fourier transform) circuit
162, 235: frequency shift circuit
163, 236: inverse FFT circuit

Continuation of the front page: (72) Inventor Kazuyuki Iijima, c/o Sony Corporation, 6-7-35 Kita-Shinagawa, Shinagawa-ku, Tokyo

Claims

1. A signal coding method comprising: a band division step of band-dividing an input signal; a frequency conversion step of frequency-converting at least one of the divided high-band signals to the low-band side; a sampling rate lowering step of lowering the sampling rate of the signal converted to the low-band side; and a step of predictively coding the signal whose sampling rate has been lowered.

2. The signal encoding method according to claim 1, wherein the band dividing step divides a wide band input signal into a telephone band and a band higher than the telephone band.

3. The signal coding method according to claim 1, wherein the frequency conversion step performs an orthogonal transform on at least one of the divided high-band signals, then performs frequency shift processing toward the low-band side, and further performs an inverse orthogonal transform.

4. The signal coding method according to claim 3, wherein the orthogonal transform is a fast Fourier transform.

5. The signal coding method according to claim 1, wherein the predictive coding step linearly predicts the signal having the lowered sampling rate and extracts quantized parameters representing the linear prediction coefficients obtained thereby, together with the prediction residual.

6. A signal coding apparatus comprising: band dividing means for band-dividing an input signal; frequency converting means for frequency-converting at least one of the divided high-band signals to the low-band side; sampling rate lowering means for lowering the sampling rate of the signal converted to the low-band side; and means for predictively coding the signal whose sampling rate has been lowered.