JPH05218980A

JPH05218980A - Voice coding communication system and its device

Info

Publication number: JPH05218980A
Application number: JP4056032A
Authority: JP
Inventors: Seiji Sasaki; 誠司佐々木
Original assignee: Kokusai Electric Corp
Current assignee: Kokusai Electric Corp
Priority date: 1992-02-07
Filing date: 1992-02-07
Publication date: 1993-08-27

Abstract

PURPOSE: To improve reproduced sound quality by reducing the distortion, especially in the high-frequency region, that accompanies half-rate operation of the transmission line in an analysis-synthesis voice coding communication system. CONSTITUTION: The DCT coefficients obtained by transforming a long-term prediction residual signal into the frequency domain with a discrete cosine transformer 72 are divided into N frequency regions by a divider 73. Each region is normalized by normalizers 74, 75 and vector-quantized by vector quantizers 76, 78 using a codebook 77. Pitch information Pa3, the DCT coefficient maximum values Pb3 to Pc3, and the DCT coefficient vector numbers Pd3 to Pe3 are then coded and sent.

Description

Detailed Description of the Invention

[0001]

[Industrial Application] Car phones and mobile phones are spreading rapidly, and the current analog system is expected to become unable to accommodate the growing number of subscribers. To use the radio spectrum more efficiently, a migration to a digital system is being planned, and the first-generation (full-rate) standard specification has been published by the RCR (Research and Development Center for Radio Systems). The coding rate of the speech coding scheme in that specification is 11.2 kbps (kilobits per second), covering speech data and redundant data for error correction. Half-rate speech coding is planned with the aim of doubling spectrum efficiency again; its coding rate is 5.6 kbps, again covering speech data and error-correction redundancy. The present invention relates to a voice coding communication system, and a device therefor, that compresses speech data and is intended for use in this half-rate system.

[0002]

[Prior Art] FIG. 9 is a block diagram of a conventional adaptive transform coding communication system that uses pitch prediction, in which (A) is the speech encoding device and (B) is the speech decoding device. This system compresses speech data from, for example, 64 kbps (6.4 kHz sampling, 10-bit quantization) to 4.5 kbps. When it is applied to the half-rate system, the speech data occupies 4.5 kbps, so 1.1 kbps is allocated to error-correction redundant bits.

[0003] As an example, a method of compressing speech data from 64 kbps to 4.5 kbps is described below. In FIG. 9(A), the input speech signal a (64 kbps), sampled at 6.4 kHz and quantized to 10 bits, is fed to the long-term prediction analyzer 11, which extracts and outputs pitch information Pa for each frame (30 msec: 192 samples) and also outputs the long-term prediction residual signal b, that is, the signal with the pitch component removed. The long-term prediction residual signal b is divided into subframes (15 msec: 96 samples) and then transformed into the frequency domain by the discrete cosine transform (DCT) unit 12, which outputs DCT coefficients c (96 samples per subframe). The DCT formula is given later. The DCT coefficients c are decimated subframe by subframe by the adaptive decimator 13, which compresses the information. Because the amplitude of each DCT coefficient varies from subframe to subframe, the decimator adapts to it: it selects a limited number of the largest-amplitude DCT coefficients and sets the remaining small-amplitude coefficients to 0. The amplitude and position information of the selected coefficients is output as DCT information Pb. The pitch information Pa and the DCT information Pb are converted into a digital signal sequence d by the encoder 14, multiplexed, and sent to the receiving side.
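The adaptive decimation step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the orthonormal DCT-II form, the function names `dct2` and `adaptive_decimate`, the parameter `keep`, and the toy subframe are all assumptions, since the source defers its own DCT formula and does not state how many coefficients are kept.

```python
import math

def dct2(x):
    """Orthonormal DCT-II (assumed form; the source gives its own formula elsewhere)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out

def adaptive_decimate(coeffs, keep):
    """Return (position, amplitude) pairs for the `keep` largest-magnitude coefficients."""
    ranked = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    return sorted((k, coeffs[k]) for k in ranked[:keep])

# Toy 96-sample subframe built from two DCT basis functions (k = 5 and k = 20).
subframe = [math.cos(math.pi * (2 * n + 1) * 5 / 192)
            + 0.3 * math.cos(math.pi * (2 * n + 1) * 20 / 192) for n in range(96)]

# Only these pairs would be transmitted; every other coefficient is treated as 0.
dct_info = adaptive_decimate(dct2(subframe), keep=2)
print(dct_info)
```

Because the selection is redone for every subframe, the set of transmitted positions changes with the signal, which is why the positions have to be sent along with the amplitudes.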

[0004] The receiving side receives the digital signal sequence e, and the separation circuit 21 separates it into pitch information Pc and DCT information Pd. The adaptive decimation decoder 22 uses the DCT coefficient amplitude and position information in the DCT information Pd to restore the transmitted DCT coefficients, and fills the positions of the coefficients that were not transmitted with 0. The restored DCT coefficients f are transformed into the time domain by the inverse discrete cosine transform (IDCT) unit 23 to reproduce the long-term prediction residual signal g. The long-term prediction synthesizer 24 adds the pitch information Pc to the long-term prediction residual signal g to decode and reproduce the speech signal h. Table 1 shows an example of the bit allocation per frame (30 msec) of the conventional codec. A 5-bit synchronization word is inserted at the head of each frame for frame synchronization. Converted to a per-second rate, the total in Table 1 is 135 bits / 30 msec = 4.5 kbps.

[Table 1]

In mobile communication systems such as mobile phones and car phones, transmission channel conditions are far harsher than in wired or fixed communication systems: the bit error rate is normally 0.1% to 1%, and it is not rare for it to reach about 10%. A half-rate speech coding scheme therefore needs a powerful error-correction capability, and about 35% (approximately 2 kbps) or more of the total coding rate (5.6 kbps) must be allocated to error-correction redundant bits. Consequently, to be usable as a half-rate speech coding scheme, the speech data must be coded at about 3.6 kbps or less while delivering high-quality reproduced speech (equivalent to 6-bit log-PCM or better).
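The rate budget behind these figures can be checked with a few lines of integer arithmetic. The variable names are illustrative; the numbers are the ones quoted in the text (5.6 kbps channel, 35% minimum redundancy, 135 bits per 30 msec frame for the conventional coder).

```python
channel_bps = 5600                 # half-rate total: speech data plus error-correction data
min_fec_percent = 35               # minimum redundancy share stated in the text

fec_bps = channel_bps * min_fec_percent // 100
speech_budget_bps = channel_bps - fec_bps

# Conventional coder: 135 bits per 30 msec frame.
conventional_speech_bps = 135 * 1000 // 30
conventional_fec_bps = channel_bps - conventional_speech_bps

print(fec_bps, speech_budget_bps)
print(conventional_speech_bps, conventional_fec_bps)
```

The conventional coder leaves only 1.1 kbps for redundancy, roughly 20% of the channel, which is why the text asks for a speech rate of about 3.6 kbps or less.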

[0005] With the above system, the reproduced speech quality at a speech coding rate of 4.6 kbps is only equivalent to 4-bit log-PCM. If the speech coding rate is lowered further to 3.6 kbps or less, fewer DCT coefficients can be transmitted and distortion in the frequency domain grows, so the reproduced speech quality degrades even more. In other words, the conventional system cannot lower the coding rate to 3.6 kbps or less while keeping the reproduced speech quality equivalent to 6-bit log-PCM, and so cannot satisfy the performance (quality and error-correction capability) required of a half-rate speech coding scheme. To remedy this problem, the present inventor previously proposed a speech coding method and device that introduce vector quantization into the above method and are applicable to the half-rate system (speech coding rate of 3.6 kbps or less with reproduced speech quality equivalent to 6-bit log-PCM or better); see Japanese Patent Application No. Hei 3-329782. That proposal replaces the adaptive decimator of the above method with a vector quantizer, lowering the coding rate while keeping the reproduced speech quality at or above that of the conventional system. FIG. 10 is a block diagram of a voice coding communication system showing an embodiment of it, in which (A) is the speech encoding device and (B) is the speech decoding device.

[0006] In FIG. 10(A), the input speech signal a1 (64 kbps), sampled at 6.4 kHz and quantized to 10 bits, is fed to the long-term prediction analyzer 31, which extracts and outputs pitch information Pa1 for each frame (30 msec: 192 samples) and generates and outputs the long-term prediction residual signal b1, that is, the input signal a1 with the pitch component removed. The discrete cosine transformer 32 transforms b1 into the frequency domain for each subframe (15 msec: 96 samples) and outputs its frequency components, the DCT coefficients c1 (96 coefficients). The discrete cosine transform is described later. The normalizer 33 normalizes the DCT coefficients c1 by their maximum value, yielding the DCT coefficient maximum value Pb1 and the normalized DCT coefficients d1. The normalized DCT coefficients d1 are then vector-quantized by the vector quantizer 34 and the codebook 35. The codebook 35 stores, for example, 512 vector patterns. The vector quantizer 34 compares the unknown input vector, the DCT coefficients d1, with the vectors in the codebook 35, selects the vector at the smallest distance, and outputs its vector number Pc1. The vector number Pc1 is quantized with 9 bits. The encoder 36 codes the vector number Pc1, together with the DCT coefficient maximum value Pb1 and the pitch information Pa1, into the form of a digital signal sequence e1, which is multiplexed and sent to the transmission line.
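The normalize-then-search step can be sketched like this. It is an illustration under stated assumptions: the text only says "maximum value" and "distance", so normalizing by the largest magnitude and using squared Euclidean distance are assumptions, and the tiny 4-entry codebook and the function names are hypothetical stand-ins for the 512-entry, 96-dimensional codebook 35.

```python
def normalize(coeffs):
    """Split a vector into its peak magnitude and the peak-normalized shape."""
    peak = max(abs(c) for c in coeffs)
    if peak == 0.0:
        return 0.0, list(coeffs)
    return peak, [c / peak for c in coeffs]

def nearest_vector(shape, codebook):
    """Index of the codebook vector with the smallest squared Euclidean distance."""
    def distance(candidate):
        return sum((a - b) ** 2 for a, b in zip(shape, candidate))
    return min(range(len(codebook)), key=lambda i: distance(codebook[i]))

# Hypothetical 4-entry codebook of dimension 4.
codebook = [
    [1.0, 0.5, 0.0, 0.0],
    [0.0, 1.0, 0.5, 0.0],
    [1.0, -1.0, 0.5, -0.5],
    [0.2, 0.2, 0.2, 1.0],
]

peak, shape = normalize([8.0, 3.5, 0.4, -0.2])   # peak plays the role of Pb1
index = nearest_vector(shape, codebook)          # index plays the role of Pc1
index_bits = (512 - 1).bit_length()              # bits needed to address 512 vectors
print(peak, index, index_bits)
```

Only the peak and the index are transmitted, which is where the rate saving over sending individual amplitudes and positions comes from.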

[0007] The digital signal sequence f1 received via the transmission line is separated by the separation circuit 41 into the DCT vector number Pd1, the DCT maximum value information Pe1, and the pitch information Pf1. The DCT vector number Pd1 is inverse vector-quantized by the inverse vector quantizer 42 and the codebook 43 to reproduce the normalized DCT coefficients g1. The codebook 43 has the same contents as the codebook 35 of the encoding device, so specifying the vector number Pd1 yields the same vector that was selected on the encoding side. The inverse quantizer 44 multiplies the normalized DCT coefficients g1 by the DCT maximum value Pe1 to reproduce the DCT coefficients h1. The inverse discrete cosine transformer 45 transforms the DCT coefficients h1 into the time domain to reproduce the long-term prediction residual signal i1. The long-term prediction synthesizer 46 adds the pitch information Pf1 to the long-term prediction residual signal i1 to decode and reproduce the speech signal j1. Table 2 shows the bit allocation of this system. With this bit allocation the coding rate can be reduced to about 1.4 kbps (43 bits / 30 msec), and by choosing the contents and number of the representative vectors appropriately, reproduced speech quality equivalent to 6-bit log-PCM is expected.
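On the receiving side the same idea runs in reverse: a table lookup followed by a multiplication. The sketch below assumes a hypothetical two-entry codebook and a hypothetical function name; it also recomputes the quoted rate from the 43 bits per 30 msec frame.

```python
def decode_subframe(index, peak, codebook):
    """Look up the shape vector and scale it back by the transmitted maximum value."""
    return [peak * v for v in codebook[index]]

# Hypothetical codebook; it must hold the same contents as the encoder's codebook.
codebook = [[1.0, 0.5, 0.0, 0.0], [0.0, 1.0, 0.5, 0.0]]
coeffs = decode_subframe(index=1, peak=4.0, codebook=codebook)

bits_per_frame = 43                       # total quoted in the text for Table 2
rate_bps = bits_per_frame * 1000 / 30     # 30 msec frames
print(coeffs, round(rate_bps))
```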

[Table 2]

[0008]

[Problems to Be Solved by the Invention] In the configuration previously proposed by the present inventor, introducing vector quantization keeps the speech coding rate low, but the reproduced speech quality is still not maintained well enough. The cause is that, during vector quantization, vectors are selected with the emphasis on the low-frequency region (about 2 kHz and below), where power is concentrated, so quantization distortion becomes large in the low-power high-frequency region (about 2 kHz and above). The high-frequency region of the reproduced speech is distorted and its quality degrades. It is therefore difficult to obtain the quality required of a half-rate system, equivalent to 6-bit log-PCM or better. The object of the present invention is to further remedy this problem of the prior application and to provide a voice coding communication system, and a device therefor, applicable to a half-rate system (speech coding rate of 3.6 kbps or less with reproduced speech quality equivalent to 6-bit log-PCM or better).

[0009]

[Means for Solving the Problems] To solve the problem of the earlier proposal, the present invention divides the DCT coefficients into a plurality (N, where N is an integer of 2 or more) of equally spaced frequency regions and vector-quantizes them separately, thereby reducing quantization distortion in the high-frequency region and improving reproduced speech quality. Specifically, the invention starts from a voice coding communication system in which, on the transmitting side, the input speech signal is subjected to long-term prediction analysis to obtain pitch information and a long-term prediction residual signal; the long-term prediction residual signal is transformed into the frequency domain by a discrete cosine transform; the DCT coefficient maximum value of the resulting DCT coefficients is output, and the DCT coefficients normalized by that maximum value are vector-quantized using a codebook; and the DCT vector number of the approximating pattern read from the codebook, the pitch information, and the DCT coefficient maximum value are coded into the form of a digital signal sequence and sent to the transmission line; and in which, on the receiving side, the digital signal sequence from the transmission line is separated to extract the pitch information, the DCT coefficient maximum value, and the DCT vector number; the vector corresponding to the DCT vector number is read from a codebook and inverse-quantized to obtain the normalized DCT coefficients, which are denormalized by the DCT coefficient maximum value to reproduce the DCT coefficients; the long-term prediction residual signal is reproduced by an inverse discrete cosine transform; and the pitch information is added to the long-term prediction residual signal to decode and reproduce the speech signal. The invention is characterized in that, on the transmitting side, the DCT coefficients obtained by the discrete cosine transform are divided into a plurality of equally spaced frequency regions, the DCT coefficient maximum value of each divided frequency region is output, the DCT coefficients of each region, normalized by the respective maximum value, are vector-quantized using respective codebooks, and the DCT vector numbers of the approximating patterns read from the respective codebooks, the pitch information, and the plurality of DCT coefficient maximum values are coded into the form of a digital signal sequence and sent to the transmission line; and in that, on the receiving side, the digital signal sequence from the transmission line is separated to extract the pitch information, the plurality of DCT coefficient maximum values, and the plurality of DCT vector numbers, the vectors corresponding to the plurality of DCT vector numbers are read from the respective codebooks and inverse-quantized, and the normalized DCT coefficients are reproduced and combined, after which the long-term prediction residual signal is reproduced by the inverse discrete cosine transform.

[0010] Further, the invention starts from the same voice coding communication system, in which, on the transmitting side, the input speech signal is subjected to long-term prediction analysis to obtain pitch information and a long-term prediction residual signal; the long-term prediction residual signal is transformed into the frequency domain by a discrete cosine transform; the DCT coefficient maximum value of the resulting DCT coefficients is output, and the DCT coefficients normalized by that maximum value are vector-quantized using a codebook; and the DCT vector number of the approximating pattern read from the codebook, the pitch information, and the DCT coefficient maximum value are coded into the form of a digital signal sequence and sent to the transmission line; and in which, on the receiving side, the digital signal sequence from the transmission line is separated to extract the pitch information, the DCT coefficient maximum value, and the DCT vector number; the vector corresponding to the DCT vector number is read from a codebook and inverse-quantized to obtain the normalized DCT coefficients, which are denormalized by the DCT coefficient maximum value to reproduce the DCT coefficients; the long-term prediction residual signal is reproduced by an inverse discrete cosine transform; and the pitch information is added to the long-term prediction residual signal to decode and reproduce the speech signal. In this form the invention is characterized in that, on the transmitting side, the DCT coefficients obtained by the discrete cosine transform are divided into a plurality of equally spaced frequency regions; for the DCT coefficients of the odd-numbered frequency regions, the DCT coefficient maximum value of each region is output, the DCT coefficients normalized by the respective maximum value are vector-quantized using a single codebook, and the DCT vector numbers of the approximating patterns are read from that single codebook; for the DCT coefficients of the even-numbered frequency regions, the DCT coefficients of each region are first frequency-inverted, after which the DCT coefficient maximum value of each region is output, the DCT coefficients normalized by the respective maximum value are vector-quantized sharing the same single codebook, and the DCT vector numbers of the approximating patterns are read from that single codebook; and the pitch information, the plurality of DCT coefficient maximum values, and the plurality of DCT vector numbers are coded into the form of a digital signal sequence and sent to the transmission line; and in that, on the receiving side, the digital signal sequence from the transmission line is separated to extract the pitch information, the plurality of DCT coefficient maximum values, and the plurality of DCT vector numbers; the vectors corresponding to the plurality of DCT vector numbers are read from a single codebook having the same contents as the single codebook on the transmitting side and inverse-quantized to reproduce the respective normalized DCT coefficients; and the DCT coefficients of the odd-numbered frequency regions are combined with the frequency-inverted DCT coefficients of the even-numbered frequency regions, after which the long-term prediction residual signal is reproduced by the inverse discrete cosine transform.

[0011]

[Embodiments] The description here assumes N = 2: to reduce distortion in the high-frequency region (2 kHz and above), the spectrum is divided equally into a low-frequency region and a high-frequency region, which are vector-quantized separately. FIG. 1 is a block diagram of the speech encoding device of the first embodiment of the present invention, and FIG. 2 is a block diagram of the speech decoding device of the first embodiment. In FIG. 1, the input speech signal a3 (64 kbps), sampled at 6.4 kHz and quantized to 10 bits, is fed to the long-term prediction analyzer 71, which extracts and outputs pitch information Pa3 for each frame (10 msec: 64 samples) and generates and outputs the long-term prediction residual signal b3, that is, the input signal a3 with the pitch component removed. The discrete cosine transformer 72 transforms b3 into the frequency domain for each frame (10 msec: 64 samples) and outputs its frequency components, the DCT coefficients c3 (64 coefficients). The discrete cosine transform is described later. The DCT coefficient divider 73 divides the DCT coefficients c3 into N (for example, N = 2) equal frequency regions: the DCT coefficients d3 (32 coefficients) of the low-frequency region (0 to 1.6 kHz) and the DCT coefficients e3 (32 coefficients) of the high-frequency region (1.6 to 3.2 kHz). d3 and e3 are normalized by their respective maximum values in the #1 normalizer 74 and the #N normalizer (N = 2) 75, which output the DCT coefficient maximum values Pb3, Pc3 and the normalized DCT coefficients f3, g3. f3 and g3 are vector-quantized by the #1 vector quantizer 76 with the #1 codebook 77 and by the #N vector quantizer (N = 2) 78 with the #N codebook (N = 2) 79, respectively, which output the vector numbers Pd3, Pe3. The encoder 80 puts these, together with the DCT coefficient maximum values Pb3, Pc3 and the pitch information Pa3, into the form of a digital signal sequence, which is multiplexed and sent to the transmission line as the output signal h3.
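A small sketch of the split-band encoder for N = 2 may make the benefit concrete. The frame is shortened to 8 coefficients instead of 64, both codebooks are hypothetical two-entry tables, and the squared Euclidean distance and peak-magnitude normalization are assumptions rather than details taken from the source.

```python
def split_bands(coeffs, n_bands):
    """Divide a coefficient vector into n_bands equal, contiguous frequency regions."""
    size = len(coeffs) // n_bands
    return [coeffs[b * size:(b + 1) * size] for b in range(n_bands)]

def quantize_band(band, codebook):
    """Normalize by the band's own peak (1.0 if the band is all zero), then search."""
    peak = max(abs(c) for c in band) or 1.0
    shape = [c / peak for c in band]
    index = min(range(len(codebook)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(shape, codebook[i])))
    return peak, index

# Toy frame with a strong low band and a weak high band.
frame = [9.0, 4.5, 0.0, 0.0, 0.0, 0.1, 0.05, 0.0]
codebooks = [
    [[1.0, 0.5, 0.0, 0.0], [0.0, 0.0, 1.0, 0.5]],   # hypothetical #1 codebook (low band)
    [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.5, 0.0]],   # hypothetical #N codebook (high band)
]

# One (maximum value, vector number) pair per band.
encoded = [quantize_band(band, cb) for band, cb in zip(split_bands(frame, 2), codebooks)]
print(encoded)
```

Because the high band is normalized by its own maximum (0.1 here) rather than by the frame-wide maximum (9.0), its shape is matched at full scale instead of being treated as near-zero.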

[0012] Next, as shown in FIG. 2, the digital signal sequence i3 received via the transmission line is separated by the separation circuit 91 into the DCT vector numbers Pf3, Pg3, the DCT coefficient maximum values Ph3, Pi3, and the pitch information Pj3. The DCT vector numbers Pf3 and Pg3 are inverse vector-quantized by the #1 inverse vector quantizer 92 with the #1 codebook 93 and by the #N inverse vector quantizer (N = 2) 94 with the #N codebook (N = 2) 95, respectively, reproducing the normalized DCT coefficients j3 of the 0 to 1.6 kHz band and k3 of the 1.6 to 3.2 kHz band. The #1 codebook 93 and the #N codebook 95 have the same contents as the #1 codebook 77 and the #N codebook 79 of the encoding device, respectively. j3 and k3 are denormalized by the #1 inverse normalizer 96 using the DCT coefficient maximum value Ph3 and by the #N inverse normalizer 97 using the DCT coefficient maximum value Pi3, respectively, reproducing the DCT coefficients m3, n3 of each frequency band. m3 and n3 are combined by the combiner 98 to reproduce the DCT coefficients q3. The inverse discrete cosine transform (IDCT) unit 99 transforms the DCT coefficients q3 into the time domain to reproduce the long-term prediction residual signal r3. The long-term prediction synthesizer 100 adds the pitch information Pj3 to the long-term prediction residual signal r3 to decode and reproduce the speech signal s3.
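The decoder side of the split-band scheme reduces to a per-band lookup, a per-band scaling, and a concatenation. The sketch below uses hypothetical two-entry codebooks of dimension 4 and a hypothetical function name; the real system uses 32-dimensional vectors per band.

```python
def decode_frame(vector_numbers, max_values, codebooks):
    """Look up each band's shape, scale it by that band's maximum, and concatenate."""
    frame = []
    for index, peak, codebook in zip(vector_numbers, max_values, codebooks):
        frame.extend(peak * v for v in codebook[index])
    return frame

codebooks = [
    [[1.0, 0.5, 0.0, 0.0], [0.0, 0.0, 1.0, 0.5]],   # hypothetical #1 codebook (low band)
    [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.5, 0.0]],   # hypothetical #N codebook (high band)
]

# Vector numbers and maximum values as they would come out of the separation circuit.
restored = decode_frame([0, 1], [9.0, 0.1], codebooks)
print(restored)
```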

[0013] Table 3 shows the bit allocation of this system. With this bit allocation the speech coding rate is about 3.4 kbps (34 bits / 10 msec), distortion in the high-frequency region is reduced, and reproduced speech quality equivalent to 6-bit log-PCM is obtained. For this, however, the codebooks must be made sufficiently large: for example, (vector dimension 32 × size 256) × 2 = 16 kwords, which is 32 kbytes assuming 1 word = 2 bytes.
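The rate and memory figures follow directly from the quoted numbers. In the short check below the variable names are illustrative, and the 2-byte word is the assumption stated in the text.

```python
bits_per_frame = 34
frame_ms = 10
rate_bps = bits_per_frame * 1000 // frame_ms     # speech coding rate in bits per second

dimension, size, n_codebooks = 32, 256, 2
words = dimension * size * n_codebooks           # total codebook entries in words
bytes_per_word = 2                               # assumption stated in the text
total_bytes = words * bytes_per_word

index_bits = (size - 1).bit_length()             # bits needed to index one 256-entry codebook
print(rate_bps, words, total_bytes, index_bits)
```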

[Table 3]

[0014] Next, a second embodiment of the present invention is described. The second embodiment reduces the number of codebooks, which is N in the first embodiment, to 1/N of that (a single codebook), and also reduces the codebook memory capacity to 1/N (N = 2). To achieve this, the principle of downsampling and upsampling illustrated in FIGS. 3 and 4 is applied. In FIGS. 3 and 4, fs denotes the sampling frequency. FIG. 3(a) is a signal in the frequency domain; it is divided into N (for example, N = 2) parts, namely the low-frequency region (c) and the high-frequency region (e). (b) and (d) in FIG. 3 are the time waveforms corresponding to (c) and (e), respectively. Decimating the time waveforms (b) and (d) to 1/N (N = 2), that is, downsampling to 1/N of the sampling frequency, gives (f) and (h), whose spectra are (g) and (i). Looking at (i), the spectrum of (e) has been frequency-inverted and moved into the same frequency band as (c) or (g). Because this conversion makes the frequency bands equal, (i) can be vector-quantized with the same codebook that is used to vector-quantize (g). The memory capacity required for the codebook is therefore reduced to 1/2, since the vector dimension is halved.
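The frequency inversion caused by 2:1 decimation can be verified numerically with pure tones. This is a sketch under simple assumptions: fs = 6.4 kHz as used throughout the text, a tone that already lies entirely in the upper band (so no band-splitting filter is modeled), and hypothetical helper names.

```python
import math

fs = 6400.0   # sampling frequency used throughout the text

def alias_after_halving(f_hz):
    """Where a tone in the upper band (fs/4 to fs/2) appears after 2:1 decimation."""
    return fs / 2 - f_hz

def decimated_tone(f_hz, count):
    """Every second sample of cos(2*pi*f*n/fs)."""
    return [math.cos(2 * math.pi * f_hz * (2 * m) / fs) for m in range(count)]

def tone(f_hz, rate_hz, count):
    """A cosine of frequency f_hz sampled at rate_hz."""
    return [math.cos(2 * math.pi * f_hz * m / rate_hz) for m in range(count)]

results = []
for f in (2000.0, 3000.0):
    folded = alias_after_halving(f)
    same = all(abs(a - b) < 1e-9
               for a, b in zip(decimated_tone(f, 32), tone(folded, fs / 2, 32)))
    results.append((f, folded, same))
print(results)
```

The higher of the two input tones lands at the lower output frequency, which is the mirror-image behavior the text describes for spectrum (i).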

[0015] Next, to reproduce the spectrum of (a), upsampling (inserting zeros at the sample points that were thinned out) is applied to (f) and (h) to obtain the time waveforms (j) and (m) shown in FIG. 4. The corresponding spectra are (k) and (n), respectively. Passing (k) and (n) through the low-pass filter and the high-pass filter shown in FIG. 4, respectively, yields (q) and (s), which reproduce (c) and (e) of FIG. 3. Combining (q) and (s) gives (t), which reproduces (a). The case N > 2 can be treated in the same way: after division at equal intervals, the even-numbered bands counted from the low-frequency side are frequency-inverted and the odd-numbered bands are used as they are, so that each band can be vector-quantized with one shared codebook. Accordingly, the memory capacity required for the one codebook can be reduced to 1/N, since the number of vector dimensions becomes 1/N.
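The band arrangement for general N can be sketched as follows. This is an illustration under assumptions, not the patent's implementation: the function names `split_bands` and `merge_bands` are hypothetical, and frequency inversion is assumed to mean reversing the order of the coefficients within a band.

```python
def split_bands(coeffs, n_bands):
    """Split coefficients into n_bands equal bands; reverse even-numbered ones.

    Bands are numbered from 1 starting at the low-frequency side, so bands
    2, 4, ... are frequency-inverted and bands 1, 3, ... are left as they are.
    """
    if len(coeffs) % n_bands != 0:
        raise ValueError("coefficient count must be divisible by n_bands")
    width = len(coeffs) // n_bands
    bands = []
    for i in range(n_bands):
        band = list(coeffs[i * width:(i + 1) * width])
        if (i + 1) % 2 == 0:      # even-numbered band: frequency inversion
            band.reverse()
        bands.append(band)
    return bands

def merge_bands(bands):
    """Undo split_bands: re-invert even-numbered bands and concatenate."""
    merged = []
    for i, band in enumerate(bands):
        restored = list(band)
        if (i + 1) % 2 == 0:
            restored.reverse()
        merged.extend(restored)
    return merged

coeffs = list(range(64))          # stand-in for 64 coefficients of one frame
bands = split_bands(coeffs, 4)    # N = 4: four bands of 16 coefficients each
```

Each band has 1/N of the original dimension, so a single codebook of that reduced dimension can be applied to every band.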

[0016] FIGS. 5 to 8 show an embodiment in which this principle is realized using DCT coefficients. FIG. 5 is a configuration example of the second embodiment of the speech coding apparatus, FIG. 6 is a configuration example of the second embodiment of the speech decoding apparatus, and FIGS. 7 and 8 show the spectra at each stage of the processing of FIGS. 5 and 6. In the coding apparatus of FIG. 5, the input speech signal a2 (64 kbps), sampled at 6.4 kHz and quantized to 10 bits, is supplied to the long-term prediction analyzer 51, which extracts and outputs pitch information Pa2 for each frame (10 msec: 64 samples) and also generates and outputs the long-term prediction residual signal b2, which is the input signal a2 with the pitch component removed. The discrete cosine transform (DCT) unit 52 transforms b2 into the frequency domain for each frame (10 msec: 64 samples) and outputs the frequency components as DCT coefficients c2 (64 coefficients) (FIG. 7(a)). The DCT coefficient divider 53 divides c2 into N (for example, N = 2) equal parts: the DCT coefficients d2 (32 coefficients) of the low frequency band (0 to 1.6 kHz) and the DCT coefficients e2 (32 coefficients) of the high frequency band (1.6 to 3.2 kHz). d2 (FIG. 7(b)) is input as it is to the #1 normalizer 54, while e2 is frequency-inverted by the frequency inverter 55 to become f2 (FIG. 7(c)). d2 and f2 are normalized by their respective maximum DCT coefficient values in the #1 normalizer 54 and the #N normalizer (N = 2) 56, which output the maximum values Pb2 and Pc2 and the normalized DCT coefficients g2 and h2. g2 and h2 are vector-quantized by the #1 vector quantizer 57 and the #N vector quantizer (N = 2) 59, respectively, using the single codebook 58 that the two quantizers share, and the vector numbers Pd2 and Pe2 are output. The encoder 60 converts Pd2 and Pe2, together with the maximum values Pb2 and Pc2 and the pitch information Pa2, into a digital signal sequence, multiplexes them, and sends the result to the transmission line as output signal i2.
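The per-band processing of the coding apparatus (division, frequency inversion, normalization, and vector quantization with one shared codebook) can be sketched for N = 2 as below. Several points are assumptions made for illustration: the function names, the toy 4-dimensional codebook, normalization by the largest magnitude, a squared-error search, and frequency inversion as reversal of coefficient order. The long-term prediction and DCT stages are omitted.

```python
def normalize(band):
    """Return (peak, band scaled so that its largest magnitude is 1)."""
    peak = max(abs(c) for c in band)
    if peak == 0:
        return 0.0, [0.0] * len(band)
    return peak, [c / peak for c in band]

def quantize(vector, codebook):
    """Return the index of the codebook entry with the smallest squared error."""
    def sq_err(entry):
        return sum((v - e) ** 2 for v, e in zip(vector, entry))
    return min(range(len(codebook)), key=lambda i: sq_err(codebook[i]))

def encode_frame(coeffs, codebook):
    """Encode one frame of DCT coefficients with N = 2 and one shared codebook."""
    half = len(coeffs) // 2
    low = list(coeffs[:half])            # d2: low band, used as is
    high = list(coeffs[half:])[::-1]     # f2: high band after frequency inversion
    peak_low, g2 = normalize(low)        # Pb2 and g2
    peak_high, h2 = normalize(high)      # Pc2 and h2
    return {
        "peaks": (peak_low, peak_high),
        "indices": (quantize(g2, codebook), quantize(h2, codebook)),
    }

# Toy 4-dimensional codebook with three entries (hypothetical values).
codebook = [
    [1.0, 0.5, 0.25, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
]
frame = [8.0, 4.0, 2.0, 0.0, 0.0, 0.5, 1.0, 2.0]
encoded = encode_frame(frame, codebook)
```

In this toy frame the inverted and normalized high band has the same shape as the normalized low band, so both bands select the same codebook entry even though their peaks differ by a factor of four.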

[0017] Next, in the decoding apparatus of FIG. 6, the separation circuit 61 separates the digital signal sequence j2 received via the transmission line and extracts the DCT vector numbers Pf2 and Pg2, the DCT maximum values Ph2 and Pi2, and the pitch information Pj2. The DCT vector numbers Pf2 and Pg2 are inversely vector-quantized by the #1 vector dequantizer 62 and the #N vector dequantizer (N = 2) 64, respectively, using the single codebook 63 that the two dequantizers share, to reproduce the normalized DCT coefficients k2 of 0 to 1.6 kHz and m2 of 1.6 to 3.2 kHz. The codebook 63 has the same contents as the codebook 58 of the coding apparatus. k2 and m2 are inversely normalized by the #1 denormalizer 65 with the maximum value Ph2 and by the #N denormalizer (N = 2) 66 with the maximum value Pi2, respectively, to reproduce the DCT coefficients n2 and p2 of each frequency band. n2 (FIG. 8(d)) is input to the synthesizer 68 as it is, while p2 is input after being frequency-inverted by the frequency inverter 67 to become q2 (FIG. 8(e)). The synthesizer 68 combines the low-frequency component n2 and the high-frequency component q2 to reproduce the DCT coefficients r2 (FIG. 8(f)). The inverse discrete cosine transformer 69 transforms r2 into the time domain to reproduce the long-term prediction residual signal s2. The long-term prediction synthesizer 70 adds the pitch information Pj2 to s2 to decode and reproduce the speech signal t2. With this method, compared with the first embodiment, only one codebook is needed on each of the transmitting and receiving sides, and the memory capacity required for the codebook becomes 1/N (N = 2). FIG. 11 is a configuration example of the second embodiment shown in FIG. 5 with N = 4. The reference numerals are the same as in FIG. 5.
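The decoding steps for N = 2 (inverse quantization from the shared codebook, inverse normalization, frequency inversion of the high band, and synthesis) can be sketched as follows. The function name `decode_frame`, the toy codebook, and the treatment of frequency inversion as reversal of coefficient order are assumptions for illustration; the IDCT and long-term prediction synthesis are omitted.

```python
def decode_frame(peaks, indices, codebook):
    """Rebuild one frame of DCT coefficients for N = 2 from a shared codebook."""
    peak_low, peak_high = peaks                # Ph2, Pi2
    idx_low, idx_high = indices                # Pf2, Pg2
    k2 = codebook[idx_low]                     # normalized low band
    m2 = codebook[idx_high]                    # normalized, still inverted high band
    n2 = [peak_low * v for v in k2]            # inverse normalization
    p2 = [peak_high * v for v in m2]
    q2 = p2[::-1]                              # undo the frequency inversion
    return n2 + q2                             # r2: low band followed by high band

# Toy 4-dimensional codebook (hypothetical values).
codebook = [
    [1.0, 0.5, 0.25, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
r2 = decode_frame((8.0, 2.0), (0, 0), codebook)
```

Both bands read from the same table, so the decoder stores one codebook of half the frame dimension rather than two.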

[0018] For reference, the DCT and IDCT are described. With X(n) as the input signal, the transforms are as follows.

(1) DCT: the DCT coefficient Xc(k) to be obtained is

[Equation 1]

where N is the number of samples per block, g(k) = 1 (k = 0), and g(k) = √2 (k = 1, 2, ..., N−1).

(2) IDCT: the restored signal X(n) is

[Equation 2]
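The equation images are not reproduced in this text, so the exact scaling used in the patent is not known. The sketch below assumes the common orthonormal DCT-II, Xc(k) = (g(k)/√N) Σ X(n) cos((2n+1)kπ/2N), with the matching inverse; only g(k) is taken from the text, and the 1/√N factor and the function names are assumptions.

```python
import math

def g(k):
    """Weight from the text: 1 for k = 0, sqrt(2) otherwise."""
    return 1.0 if k == 0 else math.sqrt(2.0)

def dct(x):
    """DCT-II with an assumed 1/sqrt(N) scaling (orthonormal form)."""
    n_pts = len(x)
    return [
        g(k) / math.sqrt(n_pts)
        * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * n_pts))
              for n in range(n_pts))
        for k in range(n_pts)
    ]

def idct(xc):
    """Inverse of dct() under the same assumed scaling."""
    n_pts = len(xc)
    return [
        sum(g(k) * xc[k] * math.cos((2 * n + 1) * k * math.pi / (2 * n_pts))
            for k in range(n_pts)) / math.sqrt(n_pts)
        for n in range(n_pts)
    ]

signal = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, -2.0, -3.0]
restored = idct(dct(signal))
```

With this scaling the transform matrix is orthogonal, so the inverse uses the same weights and cosine terms; a different scaling convention would change only the constant factors.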

[0019]

[Effects of the Invention] As described in detail above, implementing the present invention provides the following effects. In the first embodiment, the low-frequency and high-frequency components are vector-quantized separately, which reduces distortion in the high-frequency region and improves the quality of the reproduced speech. In the second embodiment, the number of codebooks and the memory capacity are each 1/N of those of the first embodiment. These effects make it possible to realize a speech coding communication method and apparatus for a half-rate system (speech coding rate of 3.6 kbps or less, with reproduced speech quality equivalent to 6-bit log-PCM or better), which is of very great practical benefit.

[Brief description of drawings]

[FIG. 1] A block diagram showing the speech coding apparatus of the first embodiment of the present invention.

[FIG. 2] A block diagram showing the speech decoding apparatus of the first embodiment of the present invention.

[FIG. 3] A diagram explaining the principle of upsampling and downsampling.

[FIG. 4] A diagram explaining the principle of upsampling and downsampling.

[FIG. 5] A block diagram showing the speech coding apparatus of the second embodiment of the present invention.

[FIG. 6] A block diagram showing the speech decoding apparatus of the second embodiment of the present invention.

[FIG. 7] Spectra (DCT coefficients) in the processing of the second embodiment.

[FIG. 8] Spectra (DCT coefficients) in the processing of the second embodiment.

[FIG. 9] A block diagram showing a configuration example of the prior art.

[FIG. 10] A block diagram showing the configuration example of the earlier application.

[FIG. 11] A configuration example of the second embodiment of the present invention of FIG. 5 with N = 4.

[Explanation of symbols]

11 Long-term prediction analyzer
12 DCT unit
13 Adaptive thinning unit
14 Encoder
21 Separation circuit
22 Adaptive thinning decoder
23 IDCT unit
24 Long-term prediction synthesizer
31 Long-term prediction analyzer
32 DCT unit
33 Normalizer
34 Vector quantizer
35 Codebook
41 Separation circuit
42 Vector dequantizer
43 Codebook
44 Denormalizer
45 Inverse DCT unit
46 Long-term prediction synthesizer
51 Long-term prediction analyzer
52 DCT unit
53 DCT coefficient divider
54, 56 Normalizers
55 Frequency inverter
57, 59 Vector quantizers
58 Codebook
60 Encoder
61 Separation circuit
62, 64 Vector dequantizers
63 Codebook
65, 66 Denormalizers
67 Frequency inverter
68 Synthesizer
69 IDCT unit
70 Long-term prediction synthesizer
71 Long-term prediction analyzer
72 DCT unit
73 DCT coefficient divider
74, 75 Normalizers
76, 78 Vector quantizers
77, 79 Codebooks
80 Encoder
91 Separation circuit
92, 94 Vector dequantizers
93, 95 Codebooks
96, 97 Denormalizers
98 Synthesizer
99 IDCT unit
100 Long-term prediction synthesizer

─────────────────────────────────────────────────────

[Written Amendment]

[Submission date] February 16, 1993

[Procedure Amendment 1]

[Document to be amended] Specification

[Item to be amended] 0004

[Method of amendment] Change

[Contents of amendment]

[0004] On the receiving side, the digital signal sequence e is received and separated by the separation circuit 21 into DCT information Pc and pitch information Pd. The adaptive thinning decoder 22 reproduces the transmitted DCT coefficients from the DCT coefficient amplitude information and position information contained in the DCT information Pc, and interpolates by inserting 0 at the positions of the DCT coefficients that were not transmitted. The inverse discrete cosine transformer (IDCT unit) 23 transforms the reproduced DCT coefficients f into the time domain to reproduce the long-term prediction residual signal g. The long-term prediction synthesizer 24 adds the pitch information Pd to the long-term prediction residual signal g to decode and reproduce the speech signal h. Table 1 below shows an example of the bit allocation per frame (30 msec) of the conventional codec. Five synchronization bits are inserted at the head of each frame for frame synchronization. Converting the total in Table 1 to a per-second rate gives 135 bits / 30 msec = 4.5 kbps.
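The rate conversion and the zero-insertion step of the adaptive thinning decoder can be sketched briefly. The function names and the sample amplitudes and positions are hypothetical; only the figures of 135 bits and 30 msec come from the text.

```python
def bit_rate_kbps(bits_per_frame, frame_ms):
    """Bit rate in kbps: bits per millisecond equals kilobits per second."""
    return bits_per_frame / frame_ms

def restore_coefficients(amplitudes, positions, n_coeffs):
    """Place transmitted coefficients at their positions; fill the rest with 0."""
    coeffs = [0.0] * n_coeffs
    for amp, pos in zip(amplitudes, positions):
        coeffs[pos] = amp
    return coeffs

rate = bit_rate_kbps(135, 30)                             # 135 bits per 30 msec frame
restored = restore_coefficients([3.0, -1.5], [1, 4], 6)   # hypothetical values
```

Every coefficient that was not transmitted is reproduced as zero, which is the source of the frequency-domain distortion that grows as fewer coefficients fit in the bit budget.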

[Procedure Amendment 2]

[Document to be amended] Specification

[Item to be amended] 0005

[Method of amendment] Change

[Contents of amendment]

[0005]

[Problems to be Solved by the Invention] With the method described above, the reproduced speech quality is only equivalent to 4-bit log-PCM at a speech coding rate of 4.6 kbps. If the speech coding rate is lowered further to 3.6 kbps or less, the number of DCT coefficients that can be transmitted decreases and distortion in the frequency domain increases, so the reproduced speech quality degrades further. In other words, the conventional method cannot lower the coding rate to 3.6 kbps or less while keeping the reproduced speech quality equivalent to 6-bit log-PCM, and therefore cannot satisfy the performance (quality and error correction capability) required of a half-rate speech coding method. To remedy this problem, the present inventor previously proposed a speech coding method and apparatus that introduce vector quantization into the above method and are applicable to a half-rate system (speech coding rate of 3.6 kbps or less, with reproduced speech quality equivalent to 6-bit log-PCM or better) (see Japanese Patent Application No. Hei 3-329782). In that proposal, the adaptive thinning unit of the above method is replaced with a vector quantizer in order to reduce the coding rate while keeping the reproduced speech quality at or above that of the conventional method. FIG. 10 is a block diagram of a speech coding communication system showing an embodiment of that proposal; (A) is the speech coding apparatus and (B) is the speech decoding apparatus.

[Procedure Amendment 3]

[Document to be amended] Specification

[Item to be amended] 0008

[Method of amendment] Change

[Contents of amendment]

[0008] In the configuration previously proposed by the present inventor, introducing vector quantization keeps the speech coding rate low, but the reproduced speech quality is still not maintained well enough. The cause is that, in vector quantization, vectors are selected with emphasis on the low-frequency region (about 2 kHz and below) where power is concentrated, so quantization distortion becomes large in the high-frequency region (about 2 kHz and above) where power is small; the high-frequency region of the reproduced speech is distorted and quality degrades. It is therefore difficult to obtain the quality equivalent to 6-bit log-PCM or better that is required of a half-rate system. An object of the present invention is to further remedy this problem of the earlier application and to provide a speech coding communication method and apparatus applicable to a half-rate system (speech coding rate of 3.6 kbps or less, with reproduced speech quality equivalent to 6-bit log-PCM or better).
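The dominance of the low band in a full-frame vector search can be illustrated with a small numerical sketch. The coefficient values and the helper `peak_normalize` are hypothetical, and squared error is assumed as the selection criterion; the point is only that a low-power band contributes little to a squared-error measure taken over the whole frame, whereas normalizing that band on its own restores its scale.

```python
def peak_normalize(values):
    """Scale values so that the largest magnitude becomes 1."""
    peak = max(abs(v) for v in values)
    return [v / peak for v in values]

# Hypothetical DCT coefficients: large low-band values, small high-band values.
low = [8.0, -6.0, 4.0, -2.0]
high = [0.4, -0.3, 0.2, -0.1]

# One normalization over the whole frame.
whole = peak_normalize(low + high)
low_energy_whole = sum(v * v for v in whole[:len(low)])
high_energy_whole = sum(v * v for v in whole[len(low):])

# Separate normalization of the high band.
high_energy_split = sum(v * v for v in peak_normalize(high))
```

With whole-frame normalization the high band carries well under 1% of the energy in this example, so its shape barely influences which vector is chosen; after separate normalization it carries as much as the low band.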

Claims

[Claims]

1. A speech coding communication method in which, on the transmitting side, an input speech signal is subjected to long-term prediction analysis to obtain pitch information and a long-term prediction residual signal; the maximum value of the DCT coefficients obtained by transforming the long-term prediction residual signal into the frequency domain by a discrete cosine transform is output; the DCT coefficients normalized by the DCT coefficient maximum value are vector-quantized using a codebook; and the DCT vector number of the approximate pattern read from the codebook, the pitch information, and the DCT coefficient maximum value are encoded in the form of a digital signal sequence and sent to a transmission line; and in which, on the receiving side, the digital signal sequence from the transmission line is separated to extract the pitch information, the DCT coefficient maximum value, and the DCT vector number; the vector corresponding to the DCT vector number is read from a codebook and inversely quantized to reproduce the normalized DCT coefficients; the DCT coefficients are reproduced by inverse normalization with the DCT coefficient maximum value; the long-term prediction residual signal is then reproduced by an inverse discrete cosine transform; and the pitch information is added to the long-term prediction residual signal to decode and reproduce the speech signal, the method being characterized in that: on the transmitting side, the DCT coefficients obtained by the discrete cosine transform are divided into a plurality of equally spaced frequency regions; the DCT coefficient maximum value of each divided frequency region is output; the DCT coefficients normalized by the respective DCT coefficient maximum values are each vector-quantized using a codebook; and the DCT vector numbers of the approximate patterns read from the codebooks, the pitch information, and the plurality of DCT coefficient maximum values are encoded in the form of a digital signal sequence and sent to the transmission line; and on the receiving side, the digital signal sequence from the transmission line is separated to extract the pitch information, the plurality of DCT coefficient maximum values, and the plurality of DCT vector numbers; the vectors corresponding to the plurality of DCT vector numbers are read from the codebooks and inversely quantized to reproduce the normalized DCT coefficients; and the reproduced DCT coefficients are synthesized, after which the long-term prediction residual signal is reproduced by the inverse discrete cosine transform.

2. A speech coding apparatus that performs long-term prediction analysis of an input speech signal to obtain pitch information and a long-term prediction residual signal, outputs the maximum value of the DCT coefficients obtained by transforming the long-term prediction residual signal into the frequency domain by a discrete cosine transform, vector-quantizes the DCT coefficients normalized by the DCT coefficient maximum value using a codebook, and encodes the DCT vector number of the approximate pattern read from the codebook, the pitch information, and the DCT coefficient maximum value in the form of a digital signal sequence for sending to a transmission line, the speech coding apparatus comprising: a DCT coefficient divider that divides the DCT coefficients obtained by the discrete cosine transform into a plurality of equally spaced frequency regions and outputs a plurality of sets of DCT coefficients; a plurality of normalizers that each output, for a respective one of the plurality of sets of DCT coefficients from the DCT coefficient divider, the DCT coefficient maximum value and the DCT coefficients normalized by that maximum value; a plurality of vector quantizers that each vector-quantize the normalized DCT coefficients from a respective one of the normalizers using a codebook and output the DCT vector number of the approximate pattern read from the codebook; and an encoder that encodes the pitch information, the plurality of DCT coefficient maximum values, and the plurality of DCT vector numbers in the form of a digital signal sequence and sends it to the transmission line.

3. A speech decoding apparatus comprising: a separation circuit that receives a signal encoded in the form of a digital signal sequence and including pitch information of a speech signal, a plurality of DCT coefficient maximum values corresponding to divided frequency regions, and a plurality of DCT vector numbers, separates the digital signal sequence, and extracts the pitch information, the plurality of DCT coefficient maximum values, and the plurality of DCT vector numbers; a plurality of vector dequantizers that each read the vector corresponding to a respective one of the plurality of DCT vector numbers from a respective codebook, inversely quantize it, and output normalized DCT coefficients; a plurality of denormalizers that each inversely normalize the output of a respective one of the vector dequantizers by the corresponding DCT coefficient maximum value and output the DCT coefficients of the respective frequency region; a synthesizer that combines the outputs of the plurality of denormalizers and outputs DCT coefficients; an inverse discrete cosine transformer that performs an inverse discrete cosine transform on the output of the synthesizer to reproduce a long-term prediction residual signal; and a long-term prediction synthesizer that adds the pitch information to the long-term prediction residual signal to decode and reproduce a speech signal.

4. A speech coding communication method in which, on the transmitting side, an input speech signal is subjected to long-term prediction analysis to obtain pitch information and a long-term prediction residual signal; the maximum value of the DCT coefficients obtained by transforming the long-term prediction residual signal into the frequency domain by a discrete cosine transform is output; the DCT coefficients normalized by the DCT coefficient maximum value are vector-quantized using a codebook; and the DCT vector number of the approximate pattern read from the codebook, the pitch information, and the DCT coefficient maximum value are encoded in the form of a digital signal sequence and sent to a transmission line; and in which, on the receiving side, the digital signal sequence from the transmission line is separated to extract the pitch information, the DCT coefficient maximum value, and the DCT vector number; the vector corresponding to the DCT vector number is read from a codebook and inversely quantized to reproduce the normalized DCT coefficients; the DCT coefficients are reproduced by inverse normalization with the DCT coefficient maximum value; the long-term prediction residual signal is then reproduced by an inverse discrete cosine transform; and the pitch information is added to the long-term prediction residual signal to decode and reproduce the speech signal, the method being characterized in that: on the transmitting side, the DCT coefficients obtained by the discrete cosine transform are divided into a plurality of equally spaced frequency regions; for the DCT coefficients of the odd-numbered frequency regions among the divided frequency regions, the DCT coefficient maximum value of each region is output, the DCT coefficients normalized by the respective DCT coefficient maximum values are each vector-quantized using one codebook, and the DCT vector numbers of the approximate patterns are read from the one codebook; for the DCT coefficients of the even-numbered frequency regions among the divided frequency regions, the DCT coefficients of each region are frequency-inverted, the DCT coefficient maximum value of each region is output, the DCT coefficients normalized by the respective DCT coefficient maximum values are each vector-quantized by sharing the one codebook, and the DCT vector numbers of the approximate patterns are read from the one codebook; and the pitch information, the plurality of DCT coefficient maximum values, and the plurality of DCT vector numbers are encoded in the form of a digital signal sequence and sent to the transmission line; and on the receiving side, the digital signal sequence from the transmission line is separated to extract the pitch information, the plurality of DCT coefficient maximum values, and the plurality of DCT vector numbers; the vectors corresponding to the plurality of DCT vector numbers are read from one codebook having the same contents as the one codebook on the transmitting side and inversely quantized to reproduce the normalized DCT coefficients; and the DCT coefficients of the odd-numbered frequency regions are synthesized with the DCT coefficients obtained by frequency-inverting the DCT coefficients of the even-numbered frequency regions, after which the long-term prediction residual signal is reproduced by the inverse discrete cosine transform.

5. A speech coding apparatus that performs long-term prediction analysis of an input speech signal to obtain pitch information and a long-term prediction residual signal, outputs the maximum value of the DCT coefficients obtained by transforming the long-term prediction residual signal into the frequency domain by a discrete cosine transform, vector-quantizes the DCT coefficients normalized by the DCT coefficient maximum value using a codebook, and encodes the DCT vector number of the approximate pattern read from the codebook, the pitch information, and the DCT coefficient maximum value in the form of a digital signal sequence for sending to a transmission line, the speech coding apparatus comprising: a DCT coefficient divider that divides the DCT coefficients obtained by the discrete cosine transform into a plurality of equally spaced frequency regions and outputs the divided DCT coefficients; a frequency inverter that frequency-inverts each set of DCT coefficients of the even-numbered frequency regions among the divided DCT coefficients from the DCT coefficient divider; a plurality of normalizers that each output, for a respective set of DCT coefficients of the odd-numbered frequency regions from the DCT coefficient divider and for a respective set of frequency-inverted DCT coefficients of the even-numbered frequency regions from the frequency inverter, the DCT coefficient maximum value and the DCT coefficients normalized by that maximum value; a plurality of vector quantizers that each vector-quantize the normalized DCT coefficients from a respective one of the normalizers using one commonly shared codebook and read out and output the DCT vector number of the approximate pattern from the one codebook; and an encoder that encodes the pitch information, the plurality of DCT coefficient maximum values, and the plurality of DCT vector numbers in the form of a digital signal sequence and sends it to the transmission line.

6. A speech decoding apparatus comprising: a separation circuit that receives a signal encoded in the form of a digital signal sequence and including pitch information of a speech signal, a plurality of DCT coefficient maximum values corresponding to a plurality of equally spaced divided frequency regions, and a plurality of DCT vector numbers, separates the digital signal sequence, and extracts the pitch information, the plurality of DCT coefficient maximum values, and the plurality of DCT vector numbers; a plurality of vector dequantizers that each read the vector corresponding to a respective one of the plurality of DCT vector numbers from one codebook, inversely quantize it, and output normalized DCT coefficients; a plurality of denormalizers that each inversely normalize the output of a respective one of the vector dequantizers by the corresponding DCT coefficient maximum value and output the DCT coefficients of the respective frequency region; a frequency inverter that frequency-inverts each set of DCT coefficients of the even-numbered frequency regions among the outputs of the plurality of denormalizers; a synthesizer that combines the DCT coefficients of the odd-numbered frequency regions among the outputs of the plurality of denormalizers with the frequency-inverted DCT coefficients of the even-numbered frequency regions and outputs DCT coefficients; an inverse discrete cosine transformer that performs an inverse discrete cosine transform on the output of the synthesizer to reproduce a long-term prediction residual signal; and a long-term prediction synthesizer that adds the pitch information to the long-term prediction residual signal to decode and reproduce a speech signal.