JPS5936280B2

JPS5936280B2 - Adaptive transform coding method for audio

Info

Publication number: JPS5936280B2
Application number: JP57204850A
Authority: JP
Inventors: 健弘守谷; 雅彰誉田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1982-11-22
Filing date: 1982-11-22
Publication date: 1984-09-03
Also published as: JPS5994797A

Description

[Detailed Description of the Invention] The present invention relates to an adaptive transform coding method that transforms a speech signal into the frequency domain and adaptively varies the quantization of the result.

<Prior Art> A speech coding method of this kind is described, for example, in Japanese Patent Application Laid-Open No. 55-57900, "Speech Signal Processing Circuit".

In this method, as shown in FIG. 1, input speech from input terminal 11 is sampled at, for example, 8 kHz, and each sample value is input as a digital signal to orthogonal transform section 12. Orthogonal transform section 12 converts a fixed number of input speech samples s_1, ..., s_2n (FIG. 2A) by a discrete Fourier transform into frequency-domain signals (a spectrum) f_1, f_2, ..., f_n (FIG. 2B), which are sent to adaptive quantization section 13.

Meanwhile, the input speech at terminal 11 is also input to spectral envelope extraction section 14, where the envelope of its spectrum is estimated by linear predictive analysis. This spectral envelope and the pitch period are supplied to adaptive information allocation section 15. According to the instantaneous level of the spectral envelope at each of the frequency-domain signals f_1, f_2, ..., f_n, and also taking the pitch period into account, adaptive information allocation section 15 adaptively changes the number of quantization bits used in quantization section 13 for each signal, assigning more bits where the level is high and fewer where it is low.

The information quantized in this way and the information indicating the bit allocation are combined in combining circuit 16 and sent out as the coded output. With this technique a speech signal sampled at 8 kHz can be coded efficiently at about 16 kbps, and high-quality speech is obtained.

However, because the bit allocation information requires about 2 kbps, coding at a total rate of 9.6 kbps (one of the commonly used transmission rates) or less requires the signals f_1, f_2, ..., f_n to be quantized at 1 bit/sample or less. In that case almost no information can be allocated to the low-intensity sections of the frequency components, which causes a large degradation in speech quality.

<Summary of the Invention> The object of the present invention is to provide an adaptive transform coding method for speech that transforms the input speech signal into the frequency domain, divides the resulting spectrum into blocks, and vector-quantizes each block adaptively according to spectral envelope information, so that degradation of speech quality is kept small even at coding rates of, for example, 9.6 kbps or less.
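The rate budget behind the prior-art limitation can be checked with a short calculation. The sketch below is illustrative arithmetic only: the function name `bits_per_sample` is hypothetical, and the 8 kHz sampling rate is the one quoted in the prior-art description.

```python
def bits_per_sample(total_bps, side_info_bps, sample_rate_hz):
    """Bits left per sample once the side information has been paid for."""
    return (total_bps - side_info_bps) / sample_rate_hz


# Figures quoted in the text: 9.6 kbps total, about 2 kbps of
# bit-allocation information, 8 kHz sampling.
budget = bits_per_sample(9600, 2000, 8000)
print(budget)  # 0.95
```

With under one bit per sample on average, a scalar quantizer that assigns whole bits to individual components has to leave many components with no bits at all, which is the degradation the text describes.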

Flattening the spectrum before it is divided into blocks also allows the vector quantization to be performed efficiently.

<First Embodiment> FIG. 3 shows an embodiment of the speech coding method according to the present invention.

The input signal from terminal 11 is converted frame by frame in orthogonal transform section 12 into a frequency-domain signal, that is, a spectrum, by an orthogonal transform such as the discrete Fourier transform (DFT) or the discrete cosine transform (DCT). In spectrum flattening (smoothing) section 17 this spectrum is globally flattened using spectral envelope information that is obtained separately and quantized. Specifically, the spectral envelope of the input speech at terminal 11 is estimated by linear predictive analysis in spectral envelope extraction section 14; this envelope information and the speech power are quantized as auxiliary information in quantization section 18; the quantized output is decoded in local decoding section 19; and the spectrum f_1, f_2, ..., f_n is divided by the decoded auxiliary information in spectrum flattening section 17.

As shown in FIG. 4, block division section 21 divides the flattened spectrum into s blocks of p consecutive components each: F_1 = (f_11, f_12, ..., f_1p), F_2 = (f_21, ..., f_2p), ..., F_s = (f_s1, ..., f_sp). Each spectral component f_ij (i = 1, ..., s; j = 1, ..., p) consists of a real part R(f_ij) and an imaginary part I(f_ij). For each block, the real parts form a vector R(F_i) = (R(f_i1), ..., R(f_ip)), giving s vectors in all, and the imaginary parts likewise form s vectors I(F_i) = (I(f_i1), ..., I(f_ip)).

Vector quantization section 22 vector-quantizes these vectors by detecting which standard vector in a dictionary prepared in advance each of them matches best. That is, a number of standard vectors that are expected to occur are stored as a dictionary, the standard vector closest to the input speech vector is found, and a code such as the number identifying that matching or similar standard vector is output. Coding therefore takes fewer bits than quantizing the strength of each spectral component individually.

Moreover, the bit allocation for this vector quantization is changed adaptively: the number of bits for each block is assigned according to the decoded auxiliary information output by local decoding section 19.
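The flattening, block division, and dictionary search described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the toy spectrum, the envelope values, and the four-entry dictionary are all hypothetical, and a real dictionary would be designed from training speech.

```python
def flatten_spectrum(spectrum, envelope):
    """Divide each spectral component by the decoded envelope value."""
    return [f / e for f, e in zip(spectrum, envelope)]


def split_blocks(values, p):
    """Group consecutive components into blocks of p elements."""
    return [values[i:i + p] for i in range(0, len(values), p)]


def nearest_index(vector, dictionary):
    """Index of the standard vector with the smallest squared Euclidean distance."""
    def dist(entry):
        return sum((v - c) ** 2 for v, c in zip(vector, entry))
    return min(range(len(dictionary)), key=lambda k: dist(dictionary[k]))


# Toy data (assumed): 4 complex components, blocks of p = 2, 2-bit dictionary.
spectrum = [4 + 2j, -2 + 4j, 0.5 - 0.5j, 0.25 + 0.5j]
envelope = [4.0, 4.0, 0.5, 0.5]
dictionary = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]

blocks = split_blocks(flatten_spectrum(spectrum, envelope), 2)
# Real and imaginary parts of each block are quantized as separate vectors.
real_codes = [nearest_index([c.real for c in blk], dictionary) for blk in blocks]
imag_codes = [nearest_index([c.imag for c in blk], dictionary) for blk in blocks]
print(real_codes, imag_codes)
```

Only the dictionary indices are transmitted for each block, which is why the cost per component can fall below one bit.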

In general, many bits are allocated to blocks that contain strong spectral components and few to blocks whose spectrum is weak. When a large number of bits is allocated, vector quantization section 22 refers to a dictionary containing many standard vectors; when a small number of bits is allocated, it refers to a dictionary containing few. Because the number of elements p of a standard vector is fixed, the standard vectors in a large dictionary can represent fine patterns, whereas those in a small dictionary represent only coarse patterns. This adaptive information allocation (bit allocation) is performed with the aim of maximizing the frame-by-frame SNR between the input signal and the output signal.
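The relation between allocated bits and dictionary size can be illustrated with a small sketch. It assumes that a block given b bits is searched in a dictionary of 2**b standard vectors; the dictionaries, the block values, and the function name `quantize` are hypothetical.

```python
def quantize(vector, dictionary):
    """Return (index, squared error) of the closest standard vector."""
    errors = [sum((v - c) ** 2 for v, c in zip(vector, entry))
              for entry in dictionary]
    best = min(range(len(dictionary)), key=errors.__getitem__)
    return best, errors[best]


# Assumed dictionaries for p = 2: a b-bit dictionary holds 2**b entries.
dictionaries = {
    1: [[1.0, 1.0], [-1.0, -1.0]],
    2: [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]],
}

block = [0.9, -1.2]
for bits in (1, 2):
    index, err = quantize(block, dictionaries[bits])
    print(bits, index, round(err, 2))
```

For this block the 2-bit dictionary contains a much closer pattern than the 1-bit one, which is the "fine versus coarse pattern" distinction drawn in the text.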

Because an orthogonal transform leaves the SNR unchanged, it suffices to minimize the distortion between the output of spectrum flattening section 17 in encoder 24 and the reproduced spectrum output of decoder 25 on the receiving side; the Euclidean distance is used as the distortion measure. The distortion D per frame is defined by an equation (not reproduced in this text). The total number of samples (spectral components) is p·s, and the average amount of information per sample (the average number of quantization bits) R is likewise defined by an equation (not reproduced in this text).

Since B is held constant, the number of quantization bits B_j that minimizes the distortion D is given by a further equation (not reproduced in this text). B_j is converted to an integer, and quantization is carried out by selecting, from a dictionary of 2^(B_j) standard vectors, the one that gives the minimum distortion.
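The allocation equations themselves are missing from this text, so the sketch below does not reproduce them. It shows instead the textbook log-variance rule for minimizing squared error under a fixed average rate, which is an assumption about the general form and not the patent's own formula. The function names and block powers are hypothetical, and the average rate (written R in one sentence of the text and B in the next) is called `avg_bits_per_sample` here.

```python
import math


def allocate_bits(block_powers, p, avg_bits_per_sample):
    """Assumed textbook rule: real-valued bits per block of p samples."""
    log_powers = [math.log2(w) for w in block_powers]
    mean_log = sum(log_powers) / len(log_powers)
    return [p * (avg_bits_per_sample + 0.5 * (lp - mean_log))
            for lp in log_powers]


def to_integer_bits(real_bits):
    """Round to integers and clip negatives; dictionary size is then 2**b."""
    return [max(0, round(b)) for b in real_bits]


powers = [16.0, 4.0, 1.0, 0.25]  # hypothetical envelope power per block
raw = allocate_bits(powers, p=6, avg_bits_per_sample=1.0)
bits = to_integer_bits(raw)
print(raw, bits)
```

Rounding and clipping can push the total away from the budget (here the clipped allocation sums to more than the raw one), so a practical coder would need a re-balancing step that this sketch omits.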

The quantization in quantization section 18 can also be vector quantization.

The quantized output of the spectral envelope, that is, the auxiliary information, and the waveform information output by vector quantization section 22 are combined and sent to decoder 25 as the coded output. In decoder 25, flattened-spectrum reproduction section 26 uses the same dictionaries as vector quantization section 22 in encoder 24: for each block it reads out the standard vector indicated by that block's quantization code, and in this way reproduces the flattened spectrum.
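The dictionary lookup performed on the decoding side can be sketched as below. It assumes that real-part and imaginary-part codes are sent separately for each block, as in the encoder description; the function name, the dictionary, and the code values are hypothetical.

```python
def decode_blocks(real_codes, imag_codes, dictionary):
    """Read out standard vectors by code and rebuild the flattened spectrum."""
    flat = []
    for r_idx, i_idx in zip(real_codes, imag_codes):
        real_vec, imag_vec = dictionary[r_idx], dictionary[i_idx]
        flat.extend(complex(r, i) for r, i in zip(real_vec, imag_vec))
    return flat


# Must be the same dictionary the encoder searched (assumed toy values).
dictionary = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
flat = decode_blocks([1, 0], [0, 2], dictionary)
print(flat)
```

Decoding is a table lookup with no search, so the decoder is much cheaper than the encoder; the result is still the flattened spectrum, before the envelope is restored.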

Meanwhile, spectral envelope reproduction section 27 reproduces the spectral envelope from the received auxiliary information, and spectrum reproduction section 28 multiplies the reproduced flattened spectrum by this envelope and the power to reproduce the spectrum. Inverse transform section 29 transforms the reproduced spectrum back into the time domain, and the reproduced speech signal is obtained at output terminal 31.

<Second Embodiment> In the description above, spectrum flattening was performed after the orthogonal transform, but the orthogonal transform may instead be performed after the input speech has been passed through an inverse filter.

For example, as shown in FIG. 5, in which parts corresponding to those in FIG. 3 carry the same reference numerals, the input speech signal from input terminal 11 is supplied to orthogonal transform section 12 through inverse filter 32. Meanwhile, linear prediction analyzer 33 analyzes the spectral envelope of the input speech signal; the resulting prediction coefficients are vector-quantized in quantization section 18; the quantized output is decoded in local decoding section 19; and the decoded output, that is, the linear prediction coefficients, controls the filter constants of inverse filter 32. The output of inverse filter 32 is a residual signal, which is orthogonally transformed, coded in the same way as described above, and sent out.

In decoder 25, spectrum reproduction section 26 decodes the vector-quantized codes to reproduce the spectrum of the residual signal, which is transformed back into the time domain and sent to linear prediction synthesis filter section 34. The filter constants of synthesis filter section 34 are controlled by the prediction coefficients reproduced in spectral envelope reproduction section 27, and the speech signal is reproduced by filter section 34.

FIG. 6A shows the waveform A1 of an input speech signal, the waveform A2 of the real part of its orthogonal transform output, and the waveform A3 of the imaginary part. FIG. 6B shows the residual signal waveform B1 obtained by passing the input speech waveform A1 through inverse filter 32, the waveform B2 of the real part of the orthogonal transform output of this residual signal, and the waveform B3 of the imaginary part.
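The inverse filter and synthesis filter pair of the second embodiment can be sketched as a linear prediction error filter and its all-pole inverse. This is a minimal illustration under assumptions: the second-order coefficients and the input samples are hypothetical, and the transform and quantization of the residual that sit between the two filters in the actual scheme are left out.

```python
def inverse_filter(signal, coeffs):
    """Residual e[n] = x[n] - sum_k a_k * x[n-k] (prediction error filter)."""
    residual = []
    for n, x in enumerate(signal):
        pred = sum(a * signal[n - k]
                   for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        residual.append(x - pred)
    return residual


def synthesis_filter(residual, coeffs):
    """Inverse operation: x[n] = e[n] + sum_k a_k * x[n-k]."""
    out = []
    for n, e in enumerate(residual):
        pred = sum(a * out[n - k]
                   for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(e + pred)
    return out


coeffs = [0.5, -0.25]  # hypothetical 2nd-order predictor (text uses order 8)
signal = [1.0, 2.0, 0.0, -1.0, 0.5]
residual = inverse_filter(signal, coeffs)
rebuilt = synthesis_filter(residual, coeffs)
print(residual, rebuilt)
```

Because both filters are driven by the same decoded coefficients, the synthesis filter exactly undoes the inverse filter when the residual is passed through unchanged; in the coder, the only mismatch comes from quantizing the residual spectrum.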

FIG. 6B also shows the spectral envelope B4 of the input speech waveform A1 and the bits B5 allocated to each block, for the case p = 6 and B = 1.0.

In the above, quantization efficiency can be raised further by adapting the dimension p of the vectors that form the unit of quantization to the pitch frequency of the input speech and by making the length of one frame an integer multiple of the pitch period.

In this case the pitch frequency varies over time, so the pitch frequency is also included in the auxiliary information. It is also possible to treat each vector as a unit of complex numbers rather than handling the real and imaginary parts independently. Each of the sections described above can be implemented on separate computers or on a common computer.

<Effects> As described above, dividing the signal flattened in the frequency domain into blocks and allocating information adaptively raises quantization efficiency. In particular, at 9.6 kbps or less, speech can be reproduced with a higher SNR than with the conventional adaptive transform coding method, which uses scalar quantization.

Flattening in the frequency domain reduces the number of standard vectors needed for vector quantization. In addition, only the amount of information allocated per block has to be an integer, so the amount of information per sample can be allocated finely, in units of 1/p bit. This avoids the perceptual degradation that was a drawback of the conventional method, in which some frequency components receive no information at all and the set of such components changes adaptively. An experimental example is described next.
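The 1/p-bit granularity follows directly from assigning an integer number of bits to a block of p samples. A small check, with a hypothetical helper name and p = 6 as in the example of FIG. 6:

```python
from fractions import Fraction


def per_sample_rates(p, max_block_bits):
    """Per-sample rates reachable with integer bits per block of p samples."""
    return [Fraction(b, p) for b in range(max_block_bits + 1)]


rates = per_sample_rates(p=6, max_block_bits=6)
print([str(r) for r in rates])
```

Between 0 and 1 bit/sample this gives seven rate levels in steps of 1/6 bit, whereas per-sample scalar quantization offers only 0 or 1 bit in the same range.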

FIG. 7 shows the SNR as a function of the amount of information B (bits/sample) for a sampling frequency of 8 kHz, an analysis order of 8 in linear prediction analyzer 33, an analysis length (transform length) of 26 to 31 ms, an analysis overlap of 2 ms (frames joined with a trapezoidal window), and a vector dimension of 6 to 12 (pitch-adaptive). In FIG. 7, curve 41 is uniform quantization with each sample coded individually; curve 42 is uniform quantization with fixed 6-dimensional vector coding; curve 43 is uniform quantization with the vector dimension varied according to the pitch frequency; curve 44 is adaptive quantization with each sample coded individually (the conventional method); curve 45 is the method of the present invention with fixed 6-dimensional vector quantization; and curve 46 is the method of the present invention with the vector dimension varied adaptively according to the pitch frequency.

These results show that adaptive quantization (curves 44 to 46) is superior to uniform quantization (curves 41 to 43), and that among the adaptive schemes the method of the present invention (curves 45 and 46) is superior to the conventional method (curve 44). In the range of 0.5 to 1.1 bits/sample, this SNR improvement was about 2.5 dB for female voices and about 1.0 dB for male voices, even on speech outside the training samples.

If the spectral envelope is also vector-quantized, the auxiliary information, including pitch and power, can be estimated at about 800 bps. Coding is therefore possible at 4.8 kbps when the amount of information B per residual-signal sample is 0.5, and at 9.6 kbps when it is 1.1.
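These totals can be verified with the figures given in the text (8 kHz sampling, about 800 bps of auxiliary information). The function name below is hypothetical.

```python
def total_bitrate(bits_per_sample, sample_rate_hz, side_info_bps):
    """Waveform bits plus auxiliary information, in bits per second."""
    return bits_per_sample * sample_rate_hz + side_info_bps


for b in (0.5, 1.1):
    print(b, round(total_bitrate(b, 8000, 800)))
```

At B = 0.5 the waveform takes 4000 bps and at B = 1.1 it takes 8800 bps, so adding 800 bps of auxiliary information gives the 4.8 kbps and 9.6 kbps rates stated above.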

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a conventional adaptive transform coding method; FIG. 2 is a diagram explaining its operation; FIG. 3 is a block diagram of an example of the adaptive transform coding method according to the present invention; FIG. 4 is a diagram showing an example of its block division; FIG. 5 is a block diagram of another example of the present invention; FIG. 6 is a diagram showing an example of its operation; and FIG. 7 is a diagram showing the relation between SNR and amount of information B for various coding methods.

11: speech input; 12: orthogonal transform section; 14: spectral envelope extractor; 17: spectrum flattening section; 18: vector quantizer; 19: local decoder; 21: block division section; 22: vector quantizer; 23: adaptive information allocation section; 24: encoder.

Claims

[Claims]

1. An adaptive transform coding method for speech, in a coding method in which a fixed number of sample values of a speech signal form one frame and a spectrum obtained for each frame by orthogonal transformation is adaptively quantized, comprising: spectral envelope extraction means for obtaining the envelope of the spectrum, quantizing it together with the power, and encoding the result as auxiliary information; local decoding means for decoding the auxiliary information; spectrum flattening means for flattening the spectrum using the decoded auxiliary information; block dividing means for dividing the flattened spectrum sequence into blocks on the frequency axis; adaptive information allocation means for allocating information; and vector quantization means for vector-quantizing the divided spectral signal sequence according to the allocation.

2. An adaptive transform coding method for speech, in a coding method that analyzes and codes a sample value sequence of a speech signal in units of one frame, comprising: linear prediction analysis means for analyzing the speech signal, obtaining its spectral envelope, quantizing it together with the power, and encoding the result as auxiliary information; local decoding means for decoding the auxiliary information; an inverse filter whose filter constants are controlled by the linear prediction coefficients of the decoded auxiliary information and which receives the sample value sequence of the speech signal and outputs a residual signal; orthogonal transform means for orthogonally transforming the residual signal frame by frame to obtain a spectrum; block dividing means for dividing the spectrum into blocks on the frequency axis; adaptive information allocation means for adaptively allocating information using the auxiliary information; and vector quantization means for vector-quantizing the divided spectral residual signal sequence according to the allocation.