JPH09288498A

JPH09288498A - Voice coding device

Info

Publication number: JPH09288498A
Application number: JP8098358A
Authority: JP
Inventors: Masayuki Misaki; 正之三崎; Mineo Tsushima; 峰生津島; Takeshi Norimatsu; 武志則松
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1996-04-19
Filing date: 1996-04-19
Publication date: 1997-11-04

Abstract

PROBLEM TO BE SOLVED: To improve reproduction quality in a speech coding device that reduces the redundancy contained in speech and audio signals, by allocating bits efficiently while exploiting the temporal (non-simultaneous) masking effect. SOLUTION: A simultaneous masking correcting means 13 and a temporal masking correcting means 14 calculate correction amounts for the masking threshold from the outputs of a spectral envelope calculating means 11 and a power calculating means 12. A masking threshold determining circuit 15 then determines the final masking threshold 15a, which takes the masking from the preceding and following frames into account. The spectral information 11a calculated by the spectral envelope calculating means 11 is divided by this masking threshold, bits are adaptively allocated according to the power of the resulting values 16a, and the spectral parameters are then quantized.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention. The present invention relates to a speech coding apparatus that reduces the redundancy contained in speech and audio signals, and more particularly to one whose coding efficiency is improved by also taking temporal masking into account.

[0002]

2. Description of the Related Art. A conventional speech coding apparatus is disclosed in Japanese Unexamined Patent Publication No. 7-261799. FIG. 5 is a block diagram of that apparatus, whose configuration is described below with reference to the drawing. The input audio signal is assumed to have already been divided into a plurality of frequency bands by a band dividing means before it reaches the block of FIG. 5; that block encodes one of those frequency bands. In FIG. 5, reference numeral 51 denotes an analyzing means that calculates orthogonal transform coefficients with a predetermined transform block length; 52, a Fourier transform means that calculates the power spectrum by FFT (Fast Fourier Transform); 53, a quantizing means that quantizes the orthogonal transform coefficients with a given number of bits; 54, a bit allocation information generating means that calculates the bit allocation for the orthogonal transform coefficients; and 55, a formatting means that multiplexes the quantized data and the bit allocation information and outputs them as encoded data.

[0003]

In this apparatus, the input signal is divided into frequency bands, and the processing of FIG. 5 is performed in each band while exploiting the uneven distribution of power among the bands. That is, the dynamic range within each band is reduced because the energy within a band is less unevenly distributed, and each band is allocated a number of bits according to its share of the total power. Furthermore, applying an orthogonal transform to the band-divided data concentrates the energy further before quantization, which improves coding efficiency. These techniques are summarized in, for example, Akihiko Sugiyama, "Audio Coding," Journal of the Acoustical Society of Japan, Vol. 51, No. 10, pp. 790-796.

[0004]

The operation of this conventional speech coding apparatus is as follows. The analyzing means 51 converts the input speech signal into frequency-domain parameters using an orthogonal transform such as the discrete cosine transform (DCT). Meanwhile, based on the power spectrum of the audio data obtained by the Fourier transform means 52, the bit allocation information generating means 54 determines a masking threshold, namely the absolute threshold of hearing raised to account for simultaneous masking, and allocates bits only to the frequency components that exceed this threshold, in amounts that depend on how far they exceed it. Because the amount of information that can be assigned to one frame is fixed (at a constant bit rate), the total number of bits is held constant and is distributed in proportion to the amounts by which the components exceed the masking threshold. The quantizing means 53 then quantizes each orthogonal transform coefficient according to the bit allocation obtained in this way for each frequency component in each frame.
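The conventional rule above (a fixed per-frame budget shared in proportion to each component's excess over the masking threshold) can be sketched as follows. This is an illustration only: the function name `allocate_by_excess`, measuring the excess in dB, and the largest-remainder rounding to whole bits are assumptions, not details taken from JP-A-7-261799.

```python
import numpy as np


def allocate_by_excess(power_db, mask_db, frame_bits):
    """Share a fixed per-frame bit budget among frequency components in
    proportion to how far each component exceeds the masking threshold (dB).
    Components at or below the threshold receive no bits."""
    excess = np.maximum(np.asarray(power_db, float) - np.asarray(mask_db, float), 0.0)
    bits = np.zeros(excess.shape, dtype=int)
    total = excess.sum()
    if total == 0.0 or frame_bits <= 0:
        return bits
    share = frame_bits * excess / total            # ideal fractional allocation
    bits = np.floor(share).astype(int)
    # Give the bits lost to rounding to the largest fractional remainders.
    leftover = max(int(frame_bits - bits.sum()), 0)
    order = np.argsort(-(share - bits), kind="stable")
    bits[order[:leftover]] += 1
    return bits


# Example: 4 components, 10 bits per frame (constant bit rate).
print(allocate_by_excess([60, 40, 30, 70], [40, 45, 20, 40], 10))  # -> [3 0 2 5]
```

The second component lies below its threshold and gets nothing; the total always equals the frame budget, which is what keeps the bit rate constant.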

[0005]

Bit allocation based on such a masking threshold makes distortion harder to perceive and thus improves coding efficiency. Auditory masking is described in detail in, for example, B.C.J. Moore, "An Introduction to the Psychology of Hearing" (Japanese translation supervised by Kengo Ogushi, Seishin Shobo).

[0006]

Problems to Be Solved by the Invention. However, a configuration such as the conventional speech coding apparatus described above cannot allocate bits in a way that accounts for temporal masking (also called non-simultaneous masking), that is, the masking effect between signal components that are displaced in time.

[0007]

In view of the above problem, an object of the present invention is to provide a speech coding apparatus that allocates bits by taking both simultaneous masking and temporal masking into account, so that coding distortion is less perceptible in speech and in the onsets of audio signals, where the energy changes abruptly, and overall coding efficiency is improved.

[0008]

Means for Solving the Problems. To achieve this object, the speech coding apparatus of claim 1 comprises: a spectral information calculating means that obtains spectral information of the input signal frame by frame; a spectral envelope calculating means that obtains the spectral envelope of the input signal frame by frame; a power calculating means that obtains the power of the input signal frame by frame; a simultaneous masking correcting means that, based on the spectral envelope of the input signal, obtains a correction amount for the absolute threshold of hearing that accounts for simultaneous masking; a temporal masking correcting means that, based on the change in the power of the input signal over a plurality of frames and on the envelope of the input signal, obtains a correction amount for the absolute threshold of hearing that accounts for temporal masking; a masking threshold determining means that corrects the absolute threshold of hearing by the correction amounts obtained by the simultaneous masking correcting means and the temporal masking correcting means, thereby determining a masking threshold; a bit allocation control means that divides the spectral information by the masking threshold and adaptively allocates bits according to the power of the spectral information so divided; and a spectral information quantizing means that quantizes the spectral information divided by the masking threshold according to the allocated bits.

[0009]

The speech coding apparatus of claim 2 is the apparatus of claim 1 in which the bit allocation control means adaptively allocates bits to each frame according to the change, over a plurality of frames, in the power of the spectral information divided by the masking threshold.

[0010]

The speech coding apparatus of claim 3 comprises: an orthogonal transform means that orthogonally transforms the input signal into frequency components frame by frame to obtain orthogonal transform coefficients; a power calculating means that obtains the power of the input signal frame by frame; a simultaneous masking correcting means that, based on a spectral envelope obtained from the orthogonal transform coefficients, obtains a correction amount for the absolute threshold of hearing that accounts for simultaneous masking; a temporal masking correcting means that, based on the change in the power of the input signal over a plurality of frames and on the spectral envelope obtained from the orthogonal transform coefficients, obtains a correction amount for the absolute threshold of hearing that accounts for temporal masking; a masking threshold determining means that corrects the absolute threshold of hearing by the correction amounts obtained by the simultaneous masking correcting means and the temporal masking correcting means, thereby determining a masking threshold; a dividing means that divides the orthogonal transform coefficients by the masking threshold; a bit allocation control means that adaptively allocates bits according to the power of the orthogonal transform coefficients divided by the masking threshold; and a spectral parameter quantizing means that quantizes the orthogonal transform coefficients divided by the dividing means according to the allocated bits.

[0011]

The speech coding apparatus of claim 4 comprises: a linear prediction analysis means that performs linear prediction analysis on the input signal frame by frame to obtain linear prediction coefficients and a prediction residual; an orthogonal transform means that orthogonally transforms the prediction residual into frequency components to obtain orthogonal transform coefficients; a power calculating means that obtains the power of the input signal frame by frame; a simultaneous masking correcting means that, based on a spectral envelope obtained from the linear prediction coefficients, obtains a correction amount for the absolute threshold of hearing that accounts for simultaneous masking; a temporal masking correcting means that, based on the change in the power of the input signal over a plurality of frames and on the spectral envelope obtained from the linear prediction coefficients, obtains a correction amount for the absolute threshold of hearing that accounts for temporal masking; a masking threshold determining means that corrects the absolute threshold of hearing by the correction amounts obtained by the simultaneous masking correcting means and the temporal masking correcting means, thereby determining a masking threshold; an inter-frame bit allocation control means that divides the spectral envelope obtained from the linear prediction coefficients by the masking threshold and adaptively allocates bits to each frame according to the change, over a plurality of frames, in the power of the spectral envelope so divided; a codebook holding codes of representative vectors for vector-quantizing the orthogonal transform coefficients; and a spectral parameter quantizing means that uses the codebook to determine the optimum representative vector code and gain information for the orthogonal transform coefficients and quantizes the gain information according to the allocated bits.
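Claim 4's quantizer picks a representative vector and a gain, then quantizes the gain with the allocated bits. A minimal gain-shape vector quantization sketch follows; the function `vq_with_gain`, the squared-error criterion, and the uniform dB gain quantizer with its range are hypothetical choices for illustration, not taken from the patent.

```python
import numpy as np


def vq_with_gain(x, codebook, gain_bits, gain_range_db=(-20.0, 40.0)):
    """Gain-shape VQ: choose the code vector c_k and gain g minimising
    ||x - g c_k||^2, then quantise |g| uniformly in dB with gain_bits bits.
    Code vectors must be non-zero. Returns (code index, gain index, decoded
    gain). For simplicity the gain sign is kept in the decoded value; a real
    coder would spend a sign bit or use a sign-symmetric codebook."""
    if gain_bits < 1:
        raise ValueError("gain_bits must be at least 1")
    x = np.asarray(x, float)
    cb = np.asarray(codebook, float)
    corr = cb @ x                      # <x, c_k> for every code vector
    energy = np.sum(cb * cb, axis=1)   # <c_k, c_k>
    # With the optimal gain g_k = corr/energy the error is ||x||^2 - corr^2/energy.
    k = int(np.argmax(corr ** 2 / energy))
    g = corr[k] / energy[k]
    lo, hi = gain_range_db
    levels = 2 ** gain_bits
    step = (hi - lo) / (levels - 1)
    g_db = 20.0 * np.log10(max(abs(g), 1e-12))
    gi = int(np.clip(np.round((g_db - lo) / step), 0, levels - 1))
    g_hat = np.sign(g) * 10.0 ** ((lo + gi * step) / 20.0)
    return k, gi, g_hat
```

Giving the gain more bits only shrinks the quantizer step in dB; the shape search is unaffected, which is why the bit allocation can act on the gain alone in this sketch.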

[0012]

The speech coding apparatus of claim 5 comprises: a linear prediction analysis means that performs linear prediction analysis on the input signal frame by frame to obtain linear prediction coefficients and a prediction residual; an orthogonal transform means that orthogonally transforms the prediction residual into frequency components to obtain orthogonal transform coefficients; a power calculating means that obtains the power of the input signal frame by frame; a simultaneous masking correcting means that, based on a spectral envelope obtained from the linear prediction coefficients, obtains a correction amount for the absolute threshold of hearing that accounts for simultaneous masking; a temporal masking correcting means that, based on the change in the power of the input signal over a plurality of frames and on the spectral envelope obtained from the linear prediction coefficients, obtains a correction amount for the absolute threshold of hearing that accounts for temporal masking; a masking threshold determining means that corrects the absolute threshold of hearing by the correction amounts obtained by the simultaneous masking correcting means and the temporal masking correcting means, thereby determining a masking threshold; an inter-frame bit allocation control means that divides the spectral envelope obtained from the linear prediction coefficients by the masking threshold and adaptively allocates bits to each frame according to the change, over a plurality of frames, in the power of the spectral envelope so divided; a plurality of codebooks, each holding codes of representative vectors for vector-quantizing the orthogonal transform coefficients, which differ from one another in codebook size (such as the number of representative vectors and their dimensionality) and hence in the number of bits they require; and a spectral parameter quantizing means that selects, from the plurality of codebooks, the codebook whose size corresponds to the allocated bits and uses the selected codebook to determine the optimum representative vector code and gain information for the orthogonal transform coefficients.
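Selecting a codebook by bit budget, as in claim 5, might look like the sketch below. The cost model (the coefficients are split into sub-vectors of the codebook's dimension, each coded with a ceil(log2 K)-bit index, gain bits ignored) and the name `select_codebook` are assumptions for illustration only.

```python
import math


def select_codebook(codebooks, n_coeffs, bit_budget):
    """Pick the codebook whose bit cost best fits the allocated budget.

    codebooks: list of (num_vectors, dim) pairs, i.e. the "codebook size".
    n_coeffs: number of transform coefficients to code in the frame.
    bit_budget: bits allocated to this frame.
    Returns the index of the most expensive codebook that fits, or of the
    cheapest codebook if none fits.
    """
    # Assumed cost: ceil(n_coeffs / dim) sub-vectors, ceil(log2 K) bits each.
    costs = [math.ceil(n_coeffs / d) * math.ceil(math.log2(k)) for k, d in codebooks]
    fitting = [i for i, c in enumerate(costs) if c <= bit_budget]
    if not fitting:
        return min(range(len(costs)), key=costs.__getitem__)
    return max(fitting, key=costs.__getitem__)


# Costs for 64 coefficients: 32, 64 and 160 bits respectively.
books = [(16, 8), (256, 8), (1024, 4)]
print(select_codebook(books, 64, 100))  # -> 1
```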

[0013]

The speech coding apparatus of claim 6 is the apparatus of claim 4 or 5 in which the spectral parameter quantizing means determines the representative vector code and the gain information using a distance measure that is perceptually weighted along the frequency axis.
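With a perceptually weighted distance, the code and gain search of claim 6 can be written as below. This sketch assumes a weighted squared error with one nonnegative weight per frequency bin; Embodiment 2 derives the weights from the mask-normalized envelope dF, but since the exact mapping is not given, the hypothetical `weighted_vq_search` simply takes the weights as input.

```python
import numpy as np


def weighted_vq_search(x, codebook, weights):
    """Return (k, g) minimising sum_f w_f * (x_f - g * c_k[f])**2."""
    x = np.asarray(x, float)
    cb = np.asarray(codebook, float)
    w = np.asarray(weights, float)
    num = cb @ (w * x)          # sum_f w_f x_f c_f, for every code vector
    den = (cb * cb) @ w         # sum_f w_f c_f^2 (must be non-zero)
    # With the optimal gain g = num/den the weighted error is
    # sum_f w_f x_f^2 - num^2/den, so the best code maximises num^2/den.
    k = int(np.argmax(num ** 2 / den))
    return k, float(num[k] / den[k])
```

With equal weights this reduces to the ordinary squared-error search; raising the weight of one bin steers the choice toward code vectors that match that bin, which is how a perceptual weighting moves error into less audible regions.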

[0014]

The speech coding apparatus of claim 7 comprises: a band dividing means that divides the frequency band of the input signal into a first frequency band and a second frequency band lower than the first; a band bit allocation control means that allocates bits to the first and second frequency bands in amounts corresponding to the ratio of the signal powers in those bands; a first speech coding module consisting of the speech coding apparatus of claim 3, which receives the signal of the first frequency band as its input signal and whose bit allocation control means distributes the bits allocated to the first frequency band; and a second speech coding module consisting of the speech coding apparatus of claim 4 or 5, which receives the signal of the second frequency band as its input signal and whose inter-frame bit allocation control means distributes the bits allocated to the second frequency band.

[0015]

BEST MODE FOR CARRYING OUT THE INVENTION

Embodiment 1. Embodiment 1 of the present invention corresponds to claims 1, 2, and 3. FIG. 1 is a block diagram showing the configuration of the speech coding apparatus according to Embodiment 1. In FIG. 1, reference numeral 11 denotes an orthogonal transform means (spectral information calculating means and spectral envelope calculating means); 12, a power calculating means; 13, a simultaneous masking correcting means; 14, a temporal masking correcting means; 15, a masking threshold determining means; 16, a dividing means; 17, a spectral parameter quantizing means (spectral information quantizing means); and 18, a bit allocation control means.

[0016]

The operation is as follows. The orthogonal transform means 11 orthogonally transforms the input speech signal frame by frame into frequency components, using the DCT or the like, and obtains orthogonal transform coefficients 11a and their spectral envelope 11b. The power calculating means 12 calculates the power of each frame of the input speech signal. The simultaneous masking correcting means 13 obtains, for each frame, a correction amount 13a of the absolute threshold of hearing due to simultaneous masking, based on the spectral envelope 11b. The temporal masking correcting means 14 obtains, for each frame, a correction amount 14a of the absolute threshold due to temporal masking, based on the power 12a of the past several frames obtained by the power calculating means 12 and on the spectral envelope 11b. Alternatively, the orthogonal transform coefficients 11a may be input to the simultaneous masking correcting means 13 and the temporal masking correcting means 14, which then derive the spectral envelope 11b themselves.
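The per-frame analysis performed by the orthogonal transform means 11 and the power calculating means 12 can be sketched with NumPy as below. The orthonormal DCT-II, non-overlapping frames, and the band-averaged dB envelope are assumptions; the patent does not specify how the envelope 11b is derived, and `analyse_frames` is a hypothetical name.

```python
import numpy as np


def dct_ii_matrix(n):
    """Orthonormal DCT-II basis: row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] /= np.sqrt(2.0)
    return m


def analyse_frames(signal, frame_len, band_width):
    """Split the signal into non-overlapping frames and return, per frame,
    the DCT coefficients (cf. 11a), the mean-square power (cf. 12a), and a
    coarse envelope in dB (cf. 11b): mean power of each band_width group."""
    x = np.asarray(signal, float)
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    coeffs = frames @ dct_ii_matrix(frame_len).T
    power = np.mean(frames ** 2, axis=1)
    n_bands = frame_len // band_width
    groups = coeffs[:, : n_bands * band_width].reshape(n_frames, n_bands, band_width)
    envelope_db = 10.0 * np.log10(np.mean(groups ** 2, axis=2) + 1e-12)
    return coeffs, power, envelope_db


# Example: 1 s of noise at an assumed 16 kHz rate, 10 ms (160-sample) frames.
rng = np.random.default_rng(0)
coeffs, power, env_db = analyse_frames(rng.standard_normal(16000), 160, 20)
print(coeffs.shape, power.shape, env_db.shape)  # (100, 160) (100,) (100, 8)
```

Because the DCT here is orthonormal, the energy of each frame's coefficients equals the frame energy, so the power 12a could equally be computed in the transform domain.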

[0017]

Temporal masking includes both forward masking, in which the masker precedes the maskee, and backward masking, in which the masker appears after the maskee. With a frame length of about 10 msec, a masking effect arises that depends on the power of the current frame and the powers of the frames before and after it. The amount of masking differs depending on whether it is forward or backward masking. Moreover, the correction of the absolute threshold for temporal masking must be made from both the preceding and the following frames. Therefore, the power 12a and the spectral envelope 11b for several frames are temporarily stored in a memory or the like, and the temporal masking correction amount 14a is determined retroactively for past frames. The masking threshold determining means 15 starts from the absolute threshold of hearing and, for each frame, raises it by the correction amounts 13a and 14a given by the simultaneous masking correcting means 13 and the temporal masking correcting means 14; the corrected threshold is output as the masking threshold 15a. Simultaneous and temporal masking are described in detail in, for example, the "Introduction to the Psychology of Hearing" cited above.
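A minimal sketch of the temporal masking correction 14a and the threshold 15a follows. The patent gives no formula, so everything here is an assumption: a masker k frames away is attenuated by an offset plus a per-frame slope (larger for backward masking, which decays faster than forward masking), the correction is how far the strongest attenuated masker exceeds the absolute threshold, and the two corrections are added in dB as a literal reading of "raised by the correction amounts". Because backward masking needs later frames, the function works on a buffered block of frames, mirroring the storage of several frames described above.

```python
import numpy as np


def temporal_masking_correction(env_db, ath_db, span=3,
                                fwd_offset_db=20.0, fwd_slope_db=10.0,
                                bwd_offset_db=30.0, bwd_slope_db=20.0):
    """Per-frame, per-bin rise of the absolute threshold caused by temporal
    masking from up to `span` neighbouring frames on each side.

    env_db: (T, F) spectral envelope in dB; ath_db: (F,) absolute threshold.
    A masker k frames away is attenuated by offset + slope * (k - 1) dB.
    All offsets and slopes are placeholder values, not from the patent.
    """
    env_db = np.asarray(env_db, float)
    T, F = env_db.shape
    corr = np.zeros((T, F))
    for t in range(T):
        masked = np.full(F, -np.inf)
        for k in range(1, span + 1):
            if t - k >= 0:   # earlier frame as masker: forward masking
                masked = np.maximum(masked, env_db[t - k] - (fwd_offset_db + fwd_slope_db * (k - 1)))
            if t + k < T:    # later frame as masker: backward masking
                masked = np.maximum(masked, env_db[t + k] - (bwd_offset_db + bwd_slope_db * (k - 1)))
        corr[t] = np.maximum(0.0, masked - ath_db)
    return corr


def masking_threshold(ath_db, sim_corr_db, temp_corr_db):
    """Raise the absolute threshold by both correction amounts (cf. 13a, 14a -> 15a)."""
    return ath_db + sim_corr_db + temp_corr_db


# A loud frame between two silent ones raises the threshold on both sides,
# more strongly after it (forward masking) than before it (backward masking).
env = np.array([[0.0], [80.0], [0.0]])
print(temporal_masking_correction(env, np.array([10.0]))[:, 0])  # [40.  0. 50.]
```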

[0018]

Next, the dividing means 16 divides the orthogonal transform coefficients 11a by the masking threshold 15a finally determined for each frame by the masking threshold determining means 15 (on a logarithmic scale this is a subtraction). The spectral parameter quantizing means 17 groups the divided orthogonal transform coefficients 16a into bands of a certain bandwidth to form predetermined frequency component groups 17a, which it outputs to the bit allocation control means 18. For each frame, the bit allocation control means 18 adaptively allocates bits according to the power of each frequency component in the groups 17a; at the same time, it adaptively allocates bits to each frame according to the change, over a plurality of frames, in the per-frame average power of the groups 17a. It outputs the resulting allocation 18a to the spectral parameter quantizing means 17 and also outputs it externally as side information 18b. Because the bit rate is fixed in Embodiment 1, the per-frame allocation keeps the total number of bits over the plurality of frames constant. The spectral parameter quantizing means 17 quantizes the frequency component groups 17a of each frame according to the bits 18a allocated to each of their frequency components and outputs the result externally as the encoded output 17a.
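The division by the masking threshold and the within-frame allocation over frequency component groups can be sketched as follows. The patent only says bits follow the power of the divided coefficients; allocating in proportion to each group's share of the normalized power, the amplitude scaling by 10**(thr_dB/20), and the name `normalise_and_allocate` are assumptions for illustration.

```python
import numpy as np


def normalise_and_allocate(coeffs, mask_thr_db, group_size, frame_bits):
    """coeffs: (F,) transform coefficients of one frame (cf. 11a).
    mask_thr_db: (F,) masking threshold in dB (cf. 15a).
    Returns the mask-normalised coefficients (cf. 16a) and integer bits per
    group of group_size coefficients (cf. 17a, 18a)."""
    c = np.asarray(coeffs, float)
    # Dividing power by the threshold is a subtraction in dB; for amplitudes
    # this means dividing by 10**(thr_db / 20).
    normalised = c / 10.0 ** (np.asarray(mask_thr_db, float) / 20.0)
    n_groups = len(c) // group_size
    grouped = normalised[: n_groups * group_size].reshape(n_groups, group_size)
    group_power = np.mean(grouped ** 2, axis=1)
    bits = np.zeros(n_groups, dtype=int)
    if group_power.sum() == 0.0:
        return normalised, bits
    # Assumed rule: bits proportional to each group's share of the power.
    share = frame_bits * group_power / group_power.sum()
    bits = np.floor(share).astype(int)
    leftover = max(int(frame_bits - bits.sum()), 0)
    bits[np.argsort(-(share - bits), kind="stable")[:leftover]] += 1
    return normalised, bits
```

After normalization, a group that sits far above its masking threshold looks "loud" and attracts bits, while a group that is loud but heavily masked does not.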

[0019]

As described above, in Embodiment 1 the masking threshold 15a is determined with temporal masking also taken into account, based on the change in the input signal power 12a over a plurality of frames and on the envelope 11b of the input signal. Coding distortion therefore becomes less perceptible in speech and in the onsets of audio signals, where the energy changes abruptly, and overall coding efficiency is improved.

[0020]

In addition, because the bit allocation control means 18 adaptively allocates bits to each frame according to the change, over a plurality of frames, in the per-frame average power of the orthogonal transform coefficients 16a divided by the masking threshold 15a (the predetermined frequency component groups 17a), coding efficiency can be improved when, for example, the divided coefficients 16a have their energy unevenly distributed across frames.

[0021]

Furthermore, because the input signal is orthogonally transformed to obtain the orthogonal transform coefficients 11a, which are quantized and from which the spectral envelope 11b is derived, it is possible to provide a speech coding apparatus that codes the input signal efficiently by using an orthogonal transform while also taking temporal masking into account.

[0022]

Although Embodiment 1 has been described with the orthogonal transform means implemented by the DCT, it may also be implemented with another orthogonal transform such as the FFT or the MDCT.
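For reference, a textbook MDCT/IMDCT pair with a sine window is sketched below, since the MDCT is named here as an alternative (and Embodiment 2 applies it to the prediction residual). The sine window and the direct O(N^2) matrix form are illustrative choices, not specified in the patent. With 2N-sample blocks advanced by N samples, the 2/N IMDCT scaling together with a window satisfying w[i]^2 + w[i+N]^2 = 1 makes overlap-add cancel the time-domain aliasing.

```python
import numpy as np


def mdct_basis(n):
    """(N, 2N) MDCT kernel cos[pi/N * (i + 1/2 + N/2) * (k + 1/2)]."""
    i = np.arange(2 * n)[None, :]
    k = np.arange(n)[:, None]
    return np.cos(np.pi / n * (i + 0.5 + n / 2) * (k + 0.5))


def sine_window(n):
    """Length-2N sine window; satisfies w[i]^2 + w[i+N]^2 = 1."""
    return np.sin(np.pi * (np.arange(2 * n) + 0.5) / (2 * n))


def mdct(block, window):
    """2N samples -> N coefficients (windowed before the transform)."""
    return mdct_basis(len(block) // 2) @ (window * block)


def imdct(coeffs, window):
    """N coefficients -> 2N windowed samples, to be overlap-added with hop N."""
    n = len(coeffs)
    return window * (mdct_basis(n).T @ coeffs) * (2.0 / n)


# Overlap-add three blocks; samples covered by two blocks are reconstructed.
n = 8
w = sine_window(n)
x = np.random.default_rng(1).standard_normal(4 * n)
y = np.zeros(4 * n)
for start in (0, n, 2 * n):
    y[start:start + 2 * n] += imdct(mdct(x[start:start + 2 * n], w), w)
print(np.allclose(y[n:3 * n], x[n:3 * n]))  # True
```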

[0023]

Also, although Embodiment 1 codes at a fixed bit rate, coding at a variable bit rate is also possible. In that case, bits can be allocated across a plurality of frames without constraining the total number of bits for those frames.

[0024]

Embodiment 2. Embodiment 2 of the present invention corresponds to claims 1, 2, 4, and 6. FIG. 2 is a block diagram showing the configuration of the speech coding apparatus according to Embodiment 2. In FIG. 2, reference numeral 21 denotes a linear prediction analysis means (spectral envelope calculating means); 22, a power calculating means; 23, an orthogonal transform means (spectral information calculating means); 24, a simultaneous masking correcting means; 25, a temporal masking correcting means; 26, a masking threshold determining means; 27, a codebook holding codes of predetermined representative vectors; 28, a spectral parameter quantizing means (spectral information quantizing means); and 29, an inter-frame bit allocation control means.

【００２５】以下、動作について説明する。入力された
音声信号はフレーム単位で線形予測分析手段２１で線形
予測分析されて、線形予測係数と予測残差２１ｂが得ら
れ、さらに該線形予測係数からスペクトル包絡２１ａが
得られる。パワー演算手段２２は、入力信号の各フレー
ムのパワーを求める。同時マスキング補正手段２４は、
The operation is described below. The linear prediction analysis means 21 performs linear prediction analysis on the input speech signal frame by frame to obtain linear prediction coefficients and a prediction residual 21b. A spectral envelope 21a is then obtained from the linear prediction coefficients. The power calculation means 22 calculates the power of each frame of the input signal.

For each frame, the simultaneous masking correction means 24 uses the spectral envelope 21a to obtain a correction amount 24a for the minimum audible value of hearing due to simultaneous masking. For each frame, the temporal masking correction means 25 uses the power 22a of the past several frames (obtained by the power calculation means 22) and the spectral envelope 21a to obtain a correction amount 25a for the minimum audible value due to temporal masking.

Temporal masking includes two cases:
- forward masking, in which the masker precedes the maskee;
- backward masking, in which the masker appears after the maskee.

With a frame length of about 10 msec, temporal masking occurs between the current frame and the frames before and after it, according to their respective power values. The amount of masking differs depending on whether it is forward or backward masking. The temporal-masking correction for a frame must therefore take account of both the preceding and the following frames. For this reason, the power 22a and the spectral envelope data for several frames are temporarily stored in a memory or the like, and the temporal masking correction amount 25a is determined retroactively for past frames.

The masking threshold determining means 26 starts from the minimum audible value of hearing. For each frame, it raises that value by the correction amounts 24a and 25a supplied by the simultaneous masking correction means 24 and the temporal masking correction means 25. It outputs the corrected minimum audible value as the masking threshold 26a. The orthogonal transform means 23 applies an orthogonal transform such as the MDCT (Modified DCT) to the prediction residual 21b to obtain orthogonal transform coefficients 23a.
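
The specification does not give a formula for the masking model. The Python/NumPy sketch below shows one way the three terms could be combined per frame. It makes several assumptions:
- all quantities are in dB;
- temporal masking is modelled as a linear dB decay from louder neighbouring frames;
- forward masking decays more slowly than backward masking.

The decay rates, the two-frame span and the function names are hypothetical illustrations, not values from this document. The role of the spectral envelope in the temporal correction is omitted for brevity.

```python
import numpy as np

# Hypothetical decay rates in dB per 10 ms frame; the text only says that
# forward and backward masking differ in size.
FORWARD_DECAY_DB = 10.0   # masker lies in an earlier frame
BACKWARD_DECAY_DB = 25.0  # masker lies in a later frame (weaker effect)
SPAN = 2                  # neighbouring frames considered on each side


def temporal_correction(frame_power_db: np.ndarray) -> np.ndarray:
    """Per-frame threshold raise (dB) caused by louder neighbouring frames.

    The whole array of stored frame powers is used, so backward masking can
    be applied retroactively once later frames are available.
    """
    n = len(frame_power_db)
    corr = np.zeros(n)
    for t in range(n):
        for d in range(-SPAN, SPAN + 1):
            s = t + d
            if d == 0 or not 0 <= s < n:
                continue
            decay = FORWARD_DECAY_DB if d < 0 else BACKWARD_DECAY_DB
            excess = frame_power_db[s] - frame_power_db[t] - decay * abs(d)
            corr[t] = max(corr[t], excess)  # never negative (starts at 0)
    return corr


def masking_threshold(ath_db, simultaneous_corr_db, frame_power_db):
    """Minimum audible value raised by both corrections.

    ath_db: (bands,), simultaneous_corr_db: (frames, bands),
    frame_power_db: (frames,). Returns (frames, bands) thresholds in dB.
    """
    temporal = temporal_correction(np.asarray(frame_power_db, dtype=float))
    return np.asarray(ath_db)[None, :] + simultaneous_corr_db + temporal[:, None]
```

In this sketch a loud frame raises the threshold of the quieter frames that follow it more strongly than those that precede it, which mirrors the forward/backward asymmetry described above.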

[0026] Next, the inter-frame bit allocation control means 29 divides the spectral envelope 21a by the masking threshold 26a of each frame. The envelope comes from the linear prediction analysis means 21; the threshold is the one finally determined by the masking threshold determining means 26.

The resulting spectral envelope dF has two uses:
- distributing the bit allocation, as described below;
- weighting the distance measure when the spectrum parameter quantization means 28 performs vector quantization.

Specifically, the average power of dF over a plurality of frames is calculated. Bits are then distributed to each frame according to the ratio of that frame's average power to the multi-frame average power. This allocation is output, together with dF, to the spectrum parameter quantization means 28 (29a). It is also output externally as auxiliary information si(i).

Because the bit rate is fixed in the second embodiment, the allocation keeps the total number of bits over the plurality of frames constant. Allocating bits over a plurality of frames improves coding efficiency in cases such as when dF has an uneven energy distribution across the frames.
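
A minimal sketch of the inter-frame allocation, under two assumptions:
- each frame's share is proportional to the ratio of its average power to the multi-frame average (the text does not specify how the ratio maps to bits);
- largest-remainder rounding keeps the total fixed, as the fixed bit rate requires.

The function name and the rounding rule are hypothetical.

```python
import numpy as np


def allocate_frame_bits(dF: np.ndarray, total_bits: int) -> list[int]:
    """Split a fixed bit budget over frames by power ratio.

    dF: (frames, bins) spectral envelope already divided by the masking
    threshold. Returns one integer bit count per frame summing to total_bits.
    """
    frame_power = np.mean(np.abs(dF) ** 2, axis=1)   # average power per frame
    ratio = frame_power / np.mean(frame_power)       # ratio to multi-frame average
    ideal = total_bits * ratio / ratio.sum()         # proportional (fractional) share
    bits = np.floor(ideal).astype(int)
    # Largest-remainder rounding: give leftover bits to the largest fractions.
    leftover = total_bits - bits.sum()
    order = np.argsort(-(ideal - bits), kind="stable")
    bits[order[:leftover]] += 1
    return bits.tolist()
```

For example, frames with average powers 1, 4 and 1 receive 10, 40 and 10 bits out of a 60-bit budget.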

[0027] Next, the spectrum parameter quantization means 28 uses the codebook 27 to vector-quantize the orthogonal transform coefficients 23a from the orthogonal transform means 23. It uses the number of bits that the bit allocation control means 29 has allocated to each frame.

That is, it determines the code 28a of the optimum representative vector and the gain information for the coefficients 23a. It then quantizes the gain information according to the bits allocated to the frame (28b). The code 28a and the quantized gain information 28b are output externally as the coded output. The code 28a is chosen to minimize a perceptually weighted distance measure whose weighting coefficients are proportional to dF.
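
The weighted codebook search can be sketched as follows. The sketch assumes a gain-shape codebook: each code vector is scaled by a non-negative gain, and the gain is coded with a hypothetical uniform scalar quantizer of `gain_bits` bits (at least 1). The `weights` array stands in for values proportional to dF. The quantizer range `g_max` and all names are illustrative, not taken from this document.

```python
import numpy as np


def quantize_gain(g: float, bits: int, g_max: float) -> tuple[int, float]:
    """Hypothetical uniform quantizer on [0, g_max] with 2**bits levels (bits >= 1)."""
    levels = 2 ** bits
    step = g_max / (levels - 1)
    index = int(np.clip(round(g / step), 0, levels - 1))
    return index, index * step


def weighted_vq(x, codebook, weights, gain_bits, g_max=4.0):
    """Return (code, gain_index) minimizing sum(weights * (x - g * c) ** 2).

    x: (dim,) transform coefficients; codebook: (size, dim);
    weights: (dim,), assumed proportional to the masked envelope dF.
    """
    best_code, best_gain, best_err = None, None, np.inf
    for code, c in enumerate(codebook):
        energy = np.sum(weights * c * c)
        if energy == 0:
            continue  # a zero (or zero-weighted) vector cannot be scaled
        g_opt = np.sum(weights * x * c) / energy         # unconstrained weighted optimum
        g_idx, g_hat = quantize_gain(max(g_opt, 0.0), gain_bits, g_max)
        err = np.sum(weights * (x - g_hat * c) ** 2)     # distance with quantized gain
        if err < best_err:
            best_code, best_gain, best_err = code, g_idx, err
    return best_code, best_gain
```

The distance is evaluated with the quantized gain rather than the ideal one, so the chosen code is the best one given the bits actually spent on the gain.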

[0028] As described above, the second embodiment applies linear prediction analysis to the input signal to obtain linear prediction coefficients and the prediction residual 21b. The orthogonal transform coefficients 23a obtained from the residual are quantized, and the spectral envelope 21a is obtained from the linear prediction coefficients. This yields a speech coding apparatus that codes the input signal efficiently using linear prediction analysis while also taking temporal masking into account.

[0029] In addition, the orthogonal transform coefficients 23a are vector-quantized with the codebook 27, and bits are allocated adaptively to the gain information during vector quantization. This further improves coding efficiency.

[0030] Furthermore, bits are allocated to each frame according to how the average power of dF (the spectral envelope divided by the masking threshold) changes over a plurality of frames. This improves coding efficiency in cases such as when dF has an uneven energy distribution across the frames.

[0031] Furthermore, during vector quantization of the orthogonal transform coefficients, the representative-vector code and the gain information are determined using a measure that is perceptually weighted along the frequency axis. The resulting speech coding apparatus makes effective use of auditory masking and achieves good coding efficiency.

[0032] The second embodiment has been described with the orthogonal transform means implemented by the MDCT. Another orthogonal transform, such as the FFT or DCT, may also be used.
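
For reference, a direct O(N²) MDCT and inverse MDCT fit in a few lines of NumPy. This is a textbook formulation, not the transform configuration of this apparatus. Two choices are made here (assumptions, not from the text):
- a sine window;
- a 2/N scaling of the inverse.

Together they make windowed overlap-add of consecutive blocks cancel the time-domain aliasing. Practical coders use an FFT-based algorithm instead.

```python
import numpy as np


def mdct(block: np.ndarray) -> np.ndarray:
    """Direct MDCT: 2N input samples -> N coefficients."""
    n_half = len(block) // 2
    n = np.arange(2 * n_half)
    k = np.arange(n_half)
    basis = np.cos(np.pi / n_half * (n[None, :] + 0.5 + n_half / 2) * (k[:, None] + 0.5))
    return basis @ block


def imdct(coeffs: np.ndarray) -> np.ndarray:
    """Inverse MDCT: N coefficients -> 2N time-aliased samples (2/N scaling)."""
    n_half = len(coeffs)
    n = np.arange(2 * n_half)
    k = np.arange(n_half)
    basis = np.cos(np.pi / n_half * (n[:, None] + 0.5 + n_half / 2) * (k[None, :] + 0.5))
    return (2.0 / n_half) * (basis @ coeffs)


def sine_window(length: int) -> np.ndarray:
    """Sine window; satisfies w[n]**2 + w[n + N]**2 == 1 (Princen-Bradley)."""
    return np.sin(np.pi * (np.arange(length) + 0.5) / length)
```

Usage:
1. Window each 2N-sample block with `sine_window` and apply `mdct`.
2. When decoding, apply `imdct` and window again.
3. Overlap-add successive outputs with a hop of N samples.

The aliasing terms of adjacent blocks then cancel, and the overlapped half reproduces the input.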

[0033] The second embodiment codes at a fixed bit rate, but variable-bit-rate coding is also possible. In that case, bits can be allocated over the plurality of frames without a constraint on their total number.

[0034] Embodiment 3. The third embodiment of the present invention corresponds to claims 1, 2, 5 and 6. FIG. 3 is a block diagram showing the configuration of the speech coding apparatus according to the third embodiment. FIG. 3 contains the following elements:
- 31: linear prediction analysis means (spectral envelope calculation means);
- 32: power calculation means;
- 33: orthogonal transform means (spectrum information calculation means);
- 34: simultaneous masking correction means;
- 35: temporal masking correction means;
- 36: masking threshold determining means;
- 37: a codebook group of codebooks 0 to Nc. Each codebook holds representative-vector codes, and the codebooks differ in size (number of representative vectors, dimensionality, etc.) and in the number of bits that size requires;
- 38: spectrum parameter quantization means (spectrum information quantization means);
- 39: inter-frame bit allocation control means.

[0035] The operation is described below. The linear prediction analysis means 31 performs linear prediction analysis on the input speech signal frame by frame to obtain linear prediction coefficients and a prediction residual 31b. A spectral envelope 31a is then obtained from the linear prediction coefficients. The power calculation means 32 obtains the power 32a of each frame of the input speech signal.

For each frame, the simultaneous masking correction means 34 uses the spectral envelope 31a to obtain a correction amount 34a for the minimum audible value of hearing due to simultaneous masking. For each frame, the temporal masking correction means 35 uses the power 32a of the past several frames (obtained by the power calculation means 32) and the spectral envelope 31a to obtain a correction amount 35a for the minimum audible value due to temporal masking.

Temporal masking includes two cases:
- forward masking, in which the masker precedes the maskee;
- backward masking, in which the masker appears after the maskee.

With a frame length of about 10 msec, temporal masking occurs between the current frame and the frames before and after it, according to their respective power values. The amount of masking differs depending on whether it is forward or backward masking. The temporal-masking correction for a frame must therefore take account of both the preceding and the following frames. For this reason, the power 32a and the spectral envelope 31a for several frames are temporarily stored in a memory or the like, and the temporal masking correction amount 35a is determined retroactively for past frames.

The masking threshold determining means 36 starts from the minimum audible value of hearing. For each frame, it raises that value by the correction amounts 34a and 35a supplied by the simultaneous masking correction means 34 and the temporal masking correction means 35. It outputs the corrected minimum audible value as the masking threshold 36a. The orthogonal transform means 33 applies an orthogonal transform such as the MDCT to the prediction residual 31b to obtain orthogonal transform coefficients 33a.
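
A minimal sketch of the frame-wise linear prediction analysis. The specification states neither the method nor the order, so this sketch assumes the autocorrelation method with the Levinson-Durbin recursion and no analysis window. It returns three things:
- the coefficients of A(z) = 1 + a_1 z^-1 + ... + a_p z^-p;
- the residual obtained by filtering the frame through A(z);
- an all-pole power envelope err / |A(e^jw)|^2 sampled on an FFT grid.

Function names and defaults are hypothetical.

```python
import numpy as np


def autocorr(x: np.ndarray, order: int) -> np.ndarray:
    """Unnormalized autocorrelation r[0..order] of one frame."""
    return np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])


def levinson_durbin(r: np.ndarray, order: int):
    """Solve the normal equations; returns (a, err) with a[0] = 1."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                          # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]     # update lower-order coefficients
        a[i] = k
        err *= 1.0 - k * k                      # remaining prediction error
    return a, err


def lpc_analysis(frame: np.ndarray, order: int = 10, n_fft: int = 256):
    """Return (a, residual, envelope) for one frame."""
    a, err = levinson_durbin(autocorr(frame, order), order)
    residual = np.convolve(frame, a)[: len(frame)]        # e[n] = sum_k a[k] x[n-k]
    envelope = err / np.abs(np.fft.rfft(a, n_fft)) ** 2   # all-pole power envelope
    return a, residual, envelope
```

The residual is what would be passed to the orthogonal transform, and the envelope is what would be compared with the masking threshold.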

[0036] Next, the inter-frame bit allocation control means 39 divides the spectral envelope 31a by the masking threshold 36a of each frame. The envelope comes from the linear prediction analysis means 31; the threshold is the one finally determined by the masking threshold determining means 36.

The resulting spectral envelope dF has two uses:
- distributing the bit allocation, as described below;
- weighting the distance measure when the spectrum parameter quantization means 38 performs vector quantization.

Specifically, the average power of dF over a plurality of frames is calculated. Bits are then distributed to each frame according to the ratio of that frame's average power to the multi-frame average power. This allocation is output, together with dF, to the spectrum parameter quantization means 38 (39a). It is also output externally as auxiliary information si(i).

Because the bit rate is fixed in the third embodiment, the allocation keeps the total number of bits over the plurality of frames constant. Allocating bits over a plurality of frames improves coding efficiency in cases such as when dF has an uneven energy distribution across the frames.

[0037] Next, the spectrum parameter quantization means 38 uses the codebook group 37 to vector-quantize the orthogonal transform coefficients 33a from the orthogonal transform means 33. It uses the number of bits that the bit allocation control means 39 has allocated to each frame.

From the codebooks 0 to Nc in the group 37, it selects the codebook whose required number of bits (that is, whose codebook size) corresponds to the bits allocated to the frame. Using that codebook, it determines the code 38a of the optimum representative vector and the gain information for the coefficients 33a. The gain information is quantized with a fixed number of bits. The code 38a and the quantized gain information 38b are output externally as the coded output. The code 38a is chosen to minimize a perceptually weighted distance measure whose weighting coefficients are proportional to dF.
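
One possible selection rule is sketched below. The text only requires that the codebook size correspond to the allocated bits. The sketch therefore assumes the following:
- each codebook codes a frame as consecutive sub-vectors of its own dimension;
- its requirement is (frame length / dimension) * ceil(log2(size)) bits;
- the codebook with the largest requirement that still fits the allocation is chosen.

The cost formula, the example sizes and the names are hypothetical.

```python
import math
from dataclasses import dataclass

import numpy as np


@dataclass
class Codebook:
    vectors: np.ndarray  # (size, dim) representative vectors

    def required_bits(self, frame_len: int) -> int:
        """Bits to code one frame as sub-vectors of this codebook's dimension."""
        size, dim = self.vectors.shape
        return (frame_len // dim) * math.ceil(math.log2(size))


def select_codebook(group: list[Codebook], allocated_bits: int, frame_len: int) -> int:
    """Index of the codebook with the largest requirement that fits the budget."""
    costs = [cb.required_bits(frame_len) for cb in group]
    fitting = [i for i, c in enumerate(costs) if c <= allocated_bits]
    if not fitting:
        raise ValueError("no codebook fits the allocated bit budget")
    return max(fitting, key=lambda i: costs[i])
```

With a 16-coefficient frame, codebooks of 4 x 4, 64 x 8 and 16 x 4 entries cost 8, 12 and 16 bits. A 13-bit allocation would therefore select the 64 x 8 codebook.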

[0038] As described above, the third embodiment allocates bits adaptively to each frame according to how the power of dF (the spectral envelope divided by the masking threshold) changes over a plurality of frames. The orthogonal transform coefficients 33a are then vector-quantized with a codebook chosen from codebooks 0 to Nc whose size matches the allocated bits. This further improves coding efficiency in cases such as when dF has an uneven energy distribution across the frames.

[0039] In addition, during vector quantization of the orthogonal transform coefficients 33a, the representative-vector code and the gain information are determined using a measure that is perceptually weighted along the frequency axis. The resulting speech coding apparatus makes effective use of auditory masking and achieves good coding efficiency.

[0040] The third embodiment has been described with the orthogonal transform means implemented by the MDCT. Another orthogonal transform, such as the FFT or DCT, may also be used.

[0041] The third embodiment codes at a fixed bit rate, but variable-bit-rate coding is also possible. In that case, bits can be allocated over the plurality of frames without a constraint on their total number.

[0042] Embodiment 4. The fourth embodiment of the present invention corresponds to claim 7. FIG. 4 is a block diagram showing the configuration of the speech coding apparatus according to the fourth embodiment. In this example, the signal is first divided into two frequency bands, and a separate module codes each band. FIG. 4 contains the following elements:
- 41: band dividing means, which divides the input speech signal into a first frequency band and a second frequency band lower than the first;
- 42: power calculation means, which calculates the signal power in each band;
- 43: band bit allocation control means, which allocates bits to each band;
- 44: a first speech coding module, which receives the first-band signal and the output of means 43, and uses the speech coding apparatus of the first embodiment;
- 45: a second speech coding module, which receives the second-band signal and the output of means 43, and uses the speech coding apparatus of the second or third embodiment.

[0043] The operation is described below. The band dividing means 41 divides the input speech signal into a first-band signal 41a and a second-band signal 41b. The power calculation means 42 calculates the power 42a of each band signal. From the ratio between these band powers, the band bit allocation control means 43 determines a bit amount for each band (43a and 43b). Reducing the uneven distribution of energy within each band narrows the dynamic range and further improves coding efficiency. The first speech coding module 44 codes signal 41a with bit amount 43a, and the second speech coding module 45 codes signal 41b with bit amount 43b.
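
The band split and the power-based bit split can be illustrated as below. Two stand-ins are used, both assumptions:
- an FFT brick-wall split over one block replaces the band dividing means 41 (a practical coder would use a filter bank such as a QMF bank);
- the two bit amounts follow a linear proportional rule.

The specification says only that the bits follow the ratio of the band powers. All names are hypothetical.

```python
import numpy as np


def split_bands(x: np.ndarray, sample_rate: float, cutoff_hz: float):
    """Split one block into (first = higher band, second = lower band)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    low = np.fft.irfft(np.where(freqs < cutoff_hz, spectrum, 0.0), n=len(x))
    return x - low, low


def band_bits(high: np.ndarray, low: np.ndarray, total_bits: int) -> tuple[int, int]:
    """Divide total_bits between the two bands in proportion to their powers."""
    power = np.array([np.mean(high ** 2), np.mean(low ** 2)])
    bits_high = int(round(total_bits * power[0] / power.sum()))
    return bits_high, total_bits - bits_high
```

For a 100 Hz tone plus a 3 kHz tone at half the amplitude, sampled at 8 kHz and split at 1 kHz, the high band carries one fifth of the power. It therefore receives 20 of 100 bits.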

[0044] As a result, the two modules receive different kinds of input:
- the second speech coding module, which uses linear prediction analysis, receives mostly low-frequency components that are relatively easy to predict linearly;
- the first speech coding module, which uses the orthogonal transform, receives mostly high-frequency components, which are more likely to contain signals that are hard to predict linearly.

This increases the benefit of using the orthogonal-transform-based first module. Although the above description uses two frequency bands, the first and second bands may each be divided further into several frequency bands.

[0045] Coding efficiency can be improved further by dividing the frequency band of the input speech signal in this way and coding each band with the module suited to its frequencies.

【００４６】[0046]

[Effects of the Invention]
As described above, the invention of claim 1 concerns a speech coding apparatus that:
- divides the spectrum information of the input signal by a masking threshold, determined from the minimum audible value of hearing with simultaneous masking taken into account;
- quantizes the divided spectrum information, allocating bits adaptively according to its power.

In this apparatus, the masking threshold also accounts for temporal masking. That effect is estimated from the temporal change in the input-signal power over a plurality of frames and from the envelope of the input signal. Coding distortion therefore becomes hard to perceive at points such as the onsets of speech or audio signals, where energy changes abruptly. The overall result is a marked improvement in coding efficiency.

[0047] In the invention of claim 2, the bit allocation control means allocates bits to each frame according to how the power of the masked spectrum information (the spectrum information divided by the masking threshold) changes over a plurality of frames. This improves coding efficiency in cases such as when the masked spectrum information has an uneven energy distribution across the frames.

[0048] In the invention of claim 3, the input signal is orthogonally transformed to obtain orthogonal transform coefficients. These coefficients are quantized, and the spectral envelope is also obtained from them. This yields a speech coding apparatus that codes the input signal efficiently using an orthogonal transform while also taking temporal masking into account.

[0049] In the invention of claim 4, linear prediction analysis of the input signal yields linear prediction coefficients and a prediction residual. The orthogonal transform coefficients obtained from the residual are quantized, and the spectral envelope is obtained from the linear prediction coefficients. This yields a speech coding apparatus that codes the input signal efficiently using linear prediction analysis while also taking temporal masking into account. The invention has two further benefits:
- the coefficients are vector-quantized with a codebook, and bits are allocated adaptively to the gain information, which further improves coding efficiency;
- bits are allocated to each frame according to how the power of the masked spectral envelope (the envelope divided by the masking threshold) changes over a plurality of frames, which improves coding efficiency when that envelope has an uneven energy distribution across the frames.

[0050] In the invention of claim 5, bits are allocated to each frame according to how the power of the masked spectral envelope (the envelope divided by the masking threshold) changes over a plurality of frames. The orthogonal transform coefficients are then vector-quantized with a codebook chosen from several codebooks, whose size matches the allocated bits. This further improves coding efficiency in cases such as when the masked envelope has an uneven energy distribution across the frames.

[0051] In the invention of claim 6, during vector quantization of the orthogonal transform coefficients, the representative-vector code and the gain information are determined using a measure that is perceptually weighted along the frequency axis. The resulting speech coding apparatus makes effective use of auditory masking and achieves good coding efficiency.

[0052] In the invention of claim 7, the input signal is divided into a high band and a low band:
- the high-band signal goes to a first speech coding module, which is the orthogonal-transform-based speech coding apparatus of claim 3;
- the low-band signal goes to a second speech coding module, which is the linear-prediction-based speech coding apparatus of claim 4 or 5.

The second module therefore receives mostly low-frequency components that are relatively easy to predict linearly. The first module receives mostly high-frequency components, which are more likely to contain signals that are hard to predict linearly. This increases the benefit of using the orthogonal-transform-based first module and further improves coding efficiency.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of the speech coding apparatus according to Embodiment 1 of the present invention.

FIG. 2 is a block diagram showing the configuration of the speech coding apparatus according to Embodiment 2 of the present invention.

FIG. 3 is a block diagram showing the configuration of the speech coding apparatus according to Embodiment 3 of the present invention.

FIG. 4 is a block diagram showing the configuration of the speech coding apparatus according to Embodiment 4 of the present invention.

FIG. 5 is a block diagram showing the configuration of a conventional speech coding apparatus.

[Explanation of symbols]

11, 23, 33: Orthogonal transform means
12, 22, 32: Power calculation means
13, 24, 34: Simultaneous masking correction means
14, 25, 35: Temporal masking correction means
15, 26, 36: Masking threshold determining means
16: Division means
17, 28, 38: Spectrum parameter quantization means
18: Bit allocation control means
21: Linear prediction analysis means
27: Codebook
29, 39: Inter-frame bit allocation control means
37: Codebook group
41: Band dividing means
43: Band bit allocation control means
44: First speech coding module
45: Second speech coding module

Continuation of the front page: (51) Int.Cl.⁶ H03M 7/30; internal reference number 9382-5K; FI H03M 7/30 A

Claims

[Claims]

1. A speech coding apparatus comprising: spectrum information calculation means for obtaining spectrum information of an input signal in frame units; spectral envelope calculation means for obtaining a spectral envelope of the input signal in frame units; power calculation means for obtaining the power of the input signal in frame units; simultaneous masking correction means for obtaining, based on the spectral envelope of the input signal, a correction amount of the minimum audible value of hearing that takes the effect of simultaneous masking into account; temporal masking correction means for obtaining, based on the temporal change of the power of the input signal over a plurality of frames and on the envelope of the input signal, a correction amount of the minimum audible value of hearing that takes the effect of temporal masking into account; masking threshold determining means for determining a masking threshold by correcting the minimum audible value of hearing based on the correction amounts obtained by the simultaneous masking correction means and the temporal masking correction means; division means for dividing the spectrum information by the masking threshold; bit allocation control means for adaptively allocating bits according to the power of the spectrum information divided by the masking threshold; and spectrum information quantization means for quantizing the spectrum information divided by the masking threshold according to the bit allocation.

2. The speech coding apparatus according to claim 1, wherein the bit allocation control means adaptively allocates bits to each frame according to the change, over a plurality of frames, in the power of the spectrum information divided by the masking threshold.

3. A speech coding apparatus comprising: orthogonal transform means for orthogonally transforming an input signal into frequency components in frame units to obtain orthogonal transform coefficients; power calculation means for obtaining the power of the input signal in frame units; simultaneous masking correction means for obtaining, on the basis of the spectrum envelope of the orthogonal transform coefficients, a correction amount for the absolute threshold of hearing that accounts for the effect of simultaneous masking; temporal masking correction means for obtaining, on the basis of the temporal change in the power of the input signal over a plurality of frames and the spectrum envelope of the orthogonal transform coefficients, a correction amount for the absolute threshold of hearing that accounts for the effect of temporal masking; masking threshold determining means for determining a masking threshold by correcting the absolute threshold of hearing with the correction amounts obtained by the simultaneous masking correction means and the temporal masking correction means; division means for dividing the orthogonal transform coefficients by the masking threshold; bit allocation control means for adaptively allocating bits according to the power of the orthogonal transform coefficients divided by the masking threshold; and spectrum parameter quantizing means for quantizing the orthogonal transform coefficients divided by the division means according to the allocated bit amount.
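In claim 3 the spectrum comes from an orthogonal transform, and the division means normalizes the coefficients by the masking threshold. The sketch below makes two assumptions. It uses a plain unnormalized DCT-II as the transform, since the claim does not name one. It also treats the threshold as a per-band level in dB and converts it to an amplitude before dividing. `band_edges` is a hypothetical list of coefficient indices that delimit the bands.

```python
import math


def dct2(frame):
    """Unnormalized DCT-II, standing in for the claim's unspecified orthogonal transform."""
    n = len(frame)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(frame))
        for k in range(n)
    ]


def divide_by_threshold(coeffs, threshold_db, band_edges):
    """Division means: scale every coefficient in band b by the threshold amplitude.

    Coefficients band_edges[b] .. band_edges[b+1]-1 belong to band b (hypothetical layout).
    """
    out = list(coeffs)
    for b in range(len(band_edges) - 1):
        amp = 10.0 ** (threshold_db[b] / 20.0)  # dB -> amplitude
        for k in range(band_edges[b], band_edges[b + 1]):
            out[k] = coeffs[k] / amp
    return out


def band_power(coeffs, band_edges):
    """Per-band power of the normalized coefficients, which drives the bit allocation."""
    return [sum(c * c for c in coeffs[band_edges[b]:band_edges[b + 1]])
            for b in range(len(band_edges) - 1)]


if __name__ == "__main__":
    coeffs = dct2([1.0, 0.5, -0.5, -1.0])
    norm = divide_by_threshold(coeffs, [6.0, 0.0], [0, 2, 4])
    print(band_power(norm, [0, 2, 4]))
```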

4. A speech coding apparatus comprising: linear prediction analysis means for performing linear prediction analysis on an input signal frame by frame to obtain linear prediction coefficients and a prediction residual; orthogonal transform means for orthogonally transforming the prediction residual into frequency components to obtain orthogonal transform coefficients; power calculation means for obtaining the power of the input signal in frame units; simultaneous masking correction means for obtaining, on the basis of the spectrum envelope derived from the linear prediction coefficients, a correction amount for the absolute threshold of hearing that accounts for the effect of simultaneous masking; temporal masking correction means for obtaining, on the basis of the temporal change in the power of the input signal over a plurality of frames and the spectrum envelope derived from the linear prediction coefficients, a correction amount for the absolute threshold of hearing that accounts for the effect of temporal masking; masking threshold determining means for determining a masking threshold by correcting the absolute threshold of hearing with the correction amounts obtained by the simultaneous masking correction means and the temporal masking correction means; inter-frame bit allocation control means for dividing the spectrum envelope derived from the linear prediction coefficients by the masking threshold and adaptively allocating bits to each frame according to the change, over a plurality of frames, in the power of the spectrum envelope divided by the masking threshold; a codebook holding representative vector codes for vector quantizing the orthogonal transform coefficients; and spectrum parameter quantizing means for using the codebook to determine the representative vector code and gain information that best match the orthogonal transform coefficients, and for quantizing the gain information according to the allocated bit amount.
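Claim 4 derives the spectrum envelope from linear prediction coefficients and transforms the prediction residual. Below is a minimal sketch of that front end. It assumes autocorrelation-method LPC solved by the Levinson-Durbin recursion, with the sign convention A(z) = 1 + Σ a_j z^(-j), and evaluates the all-pole envelope err / |A(e^{jω})|². Windowing is omitted, and the prediction order and function names are illustrative only.

```python
import cmath


def autocorr(x, order):
    """Autocorrelation r[0..order] (autocorrelation method, no window for brevity)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve for a[0..order] with a[0] = 1, so A(z) = 1 + sum_j a[j] z^-j.

    Returns (a, err), where err is the final prediction-error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def prediction_residual(x, a):
    """e[n] = x[n] + sum_j a[j] x[n-j], i.e. x filtered by A(z)."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0) for n in range(len(x))]


def lpc_envelope(a, err, omega):
    """All-pole spectrum envelope err / |A(e^{j omega})|^2 at angular frequency omega."""
    A = sum(c * cmath.exp(-1j * omega * j) for j, c in enumerate(a))
    return err / abs(A) ** 2


if __name__ == "__main__":
    x = [0.9 ** n for n in range(200)]  # decaying first-order test signal
    a, err = levinson_durbin(autocorr(x, 1), 1)
    print(a, lpc_envelope(a, err, 0.0), lpc_envelope(a, err, 3.14159))
```

The residual e is what the orthogonal transform would act on. The envelope, divided by the masking threshold, would feed the inter-frame bit allocation.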

5. A speech coding apparatus comprising: linear prediction analysis means for performing linear prediction analysis on an input signal frame by frame to obtain linear prediction coefficients and a prediction residual; orthogonal transform means for orthogonally transforming the prediction residual into frequency components to obtain orthogonal transform coefficients; power calculation means for obtaining the power of the input signal in frame units; simultaneous masking correction means for obtaining, on the basis of the spectrum envelope derived from the linear prediction coefficients, a correction amount for the absolute threshold of hearing that accounts for the effect of simultaneous masking; temporal masking correction means for obtaining, on the basis of the temporal change in the power of the input signal over a plurality of frames and the spectrum envelope derived from the linear prediction coefficients, a correction amount for the absolute threshold of hearing that accounts for the effect of temporal masking; masking threshold determining means for determining a masking threshold by correcting the absolute threshold of hearing with the correction amounts obtained by the simultaneous masking correction means and the temporal masking correction means; inter-frame bit allocation control means for dividing the spectrum envelope derived from the linear prediction coefficients by the masking threshold and adaptively allocating bits to each frame according to the change, over a plurality of frames, in the power of the spectrum envelope divided by the masking threshold; a plurality of codebooks, each holding representative vector codes for vector quantizing the orthogonal transform coefficients and each requiring a different number of bits according to its codebook size; and spectrum parameter quantizing means for selecting, from the plurality of codebooks, the codebook whose size corresponds to the allocated bit amount, and for using the selected codebook to determine the representative vector code and gain information that best match the orthogonal transform coefficients.
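In claim 5, the allocated bit amount selects one codebook from several of different sizes. The sketch below assumes that a codebook of b bits holds 2^b shape vectors and that the largest codebook fitting the allocation is chosen. It then runs a gain-shape search. For each candidate c the optimal gain is ⟨x, c⟩/⟨c, c⟩, so the best candidate is the one that maximizes ⟨x, c⟩²/⟨c, c⟩. The dict layout and the names are hypothetical.

```python
def select_codebook(codebooks, allocated_bits):
    """Pick the largest codebook whose index size (in bits) fits the allocation.

    codebooks: dict {bits: list of shape vectors}, a hypothetical layout with 2**bits entries each.
    """
    usable = [b for b in codebooks if b <= allocated_bits]
    if not usable:
        raise ValueError("no codebook fits the allocated bits")
    return codebooks[max(usable)]


def gain_shape_search(x, codebook):
    """Return (index, gain) minimizing |x - g*c|^2 over codevectors c and gains g.

    For a fixed c the optimal gain is <x,c>/<c,c>, leaving an error of
    |x|^2 - <x,c>^2/<c,c>, so we maximize <x,c>^2/<c,c>.
    """
    best = None
    for idx, c in enumerate(codebook):
        cc = sum(v * v for v in c)
        if cc == 0.0:
            continue  # an all-zero codevector cannot represent any shape
        xc = sum(p * q for p, q in zip(x, c))
        score = xc * xc / cc
        if best is None or score > best[0]:
            best = (score, idx, xc / cc)
    if best is None:
        raise ValueError("codebook has no nonzero vectors")
    return best[1], best[2]


if __name__ == "__main__":
    books = {
        1: [[1.0, 0.0], [0.0, 1.0]],
        2: [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]],
    }
    print(gain_shape_search([2.0, 2.0], select_codebook(books, 3)))
```

Frames that receive few bits fall back to a smaller codebook. The index cost then tracks the allocation without any separate rate control.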

6. The speech coding apparatus according to claim 4, wherein the spectrum parameter quantizing means determines the representative vector code and the gain information on a scale that is perceptually weighted along the frequency axis.
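Claim 6 evaluates the codebook match on a perceptually weighted frequency scale. The sketch below assumes a per-coefficient weight vector `w`, which could for example be larger where the masking threshold is low. With that weighting, the same gain-shape search is carried out under a weighted squared error. The weighting choice and the function name are assumptions.

```python
def weighted_gain_shape_search(x, codebook, w):
    """Gain-shape search under the weighted error sum_k w[k] * (x[k] - g*c[k])**2.

    The optimal gain per codevector is <x,c>_w / <c,c>_w, so maximize <x,c>_w**2 / <c,c>_w.
    """
    best = None
    for idx, c in enumerate(codebook):
        cc = sum(wk * ck * ck for wk, ck in zip(w, c))
        if cc == 0.0:
            continue  # codevector carries no weighted energy
        xc = sum(wk * xk * ck for wk, xk, ck in zip(w, x, c))
        score = xc * xc / cc
        if best is None or score > best[0]:
            best = (score, idx, xc / cc)
    if best is None:
        raise ValueError("no codevector has nonzero weighted energy")
    return best[1], best[2]


if __name__ == "__main__":
    book = [[1.0, 0.0], [0.0, 1.0]]
    # Weighting the second coefficient heavily steers the choice toward it.
    print(weighted_gain_shape_search([1.0, 1.0], book, [1.0, 10.0]))
```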

7. A speech coding apparatus comprising: band dividing means for dividing the frequency band of an input signal into a first frequency band and a second frequency band lower than the first frequency band; band bit allocation control means for allocating to each of the first and second frequency bands an amount of bits according to the power ratio of the signals in the two bands; a first speech coding module, consisting of the speech coding apparatus according to claim 3, which receives the signal of the first frequency band as its input signal and whose bit allocation control means allocates the bits assigned to the first frequency band; and a second speech coding module, consisting of the speech coding apparatus according to any one of claims 4 to 6, which receives the signal of the second frequency band as its input signal and whose inter-frame bit allocation control means allocates the bits assigned to the second frequency band.
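Claim 7 splits the signal into a higher and a lower band and divides the bit budget between them according to their power ratio. The sketch assumes a linear proportional split with an optional per-band minimum. The patent does not specify this mapping, and the function name is hypothetical.

```python
def split_bits_by_power(p_high, p_low, total_bits, min_bits=0):
    """Divide total_bits between the higher (first) and lower (second) band
    in proportion to their signal powers, keeping at least min_bits in each."""
    total_p = p_high + p_low
    share = 0.5 if total_p == 0 else p_high / total_p  # silent input: split evenly
    b_high = round(total_bits * share)
    b_high = min(max(b_high, min_bits), total_bits - min_bits)
    return b_high, total_bits - b_high


if __name__ == "__main__":
    # Speech usually concentrates power in the lower band.
    print(split_bits_by_power(1.0, 3.0, 100))
```

Each module would then run its own allocation within the returned share. The claim 3 module handles the higher band, and the LPC-based module handles the lower band.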