JP2002123298A

JP2002123298A - Method and device for encoding signal, recording medium recorded with signal encoding program

Info

Publication number: JP2002123298A
Application number: JP2000318017A
Authority: JP
Inventors: Akio Jin; 明夫神; Takehiro Moriya; 健弘守谷; Naoki Iwagami; 直樹岩上; Takeshi Mori; 岳至森; Kazuaki Chikira; 和明千喜良
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2000-10-18
Filing date: 2000-10-18
Publication date: 2002-04-26
Anticipated expiration: 2020-10-18
Also published as: JP3590342B2

Abstract

PROBLEM TO BE SOLVED: To provide a highly accurate auditory masking method that precisely estimates the positions of peaks and valleys in the spectral envelope curve when speech and musical sound signals are encoded. SOLUTION: The device is provided with a T/F converter 11 which applies a time-axis/frequency-axis transform to an input signal to obtain a coefficient sequence X(n) on the frequency axis, an envelope calculation part 13 which calculates the spectral envelope on the basis of the coefficient sequence X(n), a peak and valley estimation part 14 which estimates the positions of the peaks and valleys in the spectral envelope, a weighting part 15 which weights the amount of information at the estimated peak and valley positions in the spectral envelope, an auditory weight calculation part 16 which calculates an auditory weight for quantization on the basis of the spectral envelope subjected to the information-amount weighting, and a quantization part 17 which quantizes the coefficient sequence X(n) on the basis of the auditory weight for quantization.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a signal encoding method and apparatus that apply a time-axis/frequency-axis transform to an input signal and then quantize it. In particular, it relates to an auditory masking method that shapes the quantization error arising during encoding so that it is hard for the human ear to perceive, and to a signal encoding apparatus that uses this auditory masking method.

[0002]

[Description of the Related Art] As an auditory masking method in conventional signal encoding of speech and musical sound, one approach estimates the spectral envelope curve of the input signal by linear prediction analysis or a similar method, either on the time axis or after a time-axis/frequency-axis transform, and then obtains a masking curve by applying a suitable deformation to the estimated curve. Another approach obtains the spectral envelope curve directly from the time/frequency-transformed signal, applies a suitable deformation to obtain a masking curve, and quantizes with auditory masking.

[0003]

[Problems to Be Solved by the Invention] In auditory masking, masking on the frequency axis is done by noise shaping that reduces quantization noise near the valleys of the spectral envelope curve and, in exchange, increases it near the peaks, so that the quantization noise is hard for the human ear to hear. In the conventional methods described above, however, the estimated positions of the peaks and valleys in the spectral envelope were sometimes inaccurate, so noise shaping was not performed properly and the quality of the decoded sound was sometimes poor.

[0004] Accordingly, an object of the present invention is to provide a signal encoding method and apparatus that can accurately estimate the positions of peaks and valleys in the spectral envelope curve and can thereby carry out highly accurate auditory masking.

[0005]

[Means for Solving the Problems] The present invention realizes signal encoding that quantizes so as to minimize perceptual distortion. To solve the problem described above, it accurately estimates the positions of the peaks and valleys of the spectral envelope curve and performs appropriate noise shaping based on those positions. To estimate the positions, fine irregularities are removed as necessary from the accurate spectral envelope curve of the time/frequency-transformed signal; the first and second derivatives are then computed as necessary, and the exact peak and valley positions are determined from these derivative values or from their arithmetic means. Appropriate weighting is applied at the peak and valley positions obtained in this way, which yields effective noise shaping.

[0006]

[Embodiments of the Invention] Next, a preferred embodiment of the present invention is described with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a signal encoding apparatus according to one embodiment of the present invention.

[0007] This signal encoding apparatus comprises a T/F conversion unit 11, which applies a time-axis/frequency-axis transform (T/F transform) to a time-series input signal x(t), typically a speech or musical sound signal, to obtain a signal sequence X(n) on the frequency axis, and a quantization unit 12, which applies vector quantization (VQ) and scalar quantization (SQ) to X(n) to obtain a quantization index. The T/F conversion unit 11 executes a transform such as the MDCT (modified discrete cosine transform), and X(n) denotes, for example, the sequence of transform coefficients obtained by that transform. The apparatus also calculates an "auditory weight" that determines how much information is allocated to each frequency band, and quantization in the quantization unit 12 is perceptually weighted by this auditory weight so that the quantization noise is hard for the human ear to hear. To calculate the auditory weight, the apparatus comprises an envelope calculation unit 13, which calculates the spectral envelope from X(n); a peak/valley estimation unit 14, which estimates the positions of the peaks and valleys of the spectrum from the calculated spectral envelope; a weighting unit 15, which applies appropriate weighting near the peaks and valleys, based on their estimated positions, so that the information allocation becomes "especially small at the peak positions" and "especially large at the valley positions"; and an auditory weight calculation unit 16, which outputs the result to the quantization unit 12 as the "auditory weight". The reciprocal of the spectral envelope is used as the prototype of the auditory weight.

[0008] Regarding peaks and valleys: when the signal sequence X(n) is plotted against frequency on the horizontal axis and smoothed, a place where the value is larger than its surroundings is called a peak and a place where it is smaller is called a valley. As described later, smoothing is performed, for example, by computing an arithmetic mean over a certain section length (also called the averaging section length), that is, a moving average with that section length. Varying the section length yields the positions of fine peaks and valleys, slightly less fine peaks and valleys, coarse peaks and valleys, and so on. The arithmetic mean here smooths the spectrum of one frame along the frequency axis. The present invention combines peak and valley position estimates obtained with different degrees of smoothing, which makes more accurate auditory masking possible.

[0009] Next, the operation of this signal encoding apparatus is described.

[0010] The time-series input signal x(t) is converted by the T/F conversion unit 11 into the signal sequence X(n) on the frequency axis. X(n) is supplied to the quantization unit 12 for vector quantization and scalar quantization, and is also sent to the envelope calculation unit 13 so that its spectral envelope can be calculated. The envelope calculation unit 13 calculates the spectral envelope of X(n); the peak/valley estimation unit 14 estimates the positions of the peaks and valleys in the spectrum from the calculated envelope and outputs them to the weighting unit 15. Based on the reciprocal of the spectral envelope obtained by the envelope calculation unit 13, the weighting unit 15 applies appropriate information-amount weighting near the peaks and valleys so that the information allocation becomes "especially small at the peak positions" and "especially large at the valley positions". Specifically, the weighting is applied at the peak and valley positions using a weight function that either raises the neighborhood of a peak and deepens the neighborhood of a valley, or lowers the neighborhood of a peak and raises the neighborhood of a valley so that it becomes shallower. The weighting unit 15 receives the spectral envelope curve from the envelope calculation unit 13 and supplies the weighted spectral envelope curve to the auditory weight calculation unit 16.

[0011] The auditory weight calculation unit 16 calculates the auditory weight for quantization from the weighted spectral envelope curve and outputs it to the quantization unit 12. The quantization unit 12 then uses the supplied auditory weight to perform vector quantization and scalar quantization on the signal sequence X(n) from the T/F conversion unit 11. As a result, the quantization unit 12 outputs a quantization index (output index) to which highly accurate auditory masking has been applied.

[0012] The basic operation of the signal encoding apparatus of this embodiment has been described above. In the present invention, the weighting method described above may also be used together with the conventional, commonly used method in which the spectral envelope is predicted by linear prediction analysis or the like and the peaks and valleys of the envelope curve are flattened by an exponentiation operation to form the weight.

[0013] Next, the weighting process in this embodiment is described in detail.

[0014] FIG. 2 is a block diagram showing the process of weighting the peaks and valleys of the spectrum. From the spectral envelope curve obtained by the envelope calculation unit 13, the peak/valley estimation unit 14 first estimates the frequency positions of the fine peaks and valleys of the spectrum, then those of the slightly less fine peaks and valleys, and so on, repeating this procedure as many times as necessary, and finally estimates the frequency positions of the coarse peaks and valleys. The weighting unit 15 applies a weighting operation, using a suitable weight function, to the neighborhood of each estimated peak and valley.

[0015] FIG. 3 is a block diagram showing the details of the processing in the envelope calculation unit 13. The envelope calculation unit 13 obtains spectral envelope curves by applying arithmetic averaging to the frequency-domain signal sequence X(n). In the figure, arithmetic mean (1) through arithmetic mean (k) are arithmetic means over moving-average sections of different lengths. The first arithmetic mean (1) is applied to X(n); the second arithmetic mean (2) is applied to its result Y_1(n); the third arithmetic mean (3) is applied to that result Y_2(n); and so on, so that k arithmetic means are applied in sequence. Here k is an integer constant of 1 or more. The results Y_1(n), Y_2(n), ..., Y_k(n) are each sent to the peak/valley estimation unit 14. The section length of each arithmetic mean is chosen according to its purpose, but typically arithmetic mean (1) uses a short averaging section to detect the positions of fine peaks and valleys, and arithmetic mean (2) uses a longer averaging section than arithmetic mean (1) to detect the positions of coarser peaks and valleys. The same applies up to arithmetic mean (k), with the averaging section length preferably increased gradually from one stage to the next. The "arithmetic mean (k)" operation described above may also, as necessary, be performed several times with different averaging section lengths.
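The cascade of arithmetic means in paragraph [0015] can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `moving_average` and `cascaded_envelopes`, the centered window clipped at the frame edges, and the choice to average coefficient magnitudes are all assumptions, and the section lengths are given in coefficient indices rather than physical units.

```python
def moving_average(values, length):
    """Centered arithmetic mean spanning length // 2 samples on each side.

    The window is clipped at the ends of the list (an assumption; the
    patent does not say how frame edges are handled).
    """
    half = length // 2
    out = []
    for n in range(len(values)):
        lo = max(0, n - half)
        hi = min(len(values), n + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def cascaded_envelopes(coeffs, lengths):
    """Return [Y_1, ..., Y_k]; stage j averages the output of stage j - 1."""
    # Assumption: magnitudes are averaged, since signed transform
    # coefficients would largely cancel each other out.
    current = [abs(c) for c in coeffs]
    stages = []
    for length in lengths:  # lengths should grow from fine to coarse
        current = moving_average(current, length)
        stages.append(current)
    return stages
```

Calling `cascaded_envelopes(X, [3, 9])` would correspond to k = 2, with a short window for the fine envelope and a longer one for the coarse envelope; the window lengths 3 and 9 are placeholders only.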

[0016] The value of k and the section length of each arithmetic mean are as follows. k may be any integer of 1 or more, but is typically 2 or 3. When the input signal is an ordinary speech signal sampled at 16 kHz with a frame length of 60 ms, the averaging section length is preferably about 200 μs for arithmetic mean (1), about 1 ms for arithmetic mean (2), and about 10 ms for arithmetic mean (3).

[0017] Next, the processing in the peak/valley estimation unit 14 is described. FIG. 4 is a block diagram illustrating the processing in the peak/valley estimation unit 14.

[0018] The peak/valley estimation unit 14 receives from the envelope calculation unit 13 the coefficient sequences Y_1(n), Y_2(n), ..., Y_k(n), which represent the spectral envelope after each arithmetic mean, and estimates the peak and valley positions for each coefficient sequence as follows. The input coefficient sequence Y_j(n) (1 ≤ j ≤ k) is first differentiated with respect to n to obtain the sequence Y′_j(n); an arithmetic mean over a suitable section is then taken of Y′_j(n) to obtain the sequence

[0019]

[External character 1]

[0020] from which the fine fluctuation components have been removed. This sequence is differentiated again with respect to n to obtain the sequence Y″_j(n), and the sequence

[0021]

[External character 2]

[0022] is obtained by removing the fine fluctuation components from Y″_j(n). Then, as shown by the expressions in FIG. 4, the positions of the peaks and valleys of the spectral envelope curve are estimated from the signs of these values. The "arithmetic mean" operation for removing the fine fluctuation components may be performed several times with different averaging section lengths as necessary, or may be omitted.
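The derivative-based estimation can be sketched as below. The decision expressions of FIG. 4 are not reproduced in this text, so the sketch assumes the usual criterion: a peak is where the first difference changes from positive to non-positive with negative second difference, and a valley is the opposite. The function names, the use of forward differences, and the single optional smoothing pass are assumptions.

```python
def moving_average(values, length):
    """Centered arithmetic mean; the window is clipped at the ends of the list."""
    half = length // 2
    out = []
    for n in range(len(values)):
        lo, hi = max(0, n - half), min(len(values), n + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def find_peaks_and_valleys(envelope, smooth_len=1):
    """Return (peaks, valleys) as lists of indices into `envelope`."""
    # First difference: d1[n] approximates the slope between n and n + 1.
    d1 = [envelope[n + 1] - envelope[n] for n in range(len(envelope) - 1)]
    # Optional smoothing of the slope; smooth_len=1 leaves it unchanged.
    d1 = moving_average(d1, smooth_len)
    peaks, valleys = [], []
    for n in range(1, len(envelope) - 1):
        before, after = d1[n - 1], d1[n]
        curvature = after - before  # second difference around index n
        if before > 0 >= after and curvature < 0:
            peaks.append(n)
        elif before < 0 <= after and curvature > 0:
            valleys.append(n)
    return peaks, valleys
```

Running this once per envelope Y_j(n) gives one set of positions per smoothing stage, which is how the fine and coarse position sets described in the text would be obtained.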

[0023] FIG. 5 illustrates how the peaks and valleys of the spectral envelope are detected from the coefficient sequence X(n) in this way. The case k = 2 is shown, that is, the envelope calculation unit 13 computes the arithmetic mean in two stages. The figure plots the absolute value |X(n)| of the coefficient sequence before averaging, the absolute value |Y_1(n)| of the coefficient sequence Y_1(n) after arithmetic mean (1), and the absolute value |Y_2(n)| of the coefficient sequence Y_2(n) after arithmetic mean (2). The peak positions estimated from arithmetic mean (1) are denoted m_1, m_2, ..., m_12 and the valley positions v_1, v_2, ..., v_11; the peak positions estimated from arithmetic mean (2) are denoted M_1, M_2, M_3 and the valley positions V_1, V_2. The section length of arithmetic mean (2) is longer than that of arithmetic mean (1), so |Y_1(n)| gives the frequency positions of the fine peaks and valleys and |Y_2(n)| gives those of the coarse peaks and valleys.

[0024] Next, given that several kinds of peak and valley frequency positions have been obtained in this way, the information-amount weighting is described. FIG. 6 shows an example in which information-amount weighting is applied near the peaks and valleys of a spectral envelope curve. To keep the explanation simple, a coarse waveform is used.

[0025] In FIG. 6, the reciprocal 1/|Y_2(n)| of the previously estimated spectral envelope curve |Y_2(n)| is taken as the prototype of the auditory weight, and weighting is applied to it with weight functions near the estimated peak and valley positions. In the example in the figure, the prototype is multiplied by the weighting functions to produce an auditory weight W_L whose information amount has been corrected at the peak and valley positions. The weighting functions for peaks and for valleys can take various forms. The example shown uses linear functions over a weighting section of length 2t, with a factor of 0.5 at the center of a peak, 1.0 at the edges of a peak section, 2.0 at the center of a valley, and 1.0 at the edges of a valley section. As FIG. 6 shows, the exact positions of the peaks and valleys are estimated, and a weight is produced that allocates more information near the valleys and less near the peaks.
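The FIG. 6 example (reciprocal envelope multiplied by linear weight functions with 0.5 at peak centers and 2.0 at valley centers) can be sketched as follows. The function names, the small floor that guards against division by zero, and the convention that overlapping sections simply multiply are assumptions made for illustration.

```python
def linear_gain(n, center, t, center_gain):
    """Linear ramp: center_gain at `center`, 1.0 at center +/- t and beyond."""
    distance = abs(n - center)
    if distance >= t:
        return 1.0
    return center_gain + (1.0 - center_gain) * distance / t


def auditory_weight(envelope, peaks, valleys, t, alpha=0.5, beta=2.0):
    """Weight the reciprocal envelope near the given peaks and valleys."""
    # Prototype of the auditory weight: reciprocal of the envelope magnitude.
    # The 1e-12 floor is an assumption to avoid dividing by zero.
    weights = [1.0 / max(abs(y), 1e-12) for y in envelope]
    for n in range(len(weights)):
        for m in peaks:    # less information near peaks (alpha < 1)
            weights[n] *= linear_gain(n, m, t, alpha)
        for v in valleys:  # more information near valleys (beta > 1)
            weights[n] *= linear_gain(n, v, t, beta)
    return weights
```

Here `t` is expressed in coefficient indices; applying the function a second time with the coarse peak and valley positions would mirror the two-level weighting the document describes for the fine and coarse curves.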

[0026] The value of t is preferably about 100 to 200 Hz when the peak/valley structure representing the pitch frequency is to be weighted, and about 300 to 600 Hz when the peak/valley structure representing the formant frequencies is to be weighted.
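Since t is given in Hz while the weight functions operate on the coefficient index n, a conversion is needed in practice. The helper below is a sketch that assumes the N transform coefficients of a frame are spaced uniformly from 0 Hz to half the sampling frequency; the function name is hypothetical and the patent does not state the number of coefficients per frame.

```python
def hz_to_bins(width_hz, sampling_rate_hz, num_coeffs):
    """Convert a width in Hz to a number of coefficient indices (at least 1)."""
    # Assumption: num_coeffs coefficients cover 0 .. sampling_rate_hz / 2
    # uniformly, so one index spans (sampling_rate_hz / 2) / num_coeffs Hz.
    bins = round(width_hz * num_coeffs / (sampling_rate_hz / 2.0))
    return max(1, bins)
```

For instance, with the 16 kHz sampling rate mentioned in the text and a purely hypothetical 960 coefficients per frame, `hz_to_bins(100, 16000, 960)` gives the index width corresponding to t = 100 Hz.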

[0027] In practice, the weighting described above is applied near the peaks and valleys of both the "fine curve" and the "coarse curve" of the spectral envelope. For example, when peak and valley positions have been estimated for both curves as in FIG. 5, the reciprocal 1/|Y_1(n)| of the spectral envelope representing the fine structure is taken as the prototype of the auditory weight. Appropriate weighting is applied to 1/|Y_1(n)| near the peak and valley positions m_1, v_1, m_2, v_2, ... of this envelope curve in the same way as in FIG. 6, and then appropriate weighting is likewise applied to 1/|Y_1(n)| near the peak and valley positions M_1, V_1, M_2, V_2, ... of the curve representing the coarse spectral structure.

[0028] Various functions are possible as the weighting function for peaks and the weighting function for valleys. FIG. 7 shows examples of such weighting functions.

[0029] In FIG. 7, (a) and (b) are examples of weighting functions for a peak: (a) is composed of straight lines and (b) of a parabola. In both, the weighting section extends t on either side of the peak center n = M, for a total length of 2t. The value of the weighting function is 1.0 at both ends of the weighting section (M ± t). The weight α at the peak center n = M is normally a suitable constant with 0 < α < 1.0. Similarly, (c) and (d) in FIG. 7 are examples of weighting functions for a valley: (c) is composed of straight lines and (d) of a parabola. As for a peak, the value of the weighting function for a valley is 1.0 at both ends of the weighting section (V ± t). The weight β at the valley center n = V is normally a suitable constant with β > 1.0. In some cases, however, α > 1.0 and 0 < β < 1.0 can be effective.
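The two families of weighting functions in FIG. 7 can be sketched with a single helper. The text fixes only the end values (1.0 at the section ends, α or β at the center); the exact parabola drawn in the figure is not reproduced here, so the sketch assumes one whose vertex sits at the center. The function name and the `shape` parameter are hypothetical.

```python
def weight_function(n, center, t, center_value, shape="linear"):
    """Weight at index n for a section of half-width t around `center`.

    Returns center_value at the center, 1.0 at center +/- t, and 1.0
    outside the section. Use center_value = alpha for a peak and
    center_value = beta for a valley.
    """
    distance = abs(n - center)
    if distance >= t:
        return 1.0
    ratio = distance / t
    if shape == "parabolic":
        ratio = ratio * ratio  # assumed parabola with its vertex at the center
    elif shape != "linear":
        raise ValueError("shape must be 'linear' or 'parabolic'")
    return center_value + (1.0 - center_value) * ratio
```

With this form the parabolic variant stays closer to the center value over a wider part of the section than the linear one, so it concentrates the correction more evenly around the peak or valley.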

[0030] When auditory weighting is performed in this way, the quantization noise is shaped as shown in FIG. 8. Without auditory weighting, the quantization noise can be regarded as constant regardless of frequency. Given an input signal with the spectral envelope shown in the figure, the auditory weighting described above reshapes the frequency characteristic of the noise as shown, so that the noise is hidden beneath the spectrum of the input signal and becomes hard to hear.
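The effect on the noise can be illustrated with a toy weighted scalar quantizer. This is not the quantizer of the patent, which uses vector and scalar quantization whose details are not given here; it is one simple scheme, assumed for illustration, in which each coefficient is scaled by the square root of its weight before uniform quantization, so that the error bound in the original domain is inversely proportional to that square root.

```python
import math


def weighted_quantize(coeffs, weights, step):
    """Uniformly quantize sqrt(w) * x and map the result back to the x domain.

    The reconstruction error for each coefficient is at most
    step / (2 * sqrt(w)): large weights (valleys) get small errors,
    small weights (peaks) tolerate larger ones.
    """
    reconstructed = []
    for x, w in zip(coeffs, weights):
        scale = math.sqrt(w)
        index = round(x * scale / step)  # the transmitted quantization index
        reconstructed.append(index * step / scale)
    return reconstructed
```

With weights that are large near valleys and small near peaks, the reconstruction error is allowed to grow where the signal spectrum is strong and is kept small where it is weak, which is the noise shaping that the text describes for FIG. 8.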

[0031] Auditory masking can therefore be performed more accurately than with the conventional methods, and high-quality encoding becomes possible.

[0032] Next, an example is described in which the signal encoding method of the present invention is applied to the auditory weighting of a general transform coding scheme. FIG. 9 shows the configuration of a signal encoding apparatus that performs such auditory weighting.

[0033] The signal encoding apparatus shown in FIG. 9 comprises: an MDCT conversion unit 31, which applies the MDCT to the input signal; a spectrum flattening unit 32, which flattens the spectrum of the signal after the MDCT; a frame gain normalization unit 33, which normalizes and quantizes the frame gain based on the flattened spectrum and outputs a gain index; a residual component quantization unit 34, which quantizes the residual component (by vector quantization or scalar quantization) based on the normalized frame gain and outputs a quantization index; a spectral envelope estimation unit 35, which estimates the spectral envelope from the spectrum of the signal after the MDCT; an auditory weight calculation unit 36, which calculates the auditory weight from the estimated spectral envelope so that information-amount weighting can be applied during quantization in the residual component quantization unit 34; and a spectral information quantization unit 37, which quantizes the spectral information based on the estimated spectral envelope and outputs a spectrum index. In this apparatus, the MDCT conversion unit 31 corresponds to the T/F conversion unit 11 of the apparatus shown in FIG. 1, the spectral envelope estimation unit 35 consists of the envelope calculation unit 13 and the peak/valley estimation unit 14 of the apparatus shown in FIG. 1, and the auditory weight calculation unit 36 consists of the weighting unit 15 and the auditory weight calculation unit 16 of the apparatus shown in FIG. 1.

[0034] With the signal encoding method of the present invention, the peaks and valleys of the spectrum within an analysis frame can be analyzed accurately and in detail, and highly accurate auditory masking matched to their shape can be applied during quantization. This auditory masking can be applied to vector quantization and to subband scalar quantization.

【００３５】さらに図１０は、特開平８−４４３９９号
公報に開示される符号器及び復号器に本発明の聴覚重み
付けを適用した例を示している。図１０に示されるもの
において、符号器１１０は、入力端子１１１に与えられ
た入力信号をフレームに分割するフレーム分割部１１４
と、フレームに時間窓を描ける時間窓掛部１１５と、時
間窓が掛けられたフレームにＮ次のＭＤＣＴを施すＭＤ
ＣＴ部１１６と、時間窓が掛けられたフレームに対して
線形予測分析を行い予測係数を出力する線形予測分析部
１１７と、予測係数を量子化してインデックスＩ_pを得
る量子化部１１８と、予測係数のスペクトラム概形を求
めるスペクトラム概形計算部１２１と、ＭＤＣＴ部１１
６からのスペクトラム振幅をスペクトラム概形により正
規化し残差係数Ｒ(Ｆ)を得る正規化部１２２と、残差係
数概形Ｅ_R(Ｆ)を計算する残差概形計算部１２３と、残
差係数概形及びスペクトラム概形に基づいて重み付け係
数（ベクトルＷ）を計算する重み計算部１２４と、重み
付け係数に基づいて量子化しインデックスＩ_mと量子化
小系列ベクトルＣ(ｍ)を出力する量子化部１２５と、残
差係数Ｒ(Ｆ)を残差係数概形Ｅ_R(Ｆ)で正規化して微細
構造係数を得る残差係数正規化部１２６と、現フレーム
の微細構造係数を正規化し正規化微細構造係数Ｘ(Ｆ)と
して量子化部１２５に与えるとともにインデックスＩ_G
を出力するパワー正規化部１２７と、量子化小系列ベク
トルＣ(ｍ)を逆正規化し量子化残差係数Ｒ_q(Ｆ)を残差
概形計算部１２３に出力する逆正規化部１３１とを備え
ている。FIG. 10 shows an example in which the auditory weighting of the present invention is applied to the encoder and decoder disclosed in Japanese Patent Application Laid-Open No. 8-44399. In FIG. 10, the encoder 110 comprises: a frame dividing unit 114 that divides the input signal applied to an input terminal 111 into frames; a time windowing unit 115 that applies a time window to each frame; an MDCT unit 116 that applies an N-th order MDCT to the windowed frame; a linear prediction analysis unit 117 that performs linear prediction analysis on the windowed frame and outputs prediction coefficients; a quantization unit 118 that quantizes the prediction coefficients to obtain an index I_p; a spectrum outline calculation unit 121 that obtains the spectrum outline of the prediction coefficients; a normalization unit 122 that normalizes the spectrum amplitude from the MDCT unit 116 by the spectrum outline to obtain residual coefficients R(F); a residual outline calculation unit 123 that calculates a residual coefficient outline E_R(F); a weight calculation unit 124 that calculates weighting coefficients (vector W) based on the residual coefficient outline and the spectrum outline; a quantization unit 125 that quantizes based on the weighting coefficients and outputs an index I_m and a quantized sub-sequence vector C(m); a residual coefficient normalization unit 126 that normalizes the residual coefficients R(F) by the residual coefficient outline E_R(F) to obtain fine structure coefficients; a power normalization unit 127 that normalizes the fine structure coefficients of the current frame, supplies them to the quantization unit 125 as normalized fine structure coefficients X(F), and outputs an index I_G; and a denormalization unit 131 that denormalizes the quantized sub-sequence vector C(m) and outputs quantized residual coefficients R_q(F) to the residual outline calculation unit 123.
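
The front end of this encoder (frame division, time windowing, MDCT) can be sketched as follows. This is only an illustration under stated assumptions: the 50% frame overlap, the sine window, and the helper names `split_frames`, `sine_window`, and `mdct` are not specified in this publication; only the standard MDCT definition is used.

```python
import math


def split_frames(signal, n):
    """Split into frames of length 2n with hop n (assumed 50% overlap)."""
    return [signal[s:s + 2 * n] for s in range(0, len(signal) - 2 * n + 1, n)]


def sine_window(n):
    """Sine window of length 2n (an assumed choice of time window)."""
    return [math.sin(math.pi * (i + 0.5) / (2 * n)) for i in range(2 * n)]


def mdct(frame):
    """Direct O(N^2) MDCT: 2N windowed samples in, N coefficients out."""
    n = len(frame) // 2
    return [
        sum(frame[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for t in range(2 * n))
        for k in range(n)
    ]


# Hypothetical input: 8 samples, N = 2 coefficients per frame.
signal = [float(v) for v in range(8)]
window = sine_window(2)
for frame in split_frames(signal, 2):
    windowed = [a * b for a, b in zip(frame, window)]
    print(mdct(windowed))
```

A production encoder would use a fast MDCT; the direct form is shown only because it mirrors the definition line by line.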

【００３６】符号器１１０において本発明に基づく聴覚
重み付けを行うためには、スペクトラム概形計算部１２
１において、従来法に加えてさらに図１に示した信号符
号化装置の包絡算出部１３及び山・谷推定部１４での処
理と同様の処理を行わせ、その結果に基づいて、重み計
算部１２４においては、従来法に加えてさらに図１に示
した装置の重み付け部１５及び聴覚重み算出部１６での
処理と同様の処理を行い、得られた量子化用聴覚重みを
量子化部１２５に供給するようにすればよい。To perform auditory weighting according to the present invention in the encoder 110, the spectrum outline calculation unit 121 performs, in addition to its conventional processing, processing equivalent to that of the envelope calculating unit 13 and the peak/valley estimating unit 14 of the signal encoding device shown in FIG. 1. Based on the result, the weight calculation unit 124 performs, in addition to its conventional processing, processing equivalent to that of the weighting unit 15 and the auditory weight calculating unit 16 of the device shown in FIG. 1, and supplies the resulting auditory weights for quantization to the quantization unit 125.

【００３７】これに対して復号器１５０は、インデック
スＩ_mから正規化微細構造係数を再生する再生部１５１
と、インデックスＩ_Gから正規化ゲインを再生する正規
化ゲイン再生部１５２と、正規化微細構造係数を正規化
ゲインにより逆正規化して微細構造係数を得るパワー逆
正規化部１５３と、微細構造係数を残差概形ＥRで逆正
規化して残差係数Ｒ(Ｆ)を再生する残差逆正規化部１５
４と、残差概形Ｅ_Rを計算する残差概形計算部１５５
と、インデックスＩ_pから線形予測係数を再生しスペク
トラム概形を計算する再生・スペクトラム概形計算部１
５６と、スペクトラム概形を残差係数Ｒ(Ｆ)で逆正規化
し周波数領域係数を再生する逆正規化部１５７と、周波
数領域係数にフレームごとに逆ＭＤＣＴを施し時間領域
信号を得る逆ＭＤＣＴ部１５８と、時間領域信号にフレ
ームごとに時間窓を掛ける窓掛部１５９と、窓掛け出力
に対してフレーム重ね合わせを行い再生音響信号を得て
これを出力端子１９１に出力するフレーム重ね合わせ部
１６１と、を備えている。In contrast, the decoder 150 comprises: a reproduction unit 151 that reproduces normalized fine structure coefficients from the index I_m; a normalization gain reproduction unit 152 that reproduces a normalization gain from the index I_G; a power denormalization unit 153 that denormalizes the normalized fine structure coefficients by the normalization gain to obtain fine structure coefficients; a residual denormalization unit 154 that denormalizes the fine structure coefficients by the residual outline E_R to reproduce the residual coefficients R(F); a residual outline calculation unit 155 that calculates the residual outline E_R; a reproduction/spectrum outline calculation unit 156 that reproduces linear prediction coefficients from the index I_p and calculates the spectrum outline; a denormalization unit 157 that denormalizes the spectrum outline by the residual coefficients R(F) to reproduce frequency-domain coefficients; an inverse MDCT unit 158 that applies an inverse MDCT to the frequency-domain coefficients frame by frame to obtain a time-domain signal; a windowing unit 159 that applies a time window to the time-domain signal frame by frame; and a frame overlap unit 161 that overlaps the windowed frames to obtain a reproduced audio signal and outputs it to an output terminal 191.
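
The chain of normalizations in the encoder and the matching denormalizations in the decoder can be illustrated with a small round trip. This is a sketch under assumptions: quantization is omitted, the gain is taken to be the RMS of the fine structure coefficients (the publication does not say how the gain is computed), and the function names are hypothetical.

```python
import math


def encode_normalize(spectrum, outline, residual_outline):
    """Spectrum -> R(F) -> fine structure -> X(F) plus a gain (quantization omitted)."""
    residual = [s / h for s, h in zip(spectrum, outline)]        # R(F)
    fine = [r / e for r, e in zip(residual, residual_outline)]   # fine structure
    # Assumed gain definition (RMS); the frame must not be all zeros.
    gain = math.sqrt(sum(v * v for v in fine) / len(fine))
    normalized = [v / gain for v in fine]                        # X(F)
    return normalized, gain


def decode_denormalize(normalized, gain, outline, residual_outline):
    """Undo the three normalizations in reverse order."""
    fine = [v * gain for v in normalized]
    residual = [v * e for v, e in zip(fine, residual_outline)]
    return [r * h for r, h in zip(residual, outline)]


# Hypothetical values chosen so the arithmetic is easy to follow by hand.
spectrum = [4.0, -2.0, 1.0, 0.5]
outline = [2.0, 2.0, 1.0, 1.0]
residual_outline = [1.0, 0.5, 0.5, 0.25]
x, g = encode_normalize(spectrum, outline, residual_outline)
print(x, g)
print(decode_denormalize(x, g, outline, residual_outline))
```

Without quantization the decoder output equals the input spectrum; in the actual codec the quantizer sits between the two halves, so the reconstruction is only approximate.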

【００３８】なお、図１０に示す符号器１１０において
は、逆正規化部１３１を設けることなく、正規化部１２
２の出力のみに基づいて残差概形計算部１２３が残差係
数概形Ｅ_R(Ｆ)とインデックスＩ_Qを算出するようにする
ことが可能であり、この場合、復号器１５０において残
差概形計算部１５５はインデックスＩ_Qに基づいて残差
概形Ｅ_Rを計算する。Note that the encoder 110 shown in FIG. 10 may omit the denormalization unit 131, in which case the residual outline calculation unit 123 calculates the residual coefficient outline E_R(F) and an index I_Q based solely on the output of the normalization unit 122. The residual outline calculation unit 155 in the decoder 150 then calculates the residual outline E_R based on the index I_Q.

【００３９】次に、時間領域の符号化方式であるＣＥＬ
Ｐ(Code-Excited Linear Prediction)符号化の聴覚マス
キングに本発明を適用した例を説明する。ＣＥＬＰ符号
化では、時間領域で聴覚マスキングが行われるため、本
発明に基づく聴覚重み付けを周波数領域で適用し、得ら
れた聴覚重みを時間領域に戻してから量子化に適用す
る。図１１はそのような符号化を行う信号符号化装置の
構成を示すブロック図である。Next, an example is described in which the present invention is applied to auditory masking in CELP (Code-Excited Linear Prediction) coding, a time-domain coding scheme. Because CELP coding performs auditory masking in the time domain, the auditory weighting of the present invention is applied in the frequency domain, and the resulting auditory weights are returned to the time domain before being applied to quantization. FIG. 11 is a block diagram showing the configuration of a signal encoding device that performs such coding.

【００４０】図１１に示す装置は、入力信号に対してＦ
ＦＴ（高速フーリエ変換）を施すＦＦＴ部３８と、ＦＦ
Ｔ部の出力（周波数領域の信号列）に基づき、スペクト
ル包絡を推定するスペクトル包絡推定部３５と、推定さ
れたスペクトル包絡から聴覚重みを計算する聴覚重み計
算部３６と、聴覚重みを時間領域に戻すための逆ＦＦＴ
部３９と、時間領域の聴覚重みに基づいて入力信号のＣ
ＥＬＰ符号化を行い、インデックスを出力するＣＥＬＰ
符号化部４０とを備えている。この信号符号化装置にお
いては、ＦＦＴ部３８が図１に示した信号符号化装置の
Ｔ／Ｆ変換部１１に相当し、また、スペクトル包絡推定
部３５は、図１に示す装置の包絡算出部１３及び山・谷
推定部１４で構成され、聴覚重み計算部３６は、図１に
示す装置の重み付け部１５及び聴覚重み算出部１６で構
成される。The device shown in FIG. 11 comprises: an FFT unit 38 that applies an FFT (fast Fourier transform) to the input signal; a spectral envelope estimating unit 35 that estimates the spectral envelope from the output of the FFT unit (a frequency-domain signal sequence); an auditory weight calculation unit 36 that calculates auditory weights from the estimated spectral envelope; an inverse FFT unit 39 that returns the auditory weights to the time domain; and a CELP encoding unit 40 that CELP-encodes the input signal based on the time-domain auditory weights and outputs an index. In this signal encoding device, the FFT unit 38 corresponds to the T/F conversion unit 11 of the signal encoding device shown in FIG. 1; the spectral envelope estimating unit 35 consists of the envelope calculating unit 13 and the peak/valley estimating unit 14 of the device shown in FIG. 1; and the auditory weight calculation unit 36 consists of the weighting unit 15 and the auditory weight calculating unit 16 of that device.
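
The data flow of FIG. 11 (transform, envelope estimate, weight calculation, inverse transform) can be sketched as below. Several parts are assumptions made only to keep the example self-contained: a direct DFT stands in for the FFT, the envelope is a circular moving average of the magnitude spectrum, the weight rule `1 / envelope**gamma` is a placeholder for the auditory weight calculation, and all function and parameter names are hypothetical. Peak and valley weighting is left out here.

```python
import cmath
import math


def dft(x):
    """Direct DFT (stand-in for the FFT unit)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Direct inverse DFT (stand-in for the inverse FFT unit)."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def circular_mean(values, half_width):
    """Arithmetic mean over 2*half_width+1 neighbouring bins, wrapping at the ends."""
    n = len(values)
    span = 2 * half_width + 1
    return [sum(values[(i + d) % n] for d in range(-half_width, half_width + 1)) / span
            for i in range(n)]


def time_domain_weights(frame, half_width=1, gamma=0.5, floor=1e-3):
    magnitude = [abs(c) for c in dft(frame)]
    envelope = circular_mean(magnitude, half_width)
    # Placeholder weight rule (assumption): smaller weight where the envelope is large.
    weights = [1.0 / max(e, floor) ** gamma for e in envelope]
    # The weights are real and symmetric in frequency, so the inverse transform is real.
    return [c.real for c in idft(weights)]


# Hypothetical 16-sample frame containing a single sinusoid.
frame = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
taps = time_domain_weights(frame)
print(len(taps))
```

The returned sequence could then serve as the coefficients of a time-domain weighting filter inside the CELP encoder, which is one plausible reading of "returning the auditory weights to the time domain".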

【００４１】さらに図１２は、特開平６−２８２２９８
号公報の図１に開示される音声符号化装置に本発明の聴
覚重み付けを適用した例を示している。図１２に示され
る音声符号化装置は、入力端子２０１を介して入力した
音声信号をフレームに分割して線形予測分析を行い、予
測係数を決定する予測係数決定部２０２と、合成フィル
タ２０３と、予測係数を量子化して合成フィルタ２０３
に予測係数を設定する予測係数量子化部２０４と、複数
のピッチ周期ベクトルを記憶する適応符号帳２１７と、
複数の雑音波形ベクトルを記憶する雑音符号帳２１８
と、適応符号帳２１７から選択されたピッチ周期ベクト
ルに利得を加える利得部２１９ａ及び雑音符号帳２１８
から選択された雑音波形ベクトルに利得を加える利得部
２１９ｂとを有する利得符号帳２１９と、利得部２１９
ｂの過去の出力パワーに基づいて次の雑音波形ベクトル
の予測利得を得る予測利得決定部２１５と、利得部２１
９ｂの入力側に設けられ選択された雑音波形ベクトルに
この予測利得を加える予測利得部２１６と、利得部２１
９ａ、２１９ｂからの出力ベクトルを加算して駆動ベク
トルとして合成フィルタ２０３に供給する加算器２０９
と、入力音声ベクトル（入力信号）から合成フィルタ２
０３の出力（合成音声ベクトル）を減算して歪データと
して出力する減算器２１１と、歪データに対して聴覚重
み付けを行う聴覚重み付けフィルタ２２０と、聴覚重み
付け後の歪データに基づいて歪パワーを計算し、歪パワ
ーが最小になるように各符号帳２１７〜２１９での選択
を行う歪パワー計算部２１２と、符号を出力する符号出
力部２１３と、を備えている。FIG. 12 shows an example in which the auditory weighting of the present invention is applied to the speech encoding device disclosed in FIG. 1 of Japanese Patent Application Laid-Open No. 6-282298. The speech encoding device shown in FIG. 12 comprises: a prediction coefficient determination unit 202 that divides the speech signal input via an input terminal 201 into frames, performs linear prediction analysis, and determines prediction coefficients; a synthesis filter 203; a prediction coefficient quantization unit 204 that quantizes the prediction coefficients and sets them in the synthesis filter 203; an adaptive codebook 217 that stores a plurality of pitch period vectors; a noise codebook 218 that stores a plurality of noise waveform vectors; a gain codebook 219 having a gain unit 219a that applies a gain to the pitch period vector selected from the adaptive codebook 217 and a gain unit 219b that applies a gain to the noise waveform vector selected from the noise codebook 218; a prediction gain determination unit 215 that obtains a predicted gain for the next noise waveform vector based on the past output power of the gain unit 219b; a prediction gain unit 216 that is provided on the input side of the gain unit 219b and applies this predicted gain to the selected noise waveform vector; an adder 209 that adds the output vectors from the gain units 219a and 219b and supplies the sum to the synthesis filter 203 as an excitation vector; a subtractor 211 that subtracts the output of the synthesis filter 203 (the synthesized speech vector) from the input speech vector (the input signal) and outputs the result as distortion data; an auditory weighting filter 220 that applies auditory weighting to the distortion data; a distortion power calculation unit 212 that calculates the distortion power from the auditorily weighted distortion data and makes the selections in the codebooks 217 to 219 so that the distortion power is minimized; and a code output unit 213 that outputs the code.
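
The selection loop formed by the synthesis filter, subtractor, weighting filter, and distortion power calculation can be sketched as follows. This is a deliberately reduced illustration: only a single noise codebook is searched, the adaptive codebook, gain codebook, and predicted gain are omitted, the weighting filter is assumed to be a short FIR filter, and all names and values are hypothetical.

```python
def synthesize(excitation, lpc):
    """All-pole synthesis filter: y[n] = e[n] + sum_k lpc[k] * y[n-1-k]."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc += a * y[n - 1 - k]
        y.append(acc)
    return y


def fir(signal, taps):
    """Causal FIR filter used here as the (assumed) auditory weighting filter."""
    return [sum(taps[k] * signal[n - k] for k in range(len(taps)) if n - k >= 0)
            for n in range(len(signal))]


def search_codebook(target, codebook, lpc, weighting_taps):
    """Return (index, weighted distortion power) of the best excitation vector."""
    best = (None, float("inf"))
    for index, vector in enumerate(codebook):
        synth = synthesize(vector, lpc)
        error = [t - s for t, s in zip(target, synth)]   # distortion data
        weighted = fir(error, weighting_taps)            # auditory weighting
        power = sum(v * v for v in weighted)             # distortion power
        if power < best[1]:
            best = (index, power)
    return best


# Hypothetical example: the target is exactly the synthesis of codeword 2.
lpc = [0.5]
codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, -1.0, 0.0, 0.0]]
target = synthesize(codebook[2], lpc)
print(search_codebook(target, codebook, lpc, [1.0, -0.9]))
```

Replacing `weighting_taps` with weights derived from the spectral envelope is where the auditory weighting of the present invention would enter such a loop.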

【００４２】この音声符号化装置において本発明に基づ
く聴覚重み付けを行う場合には、上述の図１１に示した
信号符号化装置をここでの聴覚重み付けフィルタ２２０
として、または聴覚重み付けフィルタ２２０と併用して
用いればよい。これにより、歪データに対して、本発明
に基づく聴覚重み付けがなされることになる。さらに、
ここでは図面を用いては説明しないが、特開平６−２８
２２９８号公報の図２に開示される音声符号化装置にお
いても、その聴覚重み付けフィルタとして、図１１に示
した信号符号化装置を上述のように変形したものを使用
することができる。To perform auditory weighting according to the present invention in this speech encoding device, the signal encoding device shown in FIG. 11 described above may be used as the auditory weighting filter 220, or in combination with the auditory weighting filter 220. The distortion data is then auditorily weighted according to the present invention. Furthermore, although not illustrated here, the speech encoding device disclosed in FIG. 2 of Japanese Patent Application Laid-Open No. 6-282298 can likewise use, as its auditory weighting filter, the signal encoding device shown in FIG. 11 modified as described above.

【００４３】以上説明した本発明に基づく信号符号化方
法及び装置は、それを実現するための計算機プログラム
を、計算機（コンピュータ）に読み込ませ、そのプログ
ラムを実行させることによっても実現できる。信号符号
化を行うためのプログラムは、磁気テープやＣＤ−ＲＯ
Ｍなどの記録媒体によって、あるいは、ネットワークを
介して、計算機に読み込まれる。図１３は、上述の信号
符号化方法を実行する計算機の構成を示すブロック図で
ある。The signal encoding method and device of the present invention described above can also be realized by loading a computer program that implements them into a computer and having the computer execute it. The signal encoding program is loaded into the computer from a recording medium such as magnetic tape or a CD-ROM, or over a network. FIG. 13 is a block diagram showing the configuration of a computer that executes the signal encoding method described above.

【００４４】この計算機は、中央処理装置（ＣＰＵ）２
１と、プログラムやデータを格納するためのハードディ
スク装置２２と、主メモリ２３と、キーボードやマウ
ス、マイクロホンなどの入力装置２４と、ＣＲＴやスピ
ーカなどの表示装置２５と、磁気テープやＣＤ−ＲＯＭ
等の記録媒体２７を読み取る読み取り装置２６と、ネッ
トワークに接続した通信インタフェース２８とから構成
されている。ハードディスク装置２２、主メモリ２３、
入力装置２４、表示装置２５、読み取り装置２６及び通
信インタフェース２８は、いずれも中央処理装置２１に
接続している。ハードディスク装置２２の代わりに、フ
ラッシュＲＯＭなどの不揮発性半導体記憶装置を用いて
もよい。この計算機は、信号符号化を行うためのプログ
ラムを格納した記録媒体２７を読み取り装置２６に装着
し、記録媒体２７からプログラムを読み出してハードデ
ィスク装置２２に格納し、ハードディスク装置２２に格
納されたプログラムを中央処理装置２１が実行すること
により、信号符号化装置として機能するようになる。も
ちろん、ネットワークを介して、信号符号化を行うため
のプログラムをこの計算機にダウンロードするようにし
てもよい。This computer comprises: a central processing unit (CPU) 21; a hard disk device 22 for storing programs and data; a main memory 23; an input device 24 such as a keyboard, mouse, or microphone; a display device 25 such as a CRT or speaker; a reading device 26 that reads a recording medium 27 such as magnetic tape or a CD-ROM; and a communication interface 28 connected to a network. The hard disk device 22, main memory 23, input device 24, display device 25, reading device 26, and communication interface 28 are all connected to the central processing unit 21. A nonvolatile semiconductor storage device such as flash ROM may be used in place of the hard disk device 22. When the recording medium 27 storing the signal encoding program is loaded into the reading device 26, the program is read from the recording medium 27 and stored in the hard disk device 22, and the central processing unit 21 executes the stored program, so that the computer functions as a signal encoding device. The signal encoding program may of course instead be downloaded to the computer over a network.

【００４５】[0045]

【発明の効果】以上説明したように、本発明によれば、
音声・楽音信号を符号化する際に、従来法よりも精度の
高い聴覚マスキングが行なえ、高品質な符号化を行なう
ことが可能となる。具体的には、例えばＭＤＣＴ変換等
によって時系列信号を周波数領域の係数列に変換して量
子化する際に、本発明を用いれば、人間の聴覚マスキン
グ特性を利用して、量子化誤差を知覚し難いように、周
波数軸上で従来法よりも高精度で配分することが可能と
なる。[Effect of the Invention] As described above, according to the present invention, auditory masking can be performed with higher accuracy than with conventional methods when encoding speech and musical tone signals, enabling high-quality encoding. Specifically, when a time-series signal is converted into a frequency-domain coefficient sequence by, for example, the MDCT and then quantized, the present invention makes it possible to exploit human auditory masking characteristics and distribute the quantization error along the frequency axis more precisely than conventional methods, so that the error is hard to perceive.

[Brief description of the drawings]

【図１】本発明の実施の一形態の信号符号化装置の構成
を示すブロック図である。FIG. 1 is a block diagram illustrating a configuration of a signal encoding device according to an embodiment of the present invention.

【図２】スペクトルの山・谷へ重み付けを行う過程を示
すブロック図である。FIG. 2 is a block diagram showing a process of weighting peaks and valleys of a spectrum.

【図３】包絡算出部における処理の詳細を示すブロック
図である。FIG. 3 is a block diagram illustrating details of processing in an envelope calculation unit.

【図４】山・谷推定部における処理の詳細を示すブロッ
ク図である。FIG. 4 is a block diagram illustrating details of processing in the peak/valley estimating unit.

【図５】山・谷推定部により検出された、スペクトラム
包絡における山及び谷の様子の一例を示す図である。FIG. 5 is a diagram illustrating an example of a state of peaks and valleys in a spectrum envelope detected by a peak / valley estimating unit.

【図６】スペクトル包絡の山・谷付近に重み付けを行っ
た例を示す図である。FIG. 6 is a diagram illustrating an example in which weighting is performed around peaks and valleys of a spectral envelope.

【図７】(a)〜(d)は、山・谷付近への重み付け関数の例
を示す図である。FIGS. 7(a) to 7(d) are diagrams illustrating examples of weighting functions applied near peaks and valleys.

【図８】聴覚重み付け処理によって量子化雑音がスペク
トル包絡にマスキングされる様子を示した図である。FIG. 8 is a diagram illustrating how quantization noise is masked by the spectral envelope as a result of the auditory weighting process.

【図９】本発明に基づく信号符号化装置の構成の一例を
示すブロック図である。FIG. 9 is a block diagram illustrating an example of a configuration of a signal encoding device according to the present invention.

【図１０】本発明に基づく聴覚重み付けが適用される符
号器及び復号器の構成の一例を示すブロック図である。FIG. 10 is a block diagram showing an example of a configuration of an encoder and a decoder to which auditory weighting according to the present invention is applied.

【図１１】信号符号化装置の構成の一例を示すブロック
図である。FIG. 11 is a block diagram illustrating an example of a configuration of a signal encoding device.

【図１２】信号符号化装置の構成の一例を示すブロック
図である。FIG. 12 is a block diagram illustrating an example of a configuration of a signal encoding device.

【図１３】信号符号化装置を構成するために使用される
計算機システムの一例を示すブロック図である。FIG. 13 is a block diagram illustrating an example of a computer system used to configure the signal encoding device.

[Explanation of symbols]

１１ Ｔ／Ｆ変換部、１２ 量子化部、１３ 包絡算出部、１４ 山・谷推定部、１５ 重み付け部、１６ 聴覚重み算出部 Reference signs: 11 T/F conversion unit; 12 quantization unit; 13 envelope calculating unit; 14 peak/valley estimating unit; 15 weighting unit; 16 auditory weight calculating unit

フロントページの続き (72)発明者 岩上直樹 東京都千代田区大手町二丁目３番１号 日本電信電話株式会社内 (72)発明者 森岳至 東京都千代田区大手町二丁目３番１号 日本電信電話株式会社内 (72)発明者 千喜良和明 東京都千代田区大手町二丁目３番１号 日本電信電話株式会社内 Ｆターム(参考） 5J064 AA01 BA16 BB03 BC16 BC21 Continued from the front page: (72) Inventor Naoki Iwagami, c/o Nippon Telegraph and Telephone Corporation, 2-3-1 Otemachi, Chiyoda-ku, Tokyo; (72) Inventor Takeshi Mori, c/o Nippon Telegraph and Telephone Corporation, 2-3-1 Otemachi, Chiyoda-ku, Tokyo; (72) Inventor Kazuaki Chikira, c/o Nippon Telegraph and Telephone Corporation, 2-3-1 Otemachi, Chiyoda-ku, Tokyo. F-term (reference): 5J064 AA01 BA16 BB03 BC16 BC21

Claims

[Claims]

1. A signal encoding method for performing quantization on an input signal, comprising: performing a time axis/frequency axis conversion on the input signal to obtain a coefficient sequence on the frequency axis; calculating a spectral envelope based on the coefficient sequence; weighting the amount of information at the positions of the peaks and valleys of the calculated spectral envelope; calculating an auditory weight for quantization based on the information-amount-weighted spectral envelope; and performing quantization based on the auditory weight for quantization.

2. A signal encoding method for performing quantization on an input signal, comprising: performing a time axis/frequency axis conversion on the input signal to obtain a coefficient sequence on the frequency axis; performing a smoothing process on the coefficient sequence by taking an arithmetic mean over a section length to obtain a spectral envelope; performing, one or more times, a smoothing process on the previously obtained spectral envelope by taking an arithmetic mean over a section length longer than the section length used to obtain that previous spectral envelope, to obtain a further spectral envelope; weighting, in any one of the spectral envelopes, the amount of information at the positions of the peaks and valleys in each of the spectral envelopes to obtain an information-amount-weighted spectral envelope; calculating an auditory weight for quantization based on the information-amount-weighted spectral envelope; and performing quantization based on the auditory weight for quantization.

3. A signal encoding method for performing quantization on an input signal, comprising: performing a time axis/frequency axis conversion on the input signal to obtain a coefficient sequence on the frequency axis; calculating a spectral envelope based on the coefficient sequence; estimating the positions of peaks and valleys in the spectral envelope; weighting the amount of information at the estimated positions of the peaks and valleys in the spectral envelope; calculating an auditory weight for quantization based on the information-amount-weighted spectral envelope; and performing quantization based on the auditory weight for quantization, wherein the estimating step comprises: obtaining a first-order differential value of the spectral envelope; obtaining an arithmetic mean of the first-order differential value; obtaining a first-order differential value of the arithmetic mean of the first-order differential value to obtain a second-order differential value; and obtaining an arithmetic mean of the second-order differential value; and wherein, if the first-order differential value or the arithmetic mean of the first-order differential value changes from a positive value to a negative value and the second-order differential value or the arithmetic mean of the second-order differential value is always negative in the vicinity of that change, the frequency is taken as the position of a peak, and if the first-order differential value or the arithmetic mean of the first-order differential value changes from a negative value to a positive value and the second-order differential value or the arithmetic mean of the second-order differential value is always positive in the vicinity of that change, the frequency is taken as the position of a valley.

4. The signal encoding method according to claim 1, wherein the step of weighting the amount of information includes a step of performing a weighting operation at the positions of the peaks and valleys using a weighting function that raises the vicinity of each peak higher and lowers the vicinity of each valley deeper, or a weighting function that lowers the vicinity of each peak and raises the vicinity of each valley so that it becomes shallower.

5. The signal encoding method according to claim 1, wherein the step of calculating the auditory weight for quantization is a step of obtaining the auditory weight for quantization based on an envelope curve in which the amount of information has been weighted at the positions of the peaks and valleys.

6. A signal encoding device for performing quantization on an input signal, comprising: conversion means for performing a time axis/frequency axis conversion on the input signal to obtain a coefficient sequence on the frequency axis; envelope calculating means for calculating a spectral envelope based on the coefficient sequence; weighting means for weighting the amount of information at the positions of the peaks and valleys of the calculated spectral envelope; auditory weight calculating means for calculating an auditory weight for quantization based on the information-amount-weighted spectral envelope; and quantizing means for performing quantization based on the auditory weight for quantization.

7. A signal encoding device for performing quantization on an input signal, comprising: conversion means for performing a time axis/frequency axis conversion on the input signal to obtain a coefficient sequence on the frequency axis; envelope calculating means for performing a smoothing process on the coefficient sequence by taking an arithmetic mean over a section length to obtain a spectral envelope, and then performing, one or more times, a smoothing process on the previously obtained spectral envelope by taking an arithmetic mean over a section length longer than the section length used to obtain that previous spectral envelope, to obtain a further spectral envelope; weighting means for weighting, in any one of the spectral envelopes, the amount of information at the positions of the peaks and valleys in each spectral envelope obtained by the envelope calculating means to obtain an information-amount-weighted spectral envelope; auditory weight calculating means for calculating an auditory weight for quantization based on the information-amount-weighted spectral envelope; and quantizing means for performing quantization based on the auditory weight for quantization.

8. A signal encoding device for performing quantization on an input signal, comprising: conversion means for performing a time axis/frequency axis conversion on the input signal to obtain a coefficient sequence on the frequency axis; envelope calculating means for calculating a spectral envelope based on the coefficient sequence; peak/valley estimating means for estimating the positions of peaks and valleys in the spectral envelope; weighting means for weighting the amount of information at the estimated positions of the peaks and valleys in the spectral envelope; auditory weight calculating means for calculating an auditory weight for quantization based on the information-amount-weighted spectral envelope; and quantizing means for performing quantization based on the auditory weight for quantization, wherein the peak/valley estimating means obtains a first-order differential value of the spectral envelope, obtains an arithmetic mean of the first-order differential value, obtains a first-order differential value of the arithmetic mean of the first-order differential value to obtain a second-order differential value, and obtains an arithmetic mean of the second-order differential value; takes the frequency as the position of a peak if the first-order differential value or the arithmetic mean of the first-order differential value changes from a positive value to a negative value and the second-order differential value or the arithmetic mean of the second-order differential value is always negative in the vicinity of that change; and takes the frequency as the position of a valley if the first-order differential value or the arithmetic mean of the first-order differential value changes from a negative value to a positive value and the second-order differential value or the arithmetic mean of the second-order differential value is always positive in the vicinity of that change.

9. The signal encoding device according to claim 6, wherein the weighting means performs a weighting operation at the positions of the peaks and valleys using a weighting function that raises the vicinity of each peak higher and lowers the vicinity of each valley deeper, or a weighting function that lowers the vicinity of each peak and raises the vicinity of each valley so that it becomes shallower.

10. The signal encoding apparatus according to claim 6, wherein the weighting means obtains an auditory weight for quantization based on an envelope curve obtained by weighting the amount of information to the positions of the peaks and valleys.

11. A computer-readable recording medium storing a signal encoding program that causes a computer to execute: a step of performing a time axis/frequency axis conversion on an input signal to obtain a coefficient sequence on the frequency axis; a step of calculating a spectral envelope based on the coefficient sequence; a step of weighting the amount of information at the positions of the peaks and valleys of the calculated spectral envelope; a step of calculating an auditory weight for quantization based on the information-amount-weighted spectral envelope; and a step of performing quantization based on the auditory weight for quantization.

12. A computer-readable recording medium storing a signal encoding program that causes a computer to execute: a step of performing a time axis/frequency axis conversion on an input signal to obtain a coefficient sequence on the frequency axis; a step of performing a smoothing process on the coefficient sequence by taking an arithmetic mean over a section length to obtain a spectral envelope; a step of performing, one or more times, a smoothing process on the previously obtained spectral envelope by taking an arithmetic mean over a section length longer than the section length used to obtain that previous spectral envelope, to obtain a further spectral envelope; a step of weighting, in any one of the spectral envelopes, the amount of information at the positions of the peaks and valleys in each of the spectral envelopes to obtain an information-amount-weighted spectral envelope; a step of calculating an auditory weight for quantization based on the information-amount-weighted spectral envelope; and a step of performing quantization based on the auditory weight for quantization.

13. A computer-readable recording medium storing a signal encoding program that causes a computer to execute: a step of performing a time axis/frequency axis conversion on an input signal to obtain a coefficient sequence on the frequency axis; a step of calculating a spectral envelope based on the coefficient sequence; a step of estimating the positions of peaks and valleys in the spectral envelope; a step of weighting the amount of information at the estimated positions of the peaks and valleys in the spectral envelope; a step of calculating an auditory weight for quantization based on the information-amount-weighted spectral envelope; and a step of performing quantization based on the auditory weight for quantization, wherein the estimating step comprises: obtaining a first-order differential value of the spectral envelope; obtaining an arithmetic mean of the first-order differential value; obtaining a first-order differential value of the arithmetic mean of the first-order differential value to obtain a second-order differential value; and obtaining an arithmetic mean of the second-order differential value; and wherein, if the first-order differential value or the arithmetic mean of the first-order differential value changes from a positive value to a negative value and the second-order differential value or the arithmetic mean of the second-order differential value is always negative in the vicinity of that change, the frequency is taken as the position of a peak, and if the first-order differential value or the arithmetic mean of the first-order differential value changes from a negative value to a positive value and the second-order differential value or the arithmetic mean of the second-order differential value is always positive in the vicinity of that change, the frequency is taken as the position of a valley.

14. The recording medium according to claim 11, wherein the step of weighting the amount of information includes a step of performing a weighting operation at the positions of the peaks and valleys using a weighting function that raises the vicinity of each peak higher and lowers the vicinity of each valley deeper, or a weighting function that lowers the vicinity of each peak and raises the vicinity of each valley so that it becomes shallower.

15. The recording medium according to claim 11, wherein the step of calculating the auditory weight for quantization is a step of obtaining the auditory weight for quantization based on an envelope curve in which the amount of information has been weighted at the positions of the peaks and valleys.
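
The processing recited in claims 2 to 4 (repeated smoothing over growing section lengths, derivative-based peak and valley estimation, and a peak/valley weighting function) can be sketched as follows. This is an illustration under assumptions, not the claimed implementation: the envelope is built from coefficient magnitudes, the averaging window is centered and clipped at the edges, the "vicinity" is a fixed number of neighbouring bins, the weighting function is triangular and multiplicative, and all names and default values are hypothetical.

```python
def moving_mean(values, length):
    """Arithmetic mean over a centered window of `length` samples, clipped at the edges."""
    half = length // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def multistage_envelopes(coeffs, lengths):
    """Each stage averages the previous envelope over a longer section length."""
    if any(b <= a for a, b in zip(lengths, lengths[1:])):
        raise ValueError("section lengths must be strictly increasing")
    current = [abs(c) for c in coeffs]  # assumption: envelopes are built from magnitudes
    envelopes = []
    for length in lengths:
        current = moving_mean(current, length)
        envelopes.append(current)
    return envelopes


def diff(values):
    """First-order difference."""
    return [b - a for a, b in zip(values, values[1:])]


def find_peaks_and_valleys(envelope, mean_length=3, vicinity=1):
    d1 = moving_mean(diff(envelope), mean_length)  # averaged first derivative
    d2 = moving_mean(diff(d1), mean_length)        # averaged second derivative
    peaks, valleys = [], []
    for i in range(1, len(d1)):
        # Second-derivative samples around the sign change between d1[i-1] and d1[i].
        near = d2[max(0, i - 1 - vicinity):min(len(d2), i + vicinity)]
        if d1[i - 1] > 0 and d1[i] < 0 and all(v < 0 for v in near):
            peaks.append(i)
        elif d1[i - 1] < 0 and d1[i] > 0 and all(v > 0 for v in near):
            valleys.append(i)
    return peaks, valleys


def emphasize(envelope, peaks, valleys, alpha=0.5, width=1):
    """Raise the envelope near peaks and lower it near valleys (triangular shape assumed)."""
    out = list(envelope)
    for centers, sign in ((peaks, 1.0), (valleys, -1.0)):
        for center in centers:
            for d in range(-width, width + 1):
                i = center + d
                if 0 <= i < len(out):
                    shape = 1.0 - abs(d) / (width + 1)  # 1 at the center, tapering outward
                    out[i] *= 1.0 + sign * alpha * shape
    return out


# Hypothetical envelope with two peaks and one valley between them.
envelope = [1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 3, 4, 5, 4, 3, 2, 1]
peaks, valleys = find_peaks_and_valleys(envelope)
print(peaks, valleys)
print(emphasize(envelope, peaks, valleys))
```

A negative `alpha` gives the alternative recited in claims 4, 9, and 14, in which peaks are lowered and valleys are raised. Exact zeros of the averaged first derivative (flat tops) are not treated as sign changes in this sketch.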