JP2001109497A - Audio signal encoding device and audio signal encoding method

Info

Publication number: JP2001109497A
Application number: JP28222599A
Authority: JP
Inventors: Akira Usami; 陽宇佐見
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1999-10-04
Filing date: 1999-10-04
Publication date: 2001-04-20

Abstract

PROBLEM TO BE SOLVED: To provide an audio signal encoding device and an audio signal encoding method that reduce the number of bits required for encoding and prevent deterioration of sound quality without further reducing the number of quantization bits used for encoding, even when bits are in short supply overall. SOLUTION: Even when a second conversion length with low frequency resolution is selected at the time the hearing model is calculated, a hearing model calculation part 3 keeps the masking level that represents the hearing model from dropping, and thereby keeps the number of bits allocated to the frequency spectral data small.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

Field of the Invention: The present invention relates to an audio signal encoding device and an audio signal encoding method that compress and encode a digitized audio signal by exploiting the characteristics of human hearing.

[0002]

Description of the Related Art: In recent years, the digitalization of audio equipment has advanced rapidly. For example, ATRAC (Adaptive Transform Acoustic Coding), the audio signal encoding scheme adopted for the MiniDisc (MD), and the various MPEG audio signal encoding schemes adopted in digital satellite broadcasting are digital audio signal processing technologies that compress the original signal efficiently while preserving the sound quality of the signal reproduced after decoding.

[0003] One audio signal encoding technique for compressing such digital audio signals is subband coding. In this scheme, the input digital audio signal is passed through a band-splitting filter such as a QMF (Quadrature Mirror Filter) and divided into time-series subband data for each of a plurality of frequency components; a number of quantization bits is allocated to each set of subband data by exploiting the characteristics of human hearing, and the data is quantized and then encoded.

[0004] Separately, there is transform coding. In this scheme, the input digital audio signal is subjected to a time-frequency transform such as the fast Fourier transform (FFT), the discrete cosine transform (DCT), or the modified discrete cosine transform (MDCT) and converted into a plurality of frequency spectrum data; a number of quantization bits is allocated to each item of frequency spectrum data by exploiting the characteristics of human hearing, and the data is quantized and then encoded.

[0005] The MPEG AAC scheme uses the MDCT to convert the time-series digital audio signal into frequency spectrum data and is one such transform coding scheme. There are also schemes that combine subband coding and transform coding; ATRAC, used for the MiniDisc (MD), is one of them. All of the above schemes for compressing digital audio signals compress the original signal by exploiting the characteristics of human hearing. One such characteristic is the minimum audible limit, which gives, for each frequency, the lowest signal level that can be perceived in quiet.

[0006] Other characteristics include the simultaneous frequency masking effect, in which, among signals of different frequencies that occur at the same time, a sound with a high signal level makes a sound with a low signal level hard to hear, and the time masking effect, in which, among signals that follow one another on the time axis, a sound with a high signal level makes a sound with a low signal level hard to hear. By using these characteristics to discard signals that are not perceived, or to reduce the number of quantization bits for signals that are hard to perceive, a high compression ratio is achieved while the sound quality on reproduction is preserved.

[0007] FIG. 4 is a block diagram showing the configuration of a conventional audio signal encoding device that implements transform coding. In FIG. 4, reference numeral 41 denotes a frequency conversion unit that calculates frequency spectrum data using either a first conversion length, which converts a digital audio signal in units of M time-series samples into M items of frequency spectrum data on the frequency axis, or a second conversion length, which converts a digital audio signal in units of N time-series samples, where N = M/K and K is a positive integer, into N items of frequency spectrum data on the frequency axis, repeated K times in succession on the time axis. Reference numeral 42 denotes a conversion length determination unit that selects, according to the digital audio signal, the first or second conversion length, which indicates the unit in which the frequency conversion unit 41 calculates frequency spectrum data. Reference numeral 43 denotes a minimum audible value supply unit that supplies minimum audible values on the frequency axis corresponding to the first or second conversion length. Reference numeral 44 denotes an auditory model determination unit that, based on the frequency spectrum data calculated by the frequency conversion unit 41 and the conversion length selected by the conversion length determination unit 42, calculates an auditory model corresponding to human auditory characteristics, which is used when determining the number of bits for quantizing the frequency spectrum data. Reference numeral 45 denotes a quantization and encoding unit that uses the auditory model calculated by the auditory model determination unit 44 to determine the number of bits for quantizing the frequency spectrum data, quantizes the frequency spectrum data, and generates an encoded bit string. The encoding process is executed in units of M digital audio signal samples.

[0008] The reason the frequency conversion unit 41 performs frequency conversion with two different conversion lengths, the first and the second, is explained below with reference to FIG. 5. When a digital audio signal encoded by the above transform coding scheme is decoded and reproduced, the quantization error, which is determined by the number of quantization bits used at encoding time, appears spread across the unit over which the frequency conversion was performed. That is, when a digital audio signal whose level changes sharply is encoded with the first conversion length and then decoded and reproduced, the quantization error that coincides with the high-level part of the signal is not perceived, as shown in FIG. 5(a), but the quantization error that coincides with the low-level part is perceived. To prevent this, the second conversion length is selected so that the unit of frequency conversion becomes smaller; as shown in FIG. 5(b), this shortens the time over which the quantization error appears and makes it harder to perceive. With the conversion length selected in this way, the encoding process, which is executed in units of M digital audio signal samples, proceeds as follows. When the first conversion length is selected, frequency conversion is performed on the M samples as one unit, and M items of frequency spectrum data Sp(0, j) (0 ≤ j ≤ M−1) are calculated. When the second conversion length is selected, frequency conversion is performed in units of N samples, and N items of frequency spectrum data Sp(i, j) (0 ≤ i ≤ K−1, 0 ≤ j ≤ N−1) are calculated; K such groups of N items are calculated in succession on the time axis. FIG. 6 shows the frequency spectrum data Sp(i, j) (i = 0, 0 ≤ j ≤ 15) calculated when the first conversion length is selected with M = 16. FIG. 7 shows the frequency spectrum data Sp(i, j) (0 ≤ i ≤ 3, 0 ≤ j ≤ 3) calculated when the second conversion length is selected with M = 16, N = 4, and K = 4. As FIG. 7 shows, compared with the first conversion length, the second conversion length yields 1/K (here 1/4) as many items of frequency spectrum data on the frequency axis, so the frequency resolution is lower, and K times (here 4 times) as many on the time axis, so the time resolution is higher.
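The two data layouts described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names `dct_ii` and `frequency_transform` are hypothetical, and a naive DCT-II stands in for whatever transform the encoder actually uses (AAC, for instance, uses an MDCT with overlapping windows, which is not modelled here).

```python
import math

def dct_ii(block):
    """Naive DCT-II of one block (a stand-in for the encoder's real transform)."""
    n = len(block)
    return [
        sum(x * math.cos(math.pi / n * (t + 0.5) * j) for t, x in enumerate(block))
        for j in range(n)
    ]

def frequency_transform(samples, k, use_second_length):
    """Return Sp as rows: Sp[i][j], with i the time index and j the frequency index."""
    m = len(samples)
    if not use_second_length:
        return [dct_ii(samples)]              # first length: 1 row of M values
    if m % k != 0:
        raise ValueError("M must be divisible by K")
    n = m // k
    # second length: K rows of N = M / K values, one row per time slot
    return [dct_ii(samples[i * n:(i + 1) * n]) for i in range(k)]

samples = [0.0] * 12 + [1.0, -1.0, 1.0, -1.0]   # silence, then a burst (M = 16)
long_sp = frequency_transform(samples, 4, use_second_length=False)
short_sp = frequency_transform(samples, 4, use_second_length=True)
print(len(long_sp), len(long_sp[0]))    # 1 16
print(len(short_sp), len(short_sp[0]))  # 4 4
```

With the second length, the three silent time slots produce all-zero rows and only the last row carries energy, so any quantization error stays confined to the slot that contains the burst.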

[0009] The encoding process executed by the conventional audio signal encoding device shown in FIG. 4 is described below. First, the conversion length determination unit 42 determines, according to the time-series digital audio signal, the conversion length that indicates the unit of frequency conversion, and notifies the frequency conversion unit 41, the minimum audible value supply unit 43, the auditory model calculation unit 44, and the quantization and encoding unit 45 of the determined conversion length by means of a conversion length designation signal S41.

[0010] Next, the frequency conversion unit 41 frequency-converts the M time-series digital audio signal samples according to the conversion length designation signal S41, which is sent from the conversion length determination unit 42 and indicates the first or second conversion length, and calculates the frequency spectrum data Sp(i, j). Here, i = 0 and 0 ≤ j ≤ M−1 when the first conversion length is selected, and 0 ≤ i ≤ K−1 and 0 ≤ j ≤ N−1 when the second conversion length is selected.

[0011] The minimum audible value supply unit 43 selects, according to the conversion length designation signal S41 sent from the conversion length determination unit 42, the frequency-axis minimum audible values Q1(j) or Q2(j) corresponding to the first or second conversion length, and supplies them to the auditory model determination unit 44. Here, 0 ≤ j ≤ M−1 when the first conversion length is selected, and 0 ≤ j ≤ N−1 when the second conversion length is selected.

[0012] The auditory model calculation unit 44 calculates the masking threshold level M(i, j), which is used when determining the number of bits for quantizing the frequency spectrum data Sp(i, j), based on the conversion length designation signal S41 sent from the conversion length determination unit 42, the frequency spectrum data Sp(i, j) calculated by the frequency conversion unit 41, and the minimum audible value Q1(j) supplied from the minimum audible value supply unit 43.

[0013] The quantization and encoding unit 45 uses the masking threshold level M(i, j) calculated by the auditory model calculation unit 44 to determine the number of bits for quantizing the frequency spectrum data Sp(i, j), quantizes Sp(i, j), and outputs an encoded bit string. FIG. 8 shows the minimum audible value Q1(j) for each frequency supplied from the minimum audible value supply unit 43 when the first conversion length is selected with M = 16. In FIG. 8, the minimum audible value for frequency j (0 ≤ j ≤ 15) is indicated by the shaded level Q1(j).

[0014] In contrast to the first conversion length, FIG. 9 shows the minimum audible value Q2(j) for each frequency supplied from the minimum audible value supply unit 43 when the second conversion length is selected with M = 16, N = 4, and K = 4, for which the frequency resolution is 1/K, that is, 1/4. In FIG. 9, the minimum audible value for frequency j (0 ≤ j ≤ 3) is indicated by the shaded level Q2(j).

[0015] Conventionally, the minimum audible values have generally been decimated as follows: Q2(0) in FIG. 9 is set to the same level as Q1(3), the smallest of Q1(j) (0 ≤ j ≤ 3) in FIG. 8; Q2(1) is set to the same level as Q1(7), the smallest of Q1(j) (4 ≤ j ≤ 7); Q2(2) is set to the same level as Q1(10), the smallest of Q1(j) (8 ≤ j ≤ 11); and Q2(3) is set to the same level as Q1(12), the smallest of Q1(j) (12 ≤ j ≤ 15).
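The conventional decimation described above amounts to taking the minimum of Q1 over each group of K neighbouring frequencies. The following Python sketch shows that rule; the function name `decimate_min_audible` and the numeric levels are made up for illustration (the levels are merely shaped so that the group minima fall at j = 3, 7, 10, and 12 as in the text).

```python
def decimate_min_audible(q1, k):
    """Q2(b) = minimum of Q1 over the b-th group of K consecutive frequencies."""
    if len(q1) % k != 0:
        raise ValueError("len(q1) must be divisible by K")
    return [min(q1[b * k:(b + 1) * k]) for b in range(len(q1) // k)]

# Hypothetical levels (dB) for M = 16; not taken from the patent figures.
q1 = [40, 30, 22, 15, 12, 10, 8, 6, 7, 6, 5, 9, 14, 20, 28, 38]
q2 = decimate_min_audible(q1, 4)
print(q2)  # [15, 6, 5, 14]
```

Because each Q2 value is a group minimum, it can never exceed any Q1 value in its group, which is why the threshold floor ends up lower with the second conversion length.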

[0016] As FIG. 9 shows, this means that when the second conversion length, which has low frequency resolution, is selected, the minimum audible value Q2(j) used in determining the masking threshold level M(i, j) is lower than when the first conversion length is selected. FIGS. 8 and 9 illustrate the case in which a minimum audible value is supplied for each item of frequency spectrum data. When encoding is instead performed in units of blocks that are contiguous on the frequency axis, each grouping several items of frequency spectrum data per critical band as used in psychoacoustics, a minimum audible value is supplied for each block. If the number of blocks on the frequency axis is M = 16 when the first conversion length is selected, the minimum audible values are as shown in FIG. 8; if the number of blocks on the frequency axis is M = 4 when the second conversion length is selected, they are as shown in FIG. 9.

[0017]

Problems to Be Solved by the Invention: In a conventional audio signal encoding device such as the one described above, however, the minimum audible value when the second conversion length is selected is lower than when the first conversion length is selected. When that minimum audible value serves as the auditory model used to determine the number of bits allocated to the frequency spectrum data, more bits are allocated to the frequency spectrum data with the second conversion length than with the first, and the number of bits needed to encode that frequency spectrum data increases.

[0018] Because more bits are needed to encode the frequency spectrum data when the second conversion length is selected, the available bits run short, the quantization error grows, and the sound quality deteriorates. The present invention solves this problem. It provides an audio signal encoding device and an audio signal encoding method that can reduce the number of bits needed to encode the frequency spectrum data, as determined when the auditory model is calculated, and that can prevent deterioration of sound quality without further reducing the number of quantization bits used for encoding even when bits are in short supply overall.
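Why a lower threshold costs more bits can be seen with a rough calculation. The sketch below is not the patent's allocation rule; it assumes the common rule of thumb that each bit of a uniform quantizer lowers the quantization noise by about 6.02 dB, and the function name `bits_needed` and the levels are hypothetical.

```python
import math

DB_PER_BIT = 20 * math.log10(2)   # about 6.02 dB of noise reduction per bit

def bits_needed(signal_db, threshold_db):
    """Bits so that quantization noise sits at or below the threshold (rule of thumb)."""
    margin = signal_db - threshold_db
    return max(0, math.ceil(margin / DB_PER_BIT))

print(bits_needed(60, 15))  # 8  (low threshold: more bits)
print(bits_needed(60, 30))  # 5  (higher threshold: fewer bits)
```

Under this assumption, raising the threshold for a 60 dB component from 15 dB to 30 dB saves three bits, which is the kind of saving the invention aims for by keeping the masking threshold from dropping.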

[0019]

Means for Solving the Problems: To solve the above problem, the audio signal encoding device and audio signal encoding method of the present invention are characterized in that, even when the minimum audible value becomes lower because the second conversion length, which has low frequency resolution, is selected when the auditory model is calculated, the time masking effect is used to keep the masking threshold level that represents the auditory model from dropping, so that the number of bits allocated to the frequency spectrum data is kept small.

[0020] This reduces the number of bits needed to encode the frequency spectrum data, as determined when the auditory model is calculated, and prevents deterioration of sound quality without further reducing the number of quantization bits used for encoding even when bits are in short supply overall.

[0021]

Embodiments of the Invention: The audio signal encoding device according to claim 1 of the present invention comprises: a frequency conversion unit that calculates frequency spectrum data using one of a first conversion length, which converts a digital audio signal in units of M time-series samples into M items of frequency spectrum data, and a second conversion length, which converts a digital audio signal in units of N time-series samples, where N = M/K and K is a positive integer, into N items of frequency spectrum data repeated K times in succession on the time axis; a conversion length determination unit that selects, according to the digital audio signal, the first or second conversion length used when the frequency conversion unit calculates the frequency spectrum data; a minimum audible value supply unit that supplies frequency-axis minimum audible values corresponding to the first or second conversion length; an auditory model calculation unit that, based on the frequency spectrum data obtained by the frequency conversion unit and the conversion length selected by the conversion length determination unit, calculates an auditory model corresponding to human auditory characteristics, which is used when determining the number of bits for quantizing the frequency spectrum data; and a quantization and encoding unit that quantizes the frequency spectrum data with the number of bits determined from the auditory model calculated by the auditory model calculation unit and generates an encoded bit string. The device further comprises: a time masking calculation unit that, when the second conversion length is selected, calculates a time masking threshold using the time masking effect between a plurality of items of frequency spectrum data that are consecutive on the time axis, among the K consecutive sets of frequency spectrum data calculated by the frequency conversion unit; and the auditory model determination unit, which calculates the auditory model using the frequency spectrum data, the minimum audible value, and the time masking threshold.

[0022] In the audio signal encoding device according to claim 2, the time masking calculation unit operates as follows when the second conversion length of claim 1 is selected: with L a positive integer of 1 or more, it divides the N items of frequency spectrum data into L blocks each containing at least one item of frequency spectrum data; calculates, for each block, a signal level given by the maximum absolute value or the sum of squares of the frequency spectrum data in the block; and calculates a time masking threshold using the time masking effect between a plurality of signal levels that are consecutive on the time axis, among the L signal levels repeated K times in succession on the time axis.
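The per-block signal level of claim 2 can be sketched in Python as follows. The function name `block_levels` is hypothetical, and splitting the N values into L near-equal contiguous blocks is an assumption made for the example; the text only requires that each block contain at least one item.

```python
def block_levels(spectrum, num_blocks, mode="max_abs"):
    """Split N coefficients into L contiguous blocks and return one level per block."""
    n = len(spectrum)
    if not 1 <= num_blocks <= n:
        raise ValueError("need 1 <= L <= N")
    # Near-equal contiguous blocks; each holds at least one coefficient.
    edges = [b * n // num_blocks for b in range(num_blocks + 1)]
    levels = []
    for lo, hi in zip(edges, edges[1:]):
        block = spectrum[lo:hi]
        if mode == "max_abs":
            levels.append(max(abs(c) for c in block))   # maximum absolute value
        else:
            levels.append(sum(c * c for c in block))    # sum of squares
    return levels

sp_row = [3.0, -4.0, 0.5, 0.25, -1.0, 2.0]            # one time slot, N = 6
print(block_levels(sp_row, 3))                 # [4.0, 0.5, 2.0]
print(block_levels(sp_row, 3, "sum_squares"))  # [25.0, 0.3125, 5.0]
```

Working on L block levels instead of N individual coefficients reduces the number of time masking comparisons per time slot from N to L.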

[0023] The audio signal encoding method according to claim 3 calculates frequency spectrum data from a digital audio signal using a conversion length selected according to the digital audio signal from a first conversion length, which converts a digital audio signal in units of M time-series samples into M items of frequency spectrum data, and a second conversion length, which converts a digital audio signal in units of N time-series samples, where N = M/K and K is a positive integer, into N items of frequency spectrum data repeated K times in succession on the time axis; calculates, based on the frequency spectrum data and the selected conversion length, an auditory model corresponding to human auditory characteristics, which is used when determining the number of bits for quantizing the frequency spectrum data; and quantizes the frequency spectrum data calculated with the selected conversion length using the number of bits determined from the calculated auditory model, thereby generating an encoded bit string. In calculating the auditory model when the second conversion length is selected, the method calculates a time masking threshold using the time masking effect between a plurality of items of frequency spectrum data that are consecutive on the time axis, among the K consecutive sets of frequency spectrum data calculated with that conversion length, and calculates the auditory model using the frequency spectrum data, the frequency-axis minimum audible value corresponding to the first or second conversion length, and the time masking threshold.

[0024] In the audio signal encoding method according to claim 4, when the second conversion length of claim 3 is selected, with L a positive integer of 1 or more, the N items of frequency spectrum data are divided into L blocks each containing at least one item of frequency spectrum data; a signal level given by the maximum absolute value or the sum of squares of the frequency spectrum data is calculated for each block; and a time masking threshold is calculated using the time masking effect between a plurality of signal levels that are consecutive on the time axis, among the L signal levels repeated K times in succession on the time axis.

[0025] With these configurations and methods, even when the minimum audible value becomes lower because the second conversion length, which has low frequency resolution, is selected when the auditory model is calculated, the time masking effect keeps the masking threshold level that represents the auditory model from dropping, and the number of bits allocated to the frequency spectrum data is kept small. An audio signal encoding device and an audio signal encoding method according to embodiments of the present invention are described in detail below with reference to the drawings. (Embodiment 1) An audio signal encoding device according to Embodiment 1 of the present invention is described.

[0026] FIG. 1 is a block diagram showing the configuration of the audio signal encoding device of Embodiment 1. In FIG. 1, the frequency conversion unit 1, the conversion length determination unit 2, the minimum audible value supply unit 4, and the quantization and encoding unit 5 are components that operate in the same way as those of the conventional audio signal encoding device in FIG. 4. Reference numeral 3 denotes a time masking calculation unit that, when the second conversion length is selected, calculates a time masking threshold Tm(i, j) using the time masking effect between the K items of frequency spectrum data Sp(i, j) that are consecutive on the time axis. Reference numeral 5 denotes an auditory model determination unit that determines the masking threshold level M(i, j), which represents the auditory model, based on the minimum audible value Q1(j) or Q2(j) supplied from the minimum audible value supply unit 4, the time masking threshold Tm(i, j) calculated by the time masking calculation unit 3, and the frequency spectrum data Sp(i, j). When the first conversion length is selected, the auditory model determination unit 5 determines the masking threshold level M(i, j) based on the minimum audible value Q1(j) supplied from the minimum audible value supply unit 4 and the frequency spectrum data Sp(i, j). When the second conversion length is selected, it compares the minimum audible value Q2(j) supplied from the minimum audible value supply unit 4 with the time masking threshold Tm(i, j) calculated by the time masking calculation unit 3, and determines and outputs the masking threshold level M(i, j) based on the larger of the two and the frequency spectrum data Sp(i, j).
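The selection rule in this paragraph can be written as a small Python sketch. It covers only the choice of the threshold floor that feeds the masking threshold level M(i, j); how M(i, j) is then derived from that floor and Sp(i, j) is not specified in the text and is left out. The function name `threshold_floor` and all numeric values are hypothetical.

```python
def threshold_floor(use_second_length, q1, q2, tm, i, j):
    """Floor used when determining M(i, j).

    First conversion length:  Q1(j).
    Second conversion length: the larger of Q2(j) and Tm(i, j).
    """
    if not use_second_length:
        return q1[j]
    return max(q2[j], tm[i][j])

q1 = [40.0] * 16                      # hypothetical Q1(j), M = 16
q2 = [15.0, 6.0, 5.0, 14.0]           # hypothetical Q2(j), N = 4
tm = [[10.0, 2.0, 1.0, 3.0],          # hypothetical Tm(0, j)
      [28.0, 9.0, 4.0, 20.0]]         # hypothetical Tm(1, j)
print(threshold_floor(True, q1, q2, tm, 0, 0))   # 15.0 (Q2 is larger)
print(threshold_floor(True, q1, q2, tm, 1, 0))   # 28.0 (Tm is larger)
print(threshold_floor(False, q1, q2, tm, 0, 0))  # 40.0 (first length uses Q1)
```

Taking the maximum means the time masking threshold can only raise the floor, never lower it below the minimum audible value.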

【００２７】図２は図１に示す時間マスキング算出部３
における時間軸上でＫ個の連続する周波数スペクトルデ
ータＳｐ（ｉ，ｊ）の間の時間マスキング効果を用いて
算出される時間マスキング閾値Ｔｍ（ｉ，ｊ）を示す図
である。図３は、｛ｊ＝０，Ｋ＝４｝とした場合で時間
軸上の４個の連続する周波数スペクトルデータＳｐ
（ｉ，０）（０≦ｉ≦３）の間でＳｐ（ｉ１，０）がＳ
ｐ（ｉ２，０）（０≦ｉ１≦３，０≦ｉ２≦３，ｉ１≠
ｉ２）にあたえる時間マスキング効果による時間マスキ
ング閾値Ｔｍ（ｉ，０）（ｉ＝０，…，３）を示す。FIG. 2 is a diagram showing the time masking threshold Tm(i, j) that the time masking calculation unit 3 shown in FIG. 1 calculates using the time masking effect between K consecutive frequency spectrum data Sp(i, j) on the time axis. For the case {j = 0, K = 4}, it shows the time masking thresholds Tm(i, 0) (i = 0, ..., 3) that result from the time masking effect exerted by Sp(i1, 0) on Sp(i2, 0) (0 ≤ i1 ≤ 3, 0 ≤ i2 ≤ 3, i1 ≠ i2) among the four consecutive frequency spectrum data Sp(i, 0) (0 ≤ i ≤ 3) on the time axis.

【００２８】時間マスキング算出部３においては、各々
の周波数スペクトルデータＳｐ（ｉ，ｊ）が時間軸上に
連続する周波数スペクトルデータに与える時間マスキン
グ効果による時間マスキング閾値Ｔｍ（ｉ，ｊ）を算出
し、その最大値を周波数スペクトルＳｐ（ｉ，ｊ）に対
する時間マスキング閾値Ｔｍ（ｉ，ｊ）とする。また、
処理量を削減するために、時間軸上で隣接する周波数ス
ペクトルデータの間のマスキングレベルを算出してもよ
い。さらに、時間軸上のＫ個の連続する周波数スペクト
ルデータＳｐ（ｉ，ｊ）の間のマスキングレベルの算出
を任意の周波数ｊで行ってもよい。In the time masking calculation unit 3, the time masking threshold produced by the time masking effect that each frequency spectrum datum Sp(i, j) exerts on the other frequency spectrum data consecutive on the time axis is calculated, and the maximum of these values is taken as the time masking threshold Tm(i, j) for the frequency spectrum datum Sp(i, j). To reduce the amount of processing, the masking level may instead be calculated only between frequency spectrum data that are adjacent on the time axis. Furthermore, the calculation of the masking level among the K consecutive frequency spectrum data Sp(i, j) on the time axis may be performed at any frequency j.
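
The "maximum over all maskers" rule above can be illustrated with a short sketch. The specification does not give the masking function itself, so the linear decay model in dB used here (a fixed offset plus a per-block decay, steeper for backward masking than for forward masking) and every parameter value are assumptions chosen only for illustration. The `adjacent_only` flag corresponds to the reduced-processing variant that considers only neighbouring data on the time axis.

```python
def time_masking_thresholds(levels_db, forward_decay_db=10.0,
                            backward_decay_db=25.0, offset_db=10.0,
                            adjacent_only=False):
    """Compute Tm(i, j) for the K time-consecutive data at one frequency j.

    levels_db[i] is the level of Sp(i, j) in dB. The decay model and its
    default values are hypothetical; only the max-over-maskers structure
    follows the text.
    """
    k = len(levels_db)
    thresholds = []
    for target in range(k):
        candidates = []
        for masker in range(k):
            if masker == target:
                continue
            distance = abs(masker - target)
            if adjacent_only and distance > 1:
                continue
            # An earlier datum masks a later one (forward masking);
            # a later datum masks an earlier one (backward masking).
            decay = forward_decay_db if masker < target else backward_decay_db
            candidates.append(levels_db[masker] - offset_db - decay * distance)
        # Tm(i, j) is the largest threshold any other datum imposes.
        thresholds.append(max(candidates) if candidates else float("-inf"))
    return thresholds
```

For K = 4, each datum receives up to three candidate thresholds, and the largest one is kept.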

【００２９】図２で示される時間マスキング閾値Ｔｍ
（０，０）が時間マスキング算出部３で算出されるとき
は、Ｔｍ（０，３）は最小可聴値供給部４から供給され
る最小可聴値Ｑ２（０）よりも小さいので、聴覚モデル
決定部５においては最小可聴値Ｑ２（０）と周波数スペ
クトルデータＳｐ（０，０）とに基づいて周波数スペク
トルデータＳｐ（０，０）を量子化するビット数を決定
する際に用いるマスキング閾値レベルＭ（０，０）を決
定する。しかしながら、Ｔｍ（１，０）は最小可聴値Ｑ
２（０）よりも大きいので、聴覚モデル決定部５におい
ては、Ｔｍ（１，０）と周波数スペクトルデータＳｐ
（１，０）とに基づいて周波数スペクトルデータＳｐ
（１，０）を量子化するビット数を決定する際に用いる
マスキング閾値レベルＭ（１，０）を決定する。Ｔｍ
（２，０）およびＴｍ（３，０）も同様に最小可聴値Ｑ
２（０）よりも大きいので、これらの値と周波数スペク
トルデータＳｐ（２，０）およびＳｐ（３，０）とに基
づいてマスキング閾値レベルＭ（２，０）およびＭ
（３，０）を決定する。When the time masking threshold Tm(0, 0) shown in FIG. 2 is calculated by the time masking calculation unit 3, Tm(0, 0) is smaller than the minimum audible value Q2(0) supplied from the minimum audible value supply unit 4. The auditory model determination unit 5 therefore determines the masking threshold level M(0, 0), which is used to determine the number of bits for quantizing the frequency spectrum datum Sp(0, 0), based on Q2(0) and Sp(0, 0). Tm(1, 0), however, is larger than Q2(0), so the auditory model determination unit 5 determines the masking threshold level M(1, 0), used to determine the number of bits for quantizing Sp(1, 0), based on Tm(1, 0) and Sp(1, 0). Tm(2, 0) and Tm(3, 0) are likewise larger than Q2(0), so M(2, 0) and M(3, 0) are determined based on these values and on Sp(2, 0) and Sp(3, 0).

【００３０】これにより、第２の変換長を選択したとき
に時間軸上に連続する周波数スペクトルデータＳｐ
（ｉ，ｊ）の間の時間マスキング効果による時間マスキ
ング閾値Ｔｍ（ｉ，ｊ）を算出し、時間マスキング閾値
Ｔｍ（ｉ，ｊ）が、最小可聴値レベルＱ２（ｊ）よりも
大きい場合には、この時間マスキング閾値Ｔｍ（ｉ，
ｊ）と、周波数スペクトルデータＳｐ（ｉ，ｊ）とに基
づいて聴覚モデルを示すマスキング閾値レベルＭ（ｉ，
ｊ）を決定することで、周波数スペクトルデータＳｐ
（ｉ，ｊ）に割り当てるビット数を小さくすることがで
き、符号化に必要なビット数を少なくすることができ
る。Thus, when the second conversion length is selected, the time masking threshold Tm(i, j) resulting from the time masking effect between the frequency spectrum data Sp(i, j) consecutive on the time axis is calculated. When Tm(i, j) is larger than the minimum audible value level Q2(j), the masking threshold level M(i, j) representing the auditory model is determined based on Tm(i, j) and the frequency spectrum data Sp(i, j). This reduces the number of bits allocated to Sp(i, j), and hence the number of bits required for encoding.
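
Why a higher masking threshold level lowers the bit demand can be seen in a rough sketch. The specification does not say how the number of bits is derived from M(i, j); the rule below, which assumes a uniform quantizer gaining about 6.02 dB of signal-to-noise ratio per bit and allocates just enough bits to keep quantization noise below M(i, j), is an assumption for illustration, and the function name is hypothetical.

```python
import math


def bits_for_level(sp_db, m_db, db_per_bit=6.02):
    """Rough bit count that keeps quantization noise below M(i, j).

    sp_db is the level of Sp(i, j) and m_db the masking threshold level,
    both in dB. The 6.02 dB-per-bit rule is an assumed allocation model,
    not one given in the text.
    """
    signal_to_mask_db = sp_db - m_db
    if signal_to_mask_db <= 0:
        # The datum lies at or below the masking threshold: no bits needed.
        return 0
    return math.ceil(signal_to_mask_db / db_per_bit)
```

Under this assumed model, raising the threshold from 20 dB to 45 dB for a 60 dB datum cuts the allocation from 7 bits to 3.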

【００３１】このように符号化に必要なビット数を少な
くすることで、ビット数が不足して量子化誤差が大きく
なるのを防ぎ、音質劣化を防ぐことができる。なお、実
施の形態１の説明では時間マスキング閾値Ｔｍ（ｉ，
ｊ）と最小可聴値Ｑ２（ｊ）との大きい方と、周波数ス
ペクトルデータＳｐ（ｉ，ｊ）とに基づいて聴覚モデル
を示すマスキング閾値レベルＭ（ｉ，ｊ）を決定する場
合を説明したが、時間マスキング閾値Ｔｍ（ｉ，ｊ）を
用いて最小可聴値Ｑ２（ｊ）を補正し、この補正した最
小可聴値と周波数スペクトルデータＳｐ（ｉ，ｊ）とに
基づいて聴覚モデルを示すマスキング閾値レベルＭ
（ｉ，ｊ）を決定してもよい。Reducing the number of bits required for encoding in this way prevents the quantization error from growing because of a shortage of bits, and thus prevents deterioration of sound quality. In the description of Embodiment 1, the masking threshold level M(i, j) representing the auditory model is determined based on the larger of the time masking threshold Tm(i, j) and the minimum audible value Q2(j), together with the frequency spectrum data Sp(i, j). Alternatively, the minimum audible value Q2(j) may be corrected using Tm(i, j), and M(i, j) may be determined based on the corrected minimum audible value and Sp(i, j).

【００３２】また、実施の形態１の説明では、時間軸に
連続する周波数スペクトルデータの間の時間マスキング
効果により時間マスキング閾値を算出し、この時間マス
キング閾値と周波数スペクトルデータ毎に供給される最
小可聴値とから周波数スペクトルデータＳｐ（ｉ，ｊ）
毎の量子化ビット数を決定する際に用いる聴覚モデルを
示すマスキング閾値レベルＭ（ｉ，ｊ）を決定する場合
を説明したが、聴覚心理で用いられる臨界帯域毎に複数
の周波数スペクトルデータＳｐ（ｉ，ｊ）を纏めた個数
Ｌ（Ｌは正整数）で周波数軸上に連続するブロックを単
位に符号化を行う場合には、ブロック毎の周波数スペク
トルデータＳｐ（ｉ，ｊ）の絶対値の最大値あるいは自
乗和で与えられる信号レベルＰ（ｉ，ｋ）（０≦ｋ≦Ｌ
−１）を算出し、時間軸上に連続するブロック毎の信号
レベルＰ（ｉ，ｋ）の間の時間マスキング効果による時
間マスキング閾値Ｔｍ（ｉ，ｋ）を算出し、このブロッ
ク毎に供給される最小可聴値Ｑ３（ｋ）（０≦ｋ≦Ｌ−
１）とからブロックに含まれる周波数スペクトルデータ
Ｓｐ（ｉ，ｊ）を量子化するビット数を決定する際に用
いるブロック毎の聴覚モデルを示すマスキング閾値レベ
ルＭ（ｉ，ｋ）を決定することで第２の変換長を選択し
たことにより最小可聴値が低くなる場合にも、時間マス
キング効果を用いて聴覚モデルを示すマスキング閾値レ
ベルＭ（ｉ，ｋ）が低下するのを防ぎ、周波数スペクト
ルデータに割り当てるビット数を少なく抑えることがで
きる。（実施の形態２）本発明の実施の形態２のオーディオ信
号符号化方法を説明する。The description of Embodiment 1 covers the case in which a time masking threshold is calculated from the time masking effect between frequency spectrum data consecutive on the time axis, and the masking threshold level M(i, j) representing the auditory model, used to determine the number of quantization bits for each frequency spectrum datum Sp(i, j), is determined from this time masking threshold and the minimum audible value supplied for each frequency spectrum datum. When encoding is instead performed in units of L blocks (L is a positive integer) that are consecutive on the frequency axis, each block grouping a plurality of frequency spectrum data Sp(i, j) for one of the critical bands used in psychoacoustics, the procedure is as follows. A signal level P(i, k) (0 ≤ k ≤ L − 1), given by the maximum absolute value or the sum of squares of the frequency spectrum data Sp(i, j) in each block, is calculated. A time masking threshold Tm(i, k) is then calculated from the time masking effect between the signal levels P(i, k) of blocks consecutive on the time axis. From Tm(i, k) and the minimum audible value Q3(k) (0 ≤ k ≤ L − 1) supplied for each block, the masking threshold level M(i, k) representing the auditory model for each block is determined; it is used to determine the number of bits for quantizing the frequency spectrum data Sp(i, j) contained in the block. In this way, even when selecting the second conversion length lowers the minimum audible value, the time masking effect prevents M(i, k) from falling, and the number of bits allocated to the frequency spectrum data can be kept small.

(Embodiment 2) An audio signal encoding method according to Embodiment 2 of the present invention will now be described.

【００３３】図３は、本実施の形態２のオーディオ信号
符号化方法の処理の流れを示すフロー図である。図３に
おいて、３１はデジタルオーディオ信号に応じて後に説
明する周波数変換処理３２で周波数スペクトルデータＳ
ｐ（ｉ，ｊ）を算出する単位を示す第１の変換長あるい
は第２の変換長を選択する変換長決定処理、３２は時系
列で個数Ｍを単位とするデジタルオーディオ信号を周波
数軸上の個数Ｍの周波数スペクトルデータＳｐ（ｉ，
ｊ）に変換する第１の変換長、あるいは時系列でＫ個を
正整数としてＭ／Ｋで与えられる個数Ｎを単位とするデ
ジタルオーディオ信号を、時間軸上で個数がＫで連続す
る周波数軸上の個数Ｎの周波数スペクトルデータＳｐ
（ｉ，ｊ）に変換する第２の変換長により、周波数スペ
クトルデータＳｐ（ｉ，ｊ）を算出する周波数変換処
理、３３は変換長決定処理３１で決定された変換長を判
定し、分岐する変換長判定処理、３４は変換長決定処理
３１で第２の変換長が選択されたときに、時間軸上に連
続する周波数スペクトルデータＳｐ（ｉ，ｊ）の間の時
間マスキング効果による時間マスキング閾値Ｔｍ（ｉ，
ｊ）を算出する時間マスキングレベル算出処理、３５は
周波数スペクトルデータＳｐ（ｉ，ｊ）と、変換長に対
応する最小可聴値Ｑ１（ｊ）あるいはＱ２（ｊ）と、時
間マスキングレベル算出処理３４で算出された時間マス
キング閾値Ｔｍ（ｉ，ｊ）とに基づいて、後に説明する
量子化および符号化処理３６で周波数スペクトルデータ
Ｓｐ（ｉ，ｊ）に割り当てるビット数を決定する際に用
いる聴覚モデルを示すマスキング閾値レベルＭ（ｉ，
ｊ）を算出する聴覚モデル決定処理である。FIG. 3 is a flowchart showing the flow of processing in the audio signal encoding method according to Embodiment 2. In FIG. 3, 31 denotes a conversion length determination process that selects, according to the digital audio signal, either the first conversion length or the second conversion length as the unit in which the frequency spectrum data Sp(i, j) is calculated in the frequency conversion process 32 described below. 32 denotes a frequency conversion process that calculates the frequency spectrum data Sp(i, j) using either the first conversion length, which converts a digital audio signal in units of M time-series samples into M frequency spectrum data Sp(i, j) on the frequency axis, or the second conversion length, which, with K a positive integer, converts a digital audio signal in units of N = M/K time-series samples into N frequency spectrum data Sp(i, j) on the frequency axis, K of which are consecutive on the time axis. 33 denotes a conversion length judgment process that checks the conversion length determined in the conversion length determination process 31 and branches accordingly. 34 denotes a time masking level calculation process that, when the second conversion length has been selected in the conversion length determination process 31, calculates the time masking threshold Tm(i, j) resulting from the time masking effect between frequency spectrum data Sp(i, j) consecutive on the time axis. 35 denotes an auditory model determination process that calculates the masking threshold level M(i, j) representing the auditory model, based on the frequency spectrum data Sp(i, j), the minimum audible value Q1(j) or Q2(j) corresponding to the conversion length, and the time masking threshold Tm(i, j) calculated in the time masking level calculation process 34. M(i, j) is used in the quantization and encoding process 36 described below to determine the number of bits allocated to Sp(i, j).
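
The relationship between M, K, and N in the frequency conversion process 32 can be sketched as follows. This is a simplified illustration under stated assumptions: the function name is hypothetical, the transform itself is omitted because the text does not name it, and windowing or overlap between units, which a practical transform may require, is ignored.

```python
def split_into_conversion_units(samples, k, use_second_length):
    """Split M time-series samples into the units handed to the transform.

    First conversion length: one unit of M samples.
    Second conversion length: K time-consecutive units of N = M / K samples.
    Hypothetical helper; no windowing or overlap is modelled.
    """
    m = len(samples)
    if not use_second_length:
        return [list(samples)]
    if k <= 0 or m % k != 0:
        raise ValueError("K must be a positive integer that divides M")
    n = m // k
    # Unit i yields the frequency spectrum data Sp(i, j), 0 <= j < N.
    return [list(samples[i * n:(i + 1) * n]) for i in range(k)]
```

With M = 8 and K = 4, the second conversion length yields four units of N = 2 samples each, so each unit has finer time resolution and coarser frequency resolution than the single M-sample unit.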

【００３４】ここで、最小可聴値Ｑ１（ｊ）あるいはＱ
２（ｊ）は、従来のオーディオ信号符号化装置における
最小可聴値供給部４３から供給される最小可聴値と同じ
であり、図８および図９で示される。３６は変換長にし
たがって、周波数スペクトルデータＳｐ（ｉ，ｊ）と聴
覚モデル決定処理３５で算出された聴覚モデルを示すマ
スキング閾値レベルＭ（ｉ，ｊ）とに基づいて、周波数
スペクトルデータＳｐ（ｉ，ｊ）を量子化する量子化ビ
ット数を割り当て、周波数スペクトルデータＳｐ（ｉ，
ｊ）を量子化し符号化ビットストリームを生成する量子
化および符号化処理である。Here, the minimum audible values Q1(j) and Q2(j) are the same as the minimum audible values supplied from the minimum audible value supply unit 43 in the conventional audio signal encoding device, and are shown in FIG. 8 and FIG. 9. 36 denotes a quantization and encoding process that, according to the conversion length, assigns the number of quantization bits for the frequency spectrum data Sp(i, j) based on Sp(i, j) and the masking threshold level M(i, j) representing the auditory model calculated in the auditory model determination process 35, quantizes Sp(i, j), and generates an encoded bit stream.

【００３５】図３のフロー図で示されるオーディオ信号
符号化方法において、第２の変換長が選択されたとき
に、時間マスキングレベル算出処理３４により時間軸上
に連続する周波数スペクトルデータＳｐ（ｉ，ｊ）の間
の時間マスキング効果によるマスキング閾値Ｔｍ（ｉ，
ｊ）を算出し、聴覚モデル決定処理３５において第２の
変換長に対応する最小可聴値レベルＱ２（ｊ）と時間マ
スキング閾値Ｔｍ（ｉ，ｊ）との大きい方と、周波数ス
ペクトルデータＳｐ（ｉ，ｊ）とに基づいて聴覚モデル
を示すマスキング閾値レベルＭ（ｉ，ｊ）を決定するこ
とにより、周波数スペクトルデータＳｐ（ｉ，ｊ）に割
り当てるビット数を小さくすることができ、符号化に必
要なビット数を少なくすることができる。In the audio signal encoding method shown in the flowchart of FIG. 3, when the second conversion length is selected, the time masking level calculation process 34 calculates the time masking threshold Tm(i, j) resulting from the time masking effect between frequency spectrum data Sp(i, j) consecutive on the time axis. The auditory model determination process 35 then determines the masking threshold level M(i, j) representing the auditory model based on the larger of the minimum audible value level Q2(j) corresponding to the second conversion length and Tm(i, j), together with Sp(i, j). This reduces the number of bits allocated to Sp(i, j), and hence the number of bits required for encoding.

【００３６】これにより、符号化に必要なビット数を少
なくすることで、ビット数が不足して量子化ビット数が
小さくなるのを防ぎ、音質劣化を防ぐことができる。な
お、実施の形態２の説明では時間マスキング閾値Ｔｍ
（ｉ，ｊ）と最小可聴値Ｑ２（ｊ）との大きい方と、周
波数スペクトルデータＳｐ（ｉ，ｊ）とに基づいて聴覚
モデルを示すマスキング閾値レベルＭ（ｉ，ｊ）を決定
する場合を説明したが、時間マスキング閾値Ｔｍ（ｉ，
ｊ）を用いて最小可聴値Ｑ２（ｊ）を補正し、この補正
した最小可聴値と周波数スペクトルデータＳｐ（ｉ，
ｊ）とに基づいて聴覚モデルを示すマスキング閾値レベ
ルＭ（ｉ，ｊ）を決定してもよい。Reducing the number of bits required for encoding in this way prevents the number of quantization bits from shrinking because of a shortage of bits, and thus prevents deterioration of sound quality. In the description of Embodiment 2, the masking threshold level M(i, j) representing the auditory model is determined based on the larger of the time masking threshold Tm(i, j) and the minimum audible value Q2(j), together with the frequency spectrum data Sp(i, j). Alternatively, the minimum audible value Q2(j) may be corrected using Tm(i, j), and M(i, j) may be determined based on the corrected minimum audible value and Sp(i, j).

【００３７】また、実施の形態２の説明では、時間軸に
連続する周波数スペクトルデータの間の時間マスキング
効果により時間マスキング閾値を算出し、この時間マス
キング閾値と周波数スペクトルデータ毎に供給される最
小可聴値とから周波数スペクトルデータＳｐ（ｉ，ｊ）
毎の量子化ビット数を決定する際に用いる聴覚モデルを
示すマスキング閾値レベルＭ（ｉ，ｊ）を決定する場合
を説明したが、聴覚心理で用いられる臨界帯域毎に複数
の周波数スペクトルデータＳｐ（ｉ，ｊ）を纏めた個数
Ｌ（Ｌは正整数）で周波数軸上に連続するブロックを単
位に符号化を行う場合には、ブロック毎の周波数スペク
トルデータＳｐ（ｉ，ｊ）の絶対値の最大値あるいは自
乗和で与えられる信号レベルＰ（ｉ，ｋ）（０≦ｋ≦Ｌ
−１）を算出し、時間軸上に連続するブロック毎の信号
レベルＰ（ｉ，ｋ）の間の時間マスキング効果による時
間マスキング閾値Ｔｍ（ｉ，ｋ）を算出し、このブロッ
ク毎に供給される最小可聴値Ｑ３（ｋ）（０≦ｋ≦Ｌ−
１）とからブロックに含まれる周波数スペクトルデータ
Ｓｐ（ｉ，ｊ）を量子化するビット数を決定する際に用
いるブロック毎の聴覚モデルを示すマスキング閾値レベ
ルＭ（ｉ，ｋ）を決定することで第２の変換長を選択し
たことにより最小可聴値が低くなる場合にも、時間マス
キング効果を用いて聴覚モデルを示すマスキング閾値レ
ベルＭ（ｉ，ｋ）が低下するのを防ぎ、周波数スペクト
ルデータに割り当てるビット数を少なく抑えることがで
きる。The description of Embodiment 2 covers the case in which a time masking threshold is calculated from the time masking effect between frequency spectrum data consecutive on the time axis, and the masking threshold level M(i, j) representing the auditory model, used to determine the number of quantization bits for each frequency spectrum datum Sp(i, j), is determined from this time masking threshold and the minimum audible value supplied for each frequency spectrum datum. When encoding is instead performed in units of L blocks (L is a positive integer) that are consecutive on the frequency axis, each block grouping a plurality of frequency spectrum data Sp(i, j) for one of the critical bands used in psychoacoustics, the procedure is as follows. A signal level P(i, k) (0 ≤ k ≤ L − 1), given by the maximum absolute value or the sum of squares of the frequency spectrum data Sp(i, j) in each block, is calculated. A time masking threshold Tm(i, k) is then calculated from the time masking effect between the signal levels P(i, k) of blocks consecutive on the time axis. From Tm(i, k) and the minimum audible value Q3(k) (0 ≤ k ≤ L − 1) supplied for each block, the masking threshold level M(i, k) representing the auditory model for each block is determined; it is used to determine the number of bits for quantizing the frequency spectrum data Sp(i, j) contained in the block. In this way, even when selecting the second conversion length lowers the minimum audible value, the time masking effect prevents M(i, k) from falling, and the number of bits allocated to the frequency spectrum data can be kept small.
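
The per-block signal level P(i, k) described above can be sketched as follows. The function name and the representation of the block layout as a list of block sizes are hypothetical; the two level definitions (maximum absolute value, or sum of squares) are the ones named in the text.

```python
def block_signal_levels(spectrum, block_sizes, use_sum_of_squares=False):
    """Compute P(i, k) for one time index i.

    spectrum holds Sp(i, j) for 0 <= j < N; block_sizes lists how many
    consecutive data each of the L blocks contains (an assumed layout
    standing in for the critical-band grouping).
    """
    if any(size < 1 for size in block_sizes) or sum(block_sizes) != len(spectrum):
        raise ValueError("blocks must be non-empty and cover the spectrum exactly")
    levels = []
    start = 0
    for size in block_sizes:
        block = spectrum[start:start + size]
        if use_sum_of_squares:
            levels.append(sum(x * x for x in block))
        else:
            levels.append(max(abs(x) for x in block))
        start += size
    return levels
```

The resulting L levels per time index then take the place of the individual Sp(i, j) values when the time masking threshold Tm(i, k) is calculated.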

【００３８】[0038]

【発明の効果】以上のように本発明によれば、聴覚モデ
ルの算出の際に周波数分解能が低い第２の変換長を選択
した場合にも、聴覚モデルを示すマスキングレベルを低
下させることなく、周波数スペクトルデータに割り当て
るビット数を少なく抑えることができる。[Effects of the Invention] As described above, according to the present invention, even when the second conversion length, which has low frequency resolution, is selected in calculating the auditory model, the number of bits allocated to the frequency spectrum data can be kept small without lowering the masking level representing the auditory model.

【００３９】そのため、聴覚モデルの算出の際に実行す
る周波数スペクトルデータに対する符号化に必要なビッ
ト数を減少させることができ、全体としてビット数が不
足している場合にも符号化のための量子化ビット数をさ
らに減少させることなく、音質の劣化を防止することが
できる。Therefore, the number of bits required for encoding the frequency spectrum data, which is carried out when the auditory model is calculated, can be reduced. Even when the total number of bits is insufficient, deterioration of sound quality can be prevented without further reducing the number of quantization bits used for encoding.

[Brief description of the drawings]

【図１】本発明の実施の形態１のオーディオ信号符号化
装置の構成を示すブロック図FIG. 1 is a block diagram illustrating a configuration of an audio signal encoding device according to a first embodiment of the present invention.

【図２】同実施の形態１における周波数スペクトルデー
タの説明図FIG. 2 is an explanatory diagram of frequency spectrum data according to the first embodiment.

【図３】本発明の実施の形態２のオーディオ信号符号化
方法の処理を示すフロー図FIG. 3 is a flowchart showing the processing of an audio signal encoding method according to the second embodiment of the present invention.

【図４】従来のオーディオ信号符号化装置の構成を示す
ブロック図FIG. 4 is a block diagram showing a configuration of a conventional audio signal encoding device.

【図５】同従来例における第１の変換長および第２の変
換長で符号化し復号したデジタルオーディオ信号を示す
図FIG. 5 is a diagram showing a digital audio signal encoded and decoded using a first conversion length and a second conversion length in the conventional example.

【図６】同従来例における第１の変換長で得られる周波
数スペクトルデータの説明図FIG. 6 is an explanatory diagram of frequency spectrum data obtained with a first conversion length in the conventional example.

【図７】同従来例における第２の変換長で得られる周波
数スペクトルデータの説明図FIG. 7 is an explanatory diagram of frequency spectrum data obtained with a second conversion length in the conventional example.

【図８】同従来例における第１の変換長に対応する周波
数軸の最小可聴値の説明図FIG. 8 is an explanatory diagram of a minimum audible value on a frequency axis corresponding to a first conversion length in the conventional example.

【図９】同従来例における第２の変換長に対応する周波
数軸の最小可聴値の説明図FIG. 9 is an explanatory diagram of a minimum audible value on a frequency axis corresponding to a second conversion length in the conventional example.

[Explanation of symbols]

１、４１　周波数変換部 (1, 41: frequency conversion unit)
２、４２　変換長決定部 (2, 42: conversion length determination unit)
３　時間マスキング算出部 (3: time masking calculation unit)
４、４３　最小可聴値供給部 (4, 43: minimum audible value supply unit)
５、４４　聴覚モデル決定部 (5, 44: auditory model determination unit)
６、４５　量子化および符号化部 (6, 45: quantization and encoding unit)

Claims

1. An audio signal encoding device having: a frequency conversion unit that calculates frequency spectrum data using either a first conversion length, which converts a digital audio signal in units of M time-series samples into M frequency spectrum data, or a second conversion length, which, with K a positive integer, converts a digital audio signal in units of N = M/K time-series samples into N frequency spectrum data, K of which are consecutive on the time axis; a conversion length determination unit that selects, according to the digital audio signal, the first conversion length or the second conversion length with which the frequency conversion unit calculates the frequency spectrum data; a minimum audible value supply unit that supplies a minimum audible value on the frequency axis according to the first conversion length or the second conversion length; an auditory model calculation unit that calculates, based on the frequency spectrum data obtained by the frequency conversion unit and the conversion length selected by the conversion length determination unit, an auditory model corresponding to human auditory characteristics, used in determining the number of bits for quantizing the frequency spectrum data; and a quantization and encoding unit that quantizes the frequency spectrum data with the number of bits determined by the auditory model calculated by the auditory model calculation unit and generates an encoded bit sequence, the device being characterized by comprising: a time masking calculation unit that, when the second conversion length is selected, calculates a time masking threshold using the time masking effect between a plurality of the frequency spectrum data consecutive on the time axis among the K time-consecutive frequency spectrum data calculated by the frequency conversion unit; and an auditory model determination unit that calculates the auditory model using the frequency spectrum data, the minimum audible value, and the time masking threshold.

2. The audio signal encoding device according to claim 1, comprising a time masking calculation unit that, when the second conversion length is selected, divides the N frequency spectrum data into L blocks (L being a positive integer of 1 or more) each containing at least one frequency spectrum datum, calculates for each block a signal level given by the maximum absolute value or the sum of squares of the frequency spectrum data, and calculates a time masking threshold using the time masking effect between a plurality of the signal levels consecutive on the time axis among the L signal levels, K of which are consecutive on the time axis.

3. An audio signal encoding method in which: frequency spectrum data is calculated from a digital audio signal with a conversion length selected, according to the digital audio signal, from a first conversion length, which converts a digital audio signal in units of M time-series samples into M frequency spectrum data, and a second conversion length, which, with K a positive integer, converts a digital audio signal in units of N = M/K time-series samples into N frequency spectrum data, K of which are consecutive on the time axis; an auditory model corresponding to human auditory characteristics, used in determining the number of bits for quantizing the frequency spectrum data, is calculated based on the frequency spectrum data and the selected conversion length; and the frequency spectrum data calculated with the selected conversion length is quantized with the number of bits determined by the calculated auditory model to generate an encoded bit sequence, the method being characterized in that, when the second conversion length is selected, a time masking threshold is calculated using the time masking effect between a plurality of the frequency spectrum data consecutive on the time axis among the K time-consecutive frequency spectrum data calculated with that conversion length, and the auditory model is calculated using the frequency spectrum data, the minimum audible value on the frequency axis corresponding to the first conversion length or the second conversion length, and the time masking threshold.

4. The audio signal encoding method according to claim 3, wherein, when the second conversion length is selected, the N frequency spectrum data are divided into L blocks (L being a positive integer of 1 or more) each containing at least one frequency spectrum datum, a signal level given by the maximum absolute value or the sum of squares of the frequency spectrum data is calculated for each block, and a time masking threshold is calculated using the time masking effect between a plurality of the signal levels consecutive on the time axis among the L signal levels, K of which are consecutive on the time axis.