JP6492915B2

JP6492915B2 - Encoding apparatus, encoding method, and program

Info

Publication number: JP6492915B2
Application number: JP2015083660A
Authority: JP
Inventors: 周作伊藤; 洋平岸; 舞子平原; 土永　義照; 義照土永; 美由紀白川; 祥吾中村; 晃釜野; 猛大谷
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2015-04-15
Filing date: 2015-04-15
Publication date: 2019-04-03
Anticipated expiration: 2035-04-15
Also published as: JP2016206244A

Description

The present invention relates to an encoding device, an encoding method, and a program.

Some encoding schemes for audio signals and speech signals (hereinafter collectively referred to as "audio signals"), such as Advanced Audio Coding (AAC), reduce the amount of information by exploiting human auditory characteristics. In this type of scheme, the quantization error, which grows when an audio signal is quantized with fewer bits, is kept at or below a predetermined masking threshold, so that the number of bits required for encoding (that is, the amount of information) is reduced without increasing perceptible noise.

The ideal value of the masking threshold is the upper limit of the amount of quantization error that humans cannot perceive. The masking threshold is therefore calculated based on a psychoacoustic model. Hereinafter, the ideal masking threshold calculated based on the psychoacoustic model is referred to as the initial masking threshold.

However, in encoding under a low-bit-rate condition (for example, 64 kbps or less), few bits are available, so the quantization error often cannot be kept at or below the initial masking threshold. In that case, the initial masking threshold is corrected based on the bit-rate condition (see, for example, Non-Patent Document 1).

As a method for using a limited number of bits efficiently in encoding under a low-bit-rate condition, a method is also known that adaptively switches the allocation of bits to each band between fixed and variable according to the audio signal (see, for example, Patent Document 1).

Patent Document 1: JP 2000-151413 A

Non-Patent Document 1: "3GPP TS 26.403 V9.0.0", [online], 3GPP, retrieved March 8, 2015, Internet <URL: http://www.arib.or.jp/IMT-2000/V900Jul11/5_Appendix/Rel9/26/26403-900.pdf>

Methods of correcting the masking threshold are broadly divided into those performed under a condition that permits bands to be lost through quantization and those performed under a condition that does not permit band loss.

When the masking threshold is corrected under a condition that permits band loss, a large correction amount may cause a band containing sound perceptible to humans to be lost through quantization. If such a band is lost through quantization (encoding), the reproduced sound feels unnatural to the listener when the encoded audio signal is reproduced (decoded). An increase in the number of lost bands therefore degrades sound quality.

On the other hand, when the masking threshold is corrected under a condition that does not permit band loss, an upper limit is set on the masking threshold of each band. When the correction amount of a band reaches its upper limit and no further correction is possible, the correction amount of the other bands is increased to make up for the band whose correction amount cannot be increased (in other words, whose allocated bits cannot be reduced). As a result, the masking threshold of a band with a large difference between the initial masking threshold and the upper limit is corrected excessively, and fewer bits are allocated to encoding that band. A band with a large difference between the initial masking threshold and the upper limit is a band that is important for sound quality. That is, when the masking threshold is corrected under a condition that does not permit band loss, fewer bits are allocated to bands important for sound quality, which degrades sound quality.

In one aspect, an object of the present invention is to suppress degradation of sound quality when an audio signal is encoded using a masking threshold based on auditory characteristics.

An encoding device according to one aspect of the present invention converts an audio signal into a frequency spectrum and quantizes and encodes the frequency spectrum, and includes a threshold generation unit, a threshold correction unit, and a recorrection control unit. The threshold generation unit generates, based on the frequency spectrum, an initial masking threshold for use in quantization. The threshold correction unit corrects the initial masking threshold based on the amount of bits available for quantizing the frequency spectrum. When the similarity in overall shape between the masking threshold corrected by the threshold correction unit and the initial masking threshold is equal to or less than a reference value, the recorrection control unit sets, for each band of the frequency spectrum, whether loss of the band is permitted, and causes the threshold correction unit to correct the initial masking threshold again.

According to the above aspect, degradation of sound quality can be suppressed when an audio signal is encoded using a masking threshold based on auditory characteristics.

A block diagram showing a configuration example of an encoding device according to a first embodiment of the present invention.
A flowchart (part 1) showing the encoding process in the encoding device according to the first embodiment.
A flowchart (part 2) showing the encoding process in the encoding device according to the first embodiment.
A flowchart (part 3) showing the encoding process in the encoding device according to the first embodiment.
A flowchart showing the loss-permitted band setting process in the first embodiment.
A graph showing an example of a frequency spectrum and an initial masking threshold.
A graph illustrating the relationship between the masking threshold obtained by the first correction and the initial masking threshold.
A graph illustrating the relationship between the masking threshold obtained by the second correction and the masking threshold obtained by the first correction.
A graph illustrating the relationship between the masking threshold obtained by the third correction and the masking threshold obtained by the second correction.
A schematic diagram showing a hardware configuration example of a computer used as the encoding device.
A flowchart showing a modification of the masking threshold correction process according to the first embodiment.
A flowchart showing the masking threshold correction process according to a second embodiment of the present invention.
A flowchart showing the loss-permitted band setting process in the second embodiment.
A graph illustrating a method of determining the proportion of bands for which loss is permitted.
A flowchart showing the loss-permitted band setting process according to a third embodiment of the present invention.
A graph illustrating a method of determining the proportion of bands for which loss is permitted.

[First Embodiment]
FIG. 1 is a block diagram showing a configuration example of an encoding device according to the first embodiment of the present invention.

As shown in FIG. 1, the encoding device 1 of the present embodiment includes a block switching unit 10, an MDCT processing unit 12, a masking threshold generation unit 14, an auditory characteristic calculation unit 16, a masking threshold correction unit 18, a recorrection control unit 20, a quantization unit 22, an encoding unit 24, and a multiplexing unit 26.

The block switching unit 10 switches, based on the characteristics of the input signal (audio signal), the block length used when Modified Discrete Cosine Transform (MDCT) processing is performed on the input signal. In AAC encoding, the block length is switched between a long block (1024 points) and a short block (128 points).

The MDCT processing unit 12 performs MDCT processing on the input signal with a window length corresponding to the long block or the short block, and converts the input signal into a frequency spectrum. In AAC encoding, MDCT processing with a window length of 2048 is performed for a long block, and MDCT processing with a window length of 256 is performed for a short block.

The masking threshold generation unit 14 performs psychoacoustic analysis on the input signal and generates the masking threshold that is optimal for quantizing the frequency spectrum obtained from the input signal (the initial masking threshold sfbThr₀). The initial masking threshold sfbThr₀ is generated for each scale factor band sfb (hereinafter also referred to as "band sfb") of the frequency spectrum. The masking threshold generation unit 14 also determines the bands sfb to be encoded based on the power value (input power) mdct_pow(sfb) and the initial masking threshold sfbThr₀(sfb) of each band sfb of the frequency spectrum. Furthermore, after determining the bands sfb to be encoded, the masking threshold generation unit 14 determines whether quantization using the initial masking threshold is possible, in other words, whether the masking threshold needs to be corrected.

The auditory characteristic calculation unit 16 calculates the auditory characteristics needed to correct the masking threshold. In the present embodiment, the auditory characteristic calculation unit 16 calculates the signal-to-mask ratio (SMR) of each band sfb as the auditory characteristic.

The masking threshold correction unit 18 corrects the masking threshold based on the auditory characteristic (signal-to-mask ratio) and the upper limit of the masking threshold set for each band sfb. When a set of initial masking thresholds is corrected for the first time, the masking threshold correction unit 18 of the present embodiment sets, before the correction, the upper limits of the masking thresholds of all bands sfb to values indicating that loss through quantization is not permitted. As described later, when the set of initial masking thresholds is corrected for the second or a subsequent time, the masking threshold correction unit 18 performs the correction under a condition that permits loss of some bands.

The recorrection control unit 20 determines whether to adopt the corrected masking threshold and, if it is not adopted, causes the masking threshold correction unit 18 to correct the masking threshold again. When having the masking threshold corrected again, the recorrection control unit 20 sets the bands for which loss through quantization is permitted. The recorrection control unit 20 includes an acceptance/rejection determination unit 20a, a loss-permitted band setting unit 20b, and a storage unit 20c.

The acceptance/rejection determination unit 20a determines whether to adopt the corrected masking threshold. In the present embodiment, this is determined based on the similarity in overall shape between the corrected masking threshold and the initial masking threshold. The cross-correlation value between the corrected masking threshold sfbThr(sfb) and the initial masking threshold sfbThr₀(sfb) is used as the similarity in overall shape.

When the acceptance/rejection determination unit 20a determines that the corrected masking threshold is not to be adopted, the loss-permitted band setting unit 20b sets a band for which loss is permitted. In the present embodiment, among the bands whose masking threshold upper limit is set to a value that does not permit loss, loss is permitted for the band with the lowest importance. The signal-to-mask ratio (SMR) is used as the importance of a band: the larger the signal-to-mask ratio, the more important the band.
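The band selection rule described above can be sketched as follows. This is only an illustration: the function name `pick_band_to_release`, the dictionary layout, and the `released` set are hypothetical and do not come from the patent text.

```python
from typing import Dict, Optional, Set


def pick_band_to_release(smr: Dict[int, float], released: Set[int]) -> Optional[int]:
    """Return the band with the smallest SMR among bands that still forbid loss.

    smr      -- signal-to-mask ratio per band index (larger = more important)
    released -- bands for which loss is already permitted (hypothetical bookkeeping)
    """
    candidates = [band for band in smr if band not in released]
    if not candidates:
        return None  # every band already permits loss
    return min(candidates, key=lambda band: smr[band])
```

Calling the function repeatedly, and adding each returned band to `released`, permits loss for bands in order of increasing importance.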

The storage unit 20c stores the information needed to correct the masking threshold again. In the present embodiment, it stores information including the initial masking threshold sfbThr₀(sfb), the signal-to-mask ratio, the upper limit of the masking threshold, and the importance for each band sfb to be encoded.

The quantization unit 22 quantizes the bands to be encoded in the frequency spectrum using either the initial masking threshold or the corrected masking threshold.

The encoding unit 24 encodes the values obtained by quantizing the frequency spectrum. In AAC encoding, the encoding unit 24 Huffman-encodes the quantized values.

The multiplexing unit 26 multiplexes the encoded audio signal to generate an encoded stream.

Next, the encoding process in the encoding device 1 according to the present embodiment will be described.
FIG. 2A is a flowchart (part 1) showing the encoding process in the encoding device according to the first embodiment. FIG. 2B is a flowchart (part 2) showing the encoding process in the encoding device according to the first embodiment. FIG. 2C is a flowchart (part 3) showing the encoding process in the encoding device according to the first embodiment.

The encoding device 1 of the present embodiment performs the encoding process shown in FIGS. 2A to 2C on each encoding unit of data, such as a frame, in the input signal (audio signal).

As shown in FIG. 2A, the encoding device 1 first converts one frame of the input signal into a frequency spectrum and calculates the power value mdct_pow(sfb) of each band sfb (step S10). The processing of step S10 is performed by the block switching unit 10 and the MDCT processing unit 12.

The block switching unit 10 selects whether the block length for MDCT processing is the long block or the short block, and switches accordingly. The block length is selected by a known selection method, for example, based on the power fluctuation ratio and the prediction gain fluctuation ratio of the input signal.

The MDCT processing unit 12 performs MDCT processing with a window length corresponding to the block length selected by the block switching unit 10, and converts the input signal into a frequency spectrum. The MDCT processing unit 12 converts the input signal x_in into the frequency spectrum mdct(k) by, for example, the following Expression (1).

N in Expression (1) is the window length of the MDCT processing. In AAC encoding, MDCT processing is performed with the window length N set to 2048 for a long block and to 256 for a short block.
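Expression (1) itself is not reproduced in this text. As a rough illustration of what an MDCT with window length N computes, the following is a minimal direct-form sketch. The sine window, the scaling factor of 2, and the phase offset n0 = (N/2 + 1)/2 are assumptions based on the commonly used MDCT definition, not the patent's own formula. The O(N²) loop is for readability, not speed.

```python
import math
from typing import List, Sequence


def mdct(x_in: Sequence[float]) -> List[float]:
    """Direct-form MDCT: N windowed input samples -> N/2 spectral coefficients."""
    n = len(x_in)              # window length N (2048 or 256 in AAC)
    half = n // 2
    n0 = (half + 1) / 2.0      # phase offset of the common MDCT definition (assumed)
    spectrum = []
    for k in range(half):
        acc = 0.0
        for i in range(n):
            window = math.sin(math.pi * (i + 0.5) / n)  # sine window (assumed)
            acc += window * x_in[i] * math.cos(
                2.0 * math.pi / n * (i + n0) * (k + 0.5))
        spectrum.append(2.0 * acc)  # scaling factor 2 is an assumption
    return spectrum
```

A window of N input samples yields N/2 coefficients, which matches the long block (1024 points from a 2048-sample window) and short block (128 points from a 256-sample window) described above.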

The MDCT processing unit 12 also calculates the power value mdct_pow(sfb) of each band (scale factor band sfb) based on the obtained frequency spectrum. The power value mdct_pow(sfb) is calculated by, for example, the following Expression (2).
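Expression (2) is not reproduced in this text. A common way to obtain a per-band power is to sum the squared MDCT coefficients that fall inside each scale factor band; the sketch below assumes that definition. The `band_offsets` table (start index of each band, plus one final end index) is a hypothetical input.

```python
from typing import List, Sequence


def band_powers(spectrum: Sequence[float], band_offsets: Sequence[int]) -> List[float]:
    """Sum of squared coefficients per scale factor band (assumed definition).

    band_offsets[i] is the first coefficient index of band i, and
    band_offsets[-1] is one past the last coefficient of the last band.
    """
    powers = []
    for i in range(len(band_offsets) - 1):
        start, end = band_offsets[i], band_offsets[i + 1]
        powers.append(sum(c * c for c in spectrum[start:end]))
    return powers
```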

Note that the conversion of the audio signal into a frequency spectrum is not limited to conversion using Expression (1) and may use any known conversion method. Similarly, the power value mdct_pow(sfb) of each band sfb of the frequency spectrum is not limited to Expression (2) and may be calculated by any known calculation method.

Next, the encoding device 1 generates the initial masking threshold sfbThr₀(sfb) used when quantizing the frequency spectrum (step S12). The processing of step S12 is performed by the masking threshold generation unit 14.

The masking threshold generation unit 14 performs psychoacoustic analysis on the input signal and obtains the initial masking threshold sfbThr₀(sfb) for each band sfb. The initial masking threshold sfbThr₀(sfb) is calculated by any known calculation method based on, for example, the minimum audible level and the masking effect in each band sfb.

After generating the initial masking threshold sfbThr₀(sfb), the masking threshold generation unit 14 determines the bands to be encoded based on the initial masking threshold sfbThr₀(sfb) and the power value mdct_pow(sfb) of the frequency spectrum (step S14). In step S14, the masking threshold generation unit 14 treats as encoding targets only those bands, among all bands of the frequency spectrum, for which sfbThr₀(sfb) < mdct_pow(sfb).
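The selection in step S14 amounts to a per-band comparison. A minimal sketch, with a hypothetical function name and list-based inputs:

```python
from typing import List, Sequence


def select_bands(mdct_pow: Sequence[float], sfb_thr0: Sequence[float]) -> List[int]:
    """Return indices of bands whose power exceeds the initial masking threshold."""
    return [sfb for sfb, (power, thr) in enumerate(zip(mdct_pow, sfb_thr0))
            if thr < power]
```

A band whose power only equals its threshold is excluded, because the condition is a strict inequality.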

After determining the bands to be encoded, the masking threshold generation unit 14 calculates an initial PE value and a target PE value in order to determine whether to correct the masking threshold (step S16). In the present embodiment, whether to correct the masking threshold is determined by whether the initial PE value is larger than the target PE value (step S18).

Here, the PE value is the value of perceptual entropy, which is one of the acoustic parameters, and represents the difficulty of encoding. The initial PE value is the PE value calculated based on the power value mdct_pow(sfb) and the initial masking threshold sfbThr₀(sfb) in the bands to be encoded. The target PE value is the PE value calculated based on the number of bits available for encoding. The initial PE value and the target PE value are calculated by any known calculation method (for example, the calculation method described in Non-Patent Document 1).

As described above, the perceptual entropy value is related to the number of bits needed for quantization. When the initial PE value is larger than the target PE value, it can be judged that the amount of bits used in quantization with the initial masking threshold exceeds the number of bits available at the given bit rate. Conversely, when the initial PE value is equal to or less than the target PE value, it can be judged that the amount of bits used in quantization with the initial masking threshold fits within the number of bits available at the given bit rate. The magnitude relationship between the initial PE value and the target PE value therefore indicates whether quantization using the initial masking threshold is possible, that is, whether the masking threshold needs to be corrected.
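The decision in step S18 can be illustrated with a simplified perceptual-entropy estimate. The patent leaves the PE calculation to known methods, so the formula below (number of spectral lines in the band times log2 of the power-to-threshold ratio, summed over the bands to be encoded) is an assumption made for illustration, and `n_lines` is a hypothetical input. The target PE value is taken as given rather than derived from a bit budget.

```python
import math
from typing import Sequence


def perceptual_entropy(mdct_pow: Sequence[float], sfb_thr: Sequence[float],
                       n_lines: Sequence[int], bands: Sequence[int]) -> float:
    """Simplified PE estimate over the bands to be encoded (assumed formula)."""
    return sum(n_lines[b] * math.log2(mdct_pow[b] / sfb_thr[b]) for b in bands)


def needs_correction(initial_pe: float, target_pe: float) -> bool:
    """Step S18: correct the masking threshold only if the initial PE is too large."""
    return initial_pe > target_pe
```

Raising a threshold lowers the ratio inside the logarithm, so the estimate falls as the thresholds are corrected upward.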

When the initial PE value is equal to or less than the target PE value (step S18; No), the masking threshold generation unit 14 determines that the masking threshold is not to be corrected and passes the initial masking threshold sfbThr₀(sfb) to the quantization unit 22. In this case, as shown in FIG. 2C, the encoding device 1 quantizes the frequency spectrum using the initial masking threshold sfbThr₀(sfb) (step S32). The quantization in step S32 is performed by the quantization unit 22, which quantizes the frequency spectrum by any known quantization method.

On the other hand, when the initial PE value is larger than the target PE value (step S18; Yes), the masking threshold generation unit 14 determines that the masking threshold is to be corrected and passes the initial masking threshold sfbThr₀(sfb) and the target PE value to the auditory characteristic calculation unit 16 and the acceptance/rejection determination unit 20a. In this case, the encoding device 1 performs the masking threshold correction process of steps S20 to S30 shown in FIG. 2B.

When the masking threshold is to be corrected, the encoding device 1 next calculates the auditory characteristic based on the frequency spectrum and other information (step S20). The processing of step S20 is performed by the auditory characteristic calculation unit 16.

The auditory characteristic calculation unit 16 calculates the signal-to-mask ratio (SMR) of each band sfb, that is, the difference between the power value mdct_pow(sfb) and the initial masking threshold sfbThr₀(sfb) in each band. After calculating the signal-to-mask ratio, the auditory characteristic calculation unit 16 passes the initial masking threshold sfbThr₀(sfb), the target PE value, and the signal-to-mask ratio to the masking threshold correction unit 18, and stores the signal-to-mask ratio in the storage unit 20c of the recorrection control unit 20.
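The text describes the SMR as a difference between the band power and the initial masking threshold but does not state the domain in which the difference is taken. The sketch below assumes both quantities are converted to decibels before subtraction, which is the usual convention; the function name is hypothetical.

```python
import math
from typing import List, Sequence


def smr_db(mdct_pow: Sequence[float], sfb_thr0: Sequence[float]) -> List[float]:
    """Per-band SMR as a dB difference (the dB domain is an assumption)."""
    return [10.0 * math.log10(power) - 10.0 * math.log10(thr)
            for power, thr in zip(mdct_pow, sfb_thr0)]
```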

Next, the encoding device 1 performs a loss prevention process and a masking threshold correction process. Both processes are performed by the masking threshold correction unit 18.

The loss prevention process prevents bands from being lost when the frequency spectrum is quantized using the masking threshold obtained by the correction. As the loss prevention process, the masking threshold correction unit 18 sets the upper limit of the corrected masking threshold sfbThr(sfb) in each band sfb to a value that does not permit loss (step S22). To prevent band loss in quantization, it suffices to keep the corrected masking threshold sfbThr(sfb) from exceeding the power value mdct_pow(sfb). In step S22, therefore, the upper limit of the corrected masking threshold sfbThr(sfb) is set to a value slightly smaller than the power value mdct_pow(sfb).
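Step S22 can be sketched as scaling each band power by a factor just under 1. The text says only "slightly smaller", so the `margin` value of 0.999 is a hypothetical choice for illustration.

```python
from typing import List, Sequence


def no_loss_upper_limits(mdct_pow: Sequence[float], margin: float = 0.999) -> List[float]:
    """Upper limit per band, slightly below the band power (margin is assumed)."""
    return [power * margin for power in mdct_pow]
```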

Next, the masking threshold correction unit 18 corrects the masking threshold based on the auditory characteristic, the upper limit of the masking threshold, and the bit rate (step S24). In step S24, the masking threshold is corrected using, for example, the following Expression (3) so that the PE value calculated from the power value mdct_pow(sfb) and the corrected masking threshold sfbThr(sfb) equals the target PE value.

In Expression (3), r is a correction parameter (see Non-Patent Document 1).
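Expression (3) is not reproduced in this text. The sketch below assumes the additive fourth-root form sfbThr = (sfbThr₀^0.25 + r)^4, clamped to the per-band upper limit, which is believed to match the threshold reduction in Non-Patent Document 1 but should be checked against it. The simplified PE estimate, the bisection search for r, and all function names are likewise assumptions for illustration.

```python
import math
from typing import List, Sequence


def corrected_thresholds(thr0: Sequence[float], upper: Sequence[float],
                         r: float) -> List[float]:
    """Raise each threshold by r in the fourth-root domain, clamped to its limit."""
    return [min((t ** 0.25 + r) ** 4, u) for t, u in zip(thr0, upper)]


def pe_estimate(power: Sequence[float], thr: Sequence[float],
                n_lines: Sequence[int]) -> float:
    """Simplified PE: lines per band times log2(power / threshold) (assumed)."""
    return sum(n * math.log2(p / t) for p, t, n in zip(power, thr, n_lines))


def solve_r(power: Sequence[float], thr0: Sequence[float], upper: Sequence[float],
            n_lines: Sequence[int], target_pe: float, iters: int = 60) -> float:
    """Bisect on r so that the PE of the corrected thresholds meets target_pe."""
    lo = 0.0
    hi = max(u ** 0.25 for u in upper)  # at this r every band sits at its limit
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        thr = corrected_thresholds(thr0, upper, mid)
        if pe_estimate(power, thr, n_lines) > target_pe:
            lo = mid   # still too many bits: raise the thresholds further
        else:
            hi = mid
    return hi
```

If the target PE cannot be reached even with every band at its upper limit, `solve_r` returns the value of r at which all bands are clamped. That corresponds to the situation described in the background, where clamped bands push the remaining correction onto the other bands.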

The first correction of the masking threshold in step S24 is performed under the condition that no band is lost when the frequency spectrum is quantized using the corrected masking threshold. When the masking threshold is corrected under this condition, the overall shape of the corrected masking threshold may deviate greatly from that of the initial masking threshold, and sound quality may be degraded. The encoding device 1 of the present embodiment therefore compares the similarity in overall shape between the masking thresholds before and after correction with a predetermined reference value to determine whether to adopt the corrected masking threshold. When the similarity is equal to or less than the reference value, the masking threshold obtained by the immediately preceding correction is not adopted, and the masking threshold is corrected again. The recorrection control unit 20 makes determinations such as whether to adopt the corrected masking threshold sfbThr(sfb).

After correcting the masking threshold, the masking threshold correction unit 18 passes the resulting masking threshold sfbThr(sfb) to the acceptance/rejection determination unit 20a of the recorrection control unit 20 and stores the upper limit of the masking threshold in the storage unit 20c. The acceptance/rejection determination unit 20a then calculates the cross-correlation value corre between the initial masking threshold sfbThr₀(sfb) and the corrected masking threshold sfbThr(sfb) in order to determine whether to adopt the corrected masking threshold sfbThr(sfb) (step S26). Whether to adopt the corrected masking threshold sfbThr(sfb) is determined by whether the calculated cross-correlation value corre is larger than a predetermined reference value (correlation threshold TH₁) (step S28).

The threshold outline determination unit 20a calculates the cross-correlation value corre using, for example, the following equation (4).

Note that the cross-correlation value corre is calculated using only the masking thresholds of the bands to be encoded.

The cross-correlation value corre calculated by equation (4) satisfies 0 < corre ≤ 1, and its value increases as the similarity between the outlines of the masking threshold before and after correction increases. Therefore, when the calculated cross-correlation value corre is larger than the correlation threshold TH₁ (for example, TH₁ = 0.8) (step S28; Yes), the acceptance/rejection determination unit 20a determines that the corrected masking threshold sfbThr(sfb) obtained in the immediately preceding correction is to be adopted, and passes that masking threshold sfbThr(sfb) to the quantization unit 22. In this case, as shown in FIG. 2C, the encoding device 1 quantizes the frequency spectrum using the corrected masking threshold sfbThr(sfb) (step S34).
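Equation (4) itself is not reproduced in this text. As a minimal sketch only, the following assumes a normalized cross-correlation, which is one common form that yields 0 < corre ≤ 1 for strictly positive thresholds. The function names and the exact formula are assumptions for illustration, not the patent's equation (4).

```python
import math


def cross_correlation(thr_initial, thr_corrected):
    """Normalized cross-correlation of two threshold curves.

    ASSUMPTION: this formula stands in for equation (4), which is not
    shown in the text. Both lists hold the thresholds of the bands to
    be encoded only, and all values are assumed to be positive.
    """
    if len(thr_initial) != len(thr_corrected):
        raise ValueError("threshold curves must cover the same bands")
    num = sum(a * b for a, b in zip(thr_initial, thr_corrected))
    den = math.sqrt(sum(a * a for a in thr_initial)
                    * sum(b * b for b in thr_corrected))
    return num / den


def accept_corrected_threshold(thr_initial, thr_corrected, th1=0.8):
    """Step S28: adopt the corrected threshold only if corre > TH1."""
    return cross_correlation(thr_initial, thr_corrected) > th1
```

With this form, a corrected curve that is a uniform scaling of the initial curve gives corre = 1, while a curve that is raised strongly in only a few bands gives a lower value.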

On the other hand, when the cross-correlation value corre is equal to or less than the correlation threshold TH₁ (step S28; No), the acceptance/rejection determination unit 20a determines that the corrected masking threshold sfbThr(sfb) obtained in the immediately preceding correction is not to be adopted (that is, the masking threshold is to be corrected again). In this case, the acceptance/rejection determination unit 20a causes the missing allowable band setting unit 20b to perform the missing allowable band setting process (step S30).

Although omitted from FIG. 2B, when the masking threshold is to be corrected again, the acceptance/rejection determination unit 20a stores the initial masking threshold sfbThr₀(sfb) in the storage unit 20c. The stored initial masking threshold sfbThr₀(sfb) is used to calculate the cross-correlation value corre with the re-corrected masking threshold.

In the missing allowable band setting process (step S30), the bands that are allowed to be lost in quantization are set. In the present embodiment, each time step S30 is performed, the band with the lowest importance among the bands whose masking threshold upper limit is set to a value that does not allow loss is set as a band for which loss is allowed. The importance of a band is the degree to which losing that band through quantization affects sound quality degradation. In the present embodiment, as described later, the difference between the power value mdct_pow(sfb) and the initial masking threshold sfbThr₀(sfb) (that is, the signal-to-mask ratio) is used as the importance of a band. For a band for which loss is allowed, the upper limit of the masking threshold is set to a value, such as "0" or "−1", that can be distinguished from the values used when loss is not allowed.

After setting the band for which loss is allowed, the missing allowable band setting unit 20b updates the masking threshold upper limits stored in the storage unit 20c and sends a control signal instructing correction of the masking threshold to the masking threshold correction unit 18. This completes the missing allowable band setting process (step S30).

When the missing allowable band setting process (step S30) ends, the encoding process performed by the encoding device 1 returns to the process of correcting the masking threshold (step S24). Thereafter, the encoding device 1 repeats steps S24 to S30 until the cross-correlation value corre becomes larger than the correlation threshold TH₁. When the cross-correlation value corre becomes larger than the correlation threshold TH₁ (step S28; Yes), quantization is performed using the masking threshold sfbThr(sfb) obtained in the immediately preceding correction (step S34).
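The loop over steps S24 to S30 can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the `correct` callable is a hypothetical stand-in for the masking threshold correction unit 18, `correlation` assumes a normalized cross-correlation in place of equation (4), and the sentinel `DROP_ALLOWED = -1.0` is one of the example values the text mentions.

```python
import math

DROP_ALLOWED = -1.0  # sentinel upper limit marking a band that may be lost


def correlation(a, b):
    """ASSUMED normalized cross-correlation standing in for equation (4)."""
    num = sum(x * y for x, y in zip(a, b))
    return num / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


def recorrect_until_similar(thr0, smr, upper, correct, th1=0.8):
    """Repeat steps S24 to S30 until corre exceeds TH1.

    thr0:    initial masking threshold per band
    smr:     signal-to-mask ratio per band (the importance measure)
    upper:   masking threshold upper limit per band
    correct: hypothetical callable (thr0, upper) -> corrected threshold
    """
    upper = list(upper)
    while True:
        thr = correct(thr0, upper)                    # step S24
        corre = correlation(thr0, thr)                # step S26
        if corre > th1:                               # step S28: Yes
            return thr, upper
        candidates = [i for i, u in enumerate(upper) if u != DROP_ALLOWED]
        if not candidates:
            # Guard added for this sketch only; the text does not describe
            # what happens when every band is already allowed to be lost.
            return thr, upper
        least = min(candidates, key=lambda i: smr[i])  # step S30
        upper[least] = DROP_ALLOWED
```

Passing the correction step in as a callable keeps the sketch independent of how the bit-rate-based correction itself is carried out, which this passage does not specify.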

When quantization using the initial masking threshold sfbThr₀(sfb) (step S32) or quantization using the corrected masking threshold sfbThr(sfb) (step S34) has been completed by the above procedure, the encoding device 1 encodes the quantized values (step S36). Step S36 is performed by the encoding unit 24.

The encoding unit 24 performs encoding using a known encoding method such as fixed Huffman encoding. When the encoding is completed, the encoding unit 24 passes the encoded data to the multiplexing unit 26. This completes the encoding process for one frame of the input signal (audio data).

When the encoding process is completed, the encoding device 1 (multiplexing unit 26) generates and outputs an encoded stream in which header information and the like are added to the encoded audio data.

As described above, in the encoding process performed by the encoding device 1 according to the present embodiment, the initial masking threshold is first corrected under the condition that loss due to quantization is not allowed in any band sfb. If the similarity between the outlines of the corrected masking threshold sfbThr(sfb) and the initial masking threshold sfbThr₀(sfb) is lower than the reference value, the correction of the masking threshold is repeated, allowing loss in order from the band with the lowest importance, until the similarity of the outlines exceeds the reference value. This suppresses sound quality degradation caused by lost bands while also suppressing sound quality degradation caused by a drop in outline similarity (excessive correction).

Next, the content of the missing allowable band setting process (step S30) in the present embodiment will be described.

FIG. 3 is a flowchart showing the content of the missing allowable band setting process in the first embodiment.

In the present embodiment, as described above, the difference between the power value mdct_pow(sfb) and the initial masking threshold sfbThr₀(sfb) is used as the importance of each band sfb. This difference is the signal-to-mask ratio calculated in step S20, and it is stored in the storage unit 20c before the first missing allowable band setting process (step S30) starts. Therefore, when the missing allowable band setting unit 20b performs the missing allowable band setting process, it first reads the signal-to-mask ratio and the masking threshold upper limit of each band to be encoded from the storage unit 20c, as shown in FIG. 3 (step S3000).

Next, among the bands whose masking threshold upper limit is set to a value that does not allow loss, the missing allowable band setting unit 20b identifies the band with the smallest signal-to-mask ratio as the band with the lowest importance (step S3002). As described above, a masking threshold upper limit that does not allow loss is a positive value smaller than the power value.

Next, the missing allowable band setting unit 20b changes the masking threshold upper limit of the band with the lowest importance to a value that allows loss, and updates the data in the storage unit 20c (step S3004). The value that allows loss may be any value that can be distinguished from a masking threshold upper limit that does not allow loss, for example "0" or "−1".

When step S3004 is completed, the missing allowable band setting unit 20b ends the missing allowable band setting process (return).
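Steps S3000 to S3004 amount to a minimum search over the bands that are not yet allowed to be lost. The following is a minimal sketch, assuming the stored data are plain dictionaries keyed by band index. The function name, the dictionary layout, and the choice of −1 as the sentinel are illustrative assumptions.

```python
DROP_ALLOWED = -1.0  # sentinel meaning "loss of this band is allowed"


def set_next_droppable_band(smr, upper_limits):
    """Mark the least important remaining band as allowed to be lost.

    smr:          band index -> signal-to-mask ratio (read in step S3000)
    upper_limits: band index -> masking threshold upper limit; a positive
                  value means loss is not allowed for that band
    Returns the band index that was changed, or None if no band remains.
    """
    # Step S3002: only bands whose upper limit still forbids loss qualify.
    candidates = [b for b, u in upper_limits.items() if u > 0]
    if not candidates:
        return None
    least_important = min(candidates, key=lambda b: smr[b])
    # Step S3004: overwrite the upper limit with the distinguishable value.
    upper_limits[least_important] = DROP_ALLOWED
    return least_important
```

Because a non-droppable upper limit is described as a positive value, testing `u > 0` is enough to tell the two kinds of bands apart whether the sentinel is 0 or −1.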

The initial masking threshold sfbThr₀(sfb) is a value calculated based on the psychoacoustic model and is the limit on the amount of quantization error that humans cannot perceive. Sound in a band whose power value mdct_pow(sfb) in the frequency spectrum is smaller than the initial masking threshold cannot be perceived. Even when the power value is larger than the initial masking threshold, sound in a band where the difference between the two is very small is difficult to perceive. Consequently, a band whose signal-to-mask ratio calculated in step S20 is very small has less effect on sound quality degradation when it is lost through quantization than a band whose signal-to-mask ratio is large. Therefore, by using the signal-to-mask ratio as the importance of a band and allowing loss in order from the band with the smallest signal-to-mask ratio, sound quality degradation due to lost bands can be suppressed.

Furthermore, when loss due to quantization is allowed for some bands while the masking threshold is corrected again, the masking threshold for those bands can be set to a larger value than in the previous correction. If the masking threshold of a band for which loss is allowed is corrected to a large value, the amount of bits used to encode that band can be reduced, or even set to "0". The bits saved in this way can be used to encode other bands. They are allocated, for example, to bands in which the difference between the masking threshold before and after correction was large in the previous correction. As a result, the corrected masking threshold can be brought closer to the initial masking threshold. This suppresses the sound quality degradation caused by a deviation between the outline of the masking threshold sfbThr(sfb) used for quantization and the outline of the initial masking threshold sfbThr₀(sfb).

The masking threshold correction process described above will now be explained concretely with reference to FIGS. 4A to 4D.

FIG. 4A is a graph showing an example of a frequency spectrum and an initial masking threshold.

When steps S10 to S14 are performed on one frame of the input signal, the power value mdct_pow(sfb) of each band sfb of the frequency spectrum and the initial masking threshold sfbThr₀(sfb) are obtained, for example as shown in FIG. 4A.

As described above, the initial masking threshold sfbThr₀(sfb) is the optimal masking threshold for quantizing the corresponding frequency spectrum. Therefore, when it is determined through steps S16 and S18 that quantization using the initial masking threshold sfbThr₀(sfb) is possible, the encoding device 1 quantizes the frequency spectrum using the initial masking threshold sfbThr₀(sfb) (step S32).

However, when encoding under a low bit rate condition, that is, when few bits are available for encoding the frequency spectrum, the quantization error often cannot be kept at or below the initial masking threshold. In that case, the encoding device 1 performs a correction that raises (relaxes) the masking threshold, based on the bit rate condition, auditory characteristics, and the like, within a range that degrades the sound quality as little as possible.

FIG. 4B is a graph illustrating the relationship between the masking threshold obtained by the first correction and the initial masking threshold.

In the present embodiment, the first correction of the initial masking threshold is performed under the condition that loss due to quantization is not allowed in any band. After this first correction, the corrected masking threshold sfbThr(sfb) has, for example, the outline shown by the solid polyline in FIG. 4B.

Comparing the outline of this corrected masking threshold sfbThr(sfb) with the outline of the initial masking threshold sfbThr₀(sfb) shown by the dotted line in FIG. 4B, the similarity between the two is low in the low-frequency bands sfb1 to sfb3 and in the high-frequency bands sfb15 to sfb18. Even so, if the cross-correlation value corre calculated from the masking thresholds sfbThr(sfb) and sfbThr₀(sfb) after and before correction is larger than the correlation threshold TH₁, sound quality degradation due to deviation of the masking threshold outline can be suppressed while loss of bands through quantization is prevented.

However, when the cross-correlation value corre calculated from the masking thresholds sfbThr(sfb) and sfbThr₀(sfb) after and before correction is equal to or less than the correlation threshold TH₁, the masking thresholds of some bands have been corrected excessively, which leads to sound quality degradation. Therefore, when the cross-correlation value corre calculated from the masking thresholds sfbThr(sfb) and sfbThr₀(sfb) shown in FIG. 4B is equal to or less than the correlation threshold TH₁, loss of the band with the lowest importance is allowed and the masking threshold is corrected again.

In the present embodiment, the band with the lowest importance is the band with the smallest signal-to-mask ratio. Among the bands sfb1 to sfb5 and sfb11 to sfb18 shown in FIG. 4A, the band with the smallest signal-to-mask ratio is band sfb5. Therefore, the encoding device 1 changes the masking threshold upper limit of band sfb5 to a value that allows loss and performs the second correction of the initial masking threshold.

When the masking threshold is corrected again under the condition that only loss of band sfb5 is allowed, the corrected masking threshold sfbThr₂(sfb) has, for example, the outline shown by the solid polyline in FIG. 4C. FIG. 4C is a graph illustrating the relationship between the masking threshold obtained by the second correction and the masking threshold obtained by the first correction.

Comparing the masking threshold sfbThr₂(sfb) obtained by the second correction with the masking threshold sfbThr(sfb) obtained by the first correction, shown by the broken line in FIG. 4C, the masking threshold sfbThr₂(sfb) obtained by the second correction is smaller in several bands. This is because the second correction allows loss of band sfb5.

In the second correction, the masking threshold sfbThr₂(sfb5) for band sfb5 can be made larger than in the first correction. If sfbThr₂(sfb5) becomes larger than in the first correction, the amount of bits used to encode band sfb5 decreases, and the amount of bits used to encode other bands can be increased by the same amount. When the amount of bits used to encode a band increases, the masking threshold of that band can be made smaller than in the first correction. Therefore, in the masking threshold sfbThr₂(sfb) obtained by the second correction, as shown in FIG. 4C, the thresholds of bands such as sfb1 and sfb2, whose difference from the initial masking threshold sfbThr₀(sfb) was large in the first correction, are smaller than their values in the first correction. In other words, because loss of band sfb5 is allowed in the second correction, excessive correction of the masking thresholds of bands sfb1 and sfb2 is suppressed.

As can be seen from FIG. 4C, the masking threshold sfbThr₂(sfb) obtained by the second correction has an outline more similar to that of the initial masking threshold sfbThr₀(sfb) than the masking threshold sfbThr(sfb) obtained by the first correction. Accordingly, the cross-correlation value corre between sfbThr₂(sfb) and sfbThr₀(sfb) is larger than the cross-correlation value between sfbThr(sfb) and sfbThr₀(sfb). When the cross-correlation value corre between sfbThr₂(sfb) and sfbThr₀(sfb) is larger than the correlation threshold TH₁, the encoding device 1 performs quantization using the masking threshold sfbThr₂(sfb) obtained by the second correction. This suppresses sound quality degradation caused by a deviation between the outline of the masking threshold used for quantization and that of the initial masking threshold, in other words, by excessive correction of the masking threshold. Moreover, in the masking threshold sfbThr₂(sfb) obtained by the second correction shown in FIG. 4C, the masking threshold is equal to or less than the power value mdct_pow(sfb) in all bands, including band sfb5 for which loss is allowed. Therefore, sound quality degradation caused by bands being lost through quantization is also prevented.

However, comparing the masking threshold sfbThr₂(sfb) obtained by the second correction with the initial masking threshold sfbThr₀(sfb), the similarity in the high-frequency bands sfb15 to sfb18 is still low. Depending on the value of the correlation threshold TH₁, the cross-correlation value corre between the masking threshold obtained by the second correction and the initial masking threshold may therefore be equal to or less than TH₁. In that case, the encoding device 1 performs a third correction of the initial masking threshold sfbThr₀(sfb). The third correction is performed after additionally setting, as a band for which loss is allowed, the band with the lowest importance among the bands for which loss is not yet allowed, that is, the band with the next smallest signal-to-mask ratio after band sfb5. Among the bands sfb1 to sfb5 and sfb11 to sfb18 shown in FIG. 4A (FIG. 4C), the band with the next smallest signal-to-mask ratio after band sfb5 is band sfb15. Therefore, the encoding device 1 performs the third correction of the initial masking threshold sfbThr₀(sfb) with the masking threshold upper limits of bands sfb5 and sfb15 changed to values that allow loss.

When the masking threshold is corrected again under the condition that loss of bands sfb5 and sfb15 is allowed, the corrected masking threshold sfbThr₃(sfb) has the outline shown by the solid polyline in FIG. 4D. FIG. 4D is a graph illustrating the relationship between the masking threshold obtained by the third correction and the masking threshold obtained by the second correction.

Comparing the masking threshold sfbThr₃(sfb) obtained by the third correction with the masking threshold sfbThr₂(sfb) obtained by the second correction, shown by the broken polyline in FIG. 4D, the masking threshold obtained by the third correction is smaller in several bands. This is because the amount of bits used to encode band sfb5, for which loss is allowed, has been reduced, and the amount of bits used to encode other bands such as sfb1, sfb2, sfb17, and sfb18 has increased correspondingly.

As can be seen from FIGS. 4D and 4C, the masking threshold sfbThr₃(sfb) obtained by the third correction has an outline more similar to that of the initial masking threshold sfbThr₀(sfb) than the masking thresholds sfbThr(sfb) and sfbThr₂(sfb) obtained by the first and second corrections. Accordingly, the cross-correlation value corre between sfbThr₃(sfb) and sfbThr₀(sfb) is larger than the cross-correlation values between the masking thresholds obtained by the first and second corrections and sfbThr₀(sfb). When the cross-correlation value corre between sfbThr₃(sfb) and sfbThr₀(sfb) is larger than the correlation threshold TH₁, the encoding device 1 performs quantization using the masking threshold sfbThr₃(sfb) obtained by the third correction. This suppresses sound quality degradation caused by a deviation between the outline of the masking threshold used for quantization and that of the initial masking threshold, that is, by excessive correction of the masking threshold.

In addition, in the masking threshold sfbThr₃(sfb) obtained by the third correction shown in FIG. 4D, the masking threshold is equal to or less than the power value mdct_pow(sfb) in all bands except band sfb5, for which loss is allowed. Therefore, only band sfb5, which has the lowest importance, is lost through quantization, and sound quality degradation due to lost bands is kept to a minimum.
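The comparison above, in which a band is lost when its masking threshold exceeds its power value, can be checked per band. The sketch below is illustrative only: the function name is hypothetical, and any numbers passed to it are not values read from FIG. 4D.

```python
def lost_bands(mdct_pow, sfb_thr):
    """Return the bands whose masking threshold exceeds their power value.

    mdct_pow: band index -> power value mdct_pow(sfb)
    sfb_thr:  band index -> masking threshold used for quantization
    A band with threshold <= power is treated as surviving quantization,
    following the comparison described in the text.
    """
    return [b for b in sorted(mdct_pow) if sfb_thr[b] > mdct_pow[b]]
```

For example, with hypothetical inputs `{1: 9.0, 5: 2.0, 15: 4.0}` for the power values and `{1: 3.0, 5: 2.5, 15: 4.0}` for the thresholds, only band 5 is reported as lost.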

Depending on the value of the correlation threshold TH₁, the cross-correlation value corre between the masking threshold sfbThr₃(sfb) obtained by the third correction and the initial masking threshold sfbThr₀(sfb) may also be equal to or less than TH₁. In that case, the encoding device 1 performs a fourth correction of the initial masking threshold sfbThr₀(sfb). The fourth correction is performed after additionally setting, as a band for which loss is allowed, the band with the lowest importance among the bands for which loss is not yet allowed, that is, the band with the next smallest signal-to-mask ratio after band sfb15 (for example, band sfb14). Thereafter, the encoding device 1 repeats the correction of the masking threshold, each time adding a band for which loss is allowed, until the cross-correlation value corre between the corrected masking threshold and the initial masking threshold becomes larger than TH₁. This suppresses sound quality degradation due to lost bands while also suppressing sound quality degradation due to excessive correction of the masking threshold.

The encoding device 1 of the present embodiment, which performs the encoding process described above, can be realized by, for example, a computer and a program that causes the computer to execute the encoding process. The encoding device 1 realized by a computer and a program is described below with reference to FIG. 5.

FIG. 5 is a schematic diagram showing an example of the hardware configuration of a computer operated as the encoding device.

As shown in FIG. 5, the computer 5 operated as the encoding device includes a Central Processing Unit (CPU) 50, a main storage device 52, an auxiliary storage device 54, an input device 56, and an output device 58. The computer 5 further includes a Digital Signal Processor (DSP) 60, a storage medium driving device 62, and an interface device 64. These elements 50 to 64 of the computer 5 are connected to one another by a bus 68 so that data can be exchanged among them.

The CPU 50 is an arithmetic processing unit that controls the overall operation of the computer 5 by executing various programs, including an operating system.

The main storage device 52 includes a Read Only Memory (ROM) 52a and a Random Access Memory (RAM) 52b. The ROM 52a stores in advance, for example, a predetermined basic control program that the CPU 50 reads when the computer 5 starts up. The RAM 52b is used as a working storage area as needed when the CPU 50 executes various programs. In the present embodiment, the RAM 52b is used for temporary storage of, for example, the audio signal to be encoded and the masking thresholds.

The auxiliary storage device 54 is a storage device, such as a Hard Disk Drive (HDD) or a Solid State Disk (SSD), that has a larger capacity than the main storage device 52. It stores the various programs executed by the CPU 50, various data, and the like. An example of a program stored in the auxiliary storage device 54 is an audio player program that encodes and plays back audio signals. An example of data stored there is audio signal data encoded by that player.

The input device 56 is, for example, a keyboard or a mouse. When operated by an operator of the computer 5, it sends the input information associated with that operation to the CPU 50.

The output device 58 is, for example, a liquid crystal display or a speaker. The liquid crystal display shows various text, images, and the like according to display data sent from the CPU 50 or the like. The speaker outputs voice data and audio data sent from the CPU 50, the DSP 60, and the like.

The DSP 60 is an arithmetic processing unit that performs encoding, decoding (playback), and similar processing of audio signals in accordance with control signals and the like from the CPU 50.

The storage medium drive device 62 reads programs and data recorded on a portable storage medium (not shown) and writes data and the like stored in the auxiliary storage device 54 to the portable storage medium. A flash memory with a USB connector, for example, can be used as the portable storage medium. Optical discs such as a Compact Disk (CD), a Digital Versatile Disc (DVD), and a Blu-ray Disc (Blu-ray is a registered trademark) can also be used.

The interface device 64 is, for example, an audio input/output device or a communication control device. The audio input/output device connects the computer 5 to, for example, a microphone or an audio device and inputs and outputs audio signals. The communication control device connects the computer 5 to a communication network such as the Internet and sends and receives audio data and the like by communicating with external communication devices over that network.

In the computer 5, the CPU 50 reads a program containing the encoding process described above from the auxiliary storage device 54 and executes the encoding of the audio signal in cooperation with the DSP 60, the main storage device 52, the auxiliary storage device 54, and the like. In doing so, the CPU 50 has the DSP 60 perform the arithmetic processing of the encoding process. The DSP 60 converts the audio signal into a frequency spectrum and generates the initial masking threshold. The audio signal may be read from a portable storage medium such as a music CD, or may be input to the computer 5 by communication through the interface device 64. The DSP 60 also calculates the initial PE value and the target PE value and determines from their magnitude relationship whether the audio signal can be quantized using the initial masking threshold. If it cannot, the DSP 60 calculates the auditory characteristic and corrects the masking threshold. The DSP 60 then determines whether to adopt the corrected masking threshold; if not, it sets the bands whose loss is allowed and corrects the masking threshold again. Once it determines that the corrected masking threshold is to be adopted, it quantizes and encodes the audio signal (frequency spectrum) using that threshold. While executing this processing, the DSP 60 also stores the initial masking threshold, the target PE value, the upper limits of the corrected masking threshold, and the like in the RAM 52b or the auxiliary storage device 54 and reads them back from there.

The audio signal data (audio data) encoded by the computer 5 is, for example, stored in the auxiliary storage device 54 and decoded (played back) by the computer 5 as needed. If the computer 5 has a communication control device as the interface device 64, the audio data can also be provided (distributed) to other computers and the like over a communication network.

The computer 5 used as the encoding device 1 is not limited to the configuration shown in FIG. 5 and may instead be configured so that the CPU 50 encodes the audio signal. Nor is the computer 5 limited to a general-purpose machine that realizes multiple functions by executing various programs; it may be an audio device specialized for encoding and decoding audio signals.

As described above, in the encoding method using the encoding device 1 according to the first embodiment, the masking threshold is corrected when the audio signal (frequency spectrum) cannot be quantized using the initial masking threshold generated from the psychoacoustic model. The first correction is performed under the condition that no band is lost through quantization, and it is then determined whether the similarity between the shapes of the masking threshold before and after correction satisfies a reference value. If it does not, the correction is repeated, allowing bands to be lost in ascending order of importance, until the similarity satisfies the reference value. This suppresses the sound quality degradation caused by several bands being lost in quantization while also suppressing the degradation caused by the shape of the masking threshold used for quantization deviating from that of the initial masking threshold (that is, excessive correction of the masking threshold).

Note that the loss-allowable band setting process may, for example, allow the loss of multiple bands in a single pass, starting from the band of lowest importance, with the number of bands depending on the size of the difference between the cross-correlation value corre and the correlation threshold TH_1.

The decision on whether to adopt the corrected masking threshold is not limited to the cross-correlation value corre given by equation (4); any value that corresponds to the similarity between the shapes of the masking threshold before and after correction may be used.

The auditory characteristic calculated in step S20 is not limited to the signal-to-mask ratio and may be another characteristic.

Furthermore, the encoding device 1 is not limited to a device that encodes only audio signals, as shown in FIG. 1, and may be a device that encodes video signals. Such a device has a configuration for encoding moving images in addition to the configuration shown in FIG. 1. It encodes the input video signal by encoding the moving images and the audio separately and then multiplexes the encoded moving images and audio.

[Modification of First Embodiment]
In the masking threshold correction process shown in FIG. 2B, the correction is repeated until the cross-correlation value corre between the corrected masking threshold sfbThr(sfb) and the initial masking threshold sfbThr₀(sfb) exceeds the correlation threshold TH_1. The encoding process according to the first embodiment is not limited to this, however; for example, an upper limit may be placed on the number of masking threshold corrections.

FIG. 6 is a flowchart showing a modification of the masking threshold correction process according to the first embodiment. The correction process shown in FIG. 6 is performed when step S18 in FIG. 2A determines that the initial PE value is larger than the target PE value (step S18; Yes).

When the initial PE value is larger than the target PE value, the encoding device 1, as shown in FIG. 6, goes on to calculate the auditory characteristic based on the frequency spectrum and the like (step S20) and to set the upper limit of the corrected masking threshold sfbThr(sfb) (step S22). The encoding device 1 (masking threshold correction unit 18) then initializes the correction count M to M = 1 (step S23).

After step S23, the encoding device 1 corrects the masking threshold (step S24), calculates the cross-correlation value corre (step S26), and determines whether corre is larger than the correlation threshold TH_1 (step S28). If it is (step S28; Yes), the frequency spectrum is quantized using the masking threshold sfbThr(sfb) obtained by the immediately preceding correction, as shown in FIG. 2C (step S34).

If, on the other hand, corre is less than or equal to the correlation threshold TH_1 (step S28; No), it is determined whether the correction count M is greater than or equal to a threshold THM, as shown in FIG. 6 (step S29). If M ≧ THM (step S29; Yes), the masking threshold correction process ends and the frequency spectrum is quantized using the masking threshold sfbThr(sfb) obtained by the immediately preceding correction (step S34). If M < THM (step S29; No), the loss-allowable band setting process (step S30) is performed, the correction count M is incremented by 1 (step S31), and the process returns to the correction in step S24. The threshold THM for the correction count may be set as appropriate, for example to about 5 to 10.
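The capped loop of FIG. 6 (steps S23 to S31) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name and the three callables `correct`, `similarity`, and `allow_more_loss` are hypothetical stand-ins for steps S24, S26, and S30, whose internals (equations (3) and (4)) are defined elsewhere in the description.

```python
from typing import Callable, List, Tuple


def run_capped_correction(
    correct: Callable[[], List[float]],          # hypothetical stand-in for step S24
    similarity: Callable[[List[float]], float],  # hypothetical stand-in for step S26 (corre)
    allow_more_loss: Callable[[], None],         # hypothetical stand-in for step S30
    th1: float,                                  # correlation threshold TH_1
    thm: int,                                    # upper limit THM on the correction count
) -> Tuple[List[float], int]:
    """Repeat the correction until corre > TH_1 or the count M reaches THM."""
    m = 1                                        # step S23: M = 1
    while True:
        thr = correct()                          # step S24: correct the masking threshold
        corre = similarity(thr)                  # step S26: cross-correlation value
        if corre > th1:                          # step S28; Yes
            return thr, m
        if m >= thm:                             # step S29; Yes: stop and use the last result
            return thr, m
        allow_more_loss()                        # step S30: allow loss in more bands
        m += 1                                   # step S31
```

In either exit path the caller would quantize with the returned threshold, which corresponds to step S34. Passing the steps in as callables keeps the sketch independent of how the threshold and the similarity are actually computed.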

In the masking threshold correction process shown in FIG. 2B, the more corrections are performed, the more bands are allowed to be lost. Depending on the frequency spectrum of the audio signal, the number of corrections may therefore grow until the loss of a highly important band is allowed, and that band may be lost through quantization.

In contrast, placing an upper limit on the number of corrections, as shown in FIG. 6, suppresses the loss of highly important bands. It also suppresses the encoding delay and the like caused by repeating the masking threshold correction, which makes the method easier to apply to low-delay coding such as AAC-Enhanced Low Delay (AAC-ELD).

The threshold THM for the correction count may be fixed at a specific number, or it may be changed each time according to the pattern of the frequency spectrum, for example by setting it to the number of bands whose importance (signal-to-mask ratio) is smaller than a predetermined value.
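A spectrum-dependent THM of the kind just mentioned might be derived as in the sketch below. The function name, the parameter `smr_floor` (the "predetermined value"), and the lower bound `minimum` are assumptions introduced for illustration; the description does not say what happens when no band falls below the predetermined value.

```python
from typing import Sequence


def correction_cap_from_smr(
    smr: Sequence[float],   # per-band signal-to-mask ratio (importance)
    smr_floor: float,       # assumed name for the "predetermined value"
    minimum: int = 1,       # assumption: always permit at least one correction
) -> int:
    """Return THM as the number of bands whose importance is below smr_floor."""
    low_importance = sum(1 for value in smr if value < smr_floor)
    return max(minimum, low_importance)
```

With this choice the loop can never be asked to allow loss in more bands than there are low-importance bands, which matches the stated purpose of the cap.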

[Second Embodiment]
The present embodiment makes the masking threshold correction process in the encoding device according to the present invention more efficient. As in the first embodiment, the masking threshold correction process of this embodiment is performed when the determination in step S18 of FIG. 2A finds the initial PE value larger than the target PE value (step S18; Yes).

The encoding device that performs the masking threshold correction process described in this embodiment may be the encoding device 1 shown in FIG. 1 with part of the processing of the masking threshold generation unit 14 and the recorrection control unit 20 changed as follows.

When the initial PE value is larger than the target PE value, the masking threshold generation unit 14 of the encoding device 1 according to this embodiment passes the initial masking threshold sfbThr₀(sfb), the target PE value, and the power value mdct_pow(sfb) of the frequency spectrum to the acceptance determination unit 20a. It also passes the initial masking threshold sfbThr₀(sfb) and the target PE value to the auditory characteristic calculation unit 16. In addition, it stores the initial PE value, or the difference between the target PE value and the initial PE value, in the storage unit 20c of the recorrection control unit 20.

The recorrection control unit 20 of the encoding device 1 according to this embodiment, for its part, performs the processing shown in FIGS. 7 and 8.

FIG. 7 is a flowchart showing the masking threshold correction process according to the second embodiment of the present invention.

As noted above, the masking threshold correction process of this embodiment is performed when the initial PE value is larger than the target PE value (step S18; Yes). In that case, as explained in the first embodiment, the frequency spectrum cannot be quantized using the initial masking threshold. The encoding device 1 therefore corrects the masking threshold by the procedure shown in FIG. 7.

In this embodiment too, when the masking threshold is corrected, the auditory characteristic calculation unit 16 first calculates the auditory characteristic based on the frequency spectrum and the like (step S20). As the auditory characteristic, it calculates the difference (signal-to-mask ratio) between the power value mdct_pow(sfb) of the frequency spectrum and the initial masking threshold sfbThr₀(sfb).
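The signal-to-mask ratio of step S20 can be sketched as below. The description only calls it a difference between the power value and the initial masking threshold; this sketch assumes that both are positive linear-domain powers and that the difference is taken in decibels, which is an assumption of the example rather than something the text states. The function name is likewise hypothetical.

```python
import math
from typing import List, Sequence


def signal_to_mask_ratio_db(
    mdct_pow: Sequence[float],   # per-band power mdct_pow(sfb), assumed linear and > 0
    sfb_thr0: Sequence[float],   # per-band initial threshold sfbThr0(sfb), assumed linear and > 0
) -> List[float]:
    """Per-band SMR as 10*log10(power) - 10*log10(threshold), written as one log of the ratio."""
    if len(mdct_pow) != len(sfb_thr0):
        raise ValueError("band counts differ")
    return [10.0 * math.log10(p / t) for p, t in zip(mdct_pow, sfb_thr0)]
```

A band with a low or negative value is one where the signal barely rises above the mask, which is why the description treats this quantity as the importance of the band.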

Once the auditory characteristic (signal-to-mask ratio) has been calculated, the masking threshold correction unit 18 sets the upper limit of the corrected masking threshold sfbThr(sfb) to a value that allows loss (step S21). It then corrects the masking threshold based on the auditory characteristic and the upper limit (step S24). In other words, in this embodiment the first correction of the masking threshold is performed under the condition that every band to be encoded is allowed to be lost through quantization. As in the first embodiment, the masking threshold is corrected using equation (3) so that the PE value calculated from the power value mdct_pow(sfb) and the corrected masking threshold sfbThr(sfb) equals the target PE value.

When the correction is finished, the acceptance determination unit 20a of the recorrection control unit 20 compares the corrected masking threshold sfbThr(sfb) with the power value mdct_pow(sfb) and determines whether sfbThr(sfb) < mdct_pow(sfb) holds in every band to be encoded (step S27). If it does (step S27; Yes), no band is lost through quantization, so the acceptance determination unit 20a decides to adopt the corrected masking threshold sfbThr(sfb). In this case, as shown in FIG. 2C, the encoding device 1 quantizes the frequency spectrum using the masking threshold sfbThr(sfb) obtained by the immediately preceding correction (step S34).
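The acceptance test of step S27 amounts to a per-band comparison, sketched below. The function names are hypothetical; the comparison itself, sfbThr(sfb) < mdct_pow(sfb) in every band, is the one stated above.

```python
from typing import List, Sequence


def lost_bands(sfb_thr: Sequence[float], mdct_pow: Sequence[float]) -> List[int]:
    """Indices of bands that would be lost, i.e. where sfbThr(sfb) >= mdct_pow(sfb)."""
    if len(sfb_thr) != len(mdct_pow):
        raise ValueError("band counts differ")
    return [i for i, (t, p) in enumerate(zip(sfb_thr, mdct_pow)) if t >= p]


def accept_threshold(sfb_thr: Sequence[float], mdct_pow: Sequence[float]) -> bool:
    """Step S27: adopt the corrected threshold only if no band would be lost."""
    return not lost_bands(sfb_thr, mdct_pow)
```

Returning the indices as well as the yes/no result is a convenience of this sketch; the flowchart itself only needs the boolean.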

If, on the other hand, there is a band with sfbThr(sfb) ≧ mdct_pow(sfb) (step S27; No), that band is lost when the frequency spectrum is quantized using the corrected masking threshold sfbThr(sfb) under evaluation. The first correction is performed, as described above, under the condition that every band is allowed to be lost, so a band with sfbThr(sfb) ≧ mdct_pow(sfb) may be one that is highly important to sound quality, and losing a highly important band through quantization degrades sound quality markedly. Therefore, when such a band exists, the loss-allowable band setting process (step S30) is performed: the masking threshold upper limits are changed so that only bands of low importance are allowed to be lost, and the masking threshold is corrected again. As in the first embodiment, this process is performed by the loss-allowable band setting unit 20b.

FIG. 8 is a flowchart showing the loss-allowable band setting process in the second embodiment. FIG. 9 is a graph illustrating how the proportion of bands whose loss is allowed is determined.

As the loss-allowable band setting process of this embodiment (step S30), the loss-allowable band setting unit 20b first determines, as shown in FIG. 8, the proportion of all bands whose loss is allowed, based on the difference between the target PE value and the initial PE value (step S3020). In step S3020, for example, the initial PE value and the target PE value are read from the storage unit 20c and the difference diffPe is calculated. The proportion lack_ratio of bands whose loss is allowed is then determined based on, for example, the function f(diffPe) shown in FIG. 9. The function f(diffPe) takes the difference diffPe as its parameter and is given by equation (5).

TH_2, TH_3, and TH_4 in FIG. 9 and equation (5) are all arbitrary values and may be set as appropriate based on the bit rate, the acceptable degree of sound quality degradation, and the like.
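Equation (5) and FIG. 9 are not reproduced in this text, so the following is only one plausible shape for f(diffPe), consistent with the later statement that the proportion grows in proportion to diffPe above TH_2 and is limited to TH_4 above TH_3. Three things are assumptions of this sketch: that the proportion is 0 at or below TH_2, that the linear segment reaches TH_4 exactly at TH_3, and that diffPe is a non-negative quantity such as the initial PE value minus the target PE value.

```python
def lack_ratio(diff_pe: float, th2: float, th3: float, th4: float) -> float:
    """Assumed piecewise-linear f(diffPe): 0 up to TH_2, linear ramp, capped at TH_4 from TH_3."""
    if not th2 < th3:
        raise ValueError("TH_3 must be larger than TH_2")
    if diff_pe <= th2:
        return 0.0                      # assumption: no band is released below TH_2
    if diff_pe >= th3:
        return th4                      # proportion limited to TH_4
    # Linear segment between (TH_2, 0) and (TH_3, TH_4); the end points are assumptions.
    return th4 * (diff_pe - th2) / (th3 - th2)
```

For example, with TH_2 = 20, TH_3 = 60, and TH_4 = 0.5 (illustrative numbers only), a diffPe of 40 lies halfway along the ramp and yields a proportion of 0.25.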

Having set the proportion of bands whose loss is allowed, the loss-allowable band setting unit 20b next reads the signal-to-mask ratio and the masking threshold upper limit of each band to be encoded from the storage unit 20c (step S3022). It then allows loss for bands in order, starting from the band with the lowest signal-to-mask ratio (importance) and continuing up to the band whose rank corresponds to the proportion determined in step S3020 (step S3024).

The loss-allowable band setting unit 20b then changes the masking threshold upper limit of every band other than those whose loss is allowed to a value that does not allow loss, and updates the data in the storage unit 20c (step S3026). As described in the first embodiment, an upper limit that does not allow loss is a positive value satisfying sfbThr(sfb) < mdct_pow(sfb).
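Steps S3024 and S3026 can be sketched as a rank-and-cap operation. Several details here are assumptions made for illustration only: the function name, rounding the band count down with `math.floor`, representing "loss allowed" as an infinite upper limit, and choosing the no-loss upper limit as `margin * mdct_pow` with `margin` slightly below 1 (which presumes positive linear-domain powers).

```python
import math
from typing import List, Sequence, Tuple


def set_loss_allowed_bands(
    smr: Sequence[float],        # per-band signal-to-mask ratio (importance)
    mdct_pow: Sequence[float],   # per-band power, assumed linear and > 0
    ratio: float,                # lack_ratio decided in step S3020, in [0, 1]
    margin: float = 0.99,        # assumption: keeps the cap strictly below the power
) -> Tuple[List[int], List[float]]:
    """Return (indices of loss-allowed bands, per-band upper limits of the threshold)."""
    n = len(smr)
    if len(mdct_pow) != n:
        raise ValueError("band counts differ")
    n_allowed = math.floor(ratio * n)                  # rounding rule is an assumption
    by_importance = sorted(range(n), key=lambda b: smr[b])
    allowed = set(by_importance[:n_allowed])           # step S3024: least important bands first
    upper = [
        math.inf if b in allowed else margin * mdct_pow[b]   # step S3026 for the other bands
        for b in range(n)
    ]
    return sorted(allowed), upper
```

The returned upper limits would then be fed back into the correction of step S24, so that only the released bands can end up with a threshold at or above their power.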

When step S3026 is finished, the loss-allowable band setting unit 20b ends the loss-allowable band setting process (return). The process then returns to step S24 shown in FIG. 7, and the initial masking threshold is corrected again.

As described above, in this embodiment the first correction of the initial masking threshold is performed under the condition that every band is allowed to be lost. Under that condition the corrected masking threshold has no upper limit, so the excessive correction described earlier is prevented and the drop in similarity between the shapes of the masking threshold before and after correction is suppressed. Consequently, if the masking threshold obtained by the first correction satisfies sfbThr(sfb) < mdct_pow(sfb) in every band, the similarity between the shapes before and after correction can be said to be high enough that sound quality does not degrade markedly. When the masking threshold from the first correction is adopted, therefore, there is no sound quality degradation from lost bands, and degradation from deviation in the shape of the masking threshold (excessive correction) is also suppressed.

Also in this embodiment, when the masking threshold obtained by the first correction has bands with sfbThr(sfb) ≧ mdct_pow(sfb), the masking threshold is corrected again while allowing the loss not of those bands but of the bands that are least important to sound quality. This suppresses the sound quality degradation caused by lost bands while also suppressing the degradation caused by deviation in the shape of the masking threshold (excessive correction).

Further, in this embodiment, the bands whose loss is allowed are set based on the difference diffPe between the target PE value and the initial PE value, and the masking threshold is then corrected again.

In audio signal encoding, the larger the difference diffPe between the target PE value and the initial PE value, the greater the shortfall in bits at encoding time, so a large diffPe requires a large correction of the masking threshold. Under the condition that no band may be lost, however, the masking threshold of each band cannot be raised above its upper limit (the power value mdct_pow(sfb)). That is, for a band whose masking threshold reaches the upper limit during the correction using equation (3), the threshold cannot be raised further to reduce the number of bits used for quantization. If bits are still short even though some bands have reached their upper limit, the masking thresholds of the bands that have not reached it are raised to reduce the number of bits used for encoding. As a result, under the no-loss condition, the larger the correction, the lower the similarity between the shape of the corrected masking threshold and that of the initial masking threshold. The similarity between the shape of a masking threshold corrected under the no-loss condition and the shape of the initial masking threshold can thus be gauged indirectly from the size of diffPe. In other words, the similarity before and after correction under the no-loss condition can be determined without actually correcting the masking threshold under that condition and without calculating the cross-correlation value corre using equation (4).

As the first embodiment shows, the larger the gap between the shape similarity of the masking threshold and the predetermined reference value, the more bands must be allowed to be lost to raise the similarity above the reference value. In this embodiment, therefore, as with the function f(diffPe) shown in FIG. 9, the proportion (number) of bands whose loss is allowed increases in proportion to diffPe once diffPe exceeds a threshold TH_2. If that proportion (number) is made too large, however, the corrected masking threshold may exceed the upper limit and the number of bands lost through quantization may increase. For this reason, when diffPe is larger than another threshold TH_3 (> TH_2), the proportion of bands whose loss is allowed is limited to the threshold TH_4.

As described above, in the present embodiment, when the masking threshold is corrected again, the ratio of bands allowed to be lost is set, based on the difference value between the target PE value and the initial PE value, so that the similarity in approximate shape between the corrected masking threshold and the initial masking threshold satisfies the reference value. This makes it possible to suppress sound quality degradation caused by band loss while also suppressing sound quality degradation caused by a shift in the approximate shape of the masking threshold (excessive correction).

In addition, the quantities used in the present embodiment to set the bands allowed to be lost, such as the difference value diffPe between the target PE value and the initial PE value and the importance (signal-to-mask ratio) of each band, require less computation than the cross-correlation value corre calculated using Equation (4). Moreover, the target PE value, the initial PE value, and the signal-to-mask ratio have already been calculated by the masking threshold value generation unit 14 and the auditory characteristic calculation unit 16 before the missing allowable band setting process (step S30) is performed. The masking threshold correction process according to the present embodiment can therefore reduce the amount of computation compared with the first embodiment, in which the cross-correlation value corre of the masking thresholds is calculated to judge the similarity in approximate shape. Thus, according to the present embodiment, the masking threshold correction process can be performed efficiently.

Furthermore, in the masking threshold correction process according to the present embodiment, when the difference value diffPe between the target PE value and the initial PE value is large, loss can be allowed for two or more bands of differing importance in a single missing allowable band setting process. The correction process performed when the masking threshold is corrected again can therefore be made even more efficient, which makes application to low-delay coding such as AAC-Enhanced Low Delay (AAC-ELD) even easier.

In the present embodiment, the ratio lack_ratio of bands allowed to be lost when the masking threshold is corrected again is determined based on the function f(diffPe) represented by FIG. 9 and Equation (5). However, the function f(diffPe) is not limited to this; for example, an arbitrary sigmoid function may be used.

[Third Embodiment]
In the present embodiment, the bands allowed to be lost are set based on a value different from that used in the second embodiment, so that the masking threshold correction process in the encoding device according to the present invention can be performed efficiently. As in the first embodiment, the masking threshold correction process of the present embodiment is performed when the initial PE value is found to be larger than the target PE value in the determination of step S18 shown in FIG. 2A (step S18; Yes). The masking threshold correction process according to the present embodiment follows the procedure shown in FIG. 7. That is, the first correction of the initial masking threshold is performed under a condition that allows loss of all bands (steps S21 and S24). If there is a band in which the power value mdct_pow(sfb) of the frequency spectrum and the corrected masking threshold sfbThr(sfb) satisfy sfbThr(sfb) ≥ mdct_pow(sfb) (step S27; Yes), the masking threshold is corrected again. When the masking threshold is corrected again, the missing allowable band setting process (step S30) is performed to set the bands for which loss due to quantization is allowed.
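The check in step S27 amounts to a per-band comparison between the corrected masking threshold and the spectral power. A minimal sketch, assuming both are available as equal-length sequences indexed by scale factor band (the function names are hypothetical and are not part of the described device):

```python
from typing import List, Sequence


def saturated_bands(sfb_thr: Sequence[float], mdct_pow: Sequence[float]) -> List[int]:
    """Return the indices of bands where sfbThr(sfb) >= mdct_pow(sfb).

    Such bands would be lost through quantization.
    """
    if len(sfb_thr) != len(mdct_pow):
        raise ValueError("sfb_thr and mdct_pow must have the same length")
    return [sfb for sfb, (thr, power) in enumerate(zip(sfb_thr, mdct_pow)) if thr >= power]


def needs_recorrection(sfb_thr: Sequence[float], mdct_pow: Sequence[float]) -> bool:
    """True if at least one band satisfies sfbThr(sfb) >= mdct_pow(sfb)."""
    return len(saturated_bands(sfb_thr, mdct_pow)) > 0
```

The length of the list returned by `saturated_bands` corresponds to the count that the later description calls sat_sfb_num.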

When the initial PE value is larger than the target PE value, the masking threshold value generation unit 14 of the encoding device 1 according to the present embodiment passes the initial masking threshold sfbThr₀(sfb), the target PE value, and the power value mdct_pow(sfb) of the frequency spectrum to the acceptance/rejection determination unit 20a. The masking threshold value generation unit 14 also passes the initial masking threshold sfbThr₀(sfb) and the target PE value to the auditory characteristic calculation unit 16.

The re-correction control unit 20 of the encoding device 1 according to the present embodiment performs the masking threshold correction process shown in FIG. 7 as described above, but performs the process shown in FIG. 10 as the missing allowable band setting process (step S30).

FIG. 10 is a flowchart showing the missing allowable band setting process according to the third embodiment of the present invention. FIG. 11 is a graph explaining how the ratio of bands allowed to be lost is determined.

The missing allowable band setting process shown in FIG. 10 is performed by the missing allowable band setting unit 20b of the re-correction control unit 20. The missing allowable band setting unit 20b first determines the ratio of bands allowed to be lost, based on the ratio of the number of bands satisfying sfbThr(sfb) ≥ mdct_pow(sfb) to the number of bands to be encoded (step S3021). In step S3021, the ratio lack_ratio of bands allowed to be lost is determined based on, for example, the function f(sat_ratio) shown in FIG. 11. The function f(sat_ratio) takes as its parameter the ratio of the number of bands satisfying sfbThr(sfb) ≥ mdct_pow(sfb) to the number of bands to be encoded, and is expressed by Equation (6) below.

In Equation (6), encode_sfb_num is the number of bands to be encoded, and sat_sfb_num is the number of bands satisfying sfbThr(sfb) ≥ mdct_pow(sfb).
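Equation (6) itself is not reproduced in this text. Purely as an illustration, one form consistent with the surrounding description (a parameter sat_ratio formed from the two band counts, no allowance up to TH_5, a proportional increase above TH_5, and a cap of TH_7 above TH_6) would be the following. The linear ramp and its slope are assumptions chosen so that the function is continuous; the actual equation may differ.

```latex
\[
\mathrm{sat\_ratio} = \frac{\mathrm{sat\_sfb\_num}}{\mathrm{encode\_sfb\_num}},
\qquad
\mathrm{lack\_ratio} = f(\mathrm{sat\_ratio}) =
\begin{cases}
0, & \mathrm{sat\_ratio} \le TH_5 \\[4pt]
TH_7 \cdot \dfrac{\mathrm{sat\_ratio} - TH_5}{TH_6 - TH_5}, & TH_5 < \mathrm{sat\_ratio} \le TH_6 \\[8pt]
TH_7, & \mathrm{sat\_ratio} > TH_6
\end{cases}
\]
```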

TH_5, TH_6, and TH_7 in FIG. 11 and Equation (6) are all arbitrary values and may be set as appropriate based on the bit rate, the acceptable degree of sound quality degradation, and the like.

After setting the ratio of bands allowed to be lost, the missing allowable band setting unit 20b reads, from the storage unit 20c, the signal-to-mask ratio of each band to be encoded and the upper limit value of the masking threshold (step S3022). The missing allowable band setting unit 20b then allows loss for bands in ascending order of signal-to-mask ratio (importance), starting from the band with the lowest value and continuing up to the rank that corresponds to the ratio determined in step S3021 (step S3024).
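The band selection of step S3024 can be sketched as a sort by signal-to-mask ratio followed by a cut at the rank implied by lack_ratio. In this sketch the function name is hypothetical, the rank is assumed to be obtained by rounding down, and the masking threshold upper limit read in step S3022 is not used.

```python
import math
from typing import List, Sequence


def bands_allowed_to_drop(smr: Sequence[float], lack_ratio: float) -> List[int]:
    """Pick the bands for which loss is allowed (hypothetical sketch).

    smr        -- signal-to-mask ratio (importance) of each band to be encoded
    lack_ratio -- ratio of bands allowed to be lost, in the range 0.0 to 1.0
    """
    if not 0.0 <= lack_ratio <= 1.0:
        raise ValueError("lack_ratio must be between 0 and 1")
    # Assumption: the number of bands is the ratio times the band count, rounded down.
    count = math.floor(lack_ratio * len(smr))
    # Band indices ordered from least important (lowest SMR) to most important.
    order = sorted(range(len(smr)), key=lambda sfb: smr[sfb])
    return sorted(order[:count])
```

For example, with four bands whose signal-to-mask ratios are 12.0, 3.0, 7.5, and 1.0 and a lack_ratio of 0.5, the two least important bands (indices 1 and 3) are selected.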

As described above, in the present embodiment, the first correction of the initial masking threshold is performed under a condition that allows loss of all bands. When the masking threshold is corrected under a condition that allows band loss, the drop in similarity in approximate shape between the masking thresholds before and after correction is suppressed. Therefore, if the masking threshold obtained by the first correction satisfies sfbThr(sfb) < mdct_pow(sfb) in all bands, the similarity in approximate shape of the masking thresholds before and after correction can be regarded as high enough that the sound quality does not degrade significantly. Accordingly, when the masking threshold obtained by the first correction is adopted, there is no sound quality degradation caused by band loss, and sound quality degradation caused by a shift in the approximate shape of the masking threshold (excessive correction) is also suppressed.

Also in the present embodiment, when the corrected masking threshold has a band with sfbThr(sfb) ≥ mdct_pow(sfb), the bands allowed to be lost are set and the masking threshold is corrected again. At this time, the bands allowed to be lost are set, based on the ratio of the number of bands with sfbThr(sfb) ≥ mdct_pow(sfb) to the number of bands to be encoded, so that loss is allowed in order starting from the least important band. This suppresses sound quality degradation caused by band loss while also suppressing sound quality degradation caused by a shift in the approximate shape of the masking threshold (excessive correction).

When the masking threshold used for quantization has a band with sfbThr(sfb) ≥ mdct_pow(sfb), that band is lost through quantization. To suppress band loss due to quantization, the masking threshold must therefore be corrected again so that each band with sfbThr(sfb) ≥ mdct_pow(sfb) comes to satisfy sfbThr(sfb) < mdct_pow(sfb). In other words, the amount of bits used to quantize such a band must be increased so that its masking threshold becomes smaller. However, the number of bits available for quantizing the frequency spectrum has an upper limit that depends on the bit rate. Consequently, lowering the masking threshold of a band with sfbThr(sfb) ≥ mdct_pow(sfb) requires raising the masking threshold of other bands with sfbThr(sfb) < mdct_pow(sfb). Hence, when the masking threshold corrected under a condition that allows loss has many bands with sfbThr(sfb) ≥ mdct_pow(sfb), the masking threshold obtained by correction under a condition that does not allow loss has a low similarity in approximate shape to the initial masking threshold. That is, the similarity in approximate shape between the initial masking threshold and the masking threshold corrected under a condition that does not allow band loss can be grasped indirectly from the number of bands with sfbThr(sfb) ≥ mdct_pow(sfb) in the masking threshold corrected under a condition that allows loss.

As can be seen from the first embodiment, the larger the gap between the similarity in approximate shape of the masking thresholds and the predetermined reference value, the more bands must be allowed to be lost in order to raise the similarity above the reference value. In the present embodiment, therefore, as in the function f(sat_ratio) shown in FIG. 11, once the ratio sat_ratio of bands with sfbThr(sfb) ≥ mdct_pow(sfb) exceeds a threshold TH_5, the ratio (number) of bands allowed to be lost increases in proportion to sat_ratio. However, if the ratio (number) of bands allowed to be lost becomes large, the corrected masking threshold may exceed its upper limit value and more bands may be lost through quantization. Therefore, when the ratio sat_ratio of the number of bands with sfbThr(sfb) ≥ mdct_pow(sfb) to the number of bands to be encoded is larger than another threshold TH_6 (> TH_5), the ratio of bands allowed to be lost is limited to a threshold TH_7.

As described above, in the present embodiment, when the masking threshold is corrected again, loss is allowed in order starting from the least important band, based on the ratio of the number of bands with sfbThr(sfb) ≥ mdct_pow(sfb) to the number of bands to be encoded. The correction of the masking threshold is then repeated, with further bands allowed to be lost each time, until the similarity in approximate shape between the corrected masking threshold and the initial masking threshold satisfies the reference value. This makes it possible to suppress sound quality degradation caused by band loss while also suppressing sound quality degradation caused by a shift in the approximate shape of the masking threshold (excessive correction).
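The repeat-until-acceptable control described above can be pictured as a loop around the threshold correction. The sketch below is deliberately generic and is not the procedure of FIG. 7: the correction routine, the similarity measure, and the rule for extending the set of bands allowed to be lost are passed in as hypothetical callables, and the cap on the number of corrections follows the variation in which the corrected threshold is adopted once a predetermined number of corrections is reached.

```python
from typing import Callable, List, Set, Tuple


def recorrect_until_similar(
    initial_thr: List[float],
    correct: Callable[[List[float], Set[int]], List[float]],
    similarity: Callable[[List[float], List[float]], float],
    extend_allowed: Callable[[Set[int]], Set[int]],
    reference: float,
    max_corrections: int,
) -> Tuple[List[float], int]:
    """Repeat the correction while the similarity is at or below the reference.

    All callables are hypothetical stand-ins for the units of the device.
    Returns the adopted masking threshold and the number of corrections made.
    """
    allowed: Set[int] = set()
    corrected = correct(initial_thr, allowed)
    corrections = 1
    while similarity(initial_thr, corrected) <= reference and corrections < max_corrections:
        # Allow loss for additional bands, then correct the initial threshold again.
        allowed = extend_allowed(allowed)
        corrected = correct(initial_thr, allowed)
        corrections += 1
    return corrected, corrections
```

Passing the callables in keeps the loop independent of how the similarity is judged, so the cross-correlation of the first embodiment or an indirect measure could be substituted without changing the control flow.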

In addition, the values used in the present embodiment to set the bands allowed to be lost have already been calculated by the masking threshold value generation unit 14, the auditory characteristic calculation unit 16, and the masking threshold value correction unit 18 before the missing allowable band setting process (step S30) is performed. The masking threshold correction process according to the present embodiment can therefore reduce the amount of computation compared with the first embodiment, in which the cross-correlation value corre of the masking thresholds is calculated to judge the similarity in approximate shape. Thus, according to the present embodiment, the masking threshold correction process can be performed efficiently.

Furthermore, in the masking threshold correction process according to the present embodiment, when the ratio of the number of bands with sfbThr(sfb) ≥ mdct_pow(sfb) to the number of bands to be encoded is large, loss can be allowed for two or more bands of differing importance in a single missing allowable band setting process. The correction process performed when the masking threshold is corrected again can therefore be made even more efficient, which makes application to low-delay coding such as AAC-ELD even easier.

In the present embodiment, the ratio lack_ratio of bands allowed to be lost when the masking threshold is corrected again is determined based on the function f(sat_ratio) represented by FIG. 11 and Equation (6). However, the function f(sat_ratio) is not limited to this; for example, an arbitrary sigmoid function may be used.
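As one illustration of the sigmoid alternative mentioned above, a logistic curve scaled to a maximum ratio could replace the piecewise function. The parameter names (midpoint, steepness, max_ratio) are hypothetical and do not correspond to TH_5 to TH_7; this is a sketch of one possible choice, not a form given in the description.

```python
import math


def lack_ratio_sigmoid(sat_ratio: float, midpoint: float, steepness: float, max_ratio: float) -> float:
    """Logistic mapping from sat_ratio to lack_ratio (hypothetical sketch).

    sat_ratio -- ratio of bands with sfbThr(sfb) >= mdct_pow(sfb), in the range 0.0 to 1.0
    midpoint  -- sat_ratio at which the output equals max_ratio / 2
    steepness -- larger values give a sharper transition
    max_ratio -- upper bound on the ratio of bands allowed to be lost
    """
    if not 0.0 <= sat_ratio <= 1.0:
        raise ValueError("sat_ratio must be between 0 and 1")
    return max_ratio / (1.0 + math.exp(-steepness * (sat_ratio - midpoint)))
```

Unlike a piecewise-linear ramp, the logistic curve never reaches exactly 0 or max_ratio, so very small outputs would need to be treated as "no band allowed to be lost" when the ratio is converted to a band count.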

The following additional notes are further disclosed with respect to the embodiments, including the examples described above.
(Appendix 1)
An encoding device that converts an audio signal into a frequency spectrum and quantizes and encodes the frequency spectrum, the encoding device comprising:
a threshold generation unit that generates, based on the frequency spectrum, an initial masking threshold for use in quantization;
a threshold correction unit that corrects the initial masking threshold based on the amount of bits available for quantization of the frequency spectrum; and
a re-correction control unit that, when the similarity in approximate shape between the masking threshold corrected by the threshold correction unit and the initial masking threshold is equal to or less than a reference value, sets for each band of the frequency spectrum whether or not to allow loss due to quantization, and causes the threshold correction unit to correct the initial masking threshold again.
(Appendix 2)
The encoding device according to Appendix 1, wherein the re-correction control unit causes the threshold correction unit to correct the initial masking threshold again when the similarity in approximate shape, over the bands for which loss is not allowed, between the initial masking threshold and the corrected masking threshold is equal to or less than the reference value.
(Appendix 3)
The encoding device according to Appendix 2, wherein the re-correction control unit calculates, as the similarity in approximate shape of the masking thresholds, a cross-correlation value between the initial masking threshold and the masking threshold corrected by the threshold correction unit.
(Appendix 4)
The encoding device according to Appendix 1, wherein the re-correction control unit includes a setting unit that determines, based on the similarity in approximate shape between the initial masking threshold and the corrected masking threshold, the ratio of bands allowed to be lost to all bands to be encoded, and sets, based on the determined ratio, whether or not to allow loss for each band of the frequency spectrum.
(Appendix 5)
The encoding device according to Appendix 4, wherein the setting unit uses, as the similarity in approximate shape of the masking thresholds, a difference value between a first perceptual entropy value based on the frequency spectrum and the initial masking threshold and a second perceptual entropy value based on the number of bits available for quantization of the frequency spectrum, and determines the ratio based on the difference value.
(Appendix 6)
The encoding device according to Appendix 4, wherein the setting unit uses, as the similarity in approximate shape of the masking thresholds, the ratio of the number of bands in which the corrected masking threshold is larger than the power value of the frequency spectrum to the number of bands to be encoded, and determines, based on that ratio, the ratio of bands allowed to be lost to all bands to be encoded.
(Appendix 7)
The encoding device according to Appendix 1, wherein the re-correction control unit allows loss in order starting from the least important band among the bands of the frequency spectrum.
(Appendix 8)
The encoding device according to Appendix 7, wherein the re-correction control unit uses, as the importance, a difference value between the power value and the initial masking threshold in each band.
(Appendix 9)
The encoding device according to Appendix 1, wherein, when the number of corrections of the initial masking threshold reaches a predetermined number, the re-correction control unit adopts the corrected masking threshold regardless of the similarity in approximate shape between the corrected masking threshold and the initial masking threshold.
(Appendix 10)
The encoding device according to Appendix 1, wherein the threshold generation unit generates the initial masking threshold based on a psychoacoustic model.
(Appendix 11)
An encoding method in which a computer executes a process comprising, when an initial masking threshold generated based on a frequency spectrum obtained from an audio signal does not satisfy a condition for quantizing the frequency spectrum:
setting, for each band of the frequency spectrum, whether or not to allow loss due to quantization;
correcting the initial masking threshold based on the amount of bits available for quantization of the frequency spectrum and on the setting of whether or not to allow loss of each band; and
when the similarity in approximate shape between the corrected masking threshold and the initial masking threshold is equal to or less than a reference value, changing the setting of whether or not to allow loss for one or more bands and correcting the initial masking threshold again.
(Appendix 12)
The encoding method according to Appendix 11, wherein the computer executes a process of calculating, as the similarity in approximate shape, a cross-correlation value between the corrected masking threshold and the initial masking threshold.
(Appendix 13)
The encoding method according to Appendix 11, wherein the computer executes a process of:
calculating a difference value between a first perceptual entropy value based on the frequency spectrum and the initial masking threshold and a second perceptual entropy value based on the number of bits available for quantization of the frequency spectrum; and
determining, based on the calculated difference value, the ratio of bands allowed to be lost to the bands to be encoded.
(Appendix 14)
The encoding method according to Appendix 11, wherein the computer executes a process of determining the ratio of bands allowed to be lost to the bands to be encoded, based on the ratio of the number of bands in which the corrected masking threshold is larger than the power value of the frequency spectrum to the number of bands to be encoded.
(Appendix 15)
The encoding method according to Appendix 11, wherein, when the corrected masking threshold does not satisfy a predetermined adoption condition, loss is allowed in order starting from the band, among the bands of the frequency spectrum, having the smallest difference value between the power value and the initial masking threshold.
(Appendix 16)
A program that causes a computer to execute a process comprising, when an initial masking threshold generated based on a frequency spectrum obtained from an audio signal does not satisfy a condition for quantizing the frequency spectrum:
setting, for each band of the frequency spectrum, whether or not to allow loss due to quantization;
correcting the initial masking threshold based on the amount of bits available for quantization of the frequency spectrum and on the setting of whether or not to allow loss of each band; and
when the similarity in approximate shape between the corrected masking threshold and the initial masking threshold is equal to or less than a reference value, changing the setting of whether or not to allow loss for one or more bands and correcting the initial masking threshold again.

DESCRIPTION OF SYMBOLS
1 Encoding device
10 Block switching unit
12 MDCT processing unit
14 Masking threshold value generation unit
16 Auditory characteristic calculation unit
18 Masking threshold value correction unit
20 Re-correction control unit
20a Acceptance/rejection determination unit
20b Missing allowable band setting unit
20c Storage unit
22 Quantization unit
24 Encoding unit
26 Multiplexing unit
5 Computer
50 CPU
52 Main storage device
52a ROM
52b RAM
54 Auxiliary storage device
56 Input device
58 Output device
60 DSP
62 Storage medium drive device
64 Interface device

Claims

1. An encoding device that converts an audio signal into a frequency spectrum and quantizes and encodes the frequency spectrum, the encoding device comprising:
a threshold generation unit that generates, based on the frequency spectrum, an initial masking threshold for use in quantization;
a threshold correction unit that corrects the initial masking threshold based on the amount of bits available for quantization of the frequency spectrum; and
a re-correction control unit that, when the similarity in approximate shape between the masking threshold corrected by the threshold correction unit and the initial masking threshold is equal to or less than a reference value, sets for each band of the frequency spectrum whether or not to allow loss due to quantization, and causes the threshold correction unit to correct the initial masking threshold again.

2. The encoding device according to claim 1, wherein the re-correction control unit causes the threshold correction unit to correct the initial masking threshold again when the similarity in approximate shape, over the bands for which loss is not allowed, between the initial masking threshold and the corrected masking threshold is equal to or less than the reference value.

3. The encoding device according to claim 1, wherein the re-correction control unit includes a setting unit that determines, based on the similarity in approximate shape between the initial masking threshold and the corrected masking threshold, the ratio of bands allowed to be lost to all bands to be encoded, and sets, based on the determined ratio, whether or not to allow loss for each band of the frequency spectrum.

4. The encoding device according to claim 3, wherein the setting unit uses, as the similarity in approximate shape of the masking thresholds, a difference value between a first perceptual entropy value based on the frequency spectrum and the initial masking threshold and a second perceptual entropy value based on the number of bits available for quantization of the frequency spectrum, and determines the ratio based on the difference value.

5. The encoding device according to claim 3, wherein the setting unit uses, as the similarity in approximate shape of the masking thresholds, the ratio of the number of bands in which the corrected masking threshold is larger than the power value of the frequency spectrum to the number of bands to be encoded, and determines, based on that ratio, the ratio of bands allowed to be lost to all bands to be encoded.

6. The encoding device according to claim 1, wherein the re-correction control unit allows loss in order starting from the least important band among the bands of the frequency spectrum.

7. An encoding method in which a computer executes a process comprising, when an initial masking threshold generated based on a frequency spectrum obtained from an audio signal does not satisfy a condition for use in quantization of the frequency spectrum:
setting, for each band of the frequency spectrum, whether or not to allow loss due to quantization;
correcting the initial masking threshold based on the amount of bits available for quantization of the frequency spectrum and on the setting of whether or not to allow loss of each band; and
when the similarity in approximate shape between the corrected masking threshold and the initial masking threshold is equal to or less than a reference value, changing the setting of whether or not to allow loss for one or more bands and correcting the initial masking threshold again.

8. A program that causes a computer to execute a process comprising, when an initial masking threshold generated based on a frequency spectrum obtained from an audio signal does not satisfy a condition for use in quantization of the frequency spectrum:
setting, for each band of the frequency spectrum, whether or not to allow loss due to quantization;
correcting the initial masking threshold based on the amount of bits available for quantization of the frequency spectrum and on the setting of whether or not to allow loss of each band; and
when the similarity in approximate shape between the corrected masking threshold and the initial masking threshold is equal to or less than a reference value, changing the setting of whether or not to allow loss for one or more bands and correcting the initial masking threshold again.