JP2006145782A

JP2006145782A - Encoding device and method for audio signal

Info

Publication number: JP2006145782A
Application number: JP2004335005A
Authority: JP
Inventors: Masanobu Funakoshi; 正伸船越
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2004-11-18
Filing date: 2004-11-18
Publication date: 2006-06-08
Anticipated expiration: 2024-11-18
Also published as: CN101061534A; JP4639073B2; CN101061534B

Abstract

PROBLEM TO BE SOLVED: To reduce the processing load of quantization in audio signal encoding.
SOLUTION: The apparatus comprises:
- a frame division section (1);
- a psychoacoustic calculation section (2);
- a filter bank (3);
- a scale factor calculation section (4), which weights the spectrum of each frequency band using the results of the psychoacoustic calculation section (2);
- a quantization step determination section (7);
- a spectrum quantization section (8);
- a bit shaping section (9), which shapes the quantized data into a bitstream and outputs it.
The quantization step determination section (7) computes the quantization step for the entire spectrum before quantization. It subtracts the information amount of the entire spectrum after quantization from the auditory information amount of the entire weighted spectrum before quantization, then multiplies the difference by a coefficient derived from the increment width of the quantization roughness. It predicts the information amount of the entire quantized spectrum from the number of bits allocated to the frame being encoded.
COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to an audio signal encoding apparatus and method.

In recent years, high-quality, high-efficiency audio coding has come into wide use in many applications:
- DVD-Video audio tracks;
- portable audio players based on semiconductor memory or HDDs;
- music distribution over the Internet;
- storage of music on home servers in home LANs.

As it has spread, its importance has grown.

Many such audio coding techniques perform time-frequency conversion by transform coding. MPEG-2 AAC and Dolby Digital (AC-3), for example, build the filter bank from a single orthogonal transform such as the MDCT. MPEG-1 Audio Layer III (MP3) and ATRAC (the coding scheme used in MiniDisc) instead build it by cascading subband division filters such as QMF with orthogonal transforms.

These transform coding techniques perform masking analysis based on the characteristics of human hearing. They then either remove spectral components judged to be masked or tolerate quantization errors that will be masked. Either way, less information is needed to represent the spectrum, which raises compression efficiency.

Many of these techniques also compress the information carried by the spectrum by quantizing the spectral components nonlinearly. MP3 and AAC, for example, raise each spectral component to the power 0.75.

These techniques also group the filter bank output into divided frequency bands set according to the frequency resolution of human hearing. At quantization time, a normalization coefficient is determined for each band from the auditory analysis. Each frequency component is then represented by its band's normalization coefficient together with its quantized spectral value, which reduces the amount of information.

In practice, the normalization coefficient is a variable that adjusts the quantization roughness of its band. A change of 1 in the coefficient changes the quantization roughness by one step. In MPEG-2 AAC, these bands are called scale factor bands (SFBs) and the normalization coefficients are called scale factors.

These schemes control the amount of code by controlling the quantization roughness of the entire frame, which is the coding unit. In many schemes, the roughness is controlled in steps of integer powers of some base, and this integer is called the quantization step. The MPEG audio standards call the quantization step for the whole frame the "global gain" or "common scale factor". The scale factors described above are expressed relative to the quantization step, which reduces the information needed to code these variables.

For example, in MP3 and AAC a change of 1 in either of these variables changes the actual quantization roughness by a factor of 2^(3/16).
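The Python sketch below checks both facts numerically. It assumes AAC-style companding: a coefficient is scaled by 2^(-(global_gain - scalefac)/4) and then raised to the power 3/4, with rounding omitted. The function name `compand` is illustrative only.

```python
def compand(x, global_gain=0, scalefac=0):
    """AAC-style companding before rounding (assumed form):
    (|x| * 2^(-(global_gain - scalefac)/4)) ** 0.75
    """
    return (abs(x) * 2.0 ** (-(global_gain - scalefac) / 4.0)) ** 0.75


# The 0.75 power shrinks a 1000:1 amplitude ratio to roughly 178:1.
range_ratio = compand(1000.0) / compand(1.0)

# One extra step of global_gain divides every companded value by 2^(3/16),
# because the 2^(1/4) step on the input is also raised to the power 3/4.
step_ratio = compand(500.0, global_gain=10) / compand(500.0, global_gain=11)

print(f"range ratio {range_ratio:.1f}, step ratio {step_ratio:.4f}")
```

The same 2^(3/16) ratio holds for any nonzero input, since the input magnitude cancels in the quotient.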

Quantization in a transform coding scheme must do two things at once:
- control the scale factors so that they reflect the auditory calculation and quantization distortion stays masked;
- adjust the quantization step, which sets the roughness of the whole frame and therefore the frame's code amount.

Both values strongly affect coding quality, so they must be controlled together carefully, accurately and efficiently.

The MP3 standard (ISO/IEC 11172-3) and the MPEG-2 AAC standard (ISO/IEC 13818-7) describe an iterative method for controlling the scale factors and global gain during quantization. It uses two nested loops: a distortion control loop (outer loop) and a code amount control loop (inner loop). The method is described below with reference to the drawings, using MPEG-2 AAC as the example.

FIG. 13 is a simplified flowchart of the quantization process described in the ISO/IEC standards.

First, in step S501, the scale factors of all SFBs and the global gain are initialized to 0, and the distortion control loop (outer loop) is entered.

The distortion control loop begins by executing the code amount control loop (inner loop).

In the code amount control loop, step S502 first quantizes one frame, that is, 1024 spectral components, according to the following quantization equation:

$$X_q = \mathrm{int}\left[\left(|x_i| \cdot 2^{-\frac{1}{4}(\mathrm{global\_gain} - \mathrm{scalefac})}\right)^{3/4} + 0.4054\right] \qquad (1)$$

In Equation (1):
- X_q is the quantized spectrum;
- x_i is the spectrum before quantization (the MDCT coefficients);
- global_gain is the global gain;
- scalefac is the scale factor of the SFB containing the spectral component.

Next, step S503 calculates how many bits the frame uses when these quantized values are Huffman-coded. Step S504 compares this with the number of bits allocated to the frame. If more bits are used than allocated, step S505 increases the global gain by 1 to coarsen the quantization, and the process returns to step S502. This repeats until the bits needed after quantization no longer exceed the allocation. The global gain at that point is adopted, and the code amount control loop ends.

In step S506, the spectrum quantized by the code amount control loop is dequantized. The quantization error is its difference from the spectrum before quantization, accumulated per SFB.

Step S507 checks whether the scale factors of all SFBs have become greater than 0, or whether the quantization error is within the allowable range. If any SFB satisfies neither condition, the process goes to step S508. There, the scale factor of each SFB whose error exceeds its allowable range is increased by 1, and the distortion control loop repeats. The allowable error of each SFB is obtained by the auditory calculation before quantization.
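For comparison, the double loop of FIG. 13 can be sketched in Python. Several details below are assumptions, not text from the standard:
- Equation (1) takes its AAC form with a 0.4054 rounding offset.
- `toy_bits` is a made-up bit cost standing in for the Huffman table search.
- The global gain is carried over between outer iterations.
- The outer loop stops when no band exceeds its allowed error, when every band has been amplified, or when a scale factor reaches a cap.

The `passes` counter shows how many full-frame quantizations the nested loops consume.

```python
MAGIC = 0.4054  # rounding offset assumed from the ISO reference quantizer


def quantize(x, global_gain, scalefac):
    """Eq. (1): int((|x| * 2^(-(global_gain - scalefac)/4))^(3/4) + 0.4054)."""
    return int((abs(x) * 2.0 ** (-(global_gain - scalefac) / 4.0)) ** 0.75 + MAGIC)


def dequantize(q, global_gain, scalefac):
    """Approximate inverse of Eq. (1); the rounding offset is ignored."""
    return q ** (4.0 / 3.0) * 2.0 ** ((global_gain - scalefac) / 4.0)


def toy_bits(qs):
    """HYPOTHETICAL bit cost: zeros are free, others cost bit_length + 1."""
    return sum(q.bit_length() + 1 for q in qs if q)


def iso_double_loop(spec, sfb_offsets, allowed_error, frame_bits, max_sf=60):
    nbands = len(sfb_offsets) - 1
    band_of = [b for b in range(nbands)
               for _ in range(sfb_offsets[b], sfb_offsets[b + 1])]
    scalefac = [0] * nbands          # S501
    global_gain = 0                  # S501
    passes = 0                       # full-frame quantizations performed
    while True:                      # outer (distortion control) loop
        while True:                  # inner (code amount control) loop
            q = [quantize(x, global_gain, scalefac[band_of[i]])
                 for i, x in enumerate(spec)]            # S502
            passes += 1
            if toy_bits(q) <= frame_bits:                # S503-S504
                break
            global_gain += 1                             # S505
        err = [0.0] * nbands                             # S506
        for i, x in enumerate(spec):
            b = band_of[i]
            err[b] += (abs(x) - dequantize(q[i], global_gain, scalefac[b])) ** 2
        bad = [b for b in range(nbands) if err[b] > allowed_error[b]]
        # S507 (assumed stop rule): no band too distorted, every band
        # already amplified, or a scale factor has reached its cap.
        if not bad or all(s > 0 for s in scalefac) or max(scalefac) >= max_sf:
            return global_gain, scalefac, q, passes
        for b in bad:                                    # S508
            scalefac[b] += 1
```

Lowering `allowed_error` forces more outer iterations. Each outer iteration reruns at least one full quantization, which is the cost the invention aims to avoid.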

As described above, the quantization method in the ISO standards is a double loop, and the global gain and scale factors change only one step at a time. Spectral quantization and bit counting are therefore repeated many times before the process converges.

In MPEG-2 AAC, for example, each spectral quantization evaluates Equation (1) 1024 times, so it is computationally expensive. Bit counting must also search 11 Huffman code tables, so an exhaustive search makes it expensive as well.

The distortion control loop also computes the quantization error after dequantization, which adds further cost. As a result, the double loop needs an enormous amount of processing before it converges.

Various attempts have been made to reduce this processing load by reducing the number of double-loop iterations.

For example, Japanese Patent Laid-Open No. 2003-271199 (Patent Document 1) changes the common scale factor and the scale factors in jumps rather than in steps of 1. The jump size is determined from the characteristics of the Huffman code tables, which reduces the iterations of each loop and thus the processing load.

Japanese Patent Laid-Open No. 2001-184091 (Patent Document 2) discloses a method that proceeds in three steps:
1. estimate the quantization step;
2. calculate the scale factors according to the MNR;
3. run the usual inner loop.

The paper by A. D. Duenes, R. Perez, B. Rivas et al., "A robust and efficient implementation of MPEG-2/4 AAC Natural Audio Coders" (AES 112th Convention Paper, 2002; Non-Patent Document 1), computes the scale factors before spectral quantization. It does this with a rearranged form of Equation (1) and the allowable error energy of each SFB from auditory analysis. This removes the outer distortion control loop and reduces the processing load.

These conventional techniques speed up the convergence of the double loop and reduce the processing load of quantization to some extent.

Patent Document 1: JP 2003-271199 A
Patent Document 2: JP 2001-184091 A
Non-Patent Document 1: A. D. Duenes, R. Perez, B. Rivas et al., "A robust and efficient implementation of MPEG-2/4 AAC Natural Audio Coders," AES 112th Convention Paper, 2002.

However, the conventional techniques cannot eliminate the iterations of the double loop described in the ISO standards. Spectral quantization still has to be repeated from several to several tens of times before quantization finishes. Quantization therefore still accounts for a large share of the total encoding load.

In particular, the outer distortion control loop can be removed by computing the scale factors in advance from the auditory calculation results. The conventional techniques, however, offer no way to compute the quantization step before quantization.

Consequently, the conventional techniques still repeat spectral quantization and bit counting in the code amount control loop, wasting processing.

The present invention was made in view of these problems. Its object is to greatly reduce the processing load of quantization in audio signal encoding.

The invention rests on the idea that the overall quantization roughness can be found by dividing the information amount before quantization by the information amount after quantization. This lets the quantization step be computed before any quantization is performed.

The quantization roughness is generally a base raised to the power of the quantization step. Taking the logarithm to that base therefore turns the division of information amounts into a difference. Multiplying this difference by a coefficient determined by the quantization increment gives an accurate quantization step.

The actual information amount after quantization is unknown until quantization has been done. It can, however, be predicted from the amount of code allocated to the frame, and the invention uses this prediction to obtain an accurate quantization step in advance.

For example, an audio signal encoding apparatus according to one aspect of the present invention comprises the following units:
- a frame division unit that divides an audio input signal, channel by channel, into frames that serve as processing units;
- a psychoacoustic calculation unit that analyzes the audio input signal, determines the transform block length, and calculates auditory masking;
- a filter bank unit that divides the frame to be processed into blocks according to the transform block length from the psychoacoustic calculation unit, and converts the frame's time-domain signal into one or more sets of frequency spectra;
- a scale factor calculation unit that divides the filter bank output into a plurality of frequency bands and weights the spectrum of each band according to the psychoacoustic calculation results;
- a quantization step determination unit that determines the quantization step for the entire frame before spectral quantization. It subtracts the information amount of the entire quantized spectrum from the auditory information amount of the entire weighted spectrum before quantization, and multiplies the result by a coefficient obtained from the increment width of the quantization roughness;
- a spectrum quantization unit that quantizes the frequency spectrum sequence using the scale factors and the quantization step;
- a bit shaping unit that shapes the quantized spectrum into a bitstream according to a prescribed format and outputs it.

The quantization step determination unit includes a quantized spectrum information amount prediction unit. This unit predicts the information amount of the entire quantized spectrum from the number of bits allocated to the frame being encoded.

In the present invention, once the scale factors have been calculated and fixed, the quantization step can be computed from them almost exactly. Quantization can therefore usually be completed with essentially one pass of spectral quantization and bit counting.

The present invention greatly reduces the processing load of quantization in audio signal encoding.

Preferred embodiments of the present invention are described in detail below with reference to the drawings.

(First Embodiment)
FIG. 1 shows an example configuration of the audio signal encoding apparatus of this embodiment. Thick lines indicate data signals and thin lines indicate control signals.

In the illustrated configuration, reference numeral 1 denotes a frame divider that divides the audio input signal into frames, the processing units. The framed signal is sent to the psychoacoustic calculator 2 and the filter bank 3, described below.

Reference numeral 2 denotes a psychoacoustic calculator. It analyzes the audio input signal frame by frame and performs masking calculations in frequency bands finer than the SFBs. It outputs the block type to the filter bank 3 and the signal-to-mask ratio (SMR) of each SFB to the scale factor calculator 4.

Reference numeral 3 denotes a filter bank. It windows the time signal from the frame divider 1 with the window for the block type specified by the psychoacoustic calculator 2. It then performs time-frequency conversion at the specified block length to obtain the frequency spectrum.

Reference numeral 4 denotes a scale factor calculator. It calculates the allowable error energy of each SFB from that SFB's SMR and the frequency spectrum, and determines the scale factors of all SFBs from it.

Reference numeral 5 denotes a spectrum allocation bit calculator. It calculates the number of bits allocated to the quantized spectrum codes.

Reference numeral 6 denotes a quantized spectrum total amount predictor. It predicts the total of the quantized spectrum (the sum of the quantized spectral values) from the number of spectrum allocation bits.

Reference numeral 7 denotes a quantization step calculator. It calculates the auditory information amount of the spectrum before quantization. It then obtains the quantization step by subtracting the information amount of the quantized spectrum, derived from the predicted quantized spectrum total.

Reference numeral 8 denotes a spectrum quantizer that quantizes each frequency spectrum.

Reference numeral 9 denotes a bit shaper. It shapes the scale factors and quantized spectrum into the prescribed format to create a bitstream, and outputs it.

The processing of an audio signal by the apparatus configured as above is described below.

This embodiment uses MPEG-2 AAC as the example coding scheme. The same method applies equally to other coding schemes that use a similar quantization technique.

Before processing begins, each unit is initialized. This sets the quantization step and all scale factors to 0.

An audio input signal, such as a PCM signal, is divided into frames by the frame divider 1. The frames are sent to the psychoacoustic calculator 2 and the filter bank 3. In the MPEG-2 AAC LC (Low Complexity) profile, one frame consists of 1024 PCM samples.
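A minimal Python sketch of this framing step, assuming one mono channel held in a list. It also pairs each frame with the previous one into the 2048-sample input that the filter bank 3 transforms. The helper names and the zero-padding of the last frame (and of the "previous frame" before the first) are assumptions, not part of the source.

```python
def split_frames(pcm, frame_len=1024):
    """Split one channel into frame_len-sample frames, zero-padding the last."""
    frames = []
    for start in range(0, len(pcm), frame_len):
        frame = list(pcm[start:start + frame_len])
        frame += [0.0] * (frame_len - len(frame))   # pad a short final frame
        frames.append(frame)
    return frames


def transform_inputs(frames):
    """Yield previous frame + current frame (2 x 1024 samples) per frame."""
    prev = [0.0] * len(frames[0]) if frames else []
    for frame in frames:
        yield prev + frame
        prev = frame                                 # buffered for next call


frames = split_frames(list(range(2500)))
blocks = list(transform_inputs(frames))
```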

The psychoacoustic calculator 2 analyzes the input signal from the frame divider 1 and performs auditory masking analysis. It outputs the block type to the filter bank 3 and the SMR of each SFB to the scale factor calculator 4. This analysis and masking calculation are well known in the art and are not described in detail here.

The filter bank 3 combines the current frame from the frame divider 1 with the previous frame's input, received at the previous transform. It then converts this 2048-sample, two-frame time signal into frequency components according to the block type output by the psychoacoustic calculator 2. In this embodiment, the previous frame's input is held in a buffer inside the filter bank 3.
- Long block length: the 2048 input samples form one block. It is windowed with a window shape matching the block type and transformed by the MDCT, giving 1024 spectral coefficients.
- Short block length: a 256-sample block starting at sample 448 of the 2048 input samples is windowed and transformed by the MDCT, giving 128 frequency components. This is repeated eight times, shifting the block by 128 samples each time, which yields eight sets of frequency spectra.
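The long and short transforms can be sketched with a direct O(N^2) MDCT in Python. The sketch follows the usual AAC form of the MDCT, X[k] = 2 Σ z[n] cos(2π/N (n + n0)(k + 1/2)) with n0 = (N/2 + 1)/2. It applies a plain sine window to every block, whereas a real encoder picks window shapes, including transition windows, from the block type. Treating "sample 448" as a zero-based index is also an assumption.

```python
import math


def sine_window(n):
    """Sine window of length n (one of the window shapes AAC allows)."""
    return [math.sin(math.pi / n * (i + 0.5)) for i in range(n)]


def mdct(block):
    """Direct MDCT: N windowed samples -> N/2 coefficients."""
    n = len(block)
    n0 = (n / 2 + 1) / 2
    z = [s * w for s, w in zip(block, sine_window(n))]
    return [2.0 * sum(z[i] * math.cos(2.0 * math.pi / n * (i + n0) * (k + 0.5))
                      for i in range(n))
            for k in range(n // 2)]


def long_block(two_frames):
    """One 2048-sample block -> 1024 spectral coefficients."""
    assert len(two_frames) == 2048
    return mdct(two_frames)


def short_blocks(two_frames):
    """Eight 256-sample blocks starting at sample 448, hop 128 -> 8 x 128."""
    assert len(two_frames) == 2048
    return [mdct(two_frames[448 + 128 * j: 448 + 128 * j + 256])
            for j in range(8)]
```

The last short block covers samples 1344 to 1599, so all eight blocks fit inside the 2048-sample input. A production encoder would use an FFT-based MDCT rather than this quadratic loop.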

The scale factor calculator 4 calculates the allowable error energy of each SFB from the spectral components of the filter bank 3 and the per-SFB SMR values of the psychoacoustic calculator 2. From this it calculates the scale factor of each SFB. Methods for deriving scale factors from allowable error energy are well known and are not detailed here. With the method of Non-Patent Document 1, for example, the scale factor scalefac[b] of SFB b in MPEG-2 AAC is given by the following equation.

In Equation (2):
- x_avg is the average level of the spectral components in SFB b;
- xmin[b] is the allowable error energy of SFB b.

Let energy[b] be the spectral energy of SFB b, SMR[b] its signal-to-mask ratio, and sfb_width[b] the number of spectral lines it contains. Then xmin[b] is given by the following equation.

The spectrum allocation bit calculator 5 calculates how many bits are needed to Huffman-code the scale factors from the scale factor calculator 4. It subtracts this from the specified number of frame bits to obtain the bits allocated to the quantized spectrum, and outputs the result to the quantized spectrum total amount predictor 6.

The quantized spectrum total amount predictor 6 predicts the total of the quantized spectrum from the number of bits output by the spectrum allocation bit calculator 5. In this embodiment, the prediction uses an approximate expression. The expression is built from measurements of how the number of spectrum allocation bits relates to the total of the quantized spectrum when a conventional quantizer is used. With F(x) denoting this expression and spectrum_bits the number of spectrum allocation bits, the predicted total of the quantized spectrum is:

$$\sum_i X_q = F(\mathrm{spectrum\_bits}) \qquad (4)$$
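The source does not state the form of F(x). The Python sketch below is a hedged illustration in two parts:
- it computes the spectrum allocation bits as described for calculator 5;
- it fits a straight line F(x) = a·x + b by least squares to measured (spectrum_bits, ΣX_q) pairs.

The linear form, the helper names and the sample pairs are all hypothetical.

```python
def spectrum_allocation_bits(frame_bits, scalefactor_bits):
    """Calculator 5: bits left for the quantized spectrum."""
    return frame_bits - scalefactor_bits


def fit_linear(pairs):
    """Least-squares fit of total = a * bits + b (HYPOTHETICAL form of F)."""
    n = len(pairs)
    mean_x = sum(bits for bits, _ in pairs) / n
    mean_y = sum(total for _, total in pairs) / n
    sxx = sum((bits - mean_x) ** 2 for bits, _ in pairs)
    sxy = sum((bits - mean_x) * (total - mean_y) for bits, total in pairs)
    a = sxy / sxx
    return a, mean_y - a * mean_x


# Made-up measurements of (spectrum_bits, sum of Xq) from a conventional encoder.
a, b = fit_linear([(1000, 520), (2000, 1030), (3000, 1540)])
bits = spectrum_allocation_bits(frame_bits=2200, scalefactor_bits=200)
predicted_total = a * bits + b   # Eq. (4) with the fitted F
```

In practice F would be fitted offline on a large set of frames, possibly with a nonlinear model. Only the fitted coefficients are needed at encode time.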

The quantization step calculator 7 first weights each frequency spectrum output by the filter bank 3 auditorily by its scale factor and sums the results. From this sum it computes the auditory information amount of the spectrum before quantization.
Next, it computes the information amount of the quantized spectrum from the total output by the quantized spectrum total amount predictor 6.

Finally, it subtracts the information amount of the quantized spectrum from the auditory information amount of the spectrum before quantization. It multiplies the difference by a coefficient derived from the increment width of the quantization roughness. The result is the quantization step, that is, the quantization roughness of the whole frame.

Specifically, for MPEG-2 AAC the predicted quantization step is given by:

$$\mathrm{global\_gain} = \frac{16}{3}\left(\log_2 \sum_i |x_i|^{3/4} \cdot 2^{\frac{3}{16}\mathrm{scalefac}} - \log_2 \sum_i X_q\right) \qquad (5)$$

In Equation (5):
- X_q is the quantized spectrum;
- x_i is the spectrum before quantization;
- global_gain is the global gain (the quantization step);
- scalefac is the scale factor of the SFB containing the spectral component.

The sum over i covers one frame, that is, 0 ≤ i ≤ 1023.

The first term on the right side of Equation (5),

$$\log_2 \sum_i |x_i|^{3/4} \cdot 2^{\frac{3}{16}\mathrm{scalefac}}$$

is the auditory information amount of the entire spectrum before quantization. It is the sum of all spectral components, each weighted auditorily by its scale factor.

The second term, log₂ Σ_i X_q, is the information amount of the quantized spectrum. Here Σ_i X_q is the total of the quantized spectrum predicted by the quantized spectrum total amount predictor 6. As described above, it is obtained, for example, by evaluating the approximate expression (4).

Equation (5) can be derived by suitably rearranging the spectrum quantization equation (1).
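One possible derivation is sketched below. Equation (1) is not reproduced in this section, so the sketch assumes an MPEG-2 AAC style quantizer $X_{q,i} \approx (|x_i|\,2^{-(\mathrm{global\_gain}-\mathrm{scalefac}_i)/4})^{3/4}$ with the rounding offset neglected; both this form and the sign convention for scalefac are assumptions, not the patent's own equations.

$$
X_{q,i} \approx 2^{-\frac{3}{16}\mathrm{global\_gain}}\,|x_i|^{3/4}\,2^{\frac{3}{16}\mathrm{scalefac}_i}
$$

$$
\log_2\sum_{i=0}^{1023} X_{q,i} \approx -\frac{3}{16}\,\mathrm{global\_gain} + \log_2\sum_{i=0}^{1023} |x_i|^{3/4}\,2^{\frac{3}{16}\mathrm{scalefac}_i}
$$

$$
\mathrm{global\_gain} \approx \frac{16}{3}\left(\log_2\sum_{i=0}^{1023} |x_i|^{3/4}\,2^{\frac{3}{16}\mathrm{scalefac}_i} - \log_2\sum_{i=0}^{1023} X_{q,i}\right)
$$

Under these assumptions, the first logarithm plays the role of the pre-quantization auditory information amount, the second is log₂Σ_iX_q, and the coefficient 16/3 follows from the step size: one global_gain step scales the amplitude by 2^{1/4}, and therefore the quantized value by 2^{3/16}.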

The spectrum quantizer 8 quantizes the 1024 frequency spectral components according to the scale factors output by the scale factor calculator 4 and the quantization step output by the quantization step calculator 7. Specifically, for MPEG-2 AAC, the quantized spectrum is calculated by Equation (1), and the number of bits consumed by the entire frame is counted.

If the number of used bits exceeds the number of spectrum allocation bits, the quantization step is increased and the spectrum is quantized again, repeating until the used bits fit within the spectrum allocation bits. However, because the quantization step calculator 7 predicts the step accurately, in most cases the quantized spectrum and the bit count are calculated only once.
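A minimal sketch of the per-frame quantization described above follows. Equation (1) is not reproduced in this section, so the quantizer form (|x|·2^{-(global_gain − scalefac)/4})^{3/4} + 0.4054 with truncation is an assumption modeled on the MPEG-2 AAC informative encoder; the band layout `sfb_offsets` and the function name are hypothetical.

```python
from typing import List, Sequence

MAGIC = 0.4054  # rounding offset assumed from the AAC informative encoder


def quantize_frame(spectrum: Sequence[float],
                   sfb_offsets: Sequence[int],
                   scalefacs: Sequence[int],
                   global_gain: int) -> List[int]:
    """Quantize one frame of spectral coefficients (assumed Equation (1) form).

    Band b covers indices sfb_offsets[b] .. sfb_offsets[b + 1] - 1.
    The effective step of band b is global_gain - scalefacs[b], so a larger
    scale factor makes that band's quantization finer (assumed convention).
    """
    quantized = [0] * len(spectrum)
    for b, sf in enumerate(scalefacs):
        gain = 2.0 ** (-(global_gain - sf) / 4.0)
        for i in range(sfb_offsets[b], sfb_offsets[b + 1]):
            x = spectrum[i]
            q = int((abs(x) * gain) ** 0.75 + MAGIC)  # nonlinear 3/4-power law
            quantized[i] = q if x >= 0 else -q        # restore the sign
    return quantized
```

For example, `quantize_frame([16.0, -16.0, 0.0, 1.0], [0, 2, 4], [0, 0], 0)` gives `[8, -8, 0, 1]`, since 16^{3/4} = 8.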

The scale factors and quantized spectrum of each SFB are shaped by the bit shaper 9 into a bit stream in a predetermined format and output.

As described above, the audio signal encoding apparatus of this embodiment predicts the total amount of the quantized spectrum from the number of bits allocated to the frame and uses it to compute the difference between the information amounts of the entire spectrum before and after quantization, thereby predicting the quantization step almost exactly before spectrum quantization. Because fewer iterations are needed to adjust the quantization step, the quantization process finishes quickly.

(Second Embodiment)
The present invention can also be implemented as a software program running on a general-purpose computer such as a personal computer (PC). This case is described below with reference to the drawings.

FIG. 5 is a diagram showing a configuration example of the audio signal encoding apparatus according to this embodiment.

In the illustrated configuration, reference numeral 100 denotes a CPU, which performs arithmetic operations, logical decisions, and the like for the audio signal encoding process and controls each component via the bus 102.

Reference numeral 101 denotes a memory, which stores the basic I/O program of this configuration example, the program code being executed, data required during program processing, and the like.

Reference numeral 102 denotes a bus, which carries address signals designating the component to be controlled by the CPU 100, carries control signals for each such component, and transfers data between the components.

Reference numeral 103 denotes a terminal used to start the apparatus, set various conditions and the input signal, and instruct the start of encoding.

Reference numeral 104 denotes an external storage device that provides an external storage area for data, programs, and the like, and is implemented by, for example, a hard disk drive. Programs including the OS, data, and the like are stored here and are called by the CPU 100 when needed. As described later, the audio signal encoding processing program is also installed on this external storage device 104.

Reference numeral 105 denotes a media drive. Programs, data, digital audio signals, and the like recorded on a recording medium (for example, a CD-ROM) are read by the media drive 105 and loaded into the audio signal encoding apparatus. The media drive can also write data and executable programs stored in the external storage device 104 onto a recording medium.

Reference numeral 106 denotes a microphone, which picks up actual sound and converts it into an audio signal. Reference numeral 107 denotes a speaker, which can output any audio signal data as actual sound.

Reference numeral 108 denotes a communication network made up of a LAN, public lines, wireless links, broadcast waves, and the like. Reference numeral 109 denotes a communication interface connected to the communication network 108. Through the communication interface 109 and the communication network 108, the audio signal encoding apparatus of this embodiment can communicate with external devices and send and receive data and programs.

The audio signal encoding apparatus configured as above operates in response to various inputs from the terminal 103. When an input is supplied from the terminal 103, an interrupt signal is sent to the CPU 100, which then reads various control signals stored in the memory 101 and performs the corresponding control.

In the audio signal encoding apparatus of this embodiment, the CPU 100 executes the basic I/O program stored in the memory 101, which loads the OS stored in the external storage device 104 into the memory 101, and the apparatus operates by executing this OS. Specifically, when the apparatus is powered on, the IPL (initial program loading) function in the basic I/O program reads the OS from the external storage device 104 into the memory 101, and the OS starts running.

The audio signal encoding processing program is coded based on the flowchart of the audio signal encoding procedure shown in FIG. 2.

FIG. 6 is a diagram showing an example of the content layout when the audio signal encoding processing program and related data are recorded on a recording medium. In this embodiment, the program and its related data are recorded on a recording medium. As shown in the figure, directory information for the recording medium is recorded in its leading area, followed by the contents of the medium, namely the audio signal encoding processing program and the audio signal encoding related data, recorded as files.

FIG. 7 is a schematic diagram showing how the audio signal encoding processing program is installed on the audio signal encoding apparatus (PC). As shown in FIG. 7, the program and related data recorded on the recording medium can be loaded into the apparatus through the media drive 105. When the recording medium 110 is set in the media drive 105, the program and its related data are read from the recording medium 110 under the control of the OS and the basic I/O program and stored in the external storage device 104. They are then loaded into the memory 101 at the next restart and become executable.

FIG. 8 is a diagram showing the memory map when the audio signal encoding processing program of this embodiment has been loaded into the memory 101 and is ready to run. As illustrated, the work area of the memory 101 stores, for example, the reference bit rate, reference sampling rate, bit rate, sampling rate, allocation bit upper limit, average allocation bits, PE bits, used bits, scale factor bits, spectrum allocation bits, pre-quantization spectrum auditory information amount, post-quantization spectrum predicted information amount, allowable error energy, spectrum buffer, quantized spectrum, input signal buffer, scale factors, quantization step, block type, SMR, PE, and reserve bit amount.

FIG. 9 is a diagram showing a configuration example of the input signal buffer in the audio signal encoding apparatus of this embodiment. In the illustrated configuration, the buffer holds 1024 × 3 samples, divided by vertical lines every 1024 samples for convenience. The input signal enters from the right one frame (1024 samples) at a time and is processed sequentially from the left. The figure schematically shows the input signal buffer for one channel; in this embodiment, an identical buffer is prepared for each channel of the input signal.

The audio signal encoding process executed by the CPU 100 in this embodiment is described below with reference to flowcharts.

FIG. 2 is a flowchart of the audio signal encoding process in this embodiment. The program corresponding to this flowchart is part of the audio signal encoding processing program and, as described above, is loaded into the memory 101 and executed by the CPU 100.

First, in step S1, the user designates the input audio signal to be encoded using the terminal 103. In this embodiment, the audio signal to be encoded may be an audio PCM file stored in the external storage device 104, or a real-time audio signal captured by the microphone 106 and converted from analog to digital. When this process ends, the flow proceeds to step S2.

Step S2 determines whether the input audio signal to be encoded has ended. If it has, the flow proceeds to step S11; otherwise, it proceeds to step S3.

Step S3 is an input signal shift process: in the input signal buffer shown in FIG. 9, the time signal of the right-hand two frames (2048 samples) is shifted left by one frame, and one new frame (1024 samples) is read into the right-hand end. This is performed for every channel of the input signal. When the process ends, the flow proceeds to step S4.
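The shift in step S3 can be sketched for a single channel with plain Python lists, as below. The function name is hypothetical, and a production encoder would more likely shift in place or use a ring buffer; multichannel input would call it once per channel.

```python
from typing import List, Sequence

FRAME = 1024  # samples per frame (MPEG-2 AAC long frame)


def shift_input_buffer(buffer: Sequence[float],
                       new_frame: Sequence[float]) -> List[float]:
    """Step S3 for one channel of the 1024 x 3 input buffer.

    The oldest frame on the left is dropped, the right-hand two frames
    (2048 samples) move one frame to the left, and the newly read frame
    is appended on the right.
    """
    if len(buffer) != 3 * FRAME or len(new_frame) != FRAME:
        raise ValueError("expected a 3-frame buffer and a 1-frame input")
    return list(buffer[FRAME:]) + list(new_frame)
```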

Step S4 analyzes the time signal stored in the input signal buffer and performs the psychoacoustic calculation for the current frame. This yields the block type of the current frame, the perceptual entropy (PE), and the SMR value for each SFB, which are stored in the work area of the memory 101. Eight sets of SMR values (one per short block) are calculated when the current frame uses short blocks; otherwise, one set for the long block is calculated. Because such psychoacoustic calculations are well known in the art, they are not described in detail. When the process ends, the flow proceeds to step S5.

In step S5, according to the block type obtained in step S4, the time signal of the current frame, that is, the 2048 samples (two frames) to the right of the current frame start pointer in FIG. 9, is windowed and then transformed from the time domain to the frequency domain. For MPEG-2 AAC, when the transform block length is short, this yields eight sets of spectra with 128 frequency components each; for the other block types, which use long blocks, it yields one set of 1024 frequency components. In both cases, the resulting 1024 spectral components are stored in the spectrum buffer in the work area of the memory 101. When the process ends, the flow proceeds to step S6.
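For the long-block case, the windowing and time-frequency transform of step S5 can be sketched as a direct MDCT. This is an illustration under stated assumptions: it uses the sine window (one of the two AAC window shapes, the other being KBD), applies no normalization factor, evaluates the O(N²) sum directly (slow for N = 1024 in pure Python), and omits the eight-short-block path entirely.

```python
import math
from typing import List, Sequence


def sine_window(length: int) -> List[float]:
    """Sine window of the given length."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct_long_block(samples: Sequence[float]) -> List[float]:
    """Direct MDCT of 2N windowed samples into N coefficients.

    For a long block, 2N = 2048 samples (two frames) give N = 1024
    spectral coefficients. No scaling factor is applied.
    """
    two_n = len(samples)
    if two_n % 2:
        raise ValueError("input length must be even")
    n_coef = two_n // 2
    window = sine_window(two_n)
    n0 = n_coef / 2 + 0.5  # MDCT phase offset
    out = []
    for k in range(n_coef):
        acc = 0.0
        for n in range(two_n):
            acc += window[n] * samples[n] * math.cos(
                math.pi / n_coef * (n + n0) * (k + 0.5))
        out.append(acc)
    return out
```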

Step S6 calculates the allowable error energy from the frequency spectrum obtained in step S5 and the per-SFB SMR obtained in step S4, and then uses it to calculate the scale factor of each SFB. For MPEG-2 AAC, for example, the scale factor is calculated by Equation (2) of the first embodiment. The per-SFB allowable error energy and scale factors calculated here are stored in the work area of the memory 101. When the process ends, the flow proceeds to step S7.
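Equation (2) of the first embodiment is not reproduced in this section, so the sketch below covers only the first half of step S6: deriving each SFB's allowable error energy from its spectral energy and SMR. Treating the SMR as a value in dB and dividing the band energy by its linear equivalent is an assumption matching common psychoacoustic-model practice; the mapping from allowable error energy to scale factor is deliberately left out, and the function name is hypothetical.

```python
from typing import List, Sequence


def allowed_error_energy(spectrum: Sequence[float],
                         sfb_offsets: Sequence[int],
                         smr_db: Sequence[float]) -> List[float]:
    """Per-SFB allowable distortion: band energy divided by the linear SMR.

    smr_db[b] is assumed to be the signal-to-mask ratio of band b in dB;
    band b covers indices sfb_offsets[b] .. sfb_offsets[b + 1] - 1.
    """
    result = []
    for b, smr in enumerate(smr_db):
        lo, hi = sfb_offsets[b], sfb_offsets[b + 1]
        energy = sum(x * x for x in spectrum[lo:hi])
        result.append(energy / (10.0 ** (smr / 10.0)))
    return result
```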

Step S7 calculates the quantization step from the difference between the auditory information amount of the spectrum before quantization and the information amount of the spectrum after quantization. This process is described in detail later with reference to FIG. 3. When it ends, the flow proceeds to step S8.

Step S8 quantizes the 1024 frequency spectral components according to the scale factors obtained in step S6 and the quantization step obtained in step S7 and calculates the used bits; only if the used bits exceed the allocated bits stored in the work area of the memory 101 does it increase the quantization step and requantize. This process is described in detail later with reference to FIG. 4. When it ends, the flow proceeds to step S9.

Step S9 shapes the quantized spectrum and scale factors calculated in step S8 into the format defined by the encoding scheme and outputs them as a bit stream. In this embodiment, the output bit stream may be stored in the external storage device 104 or output via the communication interface 109 to an external device connected to the communication network 108. When the process ends, the flow proceeds to step S10.

Step S10 corrects the number of accumulated bits stored in the memory 101 based on the number of bits used by the bit stream output in step S9 and the encoding bit rate. When the process ends, the flow returns to step S2.
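The text does not state the correction rule used in step S10. A common bookkeeping scheme, assumed here, credits the mean number of bits per frame implied by the bit rate, debits the bits actually written, and clamps the result to a maximum reservoir size. The class name and the `max_bits` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BitReservoir:
    """Hypothetical accumulated-bit bookkeeping for step S10."""
    bit_rate: int            # encoding bit rate in bits per second
    sample_rate: int         # sampling rate in Hz
    max_bits: int            # assumed upper bound on accumulated bits
    accumulated: int = 0     # bits currently held in the reservoir
    frame_samples: int = 1024

    def mean_bits_per_frame(self) -> float:
        return self.bit_rate * self.frame_samples / self.sample_rate

    def update(self, used_bits: int) -> int:
        """Credit the frame's mean budget (fraction discarded), debit the
        bits used, clamp to [0, max_bits], and return the new total."""
        self.accumulated += int(self.mean_bits_per_frame()) - used_bits
        self.accumulated = max(0, min(self.accumulated, self.max_bits))
        return self.accumulated
```

At 128 kbit/s and 48 kHz, for instance, each frame credits int(2730.67) = 2730 bits, so a frame that writes 2000 bits leaves 730 bits in an initially empty reservoir.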

Step S11 handles the quantized spectra that remain in memory because the delay introduced by the psychoacoustic calculation, the orthogonal transform, and so on has kept them from being output yet: it shapes them into a bit stream and outputs them. When this process ends, the audio signal encoding process ends.

FIG. 3 is a flowchart showing the details of the quantization step prediction process of step S7.

First, step S101 calculates the number of bits needed to encode the scale factors stored in the work area of the memory 101 in the format defined by the encoding scheme. The calculated number of bits is stored in the work area of the memory 101. When the process ends, the flow proceeds to step S102.

Step S102 calculates the number of bits allocated to the spectrum codes by subtracting the number of scale factor bits stored in the memory 101 from the number of bits allocated to the frame. The resulting number of spectrum allocation bits is stored in the work area of the memory 101. When the process ends, the flow proceeds to step S103.

Step S103 predicts the total amount of the quantized spectrum from the number of spectrum allocation bits in the memory 101. The prediction uses an approximate expression obtained in advance by experiment. For example, with the approximate expression denoted F(x) and the spectrum allocation bits denoted spectrum_bits, the predicted total amount of the quantized spectrum is obtained by the following equation.

The calculated predicted total amount of the quantized spectrum is stored in the work area of the memory 101. When the process ends, the flow proceeds to step S104.

Step S104 calculates the auditory information amount of the spectrum before quantization. It is obtained by weighting each spectral component by the reduction in quantization coarseness given by the scale factor of the SFB containing that component, summing over one frame, and taking the logarithm of the total. For MPEG-2 AAC, for example, the auditory information amount of the pre-quantization spectrum is obtained by evaluating the following equation.

The calculated auditory information amount of the pre-quantization spectrum is stored in the work area of the memory 101. When the process ends, the flow proceeds to step S105.

Step S105 calculates the predicted information amount of the quantized spectrum by taking the logarithm of the predicted total amount of the quantized spectrum obtained in step S103. For MPEG-2 AAC, for example, it is obtained by evaluating the following equation.

That is, the predicted information amount of the quantized spectrum is the logarithm of the total amount of the quantized spectrum obtained in step S103. The calculated information amount is stored in the work area of the memory 101. When the process ends, the flow proceeds to step S106.

In step S106, the predicted information amount of the quantized spectrum obtained in step S105 is subtracted from the auditory information amount of the pre-quantization spectrum obtained in step S104; in step S107, the result is multiplied by a coefficient determined by the step size of the quantization coarseness, giving the global gain, that is, the predicted quantization step. For MPEG-2 AAC, this prediction amounts to evaluating Equation (5), as in the first embodiment.

The calculated predicted quantization step is stored as the quantization step in the work area of the memory 101. The quantization step prediction process then ends and control returns.
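Steps S102 through S107 can be combined into one function, sketched below. Several points are assumptions: the weighting 2^{3·scalefac/16} and the coefficient 16/3 come from an assumed AAC-style quantizer, not from equations reproduced here; `scalefac_bits` stands for the result of step S101, whose scale-factor coding cost is not modeled; and `approx_total` is a hypothetical stand-in for the experimentally fitted F(x) of step S103.

```python
import math
from typing import Callable, Sequence


def predict_global_gain(spectrum: Sequence[float],
                        sfb_offsets: Sequence[int],
                        scalefacs: Sequence[int],
                        frame_bits: int,
                        scalefac_bits: int,
                        approx_total: Callable[[int], float]) -> int:
    """Predict global_gain before any quantization (steps S102-S107)."""
    # S102: bits left for the spectral codes
    spectrum_bits = frame_bits - scalefac_bits
    # S103: predicted total of the quantized spectrum, F(spectrum_bits)
    predicted_total = approx_total(spectrum_bits)
    if predicted_total <= 0:
        raise ValueError("F(spectrum_bits) must be positive")
    # S104: scale-factor-weighted sum of |x|^(3/4), then its log2
    weighted_sum = 0.0
    for b, sf in enumerate(scalefacs):
        weight = 2.0 ** (3.0 * sf / 16.0)
        for i in range(sfb_offsets[b], sfb_offsets[b + 1]):
            weighted_sum += abs(spectrum[i]) ** 0.75 * weight
    if weighted_sum == 0.0:
        return 0  # silent frame: nothing to quantize
    pre_info = math.log2(weighted_sum)
    # S105: predicted information amount of the quantized spectrum
    post_info = math.log2(predicted_total)
    # S106-S107: difference times the step-size coefficient 16/3
    return round(16.0 / 3.0 * (pre_info - post_info))
```

If F happened to return the exact unrounded total Σ_iX_q for some global_gain, this function would recover that gain, which is a quick way to sanity-check the derivation.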

FIG. 4 is a flowchart showing the details of the spectrum quantization process of step S8.

Step S201 quantizes the 1024 spectral components stored in the spectrum buffer according to the quantization step and scale factors stored in the memory 101. For MPEG-2 AAC, the quantized spectrum is calculated by Equation (1) above. When the process ends, the flow proceeds to step S202.

Step S202 calculates the number of bits used when all the quantized spectral components calculated in step S201 are encoded. For MPEG-2 AAC, for example, quantized spectral components are Huffman-coded in groups, so this process searches the Huffman code tables and totals the number of coded bits. The calculated number of used bits is stored in the work area of the memory 101. When the process ends, the flow proceeds to step S203.

Step S203 compares the spectrum allocation bits in the memory 101 with the used bits. If the used bits exceed the allocated bits, the flow proceeds to step S204, which increases the quantization step stored in the memory 101 to reduce the code amount, and then returns to step S201 to quantize the spectrum again. However, because the quantization step prediction process described above predicts an almost exact quantization step, step S204 is rarely executed in practice.

If the comparison in step S203 shows that the used bits do not exceed the allocated bits, the spectrum quantization process ends and control returns.
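The loop of steps S201 to S204 is sketched below. The quantizer form is the same assumed AAC-style law used elsewhere in this sketch set; `estimate_bits` is a hypothetical stand-in for the Huffman table search of step S202 (it is not the AAC code tables), and the `max_gain` guard is an assumed bound added so the loop always terminates.

```python
from typing import Callable, List, Sequence, Tuple

MAGIC = 0.4054  # rounding offset assumed from the AAC informative encoder


def quantize(spectrum: Sequence[float], sfb_offsets: Sequence[int],
             scalefacs: Sequence[int], global_gain: int) -> List[int]:
    """Step S201 with an assumed 3/4-power quantizer."""
    out = [0] * len(spectrum)
    for b, sf in enumerate(scalefacs):
        gain = 2.0 ** (-(global_gain - sf) / 4.0)
        for i in range(sfb_offsets[b], sfb_offsets[b + 1]):
            q = int((abs(spectrum[i]) * gain) ** 0.75 + MAGIC)
            out[i] = q if spectrum[i] >= 0 else -q
    return out


def estimate_bits(quantized: Sequence[int]) -> int:
    """Hypothetical step S202 cost: 1 bit per zero, else 2*bit_length(|q|)+1."""
    return sum(1 if q == 0 else 2 * abs(q).bit_length() + 1 for q in quantized)


def quantize_to_budget(spectrum: Sequence[float], sfb_offsets: Sequence[int],
                       scalefacs: Sequence[int], global_gain: int,
                       spectrum_bits: int,
                       count_bits: Callable[[Sequence[int]], int] = estimate_bits,
                       max_gain: int = 255) -> Tuple[int, List[int], int]:
    """Steps S201-S204: coarsen the step until the used bits fit the budget."""
    while True:
        quantized = quantize(spectrum, sfb_offsets, scalefacs, global_gain)  # S201
        used = count_bits(quantized)                                         # S202
        if used <= spectrum_bits or global_gain >= max_gain:                 # S203
            return global_gain, quantized, used
        global_gain += 1                                                     # S204
```

When the predicted starting gain is already accurate, the first pass satisfies the check in step S203 and the function returns without any requantization, which is the behavior the embodiment relies on.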

As described above, the audio signal encoding process of this embodiment predicts the information amount of the quantized spectrum from the number of bits allocated to the spectrum codes and takes its difference from the auditory information amount before quantization, so that the quantization step is predicted almost exactly before actual quantization. Adjustments of the quantization step can thus be largely avoided, which greatly reduces the processing required for quantization.

(Third Embodiment)
The technique of the present invention can also be applied when encoding at a constant bit rate while distributing the bits accumulated in the bit reservoir among frames according to the characteristics of the input signal. This embodiment describes that case with reference to the drawings.

FIG. 10 is a diagram showing a configuration example of the audio signal encoding apparatus according to this embodiment. As in FIG. 1 of the first embodiment, thick lines indicate data flow and thin lines indicate control signal flow. Components in FIG. 10 that have the same functions as in FIG. 1 are given the same reference numerals.

In the illustrated configuration, 1 is a frame divider, 2 is a psychoacoustic calculator, 3 is a filter bank, 4 is a scale factor calculator, 7 is a quantization step calculator, 8 is a spectrum quantizer, and 9 is a bit shaper.

Reference numeral 11 denotes a PE bit calculator, which calculates the PE bits, that is, the predicted code amount of a frame, from the perceptual entropy (PE) of the frame.

Reference numeral 12 denotes a spectrum allocation bit calculator, which calculates the number of bits allocated to the spectrum codes based on the bit rate, PE bits, accumulated bit amount, scale factors, and so on.

Reference numeral 13 denotes a bit reservoir, which continuously manages the accumulated bit amount as specified by the encoding scheme.

Reference numeral 14 denotes a quantized spectrum total amount predictor, which, depending on conditions, predicts the total amount of the quantized spectrum from either the frame allocation bits or the PE bits.

The processing of the audio signal encoding apparatus configured as above is described below. For convenience, this embodiment also uses MPEG-2 AAC as the example encoding scheme, but the same method applies equally to other encoding schemes that use nonlinear quantization.

First, before processing, each unit is initialized. Initialization sets the quantization step and all scale factors to 0.

The frame divider 1 divides the audio input signal into frames and outputs them to the psychoacoustic calculator 2 and the filter bank 3.

The psychoacoustic calculator 2 performs auditory masking analysis on the input signal output by the frame divider 1 and outputs the block type, the SMR for each SFB, and the PE.

The filter bank 3 takes two frames of input signal, namely the frame output by the frame divider 1 and the previous frame held in the filter bank 3, and converts them into a frequency spectrum by a time-frequency transform according to the block type output by the psychoacoustic calculator 2.

The scale factor calculator 4 calculates the scale factors from the frequency spectrum output by the filter bank 3 and the per-SFB SMR values output by the psychoacoustic calculator 2, in the same way as in the first embodiment.

The PE bit calculator 11 calculates the PE bits from the PE output by the psychoacoustic calculator 2. That is, it converts the auditory information amount of the input signal in the frame being processed into the code amount expected when that information is encoded perceptually transparently. For MPEG-2 AAC, the PE bit formulas given in the ISO standard are as follows.

When the block length is long:

When the block length is short:

In this embodiment, these formulas are used as is, and the PE bits are calculated according to the block length of the block type.
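The PE bit formulas themselves did not survive extraction. The sketch below uses the coefficients commonly cited for the ISO/IEC 13818-7 informative encoder (0.3·PE + 6·√PE for long blocks, 0.6·PE + 24·√PE for short blocks); they are recalled rather than taken from this document, so they should be checked against the standard before being relied on. The function name is hypothetical.

```python
import math


def pe_bits(pe: float, short_block: bool) -> float:
    """Predicted code amount in bits from perceptual entropy PE.

    Coefficients are the commonly cited ISO/IEC 13818-7 informative values
    (assumed, not reproduced from this document).
    """
    if pe < 0:
        raise ValueError("PE must be non-negative")
    if short_block:
        return 0.6 * pe + 24.0 * math.sqrt(pe)
    return 0.3 * pe + 6.0 * math.sqrt(pe)
```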

The spectrum allocation bit calculator 12 first calculates the number of bits needed to encode the scale factors output by the scale factor calculator 4, and then calculates the average spectrum allocation bits as the difference between the average number of bits per frame per channel, derived from the bit rate, and that number.

Next, it compares this value with the PE bits output by the PE bit calculator 11. If the PE bits are larger, the PE bits are allocated, up to a maximum determined by the accumulated bits in the bit reservoir 13. If the PE bits are smaller, the average spectrum allocation bits are allocated as is.

Specifically, in this embodiment the spectrum allocation bits are calculated by the following procedure.

1. Calculate the allowable use of accumulated bits, usable_bits, from the accumulated bit amount:
When the block length is long: 10% of the accumulated bit amount
When the block length is short: 25% of the accumulated bit amount

2. With the average spectrum allocation bit amount denoted average_bits, the spectrum allocation bit amount spectrum_bits is determined as follows:
When pe_bits > (average_bits + usable_bits): spectrum_bits = average_bits + usable_bits;
When pe_bits < average_bits: spectrum_bits = average_bits;
Otherwise, when average_bits ≤ pe_bits ≤ (average_bits + usable_bits): spectrum_bits = pe_bits;
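The two-step procedure above can be sketched in Python as follows. This is a minimal sketch, not the patent's implementation: the function names are hypothetical, bit counts are assumed to be integers, and rounding the reservoir share down is an assumption the text does not specify.

```python
def usable_bits(reservoir_bits: int, short_block: bool) -> int:
    """Step 1: portion of the bit reservoir this frame may draw on.

    10% for long blocks, 25% for short blocks, as stated in the text.
    Integer floor division is an assumed rounding rule.
    """
    percent = 25 if short_block else 10
    return reservoir_bits * percent // 100


def spectrum_bits(pe_bits: int, average_bits: int, usable: int) -> int:
    """Step 2: clamp pe_bits into [average_bits, average_bits + usable]."""
    if pe_bits > average_bits + usable:
        return average_bits + usable   # demand exceeds what the reservoir allows
    if pe_bits < average_bits:
        return average_bits            # never allocate less than the average
    return pe_bits                     # demand fits within the allowed range


# Usage with made-up numbers: 6000 bits in the reservoir, long block.
allowance = usable_bits(6000, short_block=False)   # 600
print(spectrum_bits(3000, 2000, allowance))        # 2600
print(spectrum_bits(1500, 2000, allowance))        # 2000
print(spectrum_bits(2300, 2000, allowance))        # 2300
```

Written this way, spectrum_bits is simply pe_bits clamped to the interval [average_bits, average_bits + usable_bits].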

Next, if the PE bits are smaller than the average spectrum allocation bits, the spectrum allocation bit calculator 12 outputs the PE bits to the quantized spectrum total amount predictor 14; if the PE bits are equal to or greater than the average spectrum allocation bits, it outputs the spectrum allocation bits calculated by the above procedure. At the same time, it outputs bit selection information (hereinafter simply "selection information"), a flag indicating which of the two bit counts was passed to the quantized spectrum total amount predictor 14.

The quantized spectrum total amount predictor 14 predicts the total amount of the quantized spectrum from the input selection information and bit count. As in the first embodiment, the prediction uses an approximate expression obtained by experiment; in this embodiment, however, the predictor 14 switches between approximate expressions according to the selection information. For example, if F(x) is the approximate expression for the quantized spectrum total amount as a function of the spectrum allocation bits and G(x) is the approximate expression as a function of the PE bits, the predicted total spectrum amount is obtained as follows.

If the selection information indicates that the spectrum allocation bits were selected:

If the selection information indicates that the PE bits were selected:

In Equation (10), bit_rate is the bit rate of the input signal being processed, sampling_rate is its sampling rate, base_bit_rate is the reference bit rate, and base_sampling_rate is the reference sampling rate. The reference bit rate and reference sampling rate are the bit rate and sampling rate of the input signal used when the PE-bit prediction formula G(x) was determined by experiment; they are predetermined values in the audio signal encoding apparatus of this embodiment.
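The flag-driven switch between F(x) and G(x) can be sketched as below. This is a minimal sketch under stated assumptions: the flag values and function names are hypothetical, and because Equation (10) is not reproduced in this text, the rate-corrected G(x) is passed in by the caller rather than guessed. The linear models in the usage lines are placeholders, not the patent's fitted curves.

```python
from typing import Callable, Tuple

# Hypothetical encoding of the "selection information" flag.
SELECT_SPECTRUM_BITS = 0
SELECT_PE_BITS = 1


def select_bits(pe_bits: int, average_bits: int,
                spectrum_bits: int) -> Tuple[int, int]:
    """Spectrum allocation bit calculator 12: pick the count to forward.

    PE bits are forwarded when they fall below the average allocation;
    otherwise the computed spectrum allocation bits are forwarded.
    """
    if pe_bits < average_bits:
        return SELECT_PE_BITS, pe_bits
    return SELECT_SPECTRUM_BITS, spectrum_bits


def predict_total(selection: int, bits: int,
                  from_spectrum_bits: Callable[[int], float],
                  from_pe_bits: Callable[[int], float]) -> float:
    """Quantized spectrum total amount predictor 14: switch on the flag.

    from_spectrum_bits stands for F(x); from_pe_bits stands for the
    rate-corrected G(x) of Equation (10), supplied by the caller.
    """
    if selection == SELECT_SPECTRUM_BITS:
        return from_spectrum_bits(bits)
    return from_pe_bits(bits)


# Usage with placeholder linear models.
F = lambda x: 2.0 * x
G_corrected = lambda x: 3.0 * x
sel, bits = select_bits(pe_bits=1500, average_bits=2000, spectrum_bits=2000)
print(sel, bits, predict_total(sel, bits, F, G_corrected))  # 1 1500 4500.0
```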

The reason this embodiment predicts the quantized spectrum in this way is explained below.

In this embodiment, the spectrum allocation bit calculator 12 allocates bits based on the PE bits. The spectrum allocation bits therefore normally reflect the magnitude of the PE bits, that is, the perceptually required code amount of the input signal in the frame being processed. Under constant bit rate control, however, when the PE bits fall below the average spectrum allocation bits, the average spectrum allocation bits are assigned unchanged. In that case the spectrum allocation bits do not reflect the perceptually required code amount of the input signal, so predicting the quantized spectrum total amount from them produces a large prediction error. In this case, predicting from the PE bits instead gives a more accurate estimate of the quantized spectrum total amount.

Moreover, because the spectrum allocation bits are calculated subject to the bit rate and sampling rate constraints, they track changes in the bit rate and sampling rate. The PE bits do not: although the underlying PE value itself changes with the sampling rate, Equations (8) and (9) themselves do not change when the bit rate or sampling rate changes. Therefore, when predicting from the PE bits, the prediction accounts for the ratio of the current bit rate and sampling rate to the reference values, as shown in Equation (10).

In this way, a single approximate expression G(x) can be applied at any bit rate or sampling rate.

Returning to FIG. 10: as in the first embodiment, the quantization step calculator 7 weights the frequency spectrum output by the filter bank 3 with the scale factors output by the scale factor calculator 4, sums the weighted values, and takes the logarithm of the sum to obtain the perceptual information content of the spectrum before quantization. Next, it takes the logarithm of the quantized spectrum total amount predicted by the quantized spectrum total amount predictor 14 to obtain the information content of the spectrum after quantization. Finally, it takes the difference between the two and multiplies it by a coefficient determined by the step size of the quantization coarseness to obtain the quantization step. Specifically, Equation (5) above is computed.
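A minimal sketch of this log-difference step prediction follows. Equation (5) is not reproduced in this text, so the coefficient is an assumption: with an AAC-style quantizer that raises |x|·2^(-step/4) to the power 3/4, one step unit scales quantized magnitudes by 2^(-3/16), which gives a coefficient of 16/3 when information amounts are base-2 logarithms and both totals are measured in that 3/4-power domain. The function name is hypothetical.

```python
import math


def predict_quant_step(weighted_total: float, predicted_quant_total: float,
                       coef: float = 16.0 / 3.0) -> int:
    """Predict the frame-wide quantization step before quantizing.

    weighted_total: sum of the scale-factor-weighted spectrum (pre-quantization).
    predicted_quant_total: predicted sum of the quantized spectrum.
    coef: converts a log2 ratio into quantization-step units (assumed 16/3).
    """
    pre_info = math.log2(weighted_total)          # information before quantization
    post_info = math.log2(predicted_quant_total)  # predicted information after it
    return round(coef * (pre_info - post_info))


# Weighted total 2**20, predicted quantized total 2**17: a log2 gap of 3.
print(predict_quant_step(2.0 ** 20, 2.0 ** 17))  # 16
```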

As in the first embodiment, the spectrum quantizer 8 quantizes the frequency spectrum output by the filter bank 3 using the scale factors from the scale factor calculator 4 and the quantization step from the quantization step calculator 7, counts the required number of bits, and compares it with the spectrum allocation bits output by the spectrum allocation bit calculator 12. If the required number of bits exceeds the spectrum allocation bits, the quantization step is increased as appropriate and quantization is repeated. However, because the quantization step predicted by the quantization step calculator 7 is nearly accurate, as described above, such requantization rarely occurs.
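The fallback described above can be sketched as a loop that normally runs once. The quantizer and bit counter are hypothetical callables: the toy versions below (step size 2^(step/4), magnitude-only values, one sign bit per line) are placeholders, not AAC's Huffman-coded bit count, and max_step is an assumed safety bound.

```python
from typing import Callable, List, Sequence, Tuple


def quantize_within_budget(
    spectrum: Sequence[float],
    step: int,
    budget: int,
    quantize: Callable[[Sequence[float], int], List[int]],
    count_bits: Callable[[List[int]], int],
    max_step: int = 255,
) -> Tuple[List[int], int]:
    """Quantize at the predicted step; coarsen only if the budget is exceeded."""
    while True:
        q = quantize(spectrum, step)
        if count_bits(q) <= budget or step >= max_step:
            return q, step
        step += 1  # requantize more coarsely


# Toy stand-ins (hypothetical): step size 2**(step/4), bits = magnitude length + sign.
def toy_quantize(spectrum: Sequence[float], step: int) -> List[int]:
    return [int(abs(x) * 2 ** (-step / 4)) for x in spectrum]


def toy_count_bits(q: List[int]) -> int:
    return sum(v.bit_length() + 1 for v in q)


q, final_step = quantize_within_budget([100.0, 50.0], 0, 10,
                                       toy_quantize, toy_count_bits)
print(q, final_step)  # [14, 7] 11
```

Here the starting step 0 is deliberately poor, so the loop runs twelve times; with an accurate prediction from the quantization step calculator 7, the first pass would already fit the budget.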

The quantized spectrum, scale factors, and quantization step finally output by the spectrum quantizer 8 are entropy-encoded by the bit shaper 9, shaped into the bit stream format defined by the encoding scheme, and output.

At this time, the bit reservoir 13 is notified of the number of bits actually used for encoding. It calculates the difference between this number and the frame bits and adjusts the accumulated bit amount by adding or subtracting that difference.
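The reservoir bookkeeping amounts to one signed addition per frame. A minimal sketch with a hypothetical function name; it does not model any maximum reservoir size.

```python
def update_reservoir(reservoir_bits: int, frame_bits: int, used_bits: int) -> int:
    """Bit reservoir 13: bank unused bits, or pay back bits borrowed this frame.

    frame_bits is the nominal (constant-bit-rate) budget of one frame;
    used_bits is what the bit shaper actually wrote.
    """
    return reservoir_bits + (frame_bits - used_bits)


reservoir = 1000
reservoir = update_reservoir(reservoir, frame_bits=2200, used_bits=2000)  # saved 200
reservoir = update_reservoir(reservoir, frame_bits=2200, used_bits=2500)  # borrowed 300
print(reservoir)  # 900
```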

As described above, even when the bits accumulated in the bit reservoir are allocated to frames according to the input signal under a constant bit rate, as in this embodiment, accurately predicting the quantized spectrum total amount before quantization makes it possible to determine the quantization step accurately before quantization. This avoids repeated spectrum quantization and bit counting, so quantization is performed efficiently.

(Fourth embodiment)
The audio encoding apparatus described in the third embodiment can also be implemented as a software program that runs on a general-purpose computer such as a PC. This case is described below with reference to the drawings.

The configuration of the audio signal encoding apparatus and the processing of the audio signal encoding program in this embodiment are largely the same as in the second embodiment, so FIGS. 5, 2, and 6 to 9 described in the second embodiment are used here as well, and their detailed description is omitted. This embodiment differs from the second embodiment only in the quantization step prediction process of step S7, so only that process is described below.

FIG. 11 is a flowchart showing the details of the quantization step prediction process of step S7 in this embodiment.

First, step S301 calculates the PE bits from the PE and block type stored in the memory 101 by the psychoacoustic calculation of step S4. Specifically, as in the third embodiment, Equation (8) or Equation (9) above is selected according to the block type to calculate the PE bits. The calculated PE bits are stored in the work area of the memory 101. The process then proceeds to step S302.

Step S302 calculates the number of bits needed to encode the scale factors stored in the work area of the memory 101 in the format specified by the encoding scheme. The resulting scale factor bit count is stored in the work area of the memory 101. The process then proceeds to step S303.

Step S303 subtracts the scale factor bit count stored in the memory 101 from the average number of bits allocated per frame to obtain the number of bits available for the spectrum code, that is, the average spectrum allocation bit count (average allocated bits). The result is stored in the work area of the memory 101. The process then proceeds to step S304.

Step S304 compares the average allocated bits in the memory 101 with the PE bit count. If the PE bit count is larger, the process proceeds to step S305; otherwise, it proceeds to step S307.

Step S305 calculates the spectrum allocation bits from the PE bits, the average allocated bits, and the accumulated bit amount in the memory 101. This process is described in detail later with reference to FIG. 12. The process then proceeds to step S306.

Step S306 predicts the quantized spectrum total amount from the spectrum allocation bit count in the memory 101, using an approximate expression obtained in advance by experiment. For example, with this expression denoted F(x) and the spectrum allocation bits denoted spectrum_bits, the predicted total amount of the quantized spectrum is given by Equation (4), as in the second embodiment.

The calculated predicted total is stored in the work area of the memory 101. The process then proceeds to step S309.

Step S307, on the other hand, stores the average allocated bits in the memory 101 as the spectrum allocation bits; that is, it copies the average allocated bit value into the spectrum allocation bits. The process then proceeds to step S308.

Step S308 predicts the quantized spectrum total amount from the PE bit count in the memory 101, again using an approximate expression obtained in advance by experiment. With this expression denoted G(x) and the PE bits denoted pe_bits, the predicted total amount of the quantized spectrum is given by Equation (10), as in the third embodiment.

Step S309 calculates the perceptual information content of the spectrum before quantization. As in the second embodiment, it is obtained by multiplying each spectral component by the reduction in quantization coarseness due to the scale factor of the SFB containing that component, summing the results over one frame, and taking the logarithm of the total. For MPEG-2 AAC, for example, it can be obtained by evaluating the following expression.

The calculated perceptual information content of the unquantized spectrum is stored in the work area of the memory 101. The process then proceeds to step S310.
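Step S309 can be sketched as a loop over scale factor bands. The expression itself is not reproduced here, so this is an assumed form: band_weights holds one precomputed weight per SFB (standing in for the reduction in quantization coarseness due to its scale factor), sfb_offsets marks the band boundaries, and any magnitude transform (for example AAC's 3/4 power) is left to the caller. All names are hypothetical.

```python
import math
from typing import Sequence


def pre_quant_info(spectrum: Sequence[float], sfb_offsets: Sequence[int],
                   band_weights: Sequence[float]) -> float:
    """Log2 of the SFB-weighted spectral total for one frame (step S309).

    sfb_offsets has len(band_weights) + 1 entries; band b covers
    spectrum[sfb_offsets[b]:sfb_offsets[b + 1]].
    """
    total = 0.0
    for band, weight in enumerate(band_weights):
        start, end = sfb_offsets[band], sfb_offsets[band + 1]
        for x in spectrum[start:end]:
            total += abs(x) * weight   # weight each component by its band's factor
    return math.log2(total)


# Two bands of two lines each: 2*(1+1) + 1*(2+2) = 8, so log2(8) = 3.
print(pre_quant_info([1.0, -1.0, 2.0, 2.0], [0, 2, 4], [2.0, 1.0]))  # 3.0
```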

Step S310 takes the logarithm of the predicted quantized spectrum total obtained in step S306 or step S308 to obtain the predicted information content of the quantized spectrum. For MPEG-2 AAC, for example, it can be calculated with the following expression.

The predicted information content of the quantized spectrum calculated in this step is stored in the work area of the memory 101. The process then proceeds to step S311.

In step S311, the predicted information content of the quantized spectrum from step S310 is subtracted from the perceptual information content of the unquantized spectrum from step S309, and the result is multiplied by a coefficient determined by the step size of the quantization coarseness to obtain the global gain, that is, the predicted quantization step. For MPEG-2 AAC, this amounts to computing Equation (5), as in the first embodiment.

FIG. 12 is a flowchart showing the details of the spectrum allocation bit calculation of step S305 in this embodiment.

Step S401 calculates the number of accumulated bits that can be allocated to this frame from the accumulated bit amount in the memory 101 and the block type, and adds this value to the average allocated bits to obtain the upper limit of the spectrum allocation bits.
In this embodiment, as in the third embodiment, the number of accumulated bits is determined as follows:

When the block length is long: 10% of the accumulated bit amount
When the block length is short: 25% of the accumulated bit amount

Adding the value obtained above to the average allocated bits in the memory 101 gives the upper limit of the spectrum allocation bits.

The upper limit obtained by this calculation is stored in the memory 101. The process then proceeds to step S402.

Step S402 compares the PE bits in the memory 101 with the upper limit of the spectrum allocation bits. If the PE bit count is less than the upper limit, the process proceeds to step S403; otherwise, it proceeds to step S404.

Step S403 stores the PE bits in the memory 101 as the spectrum allocation bits; that is, it copies the PE bit value into the spectrum allocation bits. The spectrum allocation bit calculation then ends and control returns to the caller.

Step S404 stores the upper limit in the memory 101 as the spectrum allocation bits; that is, it copies the upper limit into the spectrum allocation bits. The spectrum allocation bit calculation then ends and control returns to the caller.

By capping the number of bits allocated according to the PE bits in this way, this process prevents the accumulated bits from being exhausted and the bit reservoir from failing.
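The branch of FIG. 11 (S304 to S307, with S401 to S404 inside S305) and the effect of the cap can be sketched together. This is a minimal sketch: the names are hypothetical, and the simulation assumes each frame uses exactly its allocation (spectrum bits plus scale factor bits) and that frame bits equal average allocated bits plus scale factor bits. Under those assumptions each frame draws at most 10% (long) or 25% (short) of what remains, so the reservoir shrinks geometrically but never goes negative.

```python
from typing import List, Tuple


def allocate_spectrum_bits(pe_bits: int, average_bits: int,
                           reservoir_bits: int, short_block: bool) -> Tuple[int, bool]:
    """Steps S304-S307 of FIG. 11, with S401-S404 of FIG. 12 for step S305.

    Returns (spectrum_bits, predict_from_pe); predict_from_pe is True when
    the flow continues to S308 (G(x) on the PE bits) rather than S306.
    """
    if pe_bits > average_bits:                                         # S304
        share = reservoir_bits * (25 if short_block else 10) // 100    # S401
        upper = average_bits + share
        return (pe_bits if pe_bits < upper else upper), False          # S402-S404
    return average_bits, True                                          # S307


def simulate_reservoir(frames: int, pe_bits: int, average_bits: int,
                       sf_bits: int, reservoir_bits: int,
                       short_block: bool = False) -> List[int]:
    """Reservoir level after each frame under the stated assumptions."""
    frame_bits = average_bits + sf_bits
    levels = []
    for _ in range(frames):
        bits, _ = allocate_spectrum_bits(pe_bits, average_bits,
                                         reservoir_bits, short_block)
        reservoir_bits += frame_bits - (bits + sf_bits)
        levels.append(reservoir_bits)
    return levels


# A run of very demanding frames: each draws at most 10% of what is left.
levels = simulate_reservoir(50, pe_bits=10**6, average_bits=2000,
                            sf_bits=200, reservoir_bits=6000)
print(levels[:3], min(levels) >= 0)  # [5400, 4860, 4374] True
```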

As described above, according to this embodiment, even when the bits accumulated in the bit reservoir are allocated to frames according to the characteristics of the input signal under a constant bit rate, accurately predicting the quantized spectrum total amount before quantization allows the quantization step to be determined accurately before quantization. This avoids repeated spectrum quantization and bit counting, so quantization is performed efficiently.

As described above, the audio signal encoding process of the present invention predicts the total amount of the quantized spectrum from the number of bits allocated to a frame, uses that prediction to calculate the difference in information content of the whole spectrum before and after quantization, and thereby predicts the quantization step for the entire frame almost exactly before spectrum quantization. As a result, quantization can be completed with roughly a single pass of spectrum quantization. This greatly reduces the processing required for quantization compared with the conventional technique, while maintaining equivalent encoding quality.

(Other embodiments)
The present invention is not limited to the embodiments described above.

Although the embodiments above deal with an audio encoding apparatus and method for an encoding scheme that uses block switching, the invention can equally be applied to encoding schemes that do not use block switching.

The present invention can also be modified in various other ways without departing from its gist.

Although embodiments of the present invention have been described in detail above, the present invention may be applied to a system composed of multiple devices or to an apparatus consisting of a single device.

As described above, the present invention can also be achieved by supplying an audio signal encoding program that implements the functions of the embodiments to a system or apparatus, directly or remotely, and having a computer of that system or apparatus read and execute the supplied program code. In that case, the supplied item need not take the form of a program, as long as it provides the program's functions.

Therefore, the program code itself, installed on a computer to implement the functional processing of the present invention, and a storage medium storing that program also constitute the present invention. In other words, the claims of the present invention cover the computer program itself for implementing the functional processing of the present invention and a storage medium storing that program.

In that case, the program may take any form, such as object code, a program executed by an interpreter, or script data supplied to the OS, as long as it provides the program's functions.

Examples of storage media for supplying the program include a flexible disk, hard disk, optical disk, magneto-optical disk, MO, CD-ROM, CD-R, CD-RW, magnetic tape, nonvolatile memory card, ROM, and DVD (DVD-ROM, DVD-R).

Alternatively, the program can be supplied by connecting to a website on the Internet with a browser on a client computer and downloading, from that website to a storage medium such as a hard disk, the computer program of the present invention itself or a compressed file that includes an automatic installation function. It can also be supplied by dividing the program code constituting the program of the present invention into a plurality of files and downloading each file from a different website. In other words, a WWW server that allows multiple users to download program files for implementing the functional processing of the present invention on a computer also falls within the scope of the present invention.

The program of the present invention may also be encrypted, stored on a storage medium such as a CD-ROM, and distributed to users. Users who satisfy predetermined conditions are allowed to download decryption key information from a website via the Internet, and use that key information to execute the encrypted program and install it on a computer.

Besides the functions of the embodiments being implemented by a computer executing the read program, an OS or the like running on the computer may perform part or all of the actual processing based on the program's instructions, and the functions of the embodiments may be implemented by that processing.

Furthermore, after the program read from the storage medium is written to memory on a function expansion board inserted into the computer or a function expansion unit connected to the computer, a CPU or the like on that board or unit may perform part or all of the actual processing based on the program's instructions, and the functions of the embodiments are also implemented by that processing.

FIG. 1 is a diagram illustrating a configuration example of an audio signal encoding apparatus according to the first embodiment of the present invention.
FIG. 2 is a flowchart of audio signal encoding processing according to the second embodiment of the present invention.
FIG. 3 is a flowchart of the quantization step prediction process in the second embodiment of the present invention.
FIG. 4 is a flowchart of spectrum quantization processing in the second embodiment of the present invention.
FIG. 5 is a diagram illustrating a configuration example of an audio signal encoding apparatus according to the second embodiment of the present invention.
FIG. 6 is a diagram showing an example of the contents of a storage medium storing the audio signal encoding program according to the second embodiment of the present invention.
FIG. 7 is a schematic diagram showing installation of the audio signal encoding program on a PC in the second embodiment of the present invention.
FIG. 8 is a diagram showing an example of a memory map in the second embodiment of the present invention.
FIG. 9 is a diagram illustrating a configuration example of an input signal buffer in the second embodiment of the present invention.
FIG. 10 is a diagram illustrating a configuration example of an audio signal encoding apparatus according to the third embodiment of the present invention.
FIG. 11 is a flowchart of the quantization step prediction process in the fourth embodiment of the present invention.
FIG. 12 is a flowchart of the spectrum allocation bit calculation process in the fourth embodiment of the present invention.
FIG. 13 is a flowchart of the quantization process according to the ISO standard.

Claims

A frame dividing unit that divides the audio input signal into processing unit frames for each channel;
A psychoacoustic operation unit that analyzes the audio input signal, determines a transform block length, and calculates auditory masking;
A filter bank unit that divides the frame to be processed into blocks according to the transform block length determined by the psychoacoustic operation unit, and converts the time-domain signal in the frame into a set of one or more frequency spectra;
A scale factor calculation unit that divides the frequency spectrum output from the filter bank unit into a plurality of frequency bands and weights the spectrum of each frequency band according to the calculation result of the psychoacoustic operation unit;
A quantization step determining unit that determines the quantization step of the entire frame before spectrum quantization by subtracting the information amount of the entire spectrum after quantization from the auditory information amount of the entire spectrum before quantization, weighted by the scale factor calculation unit, and multiplying the difference by a coefficient obtained from the step size of the quantization coarseness;
A spectrum quantization unit that quantizes the frequency spectrum sequence using the scale factor and the quantization step;
A bit shaping unit that creates and outputs a bitstream obtained by shaping the quantized spectrum output from the spectrum quantization unit according to a prescribed format;
With
The audio signal characterized in that the quantization step determining unit includes a quantized spectrum information amount predicting unit that predicts an information amount of the entire quantized spectrum based on an amount of bits allocated to a frame to be encoded. Encoding device.
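The quantization step determination above reduces to one subtraction and one multiplication per frame, in place of repeated trial quantization. A minimal Python sketch follows, assuming information amounts are measured in bits. The claims give neither the coefficient nor the predictor, so both are hypothetical here: the coefficient assumes the MP3/AAC convention that one quantizer step changes the step size by 2^(1/4) (1.5 dB), which, if a quantized line is taken to cost roughly log2 of its magnitude in bits, saves about 3/16 bit per spectral line; and `predict_quantized_info` simply subtracts an assumed overhead from the bits allocated to the frame.

```python
import math

# Assumption (not stated in the claims): one quantizer step scales the MP3/AAC
# step size by 2**(1/4), so each quantized magnitude |x|**0.75 shrinks by
# 2**(-3/16), i.e. roughly 3/16 bit less per spectral line.
BITS_SAVED_PER_LINE_PER_STEP = 3.0 / 16.0


def step_coefficient(num_lines: int) -> float:
    """Hypothetical coefficient: quantizer steps needed per bit removed from the frame."""
    return 1.0 / (BITS_SAVED_PER_LINE_PER_STEP * num_lines)


def predict_quantized_info(allocated_bits: int, overhead_bits: int) -> float:
    """Hypothetical predictor: bits left for spectral data after an assumed overhead."""
    return float(max(allocated_bits - overhead_bits, 0))


def determine_quantization_step(perceptual_info: float, predicted_info: float,
                                num_lines: int) -> int:
    """Frame-wide step = (information before - information after) * coefficient."""
    excess_bits = max(perceptual_info - predicted_info, 0.0)
    # Round up so the prediction errs toward using fewer bits than allocated.
    return math.ceil(excess_bits * step_coefficient(num_lines))


if __name__ == "__main__":
    predicted = predict_quantized_info(allocated_bits=1200, overhead_bits=200)
    print(determine_quantization_step(3000.0, predicted, num_lines=576))
```

For example, with a perceptual information amount of 3000 bits, 1200 allocated bits, an assumed 200-bit overhead, and 576 lines (one MP3 granule), the 2000-bit excess divided by 108 bits per step gives a frame-wide step of 19.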

An audio signal encoding apparatus comprising:
a frame dividing unit that divides an audio input signal into frames, the units of processing, for each channel;
a psychoacoustic operation unit that analyzes the audio input signal, determines a transform block length, and calculates auditory masking;
a filter bank unit that divides a frame to be processed into blocks according to the transform block length determined by the psychoacoustic operation unit, and converts the time-domain signal in the frame into a set of one or more frequency spectra;
a scale factor calculation unit that divides the frequency spectrum output from the filter bank unit into a plurality of frequency bands and weights the spectrum of each frequency band according to the calculation result of the psychoacoustic operation unit;
a quantized spectrum information amount prediction unit that predicts, before quantization, the information amount of the entire quantized spectrum based on the amount of bits allocated to the frame to be encoded;
a quantization step determining unit that determines the quantization step for the entire frame before spectral quantization by subtracting the information amount of the entire spectrum after quantization from the perceptual information amount of the entire spectrum before quantization, as weighted by the scale factor calculation unit, and multiplying the difference by a coefficient obtained from the step increment of the quantization coarseness;
a spectrum quantization unit that quantizes the frequency spectrum sequence using the scale factors and the quantization step; and
a bit shaping unit that creates and outputs a bitstream by formatting the quantized spectrum output from the spectrum quantization unit according to a prescribed format;
wherein, during constant bit rate encoding, when the predicted code amount of the input signal is less than the average number of bits allocated per frame, the quantized spectrum information amount prediction unit predicts the quantized spectrum information amount based on perceptual entropy.
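The claim above selects how the quantized information amount is predicted under constant bit rate encoding. A minimal sketch of that switch, assuming the predicted code amount, the perceptual entropy, and the bit budgets are all given in bits for the current frame. The claim only says the prediction is "based on" perceptual entropy in the low-demand case, so returning the perceptual entropy unchanged is an assumption, as is falling back to the bits allocated to the frame otherwise; the function name is hypothetical.

```python
def predict_quantized_info_cbr(predicted_code_bits: float,
                               perceptual_entropy: float,
                               avg_frame_bits: float,
                               allocated_bits: float) -> float:
    """Predict the information amount of the quantized spectrum for one CBR frame.

    Assumption: an "easy" frame (predicted to need fewer bits than the average
    per-frame allocation) is predicted from its perceptual entropy, so it is not
    pushed to spend bits it does not need; any other frame is predicted from the
    bits allocated to it.
    """
    if predicted_code_bits < avg_frame_bits:
        return perceptual_entropy
    return allocated_bits


if __name__ == "__main__":
    print(predict_quantized_info_cbr(800.0, 900.0, 1000.0, 1100.0))    # easy frame
    print(predict_quantized_info_cbr(1500.0, 1600.0, 1000.0, 1100.0))  # demanding frame
```

The strict comparison means a frame predicted to need exactly the average allocation is treated as demanding and uses its allocated bits.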

An audio signal encoding apparatus comprising:
a frame dividing unit that divides an audio input signal into frames, the units of processing, for each channel;
a psychoacoustic operation unit that analyzes the audio input signal, determines a transform block length, and calculates auditory masking;
a filter bank unit that divides a frame to be processed into blocks according to the transform block length determined by the psychoacoustic operation unit, and converts the time-domain signal in the frame into a set of one or more frequency spectra;
a scale factor calculation unit that divides the frequency spectrum output from the filter bank unit into a plurality of frequency bands and weights the spectrum of each frequency band according to the calculation result of the psychoacoustic operation unit;
a quantized spectrum information amount prediction unit that predicts, before quantization, the information amount of the entire quantized spectrum based on the amount of bits allocated to the frame to be encoded;
a quantization step determining unit that determines the quantization step for the entire frame before spectral quantization by subtracting the information amount of the entire spectrum after quantization from the perceptual information amount of the entire spectrum before quantization, as weighted by the scale factor calculation unit, and multiplying the difference by a coefficient obtained from the step increment of the quantization coarseness;
a spectrum quantization unit that quantizes the frequency spectrum sequence using the scale factors and the quantization step; and
a bit shaping unit that creates and outputs a bitstream by formatting the quantized spectrum output from the spectrum quantization unit according to a prescribed format;
wherein, when the code amount used for the quantized spectrum exceeds the allocated code amount, the spectrum quantization unit adjusts the quantization step and re-quantizes the spectrum.
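The claim above keeps a safety net: if the predicted step still yields more bits than allocated, the step is adjusted and the spectrum re-quantized. A minimal sketch, assuming an AAC-style power-law quantizer with the 0.4054 rounding offset of the ISO informative encoder, and a step that is incremented by one (one 1.5 dB increment) per retry. `estimate_bits` is a hypothetical stand-in for the real Huffman-coded bit count, and `max_step` is an assumed upper limit.

```python
from typing import List, Sequence, Tuple

ROUNDING_OFFSET = 0.4054  # AAC informative-encoder rounding offset (assumed here)


def quantize(spectrum: Sequence[float], step: int) -> List[int]:
    """Power-law quantizer: sign(x) * int(|x|**0.75 * 2**(-3*step/16) + offset)."""
    gain = 2.0 ** (-3.0 * step / 16.0)
    out = []
    for x in spectrum:
        q = int(abs(x) ** 0.75 * gain + ROUNDING_OFFSET)
        out.append(q if x >= 0 else -q)
    return out


def estimate_bits(quantized: Sequence[int]) -> int:
    """Hypothetical bit count: 1 bit per zero line, else magnitude bits plus a sign bit."""
    return sum(1 if v == 0 else abs(v).bit_length() + 1 for v in quantized)


def quantize_within_budget(spectrum: Sequence[float], start_step: int,
                           allocated_bits: int,
                           max_step: int = 255) -> Tuple[int, List[int], int]:
    """Quantize at the predicted step; coarsen and re-quantize only while over budget."""
    step = start_step
    q = quantize(spectrum, step)
    bits = estimate_bits(q)
    while bits > allocated_bits and step < max_step:
        step += 1  # one increment of quantization coarseness
        q = quantize(spectrum, step)
        bits = estimate_bits(q)
    return step, q, bits


if __name__ == "__main__":
    print(quantize_within_budget([1000.0, -500.0, 20.0, 0.0], start_step=0, allocated_bits=12))
```

The intent, per the abstract, is that the predicted starting step is usually close enough that this loop runs rarely, instead of searching for the step from scratch on every frame.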

The audio signal encoding apparatus according to any one of claims 1 to 3, wherein the encoding format is MPEG-1 Audio Layer III.

The audio signal encoding apparatus according to any one of claims 1 to 3, wherein the encoding format is MPEG-2/4 AAC.
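Both formats named in the two claims above use a power-law quantizer in which one increment of the gain or scale factor changes the step size by $2^{1/4}$ (1.5 dB), which is presumably the coarseness increment from which the coefficient of claim 1 is obtained. The forms below follow, to our understanding, the informative encoder descriptions of ISO/IEC 11172-3 (Layer III) and ISO/IEC 13818-7 (AAC); the symbol names are those standards' names, and the constants should be checked against the standards before use.

```latex
% MPEG-1 Audio Layer III (ISO/IEC 11172-3, informative encoder)
ix(i) = \operatorname{nint}\!\left[\left(\frac{|xr(i)|}{2^{(qquant + quantanf)/4}}\right)^{3/4} - 0.0946\right]

% MPEG-2/4 AAC (ISO/IEC 13818-7, informative encoder; SF\_OFFSET = 100)
x_{\mathrm{quant}}(k) = \operatorname{sign}\big(x(k)\big)\left\lfloor\left(|x(k)| \cdot 2^{-(sf - \mathrm{SF\_OFFSET})/4}\right)^{3/4} + 0.4054\right\rfloor

% Effect of one step (qquant or sf increased by 1) on a quantized magnitude
\left(2^{-1/4}\right)^{3/4} = 2^{-3/16} \approx 0.878
```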

An audio signal encoding method comprising:
a frame dividing step of dividing an audio input signal into frames, the units of processing, for each channel;
a psychoacoustic calculation step of analyzing the audio input signal, determining a transform block length, and calculating auditory masking;
a filter bank processing step of dividing a frame to be processed into blocks according to the transform block length determined in the psychoacoustic calculation step, and converting the time-domain signal in the frame into a set of one or more frequency spectra;
a scale factor calculation step of dividing the frequency spectrum obtained in the filter bank processing step into a plurality of frequency bands and weighting the spectrum of each frequency band according to the calculation result of the psychoacoustic calculation step;
a quantization step determining step of determining the quantization step for the entire frame before spectral quantization by subtracting the information amount of the entire spectrum after quantization from the information amount of the entire spectrum before quantization, as weighted in the scale factor calculation step, and multiplying the difference by a coefficient obtained from the step increment of the quantization coarseness;
a spectrum quantization step of quantizing the frequency spectrum sequence using the scale factors and the quantization step; and
a bit shaping step of creating and outputting a bitstream by formatting the quantized spectrum obtained in the spectrum quantization step according to a prescribed format;
wherein the quantization step determining step includes a quantized spectrum total amount predicting step of predicting the information amount of the entire quantized spectrum based on the information amount allocated to the frame to be encoded.
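The information amount of the entire spectrum before quantization, like the perceptual entropy mentioned in the other claims, can be estimated per band from the signal energy and the masking threshold delivered by the psychoacoustic calculation. A minimal sketch of one common simplification, not necessarily the definition used in this patent: each band whose energy E exceeds its threshold T is charged 0.5 * log2(E / T) bits per spectral line, the rate needed to hold quantization noise at the threshold, while fully masked bands cost nothing. The function and argument names are hypothetical.

```python
import math
from typing import Sequence


def perceptual_entropy(band_energy: Sequence[float],
                       band_threshold: Sequence[float],
                       band_width: Sequence[int]) -> float:
    """Rough perceptual entropy, in bits, of one frame.

    Each band with energy above its masking threshold costs
    0.5 * log2(energy / threshold) bits per spectral line. Masked bands
    cost nothing; zero thresholds are skipped to avoid division by zero.
    """
    pe = 0.0
    for e, t, n in zip(band_energy, band_threshold, band_width):
        if e > t > 0.0:
            pe += n * 0.5 * math.log2(e / t)
    return pe


if __name__ == "__main__":
    # Band 1: 16x above threshold over 4 lines; band 2: below threshold.
    print(perceptual_entropy([16.0, 1.0], [1.0, 2.0], [4, 8]))
```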

An audio signal encoding method comprising:
a frame dividing step of dividing an audio input signal into frames, the units of processing, for each channel;
a psychoacoustic calculation step of analyzing the audio input signal, determining a transform block length, and calculating auditory masking;
a filter bank processing step of dividing a frame to be processed into blocks according to the transform block length determined in the psychoacoustic calculation step, and converting the time-domain signal in the frame into a set of one or more frequency spectra;
a scale factor calculation step of dividing the frequency spectrum obtained in the filter bank processing step into a plurality of frequency bands and weighting the spectrum of each frequency band according to the calculation result of the psychoacoustic calculation step;
a quantized spectrum information amount predicting step of predicting, before quantization, the information amount of the entire quantized spectrum based on the amount of bits allocated to the frame to be encoded;
a quantization step determining step of determining the quantization step for the entire frame before spectral quantization by subtracting the information amount of the entire spectrum after quantization from the perceptual information amount of the entire spectrum before quantization, as weighted in the scale factor calculation step, and multiplying the difference by a coefficient obtained from the step increment of the quantization coarseness;
a spectrum quantization step of quantizing the frequency spectrum sequence using the scale factors and the quantization step; and
a bit shaping step of creating and outputting a bitstream by formatting the quantized spectrum obtained in the spectrum quantization step according to a prescribed format;
wherein, in the quantized spectrum information amount predicting step, during constant bit rate encoding, when the predicted code amount of the input signal is less than the average number of bits allocated per frame, the quantized spectrum information amount is predicted based on perceptual entropy.

An audio signal encoding method comprising:
a frame dividing step of dividing an audio input signal into frames, the units of processing, for each channel;
a psychoacoustic calculation step of analyzing the audio input signal, determining a transform block length, and calculating auditory masking;
a filter bank processing step of dividing a frame to be processed into blocks according to the transform block length determined in the psychoacoustic calculation step, and converting the time-domain signal in the frame into a set of one or more frequency spectra;
a scale factor calculation step of dividing the frequency spectrum obtained in the filter bank processing step into a plurality of frequency bands and weighting the spectrum of each frequency band according to the calculation result of the psychoacoustic calculation step;
a quantized spectrum information amount predicting step of predicting, before quantization, the information amount of the entire quantized spectrum based on the amount of bits allocated to the frame to be encoded;
a quantization step determining step of determining the quantization step for the entire frame before spectral quantization by subtracting the information amount of the entire spectrum after quantization from the perceptual information amount of the entire spectrum before quantization, as weighted in the scale factor calculation step, and multiplying the difference by a coefficient obtained from the step increment of the quantization coarseness;
a spectrum quantization step of quantizing the frequency spectrum sequence using the scale factors and the quantization step; and
a bit shaping step of creating and outputting a bitstream by formatting the quantized spectrum obtained in the spectrum quantization step according to a prescribed format;
wherein, in the spectrum quantization step, when the code amount used for the quantized spectrum exceeds the allocated code amount, the quantization step is adjusted and the spectrum is re-quantized.

A program for causing a computer to execute the audio signal encoding method according to any one of claims 6 to 8.

A computer-readable storage medium storing the program according to claim 9.