JP4822816B2

JP4822816B2 - Audio signal encoding apparatus and method

Info

Publication number: JP4822816B2
Application number: JP2005328945A
Authority: JP
Inventors: 正伸船越
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2005-11-14
Filing date: 2005-11-14
Publication date: 2011-11-24
Anticipated expiration: 2025-11-14
Also published as: JP2007133323A

Abstract

PROBLEM TO BE SOLVED: To reduce the processing amount required for quantization, while minimizing the sound quality deterioration caused by omitting psychoacoustic analysis, in audio signal encoding that is configured not to perform psychoacoustic analysis.
SOLUTION: A spectrum information amount calculation part 15 calculates the spectrum information amount before quantization. A quantized spectrum information amount prediction part 16 predicts the spectrum information amount after quantization based on a frame average bit amount. A quantization step deciding part 7 decides the quantization step of the whole frame by subtracting the spectrum information amount after quantization from the spectrum information amount before quantization, and then multiplying the result of the subtraction by a coefficient obtained from the step width of the quantization roughness. A spectrum quantizing part 8 quantizes the frequency spectrum using the quantization step. The spectrum quantizing part 8 then performs code amount control based on the spectrum allocation bit amount calculated by a spectrum allocation bit calculation part 12.
COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to an audio signal encoding apparatus and method.

In recent years, high-quality, high-efficiency audio signal coding technology has come into wide use for DVD-Video audio tracks, portable audio players, music distribution, and music storage on home servers in home LANs, and its importance has grown as it has spread.

Many such audio signal coding techniques perform time-frequency transformation using transform coding. For example, MPEG-2 AAC and Dolby Digital (AC-3) build the filter bank from a single orthogonal transform such as the MDCT (Modified Discrete Cosine Transform). MPEG-1 Audio Layer III (MP3) and ATRAC (the coding scheme used for the MD (MiniDisc)) build the filter bank by cascading a subband splitting filter such as a QMF (Quadrature Mirror Filter) with an orthogonal transform.

These transform coding techniques perform masking analysis based on human auditory characteristics. By removing spectral components judged to be masked, or by tolerating quantization error that will be masked, they reduce the amount of information needed to represent the spectrum and raise compression efficiency.

Many of these transform coding techniques also compress the amount of information in the spectrum by quantizing the spectral components nonlinearly. For example, MP3 and AAC compress the information amount by raising each spectral component to the power of 0.75.

In these transform coding techniques, the input signal converted to frequency components by the filter bank is grouped into divided frequency bands that are set according to the frequency resolution of human hearing. At quantization time, a normalization coefficient for each divided frequency band is determined from the auditory analysis result, and the frequency components are represented by the combination of the normalization coefficient and the quantized spectrum, which reduces the amount of information. The normalization coefficient is in practice a variable that adjusts the quantization roughness of each divided band: a change of 1 in the coefficient changes the quantization roughness by one step. In MPEG-2 AAC, the divided frequency band is called a scale factor band (SFB) and the normalization coefficient is called a scale factor.

These transform coding schemes control the code amount by controlling the quantization roughness of the whole frame, the frame being the unit of coding. In many transform coding schemes the quantization roughness is controlled stepwise in integer powers of some radix, and that integer is called the quantization step. The MPEG audio standards call this quantization step, which sets the quantization roughness of the whole frame, the "global gain" or "common scale factor". The scale factors described above are expressed relative to the quantization step, which reduces the amount of information needed to encode these variables.

For example, in MP3 and AAC, a change of 1 in either of these variables changes the actual quantization roughness by a factor of 2^(3/16).

In the quantization process of a transform coding scheme, the scale factors are controlled so as to reflect the result of the auditory analysis and shape the quantization distortion such that the quantization error is masked. At the same time, the quantization step must be controlled to adjust the quantization roughness of the whole frame and thereby control the code amount of the frame. Because these two kinds of values, which together determine the quantization roughness, strongly affect coding quality, both controls must be carried out simultaneously, carefully, accurately, and efficiently.

See the MPEG-1 Audio Layer III (MP3) standard (ISO/IEC 11172-3) and the MPEG-2 AAC standard (ISO/IEC 13818-7). As a way of controlling the scale factors and the global gain during quantization, they present an iterative method built from two nested loops: a distortion control loop (outer loop) and a code amount control loop (inner loop). This method is described below with reference to the drawings, taking MPEG-2 AAC as the example for convenience.

FIG. 10 is a simplified flowchart of the quantization process described in the ISO/IEC standard.

First, in step S501, the scale factors of all SFBs and the global gain are initialized to 0, and the distortion control loop (outer loop) is entered.

Within the distortion control loop, the code amount control loop (inner loop) is executed first.

In the code amount control loop, step S502 first quantizes one frame, that is, 1024 spectral components, according to the following quantization formula.
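The formula referred to here is not reproduced in this text. Assuming the informative quantizer of ISO/IEC 13818-7, which is consistent with the 2^(3/16) change per step noted earlier, Equation (1) would take roughly the following form. The rounding constant 0.4054 and the exact grouping of terms are assumptions of this reconstruction.

```latex
X_q = \mathrm{Int}\!\left[ \left( |x_i| \cdot 2^{-\frac{1}{4}\left(\mathrm{global\_gain} - \mathrm{scalefac}\right)} \right)^{\frac{3}{4}} + 0.4054 \right] \tag{1}
```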

In Equation (1), X_q is the quantized spectrum, x_i is the spectrum before quantization (MDCT coefficient), global_gain is the global gain, and scalefac is the scale factor of the SFB that contains this spectral component.

Next, step S503 calculates the number of bits used for one frame when these quantized spectra are Huffman coded, and step S504 compares it with the number of bits allocated to the frame. If the number of bits used is larger than the number allocated, step S505 increases the global gain by 1, which coarsens the quantization, and processing returns to the spectrum quantization of step S502. This is repeated until the number of bits required after quantization no longer exceeds the number allocated; the global gain at that point is fixed and the code amount control loop ends.

Step S506 dequantizes the spectrum quantized by the code amount control loop and calculates the quantization error as the difference from the spectrum before quantization. The quantization error is accumulated per SFB.

Step S507 checks whether the scale factor has become greater than 0 in every SFB, or whether the quantization error is within the allowable range. If there is an SFB that satisfies neither condition, processing advances to step S508, which increases by 1 the scale factor of each SFB whose quantization error is outside the allowable range, and the distortion control loop is repeated. The allowable error of each SFB is obtained by auditory analysis before the quantization process.
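The steps S501 to S508 can be sketched as follows. This is only an illustration under stated assumptions, not the ISO reference code: the quantizer is the assumed AAC form (|x|^(3/4) scaled by 2^(-3/16 (global_gain - scalefac)), rounded with 0.4054), `count_bits` is a made-up stand-in for real Huffman bit counting, and the names `two_loop`, `sfb_offsets`, `allowed_err`, and `max_outer` are hypothetical. The outer loop stops when no SFB is over its tolerance or when every scale factor is above 0.

```python
def quantize(x, global_gain, scalefac):
    """Assumed AAC-style quantizer for one coefficient."""
    step = 2.0 ** (-(3.0 / 16.0) * (global_gain - scalefac))
    return int(abs(x) ** 0.75 * step + 0.4054)


def dequantize(q, global_gain, scalefac):
    """Inverse of the assumed quantizer (magnitude only)."""
    return q ** (4.0 / 3.0) * 2.0 ** (0.25 * (global_gain - scalefac))


def count_bits(qs):
    """Stand-in for Huffman bit counting; NOT the real AAC codebooks."""
    return sum(2 * q.bit_length() for q in qs)


def two_loop(spectrum, sfb_offsets, allowed_err, frame_bits, max_outer=60):
    n_sfb = len(sfb_offsets) - 1
    scalefac = [0] * n_sfb  # S501: initialise scale factors and global gain
    global_gain = 0
    qs = []
    for _ in range(max_outer):  # distortion control loop (outer)
        while True:  # code amount control loop (inner)
            qs = [
                quantize(spectrum[i], global_gain, scalefac[b])  # S502
                for b in range(n_sfb)
                for i in range(sfb_offsets[b], sfb_offsets[b + 1])
            ]
            if count_bits(qs) <= frame_bits:  # S503, S504
                break
            global_gain += 1  # S505: coarser quantization
        over = []  # S506: quantization error per SFB
        for b in range(n_sfb):
            err = 0.0
            for i in range(sfb_offsets[b], sfb_offsets[b + 1]):
                rec = dequantize(qs[i], global_gain, scalefac[b])
                err += (abs(spectrum[i]) - rec) ** 2
            if err > allowed_err[b]:
                over.append(b)
        if not over or all(s > 0 for s in scalefac):  # S507
            break
        for b in over:  # S508
            scalefac[b] += 1
    return global_gain, scalefac, qs
```

Every pass through the inner loop re-quantizes the whole frame and recounts its bits, which is the cost the following paragraphs discuss.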

As described above, the quantization method in the ISO standard consists of a double loop, and moreover the global gain and the scale factors are adjusted only in steps of 1. Spectrum quantization and bit counting are therefore repeated many times before the process converges.

In MPEG-2 AAC, for example, each pass of spectrum quantization evaluates Equation (1) 1024 times, so it is computationally expensive. In addition, there are 11 Huffman code tables to search during bit counting, so an exhaustive search of the tables makes bit counting expensive as well.

Furthermore, the distortion control loop calculates the quantization error after dequantization, which is also computationally expensive. The double loop therefore requires an enormous amount of processing before it converges.

To address this problem, various attempts have been made to reduce the processing amount by reducing the number of iterations of the double loop.

For example, Patent Document 1 discloses a technique that adjusts the common scale factor and the scale factors in jumps rather than in steps of 1, with the jump size determined from the characteristics of the Huffman code tables. This reduces the iteration count of each of the two loops and thus the processing amount.

Patent Document 2 discloses a method that first calculates an estimate of the quantization step, then calculates the scale factors according to the MNR (mask-to-noise ratio), and then executes the normal inner loop.

Non-Patent Document 1 discloses a technique that calculates the scale factors ahead of spectrum quantization, using a rearranged form of Equation (1) and the allowable error energy of each SFB obtained by auditory analysis. This removes the outer distortion control loop of the double loop and reduces the processing amount.

With these conventional techniques, the double loop of the quantization process converges sooner, and the processing amount of quantization can be reduced to some extent.

Besides quantization, psychoacoustic analysis is another computationally expensive process. Where reducing the processing amount takes priority over coding efficiency, for example in a relatively inexpensive portable video camera where lower power consumption matters more than sound quality, it is possible to encode without any psychoacoustic analysis at all. In that case, the quantization process can set the scale factor uniformly to the same value in all divided frequency bands, which removes the outer distortion control loop and reduces the processing amount further.

Patent Document 1: JP 2003-271199 A
Patent Document 2: JP 2001-184091 A
Non-Patent Document 1: A.D.Duenes, R.Perez, B.Rivas et al., "A robust and efficient implementation of MPEG-2/4 AAC Natural Audio Coders", AES 112th Convention Paper (2002)

However, the conventional techniques cannot eliminate the repetition of the double loop described in the ISO standard entirely. The quantization process cannot finish until spectrum quantization has been repeated several times to several tens of times, so quantization still accounts for a large share of the total encoding workload.

The same problem remains when no psychoacoustic analysis is performed. Even if the scale factor is set uniformly to the same value in all divided frequency bands, only the outer distortion control loop can be omitted; the conventional techniques cannot calculate the quantization step before quantization. They therefore still repeat spectrum quantization and bit counting in the code amount control loop, which wastes processing.

Furthermore, when psychoacoustic analysis is not performed, the PE (perceptual entropy) on which code amount control is based is not calculated, so the surplus bits accumulated in the bit reservoir cannot be allocated to frames, and sound quality degrades further.

Accordingly, an object of the present invention is to reduce the processing amount required for quantization, while minimizing the sound quality degradation caused by omitting psychoacoustic analysis, in audio signal encoding that is configured not to perform psychoacoustic analysis.

An audio signal encoding apparatus according to one aspect of the present invention comprises: dividing means for dividing an audio input signal, for each channel, into frames that are the unit of processing; transform means for transforming the time signal of two consecutive frames obtained from the dividing means into a frequency spectrum, shifting by one frame at a time; spectrum information amount calculating means for calculating the information amount of the frequency spectrum output from the transform means as the spectrum information amount before quantization; predicting means for predicting the spectrum information amount after quantization based on a frame average bit amount calculated from the bit rate and the sampling rate; determining means for determining the quantization step of the whole frame before quantization by subtracting the spectrum information amount after quantization predicted by the predicting means from the spectrum information amount before quantization calculated by the spectrum information amount calculating means, and multiplying the result of the subtraction by a coefficient obtained from the step width of the quantization roughness; quantizing means for quantizing the frequency spectrum using the quantization step determined by the determining means; a bit reservoir for managing the surplus bit amount in accordance with the coding standard; shaping means for shaping the frequency spectrum quantized by the quantizing means according to a predetermined format; and spectrum allocation bit calculating means for calculating spectrum allocation bits by adding part of the surplus bit amount accumulated in the bit reservoir to the frame average bits, wherein the quantizing means controls the code amount based on the spectrum allocation bit amount calculated by the spectrum allocation bit calculating means.

According to the present invention, in audio signal encoding configured not to perform psychoacoustic analysis, the processing amount required for quantization can be reduced while the sound quality degradation caused by omitting psychoacoustic analysis is kept to a minimum.

The present invention rests on the idea that the overall quantization roughness can be obtained, in essence, by dividing the information amount before quantization by the information amount after quantization, and it seeks to obtain the quantization step before the actual quantization. Because the quantization roughness is generally a radix raised to the power of the quantization step, taking the logarithm to that radix in order to obtain the quantization step turns the division of information amounts into a difference of information amounts. Multiplying this difference by a coefficient determined by the step width of the quantization yields an accurate quantization step. The actual information amount after quantization cannot be known until quantization has been performed, but it can be predicted from the code amount allocated to the frame; the present invention uses this prediction to obtain an accurate quantization step before quantization.

The present invention also uses the frame average code amount for the prediction before quantization, and at actual quantization time adds part of the surplus bit amount accumulated in the bit reservoir and controls the code amount against that sum. As a result, even if the predicted quantization step contains some error, the quantization process finishes with a single spectrum quantization, and frames carrying a large amount of information automatically receive part of the surplus bits without any auditory analysis.

In the present invention, once the scale factors have been calculated and fixed, the quantization step can be calculated almost exactly from them, so quantization can finish with roughly one pass of spectrum quantization and bit counting.

Preferred embodiments of the present invention are described in detail below with reference to the drawings. The present invention is not limited to the following embodiments, which merely show specific examples advantageous for carrying it out. Moreover, not every combination of features described in the following embodiments is essential to the means by which the invention solves the problem.

(First Embodiment)
FIG. 1 is a diagram showing the configuration of the audio signal encoding apparatus according to this embodiment. In the figure, thick lines indicate data signals and thin lines indicate control signals.

In the illustrated configuration, the frame divider 1 divides the audio input signal into frames, the unit of processing. The input signal divided into frames is sent to the filter bank 3. The filter bank 3 applies a window to the time signal received from the frame divider 1, then performs a time-frequency transform with a predetermined block length to convert it into a frequency spectrum.

The spectrum information amount calculator 15 sums the frequency spectra output from the filter bank 3 and, from that sum, calculates the information amount of the frequency spectrum before quantization. The quantization step calculator 7 obtains the quantization step by subtracting the spectrum information amount after quantization, predicted by the quantized spectrum information amount predictor 16 described later, from the information amount of the spectrum before quantization obtained by the spectrum information amount calculator 15. The spectrum quantizer 8 quantizes each frequency spectrum. The bit shaper 9 shapes the scale factors and the quantized spectrum into the prescribed format, creates a bitstream, and outputs it. The bit reservoir 13 manages the number of surplus bits (reserve bits) defined by each coding standard.

The spectrum allocation bit calculator 12 calculates the number of bits allocated to the quantized spectrum code from the surplus bit amount reported by the bit reservoir 13 and the frame average bits. The quantized spectrum information amount predictor 16 predicts the quantized spectrum information amount based on the average number of bits allocated to each frame.

Next, the audio signal encoding operation of the apparatus configured as above is described. MPEG-2 AAC is used as the example coding scheme here, but other coding schemes to which a similar quantization method applies can be handled in exactly the same way.

First, each part is initialized before processing begins. Initialization sets the quantization step and all scale factors to 0.

An audio input signal such as an audio PCM signal is divided into frames by the frame divider 1 and sent to the filter bank 3. In the MPEG-2 AAC LC (Low Complexity) profile, one frame consists of 1024 samples of the PCM signal, and it is this signal that is sent.

In the filter bank 3, the current one-frame input signal sent from the frame divider 1 is combined with the input signal of the preceding frame, received at the previous transform, to give two frames, that is, a 2048-sample time signal, which is transformed into 1024 frequency components. In this embodiment, the input signal of the preceding frame is held in a buffer (not shown) inside the filter bank 3. The filter bank 3 treats the 2048 input samples as one block, applies the window, performs the MDCT, and outputs 1024 frequency spectra.
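The buffering and transform described above can be sketched as follows. This is a minimal illustration, not the filter bank of the embodiment: the sine window, the direct O(N^2) MDCT without any scaling factor, and the names `mdct` and `FilterBank` are assumptions of the sketch.

```python
import math


def mdct(block):
    """Windowed MDCT of a 2N-sample block, returning N coefficients.

    Direct O(N^2) evaluation with a sine window (both are assumptions).
    """
    two_n = len(block)
    n = two_n // 2
    window = [math.sin(math.pi * (t + 0.5) / two_n) for t in range(two_n)]
    return [
        sum(
            block[t] * window[t]
            * math.cos(math.pi / n * (t + 0.5 + n / 2.0) * (k + 0.5))
            for t in range(two_n)
        )
        for k in range(n)
    ]


class FilterBank:
    """Keeps the preceding frame so each call transforms two frames."""

    def __init__(self, frame_len=1024):
        self.frame_len = frame_len
        self.prev = [0.0] * frame_len  # buffered preceding frame

    def process(self, frame):
        block = self.prev + list(frame)  # 2 * frame_len time samples
        self.prev = list(frame)          # becomes the preceding frame next time
        return mdct(block)               # frame_len frequency components
```

With `frame_len=1024`, each call consumes one new 1024-sample frame and returns 1024 coefficients computed over 2048 samples, so consecutive blocks overlap by one frame.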

The spectrum information amount calculator 15 sums the frequency spectra output from the filter bank 3 and, from that sum, calculates the information amount of the frequency spectrum before quantization. For MPEG-2 AAC, the information amount of the entire spectrum before quantization can be calculated by the following equation.
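The equation itself is not reproduced in this text. Given the description (a base-2 logarithm of a sum over one frame) and the 0.75 power used by the AAC quantizer, Equation (2) plausibly has the following form; the exponent 3/4 inside the sum is an assumption of this reconstruction.

```latex
\log_2 \sum_{i=0}^{1023} |x_i|^{\frac{3}{4}} \tag{2}
```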

Here x_i denotes the spectrum before quantization, and the sum over i covers one frame, that is, 0 ≤ i ≤ 1023. The expression is the base-2 logarithm of the sum over the spectral components.
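As a rough sketch, the calculation performed by the spectrum information amount calculator 15 could look like the following. The function name `spectrum_information`, the 0.75 exponent applied before summing, and the choice to return negative infinity for an all-zero frame are assumptions of this example.

```python
import math


def spectrum_information(spectrum, power=0.75):
    """Base-2 log of the summed, companded magnitudes of one frame.

    power=0.75 matches the AAC non-linear quantizer (an assumption here).
    """
    total = sum(abs(x) ** power for x in spectrum)
    if total <= 0.0:
        return float("-inf")  # silent frame: no information to quantize
    return math.log2(total)
```

For a frame of 1024 MDCT coefficients this is a single pass over the data, with no quantization or bit counting involved.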

The quantized spectrum information amount predictor 16 predicts the quantized spectrum information amount based on the average number of bits allocated to each frame. The first step is to predict the total of the quantized spectrum from the frame average bits. In this embodiment, this is done with an approximation formula created from actual measurements of the relationship between the frame bits and the quantized spectrum total when quantizing with a conventional quantizer. For example, if the approximation formula is F(x) and the frame average bits are average_bits, the predicted quantized spectrum total is given by the following equation.
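The equation is not reproduced in this text; from the definitions just given, Equation (3) can be written as follows, with the sum taken over one frame.

```latex
\sum_{i=0}^{1023} X_q = F(\mathrm{average\_bits}) \tag{3}
```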

Here X_q is the quantized spectrum, and the sum over i covers one frame, that is, 0 ≤ i ≤ 1023. In this embodiment, the frame average bits are calculated in advance, at system initialization, from the bit rate, the sampling rate, and the number of input channels. This calculation is well known in the field and is not detailed here. The frame average bits held by the system keep the value calculated at initialization, unchanged, throughout the encoding process.
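The text gives neither the frame average bit calculation nor the fitted F(x). The sketch below is therefore illustrative only: it assumes a constant bit rate, an even split of bits between channels, and a linear F whose `slope` and `offset` are made-up placeholders rather than measured values. The function names are hypothetical.

```python
FRAME_LEN = 1024  # samples per frame in the MPEG-2 AAC LC profile


def frame_average_bits(bit_rate, sampling_rate, channels=1, frame_len=FRAME_LEN):
    """Average bits per frame and channel at a constant bit rate.

    Splitting evenly across channels is an assumption of this sketch.
    """
    return (bit_rate * frame_len) // (sampling_rate * channels)


def predicted_quantized_total(average_bits, slope=0.5, offset=0.0):
    """Hypothetical stand-in for the fitted approximation F(x).

    A real encoder would fit this curve to measurements from a
    conventional quantizer; slope and offset here are placeholders.
    """
    return slope * average_bits + offset
```

For example, 128 kbit/s at 48 kHz gives `frame_average_bits(128000, 48000)` = 2730 bits per frame for a single channel.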

Next, the quantized spectrum total is converted into a quantized spectrum information amount. In this embodiment, this is done by taking the base-2 logarithm of the quantized spectrum total obtained from Equation (3). The quantized spectrum information amount is thus expressed as follows.
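The expression is not reproduced in this text; taking the base-2 logarithm of both sides of the predicted total, Equation (4) can be written as follows.

```latex
\log_2 \sum_{i=0}^{1023} X_q = \log_2 F(\mathrm{average\_bits}) \tag{4}
```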

The quantization step calculator 7 subtracts the quantized spectrum information amount output by the quantized spectrum information amount predictor 16 from the information amount of the spectrum before quantization output by the spectrum information amount calculator 15. It then multiplies the result of the subtraction by a coefficient obtained from the step width of the quantization roughness, which gives the quantization step, that is, the quantization roughness of the whole frame.

Specifically, for MPEG-2 AAC, the predicted value of the quantization step is given by the following equation.
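The equation is not reproduced in this text. With a roughness step of 2^(3/16) per unit of global gain, the coefficient obtained from the step width is 16/3, so Equation (5) plausibly reads as follows. The exponent 3/4 and the absence of an explicit rounding operator are assumptions of this reconstruction.

```latex
\mathrm{global\_gain} = \frac{16}{3} \left( \log_2 \sum_{i=0}^{1023} |x_i|^{\frac{3}{4}} - \log_2 \sum_{i=0}^{1023} X_q \right) \tag{5}
```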

Here X_q is the quantized spectrum, x_i is the spectrum before quantization, and global_gain is the global gain (quantization step). The sum over i covers one frame, that is, 0 ≤ i ≤ 1023.

The first term on the right side of Equation (5) is as follows.

This is the information amount of the entire spectrum before quantization, the value calculated by the spectrum information amount calculator 15 using Equation (2). The second term on the right side is as follows.

これは、量子化後のスペクトルが持つ情報量であり、量子化スペクトル情報量予測器１６によって（４）式により予測された値である。 This is the information amount of the spectrum after quantization, and is a value predicted by the quantized spectrum information amount predictor 16 according to the equation (4).

なお、（５）式は先述のスペクトル量子化式（１）を適宜変形し、スケールファクタscalefacに一律に０を代入することによって得ることができる。 The equation (5) can be obtained by appropriately modifying the above-described spectrum quantization equation (1) and uniformly substituting 0 for the scale factor scalefac.
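The equation images are not reproduced in this text, so the following is only a sketch of how such a derivation could go. It assumes that equation (1) has the commonly used MPEG-2 AAC quantizer form with a rounding offset of 0.4054, ignores any constant offset applied to global_gain, and neglects the rounding term. The exact notation of equations (1), (2), (4) and (5) is an assumption. Under these assumptions, the factor 16/3 plays the role of the coefficient obtained from the step size of the quantization roughness.

```latex
% Assumed form of equation (1) (standard AAC-style quantizer):
\begin{align*}
X_q &= \mathrm{Int}\!\left[\left(|x_i| \cdot 2^{-\frac{1}{4}(\mathrm{global\_gain}-\mathrm{scalefac})}\right)^{3/4} + 0.4054\right] \\
% Substitute scalefac = 0 and neglect the rounding offset:
X_q &\approx |x_i|^{3/4} \cdot 2^{-\frac{3}{16}\,\mathrm{global\_gain}} \\
% Sum over one frame (0 <= i <= 1023) and take the base-2 logarithm:
\log_2 \sum_i X_q &\approx \log_2 \sum_i |x_i|^{3/4} - \tfrac{3}{16}\,\mathrm{global\_gain} \\
% Solve for global_gain:
\mathrm{global\_gain} &\approx \tfrac{16}{3}\left(\log_2 \sum_i |x_i|^{3/4} - \log_2 \sum_i X_q\right)
\end{align*}
```

The first term in the parentheses corresponds to the information amount before quantization and the second to the predicted information amount after quantization, which matches the roles the text assigns to the two terms on the right side of equation (5).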

スペクトル割当ビット計算器１２は、ビットリザーバ１３によって管理されている現在の余剰ビット量をビットリザーバ１３から通知され、例えばそのうちの２割をフレーム平均ビットに加えてこれを割当ビットとし、スペクトル量子化器８に通知する。 The spectrum allocation bit calculator 12 is notified by the bit reservoir 13 of the current surplus bit amount that the bit reservoir 13 manages. It adds, for example, 20% of that amount to the frame average bits, takes the result as the allocated bits, and notifies the spectrum quantizer 8.

スペクトル量子化器８は量子化ステップ計算器７が出力した量子化ステップに従って、1024本の周波数スペクトルを量子化する。例えば、MPEG-2 AACの場合では（１）式によって量子化スペクトルを算出し、フレーム全体で消費されるビット数をカウントする。 The spectrum quantizer 8 quantizes 1024 frequency spectra according to the quantization step output from the quantization step calculator 7. For example, in the case of MPEG-2 AAC, the quantized spectrum is calculated by the equation (1), and the number of bits consumed in the entire frame is counted.

ここで、使用ビット数がスペクトル割当ビット計算器１２から通知された割当ビット数を超えてしまった場合には、使用ビット数がスペクトル割当ビット数に収まるまで量子化ステップを増加して再度スペクトル量子化を行う。しかしながら、量子化ステップ計算器７の計算が正確であり、かつ、量子化ステップの予測計算が行われた時のビット量に加えて、余剰ビット量の一部が割当ビットに加算されている。このため、多くの場合、１回の量子化スペクトル計算とビット計算が行われるだけで量子化が完了する。 Here, when the number of used bits exceeds the number of allocated bits notified from the spectrum allocation bit calculator 12, the quantization step is increased and spectrum quantization is performed again until the number of used bits falls within the number of spectrum allocation bits. However, the calculation by the quantization step calculator 7 is accurate, and a part of the surplus bit amount is added to the allocated bits on top of the bit amount that was assumed when the quantization step was predicted. For this reason, in many cases, quantization is completed after a single quantized spectrum calculation and bit calculation.

また、量子化ステップ計算器７で計算された量子化ステップでスペクトル量子化した場合に使用ビット量が足りなくなるようなフレームは、必然的に情報量が元々平均的なフレームよりも多いフレームである。そのため、余剰ビットの一部を割り当てビットに加算し、この値を基準にしてスペクトル量子化処理を行うことによって、このようなフレームには自動的により多くのビットが割り当てられることになる。 In addition, a frame that runs short of bits when its spectrum is quantized with the quantization step calculated by the quantization step calculator 7 is necessarily a frame whose information amount is larger than that of an average frame to begin with. Therefore, by adding a part of the surplus bits to the allocated bits and performing the spectrum quantization process based on this value, more bits are automatically allocated to such frames.

各SFBのスケールファクタと量子化スペクトルはビット整形器９によって定められた書式に従ってビットストリームに整形されて、出力される。 The scale factor and quantized spectrum of each SFB are shaped by the bit shaper 9 into a bitstream according to a prescribed format and then output.

最後に、ビット整形器９は実際に使用したビット量をビットリザーバ１３に通知する。ビットリザーバ１３はビット整形器９から通知された使用ビット量とフレーム平均ビット量から実際に使用された余剰ビット量を計算し、リザーブビットを適宜加減する。 Finally, the bit shaper 9 notifies the bit reservoir 13 of the actually used bit amount. The bit reservoir 13 calculates the surplus bit amount actually used from the used bit amount notified from the bit shaper 9 and the frame average bit amount, and appropriately adjusts the reserve bits.
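The interplay between the spectrum allocation bit calculator 12 and the bit reservoir 13 described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name, the `max_reserve_bits` cap, and the integer percentage arithmetic are assumptions introduced here, and only the 20% share and the use of the frame average bits come from the text.

```python
class BitReservoir:
    """Tracks surplus (reserve) bits across frames. Hypothetical sketch."""

    def __init__(self, frame_average_bits, max_reserve_bits):
        self.frame_average_bits = frame_average_bits
        # Upper bound on the reserve; an assumption, the text gives no limit.
        self.max_reserve_bits = max_reserve_bits
        self.reserve_bits = 0

    def allocation(self, share_percent=20):
        # Frame average bits plus a fixed share of the current surplus
        # (20% is the example given in the text).
        return self.frame_average_bits + self.reserve_bits * share_percent // 100

    def update(self, used_bits):
        # A frame that used fewer bits than average grows the reserve;
        # a frame that used more draws it down.
        self.reserve_bits += self.frame_average_bits - used_bits
        self.reserve_bits = max(0, min(self.reserve_bits, self.max_reserve_bits))
```

For instance, after a cheap frame that used 600 of 1000 average bits, the reserve holds 400 bits and the next frame may use up to 1080 bits.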

以上説明した本実施形態におけるオーディオ信号符号化装置は、処理負荷の重い聴覚心理分析を一切行わない。しかも、フレームに割り当てられたビット量から量子化後のスペクトル情報量を予測し、これを用いて量子化前後のスペクトル全体が持つ情報量の差分を計算することによって、スペクトル量子化の前に量子化ステップをほぼ正確に予測する。このため、量子化ステップの調整のための繰り返しを行うことが減るため、迅速に量子化処理を終了することができる。よって、符号化処理にかかる演算量を大幅に削減することができる。 The audio signal encoding apparatus according to the present embodiment described above performs no psychoacoustic analysis, which carries a heavy processing load. Moreover, it predicts the spectrum information amount after quantization from the bit amount allocated to the frame and uses it to calculate the difference between the information amounts of the entire spectrum before and after quantization, thereby predicting the quantization step almost accurately before spectrum quantization. This reduces the number of iterations needed to adjust the quantization step, so the quantization process finishes quickly. The amount of computation required for the encoding process can therefore be significantly reduced.

また、本実施形態におけるオーディオ信号符号化装置は、フレーム平均ビット量に基づいて量子化ステップを予測しておき、余剰ビット量の一部を一律に足してから実際のスペクトル量子化を行う。これにより、多少の予測誤差が生じても量子化処理が１回の処理で済むとともに、元々の情報量が多いフレームに自動的にリザーブビットが割当てられることになるため、聴覚心理分析を行わないことによる音質劣化を最小限に留めることができる。 Also, the audio signal encoding apparatus according to the present embodiment predicts the quantization step based on the frame average bit amount, and uniformly adds a part of the surplus bit amount before performing the actual spectrum quantization. As a result, even if some prediction error occurs, quantization completes in a single pass, and reserve bits are automatically allocated to frames that originally carry a large amount of information, so the sound quality degradation caused by omitting psychoacoustic analysis is kept to a minimum.

（第２の実施形態） (Second Embodiment)
本発明は、パーソナルコンピュータ（ＰＣ）等の汎用的な計算機上で動作するソフトウェアプログラムとして実施することも可能である。以下、この場合について図面を用いて説明する。 The present invention can also be implemented as a software program that operates on a general-purpose computer such as a personal computer (PC). Hereinafter, this case will be described with reference to the drawings.

図５は、本実施形態におけるオーディオ信号符号化装置の構成例を示す図である。 FIG. 5 is a diagram illustrating a configuration example of the audio signal encoding device according to the present embodiment.

図示の構成において、１００はＣＰＵであり、オーディオ信号符号化処理のための演算、論理判断等を行い、１０２のバスを介して各構成要素を制御する。 In the configuration shown in the figure, reference numeral 100 denotes a CPU, which performs operations for audio signal encoding processing, logic determination, and the like, and controls each component via a bus 102.

１０１はメモリであり、本実施形態の構成例における基本Ｉ／Oプログラムや、実行しているプログラムコード、プログラム処理時に必要なデータなどを格納する。 Reference numeral 101 denotes a memory, which stores the basic I/O program of this configuration example, the program code being executed, data needed during program processing, and the like.

１０２はバスであり、ＣＰＵ１００の制御の対象とする構成要素を指示するアドレス信号を転送し、ＣＰＵ１００の制御の対象とする各構成要素のコントロール信号を転送し、各構成機器相互間のデータ転送を行う。 Reference numeral 102 denotes a bus, which transfers address signals designating the components to be controlled by the CPU 100, transfers control signals for each of those components, and carries data between the components.

１０３はキーボードやマウスなどの入力装置であり、装置の起動、各種条件や入力信号の設定、符号化開始の指示を行う。 Reference numeral 103 denotes an input device such as a keyboard and a mouse, which performs activation of the device, setting of various conditions and input signals, and an instruction to start encoding.

１０４はデータやプログラム等を記憶するための外部記憶領域を提供する外部記憶装置であり、例えばハードディスク装置などによって実現される。ここに、ＯＳをはじめとするプログラムやデータ等が保管され、また、保管されたデータやプログラムは必要な時にＣＰＵ１００によって呼び出される。また、後述するように、オーディオ信号符号化処理プログラムもこの外部記憶装置１０４にインストールされることになる。 Reference numeral 104 denotes an external storage device that provides an external storage area for storing data, programs, and the like, and is realized by, for example, a hard disk device. Here, programs and data including the OS are stored, and the stored data and programs are called by the CPU 100 when necessary. As will be described later, an audio signal encoding processing program is also installed in the external storage device 104.

１０５はメディアドライブである。記録媒体（例えば、ＣＤ−ＲＯＭ）に記録されているプログラムやデータ、デジタルオーディオ信号などはこのメディアドライブ１０５が読み取ることにより本オーディオ信号符号化装置にロードされる。また、外部記憶装置１０４に蓄えられた各種データや実行プログラムを、記録媒体に書き込むこともできる。なお上記の記録媒体は、ＣＤ−ＲＯＭに限らず、ＨＤＤ、ＤＶＤ、ＭＯ、半導体メモリなどを用いてもよい。 Reference numeral 105 denotes a media drive. Programs, data, digital audio signals, and the like recorded on a recording medium (for example, a CD-ROM) are loaded into the audio signal encoding apparatus by being read by the media drive 105. In addition, various data and execution programs stored in the external storage device 104 can be written in a recording medium. The recording medium is not limited to a CD-ROM, and an HDD, DVD, MO, semiconductor memory, or the like may be used.

１０６はマイクロフォンであり、実際の音を集音してオーディオ信号に変換する。１０７はスピーカーであり、任意のオーディオ信号データを実際の音にして出力することができる。 A microphone 106 collects actual sound and converts it into an audio signal. Reference numeral 107 denotes a speaker, which can output arbitrary audio signal data as an actual sound.

１０８は通信網であり、LAN、公衆回線、無線回線、放送電波などで構成されている。１０９は通信インタフェースであり、通信網１０８に接続されている。本実施形態におけるオーディオ信号符号化装置はこの通信インタフェース１０９を介して通信網１０８を経由し、外部機器と通信を行い、データやプログラムを送受信することができる。 A communication network 108 includes a LAN, a public line, a wireless line, a broadcast wave, and the like. Reference numeral 109 denotes a communication interface, which is connected to the communication network 108. The audio signal encoding apparatus according to this embodiment can communicate with an external device via the communication network 108 via the communication interface 109 to transmit / receive data and programs.

かかる構成を備えるオーディオ信号符号化装置は、入力装置１０３からの各種の入力に応じて作動する。入力装置１０３からの入力が供給されると、インタラプト信号がＣＰＵ１００に送られることによって、ＣＰＵ１００がメモリ１０１内に記憶してある各種の制御信号を読出し、それらの制御信号に従って、各種の制御が行われる。 The audio signal encoding device having such a configuration operates in response to various inputs from the input device 103. When an input from the input device 103 is supplied, an interrupt signal is sent to the CPU 100, whereupon the CPU 100 reads out various control signals stored in the memory 101, and various kinds of control are performed according to those control signals.

本実施形態のオーディオ信号符号化装置は、ＣＰＵ１００が、メモリ１０１に格納されている基本Ｉ／Ｏプログラムを実行し、これより外部記憶装置１０４に記憶されているＯＳをメモリ１０１にロードしてこれを実行することによって、動作する。具体的には、本装置の電源がＯＮにされると、基本Ｉ／Ｏプログラム中のＩＰＬ（イニシャルプログラムローディング）機能により外部記憶装置１０４からＯＳがメモリ１０１に読み込まれ、ＯＳの動作が開始される。 The audio signal encoding apparatus according to the present embodiment operates when the CPU 100 executes the basic I/O program stored in the memory 101, which loads the OS stored in the external storage device 104 into the memory 101 and runs it. Specifically, when the apparatus is powered on, the IPL (initial program loading) function in the basic I/O program reads the OS from the external storage device 104 into the memory 101, and the OS starts operating.

オーディオ信号符号化処理プログラムは、図２に示されるオーディオ信号符号化処理手順のフローチャートに基づいてプログラムコード化されたものである。 The audio signal encoding processing program is program code written according to the flowchart of the audio signal encoding procedure shown in FIG. 2.

図６は、オーディオ信号符号化処理プログラムおよび関連データを記録媒体に記録したときの内容構成例を示す図である。本実施形態において、オーディオ信号符号化処理プログラムおよびその関連データは記録媒体に記録されている。図示したように記録媒体の先頭領域には、この記録媒体のディレクトリ情報が記録されており、その後にこの記録媒体のコンテンツであるオーディオ信号符号化処理プログラムと、オーディオ信号符号化処理関連データがファイルとして記録されている。 FIG. 6 is a diagram showing an example of the content layout when the audio signal encoding processing program and related data are recorded on a recording medium. In the present embodiment, the audio signal encoding processing program and its related data are recorded on a recording medium. As shown in the figure, directory information of the recording medium is recorded in its head area, followed by the contents of the medium, namely the audio signal encoding processing program and the audio signal encoding processing related data, recorded as files.

図７は、オーディオ信号符号化処理プログラムのオーディオ信号符号化装置（ＰＣ）への導入を示す模式図である。記録媒体に記録されたオーディオ信号符号化処理プログラムおよびその関連データは、図７に示したようにメディアドライブ１０５を通じて本装置にロードすることができる。この記録媒体１１０をメディアドライブ１０５にセットすると、ＯＳ及び基本Ｉ／Ｏプログラムの制御のもとにオーディオ信号符号化処理プログラムおよびその関連データが記録媒体１１０から読み出され、外部記憶装置１０４に格納される。その後、再起動時にこれらの情報がメモリ１０１にロードされて動作可能となる。 FIG. 7 is a schematic diagram showing the introduction of the audio signal encoding processing program into the audio signal encoding device (PC). The audio signal encoding processing program and its related data recorded on the recording medium can be loaded into the apparatus through the media drive 105 as shown in FIG. 7. When the recording medium 110 is set in the media drive 105, the audio signal encoding processing program and its related data are read from the recording medium 110 and stored in the external storage device 104 under the control of the OS and the basic I/O program. Thereafter, at restart, this information is loaded into the memory 101 and becomes operable.

図８は、本実施形態におけるオーディオ信号符号化処理プログラムがメモリ１０１にロードされ実行可能となった状態のメモリマップを示す図である。図示のように、メモリ１０１のワークエリアには例えば、量子化前スペクトル聴覚情報量、量子化後スペクトル予測情報量、スペクトル割当ビット、スペクトルバッファ、量子化スペクトル、入力信号バッファが格納される。この他に、使用ビット、量子化ステップ、ビットレート、サンプリングレート、平均割当ビット、リザーブビット量も格納されている。 FIG. 8 is a diagram showing a memory map in a state where the audio signal encoding processing program according to the present embodiment is loaded into the memory 101 and can be executed. As illustrated, the work area of the memory 101 stores, for example, a pre-quantization spectrum aural information amount, a post-quantization spectrum prediction information amount, a spectrum allocation bit, a spectrum buffer, a quantization spectrum, and an input signal buffer. In addition to this, a used bit, a quantization step, a bit rate, a sampling rate, an average allocated bit, and a reserved bit amount are also stored.

図９は、本実施形態におけるオーディオ信号符号化装置における入力信号バッファの一構成例を示す図である。図示の構成において、バッファサイズは1024×2サンプルであり、説明の便宜上1024サンプル毎に縦線で区切っている。入力信号は1フレーム分の1024サンプルずつ右側から入力されて、左から逐次処理される。太線の矢印は、入力信号の流れを示している。なお、図示の構成は１チャネル分の入力信号バッファを模式的に示したものであり、本実施形態では入力信号のチャネル分だけ同様なバッファが用意される。 FIG. 9 is a diagram illustrating a configuration example of the input signal buffer in the audio signal encoding device according to the present embodiment. In the configuration shown in the drawing, the buffer size is 1024 × 2 samples, and for convenience of explanation, every 1024 samples are separated by vertical lines. The input signal is input from the right side in increments of 1024 samples for one frame and is processed sequentially from the left side. The bold arrow indicates the flow of the input signal. The illustrated configuration schematically shows an input signal buffer for one channel. In this embodiment, similar buffers are prepared for the channels of the input signal.

以下、本実施形態においてＣＰＵ１００で実行されるオーディオ信号符号化処理をフローチャートを用いて説明する。 Hereinafter, an audio signal encoding process executed by the CPU 100 in the present embodiment will be described with reference to flowcharts.

図２は、本実施形態におけるオーディオ信号符号化処理のフローチャートである。このフローチャートに対応するプログラムはオーディオ信号符号化処理プログラムに含まれ、上記のとおりメモリ１０１にロードされＣＰＵ１００によって実行される。 FIG. 2 is a flowchart of audio signal encoding processing in the present embodiment. A program corresponding to this flowchart is included in the audio signal encoding processing program, loaded into the memory 101 as described above, and executed by the CPU 100.

まず、ステップＳ１は、符号化する入力オーディオ信号をユーザが端末１０３を用いて指定する処理である。本実施形態において、符号化するオーディオ信号は、外部記憶装置１０４に格納されているオーディオＰＣＭファイルでも良いし、マイク１０６で捉えたリアルタイムの音声信号をアナログ・デジタル変換した信号でも良い。この処理を終えると、ステップＳ２へ進む。 First, step S1 is a process in which the user designates an input audio signal to be encoded using the terminal 103. In the present embodiment, the audio signal to be encoded may be an audio PCM file stored in the external storage device 104 or a signal obtained by analog-to-digital conversion of a real-time audio signal captured by the microphone 106. When this process ends, the process proceeds to step S2.

ステップＳ２は、符号化する入力オーディオ信号が終了したかどうかを判定する処理である。入力信号が終了している場合は、ステップＳ１１へ処理が進む。未終了の場合は、ステップＳ３へ処理が進む。 Step S2 is a process of determining whether or not the input audio signal to be encoded has been completed. If the input signal has ended, the process proceeds to step S11. If not completed, the process proceeds to step S3.

ステップＳ３は、図９に示した入力信号バッファにおいて、右から２フレーム分、すなわち2048サンプルの時間信号を１フレーム分左にシフトするとともに、新たに１フレーム分、すなわち1024サンプルを右側に読み込む入力信号シフト処理である。この処理は入力信号に含まれる全てのチャネルに対して行われる。処理を終えると、ステップＳ５へ処理が進む。 In step S3, the input signal buffer shown in FIG. 9 shifts the time signal of two frames from the right, that is, 2048 samples to the left by one frame, and newly reads one frame, that is, 1024 samples to the right. This is signal shift processing. This process is performed for all channels included in the input signal. When the process is finished, the process proceeds to step S5.
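The input signal shift of step S3 can be sketched as below for a single channel. This is an illustrative sketch only: the function name and the use of Python lists are assumptions, while the 2048-sample buffer and the 1024-sample frame come from the text. For multi-channel input, one such buffer per channel would be shifted in the same way.

```python
FRAME = 1024  # samples per frame, as stated in the text


def shift_in(buffer, new_frame):
    """Shift a two-frame buffer left by one frame and append a new frame on the right."""
    if len(buffer) != 2 * FRAME or len(new_frame) != FRAME:
        raise ValueError("expected a 2048-sample buffer and a 1024-sample frame")
    # The older frame (left half) is dropped, the newer frame moves left,
    # and the freshly read frame fills the right half.
    return buffer[FRAME:] + list(new_frame)
```

Each call leaves the previous frame in the left half and the new frame in the right half, so consecutive transform blocks overlap by one frame.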

ステップＳ５では、現行フレームの時間信号、すなわち、図９の入力信号バッファに格納されている2048サンプル（２フレーム分）の信号に対して窓掛けを行った後、時間−周波数変換を行う。この結果、MPEG-２ AACの場合、1024の周波数成分に分割されたスペクトルの組が１組得られる。なお本実施形態では、ブロックタイプは全て長いブロック長に設定されている。算出された計1024本のスペクトルは、メモリ１０１上のワークエリアにあるスペクトルバッファに格納される。このステップＳ５を終えると、処理はステップＳ７へと進む。 In step S5, the time signal of the current frame, that is, the signal of 2048 samples (for two frames) stored in the input signal buffer of FIG. 9 is windowed, and then the time-frequency conversion is performed. As a result, in the case of MPEG-2 AAC, one set of spectra divided into 1024 frequency components is obtained. In this embodiment, all block types are set to a long block length. The calculated 1024 spectra in total are stored in the spectrum buffer in the work area on the memory 101. When step S5 is completed, the process proceeds to step S7.
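As a rough illustration of step S5, the sketch below windows a block of 2N samples and applies a direct MDCT, yielding N coefficients (2048 samples in, 1024 coefficients out for a long block). It rests on assumptions: the text names neither the window shape nor the transform normalization, so a sine window and an unnormalized textbook MDCT are used here, and the direct O(N²) evaluation is chosen for clarity, not speed.

```python
import math


def mdct(block):
    """Direct MDCT of a 2N-sample block with a sine window; returns N coefficients."""
    two_n = len(block)
    n = two_n // 2
    # Sine window (an assumption; the text does not specify the window).
    windowed = [block[t] * math.sin(math.pi * (t + 0.5) / two_n) for t in range(two_n)]
    n0 = 0.5 + n / 2.0  # phase offset of the standard MDCT definition
    return [
        sum(windowed[t] * math.cos(math.pi / n * (t + n0) * (k + 0.5)) for t in range(two_n))
        for k in range(n)
    ]
```

A production encoder would normally use an FFT-based MDCT; the quadratic form above is only meant to make the input and output sizes and the windowing step explicit.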

ステップＳ７は、量子化前のスペクトルが持つ情報量と量子化後のスペクトルが持つ情報量との差分から量子化ステップを計算する処理である。この処理の詳細は図３を用いて後述する。このステップＳ７を終えると、処理はステップＳ８へと進む。 Step S7 is a process of calculating the quantization step from the difference between the information amount of the spectrum before quantization and the information amount of the spectrum after quantization. Details of this processing will be described later with reference to FIG. When step S7 is completed, the process proceeds to step S8.

ステップＳ８では、ステップＳ７で求めた量子化ステップに従って、1024本の周波数スペクトルを量子化して、使用ビットを計算する。さらに、その使用ビットがメモリ１０１上のワークエリアに格納されている割当ビットを超えた場合のみ、量子化ステップの増加と再量子化を行う。この処理の詳細は図４を用いて後述する。このステップＳ８を終えると、処理はステップＳ９へと進む。 In step S8, according to the quantization step obtained in step S7, 1024 frequency spectra are quantized and used bits are calculated. Further, only when the number of bits used exceeds the allocated bits stored in the work area on the memory 101, the quantization step is increased and requantization is performed. Details of this processing will be described later with reference to FIG. When step S8 is completed, the process proceeds to step S9.

ステップＳ９は、ステップＳ８で算出された量子化スペクトルと、スケールファクタとを、符号化方式によって定められたフォーマットに従って整形し、ビットストリームとして出力する処理である。本実施形態において、この処理によって出力されるビットストリームは、外部記憶装置１０４に格納されても良いし、あるいは、通信インタフェース１０９を介して通信網１０８に繋がっている外部機器に出力されても良い。このステップＳ９を終えると、処理はステップＳ１０へと進む。 Step S9 is a process of shaping the quantized spectrum calculated in step S8 and the scale factor according to a format determined by the encoding method, and outputting the result as a bit stream. In the present embodiment, the bit stream output by this processing may be stored in the external storage device 104, or may be output to an external device connected to the communication network 108 via the communication interface 109. . When step S9 is completed, the process proceeds to step S10.

ステップＳ１０は、ステップＳ９で出力されたビットストリームに使用されたビット量とフレーム平均ビットから、メモリ１０１上に格納されている余剰ビットの補正を行う処理である。このステップＳ１０を終えると、処理はステップＳ２へと戻る。 Step S10 is a process of correcting the surplus bits stored in the memory 101 from the bit amount used in the bit stream output in step S9 and the frame average bits. When step S10 is completed, the process returns to step S2.

ステップＳ１１は、直交変換などで生じる遅延によってまだ出力されていない量子化スペクトルがメモリ１０１上に残っているため、それらをビットストリームに整形して出力する処理である。このステップＳ１１を終えると、オーディオ信号符号化処理は終了する。 In step S11, quantized spectra that have not yet been output because of the delay introduced by the orthogonal transform and the like still remain in the memory 101, so they are shaped into a bitstream and output. When step S11 is completed, the audio signal encoding process ends.
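The control flow of FIG. 2 (steps S2 through S11) can be summarized in a short driver loop. This is a sketch of the flow only: every callable passed in (`read_frame`, `transform`, `predict_step`, `quantize`, `write_frame`, `update_reserve`, `flush`) is a hypothetical stand-in for the corresponding step, and the convention that `read_frame` returns `None` at end of input is an assumption.

```python
FRAME = 1024  # samples per frame


def encode_stream(read_frame, transform, predict_step, quantize,
                  write_frame, update_reserve, flush):
    """Run the per-frame loop of FIG. 2; returns the number of frames processed."""
    buffer = [0.0] * (2 * FRAME)
    frames_done = 0
    while True:
        frame = read_frame()                         # S2: None means input has ended
        if frame is None:
            break
        buffer = buffer[FRAME:] + list(frame)        # S3: shift in one new frame
        spectrum = transform(buffer)                 # S5: windowing and transform
        step = predict_step(spectrum)                # S7: predict the quantization step
        quantized, step = quantize(spectrum, step)   # S8: quantize with rate control
        used_bits = write_frame(quantized, step)     # S9: shape and output the bitstream
        update_reserve(used_bits)                    # S10: correct the surplus bits
        frames_done += 1
    flush()                                          # S11: output what is still pending
    return frames_done
```

Passing the steps in as callables keeps the loop independent of any particular transform or quantizer, which is convenient for testing the ordering alone.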

図３は、上記したステップＳ７の量子化ステップ予測処理の詳細を示すフローチャートである。 FIG. 3 is a flowchart showing details of the quantization step prediction process in step S7 described above.

ステップＳ１００は、量子化前のスペクトルが持つ情報量を算出する処理である。量子化前のスペクトル情報量は、各スペクトル成分の総量を求め、その対数を算出することによって求められる。例えば、MPEG-2 AACの場合、量子化前のスペクトル情報量は次式によって求めることができる。 Step S100 is a process of calculating the amount of information that the spectrum before quantization has. The amount of spectrum information before quantization is obtained by obtaining the total amount of each spectrum component and calculating the logarithm thereof. For example, in the case of MPEG-2 AAC, the amount of spectrum information before quantization can be obtained by the following equation.

算出された量子化前スペクトル情報量はメモリ１０１上のワークエリアに保存される。このステップＳ１００を終えると、処理はステップＳ１０３へ進む。 The calculated pre-quantization spectrum information amount is stored in a work area on the memory 101. When step S100 is completed, the process proceeds to step S103.

ステップＳ１０３は、メモリ１０１上のフレーム平均ビット数を用いて、量子化スペクトル総量の予測計算を行う処理である。この予測計算は、予め実験を実施することによって求めた近似式によって行う。例えば、この近似式をF(x)として、フレーム平均ビットをaverage_bitsとすると、量子化後スペクトル予測総量は次式によって求めることができる。 Step S103 is a process of predicting the quantized spectrum total amount using the frame average bit count stored in the memory 101. This prediction uses an approximate expression obtained in advance by experiment. For example, if this approximate expression is F(x) and the frame average bits are average_bits, the predicted post-quantization spectrum total amount can be obtained by the following expression.

算出された量子化スペクトル予測総量はメモリ１０１上のワークエリアに格納される。このステップＳ１０３を終えると、処理はステップＳ１０５へと進む。 The calculated quantized spectrum prediction total amount is stored in a work area on the memory 101. When step S103 is completed, the process proceeds to step S105.

ステップＳ１０５は、ステップＳ１０３で求めた量子化スペクトル予測総量の対数を計算し、量子化スペクトル予測情報量を算出する処理である。例えば、MPEG-2 AACの場合は次式によって算出することができる。 Step S105 is a process for calculating the logarithm of the quantized spectrum prediction total obtained in step S103 and calculating the quantized spectrum prediction information amount. For example, MPEG-2 AAC can be calculated by the following equation.

この処理によって算出された量子化後のスペクトル情報量はメモリ１０１上のワークエリアに保存される。このステップＳ１０５を終えると、処理はステップＳ１０８へと進む。 The quantized spectral information amount calculated by this processing is stored in the work area on the memory 101. When step S105 is completed, the process proceeds to step S108.

ステップＳ１０８では、ステップＳ１００で求めた量子化前スペクトル情報量から、ステップＳ１０５で求めた量子化スペクトル予測情報量を減じる処理を行う。次に、ステップＳ１０９で、ステップＳ１０８の減算結果に量子化粗さの刻み幅によって決定される係数を乗じ、グローバルゲイン、すなわち量子化ステップの予測値を算出する。MPEG-2 AACの場合は、この予測値は結局第１の実施形態と同じく式（５）を計算したことになる。 In step S108, a process of subtracting the quantized spectrum prediction information amount obtained in step S105 from the pre-quantization spectrum information amount obtained in step S100 is performed. Next, in step S109, the subtraction result in step S108 is multiplied by a coefficient determined by the step size of the quantization roughness to calculate a global gain, that is, a predicted value of the quantization step. In the case of MPEG-2 AAC, this prediction value is the result of calculating Equation (5) as in the first embodiment.

算出された量子化ステップ予測値は、メモリ１０１上のワークエリアに量子化ステップとして格納される。以上でこの量子化ステップ予測処理を終了し、リターンする。 The calculated quantization step predicted value is stored in the work area on the memory 101 as a quantization step. The quantization step prediction process is thus completed, and the process returns.
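Steps S100 to S109 can be sketched as follows. Several points are assumptions rather than statements from the text: the information amount before quantization is taken as the base-2 logarithm of the sum of |x_i|^(3/4), the coefficient is taken as 16/3 (which follows if the quantizer has the usual AAC form), and `predict_quantized_total` is a purely hypothetical placeholder for the experimentally fitted F(x), whose actual form is not given.

```python
import math

# Assumed coefficient for an AAC-style quantizer (step base 2**(1/4), exponent 3/4).
GAIN_COEFF = 16.0 / 3.0


def predict_quantized_total(average_bits):
    # Hypothetical placeholder for F(x); the real approximation is fitted by experiment.
    return 0.5 * average_bits


def predict_global_gain(spectrum, average_bits, f=predict_quantized_total):
    # S100: information amount before quantization (assumed form).
    total_before = sum(abs(x) ** 0.75 for x in spectrum)
    if total_before <= 0.0:
        return 0  # silent frame: nothing to quantize
    info_before = math.log2(total_before)
    # S103 and S105: predicted total after quantization, then its base-2 logarithm.
    info_after = math.log2(f(average_bits))
    # S108 and S109: difference multiplied by the coefficient.
    return round(GAIN_COEFF * (info_before - info_after))
```

With 1024 spectral lines of magnitude 16 and a predicted quantized total of 1024, the information amounts are 13 and 10 bits, so the predicted global gain is 16; at that gain each line quantizes to roughly 1, which reproduces the predicted total.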

図４は、上記したステップＳ８のスペクトル量子化処理の詳細を示すフローチャートである。 FIG. 4 is a flowchart showing details of the spectrum quantization process in step S8 described above.

ステップＳ２００は、メモリ１０１上に格納されているフレーム平均ビットに、余剰ビット量の一部を加算して、スペクトル割当ビットを計算する処理である。例えば、本実施形態では、余剰ビット量の２割を一律にフレーム平均ビットに加算してスペクトル割当ビットとする。計算されたスペクトル割当ビットはメモリ１０１上のワークエリアに格納される。このステップＳ２００を終えると、処理はステップＳ２０１へ進む。 Step S200 is a process of calculating a spectrum allocation bit by adding a part of the surplus bit amount to the frame average bit stored in the memory 101. For example, in this embodiment, 20% of the surplus bit amount is uniformly added to the frame average bit to obtain the spectrum allocation bit. The calculated spectrum allocation bits are stored in the work area on the memory 101. When step S200 is completed, the process proceeds to step S201.

ステップＳ２０１は、メモリ１０１上に格納されている量子化ステップに従って、スペクトルバッファに格納されている1024本のスペクトル成分を量子化する処理である。MPEG-2 AACの場合は、前出の（１）式に従って量子化スペクトルが計算される。このステップＳ２０１を終えると、処理はステップＳ２０２へ進む。 Step S201 is a process of quantizing 1024 spectral components stored in the spectral buffer according to the quantization step stored on the memory 101. In the case of MPEG-2 AAC, the quantized spectrum is calculated according to the above equation (1). When step S201 is completed, the process proceeds to step S202.

ステップＳ２０２は、ステップＳ２０１で計算された量子化スペクトル全てを符号化した時に使用されるビット数を計算する処理である。例えば、MPEG-2 AACの場合は、量子化スペクトルは複数個をまとめた上でハフマン符号化されるため、この処理においてハフマンコード表の探索が行われ、符号化ビット数の総計が計算される。計算された使用ビット数はメモリ１０１上のワークエリアに格納される。このステップＳ２０２を終えると、処理はステップＳ２０３へ進む。 Step S202 is a process of calculating the number of bits used when all of the quantized spectra calculated in step S201 are encoded. For example, in the case of MPEG-2 AAC, quantized spectra are grouped and then Huffman encoded, so the Huffman code tables are searched in this process and the total number of encoded bits is calculated. The calculated number of used bits is stored in the work area on the memory 101. When step S202 is completed, the process proceeds to step S203.

ステップＳ２０３は、メモリ１０１上のスペクトル割当ビットと使用ビットとの大きさを比較する処理である。この比較の結果、使用ビットが割り当てられたビットよりも大きい場合は、ステップＳ２０４へ進み、符号量を削減するためにメモリ１０１に格納されている量子化ステップを増加した後、ステップＳ２０１に戻り再度スペクトルの量子化を行う。ただし、図３に示した前述の量子化ステップ予測処理（ステップＳ７）によってほぼ正確な量子化ステップが予測されており、かつ、フレーム平均ビットに基づいて量子化ステップの予測が行われている。これに対し、ステップＳ２０３では、それに余剰ビットの一部を加えたスペクトル割当ビットを基準にして符号量の制御を行っているため、ステップＳ２０４が実際に実行されることは極めて少ないであろう。 Step S203 is a process of comparing the spectrum allocation bits in the memory 101 with the used bits. If the comparison shows that the used bits exceed the allocated bits, the process proceeds to step S204, where the quantization step stored in the memory 101 is increased to reduce the code amount, and then returns to step S201 to quantize the spectrum again. However, the quantization step prediction process (step S7) shown in FIG. 3 predicts a nearly accurate quantization step, and that prediction is based on the frame average bits. Step S203, by contrast, controls the code amount against the spectrum allocation bits, which are the frame average bits plus a part of the surplus bits, so step S204 will very rarely be executed in practice.

また、予測した量子化ステップで量子化した結果、使用したビットがフレーム平均ビットを超えてしまう場合も、余剰ビットの追加分を超えなければ１回のスペクトル量子化で量子化が終了することになる。かつ、このようなフレームは元々情報量が多いフレームであり、結果的に情報量が多いフレームに自動的により多くのビットが割当てられることになる。 Also, even when quantizing with the predicted quantization step uses more bits than the frame average bits, quantization still completes in a single spectrum quantization pass as long as the excess stays within the added surplus bits. Moreover, such a frame is one that carries a large amount of information to begin with, so more bits are automatically allocated to frames with a large amount of information.

ステップＳ２０３の比較において使用ビットが割り当てられたビットよりも小さい場合は、このスペクトル量子化処理を終了してリターンする。 If the used bit is smaller than the allocated bit in the comparison in step S203, the spectrum quantization process is terminated and the process returns.

以上説明した本実施形態におけるオーディオ信号符号化処理は、聴覚心理分析処理を一切省いたものである。そして、フレーム平均ビットから量子化後のスペクトルが持つ情報量を予測し、さらに、量子化前のスペクトル情報量との差分をとることによって量子化ステップを実際の量子化を行う前にほぼ正確に予測する。これによって、聴覚心理演算を行わなくても、量子化ステップの調整を極力避けることが可能になり、符号化処理全体にかかる処理量を大幅に削減することができる。 The audio signal encoding process in the present embodiment described above omits psychoacoustic analysis entirely. It predicts the information amount of the quantized spectrum from the frame average bits and takes the difference from the spectrum information amount before quantization, thereby predicting the quantization step almost accurately before the actual quantization is performed. As a result, adjustment of the quantization step can be avoided as far as possible even without psychoacoustic computation, and the processing amount of the entire encoding process can be greatly reduced.

また、本実施形態におけるオーディオ信号符号化装置は、フレーム平均ビット量に基づいて量子化ステップを予測しておき、リザーブビット量の一部を一律に足してから実際のスペクトル量子化を行う。これにより、多少の予測誤差が生じても量子化処理が１回の処理で済むとともに、元々の情報量が多いフレームに自動的にリザーブビットが割り当てられることになるため、聴覚心理分析を行わないことによる音質劣化を最小限に留めることができる。 Also, the audio signal encoding apparatus according to the present embodiment predicts the quantization step based on the frame average bit amount, and uniformly adds a part of the reserve bit amount before performing the actual spectrum quantization. As a result, even if some prediction error occurs, quantization completes in a single pass, and reserve bits are automatically allocated to frames that originally carry a large amount of information, so the sound quality degradation caused by omitting psychoacoustic analysis is kept to a minimum.

（他の実施形態） (Other Embodiments)
本発明はその要旨を逸脱しない範囲で種々変形して実施することができる。 The present invention can be implemented with various modifications without departing from the scope of the invention.

たとえば、上述の実施形態ではブロックスイッチングを全く行っていないが、聴覚分析を行わず、比較的簡易に入力信号の過渡状態を検知して、ブロックスイッチングを行うように構成された装置にも、本発明を同様に適用することが可能である。 For example, although block switching is not performed at all in the above-described embodiments, the present invention can be applied in the same way to an apparatus configured to detect transient states of the input signal relatively simply, without auditory analysis, and to perform block switching.

また、本発明は、複数の機器から構成されるシステムに適用してもよいし、また、一つの機器からなる装置に適用してもよい。 In addition, the present invention may be applied to a system composed of a plurality of devices, or may be applied to an apparatus composed of a single device.

なお、本発明は、前述した実施形態の各機能を実現するプログラムを、システムまたは装置に直接または遠隔から供給し、そのシステムまたは装置に含まれるコンピュータがその供給されたプログラムコードを読み出して実行することによっても達成される。 The present invention is also achieved by supplying a program that realizes each function of the above-described embodiments to a system or apparatus, directly or remotely, and having a computer included in the system or apparatus read out and execute the supplied program code.

従って、本発明の機能・処理をコンピュータで実現するために、そのコンピュータにインストールされるプログラムコード自体も本発明を実現するものである。つまり、上記機能・処理を実現するためのコンピュータプログラム自体も本発明の一つである。 Accordingly, the program code itself that is installed in a computer in order to implement the functions and processes of the present invention on that computer also implements the present invention. That is, the computer program itself for realizing the functions and processes is also one aspect of the present invention.

その場合、プログラムの機能を有していれば、オブジェクトコード、インタプリタにより実行されるプログラム、ＯＳに供給するスクリプトデータ等、プログラムの形態を問わない。 In this case, the program may be in any form as long as it has a program function, such as an object code, a program executed by an interpreter, or script data supplied to the OS.

プログラムを供給するための記録媒体としては、例えば、フレキシブルディスク、ハードディスク、光ディスク、光磁気ディスク、ＭＯ、ＣＤ−ＲＯＭ、ＣＤ−Ｒ、ＣＤ−ＲＷなどがある。また、記録媒体としては、磁気テープ、不揮発性のメモリカード、ＲＯＭ、ＤＶＤ（ＤＶＤ−ＲＯＭ，ＤＶＤ−Ｒ）などもある。 Examples of the recording medium for supplying the program include a flexible disk, hard disk, optical disk, magneto-optical disk, MO, CD-ROM, CD-R, and CD-RW. Examples of the recording medium include a magnetic tape, a non-volatile memory card, a ROM, a DVD (DVD-ROM, DVD-R), and the like.

また、プログラムは、クライアントコンピュータのブラウザを用いてインターネットのホームページからダウンロードしてもよい。すなわち、ホームページから本発明のコンピュータプログラムそのもの、もしくは圧縮され自動インストール機能を含むファイルをハードディスク等の記録媒体にダウンロードしてもよい。また、本発明のプログラムを構成するプログラムコードを複数のファイルに分割し、それぞれのファイルを異なるホームページからダウンロードする形態も考えられる。つまり、本発明の機能・処理をコンピュータで実現するためのプログラムファイルを複数のユーザに対してダウンロードさせるＷＷＷサーバも、本発明の構成要件となる場合がある。 The program may be downloaded from a homepage on the Internet using a browser on a client computer. That is, the computer program itself of the present invention or a compressed file including an automatic installation function may be downloaded from a home page to a recording medium such as a hard disk. Further, it is also possible to divide the program code constituting the program of the present invention into a plurality of files and download each file from a different home page. That is, a WWW server that allows a plurality of users to download a program file for realizing the functions and processing of the present invention on a computer may be a constituent requirement of the present invention.

The program of the present invention may also be encrypted, stored in a storage medium such as a CD-ROM, and distributed to users. In this case, only users who satisfy a predetermined condition may be allowed to download key information for decryption from a web page via the Internet; the encrypted program is then decrypted with that key information, executed, and installed on the computer.

The functions of the above-described embodiments may be realized by the computer executing the program that it has read. An OS or the like running on the computer may also perform part or all of the actual processing based on the instructions of the program. In this case as well, the functions of the above-described embodiments can be realized.

Furthermore, the program read from the recording medium may be written to a memory provided in a function expansion board inserted into the computer or in a function expansion unit connected to the computer. A CPU or the like provided in the function expansion board or function expansion unit may then perform part or all of the actual processing based on the instructions of the program. The functions of the above-described embodiments may also be realized in this way.

FIG. 1 is a diagram illustrating a configuration example of an audio signal encoding apparatus according to the first embodiment of the present invention.

FIG. 2 is a flowchart of audio signal encoding processing according to the second embodiment of the present invention.

FIG. 3 is a flowchart of quantization step prediction processing according to the second embodiment of the present invention.

FIG. 4 is a flowchart of spectrum quantization processing according to the second embodiment of the present invention.

FIG. 5 is a diagram illustrating a configuration example of an audio signal encoding apparatus according to the second embodiment of the present invention.

FIG. 6 is a diagram illustrating an example of the content structure of a storage medium storing an audio signal encoding processing program according to the second embodiment of the present invention.

FIG. 7 is a schematic diagram illustrating installation of the audio signal encoding processing program on a PC according to the second embodiment of the present invention.

FIG. 8 is a diagram illustrating an example of a memory map according to the second embodiment of the present invention.

FIG. 9 is a diagram illustrating a configuration example of an input signal buffer according to the second embodiment of the present invention.

FIG. 10 is a flowchart of quantization processing according to the ISO standard.

Claims

An audio signal encoding apparatus comprising:
dividing means for dividing an audio input signal into frames, each being a unit of processing, for each channel;
conversion means for converting the time signal of two consecutive frames obtained from the dividing means into a frequency spectrum, while shifting by one frame at a time;
spectrum information amount calculation means for calculating the information amount of the frequency spectrum output from the conversion means as a spectrum information amount before quantization;
prediction means for predicting a spectrum information amount after quantization based on a frame average bit amount calculated from the bit rate and the sampling rate;
determination means for determining, before quantization, a quantization step for the entire frame by subtracting the spectrum information amount after quantization predicted by the prediction means from the spectrum information amount before quantization calculated by the spectrum information amount calculation means, and multiplying the subtraction result by a coefficient obtained from the step size of the quantization roughness;
quantization means for quantizing the frequency spectrum using the quantization step determined by the determination means;
a bit reservoir for managing a surplus bit amount in accordance with the encoding standard;
shaping means for shaping the frequency spectrum quantized by the quantization means in accordance with a predetermined format; and
spectrum allocation bit calculation means for calculating a spectrum allocation bit amount by adding a part of the surplus bit amount stored in the bit reservoir to the frame average bit amount,
wherein the quantization means controls the code amount based on the spectrum allocation bit amount calculated by the spectrum allocation bit calculation means.

The audio signal encoding apparatus according to claim 1, wherein the encoding format is MPEG-1 Audio Layer III.

The audio signal encoding apparatus according to claim 1, wherein the encoding format is MPEG-2 AAC.

An audio signal encoding method comprising:
a dividing step of dividing an audio input signal into frames, each being a unit of processing, for each channel;
a conversion step of converting the time signal of two consecutive frames obtained in the dividing step into a frequency spectrum, while shifting by one frame at a time;
a spectrum information amount calculation step of calculating the information amount of the frequency spectrum obtained in the conversion step as a spectrum information amount before quantization;
a prediction step of predicting a spectrum information amount after quantization based on a frame average bit amount calculated from the bit rate and the sampling rate;
a determination step of determining, before quantization, a quantization step for the entire frame by subtracting the spectrum information amount after quantization predicted in the prediction step from the spectrum information amount before quantization calculated in the spectrum information amount calculation step, and multiplying the subtraction result by a coefficient obtained from the step size of the quantization roughness;
a quantization step of quantizing the frequency spectrum using the quantization step determined in the determination step;
a shaping step of shaping the frequency spectrum quantized in the quantization step in accordance with a predetermined format; and
a spectrum allocation bit calculation step of calculating a spectrum allocation bit amount by adding, to the frame average bit amount, a part of the surplus bit amount stored in a bit reservoir that manages the surplus bit amount in accordance with the encoding standard,
wherein, in the quantization step, the code amount is controlled based on the spectrum allocation bit amount calculated in the spectrum allocation bit calculation step.

A program for causing a computer to execute the audio signal encoding method according to claim 4.

A computer-readable storage medium storing the program according to claim 5.
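
The arithmetic recited in the determination step and the spectrum allocation bit calculation step can be illustrated with a minimal Python sketch. It is not part of the claims and is not the patented implementation. All function and parameter names are hypothetical. The fraction of the bit reservoir that is used (`reservoir_fraction`) is an assumed tuning value. The default coefficient of 16/3 is also an assumption: it supposes that the information amounts are measured in bits per spectral line in the |x|^(3/4) domain, where each unit increase of the quantization step in the MPEG-1 Audio Layer III and MPEG-2 AAC power-law quantizers removes 3/16 bit.

```python
def frame_average_bits(bit_rate, sampling_rate, frame_samples):
    """Average number of bits available per frame.

    frame_samples is 1152 for MPEG-1 Audio Layer III and 1024 for MPEG-2 AAC.
    """
    return bit_rate * frame_samples / sampling_rate


def spectrum_allocation_bits(frame_avg_bits, reservoir_bits, reservoir_fraction=0.1):
    """Frame average bits plus a part of the surplus bits held in the bit reservoir.

    reservoir_fraction is a hypothetical tuning parameter, not a value from the patent.
    """
    return frame_avg_bits + reservoir_fraction * reservoir_bits


def predict_quantization_step(info_before, info_after_predicted, coeff=16 / 3):
    """Quantization step for the whole frame, decided before quantization.

    Subtract the predicted post-quantization information amount from the
    pre-quantization amount and multiply by a coefficient derived from the
    quantization roughness step size (16/3 is assumed here, see the lead-in).
    Rounding to an integer and clamping at zero are also assumptions.
    """
    step = (info_before - info_after_predicted) * coeff
    return max(0, int(round(step)))


if __name__ == "__main__":
    avg = frame_average_bits(128000, 48000, 1152)          # Layer III, 128 kbps, 48 kHz
    budget = spectrum_allocation_bits(avg, 1000, 0.25)     # borrow 25% of 1000 surplus bits
    step = predict_quantization_step(6.0, 3.0)             # illustrative information amounts
    print(avg, budget, step)
```

In this sketch the quantizer would start from `step` and, as in the final clause of claims 1 and 4, adjust only if the resulting code amount exceeds `budget`.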