JP2005165056A

JP2005165056A - Device and method for encoding audio signal

Info

Publication number: JP2005165056A
Application number: JP2003405032A
Authority: JP
Inventors: Masanobu Funakoshi; 正伸船越
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2003-12-03
Filing date: 2003-12-03
Publication date: 2005-06-23

Abstract

PROBLEM TO BE SOLVED: To efficiently generate a bit stream with good sound quality by suppressing pre-echo while maintaining encoding efficiency. SOLUTION: The audio signal encoding device has the following components:

- A frame divider (1) that divides an audio input signal into processing units.
- A psychoacoustic calculator (3) that outputs feature data for each processing unit.
- A block length decision unit (4) that decides, for each processing unit and based on the feature data, whether the block length is long or short.
- A filter bank (2). When the block length is long, it calculates the permissible error energy of the processing unit. When the block length is short, it divides the audio signal of the processing unit into blocks and calculates the permissible error energy of each block.
- A group decision unit (6) that, for short blocks, groups the blocks based on the permissible error energy.
- Encoding means (6 to 8) that encode the audio signal for each group or for each processing unit.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to an apparatus and method for encoding digital audio signals. More particularly, it relates to an audio signal encoding apparatus and method that use a transform coding technique in which the transform block length can be switched.

In recent years, high-quality, high-efficiency audio coding technology has come into wide use. Applications include DVD-Video audio tracks, portable audio players based on semiconductor memory or HDDs, music distribution over the Internet, and music storage on home servers in home LANs. The importance of this technology continues to grow.

Many such audio coding techniques perform time-frequency conversion by means of transform coding. MPEG-2 AAC and Dolby Digital (AC-3), for example, build the filter bank from a single orthogonal transform such as the MDCT. MPEG-1 Audio Layer 3 (MP3) and ATRAC (the coding method used in MiniDisc) instead build it by cascading subband filters, such as QMFs, with an orthogonal transform.

In transform coding, the filter bank first converts the input signal into frequency components. These components are grouped into divided frequency bands, which are defined according to the frequency resolution of human hearing. At quantization time, a normalization coefficient is determined for each divided frequency band. Each frequency component is then represented as a combination of its band's normalization coefficient and a quantized spectral value, which reduces the amount of information. In MPEG-2 AAC, the divided frequency bands are called scale factor bands (SFBs), and the normalization coefficients are called scale factors.
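The sketch below illustrates the "normalization coefficient plus quantized spectrum" representation. It is hypothetical and much simpler than real AAC quantization, which uses a non-uniform power-law quantizer. Each band stores one coefficient (here, its peak magnitude) and a list of small integers. The names `quantize_bands` and `dequantize_bands` are illustrative only.

```python
import numpy as np

def quantize_bands(spectrum, band_edges, bits_per_band):
    """Encode each band as (normalization coefficient, integer codes).

    Hypothetical scheme: the coefficient is the band's peak magnitude and the
    normalized values in [-1, 1] are rounded to a symmetric integer grid.
    Each entry of bits_per_band must be at least 2.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    encoded = []
    for lo, hi, bits in zip(band_edges[:-1], band_edges[1:], bits_per_band):
        band = spectrum[lo:hi]
        scale = float(np.max(np.abs(band))) or 1.0  # avoid dividing by zero
        levels = 2 ** (bits - 1) - 1                # e.g. 127 for 8 bits
        codes = np.round(band / scale * levels).astype(int)
        encoded.append((scale, codes))
    return encoded


def dequantize_bands(encoded, bits_per_band):
    """Rebuild the spectrum from (coefficient, codes) pairs."""
    parts = []
    for (scale, codes), bits in zip(encoded, bits_per_band):
        levels = 2 ** (bits - 1) - 1
        parts.append(codes / levels * scale)
    return np.concatenate(parts)


spectrum = [1.0, -0.5, 0.25, 0.0, 3.0, -3.0]
edges = [0, 3, 6]          # two bands: coefficients 0-2 and 3-5
bits = [8, 8]
enc = quantize_bands(spectrum, edges, bits)
print([scale for scale, _ in enc])   # [1.0, 3.0]
rec = dequantize_bands(enc, bits)
```

Only one coefficient per band is transmitted, so the cost of side information grows with the number of bands. This matters later, when short blocks multiply the band count.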

These high-efficiency audio coding techniques also perform masking analysis based on the characteristics of human hearing. They use the results to reduce the information needed to represent the spectrum, and so increase compression efficiency, in two ways. Spectral components judged to be masked are removed, and quantization errors that will be masked are tolerated.

The masking analysis used in these techniques mainly combines two effects: masking by the absolute threshold of hearing in quiet, and frequency masking by a masker within a critical band.
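The threshold in quiet is often modelled with Terhardt's well-known approximation. The patent does not state which model it uses, so the function below is an illustrative assumption. A tone below this level is inaudible in silence, and the curve is highest at low and very high frequencies.

```python
import math

def threshold_in_quiet_db(freq_hz):
    """Approximate absolute threshold of hearing in dB SPL (Terhardt's formula)."""
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

for hz in (100, 1000, 3300, 16000):
    print(hz, round(threshold_in_quiet_db(hz), 1))
```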

The signals that this masking analysis judges imperceptible are mainly high-frequency signals. Normally, therefore, the quantization error of high-frequency components can be masked even when it becomes somewhat large.

Transform coding has a problem, however, when the audio input signal is in a so-called transient state, that is, when it changes abruptly. The quantization error of the high-frequency components at the abrupt change then spreads to the signal immediately before and after the change, producing ringing noise.

When a loud sound occurs, human hearing finds it hard to detect sounds immediately before and immediately after it. This is called the temporal masking effect. After a loud sound, other sounds remain inaudible for a relatively long time (post-masking): about 100 ms, although this varies between individuals. The masking that acts before the loud sound (pre-masking), however, lasts only about 5 to 6 ms. Ringing noise that precedes a loud sound is therefore easily perceived. This phenomenon is generally called pre-echo.

This phenomenon is described below with reference to the drawings.

FIG. 11(a) shows an example of an audio input signal whose amplitude changes abruptly. FIG. 11(b) shows this signal after encoding and decoding with 2048-sample blocks, the normal transform block length of MPEG-2 AAC. As shown, the high-frequency quantization error generated at the abrupt change spreads over the entire block.

As described above, temporal masking prevents listeners from perceiving noise immediately before the point where the amplitude changes abruptly. Suppose, however, that the input signal uses 44.1 kHz sampling, as the PCM signals on music CDs do. A 2048-sample block then lasts 2048 ÷ 44100 × 1000 ≈ 46.44 ms. Noise in the first half of the block therefore falls outside the pre-masking time, and listeners perceive pre-echo.

One way to suppress this, used in various audio coding methods, is to detect abrupt changes in the input signal and shorten the transform block length. The high-frequency quantization error caused by the change then does not reach the portion immediately before it, and pre-echo is suppressed.

FIG. 12 shows the audio signal of FIG. 11(a) after encoding and decoding with 256-sample blocks, the short block length of MPEG-2 AAC. Here, the high-frequency quantization error caused by the abrupt change is confined to the 256-sample block in which the change occurs. At a 44.1 kHz sampling frequency, this block lasts about 5.80 ms. The pre-masking effect therefore makes the noise almost imperceptible, and the pre-echo disappears.
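The duration arithmetic used in the two examples above can be checked directly. The 6 ms limit below is simply the upper end of the 5 to 6 ms pre-masking range quoted earlier.

```python
def block_duration_ms(samples, fs_hz=44100):
    """Duration of a block of `samples` at sampling rate `fs_hz`, in ms."""
    return samples / fs_hz * 1000.0

PRE_MASKING_MS = 6.0  # upper end of the 5-6 ms pre-masking range quoted above

for n in (2048, 256):
    d = block_duration_ms(n)
    print(f"{n:5d} samples -> {d:5.2f} ms, within pre-masking: {d <= PRE_MASKING_MS}")
```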

In general, however, shortening the block length has two costs. First, the lower frequency resolution reduces the accuracy of the masking analysis. Second, the number of scale factor bands used for quantization is multiplied by the number of blocks. Scale factors therefore consume more information, and bits that should go to spectral information during quantization are spent on scale factors instead, which lowers coding efficiency. As a result, particularly at low bit rates, the quantization error can no longer be masked reliably, and noise may become easier to perceive than with long blocks.

For this reason, the MPEG-2 AAC standard defines a mechanism for short blocks that groups several blocks according to the characteristics of the signals they contain. Blocks in the same group share scale factors, which reduces the bits consumed by scale factors. This is called grouping.
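The saving from grouping can be shown by counting the scale factors sent per frame. The sketch assumes 49 scale factor bands for a long window and 14 per short window, the counts AAC uses at 44.1 kHz to our understanding. It also ignores entropy coding of the scale factors.

```python
LONG_SFB = 49   # assumed: long-window scale factor bands for AAC at 44.1 kHz
SHORT_SFB = 14  # assumed: short-window scale factor bands for AAC at 44.1 kHz

def scalefactor_count(window_group_lengths=None):
    """Number of scale factors transmitted for one frame.

    None means a long block; otherwise pass the number of short blocks in
    each group (the lengths must add up to 8).
    """
    if window_group_lengths is None:
        return LONG_SFB
    assert sum(window_group_lengths) == 8
    return SHORT_SFB * len(window_group_lengths)  # one set per group

print(scalefactor_count())           # long block
print(scalefactor_count([1] * 8))    # eight ungrouped short blocks
print(scalefactor_count([3, 1, 4]))  # three groups
```

Without grouping, eight short blocks need more than twice as many scale factors as one long block. Three groups bring the count below the long-block figure.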

With appropriate grouping, MPEG-2 AAC can effectively suppress pre-echo while limiting the loss of coding efficiency that short-block transforms cause.

Because grouping shares scale factors among different short blocks, the ideal approach would be to determine the scale factors first. Short blocks with similar scale factor patterns could then be placed in the same group. Scale factors also change as the input signal changes. If the short-block groups do not follow those changes, the quantization error after decoding may increase.

Conversely, if too many short blocks are placed in one group to favor coding efficiency, the quantization error may grow to a perceptible level. For MPEG-2 AAC, the standard described in Non-Patent Document 1 specifies that short blocks always form two or more groups.

Patent Document 1 discloses a technique that, for every combination of adjacent blocks or groups, calculates the spectral variation index that would result from merging them. It then determines the groups by comparing this index with a threshold.

Patent Document 2 proposes a method of sharing scale factors between adjacent blocks in the context of block floating-point computation.

When short-block frames occur consecutively, splitting each frame into two groups may be acceptable. When a short-block frame occurs in isolation, however, it is desirable to form at least three groups:

- the part before the transient,
- the part where the signal changes sharply, and
- the part where the signal returns to a steady state after the change.

Patent Document 1: JP 2003-108192 A
Patent Document 2: Japanese Unexamined Patent Publication No. H4-304031
Non-Patent Document 1: ISO/IEC 13818-7

The MPEG-2 AAC standard (see Non-Patent Document 1) describes the syntax for storing grouping information in the bit stream and how to decode that information. It does not, however, say anything about how the short-block groups should be determined.
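To our understanding of ISO/IEC 13818-7, the grouping is carried in the bit stream as a 7-bit `scale_factor_grouping` field. Bit (6 - i) is set when short window i + 1 belongs to the same group as window i. The encoder and decoder helpers below are an illustrative sketch of that packing; the function names are hypothetical.

```python
def encode_grouping(window_group_lengths):
    """Pack group lengths (summing to 8) into a 7-bit grouping value."""
    assert sum(window_group_lengths) == 8 and all(n > 0 for n in window_group_lengths)
    value = 0
    window = 0
    for length in window_group_lengths:
        for _ in range(length - 1):      # windows that continue this group
            value |= 1 << (6 - window)   # window + 1 joins window's group
            window += 1
        window += 1                      # step past the group's last window
    return value


def decode_grouping(value):
    """Recover the group lengths from a 7-bit grouping value."""
    lengths = [1]                        # window 0 always starts a group
    for i in range(7):
        if value & (1 << (6 - i)):
            lengths[-1] += 1             # window i + 1 joins the current group
        else:
            lengths.append(1)            # window i + 1 starts a new group
    return lengths


print(encode_grouping([3, 1, 4]))  # 103 == 0b1100111
print(decode_grouping(103))        # [3, 1, 4]
```

The bit stream only records where groups begin. Deciding those boundaries is the part that the standard leaves open.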

The ISO MPEG-4 Ver. 1 reference program fixes a grouping pattern in advance and processes every short-block frame with that same pattern. The pattern then does not follow the constantly changing input, and sound quality deteriorates.

The simplest way to determine groups is to calculate the scale factors of all blocks and then group adjacent blocks whose scale factors are similar. However, scale factors are only fixed after quantization. This method therefore requires quantizing again after the groups are determined, and the processing overhead is too large to be practical.

The method described in Patent Document 1 requires a large amount of computation to calculate the variation index. The calculation must also be repeated each time a merge decision is made, which lowers processing efficiency. In addition, error accumulates in the variation index with each repetition, so the resulting grouping may not match the input signal.

The method described in Patent Document 2 uses only the spectral peak of each block and checks whether the difference between peaks exceeds a fixed value. It does not consider where on the frequency axis the peaks appear.

Furthermore, none of these prior-art techniques considers the minimum number of groups implied by the block lengths of the preceding and following frames.

The present invention has been made in view of the above problems. Its object is to suppress pre-echo while maintaining coding efficiency, and thereby efficiently create a bit stream with good sound quality.

To achieve the above object, the audio signal encoding apparatus of the present invention comprises:

- dividing means for dividing an audio input signal into processing units;
- analysis means for analyzing the audio input signal for each processing unit and outputting feature data;
- determination means for determining, for each processing unit and based on the feature data, whether the transform block length of the audio signal is the long block length or the short block length;
- calculation means for calculating the allowable error energy of the processing unit when the block length is long, and, when the block length is short, dividing the audio signal of the processing unit into blocks and calculating the allowable error energy of each block;
- grouping means for grouping the short blocks based on the allowable error energy when short blocks are used; and
- encoding means for encoding the audio signal for each group when the transform block length is short, and for each processing unit when it is long.

Likewise, the audio signal encoding method of the present invention comprises:

- a dividing step of dividing an audio input signal into processing units;
- an analysis step of analyzing the audio input signal for each processing unit and outputting feature data;
- a determination step of determining, for each processing unit and based on the feature data, whether the transform block length of the audio signal is the long block length or the short block length;
- a calculation step of calculating the allowable error energy of the processing unit when the block length is long, and, when the block length is short, dividing the audio signal of the processing unit into blocks and calculating the allowable error energy of each block;
- a grouping step of grouping the short blocks based on the allowable error energy when short blocks are used; and
- an encoding step of encoding the audio signal for each group when the transform block length is short, and for each processing unit when it is long.

The minimum number of groups is set from the block lengths around the current processing unit:

- If the current processing unit uses short blocks and both the preceding and following units use long blocks, the minimum number of groups is 3.
- If the current processing unit uses short blocks and at least one of the preceding and following units also uses short blocks, the minimum number of groups is 2.
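This rule reduces to a small function. The helper names are hypothetical. The check simply compares the number of groups chosen for a short-block frame against the minimum.

```python
def minimum_group_count(prev_is_short, next_is_short):
    """Minimum number of short-block groups for a frame that uses short blocks."""
    # An isolated short-block frame needs pre-transient, transient and
    # post-transient groups; a frame inside a short-block run needs only two.
    return 2 if (prev_is_short or next_is_short) else 3


def meets_minimum(window_group_lengths, prev_is_short, next_is_short):
    """True if a proposed grouping has at least the required number of groups."""
    return len(window_group_lengths) >= minimum_group_count(prev_is_short, next_is_short)


print(minimum_group_count(False, False))  # 3: isolated short-block frame
print(meets_minimum([4, 4], True, False))  # True: two groups inside a short run
```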

With the above configuration, blocks whose allowable error energies are similar are placed in the same group. Allowable error energy is the basis for allocating the code amount within a frame, so this grouping matches human auditory characteristics. Pre-echo is suppressed, and the loss of coding efficiency caused by selecting short blocks is prevented. A high-quality bit stream can therefore be created efficiently.

Furthermore, the threshold for group determination uses the allowable error energy of the block immediately before in time. This makes it possible to reliably identify the point where the input signal changes as a group division point, so the grouping accurately follows changes in the input signal.

In addition, the minimum number of groups is set from the block lengths of the preceding and following frames. This prevents the increase in quantization error that occurs when many blocks are lumped into one group, and yields an appropriate grouping result.

The best mode for carrying out the present invention is described in detail below with reference to the accompanying drawings.

<First Embodiment>
FIG. 1 is a block diagram illustrating an example configuration of the audio signal encoding apparatus according to the first embodiment.

In the configuration of FIG. 1, reference numeral 1 denotes a frame divider that divides the audio input signal into frames, which serve as processing units. The divided frames are sent to a filter bank 2 and a psychoacoustic calculator 3, both described later. The psychoacoustic calculator 3 analyzes the input audio signal frame by frame. It calculates a perceptual entropy value and performs a masking calculation for each divided frequency band serving as a quantization unit. It then outputs the perceptual entropy (PE) value to the block length determiner 4, and the signal-to-mask ratio (SMR) of each divided frequency band to the group determiner 5.

The block length determiner 4 determines the transform block length by comparing the PE value received from the psychoacoustic calculator 3 with a predetermined PE threshold. It notifies the filter bank 2 of the result. In the first embodiment, the PE threshold is determined in advance and held in the block length determiner 4.

The filter bank 2 converts each frame of the time-domain input signal, received from the frame divider 1, into a frequency spectrum. It uses the block length specified by the block length determiner 4.

The group determiner 5 calculates the allowable error energy of each divided frequency band. It uses the SMR value of each band, received from the psychoacoustic calculator 3, and the spectrum output from the filter bank 2. Only when the spectrum consists of a set of short blocks does it also group those blocks, based on the allowable error energy.

Reference numeral 6 denotes a bit allocator. It determines the number of bits to allocate to each divided frequency band, with reference to the SMR value of each band from the psychoacoustic calculator 3 and the frequency spectrum from the filter bank 2.

Reference numeral 7 denotes a quantizer. It calculates a normalization coefficient (scale factor) for each frequency band of the spectrum output by the filter bank 2. It then quantizes the spectrum according to the number of bits the bit allocator 6 has allocated to each band.

Reference numeral 8 denotes a bit shaper. It formats the scale factors and quantized spectrum output by the quantizer 7 into the prescribed format to create a bit stream, and outputs that bit stream.

The audio signal encoding operation of the apparatus configured as above is described below with reference to FIG. 2.

For convenience, the first embodiment uses MPEG-2 AAC as the example coding method, but the same approach can be applied to other coding methods that use grouping. Examples of input audio signals include audio PCM files and real-time audio captured by a microphone and converted from analog to digital. The input is not limited to these.

First, in step S1, each unit shown in FIG. 1 is initialized. In the first embodiment, the PE threshold is given an initial value of 2000, which is stored in the block length determiner 4.

Next, in step S2, the apparatus determines whether the input audio signal to be encoded has ended. If it has, the process proceeds to step S15; otherwise, it proceeds to step S3.

In step S3, the frame divider 1 divides the input audio signal into frames (the processing units) and sends them to the filter bank 2 and the psychoacoustic calculator 3. In the MPEG-2 AAC LC (Low Complexity) profile, one frame consists of 1024 PCM samples. After frame division, the process proceeds to step S4.
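Steps S2 and S3 amount to cutting the PCM stream into 1024-sample frames until the input ends. The sketch below assumes the final partial frame is zero-padded; the patent does not say how the end of the input is handled. `split_into_frames` is a hypothetical helper.

```python
import numpy as np

FRAME_LEN = 1024  # samples per frame in the MPEG-2 AAC LC profile

def split_into_frames(pcm, frame_len=FRAME_LEN):
    """Split 1-D PCM samples into frames; the last frame is zero-padded (assumption)."""
    pcm = np.asarray(pcm, dtype=float)
    n_frames = -(-len(pcm) // frame_len)   # ceiling division
    padded = np.zeros(n_frames * frame_len)
    padded[:len(pcm)] = pcm
    return padded.reshape(n_frames, frame_len)

frames = split_into_frames(np.ones(2500))
print(frames.shape)  # (3, 1024)
```

Later steps transform each frame together with the frame before it, so an encoder loop would keep the previous frame as well as the current one.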

In step S4, the psychoacoustic calculator 3 processes each frame of the input audio signal. It calculates the perceptual entropy (PE) and performs a masking calculation for each frequency band serving as a quantization unit, producing signal-to-mask ratio (SMR) values. SMR values are calculated for both the long and the short block length.

The block length must be determined one frame in advance so that MDCT aliasing is reliably cancelled. The psychoacoustic calculator 3 therefore analyzes the frame that comes one frame later in time than the frame being encoded; this frame is called the "preceding frame" below.

Next, in step S6, the block length determiner 4 compares the PE value of the preceding frame with its preset PE threshold. If the PE value exceeds the threshold, the short block length will be used for the preceding frame; otherwise, the long block length will be used.

The transform block length of the current frame is then formally fixed. This decision combines the result obtained for the current frame in the previous cycle with the result just obtained for the preceding frame. If the current frame uses the long block length, the process proceeds to step S10; if it uses the short block length, the process proceeds to step S7. As described below, the filter bank 2 converts the input signal into a frequency spectrum using this block length. The block length determiner 4 keeps the result for the preceding frame until the next determination, when the preceding frame becomes the frame to be encoded.
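The one-frame look-ahead decision can be sketched as follows. The patent only says that the final block length combines the earlier decision for the current frame with the new decision for the preceding frame. Expressing this with AAC's four window sequences, and forcing short blocks when a long frame would sit between two short runs, are our assumptions. `plan_windows` is hypothetical.

```python
PE_THRESHOLD = 2000.0  # initial PE threshold set in step S1

def window_sequence(current_short, lookahead_short, previous_short):
    """Choose an AAC-style window sequence for the frame being encoded."""
    if current_short or (previous_short and lookahead_short):
        # A single long frame cannot both end one short run and start the
        # next, so this sketch forces short blocks there (assumption).
        return "EIGHT_SHORT_SEQUENCE"
    if lookahead_short:
        return "LONG_START_SEQUENCE"   # transition window into short blocks
    if previous_short:
        return "LONG_STOP_SEQUENCE"    # transition window back to long blocks
    return "ONLY_LONG_SEQUENCE"


def plan_windows(pe_values, threshold=PE_THRESHOLD):
    """Decide block lengths with one frame of look-ahead, as in step S6."""
    wants_short = [pe > threshold for pe in pe_values]
    sequences = []
    previous_short = False
    for i, current_short in enumerate(wants_short):
        lookahead_short = i + 1 < len(wants_short) and wants_short[i + 1]
        seq = window_sequence(current_short, lookahead_short, previous_short)
        sequences.append(seq)
        previous_short = seq == "EIGHT_SHORT_SEQUENCE"
    return sequences


print(plan_windows([100, 100, 3000, 100, 100]))
```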

In step S10, the filter bank 2 applies an orthogonal transform with the long block length to the frame being processed. In MPEG-2 AAC, a window as wide as the transform block length is applied first, so that the aliasing introduced by the orthogonal transform can be cancelled. A lapped transform is then performed with the MDCT. The transform takes as one unit the 2048 samples formed by the frame being processed and the frame immediately before it. With the long block length, all 2048 samples are transformed as a single block, producing 1024 spectral coefficients. Only one set of spectra, divided into 1024 frequency components, is therefore obtained. The process then proceeds to step S11.
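A minimal NumPy sketch of the long-block transform in step S10 follows. It makes three assumptions: a sine window (AAC also allows a Kaiser-Bessel-derived window), the direct O(N²) MDCT sum rather than a fast algorithm, and the MDCT in the form we understand the AAC specification to use. The helper names are hypothetical.

```python
import numpy as np

def sine_window(n):
    """Sine window of length n."""
    return np.sin(np.pi / n * (np.arange(n) + 0.5))


def mdct(block):
    """MDCT of a windowed N-sample block -> N/2 coefficients.

    X[k] = 2 * sum_n z[n] * cos(2*pi/N * (n + n0) * (k + 1/2)),
    with n0 = (N/2 + 1) / 2.
    """
    z = np.asarray(block, dtype=float)
    n_len = len(z)
    n0 = (n_len / 2 + 1) / 2
    n = np.arange(n_len)
    k = np.arange(n_len // 2)
    basis = np.cos(2 * np.pi / n_len * np.outer(k + 0.5, n + n0))
    return 2.0 * basis @ z


def long_block_spectrum(prev_frame, cur_frame):
    """Window the previous + current frame (2048 samples) and transform it."""
    x = np.concatenate([prev_frame, cur_frame])
    return mdct(x * sine_window(len(x)))   # 1024 coefficients
```

For example, a cosine placed exactly on basis function 100 produces its largest coefficient at index 100.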

In step S11, the allowable error energy is calculated from two inputs: the SMR values of the current frame, output by the psychoacoustic calculator 3 in the previous cycle, and the spectrum obtained by the filter bank 2 in step S10. The group determiner 5 calculates the allowable error energy of each divided frequency band for each block in the frame being processed. The allowable error energy xmin[b] of divided frequency band b is calculated by the following equation:

$$\mathrm{xmin}[b] = \frac{\mathrm{energy}[b]}{\mathrm{SMR}[b]}$$

Here, energy[b] is the total energy of the spectrum contained in divided frequency band b. Let x_i denote the i-th spectral coefficient, and let band b contain the j-th through k-th coefficients. Then energy[b] is given by:

$$\mathrm{energy}[b] = \sum_{i=j}^{k} x_i^{2}$$

SMR[b] is the long-block SMR value for divided frequency band b, output by the psychoacoustic calculator 3 in the previous cycle. The group determiner 5 keeps the SMR values output in step S4 until the next cycle, when the preceding frame becomes the frame to be encoded. Values for both the long and the short block length are stored. When the allowable error energy has been calculated, the process proceeds to step S12.
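The two equations for energy[b] and xmin[b] translate directly into code. The sketch assumes SMR is a linear power ratio; if it is held in dB, convert it first with `10 ** (smr_db / 10)`. It also assumes half-open band edges, so band b covers indices j through k when `band_edges[b] = j` and `band_edges[b + 1] = k + 1`. The function names are hypothetical.

```python
import numpy as np

def band_energies(spectrum, band_edges):
    """energy[b]: sum of x_i**2 over band b = [band_edges[b], band_edges[b+1])."""
    x = np.asarray(spectrum, dtype=float)
    return np.array([np.sum(x[lo:hi] ** 2)
                     for lo, hi in zip(band_edges[:-1], band_edges[1:])])


def allowable_error_energy(spectrum, band_edges, smr):
    """xmin[b] = energy[b] / SMR[b], with SMR as a linear power ratio."""
    return band_energies(spectrum, band_edges) / np.asarray(smr, dtype=float)


def allowable_error_energy_short(short_spectra, band_edges, smr_short):
    """Same calculation for each short block, giving shape (blocks, bands)."""
    return np.array([allowable_error_energy(s, band_edges, m)
                     for s, m in zip(short_spectra, smr_short)])


print(allowable_error_energy([1, 2, 3, 4], [0, 2, 4], [5, 5]))  # [1. 5.]
```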
On the other hand, when it is determined in step S6 that the short block length is used for the frame, in step S7, the filter bank 2 performs orthogonal transform on the processing target frame using the short block length. Again, in the case of MPEG-2 AAC, in order to remove aliasing due to orthogonal transform, a windowing process of the width of the transform block length is performed, and then duplicate transform is performed by MDCT. In the time-frequency conversion, 2048 samples including the processing target frame and the immediately preceding frame are input as one unit, and 1024 frequency spectra are obtained. At this time, when the short block length is used, the conversion for outputting 128 frequency spectra with 256 samples of the input signal as one block is performed 8 times while shifting the input signal by 128 samples, and 8 sets of frequency spectra are obtained. Get. When the process is finished, step S9 follows.
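The short-block transform of step S7 can be sketched as eight 256-point MDCTs with a hop of 128 samples inside the 2048-sample input. This is a minimal illustration under stated assumptions, not the patent's implementation:

- It uses a sine window and a direct O(N^2) MDCT for readability.
- The starting offset of 448 samples is an assumption borrowed from how AAC positions its eight short windows. The text above only states the block size and the 128-sample shift.
- The function names are hypothetical.

```python
import math

def mdct(block):
    """Direct-form MDCT: 2N windowed samples in, N coefficients out (O(N^2), for clarity)."""
    n_out = len(block) // 2
    return [
        sum(x * math.cos(math.pi / n_out * (n + 0.5 + n_out / 2) * (k + 0.5))
            for n, x in enumerate(block))
        for k in range(n_out)
    ]

def sine_window(length):
    """Sine window; AAC also allows a KBD window shape, not shown here."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def short_block_mdct(samples, offset=448, block=256, hop=128, count=8):
    """Eight 256-sample blocks with hop 128 inside the 2048-sample input
    (frame immediately before + frame being processed). offset=448 is an assumption."""
    if len(samples) != 2048:
        raise ValueError("expected 2048 samples")
    win = sine_window(block)
    spectra = []
    for w in range(count):
        start = offset + w * hop
        segment = samples[start:start + block]
        spectra.append(mdct([s * c for s, c in zip(segment, win)]))
    return spectra
```

A production encoder would use an FFT-based MDCT and the window shapes defined by the standard.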

In step S9, the group determiner 5 calculates the allowable error energies from the short-block SMR values stored in the previous iteration. It groups the short blocks based on the result and outputs the grouping as group information. For each pair of adjacent blocks, this group determination takes the difference in allowable error energy in each divided frequency band. It marks a group division point when the sum of these differences exceeds a threshold. The details are described later with reference to FIG. 3. When the group determination ends, the process proceeds to step S12.

In step S12, the bit allocator 6 assigns bits to each frequency band. It uses the frequency spectrum output from the filter bank 2, together with the group information and allowable error energy values output from the group determiner 5. Bit allocation is performed in two stages:

1. The number of bits for the whole frame being processed is determined from three inputs: the surplus bit amount, the PE value of the frame being processed (stored in the bit allocator 6), and the transform block length.
2. The number of bits for each divided frequency band in the frame is determined from the allowable error energy values obtained in step S9 or step S11.

Such processing is common in transform coding methods like that of the present invention, so a detailed description is omitted. Finally, the preceding-frame PE value output from the psychoacoustic operator 3 is stored in the bit allocator 6. When this processing ends, the process proceeds to step S13.
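The text leaves the allocation rule open. The sketch below illustrates only the second stage, using a swapped-in heuristic that is not the patent's method. The frame's bit budget is split across bands in proportion to a perceptual-entropy-style demand: the band width times 0.5·log2(energy/xmin), clipped at zero. `allocate_band_bits` and its arguments are hypothetical.

```python
import math

def allocate_band_bits(frame_bits, energies, xmins, widths):
    """Hypothetical second-stage allocation: share frame_bits across bands in
    proportion to width * max(0, 0.5 * log2(energy / xmin)). Assumes every xmin > 0."""
    demands = [
        width * max(0.0, 0.5 * math.log2(e / xm)) if e > 0 else 0.0
        for e, xm, width in zip(energies, xmins, widths)
    ]
    total = sum(demands)
    if total == 0:
        return [0] * len(demands)
    bits = [int(frame_bits * d / total) for d in demands]
    # give the rounding remainder to the most demanding band
    bits[demands.index(max(demands))] += frame_bits - sum(bits)
    return bits
```

A band whose energy is already below its allowable error gets no bits under this rule, which matches the intuition that its quantization noise would be masked anyway.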

In step S13, the quantizer 7 calculates the scale factor of each frequency band. It then quantizes the frequency spectrum according to the number of bits assigned to each band in step S12. When this processing ends, the process proceeds to step S14.
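The patent does not spell out the quantizer. For MPEG-2 AAC, the commonly described form is a power-law (|x|^{3/4}) quantizer with a scale-factor step of 2^{1/4} and a rounding offset of about 0.4054. The sketch below assumes that form. It ignores the standard's scale-factor offset and the maximum quantized value, and `quantize`/`dequantize` are hypothetical names.

```python
import math

ROUNDING_OFFSET = 0.4054  # AAC-style rounding constant (assumption, see lead-in)

def quantize(x, sf):
    """Power-law quantization of one coefficient with scale factor sf (step 2**(sf/4))."""
    mag = (abs(x) * 2.0 ** (-sf / 4.0)) ** 0.75
    q = int(math.floor(mag + ROUNDING_OFFSET))
    return -q if x < 0 else q

def dequantize(q, sf):
    """Inverse mapping, as a decoder would apply it."""
    mag = abs(q) ** (4.0 / 3.0) * 2.0 ** (sf / 4.0)
    return -mag if q < 0 else mag
```

In a real encoder the scale factors are adjusted in a loop until the coded size fits the bits allocated in step S12. That loop is not shown.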

In step S14, the bit shaper 8 formats the scale factors and quantized spectra calculated in step S13 into a bit stream. It follows the format defined by the encoding scheme and outputs the bit stream.

The process then returns to step S2, where it is determined whether the input audio signal to be encoded has ended. If it has, the process proceeds to step S15. At that point, some quantized spectra have not yet been output because of the delay introduced by the psychoacoustic calculation, the orthogonal transform, and so on. In step S15 these are formatted into a bit stream and output. When this processing ends, the audio signal encoding process terminates.
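Steps S2 to S15 form a pipeline in which the psychoacoustic analysis runs one frame ahead of the frame being encoded, which is why frames remain to be flushed at the end. A minimal sketch of that one-frame lookahead follows. The callables `analyze` and `encode` are hypothetical stand-ins for step S4 and for steps S5 to S14.

```python
def encode_stream(frames, analyze, encode):
    """Frame t is encoded with its own analysis (computed one iteration earlier)
    and the analysis of frame t+1 (the 'preceding frame')."""
    out = []
    frames = list(frames)
    if not frames:
        return out
    current = analyze(frames[0])
    for t in range(len(frames)):
        lookahead = analyze(frames[t + 1]) if t + 1 < len(frames) else None
        # the final iteration, with no lookahead, plays the role of the flush in step S15
        out.append(encode(frames[t], current, lookahead))
        current = lookahead  # the preceding frame's analysis is carried to the next iteration
    return out
```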

Next, the group determination process performed by the group determiner 5 in step S9 of FIG. 2 is described in detail with reference to the flowchart of FIG. 3.

In step S101, eight sets of allowable error energies are obtained from two inputs: the short-block SMR values stored in the previous iteration, and the spectral values of the eight short blocks obtained in step S7. This calculation is the same as the one performed for the long block length in step S11 and follows Equations 1 and 2. Next, the SMR values of the preceding frame, output from the psychoacoustic operator 3 in step S4, are stored in the group determiner 5. Here too, both the long-block and the short-block SMR values are stored. When this processing ends, the process proceeds to step S102.
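Equations 1 and 2 applied to the eight short blocks of step S101 can be sketched as follows. The band edges and SMR values are inputs. The sketch assumes Equation 1 has the form xmin[b] = energy[b] / SMR[b] with SMR as a linear (not decibel) ratio. The function names are hypothetical.

```python
def band_energy(spectrum, start, end):
    """Equation 2: sum of squared coefficients; end is exclusive, so x_j..x_k is [j, k+1)."""
    return sum(x * x for x in spectrum[start:end])

def allowed_error(spectrum, band_edges, smr):
    """Equation 1 (assumed form): xmin[b] = energy[b] / SMR[b], SMR as a linear ratio."""
    return [band_energy(spectrum, band_edges[b], band_edges[b + 1]) / smr[b]
            for b in range(len(band_edges) - 1)]

def allowed_error_short_blocks(spectra, band_edges, smr_per_block):
    """Step S101: one xmin vector per short block (8 blocks for AAC)."""
    return [allowed_error(s, band_edges, smr) for s, smr in zip(spectra, smr_per_block)]
```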

In step S102, the processing target block counter w, held in the group determiner 5, is reset to 0. In step S103, it is determined whether w is 7 or more. If w is less than 7, the group determination between adjacent short blocks has not yet been completed, and the process proceeds to step S104. If w is 7, the group determination between all short blocks has been completed, and the process proceeds to step S109.

In step S104, the allowable error energy difference sum S between the w-th and (w+1)-th short blocks is calculated. In the present embodiment, each short block is divided into n divided frequency bands, and S is obtained by the following equation.
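The equation for S is not reproduced in this text. Signed per-band differences could cancel each other, so a sum of absolute differences over the n bands is the natural reading, though this is an assumption:

$$
S = \sum_{b=0}^{n-1} \bigl| x_{\min}[w+1][b] - x_{\min}[w][b] \bigr|
$$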

Here, xmin[w][b] is the allowable error energy in divided frequency band b of block w.

Next, in step S105, the allowable error energy sum X of block w is calculated by the following equation.
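This equation is also missing from the text. From the description, X is block w's allowable error energy summed over its n bands:

$$
X = \sum_{b=0}^{n-1} x_{\min}[w][b]
$$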

When this processing ends, the process proceeds to step S106.

In step S106, two quantities are compared: the allowable error energy difference sum S calculated in step S104, and the allowable error energy sum X of block w calculated in step S105 multiplied by a predetermined coefficient α (0 < α < 1).

The threshold used for group determination is based on the allowable error energy of the earlier block of the pair. As a result, portions where the input signal is changing can be reliably identified as group division points.

If S is larger (YES in step S106), the process proceeds to step S107. There, the boundary between block w and block w+1 is judged to be a group break, and this group boundary is added to the group information. When this processing ends, the process proceeds to step S108.

If S is not larger (NO in step S106), block w and block w+1 are judged to belong to the same group, and the process proceeds directly to step S108. In step S108, the block counter w is incremented, and the process returns to step S103 to repeat the processing described above.
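Steps S102 to S108 can be sketched as a single pass over adjacent short-block pairs. The sketch rests on these assumptions:

- S is a sum of absolute per-band differences.
- X is a plain sum over bands.
- The group information is a list of boundary indices, where boundary w+1 lies between blocks w and w+1.

All names are hypothetical.

```python
def find_group_boundaries(xmin, alpha):
    """xmin[w][b]: allowable error energy of block w, band b (8 blocks for AAC)."""
    boundaries = []
    for w in range(len(xmin) - 1):                                  # S103: w = 0..6 for 8 blocks
        s = sum(abs(a - b) for a, b in zip(xmin[w + 1], xmin[w]))   # S104
        x = sum(xmin[w])                                            # S105
        if s > alpha * x:                                           # S106
            boundaries.append(w + 1)                                # S107
    return boundaries

def group_lengths(boundaries, num_blocks=8):
    """Convert boundaries into group lengths (number of blocks per group)."""
    edges = [0] + boundaries + [num_blocks]
    return [edges[i + 1] - edges[i] for i in range(len(edges) - 1)]
```

For three quiet blocks followed by five loud ones, the single jump produces one boundary at 3 and groups of lengths [3, 5].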

If w is determined to be 7 in step S103, the group determination between all short blocks has been completed. The process then proceeds to step S109, where the allowable error energies are combined per group according to the determined group information. In the present embodiment, the allowable error energies of each divided frequency band are summed over the blocks in a group, and the sum is divided by the number of blocks in that group. When this processing ends, the process proceeds to step S110.
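Step S109 averages the allowable error energies of the blocks in each group, band by band. A sketch follows, assuming the grouping is given as group lengths (blocks per group). That representation and the function name are assumptions of this example.

```python
def group_allowed_error(xmin, lengths):
    """Average xmin over the blocks of each group, separately for each band (step S109)."""
    grouped = []
    start = 0
    for length in lengths:
        blocks = xmin[start:start + length]
        grouped.append([sum(col) / length for col in zip(*blocks)])
        start += length
    return grouped
```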

In step S110, the spectral components are reordered according to the determined group information so that they are collected by group and by scale factor band. The present embodiment considers MPEG-2 AAC, whose standard specifies this reordering; it is well known, so a detailed description is omitted. When this processing ends, the group determination process ends and control returns to the process of FIG. 2.
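The reordering of step S110 follows the interleaving that AAC uses for grouped short blocks, as we understand it. Within each group, coefficients are emitted scale factor band by scale factor band, and within a band, block by block. The normative definition is in the standard. The band edges and names below are assumptions of this sketch.

```python
def interleave_groups(spectra, lengths, band_edges):
    """spectra[w][i]: coefficient i of short block w. Returns one flat list in
    group -> scale factor band -> block -> coefficient order."""
    out = []
    start = 0
    for length in lengths:
        group = spectra[start:start + length]
        for b in range(len(band_edges) - 1):
            for block in group:
                out.extend(block[band_edges[b]:band_edges[b + 1]])
        start += length
    return out
```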

As described above, the audio signal encoding process of the first embodiment judges blocks to belong to the same group when their allowable error energies are similar. These energies serve as the reference for allocating the code amount within a frame. This enables appropriate grouping that matches human auditory characteristics. It suppresses pre-echo while preventing the loss of coding efficiency that selecting short blocks would otherwise cause. As a result, a high-quality bit stream can be created efficiently.

<Second Embodiment>
The present invention can also be implemented as a software program running on a general-purpose PC. This case is described below with reference to the drawings.

FIG. 4 shows an example configuration of an audio signal encoding apparatus using a general-purpose PC in the second embodiment.

In the illustrated configuration:

- Reference numeral 100 denotes a CPU. It performs the computations, logical decisions, and so on for the audio signal encoding process, and controls the components connected to the bus 102.
- Reference numeral 101 denotes a memory. It stores the basic I/O program, the program code being executed, the data needed during program processing, and so on.
- Reference numeral 102 denotes a bus. It carries address signals designating the component the CPU 100 is to control, carries control signals to each component controlled by the CPU 100, and transfers data between components.

Reference numeral 103 denotes a terminal used to start the apparatus, set various conditions and the input signal, and instruct the start of encoding. Reference numeral 104 denotes an external storage device, which provides a storage area for data, programs, and so on. Data and programs are stored there as needed and retrieved when required.

Reference numeral 105 denotes a media drive. It reads programs, data, digital audio signals, and so on from a recording medium, loading them into the audio signal encoding apparatus. It can also write data and executable programs stored in the external storage device 104 to a recording medium.

Reference numeral 106 denotes a microphone, which picks up sound and converts it into an audio signal. Reference numeral 107 denotes a speaker, which can reproduce arbitrary audio signal data as sound.

Reference numeral 108 denotes a communication network consisting of a LAN, public lines, wireless links, broadcast waves, and the like. Reference numeral 109 denotes a communication interface connected to the communication network. Through it, the audio signal encoding apparatus of the second embodiment can communicate with external devices over the network and send and receive data and programs.

The audio signal encoding apparatus of the second embodiment configured as above operates in response to inputs from the terminal 103. Each input sends an interrupt signal to the CPU 100. The CPU 100 then reads the control signals stored in the memory 101 and performs the corresponding controls.

The audio signal encoding apparatus of the second embodiment operates by the CPU 100 executing three programs: the basic I/O program, the OS, and the audio signal encoding processing program. The basic I/O program is written in the memory 101, and the OS is written in the external storage device 104. At power-on, the IPL (Initial Program Loading) function of the basic I/O program loads the OS from the external storage device 104 into the memory 101, and the OS starts operating.

The audio signal encoding processing program of the second embodiment is coded from the flowchart of the encoding procedure shown in FIG. 8, described later.

FIG. 5 shows the content structure of a recording medium on which the audio signal encoding processing program and related data are recorded.

In the second embodiment, the audio signal encoding processing program and related data are recorded on a recording medium. As illustrated, the head area of the medium holds its directory information. This is followed by the medium's content, namely the audio signal encoding processing program and its related data, recorded as files.

FIG. 6 is a schematic diagram showing how the audio signal encoding processing program is installed on the audio signal encoding apparatus of the second embodiment. As shown, the program and related data on the recording medium can be loaded into the apparatus through the media drive 105. When the recording medium 110 is set in the media drive 105, the program and related data are read under the control of the OS and the basic I/O program. They are stored in the external storage device 104, then loaded into the memory 101 at the next restart, after which they become operable.

FIG. 7 shows the memory map once the audio signal encoding processing program has been loaded into the memory 101 and is executable.

In this state, the work area of the memory 101 stores the following items:

- the preceding, current, and previous block lengths
- the allowable error energy
- the minimum number of groups
- the block counter w
- the group information
- the PE threshold
- the surplus bit amount
- the preceding-frame and current-frame SMR
- the preceding-frame and current-frame PE values

The audio signal encoding process executed by the CPU 100 in the second embodiment is described below, following the flowchart of FIG. 8.

First, in step S21, the user designates the input audio signal to be encoded using the terminal 103. In the second embodiment, this may be an audio PCM file stored in the external storage device 104. Alternatively, it may be a real-time audio signal captured by the microphone 106 and converted from analog to digital.

Next, in step S22, it is determined whether the input audio signal to be encoded has ended. If it has, the process proceeds to step S36; otherwise it proceeds to step S23. In step S23, the input audio signal is divided, channel by channel, into frames serving as processing units. As described in the first embodiment, MPEG-2 AAC divides the audio input signal into frames of 1024 samples per channel. When this processing ends, the process proceeds to step S24.
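The frame division of step S23 can be sketched for interleaved multichannel PCM as follows. Zero-padding the final partial frame is an assumption of this example, since the text does not say how a short last frame is handled. The function name is hypothetical.

```python
def split_frames(pcm, channels, frame_len=1024):
    """pcm: interleaved samples as a list [L0, R0, L1, R1, ...].
    Returns frames[ch][t], a list of frame_len samples for channel ch."""
    if len(pcm) % channels:
        raise ValueError("sample count is not a multiple of the channel count")
    frames = []
    for ch in range(channels):
        samples = pcm[ch::channels]
        ch_frames = []
        for start in range(0, len(samples), frame_len):
            frame = samples[start:start + frame_len]
            frame += [0] * (frame_len - len(frame))  # zero-pad the last frame
            ch_frames.append(frame)
        frames.append(ch_frames)
    return frames
```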

In step S24, the psychoacoustic calculation is performed on the frame that follows the frame to be encoded (hereinafter, the preceding frame). This yields two results for the preceding frame: its perceptual entropy (PE), and the SMR value for each divided frequency band, which is the quantization unit. SMR values are calculated for both block lengths: 8 sets for the short blocks and 1 set for the long block. The PE value is stored in the preceding-frame PE value in the memory 101. All the SMR values are stored in the preceding-frame SMR in the memory 101. When this processing ends, the process proceeds to step S25.

In step S25, the block length of the preceding frame is determined from the psychoacoustic analysis performed in step S24. In the second embodiment, the PE value of the preceding frame is compared with the PE threshold in the memory 101. If the PE value exceeds the threshold, the short block length is selected; otherwise, the long block length is selected. The stored block lengths in the work area of the memory 101 are then shifted:

1. The current-frame block length is copied to the previous-frame block length.
2. The preceding-frame block length is copied to the current-frame block length.
3. The new decision is stored as the preceding-frame block length.

In this way, the block lengths of the current frame and of the frames immediately before and after it are retained. When this processing ends, the process proceeds to step S26.
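Step S25 amounts to a threshold test followed by a three-slot shift of the stored block lengths. A minimal sketch follows, with a hypothetical `BlockHistory` class standing in for the relevant work-area fields of the memory 101. The threshold is an input, not a value from the patent.

```python
from dataclasses import dataclass

LONG, SHORT = "long", "short"

@dataclass
class BlockHistory:
    previous: str = LONG    # block length of the frame before the current one
    current: str = LONG     # block length of the frame being encoded
    preceding: str = LONG   # block length of the look-ahead ("preceding") frame

    def push(self, pe, pe_threshold):
        """Decide the preceding frame's block length and shift the history (step S25)."""
        decision = SHORT if pe > pe_threshold else LONG
        self.previous = self.current
        self.current = self.preceding
        self.preceding = decision
        return decision
```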

In step S26, the final block type of the current frame is determined from the current-frame and preceding-frame block lengths stored in the memory 101. For MPEG-2 AAC, this decision follows the method described in the standard. Once the block type is determined, the block length follows automatically. If it is the short block length, the process proceeds to step S27; if it is the long block length, to step S30.
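The text defers the block-type rule to the standard. As an illustration only, the sketch below derives an AAC window sequence from three inputs: the previous frame's sequence, and the short/long decisions for the current and look-ahead frames. It reflects two constraints: a LONG_START window must be followed by short blocks, and short blocks are left via LONG_STOP. It is not the normative procedure, and the function name is hypothetical.

```python
ONLY_LONG, LONG_START, EIGHT_SHORT, LONG_STOP = (
    "ONLY_LONG", "LONG_START", "EIGHT_SHORT", "LONG_STOP")

def window_sequence(prev_seq, cur_short, next_short):
    """Choose the current frame's window sequence."""
    if prev_seq == LONG_START:
        return EIGHT_SHORT        # a start window must be followed by short blocks
    if cur_short:
        return EIGHT_SHORT
    if prev_seq == EIGHT_SHORT:
        # MPEG-2 AAC has no combined start/stop window, so stay short if the next frame needs it
        return EIGHT_SHORT if next_short else LONG_STOP
    return LONG_START if next_short else ONLY_LONG
```

Fed the per-frame decisions [long, long, short, long, long], this yields ONLY_LONG, LONG_START, EIGHT_SHORT, LONG_STOP, ONLY_LONG.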

In step S30, based on the decision made in step S26, an orthogonal transform with the long block length is performed on the frame being processed. In step S31, the allowable error energy for each divided frequency band is calculated. Steps S30 and S31 are the same as steps S10 and S11 of FIG. 2 in the first embodiment, so a detailed description is omitted.

If, on the other hand, the short block length is selected in step S26, an orthogonal transform with the short block length is performed on the frame being processed in step S27. This is the same as step S7 of FIG. 2 in the first embodiment. For MPEG-2 AAC, it yields 8 sets of spectra, each containing 128 frequency components. When this processing ends, the process proceeds to step S28.

In step S28, the minimum number of groups is determined from the block lengths of the current frame and the frames around it (the preceding frame, the current frame, and the previous frame). The value is stored in the memory 101.

- **A neighbouring frame is short.** If either the previous-frame or the preceding-frame block length in the memory 101 is the short block length, the minimum is 2. Together with the neighbouring frames, the signal need only be split into three parts: before, during, and after the transient. Under the MPEG-2 AAC standard, two groups within this frame are therefore sufficient.
- **Both neighbouring frames are long.** The frame being processed is then an isolated short-block frame, and the minimum is 3. As described above, at least three groups should be formed: the part before the transient, the part where the signal changes sharply, and the part where it returns to a steady state.

When this processing ends, the process proceeds to step S29.
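The rule of step S28 is small enough to state directly. A sketch, with a hypothetical function name:

```python
def minimum_groups(previous_is_short, preceding_is_short):
    """Step S28: 2 if a neighbouring frame also uses short blocks, else 3
    (an isolated short-block frame must separate before, during and after the transient)."""
    return 2 if previous_is_short or preceding_is_short else 3
```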

In step S29, the allowable error energy for each frequency band is calculated from two inputs: the short-block current-frame SMR values stored in the memory 101, and the 8 sets of short-block spectra calculated in step S27. The short-block groups are then determined on that basis. Details are described later with reference to FIG. 9. When this processing ends, the process proceeds to step S32.

In step S32, bits are allocated by the same procedure as in step S12 of FIG. 2 of the first embodiment. The inputs are the current-frame PE value in the memory 101, the transform block length, and the allowable error energy obtained in step S29 or step S31. In step S33, the scale factor of each divided frequency band is calculated, and the frequency spectrum is quantized according to the bits allocated in step S32. In step S34, the scale factors and quantized spectra from step S33 are formatted according to the encoding scheme and output as a bit stream. In the second embodiment, this bit stream may be stored in the external storage device 104. It may also be output via the communication interface 109 to an external device connected to the network 108.

Step S35 copies two values in the memory 101: the preceding-frame PE value to the current-frame PE value, and the preceding-frame SMR values to the current-frame SMR.

The process then returns to step S22, and when the input signal ends, it proceeds to step S36. At that point, some quantized spectra remain in memory because of the delay introduced by the psychoacoustic calculation, the orthogonal transform, and so on. In step S36 these are formatted into a bit stream and output. When this processing ends, the audio signal encoding process terminates.

Next, the group determination process of the second embodiment, performed in step S29 of FIG. 8, is described with reference to the flowchart of FIG. 9. Steps identical to those of FIG. 3 in the first embodiment keep the same reference numerals, and only the differences from FIG. 3 are described.

In this embodiment, the processing target block counter w of FIG. 3 is held in the memory 101.

In the example of FIG. 9, when w is determined to be 7 in step S103, the group determination between all short blocks has been completed. The process then proceeds to step S120 to check whether the grouping is acceptable.

In step S120, the number of groups obtained is compared with the minimum number of groups determined in step S28 and stored in the memory 101. If it is at least that minimum, the process proceeds to step S109, and the processing described in the first embodiment is performed.

If the number of groups is below the minimum, the grouping is regarded as having failed, and the process proceeds to step S121 to redo it. In step S121, 0.05 is subtracted from the coefficient α that multiplies the allowable error energy sum X in step S106. The process then returns to step S102 to redo the group determination. Reducing α lowers the threshold in the next comparison of step S106, so the blocks are divided into groups more finely.
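Steps S120 and S121 turn the single pass of FIG. 3 into a retry loop. The sketch repeats the pass, lowering α by 0.05 each time, until the minimum group count is reached. It assumes S is a sum of absolute per-band differences. The lower bound on α is an addition of this example, because the text does not say what happens if α reaches zero. The names and the initial α are hypothetical.

```python
def find_boundaries(xmin, alpha):
    """One pass of steps S102-S108: boundary w+1 wherever S > alpha * X for blocks w, w+1."""
    boundaries = []
    for w in range(len(xmin) - 1):
        s = sum(abs(a - b) for a, b in zip(xmin[w + 1], xmin[w]))
        if s > alpha * sum(xmin[w]):
            boundaries.append(w + 1)
    return boundaries

def group_with_minimum(xmin, alpha, min_groups, step=0.05, alpha_floor=0.0):
    """Repeat the pass, lowering alpha by `step` (S121), until at least min_groups
    groups exist (S120) or alpha would fall below alpha_floor."""
    while True:
        boundaries = find_boundaries(xmin, alpha)
        if len(boundaries) + 1 >= min_groups or alpha - step < alpha_floor:
            return boundaries, alpha
        alpha -= step
```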

As described above, the second embodiment allows audio signal encoding to be performed on a general-purpose PC.

In addition, performing step S28 of FIG. 8 and steps S120 and S121 of FIG. 9 ensures an appropriate grouping that also takes into account the state of the frames before and after the current one.

The above processing can of course also be applied to the audio signal encoding apparatus of the first embodiment shown in FIG. 1. Conversely, the processes of FIGS. 2 and 3 from the first embodiment can be executed on a general-purpose PC in place of those shown in FIGS. 8 and 9.

<Third Embodiment>
The second embodiment decided grouping from the sum of the differences in allowable error energy. Grouping can also be decided from the peak of the allowable error energy, as described below with reference to FIG. 10. The overall flow of the audio signal encoding process is the same as in the flowchart of FIG. 8, so its description is omitted. This embodiment can be implemented on an apparatus configured like the audio signal encoding apparatus of FIG. 4 described in the second embodiment.

FIG. 10 is a flowchart showing the group determination processing of step S29 in FIG. 8 according to the third embodiment.

In step S201, the allowable error energy is calculated for each divided frequency band (SFB) in every short block contained in the frame being processed. In the third embodiment, too, the allowable error energy is calculated in the same way as in step S11 of FIG. 2, described in the first embodiment. When this processing is complete, the process proceeds to step S202.
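The claims define the allowable error energy of a band as the product of the reciprocal of its signal-to-mask ratio (SMR) and its spectral energy. The Python sketch below computes it per SFB for one short block. It assumes that the SMR is given as a linear power ratio (not in dB) and that the hypothetical `sfb_offsets` list holds the start index of each band plus the end of the last one.

```python
from typing import List, Sequence


def allowable_error_energy(spectrum: Sequence[float], sfb_offsets: Sequence[int],
                           smr: Sequence[float]) -> List[float]:
    """Allowable error energy per SFB: band energy x (1 / SMR).

    spectrum    -- transform coefficients of one short block
    sfb_offsets -- band edges; band k spans spectrum[sfb_offsets[k]:sfb_offsets[k + 1]]
    smr         -- linear signal-to-mask ratio of each band (assumed > 0)
    """
    num_bands = len(sfb_offsets) - 1
    if len(smr) != num_bands:
        raise ValueError("need exactly one SMR value per band")
    result = []
    for k in range(num_bands):
        band = spectrum[sfb_offsets[k]:sfb_offsets[k + 1]]
        energy = sum(x * x for x in band)
        # Dividing by the SMR is the same as multiplying by its reciprocal.
        result.append(energy / smr[k])
    return result
```

For example, `allowable_error_energy([1, 1, 2, 2, 3, 3], [0, 2, 4, 6], [2, 4, 9])` returns `[1.0, 2.0, 2.0]`.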

In step S202, the block counter w in the memory 101 is reset to 1. In step S203, the SFB position at which the allowable error energy of block 1 reaches its peak (the peak SFB position) is detected; that is, the SFB position with the maximum allowable error energy is found.

Next, in step S204, it is determined whether the block counter w is less than 7. If w is less than 7, that is, if the determination has not yet been made for all pairs of short blocks, the process proceeds to step S205. If w is 7, that is, if the determination has been completed for all short blocks, the process proceeds to step S210.

In step S205, the peak SFB position of the allowable error energy of the adjacent short block w+1 is detected in the same way as in step S203. When this processing is complete, the process proceeds to step S206.

In step S206, it is determined whether the difference between the peak SFB position of block w and that of block w+1 is larger than a threshold A. In the third embodiment, the threshold A is predetermined and is stored in the work area of the memory 101 when the audio signal encoding processing program is loaded into the memory 101.

If the difference between the peak SFB positions is larger than the threshold A, a group break is judged to lie between block w and block w+1, and the process proceeds to step S208, where a group boundary between block w and block w+1 is set in the group information in the memory 101. When this processing is complete, the process proceeds to step S209.
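Because the claims name MPEG-2/4 AAC as the encoding format, the group boundaries set in step S208 would ultimately be signalled in the bitstream. In AAC, the grouping of the eight short windows of a frame is carried by the 7-bit `scale_factor_grouping` field, in which a set bit means that the corresponding window joins the group of the previous one. The Python sketch below, whose function names are hypothetical, packs 0-based boundary indices (w meaning a break between windows w and w+1) into that field and decodes it back into group lengths as a check.

```python
from typing import Iterable, List

NUM_SHORT_WINDOWS = 8  # an AAC EIGHT_SHORT_SEQUENCE frame has eight short windows


def boundaries_to_grouping(boundaries: Iterable[int]) -> int:
    """Pack group breaks into a 7-bit scale_factor_grouping value."""
    breaks = set(boundaries)
    value = 0
    for w in range(NUM_SHORT_WINDOWS - 1):
        if w not in breaks:
            # Bit (6 - w) set: window w+1 stays in the same group as window w.
            value |= 1 << (6 - w)
    return value


def grouping_to_lengths(value: int) -> List[int]:
    """Decode scale_factor_grouping into the number of windows in each group."""
    lengths = [1]
    for i in range(NUM_SHORT_WINDOWS - 1):
        if value & (1 << (6 - i)):
            lengths[-1] += 1
        else:
            lengths.append(1)
    return lengths
```

A single break between windows 3 and 4 gives `boundaries_to_grouping([3]) == 0b1110111`, which decodes to two groups of four windows, `[4, 4]`.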

If, on the other hand, the difference between the peak SFB positions is equal to or smaller than the threshold A in step S206, the process proceeds to step S207, where it is determined whether the difference between the allowable error energy at the peak SFB position of block w and that at the peak SFB position of block w+1 is larger than a threshold B. In the third embodiment, the threshold B is likewise predetermined and stored in the work area of the memory 101. If this difference in peak allowable error energy is larger than the threshold B, a group break is again judged to lie between block w and block w+1, and in step S208 a group boundary between them is set in the group information in the memory 101; the process then proceeds to step S209. If the difference is not larger than the threshold B, blocks w and w+1 are judged to belong to the same group, and the process proceeds directly to step S209.
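The pairwise decision of steps S203 to S209 can be sketched in Python as follows. It is a sketch under stated assumptions: blocks are indexed from 0 rather than from 1, both differences are taken as absolute values, a tie between equal maxima resolves to the lowest SFB index, and the function names are hypothetical.

```python
from typing import List, Sequence


def peak_sfb(energies: Sequence[float]) -> int:
    """Index of the SFB with the largest allowable error energy (lowest index on ties)."""
    return max(range(len(energies)), key=lambda k: energies[k])


def peak_boundaries(blocks: Sequence[Sequence[float]],
                    threshold_a: float, threshold_b: float) -> List[int]:
    """Return every w with a group break between short blocks w and w+1.

    blocks[w][sfb] is the allowable error energy of SFB `sfb` in short block w.
    """
    boundaries = []
    for w in range(len(blocks) - 1):
        cur, nxt = blocks[w], blocks[w + 1]
        p_cur, p_nxt = peak_sfb(cur), peak_sfb(nxt)
        if abs(p_cur - p_nxt) > threshold_a:
            # Step S206: the peaks lie in clearly different bands.
            boundaries.append(w)
        elif abs(cur[p_cur] - nxt[p_nxt]) > threshold_b:
            # Step S207: same region, but the peak magnitudes differ strongly.
            boundaries.append(w)
        # Otherwise blocks w and w+1 stay in the same group.
    return boundaries
```

With `blocks = [[1, 9, 1], [1, 8, 1], [9, 1, 1], [10, 1, 1], [30, 1, 1]]`, A = 0 and B = 5, the result is `[1, 3]`: the break after block 1 comes from the peak moving from SFB 1 to SFB 0, and the break after block 3 from the peak energy jumping from 10 to 30.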

In step S209, the block counter w is incremented, and the process returns to step S204 to repeat the processing described above.

If w is determined to be 7 in step S204, that is, if the group determination has been completed for all pairs of short blocks, the process proceeds to step S210 to check whether the group determination succeeded.

In step S210, it is determined whether the number of groups obtained is equal to or greater than the minimum number of groups determined in step S27 of FIG. 8 and stored in the memory 101. If it is, the process proceeds to step S213, and the processing of steps S109 and S110 of FIG. 3, described in the first embodiment, is performed in steps S213 and S214.

If, on the other hand, the number of groups is less than the minimum number of groups, the group determination is regarded as having failed, and the process proceeds to step S211 so that it can be performed again. In step S211, 1 is subtracted from the threshold A in the memory 101, and in step S212 the threshold B in the memory 101 is reduced by an appropriate amount. The process then returns to step S202, and the group determination is performed again.
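The retry of steps S210 to S212 can be sketched generically in Python. The source does not say by how much threshold B is reduced, so the halving factor `b_factor` below is purely an assumption, as is the `max_rounds` guard; `decide` stands for any hypothetical function that returns the boundary list for given thresholds A and B, such as a peak-based decision following steps S203 to S209.

```python
from typing import Callable, List, Tuple

Decide = Callable[[float, float], List[int]]


def retry_until_min_groups(decide: Decide, min_groups: int,
                           threshold_a: float, threshold_b: float,
                           b_factor: float = 0.5,
                           max_rounds: int = 64) -> Tuple[List[int], float, float]:
    """Relax thresholds A and B until at least min_groups groups are formed."""
    for _ in range(max_rounds):
        boundaries = decide(threshold_a, threshold_b)
        if len(boundaries) + 1 >= min_groups:
            # Step S210 succeeded: return the grouping and the thresholds used.
            return boundaries, threshold_a, threshold_b
        threshold_a -= 1            # step S211
        threshold_b *= b_factor     # step S212 (reduction amount assumed)
    raise RuntimeError("minimum number of groups not reached")
```

Passing the decision in as a function keeps the retry policy separate from the grouping criterion, so the same loop could also drive the α-based criterion of the second embodiment.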

This processing ensures that the grouping decision takes into account the state of the preceding and following frames.

As described above, treating short blocks whose spectral peak positions differ as separate groups makes appropriate grouping possible. Furthermore, treating portions whose peak magnitudes differ significantly as separate groups ensures that portions where the input signal changes sharply are placed in separate groups, so that here, too, the grouping follows the changes in the input signal.

Alternatively, the sum of the differences in allowable error energy, the difference in peak position, and the difference in peak magnitude may all be detected and evaluated together to decide the grouping.
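The source leaves open how the three measures are combined. One simple possibility, assumed here for illustration only, is to declare a group break whenever any one of the three criteria fires. The Python sketch below (hypothetical names, absolute differences, X taken from block w) applies that OR rule to one pair of adjacent short blocks.

```python
from typing import Sequence


def is_group_break(cur: Sequence[float], nxt: Sequence[float], alpha: float,
                   threshold_a: float, threshold_b: float) -> bool:
    """Combined test for adjacent short blocks given per-SFB allowable error energies."""
    # Second embodiment: sum of per-SFB differences against alpha times X.
    diff_sum = sum(abs(x - y) for x, y in zip(cur, nxt))
    if diff_sum > alpha * sum(cur):
        return True
    # Third embodiment: peak SFB position first, then peak magnitude.
    p_cur = max(range(len(cur)), key=lambda k: cur[k])
    p_nxt = max(range(len(nxt)), key=lambda k: nxt[k])
    if abs(p_cur - p_nxt) > threshold_a:
        return True
    return abs(cur[p_cur] - nxt[p_nxt]) > threshold_b
```

A weighted score would be an equally plausible reading of "evaluated together"; the OR rule is simply the most conservative, since it splits whenever any single measure indicates a change.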

Although the second embodiment does not specifically mention a recording medium, any recording medium, such as an FD, HDD, CD, DVD, MO, or semiconductor memory, can be used.

In addition, the present invention can be implemented with various modifications without departing from its gist.

FIG. 1 is a block diagram showing a configuration example of the audio signal encoding apparatus according to the first embodiment of the present invention.
FIG. 2 is a flowchart showing the audio signal encoding processing according to the first embodiment of the present invention.
FIG. 3 is a flowchart showing the group determination processing according to the first embodiment of the present invention.
FIG. 4 is a block diagram showing a configuration example of the audio signal encoding apparatus according to the second embodiment of the present invention.
FIG. 5 is a diagram showing the contents of a storage medium storing the audio signal encoding processing program according to the second embodiment of the present invention.
FIG. 6 is a schematic diagram showing how the audio signal encoding processing program according to the second embodiment of the present invention is installed.
FIG. 7 is a memory map showing the state in which the audio signal encoding processing program according to the second embodiment of the present invention has been loaded.
FIG. 8 is a flowchart showing the audio signal encoding processing according to the second embodiment of the present invention.
FIG. 9 is a flowchart showing the group determination processing according to the second embodiment of the present invention.
FIG. 10 is a flowchart showing the group determination processing according to the third embodiment of the present invention.
FIG. 11 is an explanatory diagram of the conventional concept of encoding an audio signal in 2048-sample blocks.
FIG. 12 is a diagram showing the conventional concept of encoding an audio signal in 256-sample blocks.

Explanation of symbols

1 Frame divider
2 Filter bank
3 Psychoacoustic calculator
4 Block length determiner
5 Group determiner
6 Bit allocator
7 Quantizer
8 Bitstream formatter
100 CPU
101 Memory
102 Bus
103 Terminal
104 External storage device
105 Media drive
106 Microphone
107 Speaker
108 Communication line
109 Communication interface
110 Recording medium

Claims

1. An audio signal encoding apparatus comprising:
dividing means for dividing an audio input signal into processing units;
analyzing means for analyzing the audio input signal for each processing unit and outputting feature data;
determination means for determining, for each processing unit and based on the feature data, whether the transform block length of the audio signal is a long block length or a short block length;
calculating means for calculating the allowable error energy of the processing unit when the block length is long, and, when the block length is short, dividing the audio signal of the processing unit into blocks and calculating the allowable error energy of each block;
grouping means for grouping the short blocks, in the case of short blocks, based on the allowable error energy; and
encoding means for encoding the audio signal for each group when the transform block length is short, and for each processing unit when the transform block length is long.

2. The audio signal encoding apparatus according to claim 1, wherein the grouping means performs grouping by determining that consecutive short blocks belong to the same group when the sum of the differences between their allowable error energies is smaller than a predetermined ratio of the total allowable error energy of one short block, and that they belong to different groups when the sum is larger.

3. The audio signal encoding apparatus according to claim 1, wherein the grouping means performs grouping by determining that consecutive short blocks belong to the same group when the difference between the positions of the divided frequency bands at which their allowable error energies are maximum is smaller than a predetermined value, and that they belong to different groups when the difference is larger.

4. The audio signal encoding apparatus according to claim 1 or 2, wherein the grouping means performs grouping by determining that consecutive short blocks belong to the same group when the difference between the positions of the divided frequency bands at which their allowable error energies are maximum is smaller than a predetermined value and the difference between the allowable error energies of the short blocks at those positions is smaller than a predetermined value, and that they belong to different groups otherwise.

5. The audio signal encoding apparatus according to claim 1, further comprising setting means for setting the minimum number of blocks to 3 when the determination means determines that the transform block length of the audio signal of the processing unit being processed is a short block and that the transform block lengths of the audio signals of both the preceding and following processing units are long blocks.

6. The audio signal encoding apparatus according to claim 5, wherein the setting means sets the minimum number of blocks to 2 when the determination means determines that the transform block length of the audio signal of the processing unit being processed is a short block and that at least one of the transform block lengths of the audio signals of the preceding and following processing units is a short block.

7. The audio signal encoding apparatus according to claim 5 or 6, wherein, when the number of groups formed by the grouping means is less than the minimum number of blocks, the grouping determination criterion is changed and the processing by the grouping means is re-executed.

8. The audio signal encoding apparatus according to claim 1, wherein the feature data is perceptual entropy.

9. The audio signal encoding apparatus according to claim 1, wherein the allowable error energy is the product of the reciprocal of the signal-to-mask ratio of each divided frequency band and the spectral energy of that band.

10. The audio signal encoding apparatus according to claim 1, wherein the encoding format of the encoding means is MPEG-2/4 AAC.

11. An audio signal encoding method comprising:
a dividing step of dividing an audio input signal into processing units;
an analyzing step of analyzing the audio input signal for each processing unit and outputting feature data;
a determination step of determining, for each processing unit and based on the feature data, whether the transform block length of the audio signal is a long block length or a short block length;
a calculation step of calculating the allowable error energy of the processing unit when the block length is long, and, when the block length is short, dividing the audio signal of the processing unit into blocks and calculating the allowable error energy of each block;
a grouping step of grouping the short blocks, in the case of short blocks, based on the allowable error energy; and
an encoding step of encoding the audio signal for each group when the transform block length is short, and for each processing unit when the transform block length is long.

12. The audio signal encoding method according to claim 11, wherein, in the grouping step, grouping is performed by determining that consecutive short blocks belong to the same group when the sum of the differences between their allowable error energies is smaller than a predetermined ratio of the total allowable error energy of one short block, and that they belong to different groups when the sum is larger.

13. The audio signal encoding method according to claim 11, wherein, in the grouping step, grouping is performed by determining that consecutive short blocks belong to the same group when the difference between the positions of the divided frequency bands at which their allowable error energies are maximum is smaller than a predetermined value, and that they belong to different groups when the difference is larger.

14. The audio signal encoding method according to claim 11 or 12, wherein, in the grouping step, grouping is performed by determining that consecutive short blocks belong to the same group when the difference between the positions of the divided frequency bands at which their allowable error energies are maximum is smaller than a predetermined value and the difference between the allowable error energies of the short blocks at those positions is smaller than a predetermined value, and that they belong to different groups otherwise.

15. The audio signal encoding method according to claim 11, further comprising a setting step of setting the minimum number of blocks to 3 when it is determined in the determination step that the transform block length of the audio signal of the processing unit being processed is a short block and that the transform block lengths of the audio signals of both the preceding and following processing units are long blocks.

16. The audio signal encoding method according to claim 15, wherein, in the setting step, the minimum number of blocks is set to 2 when it is determined in the determination step that the transform block length of the audio signal of the processing unit being processed is a short block and that at least one of the transform block lengths of the audio signals of the preceding and following processing units is a short block.

17. The audio signal encoding method according to claim 15 or 16, wherein, when the number of groups formed in the grouping step is less than the minimum number of blocks, the grouping determination criterion is changed and the grouping step is re-executed.

18. The audio signal encoding method according to claim 11, wherein the feature data is perceptual entropy.

19. The audio signal encoding method according to claim 11, wherein the allowable error energy is the product of the reciprocal of the signal-to-mask ratio of each divided frequency band and the spectral energy of that band.

20. The audio signal encoding method according to claim 11, wherein the encoding format of the encoding step is MPEG-2/4 AAC.

21. A program executable by an information processing apparatus, which causes the information processing apparatus executing it to function as the audio signal encoding apparatus according to any one of claims 1 to 10.

22. A program executable by an information processing apparatus, comprising program code for implementing the audio signal encoding method according to claim 11.

23. A storage medium readable by an information processing apparatus, storing the program according to claim 21 or 22.