JP2001077698A

JP2001077698A - Method for deciding block size with respect to audio encoding application

Info

Publication number: JP2001077698A
Application number: JP25417599A
Authority: JP
Inventors: Hon Neo Sua; ホン・ネオスア
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1999-09-08
Filing date: 1999-09-08
Publication date: 2001-03-23

Abstract

PROBLEM TO BE SOLVED: To provide an efficient and simple block size determination method for the encoding of digital audio signals. SOLUTION: The block size determination method divides the PCM samples in a transform input buffer into K sub-blocks of equal length; computes, for each sub-block, the differences between PCM sample values; detects, for each sub-block, the peak of the differences and the peak of the PCM sample values; modulates the peak of the differences by the peak of the audio PCM sample values in each sub-block to obtain a modulated value; detects the peak of the modulated values over all sub-blocks (the peak sub-block); calculates the ratio of the modulated value of the peak sub-block to that of every sub-block appearing before the peak sub-block; and compares the ratios with a prescribed threshold to detect an attack signal. A short block size is applied when an attack signal is detected, and a long block size is applied when none is detected.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a technique for efficiently encoding digital audio signals for data transmission or digital storage media.

[0002]

2. Description of the Related Art

Audio coding algorithms generally fall into three categories: subband coding, transform coding, and hybrid coding that combines subband and transform coding. Subband encoders include, for example, the ISO/IEC 11172-3 MPEG-1 Audio Layer 2 algorithm. Transform encoders include, for example, the ISO/IEC 13818-7 MPEG-2 Advanced Audio Coding (AAC) algorithm. Hybrid subband/transform encoders include, for example, the ISO/IEC 11172-3 MPEG-1 Audio Layer 3 algorithm used in MiniDisc systems.

[0003] In a typical transform encoder, as shown in FIG. 2, the input audio PCM samples are first buffered into encoding frames by the input PCM buffer 21. Encoding is performed frame by frame. Each frame of PCM samples is fed simultaneously to the block size determination module 22 and to the windowing and transform module 23. The block size determination module 22 selects an appropriate predetermined block size for the windowing (window function) and transform processing. Usually two block sizes are used: a long block and a short block. The long block size is used for stationary signals, and the short block size is used for attack signals. The block size information is passed to the windowing and transform module 23, where it is used to window the buffered PCM samples and transform them into spectral coefficients. These spectral coefficients undergo further processing before they are transmitted to the decoder together with side information.

[0004] The block size decision plays an important role in suppressing the pre-echo noise that arises in transform encoders. Pre-echo noise is caused by applying a long-block transform to an attack signal, that is, a signal characterized by a quiet period followed by a sudden increase in signal strength. The quantization noise spreads across the entire transform block and significantly affects the quiet portion of the attack signal. The quantization noise therefore becomes more audible in the quiet portion preceding the sudden loud sound, which is why it is described as pre-echo noise. By dividing the long block into several short blocks and applying a short-block transform to each, the quantization noise is confined within each short block, which reduces the pre-echo noise. In general, a long-block transform is applied to stationary signals so that the higher frequency resolution yields better coding efficiency, while a short-block transform is applied to attack signals to reduce pre-echo noise. It is therefore important that the block size determination method classify the signal type accurately.

[0005] The present invention relates to the determination of the block size in audio transform coding. In particular, it can be used with the MPEG-2 AAC algorithm. In the MPEG-2 AAC algorithm, the transform is based on the modified discrete cosine transform (MDCT) first proposed by Princen and Bradley (Princen and Bradley, "Subband/Transform Coding Using Filter Bank Designs Based on Time Domain Aliasing Cancellation", Proc. ICASSP 1987, pp. 2161-2164).

[0006] Two block sizes are predetermined: the long block uses 2048 PCM samples, and the short block uses 256 PCM samples. When short blocks are used, the processing frame is divided into eight sub-blocks of 256 PCM samples each, and each sub-block is processed by a short-block MDCT.

[0007] FIG. 6 shows the block size determination method described in Annex B (informative) of the ISO/IEC 13818-7 MPEG-2 AAC standard. It is based on the value of the perceptual entropy computed by the proposed psychoacoustic model. In this method, the input audio PCM samples are first buffered into frames of 2048 samples (S61). Each frame of samples is converted into spectral coefficients by an FFT (S62). An unpredictability measure is computed from the resulting spectral coefficients (S63). The threshold is calculated (S64), and the perceptual entropy is calculated (S65). The perceptual entropy is compared with a predetermined decision threshold switch_pe (S66). If the perceptual entropy is greater than switch_pe, the short block size is selected for the MDCT (S68); otherwise, the long block size is selected (S67). The decision threshold switch_pe is implementation dependent.

[0008]

[Problems to Be Solved by the Invention] An efficient block size determination method is needed to suppress the pre-echo noise that arises in transform encoders. Although the actual block size used in the transform is itself an important factor, accurate detection of signal attacks, especially critical ones, is very important. In general, long blocks are preferred for transform coding of audio signals because they give better frequency resolution, which allows greater removal of redundancy and irrelevancy. This is especially true for segments in which the characteristics of the audio signal change slowly. Short blocks should be used only for segments that contain critical attack signals.

[0009] The prior art is not very efficient at determining the correct block size. Its accuracy depends heavily on the psychoacoustic model, which is computationally very complex.

[0010] The present invention has been made to solve the above problems, and its object is to provide an efficient and simple block size determination method.

[0011]

SUMMARY OF THE INVENTION

The present invention takes into account the temporal masking conditions of pre-masking and post-masking. Pre-masking is a condition caused by the fast onset of a loud sound, whereby sounds occurring before the attack are masked. Post-masking is the lingering effect of a loud masker, whereby sounds occurring after the loud masker are masked. Psychoacoustic experiments on pre-masking and post-masking show that pre-masking lasts for 5 to 20 msec, and that post-masking lasts for 20 to 200 msec depending on the duration of the masker. Based on these experimental results, the present invention detects only signal attacks, because the pre-masking period is relatively short. The release of a signal is ignored, because the post-masking period is relatively long.

[0012] The present invention comprises the following means: means for dividing the PCM samples in a transform input buffer into K sub-blocks of equal length, about 6 msec each, chosen in view of the empirical pre-masking period; means for computing, for each sub-block, the differences from the PCM sample values; means for detecting, for each sub-block, the peak of the differences and the peak of the PCM sample values; means for modulating, for each sub-block, the peak of the differences by the peak of the audio PCM sample values to generate a modulation value; means for detecting the peak of the modulation values over all sub-blocks (the peak sub-block); means for calculating the ratio of the modulation values between the peak sub-block and every other sub-block appearing before the peak sub-block; means for detecting an attack signal by comparing the ratios of the modulation values with a predetermined threshold; and means for applying to the transform a short block size when an attack signal is detected and a long block size when no attack signal is detected.

[0013] Alternatively, the present invention may comprise the following means: means for dividing one frame of audio PCM samples in a transform input buffer into K sub-blocks of equal time interval; means for computing, for each sub-block, the differences from the audio PCM samples; means for detecting the peak of the differences for each sub-block; means for detecting the maximum of the difference peaks over all sub-blocks; means for labeling the sub-block containing the maximum difference peak as the maximum sub-block; means for calculating the ratio of the difference peaks between the maximum sub-block and every sub-block appearing before it; means for comparing each ratio with a predetermined threshold; and means for determining the block size based on the comparison results and the position of the maximum sub-block in the frame.

[0014] To exploit the pre-masking effect, the audio PCM samples in the transform input buffer are divided into sub-blocks of equal length, each lasting about 6 msec. This means that attacks occurring within a sub-block can simply be ignored, because the resulting pre-echo noise is masked. This also has the advantage of reducing the complexity of the block size determination method. To detect the presence of an attack signal in the current frame, the ratios of the modulated values (modulation values) are compared with a predetermined threshold. If any of the ratios exceeds the threshold, an attack signal is present, and the short block size is selected for the transform. The modulation value of each sub-block is obtained by modulating the peak of the differences by the peak of the PCM sample values detected in that sub-block. The differences are derived by taking the difference between the values of each pair of adjacent PCM samples. To obtain the ratios, the peak modulation value and its sub-block are identified first, and the ratio is then taken between the peak modulation value and the modulation value of every sub-block that appears before the peak sub-block. In this way only attack signals are considered, and release signals are not. Both slowly rising and rapidly rising attacks can be detected.

[0015] As described above, the block size determination method of the present invention can be implemented easily, without a complex psychoacoustic model or FFT processing. Moreover, whereas the prior art relies on simultaneous masking in the frequency domain to determine the block size, the present invention uses the temporal masking effects of pre-masking and post-masking to determine the block size.

[0016]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

An embodiment of the block size determination method according to the present invention is described below with reference to the accompanying drawings.

[0017] In the following, the use of the efficient block size determination method is described using the MPEG-2 AAC algorithm as an example. The method can also be applied to other audio coding algorithms. The transform used in the MPEG-2 AAC algorithm is the modified discrete cosine transform (MDCT). Besides the MDCT, other block-based transforms may be used, such as the discrete Fourier transform (DFT), the discrete cosine transform (DCT), or the discrete sine transform (DST).

[0018] The transform encoder that executes the block size determination method of this embodiment has the configuration shown in FIG. 2, and the block size determination module 22, one of its components, performs the processing described below.

[0019] FIG. 1 shows a flowchart of the block size determination method of the present invention. In step S11, the PCM samples in the MDCT input buffer are divided into K sub-blocks of equal length. Each sub-block has an equal duration of about 6 msec, chosen in view of the empirically determined pre-masking period. FIG. 3 illustrates this division into sub-blocks. In step S12, the PCM sample values ($\mathrm{pcm}_i$) in each sub-block are converted into difference values ($\mathrm{diff}_i$) between each pair of adjacent PCM samples. This is expressed as

$$\mathrm{diff}_i = \mathrm{pcm}_{i+1} - \mathrm{pcm}_i, \qquad 1 \le i \le N-1 \tag{1}$$

where N is the number of PCM samples in each sub-block. FIG. 4 shows an example of this conversion from PCM samples to difference values ($d_i$ in the figure).
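Steps S11 and S12 can be illustrated with a minimal Python sketch. The function names `split_into_sub_blocks` and `differences` are hypothetical, and the toy frame is illustrative only; the source does not prescribe any particular implementation.

```python
def split_into_sub_blocks(frame, k):
    """Split a frame of PCM samples into k sub-blocks of equal length (step S11)."""
    if k <= 0 or len(frame) % k != 0:
        raise ValueError("frame length must be a positive multiple of k")
    n = len(frame) // k  # N: number of samples per sub-block
    return [list(frame[i * n:(i + 1) * n]) for i in range(k)]


def differences(sub_block):
    """Adjacent-sample differences diff_i = pcm_{i+1} - pcm_i (step S12, Eq. 1)."""
    return [b - a for a, b in zip(sub_block, sub_block[1:])]


# Toy frame of 8 samples split into K = 2 sub-blocks of N = 4 samples each.
frame = [0, 1, 3, 6, 10, 10, 9, 7]
sub_blocks = split_into_sub_blocks(frame, 2)
diffs = [differences(sb) for sb in sub_blocks]
print(sub_blocks)  # [[0, 1, 3, 6], [10, 10, 9, 7]]
print(diffs)       # [[1, 2, 3], [0, -1, -2]]
```

As a rough check of the 6 msec target: at a sampling rate of 44.1 kHz (an assumption; the source does not fix a rate), splitting a 2048-sample buffer into K = 8 sub-blocks gives 256 samples per sub-block, or about 5.8 msec.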

[0020] In step S13, the peak of the difference values ($D_1 \ldots D_K$; hereinafter the "peak difference value") and the peak of the PCM sample values ($S_1 \ldots S_K$; hereinafter the "peak sample value") are detected for each sub-block. Each peak can be found by comparing the absolute values of the difference values or of the PCM sample values. In step S14, each peak difference value is modulated by the corresponding peak sample value to obtain a modulated value for each sub-block ($P_1 \ldots P_K$; hereinafter the "modulation value"). This is expressed as

$$P_i = S_i \cdot D_i, \qquad 1 \le i \le K \tag{2}$$
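A minimal Python sketch of steps S13 and S14 follows. It assumes that each peak is stored as an absolute magnitude, so that every modulation value is non-negative; the source only says that peaks are found by comparing absolute values. The names `peak_abs` and `modulation_values` are hypothetical.

```python
def peak_abs(values):
    """Largest absolute value in a sequence (0 for an empty sequence)."""
    return max((abs(v) for v in values), default=0)


def modulation_values(sub_blocks):
    """Return (D, S, P) per sub-block, with P_i = S_i * D_i (Eq. 2).

    Assumption: peaks are kept as magnitudes, so P_i >= 0.
    """
    d = [peak_abs([b - a for a, b in zip(sb, sb[1:])]) for sb in sub_blocks]
    s = [peak_abs(sb) for sb in sub_blocks]
    p = [si * di for si, di in zip(s, d)]
    return d, s, p


sub_blocks = [[0, 1, -1, 2], [3, -40, 35, -20]]
d, s, p = modulation_values(sub_blocks)
print(d)  # [3, 75]
print(s)  # [2, 40]
print(p)  # [6, 3000]
```

Multiplying the two peaks makes a sub-block stand out only when it is both loud and changing quickly, which is the combination that characterizes an attack.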

[0021] In step S15, the modulation value with the largest absolute value among the modulation values $P_i$ of all sub-blocks is detected, and the index of the sub-block containing it (hereinafter the "peak sub-block") is labeled peakIndex. This is expressed as

$$P_{\mathrm{peakIndex}} = \max(P_1, \ldots, P_K) \tag{3}$$
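Step S15 amounts to an argmax over the modulation values. The sketch below returns a 1-based index to match the numbering of Eq. (3). Resolving ties in favor of the earliest sub-block is an assumption of this sketch; the source does not say how ties are handled. The name `find_peak_sub_block` is hypothetical.

```python
def find_peak_sub_block(p):
    """Return the 1-based index of the sub-block whose |P_i| is largest (step S15)."""
    if not p:
        raise ValueError("need at least one modulation value")
    # max() returns the first maximum, so ties resolve to the earliest sub-block
    # (an assumption of this sketch, not something the source specifies).
    best = max(range(len(p)), key=lambda i: abs(p[i]))
    return best + 1


print(find_peak_sub_block([6, 3000, 120, 3000]))  # 2
```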

[0022] In step S16, the index peakIndex is compared with the index of the first sub-block. If they are equal, the long block size is selected for the MDCT (S19). The long block size is selected when the first sub-block is the peak sub-block because the method exploits the psychoacoustic phenomenon of post-masking: if a signal attack occurs in the first sub-block, post-masking occurs and masks the temporal noise in the signal that follows the attack. The position of the peak sub-block within the frame is thus also taken into account in determining the block size.

[0023] If, in step S16, peakIndex is not equal to the index of the first sub-block, the ratio of the modulation values is calculated between the peak sub-block and every other sub-block appearing before it (S17). The ratio $R_j$ is defined as

$$R_j = \frac{P_{\mathrm{peakIndex}}}{P_j}, \qquad 1 \le j \le \mathrm{peakIndex} - 1 \tag{4}$$

[0024] FIG. 5 illustrates a concrete example of this ratio calculation. In FIG. 5, sub-block K−2 is the peak sub-block, and the ratio $R_j$ is calculated for each of the preceding sub-blocks, from sub-block 1 to sub-block K−3.
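The ratio calculation of Eq. (4) can be sketched in Python as follows, here with K = 6 and the peak in sub-block 4 (that is, K−2). The handling of a zero denominator is an assumption of this sketch: a silent earlier sub-block is treated as giving an infinite ratio, a case the source does not address. The name `modulation_ratios` and the sample values are hypothetical.

```python
import math


def modulation_ratios(p, peak_index):
    """R_j = P_peakIndex / P_j for 1 <= j <= peakIndex - 1 (Eq. 4).

    peak_index is 1-based. Magnitudes are used throughout.
    """
    peak = abs(p[peak_index - 1])
    ratios = []
    for pj in p[:peak_index - 1]:
        pj = abs(pj)
        if pj == 0:
            # Assumption: a silent earlier sub-block gives an unbounded ratio
            # (or 1.0 if the peak itself is also zero).
            ratios.append(math.inf if peak > 0 else 1.0)
        else:
            ratios.append(peak / pj)
    return ratios


p = [5, 10, 4, 200, 50, 20]       # K = 6, peak sub-block is number 4
print(modulation_ratios(p, 4))    # [40.0, 20.0, 50.0]
```

Sub-blocks 5 and 6 come after the peak and are deliberately left out, which is how the method ignores the release of the signal.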

[0025] In the above description, the ratio $R_j$ is obtained from the modulation values $P_i$, which result from modulating the peak difference value by the peak sample value. Alternatively, the ratio may be obtained from the peak difference values without modulation. That is, the largest of the peak difference values ($D_1 \ldots D_K$) over all sub-blocks is detected, and the ratio of the peak difference values between the sub-block containing it (the maximum sub-block) and each sub-block appearing before it is calculated in the same way as above.
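This unmodulated variant can be sketched in the same way, working directly on the peak difference values. The function name `peak_difference_ratios`, the tie-breaking rule (first maximum wins), and the treatment of a zero denominator as an infinite ratio are all assumptions of this sketch.

```python
import math


def peak_difference_ratios(peak_diffs):
    """Return (max_index, ratios) computed from peak difference values D_1..D_K.

    max_index is 1-based; ratios[j-1] = D_max / D_j for each earlier sub-block.
    """
    if not peak_diffs:
        raise ValueError("need at least one peak difference value")
    mags = [abs(d) for d in peak_diffs]
    max_pos = max(range(len(mags)), key=mags.__getitem__)  # first maximum wins
    d_max = mags[max_pos]
    # Assumption: a zero peak difference in an earlier sub-block gives an
    # infinite ratio.
    ratios = [math.inf if dj == 0 else d_max / dj for dj in mags[:max_pos]]
    return max_pos + 1, ratios


print(peak_difference_ratios([3, 2, 75, 10]))  # (3, [25.0, 37.5])
```

Skipping the modulation saves one peak search and one multiplication per sub-block, at the cost of no longer weighting the steepness of a change by the signal level.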

[0026] In step S18, each modulation value ratio $R_j$ is compared with a predetermined decision threshold d_threshold. This decision threshold is implementation dependent. If at least one ratio exceeds the decision threshold, the short block size is selected for the MDCT (S20). Otherwise, the long block size is selected for the MDCT (S19).
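Putting steps S11 through S20 together, the whole decision might be sketched as a single Python function. This is an illustration under stated assumptions, not the patented implementation: the function name `decide_block_size`, the threshold value 10.0, the use of magnitudes for the peaks, the tie-breaking rule, and the handling of a zero denominator are all choices made for this sketch.

```python
import math


def decide_block_size(frame, k, d_threshold):
    """Return "short" if an attack is detected in the frame, else "long"."""
    if k <= 0 or len(frame) % k != 0 or len(frame) // k < 2:
        raise ValueError("frame must split into k sub-blocks of >= 2 samples")
    n = len(frame) // k

    # S11-S14: modulation value P_i = S_i * D_i for every sub-block.
    p = []
    for i in range(k):
        sb = frame[i * n:(i + 1) * n]
        d_peak = max(abs(b - a) for a, b in zip(sb, sb[1:]))
        s_peak = max(abs(v) for v in sb)
        p.append(s_peak * d_peak)

    # S15: peak sub-block (0-based here; the first maximum wins on ties).
    peak = max(range(k), key=p.__getitem__)

    # S16: peak in the first sub-block -> long block (S19).
    if peak == 0:
        return "long"

    # S17-S18: any ratio above the threshold signals an attack (S20).
    for pj in p[:peak]:
        ratio = math.inf if pj == 0 else p[peak] / pj  # zero case is an assumption
        if ratio > d_threshold:
            return "short"
    return "long"


quiet = [1, -1, 1, -1]
loud = [100, -100, 100, -100]
print(decide_block_size(quiet + quiet + loud + quiet, 4, 10.0))  # short
print(decide_block_size(loud + quiet + quiet + quiet, 4, 10.0))  # long
print(decide_block_size(quiet * 4, 4, 10.0))                     # long
```

The three calls cover an attack in the third sub-block, a release after a loud first sub-block, and a stationary frame. Only the first selects the short block size.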

[0027] In the MDCT processing, when the short block size is selected, a short window for encoding with the short block size is used; when the long block size is selected, a long window for encoding with the long block size is used. In practice, MPEG-2 AAC and MP3 use, in addition to the short and long windows, further windows such as a start window and a stop window when switching between them. The present invention can also be applied to other coding systems that use such transition windows.
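One way the per-frame long/short decisions could be mapped onto transition windows is sketched below in Python. The sequence names are the window_sequence values of the MPEG-2 AAC syntax. The one-frame look-ahead, the function name `window_sequences`, and the rule that forces a short sequence when a long frame sits between two short ones are assumptions of this sketch; the source does not describe how the transition windows are scheduled.

```python
ONLY_LONG = "ONLY_LONG_SEQUENCE"
LONG_START = "LONG_START_SEQUENCE"
EIGHT_SHORT = "EIGHT_SHORT_SEQUENCE"
LONG_STOP = "LONG_STOP_SEQUENCE"


def window_sequences(short_flags):
    """Map per-frame short/long decisions to one window sequence per frame.

    short_flags[t] is True when the block size decision for frame t is "short".
    Assumes one frame of look-ahead and a stream that starts from a long window.
    """
    result = []
    prev = ONLY_LONG
    for t, is_short in enumerate(short_flags):
        next_short = t + 1 < len(short_flags) and short_flags[t + 1]
        prev_ends_short = prev in (LONG_START, EIGHT_SHORT)
        if is_short or (prev_ends_short and next_short):
            # A long frame squeezed between two short ones would need a window
            # that is both a stop and a start window, so it is coded short here.
            cur = EIGHT_SHORT
        elif prev_ends_short:
            cur = LONG_STOP
        elif next_short:
            cur = LONG_START
        else:
            cur = ONLY_LONG
        result.append(cur)
        prev = cur
    return result


for name in window_sequences([False, False, True, False, False]):
    print(name)
# ONLY_LONG_SEQUENCE, LONG_START_SEQUENCE, EIGHT_SHORT_SEQUENCE,
# LONG_STOP_SEQUENCE, ONLY_LONG_SEQUENCE (one per line)
```

If the very first frame is flagged short, this sketch emits a short sequence without a preceding start window; a real encoder would need to handle that case separately.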

[0028]

[Effects of the Invention] The present invention provides an audio transform encoder with a method that is effective in detecting signal attacks and reducing pre-echo. It is less complex than the prior art and improves efficiency in terms of memory usage and processing time. The present invention is therefore effective for reducing the cost of implementing an audio transform encoder in an LSI.

[Brief description of the drawings]

FIG. 1 is a flowchart of the block size determination method according to the present invention.

FIG. 2 is a block diagram showing the configuration for transform processing in an audio transform encoder.

FIG. 3 is a diagram illustrating how one frame of the input PCM buffer is divided into sub-blocks.

FIG. 4 is a diagram illustrating the conversion from PCM samples to difference values.

FIG. 5 is a diagram illustrating the calculation of the modulation value ratios.

FIG. 6 is a flowchart of a block size determination method according to the prior art.

[Explanation of symbols]

21: input PCM buffer; 22: block size determination module; 23: windowing and transform module

Claims


1. A method for adaptively determining the selection of a transform block size for an audio transform encoder, comprising: a) dividing one frame of audio PCM samples in a transform input buffer into K sub-blocks of equal time interval; b) obtaining, for each of the sub-blocks, the differences from the audio PCM samples; c) detecting, for each of the sub-blocks, the peak of the differences and the peak of the audio PCM samples; d) modulating, for each of the sub-blocks, the peak of the differences by the peak of the audio PCM samples to generate a modulation value; e) detecting the peak of the modulation values and labeling the sub-block containing it as the peak sub-block; f) calculating the ratio of the modulation values between the peak sub-block and each of the sub-blocks appearing before the peak sub-block; g) comparing each of the ratios with a predetermined threshold; and h) determining a block size based on the comparison results and the position of the peak sub-block in the frame.

2. A method for adaptively determining the selection of a transform block size for an audio transform encoder using differential PCM samples, comprising: a) dividing one frame of audio PCM samples in a transform input buffer into K sub-blocks of equal time interval; b) obtaining, for each of the sub-blocks, the differences from the audio PCM samples; c) detecting the peak of the differences for each of the sub-blocks; d) detecting the maximum of the difference peaks over all of the sub-blocks; e) labeling the sub-block containing the maximum difference peak as the maximum sub-block; f) calculating the ratio of the difference peaks between the maximum sub-block and each of the sub-blocks appearing before the maximum sub-block; g) comparing each of the ratios with a predetermined threshold; and h) determining a block size based on the comparison results and the position of the maximum sub-block in the frame.

3. The method according to claim 1, wherein the transform is a block-based transform such as a discrete Fourier transform (DFT), a discrete cosine transform (DCT), a discrete sine transform (DST), or a modified discrete cosine transform (MDCT).

4. The method according to claim 1 or claim 2, wherein the time interval of the sub-blocks is in the range between 5 msec and 20 msec.

5. The method according to claim 1, wherein the difference is obtained by calculating a difference between two consecutive audio PCM sample values in the sub-block.

6. The method according to claim 1, wherein the peak of the difference is obtained by comparing the absolute value of the difference in the sub-block.

7. The method according to claim 1, wherein the peak of the audio PCM sample is obtained by comparing absolute values of audio PCM samples in the sub-block.

8. The method according to claim 1, wherein the modulation process is a mathematical operation.

9. The method according to claim 1, wherein the peak of the modulation value is obtained by comparing modulation values in the frame.

10. The method according to claim 1, wherein the predetermined threshold value is device-dependent.

11. The method according to claim 1, wherein when any one of the ratios is higher than the predetermined threshold, the block size is determined to be a short block size.

12. The method according to claim 1, wherein the block size is determined to be a long block size when all of the ratios are equal to or less than the predetermined threshold value.

13. The method of claim 1, wherein the block size is determined to be a long block size when the peak sub-block is the first sub-block in the frame.

14. The method of claim 2, wherein the block size is determined to be a long block size when the largest sub-block is the first sub-block in the frame.