JP4550595B2

JP4550595B2 - Audio encoding device

Info

Publication number: JP4550595B2
Application number: JP2005011737A
Authority: JP
Inventors: 正樹桐原; 将高長田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2005-01-19
Filing date: 2005-01-19
Publication date: 2010-09-22
Anticipated expiration: 2025-01-19
Also published as: JP2006201375A

Description

The present invention relates to an audio encoding device that obtains spectral data by applying a time-frequency transform to an audio signal in units of sub-blocks, a plurality of which are contained in one frame, and that groups the sub-blocks contained in one frame so that side information is shared among the sub-blocks belonging to the same group.

Non-Patent Document 1 specifies encoding by AAC (Advanced Audio Coding) as follows.

A frame containing a transient signal, such as an attack signal, is processed as a short block. A frame determined to be a short block is divided into eight sub-blocks, and each sub-block is processed separately.

However, not all eight sub-blocks contain the transient signal. The sub-blocks that do not contain a transient signal are therefore grouped together, and side information such as scale factors is shared within each group, which raises the compression ratio.

The sub-blocks are obtained by dividing the time-domain audio signal with each of eight windows W0 to W7 as shown in FIG. 3 and applying the MDCT (Modified Discrete Cosine Transform) to the signal of each section individually. The position of a transient signal is detected in the time-domain audio signal, and grouping is performed according to which of the periods P0 to P7 shown in FIG. 3 that position belongs to. Specifically, when the transient is at the position shown in FIG. 3, the position belongs to period P2, so a group containing only the sub-block corresponding to window W2 is formed and the other sub-blocks are gathered into groups.
Non-Patent Document 1: "3rd Generation Partnership Project; Technical Specification Group Service and System Aspects; General Audio Codec audio processing functions; Enhanced aacPlus general audio codec; Encoder Specification AAC part (Release 6)", 3GPP TS 26.403 V6.0.0, 3rd Generation Partnership Project, September 2004

As shown in FIG. 3, adjacent windows among W0 to W7 overlap each other by 50%. The influence of a transient signal therefore appears in two sub-blocks. In the example of FIG. 3, it appears in both sub-block 2, corresponding to window W2, and sub-block 3, corresponding to window W3. Moreover, at the transient position in FIG. 3 the window function of W3 is larger than that of W2, so the transient affects sub-block 3 more strongly than sub-block 2.
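The overlap argument above can be checked numerically. The following is only a sketch under assumptions that are not in the source: sine-shaped windows of 256 samples with a hop of 128 samples, and a hypothetical transient position late in the overlap of W2 and W3 (FIG. 3 is not reproduced here, so the real position is unknown).

```python
import math

WINDOW_LEN = 256          # assumed short-window length
HOP = WINDOW_LEN // 2     # 50% overlap between adjacent windows


def window_weight(index, t):
    """Sine-window weight of window `index` at sample t (0.0 outside the window)."""
    offset = t - index * HOP
    if 0 <= offset < WINDOW_LEN:
        return math.sin(math.pi * (offset + 0.5) / WINDOW_LEN)
    return 0.0


# Hypothetical transient position inside the overlap of W2 and W3,
# past the midpoint of that overlap.
t = 3 * HOP + 100
w2 = window_weight(2, t)
w3 = window_weight(3, t)
print(w2, w3)
```

With these assumed values only W2 and W3 are non-zero at the transient, and W3 carries the larger weight, which is the situation the description attributes to FIG. 3.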

However, as described above, the method specified in Non-Patent Document 1 forms a group containing only sub-block 2 and gathers the other sub-blocks into groups, so encoding based on proper grouping could not always be performed.

The present invention has been made in view of these circumstances, and its object is to provide an audio encoding device that can improve the compression ratio by grouping sub-blocks properly so that the influence of a transient signal is correctly reflected.

An audio encoding device according to a first aspect of the present invention encodes an audio signal using an audio encoding method that obtains spectral data by applying a time-frequency transform to the audio signal in units of the sub-blocks constituting one frame, and that groups the sub-blocks contained in one frame so that side information is shared among the plurality of sub-blocks belonging to the same group. The device comprises: calculating means for calculating, based on the spectral data, the flatness of the spectrum for each sub-block; selecting means for selecting, as a single block, the sub-block for which the largest flatness was calculated among the sub-blocks contained in the one frame; and grouping means for grouping the sub-blocks contained in the one frame so as to form a group consisting of the single block and a group containing at least two consecutive sub-blocks among the sub-blocks other than the single block.

An audio encoding device according to a second aspect of the present invention encodes an audio signal using an audio encoding method that obtains spectral data by applying a time-frequency transform to the audio signal in units of the sub-blocks constituting one frame, and that groups the sub-blocks contained in one frame so that side information is shared among the plurality of sub-blocks belonging to the same group. The device comprises: calculating means for calculating, based on the spectral data, the flatness of the spectrum for each sub-block; selecting means for selecting, as a single block, each sub-block whose flatness calculated by the calculating means exceeds a predetermined threshold; and grouping means for making each single block an independent group and forming a group containing at least two consecutive sub-blocks among the sub-blocks of the one frame other than the single blocks.

According to the present invention, the compression ratio can be improved by grouping sub-blocks properly so that the influence of a transient signal is correctly reflected.

An embodiment of the present invention will now be described with reference to the drawings.
FIG. 1 is a block diagram of an audio encoding device (hereinafter, the encoding device) according to this embodiment.

This encoding device performs encoding on an input PCM signal and outputs an encoded bit stream. It includes a block cutout unit 1, a psychoacoustic model unit 2, a filter bank unit 3, a switch 4, a flatness measure calculation unit 5, a grouping unit 6, a quantization distortion/rate control unit 7, a host processor 8, a scaling unit 9, a quantization unit 10, an encoding unit 11, and a formatter 12. Each of these units may be implemented in hardware, or its function may be realized by software processing on a DSP (Digital Signal Processor) or the like.

The PCM signal to be encoded is supplied to the block cutout unit 1. This PCM signal is a time-domain signal. The block cutout unit 1 cuts data out of the PCM signal in units of the number of samples of a prescribed block size and outputs the cut-out signal.

The psychoacoustic model unit 2 applies an orthogonal transform such as the DFT (Discrete Fourier Transform), DCT (Discrete Cosine Transform), or MDCT (Modified DCT) to the signal output by the block cutout unit 1, converting it from a time-domain signal to a frequency-domain signal. From the resulting transform coefficients (frequency components), the psychoacoustic model unit 2 calculates a parameter called perceptual entropy. This parameter estimates the amount of information needed to encode one block by analyzing, from the transform coefficients, the perceptual frequency resolution, the spreading of frequency components, the unpredictability, and the tonality of the signal. The details of the calculation are specified in the international standard ISO/IEC 13818-7.

The psychoacoustic model unit 2 has a block switching unit 2a and an SMR calculation unit 2b. Based on the perceptual entropy, the block switching unit 2a determines the block length (long block or short block) to be used in the orthogonal transform (MDCT) at the time of actual encoding, and outputs block length information indicating the result to the filter bank unit 3 and the switch 4. For each band of equal width on a scale that reflects the frequency resolution of hearing (Bark, mel, etc.), the SMR calculation unit 2b calculates the SMR (Signal to Mask Ratio), which expresses the ratio between the signal and the sound it masks, that is, the amount of permissible noise that is not perceived even if present. The SMR calculation unit 2b outputs the calculated SMR to the quantization distortion/rate control unit 7.

The filter bank unit 3 applies an orthogonal transform to the output signal of the block cutout unit 1 according to the block length information output from the psychoacoustic model unit 2, and outputs the resulting spectral data. When AAC is adopted as the encoding scheme, the orthogonal transform in the filter bank unit 3 is the MDCT.

Based on the block length information, the switch 4 selects either the flatness measure calculation unit 5 and grouping unit 6, or the scaling unit 9, and supplies the output signal of the filter bank unit 3 to the selected side.

The flatness measure calculation unit 5 calculates the spectral flatness measure of the output signal of the filter bank unit 3 for each sub-block. The grouping unit 6 groups the output signal of the filter bank unit 3 based on the spectral flatness measures calculated by the flatness measure calculation unit 5.

The quantization distortion/rate control unit 7 calculates the code amount that can be allocated to each frame, based on the coding rate specified by the host processor 8 and the SMR output by the psychoacoustic model unit 2. Using this code amount as the target code amount of the encoded frame, it controls the scaling unit 9, the quantization unit 10, and the encoding unit 11. For example, the quantization distortion/rate control unit 7 calculates the amount of quantization distortion from the quantized coefficients supplied by the quantization unit 10 and, depending on the result, gives an output instruction to the quantization unit 10. It also checks whether the code amount reported by the encoding unit 11 is within the target code amount, and gives an output instruction to the encoding unit 11 when it is.

The scaling unit 9 determines scale factors based on the spectral data output from the filter bank unit 3 or the grouping unit 6. Using the determined scale factors, it scales the spectral data output from the filter bank unit 3 and outputs the scaled spectral data to the quantization unit 10. In response to an instruction from the quantization distortion/rate control unit 7, the scaling unit 9 also outputs the scaling coefficients to the formatter 12.

The quantization unit 10 corrects the spectral data output from the scaling unit 9 according to a prescribed formula and then quantizes all of the spectral data. It outputs the quantized data to the quantization distortion/rate control unit 7 as information for determining whether the quantization distortion is within the permissible error based on the SMR value. In response to an output instruction from the quantization distortion/rate control unit 7, the quantization unit 10 outputs the quantized data to the encoding unit 11.

The encoding unit 11 compresses and encodes the output of the quantization unit 10 according to a predetermined coding scheme. In the case of AAC, for example, Huffman coding is applied. The encoding unit 11 reports the code amount after encoding to the quantization distortion/rate control unit 7 and, in response to an output instruction from that unit, outputs the encoded data to the formatter 12.

The formatter 12 multiplexes the output of the encoding unit 11 and the scaling coefficients output from the scaling unit 9 according to a predetermined format, and outputs the result as an encoded audio signal.

When a signal grouped by the grouping unit 6 is processed, side information such as scale factors is shared within each group in the processing of the scaling unit 9, the quantization unit 10, the encoding unit 11, and the formatter 12.

Next, the operation of the encoding device configured as above will be described. The key point of the present invention lies in the grouping process performed when a frame is determined to be a short block, so the description here focuses on that process. For the other processes, the same processing as in an existing AAC-compliant encoding device can be applied.

Based on the perceptual entropy, the block switching unit 2a determines a frame that contains a transient signal, such as an attack signal, to be a short block, and determines any other frame to be a long block.

When the frame is determined to be a short block, the switch 4 selects the flatness measure calculation unit 5 and the grouping unit 6.

Meanwhile, when the frame is determined to be a short block, the filter bank unit 3 divides the time-domain audio signal with each of the eight windows W0 to W7 shown in FIG. 3 and applies the MDCT to the signal of each section individually, thereby obtaining in turn the spectral data of eight sub-blocks numbered 0 to 7. The spectral data of these eight sub-blocks are input in turn to the flatness measure calculation unit 5 and the grouping unit 6.

The flatness measure calculation unit 5 calculates the flatness of the input spectral data for each sub-block. The spectral flatness measure calculated by equation (1) below can be used as the flatness of the spectral data.

Ma and Mg in equation (1) are the arithmetic mean and the geometric mean, within the sub-block, of the signal strength or power value of each sample, and are calculated by equations (2) and (3) below.

Here, k is the sub-block number, n is the number of samples in the sub-block, and spec(i) is the signal strength or power value of each sample.
In other words, the spectral flatness measure is derived from the arithmetic and geometric means, within the sub-block, of the per-sample signal strength or power values.
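The equation images for (1) to (3) are not reproduced in this text. Assuming the conventional definition of spectral flatness (geometric mean divided by arithmetic mean), which agrees with the later statement that the largest measure marks the flattest spectrum, they would read roughly as follows, with spec(i) taken from sub-block k. The exact form in the original drawings may differ, for example by using a logarithmic scale.

```latex
\begin{align}
\mathrm{sfm}(k) &= \frac{M_g(k)}{M_a(k)} \tag{1} \\
M_a(k) &= \frac{1}{n} \sum_{i=0}^{n-1} \mathrm{spec}(i) \tag{2} \\
M_g(k) &= \Bigl( \prod_{i=0}^{n-1} \mathrm{spec}(i) \Bigr)^{1/n} \tag{3}
\end{align}
```

Under this assumed form the measure lies between 0 and 1 for positive power values and reaches 1 only when every spec(i) in the sub-block is equal.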

FIG. 2 shows the operation flow of the grouping unit 6 in FIG. 1. The grouping unit 6 performs the processing shown in FIG. 2 for each frame.

In step Sa1, the grouping unit 6 collects the spectral flatness measures sfm(0) to sfm(7) that the flatness measure calculation unit 5 calculates for each sub-block as described above. In step Sa2, the grouping unit 6 determines the largest of the spectral flatness measures sfm(0) to sfm(7). In step Sa3, the grouping unit 6 assigns the index of the largest spectral flatness measure to the variable k_min. For example, if sfm(3) is the largest, 3 is assigned to k_min.

In step Sa4, the grouping unit 6 checks whether the value of k_min is 0, one of 1 to 6, or 7.

If the value of k_min is 0, the grouping unit 6 proceeds from step Sa4 to step Sa5. In step Sa5, the grouping unit 6 groups sub-block 0 as a first group and sub-blocks 1 to 7 as a second group.

If the value of k_min is one of 1 to 6, the grouping unit 6 proceeds from step Sa4 to step Sa6. In step Sa6, the grouping unit 6 groups sub-blocks 0 to k_min - 1 as a first group, sub-block k_min as a second group, and sub-blocks k_min + 1 to 7 as a third group.

If the value of k_min is 7, the grouping unit 6 proceeds from step Sa4 to step Sa7. In step Sa7, the grouping unit 6 groups sub-blocks 0 to 6 as a first group and sub-block 7 as a second group.

In short, the grouping unit 6 makes the sub-block with the flattest spectral shape a group of its own and gathers each run of the remaining consecutive sub-blocks into one group. For example, if the spectral data of sub-block 3 has the flattest spectral shape, the grouping unit 6 groups the sub-blocks as {0,1,2}, {3}, {4,5,6,7}.
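Steps Sa2 to Sa7 can be sketched as a single function. This is an illustrative sketch rather than the device's actual implementation; the function name `group_subblocks` and the list-of-lists return format are assumptions made for the example.

```python
def group_subblocks(sfm):
    """Isolate the sub-block with the largest flatness measure and gather
    the sub-blocks before it and after it into one group each."""
    n = len(sfm)
    # Steps Sa2 and Sa3: index of the largest measure (the text calls it k_min).
    k = max(range(n), key=lambda i: sfm[i])
    groups = []
    if k > 0:                      # sub-blocks 0 .. k-1 (absent when k == 0, step Sa5)
        groups.append(list(range(0, k)))
    groups.append([k])             # the isolated sub-block
    if k < n - 1:                  # sub-blocks k+1 .. n-1 (absent when k == n-1, step Sa7)
        groups.append(list(range(k + 1, n)))
    return groups


print(group_subblocks([0.1, 0.2, 0.3, 0.9, 0.2, 0.1, 0.1, 0.1]))
```

For the example measures, where sub-block 3 is the flattest, the result corresponds to the grouping {0,1,2}, {3}, {4,5,6,7} given in the text.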

The sub-blocks grouped into the same group in this way share side information such as scale factors.

The more transient a signal is, the flatter its spectrum becomes. The sub-block with the flattest spectral shape, that is, the sub-block with the largest spectral flatness measure, is therefore the sub-block in which the influence of the transient signal appears most strongly. The grouping of this embodiment thus reflects the influence of the transient signal correctly and groups the sub-blocks properly. As a result, side information is shared appropriately and the compression ratio can be improved.
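The link between transients and flat spectra can be illustrated with a small experiment. The sketch below assumes the flatness measure is the geometric mean divided by the arithmetic mean of the power values, and uses a naive DFT power spectrum as a stand-in for the MDCT of the encoder; the signal length, the test signals, and the small `eps` guard are all choices made for the example.

```python
import math


def spectral_flatness(power, eps=1e-12):
    """Geometric mean over arithmetic mean of the power values (assumed form)."""
    vals = [p + eps for p in power]          # eps avoids log(0) for empty bins
    arithmetic = sum(vals) / len(vals)
    geometric = math.exp(sum(math.log(v) for v in vals) / len(vals))
    return geometric / arithmetic


def power_spectrum(signal):
    """Naive DFT power spectrum over the first half of the bins."""
    n = len(signal)
    out = []
    for k in range(n // 2):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(signal))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(signal))
        out.append(re * re + im * im)
    return out


n = 64
click = [1.0 if t == 20 else 0.0 for t in range(n)]          # transient-like signal
tone = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]  # stationary tone

sfm_click = spectral_flatness(power_spectrum(click))
sfm_tone = spectral_flatness(power_spectrum(tone))
print(sfm_click, sfm_tone)
```

The single click spreads its energy evenly over all bins, so its measure is close to 1, whereas the tone concentrates its energy in one bin and yields a measure close to 0.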

In the example shown in FIG. 3, for instance, the transient signal affects the MDCT for window W3 more than the MDCT for window W2, and the spectral shape of sub-block 3, corresponding to window W3, becomes flat. As a result, proper grouping such as that in the specific example above is obtained.

The spectral flatness measure is sometimes calculated for processing other than grouping. In that case, reusing the spectral flatness measure calculated for that other processing allows grouping to be realized with simpler processing.

This embodiment can be modified in various ways, as follows.
When the second or third largest spectral flatness measure exceeds a threshold, grouping may be performed so that the sub-blocks for which those measures were calculated are also made independent.
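This variant can be sketched as follows. The sketch isolates the flattest sub-block plus every other sub-block whose measure exceeds the threshold, which follows the second aspect of the invention rather than stopping at the third largest value; the function name and the threshold values are hypothetical.

```python
def group_with_threshold(sfm, threshold):
    """Isolate the flattest sub-block and any other sub-block whose measure
    exceeds `threshold`; gather each remaining consecutive run into a group."""
    n = len(sfm)
    k_top = max(range(n), key=lambda i: sfm[i])
    isolated = {k_top} | {i for i in range(n) if sfm[i] > threshold}

    groups, run = [], []
    for i in range(n):
        if i in isolated:
            if run:                 # close the run collected so far
                groups.append(run)
                run = []
            groups.append([i])      # isolated sub-block forms its own group
        else:
            run.append(i)
    if run:
        groups.append(run)
    return groups


print(group_with_threshold([0.1, 0.1, 0.8, 0.9, 0.1, 0.1, 0.1, 0.1], 0.5))
```

Here sub-blocks 2 and 3 both exceed the hypothetical threshold of 0.5, so each becomes its own group and the remaining runs {0,1} and {4,5,6,7} are gathered around them.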

Even for a frame that would be determined to be a long block based on perceptual entropy, a technique is sometimes adopted that converts it to a short block if both the preceding and following frames are determined to be short blocks. In this case the frame converted from a long block to a short block contains no transient signal, so it is preferable to gather all eight sub-blocks into a single group rather than splitting them into groups. Side information can then be shared by all the sub-blocks, which improves the compression ratio. To handle such frames, which are treated as short blocks without containing a transient signal, one may, for example, monitor the result of the conversion from long block to short block, or monitor whether the largest spectral flatness measure exceeds a threshold.
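The threshold-monitoring approach mentioned above might look like the sketch below. The function name, the threshold value, and the treatment of a measure exactly equal to the threshold are assumptions; the source only states that the largest measure is compared with a threshold.

```python
def group_or_merge(sfm, threshold):
    """Return one group of all sub-blocks when no sub-block looks transient,
    otherwise isolate the flattest sub-block as in the main embodiment."""
    n = len(sfm)
    k = max(range(n), key=lambda i: sfm[i])
    if sfm[k] <= threshold:
        # Largest measure does not exceed the threshold: treat the frame as
        # free of transients and share side information across all sub-blocks.
        return [list(range(n))]
    groups = [list(range(0, k)), [k], list(range(k + 1, n))]
    return [g for g in groups if g]   # drop empty groups when k is 0 or n-1


print(group_or_merge([0.1] * 8, 0.5))
print(group_or_merge([0.1, 0.1, 0.1, 0.9, 0.1, 0.1, 0.1, 0.1], 0.5))
```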

If the number of sub-blocks contained in one group must be limited, grouping may be performed so as to create more groups. For example, if one group may contain at most three sub-blocks, the sub-blocks in the specific example of the above embodiment, which would otherwise be grouped as {0,1,2}, {3}, {4,5,6,7}, are grouped as {0,1,2}, {3}, {4,5,6}, {7}.
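One way to apply such a limit is to split each oversized group from its start into chunks of the permitted size. This splitting rule is an assumption (the source gives only the one example), and the function name `limit_group_size` is hypothetical.

```python
def limit_group_size(groups, max_size):
    """Split every group into consecutive chunks of at most `max_size` sub-blocks."""
    limited = []
    for group in groups:
        for start in range(0, len(group), max_size):
            limited.append(group[start:start + max_size])
    return limited


print(limit_group_size([[0, 1, 2], [3], [4, 5, 6, 7]], 3))
```

With a limit of three, the group {4,5,6,7} is split into {4,5,6} and {7}, matching the example in the text.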

The present invention is not limited to the above embodiment as it stands; at the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the above embodiment. For example, some constituent elements may be removed from the full set shown in the embodiment.

FIG. 1: Block diagram of the audio encoding device according to the embodiment of the present invention.
FIG. 2: Diagram showing the operation flow of the grouping unit 6 in FIG. 1.
FIG. 3: Diagram explaining the conventional method of determining the position of a transient signal.

Explanation of symbols

1: block cutout unit, 2: psychoacoustic model unit, 3: filter bank unit, 4: switch, 5: flatness measure calculation unit, 6: grouping unit, 7: quantization distortion/rate control unit, 8: host processor, 9: scaling unit, 10: quantization unit, 11: encoding unit, 12: formatter.

Claims

An audio encoding device that encodes an audio signal using an audio encoding method in which the audio signal is subjected to a time-frequency transform in units of sub-blocks constituting one frame to obtain spectral data, and in which the sub-blocks contained in one frame are grouped so that side information is shared among the plurality of sub-blocks belonging to the same group, the device comprising:
calculating means for calculating, based on the spectral data, the flatness of the spectrum for each sub-block;
selecting means for selecting, as a single block, the sub-block for which the largest flatness was calculated among the sub-blocks contained in the one frame; and
grouping means for grouping the sub-blocks contained in the one frame so as to form a group consisting of the single block and a group containing at least two consecutive sub-blocks among the sub-blocks other than the single block.

An audio encoding device that encodes an audio signal using an audio encoding method in which the audio signal is subjected to a time-frequency transform in units of sub-blocks constituting one frame to obtain spectral data, and in which the sub-blocks contained in one frame are grouped so that side information is shared among the plurality of sub-blocks belonging to the same group, the device comprising:
calculating means for calculating, based on the spectral data, the flatness of the spectrum for each sub-block;
selecting means for selecting, as a single block, each sub-block whose flatness calculated by the calculating means exceeds a predetermined threshold; and
grouping means for making each single block an independent group and forming a group containing at least two consecutive sub-blocks among the sub-blocks of the one frame other than the single blocks.

The audio encoding device according to claim 1, wherein the grouping means performs the grouping so as to form a group containing all of the consecutive sub-blocks among the sub-blocks other than the single block.

The audio encoding device according to claim 1 or 2, wherein, for a frame whose block length for the orthogonal transform process was determined to be a long block based on perceptual entropy and which was converted to a short block in accordance with the preceding and following frames, the grouping means forms a group consisting of all sub-blocks contained in that frame.

The audio encoding device according to claim 1 or 2, wherein, when the largest of the spectral flatness values calculated for the sub-blocks by the calculating means is less than a predetermined threshold, the grouping means forms a group consisting of all sub-blocks contained in the one frame.