JP2006003580A

JP2006003580A - Device and method for coding audio signal

Info

Publication number: JP2006003580A
Application number: JP2004179321A
Authority: JP
Inventors: Kiyotaka Nagai; 清隆永井
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2004-06-17
Filing date: 2004-06-17
Publication date: 2006-01-05

Abstract

PROBLEM TO BE SOLVED: To reduce the bit rate by appropriately grouping the frequency data of multi-channel audio signals in the time direction or the frequency direction and coding them efficiently.

SOLUTION: Analysis filter banks 100 and 101 receive the left- and right-channel input audio signals and convert them into time series of frequency data. Intensity calculation units 110 and 111 compute the intensity of the frequency data. A similarity determination unit 130 determines the degree of similarity between the channels. When the similarity determination unit 130 judges the similarity to be high, a grouping unit 122 groups the frequency data in the time direction or the frequency direction in common for the channels, based on the average of the frequency-data intensities of the channels. Encoding units 150 and 151 encode the frequency data based on the grouping.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to an audio signal encoding apparatus and an audio signal encoding method that convert the input audio signals of a plurality of channels into time series of frequency data, group the frequency data in the time direction or the frequency direction in common across the channels, and thereby encode the audio signals efficiently.

In recent years, methods have been proposed that convert the input audio signals of a plurality of channels into time series of frequency data and group the frequency data, or its envelope data, in the time direction or the frequency direction in common for the channels, thereby reducing the amount of grouping information and improving coding efficiency.

One known proposal of this kind is AAC (Advanced Audio Coding) of MPEG (Moving Picture Experts Group), described in Non-Patent Document 1 and Non-Patent Document 2. AAC converts the input signal into frequency data (MDCT coefficients) by the MDCT (Modified Discrete Cosine Transform). Two MDCT block lengths are available, a long block and a short block, and the block length suited to the nature of the input signal is selected. For a transient input signal, the short block is selected.

In the AAC standard, one short block consists of 128 frequency data values, and eight blocks are encoded as one frame. When short blocks are encoded, temporally consecutive short blocks are grouped, and quantization and encoding are performed group by group. The AAC standard divides each frame into at least one and at most eight groups. Taking the scale factor to be the quantization step size of the frequency data for each band, the same scale factors are shared within a group. This reduces the number of bits needed for scale factors and improves coding efficiency.
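The saving from grouping can be illustrated with a small count. The sketch below is not taken from the AAC specification; the function name and the example partition are hypothetical, and it only counts how many sets of scale factors a frame of eight short blocks needs once the blocks are grouped.

```python
def count_scale_factor_sets(group_lengths, blocks_per_frame=8):
    """Return how many scale-factor sets one frame needs.

    group_lengths lists how many consecutive short blocks each group spans.
    """
    if any(n < 1 for n in group_lengths):
        raise ValueError("every group must contain at least one block")
    if sum(group_lengths) != blocks_per_frame:
        raise ValueError("groups must cover the frame exactly")
    # One set of scale factors is shared by all blocks in a group.
    return len(group_lengths)


# Hypothetical frame: block 0, blocks 1-3 and blocks 4-7 form three groups.
sets_grouped = count_scale_factor_sets([1, 3, 4])
# Without grouping, every short block carries its own set.
sets_ungrouped = count_scale_factor_sets([1] * 8)
```

With this partition the frame carries three sets of scale factors instead of eight.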

In addition, when the input signal is a stereo signal and the common window flag is on, the left and right channels share the same grouping, which cuts the grouping information to that of a single channel and further improves coding efficiency.

Another example is SBR (Spectral Band Replication), described in Patent Document 1, Non-Patent Document 3, and Non-Patent Document 4. An SBR encoder encodes the low-frequency-band signal obtained by removing the high frequency band from the input audio signal, and also encodes envelope data for the removed high frequency band. An SBR decoder reproduces the low-band frequency data by decoding the low-band encoded data. It then copies the reproduced low-band frequency data to the high frequency band and adjusts the gain of the high-band envelope according to the decoded envelope data, thereby restoring the high-band frequency data. Combining the frequency data of the two bands reproduces the input audio signal.
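The copy-and-scale idea behind the decoder can be sketched in a few lines. This is a greatly simplified illustration, not the standardized SBR tool: it assumes a single band, treats the envelope as one target energy value, and the function name `replicate_high_band` is hypothetical.

```python
import math


def replicate_high_band(low_band, target_energy):
    """Copy low-band coefficients upward and scale them to a target energy.

    Energy is taken here as the sum of squared coefficients (an assumption).
    """
    energy = sum(c * c for c in low_band)
    # Gain that makes the copied band's energy equal to the decoded envelope.
    gain = math.sqrt(target_energy / energy) if energy > 0 else 0.0
    return [gain * c for c in low_band]


low = [3.0, 4.0]                          # decoded low-band data, energy 25
high = replicate_high_band(low, 100.0)    # envelope asks for energy 100
full_spectrum = low + high                # combine the two bands
```

Here the gain works out to 2, so the restored high band is the low band doubled.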

SBR improves coding efficiency by grouping the envelope data of the removed high frequency band in the time direction and the frequency direction before encoding it (see, for example, Patent Document 1 and Non-Patent Document 4). In addition, when the input signal is a stereo signal and the coupling mode flag is on, the left and right channels share the same grouping, which reduces the grouping information and further improves coding efficiency.

The prior art documents cited above do not describe any specific method or apparatus for grouping frequency data or envelope data in common across a plurality of channels and encoding it efficiently.

To clarify the problem addressed by the present invention, a two-channel audio signal encoding apparatus that is regarded as prior art, although not known from any published document, is described here as a conventional example. FIG. 3 is a block diagram showing the configuration of the conventional two-channel audio signal encoding apparatus. The apparatus comprises analysis filter banks 100 and 101, intensity calculation units 110 and 111, grouping units 120 and 121, encoding units 150 and 151, a multiplexing unit 160, and a grouping match determination unit 190. Its operation is described below.

The time-domain audio signal of the left channel (Lch) input to the analysis filter bank 100 is converted into a time series of frequency data. Likewise, the time-domain audio signal of the right channel (Rch) input to the analysis filter bank 101 is converted into a time series of frequency data.

The intensity calculation units 110 and 111 calculate, for their respective channels, the intensity of the frequency data in each encoding processing unit before grouping. An encoding processing unit is the unit in which the frequency data is quantized and encoded; a common quantization step size is used for all frequency data belonging to the same encoding processing unit.

FIG. 4(a) shows the encoding processing units of one frame before grouping. In FIG. 4, one frame before grouping consists of 64 encoding processing units in total, eight in the time direction by eight in the frequency direction. The intensity calculation units 110 and 111 calculate and output the intensity of the frequency data of each encoding processing unit in their respective channels as the sum of squares of all frequency data belonging to that unit.
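The intensity calculation can be sketched as follows. The nested-list layout `frame[t][b]` (time slot `t`, band `b`, each holding a list of coefficients) and the function name are assumptions made for illustration, and a 2 x 2 frame stands in for the 8 x 8 frame of FIG. 4(a).

```python
def unit_intensities(frame):
    """Return intensity[t][b], the sum of squares of the coefficients
    in the encoding processing unit at time slot t and band b."""
    return [[sum(c * c for c in unit) for unit in row] for row in frame]


# Hypothetical frame with 2 time slots and 2 bands, 2 coefficients per unit.
frame = [
    [[1.0, 2.0], [0.5, 0.5]],
    [[3.0, 0.0], [1.0, 1.0]],
]
intensity = unit_intensities(frame)
```

The resulting grid of intensities is the only input the grouping stage needs; the coefficients themselves are not consulted again until quantization.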

The grouping units 120 and 121 each group their channel in the time direction and the frequency direction based on the intensity of the frequency data of the encoding processing units, and output information about the grouping (hereinafter, grouping information). Grouping in the time direction is performed first, based on the amount of change in frequency-data intensity along the time direction. The intensities of the encoding processing units belonging to the same time slot are summed in the frequency direction to obtain the total intensity for that time slot.

Next, the ratio of the total intensities of encoding processing units adjacent in the time direction is calculated and taken as the amount of change in the time direction. If the ratio is less than 1, its reciprocal is used instead so that the amount of change is 1 or more. When the amount of change is at or below a predetermined threshold, the temporally adjacent encoding processing units are merged into one group, which becomes a new encoding processing unit. Repeating this process along the time direction yields the time-direction grouping.
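A minimal sketch of the time-direction grouping described above is given here. It rests on several assumptions not stated in the text: all intensities are strictly positive, each comparison is made between neighbouring original time slots rather than against the running total of a merged group, and the threshold value 2.0 is hypothetical.

```python
def change_amount(a, b):
    """Ratio of two positive intensities, inverted if needed so it is >= 1."""
    ratio = a / b
    return ratio if ratio >= 1.0 else 1.0 / ratio


def group_time(intensity, threshold):
    """Group time slots of intensity[t][b]; returns lists of slot indices."""
    # Sum each time slot over the frequency direction.
    totals = [sum(row) for row in intensity]
    groups = [[0]]
    for t in range(1, len(totals)):
        if change_amount(totals[t - 1], totals[t]) <= threshold:
            groups[-1].append(t)   # small change: merge with previous slot
        else:
            groups.append([t])     # large change: start a new group
    return groups


intensity = [[4.0, 4.0], [5.0, 5.0], [40.0, 40.0], [38.0, 42.0]]
groups = group_time(intensity, threshold=2.0)
```

In this example the totals are 8, 10, 80 and 80, so the only boundary falls between the second and third time slots.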

Grouping in the frequency direction is performed next. Two kinds of frequency-direction grouping are used: high frequency resolution and low frequency resolution. High frequency resolution uses the encoding processing units as they are before frequency-direction grouping. Low frequency resolution merges several encoding processing units in the frequency direction into a new encoding processing unit. Frequency-direction grouping is applied to the encoding processing units that result from time-direction grouping and is based on the amount of change in frequency-data intensity along the frequency direction. The ratio of the intensities of encoding processing units adjacent in the frequency direction is calculated as the amount of change; if the ratio is less than 1, its reciprocal is used so that the value is 1 or more.

Then, within each low-frequency-resolution encoding processing unit, if the amount of change in frequency-data intensity between its high-frequency-resolution encoding processing units exceeds a predetermined threshold, high frequency resolution is selected; otherwise low frequency resolution is selected. The grouping units 120 and 121 output the time-direction and frequency-direction grouping information calculated in this way for their respective channels.
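The resolution decision can be sketched along the same lines. The text does not say how many high-resolution units make up one low-resolution unit, so the sketch assumes pairs (`merge=2`), strictly positive intensities, and a hypothetical threshold of 2.0; the function names are illustrative only.

```python
def change_amount(a, b):
    """Ratio of two positive intensities, inverted if needed so it is >= 1."""
    ratio = a / b
    return ratio if ratio >= 1.0 else 1.0 / ratio


def choose_resolution(band_intensity, threshold, merge=2):
    """Decide 'high' or 'low' for each candidate low-resolution unit.

    band_intensity holds the high-resolution band intensities of one
    time group; every run of `merge` bands is one candidate unit.
    """
    decisions = []
    for start in range(0, len(band_intensity), merge):
        unit = band_intensity[start:start + merge]
        changes = [change_amount(unit[i], unit[i + 1])
                   for i in range(len(unit) - 1)]
        # Keep the fine resolution only where the spectrum changes sharply.
        decisions.append("high" if any(c > threshold for c in changes) else "low")
    return decisions


bands = [10.0, 12.0, 3.0, 30.0, 7.0, 7.0, 1.0, 1.5]
decisions = choose_resolution(bands, threshold=2.0)
```

Only the second pair (3 and 30) changes by more than the threshold, so it alone keeps the high resolution.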

FIG. 4(b) shows an example of the encoding processing units of one frame after grouping in the time direction and the frequency direction. The encoding units 150 and 151 of FIG. 3 form the grouped encoding processing units according to the grouping information from the grouping units 120 and 121, respectively, quantize and encode the frequency data of their channels in those units, and output the encoded frequency data.

The grouping match determination unit 190 determines whether the left-channel grouping information output from the grouping unit 120 and the right-channel grouping information output from the grouping unit 121 match completely. If they match, the grouping match determination unit 190 turns on and outputs a common group flag indicating that the left and right channels share a common grouping.

The multiplexing unit 160 multiplexes the encoded frequency data from the encoding units 150 and 151, the grouping information from the grouping units 120 and 121, and the common group flag from the grouping match determination unit 190, and outputs the result as encoded data. When the common group flag is on, the grouping information of the left and right channels is identical, so only the grouping information from, for example, the grouping unit 120 is multiplexed into the encoded data.

Patent Document 1: JP 2003-529787 A (published Japanese translation of a PCT application)
Non-Patent Document 1: Bosi and nine others, "ISO/IEC MPEG-2 Advanced Audio Coding", J. Audio Eng. Soc., Vol. 45, No. 10, October 1997, pp. 789-814
Non-Patent Document 2: ISO/IEC 13818-7, "Information technology - Generic coding of moving pictures and associated audio information, Part 7: Advanced Audio Coding (AAC)", 1997
Non-Patent Document 3: Martin and three others, "Spectral Band Replication, a novel approach in audio coding", 112th AES Convention, May 2002, Paper No. 5553
Non-Patent Document 4: ISO/IEC MPEG 14496-3:2001/FDAM1, "Information technology - Coding of audio-visual objects, Part 3: Audio, Amendment 1: Bandwidth Extension", Document N5570, March 2003

However, the conventional two-channel audio signal encoding apparatus can use a common grouping for the two channels only when the groupings of both channels match completely. Common grouping therefore becomes possible only infrequently, and the resulting gain in coding efficiency is small.
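The exact-match rule of the conventional apparatus, and why it rarely fires, can be illustrated briefly. Representing grouping information as lists of time-slot indices is an assumption made for this sketch, as are the function name and the example groupings.

```python
def common_group_flag(left_groups, right_groups):
    """Conventional rule: the flag is on only when both groupings are identical."""
    return left_groups == right_groups


left = [[0, 1], [2, 3, 4, 5], [6, 7]]
right_same = [[0, 1], [2, 3, 4, 5], [6, 7]]
right_near = [[0, 1], [2, 3, 4], [5, 6, 7]]   # one boundary shifted by a slot

flag_same = common_group_flag(left, right_same)
flag_near = common_group_flag(left, right_near)
```

A single boundary that differs by one time slot is enough to turn the flag off, in which case both channels must transmit their own grouping information.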

Moreover, because grouping is performed independently for each channel based on that channel's frequency-data intensity, the method is poorly suited to producing a grouping common to the two channels.

The present invention solves these problems. Its object is to realize an audio signal encoding apparatus for multi-channel audio signals that groups the frequency data of the input audio signals in the time direction or the frequency direction appropriately, according to the intensity and similarity of the frequency data in the channels, thereby improving coding efficiency and, in particular, sound quality at low bit rates.

To solve this problem, an audio signal encoding apparatus of the present invention comprises: an analysis filter bank that receives audio signals of a plurality of channels and converts them into time series of frequency data; an intensity calculation unit that calculates the intensity of the frequency data in the plurality of channels; a similarity determination unit that determines the similarity of the plurality of channels based on the intensity of the frequency data in the plurality of channels; a grouping unit that, when the similarity determination unit determines the similarity to be high, performs at least one of time-direction and frequency-direction grouping of the frequency data in common for the plurality of channels, based on the average intensity of the frequency data in the plurality of channels; and an encoding unit that encodes the frequency data of the plurality of channels, or its envelope data, based on the grouping.

To solve this problem, another audio signal encoding apparatus of the present invention receives audio signals of a plurality of channels, generates encoded data for a signal obtained by removing a predetermined frequency band from the input audio signals, generates envelope data for the signal of the removed frequency band, and multiplexes the envelope data with the encoded data for transmission or storage. The apparatus comprises: an analysis filter bank that receives the audio signals of the plurality of channels and converts them into time series of frequency data; an intensity calculation unit that calculates the intensity of the frequency data of the removed frequency band in the plurality of channels; a similarity determination unit that determines the similarity of the plurality of channels based on the intensity of the frequency data of the removed frequency band in the plurality of channels; a grouping unit that, when the similarity determination unit determines the similarity to be high, performs at least one of time-direction and frequency-direction grouping of the frequency data of the removed frequency band in common for the plurality of channels, based on the average intensity of the frequency data of the removed frequency band; and an encoding unit that encodes, based on the grouping, the envelope data of the frequency data of the removed frequency band of the plurality of channels.

Here, the intensity of the frequency data may be calculated as a power of the absolute value of the frequency data.
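One way to read this option is as a sum of powers of absolute values with a selectable exponent. The sketch below assumes that reading; the function name and the `exponent` parameter are hypothetical, and an exponent of 2 reduces to a sum of squares.

```python
def unit_intensity(coeffs, exponent=2.0):
    """Intensity of one encoding processing unit as a sum of |c| ** exponent."""
    return sum(abs(c) ** exponent for c in coeffs)


coeffs = [3.0, -4.0]
energy = unit_intensity(coeffs)                   # exponent 2: sum of squares
magnitude = unit_intensity(coeffs, exponent=1.0)  # exponent 1: sum of magnitudes
```

A smaller exponent weights large coefficients less heavily, which changes where the grouping thresholds are crossed.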

To solve this problem, an audio signal encoding method of the present invention comprises: a frequency conversion step of receiving audio signals of a plurality of channels and converting them into time series of frequency data; an intensity calculation step of calculating the intensity of the frequency data in the plurality of channels; a similarity determination step of determining the similarity of the plurality of channels based on the intensity of the frequency data in the plurality of channels; a grouping step of performing, when the similarity determination step determines the similarity to be high, at least one of time-direction and frequency-direction grouping of the frequency data in common for the plurality of channels, based on the average intensity of the frequency data in the plurality of channels; and an encoding step of encoding the frequency data of the plurality of channels, or its envelope data, based on the grouping.

To solve this problem, another audio signal encoding method of the present invention is a stereo audio signal encoding method that receives audio signals of a plurality of channels, generates encoded data for a signal obtained by removing a predetermined frequency band from the input audio signals, generates envelope data for the removed frequency band, and multiplexes the envelope data with the encoded data for transmission or storage. The method comprises: a frequency conversion step of receiving the audio signals of the plurality of channels and converting them into time series of frequency data; an intensity calculation step of calculating the intensity of the frequency data of the removed frequency band of the plurality of channels; a similarity determination step of determining the similarity of the plurality of channels based on the intensity of the frequency data of the removed frequency band of the plurality of channels; a grouping step of performing, when the similarity determination step determines the similarity to be high, at least one of time-direction and frequency-direction grouping of the frequency data of the removed frequency band in common for the plurality of channels, based on the average intensity of the frequency data of the removed frequency band; and an encoding step of encoding, based on the grouping, the envelope data of the frequency data of the removed frequency band of the plurality of channels.

According to the audio signal encoding apparatus and audio signal encoding method of the present invention, when the input audio signals of a plurality of channels are highly similar, the frequency data of the audio signals is grouped in the time direction or the frequency direction in common for the channels, based on the average intensity of the frequency data in the channels. The frequency data, or its envelope data, can therefore be encoded efficiently, and sound quality can be improved particularly at low bit rates.

The best mode for carrying out the present invention is described below with reference to the drawings.
(Embodiment 1)
FIG. 1 is a configuration diagram of a two-channel audio signal encoding apparatus according to Embodiment 1 of the present invention. The apparatus comprises analysis filter banks 100 and 101, intensity calculation units 110 and 111, grouping units 120, 121, and 122, a similarity determination unit 130, a switching unit 140, encoding units 150 and 151, and a multiplexing unit 160.

The analysis filter bank 100 receives the left-channel audio signal and converts it into a time series of frequency data. The analysis filter bank 101 does the same for the right-channel audio signal. The intensity calculation unit 110 calculates the intensity of the frequency data in the left channel, and the intensity calculation unit 111 calculates that in the right channel. The grouping unit 120 groups the left-channel frequency data in the time direction or the frequency direction, and the grouping unit 121 does so for the right channel. The similarity determination unit 130 determines the similarity of the two channels based on the intensity of the frequency data in the left and right channels.

When the similarity determination unit 130 determines the similarity to be high, the grouping unit 122 performs at least one of time-direction and frequency-direction grouping of the left- and right-channel frequency data in common for both channels, based on the average intensity of the frequency data in the two channels. The switching unit 140 switches the grouping information according to the common group flag from the similarity determination unit 130. The encoding unit 150 receives the signal from the analysis filter bank 100 and encodes the left-channel frequency data, or its envelope data, based on the grouping selected by the switching unit 140. The encoding unit 151 receives the signal from the analysis filter bank 101 and encodes the right-channel frequency data, or its envelope data, in the same way. The multiplexing unit 160 multiplexes the encoded frequency data from the encoding units 150 and 151, the grouping information from the switching unit 140, and the common group flag from the similarity determination unit 130, and outputs the result as encoded data.
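The role of the switching unit 140 can be sketched as a simple selection. The function name and return layout are hypothetical, and sending the common grouping information only once when the flag is on is an assumption carried over from the conventional multiplexing behaviour rather than something stated for this embodiment.

```python
def switch_grouping(common_flag, left_info, right_info, common_info):
    """Select grouping information for each encoder and for the multiplexer.

    Returns (info for encoding unit 150, info for encoding unit 151,
    list of grouping information to multiplex).
    """
    if common_flag:
        # Both encoders use the shared grouping; it is sent only once.
        return common_info, common_info, [common_info]
    # Otherwise each channel keeps, and transmits, its own grouping.
    return left_info, right_info, [left_info, right_info]


left_info = [[0], [1, 2, 3]]
right_info = [[0, 1], [2, 3]]
common_info = [[0, 1], [2, 3]]

shared = switch_grouping(True, left_info, right_info, common_info)
separate = switch_grouping(False, left_info, right_info, common_info)
```

The grouping units 120, 121, and 122 can all run unconditionally under this arrangement, with the flag deciding afterwards which result is used.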

The operation of the two-channel audio signal encoding apparatus configured as above is described below. The input time-domain audio signal of the left channel is converted into a time series of frequency data by the analysis filter bank 100. Likewise, the input audio signal of the right channel is converted into a time series of frequency data by the analysis filter bank 101.

The intensity calculation units 110 and 111 calculate and output, for their respective channels, the intensity of the frequency data in each predetermined encoding processing unit. In the following description, an encoding processing unit means the unit in which the frequency data is quantized and encoded; a common quantization step size is used for the frequency data belonging to the same encoding processing unit. The intensity calculation units 110 and 111 calculate the intensity of each encoding processing unit in their respective channels as a sum of powers of all frequency data belonging to the unit, here the sum of squares.

The grouping units 120 and 121 group their respective channels in the time direction and the frequency direction based on the intensity of the frequency data of the encoding processing units, and output information about the grouping (hereinafter, grouping information). The grouping units 120 and 121 first perform grouping in the time direction, based on the amount of change in frequency-data intensity along the time direction. They sum the intensities of the encoding processing units belonging to the same time slot in the frequency direction to obtain the total intensity for that time slot. They then calculate the ratio of the total intensities of encoding processing units adjacent in the time direction and take it as the amount of change in the time direction. If the ratio is less than 1, its reciprocal is used instead so that the amount of change is 1 or more. When the amount of change is at or below a predetermined threshold, the grouping units 120 and 121 merge the temporally adjacent encoding processing units into one group, as shown in FIG. 4(b), which becomes a new encoding processing unit. Repeating this process along the time direction yields the time-direction grouping.

Next, the grouping units 120 and 121 perform grouping in the frequency direction. In the present embodiment, there are two kinds of grouping in the frequency direction: high frequency resolution and low frequency resolution. High frequency resolution refers to the encoding processing units as they are before grouping in the frequency direction. Low frequency resolution refers to new encoding processing units formed by merging several high-frequency-resolution encoding processing units in the frequency direction. Grouping in the frequency direction is applied to the encoding processing units obtained after grouping in the time direction, and is based on the amount of change, in the frequency direction, of the intensity of the frequency data of the encoding processing units. The ratio of the intensities of encoding processing units adjacent in the frequency direction is calculated as the amount of change. If the ratio is less than 1, however, its reciprocal is used instead, so that the amount of change is always 1 or more.

Next, within each low-frequency-resolution encoding processing unit, if the amount of change in the intensity of the frequency data between the high-frequency-resolution encoding processing units is greater than a predetermined threshold, high frequency resolution is selected; otherwise, low frequency resolution is selected.
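One possible reading of this decision is sketched below. It assumes that every low-frequency-resolution unit merges a fixed number of high-frequency-resolution bands (`bands_per_low_unit`) and that the decision is made separately for each such unit; the specification fixes neither detail, and all names are hypothetical.

```python
from typing import List, Sequence


def change_amount(a: float, b: float, eps: float = 1e-12) -> float:
    """Ratio of two intensities, inverted when below 1 so the result is >= 1."""
    ratio = (a + eps) / (b + eps)  # eps is an added guard against division by zero
    return ratio if ratio >= 1.0 else 1.0 / ratio


def choose_resolution(band_intensity: Sequence[float],
                      bands_per_low_unit: int,
                      threshold: float) -> List[str]:
    """Return 'high' or 'low' for each candidate low-resolution unit."""
    decisions = []
    for start in range(0, len(band_intensity), bands_per_low_unit):
        block = band_intensity[start:start + bands_per_low_unit]
        # Largest change between neighbouring high-resolution bands in this block.
        max_change = max(
            (change_amount(block[i], block[i + 1]) for i in range(len(block) - 1)),
            default=1.0,
        )
        decisions.append("high" if max_change > threshold else "low")
    return decisions


decisions = choose_resolution([1.0, 1.1, 1.0, 8.0], bands_per_low_unit=2, threshold=2.0)
print(decisions)
```

The first pair of bands is nearly flat and can share one low-resolution unit, while the second pair differs by a factor of eight and keeps high resolution.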

The grouping units 120 and 121 output the grouping information in the time direction and the frequency direction calculated as described above for their respective channels. The grouping unit 122 first calculates the average of the intensities of the frequency data in the left and right channels. Next, based on this average intensity, the grouping unit 122 performs grouping in the same manner as the grouping units 120 and 121, and outputs grouping information common to the two channels. Because the grouping unit 122 performs the common grouping based on the average intensity of the frequency data in the left and right channels, the common grouping reflects any difference in intensity between the left and right channels.
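The averaging step performed before the common grouping can be sketched as below. The grid layout (`[time slot][band]`) and the function name are assumptions for illustration; the averaged grid would then be fed to the same grouping procedure used for a single channel.

```python
from typing import List, Sequence


def average_intensity(left: Sequence[Sequence[float]],
                      right: Sequence[Sequence[float]]) -> List[List[float]]:
    """Per-unit mean of the left and right intensity grids (assumed equal shapes)."""
    return [[(l + r) / 2.0 for l, r in zip(l_slot, r_slot)]
            for l_slot, r_slot in zip(left, right)]


avg = average_intensity([[4.0, 0.0], [2.0, 2.0]], [[0.0, 2.0], [2.0, 6.0]])
print(avg)
```

A unit that is strong in only one channel still contributes to the averaged grid, which is how a difference between the channels can influence the common grouping.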

The similarity determination unit 130 receives the intensities of the frequency data in the left and right channels from the intensity calculation units 110 and 111, and calculates the normalized correlation coefficient between the left-channel and right-channel intensities over the entire encoding processing frame. If the normalized correlation coefficient is greater than a predetermined threshold, the similarity between the frequency data of the two channels is determined to be high. The similarity determination unit 130 then turns on and outputs a common group flag indicating that the grouping is performed in common for the left and right channels. In the AAC standard described in Non-Patent Document 2, this common group flag is called the common window flag.
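The similarity test can be sketched as follows. The specification does not give the formula for the normalized correlation coefficient; this example assumes the uncentered form (inner product divided by the product of the Euclidean norms). The threshold value of 0.9 and the function names are likewise hypothetical.

```python
import math
from typing import Sequence


def normalized_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Uncentered normalized correlation (an assumed definition)."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den > 0.0 else 0.0


def common_group_flag(left: Sequence[Sequence[float]],
                      right: Sequence[Sequence[float]],
                      threshold: float = 0.9) -> bool:
    """True when the intensity grids of the two channels are judged similar."""
    flat_left = [v for slot in left for v in slot]    # whole frame, all units
    flat_right = [v for slot in right for v in slot]
    return normalized_correlation(flat_left, flat_right) > threshold


similar = common_group_flag([[1.0, 2.0], [3.0, 4.0]], [[2.0, 4.0], [6.0, 8.0]])
different = common_group_flag([[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]])
print(similar, different)
```

With this definition, a channel that is a scaled copy of the other yields a coefficient of 1, so a level difference alone does not prevent common grouping.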

The switching unit 140 switches the grouping information based on the common group flag from the similarity determination unit 130. When the common group flag is on, the switching unit 140 selects the grouping information common to the left and right channels from the grouping unit 122 and outputs it to the encoding units 150 and 151 and to the multiplexing unit 160. When the common group flag is off, the switching unit 140 outputs the left-channel grouping information from the grouping unit 120 to the encoding unit 150 and the multiplexing unit 160, and outputs the right-channel grouping information from the grouping unit 121 to the encoding unit 151 and the multiplexing unit 160.

The encoding units 150 and 151 form encoding processing units based on the grouping information from the switching unit 140, quantize and encode the frequency data of their respective channels in those encoding processing units, and output the encoded frequency data to the multiplexing unit 160. The multiplexing unit 160 multiplexes the encoded frequency data from the encoding units 150 and 151, the grouping information from the switching unit 140, and the common group flag from the similarity determination unit 130, and outputs the result as encoded data. When the common group flag is on, the grouping information of the left and right channels is identical, so the grouping information of only one channel is multiplexed into the encoded data.
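The selection of grouping information and its effect on the multiplexed data can be sketched as below. The dictionary layout, field names, and placeholder values are hypothetical and serve only to show that a single set of grouping information is carried when the flag is on; they do not represent an actual bitstream syntax.

```python
from typing import Any, Dict


def select_and_multiplex(common_group_flag: bool,
                         grouping_left: Any,
                         grouping_right: Any,
                         grouping_common: Any,
                         coded_left: bytes,
                         coded_right: bytes) -> Dict[str, Any]:
    """Pick the grouping information by flag and collect the fields to multiplex."""
    if common_group_flag:
        groupings = [grouping_common]                # one set shared by both channels
    else:
        groupings = [grouping_left, grouping_right]  # one set per channel
    return {
        "common_group_flag": common_group_flag,
        "grouping_info": groupings,
        "frequency_data": [coded_left, coded_right],
    }


frame_on = select_and_multiplex(True, "L", "R", "LR", b"\x01", b"\x02")
frame_off = select_and_multiplex(False, "L", "R", "LR", b"\x01", b"\x02")
print(len(frame_on["grouping_info"]), len(frame_off["grouping_info"]))
```

The saving in side information comes entirely from the shorter `grouping_info` list in the flag-on case.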

As described above, the audio signal encoding apparatus of Embodiment 1 is provided with the similarity determination unit 130, which determines the similarity of the audio signals in the two channels, and the grouping unit 122, which performs grouping common to the two channels. As a result, when the similarity of the audio signals in the two channels is determined to be high, the grouping of the frequency data in the time direction and the frequency direction is performed in common for the two channels, based on the average intensity of the frequency data in the two channels. The audio signals of the two channels can therefore be encoded efficiently, and sound quality can be improved, particularly at low bit rates. In addition, common grouping occurs more frequently than in the conventional example, which improves the encoding efficiency.

Note that when the frequency data is noise-like, the encoding units 150 and 151 may use a process called PNS (Perceptual Noise Substitution) to encode envelope data representing the average power of the frequency data instead of the frequency data itself. When PNS is used, the frequency data can be reproduced at decoding time by generating random noise data and adjusting its average power based on the envelope data.
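The decoder-side idea behind PNS can be sketched as follows. This is only an illustration of "generate noise, then match its average power to the envelope"; the Gaussian noise source, the function name, and the parameters are assumptions and do not reproduce the PNS tool of any standard.

```python
import math
import random
from typing import List


def pns_reconstruct(avg_power: float, count: int, rng: random.Random) -> List[float]:
    """Generate `count` noise values whose mean square equals `avg_power`."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(count)]
    power = sum(v * v for v in noise) / count
    scale = math.sqrt(avg_power / power) if power > 0.0 else 0.0
    return [v * scale for v in noise]


rebuilt = pns_reconstruct(avg_power=4.0, count=8, rng=random.Random(0))
rebuilt_power = sum(v * v for v in rebuilt) / len(rebuilt)
print(rebuilt_power)
```

Only the single value `avg_power` has to be transmitted for the unit, which is why PNS is attractive for noise-like frequency data.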

Note that, in order to reduce the processing load of the audio signal encoding apparatus, the configuration may be such that the grouping unit 122 operates only when the common group flag from the similarity determination unit 130 is on, and the grouping units 120 and 121 operate only when the common group flag is off.

(Embodiment 2)
Next, an audio signal encoding apparatus according to Embodiment 2 of the present invention will be described. FIG. 2 is a configuration diagram of a two-channel audio signal encoding apparatus according to Embodiment 2. The configuration shown here is one applied to SBR.

This audio signal encoding apparatus comprises analysis filter banks 100 and 101, intensity calculation units 112 and 113, grouping units 123, 124, and 125, a similarity determination unit 131, a switching unit 141, encoding units 152 and 153, a multiplexing unit 161, downsamplers 170 and 171, and low-band encoding units 180 and 181. The operation of the two-channel audio signal encoding apparatus configured in this way is described below.

The downsampler 170 removes a predetermined high frequency band from the input left-channel audio signal and generates a low-frequency-band signal downsampled to half the sampling frequency. The downsampler 171 does the same for the input right-channel audio signal. The low-band encoding units 180 and 181 receive the low-frequency-band audio signals from the downsamplers 170 and 171 of their respective channels, and output high-efficiency encoded data to the multiplexing unit 161.
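The downsampling step can be sketched as a low-pass filter followed by decimation by two. The specification does not state how the high band is removed; the Hamming-windowed sinc filter, the tap count of 31, and the function names used here are assumptions chosen only to make the example self-contained.

```python
import math
from typing import List, Sequence


def lowpass_taps(num_taps: int = 31) -> List[float]:
    """Hamming-windowed sinc with cutoff at a quarter of the input sampling rate."""
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        x = n - mid
        sinc = 0.5 if x == 0 else math.sin(0.5 * math.pi * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalise to unit gain at DC


def downsample_by_two(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Filter with a centred FIR kernel and keep every second output sample."""
    half = len(taps) // 2
    out = []
    for i in range(0, len(signal), 2):
        acc = 0.0
        for k, tap in enumerate(taps):
            j = i + half - k
            if 0 <= j < len(signal):  # samples outside the signal count as zero
                acc += tap * signal[j]
        out.append(acc)
    return out


taps = lowpass_taps()
low_kept = downsample_by_two([1.0] * 64, taps)                       # constant input
high_removed = downsample_by_two([(-1.0) ** n for n in range(64)], taps)  # highest frequency
print(len(low_kept), low_kept[16], abs(high_removed[16]))
```

A constant input passes through essentially unchanged, while a component at the top of the original band is strongly attenuated before the sample rate is halved.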

The analysis filter banks 100 and 101, intensity calculation units 112 and 113, grouping units 123, 124, and 125, similarity determination unit 131, switching unit 141, and encoding units 152 and 153 in FIG. 2 are the blocks that encode the envelope of the high frequency band removed by the downsamplers 170 and 171. These blocks are the same as those in FIG. 1, and their description is omitted. Specifically, the intensity calculation units 112 and 113, the grouping units 123, 124, and 125, the similarity determination unit 131, and the switching unit 141 in FIG. 2 are respectively identical to the intensity calculation units 110 and 111, the grouping units 120, 121, and 122, the similarity determination unit 130, and the switching unit 140 in FIG. 1, except that the frequency band they operate on is the removed high frequency band. Note that in the SBR standard described in Non-Patent Document 4, the common group flag output from the similarity determination unit 131 is called the coupling mode flag.

The encoding units 152 and 153 form encoding processing units based on the grouping information from the switching unit 141 and, for their respective channels, calculate envelope data representing the average power of the frequency data in each encoding processing unit of the removed high frequency band. The encoding units 152 and 153 then quantize and encode the calculated envelope data and output the encoded envelope data to the multiplexing unit 161.
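The envelope calculation can be sketched as an average of squared frequency data over each encoding processing unit defined by the grouping. Representing the grouping as lists of time-slot indices and lists of frequency-bin indices is an assumption for this example, as are the names.

```python
from typing import List, Sequence


def envelope_data(freq_data: Sequence[Sequence[float]],
                  time_groups: Sequence[Sequence[int]],
                  band_groups: Sequence[Sequence[int]]) -> List[List[float]]:
    """Average power per encoding processing unit; freq_data[t][k] is slot t, bin k."""
    envelope = []
    for slots in time_groups:
        row = []
        for bins in band_groups:
            values = [freq_data[t][k] for t in slots for k in bins]
            row.append(sum(v * v for v in values) / len(values))
        envelope.append(row)
    return envelope


high_band = [[1.0, 1.0, 2.0, 2.0],
             [1.0, 1.0, 0.0, 0.0],
             [3.0, 3.0, 3.0, 3.0]]
env = envelope_data(high_band, time_groups=[[0, 1], [2]], band_groups=[[0, 1], [2, 3]])
print(env)
```

Coarser grouping means fewer envelope values to quantize and encode, at the cost of resolution in time or frequency.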

The multiplexing unit 161 multiplexes the low-frequency-band encoded data from the low-band encoding units 180 and 181, the encoded high-frequency-band envelope data from the encoding units 152 and 153, the grouping information of the envelope data from the switching unit 141, and the common group flag from the similarity determination unit 131, and outputs the result as encoded data. When the common group flag is on, the grouping information of the left and right channels is identical, so the grouping information of only one channel is multiplexed into the encoded data.

As described above, for the high frequency band of the audio signals that is not encoded by the low-band encoding units 180 and 181, the audio signal encoding apparatus of Embodiment 2 is provided with the similarity determination unit 131, which determines the similarity of the audio signals in the two channels, and the grouping unit 125, which performs grouping of the frequency data common to the two channels. As a result, when the similarity of the audio signals in the two channels is determined to be high, the grouping of the frequency data in the time direction and the frequency direction is performed in common for the two channels, based on the average intensity of the frequency data in the two channels. The envelope data of the high frequency band of the audio signals in the two channels can therefore be encoded efficiently, and sound quality can be improved, particularly at low bit rates.

In each of the above embodiments, the intensity of the frequency data is calculated as the sum of squares of the frequency data, but it may instead be calculated as the sum of any power of the absolute values of the frequency data. Also, although the input audio signal has two channels in each of the above embodiments, it may have three or more channels.

When the similarity of the frequency data of the audio signals in a plurality of channels is high, the audio signal encoding apparatus and audio signal encoding method according to the present invention perform at least one of grouping in the time direction and grouping in the frequency direction of the frequency data in common for the plurality of channels, so that the audio signals can be encoded efficiently. Because this improves sound quality particularly at low bit rates, the invention is applicable to high-efficiency transmission or storage of audio signals in fields such as broadcasting, communication, and storage. It is especially well suited to portable terminals that deliver content including video and audio signals at a limited bit rate.

FIG. 1 is a configuration diagram of a two-channel audio signal encoding apparatus according to Embodiment 1 of the present invention.
FIG. 2 is a configuration diagram of a two-channel audio signal encoding apparatus according to Embodiment 2 of the present invention.
FIG. 3 is a configuration diagram of a conventional two-channel audio signal encoding apparatus.
FIG. 4(a) shows an example of encoding processing units before grouping, and FIG. 4(b) shows an example of encoding processing units after grouping.

Explanation of symbols

100, 101 Analysis filter bank
110, 111, 112, 113 Intensity calculation unit
120, 121, 122, 123, 124, 125 Grouping unit
130, 131 Similarity determination unit
140, 141 Switching unit
150, 151, 152, 153 Encoding unit
160, 161 Multiplexing unit
170, 171 Downsampler
180, 181 Low-band encoding unit

Claims

1. An audio signal encoding apparatus comprising:
an analysis filter bank that receives audio signals of a plurality of channels and converts them into time series of frequency data;
an intensity calculation unit that calculates the intensity of the frequency data in the plurality of channels;
a similarity determination unit that determines the similarity of the plurality of channels based on the intensity of the frequency data in the plurality of channels;
a grouping unit that, when the similarity determination unit determines that the similarity is high, performs at least one of grouping in the time direction and grouping in the frequency direction of the frequency data in the plurality of channels in common for the plurality of channels, based on the average intensity of the frequency data in the plurality of channels; and
an encoding unit that encodes the frequency data of the plurality of channels, or envelope data thereof, based on the grouping.

2. An audio signal encoding apparatus that receives audio signals of a plurality of channels, generates encoded data for a signal obtained by deleting a predetermined frequency band from the input audio signals, generates envelope data of the signal in the deleted frequency band, and multiplexes the envelope data into the encoded data for transmission or storage, the apparatus comprising:
an analysis filter bank that receives the audio signals of the plurality of channels and converts them into time series of frequency data;
an intensity calculation unit that calculates the intensity of the frequency data of the deleted frequency band in the plurality of channels;
a similarity determination unit that determines the similarity of the plurality of channels based on the intensity of the frequency data of the deleted frequency band in the plurality of channels;
a grouping unit that, when the similarity determination unit determines that the similarity is high, performs at least one of grouping in the time direction and grouping in the frequency direction of the frequency data of the deleted frequency band in common for the plurality of channels, based on the average intensity of the frequency data of the deleted frequency band; and
an encoding unit that encodes envelope data of the frequency data of the deleted frequency band of the plurality of channels based on the grouping.

3. The audio signal encoding apparatus according to claim 1 or 2, wherein the intensity of the frequency data is calculated by a power of an absolute value of the frequency data.

4. An audio signal encoding method comprising:
a frequency conversion step of receiving audio signals of a plurality of channels and converting them into time series of frequency data;
an intensity calculation step of calculating the intensity of the frequency data in the plurality of channels;
a similarity determination step of determining the similarity of the plurality of channels based on the intensity of the frequency data in the plurality of channels;
a grouping step of, when the similarity is determined to be high in the similarity determination step, performing at least one of grouping in the time direction and grouping in the frequency direction of the frequency data in the plurality of channels in common for the plurality of channels, based on the average intensity of the frequency data in the plurality of channels; and
an encoding step of encoding the frequency data of the plurality of channels, or envelope data thereof, based on the grouping.

5. A stereo audio signal encoding method that receives audio signals of a plurality of channels, generates encoded data for a signal obtained by deleting a predetermined frequency band from the input audio signals, generates envelope data of the deleted frequency band, and multiplexes the envelope data into the encoded data for transmission or storage, the method comprising:
a frequency conversion step of receiving the audio signals of the plurality of channels and converting them into time series of frequency data;
an intensity calculation step of calculating the intensity of the frequency data of the deleted frequency band of the plurality of channels;
a similarity determination step of determining the similarity of the plurality of channels based on the intensity of the frequency data of the deleted frequency band of the plurality of channels;
a grouping step of, when the similarity is determined to be high in the similarity determination step, performing at least one of grouping in the time direction and grouping in the frequency direction of the frequency data of the deleted frequency band in common for the plurality of channels, based on the average intensity of the frequency data of the deleted frequency band; and
an encoding step of encoding envelope data of the frequency data of the deleted frequency band of the plurality of channels based on the grouping.

6. The audio signal encoding method according to claim 4, wherein the intensity of the frequency data is calculated by a power of an absolute value of the frequency data.