JP2004246224A

JP2004246224A - Audio high-efficiency encoder, audio high-efficiency encoding method, audio high-efficiency encoding program, and recording medium therefor

Info

Publication number: JP2004246224A
Application number: JP2003037702A
Authority: JP
Inventors: Kiyotaka Nagai; 清隆永井
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2003-02-17
Filing date: 2003-02-17
Publication date: 2004-09-02
Anticipated expiration: 2023-02-17
Also published as: JP4369140B2

Abstract

PROBLEM TO BE SOLVED: To realize an audio encoding method with improved encoding efficiency that obtains the same decoding result as the conventional method with fewer bits.
SOLUTION: Zero detection parts 151 and 152 and a data permutation part 160 are provided for a coding method that expresses spectral data, band by band, as a scale factor indicating the gain and quantized spectral data. The zero detection parts 151 and 152 detect that all of the quantized spectral data in a band is zero. On the basis of the detection result, the data permutation part 160 replaces with another value at least one of the following: a scale factor; an intensity stereo position, which indicates the directional gain of one channel relative to the other channel in intensity stereo processing; a codebook number; and a flag regarding the stereo processing.
COPYRIGHT: (C)2004, JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、オーディオ信号をスペクトルデータに変換し、可変長符号化を用いてスペクトルデータを高能率符号化するオーディオ高能率符号化装置、オーディオ高能率符号化方法、オーディオ高能率符号化プログラム及びその記録媒体に関する。
【０００２】
【従来の技術】
近年、オーディオ信号をスペクトルデータに変換し、可変長符号化を用いてスペクトルデータを符号化することにより、符号化効率を改善したオーディオ高能率符号化方法が提案されている。
【０００３】
このようなオーディオ高能率符号化方法としては、ＭＰＥＧ−２ＡＡＣ（ＡｄｖａｎｃｅｄＡｕｄｉｏＣｏｄｉｎｇ）の規格書（非特許文献１参照）に記載されたものが知られている。以下では、前記非特許文献１記載のＭＰＥＧ−２ＡＡＣ（以下ＡＡＣと略す）のローコンプレキシティプロファイル（ＬｏｗＣｏｍｐｌｅｘｉｔｙＰｒｏｆｉｌｅ）を例にとって、可変長符号化を用いてスペクトルデータを高能率符号化する従来の技術について説明する。
【０００４】
図１３は、従来のＡＡＣエンコーダの構成を示すブロック図である。このＡＡＣエンコーダには、フィルタバンク１０１、１０２、インテンシティステレオデータ生成部１１０、ミッドサイド（Ｍ／Ｓ）ステレオデータ生成部１２０、量子化部１３０、符号化部１４０が設けられている。このような構成のＡＡＣエンコーダの動作について説明する。
【０００５】
フィルタバンク１０１に入力された左チャンネル（Ｌｃｈ）の時間軸のオーディオ信号は、所定の時間サンプル、即ち長変換ブロックの場合２０４８サンプルで、短変換ブロックの場合２５６サンプルからなる変換ブロックに分割される。そして、ＭＤＣＴ（ＭｏｄｉｆｉｅｄＤｉｓｃｒｅｔｅＣｏｓｉｎｅＴｒａｎｓｆｏｒｍ，変形離散コサイン変換）によりスペクトルデータ（ＭＤＣＴ係数）に変換される。この変換は変換ブロックを５０％ずつオーバーラップして実行し、長変換ブロックの場合には２０４８サンプルを１０２４本のスペクトルデータに変換する。また、短変換ブロックの場合には２５６サンプルを１２８本のスペクトルデータに変換する。
【０００６】
８個連続で短変換ブロックを変換することにより、短変換ブロックの出力スペクトルデータの本数を８×１２８＝１０２４本として、長変換ブロックと一致させる。このチャンネル当り１０２４本のスペクトルデータが符号化の単位である。次に、１０２４本のスペクトルデータは、人間の耳の臨界帯域特性を模擬したスケールファクタバンドと呼ばれるバンド単位にグループ化される。
【０００７】
同様に、フィルタバンク１０２に入力された右チャンネル（Ｒｃｈ）の時間軸のオーディオ信号は、変換ブロックに分割され、ＭＤＣＴにより１０２４本のスペクトルデータに変換される。次に、１０２４本のスペクトルデータはスケールファクタバンド単位にグループ化される。
【０００８】
インテンシティステレオデータ生成部１１０は、スケールファクタバンド単位でインテンシティステレオ処理の有無情報を出力すると共に、インテンシティステレオ処理を行うスケールファクタバンドに対しては、左チャンネルのインテンシティステレオ処理されたスペクトルデータと、２つのチャンネルの位相関係を示す情報と、左チャンネルに対する右チャンネルの指向性ゲインを示すインテンシティステレオポジションとを算出して出力する。またインテンシティステレオデータ生成部１１０は、インテンシティステレオ処理を行わないスケールファクタバンドに対しては、左右のチャンネルのスペクトルデータをそのまま出力する。
【０００９】
左チャンネルのインテンシティステレオ処理されたスペクトルデータは、左チャンネルと右チャンネルのスペクトルデータの和（２つのチャンネルの位相関係が同相の場合）、あるいは差（２つのチャンネルの位相関係が逆相の場合）を、そのパワーレベルが左チャンネルの元のパワーレベルと一致するようにゲインを補正することにより生成される。そして右チャンネルのスペクトルデータは零に設定される。なお、この零スペクトルデータは符号化データとしては伝送されない。
【００１０】
ミッドサイドステレオデータ生成部（Ｍ／Ｓステレオデータ生成部）１２０は、スケールファクタバンド単位でミッドサイドステレオ処理の有無情報と、ミッドサイドステレオ処理ありの場合にはミッドサイドステレオ処理されたミッドスペクトルデータ及びサイドスペクトルデータとを生成する。ミッドスペクトルデータは左チャンネルと右チャンネルのスペクトルデータの和の１／２で生成され、左チャンネルのスペクトルデータとして出力される。また、サイドスペクトルデータは左チャンネルと右チャンネルのスペクトルデータの差の１／２で生成され、右チャンネルのスペクトルデータとして出力される。インテンシティステレオ処理とミッドサイドステレオ処理を共に行わないスケールファクタバンドの場合、左右のチャンネルにおける元のスペクトルデータがそのまま出力される。なお、インテンシティステレオ処理とミッドサイドステレオ処理とは排他的な関係にあり、一方の処理を選択すると他方の処理を選択することはできない。
【００１１】
量子化部１３０は、スケールファクタバンド単位でスペクトルデータのゲインを示すスケールファクタと、前記ゲインで正規化されたスペクトルデータとの量子化を行う。左右のチャンネルのスペクトルデータをスケールファクタバンド毎に、聴覚モデルに基づいてスペクトルデータのマスキングレベル、すなわち許容量子化ノイズレベルを算出し、算出された許容量子化ノイズレベルに基づいてスケールファクタと正規化されたスペクトルデータとの量子化を行う。
【００１２】
符号化部１４０では、量子化されたデータに対してハフマンコードによる可変長符号化を用いて符号化処理を行い、符号化データを生成して出力する。量子化部１３０と符号化部１４０の処理は、符号化データに必要なビット数を利用可能なビット数以下に調整するため、動作を繰り返して行うことにより実行される。
【００１３】
以下、符号化部１４０で生成する符号化データのフォーマットについて説明する。ステレオオーディオ信号の場合、符号化データは、２つのチャンネルで共通のデータとチャンネル毎に固有のデータとからなる。最初にチャンネル毎のデータについて説明する。チャンネル毎の主要な符号化データとしては、セクションデータ、スケールファクタデータ、符号化スペクトルデータの３つのデータがあげられる。
【００１４】
最初にセクションデータ（非特許文献１のｓｅｃｔｉｏｎ＿ｄａｔａ（））について説明する。セクションとは、同一のコードブックを使用するスケールファクタバンドの集合のことである。セクションデータは、セクションで使用するコードブック番号と、スケールファクタバンドを単位とするセクションの長さとからなるセクション毎のデータを、すべてのセクションに対して繰り返したものである。
【００１５】
ＡＡＣでは、量子化されたスペクトルデータの符号化にコードブック番号１から１１の１１種類のハフマンコードブックを用いる。また、特別なコードブック番号として、スケールファクタバンド内の量子化されたスペクトルデータがすべて零データであることを表すコードブック番号０、インテンシティステレオ処理で２つのチャンネルの位相関係が同相であることを表すコードブック番号１５、インテンシティステレオ処理で２つのチャンネルの位相関係が逆相であることを表すコードブック番号１４の３つのコードブック番号がある。
【００１６】
ＡＡＣでは、符号化効率を改善するため、零データを表すコードブック番号０のセクションのスケールファクタと量子化されたスペクトルデータは符号化データとして伝送されない。
【００１７】
上記したセクションデータのフォーマットから明らかなように、セクションの数が多いほどセクションデータに必要なビット数は増加する。したがって、複数のハフマンコードブックが選択可能な場合、セクションデータと符号化スペクトルデータとを合わせた合計の符号化データビット数が小さくなるようにコードブックを選択してセクションを形成する。このようにセクションを形成する処理をセクショニングと呼ぶ。セクショニングにより、スケールファクタバンド内のすべての量子化されたスペクトルデータが零でもコードブック番号０を使わない場合が発生する。
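[0018]
The scale factor data (scale_factor_data() in Non-Patent Document 1) consists of the encoded scale factors and the encoded intensity stereo positions for all scale factor bands.
[0019]
A scale factor represents the gain of a scale factor band in 1.5 dB steps. Scale factors are encoded by variable-length coding, with a Huffman code, the difference between the scale factors of successive scale factor bands. The encoded scale factor data consists of an initial value followed by the variable-length-coded scale factor differences. A scale factor difference that is variable-length coded lies within ±60.
[0020]
FIGS. 14 and 15 show the code lengths of the Huffman code used for variable-length coding of the scale factor differences. In FIGS. 14 and 15, index is the address used to look up the Huffman codebook and equals the scale factor difference plus 60, dsf is the scale factor difference, and length is the length of the Huffman code in bits. FIG. 14 lists dsf and length for index 0 to 59, and FIG. 15 lists them for index 60 to 120. As FIG. 15 shows, the shortest Huffman code belongs to a difference (dsf) of 0 and has a length of 1 bit.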
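The differential coding of scale factors described above can be sketched as follows. This is a minimal illustration, not the encoder's actual implementation: the function name `scale_factor_indices` is hypothetical, and treating the first band's scale factor as the transmitted initial value is an assumption made for the example. The Huffman code words themselves (FIGS. 14 and 15) are not reproduced; only the lookup index dsf + 60 is computed.

```python
def scale_factor_indices(scale_factors):
    """Return (initial_value, indices) for a list of per-band scale factors.

    Each index is dsf + 60, where dsf is the difference from the previous
    band's scale factor. Differences outside +/-60 cannot be coded.
    ASSUMPTION: the first scale factor is used as the initial value.
    """
    if not scale_factors:
        raise ValueError("at least one scale factor is required")
    initial = scale_factors[0]
    indices = []
    for prev, cur in zip(scale_factors, scale_factors[1:]):
        dsf = cur - prev
        if not -60 <= dsf <= 60:
            raise ValueError(f"difference {dsf} is outside the range +/-60")
        indices.append(dsf + 60)
    return initial, indices


initial, indices = scale_factor_indices([100, 100, 96, 96])
print(initial, indices)  # 100 [60, 56, 60]
```

An index of 60 corresponds to dsf = 0, the difference with the 1-bit code, which is why runs of equal scale factors are cheap to transmit.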
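[0021]
For the right channel of an intensity-stereo-processed scale factor band, an intensity stereo position is transmitted as scale factor data in place of a scale factor.
[0022]
The intensity stereo position represents the directional gain of the right channel relative to the left channel in 1.5 dB steps. Intensity stereo positions are encoded in the same way as scale factors: the difference between the intensity stereo positions of adjacent intensity-stereo-processed scale factor bands is variable-length coded with the same Huffman codebook used for scale factors. However, whereas the initial value for the scale factor differences is transmitted as encoded data, the initial value for the intensity stereo position differences is always 0 and is not transmitted.
[0023]
The encoded spectral data (spectral_data() in Non-Patent Document 1) is the quantized spectral data encoded with the Huffman codebook selected for the section. When the Huffman codebook number is 0 (zero data) or 14 or 15 (intensity stereo processing), no spectral data is transmitted.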
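[0024]
Next, as encoded data common to the two channels, the M/S-used / IS-phase-inversion flag (ms_used in Non-Patent Document 1) and the MS mask (ms_mask_present in Non-Patent Document 1) are described.
[0025]
The M/S-used / IS-phase-inversion flag is a flag of one bit per scale factor band that indicates either whether mid-side stereo processing is used or whether the phase of the intensity stereo (IS) processing is inverted. Specifically, it represents the following states.
1) M/S-used / IS-phase-inversion flag = 0
No M/S processing, or no IS phase inversion
2) M/S-used / IS-phase-inversion flag = 1
M/S processing, or IS phase inversion
[0026]
Whether this flag indicates the use of M/S processing or the inversion of the IS phase is determined by the codebook number of the right channel. If that codebook number indicates intensity stereo processing, the flag indicates whether the IS phase is inverted; otherwise it indicates whether M/S processing is used.
[0027]
The MS mask indicates how the M/S-used / IS-phase-inversion flags are encoded and takes the following states.
1) MS mask = 0
All M/S-used / IS-phase-inversion flags are 0
2) MS mask = 1
The per-band M/S-used / IS-phase-inversion flags are transmitted explicitly
3) MS mask = 2
All M/S-used flags are 1 (the IS-phase-inversion flags are 0)
When the MS mask is 0 or 2, the M/S-used / IS-phase-inversion flags are not transmitted as encoded data.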
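How the MS mask and the right-channel codebook number together determine the meaning and value of each band's flag can be sketched as below. The function name `band_flags` and the string labels are hypothetical choices for this illustration; the rules themselves follow the three MS mask states listed above.

```python
IS_CODEBOOKS = (14, 15)  # codebook numbers that indicate intensity stereo


def band_flags(ms_mask, transmitted_flags, right_codebooks):
    """Return one (meaning, value) pair per scale factor band.

    transmitted_flags is only read when ms_mask == 1.
    """
    result = []
    for band, codebook in enumerate(right_codebooks):
        is_band = codebook in IS_CODEBOOKS
        meaning = "is_phase_inverted" if is_band else "ms_used"
        if ms_mask == 0:
            value = 0
        elif ms_mask == 1:
            value = transmitted_flags[band]
        elif ms_mask == 2:
            # All M/S flags are 1; IS phase-inversion flags are 0.
            value = 0 if is_band else 1
        else:
            raise ValueError("MS mask must be 0, 1, or 2")
        result.append((meaning, value))
    return result


print(band_flags(2, None, [3, 15]))
```

Only the `ms_mask == 1` branch consumes per-band flag bits, which is the property the later embodiments exploit.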
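[0028]
FIG. 16 shows an example of encoded data with intensity stereo processing and is used to explain the encoded data. For simplicity, the number of scale factor bands is six. In the figure, dsf denotes a scale factor difference (the initial value for the scale factor differences is omitted), disp denotes an intensity stereo position difference, and sd denotes spectral data. A "-" indicates that the corresponding data is not transmitted.
[0029]
The left-channel data (Lch data) is described first. In this example there are three sections. Writing section data as (codebook number, length), the section data is (3, 3), (0, 1), (1, 2). Scale factor band 3 of the left channel has codebook number 0, so neither its scale factor difference nor its spectral data is transmitted.
[0030]
The right-channel data (Rch data) is described next. In this example the right channel has three sections, and the section data is (3, 2), (2, 2), (15, 2). Scale factor bands 4 and 5 of the right channel have codebook number 15, which indicates intensity stereo processing. In an intensity-stereo-processed scale factor band, an intensity stereo position difference is transmitted in place of a scale factor difference, and no spectral data is transmitted.
[0031]
The data common to the left and right channels is described next. In this example the MS mask is 1, so the M/S-used / IS-phase-inversion flags are transmitted. From the codebook numbers of the right-channel data, the flags of scale factor bands 0 to 3 indicate whether M/S processing is used, and the flags of scale factor bands 4 and 5 indicate whether the IS phase is inverted.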
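The FIG. 16 example can be reproduced with a short sketch that expands the section data into per-band codebook numbers and reports which fields each band transmits. The helper names `expand_sections` and `transmitted_fields` are hypothetical; the section values are the ones quoted in the text.

```python
ZERO_CODEBOOK = 0
IS_CODEBOOKS = (14, 15)


def expand_sections(sections):
    """Turn [(codebook, length), ...] into one codebook number per band."""
    bands = []
    for codebook, length in sections:
        bands.extend([codebook] * length)
    return bands


def transmitted_fields(codebook):
    """Return (scale factor field, spectral field); "-" means not transmitted."""
    if codebook == ZERO_CODEBOOK:
        return ("-", "-")
    if codebook in IS_CODEBOOKS:
        return ("disp", "-")
    return ("dsf", "sd")


left = expand_sections([(3, 3), (0, 1), (1, 2)])
right = expand_sections([(3, 2), (2, 2), (15, 2)])
for band, (lcb, rcb) in enumerate(zip(left, right)):
    print(band, lcb, transmitted_fields(lcb), rcb, transmitted_fields(rcb))
```

Band 3 of the left channel prints `('-', '-')`, and bands 4 and 5 of the right channel print `('disp', '-')`, matching the description of the figure.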
【００３２】
[Non-Patent Document 1]
ISO/IEC JTC1/SC29/WG11 N1650, "IS ISO/IEC 13818-7 (MPEG-2 Advanced Audio Coding, AAC)", April 1997, pp. 14-20, pp. 33-38, pp. 51-62, pp. 92-93, ANNEX B Informative Part pp. 57-68
【００３３】
【発明が解決しようとする課題】
しかしながら、上記従来の符号化方法では、バンド内の量子化されたスペクトルデータがすべて零であることを検出し、前記検出結果に基づいてデータを置き換えることにより、より少ないビット数で符号化データを生成するための部を備えていない。このため制限されたビットレートの環境下では符号化効率が劣化し、音質が劣化することがあるという課題があった。
【００３４】
本発明は上記問題点を解決するもので、符号化効率が向上した符号化データを生成することのできるオーディオ高能率符号化装置及びその方法を実現することを目的とする。すなわち、より少ないビット数で従来と同一の復号結果を得ることが可能な符号化データを生成すると共に、削減したビットを音質に寄与する他のデータに割り当て、音質を向上することのできるオーディオ高能率符号化装置、オーディオ高能率符号化方法、オーディオ高能率符号化プログラム及びその記録媒体を実現することを目的とする。
【００３５】
【課題を解決するための手段】
第１の発明は、スペクトルデータを、バンド単位のゲインを示すスケールファクタと前記ゲインで正規化されて量子化されたスペクトルデータとで表し、隣接するバンドのスケールファクタの差分を可変長符号化する符号化装置であって、バンド内のすべての前記量子化されたスペクトルデータが零であるか否かを検出する零検出部と、前記零であることが検出されたバンドのスケールファクタを、差分可変長符号化後に最も短い符号長となる値に置き換えるデータ置換部と、を備えたことを特徴とするオーディオ高能率符号化装置及びその方法である。
【００３６】
又第２の発明は、スペクトルデータを、バンド単位のゲインを示すスケールファクタと前記ゲインで正規化されて量子化されたスペクトルデータとで表し、隣接するバンドのスケールファクタの差分を可変長符号化する符号化装置であって、バンド内のすべての前記量子化されたスペクトルデータが零であるか否かを検出する零検出部と、前記零であることが検出されたバンドと隣接するバンドとのスケールファクタの差分が可変長符号化の最も短い符号長の値となるように、前記零であることが検出されたバンドのスケールファクタを別の値に置き換えるデータ置換部と、を備えたことを特徴とするオーディオ高能率符号化装置及びその方法である。
【００３７】
又第３の発明は、インテンシティステレオ処理を用いて２つのチャンネルのスペクトルデータを、一方のチャンネルにおけるバンド単位のゲインを示すスケールファクタと、他方のチャンネルにおけるバンド単位の指向性ゲインを示すインテンシティステレオポジションと、一方のチャンネルのインテンシティステレオ処理されて前記ゲインで正規化されて量子化されたスペクトルデータとで表し、隣接するインテンシティステレオ処理されたバンドのインテンシティステレオポジションの差分を可変長符号化する符号化装置であって、前記インテンシティステレオ処理されたバンド内のすべての量子化されたスペクトルデータが零であるか否かを検出する零検出部と、前記零であることが検出されたバンドのインテンシティステレオポジションを差分可変長符号化後に最も短い符号長となる値に置き換えるデータ置換部と、を備えたことを特徴とするオーディオ高能率符号化装置及びその方法である。
【００３８】
又第４の発明は、インテンシティステレオ処理を用いて２つのチャンネルのスペクトルデータを、一方のチャンネルにおけるバンド単位のゲインを示すスケールファクタと、他方のチャンネルにおけるバンド単位の指向性ゲインを示すインテンシティステレオポジションと、一方のチャンネルのインテンシティステレオ処理されて前記ゲインで正規化されて量子化されたスペクトルデータとで表し、隣接するインテンシティステレオ処理されたバンドのインテンシティステレオポジションの差分を可変長符号化する符号化装置であって、前記インテンシティステレオ処理されたバンド内のすべての量子化されたスペクトルデータが零であるか否かを検出する零検出部と、前記零であることが検出されたバンドと隣接するバンドとのインテンシティステレオポジションの差分が可変長符号化の最も短い符号長の値となるように、前記零であることが検出されたバンドのインテンシティステレオポジションを別の値に置き換えるデータ置換部と、を備えたことを特徴とするオーディオ高能率符号化装置及びその方法である。
【００３９】
ここで可変長符号化の最も短い符号長の値を零とすることとしてもよい。
【００４０】
又第５の発明は、インテンシティステレオ処理を用いて２つのチャンネルのスペクトルデータに対してコードブックを用いてバンド単位で符号化し、インテンシティステレオ処理を行うバンドにおける一方のチャンネルのコードブック番号は、インテンシティステレオ処理されて量子化されたスペクトルデータの符号化に使用するコードブック番号を表し、他方のチャンネルのコードブック番号はインテンシティステレオ処理を表す場合の符号化装置であって、一方のチャンネルのインテンシティステレオ処理されたバンド内のすべての量子化されたスペクトルデータが零であるか否かを検出する零検出部と、インテンシティステレオ処理されて前記零であることが検出されたバンドにおける他方のチャンネルのコードブック番号を、インテンシティステレオ処理を表すコードブック番号から零データを表すコードブック番号に変更した場合と変更しない場合の符号化に必要なビット数を算出し、コードブック番号を変更した場合の方が符号化に必要なビット数が小さいときに、前記他方のチャンネルのコードブック番号を、インテンシティステレオ処理を表すコードブック番号から零データを表すコードブック番号に置き換えるデータ置換部と、を備えたことを特徴とするオーディオ高能率符号化装置及びその方法である。
【００４１】
又第６の発明は、ミッドサイドステレオ処理とインテンシティステレオ処理を用いて２つのチャンネルのスペクトルデータをバンド単位で符号化し、ミッドサイドステレオ処理の有無とインテンシティステレオ処理の２つのチャンネルにおける位相関係の反転の有無を示すフラグを符号化する符号化装置であって、一方のチャンネルのインテンシティステレオ処理されたバンド内のすべての量子化されたスペクトルデータが零であるか否かを検出する零検出部と、インテンシティステレオ処理されて前記零であることが検出されたバンドを除いたバンドの前記フラグがすべて同一の値のときには、バンド全体のフラグの値を前記同一の値に対応する別の値に置き換えるデータ置換部と、を備えたことを特徴とするオーディオ高能率符号化装置及びその方法である。
【００４２】
又第７の発明は、請求項８〜１４のいずれか１項記載のオーディオ高能率符号化方法を、コンピュータまたはデジタルシグナルプロセッサに実行させるためのプログラムである。
【００４３】
更に第８の発明は、請求項８〜１４のいずれか１項記載のオーディオ高能率符号化方法を、コンピュータまたはデジタルシグナルプロセッサに実行させるためのプログラムを記録したコンピュータ読み取り可能な記録媒体である。
【００４４】
【発明の実施の形態】
以下、本発明の実施の形態について、図面を参照しながら説明する。各実施の形態の説明では、本発明のオーディオ高能率符号化方法をＡＡＣエンコーダに適用した場合を例として説明する。
【００４５】
最初に本発明の各実施の形態におけるオーディオ高能率符号化方法に共通で、特徴的なポイントについてまとめて説明し、次に各実施の形態について固有の特徴的なポイントについて個々に説明する。
【００４６】
図１は、本発明の実施の形態におけるオーディオ高能率符号化方法によるＡＡＣエンコーダの構成を示すブロック図である。図１に示すＡＡＣエンコーダは、フィルタバンク１０１、１０２、インテンシティステレオデータ生成部１１０、ミッドサイド（Ｍ／Ｓ）ステレオデータ生成部１２０、量子化部１３０、符号化部１４０、零検出部１５１、１５２、データ置換部１６０を含んで構成される。なお、図１において図１３と同じ構成要素については同じ符号を用い、説明を省略する。
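[0047]
The operation of the zero detection units 151 and 152 and the data replacement unit 160, which are the characteristic features of the present invention, is described below. The zero detection unit 151 uses the quantized left-channel spectral data from the quantization unit 130 to detect whether all of the quantized spectral data in each scale factor band is zero, and outputs the detection result to the data replacement unit 160.
[0048]
Similarly, the zero detection unit 152 detects whether the quantized right-channel spectral data from the quantization unit 130 is all zero within each scale factor band, and outputs the detection result to the data replacement unit 160. For right-channel scale factor bands that have undergone intensity stereo processing, zero detection by the zero detection unit 152 is unnecessary.
[0049]
Hereinafter, a scale factor band that the zero detection units 151 and 152 detect as all zero is called a zero-detected band, and a scale factor band that is not detected as all zero is called a non-zero-detected band.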
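Zero detection as described above amounts to an all-zero test per scale factor band. The sketch below assumes, for illustration only, that band boundaries are given as a list of offsets into the quantized spectrum; the name `detect_zero_bands` and the offset layout are hypothetical.

```python
def detect_zero_bands(quantized, band_offsets):
    """Return one boolean per band: True for a zero-detected band.

    ASSUMPTION: band b covers quantized[band_offsets[b]:band_offsets[b + 1]],
    so band_offsets has one more entry than there are bands.
    """
    return [
        all(q == 0 for q in quantized[start:end])
        for start, end in zip(band_offsets, band_offsets[1:])
    ]


flags = detect_zero_bands([0, 0, 1, 0, 0, 0, 0, 2], [0, 2, 4, 6, 8])
print(flags)  # [True, False, True, False]
```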
【００５０】
データ置換部１６０は、零検出部１５１、１５２からの零検出結果に基づいて零検出バンドのデータを置き換えて符号化部１４０に出力する。インテンシティステレオ処理されたスケールファクタバンドの場合、右チャンネルの復号は左チャンネルの復号結果に基づいて行われるので、左チャンネルの零検出部１５１からの検出結果に基づいて、対応する右チャンネルのスケールファクタバンドのインテンシティステレオポジションやコードブック番号を置き換える。
【００５１】
符号化部１４０は、データ置換部１６０で置換されたデータを用いて符号化を行い、符号化データを生成して出力する。
【００５２】
（実施の形態１）
図２は、実施の形態１のオーディオ高能率符号化装置において、データ置換部１６０Ａの構成を示すブロック図である。このデータ置換部１６０Ａには、非零バンド間スケールファクタ差分算出部２１１、２１２、零バンドスケールファクタ差分算出部２２１、２２２、零バンドスケールファクタ置換部２３１、２３２が設けられている。図２の上段における非零バンド間スケールファクタ差分算出部２１１、零バンドスケールファクタ差分算出部２２１、零バンドスケールファクタ置換部２３１は、左チャンネルのデータ用である。下段における非零バンド間スケールファクタ差分算出部２１２、零バンドスケールファクタ差分算出部２２２、零バンドスケールファクタ置換部２３２は右チャンネルのデータ用である。左チャンネルと右チャンネルの動作は同一であるので、以下では左チャンネルの動作について説明し、右チャンネルの動作については説明を省略する。
【００５３】
非零バンド間スケールファクタ差分算出部２１１は、零検出部１５１で非零検出バンドと検出されたスケールファクタバンド間のスケールファクタの差分を算出する。
【００５４】
零バンドスケールファクタ差分算出部２２１は、非零検出バンド間スケールファクタ差分算出部２１１からの非零検出バンド間のスケールファクタの差分を用いて、非零検出バンド間のスケールファクタの差分を変えることなく、零検出バンドと隣接するスケールファクタバンドのスケールファクタの差分を可変長符号化後に最も短い符号長にする値を算出する。
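[0055]
Let i be a scale factor band number and sf(i) the scale factor of band i. Consider the case where bands i and i+2 are non-zero-detected bands and band i+1 is a zero-detected band. Using the value of sf(i+2) - sf(i), a table lookup gives the value of the difference sf(i+1) - sf(i) that minimizes the code length after variable-length coding with the Huffman code of FIGS. 14 and 15.
[0056]
FIGS. 3 to 6 are tables based on the Huffman code of FIGS. 14 and 15. Under the condition that the sum of two differences dsf1 and dsf2 is constant, they list the values of dsf1 and dsf2 that minimize the total code length of dsf1 and dsf2, together with that total code length, length. The values of dsf1 and dsf2 may be interchanged. FIG. 3 covers values of dsf1 + dsf2 from -120 to -61, FIG. 4 from -60 to -1, FIG. 5 from 0 to 59, and FIG. 6 from 60 to 120.
[0057]
Consider, for example, the case where the scale factor difference between the non-zero-detected bands, sf(i+2) - sf(i), is -4. Looking up the row of FIGS. 3 to 6 in which dsf1 + dsf2 is -4, FIG. 4 shows that the value of sf(i+1) - sf(i) that minimizes the code length after variable-length coding is -4 (dsf1) or 0 (dsf2), and that the corresponding values of sf(i+2) - sf(i+1) are 0 and -4, respectively.
[0058]
When k zero-detected bands occur consecutively (where k is a positive integer), the differences that minimize the code length after variable-length coding can be obtained by consulting a similar table that expresses a given difference as the sum of (k+1) differences.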
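A table such as FIGS. 3 to 6 could be generated by exhaustive search over the two differences. The sketch below shows that search for a single zero-detected band. The function `best_split` is a hypothetical name, and `toy_code_length` is a made-up stand-in for the code lengths of FIGS. 14 and 15, which are not reproduced here; only the property that dsf = 0 costs 1 bit is taken from the text. The numbers it produces are therefore illustrative, not the patent's table values.

```python
def best_split(total, code_length, limit=60):
    """Find (dsf1, dsf2, bits) with dsf1 + dsf2 == total and minimum bits.

    Both differences must lie within +/-limit. Returns None if no split fits.
    """
    best = None
    for dsf1 in range(-limit, limit + 1):
        dsf2 = total - dsf1
        if not -limit <= dsf2 <= limit:
            continue
        bits = code_length(dsf1) + code_length(dsf2)
        if best is None or bits < best[2]:
            best = (dsf1, dsf2, bits)
    return best


def toy_code_length(dsf):
    # HYPOTHETICAL stand-in for FIGS. 14 and 15: 1 bit for 0, longer otherwise.
    return 1 if dsf == 0 else 3 + abs(dsf)


print(best_split(-4, toy_code_length))
```

With this stand-in the search picks dsf1 = -4 and dsf2 = 0, the same shape of answer as the -4 example above; the swapped pair (0, -4) costs the same, consistent with the note that dsf1 and dsf2 may be interchanged.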
【００５９】
零バンドスケールファクタ置換部２３１では、零検出バンドの１つ前のスケールファクタバンドにおけるスケールファクタに対して、零バンドスケールファクタ差分算出部２２１で算出した差分を加算した値を算出し、零検出バンドのスケールファクタを置き換える。零検出バンドでは、すべての量子化されたスペクトルデータは零なので、ゲインを表すスケールファクタの値を変化させても同一の復号結果を得ることができる。
【００６０】
なお、スケールファクタの符号化時にはスケールファクタの差分を算出することが必要であり、図７に示すような構成として、零バンドスケールファクタ差分算出部２２１、２２２で算出された零検出バンドのスケールファクタの差分を直接出力し、符号化部１４０では、前記スケールファクタの差分を直接可変長符号化するようにしてもよい。この場合、零バンドスケールファクタ差分算出部２２１、２２２では、零検出バンドと隣接する２つのスケールファクタバンドとの２つのスケールファクタの差分を算出して出力する必要がある。すなわち、上記した例では、ｓｆ（ｉ＋１）−ｓｆ（ｉ）とｓｆ（ｉ＋２）−ｓｆ（ｉ＋１）の２つの差分を算出して出力することが必要である。
【００６１】
実施の形態１によれば、スケールファクタの伝送が必要なコードブック番号が１から１１のスケールファクタバンドに対して、スケールファクタデータの符号化に必要なビット数を削減することが可能である。しかしながら、コードブック番号が０のスケールファクタバンドでは、スケールファクタを伝送する必要がないので、スケールファクタデータのビット数を削減することはできない。
【００６２】
以上のように実施の形態１では、零検出部１５１、１５２からの零検出バンド／非零検出バンドの検出結果に基づいて、零検出バンドのスケールファクタを差分可変長符号化後に最も短い符号長となる値に零バンドスケールファクタ置換部２３１、２３２で置き換えることにより、より少ないビット数で従来と同一の復号結果を得ることができ、符号化効率が向上した符号化データを生成することができる。さらに、削減したビットを音質に寄与する他のデータに割り当てることにより、音質を向上させることができる。
【００６３】
（実施の形態２）
次に本発明の実施の形態２におけるオーディオ高能率符号化装置について説明する。図８は実施の形態２のオーディオ高能率符号化装置におけるデータ置換部１６０Ｂの構成を示すブロック図である。図８において、図２と同じ構成要素については同じ符号を用いる。このデータ置換部１６０Ｂには、非零バンド間スケールファクタ差分算出部２１１、２１２、差分範囲判定部２４１、２４２、零バンドスケールファクタ差分設定部２５１、２５２、零バンドスケールファクタ置換部２３１、２３２が設けられる。
【００６４】
図８の上段において、非零バンド間スケールファクタ差分算出部２１１、差分範囲判定部２４１、零バンドスケールファクタ差分設定部２５１、零バンドスケールファクタ置換部２３１は、左チャンネルのデータ用である。下段の非零バンド間スケールファクタ差分算出部２１２、差分範囲判定部２４２、零バンドスケールファクタ差分設定部２５２、零バンドスケールファクタ置換部２３２は、右チャンネルのデータ用である。左チャンネルと右チャンネルの動作は同一であるので、以下では左チャンネルの動作について説明し、右チャンネルの動作については説明を省略する。
【００６５】
非零バンド間スケールファクタ差分算出部２１１は、図１の零検出部１５１で非零検出バンドとして検出されたスケールファクタバンド間のスケールファクタの差分を算出する。
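[0066]
The difference range determination unit 241 determines whether the scale factor difference between non-zero-detected bands supplied by the non-zero inter-band scale factor difference calculation unit 211 lies within a predetermined range, ±60 in this embodiment, and outputs the result.
[0067]
Based on the output of the difference range determination unit 241, when the scale factor difference between the non-zero-detected bands is within ±60, the zero-band scale factor difference setting unit 251 sets the scale factor difference between a zero-detected band and its adjacent scale factor band to the value that has the shortest variable-length code. This embodiment uses the Huffman code of FIGS. 14 and 15 for variable-length coding. The shortest code length is 1 bit, and the corresponding value is 0.
[0068]
Let i be a scale factor band number and sf(i) the scale factor of band i, with bands i and i+2 non-zero-detected bands and band i+1 a zero-detected band. When sf(i+2) - sf(i) is within ±60, the difference sf(i+1) - sf(i) is set to 0.
[0069]
When two or more zero-detected bands occur in succession, the scale factor difference from the adjacent scale factor band is set to 0 for every one of them. The zero-band scale factor replacement unit 231 adds the difference set by the zero-band scale factor difference setting unit 251 to the scale factor of the band immediately preceding the zero-detected band and replaces the scale factor of the zero-detected band with the result. Since the difference is set to 0 in this embodiment, sf(i+1) is replaced with the same value as sf(i).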
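The replacement rule of this embodiment can be sketched as a single pass over the bands. The function name is hypothetical, and one simplification is an assumption of this sketch rather than something stated in the text: runs of zero-detected bands at the very start or end of the band list, which are not bracketed by two non-zero-detected bands, are left untouched.

```python
def replace_zero_band_scale_factors(scale_factors, is_zero_band, limit=60):
    """Copy the preceding scale factor into each run of zero-detected bands.

    A run is replaced only if the difference between the non-zero-detected
    bands on either side of it is within +/-limit.
    ASSUMPTION: leading and trailing zero runs are left unchanged.
    """
    sf = list(scale_factors)
    prev_nonzero = None
    for band, zero in enumerate(is_zero_band):
        if zero:
            continue
        if prev_nonzero is not None and band - prev_nonzero > 1:
            if abs(sf[band] - sf[prev_nonzero]) <= limit:
                for z in range(prev_nonzero + 1, band):
                    sf[z] = sf[prev_nonzero]  # difference from previous band becomes 0
        prev_nonzero = band
    return sf


print(replace_zero_band_scale_factors([100, 70, 96], [False, True, False]))
# [100, 100, 96]
```

After replacement the coded differences are 0 and -4 instead of -30 and 26, and the decoded audio is unchanged because the replaced band contains only zeros.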
【００７０】
零検出バンドでは、すべての量子化されたスペクトルデータは零なので、ゲインを表すスケールファクタの値を変化させても同一の復号結果を得ることができる。
【００７１】
実施の形態２では、零検出バンドと隣接するスケールファクタバンドのスケールファクタの差分を固定値に設定すれば良いので、実施の形態１で用いた図３〜図６の表は不要である。
【００７２】
また、実施の形態２では、実施の形態１と比較して、非零検出バンド間のスケールファクタにおける差分の範囲が±６０以内の条件（非零検出バンド間のスケールファクタの差分を変えないための条件）が必要である。しかし、スケールファクタの±６０は、±９０ｄＢ（＝±６０×１．５ｄＢ）のゲインに対応し、ほとんどの場合この条件を満足するので、実質的に制約条件とはならない。
【００７３】
また、スケールファクタの差分可変長符号化後の符号長に関しても、実施の形態２で対応可能な１２１個（±６０の範囲内における整数の個数）の差分の内、１２０個の差分で最小の値であり、残りの１個も最小より１ビット長いだけという、ほぼ最適な可変長符号化を実現できる。すなわち、図３〜図６におけるｄｓｆ１＋ｄｓｆ２が±６０の範囲内で、ｄｓｆ１＋ｄｓｆ２が３１の場合を除いて、ｄｓｆ１、あるいはｄｓｆ２の値は０としている。また、ｄｓｆ１が０（１ビット）、ｄｓｆ２が３１（１９ビット）でｄｓｆ１＋ｄｓｆ２が３１の場合の符号長は２０（＝１＋１９）ビットであり、これは図５に示す最小時の１９ビットより１ビット長い。
【００７４】
なお、スケールファクタの符号化時にはスケールファクタの差分を算出することが必要なので、実施の形態１で説明したのと同様に、零バンドスケールファクタ差分設定部２５１で算出された零検出バンドのスケールファクタの差分を出力し、符号化部１４０が前記スケールファクタの差分を直接可変長符号化するようにしてもよい。
【００７５】
実施の形態２によれば、スケールファクタの伝送に必要なコードブック番号が１から１１のスケールファクタバンドでは、スケールファクタデータの符号化に必要なビット数を削減することが可能である。しかしながら、コードブック番号が０のスケールファクタバンドでは、スケールファクタを伝送する必要がないので、スケールファクタデータのビット数を削減することはできない。
【００７６】
以上のように実施の形態２では、零検出部１５１、１５２からの零検出バンド／非零検出バンドの検出結果に基づいて、零検出バンドと隣接するスケールファクタバンドのスケールファクタの差分が可変長符号化の最も短い符号長の値となるように、零検出バンドのスケールファクタを零バンドスケールファクタ置換部２３１、２３２で置き換える。このような簡単な処理により、少ないビット数で従来と同一の復号結果を得ることができ、符号化効率が向上した符号化データを生成することができる。さらに、削減したビットを音質に寄与する他のデータに割り当てることにより、音質を向上させることができる。
【００７７】
なお、実施の形態２では、非零検出バンド間のスケールファクタの差分が所定の範囲内（±６０以内）である場合にのみ、零検出バンドのスケールファクタの置き換えを行った。しかし、零検出バンドのスケールファクタの置き換えを常に行い、非零検出バンド間のスケールファクタの差分が所定の範囲外となるときには、所定の範囲内となるように非零検出バンド間のスケールファクタの差分を制限するようにしてもよい。スケールファクタの差分が±６０（±９０ｄＢ）の範囲外となる場合はほとんどないので、このようにしても実施の形態２とほぼ同じ結果を得ることができる。
【００７８】
（実施の形態３）
次に本発明の実施の形態３におけるオーディオ高能率符号化装置について説明する。図９は実施の形態３のオーディオ高能率符号化装置におけるデータ置換部１６０Ｃの構成を示すブロック図である。このデータ置換部１６０Ｃには、非零バンド間ＩＳポジション差分算出部３１０、零バンドＩＳポジション差分算出部３２０、零バンドＩＳポジション置換部３３０が設けられている。
【００７９】
非零バンド間ＩＳポジション差分算出部３１０では、インテンシティステレオ処理されて左チャンネルの零検出部１５１で非零検出バンドと検出された場合、スケールファクタバンド間の右チャンネルのインテンシティステレオ（ＩＳ）ポジションの差分を算出する。
【００８０】
零バンドＩＳポジション差分算出部３２０では、非零バンド間ＩＳポジション差分算出部３１０からの非零検出バンド間のインテンシティステレオポジションの差分を用いて、非零検出バンド間のインテンシティステレオポジションの差分を変えることなく、零検出バンドと隣接するスケールファクタバンドのインテンシティステレオポジションの差分を可変長符号化後に最も短い符号長にする値を算出する。
【００８１】
ｉをスケールファクタバンドの番号、ｉｓｐ（ｉ）をｉのインテンシティステレオポジション、ｉとｉ＋２を非零検出バンドとし、ｉ＋１を零検出バンドとした場合、ｉｓｐ（ｉ＋２）−ｉｓｐ（ｉ）の値を用いて、図３〜図６に示す表を参照することにより、図１４及び図１５のハフマンコードによる可変長符号化後の符号長を最小にする差分ｉｓｐ（ｉ＋１）−ｉｓｐ（ｉ）の値を算出する。
【００８２】
インテンシティステレオポジションの差分の可変長符号化に用いるハフマンコードは、スケールファクタの差分の可変長符号化に用いるハフマンコードと同一なので、実施の形態１で用いた表をそのまま使用することができる。
【００８３】
例えば、非零検出バンド間のインテンシティステレオポジションの差分ｉｓｐ（ｉ＋２）−ｉｓｐ（ｉ）が０の場合、図３〜６の可変長符号化後の符号長を最小にするｓｆ（ｉ＋１）−ｓｆ（ｉ）の値は０であり、このときのｉｓｐ（ｉ＋２）−ｉｓｐ（ｉ＋１）の値も０である。
【００８４】
零バンドＩＳポジション置換部３３０は、零検出バンドの１つ前のスケールファクタバンドにおけるインテンシティステレオポジションに、零バンドＩＳポジション差分算出部３２０で算出した差分を加算した値を算出し、零検出バンドの右チャンネルのインテンシティステレオポジションを置き換える。
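[0085]
For example, suppose the intensity stereo positions before replacement are isp(i) = 4, isp(i+1) = -4, and isp(i+2) = 4, with bands i and i+2 non-zero-detected bands and band i+1 a zero-detected band. Then isp(i+1) is replaced with 4. Before replacement, isp(i+1) - isp(i) = -8 (8 bits) and isp(i+2) - isp(i+1) = 8 (8 bits), so encoding required 16 (= 8 + 8) bits in total. After replacement, isp(i+1) - isp(i) = 0 (1 bit) and isp(i+2) - isp(i+1) = 0 (1 bit), for a total of 2 (= 1 + 1) bits, so the number of bits required for encoding is reduced by 14 (= 16 - 2).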
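The arithmetic of this example can be checked with a few lines. The table below contains only the code lengths quoted in the text (0 costs 1 bit, +8 and -8 cost 8 bits each); the names `CODE_LENGTH_BITS` and `difference_bits` are hypothetical, and only the two differences among the three bands are counted.

```python
# Code lengths quoted in the text: difference 0 -> 1 bit, +/-8 -> 8 bits.
CODE_LENGTH_BITS = {0: 1, 8: 8, -8: 8}


def difference_bits(positions):
    """Total bits for the differences between consecutive positions."""
    return sum(CODE_LENGTH_BITS[b - a] for a, b in zip(positions, positions[1:]))


before = [4, -4, 4]   # isp(i), isp(i+1), isp(i+2) before replacement
after = [4, 4, 4]     # isp(i+1) replaced with isp(i)
saved = difference_bits(before) - difference_bits(after)
print(difference_bits(before), difference_bits(after), saved)  # 16 2 14
```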
【００８６】
零検出バンドでは、すべての量子化されたスペクトルデータは零なので、左チャンネルに対する右チャンネルの指向性ゲインを表すインテンシティステレオポジションの値を変化させても、同一の復号結果を得ることができる。
【００８７】
なお、インテンシティステレオポジションの符号化時にはインテンシティステレオポジションの差分を算出することが必要なので、データ置換部を図１０に示すような構成として、零バンドＩＳポジション差分算出部３２０で算出された零検出バンドの右チャンネルにおけるインテンシティステレオポジションの差分を出力し、図１の符号化部１４０が前記インテンシティステレオポジションの差分を直接可変長符号化するようにしてもよい。この場合、零バンドＩＳポジション差分算出部３２０では、零検出バンドと隣接する２つのスケールファクタバンドとの２つのインテンシティステレオポジションの差分を設定する必要がある。すなわち、上記した例では、ｉｓｐ（ｉ＋１）−ｉｓｐ（ｉ）と、ｉｓｐ（ｉ＋２）−ｉｓｐ（ｉ＋１）との２つの差分を算出して出力することが必要である。
【００８８】
実施の形態３によれば、右チャンネルのコードブック番号がインテンシティステレオ処理を表す１４か１５の場合、スケールファクタデータ中のインテンシティステレオポジションの符号化に必要なビット数を、その最小のビット数まで削減することが可能である。
【００８９】
以上のように実施の形態３では、零検出部１５１からの零検出バンド／非零検出バンドの検出結果に基づいて、零検出バンドのインテンシティステレオポジションを、差分可変長符号化後に最も短い符号長となる値に零バンドＩＳポジション置換部３３０で置き換えることにより、より少ないビット数で従来と同一の復号結果を得ることができ、符号化効率が向上した符号化データを生成することができる。さらに、削減したビットを音質に寄与する他のデータに割り当てることにより、音質を向上させることができる。
【００９０】
（実施の形態４）
次に本発明の実施の形態４におけるオーディオ高能率符号化装置について説明する。図１１は実施の形態４のオーディオ高能率符号化方法におけるデータ置換部１６０Ｄの構成を示すブロック図である。図９と同じ構成要素については同じ符号を用いる。このデータ置換部１６０Ｄには、非零バンド間ＩＳポジション差分算出部３１０、差分範囲判定部３４０、零バンドＩＳポジション差分設定部３５０、零バンドＩＳポジション置換部３３０が設けられている。
【００９１】
非零バンド間ＩＳポジション差分算出部３１０は、インテンシティステレオ処理されて左チャンネルの零検出部１５１で非零検出バンドと検出されたスケールファクタバンド間の右チャンネルにおけるインテンシティステレオポジションの差分を算出する。
【００９２】
差分範囲判定部３４０は、非零バンド間ＩＳポジション差分算出部３１０からの非零検出バンド間のインテンシティステレオポジションの差分が所定の範囲内、即ち本実施の形態では±６０以内にあるか否かを判定し、判定結果を出力する。
【００９３】
零バンドＩＳポジション差分設定部３５０は、差分範囲判定部３４０からの出力に基づいて、非零検出バンド間のインテンシティステレオポジションの差分が±６０以内にあるときには、零検出バンドと隣接するスケールファクタバンドのインテンシティステレオポジションの差分を、可変長符号化の最も短い符号長である値に設定する。本実施の形態では可変長符号化に図１４及び図１５のハフマンコードを用いるので、最も短い符号長は１ビットで値は０である。
【００９４】
ｉをスケールファクタバンドの番号、ｉｓｐ（ｉ）をｉのインテンシティステレオポジション、ｉとｉ＋２を非零検出バンドとし、ｉ＋１を零検出バンドとした場合、ｉｓｐ（ｉ＋２）−ｉｓｐ（ｉ）が±６０以内のとき、差分ｉｓｐ（ｉ＋１）−ｉｓｐ（ｉ）の値を０に設定する。
【００９５】
零検出バンドが２つ以上続く場合には、すべての零検出バンドに対して隣接するスケールファクタバンドとのインテンシティステレオポジションの差分を０に設定する。
【００９６】
零バンドＩＳポジション置換部３３０では、零検出バンドの１つ前のスケールファクタバンドにおけるインテンシティステレオポジションに、零バンドＩＳポジション差分設定部３５０で設定した差分を加算した値を算出し、零検出バンドのインテンシティステレオポジションを置き換える。本実施の形態では差分は０に設定されているので、ｉｓｐ（ｉ＋１）はｉｓｐ（ｉ）と同一の値に置き換えられる。
【００９７】
零検出バンドでは、すべての量子化されたスペクトルデータは零なので、指向性ゲインを表すインテンシティステレオポジションの値を変化させても、同一の復号結果を得ることができる。
【００９８】
実施の形態４では、零検出バンドと隣接するスケールファクタバンドのスケールファクタの差分を固定値に設定すれば良いので、実施の形態３で必要であった図３〜図６の表が不要になる。
【００９９】
また、実施の形態４では、実施の形態３と比較して、非零検出バンド間のインテンシティステレオポジションにおける差分の範囲が±６０以内の条件（非零検出バンド間のインテンシティステレオポジションの差分を変えないための条件）が必要であるが、インテンシティステレオポジションの±６０は±９０ｄＢ（＝±６０×１．５ｄＢ）のゲインに対応し、ほとんどの場合この条件を満足するので、実質的に制約条件とはならない。
【０１００】
また、インテンシティステレオポジションの差分可変長符号化後の符号長に関しても、実施の形態３で対応可能な１２１個の差分の内、１２０個が差分で最小の値であり、残りの１個も最小より１ビット長いだけである。このため、ほぼ最適な可変長符号化を実現できる。
【０１０１】
なお、インテンシティステレオポジションの符号化時には、インテンシティステレオポジションの差分を算出することが必要なので、実施の形態３で説明したのと同様に、零バンドＩＳポジション差分設定部３５０で算出された零検出バンドのインテンシティステレオポジションの差分を出力し、符号化部１４０では、前記インテンシティステレオポジションの差分を直接可変長符号化するようにしてもよい。
【０１０２】
実施の形態４によれば、右チャンネルのコードブック番号がインテンシティステレオ処理を表す１４か１５の場合、スケールファクタデータの中のインテンシティステレオポジションにおける符号化に必要なビット数を、ほとんどの場合その最小のビット数まで削減することが可能である。
【０１０３】
以上のように実施の形態４では、零検出部１５１からの零検出バンド／非零検出バンドの検出結果に基づいて、零検出バンドと隣接するスケールファクタバンドのインテンシティステレオポジションの差分が、可変長符号化の最も短い符号長の値となるように、零検出バンドのインテンシティステレオポジションを零バンドＩＳポジション置換部３３０で置き換える。このような簡単な処理により、少ないビット数で従来と同一の復号結果を得ることができ、符号化効率が向上した符号化データを生成することができる。さらに、削減したビットを音質に寄与する他のデータに割り当てることにより、音質を向上させることができる。
【０１０４】
なお、実施の形態４では、非零検出バンド間のインテンシティステレオポジションの差分が所定の範囲内（±６０以内）である場合にのみ、零検出バンドのインテンシティステレオポジションの置き換えを行った。しかし、零検出バンドのインテンシティステレオポジションの置き換えを常に行い、非零検出バンド間のインテンシティステレオポジションの差分が所定の範囲外となるときには、所定の範囲内となるように非零検出バンド間のインテンシティステレオポジションの差分を制限するようにしてもよい。インテンシティステレオポジションの差分が±６０（±９０ｄＢ）の範囲外となる場合はほとんどないので、このようにしても実施の形態４とほぼ同じ結果を得ることができる。
【０１０５】
（実施の形態５）
実施の形態５は、インテンシティステレオ処理されたスケールファクタバンドに関し、零検出バンドの右チャンネルハフマンコードブック番号をデータ置換部１６０で置き換えることを特徴とするものである。
【０１０６】
図１の左チャンネルの零検出部１５１は、インテンシティステレオ処理されたスケールファクタバンド内の量子化されたスペクトルデータがすべて零であるか否かを検出し、データ置換部１６０に出力する。
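[0107]
The data replacement unit 160 performs the following processing on intensity-stereo-processed scale factor bands. First, it determines whether the right-channel intensity stereo position difference between non-zero-detected bands is within ±60. If it is, the intensity stereo position difference between the non-zero-detected bands can be preserved even when the intensity stereo position of the zero-detected band is omitted.
[0108]
Next, when the difference is within ±60, the unit calculates the number of bits required to encode the right-channel section data and scale factor data in two cases: with the right-channel codebook number of a zero-detected band lying between non-zero-detected bands changed from the codebook number indicating intensity stereo (14 or 15) to the codebook number indicating zero data (0), and with it left unchanged. If the changed case requires fewer bits, the right-channel codebook number indicating intensity stereo is replaced with the codebook number indicating zero data.
[0109]
With the codebook number indicating zero data, no scale factor needs to be transmitted, so the number of bits required for the scale factor data decreases. However, changing the codebook number may increase the number of sections and therefore the number of bits required for the section data. For this reason, the right-channel codebook number is changed only when the number of bits required to encode the scale factor data and the section data together becomes smaller.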
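The bit-count comparison of this embodiment can be sketched as follows. Several things here are assumptions for illustration and are not taken from the text: every section is costed at a fixed `SECTION_BITS` of 9 (a 4-bit codebook number plus a 5-bit length, with no escape for long sections), only the intensity stereo position differences are counted as scale factor data because the other bands are unaffected by the change, the ±60 range check is taken as already passed, and all function names are hypothetical. The caller supplies the code-length function.

```python
IS_CODEBOOKS = (14, 15)
SECTION_BITS = 9  # ASSUMPTION: 4-bit codebook number + 5-bit section length


def count_sections(codebooks):
    """Number of runs of identical codebook numbers."""
    return sum(1 for i, cb in enumerate(codebooks)
               if i == 0 or cb != codebooks[i - 1])


def right_channel_bits(codebooks, is_positions, code_length):
    """Section data bits plus IS position difference bits."""
    bits = SECTION_BITS * count_sections(codebooks)
    previous = 0  # the initial value for IS position differences is always 0
    for cb, position in zip(codebooks, is_positions):
        if cb in IS_CODEBOOKS:
            bits += code_length(position - previous)
            previous = position
    return bits


def maybe_zero_out(codebooks, is_positions, band, code_length):
    """Change band to codebook 0 only if that lowers the total bit count."""
    candidate = list(codebooks)
    candidate[band] = 0
    if (right_channel_bits(candidate, is_positions, code_length)
            < right_channel_bits(codebooks, is_positions, code_length)):
        return candidate
    return list(codebooks)
```

When the zero-detected band sits next to an existing codebook-0 section, no new section is created and the change tends to pay off; when it sits in the middle of a single intensity stereo section, the two extra sections can cost more than the saved position bits.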
【０１１０】
零検出バンドでは、インテンシティステレオ処理されて量子化されたスペクトルデータはすべて零なので、右チャンネルのコードブック番号を、インテンシティステレオ処理を表す番号（１４、あるいは１５）から零データを表すコードブック番号（０）に変化させても、同一の復号結果を得ることができる。
【０１１１】
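As described above, in Embodiment 5, on the basis of the zero-detected / non-zero-detected band results from the zero detection unit 151, the data replacement unit 160 replaces the right-channel codebook number of an intensity-stereo-processed zero-detected band when changing it from the codebook number indicating intensity stereo processing to the codebook number indicating zero data reduces the number of bits required for encoding. This yields the same decoding result as the conventional method with fewer bits, and so produces encoded data with improved encoding efficiency. Furthermore, the bits saved can be allocated to other data that contributes to sound quality, thereby improving the sound quality.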
【０１１２】
（実施の形態６）
次に本発明の実施の形態６におけるオーディオ高能率符号化方法について説明する。図１２は実施の形態６のオーディオ高能率符号化方法におけるデータ置換部１６０Ｅの構成を示すブロック図である。このデータ置換部１６０Ｅには、非零バンドＭ／Ｓ・ＩＳ位相反転フラグ同一判定部４１０、ＭＳマスク設定部４２０が設けられている。
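[0113]
The non-zero-band M/S / IS-phase-inversion flag identity determination unit 410 determines whether the M/S-used / IS-phase-inversion flags of the scale factor bands, excluding the intensity-stereo-processed bands that the left-channel zero detection unit 151 has detected as zero-detected bands, all have the same value, and outputs the result.
[0114]
When the determination unit 410 finds that the M/S-used / IS-phase-inversion flags all have the same value, the MS mask setting unit 420 replaces the value of the MS mask, which is the flag for the whole set of bands, with the value corresponding to that common value and outputs it. That is, if the common value is 0 the MS mask is replaced with 0, and if the common value is 1 the MS mask is replaced with 2.
[0115]
In an intensity-stereo-processed zero-detected band, all of the quantized spectral data is zero, so inverting the flag that indicates the phase relationship between the two channels gives the same decoding result.
[0116]
As described earlier, when the MS mask has a value other than 1, the M/S-used / IS-phase-inversion flags need not be transmitted as encoded data. Therefore, if the MS mask was 1 before replacement, replacing it with 0 or 2 removes the need to transmit the flag bits and reduces the number of bits required for encoding. Since the flag requires one bit per scale factor band, with 49 scale factor bands, for example, 49 bits can be saved.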
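The MS mask selection can be sketched as below, following the rule stated above (common value 0 gives MS mask 0, common value 1 gives MS mask 2). The function name `choose_ms_mask` and its return convention (MS mask, number of per-band flag bits to transmit) are hypothetical, and falling back to MS mask 1 when every band is excluded is an assumption of this sketch.

```python
IS_CODEBOOKS = (14, 15)


def choose_ms_mask(flags, right_codebooks, left_zero_detected):
    """Return (ms_mask, flag_bits) after ignoring IS zero-detected bands."""
    considered = [
        flag
        for flag, cb, zero in zip(flags, right_codebooks, left_zero_detected)
        if not (cb in IS_CODEBOOKS and zero)
    ]
    if not considered:
        return 1, len(flags)  # ASSUMPTION: keep per-band flags in this corner case
    if all(f == 0 for f in considered):
        return 0, 0
    if all(f == 1 for f in considered):
        return 2, 0
    return 1, len(flags)


# 49 bands: 48 ordinary bands with flag 0, one IS zero-detected band with flag 1.
flags = [0] * 48 + [1]
codebooks = [1] * 48 + [15]
zero = [False] * 48 + [True]
print(choose_ms_mask(flags, codebooks, zero))  # (0, 0)
```

In this example the MS mask would otherwise have to be 1 with 49 flag bits transmitted, so the replacement saves 49 bits, matching the figure given in the text.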
【０１１７】
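As described above, in Embodiment 6, on the basis of the zero-detected / non-zero-detected band results from the zero detection unit 151, when the M/S-used / IS-phase-inversion flags, which indicate the stereo processing state of each scale factor band, all have the same value across the scale factor bands other than the intensity-stereo-processed zero-detected bands, the MS mask setting unit 420 replaces the value of the MS mask, which indicates the flag state for the whole set of bands. This makes transmission of the per-band flags unnecessary and produces encoded data with improved encoding efficiency. Furthermore, the bits saved can be allocated to other data that contributes to sound quality, thereby improving the sound quality.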
【０１１８】
なお、上記の各実施の形態におけるオーディオ高能率符号化方法は、零検出バンドとそれに関連するデータを、より短い符号長のデータとなるように置き換えることにより、符号化効率の向上を実現している。このため、零スペクトルデータが発生しやすい低ビットレートの符号化や狭帯域信号の符号化に対して特に有効である。
【０１１９】
なお、上記の各実施の形態を組み合わせて実施することも可能であり、例えば、実施の形態１、実施の形態３、実施の形態５、実施の形態６を組み合わせて実施することもできる。ただし、同じデータを異なった形態で置き換える実施の形態１と実施の形態２、あるいは実施の形態３と実施の形態４は、同時に実施することはできない。
【０１２０】
なお、上記の各実施の形態では、オーディオ高能率符号化方法をＡＡＣエンコーダに適用した例を示した。しかし本発明は、同様な符号化フォーマットを有する他の符号化方式にも適用可能である。
【０１２１】
なお、上記の各実施の形態におけるオーディオ高能率符号化方法は、コンピュータまたはデジタルシグナルプロセッサに実行させるためのプログラムとして実現することができ、これをコンピュータ読み取り可能な記録媒体に記録してもよい。
【０１２２】
【発明の効果】
以上のように本発明によれば、符号化効率が向上した符号化データを生成することができる。すなわち、より少ないビット数で従来と同一の復号結果を得る符号化データを生成することができる。
【０１２３】
また本発明によれば、削減したビットを音質に寄与する他のデータに割り当てることにより、音質を向上させることができる。
【０１２４】
本発明は特に零スペクトルデータが発生しやすい低ビットレートの符号化や狭帯域信号の符号化に対して有効である。
【図面の簡単な説明】
【図１】本発明の各実施の形態によるオーディオ高能率符号化方法において、ＡＡＣエンコーダの構成を示すブロック図である。
【図２】本発明の実施の形態１におけるデータ置換部の構成を示すブロック図である。
【図３】２つの差分の和が一定という条件で、可変長符号化後の符号長を最小にする差分を示す表（その１）である。
【図４】２つの差分の和が一定という条件で、可変長符号化後の符号長を最小にする差分を示す表（その２）である。
【図５】２つの差分の和が一定という条件で、可変長符号化後の符号長を最小にする差分を示す表（その３）である。
【図６】２つの差分の和が一定という条件で、可変長符号化後の符号長を最小にする差分を示す表（その４）である。
【図７】本発明の実施の形態１におけるデータ置換部の別の構成を示すブロック図である。
【図８】本発明の実施の形態２におけるデータ置換部の構成を示すブロック図である。
【図９】本発明の実施の形態３におけるデータ置換部の構成を示すブロック図である。
【図１０】本発明の実施の形態３におけるデータ置換部の別の構成を示すブロック図である。
【図１１】本発明の実施の形態４におけるデータ置換部の構成を示すブロック図である。
【図１２】本発明の実施の形態６におけるデータ置換部の構成を示すブロック図である。
【図１３】従来のＡＡＣエンコーダの構成を示すブロック図である。
【図１４】スケールファクタとインテンシティステレオポジションの差分可変長符号化に用いられるハフマンコードの符号長を示す表（その１）である。
【図１５】スケールファクタとインテンシティステレオポジションの差分可変長符号化に用いられるハフマンコードの符号長を示す表（その２）である。
【図１６】インテンシティステレオ処理された符号化データの例を説明するための図である。
【符号の説明】
１０１，１０２フィルタバンク
１１０インテンシティステレオデータ生成部
１２０Ｍ／Ｓステレオデータ生成部
１３０量子化部
１４０符号化部
１５１，１５２零検出部
１６０，１６０Ａ，１６０Ｂ，１６０Ｃ，１６０Ｄ，１６０Ｅデータ置換部
２１１，２１２非零バンド間スケールファクタ差分算出部
２２１，２２２零バンドスケールファクタ差分算出部
２３１，２３２零バンドスケールファクタ置換部
２４１，２４２差分範囲判定部
２５１，２５２零バンドスケールファクタ差分設定部
３１０非零バンド間ＩＳポジション差分算出部
３２０零バンドＩＳポジション差分算出部
３３０零バンドＩＳポジション置換部
３４０差分範囲判定部
３５０零バンドＩＳポジション差分設定部
４１０非零バンドＭ／Ｓ・ＩＳ位相反転フラグ同一判定部
４２０ＭＳマスク設定部
[0001]
[Technical field of the invention]
The present invention relates to an audio high-efficiency encoder, an audio high-efficiency encoding method, an audio high-efficiency encoding program, and a recording medium therefor, which convert an audio signal into spectral data and encode the spectral data with high efficiency using variable-length coding.
[0002]
[Prior art]
In recent years, audio high-efficiency encoding methods have been proposed that improve encoding efficiency by converting an audio signal into spectral data and encoding the spectral data with variable-length coding.
[0003]
One known method of this kind is described in the MPEG-2 AAC (Advanced Audio Coding) standard (see Non-Patent Document 1). Taking the Low Complexity Profile of MPEG-2 AAC (hereinafter abbreviated as AAC) described in Non-Patent Document 1 as an example, the conventional technique for high-efficiency encoding of spectral data using variable-length coding is described below.
[0004]
FIG. 13 is a block diagram showing the configuration of a conventional AAC encoder. The AAC encoder includes filter banks 101 and 102, an intensity stereo data generation unit 110, a mid-side (M/S) stereo data generation unit 120, a quantization unit 130, and an encoding unit 140. The operation of an AAC encoder with this configuration is described below.
[0005]
The time-domain audio signal of the left channel (Lch) input to the filter bank 101 is divided into transform blocks of a predetermined number of time samples: 2048 samples for a long transform block and 256 samples for a short transform block. Each block is then converted into spectral data (MDCT coefficients) by the MDCT (Modified Discrete Cosine Transform). The transform is applied with successive transform blocks overlapping by 50%, so a long transform block converts 2048 samples into 1024 spectral values and a short transform block converts 256 samples into 128 spectral values.
[0006]
Eight short transform blocks are transformed in succession so that the short-block output contains 8 × 128 = 1024 spectral values, matching the long transform block. These 1024 spectral values per channel are the unit of encoding. The 1024 spectral values are then grouped into bands called scale factor bands, which model the critical-band characteristics of the human ear.
[0007]
Similarly, the time-domain audio signal of the right channel (Rch) input to the filter bank 102 is divided into transform blocks and converted by the MDCT into 1024 spectral values, which are then grouped into scale factor bands.
[0008]
The intensity stereo data generation unit 110 outputs, for each scale factor band, information on whether intensity stereo processing is applied. For each scale factor band to which intensity stereo processing is applied, it calculates and outputs the intensity-stereo-processed spectral data of the left channel, information indicating the phase relationship between the two channels, and an intensity stereo position indicating the directional gain of the right channel relative to the left channel. For scale factor bands to which intensity stereo processing is not applied, it outputs the spectral data of the left and right channels unchanged.
[0009]
The intensity-stereo-processed spectral data of the left channel is generated from the sum of the left-channel and right-channel spectral data (when the two channels are in phase) or from their difference (when the two channels are in opposite phase), with the gain corrected so that its power level matches the original power level of the left channel. The spectral data of the right channel is then set to zero. This zero spectral data is not transmitted as encoded data.
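The construction of the intensity-stereo-processed left channel can be sketched for one scale factor band as follows. The function name `intensity_stereo_band` is hypothetical, and taking the "power level" to be the sum of squared spectral values over the band is an assumption of this sketch; the text does not define it more precisely.

```python
import math


def intensity_stereo_band(left, right, in_phase=True):
    """Return (new_left, new_right) for one scale factor band.

    new_left is left + right (in phase) or left - right (opposite phase),
    scaled so its power equals the original left-channel power.
    ASSUMPTION: power is the sum of squares over the band.
    """
    sign = 1.0 if in_phase else -1.0
    combined = [l + sign * r for l, r in zip(left, right)]
    left_power = sum(l * l for l in left)
    combined_power = sum(c * c for c in combined)
    gain = math.sqrt(left_power / combined_power) if combined_power > 0 else 0.0
    new_left = [gain * c for c in combined]
    new_right = [0.0] * len(right)  # right channel is zeroed and not transmitted
    return new_left, new_right


new_left, new_right = intensity_stereo_band([1.0, 1.0], [1.0, 1.0])
print(new_left, new_right)
```

For identical in-phase channels the sum doubles the amplitude, and the gain correction of 0.5 brings the power back to that of the original left channel.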
[0010]
The mid-side stereo data generation unit (M/S stereo data generation unit) 120 generates, for each scale factor band, information on whether mid-side stereo processing is applied and, where it is applied, the mid spectral data and side spectral data. The mid spectral data is half the sum of the left-channel and right-channel spectral data and is output as the left-channel spectral data. The side spectral data is half the difference between the left-channel and right-channel spectral data and is output as the right-channel spectral data. For a scale factor band to which neither intensity stereo processing nor mid-side stereo processing is applied, the original spectral data of the left and right channels is output unchanged. Intensity stereo processing and mid-side stereo processing are mutually exclusive: if one is selected for a band, the other cannot be.
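The mid and side computation, together with the inverse that follows directly from it, can be written in a few lines. The function names `ms_encode` and `ms_decode` are hypothetical, and the decoder side is included only to show that the mapping is reversible before quantization.

```python
def ms_encode(left, right):
    """mid = (L + R) / 2, side = (L - R) / 2, per spectral value."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Inverse mapping: L = mid + side, R = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


mid, side = ms_encode([4.0, 2.0], [2.0, -2.0])
print(mid, side)             # [3.0, 0.0] [1.0, 2.0]
print(ms_decode(mid, side))  # ([4.0, 2.0], [2.0, -2.0])
```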
[0011]
The quantization unit 130 quantizes, for each scale factor band, a scale factor indicating the gain of the spectral data and the spectral data normalized by that gain. For each scale factor band of the left-channel and right-channel spectral data, it calculates the masking level of the spectral data, that is, the allowable quantization noise level, from an auditory model, and quantizes the scale factor and the normalized spectral data on the basis of the calculated allowable quantization noise level.
[0012]
The encoding unit 140 encodes the quantized data using variable-length coding based on Huffman codes, and generates and outputs encoded data. The processing of the quantization unit 130 and the encoding unit 140 is repeated until the number of bits required for the encoded data is no more than the number of available bits.
[0013]
Hereinafter, the format of the encoded data generated by the encoding unit 140 will be described. In the case of a stereo audio signal, the encoded data includes data common to two channels and data unique to each channel. First, data for each channel will be described. The main coded data for each channel includes three data: section data, scale factor data, and coded spectrum data.
[0014]
First, section data (section_data () in Non-Patent Document 1) will be described. A section is a set of scale factor bands that use the same codebook. The section data consists of, for every section in turn, the codebook number used in the section and the length of the section in units of scale factor bands.
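The relationship between per-band codebook numbers and section data can be sketched as a run-length grouping. This is an illustration only: the function name `build_sections` is hypothetical, and it simply groups consecutive bands that already share a codebook number without attempting any bit-count optimization.

```python
def build_sections(codebooks):
    """Group consecutive bands with the same codebook into (codebook, length) pairs."""
    sections = []
    for cb in codebooks:
        if sections and sections[-1][0] == cb:
            # Same codebook as the previous band: extend the current section.
            sections[-1] = (cb, sections[-1][1] + 1)
        else:
            # Codebook changed: start a new section of length 1.
            sections.append((cb, 1))
    return sections
```

Applied to the per-band codebook numbers 3, 3, 3, 0, 1, 1, this yields the sections (3, 3), (0, 1), (1, 2), which is the left-channel section data in the FIG. 16 example described later in this document.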
[0015]
In AAC, eleven kinds of Huffman codebooks with codebook numbers 1 to 11 are used for encoding quantized spectral data. In addition, there are special codebook numbers: codebook number 0, indicating that all quantized spectral data in the scale factor band is zero data; codebook number 15, indicating that the two channels in the intensity stereo processing are in phase; and codebook number 14, indicating that the two channels in the intensity stereo processing are out of phase.
[0016]
In AAC, in order to improve coding efficiency, the scale factor of the section of codebook number 0 representing zero data and the quantized spectrum data are not transmitted as coded data.
[0017]
As is apparent from the format of the section data described above, the number of bits required for the section data increases as the number of sections increases. Therefore, when a plurality of Huffman codebooks can be selected, a codebook is selected to form a section such that the total number of encoded data bits including the section data and the encoded spectrum data is reduced. The process of forming a section in this way is called sectioning. Due to the sectioning, a case occurs where the codebook number 0 is not used even if all the quantized spectral data in the scale factor band is zero.
[0018]
Scale factor data (scale_factor_data () in Non-Patent Document 1) includes data obtained by coding scale factors for all scale factor bands and data obtained by coding intensity stereo positions.
[0019]
The scale factor represents the gain of the scale factor band in units of 1.5 dB, and the encoding of the scale factor is performed by variable-length encoding the difference of the scale factor between the scale factor bands using a Huffman code. The coded data of the scale factor includes an initial value and a difference between the scale factors subjected to the variable length coding. The difference between the scale factors when performing variable length coding is within ± 60.
[0020]
FIGS. 14 and 15 show the code lengths of the Huffman codes used for variable-length coding of the difference between scale factors. In FIGS. 14 and 15, index is the address used when referring to the Huffman codebook and is the value obtained by adding 60 to the difference between the scale factors; dsf is the difference between the scale factors; and length is the code length of the Huffman code in bits. FIG. 14 shows dsf and length for index values 0 to 59, and FIG. 15 shows dsf and length for index values 60 to 120. As shown in FIG. 15, the Huffman code is shortest when the difference (dsf) is 0, in which case the code length (length) is 1 bit.
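The mapping from scale factors to codebook indices described above can be sketched as follows. The function name `scale_factor_indices` is hypothetical, and the code-length table of FIGS. 14 and 15 is deliberately not reproduced here; the sketch only forms the differences and the corresponding index values.

```python
def scale_factor_indices(scale_factors):
    """Return the Huffman codebook index (dsf + 60) for each consecutive difference."""
    indices = []
    for prev, cur in zip(scale_factors, scale_factors[1:]):
        dsf = cur - prev  # difference between scale factors of successive bands
        if not -60 <= dsf <= 60:
            # The difference must stay within +/-60 to be codable.
            raise ValueError(f"scale factor difference {dsf} is outside +/-60")
        indices.append(dsf + 60)  # index 60 corresponds to dsf = 0 (the 1-bit code)
    return indices
```

The initial scale factor is transmitted separately and is therefore not part of the returned list.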
[0021]
In the right channel of the scale factor band subjected to the intensity stereo processing, the intensity stereo position is transmitted instead of the scale factor as scale factor data.
[0022]
The intensity stereo position represents the directivity gain of the right channel with respect to the left channel in units of 1.5 dB. The intensity stereo position is encoded in the same manner as the scale factor. That is, the difference between the intensity stereo positions of adjacent scale factor bands subjected to intensity stereo processing is variable-length coded using the same Huffman codebook as for the scale factor. However, whereas the initial value for the scale factor differences is transmitted as coded data, the initial value for the intensity stereo position differences is always 0 and is not transmitted.
[0023]
The encoded spectral data (spectral_data () in Non-Patent Document 1) is data obtained by encoding the quantized spectral data using the Huffman codebook selected for each section. If the codebook number is 0, representing zero data, or 14 or 15, representing intensity stereo processing, no spectral data is transmitted.
[0024]
Next, an M / S presence / absence / IS phase inversion flag (ms_used in Non-Patent Document 1) and an MS mask (ms_mask_present in Non-Patent Document 1) will be described below as encoded data common to two channels.
[0025]
The M/S presence/absence / IS phase inversion flag is a 1-bit flag per scale factor band that indicates either the presence or absence of mid-side stereo processing or the presence or absence of phase inversion in intensity stereo (IS) processing. Specifically, it represents the following states.
1) M/S presence/absence / IS phase inversion flag = 0
No M/S processing, or no phase inversion in IS processing
2) M/S presence/absence / IS phase inversion flag = 1
M/S processing present, or phase inversion in IS processing
[0026]
Whether this flag indicates the presence or absence of the M / S processing or the presence or absence of the phase inversion of the IS processing is determined by the codebook number of the right channel. If the codebook number indicates intensity stereo processing, it indicates the presence / absence of phase inversion of the IS processing, and if not, it indicates the presence / absence of M / S processing.
[0027]
The MS mask indicates how the M/S presence/absence / IS phase inversion flag is encoded, and represents the following states.
1) MS mask = 0
All M/S presence/absence / IS phase inversion flag values are 0
2) MS mask = 1
The M/S presence/absence / IS phase inversion flag is transmitted for each band
3) MS mask = 2
All M/S flag values are 1 (IS phase inversion flag values are 0)
If the value of the MS mask is 0 or 2, the M/S presence/absence / IS phase inversion flag is not transmitted as encoded data.
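The interpretation of the MS mask and the per-band flag can be sketched as follows. The names `expand_flags`, `IS_CODEBOOKS`, and the returned labels are hypothetical; the rules themselves follow the description above, with codebook numbers 14 and 15 taken as the intensity stereo codebooks.

```python
IS_CODEBOOKS = {14, 15}  # right-channel codebook numbers that indicate intensity stereo

def expand_flags(ms_mask, right_codebooks, transmitted_flags=None):
    """Return a (meaning, flag) pair per band; transmitted_flags is needed only for mask 1."""
    result = []
    for band, cb in enumerate(right_codebooks):
        is_band = cb in IS_CODEBOOKS
        if ms_mask == 0:
            flag = 0                        # all flags are 0, nothing transmitted
        elif ms_mask == 1:
            flag = transmitted_flags[band]  # flag transmitted per band
        elif ms_mask == 2:
            flag = 0 if is_band else 1      # M/S flags are 1, IS inversion flags are 0
        else:
            raise ValueError(f"unsupported MS mask value {ms_mask}")
        # The right-channel codebook number decides what the flag means.
        meaning = "is_phase_inversion" if is_band else "ms_processing"
        result.append((meaning, flag))
    return result
```

With MS mask values 0 and 2 the whole flag array is implied by the mask and the codebook numbers, which is why no per-band flags need to be transmitted in those cases.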
[0028]
FIG. 16 is a diagram illustrating an example of encoded data for which intensity stereo processing has been performed. For simplicity, the number of scale factor bands is six. In the figure, dsf represents a difference between scale factors (the initial value for the scale factor differences is omitted), disp represents a difference between intensity stereo positions, and sd represents spectral data. "-" indicates that the corresponding data is not transmitted.
[0029]
First, the left channel data (Lch data) will be described. In this example, the number of sections is three. When the section data is represented as (codebook number, length), the section data is (3, 3), (0, 1), (1, 2). The codebook number of scale factor band number 3 of the left channel is 0, so neither the scale factor difference nor the spectral data is transmitted for that band.
[0030]
Next, the right channel data (Rch data) will be described. In this example, the number of sections of the right channel is three, and the section data is (3, 2), (2, 2), (15, 2). The codebook number of scale factor band numbers 4 and 5 of the right channel is 15, which indicates that intensity stereo processing has been performed. In a scale factor band subjected to intensity stereo processing, the difference in intensity stereo position is transmitted instead of the difference in scale factor, and spectral data is not transmitted.
[0031]
Next, the data common to the left and right channels will be described. In this example, the value of the MS mask is 1, and the M/S presence/absence / IS phase inversion flag is transmitted. Based on the codebook numbers of the right-channel data, the flag for scale factor band numbers 0 to 3 indicates the presence or absence of M/S processing, and the flag for scale factor band numbers 4 and 5 indicates the presence or absence of IS phase inversion.
[0032]
[Non-patent document 1]
ISO / IEC JTC1 / SC29 / WG11 N1650, "IS ISO / IEC 13818-7 (MPEG-2 Advanced Audio Coding, AAC)", April 1997, p. 14-20, p. 33-38, p. 51-62, p. 92-93, ANNEX B Informative Part p. 57-68
[0033]
[Problems to be solved by the invention]
However, the conventional encoding method described above has no means of detecting that all quantized spectral data in a band is zero and, based on that detection result, replacing data so as to generate encoded data with a smaller number of bits. As a result, coding efficiency suffers when the bit rate is limited, and sound quality may be degraded.
[0034]
The present invention has been made to solve the above problems, and its object is to realize a high-efficiency audio encoding device and method capable of generating encoded data with improved coding efficiency. That is, the object is to realize a high-efficiency audio encoding device, a high-efficiency audio encoding method, a high-efficiency audio encoding program, and a recording medium therefor that generate encoded data yielding the same decoding result as the conventional method with a smaller number of bits, and that can improve sound quality by allocating the saved bits to other data that contributes to sound quality.
[0035]
[Means for Solving the Problems]
According to a first aspect, an encoding device represents spectral data, in band units, by a scale factor indicating gain and by spectral data normalized by that gain and quantized, and variable-length codes the difference between the scale factors of adjacent bands. The high-efficiency audio encoding device and method comprise a zero detection unit that detects whether all quantized spectral data in a band is zero, and a data replacement unit that replaces the scale factor of a band detected to be zero with a value giving the shortest code length after differential variable-length coding.
[0036]
According to a second aspect, an encoding device represents spectral data, in band units, by a scale factor indicating gain and by spectral data normalized by that gain and quantized, and variable-length codes the difference between the scale factors of adjacent bands. The high-efficiency audio encoding device and method comprise a zero detection unit that detects whether all quantized spectral data in a band is zero, and a data replacement unit that replaces the scale factor of a band detected to be zero with another value so that the difference between the scale factors of that band and an adjacent band becomes the value having the shortest code length in the variable-length coding.
[0037]
According to a third aspect, an encoding device uses intensity stereo processing to represent spectral data of two channels by a scale factor indicating the band-unit gain in one channel, an intensity stereo position indicating the band-unit directivity gain in the other channel, and quantized spectral data of the one channel that has been intensity-stereo processed, normalized by the gain, and quantized, and variable-length codes the difference between the intensity stereo positions of adjacent intensity-stereo-processed bands. The high-efficiency audio encoding device and method comprise a zero detection unit that detects whether all quantized spectral data in an intensity-stereo-processed band is zero, and a data replacement unit that replaces the intensity stereo position of a band detected to be zero with a value giving the shortest code length after differential variable-length coding.
[0038]
According to a fourth aspect, an encoding device uses intensity stereo processing to represent spectral data of two channels by a scale factor indicating the band-unit gain in one channel, an intensity stereo position indicating the band-unit directivity gain in the other channel, and quantized spectral data of the one channel that has been intensity-stereo processed, normalized by the gain, and quantized, and variable-length codes the difference between the intensity stereo positions of adjacent intensity-stereo-processed bands. The high-efficiency audio encoding device and method comprise a zero detection unit that detects whether all quantized spectral data in an intensity-stereo-processed band is zero, and a data replacement unit that replaces the intensity stereo position of a band detected to be zero with another value so that the difference between the intensity stereo positions of that band and an adjacent band becomes the value having the shortest code length in the variable-length coding.
[0039]
Here, the value having the shortest code length in the variable-length coding may be zero.
[0040]
According to a fifth aspect, an encoding device uses intensity stereo processing to encode spectral data of two channels in band units using codebooks, wherein, in a band subjected to intensity stereo processing, the codebook number of one channel represents the codebook used to encode the intensity-stereo-processed and quantized spectral data, and the codebook number of the other channel represents intensity stereo processing. The high-efficiency audio encoding device and method comprise a zero detection unit that detects whether all quantized spectral data in an intensity-stereo-processed band of the one channel is zero, and a data replacement unit that calculates the number of bits required for encoding when the codebook number of the other channel, in an intensity-stereo-processed band detected to be zero, is changed from the codebook number representing intensity stereo processing to the codebook number representing zero data, and that replaces the codebook number of the other channel in this way when the change reduces the number of bits required for encoding.
[0041]
According to a sixth aspect, an encoding device encodes spectral data of two channels in band units using mid-side stereo processing and intensity stereo processing, and encodes a flag indicating the presence or absence of mid-side stereo processing and the presence or absence of phase inversion between the two channels in intensity stereo processing. The high-efficiency audio encoding device and method comprise a zero detection unit that detects whether all quantized spectral data in an intensity-stereo-processed band of one channel is zero, and a data replacement unit that, when the flags of all bands other than the intensity-stereo-processed bands detected to be zero have the same value, replaces the flag values of all bands with another value corresponding to that same value.
[0042]
A seventh invention is a program for causing a computer or a digital signal processor to execute the high-efficiency audio encoding method according to any one of claims 8 to 14.
[0043]
According to an eighth aspect of the present invention, there is provided a computer-readable recording medium recording a program for causing a computer or a digital signal processor to execute the high-efficiency audio encoding method according to any one of claims 8 to 14.
[0044]
[Best Mode for Carrying Out the Invention]
Hereinafter, embodiments of the present invention will be described with reference to the drawings. In the description of each embodiment, a case where the audio high efficiency encoding method of the present invention is applied to an AAC encoder will be described as an example.
[0045]
First, characteristic points common to the high-efficiency audio encoding methods according to the embodiments of the present invention will be collectively described, and then characteristic points unique to each embodiment will be individually described.
[0046]
FIG. 1 is a block diagram showing the configuration of an AAC encoder that uses the high-efficiency audio encoding method according to an embodiment of the present invention. The AAC encoder shown in FIG. 1 includes filter banks 101 and 102, an intensity stereo data generation unit 110, a mid-side (M/S) stereo data generation unit 120, a quantization unit 130, an encoding unit 140, zero detection units 151 and 152, and a data replacement unit 160. In FIG. 1, the same components as those in FIG. 13 are denoted by the same reference numerals, and their description is omitted.
[0047]
Hereinafter, the operations of the zero detection units 151 and 152 and the data replacement unit 160, which are the characteristic features of the present invention, will be described. The zero detection unit 151 uses the quantized spectral data of the left channel from the quantization unit 130 to detect whether all quantized spectral data in each scale factor band is zero, and outputs the detection result to the data replacement unit 160.
[0048]
Similarly, the zero detection unit 152 detects whether all quantized spectral data of the right channel from the quantization unit 130 is zero in each scale factor band, and outputs the detection result to the data replacement unit 160. For right-channel scale factor bands that have been subjected to intensity stereo processing, zero detection by the zero detection unit 152 is unnecessary.
[0049]
Hereinafter, a scale factor band detected as zero by the zero detection units 151 and 152 is referred to as a zero detection band, and a scale factor band not detected as zero is referred to as a non-zero detection band.
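The zero detection performed per scale factor band can be sketched as follows. The function name `detect_zero_bands` and the representation of band boundaries as a list of start offsets (`band_offsets`, with a final end offset) are assumptions made for illustration.

```python
def detect_zero_bands(quantized, band_offsets):
    """Return True for each scale factor band whose quantized spectral data is all zero."""
    flags = []
    for start, end in zip(band_offsets, band_offsets[1:]):
        # A band is a zero detection band only if every coefficient in it is 0.
        flags.append(all(q == 0 for q in quantized[start:end]))
    return flags
```

A band flagged True corresponds to a zero detection band, and a band flagged False to a non-zero detection band.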
[0050]
The data replacement unit 160 replaces the data of the zero detection bands based on the zero detection results from the zero detection units 151 and 152, and outputs the data to the encoding unit 140. For a scale factor band subjected to intensity stereo processing, the right channel is decoded from the decoding result of the left channel, so the intensity stereo position or codebook number of the corresponding right-channel scale factor band is replaced based on the detection result from the left-channel zero detection unit 151.
[0051]
Encoding section 140 performs encoding using the data replaced by data replacing section 160, and generates and outputs encoded data.
[0052]
(Embodiment 1)
FIG. 2 is a block diagram showing a configuration of data replacement section 160A in the high-efficiency audio encoding device according to the first embodiment. The data replacement unit 160A includes non-zero inter-band scale factor difference calculation units 211 and 212, zero band scale factor difference calculation units 221 and 222, and zero band scale factor replacement units 231 and 232. The non-zero inter-band scale factor difference calculation unit 211, the zero band scale factor difference calculation unit 221 and the zero band scale factor replacement unit 231 in the upper part of FIG. 2 are for left channel data. The non-zero inter-band scale factor difference calculation unit 212, the zero band scale factor difference calculation unit 222, and the zero band scale factor replacement unit 232 in the lower stage are for right channel data. Since the operations of the left channel and the right channel are the same, the operation of the left channel will be described below, and the description of the operation of the right channel will be omitted.
[0053]
The non-zero inter-band scale factor difference calculation unit 211 calculates the scale factor difference between scale factor bands detected as non-zero detection bands by the zero detection unit 151.
[0054]
Using the scale factor difference between non-zero detection bands from the non-zero inter-band scale factor difference calculation unit 211, the zero band scale factor difference calculation unit 221 calculates, without changing that difference, the value for which the difference between the scale factor of the zero detection band and the scale factor of the adjacent scale factor band has the shortest code length after variable-length coding.
[0055]
Let i be the scale factor band number and sf(i) be the scale factor of band i. Consider the case where bands i and i+2 are non-zero detection bands and band i+1 is a zero detection band. Using the value of sf(i+2) - sf(i) and referring to a table, the value of the difference sf(i+1) - sf(i) that minimizes the code length after variable-length coding with the Huffman codes of FIGS. 14 and 15 is calculated.
[0056]
FIGS. 3 to 6 are tables showing, for the Huffman codes of FIGS. 14 and 15, the values of dsf1 and dsf2 that minimize the total code length of dsf1 and dsf2 under the condition that the sum of the two differences dsf1 and dsf2 is constant, together with the total code length at that minimum. The values of dsf1 and dsf2 may be interchanged. FIG. 3 shows dsf1, dsf2, and length when dsf1 + dsf2 is -120 to -61, FIG. 4 when dsf1 + dsf2 is -60 to -1, FIG. 5 when dsf1 + dsf2 is 0 to 59, and FIG. 6 when dsf1 + dsf2 is 60 to 120.
[0057]
Consider the case where the scale factor difference sf(i+2) - sf(i) between the non-zero detection bands is, for example, -4. Referring to the row in FIGS. 3 to 6 where dsf1 + dsf2 is -4, FIG. 4 shows that the value of sf(i+1) - sf(i) that minimizes the code length after variable-length coding is -4 (dsf1) or 0 (dsf2), and the corresponding value of sf(i+2) - sf(i+1) is 0 or -4, respectively.
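The search that underlies such a table can be sketched as follows. This is an illustration under assumptions: `best_split` is a hypothetical helper, and `toy_length` is a made-up stand-in for the code lengths of FIGS. 14 and 15, which are not reproduced here; only its property that a difference of 0 costs 1 bit is taken from the source.

```python
def best_split(total, code_length, max_abs=60):
    """Find (dsf1, dsf2, bits) with dsf1 + dsf2 == total and minimal total code length."""
    best = None
    for dsf1 in range(-max_abs, max_abs + 1):
        dsf2 = total - dsf1
        if abs(dsf2) > max_abs:
            continue  # both differences must stay within the codable range
        bits = code_length(dsf1) + code_length(dsf2)
        if best is None or bits < best[2]:
            best = (dsf1, dsf2, bits)  # keep the first split that attains the minimum
    return best

def toy_length(dsf):
    """Made-up code length: 1 bit for 0, growing with |dsf| (NOT the table of FIGS. 14 and 15)."""
    return 1 if dsf == 0 else 4 + abs(dsf)
```

With this stand-in, a total of -4 splits into -4 and 0, the same shape of result as the row for dsf1 + dsf2 = -4 described above; with the real table, the bit counts would differ.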
[0058]
When k zero detection bands are consecutive (where k is a positive integer), the differences that minimize the code length after variable-length coding can be calculated by treating the given difference as the sum of (k + 1) differences and referring to a similar table.
[0059]
The zero band scale factor replacement unit 231 adds the difference calculated by the zero band scale factor difference calculation unit 221 to the scale factor of the scale factor band immediately preceding the zero detection band, and replaces the scale factor of the zero detection band with the resulting value. In the zero detection band, all quantized spectral data is zero, so the same decoding result is obtained even if the value of the scale factor representing the gain is changed.
[0060]
Since the difference between scale factors must in any case be calculated when the scale factors are encoded, the configuration shown in FIG. 7 may be used instead: the scale factor differences of the zero detection band calculated by the zero band scale factor difference calculation units 221 and 222 are output directly, and the encoding unit 140 variable-length codes those differences directly. In this case, the zero band scale factor difference calculation units 221 and 222 need to calculate and output both scale factor differences between the zero detection band and its two adjacent scale factor bands. That is, in the above example, the two differences sf(i+1) - sf(i) and sf(i+2) - sf(i+1) must be calculated and output.
[0061]
According to the first embodiment, the number of bits required for encoding the scale factor data can be reduced for scale factor bands with codebook numbers 1 to 11, which require transmission of a scale factor. In a scale factor band with codebook number 0, however, no scale factor is transmitted, so the number of bits of scale factor data cannot be reduced.
[0062]
As described above, in the first embodiment, based on the zero detection band / non-zero detection band results from the zero detection units 151 and 152, the zero band scale factor replacement units 231 and 232 replace the scale factor of each zero detection band with the value giving the shortest code length after differential variable-length coding. As a result, the same decoding result as in the conventional method is obtained with a smaller number of bits, and encoded data with improved coding efficiency can be generated. Furthermore, sound quality can be improved by allocating the saved bits to other data that contributes to sound quality.
[0063]
(Embodiment 2)
Next, a high-efficiency audio encoding device according to Embodiment 2 of the present invention will be described. FIG. 8 is a block diagram showing the configuration of the data replacement unit 160B in the high-efficiency audio encoding device according to the second embodiment. In FIG. 8, components identical to those of the first embodiment are denoted by the same reference numerals. The data replacement unit 160B includes non-zero inter-band scale factor difference calculation units 211 and 212, difference range determination units 241 and 242, zero band scale factor difference setting units 251 and 252, and zero band scale factor replacement units 231 and 232.
[0064]
In the upper part of FIG. 8, a non-zero inter-band scale factor difference calculation unit 211, a difference range determination unit 241, a zero band scale factor difference setting unit 251, and a zero band scale factor replacement unit 231 are for left channel data. The lower-stage non-zero band scale factor difference calculation unit 212, difference range determination unit 242, zero band scale factor difference setting unit 252, and zero band scale factor replacement unit 232 are for right channel data. Since the operations of the left channel and the right channel are the same, the operation of the left channel will be described below, and the description of the operation of the right channel will be omitted.
[0065]
The non-zero inter-band scale factor difference calculation unit 211 calculates the scale factor difference between scale factor bands detected as non-zero detection bands by the zero detection unit 151 in FIG. 1.
[0066]
The difference range determination unit 241 determines whether the scale factor difference between non-zero detection bands from the non-zero inter-band scale factor difference calculation unit 211 is within a predetermined range. In the present embodiment, it determines whether the difference is within ±60 and outputs the determination result.
[0067]
When the output from the difference range determination unit 241 indicates that the scale factor difference between the non-zero detection bands is within ±60, the zero band scale factor difference setting unit 251 sets the difference between the scale factor of the zero detection band and that of the adjacent scale factor band to the value having the shortest code length in the variable-length coding. In this embodiment, the Huffman codes shown in FIGS. 14 and 15 are used for variable-length coding. The shortest code length is 1 bit, and the corresponding value is 0.
[0068]
Let i be a scale factor band number and sf(i) the scale factor of band i, and suppose that bands i and i+2 are non-zero detection bands and band i+1 is a zero detection band. If sf(i+2) - sf(i) is within ±60, the value of the difference sf(i+1) - sf(i) is set to 0.
[0069]
If two or more zero detection bands are consecutive, the scale factor difference from the adjacent scale factor band is set to zero for all of those zero detection bands. The zero band scale factor replacement unit 231 adds the difference set by the zero band scale factor difference setting unit 251 to the scale factor of the scale factor band immediately preceding the zero detection band, and replaces the scale factor of the zero detection band with the resulting value. In the present embodiment, since the difference is set to 0, sf(i+1) is replaced with the same value as sf(i).
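The replacement of the second embodiment can be sketched as follows. The function name `replace_zero_band_scale_factors` is hypothetical. As assumptions not covered by the text, the sketch leaves untouched any run of zero detection bands that has no non-zero detection band before it or after it, and it does not model the fact that bands with codebook number 0 transmit no scale factor at all.

```python
def replace_zero_band_scale_factors(sf, is_zero, max_diff=60):
    """Copy the preceding non-zero band's scale factor into each zero detection band."""
    out = list(sf)
    prev = None  # index of the most recent non-zero detection band
    run = []     # zero detection bands seen since that band
    for i, zero in enumerate(is_zero):
        if zero:
            run.append(i)
            continue
        # Band i is a non-zero detection band that closes the current run.
        if prev is not None and run and abs(sf[i] - sf[prev]) <= max_diff:
            for j in run:
                out[j] = sf[prev]  # difference to the preceding band becomes 0
        prev, run = i, []
    return out
```

After the replacement, every difference inside the run is 0 and only the last difference of the run carries the unchanged difference between the two non-zero detection bands.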
[0070]
In the zero detection band, all the quantized spectral data are zero, so that the same decoding result can be obtained even if the value of the scale factor representing the gain is changed.
[0071]
In the second embodiment, since the difference between the scale factors of the zero detection band and the adjacent scale factor band may be set to a fixed value, the tables of FIGS. 3 to 6 used in the first embodiment are unnecessary.
[0072]
Compared with the first embodiment, the second embodiment additionally requires that the scale factor difference between non-zero detection bands be within ±60 (a condition that arises because the scale factor difference between non-zero detection bands is left unchanged). However, a scale factor difference of ±60 corresponds to a gain of ±90 dB (= ±60 × 1.5 dB), and this condition is satisfied in most cases, so it is not a substantial constraint.
[0073]
Regarding the code length after variable-length coding of the scale factor differences, of the 121 differences that can be handled in the second embodiment (the number of integers within ±60), 120 give the minimum code length and the remaining one is only 1 bit longer than the minimum, so nearly optimal variable-length coding is achieved. That is, in FIGS. 3 to 6, when dsf1 + dsf2 is within ±60, either dsf1 or dsf2 is 0 except when dsf1 + dsf2 is 31. When dsf1 + dsf2 is 31, setting dsf1 to 0 (1 bit) and dsf2 to 31 (19 bits) gives a code length of 20 (= 1 + 19) bits, which is 1 bit longer than the minimum of 19 bits shown in FIG. 5.
[0074]
Since the difference between scale factors must in any case be calculated when the scale factors are encoded, the scale factor difference of the zero detection band set by the zero band scale factor difference setting unit 251 may be output directly, in the same manner as described in the first embodiment, and the encoding unit 140 may variable-length code that difference directly.
[0075]
According to the second embodiment, the number of bits required for encoding the scale factor data can be reduced for scale factor bands with codebook numbers 1 to 11, which require transmission of a scale factor. In a scale factor band with codebook number 0, however, no scale factor is transmitted, so the number of bits of scale factor data cannot be reduced.
[0076]
As described above, in the second embodiment, based on the zero detection band / non-zero detection band results from the zero detection units 151 and 152, the zero band scale factor replacement units 231 and 232 replace the scale factor of each zero detection band so that the difference between that scale factor and the scale factor of the adjacent scale factor band becomes the value having the shortest code length in the variable-length coding. With this simple processing, the same decoding result as in the conventional method is obtained with a smaller number of bits, and encoded data with improved coding efficiency can be generated. Furthermore, sound quality can be improved by allocating the saved bits to other data that contributes to sound quality.
[0077]
In the second embodiment, the scale factor of the zero detection band is replaced only when the difference in scale factor between the non-zero detection bands is within a predetermined range (within ±60). Alternatively, the scale factor of the zero detection band may always be replaced, and when the difference in scale factor between the non-zero detection bands falls outside the predetermined range, that difference may be limited so that it lies within the range. Since the difference between scale factors almost never falls outside ±60 (±90 dB), this approach gives almost the same result as the second embodiment.
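The "always replace, then limit" variant described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the specification's implementation: the function name `replace_and_limit`, the list-based interface, and the choice to leave the first band untouched are all hypothetical.

```python
MAX_DIFF = 60  # predetermined range from the text (+/-60 steps, about +/-90 dB)


def replace_and_limit(sf, is_zero_band):
    """Always replace zero-band scale factors; limit non-zero-band differences.

    sf           -- scale factor per band
    is_zero_band -- True where all quantized spectral data in the band is zero
    Assumption: the first band is left unchanged because it has no predecessor.
    """
    out = list(sf)
    for i in range(1, len(out)):
        prev = out[i - 1]
        if is_zero_band[i]:
            # Difference 0 is the shortest code, so copy the preceding value.
            out[i] = prev
        else:
            # prev equals the last non-zero band's value, so this limits the
            # difference between non-zero detection bands to +/-MAX_DIFF.
            diff = max(-MAX_DIFF, min(MAX_DIFF, out[i] - prev))
            out[i] = prev + diff
    return out
```

Note that limiting a difference changes the scale factor of a non-zero band, so in the rare out-of-range case the decoded result is no longer identical, which is why the text speaks of almost the same result.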
[0078]
(Embodiment 3)
Next, a high efficiency audio encoding apparatus according to Embodiment 3 of the present invention will be described. FIG. 9 is a block diagram showing a configuration of a data replacement unit 160C in the high-efficiency audio encoding device according to the third embodiment. The data replacement unit 160C includes a non-zero inter-band IS position difference calculation unit 310, a zero band IS position difference calculation unit 320, and a zero band IS position replacement unit 330.
[0079]
The non-zero inter-band IS position difference calculation unit 310 calculates the difference in right-channel intensity stereo (IS) position between scale factor bands that are subjected to intensity stereo processing and that the left-channel zero detection unit 151 has detected as non-zero detection bands.
[0080]
Using the difference in intensity stereo position between non-zero detection bands supplied by the non-zero inter-band IS position difference calculation unit 310, the zero band IS position difference calculation unit 320 calculates the difference in intensity stereo position between the zero detection band and its adjacent scale factor band so that the code length after variable-length coding is shortest, without changing the difference in intensity stereo position between the non-zero detection bands.
[0081]
Let i be a scale factor band number and isp(i) the intensity stereo position of band i. When i and i+2 are non-zero detection bands and i+1 is a zero detection band, the unit uses the value of isp(i+2) - isp(i) and the tables of FIGS. 3 to 6 to determine the value of the difference isp(i+1) - isp(i) that minimizes the code length after variable-length coding with the Huffman code of FIGS. 14 and 15.
[0082]
The Huffman code used for variable-length encoding of the difference in intensity stereo position is the same as the Huffman code used for variable-length encoding of the difference in scale factor, so that the table used in the first embodiment can be used as it is.
[0083]
For example, when the difference in intensity stereo position between non-zero detection bands, isp(i+2) - isp(i), is 0, the value listed in the tables as sf(i+1) - sf(i), which here corresponds to isp(i+1) - isp(i) and minimizes the code length after variable-length coding, is 0, and the value of isp(i+2) - isp(i+1) is then also 0.
[0084]
The zero band IS position replacement unit 330 adds the difference calculated by the zero band IS position difference calculation unit 320 to the intensity stereo position of the scale factor band immediately preceding the zero detection band, and replaces the right-channel intensity stereo position of the zero detection band with the result.
[0085]
For example, suppose the intensity stereo positions before replacement are isp(i) = 4, isp(i+1) = -4, and isp(i+2) = 4, where i and i+2 are non-zero detection bands and i+1 is a zero detection band. In this case isp(i+1) is replaced with 4. Before the replacement, isp(i+1) - isp(i) = -8 (8 bits) and isp(i+2) - isp(i+1) = 8 (8 bits), so 16 (= 8 + 8) bits were needed for encoding. After the replacement, isp(i+1) - isp(i) = 0 (1 bit) and isp(i+2) - isp(i+1) = 0 (1 bit), for a total of 2 (= 1 + 1) bits. The number of bits required for encoding is therefore reduced by 14 (= 16 - 2) bits.
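The bit counts in this example can be reproduced with a short sketch. It assumes only the code lengths quoted in the example (0 gives 1 bit, +8 and -8 give 8 bits each); the names `CODE_LEN` and `diff_bits` are illustrative, and the rest of the Huffman table of FIGS. 14 and 15 is not reproduced.

```python
# Huffman code lengths in bits for the difference values used in the example.
# Only the values quoted in the text are listed.
CODE_LEN = {0: 1, 8: 8, -8: 8}


def diff_bits(isp):
    """Total bits to code the differences between consecutive IS positions."""
    return sum(CODE_LEN[b - a] for a, b in zip(isp, isp[1:]))


before = [4, -4, 4]  # isp(i), isp(i+1), isp(i+2); band i+1 is the zero detection band
after = [4, 4, 4]    # isp(i+1) replaced so that both differences become 0

bits_before = diff_bits(before)
bits_after = diff_bits(after)
saved = bits_before - bits_after
```

The three results correspond to the 16 bits, 2 bits, and 14-bit saving stated in the example.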
[0086]
In the zero detection band, all the quantized spectral data are zero, so that even if the value of the intensity stereo position indicating the directivity gain of the right channel with respect to the left channel is changed, the same decoding result can be obtained.
[0087]
Because encoding the intensity stereo position requires computing the difference between intensity stereo positions anyway, the data replacement unit may instead be configured as shown in FIG. 10, in which the zero band IS position difference calculation unit 320 outputs the calculated differences in right-channel intensity stereo position for the zero detection band, and the encoding unit 140 in FIG. 1 variable-length encodes those differences directly. In this case, the zero band IS position difference calculation unit 320 must set two differences: those between the zero detection band and each of its two adjacent scale factor bands. In the above example, both isp(i+1) - isp(i) and isp(i+2) - isp(i+1) must be calculated and output.
[0088]
According to the third embodiment, when the codebook number of the right channel is 14 or 15, which indicates intensity stereo processing, the number of bits needed to encode the intensity stereo positions in the scale factor data can be reduced to the minimum.
[0089]
As described above, in the third embodiment, based on the zero detection band / non-zero detection band results from the zero detection unit 151, the zero band IS position replacement unit 330 replaces the intensity stereo position of the zero detection band with the value that gives the shortest code length after differential variable-length coding. As a result, the same decoding result as in the related art is obtained with fewer bits, and coded data with improved coding efficiency can be generated. Furthermore, sound quality can be improved by allocating the saved bits to other data that contributes to sound quality.
[0090]
(Embodiment 4)
Next, a high efficiency audio encoding apparatus according to Embodiment 4 of the present invention will be described. FIG. 11 is a block diagram showing a configuration of the data replacement unit 160D in the audio efficient coding method according to the fourth embodiment. The same reference numerals are used for the same components as those in FIG. The data replacement unit 160D includes a non-zero inter-band IS position difference calculation unit 310, a difference range determination unit 340, a zero band IS position difference setting unit 350, and a zero band IS position replacement unit 330.
[0091]
The non-zero inter-band IS position difference calculation unit 310 calculates the difference in right-channel intensity stereo position between scale factor bands that are subjected to intensity stereo processing and that the left-channel zero detection unit 151 has detected as non-zero detection bands.
[0092]
The difference range determination unit 340 determines whether the difference in intensity stereo position between non-zero detection bands, supplied by the non-zero inter-band IS position difference calculation unit 310, is within a predetermined range, which is ±60 in the present embodiment, and outputs the determination result.
[0093]
Based on the output of the difference range determination unit 340, when the difference in intensity stereo position between the non-zero detection bands is within ±60, the zero band IS position difference setting unit 350 sets the difference in intensity stereo position between the zero detection band and its adjacent scale factor band to the value with the shortest variable-length code. In the present embodiment, the Huffman code shown in FIGS. 14 and 15 is used for variable-length coding, so the shortest code length is 1 bit and the corresponding value is 0.
[0094]
Let i be a scale factor band number and isp(i) the intensity stereo position of band i. When i and i+2 are non-zero detection bands, i+1 is a zero detection band, and isp(i+2) - isp(i) is within ±60, the difference isp(i+1) - isp(i) is set to 0.
[0095]
When two or more zero detection bands occur consecutively, the difference in intensity stereo position from the adjacent scale factor band is set to 0 for every one of those zero detection bands.
[0096]
The zero band IS position replacement unit 330 adds the difference set by the zero band IS position difference setting unit 350 to the intensity stereo position of the scale factor band immediately preceding the zero detection band, and replaces the intensity stereo position of the zero detection band with the result. In this embodiment the difference is set to 0, so isp(i+1) is replaced with the same value as isp(i).
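The range check and replacement of the fourth embodiment, including runs of consecutive zero detection bands, can be sketched as below. This is an illustration under assumptions rather than the specification's implementation: the function name and list interface are hypothetical, every band in the list is assumed to be intensity stereo processed, band 0 is never modified, and a trailing run with no following non-zero band is replaced unconditionally (a case the text does not cover).

```python
MAX_DIFF = 60  # predetermined range used in this embodiment


def replace_zero_band_isp(isp, is_zero_band):
    """Copy the preceding IS position into zero detection bands when allowed."""
    out = list(isp)
    n = len(out)
    i = 1
    while i < n:
        if not is_zero_band[i]:
            i += 1
            continue
        # Find the end of the run of consecutive zero detection bands.
        j = i
        while j < n and is_zero_band[j]:
            j += 1
        left = out[i - 1]  # band immediately before the run
        # Replace only if the non-zero bands on both sides differ by at most MAX_DIFF
        # (or if the run reaches the last band; assumption, see lead-in).
        if j == n or abs(out[j] - left) <= MAX_DIFF:
            for k in range(i, j):
                out[k] = left  # difference 0 for every band in the run
        i = j
    return out
```

For the earlier example, `replace_zero_band_isp([4, -4, 4], [False, True, False])` yields `[4, 4, 4]`, while a run whose surrounding non-zero bands differ by more than 60 is left untouched.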
[0097]
In the zero detection band, all the quantized spectral data are zero, so that even if the value of the intensity stereo position indicating the directivity gain is changed, the same decoding result can be obtained.
[0098]
In the fourth embodiment, the difference between the zero detection band and the adjacent scale factor band is simply set to a fixed value, so the lookup tables used in the third embodiment are not needed.
[0099]
Further, compared with the third embodiment, the fourth embodiment additionally requires that the difference in intensity stereo position between non-zero detection bands be within ±60. However, ±60 in intensity stereo position corresponds to a gain of ±90 dB (= ±60 × 1.5 dB), so this condition is satisfied in almost all cases and is not a practical restriction.
[0100]
Regarding the code length of the intensity stereo position after differential variable-length coding: for 120 of the 121 differences that can be handled in the third embodiment, the code length equals the minimum, and for the remaining one it is only 1 bit longer than the minimum. Nearly optimal variable-length coding is therefore achieved.
[0101]
Because encoding the intensity stereo position requires computing the difference between intensity stereo positions anyway, the difference in intensity stereo position set for the zero detection band by the zero band IS position difference setting unit 350 may be output as it is, as described in the third embodiment, and the encoding unit 140 may variable-length encode the difference directly.
[0102]
According to the fourth embodiment, when the codebook number of the right channel is 14 or 15, which indicates intensity stereo processing, the number of bits needed to encode the intensity stereo positions in the scale factor data can be reduced to the minimum in almost all cases.
[0103]
As described above, in the fourth embodiment, based on the zero detection band / non-zero detection band results from the zero detection unit 151, the zero band IS position replacement unit 330 replaces the intensity stereo position of the zero detection band so that the difference in intensity stereo position between the zero detection band and the adjacent scale factor band takes the value with the shortest variable-length code. With this simple processing, the same decoding result as in the related art is obtained with fewer bits, and encoded data with improved encoding efficiency can be generated. Furthermore, sound quality can be improved by allocating the saved bits to other data that contributes to sound quality.
[0104]
In the fourth embodiment, the intensity stereo position of the zero detection band is replaced only when the difference in intensity stereo position between the non-zero detection bands is within a predetermined range (within ±60). Alternatively, the intensity stereo position of the zero detection band may always be replaced, and when the difference in intensity stereo position between the non-zero detection bands falls outside the predetermined range, that difference may be limited so that it lies within the range. Since the difference between intensity stereo positions almost never falls outside ±60 (±90 dB), this approach gives almost the same result as the fourth embodiment.
[0105]
(Embodiment 5)
Embodiment 5 is characterized in that the data replacement unit 160 replaces the right channel Huffman codebook number of the zero detection band with respect to the scale factor band subjected to the intensity stereo processing.
[0106]
The zero detection unit 151 in FIG. 1 detects whether all the quantized spectral data in a scale factor band subjected to intensity stereo processing is zero, and outputs the result to the data replacement unit 160.
[0107]
The data replacement unit 160 performs the following processing on the scale factor band subjected to the intensity stereo processing. First, it is determined whether the difference of the intensity stereo position in the right channel between the non-zero detection bands is within ± 60. If it is within ± 60, the difference in intensity stereo position between non-zero detection bands can be maintained even if the intensity stereo position of the zero detection band is omitted.
[0108]
Next, when the difference is within ±60, the number of bits required to encode the right-channel section data and scale factor data is calculated for two cases: one in which the right-channel codebook number of the zero detection band lying between the non-zero detection bands is changed from the number representing intensity stereo (14 or 15) to the number representing zero data (0), and one in which it is left unchanged. If fewer bits are required when the number is changed, the right-channel codebook number representing intensity stereo is replaced with the codebook number representing zero data.
[0109]
In the case of a codebook number representing zero data, the number of bits required for the scale factor data is reduced because the scale factor need not be transmitted. However, changing the codebook number may increase the number of sections and the number of bits required for section data. Therefore, the codebook number of the right channel is changed only when the number of bits necessary for encoding the scale factor data and the section data is reduced.
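The trade-off between scale factor data and section data can be illustrated with a small cost comparison. The sketch below is a simplified model, not the specification's procedure: it assumes a fixed cost of 9 bits per section (`SECTION_BITS`), assumes the first intensity stereo position is coded relative to 0, takes the code length function `code_len` as a caller-supplied parameter, and assumes the ±60 check has already passed. All function names are hypothetical.

```python
ZERO_CB = 0        # codebook number representing zero data
IS_CBS = (14, 15)  # codebook numbers representing intensity stereo
SECTION_BITS = 9   # assumption: fixed side-information cost per section


def count_sections(codebooks):
    """A section is a run of consecutive bands sharing one codebook number."""
    return sum(1 for i, cb in enumerate(codebooks)
               if i == 0 or cb != codebooks[i - 1])


def side_info_bits(codebooks, isp, code_len):
    """Section data bits plus bits for IS position differences of IS-coded bands."""
    bits = count_sections(codebooks) * SECTION_BITS
    prev = 0  # assumption: first IS position is coded relative to 0
    for cb, pos in zip(codebooks, isp):
        if cb in IS_CBS:
            bits += code_len(pos - prev)
            prev = pos
    return bits


def maybe_replace(codebooks, isp, band, code_len):
    """Switch one zero detection band to codebook 0 only if that saves bits."""
    changed = list(codebooks)
    changed[band] = ZERO_CB
    if side_info_bits(changed, isp, code_len) < side_info_bits(codebooks, isp, code_len):
        return changed
    return list(codebooks)
```

With the code lengths quoted earlier (0 gives 1 bit, ±8 gives 8 bits), switching band 1 of `[15, 15, 15]` splits one section into three and costs more in this model, so the codebook number is kept; switching band 1 of `[0, 15, 15]` leaves the section count unchanged and saves bits, so it is replaced.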
[0110]
In a zero detection band subjected to intensity stereo processing, all the quantized spectral data is zero. Therefore, the same decoding result is obtained even if the codebook number of the right channel is changed from the number representing intensity stereo processing (14 or 15) to the number representing zero data (0).
[0111]
As described above, in the fifth embodiment, based on the zero detection band / non-zero detection band results from the zero detection unit 151, the data replacement unit 160 replaces the right-channel codebook number of an intensity stereo processed zero detection band, changing it from the codebook number representing intensity stereo processing to the codebook number representing zero data, when doing so reduces the number of bits required for encoding. In this way, the same decoding result as in the related art is obtained with fewer bits, and encoded data with improved encoding efficiency can be generated. Furthermore, sound quality can be improved by allocating the saved bits to other data that contributes to sound quality.
[0112]
(Embodiment 6)
Next, a high-efficiency audio encoding method according to Embodiment 6 of the present invention will be described. FIG. 12 is a block diagram showing a configuration of the data replacement unit 160E in the high-efficiency audio encoding method according to the sixth embodiment. The data replacement unit 160E includes a non-zero band M/S·IS phase inversion flag identity determination unit 410 and an MS mask setting unit 420.
[0113]
The non-zero band M/S·IS phase inversion flag identity determination unit 410 determines whether the M/S presence / IS phase inversion flags all have the same value across the scale factor bands, excluding those scale factor bands that are subjected to intensity stereo processing and detected as zero detection bands by the left-channel zero detection unit 151, and outputs the determination result.
[0114]
When the non-zero band M/S·IS phase inversion flag identity determination unit 410 determines that the M/S presence / IS phase inversion flags all have the same value, the MS mask setting unit 420 replaces the MS mask, which is the flag for the entire band, with the value corresponding to that common value and outputs it. That is, when the common value is 0 the MS mask is replaced with 0, and when the common value is 1 the MS mask is replaced with 2.
[0115]
In the zero detection band subjected to the intensity stereo processing, all the quantized spectral data are zero, so that the same decoding result can be obtained even if the value of the flag indicating the phase relationship between the two channels is inverted.
[0116]
As described above, when the value of the MS mask is other than 1, the per-band M/S presence / IS phase inversion flags need not be transmitted as encoded data. Therefore, when the MS mask is 1 before replacement, replacing it with 0 or 2 makes it unnecessary to transmit these flag bits, and the bits required for encoding are reduced. The flag requires one bit per scale factor band, so, for example, when the number of scale factor bands is 49, 49 bits are saved.
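The MS mask replacement of the sixth embodiment can be sketched as follows. The function name `set_ms_mask` and its list-based interface are illustrative assumptions; the mask values 0, 1, and 2 and the one-bit-per-band flag cost come from the description above. The case in which every band is an excluded zero detection band is left unchanged here, since the text does not address it.

```python
def set_ms_mask(ms_mask, flags, is_zero_is_band):
    """Return (new MS mask, bits needed for the per-band flags).

    flags[b]           -- M/S presence / IS phase inversion flag of band b (0 or 1)
    is_zero_is_band[b] -- True if band b is intensity stereo processed and all zero
    """
    if ms_mask == 1:
        # Collect the flag values of the bands that are not excluded.
        relevant = {f for f, skip in zip(flags, is_zero_is_band) if not skip}
        if relevant == {0}:
            ms_mask = 0  # all remaining flags are 0
        elif relevant == {1}:
            ms_mask = 2  # all remaining flags are 1
    # Per-band flags are transmitted only when the mask is 1.
    flag_bits = len(flags) if ms_mask == 1 else 0
    return ms_mask, flag_bits
```

With 49 bands whose flags are all 1 except for one excluded zero detection band, the mask becomes 2 and the 49 flag bits are no longer sent, matching the saving stated above.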
[0117]
As described above, in the sixth embodiment, based on the zero detection band / non-zero detection band results from the zero detection unit 151, when the M/S presence / IS phase inversion flags, which indicate the stereo processing state of each scale factor band, all have the same value across the scale factor bands other than the intensity stereo processed zero detection bands, the MS mask setting unit 420 replaces the MS mask value, which indicates the state of the flags for the entire band. This eliminates transmission of the per-band flags and allows encoded data with improved encoding efficiency to be generated. Furthermore, sound quality can be improved by allocating the saved bits to other data that contributes to sound quality.
[0118]
Note that the high-efficiency audio encoding method in each of the above embodiments improves encoding efficiency by replacing data related to the zero detection band with data having a shorter code length. It is therefore particularly effective for low-bit-rate encoding and for encoding narrow-band signals, in which zero spectral data occurs readily.
[0119]
The above embodiments can also be implemented in combination; for example, Embodiments 1, 3, 5, and 6 can be combined. However, Embodiments 1 and 2, and likewise Embodiments 3 and 4, replace the same data in different ways and therefore cannot be applied simultaneously.
[0120]
In each of the above embodiments, an example has been described in which the audio high-efficiency encoding method is applied to the AAC encoder. However, the present invention is applicable to other encoding schemes having a similar encoding format.
[0121]
The audio high-efficiency encoding method in each of the above embodiments can be realized as a program to be executed by a computer or a digital signal processor, and may be recorded on a computer-readable recording medium.
[0122]
[Effects of the Invention]
As described above, according to the present invention, encoded data with improved encoding efficiency can be generated. That is, encoded data that can obtain the same decoding result as that of the related art can be generated with a smaller number of bits.
[0123]
Further, according to the present invention, sound quality can be improved by allocating the reduced bits to other data that contributes to sound quality.
[0124]
The present invention is particularly effective for encoding at a low bit rate or in encoding a narrow band signal in which zero spectrum data is easily generated.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of an AAC encoder in an audio efficient coding method according to each embodiment of the present invention.
FIG. 2 is a block diagram illustrating a configuration of a data replacement unit according to Embodiment 1 of the present invention.
FIG. 3 is a table (part 1) showing a difference that minimizes the code length after variable-length coding under a condition that the sum of two differences is constant.
FIG. 4 is a table (part 2) showing a difference that minimizes the code length after variable-length coding under the condition that the sum of two differences is constant.
FIG. 5 is a table (part 3) showing a difference that minimizes the code length after variable length coding under a condition that the sum of two differences is constant.
FIG. 6 is a table (part 4) showing a difference that minimizes the code length after variable-length coding under a condition that the sum of two differences is constant.
FIG. 7 is a block diagram showing another configuration of the data replacement unit according to Embodiment 1 of the present invention.
FIG. 8 is a block diagram showing a configuration of a data replacement unit according to Embodiment 2 of the present invention.
FIG. 9 is a block diagram showing a configuration of a data replacement unit according to Embodiment 3 of the present invention.
FIG. 10 is a block diagram showing another configuration of the data replacement unit according to Embodiment 3 of the present invention.
FIG. 11 is a block diagram showing a configuration of a data replacement unit according to Embodiment 4 of the present invention.
FIG. 12 is a block diagram illustrating a configuration of a data replacement unit according to Embodiment 6 of the present invention.
FIG. 13 is a block diagram illustrating a configuration of a conventional AAC encoder.
FIG. 14 is a table (No. 1) showing a code length of a Huffman code used for differential variable length coding of a scale factor and an intensity stereo position.
FIG. 15 is a table (No. 2) showing a code length of a Huffman code used for differential variable length coding of a scale factor and an intensity stereo position.
FIG. 16 is a diagram for describing an example of encoded data subjected to intensity stereo processing.
[Explanation of symbols]
101, 102 Filter bank
110 Intensity stereo data generator
120 M/S stereo data generator
130 Quantization unit
140 Encoding unit
151, 152 Zero detection unit
160, 160A, 160B, 160C, 160D, 160E Data replacement unit
211, 212 Non-zero inter-band scale factor difference calculation unit
221, 222 Zero band scale factor difference calculation unit
231, 232 Zero band scale factor replacement unit
241, 242 Difference range determination unit
251, 252 Zero band scale factor difference setting unit
310 Non-zero inter-band IS position difference calculation unit
320 Zero band IS position difference calculation unit
330 Zero band IS position replacement unit
340 Difference range determination unit
350 Zero band IS position difference setting unit
410 Non-zero band M/S·IS phase inversion flag identity determination unit
420 MS mask setting unit

Claims

A high-efficiency audio encoding apparatus that represents spectral data, in units of bands, by a scale factor indicating a gain and by spectral data normalized by the gain and quantized, and that performs variable-length coding of the difference between the scale factors of adjacent bands, the apparatus comprising:
a zero detection unit that detects whether all the quantized spectral data in a band is zero; and
a data replacement unit that replaces the scale factor of the band detected as all zero with a value that gives the shortest code length after differential variable-length coding.

A high-efficiency audio encoding apparatus that represents spectral data, in units of bands, by a scale factor indicating a gain and by spectral data normalized by the gain and quantized, and that performs variable-length coding of the difference between the scale factors of adjacent bands, the apparatus comprising:
a zero detection unit that detects whether all the quantized spectral data in a band is zero; and
a data replacement unit that replaces the scale factor of the band detected as all zero with another value so that the difference between the scale factor of that band and the scale factor of an adjacent band takes the value with the shortest code length in the variable-length coding.

A high-efficiency audio encoding apparatus that, using intensity stereo processing, represents the spectral data of two channels by a scale factor indicating a gain per band in one channel, an intensity stereo position indicating a directivity gain per band in the other channel, and spectral data of the one channel that has been intensity stereo processed, normalized by the gain, and quantized, and that performs variable-length coding of the difference in intensity stereo position between adjacent intensity stereo processed bands, the apparatus comprising:
a zero detection unit that detects whether all the quantized spectral data in an intensity stereo processed band is zero; and
a data replacement unit that replaces the intensity stereo position of the band detected as all zero with a value that gives the shortest code length after differential variable-length coding.

A high-efficiency audio encoding apparatus that, using intensity stereo processing, represents the spectral data of two channels by a scale factor indicating a gain per band in one channel, an intensity stereo position indicating a directivity gain per band in the other channel, and spectral data of the one channel that has been intensity stereo processed, normalized by the gain, and quantized, and that performs variable-length coding of the difference in intensity stereo position between adjacent intensity stereo processed bands, the apparatus comprising:
a zero detection unit that detects whether all the quantized spectral data in an intensity stereo processed band is zero; and
a data replacement unit that replaces the intensity stereo position of the band detected as all zero with another value so that the difference between the intensity stereo position of that band and the intensity stereo position of an adjacent band takes the value with the shortest code length in the variable-length coding.

5. The audio efficient coding apparatus according to claim 1, wherein the value of the shortest code length of the variable length coding is set to zero.

A high-efficiency audio encoding apparatus that, using intensity stereo processing, encodes the spectral data of two channels in units of bands using codebooks, wherein, in a band subjected to intensity stereo processing, the codebook number of one channel represents the codebook used to encode the intensity stereo processed and quantized spectral data and the codebook number of the other channel represents intensity stereo processing, the apparatus comprising:
a zero detection unit that detects whether all the quantized spectral data in an intensity stereo processed band of the one channel is zero; and
a data replacement unit that calculates the number of bits required for encoding when the codebook number of the other channel, in the intensity stereo processed band detected as all zero, is changed from the codebook number representing intensity stereo processing to the codebook number representing zero data, and that, when fewer bits are required with the change, replaces the codebook number of the other channel representing intensity stereo processing with the codebook number representing zero data.

A high-efficiency audio encoding apparatus that, using mid-side stereo processing and intensity stereo processing, encodes a flag indicating whether mid-side stereo processing is performed and whether the phase relationship between the two channels of intensity stereo processing is inverted, the apparatus comprising:
a zero detection unit that detects whether all the quantized spectral data in an intensity stereo processed band of one channel is zero; and
a data replacement unit that, when the flags of all bands other than the intensity stereo processed bands detected as all zero have the same value, replaces the value of the flag for the entire band with another value corresponding to that same value.

A high-efficiency audio encoding method that represents spectral data, in units of bands, by a scale factor indicating a gain and by spectral data normalized by the gain and quantized, and that performs variable-length coding of the difference between the scale factors of adjacent bands, the method comprising:
detecting whether all the quantized spectral data in a band is zero; and
replacing the scale factor of the band detected as all zero with a value that gives the shortest code length after differential variable-length coding.

A high-efficiency audio encoding method that represents spectral data, in units of bands, by a scale factor indicating a gain and by spectral data normalized by the gain and quantized, and that performs variable-length coding of the difference between the scale factors of adjacent bands, the method comprising:
detecting whether all the quantized spectral data in a band is zero; and
replacing the scale factor of the band detected as all zero with another value so that the difference between the scale factor of that band and the scale factor of an adjacent band takes the value with the shortest code length in the variable-length coding.

A high-efficiency audio encoding method that, using intensity stereo processing, represents the spectral data of two channels by a scale factor indicating a gain per band in one channel, an intensity stereo position indicating a directivity gain per band in the other channel, and spectral data of the one channel that has been intensity stereo processed, normalized by the gain, and quantized, and that performs variable-length coding of the difference in intensity stereo position between adjacent intensity stereo processed bands, the method comprising:
detecting whether all the quantized spectral data in an intensity stereo processed band is zero; and
replacing the intensity stereo position of the band detected as all zero with a value that gives the shortest code length after differential variable-length coding.

A high-efficiency audio encoding method that, using intensity stereo processing, represents the spectral data of two channels by a scale factor indicating a gain per band in one channel, an intensity stereo position indicating a directivity gain per band in the other channel, and spectral data of the one channel that has been intensity stereo processed, normalized by the gain, and quantized, and that performs variable-length coding of the difference in intensity stereo position between adjacent intensity stereo processed bands, the method comprising:
detecting whether all the quantized spectral data in an intensity stereo processed band is zero; and
replacing the intensity stereo position of the band detected as all zero with another value so that the difference between the intensity stereo position of that band and the intensity stereo position of an adjacent band takes the value with the shortest code length in the variable-length coding.

12. The high-efficiency audio encoding method according to claim 8, wherein the value having the shortest code length in the variable-length coding is zero.
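The replacement recited in the preceding method claims can be illustrated with a minimal Python sketch. It assumes that a difference of zero has the shortest code, and it works the same way for scale factors and for intensity stereo positions. The function names and the code-length table `toy_code_length` are hypothetical and are not taken from the specification or from any codec's real Huffman table.

```python
def fill_zero_bands(values, all_zero):
    """Return a copy of `values` where each all-zero band repeats a neighbour.

    values   : per-band scale factors (or intensity stereo positions)
    all_zero : per-band booleans, True if every quantized coefficient is zero
    """
    out = list(values)
    prev = None
    for i, zero in enumerate(all_zero):
        if zero and prev is not None:
            out[i] = prev          # difference to the preceding band becomes 0
        elif not zero:
            prev = out[i]
    # Leading all-zero bands have no predecessor: copy the first non-zero band.
    first = next((out[i] for i, z in enumerate(all_zero) if not z), None)
    if first is not None:
        for i, zero in enumerate(all_zero):
            if not zero:
                break
            out[i] = first
    return out


def toy_code_length(diff):
    # Hypothetical table: 1 bit for a zero difference, longer otherwise.
    return 1 if diff == 0 else 1 + 2 * abs(diff)


def differential_bits(values):
    """Bits spent on the differences between consecutive bands."""
    return sum(toy_code_length(b - a) for a, b in zip(values, values[1:]))


before = [10, 3, 12, 0, 11]
zero_bands = [False, True, False, True, False]
after = fill_zero_bands(before, zero_bands)
print(after, differential_bits(before), differential_bits(after))
```

Because a band whose quantized data is entirely zero decodes to zero whatever gain is attached to it, the substituted values leave the decoded signal unchanged while the differential code gets shorter.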

A high-efficiency audio encoding method for use with an encoding scheme that employs intensity stereo processing and encodes the spectral data of two channels band by band using codebooks, in which, for a band subjected to intensity stereo processing, the codebook number of one channel indicates the codebook used to encode the quantized spectral data and the codebook number of the other channel indicates intensity stereo processing, the method comprising:
detecting whether all the quantized spectral data in an intensity-stereo-processed band of the one channel is zero;
calculating the number of bits required for encoding when the codebook number of the other channel, in the intensity-stereo-processed band detected as zero, is changed from the codebook number indicating intensity stereo processing to the codebook number indicating zero data; and
when the change reduces the number of bits required for encoding, replacing the codebook number of the other channel that indicates intensity stereo processing with the codebook number indicating zero data.
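The bit-count comparison in this claim can be sketched as follows. The cost model is an assumption made only for illustration: each run of bands sharing a codebook costs a fixed 9 bits, an intensity stereo position is sent only for bands that keep the intensity codebook, and `toy_code_length` is a hypothetical length table. The constants 0 and 15 follow the AAC numbering for the zero and intensity codebooks; all function names are hypothetical.

```python
ZERO_CB = 0         # codebook number meaning "all data zero"
INTENSITY_CB = 15   # codebook number meaning "intensity stereo"
SECTION_BITS = 9    # assumed cost per run of bands with one codebook


def toy_code_length(diff):
    # Hypothetical table: 1 bit for a zero difference, longer otherwise.
    return 1 if diff == 0 else 1 + 2 * abs(diff)


def total_bits(codebooks, positions):
    """Side-information bits for the other channel under the toy cost model."""
    runs = 1 + sum(1 for a, b in zip(codebooks, codebooks[1:]) if a != b)
    bits = runs * SECTION_BITS
    prev = 0
    for cb, pos in zip(codebooks, positions):
        if cb == INTENSITY_CB:      # positions are sent only for intensity bands
            bits += toy_code_length(pos - prev)
            prev = pos
    return bits


def switch_zero_bands(codebooks, positions, all_zero):
    """Change intensity bands with all-zero data to ZERO_CB when that saves bits."""
    best = list(codebooks)
    for i, zero in enumerate(all_zero):
        if zero and best[i] == INTENSITY_CB:
            trial = best[:i] + [ZERO_CB] + best[i + 1:]
            if total_bits(trial, positions) < total_bits(best, positions):
                best = trial
    return best


# Isolated intensity band between zero bands: switching merges three runs into one.
print(switch_zero_bands([ZERO_CB, INTENSITY_CB, ZERO_CB], [0, 4, 0], [False, True, False]))
# Zero band inside an intensity run: switching would split the run, so it is kept.
print(switch_zero_bands([INTENSITY_CB] * 3, [2, 2, 2], [False, True, False]))
```

The second call shows why the claim makes the replacement conditional: changing the codebook number can add boundaries between codebook runs and cost more bits than it saves.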

A high-efficiency audio encoding method for use with an encoding scheme that employs mid-side stereo processing and intensity stereo processing and that encodes a flag indicating whether mid-side stereo processing is performed or, for intensity stereo processing, whether the phase relationship between the two channels is inverted, the method comprising:
detecting whether all the quantized spectral data in an intensity-stereo-processed band of one channel is zero; and
when the flags of all bands other than the intensity-stereo-processed bands detected as zero have the same value, replacing the value of the flag covering all bands with another value corresponding to that same value.
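A rough sketch of this flag replacement is given below. It assumes a signalling convention modeled on AAC, where a whole-frame value of 0 means the flag is off in every band, 2 means it is on in every band, and 1 means one flag per band follows. The function name `choose_mask` and the `dont_care` input (True for intensity-stereo-processed bands whose data is all zero) are hypothetical.

```python
def choose_mask(flags, dont_care):
    """Pick the whole-frame flag value and the per-band flags to transmit.

    flags     : per-band booleans (mid-side on, or phase inverted for intensity)
    dont_care : per-band booleans, True where the flag cannot affect decoding
    Returns (mask_present, per_band_flags).
    """
    relevant = {f for f, skip in zip(flags, dont_care) if not skip}
    if relevant == {True}:
        return 2, []               # every band that matters has the flag on
    if not relevant or relevant == {False}:
        return 0, []               # every band that matters has the flag off
    return 1, list(flags)          # mixed values: send one flag per band


print(choose_mask([True, False, True], [False, True, False]))
print(choose_mask([True, False], [False, False]))
```

In the first call the only band that disagrees is one whose flag has no effect, so the per-band flags can be dropped in favour of a single whole-frame value.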

A program for causing a computer or a digital signal processor to execute the audio high-efficiency encoding method according to any one of claims 8 to 14.

A computer-readable recording medium that records a program for causing a computer or a digital signal processor to execute the audio high-efficiency encoding method according to any one of claims 8 to 14.