JP4327420B2

JP4327420B2 - Audio signal encoding method and audio signal decoding method

Info

Publication number: JP4327420B2
Application number: JP2002211570A
Authority: JP
Inventors: 峰生津島; 武志則松; 智一石川
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1998-03-11
Filing date: 2002-07-19
Publication date: 2009-09-09
Anticipated expiration: 2019-03-11
Also published as: JP2003058196A

Description

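[0001]
[Technical Field of the Invention]
The present invention relates to an audio signal encoding method and an audio signal decoding method. In particular, it relates to (i) an encoding method that uses feature quantities obtained from an audio signal such as a speech or music signal, specifically a signal obtained by converting the audio signal from the time domain to the frequency domain by a technique such as an orthogonal transform, and encodes that converted signal efficiently so that it is represented with as short a code sequence as possible relative to the original audio signal, and (ii) a decoding method configured to decode a high-quality, wideband audio signal using all, or only a part, of the resulting encoded sequence.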
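[0002]
[Prior Art]
Various techniques for efficiently encoding and decoding audio signals have been proposed. Compression coding schemes for audio signals having a frequency band of 20 kHz or more, such as music signals, include the MPEG audio scheme and the Twin VQ (TC-WVQ) scheme. Coding schemes typified by MPEG convert a time-domain digital audio signal into frequency-domain data using an orthogonal transform such as the cosine transform, and encode that frequency-domain information starting with the perceptually most important information by exploiting the sensitivity characteristics of human hearing; perceptually unimportant or redundant information is not encoded. The Twin VQ (TC-WVQ) scheme, on the other hand, uses vector quantization to represent the signal with considerably less information than the original digital signal. MPEG audio and Twin VQ (TC-WVQ) are described in ISO/IEC standard IS-11172-3 and in T. Moriya, H. Suga: "An 8 Kbits transform coder for noisy channels," Proc. ICASSP 89, pp. 196-199, respectively.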
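[0003]
An outline of the general Twin VQ scheme is described here with reference to FIG. 10.
An original audio signal 101 is input to an analysis length determination unit 102, which calculates the analysis length. At the same time, the analysis length determination unit 102 quantizes the analysis length 112 and outputs an analysis length code sequence 111. Next, in accordance with the analysis length 112, a time-frequency conversion unit 103 converts the original audio signal 101 into a frequency-domain original audio signal 104. The frequency-domain original audio signal 104 is then normalized (flattened) by a normalization processing unit (flattening processing unit) 106 to obtain a normalized audio signal 108. The normalization is performed by calculating a frequency outline 105 from the original audio signal 104 and dividing the original audio signal 104 by the calculated frequency outline 105. The normalization processing unit 106 also quantizes the frequency outline information used for the normalization and outputs a normalization code sequence 107. Finally, the normalized audio signal 108 is quantized by a vector quantization unit 109 to obtain a code sequence 110.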
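[0004]
In recent years, some schemes have a structure in which an audio signal can be reproduced even when only a part of the code sequence input to the decoder is used. Such a structure is called a scalable structure, and encoding in a way that realizes a scalable structure is called scalable coding.
[0005]
FIG. 11 shows an example of the fixed scalable coding adopted in the general Twin VQ scheme.
In accordance with an analysis length 1314 determined from an original audio signal 1301 by an analysis length determination unit 1303, a time-frequency conversion unit 1302 obtains a frequency-domain original audio signal 1304. When the frequency-domain original audio signal 1304 is input to a low-band encoder 1305, a quantization error 1306 and a low-band code sequence 1311 are output. When the quantization error 1306 is input to a mid-band encoder 1307, a quantization error 1308 and a mid-band code sequence 1312 are output. When the quantization error 1308 is input to a high-band encoder 1309, a quantization error 1310 and a high-band code sequence 1313 are output. Each of the low-band, mid-band, and high-band encoders has both a normalization processing unit and a vector quantization unit, and outputs the quantization error together with a low-band, mid-band, or high-band code sequence that contains the code sequences output by the normalization processing unit and the vector quantization unit.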
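[0006]
[Problems to Be Solved by the Invention]
In conventional fixed scalable coding, the low-band, mid-band, and high-band quantizers are fixed as shown in FIG. 11, so it is difficult, as shown in FIG. 12, to encode so that the quantization error is kept as small as possible for the distribution of the original audio signal. As a result, sufficient performance cannot be achieved when encoding audio signals with widely varying properties and distributions, and scalable coding with high sound quality and high efficiency is difficult.
[0007]
The present invention has been made to solve the above problem. Its object is to provide an audio signal encoding method and an audio signal decoding method that, by adaptively applying scalable coding to a wide variety of audio signals as shown in FIG. 13, can encode such signals efficiently, at a low bit rate, and with high sound quality.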
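[0008]
[Means for Solving the Problems]
To solve this problem, the audio signal encoding method and audio signal decoding method according to the present invention do not use fixed scalable coding; instead they perform adaptive scalable coding, in which the frequency range to be encoded is varied according to the properties and distribution of the original audio signal.
[0009]
An audio signal encoding method according to the present invention includes a characteristic determination step, an encoding band control step, and an encoding step, and converts a time-frequency-converted audio signal into an encoded sequence. The encoded sequence includes encoded information and a band control code sequence. The encoding step has a plurality of encoding sub-steps and, under the control of the encoding band control step, performs multi-stage encoding of the audio signal and outputs the encoded information. The characteristic determination step evaluates the input audio signal and outputs band weight information indicating the weighting of each frequency band to be encoded. The encoding band control step determines, on the basis of the band weight information, the quantization band and connection order of each encoding sub-step constituting the multi-stage encoding; causes the encoding step to perform multi-stage encoding that is scalably configured according to the determined quantization bands and connection order; and outputs a band control code sequence indicating the determined quantization band and connection order of each encoding sub-step.
[0010]
In a further audio signal encoding method according to the present invention, based on the above audio signal encoding method, the encoding band control step determines the quantization band and connection order of each encoding sub-step so that the result is one of a set of predefined multi-stage encodings.
[0011]
In a further audio signal encoding method according to the present invention, based on the above audio signal encoding method, the encoding step outputs a quantization error, and the encoding band control step determines the quantization band and connection order of each encoding sub-step on the basis of the band weight information and the quantization error.
[0012]
An audio signal decoding method according to the present invention includes a decoding band control step and a decoding step, and decodes an encoded sequence including encoded information and a band control code sequence into an audio signal. The band control code sequence indicates the quantization band and connection order of each encoding used when the encoded information was multi-stage encoded. The decoding step has a plurality of decoding sub-steps and performs multi-stage decoding of the encoded information under the control of the decoding band control step. The decoding band control step causes the decoding step to perform multi-stage decoding that is scalably configured on the basis of the band control code sequence.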
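[0013]
[Embodiments of the Invention]
Embodiment 1 of the present invention is described below with reference to FIGS. 1 to 9, and Embodiment 2 with reference to FIGS. 14 to 20.
[0014]
(Embodiment 1)
FIG. 1 is a block diagram of an audio signal encoding apparatus that performs adaptive scalable coding according to Embodiment 1 of the present invention.
In FIG. 1, 1001 is an encoding apparatus that encodes an original audio signal 501. In the encoding apparatus 1001, 502 is an analysis length determination unit that determines the analysis length 504 used when analyzing the original audio signal 501; 503 is a time-frequency conversion unit that converts the original audio signal 501 from the time axis to the frequency axis in units of the analysis length 504; 504 is the analysis length determined by the analysis length determination unit 502; 505 is the spectrum of the original audio signal; 701 is a filter that receives the spectrum 505 of the original audio signal; 506 is a characteristic determination unit that evaluates the characteristics of the spectrum 505 and determines the frequency band of the audio signal to be quantized by each of the multi-stage encoders 511, 512, 513, 511b, and so on in the encoding apparatus 1001; 507 is an encoding band control unit that receives the frequency bands determined by the characteristic determination unit 506 and the frequency-converted audio signal, determines the connection order of the multi-stage encoders 511, 512, 513, 511b, and so on, and converts the quantization band and connection order of each encoder into a code sequence; 508 is the band control code sequence, that is, the code sequence output from the encoding band control unit 507; 510 is the analysis length code sequence, in which the analysis length 504 output from the analysis length determination unit 502 is expressed as a code sequence; 511, 512, and 513 are the low-band, mid-band, and high-band encoders, which encode the low-band, mid-band, and high-band signals respectively; 511b is a second-stage low-band encoder that encodes the quantization error 518 of the first-stage low-band encoder 511; 521, 522, and 523 are the low-band, mid-band, and high-band code sequences, which are the encoded signals output from the encoders 511, 512, and 513; 521b is the second-stage low-band code sequence output by the second-stage low-band encoder 511b; 518, 519, and 520 are the quantization errors output from the encoders 511, 512, and 513, each being the difference between the signal before encoding and the corresponding encoded signal; and 518b is the second-stage quantization error of the second-stage low-band encoder 511b.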
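[0015]
Meanwhile, 1002 is a decoding apparatus that decodes the encoded sequence produced by the encoding apparatus 1001. In the decoding apparatus 1002, 5 is a frequency-time conversion unit that performs the inverse of the conversion performed by the time-frequency conversion unit 503 of the encoding apparatus 1001; 6 is a windowing unit that multiplies by a window function on the time axis; 7 is a frame overlap unit; 8 is the decoded signal; 9 is a band synthesis unit; 1201 is a decoding band control unit; 1202, 1203, and 1204 are low-band, mid-band, and high-band decoders that perform decoding corresponding to the low-band, mid-band, and high-band encoders 511, 512, and 513 respectively; and 1202b is a second-stage low-band decoder that decodes the output of the first-stage low-band decoder 1202.
[0016]
Second and later stages of encoders and decoders may also be provided for other bands, and in still more stages; the more stages there are, the more the encoding and decoding accuracy can be improved as needed.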
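[0017]
First, the operation of the encoding apparatus 1001 is described.
The original audio signal 501 to be encoded is assumed to be a temporally continuous digital signal sequence, for example a speech signal quantized to 16 bits at a sampling frequency of 48 kHz.
[0018]
The original audio signal 501 is input to the analysis length determination unit 502, which evaluates the characteristics of the input signal and determines the analysis length 504; the result is sent to the decoding apparatus 1002 as the analysis length code sequence 510. Values such as 256, 1024, and 4096 are used as the analysis length 504. For example, when the high-frequency components contained in the original audio signal 501 exceed a predetermined value, the analysis length 504 is set to 256; when the low-frequency components exceed a predetermined value and the high-frequency components are below a predetermined value, it is set to 4096; otherwise it is set to 1024.
The time-frequency conversion unit 503 then calculates the spectrum 505 of the original audio signal 501 in accordance with the analysis length 504 determined in this way.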
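The three-way rule in [0018] can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name, the two energy measures, and both thresholds are assumptions, since the text only says that the components are compared against "predetermined values".

```python
def choose_analysis_length(high_energy, low_energy, high_threshold, low_threshold):
    """Return 256, 4096 or 1024 following the example rule in [0018].

    high_energy / low_energy are hypothetical measures of the high- and
    low-frequency content of the input; the thresholds are placeholders.
    """
    if high_energy > high_threshold:
        return 256   # strong high-frequency content: short analysis length
    if low_energy > low_threshold and high_energy < high_threshold:
        return 4096  # low-frequency dominated: long analysis length
    return 1024      # every other case
```

Under these assumptions a signal with strong high-frequency content gets the short length, which is consistent with the later paragraphs that treat 256 as the "small" analysis length.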
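[0019]
FIG. 2 is a block diagram of the time-frequency conversion unit 503 in the audio signal encoding apparatus according to Embodiment 1 of the present invention.
Samples of the original audio signal 501 are accumulated in a frame division unit 201 until a predetermined number of samples is reached; when the number of accumulated samples reaches the analysis length 504 determined by the analysis length determination unit 502, the frame division unit 201 outputs them. The frame division unit 201 is configured to produce an output every shift length. For example, with an analysis length 504 of 4096 samples and a shift length of half the analysis length, it outputs the latest 4096 samples each time an interval corresponding to 2048 samples elapses. Naturally, the shift length can likewise be set to half the analysis length 504 when the analysis length 504 or the sampling frequency changes.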
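The half-overlap framing described in [0019] can be illustrated with a short sketch. The function name is hypothetical, and the sketch assumes the whole signal is available as a list rather than arriving sample by sample.

```python
def split_frames(samples, analysis_length):
    """Return overlapping frames of analysis_length samples.

    A new frame starts every analysis_length // 2 samples (the shift length),
    so consecutive frames share half of their samples.
    """
    shift = analysis_length // 2
    frames = []
    start = 0
    while start + analysis_length <= len(samples):
        frames.append(samples[start:start + analysis_length])
        start += shift
    return frames
```

With an analysis length of 4096 this yields one 4096-sample frame per 2048 input samples, matching the example in the text.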
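The output of the frame division unit 201 is input to the windowing unit 202 in the following stage. The windowing unit 202 multiplies the output of the frame division unit 201 by a window function on the time axis and outputs the result. This is expressed, for example, by (Equation 1).
[0020]
[Equation 1]

Here, xi is the output of the frame division unit 201, hi is the window function, and hxi is the output of the windowing unit 202; i is the time index. The window function hi shown in (Equation 1) is only an example, and the window function need not be the one in (Equation 1).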
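Equation 1 itself is not reproduced in this text. As a hedged illustration of the windowing step, the sketch below multiplies each sample xi by a window value hi. The sine window used here is an assumption (a common choice for MDCT-based coders), not necessarily the window of Equation 1, and the function names are hypothetical.

```python
import math


def sine_window(n):
    """Sine window of length n (assumed shape; the patent's window may differ)."""
    return [math.sin(math.pi * (i + 0.5) / n) for i in range(n)]


def apply_window(frame, window):
    """hx_i = h_i * x_i for every sample of the frame."""
    return [h * x for h, x in zip(window, frame)]
```

The sine window is chosen for the sketch because its squared halves sum to one, which is the property that lets half-overlapped frames be recombined cleanly at the decoder.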
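[0021]
The choice of window function depends on the characteristics of the signal input to the windowing unit 202, the analysis length 504 of the frame division unit 201, and the shapes of the window functions of the temporally preceding and following frames. For example, as a characteristic of the signal input to the windowing unit 202, with N denoting the analysis length 504 of the frame division unit 201, the average power of the input signal is calculated every N/4 samples; if that average power varies very greatly, the analysis length 504 is made shorter than N before the operation of (Equation 1) is performed. It is also desirable to select the window shape of the current frame appropriately, according to the window shapes of the preceding and following frames, so that it is not distorted.
[0022]
The output of the windowing unit 202 is then input to an MDCT unit 203, which applies the modified discrete cosine transform and outputs MDCT coefficients. The general form of the modified discrete cosine transform is given by (Equation 2).
[0023]
[Equation 2]

If the MDCT coefficients output by the MDCT unit 203 are denoted yk as in (Equation 2), the output of the MDCT unit 203 represents frequency characteristics: the index k of yk corresponds linearly to frequency, with values near 0 corresponding to low frequency components and values increasing toward N/2-1 corresponding to higher frequency components. The MDCT coefficients calculated in this way form the spectrum 505 of the original audio signal.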
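Equation 2 is not reproduced in this text. The sketch below uses one common textbook form of the MDCT, which maps N windowed samples to N/2 coefficients; the phase term n0 = N/4 + 1/2 and the absence of a scale factor are assumptions and may differ from the patent's equation. The direct O(N^2) sum is used for clarity only.

```python
import math


def mdct(windowed):
    """Direct MDCT: N input samples -> N/2 coefficients y_k, k = 0 .. N/2 - 1.

    y_k = sum_n hx_n * cos(2*pi/N * (n + n0) * (k + 1/2)), with n0 = N/4 + 1/2
    (a standard form, assumed here).
    """
    n = len(windowed)
    n0 = n / 4 + 0.5
    return [
        sum(windowed[t] * math.cos(2 * math.pi / n * (t + n0) * (k + 0.5))
            for t in range(n))
        for k in range(n // 2)
    ]
```

As in the text, small k corresponds to low frequencies and k near N/2-1 to high frequencies.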
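[0024]
Next, the spectrum 505 of the original audio signal is input to the filter 701. Denoting the input of the filter 701 by x701(i) and its output by y701(i), a filter expressed by (Equation 3) is used, for example.
[0025]
[Equation 3]

Here, fs is the analysis length 504.
The filter 701 expressed by (Equation 3) is a kind of moving-average filter, but it need not be limited to a moving-average filter; it may be another filter, such as a high-pass filter or a band-stop filter.
[0026]
The output of the filter 701 and the analysis length 504 calculated by the analysis length determination unit 502 are input to the characteristic determination unit 506. FIG. 6 shows the details of the characteristic determination unit 506. The characteristic determination unit 506 determines the auditory and physical characteristics of the original audio signal 501 and of its spectrum 505. These characteristics are, for example, whether the signal is speech or music. For speech, most of the frequency components lie below, for example, 6 kHz.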
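[0027]
Next, the operation of the characteristic determination unit 506 is described with reference to FIG. 6.
Let x506(i) be the signal obtained by filtering the spectrum 505 of the original audio signal, input to the characteristic determination unit 506, with the filter 701. From x506(i), the spectral power calculation unit 803 calculates the spectral power p506(i) according to (Equation 4).
[0028]
[Equation 4]

This spectral power p506(i) is supplied as one of the inputs of the encoding band control unit 507 and serves as the band control weight 517 for each encoder.
When the analysis length 504 is small, for example 256, the placement determination unit 804 decides to place the encoders in a fixed arrangement and sends the encoding band placement information 516 to the encoding band control unit 507 as "fixed placement".
[0029]
Otherwise, for example when the analysis length 504 is 4096 or 1024, the placement determination unit 804 decides to place the encoders dynamically and sends the encoding band placement information 516 to the encoding band control unit 507 as "dynamic placement".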
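The two outputs of the characteristic determination unit described in [0027] to [0029] can be sketched as below. Equation 4 is not reproduced in this text, so the power is assumed to be the squared magnitude of each filtered coefficient; the function names and the `short_length` parameter are likewise hypothetical.

```python
def spectral_power(filtered_spectrum):
    """Band control weight per bin: assumed to be p(i) = x(i)**2."""
    return [x * x for x in filtered_spectrum]


def placement_mode(analysis_length, short_length=256):
    """'fixed' for the short analysis length, 'dynamic' otherwise."""
    return "fixed" if analysis_length <= short_length else "dynamic"
```

For example, `placement_mode(256)` gives the fixed arrangement while `placement_mode(1024)` and `placement_mode(4096)` give the dynamic one, mirroring the examples in the text.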
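[0030]
Next, the operation of the encoding band control unit 507 is described with reference to FIG. 7.
The encoding band control unit 507 receives the band control weight 517 and the encoding band placement information 516 output from the characteristic determination unit 506, the signal obtained by filtering the spectrum 505 of the original audio signal with the filter 701, and the quantization error 518, 519, or 520 output by an encoder. The quantization error is among the inputs because the encoders 511, 512, 513, and 511b and the encoding band control unit 507 operate recursively; in the first pass of the encoding band control unit 507 there is no quantization error yet, so there are only the other three inputs.
[0031]
As described above, when the analysis length 504 is small and the encoding band placement information 516 indicates fixed placement, the quantization order determination unit 902, the encoder count determination unit 903, and the bandwidth calculation unit 901 determine the quantization bands, number, and connection order of the encoders so that encoding is executed in order from the low band to the mid band and then the high band in accordance with a predefined fixed band arrangement, and encoding is performed. In that case, the band information of the encoders, the number of encoders, and their connection order are encoded as information in the band control code sequence 508.
[0032]
For example, encoders are placed, and encoding is performed, so that the encoding bands and encoder counts are: one for 0 Hz to 4 kHz, one for 0 Hz to 8 kHz, one for 4 kHz to 12 kHz, two for 8 kHz to 16 kHz, and three for 16 kHz to 24 kHz.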
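[0033]
Next, the operation of the encoding band control unit 507 when the encoding band placement information 516 indicates dynamic placement is described.
The encoding band control unit 507 consists of three parts: the bandwidth calculation unit 901, which determines the quantization bandwidth of each encoder; the quantization order determination unit 902, which determines the quantization order of the encoders; and the encoder count determination unit 903, which determines the number of encoders in each band. The bandwidth of each encoder is determined from the signals input to the encoding band control unit 507. For each of a set of predetermined bands, for example 0 Hz to 4 kHz, 0 kHz to 8 kHz, 4 kHz to 12 kHz, 8 kHz to 16 kHz, and 16 kHz to 24 kHz, the average of the product of the band control weight 517 and the quantization error remaining after the encoders have encoded is calculated. Denoting the band control weight 517 by weight517(i) and the quantization error by err507(i), the average is calculated by (Equation 5).
[0034]
[Equation 5]

Here, j is the band index, Ave901(j) is the average for band j, and fupper(j) and flower(j) are the upper and lower limit frequencies of band j. The j that maximizes the average Ave901(j) is found, and that band becomes the band the encoder encodes. The value of j found is also sent to the encoder count determination unit 903, which increments the encoder count for the band corresponding to j and keeps track of how many encoders exist in each encoding band; encoding is repeated until the total number of encoders recorded reaches a predetermined total. Finally, the encoder bands and encoder counts are transmitted to the decoder as the band control code sequence 508.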
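The dynamic placement loop of [0033] and [0034] can be sketched as a greedy allocation. Equation 5 is not reproduced in this text, so the band score is assumed to be the plain mean of weight517(i) * err507(i) over the bins of band j, with the error given as a magnitude. The `encode` callback is a hypothetical stand-in for one encoder stage: it takes the current error in a band and returns the error that remains afterwards.

```python
def band_average(weight, err, lower, upper):
    """Assumed form of Ave901(j): mean of weight[i] * err[i], lower..upper inclusive."""
    total = sum(weight[i] * err[i] for i in range(lower, upper + 1))
    return total / (upper - lower + 1)


def allocate_encoders(weight, err, bands, total_encoders, encode):
    """Greedily assign encoders to the band with the largest weighted error.

    bands is a list of (lower, upper) bin ranges, which may overlap.
    Returns the connection order (band indices) and the encoder count per band.
    """
    err = list(err)
    counts = [0] * len(bands)
    order = []
    for _ in range(total_encoders):
        scores = [band_average(weight, err, lo, hi) for lo, hi in bands]
        j = max(range(len(bands)), key=scores.__getitem__)
        lo, hi = bands[j]
        err[lo:hi + 1] = encode(err[lo:hi + 1])  # error left after this stage
        counts[j] += 1
        order.append(j)
    return order, counts
```

In this sketch the returned order and counts play the role of the information carried by the band control code sequence 508; how that information is actually packed into bits is not covered here.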
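[0035]
Next, the operation of the encoder 3 is described with reference to FIG. 3.
The encoder 3 consists of a normalization unit 301 and a quantization unit 302.
The normalization unit 301 receives both the time-domain signal output from the frame division unit 201 and the MDCT coefficients output from the MDCT unit 203, and normalizes the MDCT coefficients using several parameters. Normalizing the MDCT coefficients here means suppressing the variation in magnitude of MDCT coefficients whose low-band and high-band components differ greatly in size. For example, when the low-band components are much larger than the high-band components, parameters that take large values in the low band and small values in the high band are selected and the MDCT coefficients are divided by them, which suppresses the variation in magnitude. The normalization unit 301 also encodes the indices representing the parameters used for normalization as a normalization code sequence 303.
[0036]
The quantization unit 302 receives the MDCT coefficients normalized by the normalization unit 301 and quantizes them. In doing so, the quantization unit 302 outputs the code index for which the difference between the value to be quantized and the quantization outputs corresponding to the code indices in a codebook is smallest. The difference between the value quantized by the quantization unit 302 and the value corresponding to the code index output from the quantization unit 302 is the quantization error.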
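[0037]
Next, a detailed example of the normalization unit 301 is described with reference to FIG. 4.
In FIG. 4, 401 is a frequency outline normalization unit that receives the outputs of the frame division unit 201 and the MDCT unit 203, and 402 is a band amplitude normalization unit that receives the output of the frequency outline normalization unit 401 and performs normalization with reference to a band table 403.
[0038]
Next, the operation is described.
The frequency outline normalization unit 401 uses the time-domain data output from the frame division unit 201 to calculate a frequency outline, that is, a rough spectral shape, and divides the MDCT coefficients output from the MDCT unit 203 by it. The parameters used to represent the frequency outline are encoded as the normalization code sequence 303. The band amplitude normalization unit 402 receives the output signal of the frequency outline normalization unit 401 and normalizes it for each band indicated in the band table 403. For example, if the MDCT coefficients output by the frequency outline normalization unit 401 are dct(i) (i = 0 to 2047) and the band table 403 is as shown in (Table 1), the average amplitude of each band is calculated using, for example, (Equation 6).
[0039]
[Table 1]

[Equation 6]

Here, bjlow and bjhigh are, respectively, the lowest and highest indices i of the dct(i) belonging to the j-th band in the band table 403. p is the norm used in the distance calculation and is preferably 2 or the like. avej is the average amplitude in band number j. The band amplitude normalization unit 402 quantizes avej to obtain qavej and normalizes using, for example, (Equation 7).
[0040]
[Equation 7]

avej may be quantized by scalar quantization or by vector quantization using a codebook. The band amplitude normalization unit 402 encodes the indices of the parameters used to represent qavej as the normalization code sequence 303.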
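Table 1 and Equations 6 and 7 are not reproduced in this text. The sketch below shows one plausible reading of the band amplitude normalization: avej is taken as the p-norm mean of the coefficients in band j, qavej is a uniform scalar quantization of avej, and each coefficient in the band is divided by qavej. The exact formulas, the quantizer step, and the function names are assumptions.

```python
def band_average_amplitude(dct, low, high, p=2):
    """Assumed ave_j: ((1/n) * sum |dct(i)|**p) ** (1/p) over i = low .. high."""
    n = high - low + 1
    return (sum(abs(dct[i]) ** p for i in range(low, high + 1)) / n) ** (1.0 / p)


def normalize_bands(dct, band_table, step=0.5, p=2):
    """Divide each band by its quantized average amplitude.

    band_table is a list of (low, high) index pairs (a stand-in for Table 1).
    Returns the normalized coefficients and the quantization index per band.
    """
    out = list(dct)
    indices = []
    for low, high in band_table:
        ave = band_average_amplitude(dct, low, high, p)
        idx = max(1, round(ave / step))  # floor of 1 avoids dividing by zero
        qave = idx * step                # quantized average amplitude
        indices.append(idx)
        for i in range(low, high + 1):
            out[i] = dct[i] / qave
    return out, indices
```

The returned indices correspond to what the text says is encoded into the normalization code sequence 303; a decoder would multiply by `idx * step` to undo the normalization.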
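[0041]
Although the normalization unit 301 in the encoder has been shown as using both the frequency outline normalization unit 401 and the band amplitude normalization unit 402 of FIG. 4, a configuration using only the frequency outline normalization unit 401 or only the band amplitude normalization unit 402 is also possible. Furthermore, when there is no great variation between the low-band and high-band components of the MDCT coefficients output from the MDCT unit 203, neither may be used, and the output signal of the MDCT unit 203 may be input directly to the quantization unit 302.
[0042]
Next, the details of the frequency outline normalization unit 401 of FIG. 4 are described with reference to FIG. 5.
In FIG. 5, 601 is a linear prediction analysis unit that receives the output of the frame division unit 201, 602 is an outline quantization unit that receives the output of the linear prediction analysis unit 601, and 603 is an envelope characteristic normalization unit that receives the output of the MDCT unit 203.
[0043]
Next, the operation of the frequency outline normalization unit 401 is described with reference to FIG. 5.
The linear prediction analysis unit 601 receives the time-domain audio signal from the frame division unit 201 and performs linear prediction analysis (Linear Predictive Coding). The linear prediction coefficients (LPC coefficients) can generally be calculated by computing the autocorrelation function of a signal windowed with, for example, a Hamming window and solving the normal equations. The calculated linear prediction coefficients are converted into, for example, line spectrum pair (LSP) coefficients and quantized by the outline quantization unit 602. Either vector quantization or scalar quantization may be used here. The envelope characteristic normalization unit 603 then calculates the frequency transfer characteristic represented by the parameters quantized by the outline quantization unit 602, and normalizes the MDCT coefficients output from the MDCT unit 203 by dividing them by it. As a concrete example, if the linear prediction coefficients equivalent to the parameters quantized by the outline quantization unit 602 are qlpc(i), the frequency transfer characteristic calculated by the envelope characteristic normalization unit 603 can be expressed, for example, by (Equation 8).
[0044]
[Equation 8]

Here, ORDER is preferably about 10 to 40, and fft() denotes the fast Fourier transform. Using the calculated frequency transfer characteristic env(i), the envelope characteristic normalization unit 603 normalizes using, for example, (Equation 9) below.
[Equation 9]

Here, mdct(i) is the output signal from the MDCT unit 203 and fdct(i) is the normalized output signal from the envelope characteristic normalization unit 603.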
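[0045]
Next, the quantization method of the quantization unit 302 in the encoding apparatus 1 is described in detail with reference to FIG. 8.
From the MDCT coefficients 1001 input to the quantization unit 302, several coefficients are extracted to form a sound source sub-vector 1003. Similarly, let the normalization component 1002 be the coefficient sequence obtained in the normalization unit 301 by dividing the MDCT coefficients input to the normalization unit 301 by the MDCT coefficients output from it; sub-vectors are extracted from this normalization component 1002 by the same rule used to extract the sound source sub-vector 1003 from the MDCT coefficients 1001, forming a weight sub-vector 1004. One rule for extracting the sound source sub-vector 1003 and the weight sub-vector 1004 from the MDCT coefficients 1001 and the normalization component 1002, respectively, is the method shown in (Equation 10).
[0046]
[Equation 10]

Here, the j-th element of the i-th sound source sub-vector is subvector i(j), the MDCT coefficients 1001 are vector(), the total number of elements of the MDCT coefficients 1001 is TOTAL, the number of elements of a sound source sub-vector 1003 is CR, and VTOTAL is set to a value equal to or larger than TOTAL such that VTOTAL/CR is an integer. For example, when TOTAL is 2048, possible settings are CR = 19 with VTOTAL = 2052, CR = 23 with VTOTAL = 2070, and CR = 21 with VTOTAL = 2079. The weight sub-vector 1004 can be extracted by the same procedure as (Equation 10). The vector quantizer 1005 searches the code vectors in the codebook 1009 for the one whose distance to the sound source sub-vector 1003, weighted by the weight sub-vector 1004, is smallest, and outputs the index of that code vector together with a residual sub-vector 1010, which corresponds to the quantization error between that code vector and the input sound source sub-vector 1003.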
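Equation 10 is not reproduced in this text. A plausible reading, consistent with the requirement that VTOTAL/CR be an integer, is an interleaved extraction in which sub-vector i takes every (VTOTAL/CR)-th coefficient starting at offset i, with zero padding beyond TOTAL. The sketch below implements that assumed rule together with its inverse; the function names are hypothetical.

```python
def extract_subvectors(vector, cr, vtotal):
    """Split vector into vtotal // cr interleaved sub-vectors of cr elements.

    Assumed rule: subvector_i(j) = vector[j * (vtotal // cr) + i] when that
    index is below len(vector), otherwise 0.
    """
    total = len(vector)
    if vtotal < total or vtotal % cr != 0:
        raise ValueError("vtotal must be >= len(vector) and a multiple of cr")
    count = vtotal // cr
    return [
        [vector[j * count + i] if j * count + i < total else 0 for j in range(cr)]
        for i in range(count)
    ]


def merge_subvectors(subvectors, total):
    """Inverse of extract_subvectors: rebuild the first `total` coefficients."""
    count = len(subvectors)
    out = [0] * total
    for i, sub in enumerate(subvectors):
        for j, value in enumerate(sub):
            index = j * count + i
            if index < total:
                out[index] = value
    return out
```

Interleaving spreads low- and high-frequency coefficients across every sub-vector, so each one has similar statistics; that design motive is an interpretation rather than something stated in the text.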
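[0047]
The actual calculation procedure is described assuming that the vector quantizer 1005 consists of three components: distance calculation means 1006, code determination means 1007, and residual generation means 1008.
The distance calculation means 1006 calculates the distance between the i-th sound source sub-vector 1003 and the k-th code vector of the codebook 1009 using, for example, (Equation 11).
[0048]
[Equation 11]

Here, wj is the j-th element of the weight sub-vector, Ck(j) is the j-th element of the k-th code vector, and R and S are the norms of the distance calculation; values such as 1, 1.5, and 2 are desirable for R and S. The norms R and S need not be equal. dik denotes the distance of the k-th code vector from the i-th sound source sub-vector. The code determination means 1007 selects the code vector with the smallest of the distances calculated by (Equation 11) or the like, and encodes its index as the code sequence 304. For example, if diu is the smallest of the distances dik, the index encoded for the i-th sub-vector is u. The residual generation means 1008 generates the residual sub-vector 1010 by (Equation 12) using the code vector selected by the code determination means 1007.
[0049]
[Equation 12]

Here, the j-th element of the i-th residual sub-vector 1010 is resi(j), and the j-th element of the code vector selected by the code determination means 1007 is Cu(j). The inverse of the operation of (Equation 10) is applied to the residual sub-vectors 1010 to obtain a vector, and the difference between that vector and the vector that was the original encoding target of this encoder is held as the MDCT coefficients to be quantized by subsequent encoders. However, when an encoder encodes a band that does not affect any subsequent encoder, that is, when no subsequent encoder will encode that band, the residual generation means 1008 need not generate the residual sub-vector 1010 and the MDCT residual 1011. The codebook 1009 may hold any number of code vectors, but about 64 is preferable considering memory capacity, computation time, and the like.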
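Equations 11 and 12 are not reproduced in this text. Based on the symbol definitions above, the sketch below assumes a distance of the form d_ik = sum_j w_j**R * |subvector_i(j) - C_k(j)|**S and a residual res_i(j) = subvector_i(j) - C_u(j). Both forms, and the function names, are assumptions made for illustration.

```python
def weighted_distance(sub, weight, code, r=2, s=2):
    """Assumed d_ik: sum over j of w_j**r * |sub(j) - code(j)|**s."""
    return sum((w ** r) * abs(x - c) ** s for x, w, c in zip(sub, weight, code))


def quantize_subvector(sub, weight, codebook, r=2, s=2):
    """Return (index u of the nearest code vector, residual sub-vector)."""
    best = min(
        range(len(codebook)),
        key=lambda k: weighted_distance(sub, weight, codebook[k], r, s),
    )
    residual = [x - c for x, c in zip(sub, codebook[best])]
    return best, residual
```

The index corresponds to what is written into the code sequence 304, and the residual is what a later stage would quantize. A zero weight makes the corresponding element irrelevant to the search, which is how the weight sub-vector steers the match toward the more important coefficients.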
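[0050]
The following configuration is also possible as another example of the vector quantizer 1005. In this case the distance calculation means 1006 calculates the distance using (Equation 13).
[0051]
[Equation 13]

Here, K is the total number of code vectors used in the code search of the codebook 1009.
The code determination means 1007 selects the k that gives the minimum of the distances dik calculated by (Equation 13) and encodes that index; k takes values from 0 to 2K-1. The residual generation means 1008 generates the residual sub-vector 1010 using (Equation 14).
[0052]
[Equation 14]

The codebook 1009 may hold any number of code vectors, but about 64 is preferable considering memory capacity, computation time, and the like. Although the above describes generating the weight sub-vector 1004 from the normalization component 1002 alone, the weight sub-vector may also be generated by further multiplying the weight sub-vector 1004 by weights that take human auditory characteristics into account.
In this way, the bandwidth of each encoder in each of the multiple stages, the number of encoders, and their connection order are determined dynamically, and quantization is performed on the basis of the encoder information so determined.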
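[0053]
The decoding apparatus 1002, on the other hand, performs decoding using the normalization code sequences output by the encoders for each band, the code sequences from the quantization units corresponding to those normalization code sequences, the band control code sequence output by the encoding band control unit of the encoding apparatus, and the analysis length code sequence output by the analysis length determination unit.
[0054]
FIG. 9 shows the configuration of the decoders 1202, 1203, and so on. Each decoder consists of an inverse quantization unit 1101, which reproduces the normalized MDCT coefficients, and an inverse normalization unit 1102, which decodes the normalization coefficients and multiplies the reproduced normalized MDCT coefficients by them.
[0055]
The inverse normalization unit 1102 restores, from the normalization code sequence 303 produced by the normalization unit 301 of each encoder, the parameters used for normalization in the encoding apparatus 1, multiplies the output of the inverse quantization unit 1101 by those parameters, and thereby restores the MDCT coefficients.
[0056]
The decoding band control unit 1201 uses the band control code sequence 508 output by the encoding band control unit 507 to restore the information on the arrangement and number of encoders used in the encoding apparatus, and places the decoders 1202, 1203, 1204, and 1202b in the respective bands on the basis of that information. The band synthesis unit 9, which synthesizes the bands in the reverse of the encoding order of the encoders 511, 512, 513, and 511b in the encoding apparatus, then yields the MDCT coefficients. The frequency-time conversion unit 5, which receives these MDCT coefficients, performs the inverse MDCT to restore a time-domain signal from the frequency-domain signal. The inverse MDCT is calculated, for example, by (Equation 15).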
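[0057]
[Equation 15]

Here, yyk are the MDCT coefficients restored by the band synthesis unit 9 and xx(n) are the inverse MDCT coefficients, which form the output of the frequency-time conversion unit 5.
The windowing unit 6 applies a window to the output xx(i) of the frequency-time conversion unit 5. The window is the one used in the windowing unit 202 in the time-frequency conversion unit 503 of the encoding apparatus 1, and the processing shown in (Equation 16) is performed, for example.
[0058]
[Equation 16]

Here, z(i) is the output of the windowing unit 6.
The frame overlap unit 7 reproduces the audio signal from the output of the windowing unit 6. Because the outputs of the windowing unit 6 overlap in time, the frame overlap unit 7 produces the output signal of the decoding apparatus 1002 using, for example, (Equation 17).
[0059]
[Equation 17]

Here, zm(i) is the i-th output signal z(i) of the windowing unit 6 in the m-th time frame, zm-1(i) is the i-th output signal of the windowing unit 6 in the (m-1)-th time frame, SHIFT is the number of samples corresponding to the analysis length 504 of the encoding apparatus, and out m(i) is the output signal of the decoding apparatus 1002 in the m-th time frame of the frame overlap unit 7.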
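Equation 17 is not reproduced in this text. A common overlap-add form consistent with the symbols defined above is out_m(i) = z_m(i) + z_{m-1}(i + SHIFT). The sketch below assumes that form and assumes each windowed frame holds 2 * SHIFT samples, so that the second half of the previous frame overlaps the first half of the current one; both points are assumptions.

```python
def overlap_add(prev_frame, cur_frame, shift):
    """Assumed Equation 17: out_m(i) = z_m(i) + z_{m-1}(i + shift), i = 0 .. shift - 1.

    prev_frame and cur_frame are windowed inverse-MDCT outputs of length
    2 * shift (assumption).
    """
    return [cur_frame[i] + prev_frame[i + shift] for i in range(shift)]
```

Each call produces SHIFT new output samples, so calling it once per decoded frame yields a continuous output stream.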
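[0060]
In Embodiment 1, the quantizable frequency range calculated by the bandwidth calculation unit 901 in the encoding band control unit 507 may also be restricted according to the analysis length 504, as follows.
For example, when the analysis length 504 is 256, the lower limit of the quantizable frequency range of each encoder is set to about 4 kHz and the upper limit to about 24 kHz. When the analysis length is 1024 or 2048, the lower limit is set to 0 Hz and the upper limit to about 16 kHz. Furthermore, once the analysis length 504 becomes 256, the quantization order determination unit 902 can fix the quantizable frequency range and the arrangement of the quantizers for a certain period thereafter, for example about 20 msec. This keeps the quantizer arrangement constant over time and suppresses the perceptual sensation of bands coming and going (the sensation that audio bands appear and disappear, as when sound centered on a high band suddenly changes to sound centered on a low band).
[0061]
As described above, the audio signal encoding apparatus and decoding apparatus of Embodiment 1 include a characteristic determination unit that determines the frequency band of the audio signal to be quantized by each encoder in the multiple stages, and an encoding band control unit that receives the frequency bands determined by the characteristic determination unit and the frequency-converted original audio signal, determines the connection order of the encoders in the multiple stages, and converts the quantization bands and connection order of the encoders into a code sequence, and they perform scalable coding adaptively. An audio signal encoding apparatus that performs adaptive scalable coding with high quality, high efficiency, and sufficient performance even for a wide variety of audio signals, and a decoding apparatus that decodes its output, can therefore be obtained.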
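[0062]
(Embodiment 2)
Embodiment 2 of the present invention is described with reference to FIGS. 14 to 20.
FIG. 14 is a block diagram of an encoding apparatus 2001 and a decoding apparatus 2002 that perform adaptive scalable coding according to Embodiment 2 of the present invention. As shown in the figure, in the encoding apparatus 2001, 200105 denotes the encoding conditions, such as the number of encoders, the bit rate, the sampling frequency of the input audio signal, and the encoding band information of each encoder; 200107 is a characteristic determination unit that determines the frequency band of the audio signal to be quantized by each encoder in the multiple stages; 200109 is encoding band placement information; 200110 is an encoding band control unit that receives the frequency bands determined by the characteristic determination unit 200107 and the frequency-converted audio input signal and converts the quantization bands and connection order of the encoders in the multiple stages into a code sequence; 200111 is an encoded sequence; and 200112 is a transmission encoded sequence synthesizer.
[0063]
In the decoding apparatus 2002, 200150 is a transmission encoded sequence decomposer; 200151 is an encoded sequence; 200153b is a decoding band control unit that receives the encoded sequence 200151 and controls the decoding band of each decoder that decodes it; and 200154b is a decoded spectrum.
[0064]
Like Embodiment 1, the encoding apparatus 2001 of Embodiment 2 performs adaptive scalable coding. Compared with Embodiment 1, however, the encoding apparatus 2001 additionally has an encoding band control unit 200110 that contains a decoding band control unit 200153, and the decoding apparatus 2002 additionally has a decoding band control unit 200153b that performs the same processing as the decoding band control unit 200153. Furthermore, in the characteristic determination unit 200107 of Embodiment 2, a psychoacoustic model calculation unit 200602 is provided, as shown in FIG. 16, in place of the spectral power calculation unit 803 of the characteristic determination unit 506 of Embodiment 1. The characteristic determination unit 200107 also contains encoding band placement information generation means 200604, which generates the encoding band placement information 200109 from the encoding conditions 200105, the encoding band information 200702 calculated by the encoding band calculation unit 200601, and the band number 200606 output from the placement determination unit 200603.
[0065]
In the decoding apparatus 2002, 200150 is a transmission encoded sequence decomposer; 200151 is an encoded sequence; 200153b is a decoding band control unit that receives the encoded sequence 200151 and controls the decoding band of each decoder that decodes it; and 200154b is a decoded spectrum.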
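[0066]
Next, the operation of Embodiment 2 is described.
In Embodiment 2, as in Embodiment 1, the original audio signal 501 to be encoded is assumed to be a temporally continuous digital signal sequence.
First, the spectrum 505 of the original audio signal is obtained by the same processing as in Embodiment 1. In Embodiment 2, encoding conditions 200105, which include the number of encoders, the bit rate, the sampling frequency of the input audio signal, and the encoding band information of each encoder, are input to the characteristic determination unit 200107 of the encoding apparatus 2001. The characteristic determination unit 200107 outputs encoding band placement information 200109, which includes information on the quantization bands, number, and connection order of the encoders in the multiple stages, and inputs it to the encoding band control unit 200110. As shown in FIG. 17, the encoding band control unit 200110 receives the spectrum 505 of the original audio signal in addition to the encoding band placement information 200109. On this basis it outputs the encoded sequence 200111 produced by the encoders it controls; this is input to and combined by the transmission encoded sequence synthesizer 200112, and the combined output is transmitted to the decoding apparatus 2002.
[0067]
In the decoding apparatus 2002, the transmission encoded sequence decomposer 200150 receives the output of the transmission encoded sequence synthesizer 200112 of the encoding apparatus 2001 and decomposes it into an encoded sequence 200151 and an analysis length code sequence 200152. The encoded sequence 200151 is input to the decoding band control unit 200153b, and the decoded spectrum 200154b, decoded by the decoders controlled by that unit, is obtained. From the decoded spectrum 200154b and the analysis length code sequence 200152 output by the transmission encoded sequence decomposer 200150, the decoded signal 8 is obtained using the frequency-time conversion unit 5, the windowing unit 6, and the frame overlap unit 7, as in Embodiment 1.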
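[0068]
Next, the operation of the characteristic determination unit 200107 is described with reference to FIGS. 15 to 20.
The characteristic determination unit 200107 consists of: an encoding band calculation unit 200601, which calculates the encoding band information 200702 using the encoding conditions 200105; a psychoacoustic model calculation unit 200602, which calculates auditory weights 200605 based on a human psychoacoustic model from spectrum information, such as the spectrum 505 of the original audio signal and the difference spectrum 200108, and from the encoding band information 200702; a placement determination unit 200603, which refers to the analysis length 504, applies further weighting to the auditory weights 200605 accordingly, determines the band placement of each encoder, and outputs a band number 200606; and encoding band placement information generation means 200604, which generates the encoding band placement information 200109 from the encoding conditions 200105, the encoding band information 200702 calculated by the encoding band calculation unit 200601, and the band number 200606 output from the placement determination unit 200603.
[0069]
The encoding band calculation unit 200601 uses the encoding conditions 200105, which are set before the encoding apparatus 2001 starts operating, to calculate the upper limit fpu(k) and lower limit fpl(k) of the encoding band encoded by the encoder 2003 shown in FIG. 15, and sends them to the encoding band placement information generation means 200604 as the encoding band information 200702. Here, k is a number identifying the encoding band; as k goes from 0 to the preset maximum pmax, it indicates bands of higher frequency. An example of pmax is 4. An example of the operation of the encoding band calculation unit 200601 is shown in Table 2.
[0070]
[Table 2]

The psychoacoustic model calculation unit 200602 calculates the auditory weights 200605 based on a human psychoacoustic model from spectrum information, such as the output signal of the filter 701 or the difference spectrum 200108 output by the encoding band control unit 200110, and from the encoding band information 200702 output by the encoding band calculation unit 200601. The auditory weight 200605 takes a large value for perceptually important bands and a small value for perceptually less important bands. One example of the psychoacoustic model calculation unit 200602 uses a method that calculates the power of the input spectrum. With the input spectrum denoted x602(i), the auditory weight wpsy(k) is given by (Equation 18).
[0071]
[Equation 18]

The auditory weights 200605 calculated in this way are input to the placement determination unit 200603, which refers to the analysis length 504. When the analysis length is small, for example 128, the placement determination unit increases the auditory weight 200605 of a band with a large band number 200606, for example 4, for instance by doubling the auditory weight of band number 4. When the analysis length is not small, it leaves the auditory weights 200605 unchanged. It then finds the band for which the auditory weight 200605 is largest and sends its band number 200606 to the encoding band placement information generation means 200604.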
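Equation 18 and Table 2 are not reproduced in this text. Following the statement that one example of the psychoacoustic model simply computes the power of the input spectrum, the sketch below assumes wpsy(k) is the sum of squared coefficients between fpl(k) and fpu(k), then applies the short-block boost to the highest band and picks the band with the largest weight. The function names, the `short_length` threshold, and the `boost` factor are hypothetical parameters.

```python
def perceptual_weights(spectrum, bands):
    """Assumed wpsy(k): sum of x(i)**2 over i = fpl(k) .. fpu(k) inclusive."""
    return [sum(spectrum[i] ** 2 for i in range(lo, hi + 1)) for lo, hi in bands]


def choose_band(spectrum, bands, analysis_length, short_length=128, boost=2.0):
    """Return the band number whose (possibly boosted) weight is largest."""
    weights = perceptual_weights(spectrum, bands)
    if analysis_length <= short_length:
        weights[-1] *= boost  # favor the highest band for short analysis lengths
    return max(range(len(bands)), key=weights.__getitem__)
```

In this sketch `bands` is a list of (fpl, fpu) bin ranges ordered by increasing frequency, so the last entry plays the role of the band with the largest band number.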
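[0072]
The encoding band placement information generation means 200604 receives the encoding band information 200702, the band number 200606, and the encoding conditions 200105, and outputs the encoding band placement information 200109. That is, while constantly referring to the encoding conditions 200105, it outputs encoding band placement information 200109, formed by concatenating the encoding band information 200702 and the band number 200606, for as long as the encoding conditions require it, and stops outputting when it is no longer needed. For example, it outputs band numbers 200606 until the number of encoders specified by the encoding conditions 200105 is reached. When the analysis length 504 is small, the placement determination unit 200603 may also fix the band number 200606 it outputs.
[0073]
Next, the operation of the encoding band control unit 200110 is described with reference to FIG. 17.
The encoding band control unit 200110 receives the encoding band placement information 200109 output from the characteristic determination unit 200107 and the spectrum 505 of the original audio signal, and outputs the encoded sequence 200111 and the difference spectrum 200108. Internally it contains: spectrum shift means 200701, which receives the encoding band placement information 200109 and shifts, to the band of band number 200606, either the spectrum 505 of the original audio signal or the difference spectrum 200108 between a past spectrum 505 and the spectrum 200705 obtained by encoding and then decoding that spectrum 505; an encoder 2003; difference calculation means 200703, which takes the difference between the spectrum 505 of the original audio signal and the decoded spectrum 200705; difference spectrum holding means 200704; and a decoding band control unit 200153, which spectrum-shifts the synthesized spectrum 2001001, obtained by decoding the code sequence 200111 with a decoder 2004, on the basis of the encoding band information 200702, combines the results successively to obtain a combined spectrum, and calculates the decoded spectrum 200705. The spectrum shift means 200701 is configured as shown in FIG. 20; its inputs are the source spectrum 2001101 to be shifted and the encoding band placement information 200109. In the encoding band control unit 200110, the spectrum 2001101 to be shifted is either the spectrum 505 of the original audio signal or the difference spectrum 200108; the spectrum shift means shifts it to the band of band number 200606 and outputs the shifted spectrum 2001102 together with the encoding band information 200702 taken from the encoding band placement information 200109. The band corresponding to band number 200606 can be obtained from fpl(k) and fpu(k) of the encoding band information 200702. The shift procedure moves the spectrum between fpl(k) and fpu(k) to the band that the encoder 2003 can process.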
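The shift step at the end of [0073] can be sketched as follows. The sketch assumes that "the band the encoder can process" starts at index 0, so shifting amounts to cutting out bins fpl(k) to fpu(k), and that the decoder side puts the segment back at fpl(k) in an otherwise empty spectrum. The function names are hypothetical.

```python
def shift_to_encoder_band(spectrum, fpl, fpu):
    """Move bins fpl .. fpu (inclusive) down to the encoder's working band."""
    return spectrum[fpl:fpu + 1]


def shift_back(segment, fpl, length):
    """Place a decoded segment back at its original position (decoder side)."""
    out = [0.0] * length
    out[fpl:fpl + len(segment)] = segment
    return out
```

Because the same band limits travel with the code sequence as encoding band information 200702, the decoder can apply `shift_back` without any side information beyond what the text already describes.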
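[0074]
The encoder 2003, which receives the shifted spectrum 2001102, outputs the normalization code sequence 303 and the residual code sequence 304 as shown in FIG. 15. These, combined with the encoding band information 200702 output by the spectrum shift means 200701, are sent as the code sequence 200111 to the transmission encoded sequence synthesizer 200112 and to the decoding band control unit 200153.
[0075]
The encoded sequence 200111 output by the encoder 2003 is input to the decoding band control unit 200153 inside the encoding band control unit 200110. The decoding band control unit 200153 operates in the same way as the one in the decoding apparatus 2002 (200153b).
[0076]
FIG. 19 shows the configuration of the decoding band control unit 200153b in the decoding apparatus 2002.
The decoding band control unit 200153b receives the code sequence 200111 from the transmission encoded sequence decomposer 200150 and outputs the decoded spectrum 200705b. Internally it has a decoder 2004, spectrum shift means 200701, and a decoded spectrum calculation unit 2001003.
[0077]
FIG. 18 shows the configuration of the decoder 2004.
The decoder 2004 consists of an inverse quantization unit 1101 and an inverse normalization unit 1102. The inverse quantization unit 1101 receives the residual code sequence 304 from the code sequence 200111, converts it into code indices, and reproduces the corresponding codes by referring to the codebook used in the encoder 2003. The reproduced codes are sent to the inverse normalization unit 1102 and multiplied by the normalization coefficient sequence 303a reproduced from the normalization code sequence 303 in the code sequence 200111, giving the synthesized spectrum 2001001. The synthesized spectrum 2001001 is input to the spectrum shift means 200701.
[0078]
The output of the decoding band control unit 200153 inside the encoding band control unit 200110 is the decoded spectrum 200705, which is identical to the decoded spectrum 200705b output by the decoding band control unit 200153b in the decoding apparatus 2002.
[0079]
The synthesized spectrum 2001001 synthesized by the decoder 2004 is shifted by the spectrum shift means 200701 to give the shifted synthesized spectrum 2001002, which is input to the decoded spectrum calculation unit 2001003.
[0080]
The decoded spectrum calculation unit 2001003 holds the synthesized spectra input to it, adds the held spectrum and the latest synthesized spectrum, and outputs the sum as the decoded spectrum 200705b.
[0081]
The difference calculation means 200703 in the encoding band control unit 200110 calculates the difference between the spectrum 505 of the original audio signal and the decoded spectrum 200705 and outputs the difference spectrum 200108, which is fed back to the characteristic determination unit 200107. At the same time, the difference spectrum 200108 is held by the difference spectrum holding means 200704 and also sent to the spectrum shift means 200701, in preparation for the next input of encoding band placement information 200109. The characteristic determination unit 200107 keeps outputting encoding band placement information 200109, referring to the encoding conditions, until those conditions are satisfied; when that output ceases, the encoding band control unit 200110 also stops operating. The encoding band control unit 200110 has the difference spectrum holding means 200704 in order to calculate the difference spectrum 200108; this is the storage area needed to hold the difference spectrum, for example an array capable of storing 2048 numbers.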
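The feedback loop of [0074] to [0081] (encode a band, decode it locally, accumulate the decoded spectrum, and recompute the difference spectrum) can be sketched as below. The `quantize` callback is a hypothetical stand-in for the encoder 2003 followed by the decoder 2004, and the band sequence is supplied by the caller rather than by a characteristic determination unit.

```python
def closed_loop_encode(spectrum, band_sequence, quantize):
    """Run the encode/decode/difference loop over a sequence of bands.

    band_sequence is a list of (fpl, fpu) bin ranges, one per encoder stage.
    quantize(segment) returns the locally decoded version of the segment.
    Returns the accumulated decoded spectrum and the final difference spectrum.
    """
    decoded = [0.0] * len(spectrum)
    diff = list(spectrum)  # first pass: nothing decoded yet
    for lo, hi in band_sequence:
        synthesized = quantize(diff[lo:hi + 1])
        for offset, value in enumerate(synthesized):
            decoded[lo + offset] += value  # decoded spectrum accumulates
        diff = [s - d for s, d in zip(spectrum, decoded)]
    return decoded, diff
```

Because the encoder runs the same decoding steps as the decoder, the difference spectrum it feeds back reflects exactly what the decoder will have reconstructed so far, which is the motive the text gives for placing a decoding band control unit inside the encoding band control unit.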
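[0082]
In this way, the processing by the characteristic determination unit 200107 and the subsequent encoding band control unit 200110 is repeated until the encoding conditions 200105 are satisfied, and encoded sequences 200111 are output successively. They are sent to the transmission encoded sequence synthesizer 200112, combined with the analysis length code sequence 510 into a transmission encoded sequence, and transmitted to the decoding apparatus 2002.
[0083]
In the decoding apparatus 2002, the transmission encoded sequence transmitted from the encoding apparatus 2001 is decomposed by the transmission encoded sequence decomposer 200150 into the encoded sequence 200151 and the analysis length code sequence 200152. These are identical to the encoded sequence 200111 and the analysis length code sequence 510 in the encoding apparatus 2001.
[0084]
The decomposed encoded sequence 200151 is converted into the decoded spectrum 200154b by the decoding band control unit 200153b, and the decoded spectrum 200154b is converted into a time-domain signal by the frequency-time conversion unit 5, the windowing unit 6, and the frame overlap unit 7 using the information of the analysis length code sequence 200152; this becomes the decoded signal 8.
[0085]
As described above, the audio signal encoding apparatus and decoding apparatus of Embodiment 2 have, like Embodiment 1, a characteristic determination unit that determines the frequency band of the audio signal to be quantized by each encoder in the multiple stages, and an encoding band control unit that receives the frequency bands determined by the characteristic determination unit and the frequency-converted original audio signal, determines the connection order of the encoders, and converts the quantization bands and connection order of the encoders into a code sequence, and they perform scalable coding adaptively. In addition, the encoding apparatus has an encoding band control unit that contains a decoding band control unit, the decoding apparatus has a decoding band control unit, the spectral power calculation unit in the characteristic determination unit is replaced by a psychoacoustic model calculation unit, and encoding band placement information generation means is provided in the characteristic determination unit. Using the psychoacoustic model calculation unit in place of the spectral power calculation unit makes it possible to identify perceptually important portions accurately and to favor those bands in the selection.
Furthermore, in the audio signal encoding and decoding apparatuses to which the present invention is directed, when the encoding conditions are satisfied during the computation that determines encoder placement, the encoding process is judged complete and no further encoding band placement information is output. In that computation, Embodiment 1 uses fixed bandwidths and fixed band weights when selecting the band in which to place an encoder. In Embodiment 2, by contrast, the decision criteria of the characteristic determination unit also include the sampling frequency of the input signal and the compression ratio, that is, the encoding bit rate, so the degree of weighting applied to each band when selecting the band placement of the encoders can be varied accordingly. Moreover, because the compression ratio is among the decision criteria, when the compression ratio is high (the bit rate is low) the degree of weighting of each band is varied little, whereas when the compression ratio is low (the bit rate is high) the weighting emphasizes the perceptually more important regions in order to pursue greater efficiency, so that the best balance between compression ratio and quality can be obtained. Thus an audio signal encoding and decoding apparatus that performs high-quality, high-efficiency adaptive scalable coding with sufficient performance even for a wide variety of audio signals can be obtained.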
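[0086]
[Effects of the Invention]
As described above, in the audio signal encoding method and audio signal decoding method according to the present invention, the encoding step has a plurality of encoding sub-steps and, under the control of the encoding band control step, performs multi-stage encoding of the audio signal and outputs encoded information; the characteristic determination step evaluates the input audio signal and outputs band weight information indicating the weighting of each frequency band to be encoded; and the encoding band control step determines, on the basis of the band weight information, the quantization band and connection order of each encoding sub-step constituting the multi-stage encoding, causes the encoding step to perform multi-stage encoding that is scalably configured according to the determined quantization bands and connection order, and outputs a band control code sequence indicating the determined quantization band and connection order of each encoding sub-step. This provides the advantageous effect that adaptive scalable coding with higher sound quality and higher efficiency can be performed on audio signals having a wide variety of properties.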
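[Brief Description of the Drawings]
[FIG. 1] Block diagram of adaptive scalable coding in the audio signal encoding apparatus according to Embodiment 1 of the present invention
[FIG. 2] Diagram showing the time-frequency conversion unit in the encoding apparatus of Embodiment 1
[FIG. 3] Diagram showing the encoder in the encoding apparatus of Embodiment 1
[FIG. 4] Diagram showing the normalization unit in the encoding apparatus of Embodiment 1
[FIG. 5] Diagram showing the frequency outline normalization unit in the encoding apparatus of Embodiment 1
[FIG. 6] Diagram showing the characteristic determination unit in the encoding apparatus of Embodiment 1
[FIG. 7] Diagram showing the encoding band control unit in the encoding apparatus of Embodiment 1
[FIG. 8] Diagram showing the quantization unit in the encoding apparatus of Embodiment 1
[FIG. 9] Diagram showing the decoder in the encoding apparatus of Embodiment 1
[FIG. 10] Diagram showing an outline of the general Twin VQ scheme
[FIG. 11] Diagram showing the general Twin VQ scalable coding scheme
[FIG. 12] Diagram showing the drawback of general fixed scalable coding
[FIG. 13] Diagram showing the advantage of general adaptive scalable coding
[FIG. 14] Block diagram of adaptive scalable coding in the audio signal encoding apparatus according to Embodiment 2 of the present invention
[FIG. 15] Diagram showing the encoder in the encoding apparatus of Embodiment 2
[FIG. 16] Diagram showing the characteristic determination unit in the encoding apparatus of Embodiment 2
[FIG. 17] Diagram showing the encoding band control unit in the encoding apparatus of Embodiment 2
[FIG. 18] Diagram showing the decoder in the encoding apparatus of Embodiment 2
[FIG. 19] Diagram showing the decoding band control unit in the encoding apparatus of Embodiment 2
[FIG. 20] Diagram showing the spectrum shift means in the encoding apparatus of Embodiment 2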
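[Explanation of Reference Signs]
1 Encoding apparatus
2 Decoding apparatus
501 Original audio signal
502 Analysis length determination unit
503 Time-frequency conversion unit
504 Analysis length
505 Spectrum of the original audio signal
506 Characteristic determination unit
507 Encoding band control unit
508 Band control code sequence
510 Analysis length code sequence
511 Low-band encoder
512 Mid-band encoder
513 High-band encoder
511b Second-stage low-band encoder
518, 519, 520, 518b Quantization error
521 Low-band code sequence
522 Mid-band code sequence
523 High-band code sequence
521b Second-stage low-band code sequence
701 Filter
5 Frequency-time conversion unit
6 Windowing unit
7 Frame overlap unit
8 Decoded signal
9 Band synthesis unit
1201 Decoding band control unit
1202 Low-band decoder
1203 Mid-band decoder
1204 High-band decoder
1202b Second-stage low-band decoder
201 Frame division unit
202 Windowing unit
203 MDCT unit
3 Encoder
301 Normalization unit
302 Quantization unit
303 Normalization code sequence
304 Code sequence
401 Frequency outline normalization unit
402 Band amplitude normalization unit
403 Band table
601 Linear prediction analysis unit
602 Outline quantization unit
603 Envelope characteristic normalization unit
803 Spectral power calculation unit
804 Placement determination unit
517 Band control weight
516 Encoding band placement information
901 Bandwidth calculation unit
902 Quantization order determination unit
903 Encoder count determination unit
1001 MDCT coefficients of the band quantized by the quantization unit
1002 Normalization component of the same quantization band
1003 Sound source sub-vector
1004 Weight sub-vector
1005 Vector quantizer
1006 Distance calculation means
1007 Code determination means
1008 Residual generation means
1009 Codebook
1010 Residual sub-vector
1011 MDCT residual of the band quantized by a given quantization unit
101 Original audio signal
102 Analysis length determination unit
103 Time-frequency conversion unit
104 Frequency-domain original audio signal
105 Frequency outline
106 Normalization processing unit
107 Normalization code sequence
108 Normalized original audio signal
109 Vector quantization unit
110 Code sequence
111 Analysis length code sequence
1301 Original audio signal
1302 Time-frequency conversion unit
1303 Analysis length determination unit
1304 Frequency-domain original audio signal
1305 Low-band encoder
1306 Quantization error
1307 Mid-band encoder
1308 Quantization error
1309 High-band encoder
1310 Quantization error
1311 Low-band code sequence
1312 Mid-band code sequence
1313 High-band code sequence
1314 Analysis length code sequence
2001 Encoding apparatus
2002 Decoding apparatus
200105 Encoding conditions
200107 Characteristic determination unit
200108 Difference spectrum
200109 Encoding band placement information
200110 Encoding band control unit
200111 Encoded sequence
200112 Transmission encoded sequence synthesizer
200150 Transmission encoded sequence decomposer
200151 Encoded sequence
200152 Analysis length code sequence
200153 Decoding band control unit
200154 Decoded spectrum
2003 Encoder
200305 Encoding band information
200601 Encoding band calculation unit
200602 Psychoacoustic model calculation unit
200603 Placement determination unit
200604 Encoding band placement information generation means
200605 Auditory weight
200701 Spectrum shift means
200702 Encoding band information
200703 Difference calculation means
200704 Difference spectrum holding means
2004 Decoder
200901 Inverse quantization unit
200902 Inverse normalization unit
2001001 Synthesized spectrum
2001002 Shifted synthesized spectrum
2001003 Decoded spectrum calculation unit
2001101 Source spectrum
2001102 Shifted spectrum
[0001]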
[Technical Field of the Invention]
The present invention relates to an audio signal encoding method and an audio signal decoding method. In particular, it relates to an encoding method that uses feature quantities obtained from an audio signal such as a speech or music signal, specifically a signal obtained by converting the audio signal from the time domain to the frequency domain by a technique such as an orthogonal transform, and encodes that converted signal efficiently so that it is represented with as short a code sequence as possible relative to the original audio signal; and to a decoding method configured to decode a high-quality, wideband audio signal using all, or only a part, of the resulting encoded sequence.
[0002]
[Prior art]
Various techniques for efficiently encoding and decoding audio signals have been proposed. Compression coding schemes for audio signals having a frequency band of 20 kHz or more, such as music signals, include the MPEG audio scheme and the Twin VQ (TC-WVQ) scheme. Coding schemes typified by MPEG convert a time-domain digital audio signal into frequency-domain data using an orthogonal transform such as the cosine transform, and encode that frequency-domain information starting with the perceptually most important information by exploiting the sensitivity characteristics of human hearing; perceptually unimportant or redundant information is not encoded. The Twin VQ (TC-WVQ) scheme, on the other hand, uses vector quantization to represent the signal with considerably less information than the original digital signal. MPEG audio and Twin VQ (TC-WVQ) are described in ISO/IEC standard IS-11172-3 and in T. Moriya, H. Suga: "An 8 Kbits transform coder for noisy channels," Proc. ICASSP 89, pp. 196-199, respectively.
[0003]
An outline of the general Twin VQ scheme is described here with reference to FIG. 10.
The original audio signal 101 is input to the analysis length determination unit 102, and the analysis length is calculated. At the same time, the analysis length determination unit 102 quantizes the analysis length 112 and outputs an analysis length code sequence 111. Next, according to the analysis length 112, the time-frequency conversion unit 103 converts the original audio signal 101 into the original audio signal 104 in the frequency domain. Next, the original audio signal 104 in the frequency domain is normalized (flattened) by a normalization processing unit (flattening processing unit) 106 to obtain a normalized audio signal 108. The normalization process is performed by calculating the frequency outline 105 from the original audio signal 104 and dividing the original audio signal 104 by the calculated frequency outline 105. Further, the normalization processing unit 106 quantizes the frequency outline information used for the normalization process, and outputs a normalized code string 107. Next, the normalized audio signal 108 is quantized by the vector quantization unit 109, and a code string 110 is obtained.
[0004]
In recent years, there is a structure that can reproduce an audio signal even if a part of a code string input to a decoder is used. The above structure is referred to as a scalable structure, and encoding so as to realize a scalable structure is referred to as scalable coding.
[0005]
FIG. 11 shows an example of a fixed scalable coding adopted in the general Twin VQ method.
In accordance with the analysis length 1314 determined from the original audio signal 1301 by the analysis length determination unit 1303, the time-frequency conversion unit 1302 obtains the original audio signal 1304 in the frequency domain. When the original audio signal 1304 in the frequency domain is input to the low-band encoder 1305, a quantization error 1306 and a low-band code sequence 1311 are output. When the quantization error 1306 is input to the mid-band encoder 1307, a quantization error 1308 and a mid-band code sequence 1312 are output. When the quantization error 1308 is input to the high-band encoder 1309, a quantization error 1310 and a high-band code sequence 1313 are output. Each of the low-band, mid-band, and high-band encoders has both a normalization processing unit and a vector quantization unit, and outputs the quantization error together with a low-band, mid-band, or high-band code sequence that contains the code sequences output by the normalization processing unit and the vector quantization unit.
[0006]
[Problems to be solved by the invention]
In conventional fixed scalable coding, the low-band, mid-band, and high-band quantizers are fixed as shown in FIG. 11, so it is difficult, as shown in FIG. 12, to encode so that the quantization error is kept as small as possible for the distribution of the original audio signal. As a result, sufficient performance cannot be achieved when encoding audio signals with widely varying properties and distributions, and scalable coding with high sound quality and high efficiency is difficult.
[0007]
The present invention has been made to solve the above problem. Its object is to provide an audio signal encoding method and an audio signal decoding method that, by adaptively applying scalable coding to a wide variety of audio signals as shown in FIG. 13, can encode such signals efficiently, at a low bit rate, and with high sound quality.
[0008]
[Means for Solving the Problems]
In order to solve this problem, the audio signal encoding method and the audio signal decoding method according to the present invention perform, instead of fixed scalable coding, adaptive scalable coding in which the frequency range to be encoded is changed in accordance with the nature and distribution of the original audio signal.
[0009]
An audio signal encoding method according to the present invention includes a characteristic determination step, an encoding band control step, and an encoding step, and converts a time-frequency converted audio signal into an encoded sequence. The encoded sequence includes encoded information and a band control code sequence. The encoding step includes a plurality of encoding sub-steps and, under the control of the encoding band control step, performs multi-stage encoding of the audio signal and outputs the encoded information. The characteristic determination step evaluates the input audio signal and outputs band weight information indicating the weight of each frequency band to be encoded. The encoding band control step determines, based on the band weight information, the quantization band and connection order of each encoding sub-step constituting the multi-stage encoding, causes the encoding step to perform multi-stage encoding configured scalably according to the determined quantization bands and connection order, and outputs a band control code sequence indicating the determined quantization band and connection order of each encoding sub-step.
[0010]
Further, in the audio signal encoding method according to the present invention, the encoding band control step determines the quantization band and connection order of each encoding sub-step so that they form one of a set of predefined multi-stage encodings.
[0011]
Further, in the audio signal encoding method according to the present invention, the encoding step outputs a quantization error, and the encoding band control step determines the quantization band and connection order of each encoding sub-step based on the band weight information and the quantization error.
[0012]
An audio signal decoding method according to the present invention includes a decoding band control step and a decoding step, and decodes an encoded sequence including encoded information and a band control code sequence into an audio signal. The band control code sequence indicates the quantization band and connection order of each encoding used when the encoded information was multi-stage encoded. The decoding step has a plurality of decoding sub-steps and performs multi-stage decoding of the encoded information under the control of the decoding band control step. The decoding band control step causes the decoding step to perform multi-stage decoding configured scalably based on the band control code sequence.
[0013]
[Detailed Description of the Invention]
Hereinafter, the first embodiment of the present invention will be described with reference to FIGS. 1 to 9, and the second embodiment will be described with reference to FIGS.
[0014]
(Embodiment 1)
FIG. 1 is a block diagram of an audio signal encoding apparatus that performs adaptive scalable coding according to Embodiment 1 of the present invention.
In FIG. 1, reference numeral 1001 denotes an encoding apparatus that encodes an original audio signal 501. In the encoding apparatus 1001, 502 is an analysis length determination unit that determines an analysis length 504 used when analyzing the original audio signal 501; 503 is a time-frequency conversion unit that converts the time axis of the original audio signal 501 to a frequency axis in units of the analysis length 504; 504 is the analysis length determined by the analysis length determination unit 502; 505 is the spectrum of the original audio signal; 701 is a filter to which the spectrum 505 of the original audio signal is input; 506 is a characteristic determination unit that determines the characteristics of the spectrum 505 of the original audio signal and determines the frequency band of the audio signal to be quantized by each of the encoders 511, 512, 513, 511b, etc. at each of the plurality of stages in the encoding apparatus 1001; 507 is a coding band control unit that receives the frequency band of each encoder determined by the characteristic determination unit 506 and the frequency-converted audio signal, determines the connection order of the encoders 511, 512, 513, 511b, etc. at each of the plurality of stages, and converts the quantization band and connection order of each encoder into a code string; 508 is a band control code string, which is the code string output from the coding band control unit 507; 510 is an analysis length code string, which is the analysis length 504 output from the analysis length determination unit 502 expressed as a code string; 511, 512, and 513 are a low-band encoder, a mid-band encoder, and a high-band encoder that encode the low-band, mid-band, and high-band signals, respectively; 511b is a second-stage low-band encoder that encodes the quantization error 518 of the first-stage low-band encoder 511; 521, 522, and 523 are a low-band code string, a mid-band code string, and a high-band code string, which are the encoded outputs of the encoders 511, 512, and 513, respectively; 521b is a second-stage low-band code string, which is the encoded output of the second-stage low-band encoder 511b; 518, 519, and 520 are quantization errors output from the encoders 511, 512, and 513, respectively, each being the difference between the signal before encoding and the encoded signal; and 518b is a second-stage quantization error, which is the quantization error of the second-stage low-band encoder 511b.
[0015]
On the other hand, reference numeral 1002 denotes a decoding apparatus that decodes the encoded sequence produced by the encoding apparatus 1001. In the decoding apparatus 1002, 5 is a frequency-time conversion unit that performs the inverse of the conversion performed by the time-frequency conversion unit 503 in the encoding apparatus 1001; 6 is a windowing unit that multiplies by a window function on the time axis; 7 is a frame superposition unit; 8 is a decoded signal; 9 is a band synthesis unit; 1201 is a decoding band control unit; 1202, 1203, and 1204 are a low-band decoder, a mid-band decoder, and a high-band decoder that perform decoding corresponding to the low-band encoder 511, the mid-band encoder 512, and the high-band encoder 513, respectively; and 1202b is a second-stage low-band decoder that decodes the output of the first-stage low-band decoder 1202.
[0016]
Here, the encoders and decoders in the second and subsequent stages may also be provided in other bands and in still more stages; as the number of stages increases, the accuracy of encoding and decoding can be improved.
[0017]
Hereinafter, first, the operation of the encoding apparatus 1001 will be described.
Assume that the original audio signal 501 to be encoded is a digital signal sequence that is temporally continuous. For example, assume that the audio signal is a digital signal quantized to 16 bits at a sampling frequency of 48 kHz.
[0018]
The original audio signal 501 is input to the analysis length determination unit 502. The analysis length determination unit 502 evaluates the characteristics of the input original audio signal 501 to determine the analysis length 504, and the result is sent to the decoding apparatus 1002 as an analysis length code string 510. For example, 256, 1024, or 4096 is used as the analysis length 504. For example, when the high-frequency component contained in the original audio signal 501 exceeds a predetermined value, the analysis length 504 is set to 256; when the low-frequency component exceeds a predetermined value and the high-frequency component is smaller than a predetermined value, the analysis length 504 is set to 4096; otherwise, the analysis length 504 is set to 1024.
In accordance with the analysis length 504 determined in this way, the spectrum 505 of the original audio signal 501 is calculated by the time frequency conversion unit 503.
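The selection rule given as an example above can be sketched as follows. This is only an illustration: the function name, the two input measures, and the threshold values are assumptions, since the description says only that the components are compared with "a predetermined value".

```python
def choose_analysis_length(high_energy, low_energy,
                           high_threshold=1.0, low_threshold=1.0):
    """Pick an analysis length (256, 1024, or 4096) from two band measures.

    high_energy and low_energy are hypothetical measures of the high- and
    low-frequency content of the current input; both thresholds are
    placeholders for the unspecified predetermined values.
    """
    if high_energy > high_threshold:
        return 256    # strong high-frequency content: short analysis length
    if low_energy > low_threshold:
        return 4096   # strong low band and weak high band: long analysis length
    return 1024       # all other cases


if __name__ == "__main__":
    print(choose_analysis_length(high_energy=2.0, low_energy=0.1))
```

A single threshold is reused for the high-frequency test in both branches here; the description would also allow two different predetermined values.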
[0019]
FIG. 2 shows a block diagram of time frequency conversion section 503 in the audio signal encoding apparatus according to Embodiment 1 of the present invention.
The original audio signal 501 is accumulated in the frame division unit 201 and is output when the number of accumulated samples reaches the analysis length 504 determined by the analysis length determination unit 502. The frame division unit 201 is also configured to produce an output at every fixed shift length. For example, when the analysis length 504 is 4096 samples and the shift length is set to half the analysis length 504, the most recent 4096 samples are output each time 2048 new samples have been accumulated. Of course, even if the analysis length 504 or the sampling frequency changes, the shift length can likewise be set to half the analysis length 504.
The output from the frame dividing unit 201 is input to the subsequent windowing unit 202. The windowing unit 202 multiplies the output from the frame dividing unit 201 by a window function on the time axis to obtain the output of the windowing unit 202. This situation is expressed by, for example, (Equation 1).
[0020]
[Expression 1]

Here, xi is the output from the frame division unit 201, hi is the window function, and hxi is the output from the windowing unit 202; i is a time index. Note that the window function hi shown in (Equation 1) is an example, and the window function does not necessarily have to be that of (Equation 1).
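As a rough illustration of the frame division and windowing just described, the following sketch outputs the most recent N samples every N/2 samples and multiplies each frame by a window, hx_i = h_i * x_i. The sine window is an assumption chosen for the example (the window function of (Equation 1) is not reproduced here), and the function names are hypothetical.

```python
import math


def sine_window(n):
    # Assumed window shape; the description states that other window
    # functions may be used.
    return [math.sin(math.pi * (i + 0.5) / n) for i in range(n)]


def windowed_frames(samples, analysis_length):
    """Return windowed frames of analysis_length samples, shifted by half."""
    shift = analysis_length // 2
    window = sine_window(analysis_length)
    frames = []
    start = 0
    while start + analysis_length <= len(samples):
        frame = samples[start:start + analysis_length]          # x_i
        frames.append([h * x for h, x in zip(window, frame)])   # hx_i = h_i * x_i
        start += shift
    return frames


if __name__ == "__main__":
    frames = windowed_frames([1.0] * 16, 8)
    print(len(frames), len(frames[0]))
```

With a half-length shift, consecutive frames overlap by 50 percent, which is what the overlap-add in the decoder relies on.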
[0021]
The choice of window function depends on the characteristics of the signal input to the windowing unit 202, the analysis length 504 of the frame division unit 201, and the shapes of the window functions of the frames immediately before and after in time. For example, as a characteristic of the signal input to the windowing unit 202, when the analysis length 504 of the frame division unit 201 is N, the average power of the input signal is calculated every N/4 samples, and if this average power fluctuates significantly, the analysis length 504 is made shorter than N before the calculation shown in (Equation 1) is executed. In addition, it is desirable to select the window function shape of the current frame appropriately, according to the window function shapes of the preceding and following frames, so that no distortion arises.
[0022]
Next, an output from the windowing unit 202 is input to the MDCT unit 203, where a modified discrete cosine transform is performed, and MDCT coefficients are output. A general expression of the modified discrete cosine transform is expressed by (Equation 2).
[0023]
[Expression 2]

When the MDCT coefficients output by the MDCT unit 203 are written as yk as in (Equation 2), the output of the MDCT unit 203 represents a frequency characteristic: the closer the index k of yk is to 0, the lower the frequency component it corresponds to, and as k increases from 0 toward N/2-1, it corresponds linearly to higher frequency components. The MDCT coefficients calculated in this way form the spectrum 505 of the original audio signal.
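For orientation, a direct (non-fast) evaluation of the MDCT can be sketched as below. It uses the commonly used textbook definition, which maps N windowed samples to N/2 coefficients y_k; the scaling and phase convention of (Equation 2) may differ, so the formula in the code should be read as an assumption.

```python
import math


def mdct(frame):
    """Textbook MDCT: N windowed samples -> N/2 coefficients y_k.

    y_k = sum_n frame[n] * cos(2*pi/N * (n + 1/2 + N/4) * (k + 1/2))
    Small k corresponds to low frequencies, k near N/2-1 to high frequencies.
    """
    n = len(frame)
    return [
        sum(frame[i] * math.cos(2.0 * math.pi / n
                                * (i + 0.5 + n / 4.0) * (k + 0.5))
            for i in range(n))
        for k in range(n // 2)
    ]


if __name__ == "__main__":
    # A frame equal to the k = 3 basis function concentrates in coefficient 3.
    n, k0 = 16, 3
    frame = [math.cos(2.0 * math.pi / n * (i + 0.5 + n / 4.0) * (k0 + 0.5))
             for i in range(n)]
    print([round(y, 6) for y in mdct(frame)])
```

This O(N^2) form is for illustration only; practical encoders compute the same transform with an FFT-based algorithm.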
[0024]
Next, the spectrum 505 of the original audio signal is input to the filter 701. When the input of the filter 701 is x701 (i) and the output is y701 (i), for example, a filter represented by (Equation 3) is used.
[0025]
[Equation 3]

Here, fs is the analysis length 504.
The filter 701 represented by (Equation 3) is a kind of moving average filter, but of course it need not be limited to a moving average filter; it may be another filter, for example a high-pass filter, or it may be a suppression filter.
[0026]
The output of the filter 701 and the analysis length 504 calculated by the analysis length determination unit 502 are input to the characteristic determination unit 506. FIG. 6 shows details of the characteristic determination unit 506. The characteristic determination unit 506 determines auditory and physical characteristics of the original audio signal 501 and the spectrum 505 of the original audio signal. The auditory and physical characteristics of the original audio signal 501 and its spectrum 505 are, for example, the difference between voice and music. In the case of voice, for example, most of the frequency components lie in a range lower than 6 kHz.
[0027]
Next, the operation of the characteristic determination unit 506 will be described with reference to FIG.
Let x506(i) be the signal obtained by filtering, with the filter 701, the spectrum 505 of the original audio signal input to the characteristic determination unit 506. The spectrum power calculation unit 803 calculates the spectrum power p506(i) from this x506(i) according to (Equation 4).
[0028]
[Expression 4]

This spectrum power p506(i) is supplied as one of the inputs of the coding band control unit 507 and serves as the band control weight 517 of each encoder.
Further, when the analysis length 504 is small, for example 256, the arrangement determination unit 804 decides to arrange the encoders in a fixed manner and sends the coding band arrangement information 516 to the coding band control unit 507 as a fixed arrangement.
[0029]
When the analysis length 504 is not small, for example 4096 or 1024, the arrangement determination unit 804 decides to arrange the encoders dynamically and sends the coding band arrangement information 516 to the coding band control unit 507 as a dynamic arrangement.
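The two outputs of the characteristic determination unit 506 described above can be sketched as follows. The squared-magnitude form of the spectrum power is an assumption standing in for (Equation 4), and the function names and the string values "fixed" and "dynamic" are hypothetical labels for the coding band arrangement information 516.

```python
def spectrum_power(filtered_spectrum):
    # Assumption: power is the squared magnitude of each filtered coefficient.
    return [x * x for x in filtered_spectrum]


def arrangement_mode(analysis_length, short_length=256):
    """Return 'fixed' for the short analysis length, otherwise 'dynamic'."""
    return "fixed" if analysis_length <= short_length else "dynamic"


if __name__ == "__main__":
    weights = spectrum_power([0.5, -2.0, 1.0])   # used as band control weight
    print(weights, arrangement_mode(256), arrangement_mode(4096))
```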
[0030]
Next, the operation of the coding band control unit 507 will be described with reference to FIG.
The coding band control unit 507 receives as inputs the band control weight 517 output from the characteristic determination unit 506, the coding band arrangement information 516, the signal obtained by filtering the spectrum 505 of the original audio signal with the filter 701, and the quantization error 518, 519, or 520 output from each encoder. The quantization error is among these inputs because the encoders 511, 512, 513, and 511b and the coding band control unit 507 operate recursively. In the first operation of the coding band control unit 507, no quantization error exists yet, so there are three inputs, excluding the quantization error.
[0031]
As described above, when the analysis length 504 is small and the coding band arrangement information 516 indicates a fixed arrangement, the quantization order determination unit 902, the encoder number determination unit 903, and the bandwidth calculation unit 901 determine the quantization bands, the number, and the connection order of the encoders so that encoding proceeds in order from the low band to the mid band and the high band in accordance with the predetermined fixed arrangement, and encoding is performed. That is, the band control code string 508 in this case encodes the band information of the encoders, the number of encoders, and their connection order.
[0032]
For example, the encoders are arranged so that the encoding bands and the number of encoders are one for 0 Hz to 4 kHz, one for 0 Hz to 8 kHz, one for 4 kHz to 12 kHz, two for 8 kHz to 16 kHz, and three for 16 kHz to 24 kHz, and encoding is performed.
[0033]
Next, the operation of the coding band control unit 507 when the coding band arrangement information 516 indicates a dynamic arrangement will be described.
The coding band control unit 507 includes a bandwidth calculation unit 901 that determines the quantization bandwidth of each encoder, a quantization order determination unit 902 that determines the quantization order of each encoder, and an encoder number determination unit 903 that determines the number of encoders in each band. The bandwidth of each encoder is determined based on the signals input to the coding band control unit 507. For each predetermined band, for example 0 Hz to 4 kHz, 0 Hz to 8 kHz, 4 kHz to 12 kHz, 8 kHz to 16 kHz, and 16 kHz to 24 kHz, the average of the product of the band control weight 517 and the quantization error after encoding by each encoder is calculated. Here, assuming that the band control weight 517 is weight517(i) and the quantization error is err507(i), the average value is calculated by (Equation 5).
[0034]
[Equation 5]

Here, j is the index of each band, Ave901(j) is the average value in band j, and fupper(j) and flower(j) are the upper limit frequency and lower limit frequency of band j. The j for which the average value Ave901(j) obtained in this way is largest is searched for, and this becomes the band to be encoded by the encoder. Further, the value of the retrieved j is sent to the encoder number determination unit 903, which increases the number of encoders in the band corresponding to j by one and stores how many encoders exist in each predetermined encoding band. Encoding is repeated until the total number of stored encoders reaches the predetermined total number of encoders. Finally, the bands of the encoders and the number of encoders are transmitted to the decoder as the band control code string 508.
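The dynamic arrangement loop described above can be sketched as a greedy allocation. The averaging formula below is an assumed form of (Equation 5), band edges are given as coefficient indices rather than frequencies, the error values are assumed to be non-negative magnitudes, and encode_band is a hypothetical stand-in for one encoder stage that returns the error remaining after encoding.

```python
def band_average(weight, err, lower, upper):
    # Assumed form of (Equation 5): mean of weight(i) * err(i) over the band,
    # with the band given as the index range [lower, upper).
    return sum(weight[i] * err[i] for i in range(lower, upper)) / (upper - lower)


def allocate_encoders(weight, err, bands, total_encoders, encode_band):
    """Greedily place encoders on the band with the largest weighted error.

    bands       : list of (lower, upper) index pairs, the candidate bands
    encode_band : hypothetical callback; takes the error values of one band
                  and returns the error left after that encoder stage
    Returns (order, counts): band indices in connection order, and the
    number of encoders placed in each band.
    """
    err = list(err)
    order = []
    counts = [0] * len(bands)
    for _ in range(total_encoders):
        averages = [band_average(weight, err, lo, hi) for lo, hi in bands]
        j = max(range(len(bands)), key=lambda b: averages[b])
        lo, hi = bands[j]
        err[lo:hi] = encode_band(err[lo:hi])   # error remaining in this band
        order.append(j)
        counts[j] += 1
    return order, counts


if __name__ == "__main__":
    order, counts = allocate_encoders(
        weight=[1.0] * 8,
        err=[8.0, 8.0, 1.0, 1.0, 1.0, 1.0, 4.0, 4.0],
        bands=[(0, 4), (2, 6), (4, 8)],
        total_encoders=3,
        encode_band=lambda seg: [e / 4.0 for e in seg],
    )
    print(order, counts)
```

The pair (order, counts) corresponds to the information that would be packed into the band control code string 508; the packing itself is not shown.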
[0035]
Next, the operation of the encoder 3 will be described with reference to FIG.
The encoder 3 includes a normalization unit 301 and a quantization unit 302.
The normalization unit 301 receives both the time-axis signal output from the frame division unit 201 and the MDCT coefficients output from the MDCT unit 203, and normalizes the MDCT coefficients using several parameters. Here, normalizing the MDCT coefficients means suppressing the variation in their magnitude, which differs greatly between the low-frequency and high-frequency components. For example, when the low-frequency components are very large relative to the high-frequency components, a parameter that takes large values for the low-frequency components and small values for the high-frequency components is selected, and the MDCT coefficients are divided by this parameter to suppress the variation in their magnitude. In addition, the normalization unit 301 encodes an index representing the parameters used for normalization as a normalization code string 303.
[0036]
The quantization unit 302 receives the MDCT coefficients normalized by the normalization unit 301 as input and quantizes them. At this time, the quantization unit 302 outputs the code index that minimizes the difference between the value to be quantized and the quantized outputs corresponding to the code indexes in the codebook. In this case, the difference between the value to be quantized by the quantization unit 302 and the value corresponding to the code index output from the quantization unit 302 is the quantization error.
[0037]
Next, a detailed example of the normalization unit 301 will be described with reference to FIG.
In FIG. 4, 401 is a frequency outline normalization unit that receives the outputs of the frame division unit 201 and the MDCT unit 203, and 402 is a band amplitude normalization unit that receives the output of the frequency outline normalization unit 401 and performs normalization by referring to the band table 403.
[0038]
Next, the operation will be described.
The frequency outline normalization unit 401 uses the time-axis data output from the frame division unit 201 to calculate a frequency outline, which is a rough shape of the spectrum, and divides the MDCT coefficients output from the MDCT unit 203 by it. The parameters used to express the frequency outline are encoded as a normalization code string 303. The band amplitude normalization unit 402 receives the output signal from the frequency outline normalization unit 401 and performs normalization for each band indicated in the band table 403. For example, let the MDCT coefficients output from the frequency outline normalization unit 401 be dct(i) (i = 0 to 2047), and let the band table 403 be, for example, as shown in (Table 1). Then the average amplitude for each band is calculated using (Equation 6) or the like.
[0039]
[Table 1]

[Formula 6]

Here, bjlow and bjhigh respectively indicate the lowest index i and the highest index i of the dct(i) belonging to the j-th band shown in the band table 403. Also, p is the norm used in the distance calculation, and is preferably 2. avej is the average amplitude in band number j. The band amplitude normalization unit 402 quantizes avej to obtain qavej, and normalizes using, for example, (Equation 7).
[0040]
[Expression 7]

As the quantization of avej, scalar quantization may be used, or vector quantization may be performed using a codebook. The band amplitude normalization unit 402 encodes the parameter index used to represent qavej as a normalized code string 303.
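A minimal sketch of the band amplitude normalization follows. Since (Table 1), (Equation 6), and (Equation 7) are not reproduced here, the p-norm average and the division by the quantized average are assumed forms, the band table in the usage example is invented, and the uniform scalar quantizer with step size step is a hypothetical choice.

```python
def band_amplitude_normalize(dct, band_table, p=2.0, step=0.5):
    """Normalize coefficients band by band.

    band_table : list of (low, high) inclusive index pairs (b_jlow, b_jhigh)
    step       : hypothetical step size of a uniform scalar quantizer
    Returns (normalized, indices); indices are the quantizer indices that
    would be written to the normalization code string.
    """
    normalized = list(dct)
    indices = []
    for low, high in band_table:
        count = high - low + 1
        # Assumed form of (Equation 6): p-norm average amplitude ave_j.
        total = sum(abs(dct[i]) ** p for i in range(low, high + 1))
        ave = (total / count) ** (1.0 / p)
        index = max(1, round(ave / step))   # scalar quantization, never zero
        qave = index * step                 # quantized average qave_j
        indices.append(index)
        # Assumed form of (Equation 7): divide each coefficient by qave_j.
        for i in range(low, high + 1):
            normalized[i] = dct[i] / qave
    return normalized, indices


if __name__ == "__main__":
    print(band_amplitude_normalize([2.0, -2.0, 0.5, 0.5], [(0, 1), (2, 3)]))
```

Dividing by the quantized value qave_j rather than ave_j lets the decoder undo the normalization exactly from the transmitted index.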
[0041]
Note that the normalization unit 301 in the encoder has been described as using both the frequency outline normalization unit 401 and the band amplitude normalization unit 402, but a configuration using only the frequency outline normalization unit 401 or a configuration using only the band amplitude normalization unit 402 may also be used. Further, when there is no large variation between the low-frequency and high-frequency components of the MDCT coefficients output from the MDCT unit 203, the output signal of the MDCT unit 203 may be input directly to the quantization unit 302 without using either of the above configurations.
[0042]
Next, details of the frequency outline normalization unit 401 of FIG. 4 will be described with reference to FIG.
In FIG. 5, 601 is a linear prediction analysis unit that receives the output of the frame division unit 201, 602 is a rough quantization unit that receives the output of the linear prediction analysis unit 601, and 603 is an envelope characteristic normalization unit that receives the output of the MDCT unit 203.
[0043]
Next, the operation of the frequency outline normalization unit 401 will be described with reference to FIG.
The linear prediction analysis unit 601 performs linear prediction analysis (Linear Predictive Coding) using the time-axis audio signal from the frame division unit 201 as input. The linear prediction coefficients (LPC coefficients) can generally be obtained by calculating the autocorrelation function of a signal windowed with, for example, a Hamming window, and solving the normal equations. The calculated linear prediction coefficients are converted into line spectrum pair coefficients (LSP (Line Spectrum Pair) coefficients) or the like and quantized by the rough quantization unit 602. Either vector quantization or scalar quantization may be used as the quantization method here. Then, the envelope characteristic normalization unit 603 calculates the frequency transfer characteristic represented by the parameters quantized by the rough quantization unit 602, and normalizes the MDCT coefficients output from the MDCT unit 203 by dividing them by this characteristic. As a specific calculation example, if the linear prediction coefficients equivalent to the parameters quantized by the rough quantization unit 602 are qlpc(i), the frequency transfer characteristic calculated by the envelope characteristic normalization unit 603 can be expressed by, for example, (Equation 8).
[0044]
[Equation 8]

Here, ORDER is preferably about 10-40. fft () means fast Fourier transform. Using the calculated frequency transfer characteristic env (i), the envelope characteristic normalization unit 603 performs normalization using, for example, the following (Equation 9).
[Equation 9]

Here, mdct(i) is the output signal from the MDCT unit 203, and fdct(i) is the normalized output signal from the envelope characteristic normalization unit 603.
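The chain from autocorrelation to envelope normalization can be sketched as below. Several simplifications are assumptions of this sketch rather than the described method: the Hamming windowing and the LSP conversion and quantization are omitted (the unquantized coefficients stand in for qlpc(i)), the envelope is evaluated directly as 1/|A(e^{jw})| at MDCT bin centres instead of through fft(), and all function names are hypothetical.

```python
import cmath
import math


def autocorrelation(signal, order):
    return [sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations; returns a with A(z) = 1 + sum a[j] z^-j."""
    a = [1.0] + [0.0] * order
    error = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / error                      # reflection coefficient
        updated = a[:]
        for j in range(1, m):
            updated[j] = a[j] + k * a[m - j]
        updated[m] = k
        a = updated
        error *= (1.0 - k * k)
    return a


def envelope(a, num_bins):
    """Assumed envelope env(i) = 1 / |A(e^{jw_i})| at MDCT bin centres."""
    env = []
    for i in range(num_bins):
        w = math.pi * (i + 0.5) / num_bins
        response = sum(c * cmath.exp(-1j * w * j) for j, c in enumerate(a))
        env.append(1.0 / abs(response))
    return env


def envelope_normalize(mdct_coeffs, env):
    # fdct(i) = mdct(i) / env(i)
    return [m / e for m, e in zip(mdct_coeffs, env)]


if __name__ == "__main__":
    a = levinson_durbin([1.0, 0.9, 0.81], 2)   # low-pass-like autocorrelation
    env = envelope(a, 8)
    print([round(e, 3) for e in env])
```

For the example autocorrelation the envelope is large at low frequencies and small at high frequencies, so dividing by it flattens a spectrum dominated by its low band.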
[0045]
Next, a detailed operation of the quantization method of the quantization unit 302 in the encoding apparatus 1 will be described with reference to FIG.
A sound source subvector 1003 is formed by extracting some of the MDCT coefficients 1001 input to the quantization unit 302. Similarly, let the normalization component 1002 be the coefficient sequence obtained by dividing the MDCT coefficients input to the normalization unit 301 by the MDCT coefficients output from the normalization unit 301; a weight subvector 1004 can then be formed by extracting a subvector from the normalization component 1002 according to the same rule used to extract the sound source subvector 1003 from the MDCT coefficients 1001. The rule for extracting the sound source subvector 1003 and the weight subvector 1004 from the MDCT coefficients 1001 and the normalization component 1002 is, for example, the method shown in (Equation 10).
[0046]
[Expression 10]

Here, the j-th element of the i-th sound source subvector is subvectori(j), the MDCT coefficients 1001 are vector(), the total number of elements of the MDCT coefficients 1001 is TOTAL, the number of elements of the sound source subvector 1003 is CR, and VTOTAL is set to a value equal to or larger than TOTAL such that VTOTAL/CR is an integer. For example, when TOTAL is 2048, CR may be 19 with VTOTAL 2052, CR may be 23 with VTOTAL 2070, or CR may be 21 with VTOTAL 2079. The weight subvector 1004 can also be extracted by the procedure of (Equation 10). The vector quantizer 1005 searches the codebook 1009 for the code vector whose distance from the sound source subvector 1003, weighted by the weight subvector 1004, is smallest, and outputs the index of the code vector giving the minimum distance together with a residual subvector 1010 corresponding to the quantization error between that code vector and the input sound source subvector 1003.
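One extraction rule consistent with the quantities TOTAL, CR, and VTOTAL is an interleaved split, sketched below. The specific indexing, in which the i-th subvector takes every (VTOTAL/CR)-th coefficient starting at i and is zero-padded past TOTAL, is an assumption standing in for (Equation 10), and the function name is hypothetical.

```python
def extract_subvectors(vector, cr, vtotal):
    """Split vector into vtotal // cr interleaved subvectors of cr elements.

    Assumed rule: subvector_i(j) = vector[i + (vtotal // cr) * j] when that
    index is below len(vector), and 0 otherwise.
    """
    total = len(vector)
    if vtotal < total or vtotal % cr != 0:
        raise ValueError("vtotal must be >= len(vector) and a multiple of cr")
    count = vtotal // cr                     # number of subvectors
    return [
        [vector[i + count * j] if i + count * j < total else 0.0
         for j in range(cr)]
        for i in range(count)
    ]


if __name__ == "__main__":
    subs = extract_subvectors([float(v) for v in range(10)], cr=3, vtotal=12)
    print(subs)
```

Applying the same function to the normalization component yields weight subvectors aligned element for element with the sound source subvectors.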
[0047]
In an actual calculation procedure example, description will be made assuming that the vector quantizer 1005 includes three components, that is, a distance calculation unit 1006, a code determination unit 1007, and a residual generation unit 1008.
The distance calculation unit 1006 calculates the distance between the i th sound source subvector 1003 and the k th code vector of the code book 1009 using, for example, (Equation 11).
[0048]
[Expression 11]

Where wj is the jth element of the weight subvector, Ck (j) is the jth element of the kth code vector, R and S are the norms of the distance calculation, and the values of R and S are 1, 1.5, 2 etc. are desirable. The norms R and S do not have to be the same value. dik means the distance of the k th code vector to the i th sound source subvector. The code determining unit 1007 selects a code vector that is the smallest among the distances calculated by (Equation 11) and encodes the index as a code string 304. For example, if diu is the minimum among a plurality of diks, the index to be encoded for the i-th subvector is u. The residual generation unit 1008 generates a residual subvector 1010 according to (Equation 12) using the code vector selected by the code determination unit 1007.
[0049]
[Expression 12]

Here, the j-th element of the i-th residual subvector 1010 is resi(j), and the j-th element of the code vector selected by the code determination unit 1007 is Cu(j). The inverse process of (Equation 10) is applied to the residual subvectors 1010 to obtain a vector, and the difference between this vector and the vector that was the original encoding target of the encoder is held as the MDCT coefficients 1011 to be quantized by each subsequent encoder. However, when a band is encoded that does not affect the subsequent encoders, that is, when no subsequent encoder performs encoding on it, the residual generation unit 1008 need not generate the residual subvector 1010 and the MDCT coefficients 1011. The codebook 1009 can have any number of code vectors, but about 64 is preferable in consideration of memory capacity, calculation time, and the like.
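The search performed by the distance calculation, code determination, and residual generation units can be sketched as follows. The weighted distance and the residual below are assumed forms of (Equation 11) and (Equation 12); the weights are assumed to be non-negative, and the tiny codebook in the usage example is invented.

```python
def weighted_distance(subvector, weights, code_vector, r=2.0, s=2.0):
    # Assumed form of (Equation 11): sum_j w_j^R * |x_j - C_k(j)|^S
    return sum((w ** r) * abs(x - c) ** s
               for x, w, c in zip(subvector, weights, code_vector))


def quantize_subvector(subvector, weights, codebook, r=2.0, s=2.0):
    """Return (index u of the nearest code vector, residual subvector)."""
    distances = [weighted_distance(subvector, weights, cv, r, s)
                 for cv in codebook]
    u = min(range(len(codebook)), key=lambda k: distances[k])
    # Assumed form of (Equation 12): res_i(j) = subvector_i(j) - C_u(j)
    residual = [x - c for x, c in zip(subvector, codebook[u])]
    return u, residual


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [1.0, -1.0]]
    print(quantize_subvector([0.9, -0.8], [1.0, 1.0], codebook))
```

The residual returned here is what a later encoder stage would receive as its input after the subvectors are reassembled.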
[0050]
As another example of the vector quantizer 1005, the following configuration is also possible. That is, the distance calculation unit 1006 calculates the distance using (Equation 13).
[0051]
[Formula 13]

Here, K is the total number of code vectors used for code search of the code book 1009.
The code determining means 1007 selects k that gives the minimum value of the distance dik calculated in (Equation 13), and encodes the index. However, k is a value from 0 to 2K-1. Residual generation means 1008 generates residual subvector 1010 using (Equation 14).
[0052]
[Expression 14]

Here, the code book 1009 may have any number of code vectors, but is preferably about 64 in consideration of the memory capacity, calculation time, and the like. In the above description, the configuration in which the weight subvector 1004 is generated only from the normalized component 1002 has been described. However, the weight subvector is generated by further multiplying the weight subvector 1004 by a weight that takes into account human auditory characteristics. It is also possible.
As described above, the bandwidth of each encoder in each of a plurality of stages, the number of encoders, and the connection order are dynamically determined. Then, quantization is performed based on the information of each encoder thus determined.
[0053]
On the other hand, the decoding apparatus 1002 performs decoding using the normalization code string that is the output of the encoder of each band, the code string from the quantization unit corresponding to that normalization code string, the band control code string that is the output of the coding band control unit in the encoding apparatus, and the analysis length code string that is the output of the analysis length determination unit.
[0054]
FIG. 9 shows the configuration of the decoders 1202, 1203, and so on. Each decoder includes an inverse quantization unit 1101 that reproduces the normalized MDCT coefficients, and an inverse normalization unit 1102 that decodes the normalization coefficients and multiplies the reproduced normalized MDCT coefficients by the normalization coefficients.
[0055]
In the inverse normalization unit 1102, the parameters used for normalization in the encoding device 1 are restored from the normalization code string 303 from the normalization unit 301 of each encoder, and the MDCT coefficients are restored by multiplying the output of the inverse quantization unit 1101 by these parameters.
[0056]
The decoding band control unit 1201 uses the band control code string 508 that is the output of the coding band control unit 507 to restore information on the arrangement of the encoders used in the encoding apparatus and the number of encoders. Based on this information, the decoders 1202, 1203, 1204, and 1202b are arranged in the respective bands, and the MDCT coefficients are obtained by the band synthesis unit 9, which synthesizes the bands in the reverse order to the encoding order of the encoders 511, 512, 513, and 511b in the encoding apparatus. The frequency-time conversion unit 5, which receives the MDCT coefficients obtained in this way, performs an inverse MDCT to restore the frequency-domain signal to a time-domain signal. The inverse MDCT calculation is expressed by, for example, (Equation 15).
[0057]
[Expression 15]

Here, yyk is an MDCT coefficient restored by the band synthesizing unit 9, and xx (n) is an inverse MDCT coefficient, which is output from the frequency time conversion unit 5.
The windowing unit 6 performs windowing using the output xx (i) from the frequency time conversion unit 5. The windowing uses the window used in the windowing unit 202 in the time-frequency conversion unit 503 of the encoding device 1 and performs, for example, the processing expressed by (Equation 16).
[0058]
[Expression 16]

Here, z (i) is the output of the windowing unit 6.
The frame superimposing unit 7 uses the output from the windowing unit 6 to reproduce an audio signal. Since the output from the windowing unit 6 is a signal that overlaps in time, the frame superimposing unit 7 uses, for example, (Equation 17) as the output signal of the decoding device 1002.
[0059]
[Expression 17]

Here, zm(i) is the i-th output signal z(i) of the windowing unit 6 in the m-th time frame, zm-1(i) is the i-th output signal of the windowing unit 6 in the (m-1)-th time frame, SHIFT is the number of samples corresponding to the analysis length 504 of the encoding device, and outm(i) is the output signal of the decoding device 1002 in the m-th time frame of the frame superimposing unit 7.
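The inverse MDCT, windowing, and frame superposition described above can be sketched as follows. Equations 15 to 17 are not reproduced in this text, so the sketch assumes the common textbook MDCT and inverse MDCT pair, a sine window, and the overlap-add outm(i) = zm(i) + zm-1(i + SHIFT). All function names are illustrative and are not taken from the patent.

```python
import math

def sine_window(length):
    # Assumed window: satisfies h(n)^2 + h(n + length/2)^2 = 1.
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(frame):
    # Textbook MDCT (assumed form): 2N input samples -> N coefficients.
    n_coeffs = len(frame) // 2
    n0 = n_coeffs / 2 + 0.5
    return [
        sum(frame[n] * math.cos(math.pi / n_coeffs * (n + n0) * (k + 0.5))
            for n in range(len(frame)))
        for k in range(n_coeffs)
    ]

def imdct(coeffs):
    # Textbook inverse MDCT (assumed form): N coefficients -> 2N samples.
    n_coeffs = len(coeffs)
    n0 = n_coeffs / 2 + 0.5
    return [
        2.0 / n_coeffs * sum(
            coeffs[k] * math.cos(math.pi / n_coeffs * (n + n0) * (k + 0.5))
            for k in range(n_coeffs))
        for n in range(2 * n_coeffs)
    ]

def windowing(xx, window):
    # Windowing unit 6: z(i) = h(i) * xx(i).
    return [h * x for h, x in zip(window, xx)]

def frame_superimpose(z_cur, z_prev, shift):
    # Frame superimposing unit 7 (assumed form of the overlap-add):
    # out_m(i) = z_m(i) + z_{m-1}(i + SHIFT), for i = 0 .. SHIFT-1.
    return [z_cur[i] + z_prev[i + shift] for i in range(shift)]
```

Under these assumptions, two consecutive frames that overlap by SHIFT samples reconstruct the SHIFT samples they share, because the time-domain aliasing of the two frames cancels in the sum.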
[0060]
In the first embodiment, as described below, the quantizable frequency range calculated by the bandwidth calculation unit 901 in the coding band control unit 507 may be limited according to the analysis length 504.
For example, when the analysis length 504 is 256, the lower limit of the quantizable frequency range of each encoder is set to about 4 kHz and the upper limit to about 24 kHz. When the analysis length is 1024 or 2048, the lower limit is set to 0 Hz and the upper limit to about 16 kHz. Further, once the analysis length 504 becomes 256, the quantization order determination unit 902 can also exercise control so that the quantizable frequency range of each quantizer and the arrangement of the quantizers are fixed for a certain period of time, for example about 20 msec. With this processing, the arrangement of the quantizers is kept constant over time, which suppresses the audible impression that bands are coming and going (the impression that a band has appeared or vanished, as when a sound that was centered on the high band up to a certain moment suddenly changes to one centered on the low band).
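The analysis-length-dependent limits given as examples above could be expressed as a small lookup. This is only a sketch: the function names are hypothetical, the figures are the approximate example values from the text, and no limits are assumed for analysis lengths the text does not mention.

```python
def quantizable_range_hz(analysis_length):
    """Return (lower_hz, upper_hz) for each encoder, or None if unspecified."""
    if analysis_length == 256:
        return (4000.0, 24000.0)      # about 4 kHz to about 24 kHz
    if analysis_length in (1024, 2048):
        return (0.0, 16000.0)         # 0 Hz to about 16 kHz
    return None                       # the text gives no figures for other lengths

def clamp_band(lower_hz, upper_hz, analysis_length):
    """Restrict a requested band to the quantizable range (hypothetical helper)."""
    limits = quantizable_range_hz(analysis_length)
    if limits is None:
        return (lower_hz, upper_hz)
    return (max(lower_hz, limits[0]), min(upper_hz, limits[1]))
```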
[0061]
In the audio signal encoding device and decoding device according to the first embodiment described above, the encoding device has a characteristic determination unit that determines the frequency band of the audio signal to be quantized by each of a plurality of stages of encoders, and a coding band control unit that takes as inputs the frequency band determined by the characteristic determination unit and the frequency-converted original audio signal, determines the connection order of the encoders of each stage, and converts the quantization bands and connection order of the encoders into a code string. Because scalable coding is thereby performed adaptively, it is possible to obtain an audio signal encoding device that performs adaptive scalable coding with high quality, high efficiency, and sufficient performance even when encoding a wide variety of audio signals, together with a decoding device that decodes its output.
[0062]
(Embodiment 2)
A second embodiment of the present invention will be described with reference to FIGS. 14 to 20.
FIG. 14 shows a block diagram of an encoding apparatus 2001 and a decoding apparatus 2002 that perform adaptive scalable coding according to Embodiment 2 of the present invention. As shown in the figure, in the encoding apparatus 2001, reference numeral 200105 denotes encoding conditions such as the number of encoders, the bit rate, the sampling frequency of the input audio signal, and the coding band information of each encoder; 200107 denotes a characteristic determination unit that determines the frequency band of the audio signal to be quantized by each of the plurality of stages of encoders; 200109 denotes coding band arrangement information; 200110 denotes a coding band control unit that takes as inputs the frequency band determined by the characteristic determination unit 200107 and the frequency-converted audio input signal, and converts the quantization band and connection order of each of the plurality of stages of encoders into a code string; 200111 denotes a coded sequence; and 200112 denotes a transmission coded sequence synthesizer.
[0063]
Also, in the decoding apparatus 2002, reference numeral 200150 denotes a transmission coded sequence decomposer, 200151 denotes a coded sequence, 200153b denotes a decoding band control unit that takes the coded sequence 200151 as input and controls the decoding band of each decoder that decodes it, and 200154b denotes a decoded spectrum.
[0064]
The encoding apparatus 2001 according to the second embodiment of the present invention performs adaptive scalable coding as in the first embodiment. Compared with the first embodiment, however, a coding band control unit 200110 that contains a decoding band control unit 200153 is newly provided in the encoding apparatus 2001, and a decoding band control unit 200153b that performs the same processing as the decoding band control unit 200153 is provided in the decoding apparatus 2002. In addition, as shown in FIG. 16, the characteristic determination unit 200107 is provided with an auditory psychological model calculation unit 200602 in place of the spectrum power calculation unit 803 of the characteristic determination unit 506 in the first embodiment, and is further provided with a coding band arrangement information generation unit 200604 that generates the coding band arrangement information 200109 from the coding condition 200105, the coding band information 200702 calculated by the coding band calculation unit 200601, and the band number 200606 output from the arrangement determination unit 200603.
[0065]
[0066]
Next, the operation of the second embodiment will be described.
In the second embodiment, the original audio signal 501 to be encoded is assumed to be a temporally continuous digital signal sequence as in the first embodiment.
First, the spectrum 505 of the original audio signal is obtained by the same processing as in the first embodiment. In the second embodiment, the encoding condition 200105, which includes the number of encoders, the bit rate, the sampling frequency of the input audio signal, and the coding band information of each encoder, is supplied to the encoding apparatus 2001 and input to the characteristic determination unit 200107 within it. The characteristic determination unit 200107 outputs coding band arrangement information 200109, which includes information on the quantization band, the number, and the connection order of the encoders of the plurality of stages, and inputs it to the coding band control unit 200110. As shown in FIG. 17, the coding band control unit 200110 receives the spectrum 505 of the original audio signal in addition to the coding band arrangement information 200109 and, based on these, outputs an encoded sequence 200111 encoded by each encoder under its control. The encoded sequence 200111 is input to the transmission coded sequence synthesizer 200112 and synthesized there, and the synthesized output is sent to the decoding device 2002.
[0067]
In the decoding device 2002, the output of the transmission coded sequence synthesizer 200112 of the encoding device 2001 is received by the transmission coded sequence decomposer 200150 and decomposed into the coded sequence 200151 and the analysis length code sequence 200152. The coded sequence 200151 is input to the decoding band control unit 200153b, and a decoded spectrum 200154b, decoded by each decoder under the control of the decoding band control unit, is obtained. Then, from the decoded spectrum 200154b and the analysis length code sequence 200152 output by the transmission coded sequence decomposer 200150, the decoded signal 8 is obtained using the frequency time conversion unit 5, the windowing unit 6, and the frame superimposing unit 7.
[0068]
Next, the operation of the characteristic determination unit 200107 will be described with reference to FIG. 16.
The characteristic determination unit 200107 is composed of: a coding band calculation unit 200601 that uses the encoding condition 200105 to calculate coding band information 200702; an auditory psychological model calculation unit 200602 that calculates an auditory weight 200605 based on a human auditory psychological model from spectral information, such as the spectrum 505 of the original audio signal or the difference spectrum 200108, and from the coding band information 200702; an arrangement determination unit 200603 that refers to the analysis length 503, further weights the auditory weight 200605 accordingly, determines the arrangement of the bands, and outputs a band number 200606; and coding band arrangement information generation means 200604 that generates coding band arrangement information 200109 from the encoding condition 200105, the coding band information 200702 calculated by the coding band calculation unit 200601, and the band number 200606 output from the arrangement determination unit 200603.
[0069]
The coding band calculation unit 200601 uses the coding condition 200105, which is set before the encoding apparatus 2001 starts operating, to calculate the upper limit fpu(k) and the lower limit fpl(k) of each coding band, and sends them to the coding band arrangement information generation means 200604 as the coding band information 200702. Here, k is a number that identifies the coding band; as k goes from 0 to pmax, a preset maximum number, it indicates a band of higher frequency. An example of pmax is 4. An example of the operation of the coding band calculation unit 200601 is shown in Table 2.
[0070]
[Table 2]

The auditory psychological model calculation unit 200602 calculates the auditory weight 200605, based on a human auditory psychological model, from spectrum information such as the output signal of the filter 701 or the difference spectrum 200108 output by the coding band control unit 200110, and from the coding band information 200702 output by the coding band calculation unit 200601. The auditory weight 200605 takes a large value for a band that is perceptually important and a small value for a band that is perceptually less important. One example of the auditory psychological model calculation unit 200602 uses a method that calculates the power of the input spectrum. When the input spectrum is x602(i), the auditory weight wpsy(k) is given by (Formula 18).
[0071]
[Formula 18]

The auditory weight 200605 calculated in this way is input to the arrangement determination unit 200603. The arrangement determination unit 200603 refers to the analysis length 503. When the analysis length 503 is small, for example 128, it increases the auditory weight 200605 of a band whose band number 200606 is large, for example band 4, for instance by doubling the auditory weight of the band with band number 4. When the analysis length 503 is not small, it leaves the auditory weight 200605 as it is. It then finds the band in which the auditory weight 200605 is largest and sends the band number 200606 of that band to the coding band arrangement information generation means 200604.
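The weighting and band selection just described can be sketched as follows. Formula 18 is not reproduced in this text, so the sketch assumes that wpsy(k) is the sum of squared spectrum values between fpl(k) and fpu(k), with both limits treated as inclusive bin indices. The function names, the default threshold of 128, and the default factor of 2 are taken from the examples in the text or are illustrative.

```python
def auditory_weights(spectrum, band_edges):
    """Band power used as a stand-in for the auditory weight wpsy(k).

    band_edges[k] = (fpl_k, fpu_k), assumed to be inclusive bin indices.
    """
    return [sum(x * x for x in spectrum[lo:hi + 1]) for lo, hi in band_edges]

def choose_band(weights, analysis_length,
                small_length=128, boosted_band=4, boost=2.0):
    """Return the band number whose (possibly boosted) weight is largest."""
    adjusted = list(weights)
    # For a short analysis length, favour a high band (example: band 4, doubled).
    if analysis_length <= small_length and boosted_band < len(adjusted):
        adjusted[boosted_band] *= boost
    return max(range(len(adjusted)), key=lambda k: adjusted[k])
```

With weights of 9, 2, 0, 1, and 5 for bands 0 to 4, for instance, a long analysis length selects band 0, while an analysis length of 128 doubles the weight of band 4 to 10 and selects band 4 instead.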
[0072]
The coding band arrangement information generation means 200604 receives the coding band information 200702, the band number 200606, and the coding condition 200105, and outputs the coding band arrangement information 200109. That is, the coding band arrangement information generation means 200604 constantly refers to the coding condition 200105. While coding band arrangement information 200109 is required under that condition, it outputs coding band arrangement information 200109 formed by concatenating the coding band information 200702 and the band number 200606, and when it is no longer required, it stops the output. For example, the band number 200606 is output until the number of encoders specified by the coding condition 200105 is reached. In the arrangement determination unit 200603, the output band number 200606 may be fixed when the analysis length 503 is small.
[0073]
Next, the operation of the coding band control unit 200110 will be described using FIG. 17.
The coding band control unit 200110 takes as inputs the coding band arrangement information 200109 output from the characteristic determination unit 200107 and the spectrum 505 of the original audio signal, and outputs the coded sequence 200111 and the difference spectrum 200108. Internally it comprises: spectrum shift means 200701, which receives the coding band arrangement information 200109 and shifts either the spectrum 505 of the original audio signal or the difference spectrum 200108 (the difference between a past spectrum 505 of the original audio signal and the spectrum 200705 obtained by encoding and decoding it) to the band of band number 200606; the encoder 2003; difference calculation means 200703, which takes the difference between the spectrum 505 of the original audio signal and the decoded spectrum 200705; difference spectrum holding means 200704; and a decoding band control unit 200153, which applies a spectrum shift, based on the coding band information 200702, to the synthesized spectrum 2001001 obtained by decoding the code string 200111 with the decoder 2004, and combines the results sequentially to calculate the decoded spectrum 200705. The configuration of the spectrum shift means 200701 is shown in FIG. 20; its inputs are the original spectrum 2001101 to be shifted and the coding band arrangement information 200109. In the coding band control unit 200110, the spectrum 2001101 input to the spectrum shift means 200701 is the spectrum 505 of the original audio signal or the difference spectrum 200108. The spectrum shift means shifts it to the band of band number 200606 and outputs the shifted spectrum 2001102 together with the coding band information 200702 contained in the coding band arrangement information 200109. The band corresponding to band number 200606 is obtained from fpl(k) and fpu(k) of the coding band information 200702.
The shifting procedure moves the spectrum lying between fpl(k) and fpu(k) into the band that the encoder 2003 can process.
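A minimal sketch of the spectrum shift and its inverse follows. It assumes that fpl(k) and fpu(k) are inclusive bin indices and that the band the encoder can process starts at bin 0. Neither assumption is stated in the text, and the function names are hypothetical.

```python
def shift_spectrum(spectrum, fpl, fpu):
    """Move bins fpl..fpu (inclusive) so they start at bin 0 for the encoder."""
    return spectrum[fpl:fpu + 1]

def unshift_spectrum(band_spectrum, fpl, total_bins):
    """Inverse shift on the decoding side: put a band back at its original bins."""
    full = [0.0] * total_bins
    full[fpl:fpl + len(band_spectrum)] = band_spectrum
    return full
```

Bins outside the selected band are set to zero by the inverse shift, so spectra returned by successive stages can simply be added together.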
[0074]
The encoder 2003, which receives the shifted spectrum 2001102, outputs the normalized code string 303 and the residual code string 304 as shown in FIG. 15. These, combined with the coding band information 200702 output by the spectrum shift means 200701, are sent as the code string 200111 to the transmission coded sequence synthesizer 200112 and the decoding band control unit 200153.
[0075]
The coded sequence 200111 output by the encoder 2003 is input to the decoding band control unit 200153 in the coding band control unit 200110. The decoding band control unit 200153 operates in the same way as the decoding band control unit 200153b in the decoding device 2002.
[0076]
Next, FIG. 19 shows a configuration of a decoding band control unit 200153b existing in the decoding apparatus 2002.
The decoding band control unit 200153b receives the code sequence 200111 from the transmission coded sequence decomposer 200150 and outputs a decoded spectrum 200705b. Internally, the decoding band control unit 200153b includes a decoder 2004, spectrum shift means 200701, and a decoded spectrum calculation unit 2001003.
[0077]
The configuration of the decoder 2004 is shown in FIG. 18.
The decoder 2004 includes an inverse quantization unit 1101 and an inverse normalization unit 1102. The inverse quantization unit 1101 receives the residual code sequence 304 contained in the code sequence 200111, converts the residual code sequence 304 into code indices, refers to the codebook used in the encoder 2003, and reproduces the codes. The reproduced codes are sent to the inverse normalization unit 1102 and multiplied by the normalization coefficient sequence 303a reproduced from the normalized code sequence 303 in the code sequence 200111, yielding a synthesized spectrum 2001001. The synthesized spectrum 2001001 is input to the spectrum shift means 200701.
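The two steps of the decoder 2004 can be sketched as a codebook lookup followed by an element-wise multiplication. This sketch assumes that the residual code sequence has already been parsed into integer indices, that each index selects one sub-vector of the codebook, and that the normalization coefficient sequence holds one gain per spectrum bin. The names are illustrative only.

```python
def inverse_quantize(indices, codebook):
    """Reproduce normalized coefficients by concatenating the indexed code vectors."""
    coeffs = []
    for idx in indices:
        coeffs.extend(codebook[idx])
    return coeffs

def denormalize(normalized, norm_coeffs):
    """Multiply each reproduced coefficient by its normalization coefficient."""
    return [c * g for c, g in zip(normalized, norm_coeffs)]
```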
[0078]
Note that the output of the decoding band control unit 200153 in the coding band control unit 200110 is the decoded spectrum 200705, which is identical to the decoded spectrum 200705b output by the decoding band control unit 200153b in the decoding device 2002.
[0079]
The synthesized spectrum 2001001 synthesized by the decoder 2004 is shifted by the spectrum shift unit 200701 to obtain a shifted synthesized spectrum 2001002, which is input to the decoded spectrum calculation unit 2001003.
[0080]
The decoded spectrum calculation unit 2001003 holds the synthesized spectra input to it, adds the held spectrum and the latest synthesized spectrum, and outputs the sum as the decoded spectrum 200705b.
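The hold-and-add behaviour of the decoded spectrum calculation unit can be sketched as a small accumulator. The class name is hypothetical, and the sketch assumes every shifted synthesized spectrum has the same number of bins as the full spectrum.

```python
class DecodedSpectrumCalculator:
    """Holds the running sum of the shifted synthesized spectra."""

    def __init__(self, total_bins):
        self.held = [0.0] * total_bins

    def add(self, shifted_synthesized):
        # Add the latest synthesized spectrum to the held spectrum.
        self.held = [h + s for h, s in zip(self.held, shifted_synthesized)]
        return list(self.held)  # current decoded spectrum
```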
[0081]
The difference calculation means 200703 in the coding band control unit 200110 calculates the difference between the spectrum 505 of the original audio signal and the decoded spectrum 200705 and outputs a difference spectrum 200108, which is fed back to the characteristic determination unit 200107. At the same time, the difference spectrum 200108 is held by the difference spectrum holding means 200704 and is also sent to the spectrum shift means 200701 in preparation for the input of the next coding band arrangement information 200109. The characteristic determination unit 200107 refers to the coding condition and continues to output coding band arrangement information 200109 until the coding condition is satisfied; when that output ceases, the operation of the coding band control unit 200110 also stops. Note that the coding band control unit 200110 has the difference spectrum holding means 200704 in order to calculate the difference spectrum 200108. This is a storage area needed to hold the difference spectrum, for example an array that can store 2048 numbers.
[0082]
As described above, the processing by the characteristic determination unit 200107 and the subsequent coding band control unit 200110 is repeated until the encoding condition 200105 is satisfied, and coded sequences 200111 are output one after another. They are sent to the transmission coded sequence synthesizer 200112, synthesized into a transmission coded sequence together with the analysis length code sequence 510, and transmitted to the decoding device 2002.
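The repeated loop of band selection, encoding, local decoding, and difference calculation can be sketched end to end as follows. Several simplifications are assumed: the coding condition is reduced to a fixed number of encoders, band power of the difference spectrum stands in for the auditory weight, band limits are inclusive bin indices, and a uniform scalar quantizer with step `step` stands in for the normalization and vector quantization of the encoder 2003. None of these choices is specified by the text.

```python
def encode_multistage(spectrum, band_edges, num_encoders, step=0.5):
    """Return (stages, decoded, residual) for a simplified multistage encoder."""
    residual = list(spectrum)         # difference spectrum, initially the original
    decoded = [0.0] * len(spectrum)   # accumulated decoded spectrum
    stages = []                       # (band number, codes) for each stage
    for _ in range(num_encoders):     # simplified coding condition
        # Characteristic determination: band with the largest residual power.
        weights = [sum(x * x for x in residual[lo:hi + 1]) for lo, hi in band_edges]
        k = max(range(len(weights)), key=lambda b: weights[b])
        lo, hi = band_edges[k]
        # Spectrum shift plus stand-in encoder (scalar quantizer, not the VQ).
        codes = [round(x / step) for x in residual[lo:hi + 1]]
        stages.append((k, codes))
        # Local decoding, inverse shift, and accumulation.
        for offset, code in enumerate(codes):
            decoded[lo + offset] += code * step
        # Difference calculation against the original spectrum.
        residual = [s - d for s, d in zip(spectrum, decoded)]
    return stages, decoded, residual
```

Because each stage records its band number alongside its codes, a decoder that receives only the first few stages can still place them in the correct bands, which is what makes the arrangement scalable.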
[0083]
In the decoding apparatus 2002, the transmission coded sequence transmitted from the encoding apparatus 2001 is decomposed into a coded sequence 200151 and an analysis length code sequence 200152 by the transmission coded sequence decomposer 200150. The coded sequence 200151 and the analysis length code sequence 200152 are the same as the coded sequence 200111 and the analysis length code sequence 510 in the encoding device 2001.
[0084]
The decomposed coded sequence 200151 is converted into the decoded spectrum 200154b by the decoding band control unit 200153b. Using the information of the analysis length code sequence 200152, the frequency time conversion unit 5, the windowing unit 6, and the frame superimposing unit 7 convert the decoded spectrum 200154b into a time domain signal, which becomes the decoded signal 8.
[0085]
As described above, the audio signal encoding device and decoding device according to the second embodiment have, as in the first embodiment, a characteristic determination unit that determines the frequency band of the audio signal to be quantized by each of a plurality of stages of encoders, and a coding band control unit that takes as inputs the frequency band determined by the characteristic determination unit and the frequency-converted original audio signal, determines the connection order of the encoders of each stage, and converts the quantization bands and connection order of the encoders into a code string, so that scalable coding is performed adaptively. In this configuration, the coding band control unit in the encoding device contains a decoding band control unit, a decoding band control unit is provided in the decoding device, the spectrum power calculation unit in the characteristic determination unit is replaced by an auditory psychological model calculation unit, and coding band arrangement information generation means is further provided in the characteristic determination unit. Because the auditory psychological model calculation unit is used in place of the spectrum power calculation unit of the characteristic determination unit, perceptually important portions are judged accurately and the bands containing them can be selected more often. Also, in the audio signal encoding device and decoding device targeted by the present invention, if the coding condition is satisfied during the computation that determines the arrangement of the encoders, the encoding process is judged to be complete and coding band arrangement information is no longer output. In that computation, the first embodiment selects the band for each encoder under fixed conditions, whereas in the second embodiment the determination conditions of the characteristic determination unit include the sampling frequency of the input signal and the compression rate, that is, the coding bit rate.
Therefore, the degree of weighting of each band when selecting the band arrangement of each encoder can be changed according to these conditions. Further, because the compression rate is included among the determination conditions of the characteristic determination unit, when the compression rate is high, that is, when the bit rate is low, the degree of weighting of each band is varied only slightly, whereas when the compression rate is low, that is, when the bit rate is high, the weighting of each band is emphasized more strongly in order to pursue efficiency further. In this way the best balance between compression rate and quality can be obtained. As a result, an audio signal encoding and decoding device is obtained that performs adaptive scalable coding with high quality, high efficiency, and sufficient performance even when encoding a wide variety of audio signals.
[0086]
[Effect of the invention]
As described above, in the audio signal encoding method and the audio signal decoding method according to the present invention, the encoding step has a plurality of encoding sub-steps and performs multi-stage encoding of the audio signal under the control of the coding band control step. The characteristic determination step evaluates the input audio signal and outputs band weight information indicating the weight of each frequency band to be encoded. The coding band control step determines, based on the band weight information, the quantization band and connection order of each encoding sub-step constituting the multi-stage encoding, causes the encoding step to perform multi-stage encoding configured in a scalable manner based on the determined quantization band and connection order of each encoding sub-step, and outputs a band control code string indicating the determined quantization band and connection order of each encoding sub-step. This yields the advantageous effect that adaptive scalable coding of higher quality and higher efficiency can be performed on audio signals having a wide variety of properties.
[Brief description of the drawings]
FIG. 1 is a block diagram of adaptive scalable coding in an audio signal encoding apparatus according to Embodiment 1 of the present invention.
FIG. 2 is a diagram showing a time-frequency conversion unit in the coding apparatus according to the first embodiment.
FIG. 3 is a diagram showing an encoder in the encoding apparatus according to the first embodiment.
FIG. 4 is a diagram showing a normalization unit in the encoding apparatus according to the first embodiment.
FIG. 5 is a diagram showing a frequency outline normalization unit in the encoding apparatus of the first embodiment.
FIG. 6 is a diagram showing a characteristic determination unit in the encoding apparatus according to the first embodiment.
FIG. 7 is a diagram showing a coding band control unit in the coding apparatus according to the first embodiment.
FIG. 8 is a diagram illustrating a quantization unit in the coding apparatus according to the first embodiment.
FIG. 9 is a diagram showing a decoder in the coding apparatus according to the first embodiment.
FIG. 10 is a diagram showing an outline of a general TwinVQ method.
FIG. 11 shows a general TwinVQ scalable coding scheme.
FIG. 12 shows the disadvantages of general fixed scalable coding.
FIG. 13 is a diagram showing the advantages of general adaptive scalable coding.
FIG. 14 is a block diagram of adaptive scalable coding in an audio signal encoding apparatus according to Embodiment 2 of the present invention.
FIG. 15 is a diagram showing an encoder in the encoding apparatus of the second embodiment.
FIG. 16 is a diagram showing a characteristic determination unit in the coding apparatus according to the second embodiment.
FIG. 17 is a diagram showing a coding band control unit in the coding apparatus according to the second embodiment.
FIG. 18 is a diagram showing a decoder in the coding apparatus according to the second embodiment.
FIG. 19 is a diagram showing a decoding band control unit in the coding apparatus according to the second embodiment.
FIG. 20 is a diagram showing spectrum shift means in the coding apparatus according to the second embodiment.
[Explanation of symbols]
1 Encoding device
2 Decoding device
501 Original audio signal
502 Analysis length determination unit
503 Time frequency converter
504 Analysis length
505 The spectrum of the original audio signal
506 Characteristic determination unit
507 Coding band control unit
508 Band control code string
510 Analysis length code string
511 Low-pass encoder
512 Mid-range encoder
513 High band encoder
511b Second stage low band encoder
518, 519, 520, 518b Quantization error
521 Low frequency code sequence
522 Mid-range code string
523 High-frequency code string
521b Second stage low frequency code string
701 Filter
5 Frequency time conversion unit
6 Windowing unit
7 Frame superimposing unit
8 Decoded signal
9 Band combiner
1201 Decoding band control unit
1202 Low frequency decoder
1203 Mid-range decoder
1204 High frequency decoder
1202b Second stage low band decoder
201 Frame division unit
202 Windowing unit
203 MDCT unit
3 Encoder
301 Normalization unit
302 Quantization unit
303 Normalized code string
304 Code string
401 Frequency outline normalization unit
402 Band amplitude normalization unit
403 Bandwidth table
601 Linear prediction analysis unit
602 Approximate quantization unit
603 Envelope characteristic normalization unit
803 Spectral power calculator
804 Placement determination unit
517 Band control weight
516 Coding band allocation information
901 Bandwidth calculation unit
902 Quantization order determination unit
903 Encoder number determination unit
1001 MDCT of the band to be quantized by the quantizer
1002 Normalization component of the same quantization band
1003 Sound source subvector
1004 Weight subvector
1005 Vector quantizer
1006 Distance calculation means
1007 Code determining means
1008 Residual generation means
1009 Codebook
1010 Residual subvector
1011 MDCT residual of the band to be quantized by a quantizer
101 Original audio signal
102 Analysis length determination unit
103 Time frequency converter
104 Original audio signal in the frequency domain
105 Frequency outline
106 Normalization processing unit
107 Normalized code string
108 Current audio signal after normalization
109 Vector quantization section
110 Code sequence
111 Analysis length code string
1301 Original audio signal
1302 Time frequency converter
1303 Analysis length determination unit
1304 Original audio signal in frequency domain
1305 Low-pass encoder
1306 Quantization error
1307 Mid-range encoder
1308 Quantization error
1309 High band encoder
1310 Quantization error
1311 Low frequency code sequence
1312 Mid-range code string
1313 High-frequency code string
1314 Analysis length code string
2001 Encoding device
2002 Decoding device
200105 Coding conditions
200107 Characteristic determination unit
200108 Difference spectrum
200109 Coding band arrangement information
200110 Coding band control unit
200111 Coded sequence
200112 Transmission coded sequence synthesizer
200150 Transmission coding sequence decomposer
200151 Coded sequence
200152 Analysis length coded sequence
200153 Decoding Band Control Unit
200154 Decoded spectrum
2003 Encoder
200305 Coding band information
200601 Coding band calculation unit
200602 Auditory psychology model calculator
200603 Placement determination unit
200604 Coding band arrangement information generating means
200605 Auditory weight
200701 Spectral shift means
200702 Coding band information
200703 Difference calculation means
200704 Difference spectrum holding means
2004 Decoder
200901 Inverse quantization unit
200902 Inverse normalization unit
2001001 Synthesized spectrum
2001002 Shifted composite spectrum
2001003 Decoded spectrum calculation unit
2001101 Original spectrum
2001102 Shifted spectrum

Claims

An audio signal encoding method that includes a characteristic determination step, an encoding band control step, an encoding step, and converts a time-frequency converted audio signal into an encoded sequence,
The encoded sequence includes encoded information and a band control code sequence,
The encoding step has a plurality of encoding sub-steps, performs multi-stage encoding of the audio signal under the control of the encoding band control step, and outputs encoding information.
The characteristic determination step determines the input audio signal, outputs band weight information indicating the weight of each frequency band to be encoded,
The coding band control step is:
Based on the band weight information, determine the quantization band and connection order of each encoding sub-step constituting the multi-stage encoding,
Based on the determined quantization band and connection order of each encoding sub-step, cause the encoding step to perform multi-stage encoding configured in a scalable manner,
Output a band control code string indicating the quantization band and connection order of each determined encoding sub-step.
Audio signal encoding method.

The coding band control step determines the quantization band and the connection order of each coding sub-step so as to be one of the predefined multistage codings.
The audio signal encoding method according to claim 1.

The encoding step outputs a quantization error,
The coding band control step determines the quantization band and connection order of each coding sub-step based on the band weight information and the quantization error.
The audio signal encoding method according to claim 1.

An audio signal decoding method that includes a decoding band control step and a decoding step, and decodes an encoded sequence including encoded information and a band control code sequence into an audio signal,
The band control code string indicates the quantization band and connection order of each encoding when the encoding information is multistage encoded.
The decoding step has a plurality of decoding sub-steps, performs multi-stage decoding of encoded information under the control of the decoding band control step,
The decoding band control step causes the decoding step to perform multi-stage decoding configured in a scalable manner based on the band control code string.
Audio signal decoding method.