JP2004301954A

JP2004301954A - Hierarchical encoding method and hierarchical decoding method for sound signal

Info

Publication number: JP2004301954A
Application number: JP2003092581A
Authority: JP
Inventors: Masahiro Oshikiri; 正浩押切
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2003-03-28
Filing date: 2003-03-28
Publication date: 2004-10-28
Anticipated expiration: 2023-03-28
Also published as: JP4373693B2

Abstract

PROBLEM TO BE SOLVED: To perform high-quality encoding at a low bit rate.

SOLUTION: A subtracter 110 takes the difference between the output signal of a delay unit 109 and a 2nd layer decoded signal to generate a 3rd layer residual signal. A 3rd layer encoding part 111 encodes the 3rd layer residual signal so that perceptual quality is improved, and determines a 3rd code. A 3rd layer decoding part 112 decodes the 3rd code to generate a 3rd layer decoded residual signal. A prediction filter 116 filters the 3rd layer decoded signal to generate a prediction residual signal. A 1st layer encoding part 102 uses the prediction residual signal as the internal state of the adaptive codebook inside the 1st layer encoding part 102.

COPYRIGHT: (C)2005, JPO & NCIPI

Description

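[0001]
[Technical field of the invention]
The present invention relates to a hierarchical encoding method and a hierarchical decoding method for acoustic signals, and in particular to methods suited to highly efficient compression encoding of acoustic signals such as music signals or speech signals.
[0002]
[Prior art]
Acoustic coding technology that compresses music or speech signals at low bit rates is important for the effective use of transmission channel capacity, such as radio waves in mobile communications, and of recording media. Speech coding schemes standardized by the ITU (International Telecommunication Union) include G.726 and G.729. These schemes target narrowband signals (300 Hz to 3.4 kHz) and provide high-quality coding at 8 kbit/s to 32 kbit/s. Standard schemes for wideband signals (50 Hz to 7 kHz) include ITU G.722 and G.722.1 and AMR-WB of 3GPP (The 3rd Generation Partnership Project). These schemes encode wideband speech signals at high quality at bit rates of 6.6 kbit/s to 64 kbit/s.
[0003]
CELP (Code Excited Linear Prediction) is an effective method for encoding speech signals efficiently at low bit rates. CELP is based on an engineering model of human speech production: an excitation signal represented by random numbers or pulse trains is passed through a pitch filter corresponding to the strength of periodicity and a synthesis filter corresponding to the vocal tract characteristics, and the codes are determined so that the squared error between the filter output and the input signal is minimized under perceptual weighting (see, for example, Non-Patent Document 1). Many recent standard speech coding schemes are based on CELP; for example, G.729 encodes narrowband signals at 8 kbit/s, and AMR-WB encodes wideband signals at 6.6 kbit/s to 23.85 kbit/s.
[0004]
For music coding, on the other hand, transform coding is common: the music signal is transformed to the frequency domain and encoded using a psychoacoustic model, as in the Layer III and AAC schemes standardized by MPEG (Moving Picture Experts Group). These schemes are known to cause almost no audible degradation at 64 kbit/s to 96 kbit/s per channel for signals sampled at 44.1 kHz.
[0005]
However, when a signal that is mainly speech with music or environmental sound superimposed in the background is encoded with a speech coding scheme, the background music or environmental sound degrades not only the background portion but also the speech itself, lowering overall quality. This problem arises because speech coding schemes are built on CELP, a method specialized for a speech model. In addition, speech coding schemes handle signal bandwidths of at most 7 kHz and by design cannot adequately handle signals with higher-frequency content.
[0006]
Music coding, on the other hand, encodes music at high quality and therefore gives sufficient quality for speech with background music or environmental sound as described above. It also handles bandwidths up to about 22 kHz, which is CD quality. However, high-quality coding requires a high bit rate; if the bit rate is held down to about 32 kbit/s, the quality of the decoded signal drops sharply. Music coding therefore cannot be used on communication networks with low transmission rates.
[0007]
To avoid these problems, the two technologies can be combined: the input signal is first encoded by CELP in the first layer, the residual signal obtained by subtracting the decoded signal from the input signal is then computed, and this residual is transform-coded in the second and subsequent layers. Because the first layer uses CELP, speech is encoded at high quality, and the second and subsequent layers efficiently encode what the first layer cannot represent: background music and environmental sound, and frequency components above the band covered by the first layer.
[0008]
However, to secure sufficient quality when the input is music rather than speech, more bits must be allocated to the second and subsequent layers, which raises the bit rate. This problem arises because a speech-specialized scheme such as CELP is applied to the first layer. When a music signal is input, the CELP coder of the first layer has low coding efficiency for music, so the power of the error signal between the input signal and the first-layer decoded signal (that is, the input to the second layer) becomes large. As a result, many bits had to be allocated to the second and subsequent layers to raise the quality of the final decoded signal.
[0009]
[Non-Patent Document 1]
"Code-Excited Linear Prediction (CELP): high quality speech at very low bit rates", Proc. ICASSP 85, pp. 937-940, 1985.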
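[0010]
[Problem to be solved by the invention]
As described above, conventional apparatuses have difficulty performing high-quality encoding at a low bit rate.
[0011]
The present invention has been made in view of this point, and its object is to provide a hierarchical encoding method and a hierarchical decoding method for acoustic signals that perform high-quality encoding at a low bit rate.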
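[0012]
[Means for solving the problem]
The hierarchical encoding method of the present invention encodes an input acoustic signal, decodes the signal encoded in the preceding stage, and encodes the difference between this decoded signal and the input signal. The method comprises: a first encoding step of encoding the input acoustic signal in frames of a predetermined length; a second encoding step of encoding, in one or more stages, the difference between the input acoustic signal and the signal obtained by decoding the encoding result of the preceding stage; a prediction filter step of generating a prediction residual signal from the signal obtained by decoding the encoding result of the second encoding step; and an update step of updating the codebook used for encoding on the basis of the prediction of the prediction filter step.
[0013]
In the hierarchical encoding method of the present invention, the first encoding step CELP-encodes the input acoustic signal, the prediction filter step forms a prediction filter from the quantized LPC coefficients, and the update step updates the codebook using the result of passing the signal obtained by decoding the encoding result of the second encoding step through the prediction filter.
[0014]
According to these methods, in hierarchical coding in which a lower layer encodes the portion that the upper layer could not fully encode, the prediction residual signal that arises in the upper-layer coding is generated from the signal obtained by decoding the coded signals of the second and subsequent layers and from the LPC coefficients obtained in the upper-layer coding, and the adaptive codebook of the upper layer is updated with this prediction residual signal. Encoding is thus performed with an adaptive codebook whose excitation is closer to that of the acoustic signal, so high-quality encoding can be performed at a low bit rate.
[0015]
The hierarchical encoding method of the present invention comprises a down-sampling step of down-sampling the input acoustic signal and an up-sampling step of up-sampling the signal obtained by decoding the encoding result of the preceding stage, and the second encoding step encodes, in one or more stages, the difference between the input acoustic signal and the up-sampled decoded signal of the preceding stage.
[0016]
According to this method, making the sampling frequency of the signal encoded in the lower layer higher than that of the signal encoded in the upper layer allows the input signal to be encoded in a way that supports various sampling frequencies.
[0017]
The hierarchical encoding method of the present invention comprises a periodicity calculation step of measuring the periodicity of the input acoustic signal. The update step updates the codebook using the prediction residual signal obtained by the prediction filter step when the periodicity is equal to or greater than a predetermined threshold, and using the generated excitation signal when the periodicity is below the threshold.
[0018]
According to this method, when the periodicity of the input acoustic signal is strong, updating the internal state of the adaptive codebook with the prediction residual signal obtained from the decoded signal of a higher-order layer increases the prediction accuracy of the adaptive codebook and improves performance. When the periodicity of the input acoustic signal is not strong, updating the internal state of the adaptive codebook with the excitation signal improves the effect for non-periodic signals.
[0019]
The hierarchical encoding method of the present invention comprises a determination step of determining which is smaller: the distortion obtained by updating the internal state of the adaptive codebook with the prediction residual signal and actually encoding the input acoustic signal, or the distortion obtained by updating the internal state of the adaptive codebook with the excitation signal and actually encoding the input acoustic signal. The update step updates the codebook using the signal that gives the smaller distortion.
[0020]
According to this method, when deciding whether to update the internal state of the adaptive codebook with the prediction residual signal or with the excitation signal, both distortions are computed by actually encoding the input acoustic signal after each kind of update and are compared. Because the adaptive codebook is always updated with the signal that gives the smaller distortion, quality is improved.
[0021]
The hierarchical decoding method of the present invention decodes signals produced on the encoding side by encoding an input acoustic signal, decoding the signal encoded in the preceding stage, and encoding the difference between this decoded signal and the input signal. The method comprises: a first decoding step of decoding the signal obtained by encoding the input acoustic signal in frames of a predetermined length; a second decoding step of decoding and adding the signals obtained by encoding, in one or more stages, the difference between the input acoustic signal and the decoded encoding result of the preceding stage; a prediction filter step of generating a prediction residual signal from the decoding results of the first and second decoding steps; and an update step of updating the codebook used for decoding on the basis of the prediction of the prediction filter step.
[0022]
In the hierarchical decoding method of the present invention, the first decoding step decodes a signal encoded by the CELP scheme, the prediction filter step forms a prediction filter from the LPC coefficients obtained by decoding the LPC coefficients encoded on the encoding side, and the update step updates the codebook using the result of passing the decoding results of the first and second decoding steps through the prediction filter.
[0023]
According to these methods, when decoding a hierarchical code in which a lower layer encodes the portion that the upper layer could not fully encode, the residual signal that arises in the upper-layer coding is predicted from the signal obtained by decoding the coded signals of the second and subsequent layers and from the LPC coefficients obtained in the upper-layer coding, and the adaptive codebook of the upper layer is updated with this predicted residual signal. Decoding is thus performed with an adaptive codebook whose excitation is closer to that of the acoustic signal, so a high-quality signal can be decoded at a low bit rate.
[0024]
The hierarchical decoding method of the present invention comprises an up-sampling step of up-sampling the decoding result of the preceding stage, an addition step of adding the up-sampled decoding result and the decoding result of the following stage, and a down-sampling step of down-sampling the addition result, and the prediction filter step generates the prediction residual signal from the down-sampled decoding result.
[0025]
According to this method, making the sampling frequency of the signal decoded in the lower layer higher than that of the signal decoded in the upper layer allows decoding of signals that were encoded to support various sampling frequencies.
[0026]
In the hierarchical decoding method of the present invention, the update step updates the codebook according to the result of the determination, made on the encoding side, of whether to update the adaptive codebook with the prediction residual signal obtained by the prediction filter step or with the generated excitation signal.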
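[0027]
The hierarchical encoding apparatus of the present invention encodes an input acoustic signal, decodes the signal encoded in the preceding stage, and encodes the difference between this decoded signal and the input signal. The apparatus comprises: first encoding means for encoding the input acoustic signal in frames of a predetermined length; second encoding means for encoding, in one or more stages, the difference between the input acoustic signal and the signal obtained by decoding the encoding result of the preceding stage; and prediction filter means for generating a prediction residual signal from the signal obtained by decoding the encoding result of the second encoding means. The first encoding means updates the codebook used for encoding on the basis of the prediction of the prediction filter means.
[0028]
According to this configuration, based on the result of the determination, made on the encoding side from the strength of the periodicity of the input acoustic signal or the like, of whether to update the internal state of the adaptive codebook with the prediction residual signal or with the excitation signal, the internal state of the adaptive codebook is updated with the prediction residual signal obtained from the decoded signal of a higher-order layer when the periodicity of the encoded acoustic signal is strong. The codes of the hierarchical coding method can thereby be decoded, and as a result a high-quality acoustic signal can be decoded.
[0029]
In the hierarchical encoding apparatus of the present invention, the first encoding means CELP-encodes the input acoustic signal and comprises a codebook that holds previously generated excitation signals, LPC analysis means for obtaining LPC coefficients from the input acoustic signal, and search means for searching for the excitation signal with the smallest difference from the input acoustic signal. The prediction filter means forms a prediction filter from the quantized LPC coefficients, and the first encoding means updates the codebook using the result of passing the signal obtained by decoding the encoding result of the second encoding means through the prediction filter.
[0030]
According to this configuration, in hierarchical coding in which a lower layer encodes the portion that the upper layer could not fully encode, a prediction residual signal is generated from the signal obtained by decoding the coded signals of the second and subsequent layers and from the LPC coefficients obtained in the upper-layer coding, and the adaptive codebook of the upper layer is updated with this prediction residual signal. Encoding is thus performed with an adaptive codebook whose excitation is closer to that of the acoustic signal, so high-quality encoding can be performed at a low bit rate.
[0031]
The hierarchical encoding apparatus of the present invention comprises down-sampling means for down-sampling the input acoustic signal and outputting it to the first encoding means or the second encoding means, and up-sampling means for up-sampling the signal obtained by decoding the encoding result of the preceding stage. The second encoding means encodes, in one or more stages, the difference between the input acoustic signal and the up-sampled decoded signal of the preceding stage.
[0032]
According to this configuration, making the sampling frequency of the signal encoded in the lower layer higher than that of the signal encoded in the upper layer allows the input signal to be encoded in a way that supports various sampling frequencies.
[0033]
In the hierarchical encoding apparatus of the present invention, the first encoding means comprises determination means for determining whether to update the adaptive codebook with the prediction residual signal obtained by the prediction filter means or with the generated excitation signal.
[0034]
In the hierarchical encoding apparatus of the present invention, the first encoding means comprises periodicity calculation means for measuring the periodicity of the input acoustic signal. The determination means decides to update the codebook with the prediction residual signal obtained by the prediction filter means when the periodicity is equal to or greater than a predetermined threshold, and with the generated excitation signal when the periodicity is below the threshold.
[0035]
According to these configurations, when the periodicity of the input acoustic signal is strong, updating the internal state of the adaptive codebook with the prediction residual signal obtained from the decoded signal of a higher-order layer increases the prediction accuracy of the adaptive codebook and improves performance. When the periodicity of the input acoustic signal is not strong, updating the internal state of the adaptive codebook with the excitation signal improves the effect for non-periodic signals.
[0036]
In the hierarchical encoding apparatus of the present invention, the determination means determines which is smaller: the distortion obtained by updating the internal state of the adaptive codebook with the prediction residual signal and actually encoding the input acoustic signal, or the distortion obtained by updating the internal state of the adaptive codebook with the excitation signal and actually encoding the input acoustic signal. The first encoding means updates the codebook using the signal that gives the smaller distortion.
[0037]
According to this configuration, when deciding whether to update the internal state of the adaptive codebook with the prediction residual signal or with the excitation signal, both distortions are computed by actually encoding the input acoustic signal after each kind of update and are compared. Because the adaptive codebook is always updated with the signal that gives the smaller distortion, quality is improved.
[0038]
The hierarchical decoding apparatus of the present invention decodes signals produced on the encoding side by encoding an input acoustic signal, decoding the signal encoded in the preceding stage, and encoding the difference between this decoded signal and the input signal. The apparatus comprises: first decoding means for decoding the signal obtained by encoding the input acoustic signal in frames of a predetermined length; second decoding means for decoding and adding the signals obtained by encoding, in one or more stages, the difference between the input acoustic signal and the decoded encoding result of the preceding stage; and prediction filter means for generating a prediction residual signal from the decoding results of the first and second decoding means. The first decoding means updates the codebook used for decoding on the basis of the prediction of the prediction filter means.
[0039]
In the hierarchical decoding apparatus of the present invention, the first decoding means decodes a signal encoded by the CELP scheme, the prediction filter means forms a prediction filter from the LPC coefficients obtained by decoding the LPC coefficients encoded on the encoding side, and the first decoding means updates the codebook using the result of passing the decoding results of the first and second decoding means through the prediction filter.
[0040]
According to these configurations, when decoding a hierarchical code in which a lower layer encodes the portion that the upper layer could not fully encode, a prediction residual signal is generated from the signal obtained by decoding the coded signals of the second and subsequent layers and from the LPC coefficients obtained in the upper-layer coding, and the adaptive codebook of the upper layer is updated with this prediction residual signal. Decoding is thus performed with an adaptive codebook whose excitation is closer to that of the acoustic signal, so a high-quality signal can be decoded at a low bit rate.
[0041]
The hierarchical decoding apparatus of the present invention comprises up-sampling means for up-sampling the decoding result of the preceding stage, addition means for adding the up-sampled decoding result and the decoding result of the following stage, and down-sampling means for down-sampling the addition result of the addition means. The prediction filter means generates the prediction residual signal from the down-sampled decoding result.
[0042]
According to this configuration, making the sampling frequency of the signal decoded in the lower layer higher than that of the signal decoded in the upper layer allows decoding of signals that were encoded to support various sampling frequencies.
[0043]
In the hierarchical decoding apparatus of the present invention, the first decoding means updates the codebook according to the result of the determination, made on the encoding side, of whether to update the adaptive codebook with the prediction residual signal obtained by the prediction filter means or with the generated excitation signal.
[0044]
According to this configuration, based on the result of the determination, made on the encoding side from the strength of the periodicity of the input acoustic signal or the like, of whether to update the internal state of the adaptive codebook with the prediction residual signal or with the excitation signal, the internal state of the adaptive codebook is updated with the prediction residual signal obtained from the decoded signal of a higher-order layer when the periodicity of the encoded acoustic signal is strong. The codes of the hierarchical coding method can thereby be decoded, and as a result a high-quality acoustic signal can be decoded.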
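[0045]
The acoustic signal transmitting apparatus of the present invention comprises: acoustic input means for converting an acoustic signal into an electrical signal; A/D conversion means for converting the signal output from the acoustic input means into a digital signal; the above hierarchical encoding apparatus, which encodes the digital signal output from the A/D conversion means; RF modulation means for modulating the codes output from the encoding apparatus into a radio-frequency signal; and a transmitting antenna that converts the signal output from the RF modulation means into radio waves and transmits it.
[0046]
According to this configuration, in hierarchical coding in which a lower layer encodes the portion that the upper layer could not fully encode, a prediction residual signal is generated from the signal obtained by decoding the coded signals of the second and subsequent layers and from the LPC coefficients obtained in the upper-layer coding, and the adaptive codebook of the upper layer is updated with this prediction residual signal. Encoding is thus performed with an adaptive codebook whose excitation is closer to that of the acoustic signal, so high-quality encoding can be performed at a low bit rate.
[0047]
The acoustic signal receiving apparatus of the present invention comprises: a receiving antenna that receives radio waves; RF demodulation means for demodulating the signal received by the receiving antenna; the above hierarchical decoding apparatus, which decodes the information obtained by the RF demodulation means; D/A conversion means for converting the signal output from the decoding apparatus into an analog signal; and acoustic output means for converting the electrical signal output from the D/A conversion means into an acoustic signal.
[0048]
According to this configuration, when decoding a hierarchical code in which a lower layer encodes the portion that the upper layer could not fully encode, a prediction residual signal is generated from the signal obtained by decoding the coded signals of the second and subsequent layers and from the LPC coefficients obtained in the upper-layer coding, and the adaptive codebook of the upper layer is updated with this prediction residual signal. Decoding is thus performed with an adaptive codebook whose excitation is closer to that of the acoustic signal, so a high-quality signal can be decoded at a low bit rate.
[0049]
The communication terminal apparatus of the present invention comprises at least one of the above acoustic signal transmitting apparatus and the above acoustic signal receiving apparatus. The base station apparatus of the present invention comprises at least one of the above acoustic signal transmitting apparatus and the above acoustic signal receiving apparatus.
[0050]
According to these configurations, in hierarchical coding in which a lower layer encodes the portion that the upper layer could not fully encode, a prediction residual signal is generated from the signal obtained by decoding the coded signals of the second and subsequent layers and from the LPC coefficients obtained in the upper-layer coding, and the adaptive codebook of the upper layer is updated with this prediction residual signal. Encoding is thus performed with an adaptive codebook whose excitation is closer to that of the acoustic signal, so high-quality encoding can be performed at a low bit rate.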
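[0051]
[Embodiments of the invention]
The gist of the present invention is as follows. In hierarchical coding in which a lower layer encodes the portion that the upper layer could not fully encode, a prediction residual signal is generated from the signal obtained by decoding the coded signals of the second and subsequent layers and from the LPC coefficients obtained in the upper-layer coding, and the adaptive codebook of the upper layer is updated with this prediction residual signal. Encoding is thus performed with an adaptive codebook whose excitation is closer to that of the acoustic signal, giving high-quality encoding at a low bit rate.
[0052]
Embodiments of the present invention are described in detail below with reference to the drawings. The description assumes that the number of layers N is 3, but the present invention is not limited to this value and applies to any configuration with N ≥ 2.
[0053]
(Embodiment 1)
FIG. 1 is a block diagram showing the configuration of a hierarchical encoding apparatus according to Embodiment 1 of the present invention. The hierarchical encoding apparatus 100 of FIG. 1 mainly comprises an input terminal 101, a first-layer encoding unit 102, a first-layer decoding unit 103, a delay unit 104, a subtractor 105, a second-layer encoding unit 106, a second-layer decoding unit 107, an adder 108, a delay unit 109, a subtractor 110, a third-layer encoding unit 111, a third-layer decoding unit 112, an adder 113, a multiplexing unit 114, an output terminal 115, and a prediction filter 116.
[0054]
In this embodiment, the signals input to all layers have the same sampling frequency, denoted Fs. An acoustic signal with sampling frequency Fs is input from the input terminal 101 and supplied to the first-layer encoding unit 102.
[0055]
The first-layer encoding unit 102 has an adaptive codebook that holds previously generated excitation signals as its internal state, and by using the adaptive codebook it encodes strongly periodic signals efficiently. The first-layer encoding unit 102 determines the first code so that the perceptual distortion between the input acoustic signal and the decoded signal generated after encoding is minimized. A representative method applicable to the first-layer encoding unit 102 is code-excited linear prediction (CELP), which is described in detail later.
[0056]
The first-layer encoding unit 102 outputs the obtained first code to the first-layer decoding unit 103 and the multiplexing unit 114. The first-layer decoding unit 103 generates a first-layer decoded signal from the first code and outputs it to the subtractor 105 and the adder 108.
[0057]
The delay unit 104 delays the acoustic signal input from the input terminal 101 by a predetermined time and outputs it to the subtractor 105. That is, the delay unit 104 compensates for the delay introduced by the first-layer encoding unit 102 and the first-layer decoding unit 103.
[0058]
The subtractor 105 takes the difference between the output signal of the delay unit 104 and the first-layer decoded signal to generate a second-layer residual signal, and outputs it to the second-layer encoding unit 106.
[0059]
The second-layer encoding unit 106 encodes the second-layer residual signal so that perceptual quality is improved, and determines the second code. The second-layer encoding unit 106 outputs the second code to the second-layer decoding unit 107 and the multiplexing unit 114.
[0060]
The second-layer decoding unit 107 decodes the second code it receives to generate a second-layer decoded residual signal, and outputs it to the adder 108.
[0061]
The adder 108 sums the first-layer decoded signal and the second-layer decoded residual signal to generate a second-layer decoded signal, and outputs it to the subtractor 110 and the adder 113.
[0062]
Next, the delay unit 109 delays the acoustic signal input from the input terminal 101 by a predetermined time and outputs it to the subtractor 110. That is, the delay unit 109 compensates for the delay introduced by the encoding and decoding units of the preceding stages, specifically the first-layer encoding unit 102, the first-layer decoding unit 103, the second-layer encoding unit 106, and the second-layer decoding unit 107.
[0063]
The subtractor 110 takes the difference between the output signal of the delay unit 109 and the second-layer decoded signal to generate a third-layer residual signal, and outputs it to the third-layer encoding unit 111.
[0064]
The third-layer encoding unit 111 encodes the third-layer residual signal so that perceptual quality is improved, determines the third code, and outputs it to the third-layer decoding unit 112 and the multiplexing unit 114.
[0065]
The third-layer decoding unit 112 decodes the third code to generate a third-layer decoded residual signal, and outputs it to the adder 113.
[0066]
The adder 113 sums the second-layer decoded signal and the third-layer decoded residual signal to generate a third-layer decoded signal, and outputs it to the prediction filter 116.
[0067]
The multiplexing unit 114 multiplexes the first code, the second code, and the third code by a predetermined method to generate a coded bit stream, and outputs it from the output terminal 115.
[0068]
The third-layer decoded signal generated by the adder 113 is supplied to the prediction filter 116.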
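The residual cascade described above (encode, decode, subtract, pass the residual to the next layer) can be illustrated with a minimal sketch. The text does not specify the second- and third-layer codecs, so the uniform scalar quantizers below are purely hypothetical stand-ins; the function names and step sizes are assumptions, and the delay compensation of delay units 104 and 109 is omitted because the toy layers introduce no delay.

```python
def quantize(signal, step):
    """Toy layer encoder: uniform scalar quantizer (stand-in for a real codec)."""
    return [round(x / step) for x in signal]


def dequantize(codes, step):
    """Toy layer decoder matching quantize()."""
    return [c * step for c in codes]


def encode_layers(x, steps=(1.0, 0.25, 0.0625)):
    """Encode x layer by layer; each layer codes what the previous ones missed."""
    decoded = [0.0] * len(x)          # running decoded signal (nothing decoded yet)
    all_codes = []
    for step in steps:
        # Residual between the input and the decoded signal so far (subtractor).
        residual = [a - b for a, b in zip(x, decoded)]
        codes = quantize(residual, step)
        all_codes.append(codes)
        # Add the decoded residual to obtain this layer's decoded signal (adder).
        decoded = [d + r for d, r in zip(decoded, dequantize(codes, step))]
    return all_codes, decoded


def decode_layers(all_codes, steps=(1.0, 0.25, 0.0625)):
    """Sum the decoded outputs of however many layers were received."""
    decoded = [0.0] * len(all_codes[0])
    for codes, step in zip(all_codes, steps):
        decoded = [d + r for d, r in zip(decoded, dequantize(codes, step))]
    return decoded
```

Decoding only the first two code lists with `decode_layers(codes[:2], (1.0, 0.25))` gives a coarser signal, which is the scalability property that a layered structure of this kind provides.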
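[0069]
The prediction filter 116 applies a prediction filter to the third-layer decoded signal to generate a prediction residual signal, and outputs it to the first-layer encoding unit 102. The prediction filter is formed from the quantized LPC coefficients computed in the first-layer encoding unit 102. With the third-layer decoded signal denoted syn3(k), the prediction residual signal e(k), and the quantized LPC coefficients αq(i), the prediction residual signal e(k) is given by equation (1).
[0070]
[Equation 1]

Here, NP denotes the order of the LPC coefficients.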
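Since the image of equation (1) is not reproduced in this text, the following is only a sketch of a standard LPC prediction-error (inverse) filter consistent with the symbols defined above. The sign convention is an assumption: the code takes A(z) = 1 + Σ αq(i) z^-i, so that e(k) = syn3(k) + Σ_{i=1..NP} αq(i)·syn3(k-i); with the opposite convention the plus sign becomes a minus. The function name and the `history` argument are hypothetical.

```python
def prediction_residual(syn, lpc_q, history=None):
    """Inverse-filter a decoded signal with quantized LPC coefficients.

    Assumed convention (not fixed by the text):
        e(k) = syn(k) + sum_{i=1..NP} lpc_q[i-1] * syn(k - i)
    `history` optionally supplies samples preceding syn(0); missing
    past samples are treated as zero.
    """
    order = len(lpc_q)                       # NP, the LPC order
    past = [0.0] * order + list(history or [])
    past = past[len(past) - order:]          # keep exactly NP past samples
    buf = past + list(syn)                   # buf[order + k] == syn(k)
    residual = []
    for k in range(len(syn)):
        acc = buf[order + k]
        for i in range(1, order + 1):
            acc += lpc_q[i - 1] * buf[order + k - i]
        residual.append(acc)
    return residual
```

For example, with a single assumed coefficient of -0.5, the decaying sequence 1.0, 0.5, 0.25 is reduced to a single impulse, which is the kind of excitation-like signal an adaptive codebook stores.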
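[0071]
The first-layer encoding unit 102 uses the prediction residual signal obtained as described above as the internal state of the adaptive codebook inside the first-layer encoding unit 102.
[0072]
The first-layer encoding unit 102 is described in detail below. The description uses CELP as an example, but the present invention only requires an encoding method in which the first-layer encoding unit has an adaptive codebook that holds past excitation signals as its internal state; it is not limited to CELP.
[0073]
FIG. 2 is a block diagram showing the internal configuration of the first-layer encoding unit of the hierarchical encoding apparatus of this embodiment, based on a representative CELP configuration. The portion enclosed by the broken-line frame in FIG. 2 corresponds to the first-layer encoding unit 102 of FIG. 1. In FIG. 2, the first-layer encoding unit 102 mainly comprises an input terminal 201, an LPC analyzer 202, an LPC quantizer 203, an LPC decoder 204, a perceptual weighting filter 205, a perceptually weighted synthesis filter 206, an adaptive codebook 207, a noise codebook 208, a multiplier 209, a multiplier 210, a gain codebook 211, an adder 212, a subtractor 213, a searcher 214, a multiplexing unit 215, an output terminal 216, an output terminal 217, and an input terminal 218.
[0074]
The acoustic signal from the input terminal 101 of FIG. 1 is supplied to the input terminal 201. The LPC analyzer 202 obtains LPC coefficients from the acoustic signal of sampling rate Fs input from the input terminal 201. These LPC coefficients are used to improve perceptual quality. The LPC analyzer 202 outputs the LPC coefficients to the LPC quantizer 203, the perceptual weighting filter 205, and the perceptually weighted synthesis filter 206.
[0075]
The LPC quantizer 203 converts the LPC coefficients into parameters suited to quantization, such as LSP coefficients, and quantizes them. It outputs the code obtained by this quantization to the LPC decoder 204 and the multiplexing unit 215.
[0076]
The LPC decoder 204 computes the quantized LSP coefficients from the code and converts them to LPC coefficients to obtain the quantized LPC coefficients. It outputs the quantized LPC coefficients to the perceptually weighted synthesis filter 206 and the output terminal 217. The quantized LPC coefficients are used in coding the adaptive codebook, the adaptive gain, the noise codebook, and the noise gain. The quantized LPC coefficients output from the output terminal 217 are also supplied to the prediction filter 116 of FIG. 1, as described above, and used to obtain the prediction residual signal e(k).
[0077]
The perceptual weighting filter 205 weights the input signal on the basis of the LPC coefficients obtained by the LPC analyzer 202. The purpose is to shape the spectrum of the quantization distortion so that it is masked by the spectral envelope of the input signal. The perceptual weighting filter 205 outputs the weighted input signal to the subtractor 213.
[0078]
Next, the parts that search for the adaptive vector, the adaptive vector gain, the noise vector, and the noise vector gain are described.
[0079]
The adaptive codebook 207 holds previously generated excitation signals as its internal state and generates an adaptive vector by repeating this internal state at the desired pitch period. Considering the pitch of actual speech, a suitable range for the pitch corresponds to fundamental frequencies between 60 Hz and 400 Hz. The adaptive codebook 207 outputs the excitation signal it holds, in order, to the multiplier 209 as the adaptive vector.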
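The way an adaptive codebook turns its stored excitation into an adaptive vector can be sketched as follows. This is a simplified illustration with integer pitch lags only; the function names are hypothetical, and converting the 60 Hz to 400 Hz range into a lag range in samples is an assumption about how that range would be applied at a given sampling frequency.

```python
import math


def lag_range(fs_hz, f_min=60.0, f_max=400.0):
    """Integer pitch lags (in samples) whose pitch lies within [f_min, f_max]."""
    return math.ceil(fs_hz / f_max), math.floor(fs_hz / f_min)


def adaptive_vector(state, lag, length):
    """Repeat the most recent `lag` samples of the stored excitation.

    `state` is the adaptive codebook's internal state (past excitation,
    newest sample last); the result has `length` samples.
    """
    if not 1 <= lag <= len(state):
        raise ValueError("lag must be between 1 and len(state)")
    segment = state[-lag:]                    # one pitch period of past excitation
    return [segment[k % lag] for k in range(length)]
```

At an assumed sampling frequency of 8000 Hz, `lag_range(8000)` gives lags from 20 to 133 samples, so the internal state must hold at least 133 past excitation samples.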
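[0080]
The multiplier 209 multiplies the adaptive vector by the adaptive vector gain output from the gain codebook 211 and outputs the result to the adder 212.
[0081]
The noise codebook 208 outputs, as the noise vector, either a noise vector stored in advance in a storage area or a vector generated according to rules without a storage area, as in an algebraic structure.
[0082]
The multiplier 210 multiplies the noise vector by the noise vector gain output from the gain codebook 211 and outputs the result to the adder 212.
[0083]
The adder 212 adds the gain-scaled adaptive vector and the gain-scaled noise vector to generate the excitation signal, and outputs it to the perceptually weighted synthesis filter 206.
[0084]
The perceptually weighted synthesis filter 206 filters the excitation signal to generate a perceptually weighted synthesized signal, and outputs it to the subtractor 213.
[0085]
The subtractor 213 subtracts the perceptually weighted synthesized signal from the perceptually weighted input signal and outputs the difference to the searcher 214.
[0086]
The searcher 214 efficiently searches for the combination of adaptive vector, adaptive vector gain, noise vector, and noise vector gain that minimizes the distortion defined from the difference signal, and outputs the resulting codes to the multiplexing unit 215.
[0087]
The searcher 214 determines the codes i, j, m or the codes i, j, m, n that minimize the distortion defined by equation (2) or equation (3) below, and sends them to the multiplexing unit 215.
[0088]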
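[Equation 2]
\[ E = \sum_{k} \bigl( t(k) - \beta_m \, p_i(k) - \gamma_m \, e_j(k) \bigr)^2 \tag{2} \]

[Equation 3]
\[ E = \sum_{k} \bigl( t(k) - \beta_m \, p_i(k) - \gamma_n \, e_j(k) \bigr)^2 \tag{3} \]

Here, t(k) is the perceptually weighted input signal, p_i(k) is the signal obtained by passing the i-th adaptive vector through the perceptually weighted synthesis filter, e_j(k) is the signal obtained by passing the j-th noise vector through the perceptually weighted synthesis filter, and β and γ are the adaptive vector gain and the noise vector gain, respectively. Equations (2) and (3) differ in the structure of the gain codebook. In equation (2), the gain codebook is a set of vectors whose elements are the adaptive vector gain β_m and the noise vector gain γ_m, and a single code m identifying the vector is determined. In equation (3), the gain codebook holds the adaptive vector gain β_m and the noise vector gain γ_n independently, and the codes m and n are determined independently.
[0089]
Optimizing the adaptive vector, the adaptive vector gain, the noise vector, and the noise vector gain simultaneously requires an enormous amount of computation, so a countermeasure is needed. In general, the optimum vector or value is determined sequentially, in the order adaptive vector, adaptive vector gain, noise vector, noise vector gain.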
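The joint minimization described for equation (2) can be sketched as an exhaustive search. This is only an illustration of the criterion, assuming a squared-error distortion summed over one frame and a gain codebook of (β, γ) pairs; as the text notes, practical coders search sequentially instead of trying every combination. All names are hypothetical.

```python
def distortion(target, p, e, beta, gamma):
    """Squared error between the weighted target and the scaled candidates."""
    return sum((t - beta * pk - gamma * ek) ** 2
               for t, pk, ek in zip(target, p, e))


def search_joint(target, adaptive_syn, noise_syn, gain_pairs):
    """Return (distortion, i, j, m) minimizing the equation (2)-style error.

    adaptive_syn[i] and noise_syn[j] are the candidate vectors already
    passed through the perceptually weighted synthesis filter;
    gain_pairs[m] is a (beta, gamma) entry of the gain codebook.
    """
    best = None
    for i, p in enumerate(adaptive_syn):
        for j, e in enumerate(noise_syn):
            for m, (beta, gamma) in enumerate(gain_pairs):
                d = distortion(target, p, e, beta, gamma)
                if best is None or d < best[0]:
                    best = (d, i, j, m)
    return best
```

The cost of this loop is the product of the three codebook sizes, which is why the sequential order (adaptive vector, adaptive gain, noise vector, noise gain) is the usual compromise.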
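[0090]
After the searcher 214 has determined the codes, the multiplexing unit 215 combines them into one and outputs the result from the output terminal 216.
[0091]
After the above encoding process ends, the internal state of the adaptive codebook is updated in preparation for encoding the next frame (or subframe).
[0092]
The prediction filter 116 outputs a prediction residual signal r(n) using the third-layer decoded signal syn(n) obtained from the adder 113 and the quantized LPC coefficients αq(i) obtained from the first-layer encoding unit 102. The internal state of the adaptive codebook is updated with this prediction residual signal r(n). The prediction filter 116 forms a prediction filter from the quantized LPC coefficients αq(i) and computes the prediction residual signal r(n) by feeding the third-layer decoded signal syn(n) into this filter. The prediction residual signal r(n) is computed according to equation (4).
[0093]
[Equation 4]

Here, NP denotes the order of the LPC coefficients.
[0094]
The feature of the present invention lies in the part just described. In the conventional method, the internal state of the adaptive codebook 207 was updated with the excitation signal obtained by the adder 212; in the present invention, it is updated with the output signal of the prediction filter 116 input from the input terminal 218. The effect of the present invention is explained with reference to FIG. 3.
[0095]
FIG. 3 shows the relationship between the input acoustic signal and the corresponding first-layer, second-layer, and third-layer decoded signals. In the conventional method, the adaptive codebook is updated with the excitation signal corresponding to the first-layer decoded signal.
[0096]
Comparing the decoded signals of the layers, the signal closest to the input acoustic signal is the third-layer decoded signal, followed by the second-layer decoded signal and then the first-layer decoded signal. This is because, in this embodiment, each additional layer is encoded so as to reduce the error between the input acoustic signal and the decoded signal. Meanwhile, the closer the internal state of the adaptive codebook is to the input acoustic signal, the higher the performance of the adaptive codebook. Updating the internal state of the adaptive codebook from the third-layer decoded signal therefore gives more efficient encoding. Because the internal state of the adaptive codebook must be an excitation signal, in practice a prediction residual signal is computed from the third-layer decoded signal using the LPC coefficients, and the internal state of the adaptive codebook is updated with this prediction residual signal.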
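The difference between the conventional update and the update described here comes down to which signal is shifted into the adaptive codebook's buffer after each frame. The sketch below assumes a fixed-length buffer with the newest sample last; the function names and the `use_residual` switch are hypothetical.

```python
def update_state(state, new_samples):
    """Shift new samples into a fixed-length buffer (newest sample last)."""
    size = len(state)
    merged = list(state) + list(new_samples)
    return merged[len(merged) - size:]


def next_state(state, excitation, pred_residual, use_residual=True):
    """Choose the update source for the adaptive codebook.

    use_residual=False: conventional update with the first-layer excitation
                        (output of adder 212).
    use_residual=True:  update with the prediction residual derived from a
                        higher-layer decoded signal (output of prediction
                        filter 116).
    """
    source = pred_residual if use_residual else excitation
    return update_state(state, source)
```

Both the encoder and the decoder must apply the same choice, since the adaptive codebook states on the two sides have to stay identical.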
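[0097]
Thus, according to the hierarchical encoding apparatus of this embodiment, in hierarchical coding in which a lower layer encodes the portion that the upper layer could not fully encode, the residual signal that arises in the upper-layer coding is predicted from the signal obtained by decoding the coded signals of the second and subsequent layers and from the LPC coefficients obtained in the upper-layer coding, and the adaptive codebook of the upper layer is updated with this predicted residual signal. Encoding is thus performed with an adaptive codebook whose excitation is closer to that of the acoustic signal, so high-quality encoding can be performed at a low bit rate.
[0098]
In the above description, the prediction filter 116 creates the prediction residual signal from the third-layer decoded signal and the quantized LPC coefficients obtained from the first-layer encoding unit 102, and the first-layer encoding unit 102 updates the internal state of the adaptive codebook with this prediction residual signal. However, the prediction filter 116 may instead create the prediction residual signal from the second-layer decoded signal. That is, the decoded signal used to create the prediction residual signal may come from any layer that encodes the residual left by first-layer encoding.
[0099]
FIG. 4 is a block diagram showing the configuration of a hierarchical encoding apparatus according to Embodiment 1 of the present invention. Components in FIG. 4 with the same numbers as in FIG. 1 have the same functions, and their description is omitted here. The feature of this configuration is that the decoded signal of an intermediate layer (the second-layer decoded signal in FIG. 4) is supplied to the prediction filter 116 and its output signal is used to update the internal state of the adaptive codebook 207. This configuration preserves scalability up to the intermediate layer.
[0100]
The adder 108 sums the first-layer decoded signal and the second-layer decoded residual signal to generate the second-layer decoded signal, and outputs it to the subtractor 110 and the prediction filter 116.
[0101]
The prediction filter 116 applies the prediction filter to the second-layer decoded signal to generate a prediction residual signal, and outputs it to the first-layer encoding unit 102.
[0102]
The first-layer encoding unit 102 uses the prediction residual signal obtained by the prediction filter 116 as the internal state of its adaptive codebook. The first-layer encoding unit 102 determines the first code so that the perceptual distortion between the input acoustic signal and the decoded signal generated after encoding is minimized, and outputs the first code to the first-layer decoding unit 103 and the multiplexing unit 114.
[0103]
Thus, according to the hierarchical encoding apparatus of this embodiment, supplying the decoded signal of an intermediate layer to the prediction filter and using its output signal to update the internal state of the adaptive codebook of first-layer encoding preserves scalability up to the intermediate layer.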
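[0104]
(Embodiment 2)
This embodiment describes an example of decoding a signal encoded by the hierarchical encoding apparatus of Embodiment 1. Its feature is that the codes of the hierarchical coding method described in Embodiment 1 can be decoded, so that a high-quality acoustic signal can be decoded.
[0105]
FIG. 5 is a block diagram showing the configuration of a hierarchical decoding apparatus according to Embodiment 2 of the present invention. The hierarchical decoding apparatus 300 of FIG. 5 mainly comprises an input terminal 301, a demultiplexing unit 302, a first-layer decoding unit 303, a second-layer decoding unit 304, a third-layer decoding unit 305, an adder 306, an adder 307, a prediction filter 308, and an output terminal 309.
[0106]
The coded bit stream encoded by the hierarchical encoding apparatus of FIG. 1 is input from the input terminal 301.
[0107]
The demultiplexing unit 302 separates the coded bit stream into the first code obtained by first-layer encoding, the second code obtained by second-layer encoding, and the third code obtained by third-layer encoding. It outputs the first code to the first-layer decoding unit 303, the second code to the second-layer decoding unit 304, and the third code to the third-layer decoding unit 305.
[0108]
The first-layer decoding unit 303 decodes the first code obtained from the demultiplexing unit 302 to generate the first-layer decoded signal.
[0109]
Next, the second-layer decoding unit 304 decodes the second code obtained from the demultiplexing unit 302 to generate the second-layer decoded residual signal. The adder 306 adds the first-layer decoded signal and the second-layer decoded residual signal to generate the second-layer decoded signal, and outputs it to the adder 307.
[0110]
Next, the third-layer decoding unit 305 decodes the third code obtained from the demultiplexing unit 302 to generate the third-layer decoded residual signal. The adder 307 adds the second-layer decoded signal and the third-layer decoded residual signal to generate the third-layer decoded signal, and outputs it to the prediction filter 308 and the output terminal 309.
[0111]
The prediction filter 308 performs the same processing as the prediction filter 116 of Embodiment 1 to generate a prediction residual signal. The decoded LPC coefficients obtained in the first-layer decoding unit are used as the quantized LPC coefficients of the prediction filter 308. The prediction residual signal generated by the prediction filter 308 is supplied to the first-layer decoding unit and used to update the internal state of the adaptive codebook inside it.
[0112]
To explain this in detail, the first-layer decoding unit 303 is described next. The description uses CELP as an example, but the present invention only requires a decoding method in which the first-layer decoding unit has an adaptive codebook; it is not limited to CELP.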
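[0113]
FIG. 6 is a block diagram showing the internal configuration of the first-layer decoding unit of the hierarchical decoding apparatus of this embodiment, based on a representative CELP configuration. The portion enclosed by the broken-line frame in FIG. 6 corresponds to the first-layer decoding unit 303 of FIG. 5. In FIG. 6, the first-layer decoding unit 303 mainly comprises an input terminal 401, a demultiplexing unit 402, an adaptive codebook 403, a noise codebook 404, a gain codebook 405, a multiplier 406, a multiplier 407, an adder 408, an LPC decoder 409, a synthesis filter 410, an output terminal 412, an output terminal 413, and an input terminal 414.
[0114]
The demultiplexing unit 402 separates the individual codes from the first code input from the input terminal 401 and outputs them to the adaptive codebook 403, the noise codebook 404, the gain codebook 405, and the LPC decoder 409.
[0115]
The LPC decoder 409 decodes the LPC coefficients from the code it receives and outputs them to the synthesis filter 410 and the output terminal 412. The LPC coefficients output from the output terminal 412 are used by the prediction filter 308 described above.
[0116]
Next, the adaptive codebook 403 uses its code to decode the adaptive vector q(k) and outputs it to the multiplier 406. The noise codebook 404 uses its code to decode the noise vector c(k) and outputs it to the multiplier 407.
[0117]
The gain codebook 405 uses its code to decode the adaptive vector gain βq and the noise vector gain γq. The gain codebook 405 outputs the adaptive vector gain βq to the multiplier 406 and the noise vector gain γq to the multiplier 407.
[0118]
The multiplier 406 multiplies the adaptive vector by the adaptive vector gain and outputs the product to the adder 408. The multiplier 407 multiplies the noise vector by the noise vector gain and outputs the product to the adder 408. The adder 408 adds the scaled adaptive vector and the scaled noise vector to generate the excitation signal. With the excitation signal denoted ex(k), it is given by equation (5).
[0119]
[Equation 5]
\[ ex(k) = \beta_q \, q(k) + \gamma_q \, c(k) \tag{5} \]

Next, the synthesis filter 410 generates the synthesized signal syn(k) from the decoded LPC coefficients and the excitation signal ex(k) according to equation (6).
[0120]
[Equation 6]

Here, αq(i) denotes the decoded LPC coefficients and NP the order of the LPC coefficients. The decoded signal syn(k) obtained by the above operation is output from the output terminal 413.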
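The excitation of equation (5) and an all-pole synthesis filter of the kind equation (6) describes can be sketched as follows. The image of equation (6) is not reproduced in this text, so its sign convention is an assumption: the code uses the synthesis filter 1/A(z) with A(z) = 1 + Σ αq(i) z^-i, that is, syn(k) = ex(k) - Σ_{i=1..NP} αq(i)·syn(k-i). Function names are hypothetical.

```python
def excitation(q, c, beta_q, gamma_q):
    """Equation (5): gain-scaled adaptive vector plus gain-scaled noise vector."""
    return [beta_q * qk + gamma_q * ck for qk, ck in zip(q, c)]


def synthesize(ex, lpc_q, history=None):
    """All-pole synthesis under an assumed sign convention:

        syn(k) = ex(k) - sum_{i=1..NP} lpc_q[i-1] * syn(k - i)

    `history` optionally supplies previously synthesized samples;
    missing past samples are treated as zero.
    """
    order = len(lpc_q)                        # NP, the LPC order
    past = [0.0] * order + list(history or [])
    buf = past[len(past) - order:]            # the NP most recent outputs
    out = []
    for k in range(len(ex)):
        acc = ex[k]
        for i in range(1, order + 1):
            acc -= lpc_q[i - 1] * buf[len(buf) - i]
        buf.append(acc)
        out.append(acc)
    return out
```

Under this convention, a prediction-error filter A(z) built from the same coefficients undoes the synthesis filter, which is why filtering a decoded signal with the quantized LPC coefficients yields an excitation-like signal suitable for the adaptive codebook.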
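[0121]
After the above decoding process ends, the internal state of the adaptive codebook is updated with the latest excitation signal in preparation for decoding the next frame (or subframe).
[0122]
The feature of the present invention lies in the part just described. In the conventional method, the internal state of the adaptive codebook 403 was updated with the excitation signal obtained by the adder 408; in the present invention, it is updated with the output signal (prediction residual signal) of the prediction filter 308 input from the input terminal 414.
[0123]
Thus, according to the hierarchical decoding apparatus of this embodiment, when decoding a hierarchical code in which a lower layer encodes the portion that the upper layer could not fully encode, a prediction residual signal is generated from the signal obtained by decoding the coded signals of the second and subsequent layers and from the LPC coefficients obtained in the upper-layer coding, and the adaptive codebook of the upper layer is updated with this prediction residual signal. Decoding is thus performed with an adaptive codebook whose excitation is closer to that of the acoustic signal, so a high-quality signal can be decoded at a low bit rate.
[0124]
In the above description, the prediction filter 308 creates the prediction residual signal from the third-layer decoded signal and the quantized LPC coefficients obtained from the first-layer decoding unit 303, and the adaptive codebook 403 updates its internal state with this prediction residual signal. However, the prediction filter 308 may instead create the prediction residual signal from the second-layer decoded signal. That is, the decoded signal used to create the prediction residual signal may come from any layer that encodes the residual left by first-layer encoding.
[0125]
FIG. 7 is a block diagram showing the configuration of a hierarchical decoding apparatus according to Embodiment 2 of the present invention. Components in FIG. 7 with the same numbers as in FIG. 5 have the same functions, and their description is omitted here. The feature of this configuration is that the decoded signal of an intermediate layer (the second-layer decoded signal in FIG. 7) is supplied to the prediction filter 308 and the output signal of the prediction filter 308 is used to update the internal state of the adaptive codebook 403 of FIG. 6. This configuration preserves scalability up to the intermediate layer.
[0126]
The adder 306 adds the first-layer decoded signal and the second-layer decoded residual signal to generate the second-layer decoded signal, and outputs it to the adder 307 and the prediction filter 308.
[0127]
Next, the third-layer decoding unit 305 decodes the third code obtained from the demultiplexing unit 302 to generate the third-layer decoded residual signal. The adder 307 adds the second-layer decoded signal and the third-layer decoded residual signal to generate the third-layer decoded signal, and outputs it to the output terminal 309.
[0128]
The prediction filter 308 generates a prediction residual signal from the quantized LPC coefficients produced in the first-layer decoding unit 303 and the second-layer decoded signal produced by the adder 306. The prediction residual signal generated by the prediction filter 308 is supplied to the first-layer decoding unit and used to update the internal state of the adaptive codebook inside it.
[0129]
Thus, according to the hierarchical decoding apparatus of this embodiment, supplying the decoded signal of an intermediate layer to the prediction filter and using its output signal to update the internal state of the adaptive codebook of first-layer decoding preserves scalability up to the intermediate layer.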
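[0130]
(Embodiment 3)
FIG. 8 is a block diagram showing the configuration of a hierarchical encoding apparatus according to Embodiment 3 of the present invention. The hierarchical encoding apparatus 500 of FIG. 8 mainly comprises an input terminal 501, a DS1 unit 502, a first-layer encoding unit 503, a first-layer decoding unit 504, a US1 unit 505, a DS2 unit 506, a delay unit 507, a subtractor 508, a second-layer encoding unit 509, a second-layer decoding unit 510, an adder 511, a US2 unit 512, a delay unit 513, a subtractor 514, a third-layer encoding unit 515, a third-layer decoding unit 516, an adder 517, a multiplexing unit 518, an output terminal 519, a DS3 unit 520, and a prediction filter 521.
[0131]
The hierarchical encoding apparatus of FIG. 8 decodes the coded signal of the upper layer and encodes, in the lower layer, the difference between the up-sampled decoded signal and the input acoustic signal. It differs from the hierarchical encoding apparatus of FIG. 1 in that the sampling frequency of the signal encoded in the lower layer is higher than that of the signal encoded in the upper layer.
[0132]
This embodiment is characterized in that the sampling frequencies of the signals input to the layers satisfy the relationship of equation (7).
[0133]
[Equation 7]
\[ Fs(1) \le Fs(2) \le Fs(3) \tag{7} \]

Here, Fs(n) denotes the sampling frequency of the signal of the n-th layer. This embodiment makes it possible to perform encoding that supports multiple sampling frequencies.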
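[0134]
An acoustic signal with sampling frequency Fs(3) is input from the input terminal 501 and supplied to the DS1 unit 502.
[0135]
The DS1 unit 502 down-samples the input acoustic signal, lowering its sampling frequency from Fs(3) to Fs(1), and outputs the input signal of sampling frequency Fs(1) to the first-layer encoding unit 503.
[0136]
The first-layer encoding unit 503 has an adaptive codebook that holds previously generated excitation signals as its internal state, and by using the adaptive codebook it encodes strongly periodic signals efficiently. The first-layer encoding unit 503 determines the first code so that the perceptual distortion between the input acoustic signal and the decoded signal generated after encoding is minimized. A representative method applicable to the first-layer encoding unit 503 is code-excited linear prediction (CELP).
[0137]
The first-layer encoding unit 503 outputs the obtained first code to the first-layer decoding unit 504 and the multiplexing unit 518. The first-layer decoding unit 504 generates the first-layer decoded signal from the first code and outputs it to the US1 unit 505.
[0138]
The US1 unit 505 up-samples the first-layer decoded signal, raising its sampling frequency from Fs(1) to Fs(2), and outputs the first-layer decoded signal of sampling frequency Fs(2) to the subtractor 508 and the adder 511.
[0139]
Next, the acoustic signal input from the input terminal 501 is supplied to the DS2 unit 506. The DS2 unit 506 down-samples the input acoustic signal, lowering its sampling frequency from Fs(3) to Fs(2), and outputs the input signal of sampling frequency Fs(2) to the delay unit 507.
[0140]
The delay unit 507 delays the down-sampled acoustic signal by a predetermined time and outputs it to the subtractor 508. That is, it compensates for the delay introduced by the DS1 unit 502, the first-layer encoding unit 503, the first-layer decoding unit 504, the US1 unit 505, and the DS2 unit 506.
[0141]
The subtractor 508 takes the difference between the output signal of the delay unit 507 and the first-layer decoded signal to generate the second-layer residual signal, and outputs it to the second-layer encoding unit 509.
[0142]
The second-layer encoding unit 509 encodes the second-layer residual signal so that perceptual quality is improved, and determines the second code. The second-layer encoding unit 509 outputs the second code to the second-layer decoding unit 510 and the multiplexing unit 518.
[0143]
The second-layer decoding unit 510 decodes the second code to generate the second-layer decoded residual signal, and outputs it to the adder 511.
[0144]
The adder 511 sums the first-layer decoded signal and the second-layer decoded residual signal to generate the second-layer decoded signal, and outputs it to the US2 unit 512.
[0145]
The US2 unit 512 up-samples the second-layer decoded signal, raising its sampling frequency from Fs(2) to Fs(3), and outputs the second-layer decoded signal of sampling frequency Fs(3) to the subtractor 514 and the adder 517.
[0146]
Next, the delay unit 513 delays the acoustic signal input from the input terminal 501 by a predetermined time and outputs it to the subtractor 514. That is, the delay unit 513 compensates for the delay introduced by the encoding and decoding units of the preceding stages, specifically by the signal processing from the DS1 unit 502 through the US2 unit 512.
[0147]
The subtractor 514 takes the difference between the output signal of the delay unit 513 and the second-layer decoded signal to generate the third-layer residual signal, and outputs it to the third-layer encoding unit 515.
[0148]
The third-layer encoding unit 515 encodes the third-layer residual signal so that perceptual quality is improved, determines the third code, and outputs it to the third-layer decoding unit 516 and the multiplexing unit 518.
[0149]
The third-layer decoding unit 516 decodes the third code to generate the third-layer decoded residual signal, and outputs it to the adder 517.
[0150]
The adder 517 sums the second-layer decoded signal and the third-layer decoded residual signal to generate the third-layer decoded signal, and outputs it to the DS3 unit 520.
[0151]
The multiplexing unit 518 multiplexes the first code, the second code, and the third code by a predetermined method to generate a coded bit stream, and outputs it from the output terminal 519.
[0152]
The DS3 unit 520 down-samples the third-layer decoded signal, lowering its sampling frequency from Fs(3) to Fs(1), and outputs the third-layer decoded signal of sampling frequency Fs(1) to the prediction filter 521.
[0153]
The prediction filter 521 applies a prediction filter to the third-layer decoded signal to generate a prediction residual signal, and outputs it to the first-layer encoding unit 503. The prediction filter is formed from the quantized LPC coefficients computed in the first-layer encoding unit 503. With the third-layer decoded signal output from the DS3 unit 520 denoted syn3(k), the prediction residual signal e(k), and the quantized LPC coefficients αq(i), the prediction residual signal e(k) is given by equation (8).
[0154]
[Equation 8]

Here, NP denotes the order of the LPC coefficients.
[0155]
The first-layer encoding unit 503 uses the prediction residual signal obtained by the above operation as the internal state of the adaptive codebook inside the first-layer encoding unit 503.
[0156]
Thus, according to the hierarchical encoding apparatus of this embodiment, making the sampling frequency of the signal encoded in the lower layer higher than that of the signal encoded in the upper layer allows the input signal to be encoded in a way that supports various sampling frequencies.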
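[0157]
In the above description, the prediction filter 521 creates the prediction residual signal from the third-layer decoded signal and the quantized LPC coefficients obtained from the first-layer encoding unit 503, and the first-layer encoding unit 503 updates the internal state of the adaptive codebook with this prediction residual signal. However, the prediction filter 521 may instead create the prediction residual signal from the second-layer decoded signal. That is, the decoded signal used to create the prediction residual signal may come from any layer that encodes the residual left by first-layer encoding.
[0158]
FIG. 9 is a block diagram showing the configuration of a hierarchical encoding apparatus according to Embodiment 3 of the present invention. Components in FIG. 9 with the same numbers as in FIG. 8 have the same functions, and their description is omitted here. The feature of this configuration is that the decoded signal of an intermediate layer (the second-layer decoded signal in FIG. 9) is supplied to the prediction filter 521 and its output signal is used to update the internal state of the adaptive codebook 207. This configuration preserves scalability up to the intermediate layer.
[0159]
The adder 511 sums the first-layer decoded signal and the second-layer decoded residual signal to generate the second-layer decoded signal, and outputs it to the US2 unit 512.
[0160]
The US2 unit 512 up-samples the second-layer decoded signal, raising its sampling frequency from Fs(2) to Fs(3), and outputs the second-layer decoded signal of sampling frequency Fs(3) to the subtractor 514 and the DS3 unit 520.
[0161]
The DS3 unit 520 down-samples the second-layer decoded signal, lowering its sampling frequency from Fs(3) to Fs(1), and outputs the second-layer decoded signal of sampling frequency Fs(1) to the prediction filter 521.
[0162]
The prediction filter 521 applies the prediction filter to the second-layer decoded signal to generate a prediction residual signal, and outputs it to the first-layer encoding unit 503.
[0163]
The first-layer encoding unit 503 uses the prediction residual signal obtained by the prediction filter 521 as the internal state of its adaptive codebook. The first-layer encoding unit 503 determines the first code so that the perceptual distortion between the input acoustic signal and the decoded signal generated after encoding is minimized, and outputs the first code to the first-layer decoding unit 504 and the multiplexing unit 518.
[0164]
Thus, according to the hierarchical encoding apparatus of this embodiment, supplying the decoded signal of an intermediate layer to the prediction filter and using its output signal to update the internal state of the adaptive codebook of first-layer encoding preserves scalability up to the intermediate layer.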
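[0165]
(Embodiment 4)
This embodiment describes an example of decoding a signal encoded by the hierarchical encoding apparatus of Embodiment 3. Its feature is that the codes of the hierarchical coding method described in Embodiment 3 can be decoded, so that a high-quality acoustic signal can be decoded.
[0166]
FIG. 10 is a block diagram showing the configuration of a hierarchical decoding apparatus according to Embodiment 4 of the present invention. The hierarchical decoding apparatus 600 of FIG. 10 mainly comprises an input terminal 601, a demultiplexing unit 602, a first-layer decoding unit 603, a US1 unit 604, an adder 605, a second-layer decoding unit 606, a US2 unit 607, a third-layer decoding unit 608, an adder 609, an output terminal 610, a DS3 unit 611, and a prediction filter 612.
[0167]
The coded bit stream encoded by the hierarchical encoding apparatus of FIG. 8 is input from the input terminal 601.
[0168]
The demultiplexing unit 602 separates the coded bit stream into the first code obtained by first-layer encoding, the second code obtained by second-layer encoding, and the third code obtained by third-layer encoding. It outputs the first code to the first-layer decoding unit 603, the second code to the second-layer decoding unit 606, and the third code to the third-layer decoding unit 608.
[0169]
The first-layer decoding unit 603 decodes the first code obtained from the demultiplexing unit 602 to generate the first-layer decoded signal.
[0170]
The US1 unit 604 up-samples the first-layer decoded signal, raising its sampling frequency from Fs(1) to Fs(2), and outputs the first-layer decoded signal of sampling frequency Fs(2) to the adder 605.
[0171]
Next, the second-layer decoding unit 606 decodes the second code obtained from the demultiplexing unit 602 to generate the second-layer decoded residual signal. The adder 605 adds the first-layer decoded signal and the second-layer decoded residual signal to generate the second-layer decoded signal, and outputs the second-layer decoded signal to the US2 unit 607.
[0172]
The US2 unit 607 up-samples the second-layer decoded signal, raising its sampling frequency from Fs(2) to Fs(3), and outputs the second-layer decoded signal of sampling frequency Fs(3) to the adder 609.
[0173]
Next, the third-layer decoding unit 608 decodes the third code obtained from the demultiplexing unit 602 to generate the third-layer decoded residual signal. The adder 609 adds the second-layer decoded signal and the third-layer decoded residual signal to generate the third-layer decoded signal, and outputs it to the DS3 unit 611 and the output terminal 610.
[0174]
The DS3 unit 611 down-samples the third-layer decoded signal, lowering its sampling frequency from Fs(3) to Fs(1), and outputs the third-layer decoded signal of sampling frequency Fs(1) to the prediction filter 612.
[0175]
The prediction filter 612 performs the same processing as the prediction filter 116 of Embodiment 1 to generate a prediction residual signal. The decoded LPC coefficients obtained in the first-layer decoding unit are used as the quantized LPC coefficients of the prediction filter 612. The prediction residual signal generated by the prediction filter 612 is supplied to the first-layer decoding unit and used to update the internal state of the adaptive codebook inside it.
[0176]
In the above description, the prediction filter 612 creates the prediction residual signal from the third-layer decoded signal and the quantized LPC coefficients obtained from the first-layer decoding unit 603, and the adaptive codebook in the first-layer decoding unit 603 updates its internal state with this prediction residual signal. However, the prediction filter 612 may instead create the prediction residual signal from the second-layer decoded signal. That is, the decoded signal used to create the prediction residual signal may come from any layer that encodes the residual left by first-layer encoding.
[0177]
FIG. 11 is a block diagram showing the configuration of a hierarchical decoding apparatus according to Embodiment 4 of the present invention. Components identical to those in FIG. 10 carry the same numbers as in FIG. 10, and their detailed description is omitted. The feature of this configuration is that the decoded signal of an intermediate layer (the second-layer decoded signal in FIG. 11) is supplied to the prediction filter 612 and the output signal of the prediction filter 612 is used to update the internal state of the adaptive codebook in the first-layer decoding unit 603. This configuration preserves scalability up to the intermediate layer.
[0178]
The adder 605 adds the first-layer decoded signal and the second-layer decoded residual signal to generate the second-layer decoded signal, and outputs it to the US2 unit 607 and the DS3 unit 611.
[0179]
The US2 unit 607 up-samples the second-layer decoded signal, raising its sampling frequency from Fs(2) to Fs(3), and outputs the second-layer decoded signal of sampling frequency Fs(3) to the adder 609.
[0180]
The DS3 unit 611 down-samples the second-layer decoded signal, lowering its sampling frequency from Fs(2) to Fs(1), and outputs the second-layer decoded signal of sampling frequency Fs(1) to the prediction filter 612.
[0181]
Thus, according to the hierarchical decoding apparatus of this embodiment, supplying the decoded signal of an intermediate layer to the prediction filter and using its output signal to update the internal state of the adaptive codebook of first-layer decoding preserves scalability up to the intermediate layer.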
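[0182]
(Embodiment 5)
FIG. 12 is a block diagram showing the configuration of the first-layer encoding unit of a hierarchical encoding apparatus according to Embodiment 5 of the present invention. Components identical to those in FIG. 2 carry the same numbers as in FIG. 2, and their detailed description is omitted. The first-layer encoding unit of FIG. 12 comprises a periodicity calculation unit 701, a determination unit 702, a switch unit 703, an adaptive codebook 704, and a multiplexer 705. It differs from the first-layer encoding unit of FIG. 2 in that, when updating the internal state of the adaptive codebook, it selects either the prediction residual signal input from the input terminal 218 or the excitation signal output from the adder 212 according to the strength of the periodicity of the input acoustic signal.
[0183]
The periodicity calculation unit 701 performs processing such as correlation analysis on the acoustic signal input from the input terminal 201 to quantify the strength of the periodicity of the input acoustic signal, and outputs this measure to the determination unit 702.
[0184]
The determination unit 702 compares the periodicity measure with a predetermined threshold. If the measure exceeds the threshold, the determination unit 702 regards the periodicity of the input acoustic signal as strong and outputs a flag of "0" to the multiplexer 705. If the measure is equal to or below the threshold, it regards the periodicity as weak and outputs a flag of "1" to the multiplexer 705.
[0185]
The switch unit 703 switches the signal used to update the internal state of the adaptive codebook 704 according to the flag obtained from the determination unit 702. When the flag is 0, the switch unit 703 connects the switch so that the prediction residual signal input from the input terminal 218 is used to update the internal state of the adaptive codebook 704. When the flag is 1, it connects the switch so that the excitation signal output from the adder 212 is used instead.
[0186]
The adaptive codebook 704 holds previously generated excitation signals as its internal state and generates an adaptive vector by repeating this internal state at the desired pitch period. When the determination unit 702 determines that the periodicity of the input acoustic signal is strong, the adaptive codebook 704 updates its internal state with the prediction residual signal input from the input terminal 218. When the determination unit 702 determines that the periodicity is weak, the adaptive codebook 704 updates its internal state with the excitation signal output from the adder 212. The adaptive codebook 704 outputs the excitation signal it holds, in order, to the multiplier 209 as the adaptive vector.
[0187]
The multiplexer 705 multiplexes the signals from the LPC quantizer 203, the searcher 214, and the determination unit 702 and outputs the result from the output terminal 216.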
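The periodicity measurement and flag decision can be sketched as follows. The text only says that the periodicity calculation unit performs "correlation analysis", so the maximum normalized autocorrelation over a lag range used here is one assumed choice of measure, and the threshold value 0.7 and the function names are hypothetical. The flag values follow the description above: 0 when the measure exceeds the threshold, 1 otherwise.

```python
def periodicity(x, min_lag, max_lag):
    """Maximum normalized autocorrelation of x over the given lag range."""
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        idx = range(lag, len(x))
        num = sum(x[k] * x[k - lag] for k in idx)
        den = (sum(x[k] ** 2 for k in idx)
               * sum(x[k - lag] ** 2 for k in idx)) ** 0.5
        if den > 0.0:
            best = max(best, num / den)
    return best


def update_flag(x, min_lag, max_lag, threshold=0.7):
    """Flag 0: periodicity strong, update with the prediction residual.
    Flag 1: periodicity weak, update with the excitation signal."""
    return 0 if periodicity(x, min_lag, max_lag) > threshold else 1
```

Because the flag is multiplexed into the bit stream, the decoder can reproduce the same choice without measuring periodicity itself.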
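[0188]
Thus, according to the hierarchical encoding apparatus of this embodiment, when the periodicity of the input acoustic signal is strong, updating the internal state of the adaptive codebook with the prediction residual signal obtained from the decoded signal of a higher-order layer increases the prediction accuracy of the adaptive codebook and improves performance. When the periodicity of the input acoustic signal is not strong, updating the internal state of the adaptive codebook with the excitation signal improves the effect for non-periodic signals.
[0189]
In the above description, whether to update the internal state of the adaptive codebook with the prediction residual signal or with the excitation signal is decided from the strength of the periodicity of the input acoustic signal, but the decision criterion is not particularly limited.
[0190]
For example, the distortion obtained by updating the internal state of the adaptive codebook with the prediction residual signal and actually encoding the input acoustic signal may be compared with the distortion obtained by updating the internal state of the adaptive codebook with the excitation signal and actually encoding the input acoustic signal. FIG. 13 is a flowchart showing an example of the operation of the hierarchical encoding apparatus of this embodiment. The determination operation is described below with reference to FIG. 13.
[0191]
In step S810, the internal state of the adaptive codebook is updated with the prediction residual signal and the encoding process of the first-layer encoding unit is performed. The perceptual distortion E1 of the resulting first-layer decoded signal with respect to the input acoustic signal is computed.
[0192]
In step S820, the internal state of the adaptive codebook is likewise updated with the excitation signal and the encoding process of the first-layer encoding unit is performed. The perceptual distortion E2 of the resulting first-layer decoded signal with respect to the input acoustic signal is computed.
[0193]
In step S830, the distortion E1 obtained in step S810 is compared with the distortion E2 obtained in step S820.
[0194]
In step S840, a decision is made: if distortion E1 is smaller than distortion E2, processing proceeds to step S850; if distortion E2 is smaller than distortion E1, processing proceeds to step S860.
[0195]
In step S850, using the prediction residual signal is judged to be more effective, and the encoding process is performed after updating the internal state of the adaptive codebook with the prediction residual signal. The flag is set to 0 to indicate that the prediction residual signal was used to update the adaptive codebook.
[0196]
In step S860, using the excitation signal is judged to be more effective, and the encoding process is performed after updating the internal state of the adaptive codebook with the excitation signal. The flag is set to 1 to indicate that the excitation signal was used to update the adaptive codebook.
[0197]
In step S870, the codes obtained by the encoding process and the flag are multiplexed by the multiplexing unit and output from the output terminal.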
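The decision flow of steps S810 to S860 can be sketched as a small selection routine. The trial encoding itself is abstracted into a hypothetical callback, `trial_encode`, that updates a copy of the adaptive codebook with the given signal, encodes the frame, and returns the resulting distortion. The text does not say what happens when E1 equals E2; treating a tie like the excitation case is an assumption.

```python
def select_update_signal(trial_encode, pred_residual, excitation):
    """Pick the adaptive-codebook update signal giving the smaller distortion.

    trial_encode(signal) -> distortion  (hypothetical callback)
    Returns (flag, chosen_signal): flag 0 for the prediction residual,
    flag 1 for the excitation signal.
    """
    e1 = trial_encode(pred_residual)      # S810: trial with prediction residual
    e2 = trial_encode(excitation)         # S820: trial with excitation signal
    if e1 < e2:                           # S830/S840: compare and branch
        return 0, pred_residual           # S850
    return 1, excitation                  # S860 (ties fall here by assumption)
```

This closed-loop choice costs two trial encodings per frame, in exchange for always using the update signal that actually performs better.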
【０１９８】
このように、本実施の形態の階層符号化装置によれば、予測残差信号もしくは駆動音源信号のいずれを用いて適応符号帳の内部状態を更新するかを判定する際に、予測残差信号を用いて適応符号帳の内部状態を更新し入力音響信号を実際に符号化して求められる歪と、駆動音源信号を用いて適応符号帳の内部状態を更新し入力音響信号を実際に符号化して求められる歪を算出して比較し、歪が小さくなる信号を用いて適応符号帳の内部状態を更新することにより、歪の小さくなる信号を常に使って適応符号帳の内部状態を更新することになるので、品質を向上することができる。
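[0199]
(Embodiment 6)
FIG. 14 is a block diagram showing the configuration of the first layer decoding section of the hierarchical decoding device according to Embodiment 6 of the present invention. Components identical to those in FIG. 6 are given the same reference numbers as in FIG. 6, and their detailed description is omitted. The first layer decoding section of FIG. 14 comprises an input terminal 801, a separation section 802, and a switch section 803. It differs from the first layer encoding section of FIG. 2 in that, when updating the internal state of the adaptive codebook, it selects either the prediction residual signal input from the input terminal 801 or the driving excitation signal output from the adder 408, based on the flag information obtained from the separation section 802.
[0200]
The separation section 802 separates, from the encoded code input from the input terminal 401, the encoded codes used by the adaptive codebook 804, the noise codebook 404, the gain codebook 405, and the LPC decoder 409. It also separates the flag information indicating the type of signal used to update the internal state of the adaptive codebook 804. This flag information is the signal output from the determination section 702 to the multiplexer 705 in FIG. 12.
[0201]
The switch section 803 switches the signal used to update the internal state of the adaptive codebook 804 according to the flag information. When the flag is 0, the switch section 803 connects the switch so that the prediction residual signal input from the input terminal 801 is used to update the internal state of the adaptive codebook 804. Likewise, when the flag is 1, the switch section 803 connects the switch so that the driving excitation signal output from the adder 408 is used to update the internal state of the adaptive codebook 804.
[0202]
Thus, the hierarchical decoding device of this embodiment follows the result of the determination made on the encoding side, based on the strength of the periodicity of the input audio signal or the like, of whether to update the internal state of the adaptive codebook with the prediction residual signal or with the driving excitation signal. When the periodicity of the encoded audio signal is strong, the internal state of the adaptive codebook is updated using the prediction residual signal obtained from the decoded signals of the higher layers. This allows the encoded code of the hierarchical encoding method to be decoded and, as a result, a high-quality audio signal to be obtained.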
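[0203]
(Embodiment 7)
Next, Embodiment 7 of the present invention is described with reference to the drawings. FIG. 15 is a block diagram showing the configuration of a communication device according to Embodiment 7 of the present invention. A feature of this embodiment is that the signal processing device 1503 in FIG. 15 is constituted by one of the audio encoding devices shown in Embodiments 1 to 6 above.
[0204]
As shown in FIG. 15, the communication device 1500 according to Embodiment 7 of the present invention comprises an input device 1501, an A/D conversion device 1502, and a signal processing device 1503 connected to a network 1504.
[0205]
The A/D conversion device 1502 is connected to the output terminal of the input device 1501. The input terminal of the signal processing device 1503 is connected to the output terminal of the A/D conversion device 1502. The output terminal of the signal processing device 1503 is connected to the network 1504.
[0206]
The input device 1501 converts sound waves audible to the human ear into an analog signal, which is an electrical signal, and supplies it to the A/D conversion device 1502. The A/D conversion device 1502 converts the analog signal into a digital signal and supplies it to the signal processing device 1503. The signal processing device 1503 encodes the incoming digital signal to generate a code and outputs it to the network 1504.
[0207]
Thus, according to the communication device of this embodiment, the effects shown in Embodiments 1 to 6 above can be obtained in communication, and an audio encoding device that encodes audio signals efficiently with a small number of bits can be provided.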
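[0208]
(Embodiment 8)
Next, Embodiment 8 of the present invention is described with reference to the drawings. FIG. 16 is a block diagram showing the configuration of a communication device according to Embodiment 8 of the present invention. A feature of this embodiment is that the signal processing device 1603 in FIG. 16 is constituted by one of the audio decoding devices shown in Embodiments 1 to 6 above.
[0209]
As shown in FIG. 16, the communication device 1600 according to Embodiment 8 of the present invention comprises a receiving device 1602 connected to a network 1601, a signal processing device 1603, a D/A conversion device 1604, and an output device 1605.
[0210]
The input terminal of the receiving device 1602 is connected to the network 1601. The input terminal of the signal processing device 1603 is connected to the output terminal of the receiving device 1602. The input terminal of the D/A conversion device 1604 is connected to the output terminal of the signal processing device 1603. The input terminal of the output device 1605 is connected to the output terminal of the D/A conversion device 1604.
[0211]
The receiving device 1602 receives a digital encoded audio signal from the network 1601, generates a digital received audio signal, and supplies it to the signal processing device 1603. The signal processing device 1603 decodes the received audio signal from the receiving device 1602 to generate a digital decoded audio signal and supplies it to the D/A conversion device 1604. The D/A conversion device 1604 converts the digital decoded audio signal from the signal processing device 1603 into an analog decoded audio signal and supplies it to the output device 1605. The output device 1605 converts the analog decoded audio signal, which is an electrical signal, into vibrations of the air and outputs it as sound waves audible to the human ear.
[0212]
Thus, according to the communication device of this embodiment, the effects shown in Embodiments 1 to 6 above can be obtained in communication. Because audio signals encoded efficiently with a small number of bits can be decoded, a good audio signal can be output.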
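[0213]
(Embodiment 9)
Next, Embodiment 9 of the present invention is described with reference to the drawings. FIG. 17 is a block diagram showing the configuration of a communication device according to Embodiment 9 of the present invention. A feature of this embodiment is that the signal processing device 1703 in FIG. 17 is constituted by one of the audio encoding means shown in Embodiments 1 to 6 above.
[0214]
As shown in FIG. 17, the communication device 1700 according to Embodiment 9 of the present invention comprises an input device 1701, an A/D conversion device 1702, a signal processing device 1703, an RF modulation device 1704, and an antenna 1705.
[0215]
The input device 1701 converts sound waves audible to the human ear into an analog signal, which is an electrical signal, and supplies it to the A/D conversion device 1702. The A/D conversion device 1702 converts the analog signal into a digital signal and supplies it to the signal processing device 1703. The signal processing device 1703 encodes the incoming digital signal to generate an encoded audio signal and supplies it to the RF modulation device 1704. The RF modulation device 1704 modulates the encoded audio signal to generate a modulated encoded audio signal and supplies it to the antenna 1705. The antenna 1705 transmits the modulated encoded audio signal as radio waves.
[0216]
Thus, according to the communication device of this embodiment, the effects shown in Embodiments 1 to 6 above can be obtained in wireless communication, and audio signals can be encoded efficiently with a small number of bits.
[0217]
The present invention can be applied to a transmitting device, a transmission encoding device, or an audio signal encoding device that uses audio signals. The present invention can also be applied to a mobile station device or a base station device.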
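[0218]
(Embodiment 10)
Next, Embodiment 10 of the present invention is described with reference to the drawings. FIG. 18 is a block diagram showing the configuration of a communication device according to Embodiment 10 of the present invention. A feature of this embodiment is that the signal processing device 1803 in FIG. 18 is constituted by one of the audio decoding means shown in Embodiments 1 to 6 above.
[0219]
As shown in FIG. 18, the communication device 1800 according to Embodiment 10 of the present invention comprises an antenna 1801, an RF demodulation device 1802, a signal processing device 1803, a D/A conversion device 1804, and an output device 1805.
[0220]
The antenna 1801 receives a digital encoded audio signal in the form of radio waves, generates a digital received encoded audio signal as an electrical signal, and supplies it to the RF demodulation device 1802. The RF demodulation device 1802 demodulates the received encoded audio signal from the antenna 1801 to generate a demodulated encoded audio signal and supplies it to the signal processing device 1803.
[0221]
The signal processing device 1803 decodes the digital demodulated encoded audio signal from the RF demodulation device 1802 to generate a digital decoded audio signal and supplies it to the D/A conversion device 1804. The D/A conversion device 1804 converts the digital decoded audio signal from the signal processing device 1803 into an analog decoded audio signal and supplies it to the output device 1805. The output device 1805 converts the analog decoded audio signal, which is an electrical signal, into vibrations of the air and outputs it as sound waves audible to the human ear.
[0222]
Thus, according to the communication device of this embodiment, the effects shown in Embodiments 1 to 6 above can be obtained in wireless communication. Because audio signals encoded efficiently with a small number of bits can be decoded, a good audio signal can be output.
[0223]
The present invention can be applied to a receiving device, a reception decoding device, or a speech signal decoding device that uses audio signals. The present invention can also be applied to a mobile station device or a base station device.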
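[0224]
The present invention is not limited to the above embodiments and can be implemented with various modifications. For example, although the above embodiments describe the case where the invention is carried out as a signal processing device, the invention is not limited to this, and the signal processing method can also be carried out as software.
[0225]
For example, a program that executes the above signal processing method may be stored in advance in a ROM (Read Only Memory) and run by a CPU (Central Processing Unit).
[0226]
Alternatively, a program that executes the above signal processing method may be stored on a computer-readable storage medium, loaded from the storage medium into the RAM (Random Access Memory) of a computer, and the computer operated in accordance with that program.
[0227]
The present invention can be applied to a receiving device, a reception decoding device, or a speech signal decoding device that uses audio signals. The present invention can also be applied to a mobile station device or a base station device.
[0228]
[Effect of the invention]
As described above, the hierarchical encoding method and hierarchical decoding method for audio signals of the present invention apply to hierarchical coding in which the portion that the upper layer cannot fully encode is encoded by the lower layers. The residual signal produced by the upper layer's encoding is predicted from the signal obtained by decoding the encoded signals of the second and subsequent layers and from the LPC coefficients obtained in the upper layer's encoding, and the adaptive codebook of the upper layer is updated using this predicted residual signal. Encoding is thereby performed with an adaptive codebook whose driving excitation is close to the audio signal being encoded, so that high-quality encoding can be achieved at a low bit rate.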
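[Brief description of the drawings]
[FIG. 1] Block diagram showing the configuration of the hierarchical encoding device according to Embodiment 1 of the present invention
[FIG. 2] Block diagram showing the internal configuration of the first layer encoding section of the hierarchical encoding device of this embodiment
[FIG. 3] Diagram showing the relationship between the input audio signal and the corresponding first layer, second layer, and third layer decoded signals
[FIG. 4] Block diagram showing the configuration of the hierarchical encoding device according to Embodiment 1 of the present invention
[FIG. 5] Block diagram showing the configuration of the hierarchical decoding device according to Embodiment 2 of the present invention
[FIG. 6] Block diagram showing the internal configuration of the first layer decoding section of the hierarchical decoding device of this embodiment
[FIG. 7] Block diagram showing the configuration of the hierarchical decoding device according to Embodiment 2 of the present invention
[FIG. 8] Block diagram showing the configuration of the hierarchical encoding device according to Embodiment 3 of the present invention
[FIG. 9] Block diagram showing the configuration of the hierarchical encoding device according to Embodiment 3 of the present invention
[FIG. 10] Block diagram showing the configuration of the hierarchical decoding device according to Embodiment 4 of the present invention
[FIG. 11] Block diagram showing the configuration of the hierarchical decoding device according to Embodiment 4 of the present invention
[FIG. 12] Block diagram showing the configuration of the first layer encoding section of the hierarchical encoding device according to Embodiment 5 of the present invention
[FIG. 13] Flow diagram showing an example of the operation of the hierarchical encoding device of this embodiment
[FIG. 14] Block diagram showing the configuration of the first layer decoding section of the hierarchical decoding device according to Embodiment 6 of the present invention
[FIG. 15] Block diagram showing the configuration of the communication device according to Embodiment 7 of the present invention
[FIG. 16] Block diagram showing the configuration of the communication device according to Embodiment 8 of the present invention
[FIG. 17] Block diagram showing the configuration of the communication device according to Embodiment 9 of the present invention
[FIG. 18] Block diagram showing the configuration of the communication device according to Embodiment 10 of the present invention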
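[Explanation of reference signs]
102, 503 First layer encoding section
103, 303, 504, 603 First layer decoding section
106, 509 Second layer encoding section
107, 304, 510, 606 Second layer decoding section
111, 305, 515 Third layer encoding section
112, 516, 608 Third layer decoding section
116, 308, 521, 612 Prediction filter
202 LPC analyzer
203 LPC quantizer
204 LPC decoder
205 Perceptual weighting filter
206 Perceptually weighted synthesis filter
207, 403, 704 Adaptive codebook
214 Searcher
409 LPC decoder
410 Synthesis filter
502 DS1 section
505, 604 US1 section
506 DS2 section
512, 607 US2 section
520, 611 DS3 section
701 Periodicity calculation section
702 Determination section
703, 803 Switch section
[0001]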
[Technical field of the invention]
The present invention relates to a hierarchical encoding method and a hierarchical decoding method for audio signals, and more particularly to hierarchical encoding and decoding methods well suited to highly efficient compression coding of audio signals such as music signals or speech signals.
[0002]
[Prior art]
Audio coding techniques that compress music or speech signals at low bit rates are important for the efficient use of transmission-path capacity, such as radio waves in mobile communication, and of recording media. For speech coding, which encodes speech signals, there are schemes such as G726 and G729 standardized by the ITU (International Telecommunication Union). These schemes target narrowband signals (300 Hz to 3.4 kHz) and can encode them with high quality at 8 kbit/s to 32 kbit/s. Standard schemes for wideband signals (50 Hz to 7 kHz) include G722 and G722.1 of the ITU and AMR-WB of 3GPP (The 3rd Generation Partnership Project). These schemes can encode wideband speech signals with high quality at bit rates of 6.6 kbit/s to 64 kbit/s.
[0003]
One effective method for encoding speech signals efficiently at low bit rates is CELP (Code Excited Linear Prediction). CELP is based on an engineering model of human speech production: an excitation signal represented by random numbers or a pulse train is passed through a pitch filter, which corresponds to the strength of periodicity, and a synthesis filter, which corresponds to the vocal tract characteristics, and the encoded code is determined so that the squared error between the output signal and the input signal is minimized under perceptual weighting (see, for example, Non-Patent Document 1). Many recent standard speech coding schemes are based on CELP. For example, G729 encodes narrowband signals at 8 kbit/s, and AMR-WB encodes wideband signals at 6.6 kbit/s to 23.85 kbit/s.
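The analysis-by-synthesis search at the heart of CELP can be illustrated with a small sketch. It is a simplification made for illustration only: the pitch filter and the perceptual weighting are omitted, the codebook is a tiny list of candidate excitation vectors, and the function names and the sign convention A(z) = 1 + sum of a_i z^-i are assumptions of the example.

```python
def synthesize(excitation, lpc):
    """All-pole synthesis filter 1/A(z), with A(z) = 1 + sum_i lpc[i-1] * z^-i."""
    out = []
    for n, x in enumerate(excitation):
        feedback = sum(a * out[n - i]
                       for i, a in enumerate(lpc, start=1) if n - i >= 0)
        out.append(x - feedback)
    return out


def search_codebook(target, codebook, lpc):
    """Return (error, index, gain) of the entry whose synthesized output,
    scaled by its optimal gain, is closest to the target in squared error."""
    best = None
    for index, vector in enumerate(codebook):
        synth = synthesize(vector, lpc)
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue  # an all-zero candidate cannot match anything
        gain = sum(t * s for t, s in zip(target, synth)) / energy
        error = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
        if best is None or error < best[0]:
            best = (error, index, gain)
    return best
```

Only the winning index and the gain need to be transmitted, since the decoder holds the same codebook and synthesis filter.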
[0004]
Music coding, which encodes music signals, generally uses transform coding, in which the music signal is transformed into the frequency domain and encoded using a psychoacoustic model, as in the Layer III and AAC schemes standardized by MPEG (Moving Picture Experts Group). For signals sampled at 44.1 kHz, these schemes are known to cause almost no audible degradation at 64 kbit/s to 96 kbit/s per channel.
[0005]
However, when a speech coding scheme is applied to a signal that consists mainly of speech with music or environmental sound superimposed in the background, the background music or environmental sound degrades not only the background signal but also the speech signal, lowering overall quality. This problem arises because speech coding schemes are built on CELP, which is specialized for a speech model. In addition, speech coding schemes can handle signal bands only up to 7 kHz, and by design cannot adequately handle signals with components at higher frequencies.
[0006]
Music coding, on the other hand, can encode music with high quality, so it achieves sufficient quality even for speech signals with music or environmental sound in the background, as described above. It can also handle target signals with bandwidths up to about 22 kHz, which corresponds to CD quality. However, high-quality encoding requires a high bit rate, and if the bit rate is held to about 32 kbit/s, the quality of the decoded signal drops sharply. Music coding therefore cannot be used on communication networks with low transmission rates.
[0007]
To avoid these problems, the two techniques can be combined: the input signal is first encoded by CELP in the first layer, the decoded signal is subtracted from the input signal to obtain a residual signal, and this residual signal is transform-coded in the second and subsequent layers. Because the first layer uses CELP, speech signals can be encoded with high quality. The second and subsequent layers can efficiently encode the background music and environmental sound that the first layer cannot represent, as well as signals with frequency components above the band covered by the first layer.
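The layered residual structure described here can be sketched in a few lines. To keep the example self-contained, uniform scalar quantizers stand in for the CELP first layer and the transform-coded later layers; the step sizes and function names are assumptions chosen only for illustration.

```python
def quantize(signal, step):
    """Uniform scalar quantizer used as a stand-in for a real layer codec."""
    codes = [round(x / step) for x in signal]
    decoded = [c * step for c in codes]
    return codes, decoded


def encode_layers(signal, steps):
    """Each layer encodes what the previous layers left unencoded."""
    codes_per_layer = []
    residual = list(signal)
    for step in steps:
        codes, decoded = quantize(residual, step)
        codes_per_layer.append(codes)
        # Subtract this layer's decoded signal to get the next layer's input.
        residual = [r - d for r, d in zip(residual, decoded)]
    return codes_per_layer


def decode_layers(codes_per_layer, steps):
    """Sum the decoded contributions of however many layers were received."""
    length = len(codes_per_layer[0])
    output = [0.0] * length
    for codes, step in zip(codes_per_layer, steps):
        output = [o + c * step for o, c in zip(output, codes)]
    return output
```

Passing only the first entry of `codes_per_layer` to `decode_layers` gives the coarse first layer signal, and each additional layer reduces the remaining error, which is what makes the bitstream scalable.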
[0008]
However, to secure sufficient quality when music rather than speech is input, more bits must be allocated to the second and subsequent layers, which raises the bit rate. This problem arises because a speech-specific coding scheme such as CELP is applied to the first layer. That is, when a music signal is input, the CELP used in the first layer does not encode music efficiently, so the power of the error signal between the input signal and the first layer decoded signal (that is, the input signal to the second layer) becomes large. As a result, many bits must be allocated to the second and subsequent layers to improve the quality of the final decoded signal.
[0009]
[Non-patent document 1]
"Code-Excited Linear Prediction (CELP): high quality speech at very low bit rates", Proc. ICASSP 85, pp. 937-940, 1985.
[0010]
[Problems to be solved by the invention]
As described above, the conventional apparatus has a problem that it is difficult to perform high-quality encoding at a low bit rate.
[0011]
The present invention has been made in view of such a point, and an object of the present invention is to provide a hierarchical encoding method and a hierarchical decoding method of an audio signal capable of performing high-quality encoding at a low bit rate.
[0012]
[Means for Solving the Problems]
The hierarchical encoding method of the present invention is a hierarchical encoding method that encodes an input audio signal, decodes the signal encoded in the preceding stage, and encodes the difference between the decoded signal and the input signal. The method comprises: a first encoding step of encoding the input audio signal in units of frames of a predetermined length; a second encoding step of encoding, in one or more stages, the difference between the signal obtained by decoding the preceding-stage encoding result and the input audio signal; a prediction filter step of generating a prediction residual signal from the signal obtained by decoding the encoding result of the second encoding step; and an updating step of updating the codebook used for encoding based on the prediction of the prediction filter step.
[0013]
In the hierarchical encoding method of the present invention, the first encoding step performs CELP encoding of the input audio signal, the prediction filter step forms a prediction filter using quantized LPC coefficients, and the updating step updates the codebook using the result of passing the signal obtained by decoding the encoding result of the second encoding step through the prediction filter.
[0014]
According to these methods, in hierarchical coding in which the portion that the upper layer cannot fully encode is encoded by the lower layers, the prediction residual signal produced by the upper layer's encoding is generated from the signal obtained by decoding the encoded signals of the second and subsequent layers and from the LPC coefficients obtained in the upper layer's encoding, and the adaptive codebook of the upper layer is updated using this prediction residual signal. Encoding is thereby performed with an adaptive codebook whose driving excitation is close to the audio signal being encoded, so that high-quality encoding can be performed at a low bit rate.
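The prediction filter step can be pictured as an LPC inverse filter applied to the decoded signal. The sketch below assumes the convention A(z) = 1 + sum of a_i z^-i for the quantized LPC coefficients; the function name and the optional `history` argument (samples carried over from the previous frame) are hypothetical and exist only for this illustration.

```python
def prediction_residual(decoded, lpc, history=None):
    """Filter `decoded` through A(z) = 1 + sum_i lpc[i-1] * z^-i.

    `history` holds the samples preceding this frame (zeros if omitted);
    only its last len(lpc) samples are used.
    """
    order = len(lpc)
    padded = [0.0] * order + list(history or [])
    past = padded[len(padded) - order:]
    buf = past + list(decoded)
    return [buf[n] + sum(a * buf[n - i] for i, a in enumerate(lpc, start=1))
            for n in range(order, len(buf))]
```

For a decoded signal that exactly follows the first-order model s(n) = 0.5 s(n-1), the residual is zero after the first sample, which is the sense in which the filter output approximates the excitation that the upper layer's adaptive codebook should hold.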
[0015]
The hierarchical encoding method of the present invention includes a down-sampling step of down-sampling the input audio signal and an up-sampling step of up-sampling the signal obtained by decoding the preceding-stage encoding result, and the second encoding step encodes, in one or more stages, the difference between the up-sampled signal obtained by decoding the preceding-stage encoding result and the input audio signal.
[0016]
According to this method, by making the sampling frequency of the signal encoded in a lower layer higher than that of the signal encoded in an upper layer, input signals can be encoded at a variety of sampling frequencies.
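A two-rate arrangement of this kind can be sketched as follows. The rate converters here are deliberately crude stand-ins (pair averaging and sample repetition) for proper anti-aliasing and interpolation filters, the first layer codec is replaced by a lossless pass-through, and an even-length input is assumed; all names are hypothetical.

```python
def downsample2(signal):
    """Halve the rate by averaging sample pairs (crude anti-alias stand-in)."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]


def upsample2(signal):
    """Double the rate by repeating each sample (zero-order hold)."""
    return [x for x in signal for _ in range(2)]


def two_rate_layers(signal):
    """Return the low-rate first layer input and the full-rate residual."""
    low_rate = downsample2(signal)       # what the first layer encodes
    first_decoded = low_rate             # lossless stand-in for encode + decode
    upsampled = upsample2(first_decoded)
    # The second layer sees, at the higher rate, what the first layer missed.
    residual = [s - u for s, u in zip(signal, upsampled)]
    return low_rate, residual
```

Adding the up-sampled first layer output back to the residual restores the full-rate signal, so a decoder that receives only the first layer still gets a usable low-rate signal.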
[0017]
The hierarchical encoding method of the present invention includes a periodicity calculation step of measuring the periodicity of the input audio signal. The updating step updates the codebook using the prediction residual signal obtained by the prediction of the prediction filter step when the periodicity is equal to or greater than a predetermined threshold, and updates the codebook using the generated driving excitation signal when the periodicity is less than the predetermined threshold.
[0018]
According to this method, when the periodicity of the input audio signal is strong, the internal state of the adaptive codebook is updated using the prediction residual signal obtained from the decoded signals of the higher layers, which increases the prediction accuracy of the adaptive codebook and improves performance. When the periodicity of the input audio signal is not strong, the internal state of the adaptive codebook is updated using the driving excitation signal, which improves effectiveness for non-periodic signals.
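One way to picture the periodicity decision is with a normalized autocorrelation, as sketched below. The description does not specify how periodicity is measured, so the measure, the lag range, and the threshold of 0.7 are all assumptions of this example; the flag values follow the convention of the switch section (0 for the prediction residual signal, 1 for the driving excitation signal).

```python
import math


def periodicity(signal, min_lag, max_lag):
    """Maximum normalized autocorrelation over the lag range (0.0 if undefined)."""
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        idx = range(lag, len(signal))
        num = sum(signal[n] * signal[n - lag] for n in idx)
        e_now = sum(signal[n] ** 2 for n in idx)
        e_past = sum(signal[n - lag] ** 2 for n in idx)
        if e_now > 0 and e_past > 0:
            best = max(best, num / math.sqrt(e_now * e_past))
    return best


def choose_flag(signal, threshold=0.7, min_lag=2, max_lag=8):
    """Flag 0: update with the prediction residual (periodicity >= threshold).
    Flag 1: update with the driving excitation (periodicity < threshold)."""
    return 0 if periodicity(signal, min_lag, max_lag) >= threshold else 1
```

A strictly repeating waveform yields a value of 1.0 at its period and selects flag 0, while an isolated impulse yields 0.0 and selects flag 1.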
[0019]
The hierarchical encoding method of the present invention includes a determination step of determining which is smaller: the distortion obtained by updating the internal state of the adaptive codebook with the prediction residual signal and actually encoding the input audio signal, or the distortion obtained by updating the internal state of the adaptive codebook with the driving excitation signal and actually encoding the input audio signal. The updating step updates the codebook using the signal that yields the smaller distortion.
[0020]
According to this method, when determining whether to update the internal state of the adaptive codebook with the prediction residual signal or with the driving excitation signal, two distortions are calculated and compared: the distortion obtained by updating the internal state with the prediction residual signal and actually encoding the input audio signal, and the distortion obtained by updating the internal state with the driving excitation signal and actually encoding the input audio signal. The internal state is then updated with the signal that yields the smaller distortion. Because the signal giving the smaller distortion is always used for the update, quality can be improved.
[0021]
The hierarchical decoding method of the present invention decodes signals produced on the encoding side by encoding an input audio signal, decoding the signal encoded in the preceding stage, and encoding the difference between the decoded signal and the input signal. The method comprises: a first decoding step of decoding a signal obtained by encoding the input audio signal in units of frames of a predetermined length; a second decoding step of decoding, and adding together, the signals obtained by encoding in one or more stages the difference between the input audio signal and the signal obtained by decoding the preceding-stage encoding result; a prediction filter step of generating a prediction residual signal from the decoding results of the first decoding step and the second decoding step; and an updating step of updating the codebook used for decoding based on the prediction of the prediction filter step.
[0022]
In the hierarchical decoding method of the present invention, the first decoding step decodes a signal in which the input audio signal was encoded by the CELP coding method, the prediction filter step forms a prediction filter using LPC coefficients obtained by decoding the LPC coefficients encoded on the encoding side, and the updating step updates the codebook using the result of passing the decoding results of the first decoding step and the second decoding step through the prediction filter.
[0023]
According to these methods, when decoding a hierarchical encoding in which the portion that the upper layer cannot fully encode is encoded by the lower layers, the residual signal produced by the upper layer's encoding is predicted from the signal obtained by decoding the encoded signals of the second and subsequent layers and from the LPC coefficients obtained in the upper layer's encoding, and the adaptive codebook of the upper layer is updated using this predicted residual signal. Decoding is thereby performed with an adaptive codebook whose driving excitation is close to the encoded audio signal, so that a high-quality signal can be decoded at a low bit rate.
[0024]
The hierarchical decoding method according to the present invention includes an up-sampling step of up-sampling a preceding-stage decoding result, an adding step of adding the up-sampled decoding result and a subsequent-stage decoding result, and a down-sampling step of down-sampling the addition result. And the prediction filter step is configured to generate a prediction residual signal from a decoded result after downsampling.
[0025]
According to this method, by making the sampling frequency of the signal decoded in a lower layer higher than that of the signal decoded in an upper layer, signals encoded at a variety of sampling frequencies can be decoded.
[0026]
In the hierarchical decoding method of the present invention, the updating step updates the codebook based on the result of the encoding-side determination of whether the adaptive codebook is to be updated with the prediction residual signal obtained by the prediction of the prediction filter step or with the generated driving excitation signal.
[0027]
The hierarchical encoding device of the present invention is a hierarchical encoding device that encodes an input audio signal, decodes the signal encoded in the preceding stage, and encodes the difference between the decoded signal and the input signal. The device comprises: first encoding means for encoding the input audio signal in units of frames of a predetermined length; second encoding means for encoding, in one or more stages, the difference between the signal obtained by decoding the preceding-stage encoding result and the input audio signal; and prediction filter means for generating a prediction residual signal from the signal obtained by decoding the encoding result of the second encoding means. The first encoding means updates the codebook used for encoding based on the prediction of the prediction filter means.
[0028]
According to this configuration, the update follows the result of the determination made on the encoding side, based on the strength of the periodicity of the input audio signal or the like, of whether to update the internal state of the adaptive codebook with the prediction residual signal or with the driving excitation signal. When the periodicity of the encoded audio signal is strong, the internal state of the adaptive codebook is updated using the prediction residual signal obtained from the decoded signals of the higher layers. The encoded code of the hierarchical encoding method can thus be decoded and, as a result, a high-quality audio signal can be obtained.
[0029]
In the hierarchical encoding device of the present invention, the first encoding means performs CELP encoding of the input audio signal and comprises the codebook, which holds driving excitation signals generated in the past, LPC analysis means for obtaining LPC coefficients from the input audio signal, and search means for searching for the driving excitation signal that minimizes the difference from the input audio signal. The prediction filter means forms a prediction filter using the quantized LPC coefficients, and the first encoding means updates the codebook using the result of passing the signal obtained by decoding the encoding result of the second encoding means through the prediction filter.
[0030]
According to this configuration, in hierarchical encoding in which a part that cannot be completely encoded in the upper layer is encoded in the lower layer, a signal obtained by decoding a coded signal in the second and subsequent layers and a signal obtained by encoding the upper layer. By generating a prediction residual signal from the obtained LPC coefficients and updating the adaptive codebook of the upper layer using the predicted residual signal, an adaptive codebook having a driving excitation close to that of audio signal coding is obtained. Encoding can be performed, and high-quality encoding can be performed at a low bit rate.
[0031]
The hierarchical encoding device of the present invention comprises down-sampling means for down-sampling the input audio signal and outputting it to the first encoding means or the second encoding means, and up-sampling means for up-sampling the signal obtained by decoding the preceding-stage encoding result. The second encoding means encodes, in one or more stages, the difference between the up-sampled signal obtained by decoding the preceding-stage encoding result and the input audio signal.
[0032]
According to this configuration, by making the sampling frequency of the signal encoded in a lower layer higher than that of the signal encoded in an upper layer, input signals can be encoded at a variety of sampling frequencies.
[0033]
In the hierarchical encoding device of the present invention, the first encoding means comprises determination means for determining whether the adaptive codebook is to be updated with the prediction residual signal obtained by the prediction of the prediction filter means or with the generated driving excitation signal.
[0034]
In the hierarchical encoding device of the present invention, the first encoding means comprises periodicity calculation means for measuring the periodicity of the input audio signal. The determination means determines that the codebook is to be updated using the prediction residual signal obtained by the prediction of the prediction filter means when the periodicity is equal to or greater than a predetermined threshold, and using the generated driving excitation signal when the periodicity is less than the predetermined threshold.
[0035]
According to these configurations, when the periodicity of the input audio signal is strong, the internal state of the adaptive codebook is updated using the prediction residual signal obtained from the decoded signals of the higher layers, which increases the prediction accuracy of the adaptive codebook and improves performance. When the periodicity of the input audio signal is not strong, the internal state of the adaptive codebook is updated using the driving excitation signal, which improves effectiveness for non-periodic signals.
[0036]
In the hierarchical encoding device of the present invention, the determination means determines which is smaller: the distortion obtained by updating the internal state of the adaptive codebook with the prediction residual signal and actually encoding the input audio signal, or the distortion obtained by updating the internal state of the adaptive codebook with the driving excitation signal and actually encoding the input audio signal. The first encoding means updates the codebook using the signal that yields the smaller distortion.
[0037]
According to this configuration, when determining whether to update the internal state of the adaptive codebook with the prediction residual signal or with the driving excitation signal, two distortions are calculated and compared: the distortion obtained by updating the internal state with the prediction residual signal and actually encoding the input audio signal, and the distortion obtained by updating the internal state with the driving excitation signal and actually encoding the input audio signal. The internal state is then updated with the signal that yields the smaller distortion. Because the signal giving the smaller distortion is always used for the update, quality can be improved.
[0038]
The hierarchical decoding device of the present invention decodes signals produced on the encoding side by encoding an input audio signal, decoding the signal encoded in the preceding stage, and encoding the difference between the decoded signal and the input signal. The device comprises: first decoding means for decoding a signal obtained by encoding the input audio signal in units of frames of a predetermined length; second decoding means for decoding, and adding together, the signals obtained by encoding in one or more stages the difference between the input audio signal and the signal obtained by decoding the preceding-stage encoding result; and prediction filter means for generating a prediction residual signal from the decoding results of the first decoding means and the second decoding means. The first decoding means updates the codebook used for decoding based on the prediction of the prediction filter means.
[0039]
In the hierarchical decoding device of the present invention, the first decoding means decodes a signal in which the input audio signal was encoded by the CELP coding method, the prediction filter means forms a prediction filter using LPC coefficients obtained by decoding the LPC coefficients encoded on the encoding side, and the first decoding means updates the codebook using the result of passing the decoding results of the first decoding means and the second decoding means through the prediction filter.
[0040]
According to these configurations, when decoding a hierarchical encoding in which the portion that the upper layer cannot fully encode is encoded by the lower layers, a prediction residual signal is generated from the signal obtained by decoding the encoded signals of the second and subsequent layers and from the LPC coefficients obtained in the upper layer's encoding, and the adaptive codebook of the upper layer is updated using this prediction residual signal. Decoding is thereby performed with an adaptive codebook whose driving excitation is close to the encoded audio signal, so that a high-quality signal can be decoded at a low bit rate.
[0041]
A hierarchical decoding apparatus according to the present invention includes an up-sampling unit for up-sampling a preceding-stage decoding result, an adding unit for adding the up-sampled decoding result and a subsequent-stage decoding result, and down-sampling the addition result of the adding unit. And a down-sampling unit, wherein the filter unit is configured to generate a prediction residual signal from a decoded result after down-sampling.
[0042]
According to this configuration, by making the sampling frequency of the signal decoded in a lower layer higher than that of the signal decoded in an upper layer, signals encoded at a variety of sampling frequencies can be decoded.
[0043]
In the hierarchical decoding device of the present invention, the first decoding means updates the codebook based on the result of the encoding-side determination of whether the adaptive codebook is to be updated with the prediction residual signal obtained by the prediction of the prediction filter means or with the generated driving excitation signal.
[0044]
According to this configuration, the update follows the result of the determination made on the encoding side, based on the strength of the periodicity of the input audio signal or the like, of whether to update the internal state of the adaptive codebook with the prediction residual signal or with the driving excitation signal. When the periodicity of the encoded audio signal is strong, the internal state of the adaptive codebook is updated using the prediction residual signal obtained from the decoded signals of the higher layers. The encoded code of the hierarchical encoding method can thus be decoded and, as a result, a high-quality audio signal can be obtained.
[0045]
The audio signal transmitting device of the present invention comprises: audio input means for converting an audio signal into an electrical signal; A/D conversion means for converting the signal output from the audio input means into a digital signal; the above hierarchical encoding device, which encodes the digital signal output from the A/D conversion means; RF modulation means for modulating the encoded code output from the encoding device onto a radio-frequency signal; and a transmitting antenna that converts the signal output from the RF modulation means into radio waves and transmits them.
[0046]
According to this configuration, in hierarchical encoding in which a part that cannot be completely encoded in the upper layer is encoded in the lower layer, a signal obtained by decoding a coded signal in the second and subsequent layers and a signal obtained by encoding the upper layer. By generating a prediction residual signal from the obtained LPC coefficient and updating the adaptive codebook of the upper layer using the predicted residual signal, the adaptive codebook having a driving excitation close to the encoding of the acoustic signal is obtained. Encoding can be performed, and high-quality encoding can be performed at a low bit rate.
[0047]
The acoustic signal receiving apparatus according to the present invention includes a receiving antenna for receiving a radio wave, RF demodulating means for demodulating the signal received by the receiving antenna, the above-described hierarchical decoding apparatus, which decodes the information obtained by the RF demodulating means, D/A conversion means for converting the signal output from the decoding apparatus into an analog signal, and acoustic output means for converting the electric signal output from the D/A conversion means into an acoustic signal.
[0048]
According to this configuration, in decoding for a hierarchical encoding method in which a part that cannot be completely encoded in the upper layer is encoded in the lower layer, a prediction residual signal is generated from a signal obtained by decoding the encoded codes of the second and subsequent layers and from the LPC coefficients obtained in the encoding of the upper layer, and the adaptive codebook of the upper layer is updated using this prediction residual signal. Decoding can therefore be performed with an adaptive codebook that holds a driving excitation closer to that of the encoded acoustic signal, and a high-quality signal can be decoded at a low bit rate.
[0049]
The communication terminal device of the present invention employs a configuration including at least one of the above-described acoustic signal transmitting device and the above-described acoustic signal receiving device. The base station apparatus of the present invention employs a configuration including at least one of the above-described acoustic signal transmitting apparatus and the above-described acoustic signal receiving apparatus.
[0050]
According to these configurations, in hierarchical encoding in which a part that cannot be completely encoded in the upper layer is encoded in the lower layer, a prediction residual signal is generated from a signal obtained by decoding the encoded codes of the second and subsequent layers and from the LPC coefficients obtained in the encoding of the upper layer, and the adaptive codebook of the upper layer is updated using this prediction residual signal. Encoding can therefore be performed with an adaptive codebook that holds a driving excitation closer to that of the acoustic signal being encoded, and high-quality encoding can be performed at a low bit rate.
[0051]
BEST MODE FOR CARRYING OUT THE INVENTION
The gist of the present invention is as follows: in hierarchical encoding in which a part that cannot be completely encoded in an upper layer is encoded in a lower layer, a prediction residual signal is generated from a signal obtained by decoding the encoded codes of the second and subsequent layers and from the LPC coefficients obtained in the encoding of the upper layer, and the adaptive codebook of the upper layer is updated using this prediction residual signal. Encoding is thereby performed with an adaptive codebook that holds a driving excitation closer to that of the acoustic signal being encoded, so that high-quality encoding is achieved at a low bit rate.
[0052]
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. The following description of the embodiments covers the case where the number of layers N is 3, but the present invention is not limited to this value and can be applied to any configuration satisfying N ≧ 2.
[0053]
(Embodiment 1)
FIG. 1 is a block diagram showing a configuration of a hierarchical encoding device according to Embodiment 1 of the present invention. The hierarchical encoding device of FIG. 1 mainly includes an input terminal 101, a first layer encoding unit 102, a first layer decoding unit 103, a delay unit 104, a subtractor 105, a second layer encoding unit 106, a second layer decoding unit 107, an adder 108, a delay unit 109, a subtractor 110, a third layer encoding unit 111, a third layer decoding unit 112, an adder 113, a multiplexing unit 114, an output terminal 115, and a prediction filter 116.
[0054]
In the present embodiment, the sampling frequencies of the signals input to each layer are all the same, and the sampling frequency is represented by Fs. An audio signal having a sampling frequency Fs is input from an input terminal 101 and provided to a first layer encoding unit 102.
[0055]
First layer encoding section 102 has an adaptive codebook that holds a previously generated driving excitation signal as its internal state, and by using this adaptive codebook it can efficiently encode a signal having strong periodicity. First layer encoding section 102 determines the first encoded code such that the perceptual distortion between the input acoustic signal and the decoded signal generated after encoding is minimized. A typical method applied to first layer encoding section 102 is code-excited linear prediction (CELP), which will be described later in detail.
[0056]
Then, first layer encoding section 102 outputs the obtained first encoded code to first layer decoding section 103 and multiplexing section 114. First layer decoding section 103 generates a first layer decoded signal using the first encoded code, and outputs the first layer decoded signal to subtractor 105 and adder 108.
[0057]
The delay unit 104 delays the acoustic signal input from the input terminal 101 by a predetermined time length and outputs the delayed audio signal to the subtractor 105. That is, the delay unit 104 has a role of correcting a delay generated in the first layer encoding unit 102 and the first layer decoding unit 103.
[0058]
The subtractor 105 calculates a difference between the output signal of the delay unit 104 and the above-described first layer decoded signal to generate a second layer residual signal. Then, subtracter 105 outputs the second layer residual signal to second layer encoding section 106.
[0059]
Second layer encoding section 106 encodes the second layer residual signal so that quality is improved perceptually, and determines a second encoded code. Then, second layer encoding section 106 outputs the second encoded code to second layer decoding section 107 and multiplexing section 114.
[0060]
Similarly, the second encoded code is given to second layer decoding section 107. Second layer decoding section 107 performs a decoding process using the second encoded code to generate a second layer decoded residual signal, and outputs the second layer decoded residual signal to adder 108.
[0061]
The adder 108 generates the second layer decoded signal by taking the sum of the first layer decoded signal and the second layer decoded residual signal. Then, adder 108 outputs the second layer decoded signal to subtractor 110 and adder 113.
[0062]
Next, delay unit 109 delays the acoustic signal input from input terminal 101 by a predetermined time length and outputs it to subtractor 110. That is, delay unit 109 serves to compensate for the delay generated in the encoding and decoding sections of the preceding stages, specifically first layer encoding section 102, first layer decoding section 103, second layer encoding section 106, and second layer decoding section 107.
[0063]
The subtracter 110 generates a third layer residual signal by taking the difference between the output signal of the delay unit 109 and the above-described second layer decoded signal. Then, subtracter 110 outputs the third layer residual signal to third layer encoding section 111.
[0064]
Third layer encoding section 111 encodes the third layer residual signal so that quality is improved perceptually, determines a third encoded code, and outputs the third encoded code to third layer decoding section 112 and multiplexing section 114.
[0065]
Third layer decoding section 112 performs a decoding process using the third encoded code, generates a third layer decoded residual signal, and outputs the third layer decoded residual signal to adder 113.
[0066]
The adder 113 calculates the sum of the second layer decoded signal and the third layer decoded residual signal, generates a third layer decoded signal, and outputs the third layer decoded signal to the prediction filter 116.
[0067]
The multiplexing unit 114 multiplexes the first coded code, the second coded code, and the third coded code by predetermined means, and generates a coded bit sequence. Then, the multiplexing unit 114 outputs the encoded bit string from the output terminal 115.
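The layered structure of FIG. 1 (encode, decode locally, subtract, and pass the residual to the next layer) can be sketched as follows. This is a minimal sketch under stated assumptions: each layer codec is replaced by a toy uniform scalar quantizer, the delay units are omitted because the toy codec has no delay, and the names `quantize`, `dequantize`, and `hierarchical_encode` are hypothetical.

```python
def quantize(signal, step):
    """Toy stand-in for a layer encoder: uniform scalar quantization to integer codes."""
    return [round(x / step) for x in signal]


def dequantize(codes, step):
    """Toy stand-in for the matching layer decoder."""
    return [c * step for c in codes]


def hierarchical_encode(signal, steps):
    """Encode `signal` with one layer per entry in `steps`.

    Each layer encodes the residual between the input and the sum of all
    previously decoded layers (the roles of subtractors 105/110 and adders 108/113).
    Returns the per-layer codes and the final decoded signal.
    """
    decoded = [0.0] * len(signal)
    layer_codes = []
    for step in steps:
        residual = [x - d for x, d in zip(signal, decoded)]
        codes = quantize(residual, step)
        layer_codes.append(codes)
        decoded_residual = dequantize(codes, step)
        decoded = [d + r for d, r in zip(decoded, decoded_residual)]
    return layer_codes, decoded
```

With three layers whose step sizes shrink from layer to layer, the decoded signal after the third layer is closer to the input than the one after the first layer, which is the property the later discussion of FIG. 3 relies on.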
[0068]
The third layer decoded signal generated by adder 113 is provided to prediction filter 116.
[0069]
Prediction filter 116 applies a prediction filter to the third layer decoded signal to generate a prediction residual signal, and outputs the prediction residual signal to first layer encoding section 102. The prediction filter is configured from the quantized LPC coefficients calculated by first layer encoding section 102. Assuming that the third layer decoded signal is syn3(k), the prediction residual signal is e(k), and the quantized LPC coefficients are αq(i), the prediction residual signal e(k) is expressed by the following equation (1).
[0070]
(Equation 1)
$$e(k) = \mathrm{syn}_3(k) + \sum_{i=1}^{NP} \alpha_q(i)\,\mathrm{syn}_3(k-i)$$
Here, NP represents the order of the LPC coefficients.
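The prediction filtering of equation (1) can be illustrated with a short sketch. The sign convention is an assumption (prediction filter A(z) = 1 + Σ αq(i) z^-i, as in several CELP standards); if the coefficients are defined with the opposite sign, the `+=` below becomes `-=`. The function name and the `history` argument are hypothetical.

```python
def prediction_residual(syn, alpha_q, history=None):
    """Compute e(k) = syn(k) + sum_{i=1..NP} alpha_q[i-1] * syn(k-i).

    `history` holds the NP decoded samples preceding this frame (oldest first);
    it defaults to zeros. The sign convention is an assumption, see the lead-in.
    """
    order = len(alpha_q)
    past = [0.0] * order if history is None else list(history)
    if len(past) != order:
        raise ValueError("history must contain exactly NP samples")
    buf = past + list(syn)
    residual = []
    for k in range(len(syn)):
        acc = buf[order + k]
        for i in range(1, order + 1):
            acc += alpha_q[i - 1] * buf[order + k - i]
        residual.append(acc)
    return residual
```

For a first-order example with αq(1) = -0.5, a decaying input 1, 0.5, 0.25, 0.125 is perfectly predicted after the first sample, so the residual collapses to a single pulse.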
[0071]
First layer coding section 102 uses the prediction residual signal obtained in the above description as an internal state of the adaptive codebook inherent in first layer coding section 102.
[0072]
Hereinafter, details of first layer encoding section 102 will be described. Here, a case where CELP is used for first layer encoding section 102 will be described as an example. However, the only requirement of the present invention is that the first layer encoding section use an encoding method having an adaptive codebook that holds the first-layer driving excitation signal as its internal state; the present invention is not limited to CELP.
[0073]
FIG. 2 is a block diagram showing an internal configuration of the first layer encoding unit of the hierarchical encoding device according to the present embodiment. The first layer encoding section in FIG. 2 is based on a typical configuration of CELP and corresponds to first layer encoding section 102 in FIG. 1. In FIG. 2, first layer encoding section 102 mainly includes input terminal 201, LPC analyzer 202, LPC quantizer 203, LPC decoder 204, perceptual weighting filter 205, perceptually weighted synthesis filter 206, adaptive codebook 207, noise codebook 208, multiplier 209, multiplier 210, gain codebook 211, adder 212, subtractor 213, searcher 214, multiplexing section 215, output terminal 216, output terminal 217, and input terminal 218.
[0074]
An audio signal input from the input terminal 101 of FIG. 1 is input to the input terminal 201. The LPC analyzer 202 calculates an LPC coefficient from the audio signal of the sampling rate Fs input from the input terminal 201. The LPC coefficient is used for improving perceived quality. The LPC analyzer 202 outputs the LPC coefficient to the LPC quantizer 203, the perceptual weight filter 205, and the perceptual weighted synthesis filter 206.
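The text does not say how LPC analyzer 202 computes the LPC coefficients. One common approach, shown here purely as an assumption, is the autocorrelation method with the Levinson-Durbin recursion. The sketch uses the convention A(z) = 1 + Σ a(i) z^-i, omits windowing and lag weighting, and all names are hypothetical.

```python
def autocorrelation(x, order):
    """Return r[0..order], where r[i] = sum_k x[k] * x[k - i]."""
    return [sum(x[k] * x[k - i] for k in range(i, len(x))) for i in range(order + 1)]


def levinson_durbin(r, order):
    """Solve for LPC coefficients a[1..order] of A(z) = 1 + sum a[i] z^-i.

    Returns (coefficients, final prediction error energy).
    """
    if r[0] <= 0.0:
        raise ValueError("r[0] must be positive")
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err  # reflection coefficient of stage m
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]
        new_a[m] = k
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err
```

The autocorrelation sequence 1, 0.5, 0.25 corresponds to a first-order autoregressive process, so a second-order analysis yields a(1) = -0.5 and a(2) = 0.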
[0075]
The LPC quantizer 203 converts the LPC coefficients into parameters suitable for quantization such as LSP coefficients and performs quantization. Then, LPC quantizer 203 outputs the encoded code obtained by the quantization to LPC decoder 204 and multiplexing section 215.
[0076]
The LPC decoder 204 calculates the quantized LSP coefficient from the encoded code, converts the LSP coefficient into an LPC coefficient, and obtains the quantized LPC coefficient. Then, LPC decoder 204 outputs the quantized LPC coefficients to perceptually weighted synthesis filter 206 and output terminal 217. The quantized LPC coefficients are used for encoding an adaptive codebook, an adaptive gain, a noise codebook, and a noise gain. Further, the quantized LPC coefficient is output from the output terminal 217, and is provided to the prediction filter 116 of FIG. 1 as described above, and is used when obtaining the prediction residual signal e (k).
[0077]
Perceptual weighting filter 205 weights the input signal based on the LPC coefficients obtained by LPC analyzer 202. This is done to shape the spectrum of the quantization distortion so that it is masked by the spectral envelope of the input signal. Then, perceptual weighting filter 205 outputs the weighted input signal to subtractor 213.
[0078]
Next, the components that search for the adaptive vector, the adaptive vector gain, the noise vector, and the noise vector gain will be described.
[0079]
Adaptive codebook 207 holds a previously generated driving excitation signal as its internal state, and generates an adaptive vector by repeating this internal state at a desired pitch period. Considering the pitch of actual speech, the pitch period is suitably searched over the range corresponding to pitch frequencies of 60 Hz to 400 Hz. Then, adaptive codebook 207 sequentially outputs the adaptive vectors generated from the held driving excitation signal to multiplier 209.
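Generating an adaptive vector by repeating the stored excitation at a pitch lag can be sketched as below. The sketch assumes integer lags only (fractional-lag interpolation is omitted), and the function names and the 8 kHz sampling rate in the usage note are assumptions, not values from the text.

```python
def lag_range(fs, f_low=60.0, f_high=400.0):
    """Convert the 60 Hz to 400 Hz pitch range into (min_lag, max_lag) in samples."""
    return round(fs / f_high), round(fs / f_low)


def adaptive_vector(past_excitation, lag, length):
    """Build an adaptive vector of `length` samples by repeating the past excitation.

    The vector starts `lag` samples before the end of the stored excitation.
    When `lag` is shorter than `length`, the samples just generated are reused,
    which yields a periodic extension.
    """
    if not 1 <= lag <= len(past_excitation):
        raise ValueError("lag must lie within the stored excitation")
    buf = list(past_excitation)
    start = len(buf) - lag
    for k in range(length):
        buf.append(buf[start + k])
    return buf[len(past_excitation):]
```

At an assumed sampling frequency of 8000 Hz, `lag_range(8000)` gives lags from 20 to 133 samples, so the internal state must hold at least 133 samples.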
[0080]
The multiplier 209 multiplies the adaptive vector by the adaptive vector gain output from the gain codebook 211 and outputs the result to the adder 212.
[0081]
Further, noise codebook 208 outputs either a noise vector stored in advance in a storage area or, as in an algebraic structure, a vector generated according to a rule without using a storage area.
[0082]
The multiplier 210 multiplies the noise vector by the noise vector gain output from the gain codebook 211 and outputs the result to the adder 212.
[0083]
Adder 212 generates a driving excitation signal by adding the adaptive vector multiplied by the adaptive vector gain and the noise vector multiplied by the noise vector gain, and outputs the driving excitation signal to perceptually weighted synthesis filter 206.
[0084]
Perceptually weighted synthesis filter 206 filters the driving excitation signal to generate a perceptually weighted synthesized signal, and outputs the perceptually weighted synthesized signal to subtractor 213.
[0085]
Subtractor 213 subtracts the perceptually weighted synthesized signal from the perceptually weighted input signal, and outputs the resulting signal to searcher 214.
[0086]
Searcher 214 efficiently searches for the combination of adaptive vector, adaptive vector gain, noise vector, and noise vector gain that minimizes the distortion defined from the signal after the subtraction, and outputs the encoded codes found by this search to multiplexing section 215.
[0087]
Specifically, searcher 214 determines the encoded codes i, j, m or the encoded codes i, j, m, n that minimize the distortion defined by the following equation (2) or (3), and sends them to multiplexing section 215.
[0088]
(Equation 2)
$$E = \sum_{k} \left| t(k) - \beta_m\, p_i(k) - \gamma_m\, e_j(k) \right|^2$$
(Equation 3)
$$E = \sum_{k} \left| t(k) - \beta_m\, p_i(k) - \gamma_n\, e_j(k) \right|^2$$
Here, t(k) is the perceptually weighted input signal, pi(k) is the signal obtained by passing the i-th adaptive vector through the perceptually weighted synthesis filter, ej(k) is the signal obtained by passing the j-th noise vector through the perceptually weighted synthesis filter, and β and γ represent the adaptive vector gain and the noise vector gain, respectively. Equations (2) and (3) differ in the configuration of the gain codebook. In the case of equation (2), the gain codebook consists of vectors each having an adaptive vector gain βm and a noise vector gain γm as elements, and a single encoded code m identifying the vector is determined. In the case of equation (3), the gain codebook holds the adaptive vector gain βm and the noise vector gain γn independently, and the encoded codes m and n are determined independently.
[0089]
If the optimization of the adaptive vector, the adaptive vector gain, the noise vector, and the noise vector gain is simultaneously attempted, the amount of calculation becomes enormous, and a countermeasure is required. Generally, a method is adopted in which an optimum vector or value is determined in the order of an adaptive vector, an adaptive vector gain, a noise vector, and a noise vector gain.
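The distortion of equation (2) and a brute-force search over the codes i, j, m can be sketched as follows. This is deliberately the exhaustive joint search that the text calls too costly; a practical coder would instead fix the adaptive vector first and then the remaining parameters in order. The candidate lists are assumed to be already filtered through the perceptually weighted synthesis filter, and all names are hypothetical.

```python
def distortion(t, p, e, beta, gamma):
    """Squared error between target t(k) and beta * p(k) + gamma * e(k)."""
    return sum((tk - beta * pk - gamma * ek) ** 2 for tk, pk, ek in zip(t, p, e))


def search_joint(t, adaptive_cands, noise_cands, gain_pairs):
    """Exhaustively find (distortion, i, j, m) minimizing the equation (2) error.

    adaptive_cands[i] and noise_cands[j] are filtered candidate vectors;
    gain_pairs[m] is a (beta_m, gamma_m) entry of a vector gain codebook.
    """
    best = None
    for i, p in enumerate(adaptive_cands):
        for j, e in enumerate(noise_cands):
            for m, (beta, gamma) in enumerate(gain_pairs):
                d = distortion(t, p, e, beta, gamma)
                if best is None or d < best[0]:
                    best = (d, i, j, m)
    return best
```

The cost of this loop grows with the product of the three codebook sizes, which is why the sequential determination described above is preferred in practice.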
[0090]
After the encoding code is determined by the searcher 214, the multiplexing unit 215 combines these encoded codes into one and outputs it from the output terminal 216.
[0091]
After the above-described encoding process is completed, the internal state of the adaptive codebook is updated in preparation for the encoding process in the next frame (or subframe).
[0092]
Prediction filter 116 outputs a prediction residual signal r(n) using the third layer decoded signal syn(n) obtained from adder 113 and the quantized LPC coefficients αq(i) obtained from first layer encoding section 102. The internal state of the adaptive codebook is updated using this prediction residual signal r(n). Specifically, prediction filter 116 forms a prediction filter from the quantized LPC coefficients αq(i) and inputs the third layer decoded signal syn(n) to this filter to calculate the prediction residual signal r(n) according to the following equation (4).
[0093]
(Equation 4)
$$r(n) = \mathrm{syn}(n) + \sum_{i=1}^{NP} \alpha_q(i)\,\mathrm{syn}(n-i)$$
Here, NP represents the order of the LPC coefficients.
[0094]
The feature of the present invention lies in the above. In the conventional method, the internal state of adaptive codebook 207 is updated using the driving excitation signal obtained by adder 212, whereas in the present invention the internal state of the adaptive codebook is updated using the output signal of prediction filter 116 input from input terminal 218. The effect of the present invention will be described with reference to FIG. 3.
[0095]
FIG. 3 is a diagram showing the relationship between the input audio signal and the corresponding first layer decoded signal, second layer decoded signal, and third layer decoded signal. In the conventional method, the adaptive codebook is updated using the excitation signal corresponding to the first layer decoded signal.
[0096]
Comparing the decoded signals of the layers, the signal closest to the input acoustic signal is the third layer decoded signal, followed by the second layer decoded signal and then the first layer decoded signal. This is because, in the present embodiment, encoding is performed such that the error between the input acoustic signal and the decoded signal decreases as the number of layers increases. On the other hand, the performance of the adaptive codebook improves as its internal state becomes more similar to the input acoustic signal. Therefore, more efficient encoding can be realized when the internal state of the adaptive codebook is updated using the third layer decoded signal. Since the internal state of the adaptive codebook needs to be a driving excitation signal, in practice a prediction residual signal is obtained from the third layer decoded signal using the LPC coefficients, and the internal state of the adaptive codebook is updated using this prediction residual signal.
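The difference between the conventional update and the update described here comes down to which signal is shifted into the adaptive codebook at the end of each frame. The sketch below illustrates only that choice; the class, its buffer handling, and the function name are hypothetical.

```python
class AdaptiveCodebook:
    """Holds the most recent `size` excitation samples as its internal state."""

    def __init__(self, size):
        if size <= 0:
            raise ValueError("size must be positive")
        self.state = [0.0] * size

    def update(self, frame):
        # Shift in the new frame and keep only the newest `size` samples.
        size = len(self.state)
        self.state = (self.state + list(frame))[-size:]


def end_of_frame_update(codebook, own_excitation, prediction_residual=None):
    """Update the codebook after a frame has been encoded or decoded.

    Conventional method: shift in the first layer's own driving excitation.
    Method described in the text: shift in the prediction residual computed
    from a later layer's decoded signal, when it is available.
    """
    if prediction_residual is None:
        codebook.update(own_excitation)
    else:
        codebook.update(prediction_residual)
```

Encoder and decoder must apply the same update, since any mismatch in the internal state would corrupt the adaptive vectors of subsequent frames.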
[0097]
As described above, according to the hierarchical encoding device of the present embodiment, in hierarchical encoding in which a part that cannot be completely encoded in an upper layer is encoded in a lower layer, a prediction residual signal is generated from a signal obtained by decoding the encoded codes of the second and subsequent layers and from the LPC coefficients obtained in the encoding of the upper layer, and the adaptive codebook of the upper layer is updated using this prediction residual signal. Accordingly, encoding can be performed with an adaptive codebook that holds a driving excitation closer to that of the acoustic signal being encoded, and high-quality encoding can be performed at a low bit rate.
[0098]
In the above description, prediction filter 116 creates the prediction residual signal using the third layer decoded signal and the quantized LPC coefficients obtained from first layer encoding section 102, and first layer encoding section 102 updates the internal state of the adaptive codebook using this prediction residual signal. However, prediction filter 116 may instead create the prediction residual signal using the second layer decoded signal. That is, the decoded signal used to generate the prediction residual signal may come from any layer that encodes a residual signal that could not be encoded by the first layer encoding.
[0099]
FIG. 4 is a block diagram showing another configuration of the hierarchical encoding device according to Embodiment 1 of the present invention. In FIG. 4, components having the same reference numerals as those in FIG. 1 have the same functions and will not be described here. A feature of this configuration is that a decoded signal of the intermediate layer (the second layer decoded signal in FIG. 4) is provided to prediction filter 116, and the output signal of prediction filter 116 is used for updating the internal state of adaptive codebook 207. This configuration has the advantage that scalability up to the intermediate layer can be secured.
[0100]
The adder 108 generates the second layer decoded signal by taking the sum of the first layer decoded signal and the second layer decoded residual signal. Then, the adder 108 outputs the second layer decoded signal to the subtractor 110 and the prediction filter 116.
[0101]
The prediction filter 116 applies a prediction filter to the second layer decoded signal, generates a prediction residual signal, and outputs the prediction residual signal to the first layer encoding section 102.
[0102]
First layer encoding section 102 uses the prediction residual signal obtained by prediction filter 116 as the internal state of the adaptive codebook inside first layer encoding section 102. First layer encoding section 102 determines the first encoded code such that the perceptual distortion between the input acoustic signal and the decoded signal generated after encoding is minimized. Then, first layer encoding section 102 outputs the obtained first encoded code to first layer decoding section 103 and multiplexing section 114.
[0103]
As described above, according to the hierarchical encoding device of the present embodiment, the decoded signal of the intermediate layer is provided to the prediction filter, and the output signal of the prediction filter is used for updating the internal state of the adaptive codebook of the first layer encoding, so that scalability up to the intermediate layer can be secured.
[0104]
(Embodiment 2)
In the present embodiment, an example in which a signal encoded by the hierarchical encoding device of Embodiment 1 is decoded will be described. A feature of the present embodiment is that the encoded code of the hierarchical encoding method described in the first embodiment can be decoded, and as a result, a high-quality audio signal can be decoded.
[0105]
FIG. 5 is a block diagram showing a configuration of the hierarchical decoding device according to Embodiment 2 of the present invention. The hierarchical decoding device of FIG. 5 mainly includes an input terminal 301, a separating unit 302, a first layer decoding unit 303, a second layer decoding unit 304, a third layer decoding unit 305, an adder 306, an adder 307, a prediction filter 308, and an output terminal 309.
[0106]
A coded bit sequence coded by the hierarchical coding device of FIG. 1 is input from an input terminal 301.
[0107]
Separating section 302 separates the coded bit sequence to generate the first encoded code obtained by the first layer encoding, the second encoded code obtained by the second layer encoding, and the third encoded code obtained by the third layer encoding. Then, separating section 302 outputs the first encoded code to first layer decoding section 303, the second encoded code to second layer decoding section 304, and the third encoded code to third layer decoding section 305.
[0108]
First layer decoding section 303 performs decoding processing using the first encoded code obtained in separation section 302, and generates a first layer decoded signal.
[0109]
Next, second layer decoding section 304 performs a decoding process using the second encoded code obtained in separation section 302, and generates a second layer decoded residual signal. The adder 306 adds the above-described first layer decoded signal and second layer decoded residual signal to generate a second layer decoded signal. Then, adder 306 outputs the second layer decoded signal to adder 307.
[0110]
Next, third layer decoding section 305 performs a decoding process using the third encoded code obtained in separation section 302, and generates a third layer decoded residual signal. The adder 307 adds the above-described second-layer decoded signal and the third-layer decoded residual signal to generate a third-layer decoded signal. The adder 307 outputs the third layer decoded signal to the prediction filter 308 and the output terminal 309.
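The role of adders 306 and 307 can be sketched as a running sum over the per-layer decoded signals. The sketch assumes the layer decoders have already produced sample lists of equal length; the function name and the optional `num_layers` argument (used to illustrate decoding from only the first few layers) are hypothetical.

```python
def hierarchical_decode(decoded_layers, num_layers=None):
    """Sum the first layer decoded signal and the decoded residuals of later layers.

    decoded_layers[0] is the first layer decoded signal; decoded_layers[1:] are
    the decoded residual signals of the second and subsequent layers.
    `num_layers` limits how many layers are added.
    """
    if not decoded_layers:
        raise ValueError("at least the first layer is required")
    count = len(decoded_layers) if num_layers is None else num_layers
    if not 1 <= count <= len(decoded_layers):
        raise ValueError("num_layers out of range")
    out = [0.0] * len(decoded_layers[0])
    for layer in decoded_layers[:count]:
        out = [o + x for o, x in zip(out, layer)]
    return out
```

Stopping after two layers yields the second layer decoded signal, and adding the third layer's residual yields the third layer decoded signal that is passed to prediction filter 308.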
[0111]
The prediction filter 308 performs a process similar to that of the prediction filter 116 of the first embodiment described above, and generates a prediction residual signal. As the quantized LPC coefficients used in the prediction filter 308, decoded LPC coefficients obtained by the first layer decoding unit are used. The prediction residual signal generated by the prediction filter 308 is provided to the first layer decoding unit, and is used for updating the internal state of the adaptive codebook inherent in the first layer decoding unit.
[0112]
To describe this in detail, first layer decoding section 303 will be described next. Here, a case where CELP is used for first layer decoding section 303 will be described as an example. However, the only requirement of the present invention is that the first layer decoding section use a decoding method having an adaptive codebook; the present invention is not limited to CELP.
[0113]
FIG. 6 is a block diagram showing an internal configuration of the first layer decoding unit of the hierarchical decoding device according to the present embodiment. The first layer decoding unit in FIG. 6 is based on a typical configuration of CELP. In FIG. 6, the portion surrounded by a broken-line frame corresponds to first layer decoding section 303 in FIG. 5. In FIG. 6, first layer decoding section 303 mainly includes input terminal 401, separation section 402, adaptive codebook 403, noise codebook 404, gain codebook 405, multiplier 406, multiplier 407, adder 408, LPC decoder 409, synthesis filter 410, output terminal 412, output terminal 413, and input terminal 414.
[0114]
Separating section 402 separates the coded code from the first coded code input from input terminal 401, and outputs the coded code to adaptive codebook 403, noise codebook 404, gain codebook 405, and LPC decoder 409.
[0115]
LPC decoder 409 decodes the LPC coefficient using the given encoded code, and outputs the result to synthesis filter 410 and output terminal 412. The LPC coefficient output from the output terminal 412 is used in the prediction filter 308 described above.
[0116]
Next, adaptive codebook 403 decodes adaptive vector q (k) using the encoded code and outputs the result to multiplier 406. The noise codebook 404 decodes the noise vector c (k) using the encoded code and outputs the result to the multiplier 407.
[0117]
Gain codebook 405 decodes the adaptive vector gain βq and the noise vector gain γq using the encoded code. Then, gain codebook 405 outputs the adaptive vector gain βq to multiplier 406, and outputs the noise vector gain γq to multiplier 407.
[0118]
Multiplier 406 multiplies the adaptive vector by the adaptive vector gain and outputs the result to adder 408. Multiplier 407 multiplies the noise vector by the noise vector gain, and outputs the result to adder 408. Adder 408 adds the gain-multiplied adaptive vector and noise vector to generate a driving excitation signal. When the driving excitation signal is expressed as ex(k), it is obtained by the following equation (5).
[0119]
(Equation 5)
$$ex(k) = \beta_q\, q(k) + \gamma_q\, c(k)$$
Next, using the decoded LPC coefficient and the driving excitation signal ex (k), the synthesis filter 410 generates a synthesis signal syn (k) according to the following equation (6).
[0120]
(Equation 6)
$$\mathrm{syn}(k) = ex(k) - \sum_{i=1}^{NP} \alpha_q(i)\,\mathrm{syn}(k-i)$$
Here, αq(i) represents the decoded LPC coefficients, and NP represents the order of the LPC coefficients. The decoded signal syn(k) obtained by the above operation is output from output terminal 413.
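The excitation construction of equation (5) and the synthesis filtering of equation (6) can be sketched together. The sign convention is an assumption (synthesis filter 1/A(z) with A(z) = 1 + Σ αq(i) z^-i), chosen so that the synthesis filter is the exact inverse of a prediction filter of the form e(k) = syn(k) + Σ αq(i) syn(k-i). Function names and the `history` argument are hypothetical.

```python
def excitation(q, c, beta_q, gamma_q):
    """Equation (5): ex(k) = beta_q * q(k) + gamma_q * c(k)."""
    return [beta_q * qk + gamma_q * ck for qk, ck in zip(q, c)]


def synthesize(ex, alpha_q, history=None):
    """Equation (6) under the assumed sign convention:
    syn(k) = ex(k) - sum_{i=1..NP} alpha_q[i-1] * syn(k-i).

    `history` holds the NP previously synthesized samples (oldest first).
    """
    order = len(alpha_q)
    buf = [0.0] * order if history is None else list(history)
    if len(buf) != order:
        raise ValueError("history must contain exactly NP samples")
    for k in range(len(ex)):
        acc = ex[k]
        for i in range(1, order + 1):
            acc -= alpha_q[i - 1] * buf[-i]
        buf.append(acc)
    return buf[order:]
```

With αq(1) = -0.5, a unit pulse of excitation produces the decaying response 1, 0.5, 0.25, 0.125; applying the matching prediction filter to that output would recover the pulse.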
[0121]
After the above-described decoding processing is completed, the internal state of the adaptive codebook is updated using the latest driving excitation signal in preparation for the decoding processing in the next frame (or subframe).
[0122]
The feature of the present invention lies in the above. In the conventional method, the internal state of adaptive codebook 403 is updated using the driving excitation signal obtained by adder 408, whereas in the present invention the internal state of the adaptive codebook is updated using the output signal (prediction residual signal) of prediction filter 308 input from input terminal 414.
[0123]
As described above, according to the hierarchical decoding device of the present embodiment, in decoding for a hierarchical encoding method in which a part that cannot be completely encoded in the upper layer is encoded in the lower layer, a prediction residual signal is generated from the signal obtained by decoding the encoded codes of the second and subsequent layers and from the LPC coefficients obtained in the encoding of the upper layer, and the adaptive codebook of the upper layer is updated using this prediction residual signal. It is therefore possible to decode with an adaptive codebook that holds a driving excitation closer to that of the encoded acoustic signal, and to decode a high-quality signal at a low bit rate.
[0124]
In the above description, prediction filter 308 creates the prediction residual signal using the third layer decoded signal and the quantized LPC coefficients obtained from first layer decoding section 303, and the internal state of adaptive codebook 403 is updated using this prediction residual signal. However, prediction filter 308 may instead create the prediction residual signal using the second layer decoded signal. That is, the decoded signal used to generate the prediction residual signal may come from any layer that encodes a residual signal that could not be encoded by the first layer encoding.
[0125]
FIG. 7 is a block diagram showing another configuration of the hierarchical decoding device according to Embodiment 2 of the present invention. In FIG. 7, components having the same reference numerals as those in FIG. 5 have the same functions, and a description thereof will be omitted. A feature of this configuration is that the decoded signal of the intermediate layer (the second layer decoded signal in FIG. 7) is supplied to prediction filter 308, and the output signal of prediction filter 308 is used for updating the internal state of adaptive codebook 403 in FIG. 6. This configuration has the advantage that scalability up to the intermediate layer can be secured.
[0126]
The adder 306 adds the first layer decoded signal and the second layer decoded residual signal to generate a second layer decoded signal. Then, the adder 306 outputs the second layer decoded signal to the adder 307 and the prediction filter 308.
[0127]
Next, third layer decoding section 305 performs a decoding process using the third encoded code obtained in separation section 302, and generates a third layer decoded residual signal. The adder 307 adds the above-described second-layer decoded signal and the third-layer decoded residual signal to generate a third-layer decoded signal. Adder 307 outputs the third layer decoded signal to output terminal 309.
[0128]
The prediction filter 308 generates a prediction residual signal from the quantized LPC coefficients generated by the first layer decoding unit 303 and the second layer decoded signal generated by the adder 306. Then, the prediction residual signal generated by the prediction filter 308 is provided to the first layer decoding unit, and is used for updating the internal state of the adaptive codebook inherent in the first layer decoding unit.
[0129]
Thus, according to the hierarchical decoding apparatus of the present embodiment, the decoded signal of the intermediate layer is provided to the prediction filter and the output signal of the prediction filter is used for updating the internal state of the adaptive codebook of the first layer decoding, so that scalability up to the intermediate layer can be secured.
[0130]
(Embodiment 3)
FIG. 8 is a block diagram showing a configuration of a hierarchical encoding device according to Embodiment 3 of the present invention. The hierarchical encoding device of FIG. 8 mainly includes an input terminal 501, a DS1 unit 502, a first layer encoding unit 503, a first layer decoding unit 504, a US1 unit 505, a DS2 unit 506, a delay unit 507, a subtractor 508, a second layer encoding unit 509, a second layer decoding unit 510, an adder 511, a US2 unit 512, a delay unit 513, a subtractor 514, a third layer encoding unit 515, a third layer decoding unit 516, an adder 517, a multiplexing unit 518, an output terminal 519, a DS3 unit 520, and a prediction filter 521.
[0131]
The hierarchical encoding device of FIG. 8 decodes the encoded code of an upper layer and encodes, in a lower layer, the difference between the up-sampled decoded signal and the input audio signal. It differs from the hierarchical encoding device of FIG. 1 in that the sampling frequency of the signal encoded in a lower layer is higher than the sampling frequency of the signal encoded in an upper layer.
[0132]
The present embodiment is characterized in that the sampling frequency of a signal input to each layer has a relationship represented by the following equation (7).
[0133]
(Equation 7)

Here, Fs (n) represents the sampling frequency of the signal of the n-th layer. According to the present embodiment, it is possible to perform encoding corresponding to a plurality of sampling frequencies.
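Equation (7) does not survive in this text. Judging from the sampling-rate conversions described for FIG. 8 (Fs (3) to Fs (1) in the DS1 unit, Fs (1) to Fs (2) in the US1 unit, and Fs (2) to Fs (3) in the US2 unit), the relationship is presumably an ordering of the following form. Whether the inequalities are strict is an assumption made here, not something the text states.

```latex
F_s(1) \le F_s(2) \le F_s(3)
```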
[0134]
An audio signal of the sampling frequency Fs (3) is input from the input terminal 501 and is provided to the DS1 unit 502.
[0135]
The DS1 unit 502 downsamples the input audio signal and reduces the sampling frequency of the input audio signal from Fs (3) to Fs (1). Then, DS1 section 502 outputs an input signal of sampling frequency Fs (1) to first layer encoding section 503.
[0136]
First layer encoding section 503 has an adaptive codebook that holds a previously generated driving excitation signal as its internal state, and can efficiently encode a highly periodic signal by using the adaptive codebook. First layer encoding section 503 determines the first encoded code such that the audible distortion between the input audio signal and the decoded signal generated after encoding is minimized. A typical method applied to first layer encoding section 503 is code excited linear prediction (CELP).
[0137]
Then, first layer encoding section 503 outputs the obtained first encoded code to first layer decoding section 504 and multiplexing section 518. First layer decoding section 504 generates a first layer decoded signal using the first encoded code, and outputs the first layer decoded signal to US1 section 505.
[0138]
US1 section 505 up-samples the first layer decoded signal and increases the sampling frequency from Fs (1) to Fs (2). Then, US1 section 505 outputs the first layer decoded signal of sampling frequency Fs (2) to subtractor 508 and adder 511.
[0139]
Next, an acoustic signal input from the input terminal 501 is provided to the DS2 unit 506. The DS2 unit 506 downsamples the input audio signal and reduces the sampling frequency of the input audio signal from Fs (3) to Fs (2). Then, the DS2 unit 506 outputs the input signal of the sampling frequency Fs (2) to the delay unit 507.
[0140]
The delay unit 507 delays the audio signal input from the input terminal 501 by a predetermined time length and outputs the delayed audio signal to the subtractor 508. That is, it has a role of correcting a delay generated in the DS1 unit 502, the first layer encoding unit 503, the first layer decoding unit 504, the US1 unit 505, and the DS2 unit 506.
[0141]
The subtractor 508 calculates a difference between the output signal of the delay unit 507 and the above-described first layer decoded signal, and generates a second layer residual signal. Then, subtracter 508 outputs the second layer residual signal to second layer encoding section 509.
[0142]
Second layer encoding section 509 encodes the second layer residual signal so that the quality is perceptually improved, and determines a second encoded code. Then, second layer encoding section 509 outputs the second encoded code to second layer decoding section 510 and multiplexing section 518.
[0143]
Second layer decoding section 510 performs a decoding process using the second encoded code, generates a second layer decoded residual signal, and outputs the second layer decoded residual signal to adder 511.
[0144]
The adder 511 takes the sum of the first layer decoded signal and the second layer decoded residual signal, and generates a second layer decoded signal. Then, adder 511 outputs the second layer decoded signal to US2 section 512.
[0145]
US2 section 512 upsamples the second layer decoded signal and increases the sampling frequency from Fs (2) to Fs (3). Then, US2 section 512 outputs the second layer decoded signal of sampling frequency Fs (3) to subtractor 514 and adder 517.
[0146]
Next, the delay unit 513 delays the audio signal input from the input terminal 501 by a predetermined time length, and then outputs the audio signal to the subtractor 514. That is, the delay unit 513 has a role of correcting a delay occurring in the encoding unit and the decoding unit up to the previous stage, specifically, a delay occurring in the signal processing from the DS1 unit 502 to the US2 unit 512.
[0147]
The subtractor 514 calculates a difference between the output signal of the delay unit 513 and the above-described second layer decoded signal, and generates a third layer residual signal. Then, subtracter 514 outputs the third layer residual signal to third layer encoding section 515.
[0148]
Third layer encoding section 515 determines a third encoded code by encoding the third layer residual signal such that quality is perceptually improved, and outputs the third encoded code to third layer decoding section 516 and multiplexing section 518.
[0149]
Third layer decoding section 516 performs a decoding process using the third encoded code, generates a third layer decoded residual signal, and outputs the third layer decoded residual signal to adder 517.
[0150]
Adder 517 takes the sum of the second layer decoded signal and the third layer decoded residual signal, generates a third layer decoded signal, and outputs the third layer decoded signal to DS3 section 520.
[0151]
The multiplexing unit 518 multiplexes the first coded code, the second coded code, and the third coded code by predetermined means, and generates a coded bit sequence. Then, the multiplexing unit 518 outputs the encoded bit sequence from the output terminal 519.
[0152]
DS3 section 520 downsamples the third layer decoded signal and reduces the sampling frequency of the third layer decoded signal from Fs (3) to Fs (1). Then, DS3 section 520 outputs the third layer decoded signal of sampling frequency Fs (1) to prediction filter 521.
[0153]
The prediction filter 521 applies a prediction filter to the third layer decoded signal to generate a prediction residual signal, and outputs the prediction residual signal to the first layer encoding section 503. The prediction filter is configured by the quantized LPC coefficients calculated by the first layer encoding unit 503. Assuming that the third layer decoded signal output from the DS3 unit 520 is syn3 (k), the prediction residual signal is e (k), and the quantized LPC coefficient is αq (i), the prediction residual signal e (k) is represented by the following equation (8).
[0154]
(Equation 8)

Here, NP represents the order of the LPC coefficient.
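Equation (8) is not reproduced in this text. As an illustration only, the sketch below implements an LPC inverse filter of the usual form, e(k) = syn3(k) - sum over i = 1..NP of αq(i) * syn3(k - i). The sign convention (A(z) = 1 - sum of αq(i) z^-i), the zero initial history, and the function names are assumptions, not taken from the patent.

```python
def prediction_residual(syn, alpha_q):
    """Inverse (analysis) LPC filter.

    Assumed convention: e(k) = syn(k) - sum_{i=1..NP} alpha_q[i-1] * syn(k-i).
    Samples before the start of `syn` are treated as zero (an assumption).
    """
    order = len(alpha_q)  # NP, the order of the LPC coefficients
    residual = []
    for k in range(len(syn)):
        predicted = 0.0
        for i in range(1, order + 1):
            if k - i >= 0:
                predicted += alpha_q[i - 1] * syn[k - i]
        residual.append(syn[k] - predicted)
    return residual


def synthesize(excitation, alpha_q):
    """Matching synthesis filter 1/A(z); included only to check the round trip."""
    out = []
    for k, x in enumerate(excitation):
        acc = x
        for i in range(1, len(alpha_q) + 1):
            if k - i >= 0:
                acc += alpha_q[i - 1] * out[k - i]
        out.append(acc)
    return out
```

Under this convention, filtering a synthesized signal with the same coefficients returns the excitation that produced it, which is why the residual of a decoded signal is a plausible replacement for the stored excitation.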
[0155]
The first layer encoding section 503 uses the prediction residual signal obtained by the above-described operation as the internal state of the adaptive codebook contained in the first layer encoding section 503.
[0156]
As described above, according to the hierarchical coding device of the present embodiment, the sampling frequency of a signal to be coded in a lower layer is higher than the sampling frequency of a signal to be coded in a higher layer, so that input signals of various sampling frequencies can be encoded.
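The signal flow of FIG. 8 can be sketched as follows. This is a toy model under stated assumptions: the rate ratios are fixed at Fs(2) = 2 Fs(1) and Fs(3) = 4 Fs(1), each layer codec is replaced by a plain uniform quantizer, the resamplers are block averaging and sample repetition, and the delay units are omitted because these toy resamplers introduce no delay. None of these choices come from the patent.

```python
def downsample(x, factor):
    """Average each block of `factor` samples (crude low-pass plus decimation)."""
    return [sum(x[i:i + factor]) / factor
            for i in range(0, len(x) - factor + 1, factor)]


def upsample(x, factor):
    """Repeat each sample `factor` times (zero-order hold)."""
    return [v for v in x for _ in range(factor)]


def quantize(x, step):
    """Stand-in for a layer encoder: one integer code per sample."""
    return [round(v / step) for v in x]


def dequantize(codes, step):
    """Stand-in for the matching layer decoder."""
    return [c * step for c in codes]


def encode_three_layers(x3, steps=(0.5, 0.25, 0.125)):
    """x3 is sampled at Fs(3); its length should be a multiple of 4."""
    # Layer 1 at Fs(1)
    x1 = downsample(x3, 4)                                   # DS1 unit 502
    code1 = quantize(x1, steps[0])                           # encoder 503
    dec1 = upsample(dequantize(code1, steps[0]), 2)          # decoder 504, US1 unit 505
    # Layer 2 at Fs(2)
    x2 = downsample(x3, 2)                                   # DS2 unit 506
    res2 = [a - b for a, b in zip(x2, dec1)]                 # subtractor 508
    code2 = quantize(res2, steps[1])                         # encoder 509
    dec2 = [a + b for a, b in zip(dec1, dequantize(code2, steps[1]))]  # adder 511
    dec2_up = upsample(dec2, 2)                              # US2 unit 512
    # Layer 3 at Fs(3)
    res3 = [a - b for a, b in zip(x3, dec2_up)]              # subtractor 514
    code3 = quantize(res3, steps[2])                         # encoder 515
    dec3 = [a + b for a, b in zip(dec2_up, dequantize(code3, steps[2]))]  # adder 517
    return (code1, code2, code3), dec3
```

In the device of FIG. 8, the third layer decoded signal returned here as `dec3` would then pass through the DS3 unit and the prediction filter before being fed back to the first layer encoder.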
[0157]
In the above description, the prediction filter 521 creates a prediction residual signal using the third layer decoded signal and the quantized LPC coefficients obtained from the first layer encoding section 503, and the first layer encoding section 503 updates the internal state of the adaptive codebook using this prediction residual signal. However, the prediction filter 521 may instead create the prediction residual signal using the second layer decoded signal. That is, the decoded signal used to generate the prediction residual signal may come from any layer, as long as that layer encodes a residual signal that could not be encoded by the first layer encoding.
[0158]
FIG. 9 is a block diagram showing a configuration of a hierarchical encoding device according to Embodiment 3 of the present invention. In FIG. 9, components denoted by the same reference numerals as those in FIG. 8 have the same functions and will not be described here. A feature of the present embodiment is that a decoded signal of the intermediate layer (the decoded signal of the second layer in FIG. 9) is provided to prediction filter 521, and the output signal is used for updating the internal state of adaptive codebook 207. According to this configuration, there is a feature that scalability up to the intermediate layer can be secured.
[0159]
The adder 511 takes the sum of the first layer decoded signal and the second layer decoded residual signal, and generates a second layer decoded signal. Then, adder 511 outputs the second layer decoded signal to US2 section 512.
[0160]
US2 section 512 upsamples the second layer decoded signal and increases the sampling frequency from Fs (2) to Fs (3). Then, US2 section 512 outputs the second layer decoded signal of sampling frequency Fs (3) to subtractor 514 and DS3 section 520.
[0161]
DS3 section 520 downsamples the second layer decoded signal received from US2 section 512 and reduces its sampling frequency from Fs (3) to Fs (1). Then, DS3 section 520 outputs the second layer decoded signal of sampling frequency Fs (1) to prediction filter 521.
[0162]
The prediction filter 521 applies a prediction filter to the second layer decoded signal to generate a prediction residual signal, and outputs the prediction residual signal to the first layer encoding section 503.
[0163]
The first layer encoding section 503 uses the prediction residual signal obtained by the prediction filter 521 as the internal state of the adaptive codebook contained in the first layer encoding section 503. First layer encoding section 503 determines the first encoded code such that the audible distortion between the input audio signal and the decoded signal generated after encoding is minimized. Then, first layer encoding section 503 outputs the obtained first encoded code to first layer decoding section 504 and multiplexing section 518.
[0164]
As described above, according to the hierarchical encoding device of the present embodiment, the decoded signal of the intermediate layer is provided to the prediction filter and the output signal of the prediction filter is used for updating the internal state of the adaptive codebook of the first layer encoding, so that scalability up to the intermediate layer can be secured.
[0165]
(Embodiment 4)
In the present embodiment, an example will be described in which a signal encoded by the hierarchical encoding device of Embodiment 3 is decoded. A feature of the present embodiment is that the encoded code of the hierarchical encoding method described in the third embodiment can be decoded, and as a result, a high-quality audio signal can be decoded.
[0166]
FIG. 10 is a block diagram showing a configuration of a hierarchical decoding device according to Embodiment 4 of the present invention. The hierarchical decoding device of FIG. 10 mainly includes an input terminal 601, a separating unit 602, a first layer decoding unit 603, a US1 unit 604, an adder 605, a second layer decoding unit 606, a US2 unit 607, a third layer decoding unit 608, an adder 609, an output terminal 610, a DS3 unit 611, and a prediction filter 612.
[0167]
A coded bit sequence coded by the hierarchical coding device of FIG. 8 is input from an input terminal 601.
[0168]
Separating section 602 separates the coded bit sequence into a first encoded code obtained by first layer coding, a second encoded code obtained by second layer coding, and a third encoded code obtained by third layer coding. Then, separating section 602 outputs the first encoded code to first layer decoding section 603, outputs the second encoded code to second layer decoding section 606, and outputs the third encoded code to third layer decoding section 608.
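The text leaves the framing of the coded bit sequence to "predetermined means". Purely as an illustration, the sketch below uses a hypothetical length-prefixed layout (a 2-byte big-endian length before each layer code) and shows how a separating step could recover the layer codes, including from a sequence cut off after a lower layer. The layout and the function names are assumptions, not the patent's format.

```python
import struct


def multiplex(code1: bytes, code2: bytes, code3: bytes) -> bytes:
    """Pack three layer codes, each preceded by a 2-byte big-endian length."""
    frame = b""
    for code in (code1, code2, code3):
        frame += struct.pack(">H", len(code)) + code
    return frame


def separate(frame: bytes) -> list:
    """Recover as many complete layer codes as the frame contains (at most 3)."""
    codes, pos = [], 0
    while pos + 2 <= len(frame) and len(codes) < 3:
        (length,) = struct.unpack_from(">H", frame, pos)
        if pos + 2 + length > len(frame):
            break  # truncated layer: keep only the complete lower layers
        codes.append(frame[pos + 2:pos + 2 + length])
        pos += 2 + length
    return codes
```

Placing the first layer code first means a receiver that gets only a prefix of the frame can still decode the lower layers, which is the property a scalable scheme relies on.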
[0169]
First layer decoding section 603 performs a decoding process using the first encoded code obtained in separation section 602, and generates a first layer decoded signal.
[0170]
US1 section 604 upsamples the first layer decoded signal and increases the sampling frequency from Fs (1) to Fs (2). Then, US1 section 604 outputs the first layer decoded signal of sampling frequency Fs (2) to adder 605.
[0171]
Next, second layer decoding section 606 performs a decoding process using the second encoded code obtained in separation section 602, and generates a second layer decoded residual signal. The adder 605 adds the first layer decoded signal and the second layer decoded residual signal to generate a second layer decoded signal. Then, adder 605 outputs the second layer decoded signal to US2 section 607.
[0172]
US2 section 607 up-samples the second layer decoded signal and increases the sampling frequency from Fs (2) to Fs (3). Then, US2 section 607 outputs the second layer decoded signal of sampling frequency Fs (3) to adder 609.
[0173]
Next, third layer decoding section 608 performs a decoding process using the third encoded code obtained in separation section 602, and generates a third layer decoded residual signal. The adder 609 adds the above-described second layer decoded signal and third layer decoded residual signal to generate a third layer decoded signal. Adder 609 outputs the third layer decoded signal to DS3 section 611 and output terminal 610.
[0174]
DS3 section 611 down-samples the third layer decoded signal, and reduces the sampling frequency of the third layer decoded signal from Fs (3) to Fs (1). Then, DS3 section 611 outputs the third layer decoded signal of sampling frequency Fs (1) to prediction filter 612.
[0175]
The prediction filter 612 performs the same processing as that of the prediction filter 116 according to the first embodiment, and generates a prediction residual signal. As the quantized LPC coefficient used in the prediction filter 612, a decoded LPC coefficient obtained by the first layer decoding unit is used. Further, the prediction residual signal generated by prediction filter 612 is provided to the first layer decoding unit, and is used for updating the internal state of the adaptive codebook inherent in the first layer decoding unit.
[0176]
In the above description, the prediction filter 612 creates a prediction residual signal using the third layer decoded signal and the quantized LPC coefficients obtained from the first layer decoding unit 603, and the adaptive codebook in the first layer decoding unit 603 updates its internal state using this prediction residual signal. However, the prediction filter 612 may instead generate the prediction residual signal using the second layer decoded signal. That is, the decoded signal used to generate the prediction residual signal may come from any layer, as long as that layer encodes a residual signal that could not be encoded by the first layer encoding.
[0177]
FIG. 11 is a block diagram showing a configuration of a hierarchical decoding device according to Embodiment 4 of the present invention. Components having the same configuration as in FIG. 10 are assigned the same reference numerals as in FIG. 10, and detailed descriptions thereof are omitted. A feature of the present embodiment is that the decoded signal of the intermediate layer (the second layer decoded signal in FIG. 11) is provided to prediction filter 612, and the output signal of prediction filter 612 is used to update the internal state of the adaptive codebook in first layer decoding section 603. This configuration secures scalability up to the intermediate layer.
[0178]
The adder 605 adds the first layer decoded signal and the second layer decoded residual signal to generate a second layer decoded signal. Then, the adder 605 outputs the second layer decoded signal to the US2 unit 607 and the DS3 unit 611.
[0179]
US2 section 607 up-samples the second layer decoded signal and increases the sampling frequency from Fs (2) to Fs (3). Then, US2 section 607 outputs the second layer decoded signal of sampling frequency Fs (3) to adder 609.
[0180]
DS3 section 611 down-samples the second layer decoded signal, and reduces the sampling frequency of the second layer decoded signal from Fs (2) to Fs (1). Then, DS3 section 611 outputs the second layer decoded signal of sampling frequency Fs (1) to prediction filter 612.
[0181]
Thus, according to the hierarchical decoding apparatus of the present embodiment, the decoded signal of the intermediate layer is provided to the prediction filter and the output signal of the prediction filter is used for updating the internal state of the adaptive codebook of the first layer decoding, so that scalability up to the intermediate layer can be secured.
[0182]
(Embodiment 5)
FIG. 12 is a block diagram showing a configuration of the first layer encoding unit of the hierarchical encoding device according to Embodiment 5 of the present invention. Components having the same configuration as in FIG. 2 are denoted by the same reference numerals as in FIG. 2, and detailed description is omitted. The first layer encoding unit in FIG. 12 includes a periodicity calculation unit 701, a determination unit 702, a switch unit 703, an adaptive codebook 704, and a multiplexer 705. The distinguishing point of this first layer encoding unit is that, when the internal state of the adaptive codebook is updated, either the prediction residual signal input from the input terminal 218 or the driving excitation signal output from the adder 212 is selected according to the strength of the periodicity of the input audio signal.
[0183]
The periodicity calculation unit 701 performs processing such as correlation analysis on the audio signal input from the input terminal 201 to quantify the degree of periodicity of the input audio signal, and outputs the degree of periodicity to the determination unit 702.
[0184]
The determination unit 702 compares the degree of periodicity with a predetermined threshold. When the degree of periodicity exceeds the threshold, the determination unit 702 regards the periodicity of the input audio signal as strong, sets the flag to "0", and outputs it to the multiplexer 705. When the degree of periodicity is equal to or smaller than the threshold, the determination unit 702 regards the periodicity of the input audio signal as weak, sets the flag to "1", and outputs it to the multiplexer 705.
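The periodicity measurement and threshold decision can be sketched as below. The text only says "processing such as correlation analysis", so the specific measure used here (the maximum normalized autocorrelation over a range of candidate lags), the lag range, and the threshold value 0.7 are all assumptions chosen for illustration.

```python
def periodicity_degree(x, min_lag, max_lag):
    """Largest normalized autocorrelation of x over the candidate lags."""
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        idx = range(lag, len(x))
        num = sum(x[n] * x[n - lag] for n in idx)
        e0 = sum(x[n] ** 2 for n in idx)
        e1 = sum(x[n - lag] ** 2 for n in idx)
        if e0 > 0.0 and e1 > 0.0:
            best = max(best, num / (e0 * e1) ** 0.5)
    return best


def decide_flag(x, threshold=0.7, min_lag=2, max_lag=20):
    """Return 0 when periodicity exceeds the threshold (strong), else 1 (weak)."""
    return 0 if periodicity_degree(x, min_lag, max_lag) > threshold else 1
```

A signal that repeats exactly every four samples reaches a degree close to 1 at lag 4 and yields flag 0, while an isolated impulse has no correlated lag and yields flag 1.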
[0185]
The switch unit 703 switches a signal used for updating the internal state of the adaptive codebook 704 according to the flag obtained from the determination unit 702. When the flag is 0, the switch unit 703 connects a switch so that the prediction residual signal input from the input terminal 218 is used as a signal used for updating the internal state of the adaptive codebook 704. Similarly, when the flag is 1, the switch unit 703 connects a switch so that the driving excitation signal output from the adder 212 is used as a signal used for updating the internal state of the adaptive codebook 704.
[0186]
Adaptive codebook 704 holds a previously generated drive excitation signal as an internal state, and generates an adaptive vector by repeating this internal state at a desired pitch cycle. That is, when determining section 702 determines that the periodicity of the input audio signal is strong, adaptive codebook 704 updates the internal state using the prediction residual signal input from input terminal 218. When determining section 702 determines that the periodicity of the input acoustic signal is weak, adaptive codebook 704 updates the internal state using the drive excitation signal output from adder 212. Then, adaptive codebook 704 sequentially outputs the driving excitation signal held therein to multiplier 209 as an adaptive vector.
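The behavior of the adaptive codebook and the switch can be sketched as follows. The buffer length, the class and method names, and the convention that the newest sample sits at the end of the buffer are assumptions made for illustration; only the flag meaning (0 selects the prediction residual, 1 selects the driving excitation) follows the text.

```python
class AdaptiveCodebook:
    """Holds past excitation samples; the newest sample is at the end."""

    def __init__(self, size):
        self.state = [0.0] * size

    def adaptive_vector(self, lag, length):
        """Repeat the last `lag` samples of the state to fill `length` samples."""
        if not 1 <= lag <= len(self.state):
            raise ValueError("lag out of range")
        period = self.state[-lag:]
        return [period[n % lag] for n in range(length)]

    def update(self, flag, prediction_residual, excitation):
        """Select the update signal by flag, then shift it into the state."""
        new = prediction_residual if flag == 0 else excitation
        size = len(self.state)
        self.state = (self.state + list(new))[-size:]
```

For example, after `update(1, ..., [1.0, 2.0])` on a four-sample codebook the state ends in `1.0, 2.0`, and `adaptive_vector(2, 5)` repeats that two-sample period across five samples.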
[0187]
The multiplexer 705 multiplexes signals from the LPC quantizer 203, the searcher 214, and the determination unit 702, and outputs the multiplexed signal from the output terminal 216.
[0188]
As described above, according to the hierarchical coding apparatus of the present embodiment, when the periodicity of the input audio signal is strong, the internal state of the adaptive codebook is updated using the prediction residual signal obtained from the decoded signal of the higher layer, so that the prediction accuracy of the adaptive codebook increases and the performance improves. In addition, when the periodicity of the input audio signal is not strong, the internal state of the adaptive codebook is updated using the driving excitation signal, so that non-periodic signals can also be handled.
[0189]
In the above description, whether to update the internal state of the adaptive codebook using the prediction residual signal or the driving excitation signal is determined based on the strength of the periodicity of the input audio signal, but the criterion is not limited to this.
[0190]
For example, the distortion obtained by actually encoding the input audio signal after updating the internal state of the adaptive codebook with the prediction residual signal may be compared with the distortion obtained by actually encoding the input audio signal after updating the internal state with the driving excitation signal. FIG. 13 is a flowchart illustrating an example of the operation of the hierarchical encoding device according to the present embodiment. Hereinafter, the determination operation of the hierarchical encoding device will be described with reference to FIG. 13.
[0191]
In step S810, the internal state of the adaptive codebook is updated using the prediction residual signal, and the encoding process of the first layer encoding unit is performed. At this time, the audible distortion E1 of the first layer decoded signal with respect to the input audio signal is calculated.
[0192]
In step S820, similarly, the internal state of the adaptive codebook is updated using the driving excitation signal, and the encoding process of the first layer encoding unit is performed. The audible distortion E2 of the first layer decoded signal with respect to the input audio signal at that time is calculated.
[0193]
In step S830, the distortion E1 determined in step S810 is compared with the distortion E2 determined in step S820.
[0194]
The determination is made in step S840. If the distortion E1 is smaller than the distortion E2, the process proceeds to step S850. If the distortion E2 is smaller than the distortion E1, the process proceeds to step S860.
[0195]
In step S850, it is determined that using the prediction residual signal is more effective, and the encoding process is performed after updating the internal state of the adaptive codebook using the prediction residual signal. At this time, the flag is set to 0 on the assumption that the prediction residual signal is used for updating the adaptive codebook.
[0196]
In step S860, it is determined that using the driving excitation signal is more effective, and the encoding process is performed after updating the internal state of the adaptive codebook using the driving excitation signal. At this time, the flag is set to 1 on the assumption that the driving excitation signal is used for updating the adaptive codebook.
[0197]
In step S870, the coded code and the flag obtained by the coding process are multiplexed by the multiplexing unit and output from the output terminal.
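Steps S810 to S860 amount to a closed-loop selection, which can be sketched as below. The callable `encode_with_state` is hypothetical: it stands for running the first layer encoder on the current frame with a given adaptive codebook state and returning the encoded code together with its audible distortion. Reusing the trial encoding as the final result, and sending ties to the driving excitation branch, are assumptions; the text does not say how equal distortions are handled.

```python
def select_update_signal(encode_with_state, state_from_residual, state_from_excitation):
    """Choose the adaptive codebook state that gives the smaller distortion.

    encode_with_state(state) -> (encoded_code, distortion)   # hypothetical
    Returns (encoded_code, flag): flag 0 = prediction residual, 1 = excitation.
    """
    code1, e1 = encode_with_state(state_from_residual)    # S810: distortion E1
    code2, e2 = encode_with_state(state_from_excitation)  # S820: distortion E2
    if e1 < e2:                                           # S830, S840
        return code1, 0                                   # S850
    return code2, 1                                       # S860 (ties land here)
```

The returned code and flag are what step S870 would then multiplex and output.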
[0198]
Thus, according to the hierarchical coding apparatus of the present embodiment, the choice between the prediction residual signal and the driving excitation signal for updating the internal state of the adaptive codebook is made by comparing two distortions: the distortion obtained by actually encoding the input audio signal after updating the internal state with the prediction residual signal, and the distortion obtained by actually encoding the input audio signal after updating the internal state with the driving excitation signal. Because the internal state of the adaptive codebook is always updated with the signal that gives the smaller distortion, quality can be improved.
[0199]
(Embodiment 6)
FIG. 14 is a block diagram showing a configuration of the first layer decoding unit of the hierarchical decoding device according to Embodiment 6 of the present invention. Components having the same configuration as in FIG. 6 are denoted by the same reference numerals as in FIG. 6, and detailed description is omitted. The first layer decoding unit in FIG. 14 includes an input terminal 801, a separation unit 802, and a switch unit 803. The difference from the first layer encoding unit in FIG. 2 is that, when the internal state of the adaptive codebook is updated, either the prediction residual signal input from the input terminal 801 or the driving excitation signal output from the adder 408 is selected based on flag information obtained from the separation unit 802.
[0200]
Separating section 802 separates, from the encoded code input from input terminal 401, the encoded codes used in adaptive codebook 804, noise codebook 404, gain codebook 405, and LPC decoder 409, together with the flag information indicating the type of signal used for updating the internal state of adaptive codebook 804. This flag information is the signal output from the determination unit 702.
[0201]
Switch section 803 switches a signal used for updating the internal state of adaptive codebook 804 according to the flag information. When the flag is 0, the switch unit 803 connects a switch so that the prediction residual signal input from the input terminal 801 is used as a signal used for updating the internal state of the adaptive codebook 804. Similarly, when the flag is 1, the switch unit 803 connects the switch so that the driving excitation signal output from the adder 408 is used as a signal used for updating the internal state of the adaptive codebook 804.
[0202]
As described above, according to the hierarchical decoding apparatus of the present embodiment, the internal state of the adaptive codebook is updated using either the prediction residual signal or the driving excitation signal, in accordance with the determination made on the encoding side based on the strength of the periodicity of the input audio signal. When the encoded audio signal has strong periodicity, the internal state of the adaptive codebook is updated using the prediction residual signal obtained from the decoded signal of the higher layer. The encoded code of the hierarchical encoding method can thus be decoded and, as a result, a high-quality audio signal can be obtained.
[0203]
(Embodiment 7)
Next, a seventh embodiment of the present invention will be described with reference to the drawings. FIG. 15 is a block diagram showing a configuration of a communication device according to Embodiment 7 of the present invention. A feature of the present embodiment lies in that the signal processing device 1503 in FIG. 15 is configured by one of the acoustic encoding devices described in Embodiments 1 to 6 described above.
[0204]
As shown in FIG. 15, a communication device 1500 according to Embodiment 7 of the present invention includes an input device 1501, an A / D conversion device 1502, and a signal processing device 1503 connected to a network 1504.
[0205]
The A / D converter 1502 is connected to the output terminal of the input device 1501. An input terminal of the signal processing device 1503 is connected to an output terminal of the A / D conversion device 1502. The output terminal of the signal processing device 1503 is connected to the network 1504.
[0206]
The input device 1501 converts a sound wave audible to a human ear into an analog signal which is an electric signal, and supplies the analog signal to the A / D converter 1502. An A / D converter 1502 converts an analog signal into a digital signal and supplies the digital signal to a signal processor 1503. The signal processing device 1503 encodes the input digital signal to generate a code, and outputs the code to the network 1504.
[0207]
As described above, according to the communication apparatus of the present embodiment, it is possible to enjoy the effects shown in the above-described first to sixth embodiments in communication, and to provide an encoding device that efficiently encodes an audio signal with a small number of bits.
[0208]
(Embodiment 8)
Next, an eighth embodiment of the present invention will be described with reference to the drawings. FIG. 16 is a block diagram showing a configuration of a communication device according to Embodiment 8 of the present invention. A feature of this embodiment lies in that the signal processing device 1603 in FIG. 16 is configured by one of the acoustic decoding devices described in Embodiments 1 to 6 described above.
[0209]
As shown in FIG. 16, a communication device 1600 according to Embodiment 8 of the present invention includes a receiving device 1602 connected to a network 1601, a signal processing device 1603, a D / A conversion device 1604, and an output device 1605.
[0210]
The input terminal of the receiving device 1602 is connected to the network 1601. An input terminal of the signal processing device 1603 is connected to an output terminal of the receiving device 1602. An input terminal of the D / A converter 1604 is connected to an output terminal of the signal processing device 1603. The input terminal of the output device 1605 is connected to the output terminal of the D / A converter 1604.
[0211]
Receiving apparatus 1602 receives a digital coded audio signal from network 1601, generates a digital received audio signal, and provides the signal to signal processing apparatus 1603. The signal processing device 1603 receives the received audio signal from the receiving device 1602, performs a decoding process on the received audio signal, generates a digital decoded audio signal, and provides the digital decoded audio signal to the D / A conversion device 1604. The D / A conversion device 1604 converts the digital decoded audio signal from the signal processing device 1603 to generate an analog decoded audio signal, and supplies the analog decoded audio signal to the output device 1605. The output device 1605 converts an analog decoded sound signal, which is an electric signal, into air vibration and outputs the sound as a sound wave so that it can be heard by a human ear.
[0212]
As described above, the communication device of the present embodiment obtains, in communication, the effects shown in Embodiments 1 to 6 above and can decode an audio signal that has been efficiently encoded with a small number of bits, so that a good audio signal can be output.
[0213]
(Embodiment 9)
Next, a ninth embodiment of the present invention will be described with reference to the drawings. FIG. 17 is a block diagram showing a configuration of a communication device according to Embodiment 9 of the present invention. This embodiment is characterized in that the signal processing device 1703 in FIG. 17 is one of the acoustic encoding devices described in Embodiments 1 to 6 above.
[0214]
As shown in FIG. 17, a communication device 1700 according to Embodiment 9 of the present invention includes an input device 1701, an A/D conversion device 1702, a signal processing device 1703, an RF modulation device 1704, and an antenna 1705.
[0215]
The input device 1701 converts a sound wave audible to the human ear into an analog signal, which is an electric signal, and supplies it to the A/D conversion device 1702. The A/D conversion device 1702 converts the analog signal into a digital signal and supplies it to the signal processing device 1703. The signal processing device 1703 encodes the input digital signal to generate an encoded audio signal and supplies it to the RF modulation device 1704. The RF modulation device 1704 modulates the encoded audio signal to generate a modulated encoded audio signal and supplies it to the antenna 1705. The antenna 1705 transmits the modulated encoded audio signal as a radio wave.
[0216]
As described above, the communication device of the present embodiment obtains, in wireless communication, the effects shown in Embodiments 1 to 6 above and can efficiently encode an audio signal with a small number of bits.
[0217]
Note that the present invention can be applied to a transmission device, a transmission encoding device, or an audio signal encoding device that uses an audio signal. Further, the present invention can be applied to a mobile station device or a base station device.
[0218]
(Embodiment 10)
Next, a tenth embodiment of the present invention will be described with reference to the drawings. FIG. 18 is a block diagram showing a configuration of a communication device according to Embodiment 10 of the present invention. This embodiment is characterized in that the signal processing device 1803 in FIG. 18 is one of the acoustic decoding devices described in Embodiments 1 to 6 above.
[0219]
As shown in FIG. 18, a communication device 1800 according to Embodiment 10 of the present invention includes an antenna 1801, an RF demodulation device 1802, a signal processing device 1803, a D / A conversion device 1804, and an output device 1805.
[0220]
The antenna 1801 receives a digital encoded audio signal as a radio wave, generates a digital received encoded audio signal, which is an electric signal, and supplies it to the RF demodulation device 1802. The RF demodulation device 1802 demodulates the received encoded audio signal from the antenna 1801 to generate a demodulated encoded audio signal and supplies it to the signal processing device 1803.
[0221]
The signal processing device 1803 receives the digital demodulated encoded audio signal from the RF demodulation device 1802, decodes it to generate a digital decoded audio signal, and supplies that signal to the D/A conversion device 1804. The D/A conversion device 1804 converts the digital decoded audio signal from the signal processing device 1803 into an analog decoded audio signal and supplies it to the output device 1805. The output device 1805 converts the analog decoded audio signal, which is an electric signal, into air vibration and outputs it as a sound wave audible to the human ear.
[0222]
As described above, the communication device of the present embodiment obtains, in wireless communication, the effects shown in Embodiments 1 to 6 above and can decode an audio signal that has been efficiently encoded with a small number of bits, so that a good audio signal can be output.
[0223]
Note that the present invention can be applied to a receiving device, a receiving decoding device, or an audio signal decoding device that uses an audio signal. Further, the present invention can be applied to a mobile station device or a base station device.
[0224]
Further, the present invention is not limited to the above embodiments and can be implemented with various modifications. For example, the above embodiments describe the case where the invention is implemented as a signal processing device; however, the present invention is not limited to this, and the signal processing method can also be implemented as software.
[0225]
For example, a program for executing the signal processing method may be stored in advance in a ROM (Read Only Memory) and executed by a CPU (Central Processing Unit).
[0226]
Alternatively, a program for executing the above signal processing method may be stored in a computer-readable storage medium, the program stored in the storage medium may be loaded into the RAM (Random Access Memory) of a computer, and the computer may be operated according to the program.
[0227]
Note that the present invention can be applied to a receiving device, a receiving decoding device, or an audio signal decoding device that uses an audio signal. Further, the present invention can be applied to a mobile station device or a base station device.
[0228]
[Effect of the invention]
As described above, according to the hierarchical encoding method and the hierarchical decoding method for an audio signal of the present invention, in hierarchical encoding in which the portion that could not be encoded in an upper layer is encoded in a lower layer, the residual signal generated in the upper-layer encoding is predicted from a signal obtained by decoding the encoded codes of the second and subsequent layers and from the LPC coefficients obtained in the upper-layer encoding, and the adaptive codebook is updated with the predicted signal. This makes it possible to encode with an adaptive codebook whose driving excitation is close to the one obtained when the audio signal itself is encoded, and thus to perform high-quality encoding at a low bit rate.
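As an informal illustration of the effect described above, the following Python sketch passes a decoded signal through a prediction-error filter built from LPC coefficients and shifts the resulting prediction residual into an adaptive codebook buffer. The function names, the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, and the fixed-length buffer are assumptions made for illustration only; they are not taken from the embodiments.

```python
def prediction_residual(decoded, lpc, history=None):
    """Filter `decoded` with A(z) = 1 + sum_k lpc[k-1] * z^-k (assumed convention).

    `history` holds the len(lpc) samples that precede `decoded`
    (oldest first); zeros are assumed when it is omitted.
    """
    order = len(lpc)
    past = list(history) if history is not None else [0.0] * order
    if len(past) != order:
        raise ValueError("history must hold exactly len(lpc) samples")
    buf = past + list(decoded)
    residual = []
    for n in range(order, len(buf)):
        acc = buf[n]
        for k in range(1, order + 1):
            acc += lpc[k - 1] * buf[n - k]
        residual.append(acc)
    return residual


def update_adaptive_codebook(codebook, new_samples):
    """Drop the oldest samples and append the newest, keeping the length fixed."""
    size = len(codebook)
    if size == 0:
        return []
    return (list(codebook) + list(new_samples))[-size:]


# Hypothetical example: a first-order decaying signal and its matching coefficient.
decoded = [1.0, 0.5, 0.25, 0.125]
residual = prediction_residual(decoded, [-0.5])      # [1.0, 0.0, 0.0, 0.0]
codebook = update_adaptive_codebook([0.0] * 6, residual)
```

In this toy case the filter removes the short-term correlation exactly, so only the initial impulse remains in the residual that is written into the buffer.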
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a hierarchical encoding device according to Embodiment 1 of the present invention.
FIG. 2 is a block diagram illustrating an internal configuration of a first layer encoding unit of the hierarchical encoding device according to the present embodiment.
FIG. 3 is a diagram showing a relationship between an input audio signal and a corresponding first layer decoded signal, second layer decoded signal, and third layer decoded signal.
FIG. 4 is a block diagram showing a configuration of a hierarchical encoding device according to Embodiment 1 of the present invention.
FIG. 5 is a block diagram showing a configuration of a hierarchical decoding device according to Embodiment 2 of the present invention.
FIG. 6 is a block diagram showing an internal configuration of a first layer decoding unit of the hierarchical decoding device according to the present embodiment.
FIG. 7 is a block diagram showing a configuration of a hierarchical decoding device according to Embodiment 2 of the present invention.
FIG. 8 is a block diagram showing a configuration of a hierarchical encoding device according to Embodiment 3 of the present invention.
FIG. 9 is a block diagram showing a configuration of a hierarchical encoding device according to Embodiment 3 of the present invention.
FIG. 10 is a block diagram showing a configuration of a hierarchical decoding device according to Embodiment 4 of the present invention.
FIG. 11 is a block diagram showing a configuration of a hierarchical decoding device according to Embodiment 4 of the present invention.
FIG. 12 is a block diagram showing a configuration of a first layer encoding unit of a hierarchical encoding device according to Embodiment 5 of the present invention.
FIG. 13 is a flowchart showing an example of the operation of the hierarchical encoding device of the present embodiment.
FIG. 14 is a block diagram showing a configuration of a first layer decoding unit of the hierarchical decoding device according to Embodiment 6 of the present invention.
FIG. 15 is a block diagram showing a configuration of a communication device according to Embodiment 7 of the present invention.
FIG. 16 is a block diagram showing a configuration of a communication device according to Embodiment 8 of the present invention.
FIG. 17 is a block diagram showing a configuration of a communication device according to Embodiment 9 of the present invention.
FIG. 18 is a block diagram showing a configuration of a communication device according to Embodiment 10 of the present invention.
[Explanation of symbols]
102, 503 First layer encoding section
103, 303, 504, 603 First layer decoding section
106, 509 Second layer encoding section
107, 304, 510, 606 Second layer decoding section
111, 305, 515 Third layer encoding section
112, 516, 608 Third layer decoding section
116, 308, 521, 612 Prediction filter
202 LPC analyzer
203 LPC quantizer
204 LPC decoder
205 Perceptual weighting filter
206 Perceptually weighted synthesis filter
207, 403, 704 Adaptive codebook
214 Searcher
409 LPC decoder
410 Synthesis filter
502 DS1 section
505, 604 US1 section
506 DS2 section
512, 607 US2 section
520, 611 DS3 section
701 Periodicity calculator
702 Judgment unit
703, 803 Switch section

Claims

A hierarchical encoding method that encodes an input audio signal, decodes the signal encoded in the preceding stage, and encodes the difference between the decoded signal and the input signal, the method comprising: a first encoding step of encoding the input audio signal in frame units of a predetermined length; a second encoding step of encoding, in one or more stages, the difference between a signal obtained by decoding the preceding-stage encoding result and the input audio signal; a prediction filter step of generating a prediction residual signal from a signal obtained by decoding the encoding result of the second encoding step; and an update step of updating a codebook used for encoding based on the prediction of the prediction filter step.

The hierarchical encoding method according to claim 1, wherein the first encoding step performs CELP encoding of the input audio signal, the prediction filter step generates a prediction filter using the quantized LPC coefficients, and the update step updates the codebook using a result obtained by passing, through the prediction filter, a signal obtained by decoding the encoding result of the second encoding step.

The hierarchical encoding method according to claim 1, further comprising: a down-sampling step of down-sampling the input audio signal; and an up-sampling step of up-sampling a signal obtained by decoding the preceding-stage encoding result, wherein the second encoding step encodes, in one or more stages, the difference between the up-sampled signal obtained by decoding the preceding-stage encoding result and the input audio signal.

The hierarchical encoding method according to claim 3, further comprising a periodicity calculating step of measuring the periodicity of the input audio signal, wherein the update step updates the codebook using the prediction residual signal obtained by the prediction of the prediction filter step when the periodicity is equal to or greater than a predetermined threshold, and using the generated driving excitation signal when the periodicity is less than the predetermined threshold.
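For illustration only, the threshold rule recited above could be sketched in Python as follows. The periodicity measure used here (the maximum normalized autocorrelation over a lag range), the threshold value 0.7, the lag bounds, and all function names are assumptions; the text does not specify how periodicity is measured.

```python
import math


def periodicity(signal, min_lag, max_lag):
    """Maximum normalized autocorrelation over lags min_lag..max_lag (assumed measure)."""
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        idx = range(lag, len(signal))
        num = sum(signal[n] * signal[n - lag] for n in idx)
        e_now = sum(signal[n] ** 2 for n in idx)
        e_past = sum(signal[n - lag] ** 2 for n in idx)
        den = math.sqrt(e_now * e_past)
        if den > 0.0:
            best = max(best, num / den)
    return best


def select_update_source(signal, threshold=0.7, min_lag=2, max_lag=20):
    """Return which signal updates the codebook under the threshold rule."""
    if periodicity(signal, min_lag, max_lag) >= threshold:
        return "prediction_residual"
    return "driving_excitation"


# Hypothetical inputs: a pulse train with period 4 and a single isolated pulse.
pulse_train = [1.0, 0.0, 0.0, 0.0] * 8
single_pulse = [1.0] + [0.0] * 31
choice_periodic = select_update_source(pulse_train)     # "prediction_residual"
choice_aperiodic = select_update_source(single_pulse)   # "driving_excitation"
```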

The hierarchical encoding method according to claim 3, further comprising a determining step of determining which is smaller: the distortion obtained by updating the internal state of the adaptive codebook using the prediction residual signal and actually encoding the input audio signal, or the distortion obtained by updating the internal state of the codebook using the driving excitation signal and actually encoding the input audio signal, wherein the update step updates the codebook using the signal that gives the smaller distortion.
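The distortion comparison recited above can be pictured with the following Python sketch. The trial encoding is replaced by a toy stand-in that searches a codebook buffer for the best gain-scaled segment; this stand-in, the tie-breaking rule in favor of the prediction residual, and all names are assumptions for illustration and do not represent the CELP search of the embodiments.

```python
def trial_distortion(frame, codebook):
    """Toy trial encoding: squared error of the best gain-scaled codebook segment."""
    n = len(frame)
    energy = sum(x * x for x in frame)
    best = energy  # distortion when no segment is used (gain of zero)
    for start in range(len(codebook) - n + 1):
        seg = codebook[start:start + n]
        corr = sum(f * s for f, s in zip(frame, seg))
        seg_energy = sum(s * s for s in seg)
        if seg_energy > 0.0:
            # Error after applying the optimal gain corr / seg_energy.
            best = min(best, energy - corr * corr / seg_energy)
    return best


def choose_update_signal(frame, state_from_residual, state_from_excitation):
    """Pick the codebook state whose trial encoding gives the smaller distortion."""
    d_res = trial_distortion(frame, state_from_residual)
    d_exc = trial_distortion(frame, state_from_excitation)
    if d_res <= d_exc:  # ties go to the prediction residual (assumption)
        return "prediction_residual", d_res
    return "driving_excitation", d_exc


# Hypothetical frame and two candidate codebook states.
frame = [1.0, -1.0]
choice, distortion = choose_update_signal(
    frame,
    state_from_residual=[0.0, 0.0, 2.0, -2.0],
    state_from_excitation=[1.0, 1.0, 1.0, 1.0],
)
```

Here the state built from the prediction residual contains a scaled copy of the frame, so its trial distortion is zero and it is selected.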

A hierarchical decoding method for decoding signals produced on an encoding side that encodes an input audio signal, decodes the signal encoded in the preceding stage, and encodes the difference between the decoded signal and the input signal, the method comprising: a first decoding step of decoding a signal obtained by encoding the input audio signal in frame units; a second decoding step of decoding each of the signals obtained by encoding, in one or more stages, the difference between the signal obtained by decoding the preceding-stage encoding result and the input audio signal, and adding the decoded results; a prediction filter step of generating a prediction residual signal from the decoding results of the first decoding step and the second decoding step; and an update step of updating a codebook used for decoding based on the prediction of the prediction filter step.

The hierarchical decoding method according to claim 6, wherein the first decoding step decodes an audio signal encoded by a CELP encoding method, the prediction filter step generates a prediction filter using LPC coefficients obtained by decoding the LPC coefficients encoded on the encoding side, and the update step updates the codebook using a result obtained by passing the decoding results of the first decoding step and the second decoding step through the prediction filter.

The hierarchical decoding method according to claim 6, further comprising: an up-sampling step of up-sampling the preceding-stage decoding result; an adding step of adding the up-sampled decoding result and a subsequent-stage decoding result; and a down-sampling step of down-sampling the addition result, wherein the prediction filter step generates the prediction residual signal from the decoding result after down-sampling.

The hierarchical decoding method according to claim 6, wherein the update step updates the codebook based on a result, determined on the encoding side, of whether to update the adaptive codebook using the prediction residual signal obtained by the prediction of the prediction filter step or the generated driving excitation signal.

A hierarchical encoding device that encodes an input audio signal, decodes the signal encoded in the preceding stage, and encodes the difference between the decoded signal and the input signal, the device comprising: first encoding means for encoding the input audio signal in frame units of a predetermined length; second encoding means for encoding, in one or more stages, the difference between a signal obtained by decoding the preceding-stage encoding result and the input audio signal; and prediction filter means for generating a prediction residual signal from a signal obtained by decoding the encoding result of the second encoding means, wherein the first encoding means updates a codebook used for encoding based on the prediction of the prediction filter means.

The hierarchical encoding device according to claim 10, wherein the first encoding means is means for CELP encoding the input audio signal and comprises: a codebook that holds previously generated driving excitation signals; LPC analysis means for obtaining LPC coefficients from the input audio signal; and searching means for searching for the driving excitation signal having the smallest difference from the input audio signal, the prediction filter means generates a prediction filter using the quantized LPC coefficients, and the first encoding means updates the codebook using a result obtained by passing, through the prediction filter, a signal obtained by decoding the encoding result of the second encoding means.

The hierarchical encoding device according to claim 10 or claim 11, further comprising: downsampling means for downsampling the input audio signal and outputting it to the first encoding means or the second encoding means; and upsampling means for upsampling a signal obtained by decoding the preceding-stage encoding result, wherein the second encoding means encodes, in one or more stages, the difference between the upsampled signal obtained by decoding the preceding-stage encoding result and the input audio signal.

The hierarchical encoding device according to claim 10, wherein the first encoding means comprises determining means for determining whether to update the adaptive codebook using the prediction residual signal obtained by the prediction of the prediction filter means or the generated driving excitation signal.

The hierarchical encoding device according to claim 13, wherein the first encoding means comprises periodicity calculating means for measuring the periodicity of the input audio signal, and the determining means determines that the codebook is to be updated using the prediction residual signal obtained by the prediction of the prediction filter means when the periodicity is equal to or greater than a predetermined threshold, and using the generated driving excitation signal when the periodicity is less than the predetermined threshold.

The hierarchical encoding device according to claim 13, wherein the determining means determines which is smaller: the distortion obtained by updating the internal state of the adaptive codebook using the prediction residual signal and actually encoding the input audio signal, or the distortion obtained by updating the internal state of the adaptive codebook using the driving excitation signal and actually encoding the input audio signal, and determines that the first encoding means is to update the codebook using the signal that gives the smaller distortion.

A hierarchical decoding device for decoding signals produced on an encoding side that encodes an input audio signal, decodes the signal encoded in the preceding stage, and encodes the difference between the decoded signal and the input signal, the device comprising: first decoding means for decoding a signal obtained by encoding the input audio signal in frame units of a predetermined length; second decoding means for decoding each of the signals obtained by encoding, in one or more stages, the difference between the signal obtained by decoding the preceding-stage encoding result and the input audio signal, and adding the decoded results; and prediction filter means for generating a prediction residual signal from the decoding results of the first decoding means and the second decoding means, wherein the first decoding means updates a codebook used for decoding based on the prediction of the prediction filter means.

The hierarchical decoding device according to claim 16, wherein the first decoding means decodes an audio signal encoded by a CELP encoding method, the prediction filter means generates a prediction filter using LPC coefficients obtained by decoding the LPC coefficients encoded on the encoding side, and the first decoding means updates the codebook using a result obtained by passing the decoding results of the first decoding means and the second decoding means through the prediction filter.

The hierarchical decoding device according to claim 16, further comprising: upsampling means for upsampling the preceding-stage decoding result; adding means for adding the upsampled decoding result and the subsequent-stage decoding result; and downsampling means for downsampling the addition result of the adding means, wherein the prediction filter means generates the prediction residual signal from the decoding result after downsampling.

The hierarchical decoding device according to claim 16, wherein the first decoding means updates the codebook based on a result, determined on the encoding side, of whether to update the adaptive codebook using the prediction residual signal obtained by the prediction of the prediction filter means or the generated driving excitation signal.

An acoustic signal transmitting device comprising: audio input means for converting an audio signal into an electric signal; A/D conversion means for converting the signal output from the audio input means into a digital signal; the hierarchical encoding device according to any one of claims 10 to 15, which encodes the digital signal output from the A/D conversion means; RF modulation means for modulating the encoded code output from the encoding device into a radio-frequency signal; and a transmitting antenna for converting the signal output from the RF modulation means into a radio wave and transmitting the radio wave.

An acoustic signal receiving device comprising: a receiving antenna for receiving a radio wave; RF demodulation means for demodulating the signal received by the receiving antenna; the hierarchical decoding device according to any one of claims 16 to 19, which decodes the information obtained by the RF demodulation means; D/A conversion means for converting the signal output from the decoding device into an analog signal; and audio output means for converting the electric signal output from the D/A conversion means into an audio signal.

A communication terminal device comprising at least one of the acoustic signal transmitting device according to claim 20 and the acoustic signal receiving device according to claim 21.

A base station apparatus comprising at least one of the acoustic signal transmitting apparatus according to claim 20 and the acoustic signal receiving apparatus according to claim 21.