JP2004102186A

JP2004102186A - Device and method for sound encoding

Info

Publication number: JP2004102186A
Application number: JP2002267436A
Authority: JP
Inventors: Masahiro Oshikiri; 押切　正浩
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2002-09-12
Filing date: 2002-09-12
Publication date: 2004-04-02
Anticipated expiration: 2022-09-12
Also published as: JP3881946B2

Abstract

PROBLEM TO BE SOLVED: To encode, at a low bit rate and with high quality, even a signal that consists mainly of speech with music or noise superimposed in the background.

SOLUTION: A base layer encoder 102 encodes input data at sampling rate FL in predetermined base frame units to generate a first encoded code. A local decoder 103 decodes the first encoded code. An upsampler 104 raises the sampling rate of the decoded signal to FH. A frequency determination section 107 uses the decoded signal upsampled to FH to determine the region in which the error signal is encoded and the region in which it is not. An enhancement layer encoder 108 transforms the error signal into frequency-domain coefficients to generate an error spectrum, and encodes the error spectrum according to the frequency information on the encoding targets obtained from the frequency determination section 107.

COPYRIGHT: (C)2004,JPO

Description

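[0001]
[Technical Field of the Invention]
The present invention relates to an acoustic encoding apparatus and an acoustic encoding method that compress and encode acoustic signals such as music signals or speech signals with high efficiency, and in particular to an acoustic encoding apparatus and an acoustic encoding method that perform scalable encoding, in which music or speech can be decoded even from only part of the encoded code.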
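[0002]
[Prior Art]
Acoustic coding techniques that compress music or speech signals at low bit rates are important for making effective use of transmission path capacity, such as radio waves in mobile communications, and of recording media. Speech coding schemes for encoding speech signals include G.726 and G.729, standardized by the ITU (International Telecommunication Union). These schemes target narrowband signals (300 Hz to 3.4 kHz) and can encode them with high quality at bit rates of 8 kbit/s to 32 kbit/s.
[0003]
Standard schemes for encoding wideband signals (50 Hz to 7 kHz) include ITU G.722 and G.722.1, and AMR-WB of the 3GPP (The 3rd Generation Partnership Project). These schemes can encode wideband speech signals with high quality at bit rates of 6.6 kbit/s to 64 kbit/s.
[0004]
CELP (Code Excited Linear Prediction) is an effective method for encoding speech signals with high efficiency at low bit rates. CELP is based on an engineering model of human speech production: an excitation signal represented by random numbers or pulse trains is passed through a pitch filter corresponding to the strength of periodicity and a synthesis filter corresponding to the vocal tract characteristics, and the coding parameters are determined so that the squared error between the output signal and the input signal is minimized under perceptual weighting (see, for example, Non-Patent Document 1).
[0005]
Many recent standard speech coding schemes are based on CELP. For example, G.729 can encode narrowband signals at a bit rate of 8 kbit/s, and AMR-WB can encode wideband signals at bit rates of 6.6 kbit/s to 23.85 kbit/s.
[0006]
For music coding, on the other hand, transform coding is generally used, in which the music signal is transformed into the frequency domain and encoded using a psychoacoustic model, as in the Layer 3 and AAC schemes standardized by MPEG (Moving Picture Experts Group). These schemes are known to cause almost no degradation at bit rates of 64 kbit/s to 96 kbit/s per channel for signals with a sampling rate of 44.1 kHz.
[0007]
However, when a signal that consists mainly of speech with music or environmental sound superimposed in the background is encoded with a speech coding scheme, the background music or environmental sound degrades not only the background signal but also the speech signal, lowering overall quality. This problem arises because speech coding schemes are based on CELP, a method specialized for a speech model. In addition, speech coding schemes can handle signal bands only up to 7 kHz at most, and by their structure cannot adequately handle signals with higher-frequency content.
[0008]
Music coding, by contrast, can encode music with high quality, so it can also achieve sufficient quality for speech signals with background music or environmental sound as described above. Music coding can also handle signal bands up to about 22 kHz, which corresponds to CD quality.
[0009]
Achieving this high quality, however, requires a high bit rate; if the bit rate is held down to about 32 kbit/s, the quality of the decoded signal drops considerably. Music coding therefore cannot be used on communication networks with low transmission rates.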
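[0010]
To avoid the above problems, these techniques can be combined into scalable coding: the input signal is first encoded with CELP in a base layer, an error signal is then obtained by subtracting the decoded signal from the input signal, and this error signal is transform-coded in an enhancement layer.
[0011]
With this method, the base layer uses CELP and can therefore encode speech signals with high quality, while the enhancement layer can efficiently encode background music and environmental sound that the base layer cannot represent, as well as signals with frequency components above the band covered by the base layer. This configuration also keeps the bit rate low. Furthermore, the acoustic signal can be decoded from only part of the encoded code, namely the base layer encoded code, and this scalability is useful for multicasting to multiple networks with different transmission capacities.
[0012]
However, ensuring sufficient quality when music rather than speech is input requires allocating more bits to the enhancement layer, and as a result the bit rate becomes high.
[0013]
The base layer uses a coding scheme specialized for speech, such as CELP, and CELP does not encode music efficiently. When a music signal is encoded, the power of the error signal between the input signal and the base layer decoded signal (the enhancement layer input signal) becomes large, so many bits must be allocated to the enhancement layer to handle this larger error signal and raise the quality of the final decoded signal.
[0014]
One way to solve this problem is to use auditory masking in the enhancement layer to raise coding efficiency. Auditory masking exploits the property of human hearing that, when a given signal is present, signals located near its frequency become inaudible (are masked).
[0015]
FIG. 28 shows an example of the spectrum of an acoustic (music) signal. In FIG. 28, the solid line represents the auditory masking and the broken line represents the error spectrum. The error spectrum here means the spectrum of the error signal between the input signal and the base layer decoded signal (the enhancement layer input signal).
[0016]
The error spectrum in the shaded areas of FIG. 28 has amplitudes below the auditory masking and is therefore inaudible to human hearing. In the other areas, the amplitude of the error spectrum exceeds the auditory masking, so quantization distortion is perceived.
[0017]
The enhancement layer therefore only needs to encode the error spectrum in the white areas of FIG. 28 so that the quantization distortion in those areas falls below the auditory masking. Coefficients in the shaded areas are already below the auditory masking and need not be quantized.
[0018]
[Non-Patent Document 1]
"Code-Excited Linear Prediction (CELP): high quality speech at very low bit rates", Proc. ICASSP 85, pp. 937-940, 1985.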
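[0019]
[Problems to Be Solved by the Invention]
However, the conventional apparatus must transmit information on the frequencies that auditory masking indicates require quantization, which increases the amount of information transmitted and prevents the bit rate from being lowered.
[0020]
The present invention has been made in view of this point, and its object is to provide an acoustic encoding apparatus and an acoustic encoding method that can encode with high quality at a low bit rate even a signal that consists mainly of speech with music or noise superimposed in the background.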
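[0021]
[Means for Solving the Problems]
The acoustic encoding apparatus of the present invention comprises: downsampling means that lowers the sampling rate of an input signal; base layer encoding means that encodes the input signal whose sampling rate has been lowered; decoding means that decodes the encoded input signal to obtain a decoded signal; upsampling means that raises the sampling rate of the decoded signal to the same rate as that of the input signal at the time of input; subtraction means that obtains an error signal from the difference between the input signal at the time of input and the decoded signal whose sampling rate has been raised; frequency determination means that determines, based on the decoded signal whose sampling rate has been raised, the frequencies at which the error signal is to be encoded; and enhancement layer encoding means that encodes the difference signal at those frequencies.
[0022]
With this configuration, the frequencies to be encoded in the enhancement layer are determined from the signal obtained by decoding the encoded signal. These frequencies can therefore be determined from the base layer encoded signal alone, which is transmitted from the encoding side to the decoding side, so the frequency information itself need not be transmitted, and high-quality encoding can be performed at a low bit rate.
[0023]
In the acoustic encoding apparatus of the present invention, the base layer encoding means encodes the input signal using code excited linear prediction.
[0024]
With this configuration, the transmitting side encodes the input signal by applying CELP in the base layer, and the receiving side decodes the encoded input signal by applying CELP, so a high-quality base layer can be realized at a low bit rate.
[0025]
In the acoustic encoding apparatus of the present invention, the enhancement layer encoding means orthogonally transforms the difference signal from the time domain to the frequency domain and encodes the transformed difference signal.
[0026]
With this configuration, the difference signal is transformed from the time domain to the frequency domain, and the enhancement layer encodes the frequency regions of the transformed signal that base layer encoding cannot cover, so signals with large spectral variation, such as music, can also be handled.
[0027]
The acoustic encoding apparatus of the present invention comprises auditory masking means that calculates auditory masking representing amplitude values that do not contribute to hearing, and the enhancement layer encoding means encodes an error spectrum, which is the spectrum of the error signal, with the frequency determination means determining the encoding targets so that signals within the auditory masking are excluded from encoding.
[0028]
In the acoustic encoding apparatus of the present invention, the auditory masking means comprises: frequency domain transform means that transforms the decoded signal whose sampling rate has been raised into frequency-domain coefficients; estimated auditory masking calculation means that calculates estimated auditory masking using the frequency-domain coefficients; and determination means that finds the frequencies at which the amplitude of the spectrum of the decoded signal exceeds the amplitude of the estimated auditory masking; and the enhancement layer encoding means encodes the error spectrum located at those frequencies.
[0029]
With these configurations, auditory masking is calculated from the spectrum of the input signal using the characteristics of the masking effect, and in enhancement layer encoding, quantization is performed so that the quantization distortion is at or below this masking value. The number of MDCT coefficients to be quantized can thus be reduced without degrading quality, and high-quality encoding can be performed at a low bit rate.
[0030]
In the acoustic encoding apparatus of the present invention, the auditory masking means comprises estimated error spectrum calculation means that calculates an estimated error spectrum using the frequency-domain coefficients, and the determination means finds the frequencies at which the amplitude of the estimated error spectrum exceeds the amplitude of the estimated auditory masking.
[0031]
With this configuration, the residual spectrum estimated from the spectrum of the base layer decoded signal is smoothed, so the estimated error spectrum can be made to approximate the residual spectrum, and the error spectrum can be encoded efficiently in the enhancement layer.
[0032]
In the acoustic encoding apparatus of the present invention, the auditory masking means comprises correction means that smooths the estimated auditory masking calculated by the estimated auditory masking calculation means, and the determination means finds the frequencies at which the amplitude of the spectrum of the decoded signal or of the estimated error spectrum exceeds the amplitude of the smoothed estimated auditory masking.
[0033]
With this configuration, the estimated auditory masking estimated from the amplitude spectrum of the base layer decoded signal is corrected based on information in the encoded code of the base layer encoder, so the accuracy of the estimated auditory masking can be improved and, as a result, the error spectrum can be encoded efficiently in the enhancement layer.
[0034]
In the acoustic encoding apparatus of the present invention, the enhancement layer encoding means calculates, for each frequency, the difference in amplitude between either the estimated error spectrum or the error spectrum and either the auditory masking or the estimated auditory masking, and determines the amount of information for encoding based on the magnitude of that difference.
[0035]
With this configuration, in enhancement layer encoding, more information is allocated to frequencies at which the estimated error spectrum exceeds the estimated auditory masking by a large amount, so quantization efficiency can be improved.
[0036]
In the acoustic encoding apparatus of the present invention, the enhancement layer encoding means encodes the error spectrum in a predetermined band in addition to the error spectrum at the frequencies found by the determination means.
[0037]
With this configuration, a band that is perceptually important but unlikely to be selected for encoding is forcibly quantized. Even when a frequency that should have been selected for encoding is not selected, the error spectrum at frequencies in the perceptually important band is always quantized, so quality can be improved.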
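[0038]
The acoustic decoding apparatus of the present invention comprises: base layer decoding means that decodes a first encoded code, obtained on the encoding side by encoding an input signal in predetermined base frame units, to obtain a first decoded signal; upsampling means that raises the sampling rate of the first decoded signal to the same sampling rate as that of a second decoded signal; frequency determination means that determines, based on the upsampled first decoded signal, the frequencies at which a second encoded code is to be decoded, the second encoded code being obtained by encoding the residual signal between the input signal and the signal obtained on the encoding side by decoding the first encoded code; enhancement layer decoding means that decodes the second encoded code using the frequency information to obtain the second decoded signal; and addition means that adds the second decoded signal and the first decoded signal whose sampling rate has been raised.
[0039]
With this configuration, the frequencies to be encoded in the enhancement layer are determined from the signal obtained by decoding the base layer encoded signal. These frequencies can therefore be determined from the base layer encoded signal alone, which is transmitted from the encoding side to the decoding side, so the frequency information itself need not be transmitted, and high-quality encoding can be performed at a low bit rate.
[0040]
In the acoustic decoding apparatus of the present invention, the base layer decoding means decodes the first encoded code using code excited linear prediction.
[0041]
With this configuration, the transmitting side encodes the input signal by applying CELP in the base layer, and the receiving side decodes the encoded input signal by applying CELP, so a high-quality base layer can be realized at a low bit rate.
[0042]
In the acoustic decoding apparatus of the present invention, the enhancement layer decoding means orthogonally transforms the signal obtained by decoding the second encoded code from the frequency domain to the time domain.
[0043]
With this configuration, the difference signal is transformed from the time domain to the frequency domain, and the enhancement layer encodes the frequency regions of the transformed signal that base layer encoding cannot cover, so signals with large spectral variation, such as music, can also be handled.
[0044]
The acoustic decoding apparatus of the present invention comprises auditory masking means that calculates auditory masking representing amplitude values that do not contribute to hearing, and the enhancement layer decoding means decodes with the frequency determination means determining the decoding targets so that signals within the auditory masking are excluded from decoding.
[0045]
In the acoustic decoding apparatus of the present invention, the auditory masking means comprises: frequency domain transform means that transforms the base layer decoded signal whose sampling rate has been raised into frequency-domain coefficients; estimated auditory masking calculation means that calculates estimated auditory masking using the frequency-domain coefficients; and determination means that finds the frequencies at which the amplitude of the spectrum of the decoded signal exceeds the amplitude of the estimated auditory masking; and the enhancement layer decoding means decodes the error spectrum located at those frequencies.
[0046]
With these configurations, auditory masking is calculated from the spectrum of the input signal using the characteristics of the masking effect, and in enhancement layer encoding, quantization is performed so that the quantization distortion is at or below this masking value. The number of MDCT coefficients to be quantized can thus be reduced without degrading quality, and high-quality encoding can be performed at a low bit rate.
[0047]
In the acoustic decoding apparatus of the present invention, the auditory masking means comprises estimated error spectrum calculation means that calculates an estimated error spectrum using the frequency-domain coefficients, and the determination means finds the frequencies at which the amplitude of the estimated error spectrum exceeds the amplitude of the estimated auditory masking.
[0048]
With this configuration, the residual spectrum estimated from the spectrum of the base layer decoded signal is smoothed, so the estimated error spectrum can be made to approximate the residual spectrum, and the error spectrum can be encoded efficiently in the enhancement layer.
[0049]
In the acoustic decoding apparatus of the present invention, the auditory masking means comprises correction means that smooths the estimated auditory masking calculated by the estimated auditory masking calculation means, and the determination means finds the frequencies at which the amplitude of the spectrum of the decoded signal or of the estimated error spectrum exceeds the amplitude of the smoothed estimated auditory masking.
[0050]
With this configuration, the estimated auditory masking estimated from the amplitude spectrum of the base layer decoded signal is corrected based on information in the encoded code of the base layer encoder, so the accuracy of the estimated auditory masking can be improved and, as a result, the error spectrum can be encoded efficiently in the enhancement layer.
[0051]
In the acoustic decoding apparatus of the present invention, the enhancement layer decoding means calculates, for each frequency, the difference in amplitude between either the estimated error spectrum or the error spectrum and either the auditory masking or the estimated auditory masking, and determines the amount of information for decoding based on the magnitude of that difference.
[0052]
With this configuration, in enhancement layer encoding, vector quantization is performed with adaptive bit allocation according to the amount by which the estimated error spectrum exceeds the estimated auditory masking, so quantization efficiency can be improved.
[0053]
In the acoustic decoding apparatus of the present invention, the enhancement layer decoding means decodes the error spectrum in a predetermined band in addition to the error spectrum at the frequencies found by the determination means.
[0054]
With this configuration, the MDCT coefficients in a predetermined band are decoded, so a signal can be decoded in which a band that is perceptually important but unlikely to be selected for encoding has been forcibly quantized. Even when a frequency that should have been selected for encoding on the encoding side is not selected, the error spectrum at frequencies in the perceptually important band is always quantized, so quality can be improved.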
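[0055]
The acoustic signal transmitting apparatus of the present invention comprises: acoustic input means that converts an acoustic signal into an electrical signal; A/D conversion means that converts the signal output from the acoustic input means into a digital signal; the above acoustic encoding apparatus, which encodes the digital signal output from the A/D conversion means; RF modulation means that modulates the encoded code output from the encoding apparatus into a radio frequency signal; and a transmitting antenna that converts the signal output from the RF modulation means into radio waves and transmits them.
[0056]
The acoustic signal receiving apparatus of the present invention comprises: a receiving antenna that receives radio waves; RF demodulation means that demodulates the signal received by the receiving antenna; the above acoustic decoding apparatus, which decodes the information obtained by the RF demodulation means; D/A conversion means that converts the signal output from the decoding apparatus into an analog signal; and acoustic output means that converts the electrical signal output from the D/A conversion means into an acoustic signal.
[0057]
The communication terminal apparatus of the present invention comprises at least one of the above acoustic signal transmitting apparatus and the above acoustic signal receiving apparatus. The base station apparatus of the present invention comprises at least one of the above acoustic signal transmitting apparatus and the above acoustic signal receiving apparatus.
[0058]
With these configurations, the frequencies to be encoded in the enhancement layer are determined from the signal obtained by decoding the encoded signal. These frequencies can therefore be determined from the base layer encoded signal alone, which is transmitted from the encoding side to the decoding side, so the frequency information itself need not be transmitted, and high-quality encoding can be performed at a low bit rate.
[0059]
In the acoustic encoding method of the present invention, on the encoding side, an input signal whose sampling rate has been lowered is encoded to create a first encoded code; the sampling rate of a first decoded signal obtained by decoding the first encoded code is raised to the same rate as that of the input signal at the time of input; the frequencies at which the error signal is to be encoded are determined based on the decoded signal whose sampling rate has been raised; and, of the difference signal between the input signal at the time of input and the decoded signal whose sampling rate has been raised, the difference signal at those frequencies is encoded to create a second encoded code. On the decoding side, the first encoded code is decoded to obtain a second decoded signal; the sampling rate of the second decoded signal is raised to the same rate as that of a third decoded signal; the frequencies at which the second encoded code is to be decoded are determined based on the upsampled second decoded signal; the second encoded code is decoded using the frequency information to obtain the third decoded signal; and the upsampled second decoded signal and the third decoded signal are added.
[0060]
With this method, the frequencies to be encoded in the enhancement layer are determined from the signal obtained by decoding the encoded signal. These frequencies can therefore be determined from the base layer encoded signal alone, which is transmitted from the encoding side to the decoding side, so the frequency information itself need not be transmitted, and high-quality encoding can be performed at a low bit rate.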
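[0061]
[Embodiments of the Invention]
The present inventor arrived at the present invention by noting that, even when the frequencies to be encoded in the enhancement layer are estimated using the signal obtained by decoding the base layer encoded code instead of the input signal, no major problem arises: the decoded signal is determined so that its distortion relative to the input signal is small, and it therefore approximates the input signal sufficiently well.
[0062]
The gist of the present invention is as follows. In an encoding method in which an input signal is downsampled and encoded, the encoded signal is decoded and upsampled, and the difference signal between the upsampled decoded signal and the input signal is encoded, the frequencies to be encoded or decoded in the enhancement layer are determined from the upsampled decoded signal, which is calculated on both the encoding side and the decoding side. High-quality encoding is thereby performed at a low bit rate without transmitting this frequency information from the encoding side to the decoding side.
[0063]
Embodiments of the present invention are described in detail below with reference to the drawings.
(Embodiment 1)
FIG. 1 is a block diagram showing the configuration of an acoustic encoding apparatus according to Embodiment 1 of the present invention. The acoustic encoding apparatus 100 in FIG. 1 mainly comprises a downsampler 101, a base layer encoder 102, a local decoder 103, an upsampler 104, a delayer 105, a subtractor 106, a frequency determination section 107, an enhancement layer encoder 108, and a multiplexer 109.
[0064]
In FIG. 1, the downsampler 101 receives input data (acoustic data) at sampling rate FH, converts it to sampling rate FL, which is lower than FH, and outputs it to the base layer encoder 102.
[0065]
The base layer encoder 102 encodes the input data at sampling rate FL in predetermined base frame units and outputs the resulting first encoded code to the local decoder 103 and the multiplexer 109. For example, the base layer encoder 102 encodes the input data using CELP.
[0066]
The local decoder 103 decodes the first encoded code and outputs the resulting decoded signal to the upsampler 104. The upsampler 104 raises the sampling rate of the decoded signal to FH and outputs it to the subtractor 106 and the frequency determination section 107.
[0067]
The delayer 105 delays the input signal by a predetermined time and outputs it to the subtractor 106. Setting this delay equal to the time lag that arises in the downsampler 101, base layer encoder 102, local decoder 103, and upsampler 104 prevents a phase shift in the subsequent subtraction. For example, the delay time is the sum of the processing times of the downsampler 101, base layer encoder 102, local decoder 103, and upsampler 104. The subtractor 106 subtracts the decoded signal from the input signal and outputs the result to the enhancement layer encoder 108 as the error signal.
[0068]
The frequency determination section 107 uses the decoded signal upsampled to FH to determine the region in which the error signal is encoded and the region in which it is not, and notifies the enhancement layer encoder 108. For example, the frequency determination section 107 determines the frequencies subject to auditory masking from the decoded signal upsampled to FH and outputs them to the enhancement layer encoder 108.
[0069]
The enhancement layer encoder 108 transforms the error signal into frequency-domain coefficients to generate an error spectrum, and encodes the error spectrum based on the frequency information on the encoding targets obtained from the frequency determination section 107. The multiplexer 109 multiplexes the signal encoded by the base layer encoder 102 and the signal encoded by the enhancement layer encoder 108.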
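[0070]
The signals encoded by the base layer encoder 102 and the enhancement layer encoder 108 are described below. FIG. 2 shows an example of the distribution of information in an acoustic signal. In FIG. 2, the vertical axis shows the amount of information and the horizontal axis shows frequency. FIG. 2 shows how much of the speech information and of the background music and background noise information contained in the input signal is present in each frequency band.
[0071]
As shown in FIG. 2, speech information is concentrated in the low-frequency region, and the amount of information decreases toward higher frequencies. Background music and background noise information, by contrast, has relatively little low-band information compared with speech and more information in the high band.
[0072]
The base layer therefore encodes the speech signal with high quality using CELP, and the enhancement layer efficiently encodes background music and environmental sound that the base layer cannot represent, as well as signals with frequency components above the band covered by the base layer.
[0073]
FIG. 3 shows an example of the regions to be encoded by the base layer and the enhancement layer. In FIG. 3, the vertical axis shows the amount of information and the horizontal axis shows frequency. FIG. 3 shows the regions of information that the base layer encoder 102 and the enhancement layer encoder 108 each encode.
[0074]
The base layer encoder 102 is designed to represent speech information in the frequency band from 0 to FL efficiently and can encode speech information in this region with good quality. However, its coding quality for background music and background noise information in the band from 0 to FL is not high.
[0075]
The enhancement layer encoder 108 is designed to cover both the part where the base layer encoder 102 falls short, as described above, and signals in the frequency band from FL to FH. Combining the base layer encoder 102 and the enhancement layer encoder 108 therefore achieves high-quality encoding over a wide band.
[0076]
As shown in FIG. 3, the first encoded code obtained by the base layer encoder 102 contains the speech information in the band from 0 to FL, so a scalable function is realized in which a decoded signal can be obtained from the first encoded code alone.
[0077]
Auditory masking can also be used in the enhancement layer to raise coding efficiency. Auditory masking exploits the property of human hearing that, when a given signal is present, signals located near its frequency become inaudible (are masked).
[0078]
FIG. 28 shows an example of the spectrum of an acoustic (music) signal. In FIG. 28, the solid line represents the auditory masking and the broken line represents the error spectrum. The error spectrum here means the spectrum of the error signal between the input signal and the base layer decoded signal (the enhancement layer input signal).
[0079]
The error spectrum in the shaded areas of FIG. 28 has amplitudes below the auditory masking and is therefore inaudible to human hearing. In the other areas, the amplitude of the error spectrum exceeds the auditory masking, so quantization distortion is perceived.
[0080]
The enhancement layer therefore only needs to encode the error spectrum in the white areas of FIG. 28 so that the quantization distortion in those areas falls below the auditory masking. Coefficients in the shaded areas are already below the auditory masking and need not be quantized.
[0081]
In the acoustic encoding apparatus 100 of this embodiment, the frequencies at which the residual signal is encoded, as determined by auditory masking or the like, are not transmitted from the encoding side to the decoding side. Instead, the encoding side and the decoding side each use the upsampled base layer decoded signal to determine the frequencies of the error spectrum encoded by the enhancement layer.
[0082]
The decoded signal obtained by decoding the base layer encoded code is the same on the encoding side and the decoding side. The encoding side therefore determines the frequencies subject to auditory masking from this decoded signal and encodes the signal, and the decoding side obtains the information on the masked frequencies from this decoded signal and decodes the signal. It is thus unnecessary to encode and transmit the error spectrum frequency information as side information, and the bit rate can be reduced.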
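[0083]
Next, the operation of each block of the acoustic encoding apparatus according to this embodiment is described in detail. First, the operation of the frequency determination section 107 is described; it determines the frequencies of the error spectrum to be encoded in the enhancement layer from the upsampled base layer decoded signal (hereinafter called the base layer decoded signal). FIG. 4 is a block diagram showing an example of the internal configuration of the frequency determination section of the acoustic encoding apparatus of this embodiment.
[0084]
In FIG. 4, the frequency determination section 107 mainly comprises an FFT section 401, an estimated auditory masking calculator 402, and a determination section 403.
[0085]
The FFT section 401 orthogonally transforms the base layer decoded signal x(n) output from the upsampler 104, calculates the amplitude spectrum P(m), and outputs it to the estimated auditory masking calculator 402 and the determination section 403. Specifically, the FFT section 401 calculates the amplitude spectrum P(m) using Equation (1) below.
[0086]
[Equation 1]
\[ P(m) = \sqrt{\mathrm{Re}^2(m) + \mathrm{Im}^2(m)} \tag{1} \]
Here, Re(m) and Im(m) are the real and imaginary parts of the Fourier coefficients of the base layer decoded signal x(n), and m denotes frequency.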
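As a hedged illustration of the amplitude spectrum described above, the following minimal sketch forms P(m) from the real and imaginary parts of the Fourier coefficients using a direct DFT in plain Python. The function name `amplitude_spectrum`, the 16-sample test frame, and the restriction to bins 0 to N/2 are assumptions made for illustration; the source does not specify a frame length or an FFT implementation.

```python
import cmath
import math

def amplitude_spectrum(x):
    """Return P(m) = sqrt(Re(m)^2 + Im(m)^2) for bins m = 0 .. N/2 (direct DFT)."""
    n_samples = len(x)
    spectrum = []
    for m in range(n_samples // 2 + 1):
        coeff = sum(x[n] * cmath.exp(-2j * math.pi * m * n / n_samples)
                    for n in range(n_samples))
        spectrum.append(math.sqrt(coeff.real ** 2 + coeff.imag ** 2))
    return spectrum

# Hypothetical stand-in for one frame of the base layer decoded signal x(n):
# a cosine that completes exactly two cycles in 16 samples.
frame = [math.cos(2 * math.pi * 2 * n / 16) for n in range(16)]
P = amplitude_spectrum(frame)
```

For this test frame, all the energy falls in bin m = 2. A practical implementation would use an FFT routine instead of the O(N^2) loop shown here.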
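[0087]
Next, the estimated auditory masking calculator 402 calculates the estimated auditory masking M'(m) using the amplitude spectrum P(m) of the base layer decoded signal and outputs it to the determination section 403. Auditory masking is generally calculated from the spectrum of the input signal, but in this embodiment it is estimated using the base layer decoded signal x(n) instead. This is based on the idea that, because x(n) is determined so that its distortion relative to the input signal is small, it approximates the input signal well enough that using it in place of the input signal causes no major problem.
[0088]
Next, the determination section 403 uses the amplitude spectrum P(m) of the base layer decoded signal and the estimated auditory masking M'(m) obtained by the estimated auditory masking calculator 402 to determine the frequencies at which the enhancement layer encoder 108 encodes the error spectrum. The determination section 403 regards the amplitude spectrum P(m) of the base layer decoded signal as an approximation of the error spectrum and outputs to the enhancement layer encoder 108 the frequencies m for which Equation (2) below holds.
[0089]
[Equation 2]
\[ P(m) - M'(m) > 0 \tag{2} \]
[0090]
In Equation (2), the term P(m) estimates the magnitude of the error spectrum and the term M'(m) estimates the auditory masking. The determination section 403 compares the magnitudes of the estimated error spectrum and the estimated auditory masking. When Equation (2) is satisfied, that is, when the estimated error spectrum exceeds the estimated auditory masking, the error spectrum at that frequency is regarded as perceptible as noise and is made a target for encoding by the enhancement layer encoder 108.
[0091]
Conversely, when the estimated error spectrum is smaller than the estimated auditory masking, the determination section 403 regards the error spectrum at that frequency as imperceptible as noise because of the masking effect, and excludes it from quantization.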
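The decision rule described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `select_frequencies` and all numeric values are hypothetical. Because the encoder and the decoder both run the same rule on the same base layer decoded signal, they obtain the same frequency list without any side information.

```python
def select_frequencies(P, M_est):
    """Return the bins m at which P(m) exceeds the estimated masking M'(m)."""
    if len(P) != len(M_est):
        raise ValueError("P and M_est must have the same length")
    return [m for m, (p, mask) in enumerate(zip(P, M_est)) if p - mask > 0]

# Hypothetical 8-bin amplitude spectrum and estimated masking values.
P = [0.9, 0.2, 1.5, 0.1, 0.7, 0.05, 0.6, 0.3]
M_est = [0.5, 0.4, 0.5, 0.3, 0.7, 0.2, 0.1, 0.3]

encoder_side = select_frequencies(P, M_est)
decoder_side = select_frequencies(P, M_est)  # same inputs, so the same result
```

Bins where the spectrum only equals the masking value (m = 4 and m = 7 here) are not selected, since the condition is a strict inequality.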
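[0092]
Next, the operation of the estimated auditory masking calculator 402 is described. FIG. 5 shows an example of the internal configuration of the auditory masking calculator of the acoustic encoding apparatus of this embodiment. In FIG. 5, the estimated auditory masking calculator 402 mainly comprises a Bark spectrum calculator 501, a spread function convolution unit 502, a tonality calculator 503, and an auditory masking calculator 504.
[0093]
In FIG. 5, the Bark spectrum calculator 501 calculates the Bark spectrum B(k) using Equation (3) below.
[0094]
[Equation 3]
(Equation (3) is missing from the source text.)
Here, P(m) is the amplitude spectrum obtained from Equation (1) above. k corresponds to the Bark spectrum index, and FL(k) and FH(k) are the lowest and highest frequencies of the k-th Bark spectrum, respectively. The Bark spectrum B(k) represents the spectral intensity when the band is divided at equal intervals on the Bark scale. With the hertz scale denoted f and the Bark scale denoted B, the relationship between them is given by Equation (4) below.
[0095]
[Equation 4]
(Equation (4) is missing from the source text.)
[0096]
The spread function convolution unit 502 convolves the spread function SF(k) with the Bark spectrum B(k) using Equation (5) below to calculate C(k).
[0097]
[Equation 5]
\[ C(k) = B(k) * SF(k) \tag{5} \]
[0098]
The tonality calculator 503 obtains the spectral flatness SFM(k) of each Bark spectrum using Equation (6) below.
[0099]
[Equation 6]
\[ SFM(k) = \frac{\mu_g(k)}{\mu_a(k)} \tag{6} \]
Here, μg(k) is the geometric mean of the power spectrum contained in the k-th Bark spectrum, and μa(k) is the arithmetic mean of the power spectrum contained in the k-th Bark spectrum. The tonality calculator 503 then calculates the tonality coefficient α(k) from the decibel value SFMdB(k) of the spectral flatness SFM(k) using Equation (7) below.
[0100]
[Equation 7]
(Equation (7) is missing from the source text.)
[0101]
The auditory masking calculator 504 obtains the offset O(k) of each Bark scale from the tonality coefficient α(k) calculated by the tonality calculator 503, using Equation (8) below.
[0102]
[Equation 8]
(Equation (8) is missing from the source text.)
[0103]
The auditory masking calculator 504 then calculates the auditory masking T(k) by subtracting the offset O(k) from C(k) obtained by the spread function convolution unit 502, using Equation (9) below.
[0104]
[Equation 9]
(Equation (9) is missing from the source text.)
Here, T_q(k) is the absolute threshold, which represents the minimum value of auditory masking observed as a characteristic of human hearing. The auditory masking calculator 504 then converts the auditory masking T(k), expressed on the Bark scale, to the hertz scale to obtain the estimated auditory masking M'(m), and outputs it to the determination section 403.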
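Two of the per-band quantities described above can be sketched in a few lines. This is an illustration under stated assumptions: the band intensity is taken here to be the sum of squared amplitudes between FL(k) and FH(k), which is an assumed form since the equation itself is not available in the source, and the spectral flatness is the ratio of the geometric mean to the arithmetic mean of the power spectrum in the band, as the text defines it. The function names and band edges are hypothetical and are not real Bark band edges. The spreading, tonality, offset, and threshold steps are omitted because their formulas are not recoverable from the source.

```python
import math

def band_power(P, lo, hi):
    """Sum of squared amplitudes for bins lo..hi inclusive (assumed form of B(k))."""
    return sum(P[m] ** 2 for m in range(lo, hi + 1))

def spectral_flatness(P, lo, hi):
    """Geometric mean divided by arithmetic mean of the power spectrum in one band."""
    powers = [P[m] ** 2 for m in range(lo, hi + 1)]
    arithmetic = sum(powers) / len(powers)
    if arithmetic == 0.0 or min(powers) == 0.0:
        return 0.0  # a zero bin drives the geometric mean to zero
    geometric = math.exp(sum(math.log(p) for p in powers) / len(powers))
    return geometric / arithmetic

# Hypothetical amplitude spectrum: a flat band followed by a peaky (tonal) band.
P = [1.0, 1.0, 1.0, 1.0, 4.0, 0.1, 0.1, 0.1]
bands = [(0, 3), (4, 7)]  # hypothetical (FL(k), FH(k)) index pairs
B = [band_power(P, lo, hi) for lo, hi in bands]
SFM = [spectral_flatness(P, lo, hi) for lo, hi in bands]
```

A flatness near 1 indicates a noise-like band and a flatness near 0 indicates a tonal band, which is why this measure is a natural input to a tonality coefficient.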
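[0105]
Using the frequencies m to be quantized obtained in this way, the enhancement layer encoder 108 encodes the MDCT coefficients. FIG. 6 is a block diagram showing an example of the internal configuration of the enhancement layer encoder of this embodiment. The enhancement layer encoder 108 in FIG. 6 mainly comprises an MDCT section 601 and an MDCT coefficient quantizer 602.
[0106]
The MDCT section 601 multiplies the input signal output from the subtractor 106 by an analysis window and then applies the MDCT (modified discrete cosine transform) to obtain MDCT coefficients. The MDCT overlaps the analysis frame with each of the adjacent preceding and following frames by exactly half, and uses orthogonal bases that are odd functions in the first half of the analysis frame and even functions in the second half. When the waveform is synthesized, the inverse-transformed waveforms are overlapped and added, so no frame boundary distortion occurs. When the MDCT is performed, the input signal is multiplied by a window function such as a sine window. With the MDCT coefficients denoted X(n), they are calculated according to Equation (10).
[0107]
[Equation 10]
(Equation (10) is missing from the source text.)
[0108]
The MDCT coefficient quantizer 602 quantizes, among the coefficients output from the MDCT section 601, those corresponding to the frequencies to be quantized output from the frequency determination section 107. It then outputs the encoded code of the quantized MDCT coefficients to the multiplexer 109.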
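[0109]
Thus, according to the acoustic encoding apparatus of this embodiment, the frequencies to be encoded in the enhancement layer are determined from the signal obtained by decoding the base layer encoded code. These frequencies can therefore be determined from the base layer encoded signal alone, which is transmitted from the encoding side to the decoding side, so the frequency information itself need not be transmitted, and high-quality encoding can be performed at a low bit rate.
[0110]
The above embodiment describes calculating auditory masking using the FFT, but the MDCT can be used instead of the FFT. FIG. 7 is a block diagram showing an example of the internal configuration of the frequency determination section of this embodiment. Components identical to those in FIG. 4 are given the same reference numerals as in FIG. 4, and detailed description of them is omitted.
[0111]
The MDCT section 701 approximates the amplitude spectrum P(m) using MDCT coefficients. Specifically, the MDCT section 701 approximates P(m) using Equation (11) below.
[0112]
[Equation 11]
\[ P(m) = \sqrt{R^2(m)} \tag{11} \]
Here, R(m) denotes the MDCT coefficients obtained by applying the MDCT to the signal supplied from the upsampler 104.
[0113]
The estimated auditory masking calculator 402 calculates the Bark spectrum B(k) from P(m) as approximated by the MDCT section 701. Thereafter, the frequency information to be quantized is calculated by the method described above.
[0114]
In this way, the acoustic encoding apparatus of this embodiment can also calculate auditory masking using the MDCT.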
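[0115]
Next, the decoding side is described. FIG. 8 is a block diagram showing the configuration of an acoustic decoding apparatus according to Embodiment 1 of the present invention. The acoustic decoding apparatus 800 in FIG. 8 mainly comprises a demultiplexer 801, a base layer decoder 802, an upsampler 803, a frequency determination section 804, an enhancement layer decoder 805, and an adder 806.
[0116]
The demultiplexer 801 separates the code encoded by the acoustic encoding apparatus 100 into the first encoded code for the base layer and the second encoded code for the enhancement layer, outputs the first encoded code to the base layer decoder 802, and outputs the second encoded code to the enhancement layer decoder 805.
[0117]
The base layer decoder 802 decodes the first encoded code to obtain a decoded signal at sampling rate FL and outputs it to the upsampler 803. The upsampler 803 converts the decoded signal at sampling rate FL to a decoded signal at sampling rate FH and outputs it to the frequency determination section 804 and the adder 806.
[0118]
The frequency determination section 804 uses the upsampled base layer decoded signal to determine the frequencies of the error spectrum to be decoded by the enhancement layer decoder 805. It has the same configuration as the frequency determination section 107 in FIG. 1.
[0119]
The enhancement layer decoder 805 decodes the second encoded code to obtain a decoded signal at sampling rate FH. It then overlaps the decoded signals of the decoded enhancement frame units and outputs the overlapped decoded signal to the adder 806. Specifically, the enhancement layer decoder 805 multiplies the decoded signal by a synthesis window function, overlaps it by half a frame with the time-domain signal decoded in the previous frame, and adds them to generate the output signal.
[0120]
The adder 806 adds the base layer decoded signal upsampled by the upsampler 803 and the enhancement layer decoded signal decoded by the enhancement layer decoder 805, and outputs the result.
[0121]
Next, the operation of each block of the acoustic decoding apparatus according to this embodiment is described in detail. FIG. 9 is a block diagram showing an example of the internal configuration of the enhancement layer decoder 805 in FIG. 8. The enhancement layer decoder 805 in FIG. 9 mainly comprises an MDCT coefficient decoder 901, an IMDCT section 902, and an overlap adder 903.
[0122]
The MDCT coefficient decoder 901 decodes the quantized MDCT coefficients from the second encoded code output from the demultiplexer 801, based on the frequencies of the error spectrum to be decoded output from the frequency determination section 804. Specifically, it places the decoded MDCT coefficients at the frequencies indicated by the frequency determination section 804 and sets the coefficients at all other frequencies to zero.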
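The coefficient placement described above can be sketched together with a matching encoder step. This is a minimal sketch under assumptions: the source does not specify the quantizer, so a uniform scalar quantizer with a hypothetical step size stands in for it, and the function names and values are illustrative only.

```python
def quantize_selected(coeffs, selected, step):
    """Encoder side: uniformly quantize only the coefficients at the selected bins."""
    return [round(coeffs[m] / step) for m in selected]

def place_decoded(indices, selected, num_bins, step):
    """Decoder side: put decoded values at the selected bins and zero elsewhere."""
    spectrum = [0.0] * num_bins
    for m, q in zip(selected, indices):
        spectrum[m] = q * step
    return spectrum

error_spectrum = [0.42, -0.03, 1.27, 0.08, -0.66, 0.01]
selected = [0, 2, 4]   # bins that both sides derive from the base layer decoded signal
step = 0.25            # hypothetical quantizer step size

indices = quantize_selected(error_spectrum, selected, step)
decoded = place_decoded(indices, selected, len(error_spectrum), step)
```

Only three indices are transmitted for six bins, and the decoder needs no position information because it derives `selected` itself.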
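[0123]
The IMDCT section 902 applies the inverse MDCT to the MDCT coefficients output from the MDCT coefficient decoder 901, generates a time-domain signal, and outputs it to the overlap adder 903.
[0124]
The overlap adder 903 overlaps the decoded signals of the decoded enhancement frame units and outputs the overlapped decoded signal to the adder 806. Specifically, the overlap adder 903 multiplies the decoded signal by a synthesis window function, overlaps it by half a frame with the time-domain signal decoded in the previous frame, and adds them to generate the output signal.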
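The window, inverse transform, and half-frame overlap-add steps described above can be illustrated with the textbook MDCT. The sketch below assumes the standard MDCT and IMDCT definitions with a sine window for both analysis and synthesis and a 2/N scaling on the inverse; the source's own transform equation is not available, so these definitions, the function names, and the frame size are assumptions. No quantization is applied, so the overlap-added output reproduces the input.

```python
import math

def sine_window(frame_len):
    """Sine window; satisfies w[n]^2 + w[n + N]^2 = 1 for frame_len = 2N."""
    return [math.sin(math.pi * (n + 0.5) / frame_len) for n in range(frame_len)]

def mdct(frame, window):
    """Forward MDCT: 2N windowed samples -> N coefficients."""
    n_half = len(frame) // 2
    return [sum(window[n] * frame[n]
                * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
                for n in range(2 * n_half))
            for k in range(n_half)]

def imdct(coeffs, window):
    """Inverse MDCT followed by the synthesis window: N coefficients -> 2N samples."""
    n_half = len(coeffs)
    return [window[n] * (2.0 / n_half)
            * sum(coeffs[k]
                  * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
                  for k in range(n_half))
            for n in range(2 * n_half)]

def analyze_and_resynthesize(signal, n_half):
    """Transform half-overlapping frames and overlap-add the inverse transforms."""
    window = sine_window(2 * n_half)
    padded = [0.0] * n_half + list(signal) + [0.0] * n_half
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n_half, n_half):
        block = imdct(mdct(padded[start:start + 2 * n_half], window), window)
        for n, value in enumerate(block):
            out[start + n] += value  # add to the tail left by the previous frame
    return out[n_half:n_half + len(signal)]

signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(16)]
reconstructed = analyze_and_resynthesize(signal, 4)
```

Each individual inverse-transformed frame contains time-domain aliasing; it is the overlap-add with the neighboring frame that cancels it, which is why no frame boundary distortion appears.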
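[0125]
Thus, according to the acoustic decoding apparatus of this embodiment, the frequencies to be decoded in the enhancement layer are determined from the signal obtained by decoding the base layer encoded code. These frequencies can therefore be determined from the base layer encoded code alone, which is transmitted from the encoding side to the decoding side, so the frequency information itself need not be transmitted, and high-quality encoding can be performed at a low bit rate.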
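[0126]
(Embodiment 2)
This embodiment describes an example in which CELP is used for base layer encoding. FIG. 10 is a block diagram showing an example of the internal configuration of the base layer encoder 102 in FIG. 1 according to Embodiment 2 of the present invention. The base layer encoder 102 in FIG. 10 mainly comprises an LPC analyzer 1001, a perceptual weighting section 1002, an adaptive codebook searcher 1003, an adaptive gain quantizer 1004, a target vector generator 1005, a noise codebook searcher 1006, a noise gain quantizer 1007, and a multiplexer 1008.
[0127]
The LPC analyzer 1001 calculates the LPC coefficients of the input signal at sampling rate FL, converts them into parameters suitable for quantization, such as LSP coefficients, and quantizes them. It then outputs the encoded code obtained by this quantization to the multiplexer 1008.
[0128]
The LPC analyzer 1001 also calculates the quantized LSP coefficients from the encoded code, converts them to LPC coefficients, and outputs the quantized LPC coefficients to the adaptive codebook searcher 1003, the adaptive gain quantizer 1004, the noise codebook searcher 1006, and the noise gain quantizer 1007. In addition, the LPC analyzer 1001 outputs the unquantized LPC coefficients to the perceptual weighting section 1002, the adaptive codebook searcher 1003, the adaptive gain quantizer 1004, the noise codebook searcher 1006, and the noise gain quantizer 1007.
[0129]
The perceptual weighting section 1002 weights the input signal output from the downsampler 101 based on the LPC coefficients obtained by the LPC analyzer 1001. The purpose is to shape the spectrum of the quantization distortion so that it is masked by the spectral envelope of the input signal.
[0130]
The adaptive codebook searcher 1003 searches the adaptive codebook using the perceptually weighted input signal as the target signal. A signal obtained by repeating the past excitation sequence at the pitch period is called an adaptive vector, and the adaptive codebook consists of adaptive vectors generated with pitch periods in a predetermined range.
[0131]
Let t(n) be the perceptually weighted input signal, and let p_i(n) be the signal obtained by convolving the adaptive vector of pitch period i with the impulse response of the weighted synthesis filter formed from the unquantized and quantized LPC coefficients. The adaptive codebook searcher 1003 outputs to the multiplexer 1008, as a parameter, the pitch period i of the adaptive vector that minimizes the evaluation function D of Equation (12).
[0132]
[Equation 12]
\[ D = \sum_{n=0}^{N-1} t^2(n) - \frac{\left( \sum_{n=0}^{N-1} t(n)\, p_i(n) \right)^2}{\sum_{n=0}^{N-1} p_i^2(n)} \tag{12} \]
Here, N denotes the vector length. The first term of Equation (12) is independent of the pitch period i, so in practice the adaptive codebook searcher 1003 calculates only the second term.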
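The search described above can be sketched as follows, assuming the evaluation function has the usual CELP least-squares form, in which the term that depends on the pitch period is the squared correlation between t(n) and p_i(n) divided by the energy of p_i(n). Minimizing D is then equivalent to maximizing that term. The function name, the dictionary of pre-filtered candidate vectors, and the numeric values are hypothetical.

```python
def search_adaptive_codebook(target, candidates):
    """Return the pitch period whose filtered adaptive vector p_i maximizes
    (sum t*p_i)^2 / sum p_i^2, which is the only part of D that depends on i."""
    best_lag, best_score = None, -1.0
    for lag, p in candidates.items():
        energy = sum(v * v for v in p)
        if energy == 0.0:
            continue  # an all-zero candidate cannot reduce the error
        corr = sum(t * v for t, v in zip(target, p))
        score = corr * corr / energy
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Hypothetical weighted target t(n) and filtered adaptive vectors p_i(n).
target = [1.0, 2.0, 3.0, 4.0]
candidates = {
    20: [1.0, 0.0, -1.0, 0.0],
    21: [0.5, 1.0, 1.5, 2.0],   # a scaled copy of the target
    22: [1.0, 1.0, 1.0, 1.0],
}
best = search_adaptive_codebook(target, candidates)
```

The candidate at lag 21 is a scaled copy of the target, so it wins the search: an appropriate gain can make the remaining error zero.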
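[0133]
The adaptive gain quantizer 1004 quantizes the adaptive gain by which the adaptive vector is multiplied. The adaptive gain β is expressed by Equation (13) below. The adaptive gain quantizer 1004 scalar-quantizes this adaptive gain β and outputs the resulting code to the multiplexer 1008.
[0134]
[Equation 13]
\[ \beta = \frac{\sum_{n=0}^{N-1} t(n)\, p_i(n)}{\sum_{n=0}^{N-1} p_i^2(n)} \tag{13} \]
[0135]
The target vector generator 1005 subtracts the contribution of the adaptive vector from the input signal to generate and output the target vector used by the noise codebook searcher 1006 and the noise gain quantizer 1007. Let p_i(n) be the signal obtained by convolving the impulse response of the weighted synthesis filter with the adaptive vector that minimizes the evaluation function D of Equation (12), and let β_q be the quantized value obtained by scalar-quantizing the adaptive gain β of Equation (13). The target vector t2(n) is then expressed by Equation (14) below.
[0136]
[Equation 14]
\[ t_2(n) = t(n) - \beta_q\, p_i(n) \tag{14} \]
[0137]
The noise codebook searcher 1006 searches the noise codebook using the target vector t2(n), the unquantized LPC coefficients, and the quantized LPC coefficients. For example, the noise codebook searcher 1006 can use random noise or signals learned from a large amount of speech. The noise codebook of the noise codebook searcher 1006 can also be represented, as in an algebraic codebook, by vectors having a predetermined, very small number of pulses of amplitude 1. An algebraic codebook has the feature that the optimal combination of pulse positions and pulse signs (polarities) can be determined with a small amount of computation.
[0138]
Let t2(n) be the target vector and c_j(n) be the signal obtained by convolving the noise vector corresponding to code j with the impulse response of the weighted synthesis filter. The noise codebook searcher 1006 outputs to the multiplexer 1008 the index j of the noise vector that minimizes the evaluation function D of Equation (15) below.
[0139]
[Equation 15]
\[ D = \sum_{n=0}^{N-1} t_2^2(n) - \frac{\left( \sum_{n=0}^{N-1} t_2(n)\, c_j(n) \right)^2}{\sum_{n=0}^{N-1} c_j^2(n)} \tag{15} \]
[0140]
The noise gain quantizer 1007 quantizes the noise gain by which the noise vector is multiplied. The noise gain quantizer 1007 calculates the noise gain γ using Equation (16) below, scalar-quantizes it, and outputs the result to the multiplexer 1008.
[0141]
[Equation 16]
\[ \gamma = \frac{\sum_{n=0}^{N-1} t_2(n)\, c_j(n)}{\sum_{n=0}^{N-1} c_j^2(n)} \tag{16} \]
[0142]
The multiplexer 1008 multiplexes the encoded codes of the LPC coefficients, adaptive vector, adaptive gain, noise vector, and noise gain sent to it, and outputs the result to the local decoder 103 and the multiplexer 109.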
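[0143]
Next, the decoding side is described. FIG. 11 is a block diagram showing an example of the internal configuration of the base layer decoder 802 in FIG. 8 according to this embodiment. The base layer decoder 802 in FIG. 11 mainly comprises a demultiplexer 1101, an excitation generator 1102, and a synthesis filter 1103.
[0144]
The demultiplexer 1101 separates the first encoded code output from the demultiplexer 801 into the encoded codes of the LPC coefficients, adaptive vector, adaptive gain, noise vector, and noise gain, and outputs the encoded codes of the adaptive vector, adaptive gain, noise vector, and noise gain to the excitation generator 1102. Similarly, the demultiplexer 1101 outputs the encoded code of the LPC coefficients to the synthesis filter 1103.
[0145]
The excitation generator 1102 decodes the encoded codes of the adaptive vector, adaptive vector gain, noise vector, and noise vector gain, and generates the excitation vector ex(n) using Equation (17) below.
[0146]
[Equation 17]
\[ ex(n) = \beta_q\, q(n) + \gamma_q\, c(n) \tag{17} \]
Here, q(n) is the adaptive vector, β_q is the adaptive vector gain, c(n) is the noise vector, and γ_q is the noise vector gain.
[0147]
The synthesis filter 1103 decodes the LPC coefficients from their encoded code and generates the synthesized signal syn(n) from the decoded LPC coefficients using Equation (18) below.
[0148]
[Equation 18]
(Equation (18) is missing from the source text.)
Here, α_q denotes the decoded LPC coefficients and NP denotes the order of the LPC coefficients. The synthesis filter 1103 then outputs the decoded signal syn(n) to the upsampler 803.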
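The two decoder steps described above, excitation generation and LPC synthesis, can be sketched as follows. The excitation is the gain-weighted sum of the adaptive and noise vectors, as the text states. The synthesis recursion uses an all-pole filter with the sign convention syn(n) = ex(n) + sum over i of α_q(i)·syn(n − i); that sign convention is an assumption, since the source equation is not available and some codecs define the coefficients with the opposite sign. Function names and values are hypothetical.

```python
def generate_excitation(adaptive, noise, beta_q, gamma_q):
    """ex(n) = beta_q * q(n) + gamma_q * c(n)."""
    return [beta_q * q + gamma_q * c for q, c in zip(adaptive, noise)]

def synthesize(excitation, lpc, history=None):
    """All-pole synthesis: syn(n) = ex(n) + sum_i lpc[i-1] * syn(n - i).
    The sign convention is an assumption, not taken from the source."""
    order = len(lpc)
    past = list(history) if history is not None else [0.0] * order  # past[0] = syn(n-1)
    out = []
    for ex in excitation:
        value = ex + sum(a * s for a, s in zip(lpc, past))
        out.append(value)
        past = ([value] + past[:-1]) if order else []
    return out

# Hypothetical decoded parameters for a 4-sample subframe and a first-order filter.
q = [1.0, 0.0, 0.0, 0.0]   # adaptive vector
c = [0.0, 1.0, 0.0, 0.0]   # noise vector
ex = generate_excitation(q, c, beta_q=0.5, gamma_q=2.0)
syn = synthesize(ex, lpc=[0.5])
```

In a real decoder the filter memory (`history`) would carry over from the previous subframe rather than start at zero.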
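[0149]
Thus, according to the acoustic encoding apparatus and acoustic decoding apparatus of this embodiment, the transmitting side encodes the input signal by applying CELP in the base layer, and the receiving side decodes the encoded input signal by applying CELP, so a high-quality base layer can be realized at a low bit rate.
[0150]
To suppress the perception of quantization distortion, the apparatus of this embodiment can also adopt a configuration in which a post filter is connected in cascade after the synthesis filter 1103. FIG. 12 is a block diagram showing an example of the internal configuration of the base layer decoder of this embodiment. Components identical to those in FIG. 11 are given the same reference numerals as in FIG. 11, and detailed description of them is omitted.
[0151]
Various configurations can be applied to the post filter 1201 to suppress the perception of quantization distortion. A typical method uses a formant emphasis filter formed from the LPC coefficients decoded by the demultiplexer 1101. The formant emphasis filter H_f(z) is expressed by Equation (19) below.
[0152]
[Equation 19]
(Equation (19) is missing from the source text.)
Here, A(z) is the synthesis filter formed from the decoded LPC coefficients, and γ_n, γ_d, and μ are constants that determine the filter characteristics.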
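[0153]
(Embodiment 3)
FIG. 13 is a block diagram showing an example of the internal configuration of the frequency determination section of the acoustic encoding apparatus according to Embodiment 3 of the present invention. Components identical to those in FIG. 4 are given the same reference numerals as in FIG. 4, and detailed description of them is omitted. The frequency determination section 107 in FIG. 13 differs from that in FIG. 4 in that it comprises an estimated error spectrum calculator 1301 and a determination section 1302, estimates an estimated error spectrum E'(m) from the amplitude spectrum P(m) of the base layer decoded signal, and uses the estimated error spectrum E'(m) and the estimated auditory masking M'(m) to determine the frequencies of the error spectrum to be encoded by the enhancement layer encoder 108.
[0154]
The FFT section 401 orthogonally transforms the base layer decoded signal x(n) output from the upsampler 104, calculates the amplitude spectrum P(m), and outputs it to the estimated auditory masking calculator 402 and the estimated error spectrum calculator 1301.
[0155]
The estimated error spectrum calculator 1301 calculates the estimated error spectrum E'(m) from the amplitude spectrum P(m) of the base layer decoded signal calculated by the FFT section 401, and outputs it to the determination section 1302. The estimated error spectrum E'(m) is calculated by processing the amplitude spectrum P(m) of the base layer decoded signal to bring it closer to flat. Specifically, the estimated error spectrum calculator 1301 calculates the estimated error spectrum E'(m) using Equation (20) below.
[0156]
[Equation 20]
\[ E'(m) = a \cdot P(m)^{\gamma} \tag{20} \]
Here, a and γ are constants greater than or equal to 0 and less than 1.
[0157]
The determination section 1302 uses the estimated error spectrum E'(m) estimated by the estimated error spectrum calculator 1301 and the estimated auditory masking M'(m) obtained by the estimated auditory masking calculator 402 to determine the frequencies at which the enhancement layer encoder 108 encodes the error spectrum.
[0158]
Next, the estimated error spectrum calculated by the estimated error spectrum calculator 1301 of this embodiment is described. FIG. 14 shows an example of the residual spectrum calculated by the estimated error spectrum calculator of this embodiment.
[0159]
As shown in FIG. 14, the error spectrum E(m) has a flatter spectral shape than the amplitude spectrum P(m) of the base layer decoded signal and lower power over the whole band. Raising the amplitude spectrum P(m) to the power γ (0 < γ < 1) therefore flattens the spectral shape, and multiplying it by a (0 < a < 1) reduces the power over the whole band, which improves the accuracy of the error spectrum estimate.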
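The flattening and scaling described above can be sketched in a few lines. The function name and the values of a and γ are hypothetical; the source only states that both constants lie between 0 and 1.

```python
def estimate_error_spectrum(P, a, gamma):
    """E'(m) = a * P(m)**gamma: the exponent flattens the shape, a lowers the level."""
    return [a * (p ** gamma) for p in P]

# Hypothetical amplitude spectrum of the base layer decoded signal.
P = [16.0, 4.0, 1.0, 0.25]
E_est = estimate_error_spectrum(P, a=0.5, gamma=0.5)

# Ratio between the largest and smallest value, as a simple measure of flatness.
dynamic_range_before = max(P) / min(P)
dynamic_range_after = max(E_est) / min(E_est)
```

With γ = 0.5 the ratio between the largest and smallest spectral value drops from 64 to 8, and with a = 0.5 the level of the strong components is reduced as well.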
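[0160]
Similarly, on the decoding side, the frequency determination section 804 of the acoustic decoding apparatus 800 has the same internal configuration as the encoding-side frequency determination section 107 in FIG. 13.
[0161]
Thus, according to the acoustic encoding apparatus of this embodiment, the residual spectrum estimated from the spectrum of the base layer decoded signal is smoothed, so the estimated error spectrum can be made to approximate the residual spectrum, and the error spectrum can be encoded efficiently in the enhancement layer.
[0162]
This embodiment has been described for the case where the FFT is used, but, as in Embodiment 1 described above, the MDCT can be used instead of the FFT.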
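[0163]
(Embodiment 4)
FIG. 15 is a block diagram showing an example of the internal configuration of the frequency determination section of the acoustic encoding apparatus according to Embodiment 4 of the present invention. Components identical to those in FIG. 4 are given the same reference numerals as in FIG. 4, and detailed description of them is omitted. The frequency determination section 107 in FIG. 15 differs from that in FIG. 4 in that it comprises an estimated auditory masking correction section 1501 and a determination section 1502, and in that, after the estimated auditory masking calculator 402 calculates the estimated auditory masking M'(m) from the amplitude spectrum P(m) of the base layer decoded signal, this estimated auditory masking M'(m) is corrected based on information in the encoded code of the base layer encoder 102.
[0164]
The FFT section 401 orthogonally transforms the base layer decoded signal x(n) output from the upsampler 104, calculates the amplitude spectrum P(m), and outputs it to the estimated auditory masking calculator 402 and the determination section 1502. The estimated auditory masking calculator 402 calculates the estimated auditory masking M'(m) using the amplitude spectrum P(m) of the base layer decoded signal and outputs it to the estimated auditory masking correction section 1501.
[0165]
The estimated auditory masking correction section 1501 corrects the estimated auditory masking M'(m) obtained by the estimated auditory masking calculator 402, using the base layer encoded code information input from the base layer encoder 102.
[0166]
Here, the first-order PARCOR coefficient calculated from the decoded LPC coefficients is assumed to be given as the base layer encoded code information. In general, LPC coefficients and PARCOR coefficients represent the spectral envelope of the input signal. As the order of the PARCOR coefficients is reduced, the shape of the spectral envelope is progressively simplified owing to the nature of PARCOR coefficients, and at order 1 the coefficient represents the degree of spectral tilt.
[0167]
The spectral characteristics of the music and speech given as input signals include cases where power is biased toward the low band relative to the high band (for example, vowels) and the opposite case (for example, consonants). The base layer decoded signal is easily affected by these spectral characteristics of the input signal and tends to overemphasize the bias in spectral power.
[0168]
In the acoustic encoding apparatus of this embodiment, the estimated auditory masking correction section 1501 therefore uses the first-order PARCOR coefficient described above to compensate for the overemphasized spectral bias, which improves the accuracy of the estimated masking M'(m).
[0169]
The estimated auditory masking correction section 1501 calculates a correction filter H_k(z) from the first-order PARCOR coefficient k(1) output from the base layer encoder 102, using Equation (21) below.
[0170]
[Equation 21]
(Equation (21) is missing from the source text.)
Here, β is a positive constant less than 1. Next, the estimated auditory masking correction section 1501 calculates the amplitude characteristic K(m) of H_k(z) using Equation (22) below.
[0171]
[Equation 22]
(Equation (22) is missing from the source text.)
[0172]
The estimated auditory masking correction section 1501 then calculates the corrected estimated auditory masking M''(m) from the amplitude characteristic K(m) of the correction filter, using Equation (23) below.
[0173]
[Equation 23]
(Equation (23) is missing from the source text.)
[0174]
The estimated auditory masking correction section 1501 then outputs the corrected auditory masking M''(m) to the determination section 1502 in place of the estimated auditory masking M'(m).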
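[0175]
The determination section 1502 uses the amplitude spectrum P(m) of the base layer decoded signal and the corrected auditory masking M''(m) output from the estimated auditory masking correction section 1501 to determine the frequencies at which the enhancement layer encoder 108 encodes the error spectrum.
[0176]
Thus, according to the acoustic encoding apparatus of this embodiment, auditory masking is calculated from the spectrum of the input signal using the characteristics of the masking effect, and in enhancement layer encoding, quantization is performed so that the quantization distortion is at or below this masking value. The number of MDCT coefficients to be quantized can thus be reduced without degrading quality, and high-quality encoding can be performed at a low bit rate.
[0177]
Also, according to the acoustic encoding apparatus of this embodiment, the estimated auditory masking estimated from the amplitude spectrum of the base layer decoded signal is corrected based on information in the encoded code of the base layer encoder, so the accuracy of the estimated auditory masking can be improved and, as a result, the error spectrum can be encoded efficiently in the enhancement layer.
[0178]
Similarly, on the decoding side, the frequency determination section 804 of the acoustic decoding apparatus 800 has the same internal configuration as the encoding-side frequency determination section 107 in FIG. 15.
[0179]
The frequency determination section 107 of this embodiment can also adopt a configuration that combines this embodiment with Embodiment 3. FIG. 16 is a block diagram showing an example of the internal configuration of the frequency determination section of the acoustic encoding apparatus of this embodiment. Components identical to those in FIG. 4 are given the same reference numerals as in FIG. 4, and detailed description of them is omitted.
[0180]
The FFT section 401 orthogonally transforms the base layer decoded signal x(n) output from the upsampler 104, calculates the amplitude spectrum P(m), and outputs it to the estimated auditory masking calculator 402 and the estimated error spectrum calculator 1301.
[0181]
The estimated auditory masking calculator 402 calculates the estimated auditory masking M'(m) using the amplitude spectrum P(m) of the base layer decoded signal and outputs it to the estimated auditory masking correction section 1501.
[0182]
The estimated auditory masking correction section 1501 corrects the estimated auditory masking M'(m) obtained by the estimated auditory masking calculator 402, using the base layer encoded code information input from the base layer encoder 102.
[0183]
The estimated error spectrum calculator 1301 calculates the estimated error spectrum E'(m) from the amplitude spectrum P(m) of the base layer decoded signal calculated by the FFT section 401, and outputs it to the determination section 1601.
[0184]
The determination section 1601 uses the estimated error spectrum E'(m) estimated by the estimated error spectrum calculator 1301 and the corrected auditory masking M''(m) output from the estimated auditory masking correction section 1501 to determine the frequencies at which the enhancement layer encoder 108 encodes the error spectrum.
[0185]
This embodiment has also been described for the case where the FFT is used, but, as in Embodiment 1 described above, the MDCT can be used instead of the FFT.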
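[0186]
(Embodiment 5)
FIG. 17 is a block diagram showing an example of the internal configuration of the enhancement layer encoder of the acoustic encoding apparatus according to Embodiment 5 of the present invention. Components identical to those in FIG. 6 are given the same reference numerals as in FIG. 6, and detailed description of them is omitted. The enhancement layer encoder in FIG. 17 differs from that in FIG. 6 in that it comprises an ordering section 1701 and an MDCT coefficient quantizer 1702, and weights the amount of information after encoding for each of the frequencies supplied from the frequency determination section 107 according to the magnitude of the estimated distortion value D(m).
[0187]
In FIG. 17, the MDCT section 601 multiplies the input signal output from the subtractor 106 by an analysis window, applies the MDCT (modified discrete cosine transform) to obtain MDCT coefficients, and outputs them to the MDCT coefficient quantizer 1702.
[0188]
The ordering section 1701 receives the frequency information obtained by the frequency determination section 107 and calculates, for each frequency, the amount D(m) by which the estimated error spectrum E'(m) exceeds the estimated auditory masking M'(m) (hereinafter called the estimated distortion value). The estimated distortion value D(m) is defined by Equation (24) below.
[0189]
[Equation 24]
\[ D(m) = E'(m) - M'(m) \tag{24} \]
[0190]
Here, the ordering section 1701 calculates only the estimated distortion values D(m) that satisfy Equation (25) below.
[0191]
[Equation 25]
\[ E'(m) - M'(m) > 0 \tag{25} \]
[0192]
The ordering section 1701 then orders the frequencies from the largest estimated distortion value D(m) and outputs this frequency information to the MDCT coefficient quantizer 1702. Based on the frequency information ordered by D(m), the MDCT coefficient quantizer 1702 quantizes the error spectrum E(m) at each frequency, allocating more bits to frequencies with larger D(m).
[0193]
As an example, the case where the frequencies and estimated distortion values sent from the frequency determination means are as shown in FIG. 18 is described. FIG. 18 shows an example of the ranking of estimated distortion values by the ordering section of this embodiment.
[0194]
From the information in FIG. 18, the ordering section 1701 sorts the frequencies in descending order of the estimated distortion value D(m). In this example, the processing of the ordering section 1701 yields the order m = 7, 8, 4, 9, 1, 11, 3, 12. The ordering section 1701 outputs this ordering information to the MDCT coefficient quantizer 1702.
[0195]
Of the error spectrum E(m) supplied from the MDCT section 601, the MDCT coefficient quantizer 1702 quantizes E(7), E(8), E(4), E(9), E(1), E(11), E(3), and E(12) based on the ordering information supplied from the ordering section 1701.
[0196]
Here, more bits are allocated to quantizing the error spectrum at the head of the ordering, and fewer bits toward the tail. That is, the larger the estimated distortion value D(m) at a frequency, the more bits are allocated to quantizing its error spectrum, and the smaller D(m), the fewer bits are allocated.
[0197]
For example, bits are allocated as 8 bits for E(7), 7 bits each for E(8) and E(4), 6 bits each for E(9) and E(1), and 5 bits each for E(11), E(3), and E(12). Adaptive bit allocation according to the estimated distortion value D(m) in this way improves quantization efficiency.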
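The ordering and bit allocation described above can be sketched as follows. The numeric spectrum and masking values are hypothetical, chosen only so that the resulting order matches the example order m = 7, 8, 4, 9, 1, 11, 3, 12 given in the text; the actual values of FIG. 18 are not available in the source. The per-rank bit table is the one from the example. The function names are illustrative.

```python
def order_by_distortion(E_est, M_est):
    """Return the frequencies with D(m) = E'(m) - M'(m) > 0, largest D(m) first."""
    distortion = {m: E_est[m] - M_est[m] for m in E_est if E_est[m] - M_est[m] > 0}
    return sorted(distortion, key=lambda m: distortion[m], reverse=True)

def allocate_bits(ordered, bits_by_rank):
    """Give the frequency at rank r the r-th entry of the bit table."""
    return {m: bits for m, bits in zip(ordered, bits_by_rank)}

# Hypothetical estimated error spectrum E'(m) and a flat estimated masking M'(m).
E_est = {1: 5.0, 2: 0.0, 3: 3.0, 4: 7.0, 5: 0.5, 6: 1.0,
         7: 9.0, 8: 8.0, 9: 6.0, 10: 0.2, 11: 4.0, 12: 2.0}
M_est = {m: 1.0 for m in E_est}

ordered = order_by_distortion(E_est, M_est)
bits = allocate_bits(ordered, [8, 7, 7, 6, 6, 5, 5, 5])
```

Because the decoder can compute the same ordering from its own copy of the base layer decoded signal, the bit allocation also needs no side information.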
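[0198]
When vector quantization is applied, the enhancement layer encoder 108 forms vectors in order starting from the error spectrum at the head of the ordering and vector-quantizes each vector. The vectors and the quantization bit allocation are arranged so that the error spectrum at the head receives more bits and the error spectrum at the tail receives fewer. In the example of FIG. 18, three vectors of 2, 2, and 4 dimensions are formed, V1 = (E(7), E(8)), V2 = (E(4), E(9)), and V3 = (E(1), E(11), E(3), E(12)), and bits are allocated as 10 bits for V1, 8 bits for V2, and 8 bits for V3.
[0199]
Thus, according to the acoustic encoding apparatus of this embodiment, in enhancement layer encoding, more information is allocated to frequencies at which the estimated error spectrum exceeds the estimated auditory masking by a large amount, so quantization efficiency can be improved.
[0200]
Next, the decoding side is described. FIG. 19 is a block diagram showing an example of the internal configuration of the enhancement layer decoder of the acoustic decoding apparatus according to Embodiment 5 of the present invention. Components identical to those in FIG. 9 are given the same reference numerals as in FIG. 9, and detailed description of them is omitted. The enhancement layer decoder 805 in FIG. 19 differs from that in FIG. 9 in that it comprises an ordering section 1901 and an MDCT coefficient decoder 1902, and orders the frequencies supplied from the frequency determination section 804 according to the magnitude of the estimated distortion value D(m).
[0201]
The ordering section 1901 calculates the estimated distortion value D(m) using Equation (24) above. The ordering section 1901 has the same configuration as the ordering section 1701 described above. This configuration makes it possible to decode the encoded code of the above acoustic encoding method, which improves quantization efficiency through adaptive bit allocation.
[0202]
The MDCT coefficient decoder 1902 decodes the second encoded code output from the demultiplexer 801 using the frequency information ordered according to the magnitude of the estimated distortion value D(m). Specifically, the MDCT coefficient decoder 1902 places the decoded MDCT coefficients at the frequencies supplied from the frequency determination section 804 and sets the coefficients at all other frequencies to zero. The IMDCT section 902 then applies the inverse MDCT to the MDCT coefficients obtained from the MDCT coefficient decoder 1902 to generate a time-domain signal.
[0203]
The overlap adder 903 multiplies this signal by a synthesis window function, overlaps it by half a frame with the time-domain signal decoded in the previous frame, and adds them to generate the output signal. The overlap adder 903 outputs this output signal to the adder 806.
[0204]
Thus, according to the acoustic decoding apparatus of this embodiment, in enhancement layer encoding, vector quantization is performed with adaptive bit allocation according to the amount by which the estimated error spectrum exceeds the estimated auditory masking, so quantization efficiency can be improved.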
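[0205]
(Embodiment 6)
FIG. 20 is a block diagram showing an example of the internal configuration of the enhancement layer encoder of the acoustic encoding apparatus according to Embodiment 6 of the present invention. Components identical to those in FIG. 6 are given the same reference numerals as in FIG. 6, and detailed description of them is omitted. The enhancement layer encoder in FIG. 20 differs from that in FIG. 6 in that it comprises a fixed band specification section 2001 and an MDCT coefficient quantizer 2002, and quantizes the MDCT coefficients in a predetermined band together with those at the frequencies obtained from the frequency determination section 107.
[0206]
In FIG. 20, a perceptually important band is set in advance in the fixed band specification section 2001. Here, the frequencies included in the set band are assumed to be m = 15, 16.
[0207]
The MDCT coefficient quantizer 2002 uses the auditory masking output from the frequency determination section 107 to classify the coefficients of the input signal output from the MDCT section 601 into coefficients to be quantized and coefficients not to be quantized, and encodes the coefficients to be quantized together with the coefficients in the band set by the fixed band specification section 2001.
[0208]
If the frequencies are those shown in FIG. 18, the MDCT coefficient quantizer 2002 quantizes the error spectrum E(1), E(3), E(4), E(7), E(8), E(9), E(11), E(12) and the error spectrum E(15), E(16) at the frequencies specified by the fixed band specification section 2001.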
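The combination of masking-selected frequencies and a fixed band described above amounts to a set union. The sketch below uses the frequencies from the example in the text; the function name is hypothetical.

```python
def frequencies_to_encode(selected, fixed_band):
    """Union of the masking-selected bins and the always-encoded fixed band."""
    return sorted(set(selected) | set(fixed_band))

selected = [7, 8, 4, 9, 1, 11, 3, 12]   # from the frequency determination section
fixed_band = [15, 16]                   # preset perceptually important band

targets = frequencies_to_encode(selected, fixed_band)
```

Using a set also covers the case where a fixed-band frequency happens to be selected by the masking rule as well, so it is encoded only once.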
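[0209]
Thus, according to the acoustic encoding apparatus of this embodiment, a band that is perceptually important but unlikely to be selected for encoding is forcibly quantized. Even when a frequency that should have been selected for encoding is not selected, the error spectrum at frequencies in the perceptually important band is always quantized, so quality can be improved.
[0210]
Next, the decoding side is described. FIG. 21 is a block diagram showing an example of the internal configuration of the enhancement layer decoder of the acoustic decoding apparatus according to Embodiment 6 of the present invention. Components identical to those in FIG. 9 are given the same reference numerals as in FIG. 9, and detailed description of them is omitted. The enhancement layer decoder in FIG. 21 differs from that in FIG. 9 in that it comprises a fixed band specification section 2101 and an MDCT coefficient decoder 2102, and decodes the MDCT coefficients in a predetermined band together with those at the frequencies obtained from the frequency determination section 804.
[0211]
In FIG. 21, a perceptually important band is set in advance in the fixed band specification section 2101.
[0212]
The MDCT coefficient decoder 2102 decodes the quantized MDCT coefficients from the second encoded code output from the demultiplexer 801, based on the frequencies of the error spectrum to be decoded output from the frequency determination section 804. Specifically, it places the decoded MDCT coefficients at the frequencies indicated by the frequency determination section 804 and the fixed band specification section 2101, and sets the coefficients at all other frequencies to zero.
[0213]
The IMDCT section 902 applies the inverse MDCT to the MDCT coefficients output from the MDCT coefficient decoder 2102, generates a time-domain signal, and outputs it to the overlap adder 903.
[0214]
Thus, according to the acoustic decoding apparatus of this embodiment, the MDCT coefficients in a predetermined band are decoded, so a signal can be decoded in which a band that is perceptually important but unlikely to be selected for encoding has been forcibly quantized. Even when a frequency that should have been selected for encoding on the encoding side is not selected, the error spectrum at frequencies in the perceptually important band is always quantized, so quality can be improved.
[0215]
The enhancement layer encoder and enhancement layer decoder of this embodiment can also adopt a configuration that combines this embodiment with Embodiment 5. FIG. 22 is a block diagram showing an example of the internal configuration of the enhancement layer encoder of the acoustic encoding apparatus of this embodiment. Components identical to those in FIG. 6 are given the same reference numerals as in FIG. 6, and detailed description of them is omitted.
[0216]
In FIG. 22, the MDCT section 601 multiplies the input signal output from the subtractor 106 by an analysis window, applies the MDCT (modified discrete cosine transform) to obtain MDCT coefficients, and outputs them to the MDCT coefficient quantizer 2201.
[0217]
The ordering section 1701 receives the frequency information obtained by the frequency determination section 107 and calculates, for each frequency, the amount D(m) by which the estimated error spectrum E'(m) exceeds the estimated auditory masking M'(m) (hereinafter called the estimated distortion value).
A perceptually important band is set in advance in the fixed band specification section 2001.
[0218]
Based on the frequency information ordered by the estimated distortion value D(m), the MDCT coefficient quantizer 2201 quantizes the error spectrum E(m) at each frequency, allocating more bits to frequencies with larger D(m). The MDCT coefficient quantizer 2201 also encodes the coefficients in the band set by the fixed band specification section 2001.
[0219]
Next, the decoding side is described. FIG. 23 is a block diagram showing an example of the internal configuration of the enhancement layer decoder of the acoustic decoding apparatus according to Embodiment 6 of the present invention. Components identical to those in FIG. 9 are given the same reference numerals as in FIG. 9, and detailed description of them is omitted.
[0220]
In FIG. 23, the ordering section 1901 receives the frequency information obtained by the frequency determination section 804 and calculates, for each frequency, the amount D(m) by which the estimated error spectrum E'(m) exceeds the estimated auditory masking M'(m) (hereinafter called the estimated distortion value).
[0221]
The ordering section 1901 then orders the frequencies from the largest estimated distortion value D(m) and outputs this frequency information to the MDCT coefficient decoder 2301. A perceptually important band is set in advance in the fixed band specification section 2101.
[0222]
The MDCT coefficient decoder 2301 decodes the quantized MDCT coefficients from the second encoded code output from the demultiplexer 801, based on the frequencies of the error spectrum to be decoded output from the ordering section 1901. Specifically, it places the decoded MDCT coefficients at the frequencies indicated by the frequency determination section 804 and the fixed band specification section 2101, and sets the coefficients at all other frequencies to zero.
[0223]
The IMDCT section 902 applies the inverse MDCT to the MDCT coefficients output from the MDCT coefficient decoder 2301, generates a time-domain signal, and outputs it to the overlap adder 903.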
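[0224]
(Embodiment 7)
Next, Embodiment 7 of the present invention is described with reference to the drawings. FIG. 24 is a block diagram showing the configuration of a communication apparatus according to Embodiment 7 of the present invention. This embodiment is characterized in that signal processing apparatus 2403 in FIG. 24 is constituted by one of the acoustic encoding apparatuses shown in Embodiments 1 to 6 described above.
[0225]
As shown in FIG. 24, communication apparatus 2400 according to Embodiment 7 of the present invention comprises input apparatus 2401, A/D conversion apparatus 2402, and signal processing apparatus 2403, which is connected to network 2404.
[0226]
A/D conversion apparatus 2402 is connected to the output terminal of input apparatus 2401. The input terminal of signal processing apparatus 2403 is connected to the output terminal of A/D conversion apparatus 2402. The output terminal of signal processing apparatus 2403 is connected to network 2404.
[0227]
Input apparatus 2401 converts sound waves audible to the human ear into an analog signal, which is an electrical signal, and supplies it to A/D conversion apparatus 2402. A/D conversion apparatus 2402 converts the analog signal into a digital signal and supplies it to signal processing apparatus 2403. Signal processing apparatus 2403 encodes the incoming digital signal to generate a code and outputs it to network 2404.
[0228]
Thus, the communication apparatus of this embodiment can obtain in communication the effects shown in Embodiments 1 to 6 described above, and can provide an acoustic encoding apparatus that encodes an acoustic signal efficiently with a small number of bits.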
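[0229]
(Embodiment 8)
Next, Embodiment 8 of the present invention is described with reference to the drawings. FIG. 25 is a block diagram showing the configuration of a communication apparatus according to Embodiment 8 of the present invention. This embodiment is characterized in that signal processing apparatus 2503 in FIG. 25 is constituted by one of the acoustic decoding apparatuses shown in Embodiments 1 to 6 described above.
[0230]
As shown in FIG. 25, communication apparatus 2500 according to Embodiment 8 of the present invention comprises receiving apparatus 2502, which is connected to network 2501, signal processing apparatus 2503, D/A conversion apparatus 2504, and output apparatus 2505.
[0231]
The input terminal of receiving apparatus 2502 is connected to network 2501. The input terminal of signal processing apparatus 2503 is connected to the output terminal of receiving apparatus 2502. The input terminal of D/A conversion apparatus 2504 is connected to the output terminal of signal processing apparatus 2503. The input terminal of output apparatus 2505 is connected to the output terminal of D/A conversion apparatus 2504.
[0232]
Receiving apparatus 2502 receives a digital encoded acoustic signal from network 2501, generates a digital received acoustic signal, and supplies it to signal processing apparatus 2503. Signal processing apparatus 2503 receives the received acoustic signal from receiving apparatus 2502, decodes it to generate a digital decoded acoustic signal, and supplies it to D/A conversion apparatus 2504. D/A conversion apparatus 2504 converts the digital decoded acoustic signal from signal processing apparatus 2503 into an analog decoded acoustic signal and supplies it to output apparatus 2505. Output apparatus 2505 converts the analog decoded acoustic signal, which is an electrical signal, into vibrations of the air and outputs it as sound waves audible to the human ear.
[0233]
Thus, the communication apparatus of this embodiment can obtain in communication the effects shown in Embodiments 1 to 6 described above and can decode an acoustic signal that has been encoded efficiently with a small number of bits, so it can output a good acoustic signal.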
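[0234]
(Embodiment 9)
Next, Embodiment 9 of the present invention is described with reference to the drawings. FIG. 26 is a block diagram showing the configuration of a communication apparatus according to Embodiment 9 of the present invention. This embodiment is characterized in that signal processing apparatus 2603 in FIG. 26 is constituted by one of the acoustic encoders shown in Embodiments 1 to 6 described above.
[0235]
As shown in FIG. 26, communication apparatus 2600 according to Embodiment 9 of the present invention comprises input apparatus 2601, A/D conversion apparatus 2602, signal processing apparatus 2603, RF modulation apparatus 2604, and antenna 2605.
[0236]
Input apparatus 2601 converts sound waves audible to the human ear into an analog signal, which is an electrical signal, and supplies it to A/D conversion apparatus 2602. A/D conversion apparatus 2602 converts the analog signal into a digital signal and supplies it to signal processing apparatus 2603. Signal processing apparatus 2603 encodes the incoming digital signal to generate an encoded acoustic signal and supplies it to RF modulation apparatus 2604. RF modulation apparatus 2604 modulates the encoded acoustic signal to generate a modulated encoded acoustic signal and supplies it to antenna 2605. Antenna 2605 transmits the modulated encoded acoustic signal as a radio wave.
[0237]
Thus, the communication apparatus of this embodiment can obtain in radio communication the effects shown in Embodiments 1 to 6 described above, and can encode an acoustic signal efficiently with a small number of bits.
[0238]
The present invention can be applied to a transmitting apparatus, a transmission encoding apparatus, or an acoustic signal encoding apparatus that uses audio signals. The present invention can also be applied to a mobile station apparatus or a base station apparatus.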
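[0239]
(Embodiment 10)
Next, Embodiment 10 of the present invention is described with reference to the drawings. FIG. 27 is a block diagram showing the configuration of a communication apparatus according to Embodiment 10 of the present invention. This embodiment is characterized in that signal processing apparatus 2703 in FIG. 27 is constituted by one of the acoustic decoders shown in Embodiments 1 to 6 described above.
[0240]
As shown in FIG. 27, communication apparatus 2700 according to Embodiment 10 of the present invention comprises antenna 2701, RF demodulation apparatus 2702, signal processing apparatus 2703, D/A conversion apparatus 2704, and output apparatus 2705.
[0241]
Antenna 2701 receives a digital encoded acoustic signal as a radio wave, generates a digital received encoded acoustic signal as an electrical signal, and supplies it to RF demodulation apparatus 2702. RF demodulation apparatus 2702 demodulates the received encoded acoustic signal from antenna 2701 to generate a demodulated encoded acoustic signal and supplies it to signal processing apparatus 2703.
[0242]
Signal processing apparatus 2703 receives the digital demodulated encoded acoustic signal from RF demodulation apparatus 2702, decodes it to generate a digital decoded acoustic signal, and supplies it to D/A conversion apparatus 2704. D/A conversion apparatus 2704 converts the digital decoded acoustic signal from signal processing apparatus 2703 into an analog decoded acoustic signal and supplies it to output apparatus 2705. Output apparatus 2705 converts the analog decoded acoustic signal, which is an electrical signal, into vibrations of the air and outputs it as sound waves audible to the human ear.
[0243]
Thus, the communication apparatus of this embodiment can obtain in radio communication the effects shown in Embodiments 1 to 6 described above and can decode an acoustic signal that has been encoded efficiently with a small number of bits, so it can output a good acoustic signal.
[0244]
The present invention can be applied to a receiving apparatus, a reception decoding apparatus, or a speech signal decoding apparatus that uses audio signals. The present invention can also be applied to a mobile station apparatus or a base station apparatus.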
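[0245]
The present invention is not limited to the above embodiments and can be implemented with various modifications. For example, the above embodiments describe the case where the invention is implemented as a signal processing apparatus, but the invention is not limited to this, and the signal processing method can also be implemented as software.
[0246]
For example, a program that executes the above signal processing method may be stored in advance in a ROM (Read Only Memory) and run by a CPU (Central Processing Unit).
[0247]
Alternatively, a program that executes the above signal processing method may be stored on a computer-readable storage medium, the program stored on the storage medium may be loaded into the RAM (Random Access Memory) of a computer, and the computer may be operated according to that program.
[0248]
The above description uses the MDCT as the transform from the time domain to the frequency domain, but the invention is not limited to this, and any orthogonal transform can be applied. For example, the discrete Fourier transform or the discrete cosine transform can also be applied.
[0249]
The present invention can be applied to a receiving apparatus, a reception decoding apparatus, or a speech signal decoding apparatus that uses audio signals. The present invention can also be applied to a mobile station apparatus or a base station apparatus.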
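[0250]
[Effects of the Invention]
As described above, according to the acoustic encoding apparatus and acoustic encoding method of the present invention, in an encoding method in which an input signal is downsampled and encoded, the encoded signal is decoded and upsampled, and the difference signal between the upsampled decoded signal and the input signal is encoded, the frequencies to be encoded in the enhancement layer are determined from the upsampled decoded signal. These frequencies can therefore be determined using only signals available on both the encoding side and the decoding side, so encoding can be performed at a low bit rate with high quality without transmitting this frequency information from the encoding side to the decoding side.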
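[Brief Description of the Drawings]
[FIG. 1] Block diagram showing the configuration of the acoustic encoding apparatus according to Embodiment 1 of the present invention
[FIG. 2] Diagram showing an example of the distribution of information in an acoustic signal
[FIG. 3] Diagram showing an example of the regions to be encoded by the base layer and the enhancement layer
[FIG. 4] Block diagram showing an example of the internal configuration of the frequency determination unit of the acoustic encoding apparatus of the above embodiment
[FIG. 5] Diagram showing an example of the internal configuration of the auditory masking calculator of the acoustic encoding apparatus of the above embodiment
[FIG. 6] Block diagram showing an example of the internal configuration of the enhancement layer encoder of the above embodiment
[FIG. 7] Block diagram showing an example of the internal configuration of the auditory masking calculator of the above embodiment
[FIG. 8] Block diagram showing the configuration of the acoustic decoding apparatus according to Embodiment 1 of the present invention
[FIG. 9] Block diagram showing an example of the internal configuration of the enhancement layer decoder of the acoustic decoding apparatus of the above embodiment
[FIG. 10] Block diagram showing an example of the internal configuration of the base layer encoder of Embodiment 2 of the present invention
[FIG. 11] Block diagram showing an example of the internal configuration of the base layer decoder of the above embodiment
[FIG. 12] Block diagram showing an example of the internal configuration of the base layer decoder of the above embodiment
[FIG. 13] Block diagram showing an example of the internal configuration of the frequency determination unit of the acoustic encoding apparatus according to Embodiment 3 of the present invention
[FIG. 14] Diagram showing an example of the residual spectrum calculated by the estimated error spectrum calculator of the above embodiment
[FIG. 15] Block diagram showing an example of the internal configuration of the frequency determination unit of the acoustic encoding apparatus according to Embodiment 4 of the present invention
[FIG. 16] Block diagram showing an example of the internal configuration of the frequency determination unit of the acoustic encoding apparatus of the above embodiment
[FIG. 17] Block diagram showing an example of the internal configuration of the enhancement layer encoder of the acoustic encoding apparatus according to Embodiment 5 of the present invention
[FIG. 18] Diagram showing an example of the ranking of estimated distortion values by the ordering unit of the above embodiment
[FIG. 19] Block diagram showing an example of the internal configuration of the enhancement layer decoder of the acoustic decoding apparatus according to Embodiment 5 of the present invention
[FIG. 20] Block diagram showing an example of the internal configuration of the enhancement layer encoder of the acoustic encoding apparatus according to Embodiment 6 of the present invention
[FIG. 21] Block diagram showing an example of the internal configuration of the enhancement layer decoder of the acoustic decoding apparatus according to Embodiment 6 of the present invention
[FIG. 22] Block diagram showing an example of the internal configuration of the frequency determination unit of the acoustic encoding apparatus of the above embodiment
[FIG. 23] Block diagram showing an example of the internal configuration of the enhancement layer decoder of the acoustic decoding apparatus according to Embodiment 6 of the present invention
[FIG. 24] Block diagram showing the configuration of the communication apparatus according to Embodiment 7 of the present invention
[FIG. 25] Block diagram showing the configuration of the communication apparatus according to Embodiment 8 of the present invention
[FIG. 26] Block diagram showing the configuration of the communication apparatus according to Embodiment 9 of the present invention
[FIG. 27] Block diagram showing the configuration of the communication apparatus according to Embodiment 10 of the present invention
[FIG. 28] Diagram showing an example of the spectrum of an acoustic (music) signal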
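[Explanation of Reference Numerals]
101 Downsampler
102 Base layer encoder
103 Local decoder
104 Upsampler
105 Delay unit
106 Subtractor
107, 804 Frequency determination unit
108 Enhancement layer encoder
109 Multiplexer
401 FFT unit
402 Estimated auditory masking calculator
403 Determination unit
601, 701 MDCT unit
602 MDCT coefficient quantizer
801, 1101 Demultiplexer
802 Base layer decoder
803 Upsampler
805 Enhancement layer decoder
806, 903 Adder
901 MDCT coefficient decoder
902 IMDCT unit
1102 Excitation generator
1103 Synthesis filter
1301 Estimated error spectrum calculator
1302, 1502, 1601 Determination unit
1501 Estimated auditory masking correction unit
1701, 1901 Ordering unit
1702, 2002, 2201 MDCT coefficient quantizer
1902, 2102, 2301 MDCT coefficient decoder
2001, 2101 Fixed band designation unit
[0001]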
BACKGROUND OF THE INVENTION
The present invention relates to an acoustic encoding apparatus and an acoustic encoding method that compress and encode an acoustic signal, such as a musical tone signal or a speech signal, with high efficiency, and in particular to an acoustic encoding apparatus and an acoustic encoding method that perform scalable encoding, in which musical tones and speech can be decoded even from part of the encoded code.
[0002]
[Prior art]
Acoustic coding techniques that compress a musical tone signal or a speech signal at a low bit rate are important for making effective use of transmission path capacity, such as radio waves in mobile communication, and of recording media. Speech coding schemes for encoding speech signals include G.726 and G.729, standardized by the ITU (International Telecommunication Union). These schemes target narrowband signals (300 Hz to 3.4 kHz) and can encode them with high quality at bit rates of 8 kbit/s to 32 kbit/s.
[0003]
Standard schemes for encoding wideband signals (50 Hz to 7 kHz) include G.722 and G.722.1 of the ITU and AMR-WB of 3GPP (The 3rd Generation Partnership Project). These schemes can encode wideband speech signals with high quality at bit rates of 6.6 kbit/s to 64 kbit/s.
[0004]
CELP (Code Excited Linear Prediction) is an effective method for encoding speech signals efficiently at a low bit rate. CELP is based on an engineering model of human speech production: an excitation signal represented by random numbers or a pulse train is passed through a pitch filter, which corresponds to the strength of periodicity, and a synthesis filter, which corresponds to the vocal tract characteristics, and the encoding parameters are determined so that the squared error between the output signal and the input signal is minimized under perceptual weighting (see, for example, Non-Patent Document 1).
[0005]
Many recent standard speech coding schemes are based on CELP. For example, G.729 can encode narrowband signals at a bit rate of 8 kbit/s, and AMR-WB can encode wideband signals at bit rates of 6.6 kbit/s to 23.85 kbit/s.
[0006]
In musical tone coding, which encodes musical tone signals, transform coding is generally used: as in the Layer 3 and AAC schemes standardized by MPEG (Moving Picture Experts Group), the musical tone signal is transformed into the frequency domain and encoded using a psychoacoustic model. These schemes are known to cause almost no degradation at bit rates of 64 kbit/s to 96 kbit/s per channel for signals with a sampling rate of 44.1 kHz.
[0007]
However, when a speech coding scheme is applied to a signal that consists mainly of speech with music or environmental sound superimposed in the background, the background music or environmental sound degrades not only the background signal but also the speech signal, and the overall quality drops. This problem arises because speech coding schemes are based on CELP, a method specialized for a speech model. In addition, speech coding schemes can handle signal bands only up to 7 kHz and cannot adequately handle signals with higher-frequency components.
[0008]
On the other hand, since musical sound encoding can perform high-quality encoding on music, sufficient quality can be obtained even for audio signals having music and environmental sounds in the background as described above. In addition, the musical sound encoding can be applied to a signal whose target signal band is CD quality and whose sampling rate is about 22 kHz.
[0009]
However, achieving high-quality encoding requires a high bit rate, and if the bit rate is lowered to about 32 kbit/s, the quality of the decoded signal drops sharply. Such schemes therefore cannot be used on communication networks with low transmission rates.
[0010]
To avoid these problems, scalable coding that combines these techniques is conceivable: the input signal is first encoded with CELP in the base layer, and the error signal obtained by subtracting the decoded signal from the input signal is then transform-coded in the enhancement layer.
[0011]
In this scheme, the base layer uses CELP, so speech signals can be encoded with high quality, and the enhancement layer can efficiently encode the background music and environmental sounds that the base layer cannot represent, as well as signal components at frequencies above the band covered by the base layer. This configuration also keeps the bit rate low. Furthermore, the acoustic signal can be decoded from only part of the encoded code, namely the encoded code of the base layer, and this scalability is useful for multicasting to multiple networks with different transmission capacities.
[0012]
However, in order to ensure sufficient quality when music is input instead of voice, it is necessary to increase the bit allocation to the enhancement layer, resulting in a problem that the bit rate increases.
[0013]
The base layer uses a coding scheme specialized for speech, such as CELP, whose coding efficiency for music is not high. When a music signal is encoded, the power of the error signal between the input signal and the base layer decoded signal (the enhancement layer input signal) therefore becomes large, and many bits had to be allocated to improve the quality of the final decoded signal.
[0014]
To solve this problem, it is conceivable to increase coding efficiency by using auditory masking in the enhancement layer. Auditory masking exploits the human auditory characteristic that, when a given signal is present, signals at nearby frequencies become inaudible (are masked).
[0015]
FIG. 28 is a diagram illustrating an example of a spectrum of an acoustic (music) signal. In FIG. 28, the solid line represents auditory masking, and the broken line represents an error spectrum. The error spectrum here refers to the spectrum of the error signal (enhancement layer input signal) between the input signal and the decoded signal of the base layer.
[0016]
The error spectrum in the hatched portion of FIG. 28 has amplitude values smaller than the auditory masking and is therefore inaudible to the human ear, whereas in the other regions the amplitude of the error spectrum exceeds the auditory masking and is therefore perceived.
[0017]
Therefore, in the enhancement layer, the error spectrum included in the white background portion in FIG. 28 may be encoded so that the quantization distortion in that region is smaller than the auditory masking. Also, since the coefficients belonging to the shaded area are already smaller than auditory masking, there is no need to quantize.
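The idea in the two preceding paragraphs can be illustrated with a short Python sketch: bins whose error amplitude is at or below the masking are skipped, and the remaining bins are quantized with a step chosen so that the quantization error stays within the masking. The uniform quantizer, the step size of twice the masking value, the function name, and the sample values are assumptions for illustration only and are not taken from the original disclosure.

```python
def quantize_above_masking(error_spectrum, masking):
    """Quantize only the bins whose amplitude exceeds the masking.

    Masked bins are reconstructed as zero. Other bins use a uniform
    quantizer with step 2 * M(m), so the rounding error is at most M(m).
    Assumes every masking value is strictly positive.
    """
    reconstructed = []
    for e, m in zip(error_spectrum, masking):
        if abs(e) <= m:
            reconstructed.append(0.0)  # inaudible: no need to quantize
        else:
            step = 2.0 * m
            reconstructed.append(round(e / step) * step)
    return reconstructed


error_spectrum = [0.3, -2.3, 0.9, 4.2]  # E(m), illustrative values
masking = [0.5, 0.5, 1.0, 1.0]          # M(m), illustrative values
reconstructed = quantize_above_masking(error_spectrum, masking)
print(reconstructed)
```

In this example only the second and fourth bins are quantized, and the reconstruction error in every bin stays at or below the masking value for that bin.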
[0018]
[Non-Patent Document 1]
“Code-Excited Linear Prediction (CELP): high quality speech at very low bit rates”, Proc. ICASSP 85, pp. 937-940, 1985.
[0019]
[Problems to be solved by the invention]
However, in the conventional apparatus, information on the frequencies that need to be quantized according to the auditory masking must be transmitted, which increases the amount of information to be transmitted and prevents the bit rate from being lowered.
[0020]
The present invention has been made in view of the above, and its object is to provide an acoustic encoding apparatus and an acoustic encoding method that can encode, at a low bit rate and with high quality, even a signal that consists mainly of speech with music or noise superimposed in the background.
[0021]
[Means for Solving the Problems]
An acoustic encoding apparatus according to the present invention comprises: downsampling means for lowering the sampling rate of an input signal; base layer encoding means for encoding the input signal whose sampling rate has been lowered; decoding means for decoding the encoded input signal to obtain a decoded signal; upsampling means for raising the sampling rate of the decoded signal to the sampling rate the input signal had at the time of input; subtracting means for obtaining an error signal as the difference between the input signal at the time of input and the decoded signal whose sampling rate has been raised; frequency determining means for determining, based on the decoded signal whose sampling rate has been raised, the frequencies at which the error signal is to be encoded; and enhancement layer encoding means for encoding the difference signal at those frequencies.
[0022]
According to this configuration, the frequencies to be encoded in the enhancement layer are determined from the decoded signal of the base layer encoded signal. These frequencies can therefore be determined from the base layer encoded signal alone, which is transmitted from the encoding side to the decoding side, so information on the frequencies need not be transmitted from the encoding side to the decoding side, and encoding can be performed at a low bit rate with high quality.
[0023]
The acoustic encoding apparatus of the present invention employs a configuration in which the base layer encoding means encodes an input signal using a code-excited linear prediction method.
[0024]
According to this configuration, the transmitting side encodes the input signal by applying CELP to the base layer, and the receiving side decodes the encoded input signal by applying CELP, so that a high-quality base layer can be realized at a low bit rate.
[0025]
The acoustic encoding apparatus according to the present invention employs a configuration in which the enhancement layer encoding means orthogonally transforms the difference signal from the time domain to the frequency domain and encodes the converted difference signal.
[0026]
According to this configuration, the difference signal is transformed from the time domain to the frequency domain, and the enhancement layer encodes, on the transformed signal, the frequency regions that base layer encoding cannot cover. This makes it possible to handle signals whose spectrum changes greatly, such as music.
[0027]
The acoustic encoding apparatus of the present invention further comprises auditory masking means for calculating auditory masking, which represents amplitude values that do not contribute to hearing, wherein the encoding targets are determined in the frequency determining means so that signals within the auditory masking are not encoded, and the enhancement layer encoding means encodes the error spectrum, which is the spectrum of the error signal.
[0028]
In the acoustic encoding apparatus of the present invention, the auditory masking means comprises: frequency domain transform means for transforming the decoded signal whose sampling rate has been raised into frequency domain coefficients; estimated auditory masking calculating means for calculating estimated auditory masking using the frequency domain coefficients; and determining means for finding the frequencies at which the amplitude value of the spectrum of the decoded signal exceeds the amplitude value of the estimated auditory masking, and the enhancement layer encoding means encodes the error spectrum located at those frequencies.
[0029]
According to these configurations, auditory masking is calculated from the spectrum of the input signal using the characteristics of the masking effect, and quantization is performed so that the quantization distortion is less than or equal to the masking value in the encoding of the enhancement layer. As a result, the number of MDCT coefficients to be quantized can be reduced without quality deterioration, and high-quality encoding can be performed at a low bit rate.
[0030]
In the acoustic encoding apparatus of the present invention, the auditory masking means comprises estimated error spectrum calculating means for calculating an estimated error spectrum using the frequency domain coefficients, and the determining means finds the frequencies at which the amplitude value of the estimated error spectrum exceeds the amplitude value of the estimated auditory masking.
[0031]
According to this configuration, smoothing the residual spectrum estimated from the spectrum of the base layer decoded signal brings the estimated error spectrum closer to the actual residual spectrum, so the error spectrum can be encoded efficiently in the enhancement layer.
[0032]
In the acoustic encoding apparatus of the present invention, the auditory masking means comprises correcting means for smoothing the estimated auditory masking calculated by the estimated auditory masking calculating means, and the determining means finds the frequencies at which the amplitude value of the spectrum of the decoded signal or of the estimated error spectrum exceeds the amplitude value of the smoothed estimated auditory masking.
[0033]
According to this configuration, the estimated auditory masking estimated from the amplitude spectrum of the base layer decoded signal is corrected based on information in the encoded code of the base layer encoder. This improves the accuracy of the estimated auditory masking and, as a result, allows the error spectrum to be encoded efficiently in the enhancement layer.
[0034]
In the acoustic encoding device of the present invention, the enhancement layer encoding means calculates, for each frequency, a difference in amplitude value between either the estimated error spectrum or the error spectrum and either auditory masking or estimated auditory masking, and the amplitude value A configuration is adopted in which the amount of information for encoding is determined based on the magnitude of the difference.
[0035]
According to this configuration, in enhancement layer encoding, quantization efficiency can be improved by allocating more information to the frequencies at which the estimated error spectrum exceeds the estimated auditory masking by a larger amount.
[0036]
The acoustic encoding apparatus according to the present invention employs a configuration in which the enhancement layer encoding unit encodes the error spectrum in a predetermined band in addition to the frequency obtained by the determination unit.
[0037]
According to this configuration, a band that is unlikely to be selected as an encoding target but is perceptually important is forcibly quantized. Even when a frequency that should have been selected as an encoding target is not selected, the error spectrum at frequencies within the perceptually important band is always quantized, and quality can be improved.
[0038]
The acoustic decoding apparatus of the present invention comprises: base layer decoding means for decoding a first encoded code, obtained on the encoding side by encoding an input signal in units of a predetermined basic frame, to obtain a first decoded signal; upsampling means for raising the sampling rate of the first decoded signal to the same sampling rate as that of a second decoded signal; frequency determining means for determining, based on the upsampled first decoded signal, the frequencies at which to decode a second encoded code, which was obtained on the encoding side by encoding the residual signal between the input signal and a signal obtained by decoding the first encoded code; enhancement layer decoding means for decoding the second encoded code using this frequency information to obtain the second decoded signal; and addition means for adding the second decoded signal and the first decoded signal whose sampling rate has been raised.
[0039]
According to this configuration, the frequencies to be encoded in the enhancement layer are determined from the decoded signal of the base layer encoded signal. These frequencies can therefore be determined from the base layer encoded signal alone, which is transmitted from the encoding side to the decoding side, so information on the frequencies need not be transmitted from the encoding side to the decoding side, and encoding can be performed at a low bit rate with high quality.
[0040]
The acoustic decoding apparatus of the present invention employs a configuration in which the base layer decoding means decodes the first encoded code using the code-excited linear prediction method.
[0041]
According to this configuration, the transmitting side encodes the input signal by applying CELP to the base layer, and the receiving side decodes the encoded input signal by applying CELP, so that a high-quality base layer can be realized at a low bit rate.
[0042]
The acoustic decoding apparatus according to the present invention employs a configuration in which the enhancement layer decoding means orthogonally transforms a signal obtained by decoding the second encoded code from the frequency domain to the time domain.
[0043]
According to this configuration, the difference signal is transformed from the time domain to the frequency domain, and the enhancement layer encodes, on the transformed signal, the frequency regions that base layer encoding cannot cover. This makes it possible to handle signals whose spectrum changes greatly, such as music.
[0044]
The acoustic decoding apparatus of the present invention further comprises auditory masking means for calculating auditory masking, which represents amplitude values that do not contribute to hearing, wherein the decoding targets are determined in the frequency determining means so that signals within the auditory masking are not decoded by the enhancement layer decoding means.
[0045]
In the acoustic decoding apparatus of the present invention, the auditory masking means comprises: frequency domain transform means for transforming the base layer decoded signal whose sampling rate has been raised into frequency domain coefficients; estimated auditory masking calculating means for calculating estimated auditory masking using the frequency domain coefficients; and determining means for finding the frequencies at which the amplitude value of the spectrum of the decoded signal exceeds the amplitude value of the estimated auditory masking, and the enhancement layer decoding means decodes the error spectrum located at those frequencies.
[0046]
According to these configurations, auditory masking is calculated from the spectrum of the input signal using the characteristics of the masking effect, and quantization is performed so that the quantization distortion is less than or equal to the masking value in the encoding of the enhancement layer. As a result, the number of MDCT coefficients to be quantized can be reduced without quality deterioration, and high-quality encoding can be performed at a low bit rate.
[0047]
In the acoustic decoding apparatus of the present invention, the auditory masking means comprises estimated error spectrum calculating means for calculating an estimated error spectrum using the frequency domain coefficients, and the determining means finds the frequencies at which the amplitude value of the estimated error spectrum exceeds the amplitude value of the estimated auditory masking.
[0048]
According to this configuration, smoothing the residual spectrum estimated from the spectrum of the base layer decoded signal brings the estimated error spectrum closer to the actual residual spectrum, so the error spectrum can be encoded efficiently in the enhancement layer.
[0049]
In the acoustic decoding apparatus of the present invention, the auditory masking means comprises correcting means for smoothing the estimated auditory masking calculated by the estimated auditory masking calculating means, and the determining means finds the frequencies at which the amplitude value of the spectrum of the decoded signal or of the estimated error spectrum exceeds the amplitude value of the smoothed estimated auditory masking.
[0050]
According to this configuration, the estimated auditory masking estimated from the amplitude spectrum of the base layer decoded signal is corrected based on information in the encoded code of the base layer encoder. This improves the accuracy of the estimated auditory masking and, as a result, allows the error spectrum to be encoded efficiently in the enhancement layer.
[0051]
In the acoustic decoding device of the present invention, the enhancement layer decoding means calculates, for each frequency, a difference in amplitude value between either the estimated error spectrum or the error spectrum and either auditory masking or estimated auditory masking, and the amplitude value A configuration is adopted in which the amount of information for decoding is determined based on the magnitude of the difference.
[0052]
According to this configuration, quantization efficiency can be improved by performing, in enhancement layer encoding, vector quantization with bits allocated adaptively according to the amount by which the estimated error spectrum exceeds the estimated auditory masking.
[0053]
The acoustic decoding apparatus according to the present invention employs a configuration in which the enhancement layer decoding unit decodes the error spectrum in a predetermined band in addition to the frequency obtained by the determination unit.
[0054]
According to this configuration, decoding the MDCT coefficients in a predetermined band makes it possible to decode a signal in which a band that is unlikely to be selected as an encoding target but is perceptually important has been forcibly quantized. Even when a frequency that should have been selected as an encoding target on the encoding side is not selected, the error spectrum at frequencies within the perceptually important band is always quantized, and quality can be improved.
[0055]
The acoustic signal transmitting apparatus of the present invention comprises: acoustic input means for converting an acoustic signal into an electrical signal; A/D conversion means for converting the signal output from the acoustic input means into a digital signal; the above-described acoustic encoding apparatus, which encodes the digital signal output from the A/D conversion means; RF modulation means for modulating the encoded code output from the encoding apparatus into a radio frequency signal; and a transmission antenna for converting the signal output from the RF modulation means into a radio wave and transmitting it.
[0056]
The acoustic signal receiving apparatus of the present invention comprises: a receiving antenna for receiving radio waves; RF demodulation means for demodulating the signal received by the receiving antenna; the above-described acoustic decoding apparatus, which decodes the information obtained by the RF demodulation means; D/A conversion means for converting the signal output from the decoding apparatus into an analog signal; and acoustic output means for converting the electrical signal output from the D/A conversion means into an acoustic signal.
[0057]
The communication terminal device of the present invention employs a configuration including at least one of the acoustic signal transmitting device and the acoustic signal receiving device. The base station apparatus of the present invention employs a configuration including at least one of the acoustic signal transmitting apparatus and the acoustic signal receiving apparatus.
[0058]
According to these configurations, the frequencies to be encoded in the enhancement layer are determined from the decoded signal of the base layer encoded signal. These frequencies can therefore be determined from the base layer encoded signal alone, which is transmitted from the encoding side to the decoding side, so information on the frequencies need not be transmitted from the encoding side to the decoding side, and encoding can be performed at a low bit rate with high quality.
[0059]
In the acoustic encoding method of the present invention, the encoding side encodes an input signal whose sampling rate has been lowered to generate a first encoded code, decodes the first encoded code, raises the sampling rate of the resulting decoded signal to the sampling rate the input signal had at the time of input, determines, based on the decoded signal whose sampling rate has been raised, the frequencies at which the error signal is to be encoded, and encodes, at those frequencies, the difference signal between the input signal at the time of input and the decoded signal whose sampling rate has been raised, thereby generating a second encoded code. The decoding side decodes the first encoded code to obtain a second decoded signal, raises the sampling rate of the second decoded signal to the same rate as the sampling rate of a third decoded signal, determines, based on the second decoded signal whose sampling rate has been raised, the frequencies at which the second encoded code is to be decoded, decodes the second encoded code using this frequency information to obtain the third decoded signal, and adds the third decoded signal to the second decoded signal whose sampling rate has been raised.
[0060]
According to this method, the frequencies to be encoded in the enhancement layer are determined from the decoded signal of the base layer encoded code. The decoding side can therefore determine those frequencies from the base layer encoded code alone, no information on these frequencies has to be transmitted from the encoding side to the decoding side, and high-quality encoding can be performed at a low bit rate.
[0061]
DETAILED DESCRIPTION OF THE INVENTION
The inventor arrived at the present invention by noting that the base layer decoded signal is determined so that its distortion relative to the input signal is small. It therefore approximates the input signal sufficiently well, and estimating the frequencies to be encoded in the enhancement layer from the signal obtained by decoding the base layer encoded code, instead of from the input signal, causes no significant problem.
[0062]
The gist of the present invention concerns an encoding method in which an input signal is downsampled and encoded, the encoded signal is decoded and upsampled, and the difference signal between the upsampled decoded signal and the input signal is encoded. The frequencies to be encoded or decoded in the enhancement layer are determined from the upsampled decoded signal, which is calculated on both the encoding side and the decoding side, so that high-quality encoding is performed at a low bit rate without transmitting information on these frequencies from the encoding side to the decoding side.
[0063]
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
(Embodiment 1)
FIG. 1 is a block diagram showing a configuration of an acoustic encoding apparatus according to Embodiment 1 of the present invention. The acoustic encoding apparatus 100 in FIG. 1 includes a downsampler 101, a base layer encoder 102, a local decoder 103, an upsampler 104, a delay unit 105, a subtractor 106, a frequency determination unit 107, an enhancement layer encoder 108, and a multiplexer 109.
[0064]
In FIG. 1, a downsampler 101 receives input data (acoustic data) at a sampling rate FH, converts the input data to a sampling rate FL lower than the sampling rate FH, and outputs the converted data to the base layer encoder 102.
[0065]
Base layer encoder 102 encodes input data of sampling rate FL in units of a predetermined basic frame, and outputs a first encoded code obtained by encoding the input data to local decoder 103 and multiplexer 109. For example, the base layer encoder 102 encodes input data using the CELP method.
[0066]
The local decoder 103 decodes the first encoded code and outputs a decoded signal obtained by the decoding to the upsampler 104. The upsampler 104 raises the sampling rate of the decoded signal to FH and outputs it to the subtractor 106 and the frequency determination unit 107.
[0067]
The delay unit 105 delays the input signal by a predetermined time and outputs it to the subtractor 106. Setting this delay equal to the time delay produced in the downsampler 101, base layer encoder 102, local decoder 103, and upsampler 104 prevents a phase shift in the subsequent subtraction. For example, this delay time is the sum of the processing times in the downsampler 101, base layer encoder 102, local decoder 103, and upsampler 104. The subtractor 106 subtracts the decoded signal from the input signal and outputs the result to the enhancement layer encoder 108 as an error signal.
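The delay and subtraction steps can be illustrated with a minimal sketch. The function name `error_signal`, the integer sample delay, and the toy signals are assumptions made for illustration; the text only specifies that the input is delayed to match the base layer path and that the decoded signal is subtracted from it.

```python
def error_signal(input_signal, decoded_signal, delay):
    """Delay the input by `delay` samples, then subtract the decoded signal."""
    # Delay unit 105: shift the input so it lines up with the decoded signal.
    delayed = [0.0] * delay + list(input_signal)
    # Subtractor 106: error = delayed input - upsampled base layer decoded signal.
    length = min(len(delayed), len(decoded_signal))
    return [delayed[n] - decoded_signal[n] for n in range(length)]


# Toy base layer path (hypothetical): delays by 2 samples and halves the amplitude.
x = [1.0, 2.0, 3.0, 4.0]
decoded = [0.0, 0.0, 0.5, 1.0, 1.5, 2.0]
err = error_signal(x, decoded, delay=2)
print(err)
```

With the delay matched, the error signal contains only what the base layer failed to represent. With a mismatched delay it would also contain the phase shift itself.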
[0068]
The frequency determination unit 107 determines, from the decoded signal whose sampling rate has been raised to FH, the region in which the error signal is encoded and the region in which it is not, and notifies the enhancement layer encoder 108 of the result. For example, the frequency determination unit 107 determines the frequencies that are auditorily masked from the decoded signal whose sampling rate has been raised to FH, and outputs them to the enhancement layer encoder 108.
[0069]
The enhancement layer encoder 108 generates an error spectrum by converting the error signal into a frequency domain coefficient, and encodes the error spectrum based on the frequency information to be encoded obtained from the frequency determination unit 107. Multiplexer 109 multiplexes the signal encoded by base layer encoder 102 and the signal encoded by enhancement layer encoder 108.
[0070]
Hereinafter, the signals encoded by the base layer encoder 102 and the enhancement layer encoder 108 will be described. FIG. 2 is a diagram illustrating an example of the distribution of information in an acoustic signal. In FIG. 2, the vertical axis indicates the amount of information, and the horizontal axis indicates the frequency. FIG. 2 shows how much speech information and how much background music / background noise information the input signal contains in each frequency band.
[0071]
As shown in FIG. 2, the speech information is concentrated in the low frequency region, and its amount decreases toward higher frequencies. The background music / background noise information, in contrast, contains relatively less information at low frequencies and more information at high frequencies than the speech information.
[0072]
Therefore, the base layer uses CELP to encode the speech signal with high quality, and the enhancement layer efficiently encodes the background music and environmental sounds that the base layer cannot represent, as well as signal components at frequencies above the band covered by the base layer.
[0073]
FIG. 3 is a diagram illustrating an example of regions to be encoded in the base layer and the enhancement layer. In FIG. 3, the vertical axis indicates the amount of information, and the horizontal axis indicates the frequency. FIG. 3 shows areas that are the targets of information to be encoded by the base layer encoder 102 and the enhancement layer encoder 108, respectively.
[0074]
The base layer encoder 102 is designed to represent speech information in the frequency band between 0 and FL efficiently, and can encode the speech information in this region with high quality. However, the quality with which the base layer encoder 102 encodes the background music / background noise information in the frequency band between 0 and FL is not high.
[0075]
The enhancement layer encoder 108 is designed to cover both the part that the base layer encoder 102 handles poorly, as described above, and the signal in the frequency band between FL and FH. Combining the base layer encoder 102 and the enhancement layer encoder 108 therefore realizes high-quality encoding over a wide band.
[0076]
As shown in FIG. 3, the first encoded code obtained by encoding in the base layer encoder 102 contains the speech information in the frequency band between 0 and FL, so a scalable function can be realized in which a decoded signal is obtained from at least the first encoded code alone.
[0077]
Further, the coding efficiency of the enhancement layer can be increased by using auditory masking. Auditory masking exploits the human auditory characteristic that, when a signal is present, signals at nearby frequencies cannot be heard (are masked).
[0078]
FIG. 28 is a diagram illustrating an example of a spectrum of an acoustic (music) signal. In FIG. 28, the solid line represents auditory masking, and the broken line represents an error spectrum. The error spectrum here refers to the spectrum of the error signal (enhancement layer input signal) between the input signal and the decoded signal of the base layer.
[0079]
The error spectrum in the hatched portion of FIG. 28 is inaudible to human hearing because its amplitude is smaller than the auditory masking. In the other regions, the amplitude of the error spectrum exceeds the auditory masking, so the error is perceived.
[0080]
Therefore, in the enhancement layer, it suffices to encode the error spectrum in the white (unhatched) portion of FIG. 28 so that the quantization distortion in that region becomes smaller than the auditory masking. The coefficients in the hatched portion are already smaller than the auditory masking and need not be quantized.
[0081]
In the acoustic encoding apparatus 100 according to the present embodiment, the frequencies at which the residual signal is encoded, determined by auditory masking or the like, are not transmitted from the encoding side to the decoding side. Instead, the encoding side and the decoding side each determine the frequencies of the error spectrum to be encoded by the enhancement layer using the upsampled base layer decoded signal.
[0082]
The decoded signal obtained by decoding the base layer encoded code is the same on the encoding side and on the decoding side. The encoding side therefore determines the auditorily masked frequencies from this decoded signal and encodes the signal, and the decoding side obtains the same masked-frequency information from this decoded signal and decodes the signal. As a result, the frequency information of the error spectrum does not have to be encoded and transmitted as additional information, and the bit rate can be reduced.
[0083]
Next, the detailed operation of each block of the acoustic encoding device according to the present embodiment will be described. First, the operation of the frequency determination unit 107, which determines the frequencies of the error spectrum to be encoded in the enhancement layer from the upsampled base layer decoded signal (hereinafter referred to as the base layer decoded signal), will be described. FIG. 4 is a block diagram illustrating an example of the internal configuration of the frequency determination unit of the acoustic encoding device according to the present embodiment.
[0084]
In FIG. 4, the frequency determination unit 107 mainly includes an FFT unit 401, an estimated auditory masking calculator 402, and a determination unit 403.
[0085]
The FFT unit 401 orthogonally transforms the base layer decoded signal x (n) output from the upsampler 104 to calculate an amplitude spectrum P (m) and outputs the amplitude spectrum P (m) to the estimated auditory masking calculator 402 and the determination unit 403. Specifically, the FFT unit 401 calculates the amplitude spectrum P (m) using the following formula (1).
[0086]
[Expression 1]
$$P(m) = \sqrt{Re^2(m) + Im^2(m)}$$
Here, Re (m) and Im (m) are the real and imaginary parts of the Fourier coefficients of the base layer decoded signal x (n), and m is the frequency.
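As an illustration of the amplitude spectrum computed by the FFT unit 401, the following sketch evaluates the magnitude of each Fourier coefficient from its real and imaginary parts. It uses a direct DFT for brevity rather than a fast algorithm. The function name `amplitude_spectrum` and the test frame are assumptions, not part of the source.

```python
import cmath
import math


def amplitude_spectrum(x):
    """Return P(m) = sqrt(Re(m)^2 + Im(m)^2) for m = 0 .. len(x) // 2."""
    n_samples = len(x)
    spectrum = []
    for m in range(n_samples // 2 + 1):
        # Fourier coefficient of x(n) at frequency index m (direct DFT).
        coeff = sum(
            x[n] * cmath.exp(-2j * math.pi * m * n / n_samples)
            for n in range(n_samples)
        )
        spectrum.append(math.sqrt(coeff.real ** 2 + coeff.imag ** 2))
    return spectrum


# A cosine with exactly 2 cycles in 16 samples peaks at frequency index 2.
frame = [math.cos(2 * math.pi * 2 * n / 16) for n in range(16)]
P = amplitude_spectrum(frame)
```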
[0087]
Next, the estimated auditory masking calculator 402 calculates the estimated auditory masking M′(m) using the amplitude spectrum P(m) of the base layer decoded signal and outputs it to the determination unit 403. Auditory masking is generally calculated from the spectrum of the input signal, but in this embodiment it is estimated using the base layer decoded signal x(n) instead. This rests on the idea that, because the base layer decoded signal x(n) is determined so that its distortion relative to the input signal is small, it approximates the input signal well enough that using it in place of the input signal causes no significant problem.
[0088]
Next, the determination unit 403 uses the amplitude spectrum P(m) of the base layer decoded signal and the estimated auditory masking M′(m) obtained by the estimated auditory masking calculator 402 to determine the frequencies at which the enhancement layer encoder 108 encodes the error spectrum. The determination unit 403 regards the amplitude spectrum P(m) of the base layer decoded signal as an approximation of the error spectrum, and outputs each frequency m that satisfies the following Expression (2) to the enhancement layer encoder 108.
[0089]
[Expression 2]
$$P(m) - M'(m) > 0$$
[0090]
In Expression (2), the term P(m) estimates the magnitude of the error spectrum, and the term M′(m) estimates the auditory masking. The determination unit 403 compares the magnitudes of the estimated error spectrum and the estimated auditory masking. If Expression (2) is satisfied, that is, if the estimated error spectrum exceeds the estimated auditory masking, the error spectrum at that frequency is assumed to be perceived as noise and is encoded by the enhancement layer encoder 108.
[0091]
Conversely, when the estimated error spectrum is smaller than the estimated auditory masking, the determination unit 403 regards the error spectrum at that frequency as not being perceived as noise because of the masking effect, and excludes it from quantization.
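The decision rule can be sketched as follows. Because the rule depends only on quantities derived from the base layer decoded signal, running the same function on the encoding side and on the decoding side yields the same frequency set without any side information. The name `select_frequencies` and the numeric values are illustrative assumptions.

```python
def select_frequencies(P, M_est):
    """Return the frequency indices m at which P(m) exceeds M'(m)."""
    return [m for m, (p, mask) in enumerate(zip(P, M_est)) if p - mask > 0]


# Hypothetical spectrum and estimated masking, both derived from the
# base layer decoded signal, so both sides can compute them identically.
P = [5.0, 0.2, 3.0, 0.1]
M_est = [1.0, 1.0, 4.0, 0.05]

encoder_side = select_frequencies(P, M_est)
decoder_side = select_frequencies(P, M_est)
print(encoder_side)
```

Frequencies 1 and 2 are dropped here because the estimated error lies under the estimated masking, so only bins 0 and 3 are passed on for quantization.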
[0092]
Next, the operation of the estimated auditory masking calculator 402 will be described. FIG. 5 is a diagram illustrating an example of an internal configuration of the auditory masking calculator of the acoustic encoding device according to the present embodiment. In FIG. 5, the estimated auditory masking calculator 402 mainly includes a Bark spectrum calculator 501, a spread function convolution unit 502, a tonality calculator 503, and an auditory masking calculator 504.
[0093]
In FIG. 5, the Bark spectrum calculator 501 calculates the Bark spectrum B (k) using the following equation (3).
[0094]
[Expression 3]
$$B(k) = \sum_{m=FL(k)}^{FH(k)} P^2(m)$$
Here, P(m) represents the amplitude spectrum obtained from Expression (1) above. k is the index of the Bark spectrum, and FL(k) and FH(k) represent the lowest and highest frequencies of the k-th Bark spectrum, respectively. The Bark spectrum B(k) represents the spectral intensity when the band is divided at equal intervals on the Bark scale. When the Hertz scale is denoted by f and the Bark scale by B, the relationship between the two is given by the following Expression (4).
[0095]
[Expression 4]

[0096]
The spread function convolution unit 502 calculates C (k) by convolving the spread function SF (k) with the Bark spectrum B (k) using the following equation (5).
[0097]
[Expression 5]
$$C(k) = B(k) * SF(k)$$
[0098]
The tonality calculator 503 obtains the spectral flatness SFM (k) of each Bark spectrum using the following equation (6).
[0099]
[Expression 6]
$$SFM(k) = \frac{\mu_g(k)}{\mu_a(k)}$$
Here, μg(k) represents the geometric mean of the power spectrum contained in the k-th Bark spectrum, and μa(k) represents the arithmetic mean of the power spectrum contained in the k-th Bark spectrum. The tonality calculator 503 then calculates the tonality coefficient α(k) from the decibel value SFMdB(k) of the spectral flatness SFM(k) using the following Expression (7).
[0100]
[Expression 7]

[0101]
The auditory masking calculator 504 obtains an offset O (k) of each Bark scale from the tonality coefficient α (k) calculated by the tonality calculator 503 using the following equation (8).
[0102]
[Expression 8]

[0103]
Then, the auditory masking calculator 504 calculates the auditory masking T (k) by subtracting the offset O (k) from C (k) obtained by the spread function convolution unit 502 using the following equation (9).
[0104]
[Expression 9]

Here, T_q(k) represents the absolute threshold, which is the minimum value of auditory masking observed as a human auditory characteristic. The auditory masking calculator 504 then converts the auditory masking T(k), expressed on the Bark scale, to the Hertz scale to obtain the estimated auditory masking M′(m), and outputs it to the determination unit 403.
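The band-wise steps of the estimated auditory masking calculator can be sketched as below. Several details are assumptions, because the corresponding expressions are not reproduced in this text: the Bark spectrum is taken as the sum of squared amplitudes in each band, the band edges `bands` are hypothetical bin ranges rather than true Bark band edges, and the Bark-to-Hertz conversion simply copies each band threshold to every bin of that band. The spreading, tonality offset, and absolute threshold steps are omitted.

```python
import math


def bark_spectrum(P, bands):
    """B(k): sum of the power P(m)^2 over the bins of band k (assumed form)."""
    return [sum(P[m] ** 2 for m in range(lo, hi + 1)) for lo, hi in bands]


def spectral_flatness(P, bands):
    """SFM(k): geometric mean over arithmetic mean of the band's power."""
    sfm = []
    for lo, hi in bands:
        # Floor the power at a tiny value so that log() is always defined.
        power = [max(P[m] ** 2, 1e-12) for m in range(lo, hi + 1)]
        arithmetic = sum(power) / len(power)
        geometric = math.exp(sum(math.log(p) for p in power) / len(power))
        sfm.append(geometric / arithmetic)
    return sfm


def masking_per_bin(T, bands, num_bins):
    """Map band thresholds T(k) to per-bin values M'(m) (assumed mapping)."""
    M_est = [0.0] * num_bins
    for threshold, (lo, hi) in zip(T, bands):
        for m in range(lo, hi + 1):
            M_est[m] = threshold
    return M_est


P = [1.0, 1.0, 1.0, 1.0, 2.0, 0.5]
bands = [(0, 3), (4, 5)]          # hypothetical (lowest bin, highest bin) pairs
B = bark_spectrum(P, bands)
SFM = spectral_flatness(P, bands)
M_est = masking_per_bin([0.3, 0.7], bands, len(P))
```

A flat band gives a spectral flatness of 1 (noise-like), while a band dominated by one component gives a value closer to 0 (tone-like), which is what the tonality coefficient is derived from.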
[0105]
The enhancement layer encoder 108 encodes MDCT coefficients using the frequency m to be quantized thus obtained. FIG. 6 is a block diagram showing an example of an internal configuration of the enhancement layer encoder according to the present embodiment. The enhancement layer encoder 108 in FIG. 6 mainly includes an MDCT unit 601 and an MDCT coefficient quantizer 602.
[0106]
The MDCT unit 601 multiplies the input signal output from the subtractor 106 by an analysis window and then applies the MDCT (modified discrete cosine transform) to obtain MDCT coefficients. In the MDCT, each analysis frame overlaps each of its neighboring frames by exactly half, and an orthogonal basis is used in which the first half of the analysis frame is an odd function and the second half is an even function. The MDCT has the feature that no frame boundary distortion occurs, because the waveforms obtained by the inverse transform are overlapped and added when the signal is synthesized. When the MDCT is performed, the input signal is multiplied by a window function such as a sine window. When the MDCT coefficient is X (n), the MDCT coefficient is calculated according to Expression (10).
[0107]
[Expression 10]

[0108]
Of the MDCT coefficients output from the MDCT unit 601, the MDCT coefficient quantizer 602 quantizes those that correspond to the frequencies to be quantized output from the frequency determination unit 107. The MDCT coefficient quantizer 602 then outputs the encoded code of the quantized MDCT coefficients to the multiplexer 109.
[0109]
As described above, according to the acoustic encoding device of the present embodiment, the frequencies to be encoded in the enhancement layer are determined from the signal obtained by decoding the base layer encoded code. The decoding side can therefore determine those frequencies from the base layer encoded code transmitted from the encoding side alone, no information on these frequencies has to be transmitted from the encoding side to the decoding side, and high-quality encoding can be performed at a low bit rate.
[0110]
In the above embodiment, a method for calculating auditory masking using the FFT has been described. However, auditory masking can also be calculated using the MDCT instead of the FFT. FIG. 7 is a block diagram illustrating an example of an internal configuration of the frequency determination unit of the present embodiment. Components identical to those in FIG. 5 are assigned the same reference numerals as in FIG. 5, and detailed descriptions thereof are omitted.
[0111]
The MDCT unit 701 approximates the amplitude spectrum P (m) using the MDCT coefficient. Specifically, the MDCT unit 701 approximates P (m) using the following equation (11).
[0112]
[Expression 11]
$$P(m) = \sqrt{R^2(m)}$$
Here, R (m) represents an MDCT coefficient obtained by MDCT conversion of the signal given from the upsampler 104.
[0113]
The estimated auditory masking calculator 402 calculates the Bark spectrum B (k) from P (m) approximated by the MDCT unit 701. Thereafter, frequency information to be quantized is calculated according to the method described above.
[0114]
As described above, the acoustic coding apparatus according to the present embodiment can also calculate auditory masking using MDCT.
[0115]
Next, the decoding side will be described. FIG. 8 is a block diagram showing the configuration of the acoustic decoding apparatus according to Embodiment 1 of the present invention. The acoustic decoding apparatus in FIG. 8 mainly includes a separator 801, a base layer decoder 802, an upsampler 803, a frequency determination unit 804, an enhancement layer decoder 805, and an adder 806.
[0116]
The separator 801 separates the code encoded in the acoustic encoding apparatus 100 into the first encoded code for the base layer and the second encoded code for the enhancement layer, outputs the first encoded code to the base layer decoder 802, and outputs the second encoded code to the enhancement layer decoder 805.
[0117]
The base layer decoder 802 decodes the first encoded code to obtain a decoded signal with sampling rate FL, and outputs the decoded signal to the upsampler 803. The upsampler 803 converts the decoded signal with sampling rate FL into a decoded signal with sampling rate FH, and outputs it to the frequency determination unit 804 and the adder 806.
[0118]
The frequency determination unit 804 determines the frequency of the error spectrum to be decoded by the enhancement layer decoder 805 using the upsampled base layer decoded signal. The frequency determination unit 804 has the same configuration as the frequency determination unit 107 in FIG.
[0119]
The enhancement layer decoder 805 decodes the second encoded code to obtain a decoded signal with sampling rate FH. The enhancement layer decoder 805 then superimposes the decoded signals in units of enhancement frames and outputs the superimposed decoded signal to the adder 806. Specifically, the enhancement layer decoder 805 multiplies the decoded signal by a synthesis window function and overlap-adds it, with half a frame of overlap, to the time-domain signal decoded in the previous frame to generate the output signal.
[0120]
The adder 806 adds the base layer decoded signal upsampled by the upsampler 803 and the enhancement layer decoded signal decoded by the enhancement layer decoder 805, and outputs the result.
[0121]
Next, the detailed operation of each block of the acoustic decoding apparatus according to the present embodiment will be described. FIG. 9 is a block diagram showing an example of the internal configuration of the enhancement layer decoder 805 of the acoustic decoding apparatus according to the present embodiment. The enhancement layer decoder 805 in FIG. 9 mainly includes an MDCT coefficient decoder 901, an IMDCT unit 902, and a superposition adder 903.
[0122]
The MDCT coefficient decoder 901 decodes the quantized MDCT coefficients from the second encoded code output from the separator 801, based on the frequencies of the error spectrum to be decoded that are output from the frequency determination unit 804. Specifically, the decoded MDCT coefficients are placed at the frequencies indicated by the frequency determination unit 804, and zero is assigned to all other frequencies.
[0123]
The IMDCT unit 902 applies the inverse MDCT to the MDCT coefficients output from the MDCT coefficient decoder 901 to generate a time-domain signal, and outputs it to the superposition adder 903.
[0124]
The superposition adder 903 superimposes the decoded signals in units of enhancement frames and outputs the superimposed decoded signal to the adder 806. Specifically, the superposition adder 903 multiplies the decoded signal by a synthesis window function and overlap-adds it, with half a frame of overlap, to the time-domain signal decoded in the previous frame to generate the output signal.
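The chain of coefficient placement, inverse MDCT, synthesis windowing, and overlap-add can be sketched as below. This uses one common textbook MDCT definition with a sine window for both analysis and synthesis. The scaling and phase terms are assumptions and may differ from the patent's Expression (10), and quantization is omitted so that reconstruction can be checked. All function names are hypothetical.

```python
import math


def sine_window(N):
    """Sine window of length 2N; satisfies w(n)^2 + w(n + N)^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def mdct(block, window):
    """Window a 2N-sample block and return its N MDCT coefficients."""
    N = len(block) // 2
    return [
        sum(window[n] * block[n]
            * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
            for n in range(2 * N))
        for k in range(N)
    ]


def imdct(coeffs, window):
    """Inverse MDCT followed by the synthesis window (2N output samples)."""
    N = len(coeffs)
    return [
        window[n] * (2.0 / N)
        * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
              for k in range(N))
        for n in range(2 * N)
    ]


def place_coefficients(values, selected, N):
    """Put decoded values at the selected frequencies, zero elsewhere."""
    coeffs = [0.0] * N
    for m, value in zip(selected, values):
        coeffs[m] = value
    return coeffs


N = 8
w = sine_window(N)
signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(3 * N)]
out = [0.0] * (3 * N)
for start in (0, N):                      # two frames overlapping by half
    X = mdct(signal[start:start + 2 * N], w)
    selected = list(range(N))             # keep every bin for this check
    coeffs = place_coefficients([X[m] for m in selected], selected, N)
    y = imdct(coeffs, w)
    for n in range(2 * N):
        out[start + n] += y[n]            # overlap-add with the previous frame
```

In the region covered by both frames, `out[N:2*N]`, the time-domain aliasing of the two frames cancels and the original samples are recovered. Passing a smaller `selected` list would zero the remaining bins, as the MDCT coefficient decoder 901 does for frequencies that were not encoded.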
[0125]
As described above, according to the acoustic decoding device of the present embodiment, the frequencies to be decoded in the enhancement layer are determined from the signal obtained by decoding the base layer encoded code. The decoding side can therefore determine those frequencies from the base layer encoded code transmitted from the encoding side alone, no information on these frequencies has to be transmitted from the encoding side to the decoding side, and high-quality encoding can be performed at a low bit rate.
[0126]
(Embodiment 2)
In the present embodiment, an example in which CELP is used for base layer encoding will be described. FIG. 10 is a block diagram showing an example of the internal configuration of the base layer encoder 102 according to Embodiment 2 of the present invention. The base layer encoder 102 in FIG. 10 mainly includes an LPC analyzer 1001, an auditory weighting unit 1002, an adaptive codebook searcher 1003, an adaptive gain quantizer 1004, a target vector generator 1005, a noise codebook searcher 1006, a noise gain quantizer 1007, and a multiplexer 1008.
[0127]
The LPC analyzer 1001 calculates an LPC coefficient of an input signal having a sampling rate FL, converts the LPC coefficient into a parameter suitable for quantization such as an LSP coefficient, and quantizes the LPC coefficient. Then, the LPC analyzer 1001 outputs the encoded code obtained by this quantization to the multiplexer 1008.
[0128]
Further, the LPC analyzer 1001 calculates the quantized LSP coefficients from the encoded code, converts them into LPC coefficients, and outputs the quantized LPC coefficients to the adaptive codebook searcher 1003, the adaptive gain quantizer 1004, the noise codebook searcher 1006, and the noise gain quantizer 1007. Furthermore, the LPC analyzer 1001 outputs the LPC coefficients before quantization to the auditory weighting unit 1002, the adaptive codebook searcher 1003, the adaptive gain quantizer 1004, the noise codebook searcher 1006, and the noise gain quantizer 1007.
[0129]
The auditory weighting unit 1002 weights the input signal output from the downsampler 101 based on the LPC coefficients obtained by the LPC analyzer 1001. The purpose is to shape the spectrum so that the spectrum of the quantization distortion is masked by the spectral envelope of the input signal.
[0130]
The adaptive codebook searcher 1003 searches for an adaptive codebook using the perceptually weighted input signal as a target signal. A signal obtained by repeating a past sound source sequence with a pitch period is called an adaptive vector, and an adaptive codebook is composed of adaptive vectors generated with a predetermined range of pitch periods.
[0131]
Let t(n) be the perceptually weighted input signal, and let p_i(n) be the signal obtained by convolving the adaptive vector of pitch period i with the impulse response of the weighted synthesis filter formed from the LPC coefficients before quantization and the LPC coefficients after quantization. The adaptive codebook searcher 1003 then outputs, as a parameter, the pitch period i of the adaptive vector that minimizes the evaluation function D of Expression (12) to the multiplexer 1008.
[0132]
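[Expression 12]
$$D = \sum_{n=0}^{N-1} t^2(n) - \frac{\left(\sum_{n=0}^{N-1} t(n)\,p_i(n)\right)^2}{\sum_{n=0}^{N-1} p_i^2(n)}$$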
Here, N represents a vector length. Since the first term of equation (12) is independent of the pitch period i, in practice, the adaptive codebook searcher 1003 calculates only the second term.
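A minimal sketch of this search is shown below. It assumes the usual CELP form of the evaluation function, in which the term that depends on the pitch period is the squared correlation between the target and the filtered adaptive vector divided by the energy of the filtered adaptive vector. Minimizing D is then equivalent to maximizing that term. The function name and the candidate vectors are hypothetical.

```python
def search_adaptive_codebook(target, candidates):
    """Return the pitch period whose filtered adaptive vector fits best.

    `candidates` maps each pitch period i to its filtered vector p_i(n).
    Only the pitch-dependent term (sum t*p)^2 / (sum p^2) is evaluated.
    """
    best_pitch, best_score = None, -1.0
    for pitch, p in candidates.items():
        cross = sum(t * v for t, v in zip(target, p))
        energy = sum(v * v for v in p)
        if energy == 0.0:
            continue  # an all-zero vector cannot reduce the error
        score = cross * cross / energy
        if score > best_score:
            best_pitch, best_score = pitch, score
    return best_pitch


target = [1.0, 2.0, 3.0, 4.0]
candidates = {
    40: [1.0, 0.0, -1.0, 0.0],
    41: [2.0, 4.0, 6.0, 8.0],   # a scaled copy of the target
    42: [1.0, 1.0, 1.0, 1.0],
}
best = search_adaptive_codebook(target, candidates)
```

The candidate at pitch 41 is a scaled copy of the target, so it can match the target exactly once a gain is applied, and the search selects it.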
[0133]
The adaptive gain quantizer 1004 quantizes the adaptive gain multiplied by the adaptive vector. The adaptive gain β is expressed by the following equation (13), and the adaptive gain quantizer 1004 scalar quantizes the adaptive gain β and outputs a code obtained at the time of quantization to the multiplexer 1008.
[0134]
[Expression 13]
$$\beta = \frac{\sum_{n=0}^{N-1} t(n)\,p_i(n)}{\sum_{n=0}^{N-1} p_i^2(n)}$$
[0135]
The target vector generator 1005 subtracts the contribution of the adaptive vector from the input signal to generate the target vector used by the noise codebook searcher 1006 and the noise gain quantizer 1007, and outputs it. Let p_i(n) be the signal obtained by convolving the impulse response of the weighted synthesis filter with the adaptive vector that minimizes the evaluation function D of Expression (12), and let βq be the value obtained by scalar quantizing the adaptive gain β of Expression (13). The target vector t2(n) is then given by the following Expression (14).
[0136]
[Expression 14]
$$t2(n) = t(n) - \beta_q\,p_i(n)$$
[0137]
The noise codebook searcher 1006 searches the noise codebook using the target vector t2(n), the LPC coefficients before quantization, and the LPC coefficients after quantization. For example, the noise codebook may consist of random noise or of vectors trained on a large body of speech signals. The noise codebook provided in the noise codebook searcher 1006 may also be represented, as in an algebraic codebook, by vectors consisting of a very small number of pulses of amplitude 1. An algebraic codebook has the feature that the optimal combination of pulse positions and pulse signs (polarities) can be determined with a small amount of calculation.
[0138]
Let t2(n) be the target vector, and let cj(n) be the signal obtained by convolving the impulse response of the weighted synthesis filter with the noise vector corresponding to code j. The noise codebook searcher 1006 then outputs the index j of the noise vector that minimizes the evaluation function D of the following Expression (15) to the multiplexer 1008.
[0139]
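[Expression 15]
$$D = \sum_{n=0}^{N-1} t2^2(n) - \frac{\left(\sum_{n=0}^{N-1} t2(n)\,c_j(n)\right)^2}{\sum_{n=0}^{N-1} c_j^2(n)}$$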
[0140]
The noise gain quantizer 1007 quantizes the noise gain multiplied by the noise vector. The noise gain quantizer 1007 calculates the noise gain γ using the following equation (16), scalar quantizes the noise gain γ, and outputs the result to the multiplexer 1008.
[0141]
[Expression 16]
$$\gamma = \frac{\sum_{n=0}^{N-1} t2(n)\,c_j(n)}{\sum_{n=0}^{N-1} c_j^2(n)}$$
[0142]
The multiplexer 1008 multiplexes the encoded codes of the LPC coefficients, adaptive vector, adaptive gain, noise vector, and noise gain that are sent to it, and outputs the result to the local decoder 103 and the multiplexer 109.
[0143]
Next, the decoding side will be described. FIG. 11 is a block diagram showing an example of the internal configuration of the base layer decoder 802 of the present embodiment. The base layer decoder 802 in FIG. 11 mainly includes a separator 1101, a sound source generator 1102, and a synthesis filter 1103.
[0144]
The separator 1101 separates the first encoded code output from the separator 801 into the encoded codes of the LPC coefficients, adaptive vector, adaptive gain, noise vector, and noise gain, and outputs the encoded codes of the adaptive vector, adaptive gain, noise vector, and noise gain to the sound source generator 1102. Similarly, the separator 1101 outputs the encoded code of the LPC coefficients to the synthesis filter 1103.
[0145]
The sound source generator 1102 decodes the adaptive vector, the adaptive vector gain, the noise vector, and the encoded code of the noise vector gain, and generates the sound source vector ex (n) using Expression (17) shown below.
[0146]
[Expression 17]
$$ex(n) = \beta_q\,q(n) + \gamma_q\,c(n)$$
Here, q(n) is the adaptive vector, β_q is the adaptive vector gain, c(n) is the noise vector, and γ_q is the noise vector gain.
[0147]
The synthesis filter 1103 decodes the LPC coefficients from the encoded code of the LPC coefficients, and generates a synthesized signal syn(n) from the decoded LPC coefficients using Expression (18) shown below.
[0148]
[Expression 18]

Here, αq represents the decoded LPC coefficients, and NP represents the order of the LPC coefficients. The synthesis filter 1103 then outputs the decoded signal syn(n) to the upsampler 803.
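The two decoding steps can be sketched as follows. The excitation is the gain-weighted sum of the adaptive vector and the noise vector, as described for the sound source generator 1102. For the synthesis filter, an all-pole recursion with the sign convention syn(n) = ex(n) + Σ αq(i)·syn(n − i) is assumed, since Expression (18) is not reproduced in this text. Function names and numbers are hypothetical.

```python
def excitation(q, c, beta_q, gamma_q):
    """ex(n) = beta_q * q(n) + gamma_q * c(n)."""
    return [beta_q * qa + gamma_q * ca for qa, ca in zip(q, c)]


def synthesize(ex, lpc):
    """All-pole synthesis: syn(n) = ex(n) + sum_i lpc[i-1] * syn(n - i).

    The sign convention is an assumption. Samples before n = 0 are
    treated as zero (no filter memory from earlier frames).
    """
    syn = []
    for n, sample in enumerate(ex):
        acc = sample
        for i, coeff in enumerate(lpc, start=1):
            if n - i >= 0:
                acc += coeff * syn[n - i]
        syn.append(acc)
    return syn


q = [1.0, 0.0, 0.0, 0.0]      # decoded adaptive vector
c = [0.0, 1.0, 0.0, 0.0]      # decoded noise vector
ex = excitation(q, c, beta_q=0.5, gamma_q=2.0)
syn = synthesize(ex, lpc=[0.5])   # first-order filter, NP = 1
```

A real decoder would carry the filter state across frames, which is left out here to keep the sketch short.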
[0149]
As described above, according to the acoustic encoding device and the acoustic decoding device of the present embodiment, the input signal is encoded by applying CELP to the base layer on the transmission side, and the encoded input signal is decoded by applying CELP on the reception side, so that a high-quality base layer can be realized at a low bit rate.
[0150]
Note that the speech coding apparatus according to the present embodiment can employ a configuration in which a post filter is cascaded after the synthesis filter 1103 in order to make quantization distortion less perceptible. FIG. 12 is a block diagram showing an example of the internal configuration of the base layer decoder according to the present embodiment. Components identical to those in FIG. 11 are assigned the same reference numerals as in FIG. 11, and detailed descriptions thereof are omitted.
[0151]
Various configurations can be applied to the post filter 1201 to make quantization distortion less perceptible. A typical method is to use a formant emphasis filter formed from the LPC coefficients obtained by decoding in the separator 1101. The formant emphasis filter H_f(z) is represented by the following Expression (19).
[0152]
[Expression 19]

Here, A(z) is a synthesis filter composed of the decoded LPC coefficients, and γ_n, γ_d, and μ are constants that determine the characteristics of the filter.
[0153]
(Embodiment 3)
FIG. 13 is a block diagram showing an example of the internal configuration of the frequency determination unit of the acoustic encoding device according to Embodiment 3 of the present invention. Components identical to those in FIG. 4 are assigned the same reference numerals as in FIG. 4, and detailed descriptions thereof are omitted. The frequency determination unit 107 in FIG. 13 includes an estimated error spectrum calculator 1301 and a determination unit 1302. It differs from FIG. 4 in that an estimated error spectrum E′(m) is estimated from the amplitude spectrum P(m) of the base layer decoded signal, and the frequencies of the error spectrum to be encoded by the enhancement layer encoder 108 are determined using the estimated error spectrum E′(m) and the estimated auditory masking M′(m).
[0154]
The FFT unit 401 orthogonally transforms the base layer decoded signal x(n) output from the upsampler 104 to calculate an amplitude spectrum P(m), and outputs it to the estimated auditory masking calculator 402 and the estimated error spectrum calculator 1301.
[0155]
The estimated error spectrum calculator 1301 calculates an estimated error spectrum E′(m) from the amplitude spectrum P(m) of the base layer decoded signal calculated by the FFT unit 401 and outputs it to the determination unit 1302. The estimated error spectrum E′(m) is obtained by flattening the amplitude spectrum P(m) of the base layer decoded signal. Specifically, the estimated error spectrum calculator 1301 calculates the estimated error spectrum E′(m) using the following equation (20).
[0156]
[Expression 20]
$$E'(m) = a \cdot P(m)^{\gamma} \qquad (20)$$
Here, a and γ are constants that are 0 or more and less than 1.
[0157]
The determination unit 1302 uses the estimated error spectrum E′(m) estimated by the estimated error spectrum calculator 1301 and the estimated auditory masking M′(m) obtained by the estimated auditory masking calculator 402 to determine the frequencies at which the enhancement layer encoder 108 encodes the error spectrum.
[0158]
Next, the estimated error spectrum calculated by the estimated error spectrum calculator 1301 of this embodiment will be described. FIG. 14 is a diagram illustrating an example of a residual spectrum calculated by the estimated error spectrum calculator according to the present embodiment.
[0159]
As shown in FIG. 14, the error spectrum E(m) has a flatter spectral shape and lower power across the entire band than the amplitude spectrum P(m) of the base layer decoded signal. Therefore, the spectral shape is flattened by raising the amplitude spectrum P(m) to the power of γ (0 < γ < 1), and the power across the entire band is reduced by multiplying by a (0 < a < 1). This improves the accuracy of the error spectrum estimate.
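The flattening and scaling described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name and the sample values of `a` and `gamma` are hypothetical, and the amplitude spectrum is represented as a plain list.

```python
def estimate_error_spectrum(P, a, gamma):
    """Estimated error spectrum E'(m): raise P(m) to the power gamma, scale by a."""
    return [a * (p ** gamma) for p in P]


# Hypothetical amplitude spectrum with a 16:1 peak-to-floor ratio.
P = [16.0, 4.0, 1.0]
E_est = estimate_error_spectrum(P, a=0.5, gamma=0.5)
print(E_est)  # [2.0, 1.0, 0.5]
```

In this example the ratio between the largest and smallest values drops from 16:1 to 4:1 (the flattening effect of γ), and every value is halved (the power reduction from a).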
[0160]
Similarly, on the decoding side, the internal configuration of the frequency determination unit 804 of the acoustic decoding device 800 is the same as that of the frequency determination unit 107 in FIG. 13 on the encoding side.
[0161]
As described above, according to the acoustic encoding device of the present embodiment, by flattening the residual spectrum estimated from the spectrum of the base layer decoded signal, the estimated error spectrum can be brought close to the actual residual spectrum, and the error spectrum can be efficiently encoded in the enhancement layer.
[0162]
In this embodiment, the case where FFT is used has been described. However, as in Embodiment 1 described above, a configuration using MDCT instead of FFT is also possible.
[0163]
(Embodiment 4)
FIG. 15 is a block diagram showing an example of the internal configuration of the frequency determination unit of the acoustic encoding device according to Embodiment 4 of the present invention. Components identical to those in FIG. 4 are assigned the same reference numerals as in FIG. 4, and detailed descriptions thereof are omitted. The frequency determination unit 107 in FIG. 15 includes an estimated auditory masking correction unit 1501 and a determination unit 1502. It differs from FIG. 4 in that, after the estimated auditory masking calculator 402 calculates the estimated auditory masking M′(m) from the amplitude spectrum P(m) of the base layer decoded signal, the estimated auditory masking M′(m) is corrected based on information in the encoded code of the base layer encoder 102.
[0164]
The FFT unit 401 orthogonally transforms the base layer decoded signal x (n) output from the upsampler 104 to calculate an amplitude spectrum P (m) and outputs the amplitude spectrum P (m) to the estimated auditory masking calculator 402 and the determination unit 1502. Estimated auditory masking calculator 402 calculates estimated auditory masking M ′ (m) using amplitude spectrum P (m) of the base layer decoded signal, and outputs it to estimated auditory masking correction section 1501.
[0165]
The estimated auditory masking correction unit 1501 corrects the estimated auditory masking M′(m) obtained by the estimated auditory masking calculator 402 using information on the encoded code of the base layer input from the base layer encoder 102.
[0166]
Here, it is assumed that the first-order PARCOR coefficient calculated from the decoded LPC coefficients is given as the information of the encoded code of the base layer. In general, LPC coefficients and PARCOR coefficients express the spectral envelope of the input signal. Because of the nature of PARCOR coefficients, lowering their order simplifies the shape of the spectral envelope, and a first-order PARCOR coefficient expresses the degree of tilt of the spectrum.
[0167]
On the other hand, in the spectral characteristics of musical sounds and voices given as input signals, there are cases where the power is biased in the low range with respect to the high range (for example, vowels) and vice versa (for example, consonants). The base layer decoded signal is easily affected by the spectral characteristics of the input signal, and tends to emphasize the spectral power bias more than necessary.
[0168]
Therefore, in the acoustic coding apparatus according to the present embodiment, the estimated auditory masking correction unit 1501 corrects the excessively emphasized spectral bias using the first-order PARCOR coefficient described above, which improves the accuracy of the estimated masking M′(m).
[0169]
The estimated auditory masking correction unit 1501 calculates a correction filter H_k(z) from the first-order PARCOR coefficient k(1) output from the base layer encoder 102, using equation (21) shown below.
[0170]
[Expression 21]

Here, β represents a positive constant less than 1. Next, the estimated auditory masking correction unit 1501 calculates the amplitude characteristic K(m) of H_k(z) using the following equation (22).
[0171]
[Expression 22]

[0172]
Then, the estimated auditory masking correction unit 1501 calculates the corrected estimated auditory masking M ″ (m) from the amplitude characteristic K (m) of the correction filter using the following equation (23).
[0173]
[Expression 23]

[0174]
Then, the estimated auditory masking correcting unit 1501 outputs the corrected auditory masking M ″ (m) to the determining unit 1502 instead of the estimated auditory masking M ′ (m).
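The correction chain described above (first-order filter, its amplitude characteristic, then the corrected masking) can be sketched as follows. Expressions (21) to (23) are not reproduced in this text, so every formula in the sketch is an assumption chosen only to illustrate the data flow: the filter form H_k(z) = 1 − β·k(1)·z⁻¹, the sampling of its amplitude response on an FFT grid of size `fft_size`, and the multiplicative correction M″(m) = K(m)·M′(m). All function and parameter names are hypothetical.

```python
import cmath
import math


def correction_amplitude(k1, beta, num_bins, fft_size):
    """Amplitude characteristic K(m) of the ASSUMED filter H_k(z) = 1 - beta*k1*z^-1,
    evaluated at z = exp(j*2*pi*m/fft_size) for m = 0 .. num_bins-1."""
    K = []
    for m in range(num_bins):
        omega = 2.0 * math.pi * m / fft_size
        K.append(abs(1.0 - beta * k1 * cmath.exp(-1j * omega)))
    return K


def correct_masking(M_est, K):
    """ASSUMED correction: M''(m) = K(m) * M'(m)."""
    return [k * m for k, m in zip(K, M_est)]


# Hypothetical values: k(1) = 0.8, beta = 0.5, 8-point grid, bins 0..4.
K = correction_amplitude(k1=0.8, beta=0.5, num_bins=5, fft_size=8)
M_corrected = correct_masking([1.0] * 5, K)
```

With these hypothetical values the first-order filter tilts a flat masking curve, lowering it at m = 0 and raising it at the highest bin; the actual direction and strength of the tilt depend on the real expressions (21) to (23) and on the sign convention of k(1).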
[0175]
The determination unit 1502 uses the amplitude spectrum P(m) of the base layer decoded signal and the corrected auditory masking M″(m) output from the estimated auditory masking correction unit 1501 to determine the frequencies at which the enhancement layer encoder 108 encodes the error spectrum.
[0176]
As described above, according to the acoustic encoding apparatus of the present embodiment, the auditory masking is calculated from the spectrum of the input signal using the characteristics of the masking effect, and the quantization distortion is determined by the masking value in the enhancement layer encoding. By performing quantization as follows, the number of MDCT coefficients to be quantized can be reduced without quality degradation, and high-quality encoding can be performed at a low bit rate.
[0177]
As described above, according to the acoustic encoding device of the present embodiment, the estimated auditory masking estimated from the amplitude spectrum of the base layer decoded signal is corrected based on the information of the encoded code of the base layer encoder. This improves the accuracy of the estimated auditory masking, and as a result the error spectrum can be efficiently encoded in the enhancement layer.
[0178]
Similarly, on the decoding side, the internal configuration of the frequency determination unit 804 of the acoustic decoding device 800 is the same as that of the frequency determination unit 107 in FIG. 15 on the encoding side.
[0179]
Note that the frequency determination unit 107 according to the present embodiment can adopt a configuration in which the present embodiment and the third embodiment are combined. FIG. 16 is a block diagram illustrating an example of an internal configuration of the frequency determination unit of the acoustic encoding device according to the present embodiment. Components identical to those in FIG. 4 are assigned the same reference numerals as in FIG. 4, and detailed descriptions thereof are omitted.
[0180]
The FFT unit 401 orthogonally transforms the base layer decoded signal x(n) output from the upsampler 104 to calculate an amplitude spectrum P(m), and outputs it to the estimated auditory masking calculator 402 and the estimated error spectrum calculator 1301.
[0181]
Estimated auditory masking calculator 402 calculates estimated auditory masking M ′ (m) using amplitude spectrum P (m) of the base layer decoded signal, and outputs it to estimated auditory masking correction section 1501.
[0182]
The estimated auditory masking correction unit 1501 corrects the estimated auditory masking M′(m) obtained by the estimated auditory masking calculator 402, using the base layer encoded code information input from the base layer encoder 102.
[0183]
Estimated error spectrum calculator 1301 calculates estimated error spectrum E ′ (m) from amplitude spectrum P (m) of the base layer decoded signal calculated by FFT section 401 and outputs the estimated error spectrum E ′ (m) to determining section 1601.
[0184]
The determination unit 1601 uses the estimated error spectrum E′(m) estimated by the estimated error spectrum calculator 1301 and the corrected auditory masking M″(m) output from the estimated auditory masking correction unit 1501 to determine the frequencies at which the enhancement layer encoder 108 encodes the error spectrum.
[0185]
Further, in the present embodiment, the case where FFT is used has been described. However, similarly to Embodiment 1 described above, a configuration using MDCT instead of FFT is also possible.
[0186]
(Embodiment 5)
FIG. 17 is a block diagram showing an example of the internal configuration of the enhancement layer encoder of the acoustic encoding device according to Embodiment 5 of the present invention. Components identical to those in FIG. 6 are assigned the same reference numerals as in FIG. 6, and detailed descriptions thereof are omitted. The enhancement layer encoder in FIG. 17 includes an ordering unit 1701 and an MDCT coefficient quantizer 1702. It differs from the enhancement layer encoder of FIG. 6 in that, for the frequencies given by the frequency determination unit 107, the amount of information after encoding is weighted for each frequency according to the magnitude of the estimated distortion value D(m).
[0187]
In FIG. 17, the MDCT unit 601 multiplies the input signal output from the subtractor 106 by an analysis window, obtains MDCT coefficients by MDCT (modified discrete cosine transform), and outputs the MDCT coefficients to the MDCT coefficient quantizer 1702.
[0188]
The ordering unit 1701 receives the frequency information obtained by the frequency determination unit 107 and calculates, for each frequency, the amount by which the estimated error spectrum E′(m) exceeds the estimated auditory masking M′(m) (hereinafter referred to as the estimated distortion value) D(m). This estimated distortion value D(m) is defined by the following equation (24).
[0189]
[Expression 24]
$$D(m) = E'(m) - M'(m) \qquad (24)$$
[0190]
Here, the ordering unit 1701 calculates the estimated distortion value D(m) only for frequencies that satisfy the following expression (25).
[0191]
[Expression 25]
$$E'(m) - M'(m) > 0 \qquad (25)$$
[0192]
Then, the ordering unit 1701 orders the frequencies starting from the largest estimated distortion value D(m) and outputs the resulting frequency information to the MDCT coefficient quantizer 1702. Based on the frequency information ordered by the estimated distortion value D(m), the MDCT coefficient quantizer 1702 quantizes the error spectrum E(m) located at each frequency, allocating more bits to frequencies with larger estimated distortion values D(m).
[0193]
Here, as an example, the case where the frequencies and estimated distortion values sent from the frequency determination unit are as shown in FIG. 18 will be described. FIG. 18 is a diagram illustrating an example of the ranking of estimated distortion values by the ordering unit according to the present embodiment.
[0194]
The ordering unit 1701 rearranges the frequencies in descending order of the estimated distortion value D(m) based on the information in FIG. 18. In this example, the processing of the ordering unit 1701 yields the frequency order m = 7, 8, 4, 9, 1, 11, 3, 12. The ordering unit 1701 outputs this ordering information to the MDCT coefficient quantizer 1702.
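The ordering step can be sketched as below. The sketch assumes, following the surrounding text, that the estimated distortion value is the difference E′(m) − M′(m) and that only frequencies with a positive difference are kept. The numeric values are hypothetical, chosen only so that the result reproduces the order quoted above; they are not the values of FIG. 18, and the function name is an assumption.

```python
def order_frequencies(E_est, M_est, freqs):
    """Return the frequencies with E'(m) - M'(m) > 0, largest distortion first."""
    scored = [(m, E_est[m] - M_est[m]) for m in freqs]
    kept = [(m, d) for m, d in scored if d > 0]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [m for m, _ in kept]


# Hypothetical distortion values (NOT taken from FIG. 18).
D_example = {7: 8.0, 8: 7.0, 4: 6.0, 9: 5.0, 1: 4.0,
             11: 3.0, 3: 2.0, 12: 1.0, 2: -0.5}
M_est = {m: 1.0 for m in D_example}
E_est = {m: d + 1.0 for m, d in D_example.items()}

order = order_frequencies(E_est, M_est, sorted(D_example))
print(order)  # [7, 8, 4, 9, 1, 11, 3, 12]
```

Frequency m = 2 is dropped because its estimated error spectrum lies below the estimated masking. Because the decoder can compute the same E′(m) and M′(m) from the base layer decoded signal, it can rebuild the same ordering without side information.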
[0195]
Based on the ordering information given from the ordering unit 1701, the MDCT coefficient quantizer 1702 quantizes E(7), E(8), E(4), E(9), E(1), E(11), E(3), and E(12) among the error spectrum E(m) given from the MDCT unit 601.
[0196]
At this time, more bits are allocated to the quantization of the error spectrum values at the head of the ordering, and fewer bits are allocated toward the end. That is, the larger the estimated distortion value D(m), the more bits are allocated to quantizing the corresponding error spectrum, and the smaller the estimated distortion value D(m), the fewer bits are allocated.
[0197]
For example, bits are allocated such that E(7) receives 8 bits, E(8) and E(4) receive 7 bits each, E(9) and E(1) receive 6 bits each, and E(11), E(3), and E(12) receive 5 bits each. Performing adaptive bit allocation according to the estimated distortion value D(m) in this way improves the efficiency of quantization.
[0198]
When vector quantization is applied, the enhancement layer encoder 108 constructs vectors in order from the error spectrum located at the head of the ordering and performs vector quantization on each vector. The vectors are constructed and the quantization bits are allocated so that the error spectrum at the head receives more bits and the error spectrum at the end receives fewer bits. In the example of FIG. 18, three vectors are formed, V1 = (E(7), E(8)), V2 = (E(4), E(9)), and V3 = (E(1), E(11), E(3), E(12)), which are two-dimensional, two-dimensional, and four-dimensional, respectively, and bits are allocated so that V1 has 10 bits, V2 has 8 bits, and V3 has 8 bits.
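The vector construction in this example can be sketched as follows. The dimensions (2, 2, 4) and bit budgets (10, 8, 8) are those quoted in the text; the function name and the bits-per-coefficient calculation are illustrative additions, and the actual vector quantizer (codebook search) is outside the scope of the sketch.

```python
def build_vectors(ordered_freqs, dims):
    """Group the ordered frequencies into consecutive vectors of the given dimensions."""
    vectors, pos = [], 0
    for d in dims:
        vectors.append(tuple(ordered_freqs[pos:pos + d]))
        pos += d
    return vectors


ordered = [7, 8, 4, 9, 1, 11, 3, 12]
dims = (2, 2, 4)
bits = (10, 8, 8)

vectors = build_vectors(ordered, dims)
bits_per_coeff = [b / d for b, d in zip(bits, dims)]

print(vectors)         # [(7, 8), (4, 9), (1, 11, 3, 12)]
print(bits_per_coeff)  # [5.0, 4.0, 2.0]
```

The bits available per coefficient fall from 5 to 4 to 2 along the ordering, which is how the vector layout gives more precision to the frequencies with the largest estimated distortion.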
[0199]
As described above, according to the acoustic encoding device of the present embodiment, in encoding in the enhancement layer, encoding is performed by allocating a large amount of information to a frequency where the estimated error spectrum exceeds the estimated auditory masking. As a result, the quantization efficiency can be improved.
[0200]
Next, the decoding side will be described. FIG. 19 is a block diagram showing an example of the internal configuration of the enhancement layer decoder of the acoustic decoding device according to Embodiment 5 of the present invention. Components identical to those in FIG. 9 are assigned the same reference numerals as in FIG. 9, and detailed descriptions thereof are omitted. The enhancement layer decoder 805 in FIG. 19 includes an ordering unit 1901 and an MDCT coefficient decoder 1902. It differs from FIG. 9 in that the frequencies given from the frequency determination unit 804 are ordered according to the magnitude of the estimated distortion value D(m).
[0201]
The ordering unit 1901 calculates the estimated distortion value D (m) using the above equation (24). The ordering unit 1901 has the same configuration as the ordering unit 1701 described above. With this configuration, it is possible to decode the coded code of the above-described acoustic coding method that can perform adaptive bit allocation and improve the quantization efficiency.
[0202]
The MDCT coefficient decoder 1902 decodes the second encoded code output from the separator 801 using frequency information ordered according to the magnitude of the estimated distortion value D (m). Specifically, the MDCT coefficient decoder 1902 arranges decoded MDCT coefficients corresponding to the frequency given from the frequency determining unit 804, and gives zero to other frequencies. Next, the IMDCT unit 902 performs inverse MDCT transformation on the MDCT coefficient obtained from the MDCT coefficient decoder 1902 to generate a time domain signal.
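The placement of decoded coefficients described above can be sketched as follows. The function name and the assumption that the decoded values arrive in the same order as the ordered frequency list are illustrative; the dequantization itself is not shown.

```python
def place_decoded_coefficients(decoded, ordered_freqs, num_coeffs):
    """Put each decoded MDCT coefficient at its frequency index; all others stay zero."""
    spectrum = [0.0] * num_coeffs
    for m, value in zip(ordered_freqs, decoded):
        spectrum[m] = value
    return spectrum


# Hypothetical decoded values for the three highest-ranked frequencies.
spectrum = place_decoded_coefficients([0.9, -0.4, 0.2], [7, 8, 4], num_coeffs=10)
print(spectrum)  # [0.0, 0.0, 0.0, 0.0, 0.2, 0.0, 0.0, 0.9, -0.4, 0.0]
```

The resulting list is what would be handed to the inverse MDCT stage.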
[0203]
The superposition adder 903 multiplies the signal by a window function for synthesis, overlaps the signal in the time domain decoded in the previous frame by half of the frame, and adds the generated signal to generate an output signal. The superposition adder 903 outputs this output signal to the adder 806.
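The windowing and half-frame overlap performed by the superposition adder can be sketched as follows. This is a minimal illustration assuming frames of even length 2N with a hop of N samples; the function name, the way the previous frame's second half is carried as `prev_tail`, and the flat example window are assumptions.

```python
def overlap_add(frame, window, prev_tail):
    """Window the current IMDCT output and add the overlapping half of the previous frame.

    Returns (output samples for this frame, tail to keep for the next frame).
    """
    half = len(frame) // 2
    windowed = [x * w for x, w in zip(frame, window)]
    output = [windowed[i] + prev_tail[i] for i in range(half)]
    return output, windowed[half:]


# Hypothetical 4-sample frame with a flat synthesis window of 0.5.
out, tail = overlap_add([1.0, 1.0, 1.0, 1.0], [0.5] * 4, prev_tail=[0.5, 0.5])
print(out, tail)  # [1.0, 1.0] [0.5, 0.5]
```

In practice the synthesis window would be chosen together with the analysis window so that the overlapping halves reconstruct the signal.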
[0204]
As described above, according to the acoustic decoding device of the present embodiment, the quantization efficiency can be improved because the enhancement layer uses vector quantization with bits allocated adaptively according to the amount by which the estimated error spectrum exceeds the estimated auditory masking.
[0205]
(Embodiment 6)
FIG. 20 is a block diagram showing an example of an internal configuration of an enhancement layer encoder of the acoustic encoding device according to Embodiment 6 of the present invention. Components identical to those in FIG. 6 are assigned the same reference numerals as in FIG. 6, and detailed descriptions thereof are omitted. The enhancement layer encoder in FIG. 20 includes a fixed band designation unit 2001 and an MDCT coefficient quantizer 2002. It differs from the enhancement layer encoder of FIG. 6 in that the MDCT coefficients included in a predetermined band are quantized together with those at the frequencies obtained from the frequency determination unit 107.
[0206]
In FIG. 20, a band important for hearing is set in the fixed band designation unit 2001 in advance. Here, it is assumed that the frequencies included in the set band are m = 15 and 16.
[0207]
The MDCT coefficient quantizer 2002 uses the auditory masking output from the frequency determination unit 107 to classify the coefficients of the input signal output from the MDCT unit 601 into coefficients to be quantized and coefficients not to be quantized, and encodes the coefficients to be quantized together with the coefficients in the band set by the fixed band designation unit 2001.
[0208]
If the frequencies are those shown in FIG. 18, the MDCT coefficient quantizer 2002 quantizes the error spectra E(1), E(3), E(4), E(7), E(8), E(9), E(11), and E(12), together with the error spectra E(15) and E(16) at the frequencies designated by the fixed band designation unit 2001.
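The selection in this example amounts to taking the union of the frequencies chosen by the frequency determination unit and the frequencies in the fixed band. A minimal sketch, with a hypothetical function name:

```python
def frequencies_to_quantize(selected, fixed_band):
    """Union of adaptively selected frequencies and the always-coded fixed band."""
    return sorted(set(selected) | set(fixed_band))


selected = [1, 3, 4, 7, 8, 9, 11, 12]  # from the frequency determination unit
fixed_band = [15, 16]                  # set in advance in the fixed band designation unit

print(frequencies_to_quantize(selected, fixed_band))
# [1, 3, 4, 7, 8, 9, 11, 12, 15, 16]
```

Using a set union means a frequency that is both adaptively selected and inside the fixed band is quantized only once.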
[0209]
As described above, according to the acoustic encoding device of the present embodiment, a band that is unlikely to be selected as an encoding target but is perceptually important is forcibly quantized. Thus, even when a frequency that should have been selected as an encoding target is not selected, the error spectrum located at frequencies in the perceptually important band is always quantized, and quality can be improved.
[0210]
Next, the decoding side will be described. FIG. 21 is a block diagram showing an example of an internal configuration of the enhancement layer decoder of the acoustic decoding device according to Embodiment 6 of the present invention. Components identical to those in FIG. 9 are assigned the same reference numerals as in FIG. 9, and detailed descriptions thereof are omitted. The enhancement layer decoder of FIG. 21 includes a fixed band designation unit 2101 and an MDCT coefficient decoder 2102. It differs from the enhancement layer decoder of FIG. 9 in that the MDCT coefficients included in a predetermined band are decoded together with those at the frequencies obtained from the frequency determination unit 804.
[0211]
In FIG. 21, a band important for hearing is set in the fixed band designation unit 2101 in advance.
[0212]
The MDCT coefficient decoder 2102 decodes the quantized MDCT coefficients from the second encoded code output from the separator 801, based on the frequencies of the error spectrum to be decoded that are output from the frequency determination unit 804. Specifically, the decoded MDCT coefficients are placed at the frequencies indicated by the frequency determination unit 804 and the fixed band designation unit 2101, and zero is given to the other frequencies.
[0213]
The IMDCT unit 902 performs inverse MDCT transformation on the MDCT coefficient output from the MDCT coefficient decoder 2102, generates a time domain signal, and outputs it to the overlay adder 903.
[0214]
As described above, according to the acoustic decoding device of the present embodiment, decoding the MDCT coefficients included in a predetermined band makes it possible to decode a signal in which a band that is unlikely to be selected as an encoding target but is perceptually important has been forcibly quantized. Even when the encoding side does not select a frequency that should have been selected as an encoding target, the error spectrum located at frequencies in the perceptually important band is always quantized, and quality can be improved.
[0215]
Note that the enhancement layer encoder and enhancement layer decoder of the present embodiment can also adopt a configuration in which the present embodiment and the fifth embodiment are combined. FIG. 22 is a block diagram illustrating an example of an internal configuration of the enhancement layer encoder of the acoustic encoding device according to the present embodiment. Components identical to those in FIG. 6 are assigned the same reference numerals as in FIG. 6, and detailed descriptions thereof are omitted.
[0216]
In FIG. 22, the MDCT unit 601 multiplies the input signal output from the subtractor 106 by an analysis window, obtains MDCT coefficients by MDCT (modified discrete cosine transform), and outputs the MDCT coefficients to the MDCT coefficient quantizer 2201.
[0217]
The ordering unit 1701 receives the frequency information obtained by the frequency determination unit 107 and calculates, for each frequency, the amount by which the estimated error spectrum E′(m) exceeds the estimated auditory masking M′(m) (hereinafter referred to as the estimated distortion value) D(m).
In the fixed band designation unit 2001, a band important for audibility is set in advance.
[0218]
Based on the frequency information ordered by the estimated distortion value D(m), the MDCT coefficient quantizer 2201 quantizes the error spectrum E(m) located at each frequency, allocating more bits to frequencies with larger estimated distortion values D(m). The MDCT coefficient quantizer 2201 also encodes the coefficients in the band set by the fixed band designation unit 2001.
[0219]
Next, the decoding side will be described. FIG. 23 is a block diagram showing an example of an internal configuration of an enhancement layer decoder of the acoustic decoding device according to Embodiment 6 of the present invention. Components identical to those in FIG. 9 are assigned the same reference numerals as in FIG. 9, and detailed descriptions thereof are omitted.
[0220]
In FIG. 23, the ordering unit 1901 receives the frequency information obtained by the frequency determination unit 804 and calculates, for each frequency, the amount by which the estimated error spectrum E′(m) exceeds the estimated auditory masking M′(m) (hereinafter referred to as the estimated distortion value) D(m).
[0221]
Then, the ordering unit 1901 orders the frequencies starting from the largest estimated distortion value D(m) and outputs the resulting frequency information to the MDCT coefficient decoder 2301. In the fixed band designation unit 2101, a band important for hearing is set in advance.
[0222]
The MDCT coefficient decoder 2301 decodes the quantized MDCT coefficients from the second encoded code output from the separator 801, based on the frequencies of the error spectrum to be decoded that are output from the ordering unit 1901. Specifically, the decoded MDCT coefficients are placed at the frequencies indicated by the frequency determination unit 804 and the fixed band designation unit 2101, and zero is given to the other frequencies.
[0223]
The IMDCT unit 902 performs inverse MDCT conversion on the MDCT coefficient output from the MDCT coefficient decoder 2301, generates a time domain signal, and outputs the signal to the superposition adder 903.
[0224]
(Embodiment 7)
Next, a seventh embodiment of the present invention will be described with reference to the drawings. FIG. 24 is a block diagram showing a configuration of a communication apparatus according to Embodiment 7 of the present invention. The signal processing device 2403 in FIG. 24 is characterized by being configured by one of the acoustic encoding devices shown in the first to sixth embodiments described above.
[0225]
As shown in FIG. 24, a communication device 2400 according to Embodiment 7 of the present invention includes an input device 2401, an A / D conversion device 2402, and a signal processing device 2403 connected to a network 2404.
[0226]
The A / D conversion device 2402 is connected to the output terminal of the input device 2401. An input terminal of the signal processing device 2403 is connected to an output terminal of the A / D conversion device 2402. An output terminal of the signal processing device 2403 is connected to the network 2404.
[0227]
The input device 2401 converts sound waves that can be heard by the human ear into analog signals, which are electrical signals, and provides the analog signals to the A / D converter 2402. The A / D conversion device 2402 converts an analog signal into a digital signal and gives it to the signal processing device 2403. The signal processing device 2403 encodes the input digital signal to generate a code, and outputs the code to the network 2404.
[0228]
As described above, according to the communication device of the embodiment of the present invention, it is possible to enjoy the effects described in the first to sixth embodiments in communication, and to efficiently encode an acoustic signal with a small number of bits. An encoding device can be provided.
[0229]
(Embodiment 8)
Next, an eighth embodiment of the present invention will be described with reference to the drawings. FIG. 25 is a block diagram showing a configuration of a communication apparatus according to Embodiment 8 of the present invention. The signal processing device 2503 in FIG. 25 is characterized by being configured by one of the acoustic decoding devices shown in the first to sixth embodiments described above.
[0230]
As shown in FIG. 25, the communication device 2500 according to the eighth embodiment of the present invention includes a receiving device 2502 connected to a network 2501, a signal processing device 2503, a D/A conversion device 2504, and an output device 2505.
[0231]
An input terminal of the receiving device 2502 is connected to the network 2501. An input terminal of the signal processing device 2503 is connected to an output terminal of the receiving device 2502. The input terminal of the D / A conversion device 2504 is connected to the output terminal of the signal processing device 2503. An input terminal of the output device 2505 is connected to an output terminal of the D / A conversion device 2504.
[0232]
The receiving device 2502 receives the digital encoded acoustic signal from the network 2501, generates a digital received acoustic signal, and provides it to the signal processing device 2503. The signal processing device 2503 receives the received acoustic signal from the receiving device 2502, performs a decoding process on it, generates a digital decoded acoustic signal, and supplies the digital decoded acoustic signal to the D/A conversion device 2504. The D/A conversion device 2504 converts the digital decoded acoustic signal from the signal processing device 2503 to generate an analog decoded acoustic signal, and provides it to the output device 2505. The output device 2505 converts the analog decoded acoustic signal, which is an electrical signal, into air vibrations and outputs it as a sound wave audible to the human ear.
[0233]
As described above, according to the communication apparatus of the present embodiment, the effects as described in the first to sixth embodiments can be enjoyed in communication, and an acoustic signal encoded efficiently with a small number of bits is decoded. Therefore, a good acoustic signal can be output.
[0234]
(Embodiment 9)
Next, a ninth embodiment of the present invention will be described with reference to the drawings. FIG. 26 is a block diagram showing a configuration of a communication apparatus according to Embodiment 9 of the present invention. A feature of the ninth embodiment of the present invention is that the signal processing device 2603 in FIG. 26 is configured by one of the acoustic encoders described in the first to sixth embodiments.
[0235]
As shown in FIG. 26, a communication device 2600 according to Embodiment 9 of the present invention includes an input device 2601, an A / D conversion device 2602, a signal processing device 2603, an RF modulation device 2604, and an antenna 2605.
[0236]
The input device 2601 converts sound waves that can be heard by the human ear into analog signals, which are electrical signals, and supplies the analog signals to the A / D converter 2602. The A / D conversion device 2602 converts the analog signal into a digital signal and gives it to the signal processing device 2603. The signal processing device 2603 encodes the input digital signal to generate an encoded acoustic signal, and supplies the encoded acoustic signal to the RF modulation device 2604. The RF modulation device 2604 modulates the encoded acoustic signal to generate a modulated encoded acoustic signal, and supplies the modulated encoded acoustic signal to the antenna 2605. The antenna 2605 transmits the modulation-coded acoustic signal as a radio wave.
[0237]
As described above, according to the communication apparatus of the present embodiment, the effects described in the first to sixth embodiments can be enjoyed in wireless communication, and an acoustic signal can be efficiently encoded with a small number of bits.
[0238]
Note that the present invention can be applied to a transmission device, a transmission encoding device, or an acoustic signal encoding device that uses an audio signal. The present invention can also be applied to a mobile station apparatus or a base station apparatus.
[0239]
(Embodiment 10)
Next, a tenth embodiment of the present invention will be described with reference to the drawings. FIG. 27 is a block diagram showing a configuration of a communication apparatus according to Embodiment 10 of the present invention. A feature of the tenth embodiment of the present invention is that the signal processing device 2703 in FIG. 27 is configured by one of the acoustic decoders shown in the first to sixth embodiments described above.
[0240]
As shown in FIG. 27, a communication device 2700 according to Embodiment 10 of the present invention includes an antenna 2701, an RF demodulation device 2702, a signal processing device 2703, a D / A conversion device 2704, and an output device 2705.
[0241]
The antenna 2701 receives a digital encoded acoustic signal as a radio wave, converts it into a digital received encoded acoustic signal in the form of an electrical signal, and supplies it to the RF demodulation device 2702. The RF demodulation device 2702 demodulates the received encoded acoustic signal from the antenna 2701 to generate a demodulated encoded acoustic signal, and supplies it to the signal processing device 2703.
[0242]
The signal processing device 2703 receives the digital demodulated encoded acoustic signal from the RF demodulation device 2702, performs a decoding process to generate a digital decoded acoustic signal, and supplies it to the D / A conversion device 2704. The D / A conversion device 2704 converts the digital decoded acoustic signal from the signal processing device 2703 into an analog decoded acoustic signal, and supplies it to the output device 2705. The output device 2705 converts the analog decoded acoustic signal, which is an electrical signal, into air vibrations and outputs them as sound waves audible to the human ear.
[0243]
As described above, according to the communication device of the present embodiment, the effects as described in the first to sixth embodiments can be enjoyed in wireless communication, and an acoustic signal encoded efficiently with a small number of bits can be decoded. Therefore, a good acoustic signal can be output.
[0244]
Note that the present invention can be applied to a receiving device, a receiving decoding device, or an audio signal decoding device using an audio signal. The present invention can also be applied to a mobile station apparatus or a base station apparatus.
[0245]
The present invention is not limited to the above-described embodiment, and can be implemented with various modifications. For example, although the case where the signal processing apparatus is used has been described in the above embodiment, the present invention is not limited to this, and the signal processing method may be performed as software.
[0246]
For example, a program for executing the signal processing method may be stored in advance in a ROM (Read Only Memory), and the program may be run by a CPU (Central Processing Unit).
[0247]
Further, a program for executing the above signal processing method may be stored in a computer-readable storage medium, the program stored in the storage medium may be loaded into the RAM (Random Access Memory) of a computer, and the computer may be operated according to the program.
[0248]
In the above description, the case where MDCT is used for the transform method from the time domain to the frequency domain is described, but the present invention is not limited to this, and any orthogonal transform can be applied. For example, a discrete Fourier transform or a discrete cosine transform can be applied.
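The property that makes such a substitution possible can be illustrated with a minimal sketch. It assumes an orthonormal DCT-II, one of the alternatives named above, applied to a single short frame; the function names `dct_matrix`, `forward` and `inverse` and the sample values are hypothetical and do not appear in the embodiments. Because the rows of the transform matrix are orthonormal, the inverse transform is simply multiplication by the transpose, so coefficients produced on the encoding side can be mapped back to the time domain on the decoding side.

```python
import math

def dct_matrix(n):
    """Return the n x n orthonormal DCT-II matrix as a list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n)
                     for i in range(n)])
    return rows

def forward(matrix, samples):
    """Time domain to frequency domain: multiply by the matrix."""
    return [sum(c * s for c, s in zip(row, samples)) for row in matrix]

def inverse(matrix, coeffs):
    """Frequency domain to time domain: multiply by the transpose."""
    n = len(coeffs)
    return [sum(matrix[k][i] * coeffs[k] for k in range(n)) for i in range(n)]

# Hypothetical 8-sample frame, used only for illustration.
frame = [0.5, -1.0, 0.25, 2.0, -0.75, 0.0, 1.5, -0.5]
C = dct_matrix(len(frame))
restored = inverse(C, forward(C, frame))
max_error = max(abs(a - b) for a, b in zip(frame, restored))
```

In this sketch `max_error` is limited only by floating-point rounding. The MDCT of the embodiments is a lapped transform and would additionally require windowing and overlap-add between frames, which is omitted here.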
[0250]
[Effects of the invention]
As described above, the acoustic encoding device and acoustic encoding method of the present invention relate to a coding scheme in which the input signal is downsampled and encoded, the resulting code is decoded and upsampled, and the difference signal between the upsampled decoded signal and the input signal is encoded. In this scheme, the frequencies to be encoded in the enhancement layer are determined from the upsampled decoded signal. Because that signal is available on both the encoding side and the decoding side, each side can determine the frequencies from it alone. Consequently, high-quality encoding at a low bit rate is possible without transmitting information about these frequencies from the encoding side to the decoding side.
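The effect described above can be sketched in a few lines of code under loudly stated assumptions. The sketch is not the method of the embodiments: decimation by two with coarse quantization stands in for the base layer encoder (CELP in Embodiment 2), sample repetition stands in for the upsampler, a fixed threshold on the decoded spectrum stands in for the estimated auditory masking, and an orthonormal DCT-II stands in for the MDCT. All function names, step sizes and the threshold are hypothetical. The point illustrated is only that `select_bins` takes the upsampled decoded signal as its sole input, so the decoder can repeat the same decision and no frequency information is carried in the code.

```python
import math

def dct(x):
    """Orthonormal DCT-II (assumed stand-in for the MDCT)."""
    n = len(x)
    return [math.sqrt((1.0 if k == 0 else 2.0) / n)
            * sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                  for i, v in enumerate(x))
            for k in range(n)]

def idct(c):
    """Inverse of dct(): multiply by the transposed basis."""
    n = len(c)
    return [sum(math.sqrt((1.0 if k == 0 else 2.0) / n) * v
                * math.cos(math.pi * (i + 0.5) * k / n)
                for k, v in enumerate(c))
            for i in range(n)]

def base_encode(x, step=0.25):
    """Toy base layer: keep every second sample, quantize coarsely."""
    return [round(v / step) for v in x[::2]]

def base_decode_upsampled(code, step=0.25):
    """Decode the toy base layer and restore the rate by repetition."""
    out = []
    for q in code:
        out.extend([q * step, q * step])
    return out

def select_bins(upsampled, threshold=0.5):
    """Choose coefficient indices from the decoded signal alone."""
    return [k for k, v in enumerate(dct(upsampled)) if abs(v) > threshold]

def encode(x, step=0.05):
    code1 = base_encode(x)
    decoded = base_decode_upsampled(code1)
    err_spec = dct([a - b for a, b in zip(x, decoded)])
    # Only the selected bins of the error spectrum are quantized and sent.
    code2 = [round(err_spec[k] / step) for k in select_bins(decoded)]
    return code1, code2

def decode(code1, code2, step=0.05):
    decoded = base_decode_upsampled(code1)
    spec = [0.0] * len(decoded)
    # Same decision as the encoder, derived without side information.
    for k, q in zip(select_bins(decoded), code2):
        spec[k] = q * step
    return [a + b for a, b in zip(decoded, idct(spec))]

# Hypothetical test frame of even length: two sinusoids.
n = 32
x = [math.sin(2 * math.pi * 3 * i / n) + 0.3 * math.sin(2 * math.pi * 7 * i / n)
     for i in range(n)]
code1, code2 = encode(x)
y = decode(code1, code2)
```

In this toy setup the enhancement layer cannot increase the squared error relative to the base layer alone, because the transform is orthonormal and rounding a coefficient to the nearest step never leaves a larger residual than discarding it.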
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of an acoustic encoding apparatus according to Embodiment 1 of the present invention.
FIG. 2 is a diagram showing an example of information distribution of acoustic signals.
FIG. 3 is a diagram illustrating an example of regions to be encoded in a base layer and an enhancement layer.
FIG. 4 is a block diagram illustrating an example of an internal configuration of a frequency determination unit of the acoustic encoding device according to the embodiment.
FIG. 5 is a diagram illustrating an example of an internal configuration of an auditory masking calculator of the acoustic encoding device according to the embodiment.
FIG. 6 is a block diagram showing an example of an internal configuration of the enhancement layer encoder according to the embodiment.
FIG. 7 is a block diagram showing an example of an internal configuration of the auditory masking calculator according to the embodiment.
FIG. 8 is a block diagram showing the configuration of the acoustic decoding apparatus according to Embodiment 1 of the present invention.
FIG. 9 is a block diagram illustrating an example of an internal configuration of an enhancement layer decoder of the acoustic decoding device according to the above embodiment.
FIG. 10 is a block diagram showing an example of an internal configuration of a base layer encoder according to Embodiment 2 of the present invention.
FIG. 11 is a block diagram showing an example of an internal configuration of a base layer decoder according to the above embodiment.
FIG. 12 is a block diagram showing an example of an internal configuration of a base layer decoder according to the above embodiment.
FIG. 13 is a block diagram showing an example of an internal configuration of a frequency determination unit of the acoustic encoding device according to Embodiment 3 of the present invention.
FIG. 14 is a diagram showing an example of a residual spectrum calculated by the estimation error spectrum calculator of the above embodiment.
FIG. 15 is a block diagram showing an example of an internal configuration of a frequency determination unit of the acoustic encoding device according to Embodiment 4 of the present invention.
FIG. 16 is a block diagram illustrating an example of an internal configuration of a frequency determination unit of the acoustic encoding device according to the embodiment.
FIG. 17 is a block diagram showing an example of an internal configuration of an enhancement layer encoder of the acoustic encoding device according to Embodiment 5 of the present invention.
FIG. 18 is a diagram showing an example of ranking of estimated distortion values of the ordering unit of the embodiment.
FIG. 19 is a block diagram showing an example of an internal configuration of an enhancement layer decoder of the acoustic decoding device according to Embodiment 5 of the present invention.
FIG. 20 is a block diagram showing an example of an internal configuration of an enhancement layer encoder of the acoustic encoding device according to Embodiment 6 of the present invention.
FIG. 21 is a block diagram showing an example of an internal configuration of an enhancement layer decoder of the acoustic decoding device according to Embodiment 6 of the present invention.
FIG. 22 is a block diagram showing an example of an internal configuration of a frequency determination unit of the acoustic encoding device according to the embodiment.
FIG. 23 is a block diagram showing an example of an internal configuration of an enhancement layer decoder of the acoustic decoding device according to Embodiment 6 of the present invention.
FIG. 24 is a block diagram showing a configuration of a communication apparatus according to Embodiment 7 of the present invention.
FIG. 25 is a block diagram showing a configuration of a communication apparatus according to Embodiment 8 of the present invention.
FIG. 26 is a block diagram showing a configuration of a communication apparatus according to Embodiment 9 of the present invention.
FIG. 27 is a block diagram showing a configuration of a communication apparatus according to Embodiment 10 of the present invention.
FIG. 28 is a diagram showing an example of a spectrum of an acoustic (music) signal.
[Explanation of symbols]
101 Downsampler
102 Base layer encoder
103 Local decoder
104 Upsampler
105 Delay device
106 Subtractor
107, 804 Frequency determining unit
108 Enhancement layer encoder
109 Multiplexer
401 FFT section
402 Estimated auditory masking calculator
403 Decision part
601, 701 MDCT section
602 MDCT coefficient quantizer
801, 1101 Separator
802 Base layer decoder
803 Upsampler
805 Enhancement layer decoder
806, 903 Adder
901 MDCT coefficient decoder
902 IMDCT section
1102 Sound generator
1103 Synthesis filter
1301 Estimated error spectrum calculator
1302, 1502, 1601 Determining unit
1501 Estimated auditory masking correction section
1701, 1901 Ordering unit
1702, 2002, 2201 MDCT coefficient quantizer
1902, 2102, 2301 MDCT coefficient decoder
2001, 2101 Fixed band designation part

Claims

1. An acoustic encoding device comprising: downsampling means for lowering the sampling rate of an input signal; base layer encoding means for encoding the input signal whose sampling rate has been lowered; decoding means for decoding the encoded input signal to obtain a decoded signal; upsampling means for raising the sampling rate of the decoded signal to the same rate as the sampling rate of the input signal at the time of input; subtracting means for obtaining an error signal from the difference between the input signal at the time of input and the decoded signal with the increased sampling rate; frequency determining means for determining the frequencies at which the error signal is to be encoded, based on the decoded signal with the increased sampling rate; and enhancement layer encoding means for encoding the difference signal at the determined frequencies.

2. The acoustic encoding apparatus according to claim 1, wherein the base layer encoding means encodes an input signal using a code-excited linear prediction method.

3. The acoustic encoding apparatus according to claim 1 or 2, wherein the enhancement layer encoding means orthogonally transforms the difference signal from the time domain to the frequency domain and encodes the transformed difference signal.

4. The acoustic encoding apparatus according to claim 1, further comprising auditory masking means for calculating auditory masking, which represents amplitude values that do not contribute to hearing, wherein the enhancement layer encoding means determines, in the frequency determining means, what is to be encoded so that signals within the auditory masking are not subject to encoding, and encodes an error spectrum that is the spectrum of the error signal.

5. The acoustic encoding device according to claim 4, wherein the auditory masking means includes frequency domain converting means for converting the decoded signal with the increased sampling rate into frequency domain coefficients, estimated auditory masking calculating means for calculating an estimated auditory masking using the frequency domain coefficients, and determining means for obtaining the frequencies at which the amplitude value of the spectrum of the decoded signal exceeds the amplitude value of the estimated auditory masking, and wherein the enhancement layer encoding means encodes the error spectrum located at those frequencies.

6. The acoustic encoding device according to claim 5, wherein the auditory masking means includes estimated error spectrum calculating means for calculating an estimated error spectrum using the frequency domain coefficients, and the determining means obtains the frequencies at which the amplitude value of the estimated error spectrum exceeds the amplitude value of the estimated auditory masking.

7. The acoustic encoding device according to claim 5, wherein the auditory masking means includes correcting means for smoothing the estimated auditory masking calculated by the estimated auditory masking calculating means, and the determining means obtains the frequencies at which the amplitude value of the decoded signal spectrum or of the estimated error spectrum exceeds the amplitude value of the smoothed estimated auditory masking.

8. The acoustic encoding apparatus according to claim 5, wherein the enhancement layer encoding means calculates, for each frequency, the difference in amplitude value between either the estimated error spectrum or the error spectrum and either the auditory masking or the estimated auditory masking, and determines the amount of information used for encoding based on the magnitude of that difference.

9. The acoustic encoding device wherein the enhancement layer encoding means encodes the error spectrum in a predetermined band in addition to the frequencies obtained by the determining means.

10. An acoustic decoding apparatus comprising: base layer decoding means for decoding a first encoded code, obtained on the encoding side by encoding an input signal in units of a predetermined basic frame, to obtain a first decoded signal; upsampling means for raising the sampling rate of the first decoded signal to the same sampling rate as that of a second decoded signal; frequency determining means for determining, based on the upsampled first decoded signal, the frequencies at which to decode a second encoded code obtained on the encoding side by encoding the residual signal between the input signal and a signal obtained by decoding the first encoded code; enhancement layer decoding means for decoding the second encoded code using the frequency information to obtain the second decoded signal; and adding means for adding the second decoded signal and the first decoded signal with the increased sampling rate.

11. The acoustic decoding apparatus according to claim 10, wherein the base layer decoding means decodes the first encoded code using a code-excited linear prediction method.

12. The acoustic decoding device according to claim 10 or 11, wherein the enhancement layer decoding means orthogonally transforms a signal obtained by decoding the second encoded code from the frequency domain to the time domain.

13. The acoustic decoding device according to claim 10, further comprising auditory masking means for calculating auditory masking, which represents amplitude values that do not contribute to hearing, wherein the enhancement layer decoding means determines, in the frequency determining means, what is to be decoded so that signals within the auditory masking are not subject to decoding.

14. The acoustic decoding device according to claim 13, wherein the auditory masking means includes frequency domain converting means for converting the base layer decoded signal with the increased sampling rate into frequency domain coefficients, estimated auditory masking calculating means for calculating an estimated auditory masking using the frequency domain coefficients, and determining means for obtaining the frequencies at which the amplitude value of the spectrum of the decoded signal exceeds the amplitude value of the estimated auditory masking, and wherein the enhancement layer decoding means decodes the error spectrum located at those frequencies.

15. The acoustic decoding device according to claim 14, wherein the auditory masking means includes estimated error spectrum calculating means for calculating an estimated error spectrum using the frequency domain coefficients, and the determining means obtains the frequencies at which the amplitude value of the estimated error spectrum exceeds the amplitude value of the estimated auditory masking.

16. The acoustic decoding device according to claim 14 or 15, wherein the auditory masking means includes correcting means for smoothing the estimated auditory masking calculated by the estimated auditory masking calculating means, and the determining means obtains the frequencies at which the amplitude value of the decoded signal spectrum or of the estimated error spectrum exceeds the amplitude value of the smoothed estimated auditory masking.

17. The acoustic decoding device according to any one of claims 14 to 16, wherein the enhancement layer decoding means calculates, for each frequency, the difference in amplitude value between either the estimated error spectrum or the error spectrum and either the auditory masking or the estimated auditory masking, and determines the amount of information used for decoding based on the magnitude of that difference.

18. The acoustic decoding device wherein the enhancement layer decoding means decodes the error spectrum in a predetermined band in addition to the frequencies obtained by the determining means.

19. An acoustic signal transmission device comprising: acoustic input means for converting an acoustic signal into an electrical signal; A / D conversion means for converting the signal output from the acoustic input means into a digital signal; the acoustic encoding device according to any one of claims 1 to 9, which encodes the digital signal output from the A / D conversion means; RF modulation means for modulating the encoded code output from the encoding device into a radio frequency signal; and a transmission antenna that converts the signal output from the RF modulation means into a radio wave and transmits it.

20. An acoustic signal receiving device comprising: a receiving antenna for receiving radio waves; RF demodulating means for demodulating the signal received by the receiving antenna; an acoustic decoding apparatus that decodes the information obtained by the RF demodulating means; D / A conversion means for converting the signal output from the decoding apparatus into an analog signal; and acoustic output means for converting the electrical signal output from the D / A conversion means into an acoustic signal.

21. A communication terminal device comprising at least one of the acoustic signal transmitting device according to claim 19 or the acoustic signal receiving device according to claim 20.

22. A base station apparatus comprising at least one of the acoustic signal transmitting apparatus according to claim 19 or the acoustic signal receiving apparatus according to claim 20.

23. An acoustic encoding method comprising: on the encoding side, encoding an input signal whose sampling rate has been lowered to generate a first encoded code; decoding the first encoded code to obtain a decoded signal; raising the sampling rate of the decoded signal to the same rate as the sampling rate of the input signal at the time of input; determining the frequencies at which an error signal is to be encoded, based on the decoded signal with the increased sampling rate; and encoding, at those frequencies, the difference signal between the input signal at the time of input and the decoded signal with the increased sampling rate to generate a second encoded code; and, on the decoding side, decoding the first encoded code to obtain a second decoded signal; raising the sampling rate of the second decoded signal to the same rate as the sampling rate of a third decoded signal; determining the frequencies at which the second encoded code is to be decoded, based on the second decoded signal with the increased sampling rate; decoding the second encoded code using this frequency information to obtain the third decoded signal; and adding the second decoded signal with the increased sampling rate and the third decoded signal.