JP3590342B2 - Signal encoding method and apparatus, and recording medium recording signal encoding program

Info

Publication number: JP3590342B2
Application number: JP2000318017A
Authority: JP
Inventors: 明夫神; 健弘守谷; 直樹岩上; 岳至森; 和明千喜良
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2000-10-18
Filing date: 2000-10-18
Publication date: 2004-11-17
Anticipated expiration: 2020-10-18
Also published as: JP2002123298A

Description

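[0001]
[Technical Field of the Invention]
The present invention relates to a signal encoding method and apparatus that quantize an input signal after a time-domain to frequency-domain transform, and in particular to an auditory masking method that shapes the quantization error arising during encoding so that it is hard for the human ear to perceive, and to a signal encoding apparatus that uses this auditory masking method.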
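[0002]
[Prior Art]
As auditory masking methods in conventional signal encoding methods for speech and musical sound, there has been a method in which the spectral envelope curve of the input signal is estimated by linear prediction analysis or the like, either in the time domain or after a time-to-frequency transform, and a masking curve is obtained by applying a suitable deformation to the estimated curve. Alternatively, there has been a method in which the spectral envelope curve is obtained directly from the time-to-frequency transformed signal, a masking curve is obtained by applying a suitable deformation to this curve, and quantization with auditory masking is performed.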
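[0003]
[Problem to Be Solved by the Invention]
In auditory masking, masking on the frequency axis is achieved by noise shaping that reduces the quantization noise near the valleys of the spectral envelope curve and instead increases the quantization noise near its peaks, which makes the quantization noise hard for the human ear to hear. In the conventional methods described above, however, the estimated positions of the peaks and valleys of the spectral envelope were sometimes inaccurate, so noise shaping was not performed properly and the quality of the decoded sound was sometimes poor.
[0004]
Accordingly, an object of the present invention is to provide a signal encoding method and apparatus that can accurately estimate the positions of the peaks and valleys of the spectral envelope curve and can thereby carry out a highly accurate auditory masking method.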
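[0005]
[Means for Solving the Problem]
The present invention realizes signal encoding that quantizes so as to minimize perceptual distortion. To solve the problem described above, it accurately estimates the positions of the peaks and valleys of the spectral envelope curve and performs appropriate noise shaping based on those positions. To estimate the positions, fine irregularities are removed as necessary from the accurate spectral envelope curve of the time-to-frequency transformed signal, the first and second derivatives are obtained as necessary, and the exact positions of the peaks and valleys are determined from these derivative values or from arithmetic means of the derivative values. Appropriate weighting is then applied at the peak and valley positions thus obtained, which achieves effective noise shaping.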
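[0006]
[Embodiments of the Invention]
Next, a preferred embodiment of the present invention is described with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a signal encoding apparatus according to one embodiment of the present invention.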
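[0007]
This signal encoding apparatus comprises a T/F conversion unit 11, which applies a time-to-frequency transform (T/F transform) to a time-series input signal x(t), typically a speech or musical sound signal, to obtain a signal sequence X(n) on the frequency axis, and a quantization unit 12, which applies vector quantization (VQ) and scalar quantization (SQ) to the signal sequence X(n) to obtain quantization indices. The T/F conversion unit 11 performs a transform such as the MDCT (modified discrete cosine transform), and X(n) denotes, for example, the sequence of transform coefficients obtained by this transform. The apparatus further calculates an "auditory weight" that determines how much information is allocated to each frequency band, and the quantization in the quantization unit 12 is performed as auditory-weighted quantization based on this weight so that quantization noise is hard for the human ear to hear. To calculate the auditory weight, the apparatus comprises an envelope calculation unit 13, which calculates the spectral envelope from the signal sequence X(n); a peak/valley estimation unit 14, which estimates the positions of the peaks and valleys of the spectrum from the calculated spectral envelope; a weighting unit 15, which, based on the estimated peak and valley positions, applies appropriate weighting near the peaks and near the valleys so that the allocated information is "especially small at peak positions" and "especially large at valley positions"; and an auditory weight calculation unit 16, which outputs the result to the quantization unit 12 as the "auditory weight". The reciprocal of the spectral envelope is used as the prototype of the "auditory weight".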
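[0008]
As for peaks and valleys: when the signal sequence X(n) is plotted against frequency on the horizontal axis and smoothed, a place where the value is larger than its surroundings is called a peak, and a place where the value is smaller than its surroundings is called a valley. As described later, smoothing is performed, for example, by calculating the arithmetic mean over a certain interval length (also called the averaging interval length), that is, a moving average over that interval. By varying the interval length, the positions of fine peaks and valleys, somewhat fine peaks and valleys, coarse peaks and valleys, and so on are estimated. Here the arithmetic mean serves to smooth the spectrum within one frame over a frequency interval. In the present invention, combining peak and valley position estimates obtained with different degrees of smoothing enables more accurate auditory masking.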
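[0009]
Next, the operation of this signal encoding apparatus is described.
[0010]
The time-series input signal x(t) is converted by the T/F conversion unit 11 into the signal sequence X(n) on the frequency axis. This signal sequence X(n) is supplied to the quantization unit 12 for vector quantization and scalar quantization, and is also sent to the envelope calculation unit 13 so that its spectral envelope can be calculated. The envelope calculation unit 13 calculates the spectral envelope of X(n); the peak/valley estimation unit 14 estimates the positions of the peaks and valleys of the spectrum from the calculated spectral envelope and outputs the estimated positions to the weighting unit 15. Based on the reciprocal of the spectral envelope obtained by the envelope calculation unit 13, the weighting unit 15 applies appropriate information weighting near the peaks and near the valleys so that the allocated information is "especially small at peak positions" and "especially large at valley positions". Specifically, the weighting is applied at the peak and valley positions using a weight function that either raises the neighborhood of the peaks and deepens the neighborhood of the valleys, or lowers the neighborhood of the peaks and raises the neighborhood of the valleys so that they become shallower. The weighting unit 15 is supplied with the spectral envelope curve from the envelope calculation unit 13, and the weighted spectral envelope curve is supplied from the weighting unit 15 to the auditory weight calculation unit 16.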
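[0011]
The auditory weight calculation unit 16 calculates the auditory weight for quantization from the weighted spectral envelope curve and outputs it to the quantization unit 12. The quantization unit 12 then uses the supplied auditory weight to perform vector quantization and scalar quantization of the signal sequence X(n) from the T/F conversion unit 11. As a result, the quantization unit 12 outputs quantization indices (output indices) to which highly accurate auditory masking has been applied.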
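[0012]
The basic operation of the signal encoding apparatus of this embodiment has been described above. In the present invention, the weighting method described above may be combined, as the auditory weighting method, with the conventionally common method in which the spectral envelope is predicted by linear prediction analysis or the like and the peaks and valleys of the envelope curve are blunted by exponentiation to form the weight.
[0013]
Next, the weighting process in this embodiment is described in detail.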
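[0014]
FIG. 2 is a block diagram showing the process of weighting the peaks and valleys of the spectrum. From the spectral envelope curve obtained by the envelope calculation unit 13, the peak/valley estimation unit 14 first estimates the frequency positions of the fine peaks and valleys of the spectrum, then those of somewhat coarser peaks and valleys, repeating this procedure as many times as necessary, and finally estimates the frequency positions of the coarse peaks and valleys. The weighting unit 15 applies a weighting operation with a suitable weight function near each of these estimated peaks and valleys.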
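[0015]
FIG. 3 is a block diagram showing the details of the processing in the envelope calculation unit 13. The envelope calculation unit 13 obtains the spectral envelope curve by applying arithmetic averaging to the frequency-domain signal sequence X(n). In the figure, arithmetic mean (1) through arithmetic mean (k) are arithmetic means over moving-average intervals of different lengths. The first arithmetic mean (1) is applied to the signal sequence X(n), the second arithmetic mean (2) is applied to its result Y_1(n), the third arithmetic mean (3) is applied to that result Y_2(n), and so on, so that k arithmetic means are applied in sequence. Here k is an integer constant of 1 or more. The results Y_1(n), Y_2(n), ..., Y_k(n) of the individual arithmetic means are each sent to the peak/valley estimation unit 14. The interval length of each arithmetic mean is chosen to suit the application, but typically arithmetic mean (1) uses a short averaging interval to detect the positions of fine peaks and valleys, and arithmetic mean (2) uses a longer averaging interval than arithmetic mean (1) to detect the positions of coarser peaks and valleys. The same applies up to arithmetic mean (k), and it is advisable to lengthen the averaging interval gradually from one stage to the next.
[0016]
The "arithmetic mean (k)" operation described above may, as necessary, be performed multiple times with different averaging interval lengths.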
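The cascade of arithmetic means can be sketched as follows. This is a minimal illustration rather than the patented implementation: the centered window, the clipping of the window at the frame edges, the use of coefficient magnitudes |X(n)| as the quantity being averaged, and the function names are all assumptions made for the example.

```python
def moving_average(values, span):
    """Arithmetic mean over `span` neighbouring bins (window clipped at the edges)."""
    half = span // 2
    out = []
    for n in range(len(values)):
        lo = max(0, n - half)
        hi = min(len(values), n + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def cascaded_envelopes(coeffs, spans):
    """Return [Y_1, ..., Y_k]; stage j smooths the output of stage j-1."""
    # Assumption: magnitudes are averaged, since signed coefficients would cancel.
    current = [abs(c) for c in coeffs]
    stages = []
    for span in spans:
        current = moving_average(current, span)
        stages.append(current)
    return stages


# Hypothetical 10-coefficient frame; a short span first, then a longer one.
X = [0.0, 4.0, 0.0, -4.0, 0.0, 8.0, 0.0, -8.0, 0.0, 4.0]
Y1, Y2 = cascaded_envelopes(X, spans=[3, 5])
```

With increasing spans, each later stage can only narrow the range of the previous one, which is what makes Y_1 suitable for fine structure and Y_k for coarse structure.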
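[0017]
Next, the processing in the peak/valley estimation unit 14 is described. FIG. 4 is a block diagram illustrating the processing in the peak/valley estimation unit 14.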
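[0018]
The peak/valley estimation unit 14 receives from the envelope calculation unit 13 the coefficient sequences Y_1(n), Y_2(n), ..., Y_k(n), which represent the spectral envelope after each arithmetic mean, and estimates the positions of the peaks and valleys for each coefficient sequence as follows. The input coefficient sequence Y_j(n) (1 ≤ j ≤ k) is first differentiated with respect to n to obtain the sequence Y′_j(n), and an arithmetic mean over a suitable interval is applied to Y′_j(n) to obtain the sequence
[0019]
[External character 1]
[0020]
from which fine fluctuation components have been removed. This sequence is then differentiated again with respect to n to obtain the sequence Y″_j(n), and the sequence
[0021]
[External character 2]
[0022]
is obtained by removing the fine fluctuation components from Y″_j(n). Then, as shown by the expressions in FIG. 4, the positions of the peaks and valleys of the spectral envelope curve are estimated from the signs of these values. The "arithmetic mean" operation used to remove fine fluctuation components may, as necessary, be performed multiple times with different averaging interval lengths, or it may be omitted.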
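A possible reading of this procedure is sketched below. The decision expressions of FIG. 4 are not reproduced in the text, so the rule used here is an assumption: a peak is reported where the smoothed first difference changes from positive to non-positive while the smoothed second difference is negative, and a valley where the signs are reversed. Forward differences stand in for differentiation with respect to n, and all function names are hypothetical.

```python
def difference(values):
    """Forward difference, a discrete stand-in for differentiation with respect to n."""
    return [values[n + 1] - values[n] for n in range(len(values) - 1)]


def smooth(values, span):
    """Arithmetic mean over `span` bins; span=1 leaves the sequence unchanged."""
    half = span // 2
    out = []
    for n in range(len(values)):
        window = values[max(0, n - half):n + half + 1]
        out.append(sum(window) / len(window))
    return out


def estimate_peaks_and_valleys(envelope, span=1):
    d1 = smooth(difference(envelope), span)  # smoothed first derivative
    d2 = smooth(difference(d1), span)        # smoothed second derivative
    peaks, valleys = [], []
    for n in range(1, len(envelope) - 1):
        slope_before, slope_after = d1[n - 1], d1[n]
        curvature = d2[n - 1]  # second difference centred on bin n
        if slope_before > 0 and slope_after <= 0 and curvature < 0:
            peaks.append(n)    # assumed rule: slope turns downward, concave
        elif slope_before < 0 and slope_after >= 0 and curvature > 0:
            valleys.append(n)  # assumed rule: slope turns upward, convex
    return peaks, valleys


peaks, valleys = estimate_peaks_and_valleys([1, 3, 5, 3, 1, 2, 6, 2])
```

Running the estimator once per smoothed envelope Y_j(n) would give one set of positions per degree of smoothing.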
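[0023]
FIG. 5 illustrates how the peaks and valleys of the spectral envelope are detected from the coefficient sequence X(n) in this way. It shows the case k = 2, in which the envelope calculation unit 13 computes the arithmetic mean in two stages. In this figure, ① is the absolute value |X(n)| of the coefficient sequence X(n) before averaging, ② is the absolute value |Y_1(n)| of the coefficient sequence Y_1(n) produced by arithmetic mean (1), and ③ is the absolute value |Y_2(n)| of the coefficient sequence Y_2(n) produced by arithmetic mean (2). The peak positions estimated from arithmetic mean (1) are denoted m_1, m_2, ..., m_12 and the valley positions v_1, v_2, ..., v_11; the peak positions estimated from arithmetic mean (2) are denoted M_1, M_2, M_3 and the valley positions V_1, V_2. The interval length of arithmetic mean (2) is longer than that of arithmetic mean (1), so ② corresponds to the frequency positions of the fine peaks and valleys and ③ corresponds to the frequency positions of the coarse peaks and valleys.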
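[0024]
Next, given that several kinds of peak and valley frequency positions have been obtained in this way, the method of weighting the information allocation is described. FIG. 6 shows an example in which information weighting is applied near the peaks and valleys of a spectral envelope curve. To keep the explanation simple, a coarse waveform is used.
[0025]
In FIG. 6, the reciprocal ② (1/|Y_2(n)|) of the previously estimated spectral envelope curve ① (|Y_2(n)|) is taken as the prototype of the auditory weight, and weighting is applied with weight functions near the estimated positions of its peaks and valleys. In this example, the auditory weight ③ (W_L), in which the information allocation is corrected at the peak and valley positions, is created by multiplying ② by the weight function ④. The weight functions ④ and ⑤ can take various forms. Here, as one example, ③ shows the result of weighting with linear functions over a weighting interval of length 2t that take the value 0.5 at the center of a peak, 1.0 at the edges of a peak, 2.0 at the center of a valley, and 1.0 at the edges of a valley. As FIG. 6 shows, the exact positions of the peaks and valleys can be estimated, and a weight can be created that allocates more information near the valleys and less near the peaks.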
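The example weighting of FIG. 6 can be sketched as follows, using the values stated above (0.5 at a peak center, 2.0 at a valley center, 1.0 at the edges, interval length 2t). The function names, the expression of t in coefficient bins, and the choice to multiply factors where intervals overlap are assumptions for illustration, and the envelope is assumed to be nonzero.

```python
def linear_weight(n, centre, t, centre_value):
    """Piecewise-linear factor: centre_value at n == centre, 1.0 at centre +/- t and beyond."""
    distance = abs(n - centre)
    if distance >= t:
        return 1.0
    return centre_value + (1.0 - centre_value) * distance / t


def auditory_weight(envelope, peaks, valleys, t, peak_value=0.5, valley_value=2.0):
    """Reciprocal-envelope prototype, corrected near each peak and valley."""
    weights = []
    for n, y in enumerate(envelope):
        w = 1.0 / abs(y)  # prototype: reciprocal of the spectral envelope
        for m in peaks:
            w *= linear_weight(n, m, t, peak_value)    # less information near peaks
        for v in valleys:
            w *= linear_weight(n, v, t, valley_value)  # more information near valleys
        weights.append(w)
    return weights


# Flat hypothetical envelope so that only the correction factors are visible.
W = auditory_weight([2.0] * 9, peaks=[2], valleys=[6], t=2)
```

On this flat envelope the prototype is 0.5 everywhere, so W dips to 0.25 at the peak bin and rises to 1.0 at the valley bin.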
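[0026]
The value of t is preferably, for example, about 100 to 200 Hz when the peak and valley structure representing the pitch frequency is to be weighted, and about 300 to 600 Hz when the peak and valley structure representing the formant frequencies is to be weighted.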
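Because t is given in Hz while the weighting operates on coefficient indices, some conversion is needed in practice. The sketch below assumes that the N transform coefficients of a frame are spaced uniformly from 0 Hz to half the sampling rate; the sampling rate, the frame size, and the function name are illustrative assumptions, not values taken from this description.

```python
def hz_to_bins(t_hz, sample_rate_hz, num_coeffs):
    """Convert a half-width in Hz to a whole number of coefficient bins (at least 1)."""
    bin_width_hz = sample_rate_hz / (2.0 * num_coeffs)  # assumed uniform bin spacing
    return max(1, round(t_hz / bin_width_hz))


# Hypothetical setup: 16 kHz sampling, 512 coefficients per frame (15.625 Hz per bin).
t_pitch = hz_to_bins(150, 16000, 512)    # within the 100 to 200 Hz range for pitch structure
t_formant = hz_to_bins(450, 16000, 512)  # within the 300 to 600 Hz range for formant structure
```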
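[0027]
In practice, weighting by the method described above is applied near the peaks and valleys of both the "fine curve" and the "coarse curve" of the spectral envelope. For example, when peak and valley positions have been estimated for both curves as shown in FIG. 5, the reciprocal 1/|Y_1(n)| of the spectral envelope ② representing the fine structure is taken as the prototype of the auditory weight. Appropriate weighting is applied to this prototype 1/|Y_1(n)|, in the same way as in FIG. 6, near the peak and valley positions m_1, v_1, m_2, v_2, ... of this envelope curve, and appropriate weighting is then likewise applied to the same prototype 1/|Y_1(n)| near the peak and valley positions M_1, V_1, M_2, V_2, ... of curve ③, which represents the coarse spectral structure.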
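[0028]
Various functions can be used as the weight function for peaks and the weight function for valleys. FIG. 7 shows examples of such weight functions.
[0029]
In FIG. 7, (a) and (b) are examples of weight functions for a peak: (a) is composed of straight lines and (b) of a parabola. In both, the weighting interval extends t on each side of the peak center n = M, for a total length of 2t. The value of the weight function is 1.0 at both ends of the weighting interval (M ± t). The weight value α at the peak center n = M is normally a suitable constant in the range 0 < α < 1.0. Similarly, (c) and (d) in FIG. 7 are examples of weight functions for a valley: (c) is composed of straight lines and (d) of a parabola. As with peaks, the weight function for a valley takes the value 1.0 at both ends of the weighting interval (V ± t). The weight value β at the valley center n = V is normally a suitable constant with β > 1.0. In some cases, however, setting α > 1.0 and 0 < β < 1.0 can be effective.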
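One way to write weight functions that satisfy these boundary conditions is given below. The figure itself is not reproduced here, so these closed forms are an assumed reconstruction: they are simply the straight-line and parabolic curves that equal α at n = M and 1.0 at n = M ± t.

```latex
% Assumed closed forms for the peak weight functions (a) and (b); valid for |n - M| <= t
w_{\mathrm{a}}(n) = \alpha + (1 - \alpha)\,\frac{|n - M|}{t},
\qquad
w_{\mathrm{b}}(n) = \alpha + (1 - \alpha)\left(\frac{n - M}{t}\right)^{2},
\qquad
w_{\mathrm{a}}(n) = w_{\mathrm{b}}(n) = 1 \ \text{for}\ |n - M| > t .
```

The valley weight functions (c) and (d) follow by replacing α with β and M with V. As a check, at n = M both expressions give α, and at n = M ± t both give α + (1 − α) = 1.0.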
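[0030]
When auditory weighting is performed in this way, the quantization noise is shaped as shown in FIG. 8. Without auditory weighting, the quantization noise is considered to be constant regardless of frequency (② in the figure). If the spectral envelope of the input signal is as shown by ① in the figure, then with the auditory weighting described above the frequency characteristic of the noise is shaped as shown by ③, so that the noise is hidden beneath the spectral characteristic ① of the input signal and becomes perceptually hard to hear.
[0031]
Auditory masking can therefore be performed more accurately than with conventional methods, and high-quality encoding becomes possible.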
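[0032]
Next, an example in which the signal encoding method of the present invention is applied to the auditory weighting of a general transform coding scheme is described. FIG. 9 shows the configuration of a signal encoding apparatus that performs such auditory weighting.
[0033]
The signal encoding apparatus shown in FIG. 9 comprises: an MDCT conversion unit 31 that applies the MDCT to the input signal; a spectrum flattening unit 32 that flattens the spectrum of the signal after the MDCT; a frame gain normalization unit 33 that normalizes and quantizes the frame gain based on the flattened spectrum and then outputs a gain index; a residual component quantization unit 34 that quantizes the residual components (by vector quantization or scalar quantization) based on the normalized frame gain and outputs quantization indices; a spectral envelope estimation unit 35 that estimates the spectral envelope from the spectrum of the signal after the MDCT; an auditory weight calculation unit 36 that calculates the auditory weight from the estimated spectral envelope so that information weighting can be applied during quantization in the residual component quantization unit 34; and a spectral information quantization unit 37 that quantizes the spectral information based on the estimated spectral envelope and outputs a spectrum index. In this apparatus, the MDCT conversion unit 31 corresponds to the T/F conversion unit 11 of the apparatus shown in FIG. 1, the spectral envelope estimation unit 35 consists of the envelope calculation unit 13 and the peak/valley estimation unit 14 of the apparatus shown in FIG. 1, and the auditory weight calculation unit 36 consists of the weighting unit 15 and the auditory weight calculation unit 16 of the apparatus shown in FIG. 1.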
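[0034]
With the signal encoding method of the present invention, the peaks and valleys of the spectrum within an analysis frame can be analyzed accurately and in detail, and highly accurate auditory masking matched to their shape can be performed during quantization. This auditory masking can be applied to vector quantization and to subband scalar quantization.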
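[0035]
FIG. 10 further shows an example in which the auditory weighting of the present invention is applied to the encoder and decoder disclosed in Japanese Unexamined Patent Publication No. H8-44399. In FIG. 10, the encoder 110 comprises: a frame division unit 114 that divides the input signal supplied to input terminal 111 into frames; a time windowing unit 115 that applies a time window to each frame; an MDCT unit 116 that applies an N-th order MDCT to the windowed frame; a linear prediction analysis unit 117 that performs linear prediction analysis on the windowed frame and outputs prediction coefficients; a quantization unit 118 that quantizes the prediction coefficients to obtain index I_p; a spectrum outline calculation unit 121 that obtains the spectrum outline of the prediction coefficients; a normalization unit 122 that normalizes the spectrum amplitude from the MDCT unit 116 by the spectrum outline to obtain residual coefficients R(F); a residual outline calculation unit 123 that calculates the residual coefficient outline E_R(F); a weight calculation unit 124 that calculates weighting coefficients (vector W) based on the residual coefficient outline and the spectrum outline; a quantization unit 125 that quantizes based on the weighting coefficients and outputs index I_m and the quantized sub-sequence vector C(m); a residual coefficient normalization unit 126 that normalizes the residual coefficients R(F) by the residual coefficient outline E_R(F) to obtain fine structure coefficients; a power normalization unit 127 that normalizes the fine structure coefficients of the current frame, supplies them to the quantization unit 125 as normalized fine structure coefficients X(F), and outputs index I_G; and a denormalization unit 131 that denormalizes the quantized sub-sequence vector C(m) and outputs quantized residual coefficients R_q(F) to the residual outline calculation unit 123.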
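[0036]
To perform auditory weighting according to the present invention in the encoder 110, the spectrum outline calculation unit 121 performs, in addition to the conventional processing, the same processing as the envelope calculation unit 13 and the peak/valley estimation unit 14 of the apparatus shown in FIG. 1. Based on the result, the weight calculation unit 124 performs, in addition to the conventional processing, the same processing as the weighting unit 15 and the auditory weight calculation unit 16 of the apparatus shown in FIG. 1, and supplies the resulting auditory weight for quantization to the quantization unit 125.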
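[0037]
The decoder 150, in turn, comprises: a reproduction unit 151 that reproduces the normalized fine structure coefficients from index I_m; a normalization gain reproduction unit 152 that reproduces the normalization gain from index I_G; a power denormalization unit 153 that denormalizes the normalized fine structure coefficients by the normalization gain to obtain the fine structure coefficients; a residual denormalization unit 154 that denormalizes the fine structure coefficients by the residual outline E_R to reproduce the residual coefficients R(F); a residual outline calculation unit 155 that calculates the residual outline E_R; a reproduction and spectrum outline calculation unit 156 that reproduces the linear prediction coefficients from index I_p and calculates the spectrum outline; a denormalization unit 157 that denormalizes the residual coefficients R(F) by the spectrum outline to reproduce the frequency-domain coefficients; an inverse MDCT unit 158 that applies the inverse MDCT to the frequency-domain coefficients frame by frame to obtain a time-domain signal; a windowing unit 159 that applies a time window to the time-domain signal frame by frame; and a frame overlap unit 161 that overlaps and adds the windowed frames to obtain the reproduced audio signal and outputs it to output terminal 191.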
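[0038]
In the encoder 110 shown in FIG. 10, the denormalization unit 131 may be omitted, and the residual outline calculation unit 123 may calculate the residual coefficient outline E_R(F) and an index I_Q from the output of the normalization unit 122 alone. In that case, the residual outline calculation unit 155 in the decoder 150 calculates the residual outline E_R from index I_Q.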
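[0039]
Next, an example in which the present invention is applied to auditory masking in CELP (Code-Excited Linear Prediction) coding, a time-domain coding scheme, is described. Because auditory masking in CELP coding is performed in the time domain, the auditory weighting of the present invention is applied in the frequency domain, and the resulting auditory weight is converted back to the time domain before being applied to quantization. FIG. 11 is a block diagram showing the configuration of a signal encoding apparatus that performs such encoding.
[0040]
The apparatus shown in FIG. 11 comprises: an FFT unit 38 that applies an FFT (fast Fourier transform) to the input signal; a spectral envelope estimation unit 35 that estimates the spectral envelope from the output of the FFT unit (a frequency-domain signal sequence); an auditory weight calculation unit 36 that calculates the auditory weight from the estimated spectral envelope; an inverse FFT unit 39 that converts the auditory weight back to the time domain; and a CELP encoding unit 40 that performs CELP encoding of the input signal based on the time-domain auditory weight and outputs indices. In this apparatus, the FFT unit 38 corresponds to the T/F conversion unit 11 of the apparatus shown in FIG. 1, the spectral envelope estimation unit 35 consists of the envelope calculation unit 13 and the peak/valley estimation unit 14 of the apparatus shown in FIG. 1, and the auditory weight calculation unit 36 consists of the weighting unit 15 and the auditory weight calculation unit 16 of the apparatus shown in FIG. 1.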
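The step of returning the auditory weight to the time domain is not detailed in the text. One plausible interpretation, sketched here purely as an assumption, is to treat the per-bin weights as the real, symmetric magnitude response of a weighting filter and take an inverse DFT to obtain its impulse response. A plain O(N²) inverse DFT is used so that the example needs only the standard library; the function names and the mirroring convention are hypothetical.

```python
import cmath


def inverse_dft_real(spectrum):
    """Inverse DFT of a conjugate-symmetric spectrum, returning the real part."""
    n_points = len(spectrum)
    out = []
    for t in range(n_points):
        acc = sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n_points)
                  for k in range(n_points))
        out.append((acc / n_points).real)
    return out


def weight_to_impulse_response(half_weights):
    """Mirror weights for bins 0..N/2 into a full symmetric spectrum and invert it."""
    # Assumption: real weights, mirrored so that the time-domain result is real.
    full = list(half_weights) + list(reversed(half_weights[1:-1]))
    return inverse_dft_real(full)


# A flat weight (no shaping) should give a unit impulse.
h_flat = weight_to_impulse_response([1.0, 1.0, 1.0, 1.0, 1.0])
# A hypothetical weight that emphasises the middle of the band.
h_shaped = weight_to_impulse_response([1.0, 2.0, 3.0, 2.0, 1.0])
```

How the CELP encoding unit 40 actually applies the time-domain weight (for example as a filter on the error signal) is not specified in this paragraph, so the sketch stops at the impulse response.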
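[0041]
FIG. 12 further shows an example in which the auditory weighting of the present invention is applied to the speech encoding apparatus disclosed in FIG. 1 of Japanese Unexamined Patent Publication No. H6-282298. The speech encoding apparatus shown in FIG. 12 comprises: a prediction coefficient determination unit 202 that divides the speech signal received through input terminal 201 into frames, performs linear prediction analysis, and determines the prediction coefficients; a synthesis filter 203; a prediction coefficient quantization unit 204 that quantizes the prediction coefficients and sets them in the synthesis filter 203; an adaptive codebook 217 that stores a plurality of pitch period vectors; a noise codebook 218 that stores a plurality of noise waveform vectors; a gain codebook 219 having a gain unit 219a that applies a gain to the pitch period vector selected from the adaptive codebook 217 and a gain unit 219b that applies a gain to the noise waveform vector selected from the noise codebook 218; a prediction gain determination unit 215 that obtains the prediction gain for the next noise waveform vector from the past output power of the gain unit 219b; a prediction gain unit 216, provided on the input side of the gain unit 219b, that applies this prediction gain to the selected noise waveform vector; an adder 209 that adds the output vectors of the gain units 219a and 219b and supplies the sum to the synthesis filter 203 as the excitation vector; a subtractor 211 that subtracts the output of the synthesis filter 203 (the synthesized speech vector) from the input speech vector (the input signal) and outputs the result as distortion data; an auditory weighting filter 220 that applies auditory weighting to the distortion data; a distortion power calculation unit 212 that calculates the distortion power from the auditory-weighted distortion data and makes the selections in the codebooks 217 to 219 so that the distortion power is minimized; and a code output unit 213 that outputs the codes.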
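[0042]
To perform auditory weighting according to the present invention in this speech encoding apparatus, the signal encoding apparatus shown in FIG. 11 may be used as the auditory weighting filter 220, or in combination with the auditory weighting filter 220. The auditory weighting of the present invention is thereby applied to the distortion data. Although not described here with a drawing, the speech encoding apparatus disclosed in FIG. 2 of Japanese Unexamined Patent Publication No. H6-282298 can likewise use, as its auditory weighting filter, the signal encoding apparatus shown in FIG. 11 modified as described above.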
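[0043]
The signal encoding method and apparatus of the present invention described above can also be realized by loading a computer program that implements them into a computer and executing that program. The program for signal encoding is loaded into the computer from a recording medium such as a magnetic tape or CD-ROM, or over a network. FIG. 13 is a block diagram showing the configuration of a computer that executes the signal encoding method described above.
[0044]
This computer comprises a central processing unit (CPU) 21, a hard disk device 22 for storing programs and data, a main memory 23, an input device 24 such as a keyboard, mouse, or microphone, a display device 25 such as a CRT or speaker, a reading device 26 that reads a recording medium 27 such as a magnetic tape or CD-ROM, and a communication interface 28 connected to a network. The hard disk device 22, main memory 23, input device 24, display device 25, reading device 26, and communication interface 28 are all connected to the central processing unit 21. A nonvolatile semiconductor memory such as flash ROM may be used in place of the hard disk device 22. The recording medium 27 storing the signal encoding program is mounted in the reading device 26, the program is read from the recording medium 27 and stored in the hard disk device 22, and the central processing unit 21 executes the program stored in the hard disk device 22, whereby the computer functions as a signal encoding apparatus. The signal encoding program may of course also be downloaded to this computer over a network.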
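[0045]
[Effects of the Invention]
As described above, the present invention allows auditory masking to be performed more accurately than with conventional methods when encoding speech and musical sound signals, and thus enables high-quality encoding. Specifically, when a time-series signal is converted into a frequency-domain coefficient sequence by the MDCT or the like and then quantized, the present invention makes it possible to exploit the masking characteristics of human hearing and distribute the quantization error along the frequency axis more accurately than conventional methods, so that the error is hard to perceive.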
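[Brief Description of the Drawings]
[FIG. 1] A block diagram showing the configuration of a signal encoding apparatus according to one embodiment of the present invention.
[FIG. 2] A block diagram showing the process of weighting the peaks and valleys of the spectrum.
[FIG. 3] A block diagram showing the details of the processing in the envelope calculation unit.
[FIG. 4] A block diagram showing the details of the processing in the peak/valley estimation unit.
[FIG. 5] A diagram showing an example of the peaks and valleys of the spectral envelope detected by the peak/valley estimation unit.
[FIG. 6] A diagram showing an example of weighting applied near the peaks and valleys of the spectral envelope.
[FIG. 7] (a) to (d) are diagrams showing examples of weight functions applied near peaks and valleys.
[FIG. 8] A diagram showing how quantization noise is masked by the spectral envelope as a result of auditory weighting.
[FIG. 9] A block diagram showing an example of the configuration of a signal encoding apparatus according to the present invention.
[FIG. 10] A block diagram showing an example of the configuration of an encoder and decoder to which the auditory weighting of the present invention is applied.
[FIG. 11] A block diagram showing an example of the configuration of a signal encoding apparatus.
[FIG. 12] A block diagram showing an example of the configuration of a signal encoding apparatus.
[FIG. 13] A block diagram showing an example of a computer system used to implement the signal encoding apparatus.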
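[Explanation of Reference Numerals]
11 T/F conversion unit
12 Quantization unit
13 Envelope calculation unit
14 Peak/valley estimation unit
15 Weighting unit
16 Auditory weight calculation unit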
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a signal encoding method and apparatus for performing quantization by transforming an input signal on a time axis / frequency axis, and in particular, transforms a quantization error generated at the time of encoding so that it is difficult for a human ear to perceive. Masking method and a signal encoding device using the hearing masking method.
[0002]
[Prior art]
As an auditory masking method in a conventional signal encoding method for encoding voice / musical sound, an input signal is subjected to a time axis or time axis / frequency axis conversion, and then a spectral envelope curve of the input signal is obtained by a linear prediction analysis method or the like. There is a method in which auditory masking is performed by obtaining a masking curve by estimating the estimated curve and applying an appropriate deformation operation to the estimated curve. Alternatively, there is a method in which a spectrum envelope curve is directly obtained from a signal obtained by converting an input signal into a time axis / frequency axis, and a masking curve is obtained by applying a proper deformation operation to this curve, and quantization is performed by auditory masking. .
[0003]
[Problems to be solved by the invention]
In the auditory masking method, masking on the frequency axis is performed by performing noise shaping such that quantization noise near the valley of the spectrum envelope curve is reduced and quantization noise near the peak of the spectrum envelope curve is increased instead. However, the quantization noise can be hardly heard by a human ear. Here, in the conventional method as described above, since the estimated positions of peaks and valleys in the spectral envelope may be inaccurate, noise shaping is not properly performed, and as a result, the sound quality of the encoded reproduced sound is poor. was there.
[0004]
Accordingly, an object of the present invention is to provide a signal encoding method and apparatus capable of accurately estimating the positions of peaks and valleys in a spectral envelope curve and thereby executing a highly accurate auditory masking method.
[0005]
[Means for Solving the Problems]
The present invention is intended to realize signal encoding that can be quantized so that distortion on an auditory basis is minimized.In order to solve the above-described problem, the positions of peaks and valleys of a spectral envelope curve are determined. It takes a method of accurately estimating and performing appropriate noise shaping from the positions of the peaks and valleys that have been accurately estimated. The position estimation of the peaks and valleys of the spectrum envelope curve is performed by removing fine irregularities as necessary from the accurate spectrum envelope curve of the signal subjected to the time axis / frequency axis conversion, and further performing the first differentiation and the second differentiation as necessary. , And the exact positions of the peaks and valleys are determined from these differential values or the arithmetic mean of the differential values. Appropriate weighting is performed at the positions of the peaks and valleys thus obtained, and effective noise shaping is realized.
[0006]
BEST MODE FOR CARRYING OUT THE INVENTION
Next, a preferred embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram illustrating a configuration of a signal encoding device according to an embodiment of the present invention.
[0007]
This signal encoding apparatus performs time-axis / frequency-axis conversion (T / F conversion) on a time-series input signal x (t), which is typically a voice signal or a tone signal, to convert a signal on a frequency axis. A T / F converter 11 for obtaining a sequence X (n), and a quantizer 12 for performing a vector quantization (VQ) and a scalar quantization (SQ) on the signal sequence X (n) to obtain a quantization index. It has. Here, the T / F conversion unit 11 performs a conversion such as a modified discrete cosine transform (MDCT), and X (n) indicates a conversion coefficient sequence obtained by the conversion. Further, the signal encoding apparatus calculates an “auditory weight” for determining how much information amount is to be allocated to which frequency band, and when the quantization is performed by the quantization unit 11, the quantization is performed by the human ear. Perceptual weighting quantization based on this perceptual weight is performed so that noise is hard to hear. In order to calculate the perceptual weight, the signal encoding apparatus includes an envelope calculator 13 that calculates a spectrum envelope based on the signal sequence X (n), and positions of peaks and valleys of the spectrum based on the calculated spectrum envelope. Based on the peak and valley positions of the estimated spectrum, such that the distribution of the information amount is “particularly small at the peak position” and “particularly large at the valley position”. The apparatus includes a weighting unit 15 that performs appropriate weighting in the vicinity of a mountain and a valley, and an auditory weight calculator 16 that outputs to the quantizer 12 as “auditory weight”. Here, the reciprocal of the spectral envelope is used as the original form of the “auditory weight”.
[0008]
For the peaks and valleys, the signal sequence X (n) is plotted with the horizontal axis as the frequency axis, and when smoothed (smoothed), a portion where the value of the signal sequence is larger than the surroundings is called a peak, The area where the value is smaller than the surrounding area is called a valley. As described later, the smoothing is performed by, for example, calculating an arithmetic mean (calculating a moving average based on the section length) in a certain section length (also referred to as an average section length). By changing the section length, fine peaks and valleys, slightly fine peaks and valleys, rough positions of peaks and valleys, and the like are estimated. Here, the arithmetic averaging is for smoothing a spectrum in one frame in a frequency section. According to the present invention, more accurate auditory masking is enabled by combining the estimation of the positions of peaks and valleys having different degrees of smoothing.
[0009]
Next, the operation of the signal encoding device will be described.
[0010]
The time-series input signal x (t) input as a time-series signal is converted by the T / F converter 11 into a signal sequence X (n) on the frequency axis. The signal sequence X (n) is supplied to the quantization unit 12 for vector quantization and scalar quantization, and is also sent to the envelope calculation unit 13 to calculate the spectrum envelope. The envelope calculator 13 calculates the spectrum envelope of the signal sequence X (n), and the peak / valley estimator 14 estimates the positions of the peaks and valleys in the spectrum based on the calculated spectrum envelope, and estimates the estimated positions. Is output to the weighting unit 15. Based on the reciprocal of the spectrum envelope obtained by the envelope calculation unit 13, the weighting unit 15 determines that the information amount distribution is “particularly small at the peak position” and “at the valley position” at the peak and valley positions of the spectrum, respectively. Appropriate information amount weighting is performed in the vicinity of the peak and the valley so as to be “particularly large”. Specifically, the position of the valley / valley is raised using a weight function that raises the vicinity of the mountain high and lowers the vicinity of the valley deeply, or lowers the vicinity of the mountain and raises the vicinity of the valley so that it becomes shallow. Weighting operation. The weighting unit 15 is supplied with the spectrum envelope curve from the envelope calculation unit 13, and the spectrum envelope curve on which the weighting operation has been performed is supplied from the weighting unit 15 to the auditory weight calculation unit 16.
[0011]
The auditory weight calculator 16 calculates an auditory weight for quantization based on the weighted spectral envelope curve, and outputs it to the quantizer 12. As a result, the quantization unit 13 performs vector quantization and scalar quantization on the signal sequence X (n) from the T / F conversion unit 11 using the supplied auditory weights for quantization. As a result, a quantization index (output index) subjected to highly accurate auditory masking is output from the quantization unit 13.
[0012]
The basic operation of the signal encoding apparatus according to this embodiment has been described above. However, in the present invention, the above-mentioned weighting method and the linear prediction analysis method generally used conventionally are used as the auditory weighting method. For example, a method may be used in which a spectrum envelope is predicted by using the method, and peaks and valleys of the envelope curve are rounded by exponentiation and weighted.
[0013]
Next, the process of weighting in this embodiment will be described in detail.
[0014]
FIG. 2 is a block diagram showing a process of weighting peaks and valleys of a spectrum. Here, from the spectrum envelope curve obtained by the spectrum envelope calculation unit 13, the peak / valley estimating unit 14 estimates the frequency position of the fine peak / valley of the spectrum, and then the frequency position of the slightly fine peak / valley. This procedure is repeated as many times as necessary, and finally the rough frequency positions of peaks and valleys in the spectrum are estimated. The weighting unit 15 performs a weighting operation on each of the estimated peaks and valleys by using an appropriate weighting function.
[0015]
FIG. 3 is a block diagram illustrating the details of the process in the envelope calculation unit 13. The envelope calculation unit 13 obtains a spectrum envelope curve by performing arithmetic averaging on the signal sequence X (n) in the frequency domain. In the figure, arithmetic averages (1) to (k) are arithmetic averages in moving average sections having different section lengths. Here, first, the first arithmetic mean (1) is applied to the signal sequence X (n), and as a result Y ₁ A second arithmetic mean (2) is applied to (n) and the result Y ₂ The third arithmetic averaging (3) is applied to (n), so that k arithmetic averaging is sequentially performed. Here, k is an integer constant of 1 or more. Result Y of each arithmetic mean obtained in this way ₁ (N), Y ₂ (N),. . . , Y _k (N) are sent to the peak / valley estimating unit 14, respectively. The section length in each arithmetic averaging is determined according to each application, but mainly in the arithmetic averaging (1), the average section length is shortened to detect fine peaks and valleys. In the arithmetic averaging (2), the average section length is made longer than in the arithmetic averaging (1), and rough positions of peaks and valleys are detected. Hereinafter, the same operation is performed up to the arithmetic averaging (k), and the average section length in each arithmetic averaging may be gradually increased.
In addition, the above-described calculation of the “arithmetic average (k)” may be performed a plurality of times by changing the average section length as needed.
[0017]
Next, processing in the peak / valley estimating unit 14 will be described. FIG. 4 is a block diagram illustrating a process performed by the peak / valley estimating unit 14.
[0018]
The peak / valley estimating unit 14 calculates the coefficient sequence Y representing the spectral envelope by the arithmetic averaging of each time from the envelope calculating unit 13. ₁ (N), Y ₂ (N),. . . , Y _k Using (n) as an input, the positions of peaks and valleys are estimated for each coefficient sequence as follows. That is, the input coefficient sequence Y _j (N) (1 ≦ j ≦ k) is first differentiated by n to obtain a series Y ′ _j (N), and the sequence Y ′ _j A series obtained by taking the arithmetic mean in an appropriate section for (n) and removing fine fluctuation components
[0019]
[Outside 1]

[0020]
Ask for. Further, this is differentiated again by n to obtain a series Y ″ _j (N), and the sequence Y ″ _j A series from which the minute fluctuation component of (n) has been removed
[0021]
[Outside 2]

[0022]
Ask for. Then, as shown by the equation in FIG. 4, the positions of the peaks and valleys of the spectrum envelope curve are estimated from the positive and negative of these values. Further, the above-described operation of “arithmetic averaging” for removing minute fluctuation components may be performed a plurality of times by changing the average section length, if necessary, or may not be performed. .
[0023]
FIG. 5 is a diagram illustrating a manner in which peaks and valleys of the spectral envelope are detected from the coefficient sequence X (n) as described above. Here, k = 2, that is, the case where the envelope calculation unit 13 calculates the arithmetic mean in two stages is shown. In this figure, the absolute value | X (n) | of the coefficient sequence X (n) before taking the average is {1}, and the series Y of the arithmetic mean (1) ₁ Absolute value | Y in (n) ₁ (N) | is (2), coefficient sequence Y by arithmetic mean (2) ₂ Absolute value | Y in (n) ₂ (N) | is set to (3). The position of the mountain estimated from the arithmetic mean (1) is m ₁ , M ₂ ,. . . , M ₁₂ , The valley position is V ₁ , V ₂ ,. . . , V ₁₁ And the position of the mountain estimated from the arithmetic mean (2) ₁ , M ₂ , M ₃ , The valley position is V ₁ , V ₂ It is represented by Here, the section length in the arithmetic mean (2) is longer than the section length in the arithmetic mean (1), and (2) corresponds to a fine frequency position of peaks and valleys, and (3) Correspond roughly to the frequency positions of the peaks and valleys.
[0024]
Next, assuming that a plurality of types of frequency positions of peaks and valleys have been obtained in this manner, how to weight the information amount will be described. FIG. 6 is a diagram illustrating an example in which information amounts are weighted near peaks and valleys of a spectral envelope curve. Here, in order to make the explanation easy to understand, the explanation will be made using a rough waveform.
[0025]
In FIG. 6, the spectral envelope curve (1) (| Y ₂ (N) |) reciprocal (2) (1 / | Y ₂ (N) |) is used as the original form of the auditory weight, and weighting is performed using a weight function near the estimated position of the peak and valley. In the example of this figure, the auditory weight (3) (W) is obtained by multiplying (2) by the weighting function (4) to correct the amount of information at the positions of peaks and valleys. _L ) Has been created. As the weighting functions {circle around (4)} and {circle around (5)}, various forms are possible. Here, as an example, the section length to be weighted is 2t, 0.5 times at the center of the mountain, and 1.times. At the edge of the mountain. The result of weighting by a linear function such that it is 0 times, 2.0 times at the center of the valley, and 1.0 times at the end of the valley is shown as (3). As can be seen from FIG. 6, it is possible to estimate the exact positions of the peaks and valleys, create a weight that increases the amount of information near the valley, and reduces the amount of information near the peak.
[0026]
Here, the value of t is, for example, about 100 to 200 Hz when weighting the peak / valley structure representing the pitch frequency, and about 300 to 600 Hz when weighting the peak / valley structure representing the formant frequency. Is preferred.
[0027]
Actually, weighting is performed in the vicinity of the peaks and valleys of the “fine curve” and “rough curve” of the spectral envelope by the above-described method. For example, as shown in FIG. 5, when the positions of the peaks and valleys are estimated for each of the “fine curve” and the “rough curve” of the spectral envelope, the reciprocal of the spectral envelope (2) representing the fine structure is obtained. 1 / | Y ₁ (N) | is the original form of the perceptual weight, and the positions m of the peaks and valleys of this envelope curve ₁ , V ₁ , M ₂ , V ₂ ,. . . , The original form of the auditory weight 1 / | Y ₁ (N) | is appropriately weighted, and the positions M of peaks and valleys of a curve (3) representing a rough spectral structure ₁ , V ₁ , M ₂ , V ₂ ,. . . 1 / | Y which is also the original form of the auditory weight ₁ (N) |
[0028]
Various functions can be considered as the weighting function for the peak and the weighting function for the valley. FIG. 7 illustrates such a weighting function.
[0029]
In FIG. 7, (a) and (b) show examples of the weighting function for the mountain, where (a) is a straight line and (b) is a parabolic curve. In each case, a section of a total of 2t is set as a weighting section, with t on each side of the center n = M of the mountain. The value of the weighting function is assumed to be 1.0 at both ends (M ± t) of the weighting section. In addition, the value α of the weight at the center of the mountain n = M may be a reasonable constant in the case of 0 <α <1.0. Similarly, in FIGS. 7A and 7B, (c) and (d) show examples of the weighting function for the valley, in which (c) is formed by a straight line and (d) is formed by a parabola. As in the case of the peak, the value of the weighting function for the valley is 1.0 at both ends (V ± t) of the weighting section. In addition, as the weight value β at the center of the valley n = V, an appropriate constant at β> 1.0 is usually used. However, in some cases, setting α> 1.0 and 0 <β <1.0 may be effective.
[0030]
When the auditory weighting is performed in this manner, the quantization noise is transformed as shown in FIG. That is, when the auditory weighting is not performed, the quantization noise is considered to be constant regardless of the frequency ((2) in the figure), but the spectral envelope of the input signal is as shown in (1) in the figure. Assuming that the noise is applied, the above-mentioned auditory weighting is performed, so that the noise has its frequency characteristic deformed as indicated by (3) in the figure and is hidden by (1), which is the spectral characteristic of the input signal. Hard to hear.
[0031]
Therefore, auditory masking with higher accuracy than the conventional method can be performed, and high-quality encoding can be performed.
[0032]
Next, an example in which the above-described signal encoding method of the present invention is applied to auditory weighting of a general transform encoding method will be described. FIG. 9 shows the configuration of a signal encoding device that performs such auditory weighting.
[0033]
The signal encoding device illustrated in FIG. 9 includes an MDCT conversion unit 31 that performs MDCT on an input signal, a spectrum flattening unit 32 that flattens the spectrum of a signal after MDCT, and a frame based on the flattened spectrum. After normalizing and quantizing the gain, the frame gain normalizing unit 33 that outputs a gain index, and quantizes (vector quantization or scalar quantization) the residual component based on the normalized frame gain, and performs quantization. A residual component quantization unit 34 for outputting an index, a spectrum envelope estimation unit 35 for estimating a spectrum envelope from a spectrum of the signal after MDCT, and an information weighting in performing quantization in the residual component quantization unit 34 A hearing weight calculator 36 for calculating a hearing weight from the estimated spectrum envelope, and a Spectral information have and a spectral information quantization unit 37 which outputs a spectrum index by quantizing. In this signal encoding device, the MDCT conversion unit 31 corresponds to the T / F conversion unit 11 of the signal encoding device shown in FIG. 1, and the spectrum envelope estimating unit 35 is an envelope calculation unit of the device shown in FIG. 13 and the peak / valley estimator 14, and the auditory weight calculator 36 is comprised of the weighter 15 and the auditory weight calculator 16 of the apparatus shown in FIG.
[0034]
According to the signal encoding method of the present invention, peaks and valleys of a spectrum in an analysis frame can be accurately and finely analyzed, and highly accurate auditory masking can be performed at the time of quantization according to the shape. This auditory masking can be applied to vector quantization and subband scalar quantization.
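As a rough illustration of the peak and valley estimation recited in the claims (a sign change of the first-order differential value, with the second-order differential value keeping one sign in the vicinity of the change), the following sketch uses finite differences as the differential values. The function name `find_peaks_and_valleys`, the `radius` parameter that defines "the vicinity", and the use of `np.diff` are assumptions made for illustration; the specification does not prescribe them.

```python
import numpy as np

def find_peaks_and_valleys(envelope, radius=1):
    """Return (peaks, valleys) as lists of indices into `envelope`."""
    env = np.asarray(envelope, dtype=float)
    d1 = np.diff(env)   # d1[i] = env[i + 1] - env[i]
    d2 = np.diff(d1)    # d2[i] = d1[i + 1] - d1[i], centred on env[i + 1]
    peaks, valleys = [], []
    for i in range(len(d1) - 1):
        # Second-order differences in the assumed vicinity of the sign change.
        near = d2[max(0, i - radius):min(len(d2), i + radius + 1)]
        if d1[i] > 0 and d1[i + 1] < 0 and np.all(near < 0):
            peaks.append(i + 1)      # positive -> negative: peak
        elif d1[i] < 0 and d1[i + 1] > 0 and np.all(near > 0):
            valleys.append(i + 1)    # negative -> positive: valley
    return peaks, valleys

# Hypothetical smooth envelope with one interior peak and two valleys.
x = np.linspace(0.0, 4.0 * np.pi, 81)
peaks, valleys = find_peaks_and_valleys(2.0 + np.cos(x))
print("peaks:", peaks, "valleys:", valleys)
```

Requiring a consistent sign of the second-order difference around the sign change is what rejects small ripples that would otherwise be reported as extrema; a larger `radius` makes the test stricter.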
[0035]
FIG. 10 shows an example in which the auditory weighting of the present invention is applied to the encoder and decoder disclosed in Japanese Patent Application Laid-Open No. 8-44399. In FIG. 10, the encoder 110 includes: a frame dividing unit 114 that divides an input signal given to the input terminal 111 into frames; a windowing unit 115 that applies a time window to each frame; an MDCT unit 116 that performs an N-th order MDCT on the windowed frame; a linear prediction analysis unit 117 that performs linear prediction analysis on the windowed frame and outputs prediction coefficients; a quantization unit that quantizes the prediction coefficients and outputs an index I_p; a spectrum outline calculation unit 121 that calculates a spectrum outline from the prediction coefficients; a normalization unit 122 that normalizes the spectrum amplitude from the MDCT unit 116 by the spectrum outline to obtain a residual coefficient R(F); a residual outline calculation unit 123 that calculates a residual coefficient outline E_R(F); a weight calculation unit 124 that calculates weighting coefficients (vector W) based on the residual coefficient outline and the spectrum outline; a quantization unit 125 that performs quantization based on the weighting coefficients and outputs an index I_m and a quantized sub-sequence vector C(m); a residual outline normalization unit 126 that normalizes the residual coefficient R(F) by the residual coefficient outline E_R(F) to obtain fine structure coefficients; a power normalization unit 127 that normalizes the fine structure coefficients of the current frame, supplies them to the quantization unit 125 as normalized fine structure coefficients X(F), and outputs an index I_G; and an inverse normalization unit 131 that denormalizes the quantized sub-sequence vector C(m) and outputs a quantized residual coefficient R_q(F) to the residual outline calculation unit 123.
[0036]
To perform auditory weighting according to the present invention in the encoder 110, the spectrum outline calculation unit 121 performs, in addition to its conventional processing, the same processing as the envelope calculation unit 13 and the peak/valley estimation unit 14 of the signal encoding device shown in FIG. 1. Based on the result, the weight calculation unit 124 performs, in addition to its conventional processing, the same processing as the weighting unit 15 and the auditory weight calculation unit 16 of the device shown in FIG. 1, and the resulting auditory weight for quantization is supplied to the quantization unit 125.
[0037]
The decoder 150, on the other hand, includes: a reproducing unit 151 that reproduces normalized fine structure coefficients from the index I_m; a normalization gain reproducing unit 152 that reproduces the normalization gain from the index I_G; a power denormalization unit 153 that denormalizes the normalized fine structure coefficients by the normalization gain to obtain fine structure coefficients; a residual inverse normalization unit 154 that denormalizes the fine structure coefficients by the residual outline E_R to reproduce the residual coefficient R(F); a residual outline calculation unit 155 that calculates the residual outline E_R; a reproduction/spectrum outline calculation unit 156 that reproduces the linear prediction coefficients from the index I_p and calculates the spectrum outline; an inverse normalization unit 157 that denormalizes the residual coefficient R(F) by the spectrum outline to reproduce frequency-domain coefficients; an inverse MDCT unit 158 that performs an inverse MDCT on the frequency-domain coefficients of each frame to obtain a time-domain signal; a windowing unit 159 that applies a time window to the time-domain signal of each frame; and a frame superposition unit 161 that overlaps the windowed outputs frame by frame to obtain a reproduced audio signal and outputs it to an output terminal 191.
[0038]
In the encoder 110 shown in FIG. 10, the inverse normalization unit 131 may be omitted, and the residual outline calculation unit 123 may instead calculate the residual coefficient outline E_R(F) and an index I_Q based only on the output of the normalization unit 122. In this case, in the decoder 150, the residual outline calculation unit 155 calculates the residual outline E_R based on the index I_Q.
[0039]
Next, an example in which the present invention is applied to auditory masking of CELP (Code-Excited Linear Prediction) encoding, which is an encoding method in the time domain, will be described. In CELP coding, since auditory masking is performed in the time domain, the auditory weighting according to the present invention is applied in the frequency domain, and the obtained auditory weights are returned to the time domain and then applied to quantization. FIG. 11 is a block diagram illustrating a configuration of a signal encoding device that performs such encoding.
[0040]
The device shown in FIG. 11 includes: an FFT unit 38 that performs an FFT (fast Fourier transform) on an input signal; a spectrum envelope estimation unit 35 that estimates a spectrum envelope based on the output of the FFT unit 38 (a frequency-domain signal sequence); an auditory weight calculation unit 36 that calculates an auditory weight from the estimated spectrum envelope; an inverse FFT unit 39 that returns the auditory weight to the time domain; and a CELP encoding unit 40 that performs CELP encoding of the input signal based on the time-domain auditory weight and outputs an index. In this signal encoding device, the FFT unit 38 corresponds to the T/F conversion unit 11 of the signal encoding device shown in FIG. 1, the spectrum envelope estimation unit 35 consists of the envelope calculation unit 13 and the peak/valley estimation unit 14 of the device shown in FIG. 1, and the auditory weight calculation unit 36 consists of the weighting unit 15 and the auditory weight calculation unit 16 of the device shown in FIG. 1.
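One way to picture the path from the FFT unit 38 through the inverse FFT unit 39 is the sketch below. It is only an assumption-laden illustration: the envelope is approximated by a moving average of the magnitude spectrum (not the envelope calculation of this specification), the peak/valley weighting is omitted, and the function names, `smooth_bins`, and `floor` are hypothetical. The specification does not state how the time-domain weight is used inside the CELP encoding unit 40; here it is simply treated as the impulse response of a zero-phase weighting filter.

```python
import numpy as np

def time_domain_weighting_filter(frame, smooth_bins=9, floor=1e-6):
    """Frequency-domain weight (1 / envelope) returned to the time domain."""
    magnitude = np.abs(np.fft.rfft(frame))                 # FFT unit 38
    kernel = np.ones(smooth_bins) / smooth_bins
    # Assumed envelope estimate: moving average of the magnitude spectrum.
    envelope = np.convolve(magnitude, kernel, mode="same") + floor
    weight = 1.0 / envelope                                # assumed weight
    # A real-valued frequency response gives a real, symmetric impulse response.
    return np.fft.irfft(weight, n=len(frame))              # inverse FFT unit 39

def apply_weighting(signal, impulse_response):
    """Circular convolution of `signal` with the weighting impulse response."""
    n = len(impulse_response)
    product = np.fft.rfft(signal, n=n) * np.fft.rfft(impulse_response)
    return np.fft.irfft(product, n=n)

# Hypothetical frame: a tone at bin 32 plus a small amount of noise.
rng = np.random.default_rng(1)
n = 256
frame = np.sin(2.0 * np.pi * 32 * np.arange(n) / n) + 0.01 * rng.normal(size=n)
h = time_domain_weighting_filter(frame)
weighted = apply_weighting(frame, h)
```

Because the weight is the reciprocal of the estimated envelope, the resulting filter attenuates the region around the tone and emphasizes the weaker regions of the spectrum.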
[0041]
FIG. 12 shows an example in which the auditory weighting of the present invention is applied to the speech encoding apparatus disclosed in FIG. 1 of JP-A-6-282298. The speech encoding apparatus illustrated in FIG. 12 includes: a prediction coefficient determination unit 202 that divides a speech signal input via an input terminal 201 into frames, performs linear prediction analysis, and determines prediction coefficients; a synthesis filter 203; a prediction coefficient quantization unit 204 that quantizes the prediction coefficients and sets them in the synthesis filter 203; an adaptive codebook 217 that stores a plurality of pitch period vectors; a noise codebook 218 that stores a plurality of noise waveform vectors; a gain codebook 219 having a gain unit 219a that applies a gain to the pitch period vector selected from the adaptive codebook 217 and a gain unit 219b that applies a gain to the noise waveform vector selected from the noise codebook 218; a prediction gain determination unit 215 that obtains a prediction gain for the next noise waveform vector based on past output power; a prediction gain unit 216, provided on the input side, that applies the prediction gain to the selected noise waveform vector; an adder 209 that adds the output vectors from the gain units 219a and 219b and supplies the result to the synthesis filter 203 as a drive vector; a subtractor 211 that subtracts the output of the synthesis filter 203 (the synthesized speech vector) from the input speech vector (the input signal) and outputs the result as distortion data; an auditory weighting filter 220 that performs auditory weighting on the distortion data; a distortion power calculation unit that calculates distortion power based on the distortion data and makes selections from the codebooks 217 to 219 so as to minimize the distortion power; and a code output unit 213 that outputs a code.
[0042]
To perform auditory weighting according to the present invention in this speech encoding apparatus, the signal encoding device shown in FIG. 11 described above may be used in place of, or in combination with, the auditory weighting filter 220. The auditory weighting according to the present invention is thereby applied to the distortion data. Although not described here with reference to the drawings, the speech encoding apparatus disclosed in FIG. 2 of JP-A-6-282298 can also be modified in the same manner as described above and used.
[0043]
The signal encoding method and device of the present invention described above can also be realized by having a computer read and execute a computer program that implements them. The program for performing signal encoding is read into the computer from a recording medium such as a magnetic tape or a CD-ROM, or via a network. FIG. 13 is a block diagram illustrating the configuration of a computer that executes the signal encoding method described above.
[0044]
The computer includes a central processing unit (CPU) 21, a hard disk device 22 for storing programs and data, a main memory 23, an input device 24 such as a keyboard, a mouse, and a microphone, a display device 25 such as a CRT and a speaker, a reading device 26 for reading a recording medium 27 such as a magnetic tape or a CD-ROM, and a communication interface 28 connected to a network. The hard disk device 22, main memory 23, input device 24, display device 25, reading device 26, and communication interface 28 are all connected to the central processing unit 21. A nonvolatile semiconductor storage device such as a flash ROM may be used instead of the hard disk device 22. A recording medium 27 storing a program for performing signal encoding is loaded into the reading device 26, and the program is read from the recording medium 27 and stored in the hard disk device 22. When the central processing unit 21 executes the program stored in the hard disk device 22, the computer functions as a signal encoding device. The program for performing signal encoding may of course also be downloaded to this computer via a network.
[0045]
[Effect of the invention]
As described above, according to the present invention, when a speech or musical tone signal is encoded, auditory masking can be performed with higher accuracy than in the conventional method, and high-quality encoding can be achieved. More specifically, when a time-series signal is converted into a frequency-domain coefficient sequence by, for example, MDCT and then quantized, the present invention makes it possible to distribute the quantization error along the frequency axis, using human auditory masking characteristics so that the error is hard to perceive, with higher accuracy than the conventional method.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration of a signal encoding device according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating a process of weighting peaks and valleys of a spectrum.
FIG. 3 is a block diagram illustrating details of processing in an envelope calculation unit.
FIG. 4 is a block diagram illustrating details of processing in a peak / valley estimating unit.
FIG. 5 is a diagram illustrating an example of a state of peaks and valleys in a spectrum envelope detected by a peak / valley estimating unit.
FIG. 6 is a diagram illustrating an example in which weighting is performed around peaks and valleys of a spectral envelope.
FIGS. 7A to 7D are diagrams illustrating examples of weighting functions for peaks and valleys.
FIG. 8 is a diagram illustrating a state in which quantization noise is masked into a spectral envelope by an auditory weighting process.
FIG. 9 is a block diagram illustrating an example of a configuration of a signal encoding device according to the present invention.
FIG. 10 is a block diagram illustrating an example of a configuration of an encoder and a decoder to which auditory weighting according to the present invention is applied.
FIG. 11 is a block diagram illustrating an example of a configuration of a signal encoding device.
FIG. 12 is a block diagram illustrating an example of a configuration of a signal encoding device.
FIG. 13 is a block diagram illustrating an example of a computer system used to configure the signal encoding device.
[Explanation of symbols]
11 T / F converter
12 Quantization unit
13 Envelope calculation unit
14 Peak / valley estimator
15 Weighting unit
16 Auditory weight calculator

Claims

A signal encoding method for performing quantization on an input signal, comprising:
a step of performing time axis / frequency axis conversion on the input signal to obtain a coefficient sequence on a frequency axis;
a step of calculating a spectral envelope based on the coefficient sequence;
a step of estimating the positions of peaks and valleys in the spectral envelope;
a step of applying information amount weighting to the spectral envelope at the estimated positions of the peaks and valleys;
a step of calculating an auditory weight for quantization based on the information-weighted spectral envelope; and
a step of performing quantization based on the auditory weight for quantization,
wherein the estimating step includes a step of obtaining a first-order differential value of the spectral envelope, and a step of obtaining a second-order differential value by obtaining a first-order differential value of the first-order differential value,
wherein the step of applying information amount weighting includes a step of performing a weighting operation on the positions of the peaks and valleys using a weighting function that either raises the vicinity of each peak higher and lowers the vicinity of each valley deeper, or lowers the vicinity of each peak and raises the vicinity of each valley so that it becomes shallower, and
wherein, if the first-order differential value changes from a positive value to a negative value and the second-order differential value is always a negative value in the vicinity of the change, that frequency is taken as a peak position, and if the first-order differential value changes from a negative value to a positive value and the second-order differential value is always a positive value in the vicinity of the change, that frequency is taken as a valley position.

A signal encoding method for performing quantization on an input signal, comprising:
a step of performing time axis / frequency axis conversion on the input signal to obtain a coefficient sequence on a frequency axis;
a step of calculating a spectral envelope based on the coefficient sequence;
a step of estimating the positions of peaks and valleys in the spectral envelope;
a step of applying information amount weighting to the spectral envelope at the estimated positions of the peaks and valleys;
a step of calculating an auditory weight for quantization based on the information-weighted spectral envelope; and
a step of performing quantization based on the auditory weight for quantization,
wherein the estimating step includes a step of obtaining a first-order differential value of the spectral envelope, a step of obtaining an arithmetic mean of the first-order differential value, a step of obtaining a second-order differential value by obtaining a first-order differential value of the arithmetic mean of the first-order differential value, and a step of obtaining an arithmetic mean of the second-order differential value,
wherein the step of applying information amount weighting includes a step of performing a weighting operation on the positions of the peaks and valleys using a weighting function that either raises the vicinity of each peak higher and lowers the vicinity of each valley deeper, or lowers the vicinity of each peak and raises the vicinity of each valley so that it becomes shallower, and
wherein, if the arithmetic mean of the first-order differential value changes from a positive value to a negative value and the arithmetic mean of the second-order differential value is always a negative value in the vicinity of the change, that frequency is taken as a peak position, and if the arithmetic mean of the first-order differential value changes from a negative value to a positive value and the arithmetic mean of the second-order differential value is always a positive value in the vicinity of the change, that frequency is taken as a valley position.

A signal encoding device that performs quantization on an input signal, comprising:
conversion means for performing time axis / frequency axis conversion on the input signal to obtain a coefficient sequence on a frequency axis;
envelope calculation means for calculating a spectral envelope based on the coefficient sequence;
peak / valley estimation means for estimating the positions of peaks and valleys in the spectral envelope;
weighting means for applying information amount weighting to the spectral envelope at the estimated positions of the peaks and valleys;
auditory weight calculation means for calculating an auditory weight for quantization based on the information-weighted spectral envelope; and
quantization means for performing quantization based on the auditory weight for quantization,
wherein the weighting means performs a weighting operation on the positions of the peaks and valleys using a weighting function that either raises the vicinity of each peak higher and lowers the vicinity of each valley deeper, or lowers the vicinity of each peak and raises the vicinity of each valley so that it becomes shallower, and
wherein the peak / valley estimation means obtains a first-order differential value of the spectral envelope, obtains a second-order differential value by obtaining a first-order differential value of the first-order differential value, takes a frequency as a peak position if the first-order differential value changes from a positive value to a negative value and the second-order differential value is always a negative value in the vicinity of the change, and takes a frequency as a valley position if the first-order differential value changes from a negative value to a positive value and the second-order differential value is always a positive value in the vicinity of the change.

A signal encoding device that performs quantization on an input signal, comprising:
conversion means for performing time axis / frequency axis conversion on the input signal to obtain a coefficient sequence on a frequency axis;
envelope calculation means for calculating a spectral envelope based on the coefficient sequence;
peak / valley estimation means for estimating the positions of peaks and valleys in the spectral envelope;
weighting means for applying information amount weighting to the spectral envelope at the estimated positions of the peaks and valleys;
auditory weight calculation means for calculating an auditory weight for quantization based on the information-weighted spectral envelope; and
quantization means for performing quantization based on the auditory weight for quantization,
wherein the weighting means performs a weighting operation on the positions of the peaks and valleys using a weighting function that either raises the vicinity of each peak higher and lowers the vicinity of each valley deeper, or lowers the vicinity of each peak and raises the vicinity of each valley so that it becomes shallower, and
wherein the peak / valley estimation means obtains a first-order differential value of the spectral envelope, obtains an arithmetic mean of the first-order differential value, obtains a second-order differential value by obtaining a first-order differential value of the arithmetic mean of the first-order differential value, obtains an arithmetic mean of the second-order differential value, takes a frequency as a peak position if the arithmetic mean of the first-order differential value changes from a positive value to a negative value and the arithmetic mean of the second-order differential value is always a negative value in the vicinity of the change, and takes a frequency as a valley position if the arithmetic mean of the first-order differential value changes from a negative value to a positive value and the arithmetic mean of the second-order differential value is always a positive value in the vicinity of the change.

A computer-readable recording medium on which is recorded a signal encoding program that causes a computer to execute:
a step of performing time axis / frequency axis conversion on an input signal to obtain a coefficient sequence on a frequency axis;
a step of calculating a spectral envelope based on the coefficient sequence;
a step of estimating the positions of peaks and valleys in the spectral envelope;
a step of applying information amount weighting to the spectral envelope at the estimated positions of the peaks and valleys;
a step of calculating an auditory weight for quantization based on the information-weighted spectral envelope; and
a step of performing quantization based on the auditory weight for quantization,
wherein the estimating step includes a step of obtaining a first-order differential value of the spectral envelope, and a step of obtaining a second-order differential value by obtaining a first-order differential value of the first-order differential value,
wherein the step of applying information amount weighting includes a step of performing a weighting operation on the positions of the peaks and valleys using a weighting function that either raises the vicinity of each peak higher and lowers the vicinity of each valley deeper, or lowers the vicinity of each peak and raises the vicinity of each valley so that it becomes shallower, and
wherein, if the first-order differential value changes from a positive value to a negative value and the second-order differential value is always a negative value in the vicinity of the change, that frequency is taken as a peak position, and if the first-order differential value changes from a negative value to a positive value and the second-order differential value is always a positive value in the vicinity of the change, that frequency is taken as a valley position.

A computer-readable recording medium on which is recorded a signal encoding program that causes a computer to execute:
a step of performing time axis / frequency axis conversion on an input signal to obtain a coefficient sequence on a frequency axis;
a step of calculating a spectral envelope based on the coefficient sequence;
a step of estimating the positions of peaks and valleys in the spectral envelope;
a step of applying information amount weighting to the spectral envelope at the estimated positions of the peaks and valleys;
a step of calculating an auditory weight for quantization based on the information-weighted spectral envelope; and
a step of performing quantization based on the auditory weight for quantization,
wherein the estimating step includes a step of obtaining a first-order differential value of the spectral envelope, a step of obtaining an arithmetic mean of the first-order differential value, a step of obtaining a second-order differential value by obtaining a first-order differential value of the arithmetic mean of the first-order differential value, and a step of obtaining an arithmetic mean of the second-order differential value,
wherein the step of applying information amount weighting includes a step of performing a weighting operation on the positions of the peaks and valleys using a weighting function that either raises the vicinity of each peak higher and lowers the vicinity of each valley deeper, or lowers the vicinity of each peak and raises the vicinity of each valley so that it becomes shallower, and
wherein, if the arithmetic mean of the first-order differential value changes from a positive value to a negative value and the arithmetic mean of the second-order differential value is always a negative value in the vicinity of the change, that frequency is taken as a peak position, and if the arithmetic mean of the first-order differential value changes from a negative value to a positive value and the arithmetic mean of the second-order differential value is always a positive value in the vicinity of the change, that frequency is taken as a valley position.