JPH11503882A

JPH11503882A - 3D virtual audio representation using a reduced complexity imaging filter

Info

Publication number: JPH11503882A
Application number: JP7529647A
Authority: JP
Inventors: アベル・ジョナサン・エス．
Original assignee: オーリアル・セミコンダクター・インコーポレーテッド
Priority date: 1994-05-11
Filing date: 1995-05-03
Publication date: 1999-03-30
Also published as: CA2189126C; WO1995031881A1; AU2460395A; EP0760197B1; AU703379B2; EP0760197A4; CA2189126A1; EP0760197A1

Abstract

(57) Abstract: Parameters of a compressed head related transfer function (HRTF) (130) are generated in advance or in real time for use in filtering an audio signal for virtual audio representation. Viewed in the frequency domain, the frequency components of a known transfer function are smoothed (125) over bandwidths that are a function of the width of the ear's critical bands. In a first implementation, the HRTF (120) is smoothed by convolving it in the frequency domain with a frequency-dependent weighting function. In a second method, the frequency axis of the HRTF is warped, that is, mapped to a nonlinear frequency domain.
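
The first implementation named in the abstract (smoothing with a frequency-dependent weighting function) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, the Gaussian window shape, the choice of sigma as half the bandwidth, and the `multiple` parameter are all hypothetical. The bandwidth rule (a constant 250 Hz below 1 kHz, one-third octave above) and the squaring and square-root scaling follow options mentioned in the description.

```python
import numpy as np

def smoothing_bandwidth_hz(f_hz, multiple=1.0):
    """Assumed bandwidth rule: constant 250 Hz below 1 kHz, 1/3 octave above."""
    third_octave = f_hz * (2.0 ** (1.0 / 6.0) - 2.0 ** (-1.0 / 6.0))
    return multiple * np.where(f_hz < 1000.0, 250.0, third_octave)

def compress_hrtf(h, fs, multiple=1.0):
    """Smooth the power spectrum of impulse response h with weights that widen with frequency."""
    freqs = np.fft.rfftfreq(len(h), d=1.0 / fs)
    power = np.abs(np.fft.rfft(h)) ** 2           # nonlinear scaling: squaring
    smoothed = np.empty_like(power)
    for k, fc in enumerate(freqs):
        sigma = smoothing_bandwidth_hz(fc, multiple) / 2.0  # assumption: sigma = bandwidth / 2
        w = np.exp(-0.5 * ((freqs - fc) / sigma) ** 2)      # Gaussian weighting function centered at fc
        w /= w.sum()                                        # each weighting function sums to one
        smoothed[k] = np.dot(w, power)                      # weighted arithmetic mean over frequency
    return freqs, np.sqrt(smoothed)               # inverse scaling: square root

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = rng.standard_normal(256)                  # stand-in for a measured 256-tap HRTF
    freqs, magnitude = compress_hrtf(h, fs=48000.0)
    print(freqs.shape, magnitude.shape)
```

Raising `multiple` widens every window relative to the critical bandwidth, which, per the description, corresponds to a shorter impulse response. A usable filter would still need a phase, for example via minimum phase reconstruction, which this sketch omits.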

Description

【発明の詳細な説明】複雑性を低減したイメージングフィルタを用いた３次元仮想オーディオ表現技術分野この発明は、一般に３次元オーディオ、すなわち仮想オーディオに関する。さらに詳しくは、この発明は、仮想オーディオ表現に使用されるイメージングフィルタの複雑性を低減する方法および装置に係わる。この発明の教示によれば、このような複雑性の低減は、得られる３次元オーディオ表現の音響心理学的局所化特性に実質的に影響を与えることなく達成される背景技術聴取者に到達する音は、音源と聴取者との相対的な位置に依存する伝播効果を示す。また、聴取環境の効果も存在する。これらの効果は、信号強度の差や到達時間の差を含み、聴取者に音源位置の感じを与える。早期の、または、遅延した反響効果のような環境効果が含まれる場合には、この環境効果もまた、聴取者に音響的な環境の感じを与えることになる。適切な伝播効果を模擬するために音を処理することによって、聴取者は、その音が、３次元空間内の特定の点から、すなわち「仮想位置」から発せられているように受け取るであろう。例えば、ウィットマンおよびキスラーによる「自由音場聴取のヘッドホーンシミュレーション」,J．Acoust．Soc．Am.，Vol．85，No.2，1989を参照。現在の３次元すなわち仮想オーディオ表現は、複数の選択された頭部関連伝達関数（ＨＲＴＦ）を用いて音声入力信号を時間領域フィルタリングすることによって達成されている。各ＨＲＴＦは、３次元空間内の特定の位置または領域における音響心理学的な局所化、もしくは、３次元空間内の方向における音響心理学的な局所化を達成する伝播効果や音響キューを再現するように設計される。例えば、エリザベス・エム．・ウェンゼルによる「仮想音響表現における局所化」,P resence，Vol．1，No．1，Sumner 1992 を参照。簡単のために、この明細書では、１つのオーディオチャンネルに作用する１つのＨＲＴＦについてのみ言及する。実際には、複数対のＨＲＴＦが、聴取者の両耳に適切な信号を供給するために使用される。現在では、ほとんどのＨＲＴＦは、空間的な方向についてのみ索引付けられており、レンジ成分は独立に考慮されている。いくつかのＨＲＴＦは、レンジと方向の両方を含むことによって空間的な位置を規定しており、位置によって索引付けられている。ここでは特定の例として、方向を規定するＨＲＴＦに言及しているが、この発明は、方向または位置のいずれかを表すＨＲＴＦに適用される。典型的には、ＨＲＴＦは、実験的な測定によって得られるか、または、実験的に得られたＨＲＴＦを修正することによっても得られる。実際的な仮想オーディオ表現の構成では、複数のＨＲＴＦパラメータセットのテーブルが保存されており、その各ＨＲＴＦパラメータセットは、３次元空間の特定の点または領域に関連づけられている。テーブルの保存容量を削減するために、いくつかの空間的位置に対するＨＲＴＦパラメータのみが保存される。他の空間位置に対するＨＲＴＦパラメータは、テーブルに保存されたＨＲＴＦ位置の適切なセットを補間することによって生成される。上述したように、音響環境もまた考慮される。実際には、これは、ＨＲＴＦを修正するか、または、音声信号を、所望の音響環境を模擬する追加のフィルタリングの処理対象とすることによって達成される。説明の簡単のために、開示された実施例はＨＲＴＦに言及しているが、この発明は、より一般に仮想オーディオ表現に使用されるすべての伝達関数に適用されるものであり、そのような伝達関数としては、ＨＲＴＦや、音響環境効果を表す伝達関数や、頭部関連変換と音響環境効果の両方を表す伝達関数を含んでいる。従来の典型的な構成を図１に示す。３次元空間位置信号１０がＨＲＴＦパラメータテーブルおよび補間関数１１に適用されて、信号１０によって識別される３次元位置に応じた１組の補間されたＨＲＴＦパラメータ１２が得られる。入力音声信号１２は、適用され補間されたＨＲＴＦパラメータによって決定される伝達関数を有するイメージングフィルタ１５に適用される。このフィルタ１５は、ヘッドホーン１７の１チャンネルに適用するのに適した「空間化された」音声出力を提供する。種々の図面は、再現のためにヘッドホーンを示しているが、適切なＨＲＴＦは、スピーカを含む他のタイプのオーディオトランスデューサによって、音響心理学的に局所化された音声を生成することができる。この発明は、特定のタイプのオーディオトランスデューサを使用することに限定されるものではない。イメージングフィルタが有限インパルス応答（ＦＩＲ）フィルタで実現される時には、ＨＲＴＦパラメータは、そのＨＲＴＦに関連づけられたインパルス応答を構成す
るＦＩＲフィルタタップを規定する。以下に説明するように、この発明は、ＦＩＲフィルタを使用することに限定されるものではない。図１に示す従来技術のアプローチの主な欠点は、比較的長く複雑なＨＲＴＦの演算コストである。従来技術では、ＨＲＴＦの長さすなわち複雑性を低減するためにいくつかの技術を用いている。図２ａに示すＨＲＴＦは、時間遅れ成分Ｄと、インパルス応答成分ｇ（ｔ）とを含んでいる。すなわち、イメージングフィルタは、図２ｂに示すように、時間遅れ関数Ｚ^-Dと、インパルス応答関数ｇ（ｔ）とで実現できる。まず、この時間遅れを除去することによって複数のＨＲＴＦを時間的に整列させれば、イメージングフィルタのインパルス応答関数の計算の複雑性が低減される。図３ａは、従来技術の構成を示しており、ここでは複数の対の未加工の（すなわち「生の」）ＨＲＴＦが時間整列プロセッサ１０１に適用され、その出力端子に時間整列ＨＲＴＦ１０２と、後に使用（図示せず）される時間遅れ値１０３とが出力される。プロセッサ１０１は、複数対の未加工のＨＲＴＦの相互相関を取って、それらの到達時間の時間差を決定する。これらの時間差は、遅れ値１０３である。時間遅れ値１０３とフィルタ区間は、後の使用のために保持されるので、音響心理学的な局所化ロスは生じず、知覚的な効果は保存される。各時間整列ＨＲＴＦ１０２は、その後、最小位相コンバータ１０４で処理されて残りの時間遅れが除去され、時間整列ＨＲＴＦがさらに短くなる。図３ｂは、未加工のＨＲＴＦパラメータ１００から得られた未加工の２組の左 −右対のＨＲＴＦ(Ｒ１／Ｌ１およびＲ２／Ｌ２)の例を示している。図３ｃは、これに対応する時間整列ＨＲＴＦ１０２を示している。図３ｄは、これに対応する出力最小位相ＨＲＴＦ１０５を示している。時間整列ＨＲＴＦ１０２のインパルス応答長さは、未加工のＨＲＴＦ１００から短縮されており、また、最小位相ＨＲＴＦ１０５は時間整列ＨＲＴＦ１０２から短縮されている。このように、複数のＨＲＴＦを時間整列させるために遅れを抽出し、最小位相変換を適用することによって、フィルタの複雑性（ＦＩＲフィルタの場合にはその長さ）が低減される。図２ｂおよび図３ａの技術を使用したとしても、４８ｋＨｚのオーディオサンプリング率において、ＦＩＲフィルタに対して２５６点程度の長い最少位相応答が通常使用されており、これは、プロセッサが音源毎に２５ｍｉｐｓのオーダーで処理を実行することを要求する。演算のためのリソースが限られている場合には、ＨＲＴＦの長さすなわち複雑性をさらに低減するために、従来技術において２つの付加的なアプローチが、単独もしくは組み合わされて使用される。１つの技術は、図４ａに示すように、ＨＲＴＦをダウンサンプリングすることによって、サンプリング率を低下する方法である。多くの局所化キュー、特に、高さにとって重要なものは、高周波数成分を含むので、サンプリング率の低下はオーディオ表現の性能を受け入れ不可能な程度にまで劣化させることがある。他の技術は、図４ｂに示されており、時間領域でＨＲＴＦにウィンドウ関数を乗ずることによって、または、周波数領域でこれに対応する重み関数を用いてＨＲＴＦを畳み込むことによって、ＨＲＴＦにウィンドウ関数を適用する方法である。この処理は、時間領域においてＨＲＴＦにウィンドウを乗ずることを考えることによって最も容易に理解しうる。このとき、短縮されたＨＲＴＦが得られるように、ウィンドウ幅はＨＲＴＦよりも狭いものが選択される。このようなウィンドウ処理は、固定された重み関数を用いた周波数領域平滑化の結果が得られる。この既知のウィンドウ処理技術は、音響心理学的局所化特性を劣化させ、特に、複雑で長いインパルス応答を有する空間的な位置や方向に関するものを劣化させる。このように、元のＨＲＴＦの知覚的な効果や音響心理学的局所化特性を維持しつつ、ＨＲＴＦの複雑性すなわち長さを低減する方法が望まれている。発明の開示この発明によれば、３次元仮想オーディオ表現は、空間位置信号に応じて１組の伝達関数パラメータを生成し、この１組の頭部関連伝達関数パラメータに応じて音声信号をフィルタ処理する。この１組の頭部関連伝達関数パラメータは、複数の既知の頭部関連伝達関数のためのパラメータを平滑化したものである。この発明による平滑化は、その動作を周波数領域で考えることによって最も良く説明しうる。複数の既知の伝達関数の周波数成分は、周波数に関して一定の関数ではない複数の帯域幅に渡って平滑化される。得られた複数の伝達関数のパラメータ（ここでは「圧縮された」伝達関数と呼ぶ）は、仮想オーディオ表現のために音声信号をフィルタ処理するために使用される。圧縮された頭部関連伝達関数
パラメータは、予め生成されていてもよく、あるいは、リアルタイムで作成されてもよい。前記の平滑化帯域は、耳の複数の臨界帯域の幅（すなわち、「臨界帯域幅」）の関数であることが好ましい。この関数は、平滑化帯域幅が、臨界帯域幅に比例するようにとることもできる。周知のように、耳の臨界帯域の幅は、周波数の増大にともなって増大し、従って、平滑化帯域幅も周波数とともに増大する。臨界帯域幅に対して平滑化帯域幅がより広いほど、結果として得られるＨＲＴＦの複雑性が低下する。ＦＩＲフィルタとして実現されるＨＲＴＦの場合には、フィルタの長さ（フィルタタップの数）は、臨界帯域幅の倍数として表現される平滑化帯域幅の逆数に関連付けられる。臨界帯域幅を考慮にいれている本発明の教示を適用することによって、複雑性や長さを同程度に低減した場合に、上述のような従来技術のウィンドウ技術によって、より単純に短くなったＨＲＴＦに比べて、知覚的効果や音響心理学的局所化の劣化がより少ないような、複雑性の低い、短いＨＲＴＦが得られる。ＨＲＴＦ（「未加工のＨＲＴＦ」）の例と、従来のウィンドウ処理方法によって作成された短縮されたＨＲＴＦの例と、本発明の方法によって作成されたＨＲＴＦ（「圧縮ＨＲＴＦ」）が、図５ａ（時間領域）および図５ｂ（周波数領域）に示されている。未加工のＨＲＴＦは、その複雑性すなわち長さを低減する処理が行われていない既知のＨＲＴＦの例である。図５ａにおいて、ＨＲＴＦの時間領域インパルス応答振幅が、０から３ミリ秒の時間軸に沿ってプロットされている。図５ｂには、各ＨＲＴＦの周波数領域伝達関数パワーが、１ｋＨｚから２０ｋＨｚまでの対数周波数軸に沿ってプロットされている。図５ｂの時間領域において、従来のＨＲＴＦは或る程度の短縮化を示しているが、圧縮ＨＲＴＦはさらに短縮化されている。図５ｂの周波数領域において、従来技術ＨＲＴＦにおける一様な平滑化帯域幅の効果が明らかであり、一方、圧縮ＨＲＴＦは、周波数の増大に伴って平滑化帯域幅が増大する効果を示している。図５ｂは対数周波数尺度なので、圧縮ＨＲＴＦは、未加工ＨＲＴＦに対して一定の平滑化を示している。時間領域の長さの差と、周波数領域における周波数応答の差があるにも係わらず、未加工ＨＲＴＦと、従来技術ＨＲＴＦと、圧縮ＨＲＴＦとは、同等な音響心理学的性能を提供する。未加工ＨＲＴＦに対する従来のウィンドウ処理の量と、本発明による圧縮の量とを、未加工ＨＲＴＦに対して実質的に同様の音響心理学的性能を与えるように選択すると、予備的な二重目隠し聴取テストでは、従来技術のウィンドウ処理ＨＲＴＦよりも圧縮ＨＲＴＦの方が好まれることが示される。驚いたことに、圧縮ＨＲＴＦは、未加工のＨＲＴＦよりも好まれる。この理由は、平滑化処理によって除去されたＨＲＴＦの微細構造が、ＨＲＴＦの位置には未関連であり、一種のノイズとして認識されるからであると考えられる。本発明は、少なくとも２つの方法によって実現可能である。第１の方法では、周波数領域において、周波数に依存する重み関数を用いてＨＲＴＦを畳み込むことによって、ＨＲＴＦが平滑化される。この重み関数は、不変ではなく、周波数に依存する関数であるという点において、従来技術の時間領域ウィンドウ関数の周波数領域版とは異なる。この代わりに、周波数依存の重み関数の時間領域版を、時間領域において、ＨＲＴＦインパルス応答に適用するようにしてもよい。第２の方法では、ＨＲＴＦの周波数軸がワープされ、すなわち、非線形周波数領域に写像され、この周波数ワープＨＲＴＦが、時間領域において（時間領域に変換された後に）従来のウィンドウ関数で乗算されるか、または、周波数領域において従来のウィンドウ関数の不変周波数応答と畳み込まれる。ウィンドウ処理された信号に対しては、その後、逆周波数ワーピングが行われる。この発明は、あらゆる種類のイメージングフィルタを使用して実現され得るものであり、このイメージングフィルタとしては、アナログフィルタや、ハイブリッドアナログ／デジタルフィルタや、デジタルフィルタなどを含むが、これらには限定されない。このようなフィルタは、ハードウェアや、ソフトウェアや、ハードウェア／ソフトウェアのハイブリッド構成など（例えばデジタル信号処理）によって実現されうる。デジタル的にあるいは部分的にデジタル的に実現された時には、ＦＩＲフィルタ、ＩＩＲ（無限インパルス応答）フィルタ、およびハイブリッドＦＩＲ／ＩＩＲフィルタを使用することができる。この発明は、また、主成分フィルタ・アーキテクチャによっても実現できる。仮想オーディオ表現の他の態様は、アナログ、デジタル、アナログ／デジタルのハイブリッド、ハードウェア、ソフトウェア、および、ハードウェ
ア／ソフトウェアのハイブリッド技術を含む任意の組み合わせ、例えばデジタル信号処理、を用いて実現可能である。ＦＩＲフィルタで実現する場合には、ＨＲＴＦパラメータはそのＦＩＲフィルタを規定するフィルタタップである。ＩＩＲフィルタの場合には、ＨＲＴＦパラメータは、そのＩＩＲフィルタを規定する極およびゼロ点、または、他の特性である。主成分フィルタの場合には、ＨＲＴＦパラメータは、位置依存の重みである。この発明の他の態様では、１群のＨＲＴＦ内の各ＨＲＴＦは、その１群内のすべての頭部関連伝達関数に共通する固定頭部関連伝達関数と、それぞれの頭部関連伝達関数に関連付けられた可変頭部関連伝達関数とに分離され、この固定頭部関連伝達関数および各可変頭部関連伝達関数の組み合わせが、それぞれの元の既知の頭部関連伝達関数と実質的に等価である。この発明による平滑化技術は、固定ＨＲＴＦと可変ＨＲＴＦの一方、または、双方に適用してもよく、あるいは、両方ともに適用しなくてもよい。図面の簡単な説明図１は、従来技術の仮想オーディオ表現構成の機能ブロック図である。図２ａは、頭部関連伝達関数（ＨＲＴＦ）のインパルス応答の例である。図２ｂは、イメージングフィルタがＨＲＴＦの時間遅れ部分およびインパルス応答部分を表すように示された機能ブロック図である。図３ａは、ＨＲＴＦの複雑性すなわち長さを低減させる１つの従来技術の機能ブロック図である。図３ｂは、１組の左および右の「未加工」ＨＲＴＦ対を示す。図３ｃは、図３ｂの１組のＨＲＴＦ対が、時間整列されて短くなったものを示す。図３ｄは、図３０の１組のＨＲＴＦ対が、最少位相変換されてさらに長さが低減したものを示す。図４ａは、サンプリング率を低下させることによって、ＨＲＴＦインパルス応答を短くする従来の技術を示す機能ブロック図である。図４ｂは、時間領域においてＨＲＴＦインパルス応答にウィンドウを乗ずることによって、ＨＲＴＦインパルス応答を短くする従来の技術を示す機能ブロック図である。図５ａは、時間領域における３つの波形の組を示しており、「未加工」ＨＲＴＦの一例と、従来技術によって短縮されたＨＲＴＦと、本発明の教示に従って圧縮されたＨＲＴＦとを示している。図５ｂは、図５ａの１組のＨＲＴＦ波形の周波数領域表現である。図６ａは、本発明に従って圧縮ＨＲＴＦを得る実施例を示す機能ブロック図である。図６ｂは、一例の入力ＨＲＴＦの周波数応答を示している。図６ｃは、一例のＨＲＴＦインパルス応答のインパルス応答を示している。図６ｄは、圧縮出力ＨＲＴＦの周波数応答を示している。図６ｅは、圧縮出力ＨＲＴＦのインパルス応答を示している。図７ａは、本発明により圧縮ＨＲＴＦを得る他の実施例を示している。図７ｂは、一例の入力ＨＲＴＦインパルス応答のインパルス応答を示している。図７ｃは、一例の入力ＨＲＴＦの周波数応答を示している。図７ｄは、周波数ワーピング後の入力ＨＲＴＦの周波数応答を示している。図７ｅは、圧縮出力ＨＲＴＦの周波数応答を示している。図７ｆは、逆周波数ワーピング後の圧縮出力ＨＲＴＦの周波数応答を示している。図７ｇは、逆周波数ワーピング後の圧縮出力ＨＲＴＦのインパルス応答を示している。図８は、図６ａおよび図７ａの実施例の動作を理解するために有用な３つのウィンドウのファミリを示している。図９は、イメージングフィルタが主成分フィルタとして実施されている場合の機能ブロック図である。図１０は、本発明の他の態様を示す機能ブロック図である。発明の実施の形態図６ａは、この発明に従って圧縮ＨＲＴＦを生成する実施例を示している。この実施例では、入力ＨＲＴＦの周波数応答が、周波数領域において周波数依存の重み関数を用いて畳み込まれることによって、入力ＨＲＴＦが平滑化される。この代わりに、周波数依存の重み関数の時間領域版を、時間領域においてＨＲＴＦインパルス応答に適用するようにしても良い。図７ａは、この発明に従って圧縮ＨＲＴＦを生成する他の実施例を示している。この実施例によれば、入力ＨＲＴＦの周波数軸がワープされ（歪められ）、すなわち、非線形周波数領域に写像されて、この周波数ワープＨＲＴＦが、周波数領域において、不変重み関数（すなわち、従来の時間領域ウィンドウ関数の周波数領域版である重み関数）の周波数応答と畳み込まれる。その後、逆周波数ワーピングが平滑化信号に適用される。この代わりに、周波数ワープＨＲＴＦを時間領域に変換して、従来のウィンドウ関数で乗算するようにしてもよい。図６ａにおいては、オプションとしての非線形スケーリング関数５１が入力ＨＲＴＦ５０に適用されている。その後、平滑化関数５４がＨＲＴＦ５２に適用される。入力ＨＲＴＦに非線形スケーリングが適用される場合には、この後に、
逆スケーリング関数５６が平滑化ＨＲＴＦ５４に適用される。圧縮ＨＲＴＦは出力として与えられる。以下でさらに説明するように、非線形スケーリング５１と非線形逆スケーリング５６は、平滑化平均関数が信号振幅または信号パワーのいずれに関するものか、および、それが算術平均か幾何平均か、その他の平均化関数であるか、を制御することができる。平滑化プロセッサ５４は、ＨＲＴＦを周波数依存重み関数と畳み込む。この平滑化プロセッサは、移動重み付き算術平均として実現しても良い；ここで、少なくとも平滑化帯域幅ｂ_fは周波数の関数であり、オプションとしてウィンドウ形状Ｗ_fも周波数の関数としてもよい。重み関数の幅は、周波数と共に増加する。重み関数の長さは臨界帯域幅の倍数であることが好ましく、要求されるＨＲＴＦインパルス応答長さが短いほど、その倍数値は大きくなる。ＨＲＴＦは、典型的には低周波数成分（約３００Ｈｚ以下）および高周波数成分（約１６ｋＨｚ以上）を欠いている。可能な限り短い(従って最も複雑で無い) ＨＲＴＦを提供するためには、ＨＲＴＦ周波数応答を、人間の可聴域の通常の下限および上限まで、あるいはそれらを超えて、拡張することが望ましい。しかし、こうした場合には、拡張された低周波数および高周波数音声帯域における重み関数の幅を、ＨＲＴＦの内容が典型的に含まれている音声帯域の主要な非拡張部分を通じて使用されている臨界帯域幅の倍数よりも、耳の臨界帯域に対して相対的により広くすべきである。約５００Ｈｚより下では、音声の波長が頭部のサイズに比較して大きいので、ＨＲＴＦは概略平坦なスペクトルとなる。従って、上述の臨界帯域幅の倍数よりも広い平滑化帯域幅を用いることが好ましい。約１６ｋＨｚより上の高周波数においては、人間の聴覚が貧弱であり、また、ほとんどの局所化キューはそのような高周波数よりも下に集中しているので、上述の臨界帯域幅の倍数よりも広い平滑化帯域幅を用いることが好ましい。従って、音声帯域の低周波数端および高周波数端における重み帯域幅は、ここで説明された式によって予測される帯域幅を超えて拡張するようにしてもよい。例えば、この発明の１つの具体的な実施例において、１ｋＨｚよりも下の周波数に対しては約２５０Ｈｚの一定の平滑化帯域幅が使用され、１ｋＨｚよりも上では１／３オクターブ帯域幅が使用される。１／３オクターブ帯域幅は臨界帯域幅の近似であり、１ｋＨｚにおいて１／３オクターブ帯域幅は約２５０Ｈｚである。従って、１ｋＨｚより下では、平滑化帯域幅は、臨界帯域幅よりも広い。場合によっては、低周波数（例えば３００〜５００Ｈｚ）でのパワーを、従来のＨＲＴＦ測定技術を使用しては正確には決定できないデータを補充するために、ＤＣにまで外挿するようにしてもよい。１つの群に属するすべてのＨＲＴＦを処理するのに、同じ臨界帯域幅の倍数を有する重み関数を用いるようにしてもよいが、異なる臨界帯域幅倍数値を有する複数の重み関数をそれぞれのＨＲＴＦに適用して、すべてのＨＲＴＦが同程度に圧縮されることがないようにしてもよい。これは、得られた複数の圧縮ＨＲＴＦが、同じ複雑性と長さを有することを確保するために必要なことがある（いくつかの未加工ＨＲＴＦは、その空間的位置に依存して、より複雑でより長いで、より大幅な、または、より少ない圧縮が必要になることがある）。この代わりに、ある方向や空間的位置を表すＨＲＴＦの圧縮量を他のＨＲＴＦよりも少なくして全体の空間的局所化のより良い感覚を維持しつつ、演算の複雑性を全体としていくらか緩和するようにしてもよい。ＨＲＴＦの圧縮量は、ＨＲＴＦの相対的な音響心理学的重要性の関数として変化するようにしてもよい。例えば、早期反射は、異なる方向から到達するので、別々の複数のＨＲＴＦを用いて得られるものであり、直接音声経路ほどには正確な空間化は重要ではない。従って、早期反射は、「過短縮化」されたＨＲＴＦを用いて、知覚的な影響無しで得ることができる。図６ａの平滑化５４を実現する他の方法は、各周波数ｆに対して、Ｈ_θ（ｎ）は、位置θにおける入力ＨＲＴＦ５２であり、Ｓ_θ（ｆ）は圧縮ＨＲＴＦ５４、ｎは周波数、Ｎはナイキスト周波数の１／２である。従って、各々が０からＮまでの区間でそれぞれ定義されている重み関数Ｗ_f, 
_θ（ｎ）のファミリーが存在し、これらの重み関数の幅は、それらの中心周波数ｆの関数であり、また、オプションとしてはＨＲＴＦ位置θの関数としてもよい。各重み関数の和は１である（式３）。図８は、ガウス分布形状を有する重み関数のファミリーの３つの構成要素を、周波数に対する振幅応答をプロットして示している。簡単のために、重み関数のファミリーの中の３個のみが示されている。中央のウィンドウは、その中心が周波数ｎ₀にあり、帯域幅ｂ_f=n0を有している。重み関数は、ガウス分布を有する必要はない。他の形状の重み関数としては、単純化のために、長方形を含む重み関数を用いてもよい。また、重み関数は、その中心周波数に対して対称である必要はない。非線形スケーリング関数５１および逆スケーリング関数５６を考慮して、図６ａを、もっと一般に次のように特徴付けることができる。ここで、Ｇはスケーリング５１であり、Ｇ^-1は逆スケーリングである。これまでに説明した平滑化５４は、入力ＨＲＴＦ伝達関数の統計に依存する算術平均関数を与えているが、丸め平均値（トリム平均）または中央値が、算術平均よりも好ましいかもしれない。人間の耳は、臨界帯域内における合計フィルタパワーに感受性があるようなので、図６ａの非線形スケーリング５１を２乗演算として実現し、出力逆スケーラ５６を平方根演算として実現することが好ましい。最少位相変換のような、或る前処理や後処理を適用することが望ましい場合もある。この代わりに、または２乗演算スケーリングおよび平方根逆スケーリングに加えて、非線形スケーリング５１が対数関数であり、逆スケーリング５７が指数関数である時には、平滑化５４の算術平均は幾何平均となる。このような平均は、高さ方向の知覚に重要と考えられる空スペクトルを保存するのに有用である。図６ｂと６ｃは、入力ＨＲＴＦの周波数スペクトルと入力インパルス応答の一例を、それぞれ周波数領域と時間領域で示している。図６ｄと６ｅは、それぞれの領域における圧縮出力ＨＲＴＦ５７を示している。ＨＲＴＦスペクトルが平滑化されている程度および、そのインパルス応答が短縮されている程度は、平滑化５４に対して選択された臨界帯域幅の倍数に依存している。圧縮ＨＲＴＦの特性は、また、上述したウィンドウ形状と他の要因に依存している。図７ａを参照する。この実施例において、入力ＨＲＴＦの周波数軸は、歪められた周波数スペクトルに作用する一定帯域幅の平滑化１２５が図６ａの平滑化５４と同等になるように、周波数ワーピング関数１２１によって変換される。平滑化ＨＲＴＦは、逆ワーピング１２９で処理されて、出力圧縮ＨＲＴＦが得られる。図６ａと同様に、非線形スケーリング５１と逆スケーリング５６を、任意に入力ＨＲＴＦと出力ＨＲＴＦに適用するようにしても良い。周波数ワーピング関数１２１は、一定帯域幅平滑化との組み合わせによって、図６ａの実施例の周波数依存平滑化帯域幅の目的を達成する。例えば、周波数をバークスケールに写像するワーピング関数を、臨界帯域平滑化を実現するために使用しても良い。平滑化１２５は、重み関数の幅が周波数に関して一定であるという点を除いて、図６ａの実施例と同様に、時間領域ウィンドウ関数の乗算として実現することもでき、また、周波数領域の重み関数の畳み込みとして実現することもできる。図６ａに関する場合と同様に、最少位相変換のような、或る前処理や後処理を適用することが望ましいこともある。周波数ワーピング関数１２１とスケーリング関数５１とが適用される順序は、逆にすることができる。これらの関数は線形ではないが、周波数ワーピング１２１は周波数領域に影響を与え、スケーリング５１は周波数ビンの値にのみ影響するので、これらの関数は交換できる。従って、逆スケーリング関数５６と逆ワーピング関数1２９もまた、逆にすることができる。さらに他の方法として、出力ＨＲＴＦをブロック１２５の後に取り出して、逆スケーリングと逆ワーピングを、その圧縮ＨＲＴＦパラメータを受け取る装置や関数の中に設けるようにしてもよい。図７ｂおよび７ｃは、入力ＨＲＴＦの入力応答および周波数スペクトルの一例をそれぞれ示している。図７ｄは、バークスケールに写像されたＨＲＴＦの周波数スペクトルを示している。図７ｅは、平滑化１２５の後のＨＲＴＦのスペクトルを示している。逆周波数ワーピングを行った後は、結果として得られる圧縮ＨＲＴＦは、図７ｆに示すようなスペクトルと、図７ｇに示すようなインパルス応答を有している。結果として得られるＨＲＴＦ特性は、図６ａの実施例のものと同一である。イメージングフィルタは、また、図９に示す方法で、主成分フィルタとして実施することもできる。位置信号３０は、図１のブロック１１と機能的に類似
した重みテーブルおよび補間関数３１に適用される。ブロック３１によって提供されるパラメータと、補間された重みと、方向性マトリクスと、主成分フィルタとは、イメージングフィルタを制御するＨＲＴＦパラメータと機能的に等価である。この実施例のイメージングフィルタ１５’は、１組の並列固定フィルタ３４、すなわち、主成分フィルタ、ＰＣ₀〜ＰＣ_Nにおいて入力信号３３をフィルタ処理し、それらの出力は位置依存の重み付けによって混合されて、所望のイメージングフィルタを近似する。この近似の精度は、使用されている主成分フィルタの数と共に増加する。１組の未加工ＨＲＴＦに対して一定程度の近似を達成するには、この発明の実施例によって圧縮されたものに対する場合よりも、より多くの演算リソースが、追加の主成分フィルタの形で必要になる。この発明の他の態様が、図１０の実施例に示されている。３次元空間位置信号７０が、等化されたＨＲＴＦパラメータテーブルおよび補間関数７１に適用され、信号７０によって識別された３次元位置に応じた１組の補間された等化ＨＲＴＦパラメータ７２が得られる。入力音声信号７３は、等化フィルタ７４と、補間された等化ＨＲＴＦパラメータによって決定されるイメージングフィルタ７５とに適用される。この代わりに、等化フィルタ７４が、イメージングフィルタ７５の後に設置されていても良い。このフィルタ７５は、ヘッドホーン７７の１チャンネルに適用するのに適した空間化された音声出力を与える。テーブル７１内の複数組の等化された頭部関連伝達関数パラメータは、１群の既知の頭部関連伝達関数を、その群のすべての頭部関連伝達関数に共通する１つの固定頭部関連伝達関数と、それら既知の頭部関連伝達関数の各々に関連する可変位置依存頭部関連伝達関数とに分割することによって予め得られ、この固定頭脳関連伝達関数と各可変頭部関連伝達関数との組み合わせは、それぞれの元の既知の頭部関連伝達関数に実質的に等しい。等化フィルタ７４は、従って、テーブル内のすべての頭部関連伝達関数に共通する固定頭部関連関数を表している。このようにして、ＨＲＴＦとイメージングフィルタの複雑性が低減する。この等化フィルタ特性は、イメージングフィルタの複雑性を最少にするように選択される。これは、等化ＨＲＴＦテーブルのサイズを最小化し、ＨＲＴＦの補間とイメージングフィルタリングのための演算リソースを低減し、また、テーブル化されたＨＲＴＦのためのメモリリソースを低減する。ＦＩＲイメージングフィルタの場合には、フィルタ長さを最少にすることが望ましい。所望の等化フィルタを見いだすには、種々の最適化基準を用いることができる。等化フィルタは、平均ＨＲＴＦＭを近似するようなものでもよく、こうすれば、位置依存部のスペクトルが平均的に平坦になる（またその時間が短くなる）。等化フィルタは、１群の既知の伝達関数の拡散場音成分を表していても良い。等化フィルタが、ＨＲＴＦの重み付き平均として構成されている時には、その重み付けは、より長くより複雑なＨＲＴＦを、より重視するようにすべきである。左チャンネルと右チャンネルに対して、（位置可変ＨＲＴＦの前または後のいずれかに）異なる固定等化処理を行うようにしてもよく、単一の等化処理をモノラル音源信号に適用してもよい（モノラル信号が左成分と右成分とに分離される前に単一のフィルタとして適用してもよく、または、左成分と右成分のそれぞれに対する２つのフィルタとして適用しても良い）。人間の対称性から予測されるように、最適な左耳および右耳等化フィルタは、しばしばほぼ同一である。従って、音源信号は、単一の等化フィルタを用いてフィルタリングを行い、その出力を、両方の位置依存ＨＲＴＦフィルタに与えるようにしてもよい。この発明の教示に従って、等化されたＨＲＴＦパラメータと、固定等化フィルタのパラメータのいずれかを平滑化するか、または、等化されたＨＲＴＦパラメータと等化フィルタパラメータの両方を平滑化することによって、さらに利点が得られる。また、等化フィルタとイメージングフィルタとに対して、異なるフィルタ構造を使用することによって、演算を節約することができる。例えば、その内の一方をＩＩＲフィルタとして実現し、他方をＦＩＲフィルタとして実現するようにしてもよい。典型的には固定フィルタの方がかなり滑らかな応答を有するので、等化フィルタは低次のＩＩＲフィルタとして実現するのがもっともよいであろう。また、等化フィルタをアナログフィルタとして実現することも容易である。主成分法を含む、ＨＲＴＦフィルタにおいて使用するのに適したあらゆるフィルタリング技術を、可変位置依存部等化ＨＲＴＦパラメータを実現するために使用することができる。例えば、図１０は、イメー
ジングフィルタ７５として、図９の実施例において説明したようなタイプの主成分イメージングフィルタ１５’ を利用してもよい。DETAILED DESCRIPTION OF THE INVENTION Using an imaging filter with reduced complexity 3D virtual audio representation Technical field The present invention relates generally to three-dimensional audio, or virtual audio. Sa More specifically, the invention relates to an imaging filter used for virtual audio representation. A method and apparatus for reducing complexity of a filter. According to the teachings of the present invention, The complexity reduction as described above is due to the psychoacoustic localization of the resulting 3D audio representation. Achieved without substantially affecting the properties Background art The sound reaching the listener has a propagation effect that depends on the relative position of the sound source and the listener. Show. There is also the effect of the listening environment. These effects can be caused by differences in signal strength or Including the time difference, it gives the listener a sense of the sound source position. Early or delayed If environmental effects such as reverberation effects are included, these environmental effects will also be heard by the listener. It will give an acoustic environment feeling. Sound to simulate proper propagation effects By processing, the listener can hear the sound from a particular point in three-dimensional space. That is, they will receive as if they originated from a "virtual location". For example, "Headphone Simulation of Listening to a Free Sound Field by Stuttman and Kistler J. Acoust. Soc. Am., Vol. 85, No. 2, 1989. The current three-dimensional or virtual audio representation has multiple selected head-related transmissions. By time domain filtering the audio input signal using a function (HRTF) Has been achieved. Each HRTF is located at a specific location or area in three-dimensional space. 
Psychoacoustic localization or psychoacoustics in directions in three-dimensional space It is designed to reproduce the propagation effects and acoustic cues that achieve local localization. example For example, Elizabeth M. Wenzel's "Localization in virtual acoustic expression", P resence, Vol. 1, No. 1, see Sumner 1992. For simplicity, this specification uses Mention only one HRTF acting on one audio channel . In practice, multiple pairs of HRTFs provide appropriate signals to both ears of the listener Used for At present, most HRTFs are only indexed for spatial orientation And the range components are considered independently. Some HRTFs have range and direction The spatial position is defined by including both directions and indexed by position Have been killed. Here, as a specific example, reference is made to an HRTF that defines directions. However, the invention applies to HRTFs representing either direction or position. Typically, HRTFs are obtained by experimental measurements or Can be obtained by modifying the obtained HRTF. Practical virtual audio In the expression configuration, a table of a plurality of HRTF parameter sets is stored. Each HRTF parameter set is associated with a particular point or region in three-dimensional space. It is linked. To reduce table storage space, some spatial Only the HRTF parameters for the location are saved. HRT for other spatial locations The F parameter interpolates the appropriate set of HRTF positions stored in the table Generated by As mentioned above, the acoustic environment is also considered. In practice, this is an HRTF Modify or modify the audio signal with additional filtering to simulate the desired acoustic environment. This can be achieved by making it a target for processing. Disclosed for simplicity of explanation. 
Although the preferred embodiments refer to HRTFs, the invention more generally relates to virtual audio It applies to all transfer functions used in the representation, and The numbers include HRTFs, transfer functions representing acoustic environment effects, and head-related transformations and acoustics. Includes transfer functions that represent both environmental effects. FIG. 1 shows a typical configuration of the related art. 3D spatial position signal 10 is HRTF parameter 3 applied to the data table and the interpolation function 11 and identified by the signal 10 A set of interpolated HRTF parameters 12 corresponding to the dimensional position is obtained. Input sound The voice signal 12 has a transmission determined by the applied and interpolated HRTF parameters. Applied to an imaging filter 15 having a function. This filter 15 "Spatialized" audio output suitable for application to one channel of the horn 17 I will provide a. Various drawings show the headphones for reproduction, but a suitable HRTF is Psychoacoustics by other types of audio transducers, including speakers A locally localized sound can be generated. The present invention provides certain types of It is not limited to using audio transducers. The imaging filter is implemented with a finite impulse response (FIR) filter Sometimes, the HRTF parameter is the impulse response associated with the HRTF Is defined. As described below, the present invention Is not limited to using FIR filters. A major drawback of the prior art approach shown in FIG. 1 is the relatively long and complex HRTF. It is an operation cost. In the prior art, the HRTF length or complexity was reduced. Several techniques are used for this. The HRTF shown in FIG. 2a has a time delay component D, And an impulse response component g (t). That is, the imaging filter Is the time delay function Z, as shown in FIG.^-DAnd the impulse response function g (t) Can be realized. 
First, multiple HRTFs can be timed out by removing this time delay. Alignment between them can complicate the calculation of the impulse response function of the imaging filter. Performance is reduced. FIG. 3a shows a prior art arrangement, in which a plurality of pairs of raw (sunset) are shown. HRTF is applied to the time alignment processor 101 and its output terminal HRTF 102, and a time delay value 103 that is used later (not shown). Is output. Processor 101 calculates the cross-correlation of multiple pairs of raw HRTFs. Thus, the time difference between their arrival times is determined. These time differences are equal to the delay value 103 It is. The time delay value 103 and the filter interval are retained for later use, No psychoacoustic localization loss occurs and the perceptual effect is preserved. Time alignment H The RTF 102 is then processed by the minimum phase converter 104 for the remaining time delay. This is removed, further shortening the time alignment HRTF. FIG. 3b shows two raw left sets obtained from raw HRTF parameters 100. -Shows examples of right paired HRTFs (R1 / L1 and R2 / L2). FIG. The corresponding time aligned HRTF 102 is shown. FIG. 3d corresponds to this. The output minimum phase HRTF 105 is shown. Time alignment of HRTF102 Loose response length is reduced from the raw HRTF 100 and the minimum phase The HRTF 105 is shortened from the time aligned HRTF 102. Thus, multiple Extract the delay to time align a number of HRTFs and apply a minimum phase transform. Reduces the complexity of the filter (its length in the case of FIR filters). It is. Even with the techniques of FIG. 2b and FIG. Longest minimum phase response of about 256 points for FIR filter at the pulling rate Is commonly used, which means that the processor has an order of 25 mips per sound source. Request to execute the process. 
If the resources for the operation are limited, the length of the HRTF, To further reduce reliability, two additional approaches in the prior art are simply Used alone or in combination. One technique, as shown in FIG. Method for reducing sampling rate by downsampling RTF It is. Many localization cues, especially those that are important for height, have high frequency content , The lower sampling rate makes the performance of audio representation unacceptable Degradation to the extent possible. Another technique is shown in FIG. 4b, which uses a window function for the HRTF in the time domain. By multiplication or by using the corresponding weight function in the frequency domain. A method of applying a window function to an HRTF by convolving the RTF You. This process considers windowing the HRTF in the time domain. This is the easiest to understand. At this time, a shortened HRTF is obtained. Thus, a window width smaller than the HRTF is selected. Such a wi In the window processing, a result of frequency domain smoothing using a fixed weight function is obtained. This known windowing technique degrades the psychoacoustic localization properties, Degrades spatial location and orientation with complex and long impulse response You. Thus, maintaining the perceptual effects and psychoacoustic localization properties of the original HRTF However, there is a need for a method that reduces the complexity or length of the HRTF. Disclosure of the invention According to the present invention, a three-dimensional virtual audio representation is composed of one set according to a spatial position signal. Is generated, and according to the set of head related transfer function parameters, To filter the audio signal. This set of head related transfer function parameters is It is a smoothing of the parameters for a number of known head related transfer functions. 
The smoothing according to the present invention is best performed by considering its operation in the frequency domain. Can be explained well. The frequency components of several known transfer functions have a constant relationship with frequency. It is smoothed over multiple bandwidths, not numbers. Parameters of multiple transfer functions obtained The meter (referred to herein as the “compressed” transfer function) is a virtual audio representation Used to filter the audio signal for further processing. Compressed head-related transmission functions Numeric parameters can be pre-generated or created in real time. It may be. The smoothing band is the width of a plurality of critical bands of the ear (ie, “critical Bandwidth "). This function determines that the smoothing bandwidth is It can also be taken in proportion to the bandwidth. As is well known, the width of the critical band of the ear is Increases with increasing frequency, thus increasing the smoothing bandwidth with frequency I do. The wider the smoothing bandwidth relative to the critical bandwidth, the higher the resulting HRT The complexity of F is reduced. For HRTFs implemented as FIR filters, The filter length (number of filter taps) is expressed as a multiple of the critical bandwidth Associated with the inverse of the smoothing bandwidth. By applying the teachings of the present invention taking into account the critical bandwidth, the complexity And the same length, the prior art window technology described above Therefore, the perceptual effect and psychoacoustic local A shorter HRTF with lower complexity, such as less degradation of the product, is obtained. HRTF ("raw HRTF") examples and traditional windowing methods Example of a shortened HRTF created by the method and HR created by the method of the present invention TF (“compressed HRTF”) is shown in FIG. 5a (time domain) and FIG. 
5b (frequency domain) Is shown in Raw HRTF is a process that reduces its complexity or length 3 is an example of a known HRTF in which no HRTF has been performed. In FIG. 5a, the time of the HRTF The domain impulse response amplitude is plotted along the time axis from 0 to 3 ms. You. FIG. 5b shows that the frequency domain transfer function power of each HRTF is It is plotted along the logarithmic frequency axis up to kHz. In the time domain of FIG. Thus, the conventional HRTF has shown a certain degree of shortening, but the compressed HRTF has been further reduced. Has been shortened to In the frequency domain of FIG. The effect of a uniform smoothing bandwidth is evident, while compressed HRTFs increase the frequency. This shows the effect that the smoothing bandwidth increases as the size increases. Figure 5b is a log frequency scale As such, the compressed HRTF exhibits a constant smoothing relative to the raw HRTF. Despite the difference in length in the time domain and the difference in frequency response in the frequency domain, Raw HRTF, prior art HRTF, and compressed HRTF have equivalent psychoacoustics Provide dynamic performance. The amount of conventional windowing for raw HRTFs and the amount of compression according to the invention To provide substantially similar psychoacoustic performance to the raw HRTF If selected, the preliminary double blindfold listening test shows that prior art windowing H It is shown that compressed HRTF is preferred over RTF. Surprisingly, compression HRTFs are preferred over raw HRTFs. The reason is that the smoothing process The microstructure of the HRTF that has been removed by removal is unrelated to the location of the HRTF, This is considered to be because it is recognized as noise. The invention can be implemented in at least two ways. In the first method, In the frequency domain, convolve the HRTF with a frequency-dependent weight function. By this, the HRTF is smoothed. 
This weight function is not invariant, Of the prior art time-domain window function in that it is a function that depends on Different from frequency domain version. Instead, a time-domain version of the frequency-dependent weight function is In the time domain, it may be applied to the HRTF impulse response. Second In the method of (1), the frequency axis of the HRTF is warped, that is, in the nonlinear frequency domain. This frequency warped HRTF is mapped in the time domain (transformed to the time domain). Multiplied by a conventional window function or in the frequency domain Convolved with the invariant frequency response of a conventional window function. Windowed The signal is then subjected to inverse frequency warping. The invention can be implemented using any kind of imaging filter. This imaging filter can be an analog filter or a hybrid Including analog / digital filters, digital filters, etc. Is not limited. Such filters can be hardware, software, or hardware Hardware / software hybrid configuration (eg digital signal processing) Can be realized. Digitally or partially implemented digitally Sometimes FIR filters, IIR (infinite impulse response) filters, and high A brid FIR / IIR filter can be used. The invention also provides It can also be realized by a principal component filter architecture. Virtual audio expression Other aspects are analog, digital, hybrid analog / digital, hard Hardware, software, and hybrid hardware / software It can be realized using any combination including techniques, for example, digital signal processing. When implemented with an FIR filter, the HRTF parameters are This is a filter tap that defines the parameters. For IIR filters, the HRTF parameter The meter may have poles and zeros or other characteristics that define its IIR filter. is there. In the case of a principal component filter, the HRTF parameter is a position-dependent weight. You. 
In another aspect of the invention, each HRTF in a group of HRTFs is separated into a fixed head-related transfer function common to all head-related transfer functions in the group and a variable head-related transfer function associated with the respective head-related transfer function. The combination of the fixed head-related transfer function and each variable head-related transfer function is substantially equal to the respective original known head-related transfer function. The smoothing technique according to the present invention may be applied to either or both of the fixed HRTF and the variable HRTFs, or to neither.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a functional block diagram of a conventional virtual audio representation configuration.
FIG. 2a is an example of an impulse response of a head-related transfer function (HRTF).
FIG. 2b is a functional block diagram showing the imaging filter represented as HRTF time-delay and impulse-response portions.
FIG. 3a is a block diagram of one prior art approach that reduces the complexity or length of the HRTF.
FIG. 3b shows a set of left and right "raw" HRTF pairs.
FIG. 3c shows the set of HRTF pairs of FIG. 3b shortened by time alignment.
FIG. 3d shows the set of HRTF pairs after reduction.
FIG. 4a is a functional block diagram showing a prior art technique for shortening the HRTF impulse response by reducing the sampling rate.
FIG. 4b is a functional block diagram showing a prior art technique for shortening the HRTF impulse response by windowing it in the time domain.
FIG. 5a shows a set of three waveforms in the time domain: a "raw" HRTF, an HRTF shortened by the prior art, and an HRTF shortened in accordance with the teachings of the present invention.
FIG. 5b is a frequency-domain representation of the set of HRTF waveforms of FIG. 5a.
FIG. 6a is a functional block diagram illustrating an embodiment for obtaining a compressed HRTF according to the present invention.
FIG. 6b shows the frequency response of an example input HRTF.
FIG. 6c shows the impulse response of an example input HRTF.
FIG. 6d shows the frequency response of the compressed output HRTF.
FIG. 6e shows the impulse response of the compressed output HRTF.
FIG. 7a shows another embodiment for obtaining a compressed HRTF according to the present invention.
FIG. 7b shows the impulse response of an example input HRTF.
FIG. 7c shows the frequency response of an example input HRTF.
FIG. 7d shows the frequency response of the input HRTF after frequency warping.
FIG. 7e shows the frequency response of the compressed output HRTF.
FIG. 7f shows the frequency response of the compressed output HRTF after inverse frequency warping.
FIG. 7g shows the impulse response of the compressed output HRTF after inverse frequency warping.
FIG. 8 shows three members of a family of windows useful for understanding the operation of the embodiments of FIGS. 6a and 7a.
FIG. 9 is a functional block diagram of a case where the imaging filter is implemented as a principal component filter.
FIG. 10 is a functional block diagram showing another embodiment of the present invention.

Embodiment of the Invention

FIG. 6a shows an embodiment for generating a compressed HRTF according to the present invention. In this embodiment, the input HRTF is smoothed by convolving its frequency response with a frequency-dependent weight function in the frequency domain. Alternatively, a time-domain version of the frequency-dependent weight function may be applied to the HRTF impulse response in the time domain.

FIG. 7a illustrates another embodiment for generating a compressed HRTF according to the present invention.
According to this embodiment, the frequency axis of the input HRTF is warped (distorted), that is, mapped to a nonlinear frequency domain, and the frequency-warped HRTF is convolved with an invariant weight function (that is, the frequency response of a conventional time-domain window function). Inverse frequency warping is then applied to the smoothed signal. Alternatively, the frequency-warped HRTF may be transformed to the time domain and multiplied by a conventional window function.

In FIG. 6a, an optional nonlinear scaling function 51 is applied to the input HRTF 50. A smoothing function 54 is then applied to the HRTF 52. If nonlinear scaling is applied to the input HRTF, an inverse scaling function 56 is applied to the smoothed HRTF, yielding the output compressed HRTF. As described further below, nonlinear scaling 51 and inverse scaling 56 control whether the smoothing average operates on signal amplitude or signal power, and whether it is an arithmetic mean, a geometric mean, or some other averaging function.

The smoothing processor 54 convolves the HRTF with a frequency-dependent weight function. This smoothing processor may be implemented as a moving weighted arithmetic mean, in which at least the smoothing bandwidth b_f is a function of frequency and, optionally, the window shape W_f is also a function of frequency. The width of the weight function increases with frequency. The length of the weight function is preferably a multiple of the critical bandwidth; the shorter the desired HRTF impulse response length, the larger the multiple.

HRTFs typically lack low-frequency components (below about 300 Hz) and high-frequency components (above about 16 kHz). To provide an HRTF that is as short as possible (and therefore least complex), it is desirable to extend the HRTF frequency response to or beyond the normal lower and upper limits of the human audible range. In that case, however, the bandwidth of the weight function in the extended low-frequency and high-frequency regions should be wider, relative to the critical band of the ear, than the multiple of the critical bandwidth used through the main, non-extended part of the audio band, where the content of the HRTF is typically located.

Below about 500 Hz, the wavelength of sound is large compared to the size of the head, and the HRTF has a substantially flat spectrum. It is therefore preferable to use a smoothing bandwidth wider than the multiple of the critical bandwidth described above. At high frequencies above about 16 kHz, human hearing is poor and most localization cues are concentrated below that frequency, so a smoothing bandwidth wider than the multiple of the critical bandwidth described above is again preferred. The weight-function bandwidth at the low-frequency and high-frequency ends may therefore extend beyond the bandwidth predicted by the equation described here. For example, in one specific embodiment of the invention, a constant smoothing bandwidth of about 250 Hz is used for frequencies below 1 kHz, and a 1/3-octave bandwidth is used above 1 kHz. The 1/3-octave bandwidth is an approximation of the critical bandwidth; at 1 kHz the 1/3-octave bandwidth is about 250 Hz. Below 1 kHz, therefore, the smoothing bandwidth is wider than the critical bandwidth. In some cases, the response at low frequencies that can be accurately determined using conventional HRTF measurement techniques (for example, 300-500 Hz) may be extrapolated to DC to fill in missing data.

The same multiple of the critical bandwidth may be used to process all HRTFs belonging to one group. Alternatively, weight functions having different critical-bandwidth multiples may be applied to the respective HRTFs, so that not all HRTFs are compressed to the same degree.
This may be necessary to ensure that the resulting compressed HRTFs have the same complexity and length (some raw HRTFs are more complex and longer than others, depending on their spatial position, and may therefore require more or less compression). Alternatively, HRTFs that represent certain directions or spatial positions may be compressed less than other HRTFs, maintaining a better overall sense of spatial localization while still providing some reduction in overall computational complexity. HRTF compression may also vary as a function of the relative psychoacoustic importance of the HRTF. For example, early reflections arrive from different directions and are rendered using different HRTFs, but their accurate spatialization is not as important as that of the direct sound path. Early reflections can therefore be rendered with "over-shortened" HRTFs without perceptual effect.

Another method of implementing the smoothing 54 of FIG. 6a is to compute, for each frequency f,

$$S_\theta(f) = \sum_{n=0}^{N} W_{f,\theta}(n)\, H_\theta(n)$$

where $H_\theta(n)$ is the input HRTF 52 at position θ, $S_\theta(f)$ is the smoothed HRTF 54, n is frequency, and N is one half of the Nyquist frequency. There is thus a family of weight functions $W_{f,\theta}(n)$, each defined on the interval from 0 to N, whose widths are a function of their center frequency f and, optionally, also a function of the HRTF position θ. Each weight function sums to 1:

$$\sum_{n=0}^{N} W_{f,\theta}(n) = 1 \qquad \text{(Equation 3)}$$

FIG. 8 shows three members of a family of weight functions having a Gaussian shape, plotted as magnitude response against frequency. For simplicity, only three members of the family are shown. The center window is centered at frequency $n_0$ and has bandwidth $b_{f=n_0}$. The weight function need not have a Gaussian shape. For simplicity, other shapes, such as a rectangular shape, may be used.
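The family of weight functions can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the specification: NumPy is assumed, the function names are hypothetical, the bandwidth rule is the example given in the description (about 250 Hz below 1 kHz, 1/3 octave above), and the Gaussian standard deviation is assumed to be half the smoothing bandwidth. Each row of the weight matrix is one weight function, normalized so that it sums to 1.

```python
import numpy as np

def smoothing_bandwidth_hz(f_hz):
    """Example rule from the text: about 250 Hz below 1 kHz, 1/3 octave above."""
    third_octave = f_hz * (2 ** (1 / 6) - 2 ** (-1 / 6))
    return np.where(f_hz < 1000.0, 250.0, third_octave)

def weight_family(freqs_hz, multiple=1.0):
    """Row f holds a Gaussian weight function centred on bin f; each row sums to 1."""
    bw = multiple * smoothing_bandwidth_hz(freqs_hz)  # bandwidth b_f per centre frequency
    sigma = bw / 2.0  # assumption: bandwidth taken as two standard deviations
    d = freqs_hz[None, :] - freqs_hz[:, None]
    w = np.exp(-0.5 * (d / sigma[:, None]) ** 2)
    return w / w.sum(axis=1, keepdims=True)

def smooth_hrtf(mag, freqs_hz, multiple=1.0):
    """Weighted sum over all bins n of W_f(n) * H(n), for every centre frequency f."""
    return weight_family(freqs_hz, multiple) @ mag

# Hypothetical magnitude response with a fine ripple, on 257 bins up to 22.05 kHz.
freqs = np.linspace(0.0, 22050.0, 257)
mag = 1.0 + 0.5 * np.sin(2 * np.pi * freqs / 300.0)
smoothed = smooth_hrtf(mag, freqs)
```

Raising `multiple` widens every weight function in proportion, which corresponds to choosing a larger multiple of the critical bandwidth when a shorter impulse response is wanted.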
The weight function also need not be symmetric about its center frequency.

Taking the nonlinear scaling function 51 and the inverse scaling function 56 into account, the smoothing of FIG. 6a can be characterized more generally as

$$S_\theta(f) = G^{-1}\!\left(\sum_{n=0}^{N} W_{f,\theta}(n)\, G\big(H_\theta(n)\big)\right)$$

where G is scaling 51 and $G^{-1}$ is inverse scaling 56.

The smoothing 54 described so far provides an arithmetic mean function. Depending on the statistics of the input HRTF transfer function, a trimmed mean or a median may be preferable to the arithmetic mean.

Because the human ear appears to be sensitive to the total filter power in a critical band, the nonlinear scaling 51 of FIG. 6a is preferably implemented as a squaring operation and the inverse scaling 56 as a square-root operation. It may also be desirable to apply some pre-processing or post-processing, such as minimum-phase conversion. Alternatively, or in addition to the squaring and square-root scaling, when nonlinear scaling 51 is a logarithmic function and inverse scaling 56 is an exponential function, the arithmetic mean of smoothing 54 becomes a geometric mean. Such a mean is useful for preserving the spectral nulls that are considered important for elevation perception.

FIGS. 6b and 6c show an example of the frequency spectrum and the impulse response of an input HRTF, in the frequency domain and the time domain respectively. FIGS. 6d and 6e show the compressed output HRTF 57 in the respective domains. The degree to which the HRTF spectrum is smoothed and the impulse response is shortened depends on the multiple of the critical bandwidth chosen for smoothing 54. The characteristics of the compressed HRTF also depend on the window shape and the other factors described above.

Refer now to FIG. 7a. In this embodiment, the frequency axis of the input HRTF is transformed by the frequency warping function 121 so that the constant-bandwidth smoothing 125 acting on the warped frequency spectrum is equivalent to the smoothing 54 of FIG. 6a. The smoothed HRTF is processed by inverse warping 129 to obtain the output compressed HRTF. As in FIG. 6a, nonlinear scaling 51 and inverse scaling 56 may optionally be applied to the input HRTF and the output HRTF.

The frequency warping function 121, in combination with the constant-bandwidth smoothing, serves the purpose of the frequency-dependent smoothing bandwidth of the embodiment of FIG. 6a. For example, a warping function that maps frequency to the Bark scale may be used to realize critical-band smoothing. Smoothing 125 may be implemented either as multiplication by a window function in the time domain or, as in the embodiment of FIG. 6a, as convolution with a weight function in the frequency domain, except that the width of the weight function is constant with respect to frequency. As in the case of FIG. 6a, it may be desirable to apply pre-processing and post-processing.

The order in which the frequency warping function 121 and the scaling function 51 are applied can be reversed. Although these functions are not linear, they can be interchanged because frequency warping 121 affects the frequency axis, whereas scaling 51 affects only the values in the frequency bins. Accordingly, the inverse scaling function 56 and the inverse warping function 129 can also be reversed. As yet another alternative, the output HRTF may be taken after block 125, and the inverse scaling and inverse warping may be provided in the device or function that receives the compressed HRTF parameters.

FIGS. 7b and 7c show an example of the impulse response and the frequency spectrum of an input HRTF, respectively. FIG. 7d shows the frequency spectrum of the HRTF mapped onto the Bark scale. FIG. 7e shows the HRTF spectrum after smoothing 125.
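The warping approach of FIG. 7a can be sketched in Python as follows. This is an illustration under assumptions rather than the patented implementation: NumPy is assumed, the function names are hypothetical, the Bark mapping uses the Zwicker-Terhardt approximation (the specification does not state which mapping formula is used), warping is done by linear interpolation onto a uniform Bark grid, and the invariant weight function is a Hann window. Squaring and square-root scaling are applied around the smoothing so that the average is taken over power.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark scale (an assumed choice of warping)."""
    return 13.0 * np.arctan(0.00076 * f_hz) + 3.5 * np.arctan((f_hz / 7500.0) ** 2)

def warped_smooth(mag, freqs_hz, width_bark=1.0, n_warped=512,
                  scale=np.square, inverse=np.sqrt):
    """Scale, warp to Bark, smooth with a constant-width window, warp back, unscale."""
    bark = hz_to_bark(freqs_hz)                        # increasing with frequency
    grid = np.linspace(bark[0], bark[-1], n_warped)    # uniform grid on the warped axis
    warped = np.interp(grid, bark, scale(mag))         # nonlinear scaling, then warping
    step = grid[1] - grid[0]
    half = max(1, int(round(width_bark / step / 2)))
    win = np.hanning(2 * half + 1)
    win /= win.sum()                                   # invariant weights that sum to 1
    padded = np.pad(warped, half, mode="edge")         # avoid droop at the band edges
    smoothed = np.convolve(padded, win, mode="valid")  # constant-bandwidth smoothing
    return inverse(np.interp(bark, grid, smoothed))    # inverse warping, inverse scaling

# Hypothetical magnitude response with a fine ripple, on 257 bins up to 22.05 kHz.
freqs = np.linspace(0.0, 22050.0, 257)
mag = 1.0 + 0.5 * np.sin(2 * np.pi * freqs / 300.0)
out = warped_smooth(mag, freqs)
```

Because scaling acts only on bin values and warping acts only on the frequency axis, applying `scale` before or after the interpolation onto the Bark grid gives nearly the same result in this sketch; the two orders differ only through interpolation error.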
After inverse frequency warping, the resulting compressed HRTF has a spectrum as shown in FIG. 7f and an impulse response as shown in FIG. 7g. The resulting HRTF characteristics are the same as those obtained with the embodiment of FIG. 6a.

The imaging filter can also be implemented as a principal component filter in the manner shown in FIG. 9. The position signal 30 is applied to a weight table and interpolation function 31, which is functionally similar to block 11 of FIG. 1. The parameters provided by block 31, the interpolated weights, the direction matrix, and the principal component filters are functionally equivalent to the HRTF parameters that control the imaging filter. The imaging filter 15' of this embodiment filters the input signal 33 with a set of parallel fixed filters 34, namely the principal component filters PC_0 to PC_N, whose outputs are mixed by position-dependent weighting to approximate the desired imaging filter. The accuracy of this approximation increases with the number of principal component filters used. To achieve a given degree of approximation for a set of raw HRTFs, more computational resources, in the form of additional principal component filters, are needed than for HRTFs compressed according to the embodiments of the present invention.

Another embodiment of the present invention is shown in FIG. 10. A three-dimensional spatial position signal 70 is applied to an equalized HRTF parameter table and interpolation function 71 to obtain a set of interpolated equalized HRTF parameters 72 corresponding to the three-dimensional position identified by signal 70. The input audio signal 73 is applied to an equalization filter 74 and to an imaging filter 75 whose transfer function is determined by the applied equalized HRTF parameters. Alternatively, the equalization filter 74 may be placed after the imaging filter 75. Filter 75 provides a spatialized audio output suitable for application to one channel of headphones 77.

The sets of equalized head-related transfer function parameters in table 71 are obtained in advance by separating each of a group of known head-related transfer functions into a fixed head-related transfer function common to all head-related transfer functions in the group and a variable, position-dependent head-related transfer function associated with each known head-related transfer function. The combination of the fixed head-related transfer function and each variable head-related transfer function is substantially equal to the respective original known head-related transfer function. The equalization filter 74 thus represents the fixed head-related transfer function common to all head-related transfer functions in the table. In this way, the complexity of the HRTFs and of the imaging filter is reduced.

The equalization filter characteristic is selected to minimize the complexity of the imaging filter. This minimizes the size of the equalized HRTF table, reduces the computational resources needed for HRTF interpolation and imaging filtering, and reduces the memory resources needed for the HRTFs. For FIR imaging filters, it is desirable to minimize the filter length.

Various optimization criteria can be used to find the desired equalization filter. The equalization filter may approximate the average HRTF, so that the spectrum of the position-dependent portion is flat on average (and short in time). The equalization filter may represent the diffuse-field sound component of the group of known transfer functions. When the equalization filter is formed as a weighted average of the HRTFs, the weighting should place more emphasis on the longer and more complex HRTFs.

Different fixed equalization may be provided for the left and right channels (either before or after the position-variable HRTFs), or a single equalization may be applied to the source signal (either as a single filter applied before the monaural signal is separated into left and right components, or as two filters applied to the left and right components respectively). As would be expected from human symmetry, the optimal left-ear and right-ear equalization filters are often nearly identical. The source signal may therefore be filtered with a single equalization filter whose output is applied to both position-dependent HRTF filters.

In accordance with the teachings of the present invention, a further benefit can be obtained by smoothing the equalized HRTF parameters, the fixed equalization filter parameters, or both.

Computation can also be saved by using different filter structures for the equalization filter and the imaging filter. For example, one may be implemented as an IIR filter and the other as an FIR filter. Because the fixed filter typically has a much smoother response, it is best implemented as a low-order IIR filter. The equalization filter can also easily be realized as an analog filter.

Any filtering technique suitable for use in HRTF filters, including principal component methods, can be used to realize the variable, position-dependent equalized HRTF parameters. For example, a principal component imaging filter 15' of the type described in connection with the embodiment of FIG. 9 may be used.
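The separation into a fixed equalization response and position-dependent responses can be sketched in Python as follows. This is a minimal sketch that assumes NumPy, works on magnitude responses only, and takes the fixed response to be a (possibly weighted) average of the group, which is one of the criteria mentioned above. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def split_fixed_variable(hrtf_mags, weights=None):
    """Split a group of HRTF magnitude responses into one fixed and many variable parts."""
    hrtf_mags = np.asarray(hrtf_mags, dtype=float)
    fixed = np.average(hrtf_mags, axis=0, weights=weights)  # shared equalization response
    variable = hrtf_mags / fixed                            # position-dependent remainder
    return fixed, variable

# Hypothetical group: 8 positions sharing a common spectral shape, 129 frequency bins.
rng = np.random.default_rng(1)
common = 1.0 + 0.8 * np.cos(np.linspace(0.0, 3.0 * np.pi, 129))
hrtfs = common * rng.uniform(0.8, 1.2, size=(8, 129))

fixed, variable = split_fixed_variable(hrtfs)
recombined = fixed * variable  # fixed and variable parts recombine to the originals
```

With an unweighted average, the variable responses average to exactly 1 in every bin, which is the sense in which the position-dependent portion is flat on average. Passing `weights` allows more emphasis to be placed on longer and more complex HRTFs.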

Continuation of front page

(81) Designated countries: EP (AT, BE, CH, DE, DK, ES, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE), AU, CA, JP

Claims

1. A method for representing three-dimensional virtual audio, comprising generating a set of transfer function parameters in response to a spatial position or direction signal, and filtering an audio signal in response to the set of transfer function parameters, wherein the set of transfer function parameters is selected from, or obtained by interpolation among, a plurality of parameters derived by smoothing the frequency components of a known transfer function over a bandwidth that is a non-constant function of frequency, and recording the transfer function parameters of the resulting compressed transfer function.

2. The method of claim 1, wherein the bandwidth is a function of the critical bandwidth.

3. The method of claim 2, wherein the smoothing comprises applying, for each frequency component in at least a part of the audio band of the representation, an averaging function to the frequency components within a bandwidth containing that frequency component.

4. The method of claim 3, wherein the averaging function is a function of the amplitude of the frequency components.

5. The method of claim 3, wherein the averaging function is a function of the power of the frequency components.

6. The method of claim 4 or 5, wherein the averaging function determines a median.

7. The method of claim 4 or 5, wherein the averaging function determines a weighted arithmetic mean.

8. The method of claim 4 or 5, wherein the averaging function determines a weighted geometric mean.

9. The method of claim 4 or 5, wherein the averaging function determines a trimmed mean.

10. The method of claim 2, wherein the weight function has a rectangular shape.

11. The method of claim 1, wherein the bandwidth is proportional to the critical bandwidth.

12. The method of claim 11, wherein the transfer function parameters are extended at low and high frequencies, and the bandwidth in the low-frequency and high-frequency regions is wider than the bandwidth proportional to the critical bandwidth.

13. The method of claim 1, wherein the smoothing comprises convolving the transfer function with a frequency-dependent weight function, the width of the frequency-dependent weight function being a function of the critical bandwidth.

14. The method of claim 13, wherein the weight function has a bandwidth that is a multiple, of one or more, of the critical bandwidth.

15. The method of claim 14, wherein the transfer function parameters are extended at low and high frequencies, and the bandwidth in the low-frequency and high-frequency regions is wider than the bandwidth proportional to the critical bandwidth.

16. The method of claim 13, wherein the weight function has a shape with a higher degree of continuity than a rectangular window.

17. The method of claim 1, wherein smoothing the frequency components comprises smoothing the frequency components in the frequency domain.

18. The method of claim 17, wherein the smoothing comprises convolving the known transfer function H(f) with a weight function W_f(i) in the frequency domain according to a relation in which at least the smoothing bandwidth b_f is a function of frequency and, optionally, the weight function shape W_f is also a function of frequency.

19. The method of claim 1, wherein smoothing the frequency components comprises applying a frequency warping function to the known transfer function, transforming the frequency-warped transfer function to the time domain, and time-domain windowing the impulse response of the frequency-warped transfer function.

20. The method of claim 1, wherein smoothing the frequency components comprises applying a frequency warping function to the known transfer function and convolving the frequency-warped transfer function, in the frequency domain, with the frequency response of a constant weight function.

21. The method of claim 19 or 20, wherein the frequency warping function maps the transfer function to the Bark scale.

22. The method of claim 19 or 20, further comprising applying nonlinear scaling to the known transfer function prior to the multiplication or convolution, and applying inverse scaling to the windowed or convolved transfer function.

23. The method of claim 1, wherein the filtering is principal component filtering.

24. The method of claim 1, wherein the transfer function parameters are equalized transfer function parameters, and the filtering includes fixed equalization filtering and filtering in response to the equalized transfer function parameters.

25. The method of claim 1, wherein the set of transfer functions is obtained by smoothing the frequency components of known transfer functions over a plurality of different bandwidths as a function of the spatial position or direction associated with the transfer function.

26. The method of claim 1, wherein the set of transfer functions is obtained by smoothing the frequency components of known transfer functions over a plurality of different bandwidths as a function of the complexity of the transfer function.

27. The method of claim 1, wherein the set of transfer functions is obtained by smoothing the frequency components of known transfer functions over a plurality of different bandwidths as a function of the spatial position or direction associated with the transfer function and as a function of the complexity of the transfer function.

28. The method of claim 26 or claim 27, wherein the bandwidth increases with increasing transfer function complexity.

29. The method of claim 1 or 28, wherein the bandwidth is selected such that the most complex resulting compressed transfer function does not exceed a predetermined complexity.

30. The method of claim 1, wherein the set of transfer functions is obtained by smoothing the frequency components of known transfer functions over a plurality of different bandwidths as a function of the relative psychoacoustic importance of the transfer function.

31. The method of claim 1, wherein the set of transfer functions is obtained by smoothing the frequency components of known transfer functions over a plurality of different bandwidths as a function of the spatial position or direction associated with the transfer function and as a function of the relative psychoacoustic importance of the transfer function.

32. A method for representing three-dimensional virtual audio, comprising: generating a set of equalized transfer function parameters in response to a spatial position or direction signal; and filtering an audio signal using fixed equalization filtering and in response to the set of equalized transfer function parameters, wherein the fixed equalization filtering and the set of equalized transfer function parameters, which is selected from or obtained by interpolation among a plurality of parameters, are derived by: separating each of a group of known transfer functions into one fixed transfer function common to all the transfer functions in the group and a variable transfer function associated with each of the known transfer functions, the combination of the fixed transfer function and each variable transfer function being substantially equal to the respective original known transfer function; recording the parameters of the fixed transfer function to characterize the fixed equalization filtering; and recording the parameters of each of the resulting variable transfer functions for use as the equalized transfer function parameters.

33. The method of claim 28, wherein the derivation of the fixed equalization filtering and of the set of equalized transfer function parameters further comprises smoothing the frequency components of each of the variable transfer functions over a bandwidth that is a non-constant function of frequency.

34. The method of claim 28, wherein the derivation of the fixed equalization filtering and of the set of equalized transfer function parameters further comprises smoothing the frequency components of the fixed transfer function over a bandwidth that is a non-constant function of frequency.

35. The method of claim 28, wherein the group of known transfer functions is separated into one fixed transfer function and a plurality of variable transfer functions by selecting the fixed transfer function such that variable transfer functions of lowest complexity are obtained.

36. The method of claim 28, wherein the group of known transfer functions is separated into one fixed transfer function and a plurality of variable transfer functions by selecting a fixed transfer function that represents the diffuse-field sound component of the group of known transfer functions.

37. The method of claim 28, wherein the group of known transfer functions consists of transfer functions representing a particular direction or a range of directions in space.

38. The method of claim 28, further comprising smoothing the frequency components of the fixed transfer function over a bandwidth that is a non-constant function of frequency, wherein recording the parameters of the fixed transfer function to characterize the fixed equalization filtering comprises recording the parameters of the resulting compressed fixed transfer function.

39. The method of claim 28, wherein the sets of equalized transfer function parameters generated in response to the spatial position or direction signal are generated by principal component filtering.

40. A three-dimensional virtual audio representation apparatus, comprising: means for generating a set of transfer function parameters in response to a spatial position or direction signal, the set being selected from, or obtained by interpolation among, a plurality of parameters derived by smoothing the frequency components of a known transfer function over a bandwidth that is a non-constant function of frequency and recording the transfer function parameters of the resulting compressed transfer function; and means for filtering an audio signal in response to the set of transfer function parameters.

41. A three-dimensional virtual audio representation apparatus, comprising: means for generating a set of equalized transfer function parameters in response to a spatial position or direction signal, the set being selected from, or obtained by interpolation among, a plurality of parameters derived by separating each of a group of known transfer functions into one fixed transfer function common to all the transfer functions in the group and a variable transfer function associated with each of the known transfer functions, such that the combination of the fixed transfer function and each variable transfer function is substantially equal to the respective original known transfer function, recording the parameters of the fixed transfer function to characterize fixed equalization filtering, and recording the parameters of each of the resulting variable transfer functions for use as the equalized transfer function parameters; and means for filtering an audio signal using the fixed equalization filtering and in response to the set of equalized transfer function parameters.