JP4124702B2 - Stereo sound signal encoding apparatus, stereo sound signal encoding method, and stereo sound signal encoding program

Info

Publication number: JP4124702B2
Application number: JP2003166934A
Authority: JP
Inventors: 靖茂中山; 馨渡辺; 智康小森
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2003-06-11
Filing date: 2003-06-11
Publication date: 2008-07-23
Anticipated expiration: 2023-06-11
Also published as: JP2005006018A

Description

【０００１】
【発明の属する技術分野】
本発明は、立体感のある音響効果を生じさせる立体音響信号を生成し、符号化する立体音響信号符号化装置、立体音響信号符号化方法および立体音響信号符号化プログラムに関する。
【０００２】
【従来の技術】
従来、音像（音源に対応してつくられた音場の空間的な広がり）の方向性、距離感を再現するために、複数の音響信号を合成して立体感のある音響効果を生じさせる立体音響信号を生成して、この生成した立体音響信号を符号化する技術として、例えば、特許文献１に記載されている「音像再生システム」がある。
【０００３】
この音像再生システムは、３次元音像の符号化方法（立体音響信号を符号化する方法）によって、送信側から音響信号とこの音源の位置や向き等の情報である補助情報とを送信している。この音像再生システムでは、当該補助情報が当該音響信号の増加に伴って増加してしまい、送信側から送信する情報量が増加してしまう。このため、この音像再生システムでは、受信側（再生側）のスピーカアレイの距離制御および方向感制御に用いる距離制御信号および方向感制御信号を送信側で符号化する際に使用する制御信号テーブルと、受信側で復号化する際に使用する制御信号テーブルとを共通化することで、補助情報の情報量の増加を抑制している。
【０００４】
【特許文献１】
特開２００１−３４６２９７号公報（段落１３−段落２９、第１図）
【０００５】
【発明が解決しようとする課題】
しかしながら、従来の音響再生システムでは、補助情報の情報量の増加を抑制することはできるが、送信する音響信号の情報量を抑制する対策が施されていないので、音響信号数に比例して音響信号の情報量が増加してしまうという問題がある。
【０００６】
ただし、従来の音響再生システムでは、送信する音響信号の情報量を抑制しようとすると、受信側で当該音響信号を再生する際に、音像の定位感（音源が特定される位置、距離感）等の臨場感を損なうおそれが生じるという問題がある。
【０００７】
そこで、本発明の目的は前記した従来の技術が有する課題を解消し、音像の定位感等の臨場感を損なうことなく、送信する音響信号の情報量を抑制することができる立体音響信号符号化装置、立体音響信号符号化方法および立体音響信号符号化プログラムを提供することにある。
【０００８】
【課題を解決するための手段】
本発明は、前記した目的を達成するため、以下に示す構成とした。
請求項１記載の立体音響信号符号化装置は、再生する際に立体感のある音響効果を生じさせる立体音響信号を、複数の音響信号を合成して生成し、当該立体音響信号を符号化する立体音響信号符号化装置であって、フィルタ分析手段と、方向一致組合せ抽出手段と、音源距離差計算手段と、合成帯域変換データ記録手段と、音響信号関連情報生成出力手段と、立体音響信号合成手段と、立体音響信号ビット割当符号化手段と、位置データ符号化手段と、多重化手段と、多重化データ出力手段と、を備える構成とした。
【０００９】
かかる構成によれば、立体音響信号符号化装置は、フィルタ分析手段によって、各音源から入力された音響信号を時間周波数変換して、前記音響信号の周波数成分を分析する。なお、フィルタ分析手段は、音響信号にＤＣＴ（離散コサイン変換）を施すものであり、音響信号周波数は、ＤＣＴ係数である。続いて、立体音響信号符号化装置は、方向一致組合せ抽出手段によって、音響信号の音源の位置を示す音源位置データの音源番号、距離方向のインデックスおよび左右方向のインデックスを用い、当該左右方向のインデックスにより方向が一致している当該音源番号の組合せを抽出する。そして、立体音響信号符号化装置は、音源距離差計算手段によって、方向一致組合せ抽出手段で抽出された音源番号に係る音源同士の距離差を、距離方向のインデックスの差分を取ることで計算する。また、立体音響信号符号化装置は、音響信号関連情報生成出力手段によって、音源距離差計算手段で計算した音源同士の距離差と合成帯域変換データにおける距離差とに対応する周波数と、方向一致組合せ抽出手段で抽出された音源番号の組合せとを含み、合成すべき音響信号の周波数帯域を示す音響信号関連情報を生成して出力する。そして、立体音響信号符号化装置は、立体音響信号合成手段によって、音響信号関連情報生成出力手段で生成した音響信号関連情報で示される音源番号の組合せによる音響信号同士について、当該音響信号の周波数帯域を合成することで立体音響信号とする。そして、立体音響信号符号化装置は、立体音響信号ビット割当符号化手段によって、合成した立体音響信号に所定のビットを割り当てて符号化した符号化立体音響データとし、位置データ符号化手段によって、音源位置データを符号化した符号化位置データとする。そして、立体音響信号符号化装置は、多重化手段によって、符号化位置データと符号化立体音響データとを多重化した多重化データとし、多重化データ出力手段によって、多重化データを出力する。なお、この多重化データを出力する際に、当該多重化データをストリーム化した多重化データストリームを生成して出力してもよい。
【００１０】
請求項２記載の立体音響信号符号化装置は、再生する際に立体感のある音響効果を生じさせる立体音響信号を、複数の音響信号を合成して生成し、当該立体音響信号を符号化する立体音響信号符号化装置であって、フィルタ分析手段と、方向一致組合せ抽出手段と、音源距離差計算手段と、合成帯域変換データ記録手段と、音響信号関連情報生成出力手段と、立体音響信号合成手段と、フィルタ合成手段と、分解音響信号符号化手段と、位置データ符号化手段と、データ出力手段と、を備える構成とした。
【００１１】
かかる構成によれば、立体音響信号符号化装置は、フィルタ分析手段によって、各音源から入力された音響信号を時間周波数変換して、音響信号の周波数成分を分析する。続いて、立体音響信号符号化装置は、方向一致組合せ抽出手段によって、音響信号の音源の位置を示す音源位置データの音源番号、距離方向のインデックスおよび左右方向のインデックスを用い、当該左右方向のインデックスにより方向が一致している当該音源番号の組合せを抽出する。そして、立体音響信号符号化装置は、音源距離差計算手段によって、方向一致組合せ抽出手段で抽出された音源番号に係る音源同士の距離差を、距離方向のインデックスの差分を取ることで計算する。そして、立体音響信号符号化装置は、音響信号関連情報生成出力手段によって、音源距離差計算手段で計算した音源同士の距離差と合成帯域変換データにおける距離差とに対応する周波数と、方向一致組合せ抽出手段で抽出された音源番号の組合せとを含み、合成すべき音響信号の周波数帯域を示す音響信号関連情報を生成して出力する。そして、立体音響信号符号化装置は、立体音響信号合成手段によって、音響信号関連情報生成出力手段で生成した音響信号関連情報で示される音源番号の組合せによる音響信号同士について、当該音響信号の周波数帯域を合成することで立体音響信号とする。そして、立体音響信号符号化装置は、フィルタ合成手段によって、立体音響信号合成手段によって合成した立体音響信号を分解した分解音響信号とする。そして、立体音響信号符号化装置は、分解音響信号符号化手段によって、フィルタ合成手段によって分解した分解音響信号を符号化した符号化分解音響データとし、位置データ符号化手段によって、音源位置データを符号化した符号化位置データとした後、データ出力手段によって、符号化分解音響データおよび符号化位置データを出力する。
【００１４】
請求項３記載の立体音響信号符号化方法は、再生する際に立体感のある音響効果を生じさせる立体音響信号を、複数の音響信号を合成して生成し、当該立体音響信号を符号化する立体音響信号符号化方法であって、フィルタ分析ステップと、方向一致組合せ抽出ステップと、音源距離差計算ステップと、音響信号関連情報生成出力ステップと、立体音響信号合成ステップと、立体音響信号ビット割当符号化ステップと、位置データ符号化ステップと、多重化ステップと、多重化データ出力ステップと、を含むものとした。
【００１５】
かかる手順によれば、立体音響信号符号化方法は、フィルタ分析ステップにおいて、音響信号を時間周波数変換して、前記音響信号の周波数成分を分析する。続いて、立体音響信号符号化方法は、方向一致組合せ抽出ステップにおいて、音響信号の音源の位置を示す音源位置データの音源番号、距離方向のインデックスおよび左右方向のインデックスを用い、当該左右方向のインデックスにより方向が一致している当該音源番号の組合せを抽出する。そして、立体音響信号符号化方法は、音源距離差計算ステップにおいて、方向一致組合せ抽出ステップにて抽出された音源番号に係る音源同士の距離差を、距離方向のインデックスの差分を取ることで計算する。そして、立体音響信号符号化方法は、音響信号関連情報生成出力ステップにおいて、音源距離差計算ステップにて計算した音源同士の距離差と、音源同士の距離差と周波数とを関連付けた合成帯域変換データにおける距離差と、に対応する周波数と、方向一致組合せ抽出ステップにて抽出された音源番号の組合せとを含み、合成すべき音響信号の周波数帯域を示す音響信号関連情報を生成して出力する。そして、立体音響信号符号化方法は、立体音響信号合成ステップにおいて、音響信号関連情報生成出力ステップにて生成した音響信号関連情報で示される音源番号の組合せによる音響信号同士について、当該音響信号の周波数帯域を合成することで立体音響信号とする。そして、立体音響信号符号化方法は、立体音響信号ビット割当符号化ステップにおいて、立体音響信号に所定のビットを割り当てて符号化した符号化立体音響データとし、位置データ符号化ステップにおいて、音源位置データを符号化した符号化位置データとする。そして、立体音響信号符号化方法は、多重化ステップにおいて、符号化位置データと符号化立体音響データとを多重化した多重化データとし、多重化データ出力ステップにおいて、多重化した多重化データを出力する。
【００１６】
請求項４記載の立体音響信号符号化プログラムは、再生する際に立体感のある音響効果を生じさせる立体音響信号を、複数の音響信号を合成して生成し、当該立体音響信号を符号化するために、音源同士の距離差と、周波数とを関連付けた合成帯域変換データを予め記録している合成帯域変換データ記録手段を備えたコンピュータを、フィルタ分析手段、方向一致組合せ抽出手段、音源距離差計算手段、音響信号関連情報生成出力手段、立体音響信号合成手段、立体音響信号ビット割当符号化手段、位置データ符号化手段、多重化手段、多重化データ出力手段、として機能させる構成とした。
【００１７】
かかる構成によれば、立体音響信号符号化プログラムは、フィルタ分析手段によって、音響信号を時間周波数変換して、音響信号の周波数を分析する。続いて、立体音響信号符号化プログラムは、方向一致組合せ抽出手段によって、音響信号の音源の位置を示す音源位置データの音源番号、距離方向のインデックスおよび左右方向のインデックスを用い、当該左右方向のインデックスにより方向が一致している当該音源番号の組合せを抽出する。そして、立体音響信号符号化プログラムは、音源距離差計算手段によって、方向一致組合せ抽出手段で抽出された音源番号に係る音源同士の距離差を、距離方向のインデックスの差分を取ることで計算する。そして、立体音響信号符号化プログラムは、音響信号関連情報生成出力手段によって、前記音源距離差計算手段で計算した音源同士の距離差と前記合成帯域変換データにおける距離差とに対応する周波数と、前記方向一致組合せ抽出手段で抽出された前記音源番号の組合せとを含み、合成すべき音響信号の周波数帯域を示す音響信号関連情報を生成して出力する。そして、立体音響信号符号化プログラムは、立体音響信号合成手段によって、音響信号関連情報生成出力手段で生成した音響信号関連情報で示される音源番号の組合せによる音響信号同士について、当該音響信号の周波数帯域を合成することで立体音響信号とする。そして、立体音響信号符号化プログラムは、立体音響信号ビット割当符号化手段によって、立体音響信号合成手段で合成した立体音響信号に所定のビットを割り当てて符号化した符号化立体音響データとし、位置データ符号化手段によって、音源位置データを符号化した符号化位置データとする。そして、立体音響信号符号化プログラムは、多重化手段によって、符号化位置データと符号化立体音響データとを多重化した多重化データとし、多重化データ出力手段によって、多重化手段で多重化した多重化データを出力する。
【００１８】
【発明の実施の形態】
以下、本発明の実施の形態について、図面を参照して詳細に説明する。
この実施の形態では、二つの実施の形態について、立体音響信号符号化装置の構成、動作を説明し（第一実施形態の構成を図１、動作を図２、第二実施形態の構成を図３、動作を図４）、続いて、音源の位置を示す音源位置データについて説明し（図５を使用）、最後に、音源の距離差と周波数との関連を示した合成帯域変換データについて説明する（図６を使用）。
【００１９】
（立体音響信号符号化装置の構成［第一実施形態］）
図１に立体音響信号符号化装置のブロック図を示す。この図１に示すように、立体音響信号符号化装置１は、音像（音源に対応してつくられた音場の空間的な広がり）の方向性、距離感を再現するために、複数の音源から発せられた音響信号を合成して、当該音源の立体的な距離感（奥行き、立体感のある音響効果）を再生する際に生じさせる立体音響信号を生成した後、符号化するもので、フィルタ分析部３と、音響信号関連情報生成部５と、立体音響信号合成部７と、立体音響信号ビット割当符号化部９と、音源位置データ符号化部１１と、多重化部１３と、多重化データストリーム生成出力部１５とを備えている。
【００２０】
なお、一般に、音像の距離感は、ラウドネス（音の大きさ）、直接音と間接音とのパワー比（以下、直間比とする）、周波数特性等を手がかりとして得られることが知られている。また、一般的な試聴環境（無響音室ではなく残響音が存在する環境）では、直間比が距離判断の突出した手がかりとなることが知られている。しかし、直間比によって音像の距離判断をする場合、音源から発せられる音響信号の全ての周波数帯域が一様に寄与するのではなく、ある周波数帯域においては殆ど音像の距離判断に寄与しないことが実験によって確認された。そこで、この立体音響信号符号化装置１では、音像の距離判断に殆ど寄与しない周波数帯域を音響信号毎に結合して、情報量を削減する処理を行っている。
【００２１】
また、音響信号を符号化する場合、周波数帯域を制限することが伝送容量の増加を抑制するのに有効であることが知られているので、この立体音響信号符号化装置１では、立体音響信号ビット割当符号化部９で、周波数帯域を制限して音響信号を符号化している（詳細は後記する）。
【００２２】
フィルタ分析部３は、複数の音源（Ａ、Ｂ、Ｃ）から入力された音響信号にどのような周波数成分が含まれているかを分析するスペクトル解析を行うもので、当該音響信号を時間周波数変換した音響信号の周波数成分（この実施の形態では、ＤＣＴ係数［離散コサイン変換したもの］）を出力するものである。すなわち、フィルタ分析部３では、それぞれの音響信号を、一定サンプル毎にフレーム分割した後で、時間周波数変換、例えば、ＭＤＣＴ（ＭｏｄｉｆｉｅｄＤｉｓｃｒｅｔｅＣｏｓｉｎｅＴｒａｎｓｆｏｒｍ）を行っている。時間周波数に変換された音響信号は、臨界帯域（例えば、ツヴィッカー著「心理音響学」西村書店Ｐ７４表１参照）毎に周波数帯域に分割される（サブバンド処理が施される）。このフィルタ分析部３が特許請求の範囲に記載したフィルタ分析手段に相当するものである。
【００２３】
音響信号関連情報生成部５は、音源の位置（仮想的な聴取者の位置を原点とした場合の三次元空間における相対的な位置）を示す音源位置データに基づいて、音響信号から立体音響信号を合成する際の、音響信号の組み合わせに関する情報である音響信号関連情報を生成するもので、方向一致組合せ抽出手段１７と、音源距離差計算手段１９と、合成帯域変換データ記録手段２１と、音響信号関連情報生成出力手段２３とを備えている。この音響信号関連情報生成部５が特許請求の範囲に記載した音響信号関連情報生成手段に相当するものである。
【００２４】
なお、音源には、それぞれの音源を識別するための音源番号が付されている。そして、この音源番号と、仮想的な聴取者を原点とした場合の三次元空間における位置を示す座標（ｘ，ｙ，ｚ）との対が一つのデータとして当該装置１で取り扱われている。この図１に示したように、音響信号Ａの音源の音源番号はＰａであり、音源位置データはＰａ（ｘａ，ｙａ，ｚａ）（図示せず）であり、音響信号Ｂの音源の音源番号はＰｂであり、音源位置データはＰｂ（ｘｂ，ｙｂ，ｚｂ）（図示せず）であり、音響信号Ｃの音源の音源番号はＰｃであり、音源位置データはＰｃ（ｘｃ，ｙｃ，ｚｃ）（図示せず）である。
【００２５】
方向一致組合せ抽出手段１７は、音源位置データに基づいて、音源の方向が一致している音源同士を組み合わせて抽出するものである。例えば、仮想的な聴取者の位置を原点とした場合、この原点から右方向に位置する音源同士の組み合わせがつくられ（探索され）、各音源の音源番号が抽出される。なお、この方向一致組合せ抽出手段１７によって音源の方向が一致しているとの判断は、原点から各音源に向けて直線を引いて（原点と各音源とを直線で結んで）、この直線の傾きが近似している場合である。
【００２６】
音源距離差計算手段１９は、方向一致組合せ抽出手段１７によって、方向が一致しているものとして抽出された音源同士の距離差（相対距離差）を、音源位置データに基づいて計算して、この距離差を音源番号と共に、音響信号関連情報生成出力手段２３に出力するものである。音源同士の距離差の計算は、簡便な測量法（距離測定方法）に従って行っている。
【００２７】
合成帯域変換データ記録手段２１は、符号化された立体音響信号（符号化立体音響データ）を受信した受信側（再生側）で、当該立体音響信号によって、音像の距離感を生じさせるために、各音響信号を発生させる音源同士の距離差と周波数との関係を設定した（関連付けた）合成帯域変換データを記録するものである。この合成帯域変換データの詳細については後記する（図６を使用）。
【００２８】
音響信号関連情報生成出力手段２３は、音源距離差計算手段１９から出力された音源番号および距離差と、合成帯域変換データ記録手段２１に記録されている合成帯域変換データとに基づいて、音響信号関連情報を生成して、立体音響信号合成部７に出力するものである。音響信号関連情報は、フィルタ分析部３で各音源からの音響信号が音響信号周波数に変換されたものを立体音響信号合成部７で立体音響信号に合成する際に利用されるものであり、また、この音響信号関連情報は、受信側（再生側）で再生した場合に立体的な音響効果を生じさせるための補助情報となるものである。
【００２９】
立体音響信号合成部７は、フィルタ分析部３で分析された音響信号周波数を、音響信号関連情報に基づいて、インテンシティ合成するものである。このインテンシティ合成の例としては、ＩＳＯ／ＩＥＣ１３８１８−７ＭＰＥＧ２ａｄｖａｎｃｅｄＡｕｄｉｏＣｏｄｉｎｇの手法が挙げられる。この立体音響信号合成部７でインテンシティ合成されたものを立体音響信号とする。この立体音響信号合成部７が特許請求の範囲に記載した立体音響信号合成手段に相当するものである。
【００３０】
立体音響信号ビット割当符号化部９は、立体音響信号合成部７でインテンシティ合成された立体音響信号に所定のビット数を割り当てて符号化して、符号化立体音響データとして多重化部１３に出力するものである。この立体音響信号ビット割当符号化部９が特許請求の範囲に記載した立体音響信号ビット割当符号化手段に相当するものである。
【００３１】
音源位置データ符号化部１１は、音源位置データを符号化して、符号化位置データとして多重化部１３に出力するものである。この音源位置データ符号化部１１における音源位置データの符号化方法は、特開２００１−３４６２９７号公報に開示されている位置データ符号化器によって符号化する方法と同様な方法（音源毎の位置データを補助情報として符号化する方法）である。この音源位置データ符号化部１１が特許請求の範囲に記載した位置データ符号化手段に相当するものである。
【００３２】
多重化部１３は、立体音響信号ビット割当符号化部９で符号化された符号化立体音響データと、音源位置データ符号化部１１で符号化された符号化位置データとを多重化した多重化データとして、多重化データストリーム生成出力部１５に出力するものである。この多重化部１３が特許請求の範囲に記載した多重化手段に相当するものである。
【００３３】
多重化データストリーム生成出力部１５は、多重化部１３で多重化された多重化データをストリーム化して多重化データストリームとして出力するものである。この多重化データストリーム生成出力部１５が特許請求の範囲に記載した多重化データ出力手段に相当するものである。
【００３４】
この立体音響信号符号化装置１によれば、フィルタ分析部３によって、各音源から入力された音響信号が時間周波数変換されて、音響信号の周波数成分が分析され、また、音響信号関連情報生成部５によって、音響信号の音源の位置を示す音源位置データに基づいて、音響信号の組み合わせに関する情報である音響信号関連情報が生成される。続いて、立体音響信号合成部７によって、音響信号関連情報に基づいて、音響信号の周波数成分毎に合成されて、立体音響信号とされ、立体音響信号ビット割当符号化部９によって、合成された立体音響信号に所定のビットが割り当てられて符号化されて、符号化立体音響データとされ、さらに、音源位置データ符号化部１１によって、音源位置データが符号化されて符号化位置データとされる。そして、多重化部１３によって、符号化位置データと符号化立体音響データとが多重化されて、多重化データとされ、多重化データストリーム生成出力部１５によって、多重化データがストリーム化されて、多重化データストリームが生成され、出力される。このため、音源からの方向と同方向の音源同士の距離差とを参照して生成された音響信号関連情報に基づいて、音響信号を立体音響信号に合成しているので、音像の定位感等の臨場感を損なうことなく、また、音源の数が増加しても、当該音源に関する情報が音響信号関連情報によって一元的に纏められているので、送信する音響信号の情報量を抑制することができる。
【００３５】
（立体音響信号符号化装置の動作［第一実施形態］）
次に、図２に示すフローチャートを参照して、立体音響信号符号化装置１の動作を説明する（適宜、図１参照）。
【００３６】
まず、立体音響信号符号化装置１のフィルタ分析部３に音響信号（Ａ、Ｂ、Ｃ）が入力され、このフィルタ分析部３において時間周波数変換され、音響信号周波数とされる（Ｓ１）。また、音響信号（Ａ、Ｂ、Ｃ）に対応する音源位置データ（Ｐａ、Ｐｂ、Ｐｃ）が、立体音響信号符号化装置１の音響信号関連情報生成部５に入力され、この音響位置データ（音源番号、音源の座標）と、合成帯域変換データ記録手段２１に記録されている合成帯域変換データとに基づいて、音響信号関連情報が生成される（Ｓ２）。
【００３７】
そして、フィルタ分析部３にて変換された音響信号周波数が立体音響信号合成部７で、音響信号関連情報生成部５にて生成された音響信号関連情報に基づいて、立体音響信号にインテンシティ合成される（Ｓ３）。そして、この立体音響信号が立体音響信号ビット割当符号化部９で、所定のビット数が割り当てられて符号化され、符号化立体音響データとされる（Ｓ４）。
【００３８】
また、音源位置データが音源位置データ符号化部１１で符号化され、符号化音源位置データとされる（Ｓ５）。そして、多重化部１３で、これら符号化立体音響データおよび符号化音源位置データが多重化され、多重化データとされる（Ｓ６）。この多重化部１３で多重化された多重化データが多重化データストリーム生成出力部１５でストリーム化され、多重化データストリームとして出力される（Ｓ７）。
【００３９】
（立体音響信号符号化装置の構成［第二実施形態］）
次に、図３に立体音響信号符号化装置のブロック図を示す。この図３に示すように、立体音響信号符号化装置３１は、立体音響信号符号化装置１と同様に、音像の方向性、距離感を再現するために、複数の音源から発せられた音響信号を合成して、当該音源の立体的な距離感を再生する際に生じさせる立体音響信号を生成した後、符号化するもので、フィルタ分析部３と、音響信号関連情報生成部５と、立体音響信号合成部７と、音源位置データ符号化部１１と、フィルタ合成部３３と、分解音響信号符号化部３５と、出力部３７とを備えている。なお、立体音響信号符号化装置１の各構成と同様な構成は同一の名称符号を付して、その説明を省略する。
【００４０】
フィルタ合成部３３は、立体音響信号合成部７で合成された立体音響信号を分解して、分解音響信号に変換するものである。分解音響信号は、既存の音響符号化器に適合する音響信号であり、それぞれの音響信号には音響信号関連情報が付されていない。このフィルタ合成部３３が特許請求の範囲に記載したフィルタ合成手段に相当するものである。
【００４１】
分解音響信号符号化部３５は、フィルタ合成部３３で分解された分解音響信号を符号化して、符号化分解音響データとして、出力部３７に出力するものである。この分解音響信号符号化部３５が特許請求の範囲に記載した分解音響信号符号化手段に相当するものである。
【００４２】
出力部３７は、分解音響信号符号化部３５にて符号化された符号化分解音響データと、音源位置データ符号化部１１にて符号化された符号化位置データとを外部に出力するものである。この出力部３７が特許請求の範囲に記載したデータ出力手段に相当するものである。
【００４３】
この立体音響信号符号化装置３１によれば、フィルタ分析部３によって、各音源から入力された音響信号が時間周波数変換されて、音響信号の周波数成分が分析される。また、音響信号関連情報生成部５によって、音響信号の音源の位置を示す音源位置データに基づいて、音響信号の組み合わせに関する情報である音源信号関連情報が生成される。続いて、立体音響信号合成部７によって、音響信号関連情報に基づいて、音響信号の周波数成分毎に合成され、立体音響信号とされ、この立体音響信号がフィルタ合成部３３によって分解された分解音響信号とされ、分解音響信号符号化部３５によって、分解音響信号が符号化された符号化分解音響データとされる。さらに、音源位置データ符号化部１１によって、音源位置データが符号化された符号化位置データとされた後、出力部３７によって、符号化分解音響データおよび符号化位置データが出力される。このため、音源からの方向と同方向の音源同士の距離差とを参照して生成された音響信号関連情報に基づいて、音響信号を立体音響信号に合成しているので、音像の定位感等の臨場感を損なうことがない。また、フィルタ合成部３３で立体音響信号を分解音響信号に分解しているので、既存の音響信号符号化器に対応した形式で送信することができる。
【００４４】
（立体音響信号符号化装置の動作［第二実施形態］）
次に、図４に示すフローチャートを参照して、立体音響信号符号化装置３１の動作を説明する（適宜、図３参照）。
【００４５】
まず、立体音響信号符号化装置３１のフィルタ分析部３に音響信号（Ａ、Ｂ、Ｃ）が入力され、このフィルタ分析部３において時間周波数変換され、音響信号の周波数成分とされる（Ｓ１１）。また、音響信号（Ａ、Ｂ、Ｃ）に対応する音源位置データ（Ｐａ、Ｐｂ、Ｐｃ）が、立体音響信号符号化装置３１の音響信号関連情報生成部５に入力され、この音響位置データ（音源番号、音源の座標）と、合成帯域変換データ記録手段２１に記録されている合成帯域変換データとに基づいて、音響信号関連情報が生成される（Ｓ１２）。
【００４６】
そして、フィルタ分析部３にて変換された音響信号の周波数成分が立体音響信号合成部７で、音響信号関連情報生成部５にて生成された音響信号関連情報に基づいて、立体音響信号にインテンシティ合成される（Ｓ１３）。そして、この立体音響信号がフィルタ合成部３３で分解音響信号に分解され、分解音響信号符号化部３５に出力される（Ｓ１４）。続いて、この分解音響信号が分解音響信号符号化部３５で符号化分解音響データに符号化される（Ｓ１５）。
【００４７】
また、音源位置データが音源位置データ符号化部１１で符号化され、符号化音源位置データとされる（Ｓ１６）。そして、出力部３７で、これら符号化分解音響データおよび符号化音源位置データが出力される（Ｓ１７）。
【００４８】
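(Sound source position data)
Next, the sound source position data will be described with reference to FIG. 5.
As shown in FIG. 5, the sound source position data for sound sources a, b, c and d is expressed in polar coordinates whose center (origin) is a virtual listener. Each item of sound source position data consists of a 2-bit sound source number, a 7-bit distance-direction index and a 7-bit left-right-direction index, 16 bits in total. The distance-direction index divides the range from 0.5 m to 10 m into 128 steps, and the left-right-direction index divides 360 degrees into 120 steps of 3 degrees each.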
【００４９】
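As shown in FIG. 5, when, for example, the distance-direction index and the left-right-direction index together read "00011110001010", the distance-direction index is "0001111", which is 15 in decimal, giving {(10 − 0.5)/128} × (15 + 1) ≈ 1.2 m. The left-right-direction index is "0001010", which is 10 in decimal, giving 3 degrees × 10 = 30 degrees (30 degrees to the right of the listening position).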
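The arithmetic of this worked example can be checked with a short sketch. It assumes the 16-bit layout stated in the description (2-bit sound source number, then the 7-bit distance index, then the 7-bit left-right index) and applies the conversion formula exactly as written in the example. The function names and the choice of sound source number 0 for the sample word are illustrative assumptions, not part of the patent.

```python
# Field layout from the description: 2-bit source number (MSBs),
# 7-bit distance index, 7-bit left-right (direction) index (LSBs).
DIST_MIN_M, DIST_MAX_M, DIST_STEPS = 0.5, 10.0, 128
DEG_PER_STEP = 3


def unpack_position(word):
    """Split a 16-bit sound source position word into its three fields."""
    source_number = (word >> 14) & 0b11
    distance_index = (word >> 7) & 0x7F
    direction_index = word & 0x7F
    return source_number, distance_index, direction_index


def distance_metres(distance_index):
    # Formula as written in the worked example: step size times (index + 1).
    return (DIST_MAX_M - DIST_MIN_M) / DIST_STEPS * (distance_index + 1)


def direction_degrees(direction_index):
    # Each left-right step is 3 degrees.
    return DEG_PER_STEP * direction_index


# Sample word: assumed source number 00, then the 14 index bits of the example.
word = 0b00_0001111_0001010
src, dist_idx, dir_idx = unpack_position(word)
print(src, dist_idx, dir_idx)          # 0 15 10
print(distance_metres(dist_idx))       # 1.1875, i.e. about 1.2 m
print(direction_degrees(dir_idx))      # 30
```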
【００５０】
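In other words, the sound source position data is expressed as 16-bit data: the two most significant bits (MSB) hold the sound source number that identifies the sound source, the next 7 bits express the distance, and the following 7 bits express the direction. The left-right-direction index of this sound source position data is extracted by the direction matching combination extraction means 17 of the acoustic signal related information generation unit 5 in the stereophonic sound signal encoding devices 1 and 31. That is, the 7 bits that express the left-right-direction index (the field that includes the least significant bit (LSB)) are taken from every item of sound source position data, the indices are compared with one another, and the sets of sound source numbers for which all bits match are extracted.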
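A minimal sketch of this direction-matching step follows, assuming the 16-bit layout described above. Grouping by the 7-bit left-right index and emitting every pair within a group is one plausible reading of "sets of sound source numbers"; the helper names `pack` and `direction_matched_pairs` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def pack(source_number, distance_index, direction_index):
    """Build a 16-bit position word: 2-bit number, 7-bit distance, 7-bit direction."""
    return (source_number << 14) | (distance_index << 7) | direction_index


def direction_matched_pairs(words):
    """Return pairs of source numbers whose 7-bit direction indices are identical."""
    by_direction = defaultdict(list)
    for word in words:
        source_number = (word >> 14) & 0b11
        direction_index = word & 0x7F      # lowest 7 bits, including the LSB
        by_direction[direction_index].append(source_number)
    pairs = []
    for sources in by_direction.values():
        pairs.extend(combinations(sorted(sources), 2))
    return sorted(pairs)


# Sources 0, 1 and 3 share direction index 10; source 2 points elsewhere.
words = [pack(0, 15, 10), pack(1, 40, 10), pack(2, 20, 55), pack(3, 90, 10)]
print(direction_matched_pairs(words))  # [(0, 1), (0, 3), (1, 3)]
```

Comparing the raw index bits keeps the test exact, which matches the "all bits match" condition in the text; a tolerance-based comparison would be a different design.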
【００５１】
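The sound source distance difference calculation means 19 then takes the 7 bits that express the distance-direction index from each member of a combination of sound source numbers whose directions match, and takes their difference to calculate the distance difference between those sound sources. After that, the acoustic signal related information generation output means 23 generates and outputs the acoustic signal related information (which indicates the frequency bands to be synthesized) based on the set of sound source numbers, the distance difference, the sound source position data, and the synthesized band conversion data recorded in the synthesized band conversion data recording means 21.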
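The distance-difference step can be sketched as below. Taking the absolute value of the index difference, and converting it to metres with the step size (10 − 0.5)/128 from the description, are assumptions made for illustration; the text only says that the difference of the two 7-bit distance indices is taken.

```python
DIST_STEP_M = (10.0 - 0.5) / 128   # size of one distance-index step, in metres


def distance_index_difference(word_a, word_b):
    """Difference between the 7-bit distance indices of two position words."""
    dist_a = (word_a >> 7) & 0x7F
    dist_b = (word_b >> 7) & 0x7F
    return abs(dist_a - dist_b)    # assumption: only the magnitude matters


def index_difference_to_metres(index_difference):
    # Assumed conversion: number of steps times the step size.
    return DIST_STEP_M * index_difference


# Two sources in the same direction (index 10) at distance indices 15 and 40.
word_a = (0 << 14) | (15 << 7) | 10
word_b = (1 << 14) | (40 << 7) | 10
diff = distance_index_difference(word_a, word_b)
print(diff)                               # 25
print(index_difference_to_metres(diff))   # 1.85546875
```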
【００５２】
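(Synthesized band conversion data)
Finally, the synthesized band conversion data will be described with reference to FIG. 6.
As shown in FIG. 6, the synthesized band conversion data is organized as one file (in FIG. 6, a table with the distance difference on the horizontal axis and the frequency on the vertical axis) for each distance of the nearer sound source (labelled as the distance of the near sound source in FIG. 6) within a set of sound source numbers to be combined. Each file associates the distance difference between sound sources with a frequency.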
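One way to picture this structure is a mapping from the nearer source's distance to a small table of (distance difference, frequency) entries. The sketch below is illustrative only: every number in `BAND_TABLES` is a made-up placeholder, since the actual values of FIG. 6 are not reproduced in the text, and the nearest-key and upper-limit lookup rules are assumptions.

```python
from bisect import bisect_left

# HYPOTHETICAL data. Key: distance of the nearer source in metres.
# Value: list of (upper limit of distance difference in metres, frequency in Hz).
BAND_TABLES = {
    1.0: [(0.5, 2000.0), (1.0, 4000.0), (2.0, 8000.0)],
    5.0: [(0.5, 1000.0), (1.0, 2000.0), (2.0, 4000.0)],
}


def lookup_frequency(near_distance_m, distance_diff_m):
    """Return the frequency associated with a distance difference."""
    # Assumed rule: use the table whose key is closest to the nearer source.
    key = min(BAND_TABLES, key=lambda d: abs(d - near_distance_m))
    table = BAND_TABLES[key]
    limits = [limit for limit, _ in table]
    # Assumed rule: first entry whose limit covers the difference,
    # clamped to the last entry for larger differences.
    i = min(bisect_left(limits, distance_diff_m), len(table) - 1)
    return table[i][1]


print(lookup_frequency(1.2, 0.7))  # 4000.0 with the placeholder values above
```

Whether the returned frequency marks the lower or upper edge of the band to be synthesized is not stated in this passage, so the sketch leaves that interpretation to the caller.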
【００５３】
以上、一実施形態に基づいて本発明を説明したが、本発明はこれに限定されるものではない。
例えば、立体音響信号符号化装置１、３１の各構成の処理を汎用的なコンピュータ言語で記述した立体音響信号符号化プログラムとみなすことや、各構成の処理を一つずつの過程ととられた立体音響信号符号化方法とみなすことは可能である。これらの場合、立体音響信号符号化装置１、３１と同様の効果を得ることができる。
【００５４】
【発明の効果】
請求項１、３、４記載の発明によれば、音源からの方向と同方向の音源同士の距離差とを参照して生成された音響信号関連情報に基づいて、音響信号を立体音響信号に合成しているので、音像の定位感等の臨場感を損なうことがない。また、音源の数が増加しても、当該音源に関する情報が音響信号関連情報によって一元的に纏められているので、送信する音響信号の情報量を抑制することができる。
【００５５】
請求項２に記載の発明によれば、さらに、立体音響信号を分解音響信号に分解しているので、既存の音響信号符号化器に対応した形式で送信することができる。
【図面の簡単な説明】
【図１】本発明による一実施の形態である立体音響信号符号化装置のブロック図である。
【図２】図１に示した立体音響信号符号化装置の動作を説明したフローチャートである。
【図３】本発明による他の実施の形態である立体音響信号符号化装置のブロック図である。
【図４】図３に示した立体音響信号符号化装置の動作を説明したフローチャートである。
【図５】音源位置データを説明した図である。
【図６】合成帯域変換データを説明した図である。
【符号の説明】
１、３１立体音響信号符号化装置
３フィルタ分析部（フィルタ分析手段）
５音響信号関連情報生成部（音響信号関連情報生成手段）
７立体音響信号合成部（立体音響信号合成手段）
９立体音響信号ビット割当符号化部（立体音響信号ビット割当符号化手段）
１１音源位置データ符号化部（位置データ符号化手段）
１３多重化部（多重化手段）
１５多重化データストリーム生成出力部（多重化データ出力手段）
１７方向一致組合せ抽出手段
１９音源距離差計算手段
２１合成帯域変換データ記録手段
２３音響信号関連情報生成出力手段
３３フィルタ合成部（フィルタ合成手段）
３５分解音響信号符号化部（分解音響信号符号化手段）
３７出力部（データ出力手段）

[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a stereophonic sound signal encoding device, a stereophonic sound signal encoding method, and a stereophonic sound signal encoding program that generate and encode a stereophonic sound signal producing a three-dimensional acoustic effect.
[0002]
[Prior art]
Conventionally, one technique for generating a stereophonic sound signal, which combines a plurality of acoustic signals to produce a three-dimensional acoustic effect so as to reproduce the directionality and sense of distance of a sound image (the spatial extent of the sound field created for a sound source), and for encoding the generated stereophonic sound signal, is the "sound image reproduction system" described in Patent Document 1.
[0003]
This sound image reproduction system transmits, from the transmission side, an acoustic signal together with auxiliary information such as the position and orientation of its sound source, using a three-dimensional sound image encoding method (a method of encoding a stereophonic sound signal). In this system the auxiliary information grows as the acoustic signals increase, so the amount of information sent from the transmission side increases. The system therefore suppresses the growth of the auxiliary information by sharing one control signal table between the transmission side, which uses it to encode the distance control signal and direction control signal for distance control and direction control of the speaker array on the reception (reproduction) side, and the reception side, which uses it for decoding.
[0004]
[Patent Document 1]
JP 2001-346297 A (paragraph 13-paragraph 29, FIG. 1)
[0005]
[Problems to be solved by the invention]
However, although the conventional sound reproduction system can suppress the increase in the amount of auxiliary information, it takes no measure to suppress the amount of information in the transmitted acoustic signals, so the amount of acoustic signal information increases in proportion to the number of acoustic signals.
[0006]
Moreover, if the conventional sound reproduction system tried to suppress the amount of information in the transmitted acoustic signals, reproduction of those signals on the reception side could lose the sense of presence, such as the sense of localization of the sound image (the perceived position and distance of the sound source).
[0007]
Accordingly, an object of the present invention is to solve the problems of the conventional technique described above and to provide a stereophonic sound signal encoding device, a stereophonic sound signal encoding method, and a stereophonic sound signal encoding program that can suppress the amount of information in the transmitted acoustic signals without impairing the sense of presence, such as the sense of localization of the sound image.
[0008]
[Means for Solving the Problems]
In order to achieve the above-described object, the present invention has the following configuration.
The stereophonic sound signal encoding device according to claim 1 generates, by synthesizing a plurality of acoustic signals, a stereophonic sound signal that produces a three-dimensional acoustic effect on reproduction, and encodes that stereophonic sound signal. It comprises filter analysis means, direction matching combination extraction means, sound source distance difference calculation means, synthesized band conversion data recording means, acoustic signal related information generation output means, stereophonic sound signal synthesis means, stereophonic sound signal bit allocation encoding means, position data encoding means, multiplexing means, and multiplexed data output means.
[0009]
With this configuration, the stereophonic sound signal encoding device uses the filter analysis means to perform time-frequency conversion on the acoustic signal input from each sound source and to analyze the frequency components of the acoustic signal. The filter analysis means applies a DCT (discrete cosine transform) to the acoustic signal, so the acoustic signal frequencies are DCT coefficients. Next, the direction matching combination extraction means uses the sound source number, the distance-direction index, and the left-right-direction index of the sound source position data, which indicates the position of the sound source of each acoustic signal, and extracts the combinations of sound source numbers whose directions match according to the left-right-direction index. The sound source distance difference calculation means then calculates the distance difference between the sound sources of the extracted sound source numbers by taking the difference between their distance-direction indices. The acoustic signal related information generation output means generates and outputs acoustic signal related information that indicates the frequency band of the acoustic signals to be synthesized. This information contains the frequency that corresponds to the calculated distance difference and to the distance difference in the synthesized band conversion data, together with the combination of sound source numbers extracted by the direction matching combination extraction means. The stereophonic sound signal synthesis means synthesizes the frequency bands of the acoustic signals that belong to the combination of sound source numbers indicated by the acoustic signal related information, yielding the stereophonic sound signal. The stereophonic sound signal bit allocation encoding means allocates predetermined bits to the synthesized stereophonic sound signal and encodes it into encoded stereophonic sound data, and the position data encoding means encodes the sound source position data into encoded position data. Finally, the multiplexing means multiplexes the encoded position data and the encoded stereophonic sound data into multiplexed data, and the multiplexed data output means outputs the multiplexed data. When the multiplexed data is output, it may be streamed and output as a multiplexed data stream.
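As an illustration of the time-frequency conversion performed by the filter analysis means, the sketch below computes DCT coefficients for one frame of samples. It assumes the unnormalized DCT-II and a direct O(N²) evaluation; the text only says that a DCT is applied (the embodiment names the MDCT as an example), so the variant, the scaling, and the function name `dct_ii` are assumptions.

```python
import math


def dct_ii(frame):
    """Unnormalized DCT-II of one frame: X[k] = sum_i x[i] * cos(pi/N * (i + 0.5) * k)."""
    n = len(frame)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(frame))
        for k in range(n)
    ]


# A constant frame has energy only in the k = 0 (DC) coefficient.
coeffs = dct_ii([1.0] * 8)
print(round(coeffs[0], 6))                           # 8.0
print(all(abs(c) < 1e-9 for c in coeffs[1:]))        # True
```

A production encoder would normally use a fast transform with windowed, overlapping frames; the direct form is used here only because it is easy to verify by hand.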
[0010]
The stereophonic sound signal encoding device according to claim 2 generates, by synthesizing a plurality of acoustic signals, a stereophonic sound signal that produces a three-dimensional acoustic effect on reproduction, and encodes that stereophonic sound signal. It comprises filter analysis means, direction matching combination extraction means, sound source distance difference calculation means, synthesized band conversion data recording means, acoustic signal related information generation output means, stereophonic sound signal synthesis means, filter synthesis means, decomposed acoustic signal encoding means, position data encoding means, and data output means.
[0011]
With this configuration, the stereophonic sound signal encoding device uses the filter analysis means to perform time-frequency conversion on the acoustic signal input from each sound source and to analyze the frequency components of the acoustic signal. Next, the direction matching combination extraction means uses the sound source number, the distance-direction index, and the left-right-direction index of the sound source position data, which indicates the position of the sound source of each acoustic signal, and extracts the combinations of sound source numbers whose directions match according to the left-right-direction index. The sound source distance difference calculation means then calculates the distance difference between the sound sources of the extracted sound source numbers by taking the difference between their distance-direction indices. The acoustic signal related information generation output means generates and outputs acoustic signal related information that indicates the frequency band of the acoustic signals to be synthesized. This information contains the frequency that corresponds to the calculated distance difference and to the distance difference in the synthesized band conversion data, together with the combination of sound source numbers extracted by the direction matching combination extraction means. The stereophonic sound signal synthesis means synthesizes the frequency bands of the acoustic signals that belong to the combination of sound source numbers indicated by the acoustic signal related information, yielding the stereophonic sound signal. The filter synthesis means decomposes the stereophonic sound signal synthesized by the stereophonic sound signal synthesis means into decomposed acoustic signals. The decomposed acoustic signal encoding means encodes the decomposed acoustic signals into encoded decomposed acoustic data, the position data encoding means encodes the sound source position data into encoded position data, and the data output means then outputs the encoded decomposed acoustic data and the encoded position data.
[0014]
The stereophonic sound signal encoding method according to claim 3 generates, by synthesizing a plurality of acoustic signals, a stereophonic sound signal that produces a three-dimensional acoustic effect on reproduction, and encodes that stereophonic sound signal. It includes a filter analysis step, a direction matching combination extraction step, a sound source distance difference calculation step, an acoustic signal related information generation output step, a stereophonic sound signal synthesis step, a stereophonic sound signal bit allocation encoding step, a position data encoding step, a multiplexing step, and a multiplexed data output step.
[0015]
With this procedure, the stereophonic sound signal encoding method performs time-frequency conversion on the acoustic signal in the filter analysis step and analyzes the frequency components of the acoustic signal. Next, the direction matching combination extraction step uses the sound source number, the distance-direction index, and the left-right-direction index of the sound source position data, which indicates the position of the sound source of each acoustic signal, and extracts the combinations of sound source numbers whose directions match according to the left-right-direction index. The sound source distance difference calculation step then calculates the distance difference between the sound sources of the extracted sound source numbers by taking the difference between their distance-direction indices. The acoustic signal related information generation output step generates and outputs acoustic signal related information that indicates the frequency band of the acoustic signals to be synthesized. This information contains the frequency that corresponds to the distance difference calculated in the sound source distance difference calculation step and to the distance difference in the synthesized band conversion data, which associates the distance difference between sound sources with frequency, together with the combination of sound source numbers extracted in the direction matching combination extraction step. The stereophonic sound signal synthesis step synthesizes the frequency bands of the acoustic signals that belong to the combination of sound source numbers indicated by the acoustic signal related information, yielding the stereophonic sound signal. The stereophonic sound signal bit allocation encoding step allocates predetermined bits to the stereophonic sound signal and encodes it into encoded stereophonic sound data, and the position data encoding step encodes the sound source position data into encoded position data. Finally, the multiplexing step multiplexes the encoded position data and the encoded stereophonic sound data into multiplexed data, and the multiplexed data output step outputs the multiplexed data.
[0016]
The stereophonic sound signal encoding program according to claim 4, in order to generate, by synthesizing a plurality of acoustic signals, a stereophonic sound signal that produces a three-dimensional acoustic effect on reproduction and to encode that signal, causes a computer provided with synthesized band conversion data recording means, in which synthesized band conversion data associating the distance difference between sound sources with frequency is recorded in advance, to function as filter analysis means, direction matching combination extraction means, sound source distance difference calculation means, acoustic signal related information generation output means, stereophonic sound signal synthesis means, stereophonic sound signal bit allocation encoding means, position data encoding means, multiplexing means, and multiplexed data output means.
[0017]
With this configuration, the stereophonic sound signal encoding program uses the filter analysis means to perform time-frequency conversion on the acoustic signal and to analyze the frequency of the acoustic signal. Next, the direction matching combination extraction means uses the sound source number, the distance-direction index, and the left-right-direction index of the sound source position data, which indicates the position of the sound source of each acoustic signal, and extracts the combinations of sound source numbers whose directions match according to the left-right-direction index. The sound source distance difference calculation means then calculates the distance difference between the sound sources of the extracted sound source numbers by taking the difference between their distance-direction indices. The acoustic signal related information generation output means generates and outputs acoustic signal related information that indicates the frequency band of the acoustic signals to be synthesized. This information contains the frequency that corresponds to the distance difference calculated by the sound source distance difference calculation means and to the distance difference in the synthesized band conversion data, together with the combination of sound source numbers extracted by the direction matching combination extraction means. The stereophonic sound signal synthesis means synthesizes the frequency bands of the acoustic signals that belong to the combination of sound source numbers indicated by the acoustic signal related information, yielding the stereophonic sound signal. The stereophonic sound signal bit allocation encoding means allocates predetermined bits to the stereophonic sound signal synthesized by the stereophonic sound signal synthesis means and encodes it into encoded stereophonic sound data, and the position data encoding means encodes the sound source position data into encoded position data. Finally, the multiplexing means multiplexes the encoded position data and the encoded stereophonic sound data into multiplexed data, and the multiplexed data output means outputs the data multiplexed by the multiplexing means.
[0018]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
In this description, the configuration and operation of the stereophonic signal encoding apparatus are explained for two embodiments (the configuration of the first embodiment with reference to FIG. 1 and its operation with reference to FIG. 2; the configuration of the second embodiment with reference to FIG. 3 and its operation with reference to FIG. 4). Next, the sound source position data indicating the position of each sound source is explained (with reference to FIG. 5), and finally the synthesized band conversion data indicating the relationship between the distance difference between sound sources and frequency is explained (with reference to FIG. 6).
[0019]
(Configuration of Stereo Acoustic Signal Encoding Device [First Embodiment])
FIG. 1 shows a block diagram of the stereophonic signal encoding apparatus. As shown in FIG. 1, the stereophonic signal encoding device 1 synthesizes the acoustic signals emitted from a plurality of sound sources in order to reproduce the directionality and sense of distance of a sound image (the spatial extent of the sound field created by a sound source), generates a stereophonic signal that conveys the three-dimensional sense of distance (depth, three-dimensional sound effect) of the sound sources on reproduction, and then encodes that signal. It includes a filter analysis unit 3, an acoustic signal related information generation unit 5, a stereophonic signal synthesis unit 7, a stereophonic signal bit allocation encoding unit 9, a sound source position data encoding unit 11, a multiplexing unit 13, and a multiplexed data stream generation output unit 15.
[0020]
In general, it is known that the sense of distance of a sound image is obtained from cues such as loudness (the volume of the sound), the power ratio between direct sound and indirect sound (hereinafter referred to as the direct ratio), and frequency characteristics. It is also known that in a general listening environment (an environment with reverberation, as opposed to an anechoic room), the direct ratio is the most prominent cue for judging distance. However, when the distance of the sound image is judged from the direct ratio, the frequency bands of the acoustic signal emitted from the sound source do not all contribute equally; it has been confirmed by experiment that some frequency bands hardly contribute to the judgment of sound image distance. In view of this, the stereophonic signal encoding apparatus 1 reduces the amount of information by synthesizing, across the acoustic signals, the frequency bands that hardly contribute to the judgment of sound image distance.
[0021]
Further, when encoding an audio signal, it is known that limiting the frequency band is effective in suppressing an increase in transmission capacity. The bit allocation encoding unit 9 encodes the acoustic signal by limiting the frequency band (details will be described later).
[0022]
The filter analysis unit 3 performs spectrum analysis to determine which frequency components are contained in the acoustic signals input from the plurality of sound sources (A, B, C), and outputs the frequency components of each acoustic signal (in this embodiment, DCT [discrete cosine transform] coefficients). That is, the filter analysis unit 3 divides each acoustic signal into frames of a predetermined number of samples and then performs a time-frequency conversion, for example MDCT (Modified Discrete Cosine Transform). The time-frequency-converted acoustic signal is divided into frequency bands (subband processing) for each critical band (see, for example, Zwicker, Psychoacoustics, p. 74, Table 1). The filter analysis unit 3 corresponds to the filter analysis means described in the claims.
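A minimal sketch of the framing, time-frequency conversion, and band grouping described above, in pure Python. The block length, the sine window, the 50% overlap, and the band edges passed to `band_energies` are assumptions made for illustration; the description does not specify them, and real critical-band edges would come from the cited table. The transform uses the direct O(N²) MDCT definition rather than a fast algorithm.

```python
import math

def mdct(block):
    """MDCT of a block of 2N samples, returning N coefficients (direct form)."""
    two_n = len(block)
    n = two_n // 2
    return [
        sum(block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for t in range(two_n))
        for k in range(n)
    ]

def analyze(signal, n=8):
    """Split `signal` into 50%-overlapping blocks of 2n samples,
    apply a sine window (an assumption), and transform each block."""
    window = [math.sin(math.pi * (t + 0.5) / (2 * n)) for t in range(2 * n)]
    frames = []
    for start in range(0, len(signal) - 2 * n + 1, n):
        block = [signal[start + t] * window[t] for t in range(2 * n)]
        frames.append(mdct(block))
    return frames

def band_energies(coeffs, band_edges):
    """Sum the squared coefficients in each band [lo, hi).
    `band_edges` is a hypothetical list of coefficient indices."""
    return [sum(c * c for c in coeffs[lo:hi])
            for lo, hi in zip(band_edges, band_edges[1:])]
```

Each entry returned by `analyze` holds the coefficients of one frame; `band_energies` then groups them into subbands for the later synthesis stage.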
[0023]
The acoustic signal related information generation unit 5 generates acoustic signal related information, which is information on the combination of acoustic signals to be used when synthesizing a stereophonic signal from the acoustic signals, based on the sound source position data indicating the position of each sound source (its relative position in three-dimensional space with the position of the virtual listener as the origin). It includes direction matching combination extraction means 17, sound source distance difference calculation means 19, synthesized band conversion data recording means 21, and acoustic signal related information generation output means 23. The acoustic signal related information generation unit 5 corresponds to the acoustic signal related information generating means described in the claims.
[0024]
Each sound source is assigned a sound source number that identifies it. The apparatus 1 handles the pair consisting of the sound source number and the coordinates (x, y, z), which indicate the position in three-dimensional space with the virtual listener at the origin, as one piece of data. As shown in FIG. 1, the sound source of the acoustic signal A has the sound source number Pa and the sound source position data Pa (xa, ya, za) (not shown); the sound source of the acoustic signal B has the sound source number Pb and the sound source position data Pb (xb, yb, zb) (not shown); and the sound source of the acoustic signal C has the sound source number Pc and the sound source position data Pc (xc, yc, zc) (not shown).
[0025]
The direction matching combination extraction means 17 extracts sound sources whose directions are the same, based on the sound source position data. For example, with the position of the virtual listener as the origin, it creates (searches for) a combination of sound sources located to the right of the origin and extracts the sound source number of each of them. The direction matching combination extraction means 17 judges that the directions of sound sources coincide when the straight lines drawn from the origin to each sound source (connecting the origin and each sound source) have approximately equal slopes.
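The direction test above can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: it compares bearings in the horizontal plane only, uses `atan2` instead of a raw slope so that vertical lines and opposite directions are handled, and treats bearings as equal when they fall into the same 3-degree step. The function name, the tolerance, and the sample coordinates are hypothetical.

```python
import math
from collections import defaultdict

def direction_groups(sources, step_deg=3.0):
    """sources: {sound source number: (x, y)} with the listener at the origin.
    Returns the lists of source numbers that share a quantized bearing."""
    steps = round(360.0 / step_deg)
    buckets = defaultdict(list)
    for number, (x, y) in sources.items():
        bearing = math.degrees(math.atan2(y, x)) % 360.0
        buckets[round(bearing / step_deg) % steps].append(number)
    # Only bearings shared by two or more sources form a combination.
    return [sorted(group) for group in buckets.values() if len(group) > 1]

groups = direction_groups({
    "Pa": (1.0, 1.0),    # 45 degrees
    "Pb": (3.0, 3.0),    # 45 degrees, farther away
    "Pc": (-2.0, -2.0),  # same slope as Pa but opposite direction
    "Pd": (0.0, 2.0),    # 90 degrees
})
```

Note that `Pc` lies on the same line as `Pa` and `Pb` but is not grouped with them, which is why a bearing is compared here rather than the slope alone.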
[0026]
The sound source distance difference calculation means 19 calculates the distance difference (relative distance difference) between the sound sources extracted by the direction matching combination extraction means 17 as having the same direction based on the sound source position data. The distance difference is output to the sound signal related information generating / outputting means 23 together with the sound source number. The distance difference between the sound sources is calculated according to a simple surveying method (distance measuring method).
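As a rough illustration of the distance difference calculation, the sketch below takes the Euclidean distance of each source from the origin and subtracts the distances pairwise. The description only says a simple distance measuring method is used, so the choice of Euclidean distance, the function names, and the sample coordinates are assumptions.

```python
import math
from itertools import combinations

def distance_from_listener(position):
    """Euclidean distance of (x, y, z) from the virtual listener at the origin."""
    return math.sqrt(sum(c * c for c in position))

def distance_differences(group, positions):
    """group: source numbers already judged to share a direction.
    Returns {(number_a, number_b): absolute distance difference}."""
    return {
        (a, b): abs(distance_from_listener(positions[a])
                    - distance_from_listener(positions[b]))
        for a, b in combinations(sorted(group), 2)
    }

diffs = distance_differences(
    ["Pa", "Pb"],
    {"Pa": (3.0, 4.0, 0.0), "Pb": (6.0, 8.0, 0.0)},
)
```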
[0027]
The synthesized band conversion data recording means 21 records synthesized band conversion data in which the relationship between the distance difference between the sound sources that generate the acoustic signals and frequency is set (associated), with regard to reproduction on the receiving side (reproducing side) that receives the encoded stereophonic signal (encoded stereophonic data). Details of the synthesized band conversion data will be described later (with reference to FIG. 6).
[0028]
The acoustic signal related information generation output means 23 generates the acoustic signal related information based on the sound source numbers and distance differences output from the sound source distance difference calculation means 19 and on the synthesized band conversion data recorded in the synthesized band conversion data recording means 21, and outputs it to the stereophonic signal synthesis unit 7. The acoustic signal related information is used when the stereophonic signal synthesis unit 7 synthesizes the acoustic signals from each sound source, converted into frequency components by the filter analysis unit 3, into the stereophonic signal. It is also auxiliary information for producing a three-dimensional acoustic effect on reproduction at the receiving side (reproducing side).
[0029]
The stereophonic signal synthesis unit 7 performs intensity synthesis on the frequency components of the acoustic signals analyzed by the filter analysis unit 3, based on the acoustic signal related information. One example of intensity synthesis is the method of ISO/IEC 13818-7 (MPEG-2 Advanced Audio Coding). The signal synthesized by the stereophonic signal synthesis unit 7 is defined as the stereophonic signal. The stereophonic signal synthesis unit 7 corresponds to the stereophonic signal synthesizing means described in the claims.
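The idea of intensity synthesis can be illustrated with a simplified sketch: within a band selected for synthesis, the spectra of two sources are merged into one carrier, and an energy-based gain is kept per source so that each level can be restored on reproduction. This is written in the spirit of intensity coding and is not the ISO/IEC 13818-7 procedure; the function name, the gain definition, and the sample values are assumptions.

```python
import math

def intensity_combine(coeffs_a, coeffs_b, lo, hi):
    """Merge band [lo, hi) of two spectra into one carrier plus per-source gains."""
    carrier = [a + b for a, b in zip(coeffs_a[lo:hi], coeffs_b[lo:hi])]
    carrier_energy = sum(c * c for c in carrier)

    def gain(source):
        # Ratio of the source's band energy to the carrier's band energy.
        energy = sum(c * c for c in source[lo:hi])
        return math.sqrt(energy / carrier_energy) if carrier_energy > 0 else 0.0

    return carrier, gain(coeffs_a), gain(coeffs_b)

carrier, gain_a, gain_b = intensity_combine([1, 2, 3, 4], [1, 2, 0, 0], 0, 2)
```

Only the carrier and the two gains need to be coded for the merged band, instead of two full sets of coefficients.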
[0030]
The stereophonic signal bit allocation encoding unit 9 allocates a predetermined number of bits to the stereophonic signal intensity-synthesized by the stereophonic signal synthesis unit 7, encodes it, and outputs the result to the multiplexing unit 13 as encoded stereophonic data. The stereophonic signal bit allocation encoding unit 9 corresponds to the stereophonic signal bit allocation encoding means described in the claims.
[0031]
The sound source position data encoding unit 11 encodes the sound source position data and outputs it to the multiplexing unit 13 as encoded position data. The encoding method used by the sound source position data encoding unit 11 is similar to that of the position data encoder disclosed in Japanese Patent Laid-Open No. 2001-346297 (the position data of each sound source is encoded as auxiliary information). The sound source position data encoding unit 11 corresponds to the position data encoding means described in the claims.
[0032]
The multiplexing unit 13 multiplexes the encoded stereophonic data encoded by the stereophonic signal bit allocation encoding unit 9 and the encoded position data encoded by the sound source position data encoding unit 11, and outputs the multiplexed data to the multiplexed data stream generation output unit 15. The multiplexing unit 13 corresponds to the multiplexing means described in the claims.
[0033]
The multiplexed data stream generation / output unit 15 converts the multiplexed data multiplexed by the multiplexing unit 13 into a stream and outputs it as a multiplexed data stream. The multiplexed data stream generation / output unit 15 corresponds to the multiplexed data output means described in the claims.
[0034]
According to this stereophonic signal encoding apparatus 1, the filter analysis unit 3 performs time-frequency conversion on the acoustic signal input from each sound source and analyzes its frequency components, and the acoustic signal related information generation unit 5 generates acoustic signal related information, which is information on the combination of acoustic signals, based on the sound source position data indicating the position of the sound source of each acoustic signal. Subsequently, the stereophonic signal synthesis unit 7 synthesizes the frequency components of the acoustic signals into a stereophonic signal based on the acoustic signal related information, and the stereophonic signal bit allocation encoding unit 9 allocates predetermined bits to the synthesized stereophonic signal and encodes it into encoded stereophonic data. Further, the sound source position data encoding unit 11 encodes the sound source position data into encoded position data. Then, the multiplexing unit 13 multiplexes the encoded position data and the encoded stereophonic data into multiplexed data, and the multiplexed data stream generation output unit 15 converts the multiplexed data into a stream and outputs it as a multiplexed data stream. Because the acoustic signals are synthesized into the stereophonic signal based on acoustic signal related information generated with reference to the direction of each sound source and the distance difference between sound sources in the same direction, the sense of presence, such as the localization of the sound image, is not impaired. Moreover, even if the number of sound sources increases, the information on the sound sources is consolidated by the acoustic signal related information, so the amount of information of the acoustic signals to be transmitted can be suppressed.
[0035]
(Operation of Stereo Acoustic Signal Encoding Device [First Embodiment])
Next, the operation of the stereophonic signal encoding apparatus 1 will be described with reference to the flowchart shown in FIG. 2 (see FIG. 1 as appropriate).
[0036]
First, acoustic signals (A, B, C) are input to the filter analysis unit 3 of the stereophonic signal encoding device 1, and the filter analysis unit 3 performs frequency analysis to obtain the frequency components of the acoustic signals (S1). Further, sound source position data (Pa, Pb, Pc) corresponding to the acoustic signals (A, B, C) is input to the acoustic signal related information generation unit 5 of the stereophonic signal encoding device 1, and acoustic signal related information is generated based on this sound source position data (the sound source numbers and sound source coordinates) and the synthesized band conversion data recorded in the synthesized band conversion data recording means 21 (S2).
[0037]
Then, the stereophonic signal synthesis unit 7 intensity-synthesizes the frequency components converted by the filter analysis unit 3 into a stereophonic signal, based on the acoustic signal related information generated by the acoustic signal related information generation unit 5 (S3). The stereophonic signal bit allocation encoding unit 9 then allocates a predetermined number of bits to this stereophonic signal and encodes it into encoded stereophonic data (S4).
[0038]
Further, the sound source position data is encoded by the sound source position data encoding unit 11 to be encoded sound source position data (S5). Then, in the multiplexing unit 13, the encoded stereophonic data and the encoded sound source position data are multiplexed to obtain multiplexed data (S6). The multiplexed data multiplexed by the multiplexing unit 13 is streamed by the multiplexed data stream generation / output unit 15 and output as a multiplexed data stream (S7).
[0039]
(Configuration of Stereo Acoustic Signal Encoding Device [Second Embodiment])
Next, FIG. 3 shows a block diagram of the stereophonic signal encoding apparatus of the second embodiment. As shown in FIG. 3, the stereophonic signal encoding device 31, like the stereophonic signal encoding device 1, synthesizes the acoustic signals emitted from a plurality of sound sources in order to reproduce the directionality and sense of distance of a sound image, generates a stereophonic signal that conveys the three-dimensional sense of distance of the sound sources on reproduction, and then encodes it. It includes the filter analysis unit 3, the acoustic signal related information generation unit 5, the stereophonic signal synthesis unit 7, the sound source position data encoding unit 11, a filter synthesis unit 33, a decomposed acoustic signal encoding unit 35, and an output unit 37. Components identical to those of the stereophonic signal encoding device 1 are given the same names and reference numerals, and their description is omitted.
[0040]
The filter synthesis unit 33 decomposes the stereophonic signal synthesized by the stereoacoustic signal synthesis unit 7 and converts it into a decomposed acoustic signal. The decomposed acoustic signal is an acoustic signal suitable for an existing acoustic encoder, and no acoustic signal related information is attached to each acoustic signal. The filter synthesis unit 33 corresponds to the filter synthesis means described in the claims.
[0041]
The decomposed acoustic signal encoding unit 35 encodes the decomposed acoustic signal decomposed by the filter synthesizing unit 33 and outputs it to the output unit 37 as encoded decomposed acoustic data. The decomposed acoustic signal encoding unit 35 corresponds to the decomposed acoustic signal encoding means described in the claims.
[0042]
The output unit 37 outputs the encoded decomposed acoustic data encoded by the decomposed acoustic signal encoding unit 35 and the encoded position data encoded by the sound source position data encoding unit 11 to the outside. The output unit 37 corresponds to the data output means described in the claims.
[0043]
According to the stereophonic signal encoding device 31, the filter analysis unit 3 performs time-frequency conversion on the acoustic signal input from each sound source and analyzes its frequency components. The acoustic signal related information generation unit 5 generates acoustic signal related information, which is information on the combination of acoustic signals, based on the sound source position data indicating the position of the sound source of each acoustic signal. Subsequently, the stereophonic signal synthesis unit 7 synthesizes the frequency components of the acoustic signals into a stereophonic signal based on the acoustic signal related information, the filter synthesis unit 33 decomposes the stereophonic signal into a decomposed acoustic signal, and the decomposed acoustic signal encoding unit 35 encodes the decomposed acoustic signal into encoded decomposed acoustic data. Further, after the sound source position data encoding unit 11 converts the sound source position data into encoded position data, the output unit 37 outputs the encoded decomposed acoustic data and the encoded position data. Because the acoustic signals are synthesized into the stereophonic signal based on acoustic signal related information generated with reference to the direction of each sound source and the distance difference between sound sources in the same direction, the sense of presence, such as the localization of the sound image, is not impaired. Further, since the filter synthesis unit 33 decomposes the stereophonic signal into the decomposed acoustic signal, it can be transmitted in a format compatible with existing acoustic signal encoders.
[0044]
(Operation of Stereo Acoustic Signal Encoding Device [Second Embodiment])
Next, the operation of the stereophonic sound signal encoding device 31 will be described with reference to the flowchart shown in FIG. 4 (see FIG. 3 as appropriate).
[0045]
First, acoustic signals (A, B, C) are input to the filter analysis unit 3 of the stereophonic signal encoding device 31, and the filter analysis unit 3 performs time-frequency conversion to obtain the frequency components of the acoustic signals (S11). Further, sound source position data (Pa, Pb, Pc) corresponding to the acoustic signals (A, B, C) is input to the acoustic signal related information generation unit 5 of the stereophonic signal encoding device 31, and acoustic signal related information is generated based on this sound source position data (the sound source numbers and sound source coordinates) and the synthesized band conversion data recorded in the synthesized band conversion data recording means 21 (S12).
[0046]
Then, the stereophonic signal synthesis unit 7 intensity-synthesizes the frequency components of the acoustic signals converted by the filter analysis unit 3 into a stereophonic signal, based on the acoustic signal related information generated by the acoustic signal related information generation unit 5 (S13). The filter synthesis unit 33 then decomposes this stereophonic signal into a decomposed acoustic signal and outputs it to the decomposed acoustic signal encoding unit 35 (S14). Subsequently, the decomposed acoustic signal encoding unit 35 encodes the decomposed acoustic signal into encoded decomposed acoustic data (S15).
[0047]
Further, the sound source position data is encoded by the sound source position data encoding unit 11 to be encoded sound source position data (S16). The output unit 37 outputs the encoded and decomposed acoustic data and the encoded sound source position data (S17).
[0048]
(About sound source position data)
Next, sound source position data will be described with reference to FIG.
As shown in FIG. 5, the positions of the sound source a, the sound source b, the sound source c, and the sound source d (a plurality of sound source position data) are expressed in polar coordinates centered on the virtual listener (the origin). Each sound source position data consists of 16 bits in total: a 2-bit sound source number, a 7-bit distance-direction index, and a 7-bit left-right direction index. The distance-direction index divides the range from 0.5 m to 10 m into 128 steps, and the left-right direction index divides 360 degrees into 120 steps of 3 degrees each.
[0049]
As shown in FIG. 5, for example, when the distance-direction index and the left-right direction index together are “00011110001010”, the distance-direction index is “0001111”. Converted from binary to decimal this is 15, which gives {(10 − 0.5) / 128} × (15 + 1) ≈ 1.2 m. The left-right direction index is “0001010”, which is 10 in decimal and gives 3 × 10 = 30 degrees (30 degrees to the right of the listening position).
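The worked example above can be reproduced with a short sketch. The field layout (2-bit number, 7-bit distance index, 7-bit direction index, most significant bits first) follows the description; the function names are hypothetical, and the distance formula simply mirrors the arithmetic of the example.

```python
def unpack_position(word):
    """Split a 16-bit position word into
    (sound source number, distance index, direction index)."""
    return (word >> 14) & 0b11, (word >> 7) & 0x7F, word & 0x7F

def index_to_distance_m(index, d_min=0.5, d_max=10.0, steps=128):
    """Distance in metres, following {(10 - 0.5) / 128} x (index + 1)."""
    return (d_max - d_min) / steps * (index + 1)

def index_to_angle_deg(index, step_deg=3):
    """Direction in degrees, 3 degrees per step."""
    return step_deg * index

# Sound source number 0 followed by the 14 index bits from the example.
word = 0b00_0001111_0001010
number, distance_index, direction_index = unpack_position(word)
distance_m = index_to_distance_m(distance_index)   # 1.1875, i.e. about 1.2 m
angle_deg = index_to_angle_deg(direction_index)    # 30
```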
[0050]
That is, the sound source position data is expressed as 16-bit data: the sound source number identifying the sound source is expressed by the 2 bits on the most significant bit (MSB) side, the distance by the next 7 bits, and the direction by the following 7 bits. The left-right direction index of the sound source position data is extracted by the direction matching combination extraction means 17 of the acoustic signal related information generation unit 5 of the stereophonic signal encoding devices 1 and 31. That is, the 7 bits representing the left-right direction index (the positions including the least significant bit (LSB)) are extracted from all sound source position data, the left-right direction indexes are compared, and the sets of sound source numbers for which all bits match are extracted.
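A minimal sketch of this bit-level comparison, assuming the 16-bit layout described above. The helper `pack_position`, the function names, and the sample values are hypothetical and serve only to build test data.

```python
from collections import defaultdict

def pack_position(number, distance_index, direction_index):
    """Build a 16-bit position word: 2-bit number, 7-bit distance, 7-bit direction."""
    return ((number & 0b11) << 14) | ((distance_index & 0x7F) << 7) | (direction_index & 0x7F)

def matching_direction_sets(words):
    """Group sound source numbers whose 7 low-order (direction) bits all match.
    Returns {direction index: sorted list of sound source numbers}."""
    groups = defaultdict(list)
    for word in words:
        groups[word & 0x7F].append((word >> 14) & 0b11)
    return {d: sorted(nums) for d, nums in groups.items() if len(nums) > 1}

words = [
    pack_position(0, 15, 10),
    pack_position(1, 40, 10),
    pack_position(2, 15, 20),  # different direction, so not combined
    pack_position(3, 90, 10),
]
sets = matching_direction_sets(words)
```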
[0051]
Then, the sound source distance difference calculation means 19 extracts the 7 bits representing the distance-direction index from each sound source position data in a combination of sound source numbers having the same direction, takes their difference, and thereby calculates the distance difference between the sound sources having the same direction. Then, the acoustic signal related information generation output means 23 generates and outputs the acoustic signal related information (the frequency band to be synthesized) based on the set of sound source numbers, the distance difference, the sound source position data, and the synthesized band conversion data recorded in the synthesized band conversion data recording means 21.
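The index subtraction can be sketched as follows, again assuming the 16-bit layout of the sound source position data. The function name and the sample words are hypothetical; the result is expressed in index steps rather than metres.

```python
from itertools import combinations

def distance_index_differences(words):
    """words: 16-bit position words already known to share a direction index.
    Returns {(number_a, number_b): absolute difference of the distance indexes}."""
    result = {}
    for a, b in combinations(words, 2):
        number_a, number_b = (a >> 14) & 0b11, (b >> 14) & 0b11
        result[(number_a, number_b)] = abs(((a >> 7) & 0x7F) - ((b >> 7) & 0x7F))
    return result

# Three sources in direction index 10 with distance indexes 15, 40 and 90.
same_direction = [
    (0 << 14) | (15 << 7) | 10,
    (1 << 14) | (40 << 7) | 10,
    (3 << 14) | (90 << 7) | 10,
]
index_diffs = distance_index_differences(same_direction)
```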
[0052]
(About synthetic band conversion data)
Finally, the synthesized band conversion data will be described with reference to FIG.
As shown in FIG. 6, the synthesized band conversion data consists of one file for each distance of the nearer sound source (“distance of close sound source” in FIG. 6) in the set of sound source numbers to be synthesized (in FIG. 6, the distance difference is shown on the horizontal axis and the frequency on the vertical axis). Each file associates the distance difference between the sound sources with a frequency.
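One way to picture this data is as a set of small lookup tables, one per nearer-source distance. The sketch below is an assumption-heavy illustration: the table values are placeholders and are not taken from FIG. 6, the nearest-file and next-row selection rules are invented for the example, and how the returned frequency bounds the band to be synthesized is not stated in this section.

```python
import bisect

# Hypothetical contents: nearer-source distance (m) ->
# rows of (distance difference in m, frequency in Hz), sorted by distance difference.
CONVERSION_TABLE = {
    1.0: [(0.5, 8000.0), (1.0, 4000.0), (2.0, 2000.0)],
    2.0: [(0.5, 6000.0), (1.0, 3000.0), (2.0, 1500.0)],
}

def lookup_frequency(near_distance, distance_difference, table=CONVERSION_TABLE):
    """Pick the file whose nearer-source distance is closest, then the first row
    whose distance difference is not smaller than the requested one."""
    key = min(table, key=lambda d: abs(d - near_distance))
    rows = table[key]
    differences = [diff for diff, _ in rows]
    i = min(bisect.bisect_left(differences, distance_difference), len(rows) - 1)
    return rows[i][1]

frequency = lookup_frequency(near_distance=1.2, distance_difference=1.0)
```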
[0053]
Although the present invention has been described above based on embodiments, the present invention is not limited to them.
For example, the processing of each component of the stereophonic signal encoding devices 1 and 31 can be regarded as a stereophonic signal encoding program written in a general-purpose computer language, or the processing of each component can be regarded as an individual step of a stereophonic signal encoding method. In these cases, the same effects as those of the stereophonic signal encoding devices 1 and 31 can be obtained.
[0054]
[Effects of the invention]
According to the inventions described in claims 1, 3, and 4, the acoustic signals are synthesized into the stereophonic signal based on acoustic signal related information generated with reference to the direction of each sound source and the distance difference between sound sources in the same direction, so the sense of presence, such as the localization of the sound image, is not impaired. Moreover, even if the number of sound sources increases, the information on the sound sources is consolidated by the acoustic signal related information, so the amount of information of the acoustic signals to be transmitted can be suppressed.
[0055]
According to the invention described in claim 2, since the stereophonic signal is further decomposed into the decomposed acoustic signal, it can be transmitted in a format compatible with existing acoustic signal encoders.
[Brief description of the drawings]
FIG. 1 is a block diagram of a stereophonic sound signal encoding apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart for explaining the operation of the stereophonic signal encoding apparatus shown in FIG. 1.
FIG. 3 is a block diagram of a stereophonic sound signal encoding apparatus according to another embodiment of the present invention.
FIG. 4 is a flowchart for explaining the operation of the stereophonic signal encoding apparatus shown in FIG. 3.
FIG. 5 is a diagram illustrating sound source position data.
FIG. 6 is a diagram illustrating combined band conversion data.
[Explanation of symbols]
1, 31 Stereophonic signal encoding apparatus
3 Filter analysis section (filter analysis means)
5 Acoustic signal related information generating unit (acoustic signal related information generating means)
7 Stereophonic sound signal synthesis unit (stereophonic signal synthesis means)
9 Stereophonic signal bit allocation encoding unit (stereoacoustic signal bit allocation encoding means)
11 Sound source position data encoding unit (position data encoding means)
13 Multiplexer (Multiplexing means)
15 Multiplexed data stream generation output unit (multiplexed data output means)
17 Direction matching combination extraction means
19 Sound source distance difference calculation means
21 Synthetic band conversion data recording means
23 Sound signal related information generating / outputting means
33 Filter synthesis unit (filter synthesis means)
35 Decomposed acoustic signal encoding unit (decomposed acoustic signal encoding means)
37 Output section (data output means)

Claims

A stereophonic sound signal encoding device that generates a stereophonic signal that produces a three-dimensional sound effect upon reproduction by synthesizing a plurality of sound signals and encodes the stereoacoustic signal,
Filter analysis means for performing time-frequency conversion of the acoustic signal and analyzing a frequency component of the acoustic signal;
Direction matching combination extraction means for extracting, using the sound source number, the index in the distance direction, and the index in the left-right direction of sound source position data indicating the position of the sound source of the acoustic signal, combinations of the sound source numbers whose directions match according to the index in the left-right direction;
A sound source distance difference calculating means for calculating a distance difference between sound sources related to the sound source numbers extracted by the direction matching combination extracting means by taking a difference of indexes in the distance direction;
Synthetic band conversion data recording means for pre-recording synthetic band conversion data that associates the distance difference between the sound sources and the frequency;
Acoustic signal related information generation output means for generating and outputting acoustic signal related information that indicates the frequency band of the acoustic signals to be synthesized and that includes a frequency corresponding to the distance difference between the sound sources calculated by the sound source distance difference calculating means and to the distance difference in the synthetic band conversion data, and the combination of the sound source numbers extracted by the direction matching combination extraction means;
Stereophonic signal synthesizing means for producing a stereophonic signal by synthesizing the frequency bands of the acoustic signals belonging to the combination of the sound source numbers indicated by the acoustic signal related information generated by the acoustic signal related information generation output means;
Stereophonic signal bit allocation encoding means for producing encoded stereophonic data by allocating predetermined bits to, and encoding, the stereophonic signal synthesized by the stereophonic signal synthesizing means;
Position data encoding means for encoding the sound source position data into encoded position data;
Multiplexing means for making multiplexed data obtained by multiplexing the encoded position data and the encoded stereophonic data;
Multiplexed data output means for outputting multiplexed data multiplexed by the multiplexing means;
A stereophonic signal encoding device comprising:

A stereophonic sound signal encoding device that generates a stereophonic signal that produces a three-dimensional sound effect upon reproduction by synthesizing a plurality of sound signals and encodes the stereoacoustic signal,
Filter analysis means for performing time-frequency conversion of the acoustic signal and analyzing a frequency component of the acoustic signal;
Direction matching combination extraction means for extracting, using the sound source number, the index in the distance direction, and the index in the left-right direction of sound source position data indicating the position of the sound source of the acoustic signal, combinations of the sound source numbers whose directions match according to the index in the left-right direction;
A sound source distance difference calculating means for calculating a distance difference between sound sources related to the sound source numbers extracted by the direction matching combination extracting means by taking a difference of indexes in the distance direction;
Synthetic band conversion data recording means for pre-recording synthetic band conversion data that associates the distance difference between the sound sources and the frequency;
Acoustic signal related information generation output means for generating and outputting acoustic signal related information that indicates the frequency band of the acoustic signals to be synthesized and that includes a frequency corresponding to the distance difference between the sound sources calculated by the sound source distance difference calculating means and to the distance difference in the synthetic band conversion data, and the combination of the sound source numbers extracted by the direction matching combination extraction means;
Stereophonic signal synthesizing means for producing a stereophonic signal by synthesizing the frequency bands of the acoustic signals belonging to the combination of the sound source numbers indicated by the acoustic signal related information generated by the acoustic signal related information generation output means;
Filter synthesizing means for decomposing the stereophonic signal synthesized by the stereophonic signal synthesizing means into a decomposed acoustic signal;
Decomposed acoustic signal encoding means for producing encoded decomposition acoustic data by encoding the decomposed acoustic signal decomposed by the filter synthesizing means;
Position data encoding means for encoding the sound source position data into encoded position data;
Data output means for outputting the encoded decomposed acoustic data and the encoded position data;
A stereophonic signal encoding device characterized by comprising the above means.
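The direction matching, distance difference, and band lookup elements recited above can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `SourcePosition`, `BAND_TABLE`, and `related_info`, the table values, and the use of hertz are all assumptions made for illustration only.

```python
from itertools import combinations
from typing import NamedTuple


class SourcePosition(NamedTuple):
    number: int        # sound source number
    distance_idx: int  # index in the distance direction
    lr_idx: int        # index in the left-right direction


# Hypothetical synthesis band conversion data: distance difference -> frequency.
# The actual values and units are not given in this section.
BAND_TABLE = {0: 0, 1: 2000, 2: 4000, 3: 8000}


def matching_pairs(positions):
    """Pairs of sources whose left-right indexes are equal (same direction)."""
    return [(a, b) for a, b in combinations(positions, 2) if a.lr_idx == b.lr_idx]


def distance_difference(a, b):
    """Difference of the distance-direction indexes of two sources."""
    return abs(a.distance_idx - b.distance_idx)


def related_info(positions, table=BAND_TABLE):
    """Build one record per direction-matched pair found in the table."""
    info = []
    for a, b in matching_pairs(positions):
        diff = distance_difference(a, b)
        if diff in table:
            info.append({"sources": (a.number, b.number),
                         "frequency_hz": table[diff]})
    return info
```

With three sources, of which only sources 1 and 2 share a left-right index, `related_info` returns a single record for the pair (1, 2) carrying the frequency that the assumed table associates with their distance difference.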

A stereophonic signal encoding method for generating, by synthesizing a plurality of acoustic signals, a stereophonic signal that produces a three-dimensional sound effect upon reproduction, and for encoding the stereophonic signal,
A filter analysis step of applying a time-frequency transform to the acoustic signals to analyze their frequency components;
A direction matching combination extracting step of extracting, using the sound source number, the distance-direction index, and the left-right-direction index of sound source position data indicating the position of the sound source of each acoustic signal, combinations of sound source numbers whose directions match according to the left-right-direction index;
A sound source distance difference calculating step of calculating the distance difference between the sound sources having the sound source numbers extracted in the direction matching combination extracting step, by taking the difference between their distance-direction indexes;
An acoustic signal related information generation and output step of generating and outputting acoustic signal related information that indicates the frequency band of the acoustic signals to be synthesized, and that includes the frequency corresponding to the distance difference between sound sources calculated in the sound source distance difference calculating step and to the distance difference in synthesis band conversion data, in which the distance difference between sound sources is associated with a frequency, together with the combination of sound source numbers extracted in the direction matching combination extracting step;
A stereophonic signal synthesizing step of producing a stereophonic signal by synthesizing the frequency bands of the acoustic signals belonging to the combination of sound source numbers indicated by the acoustic signal related information generated in the acoustic signal related information generation and output step;
A stereophonic signal bit allocation encoding step of allocating predetermined bits to the stereophonic signal synthesized in the stereophonic signal synthesizing step and encoding it into encoded stereophonic data;
A position data encoding step of encoding the sound source position data into encoded position data;
A multiplexing step of multiplexing the encoded position data and the encoded stereophonic data into multiplexed data;
A multiplexed data output step of outputting the multiplexed data produced in the multiplexing step;
A stereophonic signal encoding method characterized by comprising the above steps.
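The synthesizing step can be pictured with the following Python sketch, which shows one possible reading only. This section does not state how the frequency bands are combined, so the sketch assumes that transform coefficients at or above the looked-up frequency are summed into a single shared band, while the second source keeps only its lower coefficients. The function name `merge_bands`, the parameter `bin_width_hz`, and the summing rule itself are hypothetical.

```python
import math


def merge_bands(coeffs_a, coeffs_b, cutoff_hz, bin_width_hz):
    """Merge two direction-matched sources above an assumed cutoff frequency.

    coeffs_a, coeffs_b: equal-length lists of transform coefficients
    (for example DCT coefficients), one per frequency bin.
    Returns (merged, residual_b): the first source with the shared upper
    band added in, and the lower band that the second source keeps.
    """
    if len(coeffs_a) != len(coeffs_b):
        raise ValueError("coefficient lists must have the same length")
    # Index of the first bin that belongs to the shared (merged) band.
    first_shared = min(len(coeffs_a), math.ceil(cutoff_hz / bin_width_hz))
    merged = list(coeffs_a)
    for k in range(first_shared, len(coeffs_a)):
        merged[k] = coeffs_a[k] + coeffs_b[k]  # assumed rule: simple sum
    residual_b = list(coeffs_b[:first_shared])
    return merged, residual_b
```

Under this assumption, the lower the cutoff returned by the synthesis band conversion data, the more bins the two sources share and the fewer coefficients remain to be encoded separately.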

To generate, by synthesizing a plurality of acoustic signals, a stereophonic signal that produces a three-dimensional sound effect upon reproduction, and to encode the stereophonic signal, a computer comprising synthesis band conversion data recording means, in which synthesis band conversion data associating the distance difference between sound sources with a frequency is recorded in advance,
Filter analysis means for applying a time-frequency transform to the acoustic signals to analyze their frequency components;
Direction matching combination extraction means for extracting, using the sound source number, the distance-direction index, and the left-right-direction index of sound source position data indicating the position of the sound source of each acoustic signal, combinations of sound source numbers whose directions match according to the left-right-direction index;
Sound source distance difference calculating means for calculating the distance difference between the sound sources having the sound source numbers extracted by the direction matching combination extraction means, by taking the difference between their distance-direction indexes;
Acoustic signal related information generation and output means for generating and outputting acoustic signal related information that indicates the frequency band of the acoustic signals to be synthesized, and that includes the frequency corresponding to the distance difference between sound sources calculated by the sound source distance difference calculating means and to the distance difference in the synthesis band conversion data, together with the combination of sound source numbers extracted by the direction matching combination extraction means;
Stereophonic signal synthesizing means for producing a stereophonic signal by synthesizing the frequency bands of the acoustic signals belonging to the combination of sound source numbers indicated by the acoustic signal related information generated by the acoustic signal related information generation and output means;
Stereophonic signal bit allocation encoding means for allocating predetermined bits to the stereophonic signal synthesized by the stereophonic signal synthesizing means and encoding it into encoded stereophonic data;
Position data encoding means for encoding the sound source position data into encoded position data;
Multiplexing means for multiplexing the encoded position data and the encoded stereophonic data into multiplexed data;
Multiplexed data output means for outputting the multiplexed data produced by the multiplexing means;
A stereophonic signal encoding program characterized by causing the computer to function as the above means.
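The multiplexing and output elements can be sketched as follows. The claims do not define a container format, so the length-prefixed layout below (two big-endian 32-bit lengths followed by the two payloads) and the function names `multiplex` and `demultiplex` are assumptions chosen purely to show the idea of combining the encoded position data and the encoded stereophonic data into one unit.

```python
import struct

# Assumed header: two unsigned 32-bit big-endian lengths.
_HEADER = ">II"


def multiplex(encoded_position: bytes, encoded_audio: bytes) -> bytes:
    """Pack encoded position data and encoded stereophonic data into one frame."""
    header = struct.pack(_HEADER, len(encoded_position), len(encoded_audio))
    return header + encoded_position + encoded_audio


def demultiplex(frame: bytes):
    """Recover the two payloads from a frame produced by multiplex()."""
    pos_len, audio_len = struct.unpack_from(_HEADER, frame, 0)
    start = struct.calcsize(_HEADER)
    position = frame[start:start + pos_len]
    audio = frame[start + pos_len:start + pos_len + audio_len]
    return position, audio
```

A receiver that knows the same assumed layout can split each frame back into position data and audio data before decoding them separately.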