JP6296072B2 - Sound reproduction apparatus and program

Info

Publication number: JP6296072B2
Application number: JP2016016322A
Authority: JP
Inventors: 一浩片桐
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2016-01-29
Filing date: 2016-01-29
Publication date: 2018-03-20
Anticipated expiration: 2036-01-29
Also published as: JP2017135669A

Description

The present invention relates to a sound reproduction device and a program, and can be applied to reproducing acoustic signals three-dimensionally from stereo speakers.

Binaural reproduction using head-related transfer functions (HRTFs) is a conventional technique for localizing a sound image in an arbitrary direction and reproducing the sense of presence of actually being there. An HRTF is the transfer characteristic of sound from a sound source to the ear; it is measured by attaching dedicated microphones to the ears of a person or a dummy head and placing the sound source in various directions. In binaural reproduction, the sound source is convolved with the HRTF for the direction in which it is to be localized, converting it into a binaural sound source, and a stereophonic effect is produced by playing the result through headphones or earphones. However, when a binaural sound source is played directly through speakers, a sufficient stereophonic effect is no longer obtained. The right-ear binaural sound source needs to reach only the right ear, but when it is played from a speaker it reaches the left ear as well as the right ear; likewise, the left-ear binaural sound source played from the left speaker reaches the right ear as well as the left ear. This phenomenon is called crosstalk, and it impairs the stereophonic effect.
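The binaural conversion described above can be sketched as two convolutions of a monaural signal with a pair of head-related impulse responses (the time-domain form of the HRTFs). This is a minimal illustration only: the function names and the three-tap toy impulse responses are assumptions made for the example, not measured data and not part of the patent.

```python
from typing import List, Sequence, Tuple


def convolve(signal: Sequence[float], impulse_response: Sequence[float]) -> List[float]:
    """Direct-form linear convolution; output length is len(signal) + len(ir) - 1."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def render_binaural(
    mono: Sequence[float],
    hrir_left: Sequence[float],
    hrir_right: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Convolve one mono source with the left-ear and right-ear impulse responses."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy (hypothetical) impulse responses for a source on the listener's right:
# the right ear receives the sound earlier and louder than the left ear.
hrir_right = [1.0, 0.0, 0.0]
hrir_left = [0.0, 0.0, 0.5]
mono = [1.0, -1.0, 0.5]

left, right = render_binaural(mono, hrir_left, hrir_right)
```

Played over headphones, `left` and `right` each reach only the intended ear; over speakers both signals reach both ears, which is the crosstalk problem the text goes on to describe.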

A technique called transaural reproduction has conventionally been used to obtain the same effect as binaural reproduction with speaker playback (see Non-Patent Document 1 and Patent Document 1). In this conventional method, the room transfer functions from each speaker to both ears are measured, the transfer functions are convolved with the binaural sound source, and a filter is designed that cancels only the resulting crosstalk component. The filter is then applied to the sound source to be localized, and the result is played from the speakers. The crosstalk component is thereby cancelled at the listener's ears, so that only the respective left and right binaural sound sources reach each ear, and the same stereophonic effect as binaural reproduction is obtained. With the transaural reproduction technique described in Non-Patent Document 1, speaker playback can thus cancel only the crosstalk component and obtain a stereophonic effect, as in binaural reproduction.

However, conventional transaural reproduction has the problem that the listening position at which the stereophonic effect is obtained (the sweet spot) is narrow. If the listener moves their head forward, backward, left, or right, or changes the direction of their face even slightly, the stereophonic effect is immediately lost. To address this, Patent Document 1 proposes a method that provides three or more speakers, uses a camera to continuously analyze the position and orientation of the listener's face, selects the speakers and transfer functions suited to the listener at each moment, and performs transaural reproduction.

Patent Document 1: JP 2013-110633 A

Non-Patent Document 1: W. G. Gardner, "3-D Audio Using Loudspeakers", Springer, 1997, [Online], Internet, [retrieved January 28, 2016], <URL: sound.media.mit.edu/Papers/gardner_thesis.pdf>

However, the method described in Patent Document 1 requires a separate device to detect the position and orientation of the listener's face in real time. Moreover, because three or more speakers and a camera are required, the system becomes large-scale, which limits the places where the stereophonic effect can be experienced.

There is therefore a demand for a sound reproduction device and program that use a plurality of speakers as sound sources and accurately localize a sound source in a predetermined direction.

A first aspect of the present invention is a sound reproduction device that applies stereophonic processing to an input acoustic signal to generate stereophonic signals to be supplied to each of a plurality of speakers, the device comprising: (1) a head-related transfer function holding unit that holds head-related transfer functions corresponding to the direction of each sound source; (2) an information acquisition unit that acquires at least information on a sound image localization direction in which a sound source is to be localized; (3) a first stereophonic signal generation unit that uses the head-related transfer function held by the head-related transfer function holding unit to generate a first stereophonic signal in which the sound source is localized in the sound image localization direction; (4) a crosstalk cancellation filter holding unit that holds, for each of the speakers, a crosstalk cancellation filter for removing a crosstalk component from the first stereophonic signal, the filter being based on a parameter corresponding to the sound image localization direction acquired by the information acquisition unit; and (5) a second stereophonic signal generation unit that, for each of the speakers, uses the crosstalk cancellation filter held by the crosstalk cancellation filter holding unit to remove the crosstalk component from the first stereophonic signal and generate a second stereophonic signal.

A second aspect of the present invention is a sound reproduction program that causes a computer mounted in a sound reproduction device, which applies stereophonic processing to an input acoustic signal to generate stereophonic signals to be supplied to each of a plurality of speakers, to function as: (1) a head-related transfer function holding unit that holds head-related transfer functions corresponding to the direction of each sound source; (2) an information acquisition unit that acquires at least information on a sound image localization direction in which a sound source is to be localized; (3) a first stereophonic signal generation unit that uses the head-related transfer function held by the head-related transfer function holding unit to generate a first stereophonic signal in which the sound source is localized in the sound image localization direction; (4) a crosstalk cancellation filter holding unit that holds, for each of the speakers, a crosstalk cancellation filter for removing a crosstalk component from the first stereophonic signal, the filter being based on a parameter corresponding to the sound image localization direction acquired by the information acquisition unit; and (5) a second stereophonic signal generation unit that, for each of the speakers, uses the crosstalk cancellation filter held by the crosstalk cancellation filter holding unit to remove the crosstalk component from the first stereophonic signal and generate a second stereophonic signal.

According to the present invention, it is possible to provide a sound reproduction device and program that use a plurality of speakers as sound sources and accurately localize a sound source in a predetermined direction.

A block diagram showing the functional configuration of the sound reproduction device according to the first embodiment.
An explanatory diagram showing the positional relationship between the speakers and the user, and the configuration of the transfer functions, in the first embodiment.
An explanatory diagram showing the configuration of the experimental environment used to verify the effect of the sound reproduction device according to the first embodiment.
An explanatory diagram (part 1) showing the experimental results that verified the effect of the sound reproduction device according to the first embodiment.
An explanatory diagram (part 2) showing the experimental results that verified the effect of the sound reproduction device according to the first embodiment.
A block diagram showing the functional configuration of the sound reproduction device according to the second embodiment.
A block diagram showing the functional configuration of the sound reproduction device according to the third embodiment.
An explanatory diagram showing an example of processing that replaces a head-related transfer function with a transfer function between a speaker and the user in the sound reproduction device according to the third embodiment.

(A) First Embodiment
A first embodiment of the sound reproduction device and program according to the present invention is described in detail below with reference to the drawings.

(A-1) Configuration of the First Embodiment
FIG. 1 is a block diagram showing the overall configuration of the sound reproduction device 10 of this embodiment.

The sound reproduction device 10 of this embodiment includes a data input unit 11, a position information acquisition unit 12, a filter forming unit 13, an HRTF holding unit 14, a transfer function holding unit 15, a stereophonic processing unit 16, an output unit 17, and two speakers (a right speaker 18R and a left speaker 18L).

The sound reproduction device 10 may be implemented entirely in hardware (for example, a dedicated semiconductor chip), or some or all of its arithmetic processing (data processing and signal processing) may be implemented in software. For example, every component of the sound reproduction device 10 other than the speakers may be realized by installing the sound reproduction program of the embodiment on a program execution platform (a computer) having a processor and memory.

The sound reproduction device 10 generates a signal (hereinafter, the "stereophonic signal") by applying stereophonic processing to an input acoustic signal (hereinafter, the "input acoustic signal"). In this embodiment, the sound reproduction device 10 outputs the generated stereophonic signal from the two speakers (right speaker 18R and left speaker 18L) to a user U, who is the listener. The manner in which the sound reproduction device 10 outputs the stereophonic signal is not limited; for example, it may output the signal as digital audio data (for example, by writing it to a predetermined data recording medium or transmitting it over a communication link).

FIG. 2 is an explanatory diagram showing the positional relationship between the speakers (right speaker 18R and left speaker 18L) and the user U as seen from above, and the transfer functions on the paths between the speakers and the user U.

As shown in FIG. 2, the right speaker 18R is placed on the right side as seen from the user U, and the left speaker 18L is placed on the left side as seen from the user U.

The sound reproduction device 10 is used, for example, in a conference terminal of a video conference system or the like, where it applies stereophonic processing to an acoustic signal captured by a remote microphone (not shown), that is, the far-end speaker's voice signal, and outputs the result. For example, as shown in FIG. 2, by outputting sound that has been localized according to the image shown on a display D placed in front of the user U (for example, an image captured by a remote camera), such as sound localized at the position of the far-end speaker shown on the display D, the device can present the user U with realistic, stereophonic sound.

In this embodiment, the sound reproduction device 10 holds the direction in which the sound source of the input acoustic signal is to be localized (hereinafter, the "sound image localization direction"), position information of the user U (hereinafter also "user position information"), and position information of each speaker (hereinafter also "speaker position information"). It performs stereophonic processing that localizes the sound source of the input acoustic signal in the specified sound image localization direction and emits the sound toward the user U. The specific method, input timing, and data format by which the device holds the sound image localization direction, the user position information, and the speaker position information are not limited. For example, the sound reproduction device 10 may accept input of each of these items on a conference terminal of a conference system (not shown) and output sound based on the input acoustic signal. The use of the sound reproduction device 10 is not limited to conference terminals.

The data input unit 11 converts the input acoustic signal from an analog signal to a digital signal (A/D conversion). In this embodiment the input acoustic signal is described as an analog signal, but it may also be a digital signal. When the input acoustic signal is digital, the data input unit 11 outputs the digital signal to the subsequent stage either as is or after data conversion. For example, when the input acoustic signal arrives in packets under a protocol such as RTP (Real-time Transport Protocol), the data input unit 11 may buffer the incoming packets and output them as continuous audio data (for example, audio data in PCM (Pulse Code Modulation) format). In the following, the audio data output by the data input unit 11 is assumed to be monaural (one channel).
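As a rough illustration of the packet buffering mentioned above, the sketch below reorders packets by sequence number and releases PCM bytes only once they are contiguous. It is a simplified assumption, not the patent's implementation: the `PacketBuffer` class is hypothetical, packets are modelled as plain `(sequence number, payload)` pairs rather than parsed RTP headers, and 16-bit sequence wraparound and loss concealment are ignored.

```python
from typing import Dict


class PacketBuffer:
    """Reorders packets by sequence number and emits contiguous PCM bytes."""

    def __init__(self, first_seq: int) -> None:
        self._next_seq = first_seq            # next sequence number to release
        self._pending: Dict[int, bytes] = {}  # packets that arrived early

    def push(self, seq: int, payload: bytes) -> bytes:
        """Store one packet and return whatever PCM data is now contiguous."""
        if seq >= self._next_seq:             # ignore late or duplicate packets
            self._pending[seq] = payload
        out = bytearray()
        while self._next_seq in self._pending:
            out += self._pending.pop(self._next_seq)
            self._next_seq += 1
        return bytes(out)


buf = PacketBuffer(first_seq=10)
early = buf.push(11, b"cd")    # arrives out of order, so it is held back
in_order = buf.push(10, b"ab") # fills the gap, so both payloads are released
```

A real receiver would also bound how long it waits for a missing packet before moving on.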

The input acoustic signal supplied to the data input unit 11 is not limited to real-time data. The data input unit 11 may instead read data recorded in advance on a data recording medium (not shown; for example, a hard disk drive or flash memory), that is, offline data, and output it to the subsequent stage as a continuous digital acoustic signal.

The position information acquisition unit 12 acquires the information needed for stereophonic processing of the input acoustic signal and supplies it to the filter forming unit 13. Specifically, it acquires the sound image localization direction, the user position information, and the speaker position information. The method and timing of acquisition are not limited. For example, the position information acquisition unit 12 may acquire the information in real time, may acquire preset data, or may acquire (update) at predetermined timing information recorded in advance on a data recording medium (not shown), such as information paired with the input acoustic signal data.

Next, examples of the information acquired by the position information acquisition unit 12 are described with reference to FIG. 2.

In this embodiment, the position information acquisition unit 12 holds, as the user position information, at least the position P_eR of the right ear e_R and the position P_eL of the left ear e_L of the user U. Instead of acquiring the positions P_eR and P_eL directly, the position information acquisition unit 12 may, for example, acquire the center position P_U of the head of the user U together with the direction the user U is facing, and estimate (calculate) the ear positions P_eR and P_eL from them. The method by which the position information acquisition unit 12 acquires the ear positions P_eR and P_eL as the user position information is thus not limited.
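One way the ear positions could be estimated from the head center P_U and the facing direction is sketched below. The geometry is an assumption made for illustration: a 2D top-down coordinate system, a facing angle measured counterclockwise from the +x axis, and ears placed symmetrically at a hypothetical half-spacing of 0.0875 m on the axis perpendicular to the facing direction. The patent does not prescribe any of these choices.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def estimate_ear_positions(
    p_u: Point,
    facing_rad: float,
    half_ear_spacing_m: float = 0.0875,  # hypothetical default, not from the patent
) -> Tuple[Point, Point]:
    """Return (P_eR, P_eL) estimated from head center P_U and facing direction."""
    cx, cy = p_u
    # Unit vector toward the user's right: the facing vector (cos, sin)
    # rotated by -90 degrees, which is (sin, -cos).
    right_x = math.sin(facing_rad)
    right_y = -math.cos(facing_rad)
    p_er = (cx + half_ear_spacing_m * right_x, cy + half_ear_spacing_m * right_y)
    p_el = (cx - half_ear_spacing_m * right_x, cy - half_ear_spacing_m * right_y)
    return p_er, p_el


# A user at the origin facing the +y direction has the right ear on the +x side.
p_er, p_el = estimate_ear_positions((0.0, 0.0), math.pi / 2, 0.1)
```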

Also in this embodiment, the position information acquisition unit 12 acquires, as the speaker position information, the position P_SR of the right speaker 18R and the position P_SL of the left speaker 18L (positions as seen from above, as in FIG. 2). The positions P_SR and P_SL are taken to be the center positions of the respective speakers (right speaker 18R and left speaker 18L) as seen from above.

In the following, as shown in FIG. 2, the sound image localization direction is expressed on a two-dimensional plane (the plane seen from above, that is, the horizontal plane as seen from the user U), and the angle between the direction the user U is facing and the sound image localization direction is denoted θ_S. The sound image localization direction input to the sound reproduction device 10 may also include a vertical component as seen from the user U.

The HRTF holding unit 14 holds the HRTFs corresponding to each sound source direction (sound image localization direction θ_S) and supplies them to the filter forming unit 13. The HRTF holding unit 14 may hold HRTFs in advance at a predetermined grid spacing (for example, every 1°), or may calculate an HRTF each time one is requested by the filter forming unit 13. The HRTF data itself may be obtained by any of various HRTF calculation methods. In this embodiment, the HRTF holding unit 14 holds a right-ear HRTF and a left-ear HRTF for each sound source direction. The HRTF data held by the HRTF holding unit 14 may be the same as that used in various binaural and transaural reproduction processes.
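A table held at a fixed grid spacing could be queried as in the sketch below, which snaps a requested direction θ_S to the nearest grid point. The `HrtfTable` class, the nearest-neighbour rule, and the one-tap placeholder responses are all assumptions for illustration; a real table would hold measured data, and interpolating between grid points is an equally valid choice.

```python
import math
from typing import Dict, List, Tuple

# (right-ear response, left-ear response) for one direction
HrtfPair = Tuple[List[float], List[float]]


class HrtfTable:
    """Holds one HRTF pair per grid direction and looks up the nearest one."""

    def __init__(self, grid_deg: int, table: Dict[int, HrtfPair]) -> None:
        self.grid_deg = grid_deg
        self.table = table

    def lookup(self, theta_s_deg: float) -> HrtfPair:
        # Round to the nearest grid step, then wrap into [0, 360).
        steps = math.floor(theta_s_deg / self.grid_deg + 0.5)
        key = (steps * self.grid_deg) % 360
        return self.table[key]


# Placeholder data on a coarse 90-degree grid (hypothetical values).
hrtfs = HrtfTable(
    grid_deg=90,
    table={
        0: ([1.0], [1.0]),
        90: ([0.9], [0.4]),
        180: ([0.6], [0.6]),
        270: ([0.4], [0.9]),
    },
)
right_hrtf, left_hrtf = hrtfs.lookup(80.0)  # snaps to the 90-degree entry
```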

The transfer function holding unit 15 holds the transfer functions corresponding to the positional relationship between the user U and each speaker (right speaker 18R and left speaker 18L) and supplies them to the filter forming unit 13.

Next, the transfer functions held by the transfer function holding unit 15 are described with reference to FIG. 2.

In the following, the transfer function between the position P_SR of the right speaker 18R and the position P_eR of the right ear of the user U (the right speaker to right ear path) is denoted G_RR. The transfer function between the position P_SR of the right speaker 18R and the position P_eL of the left ear of the user U (the right speaker to left ear path) is denoted G_RL. The transfer function between the position P_SL of the left speaker 18L and the position P_eR of the right ear of the user U (the left speaker to right ear path) is denoted G_LR. The transfer function between the position P_SL of the left speaker 18L and the position P_eL of the left ear of the user U (the left speaker to left ear path) is denoted G_LL.

In this embodiment, the transfer function holding unit 15 holds the transfer functions G_RR, G_RL, G_LR, and G_LL and supplies them to the filter forming unit 13. It may hold them in advance, or may calculate them in response to a request from the filter forming unit 13 (for example, from the user position information and the speaker position information). The method of calculating G_RR, G_RL, G_LR, and G_LL is not limited, and various transfer functions can be used. The transfer functions held by the transfer function holding unit 15 may be the same as those used in various transaural processes.
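Since the calculation method is left open, the sketch below shows one very simple possibility: a free-field point-source model in which each path contributes only a propagation delay and 1/r attenuation. This is an assumption chosen for brevity; it ignores head shadowing and room reflections, and the function names are hypothetical. Measured room transfer functions would normally replace it.

```python
import cmath
import math
from typing import Dict, Tuple

Point = Tuple[float, float]
SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air


def free_field_transfer(src: Point, dst: Point, omega: float) -> complex:
    """Delay-and-attenuation model G(omega) = exp(-j*omega*r/c) / r (requires r > 0)."""
    r = math.hypot(dst[0] - src[0], dst[1] - src[1])
    return cmath.exp(-1j * omega * r / SPEED_OF_SOUND_M_S) / r


def speaker_to_ear_transfers(
    p_sr: Point, p_sl: Point, p_er: Point, p_el: Point, omega: float
) -> Dict[str, complex]:
    """First subscript is the speaker, second the ear, as in the text."""
    return {
        "G_RR": free_field_transfer(p_sr, p_er, omega),
        "G_RL": free_field_transfer(p_sr, p_el, omega),
        "G_LR": free_field_transfer(p_sl, p_er, omega),
        "G_LL": free_field_transfer(p_sl, p_el, omega),
    }


# Speakers 1 m in front and 1 m to either side; ears 0.1 m from the head center.
g = speaker_to_ear_transfers(
    p_sr=(1.0, 1.0), p_sl=(-1.0, 1.0),
    p_er=(0.1, 0.0), p_el=(-0.1, 0.0),
    omega=2.0 * math.pi * 1000.0,
)
```

Under this model the same-side paths (G_RR, G_LL) are shorter and therefore stronger than the crosstalk paths (G_RL, G_LR).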

The filter forming unit 13 holds the filters that the downstream stereophonic processing unit 16 needs for stereophonic processing (transaural processing).

The filter forming unit 13 acquires from the HRTF holding unit 14 the HRTF corresponding to the sound image localization direction (the filter needed for binaural processing of the input acoustic signal C_i) and supplies it to the stereophonic processing unit 16. Based on the user position information and the speaker position information, the filter forming unit 13 also generates a filter that cancels the crosstalk component from the binaural sound source (the acoustic signal to which the stereophonic processing unit 16 has applied binaural processing based on the HRTF data); this filter is hereinafter called the "crosstalk cancellation filter". The crosstalk cancellation filter generated by the filter forming unit 13 is described in detail later.

The stereophonic processing unit 16 uses the filters supplied from the filter forming unit 13 (the HRTF and the crosstalk cancellation filter) to apply transaural processing to the input acoustic signal C_i, generates an acoustic signal serving as a transaural sound source, and supplies it to the output unit 17.

The stereophonic processing unit 16 first applies binaural processing to the input acoustic signal C_i based on the HRTF, generating an acoustic signal that serves as a binaural sound source. It then applies (convolves) the crosstalk cancellation filter to the generated binaural sound source, generates an acoustic signal serving as a transaural sound source, and supplies it to the output unit 17.
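The two-stage chain can be sketched in the frequency domain, where each convolution becomes a per-bin multiplication: the HRTF pair turns the input spectrum into a binaural pair, and a per-channel crosstalk cancellation filter turns that into the speaker signals. The function names and the two-bin toy spectra are assumptions for illustration; the filter values are placeholders and do not come from the patent.

```python
from typing import List, Sequence, Tuple

Spectrum = Sequence[complex]


def apply_filter(spectrum: Spectrum, filt: Spectrum) -> List[complex]:
    """Per-bin multiplication, the frequency-domain counterpart of convolution."""
    if len(spectrum) != len(filt):
        raise ValueError("spectrum and filter must have the same number of bins")
    return [f * x for f, x in zip(filt, spectrum)]


def transaural_process(
    c_i: Spectrum,
    hrtf_r: Spectrum, hrtf_l: Spectrum,
    xtc_r: Spectrum, xtc_l: Spectrum,
) -> Tuple[List[complex], List[complex]]:
    """Stage 1: binaural pair from the input. Stage 2: crosstalk cancellation per channel."""
    b_r = apply_filter(c_i, hrtf_r)   # right-ear binaural sound source
    b_l = apply_filter(c_i, hrtf_l)   # left-ear binaural sound source
    t_r = apply_filter(b_r, xtc_r)    # signal for the right speaker
    t_l = apply_filter(b_l, xtc_l)    # signal for the left speaker
    return t_r, t_l


# Two-bin toy spectra (hypothetical values).
t_r, t_l = transaural_process(
    c_i=[1 + 0j, 2 + 0j],
    hrtf_r=[1 + 0j, 0.5 + 0j], hrtf_l=[0.5 + 0j, 1j],
    xtc_r=[2 + 0j, 2 + 0j], xtc_l=[1 + 0j, -1 + 0j],
)
```

A block-based implementation would transform each audio frame, apply these products, and transform back with overlap handling, which is omitted here.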

In the following, the right-ear binaural sound source is called B_R and the left-ear binaural sound source is called B_L. The sound source (acoustic signal) for the right speaker 18R obtained by transaural processing in the stereophonic processing unit 16 is called T_R, and the sound source (acoustic signal) for the left speaker 18L obtained by transaural processing is called T_L.

When the input acoustic signal C_i consists of the acoustic signals of a plurality of sound sources, the stereophonic processing unit 16 may generate a transaural sound source for each sound source (each acoustic signal making up C_i), apply gain adjustment to each (for example, at preset ratios), and mix them into a single transaural sound source.
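The gain-and-mix step for several sources might look like the following sketch, which sums per-source speaker signals after scaling each by a preset gain. The function name, the data layout (one `(T_R, T_L)` pair of sample lists per source), and the example gains are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

StereoPair = Tuple[Sequence[float], Sequence[float]]  # (T_R, T_L) for one source


def mix_transaural_sources(
    sources: Sequence[StereoPair], gains: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Scale each source's speaker signals by its gain and sum them per channel."""
    if len(sources) != len(gains):
        raise ValueError("one gain is required per source")
    length = max((len(ch) for pair in sources for ch in pair), default=0)
    mix_r = [0.0] * length
    mix_l = [0.0] * length
    for (t_r, t_l), gain in zip(sources, gains):
        for n, sample in enumerate(t_r):
            mix_r[n] += gain * sample
        for n, sample in enumerate(t_l):
            mix_l[n] += gain * sample
    return mix_r, mix_l


# Two sources mixed at preset (hypothetical) gains of 0.5 and 2.0.
mix_r, mix_l = mix_transaural_sources(
    sources=[([1.0, 0.0], [0.0, 1.0]), ([0.5, 0.5], [0.5, 0.5])],
    gains=[0.5, 2.0],
)
```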

The output unit 17 distributes the acoustic signals of the transaural sound source generated by the stereophonic processing unit 16 to the speakers 18R and 18L and outputs them. The output unit 17 may perform signal conversion according to the input format of the speakers 18R and 18L (for example, conversion from a digital signal to an analog signal) before outputting to the speakers 18R and 18L.

Next, the processing by which the filter forming unit 13 forms the crosstalk cancellation filter is described in detail.

First, the filter forming unit 13 acquires the HRTF data corresponding to the sound image localization direction from the HRTF holding unit 14. It also acquires the transfer functions G_RR, G_RL, G_LR, and G_LL corresponding to the user position information and speaker position information from the transfer function holding unit 15.

そして、フィルタ形成部１３は、伝達関数Ｇ_ＲＲ、Ｇ_ＲＬ、Ｇ_ＬＲ、Ｇ_ＬＬを用いて、バイノーラル音源からクロストーク成分をキャンセルするためのクロストークキャンセルフィルタを設計する。クロストークキャンセルフィルタは、バイノーラル音源Ｂ_Ｒ、Ｂ_Ｌに、各スピーカ１８Ｒ、１８ＬからユーザＵの両耳までの室内伝達関数Ｇ_ＲＲ、Ｇ_ＲＬ、Ｇ_ＬＲ、Ｇ_ＬＬを畳み込み、その中のクロストーク成分（右スピーカ１８ＲからユーザＵの左耳に到達する成分、及び左スピーカ１８ＬからユーザＵの右耳に到達する成分）のみをキャンセルするフィルタ設計を行う。 Then, the filter forming unit 13 uses the transfer functions G _RR , G _RL , G _LR , and G _LL to design a crosstalk cancellation filter for canceling the crosstalk component from the binaural sound source. The crosstalk cancellation filter is designed by convolving the binaural sound sources B _R and B _L with the room transfer functions G _RR , G _RL , G _LR , and G _LL from the speakers 18R and 18L to both ears of the user U, and canceling only the crosstalk components in the result (the component that reaches the left ear of the user U from the right speaker 18R and the component that reaches the right ear of the user U from the left speaker 18L).

以下では、右スピーカ１８Ｒ用のクロストークキャンセルフィルタを「Ｃ_Ｒ（ω）」（「ω」は周波数、以下同様）、左スピーカ１８Ｌ用のクロストークキャンセルフィルタを「Ｃ_Ｌ（ω）」と呼ぶものとする。言い換えると、クロストークキャンセルフィルタＣ_Ｒ（ω）は、バイノーラル音源Ｂ_Ｒにトランスオーラル処理を施して、右スピーカ１８Ｒ用のトランスオーラル音源Ｔ_Ｒを生成するためのフィルタとなる。また、クロストークキャンセルフィルタＣ_Ｌ（ω）は、バイノーラル音源Ｂ_Ｌにトランスオーラル処理を施して、左スピーカ１８Ｌ用のトランスオーラル音源Ｔ_Ｌを生成するためのフィルタとなる。 Hereinafter, the crosstalk cancellation filter for the right speaker 18R is referred to as “C _R (ω)” (“ω” is the frequency; the same applies hereinafter), and the crosstalk cancellation filter for the left speaker 18L is referred to as “C _L (ω)”. In other words, the crosstalk cancellation filter C _R (ω) is a filter that applies transaural processing to the binaural sound source B _R to generate the transaural sound source T _R for the right speaker 18R. Likewise, the crosstalk cancellation filter C _L (ω) is a filter that applies transaural processing to the binaural sound source B _L to generate the transaural sound source T _L for the left speaker 18L.

以下では、クロストークキャンセルフィルタＣ_Ｒ（ω）、Ｃ_Ｌ（ω）を形成（保持）する処理の一例について説明する。 Hereinafter, an example of processing for forming (holding) the crosstalk cancellation filters C _R (ω) and C _L (ω) will be described.

まず、フィルタ形成部１３は、以下の（１）式〜（４）式のように、左スピーカ−左耳経路のフィルタＣ_ＬＬ（ω）、右スピーカ−右耳経路のフィルタＣ_ＲＲ（ω）、左スピーカ−右耳経路のフィルタＣ_ＬＲ（ω）、及び右スピーカ−左耳経路のフィルタＣ_ＲＬ（ω）を生成（設計）する。なお、以下の（１）式〜（４）式において、Ｈ_Ｌ（ω）、Ｈ_Ｒ（ω）は、それぞれ左耳、右耳用の音像定位方向に対応したＨＲＴＦ（ＨＲＴＦ保持部１４で保持したＨＲＴＦ）である。また、以下の（１）式〜（４）式において、Ｇ_ＲＲ、Ｇ_ＲＬ、Ｇ_ＬＲ、Ｇ_ＬＬは、それぞれ伝達関数保持部１５で保持された伝達関数である。以下の（１）式〜（４）式においてωは周波数を表している。以下の（１）式〜（４）式における共通項Ｇ_０（ω）は、以下の（１）式〜（４）式をまとめて方程式とすると、以下の（５）式のように表すことができる。

First, the filter forming unit 13 generates (designs) the left speaker-left ear path filter C _LL (ω), the right speaker-right ear path filter C _RR (ω), the left speaker-right ear path filter C _LR (ω), and the right speaker-left ear path filter C _RL (ω) as in the following formulas (1) to (4). In formulas (1) to (4), H _L (ω) and H _R (ω) are the HRTFs (held by the HRTF holding unit 14) corresponding to the sound image localization direction for the left ear and the right ear, respectively. G _RR , G _RL , G _LR , and G _LL are the transfer functions held by the transfer function holding unit 15, and ω represents the frequency. When formulas (1) to (4) are written together as a single system of equations, the common term G _0 (ω) appearing in them can be expressed as in the following formula (5).

ここで、仮に、従来のトランスオーラル再生技術（非特許文献１）と同様の方式で、スピーカごとのクロストークキャンセルフィルタＣ_Ｒ（ω）、Ｃ_Ｌ（ω）を求める場合、その演算式は以下の（６）、（７）式のようになる。

Here, if the crosstalk cancellation filters C _R (ω) and C _L (ω) for each speaker were obtained by the same method as the conventional transaural reproduction technique (Non-Patent Document 1), the arithmetic expressions would be the following formulas (6) and (7).

この実施形態では、フィルタ形成部１３は、従来の演算式（上記の（６）、（７）式）の要素の一部に、ユーザＵと各スピーカとの位置関係に応じた重みづけを付加したものをクロストークキャンセルフィルタＣ_Ｒ（ω）、Ｃ_Ｌ（ω）として算出する。 In this embodiment, the filter forming unit 13 adds a weight according to the positional relationship between the user U and each speaker to some of the elements of the conventional arithmetic expression (the above expressions (6) and (7)). These are calculated as crosstalk cancellation filters C _R (ω) and C _L (ω).

具体的には、この実施形態の例では、フィルタ形成部１３は、以下の（８）、（９）式により、クロストークキャンセルフィルタＣ_Ｒ（ω）、Ｃ_Ｌ（ω）を求めるものとする。 Specifically, in the example of this embodiment, the filter forming unit 13 obtains the crosstalk cancellation filters C _R (ω) and C _L (ω) by the following equations (8) and (9). .

以下の（８）、（９）式においてαは音像定位方向θ_Ｓ（仮想音源の方向）に応じて変動するパラメータ（左右のキャンセル量のバランスを変化させるパラメータ）であり、以下の（１０）式のように示すことができる。 In the following equations (8) and (9), α is a parameter that varies according to the sound image localization direction θ _S (the direction of the virtual sound source), that is, a parameter that changes the balance between the left and right cancellation amounts, and it can be expressed as in the following equation (10).

以下の（１０）式におけるｘは、音像定位方向θ_Ｓ（仮想音源の方向）の単位を度数（ｄｅｇｒｅｅ）からラジアン（ｒａｄ）に変換したパラメータであるものとする。ここでは、ユーザの正面方向を０（ラジアン）としたとき、右回り（時計回り）に９０°（右９０°）ならπ／２、左回り（反時計回り）に９０°（左９０°）なら３π／２となるものとする。したがって、音像定位方向θ_Ｓが０°の場合はα＝１／２となり、音像定位方向θ_Ｓが右９０度ならα＝０となり、音像定位方向θ_Ｓが左９０度ならα＝−１となる。 In the following equation (10), x is the sound image localization direction θ _S (the direction of the virtual sound source) converted from degrees to radians (rad). Here, with the front direction of the user taken as 0 (radians), 90° clockwise (90° to the right) corresponds to π/2, and 90° counterclockwise (90° to the left) corresponds to 3π/2. Therefore, α = 1/2 when the sound image localization direction θ _S is 0°, α = 0 when θ _S is 90° to the right, and α = −1 when θ _S is 90° to the left.
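The angle convention described above (front is 0, 90° to the right maps to π/2, 90° to the left maps to 3π/2) can be sketched as a small conversion helper. This is only an illustrative sketch: the function name `localization_angle_to_x` and its `side` argument are assumptions introduced here, and the parameter α itself is deliberately not computed because equation (10) is not reproduced in this text.

```python
import math


def localization_angle_to_x(degrees: float, side: str) -> float:
    """Return x in radians, measured clockwise from the user's front, in [0, 2*pi).

    `degrees` is the magnitude of the direction (0 to 180) and `side` says
    whether it is toward the user's "right" or "left".
    """
    if side not in ("right", "left"):
        raise ValueError("side must be 'right' or 'left'")
    if not 0.0 <= degrees <= 180.0:
        raise ValueError("degrees must be between 0 and 180")
    # Right is clockwise; left is the same magnitude taken the other way round.
    clockwise = degrees if side == "right" else 360.0 - degrees
    return math.radians(clockwise % 360.0)
```

With this helper, a direction of 90° to the right yields π/2 and 90° to the left yields 3π/2, matching the convention stated in the text.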

したがって、αを用いて、上記の（８）、（９）のようにクロストークキャンセルフィルタＣ_Ｒ（ω）、Ｃ_Ｌ（ω）を求めると、音像定位方向θ_Ｓが左９０°と右９０°では以下の（１１）〜（１４）式のように変化する。 Accordingly, when the crosstalk cancellation filters C _R (ω) and C _L (ω) are obtained using α as in (8) and (9) above, they take the forms shown in the following equations (11) to (14) for sound image localization directions θ _S of 90° to the left and 90° to the right.

以下の（１１）、（１２）式は、音像定位方向θ_Ｓが左９０°の場合のクロストークキャンセルフィルタＣ_Ｒ（ω）、Ｃ_Ｌ（ω）を示している。また、以下の（１３）、（１４）式は、音像定位方向θ_Ｓが右９０°の場合のクロストークキャンセルフィルタＣ_Ｒ（ω）、Ｃ_Ｌ（ω）を示している。 The following equations (11) and (12) show the crosstalk cancellation filters C _R (ω) and C _L (ω) when the sound image localization direction θ _S is 90 ° to the left. The following equations (13) and (14) indicate the crosstalk cancellation filters C _R (ω) and C _L (ω) when the sound image localization direction θ _S is 90 ° to the right.

上記の（８）、（９）のように、音像定位方向θ_Ｓに応じて変動するαを考慮して、クロストークキャンセルフィルタＣ_Ｒ（ω）、Ｃ_Ｌ（ω）を求めると、音像定位方向θ_Ｓが左方向となっている場合（例えば、左９０°の場合）、以下の（１１）、（１２）式に示すように、右スピーカ１８Ｒ用のクロストークキャンセルフィルタＣ_Ｒ（ω）によるキャンセル量が大きくなる。言い換えると、上記の（８）、（９）を適用すると、音像定位方向θ_Ｓが左方向となっている場合には、θ_Ｓにより表される角度が大きくなるほど、右スピーカ１８Ｒ用のクロストークキャンセルフィルタＣ_Ｒ（ω）によるキャンセル量が大きくなるため、左スピーカ１８Ｌから出力される音響信号Ｔ_Ｌがより強調されることになる。 When the crosstalk cancellation filters C _R (ω) and C _L (ω) are obtained as in (8) and (9) above, taking into account α that varies with the sound image localization direction θ _S , and θ _S is toward the left (for example, 90° to the left), the amount of cancellation by the crosstalk cancellation filter C _R (ω) for the right speaker 18R becomes large, as shown in the following equations (11) and (12). In other words, when (8) and (9) are applied and θ _S is toward the left, the larger the angle represented by θ _S , the larger the amount of cancellation by the crosstalk cancellation filter C _R (ω) for the right speaker 18R, so that the acoustic signal T _L output from the left speaker 18L is emphasized more strongly.

また、上記の（８）、（９）を適用すると、音像定位方向θ_Ｓが右方向となっている場合（例えば、右９０°の場合）、以下の（１３）、（１４）式に示すように、左スピーカ１８Ｌ用のクロストークキャンセルフィルタＣ_Ｌ（ω）のキャンセル量が大きくなる。言い換えると、上記の（８）、（９）を適用すると、音像定位方向θ_Ｓが右方向となっている場合には、θ_Ｓにより表される角度が大きくなるほど、左スピーカ１８Ｌ用のクロストークキャンセルフィルタＣ_Ｌ（ω）によるキャンセル量が大きくなるため、右スピーカ１８Ｒから出力される音響信号Ｔ_Ｒがより強調されることになる。 Similarly, when (8) and (9) above are applied and the sound image localization direction θ _S is toward the right (for example, 90° to the right), the amount of cancellation by the crosstalk cancellation filter C _L (ω) for the left speaker 18L becomes large, as shown in the following equations (13) and (14). In other words, when θ _S is toward the right, the larger the angle represented by θ _S , the larger the amount of cancellation by the crosstalk cancellation filter C _L (ω) for the left speaker 18L, so that the acoustic signal T _R output from the right speaker 18R is emphasized more strongly.

以上のように、上記の（８）、（９）では、音像定位方向θ_Ｓが左方向となっている場合（例えば、０＜ｘ＜πの場合）、右スピーカ１８Ｒ用のクロストークキャンセルフィルタＣ_Ｒ（ω）によるキャンセル量を大きくして、左スピーカ１８Ｌから出力される音響信号Ｔ_Ｌを強調している。また、上記の（８）、（９）では、音像定位方向θ_Ｓが右方向となっている場合（例えば、π＜ｘ＜２πの場合）、左スピーカ１８Ｌ用のクロストークキャンセルフィルタＣ_Ｌ（ω）によるキャンセル量を大きくして、右スピーカ１８Ｒから出力される音響信号Ｔ_Ｒを強調している。言い換えると、上記の（８）、（９）では、音像定位方向θ_Ｓの側（ユーザＵから見て右側又は左側）と反対側のスピーカ用のクロストークキャンセルの量を大きくするようにαが設定されている。 As described above, in (8) and (9), when the sound image localization direction θ _S is toward the left (for example, when 0 < x < π), the amount of cancellation by the crosstalk cancellation filter C _R (ω) for the right speaker 18R is increased so as to emphasize the acoustic signal T _L output from the left speaker 18L. When θ _S is toward the right (for example, when π < x < 2π), the amount of cancellation by the crosstalk cancellation filter C _L (ω) for the left speaker 18L is increased so as to emphasize the acoustic signal T _R output from the right speaker 18R. In other words, in (8) and (9), α is set so that the amount of crosstalk cancellation is increased for the speaker on the side opposite to the sound image localization direction θ _S (the right or left side as viewed from the user U).

言い換えると、上記の（８）、（９）では、伝達関数Ｇ_ＲＲ、Ｇ_ＲＬ、Ｇ_ＬＲ、Ｇ_ＬＬに加えて、音像定位方向θ_Ｓも用いて、左右のクロストークキャンセルの量を調整している。これにより、ユーザＵにとっては、音像定位方向θ_Ｓの側の音源の音がより強調されることになるため、ユーザＵの頭部が動作（ユーザＵの耳の位置が動作）した場合でも、ユーザＵに聞こえる音の定位感を安定させることが可能となる。 In other words, in the above (8) and (9), in addition to the transfer functions G _RR , G _RL , G _LR , and G _LL , the sound image localization direction θ _S is used to adjust the amount of left and right crosstalk cancellation. ing. Thereby, for the user U, since the sound of the sound source on the sound image localization direction θ _S side is more emphasized, even when the head of the user U moves (the position of the ear of the user U moves) It is possible to stabilize the sense of localization of the sound heard by the user U.

なお、この実施形態では、上記の（８）、（９）ではα（ｘ）を用いて左右のクロストークキャンセルのバランス（量）を調整しているが、上述と同様の調整が可能であれば、左右のクロストークキャンセルのバランス（量）の調整方式は限定されないものである。

In this embodiment, α (x) is used in (8) and (9) above to adjust the balance (amount) of the left and right crosstalk cancellation. However, the method of adjusting this balance (amount) is not limited, as long as an adjustment similar to the one described above is possible.

フィルタ形成部１３は、ＨＲＴＦ保持部１４と同様に、予め音像定位方向θ_Ｓごとに対応するクロストークキャンセルフィルタＣ_Ｒ（ω）、Ｃ_Ｌ（ω）を保持しておいて、音像定位方向θ_Ｓが変動するごとに適用するクロストークキャンセルフィルタＣ_Ｒ（ω）、Ｃ_Ｌ（ω）を切替える（選択する）ようにしてもよいし、音像定位方向θ_Ｓが変動するごとに適用するクロストークキャンセルフィルタＣ_Ｒ（ω）、Ｃ_Ｌ（ω）を算出するようにしてもよい。 Like the HRTF holding unit 14, the filter forming unit 13 may hold in advance the crosstalk cancellation filters C _R (ω) and C _L (ω) corresponding to each sound image localization direction θ _S and switch (select) the filters to be applied each time θ _S changes. Alternatively, it may calculate the crosstalk cancellation filters C _R (ω) and C _L (ω) to be applied each time θ _S changes.

（Ａ−２）第１の実施形態の動作
次に、以上のような構成を有する第１の実施形態の音響再生装置１０の動作について図１を用いて説明する。 (A-2) Operation of the First Embodiment Next, the operation of the sound reproduction device 10 of the first embodiment having the above configuration will be described with reference to FIG.

データ入力部１１は、入力されたアナログ信号をデジタル信号に変換し、入力音響信号Ｃ_ｉとして出力する。この実施形態では、入力音響信号Ｃ_ｉは、フィルタ形成部１３を介して立体音響処理部１６に供給される。 The data input unit 11 converts the input analog signal into a digital signal and outputs it as the input acoustic signal C _i . In this embodiment, the input acoustic signal C _i is supplied to the stereophonic sound processing unit 16 via the filter forming unit 13.

位置情報取得部１２は、例えば、所定のタイミングごとに、最新の音像定位方向、ユーザ位置情報（Ｐ_ｅＲ、Ｐ_ｅＬ）、及びスピーカ位置情報Ｐ_ＳＲ、Ｐ_ＳＬを取得してフィルタ形成部１３に供給する。 The position information acquisition unit 12 acquires, for example, the latest sound image localization direction, user position information (P _eR , P _eL ), and speaker position information P _SR , P _SL for each predetermined timing and sends them to the filter formation unit 13. Supply.

フィルタ形成部１３は、位置情報取得部１２から供給される情報が更新されるごとに、ＨＲＴＦ保持部１４及び伝達関数保持部１５から、更新された情報に対応するＨＲＴＦ（Ｈ_Ｌ（ω）、Ｈ_Ｒ（ω））及び伝達関数Ｇ_ＲＲ、Ｇ_ＲＬ、Ｇ_ＬＲ、Ｇ_ＬＬを取得する。そして、フィルタ形成部１３は、取得したＨＲＴＦ（Ｈ_Ｌ（ω）、Ｈ_Ｒ（ω））及び伝達関数Ｇ_ＲＲ、Ｇ_ＲＬ、Ｇ_ＬＲ、Ｇ_ＬＬに基づいて、トランスオーラル処理に用いるクロストークキャンセルフィルタＣ_Ｒ（ω）、Ｃ_Ｌ（ω）を保持（算出又は選択により保持）する。そして、フィルタ形成部１３は、最新に取得したＨＲＴＦ（Ｈ_Ｌ（ω）、Ｈ_Ｒ（ω））、及びクロストークキャンセルフィルタＣ_Ｒ（ω）、Ｃ_Ｌ（ω）を立体音響処理部１６に供給する。このとき、フィルタ形成部１３は、例えば、音像定位方向θ_Ｓに応じたパラメータαを算出（上記の（１０）式により算出）し、上記の（８）、（９）式にαを代入してクロストークキャンセルフィルタＣ_Ｒ（ω）、Ｃ_Ｌ（ω）を算出する。 Each time the information supplied from the position information acquisition unit 12 is updated, the filter forming unit 13 acquires, from the HRTF holding unit 14 and the transfer function holding unit 15, the HRTFs (H _L (ω), H _R (ω)) and the transfer functions G _RR , G _RL , G _LR , and G _LL corresponding to the updated information. Based on the acquired HRTFs and transfer functions, the filter forming unit 13 then holds (obtains by calculation or by selection) the crosstalk cancellation filters C _R (ω) and C _L (ω) used for transaural processing. The filter forming unit 13 then supplies the most recently acquired HRTFs (H _L (ω), H _R (ω)) and the crosstalk cancellation filters C _R (ω) and C _L (ω) to the stereophonic sound processing unit 16. At this time, the filter forming unit 13, for example, calculates the parameter α corresponding to the sound image localization direction θ _S by equation (10) above and substitutes α into equations (8) and (9) above to calculate the crosstalk cancellation filters C _R (ω) and C _L (ω).

立体音響処理部１６は、フィルタ形成部１３から最新に供給されたＨＲＴＦ（Ｈ_Ｌ（ω）、Ｈ_Ｒ（ω））、及びクロストークキャンセルフィルタＣ_Ｒ（ω）、Ｃ_Ｌ（ω）を用いて、入力音響信号Ｃ_ｉにトランスオーラル処理を施す。具体的には、まず、立体音響処理部１６は、入力音響信号Ｃ_ｉに右耳用のＨ_Ｒ（ω）を用いてバイノーラル処理を施した右耳用のバイノーラル音源Ｂ_Ｒを生成し、入力音響信号Ｃ_ｉに左耳用のＨ_Ｌ（ω）を用いてバイノーラル処理を施した左耳用のバイノーラル音源Ｂ_Ｌを生成する。そして、立体音響処理部１６は、右耳用のバイノーラル音源Ｂ_Ｒに右スピーカ１８Ｒ用のクロストークキャンセルフィルタＣ_Ｒ（ω）を掛けて、右スピーカ１８Ｒ用のトランスオーラル音源Ｔ_Ｒを生成し、出力部１７に供給する。また、立体音響処理部１６は、左耳用のバイノーラル音源Ｂ_Ｌに左スピーカ１８Ｌ用のクロストークキャンセルフィルタＣ_Ｌ（ω）を掛けて、左スピーカ１８Ｌ用のトランスオーラル音源Ｔ_Ｌを生成し、出力部１７に供給する。 The stereophonic sound processing unit 16 applies transaural processing to the input acoustic signal C _i using the HRTFs (H _L (ω), H _R (ω)) and the crosstalk cancellation filters C _R (ω) and C _L (ω) most recently supplied from the filter forming unit 13. Specifically, the stereophonic sound processing unit 16 first generates the binaural sound source B _R for the right ear by applying binaural processing to the input acoustic signal C _i using H _R (ω) for the right ear, and generates the binaural sound source B _L for the left ear by applying binaural processing to the input acoustic signal C _i using H _L (ω) for the left ear. The stereophonic sound processing unit 16 then multiplies the binaural sound source B _R for the right ear by the crosstalk cancellation filter C _R (ω) for the right speaker 18R to generate the transaural sound source T _R for the right speaker 18R, and supplies it to the output unit 17. Likewise, it multiplies the binaural sound source B _L for the left ear by the crosstalk cancellation filter C _L (ω) for the left speaker 18L to generate the transaural sound source T _L for the left speaker 18L, and supplies it to the output unit 17.
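The two-stage processing described above (binaural processing with the HRTFs, then multiplication by the per-speaker crosstalk cancellation filters) can be sketched in the frequency domain as follows. This is a minimal sketch under stated assumptions: every signal and filter is represented as a list of complex values, one per frequency bin, and the "multiplication" in the text is read as a per-bin product. The function name `transaural_process` is hypothetical, and the filter values themselves would have to come from equations (8) and (9), which are not reproduced in this text.

```python
from typing import List, Sequence, Tuple

Spectrum = Sequence[complex]  # one complex value per frequency bin (assumption)


def transaural_process(
    c_i: Spectrum,   # input acoustic signal C_i
    h_r: Spectrum,   # HRTF H_R for the right ear
    h_l: Spectrum,   # HRTF H_L for the left ear
    c_r: Spectrum,   # crosstalk cancellation filter C_R for right speaker 18R
    c_l: Spectrum,   # crosstalk cancellation filter C_L for left speaker 18L
) -> Tuple[List[complex], List[complex]]:
    """Return (T_R, T_L), the transaural spectra for the right and left speakers."""
    n = len(c_i)
    if any(len(s) != n for s in (h_r, h_l, c_r, c_l)):
        raise ValueError("all spectra must have the same number of bins")
    # Stage 1: binaural processing, B_R = H_R * C_i and B_L = H_L * C_i per bin.
    b_r = [h * s for h, s in zip(h_r, c_i)]
    b_l = [h * s for h, s in zip(h_l, c_i)]
    # Stage 2: crosstalk cancellation, T_R = C_R * B_R and T_L = C_L * B_L per bin.
    t_r = [c * b for c, b in zip(c_r, b_r)]
    t_l = [c * b for c, b in zip(c_l, b_l)]
    return t_r, t_l
```

In a time-domain implementation the same two stages would be convolutions rather than products; the per-bin form is used here only because the filters in the text are written as functions of ω.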

出力部１７は、立体音響処理部１６から供給されたトランスオーラル音源Ｔ_Ｒ、Ｔ_Ｌをアナログ信号に変換し、それぞれスピーカ１８Ｒ、１８Ｌに出力（供給）する。 The output unit 17 converts the transaural sound sources T _R and T _L supplied from the stereophonic sound processing unit 16 into analog signals and outputs (supplies) them to the speakers 18R and 18L, respectively.

以上のように、音響再生装置１０は、入力音響信号Ｃ_ｉをトランスオーラル処理し、トランスオーラル音源としての音響信号をユーザＵに出力する。 As described above, the sound reproducing device 10 applies transaural processing to the input acoustic signal C _i and outputs the resulting acoustic signal to the user U as a transaural sound source.

（Ａ−３）第１の実施形態の効果
第１の実施形態によれば、以下のような効果を奏することができる。 (A-3) Effects of First Embodiment According to the first embodiment, the following effects can be achieved.

音響再生装置１０では、トランスオーラル処理の際に、音像定位方向θ_Ｓに応じたクロストークキャンセルフィルタを適用している。これにより、音響再生装置１０では、立体音響効果を得られるスイートスポットを広くすることができる。これにより、第１の実施形態では、例えば、ユーザＵが音源定位方向へ顔を向けたとしても、定位感と自然さを保ったまま臨場感のある音を体感させることができる。 In the sound reproducing device 10, a crosstalk cancellation filter corresponding to the sound image localization direction θ _S is applied during transaural processing. This widens the sweet spot in which the stereophonic effect can be obtained. As a result, in the first embodiment, even if, for example, the user U turns their face toward the sound source localization direction, the user can experience realistic sound while the sense of localization and naturalness are maintained.

また、音響再生装置１０では、音像定位方向θ_Ｓ毎にクロストークキャンセルフィルタを保持し、音像定位方向θ_Ｓに応じたクロストークキャンセルフィルタを選択する構成とすることができる。この場合、音響再生装置１０では、音像定位方向θ_Ｓ毎にクロストークキャンセルフィルタを変更する処理を行っても、それによる処理量の増加（従来技術と比較した処理量の増大）は僅かであるため、効率的に高品質な立体音響処理を行うことが可能となる。 Further, the sound reproducing device 10 can be configured to hold a crosstalk cancellation filter for each sound image localization direction θ _S and to select the crosstalk cancellation filter corresponding to the current θ _S . In this case, even though the crosstalk cancellation filter is changed for each sound image localization direction θ _S , the resulting increase in processing load (compared with the conventional technique) is slight, so high-quality stereophonic processing can be performed efficiently.

次に、音響再生装置１０を実際に構築してユーザに聴取させた場合の実験（以下、「本実験」と呼ぶ）の内容及びその結果について説明する。 Next, the content and results of an experiment in which the sound reproducing device 10 was actually built and listened to by users (hereinafter referred to as “this experiment”) will be described.

図３は、本実験の環境について示した説明図である。図３では、本実験におけるユーザＵとスピーカとの位置関係について示している。 FIG. 3 is an explanatory diagram showing the environment of this experiment. FIG. 3 shows the positional relationship between the user U and the speaker in this experiment.

図３（ａ）は、ユーザＵと各スピーカを方向から見た場合の位置関係について示している。図３（ｂ）は、ユーザＵから見た場合の各スピーカの配置位置について示している。 FIG. 3A shows the positional relationship when the user U and each speaker are viewed from the direction. FIG. 3B shows the position of each speaker when viewed from the user U.

本実験では、図３に示すように４ｃｍ角の小型のスピーカを用い、一列８個、上下２段、計１６個のスピーカアレイＳＡを構築した。このスピーカアレイＳＡのうち、中央下段２つのスピーカ以外は、音の出ないダミーのスピーカＳＤとなっている。図３（ｂ）に示すように、スピーカアレイＳＡにおいて、中央下段の２つのスピーカのうち右側にあるスピーカが右スピーカ１８Ｒ、左側にあるスピーカが左スピーカ１８Ｌとなっている。 In this experiment, as shown in FIG. 3, small 4 cm square speakers were used to build a speaker array SA of 16 speakers in total, arranged in two rows (upper and lower) of eight speakers each. In the speaker array SA, all speakers other than the two at the center of the lower row are dummy speakers SD that emit no sound. As shown in FIG. 3B, of the two speakers at the center of the lower row, the one on the right is the right speaker 18R and the one on the left is the left speaker 18L.

図３に示すように、スピーカアレイＳＡにおいて、各スピーカの左右方向の間隔Ｌ１は２２ｃｍであり、上下の間隔Ｌ２は１５ｃｍである。また、スピーカアレイＳＡの左右方向の端から端までの距離Ｌ３は１．７ｍである。さらに、地面からスピーカアレイＳＡの下段のスピーカまでの高さＬ４は８５ｃｍとなっているものとする。さらにまた、図３（ａ）に示すように、スピーカアレイの中心から、被験者（聴取者）であるユーザＵの頭部の中心位置Ｐ_Ｕまでの距離Ｌ５は１ｍとした。 As shown in FIG. 3, in the speaker array SA, the horizontal spacing L1 between adjacent speakers is 22 cm and the vertical spacing L2 is 15 cm. The distance L3 from one horizontal end of the speaker array SA to the other is 1.7 m. The height L4 from the floor to the lower row of speakers in the speaker array SA is 85 cm. Furthermore, as shown in FIG. 3A, the distance L5 from the center of the speaker array to the center position P _U of the head of the user U, who is the subject (listener), was set to 1 m.

本実験では、中央２つのスピーカ１８Ｒ、１８Ｌから立体音響処理を施した女性の音声を再生（この実施形態のトランスオーラル再生）した。このとき、定位させる音源の方向（音像定位方向θ_Ｓ）は、右６０°、左６０°、右９０°、左９０°の４種類であり、再生もこの順番で行った。また、このとき、一回に再生される音源は一つである。そして、本実験では、上述のように定位させる音源の方向（音像定位方向θ_Ｓ）を切替えながら音声を再生し、被験者であるユーザＵに、音が鳴っていると感じるスピーカを選択させる等のアンケートを行った。なお、本実験では、被験者には、事前に中央２つのスピーカから音が鳴っていると知らせてあるが、上下どちらの段のスピーカが鳴っているかは伏せて聴取させた。 In this experiment, a female voice subjected to stereophonic sound processing was reproduced from the two central speakers 18R and 18L (the transaural reproduction of this embodiment). The direction of the sound source to be localized (sound image localization direction θ _S ) took four values, 60° to the right, 60° to the left, 90° to the right, and 90° to the left, and reproduction was performed in this order. Only one sound source was reproduced at a time. The voice was reproduced while the sound image localization direction θ _S was switched as described above, and a questionnaire was conducted in which the user U, as the subject, was asked, among other things, to select the speaker from which the sound seemed to come. The subjects were told in advance that the sound was coming from the two central speakers, but were not told whether the upper or the lower row was the one emitting sound.

そして、本実験では、上述の４種類の音声の再生が終了した後、平均オピニオン評点（Ｍｅａｎ Ｏｐｉｎｉｏｎ Ｓｃｏｒｅ：ＭＯＳ）により主観評価アンケートを実施した。本実験において、アンケートの項目は、定位感（音像の位置が変動せずに安定していたかどうか）、音質（歪みや異音などを感じたかどうか）、自然さ（実際にスピーカから音が鳴っていると感じたかどうか）の３つである。 In this experiment, after reproduction of the four types of voice described above was finished, a subjective evaluation questionnaire was conducted using the mean opinion score (MOS). The questionnaire had three items: sense of localization (whether the position of the sound image was stable without fluctuating), sound quality (whether distortion or abnormal noise was perceived), and naturalness (whether the sound seemed to be actually coming from the speaker).

図４は、被験者が感じた音源の方向（位置）について集計した結果を示したグラフである。図４（ａ）〜図４（ｄ）は、それぞれ定位させる音源の方向（音像定位方向θ_Ｓ）を、左６０°、右６０°、左９０°、右９０°とした場合の集計結果について示したグラフである。図４に示す各グラフでは、縦軸（各棒グラフの高さ）は人数、横軸はスピーカの位置である。図４に示す各グラフの横軸において、１が左端のスピーカを示しており、８が右端のスピーカを示している。また、図４に示す各グラフでは、前列の棒グラフは、スピーカアレイＳＡの上段のスピーカに対応し、後列はスピーカアレイＳＡの下段のスピーカに対応している。実際に音が鳴っているスピーカ１８Ｒ、１８Ｌは、下段の４、５の棒グラフに対応している。 FIG. 4 is a set of graphs showing the tallied results for the direction (position) of the sound source perceived by the subjects. FIGS. 4A to 4D show the results when the direction of the sound source to be localized (sound image localization direction θ _S ) was 60° to the left, 60° to the right, 90° to the left, and 90° to the right, respectively. In each graph, the vertical axis (the height of each bar) is the number of subjects and the horizontal axis is the speaker position, where 1 denotes the leftmost speaker and 8 the rightmost speaker. The front row of bars corresponds to the upper row of speakers in the speaker array SA, and the back row corresponds to the lower row. The speakers 18R and 18L that actually emit sound correspond to bars 4 and 5 of the lower row.

図４に示すように本実験では、音源位置が右６０°のときは右端より１つ内側のスピーカを選択している被験者が多いが、左６０°では殆ど左端が選択されていることが分かる。さらに、図４に示すように、本実験では、音源位置が９０°のときは左右ともに、全員が両端のスピーカを選択する結果となった。 As shown in FIG. 4, when the sound source position was 60° to the right, many subjects selected the speaker one position inside the right end, whereas at 60° to the left almost all subjects selected the leftmost speaker. Furthermore, when the sound source position was 90°, on both the left and the right, all subjects selected the speakers at the ends of the array.

次に、本実験における主観評価の結果を図５に示す。本実験では、ＭＯＳ値は、定位感が３．８３、音質が３．６７、自然さが４．３３と全て３を超える数値となった。特に、本実験では、自然さは４を超えており、被験者がダミーのスピーカから実際に音が鳴っているように感じていたことが分かる。さらに、本実験では、定位感についても４に近い高い値となっている。以上のように、本実験の結果により、この実施形態の音響再生装置１０では、定位が安定し、かつ自然に聞こえる立体音響効果を得られることが示された。 Next, the results of the subjective evaluation in this experiment are shown in FIG. 5. The MOS values were 3.83 for sense of localization, 3.67 for sound quality, and 4.33 for naturalness, all exceeding 3. In particular, naturalness exceeded 4, which indicates that the subjects felt as if the sound were actually coming from the dummy speakers. The sense of localization also scored a high value close to 4. These results show that the sound reproducing device 10 of this embodiment provides a stereophonic effect with stable localization that sounds natural.

（Ｂ）第２の実施形態
以下、本発明による音響再生装置及びプログラムの第２の実施形態を、図面を参照しながら詳述する。 (B) Second Embodiment Hereinafter, a second embodiment of the sound reproducing device and the program according to the present invention will be described in detail with reference to the drawings.

（Ｂ−１）第２の実施形態の構成
図６は、第２の実施形態の音響再生装置１０Ａの全体構成を示すブロック図である。図６では、上述の図１と同一部分又は対応部分には、同一符号又は対応符号を付している。 (B-1) Configuration of Second Embodiment FIG. 6 is a block diagram showing the overall configuration of a sound reproducing device 10A of the second embodiment. In FIG. 6, parts identical or corresponding to those in FIG. 1 described above are given the same or corresponding reference numerals.

以下では、第２の実施形態について第１の実施形態との差異を説明する。 Hereinafter, the differences between the second embodiment and the first embodiment will be described.

第２の実施形態の音響再生装置１０Ａは、フィルタ保持部１９が追加されており、フィルタ形成部１３がフィルタ形成部１３Ａに置き換わっている点で第１の実施形態と異なっている。 The sound reproducing device 10A of the second embodiment is different from the first embodiment in that a filter holding unit 19 is added and the filter forming unit 13 is replaced with a filter forming unit 13A.

第１の実施形態では、フィルタ形成部１３が、音像定位方向θ_Ｓを取得する度にクロストークキャンセルフィルタを形成する処理を行う例について説明した。これに対して、第２の実施形態のフィルタ形成部１３Ａは、各音像定位方向θ_Ｓのクロストークキャンセルフィルタを形成してフィルタ保持部１９に保持させる。その後、フィルタ形成部１３Ａは、入力された音像定位方向θ_Ｓに応じたクロストークキャンセルフィルタを選択して立体音響処理部１６に供給する。 In the first embodiment, an example was described in which the filter forming unit 13 forms a crosstalk cancellation filter every time it acquires the sound image localization direction θ _S . In contrast, the filter forming unit 13A of the second embodiment forms the crosstalk cancellation filters for each sound image localization direction θ _S and has the filter holding unit 19 hold them. Thereafter, the filter forming unit 13A selects the crosstalk cancellation filter corresponding to the input sound image localization direction θ _S and supplies it to the stereophonic sound processing unit 16.

また、フィルタ形成部１３Ａは、位置情報取得部１２から位置情報（ユーザ位置情報及びスピーカ位置情報）が供給されると、フィルタ保持部１９に当該位置情報に対応するクロストークキャンセルフィルタが保持されているか否かを確認する。そして、フィルタ形成部１３Ａは、位置情報取得部１２から供給された位置情報に対応するクロストークキャンセルフィルタがフィルタ保持部１９に保持されていない場合、当該位置情報に対応するクロストークキャンセルフィルタ（各音像定位方向θ_Ｓのクロストークキャンセルフィルタ）を形成して、フィルタ保持部１９に保持させる処理を行う。すなわち、第２の実施形態では、フィルタ保持部１９がクロストークキャンセルフィルタのキャッシュとして機能する。 Further, when position information (user position information and speaker position information) is supplied from the position information acquisition unit 12, the filter forming unit 13A checks whether the crosstalk cancellation filter corresponding to that position information is held in the filter holding unit 19. If it is not, the filter forming unit 13A forms the crosstalk cancellation filters corresponding to that position information (the crosstalk cancellation filters for each sound image localization direction θ _S ) and has the filter holding unit 19 hold them. That is, in the second embodiment, the filter holding unit 19 functions as a cache for the crosstalk cancellation filters.
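The caching behaviour described above can be sketched as follows. This is a minimal sketch, not the implementation of the filter holding unit 19: the class name `FilterHoldingUnit`, the use of a hashable `position` key to stand for the user and speaker position information, and the `design` callback (a stand-in for the filter design of equations (8) and (9), which are not reproduced in this text) are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, Hashable, Iterable, Tuple

Filters = Tuple[Any, ...]  # e.g. (C_R, C_L); the concrete type is left open


class FilterHoldingUnit:
    """Cache of crosstalk cancellation filters keyed by position, then direction."""

    def __init__(
        self,
        design: Callable[[Hashable, int], Filters],  # hypothetical design routine
        directions_deg: Iterable[int],               # localization directions to precompute
    ) -> None:
        self._design = design
        self._directions = tuple(directions_deg)
        self._held: Dict[Hashable, Dict[int, Filters]] = {}

    def select(self, position: Hashable, direction_deg: int) -> Filters:
        # On a cache miss for this position, form the filters for every
        # localization direction once and hold them.
        if position not in self._held:
            self._held[position] = {
                d: self._design(position, d) for d in self._directions
            }
        # Afterwards, a change of direction is only a lookup.
        return self._held[position][direction_deg]
```

The design choice mirrored here is that the expensive step runs once per position, so switching the localization direction at run time costs only a dictionary lookup.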

（Ｂ−２）第２の実施形態の効果
第２の実施形態によれば、第１の実施形態の効果に加えて以下のような効果を奏することができる。 (B-2) Effects of Second Embodiment According to the second embodiment, the following effects can be obtained in addition to the effects of the first embodiment.

第２の実施形態の音響再生装置１０Ａでは、各音像定位方向θ_Ｓのクロストークキャンセルフィルタを形成してフィルタ保持部１９に保持（キャッシュ）させることで、第１の実施形態よりも処理量（クロストークキャンセルフィルタを形成する処理量）を低減させることができる。 In the sound reproducing device 10A of the second embodiment, the crosstalk cancellation filters for each sound image localization direction θ _S are formed and held (cached) in the filter holding unit 19, so that the processing load (the load of forming the crosstalk cancellation filters) can be reduced compared with the first embodiment.

（Ｃ）第３の実施形態
以下、本発明による音響再生装置及びプログラムの第３の実施形態を、図面を参照しながら詳述する。 (C) Third Embodiment Hereinafter, a third embodiment of the sound reproducing device and the program according to the present invention will be described in detail with reference to the drawings.

（Ｃ−１）第３の実施形態の構成及び動作
図７は、第３の実施形態の音響再生装置１０Ｂの全体構成を示すブロック図である。図７では、上述の図１と同一部分又は対応部分には、同一符号又は対応符号を付している。 (C-1) Configuration and Operation of the Third Embodiment FIG. 7 is a block diagram showing the overall configuration of the sound reproducing device 10B of the third embodiment. In FIG. 7, parts identical or corresponding to those in FIG. 1 described above are given the same or corresponding reference numerals.

第３の実施形態の音響再生装置１０Ｂでは、伝達関数保持部１５が削除され、フィルタ形成部１３がフィルタ形成部１３Ｂに置き換わっている点で第１の実施形態と異なっている。また、第３の実施形態の音響再生装置１０Ｂでは、角度算出部２０が追加されている点で第１の実施形態と異なっている。 The sound reproducing device 10B according to the third embodiment is different from the first embodiment in that the transfer function holding unit 15 is deleted and the filter forming unit 13 is replaced with a filter forming unit 13B. Further, the sound reproducing device 10B of the third embodiment is different from the first embodiment in that an angle calculation unit 20 is added.

第１の実施形態のフィルタ形成部１３は、ＨＲＴＦと伝達関数Ｇ_ＲＲ、Ｇ_ＲＬ、Ｇ_ＬＲ、Ｇ_ＬＬを取得してクロストークキャンセルフィルタを形成している。したがって、第１の実施形態のフィルタ形成部１３では、伝達関数を計測していない空間ではクロストークキャンセルフィルタを形成することができないことになる。そこで、第３の実施形態のフィルタ形成部１３Ｂは、クロストークキャンセルフィルタを形成する際、伝達関数の代わりにＨＲＴＦ（ＨＲＴＦ保持部１４で保持されているＨＲＴＦ）を使用するものする。 The filter forming unit 13 according to the first embodiment acquires the HRTF and the transfer functions G _RR , G _RL , G _LR , and G _LL to form a crosstalk cancellation filter. Therefore, the filter forming unit 13 of the first embodiment cannot form a crosstalk cancellation filter in a space where the transfer function is not measured. Therefore, the filter forming unit 13B of the third embodiment uses HRTF (HRTF held by the HRTF holding unit 14) instead of the transfer function when forming the crosstalk cancellation filter.

角度算出部２０は、位置情報取得部１２から、ユーザ位置情報及びスピーカ位置情報が供給されると、ユーザ位置情報及びスピーカ位置情報に基づいてユーザＵ（ユーザＵの各耳）から各スピーカ１８Ｒ、１８Ｌへの方向（角度）を算出し、フィルタ形成部１３Ｂに供給する。 When the user position information and the speaker position information are supplied from the position information acquisition unit 12, the angle calculation unit 20 receives each speaker 18R from the user U (each ear of the user U) based on the user position information and the speaker position information. The direction (angle) to 18L is calculated and supplied to the filter forming unit 13B.

図８は、角度算出部２０が取得する各方向（角度）の例について示した説明図である。 FIG. 8 is an explanatory diagram showing an example of each direction (angle) acquired by the angle calculation unit 20.

具体的には、角度算出部２０は、ユーザ位置情報及びスピーカ位置情報に基づいて、ユーザＵの右耳の位置Ｐ_ｅＲから右スピーカ１８Ｒの位置Ｐ_ＳＲへの方向（角度）を示すθ_ＲＲと、ユーザＵの右耳の位置Ｐ_ｅＲから左スピーカ１８Ｌの位置Ｐ_ＳＬへの方向（角度）を示すθ_ＲＬと、ユーザＵの左耳の位置Ｐ_ｅＬから右スピーカ１８Ｒの位置Ｐ_ＳＲへの方向（角度）を示すθ_ＬＲと、ユーザＵの左耳の位置Ｐ_ｅＬから左スピーカ１８Ｌの位置Ｐ_ＳＬへの方向（角度）を示すθ_ＬＬとを算出する。 Specifically, based on the user position information and the speaker position information, the angle calculation unit 20 calculates θ _RR , the direction (angle) from the right ear position P _eR of the user U to the position P _SR of the right speaker 18R; θ _RL , the direction (angle) from the right ear position P _eR to the position P _SL of the left speaker 18L; θ _LR , the direction (angle) from the left ear position P _eL of the user U to the position P _SR of the right speaker 18R; and θ _LL , the direction (angle) from the left ear position P _eL to the position P _SL of the left speaker 18L.

As shown in FIG. 8, θ_RR and θ_RL are directions (angles) whose origin is the right-ear position P_eR of the user U. They indicate the directions in which the speakers 18R and 18L lie as seen from P_eR, taking the direction in which the user U is facing as 0°.

Likewise, as shown in FIG. 8, θ_LR and θ_LL are directions (angles) whose origin is the left-ear position P_eL of the user U. They indicate the directions in which the speakers 18R and 18L lie as seen from P_eL, taking the direction in which the user U is facing as 0°.
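The calculation of the four directions can be illustrated with a minimal sketch. The description does not give the coordinate system or the sign convention, so the following are assumptions made only for illustration: positions are 2-D (x, y) pairs, the facing direction is given as an angle measured counter-clockwise from the +x axis, and angles toward the listener's left are positive. The function names `direction_deg` and `speaker_angles` are hypothetical.

```python
import math


def direction_deg(ear_xy, speaker_xy, facing_deg):
    """Direction from an ear to a speaker, with the facing direction as 0 degrees.

    Assumptions (not from the source): 2-D coordinates, facing_deg measured
    counter-clockwise from the +x axis, result wrapped into [-180, 180),
    positive toward the listener's left.
    """
    dx = speaker_xy[0] - ear_xy[0]
    dy = speaker_xy[1] - ear_xy[1]
    absolute = math.degrees(math.atan2(dy, dx))  # direction in room coordinates
    relative = absolute - facing_deg             # direction relative to facing
    return (relative + 180.0) % 360.0 - 180.0    # wrap into [-180, 180)


def speaker_angles(p_eR, p_eL, p_SR, p_SL, facing_deg):
    """Return the four directions theta_RR, theta_RL, theta_LR, theta_LL."""
    return {
        "RR": direction_deg(p_eR, p_SR, facing_deg),  # right ear -> right speaker
        "RL": direction_deg(p_eR, p_SL, facing_deg),  # right ear -> left speaker
        "LR": direction_deg(p_eL, p_SR, facing_deg),  # left ear  -> right speaker
        "LL": direction_deg(p_eL, p_SL, facing_deg),  # left ear  -> left speaker
    }


# Example: listener at the origin facing +y, speakers placed symmetrically in front.
angles = speaker_angles(
    p_eR=(0.1, 0.0), p_eL=(-0.1, 0.0),
    p_SR=(1.0, 2.0), p_SL=(-1.0, 2.0),
    facing_deg=90.0,
)
```

With a symmetric layout such as this one, θ_RR and θ_LL come out equal in magnitude and opposite in sign, and the cross directions θ_RL and θ_LR are slightly wider than the direct ones because each ear is offset from the centre of the head.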

The filter forming unit 13B then acquires, from the HRTF holding unit 14, an HRTF for each of the directions θ_RR, θ_RL, θ_LR, and θ_LL obtained from the angle calculation unit 20 (the directions from each ear of the user U to the speakers 18R and 18L). In the following, the HRTFs corresponding to θ_RR, θ_RL, θ_LR, and θ_LL are denoted H_RR(ω), H_RL(ω), H_LR(ω), and H_LL(ω), respectively.
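How an HRTF is selected for a calculated direction is not spelled out here. One plausible reading, shown below purely as an assumption, is that the HRTF holding unit 14 stores HRTFs for a discrete set of measured directions and that the entry closest to the requested angle is used. The table layout and the names `angular_distance` and `nearest_hrtf` are hypothetical, and the stored values are placeholders rather than real HRTF data.

```python
def angular_distance(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def nearest_hrtf(hrtf_table, theta_deg):
    """Pick the stored HRTF whose measured direction is closest to theta_deg.

    hrtf_table is assumed to map a measured direction in degrees to HRTF data.
    Nearest-neighbour selection is an illustrative choice, not the source's
    stated method; interpolation between neighbours would be an alternative.
    """
    key = min(hrtf_table, key=lambda measured: angular_distance(measured, theta_deg))
    return key, hrtf_table[key]


# Placeholder table: strings stand in for per-direction HRTF data.
table = {-30.0: "hrtf_m30", 0.0: "hrtf_0", 30.0: "hrtf_30", 180.0: "hrtf_180"}
measured_dir, hrtf = nearest_hrtf(table, 24.2)
```

Using a wrap-around distance means a request near -180° correctly resolves to an entry stored at 180°.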

The filter forming unit 13B then generates the crosstalk cancellation filter using H_RR(ω), H_RL(ω), H_LR(ω), and H_LL(ω) as the transfer functions G_RR, G_RL, G_LR, and G_LL, respectively.
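The substitution of HRTFs for the transfer functions can be sketched with a generic crosstalk canceller. The filter formula of the first embodiment is not reproduced in this passage, so the code below is not that formula: it is the textbook approach of inverting the 2x2 ear-by-speaker matrix at each frequency, offered only as an illustration. It assumes that the first subscript denotes the ear and the second the speaker (consistent with how θ_RL is defined above), and the function name `crosstalk_cancel_filter` and the `min_det` threshold are hypothetical.

```python
def crosstalk_cancel_filter(h_rr, h_rl, h_lr, h_ll, min_det=1e-9):
    """Invert the 2x2 ear-by-speaker matrix independently at each frequency bin.

    Assumed model (illustrative, first subscript = ear, second = speaker):
        e_R = h_rr * s_R + h_rl * s_L
        e_L = h_lr * s_R + h_ll * s_L
    Each argument is an equal-length sequence of complex frequency responses,
    here the HRTFs H_RR, H_RL, H_LR, H_LL used in place of G_RR..G_LL.
    Returns one 2x2 matrix per bin, ((c_rr, c_rl), (c_lr, c_ll)), mapping the
    binaural pair (b_R, b_L) to the speaker feeds (s_R, s_L).
    """
    filters = []
    for rr, rl, lr, ll in zip(h_rr, h_rl, h_lr, h_ll):
        det = rr * ll - rl * lr
        if abs(det) < min_det:
            # The matrix cannot be inverted reliably at this frequency.
            raise ValueError("ear-by-speaker matrix is ill-conditioned")
        filters.append(((ll / det, -rl / det), (-lr / det, rr / det)))
    return filters


# Single-bin example with made-up responses (not measured HRTF values).
c = crosstalk_cancel_filter([1 + 0j], [0.5j], [0.4 + 0j], [0.9 - 0.1j])[0]
```

A practical design would typically add regularisation near ill-conditioned frequencies instead of raising an error; that refinement is omitted to keep the sketch short.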

(C-2) Effects of the Third Embodiment
According to the third embodiment, the following effect can be obtained in addition to the effects of the first embodiment.

In the sound reproducing device 10B of the third embodiment, the HRTFs are reused as the transfer functions. A crosstalk cancellation filter can therefore be formed with a certain degree of accuracy, and stereophonic processing (transaural processing) can be performed, even in a space for which no corresponding transfer function data is held.

(D) Other Embodiments
The present invention is not limited to the embodiments described above, and modified embodiments such as those exemplified below are also possible.

(D-1) In each of the above embodiments, the sound reproducing device of the present invention was described as updating three parameters in real time as position information: the sound image localization direction, the user position information, and the speaker position information. However, a parameter that does not need to be updated in real time may instead be kept at a preset value without being updated. For example, the sound reproducing device may hold the user position information and the speaker position information as fixed values and acquire only the sound image localization direction as a varying parameter.
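The variation in which some parameters stay fixed can be pictured as follows. This is a rough sketch under assumptions of its own: the record type `PositionInfo`, its field names, and the helper `apply_update` are all hypothetical and do not appear in the description.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PositionInfo:
    """The three position parameters named in the text (field names are hypothetical)."""
    localization_deg: float  # sound image localization direction
    user_pos: tuple          # user position information
    speakers_pos: tuple      # speaker position information (right, left)


def apply_update(current, update, fixed=("user_pos", "speakers_pos")):
    """Return a new PositionInfo, ignoring updates to fields listed in `fixed`.

    With the default `fixed`, only the localization direction can change,
    matching the example in which user and speaker positions are preset.
    """
    allowed = {name: value for name, value in update.items() if name not in fixed}
    return replace(current, **allowed)


preset = PositionInfo(0.0, (0.0, 0.0), ((1.0, 2.0), (-1.0, 2.0)))
# The user position in the update is ignored because it is treated as fixed.
updated = apply_update(preset, {"localization_deg": 45.0, "user_pos": (5.0, 5.0)})
```

Passing `fixed=()` would correspond to the configuration described in the embodiments, where all three parameters follow real-time updates.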

The sound reproducing device of the present invention may also be configured, for example, to hold the transfer functions G_RR, G_RL, G_LR, and G_LL directly, without holding the user position information or the speaker position information.

DESCRIPTION OF SYMBOLS: 10 ... sound reproducing device, 11 ... data input unit, 12 ... position information acquisition unit, 13 ... filter forming unit, 14 ... HRTF holding unit, 15 ... transfer function holding unit, 16 ... stereophonic processing unit, 17 ... output unit, 18R ... right speaker, 18L ... left speaker.

Claims

A stereophonic sound reproducing device that generates stereophonic signals to be supplied to each of a plurality of speakers by performing stereophonic processing on an input acoustic signal, the device comprising:
a head-related transfer function holding unit that holds a head-related transfer function corresponding to each sound source direction;
an information acquisition unit that acquires at least information on a sound image localization direction in which a sound source is to be localized;
a first stereophonic signal generation unit that generates a first stereophonic signal in which the sound source is localized in the sound image localization direction, using the head-related transfer function held by the head-related transfer function holding unit;
a crosstalk cancellation filter holding unit that holds, for each of the speakers, a crosstalk cancellation filter for removing a crosstalk component from the first stereophonic signal, the crosstalk cancellation filter being based on a parameter corresponding to the sound image localization direction acquired by the information acquisition unit; and
a second stereophonic signal generation unit that generates, for each of the speakers, a second stereophonic signal by removing the crosstalk component from the first stereophonic signal using the crosstalk cancellation filter held by the crosstalk cancellation filter holding unit.

The stereophonic sound reproducing device according to claim 1, wherein the crosstalk cancellation filter holding unit adjusts the balance of the amounts of crosstalk components removed by the crosstalk cancellation filters corresponding to the respective speakers, according to the sound image localization direction acquired by the information acquisition unit.

The stereophonic sound reproducing device according to claim 2, wherein the crosstalk cancellation filter holding unit adjusts the balance of the amounts of crosstalk components removed by the crosstalk cancellation filters corresponding to the respective speakers, using a parameter that varies according to the sound image localization direction acquired by the information acquisition unit.

The stereophonic sound reproducing device according to any one of claims 1 to 3, wherein the crosstalk cancellation filter holding unit holds a crosstalk cancellation filter for each sound image localization direction in advance, selects the crosstalk cancellation filter corresponding to the sound image localization direction acquired by the information acquisition unit, and supplies the selected filter to the second stereophonic signal generation unit.

The stereophonic sound reproducing device according to claim 1, wherein the information acquisition unit further acquires position information of a listener and position information of each of the speakers,
the device further comprises a transfer function holding unit that acquires a transfer function between each of the speakers and the listener according to the position information of the listener and the position information of each of the speakers, and
the crosstalk cancellation filter holding unit generates the crosstalk cancellation filter using the transfer function acquired by the transfer function holding unit.

The stereophonic sound reproducing device according to claim 5, wherein the transfer function holding unit determines the direction from the listener to each of the speakers based on the position information of the listener and the position information of each of the speakers, and acquires the head-related transfer function corresponding to that direction as the transfer function between that speaker and the listener.

A stereophonic sound reproduction program that causes a computer mounted on a sound reproducing device, which generates stereophonic signals to be supplied to each of a plurality of speakers by performing stereophonic processing on an input acoustic signal, to function as:
a head-related transfer function holding unit that holds a head-related transfer function corresponding to each sound source direction;
an information acquisition unit that acquires at least information on a sound image localization direction in which a sound source is to be localized;
a first stereophonic signal generation unit that generates a first stereophonic signal in which the sound source is localized in the sound image localization direction, using the head-related transfer function held by the head-related transfer function holding unit;
a crosstalk cancellation filter holding unit that holds, for each of the speakers, a crosstalk cancellation filter for removing a crosstalk component from the first stereophonic signal, the crosstalk cancellation filter being based on a parameter corresponding to the sound image localization direction acquired by the information acquisition unit; and
a second stereophonic signal generation unit that generates, for each of the speakers, a second stereophonic signal by removing the crosstalk component from the first stereophonic signal using the crosstalk cancellation filter held by the crosstalk cancellation filter holding unit.