JP5929455B2

JP5929455B2 - Audio processing apparatus, audio processing method, and audio processing program

Info

Publication number: JP5929455B2
Application number: JP2012093421A
Authority: JP
Inventors: 拓郎大谷; 洋平関; 桂樹岡林
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2012-04-16
Filing date: 2012-04-16
Publication date: 2016-06-08
Anticipated expiration: 2032-04-16
Also published as: JP2013223098A

Description

The present invention relates to an audio processing apparatus, an audio processing method, and an audio processing program.

When many sound sources surround a listener, it is difficult for the listener to pick out the audio from a desired source among them. One approach that has been considered is to have the listener wear headphones and to provide the listener, through the headphones, with the audio corresponding to a sound source selected from the plurality of sound sources.

For example, there is a technique in which a plurality of virtual sound sources are arranged around the listener, and the listener selects the audio from a specific sound source by turning the front of the head toward that source and performing a predetermined action such as nodding. There is also a technique that increases the volume of a virtual sound source located in the direction the listener is facing. Further, there is a technique that changes the localization of the sound image according to the orientation of the listener.

Japanese Laid-Open Patent Publication No. H9-90963
Japanese Laid-Open Patent Publication No. 2008-92193
Japanese Laid-Open Patent Publication No. 2003-111197

However, a method in which the listener performs a specific action such as nodding in order to select a desired sound source from among a plurality of virtual sound sources is cumbersome to operate, and the listener cannot hear the audio from the desired sound source through natural movement.

In one aspect, an object of the present invention is to provide an audio processing apparatus, an audio processing method, and an audio processing program that enable a listener to pick out the audio corresponding to a desired sound source through natural movement.

In one proposal, an audio processing apparatus is provided that controls the output of audio signals respectively corresponding to a plurality of virtual sound sources virtually arranged around a listener. The audio processing apparatus includes a state determination unit and an output control unit. The state determination unit determines whether the movement of the listener direction, which indicates the orientation of the listener, has become stationary. The output control unit controls the volume of the audio signals corresponding to virtual sound sources included in a listening range, which is set so that the listener direction is at its center as viewed from the listener, to be relatively higher than the volume of the audio signals corresponding to virtual sound sources not included in the listening range. The output control unit also reduces the listening range when the stationary state is determined to have been reached.

In another proposal, an audio processing method is provided in which processing similar to that of the above audio processing apparatus is executed.
Furthermore, in one proposal, an audio processing program is provided that causes a computer to execute processing similar to that of the above audio processing apparatus.

According to one aspect, a listener can pick out the audio corresponding to a desired sound source through natural movement.

A diagram illustrating a configuration example and an operation example of the audio processing apparatus according to the first embodiment.
A diagram illustrating a system configuration example of the audio providing system according to the second embodiment.
A diagram illustrating an arrangement example of the devices in an exhibition hall.
A diagram illustrating a hardware configuration example of the audio processing apparatus.
A diagram illustrating a hardware configuration example of the user terminal.
A block diagram illustrating a configuration example of the processing functions of the user terminal and the audio processing apparatus.
A diagram illustrating an arrangement example of sound sources in a virtual space.
A diagram illustrating an example of information registered in the sound source management table.
A diagram for explaining an example of a method of determining the gaze state.
A diagram for explaining the listening range.
A diagram for explaining changes in the listening range in a two-dimensional virtual space.
A diagram for explaining changes in the listening range in a three-dimensional virtual space.
A diagram illustrating a first example of a method of changing the listening range.
A diagram illustrating a second example of a method of changing the listening range.
A diagram illustrating a control example for the case where the gaze state is released before the angle of the listening range reaches its minimum value.
A diagram illustrating a control example for the case where the gaze state is entered again before the angle of the listening range reaches its maximum value.
A diagram illustrating an example of volume control for each sound source included in the listening range.
A diagram illustrating an example of information registered in the user management table.
A flowchart illustrating an example of the processing procedure of the gaze determination unit.
A flowchart illustrating an example of the processing procedure of the listening range control unit and the audio output processing unit.
A flowchart illustrating an example of the processing procedure of the listening range control unit and the audio output processing unit.
A diagram illustrating a user in the gaze state approaching an exhibit.
A diagram illustrating an example of information registered in the exhibit management table.
A flowchart illustrating a modification of the processing procedure of the listening range control unit and the audio output processing unit.
A diagram illustrating a configuration example of the audio providing system according to the third embodiment.
A block diagram illustrating an example of the processing functions of the user terminal and the audio processing apparatus in the third embodiment.
A diagram illustrating a configuration example of the audio providing system according to the fourth embodiment.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[First Embodiment]
FIG. 1 is a diagram illustrating a configuration example and an operation example of the audio processing apparatus according to the first embodiment.

The audio processing apparatus 1 shown in FIG. 1 controls the output of audio signals respectively corresponding to a plurality of virtual sound sources virtually arranged around a listener 10. The position of each virtual sound source is, for example, freely registered in sound source position information 2 and held in a storage device of the audio processing apparatus 1. The audio signal corresponding to a virtual sound source is, for example, either prepared in the storage device in advance or picked up by a microphone and input to the audio processing apparatus 1. An example of the latter is an audio signal obtained by using a microphone to pick up the voice of a person who is actually present near the listener 10.

The audio processing apparatus 1 includes a state determination unit 3 and an output control unit 4.
The state determination unit 3 monitors the movement of a listener direction D, which indicates the orientation of the listener 10, and determines whether the movement of the listener direction D has become stationary. For example, a direction sensor is attached to the body of the listener 10, and the state determination unit 3 monitors the movement of the listener direction D based on the detection results of the direction sensor. The listener direction D is preferably the direction in which the face of the listener 10 is turned or the direction of the line of sight of the listener 10.

The stationary state mentioned above is a state in which the orientation of the listener 10 is judged to have stopped changing. For example, the state determination unit 3 determines that the stationary state has been reached when the variation of the listener direction D stays within a predetermined range for a predetermined time.
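The stationary-state test described above can be sketched as follows. This is only an illustrative sketch, not the method fixed by the patent: the class name `StationaryDetector`, the helper `angular_difference`, the 5-degree tolerance, the 2-second hold time, and the use of a single horizontal heading angle are all assumptions made for the example.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two headings, in degrees (0 to 180)."""
    diff = (a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


class StationaryDetector:
    """Reports True once the heading has stayed near a reference long enough.

    tolerance_deg and hold_seconds are hypothetical stand-ins for the
    "predetermined range" and "predetermined time" in the text.
    """

    def __init__(self, tolerance_deg=5.0, hold_seconds=2.0):
        self.tolerance_deg = tolerance_deg
        self.hold_seconds = hold_seconds
        self.reference_deg = None  # heading at the start of the current quiet period
        self.since = None          # timestamp at which that quiet period started

    def update(self, timestamp, heading_deg):
        moved = (
            self.reference_deg is None
            or angular_difference(heading_deg, self.reference_deg) > self.tolerance_deg
        )
        if moved:
            # The heading left the tolerance band: restart the quiet period here.
            self.reference_deg = heading_deg
            self.since = timestamp
            return False
        return timestamp - self.since >= self.hold_seconds


detector = StationaryDetector(tolerance_deg=5.0, hold_seconds=2.0)
readings = [(0.0, 90.0), (1.0, 91.0), (2.0, 89.0), (3.0, 120.0)]
states = [detector.update(t, h) for t, h in readings]
print(states)  # [False, False, True, False]
```

Measuring the variation against a reference heading keeps the state small. A sliding-window spread check would be an equally plausible reading of the text.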

The output control unit 4 controls the output volume of the audio signals corresponding to the plurality of virtual sound sources. As an example, it is assumed here that the output control unit 4 combines the audio signals corresponding to the virtual sound sources to generate a synthesized audio signal with a predetermined number of channels. The synthesized audio signal is output to, for example, headphones or earphones worn by the listener 10, and the listener 10 hears the synthesized audio based on that signal. Alternatively, the synthesized audio based on the synthesized audio signal is output from three or more speakers arranged around the listener 10.
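As a rough illustration of combining per-source signals into one output, the sketch below mixes equal-length sample lists into a single mono channel using one gain per source. The function name `mix_sources`, the mono output, and the plain weighted sum are assumptions for the example. The text only states that a synthesized signal with a predetermined number of channels is generated.

```python
def mix_sources(signals, gains):
    """Mix equal-length mono sample lists into one mono signal.

    signals: list of sample lists, one per virtual sound source (hypothetical layout)
    gains:   list of per-source volume factors, same order as signals
    """
    if len(signals) != len(gains):
        raise ValueError("one gain is required per source")
    if not signals:
        return []
    length = len(signals[0])
    if any(len(samples) != length for samples in signals):
        raise ValueError("all sources must have the same number of samples")

    mixed = [0.0] * length
    for samples, gain in zip(signals, gains):
        for i, sample in enumerate(samples):
            mixed[i] += gain * sample  # weighted sum across sources
    return mixed


mixed = mix_sources([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.5])
print(mixed)  # [1.0, 0.5]
```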

As another example, the audio based on the audio signal corresponding to each virtual sound source may be output from speakers arranged around the listener 10, one for each virtual sound source.
The output control unit 4 sets a listening range 30 such that the listener direction D is at its center as viewed from the listener 10. The output control unit 4 then controls the volume of the audio signals corresponding to virtual sound sources included in the listening range 30 to be relatively higher than the volume of the audio signals corresponding to virtual sound sources not included in the listening range 30. With this control, among the plurality of sound sources, the audio corresponding to the sound sources included in the listening range 30 is emphasized for the listener 10.

The output control unit 4 reduces the listening range 30 when the state determination unit 3 determines that the movement of the listener direction D has become stationary. An example of the operation before and after the stationary state is determined is described below with reference to the lower part of FIG. 1.

In the lower part of FIG. 1, five virtual sound sources 21 to 25 are arranged around the listener 10. The listening range 30 set by the output control unit 4 is indicated by diagonal hatching.

Before the state determination unit 3 determines that the stationary state has been reached, the output control unit 4 sets the listening range 30 as shown in the lower left of FIG. 1. In this state, the listening range 30 includes the virtual sound sources 21 to 25, and the listener 10 hears the audio corresponding to each of the virtual sound sources 21 to 25 equally. It is therefore difficult for the listener 10 to distinguish the audio of the individual virtual sound sources 21 to 25. The state in the lower left of FIG. 1 is one in which the orientation of the listener 10 has not settled in a fixed direction, and it can be regarded as a state in which the listener 10 has not yet decided which of the virtual sound sources 21 to 25 to listen to.

On the other hand, when the state determination unit 3 determines that the stationary state has been reached, the output control unit 4 reduces the listening range 30 as shown in the lower right of FIG. 1. In this state, the virtual sound source 23, which is closest to the listener direction D, is included in the listening range 30, but the virtual sound sources 21, 22, 24, and 25 are not. The output control unit 4 therefore makes the volume of the audio signal corresponding to the virtual sound source 23 relatively higher than the volume of the audio signals corresponding to the other virtual sound sources 21, 22, 24, and 25. This makes it easier for the listener 10 to hear the audio corresponding to the virtual sound source 23.
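The effect of shrinking the listening range can be illustrated with a small sketch. The source angles (30 to 150 degrees), the range widths (180 and 40 degrees), and the gain values are invented for the example and do not come from the patent. Only the rule "sources inside the range are louder than sources outside it" is taken from the text.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two directions, in degrees (0 to 180)."""
    diff = (a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def source_gains(listener_deg, source_angles_deg, range_width_deg,
                 inside_gain=1.0, outside_gain=0.2):
    """Return a gain per source for a listening range centered on listener_deg.

    inside_gain and outside_gain are hypothetical values; the text only
    requires the inside volume to be relatively higher.
    """
    half_width = range_width_deg / 2.0
    gains = {}
    for name, angle in source_angles_deg.items():
        inside = angular_difference(angle, listener_deg) <= half_width
        gains[name] = inside_gain if inside else outside_gain
    return gains


# Assumed directions of virtual sound sources 21 to 25 as seen from the listener.
sources = {21: 30.0, 22: 60.0, 23: 90.0, 24: 120.0, 25: 150.0}

wide = source_gains(90.0, sources, range_width_deg=180.0)   # before the stationary state
narrow = source_gains(90.0, sources, range_width_deg=40.0)  # after the range is reduced

print(wide)    # every source at gain 1.0
print(narrow)  # only source 23 at gain 1.0, the others at 0.2
```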

With the control described above, the listener 10 can easily pick out the audio corresponding to a desired virtual sound source simply by facing that source and holding still, without consciously performing any special input operation. Facing a desired sound source and holding still is something the listener 10 does unconsciously when trying to catch the audio of that source in a situation where a plurality of sound sources are present. The above control therefore allows the listener 10 to hear the audio corresponding to the desired virtual sound source through natural movement.

[Second Embodiment]
Next, as a second embodiment, an audio providing system for providing audio information to visitors at an exhibition hall will be described. First, FIG. 2 is a diagram illustrating a system configuration example of the audio providing system according to the second embodiment.

The audio providing system 100 includes an audio processing apparatus 200 that performs control processing for providing audio information to users who have entered the exhibition hall. Audio signals picked up by a plurality of microphones are input to the audio processing apparatus 200. The number of microphones is arbitrary; in the example of FIG. 2, audio signals are input to the audio processing apparatus 200 from four microphones 301a to 301d. Each microphone picks up the voice of an explainer who describes an exhibit.

Various methods can be used to transmit the audio signals from the microphones to the audio processing apparatus 200. For example, the audio signal picked up by each microphone is converted into a digital audio signal and then transmitted to the audio processing apparatus 200 by wire or wirelessly. Alternatively, the audio signal picked up by each microphone may be input to the audio processing apparatus 200 as an analog signal and digitized inside the audio processing apparatus 200.

A plurality of access points 110a to 110d for transmitting and receiving wireless signals are also connected to the audio processing apparatus 200 via a network 120. The network 120 is, for example, a LAN (Local Area Network). In this case, the access points 110a to 110d are wireless LAN access points.

Meanwhile, a user who has entered the exhibition hall carries a user terminal 400 and headphones 500. The user terminal 400 can communicate wirelessly with the access points 110a to 110d. The headphones 500 include a driver unit (not shown) that reproduces the analog audio signal output from the user terminal 400.

The audio processing apparatus 200 combines the audio signals picked up by the microphones and transmits the synthesized audio signal to the user terminal 400 through at least one of the access points 110a to 110d. The user terminal 400 converts the audio signal received from the audio processing apparatus 200 into an analog signal and outputs the analog audio signal to the driver unit of the headphones 500.

The audio processing apparatus 200 also has a function of detecting the position of the user terminal 400 in the exhibition hall. As an example in the present embodiment, the audio processing apparatus 200 receives, via the access points 110a to 110d, signals transmitted from the user terminal 400 and detects the position of the user terminal 400 based on these received signals. For example, the audio processing apparatus 200 detects the position of the user terminal 400 by triangulation, based on the differences in the reception time of the signal at the respective access points or on the differences in received signal strength. When this method is used, at least three access points are installed for position detection.
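One common way to turn measurements from three access points into a position is two-dimensional trilateration. The sketch below is a simplification and not the patent's stated procedure: it assumes that a distance to each access point has already been estimated (for example from received signal strength, which is a separate and error-prone step), that the access point coordinates are known, and that the three access points are not collinear. The function name `trilaterate` and the coordinates are hypothetical.

```python
import math


def trilaterate(p1, r1, p2, r2, p3, r3):
    """Estimate (x, y) from three known points and the distances to them.

    Subtracting the circle equation of p1 from those of p2 and p3 gives
    two linear equations in x and y, solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    a = 2.0 * (x2 - x1)
    b = 2.0 * (y2 - y1)
    c = r1**2 - r2**2 - x1**2 + x2**2 - y1**2 + y2**2
    d = 2.0 * (x3 - x1)
    e = 2.0 * (y3 - y1)
    f = r1**2 - r3**2 - x1**2 + x3**2 - y1**2 + y3**2

    det = a * e - b * d
    if abs(det) < 1e-12:
        raise ValueError("access points must not be collinear")
    x = (c * e - b * f) / det
    y = (a * f - c * d) / det
    return x, y


# Assumed access point positions in metres, and a terminal actually at (3, 4).
ap1, ap2, ap3 = (0.0, 0.0), (10.0, 0.0), (0.0, 10.0)
true_pos = (3.0, 4.0)
dists = [math.hypot(true_pos[0] - ax, true_pos[1] - ay) for ax, ay in (ap1, ap2, ap3)]

x, y = trilaterate(ap1, dists[0], ap2, dists[1], ap3, dists[2])
print(round(x, 6), round(y, 6))  # 3.0 4.0
```

With noisy distance estimates the three circles no longer meet in one point, so a practical system would more likely use a least-squares fit over all available access points.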

Furthermore, the headphones 500 are equipped with a direction sensor 510 for detecting the direction in which the user is facing. Hereinafter, the direction detected by the direction sensor 510 is referred to as the "line-of-sight direction".

The direction sensor 510 includes, for example, an acceleration sensor, a gyro sensor, and a geomagnetic sensor. The user terminal 400 calculates the user's line-of-sight direction based on the detection results of the direction sensor 510 and transmits the calculated line-of-sight direction to the audio processing apparatus 200 through at least one of the access points 110a to 110d.

The direction sensor 510 may be provided at a location separate from the headphones 500, and may be provided somewhere other than the head. However, the purpose of the direction sensor 510 is to detect where the user is looking, so it is desirable that the direction sensor 510 be mounted on the user's head. The direction detected by the direction sensor 510 may be a two-dimensional direction along the horizontal plane or a three-dimensional direction that also includes the vertical direction.

FIG. 3 is a diagram illustrating an arrangement example of the devices in the exhibition hall.
In the exhibition hall, for example, exhibits 310a to 310c are on display. An explainer 302a stands in front of the exhibit 310a and explains it. Similarly, an explainer 302b stands in front of the exhibit 310b, and an explainer 302c stands in front of the exhibit 310c. The explainers 302a to 302c hold microphones 301a to 301c, respectively. The voices of the explainers 302a to 302c are picked up by the microphones 301a to 301c, and the resulting audio signals are transmitted to the audio processing apparatus 200.

The audio processing apparatus 200 combines the received audio signals and transmits the synthesized audio signal, through the access points 110a to 110d, to the user terminal 400 carried by a user 401. The user 401 can hear the voices of the explainers 302a to 302c directly by approaching them, but basically listens to their voices through the headphones 500.

When a plurality of users 401 enter the exhibition hall, each of the users 401 carries a user terminal 400 and headphones 500.
When many exhibits are on display in the exhibition hall and each exhibit has its own explainer, many voices are heard throughout the hall at once. A user 401 who has entered the exhibition hall needs to pick out the audio for a desired exhibit from among these many voices, but merely turning the gaze toward the desired exhibit, for example, is not enough to hear the audio corresponding to that exhibit.

Therefore, the audio processing apparatus 200 acquires the line-of-sight direction of the user 401 from the user terminal 400 as needed and determines whether the movement of that direction has stopped. Hereinafter, the state in which the movement of the line-of-sight direction of the user 401 is judged to have stopped is referred to as the "gaze state". When the audio processing apparatus 200 determines that the user 401 has entered the gaze state, it adjusts the mixing balance of the audio from the microphones so that the audio corresponding to the exhibit located in the line-of-sight direction of the user 401 is emphasized and easier to hear, and transmits the adjusted synthesized audio signal to the user terminal 400.

In addition, when the audio processing apparatus 200 determines that the user 401 has entered the gaze state, it emphasizes the audio corresponding to the exhibit in the line-of-sight direction of the user 401 gradually rather than abruptly. This allows the user 401 to hear the desired audio in a way that feels natural.
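Gradual emphasis can be pictured as a gain that ramps up over time once the gaze state begins. The sketch below uses a linear ramp. The function name `emphasis_gain`, the 3-second ramp, and the start and end gains are assumptions chosen for illustration, and the passage above does not specify the shape of the transition.

```python
def emphasis_gain(elapsed_s, ramp_s=3.0, start_gain=1.0, end_gain=2.0):
    """Gain for the gazed-at source, elapsed_s seconds after the gaze state began.

    Rises linearly from start_gain to end_gain over ramp_s seconds and then
    stays at end_gain. All default values are hypothetical.
    """
    if ramp_s <= 0.0:
        return end_gain
    progress = min(max(elapsed_s / ramp_s, 0.0), 1.0)  # clamp to the range 0..1
    return start_gain + (end_gain - start_gain) * progress


for t in (0.0, 1.5, 3.0, 10.0):
    print(t, emphasis_gain(t))
# 0.0 1.0
# 1.5 1.5
# 3.0 2.0
# 10.0 2.0
```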

FIG. 4 is a diagram illustrating a hardware configuration example of the audio processing apparatus.
The audio processing apparatus 200 can be realized as a computer such as the one shown in FIG. 4. The audio processing apparatus 200 as a whole is controlled by a CPU (Central Processing Unit) 201. A RAM (Random Access Memory) 202 and a plurality of peripheral devices are connected to the CPU 201 via a bus 209.

The RAM 202 is used as the main storage device of the audio processing apparatus 200. The RAM 202 temporarily stores at least part of the OS (Operating System) program and the application programs to be executed by the CPU 201. The RAM 202 also stores various data necessary for processing by the CPU 201.

The peripheral devices connected to the bus 209 include an HDD (Hard Disk Drive) 203, a graphic interface 204, an input interface 205, an optical drive device 206, a network interface 207, and a communication interface 208.

The HDD 203 magnetically writes data to and reads data from a built-in magnetic disk. The HDD 203 is used as the secondary storage device of the audio processing apparatus 200. The HDD 203 stores the OS program, application programs, and various data. Other types of nonvolatile storage devices, such as flash memory, can also be used as the secondary storage device.

A monitor 204a is connected to the graphic interface 204. The graphic interface 204 displays images on the monitor 204a in accordance with commands from the CPU 201. The monitor 204a is, for example, a liquid crystal display.

Input devices such as a keyboard 205a and a mouse 205b are connected to the input interface 205. The input interface 205 transmits signals output from the input devices to the CPU 201.

The optical drive device 206 reads data recorded on an optical disc 206a using laser light or the like. The optical disc 206a is a portable recording medium on which data is recorded so as to be readable by reflection of light. Examples of the optical disc 206a include a DVD (Digital Versatile Disc), a DVD-RAM, a CD-ROM (Compact Disc Read Only Memory), and a CD-R (Recordable)/RW (ReWritable).

The network interface 207 transmits data to and receives data from other devices through the network 120. The communication interface 208 receives the digital audio signals picked up by the microphones.

FIG. 5 is a diagram illustrating a hardware configuration example of the user terminal.
The user terminal 400 can be realized as an information terminal device such as the one shown in FIG. 5. The user terminal 400 as a whole is controlled by a CPU 411. A RAM 412 and a plurality of peripheral devices are connected to the CPU 411 via a bus 419.

The RAM 412 is used as the main storage device of the user terminal 400. The RAM 412 temporarily stores at least part of the OS program and the application programs to be executed by the CPU 411. The RAM 412 also stores various data necessary for processing by the CPU 411.

The peripheral devices connected to the bus 419 include a flash memory 413, a display device 414, an input device 415, a wireless interface 416, a communication interface 417, and an audio interface 418.

The flash memory 413 is a nonvolatile semiconductor memory to which data is written and from which data is read electrically. The flash memory 413 is used as the secondary storage device of the user terminal 400. The flash memory 413 stores the OS program, application programs, and various data. Other types of nonvolatile storage devices, such as an HDD, can also be used as the secondary storage device.

表示装置４１４は、例えば液晶ディスプレイなどを含み、ＣＰＵ４１１からの命令に従って画像を表示する。入力装置４１５は、例えば、表示装置４１４の表示面に設置されたタッチパネルや、所定の操作キーなどを含む。入力装置４１５に対する操作に応じた信号がＣＰＵ４１１に送信される。 The display device 414 includes a liquid crystal display, for example, and displays an image according to a command from the CPU 411. The input device 415 includes, for example, a touch panel installed on the display surface of the display device 414, predetermined operation keys, and the like. A signal corresponding to the operation on the input device 415 is transmitted to the CPU 411.

無線インタフェース４１６は、アクセスポイント１１０ａ〜１１０ｄとの間で無線通信する。通信インタフェース４１７は、方向センサ５１０による検出結果を受信する。オーディオインタフェース４１８は、ＣＰＵ４１１から送信されたデジタル音声信号をアナログ音声信号に変換し、アナログ音声信号を増幅してヘッドフォン５００に出力する。 The wireless interface 416 performs wireless communication with the access points 110a to 110d. The communication interface 417 receives the detection result from the direction sensor 510. The audio interface 418 converts the digital audio signal transmitted from the CPU 411 into an analog audio signal, amplifies the analog audio signal, and outputs it to the headphones 500.

図６は、ユーザ端末および音声処理装置が備える処理機能の構成例を示すブロック図である。
ユーザ端末４００は、視線方向検出部４２１および再生処理部４２２を有する。これらの各処理ブロックは、例えば、ユーザ端末４００のＣＰＵ４１１（図５参照）が所定のプログラムを実行することで実現される。 FIG. 6 is a block diagram illustrating a configuration example of processing functions included in the user terminal and the voice processing device.
The user terminal 400 includes a line-of-sight direction detection unit 421 and a reproduction processing unit 422. Each of these processing blocks is realized, for example, by the CPU 411 (see FIG. 5) of the user terminal 400 executing a predetermined program.

視線方向検出部４２１は、方向センサ５１０による検出結果を基に、ユーザ４０１の視線方向ρｔを演算する。視線方向検出部４２１は、算出した視線方向ρｔを音声処理装置２００に送信する。なお、視線方向ρｔは、例えば、ｘ軸、ｙ軸、ｚ軸のそれぞれの回りの回転角度（Ｒｔ，Ｐｔ，Ｙｔ）で表される。あるいは、視線方向ρｔは、例えば、ベクトルとして表されてもよい。 The gaze direction detection unit 421 calculates the gaze direction ρt of the user 401 based on the detection result by the direction sensor 510. The gaze direction detection unit 421 transmits the calculated gaze direction ρt to the sound processing device 200. The line-of-sight direction ρt is represented by, for example, rotation angles (Rt, Pt, Yt) around the x-axis, y-axis, and z-axis. Alternatively, the line-of-sight direction ρt may be represented as a vector, for example.

再生処理部４２２は、例えば、音声処理装置２００から受信した音声信号を所定の符号化方式に従って復号化し、オーディオインタフェース４１８（図５参照）に供給する。また、再生処理部４２２は、例えば、受信した音声信号に対して、擬似的な３Ｄ効果を与える処理などを施してもよい。 For example, the reproduction processing unit 422 decodes the audio signal received from the audio processing device 200 according to a predetermined encoding method, and supplies the decoded audio signal to the audio interface 418 (see FIG. 5). Further, the reproduction processing unit 422 may perform, for example, a process for giving a pseudo 3D effect to the received audio signal.

一方、音声処理装置２００は、ユーザ位置検出部２１１、音声入力部２１２、計時部２１３、注視判定部２１４、聴取範囲制御部２１５および音声出力処理部２１６を有する。これらの各処理ブロックの処理は、例えば、音声処理装置２００のＣＰＵ２０１（図４参照）が所定のプログラムを実行することで実現される。 On the other hand, the audio processing device 200 includes a user position detection unit 211, an audio input unit 212, a time measurement unit 213, a gaze determination unit 214, a listening range control unit 215, and an audio output processing unit 216. The processing of each processing block is realized, for example, by the CPU 201 (see FIG. 4) of the sound processing device 200 executing a predetermined program.

また、音声処理装置２００の記憶装置には、音源管理テーブル２２０およびユーザ管理テーブル２３０が格納される。音源管理テーブル２２０には、各展示物に対応する音声に関する情報が登録される。また、ユーザ管理テーブル２３０には、各ユーザ４０１に関する情報が登録される。音源管理テーブル２２０およびユーザ管理テーブル２３０は、例えば、音声処理装置２００のＲＡＭ２０２（図４参照）に展開されて、音声処理装置の処理ブロックから読み書きされる。 In addition, a sound source management table 220 and a user management table 230 are stored in the storage device of the audio processing device 200. Information about the audio corresponding to each exhibit is registered in the sound source management table 220, and information about each user 401 is registered in the user management table 230. The sound source management table 220 and the user management table 230 are, for example, loaded into the RAM 202 (see FIG. 4) of the audio processing device 200 and are read and written by the processing blocks of the audio processing device.

ユーザ位置検出部２１１は、ユーザ端末４００から送信された信号をアクセスポイント１１０ａ〜１１０ｄを通じて受信し、これらの受信信号を基にユーザ端末４００の位置を検出する。ユーザ位置検出部２１１は、検出したユーザ端末４００の位置を、ユーザ位置Ｑｔとして、検出対象のユーザに対応するユーザ管理テーブル２３０に随時登録する。 The user position detection unit 211 receives signals transmitted from the user terminal 400 through the access points 110a to 110d, and detects the position of the user terminal 400 based on these received signals. The user position detection unit 211 registers the detected position of the user terminal 400 as the user position Qt in the user management table 230 corresponding to the detection target user as needed.

音声入力部２１２は、複数の説明者がそれぞれ備えるマイクロフォンによって収音された音声信号を、音源として受信し、受信した音声信号を音声出力処理部２１６に供給する。 The voice input unit 212 receives a voice signal picked up by microphones respectively provided by a plurality of explainers as a sound source, and supplies the received voice signal to the voice output processing unit 216.

音源管理テーブル２２０には、音声入力部２１２が受信する各音源についての情報が登録される。後述するように、各音源は、展示会場に対応する仮想空間上の任意の位置に配置され、音源管理テーブル２２０には、仮想空間における各音源の位置情報などが登録される。 In the sound source management table 220, information on each sound source received by the voice input unit 212 is registered. As will be described later, each sound source is arranged at an arbitrary position in the virtual space corresponding to the exhibition hall, and the sound source management table 220 registers position information of each sound source in the virtual space.

計時部２１３は、現在の時刻を注視判定部２１４および聴取範囲制御部２１５に供給する。計時部２１３からの時刻は、各種の経過時間を求める際に利用される。
注視判定部２１４は、ユーザ端末４００から受信した視線方向ρｔに基づいて、ユーザ４０１が注視状態にあるか否かを判定する。前述のように、注視状態とは、視線方向ρｔの動きが静止したと判断される状態である。注視判定部２１４は、注視状態にあるかの判定結果を聴取範囲制御部２１５に通知する。また、注視判定部２１４は、ユーザ端末４００から受信した視線方向ρｔを、検出対象のユーザ４０１に対応するユーザ管理テーブル２３０に随時登録する。 The timer unit 213 supplies the current time to the gaze determination unit 214 and the listening range control unit 215. The time from the time measuring unit 213 is used when various elapsed times are obtained.
The gaze determination unit 214 determines whether the user 401 is in a gaze state based on the line-of-sight direction ρt received from the user terminal 400. As described above, the gaze state is a state in which the line-of-sight direction ρt is judged to have stopped moving. The gaze determination unit 214 notifies the listening range control unit 215 of the result of this determination. In addition, the gaze determination unit 214 registers the line-of-sight direction ρt received from the user terminal 400 in the user management table 230 corresponding to the detection target user 401 as needed.

聴取範囲制御部２１５は、上記の仮想空間においてユーザ４０１ごとに設定される聴取範囲の大きさを制御する。聴取範囲とは、ユーザ４０１から見て視線方向ρｔを中心とした範囲であり、後述するように、視線方向ρｔとのなす角度によって聴取範囲が決定される。そして、仮想空間上に配置された音源のうち、聴取範囲に含まれる音源に対応する音声信号の音量が、聴取範囲に含まれない音源に対応する音声信号の音量より大きくなるように制御される。本実施の形態では例として、仮想空間上に配置された音源のうち、聴取範囲に含まれる音源からの音声のみがユーザ４０１に提供され、聴取範囲に含まれない音源からの音声はユーザ４０１に聞こえないように制御される。 The listening range control unit 215 controls the size of the listening range set for each user 401 in the virtual space. The listening range is a range centered on the line-of-sight direction ρt as seen from the user 401 and, as described later, is determined by the angle it forms with the line-of-sight direction ρt. Among the sound sources arranged in the virtual space, the volume of the audio signals of sound sources inside the listening range is controlled to be larger than the volume of the audio signals of sound sources outside it. In the present embodiment, as an example, only the sound from sound sources inside the listening range is provided to the user 401, and the sound from sound sources outside the listening range is controlled so that the user 401 cannot hear it.

聴取範囲制御部２１５は、注視判定部２１４によってユーザ４０１が注視状態にあると判定されると、聴取範囲を所定の最小の大きさになるまで徐々に狭めていく。このような制御により、聴取範囲制御部２１５は、ユーザ端末４００に送信される音声において、ユーザ４０１が向いている方向に配置された音源からの音声が徐々に聞き取りやすくなるようにする。 When the gaze determination unit 214 determines that the user 401 is in the gaze state, the listening range control unit 215 gradually narrows the listening range until a predetermined minimum size is reached. By such control, the listening range control unit 215 gradually makes it easier to hear the sound from the sound source arranged in the direction in which the user 401 is facing in the sound transmitted to the user terminal 400.

音声出力処理部２１６は、音声入力部２１２から入力される音声信号のうち、聴取範囲制御部２１５によって設定された聴取範囲に含まれる音源についての音声信号を選択して合成し、左右１チャネルずつの合成音声信号を生成する。音声出力処理部２１６は、生成した合成音声信号を、ユーザ端末４００に対して送信する。 From the audio signals input from the audio input unit 212, the audio output processing unit 216 selects those of the sound sources included in the listening range set by the listening range control unit 215 and synthesizes them to generate a synthesized audio signal with one channel each for left and right. The audio output processing unit 216 transmits the generated synthesized audio signal to the user terminal 400.

次に、図７は、仮想空間における音源の配置例を示す図である。
仮想空間３２０は、ユーザ４０１や展示物３１０ａ〜３１０ｊが存在する展示会場を二次元または三次元の座標系によって表した空間である。図７では、仮想空間３２０をｘ軸、ｙ軸、ｚ軸による三次元座標系によって表した例を示している。ユーザ位置Ｑｔは、座標（Ｘｔ，Ｙｔ，Ｚｔ）によって表される。また、ユーザ４０１の視線方向ρｔは、例えば、各軸の回りの回転角度を用いて（Ｒｔ，Ｐｔ，Ｙｔ）と表される。 Next, FIG. 7 is a diagram illustrating an arrangement example of sound sources in the virtual space.
The virtual space 320 is a space that represents an exhibition hall where the user 401 and the exhibits 310a to 310j exist by a two-dimensional or three-dimensional coordinate system. FIG. 7 shows an example in which the virtual space 320 is represented by a three-dimensional coordinate system using the x axis, the y axis, and the z axis. The user position Qt is represented by coordinates (Xt, Yt, Zt). Further, the line-of-sight direction ρt of the user 401 is expressed as (Rt, Pt, Yt) using, for example, a rotation angle around each axis.

音源Ｐ１〜Ｐ１０は、展示物３１０ａ〜３１０ｊのそれぞれを説明する説明者の音声を収音した音声信号に対応する。そして、音源Ｐ１〜Ｐ１０は、仮想空間３２０における展示物３１０ａ〜３１０ｊのそれぞれの位置に配置される。例えば、音源Ｐ１は、展示物３１０ａを説明する説明者の音声信号に対応し、音源Ｐ２は、展示物３１０ｂを説明する説明者の音声信号に対応する。そして、音源Ｐ１は、展示物３１０ａの位置に仮想的に配置され、音源Ｐ２は、展示物３１０ｂの位置に仮想的に配置される。なお、音源の位置は、例えば、対応する展示物の中心など、対応する展示物を代表する位置に配置される。 The sound sources P1 to P10 correspond to audio signals obtained by picking up the voices of the explainers explaining the exhibits 310a to 310j. The sound sources P1 to P10 are arranged at the positions of the exhibits 310a to 310j in the virtual space 320. For example, the sound source P1 corresponds to the audio signal of the explainer explaining the exhibit 310a, and the sound source P2 corresponds to the audio signal of the explainer explaining the exhibit 310b. The sound source P1 is virtually arranged at the position of the exhibit 310a, and the sound source P2 is virtually arranged at the position of the exhibit 310b. Each sound source is placed at a position that represents the corresponding exhibit, such as its center.

仮想空間３２０における音源Ｐ１〜Ｐ１０のそれぞれの位置は、音源管理テーブル２２０に設定される。音声処理装置２００の管理者は、音声処理装置２００への入力操作により、各音源の位置を任意に設定することができる。 The positions of the sound sources P1 to P10 in the virtual space 320 are set in the sound source management table 220. The administrator of the sound processing device 200 can arbitrarily set the position of each sound source by an input operation to the sound processing device 200.

なお、図７では、仮想空間３２０における音源Ｐｎ（ｎは１以上の整数）の位置を示す座標を（Ｘｎ，Ｙｎ，Ｚｎ）と表記している。
また、本実施の形態では、展示物の説明者が発する音声を音源とするが、音源は、例えば、あらかじめ記憶装置に格納された音声信号に基づく再生音声であってもよい。この場合、例えば、音声処理装置２００のＨＤＤ２０３に、音源として使用する音声信号が格納され、音声入力部２１２は、ＨＤＤ２０３から読み出した音声信号を音声出力処理部２１６に供給する。 In FIG. 7, coordinates indicating the position of the sound source Pn (n is an integer of 1 or more) in the virtual space 320 are expressed as (Xn, Yn, Zn).
In this embodiment, the voice of the person explaining each exhibit is used as a sound source. However, the sound source may be, for example, reproduced sound based on an audio signal stored in a storage device in advance. In this case, for example, the audio signal used as the sound source is stored in the HDD 203 of the audio processing device 200, and the audio input unit 212 supplies the audio signal read from the HDD 203 to the audio output processing unit 216.

図８は、音源管理テーブルに登録される情報の例を示す図である。
音源管理テーブル２２０には、仮想空間に配置された音源ごとに、各音源を一意に識別するための音源ＩＤと、仮想空間において音源が配置された位置を示す音源座標とが、対応付けて登録されている。音声処理装置２００の管理者は、音源管理テーブル２２０に対して新たな音源の情報を追加する、音源座標を変更する、音源管理テーブル２２０から音源の情報を削除する、といった操作を行うことができる。 FIG. 8 is a diagram illustrating an example of information registered in the sound source management table.
In the sound source management table 220, a sound source ID that uniquely identifies each sound source and sound source coordinates that indicate where the sound source is placed in the virtual space are registered in association with each other for each sound source arranged in the virtual space. The administrator of the audio processing device 200 can add information about a new sound source to the sound source management table 220, change sound source coordinates, and delete sound source information from the sound source management table 220.

なお、図８では例として、音源ＩＤが示す音源に対応付けられた展示物を一意に識別するための展示物ＩＤが登録されている。音声処理装置２００の管理者は、例えば、１つの展示物に対して複数の音源を対応付ける、あるいは、複数の展示物に対して１つの音源を対応付けることも可能である。 In FIG. 8, for example, an exhibit ID for uniquely identifying an exhibit associated with the sound source indicated by the sound source ID is registered. For example, the administrator of the sound processing apparatus 200 can associate a plurality of sound sources with one exhibit, or can associate one sound source with a plurality of exhibits.
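
The table of FIG. 8 can be pictured as a small keyed record store. The following is a minimal sketch only: the names `SoundSource` and `SoundSourceTable` and the in-memory dictionary are illustrative assumptions, not structures described in this document.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SoundSource:
    source_id: str                        # sound source ID (unique key)
    exhibit_id: str                       # exhibit ID associated with the source
    position: Tuple[float, float, float]  # sound source coordinates (Xn, Yn, Zn)


class SoundSourceTable:
    """Hypothetical in-memory stand-in for the sound source management table 220."""

    def __init__(self):
        self._rows = {}

    def add(self, source):
        # Register a new sound source; IDs must stay unique.
        if source.source_id in self._rows:
            raise ValueError(f"duplicate sound source ID: {source.source_id}")
        self._rows[source.source_id] = source

    def move(self, source_id, position):
        # Change the sound source coordinates of an existing entry.
        self._rows[source_id].position = position

    def remove(self, source_id):
        # Delete the entry for a sound source.
        del self._rows[source_id]

    def sources_for_exhibit(self, exhibit_id):
        # Several sound sources may be associated with one exhibit.
        return [s for s in self._rows.values() if s.exhibit_id == exhibit_id]
```

Associating one sound source with several exhibits would need a set of exhibit IDs per row rather than the single field used in this sketch.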

次に、図９は、注視状態の判定方法の例について説明するための図である。
前述のように、注視状態とは、ユーザ４０１の視線方向ρｔの動きが静止したと判断される状態である。ただし、ユーザ４０１が特定の位置を注視している状態であっても、実際にはユーザ４０１の動きが完全に静止することは少ない。この点を鑑みて、音声処理装置２００の注視判定部２１４は、視線方向ρｔの値の変動量がある一定の閾値幅Ｗｔｈに所定時間だけ収まっている場合に、ユーザ４０１が注視状態になったと判定する。 Next, FIG. 9 is a diagram for explaining an example of a gaze state determination method.
As described above, the gaze state is a state in which the line-of-sight direction ρt of the user 401 is judged to have stopped moving. However, even when the user 401 is gazing at a specific position, the user 401 is in practice rarely completely still. In view of this, the gaze determination unit 214 of the audio processing device 200 determines that the user 401 has entered the gaze state when the variation in the value of the line-of-sight direction ρt stays within a fixed threshold width Wth for a predetermined time.

図９の例では、視線方向ρｔの値（図９では角度Ｒｔ）の変動量が、時刻ｔ１から、あらかじめ決められた判定時間Ｔａが経過した時刻ｔ２までの期間において、閾値幅Ｗｔｈに収まっている。この場合、注視判定部２１４は、時刻ｔ２においてユーザ４０１が注視状態になったと判定する。また、その後の時刻ｔ３において視線方向ρｔの変動量が閾値幅Ｗｔｈから逸脱すると、注視判定部２１４は、注視状態が解消されたと判定する。 In the example of FIG. 9, the variation in the value of the line-of-sight direction ρt (the angle Rt in FIG. 9) stays within the threshold width Wth during the period from time t1 to time t2, at which the predetermined determination time Ta has elapsed. In this case, the gaze determination unit 214 determines that the user 401 has entered the gaze state at time t2. When the variation of the line-of-sight direction ρt subsequently leaves the threshold width Wth at time t3, the gaze determination unit 214 determines that the gaze state has ended.

なお、ユーザ４０１が注視状態にあるか否かの判定は、実際には、視線方向ρｔを示す各軸方向の値（すなわち、Ｒｔ，Ｐｔ，Ｙｔ）の変動量が、すべて閾値幅Ｗｔｈに収まっているかによって行われる。 Note that whether the user 401 is in the gaze state is actually determined by whether the variations of all the per-axis values representing the line-of-sight direction ρt (that is, Rt, Pt, and Yt) stay within the threshold width Wth.
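
The determination described with FIG. 9 can be sketched as a check over recent samples. This is one possible reading, not the method fixed by this document: the class name `GazeDetector`, the use of per-axis max minus min as the "variation", and the absence of angle wrap-around handling are all assumptions.

```python
from collections import deque


class GazeDetector:
    """Hypothetical gaze-state check over (Rt, Pt, Yt) samples."""

    def __init__(self, w_th, t_a):
        self.w_th = w_th          # threshold width Wth (degrees)
        self.t_a = t_a            # determination time Ta (seconds)
        self._samples = deque()   # retained (time, (R, P, Y)) samples

    def update(self, t, angles):
        """Feed one sample; return True while the user is judged to be gazing."""
        self._samples.append((t, angles))
        # If any axis now varies by more than Wth, drop the oldest samples
        # until the retained run fits inside the threshold width again.
        while self._spread() > self.w_th:
            self._samples.popleft()
        # The gaze state holds once the in-threshold run has lasted at least Ta.
        return t - self._samples[0][0] >= self.t_a

    def _spread(self):
        # Largest per-axis (max - min) over the retained samples.
        return max(
            max(a[i] for _, a in self._samples) - min(a[i] for _, a in self._samples)
            for i in range(3)
        )
```

In this sketch the retained run keeps growing while the user stays still, so a long-running deployment would want to cap its length. Angles near the ±180 degree boundary would also need unwrapping before the comparison.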

次に、図１０は、聴取範囲について説明するための図である。この図１０では例として、二次元の仮想空間における聴取範囲の例を示す。
聴取範囲は、ユーザ位置Ｑｔを中心として設定される。図１０のように仮想空間が二次元座標によって定義される場合、聴取範囲は、聴取範囲の境界が、水平面（ｘ−ｙ平面）においてユーザ４０１の視線方向ρｔとなす角度θｔによって定義される。ここで言う境界とは、２次元空間の場合、ユーザ位置Ｑｔから放射状に延びる２本の直線である。そして、一方の境界線と視線方向ρｔとの間の範囲と、他方の境界線と視線方向ρｔとの間の範囲とが、聴取範囲となる。 Next, FIG. 10 is a diagram for explaining the listening range. FIG. 10 shows an example of a listening range in a two-dimensional virtual space as an example.
The listening range is set around the user position Qt. When the virtual space is defined by two-dimensional coordinates as shown in FIG. 10, the listening range is defined by an angle θt that the boundary of the listening range makes with the visual line direction ρt of the user 401 on the horizontal plane (xy plane). In the case of a two-dimensional space, the boundaries referred to here are two straight lines extending radially from the user position Qt. The range between the one boundary line and the line-of-sight direction ρt and the range between the other boundary line and the line-of-sight direction ρt are the listening range.

図１０（Ａ）の例では、聴取範囲は、ユーザ位置Ｑｔを中心として、視線方向ρｔから右回り方向および左回り方向に対してそれぞれθ１度の範囲を指す。聴取範囲の角度θｔの最大値は１８０度であり、本実施の形態では、聴取範囲の角度θｔの初期値は、最大値である１８０度であるものとする。また、聴取範囲の角度θｔの最小値は、０度より大きい所定の角度に設定される。 In the example of FIG. 10A, the listening range refers to a range of θ1 degrees with respect to the clockwise direction and the counterclockwise direction from the line-of-sight direction ρt with the user position Qt as the center. The maximum value of the listening range angle θt is 180 degrees, and in this embodiment, the initial value of the listening range angle θt is 180 degrees, which is the maximum value. In addition, the minimum value of the angle θt of the listening range is set to a predetermined angle greater than 0 degrees.

ここで、図１０に示すように、仮想空間において、ユーザ４０１の周囲の所定位置に音源Ｐ１〜Ｐ４，Ｐ９が配置されているものとする。音声処理装置２００は、ユーザ位置Ｑｔから見て聴取範囲に含まれる音源の音声信号を合成して、合成音声信号をユーザ端末４００に送信する。 Here, as shown in FIG. 10, it is assumed that the sound sources P1 to P4 and P9 are arranged at predetermined positions around the user 401 in the virtual space. The audio processing device 200 synthesizes the audio signals of the sound sources that are included in the listening range as seen from the user position Qt, and transmits the synthesized audio signal to the user terminal 400.

図１０（Ａ）の例では、音源Ｐ１〜Ｐ４，Ｐ９が聴取範囲に含まれているので、音声処理装置２００は、音源Ｐ１〜Ｐ４，Ｐ９のそれぞれの音声信号を合成して、合成音声信号をユーザ端末４００に送信する。一方、図１０（Ｂ）の例では、聴取範囲の角度θｔは、θ１より小さいθ２に設定されており、図１０（Ａ）の場合より聴取範囲が狭められている。このとき、音源Ｐ１〜Ｐ３は聴取範囲に含まれているものの、音源Ｐ４，Ｐ９は聴取範囲に含まれていない。この状態では、音声処理装置２００は、音源Ｐ１〜Ｐ３のそれぞれの音声信号を合成して、合成音声信号をユーザ端末４００に送信する。従って、ユーザ４０１には音源Ｐ４，Ｐ９の各音声は聞こえない。 In the example of FIG. 10A, the sound sources P1 to P4 and P9 are included in the listening range, so the audio processing device 200 synthesizes their audio signals and transmits the synthesized audio signal to the user terminal 400. In the example of FIG. 10B, on the other hand, the listening range angle θt is set to θ2, which is smaller than θ1, so the listening range is narrower than in FIG. 10A. Here the sound sources P1 to P3 are included in the listening range, but the sound sources P4 and P9 are not. In this state, the audio processing device 200 synthesizes the audio signals of the sound sources P1 to P3 and transmits the synthesized audio signal to the user terminal 400. The user 401 therefore cannot hear the sounds of the sound sources P4 and P9.

聴取範囲の角度θｔが狭められることで、ユーザ４０１には視線方向ρｔの近くに配置された音源の音声のみが聞こえるようになる。これによりユーザ４０１は、自分が向いている方向から発せられる音声を容易に聞き取ることができるようになる。 By narrowing the angle θt of the listening range, the user 401 can hear only the sound of the sound source arranged near the line-of-sight direction ρt. As a result, the user 401 can easily hear the sound emitted from the direction in which the user 401 is facing.
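
The membership test for the two-dimensional case of FIG. 10 can be sketched as an angle comparison. The function name `in_listening_range_2d` is hypothetical, and the sketch assumes that the line-of-sight direction is given as a single angle in degrees, measured counterclockwise from the +x axis in the x-y plane.

```python
import math


def in_listening_range_2d(user_pos, gaze_deg, theta_deg, source_pos):
    """Return True if the source lies within ±theta_deg of the gaze direction.

    user_pos, source_pos: (x, y) coordinates in the virtual space.
    gaze_deg: line-of-sight direction as an angle from the +x axis (assumed).
    theta_deg: listening range angle (0 < theta_deg <= 180).
    """
    dx = source_pos[0] - user_pos[0]
    dy = source_pos[1] - user_pos[1]
    # Direction from the user position Qt to the sound source.
    bearing = math.degrees(math.atan2(dy, dx))
    # Signed smallest difference between bearing and gaze, in [-180, 180).
    offset = (bearing - gaze_deg + 180.0) % 360.0 - 180.0
    return abs(offset) <= theta_deg
```

With `theta_deg=180.0` every source passes the test, which matches the initial state in which all surrounding sound sources are audible.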

なお、音声処理装置２００は、聴取範囲に含まれる各音源の音像を、ユーザ位置Ｑｔと各音源の位置との相対関係に基づいて、視線方向ρｔを基準とした左右方向の対応する位置に定位させる。図１０の例では、音声処理装置２００は、音源Ｐ１の音声信号について、右チャネルの音量より左チャネルの音量を大きくし、音源Ｐ３の音声信号について、左チャネルの音量より右チャネルの音量を大きくする。 Note that the audio processing device 200 localizes the sound image of each sound source included in the listening range at the corresponding left-right position relative to the line-of-sight direction ρt, based on the relative relationship between the user position Qt and the position of each sound source. In the example of FIG. 10, the audio processing device 200 makes the left channel louder than the right channel for the audio signal of the sound source P1, and makes the right channel louder than the left channel for the audio signal of the sound source P3.
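
This document states only that a source to the left is made louder in the left channel and a source to the right louder in the right channel. As an illustration, the sketch below uses constant-power panning, which is a common technique substituted here and not one named in this document. The function name `stereo_gains` and the sign convention (positive offset means the source is to the listener's left) are assumptions.

```python
import math


def stereo_gains(offset_deg):
    """Return (left_gain, right_gain) for a source at offset_deg from the gaze.

    Positive offset_deg: source to the listener's left (assumed convention).
    """
    # Clamp to ±90 degrees: sources beyond that are treated as fully left or right.
    clamped = max(-90.0, min(90.0, offset_deg))
    # Map +90 (full left) .. -90 (full right) onto a pan angle of 0 .. 90 degrees.
    pan = math.radians((90.0 - clamped) / 2.0)
    # cos/sin keep left**2 + right**2 equal to 1 for every pan position.
    return math.cos(pan), math.sin(pan)
```

A source straight ahead receives equal gains in both channels, and the total power stays constant as the source moves across the listening range.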

なお、本実施の形態では、音声処理装置２００は、聴取範囲に含まれない音源に対応する音声信号を、合成音声信号に合成しない。しかしながら、音声処理装置２００は、聴取範囲に含まれない音源に対応する音声信号についても、聴取範囲に含まれている音源に対応する音声信号より音量を低くして、合成音声信号に合成してもよい。 In the present embodiment, the audio processing device 200 does not mix audio signals of sound sources outside the listening range into the synthesized audio signal. However, the audio processing device 200 may also mix in the audio signals of sound sources outside the listening range, at a lower volume than the audio signals of sound sources inside it.
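
Both behaviors described above (dropping out-of-range sources, or mixing them in at reduced volume) can be covered by one mixing routine. The following is a minimal mono sketch. The function name `mix_sources`, the parameter `outside_gain`, and the representation of audio as Python lists of samples are assumptions made for illustration.

```python
def mix_sources(frames, in_range, outside_gain=0.0):
    """Mix equal-length sample lists into one list.

    frames:       {source_id: [samples]} for the current block of audio.
    in_range:     set of source IDs inside the listening range.
    outside_gain: gain applied to sources outside the range
                  (0.0 leaves them out entirely, as in the present embodiment).
    """
    if not frames:
        return []
    length = len(next(iter(frames.values())))
    mixed = [0.0] * length
    for source_id, samples in frames.items():
        gain = 1.0 if source_id in in_range else outside_gain
        if gain == 0.0:
            continue  # source is not mixed in at all
        for i, sample in enumerate(samples):
            mixed[i] += gain * sample
    return mixed
```

For example, `mix_sources(frames, {"P1", "P2", "P3"})` reproduces the situation of FIG. 10B, while a small positive `outside_gain` corresponds to the variation in which out-of-range sources remain faintly audible.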

次に、図１１および図１２を用いて、視線方向ρｔの動きと聴取範囲の角度との関係について説明する。まず、図１１は、二次元の仮想空間における聴取範囲の変化について説明するための図である。 Next, the relationship between the movement in the line-of-sight direction ρt and the angle of the listening range will be described with reference to FIGS. 11 and 12. First, FIG. 11 is a diagram for explaining a change in the listening range in a two-dimensional virtual space.

音声処理装置２００は、ユーザ４０１の視線方向ρｔの動きが静止していない状態（非注視状態）では、聴取範囲の角度θｔを、最大値θｍａｘ（＝１８０度）に設定する。この状態では、ユーザ４０１には、周囲に配置されたすべての音源の音声が聞こえる。そして、音声処理装置２００は、視線方向ρｔの動きが静止した「注視状態」になったと判定すると、聴取範囲の角度θｔを狭くする。このとき、音声処理装置２００は、聴取範囲の角度θｔをすぐに最小値θｍｉｎに変更するのではなく、時間経過に従って徐々に狭くしていく。図１１の例では、注視状態に遷移した後、聴取範囲の角度θｔはθｍａｘからθ１，θ２，θｍｉｎのように徐々に狭められている。 The audio processing device 200 sets the listening range angle θt to the maximum value θmax (= 180 degrees) while the line-of-sight direction ρt of the user 401 is moving (non-gaze state). In this state, the user 401 hears the sound of all the sound sources arranged around the user. When the audio processing device 200 determines that the line-of-sight direction ρt has stopped moving, that is, that the user has entered the “gaze state”, it narrows the listening range angle θt. At this time, the audio processing device 200 does not change the angle θt to the minimum value θmin immediately, but narrows it gradually as time passes. In the example of FIG. 11, after the transition to the gaze state, the listening range angle θt is gradually narrowed from θmax through θ1 and θ2 to θmin.

このように、ユーザ４０１が注視状態になると、聴取範囲の角度θｔが狭められていき、ユーザ４０１の周囲に配置された音源のうち、ユーザ４０１に聞こえる音源の音声の数が減っていく。そして、最終的には、ユーザ４０１が向いている方向の周囲のみの狭い範囲、すなわち角度θｍｉｎが設定された聴取範囲に含まれる音源の音声のみが、ユーザ４０１に聞こえるようになる。また、聴取範囲が徐々に狭められることで、ユーザ４０１は、自分が向いている方向から発せられる音声を自然な感覚で聞き取ることができる。 As described above, when the user 401 is in a gaze state, the angle θt of the listening range is narrowed, and the number of sound sources that can be heard by the user 401 among the sound sources arranged around the user 401 decreases. Eventually, the user 401 can hear only the sound of the sound source included in the narrow range only around the direction in which the user 401 is facing, that is, the listening range in which the angle θmin is set. Further, by gradually narrowing the listening range, the user 401 can listen to the sound emitted from the direction in which he / she is facing with a natural feeling.

図１２は、三次元の仮想空間における聴取範囲の変化について説明するための図である。
仮想空間が三次元座標系によって定義される場合には、聴取範囲は、聴取範囲の境界が、水平方向において視線方向ρｔとなす角度θｔと、鉛直方向において視線方向ρｔとなす角度φｔとによって定義される。音声処理装置２００は、視線方向ρｔの動きが静止していない状態（非注視状態）では、聴取範囲の水平方向の角度θｔを最大値θｍａｘ（＝１８０度）に設定するとともに、聴取範囲の鉛直方向の角度φｔを最大値φｍａｘ（＝１８０度）に設定する。この状態では、ユーザ４０１には、周囲に配置されたすべての音源の音声が聞こえる。 FIG. 12 is a diagram for explaining a change in the listening range in the three-dimensional virtual space.
When the virtual space is defined by a three-dimensional coordinate system, the listening range is defined by the angle θt that the boundary of the listening range forms with the line-of-sight direction ρt in the horizontal direction and the angle φt that it forms with the line-of-sight direction ρt in the vertical direction. While the line-of-sight direction ρt is moving (non-gaze state), the audio processing device 200 sets the horizontal angle θt of the listening range to the maximum value θmax (= 180 degrees) and the vertical angle φt of the listening range to the maximum value φmax (= 180 degrees). In this state, the user 401 hears the sound of all the sound sources arranged around the user.

そして、音声処理装置２００は、視線方向ρｔの動きが静止した「注視状態」になったと判定すると、聴取範囲の角度θｔ，φｔをともに徐々に狭くしていく。図１２の例では、注視状態に遷移した後、聴取範囲の水平方向の角度θｔはθｍａｘからθ１，θ２，θｍｉｎのように徐々に狭められていき、聴取範囲の鉛直方向の角度φｔはφｍａｘからφ１，φ２，φｍｉｎのように徐々に狭められていく。 When the sound processing apparatus 200 determines that the movement in the line-of-sight direction ρt is in a “gaze state” where the movement is stationary, both the angles θt and φt of the listening range are gradually narrowed. In the example of FIG. 12, after transitioning to the gaze state, the horizontal angle θt of the listening range is gradually narrowed from θmax to θ1, θ2, θmin, and the vertical angle φt of the listening range is changed from φmax. It is gradually narrowed like φ1, φ2, φmin.
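
For the three-dimensional case of FIG. 12, one way to apply the two angles is to compare azimuth and elevation separately. This is an interpretation, not a formula given in this document: the function name `in_listening_range_3d` is hypothetical, and the sketch assumes the line-of-sight direction is supplied as an azimuth (counterclockwise from the +x axis in the x-y plane) and an elevation above that plane, both in degrees.

```python
import math


def in_listening_range_3d(user_pos, gaze_az_deg, gaze_el_deg,
                          theta_deg, phi_deg, source_pos):
    """Return True if the source is within ±theta_deg horizontally and
    ±phi_deg vertically of the gaze direction, as seen from user_pos."""
    dx, dy, dz = (s - u for s, u in zip(source_pos, user_pos))
    # Horizontal direction (azimuth) and vertical direction (elevation) to the source.
    az = math.degrees(math.atan2(dy, dx))
    el = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    # Signed smallest azimuth difference, in [-180, 180).
    d_az = (az - gaze_az_deg + 180.0) % 360.0 - 180.0
    d_el = el - gaze_el_deg
    return abs(d_az) <= theta_deg and abs(d_el) <= phi_deg
```

With both angles at 180 degrees every source is accepted, and narrowing θt and φt together shrinks the accepted region toward the gaze direction.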

次に、図１３および図１４を用いて、聴取範囲を変化させる方法の例について説明する。図１３は、聴取範囲を変化させる方法の第１の例を示す図である。
図１３の例では、時刻ｔｄから、判定時間Ｔａが経過した時刻ｔ０までの期間において、ユーザ４０１の視線方向ρｔの変動量が所定の閾値幅Ｗｔｈに収まっている。この場合、注視判定部２１４は、時刻ｔ０において、ユーザ４０１が注視状態になったと判定する。聴取範囲制御部２１５は、時刻ｔ０を起点として、次の式（１）に従って聴取範囲の角度θｔを減少させる。
θｔ＝（θｍａｘ−θｍｉｎ）／｛１＋ＥＸＰ［（（ｔｓ−ｔ０）−（Ｔｆ／２））×ａ］｝＋θｍｉｎ・・・（１）
なお、式（１）において、ｔｓは現在時刻を示す。また、Ｔｆは、図１３に示すように、聴取範囲の角度θｔが最大値θｍａｘから最小値θｍｉｎになるまでの時間を示し、例えば、音声処理装置２００の管理者によって任意に設定可能である。また、ａは、任意に設定可能な定数であり、例えばａ＝１２／Ｔｆに設定される。 Next, an example of a method for changing the listening range will be described with reference to FIGS. 13 and 14. FIG. 13 is a diagram illustrating a first example of a method for changing the listening range.
In the example of FIG. 13, the variation of the line-of-sight direction ρt of the user 401 stays within the predetermined threshold width Wth during the period from time td to time t0, at which the determination time Ta has elapsed. In this case, the gaze determination unit 214 determines at time t0 that the user 401 has entered the gaze state. Starting from time t0, the listening range control unit 215 decreases the listening range angle θt according to the following equation (1).
θt = (θmax−θmin) / {1 + EXP [((ts−t0) − (Tf / 2)) × a]} + θmin (1)
In Equation (1), ts indicates the current time. Further, as shown in FIG. 13, Tf indicates a time until the angle θt of the listening range reaches the minimum value θmin from the maximum value θmax, and can be arbitrarily set by, for example, the administrator of the sound processing apparatus 200. Further, a is a constant that can be arbitrarily set, and is set to a = 12 / Tf, for example.
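
Equation (1) is a logistic curve centered on the midpoint of the transition time Tf. A direct transcription is sketched below. The function name `narrowing_angle`, the default `theta_min=20.0`, and the clamping of the elapsed time to the interval from 0 to Tf are assumptions added for illustration.

```python
import math


def narrowing_angle(ts, t0, tf, theta_max=180.0, theta_min=20.0, a=None):
    """Equation (1): listening range angle at time ts, gaze having begun at t0."""
    if a is None:
        a = 12.0 / tf  # example value of the constant a given in the text
    # Assumption: hold the curve at its end values outside [t0, t0 + Tf].
    # This also keeps exp() from overflowing for very large elapsed times.
    elapsed = min(max(ts - t0, 0.0), tf)
    return (theta_max - theta_min) / (1.0 + math.exp((elapsed - tf / 2.0) * a)) + theta_min
```

With a = 12/Tf the exponent runs from −6 to +6 over the transition, so the curve starts and ends within about 0.25% of the span between θmax and θmin and passes exactly through their average at ts = t0 + Tf/2.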

また、図１３の例では、注視判定部２１４は、ユーザ４０１が注視状態になった後、時刻ｔ０’において注視状態が解消したと判定する。聴取範囲制御部２１５は、時刻ｔ０’を起点として、次の式（２）に従って聴取範囲の角度θｔを増加させる。
θｔ＝（θｍａｘ−θｍｉｎ）／｛１＋ＥＸＰ［−（（ｔｓ−ｔ０’）−（Ｔｆ’／２））×ａ］｝＋θｍｉｎ・・・（２）
なお、式（２）において、Ｔｆ’は、図１３に示すように、聴取範囲の角度θｔが最小値θｍｉｎから最大値θｍａｘになるまでの時間を示す。Ｔｆ’は、音声処理装置２００の管理者によって任意に設定可能であり、例えばＴｆと同じ値に設定される。 In the example of FIG. 13, the gaze determination unit 214 also determines that the gaze state has ended at time t0′, after the user 401 entered the gaze state. Starting from time t0′, the listening range control unit 215 increases the listening range angle θt according to the following equation (2).
θt = (θmax−θmin) / {1 + EXP [− ((ts−t0 ′) − (Tf ′ / 2)) × a]} + θmin (2)
In Expression (2), Tf ′ indicates the time until the angle θt of the listening range reaches the maximum value θmax from the minimum value θmin, as shown in FIG. Tf ′ can be arbitrarily set by the administrator of the speech processing apparatus 200, and is set to the same value as Tf, for example.
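
Equation (2) mirrors equation (1) by negating the exponent, so the angle rises from θmin toward θmax. The sketch below transcribes it. The function name `widening_angle`, the default `theta_min=20.0`, the default `a = 12 / Tf′` (borrowed from the example given for equation (1)), and the clamping of the elapsed time are assumptions.

```python
import math


def widening_angle(ts, t0_release, tf_release,
                   theta_max=180.0, theta_min=20.0, a=None):
    """Equation (2): listening range angle at time ts, gaze having ended at t0'."""
    if a is None:
        a = 12.0 / tf_release  # assumption: same choice as the example for eq. (1)
    # Assumption: hold the curve at its end values outside [t0', t0' + Tf'].
    elapsed = min(max(ts - t0_release, 0.0), tf_release)
    return (theta_max - theta_min) / (
        1.0 + math.exp(-(elapsed - tf_release / 2.0) * a)
    ) + theta_min
```

When Tf′ equals Tf, this curve is the time-reversed image of the narrowing curve of equation (1).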

図１４は、聴取範囲を変化させる方法の第２の例を示す図である。
図１４の例でも、図１３の例と同様に、時刻ｔｄから、判定時間Ｔａが経過した時刻ｔ０までの期間において、ユーザ４０１の視線方向ρｔの変動量が所定の閾値幅Ｗｔｈに収まっている。注視判定部２１４は、時刻ｔ０において、ユーザ４０１が注視状態になったと判定する。聴取範囲制御部２１５は、時刻ｔ０を起点として、次の式（３）に従って聴取範囲の角度θｔを減少させる。
θｔ＝｛ｂ^{[(ts-t0)-Tf/2]}｝＋θｍｉｎ・・・（３）
なお、式（３）において、ｂは任意に設定可能な定数である。 FIG. 14 is a diagram illustrating a second example of a method for changing the listening range.
In the example of FIG. 14 as well, as in the example of FIG. 13, the variation of the line-of-sight direction ρt of the user 401 stays within the predetermined threshold width Wth during the period from time td to time t0, at which the determination time Ta has elapsed. The gaze determination unit 214 determines at time t0 that the user 401 has entered the gaze state. Starting from time t0, the listening range control unit 215 decreases the listening range angle θt according to the following equation (3).
θt = b^{[(ts − t0) − Tf/2]} + θmin (3)
In equation (3), b is a constant that can be set arbitrarily.

また、時刻ｔ０’において、注視判定部２１４が、注視状態が解消したと判定すると、聴取範囲制御部２１５は、時刻ｔ０’を起点として、次の式（４）に従って聴取範囲の角度θｔを増加させる。
θｔ＝｛ｃ×ｌｏｇ［（ｔｓ−ｔ０’）＋ｄ］｝＋θｍｉｎ・・・（４）
なお、式（４）において、ｃ，ｄは任意に設定可能な定数である。 When the gaze determination unit 214 determines at time t0′ that the gaze state has ended, the listening range control unit 215 increases the listening range angle θt according to the following equation (4), starting from time t0′.
θt = {c × log[(ts − t0′) + d]} + θmin (4)
In Equation (4), c and d are constants that can be arbitrarily set.
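
The text leaves the constants b, c, and d free. The sketch below shows one way they could be chosen so that equation (3) starts at θmax and equation (4) runs from θmin to θmax over Tf′. These boundary conditions, the use of the natural logarithm, the clamp to θmax, the default `theta_min=20.0`, and the function names are all assumptions, not values given in this document.

```python
import math


def exp_narrowing(ts, t0, tf, theta_max=180.0, theta_min=20.0):
    """Equation (3) with b chosen so that the angle equals theta_max at ts = t0.

    Assumes theta_max - theta_min > 1 (in degrees), so that b < 1 and the
    curve decays toward theta_min.
    """
    # b ** (-tf / 2) == theta_max - theta_min  (assumed boundary condition)
    b = (theta_max - theta_min) ** (-2.0 / tf)
    elapsed = max(ts - t0, 0.0)
    return b ** (elapsed - tf / 2.0) + theta_min


def log_widening(ts, t0_release, tf_release, theta_max=180.0, theta_min=20.0):
    """Equation (4) with d = 1 and c chosen to reach theta_max after Tf'."""
    d = 1.0  # log(d) == 0, so the angle starts at theta_min
    c = (theta_max - theta_min) / math.log(tf_release + d)
    elapsed = max(ts - t0_release, 0.0)
    # Assumption: clamp so that the angle never exceeds theta_max.
    return min(c * math.log(elapsed + d) + theta_min, theta_max)
```

Under these choices the exponential curve drops quickly at first and then flattens near θmin, while the logarithmic curve opens quickly at first and then slows as it approaches θmax.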

また、以上の図１３および図１４では聴取範囲の角度θｔについて説明したが、聴取範囲の角度φｔについても、上記の式（１）および式（２）、または、式（３）および式（４）に従って制御することができる。 Although FIGS. 13 and 14 above describe the listening range angle θt, the listening range angle φt can also be controlled according to equations (1) and (2), or equations (3) and (4).

図１５は、聴取範囲の角度が最小値になる前に注視状態が解消された場合の制御例を示す図である。
時刻ｔ０において、注視判定部２１４が、注視状態になったと判定すると、聴取範囲制御部２１５は、聴取範囲の角度θｔを徐々に減少させる。ところが、時刻ｔ０から時間Ｔｆが経過していない、すなわち聴取範囲の角度θｔが最小値θｍｉｎまで減少していない時刻ｔ１１において、注視判定部２１４が、注視状態が解消されたと判定したとする。この場合、聴取範囲制御部２１５は、例えば、聴取範囲の角度θｔを、最大値θｍａｘに達するまで徐々に増加させる。 FIG. 15 is a diagram illustrating a control example when the gaze state is canceled before the angle of the listening range reaches the minimum value.
When the gaze determination unit 214 determines that the gaze state is entered at time t0, the listening range control unit 215 gradually decreases the listening range angle θt. However, it is assumed that the gaze determination unit 214 determines that the gaze state has been resolved at time t11 when the time Tf has not elapsed since time t0, that is, the angle θt of the listening range has not decreased to the minimum value θmin. In this case, for example, the listening range control unit 215 gradually increases the angle θt of the listening range until the maximum value θmax is reached.

図１６は、聴取範囲の角度が最大値になる前に再度注視状態になった場合の制御例を示す図である。
時刻ｔ０’において、注視判定部２１４が、注視状態が解消されたと判定すると、聴取範囲制御部２１５は、例えば、聴取範囲の角度θｔを徐々に増加させる。ところが、時刻ｔ０’から時間Ｔｆ’が経過していない、すなわち聴取範囲の角度θｔが最大値θｍａｘに達していない時刻ｔ１２において、注視判定部２１４が、再度注視状態になったと判定したとする。この場合、聴取範囲制御部２１５は、聴取範囲の角度θｔを、最小値θｍｉｎになるまで徐々に減少させる。 FIG. 16 is a diagram illustrating a control example in a case where the user enters the gaze state again before the angle of the listening range reaches the maximum value.
When the gaze determination unit 214 determines at time t0′ that the gaze state has ended, the listening range control unit 215 gradually increases the listening range angle θt, for example. Suppose, however, that the gaze determination unit 214 determines that the user has entered the gaze state again at time t12, when the time Tf′ has not yet elapsed since time t0′, that is, when the listening range angle θt has not yet reached the maximum value θmax. In this case, the listening range control unit 215 gradually decreases the listening range angle θt until it reaches the minimum value θmin.

なお、以上の図１３〜図１６の例では、聴取範囲制御部２１５は、注視状態が解消された場合に、聴取範囲を時間経過に従って徐々に広げていくようにした。しかしながら、聴取範囲制御部２１５は、例えば、注視状態が解消された場合には、聴取範囲を即座に最大値θｍａｘに設定してもよい。あるいは、聴取範囲制御部２１５は、注視状態が解消された場合に、視線方向ρｔの変動の度合いに応じて、聴取範囲の角度θｔを増加させる速度を調整してもよい。 In the example of FIGS. 13 to 16 described above, the listening range control unit 215 gradually widens the listening range with time when the gaze state is resolved. However, the listening range control unit 215 may immediately set the listening range to the maximum value θmax, for example, when the gaze state is resolved. Alternatively, the listening range control unit 215 may adjust the speed at which the angle θt of the listening range is increased according to the degree of change in the line-of-sight direction ρt when the gaze state is resolved.
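
The interrupted transitions of FIGS. 15 and 16 require the angle to continue from its current value and not jump. This document does not say how the curves of equations (1) to (4) are restarted from an intermediate angle, so the sketch below substitutes a simple rate-limited linear ramp to show the state handling only. The class name `ListeningRangeController` and all default values are assumptions.

```python
class ListeningRangeController:
    """Hypothetical controller that ramps the angle linearly from its current value."""

    def __init__(self, theta_max=180.0, theta_min=20.0, tf=4.0, tf_release=4.0):
        self.theta_max = theta_max
        self.theta_min = theta_min
        # Degrees per second while narrowing (gaze) and widening (gaze ended).
        self.narrow_rate = (theta_max - theta_min) / tf
        self.widen_rate = (theta_max - theta_min) / tf_release
        self.theta = theta_max  # the initial value is the maximum

    def step(self, gazing, dt):
        """Advance by dt seconds and return the new listening range angle."""
        if gazing:
            self.theta = max(self.theta_min, self.theta - self.narrow_rate * dt)
        else:
            self.theta = min(self.theta_max, self.theta + self.widen_rate * dt)
        return self.theta
```

Because each step starts from the stored angle, a gaze that ends early widens the range from wherever it had reached (FIG. 15), and a renewed gaze narrows it again from that point (FIG. 16). Setting `self.theta = self.theta_max` directly on release would give the immediate-reset variation mentioned above.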

FIG. 17 is a diagram illustrating an example of volume control for each sound source included in the listening range.
When synthesizing the audio signals corresponding to a plurality of sound sources included in the listening range, the audio output processing unit 216 of the audio processing device 200 may, for example, increase the volume of an audio signal the closer the corresponding sound source is to the center of the listening range as viewed from the user 401. As a result, the user 401 hears a sound source more loudly the closer it is to the direction the user 401 is facing, which makes the sound more natural.

As a volume control method, for example, the following method can be used.
The range of the angle θt of the listening range is divided into a plurality of ranges in advance. For each divided range, a gain value is set by which the audio signal of any sound source located in that range is multiplied. The gain value is set larger the closer the range is to the center of the listening range.

In the example of FIG. 17, the range from the center of the listening range (0 degrees) to the angle θmax (= 180 degrees) is divided into five ranges. A gain of “1” is set for the range from the center of the listening range to the angle θmin. A gain of “0.8” is set for the next range of width θ1 beyond the angle θmin, a gain of “0.6” for the next range of width θ2, a gain of “0.4” for the next range of width θ3, and a gain of “0.2” for the next range of width θ4, that is, the range extending to the angle θmax.

When generating the synthesized audio signal, the audio output processing unit 216 determines in which of the above divided ranges each sound source is located, multiplies the level of the sound source's audio signal by the gain corresponding to that range, and synthesizes the gain-adjusted audio signals.
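The fixed division of FIG. 17 amounts to a lookup from angular offset to gain followed by a weighted sum. The sketch below is a minimal illustration under stated assumptions: the values of θmin and θ1 to θ4 are not given in this description, so `THETA_MIN` and `BAND_WIDTHS` are placeholders, and the function names are hypothetical. Only the gain values 1, 0.8, 0.6, 0.4 and 0.2 come from the text.

```python
import bisect

THETA_MIN = 20.0                         # assumed value of theta_min (degrees)
BAND_WIDTHS = [40.0, 40.0, 40.0, 40.0]   # assumed theta_1..theta_4 (degrees)
GAINS = [1.0, 0.8, 0.6, 0.4, 0.2]        # gains stated for FIG. 17


def band_edges(theta_min, widths):
    """Upper edge of each band, measured from the center of the range."""
    edges = [theta_min]
    for w in widths:
        edges.append(edges[-1] + w)
    return edges  # with the defaults: [20, 60, 100, 140, 180]


def fixed_gain(offset_deg):
    """Gain for a source at the given angular offset from the center."""
    edges = band_edges(THETA_MIN, BAND_WIDTHS)
    a = abs(offset_deg)
    if a > edges[-1]:
        return 0.0  # outside the listening range: not mixed at all
    # Index of the first band whose upper edge is >= a.
    return GAINS[bisect.bisect_left(edges, a)]


def mix(sources):
    """Mix equal-length frames given as (offset_deg, samples) pairs."""
    out = [0.0] * len(sources[0][1])
    for offset_deg, samples in sources:
        g = fixed_gain(offset_deg)
        for i, s in enumerate(samples):
            out[i] += g * s
    return out


if __name__ == "__main__":
    print(fixed_gain(10.0), fixed_gain(70.0), fixed_gain(170.0))
    print(mix([(0.0, [1.0, 1.0]), (70.0, [1.0, 0.0])]))
```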

In the example of FIG. 17, the gains are fixed with respect to the angle within the listening range. However, the gains may instead change as the size of the listening range changes. As an example of this case, the following control method can be used.

The range from the center of the listening range (0 degrees) to the angle θt currently set by the listening range control unit 215 is divided proportionally, and a gain is set for each divided range. However, since the position of the sound source desired by the user 401 does not necessarily coincide exactly with the line-of-sight direction ρt of the user 401, it is desirable to keep the range assigned a gain of “1” at a constant size.

Therefore, for example, a gain of “1” is set for the fixed range from 0 degrees to the angle θmin in the listening range. Within the range from 0 degrees to the angle θt, a gain of “0.8” is set for the range of proportion θ1/θt following the angle θmin, a gain of “0.6” for the next range of proportion θ2/θt, a gain of “0.4” for the next range of proportion θ3/θt, and a gain of “0.2” for the next range of proportion θ4/θt. When gains are assigned in this way, the volume of a sound source located in a given direction as viewed from the user 401 (excluding sound sources located within θmin of the center) gradually decreases as the listening range narrows. Accordingly, the sound of the desired sound source is emphasized in a way that feels natural to the user 401.
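One possible reading of this proportional division is sketched below: the region between θmin and the current θt is split into four bands by fixed ratios, so the bands shrink together with θt. The ratios, the value of θmin, and the function name `scaled_gain` are assumptions made for illustration; the description does not fix them.

```python
def scaled_gain(offset_deg, theta_t, theta_min=20.0,
                ratios=(0.25, 0.25, 0.25, 0.25),
                gains=(0.8, 0.6, 0.4, 0.2)):
    """Gain for a source when the bands scale with the current angle theta_t.

    theta_min and ratios are assumed values; the gains follow the text.
    """
    a = abs(offset_deg)
    if a <= theta_min:
        return 1.0   # fixed central region always keeps gain 1
    if a > theta_t:
        return 0.0   # outside the current listening range
    span = theta_t - theta_min
    upper = theta_min
    for ratio, gain in zip(ratios, gains):
        upper += ratio * span
        if a <= upper + 1e-9:   # small tolerance for rounding at band edges
            return gain
    return gains[-1]


if __name__ == "__main__":
    # A source 50 degrees off-center gets quieter as the range narrows.
    for theta_t in (180.0, 100.0, 60.0, 40.0):
        print(theta_t, scaled_gain(50.0, theta_t))
```

With these assumed ratios, a source at a fixed offset moves through progressively lower gain bands as θt decreases, while a source within θmin of the center keeps a gain of 1 throughout.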

Note that the gain control method described above may, for example, be used only while the gaze determination unit 214 determines that the user is in the gaze state. Specifically, while it is determined that the user is not in the gaze state, the audio output processing unit 216 synthesizes the audio signals of all sound sources included in the listening range at the same volume ratio (that is, multiplying all of them by a gain of “1”). In this state, the user 401 hears all sound sources in the listening range equally, but this is not particularly unnatural because the user 401 presumably has not yet identified the sound source he or she wants to hear.

When it is determined that the gaze state has been entered, the audio output processing unit 216 controls the volume of the audio signals of the sound sources included in the listening range as described above, so that sound sources closer to the center of the listening range are louder. At the moment the gaze state is entered, the sound of sound sources near the line-of-sight direction ρt becomes slightly easier for the user 401 to hear. As the listening range is then gradually reduced, the user 401 hears the sound of sound sources near the line-of-sight direction ρt more clearly.

Next, FIG. 18 is a diagram illustrating an example of information registered in the user management table.
A record is registered in the user management table 230 for each user 401. Each record holds, in association with the user ID identifying the user 401, the user coordinates, line-of-sight direction, stationary time, gaze flag, effective sound source ID, listening range angle, and non-gaze time.

The user coordinates are coordinates in the virtual space indicating the above-described user position Qt, and are updated by the user position detection unit 211 as needed.
The line-of-sight direction corresponds to the above-described line-of-sight direction ρt and is represented here, as an example, by rotation angles (Rt, Pt, Yt) around the respective axes. The line-of-sight direction is updated by the gaze determination unit 214 as needed.

The stationary time is the time during which the gaze determination unit 214 determines that the line-of-sight direction ρt stays within the threshold width Wth (see FIG. 9). For example, the stationary time is the elapsed time from time t1 in FIG. 9, and the elapsed time from time td in FIGS. 12 to 14. The stationary time is set by the gaze determination unit 214. In the present embodiment, the stationary time is registered in units of one second.

The stationary time is referred to by the listening range control unit 215 when decreasing the angles θt and φt of the listening range. Here, the elapsed time since the gaze determination unit 214 determined that the gaze state was entered ((ts − t0) in Expressions (1) and (3)) is the value obtained by subtracting the determination time Ta (see FIGS. 9 and 12 to 14) from the stationary time.

The gaze flag is flag information indicating whether or not the user is in the gaze state. It is set to “1” in the gaze state and to “0” otherwise. The gaze flag is set by the gaze determination unit 214.

The effective sound source ID is the sound source ID of a sound source included in the listening range, and is set by the listening range control unit 215. When no sound source is included in the listening range, “0” is set as the effective sound source ID.

The listening range angle consists of the above-described angles θt and φt of the listening range, and is set by the listening range control unit 215.
The non-gaze time indicates the elapsed time since the gaze state was resolved. For example, the non-gaze time is the elapsed time from time t0′ in FIGS. 13 and 14. The non-gaze time is set by the gaze determination unit 214 and is referred to by the listening range control unit 215 when increasing the angles θt and φt of the listening range.
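As a reading aid, one record of the user management table 230 could be modelled as follows. This is a sketch only: the field names, types, and default values are assumptions chosen to mirror the columns listed above, and the helper `time_since_gaze` simply encodes the stated relation that the time since the gaze determination equals the stationary time minus the determination time Ta.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class UserRecord:
    """Hypothetical in-memory form of one user management table record."""
    user_id: str
    user_coords: Tuple[float, float, float] = (0.0, 0.0, 0.0)     # Qt
    gaze_direction: Tuple[float, float, float] = (0.0, 0.0, 0.0)  # (Rt, Pt, Yt)
    stationary_time: int = 0       # seconds within the threshold width Wth
    gaze_flag: int = 0             # 1 in the gaze state, otherwise 0
    effective_source_ids: List[int] = field(default_factory=lambda: [0])
    listening_angles: Tuple[float, float] = (180.0, 180.0)  # (theta_t, phi_t), assumed initial values
    non_gaze_time: int = 0         # seconds since the gaze state was resolved

    def time_since_gaze(self, ta: int) -> int:
        """Elapsed time since the gaze determination: stationary time minus Ta."""
        return max(0, self.stationary_time - ta)


if __name__ == "__main__":
    rec = UserRecord(user_id="U001", stationary_time=7, gaze_flag=1)
    print(rec.time_since_gaze(ta=3))
```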

Next, FIG. 19 is a flowchart illustrating an example of the processing procedure of the gaze determination unit. The process of FIG. 19 is executed for each user 401. At the start of the process of FIG. 19, the gaze determination unit 214 sets the following initial values in the user management table 230: a stationary time of “0”, a non-gaze time of “0”, and a gaze flag of “0”.

[Step S11] The gaze determination unit 214 captures one second's worth of detection results for the line-of-sight direction ρt received from the user terminal 400. The detection results for the line-of-sight direction ρt are assumed to be transmitted from the user terminal 400 multiple times per second. Each time the gaze determination unit 214 receives a detection result for the line-of-sight direction ρt, it registers the received value in the line-of-sight direction column of the user management table 230.

[Step S12] The gaze determination unit 214 determines whether the amounts of variation about every axis of the captured one second of the line-of-sight direction ρt all fall within the threshold width Wth. If they all fall within the threshold width Wth, the gaze determination unit 214 executes the process of step S13. If at least one of the per-axis variation amounts does not fall within the threshold width Wth, the gaze determination unit 214 executes the process of step S21.

[Step S13] The gaze determination unit 214 increments the stationary time value in the user management table 230 by “1”.
[Step S14] The gaze determination unit 214 captures one second's worth of detection results for the line-of-sight direction ρt received from the user terminal 400. As in step S11, each time the gaze determination unit 214 receives a detection result for the line-of-sight direction ρt, it registers the received value in the line-of-sight direction column of the user management table 230.

[Step S15] The gaze determination unit 214 determines whether the amounts of variation about every axis of the line-of-sight direction ρt over the period from the “Yes” determination in step S12 to the present all fall within the threshold width Wth. If they all fall within the threshold width Wth, the gaze determination unit 214 executes the process of step S16. If at least one of the per-axis variation amounts does not fall within the threshold width Wth, the gaze determination unit 214 executes the process of step S20.

[Step S16] The gaze determination unit 214 increments the stationary time value in the user management table 230 by “1”.
[Step S17] The gaze determination unit 214 determines whether the elapsed time since the “Yes” determination in step S12 has reached the determination time Ta. The elapsed time here is the number of seconds registered in the stationary time column of the user management table 230. If the elapsed time has reached the determination time Ta, the gaze determination unit 214 executes the process of step S19. If the elapsed time has not reached the determination time Ta, the gaze determination unit 214 executes the process of step S18.

[Step S18] A “No” determination in step S17 means that the user 401 has not yet entered the gaze state (non-gaze state). In this case, the gaze determination unit 214 increments the non-gaze time value in the user management table 230 by “1”. The process then returns to step S14.

[Step S19] A “Yes” determination in step S17 means that the user 401 is judged to have entered the gaze state. In this case, the gaze determination unit 214 updates the gaze flag value in the user management table 230 to “1” and resets the non-gaze time value to “0”. The process then returns to step S14.

[Step S20] A “No” determination in step S15 means that the orientation of the user 401 has started to move substantially. In this case, the gaze determination unit 214 resets the stationary time value in the user management table 230 to “0”. In addition, if the gaze flag value in the user management table 230 is “1”, the gaze determination unit 214 updates it to “0”. The process then proceeds to step S21.

[Step S21] The gaze determination unit 214 increments the non-gaze time value in the user management table 230 by “1”. The process then returns to step S11.
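The loop of steps S11 to S21 can be condensed into a small state machine that is fed one second of samples at a time. The following is a sketch under assumptions, not the flowchart itself: the class name `GazeJudge`, the method `feed_second`, and the representation of each sample as an (R, P, Y) tuple are hypothetical, and the two capture loops (S11 to S13 and S14 to S19) are folded into a single method. Each call is assumed to receive at least one sample.

```python
def within_threshold(samples, w_th):
    """True if the spread of the samples about every axis is within w_th."""
    return all(max(axis) - min(axis) <= w_th for axis in zip(*samples))


class GazeJudge:
    """Hypothetical per-user gaze determination, called once per second."""

    def __init__(self, w_th, t_a):
        self.w_th = w_th            # threshold width Wth
        self.t_a = t_a              # determination time Ta, in seconds
        self.stationary_time = 0
        self.non_gaze_time = 0
        self.gaze_flag = 0
        self._history = []          # samples since the current still run began

    def feed_second(self, samples):
        """Process one second of (R, P, Y) samples; return the gaze flag."""
        window = self._history + list(samples)
        if not within_threshold(window, self.w_th):
            # S12 "No" or S15 "No": S20 resets, S21 counts non-gaze time.
            self._history = []
            self.stationary_time = 0
            self.gaze_flag = 0
            self.non_gaze_time += 1
            return self.gaze_flag
        first_second = not self._history
        self._history = window
        self.stationary_time += 1            # S13 or S16
        if first_second:
            return self.gaze_flag            # S13 goes straight on to S14
        if self.stationary_time >= self.t_a:  # S17
            self.gaze_flag = 1               # S19
            self.non_gaze_time = 0
        else:
            self.non_gaze_time += 1          # S18
        return self.gaze_flag


if __name__ == "__main__":
    judge = GazeJudge(w_th=5.0, t_a=3)
    still = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
    print([judge.feed_second(still) for _ in range(4)])
    print(judge.feed_second([(0.0, 0.0, 0.0), (50.0, 0.0, 0.0)]))
```

Keeping the full sample history is the simplest way to express "variation since the Yes determination in step S12"; tracking a running minimum and maximum per axis would give the same result with constant memory.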
FIGS. 20 and 21 are flowcharts illustrating an example of the processing procedure of the listening range control unit and the audio output processing unit. The process of FIGS. 20 and 21 is executed for each user 401. For example, control is performed so that the process of step S31 in FIG. 20 is executed once per audio frame period.

[Step S31] The listening range control unit 215 checks the gaze flag value in the user management table 230. If the gaze flag value is “1”, the listening range control unit 215 executes the process of step S32. If the gaze flag value is “0”, the listening range control unit 215 executes the process of step S51 in FIG. 21.

[Step S32] Based on the value registered in the stationary time column of the user management table 230, the listening range control unit 215 calculates the elapsed time since the gaze determination unit 214 determined that the gaze state was entered. The listening range control unit 215 then determines whether the elapsed time is less than the time Tf. If the elapsed time is less than the time Tf, the listening range control unit 215 executes the process of step S33. If the elapsed time is equal to or longer than the time Tf, the listening range control unit 215 executes the process of step S34.

[Step S33] A “Yes” determination in step S32 means that the angles θt and φt of the listening range are still in the process of being gradually decreased. In this case, the listening range control unit 215 calculates the angles θt and φt of the listening range according to the above-described Expression (1) or Expression (3), using the elapsed time calculated in step S32. The listening range control unit 215 registers the calculated angles θt and φt in the listening range angle column of the user management table 230.

[Step S34] The listening range control unit 215 sets the angles θt and φt of the listening range to predetermined minimum values and registers the values in the listening range angle column of the user management table 230.

[Step S35] Based on the values in the user coordinates, line-of-sight direction, and listening range angle columns of the user management table 230 and the position information of each sound source registered in the sound source management table 220, the listening range control unit 215 checks which sound sources are included in the listening range defined by the listening range angle values.

[Step S36] The listening range control unit 215 determines whether one or more sound sources are included in the listening range. If one or more sound sources are included in the listening range, the listening range control unit 215 executes the process of step S38. If no sound source is included in the listening range, the listening range control unit 215 executes the process of step S37.

[Step S37] The listening range control unit 215 corrects the angles θt and φt registered in the listening range angle column of the user management table 230 by increasing each of them by one step. The process then returns to step S35.

[Step S38] The listening range control unit 215 registers the sound source IDs of the sound sources included in the listening range in the effective sound source ID column of the user management table 230.
[Step S39] Based on the values in the user coordinates, line-of-sight direction, and listening range angle columns of the user management table 230 and the position information of each sound source registered in the sound source management table 220, the audio output processing unit 216 determines, for each sound source registered in the effective sound source ID column of the user management table 230, the gain to be used when mixing it into the synthesized audio signal.

For example, following the processing procedure described with reference to FIG. 17, the audio output processing unit 216 sets a larger gain for the audio signal of a sound source the closer that sound source is to the center of the listening range. The audio output processing unit 216 also adjusts the volume balance between the left channel and the right channel according to the angle between the line-of-sight direction ρt and the straight line connecting the user position Qt to the position of the sound source, and according to whether the sound source is located to the left or right of the center of the listening range, so that the sound image of the sound source is localized at the corresponding position in the left-right direction.

[Step S40] The audio output processing unit 216 generates the synthesized audio signal by applying the gains determined in step S39 and transmits it to the user terminal 400. The process then returns to step S31.

[Step S51] Based on the value registered in the non-gaze time column of the user management table 230, the listening range control unit 215 calculates the elapsed time since the gaze determination unit 214 determined that the gaze state was resolved. The listening range control unit 215 then determines whether the elapsed time is less than the time Tf′. If the elapsed time is less than the time Tf′, the listening range control unit 215 executes the process of step S52. If the elapsed time is equal to or longer than the time Tf′, the listening range control unit 215 executes the process of step S53.

[Step S52] A “Yes” determination in step S51 means that the angles θt and φt of the listening range are still in the process of being gradually increased. In this case, the listening range control unit 215 calculates the angles θt and φt of the listening range according to the above-described Expression (2) or Expression (4), using the elapsed time calculated in step S51. The listening range control unit 215 registers the calculated angles θt and φt in the listening range angle column of the user management table 230.

[Step S53] The listening range control unit 215 sets the angles θt and φt of the listening range to predetermined maximum values and registers the values in the listening range angle column of the user management table 230.

Note that the listening range control unit 215 may execute the process of step S53 unconditionally, without performing the determination of step S51.
[Step S54] Based on the values in the user coordinates, line-of-sight direction, and listening range angle columns of the user management table 230 and the position information of each sound source registered in the sound source management table 220, the listening range control unit 215 checks which sound sources are included in the listening range defined by the listening range angle values. The listening range control unit 215 registers the sound source IDs of the sound sources included in the listening range in the effective sound source ID column of the user management table 230. The process of step S39 is then executed.
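The left-right balance adjustment mentioned for step S39 can be illustrated with a two-dimensional sketch. Everything here beyond the idea of deriving the balance from the angle between the line of sight and the direction to the source is an assumption: the description does not state a panning law, so a constant-power pan is used as a stand-in, the yaw is taken as a counter-clockwise angle from the +x axis, and the function names are hypothetical.

```python
import math


def signed_offset_deg(user_xy, yaw_deg, source_xy):
    """Angle from the line of sight to the source, in [-180, 180).

    Positive values mean the source lies counter-clockwise (to the left).
    """
    dx = source_xy[0] - user_xy[0]
    dy = source_xy[1] - user_xy[1]
    bearing = math.degrees(math.atan2(dy, dx))
    return (bearing - yaw_deg + 180.0) % 360.0 - 180.0


def stereo_gains(offset_deg, gain=1.0):
    """Return (left, right) gains using an assumed constant-power pan law."""
    clipped = max(-90.0, min(90.0, offset_deg))
    pan = -clipped / 90.0                  # -1 = hard left, +1 = hard right
    angle = (pan + 1.0) * math.pi / 4.0    # 0 .. pi/2
    return gain * math.cos(angle), gain * math.sin(angle)


if __name__ == "__main__":
    # User at the origin facing +y; source ahead and to the left.
    offset = signed_offset_deg((0.0, 0.0), 90.0, (-1.0, 1.0))
    print(offset, stereo_gains(offset, gain=0.8))
```

The `gain` argument is where a per-source gain, such as one chosen by angular band as in FIG. 17, would be applied before the left and right channels are mixed.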

According to FIGS. 20 and 21 described above, when the gaze determination unit 214 determines that the gaze state has been entered, the listening range is gradually reduced, and the sound of the sound sources near the center of the listening range is gradually emphasized for the user 401. Accordingly, the user 401 can easily pick out the sound of a sound source located in the direction the user 401 is facing.

In steps S36 and S37 of FIG. 20 described above, the listening range control unit 215 adjusts the angle of the listening range so that, in the gaze state, a sound source is always present within the listening range. Here, FIG. 22 is a diagram illustrating what happens when a user in the gaze state approaches an exhibit.

In state 1 of FIG. 22, the user 401 is in the gaze state while facing the exhibit 310a. The distance between the user position Qt and the exhibit 310a is d1, and the angle θt of the listening range is the angle θ1. The listening range includes the sound source P1 corresponding to the exhibit 310a. The user 401 can therefore hear the sound of the sound source P1 through the headphones 500.

Suppose that, as shown in state 2, the user 401 approaches the exhibit 310a while remaining in the gaze state, and that the distance between the user position Qt and the exhibit 310a becomes d2. If the angle θt of the listening range remains at the angle θ1, the sound source P1 corresponding to the exhibit 310a may fall outside the listening range, as in state 2. In this case, the user 401 cannot hear the sound of the sound source P1.

According to steps S36 and S37 of FIG. 20 described above, if reducing the listening range would leave no sound source within it, the listening range is expanded until at least one sound source is included. This processing avoids the situation shown in state 2 of FIG. 22, in which the user 401 can no longer hear the corresponding sound after approaching the exhibit 310a.
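The check-and-widen loop of steps S35 to S37 can be sketched in two dimensions as follows. The sketch assumes that θt is measured from the center of the listening range to its edge (consistent with the 0 to 180 degree range used for FIG. 17), that the yaw is a counter-clockwise angle from the +x axis, and that one "step" is 5 degrees; the function names and the dictionary of source positions are hypothetical.

```python
import math


def offset_deg(user_xy, yaw_deg, source_xy):
    """Unsigned angle between the line of sight and the source direction."""
    dx = source_xy[0] - user_xy[0]
    dy = source_xy[1] - user_xy[1]
    bearing = math.degrees(math.atan2(dy, dx))
    return abs((bearing - yaw_deg + 180.0) % 360.0 - 180.0)


def sources_in_range(user_xy, yaw_deg, theta_t, sources):
    """IDs of the sources whose direction lies within theta_t of the gaze."""
    return [sid for sid, pos in sources.items()
            if offset_deg(user_xy, yaw_deg, pos) <= theta_t]


def ensure_audible(user_xy, yaw_deg, theta_t, sources,
                   step=5.0, theta_max=180.0):
    """Widen theta_t one step at a time until a source is included.

    Returns the corrected angle and the list of included source IDs.
    """
    while True:
        active = sources_in_range(user_xy, yaw_deg, theta_t, sources)
        if active or theta_t >= theta_max:
            return theta_t, active
        theta_t = min(theta_max, theta_t + step)


if __name__ == "__main__":
    sources = {1: (1.0, 0.9)}   # roughly 42 degrees off the +x axis
    print(ensure_audible((0.0, 0.0), 0.0, 30.0, sources))
```

The loop stops at `theta_max` even when no source exists at all, which corresponds to the case where “0” is registered as the effective sound source ID.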

As another method, the listening range control unit 215 may detect the distance between the user position Qt and the exhibit and correct the angle of the listening range so that it becomes larger as the distance becomes shorter. With this method, as shown for example in state 3 of FIG. 22, the sound source P1 is more likely to be included in the listening range even when the distance between the user position Qt and the exhibit 310a has decreased to d2.

The method of correcting the angle of the listening range according to the distance between the user position Qt and the exhibit can be realized by the following control. The storage device of the audio processing device 200 stores an exhibit management table such as that shown in FIG. 23.

FIG. 23 is a diagram illustrating an example of information registered in the exhibit management table.
A record is registered in the exhibit management table 240 for each exhibit. Each record holds, in association with the exhibit ID identifying the exhibit, information indicating the area in which the exhibit is placed. In the example of FIG. 23, the number of vertices and the vertex coordinates are registered as the information indicating the placement area of the exhibit. The number of vertices indicates the number of vertices on the periphery of the exhibit. The vertex coordinates indicate the coordinates of each vertex in the virtual space.

FIG. 24 is a flowchart illustrating a modification of the processing procedure of the listening range control unit and the audio output processing unit. In FIG. 24, processing steps that perform the same processing as in FIG. 20 are denoted by the same reference numerals, and their description is omitted.

In the process shown in FIG. 24, steps S61 and S62 are executed instead of steps S35 to S37 in FIG. 20.
[Step S61] The listening range control unit 215 calculates the distance to the exhibits associated with the sound sources included in the listening range. Specifically, the listening range control unit 215 refers to the sound source management table 220 and identifies the exhibits corresponding to the sound sources included in the listening range. For each identified exhibit, the listening range control unit 215 reads the number of vertices and the vertex coordinates from the exhibit management table 240 and calculates the distance between each vertex and the user position Qt. Of the distances calculated between the user position Qt and all vertices of all identified exhibits, the listening range control unit 215 takes the smallest as the distance to the exhibit.

[Step S62] The listening range control unit 215 corrects the angles θt and φt registered as the listening range angle in the user management table 230 according to the calculated distance to the exhibit. For example, the listening range control unit 215 performs the correction by multiplying the registered angles θt and φt by a correction coefficient that depends on the distance to the exhibit. When the distance to the exhibit is equal to or greater than a predetermined lower limit, the listening range control unit 215 sets the correction coefficient to “1”. When the distance to the exhibit is less than the lower limit, the listening range control unit 215 sets the correction coefficient smaller than “1” as the distance becomes shorter.

As a result of steps S61 and S62 described above, the listening range is expanded as the user 401 approaches an exhibit. This makes it less likely that the user 401 becomes unable to hear the sound corresponding to the exhibit that the user 401 is heading toward.

In the process of FIG. 24 described above, the angle of the listening range is corrected according to the distance between the user position Qt and the exhibit. As another example, the angle of the listening range may be corrected according to the distance between the user position Qt and the position of the sound source.

[Third Embodiment]
FIG. 25 is a diagram illustrating a configuration example of an audio providing system according to the third embodiment. In FIG. 25, components corresponding to those in FIG. 3 are denoted by the same reference numerals, and their description is omitted. The differences between the third embodiment and the second embodiment are described below.

In the audio providing system according to the third embodiment, the user 401 listens to audio output from a plurality of speakers installed in the exhibition hall instead of listening to audio through the headphones 500. The user 401 therefore does not need to wear the headphones 500 and wears only the direction sensor 510 and the user terminal 400a. The user terminal 400a wirelessly transmits the detection result of the direction sensor 510 to the audio processing device 200a.

In FIG. 25, four speakers 330a to 330d are provided as an example. To localize a sound image in the left-right and front-rear directions of the user 401, at least three speakers are provided.

FIG. 26 is a block diagram illustrating an example of the processing functions of the user terminal and the audio processing device according to the third embodiment. In FIG. 26, components corresponding to those in FIG. 6 are denoted by the same reference numerals, and their description is omitted.

The user terminal 400a includes the line-of-sight direction detection unit 421 but does not include the reproduction processing unit 422 of FIG. 6.
The audio processing device 200a differs from the audio processing device 200 of FIG. 6 in the following points.

A plurality of audio signals 251 stored in advance in the storage device 250 are input to the audio input unit 212. Each of the audio signals 251 is a sound source associated with an exhibit. The audio input unit 212 reads the audio signals 251 from the storage device 250 and supplies them to the audio output processing unit 216. The storage device 250 may be a device installed outside the audio processing device 200a, or may be a device inside the audio processing device 200a (for example, the HDD 203).

The audio output processing unit 216 outputs the synthesized audio signal to the speakers 330a to 330d instead of to the user terminal. From the position of each sound source in the virtual space and the user position Qt, the audio output processing unit 216 determines the volume balance for each output channel of the audio signal 251 corresponding to each sound source, and thereby localizes the sound image of each sound source in space. The audio output processing unit 216 then controls the volume of the audio signals 251 corresponding to the sound sources included in the listening range, using the same procedure as in the second embodiment.

According to the third embodiment described above, as in the second embodiment, the user 401 can easily pick out the audio corresponding to a desired exhibit through natural movements.
[Fourth Embodiment]
FIG. 27 is a diagram illustrating a configuration example of an audio providing system according to the fourth embodiment. In FIG. 27, components corresponding to those in FIG. 25 are denoted by the same reference numerals, and their description is omitted. The differences between the fourth embodiment and the third embodiment are described below.

In the audio providing system according to the fourth embodiment, the user 401 listens to audio output from a plurality of speakers installed in the exhibition hall. Unlike in the third embodiment, however, each speaker is associated with an exhibit and is installed close to the corresponding exhibit. In the example of FIG. 27, the speakers 340a to 340c are installed close to the exhibits 310a to 310c, respectively.

The audio processing device 200b basically has the same processing functions as the audio processing device 200a of the third embodiment, but differs from it in the following points. The audio output processing unit 216 of the audio processing device 200b outputs an audio signal to the speaker associated with each exhibit. When one sound source is associated with each exhibit, the audio output processing unit 216 only needs to output the audio signal of one sound source to the output channel of one speaker, and does not need to localize the sound image by adjusting the volume of each output channel. The audio output processing unit 216 therefore simply controls the volume of the audio signals 251 corresponding to the sound sources included in the listening range, using the same procedure as in the second embodiment.
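
The per-speaker volume control of the fourth embodiment could look roughly like the sketch below. It is an illustration under several assumptions that the embodiment does not state: the virtual space is treated as two-dimensional, the listening range is reduced to a single horizontal angle centered on the listener direction, each speaker position stands in for its sound source position, and the gain values 1.0 and 0.2 are arbitrary. The names `in_listening_range` and `speaker_gains` are hypothetical.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]  # assumption: 2D coordinates in the virtual space


def in_listening_range(user_pos: Point, facing_deg: float,
                       source_pos: Point, range_deg: float) -> bool:
    """True if the source lies within range_deg / 2 of the listener direction."""
    bearing = math.degrees(math.atan2(source_pos[1] - user_pos[1],
                                      source_pos[0] - user_pos[0]))
    # Signed angular difference folded into [-180, 180).
    diff = (bearing - facing_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= range_deg / 2.0


def speaker_gains(user_pos: Point, facing_deg: float, range_deg: float,
                  speakers: Dict[str, Point],
                  in_gain: float = 1.0, out_gain: float = 0.2) -> Dict[str, float]:
    """One gain per speaker: louder inside the listening range, quieter outside.
    No per-channel panning is needed because each source has its own speaker."""
    return {
        speaker_id: (in_gain
                     if in_listening_range(user_pos, facing_deg, pos, range_deg)
                     else out_gain)
        for speaker_id, pos in speakers.items()
    }


# Hypothetical layout: the user stands at the origin and faces the +y direction.
speakers = {"340a": (0.0, 5.0), "340b": (5.0, 0.0), "340c": (-5.0, 5.0)}
gains = speaker_gains((0.0, 0.0), 90.0, 60.0, speakers)
```

With a 60-degree listening range, only speaker 340a (straight ahead) receives the larger gain in this layout; 340b and 340c are 90 and 45 degrees off the listener direction and receive the smaller gain.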

According to the fourth embodiment described above, as in the second and third embodiments, the user 401 can easily pick out the audio corresponding to a desired exhibit through natural movements.
The processing functions of the audio processing device described in each of the above embodiments can be realized by a computer. In that case, a program describing the processing contents of the functions that each device should have is provided, and the processing functions are realized on the computer by executing that program. The program describing the processing contents can be recorded on a computer-readable recording medium. Examples of the computer-readable recording medium include a magnetic storage device, an optical disc, a magneto-optical recording medium, and a semiconductor memory. Examples of the magnetic storage device include a hard disk device (HDD), a flexible disk (FD), and a magnetic tape. Examples of the optical disc include a DVD, a DVD-RAM, a CD-ROM, and a CD-R/RW. Examples of the magneto-optical recording medium include an MO (Magneto-Optical disk).

When the program is distributed, for example, a portable recording medium such as a DVD or a CD-ROM on which the program is recorded is sold. The program can also be stored in a storage device of a server computer and transferred from the server computer to another computer via a network.

The computer that executes the program stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage device. The computer then reads the program from its own storage device and executes processing according to the program. The computer can also read the program directly from the portable recording medium and execute processing according to it. In addition, each time a program is transferred from a server computer connected via a network, the computer can execute processing according to the received program.

Regarding the above embodiments, the following supplementary notes are further disclosed.
(Supplementary Note 1) An audio processing device that controls output of audio signals respectively corresponding to a plurality of virtual sound sources virtually arranged around a listener, the audio processing device comprising:
a state determination unit that determines whether movement of a listener direction, which indicates the orientation of the listener, has become stationary; and
an output control unit that performs control so that the volume of an audio signal corresponding to a virtual sound source included in a listening range, which is set so that the listener direction is at its center as viewed from the listener, is relatively larger than the volume of an audio signal corresponding to a virtual sound source not included in the listening range, and that reduces the listening range when it is determined that the stationary state has been reached.

(Supplementary Note 2) The audio processing device according to supplementary note 1, wherein, when it is determined that the stationary state has been reached, the output control unit reduces the listening range continuously or stepwise over time.

(Supplementary Note 3) The audio processing device according to supplementary note 1 or 2, wherein, when it is determined that the stationary state has been reached, the output control unit reduces the listening range so that at least one of the virtual sound sources is included in the listening range.

(Supplementary Note 4) The audio processing device according to supplementary note 1 or 2, further comprising a distance detection unit that detects the distance between the position of each virtual sound source, or the position of an object associated with each virtual sound source, and the position of the listener,
wherein, when reducing the listening range, the output control unit corrects the size of the listening range so that the listening range becomes larger as the distance between the position of a virtual sound source included in the listening range, or the position of the object associated with that virtual sound source, and the position of the listener becomes shorter.

(Supplementary Note 5) The audio processing device according to any one of supplementary notes 1 to 4, wherein, among the virtual sound sources included in the listening range, the output control unit makes the volume of the corresponding audio signal larger for a virtual sound source arranged closer to the center of the listening range as viewed from the listener.

(Supplementary Note 6) The audio processing device according to any one of supplementary notes 1 to 5, wherein the state determination unit determines that the stationary state has been reached when the amount of fluctuation of the listener direction stays within a predetermined fluctuation range for a predetermined time.

(Supplementary Note 7) The audio processing device according to any one of supplementary notes 1 to 6, wherein the output control unit synthesizes the audio signals corresponding to the respective virtual sound sources to generate a synthesized audio signal having a predetermined number of channels, and transmits the synthesized audio signal to a predetermined audio output device.

(Supplementary Note 8) An audio processing method in an audio processing device that controls output of audio signals respectively corresponding to a plurality of virtual sound sources virtually arranged around a listener, the method comprising:
performing control so that the volume of an audio signal corresponding to a virtual sound source included in a listening range, which is set so that a listener direction indicating the orientation of the listener is at its center as viewed from the listener, is relatively larger than the volume of an audio signal corresponding to a virtual sound source not included in the listening range; and
reducing the listening range when it is determined that movement of the listener direction has become stationary.

(Supplementary Note 9) The audio processing method according to supplementary note 8, wherein, when it is determined that the stationary state has been reached, the listening range is reduced continuously or stepwise over time.
(Supplementary Note 10) The audio processing method according to supplementary note 8 or 9, wherein, when it is determined that the stationary state has been reached, the listening range is reduced so that at least one of the virtual sound sources is included in the listening range.

(Supplementary Note 11) The audio processing method according to supplementary note 8 or 9, further comprising detecting the distance between the position of each virtual sound source, or the position of an object associated with each virtual sound source, and the position of the listener,
wherein, in reducing the listening range, the size of the listening range is corrected so that the listening range becomes larger as the distance between the position of a virtual sound source included in the listening range, or the position of the object associated with that virtual sound source, and the position of the listener becomes shorter.

(Supplementary Note 12) The audio processing method according to any one of supplementary notes 8 to 11, wherein, among the virtual sound sources included in the listening range, the volume of the corresponding audio signal is made larger for a virtual sound source arranged closer to the center of the listening range as viewed from the listener.

(Supplementary Note 13) The audio processing method according to any one of supplementary notes 8 to 12, wherein it is determined that the stationary state has been reached when the amount of fluctuation of the listener direction stays within a predetermined fluctuation range for a predetermined time.

(Supplementary Note 14) The audio processing method according to any one of supplementary notes 8 to 13, further comprising synthesizing the audio signals corresponding to the respective virtual sound sources to generate a synthesized audio signal having a predetermined number of channels, and transmitting the synthesized audio signal to a predetermined audio output device.

(Supplementary Note 15) An audio processing program for controlling output of audio signals respectively corresponding to a plurality of virtual sound sources virtually arranged around a listener, the program causing a computer to execute a process comprising:
performing control so that the volume of an audio signal corresponding to a virtual sound source included in a listening range, which is set so that a listener direction indicating the orientation of the listener is at its center as viewed from the listener, is relatively larger than the volume of an audio signal corresponding to a virtual sound source not included in the listening range; and
reducing the listening range when it is determined that movement of the listener direction has become stationary.

(Supplementary Note 16) The audio processing program according to supplementary note 15, wherein, when it is determined that the stationary state has been reached, the listening range is reduced continuously or stepwise over time.
(Supplementary Note 17) The audio processing program according to supplementary note 15 or 16, wherein, when it is determined that the stationary state has been reached, the listening range is reduced so that at least one of the virtual sound sources is included in the listening range.

(Supplementary Note 18) The audio processing program according to supplementary note 15 or 16, further causing the computer to execute a process of detecting the distance between the position of each virtual sound source, or the position of an object associated with each virtual sound source, and the position of the listener,
wherein, in the process of reducing the listening range, the size of the listening range is corrected so that the listening range becomes larger as the distance between the position of a virtual sound source included in the listening range, or the position of the object associated with that virtual sound source, and the position of the listener becomes shorter.

(Supplementary Note 19) The audio processing program according to any one of supplementary notes 15 to 18, wherein, among the virtual sound sources included in the listening range, the volume of the corresponding audio signal is made larger for a virtual sound source arranged closer to the center of the listening range as viewed from the listener.

(Supplementary Note 20) The audio processing program according to any one of supplementary notes 15 to 19, wherein it is determined that the stationary state has been reached when the amount of fluctuation of the listener direction stays within a predetermined fluctuation range for a predetermined time.
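
The stationary-state determination and the stepwise reduction of the listening range described in the supplementary notes can be sketched as follows. This is a minimal illustration, not the method of the embodiments: the class name `ListeningRange`, all parameter names, and all numeric defaults are hypothetical. The listening range is reduced to a single angle, and restoring the full range when the listener direction moves again is an assumption made here so that the example is complete.

```python
from typing import Optional


def _angle_diff(a_deg: float, b_deg: float) -> float:
    """Signed difference a - b folded into [-180, 180)."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0


class ListeningRange:
    """Tracks the listener direction and shrinks the listening range stepwise
    once the direction has stayed within a tolerance for a hold time."""

    def __init__(self, initial_deg: float = 120.0, min_deg: float = 30.0,
                 step_deg: float = 10.0, tolerance_deg: float = 5.0,
                 hold_s: float = 2.0) -> None:
        self.initial_deg = initial_deg      # hypothetical full range angle
        self.min_deg = min_deg              # hypothetical smallest range angle
        self.step_deg = step_deg            # reduction applied per update
        self.tolerance_deg = tolerance_deg  # "predetermined fluctuation range"
        self.hold_s = hold_s                # "predetermined time"
        self.angle_deg = initial_deg
        self._ref: Optional[float] = None   # direction when the timer started
        self._since = 0.0                   # time when the timer started

    def update(self, t: float, direction_deg: float) -> float:
        """Feed one direction sample taken at time t (seconds) and return the
        current listening range angle."""
        moved = (self._ref is None or
                 abs(_angle_diff(direction_deg, self._ref)) > self.tolerance_deg)
        if moved:
            # Direction changed: restart the timer and restore the full range
            # (the restore is an assumption of this sketch).
            self._ref = direction_deg
            self._since = t
            self.angle_deg = self.initial_deg
        elif t - self._since >= self.hold_s:
            # Stationary state: reduce the range by one step, down to the minimum.
            self.angle_deg = max(self.min_deg, self.angle_deg - self.step_deg)
        return self.angle_deg


# Hypothetical samples: (time in seconds, listener direction in degrees).
rng = ListeningRange()
samples = [(0.0, 10.0), (1.0, 12.0), (2.0, 11.0), (3.0, 9.0), (4.0, 40.0)]
angles = [rng.update(t, d) for t, d in samples]
```

In this run the direction stays within 5 degrees of its starting value for 2 seconds, so the range starts shrinking at the third sample and is restored when the direction jumps to 40 degrees. A fuller version would also stop shrinking while at least one virtual sound source remains in the range; the fixed `min_deg` only stands in for that condition.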

DESCRIPTION OF SYMBOLS
1 Audio processing device
2 Sound source position information
3 State determination unit
4 Output control unit
10 Listener
21 to 25 Virtual sound source
30 Listening range

Claims

An audio processing device that controls output of audio signals respectively corresponding to a plurality of virtual sound sources virtually arranged around a listener, the audio processing device comprising:
a state determination unit that determines whether movement of a listener direction, which indicates the orientation of the listener, has become stationary; and
an output control unit that performs control so that the volume of an audio signal corresponding to a virtual sound source included in a listening range, which is set so that the listener direction is at its center as viewed from the listener, is relatively larger than the volume of an audio signal corresponding to a virtual sound source not included in the listening range, and that reduces the listening range when it is determined that the stationary state has been reached.

The audio processing device according to claim 1, wherein, when it is determined that the stationary state has been reached, the output control unit reduces the listening range continuously or stepwise over time.

The audio processing device according to claim 1 or 2, wherein, when it is determined that the stationary state has been reached, the output control unit reduces the listening range so that at least one of the virtual sound sources is included in the listening range.

The audio processing device according to claim 1 or 2, further comprising a distance detection unit that detects the distance between the position of each virtual sound source, or the position of an object associated with each virtual sound source, and the position of the listener,
wherein, when reducing the listening range, the output control unit corrects the size of the listening range so that the listening range becomes larger as the distance between the position of a virtual sound source included in the listening range, or the position of the object associated with that virtual sound source, and the position of the listener becomes shorter.

The audio processing device according to claim 1, wherein, among the virtual sound sources included in the listening range, the output control unit makes the volume of the corresponding audio signal larger for a virtual sound source arranged closer to the center of the listening range as viewed from the listener.

The audio processing device according to any one of claims 1 to 5, wherein the state determination unit determines that the stationary state has been reached when the amount of fluctuation of the listener direction stays within a predetermined fluctuation range for a predetermined time.

The audio processing device according to any one of claims 1 to 6, wherein the output control unit synthesizes the audio signals corresponding to the respective virtual sound sources to generate a synthesized audio signal having a predetermined number of channels, and transmits the synthesized audio signal to a predetermined audio output device.

An audio processing method in an audio processing device that controls output of audio signals respectively corresponding to a plurality of virtual sound sources virtually arranged around a listener, the method comprising:
performing control so that the volume of an audio signal corresponding to a virtual sound source included in a listening range, which is set so that a listener direction indicating the orientation of the listener is at its center as viewed from the listener, is relatively larger than the volume of an audio signal corresponding to a virtual sound source not included in the listening range; and
reducing the listening range when it is determined that movement of the listener direction has become stationary.

An audio processing program for controlling output of audio signals respectively corresponding to a plurality of virtual sound sources virtually arranged around a listener, the program causing a computer to execute a process comprising:
performing control so that the volume of an audio signal corresponding to a virtual sound source included in a listening range, which is set so that a listener direction indicating the orientation of the listener is at its center as viewed from the listener, is relatively larger than the volume of an audio signal corresponding to a virtual sound source not included in the listening range; and
reducing the listening range when it is determined that movement of the listener direction has become stationary.