JP5983421B2

JP5983421B2 - Audio processing apparatus, audio processing method, and audio processing program

Info

Publication number: JP5983421B2
Application number: JP2013008549A
Authority: JP
Inventors: 拓郎大谷; 関口　英紀; 英紀関口; 桂樹岡林; 洋平関
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2013-01-21
Filing date: 2013-01-21
Publication date: 2016-08-31
Anticipated expiration: 2033-01-21
Also published as: JP2014140128A

Description

The present invention relates to an audio processing device, an audio processing method, and an audio processing program.

There is a technique for localizing the sound image of a virtual sound source at an arbitrary position around a listener using a playback device with left and right two-channel output, such as headphones or earphones. In this technique, sound image localization is achieved by convolving the audio signal output from the virtual sound source with the head-related transfer function (HRTF) corresponding to the position of that virtual sound source. It is also possible to arrange a plurality of virtual sound sources around the listener and localize the sound image of each one.
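As a minimal sketch of the convolution step described above, the following Python code convolves a monaural source signal with a pair of head-related impulse responses (the time-domain form of the HRTF), one per ear. The function names `convolve` and `localize` and the toy impulse responses are illustrative assumptions, not values taken from this publication.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two finite sequences."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def localize(signal, hrir_left, hrir_right):
    """Return (left, right) channel signals for one virtual sound source."""
    return convolve(signal, hrir_left), convolve(signal, hrir_right)


# Hypothetical impulse responses: the right ear receives the sound one
# sample later and attenuated, as for a source on the listener's left.
mono = [1.0, 0.5, 0.25]
left, right = localize(mono, [1.0], [0.0, 0.6])
```

A practical renderer would use measured impulse responses and a fast (FFT-based) convolution; the direct form is used here only to keep the sketch dependency-free.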

Other techniques relating to virtual sound sources include the following. For example, a technique has been proposed in which, to make the reproduction of a musical performance or voice natural, lively, and realistic, a virtual sound source is formed by wavefront synthesis of sound waves output from a speaker array, and the position of the virtual sound source is varied in its vicinity.

A technique has also been proposed in which a weighting coefficient calculated from the center of gravity of the input signals of the respective channels is applied to the signal that creates a virtual sound image, thereby obtaining a strong sense of envelopment in which the localization of the virtual sound image is emphasized according to the position of the center of gravity.

JP 2006-86921 A
JP 2011-211312 A

In processing that localizes the sound images of a plurality of virtual sound sources, a convolution operation using the head-related transfer function corresponding to the position of each virtual sound source is executed once per virtual sound source. The processing load therefore increases with the number of virtual sound sources. One way to address this is to distribute the audio signal of each virtual sound source to a fixed number of virtual speakers and to perform the convolution using the HRTF corresponding to each virtual speaker, which fixes the amount of convolution processing at the number of virtual speakers. With this method, however, the HRTF that is used corresponds to the position of a virtual speaker rather than to the position of the virtual sound source, so the localization of the sound image of the virtual sound source may become ambiguous.

In one aspect, an object is to provide an audio processing device, an audio processing method, and an audio processing program that can improve the sense of localization of the sound images of a plurality of virtual sound sources with low-load processing.

In one proposal, an audio processing device is provided. The audio processing device has a speaker placement unit and an audio synthesis unit. The speaker placement unit calculates, for each pair of virtual sound sources that are adjacent in the circumferential direction as seen from the listener, the angle between them measured along the circumference centered on the listener. Within the circumferential range centered on the listener, a first range is a range that is sandwiched between virtual sound sources whose calculated angle is equal to or greater than a threshold and that does not include the positions lying in the directions of those virtual sound sources as seen from the listener. The speaker placement unit places a plurality of virtual speakers, fewer in number than the plurality of virtual sound sources arranged around the listener, in a second range that excludes the first range. The audio synthesis unit distributes the audio signal from each of the plurality of virtual sound sources to one or more virtual speakers selected for that virtual sound source from among the plurality of virtual speakers. The audio synthesis unit then synthesizes the audio signals distributed to the virtual speakers into left and right two-channel audio signals using the head-related transfer function corresponding to the position of each virtual speaker.

In one proposal, an audio processing method is also provided that executes processing similar to that realized by the above audio processing device.
Furthermore, in one proposal, an audio processing program is provided that causes a computer to execute processing similar to that of the above audio processing device.

In one aspect, the sense of localization of the sound images of a plurality of virtual sound sources can be improved with low-load processing.

A diagram showing an example of the audio processing device of the first embodiment.
A diagram showing an example of the audio processing system of the second embodiment.
A diagram showing a hardware configuration example of the audio processing device.
A diagram showing a hardware configuration example of the user terminal.
A block diagram showing an example of the functions of the audio processing device.
A diagram showing an example of generating the left and right channel audio signals.
A diagram showing an example of the positional relationship between sound sources and virtual speakers.
A diagram showing an example of a virtual speaker placement method when the number of virtual speakers is equal to or greater than the number of virtual sound sources.
A first diagram showing an example of a virtual speaker placement method when the number of virtual speakers is less than the number of virtual sound sources.
A second diagram showing an example of a virtual speaker placement method when the number of virtual speakers is less than the number of virtual sound sources.
A third diagram showing an example of a virtual speaker placement method when the number of virtual speakers is less than the number of virtual sound sources.
A fourth diagram showing an example of a virtual speaker placement method when the number of virtual speakers is less than the number of virtual sound sources.
A fifth diagram showing an example of a virtual speaker placement method when the number of virtual speakers is less than the number of virtual sound sources.
A diagram showing an example of a user state table.
A diagram showing an example of a sound source management table.
A diagram showing an example of a virtual speaker position table.
A diagram showing an example of placement information.
A diagram showing an example of a placement region management table.
A flowchart showing an example of virtual speaker placement processing.
A flowchart (continued) showing an example of virtual speaker placement processing.
A first diagram showing an example of virtual speaker placement.
A second diagram showing an example of virtual speaker placement.
A third diagram showing an example of virtual speaker placement.
A fourth diagram showing an example of virtual speaker placement.
A fifth diagram showing an example of virtual speaker placement.
A diagram showing an example of a method of setting weights for placement regions.
A diagram showing a modification of virtual speaker placement that takes the weights into account.

Hereinafter, embodiments will be described with reference to the drawings.
[First Embodiment]
FIG. 1 is a diagram illustrating an example of the audio processing device of the first embodiment. The audio processing device 1 synthesizes audio signals from a plurality of virtual sound sources arranged around a listener into left and right two-channel audio signals. A virtual sound source is a sound source that is virtually placed at an arbitrary position around the listener in order to render sound. The position of each virtual sound source is set in advance, for example by a user input operation.

The audio processing device 1 has a speaker placement unit 2 and an audio synthesis unit 3. The processing of the speaker placement unit 2 and the audio synthesis unit 3 is realized by, for example, a processor such as a CPU (Central Processing Unit) or a DSP (Digital Signal Processor), another electronic circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array), or a combination of a processor and another electronic circuit.

The speaker placement unit 2 dynamically places a plurality of virtual speakers, fewer in number than the virtual sound sources, on a predetermined circumference centered on the listener. A virtual speaker is one for which sound field processing makes sound appear to come from a speaker position at which no speaker actually exists. A virtual speaker is also a kind of virtual sound source, but in this embodiment the origin of an audio signal that is distributed to the virtual speakers, as described later, is called a "virtual sound source", and a virtual sound source that is dynamically placed by the speaker placement unit 2 is called a "virtual speaker".

The audio synthesis unit 3 distributes the audio signal from each of the plurality of virtual sound sources to one or more virtual speakers selected for that virtual sound source from among the plurality of virtual speakers. The audio synthesis unit 3 then synthesizes the audio signals distributed to the virtual speakers into left and right two-channel audio signals using the HRTF (head-related transfer function) corresponding to the position of each virtual speaker.
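The text above does not state how a source signal is divided among the selected virtual speakers. Purely as an illustration, the sketch below assumes that each source is shared between the two virtual speakers that enclose it on the circle, using a constant-power sine/cosine law. The function name `distribute`, the clockwise-degree convention, and the panning law are all assumptions.

```python
import math


def distribute(source_deg, speaker_degs):
    """Return {speaker_index: gain} for one virtual sound source.

    Assumptions (not from the publication): angles are degrees measured
    clockwise around the listener, speaker angles are distinct, and the
    source is shared between the two speakers enclosing it using a
    constant-power sine/cosine law.
    """
    if not speaker_degs:
        raise ValueError("no virtual speakers given")
    if len(speaker_degs) == 1:
        return {0: 1.0}
    src = source_deg % 360.0
    order = sorted(range(len(speaker_degs)), key=lambda i: speaker_degs[i] % 360.0)
    for pos, i in enumerate(order):
        j = order[(pos + 1) % len(order)]            # next speaker clockwise
        span = (speaker_degs[j] - speaker_degs[i]) % 360.0
        offset = (src - speaker_degs[i]) % 360.0
        if span > 0.0 and offset <= span:
            frac = offset / span                     # 0 at speaker i, 1 at speaker j
            gains = {i: math.cos(frac * math.pi / 2),
                     j: math.sin(frac * math.pi / 2)}
            # Drop negligible gains so a source on a speaker uses only that speaker.
            return {k: g for k, g in gains.items() if g > 1e-9}
    raise ValueError("speaker angles must be distinct")
```

For example, `distribute(45.0, [0.0, 90.0, 180.0])` shares the source equally between the speakers at 0° and 90°, while a source located exactly at a speaker is sent to that speaker alone.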

For each virtual speaker, the audio synthesis unit 3 performs a convolution operation using the audio signal distributed to that virtual speaker and the HRTF corresponding to its position. As a result, the sound image of the audio distributed to a virtual speaker is localized at the position of that virtual speaker. Because the number of virtual speakers is smaller than the number of virtual sound sources, the computational load is lower than when the convolution is performed using the HRTF corresponding to the position of each virtual sound source. In addition, for example, by setting the number of virtual speakers to a fixed number smaller than the number of virtual sound sources, the number of convolution operations can be held constant regardless of the number of virtual sound sources.
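The saving described above can be sketched as follows: the source signals are first mixed into one feed per virtual speaker (multiply-adds only), and each feed is then convolved once per ear. The data layout (`gains[n][m]` as the gain from source n to speaker m, one impulse-response pair per speaker) and the function names are assumptions made for illustration.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def render(source_signals, gains, speaker_hrirs):
    """Mix N sources into M speaker feeds, then convolve each feed per ear.

    Returns (left, right, number_of_convolutions).
    """
    length = max(len(s) for s in source_signals)
    taps = max(len(h) for pair in speaker_hrirs for h in pair)
    left = [0.0] * (length + taps - 1)
    right = [0.0] * (length + taps - 1)
    convolutions = 0
    for m, (h_left, h_right) in enumerate(speaker_hrirs):
        # Sum every source's contribution into this speaker's feed.
        feed = [0.0] * length
        for n, signal in enumerate(source_signals):
            for t, sample in enumerate(signal):
                feed[t] += gains[n][m] * sample
        # Two convolutions per virtual speaker, however many sources exist.
        for out, h in ((left, h_left), (right, h_right)):
            for t, value in enumerate(convolve(feed, h)):
                out[t] += value
            convolutions += 1
    return left, right, convolutions
```

With four sources and three virtual speakers this performs six convolutions, whereas convolving each source with its own HRTF pair would need eight; the gap widens as sources are added while the speaker count stays fixed.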

The speaker placement unit 2 places the plurality of virtual speakers by the following procedure. The speaker placement unit 2 calculates, for each pair of virtual sound sources that are adjacent in the direction along the above circumference as seen from the listener, the angle between them measured along the circumference centered on the listener. From the circumferential range centered on the listener, the speaker placement unit 2 identifies a first range that is sandwiched between virtual sound sources whose calculated angle is equal to or greater than a threshold and that does not include the positions lying in the directions of those virtual sound sources. The speaker placement unit 2 then places the plurality of virtual speakers in a second range, which is the circumferential range excluding the first range, and places no virtual speaker in the first range.

This reduces the angle between the straight line connecting a virtual speaker to the listener and the straight line connecting the listener to the virtual sound source adjacent to that virtual speaker in the direction along the circumference. As a result, the sense of localization of the sound image of each virtual sound source can be improved.
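The first two steps of the procedure (computing the angles between adjacent virtual sound sources and identifying the first ranges) can be sketched as below. The representation of each source by a single clockwise angle in degrees, and the names `clockwise_gaps` and `first_ranges`, are assumptions for illustration; each returned range is an open arc, so the source directions at its two ends are not part of it.

```python
def clockwise_gaps(source_degs):
    """Return (start, end, gap) for each pair of sources adjacent on the circle.

    Angles are degrees measured clockwise from an arbitrary reference
    direction (an assumed convention). Source angles are assumed distinct.
    """
    ordered = sorted(a % 360.0 for a in source_degs)
    if len(ordered) == 1:
        return [(ordered[0], ordered[0], 360.0)]
    gaps = []
    for k, start in enumerate(ordered):
        end = ordered[(k + 1) % len(ordered)]
        gaps.append((start, end, (end - start) % 360.0))
    return gaps


def first_ranges(source_degs, threshold_deg):
    """Open arcs (start, end), clockwise, in which no virtual speaker is placed."""
    return [(start, end)
            for start, end, gap in clockwise_gaps(source_degs)
            if gap >= threshold_deg]
```

For sources at 300°, 340°, 20°, and 60° and a threshold of 120°, the only first range is the arc running clockwise from 60° to 300°; the remaining 120° arc is the second range.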

Here, an example is described in which three virtual speakers 6a to 6c are placed when four virtual sound sources 5a to 5d are arranged around the listener 4, as shown in FIG. 1.
The speaker placement unit 2 calculates the angles θ1 to θ4 based on the positions of the listener 4 and the virtual sound sources 5a to 5d. Here, the speaker placement unit 2 calculates the angles in the clockwise direction along the circumference 7 centered on the listener 4. The angle θ1 is the angle between the straight line connecting the listener 4 and the virtual sound source 5a and the straight line connecting the listener 4 and the virtual sound source 5b. The angle θ2 is the angle between the straight line connecting the listener 4 and the virtual sound source 5b and the straight line connecting the listener 4 and the virtual sound source 5c. The angle θ3 is the angle between the straight line connecting the listener 4 and the virtual sound source 5c and the straight line connecting the listener 4 and the virtual sound source 5d. The angle θ4 is the angle between the straight line connecting the listener 4 and the virtual sound source 5d and the straight line connecting the listener 4 and the virtual sound source 5a.

Assume that the angles θ1, θ2, and θ3 are less than a predetermined angle and that the angle θ4 is equal to or greater than the predetermined angle. The predetermined angle is, for example, 360° divided by the number of virtual speakers to be placed.

In this case, the speaker placement unit 2 identifies as the first range the range on the circumference 7 that is sandwiched between the virtual sound sources 5d and 5a, which are adjacent across the angle θ4, and that does not include the directions of the virtual sound sources 5a and 5d as seen from the listener 4. The first range does not include the line segments connecting the listener 4 to the virtual sound sources 5a and 5d. The speaker placement unit 2 places no virtual speaker in the first range. Instead, the speaker placement unit 2 places the virtual speakers 6a, 6b, and 6c on the circumference 7 in the second range, which excludes the first range (that is, the range spanned by the angles θ1, θ2, and θ3). The second range includes the line segments connecting the listener 4 to the virtual sound sources 5a and 5d.

This reduces the angle between the straight line connecting a virtual speaker to the listener and the straight line connecting the listener to the virtual sound source adjacent to that virtual speaker in the direction along the circumference 7. When the audio signal output by a virtual sound source is distributed to virtual speakers, the smaller this angle is, the better the sense of localization of the sound image of the virtual sound source. This is because the error between the HRTF that should ideally be used, corresponding to the position of the virtual sound source, and the HRTF actually used in the computation, corresponding to the position of the virtual speaker, becomes smaller. The sense of localization of the sound images of the plurality of virtual sound sources can therefore be improved.
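The FIG. 1 example can be worked through numerically under some assumptions. The source angles below are hypothetical (the publication gives no coordinates), the threshold is 360° divided by the number of virtual speakers as in the example, only the single widest gap is treated as a first range, and the speakers are spread evenly over the second range with one at each end. The even spacing in particular is an assumption; the text only requires that the speakers lie within the second range.

```python
def angular_distance(a, b):
    """Smallest angle in degrees between two directions."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def place_in_second_range(source_degs, num_speakers):
    """Spread speakers evenly over the arc left after removing the widest gap."""
    threshold = 360.0 / num_speakers
    ordered = sorted(a % 360.0 for a in source_degs)
    # (gap, start) for each clockwise-adjacent pair of sources.
    gaps = [((ordered[(k + 1) % len(ordered)] - s) % 360.0, s)
            for k, s in enumerate(ordered)]
    widest, gap_start = max(gaps)
    if widest < threshold:
        # No first range: fall back to even spacing on the whole circle.
        return [k * threshold for k in range(num_speakers)]
    arc_start = (gap_start + widest) % 360.0    # source that ends the gap
    arc_length = 360.0 - widest                 # extent of the second range
    step = arc_length / (num_speakers - 1) if num_speakers > 1 else 0.0
    return [(arc_start + k * step) % 360.0 for k in range(num_speakers)]


def worst_error(source_degs, speaker_degs):
    """Largest angle between any source and its nearest speaker."""
    return max(min(angular_distance(s, p) for p in speaker_degs)
               for s in source_degs)


sources = [300.0, 340.0, 20.0, 60.0]    # hypothetical angles for 5a, 5b, 5c, 5d
adaptive = place_in_second_range(sources, 3)
uniform = [0.0, 120.0, 240.0]           # fixed, evenly spaced speakers
```

With these hypothetical positions θ1 to θ3 are 40° each and θ4 is 240°, so the speakers land at 300°, 0°, and 60°. The largest angle between a source and its nearest speaker is 20°, compared with 60° for the fixed evenly spaced layout.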

[Second Embodiment]
FIG. 2 is a diagram illustrating an example of the audio processing system of the second embodiment.
The audio processing system shown in FIG. 2 includes an audio processing device 100 that performs control processing for providing audio information to a user. A plurality of access points 21a to 21d for transmitting and receiving wireless signals are connected to the audio processing device 100 via a network 10. The network 10 is, for example, a LAN (Local Area Network). In this case, the access points 21a to 21d are wireless LAN access points.

The user, meanwhile, carries a user terminal 200 and headphones 12. The user terminal 200 is capable of wireless communication with the access points 21a to 21d.
The audio processing device 100 synthesizes the audio signals output by sound sources that have been virtually placed by an administrator or the like, and transmits the synthesized audio signal to the user terminal 200 through at least one of the access points 21a to 21d. Hereinafter, "sound source" refers to a virtual sound source placed in advance in a virtual space corresponding to the environment around the user.

The audio processing device 100 also has a function of detecting the position of the user terminal 200. In this embodiment, as an example, the audio processing device 100 receives, via the access points 21a to 21d, a signal transmitted from the user terminal 200 and detects the position of the user terminal 200 based on the received signals. For example, the audio processing device 100 detects the position of the user terminal 200 by triangulation based on the differences in the reception time of the signal at the respective access points, or on the differences in received signal strength. When this method is used, at least three access points are installed for position detection.
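The publication does not give the position calculation itself. As a simplified stand-in, the sketch below uses range-based trilateration rather than the time-difference or strength-difference method named above: it assumes that a distance estimate to each of three access points has already been derived (for example from signal strength) and solves for a two-dimensional position. The function name and coordinate layout are hypothetical.

```python
import math


def trilaterate(anchors, distances):
    """Estimate (x, y) from three access point positions and distance estimates.

    Subtracting the circle equation of the first access point from those of
    the other two leaves two linear equations in x and y, solved here with
    Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = distances
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1 ** 2 - d2 ** 2 + x2 ** 2 - x1 ** 2 + y2 ** 2 - y1 ** 2
    b2 = d1 ** 2 - d3 ** 2 + x3 ** 2 - x1 ** 2 + y3 ** 2 - y1 ** 2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("access points must not be collinear")
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)


# Hypothetical layout: three access points and a terminal at (3, 4).
access_points = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
ranges = [math.hypot(3.0 - ax, 4.0 - ay) for ax, ay in access_points]
estimate = trilaterate(access_points, ranges)
```

The collinearity check reflects why at least three suitably placed access points are needed: with collinear access points the two linear equations are not independent and the position is ambiguous.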

The headphones 12 are a device that converts an analog audio signal into sound waves using built-in speakers. The headphones 12 include driver units (not shown) that reproduce the analog audio signal output from the user terminal 200. A sensor 11 is also mounted on the headphones 12.

The sensor 11 detects the direction in which the user is facing. Hereinafter, the direction detected by the sensor 11 is called the "line-of-sight direction". The sensor 11 includes, for example, an acceleration sensor, a gyro sensor, and a geomagnetic sensor. The sensor 11 may be provided at a position separate from the headphones 12, and may be provided at a position other than the head. However, because the purpose of the sensor 11 is to detect where the user is looking, the sensor 11 is preferably provided on the user's head. The direction detected by the sensor 11 may be a two-dimensional direction along a horizontal plane or a three-dimensional direction that includes the vertical direction.

The user terminal 200 converts the audio signal received from the audio processing device 100 into an analog signal and outputs the converted analog audio signal to the driver units of the headphones 12. The user terminal 200 also computes the user's line-of-sight direction from the detection results of the sensor 11 and transmits the computed line-of-sight direction to the audio processing device 100 through at least one of the access points 21a to 21d.
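One plausible use of the detected position and line-of-sight direction is to express each sound source's direction relative to where the user is facing, so that an HRTF for that relative direction can be selected. The sketch below assumes a two-dimensional map, with angles in degrees measured clockwise from the +y axis; the function name and coordinate convention are assumptions, not part of the publication.

```python
import math


def relative_azimuth(user_xy, gaze_deg, source_xy):
    """Clockwise angle (degrees) of a sound source relative to the user's gaze.

    gaze_deg and the returned angle are measured clockwise from the +y axis
    of the map (an assumed convention): 0 means straight ahead and 90 means
    directly to the user's right.
    """
    dx = source_xy[0] - user_xy[0]
    dy = source_xy[1] - user_xy[1]
    bearing = math.degrees(math.atan2(dx, dy))   # 0 along +y, 90 along +x
    return (bearing - gaze_deg) % 360.0
```

For instance, a user at the origin facing +y hears a source at (1, 0) at a relative azimuth of 90°; if the user turns to face +x, the same source is straight ahead.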

FIG. 3 is a diagram illustrating a hardware configuration example of the audio processing device. The audio processing device 100 has a processor 101, a RAM (Random Access Memory) 102, an HDD (Hard Disk Drive) 103, an image signal processing unit 104, an input signal processing unit 105, a disk drive 106, and a communication interface 107. These units are connected to a bus 108 within the audio processing device 100.

The processor 101 is a processor that includes an arithmetic unit for executing program instructions. The processor 101 loads at least part of the programs and data stored in the HDD 103 into the RAM 102 and executes the programs. The processor 101 may have a plurality of processor cores, and the audio processing device 100 may have a plurality of processors. The audio processing device 100 may perform parallel processing using a plurality of processors or processor cores. A set of two or more processors, a dedicated circuit such as an FPGA or an ASIC, a set of two or more dedicated circuits, or a combination of a processor and a dedicated circuit may also be called a "processor".

The RAM 102 is a volatile memory that temporarily stores the programs executed by the processor 101 and the data referenced by those programs. The audio processing device 100 may have a type of memory other than RAM and may have a plurality of volatile memories.

The HDD 103 is a nonvolatile storage device that stores the programs and data of software such as an OS (Operating System), firmware, and application software. The audio processing device 100 may have another type of storage device, such as a flash memory, and may have a plurality of nonvolatile storage devices.

The image signal processing unit 104 outputs images to a display 13 connected to the audio processing device 100 in accordance with instructions from the processor 101. A CRT (Cathode Ray Tube) display, a liquid crystal display, or the like can be used as the display 13.

The input signal processing unit 105 acquires input signals from an input device 14 connected to the audio processing device 100 and notifies the processor 101 of them. A pointing device such as a mouse or a touch panel, a keyboard, or the like can be used as the input device 14.

The disk drive 106 is a drive device that reads programs and data recorded on a recording medium 15. Examples of the recording medium 15 include magnetic disks such as a flexible disk (FD) or an HDD, optical discs such as a CD (Compact Disc) or a DVD (Digital Versatile Disc), and a magneto-optical disk (MO). In accordance with instructions from the processor 101, the disk drive 106 stores the programs and data read from the recording medium 15 in the RAM 102 or the HDD 103.

The communication interface 107 transmits and receives data to and from other devices (for example, the user terminal 200) through the network 10. The communication interface 107 performs demodulation and decoding of received signals, encoding and modulation of transmission signals, and the like.

The audio processing device 100 need not include the disk drive 106, and if it is accessed exclusively from other information processing devices, it need not include the image signal processing unit 104 or the input signal processing unit 105. The display 13 and the input device 14 may be formed integrally with the housing of the audio processing device 100.

FIG. 4 is a diagram illustrating a hardware configuration example of the user terminal. The user terminal 200 has a processor 201, a RAM 202, a flash memory 203, a display 204, an input unit 205, an input interface 206, an output interface 207, and a wireless interface 208. These units are connected to a bus 209 within the user terminal 200.

Like the processor 101 described above, the processor 201 is a processor that includes an arithmetic unit for executing program instructions. Like the RAM 102 described above, the RAM 202 is a volatile memory that temporarily stores the programs executed by the processor 201 and their data.

The flash memory 203 is a nonvolatile storage device that stores programs and data such as an OS, firmware, and application software. The user terminal 200 may have another type of storage device, such as an HDD, and may have a plurality of nonvolatile storage devices.

ディスプレイ２０４は、プロセッサ２０１からの命令に従って画像を表示する。ディスプレイ２０４としては、液晶ディスプレイ（ＬＣＤ：Liquid Crystal Display）や有機ＥＬ（Electro Luminescence）ディスプレイなどを用いることができる。 The display 204 displays an image according to a command from the processor 201. As the display 204, a liquid crystal display (LCD), an organic EL (Electro Luminescence) display, or the like can be used.

入力部２０５は、ユーザの入力操作を検出して入力信号としてプロセッサ２０１に通知する。入力操作には、タッチパネルをタッチペンなどのポインティングデバイスまたはユーザの指などで操作するタッチ操作や、複数の入力キーを押下する操作などがあり、いずれの操作を採用してもよい。 The input unit 205 detects a user input operation and notifies the processor 201 as an input signal. The input operation includes a touch operation in which the touch panel is operated with a pointing device such as a touch pen or a user's finger, an operation in which a plurality of input keys are pressed, and any of these operations may be adopted.

入力インタフェース２０６は、センサ１１と接続している。センサ１１は、ヘッドホン１２に固定されており、ユーザの視線方向を示す信号をデータに変換して入力インタフェース２０６へ出力するセンサデバイスである。入力インタフェース２０６との通信手段は、有線を用いてもよいし、無線を用いてもよい。入力インタフェース２０６は、センサ１１から取得した信号をプロセッサ２０１に通知する。 The input interface 206 is connected to the sensor 11. The sensor 11 is a sensor device that is fixed to the headphones 12 and converts a signal indicating the user's line-of-sight direction into data and outputs the data to the input interface 206. The communication means with the input interface 206 may be wired or wireless. The input interface 206 notifies the processor 201 of the signal acquired from the sensor 11.

出力インタフェース２０７は、ヘッドホン１２に接続されており、同期フレームまたはデータフレームを含む所定チャネルの音声信号をアナログ音声信号に変換し、ヘッドホン１２に出力する。 The output interface 207 is connected to the headphones 12, converts an audio signal of a predetermined channel including a synchronization frame or a data frame into an analog audio signal, and outputs the analog audio signal to the headphones 12.

無線インタフェース２０８は、アクセスポイント２１ａ〜２１ｄとの間で無線通信する。無線インタフェース２０８は、受信信号の復調・復号や送信信号の符号化・変調などを行う機能を有する。 The wireless interface 208 performs wireless communication with the access points 21a to 21d. The wireless interface 208 has a function of performing demodulation / decoding of a reception signal, encoding / modulation of a transmission signal, and the like.

図５は、音声処理装置の機能例を示すブロック図である。音声処理装置１００は、配置管理情報記憶部１１０、音源情報記憶部１２０、ＨＲＴＦ情報記憶部１３０、ユーザ情報取得部１４０、仮想スピーカ配置部１５０、音源分配部１６０および音声生成部１７０を有する。 FIG. 5 is a block diagram illustrating an example of functions of the voice processing apparatus. The audio processing apparatus 100 includes an arrangement management information storage unit 110, a sound source information storage unit 120, an HRTF information storage unit 130, a user information acquisition unit 140, a virtual speaker arrangement unit 150, a sound source distribution unit 160, and an audio generation unit 170.

配置管理情報記憶部１１０は、ユーザの状態、音源に関する情報、配置する仮想スピーカの位置に関する情報、仮想スピーカを配置する配置領域に関する情報など、仮想スピーカの配置を管理するための情報を一時的に記憶する。仮想スピーカとは、ユーザを中心とした所定の円周上に仮想的に配置される仮想音源である。以下、ユーザを中心とした所定の円周上を“所定の円周”と記載する場合がある。 The arrangement management information storage unit 110 temporarily stores information for managing the arrangement of the virtual speakers, such as the user's state, information on the sound sources, information on the positions of the virtual speakers to be arranged, and information on the arrangement regions in which virtual speakers are arranged. A virtual speaker is a virtual sound source that is virtually arranged on a predetermined circumference centered on the user. Hereinafter, this circumference centered on the user may be referred to simply as the “predetermined circumference”.

音源情報記憶部１２０は、あらかじめユーザの周辺に仮想的に配置された各音源に関する情報が登録される。音源に関する情報は、各音源を識別する音源ＩＤ（Identity）、各音源の位置情報および各音源から出力される音声信号を含む。各音源に関する情報は、例えば、管理者などによる音声処理装置１００への入力操作により任意に設定することができる。 In the sound source information storage unit 120, information on each sound source virtually arranged in the vicinity of the user is registered in advance. The information on the sound source includes a sound source ID (Identity) for identifying each sound source, position information on each sound source, and an audio signal output from each sound source. Information about each sound source can be arbitrarily set by an input operation to the sound processing apparatus 100 by an administrator or the like, for example.

なお、上記の「仮想スピーカ」および「音源」のいずれも、ユーザの周囲に仮想的に配置される仮想音源である。以下の説明では、音源情報記憶部１２０に位置の情報があらかじめ設定される仮想音源を単に「音源」と呼び、仮想スピーカ配置部１５０によって動的に配置される仮想音源を「仮想スピーカ」と呼ぶ。 Note that both the “virtual speakers” and the “sound sources” described above are virtual sound sources that are virtually arranged around the user. In the following description, a virtual sound source whose position information is set in advance in the sound source information storage unit 120 is simply called a “sound source”, and a virtual sound source dynamically arranged by the virtual speaker arrangement unit 150 is called a “virtual speaker”.

ＨＲＴＦ情報記憶部１３０は、仮想スピーカの位置に対応する左右のＨＲＴＦの一覧情報を記憶する。ＨＲＴＦとは、任意の位置を持つ音源から出る音波と耳の鼓膜に到達する音波間の伝達関数を意味し、聴取者から見た音源の方位や高度によってその値が異なる。音声処理装置１００は、音声信号を特定方向のＨＲＴＦを用いて畳み込み演算することで、音像をその特定方向に定位させることができる。 The HRTF information storage unit 130 stores left and right HRTF list information corresponding to the position of the virtual speaker. HRTF means a transfer function between a sound wave emitted from a sound source having an arbitrary position and a sound wave reaching the eardrum of the ear, and its value varies depending on the direction and altitude of the sound source as viewed from the listener. The sound processing apparatus 100 can localize the sound image in the specific direction by performing a convolution operation on the sound signal using the HRTF in the specific direction.

ユーザ情報取得部１４０は、ユーザ端末２００からユーザの視線方向を示す情報を随時取得する。また、ユーザ情報取得部１４０は、ユーザ端末２００から送信された信号をアクセスポイント２１ａ〜２１ｄを通じて受信し、これらの受信信号を基にユーザ端末２００の位置を検出する。ユーザ情報取得部１４０は、取得したユーザの視線方向および検出されたユーザ端末２００の位置情報などのユーザの状態を示す情報を一時的に配置管理情報記憶部１１０に記憶する。 The user information acquisition unit 140 acquires information indicating the user's line-of-sight direction from the user terminal 200 as needed. Moreover, the user information acquisition part 140 receives the signal transmitted from the user terminal 200 through the access points 21a-21d, and detects the position of the user terminal 200 based on these received signals. The user information acquisition unit 140 temporarily stores, in the arrangement management information storage unit 110, information indicating the user state such as the acquired user's line-of-sight direction and the detected position information of the user terminal 200.

仮想スピーカ配置部１５０は、配置管理情報記憶部１１０に記憶されたユーザの状態を示す情報および音源情報記憶部１２０に記憶された音源の位置情報に基づいて、一定数の仮想スピーカをユーザの周囲の所定の円周上に配置する。そして、仮想スピーカ配置部１５０は、配置した仮想スピーカの情報を配置管理情報記憶部１１０に記憶する。 Based on the information indicating the user's state stored in the arrangement management information storage unit 110 and the sound source position information stored in the sound source information storage unit 120, the virtual speaker arrangement unit 150 arranges a fixed number of virtual speakers on the predetermined circumference around the user. The virtual speaker arrangement unit 150 then stores information on the arranged virtual speakers in the arrangement management information storage unit 110.

音源分配部１６０は、配置管理情報記憶部１１０に記憶されたユーザの状態を示す情報および仮想スピーカの位置に関する情報に基づいて、音源情報記憶部１２０に記憶されている各音源に対応する音声信号を、仮想スピーカ配置部１５０により配置された各仮想スピーカに分配する。 Based on the information indicating the user's state and the information on the virtual speaker positions stored in the arrangement management information storage unit 110, the sound source distribution unit 160 distributes the audio signal corresponding to each sound source stored in the sound source information storage unit 120 to the virtual speakers arranged by the virtual speaker arrangement unit 150.

音声生成部１７０は、仮想スピーカ配置部１５０により配置された仮想スピーカの位置に対応する左右のＨＲＴＦを、ＨＲＴＦ情報記憶部１３０に記憶されたＨＲＴＦの一覧情報を基に取得する。そして、音声生成部１７０は、音源分配部１６０により分配された音声信号と取得された左右のＨＲＴＦとに基づいて、左右のチャネルの音声信号を生成する。また、音声生成部１７０は、生成した左右のチャネルの音声信号をユーザ端末２００へ送信する。 The sound generation unit 170 acquires the left and right HRTFs corresponding to the positions of the virtual speakers arranged by the virtual speaker arrangement unit 150 based on the HRTF list information stored in the HRTF information storage unit 130. Then, the sound generation unit 170 generates left and right channel sound signals based on the sound signals distributed by the sound source distribution unit 160 and the acquired left and right HRTFs. In addition, the sound generation unit 170 transmits the generated sound signals of the left and right channels to the user terminal 200.

一方、ユーザ端末２００は、ユーザ情報提供部２１０および音声出力部２２０を有する。 On the other hand, the user terminal 200 includes a user information providing unit 210 and an audio output unit 220.
ユーザ情報提供部２１０は、センサ１１による検出結果を基にユーザの視線方向を算出し、視線方向を示す情報を音声処理装置１００へ送信する。ユーザの視線方向を示す情報は、センサ１１から検出結果が出力される毎に随時送信される。音声出力部２２０は、音声処理装置１００から受信した音声信号をアナログ化し、アナログ音声信号を増幅してヘッドホン１２へ送信する。 The user information providing unit 210 calculates the user's line-of-sight direction based on the detection result of the sensor 11, and transmits information indicating the line-of-sight direction to the audio processing apparatus 100. The information indicating the user's line-of-sight direction is transmitted each time a detection result is output from the sensor 11. The audio output unit 220 converts the audio signal received from the audio processing apparatus 100 into an analog signal, amplifies the analog audio signal, and transmits it to the headphones 12.

次に、音源および仮想スピーカが配置された状態で、音源から出力される音声信号から左右のチャネルの音声信号を生成する方法について説明する。なお、これ以降の説明では、鉛直方向（ｚ軸方向）の座標については無視し、ユーザの視線方向は水平方向（ｘ軸方向およびｙ軸方向）に平行であるとともに、音源および仮想スピーカの高さ（ｚ軸方向の座標）はユーザの頭部（正確には耳）の高さと同じであると考える。 Next, a method of generating left and right channel audio signals from the audio signals output by the sound sources, with the sound sources and virtual speakers in place, will be described. In the following description, coordinates in the vertical direction (z-axis direction) are ignored: the user's line-of-sight direction is assumed to be parallel to the horizontal plane (the x-axis and y-axis directions), and the heights (z-axis coordinates) of the sound sources and virtual speakers are assumed to equal the height of the user's head (more precisely, the ears).

図６は、左右のチャネルの音声信号の生成例について示す図である。ユーザ３０は、本実施の形態の音声処理システムを用いて音声を聞く者である。音源３１は、あらかじめユーザ３０の周りに配置されているｍ個（ｍは、１以上の整数）の音源に含まれる音源の１つである。仮想スピーカ３２，３３は、仮想スピーカ配置部１５０が配置したｎ個（ｎは、１以上かつｍより小さい整数）の仮想スピーカに含まれる仮想スピーカである。ユーザ３０を中心とした円周３４上において、仮想スピーカ３２（Ｖ０）はユーザ３０の視線方向３５から右回転方向にθｖ０°移動させた位置に配置され、仮想スピーカ３３（Ｖ１）はユーザ３０の視線方向３５から右回転方向にθｖ１°移動させた位置に配置されている。 FIG. 6 is a diagram illustrating an example of generating the left and right channel audio signals. The user 30 is a person who listens to audio using the audio processing system of the present embodiment. The sound source 31 is one of m sound sources (m is an integer of 1 or more) arranged around the user 30 in advance. The virtual speakers 32 and 33 are among n virtual speakers (n is an integer of 1 or more and smaller than m) arranged by the virtual speaker arrangement unit 150. On the circumference 34 centered on the user 30, the virtual speaker 32 (V0) is arranged at a position rotated clockwise by θv0° from the line-of-sight direction 35 of the user 30, and the virtual speaker 33 (V1) is arranged at a position rotated clockwise by θv1° from the line-of-sight direction 35.

音源分配部１６０は、音源それぞれの音声信号を、ユーザ３０から見て円周３４に沿った方向に対して各音源に近接する１または２の仮想スピーカに分配する。例えば、音源３１から出力される音声信号は、音源分配部１６０により、仮想スピーカ配置部１５０により配置された複数の仮想スピーカのうちユーザ３０から見て円周３４に沿った方向に対して音源３１に近接する仮想スピーカ３２，３３に分配される。 The sound source distribution unit 160 distributes the audio signal of each sound source to the one or two virtual speakers closest to that sound source in the direction along the circumference 34 as viewed from the user 30. For example, the sound source distribution unit 160 distributes the audio signal output from the sound source 31 to the virtual speakers 32 and 33, which, among the plurality of virtual speakers arranged by the virtual speaker arrangement unit 150, are closest to the sound source 31 in the direction along the circumference 34 as viewed from the user 30.

なお、音源の音声信号を２つの仮想スピーカに分配する場合には、ユーザと音源と２つの仮想スピーカとの位置関係に応じて重み付けされた音声信号が、各仮想スピーカに分配される。例えば、図６の例において、ユーザ３０と音源３１とを結ぶ直線と、ユーザ３０と仮想スピーカ３２（Ｖ０）とを結ぶ直線とがなす角をθｐとし、ユーザ３０と音源３１とを結ぶ直線と、ユーザ３０と仮想スピーカ３３（Ｖ１）とを結ぶ直線とがなす角をθｑとする。この場合、音源分配部１６０は、音源３１の音声信号に、重み係数｛θｑ／（θｐ＋θｑ）｝と、ユーザ３０と音源３１との距離に応じた重み係数とを乗じた音声信号を、仮想スピーカ３２（Ｖ０）に分配する。また、音源分配部１６０は、音源３１の音声信号に、重み係数｛θｐ／（θｐ＋θｑ）｝と、ユーザ３０と音源３１との距離に応じた重み係数とを乗じた音声信号を、仮想スピーカ３３（Ｖ１）に分配する。ここで、ユーザ３０と音源３１との距離に応じた重み係数は、距離が遠くなるほど小さく設定される。 When the audio signal of a sound source is distributed to two virtual speakers, each virtual speaker receives the audio signal weighted according to the positional relationship among the user, the sound source, and the two virtual speakers. For example, in FIG. 6, let θp be the angle between the straight line connecting the user 30 and the sound source 31 and the straight line connecting the user 30 and the virtual speaker 32 (V0), and let θq be the angle between the straight line connecting the user 30 and the sound source 31 and the straight line connecting the user 30 and the virtual speaker 33 (V1). In this case, the sound source distribution unit 160 distributes to the virtual speaker 32 (V0) the audio signal of the sound source 31 multiplied by the weighting coefficient {θq / (θp + θq)} and by a weighting coefficient that depends on the distance between the user 30 and the sound source 31. Likewise, the sound source distribution unit 160 distributes to the virtual speaker 33 (V1) the audio signal of the sound source 31 multiplied by the weighting coefficient {θp / (θp + θq)} and by the weighting coefficient that depends on the distance between the user 30 and the sound source 31. The distance-dependent weighting coefficient is set smaller as the distance increases, and the angular weights give the larger share to the virtual speaker that is closer in angle to the sound source.
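
The weighting described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name `distribute_gains`, the parameter `ref_distance`, and the inverse-distance form of the distance weight are hypothetical. The description states only that the distance weight becomes smaller as the distance increases.

```python
def distribute_gains(theta_p, theta_q, distance, ref_distance=1.0):
    """Return (gain_v0, gain_v1) for a sound source between speakers V0 and V1.

    theta_p: angle between the source and V0, in degrees
    theta_q: angle between the source and V1, in degrees
    distance: distance from the user to the source (same unit as ref_distance)
    """
    # Assumed distance weight: 1.0 up to ref_distance, then inverse-distance.
    distance_weight = ref_distance / max(distance, ref_distance)

    total = theta_p + theta_q
    if total == 0:
        # Both speakers lie in the source direction; split evenly (assumption).
        return 0.5 * distance_weight, 0.5 * distance_weight

    # V0 is weighted by theta_q / (theta_p + theta_q), V1 by theta_p / (...),
    # so the speaker that is closer in angle receives the larger share.
    gain_v0 = theta_q / total * distance_weight
    gain_v1 = theta_p / total * distance_weight
    return gain_v0, gain_v1


# A source 10 degrees from V0 and 30 degrees from V1, at twice ref_distance.
gain_v0, gain_v1 = distribute_gains(10.0, 30.0, 2.0)
```

With these inputs V0 receives three times the gain of V1, and both gains are halved by the assumed distance weight.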

図５に示した音声生成部１７０は、図６に示すように、右チャネルの音声信号を出力するための音声生成部１７１と、左チャネルの音声信号を出力するための音声生成部１７２とを有する。 As shown in FIG. 6, the audio generation unit 170 shown in FIG. 5 includes an audio generation unit 171 for outputting the right channel audio signal and an audio generation unit 172 for outputting the left channel audio signal.

音声生成部（右）１７１は、例えば、仮想スピーカ３２（Ｖ０）に分配された音声信号と、仮想スピーカ３２が配置された位置を示すθｖ０°に対応する右チャネルのＨＲＴＦとを畳み込み演算する。同様に、音声生成部（右）１７１は、仮想スピーカ３３（Ｖ１）に分配された音声信号と、仮想スピーカ３３（Ｖ１）が配置された位置を示すθｖ１°に対応する右チャネルのＨＲＴＦとを畳み込み演算する。そして、上記のように各音源について畳み込み演算された信号を合成することで右チャネルの音声信号を生成する。 For example, the sound generation unit (right) 171 performs a convolution operation on the sound signal distributed to the virtual speaker 32 (V0) and the HRTF of the right channel corresponding to θv0 ° indicating the position where the virtual speaker 32 is disposed. Similarly, the sound generation unit (right) 171 outputs the sound signal distributed to the virtual speaker 33 (V1) and the HRTF of the right channel corresponding to θv1 ° indicating the position where the virtual speaker 33 (V1) is disposed. Convolution operation. Then, a right channel audio signal is generated by synthesizing the signals subjected to the convolution operation for each sound source as described above.

同様に、音声生成部（左）１７２は、ｎ個の各仮想スピーカに分配された音声信号と各仮想スピーカの位置に対応する左チャネルのＨＲＴＦとを用いて合成することで左チャネルの音声信号を生成する。 Similarly, the audio generation unit (left) 172 generates the left channel audio signal by combining the audio signals distributed to the n virtual speakers with the left channel HRTFs corresponding to the positions of the respective virtual speakers.

このように、複数の音源それぞれからの音声信号をこれらの音源の数より少ない複数の仮想スピーカに分配後に、左右のチャネルの音声信号を生成することで、畳み込み演算の処理の回数を音源の数より少なくし、音声生成部１７０の処理負荷を軽減することができる。よって、左右チャネルの音声信号を迅速に生成でき、音声出力のリアルタイム性を維持できる。 In this way, the audio signals from the plurality of sound sources are first distributed to virtual speakers that are fewer in number than the sound sources, and the left and right channel audio signals are then generated. As a result, the number of convolution operations becomes smaller than the number of sound sources, and the processing load on the audio generation unit 170 is reduced. The left and right channel audio signals can therefore be generated quickly, and the real-time performance of the audio output can be maintained.
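
The per-channel synthesis can be sketched as one convolution per virtual speaker followed by a sum. This is an illustrative sketch only: the names `convolve` and `render_channel` are hypothetical, the HRTF is assumed to be applied as a time-domain impulse response, and the numeric values are toy data rather than measured HRTFs.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_channel(speaker_signals, impulse_responses):
    """Mix one ear's output: sum over virtual speakers of signal * response.

    speaker_signals[k] is the signal already distributed to virtual speaker k;
    impulse_responses[k] is that speaker's response for this ear.
    """
    length = max(len(s) + len(h) - 1
                 for s, h in zip(speaker_signals, impulse_responses))
    mix = [0.0] * length
    for sig, ir in zip(speaker_signals, impulse_responses):
        for k, value in enumerate(convolve(sig, ir)):
            mix[k] += value
    return mix


# Two virtual speakers with toy signals and toy right-ear responses.
right = render_channel([[1.0, 0.0], [0.0, 1.0]], [[0.5], [0.25, 0.25]])
```

Because the source signals are mixed into the n virtual speaker signals before this step, the number of convolutions per channel is n regardless of how many sound sources feed the speakers.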

しかしながら、上記のように音源の音声信号を音源より少ない数の仮想スピーカに分配する方法においては、音源と仮想スピーカの位置が異なるため、音源と仮想スピーカそれぞれに対応するＨＲＴＦも異なる。そのため、上記のように仮想スピーカに分配された音声信号を用いて生成される音声信号は、各音源の音像の定位感が減少する場合がある。例えば、図６の音源３１の音声信号は、音源３１の位置に対応するＨＲＴＦを用いて処理されずに、音源３１とは異なる位置にある仮想スピーカ３２（Ｖ０）および仮想スピーカ３３（Ｖ１）の各位置に対応するＨＲＴＦを用いて処理される。この場合、本来使用されるべきＨＲＴＦとは異なるＨＲＴＦを用いた演算が行われることから、ユーザ３０は、音源３１の音像を正しい方向に認識できない可能性がある。 However, in the method of distributing the sound signal of the sound source to a smaller number of virtual speakers as described above, since the positions of the sound source and the virtual speakers are different, the HRTFs corresponding to the sound sources and the virtual speakers are also different. For this reason, the sound signal generated using the sound signal distributed to the virtual speaker as described above may reduce the sense of localization of the sound image of each sound source. For example, the audio signal of the sound source 31 in FIG. 6 is not processed using the HRTF corresponding to the position of the sound source 31, and the virtual speakers 32 (V0) and 33 (V1) at positions different from the sound source 31 are processed. Processing is performed using the HRTF corresponding to each position. In this case, since calculation using an HRTF different from the HRTF that should be used is performed, the user 30 may not be able to recognize the sound image of the sound source 31 in the correct direction.

そこで、図７〜図１３では、ユーザの周辺に配置されている各音源の音像の定位感が向上するように仮想スピーカを配置する方法について説明する。 Therefore, with reference to FIGS. 7 to 13, a method of arranging the virtual speakers so as to improve the sense of localization of the sound image of each sound source arranged around the user will be described.
図７は、音源と仮想スピーカとの位置関係の例を示す図である。ユーザ３０の周辺には、音源４１ａ，４１ｂを含む複数の音源が配置されている。音源４１ａは、仮想スピーカ４２ａおよび仮想スピーカ４２ｂと、円周３４に沿った方向に隣り合っており、音源４１ｂは、仮想スピーカ４２ｂおよび仮想スピーカ４２ｃと、円周３４に沿った方向に隣り合っている。 FIG. 7 is a diagram illustrating an example of the positional relationship between sound sources and virtual speakers. A plurality of sound sources including sound sources 41a and 41b are arranged around the user 30. The sound source 41a is adjacent to the virtual speakers 42a and 42b in the direction along the circumference 34, and the sound source 41b is adjacent to the virtual speakers 42b and 42c in the direction along the circumference 34.

角度θａ１は、ユーザ３０から仮想スピーカ４２ａへの方向とユーザ３０から音源４１ａへの方向との間の角度であり、角度θａ２は、ユーザ３０から仮想スピーカ４２ｂへの方向とユーザ３０から音源４１ａへの方向との間の角度である。同様に、角度θｂ１は、ユーザ３０から仮想スピーカ４２ｂへの方向とユーザ３０から音源４１ｂへの方向との間の角度であり、角度θｂ２は、ユーザ３０から仮想スピーカ４２ｃへの方向とユーザ３０から音源４１ｂへの方向との間の角度である。 The angle θa1 is the angle between the direction from the user 30 to the virtual speaker 42a and the direction from the user 30 to the sound source 41a, and the angle θa2 is the angle between the direction from the user 30 to the virtual speaker 42b and the direction from the user 30 to the sound source 41a. Similarly, the angle θb1 is the angle between the direction from the user 30 to the virtual speaker 42b and the direction from the user 30 to the sound source 41b, and the angle θb2 is the angle between the direction from the user 30 to the virtual speaker 42c and the direction from the user 30 to the sound source 41b.

ここで、仮想スピーカを用いて音声信号を生成する際、ユーザ３０から見た仮想スピーカの方向と音源の方向との間の角度が小さいほど、音源の音像の定位感に近い音声信号を生成できる。そのため、仮想スピーカ４２ａ，４２ｂ，４２ｃを用いて音声信号を生成するとき、角度θａ１，θａ２，θｂ１，θｂ２の最大値が最小になるように各仮想スピーカを配置した場合に最も音源４１ａ，４１ｂの音像の定位感が向上する。 Here, when an audio signal is generated using a virtual speaker, an audio signal closer to the localization of the sound image of the sound source can be generated as the angle between the direction of the virtual speaker viewed from the user 30 and the direction of the sound source is smaller. . Therefore, when generating audio signals using the virtual speakers 42a, 42b, and 42c, when each virtual speaker is arranged so that the maximum values of the angles θa1, θa2, θb1, and θb2 are minimized, The localization of the sound image is improved.
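
The criterion stated above, minimizing the largest angle between a sound source and its adjacent virtual speakers, can be evaluated for a candidate layout with a small helper. This is a sketch under assumptions: the name `worst_case_angle` is hypothetical, directions are given in degrees measured clockwise from the line-of-sight direction, and the example angles are invented for illustration.

```python
def worst_case_angle(source_dirs, speaker_dirs):
    """Largest angle between any source and either of its neighboring speakers.

    For each source, the neighbors are the nearest speaker reached by going
    clockwise and the nearest one reached by going counterclockwise along the
    circumference (the roles of θa1/θa2 and θb1/θb2 in FIG. 7).
    """
    worst = 0.0
    for s in source_dirs:
        clockwise = min((v - s) % 360.0 for v in speaker_dirs)
        counterclockwise = min((s - v) % 360.0 for v in speaker_dirs)
        worst = max(worst, clockwise, counterclockwise)
    return worst


# Two sources and two candidate layouts of three virtual speakers.
sources = [30.0, 100.0]
layout_a = worst_case_angle(sources, [0.0, 60.0, 120.0])
layout_b = worst_case_angle(sources, [0.0, 65.0, 130.0])
```

Under this measure the layout with the smaller return value would be preferred; a speaker placed exactly in a source's direction contributes an angle of zero for that source.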

このように、仮想スピーカ配置部１５０は、隣り合う２つの仮想スピーカそれぞれと、ユーザ３０から見てこれら２つの仮想スピーカの間に位置する音源との円周３４に沿った方向の距離が小さくなるように、各仮想スピーカを配置する。これにより、本来使用されるべきＨＲＴＦと実施の演算で使用されるＨＲＴＦとの誤差が減少する。したがって、仮想スピーカから出力される音声信号を用いて生成した左右のチャネルの音声信号が生成された際、各音源の音像の定位感を向上できる。 In this way, the virtual speaker arrangement unit 150 arranges the virtual speakers so that the distance along the circumference 34 between each of two adjacent virtual speakers and a sound source located between them, as viewed from the user 30, becomes small. This reduces the error between the HRTF that should ideally be used and the HRTF actually used in the computation. Consequently, the sense of localization of the sound image of each sound source is improved in the left and right channel audio signals generated from the audio signals output by the virtual speakers.

なお、音源とユーザとの距離は、音源分配部１６０などで音源の音量により補間するため、音像の定位感に影響しない。 Note that the distance between a sound source and the user does not affect the sense of localization of the sound image, because it is accounted for through the volume of the sound source in the sound source distribution unit 160 or the like.
次に、まず、図８で、仮想スピーカの数が音源の数以上である場合の仮想スピーカの配置方法について説明する。 Next, the method of arranging the virtual speakers when the number of virtual speakers is equal to or greater than the number of sound sources will first be described with reference to FIG. 8.

図８は、仮想スピーカの数が仮想音源の数以上である場合の仮想スピーカの配置方法の例を示す図である。ユーザ３０の周辺に音源４３ａ，４３ｂが配置されている。このとき、円周３４上に、仮想スピーカ５１，５２，５３，５４，５５，５６，５７，５８を配置する場合、まず、仮想スピーカ配置部１５０は、仮想スピーカ５１，５２，５３，５４，５５，５６，５７，５８を円周３４上に均等になるよう配置する。次に、仮想スピーカ配置部１５０は、ユーザ３０から見て音源との円周３４に沿った方向の距離が最も近い仮想スピーカをユーザ３０から当該音源への方向になるように円周３４上に沿って移動させる。 FIG. 8 is a diagram illustrating an example of a method of arranging virtual speakers when the number of virtual speakers is equal to or greater than the number of virtual sound sources. Sound sources 43a and 43b are arranged around the user 30. To arrange the virtual speakers 51, 52, 53, 54, 55, 56, 57, and 58 on the circumference 34 in this situation, the virtual speaker arrangement unit 150 first arranges the virtual speakers 51 to 58 at equal intervals on the circumference 34. Next, for each sound source, the virtual speaker arrangement unit 150 moves the virtual speaker that is closest to that sound source in the direction along the circumference 34, as viewed from the user 30, along the circumference 34 so that it lies in the direction from the user 30 to that sound source.

例えば、仮想スピーカ配置部１５０は、音源４３ａについて円周３４に沿った方向の距離が最も近い仮想スピーカ５２を、円周３４上において、ユーザ３０から音源４３ａへの方向になるように配置する。また、仮想スピーカ配置部１５０は、音源４３ｂについて円周３４に沿った方向の距離が最も近い仮想スピーカ５４を、円周３４上において、ユーザ３０から音源４３ｂへの方向になるように配置する。 For example, the virtual speaker arrangement unit 150 arranges the virtual speaker 52 having the shortest distance in the direction along the circumference 34 with respect to the sound source 43a on the circumference 34 so as to be in the direction from the user 30 to the sound source 43a. Further, the virtual speaker arrangement unit 150 arranges the virtual speaker 54 having the shortest distance in the direction along the circumference 34 with respect to the sound source 43b on the circumference 34 so as to be in the direction from the user 30 to the sound source 43b.

これにより、仮想スピーカ５２のＨＲＴＦは音源４３ａと同じＨＲＴＦが適用され、仮想スピーカ５４のＨＲＴＦは音源４３ｂと同じＨＲＴＦが適用される。したがって、音声生成部１７０は、音源４３ａ，４３ｂの各音像がそれぞれの方向に正確に定位した音声信号を生成できる。 As a result, the virtual speaker 52 uses the same HRTF as the sound source 43a, and the virtual speaker 54 uses the same HRTF as the sound source 43b. The audio generation unit 170 can therefore generate audio signals in which the sound images of the sound sources 43a and 43b are accurately localized in their respective directions.

このように、仮想スピーカの数が音源の数以上の場合は、円周３４上において、ユーザ３０から各音源の方向に仮想スピーカを配置することで、各音源と同じＨＲＴＦを適用できるため、各音源の音像を各音源の方向に正確に定位させることができる。 Thus, when the number of virtual speakers is equal to or greater than the number of sound sources, the same HRTF as each sound source can be applied by arranging virtual speakers from the user 30 in the direction of each sound source on the circumference 34. The sound image of the sound source can be accurately localized in the direction of each sound source.
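
The procedure of FIG. 8 (even spacing first, then moving the nearest virtual speaker onto each sound source direction) can be sketched as follows. The function names are hypothetical, directions are in degrees on the circumference, and one detail is an assumption not stated in the description: a speaker that has already been moved onto a source is not reused for another source.

```python
def angular_gap(a, b):
    """Smallest absolute angle between two directions, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def place_with_enough_speakers(source_dirs, n_speakers):
    """Place n_speakers (>= number of sources) on the circumference."""
    # Step 1: distribute the virtual speakers at equal intervals.
    speakers = [i * 360.0 / n_speakers for i in range(n_speakers)]
    free = set(range(n_speakers))

    # Step 2: move the nearest still-unmoved speaker onto each source direction.
    for s in source_dirs:
        nearest = min(sorted(free), key=lambda i: angular_gap(speakers[i], s))
        speakers[nearest] = s % 360.0
        free.discard(nearest)
    return speakers


# Two sources and eight virtual speakers (angles are invented for illustration).
layout = place_with_enough_speakers([50.0, 170.0], 8)
```

In this example the speaker initially at 45° moves to 50° and the one at 180° moves to 170°, so each source is rendered with the HRTF of its own direction.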

なお、移動させた仮想スピーカ５２，５４以外の仮想スピーカについては、音源分配部１６０で音源４１，４２から出力される音声信号を分配しないようにしてもよい。また、仮想スピーカ配置部１５０により、移動させた仮想スピーカ５２，５４以外の仮想スピーカを配置しないようにしてもよい。 For the virtual speakers other than the moved virtual speakers 52 and 54, the sound signal output from the sound sources 41 and 42 may not be distributed by the sound source distribution unit 160. Further, the virtual speaker placement unit 150 may not place virtual speakers other than the moved virtual speakers 52 and 54.

次に、図９〜図１３で、仮想スピーカの数が音源の数未満である場合の仮想スピーカの配置方法について説明する。 Next, with reference to FIGS. 9 to 13, a method of arranging virtual speakers when the number of virtual speakers is less than the number of sound sources will be described.
図９は、仮想スピーカの数が仮想音源の数未満である場合の仮想スピーカの配置方法の例を示す第１の図である。ユーザ３０の周辺には、音源６１（音源ＩＤ＝＃１）〜音源７６（音源ＩＤ＝＃１６）の１６個の音源が配置されている（以下の説明において、“音源ＩＤ＝”の記載は省略する）。この場合において、８個の仮想スピーカを配置する場合について説明する。以下、図１０〜図１３についても同様とする。 FIG. 9 is a first diagram illustrating an example of a method of arranging virtual speakers when the number of virtual speakers is less than the number of virtual sound sources. Sixteen sound sources, from the sound source 61 (sound source ID = #1) to the sound source 76 (sound source ID = #16), are arranged around the user 30 (in the following description, “sound source ID =” is omitted). The case of arranging eight virtual speakers in this situation will be described. The same applies to FIGS. 10 to 13.

まず、仮想スピーカ配置部１５０は、ユーザ３０から見て円周３４に沿った方向に隣り合う２つの音源間の、ユーザ３０を中心として円周に沿った方向の角度を算出する。 First, the virtual speaker arrangement unit 150 calculates, for each pair of sound sources that are adjacent in the direction along the circumference 34 as viewed from the user 30, the angle between them in the direction along the circumference, centered on the user 30.
以下、例えば、音源６１（＃１）と音源６２（＃２）間のユーザ３０を中心として円周３４に沿った方向の角度をＳθ１−２のように表すものとし、音源６２（＃２）と音源６３（＃３）間の当該角度をＳθ２−３のように表すものとする。他の２つの音源間のユーザ３０を中心として円周に沿った方向の角度についても同様に表すものとする。 Hereinafter, for example, the angle in the direction along the circumference 34, centered on the user 30, between the sound source 61 (#1) and the sound source 62 (#2) is denoted Sθ1-2, and the corresponding angle between the sound source 62 (#2) and the sound source 63 (#3) is denoted Sθ2-3. The angles between other pairs of sound sources are denoted in the same manner.

図１０は、仮想スピーカの数が仮想音源の数未満である場合の仮想スピーカの配置方法の例を示す第２の図である。次に、仮想スピーカ配置部１５０は、円周３４に沿った方向の範囲のうち、図９で算出した各音源間の角度がしきい値以上である音源のペアを選択し、選択したペアに含まれる各音源を「両端音源」と特定する。この「両端音源」とは、ユーザ３０から見て後述する「配置領域」の候補となる領域の両端の方向に位置する音源であることを意味する。 FIG. 10 is a second diagram illustrating an example of a placement method of virtual speakers when the number of virtual speakers is less than the number of virtual sound sources. Next, the virtual speaker placement unit 150 selects a pair of sound sources in which the angle between the sound sources calculated in FIG. 9 is equal to or greater than a threshold value in the range of directions along the circumference 34, and sets the selected pair. Each included sound source is identified as a “both-end sound source”. This “both-end sound source” means that the sound source is located in the direction of both ends of an area that is a candidate for an “arrangement area” to be described later when viewed from the user 30.

しきい値は、例えば、３６０°を仮想スピーカの数“８”で除算した値（すなわち４５°）に設定される。例えば、角度Ｓθ４−５および角度Ｓθ８−９のみが４５°以上であったとすると、円周３４に沿った方向の範囲のうち、音源６４（＃４）、音源６５（＃５）、音源６８（＃８）および音源６９（＃９）が仮想スピーカ配置部１５０により両端音源として特定される。 For example, the threshold value is set to a value obtained by dividing 360 ° by the number of virtual speakers “8” (that is, 45 °). For example, if only the angles Sθ4-5 and Sθ8-9 are 45 ° or more, the sound source 64 (# 4), the sound source 65 (# 5), the sound source 68 ( # 8) and the sound source 69 (# 9) are specified as the sound sources at both ends by the virtual speaker placement unit 150.
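
The selection of both-end sound sources can be sketched as a pass over the angular gaps between neighboring sources. The function name `find_end_sources` and the source directions below are hypothetical; the directions were chosen so that, as in the example of FIG. 10, only the gaps between #4 and #5 and between #8 and #9 reach the 45° threshold. The sketch assumes at least two sources with distinct directions.

```python
def find_end_sources(source_dirs, n_speakers, threshold=None):
    """Return directions of sources bounding a gap of at least `threshold` degrees.

    By default the threshold is 360 degrees divided by the number of speakers.
    """
    if threshold is None:
        threshold = 360.0 / n_speakers
    dirs = sorted(d % 360.0 for d in source_dirs)
    ends = set()
    for i, d in enumerate(dirs):
        nxt = dirs[(i + 1) % len(dirs)]   # neighbor along the circumference
        gap = (nxt - d) % 360.0           # wraps around from the last source
        if gap >= threshold:
            ends.update((d, nxt))
    return sorted(ends)


# Sixteen hypothetical source directions (#1 .. #16), eight virtual speakers.
sources = [0.0, 10.0, 20.0, 30.0, 80.0, 100.0, 120.0, 140.0,
           190.0, 210.0, 230.0, 250.0, 270.0, 290.0, 310.0, 330.0]
end_sources = find_end_sources(sources, 8)
```

Here `end_sources` holds the directions of #4, #5, #8, and #9, which is where the first four virtual speakers would be placed.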

なお、しきい値は、３６０°を仮想スピーカの数で除算した値より小さい値であってもよい。 Note that the threshold value may be smaller than the value obtained by dividing 360° by the number of virtual speakers.
次に、仮想スピーカ配置部１５０は、特定された各両端音源が位置する方向に、それぞれ仮想スピーカを配置する。図１０の場合、仮想スピーカ配置部１５０は、音源６４（＃４）の方向に仮想スピーカ８１を配置し、音源６５（＃５）の方向に仮想スピーカ８２を配置し、音源６８（＃８）の方向に仮想スピーカ８３を配置し、音源６９（＃９）の方向に仮想スピーカ８４を配置する。 Next, the virtual speaker arrangement unit 150 arranges a virtual speaker in the direction of each identified both-end sound source. In the case of FIG. 10, the virtual speaker arrangement unit 150 arranges the virtual speaker 81 in the direction of the sound source 64 (#4), the virtual speaker 82 in the direction of the sound source 65 (#5), the virtual speaker 83 in the direction of the sound source 68 (#8), and the virtual speaker 84 in the direction of the sound source 69 (#9).

次に、仮想スピーカ配置部１５０は、ユーザ３０から見て円周３４の方向に隣り合う両端音源に挟まれた領域のうち、両端音源以外の音源がさらに含まれている領域を、仮想スピーカをさらに配置するための「配置領域」として特定する。 Next, the virtual speaker arrangement unit 150 identifies, among the regions sandwiched between both-end sound sources that are adjacent in the direction of the circumference 34 as viewed from the user 30, each region that additionally contains sound sources other than the both-end sound sources as an “arrangement region” in which further virtual speakers are to be arranged.

図１０の場合、音源６５（＃５）および音源６８（＃８）に挟まれた領域Ｔθ５−８には音源６６（＃６）および音源６７（＃７）が含まれる。また、音源６９（＃９）および音源６４（＃４）に挟まれた領域Ｔθ９−４には音源６１（＃１）〜音源６３（＃３），音源７０（＃１０）〜音源７６（＃１６）が含まれる。したがって、領域Ｔθ５−８と領域Ｔθ９−４とが、配置領域として特定される。 In the case of FIG. 10, the region Tθ5-8 sandwiched between the sound source 65 (# 5) and the sound source 68 (# 8) includes the sound source 66 (# 6) and the sound source 67 (# 7). Further, in a region Tθ9-4 sandwiched between the sound source 69 (# 9) and the sound source 64 (# 4), the sound source 61 (# 1) to the sound source 63 (# 3), the sound source 70 (# 10) to the sound source 76 (# 16). Therefore, the region Tθ5-8 and the region Tθ9-4 are specified as the arrangement region.
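
Identifying the arrangement regions amounts to checking, for each pair of neighboring both-end sound sources, whether any other source lies strictly between them. The sketch below uses the hypothetical name `find_arrangement_regions` and invented directions in which #4, #5, #8, and #9 sit at 30°, 80°, 140°, and 190°; the case with no both-end sound sources is not covered by the description and simply yields no regions here.

```python
def find_arrangement_regions(source_dirs, end_dirs):
    """Return (start, stop) pairs of neighboring end sources that enclose
    at least one other source, going clockwise from start to stop."""
    ends = sorted(d % 360.0 for d in end_dirs)
    regions = []
    for i, start in enumerate(ends):
        stop = ends[(i + 1) % len(ends)]
        span = (stop - start) % 360.0
        # A source is inside when its clockwise offset from `start`
        # is strictly between 0 and the span of the region.
        if any(0.0 < (s - start) % 360.0 < span for s in source_dirs):
            regions.append((start, stop))
    return regions


sources = [0.0, 10.0, 20.0, 30.0, 80.0, 100.0, 120.0, 140.0,
           190.0, 210.0, 230.0, 250.0, 270.0, 290.0, 310.0, 330.0]
regions = find_arrangement_regions(sources, [30.0, 80.0, 140.0, 190.0])
```

The result contains the region from #5 to #8 and the region from #9 around to #4, corresponding to Tθ5-8 and Tθ9-4; the gaps between #4 and #5 and between #8 and #9 contain no sources and are skipped.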

図１１は、仮想スピーカの数が仮想音源の数未満である場合の仮想スピーカの配置方法の例を示す第３の図である。次に、仮想スピーカ配置部１５０は、特定された配置領域それぞれにおける、すでに配置された隣り合う２つの仮想スピーカ間の円周３４に沿った方向の角度を算出する。例えば、図１１の場合、領域Ｔθ５−８における、隣り合う２つの仮想スピーカ８２，８３間の円周３４に沿った方向の角度は角度Ｓθ５−８となる。また、領域Ｔθ９−４における、隣り合う２つの仮想スピーカ８４および仮想スピーカ８１との間の円周３４に沿った方向の角度は角度Ｓθ９−４となる。 FIG. 11 is a third diagram illustrating an example of a placement method of virtual speakers when the number of virtual speakers is less than the number of virtual sound sources. Next, the virtual speaker placement unit 150 calculates an angle in a direction along the circumference 34 between two adjacent virtual speakers that have already been placed in each of the specified placement regions. For example, in the case of FIG. 11, the angle in the direction along the circumference 34 between the two adjacent virtual speakers 82 and 83 in the region Tθ5-8 is an angle Sθ5-8. In the region Tθ9-4, the angle in the direction along the circumference 34 between the two adjacent virtual speakers 84 and 81 is the angle Sθ9-4.

次に、仮想スピーカ配置部１５０は、算出された仮想スピーカ間の角度が最大である配置領域に、仮想スピーカ間の間隔が均等になるように未配置の仮想スピーカを１つ配置する。図１１では、角度Ｓθ５−８＜角度Ｓθ９−４とすると、仮想スピーカ配置部１５０は、領域Ｔθ９−４に仮想スピーカ間の間隔が均等になるように仮想スピーカ８５を配置する。 Next, the virtual speaker placement unit 150 places one unplaced virtual speaker in the placement region where the calculated angle between the virtual speakers is maximum so that the intervals between the virtual speakers are equal. In FIG. 11, when the angle Sθ5-8 <angle Sθ9-4, the virtual speaker arrangement unit 150 arranges the virtual speakers 85 in the region Tθ9-4 so that the intervals between the virtual speakers are equal.

図１２は、仮想スピーカの数が仮想音源の数未満である場合の仮想スピーカの配置方法の例を示す第４の図である。次に、仮想スピーカ配置部１５０は、図１１と同様に、特定された配置領域それぞれにおける、すでに配置された隣り合う２つの仮想スピーカ間の円周３４に沿った方向の角度を算出し、算出された角度が最大である配置領域に仮想スピーカを配置する。図１２では、領域Ｔθ５−８において角度Ｓθ５−８が算出され、領域Ｔθ９−４において（角度Ｓθ９−４）／２が算出される。そして、角度Ｓθ５−８＞（角度Ｓθ９−４）／２とすると、領域Ｔθ５−８に仮想スピーカ間の距離が均等になるように仮想スピーカ８６が配置される。 FIG. 12 is a fourth diagram illustrating an example of a placement method of virtual speakers when the number of virtual speakers is less than the number of virtual sound sources. Next, the virtual speaker arrangement unit 150 calculates the angle in the direction along the circumference 34 between two adjacent virtual speakers that have already been arranged in each of the specified arrangement areas, as in FIG. The virtual speaker is arranged in the arrangement region where the angle formed is the maximum. In FIG. 12, the angle Sθ5-8 is calculated in the region Tθ5-8, and (angle Sθ9-4) / 2 is calculated in the region Tθ9-4. When the angle Sθ5-8> (angle Sθ9-4) / 2, the virtual speakers 86 are arranged in the region Tθ5-8 so that the distances between the virtual speakers are equal.

図１３は、仮想スピーカの数が仮想音源の数未満である場合の仮想スピーカの配置方法の例を示す第５の図である。次に、仮想スピーカ配置部１５０は、図１１、図１２と同様に、特定された配置領域それぞれにおける、すでに配置された隣り合う２つの仮想スピーカ間の円周３４に沿った方向の角度を算出し、算出された角度が最大である配置領域に仮想スピーカを配置する。図１３では、領域Ｔθ５−８において（角度Ｓθ５−８）／２が算出され、領域Ｔθ９−４において（角度Ｓθ９−４）／２が算出される。そして、（角度Ｓθ５−８）／２＜（角度Ｓθ９−４）／２とすると、領域Ｔθ９−４に仮想スピーカ間の距離が均等になるように仮想スピーカ８７が配置される。以下、未配置の仮想スピーカが無くなるまで図１３の処理を繰り返す。 FIG. 13 is a fifth diagram illustrating an example of a method of arranging virtual speakers when the number of virtual speakers is less than the number of virtual sound sources. Next, the virtual speaker placement unit 150 calculates the angle in the direction along the circumference 34 between two adjacent virtual speakers that have already been placed in each of the specified placement regions, as in FIGS. 11 and 12. Then, the virtual speaker is arranged in the arrangement area where the calculated angle is the maximum. In FIG. 13, (angle Sθ5-8) / 2 is calculated in the region Tθ5-8, and (angle Sθ9-4) / 2 is calculated in the region Tθ9-4. If (angle Sθ5-8) / 2 <(angle Sθ9-4) / 2, virtual speakers 87 are arranged in region Tθ9-4 so that the distances between the virtual speakers are equal. Hereinafter, the process of FIG. 13 is repeated until there is no virtual speaker that is not arranged.

図９〜図１３で示すように、仮想スピーカ配置部１５０は、まず、ユーザ３０から見て円周３４に沿った方向に隣り合う２つの音源間の、円周３４に沿った方向の角度に基づいて配置領域を特定する。そして、特定された配置領域それぞれにおける、隣り合う２つの仮想スピーカ間の円周３４に沿った方向の角度が最大である配置領域に仮想スピーカを配置することを、未配置の仮想スピーカが無くなるまで繰り返す。 As shown in FIGS. 9 to 13, the virtual speaker placement unit 150 first specifies the placement regions based on the angle, in the direction along the circumference 34, between two sound sources that are adjacent in the direction along the circumference 34 as seen from the user 30. It then repeats the following until no unplaced virtual speaker remains: among the specified placement regions, it places a virtual speaker in the region where the angle in the direction along the circumference 34 between two adjacent virtual speakers is the maximum.

これにより、隣り合う２つの仮想スピーカそれぞれと、ユーザ３０から見て当該２つの仮想スピーカの間に位置する音源との円周３４に沿った方向の距離が小さくなるように、各仮想スピーカを配置できる。したがって、本来使用されるべきＨＲＴＦと実施の演算で使用されるＨＲＴＦとの誤差を小さくすることができる。 As a result, each virtual speaker can be placed so that the distance in the direction along the circumference 34 between each of two adjacent virtual speakers and a sound source located between them, as seen from the user 30, becomes small. This reduces the error between the HRTF that should ideally be used and the HRTF used in the computation actually performed.
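
The repeated selection described for FIGS. 11 to 13 can be sketched as a small greedy loop. This is a minimal illustration, not the patented implementation: the function name `allocate_speakers`, the region labels, and the angle values are hypothetical, and the both-end angles are assumed to be given in degrees.

```python
def allocate_speakers(region_angles, remaining):
    """Greedy sketch of the placement loop (hypothetical helper).

    region_angles: dict mapping a region label to its both-end angle in degrees.
    remaining: number of virtual speakers still to be placed.
    Returns a dict mapping each region label to its number of divisions.
    """
    if not region_angles:
        return {}
    # Each region starts undivided: only its two end speakers exist.
    divisions = {region: 1 for region in region_angles}
    for _ in range(remaining):
        # Select the region whose current speaker-to-speaker angle is largest.
        target = max(
            region_angles,
            key=lambda region: region_angles[region] / divisions[region],
        )
        # Adding one speaker splits that region into one more equal part.
        divisions[target] += 1
    return divisions


# Assumed angles chosen to satisfy the inequalities stated for FIGS. 11 to 13.
result = allocate_speakers({"T5-8": 80.0, "T9-4": 100.0}, remaining=3)
print(result)  # {'T5-8': 2, 'T9-4': 3}
```

With these assumed angles, the three additional speakers go to Tθ9-4, then Tθ5-8, then Tθ9-4, which matches the order in which virtual speakers 85, 86, and 87 are placed in the figures.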

次に、音声処理装置１００の処理において使用されるテーブル情報の例について説明する。
図１４は、ユーザ状態テーブルの例について示す図である。ユーザ状態テーブル１１１は、ユーザ情報取得部１４０が取得するユーザの状態を示す情報を一時的に格納するテーブルである。ユーザ状態テーブル１１１は、配置管理情報記憶部１１０に記憶される。ユーザ状態テーブル１１１は、ユーザ端末２００からユーザの視線方向を示す情報を受信したり、ユーザの位置情報を検出したりする毎に随時更新される。ユーザ状態テーブル１１１は、ユーザＩＤ、座標および頭部姿勢角度の項目を有する。 Next, an example of table information used in the processing of the voice processing device 100 will be described.
FIG. 14 is a diagram illustrating an example of a user status table. The user status table 111 is a table that temporarily stores information indicating the user status acquired by the user information acquisition unit 140. The user status table 111 is stored in the arrangement management information storage unit 110. The user status table 111 is updated as needed each time information indicating the user's line-of-sight direction is received from the user terminal 200 or user position information is detected. The user status table 111 includes items of user ID, coordinates, and head posture angle.

ユーザＩＤの項目には、ユーザを識別するための識別子が設定される。
座標の項目には、ユーザの頭部の位置の座標が設定される。なお、図１４において、本項目は３次元で設定されているが、２次元で設定されてもよい。以下で説明する座標についても同様である。 An identifier for identifying the user is set in the user ID item.
In the coordinate item, coordinates of the position of the user's head are set. In FIG. 14, this item is set in three dimensions, but may be set in two dimensions. The same applies to the coordinates described below.

頭部姿勢角度の項目には、所定の基準方向（例えば、北方向）に対するユーザの視線方向の角度が設定される。なお、基準方向は水平方向に沿った方向であるものとする。
図１５は、音源管理テーブルの例について示す図である。音源管理テーブル１１２は、ユーザの周りに配置された音源に関する情報を格納するテーブルである。音源管理テーブル１１２には、ユーザの視線方向からユーザを中心とした円周に沿って所定の方向（例えば、右回転方向）に存在する順に音源に関する情報が一時的に記憶される。音源管理テーブル１１２は、配置管理情報記憶部１１０に記憶されている。音源管理テーブル１１２は、ユーザＩＤ、音源ＩＤ、音源位置および両端フラグの項目を有する。 In the head posture angle item, an angle of the user's line-of-sight direction with respect to a predetermined reference direction (for example, the north direction) is set. The reference direction is a direction along the horizontal direction.
FIG. 15 is a diagram illustrating an example of a sound source management table. The sound source management table 112 is a table that stores information regarding sound sources arranged around the user. The sound source management table 112 temporarily stores information on sound sources in the order in which they exist in a predetermined direction (for example, the right rotation direction) along the circumference centered on the user from the user's line-of-sight direction. The sound source management table 112 is stored in the arrangement management information storage unit 110. The sound source management table 112 has items of user ID, sound source ID, sound source position, and both end flags.

ユーザＩＤの項目には、ユーザを識別するための識別子が設定される。
音源ＩＤの項目には、音源を識別するための識別子が設定される。
音源位置の項目には、音源の位置を示す情報が設定される。音源の位置を示す情報は、例えば、ユーザの位置および視線方向を基準とした音源の相対的な座標でもよいし、音源の緯度経度でもよい。また、音源の位置を示す情報は、ユーザの視線方向とユーザから見た音源の方向との間の角度でもよい。 An identifier for identifying the user is set in the user ID item.
In the sound source ID item, an identifier for identifying the sound source is set.
In the sound source position item, information indicating the position of the sound source is set. The information indicating the position of the sound source may be, for example, relative coordinates of the sound source based on the user's position and line-of-sight direction, or the latitude and longitude of the sound source. The information indicating the position of the sound source may be an angle between the user's line-of-sight direction and the direction of the sound source viewed from the user.

両端フラグの項目には、音源が両端音源であるか否かを示す情報が設定される。例えば、音源が両端音源である場合は“ＴＲＵＥ”が設定され、音源が両端音源でない場合は“ＦＡＬＳＥ”が設定される。両端フラグの初期値は、“ＦＡＬＳＥ”である。 Information indicating whether the sound source is a both-end sound source is set in the both-end flag item. For example, “TRUE” is set when the sound source is a double-ended sound source, and “FALSE” is set when the sound source is not a double-ended sound source. The initial value of the both-end flag is “FALSE”.

なお、図示しないが、音源情報記憶部１２０には、例えば管理者の設定操作により、各音源の位置情報が音源ＩＤに対応付けてあらかじめ登録されており、音源管理テーブル１１２の音源ＩＤおよび音源位置の各項目には、音源情報記憶部１２０に登録された各音源の音源ＩＤおよび位置情報が設定される。また、音源管理テーブル１１２における音源の登録順は、ユーザ情報取得部１４０により、対応するユーザの位置が移動する度に更新される。 Although not shown, in the sound source information storage unit 120, the position information of each sound source is registered in advance in association with the sound source ID, for example, by an administrator's setting operation, and the sound source ID and the sound source position of the sound source management table 112 are registered. In each item, a sound source ID and position information of each sound source registered in the sound source information storage unit 120 are set. The order of registration of sound sources in the sound source management table 112 is updated by the user information acquisition unit 140 every time the corresponding user position moves.

図１６は、仮想スピーカ位置テーブルの例について示す図である。仮想スピーカ位置テーブル１１３は、仮想スピーカの位置に関する情報を格納するテーブルである。仮想スピーカ位置テーブル１１３は、配置管理情報記憶部１１０に一時的に記憶される。仮想スピーカ位置テーブル１１３は、ユーザＩＤ、仮想スピーカＩＤ、スピーカ位置および配置確定フラグの項目を有する。 FIG. 16 is a diagram illustrating an example of the virtual speaker position table. The virtual speaker position table 113 is a table that stores information related to the position of the virtual speaker. The virtual speaker position table 113 is temporarily stored in the arrangement management information storage unit 110. The virtual speaker position table 113 includes items of a user ID, a virtual speaker ID, a speaker position, and an arrangement determination flag.

ユーザＩＤの項目には、ユーザを識別するための識別子が設定される。
仮想スピーカＩＤの項目には、仮想スピーカを識別するための識別子が設定される。
スピーカ位置の項目には、仮想スピーカの位置を示す情報が設定される。仮想スピーカの位置を示す情報は、例えば、ユーザの位置および視線方向を基準とした仮想スピーカの相対的な座標でもよいし、仮想スピーカの絶対座標でもよい。また、スピーカの位置を示す情報は、ユーザの向きとユーザから見た仮想スピーカの方向との間の角度でもよい。 An identifier for identifying the user is set in the user ID item.
In the virtual speaker ID item, an identifier for identifying the virtual speaker is set.
Information indicating the position of the virtual speaker is set in the speaker position item. The information indicating the position of the virtual speaker may be, for example, relative coordinates of the virtual speaker based on the user's position and line-of-sight direction, or may be absolute coordinates of the virtual speaker. The information indicating the position of the speaker may be an angle between the direction of the user and the direction of the virtual speaker viewed from the user.

配置確定フラグの項目には、配置する仮想スピーカの位置が確定しているか否かを示す情報が設定される。例えば、仮想スピーカの位置が確定している場合は“ＴＲＵＥ”が設定され、仮想スピーカの位置が確定していない場合は“ＦＡＬＳＥ”が設定される。配置確定フラグの初期値は、“ＦＡＬＳＥ”である。 In the item of the placement confirmation flag, information indicating whether or not the position of the virtual speaker to be placed is confirmed is set. For example, “TRUE” is set when the position of the virtual speaker is fixed, and “FALSE” is set when the position of the virtual speaker is not fixed. The initial value of the placement confirmation flag is “FALSE”.

図１７は、配置情報の例について示す図である。配置情報１１４は、配置する仮想スピーカの数に関する情報である。配置情報１１４は、配置管理情報記憶部１１０に一時的に記憶される。配置情報１１４は、ユーザＩＤ，配置済み、未配置および合計の項目を有する。 FIG. 17 is a diagram illustrating an example of arrangement information. The arrangement information 114 is information regarding the number of virtual speakers to be arranged. The arrangement information 114 is temporarily stored in the arrangement management information storage unit 110. The placement information 114 includes items of user ID, placed, unplaced, and total.

ユーザＩＤの項目には、ユーザを識別するための識別子が設定される。
配置済みの項目には、配置済みの仮想スピーカの数が設定される。配置済みの初期値は“０”である。 An identifier for identifying the user is set in the user ID item.
In the arranged item, the number of arranged virtual speakers is set. The placed initial value is “0”.

未配置の項目には、未配置である仮想スピーカの数が設定される。未配置の初期値は合計の項目と同じ値である。
合計の項目には、配置する仮想スピーカ全体の数が設定される。すなわち、配置済みの仮想スピーカと未配置の仮想スピーカとの合計が設定される。 In the unplaced item, the number of virtual speakers that are not placed is set. The unallocated initial value is the same value as the total item.
In the total item, the total number of virtual speakers to be arranged is set. That is, the total of the arranged virtual speakers and the unplaced virtual speakers is set.

図１８は、配置領域管理テーブルの例について示す図である。配置領域管理テーブル１１５は、仮想スピーカを配置する配置領域に関する情報を格納するテーブルである。配置領域管理テーブル１１５は、配置管理情報記憶部１１０に一時的に記憶される。配置領域管理テーブル１１５は、ユーザＩＤ、領域ＩＤ、両端角度、探索フラグ、角度（分割後）および分割数の項目を有する。 FIG. 18 is a diagram illustrating an example of an arrangement area management table. The arrangement area management table 115 is a table that stores information related to an arrangement area in which virtual speakers are arranged. The arrangement area management table 115 is temporarily stored in the arrangement management information storage unit 110. The arrangement area management table 115 has items of user ID, area ID, both-end angle, search flag, angle (after division), and number of divisions.

ユーザＩＤの項目には、ユーザを識別するための識別子が設定される。
領域ＩＤの項目には、配置領域を識別するための識別子が設定される。
両端角度の項目には、配置領域の両端の間の、ユーザを中心として円周に沿った方向の角度が設定される。 An identifier for identifying the user is set in the user ID item.
An identifier for identifying the arrangement area is set in the area ID item.
In the both end angle item, an angle in a direction along the circumference with the user at the center between both ends of the arrangement area is set.

探索フラグの項目には、仮想スピーカを１つ追加する配置領域か否かを示す情報が設定される。例えば、仮想スピーカを追加する配置領域である場合は“ＴＲＵＥ”が設定され、仮想スピーカを追加する配置領域でない場合は“ＦＡＬＳＥ”が設定される。仮想スピーカを追加する配置領域か否かは、配置領域毎に均等に配置された仮想スピーカ間の、ユーザを中心として円周に沿った方向の角度が他の配置領域と比べ最大であるか否かで判断される。 The search flag item holds information indicating whether the arrangement area is one to which a virtual speaker is to be added. For example, “TRUE” is set if a virtual speaker is to be added to the arrangement area, and “FALSE” is set otherwise. Whether an arrangement area is one to which a virtual speaker is to be added is determined by whether the angle, in the direction along the circumference centered on the user, between the virtual speakers evenly placed in that arrangement area is the largest among all arrangement areas.

角度（分割後）の項目には、配置領域に均等に配置された仮想スピーカ間の、ユーザを中心として円周に沿った方向の角度が設定される。具体的には、角度（分割後）の項目には、両端角度の項目に設定された角度を分割数の項目に設定された値によって除算した数値が設定される。分割数の項目には、配置領域が、その両端を除く仮想スピーカによって分割された数が設定され、具体的には、配置領域に配置される仮想スピーカの数−１として算出された値が設定される。 The angle (after division) item holds the angle, in the direction along the circumference centered on the user, between the virtual speakers evenly placed in the arrangement area. Specifically, it is set to the value obtained by dividing the angle set in the both-end angle item by the value set in the number-of-divisions item. The number-of-divisions item holds the number of parts into which the arrangement area is divided by the virtual speakers other than those at its two ends; specifically, it is set to the value calculated as the number of virtual speakers placed in the arrangement area minus 1.

領域ＩＤ、両端角度、探索フラグ、角度（分割後）および分割数の初期値は、空欄である。
次に、音声処理装置１００の処理についてフローチャートを用いて説明する。 The area ID, the both-end angle, the search flag, the angle (after division), and the initial value of the number of divisions are blank.
Next, processing of the voice processing apparatus 100 will be described using a flowchart.

図１９は、仮想スピーカの配置処理の例を示すフローチャートである。図１９の処理において、音源情報記憶部１２０に音源の位置情報および音源が出力する音声信号が記憶されているものとする。また、図１９の処理は、ユーザ毎に行われるものとする。したがって、図１９（および図２０）の処理においては、ユーザ状態テーブル１１１などユーザＩＤの項目を含む各テーブルとしては、処理対象のユーザに対応するユーザＩＤが登録されているテーブルが利用される。 FIG. 19 is a flowchart illustrating an example of the virtual speaker placement process. In the process of FIG. 19, it is assumed that the sound source information storage unit 120 stores the position information of each sound source and the audio signal output by each sound source. It is also assumed that the process of FIG. 19 is performed for each user. Accordingly, in the process of FIG. 19 (and FIG. 20), for each table that includes a user ID item, such as the user status table 111, the table in which the user ID corresponding to the user being processed is registered is used.

（ステップＳ１１）ユーザ情報取得部１４０は、ユーザの状態を示す情報を取得する。ユーザの状態を示す情報には、ユーザの視線方向を示す情報およびユーザの位置情報が含まれる。 (Step S11) The user information acquisition unit 140 acquires information indicating the state of the user. The information indicating the user state includes information indicating the user's line-of-sight direction and user position information.

ユーザ情報取得部１４０は、ユーザ端末２００からユーザの視線方向を示す情報を受信する。また、ユーザ端末２００の位置を示す情報について、ユーザ情報取得部１４０は、アクセスポイント２１ａ〜２１ｄにおける信号の受信時刻の差、あるいは受信電波強度の差に基づいて、三角法を用いてユーザ端末２００の位置を検出する。 The user information acquisition unit 140 receives information indicating the user's line-of-sight direction from the user terminal 200. As for the information indicating the position of the user terminal 200, the user information acquisition unit 140 detects the position of the user terminal 200 by triangulation, based on differences in signal reception time or differences in received signal strength at the access points 21a to 21d.

そして、ユーザ情報取得部１４０は、取得したユーザの状態を示す情報をユーザ状態テーブル１１１に一時的に格納する。この際、ユーザ情報取得部１４０は、座標の項目にユーザ端末２００の位置を示す情報を設定し、頭部姿勢角度の項目にユーザの視線方向と所定の基準方向（例えば、北方向）との間の角度を設定する。 The user information acquisition unit 140 temporarily stores information indicating the acquired user status in the user status table 111. At this time, the user information acquisition unit 140 sets information indicating the position of the user terminal 200 in the coordinate item, and sets the user's line-of-sight direction and a predetermined reference direction (for example, the north direction) in the head posture angle item. Set the angle between.

（ステップＳ１２）仮想スピーカ配置部１５０は、音源情報記憶部１２０から、各音源に関する情報を読み出す。音源に関する情報には、音源を識別する音源ＩＤ、音源の位置情報および音源の出力する音声信号が含まれる。 (Step S 12) The virtual speaker arrangement unit 150 reads information related to each sound source from the sound source information storage unit 120. The information on the sound source includes a sound source ID for identifying the sound source, position information of the sound source, and an audio signal output from the sound source.

（ステップＳ１３）仮想スピーカ配置部１５０は、ユーザ状態テーブル１１１に格納されたユーザの視線方向や位置を示す情報およびステップＳ１２で確認した各音源の位置情報に基づいて、音源に関する情報を音源管理テーブル１１２に格納する。 (Step S13) The virtual speaker placement unit 150 stores information on the sound sources in the sound source management table 112, based on the information indicating the user's line-of-sight direction and position stored in the user status table 111 and the position information of each sound source checked in step S12.

具体的には、まず、仮想スピーカ配置部１５０は、ユーザ状態テーブル１１１からユーザの状態を示す情報を読み出す。次に、仮想スピーカ配置部１５０は、読み出したユーザの情報とステップＳ１２で確認した音源の座標とに基づいて、ユーザの視線方向からユーザを中心とした円周（以下、所定の円周）に沿って右回転方向に存在する音源を順に判別して、その判別順に各音源に関する情報を音源管理テーブル１１２に登録する。その際、仮想スピーカ配置部１５０は、音源位置の項目にステップＳ１２で確認した音源の位置情報を設定し、両端音源フラグの項目に初期値として“ＦＡＬＳＥ”を設定する。 Specifically, the virtual speaker placement unit 150 first reads the information indicating the user's state from the user status table 111. Next, based on the read user information and the coordinates of the sound sources checked in step S12, the virtual speaker placement unit 150 identifies the sound sources in the order in which they appear when moving clockwise from the user's line-of-sight direction along a circumference centered on the user (hereinafter, the predetermined circumference), and registers information on each sound source in the sound source management table 112 in that order. In doing so, the virtual speaker placement unit 150 sets the sound source position item to the position information of the sound source checked in step S12, and sets the both-end flag item to the initial value “FALSE”.
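
The clockwise ordering in step S13 could be computed as in the following sketch. It rests on assumptions that the text does not fix: positions are two-dimensional with x pointing east and y pointing north, the head posture angle is measured clockwise from north in degrees, and the helper names `relative_angle` and `order_sources` are hypothetical.

```python
import math


def relative_angle(user_xy, heading_deg, source_xy):
    """Clockwise angle in [0, 360) from the line-of-sight direction to the source."""
    dx = source_xy[0] - user_xy[0]
    dy = source_xy[1] - user_xy[1]
    # atan2(dx, dy) gives the bearing measured clockwise from the +y (north) axis.
    bearing = math.degrees(math.atan2(dx, dy))
    return (bearing - heading_deg) % 360.0


def order_sources(user_xy, heading_deg, sources):
    """Return the source IDs sorted clockwise, starting at the line-of-sight direction.

    sources: dict mapping a source ID to its (x, y) position.
    """
    return sorted(
        sources,
        key=lambda source_id: relative_angle(user_xy, heading_deg, sources[source_id]),
    )


# Assumed layout: the user at the origin facing north, with sources to the
# west, south, and east.
layout = {"west": (-1.0, 0.0), "south": (0.0, -1.0), "east": (1.0, 0.0)}
print(order_sources((0.0, 0.0), 0.0, layout))  # ['east', 'south', 'west']
```

Sorting by the relative angle rather than by absolute bearing means the registration order changes whenever the user turns or moves, in line with the statement that the table is refreshed as the user state changes.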

（ステップＳ１４）仮想スピーカ配置部１５０は、配置する仮想スピーカの数がステップＳ１３で格納した音源の数未満であるか判定する。配置する仮想スピーカの数が格納した音源の数未満の場合、処理をステップＳ２１へ進める。配置する仮想スピーカの数が格納した音源の数以上の場合、処理をステップＳ１５へ進める。 (Step S14) The virtual speaker placement unit 150 determines whether the number of virtual speakers to be placed is less than the number of sound sources stored in step S13. If the number of virtual speakers to be arranged is less than the number of stored sound sources, the process proceeds to step S21. If the number of virtual speakers to be arranged is equal to or greater than the number of stored sound sources, the process proceeds to step S15.

（ステップＳ１５）仮想スピーカ配置部１５０は、所定の円周上において、音源それぞれについてユーザと音源とを結ぶ直線上に仮想スピーカを配置する。このとき、仮想スピーカ配置部１５０は、配置した各仮想スピーカに関する情報を仮想スピーカ位置テーブル１１３に登録する。この際、仮想スピーカ配置部１５０は、スピーカ位置の項目に配置した仮想スピーカの位置情報を設定し、配置確定フラグの項目に“ＴＲＵＥ”を設定する。仮想スピーカの位置情報は、ステップＳ１１で取得したユーザの座標と所定の円周の半径とユーザから見て当該仮想スピーカと同じ方向に配置されている音源の座標との位置関係に基づいて算出される。 (Step S15) For each sound source, the virtual speaker placement unit 150 places a virtual speaker on the predetermined circumference, on the straight line connecting the user and that sound source. At this time, the virtual speaker placement unit 150 registers information on each placed virtual speaker in the virtual speaker position table 113. In doing so, it sets the speaker position item to the position information of the placed virtual speaker and sets the placement confirmation flag item to “TRUE”. The position information of a virtual speaker is calculated based on the positional relationship among the coordinates of the user acquired in step S11, the radius of the predetermined circumference, and the coordinates of the sound source located in the same direction as that virtual speaker as seen from the user.
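
One way to obtain the speaker position described in step S15 is to scale the user-to-source vector to the radius of the predetermined circumference. The sketch below assumes two-dimensional coordinates; `speaker_position` is a hypothetical name, not one taken from the patent.

```python
import math


def speaker_position(user_xy, source_xy, radius):
    """Point on the circle of the given radius around the user, toward the source."""
    dx = source_xy[0] - user_xy[0]
    dy = source_xy[1] - user_xy[1]
    distance = math.hypot(dx, dy)
    if distance == 0.0:
        # The direction is undefined when the source coincides with the user.
        raise ValueError("source and user positions coincide")
    scale = radius / distance
    return (user_xy[0] + dx * scale, user_xy[1] + dy * scale)


# Assumed example: the source is 5 units away, the circle radius is 1 unit.
x, y = speaker_position((1.0, 1.0), (4.0, 5.0), 1.0)
print(round(x, 6), round(y, 6))  # 1.6 1.8
```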

なお、所定の円周に関する情報（すなわち、ユーザと仮想スピーカとの距離）は、例えば、ＨＤＤ１０３などの記憶領域にあらかじめ記憶されている。
（ステップＳ１６）音源分配部１６０は、音源情報記憶部１２０に記憶された各音源から出力される音声信号を、所定の円周に沿って隣り合う２つの仮想スピーカに分配する。仮想スピーカに分配される音声信号は、具体的には、図６で説明したように、ユーザと音源と２つの仮想スピーカとの位置関係に応じて重み付けすることで生成される。 Information about a predetermined circumference (that is, the distance between the user and the virtual speaker) is stored in advance in a storage area such as the HDD 103, for example.
(Step S16) The sound source distribution unit 160 distributes the audio signal output from each sound source stored in the sound source information storage unit 120 to two adjacent virtual speakers along a predetermined circumference. Specifically, as described with reference to FIG. 6, the audio signal distributed to the virtual speakers is generated by weighting according to the positional relationship between the user, the sound source, and the two virtual speakers.

（ステップＳ１７）音声生成部１７０は、分配された音声信号を用いて左右チャネルの音声信号を生成する。
具体的には、まず、音声生成部１７０は、仮想スピーカ位置テーブル１１３に登録された各仮想スピーカのスピーカ位置に基づいて、ユーザの視線方向とユーザから仮想スピーカの配置された方向との間の角度を算出する。次に、音声生成部１７０は、算出した各角度と一致する左右のＨＲＴＦをＨＲＴＦ情報記憶部１３０から検索する。そして、音声生成部１７０は、音源分配部１６０により分配された音声信号と検索された左右のＨＲＴＦとを畳み込み演算した信号を合成し、左右のチャネルの音声信号を生成する。 (Step S 17) The sound generation unit 170 generates left and right channel sound signals using the distributed sound signals.
Specifically, the sound generation unit 170 first calculates, based on the speaker position of each virtual speaker registered in the virtual speaker position table 113, the angle between the user's line-of-sight direction and the direction in which the virtual speaker is placed as seen from the user. Next, the sound generation unit 170 searches the HRTF information storage unit 130 for the left and right HRTFs that match each calculated angle. Then, the sound generation unit 170 convolves the audio signals distributed by the sound source distribution unit 160 with the retrieved left and right HRTFs, combines the resulting signals, and generates the audio signals for the left and right channels.

畳み込み演算の例として、時刻τから時刻ｔまでの音声信号において、左右のチャネルの音声信号をｈ（ｔ）とし、分配された音声信号の関数をｆ（ｔ）とし、ＨＲＴＦをｇ（ｔ）とした場合、以下のような畳み込み積分を用いることができる。 As an example of the convolution operation, in the audio signal from time τ to time t, the audio signal of the left and right channels is h (t), the function of the distributed audio signal is f (t), and HRTF is g (t) In this case, the following convolution integral can be used.
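
The equation itself is not reproduced in this text. With the symbols defined above, the standard convolution integral has the form h(t) = ∫ f(τ) g(t − τ) dτ; whether the patent uses exactly this form and these limits is an assumption. A discrete counterpart is sketched below, assuming that each HRTF is available as a finite time-domain impulse response and that signals are plain lists of samples. The names `convolve` and `mix` are hypothetical.

```python
def convolve(signal, impulse_response):
    """Discrete convolution: out[n] = sum over k of signal[k] * impulse_response[n - k]."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, f_n in enumerate(signal):
        for k, g_k in enumerate(impulse_response):
            out[n + k] += f_n * g_k
    return out


def mix(channels):
    """Sum several sample lists of possibly different lengths into one channel."""
    length = max((len(channel) for channel in channels), default=0)
    return [
        sum(channel[i] for channel in channels if i < len(channel))
        for i in range(length)
    ]


# One ear's output: convolve each virtual speaker's distributed signal with that
# speaker's impulse response for the ear, then sum the results.
left = mix([convolve([1.0, 2.0, 3.0], [1.0, 1.0]), convolve([1.0], [0.5, 0.25])])
print(left)  # [1.5, 3.25, 5.0, 3.0]
```

The same two calls, run with the right-ear responses, would give the right channel. A production system would more likely use an FFT-based convolution for long responses; the direct form is shown only because it mirrors the integral term by term.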

そして、音声生成部１７０は、生成した左右のチャネルの音声信号をユーザ端末２００へ送信する。
その後、ユーザ端末２００の音声出力部２２０は、音声処理装置１００から左右のチャネルの音声信号を受信する。音声出力部２２０は、受信した音声信号をアナログ音声信号に変換し、変換されたアナログ音声信号をヘッドホン１２に出力する。 Then, the sound generation unit 170 transmits the generated sound signals of the left and right channels to the user terminal 200.
Thereafter, the audio output unit 220 of the user terminal 200 receives the audio signals of the left and right channels from the audio processing device 100. The audio output unit 220 converts the received audio signal into an analog audio signal, and outputs the converted analog audio signal to the headphones 12.

図２０は、仮想スピーカの配置処理の例を示すフローチャート（続き）である。
（ステップＳ２１）仮想スピーカ配置部１５０は、図９〜図１０で説明したように、両端音源を特定する。 FIG. 20 is a flowchart (continued) illustrating an example of the virtual speaker arrangement process.
(Step S 21) The virtual speaker placement unit 150 identifies both-end sound sources as described with reference to FIGS. 9 to 10.

具体的には、まず、仮想スピーカ配置部１５０は、音源管理テーブル１１２からユーザの周辺に配置された複数の音源に関する情報を取得する。次に、仮想スピーカ配置部１５０は、取得した音源それぞれの音源位置に基づいて、図９で説明したように、ユーザから見て所定の円周に沿った方向に隣り合う２つの音源間の、円周に沿った方向の角度を算出する。次に、図１０で説明したように、所定の円周に沿った方向の範囲のうち、算出した各音源間の角度がしきい値以上である音源のペアを特定する。そして、仮想スピーカ配置部１５０は、特定した音源のペアに含まれる音源を両端音源と特定し、音源管理テーブル１１２において、その音源の両端フラグの項目を“ＴＲＵＥ”に更新する。 Specifically, first, the virtual speaker arrangement unit 150 acquires information regarding a plurality of sound sources arranged around the user from the sound source management table 112. Next, based on the sound source position of each acquired sound source, the virtual speaker placement unit 150, as described in FIG. 9, between two sound sources adjacent in a direction along a predetermined circumference as viewed from the user, The angle in the direction along the circumference is calculated. Next, as described with reference to FIG. 10, a sound source pair in which the calculated angle between the sound sources is equal to or greater than a threshold value is specified in a range of directions along a predetermined circumference. Then, the virtual speaker arrangement unit 150 identifies the sound sources included in the identified sound source pair as both-end sound sources, and updates the both-end flag item of the sound source to “TRUE” in the sound source management table 112.
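
The both-end determination in step S21 can be sketched as a scan over neighbouring sources. The sketch assumes that each source is already represented by its clockwise angle in degrees from the line-of-sight direction and that the list is sorted in that order; the function name, the angle values, and the threshold value are hypothetical.

```python
def find_both_end_sources(sorted_angles, threshold_deg):
    """Return the IDs of sources that belong to a pair whose gap meets the threshold.

    sorted_angles: list of (source_id, angle_deg) tuples in clockwise order.
    """
    flagged = set()
    count = len(sorted_angles)
    if count < 2:
        return flagged
    for i in range(count):
        id_a, angle_a = sorted_angles[i]
        # The last source is paired with the first one to close the circle.
        id_b, angle_b = sorted_angles[(i + 1) % count]
        gap = (angle_b - angle_a) % 360.0
        if gap >= threshold_deg:
            flagged.update((id_a, id_b))
    return flagged


# Assumed angles arranged so that only the #4/#5 and #8/#9 gaps reach 75 degrees.
sources = [("#1", 10.0), ("#2", 30.0), ("#3", 50.0), ("#4", 70.0), ("#5", 150.0),
           ("#6", 170.0), ("#7", 190.0), ("#8", 210.0), ("#9", 300.0)]
print(sorted(find_both_end_sources(sources, 75.0)))  # ['#4', '#5', '#8', '#9']
```

The returned IDs correspond to the rows whose both-end flag would be updated to “TRUE” in the sound source management table 112.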

（ステップＳ２２）仮想スピーカ配置部１５０は、図１０で説明したように、所定の円周上において、各両端音源が位置する方向にそれぞれ仮想スピーカを配置する。このとき、仮想スピーカ配置部１５０は、図１９のステップＳ１５と同様に、配置した各仮想スピーカに関する情報を仮想スピーカ位置テーブル１１３に格納する。 (Step S22) As described with reference to FIG. 10, the virtual speaker placement unit 150 places virtual speakers in the direction in which the sound sources at both ends are located on a predetermined circumference. At this time, the virtual speaker arrangement unit 150 stores information on the arranged virtual speakers in the virtual speaker position table 113, as in step S15 of FIG.

そして、仮想スピーカ配置部１５０は、配置情報１１４における配置済みの項目の値を、格納した仮想スピーカの数を加算した値に更新するとともに、未配置の項目の値を、格納した仮想スピーカの数を減算した値に更新する。 Then, the virtual speaker placement unit 150 updates the value of the placed item in the placement information 114 to a value obtained by adding the number of stored virtual speakers, and changes the value of the unplaced item to the number of stored virtual speakers. Update to the value obtained by subtracting.

この後のステップＳ２３〜Ｓ２７では、初期配置によって配置されたものを除く残りの仮想スピーカを配置するための処理が行われる。
（ステップＳ２３）仮想スピーカ配置部１５０は、ステップＳ２２ですでに配置された仮想スピーカに挟まれた複数の範囲のうち、その両端以外の位置にも音源が存在している範囲を配置領域として特定する。 In subsequent steps S23 to S27, processing for arranging the remaining virtual speakers excluding those arranged by the initial arrangement is performed.
(Step S23) The virtual speaker placement unit 150 identifies, as a placement area, a range in which a sound source is present at positions other than both ends of the plurality of ranges sandwiched between the virtual speakers already placed in Step S22. To do.

具体的には、仮想スピーカ配置部１５０は、音源管理テーブル１１２を参照して、両端フラグが“ＴＲＵＥ”である両端音源の間に、両端フラグが“ＦＡＬＳＥ”である音源のみが含まれている領域を配置領域として特定する。 Specifically, the virtual speaker arrangement unit 150 refers to the sound source management table 112 and includes only sound sources having both ends flag “FALSE” between both ends sound sources having both ends flag “TRUE”. An area is specified as an arrangement area.

次に、仮想スピーカ配置部１５０は、特定した配置領域に関する情報を配置領域管理テーブル１１５に登録する。この際、仮想スピーカ配置部１５０は、両端角度および角度（分割後）の項目にユーザから見た両端音源同士の各方向の間の角度を設定し、探索フラグの項目に“ＦＡＬＳＥ”を設定し、分割数に“１”を設定する。 Next, the virtual speaker placement unit 150 registers information regarding the identified placement area in the placement area management table 115. At this time, the virtual speaker placement unit 150 sets the angle between the two sound sources as viewed from the user in the both end angle and angle (after division) items, and sets “FALSE” in the search flag item. , “1” is set as the number of divisions.

（ステップＳ２４）仮想スピーカ配置部１５０は、図１１〜図１３で説明したように、ステップＳ２３で特定した配置領域から仮想スピーカを１つ追加する配置領域を選択する。 (Step S24) As described with reference to FIGS. 11 to 13, the virtual speaker placement unit 150 selects a placement area in which one virtual speaker is added from the placement area specified in step S23.

具体的には、仮想スピーカ配置部１５０は、配置領域管理テーブル１１５から、角度（分割後）の値が最大である配置領域を選択する。そして、仮想スピーカ配置部１５０は、配置領域管理テーブル１１５について、選択した配置領域の探索フラグを“ＴＲＵＥ”に更新する。 Specifically, the virtual speaker arrangement unit 150 selects an arrangement area having a maximum angle (after division) value from the arrangement area management table 115. Then, the virtual speaker arrangement unit 150 updates the search flag for the selected arrangement area to “TRUE” in the arrangement area management table 115.

（ステップＳ２５）仮想スピーカ配置部１５０は、図１１〜図１３で説明したように、仮想スピーカを１つ配置する。
具体的には、仮想スピーカ配置部１５０は、配置領域管理テーブル１１５について、探索フラグが“ＴＲＵＥ”である探索領域の分割数の項目を当該分割数に１加算した値に更新する。また、仮想スピーカ配置部１５０は、配置領域管理テーブル１１５について、探索フラグが“ＴＲＵＥ”である配置領域の角度（分割後）の項目を、両端角度の項目に設定された値を更新後の分割数に設定された値によって除算した値に更新する。また、仮想スピーカ配置部１５０は、配置領域管理テーブル１１５について、探索フラグが“ＴＲＵＥ”である配置領域の探索フラグの項目を“ＦＡＬＳＥ”に更新する。 (Step S25) The virtual speaker placement unit 150 places one virtual speaker as described with reference to FIGS.
Specifically, in the arrangement area management table 115, the virtual speaker placement unit 150 updates the number-of-divisions item of the arrangement area whose search flag is “TRUE” to the current number of divisions plus 1. The virtual speaker placement unit 150 also updates the angle (after division) item of that arrangement area to the value obtained by dividing the value set in the both-end angle item by the updated number of divisions. In addition, the virtual speaker placement unit 150 updates the search flag item of that arrangement area from “TRUE” to “FALSE”.

そして、仮想スピーカ配置部１５０は、配置情報１１４の配置済みの項目を１加算した値に更新し、当該配置情報１１４の未配置の項目を１減算した値に更新する。
（ステップＳ２６）仮想スピーカ配置部１５０は、図１１〜図１３で説明したように、全ての仮想スピーカを配置済みか判定する。全ての仮想スピーカを配置済みの場合、処理をステップＳ２７へ進める。全ての仮想スピーカを配置済みでない場合、処理をステップＳ２４へ進める。全ての仮想スピーカを配置済みであるか否かは、例えば、配置情報１１４の未配置の項目が“０”であるかにより判定できる。 Then, the virtual speaker placement unit 150 updates the placed item of the placement information 114 to a value obtained by adding 1, and updates the unplaced item of the placement information 114 to a value obtained by subtracting 1.
(Step S26) As described with reference to FIGS. 11 to 13, the virtual speaker placement unit 150 determines whether all the virtual speakers have been placed. If all virtual speakers have been arranged, the process proceeds to step S27. If not all virtual speakers have been arranged, the process proceeds to step S24. Whether or not all the virtual speakers have been arranged can be determined, for example, based on whether the unarranged item of the arrangement information 114 is “0”.

（ステップＳ２７）仮想スピーカ配置部１５０は、図１１〜図１３で説明したように、配置領域それぞれについて、隣り合う２つの仮想スピーカ間の、ユーザを中心とした角度が均等になるように、所定の円周上に配置された残りの仮想スピーカの座標を仮想スピーカ位置テーブル１１３に登録する。 (Step S27) As described with reference to FIGS. 11 to 13, the virtual speaker placement unit 150 registers, in the virtual speaker position table 113, the coordinates of the remaining virtual speakers placed on the predetermined circumference so that, in each arrangement area, the angles between adjacent virtual speakers as measured around the user are equal.

具体的には、仮想スピーカ配置部１５０は、配置領域管理テーブル１１５から配置領域を順次選択し、選択された配置領域のそれぞれについて次の処理を実行する。
まず、仮想スピーカ配置部１５０は、配置領域管理テーブル１１５から選択された配置領域についての角度（分割後）を読み出す。仮想スピーカ配置部１５０は、所定の円周上において、選択された配置領域の一端から一方向（例えば右回り方向）に対し、読み出した角度（分割後）毎の位置に仮想スピーカを配置するよう、「選択した配置領域の分割数−１」分の仮想スピーカの位置情報（例えば、座標）を算出する。そして、仮想スピーカ配置部１５０は、仮想スピーカ位置テーブル１１３について、配置する仮想スピーカに関する情報を更新する。具体的には、仮想スピーカ配置部１５０は、配置決定フラグが“ＦＡＬＳＥ”である仮想スピーカについて、スピーカ位置の項目に算出された位置情報を登録し、配置決定フラグの項目を“ＴＲＵＥ”に更新する。 Specifically, the virtual speaker placement unit 150 sequentially selects placement areas from the placement area management table 115 and executes the following process for each of the selected placement areas.
First, the virtual speaker placement unit 150 reads the angle (after division) for the selected arrangement area from the arrangement area management table 115. The virtual speaker placement unit 150 then calculates position information (for example, coordinates) for “number of divisions of the selected arrangement area − 1” virtual speakers, so that on the predetermined circumference a virtual speaker is placed at every interval of the read angle (after division), proceeding in one direction (for example, clockwise) from one end of the selected arrangement area. Then, the virtual speaker placement unit 150 updates the information on these virtual speakers in the virtual speaker position table 113. Specifically, for each virtual speaker whose placement confirmation flag is “FALSE”, the virtual speaker placement unit 150 registers the calculated position information in the speaker position item and updates the placement confirmation flag item to “TRUE”.
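
The per-region calculation in step S27 can be illustrated as follows. This sketch works in angles only, assumes degrees measured clockwise, and uses a hypothetical function name; converting each angle to coordinates on the predetermined circumference is left out.

```python
def region_speaker_angles(start_deg, both_end_angle_deg, divisions):
    """Angles of the speakers strictly inside one arrangement area.

    start_deg: direction of the area's starting end, clockwise, in degrees.
    both_end_angle_deg: angle between the two ends of the area.
    divisions: number of equal parts the area is split into.
    Returns divisions - 1 angles, each normalized to [0, 360).
    """
    if divisions < 1:
        raise ValueError("divisions must be at least 1")
    step = both_end_angle_deg / divisions  # the "angle (after division)" value
    return [(start_deg + step * k) % 360.0 for k in range(1, divisions)]


# Assumed area starting at 350 degrees, spanning 60 degrees, split into 3 parts.
print(region_speaker_angles(350.0, 60.0, 3))  # [10.0, 30.0]
```

An area with a number of divisions of 1 yields an empty list, which corresponds to an arrangement area that received no additional virtual speaker.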

The virtual speaker placement unit 150 then advances the processing to step S16.
Next, FIGS. 21 to 25 illustrate a concrete example of the processing when virtual speakers are placed according to steps S21 to S25 of FIG. 20.

FIG. 21 is a first diagram illustrating an example of virtual speaker placement. In step S21 of FIG. 20, the sound sources located at the two ends of the placement areas are identified. Here, as in the example of FIG. 10, suppose that the pair of sound source IDs #4 and #5 and the pair of sound source IDs #8 and #9 are extracted as pairs of sound sources whose angle is equal to or greater than the threshold. As a result, the sound sources with IDs #4, #5, #8, and #9 are identified as end sound sources.

Accordingly, the virtual speaker placement unit 150 updates the both-ends flag item to "TRUE" for the sound sources with IDs #4, #5, #8, and #9, as shown in the sound source management table 112a.
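The identification of end sound sources in step S21 can be sketched as below. The source directions are hypothetical values chosen only so that the pairs (#4, #5) and (#8, #9) come out as in the example; the figures' actual angles are not given in the text. The 45° threshold assumes eight virtual speakers and the rule in the claims that the threshold may be 360° divided by the number of virtual speakers. The function name `find_end_sources` is also an assumption.

```python
def find_end_sources(source_angles, threshold_deg):
    """Return the IDs of sources in any adjacent pair separated by >= threshold.

    source_angles maps a source ID to its direction in degrees as seen from
    the user (clockwise from straight ahead, an assumed convention).
    """
    ordered = sorted(source_angles.items(), key=lambda item: item[1] % 360.0)
    ends = set()
    for i, (sid, angle) in enumerate(ordered):
        # The neighbour of the last source wraps around to the first one.
        next_sid, next_angle = ordered[(i + 1) % len(ordered)]
        gap = (next_angle - angle) % 360.0
        if gap >= threshold_deg:
            ends.update((sid, next_sid))
    return ends

# Hypothetical directions for sources #1 .. #9.
angles = {1: 340.0, 2: 0.0, 3: 20.0, 4: 40.0, 5: 160.0,
          6: 170.0, 7: 180.0, 8: 190.0, 9: 320.0}
threshold = 360.0 / 8  # eight virtual speakers -> 45 degrees
end_sources = find_end_sources(angles, threshold)
```

Under these assumed directions the only gaps of 45° or more are #4 to #5 (120°) and #8 to #9 (130°), so the four sources #4, #5, #8, and #9 are returned.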

FIG. 22 is a second diagram illustrating an example of virtual speaker placement. In step S22 of FIG. 20, the virtual speaker placement unit 150 places the virtual speakers with virtual speaker IDs V1, V2, V3, and V4 on the predetermined circumference, in the directions in which the sound sources identified in step S21 of FIG. 20 are located as seen from the user 30. As shown in the virtual speaker position table 113a, the virtual speaker placement unit 150 registers, in the speaker position items corresponding to virtual speaker IDs V1, V2, V3, and V4, the position information of the virtual speakers placed in the directions of the sound sources with IDs #4, #5, #8, and #9, respectively, and updates the corresponding placement confirmation flag items to "TRUE".

In step S23 of FIG. 20, the virtual speaker placement unit 150 refers to the sound source management table 112a and identifies, as placement areas, those regions between adjacent end sound sources that also contain sound sources at positions other than their two ends. Suppose that the regions with area IDs Tθ5-8 and Tθ9-4 are identified as placement areas. In this case, the virtual speaker placement unit 150 registers the area ID, end-to-end angle, search flag, angle (after division), and number of divisions of each placement area, as shown in the placement area management table 115a.
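Step S23 can be sketched as a scan over adjacent end sound sources that keeps only the regions containing at least one other source. The source directions below are hypothetical (chosen so that the two areas span 30° and 80°, as in the example), and the function name, dictionary keys, and the shortened area IDs such as "T5-8" are assumptions for illustration.

```python
def find_placement_areas(source_angles, end_ids):
    """Return the regions between adjacent end sources that contain other sources."""
    ends = sorted(end_ids, key=lambda sid: source_angles[sid] % 360.0)
    areas = []
    for i, start in enumerate(ends):
        stop = ends[(i + 1) % len(ends)]  # wrap around the circumference
        span = (source_angles[stop] - source_angles[start]) % 360.0
        # Non-end sources lying strictly between the two ends of this region.
        inside = [sid for sid, angle in source_angles.items()
                  if sid not in end_ids
                  and 0.0 < (angle - source_angles[start]) % 360.0 < span]
        if inside:
            areas.append({"id": f"T{start}-{stop}",
                          "end_to_end_angle": span,
                          "divisions": 1})
    return areas

# Hypothetical directions (degrees, clockwise from straight ahead).
angles = {1: 340.0, 2: 0.0, 3: 20.0, 4: 40.0, 5: 160.0,
          6: 170.0, 7: 180.0, 8: 190.0, 9: 320.0}
areas = find_placement_areas(angles, end_ids={4, 5, 8, 9})
```

The regions from #4 to #5 and from #8 to #9 contain no other sources and are skipped, which leaves the two placement areas of the example with an initial division count of 1.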

As the virtual speaker position table 113a shows, four virtual speakers have a placement confirmation flag of "TRUE". Accordingly, although not illustrated, the placement information 114 is also updated: the number of placed virtual speakers changes from "0" to "4", and the number of unplaced virtual speakers changes from "8" to "4".

FIG. 23 is a third diagram illustrating an example of virtual speaker placement. In step S24 of FIG. 20, the placement area that will receive one additional virtual speaker is selected from the placement areas identified in step S23. In step S25 of FIG. 20, one virtual speaker is assigned to the selected area.

Specifically, in step S24 of FIG. 20, the virtual speaker placement unit 150 first selects the placement area whose angle (after division) is largest. As shown in the placement area management table 115a, the end-to-end angle of the placement area with area ID Tθ9-4 is 80°, and that of the placement area with area ID Tθ5-8 is 30°. Because neither area has been divided yet, the angle (after division) of each area equals its end-to-end angle. The virtual speaker placement unit 150 therefore selects the placement area with area ID Tθ9-4 as the area to receive one additional virtual speaker.

Next, in step S25 of FIG. 20, the virtual speaker placement unit 150 updates the search flag of the selected placement area, area ID Tθ9-4, to "TRUE", as shown in the placement area management table 115b.

Then, as shown in the placement area management table 115c, for the placement area whose search flag is "TRUE", the virtual speaker placement unit 150 increments the number of divisions by 1 to "2", updates the angle (after division) item to "40", the value obtained by dividing the end-to-end angle by the updated number of divisions, and resets the search flag item to "FALSE". As shown in the placement information 114b, it also increments the number of placed virtual speakers by 1 to "5" and decrements the number of unplaced virtual speakers by 1 to "3".

FIG. 24 is a fourth diagram illustrating an example of virtual speaker placement. Next, the virtual speaker placement unit 150 performs the same processing as in FIG. 23 on the placement areas. As the placement area management table 115c shows, the placement area with the largest angle (after division) is the one with area ID Tθ9-4.

Accordingly, as shown in the placement area management table 115d, the virtual speaker placement unit 150 increments the number of divisions of the placement area with area ID Tθ9-4 by 1 to "3" and updates its angle (after division) to "26", the end-to-end angle divided by the updated number of divisions (80° / 3 with the fractional part dropped). As shown in the placement information 114c, it also increments the number of placed virtual speakers by 1 to "6" and decrements the number of unplaced virtual speakers by 1 to "2".

Next, the virtual speaker placement unit 150 performs the same processing as in FIG. 23 on the placement areas after the virtual speaker has been added. As the placement area management table 115d shows, the placement area with the largest angle (after division) is now the one with area ID Tθ5-8.

Accordingly, as shown in the placement area management table 115e, the virtual speaker placement unit 150 increments the number of divisions of the placement area with area ID Tθ5-8 by 1 to "2" and updates its angle (after division) to "15", the end-to-end angle divided by the updated number of divisions. As shown in the placement information 114d, it also increments the number of placed virtual speakers by 1 to "7" and decrements the number of unplaced virtual speakers by 1 to "1".

FIG. 25 is a fifth diagram illustrating an example of virtual speaker placement. After the destination placement area has been determined for each unplaced virtual speaker as in FIGS. 23 and 24, in step S27 of FIG. 20 the virtual speaker placement unit 150 places the added virtual speakers on the predetermined circumference so that, within each placement area, the angles between adjacent virtual speakers as measured around the user 30 are equal. The angle between virtual speakers in each placement area is the angle (after division) value associated with that area in the placement area management table 115.

As a result, as shown in the virtual speaker position table 113b, the coordinates and placement confirmation flags are updated for the virtual speakers with IDs V5, V6, V7, and V8 that were newly placed in FIGS. 23 and 24.
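The repeated selection in FIGS. 23 and 24 amounts to a greedy loop: each unplaced speaker goes to the area whose current angle (after division) is largest. The sketch below replays the example with the 80° and 30° areas and four unplaced speakers. The function name `allocate_speakers` and the short area IDs are assumptions, and the code keeps fractional angles (26.67 rather than the "26" shown in the tables), which does not change the outcome here. How ties are broken is not stated in the text; this sketch simply takes the first maximum.

```python
def allocate_speakers(areas, unplaced):
    """Greedily assign speakers to the area with the largest per-division angle.

    areas maps an area ID to its end-to-end angle in degrees.
    Returns a dict of area ID -> (number of divisions, angle after division).
    """
    divisions = {area_id: 1 for area_id in areas}
    for _ in range(unplaced):
        # Steps S24/S25: pick the widest per-division angle, then split once more.
        target = max(areas, key=lambda a: areas[a] / divisions[a])
        divisions[target] += 1
    return {a: (divisions[a], areas[a] / divisions[a]) for a in areas}

result = allocate_speakers({"T9-4": 80.0, "T5-8": 30.0}, unplaced=4)
```

The first three iterations reproduce the tables 115c to 115e (T9-4 to 2 divisions, T9-4 to 3, then T5-8 to 2). The fourth speaker goes to T9-4 again, giving four divisions of 20° there and two divisions of 15° in T5-8, that is, three new speakers in one area and one in the other.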

According to the audio processing system of the second embodiment, the audio processing device 100 calculates, for each two sound sources that are adjacent along the predetermined circumference as seen from the user 30, the angle between them along that circumference, centered on the user 30. The audio processing device 100 identifies the virtual sound sources for which the calculated angle is equal to or greater than the threshold as end sound sources.

Next, virtual speakers are placed in the directions in which the identified end sound sources are located as seen from the user 30. Then, among the regions on the predetermined circumference that lie between already placed virtual speakers (that is, between end sound sources), those that also contain sound sources at positions other than their two ends are identified as placement areas.

Then, the following processing is repeated until no unplaced virtual speakers remain: among the identified placement areas, the one with the largest angle between adjacent virtual speakers, centered on the user 30, receives one unplaced virtual speaker, with the virtual speakers in that area spaced evenly.

This reduces the angle between the straight line connecting a virtual speaker to the user 30 and the straight line connecting a sound source adjacent to that virtual speaker to the user 30. When the audio signal output by a sound source is distributed to virtual speakers, the smaller this angle is, the smaller the HRTF error becomes, which suppresses the loss of sound image localization. Reducing this angle therefore suppresses the loss of localization for the sound image of each sound source. As a result, the sense of sound image localization can be improved while reducing the processing load of localizing the sound images of multiple virtual sound sources.

[Modification of the Second Embodiment]
Next, a modification of the second embodiment is described. This modification further improves the sense of direction of sound sources located within a desired range of directions relative to the listener. For example, it may be desirable to improve the sense of direction of sound sources in front of the listener more than that of sound sources behind the listener. This is because the human sense of direction tends to become ambiguous more readily toward the front than toward the rear. In this case, the goal can be achieved by placing more virtual speakers in the frontal range of directions than in the rear range. FIGS. 26 and 27 therefore illustrate an example in which weighting each placement area according to its position causes more virtual speakers to be placed in placement areas in front of the user 30. For FIGS. 26 and 27, only the differences from the second embodiment are described; the configuration and processing shared with the second embodiment are omitted.

FIG. 26 is a diagram illustrating an example of how weights are set for placement areas. The circumference 34 is divided into regions K1, K2, K3, and K4. Region K1 lies in front of the user 30, region K2 to the left, region K3 to the right, and region K4 behind. For example, a weight of 1.0 is set for region K1, 0.8 for regions K2 and K3, and 0.6 for region K4. In this way, a larger weight is set toward the front of the user 30 than toward the rear.

Suppose that placement areas T1 and T2 exist on the circumference 34. The direction 35a that bisects the angle between the two end directions of placement area T1 falls within region K1, so the weight of placement area T1 is 1.0. The direction 35b that bisects the angle between the two end directions of placement area T2 falls within region K4, so the weight of placement area T2 is 0.6.
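The weight lookup by bisecting direction can be sketched as follows. The text gives the weights (1.0, 0.8, 0.6) but not the boundaries of regions K1 to K4, so this sketch assumes four 90° quadrants centered on front, right, rear, and left, with 0° straight ahead and angles increasing clockwise. The function name `area_weight` is likewise hypothetical.

```python
def area_weight(start_deg, span_deg):
    """Return the weight of a placement area from the direction that bisects it.

    Assumed region boundaries (not stated in the text): K1 front is
    [315, 45), K3 right is [45, 135), K4 rear is [135, 225), K2 left is [225, 315).
    """
    bisector = (start_deg + span_deg / 2.0) % 360.0
    if bisector >= 315.0 or bisector < 45.0:
        return 1.0  # K1: front
    if 135.0 <= bisector < 225.0:
        return 0.6  # K4: rear
    return 0.8      # K2 or K3: left or right

front_weight = area_weight(start_deg=320.0, span_deg=80.0)  # bisector at 0 degrees
rear_weight = area_weight(start_deg=160.0, span_deg=30.0)   # bisector at 175 degrees
```

Only the bisector decides the weight, so an area that straddles two regions still receives a single weight, as with T1 and T2 in FIG. 26.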

FIG. 27 is a diagram illustrating a modification of virtual speaker placement that takes weights into account. The placement area management tables 116a and 116b add angle (after weighting) and weight items to the placement area management table 115. The angle (after weighting) item holds the angle (after division) value multiplied by the weight. The weight item holds the weight determined by the position of the placement area, as described for FIG. 26.

The following describes an example of identifying the placement area to which a virtual speaker is added and assigning the virtual speaker in steps S24 and S25 of FIG. 20.
First, in the modification of the system of the second embodiment, a virtual speaker is added to the placement area whose angle (after weighting), rather than angle (after division), is largest. In the example of the placement area management table 116a, the placement area with area ID Tθ9-4 is selected as the area to which a virtual speaker is added, and the search flag of that placement area is updated to "TRUE".

Then, as shown in the placement area management table 116b, the angle (after division), angle (after weighting), and number of divisions of the placement area with area ID Tθ9-4 are updated. In other words, the angle of the placement area with area ID Tθ5-8, which lies further to the rear than the area with ID Tθ9-4, is corrected by the weighting to a value smaller than its actual value before being used in the calculation. Consequently, virtual speakers are added preferentially to the placement area with area ID Tθ9-4, which lies further forward, and the sense of direction toward the front improves.
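The effect of weighting on the selection can be sketched by ranking areas on angle (after division) multiplied by weight. The weights assigned to the two areas below (1.0 for the forward area, 0.6 for the rear one) are assumptions borrowed from the FIG. 26 example values; the actual contents of tables 116a and 116b are not given in the text. The function name `allocate_weighted` is also hypothetical.

```python
def allocate_weighted(areas, unplaced):
    """Greedy assignment that ranks areas by angle (after division) times weight.

    areas maps an area ID to (end-to-end angle in degrees, weight).
    Returns a dict of area ID -> number of divisions.
    """
    divisions = {area_id: 1 for area_id in areas}

    def weighted_angle(area_id):
        span, weight = areas[area_id]
        return span / divisions[area_id] * weight

    for _ in range(unplaced):
        target = max(areas, key=weighted_angle)
        divisions[target] += 1
    return divisions

# Assumed weights: forward area 1.0, rear area 0.6.
weighted = allocate_weighted({"T9-4": (80.0, 1.0), "T5-8": (30.0, 0.6)}, unplaced=4)
```

With these assumed weights the rear area competes with a weighted angle of 18° instead of 30°, so all four unplaced speakers go to the forward area (five divisions), whereas the unweighted procedure of the second embodiment would have given one of them to the rear area.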

The number of regions described for FIG. 26 may be set to four or more. It may also be made possible to set the maximum number of virtual speakers that can be placed in each of the regions K1, K2, K3, and K4.
According to the modification of the second embodiment, the audio processing device 100 weights the angles between virtual speakers based on the position of each placement area, and can thereby place more virtual speakers in the direction desired by the user 30. Placing more virtual speakers reduces the angle between the direction of a sound source and the direction of a virtual speaker, both as seen from the user 30. The sense of localization therefore improves for the sound images of the sound sources located in the direction the user 30 desires.

As described above, the information processing of the first embodiment can be realized by causing the audio processing device 1 to execute a program, and the information processing of the second embodiment can be realized by causing the audio processing device 100 or the user terminal 200 to execute a program. Such a program can be recorded on a computer-readable recording medium (for example, the recording medium 15). Examples of usable recording media include magnetic disks, optical discs, magneto-optical disks, and semiconductor memory. Magnetic disks include FDs and HDDs. Optical discs include CDs, CD-R (Recordable)/RW (Rewritable), DVDs, and DVD-R/RW.

When the program is distributed, for example, a portable recording medium on which the program is recorded is provided. Alternatively, the program may be stored in a storage device of another computer and distributed via the network 10. The computer stores, for example, the program recorded on the portable recording medium, or the program received from the other computer, in a storage device (for example, the HDD 103), then reads the program from that storage device and executes it. However, the program read from the portable recording medium may be executed directly, and the program received from another computer via the network 10 may also be executed directly. At least part of the information processing described above can also be realized with electronic circuits such as a DSP, an ASIC, or a PLD (Programmable Logic Device).

Description of Symbols
1 Audio processing device
2 Speaker placement unit
3 Audio synthesis unit
4 Listener
5a, 5b, 5c, 5d Virtual sound sources
6a, 6b, 6c Virtual speakers
7 Circumference
θ1, θ2, θ3, θ4 Angles

Claims

An audio processing apparatus comprising:
a speaker arrangement unit that arranges, on a circumference centered on a listener, a plurality of virtual speakers fewer in number than a plurality of virtual sound sources arranged around the listener; and
a voice synthesizer that distributes the audio signal from each of the plurality of virtual sound sources to one or more virtual speakers selected for that virtual sound source from among the plurality of virtual speakers, and that synthesizes the audio signals distributed to the virtual speakers into left and right two-channel audio signals using a head-related transfer function corresponding to the position of each virtual speaker,
wherein the speaker arrangement unit
calculates, for two virtual sound sources adjacent in the direction along the circumference as seen from the listener, the angle between them in the direction along the circumference, centered on the listener, and
arranges the plurality of virtual speakers in a second range obtained by excluding, from the range of directions along the circumference around the listener, a first range that lies between virtual sound sources for which the calculated angle is equal to or greater than a threshold and that contains no position in the direction of any virtual sound source as seen from the listener.

The audio processing apparatus according to claim 1, wherein the threshold is a value obtained by dividing 360° by the number of virtual speakers.

The audio processing apparatus according to claim 1, wherein the speaker arrangement unit identifies a pair of virtual sound sources for which the calculated angle is equal to or greater than the threshold, arranges a virtual speaker in the direction in which each virtual sound source of the pair is located as seen from the listener, and distributes the remaining virtual speakers within the second range.

The audio processing apparatus according to claim 3, wherein, in the second range, the speaker arrangement unit arranges the remaining virtual speakers so as to reduce the distance, in the direction along the circumference as seen from the listener, between each of two adjacent virtual speakers and a virtual sound source positioned between those two virtual speakers.

The audio processing apparatus according to claim 3, wherein, when a plurality of the pairs are identified, the speaker arrangement unit arranges a virtual speaker in the direction in which each virtual sound source of each identified pair is located as seen from the listener, identifies, among the plurality of ranges lying between the arranged virtual speakers, one or more third ranges in which a virtual sound source also exists at a position other than the two ends, and distributes the unarranged virtual speakers among the identified third ranges.

The audio processing apparatus according to claim 5, wherein, when a plurality of the third ranges are identified, the speaker arrangement unit repeats, until no unarranged virtual speakers remain, processing that calculates the angle, centered on the listener, between two adjacent virtual speakers in each identified third range and arranges one unarranged virtual speaker in the third range where the calculated angle is largest so that the virtual speakers in that range are evenly spaced.

The audio processing apparatus according to claim 6, wherein, when calculating the angle, centered on the listener, between two adjacent virtual speakers in each of the identified third ranges, the voice synthesizer weights the calculated angle according to the position of the corresponding third range.

An audio processing method comprising:
arranging, on a circumference centered on a listener, a plurality of virtual speakers fewer in number than a plurality of virtual sound sources arranged around the listener; and
distributing the audio signal from each of the plurality of virtual sound sources to one or more virtual speakers selected for that virtual sound source from among the plurality of virtual speakers, and synthesizing the audio signals distributed to the virtual speakers into left and right two-channel audio signals using a head-related transfer function corresponding to the position of each virtual speaker,
wherein the arranging of the plurality of virtual speakers includes
calculating, for two virtual sound sources adjacent in the direction along the circumference as seen from the listener, the angle between them in the direction along the circumference, centered on the listener, and
arranging the plurality of virtual speakers in a second range obtained by excluding, from the range of directions along the circumference around the listener, a first range that lies between virtual sound sources for which the calculated angle is equal to or greater than a threshold and that contains no position in the direction of any virtual sound source as seen from the listener.

An audio processing program that causes a computer to execute processing comprising:
arranging, on a circumference centered on a listener, a plurality of virtual speakers fewer in number than a plurality of virtual sound sources arranged around the listener; and
distributing the audio signal from each of the plurality of virtual sound sources to one or more virtual speakers selected for that virtual sound source from among the plurality of virtual speakers, and synthesizing the audio signals distributed to the virtual speakers into left and right two-channel audio signals using a head-related transfer function corresponding to the position of each virtual speaker,
wherein the arranging of the plurality of virtual speakers includes
calculating, for two virtual sound sources adjacent in the direction along the circumference as seen from the listener, the angle between them in the direction along the circumference, centered on the listener, and
arranging the plurality of virtual speakers in a second range obtained by excluding, from the range of directions along the circumference around the listener, a first range that lies between virtual sound sources for which the calculated angle is equal to or greater than a threshold and that contains no position in the direction of any virtual sound source as seen from the listener.