JP2006094315A

JP2006094315A - Stereophonic reproduction system

Info

Publication number: JP2006094315A
Application number: JP2004279602A
Authority: JP
Inventors: Yasushi Kaneda (金田 泰)
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2004-09-27
Filing date: 2004-09-27
Publication date: 2006-04-06

Abstract

PROBLEM TO BE SOLVED: To enable the direction of a sound source in a virtual space to be determined (recognized) more accurately.

SOLUTION: A client 201 has accepting means 231 and 232 for accepting a swing instruction to swing the user's head left and right or up and down in a virtual space, a client transmission means 222 for transmitting the swing instruction accepted by the accepting means to an acoustic server 120, a client reception means 215 for receiving from the acoustic server 120 stereophonic sound in which the acoustic effect of each sound source has been controlled, and an output means 217 for outputting the stereophonic sound received by the client reception means.

COPYRIGHT: (C)2006, JPO & NCIPI

Description

The present invention relates to a technique for reproducing sound three-dimensionally according to the position of a sound source in a virtual space.

The first background technology related to the present invention is stereophonic sound generation based on digital signal processing (hereinafter, "3D audio technology"). 3D audio technology uses digital signal processing to generate signals for multi-channel stereo reproduction over a plurality of speakers, or for binaural reproduction over stereo headphones (see, for example, Non-Patent Document 1).

Some 3D audio technologies are based on digital signal processing that uses a head-related transfer function (HRTF) or a head-related impulse response (HRIR). An HRTF expresses, as a transfer function (frequency response), how sound is altered by the human head and its surroundings, such as the pinnae and shoulders. An HRIR expresses the same alteration as an impulse response. The HRTF is the Fourier transform of the HRIR. Techniques using an HRTF or HRIR allow a listener to identify the direction of a sound source (that is, left/right, front/back, and up/down) in the reproduced sound.
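
The relationship between the two representations can be illustrated with a minimal Python sketch. It is an illustration only: the HRIR values are made up rather than measured, and the helper names `dft` and `convolve` are hypothetical. It shows that an HRTF can be obtained as the discrete Fourier transform of an HRIR, and that a binaural signal can be produced by filtering a mono signal with the HRIR of each ear.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform (O(N^2)); adequate for short HRIRs."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def convolve(signal, hrir):
    """Direct convolution: filters a mono signal with one ear's HRIR."""
    out = [0.0] * (len(signal) + len(hrir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(hrir):
            out[i + j] += s * h
    return out

# Made-up HRIRs for a source on the listener's right (assumption, not measured
# data): the right ear receives the sound earlier and louder than the left ear.
hrir_right = [1.0, 0.3, 0.0, 0.0]
hrir_left = [0.0, 0.0, 0.5, 0.2]

# HRTF = Fourier transform of HRIR (one complex value per frequency bin).
hrtf_right = dft(hrir_right)
hrtf_left = dft(hrir_left)

# Binaural rendering: one filtered copy of the mono signal per ear.
mono = [1.0, 0.0, -1.0]
binaural_left = convolve(mono, hrir_left)
binaural_right = convolve(mono, hrir_right)
```

A practical system would use measured HRIR sets and a fast Fourier transform; the direct forms are used here only to keep the sketch self-contained.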

The second background technology related to the present invention is a conference system that uses a virtual space (see, for example, Patent Document 1, Patent Document 2, and Non-Patent Document 2). A conference system is a system in which a plurality of users share a virtual space and users in the same space can converse with one another.

Patent Document 1: US 6,408,327 B1

Patent Document 2: US 5,889,843

Non-Patent Document 1: Begault, D. R., "3-D Sound for Virtual Reality and Multimedia", NASA/TM-2000-XXXX, NASA Ames Research Center, April 2000, http://human-factors.arc.nasa.gov/ihh/spatial/papers/pdfs_db/Begault_2000_3d_Sound_Multimedia.pdf

Non-Patent Document 2: Singer, A., Hindus, D., Stifelman, L., and White, S., "Tangible Progress: Less Is More In Somewire Audio Spaces", ACM CHI '99 (Conference on Human Factors in Computing Systems), pp. 104-112, May 1999.

When listening to reproduced sound output from speakers or headphones using 3D audio technology, it is not easy for a person (the listener) to accurately determine (recognize) the direction of a sound source.

The first reason is the fundamental fact that a person has only two ears, one on the left and one on the right. In real space, a person identifies the direction of a sound source from the level difference and time difference between the two ears, changes in frequency characteristics, and so on. However, because there are only two ears, left and right, people inherently have a poor ability to discriminate front from back or up from down. As a result, people often misjudge the front/back or up/down direction even of sounds heard directly in real space.

The second reason is that sound reproduced using 3D audio technology does not accurately reflect the HRTF (or HRIR), which differs from person to person. The data used for the HRTF and the like is measured with a dummy head that has a standard human head shape and pinnae. However, head shape and pinnae vary between individuals, so a listener whose head and pinnae differ from those of the dummy head cannot easily determine (recognize) the direction of a sound source accurately from sound reproduced using 3D audio technology.

The present invention has been made in view of the above circumstances, and its object is to enable the direction of a sound source in a virtual space to be determined (recognized) more accurately.

To solve the above problem, the present invention accepts a swing instruction to swing the user's head left and right or up and down in the virtual space, and controls the sound of the sound source with the head in the swung state.

For example, the invention is a stereophonic sound reproduction system that controls the sound of at least one sound source existing in a virtual space, and has an acoustic server that controls the acoustic effect of each sound source and a client used by a user. The client has accepting means for accepting a swing instruction to swing the user's head left and right or up and down in the virtual space, client transmission means for transmitting the swing instruction accepted by the accepting means to the acoustic server, client reception means for receiving from the acoustic server stereophonic sound in which the acoustic effect of each sound source has been controlled, and output means for outputting the stereophonic sound received by the client reception means.

The acoustic server has server storage means for storing the position and orientation of the user and of each sound source in the virtual space, server reception means for receiving, from each sound source, the sound that the sound source outputs, acoustic control means for controlling the acoustic effect applied to each sound received by the server reception means based on the positions and orientations of the user and of each sound source stored in the server storage means, and server transmission means for transmitting to the client the stereophonic sound whose acoustic effect has been controlled by the acoustic control means. When a swing instruction is received from the client, the acoustic control means controls the acoustic effect applied to each received sound based on an orientation changed left or right, or up or down, from the user's orientation stored in the server storage means.
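
The handling of a swing instruction on the acoustic server side can be sketched as follows in Python. This is a minimal sketch under stated assumptions, not the implementation described in this publication: the names `Pose`, `AcousticServerState`, `relative_azimuth`, and `azimuths_for_swing` are hypothetical, the virtual space is reduced to two dimensions, and yaw is measured clockwise (positive to the right) with 0° facing the +y axis. The point illustrated is that the swing is applied as an offset to the stored orientation, which itself is left unchanged.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Pose:
    x: float
    y: float
    yaw_deg: float = 0.0  # 0 = facing +y; positive = turned to the right

@dataclass
class AcousticServerState:
    """Hypothetical stand-in for the server storage means."""
    listener: Pose
    sources: dict = field(default_factory=dict)  # source name -> Pose

def relative_azimuth(listener, source, yaw_offset_deg=0.0):
    """Direction of the source as seen by the listener, in degrees within
    [-180, 180); positive values are to the listener's right."""
    bearing = math.degrees(math.atan2(source.x - listener.x,
                                      source.y - listener.y))
    azimuth = bearing - (listener.yaw_deg + yaw_offset_deg)
    return (azimuth + 180.0) % 360.0 - 180.0

def azimuths_for_swing(state, yaw_offset_deg=0.0):
    """Azimuth of every source for the stored orientation plus a swing offset.
    The stored orientation is not modified."""
    return {name: relative_azimuth(state.listener, pose, yaw_offset_deg)
            for name, pose in state.sources.items()}

state = AcousticServerState(Pose(0.0, 0.0), {"speaker_a": Pose(1.0, 1.0)})
at_rest = azimuths_for_swing(state)            # source is front right
swung_left = azimuths_for_swing(state, -10.0)  # head turned 10 degrees left
```

The resulting azimuth would then select the acoustic effect (for example, the HRIR pair) applied to each received sound; that step is omitted here.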

According to the present invention, the user virtually swings his or her head left and right or up and down in the virtual space. The user can then grasp the direction of a sound source more accurately from the resulting change in the sound heard from that source.

Embodiments of the present invention are described below.

FIG. 1 is a system configuration diagram of a stereophonic sound reproduction system to which an embodiment of the present invention is applied. As illustrated, the system has a plurality of clients 201, 202, and 203, each used by one of a plurality of users, a room management server 110, an acoustic server 120, and a registration server 130. These devices 201, 202, 203, 110, 120, and 130 are connected via a network 101 such as the Internet.

The room management server 110 manages the virtual space and the presence (position information and the like) of the users in that virtual space, and performs session control. Presence consists of the virtual space itself and the position information (sense of presence) of each user within it. The acoustic server 120 spatializes and mixes the audio signals input from the clients 201, 202, and 203 using 3D audio technology. The registration server 130 registers or authenticates each user.

Although this embodiment has three clients, the number of clients is not limited to three and may be two, or four or more. Also, in this embodiment the network 101 consists of a single domain, but the network may consist of a plurality of domains that are linked so that communication spans them. In that case, there are a plurality of room management servers 110, acoustic servers 120, and registration servers 130.

Next, the hardware configuration of the stereophonic sound reproduction system is described.

FIG. 2 shows the hardware configuration of each device: the clients 201, 202, and 203, the room management server 110, the acoustic server 120, and the registration server 130.

The clients 201, 202, and 203 can each be a general computer system having a CPU 301 that processes and computes data according to a program, a memory 302 that the CPU 301 can read and write directly, an external storage device 303 such as a hard disk, a communication device 304 for data communication with external systems, an input device 305, and an output device 306. Examples are portable computer systems such as a PDA (Personal Digital Assistant), a wearable computer, or a PC (Personal Computer). The input device 305 and the output device 306 are described later with reference to FIG. 3.

The room management server 110, the acoustic server 120, and the registration server 130 can each be a general computer system having at least a CPU 301 that processes and computes data according to a program, a memory 302 that the CPU 301 can read and write directly, an external storage device 303 such as a hard disk, and a communication device 304 for data communication with external systems. Specific examples are a server or a host computer.

The functions of each device described below are realized by the CPU 301 executing a predetermined program loaded or stored in the memory 302 (a client program for the clients 201, 202, and 203, a room management server program for the room management server 110, an acoustic server program for the acoustic server 120, and a registration server program for the registration server 130).

Next, the input device 305, the output device 306, and the functional configuration of the client 201 are described with reference to FIG. 3. The clients 202 and 203 have the same configuration.

As the input device 305, the client 201 has a microphone 211, a pointing device 230, a left/right swing button 231, and an up/down swing button 232. The pointing device 230 is a device with which the user inputs his or her own movement information (position information and direction information) in the virtual space. The left/right swing button 231 is an input device for instructing that the user swing his or her head left and right in the virtual space (that is, rotate the head to the left and right). The up/down swing button 232 is an input device for instructing that the user swing his or her head up and down in the virtual space (that is, rotate the head upward or downward).

As the output device 306, the client 201 has headphones 217 that support 3D audio technology and a display 220.

As its functional configuration, the client 201 has an audio encoder 212, an audio decoder 216, an audio communication unit 215, a graphics renderer 219, a space modeler 221, a presence provider 222, and a session control unit 223.

The audio encoder 212 converts the voice (analog signal) input from the microphone 211 into an audio signal (digital signal) and outputs it to the audio communication unit 215.

The audio communication unit 215 transmits and receives audio signals to and from the acoustic server 120 in real time. That is, it transmits the user's own audio signal, converted by the audio encoder 212, to the acoustic server 120. It also receives from the acoustic server 120 the audio signals of other users, which have been spatialized using 3D audio technology with processing that follows from the attributes of the virtual space, such as reverberation and filtering.

The audio decoder 216 converts the spatialized audio signal input from the audio communication unit 215 into sound (analog signal) and outputs it to the headphones 217.

The graphics renderer 219 performs processing that follows from the attributes of the virtual space and generates the image data of the virtual space to be output to the display. The space modeler 221 accepts the movement information input from the pointing device 230 and calculates the user's presence, such as position and orientation, in the virtual space. The space modeler 221 also accepts instructions input from the left/right swing button 231 or the up/down swing button 232 and calculates the orientation that results when the user swings his or her head in the virtual space.

The presence provider 222 exchanges the position information and direction information of each user in the virtual space with the room management server 110. It also transmits to the acoustic server 120 any change in the user's orientation caused by pressing the left/right swing button 231 or the up/down swing button 232. One conceivable protocol for exchanging this information is the extended specification of SIP (Session Initiation Protocol) that is being standardized by the IETF (Internet Engineering Task Force).

The session control unit 223 controls the communication session with the room management server 110. One conceivable protocol for this session control is SIP as standardized in IETF document RFC 3261.

Here, a virtual space is a space created virtually so that a plurality of users can hold a meeting or conversation, or so that users can listen to music or Internet broadcasts. The room management server 110 manages the virtual space. When a user enters a virtual space, the room management server 110 transmits the attributes of that virtual space, together with the position information and direction information of the other users in it. The space modeler 221 stores this transmitted information, along with the user's own position information and direction information in the virtual space input from the pointing device 230, in the memory 302 or the external storage device 303.

The attributes of the virtual space include, for example, the size of the space, the height of the ceiling, the reflectance, color, and texture of the walls and ceiling, the reverberation characteristics, and the rate at which the air in the space absorbs sound. Of these, the reflectance of the walls and ceiling, the reverberation characteristics, and the sound absorption rate of the air are auditory attributes; the color and texture of the walls and ceiling are visual attributes; and the size of the space and the height of the ceiling are attributes that relate to both hearing and vision.
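
As an illustration, these attributes and their classification could be held in a structure such as the following Python sketch. The class name, field names, units (meters, seconds), and the helper `attributes_for` are assumptions introduced for illustration; the publication does not specify a data format.

```python
from dataclasses import dataclass

@dataclass
class VirtualSpaceAttributes:
    """Hypothetical container for the attributes of one virtual space."""
    size_m: tuple                 # (width, depth) of the space
    ceiling_height_m: float
    surface_reflectance: float    # acoustic reflectance of walls and ceiling
    surface_color: str
    surface_texture: str
    reverberation_time_s: float   # stands in for the reverberation characteristics
    air_absorption: float         # sound absorption rate of the air

# Which sense each attribute relates to, following the classification above.
ATTRIBUTE_MODALITY = {
    "size_m": {"auditory", "visual"},
    "ceiling_height_m": {"auditory", "visual"},
    "surface_reflectance": {"auditory"},
    "reverberation_time_s": {"auditory"},
    "air_absorption": {"auditory"},
    "surface_color": {"visual"},
    "surface_texture": {"visual"},
}

def attributes_for(modality):
    """Names of the attributes that a given renderer needs, sorted by name."""
    return sorted(name for name, senses in ATTRIBUTE_MODALITY.items()
                  if modality in senses)

room = VirtualSpaceAttributes((8.0, 6.0), 2.7, 0.6, "white", "plaster", 0.4, 0.01)
audio_inputs = attributes_for("auditory")   # what an acoustic server would read
visual_inputs = attributes_for("visual")    # what a graphics renderer would read
```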

Next, the operation of each function is described in the order of presence, audio, and video.

For presence, the pointing device 230 accepts input of movement information (position information or direction information) from the user, converts it into a digital signal, and inputs it to the space modeler 221. The space modeler 221 accepts the input from the pointing device 230 and changes the user's position and orientation in the virtual space. That is, the space modeler 221 changes the user's position and orientation in the virtual space based on the attributes of the virtual space held in the memory 302 or the external storage device 303.

The space modeler 221 then transmits the user's position information and direction information in the virtual space to the room management server 110 via the presence provider 222. It also receives the position information and direction information of the other users in the virtual space from the room management server 110 via the presence provider 222. The space modeler 221 holds both the position and direction information of the user of this client and that of the other users in the virtual space.

When the left/right swing button 231 detects that it has been pressed, it inputs a left/right swing instruction to the space modeler 221. On accepting the left/right swing instruction, the space modeler 221 performs the following operations over about one second.

First, the space modeler 221 calculates the orientation (direction information) the user would have if the user's head in the virtual space were swung to the left by a predetermined angle (for example, about 10°) from its current orientation. The space modeler 221 then sends the calculated orientation to the graphics renderer 219 and the presence provider 222. The presence provider 222 transmits the user's direction information to the acoustic server 120.

Next, the space modeler 221 calculates the orientation (direction information) the user would have if the user's head in the virtual space were swung to the right by the predetermined angle (for example, about 10°) from its current orientation. The space modeler 221 then sends the calculated orientation to the graphics renderer 219 and the acoustic server 120. The space modeler 221 transmits the user's direction information to the acoustic server 120 via the presence provider 222.
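
The two steps above can be sketched in Python as follows. This is a minimal sketch, assuming that yaw is measured in degrees with positive values to the right and that both swing targets are taken relative to the orientation held when the button is pressed. The function names `swing_targets` and `run_swing`, the `left_first` parameter, and the use of plain callables as stand-ins for the graphics renderer and the acoustic server are all assumptions for illustration.

```python
def swing_targets(yaw_deg, swing_deg=10.0, left_first=True):
    """Orientations for a left/right swing around the current yaw.
    Yaw is in degrees; positive values are to the right (assumed convention)."""
    left = yaw_deg - swing_deg
    right = yaw_deg + swing_deg
    return [left, right] if left_first else [right, left]

def run_swing(yaw_deg, sinks, swing_deg=10.0, left_first=True):
    """Send each calculated orientation to every sink, in order.
    A sink is any callable taking a yaw in degrees; here it stands in for
    the graphics renderer and for the path to the acoustic server."""
    for target in swing_targets(yaw_deg, swing_deg, left_first):
        for sink in sinks:
            sink(target)

# Record what each hypothetical sink receives for a user facing 30 degrees.
sent_to_renderer, sent_to_server = [], []
run_swing(30.0, [sent_to_renderer.append, sent_to_server.append])
```

In this sketch both sinks receive the same sequence, which keeps the displayed view and the reproduced sound consistent during the swing.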

The case where the left/right swing button 231 is pressed is described further below with reference to FIGS. 4, 5, and 6.

FIG. 4 schematically shows the user and a sound source (for example, another user who is a communication partner) in the virtual space. FIG. 4 shows the user 1 viewed from directly above, and a sound source 2. The user 1 is drawn with a nose 11 to indicate orientation; that is, the user 1 faces the direction 3 in which the nose 11 points. In the initial state of FIG. 4 (before the left/right swing button 231 is pressed), the user 1 faces the frontal direction 3, and the sound source 2 is diagonally to the front right of the user 1 (in the direction of the angle 4 of n°).

Because humans have only two ears, one on the left and one on the right, their ability to tell whether the sound source 2 is in front or behind is poor. Consequently, even in real space, the user 1 may mistakenly perceive the sound source 2 as being behind. This characteristic of human hearing is described, for example, in the following document.

B. C. J. Moore, translated by Kengo Ohgushi: Introduction to Auditory Psychology (聴覚心理学概論), Seishin Shobo, 1994, pp. 220-221. (Original: B. C. J. Moore: An Introduction to the Psychology of Hearing, 3rd Ed., Academic Press, 1989.)

In such a situation, when the user 1 listens over headphones to the stereophonic sound (reproduced sound) of the sound source 2 reproduced by 3D audio technology, it may be difficult (ambiguous) to tell whether the sound source 2 is diagonally to the front right or diagonally to the rear right at 2a. In this case, the user 1 swings his or her head left and right, just as one generally does in real space to confirm the direction of a sound source. That is, the user presses the left/right swing button 231 to determine (recognize) the direction of the sound source 2 accurately. The easily confused sound source 2a lies at the position that mirrors the actual sound source 2, front to back, across the plane 5 that passes through the left and right ears of the user 1.

FIG. 5 schematically shows the state in which the space modeler 221 has swung the head of the user 1 to the left in response to the left/right swing button 231 being pressed. That is, the space modeler 221 changes the orientation of the user 1 to the left by the predetermined angle (α°). In this state, the sound source 2 moves further to the right relative to the user than in the initial state (that is, to the direction 4L at n+α°). The user 1 therefore hears the stereophonic sound (reproduced sound) of the sound source 2 reproduced by 3D audio technology move to the right from the initial state shown in FIG. 4. If the sound source were instead at the easily confused position 2a, the user 1 would hear its stereophonic sound move to the left from the initial state shown in FIG. 4.

FIG. 6 schematically shows the state in which the space modeler 221 has swung the head of the user 1 to the right. That is, the space modeler 221 changes the orientation of the user 1 to the right by the predetermined angle (α°). In this state, the sound source 2 moves to the left relative to the user compared with the initial state (that is, to the direction 4R at n-α°). The user 1 therefore hears the stereophonic sound (reproduced sound) of the sound source 2 reproduced by 3D audio technology move to the left from the initial state shown in FIG. 4. If the sound source were instead at the easily confused position 2a, the user 1 would hear its stereophonic sound move to the right from the initial state shown in FIG. 4.

In this way, by using the left/right swing button 231, the user can grasp the correct direction of a sound source whose direction is ambiguous. That is, when it is unclear whether a sound source is in front or behind, the user presses the left/right swing button 231. If the sound source is heard to move first to the right and then to the left, the user can correctly determine that it is in front. If it is heard to move first to the left and then to the right, the user can correctly determine that it is behind.
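
The reason the swing resolves the front/back ambiguity can be checked numerically with a small Python sketch. It rests on a simplifying assumption that is not part of this publication: the left/right cue is modeled as the sine of the azimuth, so a source and its front/back mirror image produce the same cue at rest. Azimuth is measured clockwise from straight ahead, and the function names are hypothetical.

```python
import math

def lateral_cue(azimuth_deg):
    """Simplified left/right cue in [-1, 1] (assumption: proportional to
    sin(azimuth)). Positive means the sound is heard toward the right."""
    return math.sin(math.radians(azimuth_deg))

def azimuth_after_left_swing(azimuth_deg, alpha_deg):
    """Turning the head left by alpha shifts every source clockwise by alpha
    relative to the head."""
    return azimuth_deg + alpha_deg

def front_or_back(azimuth_deg, alpha_deg=10.0):
    """Classify a source from how its cue changes when the head swings left:
    a shift to the right means front, a shift to the left means back."""
    before = lateral_cue(azimuth_deg)
    after = lateral_cue(azimuth_after_left_swing(azimuth_deg, alpha_deg))
    return "front" if after > before else "back"

n = 30.0                 # actual source: 30 degrees to the front right
mirror = 180.0 - n       # confusable position: 30 degrees behind the ear axis

same_cue_at_rest = math.isclose(lateral_cue(n), lateral_cue(mirror))
front_result = front_or_back(n)       # cue increases: heard moving right
back_result = front_or_back(mirror)   # cue decreases: heard moving left
```

Under this model the cue changes in opposite directions for the two positions, which matches the behavior described for FIG. 5; a right swing reverses both directions, as in FIG. 6.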

A discontinuous motion in which the user's orientation jumps left (or right) by the predetermined angle all at once may confuse the user. The space modeler 221 therefore does not send the graphics renderer 219 and the acoustic server 120 only the direction information for the full left and right swing angles; instead it sends direction information interpolated at fixed angular intervals. This yields a nearly continuous motion, like that of a user shaking his or her head in real space, and prevents the user from becoming confused.
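
A minimal Python sketch of such interpolation is shown below. The function name `interpolate_yaw`, the 2° step, and the linear spacing are assumptions for illustration; the publication states only that direction information is interpolated at fixed angular intervals.

```python
import math

def interpolate_yaw(start_deg, end_deg, step_deg=2.0):
    """Intermediate orientations from start to end, evenly spaced and at most
    step_deg apart. The start is excluded and the end is included."""
    span = end_deg - start_deg
    steps = max(1, math.ceil(abs(span) / step_deg))
    return [start_deg + span * i / steps for i in range(1, steps + 1)]

# A swing of 10 degrees to the left, then across to 10 degrees to the right.
left_path = interpolate_yaw(0.0, -10.0)    # [-2.0, -4.0, -6.0, -8.0, -10.0]
right_path = interpolate_yaw(-10.0, 10.0)  # ten steps of 2 degrees each
```

Each value in the paths would be sent in turn to the graphics renderer and the acoustic server, spaced in time so that the whole swing fits within the roughly one-second duration of the operation.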

The order in which people swing their heads and the angle through which they swing differ between individuals. Therefore, whether the head swings left or right first, and the predetermined swing angle, can be adjusted (changed) for each user.

Next, the case where the up/down swing button 232 is pressed is described. When the up/down swing button 232 detects that it has been pressed, it inputs an up/down swing instruction to the space modeler 221. On accepting the up/down swing instruction, the space modeler 221 performs the following operations over about one second.

まず、空間モデラ２２１は、仮想空間における自ユーザの首を現時点（水平な状態）から所定の角度（例えば、10°程度）上に振った場合の自ユーザの向き（上下の方位情報）を算出する。そして、空間モデラ２２１は、算出した自ユーザの向きを、グラフィクスレンダラ２１９と音響サーバ１２０に送出する。 First, the space modeler 221 calculates the orientation of the own user (vertical orientation information) when the own user's head in the virtual space is swung upward from its current (horizontal) state by a predetermined angle (for example, about 10°). Then, the space modeler 221 sends the calculated orientation of the own user to the graphics renderer 219 and the acoustic server 120.

そして、空間モデラ２２１は、仮想空間における自ユーザの首を現時点（水平な状態）から所定の角度（例えば、10°程度）下に振った場合の自ユーザの向き（上下の方位情報）を算出する。そして、空間モデラ２２１は、算出した自ユーザの向きを、グラフィクスレンダラ２１９および音響サーバ１２０に送出する。以下、図７を用いて、上下スイングボタン２３２が押された場合について、さらに説明する。 Next, the space modeler 221 calculates the orientation of the own user (vertical orientation information) when the own user's head in the virtual space is swung downward from its current (horizontal) state by a predetermined angle (for example, about 10°). Then, the space modeler 221 sends the calculated orientation of the own user to the graphics renderer 219 and the acoustic server 120. The case where the up/down swing button 232 is pressed is described further below with reference to FIG. 7.

図７は、仮想空間における自ユーザと音源を模式的に示した図である。図７では、自ユーザを側面から示した自ユーザ１と、音源２とを示している。自ユーザ１は、向きを示すために鼻１１を有している。最初の状態（上下スイングボタン２３２を押下する前）において、自ユーザ１は、水平の方向３を向いており、音源２は、自ユーザ１の斜め上前方（ｎ°の上方向４）に存在している。 FIG. 7 is a diagram schematically showing the own user and a sound source in the virtual space. FIG. 7 shows the own user 1, viewed from the side, and the sound source 2. The own user 1 has a nose 11 to indicate the orientation. In the initial state (before the up/down swing button 232 is pressed), the own user 1 faces the horizontal direction 3, and the sound source 2 is located diagonally above and in front of the own user 1 (in the upward direction 4 at n°).

さて、人間の耳は左右に２つしかないために、前後の判別と同様に、音源２が上方に存在するか、または、下方に存在するかを識別する能力が低い。このような人間の耳の特性については、前述の文献（聴覚心理学概論）に記述されている。 Now, because humans have only two ears, one on the left and one on the right, the ability to identify whether the sound source 2 is above or below is low, just as with front/back discrimination. This characteristic of human hearing is described in the previously cited document (Introduction to Auditory Psychology).

このような状況において、自ユーザ１は、３次元オーディオ技術により再生された音源２の立体音響（再生音）をヘッドフォンから聞いて、音源２が前方上方向に存在するか、または、前方下方向に存在するのかを判別することが困難な場合（曖昧な場合）がある。この場合、自ユーザは、音源の方向を正確に判別（認知）するために、実空間において音源の方向を確認するときと同様に、上下スイングボタン２３２を押す。なお、誤認識しやすい音源２ａは、自ユーザ１の左右の耳を結ぶ平面５に対して、実際の音源２と上下に対称な位置に存在する。 In such a situation, the user 1 hears the stereophonic sound (reproduced sound) of the sound source 2 reproduced by the three-dimensional audio technology from the headphones, and the sound source 2 exists in the front upper direction or the front lower direction. In some cases, it is difficult to determine whether it exists. In this case, in order to accurately determine (recognize) the direction of the sound source, the own user presses the up / down swing button 232 as in the case of confirming the direction of the sound source in the real space. Note that the sound source 2a that is easily misrecognized is present at a position that is vertically symmetrical with the actual sound source 2 with respect to the plane 5 connecting the left and right ears of the user 1.

上下スイングボタン２３２が押されたことにより、空間モデラ２２１は、最初に、自ユーザ１の向きを所定の角度（β°）上に変更する。すなわち、空間モデラ２２１は、自ユーザ１の首を上方向３Ｕに振る。この状態において、音源２は、最初の状態より下に（すなわち、ｎ°−β°の方向４Ｕ）に位置することになる。したがって、自ユーザ１は、3次元オーディオ技術により再生される音源２の立体音響（再生音）が、最初の水平状態から下に移動して聞こえる。なお、誤認識しやすい音源２ａの位置に実際の音源が存在する場合、自ユーザ１は、3次元オーディオ技術により再生される音源２ａの立体音響が、最初の状態から上に移動して聞こえる。 When the up/down swing button 232 is pressed, the space modeler 221 first changes the orientation of the own user 1 upward by a predetermined angle (β°). That is, the space modeler 221 swings the head of the own user 1 in the upward direction 3U. In this state, the sound source 2 is located lower than in the initial state (that is, in the direction 4U at n° − β°). Accordingly, the own user 1 hears the stereophonic sound (reproduced sound) of the sound source 2 reproduced by the three-dimensional audio technology as moving downward from the initial horizontal state. If the actual sound source is instead at the position of the easily misrecognized sound source 2a, the own user 1 hears the stereophonic sound of the sound source 2a as moving upward from the initial state.

次に、空間モデラ２２１は、自ユーザ１の向きを所定の角度（β°）下に変更する。すなわち、空間モデラ２２１は、自ユーザ１の首を下方向３Ｄに振る。この状態において、音源２は、最初の状態より上に（すなわち、ｎ°＋β°の方向４Ｄ）に位置することになる。したがって、自ユーザ１は、3次元オーディオ技術により再生される音源２の立体音響が、最初の水平状態から上に移動して聞こえる。なお、誤認識しやすい音源２ａの位置に実際の音源が存在する場合、自ユーザ１は、3次元オーディオ技術により再生される音源２ａの立体音響が、最初の状態から下に移動して聞こえる。 Next, the space modeler 221 changes the orientation of the own user 1 downward by the predetermined angle (β°). That is, the space modeler 221 swings the head of the own user 1 in the downward direction 3D. In this state, the sound source 2 is located higher than in the initial state (that is, in the direction 4D at n° + β°). Accordingly, the own user 1 hears the stereophonic sound of the sound source 2 reproduced by the three-dimensional audio technology as moving upward from the initial horizontal state. If the actual sound source is instead at the position of the easily misrecognized sound source 2a, the own user 1 hears the stereophonic sound of the sound source 2a as moving downward from the initial state.

このように、上下スイングボタン２３２を使用することによって、ユーザは、方向が曖昧な音源について、正確な方向を把握することができる。すなわち、音源が上にあるのか下にあるのかが曖昧な場合、ユーザは、上下スイングボタン２３２を押す。そして、ユーザは、音源が最初に下、次に上に移動して聞こえる場合は、音源が上にあると正しく判別することができる。一方、ユーザは、音源が最初に上、次に下に移動して聞こえる場合は、音源が下にあると正しく判別することができる。 Thus, by using the up / down swing button 232, the user can grasp the accurate direction of the sound source whose direction is ambiguous. That is, when it is ambiguous whether the sound source is above or below, the user presses the up / down swing button 232. The user can correctly determine that the sound source is at the top if the sound source is heard to move down and then up. On the other hand, if the user hears the sound source moving up first and then down, the user can correctly determine that the sound source is down.

なお、一度に所定の角度上（または下）にユーザの向きが変化するような不連続な動作は、ユーザの混乱をまねく可能性がある。そのため、空間モデラ２２１は、左右スイングボタン２３１と同様に、一定の間隔の角度ごとに補間した方位情報をグラフィクスレンダラ２１９と音響サーバ１２０に送出する。これにより、実空間においてユーザが首を振った場合の動作のようにほぼ連続的な動作となり、ユーザの混乱を防止することができる。また、首を振る順番や、首を振る角度は、個人差があるため、ユーザ毎に調整（変更）できるものとする。 Note that a discontinuous motion in which the user's orientation jumps upward (or downward) by the predetermined angle all at once may confuse the user. Therefore, as with the left/right swing button 231, the space modeler 221 sends orientation information interpolated at fixed angular intervals to the graphics renderer 219 and the acoustic server 120. The resulting motion is nearly continuous, like that of a user shaking his or her head in real space, which prevents user confusion. In addition, because the order and angle of the head swing differ from person to person, they can be adjusted (changed) for each user.

また、以上説明した左右スイングボタン２３１および上下スイングボタン２３２による首振り動作は、ポインティングデバイス２３０を流用して入力することもできる。しかしながら、左右スイングボタン２３１および上下スイングボタン２３２を別途もうけることにより、ユーザは容易にかつ的確に首振り指示を入力することができる。 The head-swing operations performed with the left/right swing button 231 and the up/down swing button 232 described above can also be input by reusing the pointing device 230. However, providing the left/right swing button 231 and the up/down swing button 232 separately allows the user to input a swing instruction easily and accurately.

次に、音声について説明する。 Next, audio will be described.

音声については、マイクロフォン２１１が当該クライアントを使用する自ユーザの音声を収集し、オーディオエンコーダ２１２に送付する。そして、オーディオエンコーダ２１２は、自ユーザの音声をオーディオ信号（デジタル信号）に変換して、オーディオ通信部２１５に出力する。オーディオ通信部２１５は、オーディオエンコーダ２１２から入力された自ユーザのオーディオ信号を、リアルタイムに音響サーバ１２０に送信する。 As for the voice, the microphone 211 collects the voice of the user using the client and sends it to the audio encoder 212. Then, the audio encoder 212 converts the voice of the own user into an audio signal (digital signal) and outputs the audio signal to the audio communication unit 215. The audio communication unit 215 transmits the audio signal of the own user input from the audio encoder 212 to the acoustic server 120 in real time.

また、オーディオ通信部２１５は、音響サーバ１２０から３次元オーディオ技術を使用して立体化された他のクライアントの他ユーザのオーディオ信号（立体音響）をリアルタイムに受信し、オーディオデコーダ２１６に出力する。オーディオデコーダ２１６は、音響サーバ１２０から受信したオーディオ信号（立体音響）をヘッドフォン２１７に出力する。なお、音響サーバ１２０が行うオーディオ信号の３次元オーディオ技術を使用した音響の立体化処理については後述する。 In addition, the audio communication unit 215 receives, in real time, an audio signal (stereo sound) of another user of another client that is three-dimensionalized using the three-dimensional audio technology from the sound server 120 and outputs the audio signal to the audio decoder 216. The audio decoder 216 outputs the audio signal (stereo sound) received from the sound server 120 to the headphones 217. The sound three-dimensional processing using the three-dimensional audio technology of the audio signal performed by the sound server 120 will be described later.

なお、オーディオ信号のリアルタイム通信には、IETF (Internet Engineering Task Force) が発行したドキュメントRFC 3550に記述されたプロトコルであるRTP(Real-time Transport Protocol) が使用される。 For real-time communication of audio signals, RTP (Real-time Transport Protocol), which is a protocol described in document RFC 3550 issued by IETF (Internet Engineering Task Force), is used.
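As a rough illustration of the RTP framing mentioned above, the fixed 12-byte header defined in RFC 3550 can be packed as in the following sketch. The function name `rtp_header` and the default payload type are assumptions for this example; the source does not state which payload format or header options the clients use.

```python
import struct

def rtp_header(seq, timestamp, ssrc, payload_type=0, marker=False):
    """Pack the 12-byte fixed RTP header (RFC 3550) without CSRC entries.

    Assumption for this sketch: payload type 0 is used only as a
    placeholder; the actual codec is not given in the source.
    """
    byte0 = 2 << 6  # version 2, no padding, no extension, CSRC count 0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    # Network byte order: two bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit synchronization source identifier.
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
```

The encoded audio for one packet would follow this header as the payload, with the sequence number incremented by one and the timestamp advanced by the number of samples per packet.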

次に、画像について説明する。 Next, the image will be described.

画像については、グラフィクスレンダラ２１９が、空間モデラ２２１が保持する視覚的な仮想空間属性、仮想空間における他ユーザ（通信相手）の位置および自ユーザの位置にもとづいて、仮想空間上でどのように他ユーザが見えるかを計算（座標変換）する。次に、グラフィクスレンダラ２１９は、あらかじめ定められた他ユーザの画像に対して、前記計算により自ユーザの位置から見た視点で仮想空間の属性から帰結する処理を行い、画面上に出力するイメージデータ（映像）を作成する。 As for the image, how the graphics renderer 219 performs in the virtual space based on the visual virtual space attribute held by the space modeler 221, the position of the other user (communication partner) in the virtual space, and the position of the own user. Calculate (coordinate conversion) whether the user can see. Next, the graphics renderer 219 performs processing that results from the attribute of the virtual space from the viewpoint of the user's position by the above calculation on the image of another user determined in advance, and outputs the image data on the screen Create (video).

このグラフィクスレンダラ２１９により生成された映像は、クライアントを使用する自ユーザの視点からの映像に再生され、ディスプレイ２２０に出力される。自ユーザは、必要に応じてディスプレイ２２０に出力された映像を参照する。 The video generated by the graphics renderer 219 is reproduced as a video from the viewpoint of the user using the client, and is output to the display 220. The user refers to the video output to the display 220 as necessary.

図８は、平面図を用いた仮想空間の一例である。図示する表示内容は、クライアント２０１を使用する自ユーザが、クライアント２０２およびクライアント２０３を使用する第１および第２の他ユーザと、仮想空間を共有している場合を例にしたものである。図示する仮想空間は、空間モデラ２２１が保持する仮想空間の属性、仮想空間内における自ユーザおよび他ユーザの位置・方位情報をもとに、真上から仮想空間に配置された自ユーザ４１１と、第１の他ユーザ４１２および第２の他ユーザ４１３と、を眺めることで得られる２次元画像を表示している。なお、図示する自ユーザ４１１および他ユーザ４１２、４１３は、それぞれ、頭aと、仮想空間に配置された各ユーザの向いている方向を示すための鼻ｂと、肩（肩幅）ｃとを有する。 FIG. 8 is an example of a virtual space using a plan view. The display content shown in the figure is an example in which the own user who uses the client 201 shares a virtual space with the first and second other users who use the client 202 and the client 203. The virtual space shown in the figure is based on the attributes of the virtual space held by the space modeler 221 and the own user 411 arranged in the virtual space from directly above based on the position and orientation information of the own user and other users in the virtual space, A two-dimensional image obtained by viewing the first other user 412 and the second other user 413 is displayed. The illustrated user 411 and other users 412, 413 each have a head a, a nose b for indicating the direction in which each user faces in the virtual space, and a shoulder (shoulder width) c. .

グラフィクスレンダラ２１９は、自ユーザ４１１の位置と向きを固定し、自ユーザ４１１を中心として仮想空間や仮想空間中の他のユーザ４１２、４１３が相対的に移動し回転するように表示する。ポインティングデバイス２３０を用いて自ユーザ４１１が移動または向きが変更した場合、仮想空間や仮想空間中の他のユーザが相対的に移動・回転した画面がリアルタイムでディスプレイ２２０に表示される。自ユーザの向きを前方に固定することにより、音声とグラフィクス表示との整合性が確保され、他ユーザの位置および方向を身体感覚として把握することができる。 The graphics renderer 219 fixes the position and orientation of the own user 411 on the screen and displays the virtual space and the other users 412 and 413 in it so that they move and rotate relative to the own user 411, who remains at the center. When the own user 411 moves or changes orientation using the pointing device 230, a screen in which the virtual space and the other users in it have moved and rotated relative to the own user is displayed on the display 220 in real time. Fixing the own user's orientation to face forward keeps the audio consistent with the graphics display and lets the own user grasp the positions and directions of the other users as a bodily sensation.
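The user-centred display described above amounts to a translation followed by a rotation. The following is a minimal sketch under stated assumptions: the function name `to_user_frame`, the two-dimensional coordinates, and the convention that the heading is measured counterclockwise from the +x axis are all chosen for this example and do not come from the source.

```python
import math

def to_user_frame(own_pos, own_heading_deg, other_pos):
    """Return (right, ahead) coordinates of another user as seen by the own user.

    Assumption for this sketch: own_heading_deg is the direction the own
    user faces, measured counterclockwise from the +x axis of the virtual
    space. "ahead" maps to the upward direction of the plan view.
    """
    theta = math.radians(own_heading_deg)
    dx = other_pos[0] - own_pos[0]
    dy = other_pos[1] - own_pos[1]
    # Project the offset onto the own user's right-hand and facing directions.
    right = dx * math.sin(theta) - dy * math.cos(theta)
    ahead = dx * math.cos(theta) + dy * math.sin(theta)
    return right, ahead
```

With this convention, turning the own user's heading to the left (increasing the heading) moves a user who was straight ahead toward the right of the screen, while the own user stays fixed at the center.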

また、図示する仮想空間では、所定の長さ（例えば、１ｍ）を示すスケールバー４１４を表示している。また、図示する仮想空間では、各ユーザの肩（肩幅）ｃを表示している。スケールバー４１４および肩（肩幅）ｃを表示することにより、自ユーザ４１１は、他ユーザ４１２、４１３との仮想空間上でのおよその距離を視覚的（直感的）に把握することができる。そして、自ユーザは、スケールバー４１４および肩（肩幅）ｃが表示されたイメージデータ（図８）を参照しつつ、ヘッドフォンから出力される３次元オーディオ技術を用いた他ユーザの立体音響を聴く。 In the illustrated virtual space, a scale bar 414 indicating a predetermined length (for example, 1 m) is displayed. In the illustrated virtual space, the shoulder (shoulder width) c of each user is displayed. By displaying the scale bar 414 and the shoulder (shoulder width) c, the user 411 can visually (intuitively) grasp the approximate distance of the other users 412 and 413 in the virtual space. Then, the user listens to the other user's stereophonic sound using the three-dimensional audio technology output from the headphones while referring to the image data (FIG. 8) on which the scale bar 414 and the shoulder (shoulder width) c are displayed.

なお、3次元オーディオ技術には、部屋の残響をシミュレートすることで、自ユーザと音源との距離や、部屋の大きさなどを表現する残響シミュレーション技術がある。部屋の残響は、音源が存在する部屋（仮想空間）の壁や部屋内の物体などによる音の反射や拡散などにより、音源の音響（以下、「直接音」）に付加される音響（以下、「反射音」）である。一般的に人間は、音源までの距離を判別（認識）する際に、直接音と残響である反射音との比率にもとづいて判別している。 Note that the 3D audio technology includes a reverberation simulation technology that expresses the distance between the user and the sound source, the size of the room, and the like by simulating the reverberation of the room. The reverberation of a room is the sound (hereinafter referred to as “direct sound”) that is added to the sound of the sound source (hereinafter referred to as “direct sound”) due to the reflection and diffusion of the sound from the walls of the room (virtual space) where the sound source exists and objects in the room "Reflected sound"). In general, when determining (recognizing) a distance to a sound source, a human determines based on a ratio between a direct sound and reflected sound that is reverberation.

したがって、自ユーザは、ディスプレイを参照することにより他ユーザ（音源）との距離を視覚的に把握しつつ、ヘッドフォンからは直接音と反射音とがまざった他ユーザの立体音響を聞く。これにより、自ユーザは、仮想空間内で移動することによって他ユーザとの距離を変化させ、距離の変化によって直接音と反射音との比がどのように変化するかを学習することができる。このような学習を繰り返すことにより、自ユーザは、ディスプレイを見なくても他ユーザとのおよその距離がわかるようになる。 Therefore, the user listens to the stereophonic sound of the other user, in which the direct sound and the reflected sound are mixed, while visually grasping the distance from the other user (sound source) by referring to the display. Thereby, the own user can learn how the ratio of the direct sound and the reflected sound is changed by changing the distance to the other user by moving in the virtual space and changing the distance. By repeating such learning, the user can know the approximate distance from other users without looking at the display.

なお、スケールバー４１４の所定の長さは、仮想空間の大きさや、ユーザの指示により変更（調整）することができるものとする。また、図示する仮想空間では、スケールバー４１４および肩（肩幅）ｃの両方を表示している。しかしながら、スケールバー４１４または肩（肩幅）ｃのいずれか一方のみを表示することとしてもよい。 It is assumed that the predetermined length of the scale bar 414 can be changed (adjusted) according to the size of the virtual space or a user instruction. In the illustrated virtual space, both the scale bar 414 and the shoulder (shoulder width) c are displayed. However, only one of the scale bar 414 and the shoulder (shoulder width) c may be displayed.

図９は、図８の状態において、左右スイングボタン２３１が押された場合に表示された仮想空間の一例である。左右スイングボタン２３１が押された場合、まず自ユーザ４１１は左に首を振る。そのため、ディスプレイに表示される仮想空間の平面図９Ａでは、他ユーザ４１２、４１３は、図８に表示された位置から右方向に移動（すなわち、所定の角度だけ右に回転）する。 FIG. 9 is an example of a virtual space displayed when the left / right swing button 231 is pressed in the state of FIG. When the left / right swing button 231 is pressed, the user 411 first swings his / her head to the left. Therefore, in the plan view 9A of the virtual space displayed on the display, the other users 412 and 413 move to the right from the position displayed in FIG. 8 (that is, rotate to the right by a predetermined angle).

次に、自ユーザ４１１は右に首を振るため、ディスプレイに表示される仮想空間の平面図９Ｂでは、他ユーザ４１２、４１３は、図８に表示された位置から左方向に移動（すなわち、所定の角度だけ左に回転）する。そして、ディスプレイに表示される仮想空間は、図８に示すもとの状態を表示する。なお、自ユーザ４１１の位置および向きは、左右スイングボタン２３１が押されても図８の状態と変わらない。 Next, because the own user 411 swings his or her head to the right, in the plan view 9B of the virtual space shown on the display, the other users 412 and 413 move to the left from the positions displayed in FIG. 8 (that is, they rotate to the left by the predetermined angle). The virtual space shown on the display then returns to the original state shown in FIG. 8. Note that the displayed position and orientation of the own user 411 remain the same as in FIG. 8 even when the left/right swing button 231 is pressed.

図１０は、図８の状態において、上下スイングボタン２３２が押された場合に表示された仮想空間の一例である。上下スイングボタン２３２が押された場合、まず自ユーザ４１１は首を上に振る。そのため、ディスプレイに表示される仮想空間の平面図１０Ａでは、上辺が底辺より小さい台形に変形して表示される。したがって、仮想空間内で自ユーザ４１１の前方に存在する他ユーザ４１２、４１３は、図８に示すもとの大きさより所定の割合だけ小さく表示される。 FIG. 10 is an example of the virtual space displayed when the up/down swing button 232 is pressed in the state of FIG. 8. When the up/down swing button 232 is pressed, the own user 411 first swings his or her head up. Therefore, in the plan view 10A of the virtual space shown on the display, the space is displayed deformed into a trapezoid whose upper side is shorter than its base. Accordingly, the other users 412 and 413 located in front of the own user 411 in the virtual space are displayed smaller than their original size shown in FIG. 8 by a predetermined ratio.

次に、自ユーザ４１１は首を下に振るので、ディスプレイに表示される仮想空間の平面図１０Ｂでは、底辺が上辺より小さい台形に変形して表示される。したがって、仮想空間内で自ユーザ４１１の前方に存在する他ユーザ４１２、４１３は、もとの大きさより所定の割合だけ大きく表示される。そして、ディスプレイに表示される仮想空間は、図８に示すもとの状態を表示する。なお、自ユーザの位置および向きは、上下スイングボタン２３２が押されても図８の状態と変わらない。 Next, because the own user 411 swings his or her head down, in the plan view 10B of the virtual space shown on the display, the space is displayed deformed into a trapezoid whose base is shorter than its upper side. Accordingly, the other users 412 and 413 located in front of the own user 411 in the virtual space are displayed larger than their original size by a predetermined ratio. The virtual space shown on the display then returns to the original state shown in FIG. 8. Note that the displayed position and orientation of the own user remain the same as in FIG. 8 even when the up/down swing button 232 is pressed.

図１１は、図８に示す平面図で表示した仮想空間を、３次元グラフィックス技術を使用して透視図のレンダリングを行った場合の仮想空間の一例である。すなわち、グラフィクスレンダラ２１９は、メモリ３０２または外部記憶装置３０３に記憶している空間の大きさ、壁および天井の材質などの仮想空間の属性、仮想空間内における自ユーザおよび他ユーザの位置情報などの３次元のデータから２次元画像を作成し、ディスプレイ２２０に表示する。図示する例では、仮想空間内における自ユーザ４１１の位置より定まる視点から、仮想空間に配置された壁面、天井、床面、他ユーザ４１２、４１３を眺めることで得られる２次元画像を表示している。 FIG. 11 is an example of the virtual space obtained when the virtual space displayed as the plan view in FIG. 8 is rendered as a perspective view using three-dimensional graphics technology. That is, the graphics renderer 219 creates a two-dimensional image from three-dimensional data stored in the memory 302 or the external storage device 303, such as the size of the space, virtual space attributes such as the wall and ceiling materials, and the position information of the own user and the other users in the virtual space, and displays it on the display 220. The illustrated example shows the two-dimensional image obtained by viewing the walls, ceiling, floor, and the other users 412 and 413 arranged in the virtual space from a viewpoint determined by the position of the own user 411 in the virtual space.

図１１では、スケールバーとして、床面に所定の距離を示すメッシュ（例えば、１ｍ×１ｍ）を表示している。これにより、図８で説明したスケールバーと同様の効果が発生する。すなわち、自ユーザは、他ユーザとの距離の変化によって直接音と反射音との比がどのように変化するかを学習することができる。 In FIG. 11, a mesh indicating a predetermined distance (for example, 1 m × 1 m) is displayed on the floor as the scale bar. This provides the same effect as the scale bar described with reference to FIG. 8. That is, the own user can learn how the ratio of direct sound to reflected sound changes as the distance to other users changes.

なお、所定の距離を示すメッシュ（例えば、１ｍ×１ｍ）だけでは、自ユーザは、遠くに存在する他ユーザとの距離を直感的に把握することは難しい。そのため、所定の距離より大きな距離（例えば、５ｍ×５ｍ、ないし、１０ｍ×１０ｍ）ごとに、より太い線でメッシュを表示することにしてもよい。また、メッシュの一部だけを表示したり、メッシュの交点だけを表示したりすることによって、距離を自ユーザに把握させるようにしてもよい。また、図８に示す平面図では、スケールバー４１４を表示しているが、メッシュを表示することとしてもよい。 Note that with only a mesh indicating the predetermined distance (for example, 1 m × 1 m), it is difficult for the own user to intuitively grasp the distance to other users who are far away. Therefore, the mesh may be drawn with thicker lines at intervals larger than the predetermined distance (for example, every 5 m × 5 m or 10 m × 10 m). Alternatively, the own user may be allowed to grasp the distance by displaying only part of the mesh or only the intersections of the mesh. Although the plan view shown in FIG. 8 displays the scale bar 414, a mesh may be displayed instead.

また、本実施形態では、平面のディスプレイ２２０を使用している。しかしながら、ステレオ視が可能なヘッドマウントディスプレイ等を使用することによって、スケールバーやメッシュを表示することなく、より直接的に距離を表示することとしてもよい。 In the present embodiment, a flat display 220 is used. However, it is also possible to display the distance more directly without displaying a scale bar or mesh by using a head-mounted display or the like capable of stereo viewing.

以上で、図２のクライアントの説明を終了する。なお、クライアントのなかで、マイクロフォン２１１、ポインティングデバイス２３０、左右スイングボタン２３１、上下スイングボタン２３２、ヘッドフォン２１７およびディスプレイ２２０は、ハードウェアによって実現される。また、オーディオエンコーダ２１２、オーディオデコーダ２１６およびグラフィクスレンダラ２１９は、ソフトウェア、ハードウェアまたはこれらの組み合せによって実現される。また、オーディオ通信部２１５、空間モデラ２２１およびセッション制御部２２３は、通常、ソフトウェアによって実現される。 This is the end of the description of the client in FIG. Among the clients, the microphone 211, the pointing device 230, the left / right swing button 231, the up / down swing button 232, the headphones 217, and the display 220 are realized by hardware. The audio encoder 212, the audio decoder 216, and the graphics renderer 219 are realized by software, hardware, or a combination thereof. The audio communication unit 215, the space modeler 221 and the session control unit 223 are usually realized by software.

なお、クライアント２０１は、例えば、図１２に示すようなＰＤＡまたはハンドヘルド・コンピュータに近い大きさと機能を有するコンピュータを用いることが考えられる。すなわち、クライアント本体２３０は、ディスプレイ２２０と、ポインティングデバイス２３０として自ユーザの位置および向きを入力するための操作部２４０と、左右スイングボタン２３１と、上下スイングボタン２３２と、ネットワーク１０１に接続するためのアンテナ２３７と、を有する。 Note that the client 201 may be, for example, a computer with a size and functions close to those of a PDA or handheld computer, as shown in FIG. 12. That is, the client main body 230 has the display 220, an operation unit 240 serving as the pointing device 230 for inputting the position and orientation of the own user, the left/right swing button 231, the up/down swing button 232, and an antenna 237 for connecting to the network 101.

また、本体２３０に接続されたヘッドセットは、ヘッドフォン２１７およびマイクロフォン２１１を有する。図示するヘッドセットは、本体２３０に有線接続されているが、BluetoothまたはIrDA(赤外線)などにより無線接続することも可能である。また、クライアント２０１は、一般的なパーソナルコンピュータ（Personal Computer）を用いることとしてもよい。 The headset connected to the main body 230 includes the headphones 217 and the microphone 211. The illustrated headset is connected to the main body 230 by wire, but it can also be connected wirelessly by Bluetooth, IrDA (infrared), or the like. A general personal computer may also be used as the client 201.

次に、音響サーバ１２０について説明する。 Next, the acoustic server 120 will be described.

音響サーバ１２０は、クライアントのオーディオ通信部２１５各々から送信されたオーディオ信号を、３次元オーディオ技術を使用して、音響サーバ１２０の空間モデラが保持する聴覚的な仮想空間属性、および、仮想空間上に存在する自ユーザおよび他ユーザ（音源）の位置にもとづいて、仮想空間上でどのように他ユーザの音声が聞こえるかを計算する。 Using the three-dimensional audio technology, the acoustic server 120 calculates how the voices of other users are heard in the virtual space for the audio signals transmitted from each client's audio communication unit 215, based on the auditory virtual space attributes held by the space modeler of the acoustic server 120 and on the positions of the own user and the other users (sound sources) in the virtual space.

図１６は、音響サーバ１２０の構成図である。図示するように、音響サーバ１２０は、オーディオ受信部１２１と、オーディオレンダラ１２２と、オーディオ送信部１２３と、をそれぞれ少なくとも１つ有する。すなわち、音響サーバ１２０は、クライアントの数だけ（すなわち、クライアント毎に）これらの処理部１２１〜１２３を有するものとする。なお、音響サーバ１２０は、オーディオ受信部１２１、オーディオレンダラ１２２およびオーディオ送信部１２３を、クライアントの数だけ有することなく、それぞれ1つのプログラムまたは装置を時分割で使用することによって実現することとしてもよい。 FIG. 16 is a configuration diagram of the acoustic server 120. As illustrated, the acoustic server 120 includes at least one audio receiving unit 121, an audio renderer 122, and an audio transmitting unit 123. That is, the acoustic server 120 includes these processing units 121 to 123 as many as the number of clients (that is, for each client). The acoustic server 120 may be realized by using a single program or device in a time-sharing manner, without having the audio reception unit 121, the audio renderer 122, and the audio transmission unit 123 as many as the number of clients. .

また、音響サーバ１２０は、空間モデラ１２４と、セッション制御部１２５を有する。空間モデラ１２４は、部屋管理サーバ１１０から、仮想空間における各ユーザの位置および仮想空間の属性を受信し、図３に示すクライアントの空間モデラ２２１と同様の処理を行い、仮想空間上に各ユーザを配置する。セッション制御部１２５は、図３に示すクライアントのセッション制御部２２３と同様に、部屋管理サーバ１１０との間で、通信セションを制御する。 The acoustic server 120 also includes a space modeler 124 and a session control unit 125. The space modeler 124 receives the position of each user in the virtual space and the attributes of the virtual space from the room management server 110, performs the same processing as the client's space modeler 221 shown in FIG. 3, and places each user in the virtual space. Like the client's session control unit 223 shown in FIG. 3, the session control unit 125 controls the communication session with the room management server 110.

クライアント毎に対応付けられたオーディオ受信部１２１各々は、各クライアントのオーディオ通信部２１５からオーディオ信号（音声）を受信する。そして、オーディオ受信部１２１各々は、受信したオーディオ信号をバッファリングすることによって、全てのクライアントからのオーディオ信号間で同期させた (対応づけた) 信号データを、各オーディオレンダラ１２２に送出する。このバッファリング (プレイアウト・バッファリング) の方法については、たとえば次の文献に記述されている。 Each audio receiving unit 121, one associated with each client, receives an audio signal (voice) from the audio communication unit 215 of its client. By buffering the received audio signals, each audio receiving unit 121 sends signal data that is synchronized (aligned) across the audio signals from all clients to each audio renderer 122. This buffering (playout buffering) method is described, for example, in the following document.

Colin Perkins 著: RTP: Audio and Video for the Internet, Addison-Wesley Pub Co; 1st edition (June 11, 2003). Colin Perkins: RTP: Audio and Video for the Internet, Addison-Wesley Pub Co; 1st edition (June 11, 2003).

クライアント毎に対応付けられたオーディオレンダラ１２２各々は、各オーディオ受信部１２１から入力された各オーディオ信号（音声）を、空間モデラ１２４が配置した仮想空間上の各ユーザの位置に基づいて立体化する。そして、オーディオレンダラ１２２は、当該クライアントに対応した２チャンネル（左チャンネルと右チャンネル）の信号データ（信号列）を、当該クライアントのオーディオ送信部１２３に出力する。クライアント毎に対応付けられたオーディオ送信部１２３は、２チャンネルの信号データを対応するクライアントに送信する。 Each audio renderer 122, one associated with each client, spatializes each audio signal (voice) input from the audio receiving units 121 based on the position of each user in the virtual space as arranged by the space modeler 124. The audio renderer 122 then outputs two-channel (left and right) signal data (signal sequences) for its client to that client's audio transmission unit 123. Each audio transmission unit 123, one associated with each client, transmits the two-channel signal data to the corresponding client.

次に、オーディオレンダラ１２２について、具体的に説明する。 Next, the audio renderer 122 will be specifically described.

３次元オーディオ技術においては、おもに人の頭（以下、「人頭」）のまわりでの音響の変化のしかた (インパルス応答) をあらわす HRIR (Head Related Impulse Response) と、部屋などの仮想環境によって生成される擬似的な残響とによって音の方向および距離を表現する。そして、HRIR は、音源と人頭との距離、および、人頭と音源との角度 (水平角度および垂直角度)によって決定される。なお、音響サーバ１２０のメモリ３０２または外部記憶装置３０３には、あらかじめダミーへッド（人頭）を使用して各距離および各角度毎に測定したHRIRの数値が記憶されているものとする。また、HRIRの数値には、左チャネル用（ダミーヘッドの左耳で測定したもの）と、右チャネル用（ダミーヘッドの右耳で測定したもの）とで異なる数値を使用することによって、左右、前後または上下の方向感を表現する。 In 3D audio technology, it is mainly generated by HRIR (Head Related Impulse Response), which expresses how the sound changes around the human head (hereinafter “human head”) (impulse response), and a virtual environment such as a room. The direction and distance of the sound is expressed by the simulated reverberation. HRIR is determined by the distance between the sound source and the human head and the angle (horizontal angle and vertical angle) between the human head and the sound source. It is assumed that the memory 302 of the acoustic server 120 or the external storage device 303 stores HRIR values measured in advance for each distance and each angle using a dummy head (person's head). Also, by using different values for the left channel (measured with the left ear of the dummy head) and the right channel (measured with the right ear of the dummy head), the HRIR values Express a sense of direction in the front-rear or up-down direction.

図１４は、オーディオレンダラ１２２の処理を示した図である。 FIG. 14 is a diagram showing processing of the audio renderer 122.

オーディオレンダラ１２２は、各音源（他ユーザ）に関して RTP (Real-time Transport Protocol) によって受信される 1 パケットごと (通常は 20 ms ごと) に、下記の計算をおこなう。 The audio renderer 122 performs the following calculation for each packet (usually every 20 ms) received by RTP (Real-time Transport Protocol) for each sound source (other users).

まず、オーディオレンダラ１２２は、音源毎に、音源の信号列 s_i[t] (t = 1, ...) および音源の仮想空間内での座標 (x_i, y_i, z_i) の入力を受け付ける（Ｓ６１）。なお、仮想空間内での各音源の座標については、空間モデラ１２４から入力される。空間モデラ１２４は、仮想空間上に各音源（各ユーザ）を配置した後、各音源の座標をオーディオレンダラ１２２に入力する。また、各音源の信号列は、各オーディオ受信部１２１から入力される。 First, for each sound source, the audio renderer 122 accepts as input the signal sequence s_i[t] (t = 1, ...) of the sound source and its coordinates (x_i, y_i, z_i) in the virtual space (S61). The coordinates of each sound source in the virtual space are input from the space modeler 124: after placing each sound source (each user) in the virtual space, the space modeler 124 inputs the coordinates of each sound source to the audio renderer 122. The signal sequence of each sound source is input from the corresponding audio receiving unit 121.

そして、オーディオレンダラ１２２は、音源の直接音と、残響である反射音とを計算する。 Then, the audio renderer 122 calculates the direct sound of the sound source and the reflected sound that is reverberation.

直接音については、オーディオレンダラ１２２は、入力された座標を用いて、自ユーザと音源との距離および角度 (azimuth) を、音源ごとに計算する（Ｓ６２）。そして、オーディオレンダラ１２２は、自ユーザとの距離および角度 (azimuth)に対応するHRIR を、メモリ３０２または外部記憶装置３０３にあらかじ記憶されたHRIRの数値の中から特定する（Ｓ６３）。なお、オーディオレンダラ１２２は、メモリ３０２等に記憶されたHRIRの数値を補間することによって算出したHRIRの数値を使用することとしてもよい。 For direct sound, the audio renderer 122 calculates the distance and angle (azimuth) between the user and the sound source for each sound source using the input coordinates (S62). Then, the audio renderer 122 identifies the HRIR corresponding to the distance and angle (azimuth) with the user from the HRIR values stored in advance in the memory 302 or the external storage device 303 (S63). The audio renderer 122 may use the HRIR value calculated by interpolating the HRIR value stored in the memory 302 or the like.
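Steps S62 and S63 can be sketched as follows. This is an illustrative sketch only: the function names, the coordinate convention (listener heading measured counterclockwise from the +x axis, positive azimuth to the left), the table layout keyed by (distance, azimuth, elevation), and the cost function used to pick the nearest measured entry are all assumptions for this example, not details given in the source.

```python
import math

def distance_and_angles(listener, heading_deg, source):
    """Return (distance, azimuth_deg, elevation_deg) of a source (cf. S62).

    Assumptions: listener and source are (x, y, z) tuples, heading_deg is
    measured counterclockwise from the +x axis, and the head is level.
    """
    dx, dy, dz = (s - l for s, l in zip(source, listener))
    dist = math.sqrt(dx * dx + dy * dy + dz * dz)
    az = math.degrees(math.atan2(dy, dx)) - heading_deg
    az = (az + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    el = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return dist, az, el

def nearest_hrir(table, dist, az, el, metres_to_degrees=30.0):
    """Pick the stored entry closest to the requested position (cf. S63).

    table maps (distance, azimuth_deg, elevation_deg) to an HRIR pair.
    The weighting of distance against angle is an arbitrary assumption.
    """
    def cost(key):
        d, a, e = key
        az_err = abs((a - az + 180.0) % 360.0 - 180.0)
        return az_err ** 2 + (e - el) ** 2 + (metres_to_degrees * (d - dist)) ** 2
    return table[min(table, key=cost)]
```

Interpolating between neighbouring entries, as the text permits, would replace the nearest-entry choice in `nearest_hrir`.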

そして、オーディオレンダラ１２２は、Ｓ６１において入力した信号列と、Ｓ６３において特定したHRIRの左チャネル用 HRIR と、を使用してたたみこみ (convolution) 計算を行い、左チャネル信号を生成する（Ｓ６４）。また、オーディオレンダラ１２２は、Ｓ６１において入力した信号列と、Ｓ６３において特定したHRIRの右チャネル用 HRIR と、を使用してたたみこみ (convolution) 計算を行い、右チャネル信号を生成する（Ｓ６５）。 The audio renderer 122 performs a convolution calculation using the signal sequence input in S61 and the HRIR for the left channel specified in S63, and generates a left channel signal (S64). The audio renderer 122 performs convolution calculation using the signal sequence input in S61 and the HRIR for the right channel specified in S63, and generates a right channel signal (S65).
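The convolution in steps S64 and S65 can be written out directly, as in the sketch below. The function names are hypothetical, and a practical renderer would more likely use an FFT-based or library convolution; the plain loops are used here only to make the operation explicit.

```python
def convolve(signal, impulse_response):
    """Full discrete convolution of two sequences of samples."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(impulse_response):
            # Each input sample contributes a scaled, delayed copy of the response.
            out[n + k] += s * h
    return out

def render_direct_sound(signal, hrir_left, hrir_right):
    """Return (left, right) channel signals for one sound source (cf. S64, S65)."""
    return convolve(signal, hrir_left), convolve(signal, hrir_right)
```

Because the left and right HRIRs differ, the two output channels differ in timing and level, which is what conveys the direction of the source over headphones.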

反射音については、オーディオレンダラ１２２は、入力された座標を用いて、付加すべき残響を計算する（Ｓ６６、Ｓ６７）。すなわち、オーディオレンダラ１２２は、仮想空間の属性による音響の変化の仕方 (インパルス応答) にもとづいて残響を計算する。以下、残響の計算について説明する。 For the reflected sound, the audio renderer 122 calculates reverberation to be added using the input coordinates (S66, S67). That is, the audio renderer 122 calculates the reverberation based on the way of changing the sound (impulse response) according to the attribute of the virtual space. Hereinafter, calculation of reverberation will be described.

Reverberation consists of early reflections and late reverberation. Early reflections are generally considered more important than late reverberation in forming (perceiving) a sense of the distance to other users, the size of the room, and so on. In a room in real space, it is said that, depending on conditions, several tens of early reflections from the walls, ceiling, floor, and so on can be heard between a few ms and about 100 ms after the sound emitted directly from the source (the direct sound) is heard. If the room is a rectangular parallelepiped, there are only six first-order early reflections. In a room with a more complex shape or with furniture, however, the number of reflected sounds increases, and sound reflected more than once by the walls and other surfaces is also heard.

One method of calculating early reflections is the image source method, which is described, for example, in the following document.

Allen, J. B. and Berkley, A., "Image Method for Efficiently Simulating Small-Room Acoustics", J. Acoustical Society of America, Vol. 65, No. 4, pp. 943-950, April 1979.

In the simple image source method, the walls, ceiling, and floor of the room are regarded as mirror surfaces, and each reflected sound is calculated as sound coming from an image of the sound source on the opposite side of the mirror surface.

FIG. 15 illustrates the image source method in two dimensions, omitting the ceiling and floor to simplify the explanation. The original sound room 1 is at the center and contains the sound source and the listener, that is, the own user. Around the sound room 1, twelve mirror images including the walls 2 of the room are drawn. The number of mirror images need not be twelve; it may be larger or smaller.

Assuming that sound from the position of each sound source image in each mirror image travels in a straight line to the listener (the own user), the audio renderer 122 calculates the distance and direction from each sound source image to the listener (S66). Because the strength of the sound is inversely proportional to the distance, the audio renderer 122 attenuates each volume according to the distance. In addition, where the reflectance of the walls is α (0 ≤ α ≤ 1), the samples of a sound reflected n times by the walls are multiplied by αⁿ to attenuate the volume further.
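The image positions and gains of S66 can be sketched for the two-dimensional case of FIG. 15. The sketch assumes an axis-aligned rectangular room with one corner at the origin and covers only the four first-order images; the function names and the `min_distance` clamp (which avoids division by a very small distance) are assumptions for illustration.

```python
import math


def first_order_images(source, width, height):
    """Mirror the source across the four walls x=0, x=width, y=0, y=height."""
    sx, sy = source
    return [
        (-sx, sy),               # image behind the wall x = 0
        (2.0 * width - sx, sy),  # image behind the wall x = width
        (sx, -sy),               # image behind the wall y = 0
        (sx, 2.0 * height - sy), # image behind the wall y = height
    ]


def reflection_gain(image, listener, alpha=0.6, n_reflections=1,
                    min_distance=1.0):
    """Gain of one image: inverse distance times alpha to the power n."""
    distance = max(math.dist(image, listener), min_distance)
    return (alpha ** n_reflections) / distance
```

Higher-order images are obtained by mirroring the first-order images again, with `n_reflections` increased by one for each additional wall crossed.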

A value of about 0.6 is used for the reflectance α. One reason is to obtain enough reverberation (that is, a suitable ratio of direct sound to reflected sound) for the listener to perceive the distance to the sound source. The other reason is that an excessively large value of α dulls the listener's sense of direction.

Then, for each sound source image, the audio renderer 122 selects the HRIR corresponding to the distance and angle (azimuth) from the own user from the HRIR values stored in advance in the memory 302 or the external storage device 303 (S67). Because the reflected sounds reach the head from different directions, HRIRs different from the direct-sound HRIR selected in S63 must be applied.

Applying a different HRIR to each of a large number of reflected sounds in the convolution calculation described later (S68, S69) requires an enormous amount of computation. To limit the computation, the HRIR for a sound source directly in front may be applied to the reflected sounds regardless of the actual direction of the sound source image. In that case, calculating only the difference in arrival time at the left and right ears (ITD, interaural time difference) and the difference in intensity (IID, interaural intensity difference) can stand in for the HRIR calculation at a small computational cost.
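The ITD and IID shortcut can be sketched as a per-ear delay and gain. The patent does not say how the two quantities are computed, so everything specific here is an assumption: the ITD uses Woodworth's spherical-head approximation, the IID uses a simple sine law with a hypothetical maximum of 6 dB, the head radius and sample rate are illustrative defaults, and positive azimuth means the source is to the listener's left.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius
SPEED_OF_SOUND = 343.0   # metres per second, in air at room temperature


def itd_seconds(azimuth_deg):
    """Woodworth-style ITD; positive when the source is on the left."""
    # Fold rear azimuths into the front hemisphere, [-90, 90] degrees.
    theta = math.asin(math.sin(math.radians(azimuth_deg)))
    return HEAD_RADIUS_M / SPEED_OF_SOUND * (theta + math.sin(theta))


def iid_gains(azimuth_deg, max_iid_db=6.0):
    """Return (left_gain, right_gain) from a sine-law level difference."""
    half_db = 0.5 * max_iid_db * math.sin(math.radians(azimuth_deg))
    return 10.0 ** (half_db / 20.0), 10.0 ** (-half_db / 20.0)


def apply_itd_iid(signal, azimuth_deg, sample_rate=48000):
    """Delay the far ear and scale both ears; outputs have equal length."""
    itd = itd_seconds(azimuth_deg)
    lag = round(abs(itd) * sample_rate)
    g_left, g_right = iid_gains(azimuth_deg)
    left = [g_left * s for s in signal]
    right = [g_right * s for s in signal]
    pad = [0.0] * lag
    if itd >= 0.0:   # source on the left: the right ear hears it later
        return left + pad, pad + right
    return pad + left, right + pad
```

The delayed and scaled signals would then be convolved with the single front-facing HRIR, so only one HRIR pair is needed for all reflections.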

The audio renderer 122 then convolves the signal sequence input in S61 with the left-channel HRIR of the HRIR selected in S67 to generate the reverberation of the left channel signal (S68). Likewise, the audio renderer 122 convolves the signal sequence input in S61 with the right-channel HRIR of the HRIR selected in S67 to generate the reverberation of the right channel signal (S69).

The audio renderer 122 then adds together all the left channel signals from the sound sources (S70). The left channel signals include the direct sound calculated in S64 and the reflected sound calculated in S68.

Likewise, the audio renderer 122 adds together all the right channel signals from the sound sources (S71). The right channel signals include the direct sound calculated in S65 and the reflected sound calculated in S69.
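Steps S70 and S71 amount to a sample-wise sum over all direct and reflected signals of one channel. A minimal sketch, assuming each signal is a list of float samples that starts at the same time and may differ in length (the function name is hypothetical):

```python
from itertools import zip_longest


def mix(signals):
    """Sum sample sequences position by position, padding short ones with 0."""
    return [sum(samples) for samples in zip_longest(*signals, fillvalue=0.0)]
```

The same function would be called once with every left channel signal and once with every right channel signal.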

The HRIR selection (S63, S67) is performed for each packet as described above, but the convolution calculation (S64, S65, S68, S69) produces a portion that must be carried over to the next packet. The selected HRIR or the input signal sequence therefore has to be retained until the next packet is processed.
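One way to handle the carried-over portion is overlap-add: convolve each packet on its own, emit as many samples as the packet holds, and add the leftover tail to the start of the next packet's output. The sketch below takes that approach, which is a substitute for the patent's option of retaining the HRIR or the input signal; the class name is hypothetical, and the HRIR is assumed to be non-empty and fixed between packets.

```python
class StreamingConvolver:
    """Packet-by-packet convolution that carries the output tail forward."""

    def __init__(self, impulse_response):
        self.h = list(impulse_response)
        # Samples produced by earlier packets that spill into the next one.
        self.tail = [0.0] * (len(self.h) - 1)

    def process(self, packet):
        """Return len(packet) output samples for one input packet."""
        full = [0.0] * (len(packet) + len(self.h) - 1)
        for i, s in enumerate(packet):
            for j, h in enumerate(self.h):
                full[i + j] += s * h
        # Add what the previous packets left over.
        for k, t in enumerate(self.tail):
            full[k] += t
        out, self.tail = full[:len(packet)], full[len(packet):]
        return out
```

Concatenating the outputs of successive calls gives the same samples as convolving the whole stream at once.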

In this way, the audio renderer 122 applies processing such as the volume adjustment obtained by the above calculation, the superposition of reverberation and echoes, and filtering to each user's voice transmitted from the audio communication unit 215 of each client, and controls the acoustic effects so that the result is the sound that should be heard at the own user's position in the virtual space. That is, the audio renderer 122 generates stereophonic sound in which the voices are localized by processing that follows from the attributes of the virtual space and the position relative to the other users.

The audio renderer 122 provided for each client may, as necessary, apply processing that follows from the attributes of the virtual space, such as reverberation and filtering, to the voice of the own user of that client. The own user's voice generated by the audio renderer 122 is output to the headphones 217, and the own user listens to it. Letting users hear the direct sound of their own voice can give a strange impression and, particularly when the delay is large, interferes with their speech, so the own user is normally not made to hear his or her own voice. It is possible, however, to withhold the direct sound and play back only the reflected sound (reverberation), with its delay kept within a few tens of ms. This lets the own user grasp, as a bodily sensation, his or her position in the virtual space or the size of the virtual space.

However, when the acoustic server 120 calculates the reverberation as in this embodiment, it often takes 100 ms or more from when the user speaks until the reflected sound calculated by the acoustic server 120 reaches the user. This delay is far larger than the delay of reflected sound in an ordinary room and can therefore confuse the user's perception. One conceivable solution is to implement on the client an audio renderer with the same functions as the audio renderer 122 of the acoustic server 120. For the own user's voice, the audio renderer implemented on the client calculates the reverberation, which reduces the delay and yields reverberation with about the same delay as in a real room. For the voices of other users, the acoustic server 120 calculates the reverberation as described above.

Next, the processing of the client is described.

The client processing is described below in the order of network connection processing, entrance processing, exit processing, own-user movement processing, other-user movement processing, and swing processing.

The network connection processing is the procedure for connecting to the network 101 and is executed when the client is powered on. First, the session control unit 223 transmits a login message containing the user's identification information and authentication information to the registration server 130. The registration server 130 authenticates the user's identification information and authentication information and sends the login message to the room management server 110. The presence provider 222 then receives the room list from the room management server 110 and displays it on the display 220.

The SIP (Session Initiation Protocol) REGISTER message may be used for communication between the client and the registration server 130. The SIP INVITE, BYE, SUBSCRIBE, and NOTIFY messages can be used for communication between the client and the room management server 110.

The entrance processing is the client processing performed when the user selects the room to enter from the room list displayed on the display 220. The presence provider 222 accepts the room selection instruction entered with the input device 305 and transmits an entrance message (enter) to the room management server 110. The entrance message contains the own user's identification information and the own user's position information and orientation information in the virtual space (hereinafter "position information and the like"). The own user's position information and the like are assumed to be stored in advance in the memory 302 or the external storage device 303 (hereinafter "memory or the like").

The entrance message may also be sent as a SIP INVITE message. The INVITE message declares the start of voice communication between the client and the room management server 110. On receiving the INVITE message, the room management server 110 instructs the acoustic server 120, and voice communication between the client and the acoustic server 120 starts. The INVITE message may be used in accordance with the IETF document RFC 3261.

At the same time as sending the entrance message, the client can send a SIP PUBLISH message to report the user's position. The PUBLISH message reports an event that has occurred in the virtual space of the selected room (for example, a user's movement) even when no notification request has been made.

The client can also send a SUBSCRIBE message to ask the room management server 110 to notify it of events in the virtual space, including changes in the positions of other users. The SUBSCRIBE message requests notification of events that occur in the virtual space of the selected room (for example, a user's movement). After receiving the SUBSCRIBE message, the room management server 110 notifies the client of the content of each event that occurs in the virtual space by means of a NOTIFY message. The NOTIFY message reports an event that has occurred in the virtual space (for example, a change in a user's position) in response to a notification request.

The PUBLISH, SUBSCRIBE, and NOTIFY messages may be used in accordance with the IETF document RFC 3265 (Roach, A. B., "Session Initiation Protocol (SIP)-Specific Event Notification") and the Internet draft "Session Initiation Protocol (SIP) Extension for Event State Publication" (Niemi, A., ed.).

The presence provider 222 then receives the attendee list of the selected room from the room management server 110, for example in the form of a NOTIFY message. The attendee list is assumed to contain the identification information and the position information and the like in the virtual space of the other users who have entered the room, together with the virtual space attributes of the selected room.

The exit processing is the processing performed when the user leaves the room. The presence provider 222 accepts the own user's exit instruction and transmits an exit message containing the own user's identification information to the room management server 110. When a SIP INVITE message was used as the entrance message, it is appropriate to use a BYE message as the exit message. If a SUBSCRIBE message was sent in the entrance processing, the client should send the room management server 110 an UNSUBSCRIBE message to cancel the notification request made by the SUBSCRIBE message.

The own-user movement processing is the processing performed when the own user changes his or her presence, that is, changes position or orientation in the virtual space.

FIG. 16 is a flowchart of the own-user movement processing. First, the space modeler 221 accepts the input of movement information from the pointing device 230 (S1101). That is, movement information is input to the space modeler 221 when the own user operates the pointing device 230.

On detecting input from the pointing device 230, the space modeler 221 calculates the own user's position and orientation after the movement from the position and orientation before the movement and the movement information input from the pointing device 230 (S1102). The pointing device 230 may instead input the post-movement position information and the like directly. The space modeler 221 then stores the calculated post-movement position information and the like in the memory or the like.
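The update in S1102 can be sketched as follows. The patent does not specify the format of the movement information, so this example assumes it is a forward step plus a turn angle; the `Pose` type, the function name, and the convention that headings are in degrees, counter-clockwise from the x axis, are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    x: float
    y: float
    heading_deg: float  # counter-clockwise from the +x axis


def apply_movement(pose, forward, turn_deg):
    """Turn first, then step forward along the new heading."""
    heading = (pose.heading_deg + turn_deg) % 360.0
    rad = math.radians(heading)
    return Pose(
        pose.x + forward * math.cos(rad),
        pose.y + forward * math.sin(rad),
        heading,
    )
```

The returned pose is what the space modeler would store and pass on to the graphics renderer and the presence provider.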

Next, the space modeler 221 notifies the graphics renderer 219 and the presence provider 222 of the calculated post-movement position information and the like (S1103). The graphics renderer 219 changes the own user's viewpoint according to the notified post-movement position and orientation in the virtual space and calculates (by coordinate transformation) how the other users appear in the virtual space. The graphics renderer 219 then creates the image data to be output to the screen as the view from that position and orientation, and updates the display screen.

The presence provider 222 notifies (transmits) the own user's post-movement position information and the like to the room management server 110, for example in the form of a NOTIFY message (S1104). A NOTIFY message is normally sent as the result of receiving a SUBSCRIBE message. The room management server 110 may therefore, on receiving an entrance message from the client 201, return the attendee list and also send a SUBSCRIBE message corresponding to that NOTIFY message.

The room management server 110 accepts the position information and the like notified by the presence provider 222 and updates that user's position information and the like in the attendee list. The room management server 110 then transmits the notified position information and the like to the acoustic server 120 and to the clients of the other users. The acoustic server 120 calculates how the other users' voices are heard at the notified position and orientation of the own user in the virtual space (see FIG. 14). The acoustic server 120 then applies processing such as the volume adjustment obtained by that calculation, reverberation, and filtering to the other users' voices, controls the acoustic effects so that the result is the sound that should be heard at the own user's position in the virtual space, and updates the stereophonic sound. Finally, the acoustic server 120 transmits the updated stereophonic audio signal to the client.

The other-user movement processing is the processing performed when the room management server 110 notifies the client of another user's position information and the like in the virtual space.

FIG. 17 is a flowchart of the other-user movement processing.

The space modeler 221 accepts another user's position information and the like in the virtual space from the room management server 110 via the presence provider 222 (S1201). The room management server 110 notifies (transmits) the position information and the like sent by a client in S1104 of FIG. 16 to the clients other than the sender. The space modeler 221 then stores the notified position information and the like in the memory or the like.

The space modeler 221 then uses the notified position information and the like to change the other user's position and orientation in the virtual space (S1202) and notifies the graphics renderer 219 of the changed position information and the like (S1203). As described for S1103 of FIG. 16, the graphics renderer 219 updates the display screen according to the notified position and orientation of the other user.

When the room management server 110 notifies the client of another user's position information and the like in the virtual space, it also notifies the acoustic server 120 of that information. The acoustic server 120 transmits to the client a stereophonic audio signal updated according to the other user's position information.

The swing processing is the processing performed when the left/right swing button 231 or the up/down swing button 232 is pressed.

FIG. 18 is a flowchart of the swing processing. The space modeler 221 accepts swing input from the left/right swing button 231 or the up/down swing button 232 (S1301). That is, when the own user finds it difficult to determine the direction of a communication partner, the own user presses the button 231 or 232 to instruct a virtual shake of the head in the virtual space.

On detecting input from the button 231 or 232, the space modeler 221 calculates the own user's orientation with the head turned left (or up) by a predetermined angle. The space modeler 221 then notifies the graphics renderer 219 and the presence provider 222 of the calculated orientation information of the own user (S1302).

Next, the space modeler 221 calculates the own user's orientation with the head turned right (or down) by a predetermined angle. The space modeler 221 then notifies the graphics renderer 219 and the presence provider 222 of the calculated orientation information of the own user (S1303).

As shown in FIG. 9 or FIG. 10, the graphics renderer 219 creates image data showing the view from the own user's viewpoint while the head is being shaken, and updates the display screen.

The presence provider 222 sequentially notifies (transmits) the own user's orientation information notified in S1302 and S1303 to the acoustic server 120 (S1304). The acoustic server 120 accepts the orientation information notified by the presence provider 222 and sequentially updates the own user's orientation information in the space modeler 124. The acoustic server 120 then calculates how the communication partner's voice is heard at the notified orientation of the own user in the virtual space (see FIG. 14). It applies processing such as the volume adjustment obtained by that calculation, reverberation, and filtering to the voice of the other user who is the communication partner, controls the acoustic effects so that the result is the sound that should be heard at the own user's position and orientation in the virtual space, and updates the stereophonic sound. Finally, the acoustic server 120 transmits the updated stereophonic audio signal to the client on which the button 231 or 232 was pressed.

The swing processing shown in FIG. 18 is performed for a predetermined time (for example, about one second), after which the client and the acoustic server 120 return to the state they were in before the button 231 or 232 was pressed.
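The sequence of orientations produced by a left/right swing (S1302, S1303, and the return to the original state) can be sketched as below. The swing angle of 15 degrees is a hypothetical value, since the patent says only "a predetermined angle", and headings are assumed to be in degrees with counter-clockwise (leftward) rotation positive.

```python
def swing_headings(heading_deg, swing_deg=15.0):
    """Return the headings to report in order: left, right, then original."""
    return [
        (heading_deg + swing_deg) % 360.0,  # head turned left (S1302)
        (heading_deg - swing_deg) % 360.0,  # head turned right (S1303)
        heading_deg % 360.0,                # state before the button press
    ]
```

Each heading in the list would be sent in turn to the graphics renderer and, through the presence provider, to the acoustic server. An up/down swing would vary the elevation in the same way.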

Next, the functional configuration and processing procedure of the room management server 110 are described. The registration server 130 is the same as in conventional communication using SIP, so its description is omitted.

FIG. 19 shows the functional configuration of the room management server 110. The room management server 110 has an interface unit 111 that exchanges various kinds of information with the clients and the acoustic server 120, a determination unit 112 that determines the type of a message from a client, a processing unit 113 that performs processing according to the determination result, and a storage unit 114 that manages and stores the attributes of the virtual spaces, the events that have occurred in the virtual spaces (user entry, exit, movement, and so on), the room list, the attendee lists, and the like.

The storage unit 114 stores in advance the attributes of the several virtual spaces managed by the room management server 110. In the entrance processing described above, the user selects the virtual space to enter from among these virtual spaces (virtual rooms). After that, the client transmits the various events of the user who has entered the virtual space to the room management server 110. Various events thus occur in each virtual space. The storage unit 114 stores this information in the memory 302 or the external storage device 303.

FIG. 20 shows the processing procedure of the room management server 110. The room management server 110 accepts requests from clients and processes them until the room management server 110 is stopped. First, the interface unit 111 waits for a message from a client (S1411). When a message is received, the determination unit 112 determines the type of the message accepted by the interface unit 111 (S1412).

In the case of a login message, the processing unit 113 instructs the interface unit 111 to transmit the room list to the client that sent the message (S1421). The interface unit 111 transmits the room list to that client, then returns to S1411 and waits for the next message.

In the case of an entrance message, the processing unit 113 adds the user of the client that sent the message to the attendee list of the designated room (S1431). That is, the processing unit 113 adds to the attendee list the user's identification information and the user's position information and orientation information in the virtual space, which are contained in the entrance message. Next, the processing unit 113 instructs the interface unit 111 to transmit the virtual space attributes of the designated room and the attendee list to the client that sent the message. The attendee list contains the identification information of all users who have entered the designated room, together with their position information and orientation information in the virtual space. The interface unit 111 transmits the attendee list to the sending client in accordance with the instruction (S1432). Processing then proceeds to S1436, described later.

In the case of a movement message, the processing unit 113 updates the position information and orientation information in the virtual space of the client (user) that sent the message in the attendee list (S1435). The position information and orientation information in the virtual space are contained in the movement message. The processing unit 113 then instructs the interface unit 111 to notify the clients of all attendees of the room concerned (except the client that sent the message) and the acoustic server 120 of the identification information of the user of the sending client and of that user's position information and orientation information in the virtual space (S1436). The interface unit 111 transmits the notification to each client and to the acoustic server 120 in accordance with the instruction, and returns to S1411.

S1436 is also performed for an entry message, after S1432. That is, on receiving an entry message, the processing unit 113 notifies the acoustic server 120 and the clients of the identification information of the sending client's user, the user's position information in the virtual space, and so on (S1436). As a result, once a client has entered the room, it carries out voice communication with a predetermined communication port of the acoustic server 120 (or with a port notified by the room management server 110 at the time of entry). Specifically, the audio communication unit 215 of each client sends a one-channel audio stream to the acoustic server 120 and receives a two-channel audio stream from the acoustic server 120.

In the case of an exit message, the processing unit 113 deletes the user of the sending client from the attendee list (S1441). The processing unit 113 then instructs the interface unit 111 to notify the clients of all attendees of the target room (except the sending client) and the acoustic server 120 that the user has left the room (S1442). Following this instruction, the interface unit 111 sends the notification to the clients, and processing returns to S1411.
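The entry, move, and exit handling described above can be sketched as follows. This is only an illustrative outline, not the implementation of the embodiment: the message dictionary layout, the `Room` class, the `handle_message` function, and the `ACOUSTIC_SERVER` label are all assumptions introduced for the example. The function returns the list of notifications that the interface unit would be asked to send.

```python
from dataclasses import dataclass, field

ACOUSTIC_SERVER = "acoustic-server-120"  # hypothetical recipient label


@dataclass
class Room:
    attributes: dict                               # virtual space attributes of the room
    attendees: dict = field(default_factory=dict)  # user id -> (position, orientation)


def handle_message(room, msg):
    """Return (recipient, payload) pairs to send in response to one message."""
    user, kind = msg["user"], msg["type"]
    out = []
    if kind == "enter":
        # S1431: add the sender to the attendee list.
        room.attendees[user] = (msg["position"], msg["orientation"])
        # S1432: send room attributes and the attendee list back to the sender.
        out.append((user, {"attributes": room.attributes,
                           "attendees": dict(room.attendees)}))
    elif kind == "move":
        # S1435: update the sender's position and orientation.
        room.attendees[user] = (msg["position"], msg["orientation"])
    elif kind == "exit":
        # S1441: remove the sender; S1442: tell everyone else it has left.
        room.attendees.pop(user, None)
        payload = {"user": user, "left": True}
        return [(r, payload) for r in list(room.attendees) + [ACOUSTIC_SERVER]]
    else:
        raise ValueError(f"unknown message type: {kind}")

    # S1436 (entry and move): notify the other attendees and the acoustic server.
    position, orientation = room.attendees[user]
    payload = {"user": user, "position": position, "orientation": orientation}
    others = [u for u in room.attendees if u != user]
    out.extend((r, payload) for r in others + [ACOUSTIC_SERVER])
    return out
```

Returning the notifications instead of sending them keeps the sketch independent of any particular transport; a real server would hand each pair to its network layer.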

An embodiment of the present invention has been described above.

In this embodiment, when it is difficult to determine the direction of a sound source merely by listening to the sound output from the headphones 217, the user uses the left/right swing button 231 or the up/down swing button 232 to virtually turn their head left and right or up and down in the virtual space. As a result, the sound of the sound source reproduced from the headphones 217 shifts (oscillates) left and right or up and down.

With the left/right swing button 231, the way the sound of the source changes differs depending on whether the source is in front of or behind the user, so the user can identify the true direction of the source. Likewise, with the up/down swing button 232, the way the sound changes differs depending on whether the source is above or below the user, so the user can again identify the true direction of the source. That is, by using the left/right swing button 231 or the up/down swing button 232, the user can more accurately determine (perceive) the direction of a sound source in the virtual space.
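The front/back case can be illustrated with a small geometric sketch. It is not the HRTF-based rendering of the embodiment: it assumes a 2D plan view, a heading measured clockwise from the +y axis, and uses the sine of the relative azimuth as a crude stand-in for the left/right cue. The function names `relative_azimuth` and `lateral_cue` are hypothetical.

```python
import math


def relative_azimuth(user_xy, heading_deg, source_xy):
    """Azimuth of the source relative to the user's facing direction.

    Degrees, clockwise positive, in [-180, 180). Heading 0 faces +y.
    """
    dx = source_xy[0] - user_xy[0]
    dy = source_xy[1] - user_xy[1]
    bearing = math.degrees(math.atan2(dx, dy))  # 0 = +y, clockwise positive
    return (bearing - heading_deg + 180.0) % 360.0 - 180.0


def lateral_cue(azimuth_deg):
    """Crude left/right cue: -1 is fully left, +1 is fully right (assumption)."""
    return math.sin(math.radians(azimuth_deg))


user = (0.0, 0.0)
front = (0.5, math.sqrt(3) / 2)   # 30 degrees to the right, in front
rear = (0.5, -math.sqrt(3) / 2)   # mirror image of that source, behind the user

# Facing straight ahead, both sources give the same left/right cue.
front_before = lateral_cue(relative_azimuth(user, 0.0, front))
rear_before = lateral_cue(relative_azimuth(user, 0.0, rear))

# After a virtual swing of 20 degrees to the right, the cues move in opposite directions.
front_after = lateral_cue(relative_azimuth(user, 20.0, front))
rear_after = lateral_cue(relative_azimuth(user, 20.0, rear))
```

In this simplified model the front source drifts toward the center while the rear source drifts further to the right, which is the kind of difference the swing operation lets the user hear.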

In this embodiment, an image of the virtual space that indicates distance, for example with a scale bar or a mesh (see FIGS. 8 and 11), is also output to the display 220. The user thereby learns the relationship between the distance to a sound source as perceived by ear from the three-dimensional sound output from the headphones 217 and the distance to that sound source as perceived by eye from the image shown on the display 220. Having learned this relationship, the user can judge the distance to a sound source in the virtual space more accurately just by listening to the three-dimensional sound. That is, by referring to image data of the virtual space that shows distance, the user can improve their ability to judge the distance to (the position of) a sound source in the virtual space.

Improving the ability to judge the distance to a sound source in the virtual space has the following effects.

First, in a conference system that uses a virtual space, accurately perceiving the distance to a communication partner (another user attending the conference) makes a richer communication environment possible. For example, in a conference with several other users as communication partners, knowing the distance between oneself and the speaker is important when judging whether a remark made by another user is addressed to oneself. If the speaking user is close to oneself in the virtual space, it is more likely that the remark is addressed to oneself.

Even in a one-to-one conversation, when another user speaks from close by (for example, having approached to about 1 m), the user perceives that the other user is seeking a closer relationship or wants to discuss something more important (more confidential). Conversely, when the other user speaks from farther away (for example, about 5 m), the user perceives that the other user is hostile or wants to discuss something less important (less confidential).

In conventional conference systems, it has been difficult to convey such non-verbal communication information. By learning the distance to sound sources in the virtual space, however, such non-verbal information can be conveyed, and richer communication can be achieved.

Second, in a game in which the player fights enemies in a virtual space, introducing a sense of distance makes a more realistic game possible. In a game that uses three-dimensional sound for fighting enemies in a virtual space, whether the player (the user) can accurately judge the distance to an opponent (another user or a given object) may decide the outcome. For example, a shot fired at the player by a distant opponent is unlikely to hit, whereas a shot fired by a nearby opponent is likely to hit. The player therefore needs to pay closer attention when an opponent is nearby. Learning the distance to an opponent in the virtual space is also important in visual games that use computer graphics, for example when an opponent approaches from behind and the opponent's presence and distance cannot be shown on the display.

The present invention is not limited to the embodiment described above, and many variations are possible within the scope of its gist.

For example, in the client of this embodiment, the graphics renderer 219 outputs image data of the virtual space (see FIGS. 8 to 11) to the display 220. However, because the present invention is a system centered on communication through three-dimensional sound using 3D audio technology, the client 201 need not output image data of the virtual space to the display 220. In that case, the client 201 has neither the graphics renderer 219 nor the display 220.

In this embodiment, the acoustic server 120 spatializes the user's voice (audio signal) sent from each client, separately for each client, using 3D audio technology. The acoustic server 120 then sends to each client the voices of the users as spatialized for that client. Alternatively, the clients may send and receive voice (audio signals) directly with one another, one to one, without going through the acoustic server 120, and each client may itself spatialize the voice received from the other clients.

In that case, each client differs from the client configuration shown in FIG. 3 in the following respects. The audio decoder 216 has the same function as the audio renderer 122 of the acoustic server 120 (see FIGS. 13 and 14). The audio communication unit 215 communicates directly with the other clients instead of communicating with the acoustic server 120. The acoustic server 120 is then unnecessary.

The virtual space in this embodiment is a space created virtually so that multiple users can hold a conference or conversation. The present invention is not limited to this, however, and the virtual space may be one created virtually so that a user can listen to various sound sources such as music or Internet broadcasts.

The stereophonic sound reproduction system described in this embodiment includes the registration server 130. However, when communication is not performed using the SIP protocol, the registration server is unnecessary.

FIG. 1 is an overall configuration diagram of this embodiment.
FIG. 2 is a hardware configuration diagram of each device in this embodiment.
FIG. 3 is a configuration diagram of the client in this embodiment.
FIG. 4 schematically shows the direction of a sound source in this embodiment.
FIG. 5 schematically shows the direction of the sound source when the user faces left.
FIG. 6 schematically shows the direction of the sound source when the user faces right.
FIG. 7 schematically shows the direction of the sound source when the user faces up or down.
FIG. 8 is an example of a display screen showing the virtual space in plan view.
FIG. 9 is an example of a display screen showing the virtual space when the user faces left and right.
FIG. 10 is an example of a display screen showing the virtual space when the user faces up and down.
FIG. 11 is an example of a display screen showing the virtual space in perspective view.
FIG. 12 illustrates the types of client in this embodiment.
FIG. 13 is a configuration diagram of the acoustic server in this embodiment.
FIG. 14 schematically shows the processing of the audio renderer in this embodiment.
FIG. 15 schematically shows mirror images used to explain early reflections in this embodiment.
FIG. 16 is a flowchart of the client's processing for movement of its own user.
FIG. 17 is a flowchart of the client's processing for movement of other users.
FIG. 18 is a flowchart of the client's swing processing.
FIG. 19 is a functional configuration diagram of the presence server in this embodiment.
FIG. 20 is a processing flowchart of the presence server in this embodiment.

Explanation of symbols

101 ... Network, 110 ... Presence server, 120 ... Acoustic server, 130 ... Registration server, 201, 202, 203 ... Client, 211 ... Microphone, 212 ... Audio encoder, 215 ... Audio communication unit, 216 ... Audio decoder, 217 ... Headphones, 219 ... Graphics renderer, 220 ... Display, 221 ... Spatial modeler, 222 ... Presence provider, 223 ... Session control unit, 230 ... Pointing device, 231 ... Left/right swing button, 232 ... Up/down swing button

Claims

A stereophonic sound reproduction system for controlling the sound of at least one sound source existing in a virtual space,
An acoustic server for controlling the acoustic effect of each of the sound sources, and a client used by the user,
The client
Accepting means for accepting a swing instruction to swing the head of the user placed in the virtual space left and right or up and down;
Client transmission means for transmitting the swing instruction received by the reception means to the acoustic server;
Client receiving means for receiving, from the acoustic server, a three-dimensional acoustic signal in which the acoustic effect of each of the sound sources is controlled;
Output means for outputting the stereophonic sound of the stereophonic sound signal received by the client receiving means,
The acoustic server is
Server storage means for storing the position and orientation of each of the user and the sound source in the virtual space;
Server receiving means for receiving an acoustic signal output by the sound source from each of the sound sources;
Acoustic control means for controlling an acoustic effect applied to each acoustic signal received by the server reception means based on the position and orientation of each of the user and the sound source stored in the server storage means;
Server transmission means for transmitting to the client a stereophonic sound signal in which the acoustic control means has controlled the acoustic effect;
Wherein, when the swing instruction is received from the client, the acoustic control means controls the acoustic effect applied to each of the received acoustic signals based on an orientation obtained by turning the user's orientation stored in the server storage means left and right or up and down.

The three-dimensional sound reproduction system according to claim 1,
A management server for managing positions of the user and the sound source in the virtual space,
The management server
Management server receiving means for receiving the position and orientation of the user and each of the sound sources in the virtual space from the client or the sound sources,
Management server storage means for storing the position and orientation received by the management server reception means;
Management server transmission means for transmitting the position and orientation of each of the user and the sound source stored in the management server storage means to the acoustic server,
The client
Further comprising position information transmitting means for transmitting the position and orientation of the user in the virtual space to the management server;
The acoustic server is
Further comprising position information receiving means for receiving the position and orientation of each of the user and the sound source in the virtual space from the management server and storing them in the server storage means;
The three-dimensional sound reproduction system, wherein the sound control means controls sound effects applied to each sound signal received by the server receiving means based on the position and orientation received by the position information receiving means.

The three-dimensional sound reproduction system according to claim 1,
The client
Client storage means for storing the position and orientation of each of the user and the sound source in the virtual space;
Image creation means for creating image data to be output to a display device based on the position and orientation stored in the client storage means;
The image creating means displays a scale for indicating a distance in the virtual space between the user and the sound source in the image data,
The acoustic control means of the acoustic server includes:
Based on the position and orientation of each of the user and the sound source stored in the server storage unit, a distance in the virtual space between the user and each of the sound sources is calculated,
A stereophonic sound reproduction system characterized in that, for each sound signal of the sound source received by the server receiving means, the sound effect is controlled by increasing or decreasing the ratio of the reflected sound to the direct sound according to the calculated distance.

The three-dimensional sound reproduction system according to claim 3,
A management server for managing positions of the user and the sound source in the virtual space,
The management server
Management server receiving means for receiving the position and orientation of the user and each of the sound sources in the virtual space from the client or each of the sound sources;
Management server storage means for storing the position and orientation received by the management server reception means;
Management server transmission means for transmitting the position and orientation of each of the user and the sound source stored in the management server storage means to the acoustic server,
The client
Further comprising position information transmitting means for transmitting the position and orientation of the user in the virtual space to the management server;
The acoustic server is
Further comprising position information receiving means for receiving the position and orientation of each of the user and the sound source in the virtual space from the management server and storing them in the server storage means;
Wherein the acoustic control means calculates the distance in the virtual space between the user and each of the sound sources based on the position and orientation received by the position information receiving means, and controls, according to the calculated distance, the acoustic effect applied to each acoustic signal received by the server receiving means.

The three-dimensional sound reproduction system according to claim 1,
The client
Client storage means for storing the position and orientation of each of the user and the sound source in the virtual space;
Image creation means for creating image data to be output to a display screen based on the position and orientation stored in the client storage means;
Wherein, when the swing instruction is received, the image creation means fixes the position and orientation of the user in the virtual space and creates image data in which each of the sound sources is moved relatively left and right or up and down around the user.

The three-dimensional sound reproduction system according to claim 5,
The sound source is a user other than the user arranged in the virtual space,
The image creation means of the client displays, in the image data, the shoulder width of the user and the shoulder width of each of the other users as an indication of the distance in the virtual space between the user and each of the other users,
The acoustic control means of the acoustic server includes:
Based on the positions and orientations of the user and each of the other users stored in the server storage unit, a distance in the virtual space between the user and each of the other users is calculated,
A stereophonic sound reproduction system characterized by controlling the sound effect by increasing or decreasing the ratio of reflected sound to direct sound according to the calculated distance for each of the other user's sound signals received by the server receiving means.

A stereophonic sound reproducing device for controlling the sound of at least one sound source existing in a virtual space,
Storage means for storing positions and orientations of the user using the stereophonic device and each of the sound sources in the virtual space;
Receiving means for receiving an acoustic signal output from the sound source from each of the sound sources;
Acoustic control means for controlling an acoustic effect applied to each acoustic signal received by the reception means based on the position and orientation of each of the user and the sound source stored in the storage means;
Accepting means for accepting a swing instruction to swing the head of the user placed in the virtual space left and right or up and down;
Output means for outputting stereophonic sound controlled by the sound control means,
Wherein, when the swing instruction is received, the acoustic control means controls the acoustic effect applied to each of the received acoustic signals based on an orientation obtained by turning the user's orientation stored in the storage means left and right or up and down.

A three-dimensional sound reproduction method in a three-dimensional sound reproduction system for controlling the sound of at least one sound source existing in a virtual space,
The three-dimensional sound reproduction system includes a sound server that controls sound effects of each of the sound sources, and a client that is used by a user.
The processing unit of the client performs:
An instruction accepting step of accepting a swing instruction to swing the head of the user placed in the virtual space left and right or up and down;
A client transmission step of transmitting the swing instruction accepted in the instruction accepting step to the acoustic server;
A client receiving step of receiving, from the acoustic server, a stereophonic sound signal in which the acoustic effect of each of the sound sources is controlled; and
An output step of outputting the stereophonic sound of the stereophonic sound signal received in the client receiving step,
The acoustic server includes a processing unit, and a storage unit that stores positions and orientations of the user and the sound source in the virtual space,
The processing unit of the acoustic server performs:
A server receiving step of receiving an acoustic signal output from the sound source from each of the sound sources;
An acoustic control step for controlling an acoustic effect applied to each acoustic signal received in the server reception step based on the position and orientation of each of the user and the sound source stored in the storage unit;
A server transmission step of transmitting a stereophonic sound signal whose acoustic effect is controlled in the acoustic control step to the client, and
Wherein, when the swing instruction is received from the client, the acoustic control step controls the acoustic effect applied to each of the received acoustic signals based on an orientation obtained by turning the user's orientation stored in the storage unit left and right or up and down.

A stereophonic sound reproduction method for controlling the sound of at least one sound source existing in a virtual space, performed by an information processing device,
The information processing device includes a processing unit, a storage unit that stores the position and orientation in the virtual space of each of the sound sources and of the user who uses the information processing device, and an instruction receiving unit that receives a swing instruction to swing the head of the user placed in the virtual space left and right or up and down,
The processing unit performs:
A receiving step of receiving an acoustic signal output from the sound source from each of the sound sources;
An acoustic control step for controlling an acoustic effect applied to each acoustic signal received in the reception step based on the position and orientation of each of the user and the sound source stored in the storage unit;
An output step for outputting the stereophonic sound controlled in the acoustic control step,
Wherein, when the instruction receiving unit receives the swing instruction, the acoustic control step controls the acoustic effect applied to each of the received acoustic signals based on an orientation obtained by turning the user's orientation stored in the storage unit left and right or up and down.

A stereophonic sound reproduction program, executed by an information processing apparatus, for controlling the sound of at least one sound source existing in a virtual space,
The program causing the information processing apparatus to function as:
Storage means for storing positions and orientations of the user using the stereophonic device and each of the sound sources in the virtual space;
Receiving means for receiving an acoustic signal output from the sound source from each of the sound sources;
Acoustic control means for controlling an acoustic effect applied to each acoustic signal received by the receiving means based on the position and orientation of each of the user and the sound source stored in the storage means;
Accepting means for accepting a swing instruction to swing the head of the user placed in the virtual space left and right or up and down; and
Output means for outputting the stereophonic sound controlled by the acoustic control means,
Wherein, when the swing instruction is accepted, the acoustic control means controls the acoustic effect applied to each of the received acoustic signals based on an orientation obtained by turning the user's orientation stored in the storage means left and right or up and down.