JP4426484B2

JP4426484B2 - Audio conference system, conference terminal and audio server

Info

Publication number: JP4426484B2
Application number: JP2005068918A
Authority: JP
Inventors: 泰金田
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2005-03-11
Filing date: 2005-03-11
Publication date: 2010-03-03
Anticipated expiration: 2025-03-11
Also published as: JP2006254167A

Description

The present invention relates to audio conference systems that use 3D audio technology.

Patent Document 1 discloses an audio conference system of the conference-room-connection type. In this system, a microphone and a speaker are installed in each conference room and the conference rooms are connected to one another. The voices of the participants in a conference room, picked up by that room's microphone, are output from the speakers installed in each of the other conference rooms.

Patent Document 2 discloses an audio conference system using 3D audio technology. In this system, the conference terminal of each participant in the audio conference outputs the voice data of the participants after 3D audio processing (stereophonic sound processing).

Patent Document 1: US Pat. No. 5,365,583
Patent Document 2: US Pat. No. 6,327,567

In the conference-room-connection type audio conference system described in Patent Document 1, when a conference room contains multiple participants, it is not easy to determine who spoke from the voice data collected by the microphone installed in that room.

In contrast, in the audio conference system using 3D audio technology described in Patent Document 2, each participant's voice undergoes 3D audio processing so that distance and direction are expressed. It is therefore easy to determine who spoke from the voice data. However, when several of the audio conference participants use the system in the same conference room, the distance and direction expressed by the voice data of a participant in that room may differ from the participant's actual position and orientation. In that case, the distance and direction of the participant's voice heard via the voice data differ from those of the voice heard directly, which causes a sense of incongruity.

The present invention has been made in view of the above circumstances. Its object is to prevent, in an audio conference system using 3D audio technology, a sense of incongruity in the direction and distance of each user expressed by the voice data.

To solve the above problem, the present invention provides a presence server that manages, for the user of each of a plurality of conference terminals participating in an audio conference, the position in a virtual space and the location in real space. Then, for each conference terminal, the presence server is used so that the voice data of the users of the other conference terminals, excluding any conference terminal present at the same real-space location as the conference terminal in question, is subjected to stereophonic sound processing based on the positions of those users in the virtual space and then synthesized.

For example, an audio conference system according to the present invention includes a plurality of conference terminals; a presence server that manages the position in a virtual space and the location in real space of the user of each of the plurality of conference terminals; and a plurality of relay devices, each installed at a respective location in real space and used by the conference terminals present at that location to communicate with the presence server.
Each of the plurality of conference terminals includes:
virtual position information transmitting means for transmitting, to the presence server, virtual position information of the own user (the user of the own conference terminal), including the own user's position and orientation in the virtual space;
position information receiving means for receiving, from the presence server, the virtual position information of the user of each conference terminal and location information indicating that user's location in real space;
other-conference-room user detecting means for detecting, based on the location information of the user of each conference terminal received by the position information receiving means, other-conference-room users, that is, users present at a location different from the own user's location in real space;
voice data transmitting means for transmitting the own user's voice data to the conference terminal of each other-conference-room user detected by the other-conference-room user detecting means;
voice data receiving means for receiving the voice data of the other-conference-room users from the conference terminal of each other-conference-room user detected by the other-conference-room user detecting means;
voice synthesis means for applying, to each piece of other-conference-room user voice data received by the voice data receiving means, stereophonic sound processing according to the relative position between that other-conference-room user and the own user in the virtual space, as specified by the virtual position information of that other-conference-room user and the virtual position information of the own user, and for synthesizing the processed voice data to generate stereophonically synthesized voice data; and
voice control means for outputting the stereophonically synthesized voice data generated by the voice synthesis means from a speaker.
The presence server includes:
management means for managing the virtual position information of the users sent from each of the plurality of conference terminals, and for managing the location information of each user based on the relay device through which the information sent from each of the plurality of conference terminals passed; and
position information transmitting means for transmitting, to each of the plurality of conference terminals, the virtual position information and location information of the user of each conference terminal managed by the management means.

Another audio conference system according to the present invention includes a plurality of conference terminals; a presence server that manages the position in a virtual space and the location in real space of the user of each of the plurality of conference terminals; a voice server that transmits voice data to each of the plurality of conference terminals; and a plurality of relay devices, each installed at a respective location in real space and used by the conference terminals present at that location to communicate with the presence server.
Each of the plurality of conference terminals includes:
virtual position information transmitting means for transmitting, to the presence server, virtual position information of the own user (the user of the own conference terminal), including the own user's position and orientation in the virtual space;
voice data transmitting means for transmitting the own user's voice data to the voice server;
stereophonically synthesized voice data receiving means for receiving stereophonically synthesized voice data from the voice server; and
voice control means for outputting the stereophonically synthesized voice data received by the stereophonically synthesized voice data receiving means from a speaker.
The voice server includes:
position information receiving means for receiving, from the presence server, the virtual position information of the user of each of the plurality of conference terminals and location information indicating that user's location in real space;
other-conference-room user detecting means for detecting, for each of the plurality of conference terminals and based on the location information of the user of each conference terminal received by the position information receiving means, other-conference-room users, defined here as users present at the same location in real space as the target user, who is the user of that conference terminal;
voice data receiving means for receiving, from each of the plurality of conference terminals, the voice data of the user of that conference terminal;
voice synthesis means for applying, for each of the plurality of conference terminals, to the voice data of each other-conference-room user detected by the other-conference-room user detecting means, stereophonic sound processing according to the relative position between that other-conference-room user and the target user in the virtual space, as specified by the virtual position information of that other-conference-room user and the virtual position information of the target user, and for synthesizing the processed voice data to generate stereophonically synthesized voice data; and
stereophonically synthesized voice data transmitting means for transmitting, to each of the plurality of conference terminals, the stereophonically synthesized voice data generated by the voice synthesis means for the target user of that conference terminal.
The presence server includes:
management means for managing the virtual position information of the users sent from each of the plurality of conference terminals, and for managing the location information of each of the plurality of conference terminals based on the relay device through which the information sent from each of the plurality of conference terminals passed; and
position information transmitting means for transmitting, to the voice server, the virtual position information and location information of the user of each conference terminal managed by the management means.

According to the present invention, a conference terminal does not output the voice data of users of other conference terminals present at the same location in real space. Only the voice data of users of other conference terminals present at different locations in real space is subjected to stereophonic sound processing and output. This prevents a sense of incongruity in the direction and distance of each user expressed by the voice data.

Embodiments of the present invention are described below.

<<First Embodiment>>
FIG. 1 is a schematic configuration diagram of an audio conference system to which the first embodiment of the present invention is applied. As shown in the figure, the audio conference system of this embodiment includes a presence server 1, a plurality of conference terminals 2, and a plurality of wireless LAN (Local Area Network)-APs (Access Points) 3A to 3C that connect to the presence server 1 via an IP (Internet Protocol) network 4.

The wireless LAN-APs 3A to 3C are installed in different conference rooms A to C, respectively, and are used by the conference terminals 2 present in conference rooms A to C to communicate with the presence server 1. Although FIG. 1 shows three wireless LAN-APs, the number of wireless LAN-APs is of course not limited to three.

FIG. 2 is a schematic configuration diagram of the wireless LAN-APs 3A to 3C.

As shown in the figure, each of the wireless LAN-APs 3A to 3C includes an IP network interface unit 301 for connecting to the IP network 4, a wireless LAN interface unit 302, and a location information transmission unit 303.

The wireless LAN interface unit 302 is an interface for connecting, via the wireless LAN, to the conference terminals 2 present in the conference room (A to C) in which the own wireless LAN-AP (3A to 3C) is installed.

When the location information transmission unit 303 receives a location information registration request accompanied by a user ID from a conference terminal 2 via the wireless LAN interface unit 302, it transmits location information containing that user ID and the APID, which is the identification information of the own wireless LAN-AP (3A to 3C), to the presence server 1 via the IP network interface unit 301.
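As a minimal sketch of the behavior described above, the following Python shows how an access point might attach its own APID to the user ID carried in a registration request. The names `LocationInfo` and `build_location_info`, the request layout, and the sample IDs are hypothetical; the patent text does not specify a message format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationInfo:
    """Hypothetical payload sent from a wireless LAN-AP to the presence server."""
    user_id: str
    ap_id: str


def build_location_info(request: dict, ap_id: str) -> LocationInfo:
    # The request is assumed to carry only the user ID; the AP adds its own APID.
    return LocationInfo(user_id=request["user_id"], ap_id=ap_id)


info = build_location_info({"user_id": "user1"}, ap_id="AP-3A")
```

Because the APID is added by the access point rather than by the terminal, the presence server can infer the conference room from the access point that relayed the request.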

The presence server 1 manages the position information in the virtual space and the location information (APID) of the user of each conference terminal 2. Here, the virtual space is a space created virtually so that the users of the conference terminals 2 can hold a conference. The attributes of the virtual space include, for example, the size of the space, the height of the ceiling, the reflectance, color, and texture of the walls and ceiling, the reverberation characteristics, and the rate at which sound is absorbed by the air in the space.

FIG. 3 is a schematic configuration diagram of the presence server 1.

As shown in the figure, the presence server 1 includes an IP network interface unit 101 for connecting to the IP network 4, a position information management unit 102, a SIP server processing unit 103, and a position information storage unit 104.

FIG. 4 is a diagram schematically showing the contents registered in the position information storage unit 104. As shown in the figure, the position information storage unit 104 stores a record 1040 for each user of a conference terminal 2. The record 1040 has a field 1041 for registering a user ID that uniquely identifies the user of the conference terminal 2, a field 1042 for registering the SIP-URI (Uniform Resource Identifier) of the conference terminal 2, a field 1043 for registering the IP address of the conference terminal 2, a field 1044 for registering virtual position information indicating the position (coordinates) and orientation (azimuth of the line of sight) of the user of the conference terminal 2 in the virtual space, and a field 1045 for registering location information indicating the location (conference room) of the user of the conference terminal 2.
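The record layout described above can be sketched as a small data structure. This is an illustration only: the class names, field names, and the choice of 2D coordinates with a heading in degrees are assumptions, since the text specifies only what each field holds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VirtualPosition:
    """Position and gaze direction in the virtual space (units are assumed)."""
    x: float
    y: float
    heading_deg: float


@dataclass
class Record1040:
    user_id: str                                         # field 1041
    sip_uri: str                                         # field 1042
    ip_address: str                                      # field 1043
    virtual_position: Optional[VirtualPosition] = None   # field 1044
    location_ap_id: Optional[str] = None                 # field 1045 (APID)


record = Record1040("user1", "sip:user1@example.com", "192.0.2.10")
record.virtual_position = VirtualPosition(1.0, 2.0, 90.0)
record.location_ap_id = "AP-3A"
```

Fields 1044 and 1045 are optional in this sketch because they are filled in by separate update messages, one from the terminal and one from the access point.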

The position information management unit 102 searches and updates the records 1040 registered in the position information storage unit 104.

The SIP server processing unit 103 uses the correspondence between SIP-URIs and IP addresses registered in the position information storage unit 104 to forward an INVITE message received from the calling conference terminal 2 to the called conference terminal 2.

FIG. 5 is a diagram explaining the operation flow of the presence server 1.

When the position information management unit 102 receives virtual position information together with a user ID from a conference terminal 2 via the IP network interface unit 101 (S1001), it searches the position information storage unit 104 for the record 1040 whose field 1041 contains that user ID (S1002), and updates the virtual position information registered in field 1044 of the retrieved record 1040 to the received virtual position information (S1003).

When the position information management unit 102 receives location information (APID) together with a user ID from one of the wireless LAN-APs 3A to 3C via the IP network interface unit 101 (S1004), it searches the position information storage unit 104 for the record 1040 whose field 1041 contains that user ID (S1005), and updates the location information registered in field 1045 of the retrieved record 1040 to the received location information (S1006).

When the position information management unit 102 receives a position information transmission request accompanied by a user ID from a conference terminal 2 via the IP network interface unit 101 (S1007), it reads all the records 1040 from the position information storage unit 104 (S1008) and returns them to the conference terminal 2 that sent the request (S1009).

When the SIP server processing unit 103 receives, from a conference terminal 2 via the IP network interface unit 101, an INVITE message that specifies a destination SIP-URI (S1010), it searches the position information storage unit 104 for the record 1040 whose field 1042 contains that SIP-URI (S1011). It then forwards the INVITE message to the IP address registered in field 1043 of the retrieved record 1040 (S1012).
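The four branches of the operation flow above (S1001 to S1012) can be sketched as methods on an in-memory store. This is a simplified illustration, not the patented implementation: the class and method names are hypothetical, records are plain dictionaries, and `add_record` is an assumption because the text does not describe how records are first created. Network transport and SIP message handling are omitted.

```python
from typing import Dict, List, Optional


class PresenceServerSketch:
    """Hypothetical stand-in for units 102 to 104 of the presence server."""

    def __init__(self) -> None:
        self.records: Dict[str, dict] = {}  # user ID -> record 1040

    def add_record(self, user_id: str, sip_uri: str, ip_address: str) -> None:
        # Assumption: a record exists before any update arrives.
        self.records[user_id] = {
            "user_id": user_id, "sip_uri": sip_uri, "ip_address": ip_address,
            "virtual_position": None, "ap_id": None,
        }

    def on_virtual_position(self, user_id: str, position: tuple) -> None:
        # S1001 to S1003: find the record by user ID and update field 1044.
        self.records[user_id]["virtual_position"] = position

    def on_location(self, user_id: str, ap_id: str) -> None:
        # S1004 to S1006: find the record by user ID and update field 1045.
        self.records[user_id]["ap_id"] = ap_id

    def on_position_request(self) -> List[dict]:
        # S1007 to S1009: return copies of all records to the requester.
        return [dict(r) for r in self.records.values()]

    def route_invite(self, dest_sip_uri: str) -> Optional[str]:
        # S1010 to S1012: look up the destination IP address by SIP-URI.
        for r in self.records.values():
            if r["sip_uri"] == dest_sip_uri:
                return r["ip_address"]
        return None


server = PresenceServerSketch()
server.add_record("user1", "sip:user1@example.com", "192.0.2.10")
server.on_virtual_position("user1", (1.0, 2.0, 90.0))
server.on_location("user1", "AP-3A")
```

In this sketch `route_invite` returns only the destination address; an actual SIP server would forward the INVITE message itself.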

Returning to FIG. 1, the description continues. The conference terminal 2 applies stereophonic sound processing to the voice data of the users of the other conference terminals 2 present in conference rooms other than the one in which the own conference terminal 2 is present, and outputs the result. The processing is based on the relative positional relationship between each such user's position information in the virtual space and the position information of the user of the own conference terminal 2. FIG. 6 is a schematic configuration diagram of the conference terminal 2.

As shown in the figure, the conference terminal 2 includes an audio input unit 201, an audio output unit 203, a video output unit 204, an operation reception unit 205, an audio encoder 206, an audio renderer 208, a presence provider 210, a space modeler 211, an IP processing unit 212 that processes IP packets, an RTP (Real-time Transport Protocol) processing unit 213, a SIP control unit 214, a seating information creation unit 217, an other-conference-room user detection unit 218, and a wireless LAN interface unit 219 for connecting to the wireless LAN-APs 3A to 3C via the wireless LAN.

The audio input unit 201 is an input terminal for the audio signal picked up by a microphone 221. The audio output unit 203 is an audio output terminal connected to headphones (or speakers) 223 that support 3D audio (for example, pseudo 5.1 channel). The operation reception unit 205 accepts the user's operations on a pointing device 225. The audio encoder 206 encodes the audio signal input to the audio input unit 201 and outputs voice data.

The RTP processing unit 213 stores the voice data output from the audio encoder 206 in RTP packets and transmits the RTP packets, via the IP processing unit 212 and the wireless LAN interface unit 219, to the destination IP addresses notified by the SIP control unit 214. The RTP processing unit 213 also extracts voice data from the RTP packets received from other conference terminals 2 via the wireless LAN interface unit 219 and the IP processing unit 212, and outputs the voice data, together with the source address of each RTP packet, to the other-conference-room user detection unit 218.

In accordance with the preset attributes of the virtual space, the space modeler 211 determines the own user's position (coordinates) and line-of-sight direction (azimuth) in the virtual space according to the own user's operations on the pointing device 225 accepted by the operation reception unit 205, and outputs the own user's position information, including the determined position and line-of-sight direction, to the presence provider 210. The space modeler 211 also receives from the presence provider 210 the records 1040 containing the position information and location information of the user of each conference terminal 2, holds them, and outputs them to the seating information creation unit 217 and the other-conference-room user detection unit 218.

The presence provider 210 periodically transmits the own user's position information received from the space modeler 211 to the presence server 1 via the IP processing unit 212 and the wireless LAN interface unit 219. The presence provider 210 also periodically transmits a position information transmission request to the presence server 1 via the IP processing unit 212 and the wireless LAN interface unit 219 and, in response, receives from the presence server 1 the record 1040 of each user participating in the audio conference. It then passes the received records 1040 to the space modeler 211.

Based on the own user's user ID, the other-conference-room user detection unit 218 identifies the own user's record 1040 among the records 1040 of all users received from the space modeler 211. It then extracts, as the records 1040 of other-conference-room users, the records 1040 of other users whose location information differs from the own user's location information (APID), that is, users present in a conference room different from the own user's. Next, from the voice data received together with source addresses from the RTP processing unit 213, the other-conference-room user detection unit 218 extracts, as the voice data of an other-conference-room user, the voice data whose source address matches the IP address contained in the record 1040 of that other-conference-room user. It attaches to each piece of extracted voice data the virtual position information contained in the record 1040 of the corresponding other-conference-room user and outputs the result to the audio renderer 208. The other-conference-room user detection unit 218 also outputs the virtual position information contained in the own user's record 1040 to the audio renderer 208.
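The two filtering steps described above, selecting users whose APID differs from the own user's and then keeping only voice data whose source address belongs to one of those users, can be sketched as follows. The function names, the dictionary-based record layout, and the sample addresses are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def detect_other_room_users(records: List[Record], own_user_id: str) -> List[Record]:
    """Return the records of users whose APID differs from the own user's APID."""
    own = next(r for r in records if r["user_id"] == own_user_id)
    return [r for r in records
            if r["user_id"] != own_user_id and r["ap_id"] != own["ap_id"]]


def select_other_room_audio(packets: List[Tuple[str, bytes]],
                            other_room: List[Record]) -> List[Tuple[bytes, object]]:
    """Keep voice data whose source IP matches an other-room user; attach that user's position."""
    by_ip = {r["ip_address"]: r for r in other_room}
    return [(audio, by_ip[src]["virtual_position"])
            for src, audio in packets if src in by_ip]


records = [
    {"user_id": "a", "ip_address": "192.0.2.1", "ap_id": "AP-3A", "virtual_position": (0.0, 0.0, 0.0)},
    {"user_id": "b", "ip_address": "192.0.2.2", "ap_id": "AP-3A", "virtual_position": (1.0, 0.0, 0.0)},
    {"user_id": "c", "ip_address": "192.0.2.3", "ap_id": "AP-3B", "virtual_position": (0.0, 2.0, 180.0)},
]
others = detect_other_room_users(records, "a")
packets = [("192.0.2.2", b"same-room"), ("192.0.2.3", b"other-room")]
selected = select_other_room_audio(packets, others)
```

In this example user "b" shares access point AP-3A with user "a", so b's voice data is dropped and only the data from user "c" is passed on for rendering.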

Based on the position information in each user's record 1040 received from the space modeler 211, the seating information creation unit 217 generates seating information display data that shows the own user's placement position 2161 and each other user's placement position 2162 in the virtual space, for example as shown in FIG. 7. It then displays the seating information display data on a display 224 via the video output unit 204.

The audio renderer 208 receives the own user's virtual position information from the other-conference-room user detection unit 218. It also receives the voice data of each other-conference-room user together with that user's virtual position information. It then buffers the received voice data of each other-conference-room user so that the pieces of voice data are synchronized (associated) with one another. This buffering (playout buffering) method is described, for example, in "Colin Perkins: RTP: Audio and Video for the Internet, Addison-Wesley Pub Co; 1st edition (June 11, 2003)". The audio renderer 208 then renders each piece of synchronized other-conference-room user voice data in three dimensions based on the relative position between that other user and the own user in the virtual space, as specified by the virtual position information of that other-conference-room user and the virtual position information of the own user. Finally, the audio renderer 208 outputs two-channel (left and right) signal data (signal sequences) to the 3D-audio-capable headphones 223 connected to the audio output unit 203.

The audio renderer 208 is now described in more detail. In 3D audio technology, the direction and distance of a sound are expressed mainly by the HRIR (Head Related Impulse Response), which represents how sound changes around a person's head (hereinafter, "human head") as an impulse response, and by pseudo-reverberation generated by a virtual environment such as a room. The HRIR is determined by the distance between the sound source and the human head and by the angles (horizontal and vertical) between the human head and the sound source. The audio renderer 208 is assumed to store HRIR values measured in advance with a dummy head for each distance and each angle. Different HRIR values are used for the left channel (measured at the dummy head's left ear) and for the right channel (measured at the dummy head's right ear), which expresses the sense of left-right, front-back, and up-down direction.

FIG. 8 is a diagram explaining the processing of the audio renderer 208. For each piece of voice data sent from the other-conference-room user detection unit 218 together with the virtual position information of an other-conference-room user, the audio renderer 208 performs the following calculation for each other-conference-room user.

First, for each other conference room user, the audio renderer 208 accepts from the other conference room user detection unit 218 the signal sequence s_i[t] (t = 1, ...) of that user's audio data together with that user's virtual position information. It then sets the virtual position information of that other conference room user and the virtual position information of the own user as the parameters used for the 3D audio processing of the signal sequence s_i[t] (t = 1, ...) (S3001).

Next, for each other conference room user, the audio renderer 208 calculates the direct sound of the audio data and the reflected sound that forms the reverberation. For the direct sound, it uses the virtual position information set as parameters to calculate the distance and angle (azimuth) in the virtual space between that other conference room user and the own user (S3002). It then identifies, among the HRIR values stored in advance, the HRIR corresponding to that distance and angle (S3003). The audio renderer 208 may instead use an HRIR value calculated by interpolating the stored HRIR values.
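Steps S3002 and S3003 can be illustrated with a minimal sketch. It assumes a two-dimensional virtual space, a listener heading given in degrees, and an HRIR table keyed by (distance, azimuth); the function names, the table layout, and the nearest-neighbor selection rule are hypothetical and are not specified in this description.

```python
import math


def relative_distance_and_azimuth(listener_pos, listener_heading_deg, source_pos):
    """Return (distance, azimuth_deg) of a source relative to the listener.

    Assumption: 2D positions (x, y); azimuth is 0 straight ahead and
    positive counter-clockwise, normalized to [-180, 180).
    """
    dx = source_pos[0] - listener_pos[0]
    dy = source_pos[1] - listener_pos[1]
    distance = math.hypot(dx, dy)
    bearing = math.degrees(math.atan2(dy, dx))
    azimuth = (bearing - listener_heading_deg + 180.0) % 360.0 - 180.0
    return distance, azimuth


def nearest_hrir(table, distance, azimuth_deg):
    """Pick the stored entry whose (distance, azimuth) key is closest.

    Assumption: angular error is compared first, then distance error.
    """
    def cost(key):
        key_distance, key_azimuth = key
        angle_error = abs((key_azimuth - azimuth_deg + 180.0) % 360.0 - 180.0)
        return (angle_error, abs(key_distance - distance))

    return table[min(table, key=cost)]
```

In a real renderer each table value would be a pair of measured impulse responses (left ear, right ear), and interpolation between neighboring entries could replace the nearest-neighbor choice.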

Next, the audio renderer 208 convolves the signal sequence input in S3001 with the left-channel HRIR of the HRIR identified in S3003 to generate the left channel signal (S3004). Likewise, it convolves the signal sequence input in S3001 with the right-channel HRIR of the HRIR identified in S3003 to generate the right channel signal (S3005).
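The convolution in S3004 and S3005 can be sketched as a direct-form discrete convolution. This is an illustration only: the function names are hypothetical, and a practical renderer would more likely use an FFT-based method for long impulse responses.

```python
def convolve(signal, impulse_response):
    """Full discrete convolution: len(signal) + len(impulse_response) - 1 samples."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_direct(signal, hrir_left, hrir_right):
    """Produce the (left, right) direct-sound signals for one speaker."""
    return convolve(signal, hrir_left), convolve(signal, hrir_right)
```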

For the reflected sound, the reverberation to be added is calculated using the position information set as parameters in S3001 (S3006, S3007). That is, the audio renderer 208 calculates the reverberation based on how sound changes (the impulse response) according to the attributes of the virtual space. The calculation of reverberation is described below.

Reverberation consists of early reflections and late reverberation. Early reflections are generally considered more important than late reverberation in forming (perceiving) a sense of the distance to other conference room users and of the size of the room (virtual space). In a room in real space, it is said that, after the sound emitted directly from the source (the direct sound) is heard, several tens of early reflections from walls, ceiling, floor, and so on can be heard within a few ms to about 100 ms, depending on conditions. If the room is a rectangular parallelepiped, there are only six first-order (single-bounce) early reflections. In a room with a more complex shape or with furniture, however, the number of reflections increases, and sound reflected multiple times off the walls is also heard.

One method for calculating early reflections is the image source method, described for example in Allen, J.B. and Berkley, A., "Image Method for Efficiently Simulating Small-Room Acoustics", J. Acoustical Society of America, Vol. 65, No. 4, pp. 943-950, April 1979. In the simple image source method, the walls, ceiling, and floor of the room are treated as mirror surfaces, and each reflected sound is calculated as sound coming from an image of the sound source located on the opposite side of the mirror surface.

FIG. 9 schematically shows, for simplicity, a two-dimensional image source method in which the ceiling and floor are omitted. The original virtual space, the virtual conference room 2081, is at the center, and the own user and the other conference room users are present in it. Around the virtual conference room 2081, twelve mirror images including the room walls 2082 are drawn. The number of mirror images need not be twelve; it may be larger or smaller.

The audio renderer 208 treats the sound from each image of an other conference room user, located in each of the mirror images, as travelling in a straight line to the own user (the listener), and calculates the distance and direction from each image to the own user (S3006). Because sound intensity is inversely proportional to distance, the audio renderer 208 attenuates each volume according to distance. In addition, where α (0 ≤ α ≤ 1) is the reflectance of the walls, a sound sample reflected n times by the walls is multiplied by αⁿ to attenuate its volume further.
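The image positions and the attenuation rule of S3006 can be sketched as follows. The sketch assumes a two-dimensional rectangular room with one corner at the origin and covers only first-order images; the function names and the `min_distance` clamp (added to avoid division by very small distances) are hypothetical and not part of this description.

```python
import math


def first_order_images(source, room_width, room_height):
    """Mirror a 2D source across the four walls of a rectangular room.

    Assumption: the room spans [0, room_width] x [0, room_height].
    """
    x, y = source
    return [
        (-x, y),                     # left wall  (x = 0)
        (2.0 * room_width - x, y),   # right wall (x = room_width)
        (x, -y),                     # lower wall (y = 0)
        (x, 2.0 * room_height - y),  # upper wall (y = room_height)
    ]


def reflection_gain(image, listener, alpha=0.6, n_reflections=1, min_distance=1.0):
    """Return (gain, distance) for one image source.

    The gain is alpha**n_reflections divided by the image-to-listener
    distance, following the 1/distance and alpha^n rules in the text.
    """
    distance = math.hypot(image[0] - listener[0], image[1] - listener[1])
    gain = (alpha ** n_reflections) / max(distance, min_distance)
    return gain, distance
```

Higher-order images, such as the outer ring of mirror rooms in FIG. 9, can be obtained by mirroring the first-order images again and passing the corresponding reflection count as `n_reflections`.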

A value of about 0.6 is used for the reflectance α. One reason is to obtain enough reverberation (that is, a suitable ratio of direct sound to reflected sound) for the own user to perceive the distance to an other conference room user. Another reason is that an excessively large α dulls the own user's sense of direction.

Next, for each image of an other conference room user, the audio renderer 208 identifies, among the HRIR values stored in advance, the HRIR corresponding to the distance and angle between that image and the own user (S3007). Because each reflected sound reaches the human head from a different direction, an HRIR different from the direct-sound HRIR identified in S3003 must be applied.

Performing the convolution calculations described later (S3008, S3009) with a different HRIR for each of a large number of reflected sounds would require an enormous amount of computation. To limit the computation, the HRIR for a sound source directly in front may be applied to the reflected-sound calculation regardless of the actual direction of the source. In that case, only the interaural time difference (ITD) and the interaural intensity difference (IID) with which the sound reaches the left and right ears are calculated, which substitutes for the per-direction HRIR calculation at a small computational cost.

Next, the audio renderer 208 convolves the signal sequence input in S3001 with the left-channel HRIR of the HRIR identified in S3007 to generate the reverberation of the left channel signal (S3008). Likewise, it convolves the signal sequence input in S3001 with the right-channel HRIR of the HRIR identified in S3007 to generate the reverberation of the right channel signal (S3009).

Once the audio renderer 208 has calculated the left channel signals of all other conference room users as described above, it adds them all together (S3010). The left channel signal includes the direct sound calculated in S3004 and the reflected sound calculated in S3008.

Similarly, once the audio renderer 208 has calculated the right channel signals of all other conference room users as described above, it adds them all together (S3011). The right channel signal includes the direct sound calculated in S3005 and the reflected sound calculated in S3009.
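The summation in S3010 and S3011 amounts to a sample-wise addition of the per-user signals of one channel. A minimal sketch follows; the function name is hypothetical, and shorter signals are assumed to be implicitly zero-padded.

```python
def mix(signals):
    """Sample-wise sum of several signals of possibly different lengths."""
    length = max((len(s) for s in signals), default=0)
    out = [0.0] * length
    for signal in signals:
        for i, value in enumerate(signal):
            out[i] += value
    return out
```

The same function would be called once with all left-channel signals (direct and reflected, for every other conference room user) and once with all right-channel signals.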

The HRIR calculation (S3003, S3007) is performed for each RTP packet's worth of audio data. In the convolution calculation (S3004, S3005, S3008, S3009), however, part of the result must be carried over into the next packet's worth of audio data. The identified HRIR or the input signal sequence must therefore be retained until the next packet's worth of audio data has been processed.
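The carry-over between packets can be illustrated with a small overlap-add sketch. It assumes the impulse response stays the same from one packet to the next and keeps the tail of the convolution output rather than the input samples; that is a simplification of the retention described above, and the class name is hypothetical.

```python
class StreamingConvolver:
    """Convolve a packetized signal with a fixed impulse response."""

    def __init__(self, impulse_response):
        if not impulse_response:
            raise ValueError("impulse_response must not be empty")
        self.h = list(impulse_response)
        # Output samples that spill past the end of the current packet.
        self.tail = [0.0] * (len(self.h) - 1)

    def process(self, packet):
        """Return len(packet) output samples and store the new tail."""
        full = [0.0] * (len(packet) + len(self.h) - 1)
        for i, s in enumerate(packet):
            for j, h in enumerate(self.h):
                full[i + j] += s * h
        # Add what the previous packets carried over.
        for k, carried in enumerate(self.tail):
            full[k] += carried
        self.tail = full[len(packet):]
        return full[:len(packet)]
```

Processing the packets one after another and concatenating the outputs gives the same samples as convolving the whole signal at once, apart from the final tail that remains stored in the object.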

In this way, the audio renderer 208 applies to the audio data of each other conference room user sent from the other conference room user detection unit 218 processing such as volume adjustment by the calculations above, superposition of reverberation and reflected sound, and filtering, so that acoustic effects render the sound as it should be heard at the own user's position in the virtual space. That is, the audio renderer 208 generates stereophonic sound in which each voice is localized by processing that follows from the attributes of the virtual space and the position of each other conference room user relative to the own user.

Returning to FIG. 6, the description continues. The SIP control unit 214 holds a table in which the user ID and SIP-URI of each conference terminal 2 are registered and, as needed, uses this table to establish connections with other conference terminals 2.

FIG. 10 illustrates the operation flow of the SIP control unit 214.

When the conference terminal 2 starts up, the SIP control unit 214 establishes a connection with each conference terminal 2 whose user ID was notified by the other conference room user detection unit 218 as that of an other conference room user. First, the SIP control unit 214 extracts, from the SIP-URIs registered in its table, the SIP-URI of an other conference room user that has not yet been extracted (S4001). Next, it sends an INVITE message addressed to the extracted SIP-URI to the presence server 1 via the IP processing unit 212 and the wireless LAN interface unit 219, and attempts to establish a connection with the conference terminal 2 that has that SIP-URI (S4002). It then checks whether the SIP-URIs of all other conference room users registered in its table have been extracted (S4003). If not, it returns to S4001; if so, it ends the start-up connection establishment processing and enters a state of waiting for events.

When the SIP control unit 214 receives an INVITE message from the IP network 4 via the wireless LAN interface unit 219 and the IP processing unit 212 (YES in S4101), it executes a call control sequence conforming to SIP with the conference terminal 2 that sent the INVITE message (the calling side) and establishes a connection with that conference terminal 2 (S4102).

When the SIP control unit 214 receives, via the wireless LAN interface unit 219 and the IP processing unit 212, a BYE message from a partner conference terminal 2 with which a connection is established (YES in S4201), it executes a call control sequence conforming to SIP with that conference terminal 2 and releases the connection with it (S4202).

When the other conference room user detection unit 218 newly notifies the SIP control unit 214 of the user IDs of other conference room users (YES in S4301), the SIP control unit 214 extracts from the SIP-URIs registered in its table the SIP-URI associated with each notified user ID, and checks whether a connection is established with each of those SIP-URIs. If there is a SIP-URI with which no connection is established (YES in S4302), the SIP control unit 214 sends an INVITE message addressed to that SIP-URI to the presence server 1 via the IP processing unit 212 and the wireless LAN interface unit 219, and attempts to establish a connection with the conference terminal 2 that has that SIP-URI (S4303).

On the other hand, if connections are established with all SIP-URIs associated with the newly notified user IDs of other conference room users (NO in S4302), the SIP control unit 214 uses its table to check whether the SIP-URIs of the partners with which connections are currently established include a SIP-URI associated with the user ID of a user other than the other conference room users newly notified by the other conference room user detection unit 218. If a connection is established with a SIP-URI associated with the user ID of such a user (YES in S4304), the SIP control unit 214 sends a BYE message addressed to that SIP-URI, via the IP processing unit 212 and the wireless LAN interface unit 219, to the conference terminal 2 that has that SIP-URI, and releases the connection with that conference terminal 2 (S4305).
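The decisions in S4302 to S4305 reduce to comparing two sets: the SIP-URIs of the newly notified other conference room users and the SIP-URIs with which connections are currently established. The sketch below computes both differences in one call, whereas the flow in FIG. 10 reaches the BYE branch only when no INVITE is pending; the function name, the argument names, and the example URIs are hypothetical, and no actual SIP messages are sent.

```python
def plan_connection_changes(notified_user_ids, connected_uris, uri_by_user_id):
    """Return (uris_to_invite, uris_to_bye) as sorted lists.

    notified_user_ids: user IDs reported as other conference room users.
    connected_uris:    SIP-URIs with an established connection.
    uri_by_user_id:    the table mapping user ID -> SIP-URI.
    """
    wanted = {
        uri_by_user_id[user_id]
        for user_id in notified_user_ids
        if user_id in uri_by_user_id
    }
    connected = set(connected_uris)
    uris_to_invite = sorted(wanted - connected)  # corresponds to S4303
    uris_to_bye = sorted(connected - wanted)     # corresponds to S4305
    return uris_to_invite, uris_to_bye


# Hypothetical table; the URIs are placeholders.
table = {
    "B": "sip:b@example.com",
    "C": "sip:c@example.com",
    "D": "sip:d@example.com",
}
invite, bye = plan_connection_changes(
    ["D"], ["sip:b@example.com", "sip:c@example.com"], table
)
```

In this usage the terminal is connected to users B and C and is then notified that only user D is in another conference room, so `invite` holds the URI of D and `bye` holds the URIs of B and C.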

The presence server 1 configured as described above can be implemented on a general computer system, as shown in FIG. 11, that has a CPU 401 that processes and computes data according to a program, a memory 402 that the CPU 401 can read and write directly, an external storage device 403 such as a hard disk, a communication device 404 for data communication with external systems via the IP network 4, an input device 405, and an output device 406. Specific examples are a server and a host computer.

The wireless LAN-APs 3A to 3C configured as described above can be implemented on a computer system in which a wireless communication device for connecting to the wireless LAN is added to the configuration shown in FIG. 11.

The conference terminal 2 configured as described above can be implemented on a computer system in which, in the configuration shown in FIG. 11, a wireless communication device for connecting to the wireless LAN is installed in place of the communication device 404. Examples are a PDA (Personal Digital Assistant), a handheld computer, and a wearable computer.

FIG. 12 shows an example in which a PDA or a handheld computer is used as the conference terminal 2. The device body 230 is provided with a display 224, a pointing device 225, and a wireless LAN antenna 231. The headset connected to the device body 230 has a microphone 221 and 3D-audio-compatible headphones 223.

The pointing device 225 has a forward button 2251, a backward button 2252, a move-left button 2253, a move-right button 2254, and a select button 2255. For example, pressing the forward button 2251 moves the user forward in the virtual space, and pressing the backward button 2252 moves the user backward in the virtual space. The pointing device 225 may be a touch panel. That is, the surface of the display 224 may be covered with a transparent screen (touch panel) on which elements for detecting contact by a finger or the like are arranged, forming a touch screen. The user can then perform input operations easily by touching the display 224 with a finger or a dedicated pen.

The illustrated headset is connected to the device body 230 by wire, but it may instead be connected by short-range wireless communication such as Bluetooth (registered trademark) or IrDA.

Each function of each device described above is realized by the CPU 401 executing a predetermined program loaded into or stored in the memory 402 (the presence server program in the case of the presence server 1, the wireless LAN-AP program in the case of the wireless LAN-APs 3A to 3C, and the conference terminal program in the case of the conference terminal 2).

Next, the general operation of the audio conference system configured as described above is explained.

FIG. 13 illustrates the general operation of the audio conference system shown in FIG. 1. As an example, users A and D are initially in the same conference room A and users B and C are in a different conference room B; user A then moves to conference room B, where users B and C are. The general operation when user A participates in the audio conference is described for this case.

First, the conference terminal 2 of user A sends the virtual position information of user A to the presence server 1. In response, the presence server 1 registers the virtual position information of user A (S5001). The conference terminal 2 of user A also sends a location information transmission request to the wireless LAN-AP 3A that it uses for wireless communication (S5002). In response, the wireless LAN-AP 3A sends its own APID to the presence server 1 as the location information of user A, and the presence server 1 registers the location information of user A (S5003). The conference terminal 2 of user A then sends a position information transmission request to the presence server 1 (S5004). In response, the presence server 1 sends the virtual position information and location information of users A to D (S5005).

Next, the conference terminal 2 of user A detects the other conference room users (S5006). Initially, users A and D are in the same conference room A and users B and C are in a different conference room B, so users B and C are detected as other conference room users. The conference terminal 2 of user A therefore sends INVITE messages addressed to the SIP-URIs of the conference terminals 2 of users B and C to the presence server 1, and the presence server 1 forwards these INVITE messages to the conference terminals 2 of users B and C (S5007). The conference terminal 2 of user A thereby establishes connections with the conference terminals 2 of users B and C and conducts the audio conference over these connections (S5008). Because user D is in the same conference room A, no connection is established with user D. User A and user D talk to each other directly, without going through the conference system of this embodiment.

Now suppose that user A moves to conference room B, where users B and C are. In conference room B as well, the conference terminal 2 of user A performs the same processing as S5001 to S5006 (S5009 to S5014). As a result, only user D is detected as an other conference room user. The conference terminal 2 of user A therefore sends a BYE message to each of the conference terminals 2 of users B and C, with which connections are established, and releases those connections (S5015). The conference terminal 2 of user A also sends an INVITE message addressed to the SIP-URI of the conference terminal 2 of user D to the presence server 1, and the presence server 1 forwards this INVITE message to the conference terminal 2 of user D (S5016). The conference terminal 2 of user A thereby establishes a connection with the conference terminal 2 of user D and conducts the audio conference over this connection (S5017). Because users B and C are in the same conference room B, no connection is established with users B and C. Users A, B, and C talk to one another directly, without going through the conference system of this embodiment.

The first embodiment of the present invention has been described above. In this embodiment, the conference terminal 2 does not output the audio data of users of other conference terminals 2 who are in the same conference room in real space. Only the audio data of users of other conference terminals 2 who are in a different conference room in real space is subjected to stereophonic processing and output. This prevents the direction and distance of each user, as expressed by the audio data, from causing a sense of incongruity.

<<Second Embodiment>>
FIG. 14 is a schematic configuration diagram of an audio conference system to which the second embodiment of the present invention is applied. As illustrated, the audio conference system of this embodiment has a presence server 1′, an audio server 5, a plurality of conference terminals 2′, and a plurality of wireless LAN (Local Area Network) APs 3A to 3C that connect to the presence server 1′ via the IP network 4. In this embodiment, components having the same functions as in the first embodiment are given the same reference numerals.

The presence server 1′ manages the virtual position information and location information of the user of each conference terminal 2′. In response to a position information transmission request from the audio server 5, it sends the virtual position information and location information of the user of each conference terminal 2′ to the audio server 5. The presence server 1′ of this embodiment is the presence server 1 of the first embodiment shown in FIG. 3 with the SIP processing unit 103 omitted. Its processing flow is the same as the processing flow of the presence server 1 of the first embodiment shown in FIG. 5 with the SIP processing (S1010 to S1012) omitted.

The audio server 5 receives the audio data of the user of each conference terminal 2′. For each conference terminal 2′, the audio server 5 also generates conference audio data (3D audio data) for the user of that conference terminal 2′ and sends it to that conference terminal 2′. FIG. 15 is a schematic diagram of the audio server 5.

As illustrated, the audio server 5 has an IP network interface unit 501 for connecting to the IP network 4, an RTP processing unit 502, a SIP control unit 503, a presence provider 504, a space modeler 505, a user information generation unit 506, an audio distribution unit 508, and an audio renderer 509 provided for each conference terminal 2′.

The SIP control unit 503 establishes a connection with each conference terminal 2′ via the IP network interface unit 501.

For each conference terminal 2′, the RTP processing unit 502 receives the user's audio data from that conference terminal 2′ over the connection established with it, and outputs the received audio data, together with the source address of the audio data, to the audio distribution unit 508. Also for each conference terminal 2′, the RTP processing unit 502 sends the conference audio data output from the audio renderer 509 associated with that conference terminal 2′ to that conference terminal 2′ over the connection established with it.

The presence provider 504 periodically sends a position information transmission request to the presence server 1′ via the IP network interface unit 501 and, in response, receives from the presence server 1′ the record 1040 (virtual position information, location information) of the user of each conference terminal 2′. It then passes the received record 1040 of each user to the space modeler 505.

The space modeler 505 receives the record 1040 of the user of each conference terminal 2′ from the presence provider 504, holds it, and outputs it to the user information generation unit 506.

For each conference terminal 2′, the user information generation unit 506 identifies, among the records 1040 of the users received from the space modeler 505, the record 1040 that contains the user ID of that conference terminal 2′. It then generates own user information containing the user ID, IP address, and virtual position information included in the identified record 1040, and sends it to the audio distribution unit 508. Also for each conference terminal 2′, the user information generation unit 506 searches the records 1040 other than the identified record 1040 for records 1040 whose location information differs from that of the identified record 1040. For each record 1040 found, it generates other conference room user information containing the user ID, IP address, and virtual position information included in that record, and sends each piece of other conference room user information to the audio distribution unit 508 in association with the user ID contained in the own user information of that conference terminal 2′.
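The selection performed by the user information generation unit 506 can be sketched as a filter over the user records. The dictionary keys (`user_id`, `ip`, `virtual_position`, `location`), the function name, and the example values are hypothetical stand-ins for the fields of record 1040.

```python
def split_user_information(records, own_user_id):
    """Return (own_user_info, other_room_user_infos) for one terminal.

    A record counts as an other conference room user when its location
    information (for example an APID) differs from the own user's.
    """
    own = next(r for r in records if r["user_id"] == own_user_id)

    def info(record):
        return {
            "user_id": record["user_id"],
            "ip": record["ip"],
            "virtual_position": record["virtual_position"],
        }

    others = [
        info(r)
        for r in records
        if r["user_id"] != own_user_id and r["location"] != own["location"]
    ]
    return info(own), others


# Hypothetical records: users A and D share one AP, users B and C another.
records = [
    {"user_id": "A", "ip": "192.0.2.1", "virtual_position": (0, 0), "location": "AP-3A"},
    {"user_id": "B", "ip": "192.0.2.2", "virtual_position": (1, 0), "location": "AP-3B"},
    {"user_id": "C", "ip": "192.0.2.3", "virtual_position": (0, 1), "location": "AP-3B"},
    {"user_id": "D", "ip": "192.0.2.4", "virtual_position": (1, 1), "location": "AP-3A"},
]
own_info, other_infos = split_user_information(records, "A")
```

With these records, the other conference room users for user A are users B and C, which matches the initial situation described for the first embodiment.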

For each conference terminal 2′, the voice distribution unit 508 extracts, from the voice data of each user received from the RTP processing unit 502, the voice data to be used for the conference voice data sent to that conference terminal 2′. Specifically, it performs the following processing for each conference terminal 2′. From the own-user information received from the user information generation unit 506, it detects the item containing the user ID of the conference terminal 2′ as the own-user information of that terminal, and outputs it to the audio renderer 509 associated with the terminal. From the other-conference-room user information received from the user information generation unit 506, it detects the items associated with the user ID in that own-user information as the other-conference-room user information of the terminal. From the voice data of each user received from the RTP processing unit 502, it then detects the voice data whose source address is an IP address contained in that other-conference-room user information. Finally, it outputs each detected item of voice data to the audio renderer 509 associated with the conference terminal 2′, together with the other-conference-room user information of that terminal whose IP address matches the source address of the voice data.

The audio renderer 509 receives each item of voice data, together with the corresponding other-conference-room user information, from the voice distribution unit 508. It also receives the own-user information from the voice distribution unit 508. It buffers the received voice data so that the items are synchronized (associated) with one another. The audio renderer 509 then applies stereophonic (3D audio) processing to each synchronized item of voice data on the basis of the relative position, in the virtual space, of the other-conference-room user and the own user, as specified by the virtual position information in the other-conference-room user information attached to that voice data and the virtual position information in the own-user information. Finally, it outputs conference voice data containing two-channel (left and right) signal data (signal sequences) to the RTP processing unit 502. The stereophonic processing method is basically the same as that of the audio renderer 208 of the first embodiment (see FIGS. 8 and 9).
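The relative position that drives the stereophonic processing can be illustrated with a short sketch. It assumes, purely for illustration, that virtual position information is a tuple (x, y, heading), with the heading given in degrees counterclockwise from the +x axis; the function name `relative_position` is hypothetical. The sketch covers only the geometry and not the rendering of FIGS. 8 and 9.

```python
import math
from typing import Tuple

Position = Tuple[float, float, float]  # (x, y, heading in degrees), assumed layout


def relative_position(own: Position, other: Position) -> Tuple[float, float]:
    """Return (distance, azimuth) of `other` as seen from `own`.

    The azimuth is in degrees in the range [-180, 180): 0 means straight
    ahead, positive values are to the listener's left (counterclockwise),
    and negative values are to the right.
    """
    own_x, own_y, heading_deg = own
    other_x, other_y, _ = other
    dx, dy = other_x - own_x, other_y - own_y
    distance = math.hypot(dx, dy)
    bearing = math.degrees(math.atan2(dy, dx)) - heading_deg
    azimuth = (bearing + 180.0) % 360.0 - 180.0
    return distance, azimuth


listener = (0.0, 0.0, 90.0)  # at the origin, facing the +y direction
ahead = relative_position(listener, (0.0, 2.0, 0.0))
right = relative_position(listener, (2.0, 0.0, 0.0))
```

Here `ahead` is a source 2 units directly in front of the listener and `right` is a source 2 units to the listener's right. A renderer would use such distance and azimuth values to choose the gains, delays, or filters for the left and right channels.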

The conference terminal 2′ establishes a connection with the voice server 5 and transmits the voice data of its own user to the voice server 5 over that connection. It also receives conference voice data from the voice server 5 over the connection and outputs it. FIG. 16 is a schematic configuration diagram of the conference terminal 2′.

As shown in the figure, the conference terminal 2′ includes a voice input unit 201, a voice output unit 203, a video output unit 204, an operation reception unit 205, an audio encoder 206, an audio decoder 248, a presence provider 210, a space modeler 211, an IP processing unit 212, an RTP processing unit 243, a SIP control unit 244, and a seating information generation unit 217. Components having the same functions as those of the conference terminal 2 of the first embodiment shown in FIG. 6 are given the same reference numerals.

The SIP control unit 244 establishes a connection with the voice server 5 via the IP processing unit 212 and the wireless LAN interface unit 219.

The RTP processing unit 243 transmits the voice data output from the audio encoder 206 to the voice server 5 over the connection established with the voice server 5. It also receives conference voice data from the voice server 5 over that connection and passes the received conference voice data to the audio decoder 248.

The audio decoder 248 decodes the conference voice data received from the RTP processing unit 243 and outputs the resulting voice signal to the voice output unit 203.

Like the presence server 1 of the first embodiment, the presence server 1′ and the voice server 5 configured as above can be implemented on a computer system such as the one shown in FIG. 11, for example a server or a host computer. Likewise, the conference terminal 2′ configured as above can, like the conference terminal 2 of the first embodiment, be implemented on a computer system such as the one shown in FIG. 11, for example a PDA, a handheld computer, or a wearable computer.

Next, the schematic operation of the audio conference system configured as above will be described.

FIG. 17 is a diagram for explaining the schematic operation of the audio conference system shown in FIG. 14. Here, the operation when user E participates in the audio conference is described for an example in which users E and H are initially in the same conference room A and users F and G are in a different conference room B, and user E then moves to conference room B, where users F and G are. It is assumed that the conference terminals 2′ of users E to H have each established a connection with the voice server 5.

First, the conference terminal 2′ of user E transmits the virtual position information of user E to the presence server 1′, and the presence server 1′ registers it (S6001). The conference terminal 2′ of user E also transmits a location information transmission request to the wireless LAN-AP 3A that it uses for wireless communication (S6002). In response, the wireless LAN-AP 3A transmits its own APID to the presence server 1′ as the location information of user E, and the presence server 1′ registers the location information of user E (S6003).

Meanwhile, the voice server 5 transmits a position information transmission request (S6004). In response, the presence server 1′ transmits the virtual position information and location information of users E to H to the voice server 5 (S6005). The voice server 5 then detects the other-conference-room users (S6006). At this point users E and H are in conference room A and users F and G are in conference room B, so users F and G are detected as the other-conference-room users of user E. The voice server 5 therefore applies stereophonic processing to the voice data received from the conference terminals 2′ of users F and G on the basis of their positions relative to user E, synthesizes the results into conference voice data, and transmits the generated conference voice data to the conference terminal 2′ of user E (S6007). Because user H is in the same conference room A as user E, the voice data of user H is not included in the conference voice data for user E. Users E and H talk to each other directly, without going through the conference system of this embodiment.

Now suppose that user E moves to conference room B, where users F and G are. In conference room B, the same processing as S6001 to S6006 is performed (S6008 to S6013). As a result, only user H is detected as an other-conference-room user of user E. The voice server 5 therefore applies stereophonic processing to the voice data received from the conference terminal 2′ of user H on the basis of its position relative to user E to generate conference voice data, and transmits the generated conference voice data to the conference terminal 2′ of user E (S6014). Because users F and G are in the same conference room B as user E, their voice data is not included in the conference voice data for user E. Users E, F, and G talk to one another directly, without going through the conference system of this embodiment.

The second embodiment of the present invention has been described above. In this embodiment, as in the first embodiment, the conference terminal 2′ does not output the voice data of users of other conference terminals 2′ that are in the same conference room in real space. Only the voice data of users of other conference terminals 2′ located in different conference rooms in real space is stereophonically processed and output. This avoids a sense of incongruity in the direction and distance of each user as expressed by the voice data.

The present invention is not limited to the embodiments described above, and various modifications are possible within the scope of its gist.

For example, in each of the above embodiments, each conference terminal 2, 2′ determines the virtual position information (position and orientation) of its user according to operations received from the user via the pointing device 225. However, the present invention is not limited to this. For example, the virtual position information of the user of a conference terminal 2, 2′ may be determined from the current position and orientation of that terminal in the conference room where it is actually located.

FIG. 18 is a diagram for explaining a modification of the conference terminal 2 shown in FIG. 6. In this modification, in place of the operation reception unit 205, the terminal has an azimuth measurement unit 253 that measures the azimuth of the conference terminal 2, and a current position calculation unit 252 that calculates the current position of the conference terminal 2 in the conference room where it is located.

A magnetic azimuth sensor, for example, can be used as the azimuth measurement unit 253. A magnetic azimuth sensor typically has a Wheatstone bridge made up of magnetoresistive elements, and a thin-film coil. The resistance of a magnetoresistive element changes when a magnetic field is applied in a direction orthogonal to the current flowing through the element. The magnetic azimuth sensor uses this property to detect geomagnetism.

The current position calculation unit 252 measures the current position of its own conference terminal 2 by trilateration, using, for example, the signal strengths of radio signals transmitted from at least three radio transmitters installed in the conference room and the installation position of each transmitter (its coordinates relative to an origin defined in the conference room). Position detection systems using wireless communication systems are described in detail in, for example, Ogino, Tsunehara, et al., B-5-203, "Wireless LAN Integrated Access System (1): A Study of a Position Detection System," Proceedings of the IEICE General Conference, 2003, Communications 1, p. 662 (March 2003), and Tsunehara, Ogino, et al., B-5-204, "Wireless LAN Integrated Access System (2): A Study of Position Detection Accuracy," Proceedings of the IEICE General Conference, 2003, Communications 1, p. 663 (March 2003). In this embodiment, the origin of each conference room is set at the center of the room.
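The trilateration step can be sketched as follows. The text does not say how signal strength is converted to distance, so this sketch assumes the distances to the three transmitters have already been estimated by some propagation model. The function name `trilaterate` and the sample coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def trilaterate(anchors: Sequence[Point], distances: Sequence[float]) -> Point:
    """Estimate a 2D position from three transmitter positions and distances.

    Subtracting the circle equation of the first transmitter from those of
    the second and third gives two linear equations in (x, y), solved here
    by Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = distances
    a11, a12 = 2.0 * (x2 - x1), 2.0 * (y2 - y1)
    a21, a22 = 2.0 * (x3 - x1), 2.0 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("transmitters are collinear; position is ambiguous")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


# Hypothetical transmitter coordinates relative to the room origin.
anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
true_position = (1.0, 2.0)
distances = [math.hypot(true_position[0] - ax, true_position[1] - ay)
             for ax, ay in anchors]
estimate = trilaterate(anchors, distances)
```

With exact distances, the estimate reproduces the true position. With noisy distance estimates from real signal strengths, a least-squares fit over more than three transmitters would typically be more robust.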

The space modeler 211 generates virtual position information by taking the current position of its own conference terminal 2 in the conference room, calculated by the current position calculation unit 252, and the azimuth of the terminal, measured by the azimuth measurement unit 253, as the position and orientation of the own user in the virtual space. In this embodiment, as described above, the origin of each conference room is set at the center of the room. The positions and orientations in the virtual space of the conference terminals 2 managed by the presence server 1 are therefore as shown in FIG. 19. That is, the position and orientation of each conference terminal 2 in real space are reflected in its position and orientation in the virtual space, so a more natural audio conference can be realized. The origin of a conference room need not be at the center of the room. In that case, the origin of each conference room is given an offset value to the center of the room, and the measured current position corrected by this offset value is used as the position in the virtual space.
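The offset correction for an off-center origin can be illustrated as below. The text does not fix a sign convention, so this sketch assumes the offset is the vector from the room origin to the room center and that the corrected position is expressed relative to the center. The function name `to_virtual_position` is hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]


def to_virtual_position(measured: Point, center_offset: Point) -> Point:
    """Convert a position measured from the room origin into a position
    relative to the room center (assumed sign convention: the offset is
    the vector from the origin to the center)."""
    measured_x, measured_y = measured
    offset_x, offset_y = center_offset
    return (measured_x - offset_x, measured_y - offset_y)


# Origin at a room corner; the room center lies 3.0 m and 2.0 m from it.
virtual_xy = to_virtual_position((4.0, 2.5), (3.0, 2.0))
```

In this example the terminal, measured at (4.0, 2.5) from the corner, is placed at (1.0, 0.5) relative to the room center.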

The user of each conference terminal 2 can check his or her own position in the virtual space of the audio conference from the seating information generated by the seating information generation unit 217 (see FIG. 7), and can adjust his or her current position in the conference room (real space) so as not to overlap other users in the virtual space.

In each of the above embodiments, the wireless LAN-APs 3A to 3C respond to a location information transmission request from a conference terminal 2, 2′ by transmitting the location information of the requesting terminal to the presence server 1. However, the present invention is not limited to this.

For example, the wireless LAN-APs 3A to 3C may respond to a location information transmission request from a conference terminal 2, 2′ by returning the location information to the requesting terminal, and the conference terminal 2, 2′ may then transmit the location information received from the wireless LAN-AP 3A to 3C to the presence server 1.

Alternatively, a SIP proxy function may be added to the wireless LAN-APs 3A to 3C, or to a network device (for example, a LAN switch) provided for each conference room through which data from the conference terminals 2, 2′ in that room always passes on its way to the IP network 4, so that a SIP header representing the location information is added to the SIP messages transmitted from the conference terminals 2, 2′ to the presence server 1. The added SIP header could be, for example, "Via:SIP/2.0/UDP room-301@aa.co.jp:5060;type=room". In this example, "type=room" indicates that the SIP header represents location information, and "room-301@aa.co.jp:5060" is the identifier of the conference room.
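How a receiver might pull the room identifier out of such a header can be sketched with plain string handling. This is a simplified illustration and not a complete SIP parser: the function name `parse_location_via` is hypothetical, and only the header shape given in the example above is handled.

```python
from typing import Optional


def parse_location_via(header: str) -> Optional[str]:
    """Return the conference room identifier from a Via header marked with
    "type=room", or None if the header does not carry location information."""
    name, _, value = header.partition(":")
    if name.strip().lower() != "via":
        return None
    parts = [p.strip() for p in value.split(";")]
    sent = parts[0].split()  # expected: [protocol, identifier]
    if len(sent) != 2:
        return None
    params = dict(p.split("=", 1) for p in parts[1:] if "=" in p)
    if params.get("type") != "room":
        return None
    return sent[1]


room = parse_location_via("Via:SIP/2.0/UDP room-301@aa.co.jp:5060;type=room")
ordinary = parse_location_via("Via: SIP/2.0/UDP host.example.com:5060;branch=z9hG4bK776")
```

The first call yields the room identifier from the example header. The second, an ordinary Via header without the "type=room" parameter, yields `None`, so normal routing headers are not mistaken for location information.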

Alternatively, a SIP proxy function may be added to the wireless LAN-APs 3A to 3C, or to a network device (for example, a LAN switch) provided for each conference room through which data from the conference terminals 2, 2′ in that room always passes on its way to the IP network 4, so that location information is added to the SIP registration request message (REGISTER packet) transmitted from the conference terminals 2, 2′ to the presence server 1. In this way, the location information of the conference terminals 2, 2′ can be registered in the presence servers 1, 1′ using the ordinary SIP sequence as is.

Alternatively, a router function with a DS (Differentiated Services) function may be given to the wireless LAN-APs 3A to 3C, or to a network device (for example, a LAN switch) provided for each conference room through which data from the conference terminals 2, 2′ in that room always passes on its way to the IP network 4, so that location information is marked on IP packets using the DS field of the IP packet. The DS field allows 64 distinct markings, so up to 64 conference rooms can be distinguished.
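The figure of 64 follows from the six-bit width of the DS field, which occupies the upper six bits of the former IPv4 TOS octet (2^6 = 64). The sketch below shows one possible mapping between a room number and that octet. The function names and the choice to use the raw room number as the code point are assumptions for illustration, and the lower two bits are left untouched on decoding.

```python
DS_FIELD_BITS = 6
MAX_ROOMS = 1 << DS_FIELD_BITS  # 64 distinct markings


def room_to_tos(room_id: int) -> int:
    """Place a room number (0 to 63) in the upper six bits of the TOS octet."""
    if not 0 <= room_id < MAX_ROOMS:
        raise ValueError("room_id must be in the range 0 to 63")
    return room_id << 2


def tos_to_room(tos: int) -> int:
    """Recover the room number, ignoring the lower two bits of the octet."""
    return (tos & 0xFF) >> 2


marked = room_to_tos(5)
recovered = tos_to_room(marked | 0b11)  # lower two bits set by something else
```

Reusing the DS field this way would conflict with any quality-of-service marking already applied on the same network, which is a deployment consideration outside the scope of this sketch.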

In each of the above embodiments, the case where SIP is used to establish connections has been described as an example. However, the present invention is not limited to this. A call control protocol other than SIP, such as H.323, may be used. When, as in the second embodiment, the conference terminal 2′ and the voice server 5 are assumed to communicate at all times, the call control sequence according to the call control protocol can be omitted.

FIG. 1 is a schematic configuration diagram of an audio conference system to which the first embodiment of the present invention is applied.
FIG. 2 is a schematic configuration diagram of the wireless LAN-APs 3A to 3C.
FIG. 3 is a schematic configuration diagram of the presence server 1.
FIG. 4 is a diagram schematically showing the registered contents of the position information storage unit 104.
FIG. 5 is a diagram for explaining the operation flow of the presence server 1.
FIG. 6 is a schematic configuration diagram of the conference terminal 2.
FIG. 7 is a diagram showing a display example of seating information display data.
FIG. 8 is a diagram for explaining the processing of the audio renderer 208.
FIG. 9 is a diagram schematically showing the two-dimensional image source method with the ceiling and floor omitted.
FIG. 10 is a diagram for explaining the operation flow of the SIP control unit 214.
FIG. 11 is a diagram showing a hardware configuration example of each device constituting the audio conference system.
FIG. 12 is a diagram showing an example of the appearance of the conference terminal 2.
FIG. 13 is a diagram for explaining the schematic operation of the audio conference system shown in FIG. 1.
FIG. 14 is a schematic configuration diagram of an audio conference system to which the second embodiment of the present invention is applied.
FIG. 15 is a schematic configuration diagram of the voice server 5.
FIG. 16 is a schematic configuration diagram of the conference terminal 2′.
FIG. 17 is a diagram for explaining the schematic operation of the audio conference system shown in FIG. 14.
FIG. 18 is a diagram for explaining a modification of the conference terminal 2.
FIG. 19 is a diagram for explaining an example of a method of determining the virtual current position.

Explanation of symbols

1, 1′: presence server; 2, 2′: conference terminal; 3A to 3C: wireless LAN-AP; 4: IP network; 5: voice server; 101: IP network interface unit; 102: position information management unit; 103: SIP server processing unit; 104: position information storage unit; 201: voice input unit; 203: voice output unit; 204: video output unit; 205: operation reception unit; 206: audio encoder; 208: audio renderer; 210: presence provider; 211: space modeler; 212: IP processing unit; 213: RTP processing unit; 214: SIP control unit; 217: seating information generation unit; 218: other conference room user detection unit; 219: wireless LAN interface unit; 243: RTP processing unit; 244: SIP control unit; 248: audio decoder; 252: current position calculation unit; 253: azimuth measurement unit; 301: IP network interface unit; 302: wireless LAN interface unit; 303: location information transmission unit; 501: IP network interface unit; 502: RTP processing unit; 503: SIP control unit; 504: presence provider; 505: space modeler; 506: user information generation unit; 508: voice distribution unit; 509: audio renderer

Claims

An audio conference system,
comprising: a plurality of conference terminals; a presence server that manages, for the user of each of the plurality of conference terminals, a position in a virtual space and a location in real space; and a plurality of relay devices, each installed at a location in real space and used by a conference terminal present at that location to communicate with the presence server;
Each of the plurality of conference terminals includes:
Virtual position information transmitting means for transmitting to the presence server virtual position information of the own user including the position and direction of the own user in the virtual space, which is the user of the self-conference terminal;
Position information receiving means for receiving, from the presence server, virtual position information of a user of each conference terminal and location information indicating a location in real space;
Other conference room user detecting means for detecting, based on the location information of the user of each conference terminal received by the position information receiving means, other conference room users, who are users present at a location different from the location in real space of the own user,
Voice data transmitting means for transmitting the voice data of the own user to the conference terminal of each of the other conference room users detected by the other conference room user detecting means,
Voice data receiving means for receiving voice data of other conference room users from each conference terminal of the other conference room users detected by the other conference room user detecting means,
Voice synthesis means for applying, to each item of voice data of the other conference room users received by the voice data receiving means, stereophonic processing according to the relative position of the other conference room user and the own user in the virtual space, as specified by the virtual position information of the other conference room user and the virtual position information of the own user, and for synthesizing the stereophonically processed voice data of the other conference room users to generate three-dimensional synthesized voice data;
Voice control means for outputting the three-dimensional synthesized voice data generated by the voice synthesis means from a speaker;
The presence server
Management means for managing the virtual position information of the users sent from each of the plurality of conference terminals, and for managing the location information of the user of each of the plurality of conference terminals based on the relay device through which the information sent from that conference terminal has passed; and
Position information transmitting means for transmitting, to each of the plurality of conference terminals, the virtual position information and location information of the user of each conference terminal managed by the management means.

The audio conference system according to claim 1,
wherein the relay device, in response to a request from a conference terminal that uses the relay device, transmits information on the relay device to the presence server as the location information of the user of that conference terminal.

The audio conference system according to claim 1 or 2,
Each of the plurality of conference terminals includes:
It further has position information detecting means for detecting the position and orientation of the own user with respect to the origin provided in the location where the own conference terminal exists,
The virtual position information transmitting means includes
The audio conference system, wherein the position and orientation detected by the position information detection means are set as the position and orientation of the user in the virtual space, and the user's virtual position information is transmitted to the presence server.

An audio conference system,
A plurality of conference terminals; a presence server that manages a location of each of the plurality of conference terminals in a virtual space and a location in real space; a voice server that transmits voice data to each of the plurality of conference terminals; A plurality of relay devices installed at each location in real space, and used by a conference terminal existing at the location to communicate with the presence server,
Each of the plurality of conference terminals includes:
Virtual position information transmitting means for transmitting to the presence server virtual position information of the own user including the position and direction of the own user in the virtual space, which is the user of the self-conference terminal;
Voice data transmitting means for transmitting the voice data of the own user to the voice server;
Three-dimensional synthesized voice data receiving means for receiving three-dimensional synthesized voice data from the voice server;
Voice control means for outputting the stereo synthesized voice data received by the stereo synthesized voice data receiving means from a speaker;
The voice server is
Position information receiving means for receiving, from the presence server, virtual position information of each user of the plurality of conference terminals and location information indicating a location in real space;
Other conference room user detecting means for detecting, for each of the plurality of conference terminals and based on the location information of the user of each conference terminal received by the position information receiving means, other conference room users, who are users present at a location different from the location in real space of the target user who is the user of that conference terminal,
Voice data receiving means for receiving voice data of a user of the conference terminal from each of the plurality of conference terminals;
Voice synthesis means for applying, for each of the plurality of conference terminals, stereophonic processing to the voice data of each other conference room user detected by the other conference room user detecting means for the target user who is the user of that conference terminal, according to the relative position of the target user and the other conference room user in the virtual space, as specified by the virtual position information of the other conference room user and the virtual position information of the target user, and for synthesizing the stereophonically processed voice data of the other conference room users to generate three-dimensional synthesized voice data;
Three-dimensional synthesized voice data transmitting means for transmitting, to each of the plurality of conference terminals, the three-dimensional synthesized voice data generated by the voice synthesis means for the target user of that conference terminal,
The presence server
While managing the virtual location information of the users sent from each of the plurality of conference terminals, and based on the relay device through which the information sent from each of the plurality of conference terminals has passed, each of the plurality of conference terminals Management means for managing location information;
A voice conference system comprising: position information transmitting means for transmitting virtual position information and location information of a user of each conference terminal managed by the management means to the voice server.
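
The division of roles between the presence server and the voice server in this claim can be illustrated with a minimal sketch. It assumes that the identifier of the relay device serves directly as the location information, and every name in it (`Presence`, `PresenceTable`, `other_room_users`, the `ap-room-*` identifiers) is hypothetical and does not appear in the specification.

```python
from dataclasses import dataclass


@dataclass
class Presence:
    user_id: str
    x: float            # position in the virtual space
    y: float
    heading_deg: float  # orientation in the virtual space
    location: str       # real-space location (assumed: relay device ID)


class PresenceTable:
    """Hypothetical stand-in for the management means of the presence server."""

    def __init__(self):
        self._entries = {}

    def update(self, user_id, x, y, heading_deg, relay_id):
        # The location is not reported by the terminal itself; it is taken
        # from the relay device through which the message arrived.
        self._entries[user_id] = Presence(user_id, x, y, heading_deg, relay_id)

    def snapshot(self):
        # What would be sent on to the voice server.
        return list(self._entries.values())


def other_room_users(target_id, entries):
    """Return the IDs of users whose real-space location differs from the target's."""
    target = next(e for e in entries if e.user_id == target_id)
    return [e.user_id for e in entries if e.location != target.location]


table = PresenceTable()
table.update("alice", 0.0, 0.0, 0.0, "ap-room-1")
table.update("bob", 1.0, 0.0, 180.0, "ap-room-1")
table.update("carol", 5.0, 2.0, 90.0, "ap-room-2")
print(other_room_users("alice", table.snapshot()))
```

In this sketch, users who share a relay device are treated as being in the same conference room, so their voices would be excluded from each other's synthesized output.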

The audio conference system according to claim 4, wherein
the relay device,
in response to a request from a conference terminal that uses the relay device, transmits information on the relay device itself to the presence server as the location information of the user of that conference terminal.

The audio conference system according to claim 4 or 5, wherein
each of the plurality of conference terminals
further has position information detecting means for detecting the position and orientation of the own user with respect to an origin provided at the location where the own conference terminal exists, and
the virtual position information transmitting means
transmits the virtual position information of the own user to the presence server with the position and orientation detected by the position information detecting means set as the position and orientation of the own user in the virtual space.
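
As an illustration only, the mapping in this claim amounts to reusing the sensed pose, measured from the room's origin, as the virtual-space pose. The sketch below assumes a two-dimensional pose and a heading in degrees; the function name `to_virtual_position` and the dictionary keys are hypothetical.

```python
def to_virtual_position(user_id, sensed_x, sensed_y, sensed_heading_deg):
    """Build virtual position information from a pose sensed relative to the room origin.

    The sensed position is used unchanged as the virtual-space position; the
    heading is only normalized to the range [0, 360).
    """
    return {
        "user_id": user_id,
        "x": float(sensed_x),
        "y": float(sensed_y),
        "heading_deg": sensed_heading_deg % 360.0,
    }


print(to_virtual_position("alice", 1.5, -2, -90))
```

Because the virtual pose equals the real pose within the room, the relative arrangement of users who share a room is the same in both spaces.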

A conference terminal comprising:
Virtual position information transmitting means for transmitting virtual position information of the own user, who is the user of the conference terminal, including the position and orientation of the own user in a virtual space, to a presence server that manages the position in the virtual space and the location in real space of the user of each of a plurality of conference terminals participating in an audio conference;
Position information receiving means for receiving, from the presence server, virtual position information of a user of each conference terminal and location information indicating a location in real space;
Other conference room user detecting means for detecting, based on the location information of the user of each conference terminal received by the position information receiving means, other conference room users, that is, users who are in a location different from the real-space location of the own user;
Voice data transmitting means for transmitting the voice data of the own user to the conference terminal of each of the other conference room users detected by the other conference room user detecting means;
Voice data receiving means for receiving voice data of the other conference room users from the conference terminal of each of the other conference room users detected by the other conference room user detecting means;
Voice synthesis means for applying, to the voice data of each other conference room user received by the voice data receiving means, stereophonic sound processing according to the relative position between that other conference room user and the own user in the virtual space, as specified by the virtual position information of the other conference room user and the virtual position information of the own user, and for synthesizing the processed voice data of the other conference room users to generate three-dimensional synthesized voice data; and
Voice control means for outputting, from a speaker, the three-dimensional synthesized voice data generated by the voice synthesis means.
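
The voice synthesis means of this claim can be sketched as follows. The claim does not fix a particular stereophonic algorithm, so this sketch substitutes a deliberately simple one: constant-power left/right panning by azimuth plus inverse-distance attenuation. It is not HRTF-based 3D audio processing. The pose format `(x, y, heading_deg)`, the convention that headings are measured counterclockwise from the +x axis, and all function names are assumptions.

```python
import math


def relative_position(listener, source):
    """Return (distance, azimuth) of the source in the listener's frame.

    Poses are (x, y, heading_deg). Azimuth is in radians, 0 is straight
    ahead of the listener and positive values are to the listener's right.
    """
    lx, ly, heading_deg = listener
    sx, sy, _ = source
    dx, dy = sx - lx, sy - ly
    distance = math.hypot(dx, dy)
    bearing = math.atan2(dy, dx)
    a = math.radians(heading_deg) - bearing
    azimuth = math.atan2(math.sin(a), math.cos(a))  # wrap to [-pi, pi]
    return distance, azimuth


def stereo_gains(distance, azimuth):
    """Constant-power pan with 1/distance attenuation (clamped inside 1 unit)."""
    pan = math.sin(azimuth)                 # -1 = full left, +1 = full right
    theta = (pan + 1.0) * math.pi / 4.0     # 0 .. pi/2
    attenuation = 1.0 / max(distance, 1.0)
    return attenuation * math.cos(theta), attenuation * math.sin(theta)


def spatialize(samples, listener, source):
    """Turn mono samples into (left, right) pairs for one remote user."""
    left, right = stereo_gains(*relative_position(listener, source))
    return [(s * left, s * right) for s in samples]


def mix(streams):
    """Sum several equally long (left, right) streams sample by sample."""
    return [(sum(l for l, _ in frame), sum(r for _, r in frame))
            for frame in zip(*streams)]


me = (0.0, 0.0, 90.0)  # at the origin, facing +y
remote = [((1.0, 0.0, 0.0), [1.0, 0.5]), ((0.0, 2.0, 0.0), [1.0, 1.0])]
print(mix([spatialize(samples, me, pose) for pose, samples in remote]))
```

Only the voice data of users detected as being in other conference rooms would be passed to `spatialize`; voices of users in the same room reach the listener directly and are left out of the mix.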

A computer-readable program,
the program causing a computer to function as a conference terminal comprising:
Virtual position information transmitting means for transmitting virtual position information of the own user, who is the user of the conference terminal, including the position and orientation of the own user in a virtual space, to a presence server that manages the position in the virtual space and the location in real space of the user of each of a plurality of conference terminals participating in an audio conference;
Position information receiving means for receiving, from the presence server, virtual position information of a user of each conference terminal and location information indicating a location in real space;
Other conference room user detecting means for detecting, based on the location information of the user of each conference terminal received by the position information receiving means, other conference room users, that is, users who are in a location different from the real-space location of the own user;
Voice data transmitting means for transmitting the voice data of the own user to the conference terminal of each of the other conference room users detected by the other conference room user detecting means;
Voice data receiving means for receiving voice data of the other conference room users from the conference terminal of each of the other conference room users detected by the other conference room user detecting means;
Voice synthesis means for applying, to the voice data of each other conference room user received by the voice data receiving means, stereophonic sound processing according to the relative position between that other conference room user and the own user in the virtual space, as specified by the virtual position information of the other conference room user and the virtual position information of the own user, and for synthesizing the processed voice data of the other conference room users to generate three-dimensional synthesized voice data; and
Voice control means for outputting, from a speaker, the three-dimensional synthesized voice data generated by the voice synthesis means.

An audio server that transmits audio data to each of a plurality of conference terminals, comprising:
Position information receiving means for receiving, from a presence server that manages the position in a virtual space and the location in real space of the user of each of the plurality of conference terminals participating in an audio conference, virtual position information of the user of each of the plurality of conference terminals and location information indicating the location in real space;
Other conference room user detecting means for detecting, for each of the plurality of conference terminals and based on the location information of the user of each conference terminal received by the position information receiving means, other conference room users, that is, users who are in a location different from the real-space location of the target user who is the user of that conference terminal;
Voice data receiving means for receiving voice data of a user of the conference terminal from each of the plurality of conference terminals;
Voice synthesis means for, for each of the plurality of conference terminals, applying to the voice data of each other conference room user detected by the other conference room user detecting means for the target user of that conference terminal stereophonic sound processing according to the relative position between the target user and that other conference room user in the virtual space, as specified by the virtual position information of the other conference room user and the virtual position information of the target user, and synthesizing the processed voice data of the other conference room users to generate three-dimensional synthesized voice data; and
Three-dimensional synthesized voice data transmitting means for transmitting, to each of the plurality of conference terminals, the three-dimensional synthesized voice data generated by the voice synthesis means for the target user of that conference terminal.
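
The per-terminal loop that this claim assigns to the audio server can be sketched as below. The stereophonic processing itself is passed in as a callable so that the sketch stays independent of any particular algorithm; the data layout, the function names `mix_for_each_terminal` and `passthrough`, and the room identifiers are all hypothetical.

```python
def mix_for_each_terminal(presence, frames, render):
    """Build one stereo mix per terminal from the voices of other-room users only.

    presence: {user_id: {"location": str, "pose": (x, y, heading_deg)}}
    frames:   {user_id: [mono samples]}, all lists of equal length
    render:   callable(samples, listener_pose, source_pose) -> [(left, right), ...]
    """
    out = {}
    for target, info in presence.items():
        mixed = [(0.0, 0.0)] * len(frames[target])
        for other, other_info in presence.items():
            # Skip the target user and everyone in the same real-space location.
            if other == target or other_info["location"] == info["location"]:
                continue
            rendered = render(frames[other], info["pose"], other_info["pose"])
            mixed = [(ml + l, mr + r) for (ml, mr), (l, r) in zip(mixed, rendered)]
        out[target] = mixed
    return out


def passthrough(samples, listener_pose, source_pose):
    """Placeholder renderer: copies the mono signal to both channels."""
    return [(s, s) for s in samples]


presence = {
    "alice": {"location": "room-1", "pose": (0.0, 0.0, 0.0)},
    "bob": {"location": "room-1", "pose": (1.0, 0.0, 180.0)},
    "carol": {"location": "room-2", "pose": (5.0, 2.0, 90.0)},
}
frames = {"alice": [1.0, 1.0], "bob": [2.0, 2.0], "carol": [4.0, 4.0]}
print(mix_for_each_terminal(presence, frames, passthrough))
```

Each terminal receives a different mix: the users in `room-1` hear only the user in `room-2`, while the user in `room-2` hears both users in `room-1`.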

A computer-readable program,
the program causing a computer to function as an audio server comprising:
Position information receiving means for receiving, from a presence server that manages the position in a virtual space and the location in real space of the user of each of a plurality of conference terminals participating in an audio conference, virtual position information of the user of each of the plurality of conference terminals and location information indicating the location in real space;
Other conference room user detecting means for detecting, for each of the plurality of conference terminals and based on the location information of the user of each conference terminal received by the position information receiving means, other conference room users, that is, users who are in a location different from the real-space location of the target user who is the user of that conference terminal;
Voice data receiving means for receiving voice data of a user of the conference terminal from each of the plurality of conference terminals;
Voice synthesis means for, for each of the plurality of conference terminals, applying to the voice data of each other conference room user detected by the other conference room user detecting means for the target user of that conference terminal stereophonic sound processing according to the relative position between the target user and that other conference room user in the virtual space, as specified by the virtual position information of the other conference room user and the virtual position information of the target user, and synthesizing the processed voice data of the other conference room users to generate three-dimensional synthesized voice data; and
Three-dimensional synthesized voice data transmitting means for transmitting, to each of the plurality of conference terminals, the three-dimensional synthesized voice data generated by the voice synthesis means for the target user of that conference terminal.