JP2007288255A - Communication system, information management apparatus, information processing apparatus, information processing method, and program

Info

Publication number: JP2007288255A
Application number: JP2006109814A
Authority: JP
Inventors: Yasuhisa Nakajima; 康久中嶋
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2006-04-12
Filing date: 2006-04-12
Publication date: 2007-11-01

Abstract

PROBLEM TO BE SOLVED: To provide a technology that makes it easy to distinguish users who are speaking from users who are not, even when a plurality of users speak at the same time.

SOLUTION: A user terminal 1-1 detects users who have started speaking and users who have finished speaking on the basis of voice data or the like transmitted from user terminals 1-2 and 1-3, which are the terminals of participants in a conference. When the set of users speaking at the same time changes, the user terminal 1-1 transmits control information, including identification information identifying the users who are speaking, to the user terminals 1-2 and 1-3. The user terminals 1-2 and 1-3 identify the users who are speaking on the basis of the control information, and display information on the speaking users and information on the non-speaking users in different forms. The technology can be applied to apparatuses capable of voice conversation over VoIP.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a communication system, an information management apparatus, an information processing apparatus, an information processing method, and a program. In particular, it relates to a communication system, an information management apparatus, an information processing apparatus, an information processing method, and a program that allow each information processing apparatus to easily identify, based on information transmitted from the information management apparatus, which users are speaking and which are not, even when a plurality of users are speaking at the same time.

In recent years, conversation services based on VoIP (Voice over IP), which use IP (Internet Protocol) networks, have begun to spread. By using a VoIP conversation service, a user can hold a voice conversation with, for example, an acquaintance at a remote location using a handset connected to a personal computer or the like, in the same way as when using a telephone on a telephone line.

Patent Document 1 discloses a technique in which information about a user is multiplexed onto the audio data transmitted and received between the devices constituting a video conference system, and images transmitted at the same time are displayed on each device based on that user information. With this technique, the speaker's name, face image, and the like are displayed by each device.
Patent Document 1: JP 11-177952 A (Japanese Unexamined Patent Application Publication No. Hei 11-177952)

Incidentally, one mode of VoIP conversation is a conference mode, in which three or more users can take part in a conversation at the same time. In the conference mode, the terminal of the user hosting the conference, or a server, synthesizes (mixes) the audio captured by the terminals of the participating users and transmits the resulting audio to each terminal for output, thereby realizing a conversation among a plurality of users.

When a conversation is held in this way, there is no problem if it is a P-to-P (peer-to-peer) conversation, because each party can identify the other. However, when the conversation is held among three or more users and several users speak at the same time, it can be hard to tell which user made a given statement, partly because a VoIP conversation service, unlike a video conference, may not let users see the faces of the people they are talking to. Patent Document 1 mentioned above does not disclose a method for solving this problem either.

The present invention has been made in view of such circumstances, and allows each information processing apparatus to easily identify, based on information transmitted from the information management apparatus, the users who are speaking and the users who are not, even when a plurality of users are speaking at the same time.

A communication system according to a first aspect of the present invention comprises an information management apparatus and a plurality of information processing apparatuses connected via a network. The information management apparatus includes: synthesizing means for synthesizing audio acquired and transmitted by predetermined ones of the information processing apparatuses and transmitting the synthesized audio to the plurality of information processing apparatuses for output, thereby allowing the users of the plurality of information processing apparatuses to converse; detection means for detecting the start or end of a user's speech based on the state of the audio acquired by the plurality of information processing apparatuses; and generation means for generating, each time the detection means detects the start or end of a user's speech and the set of users speaking at the same time changes, control information including first identification information identifying the users of the plurality of information processing apparatuses and second identification information identifying the users who are speaking, and transmitting the control information to the plurality of information processing apparatuses. Each of the plurality of information processing apparatuses includes: acquisition means for acquiring the control information transmitted from the information management apparatus; and display control means for identifying, each time the control information is acquired by the acquisition means, the users who are speaking and the users who are not speaking based on the control information, and displaying information about the respective users in different formats.

An information management apparatus according to a second aspect of the present invention is an information management apparatus connected to a plurality of information processing apparatuses via a network, and includes: synthesizing means for synthesizing audio acquired and transmitted by predetermined ones of the information processing apparatuses and transmitting the synthesized audio to the plurality of information processing apparatuses for output, thereby allowing the users of the plurality of information processing apparatuses to converse; detection means for detecting the start or end of a user's speech based on the state of the audio acquired by the plurality of information processing apparatuses; and generation means for generating, each time the detection means detects the start or end of a user's speech and the set of users speaking at the same time changes, control information including first identification information identifying the users of the plurality of information processing apparatuses and second identification information identifying the users who are speaking, and transmitting the control information to the plurality of information processing apparatuses.

The control information may be an RTCP packet, and the first identification information may be a CSRC described in the RTCP packet.

The second identification information may be information described as an SDES item in an SDES-type RTCP packet.

An information processing method or program according to the second aspect of the present invention includes the steps of: synthesizing audio acquired and transmitted by predetermined information processing apparatuses and transmitting the synthesized audio to a plurality of the information processing apparatuses for output, thereby allowing the users of the plurality of information processing apparatuses to converse; detecting the start or end of a user's speech based on the state of the audio acquired by the plurality of information processing apparatuses; and generating, each time the start or end of a user's speech is detected and the set of users speaking at the same time changes, control information including first identification information identifying the users of the plurality of information processing apparatuses and second identification information identifying the users who are speaking, and transmitting the control information to the plurality of information processing apparatuses.

An information processing apparatus according to a third aspect of the present invention includes: acquisition means for acquiring control information transmitted from an information management apparatus; and display control means for identifying, each time the control information is acquired by the acquisition means, the users who are speaking and the users who are not speaking based on the control information, and displaying information about the respective users in different formats.

An information processing method or program according to the third aspect of the present invention includes the steps of: acquiring control information transmitted from an information management apparatus; and identifying, each time the control information is acquired, the users who are speaking and the users who are not speaking based on the control information, and displaying information about the respective users in different formats.

In the first aspect of the present invention, audio acquired and transmitted by predetermined information processing apparatuses is synthesized, and the synthesized audio is transmitted to the plurality of information processing apparatuses for output, whereby a conversation takes place among the users of the plurality of information processing apparatuses. The start or end of a user's speech is detected based on the state of the audio acquired by the plurality of information processing apparatuses, and each time the set of users speaking at the same time changes, control information including first identification information identifying the users of the plurality of information processing apparatuses and second identification information identifying the users who are speaking is generated and transmitted to the plurality of information processing apparatuses.

Also in the first aspect of the present invention, the control information transmitted from the information management apparatus is acquired, and each time the control information is acquired, the users who are speaking and the users who are not speaking are identified based on the control information, and information about the respective users is displayed in different formats.

In the second aspect of the present invention, audio acquired and transmitted by predetermined information processing apparatuses is synthesized, and the synthesized audio is transmitted to the plurality of information processing apparatuses for output, whereby a conversation takes place among the users of the plurality of information processing apparatuses. The start or end of a user's speech is detected based on the state of the audio acquired by the plurality of information processing apparatuses, and each time the set of users speaking at the same time changes, control information including first identification information identifying the users of the plurality of information processing apparatuses and second identification information identifying the users who are speaking is generated and transmitted to the plurality of information processing apparatuses.

In the third aspect of the present invention, control information transmitted from the information management apparatus is acquired, and each time the control information is acquired, the users who are speaking and the users who are not speaking are identified based on the control information, and information about the respective users is displayed in different formats.

According to the present invention, even when a plurality of users are speaking at the same time, each information processing apparatus can easily identify the users who are speaking and the users who are not, based on information transmitted from the information management apparatus.

Embodiments of the present invention are described below. The correspondence between the constituent features of the present invention and the embodiments described in the specification or drawings is, by way of example, as follows. This description is intended to confirm that embodiments supporting the present invention are described in the specification or drawings. Accordingly, even if an embodiment is described in the specification or drawings but is not listed here as corresponding to a constituent feature of the present invention, that does not mean the embodiment does not correspond to that constituent feature. Conversely, even if an embodiment is listed here as corresponding to a constituent feature, that does not mean the embodiment does not correspond to constituent features other than that one.

The communication system according to the first aspect of the present invention comprises an information management apparatus (for example, the user terminal 1-1 in FIG. 1) and a plurality of information processing apparatuses (for example, the user terminals 1-2 and 1-3 in FIG. 1) connected via a network. The information management apparatus has substantially the same configuration as the information management apparatus of the second aspect of the present invention, and each information processing apparatus has substantially the same configuration as the information processing apparatus of the third aspect of the present invention.

The information management apparatus according to the second aspect of the present invention (for example, the user terminal 1-1 in FIG. 1) is an information management apparatus connected to a plurality of information processing apparatuses (for example, the user terminals 1-2 and 1-3 in FIG. 1) via a network, and includes: synthesizing means (for example, the voice synthesis unit 32 in FIG. 10) for synthesizing audio acquired and transmitted by predetermined ones of the information processing apparatuses and transmitting the synthesized audio to the plurality of information processing apparatuses for output, thereby allowing the users of the plurality of information processing apparatuses to converse; detection means (for example, the packet analysis unit 31 in FIG. 10) for detecting the start or end of a user's speech based on the state of the audio acquired by the plurality of information processing apparatuses; and generation means (for example, the packet generation unit 33 in FIG. 10) for generating, each time the detection means detects the start or end of a user's speech and the set of users speaking at the same time changes, control information including first identification information identifying the users of the plurality of information processing apparatuses and second identification information identifying the users who are speaking, and transmitting the control information to the plurality of information processing apparatuses.

The information processing method or program according to the second aspect of the present invention includes the steps of synthesizing audio acquired and transmitted by predetermined information processing apparatuses and transmitting the synthesized audio to a plurality of the information processing apparatuses for output, thereby allowing the users of the plurality of information processing apparatuses to converse; detecting the start or end of a user's speech based on the state of the audio acquired by the plurality of information processing apparatuses; and generating, each time the start or end of a user's speech is detected and the set of users speaking at the same time changes, control information including first identification information identifying the users of the plurality of information processing apparatuses and second identification information identifying the users who are speaking, and transmitting the control information to the plurality of information processing apparatuses (for example, step S40 in FIG. 12).

The information processing apparatus according to the third aspect of the present invention (for example, the user terminal 1-2 in FIG. 1) includes: acquisition means (for example, the packet analysis unit 41 in FIG. 11) for acquiring control information transmitted from the information management apparatus; and display control means (for example, the display control unit 16 in FIG. 9) for identifying, each time the control information is acquired by the acquisition means, the users who are speaking and the users who are not speaking based on the control information, and displaying information about the respective users in different formats.

The information processing method or program according to the third aspect of the present invention includes the steps of acquiring the control information transmitted from the information management apparatus, and identifying, each time the control information is acquired, the users who are speaking and the users who are not speaking based on the control information and displaying information about the respective users in different formats (for example, step S54 in FIG. 13).

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a diagram showing a configuration example of a communication system according to an embodiment of the present invention.

As shown in FIG. 1, this communication system is configured by connecting user terminals 1-1 to 1-3 to one another via a network 2 such as the Internet. Four or more user terminals may be connected to the network 2. In the example of FIG. 1, a server 3 is also connected to the network 2.

In the communication system of FIG. 1 configured in this way, a conference-mode voice conversation using VoIP is held among the users of the user terminals 1-1 to 1-3, with, for example, the user terminal 1-1 serving as the terminal of the user who hosts the conference (the host) and the user terminals 1-2 and 1-3 serving as the terminals of the users who participate in the conference (the participants). In the following, where appropriate, the user of the user terminal 1-1 is referred to as user A, the user of the user terminal 1-2 as user B, and the user of the user terminal 1-3 as user C.

As described in detail later, each of the user terminals 1-1 to 1-3 is provided with a microphone that captures the user's voice and a speaker that outputs audio based on data transmitted via the network 2.

As shown in FIG. 1, when audio data A_B, which is data of user B's voice captured by the user terminal 1-2, is transmitted from the user terminal 1-2, or audio data A_C, which is data of user C's voice captured by the user terminal 1-3, is transmitted from the user terminal 1-3, the user terminal 1-1 synthesizes that audio data with the data of user A's voice that it has captured itself, and transmits the resulting audio data A_mix to the user terminals 1-2 and 1-3.

In addition to transmitting the audio data A_mix to the user terminals 1-2 and 1-3, the user terminal 1-1 outputs audio based on the audio data A_mix and the like, so that user A hears the statements of user B, the statements of user C, and so on.

The user terminal 1-2 captures user B's voice and transmits the resulting audio data A_B to the user terminal 1-1. When the audio data A_mix is transmitted from the user terminal 1-1, the user terminal 1-2 outputs audio based on the audio data A_mix, so that user B hears the statements of user A, the statements of user C, and so on.

Similarly, the user terminal 1-3 captures user C's voice and transmits the resulting audio data A_C to the user terminal 1-1. When the audio data A_mix is transmitted from the user terminal 1-1, the user terminal 1-3 outputs audio based on the audio data A_mix, so that user C hears the statements of user A, the statements of user B, and so on.
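
As an illustration of the synthesis performed by the host terminal, the following is a minimal sketch that mixes frames of audio by summing samples. The text does not state the audio format or the mixing method, so 16-bit PCM samples, clipping on overflow, and the names `mix_frames`, `a_a`, `a_b`, `a_c`, and `a_mix` are all assumptions made for this example.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix_frames(frames: Sequence[Sequence[int]]) -> List[int]:
    """Sum equal-length frames of 16-bit PCM samples, clipping to the int16 range."""
    if not frames:
        return []
    length = len(frames[0])
    if any(len(frame) != length for frame in frames):
        raise ValueError("all frames must have the same number of samples")
    mixed = []
    for samples in zip(*frames):
        total = sum(samples)
        # Clip instead of wrapping around so loud overlaps do not distort badly.
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed


# Hypothetical two-sample frames for users A, B and C.
a_a = [0, 10000]
a_b = [1000, -2000]
a_c = [500, 30000]
a_mix = mix_frames([a_a, a_b, a_c])
```

In this sketch the same `a_mix` would be sent to every participant, which matches the description above of a single A_mix being transmitted to the user terminals 1-2 and 1-3.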

Here, audio data is transmitted and received in accordance with RTP (Real-time Transport Protocol). RTP is a protocol intended for transmitting video and audio in real time. An RTP packet includes a timestamp, information indicating the playback order, and the like.
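
To make the RTP fields mentioned above concrete, here is a small sketch that reads the fixed RTP header. It assumes the standard 12-byte header layout of RFC 3550 (sequence number for ordering, timestamp, SSRC, and an optional CSRC list); the function name `parse_rtp_header` and the example packet are hypothetical and are not taken from this document.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Parse the fixed RTP header and CSRC list (RFC 3550 layout assumed)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = b0 & 0x0F
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("truncated CSRC list")
    csrc = list(struct.unpack("!%dI" % csrc_count, packet[12:end]))
    return {
        "version": b0 >> 6,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,   # gives the playback order
        "timestamp": timestamp,   # sampling instant of the first payload byte
        "ssrc": ssrc,
        "csrc": csrc,             # contributing sources, e.g. the mixed speakers
        # A header extension, if present, is left inside the payload here.
        "payload": packet[end:],
    }


# Hypothetical packet: version 2, two CSRC entries, two payload bytes.
example = (
    struct.pack("!BBHII", 0x82, 0, 7, 160, 0x1111)
    + struct.pack("!II", 0xB, 0xC)
    + b"\x01\x02"
)
header = parse_rtp_header(example)
```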

In the communication system of FIG. 1, the user terminal 1-1 also detects users who newly join the conference and users who leave the conference.

When the user terminal 1-1 detects a user newly joining the conference or a user leaving the conference, it updates the identification information of the users participating in the conference, which it registers and manages in a list, and transmits the updated user identification information to the user terminals 1-2 and 1-3 by including it in the information sent to those terminals, such as the audio data A_mix.

Furthermore, the user terminal 1-1 detects the start and end of users' speech based on the audio data transmitted from the user terminals 1-2 and 1-3, which are the participants' terminals, the data of user A's voice that it has captured itself, and the like.
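
The passage above does not say how the start and end of speech are derived from the audio data. Purely as an illustration, the sketch below uses a simple energy threshold per frame; the threshold value and the names `rms` and `speech_event` are assumptions for this example, not the method of this document.

```python
import math
from typing import Optional, Sequence


def rms(samples: Sequence[int]) -> float:
    """Root-mean-square level of one frame of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def speech_event(
    was_speaking: bool, frame: Sequence[int], threshold: float = 500.0
) -> Optional[str]:
    """Return "start" or "end" when the speaking state flips, otherwise None."""
    now_speaking = rms(frame) >= threshold
    if now_speaking and not was_speaking:
        return "start"
    if was_speaking and not now_speaking:
        return "end"
    return None
```

A practical detector would usually smooth over several frames so that short pauses between words are not reported as the end of a statement.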

When the user terminal 1-1 detects that the set of users speaking at the same time has changed, it transmits control information including identification information that identifies the users who are speaking to the user terminals 1-2 and 1-3.

For example, when user B starts speaking while user C is speaking, so that user B and user C are speaking at the same time, the control information transmitted from the user terminal 1-1 at that point includes the identification information of user B and the identification information of user C, who are speaking. The user terminals 1-2 and 1-3, having received the control information, can determine from it that user B and user C are speaking.
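
The rule described above, emitting control information only when the set of simultaneous speakers changes, can be sketched as follows. The class name `SpeakerTracker`, the `update` method, and the dictionary layout of the control information are hypothetical; the document itself carries this information in RTCP packets.

```python
from typing import Dict, List, Optional, Set


class SpeakerTracker:
    """Tracks who is speaking and reports only changes in the speaker set."""

    def __init__(self, participants: List[str]) -> None:
        self.participants = list(participants)
        self.speakers: Set[str] = set()

    def update(self, user: str, speaking: bool) -> Optional[Dict[str, List[str]]]:
        """Apply a speech start/end event; return control info if the set changed."""
        if user not in self.participants:
            raise ValueError(f"unknown participant: {user}")
        before = set(self.speakers)
        if speaking:
            self.speakers.add(user)
        else:
            self.speakers.discard(user)
        if self.speakers == before:
            return None  # no change, so nothing needs to be sent
        return {
            "participants": list(self.participants),  # first identification information
            "speakers": sorted(self.speakers),        # second identification information
        }


tracker = SpeakerTracker(["A", "B", "C"])
first = tracker.update("C", True)    # user C starts speaking
repeat = tracker.update("C", True)   # duplicate event, set unchanged
second = tracker.update("B", True)   # user B joins in
```

Here `second` lists both B and C as speakers, which corresponds to the example in the text where user B starts speaking while user C is already speaking.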

When it has been determined that user B and user C are speaking, the user terminals 1-2 and 1-3 display the information in different formats; for example, "User B" and "User C", the names of the users who are speaking, are displayed in a different color from "User A", the name of the user who is not speaking. The same display is presented on the user terminal 1-1. The user terminals 1-2 and 1-3 have also been notified by the user terminal 1-1 that the users participating in the conference are the three users A to C.

As a result, users A to C can confirm from the screen display of their user terminals that user B and user C are speaking at the same time.
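
A receiving terminal's display step might look like the sketch below. The text gives a different color as its example of a different format; this sketch substitutes a plain-text marker so that it stays self-contained. The function name `render_participants` and the dictionary layout of the control information are assumptions.

```python
from typing import Dict, List


def render_participants(control: Dict[str, List[str]]) -> List[str]:
    """Build one display line per participant, marking those who are speaking."""
    speakers = set(control["speakers"])
    lines = []
    for name in control["participants"]:
        if name in speakers:
            lines.append(f"* {name} (speaking)")
        else:
            lines.append(f"  {name}")
    return lines


lines = render_participants(
    {
        "participants": ["User A", "User B", "User C"],
        "speakers": ["User B", "User C"],
    }
)
```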

In this way, because the control information includes the identification information of the users who are speaking at the same time, a user terminal that has acquired it can easily identify the users who are speaking. For example, if the user terminal receiving the audio data A_mix had to analyze the audio data A_mix itself in order to identify the users who are speaking, that would place a burden on the terminal; no such processing is needed.

Here, the control information is transmitted from the user terminal 1-1 as an RTCP (RTP Control Protocol) packet. RTCP is a protocol for conveying information about the senders and receivers of RTP packets, among other things.

There are five main types of RTCP packet: SR (Sender Report), RR (Receiver Report), SDES (Source Description), BYE (Goodbye), and APP (Application Specific).

An SR-type RTCP packet is used by a terminal on the transmitting side of audio data to inform the terminal on the receiving side of its own status, and an RR-type RTCP packet is used by a terminal on the receiving side of audio data to inform the terminal on the transmitting side of its own status.

An SDES-type RTCP packet is used to transmit detailed information about the audio data, and a BYE-type RTCP packet is used to inform the host's terminal that a participant is leaving the conference. An APP-type RTCP packet is used to indicate what kind of application is being used to hold the conference.
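
The five packet types listed above can be told apart from the RTCP common header. The sketch below assumes the packet type values and 4-byte common header layout defined in RFC 3550; the names `RtcpType` and `parse_rtcp_header` and the example bytes are hypothetical.

```python
import struct
from enum import IntEnum


class RtcpType(IntEnum):
    """RTCP packet type values as assigned in RFC 3550."""
    SR = 200
    RR = 201
    SDES = 202
    BYE = 203
    APP = 204


def parse_rtcp_header(packet: bytes) -> dict:
    """Parse the 4-byte RTCP common header (RFC 3550 layout assumed)."""
    if len(packet) < 4:
        raise ValueError("RTCP packet shorter than the 4-byte common header")
    b0, packet_type, length_words = struct.unpack("!BBH", packet[:4])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "count": b0 & 0x1F,                 # report count or source count
        "type": RtcpType(packet_type),      # raises ValueError for other types
        "length_bytes": (length_words + 1) * 4,
    }


# Hypothetical SDES header: version 2, one chunk, length field of 3 words.
example = struct.pack("!BBH", 0x81, 202, 3)
header = parse_rtcp_header(example)
```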

Of these, the SDES-type RTCP packet is used as the RTCP packet that carries the control information. In the following, where appropriate, an SDES-type RTCP packet is referred to simply as an SDES packet. The data structures of the RTP packet and the RTCP packet are described in detail later.

図２は、図１のユーザ端末１−１乃至１−３の間で行われるデータの送受信の流れについて説明する図である。 FIG. 2 is a diagram explaining the flow of data transmission and reception among the user terminals 1-1 to 1-3 of FIG. 1.

VoIPによる音声データの伝送においては、一般的に、遅延を防ぐことを優先してUDP(User Datagram Protocol)が使われる。その際、上述したように、メディア（音声データ）の伝送にはRTPが用いられ、メディアの伝送の制御などにはRTCPが用いられる。音声データの送信側の端末は、対象のデータにRTPヘッダを付けて送信し、このRTPヘッダの記述によって、受信側の端末における再生の手順を規定する。送信側の端末と受信側の端末は定期的にRTCPパケットを交換し、受信側の端末はRTCP-SR（SRタイプのRTCPパケット）にしたがって動作し、送信側の端末はRTCP-RR（RRタイプのRTCPパケット）にしたがってフロー制御を行う。 In voice data transmission by VoIP, UDP (User Datagram Protocol) is generally used because avoiding delay takes priority. As described above, RTP is used to transmit the media (voice data), and RTCP is used to control the media transmission. The terminal on the transmitting side attaches an RTP header to the data and transmits it, and the contents of this RTP header define the playback procedure at the receiving terminal. The transmitting and receiving terminals exchange RTCP packets periodically: the receiving terminal operates according to RTCP-SR (SR type RTCP packets), and the transmitting terminal performs flow control according to RTCP-RR (RR type RTCP packets).

ユーザＢの音声を取り込んだとき、ステップＳ１１において、ユーザ端末１−２は、得られた音声データＡ_BをRTPパケットによりユーザ端末１−１に送信する。また、ステップＳ１２において、ユーザ端末１−２は、RTCP-RRパケット（RRタイプのRTCPパケット）をユーザ端末１−１に送信する。 When the voice of user B is captured, in step S11 the user terminal 1-2 transmits the resulting voice data A_B to the user terminal 1-1 in RTP packets. In step S12, the user terminal 1-2 transmits an RTCP-RR packet (RR type RTCP packet) to the user terminal 1-1.

ユーザ端末１−１は、ユーザ端末１−２から送信されてきたRTPパケットをステップＳ１において受信し、RTCP-RRパケットをステップＳ２において受信する。 The user terminal 1-1 receives the RTP packet transmitted from the user terminal 1-2 in step S1, and receives the RTCP-RR packet in step S2.

一方、ユーザＣの音声を取り込んだとき、ステップＳ２１において、ユーザ端末１−３は、得られた音声データＡ_CをRTPパケットによりユーザ端末１−１に送信する。また、ステップＳ２２において、ユーザ端末１−３は、RTCP-RRパケットをユーザ端末１−１に送信する。 Meanwhile, when the voice of user C is captured, in step S21 the user terminal 1-3 transmits the resulting voice data A_C to the user terminal 1-1 in RTP packets. In step S22, the user terminal 1-3 transmits an RTCP-RR packet to the user terminal 1-1.

ユーザ端末１−１は、ユーザ端末１−３から送信されてきたRTPパケットをステップＳ３において受信し、RTCP-RRパケットをステップＳ４において受信する。 The user terminal 1-1 receives the RTP packet transmitted from the user terminal 1-3 in step S3, and receives the RTCP-RR packet in step S4.

ステップＳ５において、ユーザ端末１−１は、そのときユーザＡの音声を取り込んでいる場合にはユーザＡの音声を含めて、ユーザ端末１−２から送信されてきたRTPパケットに格納される音声データＡ_Bとユーザ端末１−３から送信されてきたRTPパケットに格納される音声データＡ_Cを合成する。 In step S5, the user terminal 1-1 synthesizes the voice data A_B stored in the RTP packets transmitted from the user terminal 1-2 and the voice data A_C stored in the RTP packets transmitted from the user terminal 1-3, and also includes the voice of user A if it is capturing user A's voice at that time.

ユーザ端末１−１は、ステップＳ６において、合成して得られた音声データＡ_mixをRTPパケットによってユーザ端末１−２と１−３に送信する。 In step S6, the user terminal 1-1 transmits the voice data A_mix obtained by the synthesis to the user terminals 1-2 and 1-3 in RTP packets.
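The synthesis of A_B, A_C, and (when present) user A's own voice into A_mix in step S5 is not specified further in this description. The sketch below assumes, purely for illustration, that each stream is a list of signed 16-bit PCM samples and that mixing is a sample-wise sum with clipping; the function name `mix_pcm16` is hypothetical.

```python
from typing import List, Sequence


def mix_pcm16(*streams: Sequence[int]) -> List[int]:
    """Mix signed 16-bit PCM streams by summing samples and clipping.

    Assumption: this additive mixing is only one possible way to
    produce A_mix; the document does not state the method used.
    """
    if not streams:
        return []
    length = max(len(s) for s in streams)
    mixed = []
    for i in range(length):
        # Shorter streams are treated as silent once they run out.
        total = sum(s[i] for s in streams if i < len(s))
        # Clip to the signed 16-bit range to avoid wrap-around.
        mixed.append(max(-32768, min(32767, total)))
    return mixed


# A_B and A_C as tiny illustrative sample buffers
a_mix = mix_pcm16([1000, -2000, 30000], [500, 500, 30000])
```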

また、ユーザ端末１−１は、それまではユーザＣだけが発言を行っていたのがユーザＢとユーザＣが同時に発言を行うようになったというように、同時に発言を行うユーザの構成が変わったことを検出した場合、ステップＳ７において、SDESパケットをユーザ端末１−２と１−３に送信する。ここで送信されるSDESパケットには、カンファレンスに参加するユーザＡ乃至Ｃのそれぞれの識別情報と、ステップＳ６で送信されたRTPパケットに含まれる音声の主である例えばユーザＢ，Ｃのそれぞれの識別情報が含まれる。 When the user terminal 1-1 detects that the set of users speaking simultaneously has changed, for example, when only user C had been speaking and users B and C are now speaking at the same time, it transmits an SDES packet to the user terminals 1-2 and 1-3 in step S7. The SDES packet transmitted here includes the identification information of each of users A to C participating in the conference, and the identification information of each user whose voice is contained in the RTP packet transmitted in step S6, for example, users B and C.

ユーザ端末１−２は、ユーザ端末１−１から送信されてきたRTPパケットをステップＳ１３において受信し、SDESパケットをステップＳ１４において受信する。 The user terminal 1-2 receives the RTP packet transmitted from the user terminal 1-1 in step S13, and receives the SDES packet in step S14.

ステップＳ１５において、ユーザ端末１−２は、受信したRTPパケットに基づいて音声を出力する。 In step S15, the user terminal 1-2 outputs voice based on the received RTP packet.

また、ユーザ端末１−２は、ステップＳ１６において、受信したSDESパケットに記述されている情報に基づいて、発言を行っているユーザの情報を、発言を行っていないユーザの情報と異なる形式で表示する。 In step S16, based on the information described in the received SDES packet, the user terminal 1-2 displays the information of the users who are speaking in a format different from that of the users who are not speaking.

同様に、ユーザ端末１−３は、ユーザ端末１−１から送信されてきたRTPパケットをステップＳ２３において受信し、SDESパケットをステップＳ２４において受信する。 Similarly, the user terminal 1-3 receives the RTP packet transmitted from the user terminal 1-1 in step S23, and receives the SDES packet in step S24.

ステップＳ２５において、ユーザ端末１−３は、受信したRTPパケットに基づいて音声を出力する。 In step S25, the user terminal 1-3 outputs voice based on the received RTP packet.

また、ユーザ端末１−３は、ステップＳ２６において、受信したSDESパケットに記述されている情報に基づいて、発言を行っているユーザの情報を、発言を行っていないユーザの情報と異なる形式で表示する。 In step S26, based on the information described in the received SDES packet, the user terminal 1-3 displays the information of the users who are speaking in a format different from that of the users who are not speaking.

このように、同時に発言を行うユーザの構成が変わる毎にユーザ端末１−１からSDESパケットが送信される。また、ユーザ端末１−２，１−３においては、SDESパケットが受信される毎に、発言を行っているユーザの情報の表示が切り替えられる。 Thus, every time the configuration of a user who speaks at the same time changes, an SDES packet is transmitted from the user terminal 1-1. Further, in the user terminals 1-2 and 1-3, every time an SDES packet is received, the display of the information of the user who is speaking is switched.

以上においては、カンファレンスの主催者の装置がユーザ端末１−１であるとしたが、カンファレンスが図１のサーバ３により管理されるようにしてもよい。この場合、ユーザ端末１−１乃至１−３により取り込まれたユーザの音声のデータは、ネットワーク２を介してサーバ３に送信され、サーバ３により、それらの合成が行われる。合成された音声はサーバ３からユーザ端末１−１乃至１−３に送信され、それぞれのユーザ端末において音声が出力される。 In the above, the conference organizer's device is the user terminal 1-1, but the conference may be managed by the server 3 in FIG. In this case, the user voice data captured by the user terminals 1-1 to 1-3 is transmitted to the server 3 via the network 2, and the server 3 synthesizes them. The synthesized voice is transmitted from the server 3 to the user terminals 1-1 to 1-3, and the voice is output from each user terminal.

また、サーバ３により、ユーザの発言の開始や発言の終了の検出などが行われ、同時に発言を行うユーザの構成が変わったとき、同時に発言を行っているユーザの識別情報などを含む制御情報がサーバ３からユーザ端末１−１乃至１−３に送信される。 Further, when the server 3 detects the start of the user's speech or the end of the speech, and the configuration of the user who speaks at the same time is changed, control information including identification information of the user who is speaking at the same time is provided. It is transmitted from the server 3 to the user terminals 1-1 to 1-3.

ユーザ端末１−１乃至１−３においては、サーバ３から送信されてきた制御情報に基づいて、発言を行っているユーザと発言を行っていないユーザが特定され、発言を行っているユーザの情報が、発言を行っていないユーザの情報と異なる形式で表示される。 In the user terminals 1-1 to 1-3, based on the control information transmitted from the server 3, the user who is speaking and the user who is not speaking are specified, and the information of the user who is speaking Is displayed in a format different from the information of the user who is not speaking.

ここで、パケットのデータ構造について説明する。 Here, the data structure of the packet will be described.

図３は、RTPパケットのデータ構造を示す図である。 FIG. 3 is a diagram illustrating a data structure of the RTP packet.

図３に示されるように、RTPパケットはデータリンクヘッダ（Data Link Header）、IPヘッダ（IP Header）、UDPヘッダ（UDP Header）、RTPヘッダ（RTP Header）、データ部（Data）から構成される。図３には、RTPヘッダに記述されるデータを拡大して示している。 As shown in FIG. 3, an RTP packet is composed of a data link header (Data Link Header), an IP header (IP Header), a UDP header (UDP Header), an RTP header (RTP Header), and a data part (Data). . FIG. 3 shows enlarged data described in the RTP header.

RTPヘッダに記述されるデータのうちの「Ｖ」はバージョン番号を表し、「Ｐ」はパディングの有無を表す。例えば「Ｐ」の値として１が記述されている場合、それは、パケットの最後にペイロードがパディングされていることを表す。「Ｘ」はRTPヘッダの直後に拡張ヘッダがある場合に設定されるフラグである。 Of the data described in the RTP header, “V” represents a version number, and “P” represents the presence or absence of padding. For example, when 1 is described as the value of “P”, it indicates that the payload is padded at the end of the packet. “X” is a flag set when there is an extension header immediately after the RTP header.

「CC」はCSRC(Contributing Source)として記述される識別子の数を表し、「Ｍ」はマーカビットでありVoIPによる音声データの伝送においては有音・無音の境界を表す。 “CC” represents the number of identifiers described as CSRC (Contributing Source), and “M” is a marker bit, which represents a boundary between sound and silence in voice data transmission by VoIP.

「PT」はペイロードのタイプを表し、「Sequence」はパケットの通し番号を表す。「Sequence」と次に記述される「Time Stamp」により、RTPパケットを受信した受信側の端末における再生の同期が確立される。 “PT” represents a payload type, and “Sequence” represents a packet serial number. With “Sequence” and “Time Stamp” described next, synchronization of reproduction in the terminal on the receiving side that has received the RTP packet is established.

「SSRC(Synchronization Source)」は、セッション毎に変わる一時的な識別子であり、VoIPによる音声データの伝送においてはカンファレンスの主催者のユーザ端末が再生の同期を司るので、その主催者の識別子が割り当てられる。「CSRC」は、RTPパケットに含まれる音声データを用意したユーザ端末を使っている参加者の識別子である。 “SSRC (Synchronization Source)” is a temporary identifier that changes with each session. In voice data transmission by VoIP, the user terminal of the conference organizer governs playback synchronization, so the organizer's identifier is assigned to it. “CSRC” is the identifier of a participant using a user terminal that supplied voice data contained in the RTP packet.

RTPヘッダには、「CSRC」に続けて、「Payload Format Extension」、「Data」が記述される。 In the RTP header, “Payload Format Extension” and “Data” are described after “CSRC”.

RTPパケットはこのような構造を有している。例えば、カンファレンスの参加者の端末であるユーザ端末１−２から主催者の端末であるユーザ端末１−１に送信されるRTPパケットのSSRCにはユーザＡの識別子が記述され、CSRCにはユーザＢの識別子が記述される。ユーザ端末１−２から送信されてきたRTPパケットを受信したユーザ端末１−１は、受信したRTPパケットがユーザＢが利用するユーザ端末１−２から送信されてきたものであることをCSRCに基づいて特定し、RTPパケットのマーカビットの値から、ユーザＢが発言を行っているのか否かを判断することができる。 The RTP packet has the structure described above. For example, in an RTP packet transmitted from the user terminal 1-2, a conference participant's terminal, to the user terminal 1-1, the organizer's terminal, the identifier of user A is described in the SSRC and the identifier of user B is described in the CSRC. On receiving this RTP packet, the user terminal 1-1 can determine from the CSRC that the packet came from the user terminal 1-2 used by user B, and can determine from the value of the marker bit whether user B is speaking.

一方、カンファレンスの主催者の端末であるユーザ端末１−１から参加者の端末であるユーザ端末１−２に送信されるRTPパケットのSSRCにはユーザＡの識別子が記述され、CSRCには、ユーザＢの識別子とユーザＣの識別子が記述される。カンファレンスの参加者が変わる毎にユーザ端末１−１によってCSRCも書き換えられるから、カンファレンスの参加者のユーザ端末は、ユーザ端末１−１から送信されてくるRTPパケットに記述されるCSRCから、カンファレンスの参加者を確認することができる。 On the other hand, in an RTP packet transmitted from the user terminal 1-1, the conference organizer's terminal, to the user terminal 1-2, a participant's terminal, the identifier of user A is described in the SSRC, and the identifiers of user B and user C are described in the CSRC. Because the user terminal 1-1 rewrites the CSRC every time the conference participants change, a participant's user terminal can confirm the conference participants from the CSRC described in the RTP packets transmitted from the user terminal 1-1.

このように、ユーザ端末１−１から送信されるRTPパケットにおいては、カンファレンスの参加者の識別子がRTPヘッダに記述されるから、RTPパケットを受信するユーザ端末のアプリケーション毎に処理を規定する必要がない。例えば、カンファレンスの参加者の識別子を記述する領域としてデータ領域を用いることも可能であるが、この場合、データ領域のこの部分に参加者の識別子が記述されているといったことをユーザ端末のアプリケーション毎に規定する必要があるが、そのようなことを行う必要がない。 Thus, in an RTP packet transmitted from the user terminal 1-1, the identifiers of the conference participants are described in the RTP header, so there is no need to define special processing for each application on the user terminals that receive the packet. The data area could instead be used to carry the participants' identifiers, but then each user terminal application would have to be told which part of the data area holds them; describing the identifiers in the RTP header avoids that.
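The RTP header fields described above (V, P, X, CC, M, PT, Sequence, Time Stamp, SSRC, CSRC) can be unpacked as in the following sketch. It assumes the standard fixed 12-byte RTP header layout followed by CC 32-bit CSRC entries; the names `RtpHeader` and `parse_rtp_header`, and the identifier values 0xA, 0xB, 0xC used for users A, B, and C, are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    marker: bool        # talk-spurt boundary in VoIP
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int           # organizer's identifier in this document
    csrc: List[int]     # contributing participants


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the fixed RTP header and the CSRC list (network byte order)."""
    if len(packet) < 12:
        raise ValueError("an RTP header needs at least 12 bytes")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F                      # number of CSRC identifiers
    end = 12 + 4 * cc
    if len(packet) < end:
        raise ValueError("packet is shorter than its CSRC list")
    csrc = list(struct.unpack("!%dI" % cc, packet[12:end]))
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrc=csrc,
    )


# Packet from terminal 1-1: SSRC = user A, CSRC = users B and C (illustrative IDs)
example = struct.pack("!BBHIIII", 0x82, 0x80, 1, 160, 0xA, 0xB, 0xC)
header = parse_rtp_header(example)
```

With such a parser, a participant's terminal can read the current participant list from `header.csrc` without any application-specific convention about the payload.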

図４は、RTCPパケットのデータ構造を示す図である。 FIG. 4 is a diagram showing a data structure of the RTCP packet.

図４に示されるように、RTCPパケットはデータリンクヘッダ、IPヘッダ、UDPヘッダ、RTCPデータ（RTCP Data）から構成される。上述したRTCPパケットのタイプに応じて、RTCPデータの記述内容が異なる。カンファレンスの主催者の端末であるユーザ端末１−１から制御情報として送信されるRTCPパケットにはSDESパケットが用いられる。 As shown in FIG. 4, the RTCP packet is composed of a data link header, an IP header, a UDP header, and RTCP data (RTCP Data). The description contents of the RTCP data differ depending on the type of the RTCP packet described above. An SDES packet is used as an RTCP packet transmitted as control information from the user terminal 1-1 which is a conference organizer's terminal.

図５は、SDESパケットのデータ部に記述されるデータの例を示す図である。 FIG. 5 is a diagram illustrating an example of data described in the data portion of the SDES packet.

「Ｖ」はバージョン番号を表し、「Ｐ」はパディングの有無を表す。「SC(Source Count)」は、「SSRC/CSRC」の数を表し、「PT」はペイロードのタイプを表す。「Packet Length」はパケット長を表す。 “V” represents a version number, and “P” represents the presence or absence of padding. “SC (Source Count)” represents the number of “SSRC / CSRC”, and “PT” represents the payload type. “Packet Length” represents the packet length.

図５に示されるように、SDESパケットには、「SSRC/CSRC」と「SDES ITEM」を所定の数だけ記述することができるようになされている。カンファレンスの主催者の端末であるユーザ端末１−１から参加者の端末であるユーザ端末１−２と１−３に送信されるSDESパケットには、同時に発言を行っている参加者の数と同じ数の「SSRC/CSRC」と「SDES ITEM」の組が記述される。 As shown in FIG. 5, a predetermined number of “SSRC / CSRC” and “SDES ITEM” can be described in the SDES packet. The SDES packet transmitted from the user terminal 1-1, which is the conference organizer's terminal, to the user terminals 1-2, 1-3, which are the participants' terminals, is the same as the number of participants who are speaking at the same time. A number of "SSRC / CSRC" and "SDES ITEM" pairs are described.

例えば、ユーザＢが発言を開始したことから、ユーザＢとユーザＣが同時に発言を行っている状態になったとき、そのときユーザ端末１−１から送信されるSDESパケットには、「SSRC/CSRC」と「SDES ITEM」が２組だけ記述される。 For example, when user B starts speaking and users B and C are consequently speaking at the same time, the SDES packet transmitted from the user terminal 1-1 at that time describes exactly two pairs of “SSRC/CSRC” and “SDES ITEM”.

１組目のSSRCにはカンファレンスの主催者であるユーザＡの識別子が記述され、CSRCにはカンファレンスの参加者であるユーザＢの識別子とユーザＣの識別子が記述される。SDES ITEMには、ユーザＢに関する付帯情報が記述される。 The first set of SSRC describes the identifier of the user A who is the conference organizer, and the CSRC describes the identifier of the user B who is the conference participant and the identifier of the user C. In the SDES ITEM, incidental information regarding the user B is described.

２組目のSSRCにはカンファレンスの主催者であるユーザＡの識別子が記述され、CSRCにはカンファレンスの参加者であるユーザＢの識別子とユーザＣの識別子が記述される。SDES ITEMには、ユーザＣに関する付帯情報が記述される。 The second set of SSRC describes the identifier of the user A who is the conference organizer, and the CSRC describes the identifier of the user B who is the conference participant and the identifier of the user C. In the SDES ITEM, incidental information regarding the user C is described.

図６は、SDES ITEMとして記述される情報の例を示す図である。 FIG. 6 is a diagram illustrating an example of information described as SDES ITEM.

図６に示されるように、「Item Identifier」、「Byte Length」、「Item Description」の３種類の情報がSDES ITEMとして記述される。「Item Identifier」と「Item Description」の定義を図７に示す。 As shown in FIG. 6, three types of information “Item Identifier”, “Byte Length”, and “Item Description” are described as SDES ITEM. The definitions of “Item Identifier” and “Item Description” are shown in FIG.

図７に示されるように、Item Identifierが０で識別されるSDES ITEMは「END」であり、Item DescriptionにはSDES ITEMの終了であることを表す情報が記述される。 As shown in FIG. 7, the SDES ITEM identified with an Item Identifier of 0 is “END”, and the Item Description describes information indicating the end of the SDES ITEM.

Item Identifierが１で識別されるSDES ITEMは「CNAME」であり、Item Descriptionには参加者毎に固有の識別子が記述される。 The SDES ITEM identified by Item Identifier 1 is “CNAME”, and a unique identifier is described for each participant in Item Description.

Item Identifierが２で識別されるSDES ITEMは「NAME」であり、Item Descriptionには参加者の名前が記述される。 The SDES ITEM whose Item Identifier is 2 is “NAME”, and the name of the participant is described in Item Description.

Item Identifierが３で識別されるSDES ITEMは「EMAIL」であり、Item Descriptionには参加者の電子メールアドレスが記述される。 The SDES ITEM identified by Item Identifier 3 is “EMAIL”, and the e-mail address of the participant is described in Item Description.

Item Identifierが４で識別されるSDES ITEMは「PHONE」であり、Item Descriptionには参加者の電話番号やその参加者が利用する端末の電話番号が記述される。 The SDES ITEM whose Item Identifier is 4 is “PHONE”, and the Item Description describes the phone number of the participant and the phone number of the terminal used by the participant.

Item Identifierが５で識別されるSDES ITEMは「LOC」であり、Item Descriptionには参加者の住所が記述される。 The SDES ITEM identified by Item Identifier 5 is “LOC”, and the address of the participant is described in Item Description.

Item Identifierが６で識別されるSDES ITEMは「TOOL」であり、Item Descriptionには参加者が利用しているアプリケーションの名前が記述される。 The SDES ITEM identified by Item Identifier 6 is “TOOL”, and the name of the application used by the participant is described in Item Description.

Item Identifierが７で識別されるSDES ITEMは「NOTE」であり、Item Descriptionには参加者の状態を表す情報が記述される。 The SDES ITEM identified by Item Identifier 7 is “NOTE”, and Item Description describes information indicating the state of the participant.

Item Identifierが８で識別されるSDES ITEMは「PRIV」であり、Item Descriptionにはアプリケーション拡張用の情報が記述される。 The SDES ITEM identified by Item Identifier 8 is “PRIV”, and the information for application extension is described in Item Description.
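The item identifiers above, together with the "SSRC/CSRC" and "SDES ITEM" pairs, can be illustrated with a small encoder and decoder. This is a sketch under stated assumptions: it follows the general RFC 3550 chunk layout (a 32-bit source identifier followed by type/length/text items, terminated by null bytes padded to a 32-bit boundary), which may differ in detail from FIG. 5; the packet type value 202 comes from RFC 3550; and `build_sdes`, `parse_sdes`, the identifiers 0xB and 0xC, and the CNAME strings are hypothetical.

```python
import struct
from typing import Dict, List, Tuple

SDES_PT = 202  # RTCP packet type for SDES (RFC 3550; assumption, not from this document)
SDES_ITEM_NAMES = {0: "END", 1: "CNAME", 2: "NAME", 3: "EMAIL", 4: "PHONE",
                   5: "LOC", 6: "TOOL", 7: "NOTE", 8: "PRIV"}
CNAME = 1


def build_sdes(chunks: List[Tuple[int, List[Tuple[int, str]]]]) -> bytes:
    """Encode (source_id, [(item_id, text), ...]) chunks into one SDES packet."""
    body = b""
    for source_id, items in chunks:
        chunk = struct.pack("!I", source_id)
        for item_id, text in items:
            data = text.encode("utf-8")
            if len(data) > 255:
                raise ValueError("an SDES item is limited to 255 bytes")
            chunk += bytes([item_id, len(data)]) + data
        # END item (one null byte) plus padding up to a 32-bit boundary
        chunk += b"\x00" * (4 - len(chunk) % 4)
        body += chunk
    # V=2, P=0, SC=len(chunks); length is in 32-bit words minus one
    header = struct.pack("!BBH", (2 << 6) | len(chunks), SDES_PT, len(body) // 4)
    return header + body


def parse_sdes(packet: bytes) -> List[Tuple[int, Dict[str, str]]]:
    """Decode an SDES packet into (source_id, {item_name: text}) chunks."""
    b0, pt, _length = struct.unpack("!BBH", packet[:4])
    if pt != SDES_PT:
        raise ValueError("not an SDES packet")
    chunks = []
    pos = 4
    for _ in range(b0 & 0x1F):          # SC: number of chunks
        (source_id,) = struct.unpack("!I", packet[pos:pos + 4])
        pos += 4
        items: Dict[str, str] = {}
        while packet[pos] != 0:         # item identifier 0 is END
            item_id, size = packet[pos], packet[pos + 1]
            text = packet[pos + 2:pos + 2 + size].decode("utf-8")
            items[SDES_ITEM_NAMES.get(item_id, str(item_id))] = text
            pos += 2 + size
        pos = (pos // 4 + 1) * 4        # skip END and padding
        chunks.append((source_id, items))
    return chunks


# Users B and C speaking at the same time: two chunks, one CNAME each
packet = build_sdes([(0xB, [(CNAME, "userB@example")]),
                     (0xC, [(CNAME, "userC@example")])])
speakers = [items["CNAME"] for _, items in parse_sdes(packet)]
```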

ユーザ端末１−１は、参加者の付帯情報が記述されるこのようなSDESパケットを、同時に発言を行うユーザの構成が変わる毎にユーザ端末１−２と１−３に送信する。ユーザ端末１−１から送信されてきたSDESパケットを受信したユーザ端末１−２と１−３は、それぞれ、SDESパケットに記述されたCNAME，NAME，EMAIL，PHONE，LOC等の情報を用いて、現在、だれが発言を行っているのかを特定し、それを表す表示を行う。 The user terminal 1-1 transmits such an SDES packet in which incidental information of the participant is described to the user terminals 1-2 and 1-3 every time the configuration of the user who speaks at the same time changes. The user terminals 1-2 and 1-3 that have received the SDES packet transmitted from the user terminal 1-1 use information such as CNAME, NAME, EMAIL, PHONE, and LOC described in the SDES packet, respectively. Identify who is currently speaking and display it.

なお、CNAMEはSDES ITEMの記述の中で唯一添付が義務付けられているものであるため、例えば、このCNAMEが、会話を行っているユーザの情報を通知するために用いられる。また、ユーザ端末１−１のように、マイクロフォンやスピーカを有し、カンファレンスの参加者であるユーザＢ、ユーザＣと会話を行うユーザＡが利用する端末がカンファレンスを主催する場合、主催者の識別子ではあるが、ユーザＡが発言を行っているときにはユーザＡの識別子もSDES ITEMのCNAMEとしてSDESパケットに記述されるようにしてもよい。すなわち、SDES ITEMのCNAMEは、発言を行っているユーザを表す。 Since CNAME is the only one that is required to be attached in the description of SDES ITEM, for example, this CNAME is used to notify the information of the user who is having a conversation. In addition, when the terminal used by the user A who has a microphone and a speaker and who has a conversation with the user B and the user C like the user terminal 1-1 hosts the conference, the identifier of the organizer However, when user A is speaking, the identifier of user A may also be described in the SDES packet as the CNAME of the SDES ITEM. That is, CNAME of SDES ITEM represents the user who is making a statement.

図８は、SDESパケットのCNAMEとして記述される識別子の例を示す図である。図８においては横軸が時間軸を表す。 FIG. 8 is a diagram illustrating an example of an identifier described as CNAME of the SDES packet. In FIG. 8, the horizontal axis represents the time axis.

図８の例においては、時刻ｔ₁においてユーザＡにより発言が開始され、それが時刻ｔ₄まで続けられている。また、時刻ｔ₂においてユーザＣにより発言が開始され、それが時刻ｔ₅まで続けられている。さらに、時刻ｔ₃においてユーザＢにより発言が開始され、それが時刻ｔ₆まで続けられている。 In the example of FIG. 8, user A starts speaking at time t₁ and continues until time t₄. User C starts speaking at time t₂ and continues until time t₅. Further, user B starts speaking at time t₃ and continues until time t₆.

したがって、時刻ｔ₁から時刻ｔ₂までの時間においては、ユーザＡだけによって発言が行われ、時刻ｔ₂から時刻ｔ₃までの時間においては、ユーザＡとユーザＣの２人によって同時に発言が行われている。また、時刻ｔ₃から時刻ｔ₄までの時間においては、ユーザＡ、ユーザＢ、ユーザＣの３人によって同時に発言が行われ、時刻ｔ₄から時刻ｔ₅までの時間においては、ユーザＢとユーザＣの２人によって同時に発言が行われている。時刻ｔ₅から時刻ｔ₆までの時間においては、ユーザＢだけによって発言が行われている。 Therefore, from time t₁ to time t₂, only user A is speaking, and from time t₂ to time t₃, users A and C are speaking at the same time. From time t₃ to time t₄, users A, B, and C are all speaking at the same time, and from time t₄ to time t₅, users B and C are speaking at the same time. From time t₅ to time t₆, only user B is speaking.

このようなタイミングでそれぞれのユーザにより発言が行われる場合、図８に示されるように、時刻ｔ₁においてユーザ端末１−１から送信されるSDESパケットには、そのとき発言を開始したユーザＡの識別子がCNAMEとして記述されるSDES ITEMが含まれる。 When the users speak at these timings, as shown in FIG. 8, the SDES packet transmitted from the user terminal 1-1 at time t₁ includes an SDES ITEM in which the identifier of user A, who started speaking at that time, is described as the CNAME.

また、時刻ｔ₂においてユーザ端末１−１から送信されるSDESパケットには、時刻ｔ₂において発言を開始したユーザＣの識別子が追加され、ユーザＡの識別子がCNAMEとして記述されるSDES ITEMと、ユーザＣの識別子がCNAMEとして記述されるSDES ITEMが含まれる。 In the SDES packet transmitted from the user terminal 1-1 at time t₂, the identifier of user C, who started speaking at time t₂, is added, so the packet includes an SDES ITEM in which the identifier of user A is described as the CNAME and an SDES ITEM in which the identifier of user C is described as the CNAME.

時刻ｔ₃においてユーザ端末１−１から送信されるSDESパケットには、さらに、時刻ｔ₃において発言を開始したユーザＢの識別子が追加され、ユーザＡの識別子がCNAMEとして記述されるSDES ITEM、ユーザＢの識別子がCNAMEとして記述されるSDES ITEM、および、ユーザＣの識別子がCNAMEとして記述されるSDES ITEMが含まれる。 In the SDES packet transmitted from the user terminal 1-1 at time t₃, the identifier of user B, who started speaking at time t₃, is further added, so the packet includes three SDES ITEMs in which the identifiers of user A, user B, and user C, respectively, are described as the CNAME.

時刻ｔ₄においてユーザ端末１−１から送信されるSDESパケットには、時刻ｔ₄において発言を終了したユーザＡの識別子が削除され、ユーザＢの識別子がCNAMEとして記述されるSDES ITEMと、ユーザＣの識別子がCNAMEとして記述されるSDES ITEMが含まれる。 In the SDES packet transmitted from the user terminal 1-1 at time t₄, the identifier of user A, who finished speaking at time t₄, is removed, so the packet includes an SDES ITEM in which the identifier of user B is described as the CNAME and an SDES ITEM in which the identifier of user C is described as the CNAME.

時刻ｔ₅においてユーザ端末１−１から送信されるSDESパケットには、時刻ｔ₅において発言を終了したユーザＣの識別子が削除され、ユーザＢの識別子がCNAMEとして記述されるSDES ITEMが含まれる。 In the SDES packet transmitted from the user terminal 1-1 at time t₅, the identifier of user C, who finished speaking at time t₅, is removed, so the packet includes only an SDES ITEM in which the identifier of user B is described as the CNAME.

ユーザＢが発言を終了することによってだれも発言を行っていない状態になった時刻ｔ₆においては、ユーザ端末１−１からSDESパケットは送信されない。 At time t₆, when user B finishes speaking and no one is speaking any longer, no SDES packet is transmitted from the user terminal 1-1.

このように、ユーザ端末１−１からは、同時に発言を行うユーザの構成が変わる毎にSDESパケットが送信され、いまだれが発言を行っているのかが参加者のユーザ端末に通知される。 In this way, the user terminal 1-1 transmits an SDES packet every time the set of users speaking simultaneously changes, and the participants' user terminals are notified of who is currently speaking.
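The behavior walked through for FIG. 8 can be reproduced with a short simulation. The sketch below tracks the set of speaking users and records one notification (the list of CNAMEs to put in the SDES packet) each time the set changes, except when it becomes empty, as at time t₆. The function name `sdes_notifications` and the event tuple format are hypothetical.

```python
from typing import Iterable, List, Tuple

Event = Tuple[str, str, str]  # (time label, user, "start" or "end")


def sdes_notifications(events: Iterable[Event]) -> List[Tuple[str, List[str]]]:
    """Return (time, CNAME list) for every SDES packet the organizer would send."""
    speaking = set()
    sent = []
    for time, user, kind in events:
        before = frozenset(speaking)
        if kind == "start":
            speaking.add(user)
        elif kind == "end":
            speaking.discard(user)
        else:
            raise ValueError("unknown event kind: %r" % kind)
        # Send only when the set of speakers changed and someone is still speaking.
        if speaking != before and speaking:
            sent.append((time, sorted(speaking)))
    return sent


fig8_events = [
    ("t1", "A", "start"), ("t2", "C", "start"), ("t3", "B", "start"),
    ("t4", "A", "end"), ("t5", "C", "end"), ("t6", "B", "end"),
]
packets = sdes_notifications(fig8_events)
```

For the FIG. 8 timeline this yields five notifications, for t1 through t5, and none at t6.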

カンファレンスの主催者の端末であるユーザ端末１−１が行う一連の動作とカンファレンスの参加者の端末であるユーザ端末１−２と１−３が行う一連の動作についてはフローチャートを参照して後述する。 A series of operations performed by the user terminal 1-1 which is a conference organizer's terminal and a series of operations performed by the user terminals 1-2 and 1-3 which are conference participant's terminals will be described later with reference to flowcharts. .

図９は、ユーザ端末１−１の機能構成例を示すブロック図である。 FIG. 9 is a block diagram illustrating a functional configuration example of the user terminal 1-1.

ユーザ端末１−１においては、図９に示されるように、通信制御部１１、主制御部１２、入出力制御部１３、音声入力部１４、音声出力部１５、および表示制御部１６が所定のプログラムが実行されることによって実現される。ユーザ端末１−１には、ネットワーク端子２１、マイクロフォン２２、スピーカ２３、およびディスプレイ２４が設けられている。 In the user terminal 1-1, as shown in FIG. 9, a communication control unit 11, a main control unit 12, an input/output control unit 13, a voice input unit 14, a voice output unit 15, and a display control unit 16 are realized by executing a predetermined program. The user terminal 1-1 is also provided with a network terminal 21, a microphone 22, a speaker 23, and a display 24.

通信制御部１１は、ネットワーク端子２１に装着されたケーブルを介してネットワーク２に接続し、ユーザ端末１−２，１−３と通信を行う。通信制御部１１は、ユーザ端末１−２，１−３からRTPパケット、RTCPパケットが送信されてきたとき、それを受信し、主制御部１２に出力する。また、通信制御部１１は、RTPパケット、RTCPパケット（SDESパケット）が主制御部１２から供給されたとき、それをユーザ端末１−２と１−３に送信する。 The communication control unit 11 is connected to the network 2 via a cable attached to the network terminal 21 and communicates with the user terminals 1-2 and 1-3. When the RTP packet and the RTCP packet are transmitted from the user terminals 1-2 and 1-3, the communication control unit 11 receives them and outputs them to the main control unit 12. Further, when the RTP packet and the RTCP packet (SDES packet) are supplied from the main control unit 12, the communication control unit 11 transmits them to the user terminals 1-2 and 1-3.

主制御部１２は、通信制御部１１から供給されたRTPパケットに含まれる音声データと入出力制御部１３から供給された音声データを合成し、合成して得られた音声データを格納するRTPパケットを生成する。主制御部１２は、生成したRTPパケットを通信制御部１１に出力する。また、主制御部１２は、通信制御部１１から供給されたRTPパケットに含まれる音声データを入出力制御部１３に出力する。 The main control unit 12 combines the audio data included in the RTP packet supplied from the communication control unit 11 and the audio data supplied from the input / output control unit 13, and stores the audio data obtained by the synthesis. Is generated. The main control unit 12 outputs the generated RTP packet to the communication control unit 11. Further, the main control unit 12 outputs audio data included in the RTP packet supplied from the communication control unit 11 to the input / output control unit 13.

さらに、主制御部１２は、ユーザ端末１−２，１−３から送信されるRTPパケットの状況と、入出力制御部１３から供給される音声データの状況に基づいて、同時に発言を行っているユーザの構成を監視し、ユーザの構成が変わったことを検出したとき、発言を行っているユーザの識別子が記述される、上述したようなSDESパケットを生成する。主制御部１２は、生成したSDESパケットを通信制御部１１に出力する。なお、だれが発言を行い、だれが発言を行っていないのかを表す情報は主制御部１２から入出力制御部１３にも出力される。 Further, the main control unit 12 monitors the set of users speaking simultaneously, based on the state of the RTP packets transmitted from the user terminals 1-2 and 1-3 and the state of the voice data supplied from the input/output control unit 13. When it detects that the set has changed, it generates an SDES packet as described above, in which the identifiers of the users who are speaking are described, and outputs the generated SDES packet to the communication control unit 11. Information indicating who is speaking and who is not is also output from the main control unit 12 to the input/output control unit 13.

入出力制御部１３は、音声入力部１４により取り込まれた音声データを主制御部１２に出力する。また、入出力制御部１３は、主制御部１２から供給された音声データを音声出力部１５に出力し、音声を出力させる。 The input / output control unit 13 outputs the audio data captured by the audio input unit 14 to the main control unit 12. In addition, the input / output control unit 13 outputs the audio data supplied from the main control unit 12 to the audio output unit 15 to output audio.

さらに、入出力制御部１３は、主制御部１２から供給された、だれが発言を行い、だれが発言を行っていないのかを表す情報を表示制御部１６に出力する。 Furthermore, the input / output control unit 13 outputs information supplied from the main control unit 12 to the display control unit 16 indicating who is speaking and who is not speaking.

音声入力部１４は、マイクロフォン２２において取り込まれた音声信号にA/D(Analog/Digital)変換処理を施し、得られた音声データを入出力制御部１３に出力する。 The voice input unit 14 performs A/D (Analog/Digital) conversion on the audio signal captured by the microphone 22 and outputs the resulting voice data to the input/output control unit 13.

音声出力部１５は、入出力制御部１３から供給された音声データに対してD/A変換処理、増幅処理を施し、所定の音量に調整した後、音声をスピーカ２３から出力させる。 The audio output unit 15 performs D / A conversion processing and amplification processing on the audio data supplied from the input / output control unit 13, adjusts the sound volume to a predetermined level, and then outputs audio from the speaker 23.

表示制御部１６は、入出力制御部１３から供給された情報に基づいて、発言を行っているユーザの情報を、発言を行っていないユーザの情報と異なる形式でディスプレイ２４に表示させる。 Based on the information supplied from the input / output control unit 13, the display control unit 16 causes the display 24 to display the information of the user who is speaking, in a format different from the information of the user who is not speaking.

以上のような構成と同じ構成をユーザ端末１−２と１−３も有している。以下、適宜、図９に示されるユーザ端末１−１の構成を、ユーザ端末１−２や１−３の構成として引用して説明する。 The user terminals 1-2 and 1-3 have the same configuration as described above. In the following, the configuration of the user terminal 1-1 shown in FIG. 9 is also cited, where appropriate, as the configuration of the user terminals 1-2 and 1-3.

図１０は、図９の主制御部１２の構成例を示すブロック図である。 FIG. 10 is a block diagram illustrating a configuration example of the main control unit 12 of FIG. 9.

カンファレンスの主催者の端末であるユーザ端末１−１の主制御部１２は、図１０に示されるように、パケット解析部３１、音声合成部３２、およびパケット生成部３３から構成される。 As shown in FIG. 10, the main control unit 12 of the user terminal 1-1 that is the conference organizer's terminal includes a packet analysis unit 31, a voice synthesis unit 32, and a packet generation unit 33.

パケット解析部３１は、通信制御部１１から供給されたRTPパケットを解析し、RTPパケットに格納される音声データを入出力制御部１３と音声合成部３２に出力する。入出力制御部１３に出力された音声データは、スピーカ２３から音声を出力するために用いられる。 The packet analysis unit 31 analyzes the RTP packet supplied from the communication control unit 11 and outputs voice data stored in the RTP packet to the input / output control unit 13 and the voice synthesis unit 32. The sound data output to the input / output control unit 13 is used to output sound from the speaker 23.

また、パケット解析部３１は、例えば、RTPパケットに記述されるCSRCとマーカビットの値、および、入出力制御部１３から供給されたユーザＡの音声のデータに基づいて、発言を行っているユーザを特定する。RTPパケットに記述されるCSRCとマーカビットの値から、ユーザＢとユーザＣのそれぞれが発言を行っているか否かが特定され、入出力制御部１３から供給されたユーザＡの音声のデータから、ユーザＡが発言を行っているか否かが特定される。 The packet analysis unit 31 also identifies the users who are speaking based on, for example, the CSRC and marker bit values described in the RTP packets and the voice data of user A supplied from the input/output control unit 13. Whether each of users B and C is speaking is determined from the CSRC and marker bit values described in the RTP packets, and whether user A is speaking is determined from user A's voice data supplied from the input/output control unit 13.

パケット解析部３１は、発言を行っているユーザを特定し、同時に発言を行っているユーザの構成が変わったことを検出したとき、いま発言を行っているユーザの情報をパケット生成部３３に出力する。だれが発言を行い、だれが発言を行っていないのかを表す情報はパケット解析部３１から入出力制御部１３にも出力される。 When the packet analysis unit 31 identifies the users who are speaking and detects that the set of users speaking simultaneously has changed, it outputs the information of the users who are currently speaking to the packet generation unit 33. Information indicating who is speaking and who is not is also output from the packet analysis unit 31 to the input/output control unit 13.

また、パケット解析部３１は、通信制御部１１において受信されたデータを解析し、カンファレンスに新たに参加するユーザや、カンファレンスから退場するユーザを検出し、検出したそれらのユーザの情報をパケット生成部３３に出力する。 The packet analysis unit 31 also analyzes the data received by the communication control unit 11, detects users who newly join the conference and users who leave it, and outputs the information of those users to the packet generation unit 33.

音声合成部３２は、パケット解析部３１から供給された音声データと入出力制御部１３から供給された音声データを合成し、合成して得られた音声データをパケット生成部３３に出力する。 The voice synthesis unit 32 synthesizes the voice data supplied from the packet analysis unit 31 and the voice data supplied from the input / output control unit 13 and outputs the voice data obtained by the synthesis to the packet generation unit 33.

パケット生成部３３は、音声合成部３２から供給された音声データを格納するRTPパケットを生成し、生成したRTPパケットを通信制御部１１に出力する。 The packet generator 33 generates an RTP packet that stores the voice data supplied from the voice synthesizer 32, and outputs the generated RTP packet to the communication controller 11.

また、パケット生成部３３は、同時に発言を行うユーザの構成が変わったことが検出され、ユーザの情報がパケット解析部３１から供給されたとき、発言を行っているユーザの識別子がSDES ITEMのCNAMEとして記述されるSDESパケットを生成し、生成したSDESパケットを通信制御部１１に出力する。 When a change in the set of users speaking simultaneously is detected and the user information is supplied from the packet analysis unit 31, the packet generation unit 33 generates an SDES packet in which the identifiers of the users who are speaking are described as the CNAME of the SDES ITEMs, and outputs the generated SDES packet to the communication control unit 11.

The packet generation unit 33 manages a user list in which information on the users participating in the conference is registered. When information on a user newly joining or leaving the conference is supplied from the packet analysis unit 31, it updates the user list accordingly. The user list holds, among other things, the identifiers of the participating users, and is referred to, for example, when the CSRC is written into the RTP header of an RTP packet transmitted from the user terminal 1-1 or into an SDES packet.

FIG. 11 is a block diagram showing another configuration example of the main control unit 12 of FIG. 9. It shows the configuration of the main control unit 12 of a conference participant's terminal, for example the user terminal 1-2.

As shown in FIG. 11, the main control unit 12 of the user terminal 1-2 comprises a packet analysis unit 41 and a packet generation unit 42.

The packet analysis unit 41 acquires the RTP packets transmitted from the user terminal 1-1 and received by the communication control unit 11 of the user terminal 1-2, and outputs the voice data stored in them to the input/output control unit 13. Sound is output from the speaker 23 of the user terminal 1-2 on the basis of the voice data output to the input/output control unit 13.

The packet analysis unit 41 also determines which users are speaking and which are not, based on the CSRC of the RTP packets and SDES packets received by the communication control unit 11 of the user terminal 1-2 and on the identifiers described as CNAME in the SDES ITEM of the SDES packets. It outputs the information on the identified users to the input/output control unit 13 and causes the display 24 of the user terminal 1-2 to show the information on speaking users in a format different from that of non-speaking users.
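The identification performed by the packet analysis unit 41 amounts to splitting the participant list (taken from the CSRC) by membership in the set of identifiers carried as CNAME. A minimal sketch, assuming for simplicity that both are already decoded into plain identifier strings; the function name `classify_users` is hypothetical.

```python
def classify_users(participants, speaking_ids):
    """Split participants (from the CSRC) into speaking and non-speaking users.

    speaking_ids is the set of identifiers found as CNAME in the SDES packet.
    Participant order is preserved so the display layout stays stable.
    """
    speaking = [user for user in participants if user in speaking_ids]
    silent = [user for user in participants if user not in speaking_ids]
    return speaking, silent
```

For participants A, B, and C with B and C listed as CNAME, the call returns `(["B", "C"], ["A"])`, which matches the situation shown later in FIG. 14.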

When the voice data of user B captured by the microphone 22 is supplied via the voice input unit 14 and the input/output control unit 13, the packet generation unit 42 generates RTP packets storing the voice data and outputs them to the communication control unit 11. The RTP packets output to the communication control unit 11 are transmitted to the user terminal 1-1.

Next, the operation of the user terminals configured as described above will be explained with reference to flowcharts.

First, the processing of the user terminal 1-1, which manages the conference, will be described with reference to the flowchart of FIG. 12.

In step S31, the packet analysis unit 31 of the user terminal 1-1 determines, based on the analysis of the data received by the communication control unit 11, whether any user has left the conference. If it determines that no user has left, the process proceeds to step S32.

In step S32, the packet analysis unit 31 then determines whether any user has joined the conference.

If the packet analysis unit 31 determines in step S32 that a user has joined the conference, it outputs information on the newly joined user to the packet generation unit 33, and the process proceeds to step S33.

In step S33, the packet generation unit 33 adds the identifier of the newly joined user to the user list based on the information supplied from the packet analysis unit 31. When the set of users participating in the conference changes, information in the SDES packet other than the CSRC is also rewritten as appropriate.

If, on the other hand, the packet analysis unit 31 determines in step S32 that no user has joined the conference either, step S33 is skipped. After it is determined that no user has joined, or after the identifier of the newly joined user has been registered in the user list in step S33, the process proceeds to step S35.

If the packet analysis unit 31 determines in step S31, based on the analysis of the data received by the communication control unit 11, that a user has left the conference, it outputs information on that user to the packet generation unit 33, and the process proceeds to step S34.

In step S34, the packet generation unit 33 deletes the identifier of the user who has left the conference from the user list based on the information supplied from the packet analysis unit 31. The process then proceeds to step S35.
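Steps S31 to S34 can be sketched as a single update of the user list. The sketch below follows the branch order of the flowchart (departures are checked first, and arrivals only when nobody has left) and returns a flag that later steps could use to decide whether the CSRC must be rewritten. The function name `update_user_list` and the use of plain strings as identifiers are assumptions made for illustration.

```python
def update_user_list(user_list, left, joined):
    """Apply one pass of steps S31 to S34 in place; return True if membership changed."""
    if left:  # S31: at least one user has left the conference
        before = len(user_list)
        user_list[:] = [u for u in user_list if u not in left]  # S34: delete identifiers
        return len(user_list) != before
    changed = False
    for user in joined:  # S32: at least one user has joined
        if user not in user_list:
            user_list.append(user)  # S33: register the identifier
            changed = True
    return changed
```

Starting from `["A", "B"]`, a pass with `joined=["C"]` produces `["A", "B", "C"]` and returns `True`; a pass with no departures and no arrivals leaves the list untouched and returns `False`.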

In step S35, the packet analysis unit 31 identifies which users are speaking, based on, for example, the CSRC and marker bit values described in the RTP packets transmitted from the user terminals 1-2 and 1-3 and received by the communication control unit 11, and on the voice data of user A supplied from the input/output control unit 13, and determines whether any user has finished speaking. RTP packets and the like are transmitted from the user terminals 1-2 and 1-3 at predetermined timings, and the voice synthesis unit 32 synthesizes the voice data stored in them.

If the packet analysis unit 31 determines in step S35 that no user has finished speaking, the process proceeds to step S36, where it determines whether any user has started speaking.

If the packet analysis unit 31 determines in step S36 that a user has started speaking, the process proceeds to step S37, where it determines whether a change in the set of conference participants has been confirmed, that is, whether it was determined in step S31 that a user had left the conference or in step S32 that a user had joined.

If the packet analysis unit 31 determines in step S37 that a change in the set of conference participants has been confirmed, the process proceeds to step S38.

In step S38, the packet generation unit 33 refers to the user list and rewrites the CSRC in the RTP header of the RTP packet to be transmitted and the CSRC of the RTCP packet.

If, on the other hand, the packet analysis unit 31 determines in step S37 that no change in the set of conference participants has been confirmed, because neither a departure was detected in step S31 nor an arrival in step S32, step S38 is skipped and the process proceeds to step S39. In this case, the same CSRC as in the RTP and RTCP packets transmitted so far is written in the RTP header of the RTP packet and in the RTCP packet.
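For orientation, the CSRC that step S38 rewrites is a list of 32-bit identifiers appended to the fixed 12-byte RTP header. The following is a minimal sketch assuming the standard RFC 3550 header layout with no padding and no header extension; the function name `build_rtp_header` and its default payload type are hypothetical choices, not values taken from the specification.

```python
import struct


def build_rtp_header(seq, timestamp, ssrc, csrcs, payload_type=0, marker=False):
    """Pack a fixed RTP header followed by the CSRC list (RFC 3550 layout)."""
    if len(csrcs) > 15:
        raise ValueError("the CSRC count field is only 4 bits wide")
    byte0 = (2 << 6) | len(csrcs)                      # V=2, P=0, X=0, CC
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)  # M bit and payload type
    header = struct.pack(
        "!BBHII", byte0, byte1, seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc
    )
    return header + b"".join(struct.pack("!I", c) for c in csrcs)
```

When the participant set is unchanged, the same `csrcs` list would simply be passed again, which corresponds to skipping step S38.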

In step S39, the packet generation unit 33 adds the identifier of the user whose speech was determined in step S36 to have started to the SDES ITEM of the SDES packet.

In step S40, the packet generation unit 33 attaches the RTP header whose CSRC was rewritten in step S38 to the voice data supplied from the voice synthesis unit 32, and transmits the resulting RTP packet from the communication control unit 11 to the user terminals 1-2 and 1-3.

Also in step S40, the packet generation unit 33 transmits the SDES packet, whose CSRC was rewritten in step S38 and to which the user identifier was added in step S39, from the communication control unit 11 to the user terminals 1-2 and 1-3. As a result, an SDES packet is transmitted from the user terminal 1-1 at the moment the set of simultaneously speaking users changes because a user has started speaking.

In the user terminals 1-2 and 1-3, sound is output based on the voice data stored in the RTP packets transmitted here, and, based on the user identifiers described in the RTCP packet, information on the speaking users and information on the non-speaking users are displayed in different formats.

If it is determined in step S36 that no speech has started either, steps S37 to S40 are skipped.

If, on the other hand, the packet analysis unit 31 determines in step S35 that a user has finished speaking, the process proceeds to step S41, where it determines whether a change in the set of conference participants has been confirmed, that is, whether it was determined in step S31 that a user had left the conference or in step S32 that a user had joined.

If the packet analysis unit 31 determines in step S41 that a change in the set of conference participants has been confirmed, the process proceeds to step S42.

In step S42, the packet generation unit 33 refers to the user list and rewrites the CSRC in the RTP header of the RTP packet to be transmitted and the CSRC of the RTCP packet.

If, on the other hand, the packet analysis unit 31 determines in step S41 that no change in the set of conference participants has been confirmed, because neither a departure was detected in step S31 nor an arrival in step S32, step S42 is skipped.

In step S43, the packet generation unit 33 deletes from the SDES packet the identifier of the user whose speech was determined in step S35 to have ended.

The SDES packet from which the identifier has been deleted is transmitted to the user terminals 1-2 and 1-3 in step S40. As a result, an SDES packet is transmitted from the user terminal 1-1 at the moment the set of simultaneously speaking users changes because a user has finished speaking.

After the RTP packet and the SDES packet have been transmitted, the packet analysis unit 31 determines in step S44 whether any users remain in the conference.

If the packet analysis unit 31 determines in step S44 that users remain in the conference, the process returns to step S31 and the processing described above is repeated. If it determines that no users remain because all of them have left, the processing ends.
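The speech-related branch of FIG. 12 (steps S35, S36, S39, and S43) can be summarized as maintaining a set of speaking users and signalling whenever that set changes. The sketch below mirrors the flowchart order, checking ended speech before started speech, and reports whether an SDES packet would need to be sent. The function name `update_speakers` is hypothetical, and comparing the old and new sets is an illustrative simplification rather than the exact test used in the flowchart.

```python
def update_speakers(speaking, ended, started):
    """Return (new_speaking_set, send_sdes) for one pass of steps S35 to S43."""
    if ended:      # S35: someone has finished speaking
        updated = speaking - ended      # S43: delete their identifiers
    elif started:  # S36: someone has started speaking
        updated = speaking | started    # S39: add their identifiers
    else:
        updated = set(speaking)         # no change in this pass
    return updated, updated != speaking
```

For example, if B is speaking and C starts, the result is `({"B", "C"}, True)`; a pass with neither an end nor a start returns the unchanged set together with `False`, so no SDES packet would be sent.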

Next, the processing of the user terminal 1-2, which controls the display of user information, will be described with reference to the flowchart of FIG. 13. The same processing as that shown in FIG. 13 is also performed in the user terminal 1-3.

In step S51, the packet analysis unit 41 of the user terminal 1-2 determines whether an RTP packet and an SDES packet transmitted from the user terminal 1-1 have been received by the communication control unit 11. If it determines that they have been received, the process proceeds to step S52.

In step S52, the packet analysis unit 41 obtains, from the data received by the communication control unit 11, for example the CSRC described in the RTP header of the RTP packet and the CNAME described in the SDES packet as an SDES ITEM.

In step S53, the packet analysis unit 41 determines whether the CSRC and CNAME obtained in step S52 differ in content from the stored CSRC and CNAME. The packet analysis unit 41 stores the CSRC described in the RTP packets transmitted from the user terminal 1-1 and the CNAME described in the SDES packets as SDES ITEM, and this stored data is rewritten, for example, each time an RTP packet and an RTCP packet are newly received and their content has changed.

When the set of conference participants changes, an RTP packet with a rewritten CSRC is transmitted, so the content of the CSRC is then determined to have changed. Likewise, when the set of simultaneously speaking users changes, an SDES packet with a CNAME added or deleted is transmitted, so the content of the CNAME is then determined to have changed.

If the packet analysis unit 41 determines in step S53 that the data has changed, the process proceeds to step S54. At this point, information indicating which users are speaking and which are not, identified for example from the CSRC of the RTP packet and the CNAME described in the SDES packet, is output to the input/output control unit 13.

In step S54, the input/output control unit 13 causes the display 24 to show the information on speaking users in a format different from that of non-speaking users, based on the information supplied from the packet analysis unit 41.

FIG. 14 is a diagram showing an example of a screen displayed on the display 24 of the user terminal 1-2.

In the example of FIG. 14, tabs 51 to 53 are displayed along the lower edge of the display 24. Tab 51 shows "A-san", the name of the user of the user terminal 1-1, tab 52 shows "B-san", the name of the user of the user terminal 1-2, and tab 53 shows "C-san", the name of the user of the user terminal 1-3.

Also in the example of FIG. 14, of the tabs 51 to 53, tabs 52 and 53 are displayed in a color different from that of tab 51. The hatching on tabs 52 and 53 indicates that they are displayed in a different color from the unhatched tab 51.

That is, FIG. 14 shows an example in which user B, the user of the user terminal 1-2, and user C, the user of the user terminal 1-3, are speaking.

From this screen display, user B can confirm that the sound output from the speaker 23 consists of his or her own voice and that of user C. While the screen of FIG. 14 is displayed, the voice output unit 15 of the user terminal 1-2 outputs sound from the speaker 23 based on the voice data stored in the RTP packets acquired by the packet analysis unit 41.

Instead of displaying the information on all conference participants, with speaking and non-speaking users shown in different formats as in FIG. 14, it is also possible to display only the information on the speaking users identified from the CNAME described in the SDES packet.

Returning to the description of FIG. 13, in step S55 the packet analysis unit 41 stores the CSRC and CNAME obtained in step S52, and the process proceeds to step S56. The process also proceeds to step S56 if it is determined in step S51 that no RTP packet and SDES packet have been received, or in step S53 that the content of the CSRC and CNAME has not changed.

In step S56, the packet analysis unit 41 determines whether to end participation in the conference.

If the packet analysis unit 41 determines in step S56 that participation in the conference is not to end, the process returns to step S51 and the processing described above is repeated. If it determines that participation is to end, the processing ends. For example, participation is determined to end when user B gives an instruction to leave the conference.
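The receive-side loop of FIG. 13 is essentially change detection against stored state. A minimal sketch follows, assuming that the CSRC and CNAME contents have already been decoded into user identifiers; the class name `ParticipantView` and the method `on_packets` are hypothetical. The method returns the data to redraw only when the received content differs from what was stored.

```python
class ParticipantView:
    """Keeps the last CSRC/CNAME content and reports when the display must change."""

    def __init__(self):
        self._stored = None  # nothing stored before the first packets arrive

    def on_packets(self, csrc_users, cname_users):
        """Return [(user, is_speaking), ...] if the content changed, else None."""
        snapshot = (tuple(csrc_users), frozenset(cname_users))  # S52: extract fields
        if snapshot == self._stored:                            # S53: no change
            return None
        self._stored = snapshot                                 # S55: store new content
        participants, speaking = snapshot
        return [(user, user in speaking) for user in participants]  # data for S54
```

A caller would redraw the tabs only when the return value is not `None`, which avoids refreshing the display for every received RTP packet.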

Through the above processing, user B, a conference participant, can easily see at a glance from the screen display which users are speaking and which are not.

FIG. 15 is a block diagram showing another configuration example of the user terminal. Components identical to those in FIG. 9 are given the same reference numerals.

In the example of FIG. 15, a configuration equivalent to that of the user terminal 1-1 of FIG. 9 is realized by two devices, a device 51 and a device 52. The device 51 implements the communication control unit 11 and the main control unit 12, and the device 52 implements the input/output control unit 13, the voice input unit 14, the voice output unit 15, and the display control unit 16.

The main control unit 12 of the device 51 and the input/output control unit 13 of the device 52 are connected via a USB (Universal Serial Bus) cable or the like. The devices 51 and 52 configured in this way may perform the same processing as that described above for the user terminal 1-1 having the configuration of FIG. 9, or the same processing as that described above for the user terminal 1-2.

In the example of FIG. 15, the device 52 is provided with an LED (Light Emitting Diode) light-emitting unit 53 in place of the display 24 of FIG. 9. The LED light-emitting unit 53 has LEDs 53A to 53C arranged side by side. Light from the LED 53A indicates that user A, to whom the LED 53A is assigned, is speaking, and light from the LED 53B indicates that user B, to whom the LED 53B is assigned, is speaking. Light from the LED 53C indicates that the user of the user terminal 1-3, to whom the LED 53C is assigned, is speaking.

In the example of FIG. 15, the LED 53B and the LED 53C are lit, indicating that user B and user C are speaking at the same time.

In this way, speaking and non-speaking users can also be indicated by the lighting of LEDs rather than by a screen display.
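The LED variant reduces to a lookup from each LED to its assigned user. A minimal sketch, assuming the assignment is held as a dictionary keyed by LED label; the function name `led_states` and the string labels are hypothetical.

```python
def led_states(led_assignment, speaking):
    """Map each LED label to True (lit) if its assigned user is speaking."""
    return {led: user in speaking for led, user in led_assignment.items()}
```

With LEDs 53A, 53B, and 53C assigned to users A, B, and C and with B and C speaking, the call `led_states({"53A": "A", "53B": "B", "53C": "C"}, {"B", "C"})` marks 53B and 53C as lit, as in the example of FIG. 15.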

The series of processes described above can be executed by hardware or by software. When the series of processes is executed by software, the program constituting the software is installed from a program recording medium onto a computer built into dedicated hardware, or onto, for example, a general-purpose personal computer that can execute various functions when various programs are installed.

FIG. 16 is a block diagram showing a configuration example of a personal computer that executes the series of processes described above by means of a program.

A CPU (Central Processing Unit) 101 executes various processes according to programs stored in a ROM (Read Only Memory) 102 or a storage unit 108. A RAM (Random Access Memory) 103 stores, as appropriate, the programs executed by the CPU 101 and associated data. The CPU 101, the ROM 102, and the RAM 103 are interconnected by a bus 104.

An input/output interface 105 is also connected to the CPU 101 via the bus 104. Connected to the input/output interface 105 are an input unit 106 comprising a keyboard, a mouse, a microphone, and the like, and an output unit 107 comprising a display, a speaker, and the like. The CPU 101 executes various processes in response to commands input from the input unit 106 and outputs the results to the output unit 107.

The storage unit 108 connected to the input/output interface 105 consists of, for example, a hard disk, and stores the programs executed by the CPU 101 and various data. A communication unit 109 communicates with external devices via the network 2.

When a removable medium 111 such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory is loaded, a drive 110 connected to the input/output interface 105 drives it and obtains the programs and data recorded on it. The obtained programs and data are transferred to and stored in the storage unit 108 as necessary.

As shown in FIG. 16, the program recording medium that stores the program to be installed on the computer and made executable by it consists of the removable medium 111, which is a package medium such as a magnetic disk (including a flexible disk), an optical disc (including a CD-ROM (Compact Disc-Read Only Memory) and a DVD (Digital Versatile Disc)), a magneto-optical disk, or a semiconductor memory; the ROM 102, in which the program is stored temporarily or permanently; the hard disk constituting the storage unit 108; or the like. The program is stored on the program recording medium, as necessary, via the communication unit 109, which is an interface such as a router or a modem, using a wired or wireless communication medium such as a local area network, the Internet, or digital satellite broadcasting.

In this specification, the steps describing a program include not only processes performed in time series in the described order but also processes executed in parallel or individually without necessarily being processed in time series.

In this specification, a system refers to an entire apparatus made up of a plurality of devices.

FIG. 1 is a diagram showing a configuration example of a communication system according to an embodiment of the present invention.
FIG. 2 is a diagram explaining the flow of data transmission and reception among the user terminals 1-1 to 1-3 of FIG. 1.
FIG. 3 is a diagram showing an example of the data structure of an RTP packet.
FIG. 4 is a diagram showing an example of the data structure of an RTCP packet.
FIG. 5 is a diagram showing an example of the data described in an SDES packet.
FIG. 6 is a diagram showing an example of the SDES ITEM.
FIG. 7 is a diagram showing the Item Identifier and Item Description of FIG. 6.
FIG. 8 is a diagram showing an example of the identifiers described in an SDES packet.
FIG. 9 is a block diagram showing a functional configuration example of the user terminal 1-1.
FIG. 10 is a block diagram showing a configuration example of the main control unit of FIG. 9.
FIG. 11 is a block diagram showing another configuration example of the main control unit of FIG. 9.
FIG. 12 is a flowchart explaining the conference management processing of the user terminal 1-1.
FIG. 13 is a flowchart explaining the display control processing of the user terminal 1-2.
FIG. 14 is a diagram showing an example of a screen displayed on the display.
FIG. 15 is a block diagram showing another configuration example of the user terminal 1-1.
FIG. 16 is a block diagram showing the configuration of a personal computer.

Explanation of symbols

1-1 to 1-3 user terminals, 2 network, 3 server, 11 communication control unit, 12 main control unit, 13 input/output control unit, 14 voice input unit, 15 voice output unit, 16 display control unit, 21 network terminal, 22 microphone, 23 speaker, 24 display, 31 packet analysis unit, 32 voice synthesis unit, 33 packet generation unit, 41 packet analysis unit, 42 packet generation unit

Claims

A communication system comprising an information management apparatus and a plurality of information processing apparatuses connected via a network, wherein
the information management apparatus includes:
synthesizing means for synthesizing voices acquired and transmitted by predetermined ones of the information processing apparatuses and transmitting the synthesized voice to the plurality of information processing apparatuses for output, thereby enabling a conversation among the users of the plurality of information processing apparatuses;
detecting means for detecting the start or end of a user's speech based on the state of the voices acquired by the plurality of information processing apparatuses; and
generating means for generating, each time the detecting means detects the start or end of a user's speech and the composition of the users speaking at the same time changes, control information including first identification information identifying the users of the plurality of information processing apparatuses and second identification information identifying the users who are speaking, and for transmitting the control information to the plurality of information processing apparatuses, and
each of the plurality of information processing apparatuses includes:
acquisition means for acquiring the control information transmitted from the information management apparatus; and
display control means for identifying, each time the control information is acquired by the acquisition means, the users who are speaking and the users who are not speaking based on the control information, and for displaying information on the speaking users and information on the non-speaking users in different forms.
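By way of illustration only, the detecting means recited above could be sketched as follows. The claim requires only that the start or end of speech be detected from the state of the acquired voice; the frame-energy threshold, the hangover count, and the name `SpeechDetector` are assumptions introduced for this sketch and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass
class SpeechDetector:
    """Tracks one user's speaking state from frame energy (hypothetical scheme)."""
    threshold: float = 500.0  # assumed RMS level above which a frame counts as speech
    hangover: int = 3         # assumed number of silent frames tolerated before "end"
    speaking: bool = False
    _silent: int = 0          # consecutive silent frames seen while speaking

    def feed(self, samples):
        """Consume one frame of PCM samples; return 'start', 'end', or None."""
        if samples:
            rms = (sum(s * s for s in samples) / len(samples)) ** 0.5
        else:
            rms = 0.0
        if rms >= self.threshold:
            self._silent = 0
            if not self.speaking:
                self.speaking = True
                return "start"
            return None
        if self.speaking:
            self._silent += 1
            if self._silent >= self.hangover:
                self.speaking = False
                self._silent = 0
                return "end"
        return None
```

One detector instance per participant would be fed that participant's decoded voice frames; the hangover keeps short pauses between words from being reported as the end of speech.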

An information management apparatus connected to a plurality of information processing apparatuses via a network, comprising:
synthesizing means for synthesizing voices acquired and transmitted by predetermined ones of the information processing apparatuses and transmitting the synthesized voice to the plurality of information processing apparatuses for output, thereby enabling a conversation among the users of the plurality of information processing apparatuses;
detecting means for detecting the start or end of a user's speech based on the state of the voices acquired by the plurality of information processing apparatuses; and
generating means for generating, each time the detecting means detects the start or end of a user's speech and the composition of the users speaking at the same time changes, control information including first identification information identifying the users of the plurality of information processing apparatuses and second identification information identifying the users who are speaking, and for transmitting the control information to the plurality of information processing apparatuses.
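As a non-limiting sketch, the generating means could be modeled as a tracker that emits control information only when a detected start or end actually changes the set of simultaneous speakers. The class name `SpeakerSetTracker`, the event strings, and the dictionary layout of the control information are hypothetical and chosen for illustration.

```python
from typing import Optional


class SpeakerSetTracker:
    """Emits control information only when the set of simultaneous speakers changes."""

    def __init__(self, participants):
        self.participants = tuple(participants)  # all conference users (first identification)
        self.speaking = frozenset()               # users currently speaking (second identification)

    def on_event(self, user_id, event) -> Optional[dict]:
        """Apply a 'start' or 'end' event; return control information or None."""
        if event == "start":
            updated = self.speaking | {user_id}
        elif event == "end":
            updated = self.speaking - {user_id}
        else:
            raise ValueError(f"unknown event: {event!r}")
        if updated == self.speaking:
            return None  # composition unchanged, so nothing is transmitted
        self.speaking = updated
        return {
            "participants": list(self.participants),
            "speaking": sorted(updated),
        }
```

Returning `None` for events that leave the set unchanged reflects the condition in the claim that control information is generated when the composition of the speaking users changes, rather than on every detection.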

The information management apparatus according to claim 2, wherein the control information is an RTCP packet, and the first identification information is a CSRC described in the RTCP packet.

The information management apparatus according to claim 3, wherein the second identification information is information described as an SDES ITEM in an RTCP packet of the SDES type.
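For illustration, an RTCP packet of the SDES type carrying one chunk per CSRC could be assembled as sketched below, following the general SDES layout of RFC 3550 (packet type 202, a chunk per source, items of type/length/text, null-terminated and padded to a 32-bit boundary). Which SDES ITEM carries the speaking state is not stated in these claims; using a NOTE item with the text "speaking" is an assumption made for this sketch, as are the function names.

```python
import struct

RTCP_VERSION = 2
PT_SDES = 202    # RTCP packet type for source description (RFC 3550)
ITEM_CNAME = 1   # canonical name item
ITEM_NOTE = 7    # NOTE item; used here as a hypothetical "speaking" marker


def _item(item_type, text):
    """Encode one SDES item as type (8 bits), length (8 bits), text."""
    data = text.encode("utf-8")
    if len(data) > 255:
        raise ValueError("SDES item text is limited to 255 octets")
    return struct.pack("!BB", item_type, len(data)) + data


def build_sdes(chunks):
    """Build an SDES packet from (csrc, cname, speaking) tuples."""
    if len(chunks) > 31:
        raise ValueError("the source count field is only 5 bits wide")
    body = b""
    for csrc, cname, speaking in chunks:
        chunk = struct.pack("!I", csrc) + _item(ITEM_CNAME, cname)
        if speaking:
            chunk += _item(ITEM_NOTE, "speaking")  # assumed marker, not from the claims
        chunk += b"\x00"                       # item type 0 ends the item list
        chunk += b"\x00" * (-len(chunk) % 4)   # pad the chunk to a 32-bit boundary
        body += chunk
    first = (RTCP_VERSION << 6) | len(chunks)  # V=2, P=0, SC=number of chunks
    length_words = (4 + len(body)) // 4 - 1    # length in 32-bit words minus one
    return struct.pack("!BBH", first, PT_SDES, length_words) + body
```

A receiver would walk the chunks in the same order, reading each CSRC and checking whether the assumed marker item is present for that source.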

An information processing method for an information management apparatus connected to a plurality of information processing apparatuses via a network, the method comprising the steps of:
synthesizing voices acquired and transmitted by predetermined ones of the information processing apparatuses and transmitting the synthesized voice to the plurality of information processing apparatuses for output, thereby enabling a conversation among the users of the plurality of information processing apparatuses;
detecting the start or end of a user's speech based on the state of the voices acquired by the plurality of information processing apparatuses; and
generating, each time the start or end of a user's speech is detected and the composition of the users speaking at the same time changes, control information including first identification information identifying the users of the plurality of information processing apparatuses and second identification information identifying the users who are speaking, and transmitting the control information to the plurality of information processing apparatuses.

A program for causing a computer to execute information processing of an information management apparatus connected to a plurality of information processing apparatuses via a network, the program comprising the steps of:
synthesizing voices acquired and transmitted by predetermined ones of the information processing apparatuses and transmitting the synthesized voice to the plurality of information processing apparatuses for output, thereby enabling a conversation among the users of the plurality of information processing apparatuses;
detecting the start or end of a user's speech based on the state of the voices acquired by the plurality of information processing apparatuses; and
generating, each time the start or end of a user's speech is detected and the composition of the users speaking at the same time changes, control information including first identification information identifying the users of the plurality of information processing apparatuses and second identification information identifying the users who are speaking, and transmitting the control information to the plurality of information processing apparatuses.

An information processing apparatus connected via a network, together with other information processing apparatuses, to an information management apparatus that includes: synthesizing means for synthesizing voices acquired and transmitted by predetermined information processing apparatuses and transmitting the synthesized voice to a plurality of the information processing apparatuses for output, thereby enabling a conversation among the users of the plurality of information processing apparatuses; detecting means for detecting the start or end of a user's speech based on the state of the voices acquired by the plurality of information processing apparatuses; and generating means for generating, each time the detecting means detects the start or end of a user's speech and the composition of the users speaking at the same time changes, control information including first identification information identifying the users of the plurality of information processing apparatuses and second identification information identifying the users who are speaking, and for transmitting the control information to the plurality of information processing apparatuses, the information processing apparatus comprising:
acquisition means for acquiring the control information transmitted from the information management apparatus; and
display control means for identifying, each time the control information is acquired by the acquisition means, the users who are speaking and the users who are not speaking based on the control information, and for displaying information on the speaking users and information on the non-speaking users in different forms.
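Purely as an example, the display control means could partition the participants by the received control information and render the two groups differently. The sketch assumes the control information has already been parsed into a dictionary with a `participants` list and a `speaking` list of user identifiers; that layout, the function name `render_participants`, and the text-based formatting are all hypothetical.

```python
def render_participants(control, names):
    """Return one display line per participant, formatted by speaking state."""
    speaking = set(control["speaking"])
    lines = []
    for uid in control["participants"]:
        label = names.get(uid, f"user {uid:#x}")  # fall back to the raw identifier
        if uid in speaking:
            lines.append(f"* {label.upper()} (speaking)")  # emphasized form
        else:
            lines.append(f"  {label}")                     # plain form
    return lines
```

Calling the function again each time new control information arrives keeps the display in step with the current set of speakers; an actual terminal might vary color or icon rather than text.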

An information processing method for an information processing apparatus connected via a network, together with other information processing apparatuses, to an information management apparatus that includes: synthesizing means for synthesizing voices acquired and transmitted by predetermined information processing apparatuses and transmitting the synthesized voice to a plurality of the information processing apparatuses for output, thereby enabling a conversation among the users of the plurality of information processing apparatuses; detecting means for detecting the start or end of a user's speech based on the state of the voices acquired by the plurality of information processing apparatuses; and generating means for generating, each time the detecting means detects the start or end of a user's speech and the composition of the users speaking at the same time changes, control information including first identification information identifying the users of the plurality of information processing apparatuses and second identification information identifying the users who are speaking, and for transmitting the control information to the plurality of information processing apparatuses, the method comprising the steps of:
acquiring the control information transmitted from the information management apparatus; and
identifying, each time the control information is acquired, the users who are speaking and the users who are not speaking based on the control information, and displaying information on the speaking users and information on the non-speaking users in different forms.

A program for causing a computer to execute information processing of an information processing apparatus connected via a network, together with other information processing apparatuses, to an information management apparatus that includes: synthesizing means for synthesizing voices acquired and transmitted by predetermined information processing apparatuses and transmitting the synthesized voice to a plurality of the information processing apparatuses for output, thereby enabling a conversation among the users of the plurality of information processing apparatuses; detecting means for detecting the start or end of a user's speech based on the state of the voices acquired by the plurality of information processing apparatuses; and generating means for generating, each time the detecting means detects the start or end of a user's speech and the composition of the users speaking at the same time changes, control information including first identification information identifying the users of the plurality of information processing apparatuses and second identification information identifying the users who are speaking, and for transmitting the control information to the plurality of information processing apparatuses, the program comprising the steps of:
acquiring the control information transmitted from the information management apparatus; and
identifying, each time the control information is acquired, the users who are speaking and the users who are not speaking based on the control information, and displaying information on the speaking users and information on the non-speaking users in different forms.