JP2011066730A - Communication terminal, communication method, and communication system

Info

Publication number: JP2011066730A
Application number: JP2009216397A
Authority: JP
Inventors: Tomohiro Inagaki; 友大稲垣
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2009-09-18
Filing date: 2009-09-18
Publication date: 2011-03-31

Abstract

PROBLEM TO BE SOLVED: To provide a communication terminal, a communication method, and a communication system that, when users are having a conversation, efficiently secure the network bandwidth usable between the communication terminals of the conversing users without increasing the network bandwidth itself that is usable among the plurality of communication terminals participating in the conference. SOLUTION: When a conversation starts between the users of two communication terminals, the communication terminal detects the start of the conversation. A conversation terminal, which is the communication terminal of a user who has started the conversation, identifies the partner terminal, which is the terminal of the conversation partner (S2). When the conversation starts (S4: YES), the conversation terminal selects a non-conversation terminal that is not taking part in the conversation (S6). The conversation terminal transmits communication data only to the selected non-conversation terminal and the partner terminal (S8, S9). The non-conversation terminal selected by the conversation terminal forwards the communication data to the other non-conversation terminals (S15). COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a communication terminal, a communication method, and a communication system for holding a conference by transmitting and receiving data that includes at least one of audio data and image data among a plurality of communication terminals.

Techniques for transmitting and receiving audio data and image data among a plurality of communication terminals are known, so that a conference can be held even when not all of the users are at the same site. To secure the network bandwidth usable between the communication terminals of users who are having a conversation, the video communication device described in Patent Document 1, for example, uses a different network depending on whether or not a conversation is taking place between users.

Patent Document 1: JP-A-7-222129

The technique described in Patent Document 1 increases the network bandwidth itself that is usable among the plurality of communication terminals participating in the conference. As a result, the network bandwidth available to other devices that are not participating in the conference is squeezed.

An object of the present invention is to provide a communication terminal, a communication method, and a communication system that, when users are having a conversation, can efficiently secure the network bandwidth usable between the communication terminals of the conversing users without increasing the network bandwidth itself that is usable among the plurality of communication terminals participating in the conference.

A communication terminal according to a first aspect of the present invention is a communication terminal that connects to three or more other communication terminals via a network and transmits and receives communication data including at least one of audio data and image data. The communication terminal comprises: first start detection means for detecting that a conversation has started between the user of the own terminal and the user of one of the other communication terminals; first specifying means for specifying, when the first start detection means detects that a conversation has started, the partner terminal, which is the one communication terminal on the partner side of the started conversation; selection means for selecting, when the first specifying means has specified the partner terminal, at least one non-conversation terminal from among the other communication terminals excluding the partner terminal engaged in the conversation; generation means for generating communication data from information including at least one of audio data and image data; first transmission means for transmitting first data, which is communication data generated by the generation means, to the partner terminal specified by the first specifying means; second transmission means for transmitting, to the non-conversation terminal selected by the selection means, second data, which is communication data generated from the same source information as the first data; second start detection means for detecting that a conversation has started between the users of two of the other communication terminals; second specifying means for specifying, when the second start detection means detects that a conversation has started, the two communication terminals engaged in the started conversation; and transfer means for transferring, when second data is received from one of the other communication terminals, the received second data to the communication terminals other than the two communication terminals specified by the second specifying means.

According to the communication terminal of the first aspect, first data is exchanged between the two communication terminals used by the users who are having a conversation (hereinafter, "conversation terminals"). Each conversation terminal transmits second data to at least one of the plurality of non-conversation terminals. A non-conversation terminal that receives the second data transfers it to the other non-conversation terminals. During a conversation, therefore, a conversation terminal can take part in the conference without transmitting communication data to all of the other communication terminals. The network bandwidth is thus used efficiently, and bandwidth for the conversation terminals is secured without increasing the network bandwidth itself that is usable among the communication terminals. As a result, communication data is exchanged stably between the two conversation terminals, so the conference proceeds smoothly. Communication data with a large data volume can also be exchanged between the conversation terminals.
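
The data flow described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name `plan_routes`, the site labels, and the use of exactly one selected non-conversation terminal are assumptions made for the example.

```python
def plan_routes(sender, partner, terminals, relay):
    """Map each transmitting terminal to the terminals it sends to.

    All names here are hypothetical; the source does not define an API.
    """
    non_conversation = [t for t in terminals if t not in (sender, partner)]
    if relay not in non_conversation:
        raise ValueError("the relay must be a non-conversation terminal")
    return {
        # The conversation terminal sends first data to its partner and
        # second data to the selected non-conversation terminal only.
        sender: [partner, relay],
        # The selected terminal forwards the second data to every terminal
        # other than itself and the two conversation terminals.
        relay: [t for t in non_conversation if t != relay],
    }


routes = plan_routes("A", "E", ["A", "B", "C", "D", "E"], relay="B")
print(routes)  # {'A': ['E', 'B'], 'B': ['C', 'D']}
```

With five terminals, the conversation terminal at A sends two streams instead of four; the remaining deliveries are made by the selected terminal.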

Preferably, the first transmission means transmits to the partner terminal first data whose data volume is larger than that of the second data transmitted by the second transmission means. In this case, when operating as a conversation terminal, the communication terminal can use the bandwidth secured with the partner terminal to transmit first data that is larger than the second data. The communication terminal can therefore provide at least one of audio and video to the conversing user at high quality. The communication terminal also transmits second data, which is smaller than the first data, to the non-conversation terminal. The bandwidth usable with the partner terminal is thus secured more reliably, and the first data can be transmitted to the partner terminal smoothly.

The communication terminal may further comprise storage control means for storing the partner terminals specified by the first specifying means in first storage means, in order, as a history. The selection means may comprise first selection means for preferentially selecting, from among the plurality of non-conversation terminals, the non-conversation terminal that was stored in the history most recently. The delay from when a conversation terminal transmits second data until a non-conversation terminal receives it is shorter when the second data is sent directly from the conversation terminal than when it is transferred. In this case, therefore, communication data can be transmitted with minimal delay to the most recently recorded conversation partner, who is generally highly relevant to the current conversation.
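
The history-based selection can be sketched as below. The list layout (oldest entry first), the function name `select_by_history`, and the fallback for candidates that never appear in the history are assumptions for illustration; the passage above does not specify them.

```python
def select_by_history(non_conversation, history):
    """Pick the non-conversation terminal that appears latest in the history.

    `history` lists past partner terminals, oldest first (assumed layout).
    """
    for terminal in reversed(history):
        if terminal in non_conversation:
            return terminal
    # Fallback when no candidate appears in the history: an assumption,
    # since the passage above does not say what happens in that case.
    return non_conversation[0] if non_conversation else None


print(select_by_history(["B", "C", "D"], ["C", "E", "D", "E"]))  # D
```

Here E is the current partner and is therefore not a candidate, so the most recent earlier partner, D, receives the second data directly.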

The selection means preferably comprises at least one of: second selection means for preferentially selecting, from among the plurality of non-conversation terminals, a non-conversation terminal with a short communication data transmission time to and from the own terminal; third selection means for preferentially selecting, from among the plurality of non-conversation terminals, a non-conversation terminal with a large network bandwidth to the own terminal; and fourth selection means for preferentially selecting, from among the plurality of non-conversation terminals, a non-conversation terminal with high data processing capability. In this case, when operating as a conversation terminal, the communication terminal can deliver the second data to the non-conversation terminals that were not selected by the selection means with little transmission delay, or with a low likelihood of problems during transmission.
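
The three criteria can be compared side by side in a short sketch. The `Candidate` fields, their units, and the sample values are hypothetical; how the measurements are obtained is outside the scope of this example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    delay_ms: float        # transmission time to/from the own terminal
    bandwidth_kbps: float  # network bandwidth to the own terminal
    cpu_score: float       # data processing capability (arbitrary scale)


def select_relay(candidates, criterion):
    """Return the name of the preferred candidate under one criterion."""
    if criterion == "delay":
        return min(candidates, key=lambda c: c.delay_ms).name
    if criterion == "bandwidth":
        return max(candidates, key=lambda c: c.bandwidth_kbps).name
    if criterion == "processing":
        return max(candidates, key=lambda c: c.cpu_score).name
    raise ValueError(f"unknown criterion: {criterion}")


candidates = [
    Candidate("B", delay_ms=40, bandwidth_kbps=800, cpu_score=2.0),
    Candidate("C", delay_ms=15, bandwidth_kbps=500, cpu_score=3.5),
    Candidate("D", delay_ms=60, bandwidth_kbps=1500, cpu_score=1.0),
]
print(select_relay(candidates, "delay"))       # C
print(select_relay(candidates, "bandwidth"))   # D
print(select_relay(candidates, "processing"))  # C
```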

The communication terminal may further comprise first acquisition means for acquiring priorities from second storage means, which stores the priority of each of the plurality of communication terminals connected via the network. The selection means may comprise fifth selection means for preferentially selecting, from among the plurality of non-conversation terminals, a non-conversation terminal for which the priority acquired by the first acquisition means is high. In this case, by setting the priority of each communication terminal, the user can have communication data transmitted with minimal delay to the users of high-priority communication terminals.

The communication terminal may further comprise: second acquisition means for acquiring identification information from third storage means, which stores identification information identifying the user of each communication terminal in association with the communication terminal that the user uses; third acquisition means for acquiring the speech of the users of the plurality of communication terminals; and speech recognition means for recognizing utterance content from the speech acquired by the third acquisition means. The first specifying means may comprise voice specifying means for specifying the partner terminal using the identification information when the utterance content recognized by the speech recognition means contains that identification information. In this case, the communication terminal can easily specify the partner terminal, which is the communication terminal of the conversation partner, from the user's speech.
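
A minimal sketch of the voice-based identification follows. It assumes that speech recognition has already produced text and that the identification information is the user's name; the names, the dictionary layout, and the simple substring match are all illustrative assumptions.

```python
def identify_partner_by_speech(recognized_text, user_names):
    """Return the terminal whose user's name occurs in the recognized text.

    `user_names` maps a terminal label to that user's name (assumed layout).
    Returns None when no registered name is found.
    """
    for terminal, name in user_names.items():
        if name in recognized_text:
            return terminal
    return None


user_names = {"B": "Tanaka", "C": "Suzuki", "E": "Sato"}
print(identify_partner_by_speech("Sato-san, what do you think?", user_names))  # E
```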

The communication terminal may further comprise: display control means for causing display means, which displays images, to display the users of the other communication terminals based on communication data transmitted from the other communication terminals; fourth acquisition means for acquiring a captured image of the user from imaging means that images the user; and gaze detection means for detecting the user's gaze direction from the captured image acquired by the fourth acquisition means. The first specifying means may comprise gaze specifying means for specifying, when a user displayed on the display means lies in the gaze direction detected by the gaze detection means, the communication terminal used by that user as the partner terminal. In this case, the communication terminal can easily specify the partner terminal, which is the communication terminal of the conversation partner, from the user's gaze.
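
The gaze-based identification can be illustrated with a simple hit test. The sketch assumes that gaze detection has already been reduced to a point on the screen and that each remote user occupies a rectangular display area; the 2 x 2 layout and the pixel sizes are hypothetical.

```python
from typing import NamedTuple


class DisplayArea(NamedTuple):
    terminal: str
    x: int
    y: int
    width: int
    height: int


def identify_partner_by_gaze(gaze_point, areas):
    """Return the terminal whose display area contains the gaze point."""
    gx, gy = gaze_point
    for area in areas:
        inside_x = area.x <= gx < area.x + area.width
        inside_y = area.y <= gy < area.y + area.height
        if inside_x and inside_y:
            return area.terminal
    return None  # the user is not looking at any displayed user


# Hypothetical 1280 x 720 screen split into four display areas.
areas = [
    DisplayArea("B", 0, 0, 640, 360),
    DisplayArea("C", 640, 0, 640, 360),
    DisplayArea("D", 0, 360, 640, 360),
    DisplayArea("E", 640, 360, 640, 360),
]
print(identify_partner_by_gaze((900, 500), areas))  # E
```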

Preferably, the communication terminal further comprises: end detection means for detecting, after the first start detection means or the second start detection means has detected that a conversation has started, whether the conversation between the users of the two communication terminals has ended; and third transmission means for transmitting third data, which is communication data generated by the generation means, to all of the other communication terminals when the end detection means detects that the conversation has ended. With this configuration, once a conversation between the users of two of the communication terminals, the own terminal included, has ended, the terminals can again exchange communication data with one another directly. When no conversation is taking place between two terminals, therefore, the delay from transmission to reception of communication data is kept to a minimum.
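
The switch between the two transmission modes can be sketched as a small state holder for a terminal that takes part in the conversation. The class name `SendPolicy` and its methods are hypothetical and only illustrate the behaviour described above.

```python
class SendPolicy:
    """Tracks where the own terminal sends its communication data."""

    def __init__(self, others):
        self.others = list(others)  # all other terminals in the conference
        self.partner = None
        self.relay = None

    def on_conversation_start(self, partner, relay):
        # Conversation detected: restrict sending to partner and relay.
        self.partner, self.relay = partner, relay

    def on_conversation_end(self):
        # Conversation ended: return to sending to every other terminal.
        self.partner = self.relay = None

    def destinations(self):
        if self.partner is None:
            return list(self.others)
        return [self.partner, self.relay]


policy = SendPolicy(["B", "C", "D", "E"])
policy.on_conversation_start(partner="E", relay="B")
print(policy.destinations())  # ['E', 'B']
policy.on_conversation_end()
print(policy.destinations())  # ['B', 'C', 'D', 'E']
```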

A communication method according to a second aspect of the present invention is performed in a communication system comprising four or more communication terminals that are connected to one another via a network and that transmit and receive communication data including at least one of audio data and image data. The communication method comprises: a first start detection step of detecting that a conversation has started between the user of the own terminal and the user of one of the other communication terminals connected to the own terminal; a first specifying step of specifying, when the first start detection step detects that a conversation has started, the partner terminal, which is the one communication terminal on the partner side of the started conversation; a selection step of selecting, when the first specifying step has specified the partner terminal, at least one non-conversation terminal from among the other communication terminals excluding the partner terminal engaged in the conversation; a generation step of generating communication data from information including at least one of audio data and image data; a first transmission step of transmitting first data, which is communication data generated in the generation step, to the partner terminal specified in the first specifying step; a second transmission step of transmitting, to the non-conversation terminal selected in the selection step, second data, which is communication data generated from the same source information as the first data; a second start detection step of detecting that a conversation has started between the users of two of the other communication terminals; a second specifying step of specifying, when the second start detection step detects that a conversation has started, the two communication terminals engaged in the started conversation; and a transfer step of transferring, when second data is received from one of the other communication terminals, the received second data to the communication terminals other than the two communication terminals specified in the second specifying step.

According to the communication method of the second aspect, first data is exchanged between the two conversation terminals used by the users who are having a conversation. Each conversation terminal transmits second data to at least one of the plurality of non-conversation terminals. A non-conversation terminal that receives the second data transfers it to the other non-conversation terminals. During a conversation, therefore, a conversation terminal can take part in the conference without transmitting communication data to all of the other communication terminals. The network bandwidth is thus used efficiently, and bandwidth for the conversation terminals is secured without increasing the network bandwidth itself that is usable among the communication terminals. As a result, communication data is exchanged stably between the two conversation terminals, so the conference proceeds smoothly. Communication data with a large data volume can also be exchanged between the conversation terminals.

A communication system according to a third aspect of the present invention comprises four or more communication terminals that transmit and receive communication data including at least one of audio data and image data, the communication terminals being connected to one another via a network. Each communication terminal comprises: first start detection means for detecting that a conversation has started between the user of the own terminal and the user of one of the other communication terminals; first specifying means for specifying, when the first start detection means detects that a conversation has started, the partner terminal, which is the one communication terminal on the partner side of the started conversation; selection means for selecting, when the first specifying means has specified the partner terminal, at least one non-conversation terminal from among the other communication terminals excluding the partner terminal engaged in the conversation; generation means for generating communication data from information including at least one of audio data and image data; first transmission means for transmitting first data, which is communication data generated by the generation means, to the partner terminal specified by the first specifying means; second transmission means for transmitting, to the non-conversation terminal selected by the selection means, second data, which is communication data generated from the same source information as the first data; second start detection means for detecting that a conversation has started between the users of two of the other communication terminals; second specifying means for specifying, when the second start detection means detects that a conversation has started, the two communication terminals engaged in the started conversation; and transfer means for transferring, when second data is received from one of the other communication terminals, the received second data to the communication terminals other than the two communication terminals specified by the second specifying means.

According to the communication system of the third aspect, first data is exchanged between the two conversation terminals used by the users who are having a conversation. Each conversation terminal transmits second data to at least one of the plurality of non-conversation terminals. A non-conversation terminal that receives the second data transfers it to the other non-conversation terminals. During a conversation, therefore, a conversation terminal can take part in the conference without transmitting communication data to all of the other communication terminals. The network bandwidth is thus used efficiently, and bandwidth for the conversation terminals is secured without increasing the network bandwidth itself that is usable among the communication terminals. As a result, communication data is exchanged stably between the two conversation terminals, so the conference proceeds smoothly. Communication data with a large data volume can also be exchanged between the conversation terminals.

FIG. 1 is a diagram showing the system configuration of the video conference system 3.
FIG. 2 is a block diagram showing the electrical configuration of the video conference terminal 1.
FIG. 3 is a diagram showing an example of an image displayed on the display device 34 during a video conference.
FIG. 4 is a schematic diagram showing the information stored in the communication partner information storage area 132 of the HDD 13.
FIG. 5 is a schematic diagram showing the information stored in the conversation history storage area 133 of the HDD 13.
FIG. 6 is a flowchart of the main processing performed by the video conference terminal 1 and the PC 2.
FIG. 7 is a flowchart of the conference start processing executed during the main processing.
FIG. 8 is a flowchart of the first conversation start detection processing performed by the video conference terminal 1 and the PC 2.
FIG. 9 is a flowchart of the second conversation start detection processing performed by the video conference terminal 1 and the PC 2.
FIG. 10 is a flowchart of the relay terminal selection processing executed during the main processing.
FIG. 11 is a flowchart of the relay terminal acceptance processing performed by the video conference terminal 1 and the PC 2.
FIG. 12 is a diagram showing an example of how data is normally transmitted and received in the video conference system 3.
FIG. 13 is a diagram showing an example of how data is transmitted and received when the user at site A and the user at site E are having a conversation.
FIG. 14 is a diagram showing how the communication terminal at site A transmits and receives data in the case of FIG. 13.

Hereinafter, a video conference terminal 1 and a personal computer (hereinafter, "PC") 2, which are one embodiment of the communication terminal of the present invention, will be described with reference to the drawings. The drawings referred to are used to explain technical features that the present invention may adopt. The device configurations, the flowcharts of the various processes, and the like shown in the drawings are merely illustrative examples and are not intended to be limiting.

The system configuration of a video conference system 3 that includes the video conference terminal 1 and the PC 2 will be described with reference to FIG. 1. The video conference system 3 consists of a plurality of communication terminals connected via a network 8. FIG. 1 shows, as communication terminals constituting the video conference system 3, a video conference terminal 1 located at site A, a PC 2 located at site B, and a video conference terminal 1 located at site C. In the video conference system 3, the communication terminals exchange information with one another so that users at a plurality of sites share video and audio. As a result, the users can hold a conference smoothly even when they are not all at the same site.

Each communication terminal is connected to a microphone 31, a speaker 32, a camera 33, a display device 34, and an infrared light 35 located at the same site. The microphone 31 converts sound into audio data. The speaker 32 produces sound based on audio data. The camera 33 captures moving images. The display device 34 displays images. The infrared light 35 is used to detect the user's gaze direction, as described in detail later.

Next, the electrical configuration of the video conference terminal 1 will be described with reference to FIG. 2. The parts of the electrical configuration of the PC 2 that are needed to describe the present embodiment are the same as those of the video conference terminal 1. The operations that the video conference terminal 1 and the PC 2 perform to hold a video conference, the structure of the data they store, and so on are also the same. In the following description, therefore, the electrical components of the PC 2 are given the same reference numbers as those of the video conference terminal 1, and their detailed description is omitted. Description of the operations performed by the PC 2 is likewise omitted.

As shown in FIG. 2, the video conference terminal 1 includes a CPU 10 that controls the video conference terminal 1. A ROM 11, a RAM 12, a hard disk drive (hereinafter, "HDD") 13, and an input/output interface 19 are connected to the CPU 10 via a bus 18.

The ROM 11 stores a program for operating the video conference terminal 1, initial values, and the like. The RAM 12 temporarily stores various information used by the control program. The HDD 13 is a non-volatile storage device that stores various information. A storage device such as an EEPROM or a memory card may be used instead of the HDD 13.

An audio input processing unit 21, an audio output processing unit 22, a video input processing unit 23, a video output processing unit 24, an operation unit 25, an external communication I/F 26, and the infrared light 35 are connected to the input/output interface 19. The audio input processing unit 21 processes audio data input from the microphone 31. The audio output processing unit 22 handles the operation of the speaker 32. The video input processing unit 23 processes image data (video data) input from the camera 33. The video output processing unit 24 handles the operation of the display device 34. The operation unit 25 is used by the user to enter various instructions into the video conference terminal 1. The external communication I/F 26 connects the video conference terminal 1 to the network 8.

Next, an outline of the operations performed by the video conference terminal 1 will be described with reference to FIG. 3. As shown in FIG. 3, the video conference terminal 1 receives image data from the other communication terminals connected via the network 8, and displays images on the display device 34 based on the received image data. FIG. 3 shows an example of the image that the video conference terminal 1 at base A displays on the display device 34 when a video conference is being held among five communication terminals located at bases A, B, C, D, and E. During the video conference, the video conference terminal 1 forms display areas 41 to 44 in the display device 34, one for each of the other communication terminals, and displays the image of another base in each of the display areas 41 to 44. The video conference terminal 1 also outputs sound from the speaker 32 based on the audio data received from the other communication terminals. As a result, users at different bases can hold a remote conference.

The video conference terminal 1 also detects that a conversation is taking place between the users of two of the communication terminals, and identifies, from among the plurality of communication terminals, the communication terminals of the users in conversation (hereinafter referred to as "conversation terminals"). When the video conference terminal 1 detects a conversation, it performs processing to preferentially secure the bandwidth of the network 8 between the two conversation terminals. Specifically, a conversation terminal selects, from among the non-conversation terminals that are not taking part in the conversation, a non-conversation terminal (hereinafter referred to as the "relay terminal") to which it directly transmits audio data and image data (hereinafter referred to as "communication data"). The conversation terminal transmits communication data only to the communication terminal of the conversation partner (hereinafter referred to as the "partner terminal") and to the selected relay terminal. The relay terminal forwards the communication data received from the conversation terminal to the other non-conversation terminals. As a result, the number of paths over which the conversation terminal transmits communication data decreases, and bandwidth of the network 8 is secured for the conversation terminal.
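The routing described above can be illustrated with a minimal Python sketch. The function name `plan_routes` and the use of base letters as terminal identifiers are assumptions made for illustration only; they do not appear in the embodiment.

```python
def plan_routes(sender, partner, relay, terminals):
    """Split the other terminals into direct and relayed destinations."""
    # The conversation terminal sends only to its partner and the relay.
    direct = [partner, relay]
    # Every remaining non-conversation terminal is reached through the relay.
    relayed = [t for t in terminals if t not in (sender, partner, relay)]
    return direct, relayed


terminals = ["A", "B", "C", "D", "E"]
direct, relayed = plan_routes("A", "B", "C", terminals)
print(direct, relayed)
```

In this five-base example, the conversation terminal at base A transmits over two paths instead of four, while the terminals at bases D and E receive the data from the relay at base C.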

ここで、テレビ会議端末１は、会話開始の検出、および会話端末の特定を行うために、ユーザの視線方向を検出して利用する方法と、音声認識を利用する方法とを用いることができる。 Here, the video conference terminal 1 can use a method of detecting and using the user's line-of-sight direction and a method of using voice recognition in order to detect the start of conversation and specify the conversation terminal.

テレビ会議端末１が行うユーザの視線方向の検出について説明する。テレビ会議端末１は、同一拠点内のユーザの視線方向を検出することができる。検出した視線方向に、表示装置３４に形成された複数の表示領域４１〜４４のいずれがあるかを判断する。テレビ会議端末１は、自端末のユーザの会話相手を、視線方向にある表示領域に画像が表示されている拠点のユーザに特定する。 The detection of the user's line-of-sight direction performed by the video conference terminal 1 will be described. The video conference terminal 1 can detect the line-of-sight direction of users in the same base. It is determined which of the plurality of display areas 41 to 44 formed in the display device 34 is in the detected line-of-sight direction. The video conference terminal 1 identifies the conversation partner of the user of the terminal itself as the user at the base where the image is displayed in the display area in the line-of-sight direction.

視線方向の検出について、具体的に説明する。視線方向の検出方法には周知の方法を用いればよい。例えば、特開平１０−１０８８４３号公報に掲載された方法を適用できる。この方法によると、テレビ会議端末１は、赤外線ライト３５を用いて赤外線を発光させる。赤外線は、ユーザの眼球角膜反射面に反射されて虚像（プルキニエ像）を形成する。カメラ３３は、形成されたプルキニエ像と眼球の瞳孔中心とを撮影する。テレビ会議端末１は、プルキニエ像と瞳孔中心との相対位置から、ユーザの視線方向を検出することができる。ユーザの視線方向を検出すると、他の拠点の画像の表示領域が視線方向にあるか否かを判断して、会話相手を特定する。この処理の詳細については後述する。 The detection of the gaze direction will be specifically described. A well-known method may be used as a method for detecting the line-of-sight direction. For example, the method described in Japanese Patent Application Laid-Open No. 10-108843 can be applied. According to this method, the video conference terminal 1 causes the infrared light 35 to emit infrared light. Infrared rays are reflected by the eyeball cornea reflection surface of the user to form a virtual image (Purkinje image). The camera 33 captures the formed Purkinje image and the pupil center of the eyeball. The video conference terminal 1 can detect the user's line-of-sight direction from the relative position between the Purkinje image and the pupil center. When the user's line-of-sight direction is detected, it is determined whether the display area of the image at another base is in the line-of-sight direction, and the conversation partner is specified. Details of this processing will be described later.
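As a rough illustration of the final step, the following Python sketch maps a gaze point to a display area. It assumes that the line-of-sight direction has already been converted to screen coordinates, which the embodiment does not describe; the `Region` type, the function name, and the 2x2 layout are hypothetical.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    base: str  # base whose image is shown in this display area
    x: int     # left edge in screen pixels
    y: int     # top edge in screen pixels
    w: int     # width
    h: int     # height


def partner_from_gaze(regions, gx, gy) -> Optional[str]:
    """Return the base whose display area contains the gaze point, if any."""
    for r in regions:
        if r.x <= gx < r.x + r.w and r.y <= gy < r.y + r.h:
            return r.base
    return None


# Hypothetical 2x2 layout for display areas 41 to 44 on a 1280x960 screen.
layout = [
    Region("B", 0, 0, 640, 480),
    Region("C", 640, 0, 640, 480),
    Region("D", 0, 480, 640, 480),
    Region("E", 640, 480, 640, 480),
]
```

A return value of `None` corresponds to the case where no display area lies in the line-of-sight direction.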

テレビ会議端末１が行う音声認識について説明する。テレビ会議端末１は、入力された音声データについて、周知の音声認識を行うことができる。認識した音声に基づいて、ユーザの会話相手の特定、および会話が終了したか否かの判断を行う。 The voice recognition performed by the video conference terminal 1 will be described. The video conference terminal 1 can perform well-known voice recognition on the input voice data. Based on the recognized voice, the user's conversation partner is specified, and whether or not the conversation has ended is determined.

Specifically, the HDD 13 stores an acoustic model, a language model, and a word dictionary for speech recognition. When audio data is input, the video conference terminal 1 analyzes the input audio data, extracts feature quantities, and then performs matching against the acoustic model and the language model. As a result, a likelihood is obtained for each sentence that the language model can accept, and the sentence with the highest likelihood is taken as the recognition result. During matching, the language model refers to the word dictionary. If the likelihood is at or below a prescribed threshold, recognition is treated as having failed and no recognition result is obtained. When the video conference terminal 1 recognizes a registered user name of a communication terminal in the opening part of a conversation, it determines that a conversation directed at the recognized user has started and proceeds accordingly. When the video conference terminal 1 recognizes a specific phrase used to determine that a conversation between users at two bases has ended, such as 「以上」 ("that is all"), 「終わり」 ("the end"), or 「皆さん」 ("everyone"), it determines that the conversation has ended. Details of this processing will be described later.
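The selection of a recognition result and the end-of-conversation check can be sketched as follows. This is a simplified illustration, not the recognizer itself: the hypothesis list is assumed to come from an existing speech recognition engine, and matching the end phrases by substring is an assumption.

```python
END_PHRASES = ("以上", "終わり", "皆さん")  # end phrases named in the embodiment


def pick_result(hypotheses, threshold):
    """hypotheses: list of (sentence, likelihood). Return the best sentence or None."""
    if not hypotheses:
        return None
    sentence, likelihood = max(hypotheses, key=lambda h: h[1])
    # A likelihood at or below the threshold counts as a recognition failure.
    return sentence if likelihood > threshold else None


def conversation_ended(sentence):
    """True if a recognized sentence contains one of the end phrases."""
    return sentence is not None and any(p in sentence for p in END_PHRASES)
```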

次に、図４および図５を参照して、テレビ会議端末１のＨＤＤ１３に記憶されるデータについて説明する。ＨＤＤ１３には、自端末情報記憶エリア１３１、通信相手情報記憶エリア１３２、および会話履歴記憶エリア１３３等の各種記憶エリアが設けられている（図２参照）。 Next, data stored in the HDD 13 of the video conference terminal 1 will be described with reference to FIGS. 4 and 5. The HDD 13 is provided with various storage areas such as a local terminal information storage area 131, a communication partner information storage area 132, and a conversation history storage area 133 (see FIG. 2).

自端末情報記憶エリア１３１には、自端末の登録ユーザ名、処理能力、および優先度が記憶されている。登録ユーザ名は、自端末を使用するユーザを特定するためにあらかじめ登録されたユーザの名称である。ユーザは、複数の登録ユーザ名をあらかじめ登録しておくこともできる。例えば、ユーザが同僚から「鈴木」と呼ばれ、部下からは「部長」と呼ばれる場合には、登録ユーザ名として「鈴木」および「部長」を共に登録しておくことができる。登録ユーザ名は、音声認識による会話相手の特定を行うために用いられる。処理能力は、自端末のデータ処理能力を示す値である。本実施の形態では、処理能力として、自端末のＣＰＵ１０のクロック周波数が記憶されている。しかし、処理能力として用いることができる情報はこれに限られない。例えば、自端末のメモリの容量等の他の情報を、自端末の処理能力を示す値に用いてもよい。優先度は、テレビ会議システム３内の複数の通信端末における自端末の優先度を示す。優先度は、テレビ会議が実行される際にあらかじめユーザによって設定されている。 The own terminal information storage area 131 stores the registered user name, processing capability, and priority of the own terminal. The registered user name is a name of a user registered in advance in order to specify a user who uses the terminal. The user can also register a plurality of registered user names in advance. For example, if the user is called “Suzuki” by a colleague and “director” by a subordinate, both “Suzuki” and “director” can be registered as registered user names. The registered user name is used to specify a conversation partner by voice recognition. The processing capability is a value indicating the data processing capability of the terminal itself. In the present embodiment, the clock frequency of the CPU 10 of the own terminal is stored as the processing capability. However, information that can be used as processing capability is not limited to this. For example, other information such as the memory capacity of the own terminal may be used as a value indicating the processing capability of the own terminal. The priority indicates the priority of the own terminal among a plurality of communication terminals in the video conference system 3. The priority is set in advance by the user when the video conference is executed.

As shown in FIG. 4, the communication partner information storage area 132 stores various information about the other communication terminals in the video conference system 3. Specifically, the registered user name, transmission time, available bandwidth, processing capability, and priority are stored for each terminal. The transmission time is the time taken to transmit data between the own terminal and the communication partner terminal. The transmission time is measured by a known method at the start of the video conference. In the present embodiment, the clocks of the own terminal and the communication partner terminal are synchronized, and the transmission time is measured by comparing the time stamp attached to data received from the communication partner terminal with the time kept by the own terminal. Alternatively, a PING command or the like may be sent to the communication partner terminal, and the transmission time may be measured from the time that elapses until a response is obtained. The available bandwidth is the bandwidth of the network 8 that can be used with the communication partner terminal, and is a value indicating the data communication speed. The available bandwidth depends on the network 8 used by the video conference system 3, the line used to connect to the network 8, the interface, and so on. In the present embodiment, the available bandwidth to each of the other communication terminals is measured by a known method from the start of the video conference.
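One possible in-memory representation of an entry in the communication partner information storage area 132, together with the time stamp comparison, is sketched below. The field names, the units (milliseconds, kbps, MHz), and the sample values are assumptions for illustration; the embodiment specifies only which items are stored.

```python
from dataclasses import dataclass


@dataclass
class PeerInfo:
    registered_names: list
    transmission_time_ms: float = 0.0
    available_bandwidth_kbps: float = 0.0
    processing_capability_mhz: float = 0.0  # CPU clock frequency
    priority: int = 0


def measure_transmission_time(sent_timestamp_ms, local_receive_time_ms):
    """One-way transmission time; meaningful only if both clocks are synchronized."""
    return local_receive_time_ms - sent_timestamp_ms


# Hypothetical entry for the terminal at base B.
peers = {"B": PeerInfo(["佐藤"], processing_capability_mhz=2400, priority=1)}
peers["B"].transmission_time_ms = measure_transmission_time(1_000_000, 1_000_035)
```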

The video conference terminal 1 acquires the various information stored in the communication partner information storage area 132 when starting a video conference. The transmission time, available bandwidth, processing capability, and priority are used to select, from among the non-conversation terminals, the relay terminal to which audio data and image data are directly transmitted when the own terminal becomes a conversation terminal. Details will be described later.

As shown in FIG. 5, the conversation history storage area 133 stores, in order, the history of the partner terminals whose users have been conversation partners of the own terminal's user. The example in FIG. 5 shows the conversation history of the video conference terminal 1 at base A. In this example, the most recent partner terminal is the terminal at base B, the partner terminal two conversations ago is the terminal at base D, the partner terminal three conversations ago is the terminal at base B, and the partner terminal four conversations ago is the terminal at base E. The conversation history is also used to select a non-conversation terminal.
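The ordering of the conversation history can be modeled with a double-ended queue whose front holds the most recent partner. This is only a sketch of the data structure under that assumption; the function name `record_partner` is hypothetical.

```python
from collections import deque

# History of the terminal at base A from the FIG. 5 example, newest first.
history = deque(["B", "D", "B", "E"])


def record_partner(history, base):
    """Push the newest partner to the front, shifting older entries back."""
    history.appendleft(base)
    return history


record_partner(history, "C")  # a new conversation with the user at base C
```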

Hereinafter, the processing performed by the video conference terminal 1 and the PC 2 according to the present embodiment will be described with reference to FIGS. 6 to 11. Because the video conference terminal 1 and the PC 2 perform the same processing, the processing performed by the video conference terminal 1 is described below. The main process, the conversation start detection process, and the relay terminal acceptance process described below are executed by the CPU 10 in accordance with a program stored in the ROM 11.

図６に示すメイン処理は、テレビ会議を実行する指示が入力されることで開始される。メイン処理が開始されると、まず、会議開始処理が行われる（Ｓ１）。会議開始処理では、テレビ会議中の処理を実行するための各種情報の送受信、記憶、処理方法の設定等が行われる。 The main process shown in FIG. 6 is started when an instruction to execute a video conference is input. When the main process is started, a meeting start process is first performed (S1). In the conference start process, transmission / reception of various information for executing the process during the video conference, storage, setting of a processing method, and the like are performed.

図７に示すように、会議開始処理では、まず、ＨＤＤ１３の自端末情報記憶エリア１３１（図２参照）に記憶されている自端末の登録ユーザ名、処理能力、および優先度が取得される（Ｓ２１）。取得された自端末の情報が、テレビ会議システム３中の他の通信端末に送信される（Ｓ２２）。他の通信端末のそれぞれの登録ユーザ名、処理能力、および優先度の情報が他の通信端末から受信され、ＨＤＤ１３の通信相手情報記憶エリア１３２（図４参照）に記憶される（Ｓ２３）。 As shown in FIG. 7, in the conference start process, first, the registered user name, processing capability, and priority stored in the own terminal information storage area 131 (see FIG. 2) of the HDD 13 are acquired ( S21). The acquired information of the own terminal is transmitted to other communication terminals in the video conference system 3 (S22). Information of each registered user name, processing capability, and priority of the other communication terminal is received from the other communication terminal and stored in the communication partner information storage area 132 (see FIG. 4) of the HDD 13 (S23).

他の通信端末の各々との間のデータの伝送時間が計測され、通信相手情報記憶エリア１３２に記憶される（Ｓ２４）。他の通信端末の各々との間のネットワーク８の利用可能帯域幅が計測され、通信相手情報記憶エリア１３２に記憶される（Ｓ２５）。利用可能帯域幅は、テレビ会議中に適宜計測され、その都度更新される。 The transmission time of data with each of the other communication terminals is measured and stored in the communication partner information storage area 132 (S24). The available bandwidth of the network 8 with each of the other communication terminals is measured and stored in the communication partner information storage area 132 (S25). The available bandwidth is appropriately measured during the video conference and updated each time.

Next, the method for selecting the relay terminal is set (S26). As described above, the relay terminal is a non-conversation terminal that receives communication data from a conversation terminal and forwards it to the other non-conversation terminals. When the video conference terminal 1 becomes a conversation terminal, it can select the relay terminal using any one of the following items of information: conversation history, data transmission time, available bandwidth, data processing capability, or priority. In the process of S26, which item is used to select the relay terminal is set according to the user's operation of the operation unit 25 (see FIG. 2).

次いで、相手端末の特定方法が設定される（Ｓ２７）。前述したように、テレビ会議端末１は、会話相手の相手端末を特定するために、ユーザの視線方向を検出する方法と、音声認識による方法とを用いることができる。Ｓ２７の処理では、いずれの方法で相手端末を特定するかが設定される。次いで、他の通信端末からの音声データおよび画像データの受信が開始され、受信した通信データに基づく音声の発生、および画像の表示が開始されて、テレビ会議が実行される（Ｓ２８）。処理はメイン処理に戻る。 Next, the identification method of the partner terminal is set (S27). As described above, the video conference terminal 1 can use a method of detecting the user's line-of-sight direction and a method based on voice recognition in order to specify the partner terminal of the conversation partner. In the process of S27, it is set which method is used to specify the partner terminal. Next, reception of audio data and image data from another communication terminal is started, generation of audio based on the received communication data and display of an image are started, and a video conference is executed (S28). The process returns to the main process.

Returning to the description of FIG. 6, when the conference start process (S1) ends and the video conference begins, the conversation start detection process is started (S2). The conversation start detection process detects that a conversation has started between two communication terminals, and identifies the two conversation terminals. Depending on the setting made in the conference start process (FIG. 7, S27), either the first conversation start detection process (see FIG. 8) or the second conversation start detection process (see FIG. 9) is executed. The conversation start detection process runs in parallel with the main process.

The first conversation start detection process will be described with reference to FIG. 8. The first conversation start detection process is executed when the terminal is set to identify the partner terminal by speech recognition. The CPU 10 acquires audio data for all bases, including the base of the own terminal, and performs speech recognition on the acquired audio data. First, it is determined from the speech recognition result whether a registered user name was spoken at the start of an utterance at any base (S31). Specifically, an utterance made after a silent period of at least a predetermined length is treated as the start of an utterance. The registered user names are acquired from the HDD 13, and it is determined whether any registered user name of the own terminal or of the other communication terminals is included in the opening part of the utterance. If no registered user name is included (S31: NO), this determination is repeated. If a registered user name is included (S31: YES), it is determined whether the silence following the registered user name has reached a fixed time (0.5 seconds in the present embodiment) (S32). In the present embodiment, if the silence following the registered user name is shorter than the fixed time, the spoken registered user name is judged to be a user name that came up in the middle of a continuing conversation, that is, a user name spoken for some purpose other than calling out to a conversation partner. Accordingly, if the silence is shorter than 0.5 seconds (S32: NO), the process returns to the determination of S31. If the silence has reached 0.5 seconds (S32: YES), it is determined that one user has called out to another user, and that a conversation between users at two bases has started at that point.
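The two determinations S31 and S32 can be sketched in Python as follows. The function assumes that the recognizer has already supplied the words at the start of an utterance and the length of the silence after the name; how those values are obtained, and the function and parameter names, are assumptions.

```python
SILENCE_AFTER_NAME_S = 0.5  # fixed time used in the embodiment


def detect_call(opening_words, silence_after_name_s, registered_names):
    """Return the called registered user name, or None if no call is detected.

    opening_words: recognized words at the start of an utterance.
    silence_after_name_s: measured silence following the name, in seconds.
    """
    called = next((w for w in opening_words if w in registered_names), None)
    if called is None:
        return None  # S31: NO, no registered user name at the start
    if silence_after_name_s < SILENCE_AFTER_NAME_S:
        return None  # S32: NO, the name was part of a continuing sentence
    return called    # S32: YES, a conversation start is detected
```

For example, `detect_call(["鈴木"], 0.6, {"鈴木", "部長"})` returns `"鈴木"`, whereas the same name followed by only 0.3 seconds of silence yields `None`.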

会話が開始されたと判断されると、発話の開始部分に含まれていた登録ユーザ名が、自端末の登録ユーザ名であるか否かが判断される（Ｓ３４）。自端末の登録ユーザ名でなければ（Ｓ３４：ＮＯ）、登録ユーザ名が含まれていた発話の音声データが、自端末に入力された音声データであるか否かが判断される（Ｓ３５）。 If it is determined that the conversation has started, it is determined whether or not the registered user name included in the start part of the utterance is the registered user name of the terminal (S34). If it is not the registered user name of the own terminal (S34: NO), it is determined whether or not the voice data of the utterance including the registered user name is the voice data input to the own terminal (S35).

When a registered user name of the own terminal is included, the user of another communication terminal is calling out to the user of the own terminal. When the registered user name is included in audio data input to the own terminal, the user of the own terminal is calling out to the user of another communication terminal. Accordingly, if a registered user name of the own terminal was included (S34: YES), or if the audio data was input to the own terminal (S35: YES), the own terminal is determined to be one of the two conversation terminals (S36). Conversation terminal specifying information, indicating that the own terminal is a conversation terminal, is transmitted to all of the other communication terminals (S37). Conversation terminal specifying information is received from the other conversation terminal (S38). The communication terminal that transmitted the conversation terminal specifying information is identified as the partner terminal, that is, the terminal of the conversation partner (S39). The identified partner terminal is added to the "previous" column of the conversation history (see FIG. 5) (S40), and the process returns to the determination of S31.

発話に含まれていた登録ユーザ名が自端末の登録ユーザ名でなく（Ｓ３４：ＮＯ）、音声データが自端末に入力された音声データでもなければ（Ｓ３５：ＮＯ）、自端末が非会話端末であると判定される（Ｓ４２）。会話端末となった他の２つの通信端末から、会話端末特定情報が受信される（Ｓ４３）。会話端末特定情報の送信元である２つの通信端末が会話端末に特定されて（Ｓ４４）、処理はＳ３１の判断へ戻る。 If the registered user name included in the utterance is not the registered user name of the own terminal (S34: NO) and the voice data is not the voice data input to the own terminal (S35: NO), the own terminal is a non-conversation terminal (S42). The conversation terminal specifying information is received from the other two communication terminals that have become conversation terminals (S43). Two communication terminals that are the transmission source of the conversation terminal identification information are identified as conversation terminals (S44), and the process returns to the determination in S31.
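The branch structure of S34, S35, S36, and S42 reduces to a small decision function. The sketch below is illustrative only; the function name and the string return values are assumptions.

```python
def classify_role(called_name, own_names, spoken_at_own_terminal):
    """Decide whether the own terminal is a conversation terminal."""
    if called_name in own_names:   # S34: YES, the own user was called
        return "conversation"      # S36
    if spoken_at_own_terminal:     # S35: YES, the own user did the calling
        return "conversation"      # S36
    return "non-conversation"      # S42
```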

The second conversation start detection process will be described with reference to FIG. 9. The second conversation start detection process is executed when the terminal is set to identify the partner terminal by detecting the user's line-of-sight direction. First, it is determined whether input of audio data from the microphone 31 (see FIG. 2) connected to the own terminal has started (S51). If input has started (S51: YES), a captured image is acquired from the camera 33 (see FIG. 2) connected to the own terminal (S52), and the line-of-sight direction detection described above is performed (S53). It is determined whether any of the display areas 41 to 44 (see FIG. 3) lies in the detected line-of-sight direction of the user (S54). If no display area lies in the line-of-sight direction (S54: NO), the process returns to the determination of S51. If a display area lies in the line-of-sight direction (S54: YES), it is determined that the user of the own terminal has started a conversation with another user, and the own terminal is determined to be one of the conversation terminals (S56). The communication terminal corresponding to the display area in the line-of-sight direction is identified as the partner terminal (S57). The identified partner terminal is stored in the conversation history (see FIG. 5) (S58). Conversation terminal specifying information, indicating that the identified partner terminal is a conversation terminal, is transmitted to all of the other communication terminals (S59). The process returns to the determination of S51.

マイク３１からの音声データの入力が開始されていなければ（Ｓ５１：ＮＯ）、他の通信端末から会話端末特定情報を受信したか否かが判断される（Ｓ６１）。受信していなければ（Ｓ６１：ＮＯ）、処理はＳ５１の判断へ戻る。受信した場合には（Ｓ６１：ＹＥＳ）、自端末が会話端末に特定されているか否かが、受信した会話端末特定情報によって判断される（Ｓ６３）。自端末が会話端末に特定されていれば（Ｓ６３：ＹＥＳ）、自端末が会話端末の一方であると判定される（Ｓ６４）。会話端末特定情報の送信元の通信端末が、相手端末に特定される（Ｓ６５）。相手端末が会話履歴に記憶されて（Ｓ６６）、処理はＳ５１の判断へ戻る。 If the input of voice data from the microphone 31 has not been started (S51: NO), it is determined whether or not the conversation terminal specifying information has been received from another communication terminal (S61). If not received (S61: NO), the process returns to the determination of S51. If received (S61: YES), whether or not the terminal is specified as a conversation terminal is determined based on the received conversation terminal identification information (S63). If the own terminal is specified as the conversation terminal (S63: YES), it is determined that the own terminal is one of the conversation terminals (S64). The communication terminal that is the transmission source of the conversation terminal specifying information is specified as the partner terminal (S65). The partner terminal is stored in the conversation history (S66), and the process returns to the determination in S51.

自端末が会話端末に特定されていなければ（Ｓ６３：ＮＯ）、自端末が非会話端末であると判定される（Ｓ６８）。会話端末特定情報によって会話端末に特定されている通信端末と、情報の送信元の通信端末とが、会話端末に特定される（Ｓ６９）。処理はＳ５１の判断へ戻る。 If the own terminal is not specified as a conversation terminal (S63: NO), it is determined that the own terminal is a non-conversation terminal (S68). The communication terminal specified as the conversation terminal by the conversation terminal specifying information and the communication terminal as the information transmission source are specified as the conversation terminal (S69). The process returns to the determination in S51.
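The handling of received conversation terminal specifying information in S63 to S69 can be sketched as a single function. The message is assumed to carry the identifier of the terminal it specifies, and the sender is assumed to be known from the transport; the names and the tuple return format are hypothetical.

```python
def handle_specifying_info(own_id, sender_id, specified_id):
    """Return (role, partner, conversation_terminals) for a received notice."""
    # The specified terminal and the sender are the two conversation terminals.
    pair = {sender_id, specified_id}
    if specified_id == own_id:                  # S63: YES
        return "conversation", sender_id, pair  # S64, S65
    return "non-conversation", None, pair       # S68, S69
```

For a notice sent by the terminal at base A that specifies base B, the terminal at base B becomes a conversation terminal with A as its partner, while the terminal at base C records A and B as the conversation terminals.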

図６の説明に戻る。会話開始検出処理が開始されると（Ｓ２）、まず、通常の通信データの送受信が行われる。詳細には、同一拠点内のマイク３１およびカメラ３３からの入力データがエンコードされ、中品質の音声データおよび画像データが生成される。生成された通信データが、他の通信端末の全てに同じように送信される（Ｓ３）。中品質のデータとは、データ量が後述する高品質データと低品質データとの間にあるデータである。データ量は、エンコード時の圧縮率、サンプリングレート、画像の解像度、画像データのフレームレート、エラー訂正符号等によって変動する。出力される音声および画像の品質は、データ量が大きいほど高い。 Returning to the description of FIG. When the conversation start detection process is started (S2), first, normal communication data is transmitted and received. Specifically, input data from the microphone 31 and the camera 33 in the same base is encoded, and medium-quality audio data and image data are generated. The generated communication data is transmitted to all other communication terminals in the same manner (S3). The medium quality data is data having a data amount between high quality data and low quality data, which will be described later. The amount of data varies depending on the compression rate at the time of encoding, the sampling rate, the image resolution, the frame rate of the image data, the error correction code, and the like. The quality of output audio and image is higher as the amount of data is larger.

次いで、前述した会話開始検出処理（図８または図９参照）によって、２拠点のユーザ間の会話の開始が検出されたか否かが判断される（Ｓ４）。詳細には、会話開始検出処理において、自端末が会話端末または非会話端末であると判定された場合に、会話開始が検出されたと判断される（Ｓ３６，Ｓ４２，Ｓ５６，Ｓ６４，Ｓ６８）。会話開始が検出されていなければ（Ｓ４：ＮＯ）、通常の通信データの送受信が継続される。会話開始が検出された場合には（Ｓ４：ＹＥＳ）、自端末が会話端末となっているか否かが判断される（Ｓ５）。自端末が会話端末となっていれば（Ｓ５：ＹＥＳ）、相手端末との間のネットワーク８の帯域を確保するための会話端末としての処理が行われる。まず、複数の非会話端末から中継端末を選定する中継端末選定処理が行われる（Ｓ６）。前述したように、中継端末とは、非会話端末のうち、会話端末が音声データおよび画像データを直接送信する通信端末である。中継端末は、会話端末から受信した通信データを、その他の非会話端末に転送する。 Next, it is determined whether or not the start of the conversation between the users at the two bases has been detected by the above-described conversation start detection process (see FIG. 8 or FIG. 9) (S4). Specifically, in the conversation start detection process, when it is determined that the terminal is a conversation terminal or a non-conversation terminal, it is determined that a conversation start has been detected (S36, S42, S56, S64, S68). If the conversation start is not detected (S4: NO), transmission / reception of normal communication data is continued. When the conversation start is detected (S4: YES), it is determined whether or not the terminal is a conversation terminal (S5). If the own terminal is a conversation terminal (S5: YES), processing as a conversation terminal for securing the bandwidth of the network 8 with the partner terminal is performed. First, relay terminal selection processing for selecting a relay terminal from a plurality of non-conversation terminals is performed (S6). As described above, the relay terminal is a communication terminal in which a conversation terminal directly transmits voice data and image data among non-conversation terminals. The relay terminal transfers the communication data received from the conversation terminal to other non-conversation terminals.

As shown in FIG. 10, when the relay terminal selection process starts, one non-conversation terminal is selected as a relay terminal candidate according to the setting made in the conference start process (FIG. 7, S26) (S71). The method of selecting the non-conversation terminal will be described in detail later. Next, selection information indicating that the terminal has been selected as a relay terminal candidate is transmitted to the selected non-conversation terminal (S72). It is then determined whether acceptance information has been received from the selected non-conversation terminal (S73). As described in detail later, a non-conversation terminal selected as a candidate transmits either acceptance information indicating that it accepts the selection or rejection information indicating that it refuses. If rejection information is received (S73: NO), the selected non-conversation terminal is excluded from the relay terminal candidates (S74), and a relay terminal candidate is selected again from the remaining non-conversation terminals (S71). If acceptance information is received (S73: YES), the non-conversation terminal selected as a candidate is determined to be the relay terminal (S75), and the process returns to the main process.
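The loop of S71 to S75 can be sketched as follows. The callback `ask` stands in for sending selection information and waiting for the reply, and ordering the candidates by shortest transmission time is an assumption chosen for the example, since the selection criteria are described elsewhere. Returning `None` when every candidate refuses is also an assumption; FIG. 10 as described does not cover that case.

```python
def select_relay(candidates, ask):
    """candidates: non-conversation terminals in preference order.

    ask(terminal) returns True for acceptance information and
    False for rejection information.
    """
    remaining = list(candidates)
    while remaining:
        candidate = remaining[0]     # S71: pick the best remaining candidate
        if ask(candidate):           # S72, S73: send selection info, check reply
            return candidate         # S75: candidate becomes the relay terminal
        remaining.remove(candidate)  # S74: exclude the refusing terminal
    return None  # assumed outcome when all candidates refuse


transmission_ms = {"C": 40, "D": 25, "E": 60}  # hypothetical measurements
order = sorted(transmission_ms, key=transmission_ms.get)
relay = select_relay(order, ask=lambda t: t != "D")  # D is already a relay
```

Here the terminal at base D is tried first and refuses, so the terminal at base C is selected.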

Next, the relay terminal acceptance process is described with reference to FIG. 11. This process runs in parallel with the main process for as long as the video conference is in progress. First, it is determined whether the conversation between the users at the two sites has ended (S81). In the present embodiment, the conversation is judged to have ended when silence continues for a predetermined time or longer. The conversation is also judged to have ended when a specific utterance that signals the end of a two-site conversation is recognized, such as 「以上」 ("that is all"), 「終わり」 ("the end"), or 「皆さん」 ("everyone"). If the conversation has not ended (S81: NO), it is determined whether the start of a new conversation has been detected (S82). If the conversation start detection process (see FIGS. 8 and 9) has not detected a new conversation (S82: NO), it is determined whether selection information has been received from another communication terminal (S83). If not (S83: NO), processing returns to the determination in S81.
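
The end-of-conversation test in S81 combines a silence timeout with keyword spotting. A minimal sketch is given below. The 5-second limit is an assumed value, because the source says only "a predetermined time", and the function and constant names are hypothetical. The speech recognizer itself is out of scope here and is represented by its text output.

```python
# Phrases quoted in the description as signalling the end of a conversation.
END_PHRASES = ("以上", "終わり", "皆さん")

# Assumed value; the source specifies only "a predetermined time".
SILENCE_LIMIT_S = 5.0


def conversation_ended(silence_seconds: float, recognized_text: str) -> bool:
    """Return True when either end condition of S81 holds."""
    if silence_seconds >= SILENCE_LIMIT_S:
        return True  # silence lasted for the predetermined time or longer
    # Otherwise look for any end phrase in the recognized speech.
    return any(phrase in recognized_text for phrase in END_PHRASES)
```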

When selection information is received (S83: YES), the accepting flag is checked to determine whether the own terminal has already been selected as a relay terminal (S84). The accepting flag is ON while the own terminal is selected as a relay terminal and OFF otherwise. If the terminal has not been selected (S84: NO), the accepting flag is set to ON (S85), acceptance information indicating that the selection is accepted is transmitted to the conversation terminal that sent the selection information (S86), and processing returns to the determination in S81. If the terminal has already been selected (S84: YES), rejection information indicating that the selection is refused is transmitted (S87), and processing returns to the determination in S81. When the conversation is judged to have ended (S81: YES), or when the start of a new conversation is detected (S82: YES), the accepting flag is set to OFF (S88), and processing returns to the determination in S81.
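
The accepting flag behaves like a one-bit state machine. The sketch below illustrates S83 to S88 under stated assumptions: the class name `RelayAcceptor`, its method names, and the string replies `"accept"` and `"reject"` are hypothetical, and the flag is assumed to start OFF.

```python
class RelayAcceptor:
    """Tracks whether this terminal is currently serving as a relay."""

    def __init__(self) -> None:
        self.accepting = False  # accepting flag; assumed OFF at conference start

    def on_selection(self) -> str:
        """Handle received selection information (S83: YES)."""
        if self.accepting:       # S84: YES, already a relay for someone
            return "reject"      # S87: send rejection information
        self.accepting = True    # S85: turn the flag ON
        return "accept"          # S86: send acceptance information

    def on_conversation_change(self) -> None:
        """Conversation ended (S81: YES) or a new one started (S82: YES)."""
        self.accepting = False   # S88: turn the flag OFF
```

Because the flag is cleared on every conversation change, a terminal that relayed for one conversation becomes eligible again for the next.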

The method of selecting relay terminal candidates (see FIG. 10, S71) is now described in detail. As described above, the relay terminal is selected using one of the following kinds of information: conversation history, data transmission time, available bandwidth, data processing capability, or priority.

When the setting is to select the relay terminal by conversation history, the conversation history stored in the conversation history storage area 133 (see FIG. 5) of the HDD 13 is referenced. Non-conversation terminals are selected as relay terminal candidates starting from the one most recently recorded in the conversation history. In general, the more recently a non-conversation terminal was recorded in the conversation history, the more relevant it is to the current conversation. The conversation terminal transmits audio data and image data directly to the relay terminal, so the delay between the conversation terminal sending the data and the relay terminal receiving it is shorter than the corresponding delay for non-conversation terminals that were not selected as the relay terminal. Preferring the most recently recorded non-conversation terminal therefore minimizes the reception delay of the non-conversation terminal that is most relevant to the current conversation.

When the setting is to select the relay terminal by data transmission time, the transmission times stored in the communication partner information storage area 132 (see FIG. 4) of the HDD 13 are referenced. Non-conversation terminals are selected as relay terminal candidates in ascending order of data transmission time to and from the conversation terminal. The relay terminal forwards the communication data received from the conversation terminal to the other non-conversation terminals, so a long transmission time between the conversation terminal and the relay terminal lengthens the reception delay of every other non-conversation terminal as well. Preferring a non-conversation terminal with a short transmission time therefore reduces the reception delay across the non-conversation terminals as a whole.

When the setting is to select the relay terminal by available bandwidth, non-conversation terminals are selected as relay terminal candidates in descending order of available bandwidth. The wider the available bandwidth, the faster data can be communicated, so the reception delay across the non-conversation terminals as a whole is reduced.

When the setting is to select the relay terminal by data processing capability, non-conversation terminals are selected as relay terminal candidates in descending order of processing capability. A non-conversation terminal with high processing capability can receive communication data from the conversation terminal and forward it to the other non-conversation terminals more quickly than one with low processing capability, so the reception delay across the non-conversation terminals as a whole is reduced.

When the setting is to select the relay terminal by priority, the priorities stored in the communication partner information storage area 132 (see FIG. 4) of the HDD 13 are acquired. Non-conversation terminals are selected as relay terminal candidates in descending order of priority. In this case, the user can minimize the reception delay of a desired communication terminal by setting the priorities appropriately.
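
The five selection criteria above each reduce to sorting the non-conversation terminals by a single key. A minimal sketch, assuming a hypothetical `Peer` record whose field names and units are not taken from the source:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Peer:
    """Hypothetical per-terminal record; field names and units are assumptions."""
    name: str
    last_talk_index: int     # larger = recorded more recently in the history
    transmission_ms: float   # transmission time to/from the conversation terminal
    bandwidth_kbps: float    # available bandwidth
    processing_score: float  # data processing capability
    priority: int            # larger = higher priority


# One sort key per criterion; a negated key yields descending order.
SORT_KEYS: Dict[str, Callable[[Peer], float]] = {
    "history": lambda p: -p.last_talk_index,
    "transmission_time": lambda p: p.transmission_ms,
    "bandwidth": lambda p: -p.bandwidth_kbps,
    "processing": lambda p: -p.processing_score,
    "priority": lambda p: -p.priority,
}


def order_candidates(peers: List[Peer], criterion: str) -> List[str]:
    """Return terminal names in the order they would be offered the relay role."""
    return [p.name for p in sorted(peers, key=SORT_KEYS[criterion])]
```

The first name in the returned list corresponds to the candidate chosen in S71. If it refuses, the next name is tried.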

The description now returns to FIG. 6. Once the relay terminal has been selected (S6), high-quality audio and image data and low-quality audio and image data are generated from the same audio and images (S7). Specifically, the input from the microphone 31 and the camera 33 at the same site is encoded under different conditions to produce two kinds of communication data that differ in data volume. The high-quality data is then transmitted to the partner terminal, which is the other party to the conversation (S8), and the low-quality data is transmitted to the selected relay terminal (S9). Neither kind of communication data is transmitted to non-conversation terminals other than the relay terminal. Next, it is determined whether the start of a new conversation has been detected (S10). If so (S10: YES), processing returns to the determination in S5. If not (S10: NO), it is determined whether the conversation with the user of the partner terminal has ended (S11). The condition used here is the same as that of S81 in FIG. 11. If the conversation has not ended (S11: NO), processing returns to S7, and transmission of the high-quality and low-quality data continues. If the conversation has ended (S11: YES), processing returns to S3, and normal transmission and reception resumes, that is, medium-quality data is again transmitted to all other communication terminals.
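
The routing decision of S8 and S9 can be sketched as a mapping from each destination to the quality level it receives. The function names and the string labels `"high"`, `"medium"`, and `"low"` are assumptions made for illustration, with `None` marking a destination that receives nothing directly.

```python
from typing import Dict, Iterable, Optional


def conversation_send_plan(all_terminals: Iterable[str], self_id: str,
                           partner: str, relay: str) -> Dict[str, Optional[str]]:
    """Quality sent by a conversation terminal to each other terminal."""
    plan: Dict[str, Optional[str]] = {}
    for terminal in all_terminals:
        if terminal == self_id:
            continue                   # a terminal never sends to itself
        if terminal == partner:
            plan[terminal] = "high"    # S8: high-quality data to the partner
        elif terminal == relay:
            plan[terminal] = "low"     # S9: low-quality data to the relay
        else:
            plan[terminal] = None      # nothing sent directly
    return plan


def normal_send_plan(all_terminals: Iterable[str],
                     self_id: str) -> Dict[str, Optional[str]]:
    """S3: with no two-site conversation, medium quality goes to everyone."""
    return {t: "medium" for t in all_terminals if t != self_id}
```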

If a conversation has started but the own terminal is not a conversation terminal (S5: NO), the terminal performs processing as a non-conversation terminal. First, low-quality data is generated (S12). The generated low-quality data is transmitted to all other communication terminals, including the conversation terminals (S13). Next, it is determined whether the own terminal has been selected as a relay terminal (S14). If not (S14: NO), processing proceeds directly to the determination in S16. If so (S14: YES), the low-quality data received from the conversation terminal is forwarded to the other non-conversation terminals, so that the low-quality data is broadcast among the non-conversation terminals (S15). Here, the other non-conversation terminals are all communication terminals except the two conversation terminals and the one relay terminal (the own terminal). In the present embodiment, broadcast means that the low-quality data generated by a conversation terminal reaches the non-conversation terminals without the same data being sent twice to the same non-conversation terminal and without the data looping among the non-conversation terminals. The low-quality data transmitted by the conversation terminal is forwarded in one direction, from the single selected relay terminal to each of the other non-conversation terminals. This prevents the low-quality data from being delivered more than once to the same non-conversation terminal and from looping among the non-conversation terminals.
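
The set of forwarding targets in S15 is simply every terminal that is neither a conversation terminal nor the relay itself. A short sketch follows, in which the function name `forward_targets` is hypothetical:

```python
from typing import Iterable, List, Tuple


def forward_targets(all_terminals: Iterable[str],
                    conversation_pair: Tuple[str, str],
                    relay: str) -> List[str]:
    """Terminals to which the relay forwards a conversation terminal's data."""
    excluded = set(conversation_pair) | {relay}
    # Each remaining terminal appears once, so no duplicate delivery occurs,
    # and the relay is excluded, so data never loops back to it.
    return [t for t in all_terminals if t not in excluded]
```

With sites A to E, conversation terminals A and E, and relay B, the targets are C and D.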

Next, it is determined whether the start of a new conversation has been detected (S16). If so (S16: YES), processing returns to the determination in S5. If not (S16: NO), it is determined whether the conversation has ended (S17). If the conversation has not ended (S17: NO), processing returns to S12, and processing as a non-conversation terminal continues. If the conversation has ended (S17: YES), processing returns to S3, and normal transmission and reception of communication data resumes.

Next, the way communication data is transmitted and received when the video conference terminal 1 and the PC 2 perform the above processing is described with reference to FIGS. 12 to 14. FIGS. 12 to 14 show an example of transmission and reception in a video conference system 3 made up of communication terminals at five sites, A, B, C, D, and E. FIG. 12 shows transmission and reception when no conversation between users at two sites is in progress. FIG. 13 shows transmission and reception when the user at site A and the user at site E are in conversation. FIG. 14 shows only the transmission and reception of the communication terminal at site A in the situation of FIG. 13.

As shown in FIG. 12, when no conversation between users at two sites is in progress, each communication terminal transmits medium-quality data to all other communication terminals. Each communication terminal can therefore output the audio and video of every other site, and a remote conference using audio and video can be held. However, the bandwidth of the network 8 (see FIGS. 1 and 2) may fluctuate. For example, when the users at two of the five sites are in conversation and the bandwidth drops so that audio and video are interrupted, the conference cannot move on to its next stage and cannot proceed smoothly. When users at two sites are in conversation, it is therefore desirable to give priority to securing the bandwidth of the network 8 between the communication terminals of those users.

As shown in FIG. 13, in the video conference system 3, when the user at site A and the user at site E are in conversation, the communication terminals at sites A and E, which are the conversation terminals, exchange communication data with each other. Because the conversation terminals exchange communication data directly, the time from transmission by one conversation terminal to reception by the other is short, and the conversation between the users at the two sites can proceed smoothly. Each conversation terminal also selects one of the three non-conversation terminals at sites B, C, and D as its relay terminal and, among the non-conversation terminals, transmits communication data only to the one it selected.

As shown in FIG. 14, while the communication terminal at site A operates as a conversation terminal, it does not need to transmit communication data to the non-conversation terminals at sites C and D. A terminal operating as a conversation terminal can therefore reduce the number of paths over which it transmits and receives communication data. As a result, the communication terminal of the present embodiment, when operating as a conversation terminal, can secure the network bandwidth it needs during the conversation by using the network bandwidth efficiently, without increasing the total network bandwidth available among the communication terminals at sites A to E. The network bandwidth of devices other than those at sites A to E is not squeezed, and no cumbersome processing such as switching networks is required. The audio and image communication data from site A is forwarded by the relay terminal at site B to the non-conversation terminals at sites C and D. In the video conference system 3, the remote conference can therefore proceed smoothly even though the conversation terminal does not transmit communication data to every other communication terminal. In other words, when a conversation starts between users at two sites, the network bandwidth available to the two conversation terminals is secured while the remote conference among all communication terminals, including the non-conversation terminals, continues smoothly.
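
The reduction in transmission paths can be checked with simple counting. The sketch below is illustrative only: it counts outgoing streams per terminal, not bit rates, because the source gives no figures for the quality levels. The function name and parameters are assumptions.

```python
def outgoing_streams(n_terminals: int, as_conversation_terminal: bool,
                     relays: int = 1) -> int:
    """Number of terminals this terminal transmits to directly."""
    if n_terminals < 2:
        raise ValueError("a conference needs at least two terminals")
    if not as_conversation_terminal:
        return n_terminals - 1   # normal mode: send to every other terminal
    return 1 + relays            # partner terminal plus the selected relay(s)
```

In the five-site example, the terminal at site A sends four streams in normal operation and two while acting as a conversation terminal (to E and to B).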

As also shown in FIG. 13, the two conversation terminals can select different non-conversation terminals as their relay terminals. In that case, the processing load of forwarding communication data is spread out rather than concentrated on a single non-conversation terminal.

The quality of the communication data is described next. As shown in FIGS. 13 and 14, the communication data exchanged between the conversation terminals (A, E) is high-quality data with a large data volume. The conversation terminals (A, E) can therefore present high-quality audio and video to the users in conversation. In contrast, the conversation terminals (A, E) transmit low-quality data with a small data volume to the relay terminal (B or D). By reducing the volume of the communication data sent to the relay terminal (B or D), each conversation terminal (A, E) more reliably secures the bandwidth of the network 8 available for communication with the partner terminal. Moreover, because the data to be broadcast is small, the non-conversation terminals (B, C, D) can smoothly broadcast the communication data sent from the conversation terminals (A, E). The non-conversation terminals (B, C, D) also switch the data they transmit to low-quality data once a conversation starts between the two conversation terminals. This reduces the volume of communication data sent to the conversation terminals (A, E) and so secures the bandwidth that the conversation terminals (A, E) can use.

When a communication terminal detects that the conversation between the users at the two sites has ended, it transmits medium-quality data directly to all other communication terminals (see FIG. 12). When no two-site conversation is in progress, each communication terminal can therefore carry on the remote conference while keeping the data reception delay to a minimum. In addition, each conference terminal can easily identify the partner terminal, that is, the communication terminal of the conversation partner, by using at least one of voice recognition and detection of the user's line-of-sight direction.

In the above embodiment, the video conference terminal 1 and the PC 2 correspond to the "communication terminal" of the present invention. The CPU 10 that detects, in S36 of FIG. 8 and in S56 and S64 of FIG. 9, that a conversation has started between the user of the own terminal and the user of another communication terminal functions as the "first start detection means". The CPU 10 that identifies the partner terminal in S38 and S39 of FIG. 8 and in S57 and S65 of FIG. 9 functions as the "first specifying means". The CPU 10 that performs the relay terminal selection process of FIG. 10 functions as the "selection means" and the "first to fifth selection means". The CPU 10 that generates communication data in S3, S7, and S12 of FIG. 6 functions as the "generation means". The high-quality data of the above embodiment corresponds to the "first data" of the present invention, the low-quality data to the "second data", and the medium-quality data to the "third data".

The CPU 10 that transmits the high-quality data to the partner terminal in S8 of FIG. 6 functions as the "first transmission means". The CPU 10 that transmits the low-quality data to the relay terminal in S9 of FIG. 6 functions as the "second transmission means". The CPU 10 that detects the start of a conversation between two other communication terminals in S42 of FIG. 8 and S68 of FIG. 9 functions as the "second start detection means". The CPU 10 that identifies the two conversation terminals in S44 of FIG. 8 and S69 of FIG. 9 functions as the "second specifying means". The CPU 10 that broadcasts the low-quality data among the non-conversation terminals in S15 of FIG. 6 functions as the "transfer means".

The conversation history storage area 133 of the HDD 13 corresponds to the "first storage means". The CPU 10 that stores the partner terminal as history in S40 of FIG. 8 and in S58 and S66 of FIG. 9 functions as the "storage control means". The communication partner information storage area 132 of the HDD 13 corresponds to the "second storage means" and the "third storage means". The CPU 10 that acquires the priority from the HDD 13 in S71 of FIG. 10, when the relay terminal is to be selected by priority, functions as the "first acquisition means". The CPU 10 that acquires the registered user names from the HDD 13 in S31 of FIG. 8 functions as the "second acquisition means". The CPU 10 that acquires the speech of each communication terminal functions as the "third acquisition means", and the CPU 10 that performs voice recognition on the acquired speech functions as the "voice recognition means". The CPU 10 that, in S39 of FIG. 8, identifies the partner terminal from a registered user name contained in the input speech functions as the "voice specifying means".

The CPU 10 that causes the display device 34 to display images based on image data received from the other communication terminals functions as the "display control means". The CPU 10 that acquires the captured video of the own terminal in S52 of FIG. 9 functions as the "fourth acquisition means". The CPU 10 that detects the user's line-of-sight direction in S53 of FIG. 9 functions as the "line-of-sight detection means". The CPU 10 that, in S54 and S57 of FIG. 9, identifies as the partner terminal the communication terminal corresponding to the display area in the user's line-of-sight direction functions as the "line-of-sight specifying means". The CPU 10 that detects the end of the conversation in S11 and S17 of FIG. 6 functions as the "end detection means". The CPU 10 that transmits the medium-quality data to all other communication terminals in S3 of FIG. 6 functions as the "third transmission means".

The processing in S36 of FIG. 8 and in S56 and S64 of FIG. 9 that detects that a conversation has started between the user of the own terminal and the user of another communication terminal corresponds to the "first start detection step". The processing in S38 and S39 of FIG. 8 and in S57 and S65 of FIG. 9 that identifies the partner terminal corresponds to the "first specifying step". The relay terminal selection process of FIG. 10 corresponds to the "selection step" and the "first to fifth selection steps". The processing in S3, S7, and S12 of FIG. 6 that generates communication data corresponds to the "generation step". The processing in S8 of FIG. 6 that transmits the high-quality data to the partner terminal corresponds to the "first transmission step". The processing in S9 of FIG. 6 that transmits the low-quality data to the relay terminal corresponds to the "second transmission step". The processing in S42 of FIG. 8 and S68 of FIG. 9 that detects the start of a conversation between two other communication terminals corresponds to the "second start detection step". The processing in S44 of FIG. 8 and S69 of FIG. 9 that identifies the two conversation terminals corresponds to the "second specifying step". The processing in S15 of FIG. 6 that broadcasts the low-quality data among the non-conversation terminals corresponds to the "transfer step".

The present invention is not limited to the above embodiment, and various modifications are of course possible. For example, the number of communication terminals used in the video conference system 3 is not limited to five. When each conversation terminal selects one relay terminal, the present invention can be applied with four or more communication terminals.

In the above embodiment, each of the two conversation terminals selects one relay terminal from among the non-conversation terminals. However, one conversation terminal may select two or more relay terminals. Even then, the number of paths over which the conversation terminal transmits communication data can be reduced. In this case, each relay terminal selected by a given conversation terminal need only forward the communication data to non-conversation terminals that have not yet received communication data from that conversation terminal.

The method of broadcasting communication data among the non-conversation terminals is not limited to the one used in the above embodiment. The broadcast may use any well-known method that avoids duplicated data and loops, such as transmitting the communication data to a special address within the network 8. In the example shown in FIG. 14, the relay terminal at site B forwards the communication data directly to each of the non-conversation terminals at sites C and D. However, the communication data may instead be forwarded in sequence, for example from site B to site C and then from site C to site D.
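
The two forwarding arrangements mentioned above, direct fan-out from the relay and hop-by-hop chaining, can be compared by listing the (sender, receiver) pairs each produces. The function names below are hypothetical. In both arrangements every target appears exactly once as a receiver, which is the no-duplicate, no-loop property the embodiment requires of a broadcast.

```python
from typing import List, Sequence, Tuple


def fanout_edges(relay: str, others: Sequence[str]) -> List[Tuple[str, str]]:
    """Relay sends directly to each remaining non-conversation terminal."""
    return [(relay, target) for target in others]


def chain_edges(relay: str, others: Sequence[str]) -> List[Tuple[str, str]]:
    """Each terminal passes the data on to the next one in sequence."""
    hops = [relay, *others]
    return list(zip(hops, hops[1:]))
```

Fan-out puts the whole forwarding load on the relay but adds only one hop of delay. Chaining spreads the load but adds one hop of delay per terminal in the sequence.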

In the above embodiment, controlling the transmission and reception of three kinds of communication data (high, medium, and low quality) secures the bandwidth of the network 8 available to the conversation terminals more reliably. However, the quality of the transmitted communication data can be changed as appropriate. For example, communication data of the same quality may be transmitted to every communication terminal, regardless of whether the destination is a conversation terminal. According to the present invention, the number of paths over which a conversation terminal transmits communication data can be reduced even without changing the data quality, so the bandwidth available to the conversation terminals can still be secured. Likewise, a non-conversation terminal may transmit communication data of the same quality regardless of whether a conversation is in progress.

In the above embodiment, the two conversation terminals select different non-conversation terminals as their relay terminals, which spreads the load of forwarding communication data across two non-conversation terminals. However, the two conversation terminals may select the same non-conversation terminal as their relay terminal. For example, both conversation terminals may select the non-conversation terminal with the highest processing capability. When the communication data from both conversation terminals is forwarded to the other non-conversation terminals through the same relay terminal, no difference arises between the reception delays of the two data streams at those non-conversation terminals. A non-conversation terminal can therefore receive the communication data from the two conversation terminals at the same time.

In the embodiment described above, each communication terminal identifies the partner terminal using voice recognition and detection of the user's line-of-sight direction. However, the method of identifying the partner terminal may be changed as appropriate. For example, the user who starts the conversation may operate the operation unit 25 or the like to designate the partner terminal. Alternatively, the amount of audio data transmitted by each communication terminal within a fixed period may be monitored, and the two communication terminals that transmit large amounts of data may be treated as the conversation terminals. Similarly, the method for determining whether a conversation has ended, the method for detecting whether a conversation has started, and the like may also be changed.
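The data-amount variation can be sketched as a sliding-window counter. This is a minimal sketch, not a definitive implementation: the class name, the window length, and the `min_bytes` threshold used to decide what counts as a "large" amount are hypothetical, and records are assumed to arrive in timestamp order.

```python
from collections import deque


class AudioVolumeMonitor:
    """Tracks audio bytes sent per terminal over a sliding time window."""

    def __init__(self, window_sec: float, min_bytes: int = 1):
        self.window_sec = window_sec
        self.min_bytes = min_bytes   # hypothetical threshold for "large"
        self._events = deque()       # (timestamp, terminal_id, n_bytes)

    def record(self, terminal_id: str, timestamp: float, n_bytes: int) -> None:
        # Assumes calls are made in non-decreasing timestamp order.
        self._events.append((timestamp, terminal_id, n_bytes))

    def conversation_terminals(self, now: float):
        """Return the two terminals with the largest totals in the window,
        or None if fewer than two terminals reach the threshold."""
        # Drop events older than the start of the window.
        while self._events and self._events[0][0] < now - self.window_sec:
            self._events.popleft()
        totals = {}
        for _, tid, n in self._events:
            totals[tid] = totals.get(tid, 0) + n
        # Largest total first; terminal id breaks ties deterministically.
        ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
        top = [tid for tid, total in ranked[:2] if total >= self.min_bytes]
        return tuple(top) if len(top) == 2 else None


monitor = AudioVolumeMonitor(window_sec=5.0, min_bytes=1000)
monitor.record("A", 0.0, 5000)   # falls outside the window at now=10.0
monitor.record("B", 6.0, 4000)
monitor.record("C", 7.0, 3000)
monitor.record("D", 8.0, 200)
monitor.record("B", 9.0, 1000)
print(monitor.conversation_terminals(now=10.0))
```

Returning None when fewer than two terminals reach the threshold lets the caller treat that case as "no conversation in progress", which is one possible way to combine this check with conversation start and end detection.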

1 Video conference terminal
2 PC
3 Video conference system
10 CPU
13 HDD
34 Display device
41 to 44 Display areas
132 Communication partner information storage area
133 Conversation history storage area

Claims

1. A communication terminal that is connected to three or more other communication terminals via a network and transmits and receives communication data including at least one of audio data and image data, the communication terminal comprising:
First start detecting means for detecting that a conversation has started between the user of the own terminal and the user of one communication terminal among the other communication terminals;
First specifying means for specifying, when the first start detecting means detects that the conversation has started, a partner terminal that is the one communication terminal on the partner side of the started conversation;
Selecting means for selecting, when the partner terminal is specified by the first specifying means, at least one non-conversation terminal from among the other communication terminals excluding the partner terminal engaged in the conversation;
Generating means for generating communication data from information including at least one of audio data and image data;
First transmission means for transmitting first data, which is communication data generated by the generating means, to the partner terminal specified by the first specifying means;
Second transmission means for transmitting, to the non-conversation terminal selected by the selecting means, second data, which is communication data generated by the generating means from information common to the first data;
Second start detecting means for detecting that a conversation has started between the users of two communication terminals among the other communication terminals;
Second specifying means for specifying, when the second start detecting means detects that the conversation has started, the two communication terminals engaged in the started conversation; and
Transfer means for transferring, when the second data is received from one of the other communication terminals, the received second data to a communication terminal other than the two communication terminals specified by the second specifying means.

2. The communication terminal according to claim 1, wherein the first transmission means transmits, to the partner terminal, first data having a larger data amount than the second data transmitted by the second transmission means.

3. The communication terminal according to claim 1 or 2, further comprising storage control means for storing the partner terminals specified by the first specifying means in first storage means, in order, as a history,
wherein the selecting means comprises first selecting means for preferentially selecting, from among a plurality of the non-conversation terminals, a non-conversation terminal whose position in the history stored in the first storage means is more recent.

4. The communication terminal according to claim 1, wherein the selecting means comprises at least one of:
Second selecting means for preferentially selecting, from among the plurality of non-conversation terminals, a non-conversation terminal with a short communication data transmission time to and from the own terminal;
Third selecting means for preferentially selecting, from among the plurality of non-conversation terminals, a non-conversation terminal with a high network bandwidth to and from the own terminal; and
Fourth selecting means for preferentially selecting, from among the plurality of non-conversation terminals, a non-conversation terminal with a high data processing capability.

5. The communication terminal according to any one of claims 1 to 4, further comprising first acquisition means for acquiring a priority from second storage means that stores the priority of each of the plurality of communication terminals connected via the network,
wherein the selecting means comprises fifth selecting means for preferentially selecting, from among the non-conversation terminals, a non-conversation terminal for which the priority acquired by the first acquisition means is high.

6. The communication terminal according to any one of claims 1 to 5, further comprising:
Second acquisition means for acquiring specific information from third storage means that stores the specific information, which specifies the user of each communication terminal, in association with the communication terminal used by that user;
Third acquisition means for acquiring the speech uttered by the user of each of the plurality of communication terminals; and
Voice recognition means for recognizing utterance content from the speech acquired by the third acquisition means,
wherein the first specifying means comprises voice specifying means for specifying the partner terminal using the specific information when the specific information is included in the utterance content recognized by the voice recognition means.

7. The communication terminal according to claim 1, further comprising:
Display control means for displaying the user of the other communication terminal on display means that displays an image based on communication data transmitted from the other communication terminal;
Fourth acquisition means for acquiring a captured image of the user from imaging means that images the user; and
Line-of-sight detection means for detecting the line-of-sight direction of the user from the captured image acquired by the fourth acquisition means,
wherein the first specifying means comprises specifying means for specifying, when a user displayed on the display means is in the line-of-sight direction detected by the line-of-sight detection means, the communication terminal used by the user in that line-of-sight direction as the partner terminal.

8. The communication terminal according to claim 1, further comprising:
End detection means for detecting, after the first start detecting means or the second start detecting means has detected that the conversation has started, whether the conversation between the users of the two communication terminals has ended; and
Third transmission means for transmitting third data, which is communication data generated by the generating means, to all of the other communication terminals when the end detection means detects that the conversation has ended.

9. A communication method performed in a communication system including four or more communication terminals that are connected to one another via a network and transmit and receive communication data including at least one of audio data and image data, the communication method comprising:
A first start detection step of detecting that a conversation has started between the user of the own terminal and the user of one communication terminal among the other communication terminals connected to the own terminal;
A first specifying step of specifying, when the first start detection step detects that the conversation has started, a partner terminal that is the one communication terminal on the partner side of the started conversation;
A selection step of selecting, when the partner terminal is specified in the first specifying step, at least one non-conversation terminal from among the other communication terminals excluding the partner terminal engaged in the conversation;
A generation step of generating communication data from information including at least one of audio data and image data;
A first transmission step of transmitting first data, which is communication data generated in the generation step, to the partner terminal specified in the first specifying step;
A second transmission step of transmitting, to the non-conversation terminal selected in the selection step, second data, which is communication data generated in the generation step from information common to the first data;
A second start detection step of detecting that a conversation has started between the users of two communication terminals among the other communication terminals;
A second specifying step of specifying, when the second start detection step detects that the conversation has started, the two communication terminals engaged in the started conversation; and
A transfer step of transferring, when the second data is received from one of the other communication terminals, the received second data to a communication terminal other than the two communication terminals specified in the second specifying step.

10. A communication system comprising four or more communication terminals that transmit and receive communication data including at least one of audio data and image data, the communication terminals being connected to one another via a network,
wherein the communication terminal comprises:
First start detecting means for detecting that a conversation has started between the user of the own terminal and the user of one communication terminal among the other communication terminals;
First specifying means for specifying, when the first start detecting means detects that the conversation has started, a partner terminal that is the one communication terminal on the partner side of the started conversation;
Selecting means for selecting, when the partner terminal is specified by the first specifying means, at least one non-conversation terminal from among the other communication terminals excluding the partner terminal engaged in the conversation;
Generating means for generating communication data from information including at least one of audio data and image data;
First transmission means for transmitting first data, which is communication data generated by the generating means, to the partner terminal specified by the first specifying means;
Second transmission means for transmitting, to the non-conversation terminal selected by the selecting means, second data, which is communication data generated by the generating means from information common to the first data;
Second start detecting means for detecting that a conversation has started between the users of two communication terminals among the other communication terminals;
Second specifying means for specifying, when the second start detecting means detects that the conversation has started, the two communication terminals engaged in the started conversation; and
Transfer means for transferring, when the second data is received from one of the other communication terminals, the received second data to a communication terminal other than the two communication terminals specified by the second specifying means.