JP2003163906A

JP2003163906A - Television conference system and method therefor

Info

Publication number: JP2003163906A
Application number: JP2001361254A
Authority: JP
Inventors: Masahiro Mikuriya; 正弘御厨; Takafumi Saito; 孝文齋藤; Shigehisa Ozawa; 滋久小澤
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2001-11-27
Filing date: 2001-11-27
Publication date: 2003-06-06

Abstract

PROBLEM TO BE SOLVED: To provide a television conference system in which the speaker can be identified more easily.

SOLUTION: A WEB server 1 selects, from among the voice information received from terminals 3, the voice information having the maximum voice level. It allots a first display size to the identification information of the terminals other than the terminal that transmitted the selected voice information, i.e., the terminals of non-speakers, allots a second display size to the identification information of the terminal that transmitted the selected voice information, i.e., the speaker's terminal, and informs a rendering server 2 of each piece of identification information and its corresponding display size. The rendering server 2 receives the picture information obtained by imaging each participant together with the identification information of the terminal that transmitted it, composes the picture information so that the respective pictures are displayed at once, each at the display size that the WEB server 1 reported for the identification information received with that picture, and transmits the resulting composite picture information to each terminal for display.

COPYRIGHT: (C)2003,JPO

Description

Detailed Description of the Invention

[0001]

[Field of the Invention] The present invention relates to a video conference system and a video conference method that simultaneously display the images of the conference participants on the screen of each participant's terminal, and more particularly to a technique that makes it easier to identify the speaker.

[0002]

[Description of the Related Art] Video conference systems have conventionally been used for various purposes. In a video conference system, a terminal is assigned to each participant, and the images of all the participants, or of a subset of several participants, are displayed simultaneously on the display screen of each terminal. For example, an identifying mark is attached to the image of the speaker, so that the participants in the video conference can identify the current speaker.

[0003]

[Problems to Be Solved by the Invention] However, because conventional video conference systems identify the speaker by a mark, intuitive identification can be difficult, and an improvement on this point has been strongly desired.

[0004] The present invention has been made in view of the above conventional problem, and its object is to provide a video conference system and a video conference method in which the speaker can be identified more easily.

[0005]

[Means for Solving the Problems] To solve the above conventional problem, the invention of claim 1 is a video conference system in which a first server and a second server are connected to terminals each used by one of a plurality of participants in a conference. The first server receives, from each terminal, voice information obtained from the voice of the participant using that terminal; selects, from among the received voice information, the voice information having the highest voice level; associates a first display size with the identification information of each terminal other than the source of the selected voice information; associates a second display size, larger than the first display size, with the identification information of the terminal that is the source of the selected voice information; and transmits each piece of identification information, together with its associated display size, to the second server. The second server receives the image information obtained by imaging each participant together with the identification information of the terminal that is the source of that image information; composites the image information so that each image is displayed simultaneously at the display size, received from the first server, that corresponds to the identification information received with that image; and transmits the resulting composite image information to each terminal for display.

[0006] The invention of claim 2 is the video conference system according to claim 1, wherein, when image information is newly transmitted from a terminal identified by identification information with which the second display size has been associated, the first server associates with that identification information a third display size that is smaller than the second display size and larger than the first display size.

[0007] The invention of claim 3 is a video conference method performed by a video conference system in which a first server and a second server are connected to terminals each used by one of a plurality of participants in a conference. The first server receives, from each terminal, voice information obtained from the voice of the participant using that terminal; selects, from among the received voice information, the voice information having the highest voice level; associates a first display size with the identification information of each terminal other than the source of the selected voice information; associates a second display size, larger than the first display size, with the identification information of the terminal that is the source of the selected voice information; and transmits each piece of identification information, together with its associated display size, to the second server. The second server receives the image information obtained by imaging each participant together with the identification information of the terminal that is the source of that image information; composites the image information so that each image is displayed simultaneously at the display size, received from the first server, that corresponds to the identification information received with that image; and transmits the resulting composite image information to each terminal for display.

[0008] The invention of claim 4 is the video conference method according to claim 3, wherein, when image information is newly transmitted from a terminal identified by identification information with which the second display size has been associated, the first server associates with that identification information a third display size that is smaller than the second display size and larger than the first display size.

[0009] In the invention of claim 1 or claim 3, the first server receives, from each terminal, voice information obtained from the voice of the participant using that terminal and selects the voice information having the highest voice level. It associates the first display size with the identification information of the terminals other than the source of the selected voice information, i.e., the terminals of non-speakers, and associates the second display size, larger than the first display size, with the identification information of the terminal that is the source of the selected voice information, i.e., the speaker's terminal. It then transmits each piece of identification information, together with its associated display size, to the second server. The second server receives the image information obtained by imaging each participant together with the identification information of the terminal that is the source of that image information, composites the image information so that each image is displayed simultaneously at the display size, received from the first server, that corresponds to the identification information received with that image, and transmits the resulting composite image information to each terminal for display. Consequently, when the composite image information is displayed on each terminal, the speaker's image is displayed larger than the non-speakers' images, and the speaker can be identified more easily.

[0010] In the invention of claim 2 or claim 4, when image information is newly transmitted from a terminal identified by identification information with which the second display size has been associated, the first server associates with that identification information a third display size that is smaller than the second display size and larger than the first display size. The image of a past speaker is therefore larger than those of the non-speakers and smaller than that of the current speaker, so a past speaker can be distinguished from both the non-speakers and the current speaker.
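The three-level sizing described here can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `assign_sizes_with_history`, the use of a single scalar voice level per terminal, and the rule that a past speaker who is again the loudest simply receives the second display size are all hypothetical and are not specified in the text.

```python
from typing import Dict, Optional, Tuple


def assign_sizes_with_history(
    levels: Dict[str, float],
    previous_speaker: Optional[str],
    first: int,
    second: int,
    third: int,
) -> Tuple[Dict[str, int], Optional[str]]:
    """Map terminal IDs to display sizes, remembering the previous speaker.

    first  = size for non-speakers
    second = size for the current speaker (largest)
    third  = size for the previous speaker (between first and second)
    """
    if not first < third < second:
        raise ValueError("sizes must satisfy first < third < second")
    if not levels:
        return {}, previous_speaker

    # The terminal with the highest voice level is treated as the speaker.
    current = max(levels, key=levels.get)

    sizes: Dict[str, int] = {}
    for terminal_id in levels:
        if terminal_id == current:
            sizes[terminal_id] = second
        elif terminal_id == previous_speaker:
            # Assumption: only the immediately preceding speaker is tracked.
            sizes[terminal_id] = third
        else:
            sizes[terminal_id] = first
    return sizes, current
```

The second return value is meant to be fed back in as `previous_speaker` on the next pass, so that the terminal that was just speaking shrinks to the intermediate size once someone else takes over.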

[0011]

[Embodiments of the Invention] Embodiments of the present invention will be described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a video conference system according to an embodiment of the present invention. The video conference system of this embodiment is a computer system in which a WEB server 1, a rendering server 2, and terminals 3, 3, ... are connected via a network 100. The WEB server 1 comprises a reception unit 11 that receives information from the terminals 3; an information integration unit 12 that synthesizes the voice information from the terminals 3 and associates a value called a display size with the identification information (terminal ID) of each terminal 3; an information transmission unit 13 that transmits information to the rendering server 2; and a terminal DB 14 in which terminal IDs are stored. For convenience, the address of a terminal 3 may be used in place of its terminal ID.

[0012] The rendering server 2 comprises an information reception unit 21 that receives information from the WEB server 1, an image processing unit 22 that composites image information, and a transmission unit 23 that transmits information to each terminal 3. The image processing unit 22 composites the image information according to the display sizes.

[0013] As shown in FIG. 2, the terminal 3 includes an input unit 31 (specifically, a keyboard or the like) through which operations are input, a video camera (camera) 32 that images the participant, a microphone (mic) 33 that detects the participant's voice, a speaker 34 that reproduces voice information, and a display unit 35 on which image information and the like are displayed. It also includes a control unit 36 that converts the signals obtained by the camera 32 and the microphone 33 into data and transmits the data to the WEB server 1, and that causes the display unit 35 and the speaker 34 to display or reproduce the image information and voice information received from the rendering server 2.

[0014] Next, the operation of this embodiment will be described, beginning with terminal authentication. When the input unit 31 of a terminal 3 is operated, the control unit 36 of that terminal 3 transmits its terminal ID to the reception unit 11 of the WEB server 1. If the received terminal ID is in the terminal DB 14, the reception unit 11 performs processing for that terminal 3. Here it is assumed that the terminal IDs of all the terminals 3 were stored in the terminal DB 14, so the terminals of all participants were authenticated without any problem. User authentication by participant ID and password may be combined with this.
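The terminal authentication step can be sketched as a simple membership check. The following is a minimal Python sketch, assuming the terminal DB 14 can be modeled as an in-memory set of terminal IDs; the names `TerminalDB` and `authenticate` are hypothetical and do not appear in the source.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class TerminalDB:
    """Hypothetical stand-in for terminal DB 14: the registered terminal IDs."""

    registered_ids: Set[str] = field(default_factory=set)

    def is_registered(self, terminal_id: str) -> bool:
        return terminal_id in self.registered_ids


def authenticate(db: TerminalDB, announced_ids: List[str]) -> List[str]:
    """Return the announced terminal IDs that are present in the DB.

    Only these terminals are processed further; unknown IDs are ignored.
    Arrival order is preserved.
    """
    return [tid for tid in announced_ids if db.is_registered(tid)]
```

For example, with `TerminalDB({"t1", "t2"})`, announcing `["t1", "t3", "t2"]` would admit `t1` and `t2` and ignore `t3`. The optional participant ID and password check mentioned above is left out of this sketch.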

[0015] The reception unit 11 sends each terminal ID to the information integration unit 12. Based on the total number n of terminal IDs and the previously stored pixel count of the display unit 35 of the terminals 3, the information integration unit 12 determines and stores a display size S for the speaker's image information and a display size s for the non-speakers' image information. S and s are determined and stored as pixel counts, for example 100,000 pixels and 50,000 pixels, respectively.
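The text does not state how S and s are derived from n and the pixel count. One possible rule, shown purely as an assumption, is to fix the ratio S = 2s and require that one speaker tile plus n - 1 non-speaker tiles fit within the screen's pixel budget. The function name `compute_display_sizes` and the `ratio` parameter are hypothetical.

```python
from typing import Tuple


def compute_display_sizes(
    n_terminals: int, screen_pixels: int, ratio: int = 2
) -> Tuple[int, int]:
    """Return (S, s) as pixel counts.

    Assumed rule (not from the source): S = ratio * s, and
    S + (n - 1) * s <= screen_pixels, which gives
    s = screen_pixels // (n - 1 + ratio).
    """
    if n_terminals < 1 or screen_pixels < 1 or ratio < 1:
        raise ValueError("all arguments must be positive")
    small = screen_pixels // (n_terminals - 1 + ratio)
    large = ratio * small
    return large, small
```

Under this assumed rule, five terminals sharing a 300,000-pixel display yield S = 100,000 and s = 50,000, which happens to match the example values given above. Other rules (for instance, a fixed speaker tile) would be equally consistent with the text.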

[0016] Next, the processing performed once the video conference has actually started will be described. The following processing is performed continuously until the conference ends, but for convenience a single iteration of it is described.

[0017] At each terminal 3, the camera 32 images the participant and the microphone 33 detects the participant's voice. The control unit 36 converts the resulting signals into image information and voice information, respectively. The control unit 36 of each terminal 3 then transmits the voice information, the image information, and its own terminal ID to the reception unit 11 of the WEB server 1.

[0018] The reception unit 11 receives the voice information, image information, and terminal ID from each terminal 3 and passes them to the information integration unit 12. The information integration unit 12 synthesizes the pieces of voice information into a single piece of voice information (synthesized voice information). It also selects, from among the pieces of voice information, the one with the highest level, and associates the stored display size S with the terminal ID that was transmitted with the selected voice information (referred to as the speaker terminal ID). Further, the information integration unit 12 associates the stored display size s with the terminal IDs that were sent with the voice information other than the selected voice information, that is, the terminal IDs of the non-speakers. It then groups the terminal ID, image information, and display size into a set for each terminal 3 and sends these sets, together with the synthesized voice information, to the information transmission unit 13, which transmits them to the information reception unit 21 of the rendering server 2.
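The speaker selection and size assignment performed by the information integration unit 12 can be sketched as below. This is a minimal Python sketch, assuming each terminal's voice information has already been reduced to one scalar level (how the level is measured is not stated in the text); the `Frame` type and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Frame:
    """One terminal's contribution for a single processing pass."""

    terminal_id: str
    voice_level: float  # assumed scalar measure of loudness
    image: bytes


def assign_display_sizes(frames: List[Frame], large: int, small: int) -> Dict[str, int]:
    """Give display size S (large) to the loudest terminal and s (small) to the rest."""
    if not frames:
        return {}
    # max() returns the first frame with the highest level, so ties go to
    # the earliest arrival (an assumption; the text does not cover ties).
    speaker = max(frames, key=lambda f: f.voice_level)
    return {
        f.terminal_id: large if f.terminal_id == speaker.terminal_id else small
        for f in frames
    }
```

The returned mapping corresponds to the terminal ID and display size pairs that are sent on to the rendering server 2. Mixing the voice information into the synthesized voice information is a separate step and is not shown.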

[0019] The information reception unit 21 sends the synthesized voice information and the terminal IDs to the transmission unit 23, and sends each pair of display size and image information to the image processing unit 22. The image processing unit 22 composites (renders) the pieces of image information into a single piece of image information (composite image information), arranging for each piece of image information to be displayed at its corresponding display size S or s. It then sends the composite image information to the transmission unit 23. The transmission unit 23 synchronizes the synthesized voice information with the composite image information and transmits them to each terminal 3, using each terminal ID as the destination.
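How the image processing unit 22 arranges the images is not described. As one hypothetical approach, each pixel-count display size could be converted into a 4:3 rectangle and the rectangles packed left to right, largest first, wrapping to a new row when the canvas width is exceeded. The `Tile` type, the 4:3 aspect ratio, and the row-packing strategy in this Python sketch are all assumptions.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Tile:
    terminal_id: str
    x: int
    y: int
    width: int
    height: int


def tile_dimensions(pixel_count: int, aspect_w: int = 4, aspect_h: int = 3) -> Tuple[int, int]:
    """Largest aspect_w:aspect_h rectangle (integer unit) within pixel_count pixels."""
    unit = math.isqrt(pixel_count // (aspect_w * aspect_h))
    return aspect_w * unit, aspect_h * unit


def layout_tiles(sizes: Dict[str, int], canvas_width: int) -> List[Tile]:
    """Place tiles in rows, largest first, so the speaker lands at the top left."""
    tiles: List[Tile] = []
    x = y = row_height = 0
    for terminal_id, pixels in sorted(sizes.items(), key=lambda kv: -kv[1]):
        w, h = tile_dimensions(pixels)
        if x > 0 and x + w > canvas_width:
            # Current row is full: start a new row below the tallest tile.
            x, y, row_height = 0, y + row_height, 0
        tiles.append(Tile(terminal_id, x, y, w, h))
        x += w
        row_height = max(row_height, h)
    return tiles
```

A renderer would then scale each participant's image into its tile and draw it onto the shared canvas. This sketch does not check the canvas height, so a real layout would need an additional fit test.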

[0020] When each terminal 3 receives this information, the control unit 36 causes the speaker 34 to reproduce the synthesized voice information and causes the display unit 35 to display the composite image information. At this time, the image information from the speaker's terminal 3 is displayed larger than the image information from the non-speakers' terminals 3.

[0021] The information integration unit 12 may also store the terminal ID of the past speaker and, when the above processing is performed anew, associate a display size k (where S > k > s) with that terminal ID. With this processing, the image of the past speaker is larger than those of the non-speakers and smaller than that of the current speaker, so the past speaker can be distinguished from both the non-speakers and the current speaker. If the past speaker is to be treated in the same way as the current speaker, the display size S may be associated instead.

[0022] The above processing is performed continuously until the conference ends. When the conference ends, an end notification is transmitted from the terminal 3 of the conference organizer or the like to the reception unit 11 of the WEB server 1, whereupon the WEB server 1 and the rendering server 2 finish processing. In addition, when the voice information and image information from a given terminal 3 have been interrupted for a certain period, processing for that terminal 3 is ended.
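Ending the processing for a terminal whose voice and image information have stopped arriving can be sketched as a last-seen timestamp check. In this Python sketch the class name `ActivityTracker`, the explicit `now` argument, and the timeout value are hypothetical; the text only says "a certain period".

```python
from typing import Dict, List


class ActivityTracker:
    """Tracks when each terminal last sent voice/image information."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout_seconds = timeout_seconds
        self._last_seen: Dict[str, float] = {}

    def touch(self, terminal_id: str, now: float) -> None:
        """Record that information arrived from this terminal at time `now`."""
        self._last_seen[terminal_id] = now

    def drop_silent(self, now: float) -> List[str]:
        """Remove and return terminals silent for longer than the timeout."""
        expired = sorted(
            tid
            for tid, seen in self._last_seen.items()
            if now - seen > self.timeout_seconds
        )
        for tid in expired:
            del self._last_seen[tid]
        return expired

    def active(self) -> List[str]:
        """Terminal IDs still being processed, in sorted order."""
        return sorted(self._last_seen)
```

Passing the time in explicitly keeps the sketch deterministic; a server loop would typically call `touch` on every received frame and `drop_silent` once per processing pass.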

[0023] As described above, in the video conference system of this embodiment, the WEB server 1 (first server) receives, from each terminal, voice information obtained from the voice of the participant using that terminal and selects the voice information having the highest voice level. It associates the display size s (first display size) with the identification information of the terminals other than the source of the selected voice information, i.e., the terminals of non-speakers, and associates the display size S (second display size), larger than the first display size, with the identification information of the terminal that is the source of the selected voice information, i.e., the speaker's terminal. It then notifies the rendering server 2 (second server) of each piece of identification information together with its associated display size. The second server receives the image information obtained by imaging each participant together with the identification information of the terminal that is the source of that image information, composites the image information so that each image is displayed simultaneously at the display size, received from the first server, that corresponds to the identification information received with that image, and transmits the resulting composite image information to each terminal for display. Consequently, when the composite image information is displayed on each terminal, the speaker's image is displayed larger than the non-speakers' images, and the speaker can be identified more easily.

[0024]

[Effects of the Invention] As described above, according to the present invention, the first server selects, from among the voice information from the terminals, the voice information having the highest voice level. It associates the first display size with the identification information of the terminals other than the source of the selected voice information, i.e., the terminals of non-speakers, and associates the second display size, larger than the first display size, with the identification information of the terminal that is the source of the selected voice information, i.e., the speaker's terminal. It then transmits each piece of identification information, together with its associated display size, to the second server. The second server receives the image information obtained by imaging each participant together with the identification information of the terminal that is the source of that image information, composites the image information so that each image is displayed simultaneously at the display size, received from the first server, that corresponds to the identification information received with that image, and transmits the resulting composite image information to each terminal for display. Consequently, when the composite image information is displayed on each terminal, the speaker's image is displayed larger than the non-speakers' images, and the speaker can be identified more easily.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of a video conference system according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of each terminal 3 shown in FIG. 1.

[Explanation of symbols]

1: WEB server
2: rendering server
3, 3, ...: terminals
11: reception unit
12: information integration unit
13: information transmission unit
14: terminal DB
21: information reception unit
22: image processing unit
23: transmission unit
31: input unit
32: video camera
33: microphone
34: speaker
35: display unit
36: control unit
S, s: display sizes

Continued from front page: (72) Inventor: Shigehisa Ozawa, c/o Nippon Telegraph and Telephone Corporation, 2-3-1 Otemachi, Chiyoda-ku, Tokyo. F-terms (reference): 5C064 AA02 AB04 AC04 AC06 AC12 AC16 AC22 AD18; 5K015 AB01

Claims

1. A video conference system in which a first server and a second server are connected to terminals each used by one of a plurality of participants in a conference, wherein the first server receives, from each of the terminals, voice information obtained from the voice of the participant using that terminal; selects, from among the received voice information, the voice information having the highest voice level; associates a first display size with the identification information of each terminal other than the source of the selected voice information; associates a second display size, larger than the first display size, with the identification information of the terminal that is the source of the selected voice information; and transmits each piece of identification information, together with its associated display size, to the second server; and wherein the second server receives the image information obtained by imaging each participant and the identification information of the terminal that is the source of that image information; composites the image information so that each image is displayed simultaneously at the display size, received from the first server, that corresponds to the identification information received with that image; and transmits the resulting composite image information to each terminal for display.

2. The video conference system according to claim 1, wherein, when image information is newly transmitted from a terminal identified by identification information with which the second display size has been associated, the first server associates with that identification information a third display size that is smaller than the second display size and larger than the first display size.

3. A video conference method performed by a video conference system in which a first server and a second server are connected to terminals each used by one of a plurality of participants in a conference, wherein the first server receives, from each of the terminals, voice information obtained from the voice of the participant using that terminal; selects, from among the received voice information, the voice information having the highest voice level; associates a first display size with the identification information of each terminal other than the source of the selected voice information; associates a second display size, larger than the first display size, with the identification information of the terminal that is the source of the selected voice information; and transmits each piece of identification information, together with its associated display size, to the second server; and wherein the second server receives the image information obtained by imaging each participant and the identification information of the terminal that is the source of that image information; composites the image information so that each image is displayed simultaneously at the display size, received from the first server, that corresponds to the identification information received with that image; and transmits the resulting composite image information to each terminal for display.

4. The video conference method according to claim 3, wherein, when image information is newly transmitted from a terminal identified by identification information with which the second display size has been associated, the first server associates with that identification information a third display size that is smaller than the second display size and larger than the first display size.