JP2023121274A

JP2023121274A - Method for controlling conference system, terminal device, and program

Info

Publication number: JP2023121274A
Application number: JP2022024527A
Authority: JP
Inventors: 拓生大西; Takuo Onishi
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 2022-02-21
Filing date: 2022-02-21
Publication date: 2023-08-31

Abstract

To provide a method for promoting equalization of speaking opportunities when a conference is held using a plurality of terminals. SOLUTION: A method for controlling a conference system 100 includes: displaying a first image corresponding to a first terminal at a first size based on a first speech amount indicating the time length of a first voice input from the first terminal; and displaying a second image corresponding to a second terminal side by side with the first image at a second size based on a second speech amount indicating the time length of a second voice input from the second terminal. When the first speech amount is larger than the second speech amount, the first size is smaller than the second size. SELECTED DRAWING: Figure 1

Description

The present invention relates to a method for controlling a conference system, a terminal device, and a program.

Conference systems that use computers are conventionally known. For example, Patent Document 1 discloses a configuration in which, in a system including a management server and participant terminals, the right to speak is given preferentially to participants who speak less frequently over participants who speak more frequently. This configuration is said to equalize the participants' opportunities to speak.

Patent Document 1: Japanese Patent Application Laid-Open No. 2003-304337

In the configuration of Patent Document 1, the management server grants the right to speak. Which method of equalizing speaking opportunities is preferable depends on the users and on the style of the conference, so a wider variety of methods has been desired. Accordingly, a new method for promoting the equalization of speaking opportunities has been desired.

One aspect of the present disclosure is a method for controlling a conference system, the method including: displaying a first image corresponding to a first terminal at a first size based on a first speech amount indicating the time length of a first voice input from the first terminal; and displaying a second image corresponding to a second terminal, side by side with the first image, at a second size based on a second speech amount indicating the time length of a second voice input from the second terminal, wherein, when the first speech amount is larger than the second speech amount, the first size is smaller than the second size.
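As a rough illustration of the relationship stated in this aspect, the following Python sketch assigns each terminal's image a size that shrinks as that terminal's speech amount grows. The function name `image_sizes`, the linear interpolation, and the pixel bounds `min_px` and `max_px` are assumptions made only for illustration; the aspect itself requires only that a larger speech amount results in a smaller size.

```python
def image_sizes(speech_seconds, min_px=120, max_px=360):
    """Map each terminal's speech amount (seconds) to an image width in pixels.

    Hypothetical rule: linear interpolation, so the terminal with the largest
    speech amount gets min_px and the one with the smallest gets max_px.
    """
    lo = min(speech_seconds.values())
    hi = max(speech_seconds.values())
    if hi == lo:
        # Equal speech amounts: every image gets the same mid-range size.
        mid = (min_px + max_px) // 2
        return {terminal: mid for terminal in speech_seconds}
    sizes = {}
    for terminal, amount in speech_seconds.items():
        share = (amount - lo) / (hi - lo)  # 0.0 = least speech, 1.0 = most speech
        sizes[terminal] = round(max_px - share * (max_px - min_px))
    return sizes


# Terminal 1A has spoken for 300 s and terminal 1B for 60 s,
# so the image for 1A comes out smaller than the image for 1B.
sizes = image_sizes({"1A": 300.0, "1B": 60.0})
```

Any monotonically decreasing mapping would satisfy the stated condition; the linear form is used here only because it is easy to read.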

Another aspect of the present disclosure is a terminal device including a control circuit, wherein the control circuit executes: outputting, to a display device, a first image corresponding to a first terminal at a first size based on a first speech amount indicating the time length of a first voice input from the first terminal; and outputting, to the display device, a second image corresponding to a second terminal, side by side with the first image, at a second size based on a second speech amount indicating the time length of a second voice input from the second terminal, wherein, when the first speech amount is larger than the second speech amount, the first size is smaller than the second size.

Another aspect of the present disclosure is a program executed by a computer capable of communicating with a first terminal, a second terminal, and a third terminal, the program causing the computer to execute: causing the third terminal to display a first image corresponding to the first terminal at a first size based on a first speech amount indicating the time length of a first voice input from the first terminal; and causing the third terminal to display a second image corresponding to the second terminal, side by side with the first image, at a second size based on a second speech amount indicating the time length of a second voice input from the second terminal, wherein, when the first speech amount is larger than the second speech amount, the first size is smaller than the second size.

A diagram showing an example of the configuration of a conference system according to an embodiment.
An explanatory diagram showing an overview of the operation of the conference system.
A block diagram of the devices constituting the conference system.
A diagram showing an example of a screen displayed on a terminal.
A diagram showing an example of a screen displayed on a terminal.
A flowchart showing an example of the operation of the server.
A flowchart showing an example of the operation of the server.

[1. Overview of conference system]
Hereinafter, this embodiment will be described with reference to the drawings.
FIG. 1 is a diagram showing an example of the configuration of a conference system 100 according to this embodiment.
The conference system 100 is a system in which a plurality of users U use computers to hold a conference by sharing audio and video. The conference system 100 includes a plurality of terminals 1, each used by one of the users U, and a server 50. The terminals 1 and the server 50 are connected by a communication network 7 so as to be able to exchange data with each other.

The number of terminals 1 included in the conference system 100 is not limited. The server 50 may be a single computer, may be composed of a plurality of computers, or may be a cloud server. The server 50 corresponds to an example of a control device. The configuration in which the server 50 is included in the conference system 100 as a device separate from the terminals 1 is only an example. For example, any one of the terminals 1 participating in a conference in the conference system 100 may have the functions of the server 50.

In this embodiment, as shown in FIG. 1, an example in which a conference is held using four terminals 1 will be described. In the following description, the four terminals 1 are called terminals 1A, 1B, 1C, and 1D. When they are not distinguished from one another, they are referred to as the terminal 1. The terminals 1A, 1B, 1C, and 1D are computers having a communication function. Specifically, they are desktop PCs (Personal Computers), tablet PCs, smartphones, or the like. The terminals 1A, 1B, 1C, and 1D correspond to examples of a terminal device.

In the example shown in FIG. 1, the terminal 1A is installed at a site S1, the terminal 1B at a site S2, the terminal 1C at a site S3, and the terminal 1D at a site S4. The geographic relationship among the sites S1 to S4 is not restricted. The sites S1 to S4 may be places remote from one another, places within the same building, or places partitioned from one another within the same room. In the following description, the sites S1 to S4 are referred to as the site S when they are not distinguished.

The number of users U using a terminal 1 is not limited. For example, a plurality of users U may use one terminal 1 to participate in a conference. If the terminal 1 is a portable computer, the user U may carry the terminal 1 around and use it. In this embodiment, the terminal 1A is used by a user U1, the terminal 1B by a user U2, the terminal 1C by a user U3, and the terminal 1D by a user U4. The users U1, U2, U3, and U4 are referred to as the user U when they are not distinguished.

The communication network 7 may be a LAN (Local Area Network) or a WAN (Wide Area Network). Alternatively, it may be a global network that includes a dedicated line, a public line network, the Internet, and the like.

The terminal 1A has a display 14a, a keyboard 151a, a mouse 152a, a camera 16a, a microphone 17a, and a speaker 18a. Each of these devices is connected to the main body of the terminal 1A by wire or wirelessly. At least one of the devices may be integrated into the main body of the terminal 1A. In either case, these devices may be referred to as the display 14a of the terminal 1A, the microphone 17a of the terminal 1A, and so on. The display 14a is a display device that has a display panel and displays images and characters in a display area 141a of the display panel. The display panel of the display 14a is, for example, a liquid crystal display panel, an organic EL (Electro-Luminescence) panel, or a plasma display panel. The keyboard 151a and the mouse 152a are input devices that the user U1 uses for input operations. The camera 16a captures images of the user U1. The microphone 17a collects the voice of the user U1. The speaker 18a outputs the audio of the conference. The user U1 uses these devices to participate in the conference.

The terminal 1B includes a display 14b, a keyboard 151b, a mouse 152b, a camera 16b, a microphone 17b, and a speaker 18b. Each of these devices is connected to the main body of the terminal 1B by wire or wirelessly. Like the display 14a, the display 14b is a display device that has a display panel and displays images and characters in a display area 141b of the display panel. The keyboard 151b, the camera 16b, the microphone 17b, and the speaker 18b are configured in the same manner as the keyboard 151a, the camera 16a, the microphone 17a, and the speaker 18a, respectively.

The camera 16b captures images of the user U2. The microphone 17b collects the voice of the user U2. The speaker 18b outputs the audio of the conference. The user U2 uses these devices to participate in the conference.

The terminal 1C includes a display 14c, a keyboard 151c, a mouse 152c, a camera 16c, a microphone 17c, and a speaker 18c. Each of these devices is connected to the main body of the terminal 1C by wire or wirelessly. Like the display 14a, the display 14c is a display device that has a display panel and displays images and characters in a display area 141c of the display panel. The keyboard 151c, the camera 16c, the microphone 17c, and the speaker 18c are configured in the same manner as the keyboard 151a, the camera 16a, the microphone 17a, and the speaker 18a, respectively.

The camera 16c captures images of the user U3. The microphone 17c collects the voice of the user U3. The speaker 18c outputs the audio of the conference. The user U3 uses these devices to participate in the conference.

The terminal 1D includes a display 14d, a keyboard 151d, a mouse 152d, a camera 16d, a microphone 17d, and a speaker 18d. Each of these devices is connected to the main body of the terminal 1D by wire or wirelessly. Like the display 14a, the display 14d is a display device that has a display panel and displays images and characters in a display area 141d of the display panel. The keyboard 151d, the camera 16d, the microphone 17d, and the speaker 18d are configured in the same manner as the keyboard 151a, the camera 16a, the microphone 17a, and the speaker 18a, respectively.

The camera 16d captures images of the user U4. The microphone 17d collects the voice of the user U4. The speaker 18d outputs the audio of the conference. The user U4 uses these devices to participate in the conference.
The displays 14a, 14b, 14c, and 14d are referred to as the display 14 when they are not distinguished.

FIG. 2 is an explanatory diagram showing an overview of the operation of the conference system 100.
A conference held using the conference system 100 is conducted by a plurality of users U sharing at least one another's voices. The conference system 100 may also be configured so that the conference is held while sharing images or video captured by the cameras 16a, 16b, 16c, and 16d. In this embodiment, the conference system 100 holds the conference while sharing the video captured by the cameras 16a, 16b, 16c, and 16d.

Data transmitted from the terminals 1A, 1B, 1C, and 1D to the server 50 is called terminal data D1. The terminal data D1 includes terminal data D11 transmitted by the terminal 1A, terminal data D12 transmitted by the terminal 1B, terminal data D13 transmitted by the terminal 1C, and terminal data D14 transmitted by the terminal 1D.

The server 50 transmits server data D2 to the terminals 1A, 1B, 1C, and 1D. The server data D2 includes server data D21, D22, D23, and D24.

The terminal 1A transmits, to the server 50, terminal data D11 that includes video data based on video captured by the camera 16a, audio data based on audio collected by the microphone 17a, and operation data based on operation of the keyboard 151a or the mouse 152a. Similarly, the terminal 1B transmits, to the server 50, terminal data D12 that includes video data based on video captured by the camera 16b, audio data based on audio collected by the microphone 17b, and operation data based on operation of the keyboard 151b or the mouse 152b. The terminal 1C transmits, to the server 50, terminal data D13 that includes video data based on video captured by the camera 16c, audio data based on audio collected by the microphone 17c, and operation data based on operation of the keyboard 151c or the mouse 152c. The terminal 1D transmits, to the server 50, terminal data D14 that includes video data based on video captured by the camera 16d, audio data based on audio collected by the microphone 17d, and operation data based on operation of the keyboard 151d or the mouse 152d.

The server 50 distributes, to each terminal 1, the audio of the conference held with the conference system 100 and the video of each user U during the conference. The server 50 generates integrated audio data based on the audio data included in the terminal data D1. The integrated audio data includes the audio collected by each of the microphones 17a, 17b, 17c, and 17d. The server 50 generates display data based on the video data included in the terminal data D1. The display data includes the video captured by each of the cameras 16a, 16b, 16c, and 16d.

The terminal 1A receives the server data D21 and outputs audio from the speaker 18a based on the integrated audio data included in the server data D21. The terminal 1A also displays the display data included in the server data D21 on the display 14a. Similarly, the terminal 1B outputs audio from the speaker 18b and displays on the display 14b based on the server data D22. The terminal 1C outputs audio from the speaker 18c and displays on the display 14c based on the server data D23. The terminal 1D outputs audio from the speaker 18d and displays on the display 14d based on the server data D24. As a result, every user U of the conference system 100 can hear the voices of the other users U, and a voice conference can be held.

The server 50 may transmit different integrated audio data to each of the terminals 1A, 1B, 1C, and 1D. For example, the server 50 transmits to the terminal 1A server data D21 that does not include the audio collected by the microphone 17a. In this case, the speaker 18a does not output the audio collected by the microphone 17a. That is, the voice of the user U1 is not output from the speaker 18a. Because the user U1 does not hear his or her own voice from the speaker 18a, the user U1 is spared the resulting sense of discomfort. The server 50 can apply the same control to the server data D22, D23, and D24.
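The per-terminal integrated audio described above can be pictured with the following Python sketch, which mixes the audio of every terminal except the recipient. The function name `mix_for_recipient`, the representation of audio as lists of 16-bit integer samples, and mixing by simple summation are assumptions for illustration; the text does not specify how the server 50 combines the audio.

```python
def mix_for_recipient(frames, recipient):
    """Sum equal-length sample frames from every terminal except the recipient.

    frames: dict mapping a terminal name to a list of integer PCM samples
    (a hypothetical representation chosen for this sketch).
    """
    others = [samples for terminal, samples in frames.items() if terminal != recipient]
    if not others:
        return []
    mixed = [sum(column) for column in zip(*others)]
    # Clamp to the signed 16-bit range so that summation cannot overflow it.
    return [max(-32768, min(32767, sample)) for sample in mixed]


frames = {"1A": [100, 200], "1B": [10, 20], "1C": [1, 2], "1D": [0, 0]}
# Audio destined for terminal 1A omits what microphone 17a collected.
d21_audio = mix_for_recipient(frames, "1A")
```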

The terminal 1A may generate terminal data D11 that includes video data obtained by processing the image captured by the camera 16a. Alternatively, the terminal 1A may generate terminal data D11 that does not include the image captured by the camera 16a. In this case, the terminal 1A may transmit terminal data D11 that includes dummy video data in place of the image captured by the camera 16a, or may transmit terminal data D11 that includes no video data. When the server 50 receives terminal data D11 that includes no video data, it may generate the server data D2 using an icon image stored in advance in association with the terminal 1A. The same applies to the terminals 1B, 1C, and 1D.
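A minimal sketch of the icon fallback follows, in Python. The `TerminalData` class, the `ICONS` table, and the function `image_for_display` are hypothetical names introduced here; the text states only that the server may use a pre-stored icon image when terminal data arrives without video data.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TerminalData:
    """Hypothetical container for one terminal's data."""
    terminal_id: str
    video: Optional[bytes] = None  # None models terminal data with no video data
    audio: Optional[bytes] = None


# Hypothetical store of icon images registered in advance for each terminal.
ICONS: Dict[str, bytes] = {"1A": b"icon-1A", "1B": b"icon-1B"}


def image_for_display(data: TerminalData) -> bytes:
    """Return the received video if present, otherwise the terminal's stored icon."""
    if data.video is not None:
        return data.video
    return ICONS[data.terminal_id]


without_video = image_for_display(TerminalData("1A"))
with_video = image_for_display(TerminalData("1B", video=b"frame"))
```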

[2. Configuration of the devices constituting the conference system]
FIG. 3 is a block diagram of the devices constituting the conference system 100.
FIG. 3 illustrates the functional configurations of the terminal 1A and the server 50. Because the terminals 1B, 1C, and 1D are configured in the same manner as the terminal 1A, detailed illustration of their configurations is omitted.

As shown in FIG. 3, the terminal 1A has a control circuit 11, to which the display 14a, an input device 15, the camera 16a, the microphone 17a, the speaker 18a, and a communication device 19 are connected.

The input device 15 is a device that the user U uses for input operations. The input device 15 includes, for example, the keyboard 151a and the mouse 152a. The input device 15 may be a touch sensor overlaid on the display panel of the display 14a, or may be another device.

The communication device 19 is connected to the communication network 7 and performs data communication with the server 50 via the communication network 7. The communication device 19 includes, for example, a connector for connecting a communication cable and a communication interface circuit. The communication device 19 may instead include an antenna and a wireless communication circuit and be connected to the communication network 7 through a wireless communication line.

The control circuit 11 includes a processor 12 and a memory 13. The processor 12 is composed of, for example, a CPU (Central Processing Unit), an MPU (Micro-Processing Unit), or another integrated circuit. The processor 12 may be a single processor or a plurality of processors. The processor 12 controls each part of the terminal 1A by executing a program.

The memory 13 is a storage device that stores, in a nonvolatile manner, the programs executed by the processor 12 and data. The memory 13 is composed of a magnetic storage device, a semiconductor storage element such as a flash ROM (Read Only Memory), or another type of nonvolatile storage device. The memory 13 may include a RAM (Random Access Memory) that forms a work area for the processor 12. The memory 13 stores the data processed by the control circuit 11 and the control program executed by the processor 12.

The processor 12 acquires the image data captured by the camera 16a and generates video data based on it. The processor 12 converts the audio collected by the microphone 17a into digital audio data and generates audio data based on the digital audio data. The processor 12 accepts input from the input device 15 and generates operation data based on the accepted input.

The processor 12 generates terminal data D11 that includes the video data, the audio data, and the operation data, and transmits it to the server 50 through the communication device 19.

The processor 12 receives, through the communication device 19, the server data D21 transmitted by the server 50. The processor 12 controls the display 14a based on the display data included in the server data D21, thereby causing the display 14a to display a conference screen 61. The processor 12 causes the speaker 18a to output audio based on the integrated audio data included in the server data D21.

The terminals 1B, 1C, and 1D are configured in the same manner as the terminal 1A. That is, each of the terminals 1B, 1C, and 1D includes a control circuit that includes a processor. Under the control of the processor, each of the terminals 1B, 1C, and 1D receives the server data D2 from the server 50 and performs audio output and display. Also under the control of the processor, each of the terminals 1B, 1C, and 1D generates terminal data D1 that includes audio data, video data, and operation data, and transmits it to the server 50.

The server 50 has a server control circuit 51. A communication device 54 is connected to the server control circuit 51. The communication device 54 is connected to the communication network 7 and performs data communication with the terminals 1 via the communication network 7. The communication device 54 includes, for example, a connector for connecting a communication cable and a communication interface circuit. The communication device 54 may instead include an antenna and a wireless communication circuit and be connected to the communication network 7 through a wireless communication line.

The server control circuit 51 includes a processor 52 and a memory 53. The processor 52 is composed of a CPU, an MPU, or another integrated circuit. The processor 52 controls each part of the server 50 by executing a program. The specific configuration of the processor 52 is, for example, the same as that of the processor 12, and the specific configuration of the memory 53 is the same as that of the memory 13. The server control circuit 51 corresponds to an example of a computer.

By executing a control program 531 stored in the memory 53, the processor 52 functions as a communication control unit 521, a setting unit 522, a terminal identification unit 523, an audio processing unit 524, and a display processing unit 525. The memory 53 stores the control program 531 and conference data 532. The conference data 532 includes information about a conference. The information about the conference is entered, for example, by a user U operating a terminal 1. The information about the conference is, for example, the date and time of the conference, its start time and end time, the participating users U, and the title of the conference. The conference data 532 may also include information generated by the server control circuit 51 while the conference is in progress.

The communication control unit 521 controls the communication device 54 to perform data communication with the terminals 1. The communication control unit 521 causes the communication device 54 to receive the terminal data D1 transmitted by the terminals 1. The communication control unit 521 transmits the server data D2 generated by the processor 52 to the terminals 1.

The setting unit 522 configures settings for a conference that uses the conference system 100, based on the operation data included in the terminal data D1.

When terminal data D1 is received from a terminal 1, the terminal identification unit 523 identifies the terminal 1 that transmitted the terminal data D1. In the conference system 100 of this embodiment, each terminal 1 is used by one user U. The result of identifying a terminal 1 by the terminal identification unit 523 can therefore be regarded as the result of identifying a user U. For example, when the terminal identification unit 523 identifies the terminal 1A as the source of the terminal data D1, the audio processing unit 524 regards that terminal data D1 as data containing the voice of the user U1.

The audio processing unit 524 processes the audio data included in the terminal data D1. For example, the audio processing unit 524 generates the integrated audio data based on the audio data included in the terminal data D1. The audio processing unit 524 may generate integrated audio data common to all the terminals 1, or may generate different integrated audio data for each terminal 1.

The audio processing unit 524 analyzes the audio data included in the terminal data D1. The audio processing unit 524 has a function of detecting the voice of the user U contained in the audio data, and detects that the user U has started speaking and that the user U has finished speaking.
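The text does not say how the start and end of speech are detected. Purely as an assumed stand-in, the Python sketch below marks a frame as speech when its level reaches a fixed threshold and reports each run of speech frames as one utterance. The function name `detect_utterances`, the per-frame level input, the frame length, and the threshold value are all hypothetical; a practical system would likely use a more robust voice activity detector.

```python
def detect_utterances(frame_levels, frame_sec=0.02, threshold=0.1):
    """Return (start_sec, end_sec) pairs for runs of frames at or above threshold.

    frame_levels: one loudness value per fixed-length audio frame
    (an assumed input format for this sketch).
    """
    utterances = []
    start = None
    for index, level in enumerate(frame_levels):
        if level >= threshold and start is None:
            start = index  # speech has started
        elif level < threshold and start is not None:
            utterances.append((start * frame_sec, index * frame_sec))  # speech ended
            start = None
    if start is not None:
        # The audio ended while the user was still speaking.
        utterances.append((start * frame_sec, len(frame_levels) * frame_sec))
    return utterances


# One-second frames keep the arithmetic easy to follow.
found = detect_utterances([0.0, 0.5, 0.6, 0.0, 0.0, 0.3], frame_sec=1.0)
```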

音声処理部５２４は、ユーザーＵの発言量を算出する。発言量は、発言の時間の長さ、すなわち発言の時間長であってもよいし、会議における発言回数であってもよいし、発言頻度であってもよい。発言頻度とは会議における単位時間あたりの発言回数をいう。本実施形態では一例として、発言量を、ユーザーＵが発話した時間長とする。時間長は、秒、分、時間、或いはその他の単位で表現される。より詳細には、発言量は、ユーザーＵの発話が開始してから終了するまでの時間の長さである。これを、発話時間の時間長と呼ぶ。 The voice processing unit 524 calculates the amount of user U's speech. The amount of speech may be the length of time of speech, that is, the length of time of speech, the number of speeches in a conference, or the frequency of speech. Speech frequency refers to the number of speeches per unit time in a conference. In this embodiment, as an example, the amount of speech is the length of time that the user U has spoken. The length of time may be expressed in seconds, minutes, hours, or some other unit. More specifically, the amount of speech is the length of time from when user U's speech starts to when it ends. This is called the time length of speech time.
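
The three candidate definitions of the amount of speech named above (time length, number of utterances, and frequency) can be sketched as follows. This is only an illustration: the `Utterance` record and the function names are hypothetical and do not appear in the specification, and times are assumed to be seconds measured from the start of the conference.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """One detected utterance (hypothetical record, not from the specification)."""
    start: float  # seconds from the start of the conference
    end: float    # seconds from the start of the conference


def speech_duration(utterances):
    """Amount of speech as the total time length of all utterances."""
    return sum(u.end - u.start for u in utterances)


def speech_count(utterances):
    """Amount of speech as the number of utterances in the conference."""
    return len(utterances)


def speech_frequency(utterances, elapsed_seconds):
    """Amount of speech as the number of utterances per unit time (per second here)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return len(utterances) / elapsed_seconds


# Example: two utterances, 10 s and 20 s long, within the first minute.
example = [Utterance(0.0, 10.0), Utterance(30.0, 50.0)]
total = speech_duration(example)       # 30.0 seconds
count = speech_count(example)          # 2 utterances
```

The embodiment uses the first definition, the time length of speech.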

音声処理部５２４は、会議の実行中、端末１を使用するユーザーＵごとに、発話時間の時間長を積算する。例えば、音声処理部５２４は、会議の実行中、ユーザーＵ１の発話の終了を検出した場合に、発話時間の時間長をメモリー１３に記憶させる。その後、音声処理部５２４は、ユーザーＵ１の発話の終了を検出すると、検出した発話時間の時間長を、メモリー１３に記憶されているユーザーＵ１の発言時間の時間長に加算することにより、発話時間の積算値を算出する。音声処理部５２４は、算出した発話時間の積算値をメモリー１３に記憶させる。発話時間の積算値は、例えば、会議データ５３２に含まれる。音声処理部５２４は、会議に参加するユーザーＵ毎に、発話時間の積算値を更新する処理を実行する。このため、会議データ５３２は、会議の参加者であるユーザーＵ毎に、発話時間の積算値を含む。音声処理部５２４は、発話時間の積算値を更新する処理を、会議が終了するまで継続する。音声処理部５２４は、会議が終了した後に、メモリー１３が記憶するユーザーＵ毎の発話時間の積算値をリセットしてもよい。 The voice processing unit 524 accumulates the length of speech time for each user U who uses a terminal 1 during the conference. For example, when the voice processing unit 524 detects the end of an utterance by the user U1 during the conference, it stores the length of that speech time in the memory 13. Thereafter, each time the voice processing unit 524 detects the end of an utterance by the user U1, it adds the detected length of speech time to the length of speech time of the user U1 stored in the memory 13, thereby calculating the integrated value of the speech time. The voice processing unit 524 stores the calculated integrated value of the speech time in the memory 13. The integrated value of the speech time is included in the conference data 532, for example. The voice processing unit 524 executes the process of updating the integrated value of the speech time for each user U who participates in the conference. Therefore, the conference data 532 includes an integrated value of the speech time for each user U who is a participant in the conference. The voice processing unit 524 continues the process of updating the integrated value of the speech time until the conference ends. The voice processing unit 524 may reset the integrated value of the speech time stored in the memory 13 for each user U after the conference ends.
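
The per-user accumulation described above can be sketched as a small in-memory store. The class and method names are hypothetical, and a plain dictionary stands in for the conference data 532; the specification does not prescribe any particular data structure.

```python
from collections import defaultdict


class SpeechAccumulator:
    """Keeps an integrated speech time per user (illustrative stand-in for conference data 532)."""

    def __init__(self):
        # user id -> integrated speech time in seconds
        self._totals = defaultdict(float)

    def add_utterance(self, user_id, start, end):
        """Add one finished utterance and return the updated integrated value."""
        if end < start:
            raise ValueError("utterance ends before it starts")
        self._totals[user_id] += end - start
        return self._totals[user_id]

    def total(self, user_id):
        """Integrated speech time of a user; 0.0 if the user has not spoken yet."""
        return self._totals.get(user_id, 0.0)

    def reset(self):
        """Optional reset after the conference ends."""
        self._totals.clear()


acc = SpeechAccumulator()
acc.add_utterance("U1", 0.0, 10.0)
acc.add_utterance("U1", 30.0, 50.0)   # U1 now has 30.0 seconds in total
acc.add_utterance("U2", 12.0, 20.0)   # U2 has 8.0 seconds in total
```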

本実施形態では、サーバー５０は、音声処理部５２４が積算する発話時間の時間長の積算値を、発言量の指標として用いる。いずれか１人のユーザーＵの発話時間の時間長の積算値は、第１発言量の一例に対応する。この場合、他のユーザーＵのうち１人のユーザーＵの発話時間の時間長の積算値は、第２発言量の一例に対応する。 In this embodiment, the server 50 uses the integrated value of the length of speech time accumulated by the voice processing unit 524 as an index of the amount of speech. The integrated value of the speech duration of any one user U corresponds to an example of the first speech volume. In this case, the integrated value of the speech duration of one user U among the other users U corresponds to an example of the second speech volume.

表示処理部５２５は、会議用の画面を表示するための表示データを生成する。表示処理部５２５は、表示データと、統合音声データとを含むサーバーデータＤ２を生成して、端末１に送信する。 The display processing unit 525 generates display data for displaying a conference screen. The display processing unit 525 generates server data D2 including display data and integrated audio data, and transmits the generated server data D2 to the terminal 1 .

表示処理部５２５は、カメラ１６ａの撮影画像、カメラ１６ｂの撮影画像、カメラ１６ｃの撮影画像、及び、カメラ１６ｄの撮影画像を含む会議画面６１を端末１が表示するための表示データを生成する。 The display processing unit 525 generates display data for the terminal 1 to display the conference screen 61 including the image captured by the camera 16a, the image captured by the camera 16b, the image captured by the camera 16c, and the image captured by the camera 16d.

図４は、端末１に表示される画面の一例を示す図であり、会議画面６１の例を示す。会議画面６１は、端末１が、サーバーデータＤ２に基づいて表示する画面であり、会議の実行中に表示される。図４には端末１Ａがディスプレイ１４ａに表示する会議画面６１を示すが、端末１Ｂ、１Ｃ、１Ｄも同様の会議画面６１を表示する。 FIG. 4 is a diagram showing an example of a screen displayed on the terminal 1, namely an example of the conference screen 61. The conference screen 61 is a screen that the terminal 1 displays based on the server data D2 while the conference is in progress. FIG. 4 shows the conference screen 61 that the terminal 1A displays on the display 14a; the terminals 1B, 1C, and 1D display similar conference screens 61.

会議画面６１は、複数の表示枠６１１ａ、６１１ｂ、６１１ｃ、６１１ｄを有する。これらを総称して表示枠６１１と記載する。表示枠６１１は、１つの端末１に対応する表示領域である。表示枠６１１には、端末１を使用するユーザーＵのユーザー画像６２１が表示される。会議画面６１は、詳細には、ユーザー画像６２１ａ、６２１ｂ、６２１ｃ、６２１ｄを含む。これらを区別しない場合にユーザー画像６２１と記載する。ユーザー画像６２１ａ、６２１ｂ、６２１ｃ、６２１ｄのいずれかは第１画像の一例に対応し、他のいずれかは第２画像の一例に対応する。 The conference screen 61 has a plurality of display frames 611a, 611b, 611c, 611d. These are collectively described as a display frame 611 . A display frame 611 is a display area corresponding to one terminal 1 . A user image 621 of the user U who uses the terminal 1 is displayed in the display frame 611 . The conference screen 61 specifically includes user images 621a, 621b, 621c, and 621d. When these are not distinguished, they are described as a user image 621 . Any of the user images 621a, 621b, 621c, and 621d corresponds to an example of the first image, and any of the others corresponds to an example of the second image.

ユーザー画像６２１ａは、ユーザーＵ１に対応する画像であり、端末１Ａに対応する画像であるともいえる。ユーザー画像６２１ａは、端末１Ａがカメラ１６ａによって撮影した映像に基づく、映像または静止画像である。例えば、表示処理部５２５は、端末データＤ１１に含まれる映像データに基づき、ユーザー画像６２１ａを生成する。ユーザー画像６２１ａは、カメラ１６ａの撮影画像と同一の画像であってもよいし、カメラ１６ａの撮影画像を加工した画像であってもよい。例えば、ユーザー画像６２１ａは、カメラ１６ａの撮影画像をトリミングした画像であってもよい。また、ユーザー画像６２１ａは、カメラ１６ａの撮影画像とは関係のない画像であってもよい。例えば、ユーザー画像６２１ａは、サーバー５０が端末１Ａに対応付けてメモリー５３に記憶するアイコンやダミー画像であってもよい。 The user image 621a is an image corresponding to the user U1, and can be said to be an image corresponding to the terminal 1A. The user image 621a is a video or still image based on video captured by the terminal 1A with the camera 16a. For example, the display processing unit 525 generates the user image 621a based on the video data included in the terminal data D11. The user image 621a may be the same image as the image captured by the camera 16a, or may be an image obtained by processing the image captured by the camera 16a. For example, the user image 621a may be an image obtained by trimming the image captured by the camera 16a. Also, the user image 621a may be an image unrelated to the captured image of the camera 16a. For example, the user image 621a may be an icon or a dummy image that the server 50 stores in the memory 53 in association with the terminal 1A.

ユーザー画像６２１ｂは、ユーザーＵ２に対応する画像であり、端末１Ｂに対応する画像であるともいえる。ユーザー画像６２１ｂは、端末１Ｂがカメラ１６ｂによって撮影した映像に基づく、映像または静止画像である。例えば、表示処理部５２５は、端末データＤ１２に含まれる映像データに基づき、ユーザー画像６２１ｂを生成する。ユーザー画像６２１ｂは、カメラ１６ｂの撮影画像と同一の画像であってもよいし、カメラ１６ｂの撮影画像を加工した画像であってもよい。また、ユーザー画像６２１ｂは、カメラ１６ｂの撮影画像とは関係のない画像であってもよい。 The user image 621b is an image corresponding to the user U2, and can be said to be an image corresponding to the terminal 1B. The user image 621b is a video or a still image based on video captured by the terminal 1B with the camera 16b. For example, the display processing unit 525 generates the user image 621b based on the video data included in the terminal data D12. The user image 621b may be the same image as the image captured by the camera 16b, or may be an image obtained by processing the image captured by the camera 16b. Also, the user image 621b may be an image unrelated to the captured image of the camera 16b.

ユーザー画像６２１ｃ、及び、ユーザー画像６２１ｄはユーザー画像６２１ａ、６２１ｂと同様の画像である。すなわち、ユーザー画像６２１ｃは、ユーザーＵ３に対応する画像であり、端末１Ｃに対応する画像であるともいえる。ユーザー画像６２１ｄは、ユーザーＵ４に対応する画像であり、端末１Ｄに対応する画像であるともいえる。これらのユーザー画像６２１ｃ、６２１ｄは、カメラ１６ｃ、１６ｄの撮影画像、カメラ１６ｃ、１６ｄの撮影画像を加工した画像、或いは、カメラ１６ｃ、１６ｄの撮影画像とは関係のない画像である。 A user image 621c and a user image 621d are images similar to the user images 621a and 621b. That is, the user image 621c is an image corresponding to the user U3, and can be said to be an image corresponding to the terminal 1C. The user image 621d is an image corresponding to the user U4, and can be said to be an image corresponding to the terminal 1D. These user images 621c and 621d are images captured by the cameras 16c and 16d, images obtained by processing the images captured by the cameras 16c and 16d, or images unrelated to the images captured by the cameras 16c and 16d.

例えば、サーバー５０は、ユーザー画像６２１ａ、６２１ｂ、６２１ｃ、６２１ｄとして使用するダミー画像やアイコンの画像を、メモリー５３に記憶する。表示処理部５２５は、端末１の操作により指示された場合、或いは、事前に設定された場合に、メモリー５３が記憶するダミー画像やアイコンの画像を、ユーザー画像６２１ａ、６２１ｂ、６２１ｃ、６２１ｄとして使用する。 For example, the server 50 stores in the memory 53 dummy images and icon images to be used as the user images 621a, 621b, 621c, and 621d. When instructed by an operation on the terminal 1, or when so configured in advance, the display processing unit 525 uses the dummy images or icon images stored in the memory 53 as the user images 621a, 621b, 621c, and 621d.

表示枠６１１ａは、ユーザー画像６２１ａが配置される領域である。表示枠６１１ｂはユーザー画像６２１ｂが配置される領域であり、表示枠６１１ｃはユーザー画像６２１ｃが配置される領域である。表示枠６１１ｄは、ユーザー画像６２１ｄが配置される領域である。 The display frame 611a is an area in which the user image 621a is arranged. A display frame 611b is an area in which a user image 621b is arranged, and a display frame 611c is an area in which a user image 621c is arranged. The display frame 611d is an area in which the user image 621d is arranged.

表示処理部５２５は、会議の開始時において、ユーザー画像６２１ａ、６２１ｂ、６２１ｃ、６２１ｄの大きさが均等となる表示データを生成する。ユーザー画像６２１ａの大きさとは、例えば、ユーザー画像６２１ａの面積、会議画面６１に占めるユーザー画像６２１ａの面積の割合である。ユーザー画像６２１ａの大きさは、会議画面６１の縦方向におけるユーザー画像６２１ａのサイズ、及び、会議画面６１の水平方向におけるユーザー画像６２１ａのサイズであってもよい。ユーザー画像６２１ｂ、６２１ｃ、６２１ｄの大きさについても同様である。表示枠６１１ａ、６１１ｂ、６１１ｃ、６１１ｄの大きさやアスペクト比は、ユーザー画像６２１ａ、６２１ｂ、６２１ｃ、６２１ｄに合わせて適宜に決定される。 The display processing unit 525 generates display data in which the sizes of the user images 621a, 621b, 621c, and 621d are uniform at the start of the conference. The size of the user image 621 a is, for example, the area of the user image 621 a or the ratio of the area of the user image 621 a to the conference screen 61 . The size of the user image 621 a may be the size of the user image 621 a in the vertical direction of the conference screen 61 and the size of the user image 621 a in the horizontal direction of the conference screen 61 . The same applies to the sizes of the user images 621b, 621c, and 621d. The sizes and aspect ratios of the display frames 611a, 611b, 611c, and 611d are appropriately determined according to the user images 621a, 621b, 621c, and 621d.

表示処理部５２５は、音声処理部５２４により積算されるユーザーＵの発話時間の積算値に基づいて、ユーザー画像６２１ａ、６２１ｂ、６２１ｃ、６２１ｄの少なくともいずれかの大きさを決定する。本実施形態では、各ユーザーＵの発話時間の積算値の大きさの相対的な関係に対応させて、ユーザー画像６２１ａ、６２１ｂ、６２１ｃ、６２１ｄの相対的な大きさを決定する。 The display processing unit 525 determines the size of at least one of the user images 621a, 621b, 621c, and 621d based on the integrated value of the speech time of each user U accumulated by the voice processing unit 524. In this embodiment, the relative sizes of the user images 621a, 621b, 621c, and 621d are determined in accordance with the relative magnitudes of the integrated speech times of the users U.

例えば、表示処理部５２５は、ユーザーＵ１の発言量に基づく大きさでユーザー画像６２１ａが表示されるように表示データを生成する。また、表示処理部５２５は、ユーザーＵ２の発言量に基づく大きさでユーザー画像６２１ｂが表示されるように表示データを生成する。そして、ユーザーＵ１の発言量が、ユーザーＵ２の発言量より大きい場合は、ユーザー画像６２１ａの大きさをユーザー画像６２１ｂの大きさよりも小さくする。言い換えれば、表示処理部５２５は、ユーザーＵ１の発言量がユーザーＵ２の発言量より小さい場合、ユーザー画像６２１ａの大きさを、ユーザー画像６２１ｂの大きさよりも大きくする。 For example, the display processing unit 525 generates display data so that the user image 621a is displayed in a size based on the speech volume of the user U1. In addition, the display processing unit 525 generates display data so that the user image 621b is displayed in a size based on the speech volume of the user U2. Then, when the amount of speech by user U1 is greater than the amount of speech by user U2, the size of user image 621a is made smaller than the size of user image 621b. In other words, the display processing unit 525 makes the size of the user image 621a larger than the size of the user image 621b when the amount of speech by the user U1 is smaller than the amount of speech by the user U2.

図５は、端末１に表示される画面の一例を示す図であり、会議画面６１においてユーザー画像６２１の大きさを変化させた例を示す。 FIG. 5 is a diagram showing an example of a screen displayed on the terminal 1, in which the sizes of the user images 621 on the conference screen 61 have been changed.
図５には、ユーザーＵ１、Ｕ２、Ｕ４、Ｕ３の順に発言量が小さい場合の会議画面６１の例を示す。ユーザーＵ１とユーザーＵ２の発言量は等しく、ユーザーＵ１、Ｕ２、Ｕ３、Ｕ４の中で発言量が最も小さい。この場合、ユーザーＵ１に対応するユーザー画像６２１ａ、及び、ユーザーＵ２に対応するユーザー画像６２１ｂの大きさは、ユーザー画像６２１ｃ及びユーザー画像６２１ｄの大きさよりも大きい。また、ユーザーＵ３の発言量が最も大きいことを想定すると、ユーザーＵ３に対応するユーザー画像６２１ｃの大きさは、ユーザー画像６２１ａ、６２１ｂ、６２１ｄよりも小さい。また、ユーザーＵ１とユーザーＵ２の発言量が等しいので、ユーザー画像６２１ａの大きさとユーザー画像６２１ｂの大きさとは、ほぼ等しい。 FIG. 5 shows an example of the conference screen 61 in which, ranked from the smallest amount of speech to the largest, the users are U1, U2, U4, and U3. Users U1 and U2 have the same amount of speech, which is the smallest among users U1, U2, U3, and U4. In this case, the user image 621a corresponding to user U1 and the user image 621b corresponding to user U2 are larger than the user images 621c and 621d. Since user U3 is assumed to have the largest amount of speech, the user image 621c corresponding to user U3 is smaller than the user images 621a, 621b, and 621d. Since users U1 and U2 have the same amount of speech, the user images 621a and 621b are approximately equal in size.

図５は、表示処理部５２５が、ユーザー画像６２１のアスペクト比を維持する例を示している。すなわち、ユーザー画像６２１の大きさとして、ユーザー画像６２１の面積を採用する場合、縦方向のユーザー画像６２１のサイズを採用する場合、及び、水平方向のユーザー画像６２１のサイズを採用する場合のいずれも、図５に示す表示態様となる。ユーザー画像６２１の大きさとして、会議画面６１に占めるユーザー画像６２１の面積の割合を採用する場合も同様である。また、表示枠６１１ａ、６１１ｂ、６１１ｃ、６１１ｄの大きさやアスペクト比は、ユーザー画像６２１ａ、６２１ｂ、６２１ｃ、６２１ｄに合わせて適宜に決定される。表示処理部５２５は、ユーザー画像６２１のアスペクト比を維持しなくてもよい。例えば、縦方向のユーザー画像６２１のサイズを基準として、発話量に基づくユーザー画像６２１の大きさを決定してもよい。 FIG. 5 shows an example in which the display processing unit 525 maintains the aspect ratio of the user images 621. That is, the display mode shown in FIG. 5 results whether the size of the user image 621 is taken to be its area, its vertical size, or its horizontal size. The same applies when the size of the user image 621 is taken to be the proportion of the conference screen 61 that its area occupies. The sizes and aspect ratios of the display frames 611a, 611b, 611c, and 611d are determined as appropriate to fit the user images 621a, 621b, 621c, and 621d. The display processing unit 525 need not maintain the aspect ratio of the user image 621. For example, the size of the user image 621 based on the amount of speech may be determined with reference to the vertical size of the user image 621.
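
As an illustration of how a relative size could be turned into pixel dimensions while keeping the aspect ratio, the sketch below distinguishes the case where the size means the area from the case where it means a vertical or horizontal length. The function name, the `mode` parameter, and the pixel values are assumptions made for this example; the specification does not define such a conversion.

```python
import math


def scaled_dimensions(base_width, base_height, scale, mode="area"):
    """Scale an image by `scale` while keeping its aspect ratio.

    mode="area":   `scale` is the ratio of areas, so each side is
                   multiplied by sqrt(scale).
    mode="length": `scale` is the ratio of vertical (or horizontal)
                   sizes, so each side is multiplied by `scale`.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    if mode == "area":
        factor = math.sqrt(scale)
    elif mode == "length":
        factor = scale
    else:
        raise ValueError("mode must be 'area' or 'length'")
    return base_width * factor, base_height * factor


# A 640x360 image shown at one quarter of its area keeps the 16:9 shape.
w, h = scaled_dimensions(640, 360, 0.25, mode="area")   # (320.0, 180.0)
```

Under either interpretation, a smaller relative size yields a smaller image of the same shape, which is consistent with the statement that each definition of size leads to the display mode of FIG. 5.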

表示処理部５２５は、発言量の大きいユーザーＵに対応するユーザー画像６２１の大きさを、他のユーザーＵに対応するユーザー画像６２１の大きさよりも小さくする。このため、発言量が多いユーザーＵほど、ユーザー画像６２１の大きさが小さくされる。表示処理部５２５は、ユーザー画像６２１の大きさの下限を有していてもよい。この場合、表示処理部５２５は、発言量が閾値より大きいユーザーＵに対応するユーザー画像６２１の大きさを、設定された下限値とする。つまり、ユーザー画像６２１の大きさは下限値以上の大きさである。これにより、ユーザーＵの発言量が大きい場合であっても、ユーザー画像６２１の大きさが下限値を下回ることはない。下限値、及び、閾値は、例えば、予めサーバー５０がメモリー５３に記憶している。下限値は、例えば、標準的な大きさのディスプレイ１４を用いてユーザー画像６２１を表示したときに、ユーザー画像６２１が目視で視認できる程度の大きさである。下限値は、標準的な大きさのディスプレイ１４を用いてユーザー画像６２１を表示した場合の縦方向及び横方向の長さで規定されてもよいし、ユーザー画像６２１の表示解像度または表示画素数で規定されてもよい。或いは、下限値は、ディスプレイ１４の表示解像度または表示画素数に対するユーザー画像６２１の表示解像度または画素数が占める割合で規定されてもよい。また、下限値は、ディスプレイ１４において会議画面６１が表示される領域においてユーザー画像６２１が占める割合で規定されてもよい。 The display processing unit 525 makes the user image 621 corresponding to a user U with a large amount of speech smaller than the user images 621 corresponding to the other users U. Thus, the more a user U speaks, the smaller that user's image 621 becomes. The display processing unit 525 may have a lower limit for the size of the user image 621. In this case, the display processing unit 525 sets the size of the user image 621 corresponding to a user U whose amount of speech is greater than a threshold to the set lower limit value. That is, the size of the user image 621 is equal to or larger than the lower limit value. As a result, the size of the user image 621 does not fall below the lower limit value even when the amount of speech by the user U is large. The lower limit value and the threshold are, for example, stored in advance in the memory 53 by the server 50. The lower limit value is, for example, a size at which the user image 621 can be visually recognized when it is displayed on a display 14 of standard size. The lower limit value may be defined by the vertical and horizontal lengths of the user image 621 when it is displayed on a display 14 of standard size, or by the display resolution or the number of display pixels of the user image 621. Alternatively, the lower limit value may be defined by the ratio of the display resolution or the number of pixels of the user image 621 to the display resolution or the number of display pixels of the display 14. The lower limit value may also be defined by the proportion that the user image 621 occupies in the area of the display 14 in which the conference screen 61 is displayed.

このように、会議画面６１におけるユーザー画像６２１ａ、６２１ｂ、６２１ｃ、６２１ｄの大きさが、ユーザーＵ１、Ｕ２、Ｕ３、Ｕ４の発言量に応じて決定され、発言量の小さいユーザーＵのユーザー画像６２１が、他の少なくとも１つのユーザー画像６２１よりも相対的に大きく表示される。言い換えれば、発言量の大きいユーザーＵのユーザー画像６２１が、相対的に小さく表示される。このため、会議において発言量が小さいユーザーＵの画像が、より目立つように表示されるので、該当するユーザーＵに対して発言を促すことができる。また、会議において発言量が大きいユーザーＵの画像が、相対的に小さく表示されるので、該当するユーザーＵに対して発言の抑制を促すことができる。さらに、会議システム１００において会議に参加する全ての端末１が表示する会議画面６１において、ユーザー画像６２１ａ、６２１ｂ、６２１ｃ、６２１ｄが、ユーザーＵ１、Ｕ２、Ｕ３、Ｕ４の発言量に基づく大きさで表示される。このため、発言量の少ないユーザーＵのユーザー画像６２１が大きく表示されることによって、会議に参加する複数のユーザーＵに対し、発言量の少ないユーザーＵに発言機会を与えることを促す効果が期待できる。 In this way, the sizes of the user images 621a, 621b, 621c, and 621d on the conference screen 61 are determined according to the amounts of speech of users U1, U2, U3, and U4, and the user image 621 of a user U with a small amount of speech is displayed relatively larger than at least one other user image 621. In other words, the user image 621 of a user U with a large amount of speech is displayed relatively small. As a result, the image of a user U who has spoken little in the conference is displayed more conspicuously, which can prompt that user U to speak. In addition, the image of a user U who has spoken a lot in the conference is displayed relatively small, which can prompt that user U to refrain from speaking. Furthermore, on the conference screen 61 displayed by every terminal 1 participating in the conference in the conference system 100, the user images 621a, 621b, 621c, and 621d are displayed at sizes based on the amounts of speech of users U1, U2, U3, and U4. Therefore, displaying the user image 621 of a user U with a small amount of speech at a large size can be expected to encourage the users U participating in the conference to give that user U an opportunity to speak.

［３．会議システムの動作］ [3. Operation of conference system]
図６及び図７は、サーバー５０の動作例を示すフローチャートである。これらの図を参照して、会議システム１００の動作について説明する。会議システム１００の動作において、端末１Ａ、１Ｂ、１Ｃ、１Ｄのうちいずれか１つの端末は第１端末の一例に対応し、他の端末のうちいずれか１つが第２端末の一例に対応する。ステップＳＴ１１－ＳＴ１６は音声処理部５２４により実行され、ステップＳＴ１７－ＳＴ１９は表示処理部５２５により実行される。 FIGS. 6 and 7 are flowcharts showing an example of the operation of the server 50. The operation of the conference system 100 will be described with reference to these figures. In the operation of the conference system 100, any one of the terminals 1A, 1B, 1C, and 1D corresponds to an example of the first terminal, and any one of the other terminals corresponds to an example of the second terminal. Steps ST11 to ST16 are executed by the voice processing unit 524, and steps ST17 to ST19 are executed by the display processing unit 525.

ステップＳＴ１１で、サーバー５０は、発話の検出を開始する。発話の検出は、サーバー５０が端末１から受信した端末データＤ１を解析することによって、ユーザーＵの発話が開始されたことを検出する処理である。 In step ST11, the server 50 starts detecting speech. The speech detection is a process of detecting that the user U has started speaking by analyzing the terminal data D1 received from the terminal 1 by the server 50 .

ステップＳＴ１２で、サーバー５０は、発話が開始されたか否かを判定する。発話が開始されていない場合（ステップＳＴ１２；ＮＯ）、サーバー５０は、ステップＳＴ１２の判定を所定時間周期で繰り返す。発話が開始されたと判定した場合（ステップＳＴ１２；ＹＥＳ）、サーバー５０は、ステップＳＴ１３に移行する。 In step ST12, the server 50 determines whether or not speech has started. If the speech has not started (step ST12; NO), the server 50 repeats the determination of step ST12 at predetermined time intervals. If it is determined that speech has started (step ST12; YES), the server 50 proceeds to step ST13.

ステップＳＴ１３で、サーバー５０は、発話したユーザーＵを特定する。具体的には、サーバー５０は、ステップＳＴ１２で発話を検出した端末データＤ１を送信した端末１を特定し、この端末１を使用するユーザーＵが発話をしたと決定する。 In step ST13, the server 50 identifies the user U who has spoken. Specifically, the server 50 identifies the terminal 1 that transmitted the terminal data D1 whose speech was detected in step ST12, and determines that the user U using this terminal 1 has spoken.

ステップＳＴ１４で、サーバー５０は、発話が終了したか否かを判定する。発話が終了していない場合（ステップＳＴ１４；ＮＯ）、サーバー５０は、ステップＳＴ１４の判定を所定時間周期で繰り返す。発話が終了したと判定した場合（ステップＳＴ１４；ＹＥＳ）、サーバー５０は、ステップＳＴ１５に移行する。 In step ST14, the server 50 determines whether or not the speech has ended. If the speech has not ended (step ST14; NO), the server 50 repeats the determination of step ST14 at predetermined time intervals. When determining that the speech has ended (step ST14; YES), the server 50 proceeds to step ST15.

ステップＳＴ１５で、サーバー５０は、発話時間の時間長を算出する。サーバー５０は、ステップＳＴ１２で発話の開始を検出してからステップＳＴ１４で発話の終了を検出するまでの時間を、発話時間の時間長とする。続いて、ステップＳＴ１６で、サーバー５０は、メモリー５３が記憶する発話時間の積算値、すなわち発言量を更新する。サーバー５０は、ステップＳＴ１３で特定されたユーザーＵに対応付けてメモリー５３が記憶する発言量に、ステップＳＴ１５で算出した発話時間の時間長を加算して、メモリー５３が記憶する発言量を更新する。 In step ST15, the server 50 calculates the length of speech time. The server 50 takes the time from detecting the start of speech in step ST12 to detecting the end of speech in step ST14 as the length of speech time. Subsequently, in step ST16, the server 50 updates the integrated value of speech time stored in the memory 53, that is, the speech volume. The server 50 updates the speech volume stored in the memory 53 by adding the speech duration calculated in step ST15 to the speech volume stored in the memory 53 in association with the user U identified in step ST13. .
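
Steps ST12 to ST15 can be pictured as polling a voice-activity flag at a fixed period and measuring the time between a start and the following end. The sketch below assumes that some upstream detector already supplies one boolean per polling period for a given terminal; that detector, the function name, and the period value are hypothetical.

```python
def utterance_intervals(voiced_flags, period):
    """Return (start, end) times in seconds for each run of voiced samples.

    voiced_flags: one boolean per polling period for a single terminal.
    period:       polling period in seconds (the "predetermined time interval").
    """
    intervals = []
    start = None
    for i, voiced in enumerate(voiced_flags):
        t = i * period
        if voiced and start is None:
            start = t                      # ST12: YES, speech has started
        elif not voiced and start is not None:
            intervals.append((start, t))   # ST14: YES, speech has ended
            start = None
    if start is not None:
        # Speech still in progress when the samples end: close it at the last sample.
        intervals.append((start, len(voiced_flags) * period))
    return intervals


flags = [False, True, True, False, True]
intervals = utterance_intervals(flags, 1.0)          # [(1.0, 3.0), (4.0, 5.0)]
durations = [end - start for start, end in intervals]  # ST15: [2.0, 1.0]
```

Each duration would then be added to the stored amount of speech of the user identified in step ST13, which corresponds to step ST16.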

ステップＳＴ１７で、サーバー５０は、表示データ生成処理を実行することによって、サーバーデータＤ２を生成し、端末１に送信する。ステップＳＴ１７の表示データ生成処理の詳細は、図７を参照して後述する。 In step ST17 , the server 50 generates server data D2 by executing display data generation processing, and transmits the server data D2 to the terminal 1 . Details of the display data generation process in step ST17 will be described later with reference to FIG.

ステップＳＴ１８で、サーバー５０は、会議が終了したか否かを判定する。会議が終了していない場合（ステップＳＴ１８；ＮＯ）、サーバー５０はステップＳＴ１２に戻る。会議が終了した場合（ステップＳＴ１８；ＹＥＳ）、サーバー５０は図６の処理を終了する。 At step ST18, the server 50 determines whether or not the conference has ended. If the conference has not ended (step ST18; NO), the server 50 returns to step ST12. If the conference has ended (step ST18; YES), the server 50 ends the processing of FIG.

図７は、ステップＳＴ１７の表示データ生成処理の詳細を示す。 FIG. 7 shows details of the display data generation process in step ST17.
ステップＳＴ２１で、サーバー５０は、メモリー５３が記憶する発言量を参照し、発言量が閾値より大きいユーザーＵがいるか否かを判定する。発言量が閾値より大きいユーザーＵがいない場合（ステップＳＴ２１；ＮＯ）、サーバー５０はステップＳＴ２２に移行する。また、発言量が閾値より大きいユーザーＵがいる場合（ステップＳＴ２１；ＹＥＳ）、サーバー５０はステップＳＴ２７に移行する。 In step ST21, the server 50 refers to the amounts of speech stored in the memory 53 and determines whether or not there is a user U whose amount of speech is greater than the threshold. If there is no user U whose amount of speech is greater than the threshold (step ST21; NO), the server 50 proceeds to step ST22. If there is a user U whose amount of speech is greater than the threshold (step ST21; YES), the server 50 proceeds to step ST27.

ステップＳＴ２２で、サーバー５０は、メモリー５３が各ユーザーＵに対応付けて記憶する発言量の比を算出する。続いて、サーバー５０は、ステップＳＴ２３で、メモリー５３が各ユーザーＵに対応付けて記憶する発言量の逆比を算出する。逆比とは、ステップＳＴ２２で算出した比の逆数の比である。例えば、ステップＳＴ２２で算出されたユーザーＵ１、Ｕ２、Ｕ３、Ｕ４の発言量の比が１：１：２：３であった場合、ステップＳＴ２３で算出される逆比は、１：１：１／２：１／３である。 In step ST22, the server 50 calculates the ratio of the amounts of speech that the memory 53 stores in association with the respective users U. Then, in step ST23, the server 50 calculates the inverse ratio of those amounts of speech. The inverse ratio is the ratio of the reciprocals of the terms of the ratio calculated in step ST22. For example, if the ratio of the amounts of speech of users U1, U2, U3, and U4 calculated in step ST22 is 1:1:2:3, the inverse ratio calculated in step ST23 is 1:1:1/2:1/3.

ステップＳＴ２４で、サーバー５０は、ステップＳＴ２３で算出した逆比に合わせて、ユーザー画像６２１の大きさを決定する。具体的には、サーバー５０は、ユーザー画像６２１ａ、６２１ｂ、６２１ｃ、６２１ｄの大きさの比が、ステップＳＴ２３で算出した逆比の通りになるように、ユーザー画像６２１ａ、６２１ｂ、６２１ｃ、６２１ｄの大きさを決定する。例えば、ユーザーＵ１、Ｕ２、Ｕ３、Ｕ４の発言量の比が１：１：２：３であった場合、ユーザー画像６２１ａ、６２１ｂ、６２１ｃ、６２１ｄの大きさは、１：１：１／２：１／３となる。 In step ST24, the server 50 determines the sizes of the user images 621 according to the inverse ratio calculated in step ST23. Specifically, the server 50 determines the sizes of the user images 621a, 621b, 621c, and 621d so that the ratio of their sizes matches the inverse ratio calculated in step ST23. For example, if the ratio of the amounts of speech of users U1, U2, U3, and U4 is 1:1:2:3, the sizes of the user images 621a, 621b, 621c, and 621d are in the ratio 1:1:1/2:1/3.
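
The calculation in steps ST22 to ST24 can be reproduced with exact fractions. In this sketch the function name is hypothetical, the result is normalized so that the user who has spoken least gets a relative size of 1 (matching the 1:1:1/2:1/3 notation above), and every amount of speech is assumed to be positive, because the specification does not say how a user with no speech yet would be treated in this calculation.

```python
from fractions import Fraction


def inverse_ratio_sizes(speech_amounts):
    """Relative image sizes proportional to the reciprocals of the amounts of speech."""
    if not speech_amounts or min(speech_amounts) <= 0:
        raise ValueError("all amounts of speech must be positive")
    smallest = Fraction(min(speech_amounts))
    # size_i = smallest / amount_i, so the least talkative user gets size 1.
    return [smallest / Fraction(amount) for amount in speech_amounts]


# Amounts of speech of users U1, U2, U3, U4 in the ratio 1:1:2:3.
sizes = inverse_ratio_sizes([1, 1, 2, 3])
# sizes == [Fraction(1), Fraction(1), Fraction(1, 2), Fraction(1, 3)]
```

Because only the ratio matters, integrated speech times of 30, 30, 60, and 90 seconds give the same relative sizes.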

ステップＳＴ２５で、サーバー５０は、ステップＳＴ２４で決定したユーザー画像６２１の大きさに従って、ユーザー画像６２１を含む会議画面６１の表示データを生成する。サーバー５０は、ステップＳＴ２６で、会議画面６１の表示データを含むサーバーデータＤ２を生成し、端末１に送信する。これにより、端末１がディスプレイ１４に表示する画面が更新され、各々の端末１が表示するユーザー画像６２１の大きさが、発言量に対応する大きさに変更される。 At step ST25, the server 50 generates display data of the conference screen 61 including the user image 621 according to the size of the user image 621 determined at step ST24. The server 50 generates server data D2 including the display data of the conference screen 61 and transmits it to the terminal 1 in step ST26. As a result, the screen displayed on the display 14 by the terminal 1 is updated, and the size of the user image 621 displayed by each terminal 1 is changed to the size corresponding to the amount of speech.

また、ステップＳＴ２７で、サーバー５０は、発言量が閾値より大きいユーザーＵのユーザー画像６２１の大きさを、下限値に設定する。続いて、ステップＳＴ２８で、サーバー５０は、メモリー５３が各ユーザーＵに対応付けて記憶する発言量の比を算出する。サーバー５０は、ステップＳＴ２９で、メモリー５３が各ユーザーＵに対応付けて記憶する発言量の逆比を算出する。逆比とは、ステップＳＴ２８で算出した比の逆数の比である。例えば、ステップＳＴ２８で算出されたユーザーＵ１、Ｕ２、Ｕ３、Ｕ４の発言量の比が１：１：２：５であった場合、ステップＳＴ２９で算出される逆比は、１：１：１／２：１／５である。 In step ST27, the server 50 sets the size of the user image 621 of each user U whose amount of speech is greater than the threshold to the lower limit value. Then, in step ST28, the server 50 calculates the ratio of the amounts of speech that the memory 53 stores in association with the respective users U. In step ST29, the server 50 calculates the inverse ratio of those amounts of speech. The inverse ratio is the ratio of the reciprocals of the terms of the ratio calculated in step ST28. For example, if the ratio of the amounts of speech of users U1, U2, U3, and U4 calculated in step ST28 is 1:1:2:5, the inverse ratio calculated in step ST29 is 1:1:1/2:1/5.

ステップＳＴ３０で、サーバー５０は、ユーザー画像６２１の大きさの下限値と、ステップＳＴ２９で算出した逆比とに合わせて、ユーザー画像６２１の大きさを決定する。具体的には、サーバー５０は、下限値に設定されたユーザー画像６２１を除く、他のユーザー画像６２１の大きさを、ステップＳＴ２９で算出した逆比の通りになるように決定する。例えば、ユーザーＵ１、Ｕ２、Ｕ３、Ｕ４の発言量の比が１：１：２：５であった場合、ユーザーＵ４に対応するユーザー画像６２１ｄの大きさは下限値に設定される。例えば、下限値が１／４である場合、サーバー５０は、ユーザー画像６２１ａ、６２１ｂ、６２１ｃ、６２１ｄの大きさを、大きさの比がユーザーＵ１、Ｕ２、Ｕ３の発言量の逆比である１：１：１／２：１／４となるように決定する。 In step ST30, the server 50 determines the sizes of the user images 621 according to the lower limit value of the size of the user image 621 and the inverse ratio calculated in step ST29. Specifically, the server 50 determines the sizes of the user images 621 other than the one set to the lower limit value so that they follow the inverse ratio calculated in step ST29. For example, if the ratio of the amounts of speech of users U1, U2, U3, and U4 is 1:1:2:5, the size of the user image 621d corresponding to user U4 is set to the lower limit value. If the lower limit value is 1/4, for example, the server 50 determines the sizes of the user images 621a, 621b, 621c, and 621d so that their ratio is 1:1:1/2:1/4, in which the first three terms are the inverse ratio of the amounts of speech of users U1, U2, and U3.

ステップＳＴ３０で、サーバー５０は、発言量の逆比となるように決定したユーザー画像６２１の大きさが、下限値未満とならないように処理を行う。上述の例では、ユーザー画像６２１ａ、６２１ｂ、６２１ｃの大きさを、下限値よりも大きくなるように決定する。ステップＳＴ３０の処理後、サーバー５０はステップＳＴ２５に移行する。 In step ST30, the server 50 performs processing so that the size of the user image 621 determined to be the inverse ratio of the amount of speech does not fall below the lower limit. In the above example, the sizes of the user images 621a, 621b, and 621c are determined to be larger than the lower limit. After the processing of step ST30, the server 50 proceeds to step ST25.
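
A minimal sketch of the branch through steps ST27 to ST30 follows, under stated assumptions. The function name is hypothetical, the threshold value of 4 is invented so that only U4 in the 1:1:2:5 example exceeds it, and clamping the remaining sizes with `max` is one possible way to keep them from falling below the lower limit; the specification only requires that they do not.

```python
from fractions import Fraction


def sizes_with_lower_limit(speech_amounts, threshold, lower_limit):
    """Relative sizes with a floor for users whose amount of speech exceeds the threshold."""
    if not speech_amounts or min(speech_amounts) <= 0:
        raise ValueError("all amounts of speech must be positive")
    smallest = Fraction(min(speech_amounts))
    sizes = []
    for amount in speech_amounts:
        if amount > threshold:
            # ST27: pin the image of a very talkative user to the lower limit.
            sizes.append(lower_limit)
        else:
            # ST28-ST29: inverse ratio for the remaining users.
            size = smallest / Fraction(amount)
            # ST30: never go below the lower limit (assumed clamping rule).
            sizes.append(max(size, lower_limit))
    return sizes


# Ratio 1:1:2:5 with an assumed threshold of 4 and a lower limit of 1/4.
sizes = sizes_with_lower_limit([1, 1, 2, 5], threshold=4, lower_limit=Fraction(1, 4))
# sizes == [Fraction(1), Fraction(1), Fraction(1, 2), Fraction(1, 4)]
```

Without the lower limit, the image of U4 would have a relative size of 1/5, as in the inverse ratio calculated in step ST29.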

ステップＳＴ２４及びステップＳＴ３０において、サーバー５０は、複数のユーザー画像６２１が会議画面６１において重なることを許容してもよい。具体的には、会議画面６１において複数のユーザー画像６２１が重なって表示されるように、ユーザー画像６２１の大きさを決定してもよい。この場合、会議画面６１に対するユーザー画像６２１の大きさの制約が緩和されるため、発言量を反映した大きさでユーザー画像６２１を端末１に表示させることができる。 In steps ST24 and ST30, the server 50 may allow a plurality of user images 621 to overlap on the conference screen 61. Specifically, the sizes of the user images 621 may be determined such that a plurality of user images 621 are displayed overlapping one another on the conference screen 61. In this case, the constraint on the size of the user image 621 relative to the conference screen 61 is relaxed, so the user image 621 can be displayed on the terminal 1 at a size that reflects the amount of speech.

また、図７には、サーバー５０が、ユーザー画像６２１ａ、６２１ｂ、６２１ｃ、６２１ｄの大きさの比が発言量の大きさの逆比となるように処理を行う例を示したが、これは一例である。例えば、サーバー５０は、ユーザーＵ１、Ｕ２、Ｕ３、Ｕ４の発言量を大きい順に並べることにより、発言量の順位を生成する。この場合、サーバー５０は、ユーザー画像６２１ａ、６２１ｂ、６２１ｃ、６２１ｄの大きさが、発言量の順位の逆順となるように、ユーザー画像６２１ａ、６２１ｂ、６２１ｃ、６２１ｄの大きさを決定してもよい。この例では、ユーザー画像６２１ａ、６２１ｂ、６２１ｃ、６２１ｄの大きさを、発言量の大きさを反映した大きさとすることができる。また、サーバー５０は、ユーザー画像６２１ａ、６２１ｂ、６２１ｃ、６２１ｄの大きさを詳細に算出する必要がないので、処理負荷を軽減できる。この例において、サーバー５０は、ユーザー画像６２１の大きさを段階的に設定してもよい。すなわち、サーバー５０において、ユーザー画像６２１の大きさが予め段階的に設定されてもよい。この場合、サーバー５０は、ユーザー画像６２１の大きさを、予め設定された大きさから選択すればよいので、サーバー５０の処理負荷を、より一層軽減できる。 FIG. 7 shows an example in which the server 50 performs processing so that the ratio of the sizes of the user images 621a, 621b, 621c, and 621d is the inverse ratio of the amounts of speech, but this is only an example. For example, the server 50 generates a ranking of the amounts of speech by arranging the amounts of speech of users U1, U2, U3, and U4 in descending order. In this case, the server 50 may determine the sizes of the user images 621a, 621b, 621c, and 621d so that their order of size is the reverse of the ranking of the amounts of speech. In this example, the sizes of the user images 621a, 621b, 621c, and 621d can be made to reflect the amounts of speech. In addition, since the server 50 does not need to calculate the sizes of the user images 621a, 621b, 621c, and 621d in detail, the processing load can be reduced. In this example, the server 50 may set the size of the user image 621 in steps. That is, the sizes of the user image 621 may be set in steps in advance in the server 50. In this case, the server 50 only needs to select the size of the user image 621 from the preset sizes, so the processing load on the server 50 can be reduced further.
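
The rank-based variant with preset, stepwise sizes might look like the sketch below. The function name and the preset values are assumptions for illustration. Users with equal amounts of speech share a rank here so that they receive equal sizes, as in FIG. 5; the specification does not state how ties are ranked.

```python
def sizes_by_rank(speech_amounts, preset_sizes):
    """Pick each user's image size from preset steps according to speech rank.

    preset_sizes is ordered from largest to smallest; the user with the
    smallest amount of speech receives preset_sizes[0].
    """
    if not preset_sizes:
        raise ValueError("at least one preset size is required")
    # Dense ranking: equal amounts of speech share the same rank.
    distinct = sorted(set(speech_amounts))
    rank_of = {amount: rank for rank, amount in enumerate(distinct)}
    last = len(preset_sizes) - 1
    return [preset_sizes[min(rank_of[amount], last)] for amount in speech_amounts]


# U1 and U2 have spoken least, then U4, and U3 has spoken most (as in FIG. 5).
sizes = sizes_by_rank([10, 10, 40, 25], [1.0, 0.75, 0.5, 0.25])
# sizes == [1.0, 1.0, 0.5, 0.75]
```

Only a sort and a table lookup are needed, with no division per image, which is in line with the reduced processing load described above.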

［４．実施形態の作用］ [4. Action of Embodiment]
以上説明したように、会議システム１００の制御方法は、第１端末から入力される第１音声の時間長を示す第１発言量に基づく第１の大きさで、第１端末に対応する第１画像を表示することを含む。この制御方法は、第２端末から入力される第２音声の時間長を示す第２発言量に基づく第２の大きさで、第１画像と並べて、第２端末に対応する第２画像を表示することを含む。この制御方法において、第１発言量が第２発言量よりも大きい場合には、第１の大きさは第２の大きさよりも小さい。ここで、第１端末は、会議システム１００において端末１Ａ、１Ｂ、１Ｃ、１Ｄのいずれにも適用可能である。第２端末は、端末１Ａ、１Ｂ、１Ｃ、１Ｄのうち第１端末でないいずれかの端末１とすることができる。 As described above, the control method of the conference system 100 includes displaying a first image corresponding to a first terminal at a first size based on a first speech volume indicating the time length of a first voice input from the first terminal. The control method includes displaying a second image corresponding to a second terminal, side by side with the first image, at a second size based on a second speech volume indicating the time length of a second voice input from the second terminal. In this control method, when the first speech volume is greater than the second speech volume, the first size is smaller than the second size. Here, any of the terminals 1A, 1B, 1C, and 1D in the conference system 100 can serve as the first terminal. The second terminal can be any terminal 1 among the terminals 1A, 1B, 1C, and 1D that is not the first terminal.

According to this method, when the speech amount of the voice input from the first terminal differs from the speech amount of the voice input from the second terminal, the second image, which corresponds to the second terminal with the smaller speech amount, is displayed larger than the first image. That is, the image corresponding to a user U with a small speech amount is displayed larger than the image corresponding to a user U with a large speech amount. As a result, the image of the user U who has spoken little in the conference is displayed more conspicuously, which can encourage that user U to speak. It is therefore possible to suppress bias in speaking opportunities in the conference and to promote equalization of speaking opportunities.

In the above control method, when the first speech amount is smaller than the second speech amount, the first size is larger than the second size. The image corresponding to the user U with the larger speech amount is therefore displayed relatively small, which can prompt that user U to refrain from speaking. It is therefore possible to suppress bias in speaking opportunities in the conference and to promote equalization of speaking opportunities.

In the above control method, the first size is equal to or greater than a lower limit value of the first size. This ensures the visibility of an image that is displayed relatively small.

In the above control method, when the first speech amount is greater than a threshold, the first size is the lower limit value. Therefore, the image corresponding to a user U whose speech amount is greater than the threshold can be made relatively small with light processing.

In the above control method, when the first speech amount and the second speech amount are equal, the first size and the second size are equal. As a result, when the speech amounts in the conference are nearly equal, the display can be expected not to introduce a bias in speaking opportunities.
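The size rules summarized above (a larger speech amount gives a smaller image, a lower limit on the size, the lower limit applied directly above a threshold, and equal sizes for equal speech amounts) can be illustrated with a short sketch. It assumes an inverse-ratio rule relative to the smallest speech amount among the participants. The function name `image_size`, its parameters, and the default values are hypothetical and are not taken from the embodiment.

```python
def image_size(speech_amount: float,
               min_amount: float,
               max_size: float = 320.0,
               lower_limit: float = 80.0,
               threshold: float = 600.0) -> float:
    """Return an image size that shrinks as the speech amount grows.

    speech_amount: accumulated speech time of this user, in seconds.
    min_amount:    smallest accumulated speech time among all users.
    All default values are illustrative assumptions.
    """
    # Above the threshold, skip the calculation and use the lower limit directly.
    if speech_amount > threshold:
        return lower_limit
    # A user who has not spoken yet gets the largest size (assumption).
    if speech_amount <= 0:
        return max_size
    # Inverse ratio: the quietest user gets max_size, others proportionally less,
    # but never less than the lower limit, to keep the image visible.
    return max(lower_limit, max_size * min_amount / speech_amount)


if __name__ == "__main__":
    amounts = {"U1": 100.0, "U2": 50.0, "U3": 700.0, "U4": 50.0}
    quietest = min(amounts.values())
    for user, amount in amounts.items():
        print(user, image_size(amount, quietest))
```

Returning the lower limit before doing any arithmetic mirrors the point that a user above the threshold can be handled with light processing.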

The above control method includes calculating the first speech amount by accumulating the time length of the first voice from the start of a conference using the first terminal and the second terminal. The above control method also includes calculating the second speech amount by accumulating the time length of the second voice from the start of the conference. The speech amount is thus obtained by accumulating the time length of speech from the start of the conference. According to this method, the size of each image can be determined according to the speech amount since the conference started, and the size of the image can also be changed to reflect changes in the speech amount. An effect of suppressing bias in speaking opportunities during the conference and promoting equalization of speaking opportunities can therefore be expected.
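The accumulation described above can be sketched as a small running total per terminal. This is a minimal illustration that assumes voice activity has already been detected and delivered as (start, end) times in seconds. The class name `SpeechAmountTracker` and its methods are hypothetical and do not correspond to a functional unit named in the embodiment.

```python
from collections import defaultdict
from typing import DefaultDict


class SpeechAmountTracker:
    """Accumulates, per terminal, the time length of voice since the conference started."""

    def __init__(self) -> None:
        # Terminal identifier -> accumulated speech time in seconds.
        self._totals: DefaultDict[str, float] = defaultdict(float)

    def add_voice_segment(self, terminal_id: str, start_s: float, end_s: float) -> None:
        """Add one detected voice segment to the terminal's running total."""
        if end_s < start_s:
            raise ValueError("segment end must not precede its start")
        self._totals[terminal_id] += end_s - start_s

    def speech_amount(self, terminal_id: str) -> float:
        """Return the accumulated speech amount; 0.0 if the terminal has not spoken."""
        return self._totals.get(terminal_id, 0.0)


if __name__ == "__main__":
    tracker = SpeechAmountTracker()
    tracker.add_voice_segment("1A", 0.0, 12.5)
    tracker.add_voice_segment("1A", 30.0, 40.0)
    print(tracker.speech_amount("1A"), tracker.speech_amount("1B"))
```

Because the totals are updated segment by segment, the sizes derived from them can be recomputed during the conference so that the display follows changes in the speech amounts.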

The terminal 1 that constitutes the conference system 100 includes a control circuit 11. The control circuit 11 displays, on a display device, a first image corresponding to a first terminal at a first size based on a first speech amount indicating the time length of a first voice input from the first terminal. The control circuit 11 displays, on the display device, a second image corresponding to a second terminal, side by side with the first image, at a second size based on a second speech amount indicating the time length of a second voice input from the second terminal. When the first speech amount is larger than the second speech amount, the first size is smaller than the second size. This configuration can be applied, for example, to any of the terminal 1A having the display 14a as the display device, the terminal 1B having the display 14b, the terminal 1C having the display 14c, and the terminal 1D having the display 14d. The first terminal may be any of the terminals 1A, 1B, 1C, and 1D in the conference system 100. The second terminal may be any terminal 1 among the terminals 1A, 1B, 1C, and 1D other than the first terminal.

According to this configuration, when the speech amount of the voice input from the first terminal differs from the speech amount of the voice input from the second terminal, the second image, which corresponds to the second terminal with the smaller speech amount, is displayed on the display 14 larger than the first image. That is, the image corresponding to a user U with a small speech amount is displayed larger than the image corresponding to a user U with a large speech amount. As a result, the image of the user U who has spoken little in the conference is displayed more conspicuously, which can encourage that user U to speak. It is therefore possible to suppress bias in speaking opportunities in the conference and to promote equalization of speaking opportunities.

The conference system 100 includes a first terminal, a second terminal, and a third terminal. The conference system 100 causes the third terminal to display a first image corresponding to the first terminal at a first size based on a first speech amount indicating the time length of a first voice input from the first terminal. The conference system 100 causes the third terminal to display a second image corresponding to the second terminal, side by side with the first image, at a second size based on a second speech amount indicating the time length of a second voice input from the second terminal. In the conference system 100, when the first speech amount is larger than the second speech amount, the first size is smaller than the second size.

The conference system 100 causes the third terminal to display an image corresponding to the first terminal and an image corresponding to the second terminal. In this configuration, when the speech amount of the voice input from the first terminal differs from the speech amount of the voice input from the second terminal, the image corresponding to the second terminal with the smaller speech amount is displayed relatively larger than the image corresponding to the first terminal. As a result, the image of the user U who has spoken little in the conference is displayed more conspicuously, which can encourage that user U to speak. It is therefore possible to suppress bias in speaking opportunities in the conference and to promote equalization of speaking opportunities.

The control program 531 is a program executed by the server 50, which is a computer capable of communicating with the first terminal, the second terminal, and the third terminal. The control program 531 causes the server 50 to execute displaying, by the third terminal, a first image corresponding to the first terminal at a first size based on a first speech amount indicating the time length of a first voice input from the first terminal. The control program 531 also causes the server 50 to execute displaying, by the third terminal, a second image corresponding to the second terminal, side by side with the first image, at a second size based on a second speech amount indicating the time length of a second voice input from the second terminal. Here, when the first speech amount is larger than the second speech amount, the first size is smaller than the second size.

According to this program, the server 50 causes the third terminal to display an image corresponding to the first terminal and an image corresponding to the second terminal. When the speech amount of the voice input from the first terminal differs from the speech amount of the voice input from the second terminal, the server 50 causes the third terminal to display the image corresponding to the second terminal with the smaller speech amount relatively larger than the image corresponding to the first terminal. As a result, the image of the user U who has spoken little in the conference is displayed more conspicuously, which can encourage that user U to speak. It is therefore possible to suppress bias in speaking opportunities in the conference and to promote equalization of speaking opportunities.

[5. Other Embodiments]
Each of the above embodiments shows a specific example to which the present invention is applied, and the present invention is not limited thereto.

In the above embodiment, a configuration in which the terminal 1 includes the display 14 as a display device has been described as an example, but the display device is not limited to the display 14. For example, the terminal 1 may include, as a display device, a projector that projects an image onto a projection surface such as a screen.

In the conference system 100, the data formats of the terminal data D1 that the terminal 1 transmits to the server 50 and of the server data D2 that the server 50 transmits to the terminal 1 are not limited. For example, the display data included in the server data D2 may be the data of the conference screen 61 displayed by the terminal 1, or the terminal 1 may perform processing to generate the data of the conference screen 61 based on the display data.

The server data D21, D22, D23, and D24 generated by the server 50 may include different display data or may include common display data. When the server data D21, D22, D23, and D24 include common display data, the user images 621 can be shown at sizes based on the speech amounts to all users U participating in the conference system 100. Alternatively, by making the display data of the server data D21, D22, D23, and D24 differ, the server 50 may cause only the terminal 1 used by a specific user U to display the user images 621 at sizes based on the speech amounts. For example, the user images 621 may be displayed at sizes based on the speech amounts only on the terminal 1 used by the user U with the smallest speech amount, or only on the terminal 1 used by the user U with the largest speech amount. In this case, a user U whose speech amount is unbalanced can be prompted either to speak more or to speak less.
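The targeting variant described above, in which only the terminal of the user with the smallest or the largest speech amount receives the speech-based sizes, can be sketched as a simple selection step. The function name `target_terminals` and its `target` parameter are hypothetical, and returning every tied terminal is an assumption. The embodiment does not specify how the server 50 makes this selection.

```python
from typing import Dict, Set


def target_terminals(speech_amounts: Dict[str, float], target: str = "min") -> Set[str]:
    """Return the terminals whose users have the smallest ("min") or largest ("max") speech amount."""
    if target not in ("min", "max"):
        raise ValueError('target must be "min" or "max"')
    if not speech_amounts:
        return set()
    pick = min if target == "min" else max
    extreme = pick(speech_amounts.values())
    # Ties are all included, so several terminals may be targeted (assumption).
    return {terminal for terminal, amount in speech_amounts.items() if amount == extreme}


if __name__ == "__main__":
    amounts = {"1A": 50.0, "1B": 10.0, "1C": 30.0, "1D": 10.0}
    print(target_terminals(amounts, "min"))
    print(target_terminals(amounts, "max"))
```

Terminals outside the returned set would then receive display data with a layout that does not depend on the speech amounts.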

Each functional unit shown in FIG. 3 represents a functional configuration and does not limit the specific form of implementation. For example, the server 50 does not need to be equipped with hardware individually corresponding to each functional unit of the server control circuit 51, and a configuration in which one processor executes a program to realize the functions of a plurality of functional units is of course possible. In addition, some of the functions realized by software in the above embodiment may be realized by hardware, and some of the functions realized by hardware may be realized by software. The specific detailed configuration of the other units of the conference system 100 can also be changed as desired without departing from the gist of the invention.

In addition, the step units of the operations shown in FIGS. 6 and 7, for example, are divided according to the main processing content in order to facilitate understanding of the operation of the conference system 100, and the present disclosure is not limited by the way the processing units are divided or by their names. Depending on the processing content, the processing may be divided into a larger number of step units, or may be divided so that one step unit includes more processing. The order of the steps may also be changed as appropriate within a range that does not interfere with the gist of the present disclosure.

1, 1A, 1B, 1C, 1D... terminal (terminal device), 7... communication network, 11... control circuit, 12... processor, 13... memory, 14, 14a, 14b, 14c, 14d... display, 16a, 16b, 16c, 16d... camera, 17a, 17b, 17c, 17d... microphone, 18a, 18b, 18c, 18d... speaker, 50... server, 51... server control circuit, 52... processor, 53... memory, 54... communication device, 100... conference system, 521... communication control unit, 522... setting unit, 523... terminal identification unit, 524... audio processing unit, 531... control program (program), 532... conference data, 621, 621a, 621b, 621c, 621d... user image, U, U1, U2, U3, U4... user.

Claims

1. A method of controlling a conference system, the method comprising:
displaying a first image corresponding to a first terminal at a first size based on a first speech amount indicating a time length of a first voice input from the first terminal; and
displaying a second image corresponding to a second terminal, side by side with the first image, at a second size based on a second speech amount indicating a time length of a second voice input from the second terminal,
wherein the first size is smaller than the second size when the first speech amount is larger than the second speech amount.

2. The method of controlling a conference system according to claim 1, wherein the first size is larger than the second size when the first speech amount is smaller than the second speech amount.

3. The method of controlling a conference system according to claim 1, wherein the first size is equal to or greater than a lower limit value of the first size.

4. The method of controlling a conference system according to claim 3, wherein the first size is the lower limit value when the first speech amount is greater than a threshold.

5. The method of controlling a conference system according to any one of claims 1 to 3, wherein the first size and the second size are equal when the first speech amount and the second speech amount are equal.

6. The method of controlling a conference system according to any one of claims 1 to 4, further comprising:
calculating the first speech amount by accumulating the time length of the first voice from the start of a conference using the first terminal and the second terminal; and
calculating the second speech amount by accumulating the time length of the second voice from the start of the conference.

7. A terminal device comprising a control circuit,
wherein the control circuit executes:
displaying, on a display device, a first image corresponding to a first terminal at a first size based on a first speech amount indicating a time length of a first voice input from the first terminal; and
displaying, on the display device, a second image corresponding to a second terminal, side by side with the first image, at a second size based on a second speech amount indicating a time length of a second voice input from the second terminal, and
wherein the first size is smaller than the second size when the first speech amount is larger than the second speech amount.

8. A program executed by a computer capable of communicating with a first terminal, a second terminal, and a third terminal,
the program causing the computer to execute:
displaying, by the third terminal, a first image corresponding to the first terminal at a first size based on a first speech amount indicating a time length of a first voice input from the first terminal; and
displaying, by the third terminal, a second image corresponding to the second terminal, side by side with the first image, at a second size based on a second speech amount indicating a time length of a second voice input from the second terminal,
wherein the first size is smaller than the second size when the first speech amount is larger than the second speech amount.