JP2023015877A

JP2023015877A - Conference control device, conference control method, and computer program

Info

Publication number: JP2023015877A
Application number: JP2021119941A
Authority: JP
Inventors: 聡小柴; Satoshi Koshiba; 進若林; Susumu Wakabayashi; 幸典岸本; Yukinori Kishimoto
Original assignee: NTT Communications Corp
Current assignee: NTT Communications Corp
Priority date: 2021-07-20
Filing date: 2021-07-20
Publication date: 2023-02-01

Abstract

To make it possible to easily confirm the status of other participants in a conference held via a network.
SOLUTION: A conference control device according to one aspect of the present invention includes a display information generation unit and an audio control unit. The display information generation unit performs image processing on an image of a user participating in a conference held over a network so that the image approaches a predetermined standard, generates display information including the processed image of the user, and transmits the display information to a user terminal used by another user. The audio control unit controls audio in the conference.
SELECTED DRAWING: Figure 1

Description

The present invention relates to technology for supporting online conversations.

Systems for holding conferences over a network have long been proposed. Recently, arrangements in which each participant uses a camera and holds a conference while sharing an image including his or her own face with others have become particularly widespread (see, for example, Patent Document 1).

JP 2015-154315 A

However, various factors, such as the circumstances of the conference and the environment in which each participant uses a camera, have sometimes made it difficult to check the status of other participants.
In view of the above circumstances, an object of the present invention is to provide a technology that makes it easier to check the status of other participants in a conference held via a network.

One aspect of the present invention is a conference control device including: a display information generation unit that performs image processing on an image of a user participating in a conference held via a network so that the image approaches a predetermined standard, generates display information including the image of the user that has undergone the image processing, and transmits the display information to a user terminal used by another user; and an audio control unit that controls audio in the conference.

One aspect of the present invention is the conference control device described above, wherein the predetermined standard is a standard relating to the size of the image of the user's face.

One aspect of the present invention is the conference control device described above, wherein the predetermined standard is a standard relating to the white balance or contrast of the image of the user's face.

One aspect of the present invention is the conference control device described above, further including an estimation unit that estimates the emotional state of the user based on the image of the user, wherein the display information generation unit performs image processing on the image according to the emotional state estimated by the estimation unit.

One aspect of the present invention is a conference control method including: a display information generation step of performing image processing on an image of a user participating in a conference held via a network so that the image approaches a predetermined standard, generating display information including the image of the user that has undergone the image processing, and transmitting the display information to a user terminal used by another user; and an audio control step of controlling audio in the conference.

One aspect of the present invention is a computer program for causing a computer to function as the conference control device described above.

The present invention makes it easier to check the status of other participants in a conference held via a network.

FIG. 1 is a schematic block diagram showing the system configuration of the conference system 100 of the present invention.
FIG. 2 is a schematic block diagram showing a specific example of the functional configuration of the user terminal 10.
FIG. 3 is a schematic block diagram showing a specific example of the functional configuration of the conference control device 20.
FIG. 4 is a diagram showing a specific example of an image displayed on a user terminal according to conventional technology.
FIG. 5 is a diagram showing a specific example of an image displayed on the display unit 13 of the user terminal 10.
FIG. 6 is a diagram showing a specific example of an image displayed on the display unit 13 of the user terminal 10.
FIG. 7 is a diagram showing a specific example of an image displayed on the display unit 13 of the user terminal 10.
FIG. 8 is a diagram showing a specific example of an image displayed on the display unit 13 of the user terminal 10.
FIG. 9 is a diagram showing a specific example of an image displayed on the display unit 13 of the user terminal 10.
FIG. 10 is a sequence chart showing a specific example of the flow of processing of the conference system 100.

Hereinafter, specific configuration examples of the present invention will be described with reference to the drawings. In the following description, the concept of a virtual connection through which two or more users speak to others is called a conference room. A conference room in the following description therefore does not need to be named "conference room": anything that is a virtual place where two or more users speak to others, even if it is simply called a conversation or a session, corresponds to a conference room in the following description. For example, a virtual connection for a seminar or presentation in which a specific user (for example, a lecturer) speaks one-way to multiple others (including large numbers such as 50 or 100 people) is also included in the conference room of the following description.

FIG. 1 is a schematic block diagram showing the system configuration of the conference system 100 of the present invention. The conference system 100 is a system that allows users operating user terminals 10 to hold a conference with one another via the network 30. The conference system 100 includes a plurality of user terminals 10 and a conference control device 20. The user terminals 10 and the conference control device 20 are communicably connected via the network 30. The network 30 may be a network using wireless communication or a network using wired communication. The network 30 may be configured by combining a plurality of networks.

FIG. 2 is a schematic block diagram showing a specific example of the functional configuration of the user terminal 10. The user terminal 10 is configured using an information device such as a smartphone, tablet, personal computer, portable game machine, stationary game machine, or dedicated device. The user terminal 10 includes a communication unit 11, an operation unit 12, a display unit 13, an audio input unit 14, an audio output unit 15, a storage unit 16, and a control unit 17.

The communication unit 11 is a communication device. The communication unit 11 may be configured as a network interface, for example. The communication unit 11 performs data communication with other devices via the network 30 under the control of the control unit 17. The communication unit 11 may be a device that performs wireless communication or a device that performs wired communication.

The operation unit 12 is configured using an existing input device such as a keyboard, pointing device (mouse, tablet, etc.), button, or touch panel. The user operates the operation unit 12 to input instructions to the user terminal 10. The operation unit 12 may be an interface for connecting an input device to the user terminal 10. In this case, the operation unit 12 inputs to the user terminal 10 an input signal that the input device generates in response to the user's input. The operation unit 12 may also be configured using a microphone and a speech recognition device. In this case, the operation unit 12 performs speech recognition on the words uttered by the user and inputs the resulting character string to the user terminal 10, and the operation unit 12 may be configured integrally with the audio input unit 14. The operation unit 12 may be configured in any way as long as it allows the user's instructions to be input to the user terminal 10.

The display unit 13 is an image display device such as a liquid crystal display or an organic EL (electroluminescence) display. The display unit 13 displays image data used when holding a conference. The display unit 13 may be an interface for connecting an image display device to the user terminal 10. In this case, the display unit 13 generates a video signal for displaying the image data and outputs the video signal to the image display device connected to it.

The audio input unit 14 is configured using a microphone. The audio input unit 14 may be the microphone itself, or an interface for connecting a microphone to the user terminal 10 as an external device. The microphone captures the speech of the user taking part in the conference. The audio input unit 14 outputs the audio data captured by the microphone to the control unit 17.

The audio output unit 15 is configured using an audio output device such as a speaker, headphones, or earphones. The audio output unit 15 may be the audio output device itself, or an interface for connecting an audio output device to the user terminal 10 as an external device. The audio output device desirably outputs audio in a way that the user taking part in the conference can hear. The audio output unit 15 outputs audio corresponding to the audio signal output by the control unit 17.

The storage unit 16 is configured using a storage device such as a magnetic hard disk device or a semiconductor storage device. The storage unit 16 stores data used by the control unit 17. The storage unit 16 may function as, for example, a user information storage unit 161 and a speech information storage unit 162.

The user information storage unit 161 stores information about the user who operates the user terminal 10 (hereinafter referred to as "user information"). The user information may include, for example, identification information and attribute information of the user. The attribute information may include, for example, information on the user's age, gender, and the like.

The control unit 17 is configured using a processor such as a CPU (Central Processing Unit) and a memory. When the processor executes a program, the control unit 17 functions as a display control unit 171, a conference control unit 172, and an audio control unit 173. All or some of the functions of the control unit 17 may be implemented using hardware such as an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array). The program may be recorded on a computer-readable recording medium. A computer-readable recording medium is, for example, a portable medium such as a flexible disk, magneto-optical disk, ROM, CD-ROM, or semiconductor storage device (for example, an SSD: Solid State Drive), or a storage device such as a hard disk or semiconductor storage device built into a computer system. The program may also be transmitted via a telecommunication line.

The display control unit 171 receives display information from the conference control device 20 via the communication unit 11. The display control unit 171 generates an image signal based on the received display information and causes the display unit 13 to display it. The display information may be, for example, image data representing the displayed image itself. In this case, the entity that generates the image data (the image data generation unit) is the conference control device 20. Alternatively, the display information may be data indicating the information needed to generate the displayed image (for example, information about the participating users). In this case, the display control unit 171 generates the image data to be displayed on the display unit 13 based on the display information, and the entity that generates the image data (the image data generation unit) is the display control unit 171.

The conference control unit 172 performs control relating to conferences that are virtually provided by the conference control device 20. For example, when the user operates the operation unit 12 to instruct a login to the conference service provided by the conference control device 20, the conference control unit 172 performs the login processing. When the user operates the operation unit 12 to instruct that a new conference room be set up, the conference control unit 172 performs processing for setting up the new conference room. When the user operates the operation unit 12 to instruct entry into a conference room, the conference control unit 172 performs processing for entering the specified conference room. Entry into a conference room may be performed in any manner. For example, on a screen displaying one or more characters, buttons, or icons that each indicate a conference room, entry into a conference room may be performed by operating the corresponding character, button, or icon. Entry into a conference room may also be performed by accessing an address assigned to that conference room (for example, a specific identification number or a URL (Uniform Resource Locator)).

The audio control unit 173 performs control relating to the audio exchanged with users of other user terminals 10. When a user enters a conference room, audio is transmitted to and received from the other users in that conference room. The audio control unit 173 transmits, for example, audio data input from the audio input unit 14 to the conference control device 20 via the communication unit 11. When the audio control unit 173 receives audio data from the conference control device 20, it outputs the received audio data from the audio output unit 15.

FIG. 3 is a schematic block diagram showing a specific example of the functional configuration of the conference control device 20. The conference control device 20 is configured using an information processing device such as a personal computer or a server device. The conference control device 20 includes a communication unit 21, a storage unit 22, and a control unit 23.

The communication unit 21 is a communication device. The communication unit 21 may be configured as a network interface, for example. The communication unit 21 performs data communication with other devices via the network 30 under the control of the control unit 23. The communication unit 21 may be a device that performs wireless communication or a device that performs wired communication.

The storage unit 22 is configured using a storage device such as a magnetic hard disk device or a semiconductor storage device. The storage unit 22 stores data used by the control unit 23. The storage unit 22 may function as, for example, a user information storage unit 221, a conference room information storage unit 222, and an emotional state information storage unit 223. The user information storage unit 221 stores information (user information) about the multiple users who operate the user terminals 10.

The conference room information storage unit 222 stores information about conference rooms (hereinafter referred to as "conference room information"). A conference room is a virtual room that a user sets up in the conference system 100 in order to hold a conference. The conference room information may include, for example, the ID of the conference room, the name set for the conference room, the reserved date and time at which the conference room is to be set up, and information about the attributes of the conference room. The attribute information may include, for example, the number of people who can enter the conference room and information indicating which users can enter it.

The emotional state information storage unit 223 stores information indicating the emotional state of each user (each participant) as estimated by the estimation unit 235 of the control unit 23 (hereinafter referred to as "emotional state information"). For example, the emotional state information storage unit 223 may store the emotional state information for each conference. In this case, the emotional state information may represent, for each participant in that conference, the estimated emotional state at each predetermined timing (for example, every 1 second, every 5 seconds, or every 1 minute).
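The per-conference, per-participant time series described above can be sketched as a small in-memory store. This is only an illustrative sketch: the class name `EmotionalStateStore`, its methods, and the string labels for states are hypothetical and do not come from the publication, which does not specify a storage layout.

```python
from collections import defaultdict


class EmotionalStateStore:
    """Hypothetical in-memory stand-in for the emotional state information storage unit 223."""

    def __init__(self):
        # conference_id -> user_id -> list of (timestamp_seconds, state)
        self._records = defaultdict(lambda: defaultdict(list))

    def record(self, conference_id, user_id, timestamp, state):
        """Append one estimation result for a participant of a conference."""
        self._records[conference_id][user_id].append((timestamp, state))

    def timeline(self, conference_id, user_id):
        """Return the participant's estimation results in chronological order."""
        return sorted(self._records[conference_id][user_id])


store = EmotionalStateStore()
store.record("room-1", "alice", 5.0, "joy")
store.record("room-1", "alice", 0.0, "neutral")
print(store.timeline("room-1", "alice"))
```

Keying first by conference and then by participant mirrors the "for each conference, for each participant" wording; a real device would presumably persist this data rather than hold it in memory.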

The control unit 23 is configured using a processor such as a CPU and a memory. When the processor executes a program, the control unit 23 functions as a user control unit 231, a conference room control unit 232, a display information generation unit 233, an audio control unit 234, an estimation unit 235, and an evaluation unit 236. All or some of the functions of the control unit 23 may be implemented using hardware such as an ASIC, PLD, or FPGA. The program may be recorded on a computer-readable recording medium. A computer-readable recording medium is, for example, a portable medium such as a flexible disk, magneto-optical disk, ROM, CD-ROM, or semiconductor storage device (for example, an SSD), or a storage device such as a hard disk or semiconductor storage device built into a computer system. The program may also be transmitted via a telecommunication line.

The user control unit 231 performs control processing relating to users. For example, the user control unit 231 may perform login processing (for example, authentication processing) for a user terminal 10 that accesses the conference control device 20. The user control unit 231 may register the user information received from the user terminal 10 in the user information storage unit 221.

The conference room control unit 232 performs control processing relating to conference rooms. For example, when the conference room control unit 232 receives an instruction from a user terminal 10 to set up a new conference room, it may generate conference room information based on the received information and register it in the conference room information storage unit 222. When the time to set up a conference room arrives, the conference room control unit 232 virtually sets up that conference room. The time to set up a conference room is, for example, the moment of the instruction if the user terminal 10 instructs that a new conference room be set up immediately, or the reserved date and time if a reservation for setting up the conference room was registered in advance. When a user performs a predetermined operation for joining a conference room and a predetermined condition is satisfied, the conference room control unit 232 performs processing for letting the user join that conference room. For example, the conference room control unit 232 registers that a new user has joined the conference room by updating the conference room information storage unit 222.
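The set-up timing and join handling can be sketched as follows. The publication leaves the "predetermined condition" for joining unspecified; this sketch assumes, purely for illustration, that the room must already be set up and must not exceed its capacity. The `ConferenceRoom` class and its field names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class ConferenceRoom:
    room_id: str
    capacity: int                    # number of people who can enter (a room attribute)
    reserved_at: Optional[datetime]  # None means "set up immediately"
    participants: set = field(default_factory=set)

    def is_set_up(self, now: datetime) -> bool:
        """A room exists immediately, or once its reserved date and time has arrived."""
        return self.reserved_at is None or now >= self.reserved_at

    def join(self, user_id: str, now: datetime) -> bool:
        """Register the user if the assumed join conditions hold."""
        if user_id in self.participants:
            return True
        if not self.is_set_up(now) or len(self.participants) >= self.capacity:
            return False
        self.participants.add(user_id)
        return True


room = ConferenceRoom("r1", capacity=1, reserved_at=datetime(2021, 7, 20, 10, 0))
print(room.join("alice", datetime(2021, 7, 20, 9, 59)))  # before the reserved time
print(room.join("alice", datetime(2021, 7, 20, 10, 0)))  # at the reserved time
```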

The display information generation unit 233 generates the information (display information) needed to generate the image displayed on the user terminal 10. The display information may include, for example, information about the conference rooms currently set up and information about the users of the user terminals 10 who have entered each conference room. The display information may also include information indicating the result of the estimation unit 235's estimation of the emotional state of the user of each user terminal 10.

The display information may include face image data of each user. The display information generation unit 233 determines whether the face image of each user satisfies a predetermined appropriate standard regarding image quality. The predetermined standard may be, for example, a standard regarding the size of the face region (for example, a standard indicating that the size of the face region falls within an appropriate predetermined range). The predetermined standard may be a standard regarding appropriate white balance (brightness) of the face region (for example, a standard indicating the brightness of the face region or the strength of its edges). The predetermined standard may be a standard regarding appropriate contrast of the face region (for example, a standard indicating the difference between light and dark in the face region). For an image that does not satisfy the predetermined standard regarding image quality, the display information generation unit 233 executes predetermined image processing so that the image quality approaches the appropriate standard. The display information generation unit 233 then generates display information using the processed image and transmits the generated display information to the user terminal 10.
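As a rough illustration of "processing an image so that it approaches a standard", the sketch below works on a grayscale face region represented as nested lists. Everything concrete here is an assumption made for the example: the target values, the acceptable size range, the use of mean and standard deviation as brightness and contrast measures, and the function names. The publication does not state which measures or algorithms are used.

```python
from statistics import mean, pstdev

TARGET_MEAN = 128.0      # assumed brightness target on a 0-255 scale
TARGET_STD = 50.0        # assumed contrast target (standard deviation)
SIZE_RANGE = (0.3, 0.6)  # assumed acceptable face-height / frame-height ratio


def zoom_factor(face_height, frame_height, size_range=SIZE_RANGE):
    """Scale factor that moves the face ratio to the nearest bound of the range.

    Returns 1.0 when the face size already satisfies the standard.
    """
    ratio = face_height / frame_height
    low, high = size_range
    if ratio < low:
        return low / ratio
    if ratio > high:
        return high / ratio
    return 1.0


def normalize_tone(pixels, target_mean=TARGET_MEAN, target_std=TARGET_STD):
    """Linearly remap grayscale pixels toward the target brightness and contrast."""
    flat = [p for row in pixels for p in row]
    mu, sigma = mean(flat), pstdev(flat)
    gain = target_std / sigma if sigma > 0 else 1.0
    return [
        [min(255, max(0, round((p - mu) * gain + target_mean))) for p in row]
        for row in pixels
    ]


dark_face = [[10, 20], [30, 40]]
print(zoom_factor(face_height=100, frame_height=1000))  # face too small, so the factor exceeds 1
print(normalize_tone(dark_face))
```

A production system would more likely operate on color frames with an image library and a face detector; plain lists are used here only to keep the sketch self-contained.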

The audio control unit 234 receives audio data from the user terminals 10. The audio control unit 234 generates the audio data to be output to each user terminal 10 (hereinafter referred to as "conference audio data") and transmits the conference audio data to each user terminal 10. For example, the audio control unit 234 may transmit to each user terminal 10 the conference audio data for the conference room that its user has entered.
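The publication does not say how the conference audio data is produced. One common approach, shown here only as an assumed example, is a "mix-minus" mix in which each participant receives the sum of everyone else's audio but not their own. The function name and the 16-bit sample format are assumptions.

```python
def clip16(sample):
    """Clamp a sample to the signed 16-bit range."""
    return max(-32768, min(32767, sample))


def mix_for_each_user(frames):
    """Build per-user conference audio from one frame of samples per user.

    frames: dict mapping user_id -> list of int samples, all the same length.
    Returns a dict mapping user_id -> mix of all *other* users' samples.
    """
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    total = [sum(samples[i] for samples in frames.values()) for i in range(length)]
    return {
        user: [clip16(total[i] - samples[i]) for i in range(length)]
        for user, samples in frames.items()
    }


frames = {"a": [100, 200], "b": [10, 20], "c": [1, 2]}
print(mix_for_each_user(frames))
```

Subtracting each user's own samples from the running total avoids recomputing a separate sum per listener.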

The estimation unit 235 estimates the emotional state of each user participating in each conference room. For example, the estimation unit 235 estimates the emotional state based on each user's face image. The estimated emotional state may be, for example, one of joy, anger, sorrow, and pleasure, or one of joy, anger, surprise, sadness, and neutral. The estimated emotional state may also be, for example, the degree of interest in the content of the audio being played in the conference room. The emotional state may be estimated by image recognition based on predetermined feature values representing the expression in the face image. Such image recognition may be executed using, for example, a trained model obtained in advance by machine learning with training images or the like. The estimation unit 235 records the estimation results in chronological order in the emotional state information storage unit 223.

The emotional state may be estimated based on, for example, the size of the face image. In that case, a larger face image suggests that the user is watching from closer to the screen, so the user may be estimated to have greater interest. Conversely, a smaller face image suggests that the user is watching from farther away from the screen, so the user may be estimated to have less interest.
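A minimal sketch of this size-based estimate maps the fraction of the frame occupied by the face onto a 0-to-1 interest score. The linear mapping and the `min_ratio` and `max_ratio` thresholds are assumptions for illustration; the publication only states that a larger face image may indicate greater interest.

```python
def interest_from_face_size(face_area, frame_area, min_ratio=0.02, max_ratio=0.30):
    """Map the face's share of the frame to an interest score in [0.0, 1.0].

    Ratios at or below min_ratio give 0.0, ratios at or above max_ratio give 1.0,
    and values in between are interpolated linearly (an assumed mapping).
    """
    ratio = face_area / frame_area
    score = (ratio - min_ratio) / (max_ratio - min_ratio)
    return max(0.0, min(1.0, score))


print(interest_from_face_size(face_area=10, frame_area=100))
print(interest_from_face_size(face_area=20, frame_area=100))
```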

The emotional state may be estimated based on each user's movements. For example, a trained model obtained in advance by machine learning with training images or the like may be used to detect each user's nodding, and the emotional state may be estimated based on the number of nods per unit time, the magnitude of the nods, and so on. For example, a larger number of nods per unit time may be taken to indicate greater interest. Likewise, a larger nod magnitude may be taken to indicate greater interest. The estimate may also be based on both the number of nods per unit time and their magnitude.
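Assuming a nod detector already exists and reports each nod's magnitude, the combination of nod rate and nod magnitude could be sketched as a weighted score. The weights, the saturation values `full_rate` and `full_amplitude`, and the use of degrees as the magnitude unit are all hypothetical choices for this example.

```python
from statistics import mean


def interest_from_nods(nod_amplitudes, window_seconds,
                       rate_weight=0.5, full_rate=0.5, full_amplitude=20.0):
    """Combine nod rate and mean nod magnitude into a score in [0.0, 1.0].

    nod_amplitudes: magnitudes (assumed to be degrees) of the nods detected
    during a window of window_seconds. full_rate (nods per second) and
    full_amplitude are the assumed values at which each component saturates.
    """
    if not nod_amplitudes:
        return 0.0
    rate_score = min(1.0, (len(nod_amplitudes) / window_seconds) / full_rate)
    amplitude_score = min(1.0, mean(nod_amplitudes) / full_amplitude)
    return rate_weight * rate_score + (1.0 - rate_weight) * amplitude_score


print(interest_from_nods([10.0] * 3, window_seconds=60))
print(interest_from_nods([10.0] * 6, window_seconds=60))
```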

Several specific examples of the emotional state estimation process have been described above, but the emotional state may also be estimated by processes other than those described.

The evaluation unit 236 evaluates the content of the conference. For example, the evaluation unit 236 may evaluate the content of the conference based on the emotional state information stored in the emotional state information storage unit 223. For example, the evaluation unit 236 may rate the conference more favorably the higher a statistic of the values indicating each user's degree of interest is, that is, the more strongly the statistic indicates interest. The evaluation unit 236 may also rate the conference more favorably the more people show joy or surprise and the fewer people show anger or sadness.
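As a rough illustration of the two evaluation rules above, the following sketch computes one score from per-user interest values and another from emotion labels. The choice of the arithmetic mean as the statistic, the `(positive - negative) / total` formula, and the label strings are all assumptions made for this example; the publication does not fix any of them.

```python
from collections import Counter
from statistics import mean

# Label strings are hypothetical; the source names the emotions but not their encoding.
POSITIVE = {"joy", "surprise"}
NEGATIVE = {"anger", "sadness"}


def score_from_interest(interest_values):
    """Higher mean interest across users means a better-rated conference."""
    return mean(interest_values)


def score_from_emotions(labels):
    """Score in [-1, 1]: rises with joy/surprise, falls with anger/sadness."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no emotion labels to evaluate")
    positive = sum(counts[e] for e in POSITIVE)
    negative = sum(counts[e] for e in NEGATIVE)
    return (positive - negative) / total
```

For example, `score_from_emotions(["joy", "joy", "anger", "neutral"])` yields 0.25 under these assumptions.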

The evaluation unit 236 may also evaluate the attendance attitude of individual participants based on the emotional state information stored in the emotional state information storage unit 223. For example, the relatively longer the time during which emotional information was acquired for a participant, the better that participant's attendance attitude may be rated. This is because the acquisition of emotional information indicates that the participant's face was positioned in front of the monitor and facing it, from which it can be inferred that the participant was properly attending the conference.
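The attendance evaluation above can be sketched as the fraction of sampling instants at which an emotional state could be estimated. This is a minimal sketch that assumes samples are taken at regular intervals and that a failed estimate (no face in front of the monitor) is recorded as `None`; neither assumption is stated in the publication.

```python
from typing import Optional, Sequence


def attendance_ratio(samples: Sequence[Optional[str]]) -> float:
    """Return the fraction of samples for which an emotion was obtained.

    Each entry is an emotion label, or None when no face was detected
    and therefore no emotional information could be acquired.
    """
    if not samples:
        return 0.0
    acquired = sum(1 for label in samples if label is not None)
    return acquired / len(samples)
```

A participant whose record is `["joy", None, "neutral", "neutral"]` would score 0.75, and a higher ratio would correspond to a better attendance rating.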

FIG. 4 is a diagram showing a specific example of an image displayed on a user terminal using conventional technology. Each user area 91 displays an image of a user who has entered the conference room. The image of each user displayed in a user area 91 is a moving image captured by a camera connected to that user's user terminal. The face image of the user at the upper left is displayed at an appropriate size and with an appropriate white balance.

The face image of the user at the upper right is too large, so part of the face image extends beyond the user area 91. In addition, the face image of the user at the upper right was captured with an inappropriate white balance, caused by factors such as an overly bright shooting environment or inappropriate camera parameter settings. The face image is therefore too white (so-called "blown-out highlights"). As a result, the face image of the user at the upper right is difficult to see.

The face image of the user at the lower left is slightly small. In addition, the face image of the user at the lower left was captured with an inappropriate white balance, caused by factors such as an overly dark shooting environment or inappropriate camera parameter settings. The face image is therefore too dark (so-called "blocked-up shadows"). As a result, the face image of the user at the lower left is difficult to see.

The face image of the user at the lower right is considerably small. As a result, the face image of the user at the lower right is difficult to see.

FIG. 5 is a diagram showing a specific example of an image displayed on the display unit 13 of the user terminal 10. FIG. 5 shows a specific example of the image displayed on the display unit 13 of the user terminal 10 of the present invention in the same situation as FIG. 4. In FIG. 5, the display unit 13 displays a conference room screen. The conference room screen is the image displayed while the user is in the conference room. The conference room screen displays images of some or all of the users who have entered that conference room. The displayed screen is made up of one or more user areas 51. Each user area 51 displays an image of a user who has entered the room. The image of each user displayed in a user area 51 may be a moving image captured by a camera connected to that user's user terminal 10, or may be a still image (for example, an icon image). The image displayed in each user area 51 is generated by the display information generation unit 233.

The face image of the user at the upper left was captured at an appropriate size and with appropriate white balance and contrast. That is, in the image data of the upper-left user's face image, the size of the face image is within a predetermined range. In addition, the image quality of the face image satisfies predetermined conditions regarding white balance and contrast. Therefore, image processing need not be performed on the face image of the user at the upper left.

The face in the image of the user at the upper right was too large at the time the image was captured and received by the conference control device 20. The display information generation unit 233 therefore reduces the upper-right user's face image so that the face has a predetermined size. The face image of the user at the upper right also had blown-out highlights. The display information generation unit 233 therefore reduces the blown-out highlights by performing image processing, such as lowering the luminance, on the upper-right user's face image. The face image data resulting from this image processing is transmitted to each user terminal 10 as the face image data of the user at the upper right.

The face image of the user at the lower left was slightly small at the time it was captured and received by the conference control device 20. The display information generation unit 233 therefore enlarges the lower-left user's face image so that the face has a predetermined size. The face image of the user at the lower left also had blocked-up shadows. The display information generation unit 233 therefore reduces the blocked-up shadows by performing image processing, such as raising the luminance, on the lower-left user's face image. The face image data resulting from this image processing is transmitted to each user terminal 10 as the face image data of the user at the lower left.

The face image of the user at the lower right was considerably small at the time it was captured and received by the conference control device 20. The display information generation unit 233 therefore enlarges the lower-right user's face image so that the face has a predetermined size. The face image data resulting from this image processing is transmitted to each user terminal 10 as the face image data of the user at the lower right.

In this manner, the display information generation unit 233 performs image processing on each user's image so that the faces displayed in the user areas 51 have a predetermined size (substantially the same size). The display information generation unit 233 also performs image processing on each user's image so that the faces displayed in the user areas 51 have a predetermined brightness (for example, so that the average luminance within the face region is substantially the same).
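The size and brightness normalization described above can be illustrated with a small sketch that derives a resize factor and a luminance gain for one face. The target values, the use of a simple multiplicative gain, and the representation of the face region as a flat list of 8-bit luminance values are assumptions made for illustration; the publication only requires that faces end up with substantially the same size and average luminance.

```python
def scale_factor(face_height_px: float, target_face_height_px: float) -> float:
    """Factor by which to resize the image so the face reaches the target size.

    Values below 1 shrink an oversized face; values above 1 enlarge a small one.
    """
    if face_height_px <= 0:
        raise ValueError("face_height_px must be positive")
    return target_face_height_px / face_height_px


def brightness_gain(face_luma, target_mean: float = 128.0) -> float:
    """Gain that moves the mean luminance of the face region to target_mean."""
    if not face_luma:
        raise ValueError("face region is empty")
    current_mean = sum(face_luma) / len(face_luma)
    if current_mean == 0:
        return 1.0  # a fully black region cannot be corrected by a gain
    return target_mean / current_mean


def apply_gain(luma, gain: float):
    """Apply the gain and clamp each value to the 8-bit range 0..255."""
    return [min(255, max(0, round(value * gain))) for value in luma]
```

Under these assumptions an oversized, overly bright face receives a scale factor and a gain below 1, while a small, dark face receives values above 1, matching the upper-right and lower-left cases of FIG. 5.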

FIG. 6 is a diagram showing a specific example of an image displayed on the display unit 13 of the user terminal 10. In the image shown in FIG. 6, image processing based on the results estimated by the estimation unit 235 has been applied to the face images in the user areas 51 shown in FIG. 5. Specifically, image processing is performed according to each user's emotional state as estimated by the estimation unit 235. For example, a dashed circle surrounding the face is superimposed on the face image of the user estimated to be feeling joy (the upper-left face image). A circle drawn with a dash-dot line is superimposed on the face image of the user estimated to be feeling anger (the upper-right face image). A dotted circle is superimposed on the face image of the user estimated to be feeling sadness (the lower-left face image). A circle drawn with a dash-double-dot line is superimposed on the face image of the user estimated to be feeling surprise (the lower-right face image). The display modes of the circles described above are merely one specific example. The circles may differ in line type as shown here, or in line thickness or color. The geometric figure surrounding each face image also need not be a circle. For example, it may be a rectangle, an ellipse, or a polygon, and the figures may differ in shape from one another. Furthermore, the image superimposed on each face image need not be a geometric figure such as a circle; a pictogram or text indicating the respective emotional state may be superimposed instead.

The image processing according to the emotional state may also change the background color, or change the color or pattern of the entire user area 51. In this case, the image processing may be applied to such an extent that the user's face image is no longer displayed in a visible state. For example, the user area 51 may be filled with a color or pattern corresponding to each emotional state. FIG. 7 is a diagram showing a specific example of an image displayed on the display unit 13 of the user terminal 10. In FIG. 7, user areas 51 for 48 users are displayed. Each user area 51 is classified, based on the user's image, into one of the emotions joy, anger, surprise, sadness, and neutral, and is filled with a pattern corresponding to the classified emotional state.

With an ordinary display device, displaying the face images of as many as 48 people makes each user area 51 small in the first place, so it is difficult to read each user's facial expression. In such a situation, showing each user area 51 in a color or pattern corresponding to the emotional state allows the user to recognize each user's emotional state more easily. In addition, transmitting images filled with a color or pattern requires less data than transmitting face images. This makes it possible to hold a conference over the network more stably. Reducing the amount of data also makes it possible to reduce problems such as dropped frames even when a device without high rendering performance, such as an inexpensive smartphone, is used as the user terminal 10.
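To make the data-reduction argument concrete, the following sketch represents each user area as a small record holding only a fill color, rather than a face image. The color values, the dictionary layout, the label strings, and the 8-column grid are hypothetical choices for illustration and are not taken from this publication.

```python
# Hypothetical fill colors per emotional state; the source does not specify any.
EMOTION_FILL = {
    "joy": "#F2C14E",
    "anger": "#D7263D",
    "surprise": "#2E86AB",
    "sadness": "#5C6B73",
    "neutral": "#C8C8C8",
}


def build_tiles(emotion_by_user, columns=8):
    """Return rows of tiles, each tile carrying a user id and a fill color.

    Unknown labels fall back to the neutral fill.
    """
    tiles = [
        {"user": user, "fill": EMOTION_FILL.get(emotion, EMOTION_FILL["neutral"])}
        for user, emotion in emotion_by_user.items()
    ]
    return [tiles[i:i + columns] for i in range(0, len(tiles), columns)]
```

With 48 users and 8 columns this yields 6 rows of 8 tiles, each a few dozen bytes, which is far smaller than a video frame per user.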

FIG. 8 is a diagram showing a specific example of an image displayed on the display unit 13 of the user terminal 10. In FIG. 8, the number of users in each emotional state among the users participating in the same conference room is displayed as a bar graph. The display information generation unit 233 may generate display information that shows the number of users classified into each emotional state, as in FIG. 8. Such display information may be generated so as to be displayed together with the images shown in FIGS. 5 to 7.

FIG. 9 is a diagram showing a specific example of an image displayed on the display unit 13 of the user terminal 10. In FIG. 9, the number of users in each emotional state among the users participating in the same conference room is displayed as a line graph showing the change over time. The display information generation unit 233 may generate display information that shows the time-series change in the number of users classified into each emotional state, as in FIG. 9. Such display information may be generated so as to be displayed together with the images shown in FIGS. 5 to 7. In the display information of FIG. 9, information indicating the progress of the conference (for example, greetings, part 1, part 2, and Q&A) may be shown along the horizontal axis, which represents time. Information indicating a predetermined event (for example, a new product announcement) may also be shown on the horizontal axis at the position corresponding to the time at which the event occurred.
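The data behind the bar graph of FIG. 8 and the line graph of FIG. 9 can be sketched as simple counts per emotional state, taken once for a snapshot and once per timestamp. The record layout `(timestamp, user_id, emotion)` and the label strings are assumptions for this example; the publication only says that estimation results are recorded in chronological order.

```python
from collections import Counter

# Label strings are hypothetical encodings of the five states named in the text.
EMOTIONS = ("joy", "anger", "surprise", "sadness", "neutral")


def counts_per_emotion(labels):
    """Number of users in each emotional state (the bar graph data)."""
    counts = Counter(labels)
    return {emotion: counts.get(emotion, 0) for emotion in EMOTIONS}


def counts_over_time(records):
    """Per-timestamp counts in chronological order (the line graph data).

    records is an iterable of (timestamp, user_id, emotion) tuples.
    """
    by_time = {}
    for timestamp, _user_id, emotion in records:
        by_time.setdefault(timestamp, []).append(emotion)
    return [(t, counts_per_emotion(by_time[t])) for t in sorted(by_time)]
```

Each element of the returned list supplies one point per emotion on the line graph, and progress markers or event labels could be attached to the same timestamps.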

FIG. 10 is a sequence chart showing a specific example of the processing flow of the conference system 100. First, the user terminal 10 captures an image of the user (attendee) at a predetermined timing and transmits the image to the conference control device 20 (step S101).

Upon receiving the image of each user, the conference control device 20 executes image correction processing (step S102). In the image correction processing, the conference control device 20 performs image processing such as making the face sizes substantially the same and making the face brightness substantially the same. The conference control device 20 then executes processing to estimate the emotional state based on the user's image (step S103). The conference control device 20 generates display information according to the estimated emotional state (step S104). The conference control device 20 then transmits the generated display information to the user terminal 10 (step S105). The user terminal 10 displays images and text on the display unit 13 based on the received display information (step S106).
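The server-side part of this sequence (steps S102 to S105) can be sketched as a single function that takes the correction, estimation, and transmission operations as parameters. The function and class names, the use of raw bytes for images, and the decision to estimate the emotion from the corrected image rather than the original are assumptions of this sketch; the publication does not specify which image is used for estimation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DisplayInfo:
    user_id: str
    image: bytes
    emotion: str


def process_frames(
    frames: Dict[str, bytes],
    correct: Callable[[bytes], bytes],
    estimate: Callable[[bytes], str],
    send: Callable[[DisplayInfo], None],
) -> List[DisplayInfo]:
    """Run S102 to S105 for one batch of received user images."""
    results = []
    for user_id, raw_image in frames.items():
        corrected = correct(raw_image)     # S102: image correction
        emotion = estimate(corrected)      # S103: emotional state estimation
        results.append(DisplayInfo(user_id, corrected, emotion))  # S104
    for info in results:
        send(info)                         # S105: transmit to user terminals
    return results
```

Passing the three operations as callables keeps the sketch independent of any particular image library or transport, so stubs can stand in for them when exercising the flow.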

With the conference system 100 configured in this way, the status of other participants can be checked more easily in a conference held via a network. More specifically, in the conference system 100, image processing is performed so that each user's face image approaches an appropriate standard. For example, image processing is performed so that the face size, white balance, and contrast approach appropriate standards. This makes it easier to check each participant's facial expression and state. In addition, in the conference system 100, the emotional state of each user is estimated, and image processing is performed according to the estimation result. This processing also makes it easier to check each participant's state.

Although an embodiment of the present invention has been described in detail above with reference to the drawings, the specific configuration is not limited to this embodiment and also includes designs and the like that do not depart from the gist of the present invention.

DESCRIPTION OF SYMBOLS 100: conference system, 10: user terminal, 20: conference control device, 11: communication unit, 12: operation unit, 13: display unit, 14: audio input unit, 15: audio output unit, 16: storage unit, 161: user information storage unit, 17: control unit, 171: display control unit, 172: conference control unit, 173: audio control unit, 21: communication unit, 22: storage unit, 221: user information storage unit, 222: conference room information storage unit, 223: emotional state information storage unit, 23: control unit, 231: user control unit, 232: conference room control unit, 233: display information generation unit, 234: audio control unit, 235: estimation unit, 236: evaluation unit, 51: user area

Claims

1. A conference control device comprising:
a display information generation unit that performs image processing on an image of a user participating in a conference held via a network so that the image approaches a predetermined standard, generates display information including the image of the user that has undergone the image processing, and transmits the display information to a user terminal used by another user; and
an audio control unit that controls audio in the conference.

2. The conference control device according to claim 1, wherein the predetermined standard is a standard relating to the size of the user's face image.

3. The conference control device according to claim 1, wherein the predetermined standard is a standard relating to the white balance or contrast of the user's face image.

4. The conference control device according to any one of claims 1 to 3, further comprising an estimation unit that estimates the emotional state of the user based on the image of the user,
wherein the display information generation unit performs image processing on the image according to the emotional state estimated by the estimation unit.

5. A conference control method comprising:
a display information generation step of performing image processing on an image of a user participating in a conference held via a network so that the image approaches a predetermined standard, generating display information including the image of the user that has undergone the image processing, and transmitting the display information to a user terminal used by another user; and
an audio control step of controlling audio in the conference.

6. A computer program for causing a computer to function as the conference control device according to any one of claims 1 to 4.