JP2021064833A

JP2021064833A - Server device and video conferencing system

Info

Publication number: JP2021064833A
Application number: JP2019187054A
Authority: JP
Inventors: Akira Takahashi; Akiya Inagaki; Hirotaka Tsuda; Naoya Sato; Misako Koizumi
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2019-10-10
Filing date: 2019-10-10
Publication date: 2021-04-22

Abstract

To enable a video conference to proceed smoothly. SOLUTION: A server device 10 is configured as follows. An acquisition unit 111 acquires, from each of a plurality of user devices, individual information indicating an image of each user and the sound of the user, and screen information regarding the screen size of the user device corresponding to each user. An image generation unit 112 generates, for each of the user devices, image information corresponding to the screen size of that user device on the basis of a plurality of pieces of individual information corresponding one-to-one to the plurality of user devices and a plurality of pieces of screen information corresponding one-to-one to the plurality of user devices. A communication device 130 transmits the image information generated for each of the plurality of user devices to the corresponding user device. SELECTED DRAWING: Figure 2

Description

The present invention relates to a server device and a video conferencing system.

Patent Document 1 discloses a conference support system that supports a conference by identifying the speaker among a plurality of participants in the conference. In the conference support system, an imaging device using a fisheye lens is placed on the conference table. The conference support system extracts the lip movements of the participants from the imaging information captured by the imaging device, and identifies the speaker based on the extracted lip movements. Further, based on the imaging information, the conference support system generates an image that includes the faces of the plurality of participants and a microphone icon indicating the speaker, and displays the image on a display.

Japanese Unexamined Patent Publication No. 2015-19162

However, the conventional conference support system does not support a video conference in which a plurality of users in different locations participate. The user devices used by the users who participate in a video conference include, for example, smartphones, personal computers, and watch-type wearable devices. That is, the screen sizes of the user devices range from small to large. When a video conference is held using a plurality of user devices having different screen sizes, distributing an image intended for a large screen to all of the user devices may leave some user devices unable to display the image. Conversely, distributing an image intended for a small screen to all of the user devices may leave other user devices unable to make full use of their display capability. The conventional technique therefore has the problem that the video conference cannot proceed smoothly.

In order to solve the above problem, a server device according to a preferred aspect of the present disclosure is a server device that communicates with a plurality of user devices corresponding one-to-one to a plurality of users participating in a video conference, the server device including: an acquisition unit that acquires, from each of the plurality of user devices, individual information indicating an image of each user and the sound of the user, and screen information regarding the screen size of the user device corresponding to each user; an image generation unit that generates, for each of the plurality of user devices, image information corresponding to the screen size of each user device on the basis of a plurality of pieces of individual information corresponding one-to-one to the plurality of user devices and a plurality of pieces of screen information corresponding one-to-one to the plurality of user devices; and a transmission unit that transmits the image information generated for each of the plurality of user devices to the corresponding user device.

According to the present disclosure, the server device transmits, to each of the plurality of user devices, image information corresponding to its screen size, so that the video conference can proceed smoothly.

A block diagram showing an example of the configuration of the video conferencing system 1.
A perspective view showing an example of the configuration of the user device 20_1.
A plan view showing an example of the configuration of the user device 20_2.
A block diagram showing an example of the configuration of the server device 10.
An explanatory diagram showing an example of the screen layout when the number of participants in the video conference is four.
An explanatory diagram showing an example of the screen layout when the number of participants in the video conference is six.
An explanatory diagram showing an example of an image of the video conference.
An explanatory diagram showing an example of an image of the video conference.
An explanatory diagram showing an example of the image indicated by the image information G2.
A block diagram showing an example of the configuration of the user device 20_j.
A flowchart showing an example of the operation of the server device 10 relating to the generation of image information in the video conference.
A flowchart showing an example of the operation of the server device 10 relating to a secret talk.
An explanatory diagram showing an example of the image displayed on the user device 20_1, which is the other party of the secret talk.
An explanatory diagram showing an example of the image corresponding to the image information G3.
An explanatory diagram showing an example of the image displayed on the user device 20_1 during a chat.

1. Embodiment
1-1: Overall configuration
FIG. 1 is a block diagram showing a configuration example of the video conferencing system 1 according to the embodiment. The video conferencing system 1 provides a video conferencing service in which a plurality of users U1, U2, ... Un, who are in mutually different locations, participate. The video conferencing system 1 further provides a service for holding a secret talk during the video conference. A secret talk can be held among a subset of the users U1, U2, ... Un. In a typical example, a secret talk is held between two of the users U1, U2, ... Un. The content of the secret talk is shared only among its participants and is kept secret from all other users. Here, n is an integer of 2 or more.

The video conferencing system 1 includes a server device 10 and a plurality of user devices 20_1, 20_2, ... 20_n. These components are connected to a communication network NET such as the Internet. The plurality of user devices 20_1 to 20_n correspond one-to-one to the plurality of users U1 to Un. In the following description, j denotes an arbitrary integer from 1 to n inclusive. Any one of the user devices 20_1 to 20_n is referred to as the user device 20_j, and any one of the users U1 to Un is referred to as the user Uj. The user device 20_1 is an example of a first user device. The user device 20_2 is an example of a second user device.

A device having a call function, an imaging function, and a communication function can be used as the user device 20_j. For example, a watch-type wearable device, a personal computer, or a smartphone is used as the user device 20_j. In the following description, a personal computer is used as the user device 20_1 used by the user U1, and a watch-type wearable device is used as the user device 20_2 used by the user U2.

FIG. 2 is a perspective view showing a configuration example of the user device 20_1. The user device 20_1 includes a display device 240, an input device 250, an imaging device 260, and a microphone 270. The display device 240 displays the image of the video conference. The imaging device 260 images a subject and outputs imaging information indicating the imaging result. The microphone 270 converts sound into an electric signal and outputs an audio signal indicating the conversion result. In addition to the face images of the users, the image of the video conference includes an image of a speak button B1. The speak button B1 will be described later.

FIG. 3 is a plan view showing a configuration example of the user device 20_2. Like the user device 20_1, the user device 20_2 includes a display device 240, an input device 250, an imaging device 260, and a microphone 270. The user device 20_2 has a belt. The user device 20_2 is used with the belt wound around the wrist of the user U2.

1-2: Server device
FIG. 4 is a block diagram showing an example of the configuration of the server device 10. The server device 10 is an example of an information processing device. The server device 10 includes a processing device 110, a storage device 120, and a communication device 130. The elements of the server device 10 are connected to one another by a single bus or a plurality of buses for communicating information. The term "device" in this specification may be read as another term such as circuit, device, or unit. Each element of the server device 10 is composed of one or more pieces of equipment, and some elements of the server device 10 may be omitted.

The processing device 110 is a processor that controls the entire server device 10 and is composed of, for example, one or more chips. The processing device 110 is composed of, for example, a central processing unit (CPU) including an interface with peripheral devices, an arithmetic unit, registers, and the like. Some or all of the functions of the processing device 110 may be realized by hardware such as a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array). The processing device 110 executes various processes in parallel or sequentially.

The storage device 120 is a recording medium readable by the processing device 110, and stores a plurality of programs, including a control program PR1 executed by the processing device 110, and various information used by the processing device 110, such as management information K. The storage device 120 may be composed of, for example, at least one of a ROM (Read Only Memory), an EPROM (Erasable Programmable ROM), an EEPROM (Electrically Erasable Programmable ROM), a RAM (Random Access Memory), and the like. The storage device 120 may be called a register, a cache, a main memory (main storage device), or the like.
The management information K is information that associates with one another a service ID that identifies a video conference, a user ID that identifies a user of the service, screen information that indicates the screen size of the user device used by the user, a speaker flag that indicates the speaker, and a secret talk ID that identifies a secret talk. The processing device 110 provides the video conferencing service by referring to and updating the management information K.
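As an illustration of how the management information K could be held in memory, the following is a minimal Python sketch. The class names, field names, and methods (`ManagementRecord`, `ManagementInfo`, `set_speaker`, `start_secret_talk`) are hypothetical; the description only states which items are associated with one another, not how they are stored.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ManagementRecord:
    """One entry of management information K (field names are illustrative)."""
    service_id: str                       # identifies the video conference
    user_id: str                          # identifies the user of the service
    screen_info: str                      # screen information of the user's device
    is_speaker: bool = False              # speaker flag
    secret_talk_id: Optional[str] = None  # None while the user is not in a secret talk


class ManagementInfo:
    """In-memory stand-in for management information K (an assumption)."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[str, str], ManagementRecord] = {}

    def register(self, record: ManagementRecord) -> None:
        self._records[(record.service_id, record.user_id)] = record

    def get(self, service_id: str, user_id: str) -> ManagementRecord:
        return self._records[(service_id, user_id)]

    def set_speaker(self, service_id: str, user_id: str) -> None:
        # Set the flag on the given user and clear it on every other
        # participant of the same conference.
        for (sid, uid), record in self._records.items():
            if sid == service_id:
                record.is_speaker = (uid == user_id)

    def start_secret_talk(self, service_id: str, talk_id: str,
                          user_ids: List[str]) -> None:
        # Associate the same secret talk ID with each participant.
        for uid in user_ids:
            self._records[(service_id, uid)].secret_talk_id = talk_id
```

Referring to and updating such a structure corresponds to the processing device 110 reading and rewriting the speaker flag or the secret talk ID as the conference proceeds.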

The communication device 130 is hardware (a transmission/reception device) for communicating with other devices. The communication device 130 is also called, for example, a network device, a network controller, a network card, or a communication module.

The processing device 110 reads the control program PR1 from the storage device 120 and executes it, thereby functioning as an acquisition unit 111, an image generation unit 112, a recognition unit 113, and a management unit 114.

The acquisition unit 111 acquires various types of information from each of the plurality of user devices 20_1 to 20_n. The information acquired from the user device 20_j includes individual information Xj, screen information Yj, designation information Zj, and detection information Dj. The individual information Xj, the screen information Yj, the designation information Zj, and the detection information Dj are generated by the user device 20_j. These pieces of information may be transmitted from the user device 20_j to the server device 10 directly, or may be transmitted to the server device 10 via another device. For example, when the user device 20_j is a watch-type wearable device, these pieces of information may be transmitted to the server device 10 via a smartphone carried by the user Uj.

The individual information Xj is information indicating the image of the user Uj and the sound of the user Uj. The screen information Yj is information regarding the screen size of the user device 20_j. The screen information Yj may indicate, for example, the screen resolution, the model number of the user device 20_j, or the device type of the user device 20_j. The model number of the user device 20_j does not directly indicate the screen size. However, the model number uniquely determines the screen size of the user device 20_j, so the model number is information that indirectly indicates the screen size. The device types of the user device 20_j may include a watch-type wearable device, a smartphone, a tablet terminal, and a personal computer. The designation information Zj is information that designates the other party of a secret talk and the communication method to be used for the secret talk.
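The three forms that the screen information Yj may take (resolution, model number, device type) can be reduced to a single screen size on the server side. The following is a minimal Python sketch of one way to do so. The dictionary keys, the model numbers, and the pixel values in the lookup tables are all hypothetical; the description does not specify any concrete format or values.

```python
from typing import Dict, Tuple

Resolution = Tuple[int, int]  # (width, height) in pixels

# Hypothetical lookup tables; model names and values are placeholders.
MODEL_TO_RESOLUTION: Dict[str, Resolution] = {
    "WATCH-A1": (320, 320),
    "PC-15": (1920, 1080),
}
DEVICE_TYPE_TO_RESOLUTION: Dict[str, Resolution] = {
    "watch": (320, 320),
    "smartphone": (1080, 1920),
    "tablet": (1536, 2048),
    "pc": (1920, 1080),
}


def resolve_screen_size(screen_info: dict) -> Resolution:
    """Return a screen size from screen information given in any of three forms."""
    if "resolution" in screen_info:
        # Direct indication of the screen size.
        width, height = screen_info["resolution"]
        return (width, height)
    if "model" in screen_info:
        # Indirect indication: the model number determines the screen size.
        return MODEL_TO_RESOLUTION[screen_info["model"]]
    if "device_type" in screen_info:
        # Coarse indication by device type.
        return DEVICE_TYPE_TO_RESOLUTION[screen_info["device_type"]]
    raise ValueError("screen information carries no usable field")
```

For example, `resolve_screen_size({"model": "WATCH-A1"})` looks the size up through the model table, which mirrors the statement that a model number indicates the screen size only indirectly.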

The detection information Dj is information indicating that the user device 20_j has detected an operation of the speak button B1 included in the image of the video conference. The speak button B1 is an example of a control for conveying the intention to speak. In a video conference, if a plurality of users speak at the same time, the progress of the video conference is hindered. Therefore, in the video conferencing system 1, when the speak button B1 is operated, the plurality of user devices 20_1 to 20_n display an image that conveys the intention to speak.

The image generation unit 112 generates image information Gj corresponding to the screen size of the user device 20_j based on the individual information Xj and the screen information Yj. That is, based on the plurality of pieces of individual information X1 to Xn and the plurality of pieces of screen information Y1 to Yn, the image generation unit 112 generates, for each of the plurality of user devices 20_1 to 20_n, image information G1 to Gn corresponding to the screen size of each user device. The image information G1 is an example of first image information. The image information G2 is an example of second image information.

The image generation unit 112 generates a plurality of pieces of face image information Gf1 to Gfn, corresponding one-to-one to the plurality of users U1 to Un, by extracting the image of the portion including the user's face from each of the plurality of pieces of individual information X1 to Xn. The shape of the image of the portion including the user's face is, for example, an ellipse, a circle, or a rectangle. The image of the portion including the user's face is composed of the background and the user's face.

The image generation unit 112 determines the screen layout of the image indicated by the image information Gj to be transmitted to the user device 20_j, based on the number of users participating in the video conference and the screen information Yj. The image generation unit 112 determines the screen layout by referring to, for example, layout information that associates the number of users and the screen size indicated by the screen information Yj with a screen layout.

For example, when the screen size indicated by the screen information Yj corresponds to a personal computer and the number of participants in the video conference is four, the screen layout is assigned four face image areas Ra1 to Ra4, as shown in FIG. 5A. When the screen size indicated by the screen information Yj corresponds to a personal computer and the number of participants in the video conference is six, the screen layout is assigned six face image areas Rb1 to Rb6, as shown in FIG. 5B. On the other hand, when the screen size indicated by the screen information Yj corresponds to a watch-type wearable device, one face image area Rc1 is assigned, as shown in FIG. 3, regardless of the number of participants in the video conference.
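The layout selection described above can be sketched as a table lookup. The following Python sketch covers only the three cases named in the text; the device-type strings, the table name `LAYOUT_INFO`, and the function `choose_layout` are hypothetical, and the real layout information may be keyed differently (for example by resolution rather than device type).

```python
from typing import Dict, List, Tuple

# Hypothetical layout information: (device type, number of participants)
# mapped to the face image areas of the screen layout.
LAYOUT_INFO: Dict[Tuple[str, int], List[str]] = {
    ("pc", 4): ["Ra1", "Ra2", "Ra3", "Ra4"],
    ("pc", 6): ["Rb1", "Rb2", "Rb3", "Rb4", "Rb5", "Rb6"],
}

# A watch-type wearable device gets one area regardless of participant count.
WATCH_LAYOUT: List[str] = ["Rc1"]


def choose_layout(device_type: str, participants: int) -> List[str]:
    """Return the face image areas for a device type and participant count."""
    if device_type == "watch":
        return list(WATCH_LAYOUT)
    try:
        return list(LAYOUT_INFO[(device_type, participants)])
    except KeyError:
        raise ValueError(
            f"no layout registered for {device_type!r} with {participants} participants"
        ) from None
```

Keeping the watch case outside the table reflects the statement that its layout does not depend on the number of participants.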

The image generation unit 112 generates the image information Gj by inserting face images into the face image areas of the screen layout. In the case of the screen layout shown in FIG. 5B, the image generation unit 112 inserts the face images of the users U1 to U6 into the face image areas Rb1 to Rb6, to which they correspond one-to-one, to generate the image information Gj. In this case, the video conference has six participants. On the other hand, in the case of the screen layout shown in FIG. 3, the image generation unit 112 inserts the face image of any one of the users U1 to Un into the face image area Rc1 to generate the image information Gj.
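The insertion of face images into face image areas can be sketched as a mapping from areas to users. In the following Python sketch, the function name `fill_layout` and the `featured` parameter are hypothetical. The text only says that a single-area layout shows "any one of" the users; choosing a featured user (for example the current speaker) is an assumption made here for illustration.

```python
from typing import Dict, List, Optional


def fill_layout(areas: List[str], user_ids: List[str],
                featured: Optional[str] = None) -> Dict[str, str]:
    """Map each face image area to the user whose face image is inserted there."""
    if len(areas) >= len(user_ids):
        # Enough areas for everyone: one-to-one assignment in order.
        return dict(zip(areas, user_ids))
    # Fewer areas than users (e.g. a watch with a single area): put the
    # featured user first so that user is the one shown. This choice is an
    # assumption, not something the description prescribes.
    ordered = list(user_ids)
    if featured in ordered:
        ordered.remove(featured)
        ordered.insert(0, featured)
    return dict(zip(areas, ordered))
```

With six areas and six users the result pairs Rb1 with U1 through Rb6 with U6; with the single area Rc1, only one user is assigned.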

When the acquisition unit 111 acquires the detection information Dj, the image generation unit 112 identifies the user Uj of the user device 20_j, which is the source of the detection information Dj, as a user who wishes to speak. By updating the image information Gj, the image generation unit 112 includes in the image of the video conference an image that makes the face image of the identified user distinguishable from the face images of the other users. For example, assume that the image of the video conference shown in FIG. 6A is displayed on the user device 20_j. In the example shown in FIG. 6A, the face images of the users U1 to U6 are arranged in the six face image areas Rb1 to Rb6, to which they correspond one-to-one. The face image of the speaker is larger than the face images of the other users. That is, the user U2, whose face image is displayed in the face image area Rb2, is the speaker. Here, when the user U6 operates the speak button B1 in the image of the video conference displayed on the user device 20_6, the user device 20_6 transmits detection information D6 to the server device 10. Triggered by the acquisition unit 111 acquiring the detection information D6, the image generation unit 112 updates the image information Gj. As a result, the image indicated by the image information Gj becomes the image of the video conference shown in FIG. 6B. As shown in FIG. 6B, an icon A1 is arranged at a position overlapping the boundary of the face image area Rb6, in which the face image of the user U6 is displayed. The icon A1 is an image indicating a wish to speak. From the icon A1, the users U1 to U5 other than the user U6 learn that the user U6 wishes to speak.
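The update triggered by detection information can be sketched by treating the conference image as a list of tiles. In the following Python sketch, the tile dictionaries and the functions `build_tiles` and `on_detection_info` are hypothetical stand-ins for the image information Gj; actual image composition is outside its scope.

```python
from typing import Dict, List, Optional

Tile = Dict[str, object]


def build_tiles(user_ids: List[str], speaker_id: Optional[str]) -> List[Tile]:
    """Describe one tile per user; the speaker's face image is shown enlarged."""
    return [
        {"user": uid, "enlarged": uid == speaker_id, "icon": None}
        for uid in user_ids
    ]


def on_detection_info(tiles: List[Tile], requester_id: str) -> List[Tile]:
    """Return updated tiles with icon A1 attached to the user who wishes to speak."""
    return [
        {**tile, "icon": "A1"} if tile["user"] == requester_id else dict(tile)
        for tile in tiles
    ]
```

In the scenario from the text, `build_tiles` with U2 as the speaker corresponds to FIG. 6A, and applying `on_detection_info` for U6 corresponds to the transition to FIG. 6B.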

The user devices 20_1 to 20_n may include a watch-type wearable device. For example, the user device 20_2 shown in FIG. 3 is a watch-type wearable device. For the watch-type wearable device, the image generation unit 112 generates image information G2 in which the user who wishes to speak can be identified. FIG. 6C is an explanatory diagram showing an example of the image indicated by the image information G2. As shown in FIG. 6C, the face image of the user U6, who wishes to speak, is arranged in a face image area Rc2. The image generation unit 112 generates the image information G2 using the face image information Gf6. In this example, the face image of the user U6 is displayed as a small inset (wipe), so that the user U2 can recognize that the user U6 wishes to speak.

Returning to FIG. 4, the recognition unit 113 identifies the speaker based on the plurality of pieces of individual information X1 to Xn. For example, the recognition unit 113 compares the loudness of the sounds indicated by the plurality of pieces of individual information X1 to Xn and, based on the comparison result, treats the user corresponding to the individual information indicating the loudest sound as the speaker.
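The loudness comparison performed by the recognition unit 113 can be sketched as follows in Python. Using the root mean square (RMS) of the audio samples as the loudness measure is an assumption; the description only says that the loudness of the sounds is compared. The function names and the sample format (a list of floats per user) are likewise hypothetical.

```python
import math
from typing import Dict, List, Optional


def rms(samples: List[float]) -> float:
    """Root-mean-square level of a block of audio samples (0.0 if empty)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def identify_speaker(audio_by_user: Dict[str, List[float]]) -> Optional[str]:
    """Return the user whose audio block is loudest, or None if no users."""
    if not audio_by_user:
        return None
    return max(audio_by_user, key=lambda uid: rms(audio_by_user[uid]))
```

A practical system would likely add a silence threshold and some smoothing over time so that the speaker does not switch on brief noises; neither is described in the text, so both are omitted here.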

When a secret talk is held, the management unit 114 switches the communication method between the user device 20_j that is the source of the designation information Zj and the user device of the other party of the secret talk designated by the designation information Zj, in accordance with the communication method for the secret talk designated by the designation information Zj.

1-3: User device
Next, the user device 20_j will be described. FIG. 7 is a block diagram showing an example of the configuration of the user device 20_j. The user device 20_j includes a processing device 210, a storage device 220, a communication device 230, a display device 240, an input device 250, an imaging device 260, and a microphone 270. The elements of the user device 20_j are connected to one another by a single bus or a plurality of buses for communicating information. Each element of the user device 20_j is composed of one or more pieces of equipment, and some elements of the user device 20_j may be omitted.

The processing device 210 is a processor that controls the entire user device 20_j and is composed of, for example, one or more chips. The processing device 210 is composed of, for example, a central processing unit including an interface with peripheral devices, an arithmetic unit, registers, and the like. The processing device 210 executes various processes in parallel or sequentially.

The storage device 220 is a recording medium readable by the processing device 210, and stores a plurality of programs, including a control program PR2 executed by the processing device 210, and various information used by the processing device 210, such as the screen information Yj. The storage device 220 may be composed of, for example, at least one of a ROM (Read Only Memory), an EPROM (Erasable Programmable ROM), an EEPROM (Electrically Erasable Programmable ROM), a RAM (Random Access Memory), and the like. The storage device 220 may be called a register, a cache, a main memory (main storage device), or the like.

The communication device 230 is hardware (a transmission/reception device) for communicating with other devices.

The display device 240 displays images. The display device 240 displays various images under the control of the processing device 210. For example, various display panels such as a liquid crystal display panel and an organic EL (Electro Luminescence) display panel are suitably used as the display device 240.

The input device 250 is a device that accepts input from the outside. For example, the input device 250 accepts operations for inputting symbols such as numbers and characters to the processing device 210, and operations for selecting an image such as an icon displayed on the display device 240. For example, a touch panel that detects contact with the display surface of the display device 240 is suitable as the input device 250. The input device 250 may include one or more controls that can be operated by the user. The input device 250 outputs input information corresponding to the user's operation.

The imaging device 260 images a subject and outputs imaging information indicating the imaging result to the processing device 210. When the video conferencing service is used, the subject is the user Uj who uses the user device 20_j. The user Uj images himself or herself using the user device 20_j. The image indicated by this imaging information includes the face of the user Uj.

The microphone 270 converts sound into an electric signal and outputs sound information indicating the conversion result to the processing device 210.

The processing device 210 reads the control program PR2 from the storage device 220 and executes it, thereby functioning as the first generation unit 211, the second generation unit 212, and the transmission control unit 213.

The first generation unit 211 generates the individual information Xj based on the imaging information and the sound information. The imaging information indicates an image in which the user Uj is the subject. The sound information indicates the voice of the user Uj when the user Uj speaks. The individual information Xj may include a user ID. The communication device 230 transmits the individual information Xj to the server device 10.

The second generation unit 212 generates the detection information Dj and the designated information Zj based on the input information.
The second generation unit 212 generates the detection information Dj when the input information indicates an operation of the speak button B1.

The second generation unit 212 generates the designated information Zj when the input information indicates an operation on the face image of a user other than the user Uj among the users U1 to Un. The operation on the face image may be any operation, as long as different operations can be distinguished from one another. For example, operations on a face image include a tap operation and a long-press operation. A long-press operation is an operation of pressing the face image continuously for a predetermined time or longer. When the input information indicates a tap operation on a face image, the second generation unit 212 generates designated information Zj that designates a secret talk with the user of that face image using the first communication method. When the input information indicates a long-press operation on a face image, the second generation unit 212 generates designated information Zj that designates a secret talk with the user of that face image using the second communication method.
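The mapping from an operation on a face image to the designated information Zj can be sketched as follows. This is only an illustrative sketch: the names `Method`, `DesignatedInfo`, `make_designated_info`, and the 0.8-second value of `LONG_PRESS_SECONDS` are assumptions introduced here and do not appear in the source, which speaks only of "a predetermined time".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Method(Enum):
    FIRST = "first"    # e.g. a voice call
    SECOND = "second"  # e.g. chat or non-verbal communication


@dataclass(frozen=True)
class DesignatedInfo:
    sender: int    # index j of the requesting user Uj
    target: int    # index of the user whose face image was operated
    method: Method


# Hypothetical value for the "predetermined time" of a long press.
LONG_PRESS_SECONDS = 0.8


def make_designated_info(sender: int, target: int,
                         press_seconds: float) -> Optional[DesignatedInfo]:
    """Build Zj for an operation on a face image, or None for the user's own face."""
    if target == sender:
        return None  # only face images of users other than Uj are valid targets
    # A long press selects the second method; a shorter press counts as a tap.
    if press_seconds >= LONG_PRESS_SECONDS:
        method = Method.SECOND
    else:
        method = Method.FIRST
    return DesignatedInfo(sender, target, method)
```

Because one operation carries both the target and the method, a single record is enough to describe the request; other distinguishable operations could be added as further `Method` members.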

The first communication method is, for example, a voice call. The second communication method is, for example, chat or non-verbal communication. Non-verbal communication means communication that does not rely on language. It includes sending pictorial icons such as so-called stamps, sending photographs, and applying effects to a face image. The face image subject to an effect is the face image of the user corresponding to the user device that sent the designated information, within the video conference image displayed on the user device of the other party of the secret talk. Effects may include, for example, shaking the face image from side to side and periodically changing the size of the face image.

The transmission control unit 213 controls the communication device 230 to transmit the individual information Xj, the screen information Yj, the designated information Zj, and the detection information Dj to the server device 10.

1-4: Operation of the server device
The operation of the server device 10 is described separately for the video conference and for the secret talk. FIG. 8 is a flowchart showing the operation of the server device 10 relating to the generation of image information in a video conference. The processing device 110 detects the start of the video conference (step S1). A specific URL is notified in advance to the user devices 20_1 to 20_n participating in the video conference. The user devices 20_1 to 20_n can use the video conferencing service by accessing the specific URL. The processing device 110 detects the start of the video conference by detecting, for example, that a user device has accessed the specific URL.

Next, the processing device 110 functions as the acquisition unit 111 and acquires the individual information X1 to Xn and the screen information Y1 to Yn from the user devices 20_1 to 20_n participating in the video conference (step S2).

In step S3, the processing device 110 functions as the image generation unit 112 and extracts face images based on the individual information X1 to Xn. By extracting, from the individual information X1 to Xn, the image of the portion that includes each user's face, the processing device 110 generates a plurality of pieces of face image information Gf1 to Gfn in one-to-one correspondence with the users U1 to Un.

In step S4, the processing device 110 functions as the image generation unit 112 and determines, for each screen size of the user devices 20_1 to 20_n, a screen layout that depends on the number of participants in the video conference. The processing device 110 may use the number of acquired pieces of individual information X1 to Xn as the number of participants. Alternatively, the processing device 110 may determine the number of participating users by counting the face images extracted in step S3.

In step S5, the processing device 110 functions as the image generation unit 112 and generates the image information G1 to Gn. Specifically, for each of the user devices 20_1 to 20_n, the processing device 110 generates the image information G1 to Gn by inserting the face images indicated by the face image information Gf1 to Gfn into the determined screen layout.

In step S6, the processing device 110 functions as the recognition unit 113 and determines, based on each of the individual information X1 to Xn, whether a speaker has been recognized. If the determination result is negative, the processing device 110 repeats the determination until the result becomes affirmative.

If the determination result in step S6 is affirmative, the processing device 110 functions as the image generation unit 112 and emphasizes the face image of the speaker (step S7). In step S8, the processing device 110 functions as the image generation unit 112 and updates the image information from image information indicating an image in which the speaker's face image is not emphasized to image information indicating an image in which the speaker's face image is emphasized.

In step S9, the processing device 110 functions as the image generation unit 112 and determines whether the acquisition unit 111 has acquired the detection information Dj.
If the determination result in step S9 is negative, the processing device 110 returns the processing to step S6. If the determination result in step S9 is affirmative, the processing device 110 functions as the image generation unit 112 and identifies the user Uj of the user device 20_j that sent the detection information Dj as a user who wishes to speak (step S10).

In step S11, the processing device 110 functions as the image generation unit 112 and updates the image information G1 to Gn by including, in the video conference image, an image that makes the face image of the identified user distinguishable. Such an image is, for example, the icon A1 shown in FIG. 6B.

After that, the processing device 110 determines whether the end condition of the video conference is satisfied (step S12). If the determination result is negative, the processing device 110 returns the processing to step S6. If the determination result is affirmative, the processing device 110 ends the processing. The end condition of the video conference may be, for example, that the communication connection has ended for all of the user devices 20_1 to 20_n that were connected to the server device 10 in order to participate in the video conference.

The image information G1 to Gn is generated by the above processing. The generated image information G1 to Gn is transmitted to the user devices 20_1 to 20_n, with which it corresponds one-to-one.
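The loop formed by steps S6 to S12 can be sketched as below, mirroring the order in which the text describes the steps. The sketch is an assumption-laden illustration: `Tick`, `ConferenceView`, and `run_conference` are hypothetical names, each `Tick` stands for one pass through the loop, and actual image generation is reduced to recording which user is emphasized and which users are marked as wishing to speak.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional, Set


@dataclass
class Tick:
    speaker: Optional[int]           # recognized speaker, None if nobody was recognized
    detection_from: Optional[int]    # sender of detection information Dj, if any
    all_disconnected: bool = False   # True when every user device has disconnected


@dataclass
class ConferenceView:
    emphasized: Optional[int] = None                      # face emphasized in S7-S8
    wants_to_speak: Set[int] = field(default_factory=set)  # users marked in S10-S11


def run_conference(ticks: Iterable[Tick]) -> ConferenceView:
    view = ConferenceView()
    for tick in ticks:
        if tick.speaker is None:            # S6 negative: repeat the determination
            continue
        view.emphasized = tick.speaker      # S7, S8: emphasize and update the image
        if tick.detection_from is None:     # S9 negative: return to S6
            continue
        view.wants_to_speak.add(tick.detection_from)  # S10, S11
        if tick.all_disconnected:           # S12 affirmative: end the processing
            break
    return view
```

For example, feeding ticks in which user 2 speaks, user 4 presses the speak button, and then all devices disconnect yields a view with the last speaker emphasized and user 4 in `wants_to_speak`.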

Next, the operation of the server device 10 in a secret talk is described. FIG. 9 is a flowchart showing the operation of the server device 10 relating to a secret talk.

The processing device 110 determines whether designated information has been acquired from any of the user devices 20_1 to 20_n (step S21). If the determination result in step S21 is negative, the processing device 110 repeats the processing of step S21 until the result becomes affirmative. In the following description, it is assumed that the server device 10 acquires the designated information Z3 from the user device 20_3. It is also assumed that the designated information Z3 designates the user U1, who uses the user device 20_1, as the other party of the secret talk. That is, the user U3 is the requester of the secret talk, and the user U1 is the other party.

If the determination result in step S21 is affirmative, the processing device 110 functions as the image generation unit 112 and determines whether the communication method indicated by the designated information Z3 is the first aspect (step S22). In this example, there are two communication methods for a secret talk: the first aspect and the second aspect. The communication method of the first aspect is a voice call. The communication method of the second aspect is chat.

If the determination result in step S22 is affirmative, the processing device 110 functions as the image generation unit 112 and generates image information G1 that corresponds to the other party of the secret talk and to the first aspect (step S23). After that, the processing device 110 transmits the image information G1 to the user device 20_1 used by the user U1, who is the other party of the secret talk (step S24).

FIG. 10 is an explanatory diagram showing an example of an image displayed on the user device 20_1 of the other party of the secret talk. As shown in FIG. 10, an icon A2 depicting a telephone is placed in the face image area Rb3, where the face image of the user U3 requesting the secret talk is placed. Because the icon A2, which corresponds to the communication method of the first aspect, is placed in the face image area Rb3, the user U1 can see that the user U3 is requesting a secret talk by voice call.

In step S25, the processing device 110 determines whether the other party has accepted the secret talk. Specifically, the processing device 110 determines whether the user U1, who is the other party of the secret talk, has tapped the icon A2 shown in FIG. 10 within a predetermined period. In the user device 20_1, the detection information D1 is generated when this tap operation is performed. The processing device 110 determines whether the other party has accepted the secret talk according to whether the detection information D1 was acquired within the predetermined period.

If the determination result in step S25 is affirmative, the processing device 110 starts a voice call between the user device 20_1 and the user device 20_3 (step S26). In step S26, the processing device 110 forwards the voice information acquired from the user device 20_1 to the user device 20_3, and forwards the voice information acquired from the user device 20_3 to the user device 20_1. This processing enables a voice call between the user U1 and the user U3.

If the determination result in step S25 is negative, the processing device 110 generates image information G3 corresponding to the requester of the secret talk (step S27). After that, the processing device 110 transmits the image information G3 to the user device 20_3 of the requester (step S28). FIG. 11 is an explanatory diagram showing an example of an image corresponding to the image information G3. As shown in FIG. 11, an icon A3 indicating that a call is not possible is placed in the face image area Rb1, where the face image of the user U1, the other party of the secret talk, is placed. Because the icon A3 is placed in the face image area Rb1, the user U3 can see that the user U1 has not accepted the secret talk by voice call.

If the determination result in step S22 is negative, the communication method designated by the designated information Z3 is the second aspect. In this case, the processing device 110 generates image information G1 and G3 corresponding to chat, which is the second aspect (step S29). The image indicated by the image information G1 and the image indicated by the image information G3 are the same. The processing device 110 transmits the image information G3 to the user device 20_3 and the image information G1 to the user device 20_1 (step S30).

After that, the processing device 110 starts a chat between the user device 20_1 and the user device 20_3 (step S31). During the chat, the processing device 110 generates image information G1 and G3 indicating images that reflect the text information acquired from the user device 20_1 and the text information acquired from the user device 20_3, and transmits the generated image information G1 and G3 to the user device 20_1 and the user device 20_3, respectively.

FIG. 12 is an explanatory diagram showing an example of an image displayed on the user device 20_1 during a chat. As shown in FIG. 12, the image is provided with a chat area Rd. Text is displayed in the chat area Rd. An end button B2 for entering an instruction to end the chat is also placed in the chat area Rd. In this example, the chat area Rd is placed in a region that does not overlap the face image of the user U3, who is the requester of the secret talk, the face image of the user U1, who is the other party, or the face image of the user U2, who is the speaker. With this screen layout, the user can recognize the requester and the other party of the secret talk at a glance. In addition, because the speaker remains visible, the requester and the other party can hold the secret talk while following the progress of the video conference.

After that, the processing device 110 determines whether the end condition of the secret talk is satisfied (step S32). If the determination result in step S32 is negative, the processing device 110 repeats the determination of step S32 until the result becomes affirmative. When the determination result becomes affirmative, the processing device 110 ends the secret talk processing and returns to the normal video conference processing.
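The branching of the secret talk flow in FIG. 9 can be summarized with the following sketch. It is a simplified illustration under stated assumptions: `Aspect`, `secret_talk_actions`, and the action labels are hypothetical names, the timed wait for the other party's tap is reduced to a boolean, and the end-condition check that follows is left out.

```python
from enum import Enum
from typing import List


class Aspect(Enum):
    VOICE = 1  # first aspect: voice call
    CHAT = 2   # second aspect: chat


def secret_talk_actions(aspect: Aspect, accepted_in_time: bool = False) -> List[str]:
    """List the server-side actions taken once designated information has arrived."""
    if aspect is Aspect.VOICE:
        # The other party first sees the telephone icon A2 on the requester's face area.
        actions = ["show_icon_A2_to_other_party"]
        if accepted_in_time:
            # Tap on A2 within the predetermined period: relay voice both ways.
            actions.append("start_voice_relay")
        else:
            # No tap in time: the requester sees icon A3 on the other party's face area.
            actions.append("show_icon_A3_to_requester")
        return actions
    # Second aspect: both devices receive chat images, then the chat starts.
    return ["send_chat_images", "start_chat"]
```

Only the voice branch waits for acceptance; in the flow as described, the chat branch starts without a separate acceptance step.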

As described above, the server device 10 communicates with a plurality of user devices 20_1 to 20_n that correspond one-to-one to a plurality of users U1 to Un participating in a video conference. The acquisition unit 111 acquires, from each of the user devices 20_1 to 20_n, individual information Xj indicating an image of the user and the sound of that user, and screen information Yj regarding the screen size of the user device corresponding to that user. The image generation unit 112 generates, for each of the user devices 20_1 to 20_n, image information Gj suited to the screen size of that user device, based on the individual information X1 to Xn and the screen information Y1 to Yn, each of which corresponds one-to-one to the user devices 20_1 to 20_n. The communication device 130, which is an example of a transmission unit, transmits the image information Gj generated for each of the user devices 20_1 to 20_n to the corresponding user device 20_j.
According to the above configuration, even when some of the user devices 20_1 to 20_n differ in screen size, the server device 10 transmits to each of the user devices 20_1 to 20_n image information Gj suited to its screen size, so the video conference can proceed smoothly.

The plurality of user devices 20_1 to 20_n include the user device 20_1, which is an example of a first user device, and the user device 20_2, which is an example of a second user device. The screen size indicated by the screen information Y1 corresponding to the user device 20_1 is larger than the screen size indicated by the screen information Y2 corresponding to the user device 20_2. The image generation unit 112 generates the image information G1 corresponding to the user device 20_1 and the image information G2 corresponding to the user device 20_2. The image indicated by the image information G1 includes the faces of the plurality of users U1 to Un. The image indicated by the image information G2 includes the face of the speaker among the users U1 to Un. The number of user faces included in the image indicated by the image information G2 is smaller than the number included in the image indicated by the image information G1. The communication device 130 transmits the image information G1 to the user device 20_1 and the image information G2 to the user device 20_2.

According to the above configuration, more user faces are displayed on a user device with a large screen than on a user device with a small screen, so the number of participant faces shown can be varied according to the screen size. Displaying fewer faces reduces the amount of information, but because the speaker's face is displayed even on a user device with a small screen, the convenience of the video conference is improved.

The image generation unit 112 also generates a plurality of pieces of face image information Gf1 to Gfn, in one-to-one correspondence with the users U1 to Un, by extracting the image of the portion including each user's face based on the individual information X1 to Xn. As the image information G1, the image generation unit 112 generates information that includes the face image information Gf1 to Gfn of the users U1 to Un. As the image information G2, the image generation unit 112 generates information that includes the face image information of the speaker among the face image information Gf1 to Gfn, and that omits some of the face image information Gf1 to Gfn.

According to the above configuration, the image generation unit 112 generates the face image information Gf1 to Gfn once and then uses it to generate the image information G1 to Gn. The image generation unit 112 can therefore generate the image information G1 to Gn more simply than if the required user face images were extracted separately for each piece of image information G1 to Gn.
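One way to picture how already-extracted face images are shared across devices is the selection sketch below. It assumes, purely for illustration, that each device's screen size has been turned into a maximum number of faces (`max_faces`, at least 1); that parameter and the function name `faces_for_device` are hypothetical and not taken from the source.

```python
from typing import List, Optional


def faces_for_device(user_ids: List[int], speaker: Optional[int],
                     max_faces: int) -> List[int]:
    """Choose which extracted face images go into one device's image."""
    if max_faces >= len(user_ids):
        return list(user_ids)  # large screen: every participant's face
    # Small screen: the speaker's face always comes first, if there is a speaker.
    chosen = [speaker] if speaker in user_ids else []
    for uid in user_ids:
        if len(chosen) >= max_faces:
            break
        if uid not in chosen:
            chosen.append(uid)
    return chosen
```

With four participants and user 3 speaking, a device limited to one face would receive only user 3, while a device with room for all four would receive every face. The face images themselves are extracted once and only the selection differs per device.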

The server device 10 includes the recognition unit 113, which recognizes the speaker among the users U1 to Un based on the voices indicated by the individual information X1 to Xn. Based on the recognition result of the recognition unit 113, the image generation unit 112 identifies, among the face image information Gf1 to Gfn, the face image information corresponding to the speaker.

According to the above configuration, the speaker is identified based on voice, so the processing load is lower than when images are analyzed to generate feature values of lip movement and the speaker is identified from those feature values.
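The source does not state how the recognition unit 113 derives the speaker from the voices. As one possible illustration only, the sketch below picks the user whose audio has the highest root-mean-square level above a threshold; the function names, the sample format (floats per user), and the threshold value 0.1 are all assumptions.

```python
import math
from typing import Dict, Optional, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of one user's audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def recognize_speaker(audio: Dict[int, Sequence[float]],
                      threshold: float = 0.1) -> Optional[int]:
    """Return the user with the loudest audio, or None if nobody exceeds the threshold."""
    levels = {uid: rms(samples) for uid, samples in audio.items()}
    if not levels:
        return None
    loudest = max(levels, key=levels.get)
    return loudest if levels[loudest] >= threshold else None
```

A level comparison of this kind needs only a pass over the audio samples, which is consistent with the stated aim of avoiding image analysis, though a real system would likely need smoothing over time.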

When the number of user faces included in the image indicated by the image information G2 is "1", the image generation unit 112 generates the image information G1 by emphasizing the face image indicated by the face image information corresponding to the speaker, and generates the image information G2 without emphasizing that face image.

The reason for emphasizing the speaker's face image is to make the speaker easy to identify when an image contains a plurality of face images. When the number of user faces included in the image indicated by the image information G2 is "1", only the speaker is displayed, so there is no need to emphasize the face image in that image. According to the above configuration, the speaker's face image is not emphasized in that case, which reduces the processing load of generating the image information G2.

The image indicated by each of the image information G1 to Gn includes an image of the speak button B1, which is an example of a control for conveying a wish to speak. The acquisition unit 111 can acquire, from any user device 20_j among the user devices 20_1 to 20_n, detection information Dj indicating that the image of the speak button B1 has been operated. The image generation unit 112 includes, in the image indicated by each of the image information G1 to Gn, an image that makes it possible to identify the user Uj who uses the user device 20_j that sent the detection information Dj.

According to the above configuration, a user who wishes to speak can convey that wish to the other users participating in the video conference by operating the speak button B1, so the video conference can proceed smoothly.

The acquisition unit 111 can acquire, from the user devices 20_1 to 20_n, designated information Z1 to Zn that designates the user who is to be the other party of a secret talk and the communication method to be used for it. When a secret talk is held, the management unit 114 switches the communication method between the user device 20_j that sent the designated information Zj and the user device of the other party designated by the designated information Zj, according to the communication method designated by the designated information Zj.

According to the above configuration, the communication method can be designated by the designated information Zj generated at the requester of the secret talk, so a secret talk suited to the situation is possible, unlike the case where only one communication method is available.

The designated information Zj is generated by one user device 20_j among the user devices 20_1 to 20_n. The user who is to be the other party of the secret talk is designated by which face image, among the plurality of face images displayed on that user device 20_j, is the target of the operation. The communication method for the secret talk is designated by the type of operation performed on the face image.

According to the above configuration, both the other party of the secret talk and the communication method can be designated by an operation on a single face image, which is more convenient for the user than designating them by operations on separate images.

In a video conference, when the faces of a plurality of users U1 to Un are displayed in the image shown on the user device 20_j of the other party of the secret talk, the image generation unit 112 generates image information Gj indicating an image in which the face of the user who sent the designated information is emphasized.
According to the above configuration, the other party of the secret talk can be informed of which user requested it.

２．変形例 2. Modifications
本開示は、以上に例示した実施形態に限定されない。具体的な変形の態様を以下に例示する。以下の例示から任意に選択された２以上の態様を併合してもよい。 The present disclosure is not limited to the embodiments exemplified above. A specific mode of modification is illustrated below. Two or more aspects arbitrarily selected from the following examples may be merged.

２−１：変形例１ 2-1: Modification 1
実施形態では画像情報Ｇ１〜Ｇｎの各々が示す画像は１頁で構成されるが、本開示はこれに限定されない。例えば、画面サイズに応じて、１頁に表示する顔画像の最大値を定め、画像生成部１１２は、テレビ会議の参加者の人数が最大値を超える場合は、複数頁でテレビ会議の画像を構成する画像情報Ｇｊを生成してもよい。例えば、画面サイズが時計型のウェアラブル装置の場合、最大値を「１」としてもよい。画像生成部１１２は、テレビ会議の画像をｎ頁で構成する画像情報Ｇｊを生成する。例えば、画面サイズがスマートフォンの場合、最大値を「３」としてもよい。テレビ会議の参加者の人数が６名であれば、画像生成部１１２は、テレビ会議の画像を２頁で構成する画像情報Ｇｊを生成してもよい。テレビ会議の画像が複数頁で構成される場合、ユーザ装置２０_jにおいて、フリック操作によって表示される頁を変更できる。 In the embodiment, the image indicated by each of the image information G1 to Gn consists of one page, but the present disclosure is not limited to this. For example, a maximum number of face images displayed on one page may be set according to the screen size, and when the number of participants in the video conference exceeds that maximum, the image generation unit 112 may generate image information Gj in which the video conference image consists of a plurality of pages. For example, when the screen size is that of a watch-type wearable device, the maximum may be set to "1". In this case, the image generation unit 112 generates image information Gj in which the video conference image consists of n pages. For example, when the screen size is that of a smartphone, the maximum may be set to "3". If the video conference has six participants, the image generation unit 112 may generate image information Gj in which the video conference image consists of two pages. When the video conference image consists of a plurality of pages, the displayed page can be changed by a flick operation on the user device 20_j.
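
The page split in Modification 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the maxima "1" (watch-type wearable) and "3" (smartphone) come from the text, while the `pc` entry, the device-class names, and the function name `paginate` are hypothetical.

```python
# Maximum number of face images per page for each device class.
# "watch" and "smartphone" follow the examples in the text;
# the "pc" entry is an assumption added for illustration.
MAX_FACES_PER_PAGE = {"watch": 1, "smartphone": 3, "pc": 6}


def paginate(participants, device_class):
    """Split the participants into pages of at most the per-page maximum."""
    per_page = MAX_FACES_PER_PAGE[device_class]
    return [
        participants[i:i + per_page]
        for i in range(0, len(participants), per_page)
    ]


users = ["U1", "U2", "U3", "U4", "U5", "U6"]
smartphone_pages = paginate(users, "smartphone")  # two pages of three faces
watch_pages = paginate(users, "watch")            # one page per participant
```

With six participants, the smartphone case yields two pages and the watch case yields n = 6 pages, matching the examples in the text. A flick operation on the user device would then only need to change the index of the page being displayed.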

２−２：変形例２ 2-2: Modification 2
本開示において画像情報Ｇ１〜Ｇｎは、動画であってもよいし、静止画であってもよい。また、話者に関する画像情報を動画とし、話者以外の他のユーザに関する画像情報を静止画としてもよい。画像情報Ｇ１〜Ｇｎを動画とし、話者の画像情報のフレームレートを、話者以外の他のユーザに関する画像情報のフレームレートよりも高くしてもよい。この場合、処理装置１１０は、ユーザ装置２０_1〜２０_nの各々に対して、個別情報のフレームレートを指示してもよい。 In the present disclosure, the image information G1 to Gn may be moving images or still images. The image information about the speaker may be a moving image while the image information about users other than the speaker is a still image. Alternatively, the image information G1 to Gn may be moving images, with the frame rate of the speaker's image information set higher than the frame rate of the image information about users other than the speaker. In this case, the processing device 110 may instruct each of the user devices 20_1 to 20_n on the frame rate of its individual information.
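
The frame-rate policy of Modification 2 can be sketched as below. The concrete rates (30 and 5 frames per second) and the function name are assumptions made for illustration; the text only requires that the speaker's frame rate be higher than that of the other users.

```python
def assign_frame_rates(users, speaker, speaker_fps=30, other_fps=5):
    """Return the frame rate to instruct for each user's individual information.

    The speaker gets the higher rate; everyone else gets the lower one.
    The default values are illustrative assumptions, not from the source.
    """
    if speaker_fps <= other_fps:
        raise ValueError("the speaker's frame rate must be the higher one")
    return {u: (speaker_fps if u == speaker else other_fps) for u in users}


rates = assign_frame_rates(["U1", "U2", "U3"], speaker="U2")
```

In this sketch the returned mapping stands for what the processing device 110 would send to each of the user devices 20_1 to 20_n; when the recognized speaker changes, the mapping would simply be recomputed.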

３．その他 3. Others
（１）上述した実施形態では、記憶装置１２０又は２２０は、処理装置１１０又は２１０が読取可能な記録媒体であり、ＲＯＭ及びＲＡＭなどを例示したが、フレキシブルディスク、光磁気ディスク(例えば、コンパクトディスク、デジタル多用途ディスク、Ｂｌｕ−ｒａｙ（登録商標）ディスク)、スマートカード、フラッシュメモリデバイス(例えば、カード、スティック、キードライブ)、ＣＤ−ＲＯＭ（Compact Disc−ＲＯＭ）、レジスタ、リムーバブルディスク、ハードディスク、フロッピー（登録商標）ディスク、磁気ストリップ、データベース、サーバその他の適切な記憶媒体である。また、プログラムは、電気通信回線を介してネットワークから送信されてもよい。また、プログラムは、電気通信回線を介して通信網から送信されてもよい。 (1) In the above-described embodiment, the storage device 120 or 220 is a recording medium readable by the processing device 110 or 210, and a ROM and a RAM were given as examples. However, it may be a flexible disk, a magneto-optical disk (for example, a compact disc, a digital versatile disc, a Blu-ray (registered trademark) disc), a smart card, a flash memory device (for example, a card, a stick, a key drive), a CD-ROM (Compact Disc-ROM), a register, a removable disk, a hard disk, a floppy (registered trademark) disk, a magnetic strip, a database, a server, or any other suitable storage medium. The program may also be transmitted from a network via a telecommunication line. The program may also be transmitted from a communication network via a telecommunication line.

（２）上述した実施形態において、説明した情報、信号などは、様々な異なる技術のいずれかを使用して表されてもよい。例えば、上記の説明全体に渡って言及され得るデータ、命令、コマンド、情報、信号、ビット、シンボル、チップなどは、電圧、電流、電磁波、磁界若しくは磁性粒子、光場若しくは光子、又はこれらの任意の組み合わせによって表されてもよい。 (2) In the above-described embodiment, the described information, signals, etc. may be represented using any of a variety of different techniques. For example, data, instructions, commands, information, signals, bits, symbols, chips, etc. that may be referred to throughout the above description are voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, light fields or photons, or any of these. It may be represented by a combination of.

（３）上述した実施形態において、入出力された情報等は特定の場所（例えば、メモリ）に保存されてもよいし、管理テーブルを用いて管理してもよい。入出力される情報等は、上書き、更新、又は追記され得る。出力された情報等は削除されてもよい。入力された情報等は他の装置へ送信されてもよい。 (3) In the above-described embodiment, the input / output information and the like may be stored in a specific place (for example, a memory) or may be managed by using a management table. Input / output information and the like can be overwritten, updated, or added. The output information and the like may be deleted. The input information or the like may be transmitted to another device.

（４）上述した実施形態において、判定は、１ビットで表される値（０か１か）によって行われてもよいし、真偽値（Boolean：true又はfalse）によって行われてもよいし、数値の比較（例えば、所定の値との比較）によって行われてもよい。 (4) In the above-described embodiment, the determination may be made by a value represented by 1 bit (0 or 1) or by a boolean value (Boolean: true or false). , May be done by numerical comparison (eg, comparison with a given value).

（５）上述した実施形態において例示した処理手順、シーケンス、フローチャートなどは、矛盾の無い限り、順序を入れ替えてもよい。例えば、本開示において説明した方法については、例示的な順序を用いて様々なステップの要素を提示しており、提示した特定の順序に限定されない。 (5) The order of the processing procedures, sequences, flowcharts, etc. exemplified in the above-described embodiment may be changed as long as there is no contradiction. For example, the methods described in the present disclosure present elements of various steps using exemplary order, and are not limited to the particular order presented.

（６）図１に例示された各機能は、ハードウェア及びソフトウェアの少なくとも一方の任意の組み合わせによって実現される。また、各機能ブロックの実現方法は特に限定されない。すなわち、各機能ブロックは、物理的又は論理的に結合した１つの装置を用いて実現されてもよいし、物理的又は論理的に分離した２つ以上の装置を直接的又は間接的に（例えば、有線、無線などを用いて）接続し、これら複数の装置を用いて実現されてもよい。機能ブロックは、上記１つの装置又は上記複数の装置にソフトウェアを組み合わせて実現されてもよい。 (6) Each function illustrated in FIG. 1 is realized by any combination of at least one of hardware and software. The method of realizing each functional block is not particularly limited. That is, each functional block may be realized using one physically or logically coupled device, or by directly or indirectly connecting (for example, by wire or wirelessly) two or more physically or logically separated devices and using these plural devices. A functional block may be realized by combining software with the one device or the plural devices.

また、通信装置１３０及び２３０は、有線ネットワーク及び無線ネットワークの少なくとも一方を介してコンピュータ間の通信を行うためのハードウェア（送受信デバイス）であり、例えばネットワークデバイス、ネットワークコントローラ、ネットワークカード、通信モジュールなどともいう。通信装置１３０は、例えば、周波数分割複信（ＦＤＤ：Frequency Division Duplex）及び時分割複信（ＴＤＤ：Time Division Duplex）の少なくとも一方を実現するために、高周波スイッチ、デュプレクサ、フィルタ、周波数シンセサイザなどを含んで構成されてもよい。 Further, the communication devices 130 and 230 are hardware (transmission/reception devices) for performing communication between computers via at least one of a wired network and a wireless network, and are also referred to as, for example, network devices, network controllers, network cards, or communication modules. The communication device 130 may include, for example, a high-frequency switch, a duplexer, a filter, a frequency synthesizer, and the like in order to realize at least one of frequency division duplex (FDD) and time division duplex (TDD).

（７）上述した実施形態で例示したプログラムは、ソフトウェアは、ソフトウェア、ファームウェア、ミドルウェア、マイクロコード、ハードウェア記述言語と呼ばれるか、他の名称で呼ばれるかを問わず、命令、命令セット、コード、コードセグメント、プログラムコード、プログラム、サブプログラム、ソフトウェアモジュール、アプリケーション、ソフトウェアアプリケーション、ソフトウェアパッケージ、ルーチン、サブルーチン、オブジェクト、実行可能ファイル、実行スレッド、手順、機能などを意味するよう広く解釈されるべきである。 (7) The program exemplified in the above-described embodiment, whether it is called software, firmware, middleware, microcode, hardware description language, or by another name, should be broadly interpreted to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executable files, execution threads, procedures, functions, and the like.

また、ソフトウェア、命令、情報などは、伝送媒体を介して送受信されてもよい。例えば、ソフトウェアが、有線技術（同軸ケーブル、光ファイバケーブル、ツイストペア、デジタル加入者回線（ＤＳＬ：Digital Subscriber Line）など）及び無線技術（赤外線、マイクロ波など）の少なくとも一方を使用してウェブサイト、サーバ、又は他のリモートソースから送信される場合、これらの有線技術及び無線技術の少なくとも一方は、伝送媒体の定義内に含まれる。 Further, software, instructions, information, and the like may be transmitted and received via a transmission medium. For example, when software is transmitted from a website, a server, or another remote source using at least one of wired technology (coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), etc.) and wireless technology (infrared, microwave, etc.), at least one of these wired and wireless technologies is included within the definition of a transmission medium.

（８）前述の各形態において、「システム」及び「ネットワーク」という用語は、互換的に使用される。 (8) In each of the above-mentioned forms, the terms "system" and "network" are used interchangeably.

（９）本開示において説明した情報、パラメータなどは、絶対値を用いて表されてもよいし、所定の値からの相対値を用いて表されてもよいし、対応する別の情報を用いて表されてもよい。 (9) The information, parameters, and the like described in the present disclosure may be expressed using absolute values, using values relative to a predetermined value, or using other corresponding information.

（１０）上述した実施形態において、店舗装置及びユーザ装置は、移動局（ＭＳ：Mobile Station）である場合が含まれる。移動局は、当業者によって、加入者局、モバイルユニット、加入者ユニット、ワイヤレスユニット、リモートユニット、モバイルデバイス、ワイヤレスデバイス、ワイヤレス通信デバイス、リモートデバイス、モバイル加入者局、アクセス端末、モバイル端末、ワイヤレス端末、リモート端末、ハンドセット、ユーザエージェント、モバイルクライアント、クライアント、又はいくつかの他の適切な用語で呼ばれる場合もある。また、本開示においては、「移動局」、「ユーザ端末（user terminal）」、「ユーザ装置（ＵＥ：User Equipment）」、「端末」等の用語は、互換的に使用され得る。 (10) In the above-described embodiment, the store device and the user device may be a mobile station (MS). A mobile station may also be referred to by those skilled in the art as a subscriber station, mobile unit, subscriber unit, wireless unit, remote unit, mobile device, wireless device, wireless communication device, remote device, mobile subscriber station, access terminal, mobile terminal, wireless terminal, remote terminal, handset, user agent, mobile client, client, or some other suitable term. Further, in the present disclosure, terms such as "mobile station", "user terminal", "user equipment (UE)", and "terminal" can be used interchangeably.

（１１）上述した実施形態において、「接続された(connected)」、「結合された(coupled)」という用語、又はこれらのあらゆる変形は、２又はそれ以上の要素間の直接的又は間接的なあらゆる接続又は結合を意味し、互いに「接続」又は「結合」された２つの要素間に１又はそれ以上の中間要素が存在することを含むことができる。要素間の結合又は接続は、物理的なものであっても、論理的なものであっても、或いはこれらの組み合わせであってもよい。例えば、「接続」は「アクセス」で読み替えられてもよい。本開示で使用する場合、２つの要素は、１又はそれ以上の電線、ケーブル及びプリント電気接続の少なくとも一つを用いて、並びにいくつかの非限定的かつ非包括的な例として、無線周波数領域、マイクロ波領域及び光（可視及び不可視の両方）領域の波長を有する電磁エネルギーなどを用いて、互いに「接続」又は「結合」されると考えることができる。 (11) In the above-described embodiment, the terms "connected" and "coupled", or any variation thereof, mean any direct or indirect connection or coupling between two or more elements, and can include the presence of one or more intermediate elements between two elements that are "connected" or "coupled" to each other. The coupling or connection between elements may be physical, logical, or a combination thereof. For example, "connection" may be read as "access". As used in the present disclosure, two elements can be considered to be "connected" or "coupled" to each other using at least one of one or more wires, cables, and printed electrical connections, and, as some non-limiting and non-exhaustive examples, using electromagnetic energy having wavelengths in the radio frequency region, the microwave region, and the light (both visible and invisible) region.

（１２）上述した実施形態において、「に基づいて」という記載は、別段に明記されていない限り、「のみに基づいて」を意味しない。言い換えれば、「に基づいて」という記載は、「のみに基づいて」と「に少なくとも基づいて」の両方を意味する。 (12) In the above-described embodiment, the phrase "based on" does not mean "based only on" unless otherwise specified. In other words, the phrase "based on" means both "based only on" and "based at least on".

（１３）本開示で使用する「判断(determining)」、「決定(determining)」という用語は、多種多様な動作を包含する場合がある。「判断」、「決定」は、例えば、判定(judging)、計算(calculating)、算出(computing)、処理(processing)、導出(deriving)、調査(investigating)、探索(looking up、search、inquiry)（例えば、テーブル、データベース又は別のデータ構造での探索）、確認(ascertaining)した事を「判断」「決定」したとみなす事などを含み得る。また、「判断」、「決定」は、受信(receiving)（例えば、情報を受信すること）、送信(transmitting)(例えば、情報を送信すること)、入力(input)、出力(output)、アクセス(accessing)（例えば、メモリ中のデータにアクセスすること）した事を「判断」「決定」したとみなす事などを含み得る。また、「判断」、「決定」は、解決(resolving)、選択(selecting)、選定(choosing)、確立(establishing)、比較(comparing)などした事を「判断」「決定」したとみなす事を含み得る。つまり、「判断」「決定」は、何らかの動作を「判断」「決定」したとみなす事を含み得る。また、「判断（決定）」は、「想定する（assuming）」、「期待する（expecting）」、「みなす（considering）」などで読み替えられてもよい。 (13) The terms "determining" and "determining" used in the present disclosure may include a wide variety of actions. "Judgment" and "decision" are, for example, judgment, calculation, computing, processing, deriving, investigating, looking up, search, inquiry. (For example, searching in a table, database or another data structure), ascertaining may be regarded as "judgment" or "decision". Also, "judgment" and "decision" are receiving (for example, receiving information), transmitting (for example, transmitting information), input (input), output (output), and access. (Accessing) (for example, accessing data in memory) may be regarded as "judgment" or "decision". In addition, "judgment" and "decision" mean that the things such as solving, selecting, choosing, establishing, and comparing are regarded as "judgment" and "decision". Can include. That is, "judgment" and "decision" may include considering some action as "judgment" and "decision". Further, "judgment (decision)" may be read as "assuming", "expecting", "considering" and the like.

（１４）上述した実施形態において、「含む（include）」、「含んでいる（including）」及びそれらの変形が使用されている場合、これらの用語は、用語「備える（comprising）」と同様に、包括的であることが意図される。更に、本開示において使用されている用語「又は（or）」は、排他的論理和ではないことが意図される。 (14) When "include", "including" and variants thereof are used in the embodiments described above, these terms are similar to the term "comprising". , Intended to be inclusive. Furthermore, the term "or" used in the present disclosure is intended not to be an exclusive OR.

（１５）本開示において、例えば、英語でのa, an及びtheのように、翻訳により冠詞が追加された場合、本開示は、これらの冠詞の後に続く名詞が複数形であることを含んでもよい。 (15) In the present disclosure, where articles are added by translation, such as a, an, and the in English, the present disclosure may include the case where the nouns following these articles are plural.

（１６）本開示において、「ＡとＢが異なる」という用語は、「ＡとＢが互いに異なる」ことを意味してもよい。なお、当該用語は、「ＡとＢがそれぞれＣと異なる」ことを意味してもよい。「離れる」、「結合される」等の用語も、「異なる」と同様に解釈されてもよい。 (16) In the present disclosure, the term "A and B are different" may mean "A and B are different from each other". The term may also mean "A and B are each different from C". Terms such as "separated" and "coupled" may be interpreted in the same way as "different".

（１７）本開示において説明した各態様／実施形態は単独で用いてもよいし、組み合わせて用いてもよいし、実行に伴って切り替えて用いてもよい。また、所定の情報の通知（例えば、「Ｘであること」の通知）は、明示的に行うものに限られず、暗黙的（例えば、当該所定の情報の通知を行わない）ことによって行われてもよい。 (17) Each aspect/embodiment described in the present disclosure may be used alone, in combination, or switched in accordance with execution. Further, notification of predetermined information (for example, notification of "being X") is not limited to explicit notification and may be performed implicitly (for example, by not notifying the predetermined information).

以上、本開示について詳細に説明したが、当業者にとっては、本開示が本開示中に説明した実施形態に限定されない。本開示は、請求の範囲の記載により定まる本開示の趣旨及び範囲を逸脱することなく修正及び変更態様として実施することができる。従って、本開示の記載は、例示説明を目的とするものであり、本開示に対して何ら制限的な意味を有するものではない。 Although the present disclosure has been described in detail above, for those skilled in the art the present disclosure is not limited to the embodiments described herein. The present disclosure can be implemented in modified and altered forms without departing from the spirit and scope of the present disclosure as defined by the claims. Therefore, the description of the present disclosure is for illustrative purposes and has no restrictive meaning with respect to the present disclosure.

１…テレビ会議システム、１０…サーバ装置、２０_1〜２０_n…ユーザ装置、１１１…取得部、１１２…画像生成部、１１３…認識部、１１４…管理部、Ｄｊ…検出情報、Ｇｆ１〜Ｇｆ６…顔画像情報、Ｇｊ…画像情報、Ｕｊ…ユーザ、Ｘｊ…個別情報、Ｙｊ…画面情報、Ｚｊ…指定情報。 1 ... Video conferencing system, 10 ... Server device, 20_1 to 20_n ... User device, 111 ... Acquisition unit, 112 ... Image generation unit, 113 ... Recognition unit, 114 ... Management unit, Dj ... Detection information, Gf1 to Gf6 ... Face image information, Gj ... Image information, Uj ... User, Xj ... Individual information, Yj ... Screen information, Zj ... Designated information.

Claims

A server device that communicates with a plurality of user devices corresponding one-to-one to a plurality of users participating in a video conference, the server device comprising:
An acquisition unit that acquires, from each of the plurality of user devices, individual information indicating an image and the sound of the corresponding user, and screen information regarding the screen size of the user device corresponding to that user;
An image generation unit that generates, for each of the plurality of user devices, image information corresponding to the screen size of that user device, based on the plurality of individual information corresponding one-to-one to the plurality of user devices and the plurality of screen information corresponding one-to-one to the plurality of user devices; and
A transmission unit that transmits the image information generated for each of the plurality of user devices to the corresponding user device.

The plurality of user devices include a first user device and a second user device.
The screen size indicated by the screen information corresponding to the first user device is larger than the screen size indicated by the screen information corresponding to the second user device.
The image generation unit generates first image information corresponding to the first user device and second image information corresponding to the second user device.
The image indicated by the first image information includes the faces of the plurality of users.
The image indicated by the second image information includes the face of the speaker among the plurality of users.
The number of user's faces included in the image indicated by the second image information is smaller than the number of user's faces included in the image indicated by the first image information.
The transmission unit transmits the first image information to the first user device, and transmits the second image information to the second user device.
The server device according to claim 1.

The image generation unit
Generates a plurality of face image information corresponding one-to-one to the plurality of users by extracting, based on the plurality of individual information, an image of a portion including the face of each user,
Generates, as the first image information, information including the face image information of the plurality of users, and
Generates, as the second image information, information that includes the face image information of the speaker and does not include a part of the face image information of the plurality of users.
The server device according to claim 2.

A recognition unit that recognizes a speaker among the plurality of users based on the voice indicated by the plurality of individual information is further provided.
The image generation unit identifies the face image information corresponding to the speaker among the plurality of face image information based on the recognition result of the recognition unit.
The server device according to claim 3.

When the number of users' faces included in the image indicated by the second image information is 1, the image generation unit
Generates the first image information with emphasis on the face image indicated by the face image information corresponding to the speaker, and
Generates the second image information without emphasis on the face image indicated by the face image information corresponding to the speaker.
The server device according to claim 3 or 4.

The image indicated by each of the plurality of image information includes an image of an operator for conveying an intention to speak.
The acquisition unit can acquire, from any of the plurality of user devices, detection information indicating that the image of the operator has been operated.
The image generation unit generates the plurality of image information so that the image indicated by each of them includes an image that identifies the user of the user device that transmitted the detection information.
The server device according to any one of claims 1 to 5.

The acquisition unit can acquire, from the plurality of user devices, designated information that designates the user who is the other party of a secret talk and the communication method used for the secret talk.
The server device further comprises a management unit that, when the secret talk is conducted, switches the communication method between the user device that transmitted the designated information and the user device of the other party of the secret talk designated by the designated information, according to the communication method for the secret talk designated by the designated information.
The server device according to any one of claims 1 to 6.

The designated information is generated by one user device among the plurality of user devices.
The user who is the other party of the secret talk is designated by the face image that is the target of an operation among the plurality of face images displayed on the one user device.
The communication method in the secret talk is designated by the operation on the face image.
The server device according to claim 7.

The image generation unit generates, when the faces of the plurality of users are displayed in the image displayed on the user device of the other party of the secret talk in the video conference, image information indicating an image in which the face of the user who transmitted the designated information is emphasized.
The server device according to claim 7 or 8.

A video conferencing system comprising:
The server device according to any one of claims 1 to 9; and
A plurality of user devices corresponding one-to-one to a plurality of users participating in a video conference.