JP7400231B2

JP7400231B2 - Communication systems, information processing devices, communication methods and programs

Info

Publication number: JP7400231B2
Application number: JP2019129658A
Authority: JP
Inventors: 怜士川▲崎▼
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2019-07-11
Filing date: 2019-07-11
Publication date: 2023-12-19
Anticipated expiration: 2039-07-11
Also published as: JP2021016083A

Description

The present invention relates to a communication system, an information processing device, a communication method, and a program.

In video conferencing systems in which multiple participants at multiple locations hold a conference using displays, a known technique displays, as information about the participants in the video conference, for example avatars that virtually represent each participant on the display.

As one such video conferencing system that displays avatars, a system has been disclosed that controls the screen display according to attribute information of each participant, in order to reduce the amount of data transmitted over the network while preventing participants from finding it difficult to grasp the status of the conference (see Patent Document 1).

However, the technique described in Patent Document 1 has a problem in that display control cannot be performed unless the attribute information of the participants has been registered in advance.

The present invention has been made in view of the above problem, and its object is to provide a communication system, an information processing device, a communication method, and a program that can control the display of participant information even if attribute information has not been registered before the start of a video conference.

In order to solve the above problem and achieve the object, the present invention is a communication system in which a plurality of communication terminals can hold a video conference by transmitting and receiving audio data via a network, the communication system comprising: an imaging unit that obtains video data capturing one or more participants in the video conference; a detection unit that detects the participants from the video data obtained by the imaging unit; an audio input unit that inputs audio data uttered by the participants; an extraction unit that extracts attribute information of the participants from the audio data input by the audio input unit; a first control unit that, based on the attribute information extracted by the extraction unit, controls the display on a first communication terminal of the information about the participants detected by the detection unit; a request unit that transmits, to a second communication terminal that communicates with the first communication terminal via the network, an instruction requesting a participant who uses the second communication terminal to utter the attribute information; and a second control unit that, in accordance with the instruction received from the request unit, performs processing on the second communication terminal to prompt the participant who uses the second communication terminal to utter the attribute information.
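The division of roles between the extraction unit, the request unit, and the first control unit can be illustrated with a minimal sketch. It assumes the uttered audio has already been transcribed to text, and every name in it (`Participant`, `extract_attributes`, `needs_prompt`, `display_label`, the self-introduction pattern, and the `name` and `department` attributes) is hypothetical and not taken from the patent.

```python
import re
from dataclasses import dataclass, field

# Hypothetical self-introduction pattern, e.g.
# "I am Sato from the sales department". The patent does not
# prescribe any particular phrase or attribute set.
_INTRO = re.compile(
    r"\bI am (?P<name>\w+) from the (?P<department>\w+) department\b",
    re.IGNORECASE,
)


@dataclass
class Participant:
    """A participant detected in the video data (illustrative only)."""
    face_id: int
    attributes: dict = field(default_factory=dict)


def extract_attributes(transcript: str) -> dict:
    """Extraction-unit sketch: pull attributes out of transcribed speech."""
    match = _INTRO.search(transcript)
    return match.groupdict() if match else {}


def needs_prompt(participant: Participant) -> bool:
    """Request-unit sketch: ask for an utterance while attributes are missing."""
    return not participant.attributes


def display_label(participant: Participant) -> str:
    """Control-unit sketch: choose what to show next to the avatar."""
    if not participant.attributes:
        return f"Participant {participant.face_id}"
    attrs = participant.attributes
    return f"{attrs['name']} ({attrs['department']})"
```

In this sketch a participant with no attributes yet is shown under a generic label and flagged for a prompt; once a matching utterance is heard, the label switches to the extracted values without any registration before the conference.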

According to the present invention, display control of participant information can be performed even if attribute information has not been registered before the start of a video conference.

FIG. 1 is a schematic configuration diagram of a communication system according to an embodiment.
FIG. 2 is a diagram illustrating an example of the hardware configuration of a communication terminal (video conference terminal) according to the embodiment.
FIG. 3 is a diagram illustrating an example of the hardware configuration of a communication terminal (electronic whiteboard) according to the embodiment.
FIG. 4 is a diagram illustrating an example of the hardware configuration of the management system and the program providing system according to the embodiment.
FIG. 5 is a diagram illustrating an example of the software configuration of the communication terminal according to the embodiment.
FIG. 6 is a diagram illustrating an example of the configuration of functional blocks of the communication system according to the embodiment.
FIG. 7 is a diagram illustrating an example of an authentication management table.
FIG. 8 is a diagram illustrating an example of a terminal management table.
FIG. 9 is a diagram illustrating an example of a group management table.
FIG. 10 is a diagram illustrating an example of a session management table.
FIG. 11 is a diagram illustrating sessions established for transmitting and receiving content data and various types of management information in the communication system according to the embodiment.
FIG. 12 is a sequence diagram illustrating an example of preparation-stage processing, including authentication processing, for a communication terminal to start a call in the communication system according to the embodiment.
FIG. 13 is a diagram illustrating a display example of a destination list.
FIG. 14 is a sequence diagram illustrating an example of processing for requesting the start of a call in the communication system according to the embodiment.
FIG. 15 is a sequence diagram illustrating an example of processing for permitting a request to start a call in the communication system according to the embodiment.
FIG. 16 is a diagram illustrating a display example of a start request reception screen.
FIG. 17 is a flowchart illustrating an example of the flow of avatar generation processing in the communication system according to the embodiment.
FIG. 18 is a diagram illustrating the display operation of an avatar.
FIG. 19 is a diagram illustrating the display operation of an avatar.
FIG. 20 is a flowchart illustrating an example of the flow of audio direction identification processing of the communication terminal according to the embodiment.
FIG. 21 is a flowchart illustrating an example of the flow of attribute information acquisition processing in the communication system according to the embodiment.
FIG. 22 is a diagram illustrating the operation of acquiring attribute information.
FIG. 23 is a diagram illustrating an example of a face/avatar/attribute correspondence table.
FIG. 24 is a flowchart illustrating an example of the flow of avatar control processing in the communication system according to the embodiment.
FIG. 25 is a diagram illustrating the operation of changing the arrangement of avatars based on attribute information.
FIG. 26 is a diagram illustrating the correspondence between speech directions and speakers.
FIG. 27 is a diagram illustrating an example of a speaker/speech direction correspondence table.
FIG. 28 is a sequence diagram illustrating an example of the flow of the overall operation of the communication system according to the embodiment.
FIG. 29 is a diagram illustrating an example of a screen that prompts the acquisition of attribute information through voice recognition.
FIG. 30 is a diagram illustrating an example of the software configuration when a web application is used on the communication terminal according to the embodiment.

Embodiments of a communication system, an information processing device, a communication method, and a program according to the present invention will be described in detail below with reference to the drawings. The present invention is not limited by the following embodiments, and the constituent elements in the following embodiments include those that a person skilled in the art could easily conceive of, those that are substantially the same, and those within the so-called range of equivalents. Furthermore, various omissions, substitutions, changes, and combinations of the constituent elements can be made without departing from the gist of the following embodiments.

Computer software refers to programs related to the operation of a computer and other information used for processing by a computer that is equivalent to a program (hereinafter, computer software is referred to as software). Application software is a general term for the category of software used to perform specific tasks. An operating system (OS), on the other hand, is software that controls a computer and makes computer resources available to application software and the like. The operating system performs the basic management and control of the computer, such as input/output control, management of hardware such as memory and hard disks, and process management. Application software operates using the functions provided by the operating system. A program is a set of instructions to a computer combined so as to obtain a single result. Something equivalent to a program is something that cannot be called a program because it is not a direct instruction to a computer, but that has properties similar to a program in that it defines the processing of a computer. For example, a data structure (the logical structure of data expressed by the mutual relationships between data elements) is equivalent to a program.

[Embodiment]
(Overall configuration of communication system)
FIG. 1 is a schematic configuration diagram of a communication system according to an embodiment. An outline of the configuration of a communication system 1 according to this embodiment will be described with reference to FIG. 1.

As shown in FIG. 1, the communication system 1 includes a plurality of communication terminals 10aa, 10ab, 10ba, 10bb, 10ca, 10cb, 10da, and 10db; displays 120aa, 120ba, 120ca, and 120da for some of the communication terminals; PCs (Personal Computers) 30ab and 30cb connected to some of the communication terminals; a management system 50; and a program providing system 90. These are configured to be able to communicate with one another via a communication network 2.

Although FIG. 1 shows communication terminals 10aa, 10ab, 10ba, 10bb, 10ca, 10cb, 10da, and 10db, any one of these terminals, or the terminals collectively, is simply referred to as "communication terminal 10". The plurality of communication terminals 10 included in the communication system 1 shown in FIG. 1 is an example, and the number of terminals may be different.

Similarly, although FIG. 1 shows displays 120aa, 120ba, 120ca, and 120da, any one of these displays, or the displays collectively, is simply referred to as "display 120". The plurality of displays 120 included in the communication system 1 shown in FIG. 1 is an example, and the number of displays may be different.

The communication terminal 10 is a terminal that transmits and receives various kinds of information to and from other devices. The communication terminal 10 establishes a session with another communication terminal 10 and, in the established session, carries out a call by transmitting and receiving content data including audio data and image data (video data). A video conference between a plurality of communication terminals 10 is thereby realized in the communication system 1. The communication terminal 10 may be any of a dedicated device (video conference terminal) to which a display (the display 120 described above) is connected, an electronic whiteboard (interactive whiteboard: IWB), a desktop PC, a notebook PC, a smartphone, a tablet terminal, or the like.

The display 120 is a display device that displays video of participants and the like received via the communication network 2 by the communication terminal 10 to which it is connected. The display 120 is, for example, an LCD (Liquid Crystal Display) or an organic EL (Electro-Luminescence) display.

The PC 30 is an information processing device that is connected to, for example, a communication terminal 10 and transmits to that communication terminal 10 a screen image to be shared with other communication terminals 10. The screen image is distributed via the management system 50, which puts the terminal into a screen-sharing state with the other communication terminals 10. The PC 30 is not limited to a PC such as a desktop PC or a notebook PC, and may be an information processing device such as a smartphone or a tablet terminal.

The management system 50 is a computer that centrally manages the communication terminals 10. The management system 50 establishes sessions between communication terminals 10, thereby realizing video conferences by calls and the like between the communication terminals 10. When the management system 50 receives session start request information from a given communication terminal 10, it establishes a session between the communication terminal 10 that transmitted the start request information (the start request terminal) and the destination terminal, and starts the video conference. The management system 50 then relays content data between the plurality of communication terminals 10 over the established session. Although the management system 50 establishes the sessions between the communication terminals 10, the actual relaying of content data may be performed by a separate relay device. In this embodiment, to simplify the explanation, both the establishment of sessions and the relaying of content data are described as being performed by the management system 50.
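The session handling described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `ManagementSystem`, the method names, and the session ID format are hypothetical, and authentication, the destination's approval of the request, and real network transport are all omitted.

```python
import itertools


class ManagementSystem:
    """Toy model of session establishment and content relay (illustrative only)."""

    def __init__(self):
        self._sessions = {}               # session ID -> set of terminal IDs
        self._ids = itertools.count(1)    # hypothetical session ID generator

    def handle_start_request(self, requester: str, destination: str) -> str:
        """Establish a session between the start request terminal and the destination."""
        session_id = f"se{next(self._ids)}"
        self._sessions[session_id] = {requester, destination}
        return session_id

    def relay(self, session_id: str, sender: str, content: bytes) -> dict:
        """Return the content addressed to every other terminal in the session."""
        members = self._sessions[session_id]
        if sender not in members:
            raise ValueError(f"{sender} is not part of session {session_id}")
        return {terminal: content for terminal in sorted(members - {sender})}
```

For example, after `handle_start_request("10aa", "10ca")`, content sent by terminal 10aa is relayed only to terminal 10ca. A deployment that uses a separate relay device, as the text allows, would move `relay` out of this class.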

The program providing system 90 is a computer that includes an auxiliary storage device (an HDD (Hard Disk Drive) or the like) storing terminal programs for causing the communication terminal 10 to realize various functions or means, and that provides the terminal programs (such as communication application A, described later) to the communication terminal 10. The program providing system 90 also stores, in the auxiliary storage device, programs for causing the management system 50 and the like to realize various functions or means, and can transmit the corresponding programs to the management system 50 and the like.

As shown in FIG. 1, the communication network 2 is constructed to include, for example, LANs (Local Area Networks) 2a to 2d, dedicated lines 2ab and 2cd, and the Internet 2i. The communication network 2 is not limited to the configuration shown in FIG. 1; it may include other network devices, and some portions may use wireless as well as wired communication.

The LANs 2a to 2d and the dedicated lines 2ab and 2cd include routers 70a to 70d, 70ab, and 70cd, respectively. The routers 70a to 70d, 70ab, and 70cd are network devices that select the optimal route for communication data.

The communication terminals 10 (10aa, 10ab, ...) and the router 70a are communicably connected by the LAN 2a. The communication terminals 10 (10ba, 10bb, ...) and the router 70b are communicably connected by the LAN 2b. The LAN 2a, the LAN 2b, and the router 70ab are communicably connected by the dedicated line 2ab and are constructed within region A.

Meanwhile, the communication terminals 10 (10ca, 10cb, ...) and the router 70c are communicably connected by the LAN 2c. The communication terminals 10 (10da, 10db, ...) and the router 70d are communicably connected by the LAN 2d. The LAN 2c, the LAN 2d, and the router 70cd are communicably connected by the dedicated line 2cd and are constructed within region B.

The networks of region A and region B are communicably connected via the Internet 2i by the routers 70ab and 70cd, respectively.

The management system 50 and the program providing system 90 are communicably connected to each communication terminal 10 via the Internet 2i. The management system 50 and the program providing system 90 may be installed in region A or region B, or in some other region.

In FIG. 1, the four sets of numbers shown near each communication terminal 10, the management system 50, each router 70, and the program providing system 90 are simplified representations of IP (Internet Protocol) addresses in ordinary IPv4. For example, the IP address of the communication terminal 10aa is assumed to be "1.2.1.3". IPv6 may be used instead of IPv4, but IPv4 is used here to simplify the explanation.
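As a small illustration of the "four sets of numbers" notation, the sketch below uses Python's standard `ipaddress` module to split an IPv4 address into its four octets and to check whether two addresses fall in the same subnet. The helper names and the 24-bit prefix length are assumptions made for the example; the patent does not state the subnet masks used in FIG. 1, and only the address "1.2.1.3" comes from the text.

```python
import ipaddress


def octets(address: str) -> tuple:
    """Split a dotted-quad IPv4 address into its four numeric parts."""
    return tuple(ipaddress.IPv4Address(address).packed)


def same_subnet(a: str, b: str, prefix: int = 24) -> bool:
    """Check whether two addresses share a subnet.

    The default prefix length of 24 bits is an assumption for illustration.
    """
    network = ipaddress.ip_network(f"{a}/{prefix}", strict=False)
    return ipaddress.IPv4Address(b) in network
```

Calling `octets("1.2.1.3")` yields the four numbers drawn next to communication terminal 10aa in FIG. 1.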

The configuration of the communication system 1 shown in FIG. 1 is an example, and the system is not limited to this configuration. That is, the number of each device and system is not limited to the number shown in FIG. 1. In addition, although FIG. 1 shows a network configuration spanning two regions, A and B, the network may lie within a single region, or three or more regions may be connected by the network.

(Hardware configuration of communication terminal)
FIG. 2 is a diagram illustrating an example of the hardware configuration of a communication terminal (video conference terminal) according to the embodiment. FIG. 3 is a diagram illustrating an example of the hardware configuration of a communication terminal (electronic whiteboard) according to the embodiment. First, the details of the hardware configuration when the communication terminal 10 according to this embodiment is a video conference terminal will be described with reference to FIG. 2.

As shown in FIG. 2, the communication terminal 10 according to this embodiment includes a CPU (Central Processing Unit) 101, a ROM (Read Only Memory) 102, a RAM (Random Access Memory) 103, an auxiliary storage device 105, a media drive 107, and an input device 108.

The CPU 101 is an arithmetic device that controls the overall operation of the communication terminal 10. The ROM 102 is a nonvolatile storage device that stores programs for the communication terminal 10 (such as communication application A, described later). The RAM 103 is a volatile storage device used as a work area for the CPU 101.

The auxiliary storage device 105 is a nonvolatile storage device such as an HDD or an SSD (Solid State Drive) that stores various data such as image data, audio data, and video data. The media drive 107 is a device that controls the reading and writing of data to and from a medium 106, such as a flash memory, under the control of the CPU 101. The medium 106 is a storage device that is detachable from the communication terminal 10. The medium 106 is not limited to a flash memory as long as it is a nonvolatile memory from which data is read and to which data is written under the control of the CPU 101; an EEPROM (Electrically Erasable and Programmable ROM) or the like may also be used.

The input device 108 is a device for inputting various kinds of information, such as a mouse or a keyboard, or a button such as an operation button or a power button.

The communication terminal 10 also includes a network I/F 111, an image sensor I/F 113, an audio input/output I/F 116, a USB (Universal Serial Bus) I/F 117, and a display I/F 119.

The network I/F 111 is an interface for communicating data using the communication network 2. The network I/F 111 is, for example, a NIC (Network Interface Card) compliant with TCP (Transmission Control Protocol)/IP (Internet Protocol).

The image sensor I/F 113 is an interface for transmitting image data to and from a camera 112, which captures a subject and obtains image data under the control of the CPU 101. The camera 112 includes a lens and a solid-state image sensor that converts light into electric charge to digitize an image (video) of the subject. The camera 112 is connected to the image sensor I/F 113 by a cable 112c. A CMOS (Complementary Metal Oxide Semiconductor) sensor, a CCD (Charge Coupled Device), or the like is used as the solid-state image sensor.

The audio input/output I/F 116 is an interface that, under the control of the CPU 101, handles the input and output of audio signals (audio data) to and from a smart speaker 114, which has a microphone 114a for inputting audio and a speaker 114b (output device) for outputting audio. The microphone 114a of the smart speaker 114 is a microphone array, and the smart speaker 114 identifies the direction of a sound by applying audio processing to the sound input from each microphone of the array. The smart speaker 114 does not necessarily have to be a smart speaker, as long as it is a device that is equipped with a microphone array and can identify the direction of a sound. The smart speaker 114 is connected to the audio input/output I/F 116 by a cable 114c.
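The text does not say which audio processing the smart speaker uses to identify the direction of a sound. One common approach with a microphone array is to estimate the time difference of arrival between a pair of microphones by cross-correlation and convert it to an angle. The sketch below assumes that approach, a two-microphone array, a far-field source, and a speed of sound of 343 m/s; all function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, assumed room-temperature value


def best_lag(a, b, max_lag):
    """Return the lag (in samples) by which signal b trails signal a.

    The lag is chosen to maximise the cross-correlation sum of a[n] * b[n + lag].
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(
            a[n] * b[n + lag]
            for n in range(len(a))
            if 0 <= n + lag < len(b)
        )
        if score > best_score:
            best, best_score = lag, score
    return best


def arrival_angle(a, b, sample_rate, mic_distance):
    """Estimate the arrival angle in degrees from two microphone signals.

    0 degrees means the source is directly in front of the pair; a positive
    angle means the sound reached microphone a before microphone b.
    """
    # Physically possible delays are bounded by the microphone spacing.
    max_lag = int(mic_distance / SPEED_OF_SOUND * sample_rate) + 1
    delay = best_lag(a, b, max_lag) / sample_rate
    sin_theta = max(-1.0, min(1.0, delay * SPEED_OF_SOUND / mic_distance))
    return math.degrees(math.asin(sin_theta))
```

A real array would typically use more than two microphones and a more robust correlation method, but the geometry is the same: a larger delay between microphones corresponds to a source further off the array's front axis.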

The USB I/F 117 is a USB-standard interface for connecting to an external device (for example, a PC) and performing data communication.

The display I/F 119 is an interface for transmitting image data to the external display 120 (display device) under the control of the CPU 101. The display 120 is connected to the display I/F 119 by a cable 120c. The cable 120c may be a cable for analog RGB (VGA) signals, a cable for component video, or a cable for HDMI (registered trademark) (High-Definition Multimedia Interface) or DVI (Digital Visual Interface) signals.

The CPU 101, ROM 102, RAM 103, auxiliary storage device 105, media drive 107, input device 108, network I/F 111, image sensor I/F 113, audio input/output I/F 116, USB I/F 117, and display I/F 119 described above are communicably connected to one another by a bus line 110 such as an address bus and a data bus.

The hardware configuration of the communication terminal 10 as a video conference terminal shown in FIG. 2 is an example, and the terminal may include components other than those shown in FIG. 2. The camera 112 and the smart speaker 114 may be provided integrally with the communication terminal 10, or at least one of the camera 112 and the smart speaker 114 may be a separate external device. Although the display 120 is shown in FIG. 2 as a display externally attached to the communication terminal 10, it is not limited to this and may be provided integrally with the communication terminal 10.

Next, the details of the hardware configuration when the communication terminal 10 according to this embodiment is an electronic whiteboard will be described with reference to FIG. 3.

As shown in FIG. 3, the communication terminal 10 according to this embodiment includes a CPU 201, a ROM 202, a RAM 203, an SSD 204, a network I/F 205, and an external device connection I/F 206.

The CPU 201 is an arithmetic device that controls the overall operation of the communication terminal 10. The ROM 202 is a nonvolatile storage device that stores programs used to drive the CPU 201, such as an IPL (Initial Program Loader). The RAM 203 is a volatile storage device used as a work area for the CPU 201.

The SSD 204 is a nonvolatile storage device that stores various data such as image data, audio data, and operation data, as well as programs for the communication terminal 10. A nonvolatile storage device such as an HDD may be used instead of the SSD 204.

The network I/F 205 is an interface for communicating data using the communication network 2. The network I/F 205 is, for example, a NIC compliant with TCP/IP.

外部機器接続Ｉ／Ｆ２０６は、各種の外部機器を接続するためのＵＳＢ規格等のインターフェースである。この場合の外部機器としては、例えば、ＵＳＢメモリ２３０、スマートスピーカ２４０、およびカメラ２６０である。 The external device connection I/F 206 is an interface such as a USB standard for connecting various external devices. In this case, the external devices include, for example, the USB memory 230, the smart speaker 240, and the camera 260.

スマートスピーカ２４０は、音声を入力するマイク２４１、および音声を出力するスピーカ２４２（出力装置）を有する。スマートスピーカ２４０は、マイクロホンアレイで構成されるマイク２４１を備えることによって、各マイクロホンから入力された音声に対する音声処理を行うことによって、当該音声の方向を特定することができる装置である。なお、マイクロホンアレイを搭載して音声の方向を特定することができる装置であれば、必ずしもスマートスピーカである必要はない。 The smart speaker 240 has a microphone 241 that inputs audio, and a speaker 242 (output device) that outputs audio. The smart speaker 240 is a device that is equipped with a microphone 241 constituted by a microphone array, and can specify the direction of the sound by performing sound processing on the sound input from each microphone. Note that the device does not necessarily need to be a smart speaker as long as it is equipped with a microphone array and can specify the direction of sound.

カメラ２６０は、レンズ、および光を電荷に変換して被写体の画像（映像）を電子化する固体撮像素子を含む。固体撮像素子としては、ＣＭＯＳまたはＣＣＤ等が用いられる。 The camera 260 includes a lens and a solid-state image sensor that converts light into electric charges to digitize an image (video) of a subject. As the solid-state image sensor, CMOS, CCD, etc. are used.

また、通信端末１０は、キャプチャデバイス２１１と、ＧＰＵ（ＧｒａｐｈｉｃｓＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）２１２と、ディスプレイコントローラ２１３と、ディスプレイ２１４（表示装置）と、センサコントローラ２１５と、接触センサ２１６と、電子ペンコントローラ２１７と、電源スイッチ２２２と、選択スイッチ２２３と、を備えている。 The communication terminal 10 also includes a capture device 211, a GPU (Graphics Processing Unit) 212, a display controller 213, a display 214 (display device), a sensor controller 215, a contact sensor 216, an electronic pen controller 217, It includes a power switch 222 and a selection switch 223.

キャプチャデバイス２１１は、外付けのＰＣ２７０のディスプレイに対して映像情報を静止画または動画として表示させるデバイスである。 The capture device 211 is a device that displays video information as a still image or a moving image on the display of the external PC 270.

ＧＰＵ２１２は、画像処理に特化した演算装置である。ディスプレイコントローラ２１３は、ＧＰＵ２１２からの出力画像をディスプレイ２１４等へ出力するために画面表示の制御および管理を行うコントローラである。 The GPU 212 is an arithmetic unit specialized for image processing. The display controller 213 is a controller that controls and manages screen display in order to output the output image from the GPU 212 to the display 214 or the like.

センサコントローラ２１５は、接触センサ２１６の処理を制御するコントローラである。接触センサ２１６は、赤外線遮断方式による座標の入力および座標の検出を行うセンサである。この座標の入力および座標の検出する方法は、ディスプレイ２１４の上側両端部に設置された２つ受発光装置が、ディスプレイ２１４に平行して複数の赤外線を放射し、ディスプレイ２１４の周囲に設けられた反射部材によって反射されて、受光素子が放射した光の光路と同一の光路上を戻って来る光を受光する方法である。接触センサ２１６は、ディスプレイ２１４上に電子ペン２９０およびユーザの手Ｈ等が接触したことを検知する。接触センサ２１６は、物体によって遮断された２つの受発光装置が放射した赤外線のＩＤをセンサコントローラ２１５に出力し、センサコントローラ２１５が、物体の接触位置である座標位置を特定する。 The sensor controller 215 is a controller that controls processing of the contact sensor 216. The contact sensor 216 is a sensor that inputs and detects coordinates using an infrared cutoff method. In this method, two light receiving and emitting devices installed at both ends of the upper side of the display 214 emit a plurality of infrared rays parallel to the display 214, and receive the light that is reflected by a reflecting member provided around the display 214 and returns along the same optical path as the emitted light. The contact sensor 216 detects that the electronic pen 290, the user's hand H, or the like has touched the display 214. The contact sensor 216 outputs, to the sensor controller 215, the IDs of the infrared rays that were emitted by the two light receiving and emitting devices and blocked by the object, and the sensor controller 215 identifies the coordinate position at which the object made contact.
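As a rough illustration of how a coordinate could be derived from the IDs of the blocked infrared rays, the following Python sketch intersects one blocked ray from each of the two upper corners. The source does not describe the actual computation; the even angular spacing of rays in `ray_angle`, the coordinate convention, and all function names are assumptions made only for this example.

```python
import math


def ray_angle(ray_id: int, ray_count: int) -> float:
    """Map a blocked-ray ID to an angle in radians.

    Hypothetical mapping: rays are assumed to be evenly spaced over
    0..90 degrees, measured downward from the top edge of the display.
    """
    if not 0 <= ray_id < ray_count:
        raise ValueError("ray_id out of range")
    return (ray_id + 0.5) * (math.pi / 2) / ray_count


def contact_position(left_angle: float, right_angle: float,
                     width: float) -> tuple[float, float]:
    """Intersect the blocked rays of the top-left and top-right units.

    The origin is the top-left corner, x grows to the right and y grows
    downward. Both angles are measured downward from the top edge.
    """
    tan_left = math.tan(left_angle)
    tan_right = math.tan(right_angle)
    # From tan_left = y / x and tan_right = y / (width - x).
    x = width * tan_right / (tan_left + tan_right)
    y = x * tan_left
    return x, y
```

For a display of width 2 with both blocked rays at 45 degrees, the sketch places the contact at the point (1, 1), directly below the midpoint of the top edge.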

なお、接触センサ２１６は、赤外線遮断方式に限らず、静電容量の変化を検知することにより接触位置を特定する静電容量方式のタッチパネル、対向する２つの抵抗膜の電圧変化によって接触位置を特定する抵抗膜方式のタッチパネル、接触物体が表示部に接触することによって生じる電磁誘導を検知して接触位置を特定する電磁誘導方式のタッチパネル等の種々の検出手段を用いてもよい。 Note that the contact sensor 216 is not limited to the infrared cutoff method. Various detection means may be used, such as a capacitive touch panel that identifies the contact position by detecting a change in capacitance, a resistive film touch panel that identifies the contact position from a voltage change between two opposing resistive films, or an electromagnetic induction touch panel that identifies the contact position by detecting the electromagnetic induction generated when an object touches the display unit.

電子ペンコントローラ２１７は、電子ペン２９０と通信することによって、ディスプレイ２１４へのペン先のタッチおよびペン尻のタッチの有無を判断するコントローラである。なお、電子ペンコントローラ２１７は、電子ペン２９０のペン先およびペン尻だけでなく、電子ペン２９０のユーザが握る部分、その他の電子ペンの部分のタッチの有無を判断するようにしてもよい。 The electronic pen controller 217 is a controller that communicates with the electronic pen 290 to determine whether or not the display 214 has been touched with the tip of the pen or the butt of the pen. Note that the electronic pen controller 217 may determine whether or not a portion of the electronic pen 290 that the user grips and other portions of the electronic pen are touched, in addition to the pen tip and pen tail of the electronic pen 290.

電源スイッチ２２２は、通信端末１０の電源のＯＮ／ＯＦＦを切り換えるためのスイッチである。選択スイッチ２２３は、例えば、ディスプレイ２１４の表示の明暗、色合い等を調整するためのスイッチ群である。 The power switch 222 is a switch for switching the power of the communication terminal 10 on and off. The selection switch 223 is, for example, a group of switches for adjusting the brightness, shade, etc. of the display on the display 214.

上述のＣＰＵ２０１、ＲＯＭ２０２、ＲＡＭ２０３、ＳＳＤ２０４、ネットワークＩ／Ｆ２０５、外部機器接続Ｉ／Ｆ２０６、キャプチャデバイス２１１、ＧＰＵ２１２、センサコントローラ２１５、電子ペンコントローラ２１７、電源スイッチ２２２および選択スイッチ２２３は、アドレスバスおよびデータバス等のバスライン２１０によって互いに通信可能に接続されている。 The above-mentioned CPU 201, ROM 202, RAM 203, SSD 204, network I/F 205, external device connection I/F 206, capture device 211, GPU 212, sensor controller 215, electronic pen controller 217, power switch 222, and selection switch 223 are an address bus and data They are communicably connected to each other by a bus line 210 such as a bus.

なお、図３に示した電子黒板である通信端末１０のハードウェア構成は一例を示すものであり、図３に示した構成要素以外の構成要素を含むものとしてもよい。また、カメラ２６０、およびスマートスピーカ２４０は、通信端末１０に一体的に備えられるものとしてもよく、または、カメラ２６０、およびスマートスピーカ２４０のうち少なくともいずれかは、外付けの別体の装置であってもよい。 Note that the hardware configuration of the communication terminal 10 as an electronic whiteboard shown in FIG. 3 is only an example, and the terminal may include components other than those shown in FIG. 3. The camera 260 and the smart speaker 240 may be provided integrally with the communication terminal 10, or at least one of them may be a separate external device.

（管理システムおよびプログラム提供システムのハードウェア構成） (Hardware configuration of management system and program providing system)
図４は、実施形態に係る管理システムおよびプログラム提供システムのハードウェア構成の一例を示す図である。図４を参照しながら、管理システム５０およびプログラム提供システム９０のハードウェア構成の詳細について説明する。 FIG. 4 is a diagram illustrating an example of the hardware configuration of the management system and the program providing system according to the embodiment. The details of the hardware configuration of the management system 50 and the program providing system 90 will be described with reference to FIG. 4.

まず、図４を参照しながら管理システム５０のハードウェア構成について説明する。図４に示すように、管理システム５０は、ＣＰＵ３０１と、ＲＯＭ３０２と、ＲＡＭ３０３と、補助記憶装置３０５と、メディアドライブ３０７と、ディスプレイ３０８と、ネットワークＩ／Ｆ３０９と、キーボード３１１と、マウス３１２と、ＤＶＤ（ＤｉｇｉｔａｌＶｅｒｓａｔｉｌｅＤｉｓｃ）ドライブ３１４と、を備えている。 First, the hardware configuration of the management system 50 will be described with reference to FIG. 4. As shown in FIG. 4, the management system 50 includes a CPU 301, a ROM 302, a RAM 303, an auxiliary storage device 305, a media drive 307, a display 308, a network I/F 309, a keyboard 311, a mouse 312, A DVD (Digital Versatile Disc) drive 314 is provided.

ＣＰＵ３０１は、管理システム５０全体の動作を制御する演算装置である。ＲＯＭ３０２は、管理システム５０用のプログラムを記憶している不揮発性記憶装置である。ＲＡＭ３０３は、ＣＰＵ３０１のワークエリアとして使用される揮発性記憶装置である。 The CPU 301 is a calculation device that controls the operation of the management system 50 as a whole. ROM 302 is a nonvolatile storage device that stores programs for management system 50. RAM 303 is a volatile storage device used as a work area for CPU 301.

補助記憶装置３０５は、後述する認証管理ＤＢ５００１、端末管理ＤＢ５００２、グループ管理ＤＢ５００３およびセッション管理ＤＢ５００４等の各種データを記憶するＨＤＤまたはＳＳＤ等の記憶装置である。メディアドライブ３０７は、ＣＰＵ３０１の制御に従って、フラッシュメモリ等の記録メディア３０６に対するデータの読み出しおよび書き込みを制御する装置である。 The auxiliary storage device 305 is a storage device such as an HDD or an SSD that stores various data such as an authentication management DB 5001, a terminal management DB 5002, a group management DB 5003, and a session management DB 5004, which will be described later. The media drive 307 is a device that controls reading and writing of data to and from a recording medium 306 such as a flash memory under the control of the CPU 301 .

ディスプレイ３０８は、カーソル、メニュー、ウィンドウ、文字または画像等の各種情報を表示する液晶または有機ＥＬ等によって構成された表示装置である。ネットワークＩ／Ｆ３０９は、通信ネットワーク２を利用してデータを通信するためのインターフェースである。ネットワークＩ／Ｆ３０９は、例えば、ＴＣＰ／ＩＰに準拠したＮＩＣ等である。 The display 308 is a display device configured with a liquid crystal, an organic EL, or the like that displays various information such as a cursor, menu, window, characters, or images. The network I/F 309 is an interface for communicating data using the communication network 2. The network I/F 309 is, for example, a NIC based on TCP/IP.

キーボード３１１は、文字、数字、各種指示の選択、およびカーソルの移動等を行う入力装置である。マウス３１２は、各種指示の選択および実行、処理対象の選択、ならびにカーソルの移動等を行うための入力装置である。 The keyboard 311 is an input device for selecting letters, numbers, various instructions, moving a cursor, and the like. The mouse 312 is an input device for selecting and executing various instructions, selecting a processing target, moving a cursor, and the like.

ＤＶＤドライブ３１４は、着脱自在な記憶媒体の一例としてのＤＶＤ－ＲＯＭまたはＤＶＤ－Ｒ（ＤｉｇｉｔａｌＶｅｒｓａｔｉｌｅＤｉｓｋＲｅｃｏｒｄａｂｌｅ）等のＤＶＤ３１３に対するデータの読み出しおよび書き込みを制御する装置である。 The DVD drive 314 is a device that controls reading and writing of data to and from a DVD 313 such as a DVD-ROM or a DVD-R (Digital Versatile Disk Recordable), which is an example of a removable storage medium.

上述のＣＰＵ３０１、ＲＯＭ３０２、ＲＡＭ３０３、補助記憶装置３０５、メディアドライブ３０７、ディスプレイ３０８、ネットワークＩ／Ｆ３０９、キーボード３１１、マウス３１２およびＤＶＤドライブ３１４は、アドレスバスおよびデータバス等のバスライン３１０によって互いに通信可能に接続されている。 The above-described CPU 301, ROM 302, RAM 303, auxiliary storage device 305, media drive 307, display 308, network I/F 309, keyboard 311, mouse 312, and DVD drive 314 are communicably connected to each other by a bus line 310 such as an address bus and a data bus.

なお、図４に示した管理システム５０のハードウェア構成は一例を示すものであり、図４に示した構成要素を全て含む必要はなく、または、その他の構成要素を含むものとしてもよい。 Note that the hardware configuration of the management system 50 shown in FIG. 4 is only an example; the management system need not include all the components shown in FIG. 4, and may include other components.

なお、プログラム提供システム９０は、上述の管理システム５０と同様のハードウェア構成を有しているため、その説明を省略する。ただし、ＲＯＭ３０２には、プログラム提供システム９０を制御するためのプログラム提供システム９０用のプログラムが記録されている。 Note that the program providing system 90 has the same hardware configuration as the above-described management system 50, so its description is omitted. In the program providing system 90, however, the ROM 302 stores a program for controlling the program providing system 90.

（通信端末のソフトウェア構成） (Software configuration of communication terminal)
図５は、実施形態に係る通信端末のソフトウェア構成の一例を示す図である。図５を参照しながら、本実施形態に係る通信端末１０のソフトウェア構成の詳細について説明する。 FIG. 5 is a diagram illustrating an example of the software configuration of the communication terminal according to the embodiment. The details of the software configuration of the communication terminal 10 according to this embodiment will be described with reference to FIG. 5.

通信端末１０には、クライアントアプリとして通信アプリＡがインストールされている。ここで、アプリとは、アプリケーションソフトを意味する。図５に示すように、ＯＳ（ＯｐｅｒａｔｉｎｇＳｙｓｔｅｍ）１０２０、および通信アプリＡは、通信端末１０のＲＡＭ１０３（ＲＡＭ２０３）の作業領域１０１０上で動作する。 A communication application A is installed on the communication terminal 10 as a client application. Here, the application means application software. As shown in FIG. 5, an OS (Operating System) 1020 and a communication application A operate on a work area 1010 of the RAM 103 (RAM 203) of the communication terminal 10.

ＯＳ１０２０は、基本的な機能を提供し、通信端末１０全体を管理する基本ソフトウェアである。通信アプリＡは、ＯＳ１０２０の制御に従って動作し、他の通信端末１０と通信（通話）するためのアプリである。 The OS 1020 is basic software that provides basic functions and manages the entire communication terminal 10. The communication application A is an application that operates under the control of the OS 1020 and is used to communicate (make calls) with other communication terminals 10.

なお、通信アプリＡの通信プロトコルとしては、ＳＩＰ（ＳｅｓｓｉｏｎＩｎｉｔｉａｔｉｏｎＰｒｏｔｏｃｏｌ)、Ｈ.３２３、ＩＲＣ（ＩｎｔｅｒｎｅｔＲｅｌａｙＣｈａｔ)、またはＪｉｎｇｌｅ等が挙げられる。 Note that examples of the communication protocol of the communication application A include SIP (Session Initiation Protocol), H.323, IRC (Internet Relay Chat), Jingle, and the like.

（通信システムの機能ブロックの構成） (Configuration of functional blocks of communication system)
図６は、実施形態に係る通信システムの機能ブロックの構成の一例を示す図である。図６を参照しながら、本実施形態に係る通信システム１の機能ブロックの構成について説明する。 FIG. 6 is a diagram illustrating an example of the configuration of functional blocks of the communication system according to the embodiment. The configuration of functional blocks of the communication system 1 according to this embodiment will be described with reference to FIG. 6.

＜通信端末の機能ブロックの構成＞ <Configuration of functional blocks of communication terminal>
図６に示すように、通信端末１０は、通信部１１と、操作入力受付部１２と、撮像部１３と、表示制御部１４（第２制御部の一例）と、音声入力部１５と、音声出力部１６（第２制御部の一例）と、記憶・読出部１７と、記憶部１８と、認証要求部１９と、発話方向特定部２０（第１特定部）と、を有している。 As shown in FIG. 6, the communication terminal 10 includes a communication unit 11, an operation input receiving unit 12, an imaging unit 13, a display control unit 14 (an example of a second control unit), an audio input unit 15, an audio output unit 16 (an example of a second control unit), a storage/readout unit 17, a storage unit 18, an authentication request unit 19, and a speech direction specifying unit 20 (first specifying unit).

通信部１１は、通信ネットワーク２を介して、他の通信端末１０または各システムと各種データの送受信を行う機能部である。通信部１１は、図２に示すＣＰＵ１０１（図３に示すＣＰＵ２０１）によるソフトウェアである通信アプリＡの実行、およびネットワークＩ／Ｆ１１１（ネットワークＩ／Ｆ２０５）によって実現される。 The communication unit 11 is a functional unit that transmits and receives various data to and from other communication terminals 10 or each system via the communication network 2. The communication unit 11 is realized by execution of a communication application A, which is software, by the CPU 101 shown in FIG. 2 (CPU 201 shown in FIG. 3), and by the network I/F 111 (network I/F 205).

通信部１１は、当該通信端末１０が他の通信端末１０とセッションを確立し、通話によるビデオ会議を開始する前に、管理システム５０から、宛先端末の候補としての各通信端末１０の状態を示す各状態情報の受信を開始する。ここで、宛先端末の候補とは、通信端末１０が、ビデオ会議を行う相手、すなわちセッションの相手として指定可能なビデオ会議の相手であるユーザ（参加者）が利用する宛先候補となる他の通信端末１０である。すなわち、通信端末１０は、宛先端末の候補として予め設定されていない通信端末とは、セッションを確立することができず、ビデオ会議を行うことができない。 Before the communication terminal 10 establishes a session with another communication terminal 10 and starts a video conference by call, the communication unit 11 starts receiving, from the management system 50, status information indicating the status of each communication terminal 10 that is a destination terminal candidate. Here, a destination terminal candidate is another communication terminal 10 used by a user (participant) whom the communication terminal 10 can designate as the other party of a video conference, that is, as the other party of a session. In other words, the communication terminal 10 cannot establish a session, and therefore cannot hold a video conference, with a communication terminal that has not been set in advance as a destination terminal candidate.
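The restriction on destination terminal candidates can be pictured as a simple membership check. The sketch below is only illustrative: the source does not say where or how candidate lists are stored, so the `candidates` mapping, its contents, and the function name are hypothetical.

```python
def can_start_session(requester_id: str, destination_id: str,
                      candidates: dict[str, set[str]]) -> bool:
    """Return True only if the destination was preset as a candidate
    of the requesting terminal."""
    return destination_id in candidates.get(requester_id, set())


# Hypothetical candidate lists keyed by the requesting terminal's ID.
candidates = {"10aa": {"10ab", "10db"}}
```

With these sample lists, terminal `10aa` may request a session with `10db` but not with `10ca`, and a terminal with no candidate list cannot request any session.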

また、状態情報は、各通信端末１０の稼動状態（オンラインかオフラインかの状態）と、オンラインにおいてはさらに通話中であるか、待受け中であるか等の詳細な状態（以下、通信状態と称する）とを示す。また、状態情報は、各通信端末１０の稼動状態および通信状態だけでなく、ケーブルが通信端末１０から外れている、音声を出力できるが画像は出力できない、または、音声が入力されないように設定されている（ミュート）等、様々な状態を示すものとしてもよいが、以下では、一例として、稼動状態および通信状態を示す場合について説明する。 The status information indicates the operating state of each communication terminal 10 (online or offline) and, when the terminal is online, a more detailed state such as whether it is in a call or on standby (hereinafter referred to as the communication state). The status information may indicate not only the operating state and communication state of each communication terminal 10 but also various other states, for example that a cable is disconnected from the communication terminal 10, that audio can be output but images cannot, or that audio input is disabled (mute). In the following, the case where the status information indicates the operating state and the communication state is described as an example.

通信部１１は、当該通信端末１０が開始要求端末として動作する場合には、開始要求情報を管理システム５０に送信する。ここで、開始要求情報とは、ビデオ会議に用いられるセッションの開始を要求する情報である。開始要求情報は、具体的には、開始を要求する旨を示す情報と、開始要求情報の送信元である開始要求端末の端末ＩＤと、セッションの相手となる宛先端末の端末ＩＤと、を含む。端末ＩＤは、通信端末１０を識別するための情報であって、予め通信端末１０に記憶させておく他、ユーザが直接通信端末１０へ入力して決定するものとしてもよい。 When the communication terminal 10 operates as a start request terminal, the communication unit 11 transmits start request information to the management system 50. Here, the start request information is information requesting the start of a session used for a video conference. Specifically, the start request information includes information indicating that a start is requested, the terminal ID of the start request terminal that is the source of the start request information, and the terminal ID of the destination terminal that is the other party of the session. The terminal ID is information for identifying the communication terminal 10; it may be stored in the communication terminal 10 in advance, or it may be determined by the user entering it directly into the communication terminal 10.
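The three items carried by the start request information can be sketched as a small record. The field names, the `"start"` marker value, and the JSON encoding below are assumptions for illustration only; the source does not specify a message format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StartRequest:
    """Hypothetical shape of the start request information."""
    request_type: str             # indicates that a session start is requested
    requester_terminal_id: str    # terminal ID of the start request terminal
    destination_terminal_id: str  # terminal ID of the destination terminal


def encode_start_request(request: StartRequest) -> str:
    """Serialize the request as JSON (an assumed wire format)."""
    return json.dumps(asdict(request), sort_keys=True)


request = StartRequest("start", "10aa", "10db")
```

Calling `encode_start_request(request)` yields a JSON object containing exactly the three fields, which a receiver could decode with `json.loads`.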

操作入力受付部１２は、ユーザによる各種入力を受け付ける機能部である。操作入力受付部１２は、図２に示す入力装置１０８（図３に示す接触センサ２１６、電源スイッチ２２２および選択スイッチ２２３）によって実現される。 The operation input receiving unit 12 is a functional unit that receives various inputs from the user. The operation input receiving unit 12 is realized by the input device 108 shown in FIG. 2 (the contact sensor 216, the power switch 222, and the selection switch 223 shown in FIG. 3).

例えば、ユーザが、操作入力受付部１２のうち図２に示す入力装置１０８としての電源ボタンをオンにすると、当該通信端末１０の電源がオン状態になる。また、ユーザが電源をオン状態からオフにすると、通信部１１は、管理システム５０へ、当該通信端末１０の電源がオフになった旨の状態情報を送信してから、当該通信端末１０の電源が完全にオフとなる。これによって、管理システム５０は、通信端末１０が電源オンから電源オフになったことを把握することができる。 For example, when the user turns on the power button, which is part of the operation input receiving unit 12 and corresponds to the input device 108 shown in FIG. 2, the communication terminal 10 is powered on. When the user switches the power from on to off, the communication unit 11 first transmits, to the management system 50, status information indicating that the communication terminal 10 has been powered off, and only then is the communication terminal 10 completely powered off. This allows the management system 50 to recognize that the communication terminal 10 has changed from power-on to power-off.

撮像部１３は、被写体を撮像して、撮像して得た画像データを取得する機能部である。撮像部１３は、図２に示すＣＰＵ１０１（図３に示すＣＰＵ２０１）によるソフトウェアである通信アプリＡの実行、ならびに、カメラ１１２（カメラ２６０）および撮像素子Ｉ／Ｆ１１３（外部機器接続Ｉ／Ｆ２０６）によって実現される。 The imaging unit 13 is a functional unit that captures an image of a subject and acquires the resulting image data. The imaging unit 13 is realized by execution of the communication application A, which is software, by the CPU 101 shown in FIG. 2 (CPU 201 shown in FIG. 3), together with the camera 112 (camera 260) and the image sensor I/F 113 (external device connection I/F 206).

表示制御部１４は、ディスプレイ１２０（ディスプレイ２１４）に対して画像データ等の表示制御を行う機能部である。表示制御部１４は、図２に示すＣＰＵ１０１（図３に示すＣＰＵ２０１）によるソフトウェアである通信アプリＡの実行によって実現される。 The display control unit 14 is a functional unit that controls the display of image data and the like on the display 120 (display 214). The display control unit 14 is realized by execution of communication application A, which is software, by the CPU 101 shown in FIG. 2 (CPU 201 shown in FIG. 3).

表示制御部１４は、例えば、ビデオ会議の要求元としての当該通信端末１０が所望の宛先としての通信端末１０とビデオ会議の通話を開始する前に、通信部１１によって受信された宛先端末の候補の状態情報を反映させて、各宛先端末の候補の名前が含まれた宛先リストをディスプレイ１２０（ディスプレイ２１４）に表示させる。 For example, before the communication terminal 10 as a video conference request source starts a video conference call with the communication terminal 10 as a desired destination, the display control unit 14 causes the display 120 (display 214) to display a destination list that contains the name of each destination terminal candidate and reflects the status information of the destination terminal candidates received by the communication unit 11.

音声入力部１５は、マイク１１４ａ（マイク２４１）のマイクロホンアレイによって収音された参加者（話者）の音声が音声信号に変換された後、当該音声信号を入力する機能部である。音声入力部１５は、図２に示すＣＰＵ１０１（図３に示すＣＰＵ２０１）によるソフトウェアである通信アプリＡの実行、および音声入出力Ｉ／Ｆ１１６（外部機器接続Ｉ／Ｆ２０６）によって実現される。 The audio input unit 15 is a functional unit that inputs the audio signal after the participant's (speaker's) audio collected by the microphone array of the microphone 114a (microphone 241) is converted into an audio signal. The audio input unit 15 is realized by the CPU 101 shown in FIG. 2 (the CPU 201 shown in FIG. 3) executing the communication application A, which is software, and the audio input/output I/F 116 (external device connection I/F 206).

音声出力部１６は、音声信号をスピーカ１１４ｂ（スピーカ２４２）に出力し、スピーカ１１４ｂ（スピーカ２４２）から音声を出力させる機能部である。音声出力部１６は、図２に示すＣＰＵ１０１（図３に示すＣＰＵ２０１）によるソフトウェアである通信アプリＡの実行、および音声入出力Ｉ／Ｆ１１６（外部機器接続Ｉ／Ｆ２０６）によって実現される。 The audio output unit 16 is a functional unit that outputs an audio signal to the speaker 114b (speaker 242) and causes the speaker 114b (speaker 242) to output audio. The audio output unit 16 is realized by execution of a communication application A, which is software, by the CPU 101 shown in FIG. 2 (CPU 201 shown in FIG. 3), and by the audio input/output I/F 116 (external device connection I/F 206).

記憶・読出部１７は、記憶部１８に各種データを記憶したり、記憶部１８に記憶された各種データを読み出す処理を行う機能部である。記憶部１８には、例えば、宛先端末との通話を行う際に受信されるコンテンツデータが、受信される度に上書き記憶される。このうち、上書きされる前の画像データによってディスプレイ１２０（ディスプレイ２１４）に画像が表示され、上書きされる前の音声データによってスピーカ１１４ｂ（スピーカ２４２）から音声が出力される。記憶・読出部１７は、図２に示すＣＰＵ１０１（図３に示すＣＰＵ２０１）によるソフトウェアである通信アプリＡの実行によって実現される。 The storage/readout unit 17 is a functional unit that stores various data in the storage unit 18 and performs processing to read out various data stored in the storage unit 18. For example, content data received when making a call to a destination terminal is overwritten and stored in the storage unit 18 each time the content data is received. Among these, an image is displayed on the display 120 (display 214) using the image data before being overwritten, and sound is output from the speaker 114b (speaker 242) using the audio data before being overwritten. The storage/reading unit 17 is realized by execution of communication application A, which is software, by the CPU 101 shown in FIG. 2 (CPU 201 shown in FIG. 3).

認証要求部１９は、当該通信端末１０の電源がオンした場合、または、操作入力受付部１２により認証要求操作が受け付けられた場合、通信部１１から通信ネットワーク２を介して管理システム５０に、ログインの認証を要求する旨を示す認証要求情報、および当該通信端末１０の現時点のＩＰアドレスを送信する機能部である。認証要求部１９は、図２に示すＣＰＵ１０１（図３に示すＣＰＵ２０１）によるソフトウェアである通信アプリＡの実行によって実現される。 The authentication request unit 19 is a functional unit that, when the communication terminal 10 is powered on or when the operation input receiving unit 12 accepts an authentication request operation, transmits authentication request information requesting login authentication, together with the current IP address of the communication terminal 10, from the communication unit 11 to the management system 50 via the communication network 2. The authentication request unit 19 is realized by execution of the communication application A, which is software, by the CPU 101 shown in FIG. 2 (CPU 201 shown in FIG. 3).

発話方向特定部２０は、音声入力部１５により入力された音声信号に基づいて、音声方向（発話方向）を特定する機能部である。具体的には、音声入力部１５により入力された音声信号は、マイク１１４ａ（マイク２４１）のマイクロホンアレイに含まれる各マイクロホンから入力された各音声信号を含み、発話方向特定部２０は、各マイクロホンの音声信号に対して音声処理を行うことにより、音声の方向を特定する。発話方向特定部２０は、図２に示すＣＰＵ１０１（図３に示すＣＰＵ２０１）によるソフトウェアである通信アプリＡの実行によって実現される。 The speech direction specifying unit 20 is a functional unit that specifies the direction of sound (speech direction) based on the audio signal input by the audio input unit 15. Specifically, the audio signal input by the audio input unit 15 includes the audio signals input from the individual microphones of the microphone array of the microphone 114a (microphone 241), and the speech direction specifying unit 20 specifies the direction of the sound by performing audio processing on the audio signal of each microphone. The speech direction specifying unit 20 is realized by execution of the communication application A, which is software, by the CPU 101 shown in FIG. 2 (CPU 201 shown in FIG. 3).
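The source does not state which audio processing is used to specify the sound direction. One common approach for microphone arrays is to estimate the time difference of arrival between a pair of microphones and convert it to an angle; the Python sketch below illustrates that idea only. The two-microphone setup, the brute-force cross-correlation, and all names and parameters are assumptions, not the method of the embodiment.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def best_lag(a: list[float], b: list[float], max_lag: int) -> int:
    """Return the delay of signal b relative to signal a, in samples.

    Uses brute-force cross-correlation over lags in [-max_lag, max_lag].
    A positive result means the sound reached microphone a first.
    """
    def correlation(lag: int) -> float:
        return sum(a[i] * b[i + lag]
                   for i in range(len(a)) if 0 <= i + lag < len(b))

    return max(range(-max_lag, max_lag + 1), key=correlation)


def arrival_angle(lag: int, sample_rate: float, mic_distance: float) -> float:
    """Convert a sample lag to an arrival angle in degrees.

    0 degrees is broadside (perpendicular to the line joining the two
    microphones); positive angles lean toward microphone a.
    """
    ratio = SPEED_OF_SOUND * (lag / sample_rate) / mic_distance
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding past +/-1
    return math.degrees(math.asin(ratio))
```

In practice a real array would combine several microphone pairs and a more robust correlation method; this sketch only shows the geometric relation between delay and angle.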

なお、上述の表示制御部１４、認証要求部１９および発話方向特定部２０のうち少なくともいずれかは、ＡＳＩＣ（ＡｐｐｌｉｃａｔｉｏｎＳｐｅｃｉｆｉｃＩｎｔｅｇｒａｔｅｄＣｉｒｃｕｉｔ）またはＦＰＧＡ（Ｆｉｅｌｄ－ＰｒｏｇｒａｍｍａｂｌｅＧａｔｅＡｒｒａｙ）等のハードウェア回路によって実現されるものとしてもよい。 Note that at least one of the display control unit 14, the authentication request unit 19, and the speech direction specifying unit 20 described above may be realized by a hardware circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field-Programmable Gate Array).

また、図６に示した通信端末１０の各機能部は、機能を概念的に示したものであって、このような構成に限定されるものではない。例えば、図６に示した通信端末１０で独立した機能部として図示した複数の機能部を、１つの機能部として構成してもよい。一方、図６に示した通信端末１０の１つ機能部が有する機能を複数に分割し、複数の機能部として構成するものとしてもよい。 Moreover, each functional unit of the communication terminal 10 shown in FIG. 6 shows the function conceptually, and is not limited to such a configuration. For example, a plurality of functional units illustrated as independent functional units in the communication terminal 10 shown in FIG. 6 may be configured as one functional unit. On the other hand, the function of one functional unit of the communication terminal 10 shown in FIG. 6 may be divided into a plurality of functions and configured as a plurality of functional units.

＜管理システムの機能ブロックの構成＞ <Configuration of functional blocks of management system>
図６に示すように、管理システム５０は、通信部５１と、認証部５２と、状態管理部５３と、端末抽出部５４と、端末状態取得部５５と、セッション制御部５６と、記憶・読出部５７と、記憶部５８と、を有している。管理システム５０は、さらに、顔検出部６１（検出部）と、比較部６２と、生成部６３と、属性情報要求部６４（要求部）と、対応付け部６５（第２特定部）と、テキスト化部６６と、抽出部６７と、登録部６８と、表示制御部６９（第１制御部、制御部）と、を有している。記憶部５８は、図４に示す補助記憶装置３０５によって実現され、図６に示すように、認証管理ＤＢ５００１と、端末管理ＤＢ５００２と、グループ管理ＤＢ５００３と、セッション管理ＤＢ５００４とを記憶している。以下、記憶部５８に記憶されている各ＤＢにおいて管理される各テーブルについて説明する。 As shown in FIG. 6, the management system 50 includes a communication unit 51, an authentication unit 52, a status management unit 53, a terminal extraction unit 54, a terminal status acquisition unit 55, a session control unit 56, a storage/readout unit 57, and a storage unit 58. The management system 50 further includes a face detection unit 61 (detection unit), a comparison unit 62, a generation unit 63, an attribute information request unit 64 (request unit), an association unit 65 (second specifying unit), a text conversion unit 66, an extraction unit 67, a registration unit 68, and a display control unit 69 (first control unit, control unit). The storage unit 58 is realized by the auxiliary storage device 305 shown in FIG. 4 and, as shown in FIG. 6, stores an authentication management DB 5001, a terminal management DB 5002, a group management DB 5003, and a session management DB 5004. Each table managed in each DB stored in the storage unit 58 is described below.

＜＜認証管理テーブル＞＞ <<Authentication management table>>
図７は、認証管理テーブルの一例を示す図である。 FIG. 7 is a diagram showing an example of an authentication management table.

記憶部５８は、図７に示す認証管理テーブルを含む認証管理ＤＢ５００１を記憶している。認証管理テーブルでは、ログインの認証を行う通信端末１０を利用するユーザ（参加者）を識別するユーザＩＤに対して、パスワードが関連付けられて管理される。ここで、パスワードは、ログインの認証するために利用される情報である。例えば、図７に示す認証管理テーブルにおいて、ユーザＩＤが「Ａ＿１０ａａ」に関連付けられたパスワードが「ａａａａ」であることが示されている。 The storage unit 58 stores an authentication management DB 5001 including the authentication management table shown in FIG. 7. In the authentication management table, a password is managed in association with a user ID that identifies a user (participant) who uses a communication terminal 10 that performs login authentication. Here, the password is information used to authenticate a login. For example, the authentication management table shown in FIG. 7 indicates that the password associated with the user ID "A_10aa" is "aaaa".
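A lookup against the authentication management table can be sketched as a mapping from user ID to password. Only the row for "A_10aa" comes from the source; the function name and the use of a constant-time comparison are choices made for this example, and a production system would normally store password hashes rather than the plain values shown in the table.

```python
import hmac

# Only this row is given in the source (FIG. 7 example).
AUTH_TABLE: dict[str, str] = {"A_10aa": "aaaa"}


def authenticate(user_id: str, password: str) -> bool:
    """Return True if the password matches the one registered for user_id."""
    stored = AUTH_TABLE.get(user_id)
    if stored is None:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(stored.encode(), password.encode())
```

With the sample row, `authenticate("A_10aa", "aaaa")` succeeds, while a wrong password or an unknown user ID is rejected.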

なお、ユーザＩＤは、通信端末１０を利用するユーザを一意に識別するために使われる文字、記号、数字または各種のしるし等の識別情報であり、例えば、当該ユーザが利用するメールアドレス等であってもよい。 Note that the user ID is identification information, such as letters, symbols, numbers, or various marks, used to uniquely identify a user of the communication terminal 10. It may be, for example, an e-mail address used by that user.

＜＜端末管理テーブル＞＞ <<Terminal management table>>
図８は、端末管理テーブルの一例を示す図である。 FIG. 8 is a diagram showing an example of a terminal management table.

記憶部５８は、図８に示す端末管理テーブルを含む端末管理ＤＢ５００２を記憶している。端末管理テーブルでは、各通信端末１０の端末ＩＤ毎に、端末名、各通信端末１０にログインしたユーザのユーザＩＤ、各通信端末１０の稼動状態、他の通信端末１０との通信状態、および各通信端末１０のＩＰアドレスが関連付けられて管理される。 The storage unit 58 stores a terminal management DB 5002 including the terminal management table shown in FIG. 8. In the terminal management table, the terminal name, the user ID of the user logged in to the communication terminal 10, the operating state of the communication terminal 10, its communication state with other communication terminals 10, and its IP address are managed in association with the terminal ID of each communication terminal 10.

ここで、稼動状態としては、電源がオンされ、通信が可能または通信中の状態であるオンラインと、電源がオンされていない等、通信が可能でない状態であるオフラインとがある。また、通信状態としては、例えば、「Ｃａｌｌｉｎｇ」、「Ｒｉｎｇｉｎｇ」、「Ａｃｃｅｐｔｅｄ」、「Ｂｕｓｙ」、および「Ｎｏｎｅ」等がある。「Ｃａｌｌｉｎｇ」は、他の通信端末１０を呼び出している状態、すなわち、他の通信端末１０に対しビデオ会議に用いられるセッションを確立するための開始要求情報を送信し、応答を待っている状態を示す。「Ｒｉｎｇｉｎｇ」は、他の通信端末１０から呼び出されている状態、すなわち、他の通信端末１０から開始要求情報を受信し、受信した開始要求情報に対する応答が完了していない状態を示す。「Ａｃｃｅｐｔｅｄ」は、他の通信端末１０からの開始要求情報に対し許可の応答が完了しているが、セッションの確立が完了していない状態、および、自端末が送信した開始要求情報に対し許可の応答の受信が完了しているが、セッションの確立が完了していない状態を示す。「Ｂｕｓｙ」は、他の通信端末１０とのセッションが確立し、ビデオ会議におけるコンテンツデータの通信による通話が行われている状態を示す。「Ｎｏｎｅ」は、他の通信端末１０と通信しておらず、待ち受け中の状態を示す。 Here, the operating state is either online, in which the power is on and communication is possible or in progress, or offline, in which communication is not possible, for example because the power is not on. The communication state is, for example, "Calling", "Ringing", "Accepted", "Busy", or "None". "Calling" indicates a state in which the terminal is calling another communication terminal 10, that is, it has transmitted start request information for establishing a session used for a video conference to the other communication terminal 10 and is waiting for a response. "Ringing" indicates a state in which the terminal is being called by another communication terminal 10, that is, it has received start request information from the other communication terminal 10 and has not yet completed its response to that information. "Accepted" indicates either a state in which the terminal has completed a permission response to start request information from another communication terminal 10 but session establishment is not yet complete, or a state in which the terminal has received a permission response to start request information that it transmitted but session establishment is not yet complete. "Busy" indicates a state in which a session with another communication terminal 10 has been established and a call is in progress through the communication of content data in a video conference. "None" indicates a state in which the terminal is not communicating with any other communication terminal 10 and is on standby.
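The five communication states described above can be read as a small state machine. The Python sketch below encodes the transitions that follow from those descriptions; the event names are hypothetical, and the transition from "Busy" back to "None" when a session ends is an assumption, since the descriptions of the states do not cover it.

```python
from enum import Enum


class CommState(Enum):
    NONE = "None"
    CALLING = "Calling"
    RINGING = "Ringing"
    ACCEPTED = "Accepted"
    BUSY = "Busy"


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (CommState.NONE, "send_start_request"): CommState.CALLING,
    (CommState.NONE, "receive_start_request"): CommState.RINGING,
    (CommState.CALLING, "receive_permission"): CommState.ACCEPTED,
    (CommState.RINGING, "send_permission"): CommState.ACCEPTED,
    (CommState.ACCEPTED, "session_established"): CommState.BUSY,
    # Assumption: ending the session returns the terminal to standby.
    (CommState.BUSY, "session_ended"): CommState.NONE,
}


def next_state(state: CommState, event: str) -> CommState:
    """Return the state reached from `state` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state.value}") from None
```

A start request terminal would pass through None, Calling, Accepted, and Busy in that order, while the destination terminal would pass through None, Ringing, Accepted, and Busy.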

例えば、図８に示す端末管理テーブルにおいて、端末ＩＤが「１０ａｄ」の通信端末１０ａｄは、端末名が「日本東京事業所ＡＤ端末」で、ログインしているユーザのユーザＩＤが「Ｃ＿１０ａｄ」で、稼動状態が「オンライン」で、通信状態が他の通信端末１０から呼び出されている状態を示す「Ｒｉｎｇｉｎｇ」で、この通信端末１０ａｄのＩＰアドレスが「１．２．１．６」であることが示されている。 For example, the terminal management table shown in FIG. 8 indicates that the communication terminal 10ad with the terminal ID "10ad" has the terminal name "Japan Tokyo Office AD Terminal", that the user ID of the logged-in user is "C_10ad", that its operating state is "online", that its communication state is "Ringing", meaning that it is being called by another communication terminal 10, and that its IP address is "1.2.1.6".
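One row of the terminal management table can be sketched as a record keyed by terminal ID. The sample values are those given for the terminal 10ad; the class, field, and function names are hypothetical, and the standby check is only one example of how such a table might be queried.

```python
from dataclasses import dataclass


@dataclass
class TerminalRecord:
    """Hypothetical shape of one row of the terminal management table."""
    terminal_name: str
    user_id: str
    operating_state: str      # "online" or "offline"
    communication_state: str  # "Calling", "Ringing", "Accepted", "Busy" or "None"
    ip_address: str


# Keyed by terminal ID; values taken from the example row for 10ad.
terminal_table: dict[str, TerminalRecord] = {
    "10ad": TerminalRecord("Japan Tokyo Office AD Terminal", "C_10ad",
                           "online", "Ringing", "1.2.1.6"),
}


def is_on_standby(table: dict[str, TerminalRecord], terminal_id: str) -> bool:
    """Return True if the terminal is online and not in any call phase."""
    record = table.get(terminal_id)
    return (record is not None
            and record.operating_state == "online"
            and record.communication_state == "None")
```

For the sample row, `is_on_standby(terminal_table, "10ad")` is false because the terminal is currently in the "Ringing" state.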

＜＜グループ管理テーブル＞＞ <<Group management table>>
図９は、グループ管理テーブルの一例を示す図である。 FIG. 9 is a diagram showing an example of a group management table.

記憶部５８は、図９に示すグループ管理テーブルを含むグループ管理ＤＢ５００３を記憶している。グループ管理テーブルでは、管理システム５０に予め登録されているビデオ会議のグループごとに、当該グループに含まれる通信端末１０の端末ＩＤが管理される。すなわち、グループ管理テーブルでは、グループを識別するグループＩＤと、当該グループに含まれる通信端末１０の端末ＩＤとが関連付けられて管理される。 The storage unit 58 stores a group management DB 5003 including the group management table shown in FIG. 9. In the group management table, the terminal IDs of the communication terminals 10 included in a group are managed for each video conference group registered in advance in the management system 50. That is, in the group management table, a group ID that identifies a group is managed in association with the terminal IDs of the communication terminals 10 included in that group.

例えば、図９に示すグループ管理テーブルにおいて、グループＩＤが「Ｇ００２」のグループは、端末ＩＤが「１０ａｃ」、「１０ｃａ」、「１０ｃｂ」である通信端末１０を含むことが示されている。 For example, in the group management table shown in FIG. 9, a group with a group ID of "G002" is shown to include communication terminals 10 with terminal IDs of "10ac", "10ca", and "10cb".

＜＜セッション管理テーブル＞＞ <<Session management table>>
図１０は、セッション管理テーブルの一例を示す図である。 FIG. 10 is a diagram showing an example of a session management table.

記憶部５８は、図１０に示すセッション管理テーブルを含むセッション管理ＤＢ５００４を記憶している。セッション管理テーブルでは、通信端末１０間でコンテンツデータが通信されるセッションを識別するためのセッションＩＤ毎に、セッションの開始要求端末の端末ＩＤ、およびセッションを確立するための開始要求情報において相手先として指定された宛先端末の端末ＩＤが関連付けられて管理される。 The storage unit 58 stores a session management DB 5004 including the session management table shown in FIG. 10. In the session management table, for each session ID identifying a session in which content data is communicated between communication terminals 10, the terminal ID of the terminal that requested the start of the session and the terminal ID of the destination terminal designated as the other party in the start request information for establishing the session are managed in association with each other.

例えば、図１０に示すセッション管理テーブルにおいて、セッションＩＤ「ｓｅ１」で識別されるセッションは、端末ＩＤが「１０ａａ」の開始要求端末（通信端末１０ａａ）と、端末ＩＤが「１０ｄｂ」の宛先端末（通信端末１０ｄｂ）との間で確立されたことを示す。 For example, the session management table shown in FIG. 10 shows that the session identified by the session ID "se1" was established between the start requesting terminal (communication terminal 10aa) with the terminal ID "10aa" and the destination terminal (communication terminal 10db) with the terminal ID "10db".

なお、図７～図１０に示した各テーブルで管理される情報は、テーブル形式の情報としているが、これに限定されるものではなく、管理される各情報が関連付けられることができれば、テーブル形式に限定されるものではない。 Although the information managed in each of the tables shown in FIGS. 7 to 10 is described as table-format information, it is not limited to table format as long as the managed pieces of information can be associated with one another.

図６に戻り、管理システム５０の機能ブロックの説明に戻る。 Returning to FIG. 6, the description of the functional blocks of the management system 50 continues.

通信部５１は、通信ネットワーク２を介して、通信端末１０または他のシステムと各種データの送受信を行う機能部である。通信部５１は、図４に示すＣＰＵ３０１によるプログラムの実行、およびネットワークＩ／Ｆ３０９によって実現される。 The communication unit 51 is a functional unit that transmits and receives various data to and from the communication terminal 10 or other systems via the communication network 2. The communication unit 51 is realized by the execution of a program by the CPU 301 and the network I/F 309 shown in FIG.

認証部５２は、通信部５１を介して受信された認証要求情報に含まれているユーザＩＤおよびパスワードを検索キーとし、記憶部５８の認証管理テーブル（図７参照）を検索し、認証管理テーブルに同一のユーザＩＤおよびパスワードが管理されているかを判断することによってユーザ認証を行う機能部である。認証部５２は、図４に示すＣＰＵ３０１によるプログラムの実行によって実現される。 The authentication unit 52 is a functional unit that performs user authentication by searching the authentication management table (see FIG. 7) in the storage unit 58, using the user ID and password included in the authentication request information received via the communication unit 51 as a search key, and determining whether the same user ID and password are managed in the authentication management table. The authentication unit 52 is realized by the CPU 301 shown in FIG. 4 executing a program.
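The lookup performed by the authentication unit 52 can be sketched as follows. This is an illustrative sketch only: `AUTH_TABLE` and `authenticate` are hypothetical names, the password value is invented, and the constant-time comparison is an addition for safety that the description does not mention.

```python
import hmac

# Hypothetical contents of the authentication management table (FIG. 7).
# The user ID "A_10aa" appears in the description; the password is invented.
AUTH_TABLE = {"A_10aa": "aaaa"}


def authenticate(user_id: str, password: str) -> bool:
    """Return True when the same user ID and password pair is registered."""
    registered = AUTH_TABLE.get(user_id)
    if registered is None:
        return False  # unknown user ID
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(registered.encode(), password.encode())
```

A production system would normally store salted password hashes rather than plain passwords; the plain-text table here only mirrors the table lookup described in the text.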

状態管理部５３は、図８に示す端末管理テーブルの稼動状態および通信状態を管理する機能部である。状態管理部５３は、ログインの認証を要求してきた通信端末１０の稼動状態を管理すべく、端末管理テーブルに、この通信端末１０の端末ＩＤ、当該通信端末１０にログインしている参加者のユーザＩＤ、当該通信端末１０の稼動状態、および当該通信端末１０のＩＰアドレスを関連付けて記憶して管理する。 The status management unit 53 is a functional unit that manages the operating status and communication status of the terminal management table shown in FIG. In order to manage the operating state of the communication terminal 10 that has requested login authentication, the state management unit 53 stores the terminal ID of this communication terminal 10 and the user of the participant who is logged in to the communication terminal 10 in the terminal management table. The ID, the operating state of the communication terminal 10, and the IP address of the communication terminal 10 are stored and managed in association with each other.

状態管理部５３は、通信端末１０のユーザによる入力装置１０８（電源スイッチ２２２）の操作によってオフ状態からオン状態になると、この通信端末１０から送られてきた電源をオンする旨の情報に基づいて、端末管理テーブルの稼動状態をオフラインからオンラインに更新する。また、状態管理部５３は、通信端末１０のユーザによる入力装置１０８（電源スイッチ２２２）の操作によってオン状態からオフ状態になると、この通信端末１０から送られてきた電源をオフする旨の情報に基づいて、端末管理テーブルの稼動状態をオンラインからオフラインに更新する。 When a communication terminal 10 changes from the off state to the on state through its user's operation of the input device 108 (power switch 222), the state management unit 53 updates the operating state in the terminal management table from offline to online based on the power-on information sent from this communication terminal 10. Likewise, when the communication terminal 10 changes from the on state to the off state through the user's operation of the input device 108 (power switch 222), the state management unit 53 updates the operating state in the terminal management table from online to offline based on the power-off information sent from this communication terminal 10.

状態管理部５３は、図４に示すＣＰＵ３０１によるプログラムの実行によって実現される。 The state management unit 53 is realized by executing a program by the CPU 301 shown in FIG.

端末抽出部５４は、ログインの認証要求した通信端末１０等、処理対象となる対象端末の端末ＩＤを検索キーとして、図９に示すグループ管理テーブルを検索し、対象端末と通話することができる、すなわちセッションを確立することのできる宛先端末の候補（同じグループの通信端末１０）の端末ＩＤを読み出す機能部である。端末抽出部５４は、図４に示すＣＰＵ３０１によるプログラムの実行によって実現される。 The terminal extraction unit 54 searches the group management table shown in FIG. 9 using the terminal ID of the target terminal to be processed, such as the communication terminal 10 that has requested login authentication, as a search key, and can make a call to the target terminal. That is, it is a functional unit that reads the terminal ID of a destination terminal candidate (communication terminal 10 of the same group) with which a session can be established. The terminal extraction unit 54 is realized by executing a program by the CPU 301 shown in FIG.

端末状態取得部５５は、端末ＩＤを検索キーとして、図８に示す端末管理テーブルを検索し、端末ＩＤ毎に稼動状態および通信状態を読み出す機能部である。これにより、端末状態取得部５５は、ログインの認証要求をしてきた通信端末１０と通話することができる宛先端末の候補の稼動状態および通信状態を取得することができる。端末状態取得部５５は、図４に示すＣＰＵ３０１によるプログラムの実行によって実現される。 The terminal status acquisition unit 55 is a functional unit that searches the terminal management table shown in FIG. 8 using the terminal ID as a search key, and reads out the operating status and communication status for each terminal ID. Thereby, the terminal status acquisition unit 55 can acquire the operating status and communication status of a candidate destination terminal that can communicate with the communication terminal 10 that has made the login authentication request. The terminal status acquisition unit 55 is realized by executing a program by the CPU 301 shown in FIG.

セッション制御部５６は、図１０に示すセッション管理テーブルに、生成したセッションＩＤ、開始要求端末の端末ＩＤおよび宛先端末の端末ＩＤを関連付けて記憶して管理する機能部である。セッション制御部５６は、通信端末１０間のセッションの確立をするための制御を行う。セッション制御部５６は、図４に示すＣＰＵ３０１によるプログラムの実行によって実現される。 The session control unit 56 is a functional unit that stores and manages the generated session ID, the terminal ID of the start requesting terminal, and the terminal ID of the destination terminal in association with each other in the session management table shown in FIG. 10. The session control unit 56 performs control for establishing a session between communication terminals 10. The session control unit 56 is realized by the CPU 301 shown in FIG. 4 executing a program.
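The bookkeeping done by the session control unit 56 can be sketched as a small table keyed by session ID. This is a sketch under assumptions: `SessionRecord`, `SessionTable`, and the sequential "se1", "se2", ... ID scheme are hypothetical; the description only gives "se1" as an example session ID and does not say how IDs are generated.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionRecord:
    start_request_terminal_id: str
    destination_terminal_id: str


class SessionTable:
    """Associates each session ID with its start-request and destination terminals."""

    def __init__(self):
        self._records = {}
        self._counter = itertools.count(1)  # assumed ID scheme: se1, se2, ...

    def register(self, start_id: str, destination_id: str) -> str:
        """Generate a session ID and store it with both terminal IDs."""
        session_id = f"se{next(self._counter)}"
        self._records[session_id] = SessionRecord(start_id, destination_id)
        return session_id

    def lookup(self, session_id: str) -> SessionRecord:
        return self._records[session_id]
```

For instance, registering a session from terminal 10aa to terminal 10db on an empty table yields the ID "se1", which corresponds to the example row described for the session management table.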

記憶・読出部５７は、記憶部５８に各種テーブルに情報を記憶したり、記憶部５８に記憶された各種テーブルの情報を読み出す処理を行う機能部である。記憶・読出部５７は、図４に示すＣＰＵ３０１によるプログラムの実行によって実現される。 The storage/reading unit 57 is a functional unit that stores information in various tables in the storage unit 58 and performs processing to read information from various tables stored in the storage unit 58. The storage/reading unit 57 is realized by executing a program by the CPU 301 shown in FIG.

顔検出部６１は、通信部５１で受信した映像データから、写り込んでいる参加者の顔画像を検出して、顔の特徴を数値化して特徴値（以下、顔検出情報と称する場合がある）として取得する機能部である。例えば、検出された顔の目、眉毛、鼻、口等の特徴を数値化して特徴値を求め、当該特徴値ごとに「Ｍ４」、「ＥＬ２」等のＩＤ（識別情報）が割り振られる。また、各顔の部分の位置を特徴点として求め、当該位置、および特徴点間の距離等も特徴値として顔検出情報に含まれる。また、顔検出部６１は、映像データにおいて検出した参加者の顔の画像の中心座標を算出する。なお、顔検出部６１は、参加者の顔画像を検出するものとしたが、これに限定されるものではなく、検出対象は、参加者の顔を含む上半身等、参加者を判別することが可能な部位であればよい。顔検出部６１は、図４に示すＣＰＵ３０１によるプログラムの実行によって実現される。 The face detection unit 61 is a functional unit that detects the face image of each participant appearing in the video data received by the communication unit 51, quantifies the facial features, and acquires them as feature values (hereinafter sometimes referred to as face detection information). For example, features of the detected face such as the eyes, eyebrows, nose, and mouth are quantified to obtain feature values, and an ID (identification information) such as "M4" or "EL2" is assigned to each feature value. The position of each part of the face is also obtained as a feature point, and these positions, the distances between feature points, and the like are included in the face detection information as feature values. The face detection unit 61 also calculates the center coordinates of the image of each participant's face detected in the video data. Although the face detection unit 61 is described as detecting the participant's face image, the detection target is not limited to this and may be any part by which the participant can be identified, such as the upper body including the face. The face detection unit 61 is realized by the CPU 301 shown in FIG. 4 executing a program.

比較部６２は、記憶部５８に記憶されている後述の顔・アバター・属性対応テーブル（図２３参照）を参照し、顔検出部６１により取得された顔検出情報が、顔・アバター・属性対応テーブルに登録されている顔認識情報と一致するか否か比較する機能部である。また、比較部６２により、顔検出情報と、顔・アバター・属性対応テーブルの顔認識情報とが一致すると判断されるためには、必ずしも顔検出情報と顔認識情報とが完全に一致する必要はなく、一定程度近似する場合、一致すると判断されるものとしてもよい。比較部６２は、図４に示すＣＰＵ３０１によるプログラムの実行によって実現される。 The comparison unit 62 refers to a face/avatar/attribute correspondence table (see FIG. 23) stored in the storage unit 58, which will be described later, and determines whether the face detection information acquired by the face detection unit 61 corresponds to a face/avatar/attribute. This is a functional unit that compares whether or not it matches the face recognition information registered in the table. Furthermore, in order for the comparison unit 62 to determine that the face detection information and the face recognition information in the face/avatar/attribute correspondence table match, the face detection information and the face recognition information do not necessarily have to match completely. However, if they approximate to a certain degree, they may be determined to match. The comparison unit 62 is realized by executing a program by the CPU 301 shown in FIG.
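One way to read "approximate to a certain degree" is as a threshold on how many feature IDs agree. The following is a minimal sketch of that idea, not the comparison method of the embodiment: the per-part dictionary layout, the part names, the IDs other than "M4" and "EL2", and the 0.75 threshold are all assumptions made for illustration.

```python
def match_ratio(detected: dict, registered: dict) -> float:
    """Fraction of registered facial parts whose feature ID also appears in detected."""
    if not registered:
        return 0.0
    hits = sum(1 for part, feature_id in registered.items()
               if detected.get(part) == feature_id)
    return hits / len(registered)


def is_match(detected: dict, registered: dict, threshold: float = 0.75) -> bool:
    """Treat the face as matching when enough features agree (threshold assumed)."""
    return match_ratio(detected, registered) >= threshold


# Hypothetical face recognition information registered for one participant.
registered_face = {"mouth": "M4", "left_eye": "EL2", "right_eye": "ER3", "nose": "N1"}
# Hypothetical face detection information: three of the four features agree.
detected_face = {"mouth": "M4", "left_eye": "EL2", "right_eye": "ER3", "nose": "N7"}
```

With these sample values three of four features agree, so the ratio equals the assumed threshold and the faces are treated as matching even though they are not identical.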

生成部６３は、顔・アバター・属性対応テーブルにおいて、比較部６２により顔検出情報と一致すると判断された顔認識情報に対応するアバター情報に基づいて、当該顔検出情報に対応する参加者を表すアバターを生成する機能部である。ここで、アバター情報とは、アバターを生成するために用いられる情報であり、例えば、人の顔画像に基づいてその人に似せたアバターを生成するための情報、または、当該顔画像の情報そのものであってもよい。または、アバター情報は、必ずしも参加者の顔であることが分かるようなアバターを生成するための情報である必要はなく、各参加者の代わりとなるアバター（例えば動物のアバター等）を生成するための情報であってもよい。または、アバター情報は、既存のいくつかのアバターを生成するための情報の中からどの情報を用いてアバターを生成するのかを示す種類情報であってもよい。生成部６３は、図４に示すＣＰＵ３０１によるプログラムの実行によって実現される。 The generation unit 63 is a functional unit that generates an avatar representing the participant corresponding to the face detection information, based on the avatar information that corresponds, in the face/avatar/attribute correspondence table, to the face recognition information determined by the comparison unit 62 to match that face detection information. Here, avatar information is information used to generate an avatar; for example, it may be information for generating an avatar that resembles a person based on that person's face image, or it may be the face image information itself. Alternatively, the avatar information need not be information for generating an avatar recognizable as the participant's face, and may be information for generating an avatar that stands in for each participant (for example, an animal avatar). Alternatively, the avatar information may be type information indicating which of several existing sets of avatar generation information is used to generate the avatar. The generation unit 63 is realized by the CPU 301 shown in FIG. 4 executing a program.

属性情報要求部６４は、参加者に属性情報についての発話を要求するための指示を通信端末１０へ送信する機能部である。ここで、属性情報とは、ビデオ会議の参加者が属する社名および所属、ならびに参加者の役職および名前等の参加者の属性を示す情報である。属性情報要求部６４は、図４に示すＣＰＵ３０１によるプログラムの実行によって実現される。 The attribute information requesting unit 64 is a functional unit that transmits an instruction to the communication terminal 10 to request a participant to speak about attribute information. Here, the attribute information is information indicating the attributes of the participants, such as the company name and affiliation to which the participants in the video conference belong, and the positions and names of the participants. The attribute information request unit 64 is realized by executing a program by the CPU 301 shown in FIG.

対応付け部６５は、通信部５１を介して受信した参加者の音声の発話方向と、顔検出部６１により検出された顔画像、すなわち参加者とを対応付ける機能部である。なお、対応付け部６５による音声と、顔画像（すなわち参加者）とを対応付ける方法として、上記の動作に限定されるものではなく、例えば、通信部５１を介して受信した音声データおよび映像データを用いて、映像データが示す参加者の口唇動作と、音声データの音声出力のタイミングとに基づいて、音声と参加者とを対応付けるものとしてもよい。対応付け部６５は、図４に示すＣＰＵ３０１によるプログラムの実行によって実現される。 The association unit 65 is a functional unit that associates the speech direction of the participant's voice received via the communication unit 51 with the face image detected by the face detection unit 61, that is, the participant. Note that the method of associating audio and facial images (i.e., participants) by the associating unit 65 is not limited to the above-mentioned operation; The voice and the participant may be associated with each other based on the participant's lip movements indicated by the video data and the timing of the audio output of the audio data. The association unit 65 is realized by executing a program by the CPU 301 shown in FIG.

テキスト化部６６は、通信部５１を介して受信した音声データを、既知の音声認識技術によりテキスト化する機能部である。テキスト化部６６は、図４に示すＣＰＵ３０１によるプログラムの実行によって実現される。 The text conversion unit 66 is a functional unit that converts voice data received via the communication unit 51 into text using known voice recognition technology. The text converting unit 66 is realized by executing a program by the CPU 301 shown in FIG.

抽出部６７は、テキスト化部６６によりテキスト化されたテキストから、予め属性情報を示す候補として登録されている登録済みワードと一致するキーワードを抽出する機能部である。登録済みワードは、例えば記憶部５８に予め記憶されているものとすればよい。なお、例えば、後述の図２３の顔・アバター・属性対応テーブルに示すように、属性情報として社名、所属、役職、および名前が登録されるものとした場合、登録済みワードとしては、社名に関する登録済みワード、所属に関する登録済みワード、役職に関する登録済みワード、および名前に関する登録済みワードがそれぞれ用意されているものとしてもよい。また、属性情報として扱う属性は、社名、所属、役職、および名前に限定されるものではなく、他の属性（例えば、役割、年齢等）が含まれるものとしてもよい。抽出部６７は、図４に示すＣＰＵ３０１によるプログラムの実行によって実現される。 The extraction unit 67 is a functional unit that extracts, from the text produced by the text conversion unit 66, keywords that match registered words, which are registered in advance as candidates indicating attribute information. The registered words may, for example, be stored in advance in the storage unit 58. For example, when company name, affiliation, position, and name are registered as attribute information, as shown in the face/avatar/attribute correspondence table of FIG. 23 described later, separate sets of registered words may be prepared for company names, affiliations, positions, and names. The attributes handled as attribute information are not limited to company name, affiliation, position, and name, and may include other attributes (for example, role or age). The extraction unit 67 is realized by the CPU 301 shown in FIG. 4 executing a program.
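The keyword extraction described above can be sketched as a lookup of registered words, grouped by attribute, in the recognized text. This is an illustrative sketch: `REGISTERED_WORDS`, `extract_attributes`, every registered word, and the sample utterance are hypothetical, and simple substring matching stands in for whatever matching the embodiment actually uses.

```python
# Hypothetical registered words, prepared separately for each attribute.
REGISTERED_WORDS = {
    "company": ["Company A", "Company B"],
    "affiliation": ["Sales Department", "Development Department"],
    "position": ["Manager", "Director"],
    "name": ["Tanaka", "Suzuki"],
}


def extract_attributes(text: str) -> dict:
    """Return the first registered word found in the text for each attribute."""
    found = {}
    for attribute, words in REGISTERED_WORDS.items():
        for word in words:
            if word in text:  # simple substring match (an assumption)
                found[attribute] = word
                break
    return found


# Hypothetical text produced by speech recognition of a self-introduction.
utterance = "I am Tanaka, Manager of the Sales Department at Company A."
attributes = extract_attributes(utterance)
```

Attributes with no matching registered word are simply absent from the result, which leaves the corresponding table fields unregistered.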

登録部６８は、抽出部６７により抽出されたキーワードを、テキスト化部６６によりテキスト化された音声データに対応する参加者の属性情報として、顔・アバター・属性対応テーブルにおいて比較部６２により顔認識情報と一致すると判断された参加者の顔検出情報に関連付けて登録する機能部である。登録部６８は、図４に示すＣＰＵ３０１によるプログラムの実行によって実現される。 The registration unit 68 uses the keyword extracted by the extraction unit 67 as attribute information of the participant corresponding to the audio data converted into text by the text conversion unit 66, and performs face recognition using the comparison unit 62 in the face/avatar/attribute correspondence table. This is a functional unit that registers in association with face detection information of a participant that is determined to match the information. The registration unit 68 is realized by executing a program by the CPU 301 shown in FIG.

表示制御部６９は、生成部６３により生成されたアバターについて、当該アバターに対応する属性情報に基づいて表示制御を行う機能部である。具体的には、表示制御部６９は、属性情報からビデオ会議の参加者のうち、同じ会社に所属する参加者のアバターを同列となるように配置し、役職の順序に並べた表示となるように、映像データを生成する。また、表示制御部６９は、ビデオ会議の各拠点の参加者を同一の表示領域に表示させるようにしてもよく、参加者を拠点ごとに表示領域を分けて表示させるようにしてもよい。また、表示制御部６９は、アバターを役職の順序に並べると共に、または、それに代えて、役職名をアバターの近傍に表示するものとしてもよい。この際、表示制御部６９は、さらにアバターの近傍に名前、所属等を表示させるものとしてもよい。このように役職の順序に並べたり、役職名等を表示させることによって、自拠点以外の拠点の参加者（アバター）について、少なくともどの参加者がどの参加者よりも目上のものであるのか等の各参加者の立場を把握することができ、円滑に会議を進めることができる。 The display control unit 69 is a functional unit that performs display control for the avatar generated by the generation unit 63 based on attribute information corresponding to the avatar. Specifically, the display control unit 69 arranges the avatars of participants in the video conference who belong to the same company based on the attribute information so that they are in the same row, and arranges them in order of job title. Then, video data is generated. Further, the display control unit 69 may display participants at each base of the video conference in the same display area, or may display participants in separate display areas for each base. Further, the display control unit 69 may arrange the avatars in the order of their positions, or alternatively, display the position names near the avatars. At this time, the display control unit 69 may further display the name, affiliation, etc. near the avatar. In this way, by arranging the positions in order and displaying the position names, etc., it is possible to at least know which participants are superior to other participants (avatars) at bases other than one's own base. The position of each participant can be understood, and the meeting can proceed smoothly.
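The layout rule described above, one row per company with avatars ordered by position, can be sketched as a grouping followed by a sort. This is a sketch under assumptions: the `POSITION_RANK` ordering, the participant dictionaries, and the function name `arrange_avatars` are hypothetical, and actual rendering of the video data is out of scope.

```python
from collections import defaultdict

# Assumed seniority order; a lower number is placed first in the row.
POSITION_RANK = {"President": 0, "Director": 1, "Manager": 2, "Staff": 3}


def arrange_avatars(participants: list) -> dict:
    """Group participants by company and order each row by position rank."""
    rows = defaultdict(list)
    for participant in participants:
        rows[participant["company"]].append(participant)
    unknown = len(POSITION_RANK)  # unregistered positions go last
    return {
        company: [p["name"] for p in sorted(
            members, key=lambda p: POSITION_RANK.get(p["position"], unknown))]
        for company, members in rows.items()
    }


# Hypothetical attribute information for four participants.
participants = [
    {"name": "Tanaka", "company": "Company A", "position": "Manager"},
    {"name": "Suzuki", "company": "Company B", "position": "Staff"},
    {"name": "Sato", "company": "Company A", "position": "Director"},
    {"name": "Ito", "company": "Company B", "position": "President"},
]
layout = arrange_avatars(participants)
```

Because Python's sort is stable, participants holding the same position keep the order in which they were detected.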

なお、表示制御部６９は、属性情報に基づいてアバターの配置を変更して映像データを生成した場合、当該映像データの中で手前に配置されたアバターであるほど、対応する音声データの音圧レベルを上げる処理を施してもよい。これによって、アバターを表示する映像データであっても、ビデオ会議の臨場感を高めることができる。 Note that, when the display control unit 69 generates video data after changing the arrangement of avatars based on the attribute information, it may raise the sound pressure level of the corresponding audio data more the closer to the front an avatar is placed in the video data. This makes it possible to enhance the sense of presence in the video conference even with video data that displays avatars.

また、表示制御部６９は、通信部５１を介して通信端末１０から音声データを受信すると共に、発話方向の情報を受信した場合、当該発話方向に対応するアバターが音声を発話している状態を示す映像データを生成する。これによって、参加者が相手拠点の参加者のアバターが表示されたディスプレイを見ている場合、その参加者（アバター）が発話しているのかを認識することができる。ここで、アバターが音声を発話している状態とは、例えば、口を有するアバターであれば音声データの出力に合わせて口を動かしているような状態、または、音声データの出力に合わせてアバターを上下に動かすことにより発話をしているように示した状態等が挙げられる。さらに、表示制御部６９は、参加者が発話している場合だけでなく、参加者の表情、視線等をリアルタイムに、対応するアバターに反映するものとしてもよい。 In addition, when the display control unit 69 receives audio data from the communication terminal 10 via the communication unit 51 and also receives information on the speaking direction, the display control unit 69 displays the state in which the avatar corresponding to the speaking direction is speaking. Generate the video data shown. With this, when a participant is looking at a display displaying an avatar of a participant at the other site, it is possible to recognize whether that participant (avatar) is speaking. Here, the state in which the avatar is uttering audio means, for example, if the avatar has a mouth, the state in which the mouth is moving in accordance with the output of audio data, or the state in which the avatar is in a state in which the mouth is moving in accordance with the output of audio data. Examples include a state in which the user appears to be speaking by moving the button up and down. Furthermore, the display control unit 69 may reflect the participant's facial expression, line of sight, etc. on the corresponding avatar in real time, not only when the participant is speaking.

また、表示制御部６９は、映像データにアバターを含める場合、その背景の画像としては実画像の背景を用いてもよく、またはバーチャルな背景を用いるものとしてもよい。また、表示制御部６９は、背景の画像として実画像の背景を用いる場合、アバターの表示のみを明確に表示させ、実画像の背景についてはぼかすものとしてもよい。このようにバーチャルな背景を表示させたり、実画像の背景をぼかす表示によって、ビデオ会議の参加者は、自身が居る会議室等の部屋の状態を気にすることなくビデオ会議に参加することができ、背景に社外秘の情報が含まれている場合でも当該情報の流出を抑制することができる。 When including an avatar in the video data, the display control unit 69 may use either the background of the real image or a virtual background as the background image. When using the background of the real image, the display control unit 69 may display only the avatar clearly and blur the real background. By displaying a virtual background or blurring the real background in this way, participants can join the video conference without worrying about the state of the conference room or other room they are in, and leakage of confidential information can be suppressed even if such information appears in the background.

表示制御部６９は、図４に示すＣＰＵ３０１によるプログラムの実行によって実現される。このように、表示制御部６９により参加者の実画像の映像データではなく、アバターの映像データを用いることによって、実画像の場合と比べてより少ないフレームレートで転送することができるのでデータ通信量を低減することができ、ディスプレイへのスペック要求を下げることができる。 The display control unit 69 is realized by executing a program by the CPU 301 shown in FIG. In this way, by using the video data of the avatar rather than the video data of the participant's real image by the display control unit 69, it is possible to transfer the data at a lower frame rate than in the case of the real image, thereby reducing the amount of data communication. can be reduced, and the specification requirements for the display can be lowered.

なお、上述の認証部５２、状態管理部５３、端末抽出部５４、端末状態取得部５５、セッション制御部５６、顔検出部６１、比較部６２、生成部６３、属性情報要求部６４、対応付け部６５、テキスト化部６６、抽出部６７、登録部６８および表示制御部６９のうち少なくともいずれかは、ＡＳＩＣまたはＦＰＧＡ等のハードウェア回路によって実現されるものとしてもよい。 Note that the above-mentioned authentication section 52, state management section 53, terminal extraction section 54, terminal state acquisition section 55, session control section 56, face detection section 61, comparison section 62, generation section 63, attribute information request section 64, association At least one of the section 65, the text conversion section 66, the extraction section 67, the registration section 68, and the display control section 69 may be realized by a hardware circuit such as ASIC or FPGA.

また、図６に示した管理システム５０の各機能部は、機能を概念的に示したものであって、このような構成に限定されるものではない。例えば、図６に示した管理システム５０で独立した機能部として図示した複数の機能部を、１つの機能部として構成してもよい。一方、図６に示した管理システム５０の１つの機能部が有する機能を複数に分割し、複数の機能部として構成するものとしてもよい。 The functional units of the management system 50 shown in FIG. 6 represent its functions conceptually, and the configuration is not limited to the one shown. For example, a plurality of functional units illustrated as independent functional units in the management system 50 shown in FIG. 6 may be configured as a single functional unit. Conversely, the function of one functional unit of the management system 50 shown in FIG. 6 may be divided and configured as a plurality of functional units.

また、図６に示した通信端末１０が有する機能部は、例えば管理システム５０で実現される場合があってもよく、管理システム５０が有する機能部は、例えば通信端末１０で実現する場合があってもよい。また、通信端末１０および管理システム５０が有する機能部は、通信端末１０および管理システム５０以外の装置が実現する場合があってもよい。例えば、アバター生成処理における管理システム５０の顔検出部６１による参加者の顔の検出、属性情報取得処理における管理システム５０のテキスト化部６６による音声データのテキスト化等は、管理システム５０ではなく通信端末１０が有する機能であってもよい。また、管理システム５０の認証部５２による認証処理は、通信端末１０および管理システム５０以外の他の装置が行うものとしてもよい。 The functional units of the communication terminal 10 shown in FIG. 6 may instead be realized in the management system 50, for example, and the functional units of the management system 50 may instead be realized in the communication terminal 10. The functional units of the communication terminal 10 and the management system 50 may also be realized by a device other than the communication terminal 10 and the management system 50. For example, the detection of participants' faces by the face detection unit 61 of the management system 50 in the avatar generation process and the conversion of audio data into text by the text conversion unit 66 of the management system 50 in the attribute information acquisition process may be functions of the communication terminal 10 rather than of the management system 50. The authentication process by the authentication unit 52 of the management system 50 may also be performed by a device other than the communication terminal 10 and the management system 50.

（コンテンツデータおよび各種管理情報の送受信の状態） (Status of transmission and reception of content data and various management information)
図１１は、実施形態に係る通信システムにおけるコンテンツデータおよび各種管理情報を送受信するために確立されたセッションを示す図である。図１１を参照しながら、通信システム１におけるコンテンツデータおよび各種管理情報を送受信するために確立されたセッションについて説明する。 FIG. 11 is a diagram showing sessions established for transmitting and receiving content data and various types of management information in the communication system according to the embodiment. The sessions established for transmitting and receiving content data and various types of management information in the communication system 1 will be described with reference to FIG. 11.

図１１に示すように、通信システム１では、開始要求端末と宛先端末Ａと宛先端末Ｂとの間で、管理システム５０を介して、各種の管理情報を送受信するための管理情報用セッションｓｅｉが確立される。さらに、開始要求端末と宛先端末Ａと宛先端末Ｂとの間で、管理システム５０を介して、画像データおよび音声データ等を送受信するためのコンテンツデータ用セッションｓｅｄが確立される。すなわち、コンテンツデータ用セッションｓｅｄが、ビデオ会議において直接的に用いられるセッションである。なお、このセッションの概念はあくまで一例であって、例えば、画像データのセッションでは、解像度ごとに分けられるものとしてもよい。 As shown in FIG. 11, in the communication system 1, a management information session sei is established for transmitting and receiving various types of management information between the start request terminal, destination terminal A, and destination terminal B via the management system 50. Established. Further, a content data session sed for transmitting and receiving image data, audio data, etc. is established between the start request terminal, destination terminal A, and destination terminal B via the management system 50. That is, the content data session sed is a session that is directly used in the video conference. Note that this session concept is just an example; for example, an image data session may be divided by resolution.

（通信端末が通話開始する前の準備段階における各管理情報の送受信処理） (Transmission and reception of management information in the preparation stage before a communication terminal starts a call)
図１２は、実施形態に係る通信システムにおける、通信端末が通話を開始するための認証処理を含む準備段階の処理の一例を示すシーケンス図である。図１３は、宛先リストの表示例を示す図である。図１２および図１３を参照しながら、通信端末１０ａａが通話を開始する前の準備段階における各情報の送受信処理について説明する。なお、図１２では、管理情報用セッションｓｅｉによって、各種管理情報が送受信される処理が示されている。 FIG. 12 is a sequence diagram showing an example of preparation-stage processing, including authentication processing, for a communication terminal to start a call in the communication system according to the embodiment. FIG. 13 is a diagram showing a display example of the destination list. The transmission and reception of each piece of information in the preparation stage before the communication terminal 10aa starts a call will be described with reference to FIGS. 12 and 13. Note that FIG. 12 shows processing in which various types of management information are transmitted and received through the management information session sei.

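＜ステップＳ２１＞ <Step S21>
まず、通信端末１０ａａのユーザが、図２に示す入力装置１０８に対する操作により電源をオンにすると、通信端末１０ａａの操作入力受付部１２が、電源オンを受け付けて、通信端末１０ａａの電源をオンにする。 First, when the user of the communication terminal 10aa turns on the power by operating the input device 108 shown in FIG. 2, the operation input reception unit 12 of the communication terminal 10aa accepts the power-on operation and turns on the power of the communication terminal 10aa.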

＜ステップＳ２２＞ <Step S22>
そして、通信端末１０ａａの認証要求部１９は、上述の通信端末１０ａａの電源オンを契機とし、通信部１１から通信ネットワーク２を介して管理システム５０に、ログインの認証要求を示す認証要求情報、および通信端末１０ａａのＩＰアドレスを送信する。この認証要求情報には、開始要求端末としての自端末である通信端末１０ａａを識別するための端末ＩＤ、通信端末１０ａａにログインしているユーザのユーザＩＤ、およびパスワードが含まれている。端末ＩＤ、ユーザＩＤおよびパスワードは、通信端末１０ａａの記憶・読出部１７によって記憶部１８から読み出されて、通信部１１に送られたデータである。また、通信端末１０ａａから管理システム５０へ認証要求情報が送信される際は、受信側である管理システム５０は、送信側である通信端末１０ａａのＩＰアドレスを把握することができる。 Then, triggered by the power-on of the communication terminal 10aa described above, the authentication request unit 19 of the communication terminal 10aa transmits authentication request information indicating a login authentication request, together with the IP address of the communication terminal 10aa, from the communication unit 11 to the management system 50 via the communication network 2. This authentication request information includes a terminal ID for identifying the communication terminal 10aa, which is the own terminal acting as the start requesting terminal, the user ID of the user logged in to the communication terminal 10aa, and a password. The terminal ID, user ID, and password are data read from the storage unit 18 by the storage/reading unit 17 of the communication terminal 10aa and sent to the communication unit 11. When the authentication request information is transmitted from the communication terminal 10aa to the management system 50, the management system 50 on the receiving side can identify the IP address of the communication terminal 10aa on the sending side.

＜ステップＳ２３＞ <Step S23>
次に、管理システム５０の認証部５２は、通信部５１を介して受信した認証要求情報に含まれているユーザＩＤおよびパスワードを検索キーとして、認証管理テーブル（図７参照）を検索し、認証管理テーブルに同一のユーザＩＤおよびパスワードが管理されているかを判断することによってユーザ認証を行う。 Next, the authentication unit 52 of the management system 50 searches the authentication management table (see FIG. 7) using the user ID and password included in the authentication request information received via the communication unit 51 as a search key, and performs user authentication by determining whether the same user ID and password are managed in the authentication management table.

＜ステップＳ２４－１＞ <Step S24-1>
認証部５２によって、正当な利用権限を有する通信端末１０からのログインの認証要求であると判断された場合には、管理システム５０の状態管理部５３は、端末管理テーブル（図８参照）に、通信端末１０ａａの端末ＩＤおよび端末名で示されるレコード毎に、ユーザＩＤおよび通信端末１０ａａのＩＰアドレスを関連付けて記憶する。これにより、端末管理テーブルには、通信端末１０ａａの端末ＩＤ「１０ａａ」に、ユーザＩＤ「Ａ＿１０ａａ」およびＩＰアドレス「１．２．１．３」が関連付けて管理されることになる。 If the authentication unit 52 determines that the login authentication request comes from a communication terminal 10 with legitimate usage authority, the state management unit 53 of the management system 50 stores the user ID and the IP address of the communication terminal 10aa in the terminal management table (see FIG. 8), in association with the record identified by the terminal ID and terminal name of the communication terminal 10aa. As a result, the terminal management table manages the user ID "A_10aa" and the IP address "1.2.1.3" in association with the terminal ID "10aa" of the communication terminal 10aa.

＜ステップＳ２４－２＞ <Step S24-2>
続いて、状態管理部５３は、通信端末１０ａａの稼動状態「オンライン」および通信状態「Ｎｏｎｅ」を設定し、端末管理テーブルに、通信端末１０ａａの端末ＩＤおよび端末名で示されるレコードに、稼動状態および通信状態を関連付けて記憶する。これにより、端末管理テーブルには、通信端末１０ａａの端末ＩＤ「１０ａａ」に、稼動状態「オンライン」および通信状態「Ｎｏｎｅ」が関連付けて管理されることになる。 Subsequently, the state management unit 53 sets the operating state of the communication terminal 10aa to "online" and its communication state to "None", and stores these states in the terminal management table in association with the record identified by the terminal ID and terminal name of the communication terminal 10aa. As a result, the terminal management table manages the operating state "online" and the communication state "None" in association with the terminal ID "10aa" of the communication terminal 10aa.

＜ステップＳ２５＞ <Step S25>
そして、管理システム５０の通信部５１は、認証部５２によって得られたユーザ認証の結果が示された認証結果情報を、通信ネットワーク２を介して、認証要求情報を送信してきた開始要求端末（通信端末１０ａａ）に送信する。本実施形態では、通信端末１０ａａが、認証部５２によって正当な利用権限を有するユーザが利用する端末であるとユーザ認証されたものとして、以下続けて説明する。 Then, the communication unit 51 of the management system 50 transmits authentication result information indicating the result of the user authentication obtained by the authentication unit 52, via the communication network 2, to the start requesting terminal (communication terminal 10aa) that sent the authentication request information. In the present embodiment, the following description assumes that the authentication unit 52 has authenticated the communication terminal 10aa as a terminal used by a user with legitimate usage authority.

＜ステップＳ２６＞ <Step S26>
通信端末１０ａａにおいて、正当な利用権限を有するユーザが利用する端末であるとユーザ認証された結果が示された認証結果情報を受信すると、通信部１１は、通信ネットワーク２を介して管理システム５０へ、宛先リストを要求する旨を示す宛先リスト要求情報を送信する。これにより、管理システム５０の通信部５１は、宛先リスト要求情報を受信する。 When the communication terminal 10aa receives authentication result information indicating that it has been authenticated as a terminal used by a user with legitimate usage authority, the communication unit 11 transmits destination list request information, indicating a request for the destination list, to the management system 50 via the communication network 2. The communication unit 51 of the management system 50 thereby receives the destination list request information.

＜ステップＳ２７＞
次に、管理システム５０の端末抽出部５４は、開始要求端末（通信端末１０ａａ）の端末ＩＤ「１０ａａ」を検索キーとして、グループ管理テーブル（図９参照）を検索し、開始要求端末が通話することができる、すなわち、開始要求端末と同じグループ（ここでは、グループＩＤ「Ｇ００１」のグループ）に属する宛先端末の候補の端末ＩＤを抽出する。また、端末抽出部５４は、抽出した端末ＩＤを検索キーとして、端末管理テーブルを検索し、この端末ＩＤに対応する端末名、すなわち宛先端末の候補の端末名を抽出する。ここでは、開始要求端末（通信端末１０ａａ）の端末ＩＤ「１０ａａ」に対応する宛先端末の候補（通信端末１０ａｂ、１０ａｃ、１０ｄｂ）のそれぞれの端末ＩＤ（「１０ａｂ」、「１０ａｃ」、「１０ｄｂ」）と、これらに対応する端末名（「日本東京事業所ＡＢ端末」、「日本東京事業所ＡＣ端末」、「アメリカワシントン事業所ＤＢ端末」）が抽出される。 <Step S27>
Next, the terminal extraction unit 54 of the management system 50 searches the group management table (see FIG. 9) using the terminal ID "10aa" of the start requesting terminal (communication terminal 10aa) as a search key, and extracts the terminal IDs of the destination terminal candidates with which the start requesting terminal can make a call, that is, the terminals belonging to the same group as the start requesting terminal (here, the group with group ID "G001"). The terminal extraction unit 54 then searches the terminal management table using each extracted terminal ID as a search key, and extracts the terminal name corresponding to that terminal ID, that is, the terminal name of the destination terminal candidate. Here, the terminal IDs ("10ab", "10ac", "10db") of the destination terminal candidates (communication terminals 10ab, 10ac, 10db) corresponding to the terminal ID "10aa" of the start requesting terminal (communication terminal 10aa) are extracted, together with the corresponding terminal names ("Japan Tokyo Office AB Terminal", "Japan Tokyo Office AC Terminal", "USA Washington Office DB Terminal").
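
As a rough illustration of step S27, the two lookups can be written as below. The table layouts and the function name `extract_destination_candidates` are assumptions for this sketch; the group ID, terminal IDs, and terminal names are those given in the description.

```python
# Hypothetical stand-ins for the group management table (FIG. 9)
# and the terminal-name column of the terminal management table (FIG. 8).
group_table = {"G001": ["10aa", "10ab", "10ac", "10db"]}
terminal_names = {
    "10ab": "Japan Tokyo Office AB Terminal",
    "10ac": "Japan Tokyo Office AC Terminal",
    "10db": "USA Washington Office DB Terminal",
}


def extract_destination_candidates(requester_id, groups, names):
    """Return (terminal ID, terminal name) pairs sharing a group with the requester."""
    candidate_ids = []
    for members in groups.values():
        if requester_id not in members:
            continue
        for terminal_id in members:
            # Skip the requester itself and avoid duplicates across groups.
            if terminal_id != requester_id and terminal_id not in candidate_ids:
                candidate_ids.append(terminal_id)
    # Second lookup: resolve each candidate ID to its terminal name.
    return [(terminal_id, names[terminal_id]) for terminal_id in candidate_ids]


candidates = extract_destination_candidates("10aa", group_table, terminal_names)
```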

＜ステップＳ２８、Ｓ２９＞
次に、管理システム５０の通信部５１は、端末抽出部５４によって抽出された宛先端末の候補の端末ＩＤおよび端末名を含む宛先リスト情報を、開始要求端末（通信端末１０ａａ）に送信する。これにより、開始要求端末（通信端末１０ａａ）では、通信部１１が宛先リスト情報を受信し、記憶・読出部１７が記憶部１８へ宛先リスト情報を記憶する。 <Steps S28, S29>
Next, the communication unit 51 of the management system 50 transmits destination list information including the terminal ID and terminal name of the destination terminal candidates extracted by the terminal extraction unit 54 to the start requesting terminal (communication terminal 10aa). As a result, in the start requesting terminal (communication terminal 10aa), the communication section 11 receives the destination list information, and the storage/reading section 17 stores the destination list information in the storage section 18.

このように、本実施形態では、各通信端末１０で宛先リスト情報を管理するのではなく、管理システム５０がすべての通信端末１０の宛先リスト情報を一元管理している。これによって、通信システム１に新たな通信端末１０が含まれるようになったり、既に含まれている通信端末１０が除外されたりする場合でも、管理システム５０側で一括して対応するため、各通信端末１０側で宛先リスト情報の変更を行う手間を省くことができる。 In this manner, in this embodiment, the management system 50 centrally manages the destination list information of all communication terminals 10, instead of each communication terminal 10 managing its own destination list information. As a result, even when a new communication terminal 10 is added to the communication system 1 or a communication terminal 10 already included in it is removed, the management system 50 handles the change collectively, which saves the effort of changing the destination list information on each communication terminal 10.

＜ステップＳ３０＞
また、管理システム５０の端末状態取得部５５は、端末抽出部５４によって抽出された宛先端末の候補の端末ＩＤ（「１０ａｂ」、「１０ａｃ」、「１０ｄｂ」）を検索キーとして、端末管理テーブルを検索する。そして、端末状態取得部５５は、宛先端末の候補の端末ＩＤ毎に、対応する稼動状態および通信状態を読み出すことにより、宛先端末の候補（通信端末１０ａｂ、１０ａｃ、１０ｄｂ）それぞれの稼動状態および通信状態を取得する。 <Step S30>
In addition, the terminal state acquisition unit 55 of the management system 50 searches the terminal management table using the terminal IDs ("10ab", "10ac", "10db") of the destination terminal candidates extracted by the terminal extraction unit 54 as search keys. The terminal state acquisition unit 55 then reads out the corresponding operating state and communication state for each terminal ID, thereby acquiring the operating state and communication state of each destination terminal candidate (communication terminals 10ab, 10ac, 10db).

＜ステップＳ３１＞
次に、通信部５１は、ステップＳ３０で使用された検索キーである端末ＩＤと、対応する宛先端末の候補の稼動状態および通信状態とを含む状態情報を、通信ネットワーク２を介して開始要求端末に送信する。具体的には、通信部５１は、例えば、検索キーとしての端末ＩＤ「１０ａｂ」と、宛先端末の候補（通信端末１０ａｂ）の稼動状態「オフライン」とを含む状態情報を、開始要求端末（通信端末１０ａａ）に送信する。なお、稼動状態が「オフライン」の場合には、状態情報には、通信状態は含まれない。また、通信部５１は、端末ＩＤ「１０ａｃ」と、宛先端末の候補（通信端末１０ａｃ）の稼動状態「オンライン」と、通信状態「Ｎｏｎｅ」とを含む状態情報等、宛先端末の候補すべてに対する状態情報それぞれを開始要求端末（通信端末１０ａａ）へ送信する。 <Step S31>
Next, the communication unit 51 transmits status information, which includes the terminal ID used as the search key in step S30 and the operating state and communication state of the corresponding destination terminal candidate, to the start requesting terminal via the communication network 2. Specifically, the communication unit 51 transmits, for example, status information including the terminal ID "10ab" as the search key and the operating state "offline" of the destination terminal candidate (communication terminal 10ab) to the start requesting terminal (communication terminal 10aa). Note that when the operating state is "offline", the status information does not include the communication state. The communication unit 51 likewise transmits status information for every destination terminal candidate to the start requesting terminal (communication terminal 10aa), such as status information including the terminal ID "10ac", the operating state "online" of the destination terminal candidate (communication terminal 10ac), and the communication state "None".
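
The rule that offline terminals carry no communication state can be illustrated with a small sketch. The record layout and the function name `build_state_info` are assumptions for this example; the states shown for the three terminals are those stated in the description.

```python
# Hypothetical excerpt of the terminal management table.
terminal_table = {
    "10ab": {"operating_state": "offline", "communication_state": None},
    "10ac": {"operating_state": "online", "communication_state": "None"},
    "10db": {"operating_state": "online", "communication_state": "None"},
}


def build_state_info(table, terminal_id):
    """Build the status information sent for one destination terminal candidate."""
    record = table[terminal_id]
    info = {
        "terminal_id": terminal_id,
        "operating_state": record["operating_state"],
    }
    # The communication state is included only for terminals that are not offline.
    if record["operating_state"] != "offline":
        info["communication_state"] = record["communication_state"]
    return info


state_infos = [build_state_info(terminal_table, tid) for tid in ("10ab", "10ac", "10db")]
```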

＜ステップＳ３２＞
次に、開始要求端末（通信端末１０ａａ）の記憶・読出部１７は、順次、管理システム５０から受信した状態情報を記憶部１８に記憶する。したがって、開始要求端末（通信端末１０ａａ）は、宛先端末の候補の状態情報を受信することで、通話することができる宛先端末の候補の現時点のそれぞれの稼動状態および通信状態を取得することができる。 <Step S32>
Next, the storage/reading unit 17 of the start requesting terminal (communication terminal 10aa) sequentially stores the status information received from the management system 50 in the storage unit 18. By receiving the status information of the destination terminal candidates, the start requesting terminal (communication terminal 10aa) can thus obtain the current operating state and communication state of each destination terminal candidate with which it can make a call.

＜ステップＳ３３＞
次に、開始要求端末（通信端末１０ａａ）の表示制御部１４は、記憶部１８に記憶されている宛先リスト情報、および宛先端末の候補の状態情報に基づいて、宛先端末の候補の稼動状態および通信状態を反映させた宛先リストを作成する。そして、表示制御部１４は、図１に示すディスプレイ１２０ａａに、所定のタイミングで図１３に示すような宛先リストを表示する。 <Step S33>
Next, the display control unit 14 of the start requesting terminal (communication terminal 10aa) creates a destination list that reflects the operating state and communication state of the destination terminal candidates, based on the destination list information and the status information of the destination terminal candidates stored in the storage unit 18. The display control unit 14 then displays a destination list such as the one shown in FIG. 13 on the display 120aa shown in FIG. 1 at a predetermined timing.

図１３に示すように、ディスプレイ１２０ａａに表示される宛先リストは、宛先端末の候補の端末ＩＤ１１００－２と、端末名１１００－３と、状態情報を反映させたアイコン１１００－４ａ、１１００－４ｂ等を含む。アイコンとしては、オフラインで通話できないことを示すオフラインアイコン１１００－４ａと、オンラインで通話可能であることを示す通話可能アイコン１１００－４ｂと、がある。なお、オンラインで通話中であることを示す通話中アイコン等があってもよい。 As shown in FIG. 13, the destination list displayed on the display 120aa includes the terminal IDs 1100-2 and terminal names 1100-3 of the destination terminal candidates, and icons 1100-4a, 1100-4b, and so on that reflect the status information. The icons include an offline icon 1100-4a, which indicates that the terminal is offline and cannot take a call, and a call enabled icon 1100-4b, which indicates that the terminal is online and available for a call. There may also be a call-in-progress icon or the like, which indicates that the terminal is online and in a call.

表示制御部１４は、宛先端末の候補の稼動状態が「オンライン」であり、通信状態が「Ｎｏｎｅ」である場合には、この宛先端末の候補に対し、通話可能アイコン１１００－４ｂを割り当てる。また、表示制御部１４は、宛先端末の候補の稼動状態が「オフライン」である場合には、この宛先端末の候補に対し、オフラインアイコン１１００－４ａを割り当てる。なお、表示制御部１４は、宛先端末の候補の稼動状態が「オンライン」であり、通信状態が「Ｎｏｎｅ」以外である場合には、この宛先端末の候補に対し、通話中アイコンを割り当てればよい。 If the operating state of a destination terminal candidate is "online" and its communication state is "None", the display control unit 14 assigns the call enabled icon 1100-4b to that candidate. If the operating state of a destination terminal candidate is "offline", the display control unit 14 assigns the offline icon 1100-4a to that candidate. If the operating state of a destination terminal candidate is "online" and its communication state is other than "None", the display control unit 14 may assign the call-in-progress icon to that candidate.
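
The icon assignment rules above amount to a three-way decision, sketched here. The function name `choose_icon` and the returned string labels are assumptions for illustration; only the reference numerals 1100-4a and 1100-4b come from the description.

```python
def choose_icon(operating_state, communication_state=None):
    """Pick the destination list icon for one destination terminal candidate."""
    if operating_state == "offline":
        return "offline_icon"  # icon 1100-4a
    if communication_state == "None":
        return "call_enabled_icon"  # icon 1100-4b
    # Online with any other communication state: the terminal is in a call.
    return "call_in_progress_icon"
```

For example, a candidate that is online with the communication state "Calling" would receive the call-in-progress icon.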

なお、他の通信端末１０でも、ステップＳ２１と同様に、ユーザが図２に示す入力装置１０８を介して電源をオンにすると、当該通信端末１０の操作入力受付部１２が、電源オンを受け付けて、上述のステップＳ２２～Ｓ３３の処理と同様の処理が行われる。 Note that in the other communication terminals 10 as well, as in step S21, when the user turns on the power via the input device 108 shown in FIG. 2, the operation input reception unit 12 of that communication terminal 10 accepts the power-on, and processing similar to steps S22 to S33 described above is performed.

（通信端末が他の通信端末との通信の開始を要求する場合の処理）
図１４は、実施形態に係る通信システムにおける通話の開始を要求する処理の一例を示すシーケンス図である。図１４を参照しながら、通信端末１０が他の通信端末１０との通信の開始を要求する場合の処理を説明する。なお、図１４では、すべて管理情報用セッションｓｅｉによって、各種管理情報が送受信される処理が示されている。 (Processing when a communication terminal requests the start of communication with another communication terminal)
FIG. 14 is a sequence diagram illustrating an example of a process for requesting the start of a call in the communication system according to the embodiment. Processing when the communication terminal 10 requests the start of communication with another communication terminal 10 will be described with reference to FIG. 14. Note that FIG. 14 shows processing in which various types of management information are transmitted and received entirely by the management information session sei.

図１４においては、図１２においてログインが許可された通信端末１０ａａが、開始要求情報を送信する例、すなわち、通信端末１０ａａが開始要求端末として動作する例について説明する。開始要求端末としての通信端末１０ａａは、図１２のステップＳ３１で受信した宛先端末の候補の状態情報に基づいて、宛先端末の候補のうち、稼動状態が「オンライン」であり、通信状態が「Ｎｏｎｅ」である通信端末１０のうち少なくとも１つの通信端末１０と通話を行うことができる。例えば、開始要求端末（通信端末１０ａａ）は、宛先端末の候補のうち、図１２のステップＳ３１によって受信した状態情報により、稼動状態が「オンライン」であり、通信状態が「Ｎｏｎｅ」である通信端末１０ｄｂと通話を行うことができる。そこで、以下では、開始要求端末（通信端末１０ａａ）のユーザが、宛先端末（通信端末１０ｄｂ）と通話を開始することを選択した場合について説明する。 FIG. 14 describes an example in which the communication terminal 10aa, which was permitted to log in in FIG. 12, transmits start request information, that is, an example in which the communication terminal 10aa operates as the start requesting terminal. Based on the status information of the destination terminal candidates received in step S31 of FIG. 12, the communication terminal 10aa as the start requesting terminal can make a call with at least one of the destination terminal candidates whose operating state is "online" and whose communication state is "None". For example, the start requesting terminal (communication terminal 10aa) can make a call with the communication terminal 10db, whose operating state is "online" and whose communication state is "None" according to the status information received in step S31 of FIG. 12. The following therefore describes the case where the user of the start requesting terminal (communication terminal 10aa) chooses to start a call with the destination terminal (communication terminal 10db).

なお、図１４に示す処理が開始される前の状態において、開始要求端末としての通信端末１０ａａのディスプレイ１２０ａａには、図１３に示す宛先リストが表示されているものとする。そして、開始要求端末のユーザは、宛先リストから所望の通話相手（宛先端末）を選択することができる。 It is assumed that before the process shown in FIG. 14 is started, the destination list shown in FIG. 13 is displayed on the display 120aa of the communication terminal 10aa as the start requesting terminal. The user of the start requesting terminal can then select a desired communication partner (destination terminal) from the destination list.

＜ステップＳ４１＞
まず、開始要求端末のユーザは、通信端末１０ａａの入力装置１０８を操作して宛先端末（通信端末１０ｄｂ）を選択する。 <Step S41>
First, the user of the start request terminal operates the input device 108 of the communication terminal 10aa to select a destination terminal (communication terminal 10db).

＜ステップＳ４２＞
すると、通信端末１０ａａの通信部１１は、開始要求端末（通信端末１０ａａ）の端末ＩＤ「１０ａａ」、および宛先端末（通信端末１０ｄｂ）の端末ＩＤ「１０ｄｂ」を含む開始要求情報を、開始要求端末のＩＰアドレスと共に管理システム５０へ送信する。これにより、管理システム５０の通信部５１は、開始要求情報を受信すると共に、送信元である開始要求端末（通信端末１０ａａ）のＩＰアドレス「１．２．１．３」を把握することになる。 <Step S42>
Then, the communication unit 11 of the communication terminal 10aa transmits start request information, which includes the terminal ID "10aa" of the start requesting terminal (communication terminal 10aa) and the terminal ID "10db" of the destination terminal (communication terminal 10db), to the management system 50 together with the IP address of the start requesting terminal. As a result, the communication unit 51 of the management system 50 receives the start request information and learns the IP address "1.2.1.3" of the start requesting terminal (communication terminal 10aa) that sent it.

＜ステップＳ４３＞
そして、状態管理部５３は、開始要求情報に含まれる開始要求端末（通信端末１０ａａ）の端末ＩＤ「１０ａａ」および宛先端末（通信端末１０ｄｂ）の端末ＩＤ「１０ｄｂ」に基づき、端末管理ＤＢ５００２の端末管理テーブルにおいて、端末ＩＤ「１０ａａ」および端末ＩＤ「１０ｄｂ」がそれぞれ含まれるレコードの通信状態のフィールド部分を変更する。具体的には、状態管理部５３は、端末管理テーブルの端末ＩＤ「１０ａａ」が含まれるレコードの通信状態を「Ｃａｌｌｉｎｇ」に変更する。同様に、状態管理部５３は、端末管理テーブルの端末ＩＤ「１０ｄｂ」が含まれるレコードの通信状態を「Ｒｉｎｇｉｎｇ」に変更する。 <Step S43>
Then, based on the terminal ID "10aa" of the start requesting terminal (communication terminal 10aa) and the terminal ID "10db" of the destination terminal (communication terminal 10db) included in the start request information, the state management unit 53 changes the communication state field of the records containing the terminal ID "10aa" and the terminal ID "10db" in the terminal management table of the terminal management DB 5002. Specifically, the state management unit 53 changes the communication state of the record containing the terminal ID "10aa" in the terminal management table to "Calling". Similarly, the state management unit 53 changes the communication state of the record containing the terminal ID "10db" in the terminal management table to "Ringing".
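
The state change of step S43 can be sketched as a pair of field updates. The table layout and the function name `mark_call_requested` are assumptions for this example; the state names "Calling" and "Ringing" are those used in the description.

```python
def mark_call_requested(table, requester_id, destination_id):
    """Update the communication states when a start request is received."""
    table[requester_id]["communication_state"] = "Calling"
    table[destination_id]["communication_state"] = "Ringing"


# Hypothetical excerpt of the terminal management table before the request.
terminal_table = {
    "10aa": {"operating_state": "online", "communication_state": "None"},
    "10db": {"operating_state": "online", "communication_state": "None"},
}
mark_call_requested(terminal_table, "10aa", "10db")
```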

＜ステップＳ４４＞
そして、管理システム５０のセッション制御部５６は、開始要求端末（通信端末１０ａａ）によって要求された宛先端末との間の通信を実行するためのセッション（コンテンツデータ用セッションｓｅｄ）を識別するためのセッションＩＤ「ｓｅ１」を生成する。セッション制御部５６は、セッションＩＤを生成すると、セッションＩＤ「ｓｅ１」をセッション管理テーブル（図１０参照）に記憶する。 <Step S44>
Then, the session control unit 56 of the management system 50 generates the session ID "se1", which identifies the session (content data session sed) used for the communication with the destination terminal requested by the start requesting terminal (communication terminal 10aa). After generating the session ID, the session control unit 56 stores the session ID "se1" in the session management table (see FIG. 10).

＜ステップＳ４５＞
続いて、セッション制御部５６は、セッション管理テーブルにおいて、セッションＩＤ「ｓｅ１」が含まれるレコードの開始要求端末の端末ＩＤおよび宛先端末の端末ＩＤのフィールド部分に、それぞれ開始要求端末の端末ＩＤ「１０ａａ」、宛先端末の端末ＩＤ「１０ｄｂ」を記憶して管理する。 <Step S45>
Next, in the session management table, the session control unit 56 stores the terminal ID "10aa" of the start requesting terminal and the terminal ID "10db" of the destination terminal in the corresponding fields of the record containing the session ID "se1", and manages them there.
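
Steps S44 and S45 can be pictured with the following sketch. The description does not say how session IDs are generated, so the sequential "se" plus counter scheme, the class name `SessionControl`, and the dictionary layout are all assumptions made for illustration.

```python
import itertools


class SessionControl:
    """Hypothetical stand-in for the session control unit 56."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.session_table = {}  # stand-in for the session management table (FIG. 10)

    def create_session(self, requester_id, destination_id):
        # Step S44: generate a session ID and store it in the table.
        session_id = f"se{next(self._counter)}"
        self.session_table[session_id] = {}
        # Step S45: fill in the two terminal ID fields of that record.
        self.session_table[session_id]["requester_id"] = requester_id
        self.session_table[session_id]["destination_id"] = destination_id
        return session_id


control = SessionControl()
session_id = control.create_session("10aa", "10db")
```

With a fresh counter, the first session created receives the ID "se1", matching the example in the description.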

＜ステップＳ４６＞
次に、通信部５１は、通信ネットワーク２を介して、開始要求端末（通信端末１０ａａ）へ、セッション制御部５６により生成されたセッションＩＤを送信する。 <Step S46>
Next, the communication unit 51 transmits the session ID generated by the session control unit 56 to the start requesting terminal (communication terminal 10aa) via the communication network 2.

＜ステップＳ４７＞
また、通信部５１は、開始要求端末の端末ＩＤ「１０ａａ」と、セッションＩＤ「ｓｅ１」とを含む開始要求情報と、管理システム５０のＩＰアドレスとを宛先端末へ送信する。これにより、宛先端末（通信端末１０ｄｂ）は、開始要求情報を受信すると共に、管理システム５０のＩＰアドレス「１．１．１．２」を把握することになる。 <Step S47>
The communication unit 51 also transmits start request information including the terminal ID "10aa" of the start request terminal and the session ID "se1" and the IP address of the management system 50 to the destination terminal. As a result, the destination terminal (communication terminal 10db) receives the start request information and also learns the IP address "1.1.1.2" of the management system 50.

（宛先端末が開始要求端末との間で通信開始を許可する応答を受け付けた場合の処理）
図１５は、実施形態に係る通信システムにおける通話の開始の要求を許可する処理の一例を示すシーケンス図である。図１６は、開始要求受付画面の表示例を示す図である。図１５および図１６を参照しながら、開始要求情報を受信した宛先端末のユーザが、入力装置１０８を操作することにより、開始要求端末との間の通信の開始（セッションの確立）を許可する旨の応答が受け付けられた場合の処理について説明する。 (Processing when the destination terminal receives a response permitting the start of communication with the start requesting terminal)
FIG. 15 is a sequence diagram illustrating an example of processing for permitting a request to start a call in the communication system according to the embodiment. FIG. 16 is a diagram illustrating a display example of a start request reception screen. Referring to FIGS. 15 and 16, the user of the destination terminal that has received the start request information operates the input device 108 to indicate permission to start communication (establishment of a session) with the start request terminal. The processing when the response is accepted will be explained.

＜ステップＳ５１＞
図１５に示す送受信処理の開始時には、宛先端末（通信端末１０ｄｂ）のディスプレイ２１４には、開始要求情報を受信したことを示す開始要求受付画面１２００－１（図１６参照）が表示されている。 <Step S51>
At the start of the transmission/reception process shown in FIG. 15, a start request reception screen 1200-1 (see FIG. 16) indicating that start request information has been received is displayed on the display 214 of the destination terminal (communication terminal 10db).

図１６に示す開始要求受付画面１２００－１は、開始要求情報を受信した旨を示し、開始要求端末との間の通信の開始（セッションの確立）を許可するか否かを指定するためのユーザインターフェースである。ユーザは、開始要求受付画面１２００－１を閲覧することにより、開始要求を受信したことを確認することができる。開始要求受付画面１２００－１は、セッションの確立を許可するための「はい」ボタン１２００－２と、セッションの確立を許可しない選択をするための「いいえ」ボタン１２００－３と、を含む。 The start request reception screen 1200-1 shown in FIG. 16 indicates that start request information has been received, and allows the user to specify whether or not to permit the start of communication (establishment of a session) with the start request terminal. It is an interface. The user can confirm that the start request has been received by viewing the start request reception screen 1200-1. The start request acceptance screen 1200-1 includes a "Yes" button 1200-2 for allowing the establishment of a session, and a "No" button 1200-3 for selecting not to permit the establishment of the session.

＜ステップＳ５２＞
宛先端末（通信端末１０ｄｂ）の入力装置１０８の操作によって「はい」ボタン１２００－２が押下された場合、操作入力受付部１２は、開始要求端末（通信端末１０ａａ）との間の通信の開始（セッション確立）を許可する旨の応答を受け付ける。 <Step S52>
When the "Yes" button 1200-2 is pressed by operating the input device 108 of the destination terminal (communication terminal 10db), the operation input reception unit 12 accepts a response permitting the start of communication (establishment of a session) with the start requesting terminal (communication terminal 10aa).

＜ステップＳ５３＞
次に、宛先端末の通信部１１は、宛先端末の端末ＩＤ「１０ｄｂ」、開始要求端末の端末ＩＤ「１０ａａ」、およびセッションＩＤ「ｓｅ１」が含まれる開始応答情報を、管理システム５０へ送信する。 <Step S53>
Next, the communication unit 11 of the destination terminal transmits start response information including the terminal ID "10db" of the destination terminal, the terminal ID "10aa" of the start requesting terminal, and the session ID "se1" to the management system 50.

＜ステップＳ５４＞
管理システム５０の通信部５１が開始応答情報を受信すると、状態管理部５３は、開始応答情報に含まれる開始要求端末の端末ＩＤ「１０ａａ」および宛先端末の端末ＩＤ「１０ｄｂ」に基づき、端末管理テーブルにおいて、端末ＩＤ「１０ａａ」および端末ＩＤ「１０ｄｂ」がそれぞれ含まれるレコードの通信状態のフィールド部分を変更する。具体的には、状態管理部５３は、端末管理テーブルの端末ＩＤ「１０ａａ」が含まれるレコードの通信状態を「Ａｃｃｅｐｔｅｄ」に変更する。同様に、状態管理部５３は、端末管理テーブルの端末ＩＤ「１０ｄｂ」が含まれるレコードの通信状態も「Ａｃｃｅｐｔｅｄ」に変更する。 <Step S54>
When the communication unit 51 of the management system 50 receives the start response information, the state management unit 53 changes, based on the terminal ID "10aa" of the start requesting terminal and the terminal ID "10db" of the destination terminal included in the start response information, the communication state field of the records containing the terminal ID "10aa" and the terminal ID "10db" in the terminal management table. Specifically, the state management unit 53 changes the communication state of the record containing the terminal ID "10aa" in the terminal management table to "Accepted". Similarly, the state management unit 53 also changes the communication state of the record containing the terminal ID "10db" in the terminal management table to "Accepted".
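
A possible shape for the handling of the start response in step S54 is sketched below. The check of the session ID against the session management table is an added assumption (the description only states that both communication states become "Accepted"), and the function name `handle_start_response` and the data layouts are likewise hypothetical.

```python
def handle_start_response(terminal_table, session_table, response):
    """Mark both terminals as "Accepted" for a start response; return True on success."""
    session = session_table.get(response["session_id"])
    expected = {
        "requester_id": response["requester_id"],
        "destination_id": response["destination_id"],
    }
    # Assumed safeguard: ignore responses that do not match a known session.
    if session != expected:
        return False
    for terminal_id in (response["requester_id"], response["destination_id"]):
        terminal_table[terminal_id]["communication_state"] = "Accepted"
    return True


terminal_table = {
    "10aa": {"communication_state": "Calling"},
    "10db": {"communication_state": "Ringing"},
}
session_table = {"se1": {"requester_id": "10aa", "destination_id": "10db"}}
response = {"destination_id": "10db", "requester_id": "10aa", "session_id": "se1"}
accepted = handle_start_response(terminal_table, session_table, response)
```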

＜ステップＳ５５＞
次に、通信部５１は、宛先端末（通信端末１０ｄｂ）の端末ＩＤ「１０ｄｂ」、およびセッションＩＤ「ｓｅ１」が含まれる開始応答情報を開始要求端末（通信端末１０ａａ）へ送信する。 <Step S55>
Next, the communication unit 51 transmits start response information including the terminal ID "10db" of the destination terminal (communication terminal 10db) and the session ID "se1" to the start request terminal (communication terminal 10aa).

＜ステップＳ５６＞
開始要求端末は、この開始応答情報を受信すると、通信部１１によってセッションＩＤ「ｓｅ１」を管理システム５０に送信することにより、セッションを確立させる。 <Step S56>
Upon receiving this start response information, the start request terminal establishes a session by transmitting the session ID "se1" to the management system 50 using the communication unit 11.

＜ステップＳ５７＞
一方、宛先端末は、通信部１１によってセッションＩＤ「ｓｅ１」を管理システム５０に送信することにより、セッションを確立させる。 <Step S57>
On the other hand, the destination terminal establishes a session by transmitting the session ID "se1" to the management system 50 using the communication unit 11.

なお、上述の図１２における同じグループの通信端末１０を識別する端末ＩＤを抽出する動作、ならびに図１４および図１５に示す通信端末１０ａａと通信端末１０ｄｂとの間でセッションを確立させるための動作においては、端末ＩＤを利用した動作ではなく、各通信端末にログインしているユーザのユーザＩＤ（例えばメールアドレス等）を利用した動作であってもよい。 Note that in the operation of extracting the terminal ID for identifying the communication terminals 10 of the same group in FIG. 12 described above and the operation of establishing a session between the communication terminal 10aa and the communication terminal 10db shown in FIGS. may be an operation using the user ID (for example, e-mail address, etc.) of the user who is logged in to each communication terminal, instead of using the terminal ID.

（アバター生成処理）
図１７は、実施形態に係る通信システムのアバター生成処理の流れの一例を示すフローチャートである。図１８および図１９は、アバターの表示動作を説明する図である。図１７～図１９を参照しながら、管理システム５０においてビデオ会議に参加する参加者のアバターを生成する処理の流れについて説明する。 (Avatar generation process)
FIG. 17 is a flowchart illustrating an example of the flow of avatar generation processing in the communication system according to the embodiment. FIGS. 18 and 19 are diagrams illustrating the display operation of the avatar. The flow of processing for generating avatars of participants participating in a video conference in the management system 50 will be described with reference to FIGS. 17 to 19.

＜ステップＳ６１＞
管理システム５０の顔検出部６１は、通信部５１により受信されたビデオ会議に参加する各通信端末１０から受信した映像データから、写り込んでいる参加者の顔画像を検出して、顔の特徴を数値化した特徴値を顔検出情報として取得する。そして、顔検出部６１は、映像データにおいて検出した参加者の顔の画像の中心座標を算出する。そして、ステップＳ６２へ移行する。 <Step S61>
The face detection unit 61 of the management system 50 detects the face images of the participants appearing in the video data that the communication unit 51 received from each communication terminal 10 participating in the video conference, and acquires feature values that quantify the facial features as face detection information. The face detection unit 61 then calculates the center coordinates of each participant's face image detected in the video data. Then, the process moves to step S62.

＜ステップＳ６２＞
管理システム５０の比較部６２は、記憶部５８に記憶されている後述の顔・アバター・属性対応テーブル（後述する図２３参照）を参照し、顔検出部６１により取得された顔検出情報が、顔・アバター・属性対応テーブルに登録されている顔認識情報と一致するか否か比較する。そして、ステップＳ６３へ移行する。 <Step S62>
The comparison unit 62 of the management system 50 refers to the face/avatar/attribute correspondence table stored in the storage unit 58 (see FIG. 23, described later), and compares the face detection information acquired by the face detection unit 61 with the face recognition information registered in the face/avatar/attribute correspondence table to determine whether they match. Then, the process moves to step S63.

＜ステップＳ６３＞
比較部６２による顔検出情報と顔認識情報との比較の結果、一致する場合（ステップＳ６３：Ｙｅｓ）、ステップＳ６５へ移行し、一致しない場合（ステップＳ６３：Ｎｏ）、ステップＳ６４へ移行する。 <Step S63>
As a result of the comparison between the face detection information and the face recognition information by the comparison unit 62, if they match (step S63: Yes), the process moves to step S65, and if they do not match (step S63: No), the process moves to step S64.

＜ステップＳ６４＞
比較部６２は、顔・アバター・属性対応テーブルにおいて、顔検出部６１により取得された顔検出情報と一致する顔認識情報が存在しない場合、当該顔検出情報を新しい顔認識情報として、顔・アバター・属性対応テーブルに登録する。この時点では、顔・アバター・属性対応テーブルにおいて、新たに登録された顔認識情報のレコードには、対応する（関連付けられた）アバター情報および属性情報は登録されていない状態となる。そして、ステップＳ６５へ移行する。 <Step S64>
If the face/avatar/attribute correspondence table contains no face recognition information that matches the face detection information acquired by the face detection unit 61, the comparison unit 62 registers that face detection information in the face/avatar/attribute correspondence table as new face recognition information. At this point, the record of the newly registered face recognition information in the face/avatar/attribute correspondence table has no corresponding (associated) avatar information or attribute information registered. Then, the process moves to step S65.

＜ステップＳ６５＞
管理システム５０の生成部６３は、顔・アバター・属性対応テーブルにおいて、顔検出部６１により検出された顔検出情報と一致する顔認識情報（ステップＳ６４で新たに顔認識情報として登録された場合は、当該顔認識情報）に関連付けられたアバター情報が登録されているか否か検索する。関連付けられたアバター情報が登録されている場合（ステップＳ６５：Ｙｅｓ）、ステップＳ６６へ移行し、登録されていない場合（ステップＳ６５：Ｎｏ）、ステップＳ６７へ移行する。 <Step S65>
The generation unit 63 of the management system 50 searches the face/avatar/attribute correspondence table to determine whether avatar information is registered in association with the face recognition information that matches the face detection information detected by the face detection unit 61 (or, if face recognition information was newly registered in step S64, with that newly registered face recognition information). If associated avatar information is registered (step S65: Yes), the process moves to step S66; if it is not registered (step S65: No), the process moves to step S67.

＜ステップＳ６６＞
生成部６３は、顔・アバター・属性対応テーブルから、比較部６２により顔検出情報と一致すると判断された顔認識情報に対応するアバター情報を取得し、当該アバター情報に基づいて、当該顔認識情報に対応する参加者を表すアバターを生成する。そして、管理システム５０の表示制御部６９は、生成部６３により生成されたアバター（参加者の情報の一例）について、当該アバターに対応する属性情報に基づいて表示制御を行う。 <Step S66>
The generation unit 63 acquires avatar information corresponding to the face recognition information determined by the comparison unit 62 to match the face detection information from the face/avatar/attribute correspondence table, and based on the avatar information, generates the face recognition information. Generate an avatar representing the corresponding participant. Then, the display control unit 69 of the management system 50 performs display control on the avatar (an example of participant information) generated by the generation unit 63 based on attribute information corresponding to the avatar.

例えば、図１８（ａ）に示すように、拠点ａでは、通信端末１０ａを用いてビデオ会議に参加する参加者Ａ～Ｄがいて、拠点ｂでは、通信端末１０ｂを用いてビデオ会議に参加する参加者Ｅ、Ｆがいるものとした場合、表示制御部６９は、通信端末１０ａ、１０ｂに対して、図１８（ｂ）に示すように、ビデオ会議に参加している各拠点の参加者全員（アバター）が同一の場所にいるような映像データを生成して送信するものとしてもよい。または、図１９（ａ）および図１９（ｂ）に示すように、表示制御部６９は、生成部６３により生成されたアバターについて、自拠点以外の拠点の参加者（アバター）のみを映すような映像データを生成して、通信端末１０ａ、１０ｂに対して送信するものとしてもよい。図１９（ｂ）に示す例では、表示制御部６９は、拠点ｂの通信端末１０ｂのディスプレイ２１４に映す映像データとして、自拠点（ここでは拠点ｂ）以外の拠点（ここでは拠点ａ）の参加者Ａ～Ｄのみを映すような映像データを生成するものとしてもよい。 For example, as shown in FIG. 18(a), suppose that participants A to D take part in the video conference at site a using the communication terminal 10a, and participants E and F take part at site b using the communication terminal 10b. In this case, the display control unit 69 may generate video data in which all participants (avatars) at the sites participating in the video conference appear to be in the same place, as shown in FIG. 18(b), and transmit it to the communication terminals 10a and 10b. Alternatively, as shown in FIGS. 19(a) and 19(b), the display control unit 69 may generate, from the avatars generated by the generation unit 63, video data that shows only the participants (avatars) at sites other than the receiving terminal's own site, and transmit it to the communication terminals 10a and 10b. In the example shown in FIG. 19(b), the display control unit 69 may generate, as the video data to be shown on the display 214 of the communication terminal 10b at site b, video data that shows only participants A to D at the site (here, site a) other than the terminal's own site (here, site b).

また、表示制御部６９は、参加者の属性情報が顔・アバター・属性対応テーブルに登録されている場合、当該属性情報を参照して、ビデオ会議の参加者のうち同じ会社に所属する参加者のアバターを同列となるように配置し、役職の順序に並べた表示となるように、映像データを生成して送信するものとしてもよい。また、表示制御部６９は、アバターを役職の順序に並べると共に、または、それに代えて、役職名をアバターの近傍に表示するものとしてもよい。この際、表示制御部６９は、さらにアバターの近傍に名前、所属等を表示させるものとしてもよい。このように役職の順序に並べたり、役職名等を表示させることによって、自拠点以外の拠点の参加者（アバター）について、少なくともどの参加者がどの参加者よりも目上のものであるのか等の各参加者の立場を把握することができ、円滑に会議を進めることができる。なお、ここで、顔・アバター・属性対応テーブルに、参加者に対応する属性情報が登録されていない場合、図２１で後述する属性情報取得処理が実行される。 In addition, when the attribute information of the participants is registered in the face/avatar/attribute correspondence table, the display control unit 69 may refer to that attribute information and generate and transmit video data in which the avatars of participants belonging to the same company are placed in the same row and arranged in order of job title. Together with, or instead of, arranging the avatars in order of job title, the display control unit 69 may display the job title near each avatar. In doing so, the display control unit 69 may also display the name, affiliation, and the like near the avatar. Arranging the avatars in order of job title and displaying job titles and the like in this way lets participants grasp the standing of each participant (avatar) at the other sites, at least which participant is senior to which, so that the meeting can proceed smoothly. If attribute information corresponding to a participant is not registered in the face/avatar/attribute correspondence table, the attribute information acquisition processing described later with reference to FIG. 21 is executed.
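
The attribute-based layout described above (one row per company, ordered by job title) could be computed along the following lines. The ranking table `POSITION_RANK`, the attribute keys, and the sample participants are hypothetical; the description does not define concrete job titles or a data format.

```python
# Assumed seniority ranking; smaller numbers are more senior.
POSITION_RANK = {"president": 0, "director": 1, "manager": 2, "staff": 3}


def arrange_avatars(participants):
    """Group participants by company and order each row by job title."""
    rows = {}
    for participant in participants:
        rows.setdefault(participant["company"], []).append(participant)
    for members in rows.values():
        # Unknown job titles are placed after all known ones.
        members.sort(key=lambda p: POSITION_RANK.get(p["position"], len(POSITION_RANK)))
    return rows


participants = [
    {"name": "A", "company": "X", "position": "staff"},
    {"name": "B", "company": "X", "position": "director"},
    {"name": "E", "company": "Y", "position": "manager"},
]
rows = arrange_avatars(participants)
```

Each row can then be rendered left to right, optionally with the job title, name, and affiliation drawn next to each avatar.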

＜ステップＳ６７＞
顔・アバター・属性対応テーブルにおいて、顔検出部６１により検出された顔検出情報と一致する顔認識情報（ステップＳ６４で新たに顔認識情報として登録された場合は、当該顔認識情報）に関連付けられたアバター情報が登録されていない場合、生成部６３は、新たにアバター情報を生成する。例えば、記憶部５８にデフォルトのアバター情報が記憶されているものとし、生成部６３は、新たなアバター情報として、デフォルトのアバター情報を用いるものとしてもよい。そして、生成部６３は、新たに生成したアバター情報に基づいて、顔認識情報に対応する参加者を表すアバターを生成する。そして、生成部６３は、顔・アバター・属性対応テーブルにおいて、生成した新たなアバター情報を、当該顔認識情報に関連付けて登録する。表示制御部６９によるアバターの表示制御は、上述のステップＳ６６で説明した動作と同様である。 <Step S67>
If, in the face/avatar/attribute correspondence table, no avatar information is registered in association with the face recognition information that matches the face detection information detected by the face detection unit 61 (or, if face recognition information was newly registered in step S64, with that newly registered face recognition information), the generation unit 63 generates new avatar information. For example, default avatar information may be stored in the storage unit 58, and the generation unit 63 may use it as the new avatar information. The generation unit 63 then generates an avatar representing the participant corresponding to the face recognition information, based on the newly generated avatar information, and registers the new avatar information in the face/avatar/attribute correspondence table in association with that face recognition information. The display control of the avatar by the display control unit 69 is the same as the operation described in step S66 above.

以上のステップＳ６１～Ｓ６７の流れにより、管理システム５０によるアバター生成処理が実行される。 The avatar generation process by the management system 50 is executed by the flow of steps S61 to S67 described above.
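
The branching in steps S62 to S67 can be summarized in a short sketch. The description does not specify how face detection information is matched against face recognition information, so the Euclidean distance test with `MATCH_THRESHOLD`, the row layout, and the function names are assumptions made purely for illustration.

```python
import math

DEFAULT_AVATAR = "default_avatar"  # stand-in for the default avatar information
MATCH_THRESHOLD = 0.6  # assumed tolerance for treating two feature vectors as a match


def find_match(table, features):
    """Steps S62/S63: return the row whose face recognition information matches."""
    for row in table:
        if math.dist(row["face"], features) <= MATCH_THRESHOLD:
            return row
    return None


def get_or_create_avatar(table, features):
    """Return the avatar information for a detected face, registering it if needed."""
    row = find_match(table, features)
    if row is None:
        # Step S64: register the face with no avatar or attribute information yet.
        row = {"face": features, "avatar": None, "attributes": None}
        table.append(row)
    if row["avatar"] is None:
        # Step S65: No, then step S67: fall back to the default avatar and register it.
        row["avatar"] = DEFAULT_AVATAR
    # Step S66 (or the end of S67): this avatar information is used for display.
    return row["avatar"]


table = [{"face": (1.0, 1.0), "avatar": "avatar_A", "attributes": None}]
known = get_or_create_avatar(table, (1.1, 0.9))
unknown = get_or_create_avatar(table, (5.0, 5.0))
```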

（発話方向特定処理）
図２０は、実施形態に係る通信端末の音声方向特定処理の流れの一例を示すフローチャートである。図２０を参照しながら、通信端末１０における発話方向特定処理の流れについて説明する。 (Speech direction identification process)
FIG. 20 is a flowchart illustrating an example of the flow of audio direction identification processing of the communication terminal according to the embodiment. Referring to FIG. 20, the flow of speech direction identification processing in the communication terminal 10 will be explained.

＜ステップＳ７１＞
ビデオ会議の参加者は、他拠点の参加者とコミュニケーションを取るために発話する。そして、ステップＳ７２へ移行する。 <Step S71>
Participants in a video conference speak to communicate with participants at other locations. Then, the process moves to step S72.

＜ステップＳ７２＞
すると、発話した参加者が利用する通信端末１０のマイク１１４ａ（マイク２４１）は、マイクロホンアレイにより発話した音声を収音して音声信号に変換し、通信端末１０の音声入力部１５は、当該音声信号を入力（取得）する。そして、ステップＳ７３へ移行する。 <Step S72>
Then, the microphone 114a (microphone 241) of the communication terminal 10 used by the participant who spoke picks up the spoken voice with its microphone array and converts it into an audio signal, and the audio input unit 15 of the communication terminal 10 inputs (acquires) the audio signal. Then, the process moves to step S73.

＜ステップＳ７３＞
通信端末１０の発話方向特定部２０は、音声入力部１５により入力された音声信号に対して音声処理を行うことにより、音声の発話方向を特定する。そして、ステップＳ７４へ移行する。 <Step S73>
The speech direction specifying unit 20 of the communication terminal 10 specifies the speech direction of the voice by performing voice processing on the voice signal input by the voice input unit 15. Then, the process moves to step S74.

＜ステップＳ７４＞
通信端末１０の通信部１１は、音声入力部１５により入力された音声データ（音声信号）、撮像部１３により撮影された映像データ、および発話方向特定部２０により特定された発話方向の情報を、管理システム５０へ送信する。そして、発話方向特定処理を終了する。 <Step S74>
The communication unit 11 of the communication terminal 10 transmits the audio data (audio signal) input by the audio input unit 15, the video data captured by the imaging unit 13, and the information on the speech direction specified by the speech direction specifying unit 20 to the management system 50. Then, the speech direction specifying process ends.

以上のステップＳ７１～Ｓ７４の流れで、通信端末１０による発話方向特定処理が実行される。 The speech direction specifying process by the communication terminal 10 is executed through the flow of steps S71 to S74 described above.
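The description does not state how the speech direction specifying unit 20 derives a direction from the microphone array signal. As one possible illustration only, the sketch below assumes two microphones and estimates the arrival angle from the time difference of arrival found by cross-correlation. The function names, the two-microphone geometry, and the sign convention are assumptions, not the disclosed method.

```python
import math
from typing import Sequence

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def best_lag(left: Sequence[float], right: Sequence[float], max_lag: int) -> int:
    """Return the lag (in samples) at which `right` best matches `left`.

    A positive lag means the sound reached the right microphone later.
    """
    def corr(lag: int) -> float:
        total = 0.0
        for i, value in enumerate(left):
            j = i + lag
            if 0 <= j < len(right):
                total += value * right[j]
        return total

    return max(range(-max_lag, max_lag + 1), key=corr)


def speech_direction_deg(left: Sequence[float], right: Sequence[float],
                         sample_rate: float, mic_spacing: float) -> float:
    """Estimate the arrival angle in degrees (0 = straight ahead, positive = toward the left mic)."""
    # The physically possible delay is bounded by the microphone spacing.
    max_lag = int(mic_spacing / SPEED_OF_SOUND * sample_rate) + 1
    delay = best_lag(left, right, max_lag) / sample_rate
    # Clamp to the valid domain of asin to absorb rounding of the lag.
    ratio = max(-1.0, min(1.0, delay * SPEED_OF_SOUND / mic_spacing))
    return math.degrees(math.asin(ratio))


# A single pulse that reaches the right microphone 5 samples after the left one.
left = [0.0] * 20 + [1.0] + [0.0] * 20
right = [0.0] * 25 + [1.0] + [0.0] * 15
angle = speech_direction_deg(left, right, sample_rate=16000.0, mic_spacing=0.2)
print(round(angle, 1))
```

A terminal following steps S73 and S74 would send a value like `angle` to the management system 50 together with the audio and video data.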

（属性情報取得処理）
図２１は、実施形態に係る通信システムの属性情報取得処理の流れの一例を示すフローチャートである。図２２は、属性情報の取得動作を説明する図である。図２３は、顔・アバター・属性対応テーブルの一例を示す図である。図２１～図２３を参照しながら、通信システム１による属性情報取得処理の流れについて説明する。なお、ビデオ会議が開始され、上述の図１７で説明したように、管理システム５０の表示制御部６９がアバターを表示する場合に用いる参加者の属性情報が、顔・アバター・属性対応テーブルに登録されていないため、属性情報要求部６４によって、属性情報が登録されていない参加者の拠点の通信端末１０へ、属性情報についての発話を要求するための指示を送信したものとする。この場合、通信端末１０の表示制御部１４は、通信部１１を介して当該指示を受信すると、例えば、ディスプレイ１２０（ディスプレイ２１４）に、属性情報の発話を促すメッセージを表示させる。なお、属性情報の発話を促すメッセージの表示のほか、例えば、通信端末１０の音声出力部１６は、属性情報の発話を促す音声を、スピーカ１１４ｂ（スピーカ２４２）から出力させてもよい。 (Attribute information acquisition processing)
FIG. 21 is a flowchart illustrating an example of the flow of attribute information acquisition processing in the communication system according to the embodiment. FIG. 22 is a diagram illustrating an operation for acquiring attribute information. FIG. 23 is a diagram showing an example of a face/avatar/attribute correspondence table. The flow of attribute information acquisition processing by the communication system 1 will be described with reference to FIGS. 21 to 23. Here, it is assumed that the video conference has started and that, as explained with reference to FIG. 17 above, the attribute information of a participant, which the display control unit 69 of the management system 50 uses when displaying the avatar, is not registered in the face/avatar/attribute correspondence table, so that the attribute information requesting unit 64 has transmitted an instruction requesting an utterance of the attribute information to the communication terminal 10 at the site of the participant whose attribute information is not registered. In this case, upon receiving the instruction via the communication unit 11, the display control unit 14 of the communication terminal 10 causes, for example, the display 120 (display 214) to display a message prompting the participant to speak the attribute information. In addition to displaying such a message, the audio output unit 16 of the communication terminal 10 may, for example, cause the speaker 114b (speaker 242) to output audio prompting the participant to speak the attribute information.

＜ステップＳ８１＞
通信端末１０のディスプレイ１２０（ディスプレイ２１４）に表示された属性情報の発話を促すメッセージを確認したビデオ会議の参加者は、自身の名前、属する会社の社名、所属および役職等の属性情報を発話する。図２２に示す例では、拠点ｂの参加者Ｅ、Ｆに対して、属性情報の発話を促すようなメッセージが表示された場合、参加者Ｅは、自身の属性情報を含む「ＡＡＡ社、技術のＢＢＢです。」と発話し、参加者Ｆは、自身の属性情報を含む「ＸＸＸ社、部長のＹＹＹです。」と発話している状態を示す。図２２の例では、拠点ｂの参加者が発話している状態を示しているが、上述の管理システム５０のアバター生成処理の際に、拠点ａの参加者の属性情報が顔・アバター・属性対応テーブルに登録されていないことが確認された場合、管理システム５０から拠点ａの通信端末１０に対しても、属性情報についての発話を要求するための指示が送信される。通信端末１０の音声入力部１５は、マイク１１４ａ（マイク２４１）により収音された参加者が発話した音声の音声データを入力（取得）する。また、通信端末１０の発話方向特定部２０は、音声入力部１５により入力された音声データに基づいて、音声方向（発話方向）を特定する。そして、通信端末１０の通信部１１は、音声入力部１５により入力された音声データ、および発話方向特定部２０により特定された発話方向の情報を、管理システム５０へ送信する。管理システム５０の通信部５１は、通信端末１０から送信された音声データおよび発話方向の情報を受信する。そして、ステップＳ８２へ移行する。 <Step S81>
Participants in the video conference who see the message prompting them to speak their attribute information, displayed on the display 120 (display 214) of the communication terminal 10, speak attribute information such as their name, the name of the company they belong to, their affiliation, and their position. In the example shown in FIG. 22, a message prompting participants E and F at site b to speak their attribute information has been displayed; participant E says "I am BBB of engineering at AAA Company," which includes his or her own attribute information, and participant F says "I am YYY, general manager at XXX Company," which likewise includes his or her own attribute information. Although the example in FIG. 22 shows the participants at site b speaking, if it is confirmed during the avatar generation process of the management system 50 described above that the attribute information of a participant at site a is not registered in the face/avatar/attribute correspondence table, the management system 50 also sends an instruction requesting an utterance of the attribute information to the communication terminal 10 at site a. The audio input unit 15 of the communication terminal 10 inputs (acquires) the audio data of the voice uttered by the participant and picked up by the microphone 114a (microphone 241). Furthermore, the speech direction specifying unit 20 of the communication terminal 10 specifies the voice direction (speech direction) based on the audio data input by the audio input unit 15. Then, the communication unit 11 of the communication terminal 10 transmits the audio data input by the audio input unit 15 and the speech direction information specified by the speech direction specifying unit 20 to the management system 50. The communication unit 51 of the management system 50 receives the audio data and speech direction information transmitted from the communication terminal 10.
Then, the process moves to step S82.

＜ステップＳ８２＞
管理システム５０のテキスト化部６６は、通信部５１により受信された音声データを、既知の音声認識技術によりテキスト化する。そして、ステップＳ８３へ移行する。 <Step S82>
The text conversion unit 66 of the management system 50 converts the voice data received by the communication unit 51 into text using known voice recognition technology. Then, the process moves to step S83.

＜ステップＳ８３＞
管理システム５０の抽出部６７は、テキスト化部６６によりテキスト化されたテキストと、予め属性情報を示す候補として記憶部５８に登録されている登録済みワードとを比較する。そして、ステップＳ８４へ移行する。 <Step S83>
The extraction unit 67 of the management system 50 compares the text converted into text by the text conversion unit 66 with registered words registered in advance in the storage unit 58 as candidates indicating attribute information. Then, the process moves to step S84.

＜ステップＳ８４＞
抽出部６７によるテキストと登録済みワードとの比較の結果、一致するキーワードがある（ステップＳ８４：Ｙｅｓ）、ステップＳ８５へ移行し、一致するキーワードがない（ステップＳ８４：Ｎｏ）、ステップＳ８７へ移行する。 <Step S84>
As a result of the comparison between the text and the registered words by the extraction unit 67, if there is a matching keyword (step S84: Yes), the process moves to step S85, and if there is no matching keyword (step S84: No), the process moves to step S87.

＜ステップＳ８５＞
抽出部６７は、テキスト化部６６によりテキスト化されたテキストと、登録済みワードとの比較の結果、当該テキストに登録済みワードと一致するキーワードがある場合、当該キーワードを抽出する。例えば、図２３に示す顔・アバター・属性対応テーブルのように属性情報として、社名、所属、役職、および名前のような属性がある場合、抽出部６７は、テキストから、各属性に対応するキーワードを抽出する。そして、ステップＳ８６へ移行する。 <Step S85>
As a result of comparing the text converted by the text conversion unit 66 with the registered words, the extraction unit 67 extracts a keyword if the text contains a keyword that matches a registered word. For example, when the attribute information includes attributes such as company name, affiliation, position, and name, as in the face/avatar/attribute correspondence table shown in FIG. 23, the extraction unit 67 extracts from the text a keyword corresponding to each attribute. Then, the process moves to step S86.

＜ステップＳ８６＞
管理システム５０の登録部６８は、抽出部６７により抽出されたキーワードを、テキスト化部６６によりテキスト化された音声データに対応する参加者の属性情報として、顔・アバター・属性対応テーブルにおいて、比較部６２により顔認識情報と一致すると判断された参加者の顔検出情報であって、発話方向に対応する参加者の顔検出情報に関連付けて登録する。例えば、図２３に示す顔・アバター・属性対応テーブルでは、顔認識情報が「ＸＸＸ」（Ｍ４）、および「ＹＹＹ」（ＥＬ２）である参加者の属性情報として、社名「ＡＡＡ」、所属「ＢＢＢ」、役職「ＣＣＣ」、および名前「ＤＤＤ」が登録されている。そして、属性情報取得処理を終了する。 <Step S86>
The registration unit 68 of the management system 50 registers the keywords extracted by the extraction unit 67 in the face/avatar/attribute correspondence table as the attribute information of the participant corresponding to the audio data converted into text by the text conversion unit 66. The keywords are registered in association with the face detection information of the participant that the comparison unit 62 determined to match the face recognition information and that corresponds to the speech direction. For example, in the face/avatar/attribute correspondence table shown in FIG. 23, the company name "AAA", the affiliation "BBB", the position "CCC", and the name "DDD" are registered as the attribute information of the participant whose face recognition information is "XXX" (M4) and "YYY" (EL2). Then, the attribute information acquisition process ends.

＜ステップＳ８７＞
テキスト化部６６によりテキスト化されたテキストと、登録済みワードとの比較の結果、当該テキストに登録済みワードと一致するキーワードがないため、管理システム５０の属性情報要求部６４は、属性情報が登録されていない参加者に対して、属性情報の発話を要求するための指示を、通信部５１を介して通信端末１０へ送信する。そして、通信端末１０の表示制御部１４は、通信部１１を介して当該指示を受信すると、例えば、ディスプレイ１２０（ディスプレイ２１４）に、属性情報の発話を促すメッセージを表示させる。そして、ステップＳ８１へ戻る。 <Step S87>
As a result of comparing the text converted by the text conversion unit 66 with the registered words, the text contains no keyword matching a registered word, so the attribute information requesting unit 64 of the management system 50 sends, via the communication unit 51, an instruction to the communication terminal 10 requesting that the participant whose attribute information is not registered speak the attribute information. Then, upon receiving the instruction via the communication unit 11, the display control unit 14 of the communication terminal 10 causes, for example, the display 120 (display 214) to display a message prompting the participant to speak the attribute information. Then, the process returns to step S81.

以上のステップＳ８１～Ｓ８７の流れで、通信システム１による属性情報取得処理が実行される。 Attribute information acquisition processing by the communication system 1 is executed through the flow of steps S81 to S87 described above.
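Steps S83 to S87 can be sketched as follows, assuming the speech has already been converted to text. This is an illustration under stated assumptions rather than the disclosed implementation: the registered words, the simple substring comparison, and the names `REGISTERED_WORDS`, `extract_attributes`, and `handle_utterance` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical registered words, grouped by the attribute they indicate.
REGISTERED_WORDS: Dict[str, List[str]] = {
    "company": ["AAA", "XXX"],
    "affiliation": ["engineering", "sales"],
    "position": ["general manager", "section manager"],
}


def extract_attributes(text: str, registered: Dict[str, List[str]]) -> Dict[str, str]:
    """Steps S83 to S85: return the first registered word found for each attribute."""
    lowered = text.lower()
    found: Dict[str, str] = {}
    for attribute, words in registered.items():
        for word in words:
            if word.lower() in lowered:
                found[attribute] = word
                break
    return found


def handle_utterance(text: str, face_id: str, table: Dict[str, Dict[str, str]],
                     registered: Dict[str, List[str]] = REGISTERED_WORDS) -> bool:
    """Register extracted keywords (S86), or report that a new prompt is needed (S87)."""
    attributes = extract_attributes(text, registered)
    if not attributes:
        return False  # the caller would send the prompt instruction again
    table.setdefault(face_id, {}).update(attributes)
    return True


attribute_table: Dict[str, Dict[str, str]] = {}
registered_ok = handle_utterance("I am BBB of engineering at AAA", "face-E", attribute_table)
print(registered_ok, attribute_table)
```

In the flow of FIG. 21, a `False` result corresponds to returning to step S81 after the prompt is shown again.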

（アバター制御処理）
図２４は、実施形態に係る通信システムのアバター制御処理の流れの一例を示すフローチャートである。図２５は、属性情報に基づいてアバターの配置を変更する動作を説明する図である。図２６は、発話方向と発話者との対応を説明する図である。図２７は、発話者・発話方向対応テーブルの一例を示す図である。図２４～図２７を参照しながら、通信システム１によるアバター制御処理の流れについて説明する。 (Avatar control processing)
FIG. 24 is a flowchart illustrating an example of the flow of avatar control processing in the communication system according to the embodiment. FIG. 25 is a diagram illustrating the operation of changing the arrangement of avatars based on attribute information. FIG. 26 is an explanatory diagram illustrating the correspondence between speech directions and speakers. FIG. 27 is a diagram showing an example of a speaker/speech direction correspondence table. The flow of avatar control processing by the communication system 1 will be described with reference to FIGS. 24 to 27.

＜ステップＳ９１＞
図１７に示したアバター生成処理が実行されると、管理システム５０の生成部６３により生成されたアバターは、表示制御部６９により、当該アバターに対応する属性情報に基づいて表示制御が行われ、映像データが生成される。表示制御部６９による表示制御の詳細は、図１７のステップＳ６６で説明したとおりである。ここでは、自拠点の通信端末１０が、相手拠点の参加者について生成されたアバターについて表示制御された映像データを管理システム５０から受信し、当該映像データをディスプレイ１２０（ディスプレイ２１４）に表示させているものとする。 <Step S91>
When the avatar generation process shown in FIG. 17 is executed, the display control unit 69 performs display control of the avatar generated by the generation unit 63 of the management system 50 based on the attribute information corresponding to the avatar, and video data is generated. Details of the display control by the display control unit 69 are as described in step S66 of FIG. 17. Here, it is assumed that the communication terminal 10 at the own site has received, from the management system 50, video data in which the avatars generated for the participants at the other site are display-controlled, and is displaying the video data on the display 120 (display 214).

例えば、図２５（ａ）に示すように、拠点ａの参加者をＡ社の課長、拠点ｂの参加者をＢ社の部長およびＢ社の課長、拠点ｃの参加者をＡ社の担当者およびＢ社の担当者とした場合、拠点ａの通信端末１０におけるディスプレイ１２０（ディスプレイ２１４）には、例えば図２５（ｂ）に示すように映像データが表示される。すなわち、図２５（ｂ）に示す映像データは、表示制御部６９によって、各拠点（ここでは相手拠点となる拠点ｂ、ｃ）の参加者が同一の場所にいるようにし、同じ会社に所属する参加者（Ｂ社の部長、課長および担当者）を同列、かつ役職の順序に並べた状態となるように表示制御されたものである。 For example, as shown in FIG. 25(a), suppose that the participant at site a is a section manager of company A, the participants at site b are a general manager and a section manager of company B, and the participants at site c are a person in charge from company A and a person in charge from company B. In this case, video data is displayed on the display 120 (display 214) of the communication terminal 10 at site a as shown in FIG. 25(b), for example. That is, the video data shown in FIG. 25(b) has been display-controlled by the display control unit 69 so that the participants at each site (here, the other sites b and c) appear to be in the same place, and so that participants belonging to the same company (the general manager, section manager, and person in charge of company B) are arranged in the same row in order of position.

そして、ステップＳ９２へ移行する。 Then, the process moves to step S92.
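The arrangement described for FIG. 25(b), in which participants of the same company share a row ordered by position, can be sketched as a grouping and sorting step. The seniority ranking `RANK_ORDER` and the names `Participant` and `arrange_rows` are assumptions made for illustration; the description does not specify how positions are ranked.

```python
from itertools import groupby
from typing import Dict, List, NamedTuple

# Hypothetical seniority order; lower numbers are displayed first.
RANK_ORDER: Dict[str, int] = {"general manager": 0, "section manager": 1, "staff": 2}


class Participant(NamedTuple):
    site: str
    company: str
    position: str


def arrange_rows(participants: List[Participant]) -> List[List[Participant]]:
    """Group participants by company and sort each row by position."""
    ordered = sorted(
        participants,
        # Unknown positions are placed after all known ones.
        key=lambda p: (p.company, RANK_ORDER.get(p.position, len(RANK_ORDER))),
    )
    return [list(row) for _, row in groupby(ordered, key=lambda p: p.company)]


# Remote participants as seen from site a in the FIG. 25 example.
remote = [
    Participant("c", "B", "staff"),
    Participant("b", "B", "section manager"),
    Participant("c", "A", "staff"),
    Participant("b", "B", "general manager"),
]
for row in arrange_rows(remote):
    print([f"{p.company}:{p.position}" for p in row])
```

The site of each participant is carried along but does not affect the ordering, which matches the idea of showing participants from different sites as if they were in the same place.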

＜ステップＳ９２＞
相手拠点の参加者が発話すると、相手拠点の通信端末１０の通信部１１は、音声入力部１５により入力された音声データ、撮像部１３により撮影された映像データ、および発話方向特定部２０により特定された発話方向の情報を、管理システム５０へ送信する。管理システム５０の通信部５１は、相手拠点の音声データ、映像データおよび発話方向の情報を、相手拠点の通信端末１０から受信する。そして、ステップＳ９３へ移行する。 <Step S92>
When a participant at the other site speaks, the communication unit 11 of the communication terminal 10 at the other site transmits the audio data input by the audio input unit 15, the video data captured by the imaging unit 13, and the information on the speech direction specified by the speech direction specifying unit 20 to the management system 50. The communication unit 51 of the management system 50 receives the audio data, video data, and speech direction information of the other site from the communication terminal 10 at the other site. Then, the process moves to step S93.

＜ステップＳ９３＞
管理システム５０の対応付け部６５は、通信部５１を介して受信した参加者の音声の発話方向と、顔検出部６１により検出された顔画像、すなわち参加者とを対応付ける。具体的には、対応付け部６５は、予め記憶部５８に記憶されている図２７に示すような座標（顔中心座標）と、発話方向とを対応付けた発話者・発話方向対応テーブルを参照し、通信部５１により受信された発話方向が、どの座標（顔中心座標）に対応するのかを特定する。ここで、例えば、図２６に示すように、相手拠点の参加者がＡ～Ｄである場合、顔検出部６１により参加者Ａ～Ｄの顔の画像の中心座標はそれぞれ算出されている。そして、対応付け部６５は、顔検出部６１により算出された相手拠点の参加者の顔画像の中心座標のうち、特定した座標と一致する（または一致するとみなせる）中心座標を特定し、当該中心座標を有する顔画像に対応する参加者のアバターを特定する。そして、ステップＳ９４へ移行する。 <Step S93>
The association unit 65 of the management system 50 associates the speech direction of the participant's voice received via the communication unit 51 with the face image detected by the face detection unit 61, that is, with the participant. Specifically, the association unit 65 refers to the speaker/speech direction correspondence table, stored in advance in the storage unit 58, which associates coordinates (face center coordinates) with speech directions as shown in FIG. 27, and specifies which coordinates (face center coordinates) the speech direction received by the communication unit 51 corresponds to. Here, for example, as shown in FIG. 26, if the participants at the other site are A to D, the face detection unit 61 has already calculated the center coordinates of the face images of participants A to D. The association unit 65 then identifies, among the center coordinates of the face images of the participants at the other site calculated by the face detection unit 61, the center coordinates that match (or can be regarded as matching) the specified coordinates, and identifies the avatar of the participant corresponding to the face image having those center coordinates. Then, the process moves to step S94.

＜ステップＳ９４＞
管理システム５０の表示制御部６９は、対応付け部６５により特定された参加者のアバターが、通信部５１により受信された音声データに合わせて、音声を発話している動作となるように反映した映像データを生成する。具体的に反映動作は、上述したとおりである。そして、管理システム５０の通信部５１は、相手拠点の通信端末１０から受信した音声データ、および表示制御部６９により生成された映像データを、自拠点の通信端末１０へ送信する。自拠点の通信端末１０の表示制御部１４は、管理システム５０から通信部１１を介して音声データおよび映像データを受信すると、ディスプレイ１２０（ディスプレイ２１４）に当該映像データを表示させ、音声出力部１６は、当該音声データを音声として出力する。 <Step S94>
The display control unit 69 of the management system 50 generates video data in which the avatar of the participant identified by the association unit 65 is shown speaking in time with the audio data received by the communication unit 51. The specific reflection operation is as described above. Then, the communication unit 51 of the management system 50 transmits the audio data received from the communication terminal 10 at the other site and the video data generated by the display control unit 69 to the communication terminal 10 at the own site. When the display control unit 14 of the communication terminal 10 at the own site receives the audio data and video data from the management system 50 via the communication unit 11, it displays the video data on the display 120 (display 214), and the audio output unit 16 outputs the audio data as sound.

以上のステップＳ９１～Ｓ９４の流れで、通信システム１によるアバター制御処理が実行される。このように、相手拠点の参加者が発話すると、発話方向が特定され、当該発話方向からどの参加者が発話しているのかが特定されるので、当該参加者のアバターに対して、発話動作を反映することができる。これによって、相手拠点のアバターを含む映像データを見ている自拠点の参加者は、アバターを介してどの参加者が発話しているのかを認識することができるので、アバターを介したビデオ会議においても、スムーズなコミュニケーションを実現することができる。 The avatar control process by the communication system 1 is executed through the flow of steps S91 to S94 described above. In this way, when a participant at the other site speaks, the speech direction is specified, and which participant is speaking is identified from that speech direction, so the speaking motion can be reflected in that participant's avatar. As a result, participants at the own site who are viewing video data that includes the avatars of the other site can recognize, through the avatars, which participant is speaking, so smooth communication can be achieved even in a video conference conducted through avatars.
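The association performed in step S93 can be sketched as two lookups: from the received speech direction to a coordinate in the speaker/speech direction correspondence table, and from that coordinate to the nearest detected face center. The table layout (direction in degrees paired with an image coordinate), the pixel tolerance used to decide that coordinates "can be regarded as matching", and the function names are assumptions, since the contents of FIG. 27 are not reproduced here.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def coordinate_for_direction(direction_deg: float,
                             table: List[Tuple[float, Point]]) -> Point:
    """Look up the table entry whose direction is closest to the received direction."""
    return min(table, key=lambda entry: abs(entry[0] - direction_deg))[1]


def find_speaker(direction_deg: float, table: List[Tuple[float, Point]],
                 face_centers: Dict[str, Point],
                 tolerance: float = 50.0) -> Optional[str]:
    """Return the participant whose face center matches the looked-up coordinate, if any."""
    if not face_centers:
        return None
    target = coordinate_for_direction(direction_deg, table)
    name, center = min(face_centers.items(),
                       key=lambda item: math.dist(item[1], target))
    # Coordinates within the tolerance are regarded as matching.
    return name if math.dist(center, target) <= tolerance else None


# Hypothetical table and detected face centers for participants A to D.
direction_table = [(-45.0, (160.0, 360.0)), (-15.0, (480.0, 360.0)),
                   (15.0, (800.0, 360.0)), (45.0, (1120.0, 360.0))]
centers = {"A": (150.0, 350.0), "B": (490.0, 365.0),
           "C": (805.0, 355.0), "D": (1110.0, 370.0)}
print(find_speaker(17.0, direction_table, centers))
```

The returned name stands in for the avatar whose speaking motion the display control unit 69 would then render in step S94.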

（通信システムの全体動作）
図２８は、実施形態に係る通信システムの全体動作の流れの一例を示すシーケンス図である。図２９は、音声認識による属性情報の取得を促す画面の一例を示す図である。図２８および図２９を参照しながら、通信システム１の全体的な動作の流れについて総括的に説明する。なお、図２８においては、自拠点の通信端末を通信端末１０ａ（第２通信端末）とし、相手拠点の通信端末を通信端末１０ｂ（第１通信端末）として説明する。 (Overall operation of communication system)
FIG. 28 is a sequence diagram illustrating an example of the flow of the overall operation of the communication system according to the embodiment. FIG. 29 is a diagram illustrating an example of a screen that prompts the acquisition of attribute information through voice recognition. With reference to FIGS. 28 and 29, the overall operation flow of the communication system 1 will be described in general. In addition, in FIG. 28, the communication terminal at the own site is assumed to be the communication terminal 10a (second communication terminal), and the communication terminal at the other site is assumed to be the communication terminal 10b (first communication terminal).

＜ステップＳ１０１、Ｓ１０２＞
自拠点の参加者は、相手拠点の参加者とのビデオ会議を開始するために、通信端末１０ａの入力装置１０８を介して、開始するための操作（例えば、通信端末１０ｂを宛先端末として選択する操作）を行う。これによって、上述の図１４および図１５に示した動作が実行され、通信端末１０ａと通信端末１０ｂとの間でセッションが確立される。 <Steps S101, S102>
In order to start a video conference with a participant at the other site, the participant at the own site performs a start operation (for example, selecting the communication terminal 10b as the destination terminal) via the input device 108 of the communication terminal 10a. operation). As a result, the operations shown in FIGS. 14 and 15 described above are executed, and a session is established between communication terminal 10a and communication terminal 10b.

＜ステップＳ１０３＞
セッションの確立後、通信端末１０ａの通信部１１が、音声入力部１５により入力された音声データ、および撮像部１３により撮影された映像データを、管理システム５０へ送信したものとする。 <Step S103>
It is assumed that after the session is established, the communication unit 11 of the communication terminal 10a transmits the audio data input by the audio input unit 15 and the video data captured by the imaging unit 13 to the management system 50.

＜ステップＳ１０４＞
管理システム５０は、通信端末１０ａから音声データおよび映像データを受信すると、図１７に示したアバター生成処理を実行する。これによって、相手拠点の通信端末１０ｂのディスプレイ１２０（ディスプレイ２１４）には、自拠点の参加者をアバターとして示す映像データが表示される。 <Step S104>
When the management system 50 receives audio data and video data from the communication terminal 10a, it executes the avatar generation process shown in FIG. 17. As a result, video data showing the participant at the own base as an avatar is displayed on the display 120 (display 214) of the communication terminal 10b at the other base.

＜ステップＳ１０５＞
ステップＳ１０４のアバター生成処理において、自拠点の参加者のうち少なくともいずれかの参加者の属性情報が顔・アバター・属性対応テーブルに登録されていない場合、管理システム５０の属性情報要求部６４は、属性情報が登録されていない参加者に対して、属性情報の発話を要求するための指示を、通信部５１を介して通信端末１０ａへ送信する。 <Step S105>
In the avatar generation process of step S104, if the attribute information of at least one of the participants at the own base is not registered in the face/avatar/attribute correspondence table, the attribute information requesting unit 64 of the management system 50: An instruction to request the participant whose attribute information is not registered to speak the attribute information is transmitted to the communication terminal 10a via the communication unit 51.

＜ステップＳ１０６＞
通信端末１０ａの表示制御部１４は、通信部１１を介して当該指示を受信すると、例えば、ディスプレイ１２０（ディスプレイ２１４）に、属性情報の発話を促すメッセージを表示させる。例えば、図２９では、通信端末１０ａのディスプレイ２１４に、属性情報を促すメッセージとして「自己紹介をしてください。会社・所属・役職・名前」のように表示された例を示している。 <Step S106>
When the display control unit 14 of the communication terminal 10a receives the instruction via the communication unit 11, it causes the display 120 (display 214) to display a message prompting the user to utter the attribute information, for example. For example, FIG. 29 shows an example in which a message such as "Please introduce yourself. Company, affiliation, position, name" is displayed on the display 214 of the communication terminal 10a as a message prompting for attribute information.

＜ステップＳ１０７＞
通信端末１０ａのディスプレイ１２０（ディスプレイ２１４）に表示された属性情報の発話を促すメッセージを確認したビデオ会議の参加者は、自身の名前、属する会社の社名、所属および役職等の属性情報を自己紹介として発話する。すると、発話した参加者が利用する通信端末１０ａのマイク１１４ａ（マイク２４１）は、マイクロホンアレイにより発話した音声を収音して音声信号に変換し、通信端末１０ａの音声入力部１５は、当該音声信号を入力（取得）する。 <Step S107>
After seeing the message prompting them to speak the attribute information, displayed on the display 120 (display 214) of the communication terminal 10a, the participants in the video conference speak attribute information such as their name, the name of the company they belong to, their affiliation, and their position as a self-introduction. Then, the microphone 114a (microphone 241) of the communication terminal 10a used by the participant who spoke picks up the spoken voice with its microphone array and converts it into an audio signal, and the audio input unit 15 of the communication terminal 10a inputs (acquires) the audio signal.

＜ステップＳ１０８、Ｓ１０９＞
通信端末１０ａの発話方向特定部２０は、音声入力部１５により入力された音声信号に対して音声処理を行うことにより、音声の発話方向を特定する。通信端末１０の通信部１１は、音声入力部１５により入力された音声データ（音声信号）、撮像部１３により撮影された映像データ、および発話方向特定部２０により特定された発話方向の情報を、管理システム５０へ送信する。なお、ステップＳ１０９で発話方向の情報が管理システム５０へ送信されることに限られず、ビデオ会議のコミュニケーションにおいて発話される度に、通信端末１０ａにおいて発話方向が特定され、当該発話方向の情報が相手拠点の通信端末１０ｂへ送信される。 <Steps S108, S109>
The speech direction specifying unit 20 of the communication terminal 10a specifies the speech direction of the voice by performing audio processing on the audio signal input by the audio input unit 15. The communication unit 11 of the communication terminal 10 transmits the audio data (audio signal) input by the audio input unit 15, the video data captured by the imaging unit 13, and the information on the speech direction specified by the speech direction specifying unit 20 to the management system 50. Note that the transmission of speech direction information is not limited to step S109: each time a participant speaks during the video conference, the communication terminal 10a specifies the speech direction, and the speech direction information is transmitted to the communication terminal 10b at the other site.

＜ステップＳ１１０＞
管理システム５０により通信端末１０ａから音声データ、映像データおよび発話方向の情報が受信されると、図２１に示した属性情報取得処理が実行される。これによって、自拠点の参加者の属性情報が、顔・アバター・属性対応テーブルに登録される。 <Step S110>
When the management system 50 receives audio data, video data, and speech direction information from the communication terminal 10a, the attribute information acquisition process shown in FIG. 21 is executed. As a result, the attribute information of the participant at the own base is registered in the face/avatar/attribute correspondence table.

＜ステップＳ１１１、Ｓ１１２＞
そして、アバター生成処理の実行によりアバターが生成され、属性情報取得処理により自拠点の参加者の属性情報が登録されると、アバターと属性情報との対応関係が確立し、通信端末１０ａから管理システム５０へ音声データ、映像データおよび発話方向の情報が送信されると、図２４に示したアバター制御処理が実行され、音声を発話している動作が反映したアバターの映像データが、通信端末１０ｂへ送信される。 <Steps S111, S112>
Then, when an avatar has been generated by the avatar generation process and the attribute information of the participant at the own site has been registered by the attribute information acquisition process, the correspondence between the avatar and the attribute information is established. When audio data, video data, and speech direction information are then transmitted from the communication terminal 10a to the management system 50, the avatar control process shown in FIG. 24 is executed, and video data of the avatar reflecting the speaking motion is transmitted to the communication terminal 10b.

以上のステップＳ１０１～Ｓ１１２の流れによって、通信システム１の全体的な動作が行われる。 The overall operation of the communication system 1 is performed through the flow of steps S101 to S112 described above.

以上のように、本実施形態に係る通信システム１では、ビデオ会議を開始時に参加者の属性情報が登録されていない場合、管理システム５０から、参加者が用いる通信端末１０に対して、属性情報の発話を要求するための指示を送信し、通信端末１０は、属性情報の発話を促す動作（例えばメッセージ表示または音声出力等）を行う。そして、管理システム５０は、通信端末１０において入力された音声データをテキスト化して、登録済みワードと一致するキーワードを抽出して、発話した参加者の属性情報として登録するものとしている。これによって、ビデオ会議の開始前に属性情報が登録されていなくても、属性情報を取得することができるので、当該属性情報に基づいて参加者の情報（例えばアバター、実画像）に対する表示制御を行うことができる。また、このように、属性情報に基づいて参加者の情報に対する表示制御を行うことによって、参加者の立場等を把握することができ、円滑に会議を進めることができる。なお、属性情報に基づいた参加者の情報に対する表示制御としては、参加者を示すものとしてアバターを表示させることが必ずしも必須ではなく、例えば、参加者の実画像（参加者の情報の一例）の近傍に、属性情報を表示させる等の制御を行うものとしてもよい。 As described above, in the communication system 1 according to the present embodiment, if the attribute information of a participant is not registered when a video conference starts, the management system 50 transmits an instruction requesting an utterance of the attribute information to the communication terminal 10 used by the participant, and the communication terminal 10 performs an operation (for example, displaying a message or outputting audio) that prompts the participant to speak the attribute information. The management system 50 then converts the audio data input at the communication terminal 10 into text, extracts keywords that match the registered words, and registers them as the attribute information of the participant who spoke. As a result, attribute information can be acquired even if it was not registered before the start of the video conference, so display control of participant information (for example, avatars or real images) can be performed based on that attribute information. Furthermore, by controlling the display of participant information based on the attribute information in this way, the participants' positions and the like can be understood, and the meeting can proceed smoothly.
Note that display control of participant information based on attribute information does not necessarily require displaying an avatar to represent the participant; for example, control may be performed such that the attribute information is displayed near a real image of the participant (an example of participant information).

また、本実施形態に係る通信システム１では、管理システム５０は、通信端末１０で撮影された映像データから参加者の顔を検出し、検出した顔検出情報と一致する、登録済みの顔認識情報に対応するアバター情報から、当該参加者を示すアバターを生成している。このように参加者を示すアバターの映像データを用いることによって、実画像の場合と比べてより少ないフレームレートで転送することができるのでデータ通信量を低減することができ、ディスプレイへのスペック要求を下げることができる。 Further, in the communication system 1 according to the present embodiment, the management system 50 detects the faces of participants from the video data captured by the communication terminal 10, and generates an avatar representing each participant from the avatar information corresponding to the registered face recognition information that matches the detected face detection information. By using video data of avatars representing participants in this way, the data can be transferred at a lower frame rate than real images, which reduces the amount of data communication and lowers the specification requirements for the display.

また、本実施形態に係る通信システム１では、各拠点の通信端末１０において発話した参加者の発話方向を特定し、管理システム５０で検出した参加者の顔画像と、当該発話方向とを対応付けることで、発話をした参加者のアバターを特定する。そして、管理システム５０は、発話の音声データに合わせて、アバターが音声を発話している動作となるように反映した映像データを生成し、相手拠点の通信端末１０へ送信する。これによって、自拠点のアバターを含む映像データを見た相手拠点の参加者は、アバターを介してどの参加者が発話しているのかを認識することができるので、アバターを介したビデオ会議においても、スムーズなコミュニケーションを実現することができる。 Furthermore, in the communication system 1 according to the present embodiment, the communication terminal 10 at each site specifies the speech direction of a participant who spoke, and the avatar of that participant is identified by associating the face image of the participant detected by the management system 50 with the speech direction. Then, the management system 50 generates video data in which the avatar is shown speaking in time with the audio data of the utterance, and transmits it to the communication terminal 10 at the other site. As a result, participants at the other site who view video data that includes the avatars of the own site can recognize, through the avatars, which participant is speaking, so smooth communication can be achieved even in a video conference conducted through avatars.

なお、ビデオ会議中に、参加者がその拠点である会議室の中で席を移動する可能性もあり、この場合には参加者の発話方向も変わることになる。この場合、例えば、顔検出部６１による参加者の顔の検出、および検出した顔の画像の中心座標の算出の処理を、一定期間ごとに実行、または、映像データから参加者の移動を検出した場合に実行する等によって対応することができる。これによって、上述のアバター制御処理の中で、対応付け部６５は、発話者・発話方向対応テーブルを参照し、変化した発話方向が、どの座標（顔中心座標）に対応するのかを特定することができ、当該中心座標を有する顔画像に対応する参加者のアバターを特定することができる。 Note that during a video conference, participants may change seats within the conference room at their site, in which case their speech directions also change. This can be handled, for example, by having the face detection unit 61 detect the participants' faces and calculate the center coordinates of the detected face images at regular intervals, or whenever movement of a participant is detected from the video data. As a result, in the avatar control process described above, the association unit 65 can refer to the speaker/speech direction correspondence table to specify which coordinates (face center coordinates) the changed speech direction corresponds to, and can identify the avatar of the participant corresponding to the face image having those center coordinates.
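The two refresh triggers mentioned above, a fixed interval and detected movement, can be combined in a small cache around the face detection step. This is only a sketch: `FaceCenterCache`, its `detect` callback (a stand-in for the face detection unit 61), and the interval value are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

Point = Tuple[float, float]


class FaceCenterCache:
    """Keeps face center coordinates fresh for the speaker lookup."""

    def __init__(self, detect: Callable[[], Dict[str, Point]], interval_s: float) -> None:
        self._detect = detect  # stand-in for face detection and center calculation
        self._interval_s = interval_s
        self._last_run: Optional[float] = None
        self.centers: Dict[str, Point] = {}

    def update(self, now_s: float, movement_detected: bool = False) -> bool:
        """Re-run detection when due or when movement was seen; return True if it ran."""
        due = self._last_run is None or now_s - self._last_run >= self._interval_s
        if due or movement_detected:
            self.centers = self._detect()
            self._last_run = now_s
            return True
        return False


calls = []


def fake_detect() -> Dict[str, Point]:
    """Hypothetical detector that records each invocation."""
    calls.append(1)
    return {"A": (150.0 + len(calls), 350.0)}


cache = FaceCenterCache(fake_detect, interval_s=10.0)
print(cache.update(0.0))                           # first call always detects
print(cache.update(3.0))                           # not due yet
print(cache.update(4.0, movement_detected=True))   # movement forces a refresh
```

The refreshed `centers` dictionary is what the association step would compare against the coordinate looked up from the speech direction.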

Furthermore, the software configuration of the communication terminal 10 according to the above embodiment is not limited to the configuration shown in FIG. 5 and may be, for example, the configuration shown in FIG. 30. FIG. 30 is a diagram illustrating an example of the software configuration of the communication terminal according to the embodiment when a Web application is used. FIG. 5 was used above to describe the operation in which the communication application A is executed on the communication terminal 10, but the same processing can also be realized by a Web application. A Web application operates through cooperation between a program that runs on a browser, written for example in JavaScript (registered trademark), and a program on the Web server side, and the user uses it in the browser. That is, as shown in FIG. 30, the communication terminal 10 downloads the program WA (HTML (HyperText Markup Language) + JavaScript (registered trademark) + CSS, etc.) from the management system 50 and executes it on the browser 1040. The browser 1040 operates under the control of the OS 1020. The communication terminal 10 can use the services provided by the management system 50 by exchanging data with the management system 50 using a protocol such as HTTP (Hypertext Transfer Protocol) or HTTPS (Hypertext Transfer Protocol Secure). In this usage pattern, the communication application A does not need to be downloaded to the communication terminal 10 in advance.

Each function of the above embodiment can be realized by one or more processing circuits. Here, a "processing circuit" includes a processor programmed to execute each function by software, such as a processor implemented by an electronic circuit, as well as devices designed to execute each of the functions described above, such as an ASIC, a DSP (Digital Signal Processor), an FPGA, an SoC (System on a Chip), a GPU (Graphics Processing Unit), and conventional circuit modules.

In the above embodiment, when at least one of the functional units of the communication terminal 10 and the management system 50 is realized by executing a program, the program is provided pre-installed in a ROM or the like. The programs executed by the communication terminal 10 and the management system 50 according to the above embodiment may also be provided as files in an installable or executable format recorded on a computer-readable recording medium such as a CD-ROM (Compact Disc Read Only Memory), a flexible disk (FD), a CD-R (Compact Disk-Recordable), a DVD, or an SD card. The programs may also be stored on a computer connected to a network such as the Internet and provided by being downloaded via the network, or may be provided or distributed via a network such as the Internet. The programs executed by the communication terminal 10 and the management system 50 according to the above embodiment have a module configuration that includes at least one of the functional units described above; as actual hardware, the CPU reads the program from the storage device described above and executes it, whereby each of those functional units is loaded onto and generated on the main storage device.

1 Communication system
2 Communication network
2a to 2d LAN
2ab, 2cd Dedicated line
2i Internet
10, 10a, 10aa, 10ab, 10b, 10ba, 10bb, 10ca, 10cb, 10da, 10db Communication terminal
11 Communication unit
12 Operation input reception unit
13 Imaging unit
14 Display control unit
15 Audio input unit
16 Audio output unit
17 Storage/reading unit
18 Storage unit
19 Authentication request unit
20 Speech direction identification unit
30ab, 30cb PC
50 Management system
51 Communication unit
52 Authentication unit
53 State management unit
54 Terminal extraction unit
55 Terminal state acquisition unit
56 Session control unit
57 Storage/reading unit
58 Storage unit
61 Face detection unit
62 Comparison unit
63 Generation unit
64 Attribute information request unit
65 Association unit
66 Text conversion unit
67 Extraction unit
68 Registration unit
69 Display control unit
70a to 70d, 70ab, 70cd Router
90 Program provision system
101 CPU
102 ROM
103 RAM
105 Auxiliary storage device
106 Media
107 Media drive
108 Input device
110 Bus line
111 Network I/F
112 Camera
112c Cable
113 Image sensor I/F
114 Smart speaker
114a Microphone
114b Speaker
114c Cable
115 Speaker
115c Cable
116 Audio input/output I/F
117 USB I/F
119 Display I/F
120aa, 120ba, 120ca, 120da Display
120c Cable
201 CPU
202 ROM
203 RAM
204 SSD
205 Network I/F
206 External device connection I/F
210 Bus line
211 Capture device
212 GPU
213 Display controller
214 Display
215 Sensor controller
216 Contact sensor
217 Electronic pen controller
222 Power switch
223 Selection switch
230 USB memory
240 Smart speaker
241 Microphone
242 Speaker
260 Camera
270 PC
290 Electronic pen
301 CPU
302 ROM
303 RAM
305 Auxiliary storage device
306 Recording media
307 Media drive
308 Display
309 Network I/F
310 Bus line
311 Keyboard
312 Mouse
313 DVD
314 DVD drive
315 USB I/F
1010 Work area
1020 OS
1040 Browser
1100-2 Terminal ID
1100-3 Terminal name
1100-4a Offline icon
1100-4b Call-ready icon
1200-1 Start request reception screen
1200-2 "Yes" button
1200-3 "No" button
5001 Authentication management DB
5002 Terminal management DB
5003 Group management DB
5004 Session management DB
A Communication application
WA Program

Japanese Unexamined Patent Application Publication No. 2010-093583

Claims

A communication system that realizes a video conference by having a plurality of communication terminals transmit and receive audio data via a network, the communication system comprising:
an imaging unit that obtains video data of one or more participants in the video conference;
a detection unit that detects the participant from the video data obtained by the imaging unit;
an audio input unit that inputs audio data uttered by the participant;
an extraction unit that extracts attribute information of the participant from the audio data input by the audio input unit;
a first control unit that controls, based on the attribute information extracted by the extraction unit, display on a first communication terminal of information about the participant detected by the detection unit;
a request unit that transmits, to a second communication terminal that communicates with the first communication terminal via the network, an instruction requesting a participant using the second communication terminal to utter the attribute information; and
a second control unit that performs, at the second communication terminal and in accordance with the instruction received from the request unit, a process of prompting the participant using the second communication terminal to utter the attribute information.

The communication system according to claim 1, wherein the second control unit displays, on a display device of the second communication terminal, a message indicating the instruction to the participant using the second communication terminal.

The communication system according to claim 1, wherein the second control unit causes an output device of the second communication terminal to output a voice indicating the instruction to the participant using the second communication terminal.

The communication system according to any one of claims 1 to 3, further comprising a storage unit that stores face recognition information and attribute information of the participant in association with each other, wherein
the detection unit detects the face of the participant from the video data and stores the detected information in the storage unit as the face recognition information,
when the attribute information is stored in the storage unit in association with the face recognition information of the participant detected by the detection unit, the first control unit controls display on the first communication terminal of the information about the participant based on the attribute information, and
when the attribute information is not stored in the storage unit in association with the face recognition information of the participant detected by the detection unit, the request unit transmits the instruction requesting the utterance to the second communication terminal.

The communication system according to claim 4, wherein
the audio input unit inputs audio data uttered by the participant using the second communication terminal after the request unit transmits the instruction requesting the utterance to the second communication terminal,
the extraction unit extracts attribute information of that participant from the audio data of the participant using the second communication terminal input by the audio input unit, and
the communication system further comprises a registration unit that registers, in the storage unit, the attribute information of the participant using the second communication terminal extracted by the extraction unit in association with the face recognition information corresponding to that participant.
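
The flow recited in the two claims above (use stored attribute information when it exists; otherwise request an utterance and register what is extracted) can be pictured with the following Python sketch. It is an illustrative model only: the in-memory dictionary standing in for the storage unit, the `face_id` key, the `request_utterance` callback that stands in for the request, utterance, and extraction steps, and the shape of the attribute data are all assumptions rather than the claimed implementation.

```python
from typing import Callable, Dict, Optional

Attributes = Dict[str, str]  # assumed shape of attribute information


class AttributeStore:
    """Stand-in for the storage unit: face recognition info -> attribute info."""

    def __init__(self) -> None:
        self._by_face: Dict[str, Optional[Attributes]] = {}

    def add_face(self, face_id: str) -> None:
        # Store newly detected face recognition information with no attributes yet.
        self._by_face.setdefault(face_id, None)

    def lookup(self, face_id: str) -> Optional[Attributes]:
        return self._by_face.get(face_id)

    def register(self, face_id: str, attrs: Attributes) -> None:
        # Associate extracted attribute information with the face.
        self._by_face[face_id] = attrs


def resolve_attributes(store: AttributeStore, face_id: str,
                       request_utterance: Callable[[str], Attributes]) -> Attributes:
    """Return stored attributes if present; otherwise ask the participant to
    speak them (hypothetical callback) and register the extracted result."""
    store.add_face(face_id)
    attrs = store.lookup(face_id)
    if attrs is not None:
        return attrs                    # already registered: no request is sent
    attrs = request_utterance(face_id)  # prompt, capture, and extract
    store.register(face_id, attrs)
    return attrs
```

In this model a participant is prompted at most once per face, which matches the idea that display control can proceed even when nothing was registered before the conference started.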

The communication system according to any one of claims 1 to 5, further comprising a generation unit that generates an avatar representing the participant detected by the detection unit, wherein
the first control unit controls display on the first communication terminal of the avatar corresponding to the participant generated by the generation unit, as the information about the participant detected by the detection unit.

The communication system according to claim 6, wherein the first control unit changes the arrangement of the avatar corresponding to the attribute information based on the attribute information.

The communication system according to claim 7, wherein, among the avatars displayed on the first communication terminal, the first control unit causes an output device of the first communication terminal to output the audio data of a participant at a higher sound pressure level the closer the avatar corresponding to that participant is placed.
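
One way to picture the claim above is a gain that falls off with an avatar's on-screen distance from a reference position. The sketch below is a hedged illustration in Python: the reference position `listener`, the inverse-distance rule, the `reference` and `min_gain` parameters, and the function names are assumptions, and the returned value is a linear amplitude factor rather than a calibrated sound pressure level.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]  # on-screen position in pixels (assumed)


def gain_for_distance(distance: float, reference: float = 100.0,
                      min_gain: float = 0.2) -> float:
    """Linear gain in [min_gain, 1.0]: full volume at or inside `reference`,
    then an inverse-distance falloff that never drops below `min_gain`."""
    if distance <= reference:
        return 1.0
    return max(min_gain, reference / distance)


def avatar_gains(listener: Point, avatars: Dict[str, Point]) -> Dict[str, float]:
    """Map each avatar name to the gain applied to that participant's audio."""
    return {name: gain_for_distance(math.dist(listener, pos))
            for name, pos in avatars.items()}
```

Each participant's audio samples would then be scaled by the returned factor before mixing, so that avatars placed nearer the reference position sound louder.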

The communication system according to claim 6, wherein the first control unit displays the attribute information near the avatar corresponding to the attribute information.

The communication system according to any one of claims 6 to 9, further comprising:
a first identifying unit that identifies, based on the audio data input by the audio input unit, the speech direction of the participant who uttered the audio data; and
a second identifying unit that identifies, based on the positions in the video data of the participants detected by the detection unit, the participant corresponding to the speech direction among the participants detected by the detection unit, wherein
the first control unit performs display control on the avatar of the participant identified by the second identifying unit so as to indicate, in accordance with the audio data, that the avatar is speaking the audio of the audio data.

The communication system according to any one of claims 1 to 10, wherein the first control unit displays the information about the participants detected by the detection unit on the first communication terminal divided by the base of each participant.

The communication system according to any one of claims 1 to 10, wherein the first control unit displays the information about the participants detected by the detection unit on the first communication terminal as if the participants were at the same base.
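
The two display modes in the claims above (divided by base, or shown as if everyone were at the same base) can be contrasted with a short Python sketch. The `Participant` type, the `split_by_base` flag, and the `"all"` group key are hypothetical names introduced only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Participant(NamedTuple):
    name: str
    base: str  # the location (base) the participant joins from


def layout_groups(participants: List[Participant],
                  split_by_base: bool) -> Dict[str, List[str]]:
    """Group participant names for display.

    split_by_base=True  -> one group per base, in first-seen order.
    split_by_base=False -> a single group, as if all shared one base.
    """
    if not split_by_base:
        return {"all": [p.name for p in participants]}
    groups: Dict[str, List[str]] = defaultdict(list)
    for p in participants:
        groups[p.base].append(p.name)
    return dict(groups)
```

A display routine could then lay out one screen region per key of the returned dictionary.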

An information processing device that relays transmission and reception of audio data between a plurality of communication terminals conducting a video conference, the information processing device comprising:
a detection unit that detects a participant from video data obtained by an imaging unit that obtains video data of one or more participants in the video conference;
an extraction unit that extracts attribute information of the participant from audio data input by an audio input unit that inputs audio data uttered by the participant;
a first control unit that controls, based on the attribute information extracted by the extraction unit, display on a first communication terminal of information about the participant detected by the detection unit;
a request unit that transmits, to a second communication terminal communicating with the first communication terminal, an instruction requesting a participant using the second communication terminal to utter the attribute information; and
a second control unit that performs, at the second communication terminal and in accordance with the instruction received from the request unit, a process of prompting the participant using the second communication terminal to utter the attribute information.

A communication method that realizes a video conference by having a plurality of communication terminals transmit and receive audio data via a network, the communication method comprising:
an imaging step of obtaining video data of one or more participants in the video conference;
a detection step of detecting the participant from the video data;
an audio input step of inputting audio data uttered by the participant;
an extraction step of extracting attribute information of the participant from the input audio data;
a first control step of controlling, based on the extracted attribute information, display on a first communication terminal of information about the detected participant;
a requesting step of transmitting, to a second communication terminal that communicates with the first communication terminal via the network, an instruction requesting a participant using the second communication terminal to utter the attribute information; and
a second control step of performing, at the second communication terminal and in accordance with the instruction, a process of prompting the participant using the second communication terminal to utter the attribute information.

A program for causing a computer that relays transmission and reception of audio data between a plurality of communication terminals conducting a video conference to execute:
a detection step of detecting a participant from video data obtained by an imaging unit that obtains video data of one or more participants in the video conference;
an extraction step of extracting attribute information of the participant from audio data input by an audio input unit that inputs audio data uttered by the participant;
a first control step of controlling, based on the extracted attribute information, display on a first communication terminal of information about the detected participant;
a requesting step of transmitting, to a second communication terminal communicating with the first communication terminal, an instruction requesting a participant using the second communication terminal to utter the attribute information; and
a second control step of performing, at the second communication terminal and in accordance with the instruction, a process of prompting the participant using the second communication terminal to utter the attribute information.