JP7102859B2

JP7102859B2 - Video Conference Systems, Video Conference Methods, and Programs

Info

Publication number: JP7102859B2
Application number: JP2018065249A
Authority: JP
Inventors: 直志合川; 智木村; 伸正佐藤
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2018-03-29
Filing date: 2018-03-29
Publication date: 2022-07-20
Anticipated expiration: 2038-03-29
Also published as: JP2022136115A; JP7400886B2; JP2019176416A

Description

The present invention relates to a video conference system, a video conference method, and a program.

Video conference systems are used as one way of holding a conference with people at remote locations. In a video conference system, video captured at each location is exchanged between the locations, so that people who are apart from one another can hold a conference.

A technique related to such video conference systems is disclosed in, for example, Patent Document 1 below. Patent Document 1 discloses a technique that (1) authenticates each participant using images of the participants in a remote conference, and (2) executes an action permission process (a process of notifying that the conference can be started) when the number of authenticated participants reaches the required number.

Patent Document 2 below discloses a technique that (1) detects users in front of a display using images from a plurality of imaging devices provided on the display, and (2) moves the display vertically and horizontally based on the detection result, thereby placing the display at a position that is easy for all users to see.

Japanese Unexamined Patent Publication No. 2009-171119
Japanese Unexamined Patent Publication No. 2016-004207

In Patent Document 1 described above, individual persons are identified (authenticated) using an image showing the conference participants. In most cases, however, the participants each face the camera in a different orientation or posture. Consequently, at least some of the participants may not be identifiable from the image. Given the nature of a conference, it is undesirable for a participant to remain unidentified as an individual.

The present invention has been made in view of the above problem. One object of the present invention is to provide a technique for identifying conference participants in a video conference system.

The video conference system of the present invention comprises:
image acquisition means for acquiring an image, generated by a first imaging device, showing the participants in a conference;
person identification means for analyzing the image and executing a person identification process that identifies a person included in the image;
position detection means for detecting the position of an unidentified person who could not be identified by the person identification process; and
process execution means for executing, using the detected position of the unidentified person, a predetermined process for identifying the unidentified person.

The video conference method of the present invention includes, by a computer:
acquiring an image, generated by a first imaging device, showing the participants in a conference;
analyzing the image and executing a person identification process that identifies a person included in the image;
detecting the position of an unidentified person who could not be identified by the person identification process; and
executing, using the detected position of the unidentified person, a predetermined process for identifying the unidentified person.

The program of the present invention causes a computer to execute the video conference method described above.

According to the present invention, conference participants can be identified in a video conference system.

A diagram showing a configuration example of the video conference system in the first embodiment.
A block diagram illustrating the hardware configuration of the video conference system.
A flowchart illustrating the flow of processing executed by the video conference system of the first embodiment.
A flowchart illustrating the flow of processing executed by the video conference system of the first embodiment.
A diagram showing a configuration example of the video conference system of the second embodiment.
A diagram showing a configuration example of the video conference system of the third embodiment.
A diagram showing an example of information for identifying a subject, displayed on a display device.
A diagram showing a configuration example of the video conference system in the fourth embodiment.
A flowchart illustrating the flow of processing executed by the video conference system of the fourth embodiment.
A diagram showing a configuration example of the video conference system in the fifth embodiment.
A diagram illustrating the hardware configuration of the video conference system in the fifth embodiment.
A flowchart illustrating the flow of processing executed by the video conference system of the fifth embodiment.

Embodiments of the present invention are described below with reference to the drawings. In all drawings, like components are given like reference numerals, and their description is omitted as appropriate. Unless otherwise noted, each block in the block diagrams represents a functional unit rather than a hardware unit.

[First Embodiment]
[System configuration example]
FIG. 1 is a diagram showing a configuration example of the video conference system 1 according to the first embodiment. In the video conference system 1 illustrated in FIG. 1, a server device 10 and a plurality of communication terminals 20, one provided at each point where the conference is held, are communicably connected to one another. The video conference is carried out among the communication terminals 20 via the server device 10.

An imaging device 30 and a display device 40 are connected to each communication terminal 20. The imaging device 30 photographs the conference participants at the point where it is installed and generates an image M (hereinafter, "main image") to be displayed on the display device 40 at the other party's point of the video conference. The display device 40 displays the other party's main image M, captured by the imaging device 30 at the other party's point. Information on the participants identified using the main image M (for example, name and affiliation) is superimposed on the main image M displayed on the other party's display device 40. In addition to the imaging device 30, the video conference system 1 includes a mobile imaging device 35. As one example, the imaging device 35 is built into an autonomously movable robot. As another example, the imaging device 35 is built into a portable terminal such as a smartphone, tablet, or notebook computer. When a person could not be identified using the main image M generated by the imaging device 30, the imaging device 35 generates an image S (hereinafter, "sub-image") used to identify that person.

As shown in FIG. 1, the video conference system 1 includes an image acquisition unit 110, a person identification unit 120, a position detection unit 130, and a process execution unit 140. In the example of FIG. 1, these processing units are provided in a single server device 10, but the configuration of the video conference system 1 is not limited to this example. Although not shown, all or some of these processing units may be distributed across, or provided redundantly in, a plurality of server devices.

The image acquisition unit 110 acquires an image, generated by the imaging device 30 (first imaging device), showing the conference participants. In the example of FIG. 1, the image acquisition unit 110 can acquire this image from the communication terminal 20 connected via a network. The image acquisition unit 110 can also acquire, from an imaging device (first imaging device) connected to another communication terminal (not shown), an image of the persons participating in the conference at the place where that other communication terminal is provided.

The person identification unit 120 analyzes the image acquired by the image acquisition unit 110 and executes a person identification process that identifies the persons included in the image. In other words, the person identification unit 120 individually identifies (authenticates) the participants appearing in the image acquired by the image acquisition unit 110.

The person identification unit 120 operates, for example, as follows. First, it detects regions recognized as persons in the image acquired by the image acquisition unit 110. It can use a known general object detection algorithm to detect regions recognized (classified) as "person". It may also detect, for example, the region of a moving object as a person region. A "moving object" can be determined based on, for example, the amount of movement of feature points between a plurality of images arranged in time series. Specifically, the person identification unit 120 can estimate, as a person region, the region of an object containing feature points that move by a reference value or more between the time-series images. The person identification unit 120 then identifies who the person in each region is, based on the result of collating the feature amount extracted from the detected region with the feature amounts of participants registered in advance. The feature amount of each conference participant is stored in advance, for example in the storage device of the server device 10, in association with that participant's information (name, affiliation, etc.). When the person in a detected region cannot be identified, the person identification unit 120 associates with that region information indicating that the person could not be identified (identification failure information). A person "cannot be identified" when, for example, no registered person has a collation score equal to or higher than the reference value.
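The collation step described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not specify the feature representation or the scoring function, so the feature vectors, the cosine-similarity score, the `Participant` class, the `identify` function, and the threshold value of 0.8 are all hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Optional, Sequence


@dataclass
class Participant:
    """Hypothetical record registered before the conference."""
    name: str
    affiliation: str
    feature: Sequence[float]  # assumed: a fixed-length feature vector


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed collation score; the source does not name a scoring method."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(region_feature: Sequence[float],
             registered: Sequence[Participant],
             threshold: float = 0.8) -> Optional[Participant]:
    """Return the best match whose score reaches the reference value.

    Returning None corresponds to the case where no registered person
    scores at or above the reference value (identification failure).
    """
    best, best_score = None, threshold
    for person in registered:
        score = cosine_similarity(region_feature, person.feature)
        if score >= best_score:
            best, best_score = person, score
    return best
```

A caller would attach the identification failure information to the region whenever `identify` returns `None`.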

When there is a person who could not be identified by the person identification process executed by the person identification unit 120 (hereinafter, "unidentified person"), the position detection unit 130 detects the position of that unidentified person. The example of FIG. 1 depicts a case in which, as a result of the person identification process using the main image M acquired by the image acquisition unit 110, three of the four participants (Jane, John, and Nancy) were identified, while the remaining one could not be identified and was judged to be an unidentified person. In this case, the position detection unit 130 detects the position of that remaining unidentified person. As one example, the position detection unit 130 can detect the coordinates (position) of the unidentified person in the coordinate system of the main image M as they are. As another example, it may detect the coordinates (position) of the unidentified person in another coordinate system (for example, the coordinate system of map data of the place where the conference is held), based on the coordinates in the coordinate system of the main image M. In this case, a rule (conversion parameters) for converting coordinates in the coordinate system of the main image M into coordinates in the other coordinate system is prepared in advance, for example in the memory or storage device of the server device 10. The conversion rule can be generated based on, for example, the imaging range of the imaging device 30 installed at the conference location and map data that includes the seat positions at that location. Information for generating the conversion rule is transmitted to the server device 10 before the conference, for example from a terminal (not shown) used by the participants at each point. As one example, the server device 10 displays the map data held by the robot 60 and the image generated by the imaging device 30 on the display device 40, side by side or switchably, and accepts input that associates the position of each seat in the image with the position of that seat in the map data. Based on this input, the server device 10 can generate a rule (conversion parameters) for converting the image coordinate system into the map data coordinate system.
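One way to picture the conversion rule is as an affine transform fitted from seat correspondences. The sketch below assumes exactly three non-collinear seat pairs and an affine relationship between image and map coordinates; the source does not state the form of the conversion parameters, and a real camera view may need a perspective model instead. The function names `fit_affine` and `image_to_map` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(rows: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a 3x3 linear system with Cramer's rule."""
    d = _det3(rows)
    if abs(d) < 1e-12:
        raise ValueError("seat correspondences must not be collinear")
    solution = []
    for col in range(3):
        m = [list(r) for r in rows]
        for i in range(3):
            m[i][col] = rhs[i]
        solution.append(_det3(m) / d)
    return solution


def fit_affine(image_pts: Sequence[Point], map_pts: Sequence[Point]):
    """Fit X = a*x + b*y + c and Y = d*x + e*y + f from three seat pairs."""
    rows = [[x, y, 1.0] for x, y in image_pts]
    ax = _solve3(rows, [p[0] for p in map_pts])
    ay = _solve3(rows, [p[1] for p in map_pts])
    return ax, ay


def image_to_map(params, pt: Point) -> Point:
    """Convert a main-image coordinate into a map-data coordinate."""
    (a, b, c), (d, e, f) = params
    x, y = pt
    return (a * x + b * y + c, d * x + e * y + f)
```

Here the three seat pairs stand in for the operator input that associates seats in the image with seats in the map data; the fitted parameters would be stored and reused for every unidentified person detected in the main image.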

The process execution unit 140 executes a predetermined process for identifying the unidentified person, using the position detected by the position detection unit 130. Details of the predetermined process are described later. Through the predetermined process, the image acquisition unit 110 can acquire a sub-image S generated by the imaging device 35. The person identification unit 120 then executes the person identification process using the sub-image S, whereby the unidentified person is identified.

[Hardware configuration example]
Each functional component of the video conference system 1 may be realized by hardware that implements it (for example, a hard-wired electronic circuit), or by a combination of hardware and software (for example, an electronic circuit and a program that controls it). The following further describes the case in which the functional components of the video conference system 1 are realized in the server device 10 by a combination of hardware and software.

FIG. 2 is a block diagram illustrating the hardware configuration of the video conference system 1. In the example of FIG. 2, the server device 10 has a bus 1010, a processor 1020, a memory 1030, a storage device 1040, an input/output interface 1050, and a network interface 1060.

The bus 1010 is a data transmission path through which the processor 1020, the memory 1030, the storage device 1040, the input/output interface 1050, and the network interface 1060 exchange data with one another. However, the method of connecting the processor 1020 and the other components to one another is not limited to a bus connection.

The processor 1020 is a processor realized by a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or the like.

The memory 1030 is a main storage device realized by a RAM (Random Access Memory) or the like.

The storage device 1040 is an auxiliary storage device realized by an HDD (Hard Disk Drive), an SSD (Solid State Drive), a memory card, a ROM (Read Only Memory), or the like. The storage device 1040 stores program modules that realize the functions of the video conference system 1 (the image acquisition unit 110, the person identification unit 120, the position detection unit 130, the process execution unit 140, and so on). The processor 1020 loads each of these program modules into the memory 1030 and executes it, thereby realizing the function corresponding to that program module.

The input/output interface 1050 is an interface for connecting the server device 10 to various input/output devices. Input devices (not shown) such as a keyboard and a mouse, and output devices (not shown) such as a display and a speaker, can be connected to the input/output interface 1050.

The network interface 1060 is an interface for connecting the server device 10 to a network. This network is, for example, a LAN (Local Area Network) or a WAN (Wide Area Network). The network interface 1060 may connect to the network wirelessly or by wire. As illustrated, the server device 10 is communicably connected, via the network interface 1060, to the plurality of communication terminals 20 provided at the points where the video conference is held. Connected to each communication terminal 20 are an imaging device 30 for photographing the conference participants, a display device 40 for displaying the images generated by the imaging devices 30, and a sound collecting device 50 for picking up audio during the conference. An audio output device (not shown) for outputting the conference audio is also connected to each communication terminal 20. In addition, the server device 10 is connected, via the network interface 1060, to the imaging device 35, which is separate from the imaging device 30 (first imaging device) that generates the main image M.

The image acquisition unit 110 can acquire images showing the conference participants from each communication terminal 20 via the network interface 1060. The server device 10 can transmit images of the other party's participants to each communication terminal 20 via the network interface 1060. The server device 10 can also transmit, to each communication terminal 20 via the network interface 1060, images of the participants at the point where that communication terminal 20 is provided.

[Processing flow]
The flow of processing executed by the video conference system 1 of the first embodiment is described with reference to FIGS. 3 and 4. FIGS. 3 and 4 are flowcharts illustrating the flow of processing executed by the video conference system 1 of the first embodiment.

First, the image acquisition unit 110 acquires from a communication terminal 20 a main image M showing the conference participants at a given point (S102). The image acquisition unit 110 can acquire the main image M generated by the imaging device 30 via the network interface 1060.

The person identification unit 120 detects regions recognized as persons in the main image M acquired in S102 (S104). It then executes the person identification process on a region detected in S104 (S106). When the person cannot be identified (S108: NO), the person identification unit 120 associates with the region that was the target of the person identification process information indicating that the person in that region could not be identified (identification failure information) (S110). When the person can be identified (S108: YES), the person identification unit 120 acquires person information including the name of the identified person and associates it with that person's region (S112). The person information is acquired before the conference in association with the participant's feature amount and is registered in advance in the storage device 1040 or the like. In addition to the person's name, the person information includes the name of the group to which the person belongs (company, department, etc.), the person's job title, and so on. The processing of S106 to S110 is repeated until all the person regions detected in S104 have been processed (S114: NO).
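The per-region loop of S106 to S114 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `label_regions` function, the dictionary layout, and the `identify_region` callback (which stands in for the person identification process and returns `None` on failure) are assumptions introduced here.

```python
from typing import Callable, Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # assumed: (x, y, width, height) in main-image coordinates


def label_regions(
    regions: List[Box],
    identify_region: Callable[[Box], Optional[Dict[str, str]]],
) -> List[dict]:
    """Attach person information or a failure marker to every detected region."""
    results = []
    for box in regions:                  # loop until all regions are processed (S114)
        info = identify_region(box)      # person identification process (S106)
        if info is None:                 # S108: NO
            results.append({"box": box, "identification_failed": True, "person": None})   # S110
        else:                            # S108: YES
            results.append({"box": box, "identification_failed": False, "person": info})  # S112
    return results
```

Regions flagged with `identification_failed` are the ones a later step would treat as persons still to be identified.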

After all the person regions detected in S104 have been processed (S114: YES), the server device 10 superimposes the person information associated with each person in S112 on the main image M, at the position of the region with which that information is associated, and transmits the result to the communication terminals 20 at the other points. As a result, the result of the person identification process using the main image M at one point is displayed, together with that main image M, on the display devices 40 at the other points (S116).

The position detection unit 130 also determines whether an unidentified person exists (S118). It can make this determination based on whether there is a region with which identification failure information is associated. When no unidentified person exists (S118: NO), the subsequent processing is not executed.

When an unidentified person exists (S118: YES), the position detection unit 130 detects the position of the unidentified person (S120). Here, the position detection unit 130 may detect the coordinates (position) of the unidentified person in the coordinate system of the main image M, or it may detect the coordinates (position) of the unidentified person in another coordinate system based on the coordinates in the coordinate system of the main image M.

The process execution unit 140 then executes a predetermined process for identifying the unidentified person, using the position detected by the position detection unit 130 (S122). The unidentified person is identified using a sub-image S generated by the mobile imaging device 35, which is different from the imaging device 30. From another point of view, the predetermined process executed by the process execution unit 140 can be regarded as a process for acquiring a sub-image S in which the unidentified person appears differently from the way the person appears in the main image M. As one example, the process execution unit 140 executes, as the predetermined process, a process of generating information for specifying a shooting position (the position at which the imaging device 35 should perform the shooting operation) and outputting it to the autonomously movable robot in which the mobile imaging device 35 is incorporated. As another example, the process execution unit 140 executes, as the predetermined process, a process of generating and outputting information for identifying the subject (the unidentified person) of the imaging device 35.

The image acquisition unit 110 acquires the additional image (sub-image S) generated by the imaging device 35 in response to the predetermined process of the process execution unit 140, from the device in which the imaging device 35 is incorporated (S124). The person identification unit 120 then identifies the unidentified person by analyzing the sub-image S (S126). Here, the person identification unit 120 acquires, from the person information registered in advance in the storage device 1040 or the like, the person information of the participant identified using the sub-image S. The person identification unit 120 then associates the acquired person information with the region of the unidentified person in the main image M (S128). The server device 10 superimposes the person information acquired in S128 on the main image M, at the position of the region with which that information is associated, and transmits the result to the communication terminals 20 at the other points. As a result, the result of the person identification process using the sub-image S is additionally displayed on the display devices 40 at the other points (S130).

なお、撮影時の環境などにより、未特定人物が不鮮明なサブ画像Ｓが取得される可能性もある。人物特定部１２０は、サブ画像Ｓを解析しても未特定人物が特定できなかった場合、サブ画像Ｓの取り直し指示を、撮像装置３５が組み込まれた装置に対して出力してもよい。また、人物特定部１２０は、サブ画像Ｓの取り直しを予め決められた回数行ったにもかかわらず未特定人物が特定できなかった場合、その未特定人物を部外者（会議の参加人物として予め登録された人物以外の人物）と判断してもよい。この場合、人物特定部１２０は、表示装置４０や図示しないスピーカーなどを用いて、部外者の存在を報知する処理を実行してもよい。 Depending on the shooting environment, the acquired sub-image S may show the unspecified person only indistinctly. If the unspecified person cannot be identified even after analyzing the sub-image S, the person identification unit 120 may output an instruction to retake the sub-image S to the device in which the imaging device 35 is incorporated. Further, if the unspecified person still cannot be identified after the sub-image S has been retaken a predetermined number of times, the person identification unit 120 may determine that the unspecified person is an outsider (a person other than those registered in advance as participants in the conference). In this case, the person identification unit 120 may execute a process of announcing the presence of the outsider using the display device 40, a speaker (not shown), or the like.
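
The retake-and-give-up behavior described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the function name `identify_with_retakes`, the `capture` and `identify` callbacks, and the default limit of three retakes are all assumptions introduced here, since the text only says "a predetermined number of times".

```python
from typing import Callable, Optional, Tuple


def identify_with_retakes(
    capture: Callable[[], bytes],
    identify: Callable[[bytes], Optional[str]],
    max_retakes: int = 3,
) -> Tuple[Optional[str], bool]:
    """Return (person_id, is_outsider).

    capture  -- hypothetical callback that asks the device holding the
                mobile camera for a new sub-image S
    identify -- hypothetical callback that returns a registered person ID,
                or None when the sub-image could not be matched
    """
    # One initial shot plus up to max_retakes retakes.
    for _ in range(1 + max_retakes):
        sub_image = capture()
        person_id = identify(sub_image)
        if person_id is not None:
            return person_id, False
    # Still unidentified after all retakes: treat the person as an outsider.
    return None, True
```

A caller would typically raise the outsider notification (on the display or a speaker) when the second element of the returned tuple is true.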

以上、本実施形態では、メイン画像Ｍを用いて特定できなかった人物が存在する場合、メイン画像Ｍを生成した撮像装置３０とは異なる、移動型の撮像装置３５により生成されたサブ画像Ｓを用いてその未特定人物を特定する処理が実行される。これにより、未特定人物が、会議の場で特定されないままの状態となることを防止できる。また、本実施形態によれば、会議の参加人物が、会議の場に紛れ込んだ部外者の存在を認識することができる。 As described above, in the present embodiment, when there is a person who could not be identified using the main image M, a process of identifying that unspecified person is executed using the sub-image S generated by the mobile imaging device 35, which is different from the imaging device 30 that generated the main image M. This prevents an unspecified person from remaining unidentified during the conference. Further, according to the present embodiment, the conference participants can recognize the presence of an outsider who has slipped into the conference.

［第２実施形態］ [Second Embodiment]
本実施形態では、撮像装置３５が、自律移動可能なロボットに組み込まれている場合の処理について説明する。本実施形態は、以下で説明する点を除き、第１実施形態と同様である。 In the present embodiment, the processing when the image pickup apparatus 35 is incorporated in the autonomously movable robot will be described. The present embodiment is the same as the first embodiment except for the points described below.

〔システム構成例〕 [System configuration example]
図５は、第２実施形態のビデオ会議システム１の構成例を示す図である。図５に示されるように、本実施形態の撮像装置３５は、自律移動可能なロボット６０に組み込まれている。また、本実施形態において、処理実行部１４０は、未特定人物の位置に基づいて、撮像装置３５の撮影位置を特定するための情報を生成する。また、処理実行部１４０は、撮像装置３５の撮影位置を特定するための情報をロボット６０に出力することにより、その情報により特定される位置に当該ロボット６０を誘導して撮影を実行させる。 FIG. 5 is a diagram showing a configuration example of the video conference system 1 of the second embodiment. As shown in FIG. 5, the image pickup apparatus 35 of the present embodiment is incorporated in the autonomously movable robot 60. Further, in the present embodiment, the processing execution unit 140 generates information for specifying the shooting position of the imaging device 35 based on the position of the unspecified person. Further, the processing execution unit 140 outputs information for specifying the shooting position of the imaging device 35 to the robot 60, and guides the robot 60 to the position specified by the information to execute shooting.

処理実行部１４０は、位置検出部１３０により検出された未特定人物の位置から、ロボット６０に組み込まれた撮像装置３５の撮影位置を特定する。ここで、処理実行部１４０は、撮像装置３５の撮影位置を、ロボット６０が保持するマップデータ上での位置として算出する。なお、本実施形態において、位置検出部１３０は、メイン画像Ｍの座標系での未特定人物の座標（位置）を検出してもよいし、メイン画像Ｍの座標系での未特定人物の座標（位置）を基にマップデータの座標系での未特定人物の座標（位置）を検出してもよい。前者の場合、処理実行部１４０は、メイン画像Ｍの座標系での未特定人物の座標（位置）をマップデータの座標系での座標（位置）に変換するルール（変換パラメータ）を用いて、撮像装置３５の撮影位置を算出する。そして、算出した撮影位置を特定する情報をロボット６０に出力する。後者の場合、処理実行部１４０は、位置検出部１３０により検出された位置を特定する情報をロボット６０に出力すればよい。また、処理実行部１４０は、ロボット６０に撮影時の角度を示す情報（撮像装置３５をどの方向にどの程度傾けるかを示す情報）を生成し、撮影位置に対応付けて出力することができる。具体的には、処理実行部１４０は、メイン画像Ｍ内での未特定人物の顔の位置（高さ）を更に判定し、その顔の位置および撮影位置を基準とする撮像装置３５の撮像可能範囲に基づいて、撮影時の角度を算出することができる。 The processing execution unit 140 determines the shooting position of the imaging device 35 incorporated in the robot 60 from the position of the unspecified person detected by the position detection unit 130. Here, the processing execution unit 140 calculates the shooting position of the imaging device 35 as a position on the map data held by the robot 60. In the present embodiment, the position detection unit 130 may detect the coordinates (position) of the unspecified person in the coordinate system of the main image M, or may detect the coordinates (position) of the unspecified person in the coordinate system of the map data based on the coordinates (position) in the coordinate system of the main image M. In the former case, the processing execution unit 140 calculates the shooting position of the imaging device 35 using a rule (conversion parameters) for converting the coordinates (position) of the unspecified person in the coordinate system of the main image M into coordinates (position) in the coordinate system of the map data, and outputs information specifying the calculated shooting position to the robot 60. In the latter case, the processing execution unit 140 only needs to output information specifying the position detected by the position detection unit 130 to the robot 60. The processing execution unit 140 can also generate information indicating the angle at the time of shooting (information indicating in which direction and by how much the imaging device 35 is to be tilted) for the robot 60 and output it in association with the shooting position. Specifically, the processing execution unit 140 further determines the position (height) of the face of the unspecified person in the main image M, and can calculate the angle at the time of shooting based on that face position and the imageable range of the imaging device 35 with the shooting position as a reference.
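
The coordinate conversion and the tilt calculation described above can be illustrated with a small sketch. The text does not say what form the conversion rule takes, so this sketch assumes it is a 3x3 planar homography from main-image pixels to map coordinates. It also assumes the face height has already been estimated in meters. The names `image_to_map` and `tilt_angle_deg` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]


def image_to_map(h: Matrix3, u: float, v: float) -> Tuple[float, float]:
    """Map pixel (u, v) in main image M to map coordinates.

    h is an assumed 3x3 homography standing in for the "conversion
    parameters"; it would come from calibrating the fixed camera.
    """
    x = h[0][0] * u + h[0][1] * v + h[0][2]
    y = h[1][0] * u + h[1][1] * v + h[1][2]
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    if w == 0:
        raise ValueError("pixel maps to a point at infinity")
    return x / w, y / w


def tilt_angle_deg(camera_height_m: float, face_height_m: float,
                   distance_m: float) -> float:
    """Upward tilt (degrees) needed to center the face.

    Positive means tilt up, negative means tilt down. distance_m is the
    horizontal distance from the shooting position to the person.
    """
    return math.degrees(math.atan2(face_height_m - camera_height_m, distance_m))
```

For example, with a camera mounted 0.5 m above the floor, a face at 1.5 m, and a shooting position 1 m away, the sketch yields an upward tilt of about 45 degrees.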

ロボット６０は、処理実行部１４０から取得した情報により特定される位置まで移動する。なお、ロボット６０は、既知の自己位置推定アルゴリズムを使って自己位置を算出して、目的とする位置（処理実行部１４０から取得した情報により特定される位置）まで移動することができる。ロボット６０は、目的とする位置に到達すると、撮像装置３５に撮影動作を実行させる。これにより、メイン画像Ｍを用いて特定されなかった未特定人物が写るサブ画像Ｓが生成される。また、ロボット６０は、撮像装置３５により生成されたサブ画像Ｓを画像取得部１１０に送信する。その結果、第１実施形態で説明したように、サブ画像Ｓを用いた人物特定処理が実行可能となる。 The robot 60 moves to a position specified by the information acquired from the processing execution unit 140. The robot 60 can calculate its own position using a known self-position estimation algorithm and move to a target position (a position specified by information acquired from the processing execution unit 140). When the robot 60 reaches a target position, the robot 60 causes the image pickup device 35 to perform a shooting operation. As a result, a sub-image S in which an unspecified person who has not been specified using the main image M is captured is generated. Further, the robot 60 transmits the sub-image S generated by the image pickup apparatus 35 to the image acquisition unit 110. As a result, as described in the first embodiment, the person identification process using the sub-image S can be executed.

また、ロボット６０の動きを人が操作できるようにしてもよい。例えば、会議の参加人物が、携帯型端末（スマートフォンやノートパソコンなど）、或いは、専用のリモートコントローラを操作して、ロボット６０に対して移動指示を送信してもよい。ロボット６０の動作は、携帯型端末や専用のリモートコントローラから受信した移動指示によって制御される。なおこの場合において、会議の参加人物は、後述の第３実施形態で説明するような被写体を特定するための情報（表示装置４０上に出力される情報）を確認することにより、ロボット６０を移動させるべき位置を判断することができる。 The movement of the robot 60 may also be made operable by a person. For example, a conference participant may operate a portable terminal (a smartphone, a laptop computer, or the like) or a dedicated remote controller to send a movement instruction to the robot 60. The operation of the robot 60 is then controlled by the movement instruction received from the portable terminal or the dedicated remote controller. In this case, the conference participant can determine the position to which the robot 60 should be moved by checking the information for specifying the subject (the information output on the display device 40) described in the third embodiment below.

以上、本実施形態によれば、第１実施形態で説明した効果が得られる。また、本実施形態では、未特定人物が写るサブ画像Ｓを自律移動可能なロボット６０が自動的に取得してくれる。そのため、会議の参加人物は、未特定人物を特定するために何らかの特別なアクションを会議中に起こさなくてもよくなる。つまり、未特定人物を特定する際の手間を省くことができ、ビデオ会議システム１の利便性が向上する。 As described above, according to the present embodiment, the effects described in the first embodiment can be obtained. Further, in the present embodiment, the autonomously movable robot 60 automatically acquires the sub-image S in which an unspecified person is captured. Therefore, the participants in the meeting do not have to take any special action during the meeting to identify the unspecified person. That is, it is possible to save the trouble of identifying an unspecified person, and the convenience of the video conference system 1 is improved.

［第３実施形態］ [Third Embodiment]
本実施形態では、撮像装置３５が、会議の参加人物が所有する携帯型装置（例えば、スマートフォン、タブレット、ノート型パソコンなど）に組み込まれている場合の処理について説明する。本実施形態は、以下の点を除き、第１実施形態と同様である。 In the present embodiment, the processing when the imaging device 35 is incorporated in a portable device (for example, a smartphone, a tablet, a notebook computer, etc.) owned by the participants of the conference will be described. The present embodiment is the same as the first embodiment except for the following points.

図６は、第３実施形態のビデオ会議システム１の構成例を示す図である。図６に示されるように、本実施形態の撮像装置３５は、会議の参加人物が所有する携帯型端末７０に組み込まれている。携帯型端末７０は、例えば、スマートフォン、タブレット、ノート型パソコンなどである。また、本実施形態において、処理実行部１４０は、位置検出部１３０により検出された未特定人物の位置に基づいて、撮像装置３５の被写体（未特定人物）を特定するための情報を生成する。具体的には、処理実行部１４０は、位置検出部１３０により検出された、メイン画像Ｍにおける未特定人物の位置に合わせて、その人物が未特定人物であることを示す情報をメイン画像Ｍに重畳させたデータを生成する。そして、処理実行部１４０は、このように生成された、撮像装置３５の被写体を特定するための情報を表示装置４０（メイン画像Ｍの撮影地点に設けられた表示装置４０）に出力する（例：図７）。 FIG. 6 is a diagram showing a configuration example of the video conference system 1 of the third embodiment. As shown in FIG. 6, the imaging device 35 of the present embodiment is incorporated in a portable terminal 70 owned by a conference participant. The portable terminal 70 is, for example, a smartphone, a tablet, or a notebook computer. In the present embodiment, the processing execution unit 140 generates information for specifying the subject (the unspecified person) of the imaging device 35 based on the position of the unspecified person detected by the position detection unit 130. Specifically, the processing execution unit 140 generates data in which information indicating that a person is an unspecified person is superimposed on the main image M at the position of that unspecified person in the main image M as detected by the position detection unit 130. The processing execution unit 140 then outputs the information generated in this way for specifying the subject of the imaging device 35 to the display device 40 (the display device 40 provided at the location where the main image M is captured) (example: FIG. 7).

図７は、表示装置４０に表示される、被写体を特定するための情報の一例を示す図である。図７では、特定済みの参加人物の氏名を示す情報に加えて、「Ｕｎｋｎｏｗｎ」という文字情報が、メイン画像Ｍに重畳表示されている様子が描かれている。図７の例では、この「Ｕｎｋｎｏｗｎ」という文字情報が、未特定人物であることを示す情報である。メイン画像Ｍの撮影地点にいる会議の参加人物は、図７に示されるような情報（「Ｕｎｋｎｏｗｎ」という文字情報）を確認することにより、撮像装置３５を使ってどの人物を撮影すればよいかを把握することができる。そして、未特定人物と判断された参加人物本人または他の参加人物が、携帯型端末７０に備えられている撮像装置３５を未特定人物と判断された参加人物に向けて撮影操作を実行する。これにより、メイン画像Ｍを用いて特定されなかった未特定人物が写るサブ画像Ｓが生成される。ここで、メイン画像Ｍの中に複数の未特定人物が存在する場合もある。この場合には、携帯型端末７０により撮影されたサブ画像Ｓがどの未特定人物に対応する画像かを示す情報が必要となる。そこで、メイン画像Ｍの中に複数の未特定人物が存在する場合、一例として、携帯型端末７０は、サブ画像Ｓに対応する未特定人物を指定する操作を更に受け付けてもよい。例えば、携帯型端末７０は、サブ画像Ｓの撮影前または撮影後にメイン画像Ｍを表示画面上に表示させ、その中から未特定人物を選択する操作を受け付けてもよい。そして、携帯型端末７０は、撮影動作に応じて、或いは、その端末を操作している人物の更なる操作に応じて、生成されたサブ画像Ｓをサーバ装置１０に送信する。その結果、第１実施形態で説明したように、サブ画像Ｓを用いた人物特定処理が実行可能となる。 FIG. 7 is a diagram showing an example of the information for specifying a subject that is displayed on the display device 40. FIG. 7 shows the text "Unknown" superimposed on the main image M in addition to information indicating the names of the participants who have already been identified. In the example of FIG. 7, the text "Unknown" is the information indicating an unspecified person. By checking information such as that shown in FIG. 7 (the text "Unknown"), the conference participants at the location where the main image M is captured can tell which person should be photographed with the imaging device 35. Then, the participant who was determined to be an unspecified person, or another participant, points the imaging device 35 of the portable terminal 70 at the participant determined to be an unspecified person and performs a shooting operation. As a result, a sub-image S showing the unspecified person who was not identified using the main image M is generated. Here, there may be a plurality of unspecified persons in the main image M. In this case, information indicating which unspecified person the sub-image S captured by the portable terminal 70 corresponds to is required. Therefore, when a plurality of unspecified persons are present in the main image M, the portable terminal 70 may, as one example, further accept an operation of designating the unspecified person corresponding to the sub-image S. For example, the portable terminal 70 may display the main image M on its display screen before or after the sub-image S is captured and accept an operation of selecting an unspecified person from it. The portable terminal 70 then transmits the generated sub-image S to the server device 10 in response to the shooting operation or to a further operation by the person operating the terminal. As a result, as described in the first embodiment, the person identification process using the sub-image S can be executed.
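
One way to picture the overlay of FIG. 7 is as a list of text labels anchored to the detected person regions. The sketch below is illustrative only: the `DetectedPerson` type, the bounding-box layout, and the 10-pixel offset above each box are assumptions made here. Only the "Unknown" text comes from the description.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DetectedPerson:
    # Bounding box in main-image pixels: (x, y, width, height). Assumed layout.
    box: Tuple[int, int, int, int]
    # Name of the identified participant, or None if identification failed.
    name: Optional[str]


def build_overlay_labels(
    people: List[DetectedPerson],
) -> List[Tuple[Tuple[int, int], str]]:
    """Return ((x, y), text) pairs to draw on top of main image M."""
    labels = []
    for person in people:
        x, y, _w, _h = person.box
        text = person.name if person.name is not None else "Unknown"
        # Place the label just above the box, clamped to the image top.
        labels.append(((x, max(0, y - 10)), text))
    return labels
```

A renderer on the server side could then draw each returned label onto the main image before it is sent to the display device.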

以上、本実施形態によれば、第１実施形態で説明した効果が得られる。 As described above, according to the present embodiment, the effects described in the first embodiment can be obtained.

［第４実施形態］ [Fourth Embodiment]
本実施形態では、自動的に議事録を作成する機能を更に有する点を除き、上述の各実施形態と同様の構成を有する。 The present embodiment has the same configuration as each of the above-described embodiments, except that it further has a function of automatically creating minutes.

図８は、第４実施形態におけるビデオ会議システム１の構成例を示す図である。図８に例示されるビデオ会議システム１は、リスト作成部１５０、音声取得部１６０、発言者特定部１７０、議事録作成部１８０を更に備える。 FIG. 8 is a diagram showing a configuration example of the video conference system 1 according to the fourth embodiment. The video conference system 1 illustrated in FIG. 8 further includes a list creation unit 150, a voice acquisition unit 160, a speaker identification unit 170, and a minutes creation unit 180.

リスト作成部１５０は、人物特定部１２０の人物特定処理によって特定された人物のリストを作成する。リスト作成部１５０は、例えば次のように動作する。まず、リスト作成部１５０は、人物特定部１２０の人物特定処理で人物が特定された場合に、人物特定部１２０からその結果を取得する。そして、リスト作成部１５０は、人物特定部１２０から取得した人物の特定結果を、メモリ１０３０などに保持されるリストに追加する。これにより、ビデオ会議システム１を利用して開催される会議の参加者のリストを自動的に生成することができる。 The list creation unit 150 creates a list of persons identified by the person identification process of the person identification unit 120. The list creation unit 150 operates as follows, for example. First, when a person is identified by the person identification process of the person identification unit 120, the list creation unit 150 acquires the result from the person identification unit 120. Then, the list creation unit 150 adds the person identification result acquired from the person identification unit 120 to the list held in the memory 1030 or the like. As a result, a list of participants in the conference held by using the video conference system 1 can be automatically generated.

音声取得部１６０は、図示しないマイクにより生成された、会議中の会話の音声データを取得する。発言者特定部１７０は、音声取得部１６０により取得された音声データに関する発言者を特定する。一例として、発言者特定部１７０は、例えば会議の開催前にストレージデバイス１０４０などに予め登録された各参加人物の声紋データとの照合を行うことにより、音声取得部１６０が取得した音声データに関する発言者を特定することができる。他の一例として、発言者特定部１７０は、音声データと同期して取得される画像（撮像装置３０により生成される画像）を解析することによって、音声取得部１６０が取得した音声データに関する発言者を特定することができる。具体的には、発言者特定部１７０は、音声データと同期して取得された画像を解析した結果、口の部分が動いている人物の領域を特定する。そして、口の部分が動いている人物の領域についての人物特定処理の結果から、その発話者を特定することができる。議事録作成部１８０は、発言者特定部１７０による発言者の特定結果と、音声取得部１６０により取得された音声データに基づいて生成されたテキストデータとを対応付けることにより、議事録データを生成する。また、議事録作成部１８０は、リスト作成部１５０により生成された人物のリストを、会議の参加者として議事録データに付加することができる。 The voice acquisition unit 160 acquires voice data of the conversation during the conference, generated by a microphone (not shown). The speaker identification unit 170 identifies the speaker of the voice data acquired by the voice acquisition unit 160. As one example, the speaker identification unit 170 can identify the speaker of the voice data acquired by the voice acquisition unit 160 by collating it with the voiceprint data of each participant registered in advance, for example before the conference is held, in the storage device 1040 or the like. As another example, the speaker identification unit 170 can identify the speaker of the voice data acquired by the voice acquisition unit 160 by analyzing an image acquired in synchronization with the voice data (an image generated by the imaging device 30). Specifically, by analyzing the image acquired in synchronization with the voice data, the speaker identification unit 170 identifies the region of a person whose mouth is moving. The speaker can then be identified from the result of the person identification process for that region. The minutes creation unit 180 generates minutes data by associating the speaker identification result from the speaker identification unit 170 with text data generated from the voice data acquired by the voice acquisition unit 160. The minutes creation unit 180 can also add the list of persons generated by the list creation unit 150 to the minutes data as the conference participants.

〔ハードウエア構成例〕 [Hardware configuration example]
本実施形態のビデオ会議システム１は、第１実施形態と同様のハードウエア構成（例：図２）を有する。本実施形態のストレージデバイス１０４０は、上述のリスト作成部１５０、音声取得部１６０、発言者特定部１７０および議事録作成部１８０の機能を実現するためのプログラムモジュールを更に記憶している。プロセッサ１０２０が、これらのプログラムモジュールをメモリ１０３０上に読み出して実行することにより、上述の本実施形態の各機能が実現される。 The video conference system 1 of the present embodiment has the same hardware configuration as that of the first embodiment (example: FIG. 2). The storage device 1040 of the present embodiment further stores a program module for realizing the functions of the list creation unit 150, the voice acquisition unit 160, the speaker identification unit 170, and the minutes creation unit 180 described above. When the processor 1020 reads and executes these program modules on the memory 1030, each function of the above-described embodiment is realized.

〔処理の流れ〕 [Processing flow]
図９を用いて、本実施形態のビデオ会議システム１により実行される処理の流れについて説明する。図９は、第４実施形態のビデオ会議システム１により実行される処理の流れを例示するフローチャートである。 The flow of processing executed by the video conference system 1 of the present embodiment will be described with reference to FIG. 9. FIG. 9 is a flowchart illustrating a flow of processing executed by the video conference system 1 of the fourth embodiment.

まず、音声取得部１６０は会議の音声データを取得する（Ｓ２０２）。会議の音声データは、各地点に設けられている集音装置５０により生成される。集音装置５０は、通信端末２０に接続されている。音声取得部１６０は、ネットワークインタフェース１０６０を介して各地点の通信端末２０と通信して、その地点の集音装置５０により生成された音声データを取得することができる。 First, the voice acquisition unit 160 acquires the voice data of the conference (S202). The audio data of the conference is generated by the sound collecting device 50 provided at each point. The sound collecting device 50 is connected to the communication terminal 20. The voice acquisition unit 160 can communicate with the communication terminal 20 at each point via the network interface 1060 and acquire the voice data generated by the sound collector 50 at that point.

そして、発言者特定部１７０は、音声取得部１６０により取得された音声データに関する発言者を特定する（Ｓ２０４）。一例として、発言者特定部１７０は、次のようにして、音声取得部１６０により取得された音声データに関する発言者を特定することができる。まず、発言者特定部１７０は、ストレージデバイス１０４０などに事前に登録された各参加人物の声紋データと音声データとを照合して、当該音声データの声紋との一致度が基準を満たす声紋データを特定する。そして、発言者特定部１７０は、特定した声紋データに関連付けられている参加人物の識別情報（人物の氏名、または、人物毎に割り当てられたＩＤなど）を取得することにより、音声取得部１６０により取得された音声データの発言者を特定することができる。他の一例として、発言者特定部１７０は、次のようにして、音声取得部１６０により取得された音声データに関する発言者を特定することができる。まず、発言者特定部１７０は、音声データと同期して画像取得部１１０により取得された画像を解析する。具体的には、発言者特定部１７０は、画像の中から人物の口の領域を検出し、その領域（すなわち、口）が時系列で並ぶ複数の画像間で動いているか否かを判定する。そして、発言者特定部１７０は、口の領域が動いていると判定された人物の領域について、人物特定部１２０の人物特定処理の結果を取得することにより、音声取得部１６０により取得された音声データの発言者を特定することができる。また、発言者特定部１７０は、既知の話者追尾方法（例えば、センサーマイクと顔検出技術とを組み合わせて、音源が位置する方向と人物（顔）の検出位置に基づいて話者を特定する方法）を利用して、発言者を特定してもよい。なお、ここでは、例えば、上述の各実施形態で説明したような処理によって、全ての人物が特定されているものと仮定している。 Then, the speaker identification unit 170 identifies the speaker of the voice data acquired by the voice acquisition unit 160 (S204). As one example, the speaker identification unit 170 can identify the speaker of the voice data acquired by the voice acquisition unit 160 as follows. First, the speaker identification unit 170 collates the voice data with the voiceprint data of each participant registered in advance in the storage device 1040 or the like, and identifies the voiceprint data whose degree of match with the voiceprint of the voice data satisfies a criterion. Then, the speaker identification unit 170 can identify the speaker of the voice data acquired by the voice acquisition unit 160 by acquiring the identification information of the participant associated with the identified voiceprint data (the person's name, an ID assigned to each person, or the like). As another example, the speaker identification unit 170 can identify the speaker of the voice data acquired by the voice acquisition unit 160 as follows. First, the speaker identification unit 170 analyzes the image acquired by the image acquisition unit 110 in synchronization with the voice data. Specifically, the speaker identification unit 170 detects the region of a person's mouth in the image and determines whether or not that region (that is, the mouth) is moving across a plurality of images arranged in chronological order. Then, the speaker identification unit 170 can identify the speaker of the voice data acquired by the voice acquisition unit 160 by acquiring the result of the person identification process of the person identification unit 120 for the region of the person whose mouth region was determined to be moving. The speaker identification unit 170 may also identify the speaker using a known speaker tracking method (for example, a method that combines a sensor microphone with a face detection technique and identifies the speaker based on the direction in which the sound source is located and the detected position of a person (face)). Here, it is assumed that all the persons have been identified, for example, by the processing described in each of the above-described embodiments.
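
The voiceprint collation in S204 can be sketched as a nearest-match search with a threshold. The text only requires that "the degree of match satisfies a criterion", so the specifics here are assumptions: voiceprints are represented as fixed-length feature vectors, the degree of match is cosine similarity, and the criterion is a threshold of 0.8. The names `cosine_similarity` and `identify_speaker` are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(
    utterance: Sequence[float],
    registered: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # assumed criterion, not taken from the text
) -> Optional[str]:
    """Return the ID of the best-matching registered voiceprint, or None."""
    best_id: Optional[str] = None
    best_score = threshold
    for person_id, voiceprint in registered.items():
        score = cosine_similarity(utterance, voiceprint)
        # Keep the highest score that also meets the threshold.
        if score >= best_score:
            best_id, best_score = person_id, score
    return best_id
```

Returning `None` when no registered voiceprint meets the criterion leaves room for the image-based method (mouth movement) to be tried instead.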

議事録作成部１８０は、音声取得部１６０および発言者特定部１７０の処理結果に基づいて、議事録データを生成する（Ｓ２０６）。具体的には、議事録作成部１８０は、音声データをテキスト化するＡＰＩ（Application Programming Interface）などを利用して、音声取得部１６０により取得された音声データをテキストデータ化する。また、議事録作成部１８０は、発言者特定部１７０によって特定された、当該音声データの発言者の情報（例えば、発言者の氏名など）を取得する。そして、議事録作成部１８０は、音声取得部１６０により取得された音声データから生成されたテキストデータと、その音声データに関する発言者として特定された人物の情報とを対応付けて、議事録データに追加する。また、議事録作成部１８０は、リスト作成部１５０により生成された、会議の参加人物リストを読み出し、議事録データに参加人物の情報を付加してもよい。 The minutes creation unit 180 generates minutes data based on the processing results of the voice acquisition unit 160 and the speaker identification unit 170 (S206). Specifically, the minutes creation unit 180 converts the voice data acquired by the voice acquisition unit 160 into text data by using, for example, an API (Application Programming Interface) that converts voice data into text. The minutes creation unit 180 also acquires the information on the speaker of that voice data (for example, the speaker's name) identified by the speaker identification unit 170. Then, the minutes creation unit 180 associates the text data generated from the voice data acquired by the voice acquisition unit 160 with the information on the person identified as the speaker of that voice data, and adds the result to the minutes data. The minutes creation unit 180 may also read the list of conference participants generated by the list creation unit 150 and add the participant information to the minutes data.
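
The association performed in S206 can be pictured as a simple data structure that pairs each transcribed utterance with its identified speaker and carries the participant list alongside. The sketch below is illustrative only: the `Minutes` and `MinutesEntry` types and the rendered text layout are assumptions, and the speech-to-text step is taken as already done.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MinutesEntry:
    speaker: str  # name or ID of the identified speaker
    text: str     # text produced from that speaker's voice data


@dataclass
class Minutes:
    participants: List[str] = field(default_factory=list)
    entries: List[MinutesEntry] = field(default_factory=list)

    def add_utterance(self, speaker: str, text: str) -> None:
        """Associate transcribed text with its speaker and append it."""
        self.entries.append(MinutesEntry(speaker, text))

    def render(self) -> str:
        """Render the minutes as plain text, participant list first."""
        lines = ["Participants: " + ", ".join(self.participants)]
        lines.extend(f"{e.speaker}: {e.text}" for e in self.entries)
        return "\n".join(lines)
```

Here the participant list plays the role of the list produced by the list creation unit 150, and each call to `add_utterance` corresponds to one pass through S202 to S206.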

以上、本実施形態の構成によれば、ビデオ会議システム１を利用した開催される会議の議事録を、自動で作成することができる。これにより、会議の参加人物が議事録の作成する手間を削減することができる。 As described above, according to the configuration of the present embodiment, the minutes of the meeting to be held using the video conference system 1 can be automatically created. As a result, it is possible to reduce the time and effort required for the participants in the meeting to prepare the minutes.

［第５実施形態］ [Fifth Embodiment]
本実施形態は、以下の点で、上述の各実施形態と異なる。 This embodiment differs from each of the above-described embodiments in the following points.

〔システム構成例〕 [System configuration example]
図１０は、第５実施形態におけるビデオ会議システム１の構成例を示す図である。図１０に例示されるように、本実施形態のビデオ会議システム１は、位置検出部１３０および撮像装置３５を備えていない。その代わりに、本実施形態のビデオ会議システム１は、音声取得部１６０および発言者特定部１７０を備えている。音声取得部１６０および発言者特定部１７０の動作は、第４実施形態で説明した動作と同様である。本実施形態の処理実行部１４０は、音声データと同期して取得されたメイン画像Ｍを解析することによって、未特定人物が音声データに関する発話者か否かを特定する。本実施形態では、メイン画像Ｍに基づく個人認証処理と、音声データに基づく個人認証処理が並行して実行される。 FIG. 10 is a diagram showing a configuration example of the video conference system 1 according to the fifth embodiment. As illustrated in FIG. 10, the video conference system 1 of the present embodiment does not include the position detection unit 130 and the image pickup device 35. Instead, the video conference system 1 of the present embodiment includes a voice acquisition unit 160 and a speaker identification unit 170. The operations of the voice acquisition unit 160 and the speaker identification unit 170 are the same as the operations described in the fourth embodiment. The processing execution unit 140 of the present embodiment identifies whether or not the unspecified person is a speaker related to the voice data by analyzing the main image M acquired in synchronization with the voice data. In the present embodiment, the personal authentication process based on the main image M and the personal authentication process based on the voice data are executed in parallel.

〔ハードウエア構成例〕 [Hardware configuration example]
図１１は、第５実施形態におけるビデオ会議システム１のハードウエア構成を例示する図である。図１１に例示されるハードウエア構成は、撮像装置３５が備えられていない点で、図２に例示されるハードウエア構成と異なる。また、本実施形態のストレージデバイス１０４０は、位置検出部１３０の機能を実現するプログラムモジュールを記憶していない。その代わりに、本実施形態のストレージデバイス１０４０は、音声取得部１６０および発言者特定部１７０を実現するためのプログラムモジュールを更に記憶している。また、本実施形態のストレージデバイス１０４０に記憶される処理実行部１４０のプログラムモジュールは、上述した本実施形態の処理実行部１４０の機能を実現する。 FIG. 11 is a diagram illustrating a hardware configuration of the video conference system 1 according to the fifth embodiment. The hardware configuration illustrated in FIG. 11 differs from the hardware configuration illustrated in FIG. 2 in that the image pickup apparatus 35 is not provided. Further, the storage device 1040 of the present embodiment does not store the program module that realizes the function of the position detection unit 130. Instead, the storage device 1040 of the present embodiment further stores a program module for realizing the voice acquisition unit 160 and the speaker identification unit 170. Further, the program module of the processing execution unit 140 stored in the storage device 1040 of the present embodiment realizes the function of the processing execution unit 140 of the present embodiment described above.

〔処理の流れ〕 [Processing flow]
図１２を用いて、本実施形態のビデオ会議システム１により実行される処理の流れについて説明する。図１２は、第５実施形態のビデオ会議システム１により実行される処理の流れを例示するフローチャートである。なお、ここでは、図３のＳ１０２からＳ１１４までの処理（メイン画像Ｍに基づく個人認証処理）が並行して実施されている。 The flow of processing executed by the video conference system 1 of the present embodiment will be described with reference to FIG. 12. FIG. 12 is a flowchart illustrating a flow of processing executed by the video conference system 1 of the fifth embodiment. Here, the processes from S102 to S114 in FIG. 3 (personal authentication process based on the main image M) are performed in parallel.

まず、処理実行部１４０は、並行して実行されるメイン画像Ｍに基づく個人認証処理で、未特定人物が検出されたか否かを判定する（Ｓ３０２）。未特定人物が検出されなかった場合（Ｓ３０２：ＮＯ）、以降の処理は実行されない。 First, the process execution unit 140 determines whether or not an unspecified person has been detected in the personal authentication process based on the main image M executed in parallel (S302). If an unspecified person is not detected (S302: NO), the subsequent processing is not executed.

一方、未特定人物が検出された場合（Ｓ３０２：ＹＥＳ）、発言者特定部１７０は、音声取得部１６０により取得される音声データを用いて、その音声データに関する発言者を特定する処理を開始する（Ｓ３０４）。このＳ３０４の処理の具体的な流れは、図９のＳ２０４の処理と同様である。 On the other hand, when an unspecified person is detected (S302: YES), the speaker identification unit 170 starts a process of identifying the speaker of the voice data, using the voice data acquired by the voice acquisition unit 160 (S304). The specific flow of the process of S304 is the same as that of the process of S204 in FIG. 9.

また、処理実行部１４０は、上述の音声データと同期して取得されたメイン画像Ｍを解析して、Ｓ３０４の処理で特定された発言者が未特定人物と一致するか否かを判定する（Ｓ３０６）。処理実行部１４０は、メイン画像Ｍの中で口の部分が動いている人物の領域に特定失敗情報が関連付けられているか否かに基づいて、発言者が未特定人物か否かを判定することができる。発言者が未特定人物である場合（Ｓ３０６：ＹＥＳ）、処理実行部１４０は、メイン画像Ｍの未特定人物の領域に、発言者として特定された参加人物の人物情報を関連付ける（Ｓ３０８）。この関連付けにより、メイン画像Ｍ上では、未特定人物を示す情報に代わって、その参加人物の人物情報が表示される。 The processing execution unit 140 also analyzes the main image M acquired in synchronization with the above voice data and determines whether or not the speaker identified in the process of S304 matches the unspecified person (S306). The processing execution unit 140 can determine whether or not the speaker is an unspecified person based on whether identification-failure information is associated with the region of the person whose mouth is moving in the main image M. When the speaker is an unspecified person (S306: YES), the processing execution unit 140 associates the person information of the participant identified as the speaker with the region of the unspecified person in the main image M (S308). As a result of this association, the person information of that participant is displayed on the main image M in place of the information indicating an unspecified person.
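
Steps S306 and S308 can be sketched as follows. This is a hedged illustration rather than the embodiment's implementation: the `PersonRegion` type is hypothetical, a `label` of `None` stands in for the identification-failure information, and the choice to do nothing when zero or several mouths are moving is an assumption added here to keep the sketch unambiguous.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PersonRegion:
    # Person information shown for this region; None stands in for the
    # identification-failure information (an assumed representation).
    label: Optional[str]
    # Whether the mouth in this region moves across synchronized frames.
    mouth_moving: bool


def resolve_unknown_speaker(regions: List[PersonRegion],
                            speaker_name: Optional[str]) -> bool:
    """Attach the voice-identified speaker to an unidentified region.

    Returns True when a region was updated (S306: YES followed by S308).
    """
    if speaker_name is None:
        return False  # voice-based identification produced no speaker
    moving = [r for r in regions if r.mouth_moving]
    if len(moving) != 1:
        return False  # assumed policy: skip when the speaker region is ambiguous
    region = moving[0]
    if region.label is not None:
        return False  # S306: NO, the speaker was already identified by image
    region.label = speaker_name  # S308: associate the person information
    return True
```

After a successful update, a display layer that reads `label` would show the participant's information where the unidentified marker used to be.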

本実施形態では、サブ画像Ｓを生成する撮像装置３５を用いる代わりに、音声認証技術を用いて未特定人物が特定することができる。 In the present embodiment, instead of using the image pickup apparatus 35 that generates the sub-image S, an unspecified person can be identified by using a voice authentication technique.

以上、図面を参照して本発明の実施形態について述べたが、これらは本発明の例示であり、上記以外の様々な構成を採用することもできる。 Although the embodiments of the present invention have been described above with reference to the drawings, these are examples of the present invention, and various configurations other than the above can be adopted.

また、上述の説明で用いた複数のフローチャートでは、複数の工程（処理）が順番に記載されているが、各実施形態で実行される工程の実行順序は、その記載の順番に制限されない。各実施形態では、図示される工程の順番を内容的に支障のない範囲で変更することができる。また、上述の各実施形態は、内容が相反しない範囲で組み合わせることができる。 Further, in the plurality of flowcharts used in the above description, a plurality of steps (processes) are described in order, but the execution order of the steps executed in each embodiment is not limited to the order of description. In each embodiment, the order of the illustrated steps can be changed within a range that does not hinder the contents. In addition, the above-described embodiments can be combined as long as the contents do not conflict with each other.

Some or all of the above embodiments may also be described as in the following supplementary notes, but are not limited to them.
1.
A video conference system comprising:
an image acquisition means for acquiring an image, generated by a first imaging device, in which participants in a conference appear;
a person identification means for analyzing the image and executing a person identification process that identifies persons included in the image;
a position detection means for detecting the position of an unspecified person who could not be identified by the person identification process; and
a process execution means for executing, using the detected position of the unspecified person, a predetermined process for identifying the unspecified person.
2.
The video conference system according to 1, wherein
the process execution means executes, as the predetermined process, a process of generating and outputting, based on the position of the unspecified person, information for specifying a shooting position or a subject of a second imaging device, which is a mobile imaging device,
the image acquisition means acquires an additional image generated by the second imaging device, and
the person identification means analyzes the additional image generated by the second imaging device to identify the unspecified person.
3.
The video conference system according to 2, wherein
the second imaging device is incorporated in an autonomously movable robot, and
the process execution means
generates, based on the position of the unspecified person, information for specifying the shooting position of the second imaging device, and
outputs the information for specifying the shooting position to the robot, thereby guiding the robot to the position specified by the information and causing it to execute shooting.
4.
The video conference system according to 2, wherein
the second imaging device is incorporated in a portable terminal owned by a participant in the conference, and
the process execution means
generates, based on the position of the unspecified person, information for identifying the subject of the second imaging device, and
outputs the information for identifying the subject to a display device.
5.
The video conference system according to any one of 1 to 4, further comprising a list creation means for creating a list of the persons identified by the person identification means.
6.
The video conference system according to any one of 1 to 5, further comprising:
a voice acquisition means for acquiring voice data;
a speaker identification means for identifying the speaker of the voice data by analyzing the voice data or an image acquired in synchronization with the voice data; and
a minutes creation means for generating minutes data by associating the speaker identification result with text data generated based on the voice data.
7.
A video conference method comprising, by a computer:
acquiring an image, generated by a first imaging device, in which participants in a conference appear;
analyzing the image and executing a person identification process that identifies persons included in the image;
detecting the position of an unspecified person who could not be identified by the person identification process; and
executing, using the detected position of the unspecified person, a predetermined process for identifying the unspecified person.
8.
The video conference method according to 7, further comprising, by the computer:
executing, as the predetermined process, a process of generating and outputting, based on the position of the unspecified person, information for specifying a shooting position or a subject of a second imaging device, which is a mobile imaging device;
acquiring an additional image generated by the second imaging device; and
analyzing the additional image generated by the second imaging device to identify the unspecified person.
9.
The video conference method according to 8, wherein
the second imaging device is incorporated in an autonomously movable robot,
the method further comprising, by the computer:
generating, based on the position of the unspecified person, information for specifying the shooting position of the second imaging device; and
outputting the information for specifying the shooting position to the robot, thereby guiding the robot to the position specified by the information and causing it to execute shooting.
10.
The video conference method according to 8, wherein
the second imaging device is incorporated in a portable terminal owned by a participant in the conference,
the method further comprising, by the computer:
generating, based on the position of the unspecified person, information for identifying the subject of the second imaging device; and
outputting the information for identifying the subject to a display device.
11.
The video conference method according to any one of 7 to 10, further comprising, by the computer, creating a list of the persons identified by the person identification means.
12.
The video conference method according to any one of 7 to 11, further comprising, by the computer:
acquiring voice data;
identifying the speaker of the voice data by analyzing the voice data or an image acquired in synchronization with the voice data; and
generating minutes data by associating the speaker identification result with text data generated based on the voice data.
13.
A program for causing a computer to execute the video conference method according to any one of 7 to 12.
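The flow described in the items above (identify persons, find those who remain unspecified, then derive a shooting position for a mobile imaging device such as the robot) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the `Detection` type, the `split_detections` and `shooting_position` helpers, the room coordinate system, and the stand-off heuristic are all hypothetical and introduced only for this example. The person identification process itself is assumed to have already produced a name, or `None` on failure, for each detected person.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Detection:
    """One person found in the first imaging device's image (hypothetical type)."""
    name: Optional[str]  # None when the person identification process failed
    x: float             # assumed position in room coordinates (metres)
    y: float


def split_detections(detections: List[Detection]):
    """Separate identified persons from unspecified persons."""
    identified = [d for d in detections if d.name is not None]
    unspecified = [d for d in detections if d.name is None]
    return identified, unspecified


def shooting_position(person: Detection,
                      reference_xy: Tuple[float, float] = (0.0, 0.0),
                      standoff: float = 1.0) -> Tuple[float, float]:
    """Return a point at most `standoff` metres from the person.

    Assumption for illustration only: the point lies on the straight line
    from the person toward a reference point (here the origin). The source
    does not prescribe how the shooting position is chosen.
    """
    dx = reference_xy[0] - person.x
    dy = reference_xy[1] - person.y
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return (person.x, person.y)
    scale = min(standoff, dist) / dist
    return (person.x + dx * scale, person.y + dy * scale)


if __name__ == "__main__":
    detections = [Detection("Participant A", 1.0, 1.0), Detection(None, 3.0, 4.0)]
    identified, unspecified = split_detections(detections)
    for person in unspecified:
        # In the described system this information would be output to the robot.
        print(shooting_position(person))
```

In the portable-terminal variant, the same unspecified position would instead be turned into information shown on a display device so that a participant can photograph the indicated person.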

1 Video conference system
10 Server device
1010 Bus
1020 Processor
1030 Memory
1040 Storage device
1050 Input/output interface
1060 Network interface
110 Image acquisition unit
120 Person identification unit
130 Position detection unit
140 Process execution unit
150 List creation unit
160 Voice acquisition unit
170 Speaker identification unit
180 Minutes creation unit
20 Communication terminal
30 Imaging device
30 First imaging device
35 Imaging device
40 Display device
50 Sound collecting device
60 Robot
70 Portable terminal

Claims

A video conference system comprising:
an information acquisition means for acquiring first information, generated by a first acquisition device, for identifying participants in a conference;
a person identification means for analyzing the first information and executing a person identification process that identifies a person included in the first information;
a specific information detection means for detecting specific information of an unspecified person who could not be identified by the person identification process; and
a process execution means for executing, using the detected specific information, a predetermined process for acquiring second information from which the unspecified person can be identified by analysis.

The video conference system according to claim 1, wherein the person identification means analyzes the second information and executes a person identification process that identifies a person included in the second information.

The video conference system according to claim 1 or 2, wherein
the first information and the second information include biometric information including at least one of face information and voice, and
the specific information includes at least one of position and voice.

The video conference system according to any one of claims 1 to 3, wherein
the process execution means executes, as the predetermined process, a process of generating and outputting, based on the specific information of the unspecified person, information for specifying a shooting position or a subject of a mobile imaging device,
the information acquisition means acquires an additional image generated by the mobile imaging device, and
the person identification means analyzes the additional image generated by the mobile imaging device to identify the unspecified person.

The video conference system according to claim 4, wherein
the mobile imaging device is incorporated in an autonomously movable robot, and
the process execution means
generates, based on the specific information of the unspecified person, information for specifying the shooting position of the mobile imaging device, and
outputs the information for specifying the shooting position to the robot, thereby guiding the robot to the position specified by the information and causing it to execute shooting.

The video conference system according to claim 4, wherein
the mobile imaging device is incorporated in a portable terminal owned by a participant in the conference, and
the process execution means
generates, based on the specific information of the unspecified person, information for identifying the subject of the mobile imaging device, and
outputs the information for identifying the subject to a display device.

The video conference system according to any one of claims 1 to 6, further comprising a list creation means for creating a list of the persons identified by the person identification means.

The video conference system according to any one of claims 1 to 7, further comprising:
a voice acquisition means for acquiring voice data;
a speaker identification means for identifying the speaker of the voice data by analyzing the voice data or an image acquired in synchronization with the voice data; and
a minutes creation means for generating minutes data by associating the speaker identification result with text data generated based on the voice data.

A video conference method comprising, by a computer:
acquiring first information, generated by a first acquisition device, for identifying participants in a conference;
analyzing the first information and executing a person identification process that identifies a person included in the first information;
detecting specific information of an unspecified person who could not be identified by the person identification process; and
executing, using the detected specific information, a predetermined process for acquiring second information from which the unspecified person can be identified by analysis.

A program for causing a computer to execute the video conference method according to claim 9.
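The minutes creation recited above, which associates the speaker identification result with text generated from the voice data, can be sketched as follows. This is only an illustrative sketch: the `Utterance` type, the `build_minutes` function, and the timestamped line format are hypothetical, and both the speaker identification and the speech-to-text conversion are assumed to have been performed elsewhere.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    """One segment of voice data after analysis (hypothetical type)."""
    start: float   # seconds from the start of the recording
    speaker: str   # speaker identification result (assumed given)
    text: str      # text generated from the voice data (assumed given)


def build_minutes(utterances: List[Utterance]) -> str:
    """Associate each speaker with their text, ordered by start time."""
    lines = []
    for u in sorted(utterances, key=lambda item: item.start):
        minutes, seconds = divmod(int(u.start), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {u.speaker}: {u.text}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_minutes([
        Utterance(65.0, "Speaker B", "Agreed."),
        Utterance(3.2, "Speaker A", "Let's begin."),
    ]))
```

Sorting by start time keeps the minutes in chronological order even when segments are analyzed out of order.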