JP2017092675A

JP2017092675A - Information processing apparatus, conference system, information processing method, and program

Info

Publication number: JP2017092675A
Application number: JP2015219495A
Authority: JP
Inventors: 和紀北澤; Kazuki Kitazawa; 清人五十嵐; Kiyoto Igarashi; 耕司桑田; Koji Kuwata; 仁人高橋; Masahito Takahashi; 智幸後藤; Tomoyuki Goto; 宣正銀川; Nobumasa Gingawa; 未来袴谷; Miku Hakamatani
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2015-11-09
Filing date: 2015-11-09
Publication date: 2017-05-25
Anticipated expiration: 2035-11-09
Also published as: JP6544209B2

Abstract

PROBLEM TO BE SOLVED: To provide an information processing apparatus, a conference system, an information processing method, and a program capable of closing up on the video of a user participating in a conference as intended.

SOLUTION: An information processing apparatus comprises: an imaging unit that captures video; an input unit that inputs voice; a recognition unit that recognizes a user from the video; an identification unit that identifies an arrangement pattern indicating in which direction the user is located relative to the information processing apparatus; an acquisition unit that acquires first information including information indicating the user and the user's role associated with that information; a setting unit that sets a priority for the user based on the role in the first information corresponding to the recognized user and on second information associating roles with priorities; an extraction unit that, when voice is input from a plurality of directions, extracts from the video the video region corresponding to the direction of the user whose role has the higher priority; and a transmission unit that transmits video data corresponding to the extracted video region.

SELECTED DRAWING: Figure 7

Description

The present invention relates to an information processing apparatus, a conference system, an information processing method, and a program.

Video conference systems, which hold remote conferences between distant sites over a communication network such as the Internet, have become widespread. In such a system, a terminal device in the conference room of one party captures images (video) of the participants in that room, picks up their speech, and transmits the video data and audio data to the other party's terminal device. The other party's terminal shows the video on a display in its conference room and plays the audio through a speaker, so that a conference between remote sites proceeds in a state close to an in-person meeting.

A video conference system uses a microphone to acquire the voices of conference participants and a camera to acquire video. Because a camera has a limited angle of view, participants outside that angle cannot be captured. A known solution is to use a panoramic camera that can capture all 360 degrees. A microphone, on the other hand, is usually omnidirectional and therefore also picks up ambient sound other than the participants' speech. A known solution is to use a microphone array to give directivity to the sound collection range, which suppresses ambient sound and collects the participants' speech efficiently.

For such a video conference system, a technique has been disclosed in which arrangement patterns of participants during a conference are stored in advance, the pattern matching the actual arrangement of the participants is selected from the patterns stored in the terminal device when the conference starts, and the conference terminal sets the sound collection direction of the microphone array based on the selected pattern (Patent Document 1).

However, with the technique described in Patent Document 1, when, for example, several participants speak at the same time, the system cannot determine which participant's video (image) should be closed up (cut out of the video), so the video does not switch as intended.

The present invention has been made in view of the above, and an object thereof is to provide an information processing apparatus, a conference system, an information processing method, and a program that can close up the video of a user participating in a conference as intended.

In order to solve the above-described problem and achieve the object, the present invention is an information processing apparatus comprising: an imaging unit that captures video; an input unit that inputs voice; a recognition unit that recognizes a user from the video captured by the imaging unit; a specifying unit that specifies an arrangement pattern indicating in which direction, relative to the information processing apparatus, the user recognized by the recognition unit is located in the video captured by the imaging unit; an acquisition unit that acquires first information including at least information indicating a user and the role of the user associated with that information; a setting unit that sets, for the user, a priority corresponding to the user's role, based on the role in the first information corresponding to the user recognized by the recognition unit and on second information in which roles and priorities are associated in advance; a cutout unit that, when voice is input to the input unit from a plurality of directions, gives precedence, among the directions of users specified by the arrangement pattern within the plurality of directions, to the direction of the user whose role has the higher priority, and cuts out from the video a video region including the user corresponding to that direction; and a transmission unit that transmits the video region cut out by the cutout unit.
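The selection rule performed by the cutout unit can be illustrated with a short sketch. This is not the patented implementation: the `SeatedUser` type, the `pick_close_up` function, the angular tolerance used to decide whether a voice direction belongs to a seated user, and the tie-breaking behavior (the first user listed wins) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SeatedUser:
    """Hypothetical record combining the arrangement pattern and the priority."""
    user_id: str          # user identification information
    role: str             # role taken from the first information
    priority: int         # priority set from the second information
    direction_deg: float  # direction of the user relative to the apparatus


def angular_gap(a: float, b: float) -> float:
    """Smallest absolute difference between two bearings, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def pick_close_up(
    users: Iterable[SeatedUser],
    voice_directions: Iterable[float],
    tolerance_deg: float = 20.0,  # assumed matching tolerance, not from the source
) -> Optional[SeatedUser]:
    """Return the highest-priority user located in one of the voice directions."""
    voices = list(voice_directions)
    speaking = [
        u for u in users
        if any(angular_gap(u.direction_deg, v) <= tolerance_deg for v in voices)
    ]
    if not speaking:
        return None
    # max() keeps the first user encountered when priorities are equal.
    return max(speaking, key=lambda u: u.priority)


users = [
    SeatedUser("10507", "chairperson", 3, 0.0),
    SeatedUser("20311", "participant", 1, 120.0),
]
chosen = pick_close_up(users, voice_directions=[118.0, 355.0])
print(chosen.user_id)  # prints 10507: the chairperson outranks the participant
```

When only one direction carries voice, the same function simply returns the user seated in that direction, so the priority matters only for simultaneous speech.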

According to the present invention, the video of a user participating in a conference can be closed up as intended.

FIG. 1 is a diagram illustrating an example of the overall configuration of a conference system according to an embodiment.
FIG. 2 is a diagram illustrating an example of the hardware configuration of an information processing apparatus according to the embodiment.
FIG. 3 is a diagram illustrating an example of the arrangement of a plurality of microphones and of a panoramic camera in the information processing apparatus according to the embodiment.
FIG. 4 is a diagram illustrating an example of the configuration of a conference information table according to the embodiment.
FIG. 5 is a diagram illustrating an example of the configuration of a priority setting table according to the embodiment.
FIG. 6 is a diagram illustrating an example of the configuration of a face recognition feature information table according to the embodiment.
FIG. 7 is a diagram illustrating an example of the functional block configuration of the information processing apparatus according to the embodiment.
FIG. 8 is a diagram illustrating an example of the arrangement of users participating in a conference.
FIG. 9 is a flowchart illustrating an example of the priority setting process of the information processing apparatus according to the embodiment.
FIG. 10 is a flowchart illustrating an example of the flow of the sound collection operation and the video cutout operation of the information processing apparatus according to the embodiment.

Hereinafter, embodiments of an information processing apparatus, a conference system, an information processing method, and a program according to the present invention will be described in detail with reference to FIGS. 1 to 10. The present invention is not limited by the following embodiments, and the constituent elements of the embodiments include those that a person skilled in the art could easily conceive, those that are substantially the same, and those within the so-called range of equivalents. Furthermore, various omissions, substitutions, changes, and combinations of the constituent elements can be made without departing from the gist of the following embodiments.

(Configuration of the conference system)
FIG. 1 is a diagram illustrating an example of the overall configuration of the conference system according to the embodiment. The configuration of the conference system 1 according to the present embodiment will be described with reference to FIG. 1.

As shown in FIG. 1, the conference system 1 according to the present embodiment includes two or more information processing apparatuses (information processing apparatuses 10a, 10b, ...), a conference server 20, and a reservation server 30. The information processing apparatuses 10a and 10b can each communicate with the conference server 20 and the reservation server 30 via a network 2 such as the Internet.

The information processing apparatuses 10a and 10b are conference terminal devices that establish a session with another information processing apparatus under the control of the conference server 20 and transmit and receive audio data and video data over the established session. In this way, the conference system 1 realizes a video conference (hereinafter sometimes simply called a “conference”) among the information processing apparatuses (10a, 10b, ...). When any one of the information processing apparatuses (10a, 10b, ...) shown in FIG. 1 is meant, or when they are referred to collectively, the term “information processing apparatus 10” is used.

The conference server 20 is a server device that monitors whether each information processing apparatus 10 is connected to the conference server 20, controls the calling of each information processing apparatus 10 when a conference starts, and controls information processing during the conference.

The reservation server 30 is a server device on which a user hosting a conference registers (reserves) conference information in advance (date and time, location, participating users, roles, the information processing apparatus to be used, and so on). The conference information is described later with reference to FIG. 4. The reservation server 30 also stores the priority setting table 1002 and the face recognition feature information table 1003 shown in FIGS. 5 and 6, respectively; the details of each table are described later. A management PC (Personal Computer) or the like may be connected to the reservation server 30 via the network 2 so that the conference information can be registered and the priority setting table 1002 and the face recognition feature information table 1003 can be added to and updated.

The configuration of the conference system 1 shown in FIG. 1 is an example. For instance, the conference server 20 and the reservation server 30 are shown as separate server devices, but they may instead be implemented as a single server device.

(Hardware configuration of the information processing apparatus)
FIG. 2 is a diagram illustrating an example of the hardware configuration of the information processing apparatus according to the embodiment. FIG. 3 is a diagram illustrating an example of the arrangement of a plurality of microphones and of a panoramic camera in the information processing apparatus according to the embodiment. The hardware configuration of the information processing apparatus 10 according to the present embodiment will be described in detail with reference to FIGS. 2 and 3.

As shown in FIG. 2, the information processing apparatus 10 according to the present embodiment includes a CPU (Central Processing Unit) 201, a ROM (Read Only Memory) 202, a RAM (Random Access Memory) 203, an auxiliary storage device 204, a media drive 205, an operation button 206, a power switch 207, a network I/F 208, an image sensor I/F 209, a panoramic camera 210, an audio I/F 211, a microphone array 212, a speaker 213, an output I/F 214, and an external device I/F 216.

The CPU 201 is an integrated circuit that controls the operation of the entire information processing apparatus 10. The ROM 202 is a non-volatile storage device that stores programs such as firmware for the information processing apparatus 10. The RAM 203 is a volatile storage device used as a work area for the CPU 201.

The auxiliary storage device 204 is a non-volatile storage device that stores various programs for realizing the operation of the information processing apparatus 10 and various data such as video data and audio data. The auxiliary storage device 204 is, for example, an HDD (Hard Disk Drive) or an SSD (Solid State Drive).

The media drive 205 is a device that controls the reading and writing of data on a recording medium 205a such as a flash memory.

The operation button 206 is a button for performing setting operations and the like on the information processing apparatus 10. The power switch 207 is a switch for turning the power of the information processing apparatus 10 on and off.

The network I/F 208 is an interface for communicating data over the network 2. The network I/F 208 is, for example, a NIC (Network Interface Card). The image sensor I/F 209 is an interface for transferring video data to and from the panoramic camera 210, which captures a subject under the control of the CPU 201 to obtain video data.

The panoramic camera 210 is an imaging device that includes a lens and a solid-state image sensor that converts light into electric charge to digitize an image (video) of a subject. The panoramic camera 210 acquires video data covering the surrounding 360 degrees. Acquiring 360-degree video data makes it possible to capture every user around the information processing apparatus 10 who is participating in the conference. The panoramic camera 210 is connected to the image sensor I/F 209. A CMOS (Complementary Metal Oxide Semiconductor) sensor, a CCD (Charge Coupled Device) sensor, or the like is used as the solid-state image sensor. The panoramic camera 210 is installed, for example, at the center of the upper surface of the information processing apparatus 10, as shown in FIG. 3.
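Because the camera covers the full 360 degrees, a close-up of one user amounts to cutting a horizontal slice out of the panoramic frame. The sketch below shows one way a direction could be mapped to pixel columns. It assumes, purely for illustration, that the frame is an unwrapped strip whose columns map linearly to azimuth; the source does not describe the image layout, and the function name `crop_columns` and the 60-degree default view are hypothetical.

```python
def crop_columns(direction_deg: float, frame_width: int, view_deg: float = 60.0):
    """Map a direction to column ranges of an unwrapped 360-degree frame.

    Returns a list of (start, end) ranges with end exclusive. Two ranges are
    returned when the crop wraps around the seam of the panorama.
    """
    center = (direction_deg % 360.0) / 360.0 * frame_width
    half = view_deg / 360.0 * frame_width / 2.0
    start = int(round(center - half)) % frame_width
    width = int(round(2 * half))
    end = start + width
    if end <= frame_width:
        return [(start, end)]
    # The crop crosses the right edge, so continue from column 0.
    return [(start, frame_width), (0, end - frame_width)]


print(crop_columns(90.0, 3600))  # [(600, 1200)]
print(crop_columns(0.0, 3600))   # [(3300, 3600), (0, 300)]
```

A real device would also have to correct lens distortion before or after the cut, which this sketch leaves out.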

The audio I/F 211 is an interface that, under the control of the CPU 201, handles the input and output of audio signals to and from the microphone array 212, which inputs sound, and the speaker 213, which outputs sound. The microphone array 212 is a sound collection device that inputs the voices of users participating in the conference. The microphone array 212 has a plurality of microphones and, under the control of the CPU 201, can perform directivity control that sets the sound collection direction arbitrarily. The speaker 213 is a device that outputs sound under the control of the CPU 201. The microphone array 212 and the speaker 213 are each connected to the audio I/F 211. The microphone array 212 has, for example, six microphones (212a to 212f), as shown in FIG. 3. The microphones 212a to 212f are arranged, for example, in a distributed manner on the upper surface of the housing of the information processing apparatus 10, as shown in FIG. 3. Under the control of the CPU 201, the microphones 212a to 212f can collect sound from any direction or range within the surrounding 360 degrees by enabling or disabling their input operation or by switching the gain with which each amplifies its input sound. The microphone array 212 is not limited to the six-microphone configuration shown in FIG. 3 and only needs to have a plurality of microphones. Likewise, although the microphones of the microphone array 212 are shown in FIG. 3 as distributed over the housing of the information processing apparatus 10, the microphone array 212 unit containing the microphones 212a to 212f may instead be configured separately from the housing of the information processing apparatus 10.
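The enable/disable form of directivity control described above can be sketched as follows. The bearings assigned to microphones 212a to 212f (evenly spaced at 60-degree steps) and the 120-degree collection window are assumptions for illustration; the source states only that the microphones are distributed over the top of the housing and that inputs can be enabled, disabled, or given different gains.

```python
# Assumed bearings of the six microphones, in degrees (not given in the source).
MIC_BEARINGS_DEG = {
    "212a": 0, "212b": 60, "212c": 120,
    "212d": 180, "212e": 240, "212f": 300,
}


def mic_gains(target_deg: float, window_deg: float = 120.0) -> dict:
    """Return a gain per microphone: 1.0 inside the window, 0.0 outside.

    A gain of 0.0 corresponds to disabling that microphone's input.
    """
    gains = {}
    for name, bearing in MIC_BEARINGS_DEG.items():
        gap = abs(bearing - target_deg) % 360.0
        gap = min(gap, 360.0 - gap)  # shortest way around the circle
        gains[name] = 1.0 if gap <= window_deg / 2.0 else 0.0
    return gains


enabled = sorted(name for name, g in mic_gains(350.0).items() if g > 0)
print(enabled)  # ['212a', '212f']
```

Graded gains (for example, tapering with the angular gap) would fit the same interface; the binary version is used here only because it is the simplest case the text mentions.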

The output I/F 214 is an interface for transmitting video data to an external display device 215 under the control of the CPU 201. The external device I/F 216 is an interface to which external devices such as an external camera, an external microphone, and an external speaker can each be electrically connected by a USB (Universal Serial Bus) cable or the like.

The display device 215 displays the video of users at other sites who are participating in the conference. The display device 215 is, for example, a CRT (Cathode Ray Tube) display, an LCD (Liquid Crystal Display), or an organic EL (Organic Electro-Luminescence) display. The display device 215 is connected to the output I/F 214 by a cable 215a. The cable 215a may be a cable for analog RGB (VGA) signals, a cable for component video, or a cable for HDMI (registered trademark) (High-Definition Multimedia Interface) or DVI (Digital Visual Interface) signals.

The CPU 201, ROM 202, RAM 203, auxiliary storage device 204, media drive 205, operation button 206, power switch 207, network I/F 208, image sensor I/F 209, audio I/F 211, output I/F 214, and external device I/F 216 described above are communicably connected to one another by a bus 217 such as an address bus and a data bus.

The hardware configuration of the information processing apparatus 10 is not limited to the configuration shown in FIG. 2. For example, the media drive 205 may be omitted.

(Information tables)
FIG. 4 is a diagram illustrating an example of the configuration of the conference information table according to the embodiment. The conference information table 1001 stored in the reservation server 30 will be described with reference to FIG. 4.

As described above, the reservation server 30 shown in FIG. 1 stores a conference information table 1001 for managing registered conference information. As shown in FIG. 4, the conference information table 1001 manages, for each piece of conference identification information identifying a conference, the date and time, the location, the user identification information identifying the users, the roles, and the terminal (information processing apparatus 10) used in that conference, in association with one another. Here, a role is the role assigned to a user participating in the conference. Examples of roles include chairperson, minutes (the user who takes the minutes), board writing (the user who writes on the board), and ordinary participant (anyone other than the chairperson, the minutes taker, and the board writer). The terminal used is, for example, identification information that uniquely identifies the conference terminal (information processing apparatus 10) used in the conference room.

For example, in the conference information table 1001 shown in FIG. 4, the conference information whose conference identification information is “002” has the date and time “2015/11/14 09:30”, the location “Meeting room 2-F”, the user identification information “10507, 20311”, the roles “chairperson, participant”, and the terminal used “VD5011”.

Although the conference information table 1001 shown in FIG. 4 is presented as information in table form, it is not limited to this; the information may take any form as long as the values of the fields can be managed in association with one another.

FIG. 5 is a diagram illustrating an example of the configuration of the priority setting table according to the embodiment. The priority setting table 1002 stored in the reservation server 30 will be described with reference to FIG. 5.

The reservation server 30 shown in FIG. 1 stores a priority setting table 1002 (an example of the second information) that manages the priority of each role. As shown in FIG. 5, the priority setting table 1002 manages a priority in association with each conference role. Here, the priority is a value that determines the order of precedence for cutting out, from the video data captured by the panoramic camera 210 during a conference, the video of the user to whom each role is assigned.

For example, in the priority setting table 1002 shown in FIG. 5, the roles “chairperson”, “minutes”, “participant”, “whiteboard”, and “other” are assigned the priorities “3”, “2”, “1”, “4”, and “0”, respectively. In the example of FIG. 5, a larger value means a higher priority. That is, in the priority setting table 1002 shown in FIG. 5, the role “whiteboard” has the highest priority, “4”, and the role “other” has the lowest priority, “0”.

Although the priority setting table 1002 shown in FIG. 5 is presented as information in table form, it is not limited to this; the information may take any form as long as roles and priorities can be managed in association with each other.
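As one example of a non-table form, the role-to-priority association of FIG. 5 fits naturally in a dictionary. The sketch below uses the priority values given for FIG. 5 and the user identifiers from the FIG. 4 example; the function name `set_priorities`, the position-wise pairing of user identifiers with roles, and the fallback to the “other” role for users without a registered role are assumptions for illustration.

```python
# Second information: role -> priority, using the values described for FIG. 5.
ROLE_PRIORITY = {
    "chairperson": 3,
    "minutes": 2,
    "participant": 1,
    "whiteboard": 4,
    "other": 0,
}


def set_priorities(roles_by_user: dict, recognized_user_ids: list) -> dict:
    """Assign each recognized user the priority of that user's role.

    Users with no registered role are treated as "other" (an assumption).
    """
    priorities = {}
    for user_id in recognized_user_ids:
        role = roles_by_user.get(user_id, "other")
        priorities[user_id] = ROLE_PRIORITY.get(role, ROLE_PRIORITY["other"])
    return priorities


# First information for conference "002", pairing IDs and roles by position.
roles = {"10507": "chairperson", "20311": "participant"}
print(set_priorities(roles, ["10507", "20311"]))  # {'10507': 3, '20311': 1}
```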

FIG. 6 is a diagram illustrating an example of the configuration of the face recognition feature information table according to the embodiment. The face recognition feature information table 1003 stored in the reservation server 30 will be described with reference to FIG. 6.

The reservation server 30 shown in FIG. 1 stores a face recognition feature information table 1003 that manages the feature information of the face image corresponding to each user. As shown in FIG. 6, the face recognition feature information table 1003 manages, for each piece of user identification information identifying a user, the feature information of that user's face image in association with it. Here, the feature information of a user's face image is information that includes, for example, the contour of the user's face and the shapes and relative positions of parts such as the eyes, nose, chin, and cheekbones, and is used to recognize the user's face in an image captured by the panoramic camera 210.

For example, in the face recognition feature information table 1003 shown in FIG. 6, the feature information “{72, 123, -3, ..., -110, 56, 219}” is associated with the user identification information “20391”.

(Functional block configuration of the information processing apparatus)
FIG. 7 is a diagram illustrating an example of the functional block configuration of the information processing apparatus according to the embodiment. FIG. 8 is a diagram illustrating an example of the arrangement of users participating in a conference. The functional block configuration of the information processing apparatus 10 according to the present embodiment will be described in detail with reference to FIGS. 7 and 8.

As shown in FIG. 7, the information processing apparatus 10 according to the present embodiment includes an acquisition unit 101, a recognition unit 102, a specifying unit 103, a cutout unit 104, a switching unit 105, a setting unit 106, a transmission unit 107, a reception unit 108, an imaging control unit 109, a display control unit 110, an audio output control unit 111, an input unit 112, a storage unit 113, an operation unit 114, a communication unit 115, an imaging unit 116, a display unit 117, and an audio output unit 118.

The acquisition unit 101 is a functional unit that acquires conference information (the first information) from the reservation server 30 via the communication unit 115 and the network 2. Specifically, the acquisition unit 101 transmits, for example, an acquisition request for conference information, together with the date and time, the location, and the terminal used for the conference, to the reservation server 30 via the communication unit 115 and the network 2. On receiving the acquisition request, the reservation server 30 refers to the conference information table 1001 shown in FIG. 4 and transmits the user identification information and roles corresponding to the received date and time, location, and terminal used to the acquisition unit 101 via the network 2 and the communication unit 115. The acquisition unit 101 is realized, for example, by the CPU 201 shown in FIG. 2 executing a program.
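The server-side lookup in this exchange can be sketched as a search of the conference information table keyed on date and time, location, and terminal. The `ConferenceRecord` type and `find_user_roles` function are hypothetical names, the network transport is omitted, and the pairing of each user identifier with a role by list position is an assumption about how the FIG. 4 fields relate.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ConferenceRecord:
    """One row of the conference information table (hypothetical layout)."""
    conference_id: str
    start: str                  # date and time of the conference
    location: str
    user_roles: Dict[str, str]  # user identification information -> role
    terminal: str               # identifier of the terminal used


def find_user_roles(
    table: List[ConferenceRecord], start: str, location: str, terminal: str
) -> Optional[Dict[str, str]]:
    """Return the users and roles of the conference matching the request."""
    for record in table:
        if (record.start, record.location, record.terminal) == (start, location, terminal):
            return record.user_roles
    return None  # no reservation matches the request


table = [
    ConferenceRecord(
        "002", "2015/11/14 09:30", "Meeting room 2-F",
        {"10507": "chairperson", "20311": "participant"}, "VD5011",
    )
]
print(find_user_roles(table, "2015/11/14 09:30", "Meeting room 2-F", "VD5011"))
```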

The recognition unit 102 is a functional unit that recognizes users from the face images of one or more users contained in the image captured by the imaging unit 116 (hereinafter sometimes referred to as “face recognition”). Specifically, the recognition unit 102 first extracts, from the face images of the one or more users contained in the captured image, feature information including the shape and relative position of parts such as the facial contour, eyes, nose, chin, and cheekbones. Next, the recognition unit 102 transmits the extracted feature information for the one or more users to the reservation server 30 via the communication unit 115 and the network 2. On receiving the feature information extracted by the recognition unit 102, the reservation server 30 refers to the face recognition feature information table 1003 shown in FIG. 6 and transmits the user identification information corresponding to the feature information that matches the received feature information to the recognition unit 102 via the network 2 and the communication unit 115. The recognition unit 102 recognizes the users participating in the conference by receiving this user identification information. That is, the users indicated by the user identification information received by the recognition unit 102 are the users who actually participate in the conference. Here, the feature information extracted by the recognition unit 102 is regarded as matching the feature information in the face recognition feature information table 1003 not only when the two are exactly identical, but also when the two can be judged to indicate substantially the same user. The recognition unit 102 is realized, for example, by the CPU 201 shown in FIG. 2 executing a program.
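The notion that two pieces of feature information "match" when they indicate substantially the same user can be illustrated with a tolerance-based comparison. The following is only a minimal sketch: the feature vectors, the Euclidean distance measure, the threshold value, and the names `FEATURE_TABLE` and `match_user` are all assumptions made for illustration and are not taken from the embodiment.

```python
import math

# Hypothetical stand-in for the face recognition feature information table 1003:
# user identification information -> feature vector (values are illustrative only).
FEATURE_TABLE = {
    "user_a": [0.10, 0.80, 0.30],
    "user_b": [0.90, 0.20, 0.50],
}


def match_user(features, table=FEATURE_TABLE, threshold=0.15):
    """Return the ID of the closest stored entry, or None if none is close enough.

    "Close enough" is assumed here to mean a Euclidean distance of at most
    `threshold`; the embodiment does not prescribe a particular measure.
    """
    best_id, best_dist = None, float("inf")
    for user_id, stored in table.items():
        dist = math.dist(features, stored)
        if dist < best_dist:
            best_id, best_dist = user_id, dist
    return best_id if best_dist <= threshold else None
```

With this sketch, a slightly perturbed vector such as `[0.12, 0.78, 0.31]` is still resolved to `"user_a"`, while a vector far from every stored entry yields `None`.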

The specifying unit 103 is a functional unit that specifies the position of each user recognized by the recognition unit 102 in the image captured by the imaging unit 116, and thereby specifies the arrangement pattern of the users participating in the conference within the conference room. For example, in the situation shown in FIG. 8(a), for the 360-degree omnidirectional image (hereinafter sometimes referred to as a “panoramic image”) captured by the imaging unit 116 of the information processing apparatus 10 placed on the desk 40, the specifying unit 103 specifies, for the users 60a to 60e recognized by the recognition unit 102, an arrangement pattern indicating that the user 60a is located in region P1, the user 60b in region P2, the user 60c in region P3, the user 60d in region P5, the user 60e in region P6, and the whiteboard 50 in region P7. Likewise, in the situation shown in FIG. 8(b), for the panoramic image captured by the imaging unit 116, the specifying unit 103 specifies, for the users 61a to 61d recognized by the recognition unit 102, an arrangement pattern indicating that the user 61a is located in region P2, the user 61b in region P3, the user 61c in region P4, the user 61d in region P5, and the whiteboard 50 in region P7. In other words, specifying the arrangement pattern means specifying which user or whiteboard is located in which direction relative to the information processing apparatus 10. The specifying unit 103 is realized, for example, by the CPU 201 shown in FIG. 2 executing a program. The users, whiteboard, and the like whose directions relative to the information processing apparatus 10 are specified by the arrangement pattern may be referred to collectively as “users etc.”.
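An arrangement pattern of this kind can be thought of as a mapping from regions to occupants. The sketch below encodes the pattern described for FIG. 8(a); the dictionary representation and the names `ARRANGEMENT_8A` and `region_of` are hypothetical and chosen only for illustration.

```python
# Arrangement pattern of FIG. 8(a) as described in the text: region -> occupant.
# The dictionary layout itself is an assumption, not part of the embodiment.
ARRANGEMENT_8A = {
    "P1": "60a",
    "P2": "60b",
    "P3": "60c",
    "P5": "60d",
    "P6": "60e",
    "P7": "whiteboard 50",
}


def region_of(occupant, pattern):
    """Return the region in which `occupant` is located, or None if absent."""
    for region, who in pattern.items():
        if who == occupant:
            return region
    return None
```

Looking up `region_of("60d", ARRANGEMENT_8A)` gives the direction (region) in which that user sits relative to the apparatus.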

To specify the position of the whiteboard 50, the positional relationship between the information processing apparatus 10 and the whiteboard 50 may be stored in the storage unit 113 in advance, with the information processing apparatus 10 installed on the desk 40. Alternatively, the recognition unit 102 may recognize not only the users' faces but also the whiteboard 50 in the panoramic image, using a known method such as template matching based on a reference image of the whiteboard 50.

An image captured by the imaging unit 116 has been referred to above as a “panoramic image”; when the subject captured by the imaging unit 116 is treated as video, it may be referred to as “panoramic video”. Note that video is a concept that includes images.

The cutout unit 104 is a functional unit that cuts out, from the panoramic video, the video region of a specific user etc. (in the example of FIG. 8, any one of the users and the whiteboard 50) on the basis of the priority corresponding to the role assigned to each user. The cutout unit 104 is realized, for example, by the CPU 201 shown in FIG. 2 executing a program.

The switching unit 105 is a functional unit that switches the sound collection direction of the input unit 112 to the direction in which a specific user etc. (in the example of FIG. 8, any one of the users and the whiteboard 50) is located, on the basis of the priority corresponding to the role assigned to each user. The switching unit 105 is realized, for example, by the CPU 201 shown in FIG. 2 executing a program.

The setting unit 106 is a functional unit that sets a priority for each user recognized by the recognition unit 102, on the basis of operation information from the operation unit 114 that has accepted a user's operation input, or on the basis of the conference information acquired by the acquisition unit 101. The setting unit 106 is realized, for example, by the CPU 201 shown in FIG. 2 executing a program.

The transmission unit 107 is a functional unit that transmits the video data corresponding to the video region cut out by the cutout unit 104, and the audio data input by the input unit 112, to the information processing apparatus 10 at another site via the communication unit 115 and the network 2. Specifically, the transmission unit 107 encodes, for example, the video data and the audio data and transmits them to the information processing apparatus 10 at the other site. A known encoding method may be used; for example, a compression coding technique such as H.264/AVC or H.264/SVC. The transmission unit 107 is realized, for example, by the CPU 201 shown in FIG. 2 executing a program.

The reception unit 108 is a functional unit that receives video data and audio data from the information processing apparatus 10 at another site via the network 2 and the communication unit 115. Specifically, the reception unit 108 decodes, for example, the received video data and audio data, sends the decoded video data to the display control unit 110, and sends the decoded audio data to the audio output control unit 111. A known decoding method may be used. The reception unit 108 is realized, for example, by the CPU 201 shown in FIG. 2 executing a program.

The imaging control unit 109 is a functional unit that controls the operation of the imaging unit 116. Specifically, the imaging control unit 109 controls, for example, the starting and stopping of imaging by the imaging unit 116, and acquires the panoramic image captured by the imaging unit 116. The imaging control unit 109 is realized, for example, by the CPU 201 shown in FIG. 2 executing a program.

The display control unit 110 is a functional unit that controls the display unit 117 to display various images. The display control unit 110 is realized, for example, by the CPU 201 shown in FIG. 2 executing a program.

The audio output control unit 111 is a functional unit that controls the audio output unit 118 to output various kinds of audio. The audio output control unit 111 is realized, for example, by the CPU 201 shown in FIG. 2 executing a program.

The input unit 112 is a functional unit that inputs audio. Under the control of the switching unit 105, the input unit 112 inputs audio from a specific sound collection direction. The input unit 112 is realized, for example, by the microphone array 212 shown in FIG. 2.

The storage unit 113 is a functional unit that stores the various programs that realize the operation of the information processing apparatus 10, video data, audio data, and information such as the arrangement pattern specified by the specifying unit 103. The storage unit 113 is realized, for example, by the RAM 203 and the auxiliary storage device 204 shown in FIG. 2.

The operation unit 114 is a functional unit that accepts various operation inputs from a user (for example, a conference participant). The operation unit 114 is realized, for example, by the operation button 206 and the power switch 207 shown in FIG. 2. The operation unit 114 is not limited to the operation button 206 and the power switch 207 shown in FIG. 2, and may instead be realized by a mouse, a keyboard, a touch panel, or the like.

The communication unit 115 is a functional unit that performs data communication with other information processing apparatuses 10, the conference server 20, and the reservation server 30 via the network 2. The communication unit 115 is realized, for example, by the network I/F 208 shown in FIG. 2.

The imaging unit 116 is a functional unit that captures a 360-degree omnidirectional panoramic image or panoramic video. The imaging unit 116 is realized, for example, by the panoramic camera 210 shown in FIG. 2.

The display unit 117 is a functional unit that displays various images under the control of the display control unit 110. The display unit 117 is realized, for example, by the display device 215 shown in FIG. 2.

The audio output unit 118 is a functional unit that outputs various kinds of audio under the control of the audio output control unit 111. The audio output unit 118 is realized, for example, by the speaker 213 shown in FIG. 2.

The acquisition unit 101, recognition unit 102, specifying unit 103, cutout unit 104, switching unit 105, setting unit 106, transmission unit 107, reception unit 108, imaging control unit 109, display control unit 110, audio output control unit 111, input unit 112, storage unit 113, operation unit 114, communication unit 115, imaging unit 116, display unit 117, and audio output unit 118 of the information processing apparatus 10 shown in FIG. 7 represent functions conceptually, and the apparatus is not limited to this configuration. For example, a plurality of functional units illustrated as independent units in FIG. 7 may be configured as a single functional unit. Conversely, the functions of a single functional unit in FIG. 7 may be divided and configured as a plurality of functional units.

Furthermore, some or all of the acquisition unit 101, recognition unit 102, specifying unit 103, cutout unit 104, switching unit 105, setting unit 106, transmission unit 107, reception unit 108, imaging control unit 109, display control unit 110, and audio output control unit 111 of the information processing apparatus 10 may be realized not by a software program but by a hardware circuit such as an FPGA (Field-Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit).

(Priority setting process)
FIG. 9 is a flowchart illustrating an example of the priority setting process of the information processing apparatus according to the embodiment. The flow of the priority setting process of the information processing apparatus 10 according to the present embodiment is described with reference to FIG. 9.

<Step S11>
First, a user who intends to participate in the conference operates the operation unit 114 of the information processing apparatus 10 to turn on the power of the apparatus, and performs an operation input for capturing an image of the surroundings (a panoramic image) with the imaging unit 116. On receiving the operation information for capturing a panoramic image from the operation unit 114, the imaging control unit 109 causes the imaging unit 116 to capture the panoramic image. The process then proceeds to step S12.

<Step S12>
The recognition unit 102 extracts, from the face images of one or more users contained in the panoramic image captured by the imaging unit 116, feature information including the shape and relative position of parts such as the facial contour, eyes, nose, chin, and cheekbones. Next, the recognition unit 102 transmits the extracted feature information for the one or more users to the reservation server 30 via the communication unit 115 and the network 2. On receiving the feature information extracted by the recognition unit 102, the reservation server 30 refers to the face recognition feature information table 1003 shown in FIG. 6 and transmits the user identification information corresponding to the feature information that matches the received feature information to the recognition unit 102 via the network 2 and the communication unit 115. By acquiring the user identification information, the recognition unit 102 recognizes the users participating in the conference (face recognition).

The specifying unit 103 then specifies the position of each user recognized by the recognition unit 102 in the panoramic image captured by the imaging unit 116, and specifies the arrangement pattern of the users participating in the conference within the conference room. The specifying unit 103 stores information on the specified arrangement pattern in the storage unit 113. The process then proceeds to step S13.

<Step S13>
The acquisition unit 101 acquires conference information from the reservation server 30 via the communication unit 115 and the network 2. Specifically, the acquisition unit 101 transmits an acquisition request for the conference information, together with information on the date and time of the conference, its venue, and the terminal to be used, to the reservation server 30 via the communication unit 115 and the network 2. On receiving the acquisition request, the reservation server 30 refers to the conference information table 1001 shown in FIG. 4 and transmits the user identification information and roles corresponding to the received date and time, venue, and terminal to the acquisition unit 101 via the network 2 and the communication unit 115. The acquisition unit 101 thus acquires, as the conference information of the conference about to be held, the user identification information and roles of the users participating in the conference. If the acquisition unit 101 was able to acquire the conference information of the conference about to be held (step S13: Yes), the process proceeds to step S14; if not (step S13: No), the process proceeds to step S15.

<Step S14>
The setting unit 106 transmits an acquisition request for the priority corresponding to each role in the conference information acquired by the acquisition unit 101, together with the role information, to the reservation server 30 via the communication unit 115 and the network 2. On receiving the acquisition request, the reservation server 30 refers to the priority setting table 1002 shown in FIG. 5 and transmits the priority corresponding to the received role to the setting unit 106 via the network 2 and the communication unit 115. For each user whose user identification information in the conference information acquired by the acquisition unit 101 matches user identification information obtained through face recognition by the recognition unit 102 (that is, the user identification information of a user actually participating in the conference), the setting unit 106 sets the received priority on the basis of the role corresponding to that user. The process then proceeds to step S18.

If user identification information obtained through face recognition by the recognition unit 102 is not found among the user identification information in the conference information acquired by the acquisition unit 101, that is, if a user who is not registered in the conference information is present in the actual conference room, the setting unit 106 may set a default role (for example, “participant”) for that user. Alternatively, the setting unit 106 may determine that a user who is not registered in the conference information does not participate in the actual conference, and set the role to “other”. As a further alternative, when a user who is not registered in the conference information is present in the actual conference room, a user may manually set the role of the unregistered user via the operation unit 114.
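The matching of recognized users against the conference information in step S14, including the default-role fallback for unregistered users, can be sketched as follows. This is a minimal illustration under stated assumptions: the priority values other than the whiteboard's "4" are hypothetical (the contents of FIG. 5 are not reproduced here), larger numbers are assumed to mean higher priority, and the names `PRIORITY_TABLE` and `assign_priorities` are invented for the example.

```python
# Hypothetical stand-in for the priority setting table 1002. Only the value 4 for
# "whiteboard" is stated in the text; the other values are illustrative, and
# larger numbers are assumed to mean higher priority.
PRIORITY_TABLE = {
    "whiteboard": 4,
    "chairperson": 3,
    "minutes": 2,
    "participant": 1,
    "other": 0,
}


def assign_priorities(recognized_ids, conference_roles, default_role="participant"):
    """Map each recognized user ID to a (role, priority) pair.

    `conference_roles` maps user IDs registered in the conference information to
    roles. Recognized users missing from it receive `default_role`.
    """
    result = {}
    for user_id in recognized_ids:
        role = conference_roles.get(user_id, default_role)
        result[user_id] = (role, PRIORITY_TABLE[role])
    return result
```

Passing `default_role="other"` corresponds to the alternative in which an unregistered user is treated as not participating in the conference.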

<Step S15>
When the acquisition unit 101 could not acquire the conference information of the conference about to be held, and a user has performed an operation input on the operation unit 114 to manually set roles for the users whose faces were recognized by the recognition unit 102 (step S15: Yes), the process proceeds to step S16. If no such manual role-setting operation input has been performed on the operation unit 114 (step S15: No), the process proceeds to step S17.

<Step S16>
On the basis of the role-setting operation input by the user on the operation unit 114, the setting unit 106 sets a role for each user whose face was recognized by the recognition unit 102, and transmits an acquisition request for the priority corresponding to that role, together with the role information, to the reservation server 30 via the communication unit 115 and the network 2. On receiving the acquisition request, the reservation server 30 refers to the priority setting table 1002 shown in FIG. 5 and transmits the priority corresponding to the received role to the setting unit 106 via the network 2 and the communication unit 115. The setting unit 106 then sets the received priority for each face-recognized user on the basis of the role corresponding to that user. The process then proceeds to step S18.

<Step S17>
On the basis of the default role setting, the setting unit 106 sets a role for each user whose face was recognized by the recognition unit 102, and transmits an acquisition request for the priority corresponding to that role, together with the role information, to the reservation server 30 via the communication unit 115 and the network 2. On receiving the acquisition request, the reservation server 30 refers to the priority setting table 1002 shown in FIG. 5 and transmits the priority corresponding to the received role to the setting unit 106 via the network 2 and the communication unit 115. The setting unit 106 then sets the received priority for each face-recognized user on the basis of the role corresponding to that user. Here, the default role setting is a predetermined assignment of roles; for example, a user located in region P1 shown in FIG. 8 is preferentially assigned the role “chairperson”, a user located in region P2 is preferentially assigned the role “minutes”, a user located in region P6 is preferentially assigned the role “board writing”, and all other users are assigned the role “participant”. The process then proceeds to step S18.
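The default role setting of step S17 amounts to a lookup from region to role, with a fallback role for every other region. The sketch below uses the example assignment given in the text; the names `DEFAULT_ROLE_BY_REGION` and `default_roles` are hypothetical, and the handling of several users in one region is deliberately left out.

```python
# Default role setting from the example in the text: region -> role.
# The English role labels are translations chosen for this sketch.
DEFAULT_ROLE_BY_REGION = {
    "P1": "chairperson",
    "P2": "minutes",
    "P6": "board writing",
}


def default_roles(user_regions, fallback="participant"):
    """Assign a default role to each user from the region the user occupies.

    `user_regions` maps user IDs to regions (an arrangement pattern viewed per
    user). Users in regions without a predefined role receive `fallback`.
    """
    return {
        user: DEFAULT_ROLE_BY_REGION.get(region, fallback)
        for user, region in user_regions.items()
    }
```

Applied to the arrangement of FIG. 8(a), user 60a would become the chairperson, 60b the minute-taker, 60e the board writer, and 60c and 60d ordinary participants.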

<Step S18>
After the setting unit 106 has set roles and priorities for the face-recognized users, the information processing apparatus 10 starts the video conference. Specifically, the information processing apparatus 10 proceeds to the sound collection operation and video cutout operation shown in FIG. 10, described later.

Through the operations of steps S11 to S18 above, the information processing apparatus 10 executes the priority setting process.
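The branching among steps S13 to S17 decides which source of role information is used. It can be summarized in a few lines; the function name `choose_role_step` and its arguments are hypothetical, and the sketch covers only the branch selection, not the server communication.

```python
def choose_role_step(conference_info, manual_roles):
    """Return the step that sets roles and priorities, following FIG. 9.

    `conference_info`: roles obtained from the reservation server, or None/empty
    when no conference information could be acquired (step S13: No).
    `manual_roles`: roles entered via the operation unit, or None/empty when no
    manual setting was performed (step S15: No).
    """
    if conference_info:
        return "S14"  # roles come from the conference information
    if manual_roles:
        return "S16"  # roles come from the user's manual setting
    return "S17"      # roles come from the default role setting
```

In every branch the flow then continues to step S18, where the video conference starts.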

In the priority setting process shown in FIG. 9, whether roles and priorities are set on the basis of the conference information or on the basis of a role-setting operation by a user is determined by whether the corresponding conference information exists in the conference information table 1001 of the reservation server 30; however, the process is not limited to this. For example, the information processing apparatus 10 may allow either a manual mode or an automatic mode to be selected. In the manual mode, roles are set by the user's role-setting operation without referring to the conference information table 1001 of the reservation server 30; in the automatic mode, roles are set by referring to the conference information table 1001 of the reservation server 30.

Of the priority setting process shown in FIG. 9, steps S11 to S17 may also be re-executed during the conference, either at predetermined intervals or when a predetermined condition is satisfied. Examples of the predetermined condition being satisfied include the case where, in the panoramic video being captured by the imaging unit 116, the position of a user whose face was recognized by the recognition unit 102 has moved; the case where a user whose face was previously recognized by the recognition unit 102 is no longer present; and the case where the face of a user other than those previously recognized by the recognition unit 102 is recognized.
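The three example conditions for re-executing steps S11 to S17 can be detected by comparing the previous and current recognition results. The following is a minimal sketch, assuming each result is held as a mapping from user ID to region; the function name `change_reasons` and the reason labels are hypothetical.

```python
def change_reasons(previous, current):
    """Return the set of reasons for re-running the priority setting process.

    `previous` and `current` map recognized user IDs to the regions in which
    they were found. An empty set means nothing has changed.
    """
    reasons = set()
    # A previously recognized user is still present but in a different region.
    if any(u in current and current[u] != r for u, r in previous.items()):
        reasons.add("moved")
    # A previously recognized user is no longer recognized.
    if set(previous) - set(current):
        reasons.add("left")
    # A user who was not recognized before is now recognized.
    if set(current) - set(previous):
        reasons.add("joined")
    return reasons
```

A non-empty result would trigger re-execution; re-execution at predetermined intervals would be handled separately, for example by a timer.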

(Sound collection operation and video cutout operation)
FIG. 10 is a flowchart illustrating an example of the flow of the sound collection operation and video cutout operation of the information processing apparatus according to the embodiment. The flow of the sound collection operation and video cutout operation of the information processing apparatus 10 according to the present embodiment is described with reference to FIG. 10.

<Step S31>
First, the switching unit 105 switches the sound collection direction of the input unit 112 to all directions (360 degrees). The process then proceeds to step S32.

<Step S32>
The cutout unit 104 cuts out, from the panoramic video being captured by the imaging unit 116, the video region of the user etc. having the role with the highest priority among the users etc. whose directions have been determined by the arrangement pattern specified by the specifying unit 103. For example, when the priority setting table 1002 has the contents shown in FIG. 5, the role “whiteboard” has the highest priority, “4”, so the cutout unit 104 cuts out from the panoramic video the video region containing the whiteboard whose direction is determined by the arrangement pattern. The process then proceeds to step S33.
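Step S32 combines two operations: selecting the target with the highest-priority role and cutting its region out of the panoramic frame. The sketch below illustrates both under simplifying assumptions that are not taken from the embodiment: the frame is a plain list of pixel rows, the panorama is divided into equal-width regions indexed from 0, larger priority numbers mean higher priority, and all function names are hypothetical.

```python
def highest_priority_region(targets, priorities):
    """Return the region index whose occupant's role has the highest priority.

    `targets` maps region index -> role; `priorities` maps role -> number,
    where a larger number is assumed to mean a higher priority.
    """
    return max(targets, key=lambda index: priorities[targets[index]])


def region_columns(region_index, frame_width, num_regions):
    """Column range [start, stop) of a region, assuming equal-width regions."""
    width = frame_width // num_regions
    start = region_index * width
    return start, start + width


def crop(frame, region_index, num_regions):
    """Cut the columns belonging to one region out of every row of the frame."""
    start, stop = region_columns(region_index, len(frame[0]), num_regions)
    return [row[start:stop] for row in frame]
```

For instance, with a chairperson in region index 0 and the whiteboard in region index 6, `highest_priority_region` picks index 6 when the whiteboard's priority is 4, and `crop` then returns only the columns of that region. A real implementation would also need to handle lens distortion and regions that wrap around the panorama seam, which this sketch ignores.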

In step S32, the cutout unit 104 cuts out the video region of the user etc. having the role with the highest priority, but the process is not limited to this. For example, which user etc. is to be cut out may be set separately from, and independently of, the priority.

<Step S33>
The transmission unit 107 encodes the video data corresponding to the video region cut out by the cutout unit 104 and transmits it to the information processing apparatus 10 at another site via the communication unit 115 and the network 2. The process then proceeds to step S34.

<Step S34>
The input unit 112 starts accepting voice input. The process then proceeds to step S35.

<Step S35>
The information processing apparatus 10 determines whether or not the video conference has ended. For example, the information processing apparatus 10 determines whether a user has performed an operation for ending the conference on the operation unit 114. When the video conference has ended (step S35: Yes), the sound collection operation and the video cutout operation end. When the video conference has not ended (step S35: No), the process proceeds to step S36.

<Step S36>
When voice is input by the input unit 112 (step S36: Yes), the process proceeds to step S37. When no voice is input by the input unit 112 (step S36: No), the process returns to step S31.

<Step S37>
The input unit 112 determines whether or not the input voice comes from a plurality of directions. When voice is input from a plurality of directions by the input unit 112 (step S37: Yes), the process proceeds to step S38. When voice is input from one direction by the input unit 112 (step S37: No), the process proceeds to step S40.

<Step S38>
When voice is being input from a plurality of directions by the input unit 112, the switching unit 105 identifies the users corresponding to those directions from the arrangement pattern of the users and the like specified by the specifying unit 103, and switches the sound collection direction of the input unit 112 to the direction of the user having the highest-priority role among the identified users.

Here, for example, when voice is input from a new direction by the input unit 112 after the switching unit 105 has switched the sound collection direction of the input unit 112 to the direction of a specific user, the switching unit 105 may compare the priority of the role of the user corresponding to the current sound collection direction of the input unit 112 with the priority of the role of the user corresponding to the new direction, and switch the sound collection direction of the input unit 112 to the direction of the user having the higher-priority role. The same applies to step S40, described later.
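The comparison between the current sound collection direction and a newly voiced direction can be sketched as below. The function name and the `priority_of` callback are hypothetical, and keeping the current direction when the two priorities are equal is an assumption, since the passage above only covers the case where one priority is higher.

```python
from typing import Callable


def next_collection_direction(
    current_dir: float,
    new_dir: float,
    priority_of: Callable[[float], int],
) -> float:
    """Decide the sound collection direction after voice arrives from new_dir.

    priority_of returns the priority of the role of the user located in the
    given direction (looked up from the arrangement pattern).
    """
    if priority_of(new_dir) > priority_of(current_dir):
        return new_dir
    # Assumption: on equal or lower priority, stay on the current direction.
    return current_dir


# Hypothetical priorities per direction (degrees).
priorities = {0.0: 1, 90.0: 3, 180.0: 1}
print(next_collection_direction(0.0, 90.0, priorities.__getitem__))  # switches to 90.0
print(next_collection_direction(90.0, 0.0, priorities.__getitem__))  # stays at 90.0
```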

Note that although the switching unit 105 switches the sound collection direction of the input unit 112 to the direction of the user having the highest-priority role among the identified users, the present invention is not limited to this. That is, the switching unit 105 may switch the sound collection direction of the input unit 112 by giving priority to the direction of a user, among the identified users, whose role has a higher priority than the roles of the other users.

In addition, when the roles of the users corresponding to the plurality of directions, identified from the arrangement pattern, have equal priority, the switching unit 105 may, for example, switch the sound collection direction of the input unit 112 to the direction of a user chosen at random from among the users having roles of equal priority. Alternatively, the switching unit 105 may switch the sound collection direction of the input unit 112 by giving priority to the direction of the user whose voice was input to the input unit 112 first among the users having roles of equal priority.
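Both tie-breaking options described above (random choice, or the voice that was input first) can be sketched in one function. The `Voice` record, its `onset` timestamp field, and the `tie_break` parameter are hypothetical names introduced for illustration; the description does not say how the order of voice input is recorded.

```python
import random
from typing import List, NamedTuple, Optional


class Voice(NamedTuple):
    direction: float  # direction of the speaker, in degrees
    priority: int     # priority of the speaker's role
    onset: float      # assumed timestamp (s) at which the voice was first input


def pick_direction(
    voices: List[Voice],
    tie_break: str = "first",
    rng: Optional[random.Random] = None,
) -> float:
    """Choose the sound collection direction among simultaneous voices."""
    top = max(v.priority for v in voices)
    candidates = [v for v in voices if v.priority == top]
    if tie_break == "random":
        # Random choice among the users whose roles share the top priority.
        return (rng or random).choice(candidates).direction
    # Otherwise prefer the user whose voice was input first.
    return min(candidates, key=lambda v: v.onset).direction


voices = [Voice(0.0, 2, 1.5), Voice(90.0, 2, 0.4), Voice(180.0, 1, 0.1)]
print(pick_direction(voices))  # 90.0: equal top priority, earlier onset
```

When only one voice has the top priority, both modes reduce to the highest-priority selection of step S38.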

The input unit 112 inputs voice from the sound collection direction set by the switching unit 105. The process then proceeds to step S39.

<Step S39>
The cutout unit 104 cuts out, from the panoramic video being captured by the imaging unit 116, the video region including the user corresponding to the sound collection direction set by the switching unit 105. The process then proceeds to step S42.
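The description does not specify how a direction is mapped to a region of the panoramic video. As one possible sketch, assume a 360-degree panorama whose pixel columns map linearly to azimuth (column 0 at 0 degrees) and a fixed horizontal field of view for the cutout; the function name and both parameters are hypothetical. The only non-trivial part is the wrap-around at the seam of the panorama.

```python
from typing import List, Tuple


def cutout_columns(width: int, center_deg: float, fov_deg: float) -> List[Tuple[int, int]]:
    """Return half-open column ranges [start, stop) covering the cutout.

    Two ranges are returned when the region wraps around the panorama seam.
    """
    # Width of the cutout in pixels, clamped to the panorama width.
    span = max(1, min(width, round(width * fov_deg / 360.0)))
    # Column corresponding to the centre direction.
    center = round(width * (center_deg % 360.0) / 360.0) % width
    start = (center - span // 2) % width
    stop = start + span
    if stop <= width:
        return [(start, stop)]
    return [(start, width), (0, stop - width)]


print(cutout_columns(3600, 90.0, 60.0))  # [(600, 1200)]
print(cutout_columns(3600, 0.0, 60.0))   # [(3300, 3600), (0, 300)]
```

With a camera that covers less than 360 degrees, the same mapping would apply within the available angle of view, without the wrap-around branch.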

<Step S40>
When voice is being input from one direction by the input unit 112, the switching unit 105 identifies the user corresponding to that direction from the arrangement pattern of the users and the like specified by the specifying unit 103, and switches the sound collection direction of the input unit 112 to the direction of the identified user. The input unit 112 inputs voice from the sound collection direction set by the switching unit 105. The process then proceeds to step S41.

Note that when the role of the user corresponding to the one direction identified from the arrangement pattern and its priority are "other" and "0", respectively, as in, for example, the priority setting table 1002 shown in FIG. 5, the switching unit 105 may determine that the user is not participating in the conference and not switch the sound collection direction to the direction of that user; that is, the voice uttered by that user need not be collected. In this case, in step S41 described later as well, the cutout unit 104 need not cut out the video region including that user from the panoramic video.
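The optional exclusion of non-participants can be sketched as a filter applied before switching. The function name is hypothetical, and treating a direction that is absent from the arrangement pattern as role "other" is an assumption made only so that the sketch is total.

```python
from typing import Dict, Optional


def target_for_single_voice(
    direction: float,
    arrangement: Dict[float, str],
    priorities: Dict[str, int],
) -> Optional[float]:
    """Return the direction to switch to, or None if the speaker is ignored.

    A speaker whose role has priority 0 (role "other" in the table of FIG. 5)
    is treated as not participating: no sound collection, no cutout.
    """
    role = arrangement.get(direction, "other")  # assumption for unknown directions
    if priorities.get(role, 0) == 0:
        return None
    return direction


# Only "other" = 0 comes from the description; "participant" = 1 is assumed.
priorities = {"participant": 1, "other": 0}
arrangement = {90.0: "participant", 270.0: "other"}
print(target_for_single_voice(90.0, arrangement, priorities))   # 90.0
print(target_for_single_voice(270.0, arrangement, priorities))  # None
```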

<Step S41>
The cutout unit 104 cuts out, from the panoramic video being captured by the imaging unit 116, the video region including the user corresponding to the sound collection direction set by the switching unit 105. The process then proceeds to step S42.

<Step S42>
The transmission unit 107 encodes the video data corresponding to the video region cut out by the cutout unit 104 and the audio data input by the input unit 112, and transmits them to the information processing apparatus 10 at another site via the communication unit 115 and the network 2. The process then returns to step S35, and the operation is repeated.

Through steps S31 to S42 above, the information processing apparatus 10 performs the sound collection operation and the video cutout operation.
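The control flow of steps S31 to S42 can be summarized in a small simulation. This is a simplified sketch under stated assumptions: each loop pass is represented by the set of directions from which voice is detected, the end of the input sequence stands in for the end of the conference (S35), encoding and transmission are reduced to a log entry, and the optional variations (comparison with the current direction, tie-breaking, exclusion of priority-0 users) are omitted. All names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# (sound collection direction or None for omnidirectional,
#  cutout direction, whether audio is transmitted)
LogEntry = Tuple[Optional[float], float, bool]


def run_flow(
    frames: Iterable[Set[float]],
    arrangement: Dict[float, str],
    priorities: Dict[str, int],
) -> List[LogEntry]:
    def prio(direction: float) -> int:
        return priorities.get(arrangement.get(direction, "other"), 0)

    def idle() -> LogEntry:
        # S31: collect sound from all directions; S32: cut out the
        # highest-priority role; S33: transmit video only.
        return (None, max(arrangement, key=prio), False)

    log: List[LogEntry] = [idle()]   # S31 to S33, then S34 starts voice input
    for voiced in frames:            # S35: repeat until the conference ends
        if not voiced:               # S36: No, so return to S31
            log.append(idle())
            continue
        if len(voiced) > 1:          # S37: Yes, so S38 picks by priority
            target = max(voiced, key=prio)
        else:                        # S37: No, so S40 follows the one speaker
            target = next(iter(voiced))
        # S39/S41: cut out the speaker; S42: transmit video and audio.
        log.append((target, target, True))
    return log


# Only "whiteboard" = 4 comes from the description; other values are assumed.
priorities = {"whiteboard": 4, "presenter": 3, "participant": 1, "other": 0}
arrangement = {0.0: "whiteboard", 90.0: "presenter", 180.0: "participant"}
for entry in run_flow([set(), {180.0}, {90.0, 180.0}], arrangement, priorities):
    print(entry)
```

In this run the whiteboard is shown while nobody speaks, the participant at 180 degrees is shown while speaking alone, and the presenter at 90 degrees wins when both speak at once.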

As described above, in the information processing apparatus 10 according to the present embodiment, roles and priorities are set in advance for the users and the like participating in a video conference. When a plurality of users speak, the sound collection direction of the input unit 112 is switched to the direction of the user having the higher-priority role, and the video region including the user corresponding to the sound collection direction is cut out from the panoramic image. The transmission unit 107 then encodes the video data corresponding to the video region cut out by the cutout unit 104 and the audio data input by the input unit 112, and transmits them to the information processing apparatus 10 at another site. As a result, even when a plurality of users are speaking simultaneously during the conference, the voice of the higher-priority user is collected, and an image including that user is cut out and transmitted to the information processing apparatus 10 at the other site, so that the video of a user participating in the conference can be closed up as intended.

Further, since the conference information is registered in advance in the reservation server 30, the roles of the users participating in the conference do not need to be set again, which saves effort and allows the conference to start smoothly.

In addition, before the conference starts, the imaging unit 116 captures a panoramic image, the recognition unit 102 recognizes the users participating in the conference, and the specifying unit 103 specifies the arrangement pattern of the users. Operations such as face recognition and specification of the arrangement pattern therefore do not need to be performed repeatedly during the conference, and the load on the CPU 201 can be reduced.

Note that although the imaging unit 116 is described as being realized by the panoramic camera 210 and as capturing a panoramic image or panoramic video, it is not necessarily limited to this. That is, when, for example, the imaging range does not need to cover all directions (360 degrees), a panoramic camera need not be used, and an imaging device (camera) having an angle of view that covers the necessary imaging range may be used instead. In this case, the video cutout and the setting of the sound collection direction may be performed within the range of the angle of view that the imaging device can capture.

In the above-described embodiment, when at least one of the functional units of the information processing apparatus 10 is realized by executing a program, the program is provided by being incorporated in advance in a ROM or the like. The program executed by the information processing apparatus 10 according to the above-described embodiment may also be provided as a file in an installable or executable format stored on a computer-readable storage medium such as a CD-ROM (Compact Disc Read Only Memory), a flexible disk (FD), a CD-R (Compact Disk-Recordable), or a DVD (Digital Versatile Disc). The program may also be stored on a computer connected to a network such as the Internet and provided by being downloaded via the network, or may be provided or distributed via a network such as the Internet. The program executed by the information processing apparatus 10 according to the above-described embodiment has a module configuration including at least one of the functional units described above. In terms of actual hardware, the CPU 201 reads the program from the storage device described above (for example, the ROM 202 or the auxiliary storage device 204) and executes it, whereby the functional units are loaded onto and generated on the main storage device (for example, the RAM 203).

DESCRIPTION OF SYMBOLS
1 Conference system
2 Network
10, 10a, 10b Information processing apparatus
20 Conference server
30 Reservation server
40 Desk
50 Whiteboard
60a to 60e User
61a to 61d User
101 Acquisition unit
102 Recognition unit
103 Specifying unit
104 Cutout unit
105 Switching unit
106 Setting unit
107 Transmission unit
108 Reception unit
109 Imaging control unit
110 Display control unit
111 Audio output control unit
112 Input unit
113 Storage unit
114 Operation unit
115 Communication unit
116 Imaging unit
117 Display unit
118 Audio output unit
201 CPU
202 ROM
203 RAM
204 Auxiliary storage device
205 Media drive
205a Recording medium
206 Operation button
207 Power switch
208 Network I/F
209 Image sensor I/F
210 Panoramic camera
211 Audio I/F
212 Microphone array
212a to 212f Microphone
213 Speaker
214 Output I/F
215 Display device
215a Cable
216 External device I/F
217 Bus
1001 Conference information table
1002 Priority setting table
1003 Face recognition feature information table
P1 to P7 Region

JP 2007-274463 A

Claims

An information processing apparatus comprising:
an imaging unit that captures video;
an input unit that inputs voice;
a recognition unit that recognizes a user from the video captured by the imaging unit;
a specifying unit that specifies an arrangement pattern indicating in which direction, with respect to the information processing apparatus, the user recognized by the recognition unit is arranged in the video captured by the imaging unit;
an acquisition unit that acquires first information including at least information indicating a user and a role of the user associated with the information indicating the user;
a setting unit that sets, for the user recognized by the recognition unit, a priority corresponding to the role of the user, based on the role in the first information corresponding to that user and on second information in which roles and priorities are associated in advance;
a cutout unit that, when voice is input from a plurality of directions by the input unit, gives priority to the direction of a user having a role with a higher priority among the directions of the users specified by the arrangement pattern in the plurality of directions, and cuts out a video region including the user corresponding to that direction from the video; and
a transmission unit that transmits the video region cut out by the cutout unit.

The information processing apparatus according to claim 1, wherein the input unit is capable of switching its sound collection direction,
the information processing apparatus further comprising a switching unit that, when voice is input from a plurality of directions by the input unit, switches the sound collection direction of the input unit to the direction of the user having the highest-priority role among the users in the plurality of directions specified from the arrangement pattern.

The information processing apparatus according to claim 2, wherein, when voice is input from a direction different from one direction while the sound collection direction of the input unit is switched to the one direction, and the priority of the role of the user corresponding to the different direction is higher than the priority of the role of the user corresponding to the one direction, the switching unit switches the sound collection direction of the input unit to the different direction, and
the cutout unit cuts out a video region including the user corresponding to the different direction from the video.

The information processing apparatus according to any one of claims 1 to 3, wherein the cutout unit cuts out a video region corresponding to a specific direction from the video when no audio is input from the input unit.

The cutout unit cuts out a video region corresponding to a direction of a role having the highest priority in the second information as the specific direction from the video when no audio is input from the input unit. The information processing apparatus described in 1.

The information processing apparatus according to claim 2, wherein, when voice is input from a plurality of directions by the input unit and the priorities of the roles of the users in the plurality of directions identified from the arrangement pattern are equal, the switching unit gives priority to the voice input to the input unit first and switches the sound collection direction of the input unit to the direction of the user corresponding to that voice, and
the cutout unit cuts out a video region including the user corresponding to the sound collection direction from the video.

The information processing apparatus according to claim 2, wherein, when the priority of the role of the user corresponding to the voice input by the input unit is the lowest in the second information, the switching unit does not switch the sound collection direction of the input unit to the direction of that user, and
the cutout unit does not cut out a video region including that user from the video.

The information processing apparatus according to claim 1, wherein, at every predetermined time interval,
the recognition unit recognizes a user from the video captured by the imaging unit,
the specifying unit specifies the arrangement pattern, and
the setting unit sets, for the user recognized by the recognition unit, the priority corresponding to the role of the user, based on the role in the first information corresponding to that user and on the second information.

The information processing apparatus according to claim 1, wherein the imaging unit captures images in all directions to obtain the images in all directions.

A conference system comprising:
the information processing apparatus according to any one of claims 1 to 9; and
a server device having the first information and the second information.

An information processing method in an information processing apparatus, the method comprising:
an imaging step of capturing video;
an input step of inputting voice;
a recognition step of recognizing a user from the captured video;
a specifying step of specifying an arrangement pattern indicating in which direction, with respect to the information processing apparatus, the recognized user is arranged in the captured video;
an acquisition step of acquiring first information including at least information indicating a user and a role of the user associated with the information indicating the user;
a setting step of setting, for the recognized user, a priority corresponding to the role of the user, based on the role in the first information corresponding to that user and on second information in which roles and priorities are associated in advance;
a cutout step of, when voice is input from a plurality of directions, giving priority to the direction of a user having a role with a higher priority among the directions of the users specified by the arrangement pattern in the plurality of directions, and cutting out a video region including the user corresponding to that direction from the video; and
a transmission step of transmitting the cut-out video region.

A program for causing a computer provided with an imaging unit that captures video and an input unit that inputs voice to execute:
a recognition step of recognizing a user from the video captured by the imaging unit;
a specifying step of specifying an arrangement pattern indicating in which direction, with respect to the information processing apparatus, the recognized user is arranged in the video captured by the imaging unit;
an acquisition step of acquiring first information including at least information indicating a user and a role of the user associated with the information indicating the user;
a setting step of setting, for the recognized user, a priority corresponding to the role of the user, based on the role in the first information corresponding to that user and on second information in which roles and priorities are associated in advance;
a cutout step of, when voice is input from a plurality of directions by the input unit, giving priority to the direction of a user having a role with a higher priority among the directions of the users specified by the arrangement pattern in the plurality of directions, and cutting out a video region including the user corresponding to that direction from the video; and
a transmission step of transmitting the cut-out video region.