JP2020057875A

JP2020057875A - Communication terminal, communication system, imaging apparatus, and imaging method

Info

Publication number: JP2020057875A
Application number: JP2018186050A
Authority: JP
Inventors: 宣正銀川; Nobumasa Gingawa
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2018-09-28
Filing date: 2018-09-28
Publication date: 2020-04-09
Anticipated expiration: 2038-09-28
Also published as: JP7167610B2

Abstract

To perform optimal exposure control for more participants. SOLUTION: A communication terminal that is located at a first site and communicates with a plurality of terminals located at other sites includes: an output unit that outputs a first site image, which is an image of the first site, to the plurality of terminals; an acquisition unit that acquires, from each of the plurality of terminals, viewpoint information of a user with respect to the first site image displayed at that terminal's site; and an exposure control unit that performs exposure control on the first site image using, as a photometric region, a region of the first site image that attracts a high degree of attention from the users, as obtained from the plurality of pieces of viewpoint information. SELECTED DRAWING: Figure 6

Description

The present invention relates to a communication terminal, a communication system, an imaging device, and an imaging method.

Conventionally, there are communication systems, such as video conference systems, that hold a conference between remote sites via a communication network such as the Internet. In such a video conference system, video captured by an imaging device at each site is transmitted and received between the sites.

For example, there is an imaging device that suppresses crushed shadows under backlighting and blown-out highlights under excessive front lighting (see, for example, Patent Document 1). The imaging device of Patent Document 1 distinguishes a main subject area from a non-main subject area according to the proportion of time the photographer's viewpoint stays in each area, and performs exposure control on each of the main subject area and the non-main subject area.
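The prior-art classification summarized above can be sketched as follows. This is a minimal illustration, not Patent Document 1's actual method: it assumes the photographer's gaze is sampled at a fixed interval and each sample is labeled with the region it falls in, and the function name `classify_regions` and the dwell-ratio threshold of 0.3 are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def classify_regions(
    gaze_samples: Sequence[str],
    regions: Iterable[str],
    threshold: float = 0.3,  # hypothetical minimum dwell-time ratio
) -> Tuple[List[str], List[str]]:
    """Split regions into main and non-main subject regions by dwell-time ratio.

    gaze_samples: one region id per fixed-interval gaze sample (assumed model).
    regions: all region ids in the frame.
    """
    total = len(gaze_samples)
    counts = Counter(gaze_samples)
    main: List[str] = []
    non_main: List[str] = []
    for region in regions:
        # Fraction of sampled time the viewpoint spent in this region.
        ratio = counts[region] / total if total else 0.0
        (main if ratio >= threshold else non_main).append(region)
    return main, non_main
```

For example, `classify_regions(["A", "A", "A", "B"], ["A", "B", "C"])` treats A (75% dwell) as the main subject region and B and C as non-main. Note that only one viewer's gaze feeds this classification, which is the limitation the following paragraphs discuss.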

In a video conference system with participants at a plurality of sites, the video of the participants at one site, captured by that site's imaging device, is viewed by the participants at the other sites. If the imaging device of Patent Document 1 is used in such a system, exposure can be controlled for the subject in the area where the viewpoint of one participant at another site lies. However, for a participant at a site that is among the plurality of sites but is not that other site, exposure control may be performed on an area different from the area where that participant's viewpoint lies.

Accordingly, the communication terminal, communication system, imaging device, and imaging method of the present disclosure aim to perform optimal exposure control for more participants.

A communication terminal according to an embodiment of the present invention is located at a first site and communicates with a plurality of terminals located at other sites. It includes: an output unit that outputs a first site image, which is an image of the first site, to the plurality of terminals; an acquisition unit that acquires, from each of the plurality of terminals, viewpoint information of a user with respect to the first site image displayed at that terminal's site; and an exposure control unit that performs exposure control on the first site image using, as a photometric region, a region of the first site image that attracts a high degree of attention from the users, as obtained from the plurality of pieces of viewpoint information.
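The exposure control unit's use of viewpoint information from several terminals can be illustrated with a short sketch. It assumes, hypothetically, that each remote terminal reports gaze points as normalized (x, y) coordinates on the first site image, that the image is divided into a grid of metering cells, and that attention is simply the number of gaze points per cell. The function names are illustrative, and weighting cells by attention is only one way to favor the high-attention region; metering only the top-scoring cell would be another.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # normalized (x, y) in [0, 1], origin at top-left (assumed)
Grid = List[List[float]]


def attention_counts(viewpoints: Dict[str, List[Point]], rows: int, cols: int) -> List[List[int]]:
    """Count gaze points from all remote terminals per metering cell."""
    counts = [[0] * cols for _ in range(rows)]
    for points in viewpoints.values():
        for x, y in points:
            # Clamp so that x == 1.0 or y == 1.0 maps into the last cell.
            col = min(int(x * cols), cols - 1)
            row = min(int(y * rows), rows - 1)
            counts[row][col] += 1
    return counts


def metering_weights(counts: List[List[int]]) -> Grid:
    """Normalize counts into metering weights; uniform if there is no gaze data."""
    total = sum(sum(row) for row in counts)
    n_cells = len(counts) * len(counts[0])
    if total == 0:
        return [[1.0 / n_cells] * len(row) for row in counts]
    return [[c / total for c in row] for row in counts]


def metered_luminance(cell_luma: Grid, weights: Grid) -> float:
    """Attention-weighted mean luminance used as the exposure-control input."""
    return sum(
        luma * wt
        for luma_row, wt_row in zip(cell_luma, weights)
        for luma, wt in zip(luma_row, wt_row)
    )
```

With gaze from two hypothetical terminals, `{"10b": [(0.1, 0.1)], "10c": [(0.9, 0.1), (0.1, 0.2)]}` on a 2 x 2 grid, the counts are `[[2, 1], [0, 0]]`. The two bottom cells then contribute nothing to the metered value, however bright they are.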

A communication system according to an embodiment of the present invention includes a plurality of communication terminals that are arranged at a plurality of sites and communicate with each other. Each communication terminal includes: a first output unit that outputs a first site image, which is an image of the first site where that communication terminal is arranged, to each of the other communication terminals; a second output unit that outputs, to the other communication terminals, viewpoint information of a first user with respect to site images that are acquired from each of the other communication terminals and displayed at the first site, each site image being an image of the site where the corresponding other communication terminal is arranged; an acquisition unit that acquires, from each of the other communication terminals, viewpoint information of a second user with respect to the first site image displayed at the site where that other communication terminal is arranged; and an exposure control unit that performs exposure control on the first site image using, as a photometric region, a region of the first site image that attracts a high degree of attention from the second users, as obtained from the plurality of pieces of viewpoint information acquired from the other communication terminals.

An imaging device according to an embodiment of the present invention is arranged at a first site and includes: an imaging unit that acquires a first site image, which is an image of the first site; and a communication terminal that communicates with a plurality of terminals arranged at other sites. The communication terminal includes: a first output unit that outputs the first site image to the plurality of terminals; a second output unit that outputs, to the terminals, viewpoint information of a first user with respect to site images that are acquired from each of the plurality of terminals and displayed at the first site, each site image being an image of the site where the corresponding terminal is arranged; an acquisition unit that acquires, from each of the plurality of terminals, viewpoint information of a second user with respect to the first site image displayed at that terminal's site; and an exposure control unit that performs exposure control on the first site image using, as a photometric region, a region of the first site image that attracts a high degree of attention from the second users, as obtained from the plurality of pieces of viewpoint information acquired from the plurality of terminals.

An imaging method according to an embodiment of the present invention is an imaging method at a first site and includes the steps of: acquiring a first site image, which is an image of the first site; outputting the first site image to a plurality of terminals arranged at other sites by communicating with those terminals; outputting, to the terminals, viewpoint information of a first user with respect to site images that are acquired from each of the plurality of terminals and displayed at the first site, each site image being an image of the site where the corresponding terminal is arranged; acquiring, from each of the plurality of terminals, viewpoint information of a second user with respect to the first site image displayed at that terminal's site; performing exposure control on the first site image using, as a photometric region, a region of the first site image that attracts a high degree of attention from the second users, as obtained from the plurality of pieces of viewpoint information acquired from the plurality of terminals; and outputting the first site image after the exposure control to the plurality of terminals.
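The ordering of the method's steps can be captured in a small driver function. This is a hedged sketch only: every step is passed in as a hypothetical callable (`capture`, `send_image`, `send_local_viewpoints`, `receive_remote_viewpoints`, `select_metering`, `apply_exposure`), because the specification does not define concrete interfaces for the imaging unit, the transport, or the exposure hardware.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
Viewpoints = Dict[str, List[Point]]  # terminal id -> gaze points (assumed format)


def imaging_cycle(
    capture: Callable[[], object],
    send_image: Callable[[object], None],
    send_local_viewpoints: Callable[[], None],
    receive_remote_viewpoints: Callable[[], Viewpoints],
    select_metering: Callable[[Viewpoints], object],
    apply_exposure: Callable[[object], None],
) -> None:
    """Run the steps of the imaging method once, in the order listed above."""
    send_image(capture())                        # acquire and output the first site image
    send_local_viewpoints()                      # first user's viewpoint on remote site images
    viewpoints = receive_remote_viewpoints()     # second users' viewpoints on this site's image
    apply_exposure(select_metering(viewpoints))  # exposure control on the photometric region
    send_image(capture())                        # output the exposure-controlled image
```

In a running terminal this cycle would presumably repeat per frame or per metering interval. Injecting the steps as callables keeps the sketch testable with fakes and makes no claim about how the actual device implements them.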

The technology of the present disclosure makes it possible to perform optimal exposure control for more participants.

Diagram showing an example of the configuration of the video conference system according to Embodiment 1
Block diagram showing an example of the functional configuration of the server device according to Embodiment 1
Block diagram showing an example of the functional configuration of the reservation server device according to Embodiment 1
Block diagram showing an example of the hardware configuration of the server device according to Embodiment 1
Block diagram showing an example of the hardware configuration of the reservation server device according to Embodiment 1
Block diagram showing an example of the functional configuration of the terminal system according to Embodiment 1
Block diagram showing an example of the hardware configuration of the terminal device according to Embodiment 1
Plan view showing an example of the arrangement of participants and the imaging unit at the first site among the conference sites
Diagram showing an example of an image captured by the imaging unit at the first site of FIG. 8A
Diagram showing an example of photometric areas set for an image in exposure control
Diagram showing an example of applying the photometric areas to the image of FIG. 8B
Diagram showing an example of weighting the photometric areas of the image of FIG. 9B in exposure control
Diagram showing an example of gaze information of participants at the second site with respect to the first site image of FIG. 8B
Diagram showing an example of gaze information of participants at the third site with respect to the first site image of FIG. 8B
Diagram showing an example of the image of FIG. 9B in which attention information of the photometric areas is set using the gaze information of the second and third sites
Diagram showing an example of the direction of the speaker at the first site
Diagram showing an example of the image of FIG. 9B in which attention information of the photometric areas is set using the direction of the speaker at the first site
Diagram showing an example of attention information of each photometric area set from the attention information of each site
Diagram showing an example of the attention information of each photometric area after weighting the attention information of each photometric area of FIG. 13
Diagram showing an example in which regions of high attention are dispersed
Flowchart showing an example of the operation of the terminal system according to Embodiment 1
Diagram explaining a process of determining the importance of attention information in the terminal system according to Embodiment 2

In recent years, video conference systems that hold conferences between remote sites via a communication network such as the Internet have become widespread. In such a system, a terminal system at each site captures images and collects the sound of the participants and others, converts them into digital data, and transmits the data to the terminal systems at the other sites. At those sites, the received images and sound are output through a display and a speaker, respectively. This allows participants at multiple sites to hold a conference in a manner close to meeting in the same place.

However, in conventional video conference systems, the exposure of the image captured by the imaging device is often controlled to be optimal for the entire imaging range, such as the whole conference room. The area a participant is focusing on is therefore not necessarily given the optimal exposure. In particular, when a scene with a large difference in brightness is captured, the point a participant wants to look at may be blown out, crushed to black, or nearly so in the image on the transmitting side. In such a case, a good image cannot be obtained even if the image quality is adjusted at the receiving site.

For example, in a conference room with a window on one wall, if the exposure is set for the whole room, the people on the window side are backlit. Their faces therefore appear crushed to black, or nearly so, in the captured image. Meanwhile, the faces of the people on the side opposite the window are well lit and are captured brighter. In a scene where many participants are paying attention to the expression of the person who is captured brightly, good image quality adjustment is difficult even if each receiving site adjusts the image, because a large correction relative to the brightness at the capture location is required.
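The size of the correction a receiving site would need can be expressed in stops (EV), where one stop is a doubling or halving of exposure. The sketch below is a simplification, assuming pixel values scale linearly with exposure (real pipelines apply gamma and tone curves) and using a hypothetical mid-level target of 118 on an 8-bit scale. The function name `exposure_correction_ev` is illustrative.

```python
import math


def exposure_correction_ev(measured_mean: float, target_mean: float = 118.0) -> float:
    """Stops of exposure change needed to bring measured_mean to target_mean.

    Assumes pixel values scale linearly with exposure (a simplification);
    target_mean is a hypothetical mid-level value on an 8-bit scale.
    """
    if measured_mean <= 0:
        raise ValueError("measured_mean must be positive")
    return math.log2(target_mean / measured_mean)
```

If whole-room metering leaves the frame mean at the target (`exposure_correction_ev(118.0)` gives 0.0) while a window-side face region averages 29.5, that face is two stops under (`exposure_correction_ev(29.5)` gives 2.0). Lifting it by that much at the receiver also amplifies noise, and any detail already clipped to black cannot be recovered, which is why metering on the high-attention region at the transmitting side helps.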

The same applies to the imaging device of Patent Document 1. That device can perform exposure control for one of the participants at the multiple sites who view its images, but it cannot perform exposure control for the other participants. It is therefore difficult to provide good images to many participants.

Accordingly, the technology of the present disclosure provides a video conference terminal, a video conference system, an imaging device, and an imaging method that make it possible to perform optimal exposure control for more participants.

Embodiments of the present invention are described below with reference to the accompanying drawings. In this specification and the drawings, components having substantially the same functional configuration are denoted by the same reference numerals, and redundant description is omitted.

(Embodiment 1)
A communication system is a system that transmits and receives information such as images and sound between a plurality of sites located apart from each other, via communication terminals arranged at those sites. In this embodiment, the communication system is described as a video conference system 1 used to hold a conference between a plurality of sites. The video conference system 1 is an example of a communication system.

<Configuration of Video Conference System 1>
The video conference system 1 according to Embodiment 1 is described. FIG. 1 is a diagram showing an example of the configuration of the video conference system 1 according to Embodiment 1. As shown in FIG. 1, the video conference system 1 according to this embodiment includes a plurality of terminal devices 10, a server device 20, and a reservation server device 30, which are connected to each other via a network 40. In the video conference system 1, the terminal devices 10 arranged at conference sites located apart from each other transmit and receive information such as images and sound between the sites via the network 40, allowing participants at the sites to hold a conference as if they were in the same place. In this embodiment, each conference site is a conference room, but it is not limited to this and may be any location. Here, the terminal device 10 is an example of a communication terminal, and a participant is an example of a user.

In this embodiment, four terminal devices 10a to 10d are described as being arranged at four sites. However, it suffices that two or more terminal devices are arranged at two or more sites. In the following description, the reference numerals "10a" to "10d" may be used when referring to the four terminal devices individually, and the reference numeral "10" may be used when referring to them collectively or without distinguishing between them.

The server device 20 controls communication among the plurality of terminal devices 10. For example, the server device 20 monitors the connection state, that is, whether each terminal device 10 is connected to the server device 20. The server device 20 permits or refuses a terminal device 10's participation in a conference; for example, it permits participation by a terminal device 10 that transmits authentication information. At the start of a conference, the server device 20 calls each terminal device 10 participating in the conference. During the conference, it controls the transmission and reception of information such as images and sound to and from each terminal device 10. The server device 20 acquires information on the terminal devices 10 participating in the conference from the reservation server device 30, enables transmission and reception of information among the participating terminal devices 10, and blocks transmission and reception of information between participating and non-participating terminal devices 10.
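The admission check described above can be sketched as a small class. The class name `ConferenceServer`, its methods, and the token format are hypothetical; the sketch only assumes that the server knows the participant list and the authentication information supplied by the reservation server, and admits a connected terminal that is on the list and presents matching authentication information.

```python
import hmac
from typing import Iterable, Set


class ConferenceServer:
    """Hypothetical sketch of the server device's connection and admission state."""

    def __init__(self, participants: Iterable[str], auth_token: str) -> None:
        self.participants: Set[str] = set(participants)  # from the reservation server
        self._auth_token = auth_token                     # issued by the reservation server
        self.connected: Set[str] = set()                  # monitored connection state
        self.joined: Set[str] = set()                     # admitted to the conference

    def connect(self, terminal_id: str) -> None:
        self.connected.add(terminal_id)

    def request_join(self, terminal_id: str, token: str) -> bool:
        """Permit participation only for a connected, listed terminal with valid auth."""
        ok = (
            terminal_id in self.connected
            and terminal_id in self.participants
            # Constant-time comparison avoids leaking token contents via timing.
            and hmac.compare_digest(token, self._auth_token)
        )
        if ok:
            self.joined.add(terminal_id)
        return ok
```

Keeping `connected` and `joined` as separate sets mirrors the text: a terminal such as 10d may be connected to the server without being admitted to the conference.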

The reservation server device 30 manages conference schedules. Via the network 40, it can be connected not only to the terminal devices 10 but also to computer devices other than the terminal devices 10. The reservation server device 30 receives and registers conference information from a terminal device 10 or another computer device. Conference information includes the date and time of the conference, the venue such as a site, the participants, the agenda, and the terminal devices to be used. The reservation server device 30 issues authentication information, such as a password, for joining the conference. A terminal device 10 queries the reservation server device 30 to obtain information on the conferences it is scheduled to join and the authentication information for joining. The reservation server device 30 also transmits the conference information and the authentication information to the server device 20 so that the server device 20 holds the conference.
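A minimal sketch of the reservation flow: register conference information, issue authentication information, and answer a terminal's query for its scheduled conferences. The names `MeetingInfo`, `ReservationServer`, `register`, and `meetings_for` are hypothetical, and using `secrets.token_urlsafe` as the password generator is an assumption; the specification says only that a password or similar is issued.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class MeetingInfo:
    """Conference information fields named in the text (structure is assumed)."""
    start: datetime
    locations: List[str]
    participants: List[str]
    agenda: str
    terminals: List[str]
    auth_token: str = ""


class ReservationServer:
    def __init__(self) -> None:
        self._meetings: List[MeetingInfo] = []

    def register(self, info: MeetingInfo) -> str:
        """Register a conference and issue authentication information for it."""
        info.auth_token = secrets.token_urlsafe(16)  # random, URL-safe password
        self._meetings.append(info)
        return info.auth_token

    def meetings_for(self, terminal_id: str) -> List[MeetingInfo]:
        """Answer a terminal's query: conferences it is scheduled to join."""
        return [m for m in self._meetings if terminal_id in m.terminals]
```

Forwarding the same `MeetingInfo` (including `auth_token`) to the conference server would let it check a joining terminal's credentials against what the reservation server issued.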

The terminal device 10 communicates with the server device 20 and the reservation server device 30 via the network 40. By querying the reservation server device 30, the terminal device 10 obtains information on the conferences it is scheduled to join and the authentication information for joining. Once permitted by the server device 20, the terminal device 10 joins the conference and communicates with the other terminal devices 10 via the network 40 and the server device 20. The terminal device 10 acquires image and audio data of the conference participants and others at its own site and transmits the data to the server device 20, which transmits it to the other terminal devices 10 participating in the conference. The terminal device 10 also receives from the server device 20 the image and audio data acquired by the other terminal devices 10 and transmitted to the server device 20.

For example, suppose that among the terminal devices 10a to 10d, the terminal devices 10a to 10c join a conference and the terminal device 10d does not. All of the terminal devices 10a to 10d can still connect to the server device 20. Data transmitted by the terminal device 10a is sent via the server device 20 to the terminal devices 10b and 10c, but not to the terminal device 10d. Similarly, data transmitted from the terminal device 10b or 10c is sent to the other terminal devices among 10a to 10c, excluding the sender, but not to the terminal device 10d. Data transmitted from the terminal device 10d is not sent to any of the terminal devices 10a to 10d.
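The forwarding rule in this example reduces to a set expression. The function name `recipients` is hypothetical; it assumes the server knows which terminals are connected and which are conference participants, and it returns the terminals to which a given sender's data is forwarded.

```python
from typing import Set


def recipients(sender: str, connected: Set[str], participants: Set[str]) -> Set[str]:
    """Terminals that receive the sender's data via the server.

    Data from a non-participant is not forwarded at all; data from a
    participant goes to every other connected participant.
    """
    if sender not in participants:
        return set()
    return (participants & connected) - {sender}
```

With `connected = {"10a", "10b", "10c", "10d"}` and `participants = {"10a", "10b", "10c"}`, data from 10a goes to `{"10b", "10c"}` and data from 10d goes nowhere, matching the example above.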

In this embodiment, the network 40 is the Internet, but it is not limited to this. For example, the network 40 may be a LAN (Local Area Network), a WAN (Wide Area Network), a mobile communication network, a telephone line network, or another communication network. The network 40 may be wireless or wired.

The functional configurations of the server device 20 and the reservation server device 30 are described. FIG. 2 is a block diagram showing an example of the functional configuration of the server device 20 according to Embodiment 1. FIG. 3 is a block diagram showing an example of the functional configuration of the reservation server device 30 according to Embodiment 1.

As shown in FIG. 2, the server device 20 includes a device control unit 20a, a communication unit 20b, a storage unit 20c, an operation unit 20d, and a display unit 20e. The communication unit 20b is connected to the network 40 and communicates with the terminal devices 10 and others via the network 40. The device control unit 20a controls the overall operation of the server device 20 and transmits and receives information to and from the network 40 via the communication unit 20b. The storage unit 20c allows the device control unit 20a to store and retrieve various kinds of information; for example, it may store the image and audio data transmitted from each terminal device 10 during a conference. The operation unit 20d accepts operations and the input of information and commands from an operator of the server device 20 and outputs them to the device control unit 20a. The display unit 20e outputs image data output from the device control unit 20a as an image. The display unit 20e may also have a function of outputting audio data output from the device control unit 20a as sound.

As shown in FIG. 3, the reservation server device 30 includes a device control unit 30a, a communication unit 30b, a storage unit 30c, an operation unit 30d, and a display unit 30e. The communication unit 30b is connected to the network 40 and communicates with the terminal devices 10 and others via the network 40. The device control unit 30a controls the overall operation of the reservation server device 30 and transmits and receives information to and from the network 40 via the communication unit 30b. The storage unit 30c allows the device control unit 30a to store and retrieve various kinds of information; for example, it may store conference information and authentication information. The operation unit 30d accepts operations and the input of information and commands from an operator of the reservation server device 30 and outputs them to the device control unit 30a. The display unit 30e outputs image data output from the device control unit 30a as an image. The display unit 30e may also have a function of outputting audio data output from the device control unit 30a as sound.

In this embodiment, the server device 20 and the reservation server device 30 are separate devices, but they may be integrated into a single device. Each device may also consist of one or more devices. When a device consists of two or more devices, those devices may be placed in a single piece of equipment or distributed across two or more separate pieces of equipment. In this specification and the claims, "device" may mean not only a single device but also a system consisting of a plurality of devices.

The hardware configurations of the server device 20 and the reservation server device 30 are described. FIG. 4 is a block diagram showing an example of the hardware configuration of the server device 20 according to Embodiment 1. FIG. 5 is a block diagram showing an example of the hardware configuration of the reservation server device 30 according to Embodiment 1.

As shown in FIG. 4, the server device 20 includes, as components, a CPU (Central Processing Unit) 121, a nonvolatile storage device 122, a volatile storage device 123, a memory 124, a communication I/F (interface) 125, an operation I/F 126, and a display device 127. These components are connected to each other via, for example, a bus. The components may be connected through either wired or wireless communication.

An example of the server device 20 is a computer device.

The communication I/F 125 implements the function of the communication unit 20b and may include connection terminals, a communication circuit, and the like. The operation I/F 126 implements the function of the operation unit 20d and may include input devices such as buttons, dials, keys, a touch panel, and a microphone for voice input. The display device 127 implements the function of the display unit 20e. The display device 127 may be a display such as a liquid crystal panel, an organic EL (Electroluminescence) display, an inorganic EL display, or an electronic paper display. The display device 127 may be a touch panel that also serves as the operation I/F 126, and it may include a speaker.

The memory 124 implements the function of the storage unit 20c. The memory 124 is a storage device such as a volatile or nonvolatile semiconductor memory, an HDD (Hard Disk Drive), or an SSD (Solid State Drive). The memory 124 may include the nonvolatile storage device 122 and/or the volatile storage device 123.

The CPU 121 implements the function of the device control unit 20a and includes a processor and the like. An example of the nonvolatile storage device 122 is a ROM (Read Only Memory), and an example of the volatile storage device 123 is a RAM (Random Access Memory). A program for operating the device control unit 20a is stored in advance in the nonvolatile storage device 122, the memory 124, or the like. The CPU 121 reads the program from the nonvolatile storage device 122, the memory 124, or the like, loads it into the volatile storage device 123, and executes each coded instruction of the loaded program. The program may be stored on a recording medium such as a recording disk. The program may also be transmitted via a wired network, a wireless network, broadcasting, or the like and loaded into the volatile storage device 123.

The device control unit 20a may be realized by a program execution unit such as the CPU 121, by a circuit, or by a combination of a program execution unit and a circuit. For example, such components may be realized as an LSI (Large Scale Integration), which is an integrated circuit. Such components may be formed individually as single chips, or some or all of them may be integrated into a single chip. As the LSI, an FPGA (Field Programmable Gate Array) that can be programmed after the LSI is manufactured, a reconfigurable processor in which the connections and/or settings of the circuit cells inside the LSI can be reconfigured, or an ASIC (Application Specific Integrated Circuit) in which circuits with multiple functions are integrated into one for a specific application may be used.

As shown in FIG. 5, the reservation server device 30 includes, as components, a CPU 131, a nonvolatile storage device 132, a volatile storage device 133, a memory 134, a communication I/F 135, an operation I/F 136, and a display device 137. These components are connected to one another, for example, via a bus. The components may be connected via either wired or wireless communication.

An example of the reservation server device 30 is a computer device.

The communication I/F 135 implements the function of the communication unit 30b; its configuration is the same as that described above for the communication I/F 125. The operation I/F 136 implements the function of the operation unit 30d; its configuration is the same as that of the operation I/F 126. The display device 137 implements the function of the display unit 30e; its configuration is the same as that of the display device 127. The memory 134 implements the function of the storage unit 30c; its configuration is the same as that of the memory 124.

The CPU 131 implements the function of the device control unit 30a and includes a processor and the like. An example of the nonvolatile storage device 132 is a ROM, and an example of the volatile storage device 133 is a RAM. The configurations of the CPU 131, the nonvolatile storage device 132, and the volatile storage device 133 are the same as those described above for the CPU 121, the nonvolatile storage device 122, and the volatile storage device 123.

Next, the functional configuration of the terminal device 10 will be described. FIG. 6 is a block diagram illustrating an example of the functional configuration of the terminal system 100 according to the first embodiment. The terminal device 10 is part of the terminal system 100, and a terminal system 100 is placed at each site. At the site where it is located (hereinafter also referred to as the "own site"), the terminal system 100 includes the terminal device 10, an imaging unit 51, an audio input unit 52, an audio output unit 53, a display unit 54, and an input unit 55.

The imaging unit 51 captures still images and/or moving images of a subject. An example of the imaging unit 51 is a camera that captures digital images. The imaging unit 51 is connected to the terminal device 10 at the same site via wired or wireless communication. The imaging unit 51 captures images of its own site, such as the participants in the conference room, and outputs the image data of the captured images to the terminal device 10.

The audio input unit 52 acquires sound from its surroundings, converts the acquired sound into audio data such as an audio signal, and outputs the audio data. The audio data output by the audio input unit 52 depends on the direction of the sound source. An example of the audio input unit 52 is a microphone array in which a plurality of microphones are arranged. The audio input unit 52 is connected to the terminal device 10 at the same site via wired or wireless communication. The audio input unit 52 acquires the voices of the participants and others in the conference room at its own site and outputs the audio data to the terminal device 10.

The audio output unit 53 outputs sound to its surroundings. An example of the audio output unit 53 is a speaker. The audio output unit 53 is connected to the terminal device 10 at the same site via wired or wireless communication. The audio output unit 53 converts audio data acquired from the terminal device 10 into sound and outputs it toward the conference participants and others at its own site. For example, the audio output unit 53 outputs audio data transmitted from the terminal devices 10 at other sites.

The display unit 54 outputs input image data as an image. Examples of the display unit 54 include a liquid crystal panel, an organic EL display, an inorganic EL display, and an electronic paper display. The display unit 54 may be a touch panel that also serves as the input unit 55. The display unit 54 is connected to the terminal device 10 at the same site via wired or wireless communication. The display unit 54 converts image data acquired from the terminal device 10 into an image and outputs it toward the conference participants at its own site. For example, the display unit 54 outputs image data transmitted from the terminal devices 10 at other sites. A single display unit 54 may be placed at a site and output the images transmitted from the terminal devices 10 at all other sites. Alternatively, as many display units 54 as there are other sites may be placed, each outputting the image transmitted from the terminal device 10 at one of those sites. Alternatively, a display unit 54 may be provided for each participant at the site.

The input unit 55 accepts input from an operator at its own site, such as a conference participant, and outputs input data, such as a signal representing the entered information, to the terminal device 10. The input unit 55 is connected to the terminal device 10 via wired or wireless communication. The input unit 55 accepts input for editing an image displayed by the display unit 54. Examples of the input unit 55 include keys, a mouse, and a touch panel.

The terminal device 10 is connected via wired or wireless communication to each component located at its own site, namely the imaging unit 51, the audio input unit 52, the audio output unit 53, the display unit 54, and the input unit 55; however, it may instead be integrated with at least one of these components. Each component may also be integrated with at least one of the other components. The wired or wireless communication may be of any type, such as a wired LAN or a wireless LAN.

The terminal device 10 includes a first communication unit 11, a second communication unit 12, a control unit 13, a terminal operation unit 14, a terminal display unit 15, and a storage unit 16. The control unit 13 includes a line-of-sight estimation unit 13a, a sound direction estimation unit 13b, an attention information determination unit 13c, a synthesis unit 13d, an area determination unit 13e, and an exposure control unit 13f.

The first communication unit 11 is connected to, and communicates with, the imaging unit 51, the audio input unit 52, the audio output unit 53, the display unit 54, and the input unit 55 at its own site. The second communication unit 12 is connected to the network 40 and communicates with the terminal devices 10 at other sites, the server device 20, and the reservation server device 30. Image data, audio data, and input signals output from the imaging unit 51, the audio input unit 52, and the input unit 55 at the own site are input to the control unit 13 via the first communication unit 11, processed by the control unit 13, and then transmitted to the terminal devices 10 at other sites via the second communication unit 12. Conversely, image data and audio data transmitted from the terminal devices 10 at other sites are input to the control unit 13 via the second communication unit 12, processed by the control unit 13, and then output to the audio output unit 53 and the display unit 54 via the first communication unit 11.

The terminal operation unit 14 accepts operations and the input of information and commands from the operator of the terminal device 10 and outputs them to the control unit 13. The terminal display unit 15 outputs data received from the control unit 13 as an image. The terminal display unit 15 may also have a function of outputting audio data received from the control unit 13 as sound.

The storage unit 16 allows various kinds of information to be stored and retrieved. For example, information on the relative and/or absolute positions of the imaging unit 51, the audio input unit 52, the audio output unit 53, and the display unit 54 at the own site, as well as identification information of the conference sites, is stored in the storage unit 16 in advance. The storage unit 16 is connected to the control unit 13.

The control unit 13 controls the overall operation of the terminal device 10. Via the first communication unit 11, the control unit 13 exchanges data with the imaging unit 51, the audio input unit 52, the audio output unit 53, the display unit 54, and the input unit 55 at its own site. Via the second communication unit 12 and the network 40, the control unit 13 exchanges data with the terminal devices 10 at other sites, the server device 20, and the reservation server device 30.

The line-of-sight estimation unit 13a estimates the lines of sight of the participants at its own site. Specifically, the line-of-sight estimation unit 13a acquires image data captured by the imaging unit 51 at its own site and estimates the gaze direction of each participant appearing in the image. Detecting people in an image and estimating the gaze direction of each detected person can be realized with known techniques.

The line-of-sight estimation unit 13a further acquires, from the storage unit 16, information on the relative positions of the imaging unit 51 and the display unit 54 at its own site. Using the estimated gaze directions and this relative-position information, the line-of-sight estimation unit 13a estimates the position and extent of the attention region, that is, the region on the screen of the display unit 54 at which each line of sight is directed. The relative-position information of the imaging unit 51 and the display unit 54 includes the relative relationship between the position of the imaging unit 51 and the position of the display unit 54, and the relative relationship between the imaging direction of the imaging unit 51 (also called the "optical axis direction") and the orientation of the screen of the display unit 54.

Using the estimated attention region, the line-of-sight estimation unit 13a then calculates the pixel coordinates of the position and extent of the attention region on the image displayed by the display unit 54. Pixel coordinates are two-dimensional coordinates defined on the image, with one pixel as the unit. The line-of-sight estimation unit 13a then outputs, to the attention information determination unit 13c, attention information in which the pixel coordinates of the position and extent of the attention region are associated with identification information such as the ID of its own site.
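The mapping from an estimated gaze direction to pixel coordinates on the displayed image can be sketched as a ray-plane intersection. The sketch below is only an illustration under stated assumptions: the hypothetical `ScreenGeometry` record stands in for the relative-position information held in the storage unit 16, the camera is assumed to sit at the top-center edge of the screen with its optical axis parallel to the screen normal (as in the arrangement of FIG. 8A), and the received image is assumed to fill the whole screen. The function returns only the gaze point; an attention region of some extent could be formed as a box around it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class ScreenGeometry:
    """Hypothetical stand-in for the relative-position information in storage unit 16.

    Assumes the camera is at the top-center edge of the screen and its optical axis
    is parallel to the screen normal, so the screen lies in the plane z = 0.
    """
    width_m: float   # physical screen width in metres
    height_m: float  # physical screen height in metres
    res_w: int       # displayed image width in pixels
    res_h: int       # displayed image height in pixels


def gaze_to_pixel(eye: Vec3, direction: Vec3, geo: ScreenGeometry) -> Optional[Tuple[int, int]]:
    """Intersect a gaze ray with the screen plane and return (u, v) pixel coordinates.

    eye: eye position in metres, camera frame (x right, y down, z toward the room).
    direction: gaze direction vector in the same frame.
    Returns None if the ray does not hit the screen.
    """
    ex, ey, ez = eye
    dx, dy, dz = direction
    if dz >= 0:  # the ray points away from (or parallel to) the plane z = 0
        return None
    t = -ez / dz
    px, py = ex + t * dx, ey + t * dy
    # The screen spans x in [-W/2, W/2] and y in [0, H] (y grows downward from the camera).
    if not (-geo.width_m / 2 <= px <= geo.width_m / 2 and 0 <= py <= geo.height_m):
        return None
    u = min(int((px + geo.width_m / 2) / geo.width_m * geo.res_w), geo.res_w - 1)
    v = min(int(py / geo.height_m * geo.res_h), geo.res_h - 1)
    return u, v
```

For a 1.2 m by 0.6 m screen showing a 1920 by 1080 image, a participant whose eyes are 0.3 m below the camera and 2 m from the screen, looking straight ahead, maps to the pixel (960, 540).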

The line-of-sight estimation unit 13a may also acquire the data of the image displayed by the display unit 54 at its own site and extract the participants appearing in that image. By comparing, on that image, the position and extent of each extracted participant with the position and extent of the attention region, the line-of-sight estimation unit 13a may identify the attention region in which an extracted participant appears. The line-of-sight estimation unit 13a may then assign identification information such as an ID to the extracted participant and include, in the attention information, this identification information associated with the pixel coordinates of the position and extent of the attention region.
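The comparison of participant positions with an attention region can be illustrated with axis-aligned boxes. In this sketch, which assumes both participants and attention regions are represented as `(x, y, width, height)` pixel boxes and that "comparison" means choosing the participant with the largest overlap, the helper names and participant IDs are hypothetical; the source does not specify the matching criterion.

```python
from typing import Dict, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixel coordinates


def overlap_area(a: Box, b: Box) -> int:
    """Area of the intersection of two axis-aligned boxes (0 if they do not overlap)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    w = min(ax + aw, bx + bw) - max(ax, bx)
    h = min(ay + ah, by + bh) - max(ay, by)
    return max(w, 0) * max(h, 0)


def match_participant(attention: Box, participants: Dict[str, Box]) -> Optional[str]:
    """Return the ID of the participant whose box overlaps the attention region most."""
    best_id, best_area = None, 0
    for pid, box in participants.items():
        area = overlap_area(attention, box)
        if area > best_area:
            best_id, best_area = pid, area
    return best_id
```

With three participant boxes extracted from the displayed image, an attention region falling inside the middle participant's box returns that participant's ID, and a region that overlaps no one returns `None`.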

The sound direction estimation unit 13b estimates the direction of the speaker among the participants at its own site. Specifically, the sound direction estimation unit 13b acquires the audio data captured by the audio input unit 52 at its own site and estimates the direction of the sound source of that audio data. An example of the sound source direction is the azimuth as seen from the audio input unit 52. Estimating the sound source direction with an audio input unit 52 such as a microphone array can be realized with known techniques.

The sound direction estimation unit 13b further acquires, from the storage unit 16, information on the relative positions of the imaging unit 51 and the audio input unit 52 at its own site. Using the estimated sound source direction and this relative-position information, the sound direction estimation unit 13b estimates the pixel coordinates of the position and extent of the sound source region on the image captured by the imaging unit 51; in other words, it estimates the pixel coordinates of the position and extent of the speaker on the image. This sound source region is an attention region. The relative-position information of the imaging unit 51 and the audio input unit 52 includes the relative relationship between the position of the imaging unit 51 and the position of the audio input unit 52, and the relative relationship between the imaging direction of the imaging unit 51 and the sound collection direction of the audio input unit 52.
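One known way to obtain an azimuth from a microphone pair, and to project it onto the camera image, is sketched below. This is not necessarily the technique the embodiment uses: it assumes a far-field source, a two-microphone time difference of arrival (TDOA) `tau_s` measured as left-minus-right arrival time, a microphone array co-located with the camera and sharing its forward axis (so the stored relative-position information reduces to zero offset), and a pinhole camera with a known horizontal field of view. Only the horizontal pixel position is recoverable from the azimuth; the vertical extent of the speaker would come from participant extraction, as described in the following paragraph.

```python
import math
from typing import Optional

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at about 20 degrees C


def azimuth_from_tdoa(tau_s: float, mic_spacing_m: float, c: float = SPEED_OF_SOUND) -> float:
    """Far-field azimuth (radians, positive to the right) from a two-microphone TDOA.

    tau_s is the arrival time at the left microphone minus that at the right one.
    """
    s = c * tau_s / mic_spacing_m
    s = max(-1.0, min(1.0, s))  # clamp: measurement noise can push |s| slightly above 1
    return math.asin(s)


def azimuth_to_column(azimuth_rad: float, image_width_px: int, hfov_rad: float) -> Optional[int]:
    """Project an azimuth onto a pixel column with a pinhole model; None if outside the view."""
    focal_px = (image_width_px / 2) / math.tan(hfov_rad / 2)
    u = image_width_px / 2 + focal_px * math.tan(azimuth_rad)
    return int(u) if 0 <= u < image_width_px else None
```

A zero delay places the speaker on the optical axis, that is, at the center column of the image; a source outside the camera's horizontal field of view yields `None`.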

The sound direction estimation unit 13b then outputs, to the attention information determination unit 13c, attention information in which the pixel coordinates of the position and extent of the attention region are associated with identification information such as the ID of its own site. The sound direction estimation unit 13b may also acquire the data of the image captured by the imaging unit 51 at its own site and extract the participants appearing in that image. By comparing, on that image, the position and extent of each extracted participant with the position and extent of the attention region, the sound direction estimation unit 13b may identify the attention region in which an extracted participant appears. The sound direction estimation unit 13b may then assign identification information such as an ID to the extracted participant and include, in the attention information, this identification information associated with the pixel coordinates of the position and extent of the attention region.

The attention information determination unit 13c acquires attention information from the line-of-sight estimation unit 13a and the sound direction estimation unit 13b, and determines an indicator representing the attention region on the image. Specifically, the attention information determination unit 13c generates, as the indicator, a rectangular frame circumscribing the attention region on the image, and calculates the pixel coordinates of the vertices of the rectangular frame and the dimensions of the frame. The attention information determination unit 13c transmits attention information, which associates at least the vertex coordinates and dimensions of the rectangular frame with the identification information of its own site, to the terminal devices 10 at other sites via the second communication unit 12. The attention information determination unit 13c also outputs this attention information to the synthesis unit 13d.
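The circumscribing rectangle and the attention information sent to other sites can be sketched as follows, assuming the attention region is given as a collection of pixel coordinates. The JSON layout and its field names (`site`, `vertex`, `size`) are purely hypothetical; the source specifies only that the vertex coordinates, the dimensions, and the site's identification information are associated with one another.

```python
import json
from typing import Iterable, Tuple


def circumscribe(points: Iterable[Tuple[int, int]]) -> Tuple[int, int, int, int]:
    """Smallest axis-aligned rectangle containing all pixels: (x, y, width, height)."""
    pts = list(points)
    if not pts:
        raise ValueError("empty attention region")
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    x0, y0 = min(xs), min(ys)
    # +1 because both end pixels belong to the region
    return x0, y0, max(xs) - x0 + 1, max(ys) - y0 + 1


def attention_message(site_id: str, points: Iterable[Tuple[int, int]]) -> str:
    """Serialize the frame's top-left vertex and size together with the site ID (hypothetical format)."""
    x, y, w, h = circumscribe(points)
    return json.dumps({"site": site_id, "vertex": [x, y], "size": [w, h]})
```

Sending only one vertex and the frame dimensions, rather than every pixel of the region, keeps the message small regardless of the region's shape.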

The synthesis unit 13d acquires the attention information of its own site from the attention information determination unit 13c, and acquires the attention information of each other site from the terminal device 10 at that site. The synthesis unit 13d then combines the attention information for the image captured by the imaging unit 51 at its own site by adding the attention information of the other sites to that of its own site. The own site's attention information consists of the vertex coordinates and dimensions of the indicator representing the attention region calculated by the own site's sound direction estimation unit 13b, and is based on the estimated direction of the speaker. The other sites' attention information consists of the vertex coordinates and dimensions of the indicators representing the attention regions calculated by the line-of-sight estimation units 13a at those sites, and is based on the estimated lines of sight of their participants.
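The "addition" of attention information can be pictured as accumulating rectangles into a coarse count map over the captured image. The sketch below assumes rectangles are `(x, y, width, height)` pixel boxes, uses a hypothetical square cell size, and keeps the own-site speaker map separate from the other sites' gaze map so that different weights can be applied later; the grid representation itself is an assumption, not something the source prescribes.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels of the own-site image
Grid = List[List[int]]


def accumulate(boxes: List[Box], width: int, height: int, cell: int) -> Grid:
    """Count, for each cell of a coarse grid, how many rectangles touch it."""
    cols = (width + cell - 1) // cell
    rows = (height + cell - 1) // cell
    grid = [[0] * cols for _ in range(rows)]
    for x, y, w, h in boxes:
        c0, c1 = max(x // cell, 0), min((x + w - 1) // cell, cols - 1)
        r0, r1 = max(y // cell, 0), min((y + h - 1) // cell, rows - 1)
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                grid[r][c] += 1
    return grid


def synthesize(own_speaker: List[Box], other_gaze: Dict[str, List[Box]],
               width: int, height: int, cell: int) -> Dict[str, Grid]:
    """Combine own-site speaker rectangles and every other site's gaze rectangles."""
    gaze_boxes = [b for boxes in other_gaze.values() for b in boxes]
    return {
        "speaker": accumulate(own_speaker, width, height, cell),
        "gaze": accumulate(gaze_boxes, width, height, cell),
    }
```

Cells looked at by participants at several sites accumulate larger counts, which is one simple way to let more participants influence the result.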

The area determination unit 13e determines the attention area by applying predetermined weights to the attention information combined by the synthesis unit 13d, and outputs information on the attention area to the exposure control unit 13f. Information on the weights is stored, for example, in the storage unit 16.
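Applying predetermined weights to the combined information can be sketched as a weighted sum of the speaker and gaze count maps, followed by a threshold. The weight values `W_SPEAKER` and `W_GAZE` and the relative threshold `ratio` are hypothetical placeholders; the source says only that the weights are predetermined and stored, for example, in the storage unit 16.

```python
from typing import List, Set, Tuple

Grid = List[List[float]]

# Hypothetical predetermined weights (the source does not give values).
W_SPEAKER = 2.0  # own-site speaker direction (sound direction estimation unit 13b)
W_GAZE = 1.0     # other sites' lines of sight (line-of-sight estimation units 13a)


def weighted_scores(speaker: Grid, gaze: Grid,
                    w_speaker: float = W_SPEAKER, w_gaze: float = W_GAZE) -> Grid:
    """Cell-wise weighted sum of the speaker map and the gaze map."""
    return [[w_speaker * s + w_gaze * g for s, g in zip(rs, rg)]
            for rs, rg in zip(speaker, gaze)]


def attention_area(scores: Grid, ratio: float = 0.5) -> Set[Tuple[int, int]]:
    """Cells (row, col) whose score reaches `ratio` times the maximum score."""
    top = max((v for row in scores for v in row), default=0.0)
    if top <= 0:
        return set()
    return {(r, c) for r, row in enumerate(scores)
            for c, v in enumerate(row) if v >= ratio * top}
```

Giving the speaker direction a larger weight than a single remote gaze is one possible design choice; equal weights would treat every participant's attention alike.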

Using the information on the attention area, the exposure control unit 13f adjusts the exposure of the images captured by the imaging unit 51 at its own site. The exposure control unit 13f transmits the exposure-adjusted images to the terminal devices 10 at other sites via the second communication unit 12.

Next, the hardware configuration of the terminal device 10 will be described. FIG. 7 is a block diagram illustrating an example of the hardware configuration of the terminal device 10 according to the first embodiment. As shown in FIG. 7, the terminal device 10 includes, as components, a CPU 111, a nonvolatile storage device 112, a volatile storage device 113, a first communication I/F 114, a second communication I/F 115, an operation I/F 116, a display device 117, and a memory 118. These components are connected to one another, for example, via a bus. The components may be connected via either wired or wireless communication.

An example of the terminal device 10 is a computer device. The terminal system 100 including the terminal device 10 may be a system made up of a plurality of devices or may be a single device. When it is a single device, examples of the terminal system 100 include a computer device and a multifunction television.

The first communication I/F 114 implements the function of the first communication unit 11, and the second communication I/F 115 implements the function of the second communication unit 12. The configurations of the first communication I/F 114 and the second communication I/F 115 are the same as that described above for the communication I/F 125.

The operation I/F 116 implements the function of the terminal operation unit 14; its configuration is the same as that described above for the operation I/F 126. The display device 117 implements the function of the terminal display unit 15; its configuration is the same as that of the display device 127. The memory 118 implements the function of the storage unit 16; its configuration is the same as that of the memory 124.

The CPU 111 implements the functions of the components of the control unit 13 and includes a processor and the like. An example of the nonvolatile storage device 112 is a ROM, and an example of the volatile storage device 113 is a RAM. The configurations of the CPU 111, the nonvolatile storage device 112, and the volatile storage device 113 are the same as those described above for the CPU 121, the nonvolatile storage device 122, and the volatile storage device 123.

<Processing of Terminal Device 10>
The processing of the terminal device 10 is described in detail below. The description covers the processing of the terminal device 10a at the first site CP1 among the four sites CP1 to CP4; the processing of the terminal devices 10b to 10d at the other sites CP2 to CP4 is the same, and its description is therefore omitted.

<Arrangement at First Site CP1>
First, the arrangement at the first site CP1 will be described. FIG. 8A is a plan view illustrating an example of the arrangement of the participants and the imaging unit 51 at the first site CP1 among the conference sites CP1 to CP4. FIG. 8B is a diagram illustrating an example of an image captured by the imaging unit 51 at the first site CP1 of FIG. 8A.

As shown in FIG. 8A, at the first site CP1, one imaging unit 51 and three participants PA to PC are positioned around a rectangular conference desk MD. The imaging unit 51 faces the participant PB, and the participant PA faces the participant PC. The participants PA to PC are within the field of view of the imaging unit 51. The imaging unit 51 is mounted on top of the display unit 54, and the imaging direction of the imaging unit 51 is substantially parallel to the facing direction of the screen of the display unit 54, that is, the direction perpendicular to the screen. By imaging the participants PA to PC, the imaging unit 51 outputs an image I1 as shown in FIG. 8B. The image I1 shows participant images PA1 to PC1 and a conference desk image MD1, which are images of the participants PA to PC and the conference desk MD, respectively.

<Exposure Control by Exposure Control Unit 13f>
Next, the exposure control performed by the exposure control unit 13f will be described. FIG. 9A is a diagram illustrating an example of the photometric regions set on an image in exposure control. FIG. 9B is a diagram illustrating an example of applying the photometric regions to the image I1 of FIG. 8B. As shown in FIG. 9A, in exposure control the image I is divided into a plurality of photometric regions Imn. A photometric region Imn is the smallest unit of area for which a photometric value is calculated. In FIG. 9A, the photometric regions Imn are formed by dividing the image I into m parts horizontally and n parts vertically.
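Dividing an image into m by n photometric regions can be sketched as computing each region's pixel bounds. This sketch assumes end-exclusive `(x0, y0, x1, y1)` bounds and uses integer division so that, when the image size is not a multiple of m or n, the leftover pixels are spread across the regions instead of being dropped; the function name is hypothetical.

```python
from typing import Dict, Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1), end-exclusive pixel bounds


def photometric_regions(width: int, height: int, m: int, n: int) -> Dict[Tuple[int, int], Rect]:
    """Split a width x height image into m columns and n rows of regions, keyed by (i, j)."""
    regions = {}
    for j in range(n):
        y0, y1 = j * height // n, (j + 1) * height // n
        for i in range(m):
            x0, x1 = i * width // m, (i + 1) * width // m
            regions[(i, j)] = (x0, y0, x1, y1)
    return regions
```

For a 1920 by 1080 image with m = 16 and n = 9, every region is a 120 by 120 pixel block and there are 144 regions in total.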

For each photometric region Imn, a photometric value is calculated by integrating the luminance signals representing the pixel values of the pixels included in that region. Each photometric region Imn is also assigned a weight. The average of the photometric values of all photometric regions Imn, each multiplied by its weight, is then calculated as the AE (automatic exposure) evaluation value; that is, the AE evaluation value is the weighted average of the photometric values of all photometric regions Imn. The difference between the AE evaluation value and a target luminance value is calculated as an error amount, and exposure is controlled by adjusting the gain, the exposure time, and the like so that the error amount falls within a predetermined range.
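The AE loop described above can be sketched in three steps: per-region photometric values, their weighted average as the AE evaluation value, and an exposure update driven by the error against the target luminance. Several details here are assumptions rather than the source's method: each region's integral is divided by its pixel count so the AE value shares the scale of the target luminance, the sensor response is taken to be linear in exposure time multiplied by gain, exposure time is raised before gain (up to a hypothetical frame-time limit of 1/30 s), and the tolerance and gain limit are placeholder values.

```python
from typing import Dict, Sequence, Tuple

Key = Tuple[int, int]             # (i, j) index of a photometric region
Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1), end-exclusive


def photometric_values(lum: Sequence[Sequence[float]], regions: Dict[Key, Rect]) -> Dict[Key, float]:
    """Integrate luminance over each region, normalized by pixel count (assumption)."""
    values = {}
    for key, (x0, y0, x1, y1) in regions.items():
        total = sum(lum[y][x] for y in range(y0, y1) for x in range(x0, x1))
        values[key] = total / ((x1 - x0) * (y1 - y0))
    return values


def ae_evaluation(values: Dict[Key, float], weights: Dict[Key, float]) -> float:
    """Weighted average of the photometric values: the AE evaluation value."""
    wsum = sum(weights[k] for k in values)
    return sum(values[k] * weights[k] for k in values) / wsum


def update_exposure(ae: float, target: float, exposure_s: float, gain: float,
                    tolerance: float = 8.0, max_exposure_s: float = 1 / 30,
                    max_gain: float = 16.0) -> Tuple[float, float]:
    """One AE step: keep settings if |ae - target| <= tolerance, otherwise rescale."""
    if abs(ae - target) <= tolerance:
        return exposure_s, gain
    total = exposure_s * gain * target / max(ae, 1e-6)  # linear-response assumption
    new_exposure = min(total, max_exposure_s)  # use exposure time first (less noise than gain)
    new_gain = min(max(total / new_exposure, 1.0), max_gain)
    return new_exposure, new_gain
```

With two regions at luminance 50 and 200, equal weights give an AE evaluation value of 125, while weights of 3 and 1 pull it down to 87.5, so the controller brightens the frame to suit the heavily weighted region.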

For example, when all the photometric regions Imn have the same weight, exposure is adjusted uniformly over the entire field of view of the image. When exposure control is to emphasize the participant image PB1 shown in FIG. 9B, for example, the photometric regions Imn near the participant image PB1 are given larger weights than the other photometric regions Imn. Thus, by controlling the weight of each photometric region Imn, exposure can be matched to a target subject in the image. This suppresses blown-out highlights, crushed shadows, and conditions close to them in the target subject.

For example, the exposure control unit 13f sets a weight for each photometric region Imn of the image I1 using the weighted attention-area information acquired from the area determination unit 13e. FIG. 10 is a diagram illustrating an example of weighting the photometric regions Imn of the image I1 of FIG. 9B in exposure control. The image I1 of FIG. 10 shows the attention area, which consists of the participant image PB1 and its surroundings. Within the attention area, the photometric regions Imnc shown with dark dots receive the most attention, that is, they have the highest degree of attention. The photometric regions Imnb shown with light dots receive the next most attention, that is, they have the next highest degree of attention. The plain photometric regions Imna receive no attention; they are non-attention areas.

For example, the exposure control unit 13f gives the largest exposure-control weight to the photometric regions Imnc. It gives the photometric regions Imnb a weight smaller than that of Imnc but larger than the normal weight preset for the image I1. It gives the photometric regions Imna a weight smaller than the normal weight, or no weight at all, that is, zero. By weighting the areas that attract more attention more heavily, the exposure control unit 13f thus performs exposure control that emphasizes the attention area.
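A minimal sketch of the tier-to-weight mapping just described. The numeric weights (4.0, 2.0, 1.0, 0.0) and the tier labels are hypothetical; the text fixes only their ordering: Imnc above Imnb, Imnb above the normal weight, and Imna below the normal weight or zero.

```python
NORMAL_WEIGHT = 1.0  # hypothetical preset weight for the image

# Hypothetical weights per attention tier; only the ordering comes from the text.
TIER_WEIGHTS = {
    "highest": 4.0,  # Imnc: most attention
    "next": 2.0,     # Imnb: next most attention, still above the normal weight
    "none": 0.0,     # Imna: no attention; could instead be any value below normal
}


def weights_from_tiers(tiers):
    """Map a grid of tier labels to a grid of exposure-control weights."""
    assert (TIER_WEIGHTS["highest"] > TIER_WEIGHTS["next"] > NORMAL_WEIGHT
            > TIER_WEIGHTS["none"] >= 0), "tier ordering violated"
    return [[TIER_WEIGHTS[t] for t in row] for row in tiers]


tiers = [["none", "next", "none"],
         ["next", "highest", "next"]]
print(weights_from_tiers(tiers))  # [[0.0, 2.0, 0.0], [2.0, 4.0, 2.0]]
```

The resulting grid plays the role of the per-region weights in the weighted average that yields the AE evaluation value.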

<Estimation of the region of interest by the gaze estimation unit 13a>
The process by which the gaze estimation unit 13a estimates a region of interest will be described. FIG. 11A is a diagram illustrating an example of line-of-sight information of the participant Vb at the second site CP2 with respect to the image of the first site CP1 in FIG. 8B. FIG. 11B is a diagram illustrating an example of line-of-sight information of the participant Vc at the third site CP3 with respect to the image of the first site CP1 in FIG. 8B. FIG. 11C is a diagram illustrating an example of the image of FIG. 9B in which attention information for the photometric regions has been set using the line-of-sight information from the second site CP2 and the third site CP3.

As shown in FIGS. 11A and 11B, the image I1 captured by the imaging unit 51 at the first site CP1 is transmitted by the terminal device 10a to the terminal devices 10b to 10d at the other sites. For example, an image I2 corresponding to the image I1 is displayed on the display unit 54 at the second site CP2, and an image I3 corresponding to the image I1 is displayed on the display unit 54 at the third site CP3.

As shown in FIG. 11A, the gaze estimation unit 13a of the terminal device 10b at the second site CP2 estimates the gaze direction of the participant Vb using an image of the participant Vb captured by the imaging unit 51 at its own site. Using the gaze direction of the participant Vb and information on the relative positions of the imaging unit 51 and the display unit 54 at the second site CP2, the gaze estimation unit 13a then calculates, in pixel coordinates, the position and extent of the region of interest F2 on the image I2 shown on the display unit 54 at its own site.
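One way to turn an estimated gaze direction into a point on the displayed image I2 is a ray-plane intersection. This is a minimal sketch, assuming the eye position and gaze direction have already been expressed in a display-centred frame (origin at the top-left corner of the screen, x to the right, y downward, z toward the viewer, in millimetres) using the known camera-display geometry, and assuming the image fills the screen. The function name and parameters are hypothetical; the text does not specify the gaze model.

```python
def gaze_to_pixel(eye, direction, screen_mm, screen_px):
    """Intersect a gaze ray with the display plane z = 0 and return pixel coords.

    eye: (x, y, z) eye position in mm, z > 0 in front of the screen.
    direction: (dx, dy, dz) gaze direction; dz < 0 points toward the screen.
    screen_mm: (width_mm, height_mm) of the visible display area.
    screen_px: (width_px, height_px) of the displayed image.
    Returns (u, v) pixel coordinates, or None if the gaze misses the screen.
    """
    ex, ey, ez = eye
    dx, dy, dz = direction
    if dz >= 0:
        return None  # looking away from, or parallel to, the screen
    t = -ez / dz  # ray parameter at which z reaches 0
    x_mm, y_mm = ex + t * dx, ey + t * dy
    width_mm, height_mm = screen_mm
    width_px, height_px = screen_px
    if not (0 <= x_mm <= width_mm and 0 <= y_mm <= height_mm):
        return None  # gaze point falls outside the display
    return (x_mm * width_px / width_mm, y_mm * height_px / height_mm)


# Participant 600 mm in front of a 600 x 337.5 mm screen showing a 1920 x 1080 image.
print(gaze_to_pixel((300, 200, 600), (0, 0, -1), (600, 337.5), (1920, 1080)))  # (960.0, 640.0)
```

A region of interest such as F2 could then be taken as, for example, a fixed-radius neighbourhood of this point; that choice is also an assumption.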

The attention information determination unit 13c of the terminal device 10b uses the pixel coordinates of the position and extent of the region of interest F2 to calculate the pixel coordinates (x2, y2) of one vertex of a frame F2f circumscribing the region of interest F2, and the dimensions (w2, z2) of the frame F2f. Both the vertex coordinates and the frame dimensions are expressed in the pixel coordinates of the image I2. The component x2 is the coordinate along the horizontal direction of the image I2, and y2 is the coordinate along its vertical direction. The dimension w2 is the width of the frame along the horizontal direction of the image I2, and z2 is its height along the vertical direction. The horizontal and vertical directions of the image I2 are, respectively, the directions along which its grid-arranged pixels line up across and down.
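Computing the circumscribing frame amounts to taking the axis-aligned bounding box of the region's pixels. A minimal sketch, assuming the region of interest is available as a collection of (x, y) pixel coordinates and that the reported vertex is the top-left corner (the text says only "one vertex"); the function name is hypothetical.

```python
def circumscribing_frame(pixels):
    """Return ((x, y), (w, z)): top-left vertex and size of the bounding frame.

    pixels: iterable of (x, y) integer pixel coordinates of the region of interest.
    w is the width along the horizontal direction, z the height along the vertical.
    """
    pixels = list(pixels)  # allow generators, which can only be iterated once
    if not pixels:
        raise ValueError("region of interest is empty")
    xs = [p[0] for p in pixels]
    ys = [p[1] for p in pixels]
    x, y = min(xs), min(ys)
    # +1 because both extreme pixels belong to the region.
    return (x, y), (max(xs) - x + 1, max(ys) - y + 1)


region_f2 = [(120, 40), (130, 45), (125, 60), (140, 52)]
print(circumscribing_frame(region_f2))  # ((120, 40), (21, 21))
```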

The attention information determination unit 13c then transmits to the terminal device 10a at the first site CP1 attention information that associates the ID "CP2" of its own site, the ID "C" of the participant at the first site CP1 who is the target of the region of interest, the pixel coordinates (x2, y2) of the frame vertex, and the frame dimensions (w2, z2).
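The attention information is essentially a small record keyed by site and participant. A minimal sketch of one possible encoding, assuming JSON over the existing communication channel; the class name, field names, and JSON layout are hypothetical, since the text does not define a wire format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class AttentionInfo:
    site_id: str         # ID of the sending site, e.g. "CP2"
    participant_id: str  # participant at the first site being looked at, e.g. "C"
    vertex: tuple        # (x, y) pixel coordinates of the frame vertex
    size: tuple          # (w, z) frame dimensions in pixels

    def to_json(self):
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text):
        d = json.loads(text)
        # JSON has no tuple type; restore tuples on the receiving side.
        return cls(d["site_id"], d["participant_id"], tuple(d["vertex"]), tuple(d["size"]))


msg = AttentionInfo("CP2", "C", (120, 40), (21, 21))
wire = msg.to_json()
print(wire)  # {"site_id": "CP2", "participant_id": "C", "vertex": [120, 40], "size": [21, 21]}
```

Round-tripping through `from_json` on the receiving terminal yields an equal record.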

As illustrated in FIG. 11B, the gaze estimation unit 13a of the terminal device 10c at the third site CP3 estimates the gaze direction of the participant Vc using an image of the participant Vc captured by the imaging unit 51 at its own site. Using the gaze direction of the participant Vc and information on the relative positions of the imaging unit 51 and the display unit 54 at the third site CP3, the gaze estimation unit 13a then calculates, in pixel coordinates, the position and extent of the region of interest F3 on the image I3 shown on the display unit 54 at its own site.

The attention information determination unit 13c of the terminal device 10c uses the pixel coordinates of the position and extent of the region of interest F3 to calculate, in the pixel coordinates of the image I3, the pixel coordinates (x3, y3) of a vertex of the frame F3f circumscribing the region of interest F3 and the frame dimensions (w3, z3).

The attention information determination unit 13c then transmits to the terminal device 10a at the first site CP1 attention information that associates the ID "CP3" of its own site, the ID "B" of the participant at the first site CP1 who is the target of the region of interest, the pixel coordinates (x3, y3) of the frame vertex, and the frame dimensions (w3, z3).

Meanwhile, the combining unit 13d of the terminal device 10a at the first site CP1 generates an image I1A by dividing the image I1 into photometric regions. The combining unit 13d applies the attention information received from the terminal device 10 at each site to the image I1A.

As illustrated in FIG. 11C, for example, the combining unit 13d forms a frame F2f on the image I1A by applying the attention information from the terminal device 10b at the second site CP2, and forms a frame F3f on the image I1A by applying the attention information from the terminal device 10c at the third site CP3.

For each of the frames F2f and F3f, the combining unit 13d designates the photometric regions entirely contained in the frame as first regions of interest, that is, photometric regions with the highest degree of attention. It designates the photometric regions partially contained in the frame as second regions of interest, that is, photometric regions with the next highest degree of attention. It designates the photometric regions not contained in the frame as non-attention regions, that is, photometric regions that receive no attention.
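The three-way classification reduces to a containment and overlap test between each photometric region and a frame. A minimal sketch, assuming half-open pixel rectangles, a frame given as (x, y, w, z) with (x, y) the top-left vertex, and the numeric degrees used in the FIG. 13 example (3, 2, and 0); the function names are hypothetical.

```python
FIRST, SECOND, NONE = 3, 2, 0  # degrees of attention as in the FIG. 13 example


def classify_region(cell, frame):
    """cell: (x0, y0, x1, y1); frame: (x, y, w, z). Both half-open, in pixels."""
    cx0, cy0, cx1, cy1 = cell
    fx0, fy0 = frame[0], frame[1]
    fx1, fy1 = fx0 + frame[2], fy0 + frame[3]
    if fx0 <= cx0 and cx1 <= fx1 and fy0 <= cy0 and cy1 <= fy1:
        return FIRST   # region lies entirely inside the frame
    if cx0 < fx1 and fx0 < cx1 and cy0 < fy1 and fy0 < cy1:
        return SECOND  # region overlaps the frame only partly
    return NONE        # region does not touch the frame


def attention_grid(width, height, m, n, frame):
    """Degree of attention for every region of an m x n grid, for one frame."""
    grid = []
    for j in range(n):
        y0, y1 = j * height // n, (j + 1) * height // n
        grid.append([classify_region((i * width // m, y0, (i + 1) * width // m, y1), frame)
                     for i in range(m)])
    return grid


# 400 x 400 image, 4 x 4 regions of 100 px; frame covers x 100-300, y 100-250.
for row in attention_grid(400, 400, 4, 4, (100, 100, 200, 150)):
    print(row)
# [0, 0, 0, 0]
# [0, 3, 3, 0]
# [0, 2, 2, 0]
# [0, 0, 0, 0]
```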

The photometric regions within a first region of interest may be further differentiated by extracting the participant images PA1 to PC1 from the image I1A. For example, the combining unit 13d may determine that, within a first region of interest, photometric regions that at least partially contain any of the participant images PA1 to PC1 have a higher degree of attention than photometric regions that contain none of them.

The combining unit 13d designates the area made up of the first and second regions of interest as the target photometric region. In FIG. 11C, first regions of interest are shown with dark dots, second regions of interest with light dots, and non-attention regions are plain. In this way, the terminal device 10a at the first site CP1 determines the target photometric region in the image I1 of the first site CP1 using regions of interest derived from the line-of-sight information at the other sites. Here, the region-of-interest information based on line-of-sight information is an example of viewpoint information.

<Estimation of the region of interest by the sound direction estimation unit 13b>
The region-of-interest processing by the sound direction estimation unit 13b will be described. FIG. 12A is a diagram illustrating an example of the direction of a speaker at the first site CP1. FIG. 12B is a diagram illustrating an example of the image of FIG. 9B in which attention information for the photometric regions has been set using the direction of the speaker at the first site CP1.

As shown in FIG. 12A, at the first site CP1 the voice input unit 52 is placed below the imaging unit 51 and the display unit 54 and picks up the voices of the participants PA to PC. For example, when the participant PC speaks, the voice input unit 52 picks up the voice of the participant PC and outputs the audio data to the sound direction estimation unit 13b of the terminal device 10a. Using the acquired audio data, the sound direction estimation unit 13b estimates the direction from the voice input unit 52 to the participant PC, the source of the sound. Using the direction of the participant PC and information on the relative positions of the imaging unit 51 and the voice input unit 52 at its own site, the sound direction estimation unit 13b then calculates, in pixel coordinates, the position and extent of the sound source region, which serves as the region of interest on the image I1 captured by the imaging unit 51.
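The text does not say how the sound direction is estimated. As one common approach, the sketch below assumes a hypothetical pair of microphones in the voice input unit 52, spaced d metres apart horizontally, a far-field source, and a time difference of arrival (TDOA) already measured between them; the azimuth θ then follows from sin θ = c·Δt / d. It further assumes the microphones sit on the camera's vertical axis, so the azimuth can be projected to an image column with a pinhole model of known horizontal field of view. All names and parameters are illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C


def azimuth_from_tdoa(tdoa_s, mic_spacing_m):
    """Far-field azimuth in radians, positive to the right of the camera axis.

    tdoa_s: arrival time at the left mic minus arrival time at the right mic.
    """
    s = SPEED_OF_SOUND * tdoa_s / mic_spacing_m
    s = max(-1.0, min(1.0, s))  # clamp measurement noise into asin's domain
    return math.asin(s)


def azimuth_to_column(azimuth_rad, image_width_px, hfov_rad):
    """Project an azimuth onto a pixel column with a pinhole camera model."""
    focal_px = (image_width_px / 2) / math.tan(hfov_rad / 2)
    return image_width_px / 2 + focal_px * math.tan(azimuth_rad)


# Speaker 30 degrees to the right, mics 10 cm apart, 90-degree lens, 1920-px-wide image.
tdoa = 0.5 * 0.10 / SPEED_OF_SOUND
theta = azimuth_from_tdoa(tdoa, 0.10)
print(round(math.degrees(theta), 3))                            # 30.0
print(round(azimuth_to_column(theta, 1920, math.radians(90))))  # 1514
```

The pixel column only fixes the horizontal position of the sound source region; its vertical extent and width would need further assumptions, such as a band around the column.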

The attention information determination unit 13c of the terminal device 10a uses the pixel coordinates of the position and extent of the sound source region to calculate the pixel coordinates of one vertex of a frame FAf circumscribing the sound source region and the dimensions of the frame FAf.

The attention information determination unit 13c outputs to the combining unit 13d of the terminal device 10a attention information that associates the ID "CP1" of its own site, the ID "C" of the participant who is the target of the region of interest, the pixel coordinates of the vertex of the frame FAf, and the dimensions of the frame FAf.

The combining unit 13d also generates the image I1A by dividing the image I1 into photometric regions, and applies the attention information acquired from the attention information determination unit 13c to the image I1A.

As shown in FIG. 12B, for example, the combining unit 13d forms a frame FAf on the image I1A by applying the attention information. The combining unit 13d designates the photometric regions entirely contained in the frame FAf as first regions of interest, those partially contained in the frame FAf as second regions of interest, and those not contained in the frame FAf as non-attention regions.

The photometric regions within a first region of interest may be further differentiated by extracting the participant images PA1 to PC1 from the image I1A. For example, the combining unit 13d may set the degree of attention of photometric regions within a first region of interest that at least partially contain any of the participant images PA1 to PC1 higher than that of photometric regions that contain none of them.

The combining unit 13d designates the area made up of the first and second regions of interest as the target photometric region. In this way, the terminal device 10a at the first site CP1 determines the target photometric region in the image I1 of the first site CP1 using a region of interest derived from sound-direction information at its own site.

<Attention area determination by the combining unit 13d>
The attention area determination process of the combining unit 13d will be described. FIG. 13 is a diagram illustrating an example of the attention information of each photometric region, set from the attention information of each site. FIG. 13 shows how the attention area is determined for the image I1A from the imaging unit 51 at the first site CP1. In FIG. 13, the speaker is the participant PA.

For the image I1A, the combining unit 13d of the terminal device 10a at the first site CP1 acquires information on the target photometric region based on sound-direction information from the attention information determination unit 13c at its own site, and information on the target photometric regions based on line-of-sight information from the attention information determination units 13c of the terminal devices 10b to 10d at the other sites CP2 to CP4. The information on a target photometric region includes the positions and degrees of attention of the target photometric regions and the non-attention regions; in other words, it includes the degree of attention of every photometric region. Each attention information determination unit 13c expresses the degree of attention as a number: the larger the value, the higher the degree of attention.

In FIG. 13, for example, the degree of attention is "3" for a first region of interest, "2" for a second region of interest, and "0" for a non-attention region. In the target photometric region of the first site CP1, for example, the photometric regions containing the face of the participant image PA1 are first regions of interest, and the photometric regions around the participant image PA1 are second regions of interest. In the target photometric regions from the other sites CP2 to CP4, the photometric regions are second regions of interest.

The combining unit 13d adds, to the degree of attention of each photometric region in the target photometric region of the first site CP1, the degrees of attention of the corresponding photometric regions in the target photometric regions of the other sites CP2 to CP4, where corresponding regions are those at the same position on the image I1A. The combining unit 13d thereby generates summed attention information, in which the degree of attention of each photometric region is the summed value, and outputs it to the area determination unit 13e.
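The per-region summation is an element-wise addition of equally sized grids. A minimal sketch, assuming every site's attention information has already been mapped onto the same m × n grid of the image I1A; the grids below are illustrative values in the spirit of FIG. 13, not the figure's actual contents, and the function name is hypothetical.

```python
def sum_attention(grids):
    """Element-wise sum of per-site attention grids (lists of equal-size rows)."""
    rows, cols = len(grids[0]), len(grids[0][0])
    if any(len(g) != rows or any(len(r) != cols for r in g) for g in grids):
        raise ValueError("all attention grids must share the same m x n layout")
    return [[sum(g[j][i] for g in grids) for i in range(cols)] for j in range(rows)]


cp1 = [[0, 2, 0], [2, 3, 2]]  # own site, from sound direction (speaker's face = 3)
cp2 = [[0, 0, 0], [0, 2, 2]]  # other sites, from line of sight
cp3 = [[0, 2, 0], [0, 2, 0]]
cp4 = [[0, 0, 0], [2, 2, 0]]
summed = sum_attention([cp1, cp2, cp3, cp4])
print(summed)  # [[0, 4, 0], [4, 9, 4]]
```

The layout check guards against a site reporting on a differently divided image, in which case region positions would not correspond.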

The area determination unit 13e extracts the photometric region with the highest degree of attention from the acquired summed attention information. In FIG. 13, for example, the degree of attention "9" is the highest, so the photometric region with degree "9" can be regarded as attracting the most attention.

As shown in FIG. 14, for example, the area determination unit 13e determines the attention area by resetting the degrees of attention of the photometric regions surrounding the photometric region with degree "9", according to a preset weighting method. FIG. 14 is a diagram illustrating an example of the attention information of each photometric region after the attention information of each photometric region in FIG. 13 has been weighted.

In FIG. 14, the weighting method is a center-weighted scheme in which the degree of attention decreases gradually, for example at a predetermined rate, from the photometric region with degree "9" toward the periphery. However, the weighting method is not limited to the method of FIG. 13. For example, it may be a spot scheme in which a degree of attention is set only for the photometric region with the highest degree of attention, or only for photometric regions at or above a predetermined degree of attention. The area determination unit 13e outputs attention area information containing the resulting degree of attention of each photometric region to the exposure control unit 13f.
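Both weighting schemes can be sketched as functions that rebuild the grid around the peak. This is a minimal sketch: the decay rate of 0.5 per ring of regions, the use of ring (Chebyshev) distance, and the choice of the first peak in row-major order are assumptions; the text says only that attention decreases, for example at a predetermined rate, toward the periphery. The function names are hypothetical.

```python
def center_weighted(summed, decay=0.5):
    """Emphasis scheme: the peak degree at the highest-attention region,
    multiplied by `decay` for every ring of regions further away."""
    peak = max(max(row) for row in summed)
    # First region, in row-major order, holding the peak degree.
    pj, pi = next((j, i) for j, row in enumerate(summed)
                  for i, v in enumerate(row) if v == peak)
    return [[peak * decay ** max(abs(j - pj), abs(i - pi)) for i in range(len(row))]
            for j, row in enumerate(summed)]


def spot(summed, threshold=None):
    """Spot scheme: keep degrees only for the peak region(s), or for all regions
    at or above `threshold` if one is given; zero elsewhere."""
    if threshold is None:
        threshold = max(max(row) for row in summed)
    return [[v if v >= threshold else 0 for v in row] for row in summed]


summed = [[0, 4, 0, 0, 0],
          [4, 9, 4, 0, 0],
          [0, 2, 0, 0, 0]]
print(center_weighted(summed)[1])  # [4.5, 9.0, 4.5, 2.25, 1.125]
print(spot(summed)[1])             # [0, 9, 0, 0, 0]
print(spot(summed, threshold=4))   # [[0, 4, 0, 0, 0], [4, 9, 4, 0, 0], [0, 0, 0, 0, 0]]
```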

The exposure control unit 13f controls exposure over the photometric regions of the image I1A, weighting each region according to its degree of attention in the attention area information. In other words, the exposure control unit 13f uses the attention area information to weight the photometric regions in exposure control. This enables optimal exposure for the area that the most people are looking at.

In the summed attention information, two or more photometric regions sharing the highest degree of attention may also lie apart from each other. FIG. 15, for example, is a diagram illustrating an example in which high-attention regions are scattered. FIG. 15 shows two separate attention areas, each containing a photometric region with the maximum degree of attention "9"; that is, the participants' attention is split between two points. In such a case, the exposure control unit 13f may choose the exposure control method according to the magnitude of the difference between the photometric values of the two photometric regions with degree "9".

For example, if the absolute value of the difference between the photometric values is within a predetermined range, exposure matched to either photometric region is optimal for both attention areas. The exposure control unit 13f therefore performs exposure control matched to either one of the photometric regions.

If the absolute value of the difference exceeds the predetermined range, on the other hand, one photometric region probably corresponds to a backlit part of the image and the other to a front-lit part. In this case, the exposure control unit 13f switches to an exposure control mode with a wider dynamic range. Examples of such modes include histogram-based photometry, WDR (wide dynamic range) compositing, and HDR (high dynamic range) compositing. This switching makes it possible to generate an image in which both attention areas are easy for users to see, even when their brightness differs greatly.
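The choice between these two cases can be sketched as a small decision function. This is a minimal sketch, assuming the photometric values are on a common luminance scale and that the "predetermined range" is a tolerance on the absolute difference; the tolerance value, function name, and returned mode labels are hypothetical. With three or more peak regions it compares every pair, which is one reading of the differences described for the flowchart of FIG. 16.

```python
from itertools import combinations


def choose_exposure_mode(summed, photometric, tolerance):
    """Decide how to expose when several regions may share the top attention degree.

    summed: grid of summed degrees of attention.
    photometric: grid of photometric values for the same regions.
    Returns ("weighted", (j, i)) to centre weighting on one peak region,
    or ("wide_dynamic_range", None) to switch to a WDR/HDR-type mode.
    """
    peak = max(max(row) for row in summed)
    peaks = [(j, i) for j, row in enumerate(summed)
             for i, v in enumerate(row) if v == peak]
    if len(peaks) == 1:
        return ("weighted", peaks[0])
    diffs = [abs(photometric[a[0]][a[1]] - photometric[b[0]][b[1]])
             for a, b in combinations(peaks, 2)]
    if all(d <= tolerance for d in diffs):
        # Exposing for any one peak also suits the others; pick the first.
        return ("weighted", peaks[0])
    # Large brightness gap, e.g. one peak backlit and the other front-lit.
    return ("wide_dynamic_range", None)


summed = [[9, 0, 0, 9]]
print(choose_exposure_mode(summed, [[110, 0, 0, 120]], tolerance=20))  # ('weighted', (0, 0))
print(choose_exposure_mode(summed, [[30, 0, 0, 200]], tolerance=20))   # ('wide_dynamic_range', None)
```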

<Operation of the terminal device 10>
The operation of the terminal device 10 will be described. FIG. 16 is a flowchart showing an example of the operation of the terminal system 100 according to Embodiment 1. The following describes the operation of the terminal device 10a at the first site CP1. The terminal devices 10b to 10d at the other sites CP2 to CP4 operate in the same way as the terminal device 10a, so their description is omitted.

As shown in FIG. 16, when the conference starts, the terminal device 10a acquires attention information for its own site by calculating it from sound-direction information at its own site (step S1). The terminal device 10a also acquires attention information for the other sites, based on line-of-sight information at those sites, from the terminal devices 10b to 10d at the other sites (step S2).

Next, the terminal device 10a obtains the summed attention information by adding the attention information of the other sites to that of its own site (step S3). If the summed attention information contains only one photometric region with the highest degree of attention (NO in step S4), the terminal device 10a proceeds to step S5; if it contains two or more such regions (YES in step S4), it proceeds to step S6.

In step S5, the terminal device 10a determines an attention area centered on the single photometric region with the highest degree of attention and generates information on that attention area. The terminal device 10a then performs exposure control using the attention area information as the weights of the photometric regions (step S10), and proceeds to step S11.

In step S6, the terminal device 10a calculates the differences in photometric value between the photometric regions with the highest degree of attention. If the absolute values of all the differences are within a predetermined range (YES in step S7), the terminal device 10a proceeds to step S8; if the absolute value of any difference is outside the predetermined range (NO in step S7), it proceeds to step S9. When there are three or more photometric regions with the highest degree of attention, two or more differences are calculated.

In step S8, the terminal device 10a determines an attention area centered on any one of the photometric regions with the highest degree of attention and generates information on that attention area. The terminal device 10a then proceeds to step S10.

In step S9, the terminal device 10a determines that the scene has a wide dynamic range and switches exposure control to a method that widens the dynamic range. The terminal device 10a then proceeds to step S11.

In step S11, if the terminal device 10a has received an instruction to end the conference (YES in step S11), it ends the series of processes; if it has not (NO in step S11), it returns to step S1.

<Effects, etc.>
The terminal device 10a according to Embodiment 1 described above is located at the first site CP1 and communicates with the terminal devices 10b to 10d located at the other sites CP2 to CP4. The terminal device 10a includes: the second communication unit 12, serving as an output unit that outputs a first site image, which is an image of the first site CP1, to the terminal devices 10b to 10d; the first communication unit 11, serving as an acquisition unit that acquires from each of the terminal devices 10b to 10d the viewpoint information of its users with respect to the first site image displayed at that terminal's site CP2 to CP4; and the exposure control unit 13f, which performs exposure control on the first site image using as the photometric region a region of the first site image that receives a high degree of user attention, as obtained from the plural pieces of viewpoint information.

With this configuration, the terminal device 10a controls the exposure of the first site image using viewpoint information about that image from the other sites. By performing exposure control that uses as the photometric region a high-attention region of the first site image obtained from multiple pieces of viewpoint information, the terminal device 10a can present that region clearly to many users. In other words, the terminal device 10a can perform exposure control that is optimal for more users.

The terminal device 10a according to Embodiment 1 may further include the sound direction estimation unit 13b. The sound direction estimation unit 13b may function as a direction estimating unit that estimates the direction of a sound source from audio data collected at the first site CP1, and as a position estimating unit that uses that direction to estimate the position of the sound source in the first site image. The exposure control unit 13f may then perform exposure control on the first site image using, as the photometric area, a high-attention region of the first site image obtained from the plurality of pieces of viewpoint information together with the sound source position information.
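The document does not say how the sound direction estimation unit 13b derives a direction. As an illustrative assumption only, the sketch below uses a common far-field relation for a two-microphone array, sin(theta) = c * tau / d (tau is the time difference of arrival, d the microphone spacing), and then maps the azimuth to an image column with a pinhole camera model, assuming the microphones and camera are co-located and share a forward axis. The sign convention (positive azimuth to the right) depends on microphone ordering and is also assumed.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C


def azimuth_from_tdoa(tdoa_s, mic_spacing_m):
    """Far-field azimuth in radians (0 = straight ahead) from a two-mic delay."""
    s = SPEED_OF_SOUND * tdoa_s / mic_spacing_m
    s = max(-1.0, min(1.0, s))  # clamp noisy delays beyond the physical limit
    return math.asin(s)


def azimuth_to_column(azimuth_rad, image_width, hfov_rad):
    """Map an azimuth to a pixel column, assuming co-located mics and camera."""
    focal_px = (image_width / 2) / math.tan(hfov_rad / 2)
    x = image_width / 2 + focal_px * math.tan(azimuth_rad)
    return min(max(int(round(x)), 0), image_width - 1)
```

With a 1920-pixel-wide image and a 90-degree horizontal field of view, a source at 30 degrees maps to column 1514; sources outside the field of view are clamped to the image edge.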

With this configuration, the information used by the exposure control unit 13f for exposure control includes both the viewpoint information from the other sites and the sound source position information at the first site CP1. The position of a sound source in the first site image can, for example, be regarded as a position that attracts a high degree of attention from users at the other sites. By metering on a region that ranks high in attention according to both the viewpoint information and the sound source position information, the terminal device 10a can perform exposure control that is optimal for a larger number of users.
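A minimal sketch of one way to combine the two sources, under stated assumptions: gaze attention is a per-cell count map (as in a grid-based attention map), each detected sound source is known only by its horizontal pixel position, and each source adds a fixed, hypothetical bonus `sound_weight` to every cell in its grid column. The document does not specify how the two kinds of information are combined.

```python
import numpy as np


def combined_attention(gaze_counts, sound_columns_px, image_width, sound_weight=1.0):
    """Combine gaze-based attention with estimated sound source positions.

    gaze_counts: 2-D array of gaze points per grid cell (rows x cols).
    sound_columns_px: horizontal pixel positions of detected sound sources.
    sound_weight: hypothetical attention bonus per sound source.
    Each source raises every cell in its grid column, because a horizontal
    microphone array yields only a horizontal position.
    """
    combined = np.asarray(gaze_counts, dtype=float).copy()  # leave input intact
    cols = combined.shape[1]
    for x in sound_columns_px:
        c = min(max(int(x * cols / image_width), 0), cols - 1)
        combined[:, c] += sound_weight
    return combined
```

The highest-scoring cell of the combined map can then serve as the photometric area. Raising `sound_weight` lets the speaker's position win over scattered gaze; lowering it lets gaze dominate.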

The terminal device 10a according to Embodiment 1 may also include the gaze estimation unit 13a. The gaze estimation unit 13a may estimate the gaze of the users at the first site CP1 from the first site image. It may further function as a viewpoint generation unit that uses the gaze information to generate viewpoint information indicating where a user at the first site CP1 is looking within a second site image, that is, an image of the site of one of the terminal devices 10b to 10d that is acquired from that terminal and displayed at the first site. The second communication unit 12 may then output the viewpoint information generated by the gaze estimation unit 13a to the terminal devices 10b to 10d.

With this configuration, the terminal device 10a can send each of the other terminal devices 10b to 10d the viewpoint information of the first site CP1 with respect to the image of that terminal's site. Consequently, each of the terminal devices 10a to 10d can perform exposure control using viewpoint information from the other sites.
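A minimal sketch of the viewpoint generation step, assuming a hypothetical display layout in which each remote site image occupies a rectangular tile on the display at the first site and gaze estimation yields a point on the display in pixels. The sketch finds the tile under the gaze point and converts the point to normalized coordinates within that site's image; the `Tile` type and the function name are illustrative, not from the document.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    site_id: str  # remote site shown in this tile, e.g. "CP2"
    x: int        # tile origin on the display, in pixels
    y: int
    width: int
    height: int


def viewpoint_on_remote_image(gaze_xy, tiles):
    """Return (site_id, u, v) with u, v in [0, 1) inside that site's image, or None."""
    gx, gy = gaze_xy
    for t in tiles:
        if t.x <= gx < t.x + t.width and t.y <= gy < t.y + t.height:
            return t.site_id, (gx - t.x) / t.width, (gy - t.y) / t.height
    return None  # gaze is off-screen or between tiles
```

Sending normalized (u, v) values rather than display pixels is a design choice that lets each receiving terminal scale the viewpoint to its own capture resolution.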

In the terminal device 10a according to Embodiment 1, when there are a plurality of maximum regions, that is, regions with the highest degree of attention, the exposure control unit 13f may perform exposure control that widens the dynamic range if the difference in luminance value between the maximum regions in the first site image exceeds a predetermined range. The luminance value may be the photometric value of each photometric area.

With this configuration, a luminance difference between maximum regions that falls outside the predetermined range may indicate, for example, that one maximum region was captured against backlight while the other was captured under front lighting. By applying dynamic-range-widening exposure control to such maximum regions, the terminal device can produce an image in which both regions are easy for users to see, even when their brightness differs greatly.
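The decision described above can be sketched as follows. The threshold `max_diff` stands in for the unspecified "predetermined range" and its default of 64 (on an 8-bit scale) is a hypothetical value. How the dynamic range is actually widened (for example, multi-exposure HDR capture) is not specified in the document and is outside this sketch.

```python
import numpy as np


def choose_exposure_mode(attention, region_luma, max_diff=64.0):
    """Decide between normal metering and dynamic-range widening.

    attention: 1-D array of attention scores per candidate region.
    region_luma: 1-D array of metered luminance per region (same order).
    max_diff: hypothetical threshold for the luminance gap.
    Returns ("wide_dr", indices) if several regions tie for the maximum and
    their luminance spread exceeds max_diff, else ("normal", [best index]).
    """
    attention = np.asarray(attention, dtype=float)
    region_luma = np.asarray(region_luma, dtype=float)
    top = np.flatnonzero(attention == attention.max())  # all maximum regions
    if len(top) > 1:
        spread = region_luma[top].max() - region_luma[top].min()
        if spread > max_diff:
            return "wide_dr", top.tolist()
    return "normal", [int(top[0])]
```

For example, two equally watched regions metering at 30 (backlit) and 220 (front-lit) trigger dynamic-range widening, while two at 100 and 130 are handled by normal metering.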

The video conference system 1, serving as the communication system according to Embodiment 1, includes a plurality of terminal devices 10 that are located at a plurality of sites and communicate with each other. Each terminal device 10 includes: the second communication unit 12, serving as a first output unit that outputs a first site image, which is an image of the first site where that terminal device 10 is located, to each of the other terminal devices 10; the attention information determination unit 13c, serving as a second output unit that outputs, to the other terminal devices 10, viewpoint information of a first user with respect to the site images acquired from each of the other terminal devices 10 and displayed at the first site, each site image being an image of the site where the corresponding other terminal device 10 is located; the first communication unit 11, serving as an acquisition unit that acquires, from each of the other terminal devices 10, viewpoint information of a second user with respect to the first site image displayed at that terminal's site; and the exposure control unit 13f, which performs exposure control on the first site image using, as a photometric area, a region of the first site image that is obtained from the plurality of pieces of viewpoint information acquired from the other terminal devices 10 and attracts a high degree of attention from the second users. The video conference system 1 provides the same effects as the terminal device 10.

The present invention may also be embodied as an imaging device. For example, an imaging device according to the present invention is located at the first site CP1 and includes the imaging unit 51, which acquires a first site image that is an image of the first site, and the terminal device 10a, which communicates with the plurality of terminal devices 10b to 10d located at the other sites. The terminal device 10a includes: the second communication unit 12, which outputs the first site image to the terminal devices 10b to 10d; the attention information determination unit 13c, which outputs, to the terminal devices 10b to 10d, viewpoint information of a first user with respect to the site images acquired from each of the terminal devices 10b to 10d and displayed at the first site, each site image being an image of the site where the corresponding terminal device is located; the first communication unit 11, which acquires, from each of the terminal devices 10b to 10d, viewpoint information of a second user with respect to the first site image displayed at that terminal's site; and the exposure control unit 13f, which performs exposure control on the first site image using, as a photometric area, a region of the first site image that is obtained from the plurality of pieces of viewpoint information acquired from the terminal devices 10b to 10d and attracts a high degree of attention from the second users. This imaging device provides the same effects as the terminal device 10. The terminal system 100 is one example of such an imaging device.

The present invention may also be embodied as an imaging method. For example, an imaging method according to the present invention, performed at a first site, includes: acquiring a first site image, which is an image of the first site; outputting the first site image to a plurality of terminals located at other sites by communicating with them; outputting, to the terminals, viewpoint information of a first user with respect to the site images acquired from each of the terminals and displayed at the first site, each site image being an image of the site where the corresponding terminal is located; acquiring, from each of the terminals, viewpoint information of a second user with respect to the first site image displayed at that terminal's site; performing exposure control on the first site image using, as a photometric area, a region of the first site image that is obtained from the plurality of pieces of viewpoint information acquired from the terminals and attracts a high degree of attention from the second users; and outputting the first site image after exposure control to the terminals. This imaging method provides the same effects as the imaging device described above. The method may be implemented by a processor such as a CPU, a circuit such as an LSI, an IC card, a standalone module, or the like.

(Embodiment 2)
The terminal device according to Embodiment 2 weights the attention information acquired from the terminal devices at the other sites before summing it. The following description of Embodiment 2 focuses on the differences from Embodiment 1; descriptions of points common to both are omitted where appropriate.

FIG. 17 is a diagram illustrating the process of determining the importance of attention information in the terminal system according to Embodiment 2. As shown in FIG. 17, participants PA to PC, an imaging unit 51a, and a display unit 54a are present at the first site CP1; participants PD to PF, an imaging unit 51b, and a display unit 54b at the second site CP2; participants PG to PI, an imaging unit 51c, and a display unit 54c at the third site CP3; and participants PJ to PL, an imaging unit 51d, and a display unit 54d at the fourth site CP4.

The following describes the case in which the terminal device 10a at the first site CP1 performs exposure control on an image captured by the imaging unit 51a and transmits it to the terminal devices 10b to 10d at the other sites CP2 to CP4. In FIG. 17, participant PD at the second site CP2 is the last speaker, that is, the participant who spoke most recently, and participant PH at the third site CP3 spoke immediately before participant PD.

Each of the terminal devices 10a to 10d calculates attention information based on sound direction information at its own site, which allows it to detect which participant at its site spoke and when. Each terminal device then transmits, to the terminal devices at the other sites, speaker information that associates the identification information of each participant who spoke at its site with the time of the utterance. By accumulating the utterance histories of all participants at the sites CP1 to CP4, each of the terminal devices 10a to 10d holds, in its storage unit 16, speaker history information recording the sites of the participants who spoke.
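A minimal sketch of a speaker history store, assuming utterance times are on a clock shared by all sites and that the speaker information exchanged between terminals carries a participant ID, a site ID, and a time. The class and method names are hypothetical. The method `recent_speaker_sites` returns the sites of the most recent speakers, newest first; it skips repeated sites, which is an assumption about how consecutive speakers at the same site should be treated.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Utterance:
    participant_id: str  # e.g. "PD"
    site_id: str         # e.g. "CP2"
    time_s: float        # utterance time on an assumed shared clock


@dataclass
class SpeakerHistory:
    entries: List[Utterance] = field(default_factory=list)

    def add(self, u: Utterance) -> None:
        """Record an utterance from any site, keeping entries in time order."""
        self.entries.append(u)
        self.entries.sort(key=lambda e: e.time_s)

    def recent_speaker_sites(self, n: int = 2) -> List[str]:
        """Sites of the most recent speakers, newest first, without repeats."""
        if n <= 0:
            return []
        sites: List[str] = []
        for e in reversed(self.entries):
            if e.site_id not in sites:
                sites.append(e.site_id)
            if len(sites) == n:
                break
        return sites
```

In the FIG. 17 scenario (PH at CP3 speaks, then PD at CP2), `recent_speaker_sites()` returns `["CP2", "CP3"]`.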

The attention information based on sound direction at a terminal's own site can carry time information because the audio data acquired by the audio input unit 52 includes the detection time of the sound. Likewise, the attention information based on gaze information at another site can carry time information because the image data captured by the imaging unit 51 includes the capture time.

Each of the terminal devices 10a to 10d associates the speaker history information in the storage unit 16, the attention information based on sound direction at its own site, and the attention information based on gaze information at the other sites with one another, for example on the basis of time. Each terminal device then weights the attention information using the speaker history information.
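Time-based association could look like the sketch below, which is one possible reading rather than the document's method. Each attention record is tagged with the site of the most recent utterance at or before the record's timestamp, found by binary search over the time-sorted speaker history. The record layout `(time_s, source_site, attention)` is a hypothetical format.

```python
import bisect
from typing import List, Optional, Tuple


def speaker_site_at(time_s: float, history: List[Tuple[float, str]]) -> Optional[str]:
    """Site of the most recent utterance at or before time_s.

    history: (utterance_time_s, site_id) pairs sorted by time.
    Returns None if nobody had spoken yet.
    """
    times = [t for t, _ in history]
    i = bisect.bisect_right(times, time_s)
    return history[i - 1][1] if i > 0 else None


def link_attention(records, history):
    """Attach the then-current speaker site to each (time_s, source_site, attention) record."""
    return [(t, src, att, speaker_site_at(t, history)) for t, src, att in records]
```

For example, with utterances at CP1 (t = 1 s), CP3 (t = 5 s), and CP2 (t = 9 s), a gaze record from CP4 at t = 6 s is linked to CP3.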

For example, the terminal device 10a assigns the largest weight, "importance A", to the attention information acquired from the terminal device 10b at the second site CP2, where the last speaker, participant PD, is located. It assigns the second-largest weight, "importance B", to the attention information acquired from the terminal device 10c at the third site CP3, where participant PH, who spoke immediately before participant PD, is located. It assigns the smallest weight, "importance C", to attention information generated and acquired by the terminal devices 10 at sites other than the second site CP2 and the third site CP3.

The terminal device 10a weights the attention information generated at its own site and the attention information acquired from the other sites according to the importance assigned to each site, and uses the weighted values when calculating the summed attention information. The weighting may, for example, be a multiplication of each degree of attention by the corresponding weight. This yields video in which the region attracting attention at the site of the current speaker is easy to see.

Alternatively, each of the terminal devices 10a to 10d may treat attention information generated by terminal devices at sites other than the site of the last speaker as invalid. This yields video whose exposure is optimized solely for the region attracting attention at the site of the current speaker.
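The importance-based weighting and the "invalid" variant can be sketched together. The document fixes only the order A > B > C, so the numeric weights 3, 2, 1 are hypothetical. Attention maps are assumed to be same-shaped arrays over the first site image, keyed by the site that generated them (including the first site's own sound-based map), and the weighting is a multiplication followed by summation, as suggested in the text.

```python
from typing import Dict, List

import numpy as np

# Hypothetical weights; the document specifies only their order (A > B > C).
IMPORTANCE_WEIGHTS = {"A": 3.0, "B": 2.0, "C": 1.0}


def site_importance(site: str, recent_sites: List[str]) -> str:
    """A for the last speaker's site, B for the previous speaker's site, else C."""
    if recent_sites and site == recent_sites[0]:
        return "A"
    if len(recent_sites) > 1 and site == recent_sites[1]:
        return "B"
    return "C"


def summed_attention(maps: Dict[str, np.ndarray], recent_sites: List[str],
                     last_speaker_only: bool = False) -> np.ndarray:
    """Weighted sum of per-site attention maps (maps must be non-empty).

    If last_speaker_only is True, maps from every site except the last
    speaker's are treated as invalid (weight 0).
    """
    total = None
    for site, amap in maps.items():
        level = site_importance(site, recent_sites)
        if last_speaker_only and level != "A":
            weight = 0.0
        else:
            weight = IMPORTANCE_WEIGHTS[level]
        contrib = weight * np.asarray(amap, dtype=float)
        total = contrib if total is None else total + contrib
    return total
```

With CP2 as the last speaker's site and CP3 as the previous one, attention reported from CP2 counts three times as much as that from CP1 or CP4 under these assumed weights. Setting `last_speaker_only=True` keeps only CP2's contribution.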

The other configurations and operations of the terminal device according to Embodiment 2 are the same as in Embodiment 1, so their description is omitted. The terminal device according to Embodiment 2 described above provides the same effects as Embodiment 1.

Furthermore, the terminal device 10a according to Embodiment 2 may acquire, from each of the terminal devices 10b to 10d, history information of the audio data collected at that terminal's site CP2 to CP4. The exposure control unit 13f may then use, for exposure control, weighted viewpoint information obtained by weighting the viewpoint information acquired from the terminal devices 10b to 10d according to the audio data history information of each terminal.

With this configuration, the degree of attention obtained from the weighted viewpoint information reflects the audio data history. For example, giving a larger weight to the viewpoint information from a site whose audio data was acquired more recently raises the degree of attention derived from that viewpoint information. A site whose audio data was acquired more recently can be regarded as one with a higher degree of attention from users at the other sites. Thus, the terminal device 10a can perform exposure control that is optimal for a larger number of users.
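The document states only that more recent audio should earn a larger weight; it gives no formula. One simple choice, shown here purely as an assumption, is an exponential decay with a half-life, plus a floor so that no site's viewpoint information is discarded entirely. The half-life of 30 s and floor of 0.1 are hypothetical parameters.

```python
from typing import Dict, Optional


def recency_weights(last_audio_time: Dict[str, Optional[float]], now_s: float,
                    half_life_s: float = 30.0, floor: float = 0.1) -> Dict[str, float]:
    """Weight each site's viewpoint information by how recently audio was picked up there.

    A site that just spoke gets weight 1.0; the weight halves every
    half_life_s seconds and never drops below floor. Sites with no
    recorded audio get floor.
    """
    weights = {}
    for site, t in last_audio_time.items():
        if t is None:
            weights[site] = floor
        else:
            age = max(0.0, now_s - t)  # ignore small clock skew into the future
            weights[site] = max(floor, 0.5 ** (age / half_life_s))
    return weights
```

Compared with the fixed A/B/C ranking, a continuous decay avoids abrupt weight changes when the speaker switches, at the cost of one more tuning parameter.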

(Other Embodiments)
Examples of embodiments of the present invention have been described above, but the present invention is not limited to those embodiments and modifications. Various modifications and improvements are possible within the scope of the present invention. For example, forms obtained by applying various modifications to the embodiments or modifications, and forms constructed by combining components of different embodiments and modifications, are also included within the scope of the present invention.

For example, in the terminal system 100 according to the embodiments, the imaging unit 51 serves both to capture the participants at a site in order to obtain gaze information and to capture the image subjected to exposure control by the exposure control unit 13f, but the configuration is not limited to this. For example, two imaging units, one for each function, may be provided.

All numbers used above, such as ordinals and quantities, are examples given to describe the technology of the present invention concretely, and the present invention is not limited to them. Likewise, the connection relationships between components are examples given to describe the technology concretely, and the connection relationships that realize the functions of the present invention are not limited to them.

The division into blocks in the functional block diagrams is only an example: multiple blocks may be implemented as a single block, a single block may be divided into multiple blocks, and/or some functions may be moved to other blocks. The functions of multiple blocks with similar functions may also be processed by a single piece of hardware or software, in parallel or by time division.

1 Video conference system (communication system)
10, 10a, 10b, 10c, 10d Terminal device (communication terminal)
11 First communication unit (acquisition unit)
12 Second communication unit (output unit, first output unit)
13 Control unit
13a Gaze estimation unit (viewpoint generation unit)
13b Sound direction estimation unit (direction estimation unit, position estimation unit)
13c Attention information determination unit (second output unit)
13d Synthesis unit
13e Area determination unit
13f Exposure control unit
51, 51a, 51b, 51c, 51d Imaging unit
52 Audio input unit
54, 54a, 54b, 54c, 54d Display unit
100 Terminal system (imaging device)

Patent Document 1: JP 05-227469 A

Claims

1. A communication terminal that is located at a first site and communicates with a plurality of terminals located at other sites, the communication terminal comprising:
an output unit that outputs a first site image, which is an image of the first site, to the plurality of terminals;
an acquisition unit that acquires, from each of the plurality of terminals, viewpoint information of a user with respect to the first site image displayed at the site of that terminal; and
an exposure control unit that performs exposure control on the first site image using, as a photometric area, a region of the first site image that is obtained from the plurality of pieces of viewpoint information and attracts a high degree of attention from the users.

2. The communication terminal according to claim 1, further comprising:
a direction estimating unit that estimates the direction of a sound source from audio data collected at the first site; and
a position estimating unit that estimates position information of the sound source in the first site image using the direction of the sound source,
wherein the exposure control unit performs exposure control on the first site image using, as the photometric area, a region of high attention in the first site image obtained from the plurality of pieces of viewpoint information and the position information of the sound source.

3. The communication terminal according to claim 1, further comprising:
a gaze estimating unit that estimates the gaze of a user at the first site from the first site image; and
a viewpoint generation unit that uses the gaze information to generate viewpoint information indicating the viewpoint of the user at the first site with respect to a second site image, the second site image being an image of the site of a terminal that is acquired from that terminal and displayed at the first site,
wherein the output unit outputs the viewpoint information generated by the viewpoint generation unit to the terminal.

4. The communication terminal according to any one of claims 1 to 3, wherein, when there are a plurality of maximum regions, which are regions with the highest degree of attention, the exposure control unit performs exposure control so as to widen the dynamic range if the difference in luminance value between the maximum regions in the first site image exceeds a predetermined range.

5. The communication terminal according to any one of claims 1 to 4, wherein
the acquisition unit acquires, from each of the plurality of terminals, history information of audio data collected at the site of that terminal, and
the exposure control unit uses, for exposure control, weighted viewpoint information obtained by weighting the viewpoint information acquired from the plurality of terminals according to the history information of the audio data of each of the plurality of terminals.

6. A communication system including a plurality of communication terminals that are located at a plurality of sites and communicate with each other, wherein
each of the communication terminals comprises:
a first output unit that outputs a first site image, which is an image of the first site where the communication terminal is located, to each of the other communication terminals;
a second output unit that outputs, to the other communication terminals, viewpoint information of a first user with respect to site images that are acquired from each of the other communication terminals and displayed at the first site, each site image being an image of the site where the corresponding other communication terminal is located;
an acquisition unit that acquires, from each of the other communication terminals, viewpoint information of a second user with respect to the first site image displayed at the site where that other communication terminal is located; and
an exposure control unit that performs exposure control on the first site image using, as a photometric area, a region of the first site image that is obtained from the plurality of pieces of viewpoint information acquired from the other communication terminals and attracts a high degree of attention from the second users.

7. An imaging device located at a first site, comprising:
an imaging unit that acquires a first site image, which is an image of the first site; and
a communication terminal that communicates with a plurality of terminals located at other sites,
wherein the communication terminal comprises:
a first output unit that outputs the first site image to the plurality of terminals;
a second output unit that outputs, to the terminals, viewpoint information of a first user with respect to site images that are acquired from each of the plurality of terminals and displayed at the first site, each site image being an image of the site where the corresponding terminal is located;
an acquisition unit that acquires, from each of the plurality of terminals, viewpoint information of a second user with respect to the first site image displayed at the site of that terminal; and
an exposure control unit that performs exposure control on the first site image using, as a photometric area, a region of the first site image that is obtained from the plurality of pieces of viewpoint information acquired from the plurality of terminals and attracts a high degree of attention from the second users.

8. The imaging device according to claim 7, wherein the imaging unit comprises:
a first imaging unit that captures an image for acquiring the viewpoint information; and
a second imaging unit that captures the image on which exposure control is performed by the exposure control unit.

9. An imaging method at a first site, comprising:
acquiring a first site image, which is an image of the first site;
outputting the first site image to a plurality of terminals located at other sites by communicating with the plurality of terminals;
outputting, to the terminals, viewpoint information of a first user with respect to site images that are acquired from each of the plurality of terminals and displayed at the first site, each site image being an image of the site where the corresponding terminal is located;
acquiring, from each of the plurality of terminals, viewpoint information of a second user with respect to the first site image displayed at the site of that terminal;
performing exposure control on the first site image using, as a photometric area, a region of the first site image that is obtained from the plurality of pieces of viewpoint information acquired from the plurality of terminals and attracts a high degree of attention from the second users; and
outputting the first site image after the exposure control to the plurality of terminals.