JP2018142252A

JP2018142252A - Region-of-interest estimating method, region-of-interest estimating apparatus, and region-of-interest estimating program

Info

Publication number: JP2018142252A
Application number: JP2017037075A
Authority: JP
Inventors: Hidenao Nagano (永野 秀尚); Kunio Kashino (柏野 邦夫); Yuki Kodama (児玉 祐樹); Yasutomo Kawanishi (川西 康友); Daisuke Deguchi (出口 大輔); Ichiro Ide (井手 一郎); Takatsugu Hirayama (平山 高嗣); Hiroshi Murase (村瀬 洋)
Original assignee: Nagoya University NUC; Nippon Telegraph and Telephone Corp
Current assignee: Nagoya University NUC; Nippon Telegraph and Telephone Corp
Priority date: 2017-02-28
Filing date: 2017-02-28
Publication date: 2018-09-13

Abstract

PROBLEM TO BE SOLVED: To provide a region-of-interest estimating method, a region-of-interest estimating apparatus, and a region-of-interest estimating program that can accurately estimate the region to which a plurality of organisms appearing in an image are paying attention. SOLUTION: An image in which a plurality of organisms are captured is taken as input; region-of-interest information is generated that indicates the region, on a surface facing the plurality of organisms, to which each of the organisms appearing in the image is paying attention; from the generated region-of-interest information, integrated region-of-interest information is generated by integrating the regions to which the individual organisms are paying attention; and a region of interest is estimated from the generated integrated region-of-interest information. SELECTED DRAWING: Figure 1

Description

The present invention relates to an attention area (region-of-interest) estimation method, an attention area estimation device, and an attention area estimation program for estimating the region to which a living thing appearing in an image is paying attention.

Conventionally, attention area estimation methods have been proposed that, given one or more images showing a living thing, estimate from those images the region to which the living thing is paying attention.

Prior art related to such attention area estimation includes a gaze measuring device for the case where the target organism is a human and there is a single target (see Patent Document 1 below), and a gaze estimation method that estimates the gaze of a target person using a plurality of images captured from a relatively short distance (see Non-Patent Document 1 below).

Patent Document 1: Japanese Patent No. 4260715

Non-Patent Document 1: Okamoto et al., "Wide range gaze measurement using multiple cameras", Interaction 2012.

However, when the methods of Patent Document 1 and Non-Patent Document 1 are used for attention area estimation, the method proposed in Patent Document 1 requires a gaze measuring device to be installed for each target person, which is costly.

The method proposed in Non-Patent Document 1 requires images of high enough definition to yield detailed information about the iris of each target person's eyes. In addition, the accuracy of estimating the region a target person is paying attention to depends heavily on the accuracy of gaze direction estimation; if the gaze direction is estimated incorrectly, the attention area is also estimated incorrectly.

The present invention has been made in view of the above circumstances, and its object is to provide an attention area estimation method, an attention area estimation device, and an attention area estimation program capable of estimating with high accuracy the region to which a plurality of organisms appearing in an image are paying attention.

To achieve the above object, the attention area estimation method of the present invention is a method performed in an attention area estimation device having an attention area estimation unit and an attention area integration unit, and includes: a first generation step in which the attention area estimation unit receives as input an image showing a plurality of organisms and generates attention area information indicating the region, on a surface facing the plurality of organisms, to which each of the plurality of organisms in the image is paying attention; a second generation step in which the attention area integration unit generates, from the generated attention area information, integrated attention area information that integrates the regions to which the individual organisms are paying attention; and an estimation step in which the attention area integration unit estimates an attention area from the generated integrated attention area information.

In the first generation step, the attention area estimation unit may obtain, for each input image showing a plurality of organisms and for each region on the surface facing the plurality of organisms, a score indicating the degree of attention of each of the plurality of organisms in the image, and use these scores as the attention area information.

In the second generation step, the attention area integration unit may obtain from the generated attention area information, for each region on the surface facing the plurality of organisms, a score indicating the degree of attention paid by the plurality of organisms, and use these scores as the integrated attention area information.

In the second generation step, the attention area integration unit may also obtain, for each region on the surface facing the plurality of organisms, any one of the mean, maximum, median, and mode of the scores indicating the degree of attention of each of the plurality of organisms, or a combination of at least two of them, and use the result as the integrated attention area information.

The attention area estimation device may further have a partial image generation unit, and the method may further include a partial image generation step in which the partial image generation unit generates, from the image, partial images in which the organisms appear. In that case, in the first generation step, the attention area estimation unit may generate the attention area information based on the partial images.

To achieve the above object, the attention area estimation device of the present invention includes: an attention area estimation unit that receives as input an image showing a plurality of organisms and generates attention area information indicating the region, on a surface facing the plurality of organisms, to which each of the plurality of organisms in the image is paying attention; and an attention area integration unit that generates, from the attention area information generated by the attention area estimation unit, integrated attention area information that integrates the regions to which the individual organisms are paying attention, and estimates an attention area from the generated integrated attention area information.

The attention area integration unit may obtain, for each region on the surface facing the plurality of organisms, any one of the mean, maximum, median, and mode of the scores indicating the degree of attention of each of the plurality of organisms, or a combination of at least two of them, and use the result as the integrated attention area information.

To achieve the above object, the attention area estimation program of the present invention is a program for causing a computer to function as each unit of the above attention area estimation device.

According to the present invention, the region to which a plurality of organisms appearing in an image are paying attention can be estimated with high accuracy.

FIG. 1 is a block diagram showing the functional configuration of the attention area estimation device according to the embodiment.
FIG. 2 is a flowchart showing the flow of the attention area estimation processing according to the embodiment.
FIG. 3 is a schematic diagram for explaining the research background of the experiment using the attention area estimation device according to the embodiment.
FIG. 4 is a schematic diagram showing the subject positions in the experiment for estimating the subjects' gaze positions using the attention area estimation device according to the embodiment.
FIG. 5 is a schematic diagram showing the target display positions in the experiment for estimating the subjects' gaze positions using the attention area estimation device according to the embodiment.
FIG. 6 is a table showing the gaze position classifier used in the experiment for estimating the subjects' gaze positions using the attention area estimation device according to the embodiment.
FIG. 7 is a table showing the number of images acquired at the 4 m subject position in the experiment for estimating the subjects' gaze positions using the attention area estimation device according to the embodiment.
FIG. 8 is a table showing the estimation accuracy of the subjects' gaze positions in the experiment for estimating the subjects' gaze positions using the attention area estimation device according to the embodiment.

Hereinafter, the present embodiment will be described with reference to the drawings.

The attention area estimation device according to the present embodiment takes an image captured from a third-party viewpoint, estimates the regions, on a surface facing the plurality of organisms appearing in the image, to which those organisms are paying attention, and integrates the estimated regions to estimate the attention area of the plurality of organisms. For example, from an image of the audience seats in a movie theater, the face image of each audience member is cropped; from each cropped face image, the region that audience member is paying attention to is estimated; and by integrating the estimated regions, it is possible to estimate which part of the screen the audience as a whole is currently paying attention to.

By thus estimating the regions, on the surface facing the organisms, to which the plurality of organisms are paying attention, the region the organisms are paying attention to can be estimated with high accuracy without fitting each organism with a gaze measuring device or the like.

FIG. 1 is a block diagram showing the functional configuration of the attention area estimation device 10 according to the present embodiment. As shown in FIG. 1, the attention area estimation device 10 includes a partial image generation unit 12, an attention area estimation unit 14, and an attention area integration unit 16.

The attention area estimation device 10 takes image information 20 as input and outputs integrated attention area information 22 indicating the region, on the surface facing the plurality of organisms, to which the organisms appearing in the image are paying attention.

The partial image generation unit 12 receives as input image information 20 indicating one or more images showing organisms. As necessary, it extracts from the image indicated by the image information 20 the partial images in which organisms appear, and outputs image information indicating the extracted partial images to the attention area estimation unit 14. When no partial images are extracted, it outputs the input image information 20 to the attention area estimation unit 14 as is.

The attention area estimation unit 14 denotes each organism appearing in the image indicated by the image information as h_i (1 ≤ i ≤ n), and the set of organisms appearing in the image as H = {h_1, h_2, ..., h_n}. Letting L_e(h_i, x, y) be the score for organism h_i paying attention to position (x, y) on the surface facing the organisms as a whole, the unit generates this score L_e(h_i, x, y) as the attention area information indicating the region on that surface to which each of the plurality of organisms is paying attention. Here, n is a natural number of 1 or more.

As an example, consider the case where the plurality of organisms appearing in the image are a plurality of persons, each of whom is looking at some position on a screen facing the persons as a whole. Each person is denoted by h_i (1 ≤ i ≤ n), and the set of people looking at the screen by H = {h_1, h_2, ..., h_n}. The score representing the degree to which person h_i is paying attention to position (x, y) on the screen is denoted by L_e(h_i, x, y).

The present embodiment describes the case where the plurality of organisms are a plurality of persons. It also describes the case where the screen is divided into a plurality of regions, a position (x, y) is expressed as one of these regions, and the score L_e(h_i, x, y) is calculated for each region (x, y). Specifically, the screen is divided into regions along the x direction and into regions along the y direction, a position (x, y) is the region given by the combination of an x-direction region and a y-direction region, and the score L_e(h_i, x, y) is calculated for each region (x, y).
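As an illustration of this division into regions, the following Python sketch maps a point on the screen to a region index (x, y). The function name `region_of`, the equal-sized cells, the default 3 × 3 grid, and the coordinate convention (origin at one corner, units of the caller's choosing) are all assumptions made for this example; the embodiment only states that the screen is divided into regions along the x and y directions.

```python
def region_of(px, py, width, height, nx=3, ny=3):
    """Map a point (px, py) on a width x height screen to a grid cell (x, y).

    Hypothetical helper: equal-sized cells are an assumption of this sketch.
    """
    if not (0 <= px <= width and 0 <= py <= height):
        raise ValueError("point lies outside the screen")
    # min() keeps points lying exactly on the far edge inside the last cell.
    x = min(int(px * nx / width), nx - 1)
    y = min(int(py * ny / height), ny - 1)
    return x, y
```

With such a mapping, one score L_e(h_i, x, y) per cell is enough to describe where a person is looking, which is what makes the later per-region integration straightforward.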

The attention area integration unit 16 integrates the scores L_e(h_i, x, y) of the individual persons to estimate a score L(x, y) representing the degree to which the persons appearing in the image indicated by the image information are paying attention to position (x, y) on the screen.

Specifically, the attention area integration unit 16 obtains any one of the mean, maximum, median, and mode of the scores L_e(h_i, x, y) indicating the degree of attention of the persons h_i (1 ≤ i ≤ n) to region (x, y), or a combination of at least two of them, thereby estimates the score L(x, y), and generates this score L(x, y) as the integrated attention area information.

Here, the score L(x, y) is estimated using, for example, at least one of the integration methods (A) to (E) below.

(A) As shown in equation (1) below, the mean of the scores L_e(h_i, x, y), each representing the degree to which person h_i is paying attention to point (x, y) on the screen, is taken as the score L(x, y).

$$L(x, y) = \frac{1}{n} \sum_{i=1}^{n} L_e(h_i, x, y) \tag{1}$$

(B) As shown in equation (2) below, the maximum of the scores L_e(h_i, x, y), each representing the degree to which person h_i is paying attention to region (x, y) on the screen, is taken as the score L(x, y).

$$L(x, y) = \max_{1 \le i \le n} L_e(h_i, x, y) \tag{2}$$

(C) As shown in equation (3) below, the median of the scores L_e(h_i, x, y), each representing the degree to which person h_i is paying attention to region (x, y) on the screen, is taken as the score L(x, y).

$$L(x, y) = \operatorname*{median}_{1 \le i \le n} L_e(h_i, x, y) \tag{3}$$

(D) As shown in equation (4) below, the mode of the scores L_e(h_i, x, y), each representing the degree to which person h_i is paying attention to region (x, y) on the screen, is taken as the score L(x, y).

$$L(x, y) = \operatorname*{mode}_{1 \le i \le n} L_e(h_i, x, y) \tag{4}$$

(E) For each person h_i, the score is set to L_e(h_i, x, y) = 1 for one region (x, y) and to L_e(h_i, x, y) = 0 for all other regions (x, y), and the score L(x, y) is obtained from the scores L_e(h_i, x, y) of each region (x, y) for each person h_i, as shown in equation (5) below (voting method).

$$L(x, y) = \sum_{i=1}^{n} L_e(h_i, x, y), \quad L_e(h_i, x, y) \in \{0, 1\} \tag{5}$$
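The integration methods (A) to (E) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the embodiment: the names `integrate`, `estimate_region`, and `scores` are hypothetical, per-person scores are stored as dictionaries keyed by region (x, y), the voting method is assumed to cast each person's single vote for that person's highest-scoring region, and the final region is assumed to be the one with the largest integrated score.

```python
from statistics import mean, median, mode

# Methods (A) to (D): one reducer applied per region across all persons.
REDUCERS = {"mean": mean, "max": max, "median": median, "mode": mode}

def integrate(scores, method="mean"):
    """scores[i][(x, y)] holds L_e(h_i, x, y); returns {(x, y): L(x, y)}."""
    regions = list(scores[0])
    if method == "vote":
        # Method (E): each person votes for one region (assumed: their top one).
        totals = {r: 0 for r in regions}
        for person in scores:
            totals[max(person, key=person.get)] += 1
        return totals
    reduce_fn = REDUCERS[method]
    return {r: reduce_fn([person[r] for person in scores]) for r in regions}

def estimate_region(integrated):
    """Pick the region with the largest integrated score (an assumption)."""
    return max(integrated, key=integrated.get)

# Made-up scores for three persons over three regions, for illustration only.
scores = [
    {(0, 0): 0.7, (1, 0): 0.2, (2, 0): 0.1},
    {(0, 0): 0.1, (1, 0): 0.6, (2, 0): 0.3},
    {(0, 0): 0.2, (1, 0): 0.5, (2, 0): 0.3},
]
```

With these made-up scores, the maximum is dominated by the single confident person at (0, 0), while the mean, median, and vote all favor (1, 0). The mode is only informative when scores take a small set of repeated values, for example after quantization.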

The attention area estimation device 10 according to the present embodiment is configured, for example, as a computer including a CPU (Central Processing Unit), a RAM (Random Access Memory), and a ROM (Read Only Memory) that stores various programs. The computer may also include a storage unit such as a hard disk drive or nonvolatile memory. In the present embodiment, the CPU reads and executes a program stored in a storage unit such as the ROM or hard disk, so that the hardware resources and the program cooperate to realize the functions described above.

The flow of the attention area estimation processing performed by the attention area estimation device 10 will be described with reference to the flowchart in FIG. 2. In the present embodiment, the processing starts when predetermined information for starting its execution is input to the attention area estimation device 10. The start timing is not limited to this, however; for example, the processing may start when an input image is received.

In step S101, the partial image generation unit 12 receives an image showing a plurality of persons.

In step S103, the partial image generation unit 12 extracts from the input image the partial images in which persons appear. Extraction of partial images is not essential; this step may be skipped, and the image input in step S101 may be used in step S105 and the subsequent steps.

In step S105, the attention area estimation unit 14 generates attention area information indicating the region to which each of the plurality of persons in the image is paying attention.

In step S107, the attention area integration unit 16 generates, from the generated attention area information, integrated attention area information that integrates the regions to which the individual persons are paying attention.

In step S109, the attention area integration unit 16 estimates the attention area from the generated integrated attention area information, outputs the estimated attention area, and ends execution of the attention area estimation program. In the present embodiment, the attention area is output by displaying it on a display means such as a display, or by storing data indicating it in a storage means.
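The flow of steps S101 to S109 can be summarized in a short Python sketch. Everything here is illustrative: `run_pipeline`, `score_persons`, and `extract_partial_images` are hypothetical names, the per-person estimator is passed in as a callable because the embodiment does not fix its form at this point, the mean is used as the integration method, and the final region is assumed to be the one with the largest integrated score.

```python
def run_pipeline(image, score_persons, extract_partial_images=None):
    """Sketch of S101 to S109; returns (estimated region, integrated scores)."""
    # S101: the image showing a plurality of persons is received as `image`.
    # S103 (optional): extract partial images; otherwise pass the image on as is.
    inputs = extract_partial_images(image) if extract_partial_images else [image]
    # S105: per-person attention area information, one {region: score} per person.
    per_person = score_persons(inputs)
    # S107: integrate the per-person scores (here: the mean for each region).
    regions = list(per_person[0])
    integrated = {
        r: sum(p[r] for p in per_person) / len(per_person) for r in regions
    }
    # S109: estimate the attention area (assumed: highest integrated score).
    return max(integrated, key=integrated.get), integrated
```

A caller would supply its own face cropper and estimator; for instance, `run_pipeline(frame, score_persons=my_estimator, extract_partial_images=my_face_cropper)`, where both callables are placeholders.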

As described above, in the present embodiment, an image showing a plurality of organisms (persons) is taken as input; attention area information is generated that indicates the region, on the surface facing the persons as a whole, to which each of the plurality of persons in the image is paying attention; from the generated attention area information, integrated attention area information is generated that integrates the regions to which the individual persons are paying attention; and the attention area is estimated from the generated integrated attention area information. By using the gaze estimation results of a plurality of persons, the region to which the plurality of persons are paying attention can be estimated with high accuracy without fitting each person with a gaze measuring device or the like.

Although the present embodiment has described the case where the attention area estimation device 10 includes the partial image generation unit 12, the configuration is not limited to this. For example, the attention area estimation device 10 may omit the partial image generation unit 12, and the attention area estimation unit 14 may receive partial images extracted by an external device. Alternatively, the attention area estimation device 10 may omit the partial image generation unit 12, and the attention area estimation unit 14 may receive image information indicating an image showing a plurality of persons and estimate each person's attention area from the image indicated by that image information.

[Example]

Here, two experimental examples are shown: one in which the attention areas of a plurality of persons are integrated by the integration method using the mean, and one in which they are integrated by the integration method using the median.

In this experiment, face images are also detected automatically, using the facial landmark detection method implemented in dlib, a C++ library for machine learning. A CNN (Convolutional Neural Network) trained to output the attention area from a face image is used, and the output obtained by inputting the face image of person h_i into this CNN is taken as the score L_e(h_i, x, y) of each region (x, y) for person h_i.

The x direction is divided into three discretized regions (center, right, and left) and the y direction into three discretized regions (center, upper, and lower), and the region (x, y) serving as the target position is expressed as a combination of an x-direction region and a y-direction region. Specifically, the screen is divided into nine regions: upper left, upper, upper right, left, center, right, lower left, lower, and lower right.
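The nine regions can be enumerated from the two three-way divisions, as in the following Python sketch. The index convention (x = 0 for left, y = 0 for upper) and the names `X_LABELS`, `Y_LABELS`, `region_name`, and `REGIONS` are assumptions of this example, not part of the experiment description.

```python
# Assumed index convention: x = 0, 1, 2 is left, center, right;
# y = 0, 1, 2 is upper, center, lower.
X_LABELS = ("left", "center", "right")
Y_LABELS = ("upper", "center", "lower")

def region_name(x, y):
    """Return the name of region (x, y) in the 3 x 3 division of the screen."""
    xl, yl = X_LABELS[x], Y_LABELS[y]
    if xl == "center" and yl == "center":
        return "center"
    if yl == "center":
        return xl  # middle row: left / right
    if xl == "center":
        return yl  # middle column: upper / lower
    return f"{yl} {xl}"  # corners, e.g. "upper left"

# Row-major listing, matching the order in which the text names the regions.
REGIONS = [region_name(x, y) for y in range(3) for x in range(3)]
```

A classifier with nine outputs can then be read as one score per entry of `REGIONS`.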

<Research Background and Purpose>
The research is motivated by expectations for technology that estimates the position of a group's object of attention, and aims to estimate the three-dimensional position of that object.

<Summary of Research>
(1) Dataset recording
To create a dataset of videos in which a group of subjects gazes at the same target, videos of subjects H gazing at a target P were recorded in each of the following tasks, as shown in FIG. 3.
(i) A task of gazing at a target displayed at random on the screen (a single target is presented to a plurality of subjects simultaneously).
(ii) A task of gazing at a single point off the screen (a plurality of subjects gaze at the one point simultaneously).
(iii) A task of gazing at arbitrary points off the screen (each subject gazes at a freely chosen position off the screen).

The experiment was conducted with the settings shown in (a) to (d) below.
(a) Subject positions: 4 m point and 7 m point (see FIG. 4)
(b) Recording
・At the 4 m point: 6 people at a time
・At the 7 m point: 7 people at a time
・Total number of subjects: 12
・Seats rotated after every trial
(c) Camera
・Resolution: 1280 × 1024
・Frame rate: 60 fps
・Lens focal length: 8 mm (31 mm in 35 mm equivalent)
・The last 30 frames of the images captured during each gaze are used
(d) Target display
・Target positions: nine regions, namely upper left, upper, upper right, left, center, right, lower left, lower, and lower right (target positions are marked "+" in FIG. 5)
・Display time: 3.0 seconds
・Wait time: 2.0 seconds
・Number of displays: twice per target position (18 in total)

(2) Under the conditions shown in (A) to (G) below, the target position gazed at by each subject was classified by a gaze position classifier using a CNN.
(A) Architecture (see FIG. 6)
(B) Input and output
・Input: face image (32 × 32)
・Output: estimated gaze position
In FIG. 7, the probability that each region is being gazed at is used as the score for that position being gazed at.
(C) Dataset
(a) 4 m point
Test data
・The 6 subjects whose faces were detected at all subject positions
・One face image per subject position and target position
Training data
・All face images of the 6 people not in the test data
(b) 7 m point
Test data
・All face images of one of the subjects
Training data
・All face images of the 11 people not in the test data
(D) Integration of results
(a) Estimation from the mean of the individual estimation results
Computed from each person's estimated gaze position scores
(b) Estimation from the median of the individual estimation results
Computed from each person's estimated gaze position scores
(E) Evaluation metric
(a) Recognition rate
(F) Comparison method
No integration of results
(G) Results (only the results for the 4 m point are shown)

As shown in FIG. 8, the accuracy on the test data without integration was 20.06%, whereas the accuracy was 29.63% when using the mean of the scores and 27.78% when using the median of the scores. Thus, integrating the estimation results improves estimation accuracy compared with not integrating them.

なお、本実施形態では、図１に示す機能の構成要素の動作をプログラムとして構築し、注目領域推定装置１０として利用されるコンピュータにインストールして実行させるが、これに限らず、ネットワークを介して流通させても良い。 In the present embodiment, the operations of the functional components shown in FIG. 1 are built as a program, which is installed and executed on a computer used as the attention area estimation device 10. However, the present invention is not limited to this, and the program may also be distributed via a network.

また、構築されたプログラムをハードディスク、ＣＤ−ＲＯＭ等の可搬記憶媒体に格納し、コンピュータにインストールしたり、配布したりしても良い。 Further, the constructed program may be stored in a portable storage medium such as a hard disk or a CD-ROM, and installed in a computer or distributed.

１０注目領域推定装置
１２部分画像生成部
１４注目領域推定部
１６注目領域統合部
２０画像情報
２２統合注目領域情報
DESCRIPTION OF SYMBOLS
10 Attention area estimation device
12 Partial image generation unit
14 Attention area estimation unit
16 Attention area integration unit
20 Image information
22 Integrated attention area information

Claims

An attention area estimation method performed in an attention area estimation device having an attention area estimation unit and an attention area integration unit, the method including:
a first generation step in which the attention area estimation unit receives, as input, an image in which a plurality of organisms are captured, and generates attention area information indicating the area, on a surface facing the plurality of organisms, to which each of the plurality of organisms captured in the image is paying attention;
a second generation step in which the attention area integration unit generates, from the generated attention area information, integrated attention area information obtained by integrating the areas to which the respective organisms are paying attention; and
a step in which the attention area integration unit estimates the attention area from the generated integrated attention area information.

The attention area estimation method according to claim 1, wherein, in the first generation step, the attention area estimation unit obtains, for each input image in which a plurality of organisms are captured and for each area on the surface facing the plurality of organisms, a score indicating the degree of attention of each of the plurality of organisms captured in the image, and uses the scores as the attention area information.

The attention area estimation method according to claim 1, wherein, in the second generation step, the attention area integration unit obtains, from the generated attention area information, a score indicating the degree of attention paid by the plurality of organisms to each area on the surface facing the plurality of organisms, and uses the scores as the integrated attention area information.

The attention area estimation method according to any one of claims 1 to 3, wherein, in the second generation step, the attention area integration unit obtains, for each area on the surface facing the plurality of organisms, any one of an average value, a maximum value, a median value, and a mode value of the scores indicating the degree of attention of each of the plurality of organisms, or a combination of at least two of the average value, the maximum value, the median value, and the mode value, and uses the result as the integrated attention area information.
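For illustration only, the statistics named in this claim could be computed per area as in the sketch below. The names `REDUCERS` and `integrate` and the score values are hypothetical, and two choices are assumptions not stated in the claim: a combination of statistics is formed by averaging them, and scores are quantized so that a mode is meaningful. `statistics.mode` returns the first mode encountered on Python 3.8 or later.

```python
from statistics import mean, median, mode

# The four statistics named in the claim, keyed by a hypothetical label.
REDUCERS = {"mean": mean, "max": max, "median": median, "mode": mode}


def integrate(per_organism_scores, methods=("mean",)):
    """Integrated score per area: the average of the chosen statistics (assumed rule)."""
    integrated = []
    # zip(*rows) yields, for each area, the scores of all organisms.
    for area_scores in zip(*per_organism_scores):
        values = [REDUCERS[m](area_scores) for m in methods]
        integrated.append(sum(values) / len(values))
    return integrated


# Hypothetical quantized scores: three organisms, three areas.
scores = [
    [0.5, 0.25, 0.25],
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
]
by_max = integrate(scores, ("max",))
by_mode = integrate(scores, ("mode",))
by_median_and_max = integrate(scores, ("median", "max"))
```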

The attention area estimation method according to claim 1, wherein the attention area estimation device further includes a partial image generation unit,
the method further includes a partial image generation step in which the partial image generation unit generates a partial image in which an organism in the image is captured, and
in the first generation step, the attention area estimation unit generates the attention area information based on the partial image.

An attention area estimation device including:
an attention area estimation unit that receives, as input, an image in which a plurality of organisms are captured, and generates attention area information indicating the area, on a surface facing the plurality of organisms, to which each of the plurality of organisms captured in the image is paying attention; and
an attention area integration unit that generates, from the attention area information generated by the attention area estimation unit, integrated attention area information obtained by integrating the areas to which the respective organisms are paying attention, and estimates the attention area from the generated integrated attention area information.

The attention area estimation device, wherein the attention area integration unit obtains, for each area on the surface facing the plurality of organisms, any one of an average value, a maximum value, a median value, and a mode value of the scores indicating the degree of attention of each of the plurality of organisms, or a combination of at least two of the average value, the maximum value, the median value, and the mode value, and uses the result as the integrated attention area information.

A program for causing a computer to function as each unit of the attention area estimation device according to claim 6 or 7.