JP2010258914A

JP2010258914A - Prominent area image generating method, prominent area image generating device, program, and recording medium

Info

Publication number: JP2010258914A
Application number: JP2009108474A
Authority: JP
Inventors: Shogo Kimura; 昭悟木村; Junji Yamato; 淳司大和
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2009-04-27
Filing date: 2009-04-27
Publication date: 2010-11-11
Anticipated expiration: 2029-04-27
Also published as: JP5235770B2

Abstract

PROBLEM TO BE SOLVED: To provide a technique that enables region segmentation even when no information on the object region or the background region is given in advance.

SOLUTION: A prominent area video generating device 1000 includes: a degree-of-interest video extraction unit 1 which extracts a degree-of-interest video from an input video; a prominent area prior probability image extraction unit 2 which extracts a prominent area prior probability image showing the probability that each position of an input image, which is a frame of the input video, is in a prominent area; a feature quantity likelihood calculation unit 3 which calculates feature quantity likelihoods showing the likelihood of the image feature quantities contained in the prominent area of the input image and in the area other than the prominent area; a prominent area image extraction unit 4 which extracts a prominent area image showing the prominent area of the input image from the input image, the prominent area prior probability image, and the feature quantity likelihoods; and a prominent area video generation unit 5 which generates the prominent area video from the prominent area images obtained by executing the above processes on each input image.

COPYRIGHT: (C)2011, JPO & INPIT

Description

The present invention relates to a salient-region video generation method, a salient-region video generation device, a program, and a recording medium. In particular, it relates to a method, device, program, and recording medium that compute prior information about the salient region (object region) and the background region (non-salient region) by signal processing that exploits human visual characteristics, and that use this prior information to segment an input video into salient and non-salient regions with high accuracy.

Region segmentation extracts regions of interest such as people, animals, and objects (hereinafter "object regions") from images and video while distinguishing them from all other regions such as the background (hereinafter "background regions"). It is an important technology with a wide range of applications, including free image and video compositing without chroma keying, object recognition and image or video retrieval that are robust to changes in the background region, and image and video coding in which the bit rate can be adjusted according to the importance of each region. One known region segmentation technique formulates image segmentation as a posterior probability maximization problem for a certain statistical model and solves it by finding the minimum cut of a graph equivalent to that statistical model (see, for example, Non-Patent Document 1). A method is also known that extends the method of Non-Patent Document 1 to video signals and achieves fast video segmentation by exploiting the temporal continuity of the video signal to find the minimum cut of the graph efficiently (see, for example, Non-Patent Document 2).
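For orientation, the posterior-maximization view mentioned above is commonly written as an energy minimization over a binary labelling. The notation below is an assumption chosen for illustration and does not appear in this document: $A_p \in \{\text{obj}, \text{bkg}\}$ is the label of pixel $p$, $I$ the image, $\mathcal{P}$ the pixel set, $\mathcal{N}$ the set of neighbouring pixel pairs, $R_p$ a per-pixel (regional) cost, $B_{p,q}$ a boundary cost, and $\lambda$ a weighting constant.

```latex
\hat{A} \;=\; \arg\max_{A} P(A \mid I) \;=\; \arg\min_{A} E(A),
\qquad
E(A) \;=\; \lambda \sum_{p \in \mathcal{P}} R_p(A_p)
\;+\; \sum_{\{p,q\} \in \mathcal{N}} B_{p,q}\,[A_p \neq A_q],
\qquad
R_p(A_p) \;=\; -\ln \Pr(I_p \mid A_p).
```

Under these assumptions, a minimum cut of a graph whose terminal links carry the regional costs and whose neighbour links carry the boundary costs yields a labelling that minimizes $E(A)$.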

Non-Patent Document 1: Y. Boykov and G. Funka-Lea, “Graph cuts and efficient N-D image segmentation,” International Journal of Computer Vision, Vol. 70, No. 2, pp. 109-131, 2006.
Non-Patent Document 2: P. Kohli and P. Torr, “Dynamic graph cuts for efficient inference in Markov random fields,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 29, No. 12, pp. 2079-2088, 2007.

In applications of region segmentation, there is a demand for a method that can correctly separate the object region from the background region even when no prior information (cues) about either region is given. In other words, a method is wanted that can execute every step, including the segmentation itself, automatically and without human intervention, because such a method would broaden the range of applications. However, the methods described in Non-Patent Documents 1 and 2 assume that prior information about the object and background regions is partially supplied by hand, so they cannot be used when no such prior information is given at all. This significantly limits the range of applications of region segmentation.

The present invention has been made in view of the above problem, and its object is to provide a technique that enables region segmentation even when no prior information about the object region or the background region is given.

To solve the above problem, a salient-region video generation method according to one aspect of the present invention comprises: an attention video extraction step of extracting, from an input video, an attention video indicating the degree of attention, that is, how readily a human directs attention to each position; a salient-region prior probability image extraction step of extracting a salient-region prior probability image indicating the probability that each position of an input image, which is a frame of the input video, belongs to a salient region; a feature likelihood calculation step of calculating feature likelihoods indicating the likelihood of the image features contained in the salient region of the input image and in the region outside the salient region, respectively; a salient-region image extraction step of extracting a salient-region image indicating the salient region of the input image from the input image, the salient-region prior probability image, and the feature likelihoods; and a salient-region video generation step of generating the salient-region video from the salient-region images obtained by executing the attention video extraction step, the salient-region prior probability image extraction step, the feature likelihood calculation step, and the salient-region image extraction step for each input image. The salient-region prior probability image extraction step extracts the salient-region prior probability image for a given input image on the basis of the salient-region image and the attention image, the latter being the image in the attention video extracted by the attention video extraction step that corresponds to that input image. The feature likelihood calculation step calculates the feature likelihoods on the basis of at least one of the input image, the attention image, the salient-region prior probability image, the salient-region image, and the feature likelihoods calculated up to the previous iteration.
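As a rough illustration of how a prior probability image and two feature likelihoods can be combined into a per-frame mask, the following sketch labels each pixel by comparing unnormalized posteriors. It is a deliberate simplification and an assumption of this example: it ignores any smoothness between neighbouring pixels (which a graph-cut formulation such as that of Non-Patent Document 1 would include), and the function names and the use of quantized integer features are hypothetical.

```python
def extract_salient_region(image, prior, lik_salient, lik_background):
    """Label each pixel 1 (salient) or 0 (background).

    image: 2-D list of quantized feature values (ints), an assumed representation
    prior: 2-D list, prior[y][x] = prior probability that the pixel is salient
    lik_salient / lik_background: dict mapping feature value -> likelihood
    """
    eps = 1e-9  # fallback likelihood for features never seen (assumption)
    mask = []
    for image_row, prior_row in zip(image, prior):
        row = []
        for feature, p in zip(image_row, prior_row):
            post_salient = p * lik_salient.get(feature, eps)
            post_background = (1.0 - p) * lik_background.get(feature, eps)
            row.append(1 if post_salient > post_background else 0)
        mask.append(row)
    return mask


def generate_salient_region_video(frames, priors, lik_salient, lik_background):
    """Apply the per-frame extraction to every frame and collect the masks."""
    return [extract_salient_region(f, p, lik_salient, lik_background)
            for f, p in zip(frames, priors)]
```

With a flat prior of 0.5, the mask simply follows whichever likelihood is larger for each pixel's feature value.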

In the above salient-region video generation method, the attention video extraction step may comprise: a basic attention image extraction step of calculating, from the input image, a basic attention image, which is an image showing the spatial regions of the input image that have salient characteristics; a stochastic basic attention image extraction step of calculating a stochastic basic attention image, which is an image expressing the saliency at each position of the current input image in probabilistic form, on the basis of the basic attention image calculated by the basic attention image extraction step, the stochastic basic attention image calculated by this same step from the previous input image, and a stochastic basic attention parameter, which is a first parameter that is updated sequentially and used for gaze position estimation; a gaze position probability density image extraction step of calculating a gaze position probability density image, which is the frame of the gaze position probability density video for the current input image, on the basis of the stochastic basic attention image calculated by the stochastic basic attention image extraction step, the gaze position probability density image calculated by this same step from the previous input image, and a gaze position probability density parameter, which is a second parameter that is updated sequentially and used for gaze position estimation; and a gaze position probability density video output step of outputting, as the gaze position probability density video, the time series of gaze position probability density images calculated by repeating the basic attention image extraction step, the stochastic basic attention image extraction step, and the gaze position probability density image extraction step in order for each input image.

The gaze position probability density image extraction step may in turn comprise: a gaze movement state variable update step of updating a gaze movement state variable, which is a random variable that controls the magnitude of gaze movement, on the basis of the gaze position probability density image calculated by the gaze position probability density image extraction step from the previous input image, the gaze movement state variable calculated by this same update step from the previous input image, and the gaze position probability density parameter, and of outputting a gaze movement state variable set, which is the set of these gaze movement state variables; a representative gaze position update step of updating a representative gaze position set, which is a set of representative gaze positions that take gaze movement into account, on the basis of the stochastic basic attention image calculated by the stochastic basic attention image extraction step, the representative gaze position set updated by this same update step from the previous input image, the gaze movement state variable set, and the gaze position probability density parameter; a representative gaze position weight coefficient calculation step of calculating a representative gaze position weight coefficient set, which is the set of weights associated with the respective representative gaze positions, on the basis of the stochastic basic attention image calculated by the stochastic basic attention image extraction step, the representative gaze position set updated by the representative gaze position update step, the gaze movement state variable set output from the gaze movement state variable update step, and the gaze position probability density parameter; and a gaze position probability density image output step of calculating the gaze position probability density image on the basis of the representative gaze position set updated by the representative gaze position update step and the representative gaze position weight coefficient set calculated by the representative gaze position weight coefficient calculation step. The gaze position probability density image calculated in this way includes the representative gaze position set and the representative gaze position weight coefficient set.
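The representative gaze positions and their weight coefficients described above read much like a weighted-sample (particle) approximation of a density. The sketch below illustrates that reading only; the two-valued movement state (0 for a small move, 1 for a large move), the uniform random offsets, and all function names are assumptions of this example, not details taken from the text.

```python
import random


def update_gaze_positions(positions, states, step_sizes, width, height, rng):
    """Move each representative gaze position by a random offset whose
    maximum size depends on its gaze movement state (hypothetical encoding)."""
    moved = []
    for (x, y), state in zip(positions, states):
        r = step_sizes[state]
        nx = min(max(x + rng.randint(-r, r), 0), width - 1)   # clamp to image
        ny = min(max(y + rng.randint(-r, r), 0), height - 1)
        moved.append((nx, ny))
    return moved


def compute_weights(positions, attention):
    """Weight each position by the attention value at that pixel, normalized to sum to 1."""
    raw = [attention[y][x] for x, y in positions]
    total = sum(raw)
    if total == 0:
        return [1.0 / len(positions)] * len(positions)
    return [w / total for w in raw]


def gaze_density_image(positions, weights, width, height):
    """Accumulate the weights at the representative positions into an image."""
    density = [[0.0] * width for _ in range(height)]
    for (x, y), w in zip(positions, weights):
        density[y][x] += w
    return density
```

Calling the three functions in this order for each frame, and feeding the moved positions back in for the next frame, gives one density image per frame.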

In the above salient-region video generation method, the salient-region prior probability image extraction step may consist of a salient-region prior probability image generation step of generating the salient-region prior probability image using only the attention image, and a salient-region prior probability image update step of updating, using the salient-region image, the salient-region prior probability image generated by the generation step.
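One way to picture this generate-then-update split is sketched below. Both rules are assumptions made for illustration: the prior is generated by scaling the attention image so that its peak equals 1, and it is updated by blending it with a previously extracted binary salient-region mask using a hypothetical mixing weight `alpha`.

```python
def generate_prior(attention):
    """Generate a prior probability image from the attention image alone
    by scaling so that the largest attention value maps to 1.0."""
    peak = max(max(row) for row in attention)
    if peak == 0:
        # No attention anywhere: fall back to an uninformative prior.
        return [[0.5 for _ in row] for row in attention]
    return [[value / peak for value in row] for row in attention]


def update_prior(prior, salient_mask, alpha=0.5):
    """Blend the generated prior with a salient-region mask (1 = salient)."""
    return [[(1 - alpha) * p + alpha * m for p, m in zip(prior_row, mask_row)]
            for prior_row, mask_row in zip(prior, salient_mask)]
```

For example, `update_prior(generate_prior(attention), previous_mask)` raises the prior where the earlier mask was salient and lowers it elsewhere.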

In the above salient-region video generation method, the feature likelihood calculation step may consist of: a salient-region feature likelihood calculation step of calculating a salient-region feature likelihood, which indicates the likelihood of the image features contained in the salient region, on the basis of at least one of the input image, the salient-region prior probability image, the salient-region image, and the salient-region feature likelihood calculated up to the previous iteration; a non-salient-region feature likelihood calculation step of calculating a non-salient-region feature likelihood, which indicates the likelihood of the image features contained in the region outside the salient region, on the basis of at least one of the input image, the salient-region prior probability image, the salient-region image, and the non-salient-region feature likelihood calculated up to the previous iteration; and a feature likelihood output step of adding the salient-region feature likelihood and the non-salient-region feature likelihood together and outputting the result as the feature likelihood.
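A minimal sketch of estimating the two likelihoods from one frame follows. It assumes, purely for illustration, that features are quantized into `n_bins` integer bins and that each likelihood is a normalized histogram in which a pixel votes with weight equal to its prior probability (salient) or one minus that probability (non-salient). Returning the two histograms as a pair is likewise an assumption about how they are combined for output.

```python
def feature_likelihoods(image, prior, n_bins):
    """Return (salient, non_salient) likelihood histograms over feature bins.

    image: 2-D list of ints in range(n_bins), an assumed feature quantization
    prior: 2-D list of prior probabilities that each pixel is salient
    """
    hist_salient = [0.0] * n_bins
    hist_background = [0.0] * n_bins
    for image_row, prior_row in zip(image, prior):
        for feature, p in zip(image_row, prior_row):
            hist_salient[feature] += p            # soft vote for "salient"
            hist_background[feature] += 1.0 - p   # soft vote for "non-salient"

    def normalize(hist):
        total = sum(hist)
        if total == 0:
            return [1.0 / len(hist)] * len(hist)  # uniform fallback
        return [value / total for value in hist]

    return normalize(hist_salient), normalize(hist_background)
```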

In the above salient-region video generation method, the salient-region feature likelihood calculation step may consist of a salient-region feature likelihood generation step of generating the salient-region feature likelihood on the basis of the input image, the salient-region prior probability image, and the salient-region image, and a salient-region feature likelihood update step of updating the salient-region feature likelihood generated by the generation step. Likewise, the non-salient-region feature likelihood calculation step may consist of a non-salient-region feature likelihood generation step of generating the non-salient-region feature likelihood on the basis of the input image, the salient-region prior probability image, and the salient-region image, and a non-salient-region feature likelihood update step of updating the non-salient-region feature likelihood generated by that generation step. The salient-region feature likelihood update step updates the salient-region feature likelihood on the basis of at least one of the input image, the salient-region image, and the salient-region feature likelihood as updated up to the previous iteration. The non-salient-region feature likelihood update step updates the non-salient-region feature likelihood on the basis of at least one of the input image, the non-salient-region image, and the non-salient-region feature likelihood as updated up to the previous iteration.
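The text leaves the form of the update open. One simple possibility, shown here only as an assumption, is an exponential blend between the likelihood carried over from earlier iterations and the likelihood estimated from the current frame, with a hypothetical mixing parameter `rate`.

```python
def update_likelihood(previous, current, rate=0.25):
    """Blend the previously updated likelihood with the current estimate.

    previous, current: lists of likelihood values over the same feature bins
    rate: weight given to the current estimate (hypothetical parameter)
    """
    blended = [(1 - rate) * old + rate * new
               for old, new in zip(previous, current)]
    total = sum(blended)
    # Renormalize so that the result still sums to 1.
    return [value / total for value in blended]
```

The same function could be applied separately to the salient-region and non-salient-region likelihoods.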

The above salient-region video generation method may further comprise: a smoothed image group generation step of generating a smoothed image group consisting of a plurality of smoothed images obtained by smoothing the input image at different resolutions; and a salient-region image determination step of executing the salient-region prior probability image extraction step, the feature likelihood calculation step, and the salient-region image extraction step on the smoothed image group and determining the salient-region image of the input image. In this case, the feature likelihood calculation step and the salient-region image extraction step use the smoothed images in place of the input image, and the salient-region video generation step generates the salient-region video from the salient-region images obtained by executing, for each input image, the attention video extraction step, the salient-region prior probability image extraction step, the feature likelihood calculation step, the salient-region image extraction step, the smoothed image group generation step, and the salient-region image determination step.
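A smoothed image group can be pictured as the same frame blurred at several scales. The sketch below uses a plain box filter with different radii as a stand-in; the choice of filter, the radii, and the coarse-to-fine ordering are assumptions of this example.

```python
def box_smooth(image, radius):
    """Average each pixel over a (2*radius+1)-wide window clipped at the borders."""
    height, width = len(image), len(image[0])
    smoothed = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [image[j][i]
                      for j in range(max(0, y - radius), min(height, y + radius + 1))
                      for i in range(max(0, x - radius), min(width, x + radius + 1))]
            row.append(sum(window) / len(window))
        smoothed.append(row)
    return smoothed


def smoothed_image_group(image, radii=(4, 2, 1, 0)):
    """Return one smoothed image per radius, ordered from coarse to fine."""
    return [box_smooth(image, r) for r in radii]
```

Each image in the group could then be passed through the prior extraction, likelihood calculation, and region extraction in turn, with the result at the finest scale taken as the final salient-region image.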

To solve the above problem, a salient-region video generation device according to another aspect of the present invention comprises: an attention video extraction unit that extracts, from an input video, an attention video indicating the degree of attention, that is, how readily a human directs attention to each position; a salient-region prior probability image extraction unit that extracts a salient-region prior probability image indicating the probability that each position of an input image, which is a frame of the input video, belongs to a salient region; a feature likelihood calculation unit that calculates feature likelihoods indicating the likelihood of the image features contained in the salient region of the input image and in the region outside the salient region, respectively; a salient-region image extraction unit that extracts a salient-region image indicating the salient region of the input image from the input image, the salient-region prior probability image, and the feature likelihoods; and a salient-region video generation unit that generates the salient-region video from the salient-region images obtained by running the attention video extraction unit, the salient-region prior probability image extraction unit, the feature likelihood calculation unit, and the salient-region image extraction unit on each input image. The salient-region prior probability image extraction unit extracts the salient-region prior probability image for a given input image on the basis of the salient-region image and the attention image, the latter being the image in the attention video extracted by the attention video extraction unit that corresponds to that input image. The feature likelihood calculation unit calculates the feature likelihoods on the basis of at least one of the input image, the attention image, the salient-region prior probability image, the salient-region image, and the feature likelihoods calculated up to the previous iteration.

In the above salient-region video generation device, the attention video extraction unit may comprise: a basic attention image extraction unit that calculates, from the input image, a basic attention image, which is an image showing the spatial regions of the input image that have salient characteristics; a stochastic basic attention image extraction unit that calculates a stochastic basic attention image, which is an image expressing the saliency at each position of the current input image in probabilistic form, on the basis of the basic attention image calculated by the basic attention image extraction unit, the stochastic basic attention image calculated by this same unit from the previous input image, and a stochastic basic attention parameter, which is a first parameter that is updated sequentially and used for gaze position estimation; a gaze position probability density image extraction unit that calculates a gaze position probability density image, which is the frame of the gaze position probability density video for the current input image, on the basis of the stochastic basic attention image calculated by the stochastic basic attention image extraction unit, the gaze position probability density image calculated by this same unit from the previous input image, and a gaze position probability density parameter, which is a second parameter that is updated sequentially and used for gaze position estimation; and a gaze position probability density video output unit that outputs, as the gaze position probability density video, the time series of gaze position probability density images calculated by running the basic attention image extraction unit, the stochastic basic attention image extraction unit, and the gaze position probability density image extraction unit repeatedly, in order, on each input image.

The gaze position probability density image extraction unit may in turn comprise: a gaze movement state variable update unit that updates a gaze movement state variable, which is a random variable that controls the magnitude of gaze movement, on the basis of the gaze position probability density image calculated by the gaze position probability density image extraction unit from the previous input image, the gaze movement state variable calculated by this same update unit from the previous input image, and the gaze position probability density parameter, and that outputs a gaze movement state variable set, which is the set of these gaze movement state variables; a representative gaze position update unit that updates a representative gaze position set, which is a set of representative gaze positions that take gaze movement into account, on the basis of the stochastic basic attention image calculated by the stochastic basic attention image extraction unit, the representative gaze position set updated by this same update unit from the previous input image, the gaze movement state variable set, and the gaze position probability density parameter; a representative gaze position weight coefficient calculation unit that calculates a representative gaze position weight coefficient set, which is the set of weights associated with the respective representative gaze positions, on the basis of the stochastic basic attention image calculated by the stochastic basic attention image extraction unit, the representative gaze position set updated by the representative gaze position update unit, the gaze movement state variable set output from the gaze movement state variable update unit, and the gaze position probability density parameter; and a gaze position probability density image output unit that calculates the gaze position probability density image on the basis of the representative gaze position set updated by the representative gaze position update unit and the representative gaze position weight coefficient set calculated by the representative gaze position weight coefficient calculation unit. The gaze position probability density image calculated in this way includes the representative gaze position set and the representative gaze position weight coefficient set.

In the above salient-region video generation device, the salient-region prior probability image extraction unit may consist of a salient-region prior probability image generation unit that generates the salient-region prior probability image using only the attention image, and a salient-region prior probability image update unit that updates, using the salient-region image, the salient-region prior probability image generated by the generation unit.

In the above salient-region video generation device, the feature likelihood calculation unit may consist of: a salient-region feature likelihood calculation unit that calculates a salient-region feature likelihood, which indicates the likelihood of the image features contained in the salient region, on the basis of at least one of the input image, the salient-region prior probability image, the salient-region image, and the salient-region feature likelihood calculated up to the previous iteration; a non-salient-region feature likelihood calculation unit that calculates a non-salient-region feature likelihood, which indicates the likelihood of the image features contained in the region outside the salient region, on the basis of at least one of the input image, the salient-region prior probability image, the salient-region image, and the non-salient-region feature likelihood calculated up to the previous iteration; and a feature likelihood output unit that adds the salient-region feature likelihood and the non-salient-region feature likelihood together and outputs the result as the feature likelihood.

In the above salient-region video generation device, the salient-region feature likelihood calculation unit may consist of a salient-region feature likelihood generation unit that generates the salient-region feature likelihood on the basis of the input image, the salient-region prior probability image, and the salient-region image, and a salient-region feature likelihood update unit that updates the salient-region feature likelihood generated by the generation unit. Likewise, the non-salient-region feature likelihood calculation unit may consist of a non-salient-region feature likelihood generation unit that generates the non-salient-region feature likelihood on the basis of the input image, the salient-region prior probability image, and the salient-region image, and a non-salient-region feature likelihood update unit that updates the non-salient-region feature likelihood generated by that generation unit. The salient-region feature likelihood update unit updates the salient-region feature likelihood on the basis of at least one of the input image, the salient-region image, and the salient-region feature likelihood as updated up to the previous iteration. The non-salient-region feature likelihood update unit updates the non-salient-region feature likelihood on the basis of at least one of the input image, the non-salient-region image, and the non-salient-region feature likelihood as updated up to the previous iteration.

The above salient-region video generation device may further comprise a smoothed image group generation unit that generates a smoothed image group consisting of a plurality of smoothed images obtained by smoothing the input image at different resolutions, and a salient-region image determination unit that executes the processing of the salient-region prior probability image extraction unit, the feature likelihood calculation unit, and the salient-region image extraction unit on the smoothed image group and determines the salient-region image of the input image. In this case, the feature likelihood calculation unit and the salient-region image extraction unit use the smoothed images in place of the input image, and the salient-region video generation unit generates the salient-region video from the salient-region images obtained by executing, for each input image, the processing of the attention video extraction unit, the salient-region prior probability image extraction unit, the feature likelihood calculation unit, the salient-region image extraction unit, the smoothed image group generation unit, and the salient-region image determination unit.

To solve the above problem, a program according to another aspect of the present invention causes a computer to execute: an attention video extraction step of extracting, from an input video, an attention video indicating the attention level, that is, the degree to which a person is likely to direct attention; a salient-region prior probability image extraction step of extracting a salient-region prior probability image indicating the probability that each position of an input image, which is one of the frames constituting the input video, belongs to a salient region; a feature likelihood calculation step of calculating a feature likelihood indicating the likelihood of the image features contained in the salient region and in the region outside the salient region of the input image, respectively; a salient-region image extraction step of extracting a salient-region image indicating the salient region of the input image from the input image, the salient-region prior probability image, and the feature likelihood; and a salient-region video generation step of generating the salient-region video from the salient-region images obtained by executing the attention video extraction step, the salient-region prior probability image extraction step, the feature likelihood calculation step, and the salient-region image extraction step for each input image. The salient-region prior probability image extraction step extracts the salient-region prior probability image, indicating the probability that each position of one input image belongs to a salient region, based on the salient-region image and on an attention image, which is the image corresponding to that input image within the attention video extracted in the attention video extraction step. The feature likelihood calculation step calculates the feature likelihood based on at least one of the input image, the attention image, the salient-region prior probability image, the salient-region image, and the feature likelihood calculated up to the previous time.

To solve the above problem, a recording medium according to another aspect of the present invention is a computer-readable recording medium on which is recorded a program for causing a computer to execute: an attention video extraction step of extracting, from an input video, an attention video indicating the attention level, that is, the degree to which a person is likely to direct attention; a salient-region prior probability image extraction step of extracting a salient-region prior probability image indicating the probability that each position of an input image, which is one of the frames constituting the input video, belongs to a salient region; a feature likelihood calculation step of calculating a feature likelihood indicating the likelihood of the image features contained in the salient region and in the region outside the salient region of the input image, respectively; a salient-region image extraction step of extracting a salient-region image indicating the salient region of the input image from the input image, the salient-region prior probability image, and the feature likelihood; and a salient-region video generation step of generating the salient-region video from the salient-region images obtained by executing the attention video extraction step, the salient-region prior probability image extraction step, the feature likelihood calculation step, and the salient-region image extraction step for each input image. The salient-region prior probability image extraction step extracts the salient-region prior probability image, indicating the probability that each position of one input image belongs to a salient region, based on the salient-region image and on an attention image, which is the image corresponding to that input image within the attention video extracted in the attention video extraction step. The feature likelihood calculation step calculates the feature likelihood based on at least one of the input image, the attention image, the salient-region prior probability image, the salient-region image, and the feature likelihood calculated up to the previous time.

According to the present invention, region segmentation is possible even when no prior information about the object region and the background region is given. Consequently, even without prior knowledge of the object region and the background region, the two can be separated accurately and the region of interest (the object region) can be extracted.

A schematic diagram of the process by which the salient-region video generation device 1000 according to the first embodiment of the present invention calculates the salient-region prior probability image.
An example of a functional block diagram of the salient-region video generation device 1000.
A functional block diagram of the attention video extraction unit 1.
A functional block diagram of the salient-region prior probability image extraction unit 2, the feature likelihood calculation unit 3, and the salient-region image extraction unit 4.
A schematic diagram of the feature likelihood calculation process.
An example of a salient-region extraction graph.
An example of a functional block diagram of the salient-region video generation device 1100.
An example of salient-region extraction.
A comparison of salient-region extraction.

(First embodiment)
A salient-region video generation device 1000 according to a first embodiment of the present invention is described below with reference to the drawings. In this embodiment (and likewise in the second embodiment described later), region segmentation is realized on the basis of video saliency, so "salient-region extraction" and "region segmentation" are used synonymously below. Similarly, "salient region" and "object region" are used synonymously, as are "non-salient region" and "background region". In the following description, a character that carries an overline (￣) in a formula is written in the text with ￣ placed before it. For example, the character in the formulas (Formula 1 below) is written as ￣x in the text.

Likewise, a character that carries a tilde (〜) in a formula is written in the text with 〜 placed before it. For example, the character in the formulas (Formula 2 below) is written as 〜η in the text, and the character in the formulas (Formula 3 below) is written as 〜Σ.

Note that the character in the formulas (Formula 4 below) and g in the text denote the same symbol.

As shown in FIG. 1, the salient-region video generation device 1000 acquires an input video from outside, generates a salient-region video composed of the salient-region frames (salient-region images) obtained by extracting the salient region from each input frame (input image) of the input video, and outputs the salient-region video to the outside.

As shown in FIG. 2, the salient-region video generation device 1000 comprises an attention video extraction unit 1, a salient-region prior probability image extraction unit 2, a feature likelihood calculation unit 3, a salient-region image extraction unit 4, and a salient-region video generation unit 5. As shown in FIG. 3, the attention video extraction unit 1 comprises a basic attention image extraction unit 11, a stochastic basic attention image extraction unit 12, a stochastic basic attention parameter sequential estimation unit 13, a gaze position probability density image extraction unit 14, and a gaze position probability density video output unit 15. The gaze position probability density image extraction unit 14 comprises a gaze movement state variable update unit 141, a representative gaze position update unit 142, a representative gaze position weight coefficient calculation unit 143, a gaze position probability density image output unit 144, and a representative gaze position set reconstruction unit 145. As shown in FIG. 4(a), the salient-region prior probability image extraction unit 2 comprises a salient-region prior probability image generation unit 21 and a salient-region prior probability image update unit 22. As shown in FIG. 4(b), the feature likelihood calculation unit 3 comprises a salient-region feature likelihood calculation unit 31, a non-salient-region feature likelihood calculation unit 32, and a feature likelihood output unit 33. The salient-region feature likelihood calculation unit 31 comprises a salient-region feature likelihood generation unit 311 and a salient-region feature likelihood update unit 312. The non-salient-region feature likelihood calculation unit 32 comprises a non-salient-region feature likelihood generation unit 321 and a non-salient-region feature likelihood update unit 322. As shown in FIG. 4(c), the salient-region image extraction unit 4 comprises a salient-region extraction graph generation unit 41 and a salient-region extraction graph division unit 42.

The attention video extraction unit 1 acquires the input video. It extracts an attention video, which is a video indicating the attention level, that is, the degree to which a person is likely to direct attention, within each frame of the input video. The attention video extraction unit 1 outputs (supplies) the extracted attention video to the salient-region prior probability image extraction unit 2.

Specifically, the attention video extraction unit 1 receives the input video subject to gaze position estimation, a stochastic basic attention parameter Θ_s(t), which is the first parameter required for gaze position estimation, and a gaze position probability density parameter Θ_x(t), which is the second parameter required for gaze position estimation. From these it calculates a gaze position probability density image X(t) indicating, for each position in each time-series input image (frame) of the input video, the probability that a person directs their gaze there. The gaze position estimation device 100 further outputs a gaze position probability density video, which is the time series of the calculated gaze position probability density images X(t).

The basic attention image extraction unit 11 takes, from the input video, the input image (frame) on which gaze position estimation is to be performed. It then extracts a basic attention image, which is an image indicating the spatial regions of that input image that have salient characteristics, and outputs the extracted basic attention image to the stochastic basic attention image extraction unit 12 and the stochastic basic attention parameter sequential estimation unit 13.

The extraction of the basic attention image by the basic attention image extraction unit 11 is the same as the processing of the basic attention image extraction unit 11 described in Patent Document 1, so a detailed description is omitted. In this embodiment, however, the basic attention image calculated from the input image at time i is denoted by formula (5) below (hereinafter "basic attention image ￣S(i)").

The stochastic basic attention image extraction unit 12 extracts a stochastic basic attention image S(t), which is an image that expresses the saliency at each position of the current input image in probabilistic terms. The extraction is based on the basic attention image ￣S(i) input from the basic attention image extraction unit 11, the stochastic basic attention images S(t) calculated so far by the stochastic basic attention image extraction unit 12 itself, and the stochastic basic attention parameter Θ_s(t).
The stochastic basic attention image S(t) extracted by the stochastic basic attention image extraction unit 12 holds, for each position y, the expected value of the stochastic basic attention level s(t, y), given by formula (6) below (hereinafter "expected value ^s(t, y|t)"), and the standard deviation σ_s(t, y|t).

The stochastic basic attention image extraction unit 12 outputs the extracted stochastic basic attention image S(t) to the gaze position probability density image extraction unit 14 and the stochastic basic attention parameter sequential estimation unit 13.
The stochastic basic attention image extraction unit 12 also receives the stochastic basic attention parameter Θ_s(t+1) updated by the stochastic basic attention parameter sequential estimation unit 13.

The stochastic basic attention image extraction unit 12 can extract the stochastic basic attention image S(t) using the methods described in Non-Patent Documents 1 and 2. The extraction method is not particularly limited; as one example, an estimation method using a Kalman filter is described here.

First, assume that the pixel value s(t, y) (a random variable) at position y of the current (time t) stochastic basic attention image S(t) (a random variable) satisfies the relations in formulas (9) and (10) below with respect to two quantities: the pixel value given by formula (8) below (hereinafter "pixel value ￣s(t, y)") at position y of the current basic attention image given by formula (7) below (hereinafter "basic attention image ￣S(t)"), and the pixel value s(t−1, y) at position y of the stochastic basic attention image S(t−1) one time step earlier (time t−1).

Here, the stochastic basic attention parameter Θ_s(t) is assumed to be given as in formula (11) below, in a form that depends on time t and position y.

In formulas (9) and (10), p(a|b) denotes the probability density of a given b. Formula (12) below denotes the probability density of s following a normal distribution whose expected value is given by formula (13) below and whose standard deviation is σ, and it is expressed as in formula (14) below.

In the following description, the pixel value ￣s(t, y) is called the basic attention level at position y. Similarly, s(t, y) is called the stochastic basic attention level at position y. The position y is omitted unless it is specifically needed; for example, s(t, y) is written as s(t).

Next, assume that the stochastic basic attention level s(t−1) one time step earlier has already been extracted, through the processing of the stochastic basic attention image extraction unit 12 so far, in the probability-density form shown in formula (15) below.

In formula (15), formula (16) below denotes the sequence of basic attention levels from time t_1 to time t_2; formula (17) below denotes the expected value of the stochastic basic attention level s(t_1) at time t_1 when the basic attention levels from time 1 to time t_2, given by formula (18) below, are given; and σ_s(t_1|t_2) denotes the standard deviation at that time.

The stochastic basic attention image extraction unit 12 then updates, as in formulas (21) and (22) below, the expected value of the current stochastic basic attention level s(t) under the probability density shown in formula (19) below, which is given by formula (20) below (hereinafter "expected value ^s(t|t)"), and the standard deviation σ_s(t|t).

Note that the update of the expected value ^s(t|t) and the standard deviation σ_s(t|t) by the stochastic basic attention image extraction unit 12 described above can be executed independently at each position in the image.
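The per-position update described above can be sketched as follows. Formulas (9), (10), (21), and (22) are not reproduced in this text, so the sketch assumes the standard scalar Kalman filter with a random-walk state model: the observation ￣s(t) is taken to be s(t) plus Gaussian noise, and s(t) is taken to be s(t−1) plus Gaussian noise. The function name `kalman_update` and the roles given to `sigma_s1` and `sigma_s2` are assumptions made for illustration, not the patent's definitions.

```python
import numpy as np

def kalman_update(mean_prev, std_prev, observed, sigma_s1, sigma_s2):
    """One scalar Kalman step applied independently to every pixel.

    mean_prev, std_prev: expected value and standard deviation at time t-1.
    observed: basic attention image at time t.
    sigma_s1: assumed observation-noise standard deviation.
    sigma_s2: assumed state-transition-noise standard deviation.
    """
    # Predict: a random-walk model keeps the mean and inflates the variance.
    var_pred = std_prev ** 2 + sigma_s2 ** 2
    # Correct: blend prediction and observation using the Kalman gain.
    gain = var_pred / (var_pred + sigma_s1 ** 2)
    mean_new = mean_prev + gain * (observed - mean_prev)
    var_new = (1.0 - gain) * var_pred
    return mean_new, np.sqrt(var_new)
```

Because every operation is elementwise, passing whole images as NumPy arrays updates all positions at once, which mirrors the independence noted in the text.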

By using a relation such as formula (23) below in place of formula (10) above, an embodiment that takes into account the motion component at each position of the input image is also possible.

In formula (23), Δy(t) is the optical flow at time t and position y, obtained, for example, by the same method as the motion feature image extraction unit 115 described in Patent Document 1.
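One way to picture the motion-aware variant is to look up the previous estimate at the position the flow points back to before applying the update. The sketch below is only an illustration under stated assumptions: formula (23) is not reproduced in this text, so the sign convention (reading the previous value at y − Δy(t)), the `(row, column)` layout of `flow`, the nearest-pixel rounding, and the border clamping are all assumed, and `warp_previous` is a hypothetical helper name.

```python
import numpy as np

def warp_previous(prev, flow):
    """Fetch, for each pixel y, the previous value at y - flow[y].

    prev: (H, W) array of previous expected values.
    flow: (H, W, 2) array of (row, column) displacements (assumed layout).
    """
    h, w = prev.shape
    rows, cols = np.mgrid[0:h, 0:w]
    # Round to the nearest pixel and clamp to the image border.
    src_r = np.clip(np.rint(rows - flow[..., 0]).astype(int), 0, h - 1)
    src_c = np.clip(np.rint(cols - flow[..., 1]).astype(int), 0, w - 1)
    return prev[src_r, src_c]
```

With zero flow the function returns the previous image unchanged, so the motion-free case is recovered.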

In the Kalman-filter-based estimation method described above, the stochastic basic attention level s(t, y) at each position is extracted independently in the spatial direction, but spatial continuity of the stochastic basic attention level can also be introduced. The following describes the case in which the stochastic basic attention level is modeled with a statistical model called a dynamic Markov random field and s(t, y) is derived analytically by a statistical analysis technique called mean-field approximation.

First, assume that the pixel value s(t, y) (a random variable) at position y of the current (time t) stochastic basic attention image S(t) (a random variable) satisfies the relations in formulas (26) to (30) below with respect to three quantities: the pixel value ￣s(t, y) at position y of the current basic attention image ￣S(t); the pixel value s(t−1, y) at position y of the stochastic basic attention image S(t−1) one time step earlier (time t−1); and the pixel values of the current stochastic basic attention image S(t), given by formula (25) below, at each position, given by formula (24) below, contained in the neighborhood D(y) of position y.

Here, the stochastic basic attention parameter Θ_s(t) is assumed to be redefined as in formula (31) below, in a form that depends on time t and position y.

The neighborhood D(y) may be defined, for example, as the four points directly above, below, left, and right of position y, or as eight points obtained by adding the four diagonal positions.
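The two neighborhood choices mentioned above can be written down directly. In this minimal sketch the function name `neighborhood` and its parameters are hypothetical, and dropping neighbors that fall outside the image is an assumed border policy that the text does not specify.

```python
def neighborhood(y, shape, connectivity=4):
    """Return the positions in D(y) that lie inside an image of the given shape."""
    if connectivity not in (4, 8):
        raise ValueError("connectivity must be 4 or 8")
    r, c = y
    # Up, down, left, right.
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if connectivity == 8:
        # Add the four diagonal positions.
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    h, w = shape
    return [(r + dr, c + dc) for dr, dc in offsets
            if 0 <= r + dr < h and 0 <= c + dc < w]
```

The length of the returned list plays the role of |D(y)|, which is smaller than 4 or 8 at the image border under this policy.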

Next, as in the Kalman-filter-based estimation method described above, assume that the stochastic basic attention level s(t−1, y) one time step earlier has already been extracted, through the processing of the stochastic basic attention image extraction unit 12 so far, in the probability-density form shown in formula (32) below.

In formula (32), formula (33) below denotes the expected value of the stochastic basic attention level s(t_1, y) at time t_1 and position y when the basic attention images up to time t_2, given by formula (34) below, are given, and σ_s(t_1, y|t_2) denotes the standard deviation at that time.

The goal of the stochastic basic attention image extraction unit 12 is then to update the expected value ^s(t, y|t) and the standard deviation σ_s(t, y|t) of the current stochastic basic attention level s(t, y) at position y under the probability density shown in formula (35) below.

This update by the stochastic basic attention image extraction unit 12 is performed by the iterative calculation of formulas (36) to (41) below.

In formulas (36) to (41), |D(y)| denotes the number of elements in the set D(y). Because the infinite number of iterations indicated in formula (39) is not feasible in a calculation using formulas (36) to (41), in practice the iteration is terminated once the difference between the output of step l+1, given by formula (42) below, and the output of step l, given by formula (43) below, becomes sufficiently small.
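The stopping rule described above can be sketched generically. Here `step` stands in for one pass of formulas (36) to (41), which are not reproduced in this text; the maximum absolute difference as the measure of "sufficiently small", the tolerance `tol`, and the safety cap `max_steps` are all assumptions made for illustration.

```python
import numpy as np

def iterate_until_converged(step, initial, tol=1e-6, max_steps=1000):
    """Apply `step` repeatedly until successive outputs differ by less than `tol`.

    Returns the final estimate and the number of steps performed.
    """
    current = np.asarray(initial, dtype=float)
    for n in range(1, max_steps + 1):
        updated = step(current)
        # Stop when step l+1 and step l agree to within the tolerance.
        if np.max(np.abs(updated - current)) < tol:
            return updated, n
        current = updated
    return current, max_steps
```

For example, calling it with `step = lambda x: 0.5 * x + 1.0` drives every element toward the fixed point 2.0 and stops well before the step cap.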

When the step index l is fixed, the update shown in formula (38) can be calculated independently at each position of the image. The other update formulas can likewise be calculated independently at each position of the image by fixing the time t.
Consequently, as with the Kalman-filter-based estimation method described above, the update of the expected value ^s(t, y|t) and the standard deviation σ_s(t, y|t) by the stochastic basic attention image extraction unit 12 can be executed independently at each position in the image, and these updates are easy to parallelize.

The stochastic basic attention parameter sequential estimation unit 13 sequentially updates the stochastic basic attention parameter Θ_s(t) based on the basic attention image ￣S(i) input from the basic attention image extraction unit 11, the stochastic basic attention image S(t) input from the stochastic basic attention image extraction unit 12, and the stochastic basic attention parameter Θ_s(t) given in advance.
It then outputs the updated stochastic basic attention parameter Θ_s(t+1) to the stochastic basic attention image extraction unit 12.

If the stochastic basic attention parameter sequential estimation unit 13 has not updated the stochastic basic attention parameter Θ_s(t), it outputs the parameter Θ_s(t) given in advance to the stochastic basic attention image extraction unit 12 as the stochastic basic attention parameter Θ_s(t+1). That is, in the initial stage, when no stochastic basic attention image S(t) has yet been input from the stochastic basic attention image extraction unit 12, the parameter Θ_s(t) cannot be updated, so the input parameter Θ_s(t) is output unchanged to the stochastic basic attention image extraction unit 12.

The method by which the stochastic basic attention parameter sequential estimation unit 13 estimates the stochastic basic attention parameter Θ_s(t+1) is not particularly limited; in this embodiment, an estimation method using an adaptive Kalman filter is described.

The stochastic basic attention degree parameter Θ_s(t+1) used by the stochastic basic attention degree parameter sequential estimation unit 13 at the next time t+1 is given by the following formula (44).

The stochastic basic attention degree parameter sequential estimation unit 13 performs the calculation as in the following formulas (45) to (52), using the basic attention degree image ￣S(i) already calculated by the basic attention degree image extraction unit 11 and the expected value and standard deviation of the stochastic basic attention degree that constitute the stochastic basic attention degree image S(t) already calculated by the stochastic basic attention degree image extraction unit 12.

In the above formulas (45) to (52), the following formula (53) (hereinafter "￣σ_s1") and the following formula (54) (hereinafter "￣σ_s2") are basic stochastic basic attention degree parameters, which are either fixed in advance or obtained beforehand by learning.

Also, λ_s1 and λ_s2 are predetermined mixing ratios of the parameters. Setting them appropriately controls the balance between the parameters obtained by sequential updating, given by the following formulas (55) and (56), and the predetermined parameters ￣σ_s1 and ￣σ_s2.

Note that setting λ_s1 = λ_s2 = 0 is equivalent to not performing the estimation of the stochastic basic attention degree parameter Θ_s(t+1) by the stochastic basic attention degree parameter sequential estimation unit 13. N_s is the time length of the buffer that holds past information.
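The role of the mixing ratios can be illustrated with a minimal sketch. Formulas (44) to (56) are not reproduced in this text, so the convex blend below is only an assumed form chosen to match the stated behaviour (a ratio of 0 falls back to the preset parameter); the function name `blend_parameter` and the numeric values are hypothetical.

```python
def blend_parameter(preset: float, updated: float, mix: float) -> float:
    """Blend a preset parameter with a sequentially updated one.

    Assumed form (not taken from the patent's formulas):
    mix = 0 returns the preset value, mix = 1 returns the updated value.
    """
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must lie in [0, 1]")
    return mix * updated + (1.0 - mix) * preset


sigma_preset = 2.0   # stands in for a preset parameter such as the one in (53)
sigma_updated = 3.0  # stands in for a sequentially estimated parameter

no_update = blend_parameter(sigma_preset, sigma_updated, 0.0)
half_way = blend_parameter(sigma_preset, sigma_updated, 0.5)
```

With a mixing ratio of 0 the sequentially estimated value has no influence, which mirrors the statement that λ_s1 = λ_s2 = 0 is equivalent to skipping the estimation.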

The line-of-sight position probability density image extraction unit 14 comprises a line-of-sight movement state variable update unit 141, a representative line-of-sight position update unit 142, a representative line-of-sight position weight coefficient calculation unit 143, a line-of-sight position probability density image output unit 144, and a representative line-of-sight position set reconstruction unit 145.
The line-of-sight position probability density image extraction unit 14 extracts a line-of-sight position probability density image X(t), which is a frame of the line-of-sight position probability density video. The extraction is based on the stochastic basic attention degree image S(t) input from the stochastic basic attention degree image extraction unit 12, the line-of-sight position probability density images X(t) extracted so far by the line-of-sight position probability density image extraction unit 14, and the line-of-sight position probability density parameter Θ_x(t), which is a parameter given in advance.
The line-of-sight position probability density image extraction unit 14 also outputs the line-of-sight position probability density image X(t) to the line-of-sight position probability density video output unit 5.

The line-of-sight movement state variable update unit 141 updates the line-of-sight movement state variable u(t), a random variable that controls the magnitude of the line-of-sight movement contained in the line-of-sight position probability density images X(t) so far. The update is based on the line-of-sight position probability density image X(t) output so far by the representative line-of-sight position set reconstruction unit 145 and on the line-of-sight position probability density parameter Θ_x(t), which is a parameter given in advance.
The line-of-sight movement state variable update unit 141 also outputs the line-of-sight movement state variable set U(t), the set of updated line-of-sight movement state variables u(t), to the representative line-of-sight position update unit 142 and the representative line-of-sight position set reconstruction unit 145.

The method by which the line-of-sight movement state variable update unit 141 updates the line-of-sight movement state variable set U(t) is not particularly limited; the method of this embodiment is described here.

First, assume that the line-of-sight movement state variable set U(t−1) at the previous time point (time t−1) is given, as part of the output of the representative line-of-sight position set reconstruction unit 145, as in the following formula (57).

In the above formula (57), N_u is the number of elements of the line-of-sight movement state variable set, that is, the number of samples of the line-of-sight movement state variable. Each line-of-sight movement state variable takes one of m_u values (1, 2, ..., m_u).

Then, from each sample u_n(t−1) of the line-of-sight movement state variable, a sample u_n(t) of the current line-of-sight movement state variable is randomly generated based on the line-of-sight movement transition probability matrix Φ = {φ_(i,j)}_(i,j), which is one of the line-of-sight position probability density parameters Θ_x(t). The transition probability matrix is an m_u × m_u matrix whose element φ_(i,j) in row i and column j is the probability of a transition from state j to state i. Φ therefore satisfies the property shown in the following formula (58).

That is, the following formula (59), the set of samples u_n(t) of the line-of-sight movement state variable generated as described above, is the current line-of-sight movement state variable set U(t).
Note that the empirical distribution of the samples u_n(t) contained in the current line-of-sight movement state variable set U(t) approximates the occurrence probability of the line-of-sight movement state variable.
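The state update described above can be sketched as follows. This is an illustrative sketch only: the function name `step_states` and the example matrix are hypothetical, and the column-sums-to-one check reflects the stated meaning of φ_(i,j) (transition from state j to state i) rather than formula (58) itself, which is not reproduced here.

```python
import random


def step_states(prev_states, phi, rng):
    """Draw u_n(t) for every u_n(t-1); phi[i][j] = P(state j+1 -> state i+1)."""
    m = len(phi)
    states = list(range(1, m + 1))  # states take the values 1..m_u
    new_states = []
    for u_prev in prev_states:
        j = u_prev - 1
        # Column j of phi is the distribution over the next state.
        column = [phi[i][j] for i in range(m)]
        new_states.append(rng.choices(states, weights=column, k=1)[0])
    return new_states


# Hypothetical 2-state transition matrix; each column sums to 1.
phi = [[0.9, 0.2],
       [0.1, 0.8]]
assert all(abs(sum(phi[i][j] for i in range(2)) - 1.0) < 1e-12 for j in range(2))

rng = random.Random(0)
U_prev = [1] * 500 + [2] * 500
U_now = step_states(U_prev, phi, rng)
fraction_state1 = U_now.count(1) / len(U_now)
```

The fraction of samples in each state is the empirical distribution mentioned in the text; with more samples it approaches the probabilities implied by Φ and U(t−1).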

As another embodiment, the line-of-sight movement state variable update unit 141 may perform no processing at all. This is equivalent to setting m_u = 1 in the update process for the line-of-sight movement state variable set U(t) described above, that is, to having only one line-of-sight movement state.

The representative line-of-sight position update unit 142 updates the representative line-of-sight position set V(t), a set of representative line-of-sight positions that represent the typical line-of-sight positions contained in the line-of-sight position probability density images X(t) so far, taking into account the line-of-sight movement controlled by the line-of-sight movement state variable u(t). The update is based on the line-of-sight position probability density image X(t) output so far by the representative line-of-sight position set reconstruction unit 145, the line-of-sight movement state variable set U(t) input from the line-of-sight movement state variable update unit 141, and the line-of-sight position probability density parameter Θ_x(t), which is a parameter given in advance.
The representative line-of-sight position update unit 142 also outputs the updated representative line-of-sight position set V(t) to the representative line-of-sight position weight coefficient calculation unit 143, the line-of-sight position probability density image output unit 144, and the representative line-of-sight position set reconstruction unit 145.

The method by which the representative line-of-sight position update unit 142 updates the representative line-of-sight position set V(t) is not particularly limited; the method of this embodiment is described here.

First, the following two models are described as models in which the line-of-sight position x(t) is controlled by the line-of-sight movement state variable u(t).

(Model 1): Given the line-of-sight position x(t−1) at the previous time point (time t−1), the occurrence probability of the line-of-sight position x(t) at the current time (time t) is given by the following formula (60), in a form that depends on the current line-of-sight movement state variable u(t).

In the above formula (60), γ_xi and σ_xi (i = 0, 1, ..., m_u − 1) are constants that constitute the line-of-sight position probability density parameter Θ_x(t). The following formula (61) (hereinafter "probability density Q(x; ￣x, γ, σ)") denotes the probability density function shown in the following formula (63), whose center is the following formula (62), whose modal distance is γ, and whose parameter corresponding to the standard deviation from the modal distance is σ.

In the above formula (63), ‖x‖ is the norm of the vector x, and Z_L is the normalization constant, given by the following formula (64), that makes the integral of the probability density Q(x; ￣x, γ, σ) over its entire domain equal to 1.

(Model 2): Given the line-of-sight position x(t−1) at the previous time point (time t−1), a beta distribution is used as the occurrence probability of the line-of-sight position x(t) at the current time (time t). The beta distribution for a one-dimensional variate x is defined by the following formula (66), with the domain given by the following formula (65).

In the above formula (66), a and b are the parameters that characterize the beta distribution. B(a, b) is called the beta function and is the normalization constant, given by the following formula (67), that makes the integral of the beta distribution over its entire domain equal to 1.
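For reference, the beta distribution and its normalization constant can be evaluated as in the sketch below. Formulas (65) to (67) are not reproduced in this text, so the code uses the standard textbook form, with the domain taken as the open interval (0, 1); whether the patent's formulas match it term for term is an assumption.

```python
import math


def beta_function(a: float, b: float) -> float:
    """B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def beta_pdf(x: float, a: float, b: float) -> float:
    """Standard beta density on (0, 1); zero outside that interval."""
    if not 0.0 < x < 1.0:
        return 0.0
    return x ** (a - 1.0) * (1.0 - x) ** (b - 1.0) / beta_function(a, b)


# Midpoint-rule check that the density integrates to about 1 for a = 2, b = 5.
n_cells = 10000
integral = sum(beta_pdf((k + 0.5) / n_cells, 2.0, 5.0) for k in range(n_cells)) / n_cells
```

Dividing by B(a, b) is what makes the integral over the domain equal to 1, which is the role the text assigns to the beta function.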

In this embodiment, the following formula (69) is used. It is a beta distribution that uses, as the variate of the normalization constant, the distance between the position x and a predetermined origin x, and whose domain is the following formula (68).

That is, the following formula (70), the beta distribution described above, is given by the following formula (71).

By using the beta distribution normalized as in the above formula (71), the occurrence probability of the line-of-sight position x(t) at the current time (time t), given the line-of-sight position x(t−1) at the previous time point (time t−1), is given by the following formula (72), in a form that depends on the current line-of-sight movement state variable u(t).

In the above formula (72), a_xi and b_xi (i = 0, 1, ..., m_u − 1) are constants that constitute the line-of-sight position probability density parameter Θ_x(t).

Using either of the models described above, the representative line-of-sight position update unit 142 updates the representative line-of-sight position set V(t) as follows.

First, assume that the representative line-of-sight position set V(t−1) at the previous time point (time t−1) is given, as part of the output of the representative line-of-sight position set reconstruction unit 145, as in the following formula (73).

In the above formula (73), N_x is the number of elements of the representative line-of-sight position set V(t), that is, the number of representative line-of-sight position samples. In a typical embodiment, N_x is set equal to the number of elements N_u of the line-of-sight movement state variable set U(t).

The representative line-of-sight position update unit 142 then randomly generates a sample x_n(t) of the representative line-of-sight position at the current time (time t) from each sample x_n(t−1) of the representative line-of-sight position at the previous time point (time t−1), using the probability density function given by either of the models described above, as shown in the following formula (74).

Note that the probability density function of the above formula (74) is complicated, so it is difficult to generate random samples from it directly. Such samples can nevertheless be generated using, for example, a sample generation method based on the Markov chain Monte Carlo method.

Next, a sample generation method based on the Markov chain Monte Carlo method, generally called the Metropolis-Hastings algorithm, is described in detail.

First, the representative line-of-sight position update unit 142 sets the initial value of the provisional sample of the representative line-of-sight position, the following formula (75), to the sample of the representative line-of-sight position at the previous time point (time t−1), as in the following formula (76).

Next, a two-dimensional vector, the following formula (77), is generated using a probability density function that is symmetric about the origin. Adding this two-dimensional vector, the following formula (78), to the provisional sample of the representative line-of-sight position at step k−1, the following formula (79), generates the provisional sample of the representative line-of-sight position at step k, the following formula (80), as in the following formula (81).

This probability density function only needs to be symmetric about the origin. Examples include a two-dimensional normal distribution centered on the origin and a uniform distribution over the range ±δ_x in each element, centered on the origin.

Then the ratio of the occurrence probability of the provisional sample at step k, the above formula (80), to the occurrence probability of the provisional sample at step k−1, the above formula (79), which is the following formula (82), is calculated based on the following formula (83).

Finally, a uniform random number, the following formula (85), is generated over the following formula (84). Only when the following formula (86) holds is the provisional sample of the representative line-of-sight position at step k, the above formula (80), rejected and replaced with the provisional sample at step k−1, the above formula (79).

The provisional sample generation step described above is then repeated a predetermined number of times (K_x times), and the provisional sample at step K_x, the following formula (87), is taken as the sample of the representative line-of-sight position at time t, as in the following formula (88).

As described above, samples are generated based on the Markov chain Monte Carlo method. The set of generated samples, the following formula (89), is the current representative line-of-sight position set V(t). The empirical distribution of the representative line-of-sight position samples contained in the current set V(t) approximates the occurrence probability of the line-of-sight position.
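The sampling steps above can be sketched as a random-walk Metropolis sampler. The sketch rests on several assumptions: `ring_density` is a hypothetical, unnormalized stand-in for the density of formula (74) (it peaks at distance γ from the previous position, loosely following the description of Model 1), the proposal is the uniform ±δ_x option mentioned in the text, and the acceptance rule is the usual one in which a candidate is rejected when a uniform draw on [0, 1) is not below the density ratio.

```python
import math
import random


def ring_density(x, center, gamma, sigma):
    """Unnormalized stand-in target: largest at distance gamma from center."""
    r = math.hypot(x[0] - center[0], x[1] - center[1])
    return math.exp(-((r - gamma) ** 2) / (2.0 * sigma ** 2))


def metropolis_sample(x_prev, density, k_steps, delta, rng):
    """Run k_steps random-walk Metropolis steps starting from x_prev."""
    current = x_prev
    p_current = density(current)
    for _ in range(k_steps):
        # Symmetric proposal: uniform in [-delta, delta] for each coordinate.
        candidate = (current[0] + rng.uniform(-delta, delta),
                     current[1] + rng.uniform(-delta, delta))
        p_candidate = density(candidate)
        # Accept when the uniform draw is below the density ratio; otherwise
        # the sample from the previous step is carried over unchanged.
        if rng.random() * p_current < p_candidate:
            current, p_current = candidate, p_candidate
    return current


rng = random.Random(0)
V_prev = [(0.0, 0.0)] * 200  # hypothetical samples x_n(t-1)
V_now = [
    metropolis_sample(xp, lambda x, c=xp: ring_density(x, c, 3.0, 0.5), 200, 1.0, rng)
    for xp in V_prev
]
mean_radius = sum(math.hypot(x, y) for x, y in V_now) / len(V_now)
```

Because only the ratio of densities is used, the normalization constant of the target never has to be computed, which is why this approach suits a density that is hard to sample directly.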

The representative line-of-sight position weight coefficient calculation unit 143 calculates a representative line-of-sight position weight coefficient, which is a weight associated with each representative line-of-sight position, based on the stochastic basic attention degree image S(t) input from the stochastic basic attention degree image extraction unit 12, the representative line-of-sight position set V(t) input from the representative line-of-sight position update unit 142, and the line-of-sight position probability density parameter Θ_x(t), which is a parameter given in advance.
The representative line-of-sight position weight coefficient calculation unit 143 also outputs the representative line-of-sight position weight coefficient set, the following formula (90), which is the set of calculated weight coefficients, to the line-of-sight position probability density image output unit 144 and the representative line-of-sight position set reconstruction unit 145.

The method by which the representative line-of-sight position weight coefficient calculation unit 143 extracts the representative line-of-sight position weight coefficient set W(t) is not particularly limited; in this embodiment, an extraction method based on signal detection theory is described.

The representative line-of-sight position weight coefficient w_n(t) associated with the sample x_n(t) (n = 1, 2, ..., N_x) of the representative line-of-sight position is calculated by the following formulas (91) and (92). These formulas compute the probability that the realized value of the stochastic basic attention degree s(t, y) at the position x_n(t) is greater than or equal to the realized value of the stochastic basic attention degree s(t, y) at a position y other than a certain position set D_x(x_n(t)).

Note that the notation s = s(t, x_n(t)) is used only in the above formulas (91) and (92). In those formulas, the following formula (93) denotes the probability distribution function of the current stochastic basic attention degree s(t, y) at the position y, and is defined as in the following formula (94) in correspondence with the current probability density p(s(t, x)) at the position y.

The position set D_x(x) can be chosen in various ways. Examples include the set of arbitrary positions other than the position x, the set of positions y other than x at which the basic attention degree, the following formula (95), is locally maximal, and the set of positions y other than x at which the expected value of the stochastic basic attention degree s(t, y), the following formula (96), is locally maximal.
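The kind of probability described above can be sketched numerically. Formulas (91) to (96) are not reproduced in this text, so the sketch makes its own assumptions: the attention degree at each position is treated as an independent Gaussian with a given mean and standard deviation, the comparison is made against a list of competing positions, and the probability is evaluated as the integral of the target density times the product of the competitors' distribution functions. The function name `win_probability` is hypothetical.

```python
import math


def normal_pdf(s, mean, std):
    z = (s - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def normal_cdf(s, mean, std):
    return 0.5 * (1.0 + math.erf((s - mean) / (std * math.sqrt(2.0))))


def win_probability(target, rivals, n_grid=4000, span=8.0):
    """P(s_target >= s_y for every rival y), by the midpoint rule.

    target and each rival are (mean, std) pairs of independent Gaussians.
    """
    mean, std = target
    lower = mean - span * std
    width = 2.0 * span * std / n_grid
    total = 0.0
    for k in range(n_grid):
        s = lower + (k + 0.5) * width
        prob_all_below = 1.0
        for r_mean, r_std in rivals:
            prob_all_below *= normal_cdf(s, r_mean, r_std)
        total += normal_pdf(s, mean, std) * prob_all_below * width
    return total


# Hypothetical (mean, std) values for a sample position and two competitors.
w_equal = win_probability((0.0, 1.0), [(0.0, 1.0), (0.0, 1.0)])
w_strong = win_probability((5.0, 1.0), [(0.0, 1.0)])
```

A position whose attention degree is likely to exceed that of its competitors receives a weight near 1, while identically distributed positions share the probability equally.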

In the extraction method based on signal detection theory described above, the representative line-of-sight position set V(t) and the representative line-of-sight position weight coefficient set W(t) are extracted by sampling, but W(t) can also be extracted without sampling. That method is described below.

In the method that extracts the representative line-of-sight position weight coefficient set W(t) without sampling, the update of the representative line-of-sight position set V(t) by the representative line-of-sight position update unit 142 and the extraction of W(t) by the representative line-of-sight position weight coefficient calculation unit 143 are performed simultaneously.

First, as in the extraction method based on signal detection theory described above, the following formula (97) is used to calculate, at each position in the input image, the probability that the realized value of the stochastic basic attention degree s(t, y) at the position x(t) is greater than or equal to the realized value of the stochastic basic attention degree s(t, y) at a position y other than a certain position set D_x(x(t)).

Next, the probability distribution calculated by the above formula (97) is modeled as a Gaussian mixture distribution using the EM algorithm shown in the following formulas (98) to (102). That is, the parameters of the Gaussian mixture distribution, namely the mixing ratios π_n(t) (n = 1, 2, ..., M_y), the mean vector of each Gaussian, the following formula (103), and the covariance matrices S_n(t), are derived by repeating the modeling step of formulas (98) to (102) for k = 1, 2, ... until each parameter converges. In deriving the Gaussian mixture model, a random variable z is introduced that expresses which Gaussian the position x belongs to.

In the above formulas (98) to (102), α_n (n = 1, 2, ..., M_y) are predetermined constants that satisfy the following formula (104).

Gaussians whose mixing ratio π_n(t) is smaller than a predetermined constant are removed as contributing little to the mixture, and the Gaussian mixture distribution is formed from the N_x Gaussians that finally remain. The mean positions of this Gaussian mixture distribution, the above formula (103) (n = 1, 2, ..., N_x), are then taken as the representative line-of-sight positions v_n(t) (n = 1, 2, ..., N_x) at the current time (time t).
It follows that, in the method that extracts the representative line-of-sight position weight coefficient set W(t) without sampling, the number of elements N_x of the representative line-of-sight position set V(t) is not given in advance but varies with the input image.
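The fit-then-prune procedure can be illustrated with a simplified sketch. It departs from the text in ways that should be kept in mind: it works in one dimension instead of on two-dimensional image positions, it uses plain weighted EM without the constants α_n of formulas (98) to (104) (which are not reproduced here), and the function name `fit_weighted_gmm`, the pruning threshold, and the test data are all hypothetical.

```python
import math


def gauss(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_weighted_gmm(xs, ps, init_means, n_iter=100, min_ratio=0.05, var_floor=1e-3):
    """Weighted EM for a 1-D Gaussian mixture, then prune small components.

    xs are positions, ps the probability mass at each position.
    Returns a list of (mixing ratio, mean, variance) for the kept components.
    """
    m = len(init_means)
    means = list(init_means)
    ratios = [1.0 / m] * m
    variances = [1.0] * m
    total_p = sum(ps)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each position.
        resp = []
        for x in xs:
            joint = [ratios[n] * gauss(x, means[n], variances[n]) for n in range(m)]
            norm = sum(joint) or 1e-300
            resp.append([j / norm for j in joint])
        # M-step: mass-weighted updates of ratio, mean, and variance.
        for n in range(m):
            mass = sum(p * r[n] for p, r in zip(ps, resp))
            if mass <= 0.0:
                ratios[n] = 0.0
                continue
            means[n] = sum(p * r[n] * x for p, r, x in zip(ps, resp, xs)) / mass
            var = sum(p * r[n] * (x - means[n]) ** 2 for p, r, x in zip(ps, resp, xs)) / mass
            variances[n] = max(var, var_floor)
            ratios[n] = mass / total_p
    # Drop components whose mixing ratio is below the threshold, renormalize.
    kept = [n for n in range(m) if ratios[n] >= min_ratio]
    scale = sum(ratios[n] for n in kept)
    return [(ratios[n] / scale, means[n], variances[n]) for n in kept]


# Hypothetical probability profile with bumps near positions 5 and 14.
xs = [float(x) for x in range(20)]
ps = [math.exp(-0.5 * ((x - 5.0) / 1.5) ** 2) + 0.5 * math.exp(-0.5 * (x - 14.0) ** 2)
      for x in xs]
components = fit_weighted_gmm(xs, ps, init_means=[4.0, 15.0, 100.0])
```

The third initial component starts far from any probability mass, ends with a mixing ratio of zero, and is pruned, so the number of remaining components depends on the data, as the text notes for N_x.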

The representative line-of-sight position weight coefficients w_n(t) (n = 1, 2, ..., N_x), on the other hand, are calculated as in the following formula (107), based on the representative line-of-sight position set at the previous time point (time t−1), the following formula (105), the representative line-of-sight position weight coefficient set at the previous time point (time t−1), the following formula (106), and the mixing ratios π_n(t) of the Gaussian mixture distribution above.

In other words, the Gaussian mixture distribution formed from the representative line-of-sight position set V(t) and the representative line-of-sight position weight coefficients w_n(t) at the previous time point is made to transition to the Gaussian mixture distribution modeled by the above formulas (98) to (102), taking into account the probability density p(s(t, x)) relating to line-of-sight movement.

As described above, in the method that extracts the representative line-of-sight position weight coefficient set W(t) without sampling, the representative line-of-sight position update unit 142 and the representative line-of-sight position weight coefficient calculation unit 143 extract the representative line-of-sight position set, the following formula (108), and the representative line-of-sight position weight coefficient set, the above formula (90), and output them to the line-of-sight position probability density image output unit 144.

The line-of-sight position probability density image output unit 144 extracts a representative line-of-sight position probability density image H(t) based on the representative line-of-sight position set V(t) input from the representative line-of-sight position update unit 142 and the representative line-of-sight position weight coefficient set W(t) input from the representative line-of-sight position weight coefficient calculation unit 143.
The line-of-sight position probability density image output unit 144 also outputs the extracted representative line-of-sight position probability density image H(t) to the representative line-of-sight position set reconstruction unit 145.

The method by which the line-of-sight position probability density image output unit 144 calculates the representative line-of-sight position probability density image H(t) is not particularly limited; the method of this embodiment is described here.

The line-of-sight position probability density image output unit 144 calculates the pixel value at the position x(t) of the representative line-of-sight position probability density image H(t) at the current time (time t) from the representative line-of-sight position set V(t) and the representative line-of-sight position weight coefficient set W(t), as in the following formula (109).

In the above formula (109), f(·) is a predetermined function. Examples include the delta function shown in the following formula (110) and the two-dimensional normal distribution shown in the following formula (111).
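One way to picture this step is the sketch below, which builds an image as a weighted sum of kernels placed at the representative positions. Formula (109) is not reproduced in this text, so the weighted-sum form is an assumption; the kernel is the isotropic two-dimensional normal distribution option, and the function name `density_image`, the image size, the positions, and the weights are hypothetical.

```python
import math


def density_image(width, height, positions, weights, sigma):
    """Weighted sum of isotropic 2-D Gaussian kernels centred on positions.

    positions are (x, y) pixel coordinates; image[row][col] uses row = y, col = x.
    """
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    image = [[0.0] * width for _ in range(height)]
    for (px, py), w in zip(positions, weights):
        for row in range(height):
            for col in range(width):
                d2 = (col - px) ** 2 + (row - py) ** 2
                image[row][col] += w * norm * math.exp(-0.5 * d2 / sigma ** 2)
    return image


V = [(8.0, 8.0), (24.0, 20.0)]  # hypothetical representative positions
W = [0.7, 0.3]                  # hypothetical weights summing to 1
H = density_image(32, 32, V, W, sigma=2.0)
total_mass = sum(sum(row) for row in H)
peak_row, peak_col = max(
    ((r, c) for r in range(32) for c in range(32)), key=lambda rc: H[rc[0]][rc[1]]
)
```

With normalized kernels and weights that sum to 1, the pixel values sum to roughly 1, so the image can be read as a probability density over positions. A delta-function kernel would instead place each weight on a single pixel.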

代表視線位置集合再構成部１４５は、代表視線位置更新部１４２から入力された代表視線位置集合Ｖ（ｔ）、視線移動状態変数更新部１４１から入力された視線移動状態変数集合Ｕ（ｔ）、及び代表視線位置重み係数算出部１４３から入力された代表視線位置重み係数集合Ｗ（ｔ）に基づいて、代表視線位置集合Ｖ（ｔ）及び視線移動状態変数集合Ｕ（ｔ）を、代表視線位置重み係数集合Ｗ（ｔ）の示す重み配分に従って再構成する。
また、代表視線位置集合再構成部１４５は、代表視線位置重み係数集合Ｗ（ｔ）を再構成する。
また、代表視線位置集合再構成部１４５は、再構成された代表視線位置集合Ｖ^＊（ｔ）、視線移動状態変数集合Ｕ^＊（ｔ）、及び代表視線位置重み係数集合Ｗ^＊（ｔ）に基づいた、視線位置確率密度画像Ｘ（ｔ）を視線位置確率密度映像出力部５に出力する。
また、代表視線位置集合再構成部１４５は、視線位置確率密度画像Ｘ（ｔ）を視線移動状態変数更新部１４１及び代表視線位置更新部１４２に出力する。 The representative line-of-sight position set reconstruction unit 145 reconstructs the representative line-of-sight position set V (t) and the line-of-sight movement state variable set U (t) according to the weight distribution indicated by the representative gaze position weight coefficient set W (t), based on the representative line-of-sight position set V (t) input from the representative line-of-sight position update unit 142, the line-of-sight movement state variable set U (t) input from the line-of-sight movement state variable update unit 141, and the representative gaze position weight coefficient set W (t) input from the representative gaze position weight coefficient calculation unit 143.
Further, the representative gaze position set reconstruction unit 145 reconstructs the representative gaze position weight coefficient set W (t).
The representative gaze position set reconstruction unit 145 also outputs, to the line-of-sight position probability density video output unit 5, a line-of-sight position probability density image X (t) based on the reconstructed representative gaze position set V^* (t), line-of-sight movement state variable set U^* (t), and representative gaze position weight coefficient set W^* (t).
In addition, the representative gaze position set reconstruction unit 145 outputs the gaze position probability density image X (t) to the gaze movement state variable update unit 141 and the representative gaze position update unit 142.

代表視線位置集合再構成部１４５における代表視線位置集合Ｖ（ｔ）及び視線移動状態変数集合Ｕ（ｔ）の再構成方法は特に限定されるものではないが、本実施形態による方法について述べる。 The method for reconstructing the representative line-of-sight position set V (t) and the line-of-sight movement state variable set U (t) in the representative line-of-sight position set reconstruction unit 145 is not particularly limited, but the method according to this embodiment will be described.

まず、代表視線位置重み係数ｗ_ｎ（ｔ）（ｎ＝１，２，・・・，Ｎ_ｘ）の累積和ｃ_ｎ（ｔ）を下記式（１１２）によって算出する。なお、累積和ｃ_ｎ（ｔ）を算出する際に必要に応じて、代表視線位置重み係数ｗ_ｎ（ｔ）の大きい順に代表視線位置ｖ_ｎ（ｔ）、視線移動状態変数ｕ（ｔ）及び代表視線位置重み係数ｗ_ｎ（ｔ）の並べ替えを行う。 First, the cumulative sum c_n (t) of the representative line-of-sight position weighting coefficients w_n (t) (n = 1, 2, ..., N_x) is calculated by the following equation (112). When calculating the cumulative sum c_n (t), the representative line-of-sight positions v_n (t), the line-of-sight movement state variables u (t), and the representative line-of-sight position weighting coefficients w_n (t) are, if necessary, sorted in descending order of w_n (t).

以降の処理のため、ｃ_０（ｔ）＝０と定める。 For the subsequent processing, it is determined that c ₀ (t) = 0.

次に、ある数κ_１を下記式（１１３）の範囲でランダムに定め、以降、ｎ＝２，３，・・・，Ｎ_ｘについて、κ_ｎを下記式（１１４）のように定める。 Next, a number κ_1 is chosen at random within the range given by the following formula (113), and then, for n = 2, 3, ..., N_x, κ_n is defined by the following formula (114).

そして、ｎ＝１，２，・・・，Ｎ_ｘのそれぞれについて、下記式（１１５）の条件を満たす整数ｎ^＊を求める。 Then, for each of n = 1, 2,..., N _x , an integer n ^* that satisfies the condition of the following formula (115) is obtained.

そして、新しい代表視線位置である下記式（１１６）を下記式（１１７）のように定める。 Then, the following expression (116), which is a new representative line-of-sight position, is defined as the following expression (117).

また、新しい視線移動状態変数である下記式（１１８）を下記式（１１９）のように定める。 Further, the following formula (118), which is a new line-of-sight movement state variable, is defined as the following formula (119).

なお、新しい代表視線位置重み係数である下記式（１２０）は、全て１／Ｎ_ｘとする。 Note that the new representative line-of-sight position weighting coefficients, given by the following formula (120), are all set to 1/N_x.
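The reconstruction steps above closely resemble systematic resampling as used in particle filters, and can be sketched along those lines. Because formulas (112) to (120) are not reproduced in this text, the details below are assumptions: κ_1 is drawn uniformly from [0, 1/N_x), κ_n advances in steps of 1/N_x, and n^* is taken as the first index whose cumulative sum exceeds κ_n. The function name `systematic_resample` is hypothetical, and the optional sorting by descending weight is omitted.

```python
import random


def systematic_resample(positions, states, weights, rng=random):
    """Resample positions and states according to weights (assumed scheme).

    Returns the new positions, new states, and uniform weights 1/N.
    """
    n = len(weights)
    total = float(sum(weights))
    # Cumulative sums c_1..c_N of the normalized weights (c_0 = 0 is implicit).
    cumulative, running = [], 0.0
    for w in weights:
        running += w / total
        cumulative.append(running)
    cumulative[-1] = 1.0  # guard against floating-point rounding

    kappa = rng.random() / n  # assumed range for kappa_1: [0, 1/N)
    new_positions, new_states = [], []
    index = 0
    for _ in range(n):
        # Advance to the first index whose cumulative sum exceeds kappa_n.
        while index < n - 1 and cumulative[index] <= kappa:
            index += 1
        new_positions.append(positions[index])
        new_states.append(states[index])
        kappa += 1.0 / n  # assumed step for kappa_n
    return new_positions, new_states, [1.0 / n] * n


# Example: all weight on the second representative position.
pos, st, w = systematic_resample(["a", "b", "c"], [1, 2, 3], [0.0, 1.0, 0.0],
                                 rng=random.Random(0))
```

Since the index only moves forward, the whole pass costs O(N_x) after the cumulative sums are built, and only a single random number is needed per reconstruction.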

なお、上述した代表視線位置集合Ｖ（ｔ）及び視線移動状態変数集合Ｕ（ｔ）の再構成は、必ずしも全ての時刻において実施する必要はなく、例えば、ある一定時間間隔を置いて実施することや、全く実施しないこともできる。
また、例えば、下記式（１２１）に示す代表視線位置重み係数の偏りに関する条件を満たさないときのみ実施することもできる。 Note that the above-described reconstruction of the representative line-of-sight position set V (t) and the line-of-sight movement state variable set U (t) does not necessarily have to be performed at every time; for example, it may be performed at fixed time intervals, or not performed at all.
Further, for example, it can be performed only when the condition regarding the bias of the representative gaze position weighting coefficient represented by the following formula (121) is not satisfied.

上記式（１２１）において、Ｎ_ｅｆｆは下記式（１２２）を満たすように予め定められた定数である。 In the above formula (121), N _eff is a constant determined in advance so as to satisfy the following formula (122).
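A weight-bias test of this kind is often implemented with the effective sample size. The sketch below is an assumption, since formulas (121) and (122) are not reproduced here: it treats the condition as 1 / Σ_n w_n(t)^2 ≥ N_eff and triggers reconstruction only when that fails. The name `needs_reconstruction` is hypothetical.

```python
def needs_reconstruction(weights, n_eff):
    """Return True when the weights are too concentrated (assumed criterion).

    The effective sample size 1 / sum(w_n^2) equals N for uniform weights
    and 1 when a single weight carries all the mass.
    """
    total = float(sum(weights))
    normalized = [w / total for w in weights]
    effective = 1.0 / sum(w * w for w in normalized)
    return effective < n_eff


print(needs_reconstruction([0.25, 0.25, 0.25, 0.25], n_eff=2.0))
print(needs_reconstruction([1.0, 0.0, 0.0, 0.0], n_eff=2.0))
```

Under this assumed criterion, a threshold near N_x reconstructs at almost every time, while a threshold near 1 reconstructs almost never.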

また、代表視線位置集合再構成部１４５は、上記に述べた代表視線位置集合Ｖ（ｔ）、視線移動状態変数集合Ｕ（ｔ）、及び代表視線位置重み係数集合Ｗ（ｔ）の再構成方法によって再構成した新しい代表視線位置集合である下記式（１２３）、新しい視線移動状態変数集合である下記式（１２４）、及び新しい代表視線位置重み係数集合である下記式（１２５）に基づいて、視線位置確率密度画像出力部１４４から入力された代表視線位置確率密度画像Ｈ（ｔ）を再構成した視線位置確率密度画像Ｘ（ｔ）を、視線位置確率密度画像抽出部１４の出力として視線位置確率密度映像出力部５に出力する。 Further, the representative gaze position set reconstruction unit 145 reconstructs the representative gaze position set V (t), the gaze movement state variable set U (t), and the representative gaze position weight coefficient set W (t) described above. Based on the following formula (123) that is a new representative gaze position set reconstructed by the following formula, the following formula (124) that is a new gaze movement state variable set, and the following formula (125) that is a new representative gaze position weight coefficient set, The line-of-sight position probability density image X (t) reconstructed from the line-of-sight position probability density image H (t) input from the line-of-sight position probability density image output unit 144 is used as the output of the line-of-sight position probability density image extraction unit 14 Output to the probability density video output unit 5.

なお、代表視線位置集合再構成部１４５による代表視線位置集合Ｖ（ｔ）、視線移動状態変数集合Ｕ（ｔ）、及び代表視線位置重み係数集合Ｗ（ｔ）の再構成を全く実施しない場合は、視線位置確率密度画像出力部１４４から入力された代表視線位置確率密度画像Ｈ（ｔ）を視線位置確率密度画像抽出部１４の出力である視線位置確率密度画像Ｘ（ｔ）として視線位置確率密度映像出力部５に出力する。 In the case where the representative visual line position set reconstruction unit 145 does not reconstruct the representative visual line position set V (t), the visual line movement state variable set U (t), and the representative visual line position weight coefficient set W (t) at all. The visual line position probability density image H (t) input from the visual line position probability density image output unit 144 is used as the visual line position probability density image X (t) that is the output of the visual line position probability density image extraction unit 14. Output to the video output unit 5.

視線位置確率密度映像出力部５は、入力映像に含まれる時系列の各入力画像から、基礎注目度画像抽出部１１、確率的基礎注目度画像抽出部１２、確率的基礎注目度母数逐次推定部１３、及び視線位置確率密度画像抽出部１４の処理によって抽出される視線位置確率密度画像Ｘ（ｔ）の時系列である視線位置確率密度映像を抽出して、出力する。 From each time-series input image included in the input video, the line-of-sight position probability density video output unit 5 extracts and outputs a line-of-sight position probability density video, which is the time series of the line-of-sight position probability density images X (t) extracted by the processing of the basic attention level image extraction unit 11, the probabilistic basic attention level image extraction unit 12, the probabilistic basic attention level parameter sequential estimation unit 13, and the line-of-sight position probability density image extraction unit 14.

上記に述べたとおり、第１の実施形態によれば、視線位置推定の対象となる入力映像、確率的基礎注目度母数Θ_ｓ（ｔ）、及び視線位置確率密度母数Θ_ｘ（ｔ）に基づいて、視線位置確率密度映像を出力する場合に、確率的基礎注目度母数Θ_ｓ（ｔ）を逐次更新することができる。 As described above, according to the first embodiment, the input image, the probabilistic basic attention degree parameter Θ _s (t), and the sight position probability density parameter Θ _x (t), which are targets for eye gaze position estimation. On the basis of the above, the stochastic basic attention degree parameter Θ _s (t) can be sequentially updated when the line-of-sight position probability density image is output.

また、第１の実施形態によれば、確率的基礎注目度画像抽出部１２における期待値及び標準偏差の更新を、入力画像中の各位置で独立して実行することができる。その結果、確率的基礎注目度画像抽出部１２による期待値及び標準偏差の更新処理を、複数コアを持つ計算機やＧｒａｐｈｉｃＰｒｏｃｅｓｓｏｒＵｎｉｔ（ＧＰＵ）などの並列処理が可能な計算機上で容易に並列化することができ、処理を高速化することができる。 Further, according to the first embodiment, the expected value and the standard deviation in the probabilistic basic attention level image extraction unit 12 can be updated independently at each position in the input image. As a result, the update processing of the expected value and the standard deviation by the probabilistic basic attention level image extraction unit 12 can easily be parallelized on a computer capable of parallel processing, such as a multi-core computer or a Graphics Processing Unit (GPU), which speeds up the processing.

なお、注目度映像抽出部１は、確率的基礎注目度画像抽出部１２、確率的基礎注目度母数逐次推定部１３、視線位置確率密度画像抽出部１４、視線位置確率密度映像出力部１５による上述の手法に代えて、下記非特許文献３、４に記載の手法でも、注目度映像を抽出することもできる。但し、確率的基礎注目度画像抽出部１２、確率的基礎注目度母数逐次推定部１３、視線位置確率密度画像抽出部１４、視線位置確率密度映像出力部１５による手法は、上述の如く並列処理に適しているため、非特許文献３、４に記載の手法に比べ、高速に計算（高速に注目度映像を抽出）することができる。 Note that the attention level video extraction unit 1 includes a probabilistic basic attention level image extraction unit 12, a stochastic basic attention level parameter sequential estimation unit 13, a gaze position probability density image extraction unit 14, and a gaze position probability density video output unit 15. Instead of the above-described method, the attention degree video can also be extracted by the methods described in Non-Patent Documents 3 and 4 below. However, the probabilistic basic attention level image extraction unit 12, the probabilistic basic attention level parameter sequential estimation unit 13, the gaze position probability density image extraction unit 14, and the gaze position probability density video output unit 15 employ parallel processing as described above. Therefore, compared to the methods described in Non-Patent Documents 3 and 4, it is possible to calculate at high speed (extract attention level video at high speed).

(Non-Patent Document 3) Derek Pang, Akisato Kimura, Tatsuto Takeuchi, Junji Yamato and Kunio Kashino, "A stochastic model of selective visual attention with a dynamic Bayesian network," Proc. International Conference on Multimedia and Expo (ICME2008), pp. 1073-1076, Hannover, Germany, June 2008.
(Non-Patent Document 4) Akisato Kimura, Derek Pang, Tatsuto Takeuchi, Junji Yamato and Kunio Kashino, "Dynamic Markov random fields for stochastic modeling of visual attention," Proc. International Conference on Pattern Recognition (ICPR2008), Mo.BT8.35, Tampa, Florida, USA, December 2008.

顕著領域事前確率画像抽出部２は、入力映像を構成する各フレームである入力画像の各位置が顕著領域である確率を示す顕著領域事前確率画像を抽出する。具体的には、顕著領域事前確率画像抽出部２は、注目度映像の一のフレームである注目度画像および顕著領域画像抽出部４によって抽出された顕著領域画像から、入力映像中の対応するフレームである入力画像の各位置が顕著領域である確率を表示する顕著領域事前確率画像を抽出する。換言すれば、顕著領域事前確率画像抽出部２は、一の入力画像の各位置が顕著領域である確率を示す顕著領域事前確率画像を、注目度映像抽出部１によって抽出された注目度映像内の当該入力画像に対応する画像である注目度画像および顕著領域画像抽出部４によって抽出された当該入力画像に対応する顕著領域画像に基づいて抽出する。顕著領域事前確率画像抽出部２は、抽出した顕著領域事前確率画像を特徴量尤度算出部３および顕著領域画像抽出部４に出力する。顕著領域事前確率画像抽出部２が顕著領域事前確率画像を抽出する方法は特に限定しないが、本実施形態においては、顕著領域事前確率画像生成部２１と顕著領域事前確率画像更新部２２とによって抽出する方法について説明する。 The saliency area prior probability image extraction unit 2 extracts a saliency area prior probability image indicating the probability that each position of the input image, which is each frame constituting the input video, is a saliency area. Specifically, the saliency area prior probability image extraction unit 2 corresponds to a corresponding frame in the input video from the attention level image which is one frame of the attention level video and the saliency area image extracted by the saliency area image extraction unit 4. A saliency area prior probability image that displays the probability that each position of the input image is a saliency area is extracted. In other words, the saliency area prior probability image extraction unit 2 extracts the saliency area prior probability image indicating the probability that each position of one input image is a saliency area in the attention level video extracted by the attention level video extraction unit 1. Are extracted based on the attention level image which is an image corresponding to the input image and the saliency area image corresponding to the input image extracted by the saliency area image extraction unit 4. The saliency area prior probability image extraction unit 2 outputs the extracted saliency area prior probability image to the feature amount likelihood calculation unit 3 and the saliency area image extraction unit 4. 
The method by which the saliency area prior probability image extraction unit 2 extracts the saliency area prior probability image is not particularly limited. In this embodiment, extraction by the saliency area prior probability image generation unit 21 and the saliency area prior probability image update unit 22 will be described.

顕著領域事前確率画像生成部２１は、注目度画像を入力し、注目度画像のみから顕著領域事前確率画像を生成する。顕著領域事前確率画像生成部２１が注目度画像から顕著領域事前確率画像を生成する方法は特に限定しないが、本実施形態においては、ガウス混合分布モデルを利用した方法について説明する。 The saliency area prior probability image generation unit 21 receives the attention level image and generates a saliency area prior probability image only from the attention level image. The method for generating the saliency area prior probability image from the attention level image by the saliency area prior probability image generation unit 21 is not particularly limited. In the present embodiment, a method using a Gaussian mixture distribution model will be described.

顕著領域事前確率画像生成部２１は、まず、時刻ｔの注目度画像（即ち、基礎注目度画像￣Ｓ（ｔ）若しくは視線位置確率密度画像Ｘ（ｔ））が、それぞれ中心位置〜ｘ_ｊ（ｔ）・共分散行列〜Σ_ｓ，ｊ（ｔ）（ｊ＝１，２，・・・，Ｍ_ｓ）を持ち、混合比が〜η_ｓ，ｊ（ｔ）であるＭ_ｓ個のガウス分布の混合によって構成されていると仮定し、そのモデルパラメータ（即ち、Ｍ_ｓ個の中心位置、共分散行列、混合比）を注目度画像（ｂ）から推定する。推定方法の具体例は、以下の２つである。 The saliency area prior probability image generation unit 21 first assumes that the attention level image at time t (that is, the basic attention level image ￣S (t) or the gaze position probability density image X (t)) is a mixture of M_s Gaussian distributions, each having a center position ~x_j (t) and a covariance matrix ~Σ_{s,j} (t) (j = 1, 2, ..., M_s), with mixture ratios ~η_{s,j} (t), and estimates the model parameters (that is, the M_s center positions, covariance matrices, and mixture ratios) from the attention level image (b). The following are two specific examples of the estimation method.

（推定方法１）
ＥＭアルゴリズムを用いて導出する。このとき、ＥＭアルゴリズムに与える各サンプルは注目度画像（ｂ）のある特定の位置ｘに対応し、位置ｘにおける画素値と等しい値の重みを持つことに注意する。ＥＭアルゴリズムによるガウス混合分布パラメータの推定は、下記式（１２６）〜下記式（１２９）をｋ＝１，２，・・・で繰り返すことによって行われ、各パラメータが収束した時点で手順を打ち切り、パラメータを固定する。 (Estimation method 1)
The parameters are derived using the EM algorithm. Note that each sample given to the EM algorithm corresponds to a specific position x of the attention level image (b) and carries a weight equal to the pixel value at position x. The Gaussian mixture parameters are estimated by repeating the following formulas (126) to (129) for k = 1, 2, ...; the procedure is terminated once every parameter has converged, and the parameters are then fixed.

ここで、ｇ（ｘ；〜ｘ，Σ）は多次元正規分布であり、次元数がＤとするときには下記式（１３０）で定義される。 Here, g (x; ˜x, Σ) is a multidimensional normal distribution, and is defined by the following formula (130) when the number of dimensions is D.

また、視線位置確率密度画像Ｘ（ｔ）の位置ｘにおける画素値を、位置ｘをＥＭアルゴリズムのサンプルとみなしたときの重みとして利用していることから、ここではｗ_ｘ（ｔ）と表現している。 Further, since the pixel value at the position x of the line-of-sight position probability density image X (t) is used as a weight when the position x is regarded as a sample of the EM algorithm, it is expressed here as w _x (t). ing.
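Weighted EM estimation of this kind can be sketched as below. The sketch is deliberately simplified and rests on stated assumptions: it works in one dimension rather than on 2-D image positions, it uses the textbook weighted EM updates rather than formulas (126) to (129), which are not reproduced here, and it stops when the means stop moving. All names (`weighted_em_gmm_1d`, `normal_pdf`, `tol`, `min_var`) are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def weighted_em_gmm_1d(samples, sample_weights, means, variances, ratios,
                       max_iter=100, tol=1e-9, min_var=1e-6):
    """Fit a 1-D Gaussian mixture to weighted samples (textbook EM)."""
    means, variances, ratios = list(means), list(variances), list(ratios)
    total = float(sum(sample_weights))
    for _ in range(max_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in samples:
            p = [r * normal_pdf(x, m, v) for r, m, v in zip(ratios, means, variances)]
            s = sum(p)
            resp.append([pj / s if s > 0.0 else 1.0 / len(p) for pj in p])
        # M-step: every sum is weighted by the sample weight w_x.
        shift = 0.0
        for j in range(len(means)):
            mass = sum(w * r[j] for w, r in zip(sample_weights, resp))
            if mass <= 0.0:
                continue
            new_mean = sum(w * r[j] * x
                           for x, w, r in zip(samples, sample_weights, resp)) / mass
            new_var = sum(w * r[j] * (x - new_mean) ** 2
                          for x, w, r in zip(samples, sample_weights, resp)) / mass
            shift = max(shift, abs(new_mean - means[j]))
            means[j] = new_mean
            variances[j] = max(new_var, min_var)
            ratios[j] = mass / total
        if shift < tol:  # stop once the means have converged
            break
    return means, variances, ratios


# Positions 0..2 and 10..12, weighted by their (hypothetical) pixel values.
m, v, r = weighted_em_gmm_1d([0, 1, 2, 10, 11, 12], [1, 2, 1, 1, 2, 1],
                             means=[0.0, 12.0], variances=[4.0, 4.0],
                             ratios=[0.5, 0.5])
```

Treating pixel values as sample weights avoids having to draw samples from the image: every pixel position contributes once, in proportion to its value.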

（推定方法２）
注目度画像の画素値の極大値をＭ_ｓ個検出し、極大値となる位置を中心位置〜ｘ_ｊ（ｔ）（ｊ＝１，２，・・・，Ｍ_ｓ）として定め、その位置の注目度画像の画素値を混合比〜η_ｓ，ｊ（ｔ）とする。共分散行列〜Σ_ｓ，ｊ（ｔ）については、第１の推定方法と同様にして求めるか、予め定めておいた値を利用する。 (Estimation method 2)
M_s local maxima of the pixel values of the attention level image are detected; the positions of these maxima are taken as the center positions ~x_j (t) (j = 1, 2, ..., M_s), and the pixel values of the attention level image at those positions are taken as the mixture ratios ~η_{s,j} (t). The covariance matrices ~Σ_{s,j} (t) are either obtained in the same manner as in the first estimation method or set to predetermined values.

以上のようにして、顕著領域事前確率画像生成部２１は、注目度画像からガウス混合分布のモデルパラメータを推定し、顕著領域事前確率画像を生成する。具体的には、顕著領域事前確率画像生成部２１は、モデルパラメータの１つである混合比を、その最大値が１と等しくなるように正規化し、その後に各位置におけるガウス混合分布の確率を計算して、当該位置の顕著領域事前確率画像￣Ξ_１（ｔ）の画素値ξ_１（ｘ，ｔ）とする（下記式（１３１））。 As described above, the saliency area prior probability image generation unit 21 estimates the model parameters of the Gaussian mixture distribution from the attention level image and generates the saliency area prior probability image. Specifically, the saliency area prior probability image generation unit 21 normalizes the mixture ratios, which are among the model parameters, so that their maximum value equals 1, then calculates the probability of the Gaussian mixture distribution at each position and takes it as the pixel value ξ_1 (x, t) of the saliency area prior probability image ￣Ξ_1 (t) at that position (the following formula (131)).

上記の実施形態では、全ての位置においてガウス混合分布を用いた方法によって顕著領域事前確率画像を生成しているが、顕著領域が画像の中心位置に存在しやすいことを考慮した後述する第２の実施形態も可能である。即ち、この場合には、顕著領域事前確率画像を第１の実施形態と同様に生成した後、画像の左右両端もしくは上下左右の端の一定領域のピクセル値を強制的に０とする。当該方法は、画像の端に顕著領域が存在する可能性を排除することを意味している。若しくは、顕著領域事前確率画像を第１の実施形態と同様に生成した後、画像の中心位置からの距離に比例する重みを顕著領域事前確率画像に掛け合わせ、その出力を新たに顕著領域事前確率とする実施形態も考えられる。上記の通り、顕著領域事前確率画像生成部２１は、顕著領域事前確率画像￣Ξ_１（ｔ）を生成し出力する。 In the above embodiment, the saliency area prior probability image is generated at every position by the method using the Gaussian mixture distribution. However, a second embodiment, described later, is also possible; it takes into account that the saliency area tends to lie near the center of the image. In this case, after the saliency area prior probability image is generated in the same manner as in the first embodiment, the pixel values in fixed regions at the left and right edges, or at the top, bottom, left, and right edges, of the image are forcibly set to 0. This amounts to excluding the possibility that a saliency area exists at the edge of the image. Alternatively, an embodiment is conceivable in which, after the saliency area prior probability image is generated in the same manner as in the first embodiment, it is multiplied by a weight proportional to the distance from the center position of the image, and the result is used as the new saliency area prior probability. As described above, the saliency area prior probability image generation unit 21 generates and outputs the saliency area prior probability image ￣Ξ_1 (t).

顕著領域事前確率画像更新部２２は、顕著領域画像を用いて顕著領域事前確率画像生成部２１によって生成された顕著領域事前確率画像を更新する。即ち、顕著領域事前確率画像更新部２２は、顕著領域事前確率画像生成部２１によって生成された顕著領域事前確率画像、および、顕著領域画像抽出部４によって抽出された顕著領域画像を入力し、顕著領域画像を用いて顕著領域事前確率画像を更新する。顕著領域事前確率画像更新部２２が顕著領域事前確率画像を更新する方法は特に限定しないが、本実施形態においては、カルマンフィルタの原理を利用する方法について説明する。 The saliency area prior probability image update unit 22 updates the saliency area prior probability image generated by the saliency area prior probability image generation unit 21 using the saliency area image. That is, the saliency area prior probability image update unit 22 inputs the saliency area prior probability image generated by the saliency area prior probability image generation unit 21 and the saliency area image extracted by the saliency area image extraction unit 4, and The saliency prior probability image is updated using the region image. The method by which the saliency area prior probability image update unit 22 updates the saliency area prior probability image is not particularly limited, but in the present embodiment, a method using the principle of the Kalman filter will be described.

現時点（時刻ｔ）の顕著領域事前確率画像Ξ_１（ｔ）（確率変数）の位置ｘにおける画素値ξ_１（ｘ，ｔ）（確率変数）が、現時点の更新前顕著領域事前確率画像￣Ξ１（ｔ）の位置ｘにおける画素値￣ξ１（ｘ，ｔ）、および、１時点前（時刻ｔ−１）の顕著領域画像Ａ（ｔ−１）の位置ｘにおける画素値ａ（ｘ，ｔ−１）について、下記式（１３２）（１３３）を満たしているものとする。 The pixel value ξ ₁ (x, t) (probability variable) at the position x of the saliency area prior probability image Ξ ₁ (t) (probability variable) at the current time (time t) is the current saliency area prior probability image ￣Ξ1 before update. The pixel value ￣ξ1 (x, t) at the position x in (t) and the pixel value a (x, t− at the position x in the saliency area image A (t−1) one time before (time t−1). It is assumed that the following formulas (132) and (133) are satisfied for 1).

ここで、θ＝（σ_１，σ_２）はあらかじめ与えられるパラメータである。またｆ（・）は、顕著領域画像の画素値を実数値に変換する関数であり、例えば、下記式（１３４）（１３５のように設定する。 Here, θ = (σ ₁ , σ ₂ ) is a parameter given in advance. Further, f (•) is a function for converting the pixel value of the saliency area image into a real value, and is set as in the following formulas (134) and (135), for example.

このとき、顕著領域事前確率画像更新部２２は、現時点の顕著領域事前確率画像Ξ_１（ｔ）の位置ｘにおける画素値ξ_１（ｘ，ｔ）を、カルマンフィルタの原理を利用することにより、下記式（１３６）（１３７）によって更新する。 At this time, the saliency area prior probability image update unit 22 updates the pixel value ξ_1 (x, t) at the position x of the current saliency area prior probability image Ξ_1 (t) by the following formulas (136) and (137), using the principle of the Kalman filter.
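A per-pixel Kalman-style update can be sketched as follows. Formulas (132) to (137) are not reproduced in this text, so the model below is an assumption: the value f(a(x, t−1)) derived from the previous saliency area image acts as the prediction with process noise σ_1, and the pre-update prior pixel value acts as the observation with noise σ_2. The function name `kalman_update_prior` and its argument names are hypothetical.

```python
def kalman_update_prior(pre_update, predicted, prev_var, sigma1, sigma2):
    """Fuse a prediction and an observation for one pixel (scalar Kalman filter).

    pre_update: pre-update prior pixel value, treated as the observation.
    predicted:  f(a(x, t-1)), treated as the predicted state.
    prev_var:   variance carried over from the previous time (0 to ignore it).
    Returns the updated pixel value and its variance.
    """
    pred_var = prev_var + sigma1 ** 2             # predicted variance
    gain = pred_var / (pred_var + sigma2 ** 2)    # Kalman gain
    value = predicted + gain * (pre_update - predicted)
    var = (1.0 - gain) * pred_var
    return value, var


# A pixel labelled salient at t-1 (mapped to 0.8 here) but with a low current prior.
value, var = kalman_update_prior(pre_update=0.2, predicted=0.8,
                                 prev_var=0.0, sigma1=1.0, sigma2=1.0)
```

In this sketch a larger σ_2 relative to σ_1 makes the update trust the previous saliency area image more, and a smaller σ_2 makes it follow the current attention-based prior more closely.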

上記の実施形態では、各時刻の顕著領域事前確率を保持しているが、この分散を次の時刻での更新の際に利用しなくてもよい。即ち、上記式（１３６）（１３７）に、下記式（１３８）を追加してもよい。 In the above embodiment, the saliency area prior probability at each time is retained, but its variance need not be used in the update at the next time. That is, the following formula (138) may be added to the above formulas (136) and (137).

上記の通り、顕著領域事前確率画像更新部２２は、顕著領域事前確率画像￣Ξ_１（ｔ）をΞ_１（ｔ）に更新し、更新後の顕著領域事前確率画像Ξ１（ｔ）を出力する。 As described above, the saliency area prior probability image update unit 22 updates the saliency area prior probability image ￣Ξ_1 (t) to Ξ_1 (t), and outputs the updated saliency area prior probability image Ξ_1 (t).

上記の通り、顕著領域事前確率画像抽出部２は、図５に示すように、顕著領域事前確率画像Ξ１（ｔ）を抽出（生成、更新）し、出力する。 As described above, the saliency area prior probability image extraction unit 2 extracts (generates and updates) and outputs the saliency area prior probability image Ξ1 (t) as shown in FIG.

特徴量尤度算出部３は、入力画像の顕著領域および顕著領域外の領域にそれぞれ含まれる画像特徴量の尤度を示す特徴量尤度を算出する。具体的には、特徴量尤度算出部３は、特徴量尤度を、入力画像、注目度画像、顕著領域事前確率画像抽出部２によって抽出された顕著領域事前確率画像、顕著領域画像抽出部４によって抽出された顕著領域画像および前回迄に算出した特徴量尤度の少なくとも１つに基づいて算出する。例えば、特徴量尤度算出部３は、入力画像、顕著領域事前確率画像、顕著領域画像および前回迄に算出した特徴量尤度から、特徴量尤度を算出する。特徴量尤度算出部３は、算出した特徴量尤度を顕著領域画像抽出部４に出力する。特徴量尤度算出部３が徴量尤度を算出する方法は特に限定しないが、本実施形態においては、顕著領域特徴量尤度算出部３１と非顕著領域特徴量尤度算出部３２と特徴量尤度出力部３３とによって算出する方法について説明する。 The feature amount likelihood calculation unit 3 calculates a feature amount likelihood indicating the likelihood of the image feature amounts contained in the saliency area of the input image and in the area outside the saliency area, respectively. Specifically, the feature amount likelihood calculation unit 3 calculates the feature amount likelihood based on at least one of the input image, the attention level image, the saliency area prior probability image extracted by the saliency area prior probability image extraction unit 2, the saliency area image extracted by the saliency area image extraction unit 4, and the feature amount likelihood calculated up to the previous time. For example, the feature amount likelihood calculation unit 3 calculates the feature amount likelihood from the input image, the saliency area prior probability image, the saliency area image, and the feature amount likelihood calculated up to the previous time. The feature amount likelihood calculation unit 3 outputs the calculated feature amount likelihood to the saliency area image extraction unit 4. The method by which the feature amount likelihood calculation unit 3 calculates the feature amount likelihood is not particularly limited; in the present embodiment, calculation by the saliency area feature amount likelihood calculation unit 31, the non-salience area feature amount likelihood calculation unit 32, and the feature amount likelihood output unit 33 will be described.

顕著領域特徴量尤度算出部３１は、顕著領域に含まれる画像特徴量の尤度を示す顕著領域特徴量尤度を、入力画像、顕著領域事前確率画像、顕著領域画像および前回迄に算出した顕著領域特徴量尤度のうち少なくとも１つに基づいて算出する。顕著領域特徴量尤度算出部３１が顕著領域特徴量尤度を算出する方法は特に限定しないが、本実施形態においては、顕著領域特徴量尤度生成部３１１と顕著領域特徴量尤度更新部３１２とによって算出する方法について説明する。 The saliency area feature amount likelihood calculation unit 31 calculates the saliency area feature amount likelihood indicating the likelihood of the image feature amount included in the saliency area by the input image, the saliency area prior probability image, the saliency area image, and the previous time. Calculation is made based on at least one of the saliency feature amount likelihoods. The method by which the saliency area feature amount likelihood calculating unit 31 calculates the saliency area feature amount likelihood is not particularly limited, but in the present embodiment, the saliency area feature amount likelihood generating unit 311 and the saliency area feature amount likelihood updating unit are included. 312 will be described.

顕著領域特徴量尤度生成部３１１は、入力画像、顕著領域事前確率画像および顕著領域画像に基づいて顕著領域特徴量尤度を新たに生成（算出）し、出力する。顕著領域特徴量尤度生成部３１１が顕著領域特徴量尤度を生成する方法は、特に限定しないが、本実施形態においては、ガウス混合分布モデルを利用した方法について説明する。 The saliency area feature amount likelihood generation unit 311 newly generates (calculates) and outputs a saliency area feature amount likelihood based on the input image, the saliency area prior probability image, and the saliency area image. The method for generating the saliency feature amount likelihood by the saliency feature amount likelihood generation unit 311 is not particularly limited, but in the present embodiment, a method using a Gaussian mixture distribution model will be described.

顕著領域特徴量尤度生成部３１１は、まず、時刻ｔにおいて、顕著領域に特有の特徴量の確率分布である顕著領域特徴量確率が、それぞれ平均〜ｃ_ｊ（ｔ）・共分散行列〜Σ_ｆ，ｊ（ｔ）（ｊ＝１，２，・・・，Ｍ_ｆ）を持ち、混合比が〜η_ｆ，ｊ（ｔ）であるＭ_ｆ個のガウス分布の混合によって構成されていると仮定し、これらのモデルパラメータを、顕著領域事前確率画像の画素値で重み付けした入力画像の画素値から推定する。モデルパラメータの推定には、例えばＥＭアルゴリズムを用いる。具体的には、下記式（１３９）〜下記式（１４２）をｋ＝１，２，・・・で繰り返すことによって行われ、各パラメータが収束した時点で手順を打ち切り、パラメータを固定する。 The saliency area feature amount likelihood generation unit 311 first assumes that, at time t, the saliency area feature quantity probability, which is the probability distribution of the feature quantities characteristic of the saliency area, is a mixture of M_f Gaussian distributions, each having a mean ~c_j (t) and a covariance matrix ~Σ_{f,j} (t) (j = 1, 2, ..., M_f), with mixture ratios ~η_{f,j} (t), and estimates these model parameters from the pixel values of the input image weighted by the pixel values of the saliency area prior probability image. The model parameters are estimated using, for example, the EM algorithm. Specifically, the following formulas (139) to (142) are repeated for k = 1, 2, ...; the procedure is terminated once every parameter has converged, and the parameters are then fixed.

ここで、入力画像の位置ｘにおける画素値は、ＲＧＢの３次元ベクトルとしてｃ（ｘ，ｔ）で表現される。上記のようにして、顕著領域特徴量尤度生成部３１１は、推定したガウス混合分布のモデルパラメータから、顕著領域特徴量尤度を算出する。具体的には、推定したモデルパラメータで特徴付けられるガウス混合分布を尤度とする下記式（１４３）によって算出する。 Here, the pixel value at the position x of the input image is expressed as c (x, t) as a three-dimensional RGB vector. As described above, the saliency area feature amount likelihood generation unit 311 calculates the saliency area feature amount likelihood from the model parameters of the estimated Gaussian mixture distribution. Specifically, it is calculated by the following equation (143) with a Gaussian mixture distribution characterized by the estimated model parameters as the likelihood.

上記の通り、顕著領域特徴量尤度生成部３１１は、顕著領域特徴量尤度￣ψ_１（ｃ，ｔ）を生成（算出）し、出力する。 As described above, the saliency area feature amount likelihood generation unit 311 generates (calculates) and outputs the saliency area feature amount likelihood ￣ψ ₁ (c, t).

顕著領域特徴量尤度更新部３１２は、顕著領域特徴量尤度生成部３１１によって生成された顕著領域特徴量尤度を更新する。具体的には、顕著領域特徴量尤度更新部３１２は、入力画像、顕著領域画像および前回迄に更新した更新後の顕著領域特徴量尤度のうち少なくとも１つに基づいて、顕著領域特徴量尤度生成部３１１によって生成された顕著領域特徴量尤度を更新する。顕著領域特徴量尤度更新部３１２が顕著領域特徴量尤度（ｅ１）を更新する方法は特に限定しないが、本実施形態においては、以下の２通りの方法を説明する。 The saliency area feature amount likelihood update unit 312 updates the saliency area feature amount likelihood generated by the saliency area feature amount likelihood generation unit 311. Specifically, the saliency area feature amount likelihood updating unit 312 is based on at least one of the input image, the saliency area image, and the updated saliency area feature amount likelihood updated until the previous time, and the saliency area feature amount. The saliency area feature amount likelihood generated by the likelihood generation unit 311 is updated. The method by which the saliency feature amount likelihood update unit 312 updates the saliency feature amount likelihood (e1) is not particularly limited, but in the present embodiment, the following two methods will be described.

（更新方法１）
２種類の顕著領域特徴量尤度を混合することによって更新する。具体的には、求めるべき現時点（時刻ｔ）の顕著領域特徴量尤度ψ_１（ｃ，ｔ）を、顕著領域特徴量尤度生成部３１１から出力された更新前の顕著領域特徴量尤度ψ１（ｃ，ｔ）、および、１時点前（時刻ｔ−１）の顕著領域特徴量尤度￣ψ_１（ｃ，ｔ−１）を、あらかじめ定められた混合比λ_ｃにて混合する下記式（１４４）によって計算する。 (Update method 1)
It is updated by mixing two types of saliency area feature likelihoods. Specifically, the saliency area feature amount likelihood ψ ₁ (c, t) at the current time (time t) to be obtained is determined as the saliency area feature amount likelihood before update output from the saliency area feature amount likelihood generation unit 311. ψ1 (c, t) and the saliency area feature likelihood ￣ψ ₁ (c, t-1) one point before (time t-1) are mixed at a predetermined mixing ratio λ _c Calculated according to equation (144).
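The mixing step can be sketched as a convex combination of two likelihood functions. Formula (144) is not reproduced in this text, so which term receives λ_c is an assumption here: the sketch applies λ_c to the newly generated likelihood and 1 − λ_c to the likelihood from one time step earlier. The name `mix_likelihoods` and the toy likelihoods are hypothetical.

```python
def mix_likelihoods(generated, previous, lam):
    """Return a likelihood that blends two likelihood functions.

    generated: likelihood newly generated at time t (callable on a feature c).
    previous:  likelihood from time t-1 (callable on a feature c).
    lam:       mixing ratio in [0, 1], applied to `generated` (assumption).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("mixing ratio must lie in [0, 1]")
    return lambda c: lam * generated(c) + (1.0 - lam) * previous(c)


# Toy likelihoods over a scalar feature, for illustration only.
psi = mix_likelihoods(lambda c: 1.0, lambda c: 0.0, lam=0.25)
print(psi(0.5))
```

A small mixing ratio makes the likelihood change slowly across frames, which may help suppress flicker when the generated likelihood is noisy.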

（更新方法２）
１時点前（時刻ｔ−１）の顕著領域画像Ａ（ｔ−１）に基づいて、１時点前の顕著領域特徴量尤度ψ_１（ｃ，ｔ−１）を更新した上で第１の実施形態と同様の方法を実施する。具体的には、１時点前の顕著領域画像Ａ（ｔ−１）において顕著領域であるとされた領域（下記式（１４５）によって示される顕著領域Ａ_ｏｂｊ（ｔ））を取り出し、顕著領域Ａ_ｏｂｊ（ｔ）にある入力画像中の画素値から、顕著領域特徴量尤度生成部３１１に示した方法と同様の方法で顕著領域特徴量尤度ψ_１（ｃ，ｔ−１）を再学習する。但し、本実施形態では、重みとして、顕著領域事前確率画像に代えて顕著領域画像を用いるものとする。１時点前の顕著領域特徴量尤度ψ１（ｃ，ｔ−１）を再学習した後、第１の実施形態と同様の方法により、現在の顕著領域特徴量尤度ψ_１（ｃ，ｔ）を生成する。 (Update method 2)
Based on the saliency area image A (t−1) one time step earlier (time t−1), the saliency area feature amount likelihood ψ_1 (c, t−1) of one time step earlier is updated, and then the same method as in the first embodiment is applied. Specifically, the area regarded as the saliency area in the saliency area image A (t−1) one time step earlier (the saliency area A_obj (t) given by the following formula (145)) is extracted, and the saliency area feature amount likelihood ψ_1 (c, t−1) is relearned from the pixel values of the input image within the saliency area A_obj (t), by the same method as that described for the saliency area feature amount likelihood generation unit 311. In the present embodiment, however, the saliency area image is used as the weight in place of the saliency area prior probability image. After the saliency area feature amount likelihood ψ_1 (c, t−1) of one time step earlier has been relearned, the current saliency area feature amount likelihood ψ_1 (c, t) is generated by the same method as in the first embodiment.

上記の通り、顕著領域特徴量尤度更新部３１２は、顕著領域特徴量尤度￣ψ_１（ｃ，ｔ）をψ_１（ｃ，ｔ）に更新し、出力する。上記の通り、顕著領域特徴量尤度算出部３１は、顕著領域特徴量尤度ψ_１（ｃ，ｔ）を算出し、出力する。 As described above, the saliency area feature quantity likelihood update unit 312 updates the saliency area feature quantity likelihood ￣ψ ₁ (c, t) to ψ ₁ (c, t) and outputs it. As described above, the saliency area feature amount likelihood calculating unit 31 calculates and outputs the saliency area feature amount likelihood ψ ₁ (c, t).

非顕著領域特徴量尤度算出部３２は、顕著領域外の領域に含まれる画像特徴量の尤度を示す非顕著領域特徴量尤度を、入力画像、顕著領域事前確率画像、顕著領域画像および前回迄に算出した非顕著領域特徴量尤度のうち少なくとも１つに基づいて算出する。非顕著領域特徴量尤度算出部３２が非顕著領域特徴量尤度を算出する方法は特に限定しないが、本実施形態においては、非顕著領域特徴量尤度生成部３２１と非顕著領域特徴量尤度更新部３２２とによって算出する方法について説明する。 The non-significant region feature amount likelihood calculating unit 32 converts the non-significant region feature amount likelihood indicating the likelihood of the image feature amount included in the region outside the saliency region into an input image, a saliency region prior probability image, a saliency region image, and The calculation is performed based on at least one of the non-significant region feature amount likelihoods calculated so far. The method by which the non-significant region feature amount likelihood calculating unit 32 calculates the non-significant region feature amount likelihood is not particularly limited, but in the present embodiment, the non-significant region feature amount likelihood generating unit 321 and the non-significant region feature amount likelihood are calculated. A method of calculation by the likelihood update unit 322 will be described.

The non-salient region feature likelihood generation unit 321 newly generates (calculates) a non-salient region feature likelihood based on the input image, the salient region prior probability image, and the salient region image, and outputs it. The generation method is not particularly limited; this embodiment describes a method that uses a Gaussian mixture model. The method is almost the same as that of the salient region feature likelihood generation unit 311 described above, except that, in place of the salient region prior probability image, it uses a non-salient region prior probability image Ξ₂(t), which is generated by converting each pixel value ξ₁(x, t) of the salient region prior probability image Ξ₁(t) according to a given rule. The following two conversion rules are possible examples.

(Method 1)
The pixel value ξ₂(x, t) at position x of the non-salient region prior probability image (f) is set to 1 − ξ₁(x, t).
(Method 2)
The pixel value of the non-salient region prior probability image (f) is set to 1 only at positions x where ξ₁(x, t) = 0, and to 0 at all other positions.
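The two conversion rules can be illustrated with a minimal Python sketch. It assumes the prior probability image is held as a nested list of values in [0, 1]; the function name `non_salient_prior` and the sample values are hypothetical and do not appear in the patent.

```python
def non_salient_prior(salient_prior, method=1):
    """Convert a salient region prior image into a non-salient region prior image.

    salient_prior: rows of xi_1(x, t) values in [0, 1] (hypothetical layout).
    """
    if method == 1:
        # Method 1: xi_2(x, t) = 1 - xi_1(x, t)
        return [[1.0 - p for p in row] for row in salient_prior]
    if method == 2:
        # Method 2: 1 only where xi_1(x, t) is exactly 0, otherwise 0
        return [[1.0 if p == 0 else 0.0 for p in row] for row in salient_prior]
    raise ValueError("method must be 1 or 2")


xi1 = [[0.0, 0.25], [0.5, 1.0]]
xi2_method1 = non_salient_prior(xi1, method=1)
xi2_method2 = non_salient_prior(xi1, method=2)
```

Method 1 keeps a soft weight at every position, while Method 2 restricts the non-salient model to positions where the salient prior is exactly zero.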

As described above, the non-salient region feature likelihood generation unit 321 generates (calculates) and outputs the non-salient region feature likelihood ψ̄₂(c, t).

The non-salient region feature likelihood update unit 322 updates the non-salient region feature likelihood generated by the non-salient region feature likelihood generation unit 321. Specifically, it performs the update based on at least one of the input image, the non-salient region image, and the non-salient region feature likelihood as updated up to the previous iteration. The non-salient region image is the image of the region outside the salient region extracted by the salient region prior probability image extraction unit 2. The update method is the same as that of the salient region feature likelihood update unit 312, except that the non-salient region prior probability image is used in place of the salient region prior probability image, the non-salient region feature likelihood in place of the salient region feature likelihood, and the non-salient region (the region A_bkg(t) given by equation (146) below) in place of the salient region.

As described above, the non-salient region feature likelihood update unit 322 updates the non-salient region feature likelihood ψ₂(c, t) and outputs the updated likelihood. The non-salient region feature likelihood calculation unit 32 thus obtains and outputs the non-salient region feature likelihood ψ₂(c, t).

The feature likelihood output unit 33 adds the salient region feature likelihood and the non-salient region feature likelihood together and outputs the result as the feature likelihood.

The salient region image extraction unit 4 extracts a salient region image, which indicates the salient region of the input image, from the input image, the salient region prior probability image, and the feature likelihood. It outputs the extracted salient region image to the salient region prior probability image extraction unit 2, the feature likelihood calculation unit 3, and the salient region video generation unit 5. The extraction method is not particularly limited; this embodiment describes a graph cut method based on the method described in Non-Patent Document 1. In this method, the salient region image is extracted by the salient region extraction graph generation unit 41 and the salient region extraction graph division unit 42.

The salient region extraction graph generation unit 41 receives the input image, the salient region prior probability image, and the feature likelihood, generates the salient region extraction graph (the graph used to extract the salient region image), and outputs it.

Specifically, the salient region extraction graph generation unit 41 first prepares two kinds of vertices for the salient region extraction graph G(t) at time t: one vertex for each position x ∈ I of the input image, and one vertex for each of the salient region and non-salient region labels. The total number of vertices is therefore the number of pixels plus 2. Hereinafter, for simplicity, the vertex corresponding to position x is denoted v_x, the vertex corresponding to the salient region label is denoted SOURCE S, and the vertex corresponding to the non-salient region label is denoted SINK T. Two kinds of edges are also prepared: n-links, which are directed edges placed in both directions between vertices corresponding to neighboring positions, and t-links, which are directed edges placed from SOURCE to each vertex and from each vertex to SINK. The neighborhood is, for example, the 4-neighborhood (up, down, left, right) or the 8-neighborhood that adds the diagonal directions. The salient region extraction graph is thus constructed as a directed graph, for example in the form shown in FIG. 6.

Next, the salient region extraction graph generation unit 41 assigns a cost to each edge of the salient region extraction graph. The t-link costs are calculated from the salient region prior probability image and the feature likelihood. Specifically, the cost C(S, v_x; t) of the t-link from SOURCE S to vertex v_x is given by the sum of the corresponding non-salient region prior probability and non-salient region feature likelihood, and the cost C(T, v_x; t) of the t-link from vertex v_x to SINK T is given using the corresponding salient region prior probability and salient region feature likelihood, as in equations (147) and (148) below.

The n-link costs, on the other hand, are calculated based on the similarity of luminance values between neighboring pixels. Specifically, the cost C(v_x, v_y) of the n-link between two points v_x and v_y is given by equation (149) below.
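The graph structure described above can be sketched as follows. This is an illustration under stated assumptions, not the patent's definition: equations (147) to (149) are not reproduced in this text, so the t-link costs are passed in as precomputed arrays, and the n-link cost uses a common Gaussian similarity of luminance differences as a stand-in. The function name `build_extraction_graph` and the parameter `sigma` are hypothetical.

```python
import math


def build_extraction_graph(lum, cost_source, cost_sink, sigma=0.1, eight=False):
    """Return the directed graph as a dict {(u, v): cost}.

    lum:         2-D list of luminance values
    cost_source: cost of the t-link SOURCE -> v_x (assumed precomputed)
    cost_sink:   cost of the t-link v_x -> SINK (assumed precomputed)
    """
    h, w = len(lum), len(lum[0])
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]          # 4-neighborhood
    if eight:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]   # add diagonals
    edges = {}
    for y in range(h):
        for x in range(w):
            v = (y, x)
            # t-links: SOURCE to each pixel vertex, each pixel vertex to SINK
            edges[("SOURCE", v)] = cost_source[y][x]
            edges[(v, "SINK")] = cost_sink[y][x]
            # n-links: one directed edge per ordered pair of neighbors
            for dy, dx in offsets:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    diff = lum[y][x] - lum[ny][nx]
                    # Assumed similarity cost; equation (149) may differ.
                    edges[(v, (ny, nx))] = math.exp(-diff * diff / (2 * sigma * sigma))
    return edges


lum = [[0.1, 0.1], [0.1, 0.9]]
ones = [[1.0, 1.0], [1.0, 1.0]]
graph = build_extraction_graph(lum, ones, ones)
vertices = {u for u, _ in graph} | {v for _, v in graph}
```

For the 2 × 2 example the graph has 4 pixel vertices plus SOURCE and SINK, matching the "number of pixels plus 2" count in the text, and similar neighbors receive a higher n-link cost than dissimilar ones.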

The salient region extraction graph division unit 42 receives the salient region extraction graph, generates a salient region image by dividing the graph, and outputs it.

Specifically, the salient region extraction graph division unit 42 first considers dividing the vertices of the salient region extraction graph into a subset containing SOURCE and a subset containing SINK. The division is chosen so that the total cost of the edges running from the SOURCE-side subset to the SINK-side subset is minimized. Note that the cost of edges in the opposite direction, that is, from the SINK-side subset to the SOURCE-side subset, is not counted. This problem is called the minimum cut problem of a graph and is known to be equivalent to the maximum flow problem. Widely known methods for solving the maximum flow problem include, in addition to Non-Patent Document 1, the Ford-Fulkerson algorithm described in Non-Patent Document 5 and the Goldberg-Tarjan algorithm described in Non-Patent Document 6 below.

(Non-Patent Document 5) L. R. Ford, D. R. Fulkerson: “Maximal flow through a network,” Canadian Journal of Mathematics, Vol. 8, pp. 399-404, 1956.
(Non-Patent Document 6) A. V. Goldberg, R. E. Tarjan: “A new approach to the maximum-flow problem,” Journal of the ACM, Vol. 35, pp. 921-940, 1988.

As a result of dividing the salient region extraction graph by the above method, the pixel positions corresponding to vertices in the subgraph containing SOURCE are assigned to the salient region A_obj(t), and the pixel positions corresponding to vertices in the subgraph containing SINK are assigned to the non-salient region A_bkg(t). As shown in equation (150) below, the salient region image is an image whose pixel value is 1 at positions belonging to the salient region and 0 at positions belonging to the non-salient region.
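The division step can be sketched with a small maximum flow solver. The sketch uses the breadth-first (Edmonds-Karp) variant of the Ford-Fulkerson method, chosen here only for brevity; the patent does not prescribe a particular solver. The function name `min_cut_source_side`, the three-pixel graph, and its integer costs are hypothetical illustration values.

```python
from collections import deque


def min_cut_source_side(edges, source, sink):
    """Run Edmonds-Karp max flow and return the vertices on the source side
    of the resulting minimum cut. edges is a dict {(u, v): cost}."""
    residual = {}
    for (u, v), cost in edges.items():
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0.0) + cost
        residual[v].setdefault(u, 0.0)      # reverse edge for the residual graph

    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            # No augmenting path: reachable vertices form the source side.
            return set(parent)
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= flow
            residual[v][u] += flow


# Three pixels a, b, c in a row, with hypothetical t-link and n-link costs.
edges = {
    ("SOURCE", "a"): 9, ("SOURCE", "b"): 6, ("SOURCE", "c"): 1,
    ("a", "SINK"): 1, ("b", "SINK"): 4, ("c", "SINK"): 9,
    ("a", "b"): 3, ("b", "a"): 3, ("b", "c"): 1, ("c", "b"): 1,
}
source_side = min_cut_source_side(edges, "SOURCE", "SINK")
# Salient region image: 1 on the SOURCE side, 0 on the SINK side.
salient_image = [1 if p in source_side else 0 for p in ("a", "b", "c")]
```

In this example the cheapest cut severs a → SINK, b → SINK, SOURCE → c, and b → c, so pixels a and b are labeled salient and c is labeled non-salient.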

As described above, the salient region extraction graph division unit 42 extracts and outputs the salient region image A(x, t). In other words, the salient region image extraction unit 4 extracts the salient region image A(x, t) and outputs it.

The salient region video generation unit 5 generates a salient region video from the salient region images obtained by running the attention level video extraction unit 1, the salient region prior probability image extraction unit 2, the feature likelihood calculation unit 3, and the salient region image extraction unit 4 on each input image. In other words, it receives the set of salient region images extracted by repeatedly running these four units in order on each input image, and generates a salient region video by arranging the images in the set in time series. The salient region video generation unit 5 outputs the generated salient region video to the outside.

(Second Embodiment)
A salient region video generation device 1100 according to a second embodiment of the present invention is described below with reference to the drawings. The salient region video generation device 1100 acquires an input video from the outside, generates a salient region video composed of salient region frames (salient region images), each obtained by extracting the salient region from the corresponding input frame (input image) of the input video, and outputs it to the outside.

As shown in FIG. 7, the salient region video generation device 1100 includes an attention level video extraction unit 1, a salient region prior probability image extraction unit 2, a feature likelihood calculation unit 3, a salient region image extraction unit 4, a salient region video generation unit 5, a smoothed image group generation unit 6, and a salient region image determination unit 7. The attention level video extraction unit 1, the salient region image extraction unit 4, and the salient region video generation unit 5 are the same as in the first embodiment, so their description is omitted.

The smoothed image group generation unit 6 generates a smoothed image group consisting of a plurality of smoothed images, each obtained by smoothing the input image at a different resolution. That is, it receives the input image and generates a group of images smoothed at different resolutions. It outputs the generated smoothed image group to the feature likelihood calculation unit 3, the salient region image extraction unit 4, and the salient region image determination unit 7. The generation method is not particularly limited; this embodiment describes a method that repeatedly smooths and shrinks the input image.

The smoothed image group generation unit 6 takes the input image as the initial smoothed image H₀(t) at time t. When the smoothed image H_{k−1}(t) is given for some integer k, the unit smooths that image with a Gaussian smoothing filter having a predetermined standard deviation parameter σ_g. It then shrinks the smoothed image by a predetermined scale factor a_g that satisfies equation (151) below, generating the smoothed image H_k(t).

Because the integer k corresponds to the degree of smoothing of the smoothed image, k is hereinafter called the smoothing coefficient. Repeating the above process for k = 1, 2, ..., n_g − 2 forms the smoothed image group (equation (152) below).

In particular, when σ_g = 0 and a_g = 1, every smoothed image is identical to the input image. As described above, the smoothed image group generation unit 6 generates and outputs the smoothed image group.
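The repeated smooth-and-shrink procedure can be sketched in plain Python as follows. Several details are assumptions made for illustration: the Gaussian kernel is truncated at three standard deviations with clamped borders, shrinking uses nearest-neighbor sampling, and the group is taken to contain n_g images H₀(t) to H_{n_g−1}(t). Equation (151) is not reproduced in this text, so the sketch simply takes a_g as a factor in (0, 1]. All function names are hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """1-D Gaussian kernel; sigma <= 0 yields the identity kernel."""
    if sigma <= 0:
        return [1.0]
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [wt / total for wt in weights]


def smooth(img, sigma):
    """Separable Gaussian smoothing with border clamping."""
    kernel = gaussian_kernel(sigma)
    r = len(kernel) // 2
    h, w = len(img), len(img[0])

    def clamp(i, n):
        return min(max(i, 0), n - 1)

    rows = [[sum(kernel[j + r] * row[clamp(x + j, w)] for j in range(-r, r + 1))
             for x in range(w)] for row in img]
    return [[sum(kernel[j + r] * rows[clamp(y + j, h)][x] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]


def shrink(img, scale):
    """Nearest-neighbor reduction by the factor scale (0 < scale <= 1)."""
    h, w = len(img), len(img[0])
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    return [[img[min(h - 1, int(y / scale))][min(w - 1, int(x / scale))]
             for x in range(new_w)] for y in range(new_h)]


def smoothed_image_group(img, n_g, sigma_g, a_g):
    """Return [H_0, H_1, ..., H_{n_g - 1}] with H_0 equal to the input image."""
    group = [img]
    for _ in range(1, n_g):
        group.append(shrink(smooth(group[-1], sigma_g), a_g))
    return group


image = [[float(x + 4 * y) for x in range(4)] for y in range(4)]
pyramid = smoothed_image_group(image, n_g=3, sigma_g=1.0, a_g=0.5)
sizes = [(len(level), len(level[0])) for level in pyramid]
identity = smoothed_image_group(image, n_g=3, sigma_g=0, a_g=1)
```

With σ_g = 0 and a_g = 1 the sketch reproduces the special case noted in the text: every level equals the input image.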

The salient region prior probability image extraction unit 2 extracts, from an attention level image (one frame of the attention level video) and the salient region image, a salient region prior probability image that indicates the probability that each position of the input image (the corresponding frame of the input video) belongs to the salient region. The extraction method is not particularly limited; this embodiment describes a method that uses the salient region prior probability image generation unit 21 and the salient region prior probability image update unit 22.

The salient region prior probability image generation unit 21 is the same as in the first embodiment, so its description is omitted. The salient region prior probability image update unit 22 is also almost the same as in the first embodiment, but differs in the following points.

1. When this process is executed for the first time at a given time t, that is, when the subsequent feature likelihood calculation unit 3 and salient region image extraction unit 4 use the smoothed image with the largest smoothing coefficient (equation (153) below), the salient region prior probability image (d) is updated by the same method as in the first embodiment.

2. When this process is executed again at a given time t, that is, when the subsequent feature likelihood calculation unit 3 and salient region image extraction unit 4 use a smoothed image H_k(t) (k = n_g − 2, n_g − 3, ..., 0), the same processing as in the first embodiment is performed after making the following changes.
(1) σ₁, one of the parameters of the update equation described in the first embodiment, and the variance of the salient region prior probability one time step earlier (time t − 1) (equation (154) below) are forcibly replaced with 0.

(2) In place of the salient region image A(t − 1) from one time step earlier (time t − 1), the salient region image A(t; k + 1) generated using the smoothed image H_{k+1}(t), whose smoothing coefficient is larger by one, is used.
(3) The variance of the salient region prior probability (equation (155) below) is not updated; only the mean ξ₁(x, t) is updated, by the same method as in the first embodiment.

3. To make clear that the smoothed image H_k(t) with smoothing coefficient k was used as input, the output salient region prior probability image is written Ξ₁(t; k).

The feature likelihood calculation unit 3 is also almost the same as in the first embodiment, but differs in the following points.
1. One of the smoothed images H_k(t) (k = n_g − 1, n_g − 2, ..., 0) may be used in place of the input image. In that case, when this process is executed for the j-th time (j = 1, 2, ..., n_g) at time t, the smoothed image with smoothing coefficient k = n_g − j (equation (156) below) is used. This means that the smoothed images are used in descending order of smoothing coefficient.
2. When this process is executed for the first time at a given time t, that is, when the smoothed image H_{n_g−1}(t) with smoothing coefficient n_g − 1 is used as input, the process is the same as in the first embodiment except for item 1.
3. When this process is executed again at a given time t, that is, when the smoothed image H_k(t) with smoothing coefficient k (k = n_g − 2, n_g − 3, ..., 0) is used as input, the following substitutions are made.
(1) In place of the salient region image A(t − 1) from one time step earlier (time t − 1), the salient region image A(t; k + 1) generated at the current time (time t) using the smoothed image H_{k+1}(t), whose smoothing coefficient is larger by one, is used.
(2) In place of the salient region feature likelihood ψ₁(c, t − 1) from one time step earlier, the salient region feature likelihood ψ₁(c, t; k + 1) generated at the current time using the smoothed image H_{k+1}(t), whose smoothing coefficient is larger by one, is used.
(3) In place of the non-salient region feature likelihood ψ₂(c, t − 1) from one time step earlier, the non-salient region feature likelihood ψ₂(c, t; k + 1) generated at the current time using the smoothed image H_{k+1}(t), whose smoothing coefficient is larger by one, is used.
4. To make clear that the smoothed image H_k(t) with smoothing coefficient k was used as input, the output salient region feature likelihood is written ψ₁(t; k) and the output non-salient region feature likelihood is written ψ₂(t; k).

The salient region image determination unit 7 runs the processing of the salient region prior probability image extraction unit 2, the feature likelihood calculation unit 3, and the salient region image extraction unit 4 on the smoothed image group and determines the salient region image of the input image. That is, for the smoothed image H_k(t) with smoothing coefficient k, it runs the salient region prior probability image extraction unit 2, the feature likelihood calculation unit 3, and the salient region image extraction unit 4 in order and receives the extracted salient region image A(t; k). If A(t; k) is unchanged from the salient region image A(t; k + 1) extracted in the previous step, the unit fixes it as the final salient region image for the input image at the current time (time t) and outputs A(t) = A(t; k). If there is a change, it decrements k by one and returns to the salient region prior probability image extraction unit 2.
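The coarse-to-fine control flow of the determination unit can be sketched as below. The callback `segment` is a hypothetical stand-in for running units 2, 3, and 4 on H_k(t); the sketch assumes it returns salient region images at a common resolution so that consecutive results can be compared directly, and it assumes that the result for k = 0 is accepted if no earlier step converged. Neither assumption is stated in the patent.

```python
def finalize_salient_region(n_g, segment):
    """Iterate from the coarsest level (k = n_g - 1) toward k = 0 and stop as
    soon as the salient region image no longer changes.

    segment(k, previous) is a hypothetical callback returning A(t; k); previous
    is A(t; k + 1), or None on the first run.
    Returns (A(t), k at which the loop stopped).
    """
    previous = None
    for k in range(n_g - 1, -1, -1):
        current = segment(k, previous)
        if previous is not None and current == previous:
            return current, k          # unchanged from A(t; k + 1): finalize
        previous = current
    return previous, 0                 # assumed fallback: use the finest level


# Hypothetical per-level results for a three-pixel image.
masks = {3: [0, 1, 0], 2: [0, 1, 1], 1: [0, 1, 1], 0: [1, 1, 1]}
calls = []


def fake_segment(k, previous):
    calls.append(k)
    return masks[k]


result, k_final = finalize_salient_region(4, fake_segment)
```

In this example the result stabilizes between k = 2 and k = 1, so the loop stops at k = 1 and never evaluates the full-resolution level k = 0.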

FIG. 8 shows salient region video generation results (salient region extraction results) produced by the salient region video generation device 1000. The input videos are 640 × 480 pixels and 30 to 90 seconds long, and the parameter values are σ₁ = 0.0.31, σ₂ = 0.037, M_f = 5, λ_c = 0.25, λ_i = 100, σ = 0, 1. In FIG. 8, the first and third rows show the input video, the second row shows the salient regions extracted for the first row, and the fourth row shows the salient regions extracted for the third row. As FIG. 8 shows, the salient region video generation device 1000 extracts the salient regions appropriately, and their boundaries are almost perfect.

FIG. 9 compares the method of the salient region video generation device 1000 with a method in which the salient region prior probability image update unit 22, the salient region feature likelihood update unit 312, and the non-salient region feature likelihood update unit 322 are removed from the device 1000, that is, a method that does not sequentially update the salient region prior probability image and the feature likelihood. As FIG. 9 shows, without sequential updating the segmentation result varies greatly from frame to frame, whereas the method of the salient region video generation device 1000 segments the images stably.

As described above, the present invention realizes the above video salient region extraction method mainly through the following two points.

(1) Calculation of video saliency by the attention level video extraction unit 1, based on a model that simulates the human visual mechanism, and generation of prior information about the salient and non-salient regions from that video saliency by the salient region prior probability image generation unit 21 and the salient region feature likelihood calculation unit 31
(2) Sequential updating of the prior information about the salient and non-salient regions by the salient region prior probability image update unit 22 and the non-salient region feature likelihood calculation unit 32

This makes region segmentation possible even when no prior information about the object region or the background region is given. Consequently, even without prior knowledge of the object region and the background region, the two can be separated accurately and the region of interest (the object region) can be extracted.

The various processes of the salient region video generation devices 1000 and 1100 described above may be performed by recording a program for executing each process on a computer-readable recording medium and having a computer system read and execute the program recorded on that medium. The “computer system” here may include an OS and hardware such as peripheral devices. If a WWW system is used, the “computer system” also includes the homepage providing environment (or display environment). The “computer-readable recording medium” refers to a storage device such as a flexible disk, a magneto-optical disk, a ROM, a writable nonvolatile memory such as a flash memory, a portable medium such as a CD-ROM, or a hard disk built into a computer system.

Furthermore, the “computer-readable recording medium” also includes media that hold a program for a certain period of time, such as the volatile memory (for example, DRAM (Dynamic Random Access Memory)) inside a computer system that serves as a server or client when the program is transmitted over a network such as the Internet or a communication line such as a telephone line. The program may also be transmitted from a computer system that stores it in a storage device or the like to another computer system via a transmission medium, or by transmission waves in a transmission medium. Here, the “transmission medium” that transmits the program refers to a medium with the function of transmitting information, such as a network (communication network) like the Internet or a communication line (communication wire) like a telephone line. The program may implement only part of the functions described above. It may also be a so-called difference file (difference program), which implements the functions described above in combination with a program already recorded in the computer system.

Embodiments of the present invention have been described in detail above with reference to the drawings. The specific configuration, however, is not limited to these embodiments and includes designs and the like within a scope that does not depart from the gist of the invention.

DESCRIPTION OF SYMBOLS
1 Attention level video extraction unit
2 Saliency area prior probability image extraction unit
3 Feature amount likelihood calculation unit
4 Saliency area image extraction unit
5 Saliency area video generation unit
6 Smoothed image group generation unit
7 Saliency area image determination unit
11 Basic attention level image extraction unit
12 Probabilistic basic attention level image extraction unit
13 Probabilistic basic attention level parameter sequential estimation unit
14 Gaze position probability density image extraction unit
15 Gaze position probability density video output unit
21 Saliency area prior probability image generation unit
22 Saliency area prior probability image update unit
31 Saliency area feature amount likelihood calculation unit
32 Non-salience area feature amount likelihood calculation unit
33 Feature amount likelihood output unit
41 Saliency area extraction graph generation unit
42 Saliency area extraction graph division unit
141 Gaze movement state variable update unit
142 Representative gaze position update unit
143 Representative gaze position weight coefficient calculation unit
144 Gaze position probability density image output unit
145 Representative gaze position set reconstruction unit
311 Saliency area feature amount likelihood generation unit
312 Saliency area feature amount likelihood update unit
321 Non-salience area feature amount likelihood generation unit
322 Non-salience area feature amount likelihood update unit
1000, 1100 Saliency area video generation device

Claims

A saliency area video generation method comprising:
an attention level video extraction process of extracting, from an input video, an attention level video indicating an attention level, which is the degree to which a position is likely to attract human attention;
a saliency area prior probability image extraction process of extracting a saliency area prior probability image indicating the probability that each position of an input image, which is one of the frames constituting the input video, belongs to a saliency area;
a feature amount likelihood calculation process of calculating a feature amount likelihood indicating the likelihood of the image feature amounts contained in each of the saliency area and the non-salience area of the input image;
a saliency area image extraction process of extracting a saliency area image indicating the saliency area of the input image from the input image, the saliency area prior probability image, and the feature amount likelihood; and
a saliency area video generation process of generating a saliency area video from the saliency area images obtained by executing the attention level video extraction process, the saliency area prior probability image extraction process, the feature amount likelihood calculation process, and the saliency area image extraction process for each input image,
wherein the saliency area prior probability image extraction process extracts the saliency area prior probability image, which indicates the probability that each position of one input image belongs to a saliency area, based on the saliency area image and on an attention level image, which is the image corresponding to the input image in the attention level video extracted by the attention level video extraction process, and
wherein the feature amount likelihood calculation process calculates the feature amount likelihood based on at least one of the input image, the attention level image, the saliency area prior probability image, the saliency area image, and the feature amount likelihood calculated up to the previous time.
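As an illustration of how a prior probability image and the two feature amount likelihoods can be combined into a saliency area image, the following is a minimal sketch only. It assumes a per-pixel Bayes decision; the reference signs 41 and 42 indicate that the embodiment divides a graph, and the spatial coupling that such a graph provides is omitted here. The function names, the nested-list image layout, and the threshold of 0.5 are assumptions made for the example.

```python
def posterior_saliency(prior, lik_salient, lik_background):
    """Per-pixel posterior probability of belonging to the saliency area.

    prior          : P(salient) at each position (saliency area prior probability image)
    lik_salient    : likelihood of the observed feature under the saliency area model
    lik_background : likelihood of the observed feature under the non-salience area model
    All three are lists of rows with identical shape (an assumption of this sketch).
    """
    posterior = []
    for p_row, ls_row, lb_row in zip(prior, lik_salient, lik_background):
        row = []
        for p, ls, lb in zip(p_row, ls_row, lb_row):
            numerator = p * ls
            denominator = numerator + (1.0 - p) * lb
            # Fall back to the prior when both likelihoods are zero.
            row.append(numerator / denominator if denominator > 0 else p)
        posterior.append(row)
    return posterior


def extract_saliency_mask(prior, lik_salient, lik_background, threshold=0.5):
    """Binary saliency area image: 1 where the posterior exceeds the threshold."""
    posterior = posterior_saliency(prior, lik_salient, lik_background)
    return [[1 if value > threshold else 0 for value in row] for row in posterior]
```

Applying `extract_saliency_mask` to every frame and collecting the resulting masks in frame order gives a sequence that plays the role of the saliency area video in this simplified setting.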

The saliency area video generation method according to claim 1, wherein the attention level video extraction process includes:
a basic attention level image extraction process of calculating, based on the input image, a basic attention level image, which is an image indicating the spatial regions of the input image that have salient characteristics;
a probabilistic basic attention level image extraction process of calculating a probabilistic basic attention level image, which is an image expressing the saliency at each position of the current input image probabilistically, based on the basic attention level image calculated by the basic attention level image extraction process, the probabilistic basic attention level image calculated by the probabilistic basic attention level image extraction process from the previous input image, and a probabilistic basic attention level parameter, which is a first parameter that is sequentially updated and used for gaze position estimation;
a gaze position probability density image extraction process of calculating a gaze position probability density image, which is the frame of the gaze position probability density video for the current input image, based on the probabilistic basic attention level image calculated by the probabilistic basic attention level image extraction process, the gaze position probability density image calculated by the gaze position probability density image extraction process from the previous input image, and a gaze position probability density parameter, which is a parameter that is sequentially updated and used for gaze position estimation; and
a gaze position probability density video output process of outputting, as the gaze position probability density video, the time series of gaze position probability density images calculated by sequentially repeating the basic attention level image extraction process, the probabilistic basic attention level image extraction process, and the gaze position probability density image extraction process for each of the input images,
wherein the gaze position probability density image extraction process includes:
a gaze movement state variable update process of updating a gaze movement state variable, which is a random variable that controls the magnitude of gaze movement, based on the gaze position probability density image calculated by the gaze position probability density image extraction process from the previous input image, the gaze movement state variable calculated by the gaze movement state variable update process from the previous input image, and the gaze position probability density parameter, and of outputting a gaze movement state variable set, which is a set of the gaze movement state variables;
a representative gaze position update process of updating a representative gaze position set, which is a set of representative gaze positions that take gaze movement into account, based on the probabilistic basic attention level image calculated by the probabilistic basic attention level image extraction process, the representative gaze position set updated by the representative gaze position update process from the previous input image, the gaze movement state variable set, and the gaze position probability density parameter;
a representative gaze position weight coefficient calculation process of calculating a representative gaze position weight coefficient set, which is a set of representative gaze position weight coefficients consisting of the weights associated with the respective representative gaze positions, based on the probabilistic basic attention level image calculated by the probabilistic basic attention level image extraction process, the representative gaze position set updated by the representative gaze position update process, the gaze movement state variable set output by the gaze movement state variable update process, and the gaze position probability density parameter; and
a gaze position probability density image output process of calculating the gaze position probability density image based on the representative gaze position set updated by the representative gaze position update process and the representative gaze position weight coefficient set calculated by the representative gaze position weight coefficient calculation process, the gaze position probability density image being composed of the representative gaze position set and the representative gaze position weight coefficient set.
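The representation of a gaze position probability density image by a set of representative gaze positions and associated weight coefficients can be pictured as a weighted sample set. The sketch below is an assumption-laden illustration, not the claimed procedure: the weights are taken to be proportional to the probabilistic basic attention level at each representative position, the density is rendered with an isotropic Gaussian kernel of hypothetical width `sigma`, and the gaze movement state variables and the sequentially updated parameters are left out entirely.

```python
import math


def update_weights(positions, attention):
    """Weight each representative gaze position by the attention level found there.

    positions : list of (row, column) integer positions
    attention : probabilistic basic attention level image as a list of rows
    Returns weights that sum to 1 (uniform if all attention values are zero).
    """
    raw = [attention[y][x] for (y, x) in positions]
    total = sum(raw)
    if total <= 0:
        return [1.0 / len(positions)] * len(positions)
    return [value / total for value in raw]


def gaze_density_image(positions, weights, height, width, sigma=1.0):
    """Render the weighted position set as a normalized density image."""
    density = [[0.0] * width for _ in range(height)]
    for (py, px), weight in zip(positions, weights):
        for y in range(height):
            for x in range(width):
                squared_distance = (y - py) ** 2 + (x - px) ** 2
                density[y][x] += weight * math.exp(-squared_distance / (2.0 * sigma ** 2))
    total = sum(sum(row) for row in density)
    if total <= 0:
        return density
    return [[value / total for value in row] for row in density]
```

Calling `update_weights` and then `gaze_density_image` once per frame and keeping the outputs in order gives a time series of density images, which corresponds to the gaze position probability density video in this simplified picture.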

The saliency area video generation method according to any one of the preceding claims, wherein the saliency area prior probability image extraction process includes:
a saliency area prior probability image generation process of generating the saliency area prior probability image using only the attention level image; and
a saliency area prior probability image update process of updating, using the saliency area image, the saliency area prior probability image generated by the saliency area prior probability image generation process.
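The two-stage structure of this claim (generate a prior from the attention level image alone, then update it with a saliency area image) might be sketched as follows. The rescaling into a bounded range and the linear blend are assumptions chosen for simplicity, as are the names `floor`, `ceil`, and `alpha`; the claim itself does not fix the formulas.

```python
def generate_prior(attention, floor=0.05, ceil=0.95):
    """Generate a prior probability image from the attention level image only.

    Attention values are rescaled linearly so that the maximum maps to `ceil`
    and zero maps to `floor`, which keeps the prior away from hard 0 or 1.
    """
    peak = max(max(row) for row in attention)
    if peak <= 0:
        # No attention anywhere: fall back to an uninformative prior.
        return [[0.5 for _ in row] for row in attention]
    return [[floor + (ceil - floor) * (value / peak) for value in row]
            for row in attention]


def update_prior(prior, saliency_mask, alpha=0.5):
    """Update the generated prior using a saliency area image (binary mask).

    `alpha` controls how strongly the earlier segmentation result is trusted.
    """
    return [[(1.0 - alpha) * p + alpha * m for p, m in zip(p_row, m_row)]
            for p_row, m_row in zip(prior, saliency_mask)]
```

In a frame-by-frame loop, the mask passed to `update_prior` would typically be the saliency area image obtained for the preceding frame, which is an assumption of this sketch rather than something the claim states.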

The saliency area video generation method according to any one of the preceding claims, wherein the feature amount likelihood calculation process includes:
a saliency area feature amount likelihood calculation process of calculating a saliency area feature amount likelihood, which indicates the likelihood of the image feature amounts contained in the saliency area, based on at least one of the input image, the saliency area prior probability image, the saliency area image, and the saliency area feature amount likelihood calculated up to the previous time;
a non-salience area feature amount likelihood calculation process of calculating a non-salience area feature amount likelihood, which indicates the likelihood of the image feature amounts contained in the area outside the saliency area, based on at least one of the input image, the saliency area prior probability image, the saliency area image, and the non-salience area feature amount likelihood calculated up to the previous time; and
a feature amount likelihood output process of combining the saliency area feature amount likelihood and the non-salience area feature amount likelihood and outputting the result as the feature amount likelihood.

The saliency area video generation method according to claim 4, wherein
the saliency area feature amount likelihood calculation process includes:
a saliency area feature amount likelihood generation process of generating the saliency area feature amount likelihood based on the input image, the saliency area prior probability image, and the saliency area image; and
a saliency area feature amount likelihood update process of updating the saliency area feature amount likelihood generated by the saliency area feature amount likelihood generation process,
the non-salience area feature amount likelihood calculation process includes:
a non-salience area feature amount likelihood generation process of generating the non-salience area feature amount likelihood based on the input image, the saliency area prior probability image, and the saliency area image; and
a non-salience area feature amount likelihood update process of updating the non-salience area feature amount likelihood generated by the non-salience area feature amount likelihood generation process,
the saliency area feature amount likelihood update process updates the saliency area feature amount likelihood based on at least one of the input image, the saliency area image, and the saliency area feature amount likelihood updated up to the previous time, and
the non-salience area feature amount likelihood update process updates the non-salience area feature amount likelihood based on at least one of the input image, the non-salience area image, and the non-salience area feature amount likelihood updated up to the previous time.
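The generate-then-update pattern for the two feature amount likelihoods can be illustrated with intensity histograms. This is a stand-in chosen for brevity: the claims do not specify the feature or the likelihood model, so the quantized gray-level histogram, the blending factor `rate`, and all function names below are assumptions.

```python
def generate_likelihood(image, weights, bins=4, levels=256, eps=1e-6):
    """Weighted, normalized histogram of quantized intensities.

    image   : gray-level image as rows of integers in [0, levels)
    weights : per-pixel weights (for example a binary mask) selecting the area
    """
    hist = [eps] * bins  # eps avoids zero likelihoods for unseen bins
    for img_row, w_row in zip(image, weights):
        for value, weight in zip(img_row, w_row):
            hist[min(value * bins // levels, bins - 1)] += weight
    total = sum(hist)
    return [h / total for h in hist]


def update_likelihood(previous, current, rate=0.3):
    """Blend the likelihood from earlier frames with the newly generated one."""
    mixed = [(1.0 - rate) * p + rate * c for p, c in zip(previous, current)]
    total = sum(mixed)
    return [m / total for m in mixed]


def feature_likelihoods(image, saliency_mask, prev_salient=None, prev_background=None):
    """Return the pair (saliency area likelihood, non-salience area likelihood)."""
    background_mask = [[1 - m for m in row] for row in saliency_mask]
    salient = generate_likelihood(image, saliency_mask)
    background = generate_likelihood(image, background_mask)
    if prev_salient is not None:
        salient = update_likelihood(prev_salient, salient)
    if prev_background is not None:
        background = update_likelihood(prev_background, background)
    return salient, background
```

Passing the likelihoods returned for one frame as `prev_salient` and `prev_background` for the next frame makes the models change gradually over time, which is the purpose the update processes serve in this sketch.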

The saliency area video generation method according to claim 1, further comprising:
a smoothed image group generation process of generating a smoothed image group consisting of a plurality of smoothed images obtained by smoothing the input image at mutually different resolutions; and
a saliency area image determination process of executing the saliency area prior probability image extraction process, the feature amount likelihood calculation process, and the saliency area image extraction process on the smoothed image group to determine the saliency area image of the input image,
wherein the feature amount likelihood calculation process and the saliency area image extraction process use the smoothed images in place of the input image, and
wherein the saliency area video generation process generates the saliency area video from the saliency area images obtained by executing, for each input image, the attention level video extraction process, the saliency area prior probability image extraction process, the feature amount likelihood calculation process, the saliency area image extraction process, the smoothed image group generation process, and the saliency area image determination process.
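A smoothed image group and a final determination step could look like the sketch below. Box filtering with several radii is used here as a simple stand-in for smoothing at different resolutions, and a majority vote over the per-level masks is used as the determination rule; both are assumptions, since the claim does not state how the smoothing or the determination is carried out.

```python
def box_blur(image, radius):
    """Mean filter with a (2 * radius + 1) square window, clipped at the borders."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            rows = range(max(0, y - radius), min(height, y + radius + 1))
            cols = range(max(0, x - radius), min(width, x + radius + 1))
            values = [image[j][i] for j in rows for i in cols]
            out[y][x] = sum(values) / len(values)
    return out


def smoothed_image_group(image, radii=(0, 1, 2)):
    """One smoothed image per radius; radius 0 reproduces the input values."""
    return [box_blur(image, radius) for radius in radii]


def determine_mask(masks):
    """Combine the masks obtained at each smoothing level by majority vote."""
    count = len(masks)
    height, width = len(masks[0]), len(masks[0][0])
    return [[1 if 2 * sum(m[y][x] for m in masks) > count else 0
             for x in range(width)]
            for y in range(height)]
```

In use, each image in the group would be segmented separately (with whatever extraction step is available) and the resulting masks handed to `determine_mask` to obtain a single saliency area image for the frame.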

A saliency area video generation device comprising:
an attention level video extraction unit that extracts, from an input video, an attention level video indicating an attention level, which is the degree to which a position is likely to attract human attention;
a saliency area prior probability image extraction unit that extracts a saliency area prior probability image indicating the probability that each position of an input image, which is one of the frames constituting the input video, belongs to a saliency area;
a feature amount likelihood calculation unit that calculates a feature amount likelihood indicating the likelihood of the image feature amounts contained in each of the saliency area and the area outside the saliency area of the input image;
a saliency area image extraction unit that extracts a saliency area image indicating the saliency area of the input image from the input image, the saliency area prior probability image, and the feature amount likelihood; and
a saliency area video generation unit that generates a saliency area video from the saliency area images obtained by executing the processing of the attention level video extraction unit, the saliency area prior probability image extraction unit, the feature amount likelihood calculation unit, and the saliency area image extraction unit for each input image,
wherein the saliency area prior probability image extraction unit extracts the saliency area prior probability image, which indicates the probability that each position of one input image belongs to a saliency area, based on the saliency area image and on an attention level image, which is the image corresponding to the input image in the attention level video extracted by the attention level video extraction unit, and
wherein the feature amount likelihood calculation unit calculates the feature amount likelihood based on at least one of the input image, the attention level image, the saliency area prior probability image, the saliency area image, and the feature amount likelihood calculated up to the previous time.

The saliency area video generation device according to claim 7, wherein the attention level video extraction unit includes:
a basic attention level image extraction unit that calculates, based on the input image, a basic attention level image, which is an image indicating the spatial regions of the input image that have salient characteristics;
a probabilistic basic attention level image extraction unit that calculates a probabilistic basic attention level image, which is an image expressing the saliency at each position of the current input image probabilistically, based on the basic attention level image calculated by the basic attention level image extraction unit, the probabilistic basic attention level image calculated by the probabilistic basic attention level image extraction unit from the previous input image, and a probabilistic basic attention level parameter, which is a first parameter that is sequentially updated and used for gaze position estimation;
a gaze position probability density image extraction unit that calculates a gaze position probability density image, which is the frame of the gaze position probability density video for the current input image, based on the probabilistic basic attention level image calculated by the probabilistic basic attention level image extraction unit, the gaze position probability density image calculated by the gaze position probability density image extraction unit from the previous input image, and a gaze position probability density parameter, which is a parameter that is sequentially updated and used for gaze position estimation; and
a gaze position probability density video output unit that outputs, as the gaze position probability density video, the time series of gaze position probability density images calculated by sequentially repeating the processing of the basic attention level image extraction unit, the probabilistic basic attention level image extraction unit, and the gaze position probability density image extraction unit for each of the input images,
wherein the gaze position probability density image extraction unit includes:
a gaze movement state variable update unit that updates a gaze movement state variable, which is a random variable that controls the magnitude of gaze movement, based on the gaze position probability density image calculated by the gaze position probability density image extraction unit from the previous input image, the gaze movement state variable calculated by the gaze movement state variable update unit from the previous input image, and the gaze position probability density parameter, and that outputs a gaze movement state variable set, which is a set of the gaze movement state variables;
a representative gaze position update unit that updates a representative gaze position set, which is a set of representative gaze positions that take gaze movement into account, based on the probabilistic basic attention level image calculated by the probabilistic basic attention level image extraction unit, the representative gaze position set updated by the representative gaze position update unit from the previous input image, the gaze movement state variable set, and the gaze position probability density parameter;
a representative gaze position weight coefficient calculation unit that calculates a representative gaze position weight coefficient set, which is a set of representative gaze position weight coefficients consisting of the weights associated with the respective representative gaze positions, based on the probabilistic basic attention level image calculated by the probabilistic basic attention level image extraction unit, the representative gaze position set updated by the representative gaze position update unit, the gaze movement state variable set output by the gaze movement state variable update unit, and the gaze position probability density parameter; and
a gaze position probability density image output unit that calculates the gaze position probability density image based on the representative gaze position set updated by the representative gaze position update unit and the representative gaze position weight coefficient set calculated by the representative gaze position weight coefficient calculation unit, the gaze position probability density image being composed of the representative gaze position set and the representative gaze position weight coefficient set.

The saliency area video generation device according to any one of the preceding device claims, wherein the saliency area prior probability image extraction unit includes:
a saliency area prior probability image generation unit that generates the saliency area prior probability image using only the attention level image; and
a saliency area prior probability image update unit that updates, using the saliency area image, the saliency area prior probability image generated by the saliency area prior probability image generation unit.

The saliency area video generation device according to any one of the preceding device claims, wherein the feature amount likelihood calculation unit includes:
a saliency area feature amount likelihood calculation unit that calculates a saliency area feature amount likelihood, which indicates the likelihood of the image feature amounts contained in the saliency area, based on at least one of the input image, the saliency area prior probability image, the saliency area image, and the saliency area feature amount likelihood calculated up to the previous time;
a non-salience area feature amount likelihood calculation unit that calculates a non-salience area feature amount likelihood, which indicates the likelihood of the image feature amounts contained in the area outside the saliency area, based on at least one of the input image, the saliency area prior probability image, the saliency area image, and the non-salience area feature amount likelihood calculated up to the previous time; and
a feature amount likelihood output unit that combines the saliency area feature amount likelihood and the non-salience area feature amount likelihood and outputs the result as the feature amount likelihood.

The saliency area video generation device according to claim 10, wherein
the saliency area feature amount likelihood calculation unit includes:
a saliency area feature amount likelihood generation unit that generates the saliency area feature amount likelihood based on the input image, the saliency area prior probability image, and the saliency area image; and
a saliency area feature amount likelihood update unit that updates the saliency area feature amount likelihood generated by the saliency area feature amount likelihood generation unit,
the non-salience area feature amount likelihood calculation unit includes:
a non-salience area feature amount likelihood generation unit that generates the non-salience area feature amount likelihood based on the input image, the saliency area prior probability image, and the saliency area image; and
a non-salience area feature amount likelihood update unit that updates the non-salience area feature amount likelihood generated by the non-salience area feature amount likelihood generation unit,
the saliency area feature amount likelihood update unit updates the saliency area feature amount likelihood based on at least one of the input image, the saliency area image, and the saliency area feature amount likelihood updated up to the previous time, and
the non-salience area feature amount likelihood update unit updates the non-salience area feature amount likelihood based on at least one of the input image, the non-salience area image, and the non-salience area feature amount likelihood updated up to the previous time.

The saliency area video generation device according to claim 7, further comprising:
a smoothed image group generation unit that generates a smoothed image group consisting of a plurality of smoothed images obtained by smoothing the input image at mutually different resolutions; and
a saliency area image determination unit that executes the processing of the saliency area prior probability image extraction unit, the feature amount likelihood calculation unit, and the saliency area image extraction unit on the smoothed image group to determine the saliency area image of the input image,
wherein the feature amount likelihood calculation unit and the saliency area image extraction unit use the smoothed images in place of the input image, and
wherein the saliency area video generation unit generates the saliency area video from the saliency area images obtained by executing, for each input image, the processing of the attention level video extraction unit, the saliency area prior probability image extraction unit, the feature amount likelihood calculation unit, the saliency area image extraction unit, the smoothed image group generation unit, and the saliency area image determination unit.

A program for causing a computer to execute:
an attention level video extraction step of extracting, from an input video, an attention level video indicating an attention level, which is the degree to which a position is likely to attract human attention;
a saliency area prior probability image extraction step of extracting a saliency area prior probability image indicating the probability that each position of an input image, which is one of the frames constituting the input video, belongs to a saliency area;
a feature amount likelihood calculation step of calculating a feature amount likelihood indicating the likelihood of the image feature amounts contained in each of the saliency area and the area outside the saliency area of the input image;
a saliency area image extraction step of extracting a saliency area image indicating the saliency area of the input image from the input image, the saliency area prior probability image, and the feature amount likelihood; and
a saliency area video generation step of generating a saliency area video from the saliency area images obtained by executing the attention level video extraction step, the saliency area prior probability image extraction step, the feature amount likelihood calculation step, and the saliency area image extraction step for each input image,
wherein the saliency area prior probability image extraction step extracts the saliency area prior probability image, which indicates the probability that each position of one input image belongs to a saliency area, based on the saliency area image and on an attention level image, which is the image corresponding to the input image in the attention level video extracted by the attention level video extraction step, and
wherein the feature amount likelihood calculation step calculates the feature amount likelihood based on at least one of the input image, the attention level image, the saliency area prior probability image, the saliency area image, and the feature amount likelihood calculated up to the previous time.

A computer-readable recording medium storing a program for causing a computer to execute:
an attention level video extraction step of extracting, from an input video, an attention level video indicating an attention level, which is the degree to which a position is likely to attract human attention;
a saliency area prior probability image extraction step of extracting a saliency area prior probability image indicating the probability that each position of an input image, which is one of the frames constituting the input video, belongs to a saliency area;
a feature amount likelihood calculation step of calculating a feature amount likelihood indicating the likelihood of the image feature amounts contained in each of the saliency area and the area outside the saliency area of the input image;
a saliency area image extraction step of extracting a saliency area image indicating the saliency area of the input image from the input image, the saliency area prior probability image, and the feature amount likelihood; and
a saliency area video generation step of generating a saliency area video from the saliency area images obtained by executing the attention level video extraction step, the saliency area prior probability image extraction step, the feature amount likelihood calculation step, and the saliency area image extraction step for each input image,
wherein the saliency area prior probability image extraction step extracts the saliency area prior probability image, which indicates the probability that each position of one input image belongs to a saliency area, based on the saliency area image and on an attention level image, which is the image corresponding to the input image in the attention level video extracted by the attention level video extraction step, and
wherein the feature amount likelihood calculation step calculates the feature amount likelihood based on at least one of the input image, the attention level image, the saliency area prior probability image, the saliency area image, and the feature amount likelihood calculated up to the previous time.