JP2009003615A

JP2009003615A - Attention region extraction method, attention region extraction device, computer program, and recording medium

Info

Publication number: JP2009003615A
Application number: JP2007162477A
Authority: JP
Inventors: Shogo Kimura; 昭悟木村; Kunio Kayano; 邦夫柏野; Tatsuto Takeuchi; 龍人竹内; Leung Clement; レオンクレメント
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2007-06-20
Filing date: 2007-06-20
Publication date: 2009-01-08
Anticipated expiration: 2027-06-20
Also published as: JP4750758B2

Abstract

PROBLEM TO BE SOLVED: To extract an attention region while taking into account the temporal variation of the degree of attention.

SOLUTION: The attention region extraction method includes: a process of extracting a basic attention degree image, which shows the spatial regions having salient characteristics in a given frame of an input video; one or both of (a) a process of extracting a post-instantaneous-suppression attention degree image by suppressing, in the basic attention degree image extracted from the current frame, the maximum basic attention degree region, that is, the region whose basic attention degree is highest in the basic attention degree image extracted from the frame preceding the current frame, and (b) a process of extracting a post-gradual-suppression attention degree image by extracting regions having salient values along the time axis from the basic attention degree images calculated from several frames up to the current frame and, in the basic attention degree image or the instantaneous transition attention degree image, emphasizing the extracted regions while suppressing the other regions; and a process of extracting the time series of the post-instantaneous-suppression attention degree images or the post-gradual-suppression attention degree images as a post-gradual-suppression attention degree video and outputting it as an attention degree video.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to an attention region extraction method, an attention region extraction device, a computer program, and a recording medium for extracting attention regions from a video signal.

With the spread of digital cameras, larger and cheaper recording media, and high-capacity networks, techniques for storing, searching, and using a wide variety of images and videos have become necessary. In particular, techniques that quickly retrieve images and videos that are similar as a whole, based on concepts that are hard to put into words such as color, shape, composition, and pattern, are in demand in a wide range of fields, including shopping sites and video sharing sites.

To realize such search techniques in a way that reflects the user's search intent, a promising approach is to extract, as a degree of attention, how important each part of an image or video appears to a human, and to judge the similarity of images based on the extracted degree of attention. The methods described in Non-Patent Document 1 and Non-Patent Document 2 are known as techniques for calculating the degree of attention, and the method described in Patent Document 1 is known as a technique for judging image similarity based on the degree of attention.
Patent Document 1: JP 2006-338313 A
Non-Patent Document 1: L. Itti et al., "A model of saliency-based visual attention for rapid scene analysis", IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 20, Number 11, pp. 1254-1259, November 1998.
Non-Patent Document 2: L. Itti et al., "Realistic avatar eye and head animation using a neurobiological model of visual attention", Proceedings of SPIE International Symposium on Optical Science and Technology, Volume 5200, pp. 64-78, August 2003.

However, the methods described in the above documents give almost no consideration to the fact that the degree of attention varies over time. The present invention has been made in view of this situation, and its object is to provide an attention region extraction method, an attention region extraction device, a computer program, and a recording medium that introduce a new method of calculating the degree of attention that takes temporal variation into account.

Specifically, the present invention realizes an attention degree calculation method that takes the temporal variation of the degree of attention into account by means of the following two points.

(1) Introduction of instantaneous movement of the region where the degree of attention is maximum.
In early human vision, a phenomenon called "visual search" is known, in which the gaze point moves rapidly so that information on the entire visual field can be grasped as quickly as possible (R. M. Klein: "Inhibition of return," Trends in Cognitive Sciences, Vol. 4, No. 4, pp. 138-147, April 2000; Non-Patent Document 3). The present invention introduces this "visual search" into the calculation of the degree of attention.

(2) Introduction of suppression of the degree of attention for visual stimuli with small variation.
In early human vision, a phenomenon called "visual adaptation" is known, in which the degree of attention paid to a region with a small visual stimulus decreases over time so that information on the entire visual field can be grasped with as few gaze point movements as possible (S. Martinez-Conde, S. L. Macknik and D. H. Hubel: "The role of fixational eye movements in visual perception," Nature Reviews, Vol. 5, pp. 229-240, March 2004; Non-Patent Document 4). The present invention introduces this "visual adaptation" into the calculation of the degree of attention.

To solve the above problem, the present invention is an attention region extraction method for extracting, from a target input video, an attention degree video, which is a video showing the spatio-temporal regions having salient characteristics in the input video, the method comprising: a basic attention degree image extraction process of extracting, from a frame of the input video, a basic attention degree image, which is an image showing the spatial regions having salient characteristics in that frame; one or both of an attention degree instantaneous suppression process and an attention degree gradual suppression process, wherein the attention degree instantaneous suppression process extracts a post-instantaneous-suppression attention degree image by taking the maximum basic attention degree region, that is, the region where the basic attention degree (the value of each pixel) is highest in the basic attention degree image extracted by the basic attention degree image extraction process from the frame immediately preceding the current frame of the input video, and suppressing that region in the basic attention degree image extracted by the basic attention degree image extraction process from the current frame, and wherein the attention degree gradual suppression process extracts a post-gradual-suppression attention degree image by extracting regions having salient values along the time axis from the basic attention degree images calculated from several frames up to the current frame of the input video and, in the basic attention degree image extracted from the frame concerned by the basic attention degree image extraction process or in the instantaneous transition attention degree image extracted from the frame concerned by the attention degree instantaneous suppression process, emphasizing the extracted regions while suppressing the other regions; and an attention degree video output process of executing the basic attention degree image extraction process and one or both of the attention degree instantaneous suppression process and the attention degree gradual suppression process repeatedly, in order, for each frame of the input video, thereby extracting a post-gradual-suppression attention degree video, which is the time series of the post-instantaneous-suppression attention degree images or the post-gradual-suppression attention degree images, and outputting it as the attention degree video.

In the above invention, the attention degree instantaneous suppression process introduces instantaneous movement of the region where the degree of attention is maximum, realizing the "visual search" phenomenon by signal processing. Likewise, the attention degree gradual suppression process introduces suppression of the degree of attention for visual stimuli with small variation, realizing the "visual adaptation" phenomenon by signal processing.

The present invention is also the attention region extraction method described above, wherein the basic attention degree image extraction process comprises: a basic feature image extraction process of extracting a plurality of types of basic feature images from a frame of the input video; a multi-resolution image extraction process of extracting, for each type of basic feature image extracted by the basic feature image extraction process, a multi-resolution image, which is its multi-resolution representation; a resolution difference image extraction process of extracting, for each type of multi-resolution image extracted by the multi-resolution image extraction process, a plurality of resolution difference images, which are differences between images of different resolutions; a saliency image extraction process of extracting, for each type of resolution difference image extracted by the resolution difference image extraction process, a saliency image by integrating the resolution difference images of different resolutions; and a saliency image integration process of extracting the basic attention degree image by integrating the plurality of types of saliency images extracted by the saliency image extraction process, and wherein the attention degree gradual suppression process extracts the saliency image using the gradual suppression image in place of the basic attention degree image.
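The five sub-processes above can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the 2x2 block averaging used for the pyramid, the nearest-neighbour upsampling, the absolute difference, the plain summation, and all function names are assumptions made for the example.

```python
import numpy as np

def downsample(img):
    """Halve the resolution by 2x2 block averaging (assumed pyramid step)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def pyramid(img, levels):
    """Multi-resolution image: a list of progressively coarser copies.
    The input should be at least 2**(levels - 1) pixels on each side."""
    out = [np.asarray(img, dtype=float)]
    for _ in range(levels - 1):
        out.append(downsample(out[-1]))
    return out

def upsample_to(img, shape):
    """Nearest-neighbour upsampling of `img` to `shape`."""
    ys = np.arange(shape[0]) * img.shape[0] // shape[0]
    xs = np.arange(shape[1]) * img.shape[1] // shape[1]
    return img[np.ix_(ys, xs)]

def saliency_image(feature, levels=4):
    """Sum the resolution difference images of one basic feature image."""
    pyr = pyramid(feature, levels)
    sal = np.zeros(pyr[0].shape)
    for c in range(levels - 1):            # finer level
        for s in range(c + 1, levels):     # coarser level
            diff = upsample_to(pyr[c], sal.shape) - upsample_to(pyr[s], sal.shape)
            sal += np.abs(diff)            # resolution difference image
    return sal

def basic_attention_image(features, weights=None):
    """Integrate the saliency images of several feature types (same shape)."""
    sal = [saliency_image(f) for f in features]
    if weights is None:                    # equal weights by default (assumption)
        weights = np.full(len(sal), 1.0 / len(sal))
    return sum(w * s for w, s in zip(weights, sal))
```

In this sketch a uniform feature image produces zero attention everywhere, while a small bright patch on a dark background produces the highest attention at the patch.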

The present invention is also the attention region extraction method described above, further comprising a saliency image integration ratio calculation process of extracting the maximum basic attention degree region from the basic attention degree image extracted by the basic attention degree image extraction process, calculating, for each of the plurality of types of saliency images, a value within the region corresponding to the maximum basic attention degree region, and determining from the magnitude of that value a saliency image integration ratio, which is the weight of the corresponding saliency image, wherein the basic attention degree image extraction process extracts the basic attention degree image by weighting and integrating the saliency images according to the saliency image integration ratios calculated by the saliency image integration ratio calculation process for the immediately preceding frame of the input video.

The present invention is also the attention region extraction method described above, wherein the saliency image integration ratio calculation process takes the saliency image integration ratio calculated for the immediately preceding frame of the input video as an initial value, and updates it to a new saliency image integration ratio by treating the value within the maximum basic attention degree region calculated for each saliency image as a difference relative to that initial value.
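One possible reading of this update rule is sketched below. The square window around the maximum, the use of the regional mean as "the value within the region", the step size `rate`, and the final normalization are all assumptions of this example; the text above does not fix any of them.

```python
import numpy as np

def max_region_mask(attention, radius=1):
    """Boolean mask of a square window around the attention maximum
    (the window shape is an assumption)."""
    y, x = np.unravel_index(np.argmax(attention), attention.shape)
    mask = np.zeros(attention.shape, dtype=bool)
    mask[max(0, y - radius):y + radius + 1, max(0, x - radius):x + radius + 1] = True
    return mask

def update_integration_ratios(prev_ratios, saliency_images, mask, rate=0.5):
    """Start from the previous frame's ratios and add, as a difference,
    the mean of each saliency image inside the maximum region."""
    values = np.array([s[mask].mean() for s in saliency_images])
    new = np.asarray(prev_ratios, dtype=float) + rate * values
    return new / new.sum()   # keep the ratios summing to 1 (assumption)
```

With this sketch, a saliency image that responds strongly inside the maximum basic attention degree region gains weight for the next frame, and one that does not respond there loses relative weight.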

The present invention is also the attention region extraction method described above, wherein the attention degree instantaneous suppression process comprises: a maximum basic attention degree region detection process of extracting the maximum basic attention degree region from the basic attention degree image extracted by the basic attention degree image extraction process from the frame immediately preceding the current frame of the input video; a maximum basic attention degree region occlusion image extraction process of extracting a maximum basic attention degree region occlusion image, which is an image that occludes the maximum basic attention degree region extracted by the maximum basic attention degree region detection process; an attention degree gradual recovery image extraction process of extracting an attention degree gradual recovery image, which is an image that reduces the occlusion in the regions occluded by the maximum basic attention degree region occlusion image; an attention degree instantaneous suppression image generation process of generating an attention degree instantaneous suppression image by integrating the maximum basic attention degree region occlusion image and the attention degree gradual recovery image; and a post-instantaneous-suppression attention degree image generation process of generating the post-instantaneous-suppression attention degree image by integrating the attention degree instantaneous suppression image and the basic attention degree image of the current frame of the input video extracted by the basic attention degree image extraction process.
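The five steps of the instantaneous suppression process can be sketched as one per-frame update. This is only an illustration under stated assumptions: the suppression is represented as a per-pixel gain in [0, 1], the maximum region is a square window of half-width `radius`, recovery is a fixed additive step `recovery` per frame, and "integration" is taken to be a pixel-wise minimum or product. None of these choices is specified in the text above.

```python
import numpy as np

def instantaneous_suppression(prev_basic, cur_basic, prev_gain,
                              radius=1, recovery=0.25):
    """Return (post-suppression attention image, updated gain image)."""
    # (1) Maximum basic attention degree region of the previous frame.
    y, x = np.unravel_index(np.argmax(prev_basic), prev_basic.shape)
    # (2) Occlusion image: 0 inside the region, 1 elsewhere.
    occlusion = np.ones(prev_basic.shape)
    occlusion[max(0, y - radius):y + radius + 1,
              max(0, x - radius):x + radius + 1] = 0.0
    # (3) Gradual recovery image: earlier occlusions fade back toward 1.
    recovered = np.minimum(prev_gain + recovery, 1.0)
    # (4) Instantaneous suppression image: combine (2) and (3).
    gain = np.minimum(recovered, occlusion)
    # (5) Apply the suppression to the current basic attention image.
    return cur_basic * gain, gain
```

Starting from `prev_gain = np.ones(shape)`, the location that was most salient in the previous frame is blocked in the current frame, so the maximum of the output moves to the next most salient location, which mimics the gaze shift of "visual search".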

The present invention is also the attention region extraction method described above, wherein the attention degree gradual suppression process comprises: an attention degree gradual occlusion image generation process of generating an attention degree gradual occlusion image, which is an image that progressively occludes the basic attention degree image extracted by the basic attention degree image extraction process; an attention degree instantaneous recovery image generation process of extracting regions having salient values along the time axis from the basic attention degree images or saliency images calculated from several frames up to the current frame of the input video, and generating an attention degree instantaneous recovery image, which is an image that releases the suppression of the basic attention degree in the corresponding regions of the basic attention degree image or the post-instantaneous-suppression attention degree image; an attention degree gradual suppression image generation process of generating an attention degree gradual suppression image by integrating the attention degree gradual occlusion image and the attention degree instantaneous recovery image; and a post-gradual-suppression attention degree image generation process of generating the post-gradual-suppression attention degree image by integrating the attention degree gradual suppression image with either the basic attention degree image extracted by the basic attention degree image extraction process or the post-instantaneous-suppression attention degree image extracted by the attention degree instantaneous suppression process.
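The gradual suppression process can be sketched in the same style. The multiplicative `decay` per frame, the temporal saliency measure (absolute deviation from the mean of the recent frames), and the fixed `threshold` are assumptions chosen for this example only.

```python
import numpy as np

def gradual_suppression(history, cur_attention, prev_gain,
                        decay=0.8, threshold=0.2):
    """Return (post-gradual-suppression attention image, updated gain image).

    history: basic attention images of the recent frames (assumed window).
    cur_attention: basic or post-instantaneous-suppression attention image.
    """
    # Gradual occlusion image: every region fades a little each frame.
    decayed = prev_gain * decay
    # Regions with salient values along the time axis.
    temporal = np.abs(cur_attention - np.mean(history, axis=0))
    # Instantaneous recovery image: release suppression where things changed.
    recover = temporal > threshold
    # Gradual suppression image: combine occlusion and recovery.
    gain = np.where(recover, 1.0, decayed)
    return cur_attention * gain, gain
```

In this sketch an unchanging region loses attention frame by frame, while a region whose attention value suddenly departs from its recent history is restored to full strength at once, which mimics "visual adaptation".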

The present invention is also an attention region extraction device that extracts, from a target input video, an attention degree video, which is a video showing the spatio-temporal regions having salient characteristics in the input video, the device comprising: a basic attention degree image extraction unit that extracts, from a frame of the input video, a basic attention degree image, which is an image showing the spatial regions having salient characteristics in that frame; one or both of an attention degree instantaneous suppression unit and an attention degree gradual suppression unit, wherein the attention degree instantaneous suppression unit extracts a post-instantaneous-suppression attention degree image by taking the maximum basic attention degree region, that is, the region where the basic attention degree (the value of each pixel) is highest in the basic attention degree image extracted by the basic attention degree image extraction unit from the frame immediately preceding the current frame of the input video, and suppressing that region in the basic attention degree image extracted by the basic attention degree image extraction unit from the current frame, and wherein the attention degree gradual suppression unit extracts a post-gradual-suppression attention degree image by extracting regions having salient values along the time axis from the basic attention degree images calculated from several frames up to the current frame of the input video and, in the basic attention degree image extracted from the frame concerned by the basic attention degree image extraction unit or in the instantaneous transition attention degree image extracted from the frame concerned by the attention degree instantaneous suppression unit, emphasizing the extracted regions while suppressing the other regions; and an attention degree video output unit that extracts, over the frames of the input video, a post-gradual-suppression attention degree video, which is the time series of the post-instantaneous-suppression attention degree images extracted by the attention degree instantaneous suppression unit or of the post-gradual-suppression attention degree images extracted by the attention degree gradual suppression unit, and outputs it as the attention degree video.

The present invention is also a computer program that causes a computer used as an attention region extraction device, which extracts from a target input video an attention degree video showing the spatio-temporal regions having salient characteristics in the input video, to execute: a basic attention degree image extraction process of extracting, from a frame of the input video, a basic attention degree image, which is an image showing the spatial regions having salient characteristics in that frame; one or both of an attention degree instantaneous suppression process and an attention degree gradual suppression process, wherein the attention degree instantaneous suppression process extracts a post-instantaneous-suppression attention degree image by taking the maximum basic attention degree region, that is, the region where the basic attention degree (the value of each pixel) is highest in the basic attention degree image extracted by the basic attention degree image extraction process from the frame immediately preceding the current frame of the input video, and suppressing that region in the basic attention degree image extracted by the basic attention degree image extraction process from the current frame, and wherein the attention degree gradual suppression process extracts a post-gradual-suppression attention degree image by extracting regions having salient values along the time axis from the basic attention degree images calculated from several frames up to the current frame of the input video and, in the basic attention degree image extracted from the frame concerned by the basic attention degree image extraction process or in the instantaneous transition attention degree image extracted from the frame concerned by the attention degree instantaneous suppression process, emphasizing the extracted regions while suppressing the other regions; and an attention degree video output process of executing the basic attention degree image extraction process and one or both of the attention degree instantaneous suppression process and the attention degree gradual suppression process repeatedly, in order, for each frame of the input video, thereby extracting a post-gradual-suppression attention degree video, which is the time series of the post-instantaneous-suppression attention degree images or the post-gradual-suppression attention degree images, and outputting it as the attention degree video.

The present invention is also a computer-readable recording medium on which is recorded a computer program that causes a computer used as an attention region extraction device, which extracts from a target input video an attention degree video showing the spatio-temporal regions having salient characteristics in the input video, to execute: a basic attention degree image extraction process of extracting, from a frame of the input video, a basic attention degree image, which is an image showing the spatial regions having salient characteristics in that frame; one or both of an attention degree instantaneous suppression process and an attention degree gradual suppression process, wherein the attention degree instantaneous suppression process extracts a post-instantaneous-suppression attention degree image by taking the maximum basic attention degree region, that is, the region where the basic attention degree (the value of each pixel) is highest in the basic attention degree image extracted by the basic attention degree image extraction process from the frame immediately preceding the current frame of the input video, and suppressing that region in the basic attention degree image extracted by the basic attention degree image extraction process from the current frame, and wherein the attention degree gradual suppression process extracts a post-gradual-suppression attention degree image by extracting regions having salient values along the time axis from the basic attention degree images calculated from several frames up to the current frame of the input video and, in the basic attention degree image extracted from the frame concerned by the basic attention degree image extraction process or in the instantaneous transition attention degree image extracted from the frame concerned by the attention degree instantaneous suppression process, emphasizing the extracted regions while suppressing the other regions; and an attention degree video output process of executing the basic attention degree image extraction process and one or both of the attention degree instantaneous suppression process and the attention degree gradual suppression process repeatedly, in order, for each frame of the input video, thereby extracting a post-gradual-suppression attention degree video, which is the time series of the post-instantaneous-suppression attention degree images or the post-gradual-suppression attention degree images, and outputting it as the attention degree video.

According to the present invention, when attention regions are extracted from a video, instantaneous movement of the region where the degree of attention is maximum is introduced and the degree of attention for visual stimuli with small variation is suppressed. This takes the temporal variation of the degree of attention of the video into account and makes it possible to extract attention regions that are close to the parts a human perceives as important.

Embodiments of the present invention are described below with reference to the drawings.

[First Embodiment]
FIG. 1 is a functional block diagram of an attention region extraction device according to the first embodiment of the present invention.
The attention region extraction device shown in the figure comprises a basic attention degree image extraction unit 1, an attention degree instantaneous suppression unit 3, and an attention degree video output unit 5. It receives an input video, which is the video data from which the degree of attention is to be extracted, and outputs an attention degree video, which is a video showing the spatio-temporal regions of the input that have salient characteristics and a high degree of attention. A video consists of the images of a plurality of frames.

The basic attention degree image extraction unit 1 receives an input image, which is a frame of the input video, extracts a basic attention degree image, which is an image showing the spatial regions having salient characteristics in that frame, and outputs the basic attention degree image.
The method of extracting the basic attention degree image is not particularly limited. This embodiment describes the extraction method for the case where the basic attention degree image extraction unit 1 comprises a basic feature image extraction unit 11, a multi-resolution image extraction unit 12, a resolution difference image extraction unit 13, a saliency image extraction unit 14, and a saliency image integration unit 15.

The basic feature image extraction unit 11 receives the input image, extracts a basic feature image from it with each of a plurality of feature extraction methods, and outputs the set of these basic feature images.
The method of extracting the basic feature images is not particularly limited. This embodiment describes the feature extraction method for the case where the basic feature image extraction unit 11 comprises a luminance feature image extraction unit 111, a color feature image extraction unit 112, a direction feature image extraction unit 113, a blinking feature image extraction unit 114, and a motion feature image extraction unit 115.

The luminance feature image extraction unit 111 receives the input image and outputs a luminance feature image representing the luminance component of the input image. The luminance feature image is obtained as the average of the RGB (Red, Green, Blue) components of the input image, as in (Equation 1) below.

Here, r(i), g(i), and b(i) are the R (red), G (green), and B (blue) components of the i-th input image (that is, the i-th frame of the input video), and each pixel value is a real number from 0 to 1. I(i) is the luminance feature image calculated from the i-th input image.
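
As a minimal sketch of (Equation 1), which the text describes as the average of the R, G, and B components, the luminance feature image can be written as I(i) = (r(i) + g(i) + b(i)) / 3. The Python/NumPy snippet below illustrates this. The function name `luminance_feature` and the H x W x 3 array layout are assumptions made for illustration and do not appear in the source.

```python
import numpy as np

def luminance_feature(frame_rgb):
    """Return I(i): the per-pixel mean of R, G, B for an H x W x 3 frame in [0, 1]."""
    frame_rgb = np.asarray(frame_rgb, dtype=float)
    return frame_rgb.mean(axis=2)

# Hypothetical 1 x 2 frame: a pure red pixel and a mixed pixel.
frame = np.array([[[1.0, 0.0, 0.0], [0.3, 0.6, 0.9]]])
luminance = luminance_feature(frame)
```

The result has one value per pixel, so the later feature extraction steps that start from I(i) can operate on a single-channel image.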

The color feature image extraction unit 112 receives the input image and outputs color feature images representing the color components of the input image. The color feature image extraction unit 112 extracts four kinds of color feature images, corresponding to red, green, blue, and yellow. These color feature images are obtained as in (Equation 2) to (Equation 4) below, where max denotes the largest of the values inside the braces {}.

Here, R(i), G(i), B(i), and Y(i) are the color feature images calculated from the i-th input image, corresponding to red, green, blue, and yellow, respectively. R(i)_{(x, y)} is the pixel value of the color feature image R(i) at the coordinates (x, y). In the following description, the subscript (x, y) is omitted unless it is needed.

The direction feature image extraction unit 113 receives the input image and outputs direction feature images representing the direction components of the input image. A direction feature image is obtained by applying a Gabor filter to the luminance feature image I(i) obtained by the luminance feature image extraction unit 111, as in (Equation 6) below. A Gabor filter extracts local intensity (grayscale) information from an image.

Here, g_φ is a Gabor filter with rotation angle φ, and * is the convolution operator (one function is translated across the other while their overlapping products are summed). O_φ(i) is the direction feature image for rotation angle φ calculated from the i-th input image. Direction feature images are extracted for n_φ rotation angles. The rotation angles φ are chosen so that they divide π = 180° evenly into n_φ parts, as in (Equation 7) below.
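
One plausible reading of the angle selection in (Equation 7) is φ = kπ / n_φ for k = 0, ..., n_φ − 1. The sketch below assumes the sequence starts at 0, which the text does not state, and `rotation_angles` is a hypothetical helper name.

```python
import math

def rotation_angles(n_phi):
    """Return n_phi angles in radians that split the half-turn [0, pi) evenly.

    Starting at 0 is an assumption; the source only says pi is divided evenly.
    """
    return [k * math.pi / n_phi for k in range(n_phi)]

# Hypothetical example with four orientations: 0, 45, 90, and 135 degrees.
angles = rotation_angles(4)
```

Each angle would parameterize one Gabor filter g_φ, giving n_φ direction feature images per frame.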

The blinking feature image extraction unit 114 receives the input image and outputs a blinking feature image representing the blinking component of the input image. The blinking feature image is calculated as in (Equation 8) below from the luminance feature images I(i) that the luminance feature image extraction unit 111 calculates from the current input image and several earlier input images.

Here, n_F is the number of luminance feature images referenced when extracting the blinking feature image (not counting the luminance feature image extracted from the current input image), and F(i) is the blinking feature image calculated from the i-th and earlier input images. When n_F = 1, F(i) = |I(i) − I(i−1)|, which coincides with the method described in Non-Patent Document 2.
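
The following sketch illustrates the blinking feature under an assumption: because (Equation 8) is not reproduced in this text, the general case is taken to be the absolute difference between I(i) and the mean of the n_F preceding luminance feature images. This form is chosen only because it reduces to the stated special case F(i) = |I(i) − I(i−1)| when n_F = 1; the name `blinking_feature` is hypothetical.

```python
import numpy as np

def blinking_feature(luminance_history, n_f=1):
    """Return F(i) from a list of luminance images ordered oldest first.

    The last entry is I(i). Averaging the n_f earlier images is an assumption;
    only the n_f = 1 case, |I(i) - I(i-1)|, is stated in the source.
    """
    if len(luminance_history) < n_f + 1:
        raise ValueError("need n_f earlier luminance images plus the current one")
    current = np.asarray(luminance_history[-1], dtype=float)
    earlier = np.stack(
        [np.asarray(img, dtype=float) for img in luminance_history[-1 - n_f:-1]]
    )
    return np.abs(current - earlier.mean(axis=0))

# Hypothetical one-pixel history: I(i-2) = 0.2, I(i-1) = 0.4, I(i) = 0.9.
history = [np.array([[0.2]]), np.array([[0.4]]), np.array([[0.9]])]
flicker_one = blinking_feature(history, n_f=1)
flicker_two = blinking_feature(history, n_f=2)
```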

The motion feature image extraction unit 115 receives the input image and outputs motion feature images representing the motion component of the input image. The method of extracting the motion feature images is not particularly limited; in this embodiment they are extracted by computing the optical flow at each point of the luminance feature image. The method of deriving the optical flow is likewise not particularly limited; this embodiment uses, for example, the image-gradient-based method generally known as the Lucas-Kanade method, calculating as in (Equation 9) to (Equation 10) below.

Here, A(x, y) is a neighborhood of the coordinates (x, y), and M_x(i) and M_y(i) are the motion feature images corresponding to the horizontal and vertical components of the motion, respectively.

As described above, the basic feature image extraction unit 11 treats the luminance feature image, the color feature images, the direction feature images, the blinking feature image, and the motion feature images each as basic feature images, and outputs the set of these basic feature images.

The multi-resolution image extraction unit 12 receives the set of basic feature images, extracts for each basic feature image its multi-resolution representation, called a multi-resolution image, and outputs the set of multi-resolution images.
In this embodiment the same processing is applied to every basic feature image, so the processing is described using the luminance feature image as an example. The luminance multi-resolution image, which is the multi-resolution representation of the luminance feature image, is extracted by repeatedly applying a Gaussian filter to the luminance feature image, as in (Equation 11) below. A Gaussian filter is a smoothing filter for removing noise from an image; it uses a Gaussian function to weight pixels according to their distance from the pixel of interest.

Here, G_σ is a Gaussian filter with variance σ, I(i; l) is the l-th level luminance multi-resolution image extracted from the luminance feature image I(i), and n_l is the number of levels of the multi-resolution image. The 0th-level luminance multi-resolution image is the luminance feature image itself, that is, I(i; 0) = I(i).
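
The repeated Gaussian filtering can be sketched as follows. This is an illustration under stated assumptions, not the patent's exact (Equation 11): each level is taken to be the previous level smoothed once more, no downsampling is performed, the kernel is truncated at about three standard deviations, and borders are handled by edge replication. The function names are hypothetical, and the sketch returns level 0 plus `n_levels` smoothed levels because the text does not say whether level 0 is counted among the n_l images.

```python
import numpy as np

def gaussian_kernel_1d(variance):
    """Normalized 1-D Gaussian kernel truncated at about three standard deviations."""
    radius = max(1, int(3 * np.sqrt(variance)))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-(x ** 2) / (2.0 * variance))
    return kernel / kernel.sum()

def gaussian_filter(image, variance):
    """Separable Gaussian smoothing; edge replication at the border is an assumption."""
    kernel = gaussian_kernel_1d(variance)
    radius = len(kernel) // 2
    padded = np.pad(np.asarray(image, dtype=float), radius, mode="edge")
    # Filter along rows, then along columns; "valid" restores the original size.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def multiresolution_images(feature_image, variance, n_levels):
    """Return [I(i;0), I(i;1), ...] where each level is the previous one smoothed again."""
    levels = [np.asarray(feature_image, dtype=float)]
    for _ in range(n_levels):
        levels.append(gaussian_filter(levels[-1], variance))
    return levels

# Hypothetical example: a single bright pixel spreads out at each level.
impulse = np.zeros((9, 9))
impulse[4, 4] = 1.0
levels = multiresolution_images(impulse, variance=1.0, n_levels=3)
```

The same function would be applied unchanged to the color, direction, blinking, and motion feature images.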

Multi-resolution images can be extracted from the other basic feature images in the same way. Whereas n_l luminance multi-resolution images are extracted, the color feature images R(i), G(i), B(i), Y(i) yield a total of 4n_l color multi-resolution images R(i; l), G(i; l), B(i; l), Y(i; l); the direction feature images O_φ(i) yield a total of n_φ n_l direction multi-resolution images O_φ(i; l); the blinking feature image F(i) yields n_l blinking multi-resolution images F(i; l); and the motion feature images M_x(i), M_y(i) yield a total of 2n_l motion multi-resolution images M_x(i; l), M_y(i; l).

As described above, the multi-resolution image extraction unit 12 treats the luminance multi-resolution images, the color multi-resolution images, the direction multi-resolution images, the blinking multi-resolution images, and the motion multi-resolution images each as multi-resolution images, and outputs the set of these multi-resolution images.

The resolution difference image extraction unit 13 receives the set of multi-resolution images output by the multi-resolution image extraction unit 12, extracts for each type of multi-resolution image (luminance, color, and so on) resolution difference images, which are difference images between images of different resolution levels, and outputs the set of resolution difference images.
In this embodiment, the resolution difference images are extracted as in (Equation 12) to (Equation 17) below.

Here, RS_I(i; c, s) is the luminance resolution difference image obtained from the c-th level and s-th level luminance multi-resolution images, hereinafter called the (c, s)-level luminance resolution difference image. Similarly, RS_RG(i; c, s) and RS_BY(i; c, s) are the (c, s)-level RG color resolution difference image and the (c, s)-level BY color resolution difference image, respectively; RS_O(i; φ; c, s) is the (c, s)-level direction resolution difference image for rotation angle φ; RS_F(i; c, s) is the (c, s)-level blinking resolution difference image; and RS_Mk(i; c, s) is the (c, s)-level motion resolution difference image in the k direction. L_c and L_s are the sets of resolution levels considered when extracting the luminance resolution difference images, and are called the central resolution level set and the peripheral resolution level set, respectively.
In this case, |L_c · L_s| luminance resolution difference images are extracted, along with 4|L_c · L_s| color resolution difference images, n_φ|L_c · L_s| direction resolution difference images, |L_c · L_s| blinking resolution difference images, and 2|L_c · L_s| motion resolution difference images.
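
The pairing of central and peripheral levels can be sketched as below. Because (Equation 12) to (Equation 17) are not reproduced in this text, the sketch assumes that each resolution difference image is the pixel-wise absolute difference between level c and level s, as in common center-surround saliency schemes, and it reads |L_c · L_s| as the number of (c, s) pairs. The function name and the example level sets are hypothetical.

```python
import numpy as np

def resolution_differences(levels, center_levels, surround_levels):
    """Return {(c, s): |levels[c] - levels[s]|} for every c in L_c and s in L_s.

    The absolute difference is an assumption; the source's equations may differ.
    All levels are assumed to share one image size (no downsampling).
    """
    return {
        (c, s): np.abs(
            np.asarray(levels[c], dtype=float) - np.asarray(levels[s], dtype=float)
        )
        for c in center_levels
        for s in surround_levels
    }

# Hypothetical example: five same-sized levels, L_c = {1, 2}, L_s = {3, 4}.
example_levels = [np.full((4, 4), 0.1 * level) for level in range(5)]
differences = resolution_differences(
    example_levels, center_levels=[1, 2], surround_levels=[3, 4]
)
```

With two central and two peripheral levels the sketch produces four luminance resolution difference images, one per (c, s) pair.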

As described above, the resolution difference image extraction unit 13 treats the luminance resolution difference images, the color resolution difference images, the direction resolution difference images, the blinking resolution difference images, and the motion resolution difference images each as resolution difference images, and outputs the set of these resolution difference images.

The saliency image extraction unit 14 receives the set of resolution difference images output by the resolution difference image extraction unit 13, extracts for each type of resolution difference image (luminance, color, and so on) a saliency image, which is an image integrating the resolution difference images of that type, and outputs the set of saliency images.
The method of extracting the saliency images is not particularly limited. This embodiment describes the extraction method for the case in which the saliency image extraction unit 14 comprises a resolution difference image normalization unit 141 and a normalized resolution difference image integration unit 142.

The resolution difference image normalization unit 141 receives the set of resolution difference images output by the resolution difference image extraction unit 13, extracts for each resolution difference image a normalized resolution difference image, which is the image obtained by applying normalization processing to it, and outputs the set of normalized resolution difference images.
In this embodiment the same processing is applied to every resolution difference image, so the processing is described using as an example the (c, s)-level luminance resolution difference image RS_I(i; c, s) for some chosen c ∈ L_c and s ∈ L_s. The normalization of a resolution difference image is performed as in (Equation 18) to (Equation 20) below and yields the normalized resolution difference image N(RS_I(i; c, s)).

Here, m^*(RS_I(i; c, s)) is the maximum pixel value in the (c, s)-level luminance resolution difference image, ￣m(RS_I(i; c, s)) is the average of the maximum values over the local regions of the (c, s)-level luminance resolution difference image, A is a local region used in calculating ￣m(RS_I(i; c, s)), and n_A is the total number of such local regions. Each local region is, for example, one cell obtained by dividing the image into a grid. In the following, a character that carries ￣ or ~ above it is written with the ￣ or ~ placed before the character (for example, "￣m" above).
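
The quantities m^* and ￣m can be computed as in the sketch below. The way they are combined is an assumption: since (Equation 18) to (Equation 20) are not reproduced in this text, the sketch multiplies the image by (m^* − ￣m)^2, a weighting used in well-known center-surround saliency models, so that an image with one dominant peak is kept and an image with many equal peaks is suppressed. The function name and the 2 x 2 grid are hypothetical.

```python
import numpy as np

def normalize_map(diff_image, grid=(2, 2)):
    """Weight an image by (global max - mean of local maxima) squared.

    global_max plays the role of m^*, mean_local_max the role of the barred m,
    and the grid cells are the local regions A (there are n_A of them).
    The squared-difference weighting itself is an assumption.
    """
    image = np.asarray(diff_image, dtype=float)
    global_max = image.max()
    local_maxima = [
        cell.max()
        for band in np.array_split(image, grid[0], axis=0)
        for cell in np.array_split(band, grid[1], axis=1)
    ]
    mean_local_max = float(np.mean(local_maxima))
    return image * (global_max - mean_local_max) ** 2

# Hypothetical example: one peak is preserved, four equal peaks cancel out.
single_peak = np.zeros((4, 4))
single_peak[0, 0] = 1.0
many_peaks = np.zeros((4, 4))
many_peaks[[0, 0, 2, 2], [0, 2, 0, 2]] = 1.0
normalized_single = normalize_map(single_peak)
normalized_many = normalize_map(many_peaks)
```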

The luminance resolution difference images of other levels and the resolution difference images of the other types can be normalized in the same way, yielding the respective normalized resolution difference images, namely the normalized luminance resolution difference images, normalized RG color resolution difference images, normalized BY color resolution difference images, normalized direction resolution difference images, normalized blinking resolution difference images, and normalized motion resolution difference images.

The normalized resolution difference image integration unit 142 receives the set of normalized resolution difference images obtained by the resolution difference image normalization unit 141, extracts saliency images by summing the normalized resolution difference images of each type (luminance, color, and so on), and outputs the set of saliency images.
In this embodiment, the normalized resolution difference images are summed as in (Equation 21) to (Equation 25) below.

Here, CM_I(i), CM_C(i), CM_O(i), CM_F(i), and CM_M(i) are the luminance saliency image, color saliency image, direction saliency image, blinking saliency image, and motion saliency image, respectively. They are obtained in the same manner from, respectively, the normalized luminance resolution difference images N(RS_I(i; c, s)); the normalized RG color resolution difference images N(RS_RG(i; c, s)) and normalized BY color resolution difference images N(RS_BY(i; c, s)); the normalized direction resolution difference images N(RS_O(i; φ; c, s)); the normalized blinking resolution difference images N(RS_F(i; c, s)); and the normalized motion resolution difference images N(RS_Mk(i; c, s)).

As described above, the saliency image extraction unit 14 treats the luminance saliency image, the color saliency image, the direction saliency image, the blinking saliency image, and the motion saliency image each as saliency images, and outputs the set of these saliency images.

The saliency image integration unit 15 receives the set of saliency images output by the saliency image extraction unit 14, extracts the basic attention degree image, which is the image integrating the saliency images, and outputs the basic attention degree image.
The method of extracting the basic attention degree image is not particularly limited. This embodiment describes the extraction method for the case in which the saliency image integration unit 15 comprises a saliency image normalization unit 151 and a normalized saliency image integration unit 152.

The saliency image normalization unit 151 receives the set of saliency images output by the saliency image extraction unit 14, extracts for each saliency image (luminance saliency image, color saliency image, direction saliency image, blinking saliency image, motion saliency image) a normalized saliency image, which is the image obtained by applying normalization processing to it, and outputs the set of normalized saliency images.
The normalization of a saliency image is the same as the normalization performed by the resolution difference image normalization unit 141.

The normalized saliency image integration unit 152 receives the set of normalized saliency images output by the saliency image normalization unit 151, extracts the basic attention degree image by summing the normalized saliency images, and outputs the basic attention degree image.
In this embodiment, the normalized saliency images N(CM_j(i)) (j = I, C, O, F, M) are summed as in (Equation 26) below. The symbol ~j is used in the equation to distinguish it from j, but like j it ranges over ~j = I, C, O, F, M.

Here, S(i) is the basic attention degree image extracted from the i-th input image, and w_j(i) is the saliency image integration ratio for the saliency image CM_j(i), which is extracted by the saliency image integration ratio calculation unit 2 of the third embodiment described later. Because this embodiment and the second embodiment described later do not use the saliency image integration ratio calculation unit 2, w_j(i) = 1/5 for all (i, j), written ∀(i, j), where ∀ means "for any".
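
The integration step with equal ratios can be sketched as a weighted sum, S(i) = Σ_j w_j(i) N(CM_j(i)) with w_j(i) = 1/5. This plain weighted sum is an assumption: (Equation 26) is not reproduced in this text, and the role of the index ~j in it cannot be recovered here. The function name and dictionary keys are hypothetical.

```python
import numpy as np

def integrate_saliency(normalized_maps, weights=None):
    """Return the weighted sum of normalized saliency maps keyed by feature type.

    With no weights given, every map receives the same ratio 1 / (number of maps),
    which is 1/5 for the five feature types I, C, O, F, M.
    """
    keys = list(normalized_maps)
    if weights is None:
        weights = {key: 1.0 / len(keys) for key in keys}
    return sum(
        weights[key] * np.asarray(normalized_maps[key], dtype=float) for key in keys
    )

# Hypothetical example: five constant 2 x 2 maps.
maps = {
    "I": np.full((2, 2), 0.0),
    "C": np.full((2, 2), 0.25),
    "O": np.full((2, 2), 0.5),
    "F": np.full((2, 2), 0.75),
    "M": np.full((2, 2), 1.0),
}
basic_attention = integrate_saliency(maps)
```

Passing an explicit `weights` dictionary would correspond to supplying ratios from a separate calculation unit, as the third embodiment is said to do.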

As described above, the basic attention degree image extraction unit 1 extracts the basic attention degree image and outputs it.

The attention degree instantaneous suppression unit 3 receives the basic attention degree images that the basic attention degree image extraction unit 1 calculated from the current input image and from the input image one time step earlier. For the basic attention degree image calculated from the input image one time step earlier, it takes the maximum basic attention degree region, which is the region where the basic attention degree (the value of each pixel of the basic attention degree image) is largest, and suppresses that region to extract a post-instantaneous-suppression attention degree image, which it then outputs.

The method of extracting the post-instantaneous-suppression attention degree image is not particularly limited. This embodiment describes the extraction method for the case in which the attention degree instantaneous suppression unit 3 comprises a maximum basic attention degree region detection unit 31, a maximum basic attention degree region occlusion image extraction unit 32, an attention degree gradual recovery image extraction unit 33, an attention degree instantaneous suppression image generation unit 34, and a post-instantaneous-suppression attention degree image generation unit 35.

The maximum basic attention degree region detection unit 31 receives the basic attention degree image calculated from the input image one time step earlier, extracts the maximum basic attention degree region of that basic attention degree image, and outputs the maximum basic attention degree region.
The method of extracting the maximum basic attention degree region is not particularly limited. In this embodiment, the maximum basic attention degree region MSR(i−1) is extracted as a circle of radius ε centered on the point (￣x, ￣y) at which the pixel value (basic attention degree) of the basic attention degree image S(i−1) is largest. That is, it is expressed as in (Equation 27) below. Here, argmax is an operator that returns the value maximizing the term that immediately follows it; in (Equation 27), for example, it returns the (x', y') that maximizes S(i−1)_{(x', y')}.
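
The region extraction described above (locate the largest pixel, then take a circle of radius ε around it) can be sketched as follows. The function name is hypothetical, the region is represented as a boolean mask, and including pixels exactly at distance ε is an assumption because (Equation 27) is not reproduced in this text.

```python
import numpy as np

def max_attention_region(attention, radius):
    """Return (mask, (peak_x, peak_y)) for the disc of given radius around the peak.

    mask is True inside the maximum basic attention degree region. If several
    pixels share the maximum, np.argmax picks the first in row-major order.
    """
    attention = np.asarray(attention, dtype=float)
    peak_y, peak_x = np.unravel_index(np.argmax(attention), attention.shape)
    ys, xs = np.ogrid[:attention.shape[0], :attention.shape[1]]
    mask = (xs - peak_x) ** 2 + (ys - peak_y) ** 2 <= radius ** 2
    return mask, (int(peak_x), int(peak_y))

# Hypothetical 5 x 7 basic attention degree image with its peak at x = 3, y = 2.
previous_attention = np.zeros((5, 7))
previous_attention[2, 3] = 1.0
region_mask, peak = max_attention_region(previous_attention, radius=1)
```

A mask of this kind is a convenient input for the occlusion step that follows, which needs to know which pixels belong to MSR(i−1).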

The maximum basic attention degree region occlusion image extraction unit 32 receives the basic attention degree image calculated from the current input image, the maximum basic attention degree regions calculated from the current and earlier input images, and the attention degree instantaneous suppression image that the attention degree instantaneous suppression image generation unit 34 output one time step earlier. It extracts a maximum basic attention degree region occlusion image, which is an image that newly occludes the maximum basic attention degree region, and outputs it.
The method of extracting the maximum basic attention degree region occlusion image is not particularly limited; in this embodiment it is extracted by the following method.

It is assumed that the attention degree instantaneous suppression image ID(i−1) has already been obtained by the processing of the attention degree instantaneous suppression image generation unit 34 one time step earlier.
Let the current input image be the i-th input image. The current maximum basic attention degree region occlusion image ID_1(i) is then generated by updating ID(i−1) so that it newly occludes the maximum basic attention degree region MSR(i−1) of one time step earlier, as in (Equation 28) to (Equation 29) below, where mod denotes the remainder of a division.

Here, μ (0 < μ ≤ 1) is a coefficient expressing the degree to which the maximum basic attention degree region is occluded; when μ = 1, the region is completely occluded regardless of the magnitude of the basic attention degree in it. Δt_I ≥ 1 is a constant that controls the interval between occlusions by the maximum basic attention degree region. In particular, when Δt_I = 1, occlusion by the maximum basic attention degree region is performed in every frame.

As another embodiment, the region occluded by the maximum basic attention degree region occlusion image ID_1(i−1) of one time step earlier can also be moved so that it follows the motion of that region. The position MSR(i; k) at time i (i ≥ k) of the maximum basic attention degree region MSR(k) calculated from the k-th input image is calculated as in (Equation 30) to (Equation 32) below.

Here, MSR(i; i) = MSR(i). The maximum basic attention degree region occlusion image ID_1(i) is then calculated as in (Equation 33) to (Equation 35) below.

The first row of (Equation 33) above occludes the maximum basic attention degree region of one time step earlier, the second row moves the already occluded maximum basic attention degree regions using the pixel values of the motion feature images, and the third row releases the occlusion of the region occupied before the move in the second row.

The attention degree gradual recovery image extraction unit 33 extracts an attention degree gradual recovery image, which is an image that reduces the occlusion in the regions occluded by the maximum basic attention degree region occlusion image calculated by the maximum basic attention degree region occlusion image extraction unit 32, and outputs the attention degree gradual recovery image.
In this embodiment, the attention degree gradual recovery image ID_2(i) is an image in which every pixel value is α (0 ≤ α ≤ 1).

The attention degree instantaneous suppression image generation unit 34 receives the maximum basic attention degree region occlusion image calculated by the maximum basic attention degree region occlusion image extraction unit 32 and the attention degree gradual recovery image output by the attention degree gradual recovery image extraction unit 33, generates an attention degree instantaneous suppression image by integrating these images, and outputs the attention degree instantaneous suppression image.
In this embodiment, the attention degree instantaneous suppression image ID(i) is obtained as in (Equation 37) below, where min denotes the smallest of the values inside the braces {}.

The post-instantaneous-suppression attention degree image generation unit 35 receives the attention degree instantaneous suppression image and the basic attention degree image, generates the post-instantaneous-suppression attention degree image by integrating these images, and outputs it.
In this embodiment, the post-instantaneous-suppression attention degree image S_I(i) is obtained as in (Equation 38) below.

Here, ω_I(i) ≥ 0 is a coefficient expressing the weight given to the attention degree instantaneous suppression image ID(i).

As described above, the attention degree instantaneous suppression unit 3 extracts the post-instantaneous-suppression attention degree image and outputs it.

The attention degree video output unit 5 executes the basic attention degree image extraction unit 1 through the attention degree instantaneous suppression unit 3 in order, repeatedly for each input image, extracts the post-instantaneous-suppression attention degree video, which is the time series of the post-instantaneous-suppression attention degree images obtained in this way, and outputs it as the attention degree video.

FIG. 2 shows an operation example of this embodiment.
In the figure, the top row (a) shows input images, the middle row (b) shows attention degree instantaneous suppression images, and the bottom row (c) shows post-instantaneous-suppression attention degree images, each arranged in chronological order from the left.

[Second Embodiment]
FIG. 3 is a functional block diagram of the attention region extraction device according to the second embodiment of the present invention.
The attention region extraction device of this embodiment comprises a basic attention degree image extraction unit 1, an attention degree instantaneous suppression unit 3, an attention degree gradual suppression unit 4, and an attention degree video output unit 5. It receives the input video targeted for attention degree extraction and outputs an attention degree video, which is a video displaying the regions of the input images that attract a high degree of attention. The attention region extraction device can also be configured without the attention degree instantaneous suppression unit 3, using only the basic attention degree image extraction unit 1, the attention degree gradual suppression unit 4, and the attention degree video output unit 5. In the figure, components identical to those of the first embodiment carry the same reference numerals, and their description is omitted. The basic attention degree image extraction unit 1 and the attention degree instantaneous suppression unit 3 are the same as in the first embodiment.

The attention level gradual suppression unit 4 receives the saliency images calculated from several of the current and earlier input images, together with either the basic attention level image or the attention level image after instantaneous suppression. It extracts the regions of the saliency images that have salient values in the time-axis direction, emphasizes the corresponding regions of the basic attention level image or the attention level image after instantaneous suppression, and suppresses the remaining regions. The result is extracted and output as the attention level image after gradual suppression.

The method of extracting the attention level image after gradual suppression is not particularly limited. This embodiment describes the case in which the attention level gradual suppression unit 4 comprises an attention level gradual occlusion image generation unit 41, an attention level instantaneous recovery image generation unit 42, an attention level gradual suppression image generation unit 43, and a post-gradual-suppression attention level image generation unit 44.

The attention level gradual occlusion image generation unit 41 generates and outputs an attention level gradual occlusion image, which is an image that gradually occludes the basic attention level image.
The method of generating the attention level gradual occlusion image is not particularly limited. In this embodiment, the attention level gradual occlusion image GD_1(i) is generated as in (Equation 39) by decreasing all pixel values of the basic attention level image S(i) by β (0 < β ≤ 1) at each time point.

The attention level instantaneous recovery image generation unit 42 receives the saliency images calculated from several of the current and earlier input images and extracts the regions of those saliency images that have salient values in the time-axis direction. It then generates and outputs an attention level instantaneous recovery image, which is an image that releases the suppression of the basic attention level in the corresponding regions of the basic attention level image or the attention level image after instantaneous suppression.
The method of generating the attention level instantaneous recovery image is not particularly limited. This embodiment describes the case in which the attention level instantaneous recovery image generation unit 42 comprises a temporal saliency image generation unit 421 and a temporal saliency image binarization unit 422.

The temporal saliency image generation unit 421 receives the saliency images calculated from several of the current and earlier input images, and generates and outputs a temporal saliency image, which is an image showing the regions of those saliency images that have salient values in the time-axis direction.
The method of generating the temporal saliency image is not particularly limited. Two methods are described below.

The first temporal saliency image generation method is based on the method described in L. Itti and P. Baldi, "A principled approach to detecting surprising events in video," in Proc. Conference on Computer Vision and Pattern Recognition (CVPR), pp. 631-637, June 2005 (Non-Patent Document 5), and generates the temporal saliency image from the degree of agreement with a predefined probability model.

For the probability density function P_γ(λ: ρ_1, ρ_2) of the gamma distribution shown in (Equation 40), assume that its parameters, the gamma distribution coefficients (ρ_1, ρ_2) = (ρ_1(i-1), ρ_2(i-1)), have already been determined by the output of this processing unit for the input image one time point earlier.

Here, Γ(·) is the gamma function.

The basic idea of the first temporal saliency image generation method is that the gamma distribution coefficients ρ_1 and ρ_2 are held for each pixel of each resolution difference image, and the temporal saliency image is generated while these coefficients are updated according to the pixel values of the resolution difference image at each time point. For simplicity, the processing is described below using a (c, s)-level luminance resolution difference image RS_I(i; c, s) as an example.

First, for each pixel (x, y) of the luminance resolution difference image RS_I(i; c, s), the gamma distribution estimated input coefficient λ_I(i)_{(x,y)}, which is an estimate of the input λ of the gamma distribution at that pixel, is derived from the pixel values around the pixel of interest in the luminance resolution difference image (in both the spatial and the temporal sense) and from the gamma distribution coefficients ρ_{1,I}(i-1)_{(x,y)} and ρ_{2,I}(i-1)_{(x,y)} corresponding to the pixel of interest. Next, using the derived λ_I(i)_{(x,y)}, the gamma distribution coefficients ρ_{1,I}(i-1)_{(x,y)} and ρ_{2,I}(i-1)_{(x,y)} are updated as in (Equation 41) and (Equation 42).

Here, ζ (0 < ζ < 1) is a coefficient expressing how strongly the history is reflected in the coefficient update. From the gamma distribution coefficients derived by the above processing, the (c, s)-level luminance temporal saliency image SP_I(i; c, s), which is an image representing the saliency of the luminance component in the time direction, is derived as follows.

Here, D(P‖P') is the Kullback-Leibler divergence between the probability density functions P and P'. Ψ(x) is called the digamma function and is given by (Equation 45).

For simplicity of notation, the I denoting the luminance component and the (x, y) denoting the pixel of interest are omitted.
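
As an illustration only, the divergence D(P‖P') between two gamma densities and the digamma function Ψ(x) can be evaluated as in the following Python sketch. It assumes that ρ_1 is the shape and ρ_2 the rate of the gamma distribution and uses the standard closed form for that parameterization. The function names digamma and kl_gamma are hypothetical, and the sketch does not claim to reproduce the exact expressions or argument order of the equations of this embodiment.

```python
import math


def digamma(x: float) -> float:
    """Approximate psi(x) = d/dx ln Gamma(x) for x > 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    result = 0.0
    # Recurrence psi(x) = psi(x + 1) - 1/x, applied until x is large
    # enough for the asymptotic series below to be accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 * inv
    result -= inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def kl_gamma(shape_p: float, rate_p: float,
             shape_q: float, rate_q: float) -> float:
    """KL divergence D(P || Q) between two gamma densities.

    Assumption: each density is parameterized by (shape, rate),
    standing in for the coefficients (rho_1, rho_2) of the text.
    """
    return ((shape_p - shape_q) * digamma(shape_p)
            - math.lgamma(shape_p) + math.lgamma(shape_q)
            + shape_q * (math.log(rate_p) - math.log(rate_q))
            + shape_p * (rate_q - rate_p) / rate_p)
```

Under these assumptions, a per-pixel temporal saliency value would be the divergence between the gamma density before and after the coefficient update; identical coefficients give a divergence of zero, and the value grows as the update moves the coefficients further.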

The luminance temporal saliency image SP_I(i) derived in this way is integrated with the color temporal saliency image SP_C(i), the direction temporal saliency image SP_O(i), the flicker temporal saliency image SP_F(i), and the motion temporal saliency image SP_M(i), which are derived by the same processing, to generate the temporal saliency image SP(i). Specifically, it is generated as in (Equation 46) to (Equation 51).

In contrast, the second temporal saliency image generation method generates the temporal saliency image by weighted difference processing along the time axis, based on a method similar to that of the resolution difference image extraction unit 13.
First, from the saliency images CM_j(i-t) (0 ≤ t ≤ n_T) of each type (luminance, color, and so on) from the current time point back to n_T time points earlier, the temporal response image T_j(i) is calculated as in (Equation 52) by difference processing in the time-axis direction weighted by a Gaussian distribution.

Here, η_σ(t) is the probability density function of a Gaussian distribution with mean 0 and variance σ^2. The temporal saliency image SP(i) is then calculated as in (Equation 53) by integrating these temporal response images.
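
The following Python sketch shows one plausible reading of the Gaussian-weighted temporal difference, given only as an illustration. It assumes that the temporal response is the absolute difference between the current saliency image and a Gaussian-weighted average of the earlier ones; the exact form of the embodiment's equation is not reproduced here. The function name temporal_response and the default value of sigma are hypothetical.

```python
import numpy as np


def temporal_response(history, sigma=1.0):
    """Gaussian-weighted temporal difference of saliency images.

    history[t] holds the saliency image CM_j(i - t), so index 0 is the
    current frame and at least one earlier frame must be present.
    Assumption: the response is |current - weighted mean of the past|.
    """
    frames = np.asarray(history, dtype=float)
    if frames.shape[0] < 2:
        raise ValueError("need the current frame and at least one past frame")
    t = np.arange(1, frames.shape[0])
    # Zero-mean Gaussian weights over the time lag, normalized to sum to 1.
    weights = np.exp(-(t ** 2) / (2.0 * sigma ** 2))
    weights /= weights.sum()
    # Weighted average of the past frames, one value per pixel.
    past = np.tensordot(weights, frames[1:], axes=1)
    return np.abs(frames[0] - past)
```

With this reading, a pixel whose saliency has been constant over the window yields a response of zero, while a pixel that changes abruptly in the current frame yields a large response.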

The temporal saliency image binarization unit 422 receives the temporal saliency image, generates the attention level instantaneous recovery image by binarizing it, and outputs the attention level instantaneous recovery image.
In this embodiment, the temporal saliency image SP(i) is binarized to generate the attention level instantaneous recovery image GD_2(i) as in (Equation 54).

Here, θ is the threshold used to binarize the temporal saliency image and is determined in advance.
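
The binarization can be sketched in Python as follows, as an illustration only. The sketch sets θ to a fixed fraction of the maximum of SP(i), as in the experimental settings of this document (θ = 0.25 times the maximum). It is an assumption that pixels at or above θ map to 1 and all others to 0, and that an all-zero map yields no recovery anywhere. The function name binarize_temporal_saliency is hypothetical.

```python
import numpy as np


def binarize_temporal_saliency(sp, ratio=0.25):
    """Binarize a temporal saliency image SP(i) into a 0/1 mask.

    Assumptions: theta = ratio * max(SP), pixels with SP >= theta
    become 1, and an all-zero input gives an all-zero mask.
    """
    sp = np.asarray(sp, dtype=float)
    peak = sp.max()
    if peak <= 0:
        return np.zeros(sp.shape, dtype=np.uint8)
    theta = ratio * peak
    return (sp >= theta).astype(np.uint8)
```

The resulting mask marks the regions in which the gradual suppression would be released.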

As described above, the attention level instantaneous recovery image generation unit 42 generates and outputs the attention level instantaneous recovery image GD_2(i).

The attention level gradual suppression image generation unit 43 receives the attention level gradual occlusion image and the attention level instantaneous recovery image, generates the attention level gradual suppression image by integrating them, and outputs it.
In this embodiment, the attention level gradual suppression image GD(i) is obtained as in (Equation 55).

The post-gradual-suppression attention level image generation unit 44 receives the attention level gradual suppression image together with either the basic attention level image or the attention level image after instantaneous suppression, generates the attention level image after gradual suppression by integrating them, and outputs it.
The method of generating the attention level image after gradual suppression is not particularly limited. In this embodiment, the attention level image after instantaneous suppression S_I(i) and the attention level gradual suppression image GD(i) are integrated to generate the attention level image after gradual suppression S_L(i) as in (Equation 56) to (Equation 57).

Here, ω_G(i) ≥ 0 is a coefficient expressing the weight given to the attention level gradual suppression image GD(i). As shown above, the expression written in terms of the attention level image after instantaneous suppression S_I(i), output by the attention level instantaneous suppression unit 3, can be computed from an expression in terms of the basic attention level image S(i), output by the basic attention level image extraction unit 1. When ω_I(i) = 0, the result is equivalent to not using the attention level instantaneous suppression unit 3.

As described above, the attention level gradual suppression unit 4 extracts and outputs the attention level image after gradual suppression.

The attention level video output unit 5 executes the basic attention level image extraction unit 1 through the attention level gradual suppression unit 4 in order for each input image, extracts the attention level video after gradual suppression, which is the time series of the resulting attention level images after gradual suppression, and outputs it as the attention level video.

FIG. 4 shows an operation example of this embodiment. In the figure, (a) shows the input images, (b) the attention level gradual suppression images, and (c) the attention level images after gradual suppression, each arranged in chronological order from the left.

[Third Embodiment]
FIG. 5 is a functional block diagram of the attention region extraction apparatus according to the third embodiment of the present invention.
The attention region extraction apparatus of this embodiment comprises a basic attention level image extraction unit 1, a saliency image integration ratio calculation unit 2, an attention level instantaneous suppression unit 3, an attention level gradual suppression unit 4, and an attention level video output unit 5. It receives an input video that is the target of attention level extraction and outputs an attention level video, which is a video displaying the regions of high attention level in the input images. In the figure, components identical to those of the first and second embodiments carry the same reference numerals, and their description is omitted. The basic attention level image extraction unit 1, the attention level instantaneous suppression unit 3, the attention level gradual suppression unit 4, and the attention level video output unit 5 are the same as in the first or second embodiment.

The saliency image integration ratio calculation unit 2 receives the set of saliency images and the basic attention level image, each extracted from the current input image, and extracts the maximum basic attention level region from the basic attention level image. For each saliency image, it then calculates the values within the region corresponding to the extracted maximum basic attention level region, determines from their magnitude the saliency image integration ratio, which is the weight of that saliency image, and outputs the set of saliency image integration ratios.

The method of calculating the saliency image integration ratio is not particularly limited. In this embodiment, based on the method described in V. Navalpakkam and L. Itti, "Optimal cue selection strategy," in Advances in Neural Information Processing Systems (NIPS), pp. 987-994, December 2005 (Non-Patent Document 6), the saliency image integration ratio is updated sequentially using the pixel values of each saliency image within the maximum basic attention level region.

Assume that the saliency image integration ratio w_j(i-1) corresponding to each saliency image CM_j(i-1) (j = I, C, O, F, M) has already been obtained by this processing unit for the input image one time point earlier. First, the maximum basic attention level region MSR(i) is extracted from the basic attention level image S(i). The extraction method is the same as that of the maximum basic attention level region detection unit 31 described above. Next, the saliency image integration ratio w_j(i) corresponding to each saliency image CM_j(i) is determined as follows.

Here, δ is a constant giving the proportion of the history reflected in the weight update.

Experimental data obtained with the embodiments of the present invention are presented below.
Six input videos were prepared, each 640 × 480 pixels in size and 8 to 15 seconds long. The following values were used for the symbols introduced in the embodiments.

n_φ = 4, n_F = 3, σ = 1.25, n_l = 8, L_c = {2, 3, 4}, L_s = {c + 3, c + 4} (c ∈ L_c), n_A = 32 × 24 = 768, ε = 25, μ = 1.0, t_I = 10, α = 1/t_I = 0.1, β = 0.0025, t_T = 8, θ = 0.25 max_{(x,y)} SP(i)_{(x,y)}, δ = 0.1

To confirm the effect of the present invention, the attention level videos obtained by the embodiments of the present invention and by known methods were compared in terms of how well they simulate human visual characteristics. The position of the line of sight while a person is actually watching the input video was adopted as the quantity representing human visual characteristics. The input videos were presented to five subjects, and each subject's gaze position in the input video was measured continuously with an existing gaze measurement device. Each input video was presented to each subject twice, which yielded two gaze position time series per subject and input video. By associating these time series with the frames of the input video (that is, the input images) while keeping the timing consistent, two gaze positions were obtained for each subject and each input image.

The attention level at the subject's gaze position was adopted as the measure of how well human visual characteristics are simulated. For the attention level video extracted from an input video by the method of the present invention or by a known method, consider the attention level image S_F(i; k), which is the frame of the attention level video corresponding to the input image IN(i; k) (k = 1, 2, ..., 6, indexing the videos). In the first embodiment of the present invention, the attention level image equals the attention level image after instantaneous suppression, that is, S_F(i; k) = S_I(i; k); in the second embodiment, it equals the attention level image after gradual suppression, that is, S_F(i; k) = S_L(i; k). Writing the gaze position of subject n (n = 1, 2, ..., 5) in the input image IN(i; k) as (x(i; k, n), y(i; k, n)), the evaluation value V(i; k, n) of the attention level image S_F(i; k) with subject n as the "teacher" is defined by (Equation 60).

The denominator on the right-hand side of (Equation 60) normalizes the attention level image. As shown in (Equation 61), the evaluation value V(k) of the attention level video {S_F(i; k)}_i is obtained by summing the evaluation values of the individual attention level images and then averaging over the subjects. n_E is the number of subjects (5 in this evaluation).
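
The evaluation can be sketched in Python as follows, as an illustration only. The per-frame score is the attention level at the gaze position divided by a normalizer; this sketch assumes the normalizer is the sum of all pixels of the attention level image, which the text does not confirm. It also assumes one gaze trajectory per subject. The function names frame_score and video_score are hypothetical.

```python
import numpy as np


def frame_score(attention, gaze_xy):
    """Attention level at the gaze position, normalized by the image.

    Assumption: the normalizer is the sum of all pixel values.
    """
    x, y = gaze_xy
    total = attention.sum()
    if total == 0:
        return 0.0
    return float(attention[y, x] / total)


def video_score(attention_frames, gaze):
    """Sum the per-frame scores over frames, then average over subjects.

    gaze[n][i] is the (x, y) gaze position of subject n at frame i.
    """
    per_subject = [
        sum(frame_score(a, g) for a, g in zip(attention_frames, subject_gaze))
        for subject_gaze in gaze
    ]
    return sum(per_subject) / len(per_subject)
```

Under these assumptions, a method scores higher when its attention level images concentrate their mass at the positions the subjects actually looked at.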

Using these evaluation values, the methods according to the first to third embodiments of the present invention were compared with known methods, namely the method described in Non-Patent Document 1 and the method described in Non-Patent Document 2. FIG. 6 is a graph comparing the evaluation values (labeled "NETR Value" in the figure) for each input video (Video 1 to 6) obtained with the method of Non-Patent Document 1, the method of Non-Patent Document 2, and the methods of the first, second, and third embodiments (Embodiments 1 to 3 of the present invention). FIG. 7 is a graph comparing the evaluation values averaged over all input videos for each method. From the left, FIG. 7 shows the average evaluation values of the method of Non-Patent Document 1 (Still image algorithm), the method of Non-Patent Document 2 (Moving algorithm), the method of the first embodiment (Case 1), the method of the second embodiment (Case 2), and the method of the third embodiment (Case 3).

The methods of the first to third embodiments are evaluated here as three settings of the post-gradual-suppression attention level image generation unit 44. In the first setting (the method of the first embodiment in the figures), (ω_I(i), ω_G(i)) = (1, 0) for all i. In the second setting (the method of the second embodiment in the figures), (ω_I(i), ω_G(i)) = (0, 1) for all i. In the third setting (the method of the third embodiment in the figures), (ω_I(i), ω_G(i)) = (1, 1) for all i.

As FIG. 7 shows, the setting of the second embodiment gave the best evaluation value when averaged over the input videos. As FIG. 6 shows, the setting of the second embodiment also exceeded the other methods for every input video, and the settings of the first and third embodiments exceeded the known methods for some input videos.

The attention region extraction apparatus described above contains a computer system. The operations of the basic attention level image extraction unit 1, the saliency image integration ratio calculation unit 2, the attention level instantaneous suppression unit 3, the attention level gradual suppression unit 4, and the attention level video output unit 5 are stored as a program on a computer-readable recording medium, and the processing described above is performed when the computer system reads and executes this program. The computer system referred to here includes a CPU, various memories, an OS, and hardware such as peripheral devices.

When a WWW system is used, the "computer system" also includes the homepage providing environment (or display environment).
The "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or to a storage device such as a hard disk built into the computer system. The "computer-readable recording medium" further includes media that hold the program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and media that hold the program for a certain period, such as the volatile memory inside the computer system acting as the server or client in that case. The program may implement only part of the functions described above, or may implement them in combination with a program already recorded in the computer system.

FIG. 1 is a block diagram showing the configuration of the attention region extraction apparatus according to the first embodiment of the present invention.
FIG. 2 is a diagram showing an operation example of the attention region extraction apparatus shown in FIG. 1.
FIG. 3 is a block diagram showing the configuration of the attention region extraction apparatus according to the second embodiment of the present invention.
FIG. 4 is a diagram showing an operation example of the attention region extraction apparatus shown in FIG. 3.
FIG. 5 is a block diagram showing the configuration of the attention region extraction apparatus according to the third embodiment of the present invention.
FIG. 6 is a graph showing the evaluation value for each input video obtained by the methods of the present invention and of the prior art.
FIG. 7 is a graph showing the average of the evaluation values of each method shown in FIG. 6.

Explanation of symbols

1 ... basic attention level image extraction unit
11 ... basic feature image extraction unit
111 ... luminance feature image extraction unit
112 ... color feature image extraction unit
113 ... direction feature image extraction unit
114 ... flicker feature image extraction unit
115 ... motion feature image extraction unit
12 ... multi-resolution image extraction unit
13 ... resolution difference image extraction unit
14 ... saliency image extraction unit
141 ... resolution difference image normalization unit
142 ... normalized resolution difference image accumulation unit
15 ... saliency image integration unit
2 ... saliency image integration ratio calculation unit
3 ... attention level instantaneous suppression unit
31 ... maximum basic attention level region detection unit
32 ... maximum basic attention level region occlusion image extraction unit
33 ... attention level gradual recovery image extraction unit
34 ... attention level instantaneous suppression image generation unit
35 ... post-instantaneous-suppression attention level image generation unit
4 ... attention level gradual suppression unit
41 ... attention level gradual occlusion image generation unit
42 ... attention level instantaneous recovery image generation unit
43 ... attention level gradual suppression image generation unit
44 ... post-gradual-suppression attention level image generation unit
5 ... attention level video output unit

Claims

An attention region extraction method for extracting, from a target input video, an attention degree video, which is a video displaying spatio-temporal regions having salient characteristics in the input video, the method comprising:
a basic attention degree image extraction process of extracting, from a frame constituting the input video, a basic attention degree image, which is an image displaying spatial regions having salient characteristics in that frame;
one or both of: an attention degree instantaneous suppression process of extracting a post-instantaneous suppression attention degree image by suppressing, in the basic attention degree image extracted from the current frame of the input video by the basic attention degree image extraction process, the maximum basic attention degree region, which is the region where the basic attention degree (the value of each pixel) is largest in the basic attention degree image extracted from a frame preceding the current frame by the basic attention degree image extraction process;
and
an attention degree gradual suppression process of extracting a post-gradual suppression attention degree image by extracting a region having a salient value in the time-axis direction from the basic attention degree images calculated from several frames preceding the current frame of the input video, and by emphasizing the extracted region and suppressing the other regions in either the basic attention degree image extracted from the current frame by the basic attention degree image extraction process or the post-instantaneous suppression attention degree image extracted from the current frame by the attention degree instantaneous suppression process; and
an attention degree video output process of extracting, by repeatedly executing the basic attention degree image extraction process and one or both of the attention degree instantaneous suppression process and the attention degree gradual suppression process sequentially for each frame of the input video, a post-suppression attention degree video, which is the time series of the post-instantaneous suppression attention degree images or of the post-gradual suppression attention degree images, and outputting it as the attention degree video.
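
The per-frame control flow recited in this claim can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name `attention_video`, the callables passed to it, and the `window` parameter (how many preceding frames are kept) are all hypothetical, and the actual extraction and suppression steps are supplied by the caller.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")
Image = TypeVar("Image")


def attention_video(
    frames: Sequence[Frame],
    basic_attention: Callable[[Frame], Image],
    instantaneous: Optional[Callable[[Image, Image], Image]] = None,
    gradual: Optional[Callable[[List[Image], Image], Image]] = None,
    window: int = 5,  # assumed number of preceding frames kept as history
) -> List[Image]:
    """Run the per-frame loop: basic extraction, then one or both suppressions."""
    if instantaneous is None and gradual is None:
        raise ValueError("at least one suppression process is required")
    prev_basic = None          # basic attention degree image of the preceding frame
    history: List[Image] = []  # basic attention degree images of preceding frames
    video: List[Image] = []
    for frame in frames:
        basic = basic_attention(frame)
        result = basic
        # Instantaneous suppression needs the preceding frame's basic image.
        if instantaneous is not None and prev_basic is not None:
            result = instantaneous(prev_basic, result)
        # Gradual suppression works on the basic image or, if present,
        # on the image already processed by instantaneous suppression.
        if gradual is not None and history:
            result = gradual(history, result)
        video.append(result)
        prev_basic = basic
        history = (history + [basic])[-window:]
    return video
```

The first frame is passed through unsuppressed because no preceding frame exists yet; that choice is an assumption of this sketch.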

The attention region extraction method according to claim 1, wherein the basic attention degree image extraction process comprises:
a basic feature image extraction process of extracting a plurality of types of basic feature images from a frame of the input video;
a multi-resolution image extraction process of extracting, for each type of basic feature image extracted by the basic feature image extraction process, a multi-resolution image, which is a multi-resolution representation of that basic feature image;
a resolution difference image extraction process of extracting, for each type of multi-resolution image extracted by the multi-resolution image extraction process, a plurality of resolution difference images, which are differences between images of different resolutions;
a saliency image extraction process of extracting, for each type of resolution difference image extracted by the resolution difference image extraction process, a saliency image by integrating the resolution difference images of different resolutions; and
a saliency image integration process of extracting the basic attention degree image by integrating the plurality of types of saliency images extracted by the saliency image extraction process,
and wherein the attention degree gradual suppression process extracts the post-gradual suppression attention degree image using the saliency images instead of the basic attention degree image.
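
As a rough illustration of the multi-resolution steps in this claim, the sketch below builds a pyramid for one basic feature image, takes differences between resolution levels, normalizes and sums them into a saliency image, and averages several saliency images into a basic attention degree image. The 2x2 averaging, nearest-neighbour upsampling, min-max normalization, and equal-weight averaging are assumptions chosen for brevity; the claim does not fix any of them, and all function names are hypothetical.

```python
def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (assumes even dimensions)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def upsample(img, h, w):
    """Nearest-neighbour upsampling to h x w."""
    sh, sw = len(img), len(img[0])
    return [[img[y * sh // h][x * sw // w] for x in range(w)] for y in range(h)]


def normalize(img):
    """Min-max normalization to [0, 1]; a flat image maps to all zeros."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        return [[0.0] * len(img[0]) for _ in img]
    return [[(v - lo) / (hi - lo) for v in row] for row in img]


def saliency_image(feature, levels=3):
    """Sum the normalized differences between every pair of pyramid levels."""
    h, w = len(feature), len(feature[0])
    pyramid = [feature]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    total = [[0.0] * w for _ in range(h)]
    for fine in range(levels - 1):
        for coarse in range(fine + 1, levels):
            a = upsample(pyramid[fine], h, w)
            b = upsample(pyramid[coarse], h, w)
            diff = normalize([[abs(p - q) for p, q in zip(ra, rb)]
                              for ra, rb in zip(a, b)])
            total = [[t + d for t, d in zip(rt, rd)] for rt, rd in zip(total, diff)]
    return total


def basic_attention_image(saliency_images):
    """Integrate several saliency images by equal-weight averaging (assumed)."""
    n = len(saliency_images)
    h, w = len(saliency_images[0]), len(saliency_images[0][0])
    return [[sum(s[y][x] for s in saliency_images) / n for x in range(w)]
            for y in range(h)]
```

With a single bright pixel on a dark background, the saliency image peaks at that pixel, because it differs from its surroundings at every pair of resolutions.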

The attention region extraction method according to claim 1 or 2, further comprising:
a saliency image integration ratio calculation process of extracting the maximum basic attention degree region from the basic attention degree image extracted by the basic attention degree image extraction process, calculating, for each of the plurality of types of saliency images, the value in the region corresponding to the maximum basic attention degree region, and determining, from the magnitude of that value, a saliency image integration ratio, which is the weight of the corresponding saliency image,
wherein the basic attention degree image extraction process extracts the basic attention degree image by weighting and integrating the saliency images using the saliency image integration ratios calculated by the saliency image integration ratio calculation process for the preceding frame of the input video.
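
One way to picture the integration ratio calculation is sketched below. It assumes that the maximum basic attention degree region is the set of pixels within a relative threshold of the peak, that the "value in the region" is the mean of each saliency image there, and that the ratios are those means normalized to sum to one. None of these choices is stated in the claim, and the function names and the `rel_threshold` parameter are hypothetical.

```python
def max_attention_region(attention, rel_threshold=0.9):
    """Pixels whose basic attention degree is at least rel_threshold * peak (assumed rule)."""
    peak = max(max(row) for row in attention)
    return [(y, x) for y, row in enumerate(attention)
            for x, v in enumerate(row) if v >= rel_threshold * peak]


def integration_ratios(saliency_images, region):
    """Weight each saliency type by its mean value inside the region."""
    means = {name: sum(img[y][x] for y, x in region) / len(region)
             for name, img in saliency_images.items()}
    total = sum(means.values())
    if total == 0:
        # No saliency type responds in the region: fall back to equal weights.
        return {name: 1.0 / len(means) for name in means}
    return {name: m / total for name, m in means.items()}


def integrate(saliency_images, ratios):
    """Weighted sum of the saliency images, used for the following frame."""
    first = next(iter(saliency_images.values()))
    h, w = len(first), len(first[0])
    return [[sum(ratios[name] * img[y][x] for name, img in saliency_images.items())
             for x in range(w)] for y in range(h)]
```

Under these assumptions, a saliency type that responds strongly where attention peaked in one frame contributes more to the basic attention degree image of the next frame.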

The attention region extraction method according to claim 3, wherein the saliency image integration ratio calculation process takes the saliency image integration ratio calculated for the immediately preceding frame of the input video as an initial value, and obtains a new saliency image integration ratio by updating that initial value using, as a difference value, the value calculated for each saliency image in the maximum basic attention degree region.
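
The incremental update in this claim admits more than one reading. The sketch below shows one possible interpretation only: the previous ratio is the starting point, the value measured in the maximum basic attention degree region is added as a scaled increment, and the result is renormalized. The `step` parameter, the renormalization, and the function name `update_ratios` are assumptions, not taken from the claim.

```python
def update_ratios(prev_ratios, region_values, step=0.1):
    """Update integration ratios from the previous frame's ratios.

    prev_ratios:   ratio per saliency type from the preceding frame
    region_values: value of each saliency image in the maximum region
    step:          assumed scale of the increment
    """
    raw = {name: max(prev_ratios[name] + step * region_values[name], 0.0)
           for name in prev_ratios}
    total = sum(raw.values())
    if total == 0:
        # Nothing to normalize: keep the previous ratios unchanged.
        return dict(prev_ratios)
    return {name: v / total for name, v in raw.items()}
```

Starting from the previous ratios rather than recomputing them from scratch makes the weights change smoothly from frame to frame.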

The attention region extraction method according to any one of claims 1 to 4, wherein the attention degree instantaneous suppression process comprises:
a maximum basic attention degree region detection process of extracting the maximum basic attention degree region from the basic attention degree image extracted from a preceding frame of the input video by the basic attention degree image extraction process;
a maximum basic attention degree region occlusion image extraction process of extracting a maximum basic attention degree region occlusion image, which is an image that occludes the maximum basic attention degree region extracted by the maximum basic attention degree region detection process;
an attention degree gradual recovery image extraction process of extracting an attention degree gradual recovery image, which is an image that gradually reduces the occlusion in the maximum basic attention degree region occlusion image extracted by the maximum basic attention degree region occlusion image extraction process;
an attention degree instantaneous suppression image generation process of generating an attention degree instantaneous suppression image by integrating the maximum basic attention degree region occlusion image extracted by the maximum basic attention degree region occlusion image extraction process and the attention degree gradual recovery image extracted by the attention degree gradual recovery image extraction process; and
a post-instantaneous suppression attention degree image generation process of generating the post-instantaneous suppression attention degree image by integrating the attention degree instantaneous suppression image generated by the attention degree instantaneous suppression image generation process and the basic attention degree image of the current frame of the input video extracted by the basic attention degree image extraction process.
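
The five sub-processes of this claim can be sketched as below. The sketch makes several assumptions that the claim leaves open: the maximum region is a square of a given `radius` around the peak pixel, recovery is a constant multiplicative decay (`recovery`) of the earlier suppression, the occlusion and recovery images are integrated by a pixel-wise maximum, and the suppression is applied to the current frame multiplicatively. All names are hypothetical.

```python
def argmax2d(img):
    """Position (y, x) of the largest pixel value (first one on ties)."""
    return max(((y, x) for y in range(len(img)) for x in range(len(img[0]))),
               key=lambda p: img[p[0]][p[1]])


def occlusion_image(h, w, center, radius):
    """1.0 inside a square around the centre pixel, 0.0 elsewhere."""
    cy, cx = center
    return [[1.0 if abs(y - cy) <= radius and abs(x - cx) <= radius else 0.0
             for x in range(w)] for y in range(h)]


def instantaneous_suppression(prev_attention, curr_attention,
                              prev_suppression=None, recovery=0.5, radius=1):
    """Return (post-suppression image, suppression image for the next frame)."""
    h, w = len(curr_attention), len(curr_attention[0])
    # Detect the maximum region in the preceding frame and occlude it.
    occluded = occlusion_image(h, w, argmax2d(prev_attention), radius)
    # Earlier suppression recovers gradually (assumed multiplicative decay).
    if prev_suppression is None:
        prev_suppression = [[0.0] * w for _ in range(h)]
    recovered = [[recovery * v for v in row] for row in prev_suppression]
    # Integrate the occlusion and recovery images (assumed pixel-wise maximum).
    suppression = [[max(o, r) for o, r in zip(ro, rr)]
                   for ro, rr in zip(occluded, recovered)]
    # Apply the suppression image to the current basic attention degree image.
    suppressed = [[c * (1.0 - s) for c, s in zip(rc, rs)]
                  for rc, rs in zip(curr_attention, suppression)]
    return suppressed, suppression
```

Passing the returned suppression image back in as `prev_suppression` on the next frame lets a previously attended region regain its attention degree over several frames instead of immediately.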

The attention region extraction method according to any one of claims 1 to 5, wherein the attention degree gradual suppression process comprises:
an attention degree gradual occlusion image generation process of generating an attention degree gradual occlusion image, which is an image that gradually occludes the basic attention degree image extracted by the basic attention degree image extraction process;
an attention degree instantaneous recovery image generation process of extracting a region having a salient value in the time-axis direction from the basic attention degree images or saliency images calculated from several frames preceding the current frame of the input video, and generating an attention degree instantaneous recovery image, which is an image for canceling the suppression of the basic attention degree of that region in the corresponding basic attention degree image or post-instantaneous suppression attention degree image;
an attention degree gradual suppression image generation process of generating an attention degree gradual suppression image by integrating the attention degree gradual occlusion image generated by the attention degree gradual occlusion image generation process and the attention degree instantaneous recovery image generated by the attention degree instantaneous recovery image generation process; and
a post-gradual suppression attention degree image generation process of generating the post-gradual suppression attention degree image by integrating the basic attention degree image extracted by the basic attention degree image extraction process, or the post-instantaneous suppression attention degree image extracted by the attention degree instantaneous suppression process, with the attention degree gradual suppression image generated by the attention degree gradual suppression image generation process.
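
A minimal sketch of the gradual suppression steps follows. It assumes that occlusion grows by a fixed `rate` per frame, that a pixel counts as temporally salient when it departs from the mean of the preceding frames by more than `change_threshold`, that such a pixel has its occlusion reset to zero, and that the suppression is applied multiplicatively. These rules and all names are illustrative assumptions rather than the claimed method.

```python
def gradual_suppression(history, current, prev_occlusion,
                        rate=0.25, change_threshold=0.5):
    """Return (post-gradual-suppression image, updated suppression image).

    history:        attention images from several preceding frames
    current:        basic (or post-instantaneous suppression) image of this frame
    prev_occlusion: suppression image carried over from the preceding frame
    """
    h, w = len(current), len(current[0])
    suppressed = [[0.0] * w for _ in range(h)]
    occlusion = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            past_mean = sum(f[y][x] for f in history) / len(history)
            # Instantaneous recovery: a salient change along the time axis
            # cancels the suppression at this pixel.
            if abs(current[y][x] - past_mean) > change_threshold:
                occlusion[y][x] = 0.0
            else:
                # Gradual occlusion: suppression builds up frame by frame.
                occlusion[y][x] = min(1.0, prev_occlusion[y][x] + rate)
            suppressed[y][x] = current[y][x] * (1.0 - occlusion[y][x])
    return suppressed, occlusion
```

In this reading, regions that stay unchanged fade from the output over time, while a region that changes abruptly is restored at once.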

An attention region extraction device for extracting, from a target input video, an attention degree video, which is a video displaying spatio-temporal regions having salient characteristics in the input video, the device comprising:
a basic attention degree image extraction unit that extracts, from a frame constituting the input video, a basic attention degree image, which is an image displaying spatial regions having salient characteristics in that frame;
one or both of: an attention degree instantaneous suppression unit that extracts a post-instantaneous suppression attention degree image by suppressing, in the basic attention degree image extracted from the current frame of the input video by the basic attention degree image extraction unit, the maximum basic attention degree region, which is the region where the basic attention degree (the value of each pixel) is largest in the basic attention degree image extracted from a frame preceding the current frame by the basic attention degree image extraction unit;
and
an attention degree gradual suppression unit that extracts a post-gradual suppression attention degree image by extracting a region having a salient value in the time-axis direction from the basic attention degree images calculated from several frames preceding the current frame of the input video, and by emphasizing the extracted region and suppressing the other regions in either the basic attention degree image extracted from the current frame by the basic attention degree image extraction unit or the post-instantaneous suppression attention degree image extracted from the current frame by the attention degree instantaneous suppression unit; and
an attention degree video output unit that extracts, over the frames of the input video, a post-suppression attention degree video, which is the time series of the post-instantaneous suppression attention degree images extracted by the attention degree instantaneous suppression unit or of the post-gradual suppression attention degree images extracted by the attention degree gradual suppression unit, and outputs it as the attention degree video.

A computer program for causing a computer, used as an attention region extraction device that extracts, from a target input video, an attention degree video, which is a video displaying spatio-temporal regions having salient characteristics in the input video, to execute:
a basic attention degree image extraction process of extracting, from a frame constituting the input video, a basic attention degree image, which is an image displaying spatial regions having salient characteristics in that frame;
one or both of: an attention degree instantaneous suppression process of extracting a post-instantaneous suppression attention degree image by suppressing, in the basic attention degree image extracted from the current frame of the input video by the basic attention degree image extraction process, the maximum basic attention degree region, which is the region where the basic attention degree (the value of each pixel) is largest in the basic attention degree image extracted from a frame preceding the current frame by the basic attention degree image extraction process;
and
an attention degree gradual suppression process of extracting a post-gradual suppression attention degree image by extracting a region having a salient value in the time-axis direction from the basic attention degree images calculated from several frames preceding the current frame of the input video, and by emphasizing the extracted region and suppressing the other regions in either the basic attention degree image extracted from the current frame by the basic attention degree image extraction process or the post-instantaneous suppression attention degree image extracted from the current frame by the attention degree instantaneous suppression process; and
an attention degree video output process of extracting, by repeatedly executing the basic attention degree image extraction process and one or both of the attention degree instantaneous suppression process and the attention degree gradual suppression process sequentially for each frame of the input video, a post-suppression attention degree video, which is the time series of the post-instantaneous suppression attention degree images or of the post-gradual suppression attention degree images, and outputting it as the attention degree video.

A computer-readable recording medium on which is recorded a computer program for causing a computer, used as an attention region extraction device that extracts, from a target input video, an attention degree video, which is a video displaying spatio-temporal regions having salient characteristics in the input video, to execute:
a basic attention degree image extraction process of extracting, from a frame constituting the input video, a basic attention degree image, which is an image displaying spatial regions having salient characteristics in that frame;
one or both of: an attention degree instantaneous suppression process of extracting a post-instantaneous suppression attention degree image by suppressing, in the basic attention degree image extracted from the current frame of the input video by the basic attention degree image extraction process, the maximum basic attention degree region, which is the region where the basic attention degree (the value of each pixel) is largest in the basic attention degree image extracted from a frame preceding the current frame by the basic attention degree image extraction process;
and
an attention degree gradual suppression process of extracting a post-gradual suppression attention degree image by extracting a region having a salient value in the time-axis direction from the basic attention degree images calculated from several frames preceding the current frame of the input video, and by emphasizing the extracted region and suppressing the other regions in either the basic attention degree image extracted from the current frame by the basic attention degree image extraction process or the post-instantaneous suppression attention degree image extracted from the current frame by the attention degree instantaneous suppression process; and
an attention degree video output process of extracting, by repeatedly executing the basic attention degree image extraction process and one or both of the attention degree instantaneous suppression process and the attention degree gradual suppression process sequentially for each frame of the input video, a post-suppression attention degree video, which is the time series of the post-instantaneous suppression attention degree images or of the post-gradual suppression attention degree images, and outputting it as the attention degree video.