JP2017016592A

JP2017016592A - Main subject detection device, main subject detection method and program

Info

Publication number: JP2017016592A
Application number: JP2015135679A
Authority: JP
Inventors: Takahisa Yamamoto (山本 貴久); Masakazu Matsugi (真継 優和); Yasuhiro Komori (小森 康弘); Yoshinori Ito (伊藤 嘉則)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2015-07-06
Filing date: 2015-07-06
Publication date: 2017-01-19

Abstract

PROBLEM TO BE SOLVED: To provide a main subject detection device capable of detecting the intended main subject from images.
SOLUTION: In an embodiment, the region of an object that appears frequently is detected as a first candidate region from a plurality of images captured before a specific point in time. A feature amount of the image at the specific point in time is calculated according to information on the function that uses the main subject detection result, and a second candidate region is detected based on the calculated feature amount. The main subject region is finally detected based on the detected first candidate region and second candidate region.
SELECTED DRAWING: Figure 1

Description

The present invention relates to a technique for detecting the region of a main subject in an input image.

Methods for detecting the region of a main subject in an input image are conventionally known. Patent Document 1 discloses a technique that detects, as the main subject, a subject that appears stably, for example one whose accumulated time of presence in the images of a plurality of past frames is long.

When such main subject region detection is performed inside an imaging device such as a digital camera, autofocus (AF) can be adjusted to the main subject and the main subject can be tracked automatically.

Patent Document 1: JP 2013-120949 A
Patent Document 2: JP 2011-53759 A

Non-Patent Document 1: L. Itti, C. Koch, E. Niebur, "A Model of Saliency-Based Visual Attention for Rapid Scene Analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 11, pp. 1254-1259, Nov. 1998.
Non-Patent Document 2: B. K. P. Horn and B. G. Schunck, "Determining Optical Flow," Artificial Intelligence, Vol. 17, pp. 185-203, 1981.

However, simply treating whatever appears long and stably across the images of a plurality of frames as the main subject, as in the technique disclosed in Patent Document 1, may yield a main subject that does not match the photographer's intention.

For example, suppose a moving object (for example, a car) enters the image while a stationary object (for example, a road sign) is being photographed. The conventional method of Patent Document 1 detects the road sign as the main subject. However, the photographer may have been waiting for the car to enter the angle of view in order to photograph the car in a composition that includes the road sign. In that case the car should preferably be detected as the main subject, and simply detecting whatever has appeared long and stably in the images as the main subject can produce a detection contrary to the photographer's intention. An object of the present invention is to suppress main subject detection that is contrary to the photographer's intention.

To solve the above problem, the present invention provides a main subject detection device that detects the region of a main subject in an image at a specific point in time, the device comprising: image acquisition means for acquiring a plurality of time-series consecutive images preceding the specific point in time; object detection means for detecting an object that is present in the plurality of images acquired by the image acquisition means at least a predetermined number of times; first detection means for detecting a first candidate region of the main subject in the image at the specific point in time, based on information on the object detected by the object detection means; function acquisition means for acquiring the function to which information on the main subject region is supplied when that region is detected in the image at the specific point in time; feature amount calculation means for calculating a feature amount of the image at the specific point in time according to the function acquired by the function acquisition means; second detection means for detecting a second candidate region of the main subject in the image at the specific point in time, based on the feature amount calculated by the feature amount calculation means; and final detection means for detecting the region of the main subject in the image at the specific point in time, based on the first candidate region detected by the first detection means and the second candidate region detected by the second detection means.

With the above configuration, the present invention makes it possible to suppress main subject detection that is contrary to the photographer's intention.

FIG. 1 is a schematic block diagram illustrating the configuration of a main subject detection device according to a first embodiment.
FIG. 2 is a diagram showing examples of input images and salient regions in the first embodiment.
FIG. 3 is a diagram explaining details of the processing of the first detection unit in the first embodiment.
FIG. 4 is a diagram explaining details of the processing of the second detection unit in the first embodiment.
FIG. 5 is a flowchart illustrating the procedure of main subject detection processing in the first embodiment.
FIG. 6 is a diagram explaining an outline of a main subject detection method in a second embodiment.
FIG. 7 is a flowchart illustrating the procedure of main subject detection processing in a modification of the first embodiment.
FIG. 8 is a flowchart illustrating the procedure of main subject detection processing in the second embodiment.
FIG. 9 is a schematic block diagram illustrating the configuration of a main subject detection device according to a third embodiment.
FIG. 10 is a flowchart illustrating the procedure of main subject detection processing in the third embodiment.

[First Embodiment]
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. FIG. 1 is a schematic block diagram illustrating the configuration of a main subject detection device 100 according to the present embodiment. The main subject detection device 100 includes an image acquisition unit 101, a target object detection unit 102, a first detection unit 103, a function acquisition unit 104, a feature amount calculation unit 105, a second detection unit 106, a final detection unit 107, a saliency area detection unit 108, a same object region specifying unit 109, and an appearance frequency calculation unit 110. The main subject detection device 100 according to the present embodiment is realized using a semiconductor integrated circuit (LSI). For example, a semiconductor integrated circuit provided in an imaging device such as a digital camera may implement the functions of the main subject detection device, in which case the imaging device itself corresponds to the main subject detection device 100 of the present embodiment.

The main subject detection device 100 may also be configured as a standalone device. That is, the main subject detection device 100 has a hardware configuration including a CPU (Central Processing Unit), ROM (Read Only Memory), RAM (Random Access Memory), HDD (Hard Disk Drive), and the like. The CPU executes programs stored in the ROM, HDD, or the like, thereby realizing, for example, the functional components and the flowchart processing described later. The RAM has a storage area that serves as a work area into which the CPU loads programs for execution. The ROM has a storage area that stores the programs executed by the CPU and other data. The HDD has a storage area that stores the various programs the CPU needs to execute processing and various data, including threshold data.

In the following description of the present embodiment, a case where the main subject detection device 100 is configured as a digital camera is used as an example. First, the image acquisition unit 101 acquires input images (moving image frames) that are input sequentially from an imaging unit (an image sensor or the like) and outputs them to the target object detection unit 102, the first detection unit 103, and the feature amount calculation unit 105. These sequentially input images are also used for other purposes, such as the through (live view) display function of the digital camera.

The target object detection unit 102 detects an object that is present with high frequency across a plurality of frames (hereinafter referred to as a capture target object). An outline of its processing is as follows. First, the target object detection unit 102 searches for candidates for the capture target object and determines whether each candidate has remained in view over a plurality of frames. To do so, it determines whether candidates found in different frames are the same object; if a candidate determined to be the same exists over a plurality of frames, the unit judges that it has remained in view for a long time. The target object detection unit 102 then outputs information (position, size, color, and so on) on each capture target object judged to have remained in view over a plurality of frames.

Next, to explain the target object detection unit 102 in detail, the processing of each sub-block that constitutes it is described. The saliency area detection unit 108 sequentially acquires the moving image frames one by one as input images and searches each input image for salient regions. Various known methods can be used for this search. For example, the center-surround difference method described in Non-Patent Document 1 may be used: a small region is set at each pixel of the image, and the difference in feature amount between that small region and its surrounding region is calculated as the saliency, from which salient regions are detected.

FIG. 2 shows examples of input images and of the salient regions found by the saliency area detection unit 108. FIG. 2(a) shows an input image 201 acquired by the image acquisition unit 101 and input to the saliency area detection unit 108. The input image 201 shows a sign 202 and a car 203.

The saliency area detection unit 108 calculates the saliency of the input image 201 using a method such as that described in Non-Patent Document 1 and obtains a saliency map whose pixel values are the saliency. It binarizes this saliency map with a predetermined threshold and extracts only the high-saliency regions, whose saliency is at or above the threshold.
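As an illustration of the saliency calculation and binarization described above, the following minimal Python sketch computes a simplified center-surround difference on a grayscale image and thresholds it. It is not the method of Non-Patent Document 1, which combines several feature channels and scales; the function names, the window radii, and the use of mean intensity as the feature amount are assumptions made for illustration.

```python
def box_mean(img, y, x, r):
    """Mean of img over the (2r+1) x (2r+1) window centred at (y, x), clipped to the image."""
    h, w = len(img), len(img[0])
    vals = [img[j][i]
            for j in range(max(0, y - r), min(h, y + r + 1))
            for i in range(max(0, x - r), min(w, x + r + 1))]
    return sum(vals) / len(vals)


def saliency_map(img, r_center=1, r_surround=3):
    """Saliency per pixel: |mean of a small centre window - mean of a larger window|.

    Simplification (assumption): the larger window also contains the centre window.
    """
    h, w = len(img), len(img[0])
    return [[abs(box_mean(img, y, x, r_center) - box_mean(img, y, x, r_surround))
             for x in range(w)]
            for y in range(h)]


def binarize(sal, threshold):
    """Keep pixels whose saliency is at or above the threshold (1), drop the rest (0)."""
    return [[1 if v >= threshold else 0 for v in row] for row in sal]
```

A bright blob on a dark background yields high saliency at the blob and low saliency elsewhere, so thresholding leaves only the blob as a high-saliency region.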

FIG. 2(b) shows the circumscribed rectangles of the detected high-saliency regions superimposed on the input image 201. Rectangles 205 and 206 are the circumscribed rectangles of the high-saliency regions set for the sign 202 and the car 203, respectively. The saliency area detection unit 108 treats the detected high-saliency regions as candidates for the capture target object and outputs information on them to the same object region specifying unit 109. The information output here is, for example, the images cropped by the circumscribed rectangles 205 and 206 and the positions of those rectangles within the input image 201.
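One way to obtain circumscribed rectangles from a binarized saliency map is to group connected pixels and take the bounding box of each group. The sketch below assumes 4-connectivity and half-open rectangles `(left, top, right, bottom)`; the source does not state how regions are grouped, so both choices and the function names are illustrative assumptions.

```python
def bounding_rects(mask):
    """Return one (left, top, right, bottom) box per 4-connected region of 1s.

    right and bottom are exclusive, so a box can be used directly for slicing.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    rects = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Flood-fill the region starting at (y, x), tracking its extent.
            seen[y][x] = True
            stack = [(y, x)]
            left, top, right, bottom = x, y, x, y
            while stack:
                cy, cx = stack.pop()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            rects.append((left, top, right + 1, bottom + 1))
    return rects


def crop(img, rect):
    """Cut the rectangle out of the image; this is the candidate's cropped image."""
    left, top, right, bottom = rect
    return [row[left:right] for row in img[top:bottom]]
```

Each rectangle together with its crop corresponds to the position and cropped image that the text says are passed on for each candidate.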

The same object region specifying unit 109 determines whether capture target object candidates detected in different frames (input images) show the same object. The determination is made by calculating the correlation between the candidate regions that the saliency area detection unit 108 output for the different frames.

Returning to FIG. 2, the same-object determination performed by the same object region specifying unit 109 is described. FIG. 2(c) shows an input image 207 that follows the input image 201 (is temporally consecutive with it). In the input image 207, a sign 208 and a car 209 correspond to the sign 202 and the car 203 in FIG. 2(a), and the car 209 has moved slightly since the input image 201. The input image 207 also contains an airplane 210 that has newly entered the frame.

FIG. 2(d) shows the circumscribed rectangles of the detected high-saliency regions superimposed on the input image 207. Rectangles 212, 213, and 214 are the circumscribed rectangles of the high-saliency regions set for the sign 208, the car 209, and the airplane 210, respectively.

Thus, the input image 201 at one point in time contains two candidate regions for the capture target object, rectangles 205 and 206, and the input image 207 at a different point in time contains three, rectangles 212, 213, and 214. The same object region specifying unit 109 calculates the correlation between candidate regions of frames (input images) at different points in time. In the case of FIG. 2, correlation coefficients are calculated for six combinations: "205 and 212", "205 and 213", "205 and 214", "206 and 212", "206 and 213", and "206 and 214". Various known techniques can be used to calculate the correlation coefficient; for example, a normalized correlation may be calculated. When a normalized correlation is calculated, the candidate regions being compared must first be brought to the same size, for example by an affine transformation.

The same object region specifying unit 109 thus correlates the candidate regions of different frames (input images) and applies a predetermined threshold to the resulting correlation coefficients. Candidate regions whose correlation coefficient exceeds the threshold are determined to be the same object. In the example of FIG. 2, "205 and 212" and "206 and 213" are determined to be the same object regions. The determination result, indicating which candidate regions were determined to be the same object region, is output to the appearance frequency calculation unit 110.
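The pairwise correlation and thresholding can be sketched as follows. The sketch assumes the candidate patches have already been resized to a common size and flattened to lists, uses the zero-mean form of normalized correlation, and picks a threshold of 0.8; the source specifies none of these details, and the pixel values in the usage example are invented purely to show the mechanics.

```python
import math


def normalized_correlation(a, b):
    """Zero-mean normalized correlation of two equally sized, flattened patches."""
    if len(a) != len(b):
        raise ValueError("patches must be resized to the same size first")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0:
        return 0.0  # a flat patch carries no structure to correlate
    return sum(x * y for x, y in zip(da, db)) / denom


def match_candidates(prev, curr, threshold=0.8):
    """Return (prev_id, curr_id) pairs whose correlation exceeds the threshold.

    prev and curr map candidate ids to flattened patches from two frames.
    """
    return [(p, c)
            for p, patch_p in prev.items()
            for c, patch_c in curr.items()
            if normalized_correlation(patch_p, patch_c) > threshold]


# Hypothetical patches labelled with the reference numerals of FIG. 2.
prev = {205: [1, 2, 3, 4], 206: [4, 1, 4, 1]}
curr = {212: [2, 4, 6, 8], 213: [5, 2, 5, 2], 214: [9, 0, 0, 9]}
pairs = match_candidates(prev, curr)
```

With these made-up patches, all six combinations are evaluated and only "205 and 212" and "206 and 213" pass the threshold, mirroring the outcome described for FIG. 2.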

Based on the determination result output from the same object region specifying unit 109, the appearance frequency calculation unit 110 counts how many times each capture target object candidate has appeared in the moving image frames. In the two-frame example of FIG. 2, candidate 205 (212) has appeared twice, candidate 206 (213) twice, and candidate 214 once. Because the same object region specifying unit 109 outputs the candidates (pairs) determined to be the same, following these pairs allows the appearance count of the same object to be accumulated over three or more frames as well.

When a capture target object candidate appears at least a predetermined number of times within a predetermined number of frames, the appearance frequency calculation unit 110 adopts that candidate as a capture target object. For example, if the rule is that a candidate appearing two or more times within two frames becomes a capture target object, then in the case of FIG. 2 candidate 205 (212) and candidate 206 (213) become capture target objects.
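The counting step can be sketched by following the matched pairs from frame to frame and carrying each object's count forward under its most recent candidate id. The data layout and function names below are assumptions for illustration; the sketch also expects the caller to pass only the predetermined window of frames rather than enforcing the window itself.

```python
def count_appearances(frame_candidates, frame_pairs):
    """Count appearances per object, keyed by the object's latest candidate id.

    frame_candidates: one list of candidate ids per frame, oldest first.
    frame_pairs: for each consecutive frame pair, the (prev_id, curr_id) matches.
    """
    counts = {cid: 1 for cid in frame_candidates[0]}
    for candidates, pairs in zip(frame_candidates[1:], frame_pairs):
        previous_of = {curr: prev for prev, curr in pairs}
        for cid in candidates:
            prev = previous_of.get(cid)
            # Carry the count over from the matched candidate, if there is one.
            carried = counts.pop(prev, 0) if prev is not None else 0
            counts[cid] = carried + 1
    return counts


def capture_targets(counts, min_count):
    """Candidates that appeared at least min_count times become capture targets."""
    return [cid for cid, n in counts.items() if n >= min_count]


# The two-frame example of FIG. 2.
counts = count_appearances([[205, 206], [212, 213, 214]],
                           [[(205, 212), (206, 213)]])
targets = capture_targets(counts, min_count=2)
```

Keying the result by the newest id matches the embodiment's choice, described in the text, of reporting the temporally newer candidate for each object.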

As a result of the above processing, the appearance frequency calculation unit 110 outputs information indicating the capture target objects (hereinafter referred to as capture target object information) to the first detection unit 103. For example, the cropped images of the capture target objects 212 and 213 and information on their positions within the input image 207 are output as the capture target object information. For a capture target object such as 212 and 205, either cropped image could be output, but the present embodiment outputs the temporally newer one, 212. This is because a crop taken from a frame temporally close to the image in which the main subject is to be detected allows the first candidate region, described later, to be searched for more accurately.

The target object detection unit 102 detects capture target objects as described above. As a result, objects that appear many times in the moving image frames are identified as capture target objects. An object that appears many times in the moving image frames is likely to be an object the photographer wants to include in the captured image, so identifying such objects is a necessary step toward main subject detection that reflects the photographer's intention.

This processing, which determines what kind of object the photographer is trying to photograph, must be performed on moving image frames that precede the frame on which main subject detection is actually performed. The processing in the target object detection unit 102 is therefore performed every time a still image constituting a moving image frame is input. In other words, in the main subject detection device 100 (digital camera) of the present embodiment, the target object detection unit 102 executes the above processing as a kind of background process each time an image is input, regardless of whether the shutter is pressed to record.

Returning to FIG. 1, the first detection unit 103 uses the capture target object information acquired from the target object detection unit 102 to search for the location of each capture target object in the frame, input from the image acquisition unit 101, on which main subject detection is performed (hereinafter referred to as the detection target frame). The search method is not particularly limited in the present embodiment. For example, the cropped image of a capture target object, output from the target object detection unit 102 as capture target object information, can be used as a template to search for the capture target object by template matching.

FIG. 3 illustrates the processing of the first detection unit 103 in the present embodiment. FIG. 3(a) shows a detection target frame 301. Template matching is performed on the detection target frame 301 using the capture target objects 212 and 213 as templates, which identifies where in the detection target frame 301 the capture target objects 212 and 213 are located. FIG. 3(b) shows the result of this template matching: rectangular regions 303 and 304 are identified as the locations of the capture target objects 212 and 213. A region identified and detected by the first detection unit 103 is hereinafter referred to as a first candidate region. In this example, the rectangular regions 303 and 304 are the first candidate regions.
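A minimal sketch of template matching follows. The source does not say which matching score is used, so the sum of absolute differences (SAD) with an exhaustive search is assumed here; the function name and the grayscale list-of-rows image layout are likewise illustrative.

```python
def match_template(frame, template):
    """Slide the template over the frame and return ((x, y), sad) for the best fit.

    The best fit minimizes the sum of absolute differences; (x, y) is the
    top-left corner of the matched rectangle in the frame.
    """
    fh, fw = len(frame), len(frame[0])
    th, tw = len(template), len(template[0])
    best_pos, best_sad = None, float("inf")
    for y in range(fh - th + 1):
        for x in range(fw - tw + 1):
            sad = sum(abs(frame[y + j][x + i] - template[j][i])
                      for j in range(th) for i in range(tw))
            if sad < best_sad:
                best_pos, best_sad = (x, y), sad
    return best_pos, best_sad
```

The matched position plus the template size gives a rectangle that plays the role of a first candidate region. In practice a score threshold would also be needed so that a capture target object absent from the frame does not produce a spurious region.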

Alternatively, the capture target objects may be detected by processing similar to that of the target object detection unit 102. In this case, high-saliency regions are extracted by the same processing as in the saliency area detection unit 108, and a correlation calculation like that performed by the same object region specifying unit 109 is then carried out between the extracted high-saliency regions and the cropped images of the capture target objects. Information on the regions identified and detected by the first detection unit 103 (the first candidate regions) is output from the first detection unit 103 to the final detection unit 107.

The function acquisition unit 104 acquires information on the function, among those executable by the device (digital camera), that the user has specified. A function here means an application that can be realized using the main subject detection result, for example the autofocus (AF) function or the automatic tracking function mentioned in the background section. The function acquisition unit 104 acquires information on which of these functions is to be applied to the image about to be captured. For example, if the photographer wants to shoot using the automatic tracking function on the main subject and has set the device (digital camera) accordingly, the function acquisition unit 104 acquires that setting (the automatic tracking function).

In general, which function has been selected as the function that uses the main subject detection result (hereinafter, the detection result use function) also offers a hint for estimating the photographer's intention. That is, if the automatic tracking function has been selected, it can be inferred that the photographer wants to photograph a moving object as the main subject. If the AF function has been selected, the photographer presumably wants to photograph an object in the foreground of the image as the main subject.

The function acquisition unit 104 acquires information on the function set for the image about to be captured and outputs information indicating that function (use function information) to the feature amount calculation unit 105. In the present embodiment, as described below, a second candidate region of the main subject is also detected based on which function has been selected as the detection result use function.

In the feature amount calculation unit 105, the feature amount to be calculated (hereinafter, the calculated feature amount) is set in advance for each detection result use function. The unit calculates the feature amount corresponding to the input use function information on the detection target frame input from the image acquisition unit 101. The type of calculated feature amount associated with each detection result use function is one that is useful for that function. For example, a feature amount that detects object motion is associated with the automatic tracking function, and a feature amount that can separate foreground from background is associated with the AF function.

A feature amount suitable for detecting object motion is, for example, the velocity vector at each pixel, such as the optical flow disclosed in Non-Patent Document 2. Because motion is difficult to detect from a single detection target frame, the frame preceding the detection target frame must also be input to the feature amount calculation unit 105. Alternatively, when the camera is not moving, a motion feature amount can be calculated from the inter-frame difference. Various known techniques can likewise be used for a feature amount that can separate foreground from background; for example, a color feature amount calculated by the method disclosed in Patent Document 2 may be used.

As described above, the feature amount calculation unit 105 calculates the type of feature amount defined for each detection result use function and creates a feature amount map whose pixel values are the calculated feature amounts. The created feature amount map is output to the second detection unit 106. When a plurality of feature amounts correspond to the function acquired by the function acquisition unit 104, a plurality of feature amount maps are created and output.

The second detection unit 106 uses the input feature amount map to detect regions that are useful for the detection result use function. Because the input feature amount map is the result of calculating a feature amount useful for that function, the second detection unit 106 applies a predetermined threshold to the map and detects regions that exceed the threshold as second candidate regions of the main subject.
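The flow from selected function to feature amount map to second candidate mask can be sketched as a lookup table of feature calculators followed by thresholding. The function keys, the table name, and the threshold are assumptions. The motion cue is the inter-frame difference mentioned in the text for a static camera; the foreground cue (`deviation_from_mean`) is a hypothetical stand-in and is not the color feature amount of Patent Document 2.

```python
def frame_difference(frame, prev_frame):
    """Motion cue: absolute per-pixel difference between consecutive frames."""
    return [[abs(a - b) for a, b in zip(row, prev_row)]
            for row, prev_row in zip(frame, prev_frame)]


def deviation_from_mean(frame, prev_frame=None):
    """Hypothetical foreground cue: distance of each pixel from the frame mean."""
    flat = [v for row in frame for v in row]
    mean = sum(flat) / len(flat)
    return [[abs(v - mean) for v in row] for row in frame]


# Feature calculators associated in advance with each detection result use function.
FEATURES_BY_FUNCTION = {
    "auto_tracking": [frame_difference],
    "autofocus": [deviation_from_mean],
}


def feature_maps(function, frame, prev_frame):
    """One feature amount map per calculator registered for the function."""
    return [calc(frame, prev_frame) for calc in FEATURES_BY_FUNCTION[function]]


def second_candidate_mask(feature_map, threshold):
    """Mark pixels whose feature amount exceeds the threshold."""
    return [[1 if v > threshold else 0 for v in row] for row in feature_map]
```

Registering a list of calculators per function lets one function yield several feature amount maps, as the text allows. The resulting mask can then be grouped into rectangular regions in the same way as the binarized saliency map.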

FIG. 4 illustrates the processing of the second detection unit 106 in detail. The case where the automatic tracking function is set as the detection result use function is described here. In this case, the feature amount calculation unit 105 creates and outputs a feature amount map that has high values in regions with motion. For a detection target frame 401 shown in FIG. 4(a), a feature amount suitable for detecting object motion is calculated and thresholded, which identifies where in the detection target frame 401 regions of large motion exist. Because a motion feature amount cannot be calculated from a single frame, the preceding frame (not shown) is used together with the detection target frame 401. Specifically, for example, the difference between the detection target frame 401 and the preceding frame is calculated and thresholded. FIG. 4(b) shows regions 403 and 404 identified as regions of large motion in the detection target frame 401. The second detection unit 106 searches for such regions of large motion and identifies them (403 and 404) as second candidate regions.

Based on the first candidate regions input from the first detection unit 103 and the second candidate regions input from the second detection unit 106, the final detection unit 107 detects the final main subject region as the main subject detection result. In the present embodiment, a region present in both the first candidate regions and the second candidate regions is detected as the final main subject region. The final detection unit 107 outputs, as main subject information, the position and size of the detected main subject region and a score for the detected region (its likelihood of being the main subject). The main subject detection score may be calculated by, for example, a weighted sum of items such as the position and size of the detected region, the correlation value obtained when the first candidate region was detected, and the feature amount value obtained when the second candidate region was detected. The weights of the weighted sum can be determined by learning from a large number of images (samples) using a known learning method.
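The weighted-sum score can be sketched as below. The feature names, their normalization to the range 0 to 1, and the weight values are all hypothetical; the text only states that such items are combined by weighted addition and that the weights would come from a learning procedure.

```python
def main_subject_score(features, weights):
    """Weighted sum of per-region features; both arguments map name -> float."""
    return sum(weights[name] * value for name, value in features.items())


# Hypothetical, hand-picked weights standing in for learned ones.
WEIGHTS = {
    "center_proximity": 0.2,  # derived from the region position
    "relative_size": 0.1,     # region area divided by frame area
    "correlation": 0.4,       # from the first-candidate search
    "feature_value": 0.3,     # from the feature amount map
}

score = main_subject_score(
    {"center_proximity": 0.8, "relative_size": 0.1,
     "correlation": 0.9, "feature_value": 0.5},
    WEIGHTS,
)
```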

The processing of the final detection unit 107 is now described in detail with reference to FIGS. 3 and 4. The detection target frame 301 in FIG. 3 and the detection target frame 401 in FIG. 4 are the same frame (input image). For this frame, regions 303 and 304 are identified as first candidate regions, and regions 403 and 404 are identified as second candidate regions. In the example of the present embodiment, a region common to both sets of candidate regions is detected as the final main subject detection region, so the final main subject detection region is region 304 (equivalently, region 403).
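One way to picture the common-region test is as a rectangle intersection. The sketch below assumes each candidate region is an axis-aligned, half-open box `(x0, y0, x1, y1)`. Both this representation and the coordinates assigned to regions 303, 304, 403 and 404 are invented for illustration.

```python
def overlap(a, b):
    """Intersection of two boxes (x0, y0, x1, y1), or None if they are disjoint."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None


def common_regions(first, second):
    """All pairwise intersections between first and second candidate regions."""
    result = []
    for a in first:
        for b in second:
            c = overlap(a, b)
            if c is not None:
                result.append(c)
    return result


# Hypothetical coordinates: 304 and 403 cover the same moving object.
region_303, region_304 = (10, 10, 50, 50), (100, 40, 160, 90)
region_403, region_404 = (105, 45, 165, 95), (200, 10, 230, 30)
final = common_regions([region_303, region_304], [region_403, region_404])
```

An empty result corresponds to the "no main subject" outcome.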

In the present embodiment, if no region is common to the first and second candidate regions, the final detection unit 107 determines that there is no main subject. The absence of a common region means that no region is reliable as the main subject, so outputting "no main subject" in that case suppresses erroneous main subject detection in the function that later uses the result. Erroneously detecting a main subject region may cause the photographer discomfort or a sense of incongruity; reporting no main subject when there is no common region reduces that possibility.

Next, the main subject detection processing performed by the main subject detection device 100 is described in detail. FIG. 5 is a flowchart showing the procedure of the main subject detection processing according to the present embodiment. The description covers the processing that detects the main subject region for a detection target frame (input image) input at a certain timing (a specific point in time).

In the flowchart shown in FIG. 5, first, in step S501, the function acquisition unit 104 acquires information on the detection result utilization function that has been set. In step S502, the feature amount calculation unit 105 determines the feature amount to be calculated according to the detection result utilization function acquired by the function acquisition unit 104. Steps S501 and S502 may be performed at any time, provided they are performed before the timing (specific point in time) at which the detection target frame is acquired.

In step S503, the image acquisition unit 101 determines whether a moving image frame (input image) has been input. If an image has been input, it is passed to the target object detection unit 102, and the processing proceeds to step S504. In step S504, the saliency region detection unit 108 of the target object detection unit 102 detects highly salient regions as candidates for the capture target object. In the following step S505, the same object region specifying unit 109 determines whether capture target object candidates detected in different frames (input images) are the same object, and specifies the same-object regions. The processing then proceeds to step S506, in which the appearance frequency calculation unit 110 counts the number of appearances of each capture target object candidate based on the determination result of the same object region specifying unit 109. Steps S504 to S506 identify the capture target object. This identification is performed in advance, using a plurality of frames acquired sequentially at timings earlier than the detection target frame that is input at a certain timing (specific point in time).

A detection target frame is then input at a certain timing (specific point in time), and in step S507 the main subject detection device 100 determines whether to perform main subject detection processing on the input image to be processed. If it determines that the processing is to be performed, the processing proceeds to step S508 and the subsequent steps. First, in step S508, the first detection unit 103 searches the detection target frame input from the image acquisition unit 101 for the capture target object by a technique such as template matching, and detects first candidate regions.

Next, in step S509, the feature amount calculation unit 105 calculates the feature amount of the type determined in step S502 from the detection target frame and, as necessary, earlier frames, and creates a feature amount map. In step S510, the second detection unit 106 detects second candidate regions based on the feature amount map calculated by the feature amount calculation unit 105. In the flowchart shown in FIG. 5, steps S508 to S510 are performed after steps S504 to S506. However, when both steps S504 to S506 and steps S508 to S510 are performed on the same image, either may be performed first, or the two may be processed in parallel.

Subsequently, in step S511, the final detection unit 107 determines whether the first and second candidate regions have a common region. If there is a common region, the processing proceeds to step S512, and the final detection unit 107 outputs information on the common region as the final candidate region of main subject detection. If there is no common region, the processing proceeds to step S513, and the final detection unit 107 determines that the detection target frame contains no main subject and outputs that result.

As described above, the main subject detection device 100 of the present embodiment detects, from a group of frames (a plurality of images) preceding a specific point in time, the capture target object that the photographer is trying to include in the captured image (detection of first candidate regions). At the same time, it detects object regions that are meaningful to the application that receives the main subject detection result (detection of second candidate regions). Because the photographer determines the angle of view of the imaging device (camera), examining what is frequently present within the angle of view helps to estimate the photographer's intention for the image to be captured at the specific point in time. Likewise, because the photographer also sets the function (application) that will use the main subject detection result, knowing the purpose for which the main subject is estimated also helps to estimate the photographer's intention for that image.

In the present embodiment, candidate regions are detected by estimating the photographer's intention from these two different viewpoints, and the final main subject region is identified from both sets of candidate regions. Estimating the photographer's intention in this multi-layered manner makes it possible to suppress detection of a main subject contrary to the photographer's intention. As a result, the discomfort and sense of incongruity that erroneous main subject detection causes for a photographer operating an application that uses the detection result can be reduced.

For example, if the main subject is estimated only by examining what is frequently present within the angle of view (the search for first candidate regions), as in the technique of Patent Document 1, then region 303 or region 304 is identified as the main subject candidate region in the example of FIG. 3. Specifically, either both regions 303 and 304 are determined to be main subject regions, or region 303, which is the more stable (non-moving) subject, is determined to be the main subject region. In contrast, the main subject detection method of the present embodiment also uses, as a clue, information about what the main subject detection result will be used for (for example, the automatic tracking function). This allows the present embodiment to detect region 304 (equivalently, region 403) as the main subject region, reflecting the photographer's likely intention to photograph a moving object as the main subject. The present embodiment can therefore suppress detection of a main subject contrary to the photographer's intention.

[Modification of the First Embodiment]
In the first embodiment described above, the final detection unit 107 determines that there is no main subject when the first and second candidate regions have no common region. As a modification of the first embodiment, a form is described in which, when the final detection unit 107 determines that there is no main subject, information on the regions identified as second candidate regions at that timing is used in the processing of the following frames (temporally later frames). Components already described in the first embodiment are given the same reference numerals, and their description is omitted.

First, an overview of the main subject detection method of this modification is given with reference to FIG. 6. FIG. 6(a) shows the input images sequentially acquired by the image acquisition unit 101, arranged in time series. FIGS. 6(b) and 6(c) show the first candidate regions and the second candidate regions detected for each input image in FIG. 6(a). As in the first embodiment, the automatic tracking function is set as the detection result utilization function, and a "sign" is present in frame 601 (and in the frames before it).

In this situation, when frame 602 is input as the detection target frame, the first embodiment detects the "sign" as the first candidate region and the "airplane" as the second candidate region. Because the first and second candidate regions have no common region, the final main subject detection result is "no main subject". In this modification, in such a case, the object contained in the region detected as the second candidate region is added to the capture target objects for the main subject detection processing of the detection target frame at the next timing.

In the case of this modification shown in FIG. 6, the region detected as the second candidate region at the timing of frame 602 is added to the capture target objects in the main subject detection processing executed for the next detection target frame 603. Consequently, the first candidate regions detected for detection target frame 603 are the "sign" and the "airplane", the second candidate region is the "airplane", and the "airplane" is output as the final main subject detection region. The same result is output for the next detection target frame 604.

This modification processes frames in this way because it infers that a photographer who has selected the automatic tracking function as the detection result utilization function probably wants to photograph a moving object as the main subject. In other words, if the object frequently present within the angle of view (the capture target object) does not move, the modification judges that it is not the main subject and infers that the photographer, having framed that composition (the "composition with the sign"), is waiting for a moving object (the "airplane") to enter the angle of view. That is, this modification gives particular weight to the inference drawn from which function (application) will use the main subject detection result. The processing described above is performed to realize main subject detection based on this inference.

FIG. 7 is a flowchart showing the procedure of the main subject detection processing in this modification. It differs from the first embodiment in that, when it is determined in step S511 that the first and second candidate regions have no common region and that there is therefore no main subject, the processing proceeds to step S701. In step S701, the final detection unit 107 adds the regions detected as second candidate regions in the detection target frame to the capture target object regions used when processing the detection target frame at the next timing. The added capture target object regions are then used in detecting first candidate regions for the next detection target frame.
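The carry-over in step S701 can be illustrated with a toy model in which every object is reduced to a text label. This is a deliberate simplification and an assumption of the sketch: real first candidates come from a search such as template matching and real second candidates from the feature amount map. The function name `process_frame` is hypothetical.

```python
def process_frame(visible, moving, capture_targets):
    """One detection pass over a frame described only by object labels.

    visible: labels of objects present in the frame
    moving: labels of objects with large motion (second candidates)
    capture_targets: labels currently registered as capture target objects
    Returns (main_subjects, updated_capture_targets).
    """
    first = capture_targets & visible   # stands in for step S508
    second = set(moving)                # stands in for steps S509 and S510
    common = first & second             # step S511
    if common:
        return sorted(common), capture_targets  # step S512
    # Step S701: no main subject, so register the second candidates
    # as capture targets for the next detection target frame.
    return [], capture_targets | second


targets = {"sign"}
main_602, targets = process_frame({"sign", "airplane"}, {"airplane"}, targets)
main_603, targets = process_frame({"sign", "airplane"}, {"airplane"}, targets)
```

Under these assumptions the run mirrors FIG. 6: frame 602 yields no main subject, and frame 603 yields the airplane.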

Depending on the moving image frames, a region that was not detected as a first candidate region (for example, the airplane in frame 602 of FIG. 6) may eventually come to be detected as a capture target object as time passes, even without being explicitly added in step S701. However, explicitly adding it as a capture target object, as in step S701, can be expected to let the detection follow the subject sooner.

[Second Embodiment]
Next, as a second embodiment of the present invention, a form is described in which the situation where the first and second candidate regions have no common region is divided into three cases, and a different detection result is output for each. Components already described in the first embodiment are given the same reference numerals, and their description is omitted.

There are three cases in which the first and second candidate regions have no common region: "1. both first and second candidate regions exist, but there is no common region"; "2. a first candidate region exists, but there is no second candidate region"; and "3. there is no first candidate region, but a second candidate region exists".

First, case 1 (both first and second candidate regions exist, but there is no common region) means that no region satisfies both the result detected using the capture target object as a clue (the first candidate regions) and the result detected using the detection result utilization function as a clue (the second candidate regions). In this case, the present embodiment also determines that there is no main subject in the detection target frame.

Case 2 (a first candidate region exists, but there is no second candidate region) is the case in which no region was detected using the detection result utilization function information as a clue. For example, when automatic tracking is set as the detection result utilization function, this may be a scene in which the photographer is waiting for a stationary object (such as an animal) to move. While the object stays still there is no motion, so no second candidate region is detected. In such a case, however, there is no problem in treating the stationary animal as the main subject, and once it starts to move it can simply be tracked automatically. In case 2, therefore, the first candidate region is output as the main subject detection result.

Case 3 (there is no first candidate region, but a second candidate region exists) is the case in which no region corresponding to the capture target object was detected. For example, when automatic tracking is set as the detection result utilization function, this may be a scene in which an airplane enters the frame while a clear blue sky is being photographed. A clear blue sky contains no particularly salient region, so no capture target object is detected in the image. In such a case, however, there is no problem in automatically tracking the airplane that has entered the frame. In case 3, therefore, the second candidate region is output as the main subject detection result.

FIG. 8 is a flowchart showing the procedure of the main subject detection processing in the present embodiment. It differs from the first embodiment in that steps S801 to S806 replace step S513. First, in step S801, the final detection unit 107 determines whether a first candidate region exists; if so, the processing proceeds to step S802, and if not, to step S803. In steps S802 and S803, the final detection unit 107 determines whether a second candidate region exists. These three determination steps divide the situation with no common region into three cases, and steps S804 to S806 each produce the output corresponding to the respective case described above.
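The three-way case split can be written as a small decision function. The sketch takes the candidate lists and the already computed common regions as inputs. The function name and the returned labels are hypothetical, and the handling of the situation in which neither kind of candidate exists is an assumption, since the text does not describe it.

```python
def select_main_subject(first, second, common):
    """Return (source, regions) following the second embodiment's case split."""
    if common:
        return "common", common    # step S512, as in the first embodiment
    if first and second:
        return "none", []          # case 1: both exist but do not overlap
    if first:
        return "first", first      # case 2: e.g. a still animal about to move
    if second:
        return "second", second    # case 3: e.g. an airplane entering a clear sky
    return "none", []              # assumption: no candidates means no main subject
```

For example, with automatic tracking set and a stationary animal in view, `select_main_subject(["animal"], [], [])` returns `("first", ["animal"])`.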

As described above, according to the present embodiment, when the first and second candidate regions have no common region, the situation is further divided into three cases and a different detection result is output for each, which realizes finer-grained main subject region detection processing.

[Third Embodiment]
Next, as a third embodiment of the present invention, a form is described in which the main subject region is detected by also using changes in the attitude of the imaging device (camera), such as pan/tilt operations and changes in the angle of view made by the photographer. Components already described in the first and second embodiments are given the same reference numerals, and their description is omitted.

In general, because the camera itself controls pan/tilt operations, the camera can easily detect whether the imaging device (camera) is performing a pan/tilt operation. For changes in the angle of view made by the photographer, whether the camera is moving can be detected from the output of a motion sensor, such as a gyro sensor, mounted on the camera. In this way, the main subject detection device 100 can acquire information on the motion of the device itself (the camera).

Next, the relationship between a change in camera attitude and the main subject region estimated at that time is described.
First, the case in which the camera is not moving and the target object detection unit 102 detects a capture target object is the case assumed in each of the embodiments described above. In this case, it may not be possible to distinguish clearly whether the object was detected because the photographer wants to photograph it or because something that happened to enter the angle of view became a capture target object. It is therefore preferable, as in each of the embodiments described above, to detect second candidate regions using the detection result utilization function information and use them to distinguish the two.

In contrast, when the camera is moving and the target object detection unit 102 detects a capture target object, it can be judged that the photographer is moving the camera (changing the angle of view) with the intention of somehow getting the capture target object into the angle of view. In such a case, a capture target object that the photographer is trying to keep within the angle of view even by moving the camera is presumed likely to be the main subject. In the present embodiment, the first candidate region is therefore given priority in such a case and output as the final candidate region (main subject region).

FIG. 9 shows schematic block diagrams of the main subject detection device 100 in the present embodiment. In FIG. 9(a), the motion detection unit 901 is constituted by a motion sensor, such as a gyro sensor, provided outside the main subject detection device 100; it detects the motion of the camera (the main subject detection device 100 itself) and outputs that information. The final detection unit 107 acquires the camera motion information and also uses it to identify the final candidate region (the final main subject region). Specifically, when the camera motion information indicates that the camera is moving, the final detection unit 107 gives priority to the first candidate region in identifying the final candidate region (main subject region). For example, when both first and second candidate regions exist but there is no common region, the present embodiment outputs the first candidate region as the main subject detection result.

The processing here may be triggered not by the mere presence or absence of camera motion but by whether the magnitude of the camera motion exceeds a predetermined threshold. A change in camera attitude can be measured directly with a sensor or the like, but it can also be estimated indirectly from changes across a plurality of images captured by the imaging unit of the camera. A known technique called Structure from Motion is available for estimating camera motion from such changes. Structure from Motion uses the three-dimensional transitions of a large number of feature points in the images to calculate the change in camera attitude and the geometric structure of the object being photographed. With this technique, the change in camera attitude can be estimated from the captured images alone, without using an external sensor or the like. FIG. 9(b) shows a schematic block diagram of the main subject detection device 100 that estimates camera motion from such changes in the captured images. The motion detection unit 902 uses the captured moving image frames to detect and output motion information of the camera (the main subject detection device 100). When estimating camera motion from the moving image frames, the motion detection unit 902 uses the Structure from Motion technique described above.

FIG. 10 is a flowchart showing the procedure of the main subject detection processing in the present embodiment. It differs from the first embodiment in that, before step S511, there is a step S1001 in which the final detection unit 107 determines, from the detection result of the motion detection unit 901 or the motion detection unit 902, whether the camera (main subject detection device 100) is moving. If it is determined in this step that there is no motion, the processing proceeds to step S511 and the same processing as in the first embodiment is performed. If it is determined in step S1001 that the camera is moving, the processing proceeds to step S1002, and the final detection unit 107 outputs the first candidate region as the main subject detection result.
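The branch added in step S1001 can be sketched as follows. The scalar `motion_magnitude` (for example, a gyro-derived angular speed) and the `threshold` parameter are assumptions of the sketch. The text leaves the motion measure open and mentions a threshold only as an option.

```python
def select_with_camera_motion(first, common, motion_magnitude, threshold=0.0):
    """Choose the output regions, giving priority to first candidates
    when the camera itself is moving.

    first: first candidate regions
    common: regions common to first and second candidates (may be empty)
    motion_magnitude: non-negative camera motion measure (hypothetical unit)
    """
    if motion_magnitude > threshold:  # step S1001: camera is moving
        return first                  # step S1002
    return common                     # steps S511 to S513 (empty means no main subject)
```

With `threshold=0.0` any nonzero motion triggers the first-candidate priority, and a larger value corresponds to the variant that reacts only to sufficiently large camera motion.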

As described above, according to the present embodiment, the main subject region is detected using changes in the movement of the imaging device (camera) as well, so that the main subject region can be detected accurately and detection of a main subject contrary to the photographer's intention can be further suppressed.

[Other Embodiments]
In the above description, a method based on correlation calculation was given as an example of how the same object region specifying unit 109 identifies the same object, but other methods, such as comparing color histograms, may be used.
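
A comparison of color histograms could look like the following minimal sketch. The description names only the idea of comparing color histograms; the coarse RGB binning, the histogram intersection measure, the 0.8 threshold, and all function names are assumptions made for illustration.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Normalized histogram of (r, g, b) pixels with 8-bit channels."""
    if not pixels:
        raise ValueError("region contains no pixels")
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bin_id: n / total for bin_id, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical normalized histograms."""
    return sum(min(v, h2.get(bin_id, 0.0)) for bin_id, v in h1.items())


def same_object(pixels_a, pixels_b, threshold=0.8):
    """Treat two regions as the same object when their histograms are similar."""
    similarity = histogram_intersection(
        color_histogram(pixels_a), color_histogram(pixels_b)
    )
    return similarity >= threshold


region_a = [(200, 30, 30)] * 8 + [(20, 20, 20)] * 2   # mostly red
region_b = [(210, 40, 35)] * 7 + [(25, 25, 25)] * 3   # similar reds
region_c = [(30, 30, 200)] * 10                        # blue
print(same_object(region_a, region_b))  # True
print(same_object(region_a, region_c))  # False
```

Coarse bins make the comparison tolerant of small color shifts between frames, at the cost of confusing objects with similar overall color.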

In the above description, the target object detection unit 102 searches for the capture target object by searching for an object that is present with high frequency over a plurality of frames. However, any other method may be used to search for the capture target object as long as it estimates the object that the photographer wants to capture in the image. For example, the target object detection unit 102 may simply specify a salient object as the capture target object. Alternatively, the search may take into account not only how long the object appears across a plurality of frames but also how salient the object is, the position in the image where the object exists, and so on.

The present invention is also realized by processing in which software (a program) that implements the functions of the above embodiments is supplied to a system or device via a network or various storage media, and a computer (or CPU, MPU, or the like) of the system or device reads out and executes the program. The present invention may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device. The present invention is not limited to the above embodiments; various modifications (including organic combinations of the embodiments) are possible based on the spirit of the present invention, and they are not excluded from the scope of the present invention. That is, all configurations combining the above embodiments and their modifications are also included in the present invention.

DESCRIPTION OF SYMBOLS
100 Main subject detection device
101 Image acquisition unit
102 Target object detection unit
103 First detection unit
104 Function acquisition unit
105 Feature amount calculation unit
106 Second detection unit
107 Final detection unit
108 Salient region detection unit
109 Same object region specifying unit
110 Appearance frequency calculation unit

Claims

A main subject detection device for detecting a region of a main subject in an image at a specific time point, the device comprising:
image acquisition means for acquiring a plurality of images in chronological order before the specific time point;
object detection means for detecting an object that appears with at least a predetermined frequency in the plurality of images acquired by the image acquisition means;
first detection means for detecting a first candidate region of the main subject in the image at the specific time point based on information on the object detected by the object detection means;
function acquisition means for acquiring a function to which information on the main subject region is provided when the main subject region is detected in the image at the specific time point;
feature amount calculation means for calculating a feature amount of the image at the specific time point according to the function acquired by the function acquisition means;
second detection means for detecting a second candidate region of the main subject in the image at the specific time point based on the feature amount calculated by the feature amount calculation means; and
final detection means for detecting the main subject region of the image at the specific time point based on the first candidate region detected by the first detection means and the second candidate region detected by the second detection means.

The main subject detection device according to claim 1, wherein the object detection means calculates a saliency at each pixel of each of the plurality of images acquired by the image acquisition means, and detects a region where the saliency is equal to or greater than a predetermined threshold as a region where the object exists.

The main subject detection device according to claim 2, wherein the object detection means calculates the saliency based on a difference in feature amount between a small region set at each pixel of each of the plurality of images acquired by the image acquisition means and a region surrounding the small region.

The main subject detection device according to claim 1, wherein the function acquisition means acquires at least one of autofocus and automatic tracking as the function.

The main subject detection device according to claim 4, wherein, when the function acquisition means acquires autofocus as the function, the feature amount calculation means calculates a color feature amount of the image at the specific time point.

The main subject detection device according to claim 4, wherein, when the function acquisition means acquires automatic tracking as the function, the feature amount calculation means calculates a velocity vector of the image at the specific time point.

The main subject detection device according to the item, wherein the final detection means detects a region common to the first candidate region and the second candidate region as the main subject region of the image at the specific time point.

The main subject detection device according to claim 7, wherein the final detection means determines that there is no main subject region in the image at the specific time point when there is no region common to the first candidate region and the second candidate region.

The main subject detection device according to claim 1, wherein, when the first candidate region is not detected by the first detection means and there is no region common to the first candidate region and the second candidate region, the final detection means detects the second candidate region as the main subject region of the image at the specific time point.

The main subject detection device according to claim 1, wherein, when the second candidate region is not detected by the second detection means and there is no region common to the first candidate region and the second candidate region, the final detection means detects the first candidate region as the main subject region of the image at the specific time point.

The main subject detection device according to claim 1, wherein, when the final detection means determines that there is no main subject region in the image at the specific time point, the final detection means outputs information indicating that there is no main subject region.

The main subject detection device according to claim 1, wherein, when the final detection means fails to detect the main subject region in the image at the specific time point, the final detection means adds the object included in the second candidate region detected by the second detection means to the objects detected by the object detection means, for use when detecting the main subject region of an image after the specific time point.

The main subject detection device according to claim 1, further comprising imaging means for capturing an image, wherein the image acquisition means acquires images captured by the imaging means.

The main subject detection device according to claim 13, further comprising motion detection means for detecting a change in motion of the main subject detection device, wherein the final detection means detects the main subject region of the image at the specific time point based on the change in motion detected by the motion detection means.

The main subject detection device according to claim 14, wherein the motion detection means detects the change in motion based on a plurality of images captured by the imaging means.

A main subject detection method for detecting a region of a main subject in an image at a specific time point, the method comprising:
acquiring a plurality of images that are continuous in time series before the specific time point;
detecting an object that appears a predetermined number of times or more in the plurality of acquired images;
detecting a first candidate region of the main subject in the image at the specific time point based on information on the detected object;
acquiring a function to which information on the main subject region is provided when the main subject region is detected in the image at the specific time point;
calculating a feature amount of the image at the specific time point according to the acquired function;
detecting a second candidate region of the main subject in the image at the specific time point based on the calculated feature amount; and
detecting the main subject region of the image at the specific time point based on the first candidate region and the second candidate region.

A program for causing a computer to function as the main subject detection device according to any one of claims 1 to 12.