JP6685823B2

JP6685823B2 - Subject tracking device, control method thereof, imaging device, and program

Info

Publication number: JP6685823B2
Application number: JP2016090348A
Authority: JP
Inventors: 広明栗栖
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2016-04-28
Filing date: 2016-04-28
Publication date: 2020-04-22
Anticipated expiration: 2036-04-28
Also published as: JP2017200088A

Description

The present invention relates to a subject tracking technique used in imaging devices and similar equipment.

Techniques that extract a specific subject image from a moving image and track the subject are used, for example, to identify the face region or human body region of a person in the image. They are applicable in many fields, including teleconferencing, man-machine interfaces, security, monitoring systems for tracking arbitrary subjects, and image compression.

One known technique extracts and tracks the image of a subject that the user designates in a captured image, and optimizes the focus state and exposure state for that subject (see Patent Document 1). In general template matching, a template image is registered based on a region that the user designates in the image through an input interface such as a touch panel. The region in the image with the highest similarity to, or the lowest difference from, the template image is then estimated, and the specific subject is tracked. A method that uses pixel-pattern similarity as the evaluation measure may track the wrong subject when the pixel patterns of partial regions of the tracking target and of another subject (such as the background) are similar. Likewise, a method that uses color-histogram similarity as the evaluation measure may track the wrong subject when the color proportions of partial regions of the tracking target and of another subject are similar.
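The template matching described above can be sketched as follows. This is a minimal illustration, not the method of any cited document: it assumes grayscale images held as nested lists and uses the sum of absolute differences (SAD) as the difference measure, and the function names are hypothetical.

```python
def sad(template, image, top, left):
    """Sum of absolute differences between the template and one image patch."""
    return sum(
        abs(image[top + r][left + c] - template[r][c])
        for r in range(len(template))
        for c in range(len(template[0]))
    )


def match_template(template, image):
    """Return (top, left, score) of the first patch with the lowest difference."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    best = None
    for top in range(ih - th + 1):
        for left in range(iw - tw + 1):
            score = sad(template, image, top, left)
            if best is None or score < best[2]:
                best = (top, left, score)
    return best


# Two regions share the same pixel pattern: the intended target at (1, 1)
# and a look-alike at (3, 3).
image = [
    [0, 0, 0, 0, 0],
    [0, 9, 8, 0, 0],
    [0, 7, 9, 0, 0],
    [0, 0, 0, 9, 8],
    [0, 0, 0, 7, 9],
]
template = [[9, 8], [7, 9]]
result = match_template(template, image)
lookalike_score = sad(template, image, 3, 3)
```

Both patches score a difference of zero, so the pixel pattern alone cannot tell the target from the look-alike. This is the failure mode that the passage attributes to similarity-only evaluation.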

Techniques that use distance information have been proposed to track a subject accurately. Patent Document 2 discloses a technique that uses the distance information of the tracking target to detect the region of another subject (an occluding subject) located between the imaging device and the tracked subject, so that focus is not placed on the occluding subject.

Patent Document 1: JP 2001-60269 A
Patent Document 2: JP 2014-202875 A
Patent Document 3: JP 2002-251380 A

Conventional techniques have difficulty tracking a subject in some shooting scenes, and tracking accuracy may be insufficient. For example, consider a scene containing a subject (hereinafter, a similar subject) whose pixel pattern, color histogram, or the like resembles that of the subject to be tracked (hereinafter, the main subject). When the distance difference between the main subject and the similar subject along the optical axis direction (shooting direction) of the imaging device is small, it is difficult to distinguish them by distance information. Also consider a scene in which the main subject moves along the optical axis direction while it is hidden behind a similar subject and cannot be seen from the imaging device. In that case, it may be difficult to detect that tracking has switched to the wrong target, or to track the main subject when it reappears.

An object of the present invention is to improve subject tracking accuracy by using distance information related to a captured image together with detection information on the change in a subject's position in the depth direction.

An apparatus according to an embodiment of the present invention is a subject tracking apparatus that acquires image data and distance information related to the image data and tracks a subject in an image. The apparatus comprises: acquisition means for acquiring the image data and the distance information; first detection means for detecting an image of the subject from the image data and outputting information on a plurality of first candidate regions for the subject region; second detection means for outputting information on a second candidate region for the subject region from the distance information and the change in position of the subject in the depth direction in the image; and determination means for acquiring the information on the first and second candidate regions and determining, as the subject region used for subject tracking, a region narrowed down from the plurality of first candidate regions by means of the second candidate region. The first detection means includes subject detection means for acquiring the image data and detecting the subject, and matching means for acquiring the image data and the detection information of the subject detection means, performing matching processing, and outputting a plurality of evaluation values and region information as the information on the first candidate regions. The second detection means includes calculation means for acquiring the distance information and calculating a position change vector indicating the change in position of the subject in the depth direction in the image, and comparison means for comparing the distance information and the position change vectors of a plurality of detected subject regions and outputting the information on the second candidate region.

According to the present invention, subject tracking accuracy can be improved by using distance information related to a captured image together with detection information on the change in a subject's position in the depth direction.

FIG. 1 is an external view illustrating an imaging device according to an embodiment of the present invention.
FIG. 2 is a block diagram showing a configuration example of the imaging device according to the embodiment of the present invention.
FIG. 3 is a schematic diagram showing the relationship between the pupil of the imaging optical system and the photoelectric conversion units of the image sensor.
FIG. 4 is an overall flowchart including the subject tracking processing in the first embodiment of the present invention.
FIG. 5 is a block diagram showing the subject tracking processing unit in the first embodiment of the present invention.
FIG. 6 is a diagram schematically showing the matching processing.
FIG. 7 is a diagram showing changes over time in the image plane position of a subject and the image-plane-converted value of the focus lens position.
FIG. 8 is a diagram showing a subject region and focus detection regions.
FIG. 9 is a flowchart showing the subject tracking processing.
FIG. 10 is a flowchart showing the detection processing for the second subject candidate region.
FIG. 11 is a diagram showing subject tracking when two subjects are at the same distance.
FIG. 12 is a diagram showing changes over time in the defocus amounts and optical axis vector amounts of the two subjects.
FIG. 13 is a diagram showing subject tracking when two subjects pass each other.
FIG. 14 is a diagram showing changes over time in the defocus amounts and optical axis vector amounts of the two subjects.
FIG. 15 is a block diagram showing the subject tracking processing unit in the second embodiment of the present invention.
FIG. 16 is a diagram showing the focus detection regions in the second embodiment of the present invention.

Exemplary embodiments of the present invention are described in detail below with reference to the accompanying drawings. The imaging device of each embodiment is configured for focus detection by the imaging-plane phase difference detection method, and performs subject tracking calculations using distance information related to the image data and optical axis vector information (information on the optical axis component of a position change vector), described later. The configurations described in the following embodiments are merely examples, and the present invention is not limited to them.

[First Embodiment]
The imaging device of this embodiment is described with reference to FIGS. 1 and 2. FIG. 1 is a perspective view showing the external appearance of the imaging device 101. The optical axis direction of the imaging device 101 is the depth direction 103, which is normal to the imaging surface 102 inside the imaging device 101. FIG. 2 is a block diagram showing a configuration example of the imaging device 200. The imaging device 200 has a function of recording, on a recording medium, image data (moving image or still image data) of a subject captured by the image sensor through the imaging optical system. The recording medium may be any of various media, such as tape, solid-state memory, an optical disk, or a magnetic disk. The imaging device 200 is, for example, a digital still camera or a video camera, but is not limited to these.

The photographing lens 201 is a lens unit including a fixed lens 202, a diaphragm 203, and a focus lens 204. The diaphragm control unit 211 drives the diaphragm 203 to adjust its aperture diameter and thereby the amount of light during shooting. The focus control unit 212 determines the drive amount of the focus lens 204 based on the defocus amount of the photographing lens 201. The focus state is controlled by moving the focus lens 204, which is the focus adjustment lens. The lens control unit 213 controls the movement of the focus lens 204 via the focus control unit 212, realizing automatic focus adjustment. Although FIG. 2 shows the focus lens 204 as a single lens for simplicity, it is usually composed of a plurality of lenses. The diaphragm control unit 211 and the focus control unit 212 are controlled by the lens control unit 213. As described in detail later, the focus lens 204 is driven so as to keep the main subject continuously in focus.

Light from the scene that enters through the optical members of the photographing lens 201 forms an image on the light-receiving surface of the image sensor 221. The image sensor 221 is a photoelectric conversion element that converts the subject image (optical image) into signal charge, and is a CCD (charge-coupled device) or CMOS (complementary metal-oxide-semiconductor) image sensor. The signal charge accumulated in each photoelectric conversion unit of the image sensor 221 is read out sequentially, as a voltage signal corresponding to the charge, by drive pulses output from the timing generator 222.

The image sensor 221 has a plurality of photoelectric conversion units that share one microlens and generates a plurality of parallax images, which enables focus detection by the imaging-plane phase difference detection method. The photoelectric conversion units are, for example, a first and a second photoelectric conversion unit, from which a pair of parallax image data can be acquired. The image sensor 221 is described later with reference to FIG. 3.

The imaging signal processing unit 223 acquires and processes the output signal of the image sensor 221 and stores the image signal in the SDRAM (Synchronous Dynamic Random Access Memory) 261 via the bus 231. The focus detection signal processing unit 224 acquires and processes the focus detection signals output from the image sensor 221. The focus detection method it performs, based on imaging-plane phase difference detection, is described later.

The image signal stored in the SDRAM 261 is read by the display control unit 241 via the bus 231, and the display unit 242 displays an image according to the image signal. In an operation mode in which image signals are recorded, the recording medium control unit 251 reads the image signal from the SDRAM 261 and records it on the recording medium 252.

The camera control unit 225 includes a main CPU (central processing unit) and supervises control of each part of the camera system. The ROM (read-only memory) 262 stores the control program executed by the camera control unit 225 and various data needed for control. The flash ROM 263 stores various setting information related to camera operation, such as user setting information.

The camera control unit 225 converts the defocus amount output from the focus detection signal processing unit 224 into a focus lens drive amount and passes it to the lens control unit 213. The lens control unit 213 includes a sub CPU and instructs the focus control unit 212 to control the movement of the focus lens 204. The camera control unit 225 also accepts operation instructions that the user gives through the operation unit and controls the operation of each part accordingly. For example, based on the operation instructions and the magnitude of the pixel signal, the camera control unit 225 determines the accumulation time of the image sensor 221, the gain setting used when the image sensor 221 outputs to the imaging signal processing unit 223, and the setting values of the timing generator 222. The pixel signal here is that of the image data temporarily stored in the SDRAM 261.

The subject tracking unit 271 executes tracking processing for the main subject. In subject tracking, it performs calculations using distance information and the optical axis component of a position change vector (hereinafter, the optical axis vector), both of which can be calculated from the parallax images. This embodiment uses distance information to improve tracking accuracy, and uses the imaging-plane phase difference focus detection method to improve it further. That is, because the subject tracking unit 271 can output a tracking result for every frame, parallax images are acquired from the same imaging surface so that distance information can likewise be calculated for every frame. The tracking result of the subject tracking unit 271 is passed to each module via the bus 231. It is passed to the focus control unit 212 via the camera control unit 225 and the lens control unit 213, realizing automatic focus detection and focus adjustment control for the main subject region in the captured image. The main subject region is the image region of the main subject to be tracked. The diaphragm control unit 211 performs exposure control using the luminance values of a specific subject region. The display unit 242 indicates the main subject being tracked by the subject tracking unit 271 by displaying a rectangular frame or the like around its image region on the screen. The captured image data stored in the SDRAM 261 is sent to the subject tracking unit 271 via the bus 231. The subject tracking unit 271 tracks the main subject using a plurality of images captured at different times, and extracts a partial region indicating the main subject as the tracking result.

Next, the relationship between the pupil of the imaging optical system and the photoelectric conversion units of the image sensor is described with reference to FIG. 3. The image sensor illustrated in FIG. 3 consists of m × n sensor units arranged two-dimensionally. The cross section 301 in FIG. 3 shows part of the image sensor 221. Each sensor unit 302 contains a microlens 303 and two photoelectric conversion units (304, 305). From the output signals of the first photoelectric conversion unit 304 and the second photoelectric conversion unit 305, image signals for automatic focus adjustment by the imaging-plane phase difference detection method can be generated.

FIG. 3 shows the correspondence between different regions (307, 308) of the pupil 306 of the imaging optical system and the photoelectric conversion units. The first pupil partial region 307 and the second pupil partial region 308 lie on opposite sides of the optical axis. The light flux passing through each region is received by the corresponding one of the two photoelectric conversion units (304, 305) via the microlens 303 arranged in each sensor unit 302, centered on the axis of the depth direction 103, that is, the optical axis. The light flux that has passed through the first pupil partial region 307 is received by the first photoelectric conversion unit 304 via the microlens 303, and the light flux that has passed through the second pupil partial region 308 is received by the second photoelectric conversion unit 305 via the microlens 303. The two photoelectric conversion units in each sensor unit provide both an imaging signal and focus detection signals. Adding the outputs of the two photoelectric conversion units (304, 305) yields the captured image data, and the imaging signal processing unit 223 in FIG. 2 shapes this imaging signal into an image signal (image data). Handling the outputs of the two photoelectric conversion units (304, 305) separately yields two images with different viewpoints (parallax images). The first parallax image is obtained from the output of the first photoelectric conversion unit 304, and the second parallax image from the output of the second photoelectric conversion unit 305. The focus detection signal processing unit 224 in FIG. 2 performs the focus detection calculation using the signals of the pair of parallax images.

In this embodiment, the image of the image signal obtained by adding the outputs of the two photoelectric conversion units is called the A+B image, and the images of the image signals obtained from each of the two outputs are called the A image and the B image. The method of generating the phase difference signals is not limited to that of this embodiment, and other methods may be used. For example, the signal of the A image or B image obtained from the output of one photoelectric conversion unit can be subtracted from the signal of the A+B image obtained from the summed output, to generate the signal of the other image (the B image or A image, respectively).
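The subtraction variant mentioned at the end of the preceding paragraph can be illustrated with a short sketch. It assumes that each image is a nested list of per-pixel signal values; the function name and the sample values are hypothetical.

```python
def derive_other_image(sum_image, one_image):
    """Recover the other parallax image by per-pixel subtraction: (A+B) - A = B."""
    return [
        [s - v for s, v in zip(sum_row, one_row)]
        for sum_row, one_row in zip(sum_image, one_image)
    ]


# Illustrative pixel values only.
a_image = [[10, 12, 11], [9, 13, 10]]
b_image = [[11, 10, 12], [8, 14, 9]]

# The A+B image is the per-pixel sum of the two photoelectric conversion outputs.
a_plus_b = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(a_image, b_image)]

recovered_b = derive_other_image(a_plus_b, a_image)
recovered_a = derive_other_image(a_plus_b, b_image)
```

With this variant, only the A+B image and one of the two parallax images need to be read out; the other is reconstructed.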

The focus detection method performed by the focus detection signal processing unit 224, based on imaging-plane phase difference detection, is now described. For a set focus detection region, a pair of image signals for focus detection, for example the A image signal and the B image signal, is acquired from the image sensor 221. The focus detection signal processing unit 224 then calculates the correlation amount between the acquired image signals. This calculation is performed for each scanning line in the focus detection region. The focus detection signal processing unit 224 calculates a correlation change amount from the correlation amount, and calculates the shift amount between the two images from the correlation change amount. Multiplying this shift amount by a predetermined conversion coefficient converts it into a defocus amount. At this time, the camera control unit 225 acquires the shooting parameter information used for automatic focus detection. The shooting parameters are stored in memory in the camera body or the lens unit. They include information such as the aperture information of the diaphragm 203 in the photographing lens 201 and the sensor gain of the image sensor 221 in the camera body. The information acquired need not follow the configuration of this embodiment; the necessary information may be acquired as appropriate for the camera configuration. The camera control unit 225 provides the necessary information so that processing related to generating focus detection signals, and the regions in which focus detection is performed, can be set based on the shooting parameters. In this embodiment, the focus detection regions are set by dividing the entire screen equally in two dimensions.
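The sequence described above (correlation amount, correlation change amount, image shift, defocus amount) can be sketched as follows. The text does not give the formulas, so this sketch rests on assumptions: the correlation amount is taken to be a sum of absolute differences, the correlation change amount is the difference between correlation amounts two shifts apart, and the shift is read off at the steepest positive-to-negative zero crossing. The function names, the signal values, and the conversion coefficient of 0.5 are hypothetical.

```python
def correlation_amounts(a_lines, b_lines, max_shift):
    """COR[s]: sum of |A[k] - B[k + s]| over all scanning lines, for each shift s."""
    n = len(a_lines[0])
    window = range(max_shift, n - max_shift)  # same comparison window for every shift
    return [
        sum(abs(a[k] - b[k + s]) for a, b in zip(a_lines, b_lines) for k in window)
        for s in range(-max_shift, max_shift + 1)
    ]


def image_shift(cor, max_shift):
    """Sub-sample shift at the steepest +/- zero crossing of the correlation change."""
    # Correlation change amount centred on shift index i (assumed definition).
    change = {i: cor[i - 1] - cor[i + 1] for i in range(1, len(cor) - 1)}
    best = None
    for i in range(1, len(cor) - 2):
        d0, d1 = change[i], change[i + 1]
        if d0 >= 0 and d1 < 0:  # COR has a minimum between index i and i + 1
            steepness = d0 - d1
            if best is None or steepness > best[0]:
                best = (steepness, (i - max_shift) + d0 / steepness)
    return None if best is None else best[1]


def defocus_amount(shift, conversion_coefficient):
    """Convert the image shift into a defocus amount."""
    return shift * conversion_coefficient


# One scanning line: the B image is the A image displaced by two samples.
a_line = [0, 0, 0, 0, 0, 1, 3, 6, 3, 1, 0, 0, 0, 0, 0, 0]
b_line = [0, 0] + a_line[:-2]

cor = correlation_amounts([a_line], [b_line], max_shift=4)
shift = image_shift(cor, max_shift=4)
defocus = defocus_amount(shift, conversion_coefficient=0.5)
```

Here the correlation amount reaches zero at a shift of two samples, so the sketch recovers the displacement that was built into the B image. Summing over all scanning lines before searching for the zero crossing is one plausible way to combine the per-line results.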

Next, the operation of the imaging device is described with reference to FIGS. 4 to 14. First, the overall flow of processing, including subject tracking, is described with reference to FIG. 4. The following processing is carried out according to a program executed by the CPU of the camera control unit 225.

The imaging device performs exposure, and the image sensor 221 converts the optical image into an electric signal (S401). Steps S402 and S403 are executed in parallel with steps S404 and S405. The imaging signal processing unit 223 converts the signal of the captured image (A+B image) read from the image sensor 221 into an image signal (S402). The resulting image signal of the A+B image is stored in the SDRAM 261 (S403). Meanwhile, the signals of the parallax images (A image and B image) acquired by the image sensor 221 are input to the focus detection signal processing unit 224 (S404). The focus detection signal processing unit 224 detects the shift amount between the A image and the B image and calculates the defocus amount (S405).

After steps S403 and S405, processing proceeds to determination step S406, in which the camera control unit 225 determines the mode. If the user has selected the tracking mode (YES in S406), processing proceeds to S407; if the tracking mode is not selected (NO in S406), processing moves to determination step S408. The tracking mode is a mode in which the imaging device tracks the main subject. In S407, the subject tracking unit 271 performs subject tracking processing based on the captured image and the defocus information. In determination step S408, the camera control unit 225 determines whether the power is on or off. If the user turns the power off, system operation ends (YES in S408). If the power is not turned off (NO in S408), processing returns to S401 and exposure is performed again.
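The control flow of S401 to S408 can be sketched as a loop. This is only an outline: the three helper functions are placeholders standing in for the imaging signal processing unit 223, the focus detection signal processing unit 224, and the subject tracking unit 271, and their bodies do not reflect the actual processing. The two parallel branches are modelled here with a thread pool, which is an assumption about one possible software realisation; the sketch also assumes that S407 is followed by S408.

```python
from concurrent.futures import ThreadPoolExecutor


def develop_image(a_signal, b_signal):
    """S402-S403 placeholder: form the A+B image signal to be stored."""
    return [a + b for a, b in zip(a_signal, b_signal)]


def compute_defocus(a_signal, b_signal):
    """S404-S405 placeholder: stands in for phase-difference focus detection."""
    return sum(b - a for a, b in zip(a_signal, b_signal)) / len(a_signal)


def track_subject(image, defocus):
    """S407 placeholder: stands in for the subject tracking processing."""
    return {"peak": max(image), "defocus": defocus}


def run(exposures, tracking_mode, power_off_after):
    """Process frames until the (simulated) power-off operation occurs."""
    results = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        for count, (a_signal, b_signal) in enumerate(exposures, start=1):  # S401
            image_job = pool.submit(develop_image, a_signal, b_signal)      # S402-S403
            defocus_job = pool.submit(compute_defocus, a_signal, b_signal)  # S404-S405
            image, defocus = image_job.result(), defocus_job.result()
            if tracking_mode:                                               # S406
                results.append(track_subject(image, defocus))               # S407
            if count >= power_off_after:                                    # S408
                break
    return results


exposures = [([1, 2, 3], [1, 2, 5]), ([2, 2, 2], [2, 2, 2]), ([0, 1, 0], [0, 1, 0])]
tracked = run(exposures, tracking_mode=True, power_off_after=2)
untracked = run(exposures, tracking_mode=False, power_off_after=3)
```

In this sketch the power-off operation is simulated by a frame count, so the third exposure is never processed in the first call.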

Next, the subject tracking processing is described with reference to FIG. 5, which is a block diagram illustrating subject tracking in this embodiment. The subject tracking unit 271 includes a first processing unit that processes the captured image (A+B image) and a second processing unit that processes defocus information. The first processing unit includes a subject detection unit 501, a matching unit 502, and a subject region determination unit 503. Based on the captured images (A+B images) supplied sequentially from the SDRAM 261, the subject detection unit 501 and the matching unit 502 extract first subject candidate regions. The second processing unit includes an optical axis vector calculation unit 504 and a distance/vector comparison unit 505. To improve subject tracking performance, the second processing unit extracts a second subject candidate region using the defocus information that the focus detection signal processing unit 224 calculates from the parallax images (A image and B image). The subject region determination unit 503 tracks the main subject based on the first and second subject candidate regions. As the tracking result, a partial region indicating the specific subject in the image is extracted.

This embodiment is described using, as an example, subject tracking processing based on the defocus information calculated by the focus detection signal processing unit 224, but the present invention can be applied to embodiments that use various kinds of information corresponding to the depth of a subject in the image. That is, the information indicated by data corresponding to subject depth (depth information) may either directly represent the subject distance from the imaging device to a subject in the image, or represent the relative relationship among the subject distances or depths of subjects in the image. Each processing unit is described in detail below.

The subject detection unit 501 in FIG. 5 acquires the captured image (A+B image), detects the target subject, and sets it as the tracking target. The captured image data is output by the imaging signal processing unit 223 and stored in the SDRAM 261. The subject detection unit 501 performs, for example, face detection and identifies a person's face region as the subject region in the image. A known face detection method is used. Such methods include those that use knowledge about faces (skin color information and parts such as the eyes, nose, and mouth) and those that construct a classifier for face detection with a learning algorithm, typified by a neural network. In face detection, these methods are generally combined to improve the recognition rate. A specific example is the method described in Patent Document 3, which detects faces using the wavelet transform and image feature amounts. The configuration is not limited to this; the user (operator) may designate an arbitrary subject image in the image as the tracking target using an input interface unit that includes a touch panel, buttons, and the like. In that case, the subject detection unit 501 detects the subject region in the captured image based on the position information designated by the user's operation and outputs the detection information to the matching unit 502. The subject region specified by the detection is set as the image region of the main subject.

The matching unit 502 acquires the detection information from the subject detection unit 501 and registers the detected subject region as a template. The matching unit 502 performs matching between the template and partial regions of the images sequentially supplied from the SDRAM 261, and outputs a plurality of evaluation values and the corresponding region information as information on the first subject candidate regions. Although many matching methods exist, the present embodiment applies a template matching method based on the degree of difference between pixel patterns. The template matching method is described in detail with reference to FIG. 6.

FIG. 6(A) shows an example of a subject model (template) used in template matching. The matching unit 502 treats the pixel pattern of the image 601, which shows the region of the target main subject, as the feature amount. FIG. 6(A) shows an example in which the feature amount 602 of the image 601 is expressed as a matrix, with the luminance signal of the pixel data used as the feature amount. The feature amount is denoted T(i,j), the coordinates within the template region are (i,j), the number of horizontal pixels is W, and the number of vertical pixels is H. The feature amount T(i,j) is expressed by the following equation.
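
The equation itself is not reproduced in this text. A form consistent with the definitions above (a W x H matrix of luminance values indexed by (i,j)) would be the following; it is a reconstruction from those definitions, not necessarily the published notation.

```latex
T(i,j) = \{\, T(0,0),\ T(1,0),\ \ldots,\ T(W-1,\,H-1) \,\}
```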

FIG. 6(B) illustrates the image information used when searching for the main subject and shows the image 603 of the range over which matching is performed. Coordinates in the search image are denoted (x,y). The partial region 604 from which a matching evaluation value is obtained is shown as a rectangular frame. As with the template in FIG. 6(A), the luminance signal of the image data is used as the feature amount 605 of the partial region 604. The feature amount is denoted S(i,j), the coordinates within the partial region 604 are (i,j), the number of horizontal pixels is W, and the number of vertical pixels is H. The feature amount S(i,j) is expressed by the following equation.
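
The equation itself is not reproduced in this text. A form consistent with the definitions above is sketched below. The second line is an added assumption: it supposes that (x,y) marks the upper-left corner of the partial region 604 and introduces the symbol I, which does not appear in the source, for the luminance of the search image 603.

```latex
S(i,j) = \{\, S(0,0),\ S(1,0),\ \ldots,\ S(W-1,\,H-1) \,\}
\qquad
S(i,j) = I(x+i,\ y+j)
```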

One calculation method for evaluating the similarity between the template region and the partial region 604 uses the sum of absolute differences, the so-called SAD (Sum of Absolute Differences) value. The SAD value, denoted V(x,y), is calculated by the following equation.
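
The equation itself is not reproduced in this text. The standard definition of the sum of absolute differences, written with the symbols T, S, W, and H defined above, is:

```latex
V(x,y) = \sum_{j=0}^{H-1} \sum_{i=0}^{W-1} \bigl|\, T(i,j) - S(i,j) \,\bigr|
```

Here S(i,j) is taken from the partial region positioned at (x,y) in the search image, so V depends on (x,y) through S.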

The matching unit 502 computes the SAD value V(x,y) while shifting the partial region 604 one pixel at a time, starting from the upper left of the image 603 of the search range. The coordinates (x,y) at which the computed SAD value V(x,y) is smallest indicate the position most similar to the template. That is, the position where V(x,y) takes its minimum is the position in the search image where the main subject is most likely to be.
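
The search described above can be sketched in a few lines of Python. This is an illustration only, not the implementation of the matching unit 502: images are assumed to be lists of rows of luminance values, (x,y) is assumed to be the upper-left corner of the window, and the function names `sad` and `best_match` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of luminance values, indexed [y][x]


def sad(template: Image, search: Image, x: int, y: int) -> int:
    """Sum of absolute differences between the template and the
    W x H window of the search image whose upper-left corner is (x, y)."""
    return sum(
        abs(t - search[y + j][x + i])
        for j, row in enumerate(template)
        for i, t in enumerate(row)
    )


def best_match(template: Image, search: Image) -> Tuple[int, int, int]:
    """Slide the window one pixel at a time from the upper left and
    return (V_min, x, y) for the position with the smallest SAD value."""
    h, w = len(template), len(template[0])
    if len(search) < h or len(search[0]) < w:
        raise ValueError("search image is smaller than the template")
    best = None
    for y in range(len(search) - h + 1):
        for x in range(len(search[0]) - w + 1):
            v = sad(template, search, x, y)
            if best is None or v < best[0]:
                best = (v, x, y)
    return best
```

A real tracker would keep several low-SAD positions rather than only the minimum, since the matching unit outputs a plurality of evaluation values; the exhaustive loop is also far slower than a hardware or vectorized implementation.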

An example using one-dimensional information, the luminance signal, as the feature amount has been described, but three-dimensional information such as lightness, hue, and saturation signals may be used as the feature amount. Although the SAD value has been described as the method of computing the matching evaluation value, a different method such as normalized cross-correlation, the so-called NCC (Normalized Cross-Correlation), may be used. The present invention is not limited to template matching; other matching methods, such as histogram matching based on the similarity of histograms, may be used.

The subject region determination unit 503 in FIG. 5 acquires the plurality of evaluation values and region information from the matching unit 502 and determines, from the candidate regions of the main subject, the region with the smallest evaluation value as the subject region. However, in a scene in which a similar subject whose pixel pattern or color histogram resembles that of the main subject is near the main subject, the evaluation value of the similar subject's region may also become small. One countermeasure is to distinguish the main subject from the similar subject using the defocus amount output from the focus detection signal processing unit 224. With this method, however, it is difficult to distinguish the main subject from the similar subject in a scene where the two are close to each other in the optical axis direction, or in a scene where the main subject passes the similar subject and moves in the optical axis direction while it is temporarily absent from the imaging surface. In other words, defocus information alone has limits in discriminating the main subject from a similar subject. The present embodiment therefore refers to the optical axis vector of the subject in addition to the defocus information, which further improves tracking performance.

The optical axis vector calculation unit 504 is described with reference to FIG. 7. The optical axis vector is the optical axis component of the position change vector of the subject image present in each focus detection region on the imaging surface. FIG. 7 illustrates, for an arbitrary focus detection region, the change over time of the image plane position at which the subject is in focus and of the focus lens position, both converted into image plane values. The horizontal axis represents time and the vertical axis represents the image plane value. At each of the times t1 and t2, the focus lens position is shown by a filled circle and the position at which the subject is in focus is shown by a filled square.

An arbitrary focus detection region is the c-th region in the x direction and the r-th region in the y direction, and its position is denoted (c,r). The past defocus amount (at time t1) in that focus detection region is denoted Dpre(c,r), and the current defocus amount (at time t2) in the same region is denoted Dcur(c,r). The optical axis vector is denoted v(c,r), and the value obtained by converting the drive amount of the focus lens 204 into an image plane value is denoted L. From FIG. 7, the optical axis vector v(c,r) is calculated by the following equation.
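
The equation itself is not reproduced in this text. One plausible reconstruction follows from the description of FIG. 7, under two assumptions: a defocus amount is the signed offset on the image plane from the focus lens position to the in-focus position, and L is the signed lens movement between t1 and t2. Writing P1 (a symbol introduced here, not in the source) for the lens position at t1, the in-focus position moves from P1 + Dpre(c,r) to P1 + L + Dcur(c,r), which gives:

```latex
v(c,r) = \bigl(P_1 + L + D_{\mathrm{cur}}(c,r)\bigr) - \bigl(P_1 + D_{\mathrm{pre}}(c,r)\bigr)
       = D_{\mathrm{cur}}(c,r) - D_{\mathrm{pre}}(c,r) + L
```

The sign convention is part of the assumption; the published equation may order or sign the terms differently.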

The drive amount of the focus lens 204 can be converted into the image plane value L using the product of a conversion coefficient, which converts the drive amount into a defocus amount, and the extension amount per drive pulse of the focus lens 204. The conversion coefficient from focus lens drive amount to defocus amount and the extension amount per pulse are stored in the ROM 262, so the camera control unit 225 can obtain them via the bus 231 as needed. In addition, the defocus amount calculated by the focus detection signal processing unit 224 can be converted into a drive amount of the focus lens 204.
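
As a rough numerical illustration, the conversion and the resulting optical axis vector could be computed as below. Everything here is an assumption made for the sketch: the drive amount is taken to be a signed pulse count, L is taken to be pulses times extension per pulse times the conversion coefficient, the vector is taken to be Dcur - Dpre + L, and the function names are hypothetical.

```python
def drive_to_image_plane(pulses: int, extension_per_pulse: float,
                         coefficient: float) -> float:
    """Convert a signed lens drive amount (in pulses) to an image plane
    value L. Assumed form: pulses x extension per pulse x coefficient."""
    return pulses * extension_per_pulse * coefficient


def optical_axis_vector(d_pre: float, d_cur: float, lens_shift: float) -> float:
    """Optical axis vector for one focus detection region, assuming
    v = Dcur - Dpre + L with all values in image plane units."""
    return d_cur - d_pre + lens_shift
```

For example, with 10 pulses, 0.002 units of extension per pulse, and a coefficient of 1.5, the lens shift is about 0.03. A region whose defocus went from 0.10 to 0.04 over the same interval then has an optical axis vector of about -0.03.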

A method of determining the drive amount of the focus lens 204 is described with reference to FIG. 8. FIG. 8 shows an example image for explaining the tracking of a subject 801. The subject region 802 in the image is the subject region determined by the subject region determination unit 503 one frame earlier, and its centroid is indicated by the point 803. The rectangular region is the focus detection region 804 that contains the centroid 803. When tracking the subject 801, the camera control unit 225 determines the drive amount of the focus lens 204 based on the defocus amount calculated in the focus detection region 804 that contains the centroid 803 of the subject region 802. Because the defocus amounts for the main subject continue to be accumulated in the SDRAM 261, the next subject position can be predicted by computing an approximate curve from the trend of the defocus amount over a certain period. The drive amount of the focus lens 204 can then be determined based on the predicted position.
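
The source does not say which approximate curve is used. As a minimal sketch, assuming a straight line fitted by least squares to the stored samples, the prediction could look like this (the function name `predict_next` is hypothetical):

```python
from typing import Sequence


def predict_next(times: Sequence[float], values: Sequence[float],
                 t_next: float) -> float:
    """Fit value = a + b * t by least squares to the stored samples and
    extrapolate to t_next. A straight line is an assumption; the source
    only speaks of an approximate curve."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (time, value) samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all samples share the same time stamp")
    slope = sum((t - mean_t) * (v - mean_v)
                for t, v in zip(times, values)) / sxx
    return mean_v + slope * (t_next - mean_t)
```

A higher-order fit would follow an accelerating subject more closely, at the cost of amplifying focus detection noise in the most recent samples.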

In a moving image shooting mode or the like, the number of parallax images sampled for focus detection per unit time is large. In such a mode, the absolute value of the optical axis vector is small, so it may be affected by noise from focus detection error. In the present embodiment, therefore, the optical axis vector calculated by the optical axis vector calculation unit 504 is stored in the SDRAM 261 and updated as a moving average over several frames. This yields an optical axis vector with reduced noise. The noise reduction method is not limited to the moving average; for example, low-pass filtering may be applied to the defocus amounts stored in the SDRAM 261 over several frames. Various methods that smooth the change in the defocus amount in this way before calculating the optical axis vector may be used.
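
A moving average over the last few frames can be kept with a fixed-length buffer. The sketch below is illustrative only; the class name and the window length are hypothetical, since the source says only "several frames".

```python
from collections import deque


class MovingAverage:
    """Running mean of the most recent `window` optical axis vectors."""

    def __init__(self, window: int) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        self._buf = deque(maxlen=window)  # oldest sample drops out automatically

    def update(self, value: float) -> float:
        """Store the newest value and return the mean of the buffer."""
        self._buf.append(value)
        return sum(self._buf) / len(self._buf)
```

A longer window suppresses more noise but responds more slowly when the subject changes speed along the optical axis.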

The distance/vector comparison unit 505 in FIG. 5 determines whether the defocus amount and optical axis vector of the subject stored in the SDRAM 261 match the defocus amount and optical axis vector in each focus detection region. The thresholds used for this determination are described below. The thresholds for the defocus amount and the optical axis vector are determined based on the plurality of evaluation values output from the matching unit 502 and the variance of the center point coordinates of the respective regions, and are set dynamically according to the tracking reliability. Specifically, of the N evaluation values output from the matching unit 502, the i-th evaluation value, which corresponds to a coordinate, is denoted Vi, and the smallest of the N evaluation values is denoted Vmin. Based on the center coordinates of the corresponding regions (denoted Ci), the evaluation-value centroid coordinate (denoted G) is calculated by the following equation.
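
The equation itself is not reproduced in this text. The description that follows it in the source states that each center coordinate Ci is divided by the normalized evaluation value Vi/Vmin and that the sum is divided by N. One form consistent with that wording is:

```latex
G = \frac{1}{N} \sum_{i=1}^{N} \frac{C_i}{V_i / V_{\min}}
```

This is a reconstruction from the prose and may differ from the published equation in notation.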

The evaluation value Vi is normalized by dividing it by the minimum evaluation value Vmin. As described earlier, the evaluation value Vi represents the difference from the template, so a smaller value means a higher similarity to the template. To calculate the evaluation-value centroid coordinate G weighted by the evaluation values Vi, each center coordinate Ci is therefore divided by the normalized evaluation value Vi/Vmin, and the sum is divided by N to take the average. Then, as in the following equation, the Euclidean distances between the evaluation-value centroid coordinate G and the center coordinates Ci of the regions are weighted by the respective evaluation values Vi and summed to obtain the tracking reliability evaluation value (denoted R).
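
The equation for R is not reproduced in this text, and the prose leaves its exact form open. The sketch below computes G as described (each Ci divided by Vi/Vmin, summed, divided by N) and, as an assumption, computes R as the sum of the Euclidean distances |G - Ci|, each multiplied by the same weight Vmin/Vi. The published equation may weight or square the distances differently. The function name is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def centroid_and_reliability(centers: List[Point],
                             values: List[float]) -> Tuple[Point, float]:
    """Return (G, R) for N candidate centers Ci with evaluation values Vi.

    Assumed forms:
      G = (1/N) * sum(Ci / (Vi / Vmin))
      R = sum(|G - Ci| * (Vmin / Vi))   # weighting of R is a guess
    """
    n = len(centers)
    if n == 0 or n != len(values):
        raise ValueError("centers and values must be non-empty and equal in length")
    if min(values) <= 0:
        raise ValueError("evaluation values must be positive to be normalized")
    v_min = min(values)
    weights = [v_min / v for v in values]  # equals 1 / (Vi / Vmin)
    gx = sum(w * c[0] for w, c in zip(weights, centers)) / n
    gy = sum(w * c[1] for w, c in zip(weights, centers)) / n
    r = sum(w * math.hypot(c[0] - gx, c[1] - gy)
            for w, c in zip(weights, centers))
    return (gx, gy), r
```

Because the weighted sum for G is divided by N rather than by the sum of the weights, G is not a conventional weighted mean unless all Vi are equal; the sketch follows the wording of the description as is.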

Because the tracking reliability evaluation value R is a variance, a smaller value means less scatter in the coordinates of the first subject candidate regions and higher tracking reliability. When R is small, the regions with high evaluation (those most similar to the template) are clustered together, and the likelihood of mistaking a similar subject for the main subject is low. The distance/vector comparison unit 505 therefore changes the determination thresholds to widen the tolerance used to judge whether the defocus amount and the optical axis vector match, reducing the influence of the defocus amount and the optical axis vector. This reduces the effect of erroneous distance measurement. Conversely, when R is large, the regions with high evaluation are scattered, and the likelihood of erroneously tracking a similar subject is higher. The distance/vector comparison unit 505 therefore changes the determination thresholds to narrow the tolerance used to judge whether the defocus amount and the optical axis vector match, increasing the influence of the defocus amount and the optical axis vector. Although a method of weighting the evaluation-value centroid coordinate G and the tracking reliability evaluation value R by the evaluation values Vi has been described, the method is not limited to this. A method without weighting, or another method, may be used.

Next, the subject tracking process performed by the subject tracking unit 271 in FIG. 5 is described in detail with reference to the flowcharts of FIGS. 9 and 10. First, an outline of subject tracking is given with reference to FIG. 9. In determination step S901, the subject tracking unit 271 determines whether a template image exists. In the initial frame after the user selects the tracking mode, no template image exists, so in this case (NO in S901) the process proceeds to step S902. If a template image already exists, the processing from step S906 onward and the processing from step S909 onward are executed in parallel.

When it is determined that a template image needs to be registered, the captured image (A+B image) data in the SDRAM 261 is input to the subject tracking unit 271 (S902). The subject detection unit 501 detects the subject from the feature amounts of the input captured image (S903). The detected subject region is registered as the template (S904). In the subject tracking process for the initial frame, only template registration is performed. The process then proceeds to determination step S905, where it is determined whether the tracking mode is to end. If the tracking mode is to end (YES in S905), the subject tracking process ends. Otherwise, the process returns to determination step S901.

From the second frame after subject tracking starts, a template image exists, so the process proceeds to steps S906 and S909. The matching unit 502 performs matching, based on the template, on the captured image (A+B image) input in step S906 (S907) and detects the first subject candidate regions (S908).

Meanwhile, in step S909, defocus information is input from the focus detection signal processing unit 224 to the subject tracking unit 271. The optical axis vector calculation unit 504 calculates the optical axis vectors, and the distance/vector comparison unit 505 detects the second subject candidate regions (S910). The method of detecting the second subject candidate regions is described in detail later.

After steps S908 and S910, the process proceeds to step S911, where the first subject candidate regions are narrowed down using the second subject candidate regions. To add distance information to the subject tracking, those center points of the first subject candidate regions obtained by template matching (the output of S908) that lie within the second subject candidate regions detected by the distance/vector comparison unit 505 (the output of S910) are extracted. For the first subject candidate regions, a smaller evaluation value means a higher similarity to the template. In step S912, therefore, the subject region determination unit 503 determines, as the subject region, the candidate with the smallest evaluation value among the first subject candidate regions that remain after the narrowing.
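
Steps S911 and S912 amount to a filter followed by a minimum. The sketch below assumes that each first candidate is a pair of evaluation value and center point and that each second candidate region is an axis-aligned rectangle; these data layouts and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (left, top, right, bottom), inclusive
Candidate = Tuple[float, Point]           # (evaluation value, center point)


def contains(rect: Rect, p: Point) -> bool:
    """True if point p lies inside the rectangle, borders included."""
    left, top, right, bottom = rect
    return left <= p[0] <= right and top <= p[1] <= bottom


def select_subject(first: List[Candidate],
                   second: List[Rect]) -> Optional[Candidate]:
    """S911: keep first candidates whose center lies in any second region.
    S912: return the kept candidate with the smallest evaluation value,
    or None when nothing survives the narrowing."""
    kept = [c for c in first if any(contains(r, c[1]) for r in second)]
    return min(kept, key=lambda c: c[0]) if kept else None
```

With candidates `[(5, (10, 10)), (3, (50, 50)), (8, (12, 12))]` and one region `(0, 0, 20, 20)`, the candidate with value 3 is discarded because its center lies outside the region, and the candidate with value 5 is selected.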

The subject region determination unit 503 cannot always determine a subject region. For example, if the shape or color distribution of the subject changes greatly and the evaluation value output from the matching unit 502 is smaller than a predetermined value, the subject region cannot be determined. Likewise, if in step S911 no region matches between the first and second subject candidate regions, so that no first subject candidate region remains, the subject region cannot be determined. A state in which the subject region cannot be determined in this way is defined as the LOST state. A state in which the subject region determination unit 503 can determine the subject region is defined as the FIND state. In determination step S913, the FIND/LOST state is determined, and the process branches to S914 or S917 accordingly.

In the FIND state, the process proceeds to step S914, and the template image of the subject is updated to the latest subject image. The drive amount of the focus lens 204 must then be determined in order to keep the subject in focus. The drive amount of the focus lens 204 is determined based on the defocus amount at the centroid position of the main subject region determined by the subject region determination unit 503 (S915). At this time, the value obtained by converting the drive amount of the focus lens 204 into an image plane value equals the magnitude of the optical axis vector of the subject. The drive amount of the focus lens 204 is therefore stored in the SDRAM 261 as the optical axis vector of the subject (S916). The process then proceeds to determination step S905. If, on the other hand, the result of determination step S913 is the LOST state, the main subject region is not determined and the defocus amount for the main subject is unknown. The driving of the focus lens 204 is therefore stopped (S917). It is then determined whether the LOST state has continued for a predetermined time (a threshold time for the determination) or longer (S918). If the LOST state has continued for the predetermined time or longer (YES in S918), the subject template is deleted (S919). After the template is deleted, the process returns to determination step S901, and, as a result of that determination (NO), the template is registered again (S904). If the LOST state has lasted less than the predetermined time (NO in S918), the process moves to determination step S905 with the template retained. Although the present embodiment uses a LOST state lasting the predetermined time or longer as the condition in determination step S918, other conditions may be used.
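
The FIND/LOST handling of the template (S913, S914, S918, S919) can be sketched as a small state holder. This is an illustration under assumptions: the predetermined time is counted in frames rather than seconds, lens control is omitted, and the class and attribute names are hypothetical.

```python
from typing import Optional


class TemplateState:
    """Tracks the template and how long the tracker has been LOST."""

    def __init__(self, template: object, lost_limit: int) -> None:
        self.template: Optional[object] = template
        self.lost_limit = lost_limit  # assumed: threshold expressed in frames
        self.lost_frames = 0

    def step(self, subject_region: Optional[object]) -> str:
        """Process one frame; subject_region is None when no region
        could be determined. Returns "FIND" or "LOST"."""
        if subject_region is not None:
            self.template = subject_region  # S914: update to the latest image
            self.lost_frames = 0
            return "FIND"
        self.lost_frames += 1               # LOST: lens drive would stop (S917)
        if self.lost_frames >= self.lost_limit:  # S918
            self.template = None                 # S919: delete the template
        return "LOST"
```

Once `template` becomes `None`, the caller would detect the subject again and register a new template, which corresponds to the return to S901 and S904.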

Next, the method by which the distance/vector comparison unit 505 determines the second subject candidate regions is described with reference to the flowchart of FIG. 10. The distance/vector comparison unit 505 processes the set focus detection regions one at a time and determines whether each focus detection region is a main subject region or some other region. First, the optical axis vector calculation unit 504 calculates the optical axis vector from the input defocus amount (S1001). It is then determined whether the state one frame earlier was the FIND state or the LOST state (S1002). If it was the FIND state (YES in S1002), the process proceeds to determination step S1003; if it was the LOST state (NO in S1002), the process proceeds to determination step S1006. In determination step S1003, it is determined whether the calculated defocus amount is within a predetermined range. If the defocus amount is within the predetermined range (YES in S1003), the process proceeds to determination step S1004; if not (NO in S1003), the process moves to step S1008.

In determination step S1004, the optical axis vector calculated in step S1001 is compared with the optical axis vector of the main subject stored in the SDRAM 261 in step S916 of FIG. 9. If the direction and magnitude of the two optical axis vectors are determined to match, the process proceeds to step S1005; if the direction or the magnitude is determined not to match, the process moves to step S1008. Here, the optical axis vectors are regarded as matching when their directions are the same and the difference in magnitude between the optical axis vector calculated in step S1001 and the optical axis vector of the main subject stored in the SDRAM 261 is at most a predetermined threshold. The optical axis vector in determination step S1004 represents the velocity, in the optical axis direction, of the subject corresponding to the focus detection region, so while the same subject continues to be tracked, the directions of the optical axis vectors match and the difference in their magnitudes is small.
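
The matching condition of S1004 can be written as a single predicate. The sketch treats the optical axis vector as a signed scalar along the optical axis. How a zero vector is handled is not stated in the source; here a zero vector is assumed to be compatible with either direction. The function name is hypothetical.

```python
def vectors_match(v: float, v_main: float, tolerance: float) -> bool:
    """True when v and v_main point the same way along the optical axis
    and their magnitudes differ by at most `tolerance`.

    Assumption: a zero vector is treated as having no direction, so it
    does not fail the direction test on its own."""
    same_direction = v * v_main >= 0
    close_magnitude = abs(abs(v) - abs(v_main)) <= tolerance
    return same_direction and close_magnitude
```

For example, with a tolerance of 0.02, the vectors -0.05 and -0.04 match, while -0.05 and 0.04 do not, because they point in opposite directions.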

In step S1005, the distance/vector comparison unit 505 regards the focus detection region under determination as a main subject region and sets it as a second subject candidate region. If either of the conditions in determination steps S1003 and S1004 is not satisfied, the distance/vector comparison unit 505 regards the focus detection region under determination as a region of a subject other than the main subject (S1008). As described earlier, the imaging device continuously keeps the main subject in focus, so when the defocus amount is determined in step S1003 to be outside the predetermined range, the region is regarded as not capturing the main subject. Likewise, when the direction or magnitude of the optical axis vectors is determined in step S1004 not to match, the region is regarded as not capturing the main subject.

If, on the other hand, the result of determination step S1002 is the LOST state, determination steps S1006 and S1007 are executed. The contents of determination steps S1006 and S1007 are the same as those of determination steps S1003 and S1004, respectively. If, in determination step S1006, the calculated defocus amount is small and within the predetermined range, the process proceeds to step S1005. If the defocus amount is outside the predetermined range, the process proceeds to determination step S1007, where the direction and magnitude of the optical axis vector in the focus detection region under determination are compared with those of the optical axis vector of the main subject stored in the SDRAM 261. If the direction and magnitude of the optical axis vectors match, the process proceeds to step S1005; if the direction or magnitude does not match, the process proceeds to step S1008. After step S1005 or S1008, the process proceeds to determination step S1009, where it is determined whether processing has finished for all of the set focus detection regions. If it has finished for all focus detection regions, the series of steps ends; if not, the process returns to step S1001 and continues.
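
Taken together, the flow of FIG. 10 requires both conditions after a FIND frame (S1003 and S1004) but accepts either condition after a LOST frame (S1006 or S1007). The sketch below expresses that per region. It assumes the predetermined range is symmetric about zero, treats the optical axis vector as a signed scalar, and uses hypothetical names throughout.

```python
from typing import List, Tuple


def is_main_subject(defocus: float, vector: float, main_vector: float,
                    prev_found: bool, defocus_limit: float,
                    vector_tol: float) -> bool:
    """Classify one focus detection region as main subject or not."""
    # Assumed meaning of "within a predetermined range": |defocus| <= limit.
    in_range = abs(defocus) <= defocus_limit
    # Same direction along the optical axis and similar magnitude.
    vec_match = (vector * main_vector >= 0
                 and abs(abs(vector) - abs(main_vector)) <= vector_tol)
    if prev_found:
        return in_range and vec_match  # S1003 then S1004: both must hold
    return in_range or vec_match       # S1006, else S1007: either suffices


def second_candidates(regions: List[Tuple[float, float]], main_vector: float,
                      prev_found: bool, defocus_limit: float,
                      vector_tol: float) -> List[int]:
    """Return the indices of regions given as (defocus, vector) pairs that
    are treated as second subject candidate regions (S1005)."""
    return [
        idx for idx, (d, v) in enumerate(regions)
        if is_main_subject(d, v, main_vector, prev_found,
                           defocus_limit, vector_tol)
    ]
```

The looser rule after a LOST frame lets the tracker recover a subject that has moved along the optical axis while hidden, provided its optical axis vector still agrees with the stored one.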

次に図１１から図１４を参照して具体例を説明する。図１１、図１２は１フレーム前の状態がFIND状態である場合（図１０の判定処理Ｓ１００２でＹＥＳ）の第２の被写体候補領域の決定方法に関する説明図である。図１１は２匹の類似する犬（被写体）がいる場面を例示する。右側の犬を主被写体１１０１とし、左側の犬を類似被写体１１０２とする。 Next, a specific example will be described with reference to FIGS. 11 to 14. FIGS. 11 and 12 are explanatory diagrams of the method of determining the second subject candidate region when the state one frame before is the FIND state (YES in the determination process S1002 in FIG. 10). FIG. 11 illustrates a scene containing two similar dogs (subjects). The dog on the right is the main subject 1101, and the dog on the left is the similar subject 1102.

図１１（Ａ）は、画面奥から手前に移動する主被写体１１０１に対して、その左側で同じ場所に停留する類似被写体１１０２を撮影するシーンを例示する。主被写体１１０１に対する焦点検出領域１１０３と類似被写体に対する焦点検出領域１１０４をそれぞれ矩形枠で示す。図１１（Ｂ）は、マッチング部５０２から出力される第１の被写体候補領域の中心点１１０５を黒塗りの三角形記号で示す。 FIG. 11A illustrates a scene in which the main subject 1101 moves from the back of the screen toward the front while the similar subject 1102 stays in the same place on its left. The focus detection area 1103 for the main subject 1101 and the focus detection area 1104 for the similar subject are each shown by a rectangular frame. In FIG. 11B, the center point 1105 of the first subject candidate area output from the matching unit 502 is indicated by a black triangle symbol.

図１２（Ａ）は主被写体１１０１、類似被写体１１０２の位置に対するそれぞれの焦点検出領域１１０３、１１０４におけるデフォーカス量の時間変化を例示する。横軸は時間軸であり、縦軸はデフォーカス量を表す。図１２（Ｂ）は、焦点検出領域１１０３、１１０４におけるそれぞれの光軸ベクトルの時間変化を例示する。横軸は時間軸であり、縦軸は光軸ベクトルを表す。光軸の向きとしては、撮像装置から見た場合に奥方向を正方向とし、手前方向を負方向とする。主被写体１１０１については円形の記号で示し、類似被写体１１０２については四角形の記号で示している。 FIG. 12A illustrates a temporal change of the defocus amount in each of the focus detection areas 1103 and 1104 with respect to the positions of the main subject 1101 and the similar subject 1102. The horizontal axis represents the time axis and the vertical axis represents the defocus amount. FIG. 12B exemplifies a temporal change of each optical axis vector in the focus detection areas 1103 and 1104. The horizontal axis represents the time axis and the vertical axis represents the optical axis vector. Regarding the direction of the optical axis, when viewed from the imaging device, the back direction is the positive direction and the front direction is the negative direction. The main subject 1101 is indicated by a circular symbol, and the similar subject 1102 is indicated by a rectangular symbol.

図１２（Ａ）では、同じ場所に停留している類似被写体１１０２に関するデフォーカス量が時間的に変化している。その理由は、フォーカスレンズ２０４が主被写体１１０１に合わせて移動しているためである。図１２（Ａ）の２本の一点鎖線はデフォーカス量の閾値１２０１を示している。デフォーカス量が２本の一点鎖線の間の範囲内にある場合に主被写体と見なすことができるが、類似被写体に関するデフォーカス量が当該範囲内に入る期間をＴ１に示す。期間Ｔ１では、主被写体および類似被写体に関するデフォーカス領域が閾値１２０１内に収まっているので、両者を区別することができない。そこで、図１２（Ｂ）に示す光軸ベクトルの時間傾向、つまり期間Ｔ１での主被写体および類似被写体に関する光軸ベクトルの相違から両者を区別することができる。図１２（Ｂ）のように光軸方向において動かない類似被写体１１０２では、その光軸ベクトル量がゼロ近辺を維持する。これに対して、光軸手前方向に移動する主被写体１１０１では、その光軸ベクトル量がゼロでない値Ｖ１を維持する。そのため、類似被写体と主被写体とを判別することができる。図１１（Ｂ）には、光軸ベクトル量がＶ１を維持する焦点検出領域を抽出することで得られる第２の被写体候補領域１１０６を長方形枠で示している。 In FIG. 12A, the defocus amount for the similar subject 1102, which stays in the same place, changes with time. This is because the focus lens 204 moves to follow the main subject 1101. The two dash-dotted lines in FIG. 12A indicate the defocus amount threshold 1201. A subject whose defocus amount lies between the two dash-dotted lines can be regarded as the main subject; T1 denotes the period in which the defocus amount for the similar subject falls within this range. In the period T1, the defocus amounts for both the main subject and the similar subject are within the threshold 1201, so the two cannot be distinguished by the defocus amount alone. They can instead be distinguished from the temporal tendency of the optical axis vector shown in FIG. 12B, that is, from the difference between the optical axis vectors of the main subject and the similar subject in the period T1. As shown in FIG. 12B, for the similar subject 1102, which does not move in the optical axis direction, the optical axis vector amount stays near zero. In contrast, for the main subject 1101, which moves toward the camera along the optical axis, the optical axis vector amount maintains a non-zero value V1. The similar subject and the main subject can therefore be discriminated. In FIG. 11B, the second subject candidate region 1106, obtained by extracting the focus detection areas in which the optical axis vector amount maintains V1, is shown by a rectangular frame.
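The behaviour in FIG. 12 can be reproduced with a small numerical sketch. The claims state that the position change vector is calculated from the past and current defocus amounts and the drive amount of the focus lens; the simple additive form used below, the sign convention, and all numeric values are assumptions made only for illustration, and `optical_axis_vector` is a hypothetical name.

```python
def optical_axis_vector(defocus_prev, defocus_now, lens_drive):
    """Per-frame subject motion along the optical axis (assumed additive model).

    The change in defocus is measured relative to the focal plane, so the
    lens drive applied between the two frames is added back to obtain the
    subject's own motion. Positive = away from the camera.
    """
    return (defocus_now - defocus_prev) + lens_drive


# The lens follows the main subject, which approaches by 0.5 units per frame.
lens_drive = -0.5

# Main subject: kept in focus, so its defocus amount stays near zero.
main_defocus = [0.0, 0.0, 0.0, 0.0]
# Stationary similar subject: its defocus amount drifts as the lens moves and
# passes through zero, i.e. through the threshold band of period T1.
similar_defocus = [-1.0, -0.5, 0.0, 0.5]

main_vectors = [optical_axis_vector(a, b, lens_drive)
                for a, b in zip(main_defocus, main_defocus[1:])]
similar_vectors = [optical_axis_vector(a, b, lens_drive)
                   for a, b in zip(similar_defocus, similar_defocus[1:])]
```

Under these assumptions the main subject keeps a constant non-zero vector (the role of V1), while the stationary subject stays at zero even while its defocus amount is inside the threshold band.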

以上の処理では、複数の被写体に関するデフォーカス量の時間変化と光軸ベクトルの時間変化に基づき、第１の被写体候補領域と第２の被写体候補領域をすり合わせることで主被写体と類似被写体を正確に区別できる。すなわち、デフォーカス量の時間変化だけでは区別が困難な主被写体と類似被写体を、光軸ベクトルの時間変化に基づいて判別することによって、主被写体の正確な追跡を実現できる。 In the above processing, the main subject and the similar subject can be accurately distinguished by reconciling the first subject candidate region with the second subject candidate region, based on the temporal change of the defocus amount and the temporal change of the optical axis vector for a plurality of subjects. That is, accurate tracking of the main subject can be realized by discriminating the main subject from the similar subject, which are difficult to distinguish by the temporal change of the defocus amount alone, based on the temporal change of the optical axis vector.

続いて図１３、図１４の具体例を用いて、１フレーム前の状態がLOST状態である場合（図１０の判定処理Ｓ１００２でＮＯ）の被写体領域判定について説明する。図１３は、奥から手前に主被写体１３０１が進んできており、類似被写体１３０２が画面の右側から左側へ横切っているシーンを例示する。撮影開始時刻ｔ１を起点として図１３（Ａ）は時刻ｔ２で撮影された画像を示し、図１３（Ｂ）は時刻ｔ３で撮影された画像を示す。図１３（Ｃ）は時刻ｔ４で撮影された画像を示す。「ｔ１＜ｔ２＜ｔ３＜ｔ４」とする。 Next, the subject area determination in the case where the state one frame before is the LOST state (NO in the determination process S1002 in FIG. 10) will be described using the specific examples in FIGS. 13 and 14. FIG. 13 illustrates a scene in which the main subject 1301 advances from the back toward the front while the similar subject 1302 crosses the screen from right to left. With the shooting start time t1 as the origin, FIG. 13A shows an image taken at time t2, FIG. 13B shows an image taken at time t3, and FIG. 13C shows an image taken at time t4, where t1 < t2 < t3 < t4.

図１３（Ａ）では主被写体１３０１の領域１３０３が決定され、FIND状態である。主被写体に対する焦点検出領域１３０４と類似被写体１３０２に対する焦点検出領域１３０５をそれぞれ示す。図１３（Ｂ）では類似被写体１３０２が主被写体１３０１に重なっており、主被写体１３０１が隠れるので、撮影画面上で見えなくなる。このため、LOST状態となる。図１３（Ｃ）では再び主被写体１３０１が出現する。 In FIG. 13A, the area 1303 of the main subject 1301 is determined and is in the FIND state. A focus detection area 1304 for the main subject and a focus detection area 1305 for the similar subject 1302 are shown. In FIG. 13B, the similar subject 1302 overlaps the main subject 1301 and the main subject 1301 is hidden, so that it cannot be seen on the shooting screen. Therefore, the LOST state is set. In FIG. 13C, the main subject 1301 appears again.

図１４（Ａ）は各被写体に対する焦点検出領域１３０４、１３０５におけるデフォーカス量の時間変化を例示する。図１４（Ｂ）は、焦点検出領域１３０４、１３０５におけるそれぞれの光軸ベクトルの時間変化を例示する。各軸の設定は図１２と同じである。期間Ｔ２、Ｔ３、Ｔ４については、図１３（Ａ）から（Ｃ）の各シーンに対応している。つまり、図１３（Ａ）の撮影時刻ｔ２は期間Ｔ２内の時刻であり、図１３（Ｂ）の撮影時刻ｔ３は期間Ｔ３内の時刻である。図１３（Ｃ）の撮影時刻ｔ４は期間Ｔ４内の時刻である。 FIG. 14A illustrates the temporal change of the defocus amount in the focus detection areas 1304 and 1305 for the respective subjects. FIG. 14B illustrates the temporal change of the optical axis vector in each of the focus detection areas 1304 and 1305. The axes are set in the same way as in FIG. 12. The periods T2, T3, and T4 correspond to the scenes in FIGS. 13A to 13C. That is, the shooting time t2 in FIG. 13A is a time within the period T2, the shooting time t3 in FIG. 13B is a time within the period T3, and the shooting time t4 in FIG. 13C is a time within the period T4.

期間Ｔ２においてデフォーカス量の変化と光軸ベクトルの変化は図１１に示したFIND状態のシーンと同様である。しかし、期間Ｔ３では、手前を横切る類似被写体１３０２によって主被写体１３０１が隠れる。このため、主被写体１３０１についてはデフォーカス量と光軸ベクトル量を共に算出できなくなる。また、LOST状態ではフォーカスレンズ２０４の駆動が停止するので、光軸方向に移動していない類似被写体１３０２に関するデフォーカス量は一定値になる。 In the period T2, the changes in the defocus amount and in the optical axis vector are the same as in the FIND-state scene shown in FIG. 11. In the period T3, however, the main subject 1301 is hidden by the similar subject 1302 crossing in front of it. As a result, neither the defocus amount nor the optical axis vector amount can be calculated for the main subject 1301. In addition, because the drive of the focus lens 204 stops in the LOST state, the defocus amount for the similar subject 1302, which is not moving in the optical axis direction, becomes constant.

期間Ｔ４では、主被写体１３０１が再び撮影画面に出現し、デフォーカス量１４０１、光軸ベクトル量１４０２が取得される。つまり図１３（Ｃ）の焦点検出領域１３０６におけるデフォーカス量と光軸ベクトル量が取得され、これらは一度見失った主被写体が再び出現した時に正確に追跡できた場合に存在すると想定される結果である。期間Ｔ４ではフォーカスレンズ２０４が停止していて、主被写体１３０１が手前に進んでくるため、デフォーカス量は絶対量が大きくなっていく。このためデフォーカス量１４０１だけでは、焦点検出領域１３０６を主被写体の領域と見なすことは困難である。一方、図１４（Ｂ）に示す光軸ベクトル量は、期間Ｔ２と期間Ｔ４とで主被写体の光軸ベクトルが向きおよび大きさ（Ｖ２）ともに一致していることがわかる。このように期間Ｔ３、Ｔ４でのLOST状態において、すべての焦点検出領域に対して光軸ベクトル量がＶ２となる領域を探し続けることで、再び第２の被写体候補領域を決定できる。すなわち、主被写体が類似被写体によって隠れてしまうシーンの場合にデフォーカス量の時間変化だけでは特定が困難な主被写体を、光軸ベクトルの時間変化に基づいて識別することができる。第２の被写体候補領域が決定された後の処理については、図１１、図１２で示したFIND状態の場合の処理と同様であるため、その詳細な説明を省略する。なお図１２（Ｂ）、図１４（Ｂ）のグラフは、主被写体が等速運動を行っている状況を示す。これに限らず、光軸ベクトルの時間変化の傾向から近似式を生成して予測することで主被写体の加速度運動への対応も可能である。 In the period T4, the main subject 1301 appears on the shooting screen again, and the defocus amount 1401 and the optical axis vector amount 1402 are acquired. That is, the defocus amount and the optical axis vector amount in the focus detection area 1306 in FIG. 13C are acquired; these are the results expected to be obtained if the main subject, once lost, is tracked correctly when it reappears. In the period T4, the focus lens 204 is stopped while the main subject 1301 keeps approaching, so the absolute value of the defocus amount grows. For this reason, it is difficult to regard the focus detection area 1306 as the area of the main subject from the defocus amount 1401 alone. On the other hand, the optical axis vector amounts shown in FIG. 14B show that the optical axis vector of the main subject has the same direction and magnitude (V2) in the period T2 and in the period T4. Thus, in the LOST state in the periods T3 and T4, the second subject candidate region can be determined again by continuing to search all the focus detection areas for an area whose optical axis vector amount is V2. That is, in a scene in which the main subject is hidden by a similar subject, the main subject, which is difficult to identify from the temporal change of the defocus amount alone, can be identified based on the temporal change of the optical axis vector. The processing after the second subject candidate region is determined is the same as the processing in the FIND state shown in FIGS. 11 and 12, and a detailed description is therefore omitted. Note that the graphs in FIGS. 12B and 14B show a situation in which the main subject moves at a constant velocity. The method is not limited to this case; accelerating motion of the main subject can also be handled by generating an approximate expression from the tendency of the temporal change of the optical axis vector and using it for prediction.
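The closing remark about generating an approximate expression from the trend of the optical axis vector can be illustrated with a least-squares line. This is only a sketch under assumptions: the text does not say which approximation is used, so a first-order fit is swapped in here, and `fit_line`, `predict_vector`, `find_reappeared` and the tolerance `tol` are hypothetical names. Areas where no vector could be calculated (for example, the hidden subject during T3) are represented by `None`.

```python
def fit_line(times, values):
    """Ordinary least-squares fit of values = slope * t + intercept."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = cov / var_t
    return slope, mean_v - slope * mean_t


def predict_vector(times, values, t):
    """Extrapolate the stored optical axis vectors to time t."""
    slope, intercept = fit_line(times, values)
    return slope * t + intercept


def find_reappeared(region_vectors, expected, tol=0.1):
    """Return the index of the area whose vector is closest to `expected`.

    Only areas within `tol` of the expected vector are considered; returns
    None when no area qualifies (the LOST state continues).
    """
    best_index, best_error = None, None
    for index, vector in region_vectors.items():
        if vector is None:  # vector could not be calculated for this area
            continue
        error = abs(vector - expected)
        if error <= tol and (best_error is None or error < best_error):
            best_index, best_error = index, error
    return best_index


# Vectors stored while the subject was tracked (it is speeding up).
history_t = [0, 1, 2, 3]
history_v = [-0.4, -0.5, -0.6, -0.7]
expected = predict_vector(history_t, history_v, t=6)

# Vectors measured in each focus detection area when the subject reappears.
reappeared = find_reappeared({0: 0.0, 1: None, 2: -0.98}, expected)
```

For constant-velocity motion the fitted slope is zero and the prediction reduces to searching for the stored value V2, as in the main description.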

本実施形態によれば、撮像面全体の距離情報と位置変化ベクトルの光軸成分（光軸ベクトル情報）を追跡のための情報に利用することで、追跡精度の向上を実現できる。主被写体と画素パターンや色ヒストグラムが類似している類似被写体に対して主被写体が光軸方向に近づいて来る場合や、光軸方向に移動している主被写体に対して類似被写体が手前を横切る場合でも主被写体の追跡処理を正確に行える。 According to the present embodiment, tracking accuracy can be improved by using the distance information of the entire imaging surface and the optical axis component of the position change vector (optical axis vector information) as information for tracking. The main subject can be tracked accurately even when it approaches, along the optical axis direction, a similar subject whose pixel pattern or color histogram resembles its own, or when a similar subject crosses in front of the main subject while the main subject is moving in the optical axis direction.

［第２実施形態］ [Second Embodiment]
以下、本発明の第２実施形態について説明する。まず、本実施形態と第１実施形態との相違について説明する。第２実施形態では、第１実施形態に対して演算量を削減し、処理効率を上げることを目的とする。第１実施形態では撮像面全体を複数の小領域に分割して焦点検出を行ったが、本実施形態ではマッチング部５０２から出力される第１の被写体候補領域の座標に基づいて焦点検出領域を決定する。この方法を用いることで、第１の被写体候補領域が一か所または狭い範囲に集中しているときには焦点検出領域の数が減り、演算処理量を低減できる。なお、本実施形態における撮像装置の構成は図２の構成と同様であるため、既に使用した符号を用いることで、それらの詳細な説明を省略する。 The second embodiment of the present invention will be described below. First, the difference between the present embodiment and the first embodiment will be described. The second embodiment aims to reduce the amount of calculation and improve the processing efficiency relative to the first embodiment. In the first embodiment, the entire imaging surface is divided into a plurality of small areas for focus detection, whereas in the present embodiment the focus detection area is determined based on the coordinates of the first subject candidate areas output from the matching unit 502. With this method, the number of focus detection areas decreases when the first subject candidate areas are concentrated in one place or in a narrow range, and the amount of calculation can be reduced. Since the configuration of the imaging apparatus in this embodiment is the same as the configuration in FIG. 2, the reference numerals already used are reused and their detailed description is omitted.

図１５は、本実施形態の被写体追跡部２７１の構成例を示すブロック図である。第１実施形態ではマッチング部５０２から出力される複数の評価値と領域情報が被写体領域決定部５０３にのみ入力されていた。本実施形態では複数の評価値と領域情報がカメラ制御部２２５にも入力される。焦点検出用信号処理部２２４は、これらの情報に基づいて焦点検出領域を設定し、算出したデフォーカス情報を光軸ベクトル算出部５０４に出力する。光軸ベクトル算出部５０４以降の処理については第１実施形態と同様であるため、その説明を省略する。 FIG. 15 is a block diagram showing a configuration example of the subject tracking unit 271 of this embodiment. In the first embodiment, the plurality of evaluation values and the area information output from the matching unit 502 are input only to the subject area determination unit 503. In this embodiment, a plurality of evaluation values and area information are also input to the camera control unit 225. The focus detection signal processing unit 224 sets the focus detection region based on these pieces of information, and outputs the calculated defocus information to the optical axis vector calculation unit 504. The processes after the optical axis vector calculation unit 504 are the same as those in the first embodiment, and thus the description thereof will be omitted.

図１６を参照して、マッチング部５０２の出力に基づく焦点検出領域の設定方法を説明する。図１６は２つの被写体を撮影するシーンの画像例を示す。焦点検出用信号処理部２２４は、マッチング部５０２から出力される複数の領域情報に基づき、各領域の中心座標をそれぞれ示す中心点群１６０１、１６０４を算出する。中心点群１６０１は、第１の被写体の画像に対応する複数の領域の中心座標をそれぞれ示し、中心点群１６０４は、第２の被写体の画像に対応する複数の領域の中心座標をそれぞれ示す。中心点群に基づいて焦点検出領域が設定される。具体的には、図１６に示すように予め画面全体に小領域が配置され、それぞれの小領域に対して中心点群１６０１を内包する領域が仮焦点検出領域１６０２（実線の太線枠参照）として設定される。この段階で仮焦点検出領域１６０２とする理由は、被写体の大きさが考慮されていないためである。そこで被写体のそれぞれの中心点が持つ領域情報を考慮して、小領域を含む焦点検出領域１６０３が設定される。例えば、中心点群１６０４の１つは、領域１６０５の領域情報を持っているため、仮焦点検出領域（実線の太線枠参照）の上に位置する２つの小領域１６０６、１６０７の一部が領域１６０５に含まれる。よって、２つの小領域１６０６、１６０７は焦点検出領域１６０３に追加される。このように、より広く設定される焦点検出領域１６０３内に存在する小領域に対して、焦点検出用信号処理部２２４は相関演算を行い、デフォーカス量を算出する。 A method of setting the focus detection area based on the output of the matching unit 502 will be described with reference to FIG. 16. FIG. 16 shows an example image of a scene in which two subjects are photographed. Based on the plurality of pieces of region information output from the matching unit 502, the focus detection signal processing unit 224 calculates center point groups 1601 and 1604, which indicate the center coordinates of the respective regions. The center point group 1601 indicates the center coordinates of the plurality of regions corresponding to the image of the first subject, and the center point group 1604 indicates the center coordinates of the plurality of regions corresponding to the image of the second subject. The focus detection area is set based on these center point groups. Specifically, as shown in FIG. 16, small areas are arranged in advance over the entire screen, and the small areas that contain the center point group 1601 are set as a provisional focus detection area 1602 (see the thick solid-line frame). The area is only provisional at this stage because the size of the subject has not yet been taken into account. The focus detection area 1603 is therefore set by adding further small areas in consideration of the region information held by each center point of the subject. For example, one point of the center point group 1604 holds the region information of the region 1605, and parts of the two small areas 1606 and 1607 located above the provisional focus detection area (see the thick solid-line frame) fall within the region 1605. The two small areas 1606 and 1607 are therefore added to the focus detection area 1603. The focus detection signal processing unit 224 then performs the correlation calculation on the small areas within the focus detection area 1603 set more widely in this way, and calculates the defocus amount.
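The two-stage selection of small areas described above can be sketched as follows. This is an illustrative reading, not the implementation in the patent: candidate regions are assumed to be axis-aligned rectangles `(left, top, width, height)` in pixels, the small areas are assumed to form a uniform square grid of side `cell`, and all function names are hypothetical.

```python
def cell_of(x, y, cell):
    """Grid coordinates (column, row) of the small area containing (x, y)."""
    return (int(x // cell), int(y // cell))


def provisional_cells(candidates, cell):
    """Small areas that contain the center point of some candidate region."""
    return {cell_of(left + width / 2, top + height / 2, cell)
            for left, top, width, height in candidates}


def cells_overlapping(rect, cell):
    """Small areas that are at least partly covered by one candidate region."""
    left, top, width, height = rect
    col0, row0 = cell_of(left, top, cell)
    col1, row1 = cell_of(left + width - 1, top + height - 1, cell)
    return {(c, r)
            for c in range(col0, col1 + 1)
            for r in range(row0, row1 + 1)}


def focus_detection_cells(candidates, cell):
    """Provisional small areas widened by each candidate's region information."""
    cells = provisional_cells(candidates, cell)
    for rect in candidates:
        cells |= cells_overlapping(rect, cell)
    return cells


# One candidate region on a grid of 10-pixel small areas.
candidates = [(12, 5, 10, 10)]
provisional = provisional_cells(candidates, cell=10)
final = focus_detection_cells(candidates, cell=10)
```

Only the small areas in `final` would need the correlation calculation; every other small area on the screen is skipped, which is where the reduction in computation comes from.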

本実施形態では、図１６のように中心点群が密集している場合に、中心点がそれぞれ有する領域情報から焦点検出領域を設定することで、焦点検出領域１６０３の数を必要最小限に抑えることができる。これにより、焦点検出領域１６０３以外の領域については相関演算を行う必要が無いので、演算負荷を低減できる。本実施形態によれば、焦点検出の処理効率を向上させることができる。 In the present embodiment, when the center point groups are densely clustered as in FIG. 16, the number of focus detection areas 1603 can be kept to the necessary minimum by setting the focus detection area from the region information held by each center point. Since the correlation calculation need not be performed for areas other than the focus detection area 1603, the calculation load can be reduced. According to this embodiment, the processing efficiency of focus detection can be improved.

［その他の実施形態］ [Other Embodiments]
本発明は、上述の実施形態の１以上の機能を実現するプログラムを、ネットワーク又は記憶媒体を介してシステム又は装置に供給し、そのシステム又は装置のコンピュータにおける１つ以上のプロセッサーがプログラムを読出し実行する処理でも実現可能である。また、１以上の機能を実現する回路（例えば、ＡＳＩＣ）によっても実現可能である。 The present invention can also be realized by a process in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

２００‥‥‥撮像装置
２０４‥‥‥フォーカスレンズ
２１２‥‥‥フォーカス制御部
２１３‥‥‥レンズ制御部
２２１‥‥‥撮像素子
２２４‥‥‥焦点検出用信号処理部
２２５‥‥‥カメラ制御部
２７１‥‥‥被写体追跡部

200 ... Imaging apparatus
204 ... Focus lens
212 ... Focus control unit
213 ... Lens control unit
221 ... Image sensor
224 ... Focus detection signal processing unit
225 ... Camera control unit
271 ... Subject tracking unit

Claims

A subject tracking device that performs a tracking process of a subject in an image by acquiring image data and distance information related to the image data, comprising:
Acquisition means for acquiring the image data and the distance information;
First detection means for detecting an image of a subject from the image data and outputting information on a plurality of first candidate regions related to the subject region;
Second detection means for outputting information on a second candidate region related to the subject region based on the distance information and the position change of the subject in the depth direction in the image; and
Determining means for acquiring the information on the first and second candidate regions and determining a region narrowed down from the plurality of first candidate regions by the second candidate region as a subject region used for subject tracking,
wherein the first detection means includes:
Subject detecting means for acquiring the image data and detecting a subject; and
Matching means for acquiring the image data and the detection information of the subject detecting means, performing matching processing, and outputting a plurality of evaluation values and region information as the information on the first candidate regions,
and the second detection means includes:
Calculating means for acquiring the distance information and calculating a position change vector indicating a position change of the subject in the depth direction in the image; and
Comparing means for comparing the distance information and the position change vectors relating to the plurality of detected subject regions and outputting the information on the second candidate region.

The subject tracking device according to claim 1, wherein the acquisition means acquires the distance information from data of a plurality of parallax images having parallax and outputs the distance information to the second detection means, and
the comparing means compares the distance information and the position change vector corresponding to the subject region in a past captured image, which are stored in storage means, with the distance information and the position change vector corresponding to the subject region in the current captured image, and determines, as the second candidate region, a subject region for which the difference in the distance information and the difference in the position change vector are each less than a threshold value.

Signal processing means for generating a signal for focus detection from the data of the parallax image and calculating a defocus amount related to the imaging optical system;
A control unit that acquires a defocus amount from the signal processing unit and controls focus adjustment of the imaging optical system,
The subject tracking apparatus according to claim 2 , wherein the second detection unit acquires a defocus amount from the signal processing unit as the distance information.

The signal processing means calculates a defocus amount in a plurality of focus detection areas set in a captured image, and the control means performs drive control of a focus lens of the imaging optical system,
The subject tracking apparatus according to claim 3, wherein the calculation unit calculates the position change vector from the past and current defocus amounts calculated by the signal processing unit and the drive amount of the focus lens.

The subject tracking device according to claim 3 or 4, wherein, in a first state in which the subject is identified, the comparison unit determines, as the second candidate region, a subject region for which the defocus amount is equal to or less than a threshold value and the difference between the position change vector corresponding to the subject region in a past captured image and the position change vector corresponding to the subject region in the current captured image is equal to or less than a threshold value.

The subject tracking device according to claim 3 or 4, wherein, in a second state in which the subject is not identified, the comparison unit determines, as the second candidate region, a subject region for which the defocus amount is equal to or less than a threshold value, or for which the difference between the position change vector corresponding to the subject region in a past captured image and the position change vector corresponding to the subject region in the current captured image is equal to or less than a threshold value.

The subject tracking device according to any one of claims 2 to 6, wherein the comparing means acquires a plurality of the position change vectors stored in the storage means and performs a process of identifying the position change vector corresponding to the current subject region based on the tendency of the position change vectors over time.

The subject tracking device according to claim 2, wherein the comparison means acquires the plurality of evaluation values and area information from the matching means, calculates a reliability evaluation value of subject tracking, and changes the threshold values used for determining the difference in the distance information and the difference in the position change vector to values corresponding to the reliability evaluation value.

The subject tracking device according to claim 4, wherein the drive amount of the focus lens is determined from the defocus amount obtained in the focus detection area in which the centroid of the subject region determined by the determining means is located.

An imaging apparatus comprising the subject tracking device according to any one of claims 1 to 9.

The imaging apparatus according to claim 10, further comprising:
an image sensor having a plurality of microlenses and a plurality of photoelectric conversion units corresponding to each microlens; and
focus adjustment control means that acquires the signals output by the plurality of photoelectric conversion units corresponding to a plurality of focus detection areas set in the captured image, calculates a defocus amount from the displacement amount of the plurality of images, calculates a drive amount of the focus lens from the defocus amount, and performs focus adjustment by controlling the drive of the focus lens.

The focus adjustment control means calculates a defocus amount in the focus detection area set corresponding to the first candidate area among the plurality of focus detection areas, and does not correspond to the first candidate area. The image pickup apparatus according to claim 11 , wherein focus detection calculation processing is not performed on the focus detection area.

A control method executed by a subject tracking device that acquires image data and distance information related to the image data to perform a tracking process of a subject in an image, the method comprising:
a step in which first detection means acquires the image data, detects a subject, performs matching processing based on the image data and detection information of the subject, and outputs a plurality of evaluation values and area information as information on first candidate regions related to the subject region;
a step in which second detection means acquires the distance information, calculates a position change vector indicating a position change of the subject in the depth direction in the image, compares the distance information and the position change vectors relating to the plurality of detected subject regions, and outputs information on a second candidate region related to the subject region; and
a step in which determining means acquires the information on the first and second candidate regions and determines a region narrowed down from the plurality of first candidate regions by the second candidate region as a subject region to be used for subject tracking.

A program that causes a computer of a subject tracking device to execute each step according to claim 13 .