JP7223079B2

JP7223079B2 - IMAGE PROCESSING APPARATUS, CONTROL METHOD THEREOF, AND IMAGING APPARATUS

Info

Publication number: JP7223079B2
Application number: JP2021126039A
Authority: JP
Inventors: 良介辻
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2016-06-21
Filing date: 2021-07-30
Publication date: 2023-02-15
Anticipated expiration: 2037-04-21
Also published as: JP2017229061A; JP2021176243A; JP6924064B2

Description

The present invention relates to an image processing apparatus, a control method thereof, and an imaging apparatus, and more particularly to a technique for tracking a specific region across images.

By searching one or more images captured after time t for a region similar to a region in an image captured at time t, the movement of that region over time can be detected. For example, by detecting the movement of the region of a specific subject (such as a face region) during moving image capture, it becomes possible to keep the specific subject in focus or to change the exposure conditions dynamically so that the specific subject is properly exposed (Patent Document 1).

Patent Document 1: JP 2005-318554 A

A technique called matching is generally used to search for a region similar to a specific image region. In template matching, for example, the pixel pattern of a certain image region is set as a feature amount (template), a similarity (for example, a correlation amount) is calculated at each position while the template is moved relative to the search area of another image, and the position with the highest similarity is detected. If the similarity at the detected position is judged to be sufficiently high, an image region with the same pattern as the template is estimated to exist at that position.

The search accuracy of matching depends greatly on how the feature amount used for matching is set. For example, when tracking the face region of a specific person, setting the pixel pattern of a region that contains only part of the face region as the feature amount makes erroneous detection more likely, because the feature amount carries little of the face. Conversely, setting a pixel pattern that contains the entire face region but also a large proportion of the surrounding region (for example, the background) increases the contribution of background similarity, and erroneous detection again becomes more likely.

The present invention has been made in view of these problems of the prior art, and its object is to provide an image processing apparatus capable of highly accurate region tracking and a control method thereof.

The above object is achieved by an image processing apparatus comprising: extraction means for extracting a feature amount from an image region; and search means for using the feature amount to search a plurality of time-series images for a region similar to the image region. Within the peripheral region of the region similar to the image region, the apparatus considers the areas whose distance information differs from the distance information of the similar region by no more than a predetermined range on both the near side and the far side. When the proportion of the peripheral region occupied by such areas is not equal to or greater than a threshold, the extraction means updates the feature amount that the search means uses for the search; when the proportion is equal to or greater than the threshold, the extraction means does not update it.
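The update rule stated above can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the function name, the use of a flat list of per-pixel distances, and the separate near-side and far-side margins are all assumptions made for the example.

```python
def should_update_feature(subject_distance, peripheral_distances,
                          near_margin, far_margin, ratio_threshold):
    """Decide whether the tracking feature amount should be updated.

    peripheral_distances: distances sampled in the region surrounding the
    tracked region (hypothetical flat list, same unit as subject_distance).
    """
    if not peripheral_distances:
        raise ValueError("peripheral region must contain at least one sample")
    lower = subject_distance - near_margin   # limit on the near side
    upper = subject_distance + far_margin    # limit on the far side
    # Count peripheral samples whose distance is close to the subject's.
    similar = sum(1 for d in peripheral_distances if lower <= d <= upper)
    ratio = similar / len(peripheral_distances)
    # Update only when the proportion is NOT equal to or greater than the threshold.
    return ratio < ratio_threshold
```

With these assumptions, a surrounding region that mostly lies at the same distance as the subject suppresses the update, which matches the intent of avoiding a feature amount contaminated by nearby objects at a similar distance.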

According to the present invention, an image processing apparatus capable of highly accurate region tracking and a control method thereof can be provided.

The accompanying drawings are listed below in figure order, beginning with FIG. 1.
A block diagram showing an example functional configuration of a digital camera according to an embodiment
A diagram showing an example pixel array of the image sensor in FIG. 1
A block diagram showing an example functional configuration of the tracking unit in FIG. 1
A diagram relating to template matching in the first embodiment
A diagram relating to histogram matching in the first embodiment
A diagram relating to a method of acquiring the subject distance in the first embodiment
A diagram schematically showing a method of specifying the subject region in the first embodiment
A flowchart of imaging processing in the first embodiment
A flowchart of subject tracking processing in the first embodiment
A flowchart of imaging processing in the second embodiment
A flowchart of subject tracking processing in the second embodiment
A diagram schematically showing a method of determining whether to update the feature amount in the second embodiment

A digital camera, as an example of an image processing apparatus according to embodiments of the present invention, is described in detail below with reference to the accompanying drawings. However, the present invention can also be implemented in electronic devices that have no image capture function. Electronic devices in which the present invention can be implemented include, but are not limited to, digital cameras, mobile phones, tablet terminals, game machines, personal computers, navigation systems, home appliances, and robots.

● <First embodiment>
(Configuration of imaging device)
FIG. 1 is a block diagram showing an example functional configuration of a digital camera 100 according to the first embodiment of the present invention. The digital camera 100 can capture and record moving images and still images. The functional blocks in the digital camera 100 are communicably connected to one another via a bus 160. The operation of the digital camera 100 is realized by a main control unit 151 (central processing unit) executing a program to control each functional block.

The digital camera 100 of this embodiment can acquire distance information for a captured subject. The distance information may be, for example, a distance image in which each pixel value represents the distance of the corresponding subject. The distance information may be acquired by any method, but in this embodiment it is acquired from parallax images. The method of acquiring the parallax images is likewise not limited, but in this embodiment they are acquired using an image sensor 141 that has a plurality of photoelectric conversion elements sharing one microlens. Alternatively, the digital camera 100 may be a multi-lens camera such as a stereo camera and acquire the parallax images that way, or parallax image data captured by any method may be acquired from a storage medium or an external device.

The digital camera 100 also has a tracking unit 161 that implements a subject tracking function by continuously searching for a region similar to a designated subject region. The tracking unit 161 generates distance information from the parallax images and uses it in the search for the subject region. The configuration and operation of the tracking unit 161 are described in detail later.

The photographing lens 101 (lens unit) has a fixed first-group lens 102, a zoom lens 111, a diaphragm 103, a fixed third-group lens 121, a focus lens 131, a zoom motor 112, a diaphragm motor 104, and a focus motor 132. The fixed first-group lens 102, zoom lens 111, diaphragm 103, fixed third-group lens 121, and focus lens 131 constitute the photographing optical system. For convenience, each of the lenses 102, 111, 121, and 131 is illustrated as a single lens, but each may consist of a plurality of lenses. The photographing lens 101 may also be configured as a detachable interchangeable lens.

A diaphragm control unit 105 controls the operation of the diaphragm motor 104 that drives the diaphragm 103, and changes the aperture diameter of the diaphragm 103.
A zoom control unit 113 controls the operation of the zoom motor 112 that drives the zoom lens 111, and changes the focal length (angle of view) of the photographing lens 101.

A focus control unit 133 calculates the defocus amount and defocus direction of the photographing lens 101 from the phase difference between a pair of focus detection signals (an A image and a B image) obtained from the image sensor 141. The focus control unit 133 then converts the defocus amount and defocus direction into a drive amount and drive direction for the focus motor 132. Based on this drive amount and drive direction, the focus control unit 133 controls the operation of the focus motor 132 and drives the focus lens 131, thereby controlling the focus state of the photographing lens 101. In this way, the focus control unit 133 performs automatic focus detection (AF) by the phase difference detection method. The focus control unit 133 may instead perform contrast detection AF based on a contrast evaluation value derived from the image signal obtained from the image sensor 141.
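As a rough illustration of the first step, the phase difference between the A image and the B image can be estimated by finding the shift that best aligns two one-dimensional signals. The sketch below is an assumption-laden example, not the method of the embodiment: the function name, the integer-pixel search, and the mean-absolute-difference cost are all hypothetical, and the conversion from shift to defocus amount (which depends on the optical system) is not shown.

```python
def estimate_phase_shift(a_signal, b_signal, max_shift):
    """Return the integer shift s that minimises the mean absolute
    difference between a_signal[i] and b_signal[i + s]."""
    best_shift, best_cost = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        total, count = 0.0, 0
        for i, a in enumerate(a_signal):
            j = i + s
            if 0 <= j < len(b_signal):      # compare only overlapping samples
                total += abs(a - b_signal[j])
                count += 1
        if count == 0:
            continue                        # no overlap at this shift
        cost = total / count
        if cost < best_cost:
            best_shift, best_cost = s, cost
    return best_shift
```

In such a sketch the sign of the returned shift would correspond to the defocus direction and its magnitude to the defocus amount, up to a lens-dependent conversion factor.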

The subject image formed on the imaging plane of the image sensor 141 by the photographing lens 101 is converted into an electric signal (image signal) by the photoelectric conversion elements of the pixels arranged in the image sensor 141. In this embodiment, the image sensor 141 has pixels arranged in a matrix of m pixels horizontally and n pixels vertically (n and m are each two or more), and each pixel is provided with two photoelectric conversion elements (photoelectric conversion regions). Signal readout from the image sensor 141 is controlled by a sensor control unit 143 in accordance with instructions from the main control unit 151.

(Pixel array of image sensor 141)
FIG. 2 schematically shows an example pixel arrangement in the image sensor 141, using a representative area of 16 pixels (4 pixels horizontally by 4 pixels vertically). Each pixel of the image sensor 141 is provided with one microlens 210 and two photoelectric conversion elements 201 and 202 that receive light through the microlens 210. In the example of FIG. 2, the two photoelectric conversion elements 201 and 202 are arranged horizontally, so each pixel has the function of dividing the pupil region of the photographing lens 101 in the horizontal direction.

The image sensor 141 is also provided with color filters in a primary-color Bayer arrangement whose repeating unit is four pixels (2 pixels horizontally by 2 pixels vertically). The color filters are arranged so that rows in which R (red) and G (green) alternate horizontally are interleaved with rows in which G and B (blue) alternate horizontally. A pixel 200R provided with an R (red) filter is called a red pixel, a pixel 200G provided with a G (green) filter is called a green pixel, and a pixel 200B provided with a B (blue) filter is called a blue pixel.
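The Bayer arrangement described above can be expressed as a small lookup. The sketch assumes, purely for illustration, that row 0 is an R/G row starting with R and that row 1 is a G/B row starting with G; the text does not state which row or color comes first, and `bayer_color` is a hypothetical name.

```python
def bayer_color(x, y):
    """Return the filter color at column x, row y of a Bayer array.

    Assumed phase: even rows are R, G, R, G, ... and odd rows are
    G, B, G, B, ... (the 2x2 repeating unit is [[R, G], [G, B]]).
    """
    if y % 2 == 0:
        return "R" if x % 2 == 0 else "G"
    return "G" if x % 2 == 0 else "B"
```

Because the unit repeats every two pixels in each direction, only the parity of the coordinates matters.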

In the following description, the first photoelectric conversion element 201 may be called the A pixel, the second photoelectric conversion element 202 the B pixel, the signal read from the A pixel the A signal, and the signal read from the B pixel the B signal. The image formed from the A signals of the pixels in a given area and the image formed from the B signals of those pixels constitute one pair of parallax images. The digital camera 100 can therefore generate two parallax images from a single capture. Adding the A signal and the B signal of each pixel yields a signal equivalent to that of an ordinary pixel without the pupil division function. Below, this summed signal may be called the A+B signal, and the image formed from A+B signals may be called the captured image.

In this way, three kinds of signals can be read from one pixel: the output of the first photoelectric conversion element 201 (A signal), the output of the second photoelectric conversion element 202 (B signal), and the summed output of the first photoelectric conversion element 201 and the second photoelectric conversion element 202 (A+B signal). Instead of being read out, the A signal (or B signal) may be obtained by subtracting the B signal (or A signal) from the A+B signal.
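The relationship between the three signals is simple per-pixel arithmetic, sketched below. The function names and the use of flat lists of pixel values are assumptions for the example.

```python
def captured_signal(a_signal, b_signal):
    """Form the A+B signal by adding the A and B signals pixel by pixel."""
    if len(a_signal) != len(b_signal):
        raise ValueError("A and B signals must have the same length")
    return [a + b for a, b in zip(a_signal, b_signal)]


def recover_a_signal(a_plus_b, b_signal):
    """Obtain the A signal as (A+B) - B instead of reading it out."""
    if len(a_plus_b) != len(b_signal):
        raise ValueError("A+B and B signals must have the same length")
    return [ab - b for ab, b in zip(a_plus_b, b_signal)]
```

Reading only two of the three signals and deriving the third in this way reduces the number of readouts per pixel.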

The photoelectric conversion elements may instead be divided in the vertical direction, and pixels with different division directions may be mixed. The photoelectric conversion elements may also be divided in both the vertical and horizontal directions, or divided into three or more parts in the same direction.

Returning to FIG. 1, the image signal read from the image sensor 141 is supplied to a signal processing unit 142. The signal processing unit 142 applies signal processing such as noise reduction, A/D conversion, and automatic gain control to the image signal and outputs the result to the sensor control unit 143. The sensor control unit 143 stores the image signal received from the signal processing unit 142 in a RAM (random access memory) 154.

An image processing unit 152 applies predetermined image processing to the image data stored in the RAM 154. The image processing applied by the image processing unit 152 includes, but is not limited to, so-called development processing such as white balance adjustment, color interpolation (demosaicing), and gamma correction, as well as signal format conversion, scaling, subject detection, and subject recognition. The image processing unit 152 can also generate information such as subject luminance for use in automatic exposure control (AE). The results of subject detection and subject recognition may be used in other image processing (for example, white balance adjustment). When contrast detection AF is performed, the image processing unit 152 may generate the AF evaluation value. The image processing unit 152 stores the processed image data in the RAM 154.

When the image data stored in the RAM 154 is to be recorded, the main control unit 151 generates a data file in the appropriate recording format, for example by adding a predetermined header to the processed image data. In doing so, the main control unit 151 has a compression/decompression unit 153 encode the image data as necessary to reduce the amount of information. The main control unit 151 records the generated data file on a recording medium 157 such as a memory card.

When the image data stored in the RAM 154 is to be displayed, the main control unit 151 has the image processing unit 152 scale the image data to fit the display size of a display unit 150, and then writes it to the area of the RAM 154 used as video memory (VRAM area).
The display unit 150 reads the image data for display from the VRAM area of the RAM 154 and displays it on a display device such as an LCD or an organic EL display.

During moving image capture (in the shooting standby state or while recording a moving image), the digital camera 100 of this embodiment displays the captured moving image on the display unit 150 immediately, so that the display unit 150 functions as an electronic viewfinder (EVF). The moving image displayed when the display unit 150 functions as an EVF, and its frame images, are called live view images or through images.
When a still image has been captured, the digital camera 100 displays the most recently captured still image on the display unit 150 for a fixed time so that the user can check the result. These display operations are also realized under the control of the main control unit 151.

An operation unit 156 consists of switches, buttons, keys, a touch panel, and the like with which the user inputs instructions to the digital camera 100. Input through the operation unit 156 is detected by the main control unit 151 via the bus 160, and the main control unit 151 controls each unit to carry out the operation corresponding to the input.

The main control unit 151 has one or more programmable processors such as a CPU or MPU. It controls each unit and realizes the functions of the digital camera 100 by, for example, loading a program stored in a storage unit 155 into the RAM 154 and executing it. The main control unit 151 also executes AE processing, which automatically determines the exposure conditions (shutter speed or accumulation time, aperture value, and sensitivity) based on subject luminance information. The subject luminance information can be obtained from the image processing unit 152, for example. The main control unit 151 can also determine the exposure conditions with reference to the region of a specific subject, such as a person's face.

During moving image capture, the main control unit 151 keeps the aperture fixed and controls exposure through the electronic shutter speed (accumulation time) and the gain. The main control unit 151 notifies the sensor control unit 143 of the determined accumulation time and gain. The sensor control unit 143 controls the operation of the image sensor 141 so that capture is performed according to the notified exposure conditions.

In this embodiment, a single capture yields three images in total: one pair of parallax images and the captured image. The image processing unit 152 processes each image and writes it to the RAM 154. The tracking unit 161 obtains subject distance information from the pair of parallax images and uses it in subject tracking processing performed on the captured image. When subject tracking succeeds, the tracking unit 161 outputs information on the position of the subject region in the captured image and information on the reliability of the result.

The result of subject tracking can be used, for example, to set the focus detection area automatically, which makes a tracking AF function for a specific subject region possible. AE processing can also be performed based on the luminance information of the focus detection area, and image processing (for example, gamma correction or white balance adjustment) can be performed based on the pixel values of the focus detection area. The main control unit 151 may superimpose an indicator of the current position of the subject region (for example, a rectangular frame surrounding the region) on the displayed image.

A battery 159 is managed by a power management unit 158 and supplies power to the entire digital camera 100.
The storage unit 155 stores the programs executed by the main control unit 151, setting values needed to execute the programs, GUI data, user setting values, and the like. For example, when a transition from the power-off state to the power-on state is instructed through the operation unit 156, a program stored in the storage unit 155 is loaded into part of the RAM 154 and executed by the main control unit 151.

(Configuration and operation of tracking unit)
FIG. 3 is a block diagram showing an example functional configuration of the tracking unit 161. The tracking unit 161 has a matching unit 1610, a feature extraction unit 1620, and a distance map generation unit 1630. The tracking unit 161 identifies the image region to be tracked (subject region) from a designated position and extracts a feature amount from the subject region. Then, in each captured image supplied to it, the tracking unit 161 uses the extracted feature amount to search for a region with high similarity to the subject region of the previous frame and treats it as the subject region. The tracking unit 161 also acquires distance information from the pair of parallax images and uses it to identify the subject region.

The matching unit 1610 searches the supplied image for the subject region using the feature amount of the subject region supplied from the feature extraction unit 1620. The method of searching for a region based on image feature amounts is not particularly limited, but the matching unit 1610 uses at least one of template matching and histogram matching.

Template matching and histogram matching are described below.
Template matching is a technique in which a pixel pattern is set as a template and the image is searched for the region with the highest similarity to the template. A correlation amount such as the sum of absolute differences between corresponding pixels can be used as the similarity between the template and an image region.

FIG. 4(a) schematically shows a template 301 and an example of its configuration 302. When template matching is performed, the feature extraction unit 1620 supplies the color (hue) information to be used for the template to the matching unit 1610 as the feature amount. Here, the template 301 is W pixels wide and H pixels high, and it has been binarized by replacing pixels that match the feature amount and pixels that do not with two different fixed values. The matching unit 1610 performs pattern matching using the binarized template 301.
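The binarization step can be sketched as follows. The example assumes that each pixel has already been reduced to a quantized hue index and that the feature amount is a set of such indices; `binarize`, `match_value`, and `other_value` are hypothetical names, and 1 and 0 are arbitrary choices for the two fixed values.

```python
def binarize(hue_image, feature_hues, match_value=1, other_value=0):
    """Replace each pixel with one of two fixed values depending on
    whether its hue index belongs to the feature amount.

    hue_image: list of rows, each a list of quantized hue indices.
    feature_hues: iterable of hue indices supplied as the feature amount.
    """
    feature = set(feature_hues)
    return [[match_value if hue in feature else other_value for hue in row]
            for row in hue_image]
```

The same function would apply unchanged to the search area, since it is binarized in the same way as the template.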

Accordingly, when coordinates within the template 301 are expressed in the coordinate system shown in FIG. 4(a), the feature amount T(i, j) of the template 301 used for pattern matching can be written as equation (1).
T(i, j) = {T(0, 0), T(1, 0), ..., T(W-1, H-1)} (1)

FIG. 4(b) shows an example of a search area 303 for the subject region and its configuration 305. The search area 303 is the range of the image in which pattern matching is performed, and may be all or part of the image. Coordinates within the search area 303 are written (x, y). The search area 303 has also been binarized by replacing pixels that match the feature amount and pixels that do not with two different fixed values. A region 304 has the same size as the template 301 (W pixels wide and H pixels high) and is the region whose similarity to the template 301 is calculated.

When coordinates within the region 304 are expressed in the coordinate system shown in FIG. 4(b), the feature amount S(i, j) of the region 304 used for pattern matching can be written as equation (2).
S(i, j) = {S(0, 0), S(1, 0), ..., S(W-1, H-1)} (2)

The matching unit 1610 calculates the sum of absolute differences (SAD) given by equation (3) as an evaluation value V(x, y) representing the similarity between the template 301 and the region 304.
V(x, y) = \sum_{j=0}^{H-1} \sum_{i=0}^{W-1} | T(i, j) - S(i, j) | (3)
Here, V(x, y) is the evaluation value when the upper-left vertex of the region 304 is at coordinates (x, y).

The matching unit 1610 calculates the evaluation value V(x, y) at each position while shifting the region 304 from the upper left of the search area 303 one pixel at a time to the right and, on reaching x = (X-1)-(W-1), returning to x = 0 and shifting one pixel down. The coordinates (x, y) at which the calculated evaluation value V(x, y) is smallest give the position of the region 304 whose pixel pattern is most similar to the template 301. The matching unit 1610 detects the region 304 with the minimum evaluation value V(x, y) as the subject region within the search area. When the reliability of the search result is low (for example, when the minimum evaluation value V(x, y) exceeds a threshold), it may be determined that the subject region was not found.
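The raster scan and minimum search described above can be sketched in a few lines. This is an illustrative example only: images are assumed to be lists of rows of numbers (for instance the binarized values), and `sad`, `find_best_match`, and the optional `max_sad` reliability threshold are hypothetical names.

```python
def sad(template, search, x, y):
    """Sum of absolute differences between the template and the
    same-sized region of `search` whose upper-left corner is (x, y)."""
    return sum(abs(template[j][i] - search[y + j][x + i])
               for j in range(len(template))
               for i in range(len(template[0])))


def find_best_match(template, search, max_sad=None):
    """Scan the search area left to right, top to bottom, and return
    (x, y, value) for the position with the smallest SAD, or None when
    the best value exceeds max_sad (result judged unreliable)."""
    h, w = len(template), len(template[0])
    best = None
    for y in range(len(search) - h + 1):
        for x in range(len(search[0]) - w + 1):
            value = sad(template, search, x, y)
            if best is None or value < best[2]:
                best = (x, y, value)
    if best is None or (max_sad is not None and best[2] > max_sad):
        return None
    return best
```

A smaller SAD means higher similarity, so the search keeps the minimum, and the optional threshold corresponds to the case in which the subject region is treated as not found.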

The example here uses, for pattern matching, a template binarized according to whether each pixel is one of the colors in the feature amount, but a multi-valued template, with a separate value for each of the colors in the feature amount, may be used instead. A feature amount based on lightness or saturation may also be used in place of the color feature amount. Although SAD is used here as the similarity evaluation value, other evaluation values such as normalized cross-correlation (NCC) or ZNCC may be used.

Next, the details of histogram matching will be described.
FIG. 5(a) shows an example of a subject region 401 and its histogram 402. When histogram matching is performed, the feature extraction unit 1620 supplies information on the colors (hues) to be used for the color histogram to the matching unit 1610 as a feature amount. Assuming that the number of bins in the color histogram is M (M is an integer equal to or greater than 2), the color histogram p(m) 402 generated by the matching unit 1620 can be expressed by Equation (4) below.
p(m) = {p(0), p(1), ..., p(M-1)} (4)
Note that p(m) is a normalized histogram. This color histogram p(m) has only bins corresponding to the colors included in the feature amount. That is, if the number of bins is M, the number of colors supplied as the feature amount is also M.

FIG. 5(b) shows an example of a search region 403 for the subject region and a color histogram 405. Assuming that the number of bins is M, the color histogram q(m) 405 of the region 404 is expressed by Equation (5) below.
q(m) = {q(0), q(1), ..., q(M-1)} (5)
Note that q(m) is a normalized histogram. This color histogram q(m) also has only bins corresponding to the colors included in the feature amount.

The tracking unit 161 can calculate the Bhattacharyya coefficient shown in Equation (6) below as an evaluation value D(x, y) of the similarity between the color histogram p(m) of the subject region 401 and the color histogram q(m) of the region 404.

D(x, y) = Σ_{m=0}^{M-1} √(p(m) q(m)) (6)

Here, D(x, y) represents the evaluation value at the coordinates (x, y) of the upper left vertex of the region 404.

As in template matching, the matching unit 1610 calculates the evaluation value D(x, y) while shifting the region 404 within the search region 403. The coordinates (x, y) at which the calculated evaluation value D(x, y) is largest indicate the position of the region 404 that is most similar to the subject region 401. The matching unit 1610 detects the region 404 with the maximum evaluation value D(x, y) as the subject region existing within the search region.

Although an example of using a color feature amount for histogram matching has been shown here, a feature amount based on another color attribute, such as saturation, may be used instead. Also, although the Bhattacharyya coefficient has been used as the similarity evaluation value, other evaluation values, such as histogram intersection, may be used.
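The histogram matching described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: pixels are represented by hue-bin indices, the helper names (`normalized_histogram`, `bhattacharyya`, `match_histogram`) are hypothetical, and pixels whose hue is not among the feature colors are simply ignored before normalization, which the text does not specify.

```python
import math
from typing import List, Sequence, Tuple

HueImage = List[List[int]]  # hue-bin index per pixel, row-major


def normalized_histogram(hues: Sequence[int],
                         feature_hues: Sequence[int]) -> List[float]:
    """Histogram with one bin per feature color, normalized to sum to 1.
    Pixels whose hue is not a feature color are ignored (assumption)."""
    index = {hue: m for m, hue in enumerate(feature_hues)}
    counts = [0] * len(feature_hues)
    for hue in hues:
        if hue in index:
            counts[index[hue]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def bhattacharyya(p: Sequence[float], q: Sequence[float]) -> float:
    """Bhattacharyya coefficient: sum over bins of sqrt(p(m) * q(m))."""
    return sum(math.sqrt(pm * qm) for pm, qm in zip(p, q))


def match_histogram(p: Sequence[float], search: HueImage, w: int, h: int,
                    feature_hues: Sequence[int]) -> Tuple[int, int, float]:
    """Slide a w-by-h window over `search`; return (x, y, D) at the maximum D."""
    best = (0, 0, -1.0)
    for y in range(len(search) - h + 1):
        for x in range(len(search[0]) - w + 1):
            window = [search[y + j][x + i] for j in range(h) for i in range(w)]
            d = bhattacharyya(p, normalized_histogram(window, feature_hues))
            if d > best[2]:
                best = (x, y, d)
    return best


feature_hues = [0, 1]                                   # M = 2 feature colors
p = normalized_histogram([0, 1, 1, 1], feature_hues)    # [0.25, 0.75]
search = [[0, 0, 0, 0],
          [0, 0, 1, 0],
          [0, 1, 1, 0]]
x, y, d = match_histogram(p, search, 2, 2, feature_hues)
print(x, y, round(d, 3))  # → 1 1 1.0
```

Unlike SAD, the Bhattacharyya coefficient is a similarity (1.0 for identical normalized histograms), so the search keeps the maximum rather than the minimum.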

The distance map generation unit 1630 calculates subject distances from a set of parallax images and generates a distance map. A distance map is one form of distance information in which each pixel represents a subject distance; it is also called a depth map, a depth image, or a range image. Note that the distance map may be generated without using parallax images. For example, the subject distance for each pixel may be obtained by finding, for each pixel, the position of the focus lens 131 at which the contrast evaluation value reaches its maximum, and a distance image may be generated from those distances.

A method of calculating the subject distance will be described with reference to FIG. 6. In FIG. 6, assuming that an A image 1151a and a B image 1151b have been obtained, it can be seen from the focal length of the photographing lens 101 and the distance information between the focus lens 131 and the image sensor 141 that the light flux is refracted as indicated by the solid lines. Therefore, the in-focus subject is found to be at the position 1152a. Similarly, when a B image 1151c is obtained for the A image 1151a, the in-focus subject is at the position 1152b, and when a B image 1151d is obtained, the in-focus subject is at the position 1152c. As described above, for each pixel, the distance information of the subject at that pixel position can be calculated from the relative position of the A image including that pixel and the corresponding B image.

For example, assume that the A image 1151a and the B image 1151d have been obtained in FIG. 6. In this case, the distance 1153 from the pixel 1154 at the midpoint, corresponding to half the image shift amount, to the subject position 1152c, or the defocus amount corresponding to the distance 1153, is stored as the pixel value of the pixel 1154. In this way, the distance information of the subject can be calculated for each pixel, and a distance map can be generated.

Note that the distance map may also be generated by dividing the image into small regions and calculating the defocus amount for each small region. In that case, an A image and a B image are generated from the pixels included in the small region, their phase difference (image shift amount) is detected by a correlation calculation, and the phase difference is converted into a defocus amount. In this case as well, each pixel of the generated distance map indicates a subject distance, but all pixels within the same small region indicate the same subject distance. The distance map generation unit 1630 supplies the generated distance map to the feature extraction unit 1620.

Note that the distance map may be generated for the entire image, or only for a partial region designated for feature amount extraction.

The feature extraction unit 1620 extracts, from the subject region, a feature amount used for tracking (searching for) the subject region.
When subject tracking is performed, the user is generally asked, before tracking starts, to designate the position in the image that is to be tracked. For example, in the shooting standby state, the user can designate a position within the image displayed on the display unit 150 through the operation unit 156. For example, the main control unit 151 acquires the coordinates of a tap operation if the display unit 150 is a touch display, or the coordinates of a position designated by a cursor that can be moved over the image by operating the operation unit 156. Information on the designated position is input from the main control unit 151 to the feature extraction unit 1620.

A method by which the feature extraction unit 1620 identifies the subject region from which the feature amount is extracted will be described with reference to FIG. 7. FIG. 7(a) shows a captured image, and the designated position 503 indicates coordinates within a person's face 501. It is also assumed that a house 502 in the background has color information similar to that of the person's face 501.

The feature extraction unit 1620 takes a predetermined region including the designated position 503, for example a predetermined rectangular region centered on the designated position 503, as a provisional subject region, and generates a color histogram H_in for that region. The feature extraction unit 1620 also takes all regions other than the provisional subject region as a reference region and generates a color histogram H_out for the reference region. A color histogram represents the frequency of the colors included in an image. Here, as an example, pixel values are converted from the RGB color space to the HSV color space, and a color histogram is generated for the hue (H). However, other types of color histogram may be generated.

Then, the feature extraction unit 1620 calculates the information amount I(a) represented by Equation (7) below.
I(a) = -log₂(H_in(a) / H_out(a)) (7)
Here, a is an integer indicating the bin number. The absolute value of the information amount I(a) becomes smaller as the ratio of the number of pixels of the color corresponding to that bin in the provisional subject region to the number of pixels of that color in the reference region becomes larger. That is, the smaller the value of the information amount I(a), the more the corresponding color is concentrated in the provisional subject region rather than in the reference region, and the more likely it is to be a characteristic color of the provisional subject region. The feature extraction unit 1620 calculates the information amount I(a) for all bins.

The feature extraction unit 1620 replaces each calculated information amount I(a) with a value within a specific range (for example, the 8-bit range of 0 to 255). In doing so, the feature extraction unit 1620 assigns a larger value the smaller the information amount I(a) is. Then, the feature extraction unit 1620 replaces the value of each pixel in the captured image with the substituted value of the information amount I(a) corresponding to the color of that pixel.

Through such processing, the feature extraction unit 1620 generates a subject map based on color information. FIG. 7(b) shows an example of the subject map, in which pixels closer to white are more likely to be subject pixels and pixels closer to black are less likely to be subject pixels. For convenience, FIG. 7(b) shows the subject map as a binary image, but it is actually a multi-tone image. Because part of the house 502 in the background of the captured image has a color similar to that of the person's face 501, the subject map based on color information does not sufficiently isolate the person's face 501. The rectangular region 504 shown in FIG. 7(c) is an example of a subject region finally set (updated) based on, for example, the region of the subject map whose pixel values are equal to or greater than a predetermined threshold.
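The construction of the color-based subject map can be sketched as follows. This is a hedged illustration only: the function name `color_subject_map`, the small constant `eps` (which guards against empty bins), and the linear rescaling to 0 to 255 are assumptions; the text states only that smaller values of I(a) are mapped to larger values within a fixed range. Pixels are given as hue-bin indices.

```python
import math
from typing import List, Tuple

BinImage = List[List[int]]          # hue-bin index per pixel, row-major
Rect = Tuple[int, int, int, int]    # (x, y, width, height)


def color_subject_map(bins: BinImage, rect: Rect, num_bins: int,
                      eps: float = 1e-6) -> List[List[int]]:
    """Build a 0..255 subject map from the information amount I(a)."""
    x0, y0, w, h = rect
    h_in = [0] * num_bins    # histogram of the provisional subject region
    h_out = [0] * num_bins   # histogram of the reference region
    for y, row in enumerate(bins):
        for x, a in enumerate(row):
            inside = x0 <= x < x0 + w and y0 <= y < y0 + h
            (h_in if inside else h_out)[a] += 1
    n_in, n_out = max(sum(h_in), 1), max(sum(h_out), 1)

    # I(a) = -log2(H_in(a) / H_out(a)), here on normalized frequencies;
    # eps (an assumption) avoids log(0) and division by zero.
    info = [
        -math.log2((h_in[a] / n_in + eps) / (h_out[a] / n_out + eps))
        for a in range(num_bins)
    ]

    # Smaller I(a) -> larger map value (linear rescaling is an assumption).
    lo, hi = min(info), max(info)
    span = (hi - lo) or 1.0
    lut = [round(255 * (hi - i) / span) for i in info]
    return [[lut[a] for a in row] for row in bins]


bins = [[0, 0, 0, 2],
        [0, 1, 1, 2],
        [0, 1, 1, 2],
        [0, 0, 0, 0]]
subject_map = color_subject_map(bins, rect=(1, 1, 2, 2), num_bins=3)
print(subject_map[1][1], subject_map[0][0])  # → 255 0
```

Bin 1 occurs only inside the provisional subject region and therefore receives the highest value, while bin 0 occurs only in the reference region and receives the lowest.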

If a feature amount extracted from such a subject region is used, the person's face 501 is less likely to be tracked accurately. Therefore, in this embodiment, the distance map generated by the distance map generation unit 1630 is used to improve the accuracy of the subject region set based on color information. FIG. 7(d) shows an example in which the distance map generated for the captured image of FIG. 7(a) has been converted so that, with the subject distance at the designated position 503 as a reference, pixels are shown whiter the smaller their difference in subject distance and blacker the larger the difference. For convenience, FIG. 7(d) shows the distance map as a binary image, but it is actually a multi-tone image.

The feature extraction unit 1620 generates a subject map that takes distance information into account by, for example, multiplying the values of corresponding pixels of the distance map and the subject map based on color information. FIG. 7(e) shows an example of a subject map that takes distance information into account (that is, one based on both color information and distance information). In the subject map shown in FIG. 7(e), the person's face 501 and the house 502 in the background are accurately distinguished. The rectangular region 505 shown in FIG. 7(f) is an example of a subject region set based on, for example, the region of the subject map shown in FIG. 7(e) whose pixel values are equal to or greater than a predetermined threshold. The rectangular region 505 circumscribes the person's face 501 and contains very few background pixels. Using a feature amount extracted from such a subject region increases the likelihood that the person's face 501 can be tracked accurately.
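The effect of multiplying the two maps can be illustrated with a small sketch. The function names, the linear fall-off controlled by `max_diff`, and the rescaling of the product back to 0 to 255 are assumptions made for illustration; the text specifies only a pixel-wise multiplication followed by thresholding.

```python
from typing import List, Optional, Tuple

Map = List[List[int]]


def distance_weight_map(distance: List[List[float]], ref: float,
                        max_diff: float) -> Map:
    """255 where the distance equals `ref`, falling linearly to 0 at
    `max_diff` (the fall-off shape is an assumption)."""
    return [[round(255 * max(0.0, 1.0 - abs(d - ref) / max_diff)) for d in row]
            for row in distance]


def combine_maps(color_map: Map, dist_map: Map) -> Map:
    """Pixel-wise product of the two maps, rescaled to 0..255."""
    return [[c * d // 255 for c, d in zip(c_row, d_row)]
            for c_row, d_row in zip(color_map, dist_map)]


def bounding_rect(subject_map: Map,
                  threshold: int) -> Optional[Tuple[int, int, int, int]]:
    """Rectangle (x, y, w, h) circumscribing all pixels >= threshold."""
    xs = [x for row in subject_map for x, v in enumerate(row) if v >= threshold]
    ys = [y for y, row in enumerate(subject_map) for v in row if v >= threshold]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


# Columns 0-1: face; columns 3-4: background with a similar color.
color_map = [[255, 255, 0, 255, 255],
             [255, 255, 0, 255, 255]]
distance = [[2.0, 2.0, 9.0, 9.0, 9.0],
            [2.0, 2.0, 9.0, 9.0, 9.0]]
dist_map = distance_weight_map(distance, ref=distance[0][0], max_diff=1.0)
combined = combine_maps(color_map, dist_map)
print(bounding_rect(color_map, 128))  # → (0, 0, 5, 2)
print(bounding_rect(combined, 128))   # → (0, 0, 2, 2)
```

With color alone, the rectangle spreads over the similarly colored background; after weighting by distance, it shrinks to the nearer subject.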

In this way, by referring to distance information in addition to the color information of a predetermined range including the designated position, a more accurate subject region can be set, and a feature amount suitable for accurate tracking can be extracted.

Note that, at the time the position of the tracking target is designated, valid distance information (that is, distance information reliable enough to be referred to) may not have been obtained for the designated position and its neighboring region. For example, the distance map may be generated only for a specific region (for example, the focus detection area) and the designated position may lie outside that region, or the designated position may be out of focus, making the distance information unreliable.

Therefore, if distance information reliable enough to be referred to has been obtained for the vicinity of the designated position (the provisional subject region), the feature extraction unit 1620 sets the subject region by referring to the distance information in addition to the color information. On the other hand, if such distance information has not been obtained for the vicinity of the designated position (the provisional subject region), the feature extraction unit 1620 sets the subject region based on the color information without referring to distance information. Distance information reliable enough to be referred to may be, for example, distance information obtained while the provisional subject region is in focus or nearly in focus (that is, while the defocus amount is equal to or less than a predetermined threshold), but it is not limited to this.

(Processing flow of the imaging apparatus)
A moving image shooting operation accompanied by subject tracking processing, performed by the digital camera 100 of the present embodiment, will be described with reference to the flowcharts of FIGS. 8 and 9. The moving image shooting operation is executed during shooting standby and during moving image recording. Details such as the resolution of the images (frames) handled differ between shooting standby and moving image recording, but the processing related to subject tracking is basically the same, so the following description does not distinguish between the two.

In S801, the main control unit 151 determines whether the power of the digital camera 100 is ON. If it is not determined to be ON, the processing ends; if it is determined to be ON, the processing advances to S802.
In S802, the main control unit 151 controls each unit to execute imaging processing for one frame and advances the processing to S803. Here, a set of parallax images and a captured image for one screen are generated and stored in the RAM 154.

In S803, the main control unit 151 causes the tracking unit 161 to execute subject tracking processing. Details of the processing will be described later. Through the subject tracking processing, the tracking unit 161 notifies the main control unit 151 of the position and size of the subject region. The main control unit 151 sets the focus detection area based on the notified subject region.

In S804, the main control unit 151 causes the focus control unit 133 to execute focus detection processing. From the pixels of the pair of parallax images that are included in the focus detection area, the focus control unit 133 generates an A image by joining the plurality of A signals obtained from a plurality of pixels arranged in the same row, and a B image by joining the corresponding plurality of B signals. Then, the focus control unit 133 calculates the correlation amount between the A image and the B image while shifting their relative positions, and obtains the relative position at which the similarity between the A image and the B image is highest as the phase difference (shift amount) between the A image and the B image. Furthermore, the focus control unit 133 converts the phase difference into a defocus amount and a defocus direction.
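The shift search in S804 can be sketched as a one-dimensional correlation. This is a simplified illustration: the mean absolute difference stands in for the correlation amount, and both the conversion coefficient `k` and the sign convention for the defocus direction are hypothetical, since the text does not give the conversion.

```python
from typing import List, Tuple


def image_shift(a: List[int], b: List[int], max_shift: int) -> int:
    """Return the shift s (in pixels) that best aligns a[i] with b[i + s],
    using the mean absolute difference over the overlapping samples."""
    best_s, best_cost = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        pairs = [(a[i], b[i + s]) for i in range(len(a)) if 0 <= i + s < len(b)]
        if not pairs:
            continue
        cost = sum(abs(p - q) for p, q in pairs) / len(pairs)
        if cost < best_cost:
            best_s, best_cost = s, cost
    return best_s


def to_defocus(shift_px: int, k: float) -> Tuple[float, int]:
    """Convert a phase difference to (defocus amount, direction sign).
    `k` is a hypothetical conversion coefficient; the sign convention
    (+1 / -1 / 0) is likewise an assumption."""
    direction = (shift_px > 0) - (shift_px < 0)
    return abs(shift_px) * k, direction


a_image = [0, 0, 5, 9, 5, 0, 0, 0]
b_image = [0, 0, 0, 0, 5, 9, 5, 0]   # same profile, displaced by 2 pixels
shift = image_shift(a_image, b_image, max_shift=3)
print(shift, to_defocus(shift, k=0.5))  # → 2 (1.0, 1)
```

A practical implementation would typically refine the minimum to sub-pixel precision and check the reliability of the correlation before using the result.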

In S805, the focus control unit 133 drives the focus motor 132 according to the lens drive amount and drive direction corresponding to the defocus amount and defocus direction obtained in S804, thereby moving the focus lens 131, and returns the processing to S801.

Thereafter, the processing of S802 to S805 is repeated until the power switch is no longer determined to be ON in S801. As a result, the subject region is searched for in a plurality of time-series images, and the subject tracking function is realized. Although FIG. 8 assumes that the subject tracking processing is executed for every frame, it may be executed once every several frames to reduce the processing load and power consumption.

(Subject tracking processing)
Next, details of the subject tracking processing in S803 will be described with reference to the flowchart of FIG. 9.
In S901, the tracking unit 161 determines whether or not an instruction to start subject tracking has been detected. If it is determined that a start instruction has been given, the processing advances to S902; otherwise, the processing advances to S906. The start instruction may be, for example, an input designating a tracking position from the operation unit 156. Information on the designated position is notified from the main control unit 151. At this point, it is quite likely that distance information for the designated position has not been obtained, or that the distance information is unreliable because the designated position is out of focus. For this reason, the processing performed here differs from the processing performed after focus detection has been carried out for the designated position.

In S902, the tracking unit 161 (feature extraction unit 1620) determines whether or not valid (highly reliable) distance information has been obtained for the designated position and its vicinity. If it is determined that such information has been obtained, the processing advances to S904; otherwise, the processing advances to S903.

In S903, the tracking unit 161 (feature extraction unit 1620) identifies the subject region from the designated position using only color information, as described above, extracts the feature amount of the subject region, and advances the processing to S905.

In S904, the tracking unit 161 (feature extraction unit 1620) identifies the subject region from the designated position using both color information and distance information, as described above, extracts the feature amount (pixel pattern or histogram) of the subject region, and advances the processing to S905.

In S905, the tracking unit 161 (matching unit 1610) executes matching processing on the search region of the captured image using the feature amount extracted in S903 or S904, and searches for the region with the highest feature amount similarity. The tracking unit 161 notifies the main control unit 151 of information on the position and size of the region found as the tracking result, and ends the tracking processing.

On the other hand, in S906, the tracking unit 161 (feature extraction unit 1620) determines whether or not the most recently extracted feature amount was extracted from a subject region identified using both color information and distance information. If it is determined that it was, the processing advances to S905; otherwise, the processing advances to S907.

In S907, the tracking unit 161 (feature extraction unit 1620) determines whether or not valid distance information has been obtained for the subject region detected by the previous matching. If it is determined that such information has been obtained, the processing advances to S908; otherwise, the processing advances to S905.

In S908, the tracking unit 161 (feature extraction unit 1620) identifies the subject region anew (updates it) from the designated position using both color information and distance information, as in S904, extracts the feature amount of the updated subject region, and advances the processing to S905. Note that the feature amount extracted in S908 may also take into account a feature amount extracted in the past (for example, the one extracted in the immediately preceding S903).

In the matching processing executed in S905 during continued tracking, the updated feature amount is used if the feature amount was updated in S908; if it was not updated in S908, the most recently extracted feature amount continues to be used.

For example, even if focus detection processing has started for the subject region detected by the previous matching, the distance information cannot be regarded as highly reliable unless the defocus amount is equal to or less than a predetermined threshold. In such a case, the processing follows the sequence S901, S906, S907, S905.
Once the defocus amount of the tracked subject region becomes equal to or less than the predetermined threshold, highly reliable distance information can be acquired for the subject region. In such a case, the processing follows the sequence S901, S906, S907, S908, S905.
Once the subject region has been identified using not only color information but also highly reliable distance information, the subject region and the feature amount are updated, and the updated feature amount is used in subsequent tracking processing. In this case, the processing follows the sequence S901, S906, S905.
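The branching of FIG. 9 described above can be summarized as a small state machine. This is a sketch for checking the step sequences only: `TrackerState`, `tracking_step`, and the boolean inputs are hypothetical names, and the actual feature extraction and matching are omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrackerState:
    # True once the current feature amount came from a subject region
    # identified using both color and distance information.
    uses_distance: bool = False


def tracking_step(state: TrackerState, start_requested: bool,
                  distance_reliable: bool) -> List[str]:
    """Return the step labels visited in one pass of the tracking processing."""
    path = ["S901"]
    if start_requested:
        path.append("S902")
        if distance_reliable:
            path.append("S904")          # color + distance
            state.uses_distance = True
        else:
            path.append("S903")          # color only
            state.uses_distance = False
    else:
        path.append("S906")
        if not state.uses_distance:
            path.append("S907")
            if distance_reliable:
                path.append("S908")      # re-identify with color + distance
                state.uses_distance = True
    path.append("S905")                  # matching always runs last
    return path


state = TrackerState()
print(tracking_step(state, start_requested=True, distance_reliable=False))
# → ['S901', 'S902', 'S903', 'S905']
print(tracking_step(state, start_requested=False, distance_reliable=False))
# → ['S901', 'S906', 'S907', 'S905']
print(tracking_step(state, start_requested=False, distance_reliable=True))
# → ['S901', 'S906', 'S907', 'S908', 'S905']
print(tracking_step(state, start_requested=False, distance_reliable=True))
# → ['S901', 'S906', 'S905']
```

The four calls reproduce, in order, the start of tracking without reliable distance information followed by the three sequences listed above.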

As described above, according to the present embodiment, when the image region to be tracked (the subject region) is identified based on a designated position in the image, the accuracy of the subject region can be improved by using distance information in addition to the color information of the image. Consequently, the accuracy of tracking processing that uses the feature amount extracted from the subject region can be improved.

In addition, when the reliability of the distance information is not high, the subject region is identified based on color information until the reliability becomes high, and once highly reliable distance information becomes available, the subject region is identified again (updated) using the distance information as well. Therefore, even when a position for which no distance information has been obtained, or a position whose distance information is unreliable, is designated as the tracking target, the accuracy of the tracking processing can be improved over time.

● <Second Embodiment>
In the first embodiment, once the feature amount has been extracted from a subject region identified based on highly reliable distance information and color information, the feature amount is not updated thereafter. This avoids the accumulation of drift and makes subject tracking robust against occlusion. On the other hand, the tracking accuracy may decrease when the brightness or hue of the subject has changed since the feature amount was extracted, for example because the environment around the subject has changed.

Therefore, the present embodiment is characterized in that, when the difference in distance information between the subject region and its peripheral region satisfies a predetermined condition, a feature amount that was extracted using highly reliable distance information is also updated. Since the present embodiment can be implemented with the digital camera 100 having the configuration of FIG. 1, as in the first embodiment, the following mainly describes the operational differences from the first embodiment.

A moving image shooting operation accompanied by subject tracking processing, performed by the digital camera 100 of the present embodiment, will be described with reference to the flowchart of FIG. 10.
S1001 to S1003 and S1005 to S1006 in FIG. 10 are the same as S801 to S805 in FIG. 8. The present embodiment differs from the first embodiment in that feature amount update processing is performed in S1004 after the subject tracking processing is performed in S1003.

Next, details of the feature amount update processing performed in S1004 of FIG. 10 will be described with reference to the flowchart of FIG. 11.
In S1101, the tracking unit 161 (feature extraction unit 1620) determines, from the subject region found by the matching processing (S905) and the available distance information, whether or not the difference in distance information between the subject region and its peripheral region is large.

FIGS. 12(a) and 12(c) each show a different captured image, and FIGS. 12(b) and 12(d) schematically show the distance maps generated for the captured images of FIGS. 12(a) and 12(c), respectively. In FIG. 12(a), a house 1202 is present in the background at some distance behind a person 1201, and in FIG. 12(c), another person 1206 is present in front of a person 1205.

In the distance map of FIG. 12(b), each pixel is drawn whiter the smaller the difference between its distance information and the distance information corresponding to the person 1201, who is the tracking target, and blacker the larger that difference. Similarly, in the distance map of FIG. 12(d), each pixel is drawn whiter the smaller the difference between its distance information and the distance information corresponding to the person 1205, who is the tracking target, and blacker the larger that difference. For drawing purposes, FIGS. 12(b) and 12(d) show the distance maps as binary images, but they are actually multi-valued grayscale images. The reference distance information corresponding to the subject area may be, for example, the average of the distance information in that area or the most frequent distance value.

Area 1203 in FIG. 12(b) and area 1207 in FIG. 12(d) are the subject areas identified by the subject tracking processing in S1003, and areas 1204 and 1208 are the peripheral areas of areas 1203 and 1207, respectively. Here, the peripheral area is defined as a hollow area with an empty center: the subject area is enlarged by equal amounts vertically and horizontally into an area three times its size in both the horizontal and vertical directions, and the subject area itself is then excluded. This is only an example, and the peripheral area may be defined in other ways.
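The hollow peripheral area described above can be illustrated with a minimal Python sketch. The names Rect, outer_rect, and peripheral_pixels are hypothetical, and clipping the enlarged rectangle to the image bounds is an assumption that the description does not state.

```python
from typing import Iterator, NamedTuple, Tuple


class Rect(NamedTuple):
    x: int  # left edge in pixels
    y: int  # top edge in pixels
    w: int  # width in pixels
    h: int  # height in pixels


def outer_rect(subject: Rect, image_w: int, image_h: int) -> Rect:
    """Enlarge the subject rectangle to 3x its width and height around the
    same center, then clip to the image bounds (clipping is an assumption)."""
    left = max(subject.x - subject.w, 0)
    top = max(subject.y - subject.h, 0)
    right = min(subject.x + 2 * subject.w, image_w)
    bottom = min(subject.y + 2 * subject.h, image_h)
    return Rect(left, top, right - left, bottom - top)


def peripheral_pixels(subject: Rect, image_w: int,
                      image_h: int) -> Iterator[Tuple[int, int]]:
    """Yield (x, y) for each pixel of the enlarged area that lies outside
    the subject rectangle, i.e. the hollow peripheral area."""
    outer = outer_rect(subject, image_w, image_h)
    for y in range(outer.y, outer.y + outer.h):
        for x in range(outer.x, outer.x + outer.w):
            inside_subject = (subject.x <= x < subject.x + subject.w
                              and subject.y <= y < subject.y + subject.h)
            if not inside_subject:
                yield (x, y)
```

For a 4 x 4 subject area well inside the image, the enlarged area is 12 x 12 pixels and the peripheral area contains 144 - 16 = 128 pixels.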

The tracking unit 161 (feature extraction unit 1620) extracts, from the peripheral area, the areas whose distance information is similar to that of the main subject area (that is, differs from it by an amount within a predetermined range), and determines whether the proportion of the peripheral area occupied by the extracted areas is equal to or greater than a predetermined threshold. If the proportion is determined to be equal to or greater than the threshold, the tracking unit 161 (feature extraction unit 1620) ends the feature amount update processing; otherwise, it advances the processing to S1102.

The determination in S1101 will now be explained. If only a small proportion of the peripheral area has distance information similar to that of the main subject area (for example, less than the threshold), the subject area being tracked and the background area can be considered clearly distinguishable. Therefore, even if the feature amount is updated based on a captured image that satisfies this condition, the background is expected to have little influence on the updated feature amount.

Conversely, if a large proportion of the peripheral area has distance information similar to that of the main subject area (for example, equal to or greater than the threshold), it is considered difficult to distinguish the subject area being tracked from the background area.

In the examples of FIGS. 12(b) and 12(d), the areas shown in white are the areas whose distance information is similar to the distance information corresponding to the main subject area. The threshold used in S1101 can be determined experimentally, for example. Here, the proportion of the peripheral area occupied by areas whose distance information is similar to that of the main subject area (differs by an amount within a predetermined range) is determined to be less than the predetermined threshold in the example of FIG. 12(b), and equal to or greater than the predetermined threshold in the example of FIG. 12(d).
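The S1101 determination can be sketched as follows, assuming the distance information is available as flat lists of per-pixel distance values for the subject area and the peripheral area. The function names, the use of the mean as the reference distance, and the numeric tolerance and threshold are illustrative assumptions, not values from the description.

```python
from statistics import mean
from typing import Sequence


def similar_ratio(peripheral: Sequence[float], reference: float,
                  tolerance: float) -> float:
    """Fraction of peripheral distance samples whose difference from the
    reference distance is within +/- tolerance."""
    if not peripheral:
        raise ValueError("peripheral area has no distance samples")
    similar = sum(1 for d in peripheral if abs(d - reference) <= tolerance)
    return similar / len(peripheral)


def update_allowed(subject: Sequence[float], peripheral: Sequence[float],
                   tolerance: float, ratio_threshold: float) -> bool:
    """S1101 sketch: processing continues to S1102 only when the ratio is
    NOT equal to or greater than the threshold."""
    reference = mean(subject)  # assumption: the mean is used as the reference
    return similar_ratio(peripheral, reference, tolerance) < ratio_threshold
```

With a background far behind the subject, as in FIG. 12(b), few peripheral samples fall within the tolerance and the function returns True. With another person at a similar distance, as in FIG. 12(d), the ratio is high and the function returns False.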

In S1102, the tracking unit 161 (feature extraction unit 1620) determines, based on the evaluation value calculated in the matching process (equation (3)), whether the similarity between the new feature amount extracted from the subject area found in the matching process and the feature amount used to search for the subject area in the matching process is low. Specifically, the feature extraction unit 1620 determines whether the new evaluation value calculated by the matching unit 1610 is higher than an update threshold (for an evaluation value based on the sum of absolute differences, a higher value means lower similarity), or whether the evaluation value based on the Bhattacharyya coefficient (equation (6)) is lower than another update threshold.

If the feature amount extracted from the found subject area has low similarity to the feature amount used for the search, the subject area was found but its appearance has changed, so the need to update the feature amount is considered high. On the other hand, if the feature amount extracted from the found subject area has high similarity to the feature amount used for the search, the change in the appearance of the subject area is small, so the need to update the feature amount is considered low.

Therefore, the tracking unit 161 (feature extraction unit 1620) advances the processing to S1103 if the similarity is determined to be low in S1102, and ends the feature amount update processing if the similarity is not determined to be low.
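The S1102 determination can be sketched as follows. Equations (3) and (6) are not reproduced in this section, so the sketch assumes the textbook definitions of the sum of absolute differences and the Bhattacharyya coefficient. The function names and the metric labels are hypothetical.

```python
import math
from typing import Sequence


def sad(template: Sequence[float], patch: Sequence[float]) -> float:
    """Sum of absolute differences: 0 for identical inputs, larger values
    mean lower similarity."""
    if len(template) != len(patch):
        raise ValueError("template and patch must have the same length")
    return sum(abs(t - p) for t, p in zip(template, patch))


def bhattacharyya(p: Sequence[float], q: Sequence[float]) -> float:
    """Bhattacharyya coefficient of two normalized histograms: 1 for
    identical histograms, smaller values mean lower similarity."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def similarity_is_low(value: float, update_threshold: float,
                      metric: str) -> bool:
    """S1102 sketch: the direction of the comparison depends on the metric."""
    if metric == "sad":
        return value > update_threshold   # high SAD means low similarity
    if metric == "bhattacharyya":
        return value < update_threshold   # low coefficient means low similarity
    raise ValueError("unknown metric: " + metric)
```

The two update thresholds are separate values because the two evaluation values have different scales and opposite directions.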

In S1103, as in S908, the tracking unit 161 (feature extraction unit 1620) updates the feature amount used for the matching process with the new feature amount extracted from the found subject area. The update method is not particularly limited. For example, the feature amount previously used for the matching process may be completely replaced by the new feature amount, or the updated feature amount may be calculated from both the previously used feature amount and the new feature amount. For the evaluation value based on the sum of absolute differences (equation (3)), for example, the updated feature amount can be obtained according to equation (8).
T(i, j) = Tpre(i, j) × α + Tnow(i, j) × (1-α) , 0 ≤ α ≤ 1 (8)
Here, Tpre(i, j) is the feature amount used in the matching process, Tnow(i, j) is the new feature amount, and T(i, j) is the updated feature amount.

For the evaluation value based on the Bhattacharyya coefficient (equation (6)), the updated feature amount can be obtained according to equation (9).
p(m) = ppre(m) × α + pnow(m) × (1-α) , 0 ≤ α ≤ 1 (9)
Here, ppre(m) is the feature amount used in the matching process, pnow(m) is the new feature amount, and p(m) is the updated feature amount.

In both equation (8) and equation (9), α = 0 means the feature amount is completely replaced by the newly extracted feature amount, and α = 1 means the feature amount is not updated. The degree of update α can be determined adaptively according to, for example, at least one of the magnitude of the difference in distance information determined in S1101 and the similarity determined in S1102.
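Equations (8) and (9) have the same form, an element-wise weighted average, so both can be sketched with one helper. The name blend_features and the flat-list representation of the template T(i, j) or the histogram p(m) are assumptions made for illustration.

```python
from typing import List, Sequence


def blend_features(previous: Sequence[float], new: Sequence[float],
                   alpha: float) -> List[float]:
    """Element-wise update: previous * alpha + new * (1 - alpha).

    alpha = 1 keeps the previous feature unchanged; alpha = 0 replaces it
    completely with the new feature.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must satisfy 0 <= alpha <= 1")
    if len(previous) != len(new):
        raise ValueError("features must have the same length")
    return [p * alpha + n * (1.0 - alpha) for p, n in zip(previous, new)]
```

Because the update is a convex combination, blending two histograms that each sum to 1 yields a histogram that also sums to 1, so no renormalization is needed in that case.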

For example, provided the determination conditions in S1101 and S1102 are satisfied, the updated feature amount can be calculated with a smaller value of α (a larger contribution from the new feature amount) the larger the difference in distance information and the lower the similarity. Likewise, provided those conditions are satisfied, the updated feature amount can be calculated with a larger value of α (a smaller contribution from the new feature amount) the smaller the difference in distance information and the higher the similarity.
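One possible way to make α adaptive is sketched below. The description only fixes the direction of the dependence, so the linear mapping, the equal weighting of the two terms, and the assumption that the similarity has been normalized to the range 0 to 1 are all hypothetical choices. The S1101 ratio is used here as a proxy for the distance difference: a lower ratio means the subject is better separated from its surroundings.

```python
def adaptive_alpha(ratio: float, ratio_threshold: float, similarity: float,
                   alpha_min: float = 0.0, alpha_max: float = 1.0) -> float:
    """Return a degree of update that shrinks as the S1101 ratio falls
    (larger distance difference) and as the similarity falls.

    ratio:           proportion computed in S1101, expected below the threshold
    ratio_threshold: threshold used in S1101, must be positive
    similarity:      similarity from S1102, assumed normalized to [0, 1]
    """
    if ratio_threshold <= 0.0:
        raise ValueError("ratio_threshold must be positive")
    r = min(max(ratio / ratio_threshold, 0.0), 1.0)
    s = min(max(similarity, 0.0), 1.0)
    score = 0.5 * r + 0.5 * s  # equal weighting is an arbitrary choice
    return alpha_min + (alpha_max - alpha_min) * score
```

The alpha_min and alpha_max parameters let an implementation keep some contribution from the previous feature amount even in the most favorable case.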

Furthermore, when an operation that fixes the focus distance or exposure is detected (for example, an operation corresponding to a shooting preparation instruction or a shooting start instruction, such as operation of the shutter button included in the operation unit 156), the subject tracking processing is likely to be succeeding at that point. Therefore, when such an operation is detected, the thresholds used for the determinations in S1101 and S1102 may be changed so that the feature amount is more readily updated with the new feature amount extracted from the subject area detected at that time.
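The threshold change described above might look like the following sketch. The class UpdateThresholds, the function relax_for_confirm_operation, and the factor of 1.5 are hypothetical; the description states only that the thresholds are changed so that updating becomes easier. The sketch assumes the S1102 evaluation value is based on the sum of absolute differences, for which updating requires the value to exceed the update threshold.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class UpdateThresholds:
    ratio_threshold: float       # S1101: update possible only if ratio < this
    sad_update_threshold: float  # S1102: update only if SAD value > this


def relax_for_confirm_operation(thresholds: UpdateThresholds,
                                factor: float = 1.5) -> UpdateThresholds:
    """Return thresholds under which both conditions are easier to satisfy."""
    if factor < 1.0:
        raise ValueError("factor must be at least 1.0")
    return replace(
        thresholds,
        # Raising the ratio threshold lets more frames pass S1101.
        ratio_threshold=min(thresholds.ratio_threshold * factor, 1.0),
        # Lowering the SAD update threshold lets more frames pass S1102.
        sad_update_threshold=thresholds.sad_update_threshold / factor,
    )
```

Returning a new object instead of mutating the original makes it simple to restore the normal thresholds once the operation is no longer detected.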

As described above, according to this embodiment, distance information is used so that the feature amount can be updated when it can be extracted accurately from the subject area. Therefore, even when the appearance of the tracked subject area changes, the feature amount can be updated without lowering tracking accuracy, which further improves subject tracking performance.

(Other embodiments)
The above embodiments describe the case where subject tracking is performed during shooting, but as long as distance information can be obtained, the same subject tracking can also be performed during playback of a moving image. In this case, distance information recorded with the frames of the moving image may be acquired. Alternatively, if each frame is recorded as a set of parallax images, the distance information may be generated from the parallax images, and the parallax images may be combined to generate the moving image frame for playback. Of course, the distance information may be acquired by other methods.

When subject tracking is performed during playback, the tracking result can be used, for example, to control how the moving image is displayed. For example, the display can be controlled so that the subject area being tracked appears at the center of the screen, or so that the image is scaled to keep the size of the subject area being tracked constant. An indicator that identifies the subject area being tracked (for example, a bounding rectangle of the subject area) may also be superimposed on the display. These are merely examples, and the tracking result may be used for other purposes.
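The centering and constant-size display control mentioned above can be sketched as a scale-and-translate transform. The names Rect, display_transform, and to_screen, and the choice of the subject height as the quantity held constant, are assumptions made for illustration.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    x: float  # left edge in frame coordinates
    y: float  # top edge in frame coordinates
    w: float  # width
    h: float  # height


def display_transform(subject: Rect, screen_w: float, screen_h: float,
                      target_h: float) -> Tuple[float, float, float]:
    """Return (scale, offset_x, offset_y) such that the subject is drawn
    with height target_h and its center lands on the screen center."""
    if subject.h <= 0 or target_h <= 0:
        raise ValueError("heights must be positive")
    scale = target_h / subject.h
    center_x = subject.x + subject.w / 2
    center_y = subject.y + subject.h / 2
    return (scale,
            screen_w / 2 - scale * center_x,
            screen_h / 2 - scale * center_y)


def to_screen(px: float, py: float,
              transform: Tuple[float, float, float]) -> Tuple[float, float]:
    """Map a frame coordinate to a screen coordinate."""
    scale, offset_x, offset_y = transform
    return (scale * px + offset_x, scale * py + offset_y)
```

Recomputing the transform for each frame from the tracked subject area keeps the subject centered and at a constant displayed size as it moves.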

The superimposed indicator that identifies the subject area being tracked may take a different form depending on whether the subject area was identified with reference to distance information or using color information only. For example, when the subject area was identified using color information only, the accuracy of the subject area may be low, so an indicator of fixed position and size is displayed. When the subject area was identified with reference to distance information, the position and size of the indicator are changed dynamically according to the position and size of the subject area.

The present invention is not limited to moving images and is also applicable when shooting or playing back a plurality of time-series images, as in continuous shooting or interval shooting.

The present invention can also be realized by processing in which a program that implements one or more functions of the above embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more of the functions.

The above embodiments are merely specific examples intended to aid understanding of the present invention, and are not intended to limit the present invention to those embodiments in any sense. All embodiments falling within the scope defined by the claims are included in the present invention.

DESCRIPTION OF SYMBOLS: 100... digital camera, 101... photographing lens, 131... focus lens, 132... focus motor, 133... focus control unit, 141... image sensor, 151... main control unit, 152... image processing unit, 161... tracking unit

Claims

1. An image processing apparatus comprising:
extraction means for extracting a feature amount from an image region; and
searching means for searching for a region similar to the image region in a plurality of time-sequential images using the feature amount,
wherein the extraction means
updates the feature amount used by the searching means for searching when, in the peripheral area of the region similar to the image region, the proportion occupied by areas having distance information whose difference from the distance information in the region similar to the image region is within a predetermined range on the near side and the far side is not equal to or greater than a threshold, and
does not update the feature amount used by the searching means for searching when the proportion is equal to or greater than the threshold.

2. The image processing apparatus according to claim 1, wherein the extraction means changes the degree of update of the feature amount used by the searching means for searching according to the proportion of the peripheral area occupied by areas having distance information whose difference from the distance information in the region similar to the image region is within the predetermined range on the near side and the far side.

3. The image processing apparatus according to claim 1, wherein the extraction means,
when the proportion is not equal to or greater than the threshold, determines the similarity between the feature amount used by the searching means for searching and a feature amount extracted from the region similar to the image region,
updates the feature amount used by the searching means for searching when the similarity is determined to be low, and
does not update the feature amount used by the searching means for searching when the similarity is not determined to be low.

4. An imaging device comprising:
the image processing apparatus according to any one of claims 1 to 3; and
focus detection means for performing focus detection on an area that is found by the searching means and includes the region similar to the image region.

5. The imaging device according to claim 4, wherein, when an operation that fixes the focus distance or exposure is detected, the threshold is changed so that the feature amount used by the searching means for searching is more readily updated.

6. The imaging device according to claim 5, further comprising:
an image sensor having a function of dividing a pupil region of a photographing lens; and
generating means for generating the distance information from parallax images obtained from the image sensor.

7. A control method for an image processing apparatus, comprising:
an extracting step in which extraction means extracts a feature amount from an image region; and
a searching step of searching for a region similar to the image region in a plurality of time-series images using the feature amount,
wherein, in the extracting step,
the feature amount used for searching in the searching step is updated when, in the peripheral area of the region similar to the image region, the proportion occupied by areas having distance information whose difference from the distance information in the region similar to the image region is within a predetermined range on the near side and the far side is not equal to or greater than a threshold, and
the feature amount used by the searching means for searching is not updated when the proportion is equal to or greater than the threshold.

8. A program for causing a computer to function as each means of the image processing apparatus according to any one of claims 1 to 3.