JP5323243B2

JP5323243B2 - Image processing apparatus and control method thereof

Info

Publication number: JP5323243B2
Application number: JP2012222509A
Authority: JP
Inventors: 康嘉宮▲崎▼
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2012-10-04
Filing date: 2012-10-04
Publication date: 2013-10-23
Anticipated expiration: 2028-08-21
Also published as: JP2013042524A

Description

The present invention relates to an image processing apparatus and a control method thereof, and more particularly to an image processing apparatus having a subject detection function and a control method thereof.

In an imaging apparatus that uses a photoelectric conversion element, such as a digital camera or a video camera, a subject can be detected from a captured image using image processing techniques. Accordingly, imaging apparatuses have been realized that perform automatic focus control (AF) to focus on a detected subject and automatic exposure control (AE) to expose a detected subject properly.

For example, consider the case where subject detection is performed continuously on EVF images (also called live images, through images, or live view images), which are captured continuously so that a display device of the imaging apparatus can function as an electronic viewfinder (EVF). In this case it is possible to use, for example, a method that detects the subject in each image using an image recognition technique such as face detection (see Patent Document 1), or a method that detects (tracks) the movement of a region designated as the subject region (see Patent Document 2).

Face detection detects a subject in each image by extracting facial features such as the eyes and mouth from the image and determining whether they form a face. In contrast, the method of tracking a subject region detects the subject in each image by searching another frame image for a region highly correlated with the subject region selected in a given frame image. Hereinafter, for convenience, subject detection that uses image recognition, such as face detection, is referred to as “face detection”, and subject detection based on detecting the motion of a subject region is referred to as “moving object detection”. Note, however, that these names are not intended to limit the subject to a human face.

Patent Document 1: JP 2007-274587 A
Patent Document 2: JP 2001-243478 A

Because face detection must recognize each of the several parts that make up a face, its detection accuracy is high. However, its computation is complex and time-consuming, so it is inferior to moving object detection in detection speed (tracking responsiveness). In contrast, moving object detection detects the subject by difference computation between multiple frame images, so it runs faster than face detection, but it is inferior to face detection in accuracy; for example, detection becomes difficult when the luminance of the subject changes.
Thus, with conventional subject region tracking methods, prioritizing detection accuracy lowers detection speed, and prioritizing detection speed lowers detection accuracy.

The present invention has been made in view of these problems of the prior art, and its object is to realize an image processing apparatus capable of appropriately tracking a subject region, and a control method thereof.

To solve the above problem, an image processing apparatus according to the present invention comprises: first detection means for detecting a predetermined subject present in an image of a frame of a moving image by detecting a part having a predetermined feature from the image of the frame; second detection means for tracking, in the moving image, a region in which the predetermined subject detected by the first detection means is present, by searching for similar regions between images of frames; and determination means for determining a region in the moving image in which the predetermined subject is present, based on at least one of the detection result of the first detection means and the detection result of the second detection means. When a predetermined condition is satisfied, the determination means determines the region in which the predetermined subject is present based on the detection result of the first detection means, without relying on the detection result of the second detection means. When the predetermined condition is not satisfied, the determination means determines the region based on both the detection result of the first detection means and the detection result of the second detection means. The predetermined condition includes at least one of the following: the gain adjustment amount applied to the moving image exceeds a predetermined threshold, and the exposure time for one frame image is longer than a predetermined threshold.

Similarly, to solve the above problem, a control method of an image processing apparatus according to the present invention comprises: a first detection step in which first detection means detects a predetermined subject present in an image of a frame of a moving image by detecting a part having a predetermined feature from the image of the frame; a second detection step in which second detection means tracks, in the moving image, a region in which the predetermined subject detected in the first detection step is present, by searching for similar regions between images of frames; and a determination step in which determination means determines a region in the moving image in which the predetermined subject is present, based on at least one of the detection result of the first detection step and the detection result of the second detection step. In the determination step, when a predetermined condition is satisfied, the determination means determines the region in which the predetermined subject is present based on the detection result of the first detection step, without relying on the detection result of the second detection step. When the predetermined condition is not satisfied, the determination means determines the region based on both the detection result of the first detection step and the detection result of the second detection step. The predetermined condition includes at least one of the following: the gain adjustment amount applied to the moving image exceeds a predetermined threshold, and the exposure time for one frame image is longer than a predetermined threshold.
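The determination described above can be illustrated with a minimal Python sketch. The threshold values, the `CaptureState` type, the `(x, y, width, height)` region tuple, and the rule for combining the two detection results are all assumptions made for illustration; the source states only that the thresholds are predetermined and that both results are used when the condition is not satisfied.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical region representation: (x, y, width, height).
Region = Tuple[int, int, int, int]

# Hypothetical threshold values; the source only calls them "predetermined".
GAIN_THRESHOLD = 8.0
EXPOSURE_THRESHOLD_S = 1 / 30


@dataclass
class CaptureState:
    gain: float             # gain adjustment amount applied to the moving image
    exposure_time_s: float  # exposure time for one frame image, in seconds


def condition_satisfied(state: CaptureState) -> bool:
    """True when at least one of the two conditions holds."""
    return (state.gain > GAIN_THRESHOLD
            or state.exposure_time_s > EXPOSURE_THRESHOLD_S)


def decide_region(state: CaptureState,
                  face_region: Optional[Region],
                  tracked_region: Optional[Region]) -> Optional[Region]:
    """Choose the subject region from the two detection results."""
    if condition_satisfied(state):
        # Use the first (face) detection result only.
        return face_region
    # Assumed combination rule: prefer the tracked region when available,
    # otherwise fall back to the face detection result.
    return tracked_region if tracked_region is not None else face_region
```

With these assumed thresholds, a frame captured at a gain of 16 uses only the face detection result, while a frame at a gain of 2 and an exposure time of 1/60 s may also use the tracked region.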

With this configuration, the present invention can realize an image processing apparatus capable of appropriately tracking a subject region, and a control method thereof.

FIG. 1 is a block diagram showing an example functional configuration of a digital camera as an example of an image processing apparatus according to the first embodiment of the present invention.
FIG. 2 is a flowchart showing the subject detection operation in the digital camera according to the first embodiment of the present invention.
FIG. 3 is a diagram schematically showing how image quality changes with sensitivity and how that change affects the accuracy of moving object detection.
FIG. 4 is a diagram schematically showing an example of frame display in the digital camera according to the first embodiment of the present invention.
FIG. 5 is a flowchart showing the subject detection operation in the digital camera according to the second embodiment of the present invention.
FIG. 6 is a diagram schematically illustrating the problem of moving object detection when the shutter speed is slow.
FIG. 7 is a diagram schematically showing the effect of the second embodiment of the present invention.

Hereinafter, preferred and exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings.
<First Embodiment>
The image processing apparatus according to the first embodiment of the present invention first detects a subject region accurately by face detection and then performs moving object detection using the detected subject region, thereby achieving both subject detection accuracy and detection speed (tracking responsiveness).

FIG. 1 is a block diagram showing an example functional configuration of a digital camera as an example of an image processing apparatus according to the first embodiment of the present invention.
(Configuration of digital camera)
The operation unit 101 is a user interface through which the user of the digital camera 100 inputs various instructions to the digital camera 100, and consists of input devices such as switches and buttons.

The operation unit 101 includes a shutter switch. The control unit 102 is notified of the signal SW1 when the shutter switch is half-pressed and of the signal SW2 when it is fully pressed.

The control unit 102 controls the operation of each unit of the digital camera 100 in accordance with instructions from the operation unit 101, and thereby realizes the functions of the digital camera 100. The control unit 102 consists of, for example, a CPU, a nonvolatile memory that stores the program executed by the CPU, and a RAM into which the program is loaded and which is used as a work area.

The control unit 102 calculates subject luminance from the digital image data output by an image processing unit 105, described later, and realizes an automatic exposure control (AE) function that automatically determines at least one of the shutter speed and the aperture according to the shooting mode. The control unit 102 also notifies the A/D conversion unit 104 of a gain adjustment amount corresponding to the set sensitivity. The set sensitivity may be a fixed sensitivity set by the user, or a sensitivity set dynamically by the control unit 102 based on the result of AE processing. In addition, the control unit 102 determines whether the flash unit 111 needs to fire during main shooting, according to the flash setting, the automatically determined shutter speed, and so on. When it decides to fire the flash, the control unit 102 instructs the EF processing unit 110 to turn the flash on. On receiving the flash-on instruction from the control unit 102, the EF processing unit 110 controls the flash unit 111 and fires it in time with the opening of the shutter of the exposure mechanism 109a.

Furthermore, the control unit 102 realizes an automatic focusing control function by driving the focus lens of the lens 108a through the lens driving unit 108 and detecting changes in the contrast of the digital image data output by the image processing unit 105.
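As a rough illustration of contrast-based focusing, the following Python sketch scans a set of focus lens positions and picks the one whose image has the highest contrast. The contrast measure (sum of squared differences between horizontally adjacent pixels), the exhaustive scan, and the names `contrast_score`, `find_focus_position`, and `capture_at` are assumptions for illustration; the source says only that changes in contrast are detected while the focus lens is driven.

```python
from typing import Callable, List, Sequence

# Hypothetical image representation: rows of luminance values.
Image = List[List[int]]


def contrast_score(image: Image) -> int:
    """Sum of squared differences between horizontally adjacent pixels."""
    return sum((row[i + 1] - row[i]) ** 2
               for row in image
               for i in range(len(row) - 1))


def find_focus_position(positions: Sequence[int],
                        capture_at: Callable[[int], Image]) -> int:
    """Return the lens position whose captured image has the highest contrast.

    `capture_at` is a hypothetical callback that drives the focus lens to
    a position and returns the image captured there.
    """
    return max(positions, key=lambda p: contrast_score(capture_at(p)))
```

A practical implementation would more likely use a hill-climbing search that stops once contrast starts to fall, rather than visiting every position.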

The image sensor 103 is a photoelectric conversion device such as a CCD image sensor or a CMOS image sensor, and converts the subject optical image formed through the lens 108a and the exposure mechanism 109a into a per-pixel analog electrical signal (analog image data).

The lens 108a has an AF function and drives the focus lens under the control of the lens driving unit 108. The exposure mechanism 109a has an aperture and a mechanical shutter, and exposes the image sensor 103 to the subject optical image by opening the optical path between the lens 108a and the image sensor 103 with the aperture and shutter speed set under the control of a mechanical drive unit 109, described later.

The A/D conversion unit 104 performs correlated double sampling, gain adjustment, A/D conversion, and the like on the analog image data output from the image sensor 103, and outputs the result as digital image data. The gain adjustment amount (amplification factor) to apply is supplied by the control unit 102. A larger gain also amplifies the noise component contained in the image.

The image processing unit 105 performs image processing such as white balance correction, pixel interpolation, gamma correction, and color difference signal generation on the digital image data output from the A/D conversion unit 104, and outputs, for example, YUV image data as processed digital image data.

The face detection unit 106 functions as first subject detection means that detects a subject by image recognition. The face detection unit 106 detects a human face, as an example of a subject, from the image of the digital image data output by the A/D conversion unit 104. The face detection unit 106 then notifies the control unit 102 and the moving object detection unit 115 of information on the face region (face information), such as the position and range (size) of the detected face and its reliability (likelihood of being a face). The face position may be the center coordinates of the face region.

Known face detection techniques can be used for face detection in this embodiment. Known techniques include methods based on learning, for example with neural networks, and methods that use template matching to search the image for parts with characteristic shapes, such as the eyes, nose, and mouth, and regard the region as a face if the similarity is high. Many other methods have also been proposed, such as methods that detect image feature quantities such as skin color and eye shape and apply statistical analysis. In general, several of these methods are combined to improve face detection accuracy. A specific example is the method described in JP 2002-251380 A, which detects faces using the wavelet transform and image feature quantities.

The moving object detection unit 115 functions as second subject detection means that detects a subject by moving object detection. The moving object detection unit 115 identifies a moving object from the images of two temporally consecutive sets of digital image data output by the image processing unit 105, and calculates its position, range, and movement amount. The moving object detection unit 115 of this embodiment includes an angular velocity sensor (not shown) and also detects the movement of the digital camera 100. By using the face information detected by the face detection unit 106, the moving object detection unit 115 can distinguish the movement of the background from the movement of the subject, and calculates moving object information (position, range, and movement amount) for each of the background and the subject.

The EVF display unit 107 includes a display device such as an LCD, and displays an image based on the digital image data processed by the image processing unit 105.

The format conversion unit 112 generates, from the digital image data output by the image processing unit 105, a data file for recording that conforms to, for example, DCF (Design rule for Camera File system). In the course of generating the data file, the format conversion unit 112 performs encoding into JPEG format, generation of the file header, and so on.

The image recording unit 113 records the data file generated by the format conversion unit 112 in the built-in memory of the digital camera 100, on removable media attached to the digital camera 100, or the like.

The external connection unit 114 is an interface for connecting the digital camera 100 to an external device such as a PC (personal computer) or a printer. The external connection unit 114 communicates with external devices in conformity with common standards such as USB, IEEE 1394, and IEEE 802.11.

(Description of operation)
Next, the operation of the digital camera 100 of this embodiment will be described.
First, when the user of the digital camera 100 turns on the power switch included in the operation unit 101, the control unit 102 detects this and supplies power from a battery or an AC input (not shown) to each unit of the digital camera 100.

The digital camera 100 of this embodiment is configured to start the EVF display operation when power is supplied. Specifically, the mechanical shutter of the exposure mechanism 109a opens and the image sensor 103 is exposed. The charge accumulated in each pixel of the image sensor 103 is read out sequentially at a period that realizes a predetermined frame rate, and is output to the A/D conversion unit 104 as analog image data. In this way, in this embodiment, images for EVF display are acquired by capturing continuously with a so-called electronic shutter.

The A/D conversion unit 104 performs correlated double sampling, gain adjustment, A/D conversion, and the like on the analog image data output from the image sensor 103, and outputs the result as digital image data.

The face detection unit 106 detects a human face by image recognition from the image of the digital image data output by the A/D conversion unit 104, and notifies the control unit 102 of information on the face region (face information), such as the position and range of the detected face and its reliability (likelihood of being a face).

The moving object detection unit 115 detects a moving object from the images of multiple temporally consecutive sets of digital image data output by the image processing unit 105, and calculates its position, range, and movement amount. When moving object detection is performed on three or more consecutive images, the moving object detection unit 115 uses the detection result obtained from the first two images to continue detection successively on the following pairs of images. The moving object detection unit 115 also detects the movement of the digital camera 100 with the angular velocity sensor.

The control unit 102 determines the gain adjustment amount to be notified to the A/D conversion unit 104. Depending on the gain adjustment amount, the control unit 102 also switches between determining the subject region using only the subject detection result of the face detection unit 106 and determining it using the subject detection result of the moving object detection unit 115 as well. The details of this switching control are described later.

The image processing unit 105 performs various kinds of image processing on the digital image data output from the A/D conversion unit 104, and outputs processed digital image data.
The EVF display unit 107 successively displays images according to the image data output by the image processing unit 105.

Unless it receives notification of the signal SW1 from the operation unit 101 (that is, notification that the shutter switch is half-pressed), the control unit 102 controls each unit so that the EVF display process described above is executed repeatedly.

On the other hand, when the control unit 102 receives notification of the signal SW1, it performs AF and AE processing using the latest captured image at the time of notification, and determines the in-focus position and the exposure conditions. The control unit 102 also determines whether the flash unit 111 needs to fire. Whether to fire the flash may be set in advance through the operation unit 101 and decided by reading that setting, or may be decided automatically by detecting the ambient darkness.

The control unit 102 waits, as long as notification of the signal SW1 from the operation unit 101 continues, until it receives notification of the signal SW2 (that is, notification that the shutter switch is fully pressed). If notification of the signal SW1 stops before notification of the signal SW2 is received, the control unit 102 resumes the EVF display process.

Upon receiving notification of the signal SW2, if the flash unit 111 is to fire, the control unit 102 controls the EF processing unit 110 to make the flash unit 111 perform a pre-flash, and performs processing such as calculating the flash amount and weighting the EF frame. The control unit 102 then instructs the EF processing unit 110 to fire the flash unit 111 at the main flash amount calculated from the pre-flash. If the flash unit 111 is not to fire, the control unit 102 proceeds to the main shooting process without performing the light control described above.

The main shooting process differs from the imaging process during EVF display mainly in that the exposure mechanism 109a is controlled with the aperture and shutter speed determined by AE processing to expose the image sensor 103, and in that, generally, more pixels are read out from the image sensor 103.

In addition, because main shooting captures an image for recording, the processed digital image data output by the image processing unit 105 is converted into a data file for recording by the format conversion unit 112 and recorded on a recording medium by the image recording unit 113.
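The shutter switch handling described in this operation sequence can be summarized as a small state transition function. The following Python sketch is illustrative only: the `Mode` names and the `next_mode` function are hypothetical, and the return to EVF display after main shooting is an assumption that the source does not state.

```python
from enum import Enum, auto


class Mode(Enum):
    EVF = auto()       # repeating the EVF display process
    STANDBY = auto()   # AF/AE done after SW1, waiting for SW2
    SHOOTING = auto()  # main shooting process


def next_mode(mode: Mode, sw1: bool, sw2: bool) -> Mode:
    """Return the next mode given the current SW1/SW2 notifications."""
    if mode is Mode.EVF:
        # SW1 (half press) triggers AF/AE and the flash decision.
        return Mode.STANDBY if sw1 else Mode.EVF
    if mode is Mode.STANDBY:
        if sw2:
            # SW2 (full press) starts main shooting.
            return Mode.SHOOTING
        # If SW1 stops before SW2 arrives, EVF display resumes.
        return Mode.STANDBY if sw1 else Mode.EVF
    # Assumption: the camera returns to EVF display after main shooting.
    return Mode.EVF
```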

(Subject detection operation)
Next, the subject detection operation in the digital camera 100 of this embodiment will be described with reference to the flowchart shown in FIG. 2.
First, the control unit 102 sets an initial exposure value for capturing images for EVF display (S201). This initial exposure value can be set as appropriate within a range that allows the predetermined EVF image frame rate to be achieved.

Then, to realize the set exposure value, the control unit 102 controls the aperture of the exposure mechanism 109a through the mechanical drive unit 109 to expose the image sensor 103 (S202), and reads out an EVF image from the image sensor 103 at a period corresponding to the shutter speed (S203).

Next, the control unit 102 determines whether the captured EVF image corresponds to a predetermined face detection cycle (S204). This face detection cycle can be defined, for example, as the number of EVF image frames between successive face detections, according to the processing capability of the face detection unit 106.

If the captured EVF image corresponds to the face detection cycle, the control unit 102 disables the moving object detection unit 115, enables the face detection unit 106, has it execute face detection processing on the EVF image, and acquires subject information (face information) (S205). If the EVF image does not correspond to the face detection cycle, face detection by the face detection unit 106 is not performed and the previously detected subject information is retained.

Next, the control unit 102 determines whether the captured EVF image corresponds to a predetermined moving object detection cycle (S206). This moving object detection cycle can be defined, for example, as the number of EVF image frames between successive moving object detections, according to the processing capability of the moving object detection unit 115. The moving object detection cycle is set shorter than the face detection cycle.

If the captured EVF image corresponds to the moving object detection cycle, the control unit 102 disables the face detection unit 106, enables the moving object detection unit 115, has it execute moving object detection (subject tracking), and acquires the movement distance and direction of the subject.
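The per-frame choice between the two detectors in S204 to S206 can be sketched as follows. This Python example assumes that each cycle is expressed as a frame count and checked with a modulo on a frame index, and that face detection takes precedence on a frame that matches both cycles; the function name `detector_for_frame` and these rules are illustrative assumptions, not details given in the source.

```python
def detector_for_frame(frame_index: int,
                       face_period: int,
                       motion_period: int) -> str:
    """Return which detector runs on this EVF frame: 'face', 'motion' or 'none'."""
    if motion_period >= face_period:
        # The source requires the moving object detection cycle to be shorter.
        raise ValueError("motion_period must be shorter than face_period")
    if frame_index % face_period == 0:
        # Frame corresponds to the face detection cycle (S204, S205).
        return "face"
    if frame_index % motion_period == 0:
        # Frame corresponds to the moving object detection cycle (S206).
        return "motion"
    # Neither cycle: the previously detected subject information is kept.
    return "none"
```

For example, with a face detection period of 6 frames and a moving object detection period of 2 frames, frames 2 and 4 are tracked by moving object detection between the face detections at frames 0 and 6.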

The moving object detection unit 115 detects the movement of the subject between the EVF image that is the target of moving object detection (image b) and the past EVF image adjacent to it in time (image a). Specifically, if image a corresponds to the face detection cycle, the moving object detection unit 115 performs moving object detection on the face region detected by the face detection unit 106; if image a corresponds to the moving object detection cycle, it performs moving object detection on the face region tracked by moving object detection.

The moving object detection unit 115 searches image b for the region most similar to the face region in image a, and calculates the movement distance and direction of the subject as a subject vector (Tx, Ty) from the difference between the center coordinates of the two regions (S207). When moving object detection is performed on three or more consecutive images, the moving object detection unit 115 uses the detection result obtained from the first two images to continue detection successively on the following pairs of images. Because the moving object detection cycle is shorter than the face detection cycle, moving object detection is performed several times before the face detection result is updated. Once the face detection unit 106 has detected a face region, tracking the subject region by moving object detection based on that face region makes it possible to follow the movement of the face without performing face detection in the meantime.
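The search for the most similar region and the resulting subject vector (Tx, Ty) can be sketched in Python as follows. The similarity measure (sum of absolute differences), the exhaustive search over all positions, and the names `crop`, `sad`, and `subject_vector` are assumptions for illustration; the source says only that the most similar region is searched for. Because both regions have the same size, the difference between their top-left corners equals the difference between their center coordinates.

```python
from typing import List, Tuple

# Hypothetical image representation: rows of luminance values.
Image = List[List[int]]


def crop(image: Image, x: int, y: int, w: int, h: int) -> Image:
    """Extract the w-by-h block whose top-left corner is (x, y)."""
    return [row[x:x + w] for row in image[y:y + h]]


def sad(a: Image, b: Image) -> int:
    """Sum of absolute differences; lower means more similar."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def subject_vector(image_a: Image, image_b: Image,
                   region: Tuple[int, int, int, int]) -> Tuple[int, int]:
    """Return (Tx, Ty) for the region (x, y, w, h) taken from image_a."""
    x, y, w, h = region
    template = crop(image_a, x, y, w, h)
    best = None  # (score, candidate_x, candidate_y)
    for cy in range(len(image_b) - h + 1):
        for cx in range(len(image_b[0]) - w + 1):
            score = sad(template, crop(image_b, cx, cy, w, h))
            if best is None or score < best[0]:
                best = (score, cx, cy)
    _, bx, by = best
    return (bx - x, by - y)
```

A practical tracker would usually restrict the search to a window around the previous position to keep the computation small.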

ここで、上述したように、顔検出は画像認識に基づく被写体検出であるため、その演算の複雑さから時間を要するが、目や頬といった顔の特徴量を抽出するため、検出精度が高い。これに対して、動体検出は、前のＥＶＦ画像で被写体として検出された領域と類似度の高い領域を、現在のＥＶＦ画像における被写体領域として検出する方法である。そのため、動体検出では、被写体領域が顔領域なのか否かを判断することができず、精度の面では顔検出に劣るが、演算の複雑さが顔検出よりも低いため、顔検出を行わない間の被写体追尾を行う処理として有用である。 Here, as described above, face detection is subject detection based on image recognition, so it takes time because of the complexity of its computation, but its detection accuracy is high because it extracts facial feature quantities such as the eyes and cheeks. In contrast, moving object detection is a method of detecting, as the subject area in the current EVF image, an area highly similar to the area detected as the subject in the previous EVF image. Moving object detection therefore cannot determine whether the subject area is a face area, and it is inferior to face detection in accuracy. However, because its computational complexity is lower than that of face detection, it is useful as processing for tracking the subject while face detection is not being performed.

しかしながら、被写体輝度が低い場合など、感度を上昇させるためにＡ／Ｄ変換部１０４において撮像画像に適用するゲイン調整量を上昇させると、ＥＶＦ画像中のノイズ成分が増加し、動体検出の精度を低下させてしまう。 However, when the gain adjustment amount applied to the captured image in the A/D conversion unit 104 is increased in order to raise the sensitivity, for example when the subject brightness is low, the noise component in the EVF image increases and the accuracy of moving object detection decreases.

その理由について説明する。ライブ画像は動画像として表示する必要性からフレームレートに下限があり、シャッタースピードの最長時間が制限される。例えばフレームレートが３０フレーム／秒の場合、シャッタースピードは１／３０秒より遅くすることはできない。そのため、被写体輝度が低い場合には、感度を上げる、具体的には画素信号のゲインを増加させて、ライブ画像の撮像を行う。感度（ゲイン）が高くなると、撮影画像中のランダムノイズ成分も増加する。 The reason is as follows. Because live images must be displayed as a moving image, there is a lower limit on the frame rate, which in turn limits the longest usable shutter speed. For example, when the frame rate is 30 frames/second, the shutter speed cannot be slower than 1/30 second. Therefore, when the subject brightness is low, the live image is captured with increased sensitivity, specifically with an increased gain applied to the pixel signal. As the sensitivity (gain) increases, the random noise component in the captured image also increases.
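The relationship between frame rate, longest exposure, and gain can be worked through with a small sketch. The linear model (halving the exposure time requires doubling the gain) and the names `longest_exposure` and `required_gain` are illustrative assumptions; the description itself only states that the exposure cannot exceed the frame period and that gain is raised to compensate.

```python
from fractions import Fraction


def longest_exposure(frame_rate_fps):
    """Longest exposure time (seconds) that still sustains the frame rate."""
    return Fraction(1, frame_rate_fps)


def required_gain(ideal_exposure, frame_rate_fps):
    """Gain factor needed when the ideal exposure is clipped to the frame period.

    Assumes a simple linear model: exposure * gain stays constant.
    """
    limit = longest_exposure(frame_rate_fps)
    if ideal_exposure <= limit:
        return Fraction(1)  # no extra gain needed
    return ideal_exposure / limit


print(longest_exposure(30))                # 1/30
print(required_gain(Fraction(1, 8), 30))   # 15/4
print(required_gain(Fraction(1, 60), 30))  # 1
```

Under this model, a scene that would ideally need 1/8 second at 30 frames/second has to be captured at 1/30 second with 3.75 times the gain, which is where the additional random noise comes from.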

図３は、感度による画質の変化と、画質の変化が動体検出の精度に与える影響を模式的に示す図である。 FIG. 3 is a diagram schematically illustrating how image quality changes with sensitivity and how that change affects the accuracy of moving object detection.
図３（ａ）は、通常の感度で撮影された画像を、図３（ｂ）は、図３（ａ）よりも高い感度で撮像された画像を、図３（ｃ）は、図３（ｂ）よりも高い感度で撮像された画像をそれぞれ模式的に示している。図３（ｂ）でランダムノイズ成分が現れ、図３（ｃ）ではさらに増加している。 FIG. 3A schematically shows an image captured at normal sensitivity, FIG. 3B an image captured at a higher sensitivity than FIG. 3A, and FIG. 3C an image captured at a higher sensitivity than FIG. 3B. A random noise component appears in FIG. 3B and increases further in FIG. 3C.

例えば、図３（ａ）の状態が継続すれば、まず被写体領域を顔検出によって検出し、その後は動体検出に切り替えて被写体領域を追尾する方法を採用しても、動体検出による誤検出の可能性は低い。なお、顔枠３１は、検出されている被写体領域（顔領域）をユーザに知らせるためにライブ画像上に重畳表示されるマークの一例である。 For example, as long as the state of FIG. 3A continues, the possibility of erroneous detection by moving object detection is low even if the subject area is first detected by face detection and then tracked by switching to moving object detection. Note that the face frame 31 is an example of a mark superimposed on the live image to notify the user of the detected subject area (face area).

しかし、例えば図３（ａ）の状態で顔検出し、動体検出に切り替えた後で図３（ｂ）の状態に変化すると、本来の被写体領域の周辺領域（点線の枠３２で示される領域）を誤検出する可能性が出てくる。その結果、画像間で被写体が動いていなくても、検出される被写体領域は矢印で示す範囲でばらつき始める。 However, for example, when face detection is performed in the state of FIG. 3A and the state is changed to the state of FIG. 3B after switching to moving body detection, the peripheral region of the original subject region (the region indicated by the dotted frame 32) There is a possibility of false detection. As a result, even if the subject does not move between images, the detected subject region starts to vary within the range indicated by the arrows.

さらにゲインが上昇して図３（ｃ）の状態になると、ランダムノイズ成分が増加することにより、誤検出によるばらつきの範囲がさらに広範囲となる。 When the gain further increases and the state shown in FIG. 3C is reached, the random noise component increases, so that the range of variation due to erroneous detection becomes wider.

その結果、被写体領域でない領域を移動後の被写体領域として誤検出してしまう。しかも、動体検出では、誤検出しても、検出した領域が正しい被写体領域（この場合は顔領域）かどうかを確認できないので、次の顔検出周期までの間に実施する動体検出において誤検出した被写体領域を追尾しつづけてしまう可能性もある。 As a result, an area that is not the subject area is erroneously detected as the subject area after movement. Moreover, moving object detection cannot confirm whether the detected area is the correct subject area (in this case, the face area), so the erroneously detected area may continue to be tracked by the moving object detection performed until the next face detection cycle.

そのため、本実施形態では、制御部１０２が、撮像画像に適用されるゲイン調整量が予め定めた閾値を超えているかどうかを判別する（Ｓ２０８）。ゲイン調整量はＥＶＦ画像を撮影する際の感度の設定に応じた値であってよく、この場合、閾値は例えばISO 1600に相当するゲイン調整量であってよい。そして、ゲイン調整量が閾値を超えている場合には、顔検出と動体検出の併用を中止し、顔検出のみに基づいた被写体領域の決定に変更する（Ｓ２０９）。ゲイン調整量の増加によって画像に重畳するノイズ成分は、顔検出にとっても好ましくないが、顔検出では目や頬といった顔特有の特徴量を抽出するため、ノイズ成分の存在により顔以外の領域を誤検出する確率は、動体検出に比較してはるかに小さい。 Therefore, in this embodiment, the control unit 102 determines whether the gain adjustment amount applied to the captured image exceeds a predetermined threshold (S208). The gain adjustment amount may be a value corresponding to the sensitivity setting used when capturing the EVF image; in this case, the threshold may be, for example, the gain adjustment amount corresponding to ISO 1600. If the gain adjustment amount exceeds the threshold, the combined use of face detection and moving object detection is stopped, and the subject area is determined based on face detection only (S209). The noise component superimposed on the image by an increased gain adjustment amount is undesirable for face detection as well. However, because face detection extracts face-specific feature quantities such as the eyes and cheeks, the probability that noise causes it to erroneously detect a non-face area is far smaller than for moving object detection.

具体的には、Ｓ２０９で制御部１０２は、直近の顔検出で検出された顔領域を、次の顔検出周期までの間のＥＶＦ画像における被写体領域として用いる。この場合、ゲイン調整量が閾値以下に低下するまでの間、動体検出部１１５による動体検出は行うが、その検出結果を使用しないようにしてもよいし、動体検出部１１５による動体検出動作そのものを行わないようにしてもよい。後者の場合、制御部１０２は、停止手段として機能する。 Specifically, in S209, the control unit 102 uses the face area detected by the most recent face detection as the subject area in the EVF images until the next face detection cycle. In this case, until the gain adjustment amount falls to or below the threshold, the moving object detection unit 115 may continue to perform moving object detection with its results left unused, or the moving object detection operation of the moving object detection unit 115 may itself be suspended. In the latter case, the control unit 102 functions as a stopping means.

一方、Ｓ２０８において、ゲイン調整量が予め定めた閾値以下であれば、制御部１０２は、顔検出の検出結果に基づく動体検出により追尾された領域を被写体領域として用いる。 On the other hand, if the gain adjustment amount is equal to or smaller than the predetermined threshold value in S208, the control unit 102 uses the area tracked by the moving object detection based on the detection result of the face detection as the subject area.
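The branch in S208 and S209 can be sketched as follows. The numeric value of `ISO_1600_GAIN`, the tuple representation of a region, and the function name `select_subject_region` are hypothetical; the fallback to the face area when no tracked area exists is also an assumption not stated in the description.

```python
ISO_1600_GAIN = 16.0  # hypothetical gain adjustment amount equivalent to ISO 1600


def select_subject_region(gain, face_region, tracked_region,
                          threshold=ISO_1600_GAIN):
    """Pick the subject region for the current EVF image (S208/S209).

    face_region: region from the most recent face detection, or None.
    tracked_region: region tracked by moving object detection, or None.
    """
    if gain > threshold:
        # S209: gain exceeds the threshold, so rely on face detection alone.
        return face_region
    # Gain at or below the threshold: use the tracked region. Falling back
    # to the face region when nothing has been tracked yet is an assumption.
    return tracked_region if tracked_region is not None else face_region


face = (40, 30, 20, 20)     # (left, top, width, height)
tracked = (44, 31, 20, 20)
print(select_subject_region(8.0, face, tracked))   # (44, 31, 20, 20)
print(select_subject_region(32.0, face, tracked))  # (40, 30, 20, 20)
```

Note that the comparison is strict: a gain exactly equal to the threshold still uses the tracked region, matching the "exceeds" and "equal to or smaller than" wording of S208.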

制御部１０２は、検出された被写体領域をユーザに知らせるため、ＥＶＦ画像に被写体領域を示す枠（顔枠３１）を重畳表示することができる。ゲイン調整量が閾値を超える場合には顔検出の結果のみを用いて枠表示を行うことで、図３（ｂ）や図３（ｃ）に点線で示したような顔枠３１の不規則な移動を回避し、安定した枠表示が実現できる。 To notify the user of the detected subject area, the control unit 102 can superimpose a frame indicating the subject area (face frame 31) on the EVF image. When the gain adjustment amount exceeds the threshold, the frame is displayed using only the face detection result. This avoids the irregular movement of the face frame 31 indicated by the dotted lines in FIGS. 3B and 3C and achieves a stable frame display.

図４は、本実施形態における枠表示の例を模式的に示す図であり、図４（ａ）〜図４（ｃ）はそれぞれ図３（ａ）〜図３（ｃ）と同じＥＶＦ画像に対応している。 FIG. 4 is a diagram schematically showing an example of frame display in the present embodiment; FIGS. 4A to 4C correspond to the same EVF images as FIGS. 3A to 3C, respectively.
図４（ａ）の状態で顔検出が行われ、その後ゲイン調整量が上昇し、図４（ｂ）の状態ではゲイン調整量が閾値を超えたものとする。この場合、図４（ｂ）、図４（ｃ）に示すように、図４（ａ）で顔検出により得られた被写体領域の情報をそのまま用いて顔枠３３を表示する。そのため、図３（ｂ）や図３（ｃ）に示したような、動体検出精度の低下によって顔枠の表示位置が不安定になるという現象を回避することができる。 Assume that face detection is performed in the state of FIG. 4A, that the gain adjustment amount then increases, and that it exceeds the threshold in the state of FIG. 4B. In this case, as shown in FIGS. 4B and 4C, the face frame 33 is displayed using the subject area information obtained by face detection in FIG. 4A as it is. This avoids the phenomenon shown in FIGS. 3B and 3C, in which the display position of the face frame becomes unstable because of reduced moving object detection accuracy.

なお、ゲイン調整量の閾値は一例であり、他の値であってもよい。また、ゲイン調整量の閾値は固定値である必要はなく、撮像素子１０３のベースノイズレベルや他のノイズ要因に応じて可変であってよい。例えば、ＣＣＤイメージセンサの温度が高いほど、撮影画像中のノイズレベルが高くなるため、ゲイン調整量の閾値を下げる。 The gain adjustment amount threshold is an example, and may be another value. Further, the threshold value of the gain adjustment amount does not need to be a fixed value, and may be variable according to the base noise level of the image sensor 103 and other noise factors. For example, the higher the temperature of the CCD image sensor, the higher the noise level in the captured image, so the gain adjustment amount threshold is lowered.
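One way to make the threshold depend on sensor temperature is sketched below. The description only states that the threshold is lowered as the temperature rises; the linear relationship, the reference temperature, the floor, and every numeric value here are hypothetical.

```python
def gain_threshold(sensor_temp_c, base_threshold=16.0, ref_temp_c=25.0,
                   drop_per_deg=0.25, floor=4.0):
    """Return a gain threshold that decreases as the sensor gets hotter.

    All parameters are illustrative assumptions: the threshold stays at
    base_threshold up to ref_temp_c, then drops linearly, never below floor.
    """
    excess = max(0.0, sensor_temp_c - ref_temp_c)
    return max(floor, base_threshold - drop_per_deg * excess)


print(gain_threshold(25.0))   # 16.0
print(gain_threshold(45.0))   # 11.0
print(gain_threshold(100.0))  # 4.0
```

In practice the mapping would more likely come from a calibration table measured for the specific image sensor.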

Ｓ２１１で制御部１０２は、信号ＳＷ１が通知されているかどうかを判別する。そして、信号ＳＷ１が通知されていなければ処理をＳ２０２へ戻してＥＶＦ表示処理を継続する。 In S211, the control unit 102 determines whether or not the signal SW1 is notified. If the signal SW1 is not notified, the process returns to S202 and the EVF display process is continued.

一方、Ｓ２１１で制御部１０２は、信号ＳＷ１が通知されていれば、Ｓ２１２において、ＡＥ制御を、Ｓ２１３においてＡＦ制御を行う。この際、被写体領域が検出されていれば、制御部１０２は被写体領域が適正露出になるようにＡＥ制御を行ったり、被写体領域に合焦するようにＡＦ制御を行ったりすることができる。被写体領域が検出されていない場合、制御部１０２は予め定められた露出制御モードや焦点検出領域に基づいて、ＡＥ制御やＡＥ制御を行うことができる。 On the other hand, if the signal SW1 has been notified in S211, the control unit 102 performs AE control in S212 and AF control in S213. At this time, if a subject area has been detected, the control unit 102 can perform AE control so that the subject area is properly exposed, and AF control so that the subject area is in focus. If no subject area has been detected, the control unit 102 can perform AE control and AF control based on a predetermined exposure control mode and focus detection area.

次に制御部１０２は信号ＳＷ２が通知されたか否かを判別し（Ｓ２１４）、通知されていればＡＥ制御及びＡＦ制御で設定した露出条件で本撮影処理を行う（Ｓ２１６）。一方、信号ＳＷ２が通知されていなければ、制御部１０２は信号ＳＷ１の状態を確認し（Ｓ２１５）、通知が解除されたならば処理をＳ２０２へ、通知が継続していれば処理をＳ２１４へ戻す。 Next, the control unit 102 determines whether the signal SW2 has been notified (S214); if so, it performs the main photographing process under the conditions set by the AE control and the AF control (S216). If the signal SW2 has not been notified, the control unit 102 checks the state of the signal SW1 (S215). If the SW1 notification has been cancelled, the process returns to S202; if it continues, the process returns to S214.
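The branching among S211, S214, and S215 can be summarized as a small transition function. The step labels follow the flowchart described above; the function name `next_step` and the representation of SW1 and SW2 as booleans are assumptions for illustration.

```python
def next_step(step, sw1, sw2):
    """Return the next flowchart step for the decision steps S211, S214, S215.

    sw1, sw2: True while the corresponding signal is being notified.
    """
    if step == "S211":
        # SW1 notified: go on to AE control (S212); otherwise keep EVF display.
        return "S212" if sw1 else "S202"
    if step == "S214":
        # SW2 notified: main photographing (S216); otherwise check SW1 again.
        return "S216" if sw2 else "S215"
    if step == "S215":
        # SW1 still held: wait for SW2; released: back to EVF display.
        return "S214" if sw1 else "S202"
    raise ValueError(f"not a decision step: {step}")


print(next_step("S211", sw1=True, sw2=False))   # S212
print(next_step("S214", sw1=True, sw2=False))   # S215
print(next_step("S215", sw1=False, sw2=False))  # S202
```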

以上説明したように、本実施形態は、画像認識による被写体検出と、画像認識により検出された被写体領域を動体検出により追尾する被写体検出とが可能な画像処理装置において、撮像データに適用されるゲイン調整量を監視する。そして、ゲイン調整量が予め定めた閾値を超える場合には画像認識による被写体検出結果のみに基づく被写体領域を用いる。また、ゲイン調整量が閾値以下の場合は、動体検出による被写体検出の結果に基づく被写体領域をさらに用いる。 As described above, in an image processing apparatus capable of both subject detection by image recognition and subject detection that tracks, by moving object detection, the subject area detected by image recognition, the present embodiment monitors the gain adjustment amount applied to the imaging data. When the gain adjustment amount exceeds a predetermined threshold, a subject area based only on the result of subject detection by image recognition is used. When the gain adjustment amount is equal to or smaller than the threshold, a subject area based on the result of subject detection by moving object detection is used as well.

ゲイン調整量が閾値を超え、動体検出の精度が低下する可能性が高い状態では画像認識による被写体検出の結果のみを用いるので、追従性は多少犠牲になるが、被写体領域の誤検出を防止することが可能である。その結果、本撮影時に、被写体からずれた背景へ合焦制御したり、露出設定を誤ったりするという問題を回避し、ＡＥ制御やＡＦ制御の精度を向上させることができる。 When the gain adjustment amount exceeds the threshold value and there is a high possibility that the accuracy of moving object detection will decrease, only the result of subject detection by image recognition is used, so tracking performance is sacrificed somewhat, but erroneous detection of the subject region is prevented. It is possible. As a result, it is possible to avoid problems such as focusing control on a background deviated from the subject or incorrect exposure setting during actual photographing, and improving the accuracy of AE control and AF control.

また、被写体領域を示す表示（枠表示など）を行う場合、動体検出による被写体領域の誤検出を原因とする表示位置の変動を防止し、安定した表示を行うことができる。 In addition, when displaying the subject area (frame display or the like), it is possible to prevent a change in the display position caused by erroneous detection of the subject area due to moving object detection, and to perform stable display.

一方で、ゲイン調整量が閾値以下の場合は、画像認識による被写体検出に加え、動体検出による被写体検出結果に基づく被写体領域を用いるので、精度と追従性とを兼ね備えた被写体検出を実現することができる。 On the other hand, when the gain adjustment amount is equal to or smaller than the threshold value, the subject area based on the subject detection result by moving object detection is used in addition to the subject detection by image recognition, so that subject detection having both accuracy and followability can be realized. it can.

＜第２の実施形態＞ <Second Embodiment>
次に、本発明の第２の実施形態について説明する。第１の実施形態においては、動体検出の精度が低下する要因の１つであるノイズ成分に着目し、ゲイン調整量が予め定めた閾値を超えるか否かに応じて、被写体領域の検出に画像認識と動体検出を併用するか、画像認識のみを用いるかを切り替えていた。 Next, a second embodiment of the present invention will be described. The first embodiment focused on the noise component, which is one of the factors that reduce the accuracy of moving object detection, and switched between using image recognition together with moving object detection and using image recognition alone to detect the subject area, depending on whether the gain adjustment amount exceeds a predetermined threshold.

本実施形態では、動体検出の精度を低下させる別の要因である被写体ぶれに着目し、露光時間（シャッタースピード）が予め定めた閾値より長いか否かに応じて、被写体領域の検出に画像認識と動体検出を併用するか、画像認識のみを用いるかを切り替える。 This embodiment focuses on subject blur, another factor that reduces the accuracy of moving object detection, and switches between using image recognition together with moving object detection and using image recognition alone to detect the subject area, depending on whether the exposure time (shutter speed) is longer than a predetermined threshold.

遅いシャッタースピードで動体を撮像した場合、被写体ぶれが起こりやすい。そのため、画像間の相関に基づく動体検出を行うことが困難となり、顔検出によって検出された顔領域の移動先が検出できないことが起こりうる。一方、顔検出は演算に時間がかかるが、被写体ぶれに対しても動体検出より強く、動体検出よりも高い被写体検出精度を維持できる。 When moving objects are imaged at a slow shutter speed, subject blurring tends to occur. For this reason, it is difficult to detect a moving object based on the correlation between images, and the destination of the face area detected by face detection may not be detected. On the other hand, the face detection takes time to calculate, but it is stronger than the moving object detection against the subject blur and can maintain the higher object detection accuracy than the moving object detection.

図６は、シャッタースピードが遅い場合の動体検出の問題点を模式的に説明するための図である。図６において、被写体（人物）が、ＥＶＦ画像のシャッタースピードではぶれてしまう速度で移動しており、例えば図６（ａ）において顔検出に成功したとする。この場合、図６（ｂ）から動体検出に切り替えても、図６（ａ）で検出された顔領域が図６（ｂ）の画像中のどこに移動したのか画像間の相関に基づいて正しく検出することはできない。その結果、被写体領域を正しく追尾できない。また、動体検出では検出した領域が顔かどうか判別できないため、その後の動体検出において、顔でない領域を追尾し続けてしまう。（図６（ｃ）〜図６（ｅ））。その結果、顔枠３１の表示位置が被写体とずれたままになってしまう。 FIG. 6 is a diagram for schematically explaining the problem with moving object detection when the shutter speed is slow. In FIG. 6, assume that the subject (a person) is moving at a speed that causes blur at the shutter speed of the EVF image, and that face detection succeeds in, for example, FIG. 6A. In this case, even if processing switches to moving object detection from FIG. 6B, the position to which the face area detected in FIG. 6A has moved in the image of FIG. 6B cannot be detected correctly from the correlation between the images. As a result, the subject area cannot be tracked correctly. In addition, because moving object detection cannot determine whether the detected area is a face, subsequent moving object detection continues to track an area that is not a face (FIGS. 6C to 6E). As a result, the display position of the face frame 31 remains displaced from the subject.

図５は、本実施形態のデジタルカメラの動作を説明するフローチャートであり、図２と同じ動作ステップについては同じ参照数字を付し、重複する説明を省略する。また、本実施形態のデジタルカメラは、第１の実施形態において図１を用いて説明したデジタルカメラ１００と同一構成であってよいため、以下においても、図１に示した構成要素を用いて説明する。 FIG. 5 is a flowchart for explaining the operation of the digital camera according to the present embodiment. Operation steps that are the same as in FIG. 2 are given the same reference numerals, and redundant description is omitted. Because the digital camera of the present embodiment may have the same configuration as the digital camera 100 described with reference to FIG. 1 in the first embodiment, the following description also uses the components shown in FIG. 1.

図５と図２との比較から明らかなように、Ｓ５０８における判定処理以外は第１の実施形態と同一の処理であってよいため、Ｓ５０８における判定処理についてのみ説明する。上述の通り、本実施形態では、露光時間（シャッタースピード）が予め定めた閾値（例えば１／３０秒）よりも長い（遅い）場合に、顔検出と動体検出の併用による被写体領域の検出から、顔検出のみによる被写体領域の検出へ切り替える。 As is clear from the comparison between FIG. 5 and FIG. 2, except for the determination processing in S508, the same processing as in the first embodiment may be performed, and therefore only the determination processing in S508 will be described. As described above, in the present embodiment, when the exposure time (shutter speed) is longer (slower) than a predetermined threshold (for example, 1/30 second), the detection of the subject area by the combined use of face detection and moving object detection, Switch to subject area detection by face detection only.

図７は、本実施形態の効果を模式的に示す図である。図７（ａ）において、露光時間が閾値よりも長いと判定された場合、顔検出のみに基づいて被写体領域を設定することで、追従性は多少犠牲になる（図７（ｄ））が、顔検出時に正しく被写体検出できる（図７（ｅ））ため、顔枠３１が被写体からずれない。一方、動体検出を併用した場合、図７（ｂ）、図７（ｃ）に示すように、顔枠３１が被写体からずれてしまう。 FIG. 7 is a diagram schematically illustrating the effect of the present embodiment. In FIG. 7A, when it is determined that the exposure time is longer than the threshold, setting the subject area based only on the face detection sacrifices the tracking ability somewhat (FIG. 7D). Since the subject can be detected correctly when the face is detected (FIG. 7E), the face frame 31 does not deviate from the subject. On the other hand, when the moving object detection is used together, as shown in FIGS. 7B and 7C, the face frame 31 is displaced from the subject.

以上説明したように、本実施形態によっても、第１の実施形態と同様の効果を実現できる。なお、本実施形態における露光時間の閾値は固定値である必要はない。例えば、顔検出周期に得られる被写体領域間から被写体の動きベクトルを算出し、被写体の移動速度が速い（動きベクトルが大きい）ほど閾値を短く設定してもよい。 As described above, the present embodiment can achieve the same effects as those of the first embodiment. Note that the threshold value of the exposure time in this embodiment does not have to be a fixed value. For example, the motion vector of the subject may be calculated from between the subject regions obtained in the face detection cycle, and the threshold may be set shorter as the subject moving speed is faster (the motion vector is larger).
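A variable exposure-time threshold of the kind suggested above might look like the following sketch. The description only says that a faster subject should get a shorter threshold; the specific rule (limit the blur to a fixed number of pixels), the parameter values, and the name `exposure_threshold` are hypothetical.

```python
import math


def exposure_threshold(motion_vector, face_cycle_s=0.5,
                       base_threshold_s=1 / 30, max_blur_px=2.0):
    """Return an exposure-time threshold (seconds) that shrinks for fast subjects.

    motion_vector: (dx, dy) in pixels between subject areas obtained in two
    consecutive face detection cycles, face_cycle_s seconds apart.
    The blur-budget rule and all default values are illustrative assumptions.
    """
    speed_px_per_s = math.hypot(*motion_vector) / face_cycle_s
    if speed_px_per_s == 0:
        return base_threshold_s  # stationary subject: keep the base threshold
    # Longest exposure for which the subject moves at most max_blur_px.
    return min(base_threshold_s, max_blur_px / speed_px_per_s)


print(exposure_threshold((0, 0)) == 1 / 30)  # True
print(exposure_threshold((30, 40)))          # 0.02
print(exposure_threshold((60, 80)))          # 0.01
```

With these values, a subject moving 50 pixels per face detection cycle lowers the threshold from 1/30 second to 1/50 second, and doubling its speed halves the threshold again.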

＜他の実施形態＞ <Other Embodiments>
なお、第１の実施形態と本実施形態とは組み合わせて実施することも可能である。すなわち、ゲイン調整量が閾値を超えているか、露光時間が閾値よりも長い場合には、顔検出と動体検出の併用による被写体領域の検出から、顔検出のみによる被写体領域の検出へ切り替えることができる。 The first and second embodiments can also be implemented in combination. That is, when the gain adjustment amount exceeds its threshold or the exposure time is longer than its threshold, detection of the subject area can be switched from the combined use of face detection and moving object detection to face detection only.
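The combined condition is a logical OR of the two threshold tests, as in this minimal sketch. The function name `face_detection_only` and the default threshold values are hypothetical; 1/30 second is the example exposure threshold given for the second embodiment.

```python
def face_detection_only(gain, exposure_s, gain_threshold=16.0,
                        exposure_threshold_s=1 / 30):
    """Return True when moving object detection results should not be used.

    True if the gain adjustment amount exceeds its threshold OR the exposure
    time is longer than its threshold; the default values are illustrative.
    """
    return gain > gain_threshold or exposure_s > exposure_threshold_s


print(face_detection_only(8.0, 1 / 60))   # False
print(face_detection_only(32.0, 1 / 60))  # True
print(face_detection_only(8.0, 1 / 15))   # True
```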

上述の実施形態は、システム或は装置のコンピュータ（或いはＣＰＵ、ＭＰＵ等）によりソフトウェア的に実現することも可能である。 The above-described embodiments can also be realized in software by a computer (or CPU, MPU, etc.) of a system or apparatus.
従って、上述の実施形態をコンピュータで実現するために、該コンピュータに供給されるコンピュータプログラム自体も本発明を実現するものである。つまり、上述の実施形態の機能を実現するためのコンピュータプログラム自体も本発明の一つである。 Therefore, a computer program supplied to a computer in order to realize the above-described embodiments on that computer itself also realizes the present invention. That is, the computer program for realizing the functions of the above-described embodiments is itself also one aspect of the present invention.

なお、上述の実施形態を実現するためのコンピュータプログラムは、コンピュータで読み取り可能であれば、どのような形態であってもよい。例えば、オブジェクトコード、インタプリタにより実行されるプログラム、ＯＳに供給するスクリプトデータ等で構成することができるが、これらに限るものではない。 The computer program for realizing the above-described embodiment may be in any form as long as it can be read by a computer. For example, it can be composed of object code, a program executed by an interpreter, script data supplied to the OS, but is not limited thereto.

上述の実施形態を実現するためのコンピュータプログラムは、記憶媒体又は有線／無線通信によりコンピュータに供給される。プログラムを供給するための記憶媒体としては、例えば、フレキシブルディスク、ハードディスク、磁気テープ等の磁気記憶媒体、ＭＯ、ＣＤ、ＤＶＤ等の光／光磁気記憶媒体、不揮発性の半導体メモリなどがある。 A computer program for realizing the above-described embodiment is supplied to a computer via a storage medium or wired / wireless communication. Examples of the storage medium for supplying the program include a magnetic storage medium such as a flexible disk, a hard disk, and a magnetic tape, an optical / magneto-optical storage medium such as an MO, CD, and DVD, and a nonvolatile semiconductor memory.

有線／無線通信を用いたコンピュータプログラムの供給方法としては、コンピュータネットワーク上のサーバを利用する方法がある。この場合、本発明を形成するコンピュータプログラムとなりうるデータファイル（プログラムファイル）をサーバに記憶しておく。プログラムファイルとしては、実行形式のものであっても、ソースコードであっても良い。 As a computer program supply method using wired / wireless communication, there is a method of using a server on a computer network. In this case, a data file (program file) that can be a computer program forming the present invention is stored in the server. The program file may be an executable format or a source code.

そして、このサーバにアクセスしたクライアントコンピュータに、プログラムファイルをダウンロードすることによって供給する。この場合、プログラムファイルを複数のセグメントファイルに分割し、セグメントファイルを異なるサーバに分散して配置することも可能である。 The program file is then supplied by download to a client computer that has accessed the server. In this case, the program file can be divided into a plurality of segment files, and the segment files can be distributed across different servers.
つまり、上述の実施形態を実現するためのプログラムファイルをクライアントコンピュータに提供するサーバ装置も本発明の一つである。 That is, a server apparatus that provides a client computer with a program file for realizing the above-described embodiments is also one aspect of the present invention.

また、上述の実施形態を実現するためのコンピュータプログラムを暗号化して格納した記憶媒体を配布し、所定の条件を満たしたユーザに、暗号化を解く鍵情報を供給し、ユーザの有するコンピュータへのインストールを許可してもよい。鍵情報は、例えばインターネットを介してホームページからダウンロードさせることによって供給することができる。 Alternatively, a storage medium storing an encrypted computer program for realizing the above-described embodiments may be distributed, key information for decryption may be supplied to users who satisfy a predetermined condition, and installation on the user's computer may thereby be permitted. The key information can be supplied, for example, by having the user download it from a homepage via the Internet.

また、上述の実施形態を実現するためのコンピュータプログラムは、すでにコンピュータ上で稼働するＯＳの機能を利用するものであってもよい。 Further, the computer program for realizing the above-described embodiments may use functions of an OS already running on the computer.
さらに、上述の実施形態を実現するためのコンピュータプログラムは、その一部をコンピュータに装着される拡張ボード等のファームウェアで構成してもよいし、拡張ボード等が備えるＣＰＵで実行するようにしてもよい。 Further, part of the computer program for realizing the above-described embodiments may be implemented as firmware of an expansion board or the like attached to the computer, or may be executed by a CPU provided on the expansion board or the like.

Claims

An image processing apparatus comprising:
first detection means for detecting a predetermined subject existing in an image of a frame of a moving image by detecting, from the image of the frame, a part having a predetermined feature;
second detection means for tracking, in the moving image, an area where the predetermined subject detected by the first detection means exists, by searching for a similar area between images of frames; and
determining means for determining an area where the predetermined subject exists in the moving image based on at least one of a detection result by the first detection means and a detection result by the second detection means,
wherein, when a predetermined condition is satisfied, the determining means determines the area where the predetermined subject exists based on the detection result by the first detection means and not on the detection result by the second detection means, and when the predetermined condition is not satisfied, the determining means determines the area where the predetermined subject exists based on the detection result by the first detection means and the detection result by the second detection means, and
wherein the predetermined condition is at least one of: a gain adjustment amount applied to the moving image exceeding a predetermined threshold, and an exposure time for an image of one frame being longer than a predetermined threshold.

The image processing apparatus according to claim 1, wherein the gain adjustment amount is a value that increases as the imaging sensitivity setting used when the image is captured becomes higher.

The image processing apparatus according to claim 1, wherein the first detection unit detects a human face as the predetermined subject.

A control method for an image processing apparatus, comprising:
a first detection step in which first detection means detects a predetermined subject existing in an image of a frame of a moving image by detecting, from the image of the frame, a part having a predetermined feature;
a second detection step in which second detection means tracks, in the moving image, an area where the predetermined subject detected in the first detection step exists, by searching for a similar area between images of frames; and
a determining step in which determining means determines an area where the predetermined subject exists in the moving image based on at least one of a detection result in the first detection step and a detection result in the second detection step,
wherein, in the determining step, when a predetermined condition is satisfied, the determining means determines the area where the predetermined subject exists based on the detection result in the first detection step and not on the detection result in the second detection step, and when the predetermined condition is not satisfied, the determining means determines the area where the predetermined subject exists based on the detection result in the first detection step and the detection result in the second detection step, and
wherein the predetermined condition is at least one of: a gain adjustment amount applied to the moving image exceeding a predetermined threshold, and an exposure time for an image of one frame being longer than a predetermined threshold.

A program for causing a computer to function as each means of the image processing apparatus according to any one of claims 1 to 3.