JP7479803B2 - Image processing device and image processing method

Info

Publication number: JP7479803B2
Application number: JP2019158661A
Authority: JP
Inventors: 佳絵伊藤
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-08-30
Filing date: 2019-08-30
Publication date: 2024-05-09
Anticipated expiration: 2039-08-30
Also published as: JP2021040183A

Description

本発明は、被写体を検出し、検出した被写体位置を表示する装置に関する。 The present invention relates to a device that detects a subject and displays the detected subject position.

撮像素子を使用したデジタルカメラにおいて、撮像素子から得られた画像データから被写体の検出及び追尾を行い、その被写体に対してピント、明るさ、色を好適な状態に合わせて撮影することが一般的になっている。検出対象となる被写体として一般的なものとしては、人物の顔や人体、あるいは犬猫などの特定の動物などが知られている。さらに、検出した被写体の特定の部位を検出する技術として、顔の中の目（瞳）、鼻、口といった器官検出がある。器官検出の代表的な使用用途としては、検出された目を焦点検出領域に設定しオートフォーカスを行う瞳ＡＦがある。特許文献１には、統計データ上の顔の輪郭領域からの瞳領域の相対位置を用いて、顔輪郭情報から瞳領域を推定する方法が開示されている。 In digital cameras that use an image sensor, it is common to detect and track a subject from image data obtained from the image sensor, and then photograph the subject by adjusting the focus, brightness, and color to optimal conditions. Typical subjects to be detected include a person's face or body, or specific animals such as dogs and cats. Furthermore, organ detection, such as the eyes (pupils), nose, and mouth in the face, is a technique for detecting specific parts of a detected subject. A typical use of organ detection is pupil AF, which sets the detected eye as the focus detection area and performs autofocus. Patent Document 1 discloses a method for estimating the pupil area from face contour information, using the relative position of the pupil area from the face contour area in statistical data.

Patent Document 1: JP 2009-175744 A

カメラから遠い、すなわち小さな被写体では、被写体に対する情報が減少するため、検出精度が出にくい。このような小さな被写体に対して瞳検出を行う場合に、既存の手法では課題が存在している。 When the subject is far from the camera, i.e. small, the amount of information about the subject is reduced, making it difficult to achieve accurate detection. When performing pupil detection on such small subjects, existing methods pose problems.

そこで、本発明では、先行技術と比較して、小さい被写体に対してリアルタイム性を損なうことなく精度よく瞳位置を推定して表示する技術を提供することを目的としている。 Therefore, the present invention aims to provide a technology that can estimate and display the pupil position with high accuracy without compromising real-time performance for small subjects, compared to the prior art.

そこで、本発明の画像処理装置は、表示手段に撮像画像およびアイコンを表示する表示制御手段と、撮像画像から顔および目を検出する検出手段と、第１の撮像画像において設定された顔領域を前記検出手段による検出よりも簡易な検出手法で第２の撮像画像から検出し追尾する追尾手段と、前記検出手段が検出した目の位置、顔の位置および顔のサイズ情報をもとに顔に対する目の相対位置を決定する決定手段とを有し、前記決定手段は、第２の撮像画像において前記追尾手段が検出した前記顔領域の顔位置および顔サイズに対して、前記顔に対する目の相対位置関係を当てはめて目の推定位置を決定し、前記表示制御手段は、前記決定された目の推定位置を示すアイコンを表示手段に表示された前記第２の撮像画像に重畳表示することを特徴とする。 The image processing device of the present invention has a display control means for displaying a captured image and an icon on a display means, a detection means for detecting a face and eyes from the captured image, a tracking means for detecting and tracking a face area set in a first captured image from a second captured image using a detection method that is simpler than detection by the detection means, and a determination means for determining a relative position of the eyes to the face based on the eye position, face position and face size information detected by the detection means, the determination means determining an estimated position of the eyes by applying the relative positional relationship of the eyes to the face to the face position and face size of the face area detected by the tracking means in the second captured image, and the display control means superimposes and displays an icon indicating the determined estimated position of the eyes on the second captured image displayed on the display means.

本発明によれば、小さい被写体に対してリアルタイム性を損なうことなく精度よく瞳位置を推定して表示することができる。 The present invention makes it possible to accurately estimate and display the pupil position for small subjects without compromising real-time performance.

FIG. 1: Configuration diagram for implementing the present invention
FIG. 2: Diagram of the face and organ detection method
FIG. 3: Diagram of the tracking method
FIG. 4: Timing diagram in the first embodiment
FIG. 5: Diagram showing the pupil position estimation method in the first embodiment
FIG. 6: Flowchart for estimating the pupil position when the face is small in the first embodiment
FIG. 7: Diagram showing the method for correcting the display frame position according to the face angle in the second embodiment
FIG. 8: Diagram showing the method for correcting the display frame size according to the face angle in the second embodiment

Various embodiments of the present invention will be described with reference to the accompanying drawings. Each embodiment takes an imaging device having a pupil detection function as an example. Imaging devices having a pupil detection function include video cameras, digital cameras, and silver halide still cameras; mobile devices equipped with a camera function, such as smartphones, also constitute one aspect of the present invention.

In this embodiment, to notify the user that pupil AF has been activated and where, a mark such as a frame is displayed on the pupil AF area, superimposed on the image data of the subject shown on the camera's display device. Particularly for moving subjects, real-time performance is required in order to inform the user accurately of the pupil AF position. It is also preferable that pupil AF be activated, and its frame displayed, even for subjects as far from the camera as possible, so that the photographer has ample time to prepare for shooting.

Known methods for face detection and organ detection include methods using Haar-like features (the brightness difference between two areas) and deep learning. Another known approach uses general-purpose object tracking based on template matching: an image area containing the organ position at a certain point in time is stored as a template, and in subsequent frames pattern matching searches for the area most similar to the template. Yet another known approach estimates the position of an organ in a simplified way from other information.

In general, organ detection means are highly accurate but take a long time to perform detection, which impairs real-time performance. Even more time is required when face detection and organ detection are configured as separate means in the device. For example, multiple face candidates are detected within the screen, a face that can be an AF target is selected from among them, and only then is the face area passed to the organ detection means, which analyzes it in more detail to detect the organs. When face detection and organ detection are configured separately in this way, their order of execution is fixed, and depending on how long organ detection takes, its result lags behind the time at which face detection was executed.

On the other hand, tracking means based on template matching take little time to detect and make real-time performance easy to secure, but they have problems with accuracy. Because template matching searches for similar patterns in an image, it may hit a similar but different subject instead of the one intended. An even smaller eye within a small face does not provide sufficient resolution for tracking, so matching accuracy is insufficient and mistracking occurs. The method described in Patent Document 1, which estimates the position of an organ in a simplified way from other information, has the advantage of a short detection time, but because it estimates the pupil area from the average shape of human faces, the estimate deviates for subjects whose facial features differ from the average.

そこで本実施形態では、器官検出処理とパターンマッチング処理とを併用することで瞳検出のリアルタイム性と精度を両立することを特徴とする。 Therefore, this embodiment is characterized by combining organ detection processing and pattern matching processing to achieve both real-time and accurate pupil detection.

Figure 1 shows a configuration example of an imaging device according to a first embodiment of the present invention, illustrating the configuration of a mirrorless camera (hereinafter, the camera) equipped with a pupil AF function.

交換レンズ１００は、カメラ本体部１２０に装着可能な光学機器のうちの一つである。交換レンズ１００は、主撮影光学系１０２、光量調節を行う絞り１０３、およびピント調節を行うフォーカスレンズ群１０４を含む撮影レンズユニット１０１を備える。 The interchangeable lens 100 is one of the optical devices that can be attached to the camera body 120. The interchangeable lens 100 has a photographing lens unit 101 that includes a main photographing optical system 102, an aperture 103 that adjusts the amount of light, and a focus lens group 104 that adjusts the focus.

レンズシステム制御用マイクロコンピュータ（以下、レンズ制御部という）１１１は、絞り１０３の動作を制御する絞り制御部１１２、フォーカスレンズ群１０４の動作（駆動とも称する）を制御するフォーカスレンズ制御部１１３、などを備える。フォーカスレンズ制御部１１３は、カメラ本体部１２０から取得したフォーカスレンズ駆動情報に基づいてフォーカスレンズ群１０４を撮影レンズユニット１０１の光軸方向に駆動させ、カメラのピント調節を行う。 The lens system control microcomputer (hereinafter referred to as the lens control unit) 111 includes an aperture control unit 112 that controls the operation of the aperture 103, and a focus lens control unit 113 that controls the operation (also referred to as driving) of the focus lens group 104. The focus lens control unit 113 drives the focus lens group 104 in the optical axis direction of the photographing lens unit 101 based on focus lens driving information obtained from the camera body unit 120, and adjusts the focus of the camera.

The focus lens group 104 may have multiple focus lenses or only one. To simplify the drawing, a fixed focal length lens is shown here as the example of an interchangeable lens, but a lens with a variable focal length (zoom lens) may also be used. In the case of a zoom lens, the lens control unit 111 obtains focal length information from the output of an encoder that detects the zoom lens position. In the case of a lens equipped with an image stabilization function, the lens control unit 111 also controls the shift lens group for shake correction and the like.

カメラ本体部１２０は、露出制御に用いるシャッター１２１や、ＣＭＯＳ（相補型金属酸化膜半導体）センサ等の撮像素子１２２を備える。撮像素子１２２の出力する撮像信号は、アナログ信号処理回路１２３で処理された後、カメラ信号処理回路１２４に送られる。 The camera body 120 includes a shutter 121 used for exposure control and an image sensor 122 such as a CMOS (complementary metal-oxide semiconductor) sensor. The image signal output by the image sensor 122 is processed by an analog signal processing circuit 123 and then sent to a camera signal processing circuit 124.

カメラシステム制御用マイクロコンピュータ（以下、カメラ制御部という）１３１は、撮像装置全体を制御する。例えば、カメラ制御部１３１は不図示のシャッター駆動用のモータを駆動制御し、シャッター１２１を駆動する。メモリカード１２５は撮影された画像のデータを記録する記録媒体である。撮影者によって操作されるレリーズスイッチ１８１の押下状態がカメラ制御部１３１に送られ、その状態に応じて撮像した画像がメモリカード１２５に記憶される。 The camera system control microcomputer (hereafter referred to as the camera control unit) 131 controls the entire imaging device. For example, the camera control unit 131 drives and controls a motor for driving a shutter (not shown), and drives the shutter 121. The memory card 125 is a recording medium that records data on captured images. The pressed state of a release switch 181 operated by the photographer is sent to the camera control unit 131, and the captured image according to that state is stored on the memory card 125.

The image display unit 171 includes a display device such as a liquid crystal panel (LCD), on which the photographer monitors the image about to be captured and views captured images. The touch panel 172 is an operation unit that allows the photographer to specify coordinates on the image display unit 171 with a finger or a touch pen, and it can be configured integrally with the image display unit 171. One example is a built-in (in-cell) type, in which the touch panel is given a light transmittance that does not interfere with the display and is incorporated inside the display surface of the image display unit 171. The input coordinates on the touch panel 172 are then associated with the display coordinates on the image display unit 171. This makes it possible to configure a GUI (graphical user interface) in which the user appears to operate the screen displayed on the image display unit 171 directly. Operations on the touch panel 172 are managed by the camera control unit 131.

カメラ本体部１２０は、交換レンズ１００とのマウント面に、交換レンズ１００と通信を行うための通信端子であるマウント接点部１６１を備える。また、交換レンズ１００は、カメラ本体１２０とのマウント面に、カメラ本体１２０と通信を行うための通信端子であるマウント接点部１１４を備える。 The camera body 120 has a mount contact 161, which is a communication terminal for communicating with the interchangeable lens 100, on its mount surface with the interchangeable lens 100. The interchangeable lens 100 also has a mount contact 114, which is a communication terminal for communicating with the camera body 120, on its mount surface with the camera body 120.

The lens control unit 111 and the camera control unit 131 control communication so that serial communication is performed at predetermined timing via the mount contacts 114 and 161. Through this communication, focus lens driving information, aperture driving information, and the like are sent from the camera control unit 131 to the lens control unit 111, and optical information such as the focal length is sent from the lens control unit 111 to the camera control unit 131.

カメラ信号処理回路１２４は、顔情報検出部１４１を備え、さらに器官情報検出部１４２を備えており、器官情報検出部１４２は顔情報検出部１４１で検出した顔情報から、瞳、口などの器官情報を検出する。追尾部１４３は、アナログ信号処理回路１２３より撮像信号を受け取り、前記顔情報検出部１４１および器官情報検出部１４２よりも精度は劣るが高速な検出手法により瞳や顔を検出する。算出部１４４は、顔情報検出部１４１、器官情報検出部１４２、追尾部１４３における検出結果を用いて顔に対する瞳の相対位置を算出する。顔情報検出部１４１、器官情報検出部１４２、追尾部１４３における検出結果および算出部１４４の算出結果はカメラ制御部１３１に送られる。 The camera signal processing circuit 124 includes a face information detection unit 141 and an organ information detection unit 142, which detects organ information such as the pupils and mouth from the face information detected by the face information detection unit 141. The tracking unit 143 receives an imaging signal from the analog signal processing circuit 123 and detects the pupils and face using a detection method that is faster but less accurate than the face information detection unit 141 and the organ information detection unit 142. The calculation unit 144 calculates the relative position of the pupils with respect to the face using the detection results from the face information detection unit 141, the organ information detection unit 142, and the tracking unit 143. The detection results from the face information detection unit 141, the organ information detection unit 142, and the tracking unit 143 and the calculation results from the calculation unit 144 are sent to the camera control unit 131.

The camera control unit 131 has, as blocks related to the present invention, an automatic pupil selection unit 150 that automatically selects a target pupil from the detected face information, and a display frame setting unit 151 that sets the detection frame to be displayed on the image display unit 171 for the detected face or pupil. It also has a pupil designation/designation cancellation unit 152 that, in response to an operation by the photographer, designates the pupil chosen by the photographer as the pupil to keep detecting, or cancels that designation. In addition, it has a storage unit 153 that stores the pupil or face designated by user operation together with the detection results of the face information detection unit 141, the organ information detection unit 142, and the tracking unit 143, and an AF target subject setting unit 154 that notifies the focus detection unit 155 of the selected or designated pupil or face as the subject to be focused on (also called the target subject). These blocks operate based on the outputs of the face information detection unit 141, the organ information detection unit 142, and the tracking unit 143.

The focus detection unit 155 performs focus detection processing based on the image signal corresponding to the subject to be focused on, as notified by the AF target subject setting unit 154. The focus detection processing is performed, for example, by a known phase difference detection method or contrast detection method. In the case of the phase difference detection method, the focus detection processing calculates an image shift amount by performing a correlation operation on a pair of image signals having parallax, or further converts the image shift amount into a defocus amount. The defocus amount can in turn be converted into a focus lens drive amount by taking into account, for example, the sensitivity of the interchangeable lens 100 when its lens is driven. The camera control unit 131 transmits to the lens control unit 111 either the focus detection result (image shift amount or defocus amount) detected by the focus detection unit 155 or the focus lens drive amount calculated from that result. The focus lens control unit 113 controls the drive of the focus lens based on the focus lens drive information received from the camera control unit 131. In other words, the camera control unit 131 controls the drive of the focus lens via the focus lens control unit 113.
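The phase difference steps described above (correlation of a pair of signals, then conversion to a defocus amount and a lens drive amount) can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the function names `image_shift` and `focus_drive_amount`, the use of a mean absolute difference as the correlation measure, and the linear conversion coefficients `shift_to_defocus` and `sensitivity` are hypothetical and are not specified in the patent.

```python
def image_shift(signal_a, signal_b, max_shift):
    """Return the shift (in samples) of signal_b relative to signal_a
    that minimizes the mean absolute difference over the overlap.

    Assumes both signals have the same length and max_shift < length.
    """
    n = len(signal_a)
    best_shift, best_cost = 0, float("inf")
    for shift in range(-max_shift, max_shift + 1):
        lo = max(0, -shift)          # first index where i + shift is valid
        hi = min(n, n - shift)       # one past the last valid index
        cost = sum(abs(signal_a[i] - signal_b[i + shift])
                   for i in range(lo, hi)) / (hi - lo)
        if cost < best_cost:
            best_shift, best_cost = shift, cost
    return best_shift


def focus_drive_amount(shift, shift_to_defocus, sensitivity):
    """Convert an image shift into a focus lens drive amount.

    Assumed linear model: defocus = shift * shift_to_defocus,
    drive = defocus / sensitivity (hypothetical coefficients).
    """
    defocus = shift * shift_to_defocus
    return defocus / sensitivity


# The second signal is the first one displaced by two samples.
a = [0, 0, 1, 5, 9, 5, 1, 0, 0, 0, 0, 0]
b = [0, 0, 0, 0, 1, 5, 9, 5, 1, 0, 0, 0]
shift = image_shift(a, b, max_shift=3)
drive = focus_drive_amount(shift, shift_to_defocus=1.5, sensitivity=0.5)
```

In a real camera the conversion coefficients depend on the optical state of the lens, which is one reason the lens reports optical information to the camera body.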

このような構成からなるカメラにおいて、顔の器官を検出及び追尾する方法について説明する。 We will explain how to detect and track facial features in a camera with this configuration.

Figure 2 illustrates the organ detection process, in which a detector detects faces and organs. In this embodiment, the process is performed by the face information detection unit 141 or the organ information detection unit 142 within it. Figure 2(a) shows multiple faces being detected in an image that contains multiple faces (face area 201, face area 211, face area 212). Figure 2(b) shows the result of passing face area 201 through the organ detector (organ information detection unit 142). Organ detection detects feature points inside the face; typically the eyes (203, 205), nose 207, and mouth 209 can be detected. Figure 2(c) shows how the face detection score varies with the state of the face. The detection score expresses the reliability, or confidence, of face detection, and it changes with the person's facial expression as well as with external factors such as how ambient light falls on the face. The score can be used to judge the result: the higher the score, the more accurately the position has been detected, while a low score means the position information is unreliable. In face area 201 in Figure 2(c), the eyes are wide open and the mouth is distinct (it appears relatively large), so the face detection score is relatively high. In face areas 231 and 251, by contrast, the eyes become progressively smaller or nearly closed and the mouth becomes indistinct and puckered, so the face detection score becomes correspondingly lower.

Figure 2(d) shows how the face angle is estimated from the positions of the detected face and organs. In a frontal face (angle of 0 degrees), the two pupils are positioned symmetrically to the left and right of the center line 262 of the face frame 261, and the centers of the nose and mouth lie on the center line. In a face rotated by 45 degrees, on the other hand, the eyes, nose, and mouth are shifted to one side of the center line 272 of the face frame 271, and the left-right symmetry is lost. The face angle can be estimated from the degree of this bias in the arrangement of the eyes, nose, and mouth. The reliability of the face angle is calculated from the detection scores of the eyes, nose, and mouth.

図３に汎用的な追尾手段による追尾処理の動作概念を示す。本実施形態では、撮像素子１２２からの信号に基づいて追尾部１４３によって本追尾処理が行われるものとする。撮像素子は駆動信号ＶＤ周期３０１で駆動され、露光３０３と読み出し３０５の繰り返し動作を行う。撮像素子から得られたデータは現像処理されて可視画像として画像生成される（画像生成３０７）。図３ではこの画像を用いてテンプレートマッチング（パターンマッチング）手法を適用する。露光１に基づいて画像生成３０７の処理で生成された画像１において、ユーザによる指定あるいは被写体検出処理に基づき自動で、追尾対象を含むエリアをテンプレート３３１（追尾領域）として設定する。その後、露光２に基づいて生成された画像２に対して、順次位置を変えながらテンプレート画像３３１との画像差分をとるサーチ動作を行う（サーチ３３３）。差分値を得る作業を繰り返し行い、差分が最も小さいマッチエリア３３５が追尾対象が存在する位置として確定する。その後、画像１と画像２の間で追尾処理が成立したあと画像２で求めた追尾位置を次回のためのテンプレートとして更新し、画像２と３による追尾、画像３と４による追尾を繰り返し実行していくことで、連続的な追尾処理を実現する。なお、更新の頻度は所定フレームごと、あるいは撮影条件や環境によって制御されてもよい。追尾手段によって、顔位置および瞳などの器官の位置を得ることができるが、マッチングの精度を得るためには対象が規定のサイズ以上であることが好ましい。 Figure 3 shows the operation concept of tracking processing by a general-purpose tracking means. In this embodiment, this tracking processing is performed by the tracking unit 143 based on a signal from the image sensor 122. The image sensor is driven by a drive signal VD cycle 301, and performs repeated operations of exposure 303 and readout 305. The data obtained from the image sensor is developed and generated as a visible image (image generation 307). In Figure 3, a template matching (pattern matching) method is applied using this image. In image 1 generated by the image generation 307 process based on exposure 1, an area including the tracking target is set as a template 331 (tracking area) by user specification or automatically based on subject detection processing. After that, a search operation is performed for image 2 generated based on exposure 2 to obtain the image difference with the template image 331 while sequentially changing the position (search 333). The operation of obtaining the difference value is repeated, and the match area 335 with the smallest difference is determined as the position where the tracking target exists. 
After that, after tracking processing is established between image 1 and image 2, the tracking position determined in image 2 is updated as a template for the next time, and tracking using images 2 and 3, and tracking using images 3 and 4 are repeatedly performed to achieve continuous tracking processing. The frequency of updates may be controlled for each specified frame, or according to the shooting conditions and environment. The tracking means can obtain the position of the face and the positions of organs such as the eyes, but to obtain matching accuracy, it is preferable that the target be a specified size or larger.
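The template matching loop described above can be sketched as follows. This is a minimal illustration, not the implementation of the tracking unit 143: it assumes grayscale images stored as lists of rows, uses the sum of absolute differences (SAD) as the "image difference", searches the whole frame exhaustively, and updates the template on every frame. The function names are hypothetical.

```python
def sad(image, template, top, left):
    """Sum of absolute differences between the template and the
    image patch whose upper-left corner is (top, left)."""
    return sum(
        abs(image[top + r][left + c] - template[r][c])
        for r in range(len(template))
        for c in range(len(template[0]))
    )


def find_match(image, template):
    """Slide the template over the image (the search step) and
    return (top, left, score) of the area with the smallest difference."""
    th, tw = len(template), len(template[0])
    best = None
    for top in range(len(image) - th + 1):
        for left in range(len(image[0]) - tw + 1):
            score = sad(image, template, top, left)
            if best is None or score < best[2]:
                best = (top, left, score)
    return best


def crop(image, top, left, height, width):
    """Cut a height x width patch out of the image."""
    return [row[left:left + width] for row in image[top:top + height]]


def track(frames, top, left, height, width):
    """Track the area set in frames[0] through the following frames,
    refreshing the template from each newly matched area."""
    template = crop(frames[0], top, left, height, width)
    positions = [(top, left)]
    for frame in frames[1:]:
        top, left, _ = find_match(frame, template)
        positions.append((top, left))
        template = crop(frame, top, left, height, width)
    return positions
```

Because the search only looks for the most similar patch, a second subject with a similar appearance can win the match, which is the mistracking risk the description raises for tracking used on its own.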

To explain the effect of executing the organ detection process and the tracking process described above in combination, FIG. 4 shows timing diagrams for continuous shooting, viewed in terms of processing time, using the detection shown in FIG. 2 and the tracking process described in FIG. 3. FIG. 4(a) shows a sequence in which LV exposures for detection and AF (413, 433, 453) are inserted between still image captures (411, 431, 451), detection is performed on them, and AF 415 and frame display 417 are carried out using the detection result. When the pupil position information used for frame display 417 is obtained from face/organ detection 421, which takes longer than tracking 441, the time 429 until the frame display position is updated becomes long and real-time performance is impaired. The detection processing time also becomes the bottleneck in achieving the high frame rate required for continuous shooting.

Figure 4(b) shows a sequence in which the AF and display frame target positions are obtained by tracking processing, using the images obtained in the LV exposure periods. Tracking processing based on simple matching is generally known to finish in a shorter time than detection processing based on organ detection, which is more complex. By performing AF 435 and frame display 437 using the result of tracking 441, the frame display position can be updated in a shorter time 449 than in the detection-based sequence of Figure 4(a), and a higher frame rate can be achieved as well. However, continuous operation with tracking alone carries the risk of mistracking onto a similar subject. When the subject is small, for example the face of a person whose whole body fits within the image frame, tracking keeps the face position reasonably accurate, but the accuracy of the eye position is often insufficient.

Figure 4(c) shows a sequence in which detection and tracking are used together, and Figure 5 is a conceptual diagram of how they are combined. The high-speed tracking process 463 provides the face information 521, 523 (position and size) of the current frame. To this, the device applies the relative position of the eye 515 from the face frame center 513, obtained from the high-precision detection process 465 of the previous frame (in Figure 5, 20% to the right in the X direction and 15% upward in the Y direction, taking the horizontal width of the face frame 511 as 100%). The estimated eye position 525 can thus be estimated and displayed with better real-time performance of the frame display than when only organ detection is used, and on the basis of a more accurate detection result than when only tracking information is used.
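The relative-position idea of Figure 5 can be illustrated with a short sketch. The `Face` type and the function names are hypothetical, and two points are assumptions rather than statements from the patent: both offsets are normalized by the horizontal width of the face frame, and image coordinates have y increasing downward, so "15% upward" becomes -0.15.

```python
from dataclasses import dataclass


@dataclass
class Face:
    cx: float     # face-frame center x, in pixels
    cy: float     # face-frame center y, in pixels (y grows downward)
    width: float  # horizontal width of the face frame, in pixels


def relative_eye_offset(face, eye_x, eye_y):
    """Eye offset from the face-frame center, as a fraction of the
    face-frame width (computed from the accurate detection result)."""
    return ((eye_x - face.cx) / face.width,
            (eye_y - face.cy) / face.width)


def estimate_eye(tracked_face, offset):
    """Apply the stored relative offset to the face position and size
    obtained by tracking in the current frame."""
    dx, dy = offset
    return (tracked_face.cx + dx * tracked_face.width,
            tracked_face.cy + dy * tracked_face.width)


# Previous frame: detector found the face and the eye.
detected = Face(cx=200.0, cy=150.0, width=100.0)
offset = relative_eye_offset(detected, eye_x=220.0, eye_y=135.0)  # 20% right, 15% up

# Current frame: tracking found the face, now smaller and displaced.
tracked = Face(cx=260.0, cy=160.0, width=80.0)
eye_x, eye_y = estimate_eye(tracked, offset)
```

Scaling the offset by the tracked face width keeps the estimate consistent when the subject moves toward or away from the camera between the detection frame and the current frame.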

The processing flow of the present invention will be described with reference to FIG. 6. This flow is executed by the camera control unit 131 or, on its instruction, by other units such as the camera signal processing circuit 124. First, when the LV exposure for AF is completed, the pupil frame setting process of S201 in FIG. 6B is started. When the process starts in S201, the camera control unit 131 proceeds to S202, obtains the result of the tracking process performed by the tracking unit 143 on the current shooting frame, and proceeds to S203. In S203, it determines whether the tracking process detected a face with a reliability equal to or higher than a certain threshold. If so, the process proceeds to S204, where the camera control unit 131 determines whether the size of the face detected by the tracking process is equal to or smaller than a predetermined threshold. S204 is performed because the target is preferably at least a specified size for tracking to achieve detection accuracy. If the camera control unit 131 determines in S204 that the face is not large enough for accurate detection, that is, that the face size is equal to or smaller than the predetermined threshold, the process proceeds to S205. In S205, the calculation unit 144 acquires the latest face and pupil detection results obtained from the face information detection unit 141 and the organ information detection unit 142. In S206, the camera control unit 131 determines whether the difference between the frame number of the frame in which the acquired detection results were obtained and the frame number of the current frame is equal to or smaller than a predetermined threshold. Any information tied to the time elapsed between the detection result and the current frame may be used for this determination. For example, the time itself may be recorded and compared instead of the frame number. The threshold is preferably set small when the movement amount of the subject is large and large when the movement amount is small. The movement amount is determined based on at least one of the changes in face position, face angle, and face size of the same subject between the frame before the previous frame and the previous frame. If the camera control unit 131 determines in S206 that the difference between the frame number of the detection results and the current frame number is equal to or smaller than the threshold, it judges that using the acquired detection results is unlikely to cause positional deviation from the current frame, and proceeds to S207. In S207, the calculation unit 144 calculates the relative position of the pupil with respect to the face from the pupil position, face position information, and face size information obtained from the face information detection unit 141 and the organ information detection unit 142. In S208, the calculation unit 144 applies the calculated relative position to the face position and face size information obtained by tracking in the current frame, and calculates the estimated position of the pupil in the current frame.
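The calculation in S207 and S208 can be sketched as follows. This is a minimal illustration, not the implementation of the calculation unit 144: the `Face` type, the function names, and the use of the face center as the reference point are assumptions (the normalization by face size follows the description of the relative position in the claims).

```python
from dataclasses import dataclass


@dataclass
class Face:
    """Hypothetical face record: center position and size, in pixels."""
    cx: float
    cy: float
    size: float


def relative_pupil_position(face: Face, pupil_x: float, pupil_y: float):
    """S207 (sketch): pupil offset from the face center, as a fraction of face size."""
    return ((pupil_x - face.cx) / face.size, (pupil_y - face.cy) / face.size)


def estimate_pupil(tracked_face: Face, rel):
    """S208 (sketch): apply the stored relative offset to the tracked face."""
    return (tracked_face.cx + rel[0] * tracked_face.size,
            tracked_face.cy + rel[1] * tracked_face.size)


# Stored detection from an earlier frame (illustrative values only).
detected_face = Face(cx=100.0, cy=100.0, size=50.0)
rel = relative_pupil_position(detected_face, pupil_x=110.0, pupil_y=90.0)

# Face found by tracking in the current frame: moved and half the size.
tracked_face = Face(cx=200.0, cy=150.0, size=25.0)
estimated = estimate_pupil(tracked_face, rel)
```

Because the offset is stored as a fraction of the face size, the same relative position remains usable when the tracked face has moved or changed size between frames.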

When the estimated pupil position in the current frame has been calculated in S208, the process proceeds to S209, where the camera control unit 131 displays the pupil frame set by the display frame setting unit 151 at the estimated pupil position on the display unit 171. If the face size is larger than the predetermined threshold in the first place, the process proceeds from S204 to S211, because using the pupil detection result of the tracking process ensures better accuracy and real-time performance than estimating the position from organ detection results obtained in the previous frame or earlier. In S211, the camera control unit 131 obtains the pupil detection result of the tracking process, and in S212 it determines whether the pupil was detected with a reliability equal to or higher than a certain value. If so, the process proceeds to S209, where the camera control unit 131 displays on the display unit 171 the pupil frame that the display frame setting unit 151 has set using the pupil detection result of the tracking process. If the camera control unit 131 determines in S203 that the tracking process could not detect a face, in S206 that the difference between the frame number of the organ detection results and the current frame number exceeds the predetermined threshold, or in S212 that the tracking process could not detect a pupil, it stops displaying the pupil frame (S213).
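The branching of S203 through S213 can be summarized in a short sketch. All names, return values, and numeric thresholds here are hypothetical placeholders, since the description gives no concrete values. The S206 branch follows the flow in which a sufficiently recent detection result leads to estimation (S207), so an older result leads to S213.

```python
def max_frame_gap(movement, movement_limit=10.0, gap_fast=2, gap_slow=6):
    """Allowed age in frames of a stored detection result (S206).

    Smaller when the subject moves a lot, larger when it moves little.
    The numeric values are illustrative assumptions.
    """
    return gap_fast if movement > movement_limit else gap_slow


def choose_pupil_frame(face_reliability, face_size, frame_gap, movement,
                       tracked_pupil_reliability,
                       reliability_min=0.5, small_face_size=40.0):
    """Return "estimate", "tracking", or "hide" for the current frame."""
    # S203: was a face tracked with sufficient reliability?
    if face_reliability < reliability_min:
        return "hide"  # S213
    # S204: small face, so fall back to stored organ detection results.
    if face_size <= small_face_size:
        # S206: is the stored result recent enough?
        if frame_gap <= max_frame_gap(movement):
            return "estimate"  # S207 to S209
        return "hide"  # S213
    # S211 and S212: large face, so use the pupil found by tracking.
    if tracked_pupil_reliability >= reliability_min:
        return "tracking"  # S209
    return "hide"  # S213
```

For example, `choose_pupil_frame(0.9, 30.0, 1, 0.0, 0.0)` selects the estimated position, while the same small face with a five-frame-old result and a fast-moving subject hides the frame.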

The organ detection results of the face information detection unit 141 (organ information detection unit 142) acquired in S205 are assumed to be updated every frame according to the processing flow shown in FIG. 6A. Specifically, after the organ detection process has been performed for a frame, the detection result information update process of S101 is started. The process then proceeds to S102, where the camera control unit 131 acquires the detection result information of the face information detection unit 141 and the organ information detection unit 142. After the detection results are acquired in S102, the process proceeds to S103, where it is determined whether both the face and the pupil have been detected with a reliability equal to or higher than a certain threshold. If both have been detected with sufficient reliability, the process proceeds to S104, where the frame number and the face and pupil detection position information stored in the storage unit 153 are updated with the acquired detection results. If the reliability of the face or the pupil is below the threshold in S103, the update process ends without updating. The organ detection results do not necessarily have to be updated every frame. They may be updated every predetermined number of frames or every predetermined period. The organ detection process may also be run as an interrupt triggered by a user instruction, scene determination, or the like. In any of these cases, whenever the organ detection process is performed, the latest detection results are stored in the storage unit 153 and the stored results are updated.
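The update rule of S101 through S104 amounts to overwriting the stored record only when both detections are reliable. The sketch below assumes a hypothetical `DetectionRecord` type and a single reliability threshold of 0.5. Neither is specified in the description.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DetectionRecord:
    """Hypothetical contents of the stored detection result."""
    frame_number: int
    face: Tuple[float, float, float]  # face center x, center y, size
    pupil: Tuple[float, float]        # pupil x, y


def update_record(stored: Optional[DetectionRecord], frame_number: int,
                  face, pupil, face_reliability: float,
                  pupil_reliability: float,
                  threshold: float = 0.5) -> Optional[DetectionRecord]:
    """S103 and S104 (sketch): replace the record only if both are reliable."""
    if face_reliability >= threshold and pupil_reliability >= threshold:
        return DetectionRecord(frame_number, face, pupil)
    # S103 failed: keep the previous record unchanged.
    return stored


record = update_record(None, 10, (100.0, 100.0, 50.0), (110.0, 90.0), 0.9, 0.8)
# An unreliable pupil detection in frame 11 leaves the frame-10 record in place.
record = update_record(record, 11, (102.0, 100.0, 50.0), (0.0, 0.0), 0.9, 0.1)
```

Keeping the frame number in the record is what lets the later S206 check measure how old the stored result is.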

As described above, this embodiment improves the real-time performance of the frame display compared with performing pupil AF using the organ detection process alone, while estimating the eye position from detection results that are more accurate than those obtained by performing only a tracking process based on pattern matching or the like.

Next, a second embodiment will be described for the pupil position estimation process that the calculation unit 144 performs based on the detection results of the face information detection unit 141 (organ information detection unit 142). In this embodiment, the pupil frame can be estimated at an accurate position even when the subject is rotating in the Pitch, Yaw, or Roll direction.

As shown in FIG. 7A, the method of the first embodiment can estimate the pupil position accurately when the subject moves without changing its face angle relative to the camera lens 100, or when it rotates slowly in the Pitch, Yaw, or Roll direction. However, when the subject moves abruptly in any or all of these three rotation directions, the deviation between the actual pupil position and the frame position becomes large, because the pupil position is estimated using the relative position of the pupil to the face frame in the previous frame. For example, when the subject's face rotates in the Pitch direction from 711 to 721 to 731, applying the relative pupil position 722 of the previous frame to the current frame places the pupil lower than it actually is, so the frame 732 is shifted downward from the actual pupil position. Therefore, in this modification, the display frame setting unit 151 corrects the position of the frame 732 to the frame 733 in FIG. 7B, based on the face angle information of the frame before the previous frame and of the previous frame obtained from the face information detection unit 141. Rotation in the Yaw direction (741 to 763) is corrected in the same way as Pitch. A simple way to correct Yaw and Pitch rotation is to use the movement distance between the frame before the previous frame and the previous frame directly as the correction amount. Because the face is roughly spherical, the pupil movement amount decreases, for example, as the face turns from the frontal direction toward the far side in the Yaw direction. For a more accurate correction, the movement amount for each face angle may therefore be held as a lookup table and referenced. When the face rotates in the Roll direction (771 to 793), using the movement distance between the frame before the previous frame and the previous frame directly as the correction amount, as for Yaw and Pitch rotation, is not suitable. This is because the pupil moves along a circle on the image, and a correction method that assumes approximately linear movement may make the positional deviation worse. Therefore, as shown in FIG. 7B, for rotation in the Roll direction, the display frame is moved from 792 to 793 according to the face angle along a circle 796 whose center is the center 795 of the face frame and whose radius is the distance from that center to the pupil.
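The two correction styles described above, a linear shift for Yaw and Pitch and a move along a circle for Roll, can be sketched as follows. The function names, the tuple-based coordinates, and the sign convention of the rotation angle are assumptions made for illustration. The description does not fix a coordinate convention.

```python
import math


def correct_linear(frame_pos, prev_pupil, prev_prev_pupil):
    """Yaw/Pitch (simple method): shift the frame by the pupil movement
    between the frame before the previous frame and the previous frame."""
    dx = prev_pupil[0] - prev_prev_pupil[0]
    dy = prev_pupil[1] - prev_prev_pupil[1]
    return (frame_pos[0] + dx, frame_pos[1] + dy)


def correct_roll(face_center, frame_pos, delta_angle_deg):
    """Roll: move the frame along the circle centered on the face-frame
    center whose radius is the distance from that center to the pupil."""
    theta = math.radians(delta_angle_deg)
    dx = frame_pos[0] - face_center[0]
    dy = frame_pos[1] - face_center[1]
    # Rotating the offset vector keeps its length, so the radius is preserved.
    return (face_center[0] + dx * math.cos(theta) - dy * math.sin(theta),
            face_center[1] + dx * math.sin(theta) + dy * math.cos(theta))


shifted = correct_linear((50.0, 40.0), prev_pupil=(48.0, 38.0),
                         prev_prev_pupil=(46.0, 35.0))
rotated = correct_roll(face_center=(0.0, 0.0), frame_pos=(10.0, 0.0),
                       delta_angle_deg=90.0)
```

The roll correction keeps the frame at a constant distance from the face center, which a linear shift cannot do for a quarter turn like the one in the example.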

FIG. 8A shows a camera 811 and a subject 812 viewed from above. FIG. 8B shows how the on-screen pupil size changes when the face rotates in the direction of the arrow, taking the face angle at which the pupil 813 of the subject 812 directly faces the camera 811 as 0 degrees. Between 0 and 45 degrees, the pupil 813 approaches the camera and becomes larger relative to the face. As the face turns further away, the pupil becomes smaller. Setting the frame size with this rotation-dependent change in pupil size taken into account yields a frame that better matches the actual pupil. Because visibility drops if the frame becomes too small, the frame size may be kept from falling below a certain threshold, or the display method may be changed, for example by indicating the pupil position with an arrow. Frame size correction is likewise performed for Yaw. Since the pupil size does not change with Roll, this size correction is not performed for Roll.
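One way to apply such a size correction with a lower bound is sketched below. The angle-to-scale table, its values, and the function names are purely illustrative assumptions. The description states only the qualitative trend (larger up to about 45 degrees, then smaller) and that the frame may be kept above a minimum size.

```python
def interpolate_scale(table, angle_deg):
    """Linearly interpolate a size scale from (angle, scale) pairs sorted by angle."""
    if angle_deg <= table[0][0]:
        return table[0][1]
    for (a0, s0), (a1, s1) in zip(table, table[1:]):
        if angle_deg <= a1:
            t = (angle_deg - a0) / (a1 - a0)
            return s0 + t * (s1 - s0)
    return table[-1][1]


def pupil_frame_size(base_size, angle_deg, table, min_size):
    """Scale the frame with the face angle, but never below min_size."""
    return max(base_size * interpolate_scale(table, angle_deg), min_size)


# Illustrative table only: grows toward 45 degrees, then shrinks.
SCALE_TABLE = [(0.0, 1.0), (45.0, 1.2), (90.0, 0.4)]

size_mid = pupil_frame_size(20.0, 22.5, SCALE_TABLE, min_size=10.0)
size_far = pupil_frame_size(20.0, 90.0, SCALE_TABLE, min_size=10.0)
```

In this example the frame at 90 degrees would shrink to 8 pixels without the bound, so it is held at the 10-pixel minimum instead.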

As a result, when the subject is rotating in the Pitch, Yaw, or Roll direction, the pupil frame can be displayed at a more accurate position than in the first embodiment, although the processing load increases.

Although the present invention has been described in detail above based on its preferred embodiments, the present invention is not limited to these specific embodiments, and various forms that do not depart from the gist of the invention are also included in the present invention. Parts of the above-described embodiments may be combined as appropriate. The present invention also covers the case where a software program that implements the functions of the above-described embodiments is supplied, directly from a recording medium or via wired or wireless communication, to a system or device having a computer capable of executing the program, and the program is executed. Accordingly, the program code itself that is supplied to and installed on a computer in order to implement the functional processing of the present invention on that computer also realizes the present invention. In other words, the computer program itself for implementing the functional processing of the present invention is also included in the present invention. In that case, the program may take any form as long as it has the functions of the program, such as object code, a program executed by an interpreter, or script data supplied to an OS. Recording media for supplying the program include, for example, magnetic recording media such as hard disks and magnetic tapes, optical or magneto-optical storage media, and non-volatile semiconductor memories. As a method of supplying the program, the computer program forming the present invention may be stored on a server on a computer network, and a connected client computer may download and install the computer program.

122 Image sensor
124 Camera signal processing circuit
131 Camera control unit
141 Face information detection unit
142 Organ information detection unit
143 Tracking unit
144 Calculation unit
150 Automatic pupil selection unit
151 Display frame setting unit
152 Pupil designation/designation cancellation unit
153 Designated pupil storage unit
171 Display unit
172 Touch panel

Claims

a display control means for displaying a captured image and an icon on a display means;
a detection means for detecting a face and eyes from a captured image;
a tracking means for detecting and tracking a face area set in the first captured image from the second captured image using a detection method that is simpler than the detection by the detection means;
a determining means for determining a relative position of the eyes with respect to the face based on the position of the eyes, the position of the face and the size of the face detected by the detecting means,
the determining means determines an estimated position of the eyes by applying a relative positional relationship of the eyes to the face to a face position and a face size of the face area detected by the tracking means in the second captured image;
An image processing device characterized in that the display control means displays an icon indicating the determined estimated position of the eyes in a superimposed manner on the second captured image displayed on the display means.

The image processing device according to claim 1, wherein the icon is displayed based on the estimated position when it is determined that the face size of the subject obtained from the tracking means is equal to or smaller than a predetermined threshold value.

The image processing device according to claim 1 or 2, characterized in that the face and eye information detected by the detection means used by the determination means is information from a predetermined number of frames or a predetermined time before the current frame.

An image processing device according to any one of claims 1 to 3, characterized in that the face and eye information detected by the detection means used by the determination means is information within a predetermined time from the current frame, and the time is set to be small when the movement amount of the subject is large and set to be large when the movement amount is small.

The image processing device according to any one of claims 1 to 4, characterized in that the method of determining the relative position of the eyes is to determine the horizontal and vertical movement distances from the center point of the face to the center point of the eyes as amounts relative to the face size.

The image processing device according to any one of claims 1 to 5, characterized in that the determining means corrects the relative position of the eye using the amount of rotation of the subject or the amount of movement of the organ between the most recent frames.

The image processing device according to any one of claims 1 to 6, further comprising an imaging means.

a display control step of displaying a captured image and an icon on a display means;
a detection step of detecting a face and eyes from a captured image by a detection means;
a tracking step of detecting and tracking a face area set in the first captured image from the second captured image using a detection method that is simpler than the detection by the detection means;
a determining step of determining a relative position of the eyes with respect to the face based on the position of the eyes, the position of the face and the size of the face detected by the detecting means,
in the determining step, an estimated position of the eyes is determined by applying a relative positional relationship of the eyes to the face to a face position and a face size of the face region detected in the tracking step in the second captured image,
An image processing method characterized in that the display control step displays an icon indicating the determined estimated position of the eyes in a superimposed manner on the second captured image displayed on the display means.