JP2020095673A

JP2020095673A - Image processing device, control method therefor, and image capturing device

Info

Publication number: JP2020095673A
Application number: JP2019119017A
Authority: JP
Inventors: 慶祐緑川; Keisuke Midorikawa; 良介辻; Ryosuke Tsuji
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2018-07-27
Filing date: 2019-06-26
Publication date: 2020-06-18
Anticipated expiration: 2039-06-26
Also published as: JP7324066B2

Abstract

To provide an image processing device capable of detecting a moving object that moves in the same direction as the background, a control method therefor, and an image capturing device. SOLUTION: An image processing device computes a background vector representing the motion of the background from a plurality of motion vectors detected between a plurality of images, and then detects the motion vector of a moving object from among the plurality of motion vectors on the basis of the Euclidean distance between the background vector and each of the plurality of motion vectors. SELECTED DRAWING: Figure 3

Description

The present invention relates to an image processing device, a control method therefor, and an image capturing device, and more particularly to a moving object detection technique.

A technique is known for detecting a subject that moves in the direction opposite to the overall movement direction of the subjects, based on motion vectors between frame images of a moving image (Patent Document 1). A technique is also known for determining, as the main subject area, an area in which the difference between the motion of the entire screen and the local motion is large (Patent Document 2).

JP 2015-194915 A
JP 2015-111746 A

The method described in Patent Document 1 detects a moving object that moves in the direction opposite to a large number of moving objects, based on motion vectors whose angular difference from the motion vector representing the movement direction of the large number of moving objects is equal to or greater than a predetermined value. Therefore, when a scene is captured in which the background and the subject (moving object) move in the same direction at different speeds, for example when a moving object is captured while panning, the method described in Patent Document 1 cannot separate the background from the subject.

In Patent Document 2, on the other hand, the main subject is determined based on the difference between the motion of the entire image and the local motion, so the main subject can be identified even when the background and the main subject move in the same direction at different speeds. However, the main subject intended by the user is not necessarily a large subject whose motion differs greatly from the motion of the entire screen.

The present invention has been made in view of these problems of the related art, and one object of the invention is to provide an image processing device capable of detecting a moving object that moves in the same direction as the background, a control method therefor, and an image capturing device.

The above object is achieved by an image processing device comprising: motion vector detection means for detecting a plurality of motion vectors between a plurality of images; calculation means for calculating, based on the motion vectors, a background vector representing the motion of the background; and moving object detection means for detecting the motion vector of a moving object from among the plurality of motion vectors based on the magnitude of the Euclidean distance between each of the plurality of motion vectors and the background vector.
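The following Python fragment is a minimal sketch of why a Euclidean-distance criterion can separate a subject from a background moving in the same direction, where an angle-difference criterion cannot. The function names, the fixed threshold, and the sample vectors are assumptions made for illustration; the text above states only that detection is based on the magnitude of the Euclidean distance.

```python
import math

def euclidean_distance(v, b):
    """Euclidean distance between motion vector v and background vector b."""
    return math.hypot(v[0] - b[0], v[1] - b[1])

def angle_difference(v, b):
    """Absolute angle between v and b in radians (0 when both point the same way)."""
    cross = v[0] * b[1] - v[1] * b[0]
    dot = v[0] * b[0] + v[1] * b[1]
    return abs(math.atan2(cross, dot))

def detect_moving_vectors(vectors, background, threshold):
    """Indices of vectors farther than `threshold` (hypothetical) from the background vector."""
    return [i for i, v in enumerate(vectors)
            if euclidean_distance(v, background) > threshold]

# Panning example: the background shifts 10 px per frame, the subject only 2 px,
# both in the same horizontal direction.
background = (10.0, 0.0)
vectors = [(10.0, 0.0), (9.5, 0.5), (2.0, 0.0), (10.5, -0.5)]
moving = detect_moving_vectors(vectors, background, threshold=3.0)
```

In this example the subject vector `(2.0, 0.0)` has zero angle difference from the background vector, so an angle test would miss it, while its Euclidean distance of 8 px exceeds the assumed threshold.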

According to the present invention, it is possible to provide an image processing device capable of detecting a moving object that moves in the same direction as the background, a control method therefor, and an image capturing device.

Block diagram showing a functional configuration example of a digital camera according to an embodiment of the invention
Block diagram showing a functional configuration example relating to moving object detection in the image processing unit of the embodiment
Flowchart of moving object detection processing in the embodiment
Schematic diagram for explaining the moving object detection method in the embodiment
Flowchart of an example of the background cluster selection operation in the embodiment
Flowchart of another example of the background cluster selection operation in the embodiment
Schematic diagram for explaining moving object detection using a distance map in the embodiment
Block diagram showing a functional configuration example relating to moving object detection in the image processing unit of the second embodiment
Flowchart of the operation of the image processing unit in the second embodiment
Diagram relating to clustering of motion vectors in the second embodiment
Flowchart of main subject determination processing in the second embodiment

Hereinafter, the present invention will be described in detail based on exemplary embodiments with reference to the accompanying drawings. The embodiments described are merely examples and do not limit the scope of the invention. For example, the following describes embodiments in which the invention is applied to a digital camera. However, a digital camera is only one example of an image processing device to which the invention can be applied; the invention can be implemented in any electronic device. Such electronic devices include, but are not limited to, image capturing devices such as digital cameras and digital video cameras, as well as personal computers, tablet terminals, mobile phones, game consoles, drive recorders, robots, and drones. Note that an image capturing function is not essential to the invention, which can be implemented in any electronic device capable of acquiring images captured in time series, such as a moving image.

(Configuration of the image capturing device)
FIG. 1 is a block diagram showing a functional configuration example of a digital camera 100 according to the first embodiment. The digital camera 100 can capture and record moving images and still images. The functional blocks in the digital camera 100 are communicably connected to one another via a bus 160. The operation of the digital camera 100 is realized by one or more programmable processors of the main control unit 151 loading a program stored in, for example, the ROM 155 into the RAM 154, executing it, and controlling each functional block.

The taking lens 101 (lens unit) includes a fixed first group lens 102, a zoom lens 111, an aperture 103, a fixed third group lens 121, a focus lens 131, a zoom motor (ZM) 112, an aperture motor (AM) 104, and a focus motor (FM) 132. The fixed first group lens 102, the zoom lens 111, the aperture 103, the fixed third group lens 121, and the focus lens 131 constitute the photographing optical system. Although the lenses 102, 111, 121, and 131 are each illustrated as a single lens for convenience, each may be composed of a plurality of lenses. The taking lens 101 may also be configured as an interchangeable lens that can be detached from the digital camera 100.

The aperture control unit 105 controls the operation of the aperture motor 104 in accordance with instructions from the main control unit 151 and changes the opening diameter of the aperture 103.
The zoom control unit 113 controls the operation of the zoom motor 112 in accordance with instructions from the main control unit 151 and changes the focal length (angle of view) of the taking lens 101.

The focus control unit 133 calculates the defocus amount and defocus direction of the taking lens 101 based on, for example, the phase difference between a pair of focus detection signals obtained from the image sensor 141. The focus control unit 133 then converts the defocus amount and defocus direction into a drive amount and drive direction for the focus motor 132. Based on this drive amount and drive direction, the focus control unit 133 controls the operation of the focus motor 132 and drives the focus lens 131, thereby controlling the focus state of the taking lens 101.

The focus control unit 133 thus performs phase-difference detection autofocus (AF), but it may instead perform contrast-detection AF based on a contrast evaluation value of the image signal obtained from the image sensor 141. Phase-difference detection AF may also be performed using focus detection signals obtained from an AF sensor provided separately from the image sensor 141. In the AF operation of the focus control unit 133, the focus detection area can be set to the main subject area detected by the image processing unit 152, which is described later.

The subject image formed on the imaging surface of the image sensor 141 by the taking lens 101 is converted into an electric signal (image signal) by the photoelectric conversion elements of the plurality of pixels arranged in the image sensor 141. In the present embodiment, the image sensor 141 has pixels arranged in a matrix of m pixels in the horizontal direction and n pixels in the vertical direction (m and n are each plural), and each pixel is provided with two photoelectric conversion elements (photoelectric conversion regions). Signal readout from the image sensor 141 is controlled by the sensor control unit 143 in accordance with instructions from the main control unit 151.

The two photoelectric conversion regions of each pixel are called region A and region B. The image composed of the image signals read from region A of the individual pixels is called the A image, and the image composed of the image signals read from region B of the individual pixels is called the B image. The image obtained by adding the A image and the B image pixel by pixel is called the A+B image. The A image and the B image form a parallax image pair. The A+B image is used for display and recording. The A image and the B image are used to generate the focus detection signals for phase-difference detection AF and to generate the distance map.
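As a minimal illustration of the relationship between the A image, the B image, and the A+B image, the pixel-wise addition can be sketched in Python as follows. The list-of-rows image representation, the function name, and the sample pixel values are assumptions made for this example only.

```python
def add_images(a_image, b_image):
    """Pixel-wise sum of two equally sized images given as lists of rows."""
    return [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(a_image, b_image)]

# Hypothetical 2 x 3 readouts from region A and region B of each pixel.
a_image = [[10, 20, 30], [40, 50, 60]]
b_image = [[12, 18, 33], [41, 47, 62]]

# The A+B image is used for display and recording; the A and B images are kept
# separately because their relative shift carries the parallax information.
ab_image = add_images(a_image, b_image)
```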

The image signal read from the image sensor 141 is supplied to the signal processing unit 142. The signal processing unit 142 applies signal processing such as noise reduction, A/D conversion, and automatic gain control to the image signal and outputs the result to the sensor control unit 143 as image data. The sensor control unit 143 stores the image data received from the signal processing unit 142 in the RAM (random access memory) 154.

When recording image data stored in the RAM 154, the main control unit 151 generates a data file conforming to the recording format, for example by adding a predetermined header to the image data. At this time, the main control unit 151 has the compression/decompression unit 153 encode the image data as necessary and stores the encoded data in the data file. The main control unit 151 records the generated data file on a recording medium 157 such as a memory card.

When displaying image data stored in the RAM 154, the main control unit 151 has the image processing unit 152 scale the image data to fit the display size of the display unit 150 and then writes it to the area of the RAM 154 used as video memory (the VRAM area). The display unit 150 reads the image data for display from the VRAM area of the RAM 154 and displays it on a display device such as an LCD or an organic EL display. The display unit 150 also displays the detection result for the main subject (moving object) detected by the image processing unit 152, such as a frame indicating the main subject area.

During moving image shooting (in the shooting standby state or during moving image recording), the digital camera 100 immediately displays the captured moving image on the display unit 150, thereby causing the display unit 150 to function as an electronic viewfinder (EVF). The moving image displayed when the display unit 150 functions as an EVF, and its frame images, are called a live view image or a through image. When a still image is captured, the digital camera 100 displays the most recently captured still image on the display unit 150 for a fixed time so that the user can check the result. These display operations are also realized under the control of the main control unit 151.

The compression/decompression unit 153 encodes and decodes image data. For example, when recording a still image or a moving image, it encodes the image data and audio data using a predetermined encoding method. When a still image data file or moving image data file recorded on the recording medium 157 is played back, the compression/decompression unit 153 decodes the encoded data and stores it in the RAM 154.

The RAM 154 is used as system memory for executing programs, as video memory, as buffer memory, and so on.
The ROM 155 stores programs executable by the processor of the main control unit 151, various setting values, information unique to the digital camera 100, GUI data, and the like. The ROM 155 may be electrically rewritable.

The operation unit 156 is a collective term for the input devices, such as switches, buttons, keys, and a touch panel, with which the user inputs instructions to the digital camera 100. Input through the operation unit 156 is detected by the main control unit 151 via the bus 160, and the main control unit 151 controls each unit to realize the operation corresponding to the input.

The main control unit 151 has one or more programmable processors such as a CPU or MPU, and realizes the functions of the digital camera 100 by loading a program stored in, for example, the ROM 155 into the RAM 154 and executing it to control each unit. The main control unit 151 also executes AE processing, which automatically determines the exposure conditions (shutter speed or accumulation time, aperture value, and sensitivity) based on subject luminance information. The subject luminance information can be obtained from, for example, the image processing unit 152. The main control unit 151 can also determine the exposure conditions based on luminance information of the area of a specific subject, such as a person's face.

During moving image shooting, the main control unit 151 keeps the aperture 103 fixed and controls the exposure through the electronic shutter speed (accumulation time) and the gain. The main control unit 151 notifies the sensor control unit 143 of the determined accumulation time and gain. The sensor control unit 143 controls the operation of the image sensor 141 so that shooting is performed according to the notified exposure conditions.

The distance map generation unit 161 (distance detection means) generates a distance map using, for example, image data stored in the RAM 154. A distance map is, for example, an image in which the luminance value of each pixel represents the subject distance, and is also called a depth map, a distance image, or a depth image. The distance map can be generated by a known method. For example, the distance map generation unit 161 can obtain the defocus amount at each pixel position (the amount and direction of deviation from the in-focus position of the focus lens) from the image shift amount between the parallax images (the A image and the B image described above). Since the defocus amount represents the amount of focus deviation relative to the current subject distance, the defocus amount can be regarded as distance information. Of course, the in-focus position of the focus lens may be obtained from the defocus amount, and the subject distance corresponding to that in-focus position may then be obtained. In an ordinary imaging optical system, the image shift amount and the defocus amount correspond one to one, so the distribution of image shift amounts can itself be used as a distance map for processing that depends on depth. The parallax images may be acquired by configuring the digital camera 100 as a multi-lens camera such as a stereo camera, or may be acquired from a storage medium or an external device.

The distance map can also be generated without using parallax images. By finding, for each pixel, the focus lens position at which the contrast evaluation value reaches its maximum, the subject distance can be obtained for each pixel. Distance information for each pixel can also be obtained from image data captured by shooting the same scene several times at different focusing distances and from the point spread function (PSF) of the optical system, based on the correlation between blur amount and distance. The distance map generation unit 161 may generate a distance map for the entire image, or only for the partial region of the image needed for moving object detection. The distance map generation unit 161 stores the generated distance map in the RAM 154, where it is referenced by the image processing unit 152. Instead of obtaining the subject distance for each pixel, the distance map generation unit 161 may obtain it for each small region (pixel block) to generate the distance map.
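The contrast-based approach described above can be sketched as follows, as one possible illustration rather than the method of the embodiment. The sketch assumes that a stack of per-pixel contrast evaluation values, one layer per focus lens position, is already available, and that each focus lens position maps to a known in-focus subject distance; the function name, data layout, and numeric values are hypothetical.

```python
def depth_from_focus(contrast_stack, subject_distances):
    """Build a distance map by picking, per pixel, the focus position of maximum contrast.

    contrast_stack[k][y][x] is the contrast evaluation value at pixel (x, y) for
    focus lens position k; subject_distances[k] is the in-focus distance at position k.
    """
    height = len(contrast_stack[0])
    width = len(contrast_stack[0][0])
    distance_map = []
    for y in range(height):
        row = []
        for x in range(width):
            best_k = max(range(len(contrast_stack)),
                         key=lambda k: contrast_stack[k][y][x])
            row.append(subject_distances[best_k])
        distance_map.append(row)
    return distance_map

contrast_stack = [
    [[0.9, 0.2], [0.1, 0.3]],  # focus position 0 (near)
    [[0.4, 0.8], [0.2, 0.4]],  # focus position 1
    [[0.1, 0.3], [0.7, 0.9]],  # focus position 2 (far)
]
subject_distances = [0.5, 2.0, 10.0]  # metres, hypothetical values
distance_map = depth_from_focus(contrast_stack, subject_distances)
```

Evaluating the stack per pixel block instead of per pixel would follow the same pattern with coarser indices.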

Furthermore, the distance map generation unit 161 can calculate a reliability for each region of the distance map and store it together with the distance map. The method of calculating the reliability is not particularly limited. For example, when a distance map is generated from parallax images, the image shift amount between the parallax images is found by computing a correlation amount (similarity) such as the SAD while varying the relative shift amount, and detecting the shift amount at which the correlation is highest (the correlation amount is smallest) as the image shift amount. The larger the difference between the average value and the maximum value of the calculated correlation amounts, the higher the reliability of the detected image shift amount (defocus amount) is considered to be. The difference between the average value and the maximum value of the correlation amounts calculated at each pixel position can therefore be used as the reliability at that pixel position. Instead of obtaining the subject distance and its reliability for each pixel, the distance map generation unit 161 may obtain them for each small region (pixel block) to generate the distance map.
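A minimal one-dimensional sketch of the shift search and a reliability value is given below. It assumes the SAD as the correlation amount and a fixed window in a single image row. The reliability here is computed as the mean SAD minus the SAD at the detected shift, which is one reading of the description above (taking the extreme value to be the one at the point of highest correlation) and should be treated as an assumption, as should the function name and the sample rows.

```python
def image_shift_with_reliability(a_row, b_row, start, size, max_shift):
    """Find the shift of b_row relative to a window of a_row that minimises the SAD.

    The window covers a_row[start : start + size]. The caller must ensure that
    start - max_shift >= 0 and start + size + max_shift <= len(b_row).
    Returns (best_shift, reliability).
    """
    sads = {}
    for shift in range(-max_shift, max_shift + 1):
        sads[shift] = sum(abs(a_row[start + i] - b_row[start + i + shift])
                          for i in range(size))
    best_shift = min(sads, key=sads.get)  # highest correlation = smallest SAD
    mean_sad = sum(sads.values()) / len(sads)
    reliability = mean_sad - sads[best_shift]  # assumed reliability measure
    return best_shift, reliability

# Hypothetical rows: the B row is the A row displaced by two pixels.
a_row = [0, 0, 0, 5, 9, 5, 0, 0, 0, 0]
b_row = [0, 0, 0, 0, 0, 5, 9, 5, 0, 0]
shift, reliability = image_shift_with_reliability(a_row, b_row, start=2, size=5, max_shift=2)
```

A flat, textureless window would give nearly equal SAD values at every shift and therefore a reliability close to zero, which matches the intent of the measure.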

The motion detection unit 162 is composed of attitude sensors such as a gyro, an acceleration sensor, and an electronic compass, and measures changes in the position and attitude of the digital camera 100. In the present embodiment, as an example, the optical axis of the taking lens 101 is taken as the roll axis, the axis orthogonal to the roll axis and parallel to the longitudinal direction of the image sensor as the pitch axis, and the axis orthogonal to both the roll axis and the pitch axis as the yaw axis; the angular velocities around the yaw axis and the pitch axis are detected as the attitude change. The motion detection unit 162 stores the detected motion in the RAM 154. The motion information detected by the motion detection unit 162 is referenced by the image processing unit 152.

The image processing unit 152 applies predetermined image processing to the image data stored in the RAM 154. The image processing applied by the image processing unit 152 includes, but is not limited to, so-called development processing such as white balance adjustment, color interpolation (demosaicing), and gamma correction, as well as signal format conversion and scaling. The image processing unit 152 also executes the moving object detection processing described later and selects a moving object as the main subject. In the moving object detection processing, the image processing unit 152 can use the distance map generated by the distance map generation unit 161 and the motion detected by the motion detection unit 162 for main subject determination based on moving object information.

Information on the determined main subject area may be used for other image processing, such as white balance adjustment or generation of subject luminance information. When the focus control unit 133 performs contrast-detection AF, the image processing unit 152 can generate the contrast evaluation value and supply it to the focus control unit 133. The image processing unit 152 stores the processed image data, the information on the main subject area, and the like in the RAM 154.

(Moving object detection processing)
FIG. 2 is a schematic diagram in which the image processing unit 152 is represented by functional blocks specific to the moving object detection processing, in order to explain that processing in the present embodiment. Each functional block shown in FIG. 2 may be realized as a separate hardware circuit, or may be realized by a programmable processor of the image processing unit 152 loading a program into memory and executing it.

FIG. 3 is a flowchart of the moving object detection processing performed by the image processing unit 152.
In S201, the image input unit 501 acquires from the RAM 154 two frames of input images captured at different times. Alternatively, the current frame may be acquired from the sensor control unit 143 and the previous frame (past frame) from the RAM 154. The image input unit 501 supplies the acquired input images to the motion vector detection unit 502 and the saliency calculation unit 507.

In S202, the motion vector detection unit 502 detects a plurality of motion vectors between the two frames of input images supplied from the image input unit 501. The motion vectors can be detected using any known method; in the present embodiment, template matching is used as an example.

Specifically, the motion vector detection unit 502 takes the earlier of the two frames (referred to as frame t-1) and divides it into a plurality of parts in each of the horizontal and vertical directions to generate a plurality of partial images. The motion vector detection unit 502 then performs template matching using each partial image as a template and the later frame (referred to as frame t) as the reference image, searching the reference image for the region with the highest similarity to the partial image. For each partial image, the motion vector detection unit 502 takes as the motion vector the vector whose start point is the coordinates of the center of the partial image and whose end point is the coordinates of the center of the region found by the search to have the highest similarity to the partial image. In this way, the motion vector detection unit 502 detects a motion vector for each partial image of frame t-1.
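The template matching step can be illustrated with the following minimal block-matching sketch. It is only one possible realization: the use of the SAD as the similarity measure, the square block size, the exhaustive search range, and all names are assumptions made for illustration. Vectors are returned with the horizontal component first and the vertical component second.

```python
def block_sad(prev, curr, top, left, size, dy, dx):
    """SAD between a block of prev and the block of curr displaced by (dx, dy)."""
    return sum(abs(prev[top + y][left + x] - curr[top + y + dy][left + x + dx])
               for y in range(size) for x in range(size))

def detect_motion_vectors(prev, curr, size, search):
    """Return (start, end, vector) for each size x size partial image of prev."""
    height, width = len(prev), len(prev[0])
    results = []
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            best = None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    # Skip displacements that leave the reference image.
                    if not (0 <= top + dy and top + dy + size <= height
                            and 0 <= left + dx and left + dx + size <= width):
                        continue
                    cost = block_sad(prev, curr, top, left, size, dy, dx)
                    if best is None or cost < best[0]:
                        best = (cost, dx, dy)
            _, dx, dy = best
            start = (left + size / 2, top + size / 2)  # center of the partial image
            end = (start[0] + dx, start[1] + dy)       # center of the best match
            results.append((start, end, (dx, dy)))
    return results

# Hypothetical frames: the 2 x 2 pattern in the top-left moves one pixel to the right.
frame_prev = [[1, 2, 0, 0], [3, 4, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
frame_curr = [[0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
motion = detect_motion_vectors(frame_prev, frame_curr, size=2, search=1)
```

Textureless blocks, such as the all-zero blocks in this example, match equally well at several displacements, which is one reason practical implementations attach a reliability to each vector.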

FIG. 4(a) shows an example of frame t-1 and FIG. 4(b) an example of frame t; in FIG. 4(b), some of the motion vectors detected by template matching are indicated by arrows. Taking points on frame t-1 as start points and the corresponding points on frame t as end points, each vector v_i is calculated as the end point minus the start point. Writing the end point as e_i and the start point as s_i (a symbol assumed here):

$$v_i \triangleq e_i - s_i$$

The symbol consisting of an equal sign with a triangle above it means that the left side is defined by the right side. The first component of a vector is the horizontal direction of the image and the second component is the vertical direction of the image (the same applies hereinafter). The motion vector detection unit 502 stores each detected motion vector v_i in the RAM 154 in association with its end point e_i.

Ｓ２０３においてクラスタリング部５０３は、Ｓ２０２において検出された動きベクトル群v_iをクラスタリングする。クラスタリングにはK-Means法や、Affinity Propagationなど、公知の任意のクラスタリング手法を用いることができる。図４（ｃ）には、クラスタ数を自動決定可能なクラスタリング手法であるAffinity Propagationを用いたクラスタリング結果の例を示している。K-Means法のようなクラスタ数を自動決定出来ないクラスタリング手法を用いる場合には何らかの方法でクラスタ数を決定してからクラスタリングを実行する。 In S203, the clustering unit 503 clusters the motion vector group v _i detected in S202. For clustering, any known clustering method such as K-Means method or Affinity Propagation can be used. FIG. 4C shows an example of a clustering result using Affinity Propagation, which is a clustering method capable of automatically determining the number of clusters. When using a clustering method that cannot automatically determine the number of clusters such as the K-Means method, the number of clusters is determined by some method and then clustering is performed.
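As one illustration of S203, the following is a minimal K-Means sketch over 2-D motion vectors. It assumes the number of clusters `k` has already been decided by some other means, as the text requires for K-Means; the deterministic initialisation and the fixed iteration count are simplifications chosen here, not details taken from the source.

```python
def kmeans(vectors, k, iterations=20):
    """Cluster 2-D vectors into k groups; returns (labels, centres)."""
    # Initial centres: evenly spaced samples of the input. This is a
    # simplification; practical implementations usually randomise.
    step = max(1, len(vectors) // k)
    centres = [vectors[min(i * step, len(vectors) - 1)] for i in range(k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centre by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: (v[0] - centres[c][0]) ** 2 + (v[1] - centres[c][1]) ** 2,
            )
            for v in vectors
        ]
        # Update step: each centre becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centres[c] = (
                    sum(v[0] for v in members) / len(members),
                    sum(v[1] for v in members) / len(members),
                )
    return labels, centres
```

A method such as Affinity Propagation would remove the need to supply `k`, at the cost of a more involved implementation.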

Ｓ２０４において、背景クラスタ選択部５０４は、Ｓ２０３におけるクラスタリングによって得られたクラスタ（図４（ｃ））のうち、背景領域のベクトルから成る背景クラスタを選択する。背景クラスタの選択方法については後述する。背景クラスタ選択部５０４は、背景クラスタが存在しない場合や選択できない場合には、背景クラスタの選択結果を、背景クラスタ無しとする。背景クラスタ選択部５０４は、背景クラスタの選択結果を背景ベクトル算出部５０５に供給する。 In step S204, the background cluster selection unit 504 selects a background cluster formed by the vector of the background region from the clusters (FIG. 4C) obtained by the clustering in step S203. The method of selecting the background cluster will be described later. The background cluster selection unit 504 sets the background cluster selection result to no background cluster when the background cluster does not exist or cannot be selected. The background cluster selection unit 504 supplies the background cluster selection result to the background vector calculation unit 505.

In S205, the background vector calculation unit 505 determines whether a background cluster exists. If a background cluster is determined to exist, the process proceeds to S206; otherwise, the moving object detection process ends with no moving object detection result. The background vector calculation unit 505 can determine that no background cluster exists when the background cluster selection result is "no background cluster".

Ｓ２０６において背景ベクトル算出部５０５は、背景クラスタ選択部５０４が選択した背景クラスタに属する動きベクトルから、背景の動きを表す背景ベクトルbを算出する。ここでは、一例として、背景クラスタに属する動きベクトルの平均ベクトルを背景ベクトルbとして算出するものとするが、これに限定されない。 In S206, the background vector calculation unit 505 calculates the background vector b representing the motion of the background from the motion vectors belonging to the background cluster selected by the background cluster selection unit 504. Here, as an example, the average vector of the motion vectors belonging to the background cluster is calculated as the background vector b, but the invention is not limited to this.

In S207, the moving object selection unit 506 calculates, for every motion vector detected by the motion vector detection unit 502, the Euclidean distance to the background vector obtained in S206, and selects the motion vector for which this distance is largest. Specifically, letting dist_i be the Euclidean distance between motion vector v_i and background vector b, and m the index of the selected motion vector, the moving object selection unit 506 determines the index m according to equations (2) and (3).

$$dist_i \triangleq \lVert v_i - b \rVert_2 \tag{2}$$

$$m \triangleq \operatorname*{arg\,max}_{i}\ dist_i \tag{3}$$

If dist_m (that is, the largest Euclidean distance between background vector b and any motion vector v_i) is less than a predetermined threshold, the moving object selection unit 506 determines that no moving object is detected and ends the moving object detection process. In this case, the moving object detection result is "no moving object detected". If dist_m is equal to or greater than the threshold, the moving object selection unit 506 outputs the image coordinates e_m of the end point of motion vector v_m as the moving object detection position for frame t, and ends the moving object detection process.
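Steps S206 and S207 can be sketched as below. The mean is used as the background vector, as in the example given in the text; the function and parameter names are hypothetical, and the threshold value is left to the caller because the source does not specify one.

```python
import math


def mean_vector(vectors):
    """Average of a non-empty list of 2-D vectors (used as background vector b)."""
    n = len(vectors)
    return (sum(v[0] for v in vectors) / n, sum(v[1] for v in vectors) / n)


def select_moving_object(vectors, end_points, background_cluster, threshold):
    """Return the detection position e_m, or None when no moving object is found.

    vectors:            all detected motion vectors v_i
    end_points:         end point e_i of each v_i (same order)
    background_cluster: the motion vectors belonging to the background cluster
    threshold:          minimum distance from b to count as a moving object
    """
    b = mean_vector(background_cluster)                                 # S206
    dists = [math.hypot(v[0] - b[0], v[1] - b[1]) for v in vectors]     # eq. (2)
    m = max(range(len(vectors)), key=lambda i: dists[i])                # eq. (3)
    if dists[m] < threshold:
        return None
    return end_points[m]
```

For a panning shot in which the background moves by (-5, 0) pixels and a subject moves by (-12, 0), the two vectors point in the same direction, so an angle test cannot separate them, whereas their Euclidean distance is 7 and the subject is selected for any threshold up to 7.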

検出した動体検出位置は、撮像装置の動作を制御するために用いることができる。例えば、動体検出位置を含むように焦点検出領域の設定することで、動体検出位置に合焦するようにレンズユニットの焦点検出を制御することができる。また、動体検出位置が適正露出になる様に自動露出制御を行うことができる。 The detected moving body detection position can be used to control the operation of the imaging device. For example, by setting the focus detection area so as to include the moving body detection position, it is possible to control the focus detection of the lens unit so as to focus on the moving body detection position. Further, automatic exposure control can be performed so that the moving object detection position has a proper exposure.

（背景クラスタの選択処理例１）
ここで、Ｓ２０４で背景クラスタ選択部５０４が行う背景クラスタ選択処理の一例について図５のフローチャートを用いて説明する。この例において背景クラスタ選択部５０４は、クラスタを構成する動きベクトルの画像中における存在範囲が最大であるクラスタを背景クラスタとして選択する。これは、クラスタを構成する動きベクトルの検出位置の分布範囲が最も広いクラスタを背景クラスタとして選択すると言うこともできる。 (Background cluster selection processing example 1)
Here, an example of the background cluster selection processing performed by the background cluster selection unit 504 in S204 will be described using the flowchart of FIG. In this example, the background cluster selection unit 504 selects the cluster in which the motion vector forming the cluster has the largest existence range in the image as the background cluster. It can be said that the cluster having the widest distribution range of the motion vector detection positions forming the cluster is selected as the background cluster.

In S301, the background cluster selection unit 504 calculates, for each cluster generated by the clustering unit 503, the variance of the start point coordinates or the end point coordinates of the motion vectors forming the cluster. For example, the background cluster selection unit 504 calculates the variance of the x coordinates and the variance of the y coordinates and adds the two to obtain the coordinate variance. In the present embodiment, the background cluster selection unit 504 simply adds the variance of the x coordinates and the variance of the y coordinates to obtain the variance var_k of cluster k, as shown in equation (4), but the two variances may instead be added with weights according to the aspect ratio of the image.

$$var_k \triangleq \frac{1}{n_k}\sum_{i \in k}\left(ex_i - \overline{ex}_k\right)^2 + \frac{1}{n_k}\sum_{i \in k}\left(ey_i - \overline{ey}_k\right)^2 \tag{4}$$

Here, n_k is the total number of motion vectors forming cluster k, and $\overline{ex}_k$ and $\overline{ey}_k$ are the mean values of the x coordinates ex_i and the y coordinates ey_i, respectively, of the end points of the motion vectors forming cluster k.

Ｓ３０２において背景クラスタ選択部５０４は、Ｓ３０１で算出した分散var_kの最大値var_Kと、対応するクラスタＫを選択し、例えば内部メモリに記憶する。 In S302, the background cluster selection unit 504 selects the maximum value var _K of the variance var _k calculated in S301 and the corresponding cluster K, and stores the cluster K in, for example, an internal memory.

Ｓ３０３において背景クラスタ選択部５０４は、Ｓ３０２で記憶した最大分散値var_Kが所定の閾値以上か否か判定し、閾値以上と判定されればＳ３０４へ、判定されなければＳ３０５へ処理を進める。 In S303, the background cluster selection unit 504 determines whether or not the maximum variance value var _K stored in S302 is equal to or greater than a predetermined threshold value. If determined to be equal to or greater than the threshold value, the process proceeds to S304, and if not determined, the process proceeds to S305.

Ｓ３０４において背景クラスタ選択部５０４は、Ｓ３０２で記憶した、最大分散値var_Kに対応するクラスタKを背景クラスタとして選択し、背景クラスタ選択処理を終了する。ここでは最大分散値var_Kに対応するクラスタKを背景クラスタとして選択した。 In S304, the background cluster selection unit 504 selects the cluster K corresponding to the maximum variance value var _K stored in S302 as a background cluster, and ends the background cluster selection processing. Here, the cluster K corresponding to the maximum variance value var _K is selected as the background cluster.

Ｓ３０５において背景クラスタ選択部５０４は、背景クラスタなしと判定して背景クラスタ選択処理を終了する。 In step S305, the background cluster selection unit 504 determines that there is no background cluster and ends the background cluster selection process.
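The flow of S301 to S305 can be sketched as follows, assuming the clusters are given as a mapping from a cluster identifier to the end points of its motion vectors. The data layout and the function names are assumptions made for this illustration.

```python
def coordinate_variance(points):
    """Variance of the x coordinates plus variance of the y coordinates (eq. (4))."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    var_x = sum((p[0] - mean_x) ** 2 for p in points) / n
    var_y = sum((p[1] - mean_y) ** 2 for p in points) / n
    return var_x + var_y


def select_background_by_spread(cluster_end_points, threshold):
    """Return the id of the background cluster, or None if there is none.

    cluster_end_points: dict mapping cluster id -> list of end points (ex, ey)
    threshold:          minimum variance required of a background cluster
    """
    variances = {k: coordinate_variance(pts)
                 for k, pts in cluster_end_points.items()}       # S301
    best = max(variances, key=variances.get)                      # S302
    if variances[best] >= threshold:                              # S303
        return best                                               # S304
    return None                                                   # S305
```

The threshold guards against frames in which even the most widely spread cluster covers only a small part of the image, in which case no cluster is treated as background.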

（背景クラスタの選択処理例２）
Ｓ２０４で背景クラスタ選択部５０４が行う背景クラスタ選択処理の別の例について図６のフローチャートを用いて説明する。この例において背景クラスタ選択部５０４は、デジタルカメラ１００の動きに基づいて背景クラスタを選択する。デジタルカメラの動きを考慮することで、安定した背景クラスタの選択が実現できる。 (Background cluster selection processing example 2)
Another example of the background cluster selection processing performed by the background cluster selection unit 504 in S204 will be described with reference to the flowchart of FIG. In this example, the background cluster selection unit 504 selects a background cluster based on the movement of the digital camera 100. A stable background cluster selection can be realized by considering the movement of the digital camera.

In S401, the motion input unit 508 acquires the motion information of the digital camera 100 (here, the angular velocities around the yaw axis and the pitch axis) from the motion detection unit 162, and supplies it to the background cluster selection unit 504.

In S402, the background cluster selection unit 504 estimates the motion direction of the background from the motion information received from the motion input unit 508. In the present embodiment, with yaw denoting the angular velocity around the yaw axis and pitch the angular velocity around the pitch axis, the background cluster selection unit 504 estimates the motion direction of the background to be the direction parallel to the vector g that equation (5) defines from yaw and pitch. The method described here is merely an example; the motion direction of the background may also be estimated by taking into account the angular velocity around the roll axis or the velocity in the shift direction.

Ｓ４０３において背景クラスタ選択部５０４は、各クラスタのクラスタ中心ベクトルと、式（５）のgとがなす角度をそれぞれ算出する。ここで、クラスタ中心ベクトルとは、各クラスタを構成する動きベクトルv_iの代表ベクトルであり、例えば平均ベクトルであってよい。 In S403, the background cluster selection unit 504 calculates the angle formed by the cluster center vector of each cluster and g in Expression (5). Here, the cluster center vector is a representative vector of the motion vectors v _i forming each cluster, and may be, for example, an average vector.

Ｓ４０４において背景クラスタ選択部５０４は、Ｓ４０３で算出した角度の最小値θ_Kと、対応するクラスタＫを選択し、例えば内部メモリに記憶する。 In S404, the background cluster selection unit 504 selects the minimum angle θ _K calculated in S403 and the corresponding cluster K, and stores the cluster K in, for example, an internal memory.

Ｓ４０５において背景クラスタ選択部５０４は、Ｓ４０４で選択したθ_Kが所定の閾値未満か否かを判定し、閾値未満と判定されればＳ４０６へ、判定されなければＳ４０７へ、処理を進める。 In step S405, the background cluster selection unit 504 determines whether or not θ _K selected in step S404 is less than a predetermined threshold value. If it is determined to be less than the threshold value, the process proceeds to step S406. If not, the process proceeds to step S407.

Ｓ４０６において背景クラスタ選択部５０４は、Ｓ４０４で選択したクラスタＫを背景クラスタとして選択し、背景クラスタ選択処理を終了する。 In S406, the background cluster selection unit 504 selects the cluster K selected in S404 as the background cluster, and ends the background cluster selection process.

Ｓ４０７において背景クラスタ選択部５０４は、背景クラスタなしと判定して背景クラスタ選択処理を終了する。 In step S407, the background cluster selection unit 504 determines that there is no background cluster and ends the background cluster selection processing.
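Steps S403 to S407 can be sketched as below. The direction vector `g` is taken as an input rather than derived from the angular velocities, so this sketch makes no assumption about the form of equation (5); the names and the angle unit (radians) are choices made here.

```python
import math


def angle_between(u, w):
    """Angle in radians between two non-zero 2-D vectors."""
    cos_theta = (u[0] * w[0] + u[1] * w[1]) / (math.hypot(*u) * math.hypot(*w))
    # Clamp to [-1, 1] to absorb floating-point rounding before acos.
    return math.acos(max(-1.0, min(1.0, cos_theta)))


def select_background_by_direction(cluster_centres, g, max_angle):
    """Return the id of the background cluster, or None if there is none.

    cluster_centres: dict mapping cluster id -> cluster centre vector
                     (for example the mean of the member motion vectors)
    g:               estimated background motion direction
    max_angle:       threshold on the smallest angle, in radians
    """
    angles = {k: angle_between(c, g) for k, c in cluster_centres.items()}  # S403
    best = min(angles, key=angles.get)                                     # S404
    if angles[best] < max_angle:                                           # S405
        return best                                                        # S406
    return None                                                            # S407
```

Cluster centres of zero length would need separate handling, since the angle to g is undefined for them.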

（動体検出処理の別例１）
次に、Ｓ２０７における動体検出処理の別の例について説明する。本例において動体選択部５０６は、画像内の視覚的に目立った領域（顕著領域）に限定して動体検出処理を適用することで、より被写体らしい領域を選択することが可能となる。 (Another example 1 of moving object detection processing)
Next, another example of the moving body detection process in S207 will be described. In the present example, the moving body selection unit 506 can select a more subject-like area by applying the moving body detection processing only to a visually conspicuous area (salient area) in the image.

The saliency calculation unit 507 calculates, for frame t supplied from the image input unit 501, the saliency at the end point coordinates e_i of each motion vector detected by the motion vector detection unit 502. Any known saliency calculation method can be used, for example the saliency described in Laurent Itti, Christof Koch, and Ernst Niebur, "A Model of Saliency-Based Visual Attention for Rapid Scene Analysis", IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 20, Issue 11, November 1998, pages 1254-1259. The saliency calculation unit 507 stores the calculated saliency in the RAM 154 in association with the motion vector.

Ｓ２０７において動体選択部５０６は、背景ベクトルbとのユークリッド距離を算出する動きベクトルを、所定の閾値以上の顕著度を持つ動きベクトルに限定する。これにより、動体検出を顕著領域に限定して適用することができる。 In S207, the moving body selection unit 506 limits the motion vector for calculating the Euclidean distance from the background vector b to the motion vector having the saliency of a predetermined threshold value or more. Accordingly, the moving body detection can be applied to the salient region only.

（動体検出処理の別例２）
次に、Ｓ２０７における動体検出処理のさらに別の例について説明する。本例において動体選択部５０６は、デジタルカメラ１００に近い領域から優先的に動体検出を適用する。これにより、動体が複数存在する場合、手前にいる動体を優先して主被写体として検出することができる。 (Another example 2 of moving object detection processing)
Next, still another example of the moving body detection process in S207 will be described. In this example, the moving body selection unit 506 preferentially applies moving body detection from an area close to the digital camera 100. Accordingly, when there are a plurality of moving objects, the moving object in front can be preferentially detected as the main subject.

距離マップ入力部５０９は、距離マップ生成部１６１がフレームtに対して生成した距離マップを取得し、動体選択部５０６に供給する。 The distance map input unit 509 acquires the distance map generated for the frame t by the distance map generation unit 161, and supplies the distance map to the moving body selection unit 506.

Ｓ２０７において動体選択部５０６は、背景ベクトルbとのユークリッド距離を算出する動きベクトルを、その終点座標e_iに対応する被写体距離が閾値未満である動きベクトルに限定する。これにより、動体検出をデジタルカメラ１００に近い領域に限定して適用することができる。 In S207, the moving body selection unit 506 limits the motion vector for calculating the Euclidean distance to the background vector b to the motion vector whose subject distance corresponding to the end point coordinate e _i is less than the threshold value. Accordingly, the moving body detection can be applied to only the area close to the digital camera 100.

あるいは、動体選択部５０６は、背景ベクトルbとのユークリッド距離が閾値以上である動きベクトルの中から、終点座標e_iに対応する被写体距離が最小であるe_iを動体検出位置としてもよい。 Alternatively, the moving object selection unit 506 may set e _i having the smallest subject distance corresponding to the end point coordinates e _i as a moving object detection position from the motion vectors whose Euclidean distance to the background vector b is equal to or greater than the threshold value.

The latter example will be described with reference to FIG. 7. FIGS. 7(a) and 7(b) schematically show frame images that contain a person and a dog as moving objects. In both frame images, the motion vectors of the person and the dog are assumed to have a Euclidean distance to the background vector b that is equal to or greater than the threshold, so that they can be presumed to be motion vectors of moving objects. When, among the motion vectors whose Euclidean distance is equal to or greater than the threshold, the end point e_i with the smallest corresponding subject distance is taken as the moving object detection position, the end point of a motion vector of the person is selected in FIG. 7(a), and the end point of a motion vector of the dog is selected in FIG. 7(b).
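The nearest-subject variant can be sketched as follows. It assumes that a subject distance has already been looked up in the distance map for each end point e_i; the list `subject_distances` and the function name are hypothetical.

```python
import math


def select_nearest_moving_object(vectors, end_points, subject_distances,
                                 b, dist_threshold):
    """Among vectors far enough from b, return the end point closest to the camera.

    vectors:           motion vectors v_i
    end_points:        end point e_i of each v_i
    subject_distances: subject distance at each e_i, taken from the distance map
    b:                 background vector
    dist_threshold:    minimum Euclidean distance from b for a moving object
    """
    candidates = [
        i for i, v in enumerate(vectors)
        if math.hypot(v[0] - b[0], v[1] - b[1]) >= dist_threshold
    ]
    if not candidates:
        return None  # no vector qualifies as belonging to a moving object
    nearest = min(candidates, key=lambda i: subject_distances[i])
    return end_points[nearest]
```

With two qualifying moving objects, swapping which one is nearer to the camera changes the selected position, which is the behaviour illustrated by the person and the dog in FIG. 7.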

（背景ベクトル算出の別例）
上述の構成では、動きベクトルをクラスタリングして背景クラスタを選択し、背景クラスタを構成する動きベクトルから背景ベクトルbを算出した。しかし、デジタルカメラ１００の動きと撮影レンズ１０１の焦点距離（画角）を用いて、クラスタリングを行わずに背景ベクトルを算出してもよい。これにより、動きベクトルのクラスタリングに係る演算負荷を除去することができる。また、背景領域に色変化が少なく動きベクトルを検出することが難しいシーンでこの手法が有用である。 (Another example of background vector calculation)
In the above configuration, the motion vector is clustered to select the background cluster, and the background vector b is calculated from the motion vector forming the background cluster. However, the background vector may be calculated without clustering using the movement of the digital camera 100 and the focal length (angle of view) of the taking lens 101. As a result, the calculation load related to the clustering of motion vectors can be removed. In addition, this method is useful in a scene where there is little color change in the background area and it is difficult to detect a motion vector.

背景ベクトル算出部５０５は、フレームtを撮影した際の撮影レンズ１０１（撮影光学系）の焦点距離ｆ[mm]をズーム制御部１１３を通じて取得する。また、動き入力部５０８は、動き検出部１６２からデジタルカメラ１００の動き情報（ここではヨー軸およびピッチ軸周りの角速度）を取得し、背景ベクトル算出部５０５に供給する。 The background vector calculation unit 505 acquires the focal length f [mm] of the photographing lens 101 (photographing optical system) when photographing the frame t through the zoom control unit 113. Further, the motion input unit 508 acquires the motion information of the digital camera 100 (here, the angular velocity around the yaw axis and the pitch axis) from the motion detection unit 162, and supplies it to the background vector calculation unit 505.

The background vector calculation unit 505 can calculate the background vector b according to equation (6) from the focal length f, the angular velocities around the yaw axis and the pitch axis, and the following quantities. A is a coefficient for converting a real distance [mm] on the image sensor 141 into a distance [pixel] in the image coordinate system; it is image sensor information determined by the structure of the digital camera 100 and the image size, and is stored in advance in, for example, the ROM 155. T is the capture interval [sec] between frame t-1 and frame t.

背景ベクトル算出部５０５は、算出した背景ベクトルbを動体選択部５０６に供給する。動体選択部５０６の処理は上述した通りでよい。 The background vector calculation unit 505 supplies the calculated background vector b to the moving body selection unit 506. The processing of the moving body selection unit 506 may be as described above.

As described above, according to the present embodiment, the motion vector of a moving object is selected based on the Euclidean distance between each motion vector and the background vector, so that even a moving object moving in the same direction as the background can be detected.

(Second embodiment)
Next, a second embodiment of the present invention will be described. The present embodiment relates to an image processing apparatus, and a control method therefor, that determines with high probability the main subject intended by the user when determining the main subject using motion information. In the following, the same reference numerals are used for configurations and operations that are the same as or similar to those of the first embodiment, and duplicate description is omitted.

図８は、本実施形態における動体検出処理を説明するために、画像処理部１５２を動体検出処理に特化した機能ブロックによって表現した模式図である。図８に示す機能ブロックのそれぞれは、別個のハードウェア回路として実現されてもよいし、画像処理部１５２が有するプログラマブルプロセッサがメモリにプログラムを読み込んで実行することによって実現されてもよい。 FIG. 8 is a schematic diagram in which the image processing unit 152 is represented by a functional block specialized for the moving body detection process in order to explain the moving body detection process in the present embodiment. Each of the functional blocks illustrated in FIG. 8 may be realized as a separate hardware circuit, or may be realized by a programmable processor included in the image processing unit 152 reading a program into a memory and executing the program.

画像入力部５０１は、撮影時刻が異なる２フレームの入力画像をＲＡＭ１５４から取得する。現フレームはセンサ制御部１４３から取得し、前フレーム（過去フレーム）はＲＡＭ１５４から取得してもよい。 The image input unit 501 acquires, from the RAM 154, input images of two frames having different shooting times. The current frame may be acquired from the sensor control unit 143, and the previous frame (past frame) may be acquired from the RAM 154.

動きベクトル検出部５０２は、画像入力部５０１が取得した入力画像間で複数の動きベクトルを検出する。動きベクトルの検出は、任意の公知の手法を用いて実現することができるが、本実施形態では一例としてテンプレートマッチングを用いて動きベクトルを検出するものとする。 The motion vector detection unit 502 detects a plurality of motion vectors between the input images acquired by the image input unit 501. The detection of the motion vector can be realized by using any known method, but in the present embodiment, the motion vector is detected by using template matching as an example.

The clustering unit 503 classifies the plurality of motion vectors detected by the motion vector detection unit 502 into at least the following three clusters.
- Background cluster: motion vectors of the entire image (for example, vectors based on the motion of the image capturing apparatus 100)
- Main subject cluster: motion vectors of the main subject determined in the previous frame or earlier
- Main subject candidate cluster: vectors at a large distance from the motion vectors belonging to the background cluster (vectors with large motion in real space)

これら少なくとも３つのクラスタに分類することで、動きに基づく主被写体判別処理において、現在の主被写体の動き情報を考慮した判別を実現できる。そのため、ユーザの意図する主被写体を正しく判別する確率を高めることができる。例えば、現在の主被写体が実空間で静止していれば、動きのある別の被写体を新たな主被写体として判別することが可能である。あるいは、現在の主被写体が実空間で動いていれば、別の動体が存在していても、主被写体の判別結果を変更しないようにすることも可能である。 By classifying these into at least three clusters, it is possible to realize the determination in consideration of the current movement information of the main subject in the main subject determination process based on the movement. Therefore, it is possible to increase the probability of correctly distinguishing the main subject intended by the user. For example, if the current main subject is still in the real space, another moving subject can be determined as a new main subject. Alternatively, if the current main subject is moving in the real space, it is possible not to change the determination result of the main subject even if another moving body exists.

主被写体判別部８０２では、クラスタリング部５０３によるクラスタリングの結果に基づいて、現フレーム内の主被写体領域を判別する。主被写体判別部８０２で判別された主被写体領域の情報は、例えば自動露出制御や自動焦点検出といった各種の処理に用いることができる。また、被写体追尾部８０１は、主被写体判別部８０２で判別された主被写体領域をテンプレートとしたマッチング処理により、次フレーム以降の入力画像中の主被写体領域を探索する被写体追尾処理を実行する。被写体追尾部８０１による被写体追尾結果（画像内の主被写体領域に関する情報）は、クラスタリング部５０３にも供給される。 The main subject discriminating unit 802 discriminates the main subject region in the current frame based on the result of the clustering by the clustering unit 503. The information on the main subject area determined by the main subject determination unit 802 can be used for various processes such as automatic exposure control and automatic focus detection. In addition, the subject tracking unit 801 executes subject tracking processing for searching for the main subject region in the input image of the next frame and subsequent frames by the matching process using the main subject region determined by the main subject determination unit 802 as a template. The result of subject tracking by the subject tracking unit 801 (information regarding the main subject region in the image) is also supplied to the clustering unit 503.

なお、動き情報に基づく主被写体判別処理には、画像入力部５０１が取得する入力画像以外に、撮像装置１００で取得できる他の情報を用いてもよい。例えば、動き検出部１６２が検出した動きの情報を動き入力部５０８を通じてクラスタリング部５０３に入力し、背景クラスタの推定に用いることができる。また、距離マップ生成部１６１で得られる距離マップを距離マップ入力部５０９を通じて主被写体判別部８０２に入力し、主被写体判別処理に利用することができる。 In addition to the input image acquired by the image input unit 501, other information that can be acquired by the image capturing apparatus 100 may be used for the main subject determination process based on the motion information. For example, the information on the motion detected by the motion detection unit 162 can be input to the clustering unit 503 through the motion input unit 508 and used for estimating the background cluster. Further, the distance map obtained by the distance map generation unit 161 can be input to the main subject determination unit 802 through the distance map input unit 509 and used for the main subject determination processing.

(Main subject determination processing)
FIG. 9 is a flowchart of the main subject determination processing performed by the image processing unit 152.
In S901, the image input unit 501 acquires two frames of input images with different capture times.
(Motion vector detection)
In S902, the motion vector detection unit 502 detects a plurality of motion vectors between the two frames of input images acquired by the image input unit 501. The motion vector detection unit 502 divides the earlier-captured of the two frames (referred to as frame t-1) into a plurality of sections in each of the horizontal and vertical directions to generate a plurality of partial images. It then performs template matching using each partial image as a template and the later-captured frame (referred to as frame t) as the reference image, searching the reference image for the region with the highest similarity to the partial image. The vector whose start point is the center of the partial image and whose end point is the center of the best-matching region found by the search is taken as the motion vector of that partial image. In this way, as in the first embodiment, the motion vector detection unit 502 detects a motion vector for each partial image of frame t-1. The motion vector detection unit 502 then stores each detected motion vector v_i in the RAM 154 in association with its end point e_i.

（クラスタリング）
Ｓ９０３からＳ９０６までは、クラスタリング部５０３よる、動きベクトルのクラスタリング処理に関する。
Ｓ９０３においてクラスタリング部５０３は、Ｓ９０２において検出された動きベクトル群v_iをクラスタリングする。クラスタリングは第１実施形態と同様にK-Means法や、Affinity Propagationなど、公知の任意のクラスタリング手法を用いることができる。 (Clustering)
S903 to S906 relate to motion vector clustering processing by the clustering unit 503.
In S903, the clustering unit 503 clusters the motion vector group v _i detected in S902. For clustering, any known clustering method such as the K-Means method or Affinity Propagation can be used as in the first embodiment.

FIG. 10(a) schematically shows the image of the current frame and the motion vectors detected between the current frame and the previous frame. FIG. 10(c) shows an example of the result of clustering these motion vectors using Affinity Propagation, a clustering method that can determine the number of clusters automatically. When a clustering method that cannot determine the number of clusters automatically, such as the K-Means method, is used, the number of clusters is determined by some other means before the clustering is performed.

（背景クラスタの選択）
Ｓ９０４において、クラスタリング部５０３は、Ｓ９０３におけるクラスタリングによって得られたクラスタ（図１０（ｃ））のうち、背景領域のベクトルから成る背景クラスタを選択する（図１０（ｄ））。背景クラスタの選択は、第１実施形態で図５や図６を用いて説明した方法で行うことができる。 (Select background cluster)
In step S904, the clustering unit 503 selects a background cluster including a vector of the background region from the clusters (FIG. 10C) obtained by the clustering in step S903 (FIG. 10D). The background cluster can be selected by the method described with reference to FIGS. 5 and 6 in the first embodiment.

(Selection of main subject cluster)
Returning to FIG. 9, in S905 the clustering unit 503 selects, from the clusters obtained in S903, the main subject cluster formed by the vectors of the main subject region. The main subject region has been determined by the main subject determination unit 802 in the previous frame or earlier. The subject tracking unit 801 searches for the main subject region in the image of the current frame acquired by the image input unit 501, for example by template matching that uses the most recently determined main subject region as the template.

Specifically, the subject tracking unit 801 sequentially calculates the similarity between a partial region of the image of the current frame and the main subject region (template) while changing the position of the partial region, thereby searching the image of the current frame for the region with the highest similarity to the template. The similarity may be, for example, a correlation amount. When the similarity of the partial region with the highest similarity exceeds a predetermined threshold, the subject tracking unit 801 determines that this partial region shows the same image as the template (that is, that it is the main subject region in the image of the current frame).
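The tracking search described above can be sketched as an exhaustive template match. Zero-mean normalised cross-correlation is used here as one possible "correlation amount"; that choice, the exhaustive scan, and all names are assumptions made for this illustration.

```python
import math


def ncc(image, template, ox, oy):
    """Zero-mean normalised cross-correlation of the template with the
    image patch whose top-left corner is at (ox, oy)."""
    th, tw = len(template), len(template[0])
    patch = [image[oy + y][ox + x] for y in range(th) for x in range(tw)]
    tmpl = [template[y][x] for y in range(th) for x in range(tw)]
    mp, mt = sum(patch) / len(patch), sum(tmpl) / len(tmpl)
    num = sum((p - mp) * (t - mt) for p, t in zip(patch, tmpl))
    den = math.sqrt(sum((p - mp) ** 2 for p in patch)
                    * sum((t - mt) ** 2 for t in tmpl))
    return num / den if den > 0 else 0.0  # flat patches carry no correlation


def track_subject(image, template, threshold):
    """Return (x, y, width, height) of the main subject region, or None."""
    th, tw = len(template), len(template[0])
    best_score, best_pos = -1.0, None
    for oy in range(len(image) - th + 1):
        for ox in range(len(image[0]) - tw + 1):
            score = ncc(image, template, ox, oy)
            if score > best_score:
                best_score, best_pos = score, (ox, oy)
    # Accept the best match only if its similarity exceeds the threshold.
    if best_pos is not None and best_score > threshold:
        return (best_pos[0], best_pos[1], tw, th)
    return None
```

Returning None corresponds to the case in which no tracking result is obtained, for which the text allows a fallback such as a user-designated region.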

図１０（ｂ）に、被写体追尾部８０１によって主被写体領域と判定された領域５０１を示す。クラスタリング部５０３は、被写体追尾部８０１から主被写体領域の情報を取得し、主被写体領域内に終点e_iが含まれる動きベクトルを主被写体クラスタに割り当てる（図１０（ｄ））。なお、被写体追尾部８０１による被写体追尾結果が得られない場合、クラスタリング部５０３は例えばユーザが指定した画像領域を主被写体領域として用いてもよい。 FIG. 10B shows an area 501 determined as the main object area by the object tracking unit 801. The clustering unit 503 acquires information on the main subject region from the subject tracking unit 801 and assigns a motion vector having the end point e _i in the main subject region to the main subject cluster (FIG. 10(d)). When the subject tracking result by the subject tracking unit 801 cannot be obtained, the clustering unit 503 may use, for example, the image region designated by the user as the main subject region.

(Selection of the main subject candidate cluster)
In S906, the clustering unit 503 selects, from the clusters obtained in S903, a main subject candidate cluster composed of motion vectors that indicate a candidate for a new main subject region. First, the clustering unit 503 calculates a background vector b representing the motion of the background from the motion vectors belonging to the background cluster selected in S905. Here, as in the processing performed in S206 by the background cluster selection unit 504 of the first embodiment, the average of the motion vectors belonging to the background cluster is calculated as the background vector b (FIG. 10(d)), but the calculation is not limited to this.

The clustering unit 503 then calculates the Euclidean distance from the background vector b for every motion vector detected by the motion vector detection unit 502, and selects the motion vector for which this distance is largest. This processing is the same as that performed in S207 by the moving body selection unit 506 of the first embodiment. Specifically, the clustering unit 503 determines the index m of the motion vector according to equations (2) and (3). When several motion vectors share the maximum Euclidean distance, several indices m are determined. When dist_m (that is, the maximum Euclidean distance between the background vector b and the motion vectors v_i) is less than a predetermined threshold, the clustering unit 503 determines that there is no motion vector to assign to the main subject candidate cluster. When dist_m is equal to or greater than the threshold, the clustering unit 503 assigns the motion vector v_m to the main subject candidate cluster (FIG. 10(d)).
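The following sketch illustrates one possible reading of this step: the background vector b is taken as the mean of the background cluster, the distance of each motion vector v_i from b is its Euclidean distance, and the indices attaining the maximum are kept when that maximum reaches the threshold. Equations (2) and (3) are not reproduced here, so the code is an interpretation of the prose only; the function names, the (vx, vy) tuple format, and the assumption of a non-empty background cluster are hypothetical.

```python
import math


def mean_vector(vectors):
    """Average of a non-empty list of (vx, vy) motion vectors."""
    n = len(vectors)
    return (sum(v[0] for v in vectors) / n, sum(v[1] for v in vectors) / n)


def select_candidate_indices(vectors, background_indices, threshold):
    """Return (b, indices of the vectors farthest from b).

    `vectors` holds every detected motion vector as (vx, vy);
    `background_indices` lists those in the background cluster (assumed
    non-empty). The index list is empty when the largest distance is
    below `threshold`.
    """
    b = mean_vector([vectors[i] for i in background_indices])
    dists = [math.hypot(vx - b[0], vy - b[1]) for vx, vy in vectors]
    dist_m = max(dists)
    if dist_m < threshold:
        return b, []  # no vector is assigned to the candidate cluster
    # Several indices are returned when several vectors tie for the maximum.
    return b, [i for i, d in enumerate(dists) if d == dist_m]
```

Because the comparison uses the Euclidean distance rather than an angle difference, a vector such as (-1, 0) is separated from a background vector of (-5, 0) even though both point in the same direction, which is the panning case the description is concerned with.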

(Main subject determination)
In S907, the main subject determination unit 802 determines the main subject based on the motion vectors belonging to the three clusters: background, main subject, and main subject candidate. FIG. 11 is a flowchart showing the details of the main subject determination processing in S907.

In S1101, the main subject determination unit 802 calculates the Euclidean distance between a representative vector of the motion vectors belonging to the main subject cluster (for example, their average vector) and a representative vector of the motion vectors belonging to the background cluster (for example, the background vector b). The main subject determination unit 802 then determines whether the calculated Euclidean distance is less than a predetermined threshold. If it is (the distance is small), the processing proceeds to S1102; otherwise, it proceeds to S1104. A distance below the threshold indicates that the main subject is moving little in real space. When no motion vector belongs to the main subject cluster, the main subject determination unit 802 advances the processing to S1102; when no motion vector belongs to the background cluster, it advances the processing to S1104.

In S1102, the main subject determination unit 802 determines whether the main subject candidate cluster contains a motion vector that satisfies a predetermined condition. If such a motion vector is determined to exist, the processing proceeds to S1103; otherwise, it proceeds to S1104. In the present embodiment, the main subject candidate cluster consists of only one kind of motion vector, so the main subject determination unit 802 may instead determine whether the motion vector belonging to the main subject candidate cluster satisfies the predetermined condition.

In S1103, the main subject determination unit 802 determines that the partial region of the current frame image in which the motion vector belonging to the main subject candidate cluster (here, the motion vector v_m) was detected is the new main subject region. That is, the main subject determination unit 802 decides to change the main subject. When the current frame image contains several partial regions in which motion vectors belonging to the main subject candidate cluster were detected, the main subject determination unit 802 may determine a group of partial regions, each adjacent to one or more of the others, to be the main subject region.

The predetermined condition in S1102 is a condition for determining whether the main subject intended by the user is more likely to be a different subject, corresponding to the motion vector belonging to the main subject candidate cluster, than the current main subject. When the predetermined condition is determined to be satisfied, the main subject determination unit 802 estimates that the main subject intended by the user is that different subject and changes the main subject. The main subject determination unit 802 then notifies the subject tracking unit 801 of information on the new main subject region. In response to the notification, the subject tracking unit 801 updates the template used for tracking processing to a template based on the new main subject region.

For example, when the user is framing with the digital camera 100 so as to place a subject at the center of the screen, that subject is likely to be the main subject intended by the user. By using as the predetermined condition that the motion vector belonging to the main subject candidate cluster points in a direction approaching the image center, the main subject region can be changed when such a camera operation is being performed for a different subject. As a result, the main subject intended by the user can be determined with high probability.
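One way to test this condition is sketched below. Interpreting "a direction approaching the image center" as "the end point is closer to the center than the start point" is an assumption of this example, as are the function name and the (x, y) point format.

```python
import math


def moves_toward_center(start, end, center):
    """True when the end point is closer to `center` than the start point.

    All three arguments are (x, y) coordinates in the image (assumed format).
    """
    d_start = math.hypot(start[0] - center[0], start[1] - center[1])
    d_end = math.hypot(end[0] - center[0], end[1] - center[1])
    return d_end < d_start
```

An alternative reading would compare the angle between the motion vector and the direction from its start point to the center; the distance comparison used here is simply the shortest formulation.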

A subject approaching the digital camera 100 is also likely to be the main subject. Therefore, by using as the predetermined condition that the motion vector belonging to the main subject candidate cluster is a motion vector in a direction approaching the digital camera 100, the main subject region can be changed when a different subject is approaching the digital camera 100. In this case as well, the main subject intended by the user can be determined with high probability.

Whether a motion vector is a motion vector in a direction approaching the camera can be determined using the distance map acquired by the distance map input unit 509. A motion vector in a direction approaching the digital camera 100 is one along which the distance to the different subject decreases. Among the motion vectors belonging to the main subject candidate cluster, one whose subject distance at the end point is smaller than at the start point can be determined to be a motion vector in a direction approaching the digital camera 100.
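A minimal sketch of this comparison follows. It assumes that the start point is looked up in a distance map for the earlier frame and the end point in a distance map for the current frame, and that each map is a 2-D list indexed as `map[y][x]`; the description does not specify these details, so they are labeled here as assumptions.

```python
def is_approaching(start, end, prev_distance_map, curr_distance_map):
    """True when the subject distance at the end point is smaller than the
    subject distance at the start point.

    `start` and `end` are integer (x, y) pixel positions. The two maps and
    their [y][x] indexing are assumptions of this sketch.
    """
    sx, sy = start
    ex, ey = end
    return curr_distance_map[ey][ex] < prev_distance_map[sy][sx]
```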

The predetermined conditions described here are merely examples, and any other condition may be defined for determining that the subject corresponding to the motion vector belonging to the main subject candidate cluster is likely to be the main subject intended by the user. There may also be a plurality of predetermined conditions. In that case, the main subject region may be changed when at least one of the conditions is determined to be satisfied, or only when all of the conditions are determined to be satisfied.
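The two ways of combining a plurality of conditions can be expressed compactly, as in the following sketch. The helper name `combine_conditions` and the representation of each condition as a callable that takes a motion vector are hypothetical.

```python
def combine_conditions(conditions, mode="any"):
    """Build a single condition from several.

    `conditions` is a list of callables taking a motion vector and returning
    a bool. With mode "any", one satisfied condition is enough; with mode
    "all", every condition must be satisfied.
    """
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    combiner = any if mode == "any" else all
    return lambda vector: combiner(cond(vector) for cond in conditions)
```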

In S1104, the main subject determination unit 802 decides to maintain (not change) the main subject region. For example, when the Euclidean distance between the representative vector of the motion vectors belonging to the main subject cluster and the representative vector of the motion vectors belonging to the background cluster is equal to or greater than the threshold, the main subject is considered to be moving significantly. In this case, the current main subject is not changed. Likewise, when no moving subject is detected, the dist_m described above falls below its threshold and no motion vector is assigned to the main subject candidate cluster, so the current main subject region is maintained.
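The flow of S1101 to S1104 can be summarized in a short sketch. The function name, the use of `None` to indicate a cluster with no motion vectors, and the string return values are assumptions of this example. The description does not say which branch applies when both the main subject cluster and the background cluster are empty; the sketch checks the main subject cluster first, which is likewise an assumption.

```python
import math


def decide_main_subject(main_rep, background_rep, candidate_vectors,
                        distance_threshold, condition):
    """Return "change" (S1103) or "keep" (S1104).

    `main_rep` and `background_rep` are representative (vx, vy) vectors, or
    None when the corresponding cluster has no motion vectors.
    `candidate_vectors` are the vectors of the main subject candidate cluster
    and `condition` is a callable implementing the predetermined condition.
    """
    # S1101: is the main subject moving little relative to the background?
    if main_rep is None:
        proceed_to_s1102 = True
    elif background_rep is None:
        proceed_to_s1102 = False
    else:
        distance = math.hypot(main_rep[0] - background_rep[0],
                              main_rep[1] - background_rep[1])
        proceed_to_s1102 = distance < distance_threshold
    if not proceed_to_s1102:
        return "keep"  # S1104: the main subject is moving significantly

    # S1102: does any candidate vector satisfy the predetermined condition?
    if any(condition(v) for v in candidate_vectors):
        return "change"  # S1103
    return "keep"  # S1104 (also reached when the candidate cluster is empty)
```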

As described above, in the present embodiment, when the main subject is determined based on the motion vectors detected between frames, the motion vector of a subject other than the current main subject is taken into consideration. When the motion vector of that other subject suggests, for example, that the main subject intended by the user is the other subject rather than the current main subject, the main subject is changed. This increases the likelihood of determining a main subject that matches the user's intention.

In the present embodiment, the main subject candidate cluster consists only of the motion vectors having the maximum Euclidean distance from the background vector b. However, the main subject candidate cluster may be defined in other ways. For example, it may consist of the motion vector having the maximum Euclidean distance from the background vector b together with the motion vectors whose differences in magnitude and direction from that motion vector are less than thresholds.
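The alternative definition of the main subject candidate cluster might be sketched as follows. Measuring the difference in direction as an angle in radians, using separate thresholds for magnitude and direction, and the function name are assumptions of this example; both thresholds are assumed to be positive so that the reference vector itself is always included.

```python
import math


def expand_candidate_cluster(vectors, background, mag_threshold, angle_threshold):
    """Return indices of vectors similar to the one farthest from `background`.

    `vectors` is a non-empty list of (vx, vy); `background` is the background
    vector b. A vector joins the cluster when its magnitude differs from the
    reference by less than `mag_threshold` and its direction differs by less
    than `angle_threshold` radians.
    """
    dists = [math.hypot(vx - background[0], vy - background[1])
             for vx, vy in vectors]
    m = max(range(len(vectors)), key=dists.__getitem__)
    ref_mag = math.hypot(*vectors[m])
    ref_ang = math.atan2(vectors[m][1], vectors[m][0])

    cluster = []
    for i, (vx, vy) in enumerate(vectors):
        mag_diff = abs(math.hypot(vx, vy) - ref_mag)
        ang_diff = abs(math.atan2(vy, vx) - ref_ang)
        ang_diff = min(ang_diff, 2 * math.pi - ang_diff)  # wrap around +/- pi
        if mag_diff < mag_threshold and ang_diff < angle_threshold:
            cluster.append(i)
    return cluster
```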

In S1101, the processing proceeded to S1102 when the Euclidean distance between the representative vector of the motion vectors belonging to the main subject cluster and the representative vector of the motion vectors belonging to the background cluster was less than the threshold. However, even when the Euclidean distance is less than the threshold, the processing may proceed to S1104 instead of S1102 if the main subject is determined, based on the distance map, to be approaching.

(Other embodiments)
The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more of the functions.

The above-described embodiments are merely specific examples intended to aid understanding of the present invention, and are in no way intended to limit the present invention to them. All embodiments falling within the scope defined by the claims are included in the present invention.

100: imaging apparatus, 101: lens unit, 141: image sensor, 142: imaging signal processing unit, 143: imaging control unit, 151: main control unit, 152: image processing unit, 154: RAM, 155: ROM, 161: distance map generation unit, 162: motion detection unit

Claims

An image processing apparatus comprising:
motion vector detection means for detecting a plurality of motion vectors between a plurality of images;
calculating means for calculating a background vector representing the motion of the background based on the plurality of motion vectors; and
moving body detection means for detecting a motion vector of a moving body from the plurality of motion vectors based on the magnitude of the Euclidean distance between each of the plurality of motion vectors and the background vector.

The image processing apparatus according to claim 1, further comprising:
clustering means for generating clusters by clustering the plurality of motion vectors; and
selection means for selecting a background cluster from the clusters generated by the clustering means,
wherein the calculating means calculates the background vector from the motion vectors forming the background cluster.

3. The cluster according to claim 2, wherein the selecting unit selects, from the clusters generated by the clustering unit, a cluster having a maximum existence range in a motion vector image forming the cluster as the background cluster. Image processing device.

3. The selection unit selects, as the background cluster, a cluster having a maximum variance of coordinates of a start point or an end point of a motion vector forming the cluster among the clusters generated by the clustering unit. The image processing device according to item 3.

The image processing apparatus according to claim 2, further comprising motion detecting means for detecting a motion of the image processing apparatus,
wherein the calculating means estimates the movement direction of the background from the motion of the image processing apparatus detected by the motion detecting means, and
the selection means selects the background cluster from the clusters generated by the clustering means using the movement direction of the background.

The image processing apparatus according to claim 5, wherein the selection means selects, as the background cluster, the cluster among those generated by the clustering means whose representative vector forms the smallest angle with the movement direction of the background.

Motion detecting means for detecting the motion of the image processing device;
Further comprising an acquisition unit that acquires information about the focal length of the imaging optical system that has captured the plurality of images and the image sensor,
The calculation unit calculates the background vector from the movement of the image processing device, the focal length, the information of the image sensor, and the shooting intervals of the plurality of images. The image processing device described.

Further comprising a saliency calculating means for calculating the saliency of the image at the coordinates of the end point of the motion vector,
8. The moving body detection means detects the motion vector of the moving body from among the plurality of motion vectors, a motion vector in which the saliency of the image at the end point coordinates is equal to or more than a threshold value. The image processing device according to item 1.

The image processing apparatus according to claim 1, further comprising distance detecting means for detecting a subject distance at a pixel position,
wherein the moving body detection means detects the motion vector of the moving body from among those of the plurality of motion vectors whose subject distance at the pixel position of the end point coordinates is less than a threshold.

The image processing apparatus according to claim 1, further comprising distance detecting means for detecting a subject distance at a pixel position,
wherein the moving body detection means detects, as the motion vector of the moving body, a motion vector among the plurality of motion vectors whose Euclidean distance is equal to or greater than a threshold and whose subject distance at the pixel position of the end point coordinates is the smallest.

An image pickup apparatus comprising:
the image processing apparatus according to claim 1;
an image sensor for capturing the plurality of images; and
control means for performing focus detection and/or exposure control based on the motion vector of the moving body detected by the image processing apparatus.

A method for controlling an image processing apparatus, comprising:
detecting a plurality of motion vectors between a plurality of images;
calculating a background vector representing the motion of the background based on the plurality of motion vectors; and
detecting a motion vector of a moving body from the plurality of motion vectors based on the magnitude of the Euclidean distance between each of the plurality of motion vectors and the background vector.

A program for causing a computer to function as each unit of the image processing apparatus according to claim 1.

An image processing apparatus comprising:
vector detection means for detecting a plurality of motion vectors between a plurality of images; and
discrimination means for determining a region of a main subject based on the plurality of motion vectors,
wherein the discrimination means
detects, among the plurality of motion vectors, a motion vector relating to a different subject other than the current main subject, and
determines the different subject to be a new main subject when the motion vector relating to the different subject satisfies a predetermined condition.

The image processing apparatus according to claim 14, wherein the predetermined condition is that the motion vector relating to the different subject is a motion vector in a direction toward the center of the image.

The image processing apparatus according to claim 14 or 15, wherein the predetermined condition is that the motion vector relating to the different subject is a motion vector in a direction in which the distance to the different subject decreases.

The image processing apparatus according to any one of claims 14 to 16, wherein the discrimination means
calculates, from the plurality of motion vectors, a background vector representing the motion of the background, and
detects, as the motion vector relating to the different subject, a motion vector among the plurality of motion vectors whose distance from the background vector is equal to or greater than a threshold.

The image processing apparatus according to claim 17, wherein the discrimination means does not change the current main subject when the Euclidean distance between the motion vector relating to the current main subject, among the plurality of motion vectors, and the background vector is equal to or greater than a threshold.

The image processing apparatus according to claim 17 or 18, wherein the discrimination means does not change the current main subject when the current main subject is determined to be approaching based on the motion vector relating to the current main subject among the plurality of motion vectors.

Further comprising clustering means for clustering the plurality of motion vectors to generate a cluster,
The discrimination means is
Among the clusters generated by the clustering unit, a cluster having the widest distribution range of the detection positions of the motion vectors forming the cluster is selected as the background cluster,
Calculating the background vector from the motion vector forming the background cluster,
20. The image processing apparatus according to claim 17, wherein the image processing apparatus is an image processing apparatus.

Further comprising clustering means for clustering the plurality of motion vectors to generate a cluster,
The discrimination means is
Of the clusters generated by the clustering unit, a cluster having the largest variance of the coordinates of the start point or the end point of the motion vector forming the cluster is selected as the background cluster,
Calculating the background vector from the motion vector forming the background cluster,
20. The image processing apparatus according to claim 17, wherein the image processing apparatus is an image processing apparatus.

Clustering means for generating a cluster by clustering the plurality of motion vectors,
Further comprising a motion detecting means for detecting a motion of the image processing device,
The discrimination means is
Estimating the motion direction of the background from the motion of the image processing device detected by the motion detection means,
Using the background motion direction, select a background cluster from the clusters generated by the clustering means,
Calculating the background vector from the motion vector forming the background cluster,
20. The image processing apparatus according to claim 17, wherein the image processing apparatus is an image processing apparatus.

23. The image according to claim 22, wherein the discriminating unit selects, as the background cluster, a cluster having a smallest angle formed by the representative vector and the movement direction of the background among the clusters generated by the clustering unit. Processing equipment.

An image pickup apparatus comprising:
the image processing apparatus according to any one of claims 14 to 23;
an image sensor for capturing the plurality of images; and
control means for performing focus detection and/or exposure control based on the motion vector of the moving body detected by the image processing apparatus.

A method for controlling an image processing apparatus, comprising:
detecting a plurality of motion vectors between a plurality of images;
detecting, among the plurality of motion vectors, a motion vector relating to a different subject other than the current main subject; and
determining the different subject to be a new main subject when the motion vector relating to the different subject satisfies a predetermined condition.

A program for causing a computer to function as each unit of the image processing apparatus according to any one of claims 14 to 23.