JP2023086274A

JP2023086274A - Image processing device and control method for the same

Info

Publication number: JP2023086274A
Application number: JP2021200669A
Authority: JP
Inventors: 友貴植草; Tomotaka Uekusa; 豪山下; Takeshi Yamashita; 寧司大輪; Yasushi Owa; 貴弘宇佐美; Takahiro Usami; 裕也江幡; Hironari Ehata; 浩靖形川; Hiroyasu Katagawa; 浩之谷口; Hiroyuki Taniguchi; 徹相田; Toru Aida; 侑弘小貝; Yukihiro Kogai
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2021-12-10
Filing date: 2021-12-10
Publication date: 2023-06-22

Abstract

To provide an imaging apparatus with a subject tracking function that achieves excellent performance while suppressing power consumption, and to provide a control method for the same. SOLUTION: An imaging apparatus has first tracking means for tracking a subject using an image acquired by imaging means; second tracking means for tracking the subject using the image acquired by the imaging means, the second tracking means having a smaller computational load than the first tracking means; and control means for switching, on the basis of feature points detected in the image acquired by the imaging means, between enabling both the first tracking means and the second tracking means and disabling one of them. SELECTED DRAWING: Figure 2

Description

The present invention relates to an image processing apparatus that performs subject tracking processing, and to a control method thereof.

Some imaging devices such as digital cameras have a function (subject tracking function) of tracking a feature region, such as a face region, by repeatedly applying detection of the feature region over time. Devices that track a subject using a trained neural network are also known (Patent Document 1).

Patent Document 1: JP 2017-156886 A

In techniques for subject recognition and subject tracking using images, machine learning (neural networks, deep learning) can in some cases improve subject tracking accuracy compared with methods that use correlation or similarity between image regions. However, processing using a neural network involves a large amount of computation and requires a high-speed processor or large-scale circuitry, so its power consumption is high. For example, when subject tracking using a neural network is applied to moving images for live view display, battery drain during live view display becomes a problem. Moreover, even among subject tracking methods that use circuits implementing machine-learned models, the computational load and power consumption may differ depending on the learned model.

The present invention has been made in view of the above problem, and its object is to provide an image processing apparatus and an image processing method having a subject tracking function that achieves good performance while suppressing power consumption.

To solve the above problem, an image processing apparatus of the present invention comprises: first tracking means for performing subject tracking using an image acquired by imaging means; second tracking means for performing subject tracking using the image acquired by the imaging means, the second tracking means having a smaller computational load than the first tracking means; and control means for switching, based on feature points detected from the image acquired by the imaging means, between enabling both the first tracking means and the second tracking means and disabling one of them.

According to the present invention, a subject tracking function that achieves good performance while suppressing power consumption can be realized.

Block diagram showing a functional configuration example of the imaging device according to the first embodiment.
Operation flow diagram of the tracking control unit 113 in the imaging device according to the first embodiment.
Diagram showing live view display during subject tracking processing according to the first embodiment.
Operation flow diagram of the control unit 102 in the imaging device according to the second embodiment.
Table showing the relationship between shooting scenes and the operation modes of the detection unit 110 and the tracking unit 105 according to the second embodiment.
Table showing the operation modes of the detection unit 110 and the tracking unit 105 according to the second embodiment.
Operation flow diagram of the control unit 102 according to the third embodiment.
Flowchart of the feature point detection processing performed by the feature point detection unit 201 according to the third embodiment.

Preferred embodiments of the present invention will now be described with reference to the accompanying drawings.

(First embodiment)
The invention will now be described in detail on the basis of exemplary embodiments with reference to the accompanying drawings. The following embodiments do not limit the invention as defined in the claims. Although a plurality of features are described in the embodiments, not all of them are essential to the invention, and the features may be combined arbitrarily. Furthermore, in the accompanying drawings, identical or similar configurations are denoted by the same reference numerals, and redundant description is omitted.

The following embodiments describe the case where the present invention is implemented in an imaging device such as a digital camera. However, the present invention can be implemented in any electronic device that has an imaging function. Such electronic devices include computer devices (personal computers, tablet computers, media players, PDAs, etc.), mobile phones, smartphones, game consoles, robots, drones, and drive recorders. These are examples, and the present invention can also be implemented in other electronic devices.

FIG. 1 is a block diagram showing a functional configuration example of an imaging device 100 as an example of an image processing apparatus according to the first embodiment.

The optical system 101 has a plurality of lenses including a movable lens such as a focus lens, and forms an optical image of the shooting range on the imaging plane of the image sensor 103.

The control unit 102 has a CPU and, for example, loads a program stored in the ROM 123 into the RAM 122 and executes it. The control unit 102 implements the functions of the imaging device 100 by controlling the operation of each functional block. The ROM 123 is, for example, a rewritable non-volatile memory, and stores programs executable by the CPU of the control unit 102, setting values, GUI data, and the like. The RAM 122 is a system memory used to load programs executed by the CPU of the control unit 102 and to store values required during program execution. Although omitted in FIG. 1, the control unit 102 is communicably connected to each functional block.

The image sensor 103 may be, for example, a CMOS image sensor having color filters in a primary color Bayer arrangement. A plurality of pixels having photoelectric conversion regions are arranged two-dimensionally in the image sensor 103. The image sensor 103 converts the optical image formed by the optical system 101 into a group of electrical signals (analog image signal) using the plurality of pixels. The analog image signal is converted into a digital image signal (image data) by an A/D converter included in the image sensor 103 and is then output. The A/D converter may be provided outside the image sensor 103.

The evaluation value generation unit 124 generates signals and evaluation values used for automatic focus detection (AF) and calculates evaluation values used for automatic exposure control (AE) from the image data obtained from the image sensor 103. The evaluation value generation unit 124 outputs the generated signals and evaluation values to the control unit 102. Based on the signals and evaluation values obtained from the evaluation value generation unit 124, the control unit 102 controls the focus lens position of the optical system 101 and determines shooting conditions (exposure time, aperture value, ISO sensitivity, etc.). The evaluation value generation unit 124 may generate the signals and evaluation values from display image data generated by the post-processing unit 114, which will be described later.

The first preprocessing unit 104 applies color interpolation processing to the image data obtained from the image sensor 103. Color interpolation processing, also called demosaicing, is processing that makes each item of pixel data constituting the image data have values for the R, G, and B components. The first preprocessing unit 104 may also apply reduction processing that reduces the number of pixels as necessary. The first preprocessing unit 104 stores the processed image data in the display memory 107.

The first image correction unit 109 applies correction processing, such as white balance correction and shading correction, and conversion processing from RGB format to YUV format to the image data stored in the display memory 107. When applying the correction processing, the first image correction unit 109 may use image data of one or more frames other than the frame being processed from among the image data stored in the display memory 107. For example, the first image correction unit 109 can use image data of frames that precede and/or follow the frame being processed in time series. The first image correction unit 109 outputs the processed image data to the post-processing unit 114.
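The RGB-to-YUV conversion mentioned above can be illustrated with a minimal sketch. The source does not state which conversion matrix is used; the coefficients below are the commonly used BT.601 values and are an assumption, as is the function name `rgb_to_yuv`.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to YUV using BT.601 weights (assumed, not from the source)."""
    # Luma is a weighted sum of the three color components.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    # Chroma components are scaled color differences relative to luma.
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v


if __name__ == "__main__":
    # A neutral (gray or white) pixel has near-zero chroma.
    print(rgb_to_yuv(255, 255, 255))
```

For a neutral pixel both chroma components are approximately zero, which is why white balance correction is typically applied before this conversion.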

The post-processing unit 114 generates recording image data and display image data from the image data supplied from the first image correction unit 109. For example, the post-processing unit 114 applies encoding processing to the image data and generates, as recording image data, a data file storing the encoded image data. The post-processing unit 114 supplies the recording image data to the recording unit 118.

The post-processing unit 114 also generates, from the image data supplied from the first image correction unit 109, display image data to be displayed on the display unit 121. The display image data has a size corresponding to the display size of the display unit 121. The post-processing unit 114 supplies the display image data to the information superimposing unit 120.

The recording unit 118 records the recording image data converted by the post-processing unit 114 on a recording medium 119. The recording medium 119 may be, for example, a semiconductor memory card or a built-in non-volatile memory.

The second preprocessing unit 105 applies color interpolation processing to the image data output by the image sensor 103. The second preprocessing unit 105 stores the processed image data in the tracking memory 108. The tracking memory 108 and the display memory 107 may be implemented as separate address spaces within the same memory space. The second preprocessing unit 105 may also apply reduction processing that reduces the number of pixels as necessary in order to reduce the processing load. Although the first preprocessing unit 104 and the second preprocessing unit 105 are described here as separate functional blocks, a common preprocessing unit may be used instead.

The second image correction unit 106 applies correction processing, such as white balance correction and shading correction, and conversion processing from RGB format to YUV format to the image data stored in the tracking memory 108. The second image correction unit 106 may also apply image processing suited to subject detection to the image data. For example, if the representative luminance of the image data (for example, the average luminance of all pixels) is equal to or less than a predetermined threshold, the second image correction unit 106 may multiply the entire image data by a constant coefficient (gain) so that the representative luminance becomes equal to or greater than the threshold.
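The gain adjustment for dark frames can be sketched as follows. This is an illustrative sketch only: the function name `apply_low_light_gain`, the flat list representation of luminance values, and the choice of the smallest gain that reaches the threshold are assumptions not stated in the source.

```python
def apply_low_light_gain(luma, threshold):
    """Multiply all luminance values by one gain when the frame is too dark.

    Returns the (possibly scaled) values and the gain that was applied.
    Values are not clipped here; a real pipeline would clip to the valid range.
    """
    if not luma:
        raise ValueError("luma must contain at least one pixel")
    mean = sum(luma) / len(luma)  # representative luminance: average of all pixels
    if mean > threshold or mean == 0:
        # Bright enough, or completely black (no finite gain helps): leave unchanged.
        return list(luma), 1.0
    gain = threshold / mean  # smallest uniform gain that raises the mean to the threshold
    return [v * gain for v in luma], gain


if __name__ == "__main__":
    scaled, gain = apply_low_light_gain([10, 20, 30], threshold=40)
    print(gain, scaled)
```

A uniform gain preserves the relative contrast between pixels, which keeps the image usable for subject detection while raising its overall brightness.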

When applying the correction processing, the second image correction unit 106 may use image data of one or more frames other than the frame being processed from among the image data stored in the tracking memory 108. For example, the second image correction unit 106 can use image data of frames that precede and/or follow the frame being processed in time series. The second image correction unit 106 stores the processed image data in the tracking memory 108.

Functional blocks related to the subject tracking function, such as the second preprocessing unit 105 and the second image correction unit 106, need not operate when the subject tracking function is not performed. The image data to which the subject tracking function is applied is moving image data captured for live view display or for recording. The moving image data has a predetermined frame rate such as 30 fps, 60 fps, or 120 fps.

The detection unit 110 detects one or more regions of predetermined candidate subjects (candidate regions) from one frame of image data. For each detected region, the detection unit 110 associates the position and size within the frame, an object class indicating the type of candidate subject (automobile, airplane, bird, insect, human body, head, pupil, cat, dog, etc.), and the confidence of that class. It also counts the number of detected regions for each object class.

The detection unit 110 can detect candidate regions using known techniques for detecting feature regions such as the face regions of people or animals. For example, the detection unit 110 may be configured as a class classifier trained with training data. The classification algorithm is not particularly limited. The detection unit 110 can be realized by training a classifier that implements multi-class logistic regression, a support vector machine, a random forest, a neural network, or the like. The detection unit 110 stores the detection results in the tracking memory 108.

The target determination unit 111 determines the subject region to be tracked (main subject region) from the candidate regions detected by the detection unit 110 or from the tracking result of the tracking unit 115, which will be described later. The subject region to be tracked can be determined, for example, based on priorities assigned in advance to each item included in the detection result, such as the object class and the region size. Specifically, the total of the priorities may be calculated for each candidate region, and the candidate region with the smallest total may be determined as the subject region to be tracked. Alternatively, among the candidate regions belonging to a specific object class, the candidate region closest to the center of the image or to the focus detection region, or the largest candidate region, may be determined as the subject region to be tracked. The target determination unit 111 stores information specifying the determined subject region in the tracking memory 108.
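The priority-sum rule described above can be sketched as follows. The priority tables are not given in the source, so `CLASS_RANK`, `size_rank`, and the `Candidate` type are hypothetical and serve only to show the "smallest total wins" selection.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    object_class: str
    width: int
    height: int


# Hypothetical priority ranks per object class (smaller means higher priority).
CLASS_RANK = {"pupil": 1, "head": 2, "human body": 3, "dog": 4}


def size_rank(candidate):
    """Hypothetical size priority: larger regions receive a smaller rank."""
    area = candidate.width * candidate.height
    if area >= 10000:
        return 1
    if area >= 2500:
        return 2
    return 3


def choose_main_subject(candidates):
    """Return the candidate whose summed priority ranks are smallest, or None."""
    if not candidates:
        return None

    def total(candidate):
        # Unknown classes rank below every listed class.
        class_rank = CLASS_RANK.get(candidate.object_class, len(CLASS_RANK) + 1)
        return class_rank + size_rank(candidate)

    return min(candidates, key=total)


if __name__ == "__main__":
    found = [
        Candidate("dog", 200, 200),
        Candidate("head", 60, 60),
        Candidate("human body", 50, 50),
    ]
    print(choose_main_subject(found))
```

In this example the head region wins with a total of 4, ahead of the dog and the human body, which both total 5.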

The difficulty determination unit 112 calculates a difficulty score, which is an evaluation value indicating the difficulty of tracking, for the subject region to be tracked determined by the target determination unit 111. For example, the difficulty determination unit 112 can calculate the difficulty score by considering one or more factors that affect tracking difficulty. Examples of such factors include, but are not limited to, the size of the subject region, the object class (type) of the subject, the total number of regions belonging to the same object class, and the position within the image. A specific example of how the difficulty score is calculated will be described later. The difficulty determination unit 112 outputs the calculated difficulty score to the tracking control unit 113.

Based on the difficulty score calculated by the difficulty determination unit 112, the tracking control unit 113 determines whether to enable or disable each of the plurality of tracking units included in the tracking unit 115. In this embodiment, the tracking unit 115 has a plurality of tracking units that differ in computational load and tracking accuracy. Specifically, the tracking unit 115 has a DL tracking unit 116 that performs subject tracking using deep learning (DL) and a non-DL tracking unit 117 that performs subject tracking without using DL. The DL tracking unit 116 is assumed to have higher processing accuracy than the non-DL tracking unit 117 but also a larger computational load.

In this case, the tracking control unit 113 determines whether to enable or disable each of the DL tracking unit 116 and the non-DL tracking unit 117. The tracking control unit 113 also determines the operation frequency of each enabled tracking unit. The operation frequency is the frequency (fps) at which the tracking process is applied.

The tracking unit 115 estimates the subject region to be tracked from the image data of the frame being processed (current frame) stored in the tracking memory 108, and obtains the position and size of the estimated subject region within the frame as the tracking result. For example, the tracking unit 115 estimates the subject region to be tracked in the current frame using the image data of the current frame and the image data of a past frame captured before the current frame (for example, the immediately preceding frame). The tracking unit 115 outputs the tracking result to the information superimposing unit 120.

Here, the tracking unit 115 estimates the region within the frame being processed that corresponds to the subject region to be tracked in the past frame. In other words, the subject region to be tracked that the target determination unit 111 determines for the frame being processed is not the tracking target of the tracking process for that frame. The tracking target of the tracking process for the frame being processed is the subject region to be tracked in the past frame. The subject region that the target determination unit 111 determines for the frame being processed is used in the tracking process for the next frame when the subject to be tracked switches to a different subject.

The tracking unit 115 has the DL tracking unit 116, which performs subject tracking using deep learning (DL), and the non-DL tracking unit 117, which performs subject tracking without using DL. The tracking units enabled by the tracking control unit 113 output tracking results at the operation frequency set by the tracking control unit 113.

The DL tracking unit 116 estimates the position and size of the subject region to be tracked using a trained multi-layer neural network that includes convolutional layers. More specifically, the DL tracking unit 116 has a function of extracting feature points of the subject region for each object class that can be a target, together with the feature amounts contained in those feature points, and a function of associating the extracted feature points between frames. The DL tracking unit 116 can therefore estimate the position and size of the subject region to be tracked in the current frame from the feature points of the current frame that are associated with the feature points of the subject region to be tracked in the past frame.

The DL tracking unit 116 outputs the position, size, and reliability score of the subject region to be tracked estimated for the current frame. The reliability score indicates the reliability of the association of feature points between frames, that is, the reliability of the estimation result for the subject region to be tracked. A reliability score indicating that the association of feature points between frames has low reliability means that the subject region estimated in the current frame may be a region of a subject different from the subject region tracked in the past frame.

The non-DL tracking unit 117, on the other hand, estimates the subject region to be tracked in the current frame by a technique that does not use deep learning. Here, the non-DL tracking unit 117 is assumed to estimate the subject region to be tracked based on the similarity of color composition. However, other methods may be used, such as pattern matching that uses the subject region to be tracked in the past frame as a template. The non-DL tracking unit 117 outputs the position, size, and similarity score of the subject region to be tracked estimated for the current frame.

The similarity of color composition is described next. To simplify the explanation, the shape and size of the subject region to be tracked are assumed to be the same in the past frame and the current frame. The image data is also assumed to have a depth of 8 bits (values 0 to 255) for each of the RGB color components.

The non-DL tracking unit 117 divides the range of possible values (0 to 255) of a given color component (for example, the R component) into a plurality of ranges. The non-DL tracking unit 117 then classifies the pixels included in the subject region to be tracked according to the range to which their R component values belong, and treats the result (the frequency for each range of values) as the color composition of the subject region to be tracked.

As the simplest example, assume that the range of possible values of the R component (0 to 255) is divided into Red1, covering 0 to 127, and Red2, covering 128 to 255. Assume that the color composition of the subject region to be tracked in the past frame is 50 pixels in Red1 and 70 pixels in Red2, and that the color composition of the subject region to be tracked in the current frame is 45 pixels in Red1 and 75 pixels in Red2.

In this case, the non-DL tracking unit 117 can calculate a score representing the similarity of color composition (similarity score) from the differences in the number of pixels classified into the same value range, as follows.
Similarity score = |50 - 45| + |70 - 75| = 10
If instead the color composition of the subject region to be tracked in the current frame were 10 pixels in Red1 and 110 pixels in Red2, the similarity score would be
Similarity score = |50 - 10| + |70 - 110| = 80
Thus, the lower the similarity of color composition, the larger the similarity score. Put another way, a smaller similarity score indicates a higher similarity of color composition.
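The worked example above can be reproduced with a short sketch. The function names `color_composition` and `similarity_score`, and the representation of a region as a flat list of R values, are assumptions made for illustration; only the binning and the sum of absolute differences come from the description.

```python
def color_composition(values, num_bins=2, max_value=255):
    """Count how many component values fall into each equal-width value range."""
    bin_width = (max_value + 1) // num_bins  # 128 when splitting 0..255 in two
    counts = [0] * num_bins
    for v in values:
        # Clamp the index so that max_value always lands in the last bin.
        counts[min(v // bin_width, num_bins - 1)] += 1
    return counts


def similarity_score(past, current):
    """Sum of absolute per-range count differences; smaller means more similar."""
    return sum(abs(p - c) for p, c in zip(past, current))


if __name__ == "__main__":
    past = color_composition([0] * 50 + [200] * 70)      # Red1 = 50, Red2 = 70
    close = color_composition([0] * 45 + [200] * 75)     # Red1 = 45, Red2 = 75
    distant = color_composition([0] * 10 + [200] * 110)  # Red1 = 10, Red2 = 110
    print(similarity_score(past, close), similarity_score(past, distant))
```

Because the score is a distance rather than a similarity, a candidate region is a better match when its score is lower.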

The information superimposing unit 120 generates a tracking frame image based on the size of the subject region included in the tracking result output by the tracking unit 115. For example, the tracking frame image may be a frame-shaped image representing the outline of a rectangle that circumscribes the subject region. The information superimposing unit 120 then generates composite image data by superimposing the tracking frame image on the display image data output by the post-processing unit 114 so that the tracking frame is displayed at the position of the subject region included in the tracking result. The information superimposing unit 120 may also generate images representing the current setting values, state, and the like of the imaging device 100, and superimpose them on the display image data output by the post-processing unit 114 so that they are displayed at predetermined positions. The information superimposing unit 120 outputs the composite image data to the display unit 121.
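Superimposing a rectangular tracking frame can be sketched as below. The image is modeled as a list of rows of single-channel values, and the tracking result's position is taken to be the top-left corner of the region; both are assumptions, as is the function name `draw_tracking_frame`.

```python
def draw_tracking_frame(image, x, y, width, height, value=255):
    """Return a copy of image with a 1-pixel rectangular outline drawn on it.

    (x, y) is assumed to be the top-left corner of the subject region.
    Parts of the outline that fall outside the image are skipped.
    """
    out = [row[:] for row in image]  # copy so the display image data is not modified
    rows = len(out)
    cols = len(out[0]) if out else 0

    def put(px, py):
        if 0 <= py < rows and 0 <= px < cols:
            out[py][px] = value

    for px in range(x, x + width):   # top and bottom edges
        put(px, y)
        put(px, y + height - 1)
    for py in range(y, y + height):  # left and right edges
        put(x, py)
        put(x + width - 1, py)
    return out


if __name__ == "__main__":
    blank = [[0] * 5 for _ in range(5)]
    for row in draw_tracking_frame(blank, 1, 1, 3, 3):
        print(row)
```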

The display unit 121 may be, for example, a liquid crystal display or an organic EL display. The display unit 121 displays an image based on the composite image data output by the information superimposing unit 120. Live view display for one frame is performed as described above.

The evaluation value generation unit 124 generates, from the image data obtained from the image sensor 103, the signals and evaluation values used for automatic focus detection (AF), and calculates the evaluation value (luminance information) used for automatic exposure control (AE). The luminance information is generated by color conversion from the integrated values (red, blue, green) obtained by integrating the pixels of each color filter (red, blue, green). Another method may be used to generate the luminance information. The evaluation values used for automatic white balance (AWB), namely the integrated values for each color (red, blue, green), are calculated in the same way as when generating the luminance information. The control unit 102 identifies the light source from these per-color integrated values and calculates pixel correction values such that white objects are rendered white. White balance is performed by multiplying each pixel by these correction values in the first image correction unit 109 and the second image correction unit 106, which are described later. The evaluation value generation unit 124 also calculates the evaluation value used for camera shake detection for image stabilization (motion vector information) by computing motion vectors relative to reference image data, using two or more sets of image data. The evaluation value generation unit 124 outputs the generated signals and evaluation values to the control unit 102. The control unit 102 controls the focus lens position of the optical system 101 and determines shooting conditions (exposure time, aperture value, ISO sensitivity, etc.) based on the signals and evaluation values obtained from the evaluation value generation unit 124. The evaluation value generation unit 124 may instead generate the signals and evaluation values from the display image data generated by the post-processing unit 114, which is described later.
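As an illustration of how per-color integrated values could yield a luminance value and white balance correction values, the following is a minimal sketch and not the method of the embodiment. The BT.601 luminance weights, the rule of normalizing the red and blue integrals to the green integral, and the function names are all assumptions made for this example; the description above does not specify the color conversion or how the light source is identified.

```python
def luminance_from_integrals(r_sum, g_sum, b_sum, pixel_count):
    """Average luminance from per-color integrated values.

    Assumption: BT.601 weights are used for the color conversion.
    """
    if pixel_count <= 0:
        raise ValueError("pixel_count must be positive")
    r, g, b = (v / pixel_count for v in (r_sum, g_sum, b_sum))
    return 0.299 * r + 0.587 * g + 0.114 * b


def white_balance_gains(r_sum, g_sum, b_sum):
    """Per-channel multipliers (R, G, B) that equalize the integrals to green.

    Assumption: a simple normalization stands in for the light source
    identification, which the description does not detail.
    """
    if min(r_sum, g_sum, b_sum) <= 0:
        raise ValueError("integrated values must be positive")
    return (g_sum / r_sum, 1.0, g_sum / b_sum)
```

Each pixel value would then be multiplied by the gain of its color channel, which corresponds to the multiplication performed in the image correction units.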

The selection unit 125 adopts one of the tracking results of the DL tracking unit 116 and the non-DL tracking unit 117, based on the reliability score output by the DL tracking unit 116 and the similarity score output by the non-DL tracking unit 117. For example, when the reliability score is less than or equal to a predetermined reliability score threshold and the similarity score is less than or equal to a predetermined similarity score threshold, the selection unit 125 adopts the tracking result of the non-DL tracking unit 117; otherwise, it adopts the tracking result of the DL tracking unit 116. The selection unit 125 outputs the adopted tracking result to the information superimposing unit 120 and the control unit 102.

Here, which of the tracking results of the DL tracking unit 116 and the non-DL tracking unit 117 to adopt is determined based on the reliability score and the similarity score. However, it may be determined by other methods. For example, because the accuracy of the DL tracking unit 116 tends to be higher than that of the non-DL tracking unit 117, the tracking result of the DL tracking unit 116 may be adopted preferentially. Specifically, the tracking result of the DL tracking unit 116 may be adopted if it has been obtained, and the tracking result of the non-DL tracking unit 117 may be adopted if it has not.
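The two selection rules described for the selection unit 125 can be sketched as follows. This is a minimal illustration only: the function names, the box representation, and the default threshold values are hypothetical, since the description only says the thresholds are predetermined.

```python
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # hypothetical (x, y, width, height) tracking result


def select_by_scores(dl_box: Box, reliability: float,
                     non_dl_box: Box, similarity: float,
                     reliability_threshold: float = 0.5,   # assumed value
                     similarity_threshold: float = 0.5) -> Box:  # assumed value
    """Score-based rule: adopt the non-DL result only when both scores
    are at or below their thresholds; otherwise adopt the DL result."""
    if reliability <= reliability_threshold and similarity <= similarity_threshold:
        return non_dl_box
    return dl_box


def select_with_dl_priority(dl_box: Optional[Box],
                            non_dl_box: Optional[Box]) -> Optional[Box]:
    """Alternative rule: prefer the DL result whenever one was obtained."""
    return dl_box if dl_box is not None else non_dl_box
```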

The imaging device motion detection unit 126 consists of a gyro sensor or the like and detects the motion of the imaging apparatus 100 itself. The imaging device motion detection unit 126 outputs the detected motion information of the imaging apparatus to the control unit 102. Based on this motion information, the control unit 102 detects camera shake and detects swinging of the imaging apparatus in a fixed direction, and determines whether panning shooting is being performed. The accuracy of the panning determination can be improved by combining the result of the imaging device motion detection unit 126 with the motion vectors from the evaluation value generation unit 124, that is, by confirming that the imaging apparatus is being swung in a fixed direction while the subject has almost no motion vector.
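A panning determination that combines the gyro output with the subject motion vectors might look like the sketch below. The input representation (a list of angular velocities over recent frames and a list of per-frame subject motion vectors), the function name, and both threshold values are assumptions for illustration; the description does not define them.

```python
def is_panning(yaw_rates, subject_vectors,
               min_rate=5.0,            # assumed minimum angular velocity (deg/s)
               max_subject_motion=1.0): # assumed maximum subject motion (pixels)
    """Return True when the camera swings steadily in one direction while
    the subject stays almost still in the frame."""
    if not yaw_rates:
        return False
    same_direction = all(r > 0 for r in yaw_rates) or all(r < 0 for r in yaw_rates)
    fast_enough = all(abs(r) >= min_rate for r in yaw_rates)
    if not (same_direction and fast_enough):
        return False
    # With no motion vectors supplied, the decision rests on the gyro alone.
    return all((dx * dx + dy * dy) ** 0.5 <= max_subject_motion
               for dx, dy in subject_vectors)
```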

Next, the operation flow of the subject tracking processing performed by the tracking control unit 113 when the imaging apparatus 100 performs an imaging operation will be described with reference to FIG. 2. In this embodiment, DL tracking and non-DL tracking are controlled according to whether the scene is a panning shot and whether the scene has low luminance. However, only one of these scene determinations may be used, or other scene determinations may be performed and used to decide between DL tracking and non-DL tracking.

In S201, the tracking control unit 113 acquires the motion information of the imaging apparatus itself detected by the imaging device motion detection unit 126, and proceeds to S202.

In S202, the tracking control unit 113 performs the panning determination based on whether the motion information indicates that the imaging apparatus is moving in a fixed direction. If it determines that panning is being performed, the process proceeds to S205; if it determines that panning is not being performed, the process proceeds to S203.

In S203, the tracking control unit 113 acquires the luminance information generated by the evaluation value generation unit 124, and proceeds to S204.

In S204, the tracking control unit 113 compares the acquired luminance information with a threshold. If the luminance is less than the threshold, the process proceeds to S205; if it is greater than or equal to the threshold, the process proceeds to S206. In other words, the process proceeds to S205 if the image data is dark and to S206 if it is bright. In this embodiment the determination uses the luminance information of a single frame, but the luminance information may instead be compared with the threshold over a plurality of frames, with the process proceeding to S205 if the luminance is less than the threshold in the plurality of frames.

In S205, the tracking control unit 113 decides to disable the DL tracking unit 116 and enable the non-DL tracking unit 117, and ends the processing. The reason is that in panning the imaging apparatus does not track the target (moving) subject; instead, the user keeps the subject framed and moves the imaging apparatus itself in a fixed direction, so the scene can be regarded as one in which tracking performance is not required. In such a scene, the operating frequency of the non-DL tracking unit 117 may also be reduced. Similarly, when the image data is dark, night scene shooting is assumed, so the scene may be regarded as one in which tracking performance is not required, and the operating frequency of the non-DL tracking unit 117 may be reduced.

In S206, the tracking control unit 113 decides to enable the DL tracking unit 116 and disable the non-DL tracking unit 117, and ends the processing.
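The flow of S201 to S206 reduces to a small decision function, sketched below under stated assumptions: the function name, the return convention (a pair of enable flags), and the default luminance threshold are hypothetical, and the single-frame luminance check of this embodiment is used rather than the multi-frame variant.

```python
def decide_trackers(panning: bool, luminance: float,
                    luminance_threshold: float = 50.0):  # assumed threshold value
    """Return (dl_enabled, non_dl_enabled) following S202 to S206."""
    if panning:                          # S202: panning detected -> S205
        return (False, True)
    if luminance < luminance_threshold:  # S204: dark scene -> S205
        return (False, True)
    return (True, False)                 # S206: bright, non-panning scene
```

Because the comparison in S204 is "less than the threshold", a luminance exactly equal to the threshold takes the S206 branch.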

(Display processing by display unit 121)
FIG. 3 shows an example of live view display. FIG. 3(a) shows an image 300 represented by the display image data output by the post-processing unit 114. FIG. 3(b) shows an image 302 represented by composite image data in which the image of a tracking frame 303 is superimposed on the display image data. Since only one candidate subject 301 exists in the shooting range here, the candidate subject 301 is selected as the subject to be tracked, and the tracking frame 303 is superimposed so as to surround it. In the example of FIG. 3(b), the tracking frame 303 consists of a combination of four hollow hook shapes, but other forms may be used, such as a combination of non-hollow hook shapes, a continuous frame, a combination of rectangles, or a combination of triangles. The form of the tracking frame 303 may also be selectable by the user.

FIG. 4 is a flowchart of the operation of the subject tracking function in a series of imaging operations by the imaging apparatus 100. Each step is executed by the control unit 102 or by the respective units according to instructions from the control unit 102.

In S400, the control unit 102 controls the image sensor 103 to capture one frame and acquires image data.

In S401, the first preprocessing unit 104 applies preprocessing to the image data read from the image sensor 103.

In S402, the control unit 102 stores the preprocessed image data in the display memory 107.

In S403, the first image correction unit 109 starts applying predetermined image correction processing to the image data read from the display memory 107.

In S404, the control unit 102 determines whether all the image correction processing to be applied has been completed. If it is determined that all of it has been completed, the image data to which the image correction processing has been applied is output to the post-processing unit 114, and the process proceeds to S405. Otherwise, the first image correction unit 109 continues the image correction processing.

In S405, the post-processing unit 114 generates display image data from the image data to which the first image correction unit 109 has applied the image correction processing, and outputs it to the information superimposing unit 120.

In S406, the information superimposing unit 120 uses the display image data generated by the post-processing unit 114, the image data of the tracking frame, and image data representing other information to generate composite image data in which the tracking frame and the other information are superimposed on the captured image. The information superimposing unit 120 outputs the composite image data to the display unit 121.

In S407, the display unit 121 displays the composite image data generated by the information superimposing unit 120. This completes the live view display for one frame.

As described above, in this embodiment, an image processing apparatus that uses first tracking means and second tracking means having a smaller computational load than the first tracking means controls the enabling and disabling of the first and second tracking means based on at least one of the motion of the imaging apparatus and the brightness of the image data. Power consumption can therefore be suppressed by disabling the first tracking means in scenes where there is little need to obtain a good tracking result.

In this embodiment, the enabling and disabling of the DL and non-DL tracking units based on the motion of the imaging apparatus itself and the brightness of the image data is controlled exclusively; for example, the non-DL tracking unit 117 is disabled when the DL tracking unit 116 is enabled. However, the control is not limited to this. In panning scenes or at low luminance, where tracking becomes more difficult, both the DL tracking unit 116 and the non-DL tracking unit 117 may be enabled depending on the panning speed or on how low the luminance is. In that case, the tracking processing may be controlled so as to use both tracking results. The above embodiment also switches the DL tracking unit 116 and the non-DL tracking unit 117 between two states, enabled and disabled. However, the switching is not limited to this and may be performed in multiple stages according to the brightness of the image or the motion of the subject. That is, the enabled state of the DL tracking unit 116 and the non-DL tracking unit 117 may itself have multiple levels of computational load, and a unit may be switched to processing with a higher computational load when that is more effective.

In this embodiment, disabling the DL tracking unit 116 or the non-DL tracking unit 117 has been described using an example in which all of the arithmetic processing performed by the DL tracking unit 116 is omitted, that is, not executed. However, disabling is not limited to this and may include omitting or not executing at least part of the processing that is performed when the unit is enabled, such as the preprocessing for the tracking processing, the tracking computation itself (for example, the computation for the main tracking processing), and the tracking result output processing.

(Second embodiment)
Next, a second embodiment of the present invention will be described. Only the parts that differ from the first embodiment are described here; identical parts are given the same reference numerals and their detailed description is omitted. In the second embodiment, the DL tracking unit 116, the non-DL tracking unit 117, and the detection unit 110 are controlled using the result of the imaging apparatus automatically recognizing the shooting scene based on at least one of the captured image, the shooting parameters, the posture of the imaging apparatus, and the like. This is described below with reference to FIGS. 5 and 6.

FIG. 5 shows the operation flow of the control unit 102 in the second embodiment.

In S501, the control unit 102 determines which of the shooting scenes shown in FIG. 6(a) (described later) applies, and proceeds to S502. In determining the shooting scene of FIG. 6(a), whether the background is bright or dark is determined from the luminance information acquired by the evaluation value generation unit 124, and whether the background is blue sky or an evening scene is determined from the luminance information together with the light source information obtained in the process of calculating the white balance correction values. Whether the subject is a person or not is determined from the result of the detection unit 110, and whether it is a moving or non-moving object is determined by the tracking unit 115. The determination is not limited to these methods; any known processing sequence that can determine the shooting scene from an image or from information obtained by a gyro sensor, an infrared sensor, a ToF (time-of-flight) sensor, or the like is applicable. The panning determination is performed by the same method as in the first embodiment.

In S502, the control unit 102 performs control so that the apparatus enters the operation mode shown in FIG. 6(b) (described later) that corresponds to the shooting scene shown in FIG. 6(a), and ends the processing. Specifically, according to the operation mode given in the shooting scene table of FIG. 6(a), the control unit 102 controls the detection unit 110 and notifies the tracking control unit 113. On receiving the notification, the tracking control unit 113 controls the tracking unit 116.

FIG. 6(a) is a table showing the relationship between shooting scenes and the operation modes of the detection unit 110 and the tracking unit 105. The horizontal axis is the subject determination: whether the subject is a person or not, whether it is a moving or non-moving object, or whether the scene is a panning scene. The vertical axis is the background: its brightness and whether it is blue sky or an evening scene. In other words, the table determines the operation mode from the determined subject and background. The shooting scenes in FIG. 6(a) are one example, and other shooting scenes may be added to determine the operation mode.

FIG. 6(b) is a table showing the operation modes of the detection unit 110 and the tracking unit 105.

In operation mode 1, the DL tracking unit 116 is disabled, the non-DL tracking unit 117 is enabled, the detection unit 110 operates with persons and non-person subjects as detection targets, and the operation cycle of the detection unit 110 is set to, for example, half the shooting frame rate or less.

In operation mode 2, the DL tracking unit 116 is disabled, the non-DL tracking unit 117 is enabled, the detection unit 110 operates with persons, non-person subjects, and non-moving objects (for example, buildings, roads, sky, and trees) as detection targets, and the operation cycle is set to, for example, half the shooting frame rate or less. The recognition results for non-moving objects are used, for example, to identify the light source for white balance, and in the correction processing of the first image correction unit 109 and the second image correction unit 106 to apply image processing that distinguishes artificial objects from non-artificial objects.

In operation mode 3, the DL tracking unit 116 is enabled, the non-DL tracking unit 117 is enabled, and the detection unit 110 operates with persons and non-person subjects as detection targets. The operation cycle is set so that, for example, persons are detected at the shooting frame rate and non-person subjects at half the shooting frame rate or less.

In operation mode 4, the DL tracking unit 116 is enabled, the non-DL tracking unit 117 is enabled, and the detection unit 110 operates with persons, non-person subjects, and non-moving objects as detection targets. The operation cycle is set so that, for example, persons are detected at the shooting frame rate, and non-person subjects and non-moving objects at half the shooting frame rate or less.

In operation mode 5, the DL tracking unit 116 is disabled, the non-DL tracking unit 117 is enabled, and the detection unit 110 operates with persons and non-person subjects as detection targets. The operation cycle is set so that, for example, persons are detected at the shooting frame rate and non-person subjects at half the shooting frame rate or less.

In operation mode 6, the DL tracking unit 116 is enabled, the non-DL tracking unit 117 is enabled, and the detection unit 110 operates with persons and non-person subjects as detection targets. The operation cycle is set so that, for example, persons are detected at half the shooting frame rate or less and non-person subjects at the shooting frame rate.

In operation mode 7, the DL tracking unit 116 is enabled, the non-DL tracking unit 117 is enabled, and the detection unit 110 operates with persons, non-person subjects, and non-moving objects as detection targets. The operation cycle is set so that, for example, persons and non-moving objects are detected at half the shooting frame rate or less and non-person subjects at the shooting frame rate.

In operation mode 8, the DL tracking unit 116 is disabled, the non-DL tracking unit 117 is enabled, and the detection unit 110 operates with persons and non-person subjects as detection targets. The operation cycle is set so that, for example, persons are detected at half the shooting frame rate or less and non-person subjects at the shooting frame rate.

The operation modes in FIG. 6(b) are one example of operation modes corresponding to the scenes shown in FIG. 6(a), and the operation modes may be changed. In this embodiment, the non-DL tracking unit 117 is used in the shooting scene determination to determine whether the subject is a moving or non-moving object, so the non-DL tracking unit 117 is enabled in every operation mode. However, the moving object determination may instead be performed by monitoring the position of the subject detected by the detection unit 110 over a plurality of frames. In that case, when the subject is a non-moving object (that is, when it is determined not to be a moving object), the non-DL tracking unit 117 may be disabled.
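The eight example operation modes can be held in a small lookup table, sketched below. The class, the constant names, the target labels, and the choice of "every second frame" to represent "half the shooting frame rate or less" are assumptions for illustration. The mapping from shooting scenes to mode numbers in FIG. 6(a) is not given in the text, so it is deliberately left out.

```python
from dataclasses import dataclass

FULL = "frame_rate"    # detection runs at the shooting frame rate
HALF = "half_or_less"  # detection runs at half the shooting frame rate or less


@dataclass(frozen=True)
class OperationMode:
    dl_enabled: bool
    non_dl_enabled: bool
    detection: dict  # detection target -> detection rate


MODES = {
    1: OperationMode(False, True, {"person": HALF, "non_person": HALF}),
    2: OperationMode(False, True, {"person": HALF, "non_person": HALF, "static": HALF}),
    3: OperationMode(True, True, {"person": FULL, "non_person": HALF}),
    4: OperationMode(True, True, {"person": FULL, "non_person": HALF, "static": HALF}),
    5: OperationMode(False, True, {"person": FULL, "non_person": HALF}),
    6: OperationMode(True, True, {"person": HALF, "non_person": FULL}),
    7: OperationMode(True, True, {"person": HALF, "non_person": FULL, "static": HALF}),
    8: OperationMode(False, True, {"person": HALF, "non_person": FULL}),
}


def detection_due(mode: OperationMode, target: str, frame_index: int) -> bool:
    """Whether detection for `target` runs on this frame.

    Assumption: the reduced rate is modeled as every second frame.
    """
    rate = mode.detection.get(target)
    if rate is None:   # target is not detected in this mode
        return False
    if rate == FULL:
        return True
    return frame_index % 2 == 0
```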

As described above, in this embodiment, an image processing apparatus that uses first tracking means and second tracking means having a smaller computational load than the first tracking means controls the enabling and disabling of the first and second tracking means based on the scene in which the image is captured. In addition, based on that scene, the objects that the detection unit detects from the image are restricted and the operation cycle is changed. Power consumption can therefore be suppressed in scenes where there is little need to obtain a good tracking result.

(Third embodiment)
Next, a third embodiment of the present invention will be described. In the third embodiment, an image processing apparatus that detects so-called "feature points" in a plurality of regions of a captured image controls the DL tracking unit 116, the non-DL tracking unit 117, and the detection unit 110 based on the detection results of these feature points. This is described below with reference to FIGS. 7 and 8.

FIG. 7 shows the operation flow of the control unit 102 in the third embodiment. This flow is assumed to run when the imaging apparatus 100 is powered on, a mode for capturing images has been selected from the menu, the tracking subject to be tracked in the captured images sequentially acquired from the image sensor 103 has been determined, and tracking processing is performed. If there is an ON/OFF setting for tracking control, the flow may be controlled to start when tracking control is set to ON.

In S701, the control unit 102 acquires a captured image output from the image sensor 103 or stored in the detection/tracking memory 108.

In S702, according to an instruction from the control unit 102, the evaluation value generation unit 124 analyzes the captured image obtained in S701 and performs detection processing for detecting feature points in the image. The details of the feature point detection processing are described later.

In S703, the control unit 102 acquires the feature point strength information calculated when each feature point was detected in S702.

In S704, the control unit 102 performs determination processing on the feature points detected within the tracking subject region, that is, the region that, by the previous frame, the DL tracking unit 116, the non-DL tracking unit 117, or other subject detection processing (such as face detection) has determined to contain the subject to be tracked. Specifically, it determines whether the number of feature points within the tracking subject region whose feature point strength is greater than or equal to a first threshold is greater than or equal to a second threshold. If that number is greater than or equal to the second threshold, the process proceeds to S705; if it is less than the second threshold, the process proceeds to S706.

In S705, the control unit 102 performs determination processing on the feature points detected in the captured image outside the region determined to be the tracking subject region in the previous frame. Specifically, it determines whether the number of feature points outside the tracking subject region whose feature point strength is greater than or equal to a third threshold is greater than or equal to a fourth threshold. If that number is greater than or equal to the fourth threshold, the process proceeds to S707; if it is less than the fourth threshold, the process proceeds to S708.

In S706, the control unit 102 performs determination processing on the feature points detected in the captured image outside the region determined to be the tracking subject region in the previous frame. Specifically, it determines whether the number of feature points outside the tracking subject region whose feature point strength is greater than or equal to the third threshold is greater than or equal to the fourth threshold. If that number is greater than or equal to the fourth threshold, the process proceeds to S709; if it is less than the fourth threshold, the process proceeds to S710.

In S707, according to an instruction from the control unit 102, the tracking control unit 113 enables both the DL tracking unit 116 and the non-DL tracking unit 117, and sets the operating rate of the DL tracking processing higher than that of the non-DL tracking processing. Many subjects with complex textures exist both inside and outside the tracking subject region and tracking is difficult, so tracking accuracy can be maintained by performing both kinds of tracking processing at a high rate.

In S708, according to an instruction from the control unit 102, the tracking control unit 113 disables the DL tracking unit 116 and enables the non-DL tracking unit 117. In this embodiment, the operating rate of the non-DL tracking processing at this time is higher than the operating rate of the non-DL tracking processing set in S707. Because the inside and outside of the tracking subject region are easy to distinguish, performing tracking with non-DL tracking alone suppresses power consumption while maintaining tracking accuracy.

In S709, according to an instruction from the control unit 102, the tracking control unit 113 enables the DL tracking unit 116 and disables the non-DL tracking unit 117. In this embodiment, the operating rate of the DL tracking processing at this time is the highest of the operating rates set for the DL tracking unit 116 in S707 to S710. Having few feature points inside the tracking subject region and many outside it makes the subject that much harder to follow. In particular, the non-DL tracking processing, which, like the feature point detection processing, tracks based on edge portions and the like in the image, becomes more likely to output an erroneous result. Tracking is therefore performed by the DL tracking processing alone, which suppresses the loss of tracking accuracy.

In S710, according to an instruction from the control unit 102, the tracking control unit 113 enables both the DL tracking unit 116 and the non-DL tracking unit 117, and sets the operating rates of the DL tracking processing and the non-DL tracking processing lower than the respective operating rates set in S707. When few feature points can be detected either inside or outside the tracking subject region, neither the DL tracking processing nor the non-DL tracking processing is likely to be accurate, and the results may, for example, jump between various regions. Reflecting such results at a high rate causes flicker in the image, so both kinds of tracking processing are enabled but their operating rates are lowered, which suppresses the loss of visibility caused by flickering tracking results.
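The branching of S704 to S710 can be summarized in a short sketch. The function names, the representation of feature points as lists of strength values, and the return convention are hypothetical; the four thresholds are passed in because the description does not give their values, and the relative operating rates are noted only in comments.

```python
def count_strong(strengths, strength_threshold):
    """Number of feature points whose strength is at or above the threshold."""
    return sum(1 for s in strengths if s >= strength_threshold)


def decide_tracking(inside, outside, th1, th2, th3, th4):
    """Return (step, dl_enabled, non_dl_enabled) for the current frame.

    `inside` / `outside` hold the feature point strengths detected inside and
    outside the tracking subject region of the previous frame.
    """
    inside_rich = count_strong(inside, th1) >= th2     # S704
    outside_rich = count_strong(outside, th3) >= th4   # S705 / S706
    if inside_rich and outside_rich:
        return ("S707", True, True)    # both on, DL rate above non-DL rate
    if inside_rich:
        return ("S708", False, True)   # non-DL only, at a higher rate than in S707
    if outside_rich:
        return ("S709", True, False)   # DL only, at its highest rate
    return ("S710", True, True)        # both on, at rates lower than in S707
```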

(Feature point detection processing)
FIG. 8 is a flowchart of the feature point detection processing performed by the feature point detection unit 201. In S800, the control unit 102 generates a horizontal first-order differential image by applying a horizontal first-order differential filter to the tracking subject region. In S802, the control unit 102 applies the horizontal first-order differential filter again to the horizontal first-order differential image obtained in S800, generating a horizontal second-order differential image.

In S801, the control unit 102 generates a vertical first-order differential image by applying a vertical first-order differential filter to the tracking subject region.

In S804, the control unit 102 applies the vertical first-order differential filter again to the vertical first-order differential image obtained in S801, generating a vertical second-order differential image.

In S803, the control unit 102 applies the vertical first-order differential filter to the horizontal first-order differential image obtained in S800, generating a mixed differential image (first-order horizontal, first-order vertical).

In S805, the control unit 102 calculates the determinant Det of the Hessian matrix H formed from the differential values obtained in S802, S803, and S804. Let Lxx be the horizontal second-order differential value obtained in S802, Lyy the vertical second-order differential value obtained in S804, and Lxy the mixed (first-order horizontal, first-order vertical) differential value obtained in S803. The Hessian matrix H is then represented by equation (1), and the determinant Det by equation (2).
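Equations (1) and (2) are not reproduced in this text. With the symbols defined for S805, the standard 2×2 Hessian matrix and its determinant would take the following form. This is a reconstruction from the stated definitions, not a copy of the original equations.

```latex
H =
\begin{pmatrix}
L_{xx} & L_{xy} \\
L_{xy} & L_{yy}
\end{pmatrix}
\tag{1}

\mathrm{Det} = L_{xx} L_{yy} - L_{xy}^{2}
\tag{2}
```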

In S806, the control unit 102 determines whether the determinant Det obtained in S805 is 0 or more. If Det is 0 or more, the process proceeds to S807. If Det is less than 0, the process proceeds to S808.

In S807, the control unit 102 detects a point whose determinant Det is 0 or more as a feature point.

In S808, if the control unit 102 determines that the entire input subject region has been processed, it ends the feature point detection processing. Otherwise, it repeats S800 to S807 and continues the feature point detection processing.
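The flow of S800 to S808 can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the source does not specify the differential filter kernel, so a central difference is assumed, border pixels are simply left at 0, and the function names `diff_h`, `diff_v`, and `detect_feature_points` are hypothetical. The sketch also computes each differential image over the whole region before testing the determinant, rather than looping step by step as in the flowchart.

```python
def diff_h(img):
    """Horizontal first-order differential (assumed central difference).
    Border columns are left at 0 for simplicity."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(1, w - 1):
            out[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
    return out


def diff_v(img):
    """Vertical first-order differential (assumed central difference).
    Border rows are left at 0 for simplicity."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(w):
            out[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return out


def detect_feature_points(region):
    """Return (x, y, det) for every pixel whose Hessian determinant is >= 0."""
    lx = diff_h(region)    # S800: horizontal first-order differential image
    ly = diff_v(region)    # S801: vertical first-order differential image
    lxx = diff_h(lx)       # S802: horizontal second-order differential image
    lxy = diff_v(lx)       # S803: mixed differential image
    lyy = diff_v(ly)       # S804: vertical second-order differential image

    points = []
    for y in range(len(region)):
        for x in range(len(region[0])):
            det = lxx[y][x] * lyy[y][x] - lxy[y][x] ** 2   # S805
            if det >= 0:                                   # S806
                points.append((x, y, det))                 # S807
    return points                                          # S808: region done
```

Because the test is Det ≥ 0, completely flat pixels (Det = 0) also pass in this sketch. One possible design choice, again an assumption, is to treat Det as the feature point strength and count only points whose strength exceeds a threshold, in line with the thresholds recited in the claims.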

As described above, in this embodiment, an image processing apparatus that uses a first tracking means and a second tracking means having a smaller computational load than the first tracking means controls whether the first and second tracking means are enabled or disabled based on feature quantities of the image. Power consumption can therefore be suppressed in scenes where there is little need to obtain a good tracking result.

Although the present invention has been described specifically based on the embodiments, it goes without saying that the present invention is not limited to those embodiments and that various modifications are possible without departing from the gist of the invention.

102 control unit
110 detection unit
113 tracking control unit
115 tracking unit
116 DL tracking unit
117 non-DL tracking unit
124 evaluation value generation unit
126 imaging device motion detection unit

Claims

1. An image processing apparatus comprising:
a first tracking means for tracking a subject using an image acquired by an imaging means;
a second tracking means, having a smaller computational load than the first tracking means, for tracking the subject using the image acquired by the imaging means; and
a control means for switching, based on feature points detected from the image acquired by the imaging means, between enabling both the first tracking means and the second tracking means and disabling one of them.

2. The image processing apparatus according to claim 1, wherein the control means switches between enabling both the first tracking means and the second tracking means and disabling one of them, based on the number of feature points within a tracking subject area to be tracked by the first tracking means or the second tracking means and the number of feature points outside the tracking subject area.

3. The image processing apparatus according to claim 1, wherein the control means enables both the first tracking means and the second tracking means when:
in the tracking subject area, the number of feature points whose intensity is higher than a first threshold is larger than a second threshold; and
outside the tracking subject area, the number of feature points whose intensity is higher than a third threshold is larger than a fourth threshold.

4. The image processing apparatus according to claim 1, wherein the control means enables the first tracking means and disables the second tracking means when:
in the tracking subject area, the number of feature points whose intensity is higher than a first threshold is less than a second threshold; and
outside the tracking subject area, the number of feature points whose intensity is higher than a third threshold is larger than a fourth threshold.

5. The image processing apparatus according to claim 1, wherein the control means disables the first tracking means and enables the second tracking means when:
in the tracking subject area, the number of feature points whose intensity is higher than a first threshold is larger than a second threshold; and
outside the tracking subject area, the number of feature points whose intensity is higher than a third threshold is less than a fourth threshold.

6. The image processing apparatus according to any one of claims 1 to 5, wherein the control means enables both the first tracking means and the second tracking means while making their operating rate lower than in other cases when:
in the tracking subject area, the number of feature points whose intensity is higher than a first threshold is less than a second threshold; and
outside the tracking subject area, the number of feature points whose intensity is higher than a third threshold is less than a fourth threshold.

7. An image processing method comprising:
a first tracking step of performing tracking by a first tracking means that tracks a subject using an image acquired by an imaging means;
a second tracking step of performing tracking by a second tracking means that has a smaller computational load than the first tracking means and tracks the subject using the image acquired by the imaging means; and
a control step of switching, based on feature points detected from the image acquired by the imaging means, between enabling both the first tracking means and the second tracking means and disabling one of them.