JP2012203613A

JP2012203613A - Image processing device, image processing method, recording medium, and program

Info

Publication number: JP2012203613A
Application number: JP2011067138A
Authority: JP
Inventors: Kaname Ogawa; 要小川
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2011-03-25
Filing date: 2011-03-25
Publication date: 2012-10-22
Also published as: CN102693544A; US8774458B2; US20120243737A1; EP2503511A1

Abstract

PROBLEM TO BE SOLVED: To track an image with a light load. SOLUTION: An image processing device calculates an evaluation value, expressed as the sum of confidence degrees obtained by compounding, at a variable compound ratio, the matching degree of a first feature quantity and the matching degree of a second feature quantity between a target image, which includes the object to be tracked, and a comparison image, which is the image of a comparison area compared with the target image in a predetermined frame. The device obtains the compound ratio that maximizes the evaluation value and, based on the confidence degree in which that compound ratio is set, detects an image corresponding to the target image.

Description

The present technology relates to an image processing apparatus and method, a recording medium, and a program, and in particular to ones that can track an image with a light processing load.

Digital cameras often have an autofocus function that automatically focuses on a subject. The user can therefore reliably capture the subject in focus simply by pointing the camera at the subject and operating the release switch.

When a tracking function is also provided, the subject is tracked automatically even if it moves, so the camera can keep shooting with the subject in focus.

One technique for automatically tracking a subject is described in Non-Patent Document 1.

Non-Patent Document 1: "Ensemble Tracking," Shai Avidan, Mitsubishi Electric Research Labs, 201 Broadway, Cambridge, MA 02139, avidan@merl.com

However, because the technique described in Non-Patent Document 1 uses boosting, its computational cost is enormous, which makes it difficult to apply to a digital camera, a consumer image processing apparatus.

The present technology has been made in view of this situation and makes it possible to track an image with a light load.

One aspect of the present technology is an image processing apparatus including a calculation unit and a detection unit. The calculation unit calculates an evaluation value and obtains the mixing ratio at which the evaluation value is maximized. The evaluation value is expressed as the sum of the reliabilities obtained when the mixing ratio is varied, where each reliability is obtained by mixing, at a predetermined mixing ratio, the matching degree of a first feature quantity and the matching degree of a second feature quantity between a target image including an object to be tracked and a comparison image, which is the image of a comparison region compared with the target image of a first frame. The detection unit detects an image corresponding to the target image in a second frame, based on the reliability in which the mixing ratio that maximizes the evaluation value is set.

The first frame and the second frame can be one and the other of an odd-numbered frame and an even-numbered frame.

The apparatus can further include a computation unit that uses a scan image of a scan region of the second frame as the comparison image and computes the reliability between the target image and the scan image.

The detection unit can detect, as the image corresponding to the target image, the scan image for which the reliability between the target image and the scan image is highest.

The calculation unit can take the image of a reference region of the first frame as the target image, take a plurality of regions that each contain at least part of the target image of the reference region as positive regions, and take a plurality of regions that do not contain the target image of the reference region as negative regions. It can then calculate a first reliability, which is the reliability between the target image of the reference region and the images of the positive regions; a second reliability, which is the reliability between the target image of the reference region and the images of the negative regions; a first product sum, which is the sum of products of the first reliability and a first weighting coefficient for the positive regions; and a second product sum, which is the sum of products of the second reliability and a second weighting coefficient for the negative regions. The sum of the first product sum and the second product sum is calculated as the evaluation value.

The first weighting coefficient can be a constant divided by the number of positive regions, and the second weighting coefficient can be the constant divided by the number of negative regions.

The calculation unit can take, as a new target image, the image of a region of a third frame that follows the second frame, the region being the one that corresponds to the coordinates of the image corresponding to the target image in the second frame. It calculates the evaluation value in the third frame and obtains the mixing ratio at which the evaluation value is maximized. The detection unit can then detect, in a fourth frame that follows the third frame, an image corresponding to the new target image of the third frame, based on the reliability in which the mixing ratio that maximizes the evaluation value for the image of the third frame is set.

The apparatus can further include a display unit that displays a marker in the region corresponding to the coordinates of the image that corresponds to the target image.

The apparatus can further include a drive unit that drives the position of the camera so that the image corresponding to the target image is placed at a predetermined position on the screen.

The first feature quantity can be luminance information, and the second feature quantity can be color information.

The image processing method, recording medium, and program according to an aspect of the present technology correspond to the image processing apparatus according to the aspect described above.

In an aspect of the present technology, an evaluation value is calculated and the mixing ratio that maximizes it is obtained. The evaluation value is expressed as the sum of the reliabilities obtained when the mixing ratio is varied, where each reliability is obtained by mixing, at a predetermined mixing ratio, the matching degree of a first feature quantity and the matching degree of a second feature quantity between a target image including the object to be tracked and a comparison image, which is the image of a comparison region compared with the target image of a predetermined frame. An image corresponding to the target image is detected based on the reliability in which the mixing ratio that maximizes the evaluation value is set.

As described above, according to one aspect of the present technology, an image can be tracked with a light load.

FIG. 1 is a block diagram showing the configuration of a digital camera of the present technology.
FIG. 2 is a flowchart explaining the tracking process.
FIG. 3 is a diagram explaining the cut-out of regions.
FIG. 4 is a diagram explaining the evaluation value.
FIG. 5 is a diagram explaining scanning.
FIG. 6 is a diagram explaining the display of the marker.
FIG. 7 is a diagram explaining the cut-out of regions.

FIG. 1 is a block diagram showing the configuration of a digital camera 1 of the present technology. The digital camera 1 includes a CPU (Central Processing Unit) 11, a lens 12, an output unit 13, an input unit 14, and a storage unit 15.

The CPU 11 executes various processes. The lens 12 captures an image of the subject and supplies the image data to the CPU 11. The output unit 13 consists of, for example, an LCD (Liquid Crystal Display) and displays the image captured by the lens 12. The output unit 13 also has a speaker and outputs warning sounds and the like as needed. The input unit 14 consists of the release switch operated by the user, members for adjusting the shutter speed and exposure time, and so on. The storage unit 15 stores captured image data as well as the programs that the CPU 11 runs.

The drive unit 41 pans and tilts the camera 1 in a predetermined direction, for example while the camera 1 is mounted on a predetermined stand (not shown).

The CPU 11 has the following functional blocks: a capture unit 21, a cut-out unit 22, an initialization unit 23, a calculation unit 24, a setting unit 25, a computation unit 26, a detection unit 27, a display unit 28, a focus unit 29, and a determination unit 30. The units can exchange signals with one another as needed.

The capture unit 21 captures images. The cut-out unit 22 cuts out predetermined portions from a captured image. The initialization unit 23 initializes coefficients. The calculation unit 24 performs the various calculations. The setting unit 25 sets coefficients. The computation unit 26 performs the various computations. The detection unit 27 detects positions. The display unit 28 displays the marker. The focus unit 29 adjusts focus. The determination unit 30 performs the various determinations.

In this embodiment, each unit is configured functionally when a program is executed, but each may of course be configured as hardware.

FIG. 2 is a flowchart explaining the tracking process. The tracking process of the digital camera 1 is described below with reference to FIG. 2.

In step S1, the capture unit 21 captures an image. That is, the image of a predetermined frame F1 of the video captured by the lens 12 is captured. The image is taken from the images captured by the lens 12 and stored in the storage unit 15.

In step S2, the cut-out unit 22 cuts out regions that contain the object and regions that do not from the frame image captured in step S1. The object is the image that the user wants to track, for example the face of a subject. The cut-out of this face is explained with reference to FIG. 3.

FIG. 3 is a diagram explaining the cut-out of regions. As shown in FIG. 3, an object 102 appears in a frame 101 (corresponding to frame F1), which is the image captured in step S1. A region containing the object 102, for example a rectangular one, is taken as the reference region 111-0, and the image inside the reference region 111-0 is taken as the target image 114. If a marker 231 has been displayed on a subsequent frame 201 (corresponding to frame F2) by the process of step S10 described later (see FIG. 6), the region on the frame 101 that corresponds to the coordinates of the marker 231 on the frame 201 is taken as the reference region 111-0. In the first frame, before step S10 has ever been executed, a rectangular region centered on a point that the user designates by operating the input unit 14 is taken as the reference region 111-0.

In step S2, regions 111-1, 111-2, ..., 111-Np, each containing at least part of the target image 114 of the reference region 111-0, are cut out. That is, Np regions are cut out as positive regions that contain the target image 114 of the reference region 111-0. Similarly, regions 112-1, 112-2, ..., 112-Nn, which contain none of the target image 114 of the reference region 111-0, are cut out. That is, Nn regions are cut out as negative regions that do not contain the target image 114.
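The positive/negative split of step S2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the text does not say how candidate regions are generated, so the candidate list, the `(left, top, width, height)` rectangle layout, and the names `overlaps` and `split_regions` are all assumptions. A region counts as positive when it shares at least one pixel with the reference region.

```python
from typing import List, Tuple

# Hypothetical rectangle layout: (left, top, width, height) in pixels.
Rect = Tuple[int, int, int, int]


def overlaps(a: Rect, b: Rect) -> bool:
    """Return True when rectangles a and b share at least one pixel."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def split_regions(reference: Rect, candidates: List[Rect]) -> Tuple[List[Rect], List[Rect]]:
    """Split candidates into positive regions (overlapping the reference
    region) and negative regions (disjoint from it)."""
    positive = [c for c in candidates if overlaps(reference, c)]
    negative = [c for c in candidates if not overlaps(reference, c)]
    return positive, negative


# Example with made-up coordinates: two candidates overlap the reference
# region and two lie elsewhere in the frame.
reference = (40, 40, 20, 20)
candidates = [(35, 35, 20, 20), (50, 45, 20, 20), (0, 0, 20, 20), (80, 80, 20, 20)]
positive, negative = split_regions(reference, candidates)
```

Here `len(positive)` plays the role of Np and `len(negative)` the role of Nn.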

Next, in step S3, the initialization unit 23 initializes the weighting coefficients w_P and w_N of the regions. The weighting coefficients w_P and w_N are given by equation (1):

w_P = G_P / Np,  w_N = G_N / Nn  (1)

The weighting coefficient w_P is that of the positive regions 111-J (J = 1, 2, ..., Np), and the weighting coefficient w_N is that of the negative regions 112-J (J = 1, 2, ..., Nn).

As equation (1) shows, the weighting coefficient w_P of the positive regions is the constant G_P divided by the number of positive regions Np, and the weighting coefficient w_N of the negative regions is the constant G_N divided by the number of negative regions Nn. The value of w_P is the same for every positive region 111-J, and likewise the value of w_N is the same for every negative region 112-J. The values of the constants G_P and G_N are determined and set in advance when the digital camera 1 is shipped from the factory.

The constants G_P and G_N can, for example, both be set to 0.5, or G_P can be set to 0.8 and G_N to 0.2. Of the weighting coefficients w_P and w_N, the one whose constant is set to the larger value carries more weight. Setting G_P and G_N to suitable values adjusts the balance between w_P and w_N as needed.
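As a small worked example of the weighting described for step S3, the sketch below divides each constant by its region count. The function name `region_weights` and the region counts 4 and 8 are illustrative assumptions. The constants 0.8 and 0.2 are one of the example settings mentioned in the text.

```python
from typing import Tuple


def region_weights(g_p: float, g_n: float, n_p: int, n_n: int) -> Tuple[float, float]:
    """Return (w_P, w_N): each constant divided by its number of regions."""
    if n_p <= 0 or n_n <= 0:
        raise ValueError("region counts must be positive")
    return g_p / n_p, g_n / n_n


# Example: G_P = 0.8, G_N = 0.2 with 4 positive and 8 negative regions
# (the region counts are made up for illustration).
w_p, w_n = region_weights(0.8, 0.2, 4, 8)
```

Because every region of a class shares the same coefficient, the weights of each class add up to its constant (here 0.8 and 0.2) regardless of how many regions were cut out.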

In step S4, the calculation unit 24 calculates the evaluation value Eval(K), which is given by equation (2). The reliability Confidence(K) in equation (2) is given by equation (3). K is an integer that is varied, for example, from 0 to 256.

That is, in equation (2), the reliability Confidence(K) between the target image 114 of the reference region 111-0 and the images of the positive regions 111-1, 111-2, ... is the first reliability; this is the Confidence(K) in the first term on the right-hand side of equation (2). The reliability Confidence(K) between the target image 114 of the reference region 111-0 and the images of the negative regions 112-1, 112-2, ... is the second reliability; this is the Confidence(K) in the second term on the right-hand side of equation (2). The sum of products of the first reliability and the first weighting coefficient w_P of the positive regions 111-1, 111-2, ... is the first product sum, and the sum of products of the second reliability and the second weighting coefficient w_N of the negative regions 112-1, 112-2, ... is the second product sum. The sum of the first product sum and the second product sum is the evaluation value Eval(K).

In equation (3), feat_A is the matching degree of the first feature quantity (for example, luminance information) between the target image 114, which includes the object to be tracked, and the comparison image, and feat_B is the matching degree of the second feature quantity (for example, color information). K is the mixing ratio of the matching degree feat_A and the matching degree feat_B. As equation (3) shows, the reliability Confidence(K) expresses how likely it is that the comparison image matches the target image 114: the larger its value, the more likely the match. Feature quantities other than luminance information and color information can of course be used.
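Equation (3) itself is not reproduced in this text, so the following is a reconstruction under stated assumptions rather than the published formula. From the description (K mixes feat_A and feat_B, K runs from 0 to 256, and the result is normalized by dividing by 256), one plausible form is Confidence(K) = (K * feat_A + (256 - K) * feat_B) / 256. Which of the two matching degrees is multiplied by K is an assumption, as are the names `confidence` and `K_MAX`.

```python
K_MAX = 256  # upper end of the mixing-ratio range given in the text


def confidence(k: int, feat_a: float, feat_b: float) -> float:
    """Blend two matching degrees with integer mixing ratio k in [0, K_MAX].

    Assumed form: (k * feat_a + (K_MAX - k) * feat_b) / K_MAX.
    """
    if not 0 <= k <= K_MAX:
        raise ValueError("k must lie in [0, K_MAX]")
    return (k * feat_a + (K_MAX - k) * feat_b) / K_MAX


# k = K_MAX relies on feat_a only, k = 0 on feat_b only,
# and k = 128 weights both equally.
only_a = confidence(256, 0.9, 0.1)
only_b = confidence(0, 0.9, 0.1)
half = confidence(128, 1.0, 0.0)
```

Keeping K an integer and dividing by a power of two suits fixed-point hardware, which fits the stated goal of a light processing load.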

The "true" attached to the Σ of the first term on the right-hand side of equation (2) means that only the Confidence(K) values of the positive regions are summed. When the Confidence(K) of the positive regions in that first term is calculated, the comparison image compared with the target image 114 is the image of a positive region 111-J. Similarly, the "true" attached to the Σ of the second term means that only the Confidence(K) values of the negative regions are summed. When the Confidence(K) of the negative regions in that second term is calculated, the comparison image compared with the target image 114 is the image of a negative region 112-J.
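The evaluation value described for equation (2) can be sketched as two weighted sums. Equation (2) is not reproduced in this text, so the code follows the prose: a product sum over the positive regions plus a product sum over the negative regions. Whether the negative term enters with a minus sign cannot be recovered from the prose. In this sketch the caller decides by choosing the sign of `w_n`, and the example passes a negative `w_n` purely as an assumption so that matches against background regions lower the score. The `confidence` form and all names are likewise assumptions.

```python
from typing import Sequence, Tuple

K_MAX = 256
# One (feat_A, feat_B) pair of matching degrees per region.
Match = Tuple[float, float]


def confidence(k: int, feat_a: float, feat_b: float) -> float:
    """Assumed form of the blended reliability for mixing ratio k."""
    return (k * feat_a + (K_MAX - k) * feat_b) / K_MAX


def evaluation(k: int, positives: Sequence[Match], negatives: Sequence[Match],
               w_p: float, w_n: float) -> float:
    """Sum of the positive-region and negative-region product sums."""
    first = sum(w_p * confidence(k, a, b) for a, b in positives)
    second = sum(w_n * confidence(k, a, b) for a, b in negatives)
    return first + second


# Made-up matching degrees: feature A separates object from background,
# feature B does not.
positives = [(1.0, 0.5), (1.0, 0.5)]
negatives = [(0.0, 0.5), (0.0, 0.5)]
eval_a_only = evaluation(256, positives, negatives, 0.25, -0.25)
eval_b_only = evaluation(0, positives, negatives, 0.25, -0.25)
```

With these inputs the score is higher when the mix relies on the discriminative feature A, which is the behavior the search for Km depends on.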

In step S5, the calculation unit 24 obtains the mixing ratio Km that maximizes the evaluation value Eval(K). That is, Eval(K) is calculated while the mixing ratio K is varied in turn from 0 to 256. The largest of the resulting 257 values of Eval(K) is selected, which determines the mixing ratio Km.

FIG. 4 is a diagram explaining the evaluation value. As the mixing ratio K is varied in turn from 0 to 256, the evaluation value Eval(K) changes as shown in FIG. 4. In the example of FIG. 4, the mixing ratio that maximizes Eval(K) is Km. This Km is the optimum mixing ratio for detecting the target image 114 containing the object 102 of that frame. As described later for steps S8 and S9, the reliability Confidence(K) in the next frame is computed with this mixing ratio Km. In other words, the evaluation value Eval(K) is a function for determining the optimum mixing ratio Km.
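The search of step S5 is an exhaustive scan over the 257 candidate values of K. A minimal sketch follows, assuming the evaluation value is available as a function of K. The name `best_mixing_ratio` and the toy evaluation curve are illustrative and do not come from the source.

```python
from typing import Callable


def best_mixing_ratio(eval_fn: Callable[[int], float], k_max: int = 256) -> int:
    """Return the K in 0..k_max (inclusive) with the largest eval_fn(K).

    On ties, the smallest such K is returned, because max() keeps the
    first maximal element it encounters.
    """
    return max(range(k_max + 1), key=eval_fn)


# Toy evaluation curve with a single peak at K = 100.
km = best_mixing_ratio(lambda k: -(k - 100) ** 2)
```

Since the confidence values for each region are simple linear blends, evaluating all 257 candidates costs only a few multiply-adds per region per candidate.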

Accordingly, in step S6, the setting unit 25 sets the mixing ratio Km obtained in step S5 in the reliability Confidence(K) of equation (3).

After the mixing ratio K has been learned on the first frame by steps S1 to S6 in this way, tracking is performed on the second frame by the following steps S7 to S11.

In step S7, the capture unit 21 captures an image. That is, the image of frame F2, the frame following frame F1 captured in step S1, is read from the storage unit 15 and captured.

In step S8, the computation unit 26 moves a scan region over the captured image and computes the reliability Confidence(K) of each scan image. The image of the reference region 111-0 of frame F1 was fixed as the target image 114 in step S2. A scan image, taken from a scan region at a given position on the current frame (frame F2, captured in step S7) and having a size matching the target image 114, is extracted as the comparison image and compared with the target image 114. The matching degree feat_A of the first feature quantity and the matching degree feat_B of the second feature quantity between the target image 114 and the scan image are computed and applied to equation (3) to obtain the reliability Confidence(K). The mixing ratio K used here is the value Km set in step S6.

FIG. 5 is a diagram explaining scanning. As shown in FIG. 5, the scan image 222-1 of a scan region 221-1 at a given position on the frame 201 (frame F2) captured in step S7 is extracted as the comparison image and compared with the target image 114 of the previous frame F1 specified in step S2. The scan region 221-1 is the same size as the reference region 111-0, so the scan image 222-1 is the same size as the target image 114. With the mixing ratio K of equation (3) set to Km, the value that maximizes the evaluation value, the reliability Confidence(K) between the target image 114 and the scan image 222-1 is computed.

The comparison region on the frame 201 is moved in turn to comparison regions 211-1, 211-2, 211-3, ..., and the same process is repeated. The range scanned on the frame 201 can be the whole frame, but it can also be limited to within a predetermined distance of the coordinates of the reference region 111-0 specified in step S2 (that is, the coordinates at which the marker 231 was displayed in the previous execution of step S10). Limiting the scan range reduces the amount of computation.
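Limiting the scan range can be sketched as generating candidate top-left positions only within a fixed offset of the previous position. The text does not specify the scan step or how the distance is measured, so the step of 4 pixels, the square search window, and the name `scan_positions` are assumptions for illustration.

```python
from typing import List, Tuple


def scan_positions(previous: Tuple[int, int], frame_size: Tuple[int, int],
                   region_size: Tuple[int, int], max_offset: int,
                   step: int = 4) -> List[Tuple[int, int]]:
    """List top-left scan positions within max_offset pixels (per axis) of
    the previous position, clipped so each region stays inside the frame."""
    px, py = previous
    fw, fh = frame_size
    rw, rh = region_size
    x_lo, x_hi = max(0, px - max_offset), min(fw - rw, px + max_offset)
    y_lo, y_hi = max(0, py - max_offset), min(fh - rh, py + max_offset)
    return [(x, y)
            for y in range(y_lo, y_hi + 1, step)
            for x in range(x_lo, x_hi + 1, step)]


# A 16x16 region in a 64x48 frame: a limited search around (10, 10)
# versus a search that is effectively the whole frame.
limited = scan_positions((10, 10), (64, 48), (16, 16), max_offset=8)
whole = scan_positions((10, 10), (64, 48), (16, 16), max_offset=1000)
```

In this toy setting the limited search visits 25 positions instead of 117, which illustrates how restricting the range cuts the per-frame cost.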

In step S9, the detection unit 27 detects the region with the highest reliability Confidence(K). That is, the largest value is selected from the reliabilities Confidence(K) computed in step S8 for the scan regions 221-J (J = 1, 2, ...), and the scan region 221-M corresponding to that value is selected. The image of the scan region 221-M on the frame 201 (frame F2) is taken as the image 232 that corresponds to the target image 114 on the frame 101 (frame F1). In other words, the target image 114 of the reference region 111-0 on the frame 101 is judged to have moved to the scan region 221-M of frame F2, where it appears as the image 232 (see FIG. 6, described later).
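Steps S8 and S9 together amount to scoring every scan region with the learned mixing ratio Km and keeping the best one. The sketch below assumes the matching degrees for each scan region have already been computed, since the text does not define how feat_A and feat_B are measured. The blend formula is the assumed form of equation (3), and `detect_best` and the sample values are hypothetical.

```python
from typing import List, Tuple

# (top-left position, feat_A, feat_B) for one scan region.
Candidate = Tuple[Tuple[int, int], float, float]


def detect_best(candidates: List[Candidate], km: int,
                k_max: int = 256) -> Tuple[Tuple[int, int], float]:
    """Return the position and reliability of the scan region whose
    blended reliability at mixing ratio km is largest."""
    def reliability(c: Candidate) -> float:
        _, feat_a, feat_b = c
        return (km * feat_a + (k_max - km) * feat_b) / k_max

    best = max(candidates, key=reliability)
    return best[0], reliability(best)


# Made-up matching degrees for three scan positions.
candidates = [((0, 0), 0.2, 0.9), ((4, 0), 0.8, 0.3), ((8, 0), 0.5, 0.5)]
position_a, score_a = detect_best(candidates, km=192)  # mostly feature A
position_b, score_b = detect_best(candidates, km=0)    # feature B only
```

The two calls select different regions from the same measurements, which shows why the mixing ratio learned on the previous frame matters for the detection result.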

In step S10, the display unit 28 displays a marker 231 at the detected position. FIG. 6 is a diagram explaining the display of the marker 231. In FIG. 6, the image 232 containing the object 102 appears in the scan region 221-M, and the marker 231 is displayed at the position of the scan region 221-M. That is, the marker 231 is displayed for the image 232. The focus unit 29 drives and adjusts the lens 12 so that focus is achieved on the image 232 displayed inside the marker 231. By looking at the marker 231, the user can confirm where the camera is currently focused.

In step S11, the determination unit 30 determines whether to end tracking. When the user operates the input unit 14 to command that tracking stop, the tracking process ends.

If no command to stop tracking has been given, the process returns to step S1 and the image of the next frame F3 is captured. In step S2, the regions containing the object are then cut out. For the first frame F1, step S10 had not yet been performed, so the reference region 111-0 was set from the position designated by the user. Now, however, the coordinates of the image 232 corresponding to the previous target image 114 are known from step S10. The region of the next frame 301 whose coordinates correspond to the region 221-M, where the marker 231 is displayed on the frame 201, therefore becomes the new reference region (labeled 311-0 in FIG. 7), and the cut-out process is performed relative to it.

FIG. 7 is a diagram explaining the second cut-out of regions. As shown in FIG. 7, the region 311-0 of the frame 301 (frame F3), newly captured in the second execution of step S1, corresponds to the scan region 221-M on the frame 201 (frame F2) of FIG. 6, one frame earlier. This region 311-0 becomes the reference region of the new frame 301, and the image displayed in it becomes the new target image 314. Using the new reference region 311-0 as the reference, the cut-out unit 22 cuts out new positive regions 311-1, 311-2, ... and new negative regions 312-1, 312-2, ....

The same processing then follows. That is, the image of the region of frame F3, a frame after frame F2, that corresponds to the coordinates of the image 232 (the image in frame F2 corresponding to the target image 114) becomes the new target image 314, and the evaluation value Eval(K) is calculated in frame F3. In other words, Eval(K) is calculated between the new target image 314 and the new positive regions 311-1, 311-2, ... and the new negative regions 312-1, 312-2, ....

The mixing ratio Km that maximizes the calculated evaluation value Eval(K) is then obtained. Based on the reliability Confidence(K) in which this Km, obtained from the image of frame F3, is set, an image corresponding to the new target image 314 of frame F3 is detected in a frame F4 (not shown) that follows frame F3.

This processing is repeated for each frame, so that when the object 102 moves, the marker 231 tracks its destination and is displayed there. The processing of steps S1 to S6 is executed on one of consecutive odd and even frames, and the processing of steps S7 to S11 is executed on the other.
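The alternation between the two groups of steps can be sketched as a simple loop. In the following Python sketch, `learn` stands in for steps S1 to S6 (cutting out regions and finding Km) and `detect` stands in for steps S7 to S11 (scanning and locating the object); both are hypothetical callbacks supplied by the caller, and assigning learning to even-indexed frames is an arbitrary choice made for the example.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Frame = TypeVar("Frame")
Model = TypeVar("Model")
Region = Tuple[int, int, int, int]  # (x, y, width, height); layout is an assumption


def track(
    frames: Iterable[Frame],
    region: Region,
    learn: Callable[[Frame, Region], Model],
    detect: Callable[[Frame, Region, Model], Region],
) -> List[Region]:
    """Alternate learning and detection over consecutive frames.

    Even-indexed frames: learn a model (target image and Km) at the current
    region. Odd-indexed frames: detect the object and update the region, which
    then serves as the reference region for the next learning frame.
    """
    trail: List[Region] = []
    model = None
    for index, frame in enumerate(frames):
        if index % 2 == 0:
            model = learn(frame, region)
        else:
            region = detect(frame, region, model)
            trail.append(region)
    return trail
```

The returned list holds one region per detection frame, which is where a marker would be drawn.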

Note that the second term on the right side of Equation (2) can be omitted. In that case, however, tracking quality is lower than when the term is retained.

Alternatively, instead of the normalization in Equation (3), that is, without dividing by the value 256, (1−K) may be used in place of (256−K).
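The two forms can be written side by side. The symbols feat_A and feat_B for the matching degrees of the first and second feature amounts, and the assignment of K to the first feature, are assumptions made here for illustration, since Equation (3) itself does not appear in this passage.

```latex
% Normalized integer form: K varies over 0..256
\mathrm{Confidence}(K) = \frac{K \cdot \mathit{feat}_A + (256 - K) \cdot \mathit{feat}_B}{256},
\qquad 0 \le K \le 256

% Form without the division by 256: K varies over 0..1
\mathrm{Confidence}(K) = K \cdot \mathit{feat}_A + (1 - K) \cdot \mathit{feat}_B,
\qquad 0 \le K \le 1
```

Substituting K/256 for K in the second form yields the first, so the two forms differ only in the range over which K is varied.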

Further, although the marker 231 is displayed in step S10, it is also possible to drive the drive unit 41 to pan and tilt the camera 1 so that the object 102 is always located at a predetermined position (for example, the center) in the frame.
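One way to derive a pan and tilt correction from the detected region is to measure how far the region's center lies from the desired position in the frame. The Python sketch below computes that pixel offset; the function name, the `(x, y, width, height)` region layout, and the fractional `target` parameter are assumptions for illustration, and converting the offset into commands for the drive unit 41 is outside its scope.

```python
from typing import Tuple


def centering_offset(
    region: Tuple[int, int, int, int],
    frame_size: Tuple[int, int],
    target: Tuple[float, float] = (0.5, 0.5),
) -> Tuple[float, float]:
    """Return (dx, dy) in pixels from the desired frame position to the
    center of the detected region. target is given as fractions of the frame
    width and height; (0.5, 0.5) means the frame center."""
    x, y, w, h = region
    frame_w, frame_h = frame_size
    center_x = x + w / 2
    center_y = y + h / 2
    return center_x - frame_w * target[0], center_y - frame_h * target[1]
```

A positive `dx` means the object lies to the right of the desired position, and a positive `dy` means it lies below it, assuming image coordinates with the origin at the top left.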

The present technology uses only information obtained from each frame. It does not use information obtained from images across multiple frames, such as motion vectors, nor does it use a distance measuring device, so processing is fast and simple. In addition, because the amount of computation is small, the present technology can be applied not only to digital cameras but also to video cameras, surveillance cameras, and other small, inexpensive image processing apparatuses to track an object in real time.

The series of processes described above can be executed by hardware or by software.

When the series of processes is executed by software, the program constituting the software is stored in the storage unit 15.

In this specification, the program executed by the computer may be a program whose processing is performed in time series in the order described in this specification, or a program whose processing is performed in parallel or at necessary timing, such as when a call is made.

Embodiments of the present technology are not limited to the embodiment described above, and various modifications are possible without departing from the gist of the present technology.

The present technology can also be configured as follows.
(1)
An image processing apparatus including:
a calculation unit that calculates an evaluation value expressed as the sum, for the case where the mixing ratio is varied, of reliabilities each obtained by mixing, at a predetermined mixing ratio, the matching degree of a first feature amount and the matching degree of a second feature amount between a target image including an object to be tracked and a comparison image that is an image of a comparison region of a first frame compared with the target image, and that obtains the mixing ratio at which the evaluation value is maximized; and
a detection unit that detects an image corresponding to the target image in a second frame based on the reliability in which the mixing ratio at which the evaluation value is maximized is set.
(2)
The image processing apparatus according to (1), wherein the first frame and the second frame are one and the other of an odd frame and an even frame.
(3)
The image processing apparatus according to (1) or (2), further including a computation unit that takes a scan image in a scan region of the second frame as the comparison image and calculates the reliability between the target image and the scan image.
(4)
The image processing apparatus according to (1), (2), or (3), wherein the detection unit detects, as the image corresponding to the target image, the scan image for which the reliability between the target image and the scan image is maximized.
(5)
The image processing apparatus according to any one of (1) to (4), wherein the calculation unit takes the image of a reference region of the first frame as the target image, takes a plurality of regions each including at least part of the target image of the reference region as positive regions, takes a plurality of regions not including the target image of the reference region as negative regions, calculates a first reliability that is the reliability between the target image of the reference region and the images of the plurality of positive regions, calculates a second reliability that is the reliability between the target image of the reference region and the images of the plurality of negative regions, calculates a first product sum that is the product sum of the first reliability and a first weighting factor of the positive regions, calculates a second product sum that is the product sum of the second reliability and a second weighting factor of the negative regions, and calculates the sum of the first product sum and the second product sum as the evaluation value.
(6)
The image processing apparatus according to any one of (1) to (5), wherein the first weighting factor is a value obtained by dividing a constant by the number of positive regions, and the second weighting factor is a value obtained by dividing the constant by the number of negative regions.
(7)
The image processing apparatus according to any one of (1) to (6), wherein the calculation unit takes, as a new target image, the image of a region of a third frame later than the second frame, the region corresponding to the coordinates of the image in the second frame corresponding to the target image, calculates the evaluation value for the third frame, and obtains the mixing ratio at which the evaluation value is maximized, and
the detection unit detects, in a fourth frame later than the third frame, an image corresponding to the new target image of the third frame, based on the reliability in which the mixing ratio at which the evaluation value obtained from the image of the third frame is maximized is set.
(8)
The image processing apparatus according to any one of (1) to (7), further including a display unit that displays a marker in a region corresponding to the coordinates of the image corresponding to the target image.
(9)
The image processing apparatus according to any one of (1) to (8), further including a drive unit that drives a position of the camera so that an image corresponding to the target image is arranged at a predetermined position on the screen.
(10)
The image processing apparatus according to any one of (1) to (9), wherein the first feature amount is luminance information and the second feature amount is color information.
(11)
An image processing method including:
a calculation step of calculating an evaluation value expressed as the sum, for the case where the mixing ratio is varied, of reliabilities each obtained by mixing, at a predetermined mixing ratio, the matching degree of a first feature amount and the matching degree of a second feature amount between a target image including an object to be tracked and a comparison image that is an image of a comparison region of a predetermined frame compared with the target image, and obtaining the mixing ratio at which the evaluation value is maximized; and
a detection step of detecting an image corresponding to the target image based on the reliability in which the mixing ratio at which the evaluation value is maximized is set.
(12)
A recording medium on which is recorded a program for causing a computer to execute processing including:
a calculation step of calculating an evaluation value expressed as the sum, for the case where the mixing ratio is varied, of reliabilities each obtained by mixing, at a predetermined mixing ratio, the matching degree of a first feature amount and the matching degree of a second feature amount between a target image including an object to be tracked and a comparison image that is an image of a comparison region of a predetermined frame compared with the target image, and obtaining the mixing ratio at which the evaluation value is maximized; and
a detection step of detecting an image corresponding to the target image based on the reliability in which the mixing ratio at which the evaluation value is maximized is set.
(13)
A program for causing a computer to execute processing including:
a calculation step of calculating an evaluation value expressed as the sum, for the case where the mixing ratio is varied, of reliabilities each obtained by mixing, at a predetermined mixing ratio, the matching degree of a first feature amount and the matching degree of a second feature amount between a target image including an object to be tracked and a comparison image that is an image of a comparison region of a predetermined frame compared with the target image, and obtaining the mixing ratio at which the evaluation value is maximized; and
a detection step of detecting an image corresponding to the target image based on the reliability in which the mixing ratio at which the evaluation value is maximized is set.

DESCRIPTION OF SYMBOLS: 1 digital camera, 12 lens, 13 output unit, 14 input unit, 15 storage unit, 21 capture unit, 22 cut-out unit, 23 initialization unit, 24 calculation unit, 25 setting unit, 26 computation unit, 27 detection unit, 28 display unit, 29 focus unit, 30 determination unit

Claims

An image processing apparatus comprising:
a calculation unit that calculates an evaluation value expressed as the sum, for the case where the mixing ratio is varied, of reliabilities each obtained by mixing, at a predetermined mixing ratio, the matching degree of a first feature amount and the matching degree of a second feature amount between a target image including an object to be tracked and a comparison image that is an image of a comparison region of a first frame compared with the target image, and that obtains the mixing ratio at which the evaluation value is maximized; and
a detection unit that detects an image corresponding to the target image in a second frame based on the reliability in which the mixing ratio at which the evaluation value is maximized is set.

The image processing apparatus according to claim 1, wherein the first frame and the second frame are one and the other of an odd frame and an even frame.

The image processing apparatus according to claim 2, further comprising a computation unit that takes a scan image in a scan region of the second frame as the comparison image and calculates the reliability between the target image and the scan image.

The image processing apparatus according to claim 3, wherein the detection unit detects, as the image corresponding to the target image, the scan image for which the reliability between the target image and the scan image is maximized.

The image processing apparatus according to claim 4, wherein the calculation unit takes the image of a reference region of the first frame as the target image, takes a plurality of regions each including at least part of the target image of the reference region as positive regions, takes a plurality of regions not including the target image of the reference region as negative regions, calculates a first reliability that is the reliability between the target image of the reference region and the images of the plurality of positive regions, calculates a second reliability that is the reliability between the target image of the reference region and the images of the plurality of negative regions, calculates a first product sum that is the product sum of the first reliability and a first weighting factor of the positive regions, calculates a second product sum that is the product sum of the second reliability and a second weighting factor of the negative regions, and calculates the sum of the first product sum and the second product sum as the evaluation value.

The image processing apparatus, wherein the first weighting factor is a value obtained by dividing a constant by the number of positive regions, and the second weighting factor is a value obtained by dividing the constant by the number of negative regions.

The image processing apparatus according to claim 5, wherein the calculation unit takes, as a new target image, the image of a region of a third frame later than the second frame, the region corresponding to the coordinates of the image in the second frame corresponding to the target image, calculates the evaluation value for the third frame, and obtains the mixing ratio at which the evaluation value is maximized, and
the detection unit detects, in a fourth frame later than the third frame, an image corresponding to the new target image of the third frame, based on the reliability in which the mixing ratio at which the evaluation value obtained from the image of the third frame is maximized is set.

The image processing apparatus according to claim 5, further comprising: a display unit that displays a marker in an area corresponding to the coordinates of the image corresponding to the target image.

The image processing apparatus according to claim 5, further comprising a drive unit that drives the position of the camera so that an image corresponding to the target image is arranged at a predetermined position on the screen.

The image processing apparatus according to claim 5, wherein the first feature amount is luminance information and the second feature amount is color information.

An image processing method comprising:
a calculation step of calculating an evaluation value expressed as the sum, for the case where the mixing ratio is varied, of reliabilities each obtained by mixing, at a predetermined mixing ratio, the matching degree of a first feature amount and the matching degree of a second feature amount between a target image including an object to be tracked and a comparison image that is an image of a comparison region of a predetermined frame compared with the target image, and obtaining the mixing ratio at which the evaluation value is maximized; and
a detection step of detecting an image corresponding to the target image based on the reliability in which the mixing ratio at which the evaluation value is maximized is set.

A recording medium on which is recorded a program for causing a computer to execute processing comprising:
a calculation step of calculating an evaluation value expressed as the sum, for the case where the mixing ratio is varied, of reliabilities each obtained by mixing, at a predetermined mixing ratio, the matching degree of a first feature amount and the matching degree of a second feature amount between a target image including an object to be tracked and a comparison image that is an image of a comparison region of a predetermined frame compared with the target image, and obtaining the mixing ratio at which the evaluation value is maximized; and
a detection step of detecting an image corresponding to the target image based on the reliability in which the mixing ratio at which the evaluation value is maximized is set.

A program for causing a computer to execute processing comprising:
a calculation step of calculating an evaluation value expressed as the sum, for the case where the mixing ratio is varied, of reliabilities each obtained by mixing, at a predetermined mixing ratio, the matching degree of a first feature amount and the matching degree of a second feature amount between a target image including an object to be tracked and a comparison image that is an image of a comparison region of a predetermined frame compared with the target image, and obtaining the mixing ratio at which the evaluation value is maximized; and
a detection step of detecting an image corresponding to the target image based on the reliability in which the mixing ratio at which the evaluation value is maximized is set.