JP2011090488A - Object tracking device, object tracking program, and camera

Info

Publication number: JP2011090488A
Application number: JP2009243332A
Authority: JP
Inventors: Yuichi Ito; 悠一伊藤
Original assignee: Nikon Corp
Current assignee: Nikon Corp
Priority date: 2009-10-22
Filing date: 2009-10-22
Publication date: 2011-05-06

Abstract

PROBLEM TO BE SOLVED: To provide an object tracking device that tracks an object between frames and specifies the position of the object with high accuracy.

SOLUTION: A controller 104 specifies a transition destination of an object by motion estimation based on time-sequentially input inter-frame information, calculates a first similarity between each pixel in the frame and a template for tracking the object by template matching, calculates a second similarity by weighting the first similarity of each pixel according to its distance from the transition destination specified by motion estimation, and specifies the position of the object in the frame based on the second similarity.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a subject tracking device, a subject tracking program, and a camera.

The following tracking device is known. It tracks a subject by predicting the subject position in the next frame from information such as the past motion of the subject (see, for example, Patent Document 1).

Patent Document 1: JP 2007-42072 A

However, with the conventional tracking device, when the frame rate is low and the correlation between adjacent frames is poor, the accuracy of predicting the subject position may drop, and tracking performance may degrade as a result.

The subject tracking device according to the present invention comprises: transition destination specifying means for specifying a transition destination of a subject based on information between frames input in time series; first similarity calculating means for performing template matching to calculate a first similarity between each pixel in a frame and a subject tracking template; second similarity calculating means for calculating a second similarity by weighting the first similarity of each pixel calculated by the first similarity calculating means according to the distance from the transition destination of the subject specified by the transition destination specifying means; and subject position specifying means for specifying the subject position in the frame based on the second similarity calculated by the second similarity calculating means.
In the present invention, the transition destination specifying means may specify the transition destination of the subject by calculating a motion vector indicating the motion of the subject between frames.
The device may further comprise template updating means for updating a motion search template, which is used to calculate the motion vector, using an image in a region near the subject position specified by the subject position specifying means.
The transition destination specifying means may generate a plurality of images with different resolutions from the same frame and calculate the motion vector by performing template matching with the motion search template in order, starting from the lowest-resolution image.
The subject tracking program according to the present invention causes a computer to execute: a transition destination specifying procedure for specifying a transition destination of a subject based on information between frames input in time series; a first similarity calculation procedure for performing template matching to calculate a first similarity between each pixel in a frame and a subject tracking template; a second similarity calculation procedure for calculating a second similarity by weighting the first similarity of each pixel calculated in the first similarity calculation procedure according to the distance from the transition destination of the subject specified in the transition destination specifying procedure; and a subject position specifying procedure for specifying the subject position in the frame based on the second similarity calculated in the second similarity calculation procedure.
A camera according to the present invention includes any one of the subject tracking devices described above.

According to the present invention, the subject position can be specified with high accuracy.

FIG. 1 is a block diagram showing the configuration of an embodiment of the camera 100.
FIG. 2 is a flowchart showing the flow of the subject tracking process.
FIG. 3 is a diagram showing an example of motion vector detection.
FIG. 4 is a diagram schematically showing the motion search method.
FIG. 5 is a diagram illustrating the process from the motion search to the calculation of the second similarity.

FIG. 1 is a block diagram showing the configuration of a camera according to one embodiment. The camera 100 includes an operation member 101, a lens 102, an image sensor 103, a control device 104, a memory card slot 105, and a monitor 106. The operation member 101 includes various input members operated by the user, such as a power button, a release button, a zoom button, a cross key, an enter button, a playback button, and a delete button.

The lens 102 consists of a plurality of optical lenses, but FIG. 1 shows a single lens as a representative. The image sensor 103 is, for example, a CCD or CMOS image sensor, and captures the subject image formed by the lens 102. It then outputs the image signal obtained by imaging to the control device 104.

The control device 104 includes a CPU, memory, and other peripheral circuits, and controls the camera 100. The memory of the control device 104 includes SDRAM and flash memory. The SDRAM is volatile memory used as work memory into which the CPU loads a program at execution time, and as buffer memory for temporarily storing data. The flash memory is non-volatile memory that stores the data of the program executed by the control device 104 and various parameters read during program execution.

The control device 104 generates image data in a predetermined format, for example JPEG (hereinafter referred to as "main image data"), based on the image signal input from the image sensor 103. From the generated main image data, the control device 104 also generates display image data, for example thumbnail image data. The control device 104 then generates an image file that contains the main image data and the thumbnail image data together with added header information, and outputs the file to the memory card slot 105.

The memory card slot 105 is a slot into which a memory card serving as a storage medium is inserted, and it writes the image file output from the control device 104 to the memory card. The memory card slot 105 also reads image files stored on the memory card in response to instructions from the control device 104.

The monitor 106 is a liquid crystal monitor (rear monitor) mounted on the back of the camera 100. It displays images stored on the memory card, a setting menu for configuring the camera 100, and the like. When the user sets the camera 100 to shooting mode, the control device 104 outputs to the monitor 106 display image data generated from the image signals acquired in time series from the image sensor 103. As a result, a through image (live view) is displayed on the monitor 106.

In the camera 100 of the present embodiment, the control device 104 performs template matching on each frame (frame image) of the through image input from the image sensor 103, using a template image prepared in advance, and thereby identifies the image region in the frame that is similar to the template image. The control device 104 then performs subject tracking by tracking the identified region from frame to frame.

In the present embodiment, the control device 104 performs subject tracking using two templates for template matching: a motion search template for searching for the motion of the subject between frames, and a subject tracking template. The subject tracking process of the present embodiment is described in detail below with reference to the flowchart in FIG. 2. The process shown in FIG. 2 is executed by the control device 104 as a program that starts when input of the image signal from the image sensor 103 begins.

In step S1, the control device 104 captures a frame image input from the image sensor 103 and proceeds to step S2. In step S2, the control device 104 acquires the image within a predetermined range of the frame image captured in step S1 as the motion search template and the subject tracking template. Both templates are obtained by cutting out the image within a range specified by the user in the frame image captured in step S1. The initial motion search template and subject tracking template acquired in step S2 may be the same image or different images.

The process then proceeds to step S3, where the control device 104 performs a motion search on the frame image captured in step S1 using the motion search template acquired in step S2. For example, the control device 104 specifies the transition destination of the subject based on information between frames input in time series. Specifically, it performs the motion search on the frame image with the motion search template by a method such as template matching (block matching) or the gradient method.

This motion search detects a motion vector indicating the transition of the subject between frames. For example, when the subject 2a in the i-th frame image shown in FIG. 3(a) moves to the position shown in the (i+1)-th frame image in FIG. 3(b), the motion search detects the motion vector 2b indicating the motion of the subject between the frames.

In the motion search of the present embodiment, the control device 104 detects the motion vector by hierarchical template matching, as shown in FIG. 4. Specifically, for any pair of adjacent frames, for example the i frame and the i+1 frame, the control device 104 creates images with a hierarchical structure as shown in FIG. 4. There are two or more layers; in the example of FIG. 4, the i frame has layers 4a to 4c and the i+1 frame has layers 4d to 4f, three layers each. The side length of the N-th layer image counted from the high-resolution side (4c side, 4f side) is scaled by a ratio of 1/2^N.

In this case, the templates are extracted from the i frame. Specifically, a template is extracted in each layer of the i frame, as indicated by the dotted frames. Using these templates, template matching is performed in order starting from the lowest-resolution layer image, namely image 4f in the i+1 frame. Here, the coordinates obtained by doubling the coordinates matched in layer L-1 are used as the initial search position in layer L. With Xend denoting the matched coordinates in layer L-1 and Xini denoting the initial search coordinates in layer L, this can be expressed by the following equation (1).

Xini = 2 × Xend ... (1)

With such hierarchical template matching, a large motion of the subject can be detected accurately and quickly even when the search area in each layer is small. The process then proceeds to step S4.
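The coarse-to-fine search described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the 2x2 averaging used to build each layer, the SAD cost, the search radius, and all function names (`downsample`, `local_search`, `hierarchical_search`, `make_frame`) are assumptions introduced for the example. Layer index 0 is the full-resolution image, and the doubling of the matched coordinates follows equation (1).

```python
def downsample(img):
    """Halve each side by averaging 2x2 blocks (one pyramid layer; an assumed filter)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]

def sad(img, tmpl, top, left):
    """Sum of absolute differences between tmpl and the window at (top, left)."""
    return sum(abs(img[top + dy][left + dx] - tmpl[dy][dx])
               for dy in range(len(tmpl)) for dx in range(len(tmpl[0])))

def local_search(img, tmpl, init, radius):
    """Return the (top, left) with minimum SAD within +/-radius of init."""
    th, tw = len(tmpl), len(tmpl[0])
    best, best_pos = None, None
    for top in range(init[0] - radius, init[0] + radius + 1):
        for left in range(init[1] - radius, init[1] + radius + 1):
            if 0 <= top <= len(img) - th and 0 <= left <= len(img[0]) - tw:
                cost = sad(img, tmpl, top, left)
                if best is None or cost < best:
                    best, best_pos = cost, (top, left)
    return best_pos

def hierarchical_search(prev, curr, top, left, size, levels=3, radius=2):
    """Motion (dy, dx) of the size x size patch at (top, left) from prev to curr."""
    prev_pyr, curr_pyr = [prev], [curr]
    for _ in range(levels - 1):
        prev_pyr.append(downsample(prev_pyr[-1]))
        curr_pyr.append(downsample(curr_pyr[-1]))
    pos = None
    for lvl in range(levels - 1, -1, -1):          # lowest resolution first
        s = 2 ** lvl
        t, l, n = top // s, left // s, max(1, size // s)
        tmpl = [row[l:l + n] for row in prev_pyr[lvl][t:t + n]]
        # Equation (1): double the coarser match to seed the finer layer.
        init = (t, l) if pos is None else (2 * pos[0], 2 * pos[1])
        pos = local_search(curr_pyr[lvl], tmpl, init, radius)
    return pos[0] - top, pos[1] - left

def make_frame(top, left, side=32, blob=8):
    """Synthetic test frame: textured blob on a black background."""
    img = [[0.0] * side for _ in range(side)]
    for dy in range(blob):
        for dx in range(blob):
            img[top + dy][left + dx] = 1.0 + blob * dy + dx
    return img

motion = hierarchical_search(make_frame(8, 8), make_frame(12, 16), 8, 8, 8)
```

In this synthetic case the patch moves by 8 pixels horizontally, which is outside the ±2 pixel search window at full resolution but inside it at the coarsest layer, which is the benefit the paragraph above describes.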

In step S4, the control device 104 performs template matching on the frame image using the subject tracking template, and calculates, for each pixel of the frame image, its similarity to the subject tracking template. In the present embodiment, the similarity calculated for each pixel in step S4 is called the first similarity. Template matching with a subject tracking template is a well-known technique, so a detailed description is omitted; in general, a method such as SAD (Sum of Absolute Differences) or SSD (Sum of Squared Differences) is used.
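As a rough illustration of step S4, the sketch below computes a SAD or SSD value for the window anchored at every valid pixel. The function name `difference_map` and its `squared` flag are hypothetical, and images are assumed to be plain lists of rows of numbers. Note that SAD and SSD measure difference, so a smaller value means a closer match.

```python
def difference_map(frame, tmpl, squared=False):
    """SAD (or SSD when squared=True) between tmpl and each window of frame."""
    th, tw = len(tmpl), len(tmpl[0])

    def cost(a, b):
        d = a - b
        return d * d if squared else abs(d)

    return [[sum(cost(frame[y + dy][x + dx], tmpl[dy][dx])
                 for dy in range(th) for dx in range(tw))
             for x in range(len(frame[0]) - tw + 1)]
            for y in range(len(frame) - th + 1)]

frame = [[0, 0, 0, 0],
         [0, 5, 6, 0],
         [0, 7, 8, 0],
         [0, 0, 0, 0]]
tmpl = [[5, 6],
        [7, 8]]
sad_values = difference_map(frame, tmpl)                # zero where the patch matches exactly
ssd_values = difference_map(frame, tmpl, squared=True)
```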

The process then proceeds to step S5, where the control device 104 calculates a second similarity by weighting the first similarity of each pixel according to its distance from the transition destination of the main subject obtained by the motion search in step S3. Specifically, the control device 104 calculates the second similarity by the following equation (2).
Second similarity = first similarity × weight based on distance information ... (2)

In equation (2), examples of the weighting based on distance information include weighting with the Gaussian function shown in equation (3) and weighting with the linear function shown in equation (4).
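Equations (3) and (4) are not reproduced in this text, so the following is only a generic sketch of what Gaussian and linear distance weights can look like. The functional forms, the parameters `sigma` and `max_dist`, and the helper names are all assumptions rather than the patent's actual formulas.

```python
import math

def gaussian_weight(dist, sigma=10.0):
    """Assumed Gaussian falloff: 1.0 at the transition destination, decaying with distance."""
    return math.exp(-(dist * dist) / (2.0 * sigma * sigma))

def linear_weight(dist, max_dist=30.0):
    """Assumed linear falloff: 1.0 at the destination, 0.0 at max_dist and beyond."""
    return max(0.0, 1.0 - dist / max_dist)

def weight_map(height, width, center, weight_fn):
    """Weight for every pixel, based on Euclidean distance from center = (row, col)."""
    cy, cx = center
    return [[weight_fn(math.hypot(y - cy, x - cx)) for x in range(width)]
            for y in range(height)]

weights = weight_map(5, 5, (2, 2), gaussian_weight)
```

Both functions give the largest weight at the predicted transition destination and smaller weights farther away, which matches the qualitative behavior described for FIG. 5(b).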

The process from the motion search in step S3 to the calculation of the second similarity in step S5 is now illustrated with reference to FIG. 5. As shown in FIG. 5(a), the motion search in step S3 calculates the motion vector 5b of the subject 5a to be tracked. In the example of FIG. 5(a), the subject 5a was located within the dotted frame in the i frame and has moved into the solid frame in the i+1 frame, and the motion vector 5b is calculated from this movement.

FIG. 5(b) illustrates the weighting applied in step S5, in which the first similarity of each pixel is weighted according to its distance from the transition destination 5c of the main subject obtained by the motion search in step S3. The transition destination 5c of the main subject shown in FIG. 5(b) corresponds to the position of the subject 5a in the i+1 frame shown in FIG. 5(a). Pixels closer to the center of the transition destination 5c are given a higher weight, and the weight decreases with distance from the center. In the example of FIG. 5(b), a darker color indicates a higher weight and a lighter color indicates a lower weight.

FIG. 5(c) illustrates the first similarity calculated in step S4. In this example, in addition to the first similarity component 5e corresponding to the tracked subject 5a in FIG. 5(a), a first similarity component 5f corresponding to another subject 5c and a first similarity component 5g corresponding to a subject 5d are calculated.

FIG. 5(d) illustrates the second similarity calculated in step S5 by weighting the first similarity components according to their distance from the transition destination of the main subject obtained by the motion search. In the example of FIG. 5(d), based on the weighting shown in FIG. 5(b), a large weight is applied to the first similarity component 5e, which corresponds to the tracked subject 5a and lies close to the transition destination 5c of the main subject. As a result, the second similarity component 5h corresponding to the subject 5a is calculated.

The process then proceeds to step S6, where the control device 104 identifies and extracts the position in the frame image at which the value of the second similarity calculated in step S5 is smallest, that is, the position at which the second similarity is highest. The control device 104 tracks the subject between frames by taking the extracted position as the position of the tracked subject in the frame image. The process then proceeds to step S7.
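Steps S5 and S6 can be sketched as below. One assumption needs stating clearly: the example treats the first similarity as a score in which a larger value means a better match (for instance, a SAD value that has already been converted to a score), so that multiplying by a weight that peaks at the transition destination favors nearby candidates, per equation (2). The function names, the Gaussian width of 1.5, and the sample values are hypothetical.

```python
import math

def second_similarity(first, weights):
    """Equation (2): element-wise product of first similarity and distance weight."""
    return [[s * w for s, w in zip(srow, wrow)]
            for srow, wrow in zip(first, weights)]

def best_position(score):
    """Return ((row, col), value) of the largest score (higher = more similar here)."""
    value, y, x = max((v, y, x) for y, row in enumerate(score)
                      for x, v in enumerate(row))
    return (y, x), value

# Two candidates: a distractor at (0, 4) that matches slightly better,
# and the true subject at (3, 1), which is where the motion search points.
first = [[0.0] * 5 for _ in range(5)]
first[0][4] = 0.9
first[3][1] = 0.8
predicted = (3, 1)
weights = [[math.exp(-((y - predicted[0]) ** 2 + (x - predicted[1]) ** 2) / (2 * 1.5 ** 2))
            for x in range(5)] for y in range(5)]

unweighted_pos, _ = best_position(first)
tracked_pos, tracked_score = best_position(second_similarity(first, weights))
```

Without the weighting the distractor wins; with it, the candidate near the predicted transition destination is selected, which is the effect illustrated in FIG. 5(d).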

In step S7, to prepare for the motion search on the next frame image, the control device 104 acquires a new motion search template from the region near the position with the highest second similarity extracted in step S6, updates the motion search template, and proceeds to step S8.

In step S8, the control device 104 determines whether the highest second similarity identified in step S6, that is, the best similarity, is greater than a predetermined threshold. If the determination in step S8 is affirmative, the process proceeds to step S9, where, to prepare for template matching on the next frame image, the control device 104 acquires a new subject tracking template from the region near the position with the highest second similarity extracted in step S6, updates the subject tracking template, and proceeds to step S10. If the determination in step S8 is negative, the process proceeds directly to step S10.

Thus, in the present embodiment, the motion search template is updated every frame, whereas the subject tracking template is not necessarily updated every frame and is updated only when the condition of step S8 is satisfied. The motion search template therefore always reflects the latest shape of the subject, so the motion vector can be calculated from the subject's current appearance. The subject tracking template, on the other hand, is updated only when the similarity is high, which prevents the template from being updated after a false match and allows template matching to continue with high accuracy.
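The two update rules of steps S7 to S9 can be summarized in a short sketch. The function name `update_templates`, the square `size` x `size` patch anchored at the best position, and the sample threshold are assumptions made for illustration; the source only states that the new templates are taken from the region near the best position.

```python
def update_templates(frame, best_pos, best_score, size, tracking_tmpl, threshold):
    """Return (motion_tmpl, tracking_tmpl) after processing one frame."""
    top, left = best_pos
    patch = [row[left:left + size] for row in frame[top:top + size]]
    motion_tmpl = patch                 # step S7: refreshed on every frame
    if best_score > threshold:          # step S8: only when the match is trusted
        tracking_tmpl = patch           # step S9: refresh the tracking template
    return motion_tmpl, tracking_tmpl

frame = [[y * 4 + x for x in range(4)] for y in range(4)]
old_tracking = [[-1, -1], [-1, -1]]

# Confident match: both templates are refreshed.
motion_a, tracking_a = update_templates(frame, (1, 1), 0.9, 2, old_tracking, 0.5)
# Weak match: only the motion search template is refreshed.
motion_b, tracking_b = update_templates(frame, (1, 1), 0.3, 2, old_tracking, 0.5)
```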

In step S10, the control device 104 determines whether to end the process. For example, the control device 104 determines that the process should end when input of frame images from the image sensor 103 stops. If the determination in step S10 is negative, the process returns to step S3 and repeats. If the determination in step S10 is affirmative, the process ends.

The embodiment described above provides the following effects.
(1) The control device 104 performs a motion search to specify the transition destination of the subject based on information between frames input in time series, and performs template matching to calculate the first similarity between each pixel in the frame and the subject tracking template. It then calculates the second similarity by weighting the first similarity of each pixel according to its distance from the transition destination specified by the motion search, and specifies the subject position in the frame based on the second similarity. Specifying the subject position from the second similarity, obtained by weighting the first similarity according to the distance from the transition destination of the subject, allows the subject position to be specified with high accuracy.

(2) The control device 104 specifies the transition destination of the subject by calculating a motion vector indicating the motion of the subject between frames. This allows the transition destination of the subject to be specified with high accuracy.

(3) The control device 104 acquires a new motion search template from the region near the position with the highest second similarity and updates the motion search template. The motion search template therefore always reflects the latest shape of the subject, so the motion vector can be calculated from the subject's current appearance.

(4) In the motion search, the control device 104 detects the motion vector by hierarchical template matching. As a result, a large motion of the subject can be detected accurately and quickly even when the search area in each layer is small.

-Modifications-
The camera of the embodiment described above can also be modified as follows.
(1) In the embodiment described above, the control device 104 performs subject tracking on the through image. However, if the camera 100 has a movie shooting function, the control device 104 may instead perform subject tracking between the frames of a recorded movie.

(2) In the embodiment described above, the control device 104 of the camera 100 performs template matching to track the subject. However, a program for executing the template matching may instead be stored on another terminal, such as a personal computer, and executed on that terminal. In this case, subject tracking between the frames of a movie is possible by loading the movie data shot with a camera onto the terminal, which then functions as the subject tracking device, and processing that data. The present invention can also be applied to devices such as camera-equipped mobile phones.

The present invention is in no way limited to the configurations of the embodiment described above, as long as the characteristic functions of the present invention are not impaired. The embodiment described above may also be combined with one or more of the modifications.

100 camera, 101 operation member, 102 lens, 103 image sensor, 104 control device, 105 memory card slot, 106 monitor

Claims

1. A subject tracking device comprising:
transition destination specifying means for specifying a transition destination of a subject based on information between frames input in time series;
first similarity calculating means for performing template matching to calculate a first similarity between each pixel in a frame and a subject tracking template;
second similarity calculating means for calculating a second similarity by weighting the first similarity of each pixel calculated by the first similarity calculating means according to the distance from the transition destination of the subject specified by the transition destination specifying means; and
subject position specifying means for specifying a subject position in the frame based on the second similarity calculated by the second similarity calculating means.

2. The subject tracking device according to claim 1,
wherein the transition destination specifying means specifies the transition destination of the subject by calculating a motion vector indicating the motion of the subject between frames.

3. The subject tracking device according to claim 2,
further comprising template updating means for updating a motion search template used to calculate the motion vector, using an image in a region near the subject position specified by the subject position specifying means.

4. The subject tracking device according to claim 3,
wherein the transition destination specifying means generates a plurality of images with different resolutions from the same frame, and calculates the motion vector by performing template matching with the motion search template in order, starting from the lowest-resolution image.

5. A subject tracking program that causes a computer to execute:
a transition destination specifying procedure for specifying a transition destination of a subject based on information between frames input in time series;
a first similarity calculation procedure for performing template matching to calculate a first similarity between each pixel in a frame and a subject tracking template;
a second similarity calculation procedure for calculating a second similarity by weighting the first similarity of each pixel calculated in the first similarity calculation procedure according to the distance from the transition destination of the subject specified in the transition destination specifying procedure; and
a subject position specifying procedure for specifying a subject position in the frame based on the second similarity calculated in the second similarity calculation procedure.

6. A camera comprising the subject tracking device according to claim 1.