JP2017027363A

JP2017027363A - Video processing device, video processing method and program

Info

Publication number: JP2017027363A
Application number: JP2015145227A
Authority: JP
Inventors: 靖子白鷹; Yasuko Shirataka; 多聞貞末; Tamon Sadasue; 康宏梶原; Yasuhiro Kajiwara; 和史松下; Kazufumi Matsushita; 和寛 ▲高▼澤; Kazuhiro Takazawa; 賢青木; Masaru Aoki; 康子橋本; Yasuko Hashimoto
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2015-07-22
Filing date: 2015-07-22
Publication date: 2017-02-02

Abstract

PROBLEM TO BE SOLVED: To improve the tracking accuracy of a feature point of a video frame.
SOLUTION: A video processing device comprises: an extraction part for extracting a feature point from a first frame; a tracking part for performing block matching on a second frame with a block containing the feature point, thereby specifying a tracking point corresponding to the feature point; a detection part for detecting the position of a first pixel which is the most similar to the feature point, the similarity between the first pixel and the feature point, the position of a second pixel which is the second most similar to the feature point, and the similarity between the second pixel and the feature point; a first determination part for determining that a first condition is satisfied when the position of the first pixel matches the position of the tracking point and the difference between the similarity of the first pixel and the similarity of the second pixel is equal to or larger than a first threshold; and a second determination part for calculating the similarity between the tracking point and the feature point and the similarity between each of a plurality of pixels surrounding the tracking point and the feature point, and determining that a second condition is satisfied when the difference between the similarity of the tracking point and the similarity of each of the surrounding pixels is equal to or larger than a second threshold. When the first and second conditions are satisfied, it is determined that the feature point corresponds to the tracking point.
SELECTED DRAWING: Figure 3

Description

本発明は、映像処理装置、映像処理方法、及びプログラムに関する。 The present invention relates to a video processing apparatus, a video processing method, and a program.

自動車の自立運転等に用いる技術の一つに、自己位置の推定を行う技術（ＳＬＡＭ：ＳｉｍｕｌｔａｎｅｏｕｓｌｙＬｏｃａｌｉｚａｔｉｏｎＡｎｄＭａｐｐｉｎｇ）があり、ＳＬＡＭを用いて正確に自己位置の推定を行う方法が広く検討されている。ＳＬＡＭでは、カメラ等で撮影された映像データから特徴点を抽出し、かかる特徴点を追跡することにより、自己位置の推定を行う。 One of the technologies used for autonomous driving of automobiles and the like is a technology for estimating self-position (SLAM: Simultaneous Localization and Mapping), and methods for accurately estimating self-position using SLAM have been widely studied. In SLAM, feature points are extracted from video data captured by a camera or the like, and the self-position is estimated by tracking those feature points.

特徴点の追跡の手法として、映像のフレーム間の所定の領域の類似度を比較して、特徴点の追跡を行うブロックマッチングが広く知られている。 As a feature point tracking method, block matching for tracking feature points by comparing similarities of predetermined regions between frames of a video is widely known.

ブロックマッチングでは、類似度の比較のときの計算誤差により、特徴点の追跡を誤る場合があった。このため、輝度等の画素の属性を用いて、特徴点の追跡の精度を向上させる様々な方法が提案されている（例えば、特許文献１）。 In block matching, tracking of feature points may be mistaken due to a calculation error in comparison of similarities. For this reason, various methods for improving the accuracy of tracking feature points using pixel attributes such as luminance have been proposed (for example, Patent Document 1).

しかし、提案されている方法を用いても、追跡された特徴点の周辺に類似する画素が分布している場合には、特徴点を正確に追跡できないという課題があった。 However, even if the proposed method is used, if similar pixels are distributed around the tracked feature point, there is a problem that the feature point cannot be accurately tracked.

本発明は、上記の課題を鑑みてされたものであって、映像のフレームの特徴点の追跡の精度を向上させることを目的とする。 The present invention has been made in view of the above-described problems, and an object of the present invention is to improve the accuracy of tracking feature points of video frames.

本実施形態に係る映像処理装置は、第１の映像のフレームから特徴点を抽出する抽出部と、前記第１の映像のフレームより時間的に後の映像のフレームである第２の映像のフレームの所定の領域に対して、前記第１の映像のフレームの一部であって、前記特徴点を含むブロックとのブロックマッチング処理を行うことにより、前記特徴点に対応する追跡点を特定する追跡部と、前記所定の領域において、前記特徴点に最も類似する第１の画素の位置、該第１の画素と前記特徴点との類似度、前記特徴点に２番目に類似する第２の画素の位置、及び該第２の画素と前記特徴点との類似度を検出する検出部と、前記追跡点の位置と前記第１の画素の位置とが一致し、かつ前記第１の画素の類似度と前記第２の画素の類似度との差が第１の閾値以上である場合、第１の条件を満たすと判定する第１の判定部と、前記追跡点と前記特徴点との類似度と、前記追跡点の複数の周辺の画素の各々と前記特徴点との類似度とを算出し、算出された前記追跡点の類似度と、前記複数の周辺の画素の各々の類似度との差が、第２の閾値以上である場合、第２の条件を満たすと判定する第２の判定部と、を有し、前記第１の条件、及び前記第２の条件を満たす場合、前記追跡部は、前記特徴点が前記追跡点に対応すると判断する。 The video processing apparatus according to the present embodiment includes: an extraction unit that extracts a feature point from a first video frame; a tracking unit that specifies a tracking point corresponding to the feature point by performing block matching processing on a predetermined region of a second video frame, which is a video frame temporally later than the first video frame, using a block that is a part of the first video frame and includes the feature point; a detection unit that detects, in the predetermined region, the position of a first pixel most similar to the feature point, the similarity between the first pixel and the feature point, the position of a second pixel second most similar to the feature point, and the similarity between the second pixel and the feature point; a first determination unit that determines that a first condition is satisfied when the position of the tracking point matches the position of the first pixel and the difference between the similarity of the first pixel and the similarity of the second pixel is equal to or greater than a first threshold; and a second determination unit that calculates the similarity between the tracking point and the feature point and the similarity between each of a plurality of pixels surrounding the tracking point and the feature point, and determines that a second condition is satisfied when the difference between the calculated similarity of the tracking point and the similarity of each of the plurality of surrounding pixels is equal to or greater than a second threshold. When the first condition and the second condition are satisfied, the tracking unit determines that the feature point corresponds to the tracking point.

本実施形態によれば、映像のフレームの特徴点の追跡の精度が向上された映像処理装置、映像処理方法、及びプログラムを提供することが可能となる。 According to the present embodiment, it is possible to provide a video processing apparatus, a video processing method, and a program with improved accuracy of tracking feature points of video frames.

一実施形態に係る映像処理装置の機能ブロック図である。 A functional block diagram of the video processing apparatus according to an embodiment.
一実施形態に係る映像処理装置の追跡部の機能ブロック図である。 A functional block diagram of the tracking unit of the video processing apparatus according to an embodiment.
一実施形態に係る動作手順を示すフローチャートである。 A flowchart showing the operation procedure according to an embodiment.
一実施形態に係るブロックマッチングの処理を示す図（その１）である。 A diagram (part 1) showing the block matching process according to an embodiment.
一実施形態に係るブロックマッチングの処理を示す図（その２）である。 A diagram (part 2) showing the block matching process according to an embodiment.
一実施形態に係るブロックマッチングの処理の計算式を示す図である。 A diagram showing the calculation formula of the block matching process according to an embodiment.
一実施形態に係る第１の判定の動作を説明するための図（その１）である。 A diagram (part 1) for explaining the operation of the first determination according to an embodiment.
一実施形態に係る第１の判定の動作の計算式を示す図（その１）である。 A diagram (part 1) showing the calculation formula of the operation of the first determination according to an embodiment.
一実施形態に係る第１の判定の動作を説明するための図（その２）である。 A diagram (part 2) for explaining the operation of the first determination according to an embodiment.
一実施形態に係る第１の判定の動作の計算式を示す図（その２）である。 A diagram (part 2) showing the calculation formula of the operation of the first determination according to an embodiment.
一実施形態に係る第１の判定の動作を説明するための図（その３）である。 A diagram (part 3) for explaining the operation of the first determination according to an embodiment.
一実施形態に係る第１の判定の動作の計算式を示す図（その３）である。 A diagram (part 3) showing the calculation formula of the operation of the first determination according to an embodiment.
一実施形態に係る第１の判定の動作を説明するための図（その４）である。 A diagram (part 4) for explaining the operation of the first determination according to an embodiment.
一実施形態に係る第１の判定の動作を示すフローチャートである。 A flowchart showing the operation of the first determination according to an embodiment.
一実施形態に係るブロックマッチングの処理を示す図（その３）である。 A diagram (part 3) showing the block matching process according to an embodiment.
一実施形態に係る第２の判定の動作を説明するための図である。 A diagram for explaining the operation of the second determination according to an embodiment.
一実施形態に係る第２の判定の動作の計算式を示す図である。 A diagram showing the calculation formula of the operation of the second determination according to an embodiment.
一実施形態に係る映像処理装置のハードウェア構成図である。 A hardware configuration diagram of the video processing apparatus according to an embodiment.

［第１実施形態］ [First Embodiment]
＜機能構成＞ <Functional configuration>
（１）全体構成 (1) Overall configuration
図１を用いて、本実施形態に係る映像処理装置１の機能構成について説明する。映像処理装置１は、撮影部１００と、映像入力部１１０と、映像補正部１２０と、特徴点抽出部１３０と、追跡部１４０と、フレームバッファ１５０と、重複点除去部１６０とを有する。 The functional configuration of the video processing apparatus 1 according to the present embodiment will be described with reference to FIG. 1. The video processing apparatus 1 includes a photographing unit 100, a video input unit 110, a video correction unit 120, a feature point extraction unit 130, a tracking unit 140, a frame buffer 150, and an overlapping point removal unit 160.

撮影部１００は、カメラで映像を撮影する。撮影された映像を映像入力部１１０に出力する。カメラは単眼カメラ、ステレオカメラのいずれでもよい。撮影部１００のカメラは、自動車に取り付ける車載カメラでもよい。 The photographing unit 100 photographs a video with a camera. The captured video is output to the video input unit 110. The camera may be a monocular camera or a stereo camera. The camera of the photographing unit 100 may be an in-vehicle camera attached to a car.

映像入力部１１０は、撮影部１００によって撮影された映像の入力を受け付ける。また、映像入力部１１０は、撮影部１００以外から映像の入力を受け付けてもよい。例えば、映像入力部１１０は、記憶媒体に記憶された映像の入力を受け付けてもよいし、ネットワーク経由で映像の入力を受け付けてもよい。 The video input unit 110 receives an input of a video shot by the shooting unit 100. Further, the video input unit 110 may accept video input from other than the photographing unit 100. For example, the video input unit 110 may accept input of video stored in a storage medium, or may accept video input via a network.

映像補正部１２０は、映像入力部１１０から映像フレームを取得し、かかる映像フレームのひずみを補正する。 The video correction unit 120 acquires a video frame from the video input unit 110 and corrects distortion of the video frame.

特徴点抽出部１３０は、映像フレームの特徴点を抽出する。特徴点とは、例えば、物体の角など、際立って検出できる映像フレーム上の点である。 The feature point extraction unit 130 extracts feature points of the video frame. A feature point is a point on a video frame that can be detected prominently, such as a corner of an object.

フレームバッファ１５０は、映像補正部１２０から取得した映像フレームを保存する。フレームバッファ１５０は、追跡部１４０に、映像フレームを送信する。 The frame buffer 150 stores the video frame acquired from the video correction unit 120. The frame buffer 150 transmits the video frame to the tracking unit 140.

追跡部１４０は、特徴点抽出部１３０で抽出された特徴点を追跡する。具体的には、追跡部１４０は、特徴点が抽出された映像フレームより時間的に後の映像フレームから、かかる特徴点に対応する追跡点を抽出する。追跡部１４０は、映像補正部１２０から映像フレームを取得する。 The tracking unit 140 tracks the feature points extracted by the feature point extraction unit 130. Specifically, the tracking unit 140 extracts a tracking point corresponding to the feature point from a video frame temporally after the video frame from which the feature point is extracted. The tracking unit 140 acquires a video frame from the video correction unit 120.

追跡部１４０は、特徴点抽出部１３０から、かかる映像フレームにおいて抽出された特徴点に関する情報を取得する。特徴点に関する情報とは、例えば、特徴点における輝度、及び特徴点の映像フレーム内の位置を示す座標情報等である。追跡部１４０は、フレームバッファ１５０から特徴点が抽出された映像フレームより時間的に前の映像フレームを取得する。追跡部１４０は、取得した映像フレーム、及び特徴点に関する情報から、特徴点を追跡する。 The tracking unit 140 acquires information on the feature points extracted in the video frame from the feature point extraction unit 130. The information on the feature point is, for example, the luminance at the feature point and coordinate information indicating the position of the feature point in the video frame. The tracking unit 140 acquires a video frame temporally prior to the video frame from which the feature points are extracted from the frame buffer 150. The tracking unit 140 tracks feature points from the acquired video frame and information on the feature points.

追跡部１４０が、時間Ｍにおける映像フレームＦ（Ｍ）で抽出された特徴点を追跡する場合について説明する。 A case where the tracking unit 140 tracks feature points extracted in the video frame F (M) at time M will be described.

追跡部１４０は、時間Ｍにおける映像フレームＦ（Ｍ）における特徴点に関する情報を特徴点抽出部１３０から取得する。追跡部１４０は、時間Ｍ＋１における映像フレームＦ（Ｍ＋１）を映像補正部１２０から取得する。追跡部１４０は、フレームバッファ１５０より、映像フレームＦ（Ｍ）を取得する。 The tracking unit 140 acquires information on the feature points in the video frame F (M) at the time M from the feature point extraction unit 130. The tracking unit 140 acquires the video frame F (M + 1) at the time M + 1 from the video correction unit 120. The tracking unit 140 acquires the video frame F (M) from the frame buffer 150.

追跡部１４０は、映像フレームＦ（Ｍ）と、かかる映像フレームにおける特徴点に関する情報とから、映像フレームＦ（Ｍ＋１）における特徴点に対応する追跡点を抽出する。 The tracking unit 140 extracts a tracking point corresponding to the feature point in the video frame F (M + 1) from the video frame F (M) and information about the feature point in the video frame.

重複点除去部１６０は、特徴点抽出部１３０で抽出された映像フレームＦ（Ｍ＋１）の特徴点と、追跡部１４０で抽出された映像フレームＦ（Ｍ＋１）の追跡点とが重複する場合に、一方を除去する。 The overlapping point removal unit 160 removes one of the two points when a feature point of the video frame F (M + 1) extracted by the feature point extraction unit 130 overlaps a tracking point of the video frame F (M + 1) extracted by the tracking unit 140.
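
The overlap removal described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `remove_overlaps`, the `radius` parameter, and the choice to keep the tracked point and drop the newly extracted feature point are all assumptions (the text only says that one of the two is removed).

```python
def remove_overlaps(new_features, tracked_points, radius=0):
    """Merge tracked points with newly extracted feature points of F(M+1).

    A new feature point is dropped when it lies within `radius` pixels
    (in both x and y) of any tracked point. Keeping the tracked point
    rather than the new feature is an assumption made for this sketch.
    """
    kept = []
    for fx, fy in new_features:
        overlaps = any(
            abs(fx - tx) <= radius and abs(fy - ty) <= radius
            for tx, ty in tracked_points
        )
        if not overlaps:
            kept.append((fx, fy))
    return list(tracked_points) + kept


merged = remove_overlaps([(1, 1), (5, 5)], [(1, 1)])
```

With `radius=0` only exact coordinate matches count as overlaps; a larger radius would also merge near-duplicates.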

（２）追跡部の構成 (2) Configuration of the tracking unit
図２を用いて、追跡部１４０、及び追跡部１４０と関連する機能部について詳細に説明する。 The tracking unit 140 and the functional units related to the tracking unit 140 will be described in detail with reference to FIG. 2.

追跡部１４０は、ブロックマッチング部１４１と、極値検出部１４２と、単一性判定部１４３と、曲面形状判定部１４５とを有する。 The tracking unit 140 includes a block matching unit 141, an extreme value detection unit 142, a unity determination unit 143, and a curved surface shape determination unit 145.

特徴点抽出部１３０は、第１の映像のフレームから特徴点を抽出する。 The feature point extraction unit 130 extracts feature points from the frame of the first video.

ブロックマッチング部１４１は、かかる第１の映像のフレームより時間的に後の映像のフレームである第２の映像のフレームの所定の領域に対してブロックマッチング処理を行い、特徴点に対応する追跡点を特定する。なお、ブロックマッチング処理を実行する場合、ブロックマッチング部１４１は、第１の映像のフレームの一部であって特徴点を含むブロックを用いる。 The block matching unit 141 performs block matching processing on a predetermined region of the second video frame, which is a video frame temporally later than the first video frame, and specifies the tracking point corresponding to the feature point. When executing the block matching processing, the block matching unit 141 uses a block that is a part of the first video frame and includes the feature point.

極値検出部１４２は、第２の映像のフレームの所定の領域において、特徴点に最も類似する第１の画素の位置、第１の画素と特徴点との類似度、特徴点に２番目に類似する第２の画素の位置、及び第２の画素と特徴点との類似度を検出する。 The extreme value detection unit 142 detects, in the predetermined region of the second video frame, the position of the first pixel most similar to the feature point, the similarity between the first pixel and the feature point, the position of the second pixel second most similar to the feature point, and the similarity between the second pixel and the feature point.

なお、以下の記述において類似度とは、特徴点に対する類似度を意味する。従って、例えば「画素Ａの類似度」とは、「特徴点と画素Ａとの類似度」を意味する。 In the following description, the similarity means the similarity to the feature point. Therefore, for example, “similarity of pixel A” means “similarity between feature points and pixel A”.

なお、極値検出部１４２は、第１の画素と、第２の画素とを検出するときに、画素自体の特徴点との類似度に加えて、かかる画素と特徴点との類似度と、隣接する画素と特徴点との類似度の差を考慮して第１の画素と第２の画素とを検出してもよい。 When detecting the first pixel and the second pixel, the extreme value detection unit 142 may take into account, in addition to the similarity between the pixel itself and the feature point, the difference between that similarity and the similarity between the adjacent pixels and the feature point.

単一性判定部１４３は、追跡点の位置と第１の画素の位置とが一致し、かつ第１の画素の類似度と第２の画素の類似度との差が第１の閾値以上である場合、第１の条件を満たすと判定する。 The unity determination unit 143 determines that the first condition is satisfied when the position of the tracking point matches the position of the first pixel and the difference between the similarity of the first pixel and the similarity of the second pixel is equal to or greater than the first threshold.

曲面形状判定部１４５は、追跡点と特徴点との類似度と、追跡点の周辺の複数の画素の各々と特徴点との類似度とを算出し、算出された追跡点の類似度と、周辺の複数の画素の各々の類似度との差が、第２の閾値以上である場合、第２の条件を満たすと判定する。 The curved surface shape determination unit 145 calculates the similarity between the tracking point and the feature point and the similarity between each of the plurality of pixels around the tracking point and the feature point, and determines that the second condition is satisfied when the difference between the calculated similarity of the tracking point and the similarity of each of the surrounding pixels is equal to or greater than the second threshold.

追跡部１４０は、第１の条件、及び第２の条件を満たす場合、第１の映像のフレームの特徴点が、第２の映像のフレームにおいて追跡点に対応すると判断する。この場合、追跡部１４０は、第１の映像のフレームの特徴点が、第２の映像のフレームにおいて追跡点に移動したと判断してもよい。 When the first condition and the second condition are satisfied, the tracking unit 140 determines that the feature point of the frame of the first video corresponds to the tracking point in the frame of the second video. In this case, the tracking unit 140 may determine that the feature point of the frame of the first video has moved to the tracking point in the frame of the second video.

ブロックマッチング処理に加えて、第１の条件、及び第２の条件を満たす場合に、特徴点が追跡点に対応すると判断されるため、映像処理装置１は、追跡点を正確に特定することができる。 Because the feature point is determined to correspond to the tracking point only when the first condition and the second condition are satisfied in addition to the block matching processing, the video processing apparatus 1 can specify the tracking point accurately.

なお、ブロックマッチング部１４１は追跡部の一例である。極値検出部１４２は、検出部の一例である。単一性判定部１４３は、第１の判定部の一例である。曲面形状判定部１４５は、第２の判定部の一例である。 The block matching unit 141 is an example of a tracking unit. The extreme value detection unit 142 is an example of a detection unit. The unity determination unit 143 is an example of a first determination unit. The curved surface shape determination unit 145 is an example of a second determination unit.

第１の画素と、第２の画素とを検出する場合、極値検出部１４２は、第２の映像のフレームの所定の領域において、特徴点と類似する複数の類似画素を検出してもよい。かかる複数の類似画素は隣接する画素より特徴点と類似する。これらの類似画素は極値を有する画素と表現されてもよい。例えば、隣接する８画素よりも特徴点と類似する画素が、極値を有する画素であってもよい。極値検出部１４２は、極値を有する画素の中から第１の画素と第２の画素とを検出してもよい。この場合、特徴点に最も類似する極値を有する画素が第１の画素、特徴点に２番目に類似する極値を有する画素が第２の画素となる。 When detecting the first pixel and the second pixel, the extreme value detection unit 142 may detect a plurality of similar pixels that resemble the feature point in the predetermined region of the second video frame. Each of these similar pixels is more similar to the feature point than its adjacent pixels, and may be referred to as a pixel having an extreme value. For example, a pixel that is more similar to the feature point than its eight adjacent pixels may be a pixel having an extreme value. The extreme value detection unit 142 may detect the first pixel and the second pixel from among the pixels having extreme values. In this case, the pixel having the extreme value most similar to the feature point is the first pixel, and the pixel having the extreme value second most similar to the feature point is the second pixel.

映像フレームが白黒の映像の場合、特徴点との類似度は、特徴点における輝度と、類似度の算出対象となる画素の輝度との差で表されてもよい。かかる差が小さい画素ほど、特徴点と類似している。 When the video frame is a black and white video, the similarity to the feature point may be represented by the difference between the luminance at the feature point and the luminance of the pixel whose similarity is to be calculated. Pixels with such smaller differences are more similar to feature points.

映像フレームがカラーの映像の場合、特徴点との類似度は、特徴点における各色の輝度の合計値と、類似度の算出対象となる画素の輝度の合計値との差で表されてもよい。かかる合計値との差が小さい画素ほど、特徴点と類似している。 When the video frame is a color video, the similarity with the feature point may be represented by the difference between the total luminance value of each color at the feature point and the total luminance value of the pixels for which similarity is calculated. . Pixels with a smaller difference from the total value are more similar to feature points.
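
The two similarity measures above (grayscale and color) can be sketched in a few lines. The function name `dissimilarity` and the use of an absolute difference are assumptions for illustration; the text only says that the similarity "may be represented by the difference" and that a smaller difference means a closer match.

```python
def dissimilarity(feature, pixel):
    """Luminance difference between a feature point and a candidate pixel.

    Each argument is either a single luminance value (black-and-white
    video) or a sequence of per-channel luminances (color video), in
    which case the channel values are summed first. Smaller results
    mean the pixel is more similar to the feature point.
    """
    def total(value):
        return sum(value) if isinstance(value, (tuple, list)) else value

    return abs(total(feature) - total(pixel))


gray_cost = dissimilarity(120, 100)                       # 20
color_cost = dissimilarity((10, 20, 30), (10, 25, 40))    # |60 - 75| = 15
```

Note that summing the channels discards hue information: two pixels with different colors but equal channel sums are treated as identical by this measure.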

ブロックマッチング部１４１は、画素よりも細かい単位である副画素の精度で、ブロックマッチング処理を行い、追跡点を特定してもよい。また、極値検出部１４２は、副画素の精度で、第１の画素と、第２の画素とを検出してもよい。 The block matching unit 141 may specify a tracking point by performing a block matching process with the accuracy of a sub-pixel that is a unit smaller than a pixel. Further, the extreme value detection unit 142 may detect the first pixel and the second pixel with the accuracy of the sub-pixel.

追跡点が副画素の精度で特定されるため、特徴点に対応する追跡点を正確に特定することができる。 Since the tracking point is specified with the accuracy of the sub-pixel, the tracking point corresponding to the feature point can be accurately specified.

なお、画素はピクセル、副画素はサブピクセルと表現されてもよい。 Note that a pixel (画素) may also be written as ピクセル, and a sub-pixel (副画素) as サブピクセル; the terms are interchangeable.

極値検出部１４２は、隣接する画素の間に存在する副画素と特徴点との類似度を、隣接する画素の周辺の画素の各々と特徴点との類似度から補間してもよい。 The extreme value detection unit 142 may interpolate the similarity between subpixels and feature points existing between adjacent pixels from the similarity between each of the pixels around the adjacent pixels and the feature points.

例えば、極値検出部１４２は、所定の領域の画素と第１の映像のフレームの所定領域に対応する領域の画素との類似度の変化に基づいて副画素の座標を算出し、かかる座標の副画素と特徴点との類似度を補間してもよい。変化の傾向から副画素の座標を算出する場合、極値検出部１４２は、勾配法を用いて、かかる座標を算出してもよい。なお、「第１の映像のフレームの所定領域に対応する領域の画素」とは、所定の領域の画素と第１の映像のフレームの中で同一の位置にある画素である。 For example, the extreme value detection unit 142 calculates the coordinates of the sub-pixel based on the change in the similarity between the pixel in the predetermined area and the pixel in the area corresponding to the predetermined area of the frame of the first video, The similarity between the sub-pixel and the feature point may be interpolated. When calculating the coordinates of the sub-pixel from the change tendency, the extreme value detection unit 142 may calculate the coordinates using a gradient method. The “pixels in the region corresponding to the predetermined region of the first video frame” are pixels in the same position as the pixels in the predetermined region.
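
As an illustration of interpolating a similarity value at a sub-pixel position from the surrounding pixels, the sketch below uses plain bilinear interpolation. This is a swapped-in, generic technique chosen for brevity; it is not the gradient-method procedure the description refers to, and the names `interpolate_cost` and `cost` are hypothetical.

```python
import math


def interpolate_cost(cost, x, y):
    """Bilinearly interpolate a 2-D cost map at sub-pixel position (x, y).

    `cost[row][col]` holds the dissimilarity of each whole pixel to the
    feature point. Coordinates are clamped at the right/bottom border.
    """
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1 = min(x0 + 1, len(cost[0]) - 1)
    y1 = min(y0 + 1, len(cost) - 1)
    fx, fy = x - x0, y - y0
    top = cost[y0][x0] * (1 - fx) + cost[y0][x1] * fx
    bottom = cost[y1][x0] * (1 - fx) + cost[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


cost_map = [[0, 10],
            [20, 30]]
centre_value = interpolate_cost(cost_map, 0.5, 0.5)  # 15.0
```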

具体的な、補間の手順については後述する。 A specific interpolation procedure will be described later.

曲面形状判定部１４５は、追跡点と特徴点との類似度と、周辺の画素と特徴点との類似度とから２次曲面の類似度の分布を形成し、形成された２次曲面の類似度の分布から第２の条件を満たすかを判定してもよい。 The curved surface shape determination unit 145 may form a quadric-surface distribution of similarity from the similarity between the tracking point and the feature point and the similarity between the surrounding pixels and the feature point, and may determine from the formed distribution whether the second condition is satisfied.

例えば、曲面形状判定部１４５は、２次曲面が平坦に近い形状の場合、つまり特徴点と類似している点が周辺の画素と比較して際立っていない場合、第２の条件を満たさないと判断してもよい。これにより、追跡点の誤検出を避けることができる。 For example, the curved surface shape determination unit 145 does not satisfy the second condition when the quadratic curved surface has a nearly flat shape, that is, when the point similar to the feature point is not distinguished from the surrounding pixels. You may judge. Thereby, erroneous detection of tracking points can be avoided.

２次曲面の分布を表す方程式の２次の係数の少なくとも１つが、第３の閾値以上の場合、曲面形状判定部１４５は、第２の条件を満たすと判定してもよい。かかる２次の係数が大きい程、曲面が際立っており、追跡点が特徴点に対応する可能性が高いためである。 When at least one of the quadratic coefficients of the equation representing the distribution of the quadratic curved surface is equal to or greater than the third threshold value, the curved surface shape determining unit 145 may determine that the second condition is satisfied. This is because as the secondary coefficient increases, the curved surface becomes more conspicuous and the tracking point is more likely to correspond to the feature point.

極値検出部１４２は、極値を有する画素を、特徴点と類似する順に並べ、特徴点とＮ番目に類似する極値を有する画素と、特徴点とＮ＋１番目に類似する極値を有する画素との特徴点との類似度の差を算出する。ここでＮは２以上の自然数であり、Ｎ＋１の最大値は第２の映像のフレームの所定の領域に含まれる極値を有する画素の数である。 The extreme value detection unit 142 arranges the pixels having extreme values in order of similarity to the feature point, and calculates the difference between the similarity of the pixel having the N-th most similar extreme value and the similarity of the pixel having the (N+1)-th most similar extreme value. Here, N is a natural number equal to or greater than 2, and the maximum value of N+1 is the number of pixels having extreme values included in the predetermined region of the second video frame.

単一性判定部１４３は、第１の画素の類似度と第２の画素の類似度との差が、第１の閾値以上でない場合であっても、第１の条件を満たすと判定してもよい場合がある。 The unity determination unit 143 determines that the first condition is satisfied even when the difference between the similarity of the first pixel and the similarity of the second pixel is not equal to or greater than the first threshold. There are cases where it is good.

例えば、第１の画素の類似度と第２の画素の類似度との差が、第２の映像のフレームの所定の領域に含まれる特徴点とＮ番目に類似する極値を有する画素と特徴点とＮ＋１番目に類似する極値を有する画素との間の類似度の差より大きい場合、第１の条件を満たすと判定してもよい。 For example, the unity determination unit 143 may determine that the first condition is satisfied when the difference between the similarity of the first pixel and the similarity of the second pixel is larger than the difference in similarity between the pixel having the N-th most similar extreme value and the pixel having the (N+1)-th most similar extreme value in the predetermined region of the second video frame.

第１の画素が特徴点と最も類似しかつ、第１の画素と第２の画素との類似度の差が、特徴点とＮ番目に類似する極値を有する画素と特徴点とＮ＋１番目に類似する極値を有する画素との類似度の差よりも大きいため、他の画素よりも特徴点と際立って類似していると判断できるためである。 This is because the first pixel is the most similar to the feature point, and the difference in similarity between the first pixel and the second pixel is larger than the difference in similarity between the pixels having the N-th and (N+1)-th most similar extreme values, so the first pixel can be judged to be markedly more similar to the feature point than the other pixels.
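
The relaxed form of the first condition can be sketched as a comparison of gaps in the sorted list of extreme values. Several points are assumptions made for this sketch: the name `relaxed_unity`, the reading that the leading gap must exceed the gap for every N of 2 or more, and the choice to return `False` when fewer than three extrema exist (so that no such gap is available for comparison).

```python
def relaxed_unity(extremum_costs):
    """Check whether the best extremum stands out from the rest.

    `extremum_costs` are dissimilarities of the pixels having extreme
    values (smaller = more similar). The gap between the best and the
    second best must be larger than every gap between the N-th and
    (N+1)-th best for N >= 2.
    """
    costs = sorted(extremum_costs)
    if len(costs) < 3:
        return False  # assumption: nothing to compare the leading gap with
    lead_gap = costs[1] - costs[0]
    other_gaps = [costs[i + 1] - costs[i] for i in range(1, len(costs) - 1)]
    return all(lead_gap > gap for gap in other_gaps)


stands_out = relaxed_unity([10, 18, 20, 23])   # lead gap 8 vs gaps 2 and 3
```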

第１の条件を満たさない場合、映像処理装置１は、第２の映像フレームの所定の領域を拡大した領域において、ブロックマッチング処理と、第１の条件を満たすかの判定を実行してもよい。 When the first condition is not satisfied, the video processing apparatus 1 may execute the block matching processing and the determination of whether the first condition is satisfied again on a region obtained by enlarging the predetermined region of the second video frame.

なお、「第２の映像フレームの所定の領域」は、第１の探索ウィンドウと呼ばれてもよい。 The “predetermined area of the second video frame” may be referred to as a first search window.

＜動作手順＞ <Operation procedure>
図３を用いて、本実施形態の動作手順の一例について説明する。 An example of the operation procedure of the present embodiment will be described with reference to FIG. 3.

特徴点に対応する追跡点を特定する場合、映像処理装置１は、ブロックマッチング処理、第１の判定の処理、第２の判定の処理の順番で処理を実行する。 When specifying a tracking point corresponding to a feature point, the video processing device 1 executes processing in the order of block matching processing, first determination processing, and second determination processing.

ステップＳ３００で、ブロックマッチング部１４１は、ブロックマッチング処理を実行する。ブロックマッチング部１４１は、第１の映像フレームで抽出された特徴点を中心とした任意のサイズの画像を第１の映像フレームから抜き出す。抜き出された画像は、テンプレートパッチと呼ばれてもよい。次に、ブロックマッチング部１４１は、テンプレートパッチを、第２の映像フレームの第１の探索ウィンドウ内で移動させる。 In step S300, the block matching unit 141 executes a block matching process. The block matching unit 141 extracts an image of an arbitrary size centering on the feature point extracted in the first video frame from the first video frame. The extracted image may be referred to as a template patch. Next, the block matching unit 141 moves the template patch within the first search window of the second video frame.

ブロックマッチング部１４１は、テンプレートパッチを第１の探索ウィンドウ内で移動させ、テンプレートパッチと、テンプレートパッチと重複する部分との類似度を計算し、最も類似する位置におけるテンプレートパッチの中心を追跡点とする。具体的な計算方法については後述する。 The block matching unit 141 moves the template patch within the first search window, calculates the similarity between the template patch and the portion of the frame it overlaps, and takes the center of the template patch at the most similar position as the tracking point. A specific calculation method will be described later.
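
The block matching of step S300 can be sketched as an exhaustive search. The sum of absolute differences (SAD) used here is an assumed similarity measure, since the actual formula appears later in the description; the names `sad` and `block_match` and the `window` tuple layout are likewise hypothetical.

```python
def sad(frame, patch, top, left):
    """Sum of absolute luminance differences between the patch and the
    same-sized region of `frame` whose top-left corner is (top, left)."""
    return sum(
        abs(frame[top + r][left + c] - patch[r][c])
        for r in range(len(patch))
        for c in range(len(patch[0]))
    )


def block_match(frame, patch, window):
    """Slide `patch` over `frame` and return (cost, (row, col)) for the
    best match, where (row, col) is the patch center, i.e. the tracking
    point. `window` = (top, left, bottom, right) gives inclusive bounds
    on the patch's top-left corner (the search window)."""
    top0, left0, bottom0, right0 = window
    half_h, half_w = len(patch) // 2, len(patch[0]) // 2
    best = None
    for top in range(top0, bottom0 + 1):
        for left in range(left0, right0 + 1):
            cost = sad(frame, patch, top, left)
            if best is None or cost < best[0]:
                best = (cost, (top + half_h, left + half_w))
    return best


frame = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 2, 3],
    [0, 0, 4, 9, 5],
    [0, 0, 6, 7, 8],
    [0, 0, 0, 0, 0],
]
template_patch = [[1, 2, 3], [4, 9, 5], [6, 7, 8]]
cost, tracking_point = block_match(frame, template_patch, (0, 0, 2, 2))
```

In this toy frame the template reappears with its center at row 2, column 3, so the search returns a cost of 0 at that position.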

ブロックマッチング処理が成功した場合（ステップＳ３０１ＹＥＳ）、ステップＳ３０２に進む。一方、ブロックマッチング処理が成功しなかった場合（ステップＳ３０１ＮＯ）、ステップＳ３０８に進む。 If the block matching process is successful (step S301 YES), the process proceeds to step S302. On the other hand, if the block matching process is not successful (NO in step S301), the process proceeds to step S308.

ステップＳ３０２乃至ステップＳ３０４は、第１の条件の判定の処理の動作手順である。 Steps S302 to S304 are the operation procedure of the process for determining the first condition.

ステップＳ３０２で、極値検出部１４２は、単一性比較値の計算処理を行う。具体的には、極値検出部１４２は、探索ウィンドウ内の画素と特徴点との類似度を算出する。類似度は、例えば特徴点と探索ウィンドウ内の画素との輝度の差でもよい。この場合、類似度が小さい程、画素は特徴点と類似している。 In step S302, the extreme value detection unit 142 performs a unity comparison value calculation process. Specifically, the extreme value detection unit 142 calculates the similarity between the pixel in the search window and the feature point. The similarity may be, for example, a difference in luminance between a feature point and a pixel in the search window. In this case, the smaller the similarity is, the more similar the pixel is to the feature point.

極値検出部１４２は、周辺の画素よりも類似度が小さい画素を抽出し、極値とする。例えば、極値検出部１４２は、任意の画素と、かかる任意の画素の周辺の８画素とを比較して、極値を抽出してもよい。 The extreme value detection unit 142 extracts a pixel having a smaller degree of similarity than the surrounding pixels and sets it as an extreme value. For example, the extreme value detection unit 142 may extract an extreme value by comparing an arbitrary pixel with eight pixels around the arbitrary pixel.

探索ウィンドウ内で、最も小さい極値を有する画素が第１の画素であり、２番目に小さい極値を有する画素が第２の画素である。 In the search window, the pixel having the smallest extreme value is the first pixel, and the pixel having the second smallest extreme value is the second pixel.
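
The extreme value detection of step S302 can be sketched as a scan for local minima of the cost map against the eight neighbours. The name `find_extrema` is hypothetical, and skipping the border pixels of the search window (which lack a full set of eight neighbours) is an assumption of this sketch.

```python
def find_extrema(cost):
    """Return [(cost, (x, y)), ...] for every interior pixel whose cost is
    strictly smaller than that of all eight neighbours, sorted so that
    the most similar extremum comes first."""
    height, width = len(cost), len(cost[0])
    extrema = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            centre = cost[y][x]
            neighbours = [
                cost[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0)
            ]
            if all(centre < n for n in neighbours):
                extrema.append((centre, (x, y)))
    extrema.sort()
    return extrema


window_cost = [
    [9, 9, 9, 9, 9],
    [9, 2, 9, 9, 9],
    [9, 9, 9, 9, 9],
    [9, 9, 9, 5, 9],
    [9, 9, 9, 9, 9],
]
extrema = find_extrema(window_cost)
first_pixel, second_pixel = extrema[0], extrema[1]
```

Here the first pixel is the extremum with cost 2 at (1, 1) and the second pixel is the extremum with cost 5 at (3, 3).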

ステップＳ３０３で、極値検出部１４２は、第１の画素の類似度と第２の画素の類似度とを比較し、類似度の差が、第１の閾値以上か比較する。第１の閾値は、映像処理装置１のユーザが設定可能な値である。例えば、第１の閾値に第１の画素の類似度の１０％、２０％といった値が設定されてもよい。 In step S303, the extreme value detection unit 142 compares the similarity of the first pixel with the similarity of the second pixel, and compares whether the difference in similarity is equal to or greater than the first threshold. The first threshold is a value that can be set by the user of the video processing device 1. For example, a value such as 10% or 20% of the similarity of the first pixel may be set as the first threshold.

ステップＳ３０４で、類似度の差が第１の閾値以上であり、かつ第１の画素の位置がブロックマッチングで抽出した追跡点と一致する場合、単一性判定部１４３は、単一性判定が成功したと判定し、ステップＳ３０５に進む（ステップＳ３０４ＹＥＳ）。「単一性判定が成功したこと」は、「第１の条件を満たす」と表現されてもよい。 In step S304, when the difference in similarity is equal to or greater than the first threshold and the position of the first pixel matches the tracking point extracted by block matching, the unity determination unit 143 performs unity determination. It determines with having succeeded and progresses to step S305 (step S304 YES). “Success in unity determination” may be expressed as “first condition is satisfied”.

一方、類似度の差が第１の閾値以上でない、又は第１の画素の位置が追跡点と一致しない場合、単一性判定部１４３は、単一性判定が失敗したと判定し、ステップＳ３０８に進む（ステップＳ３０８ＮＯ）。 On the other hand, if the difference in similarity is less than the first threshold, or the position of the first pixel does not match the tracking point, the unity determination unit 143 determines that the unity determination has failed, and the process proceeds to step S308 (step S304: NO).
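
The unity determination of steps S303 and S304 reduces to two checks, sketched here. The function name `unity_ok`, the `(cost, position)` tuple layout, and expressing the first threshold as a ratio of the first pixel's similarity (following the 10% example in the text) are assumptions for illustration.

```python
def unity_ok(tracking_pos, first, second, ratio=0.1):
    """First condition: the best extremum coincides with the tracking
    point and is clearly better than the runner-up.

    `first` and `second` are (cost, (x, y)) tuples for the first and
    second pixels, where a smaller cost means more similar. The first
    threshold is taken as `ratio` times the first pixel's cost.
    """
    first_cost, first_pos = first
    second_cost, _ = second
    threshold = ratio * first_cost
    return first_pos == tracking_pos and (second_cost - first_cost) >= threshold


passed = unity_ok((1, 1), (20, (1, 1)), (50, (3, 3)))   # gap 30 >= 2.0
```

One consequence of a ratio-based threshold is that it shrinks to zero when the first pixel matches the feature point exactly, so a fixed minimum threshold may be preferable in practice.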

ステップＳ３０５乃至ステップＳ３０７は、第２の条件の判定の処理の動作手順である。 Steps S305 to S307 are the operation procedure of the process for determining the second condition.

In step S305, the curved surface shape determination unit 145 calculates the similarity between the tracking point and the feature point and the similarity between each pixel around the tracking point and the feature point, and forms a quadric surface distribution from the calculated similarity of the tracking point and the similarities of the surrounding pixels.

In step S306, the curved surface shape determination unit 145 determines whether the shape stands out. For example, if at least one of the second-order coefficients of the equation representing the quadric surface distribution is equal to or greater than a third threshold, the curved surface shape determination unit 145 determines that the shape stands out.

This is because, when a second-order coefficient is large, the curved surface is not flat and its shape stands out. The second-order coefficients represent the difference between the similarity of the surrounding pixels and the similarity of the tracking point. The third threshold is a value that can be set by the user of the video processing device 1, for example 0.1 or 0.2.

If the curved surface shape determination unit 145 determines in step S306 that the shape stands out, that is, that the second condition is satisfied, the process proceeds to step S307. If the curved surface shape determination unit 145 determines that the shape does not stand out, that is, that the second condition is not satisfied, the process proceeds to step S308.
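The second-condition check of steps S305 and S306 can be sketched as follows. This is a minimal illustration, not the claimed implementation: it assumes a 3×3 neighbourhood of similarity values centred on the tracking point, a least-squares fit of z = a·x² + b·y² + c·x·y + d·x + e·y + f, and a comparison of coefficient magnitudes against the third threshold. The function names `quadric_coefficients` and `shape_stands_out` are hypothetical.

```python
import numpy as np

def quadric_coefficients(sim3x3):
    """Least-squares fit of z = a*x^2 + b*y^2 + c*x*y + d*x + e*y + f
    to a 3x3 patch of similarity values centred on the tracking point."""
    sim = np.asarray(sim3x3, dtype=float)
    ys, xs = np.mgrid[-1:2, -1:2].astype(float)
    x, y, z = xs.ravel(), ys.ravel(), sim.ravel()
    design = np.column_stack([x * x, y * y, x * y, x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(design, z, rcond=None)
    return coeffs  # a, b, c, d, e, f

def shape_stands_out(sim3x3, third_threshold=0.1):
    """Second condition: at least one second-order coefficient is large.
    Comparing magnitudes (abs) is an assumption of this sketch."""
    a, b, c, *_ = quadric_coefficients(sim3x3)
    return bool(max(abs(a), abs(b), abs(c)) >= third_threshold)

# A sharp bowl around the tracking point versus a flat neighbourhood.
bowl = [[2, 1, 2], [1, 0, 1], [2, 1, 2]]
flat = [[0.5, 0.5, 0.5]] * 3
print(shape_stands_out(bowl), shape_stands_out(flat))
```

A flat neighbourhood yields second-order coefficients near zero, so the tracking point is rejected even if it passed the first condition.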

If the process proceeds to step S307, the tracking unit 140 determines that tracking has succeeded. If the process proceeds to step S308, the tracking unit 140 determines that tracking has failed.

Note that the video processing device 1 may first execute steps S302 and S303 to extract the first pixel and the second pixel, and then execute the block matching processing of steps S300 and S301.

<Block matching process>
The block matching process will be described with reference to FIGS. 4 to 6.

FIGS. 4 and 5 are diagrams showing an outline of the block matching process. As shown in (1) of FIG. 4, the block matching unit 141 extracts a feature point 400 from the n-th video frame 410. The block matching unit 141 cuts out a region of N (pixels) × N (pixels) centered on the feature point 400 from the n-th video frame 410 and generates a template patch 430.

As shown in (2) of FIG. 4, the block matching unit 141 cuts out a region of H (pixels) × W (pixels) from the video frame 420 from the n-th frame onward and generates a second search window 440.

Here, the size of the template patch 430 and the size of the second search window 440 are values that can be set by the user of the video processing device 1.

As shown in (1) of FIG. 5, the block matching unit 141 moves the center of the template patch 430 within the second search window 440. In other words, the block matching unit 141 cuts out a region of (H+N) (pixels) × (W+N) (pixels) from the video frame 420 and moves the template patch 430 within that region. This region may be referred to as a first search window 450.

The block matching unit 141 moves the template patch 430 within the first search window 450 and calculates the similarity between the template patch 430 and the portion of the frame that it overlaps.

For example, as shown in (1) of FIG. 5, the block matching unit 141 may move the template patch 430 horizontally within the first search window 450. The block matching unit 141 calculates the similarity between the template patch 430 and the overlapping portion while moving the template patch 430 vertically or horizontally one pixel at a time. The block matching unit 141 may also calculate the similarity while moving the template patch 430 vertically or horizontally in sub-pixel units, which are finer than one pixel.

(Formula 1) in FIG. 6 shows an example of calculating the similarity. When the difference E_SAD between the luminance values of the pixels in the template patch 430 and the luminance values of the overlapping portion is the smallest, or is smaller than a predetermined threshold, the block matching unit 141 determines that the center of that overlapping portion is the tracking point corresponding to the feature point.

In the example of (2) of FIG. 5, based on E_SAD calculated when the center of the template patch 430 is moved to Vn(X, Y), the block matching unit 141 determines that Vn(X, Y) is the tracking point 560.

The block matching unit 141 may determine that the feature point 400 of the n-th frame has moved to the tracking point 560 in the video frame 420. The block matching unit 141 may also determine that the feature point has moved along the movement trajectory 570. The block matching unit 141 may repeat the block matching process on the video frames from the n-th frame onward to calculate the movement trajectory of the feature point. The movement trajectory of a feature point may be referred to as optical flow.

The embodiment above describes the case of using E_SAD, the Sum of Absolute Differences (SAD) between the luminance values of the template patch 430 and the luminance values of the overlapping portion. The block matching unit 141 may execute the block matching process using other methods. For example, the block matching unit 141 may use SSD (Sum of Squared Differences), which calculates the sum of squared differences between the luminance values of the template patch 430 and the overlapping portion, or NCC (Normalized Cross Correlation), which calculates the cross-correlation between the luminance values of the template patch 430 and the overlapping portion.
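The block matching described above can be sketched as follows. Because FIG. 6 is not reproduced in this text, the sketch assumes the conventional per-pixel definition of SAD rather than the exact (Formula 1); the function names, the grayscale NumPy frames, and the way the candidate centers are passed in are all assumptions made for illustration.

```python
import numpy as np

def sad(template, patch):
    """Sum of absolute luminance differences; smaller means more similar."""
    return float(np.abs(template.astype(float) - patch.astype(float)).sum())

def ssd(template, patch):
    """Sum of squared luminance differences; smaller means more similar."""
    return float(((template.astype(float) - patch.astype(float)) ** 2).sum())

def block_match(frame, template, centre_rows, centre_cols, cost=sad):
    """Move the centre of an N x N template (N odd) over the given candidate
    centres (the second search window) and return the centre with the lowest
    cost together with that cost."""
    half = template.shape[0] // 2
    best_cost, best_pos = None, None
    for r in centre_rows:
        for c in centre_cols:
            patch = frame[r - half:r + half + 1, c - half:c + half + 1]
            e = cost(template, patch)
            if best_cost is None or e < best_cost:
                best_cost, best_pos = e, (r, c)
    return best_pos, best_cost

# Synthetic example: the later frame is the n-th frame shifted by (2, 3).
rng = np.random.default_rng(0)
frame_n = rng.integers(0, 256, size=(32, 32))
frame_next = np.roll(frame_n, shift=(2, 3), axis=(0, 1))
template = frame_n[8:13, 10:15]          # 5 x 5 patch centred on (10, 12)
pos, cost = block_match(frame_next, template, range(8, 17), range(8, 21))
print(pos, cost)
```

Swapping `cost=ssd` changes only the similarity measure. NCC would need the comparison reversed, since a larger correlation means a closer match.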

When the video frame is in color, the block matching unit 141 may determine the similarity between the template patch and the overlapping portion using the sum of the luminance values of the individual colors.

<Determination of the first condition>
The operation for determining the first condition will be described with reference to FIGS. 7 to 15.

(1) Extreme value detection method (part 1)
A procedure for detecting extreme values will be described with reference to FIGS. 7 and 8.

The extreme value detection unit 142 calculates the similarity between each pixel in the first search window and the feature point. For example, the extreme value detection unit 142 calculates the luminance difference between each pixel in the first search window and the feature point. (1) of FIG. 7 shows an example graph of the distribution of similarity between each pixel and the feature point. The smaller the similarity value of a pixel, the more similar the pixel is to the feature point.

Next, the extreme value detection unit 142 uses the calculated similarities to extract pixels having extreme values. A pixel having an extreme value is a pixel that is more similar to the feature point than its adjacent pixels. For example, in (2) of FIG. 7, if the pixel at position (X_i, Y_i), whose similarity to the feature point is R(X_i, Y_i), is more similar to the feature point than the eight surrounding pixels, the extreme value detection unit 142 determines that the pixel at position (X_i, Y_i) is a pixel having an extreme value and that R(X_i, Y_i) is the extreme value of that pixel. Specifically, the extreme value detection unit 142 determines that the pixel at position (X_i, Y_i) is a pixel having an extreme value when (Formula 2) in FIG. 8 is satisfied.

Among the extracted pixels having extreme values, the extreme value detection unit 142 takes the pixel whose extreme value is most similar to the feature point as the first pixel and the pixel whose extreme value is second most similar to the feature point as the second pixel.

(3) of FIG. 7 shows an example of the distribution of similarity between the extreme-value pixels and the feature point. A smaller similarity value indicates greater similarity to the feature point. The extreme value detection unit 142 detects the pixel 720, which has the extreme value with the smallest similarity value, and the pixel 730, which has the extreme value with the second smallest similarity value. The extreme value detection unit 142 sets the pixel 720 as the first pixel and the pixel 730 as the second pixel.

The unity determination unit 143 determines that the first condition is satisfied when the position of the first pixel is the same as the position of the tracking point specified by the block matching process, and the difference between the similarity of the first pixel to the feature point and the similarity of the second pixel to the feature point is equal to or greater than the first threshold.

If the similarity between the first pixel and the feature point and the similarity between the second pixel and the feature point are not sufficiently far apart, the unity determination unit 143 is likely to extract the wrong tracking point. For example, when similar features appear repeatedly in the image within the search window, as in a striped pattern, the unity determination unit 143 is likely to extract the wrong tracking point for the feature point.

The unity determination unit 143 treats the first pixel as a tracking point candidate only when the difference between the similarity of the first pixel to the feature point and the similarity of the second pixel to the feature point is equal to or greater than the first threshold. The unity determination unit 143 can therefore extract appropriate tracking point candidates.
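The extreme value detection and first-condition check described above can be sketched as follows. FIG. 8 is not reproduced here, so the sketch assumes that (Formula 2) amounts to a strict comparison against the eight neighbours. The similarity map is assumed to be a 2-D array in which smaller values mean greater similarity, and the function names are hypothetical.

```python
import numpy as np

def local_minima(sim):
    """Return [(similarity, (row, col)), ...] for interior pixels that are
    strictly smaller than all eight neighbours, most similar first."""
    sim = np.asarray(sim, dtype=float)
    found = []
    for r in range(1, sim.shape[0] - 1):
        for c in range(1, sim.shape[1] - 1):
            block = sim[r - 1:r + 2, c - 1:c + 2]
            centre = sim[r, c]
            # Only the centre itself may be <= centre inside the 3x3 block.
            if np.count_nonzero(block <= centre) == 1:
                found.append((float(centre), (r, c)))
    return sorted(found)

def first_condition(sim, tracking_point, first_threshold):
    """First pixel must coincide with the block-matching tracking point and
    be separated from the second pixel by at least the first threshold."""
    minima = local_minima(sim)
    if not minima:
        return False
    first_sim, first_pos = minima[0]
    if first_pos != tuple(tracking_point):
        return False
    if len(minima) == 1:          # a single extreme value is accepted as is
        return True
    second_sim = minima[1][0]
    return (second_sim - first_sim) >= first_threshold

sim = np.full((5, 7), 10.0)
sim[2, 2] = 1.0   # first pixel
sim[2, 5] = 6.0   # second pixel
print(first_condition(sim, (2, 2), first_threshold=3.0))
```

With a striped pattern, the first and second minima would have nearly equal similarity values, and the check would reject the candidate.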

(2) Method for detecting extreme values at sub-pixels
A method for detecting extreme values at sub-pixels will be described with reference to FIG. 9. In the extreme value detection method described above, extreme values are calculated with pixel accuracy, but the extreme value detection unit 142 may detect extreme values in sub-pixel units, which are finer than pixels.

In the detection method described above, the unity determination unit 143 detects an extreme value by comparing the similarity between the center pixel (X_i, Y_i) and the feature point with the similarities between the eight surrounding pixels and the feature point. With this detection method, if any of the eight surrounding pixels has substantially the same similarity to the feature point as the center pixel, (Formula 2) in FIG. 8 is not satisfied.

When (Formula 2) in FIG. 8 is not satisfied, either the image has no distinctive feature and the pixel cannot be a tracking point, or an extreme value exists between pixels. FIG. 9 shows an example of the latter case. In FIG. 9, the similarities R(X_i, Y_i) and R(X_{i+1}, Y_i) to the feature point at the adjacent pixels (X_i, Y_i) and (X_{i+1}, Y_i) are substantially the same, and an extreme value exists at the sub-pixel (a, b) 900.

To detect extreme values more accurately than pixel accuracy allows, the extreme value detection unit 142 may detect extreme values with sub-pixel accuracy.

When detecting extreme values with sub-pixel accuracy, the extreme value detection unit 142 interpolates the similarities between the feature point and the center pixel and surrounding pixels.

An example of detecting an extreme value at the sub-pixel (a, b) 900 will be described with reference to (3) of FIG. 9. Here, the luminance similarity L between the sub-pixel (a, b) 900 and the feature point is interpolated from the luminance difference L between each pixel and the feature point (hereinafter, "luminance similarity L"). A similarity other than luminance may of course be used.

In (3) of FIG. 9, the distance between pixels is β (for example, from (X_i, Y_i) to (X_i, Y_{i-1})), and the distance of the interpolation-target sub-pixel (a, b) 900 from the center pixel (X_i, Y_i) is α.

The luminance similarity L(a, b) at the sub-pixel (a, b) 900 is calculated from (Formula 3) in FIG. 10. The luminance similarity at the sub-pixel (a, b) 900 is obtained by weighting the luminance similarities L of the pixels around the sub-pixel (a, b) 900 according to the distance between the sub-pixel (a, b) 900 and each pixel.
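Distance-weighted interpolation of this kind can be illustrated with ordinary bilinear interpolation. This is a stand-in only: FIG. 10 is not reproduced in this text, so the exact weights of (Formula 3) may differ. The sketch assumes a pixel pitch of β = 1, and the function name `interpolate_similarity` is hypothetical.

```python
import math

def interpolate_similarity(sim, row, col):
    """Bilinear interpolation of a similarity map at a sub-pixel position.
    Each of the four surrounding pixels is weighted by its closeness to
    (row, col); the pixel pitch is assumed to be 1."""
    r0, c0 = int(math.floor(row)), int(math.floor(col))
    r1 = min(r0 + 1, len(sim) - 1)
    c1 = min(c0 + 1, len(sim[0]) - 1)
    fr, fc = row - r0, col - c0
    top = (1 - fc) * sim[r0][c0] + fc * sim[r0][c1]
    bottom = (1 - fc) * sim[r1][c0] + fc * sim[r1][c1]
    return (1 - fr) * top + fr * bottom

sim = [[0.0, 2.0],
       [4.0, 6.0]]
print(interpolate_similarity(sim, 0.5, 0.5))   # midpoint of the four pixels
```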

(3) Extreme value calculation method using the gradient method
When detecting extreme values with sub-pixel accuracy, the extreme value detection unit 142 may specify the position of the sub-pixel having the extreme value using a gradient method.

Here, a method for specifying the sub-pixel position using the KLT (Kanade-Lucas-Tomasi) method will be described with reference to FIGS. 11 and 12.

FIG. 11 shows an image I of the N-th video frame 1100, an image J of the (N+1)-th video frame 1110, a first search window 1120 on image J, a first search window 1130 on image I, a luminance value I(x, y) 1140 on image I, and a luminance value J(x, y) 1150 on image J.

A method for specifying the coordinates of a sub-pixel on image J from the luminance values of the pixels of image I and image J will now be described.

Using (Formula 4) in FIG. 12, the extreme value detection unit 142 calculates the difference in luminance value (δI_k) between corresponding pixels of image I and image J.

Using (Formula 5) in FIG. 12, the extreme value detection unit 142 obtains b_k, which is "the luminance difference (δI_k) multiplied by the derivative values of the pixels of images I and J", for each pixel in the first search window 1120 of image J. Here, the value obtained by differentiating each pixel of image I at position (x, y) is denoted I_x(x, y), and the value obtained by differentiating each pixel of image J at position (x, y) is denoted I_y(x, y).

Using (Formula 6) in FIG. 12, the extreme value detection unit 142 also calculates a 2×2 matrix G, obtained by multiplying the x-direction derivative (I_x) and the y-direction derivative (I_y) at each pixel in the first search window 1120.

Using (Formula 7) in FIG. 12, the extreme value detection unit 142 calculates the movement amount η^k in the (x, y) direction from b_k and the inverse matrix of G.

Using (Formula 8) in FIG. 12, the extreme value detection unit 142 calculates the coordinates of the sub-pixel. (Formula 8) represents the distance from the position of the pixel having the extreme value, and the initial value of ν^k is 0. Here, k denotes the iteration count.
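FIG. 12 is not reproduced in this text. Assuming that (Formula 4) to (Formula 8) follow the standard Lucas-Kanade iteration, which matches the description above, they could be written as follows, where W denotes the first search window 1120 and I_x, I_y are taken as the x-direction and y-direction derivatives:

```latex
\begin{align}
\delta I_k(x, y) &= I(x, y) - J\bigl(x + \nu_x^{k-1},\; y + \nu_y^{k-1}\bigr) \\
b_k &= \sum_{(x, y) \in W}
      \begin{pmatrix} \delta I_k(x, y)\, I_x(x, y) \\ \delta I_k(x, y)\, I_y(x, y) \end{pmatrix} \\
G &= \sum_{(x, y) \in W}
      \begin{pmatrix} I_x^2(x, y) & I_x(x, y)\, I_y(x, y) \\ I_x(x, y)\, I_y(x, y) & I_y^2(x, y) \end{pmatrix} \\
\eta^k &= G^{-1}\, b_k \\
\nu^k &= \nu^{k-1} + \eta^k, \qquad \nu^0 = 0
\end{align}
```

In this form, ν^k accumulates the displacement from the pixel-accurate extreme value, and the iteration stops once the update η^k becomes small.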

The extreme value detection unit 142 may repeat the calculation of ν^k a predetermined number of times. For example, the extreme value detection unit 142 may repeat the calculation of ν^k about 20 times. Alternatively, the extreme value detection unit 142 may repeat the calculation of ν^k until the movement amount η^k becomes equal to or less than a predetermined threshold. For example, the extreme value detection unit 142 may repeat the calculation of ν^k until η^k becomes 0.03 sub-pixel or less.

The extreme value detection unit 142 calculates the similarity between the sub-pixel at the calculated position and the feature point. For example, the extreme value detection unit 142 may use (Formula 3) in FIG. 10 to calculate the luminance difference between the sub-pixel at that position and the feature point.

The extreme value detection unit 142 compares the similarity between the sub-pixel at the calculated position and the feature point with the similarity, calculated in pixel units, between the neighboring extreme value candidate pixel and the feature point, and takes whichever is more similar to the feature point as the pixel having the extreme value.
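The iterative sub-pixel search can be sketched as a standard Lucas-Kanade loop. This is an illustration under stated assumptions rather than the patented procedure: images are float NumPy arrays, derivatives are taken with `np.gradient`, image J is sampled by bilinear interpolation, and the stopping values (20 iterations, 0.03) follow the examples in the text. The names `bilinear` and `klt_displacement` are hypothetical.

```python
import numpy as np

def bilinear(img, ys, xs):
    """Sample img at float coordinates (arrays ys, xs) by bilinear interpolation."""
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    fy, fx = ys - y0, xs - x0
    return ((1 - fy) * (1 - fx) * img[y0, x0] + (1 - fy) * fx * img[y0, x0 + 1]
            + fy * (1 - fx) * img[y0 + 1, x0] + fy * fx * img[y0 + 1, x0 + 1])

def klt_displacement(img_i, img_j, centre, half=5, max_iter=20, eps=0.03):
    """Estimate the (x, y) displacement nu of the window around `centre`
    from image I to image J; the window must stay inside both images."""
    cy, cx = centre
    ys, xs = np.mgrid[cy - half:cy + half + 1, cx - half:cx + half + 1].astype(float)
    grad_y, grad_x = np.gradient(img_i)
    win = (slice(cy - half, cy + half + 1), slice(cx - half, cx + half + 1))
    ix, iy, patch_i = grad_x[win], grad_y[win], img_i[win]
    g = np.array([[np.sum(ix * ix), np.sum(ix * iy)],
                  [np.sum(ix * iy), np.sum(iy * iy)]])      # matrix G
    nu = np.zeros(2)                                         # initial value 0
    for _ in range(max_iter):
        delta = patch_i - bilinear(img_j, ys + nu[1], xs + nu[0])  # delta I_k
        b = np.array([np.sum(delta * ix), np.sum(delta * iy)])     # b_k
        eta = np.linalg.solve(g, b)                                # eta^k
        nu += eta                                                  # nu^k
        if np.hypot(eta[0], eta[1]) <= eps:
            break
    return nu

# Synthetic check: a smooth blob moved by +0.4 px in x and -0.3 px in y.
yy, xx = np.mgrid[0:33, 0:33].astype(float)
def blob(cy, cx):
    return np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * 3.0 ** 2))
nu = klt_displacement(blob(16.0, 16.0), blob(15.7, 16.4), (16, 16))
print(nu)
```

The estimate is added to the pixel-accurate position to obtain the sub-pixel coordinates. On textureless windows the matrix G becomes singular and the solve step fails, which corresponds to the case where no tracking point can be found.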

Although a method of calculating the sub-pixel coordinates by the KLT method has been described, the sub-pixel coordinates may be calculated using other methods. In addition, the range over which the extreme value detection unit 142 calculates the sub-pixel coordinates need not be limited to the first search window.

Although a method of calculating the sub-pixel coordinates in the (N+1)-th video frame has been described, the calculation method described above can also be applied to video frames after the (N+1)-th.

(4) Extreme value detection method (part 2)
In the extreme value detection method described above, the unity determination unit 143 determines that the first condition is not satisfied when the difference between the similarity of the first pixel to the feature point and the similarity of the second pixel to the feature point is not equal to or greater than the first threshold. However, even when this difference is below the first threshold, the first pixel may still be the tracking point, depending on the similarities between the other extreme-value pixels and the feature point. In such a case, it is preferable for the unity determination unit 143 to treat the first pixel as a tracking point candidate. The operation procedure for extreme value detection in this case will be described with reference to FIGS. 13 and 14.

As described above, the unity determination unit 143 determines that the first condition is satisfied when the difference between the similarity of the first pixel to the feature point and the similarity of the second pixel to the feature point is equal to or greater than the first threshold.

(1) of FIG. 13 shows a distribution 1300 of the similarity between each pixel in the first search window and the feature point, and (2) of FIG. 13 is a two-dimensional plot of the pixels in the distribution of (1) of FIG. 13 against their similarity to the feature point. In these figures, a smaller similarity value indicates greater similarity to the feature point.

In the case of (2) of FIG. 13, the difference 1330 between the similarity of the first pixel 1310 to the feature point and the similarity of the second pixel 1320 to the feature point is equal to or greater than the first threshold, so the unity determination unit 143 sets the first pixel 1310 as a tracking point candidate.

Next, the case where the difference between the similarity of the first pixel to the feature point and the similarity of the second pixel to the feature point is not equal to or greater than the first threshold will be described.

(3) of FIG. 13 shows a distribution 1340 of the similarity between each pixel in the first search window and the feature point, and (4) of FIG. 13 is a two-dimensional plot of the pixels in the distribution of (3) of FIG. 13 against their similarity to the feature point.

(4) of FIG. 13 shows the first pixel 1350, the second pixel 1360, and the difference 1355 between the similarities of the first pixel 1350 and the second pixel 1360 to the feature point.

When the similarity difference 1355 is smaller than the first threshold, the unity determination unit 143 calculates the similarity difference 1365 between the second pixel 1360 and the pixel 1370 having the extreme value third most similar to the feature point, the similarity difference 1375 between the pixel 1370 and the pixel 1380 having the extreme value fourth most similar to the feature point, and so on.

Next, the unity determination unit 143 compares the calculated similarity differences (1365, 1375, and so on) with the similarity difference 1355. That is, the unity determination unit 143 calculates the difference between the similarity to the feature point of the pixel having the N-th most similar extreme value and that of the pixel having the (N+1)-th most similar extreme value (hereinafter, "the similarity difference between N and N+1").

The unity determination unit 143 then compares the calculated similarity difference between N and N+1 with the similarity difference 1355 (N is a natural number of 2 or more).

If this comparison shows that the similarity difference 1355 is greater than the similarity difference between N and N+1, the unity determination unit 143 sets the first pixel 1350 as a tracking point candidate even though the similarity difference 1355 is smaller than the first threshold.

The processing flow of the unity determination unit 143 will be described with reference to FIG. 14.

In step S1401, the unity determination unit 143 determines whether the difference between the similarity of the first pixel to the feature point and the similarity of the second pixel to the feature point is equal to or greater than the first threshold. If the difference is equal to or greater than the first threshold, the process proceeds to step S1404 (step S1401: YES). If the difference is not equal to or greater than the first threshold, the process proceeds to step S1402 (step S1401: NO).

In step S1402, the unity determination unit 143 compares the difference between the similarity of the first pixel to the feature point and the similarity of the second pixel to the feature point with the similarity difference between N and N+1. If the similarity difference between the first pixel and the second pixel is greater than the similarity difference between N and N+1, the process proceeds to step S1404 (step S1402: YES). If it is not greater, the process proceeds to step S1403 (step S1402: NO).

In step S1403, the unity determination unit 143 determines that the unity determination has failed and ends the process.

In step S1404, if the position of the first pixel matches the position of the tracking point specified by the block matching process, the unity determination unit 143 determines that the unity determination has succeeded and ends the process.

Note that when only one pixel having an extreme value is detected in the search window, the unity determination unit 143 determines that the unity determination has succeeded if the position of that pixel matches the position of the tracking point specified by the block matching process.
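The flow of steps S1401 to S1404 can be sketched as follows. The input is assumed to be a list of (similarity, position) pairs for the extreme-value pixels, sorted from most to least similar. Two points are assumptions of this sketch, since the text does not settle them: the gap between the first and second pixels is required to exceed every later gap between N and N+1, and the case of exactly two extreme values with a gap below the threshold is treated as a failure. The function name is hypothetical.

```python
def unity_determination(extrema, tracking_point, first_threshold):
    """extrema: [(similarity, (row, col)), ...] sorted ascending, where a
    smaller similarity value means closer to the feature point."""
    if not extrema:
        return False
    first_sim, first_pos = extrema[0]
    if len(extrema) >= 2:
        gap_1_2 = extrema[1][0] - first_sim
        if gap_1_2 < first_threshold:                       # S1401: NO
            later_gaps = [extrema[n + 1][0] - extrema[n][0]
                          for n in range(1, len(extrema) - 1)]
            if not later_gaps or gap_1_2 <= max(later_gaps):
                return False                                # S1402: NO -> S1403
    # S1404: the first pixel must coincide with the block-matching result.
    return first_pos == tuple(tracking_point)

extrema = [(1.0, (2, 2)), (2.0, (2, 5)), (2.2, (4, 4)), (2.5, (6, 1))]
# The 1st-2nd gap (1.0) is below the threshold but exceeds the later gaps.
print(unity_determination(extrema, (2, 2), first_threshold=3.0))
```

The fallback in step S1402 accepts a first pixel that is clearly separated from a cluster of weaker candidates, even when the absolute gap is small.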

(5) Enlarging the search window of the second video frame
When the unity determination unit 143 determines that the unity determination has failed, the video processing device 1 may enlarge the search window of the second video frame and perform the block matching process and the first condition determination again.

図１５を用いて、第２の映像フレームの探索ウィンドウを拡大した場合の映像処理装置１の処理内容について説明する。 Processing contents of the video processing device 1 when the search window of the second video frame is enlarged will be described with reference to FIG.

図１５の（１）は、探索ウィンドウを拡大する前の第２の映像フレーム１５５０における第１の探索ウィンドウ１５３０、第２の探索ウィンドウ１５００、及びテンプレートパッチ１５１０の関係を示す図である。 (1) of FIG. 15 is a diagram illustrating a relationship among the first search window 1530, the second search window 1500, and the template patch 1510 in the second video frame 1550 before the search window is enlarged.

第２の映像フレーム１５５０は、柵の奥に家が存在する画像を示している。第１及び第２の探索ウィンドウ（１５３０、１５００）が設定されている領域は、柵に対応する部分であり、似たようなパターンが繰り返されている。単一性判定部１４３は、追跡点の候補となる第１の画素を抽出するのは難しい。また、似たようなパターンが繰り返されているため、ブロックマッチング部１４１は、追跡点を誤って特定するおそれがある。 The second video frame 1550 shows an image in which a house is present behind the fence. The area where the first and second search windows (1530, 1500) are set is a part corresponding to the fence, and a similar pattern is repeated. It is difficult for the unity determination unit 143 to extract the first pixel that is a tracking point candidate. In addition, since similar patterns are repeated, the block matching unit 141 may erroneously specify the tracking point.

In such a case, the video processing device 1 may enlarge the search window and execute the block matching process and the unity determination.

(2) of FIG. 15 shows an example in which the search window is enlarged. In (2) of FIG. 15, the size of the second search window 1500 remains unchanged, but the video processing device 1 enlarges the first search window (1540). The video processing device 1 also enlarges the template patch (1520) that moves within the first search window.

Enlarging the template patch 1520 widens the portion of the second video frame 1550 that the template patch 1520 overlaps. The overlapping portion then includes areas other than the fence, so the value that evaluates the similarity between the overlapping portion and the template patch 1520 changes as the template patch 1520 moves within the first search window, which makes the tracking point easier to extract.

In addition, because the enlarged first search window 1540 contains an image with more variation than before the enlargement, the unity determination unit 143 can more easily determine the most similar first pixel to be a tracking point candidate.

Note that when enlarging the search window of the second video frame, the video processing device 1 need not enlarge the template patch 1510. The video processing device 1 may also enlarge the second search window together with the first search window.
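
The retry behaviour described in this subsection can be sketched as a simple loop. This is only an illustrative sketch: the callback `match_and_check` and the parameters `initial_size`, `max_size`, and `step` are hypothetical, and the embodiment does not state how far or in what increments the first search window is enlarged.

```python
from typing import Callable, Optional, Tuple

Position = Tuple[int, int]


def track_with_window_growth(
    match_and_check: Callable[[int], Tuple[Optional[Position], bool]],
    initial_size: int,
    max_size: int,
    step: int = 8,
) -> Optional[Position]:
    """Retry matching with a growing first search window.

    `match_and_check(size)` is a hypothetical callback that runs block matching
    and the unity determination for a window of the given side length and
    returns (tracking_point, unity_ok).
    """
    size = initial_size
    while size <= max_size:
        point, unity_ok = match_and_check(size)
        if unity_ok:
            return point  # unity determination succeeded at this window size
        size += step  # enlarge the first search window and try again
    return None  # unity never succeeded within the assumed size limit
```

Whether the template patch or the second search window grows along with the first search window is left to the callback, since the text allows either choice.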

<Determination of the second condition>
(1) Method for calculating the curved surface shape
A method for calculating the curved surface shape will be described with reference to FIGS. 16 and 17. The curved surface shape determination unit 145 determines whether the tracking point candidate pixel extracted as a result of the block matching process and the unity determination (determination of the first condition) is a tracking point (determination of the second condition).

When determining whether a tracking point candidate pixel is a tracking point, the curved surface shape determination unit 145 determines whether the candidate pixel stands out from its surrounding pixels. In this embodiment, the curved surface shape determination unit 145 compares the similarity between the candidate pixel and the feature point with the similarities between the eight pixels surrounding the candidate pixel and the feature point, and thereby determines whether the candidate is a tracking point.

Specifically, the curved surface shape determination unit 145 calculates the shape of a quadric surface from the positions of the candidate pixel (x, y) and its eight surrounding pixels and from the similarities between these pixels and the feature point.

(1) of FIG. 16 shows the positional relationship between the candidate pixel (x, y) and the eight surrounding pixels. R represents the similarity between each pixel and the feature point. For example, R may represent the similarity in luminance to the feature point.

(2) of FIG. 16 illustrates a quadric surface formed from the positions and similarity values of the candidate pixel (x, y) and its eight surrounding pixels. It shows the quadric surface cut out around the similarity R(x, y) of the candidate pixel (x, y).

The quadric surface is obtained from (Equation 9) in FIG. 17. Here, the second-order coefficient a of x and the second-order coefficient c of y are obtained from (Equation 10) and (Equation 11), respectively.

Here, (Equation 10) represents the difference between the similarities of the six pixels in the two outer columns of (1) of FIG. 16, namely (x-1, y-1), (x-1, y), (x-1, y+1), (x+1, y-1), (x+1, y), and (x+1, y+1), and twice the similarities of the three pixels in the middle column containing the candidate pixel (x, y), namely (x, y-1), (x, y), and (x, y+1). Likewise, (Equation 11) represents the difference between the similarities of the six pixels in the top and bottom rows of (1) of FIG. 16, namely (x-1, y-1), (x, y-1), (x+1, y-1), (x-1, y+1), (x, y+1), and (x+1, y+1), and twice the similarities of the three pixels in the middle row containing the candidate pixel (x, y), namely (x-1, y), (x, y), and (x+1, y). When both the second-order coefficient a of x and the second-order coefficient c of y are smaller than a third threshold, the quadric surface formed from the similarities of the candidate pixel (x, y) and its surrounding pixels is nearly flat, and the candidate pixel (x, y) does not stand out. It is therefore likely that the tracking point corresponding to the feature point has not been specified correctly. In this case, the curved surface shape determination unit 145 determines that the second condition is not satisfied, and the video processing device 1 determines that extraction of the tracking point has failed.

On the other hand, when at least one of the second-order coefficients a and c is larger than the third threshold, the similarity of the candidate pixel (x, y) to the feature point stands out from that of the surrounding pixels. The curved surface shape determination unit 145 therefore determines that the second condition is satisfied, and the video processing device 1 determines that extraction of the tracking point has succeeded.
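
The second-condition check can be sketched as follows. (Equation 10) and (Equation 11) appear only in FIG. 17, so this sketch follows the verbal description of them given above and omits any normalization factor the figure may contain. The function names, the 3x3 grid layout `r[row][col]`, and the treatment of a coefficient exactly equal to the threshold as a failure are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def quadratic_coefficients(r: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Compute the second-order terms a (x direction) and c (y direction).

    `r` is a 3x3 grid of similarities indexed as r[row][col], with the
    candidate pixel at r[1][1]; rows follow y and columns follow x (assumption).
    """
    left = r[0][0] + r[1][0] + r[2][0]      # column x-1
    mid_col = r[0][1] + r[1][1] + r[2][1]   # column x
    right = r[0][2] + r[1][2] + r[2][2]     # column x+1
    top = r[0][0] + r[0][1] + r[0][2]       # row y-1
    mid_row = r[1][0] + r[1][1] + r[1][2]   # row y
    bottom = r[2][0] + r[2][1] + r[2][2]    # row y+1
    a = (left + right) - 2.0 * mid_col      # outer columns minus twice the middle
    c = (top + bottom) - 2.0 * mid_row      # outer rows minus twice the middle
    return a, c


def second_condition(r: Sequence[Sequence[float]],
                     third_threshold: float) -> bool:
    """Return True when at least one of a and c exceeds the third threshold."""
    a, c = quadratic_coefficients(r)
    return a > third_threshold or c > third_threshold
```

For a completely flat grid both coefficients are zero and the check fails. For a grid whose centre value is 0 and whose eight neighbours are 10, the sketch gives a = c = 20, so the check succeeds for any third threshold below 20.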

When the block matching process and the first determination process are executed with sub-pixel accuracy, the curved surface shape determination unit 145 may select the pixel nearest to the sub-pixel that is the tracking point candidate and execute the second determination process using that pixel and its surrounding pixels. Alternatively, the curved surface shape determination unit 145 may form a quadric surface from the similarity between the candidate sub-pixel and the feature point and the similarities between the surrounding pixels and the feature point, and execute the second determination process.

In the embodiment described above, the shape of the quadric surface is calculated from the candidate pixel (x, y) and its eight adjacent pixels, but note that the surrounding pixels may include pixels that are not adjacent to the candidate pixel (x, y). For example, the surrounding pixels may extend to pixels two or three pixels away from the candidate pixel (x, y).

<Hardware configuration>
The video processing device 1 is realized by, for example, the hardware configuration shown in FIG. 18.

The video processing device 1 includes an input/output unit 201, an external I/F 203, a RAM 204, a ROM 205, a CPU 206, a communication I/F 207, an HDD 209, a camera module 210, and the like, which are connected to one another via a bus B.

The input/output unit 201 displays frames of video captured by the camera module 210. The input/output unit 201 also displays the state of the video processing device 1 and the like, and may accept various settings for the video processing device 1 from its user.

The communication I/F 207 communicates with a server, a PC (Personal Computer), or the like via a wired or wireless network. The video processing device 1 may receive instructions addressed to it from a terminal such as a PC via the communication I/F 207.

The HDD 209 is an example of a non-volatile storage device that stores programs and data. The stored programs and data include an OS, which is basic software that controls the entire video processing device 1, and application software that provides various functions on the OS. Note that instead of the HDD 209, the video processing device 1 may use a drive device that uses flash memory as its storage medium (for example, a solid state drive: SSD).

The external I/F 203 is an interface with external devices. The external devices include a recording medium 203a. The video processing device 1 can thus read from and/or write to the recording medium 203a via the external I/F 203. Examples of the recording medium 203a include a flexible disk, a CD, a DVD, an SD memory card, and a USB memory.

The ROM 205 is an example of a non-volatile semiconductor memory (storage device) that can retain programs and data even when the power is turned off. The ROM 205 stores programs and data such as the BIOS executed when the video processing device 1 starts up, OS settings, and network settings. The RAM 204 is an example of a volatile semiconductor memory (storage device) that temporarily holds programs and data.

The camera module 210 has a camera, captures video based on instructions from the CPU 206, and transmits the captured video data to a storage device such as the RAM 204. The storage device such as the RAM 204 stores the received video data, which is read out in accordance with instructions from the CPU 206.

The CPU 206 is a computing device that reads programs and data from storage devices such as the ROM 205 and the HDD 209 onto the RAM 204 and executes processing, thereby realizing control of the entire video processing device 1 and the functions of the video processing device 1 shown in FIGS. 1 and 2. With the hardware configuration shown in FIG. 18, the video processing device 1 can implement the various processes described above.

[Others]
In the embodiment described above, the video processing device 1 executes the first determination process and the second determination process using the pixels in the first search window, but it may instead execute these processes using the pixels in the second search window. This allows the video processing device 1 to narrow the pixels subject to the first and second determination processes down to those around the tracking point candidate.

A storage medium on which the program code of software that implements the functions of the embodiments described above is recorded may be supplied to the video processing device 1. It goes without saying that the embodiments described above are also achieved when the video processing device 1 reads and executes the program code stored in the storage medium. In this case, the program code read from the storage medium itself realizes the functions of the embodiments described above, and the storage medium storing the program code constitutes one of the embodiments. Here, the storage medium is a recording medium or a non-transitory storage medium.

Moreover, the functions of the embodiments described above are not realized only by a computer device executing the program code it has read. An operating system (OS) or the like running on the computer device may perform part or all of the actual processing in accordance with the instructions of the program code, and it goes without saying that the functions of the embodiments described above may be realized by that processing.

Although preferred embodiments of the present invention have been described above, the present invention is not limited to these embodiments, and various modifications and substitutions can be made without departing from the gist of the present invention.

DESCRIPTION OF SYMBOLS
1 Video processing device
110 Video input unit
120 Video correction unit
130 Feature point extraction unit
140 Tracking unit
141 Block matching unit
142 Extreme value detection unit
143 Unity determination unit
145 Curved surface shape determination unit
150 Frame buffer
160 Duplicate point removal unit

JP 2003-078807 A

Claims

A video processing device comprising:
an extraction unit that extracts a feature point from a frame of a first video;
a tracking unit that specifies a tracking point corresponding to the feature point by performing a block matching process between a predetermined area of a frame of a second video, which is a video frame temporally later than the frame of the first video, and a block that is a part of the frame of the first video and includes the feature point;
a detection unit that detects, in the predetermined area, a position of a first pixel that is most similar to the feature point, a similarity between the first pixel and the feature point, a position of a second pixel that is second most similar to the feature point, and a similarity between the second pixel and the feature point;
a first determination unit that determines that a first condition is satisfied when the position of the tracking point matches the position of the first pixel and the difference between the similarity of the first pixel and the similarity of the second pixel is equal to or greater than a first threshold; and
a second determination unit that calculates a similarity between the tracking point and the feature point and a similarity between each of a plurality of pixels surrounding the tracking point and the feature point, and determines that a second condition is satisfied when the difference between the calculated similarity of the tracking point and the similarity of each of the plurality of surrounding pixels is equal to or greater than a second threshold,
wherein the tracking unit determines that the feature point corresponds to the tracking point when the first condition and the second condition are satisfied.

The video processing device according to claim 1, wherein the detection unit detects, in the predetermined area, a plurality of similar pixels that are similar to the feature point, each of the plurality of similar pixels being more similar to the feature point than the pixels adjacent to it, and wherein, among the detected plurality of similar pixels, the pixel most similar to the feature point is the first pixel and the pixel second most similar to the feature point is the second pixel.

The video processing device according to claim 1, wherein the similarity to the feature point is represented by a difference between the luminance at the feature point and the luminance of the pixel for which the similarity is calculated.

The video processing device, wherein the similarity to the feature point is represented by a difference between a total value of the luminance of each color at the feature point and a total value of the luminance of each color of the pixel for which the similarity is calculated.

The video processing device according to claim 1, wherein the tracking unit performs the block matching process with the accuracy of a sub-pixel, which is a unit smaller than a pixel, to specify the tracking point, and the detection unit detects the first pixel and the second pixel with the accuracy of the sub-pixel.

The video processing device, wherein the detection unit interpolates the similarity between the feature point and a sub-pixel located between adjacent pixels from the similarities between the feature point and each of the pixels surrounding the adjacent pixels.

The video processing device according to claim 6, wherein the detection unit calculates coordinates of the sub-pixel based on a change in similarity between pixels of the predetermined area and pixels of an area of the frame of the first video corresponding to the predetermined area, and interpolates the similarity to the feature point at the coordinates.

The video processing device according to claim 1, wherein the second determination unit forms a quadric-surface distribution of similarity from the similarity between the tracking point and the feature point and the similarities between the surrounding pixels and the feature point, and determines from the formed quadric-surface distribution of similarity whether the second condition is satisfied.

A video processing method comprising:
a step of extracting a feature point from a frame of a first video;
a step of specifying a tracking point corresponding to the feature point by performing a block matching process between a predetermined area of a frame of a second video, which is a video frame temporally later than the frame of the first video, and a block that is a part of the frame of the first video and includes the feature point;
a step of detecting, in the predetermined area, a position of a first pixel that is most similar to the feature point, a similarity between the first pixel and the feature point, a position of a second pixel that is second most similar to the feature point, and a similarity between the second pixel and the feature point;
a step of determining that a first condition is satisfied when the position of the tracking point matches the position of the first pixel and the difference between the similarity of the first pixel and the similarity of the second pixel is equal to or greater than a first threshold;
a step of calculating a similarity between the tracking point and the feature point and a similarity between each of a plurality of pixels surrounding the tracking point and the feature point, and determining that a second condition is satisfied when the difference between the calculated similarity of the tracking point and the similarity of each of the plurality of surrounding pixels is equal to or greater than a second threshold; and
a step of determining that the feature point corresponds to the tracking point when the first condition and the second condition are satisfied.

A program that causes a video processing device to execute the video processing method according to claim 9.