JP6065427B2 - Object tracking method and object tracking apparatus

Info

Publication number: JP6065427B2
Application number: JP2012151608A
Authority: JP
Inventors: ションホアイシヌ
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2011-07-07
Filing date: 2012-07-05
Publication date: 2017-01-25
Anticipated expiration: 2032-07-05
Also published as: JP2013020616A; CN102867311A; CN102867311B

Description

The present invention relates to an object tracking method and an object tracking apparatus.

Object tracking is one of the most important techniques in image analysis and machine vision. Its main difficulty is associating an object across consecutive video frames, especially when the object moves quickly relative to the frame rate. The problem becomes more complex still when the tracked object changes orientation during motion and therefore appears deformed to some degree.

Non-Patent Document 1 (Mathias Kolsch, doctoral dissertation, "Vision Based Hand Gesture Interfaces for Wearable Computing and Virtual Environments", University of California, Santa Barbara, 2004) proposes a hand tracking method for 2D (two-dimensional) images based on Flocks of Features. The method tracks a plurality of feature points with a KLT tracker based on optical flow vector calculation, groups the optical flow vectors under loose global constraints, and predicts the object position. Skin color plays an important role in grouping the features, and a skin-color probability density function is used to replenish feature points. Consequently, when the moving hand passes over a region of similar skin color (for example, the face), the tracking result drifts away from the target.

In practice, conventional 2D tracking techniques have not satisfactorily solved two problems: features are unstable, and mismatches between features occur easily. Compared with 2D object tracking, a 3D (three-dimensional) camera additionally provides depth information for each object in the 3D world, and this depth information makes it possible to distinguish, along the Z axis of the coordinate system, different objects that have similar colors or shapes.

FIGS. 1A to 1D show the image processing flow of a 3D camera, using a Prime Sense 3D camera as an example.

FIG. 1A shows a scene. The Prime Sense 3D camera shown in FIG. 1B captures the scene of FIG. 1A to produce the depth image shown in FIG. 1C. Each pixel of the depth image holds the depth coordinate (depth value) of the real object corresponding to that pixel. For example, in the pixel matrix shown in FIG. 1D, each element corresponds to one pixel of the depth image, and its value represents the distance from the camera to the object surface corresponding to that pixel in world coordinates; the unit of distance may be, for example, mm.

The Prime Sense 3D camera measures depth as follows. First, the infrared light source of the camera shown in FIG. 1B projects a pattern of points formed by a group of invisible infrared rays onto the object surfaces of the scene shown in FIG. 1A, and a CMOS sensor captures the projected pattern image. A processor then calculates the depth value of each point on the object surfaces by triangulation from the displacement of the points in the pattern. A depth image carries no color information, but it can be visualized in various ways, for example as the grayscale image shown in FIG. 1C. A color 3D image can be generated by combining the depth data stream with the image stream, and the 3D camera can output an RGB image stream synchronously with the corresponding depth value stream.
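As an illustration of visualizing a depth image in gray levels, the following minimal sketch maps a depth matrix in mm to 8-bit values. The function name depth_to_gray, the convention that nearer surfaces are drawn brighter, and the treatment of a depth of 0 as "no reading" are assumptions made for this example and are not taken from the text.

```python
def depth_to_gray(depth_mm):
    """Map a depth matrix (list of rows, values in mm) to gray levels 0..255.

    Assumptions for this sketch: a depth of 0 means "no reading" and is drawn
    black; the nearest valid pixel maps to 255 and the farthest to 1.
    """
    valid = [d for row in depth_mm for d in row if d > 0]
    if not valid:
        return [[0 for _ in row] for row in depth_mm]
    near, far = min(valid), max(valid)
    span = max(far - near, 1)  # avoid division by zero on flat scenes
    gray = []
    for row in depth_mm:
        out = []
        for d in row:
            if d <= 0:
                out.append(0)
            else:
                # Linear mapping: nearer surfaces are brighter.
                out.append(int(round(255 - 254 * (d - near) / span)))
        gray.append(out)
    return gray
```

A gray image of this kind is what corner detection and optical flow can later operate on when no synchronized RGB image is used.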

Patent Document 1 (US 20100194741 A1) proposes an object tracking method based on optical flow in depth images. In this method, each pixel in an isolated region is assigned a grayscale value according to its depth value, which produces a "zebra" pattern, and an optical flow algorithm then determines the new position of each pixel in each region. The method stabilizes tracking by velocity prediction. However, this bottom-up approach relies only on the tracking results of individual points and lacks any policy for stopping error propagation, so it cannot produce stable tracking results, particularly for long-duration tracking of complex motion.

Furthermore, the prior art above does not consider how to determine the scale of the object in the imaging plane during tracking. Even where prior art does consider image scale, the scale estimation it uses is usually probabilistic block matching. In particle filter tracking, for example, random changes (including changes of object scale) are applied to the current state, the weight of each changed block is calculated from the correlation coefficient between the block after the change and the block before the change, and the scale is determined from the resulting average state. Scale handling of this kind is unreliable and degrades the accuracy of the final tracking result.

In man-machine interaction systems in particular, the background is usually arbitrary and complex, and the motion of the object is also relatively complex: not only do the direction and speed of motion change, but the moving object itself changes shape. Obtaining reliable and stable tracking results in such an environment, especially during long periods of continuous tracking, is extremely important and challenging.

The present invention has been made to solve the above problems of the prior art. According to the present invention, an object can be tracked through a depth image sequence to obtain the position of the object and, in addition, the scale of the object in the imaging plane.

Means for solving the problem

To solve the above problems, an embodiment of the present invention applies 3DCCA (three-dimensional connected component analysis) to a depth image to obtain a list of all connected components (CCs), and then determines from the list the object CC associated with the tracked object. 3DCCA assigns the same label to neighboring pixels according to pixel connectivity, thereby grouping the image into a plurality of distinct CCs; between any two pixels of the same CC there is at least one D-connected path. Applying 3DCCA to a depth image makes it possible to separate objects with different depth values from a complex background and to obtain a mask image of the object.

3DCCA separates objects from a complex background efficiently and, together with the movement history of the object, allows the approximate position of the object to be predicted. Based on the predicted CC in which the object is located, feature points are then tracked within that CC by an optical flow method to obtain optical flow vectors, and the final object position is derived from the optical flow vectors of the plurality of feature points.

Furthermore, 3DCCA and the depth information make it possible to determine the scale of the object on the imaging plane, that is, the image scale. From the scale of the object in the initial state, the depth in the initial state, and the current depth, the current image scale of the object can be estimated by similar triangles.
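The similar-triangle relation can be illustrated with a short sketch. It assumes a pinhole camera model, under which the image size of an object is inversely proportional to its depth; this passage does not spell out the formula, so the relation used here and the name predict_image_scale are assumptions made for illustration.

```python
def predict_image_scale(initial_scale, initial_depth, current_depth):
    """Estimate the current image scale of an object from its depth change.

    Assumes a pinhole camera: scale * depth stays constant for an object of
    fixed physical size, so scale_now = scale_init * depth_init / depth_now.
    Depths may be in any unit (e.g. mm) as long as both use the same one.
    """
    if initial_depth <= 0 or current_depth <= 0:
        raise ValueError("depths must be positive")
    return initial_scale * initial_depth / current_depth
```

For example, an object that spans 80 pixels at a depth of 1000 mm would be expected to span about 40 pixels at 2000 mm under this assumption.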

According to one aspect of the present invention, there is provided an object tracking method comprising: a connected component acquisition step of performing three-dimensional connected component analysis on an input initial depth image to obtain a CC list of the initial depth image; an initial object determination step of determining, from the known current position of the object in the initial depth image, the CC in which the object is located, and determining n feature points (n being a natural number) in the image portion corresponding to that CC; a tracking step of performing three-dimensional connected component analysis on a subsequent depth image input after the initial depth image to obtain a CC list of the subsequent depth image, and identifying, from the candidate CCs in that list, the object CC in which the object is located; and an object positioning step of tracking the n feature points within the object CC identified in the tracking step and updating the current position of the object.

According to another aspect of the present invention, there is provided an object tracking apparatus comprising: a connected component acquisition device that performs three-dimensional connected component analysis on an input initial depth image to obtain a CC list of the initial depth image; an initial object determination device that determines, from the known current position of the object in the initial depth image, the CC in which the object is located, and determines n feature points (n being a natural number) in the image portion corresponding to that CC; a tracking device that performs three-dimensional connected component analysis on a subsequent depth image input after the initial depth image to obtain a CC list of the subsequent depth image, and identifies, from the candidate CCs in that list, the object CC in which the object is located; and an object positioning device that tracks the n feature points within the object CC identified by the tracking device and updates the current position of the object.

In the embodiments of the present invention, 3DCCA (three-dimensional connected component analysis) segments out the CC in which the tracked object is located, and this CC serves as the reference against which the optical flow tracking result of each feature point is evaluated. The embodiments can be applied to various man-machine interaction applications, for example man-machine interaction games and virtual-reality remote control.

The embodiments of the present invention solve the object tracking problem in depth-camera-based man-machine interaction applications. They place no special requirements on the tracked object and need no special markers or gloves. They apply not only to objects with a well-defined shape but also to objects whose shape is not fixed, allow real-time processing, and yield stable and reliable tracking results.

FIG. 1A shows a scene in the image processing flow of a 3D camera.
FIG. 1B shows the 3D camera that captures the scene of FIG. 1A.
FIG. 1C shows the depth image of the scene of FIG. 1A.
FIG. 1D shows the pixel matrix of the depth image of FIG. 1C.
FIG. 2 is an overall flowchart of the object tracking process in an embodiment of the present invention.
FIG. 3A shows a depth image.
FIG. 3B shows the result of applying 3DCCA to the depth image of FIG. 3A.
FIG. 4 shows the result of determining the object in an image.
The further drawings show, in order: the basic principle of calculating the scale of the object on the imaging plane in the tracking stage; a change of the image scale of a head object while the head object is tracked; another change of the image scale of a head object while the head object is tracked; a flowchart of the precise object positioning step in an embodiment of the present invention; a tracking result when the tracking target is a hand; another tracking result when the tracking target is a hand; and a block diagram of the object tracking apparatus in an embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 2 is an overall flowchart of the object tracking method in an embodiment of the present invention. As shown in FIG. 2, the object tracking method comprises: a connected component acquisition step S100 of performing three-dimensional connected component analysis on an input initial depth image to obtain a CC list of the initial depth image; an initial object determination step S200 of determining, from the known current position of the object in the initial depth image, the CC in which the object is located, and determining n feature points (n being a natural number) in the image portion corresponding to that CC; a tracking step S300 of performing three-dimensional connected component analysis on a subsequent depth image input after the initial depth image to obtain a CC list of the subsequent depth image, and identifying, from the candidate CCs in that list, the object CC in which the object is located; and an object positioning step S400 of tracking the n feature points within the object CC identified in the tracking step and updating the current position of the object.

The depth images processed in the embodiments may be input by any known input technique, for example read from a depth image capture device or a storage device, or obtained over a network. The results obtained after processing may be output by any known output technique, for example converted directly into control information, stored in a storage device, output over a network, or output by a printer.

The connected component acquisition step S100 and the initial object determination step S200 can be regarded together as the initialization stage. In the initialization stage the first depth image of the depth image sequence is selected and processed, but any depth image may be selected for initialization; whichever depth image the initialization stage operates on is called the initial depth image.

The tracking step S300 and the object positioning step S400 can be regarded together as the tracking stage. They are applied repeatedly, either to every depth image that follows the initial depth image or to depth images taken at fixed intervals. The depth image being processed in a given iteration is called the current depth image, and the result of the preceding iteration that it uses is called the previous result. The preceding iteration may have processed a subsequent depth image or, when the first subsequent depth image is being processed, the initial depth image.
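The control flow of the two stages can be sketched as a small driver loop. This is only an outline under stated assumptions: the function track_sequence and all five callables passed to it are hypothetical placeholders for steps S100 to S400, and their signatures are chosen for this example rather than taken from the text.

```python
def track_sequence(depth_images, initial_position, cca, select_initial_cc,
                   pick_features, identify_object_cc, locate_object):
    """Run the initialization stage once, then the tracking stage per frame.

    All callables are hypothetical stand-ins supplied by the caller:
      cca(image) -> CC list                                   (S100 / S300)
      select_initial_cc(cc_list, position) -> object CC       (S200)
      pick_features(image, object_cc) -> feature points       (S200)
      identify_object_cc(cc_list, prev_cc, prev_pos) -> CC    (S300)
      locate_object(image, object_cc, features) -> (position, features)  (S400)
    """
    frames = iter(depth_images)
    initial = next(frames)

    # Initialization stage: S100 and S200 on the initial depth image.
    cc_list = cca(initial)
    object_cc = select_initial_cc(cc_list, initial_position)
    features = pick_features(initial, object_cc)
    position = initial_position
    positions = [position]

    # Tracking stage: S300 and S400 on each subsequent (current) depth image,
    # each iteration using the previous result.
    for current in frames:
        cc_list = cca(current)
        object_cc = identify_object_cc(cc_list, object_cc, position)
        position, features = locate_object(current, object_cc, features)
        positions.append(position)
    return positions
```

Passing the steps in as callables keeps the loop independent of how each step is implemented, which matches the text's separation into an initialization stage and a repeated tracking stage.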

The connected component acquisition step S100 uses 3DCCA to obtain a list of the CCs (connected components) contained in the depth image. In both the initialization stage and the tracking stage, 3DCCA must first be applied to the depth image to obtain a list containing information on all CCs in the image. 3DCCA (three-dimensional connected component analysis) examines pixels that are neighbors along the X and Y axes of the image coordinate system and assigns the same numeric label to neighboring pixels whose separation along the Z axis lies within a given range. Pixels with the same label form one CC, so the output of 3DCCA is a set of connected shape components. 3DCCA thus consolidates the pixel-level depth information from the 3D camera into a small number of object sets, which can be used to identify objects at different depths along the Z axis and to analyze other scene content.

The embodiments may use the specific 3DCCA procedure below, which is obtained by modifying commonly used 2DCCA so that it applies to 3D image data.

First, a 3DCC (three-dimensional connected component) is defined as follows.

Two 3D points are D-connected if their projections onto the XY plane are adjacent and their depth difference is less than a threshold D_TH.
A D-connected path exists between two 3D points P and Q if there is a list of 3D points (P, p1, p2, ..., pN, Q) in which every two consecutive points are D-connected.
A D-connected set of 3D points is a maximal D-connected set, that is, a D-connected component, if for every point p in the set there is no neighbor of p in the XY plane that could be added to the set while still satisfying the connectivity condition.

The method of finding D-connected components, that is, the 3DCCA method of finding maximal CCs, is as follows.

1. Assign to each point (x, y) the number of the connected component (CC) to which it belongs, denoted LABEL(x, y).

2. Define the depth difference threshold D_TH.

3. Define a queue data structure (first in, first out), denoted QUEUE.

4. Initialize LABEL(x, y) to -1 for all points.

5. Set the current CC number cur_label to 1.

6. Search for the next point whose LABEL is -1 and take it as the starting point p_start of a new CC. If no such point exists, stop.

7. Set LABEL(p_start) to cur_label.

8. Put the point p_start into QUEUE.

9. While QUEUE is not empty, repeat the following steps.

a. Remove the head node p_head(x, y) from QUEUE.

b. For each of the m neighbors of p_head, in turn:
i. If LABEL(k) > 0 (k being the index of the neighbor among the m neighbors), skip to the next neighbor.
ii. If the difference between the depth of the k-th neighbor and the depth of p_head is less than D_TH, put the k-th neighbor into QUEUE and set LABEL(k) to cur_label.

10. Increment cur_label by 1 and repeat from step 6.

In the method above, the neighbors of a point (x, y) are the points with coordinates (x-1, y-1), (x-1, y), (x-1, y+1), (x, y-1), (x, y+1), (x+1, y-1), (x+1, y), and (x+1, y+1). A neighbor whose coordinates fall outside the image (negative, or beyond the image resolution) is not processed.
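Steps 1 to 10 and the 8-neighbor rule above can be sketched as a breadth-first labeling in plain Python. The sketch follows the D-connectivity definition, so a neighbor joins the CC when its depth differs from that of the head point by less than D_TH. The function name, the list-of-rows depth representation, and the use of a raster scan to find the next unlabeled start point are choices made for this example.

```python
from collections import deque

# The eight neighbors of (x, y) in the XY plane.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                    (0, 1), (1, -1), (1, 0), (1, 1)]

def label_3d_connected_components(depth, d_th):
    """Return (label, count): label[y][x] is the CC number (1..count)."""
    height = len(depth)
    width = len(depth[0]) if height else 0
    label = [[-1] * width for _ in range(height)]       # step 4
    cur_label = 1                                        # step 5
    for y0 in range(height):                             # step 6: next unlabeled point
        for x0 in range(width):
            if label[y0][x0] != -1:
                continue
            label[y0][x0] = cur_label                    # step 7
            queue = deque([(x0, y0)])                    # steps 3 and 8
            while queue:                                 # step 9
                x, y = queue.popleft()                   # step 9a
                for dx, dy in NEIGHBOR_OFFSETS:          # step 9b
                    nx, ny = x + dx, y + dy
                    if not (0 <= nx < width and 0 <= ny < height):
                        continue                         # outside the image
                    if label[ny][nx] > 0:                # step 9b-i: already labeled
                        continue
                    if abs(depth[ny][nx] - depth[y][x]) < d_th:  # step 9b-ii
                        label[ny][nx] = cur_label
                        queue.append((nx, ny))
            cur_label += 1                               # step 10
    return label, cur_label - 1
```

Labeling a point at the moment it is enqueued, as step 9b-ii prescribes, ensures that each point enters the queue at most once. The sketch does not treat missing depth readings specially; a real depth stream may need such pixels masked out first.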

FIGS. 3A and 3B illustrate the result of applying 3DCCA to a depth image. FIG. 3A shows the depth image, in which the gray level of each pixel represents how near or far that pixel is. FIG. 3B shows the result of applying 3DCCA to the depth image of FIG. 3A, with each resulting CC drawn in a different gray level (color). As the rectangular frame Q1 in FIG. 3B shows, when the hand is extended, 3DCCA reliably separates the hand from the body and the environment.

The execution efficiency of 3DCCA can be improved by optimizing the algorithm and, where the application allows, by applying 3DCCA only to a local region.

Because the initial object determination step S200 belongs to the initialization stage, the object position can be specified by external input, for example by simply enclosing the object in a rectangular box, or the object can be detected and identified by a real-time detection operator. Since the position of the object is known in advance at this stage, the CC in which the object is located is easily determined from the CC list.

Alternatively, the CC in which the object is located may be determined automatically by a heuristic rule, for example the nearest-and-largest CC principle.

FIG. 4 shows the result of determining the object in an image. In FIG. 4, the white region in the rectangular frame Q2 is the CC associated with the object, which here is a hand. Because this is the initialization stage, to segment the hand object it suffices to draw the rectangular frame Q2 in FIG. 4 and apply local 3DCCA within it. After 3DCCA, the rectangular frame Q2 contains two CCs, the hand CC and the background CC, and the hand CC, which is nearest and has a large area, is the CC in which the object to be tracked is located.
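One way to read the "nearest and large" heuristic is sketched below: among the CCs whose area reaches a minimum, choose the one with the smallest mean depth. The text does not define the rule precisely, so the function name, the min_area parameter, and the use of mean depth as the measure of nearness are all assumptions made for illustration.

```python
def pick_nearest_large_cc(depth, label, min_area):
    """Return the label of the nearest CC with at least min_area pixels.

    depth and label are matrices of the same shape (lists of rows); label
    holds one CC number per pixel. Returns None if no CC is large enough.
    Nearness is measured by mean depth, an assumption of this sketch.
    """
    stats = {}  # CC number -> [area in pixels, sum of depths]
    for depth_row, label_row in zip(depth, label):
        for d, cc in zip(depth_row, label_row):
            entry = stats.setdefault(cc, [0, 0])
            entry[0] += 1
            entry[1] += d
    candidates = [(total / area, cc)
                  for cc, (area, total) in stats.items()
                  if area >= min_area]
    if not candidates:
        return None
    return min(candidates)[1]  # smallest mean depth wins
```

The area floor keeps a small, very near noise blob from being chosen over the hand; how large it should be depends on the sensor resolution and the expected object size.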

Once the CC in which the object is located has been determined, a mask image of the object can be obtained. The n feature points for subsequent optical flow tracking are then selected from the grayscale image corresponding to the mask image (here, the grayscale image obtained by visualizing the depth image, on which optical flow can be tracked) or from the RGB color image, synchronized with the depth image, corresponding to the mask image. In other words, the n feature points are extracted from the depth image in which the corresponding image portion lies or from the color image synchronized with that depth image.

A feature point here is a corner with a relatively large response in the grayscale image (an RGB color image can be converted to grayscale by removing color), for example a Harris corner. The distance between any two of the n feature points is at least a threshold (the first threshold); enforcing this minimum spacing keeps feature points from lying too close together and so preserves the usefulness of each point's tracking result. In other words, the n feature points are mutually separated by at least the first threshold, and each is a corner that a given tracking operator can distinguish from its neighboring pixels.

The function GoodFeaturesToTrack in the open-source project OpenCV can be used to select feature points that meet these requirements.
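The spacing requirement can be illustrated without OpenCV by a greedy selection over corner candidates that have already been scored. This is a sketch of the constraint only, not OpenCV's implementation; the function name and the (x, y, response) candidate format are assumptions made for this example. In OpenCV itself, the minDistance parameter of goodFeaturesToTrack plays the role of the first threshold, and its optional mask parameter can restrict detection to the object's mask image.

```python
import math

def select_spaced_corners(candidates, n, min_distance):
    """Pick up to n corners, strongest first, at least min_distance apart.

    candidates: iterable of (x, y, response) tuples, where response is a
    corner strength such as a Harris response computed elsewhere.
    """
    chosen = []
    for x, y, _response in sorted(candidates, key=lambda c: c[2], reverse=True):
        # Keep the candidate only if it is far enough from every chosen corner.
        if all(math.hypot(x - cx, y - cy) >= min_distance for cx, cy in chosen):
            chosen.append((x, y))
            if len(chosen) == n:
                break
    return chosen
```

Visiting candidates in order of decreasing response means that when two corners are too close, the weaker one is the one dropped.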

Next, the process moves to the tracking stage.

In the tracking stage, 3DCCA is first applied to each depth image input after the initial depth image (each subsequent depth image) to obtain its CC list, and the object CC is then determined. Specifically, the tracking step S300 searches the candidate CCs in the CC list of the subsequent depth image for the one most similar to the result of adding the motion-predicted state change to the previously determined position of the object, and takes it as the object CC in which the object is currently located.
3DCCA is applied to the depth image currently being processed in the sequence of subsequent depth images in the same way as to the initial depth image, yielding the CC list of the current depth image. Determining the object CC, however, differs from the initialization stage: the position of the object CC is now unknown, and every CC in the list is a candidate. The tracking step S300 determines the CC in which the object is located by searching the CC list for one CC, namely the CC whose characteristics are most similar to the result of adding the motion-predicted state change to the previous location of the tracked object, that is, the CC with the highest similarity.

The similarity is determined by (i) the position difference between the position of the candidate communication area and the predicted position obtained by adding the predicted motion displacement in the x-axis, y-axis, and z-axis directions to the previously determined position of the object (this is equivalent to the difference of motion velocity vectors), and (ii) the area difference between the candidate communication area and the previously determined object communication area.

Here, the search for the object CC can be regarded as identification of the object CC, or as a coarse localization estimate of the object CC. 3DCCA on the depth image aggregates pixel-level depth information into the CC list, which is easier to process than identifying the object at the pixel level alone. The CC where the object is located can be found by simple means; for example, the following distance formula (1) can be used to find the CC that most closely matches the previous object CC.
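Formula (1) itself is not reproduced in this text. As a hedged reconstruction based only on the symbol definitions accompanying it (a velocity-vector difference under the Euclidean norm, an area difference, and a weight a), it may take a form such as the following; the absolute value on the area term and the placement of the weight are assumptions:

```latex
D_i = \lVert V_i(n) - V(n-1) \rVert + a \,\lvert A_i(n) - A(n-1) \rvert \qquad (1)
```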

In the formula, ||.|| is the Euclidean norm operator; n is the index of each frame in the depth image sequence; assuming that the n-th frame depth image is currently being processed, i is the index of a candidate communication area obtained from the current n-th frame depth image; V(n-1) is the velocity vector, in the x-axis, y-axis, and z-axis directions, of the object tracking result for the previous (n-1)-th frame depth image; Vi(n) is the motion velocity vector in the x-axis, y-axis, and z-axis directions that results if the i-th candidate communication area of the current n-th frame depth image is taken to be the object communication area; A(n-1) is the area of the object communication area in the previous (n-1)-th frame depth image; Ai(n) is the area of the i-th candidate communication area in the current n-th frame; a is a weight determined from user experience or from statistical analysis of test results; and Di is the similarity distance metric between the current i-th candidate CC and the previous object CC.

The smaller the value of Di, the higher the similarity between the candidate communication area and the previous object communication area; the i-th candidate communication area that yields the smallest Di is taken as the current object communication area.
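By way of illustration only, the search for the candidate with the smallest Di might be sketched as follows. The combination used for Di (velocity-vector difference plus weighted absolute area difference), the dictionary layout of a candidate, the default weight, and the function names are all assumptions made for this sketch and are not taken from the specification.

```python
import math

def similarity_distance(v_cand, v_prev, area_cand, area_prev, a):
    """D_i in an assumed form: velocity difference plus weighted area difference."""
    return math.dist(v_cand, v_prev) + a * abs(area_cand - area_prev)

def select_object_cc(candidates, prev_pos, v_prev, area_prev, a=0.01):
    """Return the index i of the candidate CC with the smallest D_i.

    candidates: list of dicts with 'pos' (x, y, z) and 'area' (hypothetical layout).
    V_i(n) is taken as the per-frame displacement of the candidate from prev_pos.
    """
    best_i, best_d = None, math.inf
    for i, cc in enumerate(candidates):
        v_cand = tuple(c - p for c, p in zip(cc["pos"], prev_pos))
        d = similarity_distance(v_cand, v_prev, cc["area"], area_prev, a)
        if d < best_d:
            best_i, best_d = i, d
    return best_i
```

In practice the weight a would be tuned as described above, since the velocity and area terms have different units.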

In the search for the object communication area, the object state considered may include not only the motion velocity vector in the x-axis, y-axis, and z-axis directions and the area of the CC associated with the object, but also other feature quantities associated with the CC, such as gray levels or the color histogram of a color image.

The object CC may also be determined by other means, for example by a machine learning method. Specifically, a classifier may be trained to identify the object CC. In training the classifier, the velocity vector, the communication area size, and other feature quantities may be used as training inputs.

After the tracking step S300, the object CC provides a rough estimate of the object position from the object-level list. The method may further include an image scale determination step of determining the image scale of the object identified in the tracking step, based on the image scale of the object in the initial depth image. Calculating the current image scale of the tracked object makes it easier to describe the current state of the object more accurately. In particular, when the tracked object moves to a position at the same depth as the background, the object becomes a local part of a large CC; when it cannot be completely isolated from, and clearly distinguished from, its surroundings, calculating the object image scale becomes all the more necessary in order to delimit, within the large CC, the partial CC where the object is located.

To calculate the change in scale of the object on the imaging plane during the tracking stage, the average depth value of the object CC must be calculated, and the average depth in the initial state and the image scale of the object in the initial state must be recorded.

FIG. 5 illustrates the basic principle of calculating the scaling of the object on the imaging plane in the tracking stage.
As shown in FIG. 5, the object Obj is placed on the left side of the camera Camera; its average depth in the depth image of the first frame is d0, and its average depth in the depth image of a subsequent n-th frame (n is a natural number other than 1) is dn. The imaging plane Plan is on the right side of the camera Camera. When the distance from the object Obj to the camera Camera is d0 (average depth d0), the image scale on the imaging plane is S0; when the distance is dn (average depth dn), the image scale on the imaging plane is Sn. The actual size of the object Obj is H, and L is the projected size, at depth dn, of the object Obj located at distance d0 from the camera Camera (projection with the camera position as a point light source). By the principle of similar triangles, S0/Sn = L/H and L/H = dn/d0, so S0/Sn = dn/d0, from which formula (2) for calculating the image scaling on the imaging plane at the location of the object can be derived.

In the image scale determination step, the image scale is calculated by formula (2), Sn = d0/dn*S0, where dn is the average depth of the object communication area, identified in the tracking step, where the object is located; d0 is the average depth of the object communication area where the object is located as determined in the initial depth image; S0 is the image scale of the object in the initial depth image; and Sn is the image scale of the object in the subsequent depth image.

Applying formula (2) to the case shown in FIG. 5, if dn in FIG. 5 is twice d0, the image scale Sn of the object at position dn is half the image scale S0 at position d0. The result of formula (2) thus agrees with human intuition.
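As a minimal sketch of formula (2), Sn = d0/dn*S0, the calculation could be written as below. The function name and the guard against a non-positive depth are assumptions added for illustration.

```python
def image_scale(d0, dn, s0):
    """Formula (2): Sn = d0 / dn * S0.

    d0: average depth of the object CC in the initial depth image
    dn: average depth of the object CC in the current (n-th) depth image
    s0: image scale of the object in the initial depth image
    """
    if dn <= 0:
        # Assumed guard: a valid average depth is strictly positive.
        raise ValueError("average depth dn must be positive")
    return d0 / dn * s0
```

For example, doubling the depth halves the image scale, which matches the case discussed for FIG. 5.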

FIGS. 6A and 6B show the change in image scale of a head object during head tracking. FIGS. 6A and 6B represent different frames; the rectangular frame Q3 in FIG. 6A and the rectangular frame Q4 in FIG. 6B both indicate the tracking target, and the image scale of the tracking target shown in FIG. 6A has changed in FIG. 6B.

Estimating the object communication area makes it possible not only to estimate the approximate position of the object and the image scaling on the imaging plane, but also to describe the approximate shape of the object, which facilitates tracking of non-rigid objects.

Next, in the object positioning step S400, a more accurate estimate of the object positioning point can be obtained, for example by an optical flow method (in image coordinates, the extent of the object is ignored and the object is abstracted to a single positioning point, whose position represents the position of the object in the image). In the embodiment of the present invention, object-level information is used for accurate estimation of the object position; this information includes not only the shape of the object communication area but also the object-level information jointly expressed by optical flow tracking of a plurality of points in the object communication area and extraction of a median feature.

FIG. 7 is a flowchart of the accurate object positioning step in the embodiment of the present invention.

As shown in FIG. 7, the object positioning step S400 includes: a feature point tracking step S420 of tracking the n feature points in the subsequent depth image with a KLT tracker; a class quantization step S440 of performing class quantization of each tracked feature point based on the mask image information of the object communication area and assigning a weight to each tracked feature point; and a grouping step S460 of grouping the tracked feature points, calculating the center point of the group, and updating the current position of the object.

In the feature point tracking step S420, each feature point is tracked with the KLT tracker to obtain its new position in the subsequent depth image currently being processed. For example, the OpenCV function cvCalcOpticalFlowPyrLK may be used to calculate the optical flow vector of each feature point and obtain its corresponding new position in the subsequent depth image.

After the feature point tracking step S420 and before the class quantization step S440, the method further includes a first removal step of removing, from the tracked feature points, those whose tracking error exceeds a second predetermined threshold. For example, when the optical flow vectors of the feature points are calculated with the OpenCV function cvCalcOpticalFlowPyrLK, the maximum error parameter (the second threshold) can be set to 1150, and tracking results whose error parameter exceeds this value are removed; this first eliminates tracked feature points with high error or low correlation in optical flow tracking. The value 1150 is merely one example of the second threshold, which may of course take other values such as 1100 or 1200.
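The first removal step can be illustrated, purely as a sketch, by a filter over per-point tracking errors. The function name and the list-based interface are hypothetical; the errors are assumed to be the per-point error values reported by the optical flow routine (for instance, the err output of cv2.calcOpticalFlowPyrLK in the OpenCV Python binding), and 1150 is the example threshold given above.

```python
def remove_high_error_points(points, errors, max_error=1150.0):
    """Keep only tracked points whose tracking error does not exceed max_error.

    points: list of (x, y) positions after tracking
    errors: per-point tracking error, same order as points
    """
    if len(points) != len(errors):
        raise ValueError("points and errors must have the same length")
    # A point is removed only when its error exceeds the threshold.
    return [p for p, e in zip(points, errors) if e <= max_error]
```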

In the class quantization step S440, class quantization of each tracked feature point is performed according to the mask image information of the object communication area, and a weight is calculated for each tracked feature point (for each feature point remaining after the first removal, if the first removal step was performed). A simple approach is to set the weight to 1 if the new position of the tracked feature point lies within the estimated region of the object communication area, and to 0 otherwise.
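The simple 1/0 weighting described above might be sketched as follows. The mask representation (a 2D list of 0/1 values indexed as mask[row][col]), the rounding of point coordinates to the nearest pixel, and the function name are assumptions made for illustration.

```python
def weight_points_by_mask(points, mask):
    """Return weight 1 for points inside the object CC mask, 0 otherwise.

    points: list of (x, y) positions of tracked feature points
    mask:   2D list of 0/1 values, mask[row][col] (hypothetical layout)
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    weights = []
    for x, y in points:
        col, row = int(round(x)), int(round(y))
        inside = 0 <= row < height and 0 <= col < width and mask[row][col] != 0
        weights.append(1 if inside else 0)
    return weights
```

Points that fall outside the image bounds are given weight 0 in this sketch.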

After the class quantization step S440 and before the grouping step S460, the method further includes a second removal step of removing a predetermined proportion of the tracked feature points, namely those farthest from the centroid of the tracked feature points.
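One possible reading of the second removal step is sketched below. The centroid is taken here as the unweighted mean of the point positions, and the default proportion of 0.15 is a placeholder; neither is specified in the text, and the function name is hypothetical.

```python
import math

def remove_farthest_points(points, ratio=0.15):
    """Drop the given fraction of points that lie farthest from the centroid."""
    if not points:
        return []
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    n_remove = int(len(points) * ratio)
    # Sort indices by distance to the centroid, nearest first.
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], (cx, cy)))
    # Keep the nearest points and restore their original ordering.
    keep = sorted(order[: len(points) - n_remove])
    return [points[i] for i in keep]
```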

In the grouping step S460, the center point of the group, that is, the point for which the total distance to all the feature points is shortest, is calculated from the existing tracked feature points (the tracked feature points remaining after the second removal operation, if the second removal step was performed), and this center point can serve as a relatively accurate object positioning point for object tracking. One possible grouping implementation is as follows. The remaining m tracked feature points (m is a natural number) are numbered P1, P2, P3, ..., Pm; the i-th point (i is an index, i = 1, ..., m) is taken as the center point of the group, and the total distance from the i-th feature point to every other feature point,

$$Dt_i = \sum_{j=1,\ j \neq i}^{m} \lVert P_i - P_j \rVert$$

is calculated. Finally, the feature point for which Dti is smallest over i = 1, ..., m is found and taken as the center point of the final grouping. The position of this center point is taken as the current position of the object.
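The grouping described above selects, among the tracked feature points, the one with the smallest total distance to all the others. A minimal sketch, assuming Euclidean distance between 2D points and a hypothetical function name, is:

```python
import math

def group_center(points):
    """Return the point P_i that minimizes Dt_i, the summed distance to all other points."""
    if not points:
        raise ValueError("no tracked feature points")
    # Dt_i for every i; the distance from a point to itself contributes 0.
    totals = [sum(math.dist(p, q) for q in points) for p in points]
    return points[totals.index(min(totals))]
```

Because the center is always one of the tracked points, a single outlier shifts it far less than it would shift an arithmetic mean.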

In each tracking cycle, the number of tracked feature points may decrease after the above operations; new feature points are therefore added to the feature point set so that the number of feature points needed for tracking in the depth image of the next frame is available.

In other words, after the grouping step S460, the method may further include a replenishment step of adding new feature points so that the total number of feature points of the object becomes n, the new feature points being located within the object and the mutual interval between the n feature points after replenishment being equal to or greater than the first predetermined threshold.
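As a non-limiting sketch of the replenishment step, new points can be accepted one by one while the spacing constraint holds. How candidate positions inside the object are obtained (for example, from a corner detector restricted to the object communication area) is left open here; the function name and interface are hypothetical.

```python
import math

def replenish_points(points, candidates, n, min_spacing):
    """Add candidates until n points exist, keeping mutual spacing >= min_spacing.

    points:     feature points that survived tracking and removal
    candidates: positions assumed to lie inside the object region
    """
    result = list(points)
    for c in candidates:
        if len(result) >= n:
            break
        # Accept a candidate only if it is far enough from every kept point.
        if all(math.dist(c, p) >= min_spacing for p in result):
            result.append(c)
    return result
```

If too few candidates satisfy the spacing constraint, this sketch returns fewer than n points rather than relaxing the threshold.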

The specific optical flow tracking method described above makes full use of the shape information of the estimated object communication area; it uses the optical flow information of pixel points obtained from the lower layer and combines it with object-level information from the upper layer, thereby ensuring the accuracy of the tracking result and, in particular, the reliability of the tracking result for non-rigid objects.

FIGS. 8A and 8B show tracking results when the tracking target is a hand. In FIG. 8A, the rectangular frame Q5 indicates the hand being tracked, the small dots (e1 being one example) indicate tracked feature points, and the larger dot T1 indicates the estimated target positioning point. In FIG. 8B, the rectangular frame Q6 indicates the hand being tracked, the small dots (e2 being one example) indicate tracked feature points, and the larger dot T2 indicates the estimated target positioning point.

The present invention can also be implemented as an object tracking apparatus that executes the object tracking method described above. FIG. 9 is a block diagram of the object tracking apparatus in the embodiment of the present invention.

As shown in FIG. 9, the object tracking apparatus according to the embodiment of the present invention includes: a communication area acquisition device 100 that executes the communication area acquisition step S100 described above, performing three-dimensional communication area analysis on the input initial depth image and acquiring the communication area list of the initial depth image; an initial object determination device 200 that executes the initial object determination step S200, determining the communication area where the object is located from the known current position of the object in the initial depth image and determining n feature points (n is a natural number) in the image portion corresponding to that communication area; a tracking device 300 that executes the tracking step S300, performing three-dimensional communication area analysis on the subsequent depth image input after the initial depth image and identifying, from the candidate communication areas in the acquired communication area list of the subsequent depth image, the object communication area where the object is located; and an object positioning device 400 that executes the object positioning step S400, tracking the n feature points within the object communication area identified by the tracking device 300 and updating the current position of the object.

Here, the mutual interval between the n feature points may be equal to or greater than the first predetermined threshold, and each of the n feature points may be distinguished from a corner of an adjacent pixel point by a specific tracking operator.

The corresponding image portion from which the n feature points are extracted may be located in the depth image or in a color image synchronized with the depth image.

The tracking device 300 searches the candidate communication areas in the communication area list of the subsequent depth image for the candidate whose similarity to the result of adding the motion-predicted state change to the previously determined position of the object is highest, and takes it as the object communication area where the object is currently located.

The object tracking apparatus in the embodiment of the present invention further includes an image scale determination device that executes the image scale determination step, determining the image scale of the object in the object communication area identified by the tracking device 300, based on the image scale of the object in the initial depth image.

Here, the image scale determination device may calculate the image scale by Sn = d0/dn*S0, where dn is the average depth of the object communication area, identified by the tracking device 300, where the object is located; d0 is the average depth of the object communication area where the object is located as determined from the initial depth image; S0 is the image scale of the object in the initial depth image; and Sn is the image scale of the object in the subsequent depth image.

Here, the object positioning device 400 may include: a feature point tracking device that executes the feature point tracking step S420, tracking the n feature points in the subsequent depth image with the KLT tracker; a class quantization device that executes the class quantization step, performing class quantization of each tracked feature point based on the mask image information of the object communication area and assigning a weight to each tracked feature point; and a grouping device that executes the grouping step S460, grouping the tracked feature points, calculating the center point of the group, and updating the current position of the object.

Here, the apparatus may further include, between the feature point tracking device and the class quantization device, a first removal device that executes the first removal step of removing, from the tracked feature points, those whose tracking error exceeds the second predetermined threshold, and, between the class quantization device and the grouping device, a second removal device that executes the second removal step of removing a predetermined proportion of the tracked feature points, namely those farthest from the centroid of the tracked feature points.

The object tracking apparatus in the embodiment of the present invention may further include a replenishment device that executes the replenishment step of adding, after the grouping step, new feature points so that the total number of feature points of the object becomes n, the new feature points being located within the object and the mutual interval between the n feature points after replenishment being equal to or greater than the first predetermined threshold.

In the object tracking method and the object tracking apparatus in the embodiment of the present invention, three-dimensional communication area analysis is performed on the depth image to obtain the list of all communication areas, the motion of the communication areas is predicted, the communication area where the object is located is identified from the communication area list, and the image scale of the object on the imaging plane is estimated. Providing reliable object-level information (the object communication area) in this way supplies important reference information for evaluating each tracked feature point in the feature grouping process and makes replenishment of feature points feasible in optical flow tracking of a plurality of feature points. Because communication area segmentation and multi-point tracking complement each other, error propagation can be prevented effectively and stable tracking results can be obtained.

In the embodiment of the present invention, in contrast to Non-Patent Document 1, the object is segmented by the 3DCCA operation on the depth map and by the continuity of motion, so the influence of color interference in 2D tracking is eliminated and more reliable tracking results can be obtained.

In the embodiment of the present invention, in contrast to Patent Document 1, the approximate position is first coarsely estimated by determining the communication area associated with the object, and a more accurate position is then obtained from this coarse estimate by the optical flow method. Because the embodiment combines global and local information and uses both top-down and bottom-up processing, stable tracking can be achieved more easily than with the method of Patent Document 1. In addition, shape edge information is obtained from the communication area, and a plurality of feature points whose mutual interval is equal to or greater than a predetermined value are tracked in the image corresponding to the communication area mask, so that shape changes of the tracked object are reflected; according to the embodiment of the present invention, non-rigid objects can therefore be tracked more easily.

Furthermore, by calculating the image scale of the object on the imaging plane from depth information and the principle of similar triangles, the embodiment of the present invention facilitates more accurate object representation and feature extraction and enables more stable tracking.

The series of operations described in this specification can be executed by hardware, software, or a combination of hardware and software. When the series of operations is executed by software, the computer program may be installed in the memory of a computer built into dedicated hardware and executed by that computer. Alternatively, the computer program may be installed in a general-purpose computer capable of executing various processes and executed by that computer.

For example, the computer program can be stored in advance on a recording medium such as a hard disk or ROM, or stored (recorded) temporarily or permanently on a removable recording medium such as a floppy disk, CD-ROM, MO, DVD, magnetic disk, or semiconductor memory; such a removable recording medium may be provided as a package.

The present invention has been described in detail with reference to the specific embodiments above, but it goes without saying that the embodiments can be modified or substituted without departing from the spirit of the present invention. In other words, the present invention has been disclosed by way of illustration and should not be construed restrictively. The gist of the present invention should be determined by the appended claims.

Claims

An object tracking method comprising:
a communication area acquisition step of performing a three-dimensional communication area analysis on an input initial depth image and acquiring a communication area list of the initial depth image;
an initial object determination step of determining the communication area where an object is located from a known current position of the object in the initial depth image, and determining n feature points (n is a natural number) in an image portion corresponding to the communication area;
a tracking step of performing a three-dimensional communication area analysis on a subsequent depth image input after the initial depth image, and identifying the object communication area where the object is located from the candidate communication areas in the acquired communication area list of the subsequent depth image; and
an object positioning step of tracking the n feature points within the object communication area identified in the tracking step and updating the current position of the object,
wherein a mutual interval between the n feature points is equal to or greater than a first predetermined threshold, and each of the n feature points is distinguished from a corner of an adjacent pixel point by a specific tracking operator.

The object tracking method according to claim 1, wherein the corresponding image portion from which the n feature points are extracted is located in a depth image or a color image synchronized with the depth image.

The object tracking method according to claim 1, wherein, in the tracking step, the candidate communication areas in the communication area list of the subsequent depth image are searched for the candidate communication area having the highest similarity to the result of adding the state change after motion prediction to the previously determined position of the object, and that candidate communication area is taken as the object communication area where the object is currently located.

The object tracking method according to claim 1, further comprising an image scale determining step of determining an image scale of the object in the initial depth image of the object communication area identified in the tracking step after the tracking step.

The object tracking method according to claim 4, wherein, in the image scale determination step, the image scale is calculated by Sn = d0/dn*S0, where dn is the average depth of the object communication area where the object is located as identified in the tracking step, d0 is the average depth of the object communication area where the object is located as determined from the initial depth image, S0 is the image scale of the object in the initial depth image, and Sn is the image scale of the object in the subsequent depth image.

The object tracking method according to claim 1, wherein the object positioning step includes:
a feature point tracking step of tracking the n feature points by the KLT tracker in the subsequent depth image;
a class quantization step of performing class quantization of each tracked feature point based on the mask image information of the object communication area and weighting each tracked feature point; and
a grouping step of grouping the tracked feature points, calculating a center point of the group, and updating the current position of the object.
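The weighting and center-point calculation recited above might be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the mask is taken to be a binary image of the object communication area (a connected region of the depth image), the weight values `inside` and `outside` are hypothetical, and the grouping is simplified to a single weighted mean of the tracked points.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates


def weight_points(points: Sequence[Point],
                  mask: Sequence[Sequence[int]],
                  inside: float = 1.0,
                  outside: float = 0.1) -> List[float]:
    """Weight each tracked point by whether it lies on the object mask.

    The weight values are hypothetical; the claim only states that each
    tracked point is weighted using the mask image information.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    weights = []
    for x, y in points:
        col, row = int(round(x)), int(round(y))
        in_mask = 0 <= row < height and 0 <= col < width and mask[row][col] != 0
        weights.append(inside if in_mask else outside)
    return weights


def weighted_center(points: Sequence[Point],
                    weights: Sequence[float]) -> Point:
    """Return the weighted mean of the points as the updated object position."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one point must have a positive weight")
    cx = sum(w * x for (x, _), w in zip(points, weights)) / total
    cy = sum(w * y for (_, y), w in zip(points, weights)) / total
    return cx, cy
```

Points that drift off the mask keep a small weight rather than being discarded here, so a brief mask error does not empty the group; whether the method itself behaves this way is not stated in the claim.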

The object tracking method according to claim 6, further comprising:
a first removal step of removing, after the feature point tracking step and before the class quantization step, any tracked feature point whose tracking error exceeds a second predetermined threshold; and
a second removal step of removing, after the class quantization step and before the grouping step, a predetermined proportion of the tracked feature points having the largest distance from the center of gravity of the tracked feature points.
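The two removal steps can be sketched as simple filters. The function names, the unweighted center of gravity, and the rounding of the removed count down to an integer are assumptions of this sketch; the per-point tracking error is assumed to be supplied by the tracker.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def remove_high_error(points: Sequence[Point],
                      errors: Sequence[float],
                      max_error: float) -> List[Point]:
    """First removal: drop points whose tracking error exceeds max_error."""
    return [p for p, e in zip(points, errors) if e <= max_error]


def remove_farthest(points: Sequence[Point], fraction: float) -> List[Point]:
    """Second removal: drop the given fraction of points farthest from
    the (unweighted) center of gravity, preserving the original order."""
    if not points:
        return []
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    n_drop = int(len(points) * fraction)  # assumption: round down
    by_distance = sorted(
        range(len(points)),
        key=lambda i: math.hypot(points[i][0] - cx, points[i][1] - cy),
    )
    keep = sorted(by_distance[:len(points) - n_drop])
    return [points[i] for i in keep]
```

Applying the error filter first keeps badly tracked points from skewing the center of gravity used by the distance filter.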

The object tracking method according to claim 6, further comprising a replenishment step of replenishing, after the grouping step, new feature points located in the object so that the total number of feature points of the object is n and so that the mutual interval between the n feature points, including the replenished feature points, is equal to or greater than the first predetermined threshold.
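The replenishment step might look like the following greedy sketch. The `candidates` list is hypothetical: it stands for candidate feature points already known to lie on the object (for example, corner candidates in priority order), and how they are produced is outside the scope of this example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def replenish(points: Sequence[Point],
              candidates: Sequence[Point],
              n: int,
              min_dist: float) -> List[Point]:
    """Add candidates until there are n points, keeping every pairwise
    distance between a new point and the existing points >= min_dist."""
    result = list(points)
    for c in candidates:
        if len(result) >= n:
            break
        far_enough = all(
            math.hypot(c[0] - p[0], c[1] - p[1]) >= min_dist for p in result
        )
        if far_enough:
            result.append(c)
    return result
```

If the candidates run out before n points are reached, this sketch returns fewer than n points instead of relaxing the spacing constraint.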

A communication area acquisition device that performs a three-dimensional communication area analysis on the input initial depth image and acquires a communication area list of the initial depth image;
An initial object determination device for determining a communication area of the object location from a known current position of the object in the initial depth image and determining n feature points (n is a natural number) in an image portion corresponding to the communication area;
A tracking device that performs three-dimensional communication area analysis on a subsequent depth image input after the initial depth image and identifies, from among the candidate communication areas in the communication area list of the subsequent depth image, the object communication area where the object is located;
An object positioning device that tracks the n feature points in the object communication area identified by the tracking device and updates the current position of the object,
An object tracking device, wherein the n feature points have a mutual interval equal to or greater than a first predetermined threshold, and each of the n feature points is distinguished from a corner of an adjacent pixel point by a specific tracking operator.

The object tracking device according to claim 9, wherein the corresponding image portion from which the n feature points are extracted is located in a depth image or a color image synchronized with the depth image.