JP6632142B2

JP6632142B2 - Object tracking device, method and program

Info

Publication number: JP6632142B2
Application number: JP2016210608A
Authority: JP
Inventors: ホウアリサビリン; 内藤　整
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2016-10-27
Filing date: 2016-10-27
Publication date: 2020-01-15
Anticipated expiration: 2036-10-27
Also published as: JP2018073044A

Description

The present invention relates to an object tracking device, method, and program for tracking objects in a depth video, and more particularly to an object tracking device, method, and program that enable robust, accurate object tracking regardless of occlusion.

Many object tracking methods for surveillance camera footage and sports footage have been proposed. Most of them track objects using the color information of the video, on the assumption that certain shooting conditions are satisfied.

Meanwhile, demand for person detection and tracking using surveillance camera footage is growing rapidly in crime prevention and marketing applications, but color information is often difficult to use because of insufficient resolution or poor lighting conditions. In addition, from a privacy standpoint, the use of color information that can identify a person is restricted, and expectations are rising for techniques that track objects using only depth video, from which a person cannot be identified.

Patent Literature 1 discloses a technique that compares the position, size, and motion vector of each object between frames of a depth video and assigns the same ID to each object based on its position, size, and motion vector before and after the occurrence of occlusion.

Patent Literature 2 discloses a technique that extracts object curves along the left and right contours of each object detected in a depth video, determines the correspondence between objects based on the object curves of each object before the occlusion occurs and after it is resolved, and assigns the same ID to corresponding objects.

Patent Literature 3 discloses a technique in which, when an object detected in the current frame is an integrated object produced by occlusion, the parts of the integrated object that resemble the objects detected in the previous frame are split off as objects to be tracked, and each object is tracked between frames based on this similarity.

Patent Literature 1: Japanese Patent Application No. 2014-198941
Patent Literature 2: Japanese Patent Application No. 2015-38594
Patent Literature 3: Japanese Patent Application No. 2016-62076

Non-Patent Literature 1: S. Thrun and A. Bucken, "Integrating grid-based and topological maps for mobile robot navigation," Proceedings of the Thirteenth National Conference on Artificial Intelligence, pp. 944-950, 1996.
Non-Patent Literature 2: T. Bagautdinov, F. Fleuret, and P. Fua, "Probability Occupancy Maps for Occluded Depth Images," IEEE CVPR 2015.

Each of the above prior-art techniques detects objects based on the depth information obtained from the depth video. When the distance between the depth camera and an object is large, the depth information becomes unreliable, the object cannot be detected accurately, and tracking accuracy falls.

In addition, each of the above prior-art techniques detects objects based only on the depth information obtained in the camera view of the depth camera. Tracking accuracy therefore falls for objects that are hard to distinguish in the camera view because of the relative positional relationship between the depth camera and the objects.

An object of the present invention is to solve the above technical problems and to provide an object tracking device, method, and program that can track each object robustly and accurately regardless of occlusion and regardless of the relative positional relationship between the depth camera and the objects.

To achieve the above object, the present invention is characterized in that an object tracking device that tracks objects based on their depth video has the following configuration.
(1) The device comprises: a depth camera that captures an observation area containing objects and acquires a depth video in the camera view; means for acquiring, frame by frame from the depth video, a depth image in which each pixel has a depth value; means for detecting occlusion between objects based on the camera-view depth image; means for calculating, from the camera-view depth image, virtual depth images in a plurality of mutually different virtual views that differ from the camera view; means for identifying each object based on the depth image and the virtual depth images when occlusion between objects is detected; means for tracking each object based on the identification result; and means for displaying the tracking result.
(2) Virtual depth images of the top view, left side view, and right side view of the observation area are calculated from the camera-view depth image.
(3) A plurality of depth cameras are provided, and the device further comprises means for projecting the tracking results obtained by each depth camera onto a common coordinate system based on markers set in advance for the observation area of each depth camera; the means for displaying the tracking result displays the tracking results projected onto the common coordinate system.

According to the present invention, the following effects are achieved.

(1) Virtual depth images for a plurality of virtual views other than the camera view (for example, a top view and left and right side views) are generated from the depth image derived from the camera-view depth video, and each object is detected and identified based on the depth image and the virtual depth images. Each object can therefore be tracked accurately regardless of occlusion and regardless of the relative positional relationship between the depth camera and the objects.

(2) In each virtual view, objects in the previous frame are associated with objects in the current frame based not only on each object's position between frames but also on its depth distribution, so objects that move a large distance between frames can also be tracked accurately.

(3) Even when the observation area is so wide that objects are tracked with a plurality of depth cameras, the observation areas of the depth cameras are integrated based on overlap markers set in advance in each observation area, so an object can be tracked accurately even when it moves across observation areas.

FIG. 1 is a diagram schematically showing an example of an environment in which an object tracking device according to an embodiment of the present invention is installed.
FIG. 2 is a functional block diagram showing the configuration of the main part of the object tracking device.
FIG. 3 is a diagram showing an example of a depth image.
FIG. 4 is a diagram showing an example of a rectangular area circumscribing the contour of an object.
FIG. 5 is a diagram schematically showing a method of determining the occurrence of occlusion based on the ratio of the overlapping area of the rectangular areas of objects.
FIG. 6 is a diagram showing examples of the virtual depth image D^T in the top view, the virtual depth image D^L in the left side view, and the virtual depth image D^R in the right side view.
FIG. 7 is a diagram showing an example of a camera view.
FIG. 8 is a diagram showing an example of a top view.
FIG. 9 is a diagram showing an example of a left side view.
FIG. 10 is a diagram showing an example of a right side view.
FIG. 11 is a diagram showing a display example of the movement trajectories of objects.
FIG. 12 is a flowchart showing the operation of the object tracking method and tracking program according to an embodiment of the present invention.
FIG. 13 is a diagram showing an example of integrating two observation areas Q1 and Q2.
FIG. 14 is a diagram showing an example of integrating three observation areas Q1, Q2, and Q3.
FIG. 15 is a diagram showing an example of integrating four observation areas Q1, Q2, Q3, and Q4.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 schematically shows an example of an environment in which an object tracking device according to an embodiment of the present invention is installed; the description here takes as an example the case where two depth cameras cam (cam1, cam2) are installed.

The first depth camera cam1 captures the objects Obj in the first observation area Q1 and acquires a depth video in its camera view. The second depth camera cam2 captures the objects Obj in the second observation area Q2 and acquires a depth video in its camera view. Each of the observation areas Q1 and Q2 is rectangular, and markers m11 to m14 and m21 to m24, used for the perspective projection described later, are set in advance at their respective four corners.

The first and second observation areas Q1 and Q2 are arranged adjacent to each other so that they share one side. The markers at the two ends of that side, m13 and m23 at one end and m14 and m24 at the other, coincide, and the shared side is used as the overlap marker m_over. The overlap marker m_over is referenced to integrate the observation areas Q1 and Q2 so that tracking can continue without losing an object Obj even when it moves from one of the observation areas to the other.

FIG. 2 is a functional block diagram showing the configuration of the main part of the object tracking device, which tracks objects based on the depth videos captured by the depth cameras cam1 and cam2; components not needed to explain the present invention are omitted. In this embodiment the two depth cameras cam1 and cam2 perform object tracking independently, but because the tracking mechanism is identical, the description focuses on one depth camera cam.

As shown in FIG. 3, the object detection unit 10 acquires, frame by frame from the depth video, a depth image in which each pixel has a depth value, and detects objects in that depth image. In this embodiment, a background image captured with no objects present is stored in advance, and objects are detected from closed regions in which the difference between the depth image and the background image is at or above a predetermined threshold.
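
The background-difference detection described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation of the embodiment: the function name `detect_objects`, the `min_pixels` parameter, the list-of-lists image layout, and the use of a 4-connected flood fill to find closed regions are all hypothetical choices.

```python
from collections import deque


def detect_objects(depth, background, threshold, min_pixels=1):
    """Return one list of (row, col) pixels per connected foreground region.

    depth and background are equally sized 2D lists of depth values
    (an assumed layout, chosen for illustration).
    """
    rows, cols = len(depth), len(depth[0])
    # Foreground mask: depth differs from the stored background by >= threshold.
    mask = [[abs(depth[y][x] - background[y][x]) >= threshold
             for x in range(cols)] for y in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for y in range(rows):
        for x in range(cols):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over 4-connected neighbours.
            queue, region = deque([(y, x)]), []
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Discard regions smaller than min_pixels (assumed noise filter).
            if len(region) >= min_pixels:
                regions.append(region)
    return regions
```

Each returned region corresponds to one detected object candidate; a real system would likely add noise filtering beyond the simple `min_pixels` cut.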

As in the example shown in FIG. 4, the rectangular area setting unit 20 sets, for each object, a rectangular area K circumscribing its contour. The occlusion determination unit 30 includes an overlap area calculation unit 31 and determines whether occlusion is present based on the ratio of the area over which the rectangular areas K of the objects overlap.

FIG. 5 schematically shows a method of determining the occurrence of occlusion based on the ratio of the overlapping area of the rectangular areas K of the objects.

In this embodiment, the overlap between the rectangular areas of objects detected in the previous frame (f-1) and the rectangular areas of objects detected in the current frame (f) is evaluated. As shown in part (a) of FIG. 5, if the overlap-area ratios between each of the rectangular areas Ka and Kb of the objects Obja^f-1 and Objb^f-1 detected in the previous frame and the rectangular area Kc of the object Objc^f detected in the current frame are both 50 percent or more, the object Objc^f is judged to be an integrated object produced by occlusion of the two objects Obja^f-1 and Objb^f-1.

In contrast, as shown in part (b) of FIG. 5, even if overlap is detected between the rectangular areas Ka and Kb of the objects Obja^f-1 and Objb^f-1 detected in the previous frame and the rectangular area Kc of the object Objc^f detected in the current frame, the state is not judged to be occlusion when only one combination has an overlap-area ratio of 50 percent or more (here, only the combination of objects Obja^f-1 and Objd^f).

Likewise, as shown in part (c) of FIG. 5, even if the overlap-area ratio between the rectangular area Ka of the object Obja^f-1 detected in the previous frame and the rectangular area Kf of the object Objf^f detected in the current frame is 50 percent or more, the state is not judged to be occlusion when that is the only such overlap.
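
The three cases above amount to a simple rule: a current-frame rectangle is treated as an integrated object only when at least two previous-frame rectangles each overlap it by 50 percent or more. A minimal sketch follows. The text does not say which area the ratio is normalized by, so dividing by the previous-frame rectangle's area is an assumption, as are the function names and the `(x, y, w, h)` box layout.

```python
def overlap_ratio(prev_box, cur_box):
    """Fraction of prev_box's area covered by cur_box; boxes are (x, y, w, h).

    Normalizing by the previous-frame rectangle is an assumption.
    """
    px, py, pw, ph = prev_box
    cx, cy, cw, ch = cur_box
    inter_w = max(0, min(px + pw, cx + cw) - max(px, cx))
    inter_h = max(0, min(py + ph, cy + ch) - max(py, cy))
    return (inter_w * inter_h) / (pw * ph)


def find_occlusions(prev_boxes, cur_boxes, min_ratio=0.5):
    """Map each current-frame object ID to the previous-frame IDs merged into it.

    prev_boxes and cur_boxes are dicts of ID -> (x, y, w, h). A current-frame
    object is reported only if two or more previous-frame rectangles reach
    min_ratio, matching case (a); cases (b) and (c) yield no entry.
    """
    merged = {}
    for cur_id, cur_box in cur_boxes.items():
        hits = [prev_id for prev_id, prev_box in prev_boxes.items()
                if overlap_ratio(prev_box, cur_box) >= min_ratio]
        if len(hits) >= 2:
            merged[cur_id] = hits
    return merged
```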

Returning to FIG. 2, as shown in FIG. 6, the perspective projection unit 40 uses the positions of the four markers m11 to m14 in the camera coordinate system, together with the intrinsic and extrinsic parameters of the depth camera cam, to projectively transform the camera-view depth image D^C into a virtual depth image D^T in the top view, and further into a virtual depth image D^L in the left side view and a virtual depth image D^R in the right side view. Each depth image D is expressed by equation (1) as the set of depth values d at the pixel positions, where the subscripts x, y of each depth value d denote the pixel's coordinates.

In this embodiment, given the matrix A of equation (2), which is determined by the four markers m11 to m14 set in the observation area Q, the transformation matrix M is calculated from the matrix A' of equation (3), which yields the top view of the observation area Q.

To simplify the calculation, x₀' = x₂' = y₀' = y₁' = 0, and because the observation area has a preset rectangular shape, x₁' = x₃' = y₂' = y₃'. As a result, the matrix A' is expressed by equation (4), where X' = Y'.

The perspective mapping between the matrices A and A' is carried out by the calculation of equation (5).

Since x_i, y_i and x'_i, y'_i are known, the matrix coefficients c_ij can be calculated by solving the resulting linear system with LU decomposition. With c₂₂ = 1, the transformation matrix M is expressed by equation (6).
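
As a rough illustration of how such a transformation matrix can be obtained, the sketch below solves for the eight unknown coefficients c_ij (with c₂₂ fixed to 1) from four point correspondences. It assumes the standard perspective mapping x' = (c₀₀x + c₀₁y + c₀₂)/(c₂₀x + c₂₁y + c₂₂), y' = (c₁₀x + c₁₁y + c₁₂)/(c₂₀x + c₂₁y + c₂₂), which may differ in detail from equation (5). It also uses plain Gaussian elimination with partial pivoting in place of an explicit LU factorization. All function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def perspective_matrix(src, dst):
    """3x3 matrix mapping four src points (x, y) to four dst points, c22 = 1."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        b.append(v)
    c = solve_linear(a, b)
    return [c[0:3], c[3:6], [c[6], c[7], 1.0]]


def apply_perspective(m, x, y):
    """Map one point through the matrix, dividing by the homogeneous term."""
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return ((m[0][0] * x + m[0][1] * y + m[0][2]) / w,
            (m[1][0] * x + m[1][1] * y + m[1][2]) / w)
```

For example, the marker positions seen in the camera view would be passed as `src` and the corners `(0, 0)`, `(X', 0)`, `(0, Y')`, `(X', Y')` as `dst`, following the simplification x₀' = x₂' = y₀' = y₁' = 0 stated above.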

For simplicity, the perspective views from the left and right sides of the observation area are obtained with the transformation matrix M by determining the matrices A^L and A^R of equations (7) and (8), respectively.

The ID assignment execution unit 50 includes an integrated object dividing unit 51, an object candidate detection unit 52, a similarity parameter calculation unit 53, and an object association unit 54. When occlusion between objects is detected based on the camera-view depth image D^C, it detects objects in each view based on the virtual depth images D^T, D^L, and D^R of the top view, left side view, and right side view of the observation area.

The identity of each object between the previous frame (f-1) and the current frame (f) is then judged from its position and depth distribution, and the most likely matching objects are given the same ID, so that each object continues to be tracked during occlusion.

The integrated object dividing unit 51 splits off, from an object in which multiple objects have merged through occlusion (an integrated object), each overlap range whose depth-value distribution resembles that of an object before occlusion, as an object to be tracked.

In this embodiment, based on the observation that an integrated object produced by occlusion of n objects contains n regions with different depth distributions, the integrated object is divided into multiple parts according to the depth value of each pixel, and these parts become the new objects to be tracked.

First, a depth model x expressed by equations (9) to (12) is generated for each pre-integration object detected in the previous frame and for each overlap range in the current frame. Here, the variable a is a function of the width w and height h of the rectangular area circumscribing each pre-integration object or overlap range, the variable c is the center position (coordinates) of the width w and height h, and the variable d is a function of the depth value of each pixel.

Next, to judge the similarity between the depth distribution of each pre-integration object and that of each overlap range in the current frame, the Bhattacharyya distance B (a distance measure) of equation (13) is calculated.

Here, σ_f and μ_f are the variance and arithmetic mean (median) of the depth values of the overlap range in the current frame, and σ_f-1 and μ_f-1 are the variance and arithmetic mean (median) of the depth values of the object detected in the previous frame. Among the overlap ranges in the current frame, those with a small Bhattacharyya distance B to a pre-integration object detected in the previous frame are split off from the integrated object and become new objects to be tracked.
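
To make the comparison concrete, the sketch below computes the Bhattacharyya distance between two depth distributions summarized by their mean and variance. It uses the well-known closed form for two univariate Gaussians; whether equation (13) has exactly this form is an assumption, and the function names are hypothetical.

```python
import math


def depth_stats(depths):
    """Arithmetic mean and (population) variance of a list of depth values."""
    mean = sum(depths) / len(depths)
    var = sum((d - mean) ** 2 for d in depths) / len(depths)
    return mean, var


def bhattacharyya_distance(mean_p, var_p, mean_q, var_q):
    """Bhattacharyya distance between two univariate Gaussians.

    Both variances must be positive. A smaller value means the two depth
    distributions are more alike; identical distributions give 0.
    """
    var_term = 0.25 * math.log(0.25 * (var_p / var_q + var_q / var_p + 2.0))
    mean_term = 0.25 * (mean_p - mean_q) ** 2 / (var_p + var_q)
    return var_term + mean_term
```

In use, `depth_stats` would be applied to the depth values of a previous-frame object and of each overlap range in the current frame, and the range with the smallest distance would be the one split off for that object.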

The object candidate detection unit 52 detects objects based on the virtual depth images D^T, D^L, and D^R of the respective views. Here, objects are detected by clustering the depth values in each view, and the rectangular area circumscribing each detected object is taken as an object candidate area.

The similarity parameter calculation unit 53 calculates, for each object detected in the previous frame, a similarity parameter for its pairing (object pair) with each object detected in the current frame, and uses it as an index of the identity of objects across consecutive frames.

FIGS. 7 to 10 illustrate how the similarity parameter calculation unit 53 calculates the similarity parameter, taking as an example the case where two objects Obj1 and Obj2 stand substantially parallel to the depth camera cam.

FIG. 7 shows an example of a camera view. In this embodiment, the observation area Q enclosed by the four markers is quantized into a grid and divided in advance into N×N blocks. This area division can be performed using the Occupancy Grid Map technique disclosed in Non-Patent Literature 1 and 2.

In this embodiment, the depth camera cam is placed at a height of 2.5 m above the ground, the frame size is VGA resolution (640×480), and the area is divided with, for example, N = 50. Each block in which one of the two occluding objects Obj1 and Obj2 is located is recognized as a block occupied by that object (an occupied block).
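
The idea of an occupied block can be sketched as a lookup from an object position to a grid cell. The sketch assumes the position is already expressed in a square, rectified coordinate frame of side `area_size` covering the observation area; the text does not specify how the grid is laid out in the camera view, so this frame, the function name, and the clamping at the border are assumptions.

```python
def occupied_block(x, y, area_size, n=50):
    """Return the (column, row) block containing point (x, y).

    The observation area is assumed to span [0, area_size] on both axes and
    to be split into n x n equal blocks; points on the far edge are clamped
    into the last block.
    """
    block = area_size / n
    col = min(int(x // block), n - 1)
    row = min(int(y // block), n - 1)
    return col, row
```

Two positions in consecutive frames fall in the same occupied block exactly when this function returns the same pair for both.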

FIG. 8 shows the objects Obj1 and Obj2, and their positions, detected based on the top-view virtual depth image D^T obtained by projectively transforming the camera-view depth image D^C of FIG. 7; the blocks occupied by the objects Obj1 and Obj2 are hatched.

FIG. 9 shows the objects and their positions detected based on the left-side-view virtual depth image D^L obtained by projectively transforming the camera-view depth image D^C of FIG. 7, and FIG. 10 shows the objects and their positions detected based on the right-side-view virtual depth image D^R obtained by projectively transforming the camera-view depth image D^C of FIG. 7.

In this embodiment, once the positions and depth values of the object candidates have been detected for each virtual view as described above, a similarity parameter Δ(π_(i)^f, π_i^f-1) is calculated by equation (14) for each object Obj_i detected in the previous frame (f-1) (i is an object identifier) against each object candidate Obj_(i) detected in the current frame (f), as the sum of the distance between the objects and the differences between their depth distributions.

The first term on the right-hand side of equation (14), B(p^f, p^f-1), is the distance between the object's position in the current frame (f) and its position in the previous frame (f-1), and is obtained by equation (15).

Here, |p^f, p^f-1| is the Euclidean distance between the position p^f of the object candidate Obj_(i) in the current frame and the position p^f-1 of the object Obj_i in the previous frame, for example the Euclidean distance between the centroids of the object regions circumscribing the objects.

If, in the camera view, the occupied block b(p^f) of the object candidate Obj_(i) in the current frame and the occupied block b(p^f-1) of the object Obj_i in the previous frame are the same [b(p^f) = b(p^f-1)], the distance B(p^f, p^f-1) is obtained as the camera-view Euclidean distance |p^f, p^f-1| between the objects Obj_i and Obj_(i) multiplied by a predetermined coefficient α.

If, in the camera view, the occupied block b(p^f) of the object candidate Obj_(i) in the current frame and the occupied block b(p^f-1) of the object Obj_i in the previous frame are not the same [else], the distance B(p^f, p^f-1) is the sum of the Euclidean distance |p^f, p^f-1| in the top view, the Euclidean distance |p^Lf, p^Lf-1| in the left side view, and the Euclidean distance |p^Rf, p^Rf-1| in the right side view of the objects Obj_i and Obj_(i), multiplied by the predetermined coefficient α.

Each Γ(D^f, D^f-1) in the second term on the right-hand side of equation (14) is the difference between the depth distribution of the object candidate Obj_(i) in the current frame and the depth distribution of the object Obj_i in the previous frame, as detected in each view. In this embodiment it is obtained as a Bhattacharyya distance by equation (16).

Here too, σ_f and μ_f are the variance and arithmetic mean (median) of the depth values of the overlap range in the current frame, and σ_f-1 and μ_f-1 are the variance and arithmetic mean (median) of the depth values of the object detected in the previous frame. Among the objects in the current frame, the one with a small Bhattacharyya distance to a pre-integration object detected in the previous frame is regarded as the same object as that pre-integration object.

That is, Γ(D^Tf, D^Tf-1) is the distance between the depth distribution of the object candidate Obj_(i) in the current frame and the depth distribution of the object Obj_i in the previous frame, as detected in the top view. Similarly, Γ(D^Lf, D^Lf-1) is the corresponding distance between depth distributions detected in the left side view, and Γ(D^Rf, D^Rf-1) is the corresponding distance detected in the right side view.
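
Putting the position term and the three depth terms together, the similarity parameter can be sketched as below. The sketch assumes the parameter is the plain sum of the position distance and the top, left, and right depth-distribution distances, and that each depth distribution is summarized by a (mean, variance) pair compared with the Gaussian form of the Bhattacharyya distance. The dictionary layout and all names are hypothetical.

```python
import math

VIEWS = ("top", "left", "right")


def euclid(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def gaussian_bhattacharyya(stats_p, stats_q):
    """Bhattacharyya distance between two (mean, variance) summaries."""
    (mean_p, var_p), (mean_q, var_q) = stats_p, stats_q
    return (0.25 * math.log(0.25 * (var_p / var_q + var_q / var_p + 2.0))
            + 0.25 * (mean_p - mean_q) ** 2 / (var_p + var_q))


def similarity(cur, prev, alpha=1.0):
    """Similarity parameter between a current candidate and a previous object.

    cur and prev are dicts with keys:
      "block": occupied block in the camera view,
      "pos":   positions keyed by "cam", "top", "left", "right",
      "depth": (mean, variance) of depth values keyed by "top", "left", "right".
    Smaller values indicate a more likely match.
    """
    if cur["block"] == prev["block"]:
        # Same occupied block: camera-view distance only.
        position = alpha * euclid(cur["pos"]["cam"], prev["pos"]["cam"])
    else:
        # Different block: sum of the distances in the three virtual views.
        position = alpha * sum(euclid(cur["pos"][v], prev["pos"][v]) for v in VIEWS)
    depth = sum(gaussian_bhattacharyya(cur["depth"][v], prev["depth"][v]) for v in VIEWS)
    return position + depth
```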

In this embodiment, when occlusion is detected, the above calculation is performed for every object detected in the previous frame (f-1), over all combinations with the objects detected in the current frame (f).

The object associating unit 53 assigns, to the object whose similarity parameter is smallest, that is, the object (i) satisfying the following equation (17), the ID that was assigned to the corresponding object in the previous frame.
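The selection rule can be sketched as follows. Equations (14) and (17) are not reproduced in this text, so the sketch assumes only what the description states: a smaller similarity parameter means a better match, and each previous-frame object hands its ID to the candidate with the minimum value. The name `assign_ids` and the dictionary layout are hypothetical, and resolving the case where two previous objects pick the same candidate is outside what the text specifies.

```python
def assign_ids(scores):
    """Pick, for each previous-frame ID, the candidate with the smallest parameter.

    scores[prev_id][candidate_index] is the similarity parameter of that pair
    (smaller means more similar). Returns {prev_id: candidate_index}.
    """
    assignment = {}
    for prev_id, row in scores.items():
        # min over candidate indices, keyed by their similarity parameter
        assignment[prev_id] = min(row, key=row.get)
    return assignment


scores = {
    "A": {0: 0.9, 1: 0.2},
    "B": {0: 0.1, 1: 0.7},
}
print(assign_ids(scores))
```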

Thus, in this embodiment, the depth information of a single camera view is projectively transformed into a top view and left and right side views, which yields depth information viewing the objects from several different directions. Each object can therefore be identified whatever the positional relationship in which the objects enter the occlusion state.

Returning to FIG. 2, the flow line display unit 60 regards objects assigned the same ID across frames as the same object and displays the movement trajectory of each object on a display, as shown in FIG. 11.

In this embodiment, the 2D planes of the two observation areas Q1 and Q2 are integrated based on the overlap marker m_over. The tracking points of camera cam1 and camera cam2 are then projected onto the integrated plane and output for display as a single flow line (trajectory) per object.

FIG. 12 is a flowchart showing the operation of an object tracking method and tracking program according to one embodiment of the present invention; each of the processes described above is executed in the corresponding step.

In step S1, a depth video in which each pixel has a depth value D is captured frame by frame for each depth camera cam. In step S2, the difference data between each frame image and a previously generated background image is projected onto a 2D coordinate system.

In step S3, clustering based on depth values is performed pixel by pixel on the 2D coordinates. In step S4, all objects in the current frame are detected based on the clustering result.
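The text does not say which clustering algorithm is used in step S3. As one possible illustration only, the sketch below groups 4-connected foreground pixels whose neighbouring depth values differ by at most a tolerance; the function name `cluster_by_depth`, the tolerance `tol`, and the use of `None` for background pixels are all assumptions.

```python
from collections import deque


def cluster_by_depth(depth, tol):
    """Label foreground pixels by flood fill over similar neighbouring depths.

    depth: 2-D list of depth values, with None marking background pixels.
    Returns a 2-D list of labels of the same shape (0 = background).
    """
    rows, cols = len(depth), len(depth[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if depth[r][c] is None or labels[r][c]:
                continue
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if not (0 <= ny < rows and 0 <= nx < cols):
                        continue
                    if depth[ny][nx] is None or labels[ny][nx]:
                        continue
                    # join the neighbour only if its depth is close enough
                    if abs(depth[ny][nx] - depth[y][x]) <= tol:
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels
```

Each distinct nonzero label then corresponds to one detected object in the sense of step S4.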

In step S5, the similarity between each object detected in the previous frame and each object detected in the current frame is calculated for all pairs, and the objects with the higher similarity are associated with each other.

In step S6, a rectangular area K circumscribing each object is set, and the ratio of the overlapping area between rectangular areas is calculated. In step S7, objects whose overlap ratio satisfies a predetermined condition are determined to be in an occlusion relationship.
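Steps S6 and S7 can be illustrated with a small sketch. The text gives neither the denominator of the ratio nor the predetermined condition, so the sketch assumes the intersection area divided by the smaller rectangle's area and a simple threshold; `overlap_ratio`, `is_occluded`, and the default `threshold` are hypothetical.

```python
def overlap_ratio(a, b):
    """Overlap ratio of two rectangles given as (x_min, y_min, x_max, y_max).

    Assumption: the ratio is intersection area / area of the smaller rectangle.
    """
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    if w <= 0 or h <= 0:
        return 0.0  # the rectangles do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return (w * h) / min(area_a, area_b)


def is_occluded(a, b, threshold=0.5):
    """Assumed condition: occlusion when the overlap ratio reaches the threshold."""
    return overlap_ratio(a, b) >= threshold
```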

For an object of the current frame that this occlusion determination finds not to be in an occlusion state, processing proceeds to step S8, where the object is given the ID of the previous-frame object with which its similarity was high. For an object found to be in an occlusion state (an integrated object), processing proceeds to step S9, where the integrated object is divided based on the depth value of each pixel.

In step S10, the depth information D^C of the camera view is projectively transformed to generate virtual depth information D^T for the top view, D^L for the left side view, and D^R for the right side view.
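The transformation itself is not given in this part of the text. The sketch below is a simplified stand-in rather than the patent's projective transformation: it back-projects the camera-view depth image to 3-D points with an assumed pinhole model (intrinsics `fx`, `fy`, `cx`, `cy`) and then builds an orthographic virtual view along one axis, keeping the nearest point per grid cell. All function and parameter names are hypothetical.

```python
def camera_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (2-D list, None = no data) to (X, Y, Z) points."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z is not None:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def virtual_depth(points, axis, direction, cell):
    """Orthographic virtual depth map looking along one axis.

    axis: 1 (Y) for a top view, 0 (X) for a side view.
    direction: +1 or -1, selecting which side the virtual camera looks from.
    Returns {(i, j): depth}, keeping the nearest point in each grid cell.
    """
    others = [k for k in range(3) if k != axis]
    origin = min(direction * p[axis] for p in points)
    view = {}
    for p in points:
        key = (int(p[others[0]] // cell), int(p[others[1]] // cell))
        d = direction * p[axis] - origin
        if key not in view or d < view[key]:
            view[key] = d
    return view
```

Under these assumptions, calling `virtual_depth` with `axis=1` would play the role of D^T, and `axis=0` with `direction` set to +1 and -1 would play the roles of the two side views.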

In step S11, objects are detected based on the depth information of each view. Here too, each object is detected by clustering the depth values in each view, and the rectangular area circumscribing each object is taken as an object candidate area.

In step S12, for each object detected in the previous frame, the similarity parameter with respect to every object detected in the current frame is calculated based on equations (14), (15), and (16) above. In step S13, the same ID is assigned to the current-frame object satisfying equation (17) above, that is, the object with the smallest similarity parameter.

In step S14, the two observation areas Q1 and Q2 are integrated onto a single coordinate system based on the overlap marker m_over. In step S15, as shown in FIG. 13, the tracking points of camera cam1 and camera cam2 are projected onto the integrated coordinate system and output for display as a single flow line (trajectory) per object.
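As a rough illustration of steps S14 and S15, the sketch below moves camera cam2's tracking points into Q1's plane by aligning the position of a shared overlap marker. It assumes the two 2D planes have the same scale and orientation, so that a pure translation suffices; the text does not state this, and `merge_tracks` and its parameters are hypothetical.

```python
def merge_tracks(track_cam1, track_cam2, marker_in_q1, marker_in_q2):
    """Concatenate two tracks after translating cam2's points into Q1's plane.

    marker_in_q1 / marker_in_q2: (x, y) of the same overlap marker as seen
    in each observation area's own 2D coordinates.
    Assumption: equal scale and orientation, so only a translation is needed.
    """
    dx = marker_in_q1[0] - marker_in_q2[0]
    dy = marker_in_q1[1] - marker_in_q2[1]
    shifted = [(x + dx, y + dy) for x, y in track_cam2]
    return list(track_cam1) + shifted


track = merge_tracks([(1, 1), (2, 1)], [(0, 1), (1, 1)], (3, 1), (0, 1))
print(track)
```

If the planes differed in rotation or scale, more than one marker and a full 2D similarity or homography estimate would be needed.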

According to this embodiment, virtual depth images of the top view and the left and right side views are generated from the depth image of the camera view, and each object is detected and identified based on the depth image and the virtual depth images. Each object can therefore be tracked accurately before and after occlusion, regardless of the relative positional relationship between the depth camera and the objects.

Also, according to this embodiment, the objects of the previous frame are associated with the objects of the current frame in each virtual view based on the position of each object across frames and on the depth distribution of each object, so even an object that moves a large amount between frames can be tracked accurately.

Furthermore, according to this embodiment, even when the observation area is so wide that objects are tracked with multiple depth cameras, the observation areas of the depth cameras are integrated based on the overlap markers m_over set in advance in each observation area, so an object can be tracked accurately even when it moves across observation areas.

In the embodiment above, the case in which two rectangular observation areas Q1 and Q2 are integrated so as to share one side was described as an example, but the present invention is not limited to this. As shown in FIG. 14, three observation areas Q1, Q2, and Q3 may be integrated using overlap markers m_over. Alternatively, as shown in FIG. 15, four observation areas Q1, Q2, Q3, and Q4, or more, may be integrated using overlap markers m_over.

Furthermore, the embodiment above was described as acquiring depth information for three virtual views from the depth information of the camera view: virtual depth information D^T for the top view, D^L for the left side view, and D^R for the right side view. The present invention is not limited to this, however; virtual depth information may be acquired for at least two virtual views other than the camera view, and each occlusion may be identified view by view.

Description of symbols: 10: object detection unit; 20: rectangular area setting unit; 30: occlusion determination unit; 31: overlapping area calculation unit; 40: perspective projection unit; 50: ID assignment execution unit; 51: integrated object division unit; 52: object candidate detection unit; 53: similarity parameter calculation unit; 54: object associating unit; 60: flow line display unit

Claims

1. An object tracking device that tracks objects based on a depth image thereof, comprising:
a depth camera that captures an observation area including the objects and obtains a depth video of the camera view;
means for acquiring from the depth video, in frame units, a depth image in which each pixel has a depth value;
means for detecting occlusion between objects based on the depth image of the camera view;
means for calculating, based on the depth image of the camera view, virtual depth images in a plurality of mutually different virtual views that differ from the camera view;
means for identifying each object, when occlusion between objects is detected, based on the depth image and each virtual depth image;
means for tracking each object based on the identification result; and
means for displaying the result of the tracking.

2. The object tracking device according to claim 1, wherein the means for calculating the virtual depth images calculates the virtual depth image of each virtual view by projectively transforming the depth image of the camera view.

3. The object tracking device according to claim 1 or 2, wherein the means for calculating the virtual depth images calculates virtual depth images of a top view, a left side view, and a right side view of the observation area based on the depth image of the camera view.

4. The object tracking device according to claim 1, wherein the means for identifying each object comprises:
means for detecting, in frame units, the position and depth distribution of each object in each view based on the depth image and each virtual depth image; and
means for calculating, for each object detected in the previous frame, a similarity parameter for its pair with each object detected in the current frame, based on the distance between the paired objects and the difference between their depth distributions,
and wherein the means for tracking each object assigns the same ID to the maximum-likelihood object pair based on the similarity parameter.

5. The object tracking device according to claim 1, comprising a plurality of depth cameras,
and further comprising means for projecting the tracking results obtained by the respective depth cameras onto the same coordinate system, based on a marker set in advance for the observation area of each depth camera,
wherein the means for displaying the result of the tracking displays the tracking results projected onto the same coordinate system.

6. An object tracking method for tracking objects based on a depth image thereof, comprising:
capturing an observation area including the objects to obtain a depth video of the camera view;
acquiring from the depth video, in frame units, a depth image in which each pixel has a depth value;
detecting occlusion between objects based on the depth image of the camera view;
calculating, when occlusion between objects is detected, virtual depth images in a plurality of mutually different virtual views that differ from the camera view, based on the depth image of the camera view;
identifying each object based on the depth image and each virtual depth image;
tracking each object based on the identification result; and
displaying the result of the tracking.

7. The object tracking method according to claim 6, wherein the depth image of the camera view is projectively transformed to calculate a virtual depth image of each virtual view.

8. The object tracking method according to claim 6, wherein each virtual depth image of a top view, a left side view, and a right side view of the observation area is calculated based on the depth image of the camera view.

9. The object tracking method according to claim 6, wherein identifying each object comprises:
detecting, in frame units, the position and depth distribution of each object in each view based on the depth image and each virtual depth image; and
calculating, for each object detected in the previous frame, a similarity parameter for its pair with each object detected in the current frame, based on the distance between the paired objects and the difference between their depth distributions,
and wherein the same ID is assigned to the maximum-likelihood object pair based on the similarity parameter.

10. The object tracking method according to any of the above, wherein a plurality of depth cameras are provided,
and the tracking results obtained by the respective depth cameras are projected onto the same coordinate system, based on a marker set in advance for the observation area of each depth camera, and displayed.

11. An object tracking program for causing a computer to execute the object tracking method according to any one of claims 6 to 10.