JP6628494B2

JP6628494B2 - Apparatus, program, and method for tracking object using discriminator learning based on real space information

Info

Publication number: JP6628494B2
Application number: JP2015085269A
Authority: JP
Inventors: 有希永井; 小林　達也; 達也小林; 智史上野; 有哉巻渕
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2015-04-17
Filing date: 2015-04-17
Publication date: 2020-01-08
Anticipated expiration: 2035-04-17
Also published as: JP2016206795A

Description

The present invention relates to an object tracking technique that analyzes a time-series group of images acquired from a camera capable of photographing an object, and thereby tracks the object.

For purposes such as surveillance and marketing, techniques have been developed for tracking the real-space position of a moving object using a time-series group of images captured by a camera. The objects to be tracked can be anything that can be photographed, such as persons and vehicles.

Such object tracking techniques generally track the image region corresponding to the target object in the images in which it appears, and convert this image region into a real-space position, thereby tracking the position of the object in real space. When one point in a two-dimensional image region is projected into three-dimensional real space, the height of the corresponding real-space point, that is, its z coordinate, must be fixed to a predetermined value.
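The need to fix the height can be made concrete with a small sketch. The following is an illustration only, assuming a pinhole camera described by a 3x4 projection matrix `P` obtained by calibration; the names `P`, `project`, and `backproject_to_height` are hypothetical and do not come from the patent.

```python
import numpy as np

def project(P, x, y, z):
    """Project the world point (x, y, z) to pixel coordinates (u, v)."""
    q = np.asarray(P, dtype=float) @ np.array([x, y, z, 1.0])
    return q[0] / q[2], q[1] / q[2]

def backproject_to_height(P, u, v, z):
    """Return (x, y) of the world point at fixed height z that maps to pixel (u, v).

    A pixel only defines a viewing ray, so the height z must be supplied
    to pick a single point on that ray.
    """
    P = np.asarray(P, dtype=float)
    # s * [u, v, 1]^T = P @ [x, y, z, 1]^T with unknowns x, y and scale s:
    # P[:,0]*x + P[:,1]*y - [u, v, 1]*s = -(P[:,2]*z + P[:,3])
    A = np.column_stack((P[:, 0], P[:, 1], -np.array([u, v, 1.0])))
    b = -(P[:, 2] * z + P[:, 3])
    x, y, _scale = np.linalg.solve(A, b)
    return x, y
```

Calling `backproject_to_height` for the same pixel with two different values of `z` gives two different floor positions, which is why a wrong height assumption translates directly into a position error.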

For example, a location where the tracking result shows that the object clearly touches the floor or ground in the image, such as the position of the feet, can be projected onto the floor surface in real space with its height set to zero. In practice, however, it is not easy to keep identifying the location where the object touches the floor or ground in the image. In captured images, that location is quite often hidden behind other objects such as a desk, a table, a person, or a car.

To address this projection problem, Patent Document 1, for example, exploits the fact that a person's head is less likely than the feet to be hidden in an image. It discloses a technique that detects the head when the position of the person's feet in the image is unknown, and projects the point indicating the head position in the image into real space by assigning it a preset average body height as its height.

Patent Document 2 discloses a technique that photographs an object from a plurality of viewpoints and estimates the location where the object contacts the road surface based on the plurality of images taken from different viewpoints.

Patent Document 1: JP 2014-229068 A
Patent Document 2: JP 2014-194361 A

However, the conventional techniques described in Patent Documents 1 and 2 have the problem that it is difficult to keep tracking an object while maintaining high positional accuracy in real space.

For example, with a method that presets and uses an average object height, as in Patent Document 1, the estimated real-space position deviates greatly from the correct position when the height of the tracked object differs greatly from the average height or when the shape of the object changes. When tracking a child, for instance, the difference between the child's actual height and the preset average height is large, so projecting the head position in the image into real space yields a position far from the true head position. Furthermore, setting the height to an average body height assumes that the tracked person is always standing upright. Consequently, when the tracked person changes shape, for example by sitting or bowing, the estimated position deviates greatly.
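The size of this deviation can be illustrated with similar triangles. The following is a simplified sketch that is not taken from the patent: it assumes a camera mounted at height H above a flat floor, a head at true height h and horizontal distance d from the camera, and a preset height h_a. All numeric values are hypothetical.

```latex
% Ray from the camera (height H) through the head (height h, distance d),
% intersected with the plane z = h_a instead of the true plane z = h:
\[
  d' = d\,\frac{H - h_a}{H - h},
  \qquad
  d' - d = d\,\frac{h - h_a}{H - h}.
\]
% Hypothetical values: H = 3.0 m, h_a = 1.7 m, a child with h = 1.2 m, d = 5.0 m
\[
  d' = 5.0 \times \frac{3.0 - 1.7}{3.0 - 1.2} \approx 3.61\ \mathrm{m},
  \qquad
  d' - d \approx -1.39\ \mathrm{m}.
\]
```

Under these assumptions, a height error of 0.5 m places the child about 1.4 m closer to the camera than the true position.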

With a method that uses images from a plurality of viewpoints, as in Patent Document 2, a location in contact with the floor or ground is indeed more likely to appear in one of the images than with a monocular camera. However, a situation in which that location is surrounded by other moving objects and is not captured by any camera can easily occur. That is, even with multiple viewpoints there is no guarantee that the location in contact with the floor or ground is visible to any camera. In addition, because a plurality of cameras must always be used, installation and operating costs are higher than with a monocular camera.

Furthermore, the techniques of Patent Documents 1 and 2 both track by considering the amount of movement of the object region within the image, and therefore do not consider the amount of movement in real space. As a result, when an error arises in the estimated position in the image, the amount of movement in real space can show an abrupt change that is almost impossible in reality, even if the amount of movement in the image is small.

Accordingly, an object of the present invention is to provide an apparatus, a program, and a method capable of tracking an object while maintaining high positional accuracy in real space, using an acquired group of images.

According to the present invention, there is provided an object tracking apparatus capable of tracking an object to be tracked using a time-series group of images acquired from one or more cameras capable of photographing the object, the apparatus comprising:
candidate information calculation means for adopting, as object motion information, which is information including position information on the position of the object in real space at one time point, at least the change from the previous time point in the position of the object in real space, and for calculating a plurality of pieces of candidate object motion information, which are pieces of object motion information at the one time point that differ from one another in the change; and
object tracking means for acquiring the moment-to-moment position information of the object in real space by means of a discriminator that learns from a data set including image information on an acquired image and the object motion information regarded as correct, wherein the discriminator applies the plurality of pieces of input candidate object motion information and the image information on the image at the one time point to an evaluation function having a term relating to a probability density function whose variable is the change in the position of the object in real space and a term that evaluates the closeness in appearance between the image region relating to the object and the image region calculated from the candidate object motion information, and outputs the candidate object motion information that maximizes the score of the evaluation function as the correct position information on the position of the object in real space at the one time point.
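The selection step described in the paragraph above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the Gaussian form of the density, the use of its logarithm, the negative squared distance used as appearance similarity, and all names (`motion_log_density`, `appearance_similarity`, `select_candidate`, `w_motion`, `w_appear`, `sigma`) are assumptions made for the example.

```python
import numpy as np

def motion_log_density(dp, sigma):
    """Log of an isotropic 2D Gaussian density over the displacement dp (assumed form)."""
    dp = np.asarray(dp, dtype=float)
    return -0.5 * float(dp @ dp) / sigma**2 - np.log(2.0 * np.pi * sigma**2)

def appearance_similarity(template, feature):
    """Negative squared distance between two feature vectors (assumed similarity)."""
    diff = np.asarray(template, dtype=float) - np.asarray(feature, dtype=float)
    return -float(diff @ diff)

def select_candidate(candidates, features, template, w_motion, w_appear, sigma):
    """Score every candidate displacement and return (index of the best, all scores).

    candidates: real-space displacements (dp_x, dp_y), one per candidate
    features:   appearance feature vector of the image region implied by each candidate
    template:   appearance feature vector of the tracked object's image region
    """
    scores = [
        w_motion * motion_log_density(dp, sigma)
        + w_appear * appearance_similarity(template, f)
        for dp, f in zip(candidates, features)
    ]
    return int(np.argmax(scores)), scores
```

In this sketch the motion term penalizes candidates whose real-space displacement is implausibly large, so a candidate that matches the appearance well but implies a jump of several meters in one time step loses to a nearby candidate with equally good appearance.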

In another embodiment of the object tracking apparatus according to the present invention, it is also preferable that the candidate information calculation means adopts, as the object motion information at the one time point, the change from the previous time point in the position of the object in real space and the change from the previous time point in the height of the object, and calculates a plurality of pieces of candidate object motion information at the one time point that differ in at least one of the changes, and
that the discriminator of the object tracking means applies the plurality of pieces of input candidate object motion information and the image information on the image at the one time point to an evaluation function having
(a) a term relating to a probability density function whose variable is the change in the position of the object in real space,
(b) a term relating to a probability density function whose variable is the change in the height of the object, and
(c) a term that evaluates the closeness in appearance between the image region relating to the object and the image region calculated from the candidate object motion information,
and outputs the candidate object motion information that maximizes the score of the evaluation function as the correct information on the position of the object in real space and the height of the object at the one time point.

In still another embodiment of the object tracking apparatus according to the present invention, it is also preferable that the candidate information calculation means adopts, as the object motion information at the one time point, the change from the previous time point in the position of the object in real space and the change from the previous time point in the height of the object, and calculates a plurality of pieces of candidate object motion information at the one time point that differ in at least one of the changes, and
that the discriminator of the object tracking means applies the plurality of pieces of input candidate object motion information and the image information on the image at the one time point to an evaluation function having
(a) a term relating to a probability density function whose variable is the change in the position of the object in real space,
(b) a term relating to a probability density function whose variable is the change in the height of the object,
(c) a term that evaluates the degree to which the change caused by the motion of the object in the image region relating to the object matches the change given by the object motion information, and
(d) a term that evaluates the closeness in appearance between the image region relating to the object and the image region calculated from the candidate object motion information,
and outputs the candidate object motion information that maximizes the score of the evaluation function as the correct information on the position of the object in real space and the height of the object at the one time point.

In each of the embodiments described above, it is also preferable that the candidate information calculation means further adopts, as the object motion information at the one time point, the change from the previous time point in the tilt of the object, and
that the discriminator of the object tracking means outputs the candidate object motion information that maximizes the score of the evaluation function as information that also includes the correct information on the tilt of the object at the one time point.

In each of the embodiments described above, it is also preferable that the apparatus further comprises an object detection unit that detects the object based on an acquired image, calculates the ground contact position of the object, as the position of the object in real space, based on the lowest position of the image region of the detected object, and calculates the height of the object based on the calculated ground contact position and a real-space position calculated from the highest position of that image region.
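One way such a height calculation could work is sketched below. The patent text in this section does not spell out the geometry, so this is an assumed construction: the object is taken to stand vertically on a flat floor (z = 0), both the lowest and the highest pixel are projected onto the floor, and the height follows from similar triangles with the camera center. The names `floor_point`, `camera_center`, `estimate_height`, and the projection matrix `P` are hypothetical.

```python
import numpy as np

def floor_point(P, u, v):
    """Intersect the viewing ray of pixel (u, v) with the floor plane z = 0."""
    H = np.asarray(P, dtype=float)[:, [0, 1, 3]]   # maps floor (x, y, 1) to the image
    x, y, w = np.linalg.solve(H, np.array([u, v, 1.0]))
    return np.array([x / w, y / w])

def camera_center(P):
    """Return the camera center C, which satisfies P @ [C, 1] = 0."""
    P = np.asarray(P, dtype=float)
    return -np.linalg.solve(P[:, :3], P[:, 3])

def estimate_height(P, bottom_px, top_px):
    """Return (ground contact position on the floor, estimated object height)."""
    ground = floor_point(P, *bottom_px)        # from the lowest point of the region
    top_on_floor = floor_point(P, *top_px)     # floor hit of the ray through the top
    c = camera_center(P)
    # Similar triangles: height / c_z = |top_on_floor - ground| / |top_on_floor - c_xy|
    height = (c[2] * np.linalg.norm(top_on_floor - ground)
              / np.linalg.norm(top_on_floor - c[:2]))
    return ground, height
```

The estimate depends on the lowest pixel really being the floor contact point, which ties in with the occlusion problem described earlier in the document.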

In each of the embodiments described above, it is also preferable that the discriminator of the object tracking means determines the weight coefficient of each term of the evaluation function by learning, and processes the image information on the input image using the evaluation function with the determined weight coefficients to calculate the object motion information to be output.

In the object tracking apparatus according to the present invention, it is also preferable that the discriminator of the object tracking means learns from a data set generated using the object motion information that was output as correct at the time point preceding one time point, receives the image information on the image at the one time point, processes the image information using the parameters determined by the learning, and outputs the object motion information that is correct at the one time point.
Further, in the object tracking apparatus according to the present invention, it is also preferable that the discriminator of the object tracking means adopts, as the image region relating to the object, an image region calculated by coordinate-transforming the part of the object that extends from the upper end of the object in real space down to a position lower by a predetermined fraction of the height of the object.

Furthermore, in the object tracking apparatus according to the present invention, it is also preferable that the discriminator of the object tracking means is constructed using a structured SVM (Structured Support Vector Machine) algorithm.

According to the present invention, there is further provided an object tracking program that causes a computer mounted on an apparatus capable of tracking an object to be tracked, using a time-series group of images acquired from one or more cameras capable of photographing the object, to function as:
candidate information calculation means for adopting, as object motion information, which is information including position information on the position of the object in real space at one time point, at least the change from the previous time point in the position of the object in real space, and for calculating a plurality of pieces of candidate object motion information, which are pieces of object motion information at the one time point that differ from one another in the change; and
object tracking means for acquiring the moment-to-moment position information of the object in real space by means of a discriminator that learns from a data set including image information on an acquired image and the object motion information regarded as correct, wherein the discriminator applies the plurality of pieces of input candidate object motion information and the image information on the image at the one time point to an evaluation function having a term relating to a probability density function whose variable is the change in the position of the object in real space and a term that evaluates the closeness in appearance between the image region relating to the object and the image region calculated from the candidate object motion information, and outputs the candidate object motion information that maximizes the score of the evaluation function as the correct position information on the position of the object in real space at the one time point.

According to the present invention, there is still further provided an object tracking method in which a computer tracks an object to be tracked using a time-series group of images acquired from one or more cameras capable of photographing the object, the method acquiring the moment-to-moment position information of the object in real space by repeating:
a step of adopting, as object motion information, which is information including position information on the position of the object in real space at one time point, at least the change from the previous time point in the position of the object in real space, and calculating a plurality of pieces of candidate object motion information, which are pieces of object motion information at the one time point that differ from one another in the change; and
a step of determining the position information of the object in real space at the one time point by means of a discriminator that learns from a data set including image information on an acquired image and the object motion information regarded as correct, wherein the discriminator applies the plurality of pieces of input candidate object motion information and the image information on the image at the one time point to an evaluation function having a term relating to a probability density function whose variable is the change in the position of the object in real space and a term that evaluates the closeness in appearance between the image region relating to the object and the image region calculated from the candidate object motion information, and outputs the candidate object motion information that maximizes the score of the evaluation function as the correct position information on the position of the object in real space at the one time point.

According to the object tracking apparatus, program, and method of the present invention, an object can be tracked while maintaining high positional accuracy in real space, using an acquired group of images.

A schematic diagram showing an embodiment of an object tracking system including the object tracking apparatus according to the present invention.
A flowchart schematically showing the flow of processing in an embodiment of the object tracking apparatus according to the present invention.
A functional block diagram showing the functional configuration of an embodiment of the object tracking apparatus according to the present invention.
A schematic diagram showing an embodiment of the method by which the height calculation unit calculates the height of an object.
A schematic diagram outlining the relationship between the acquired time-series images and the discrimination function of the tracking discriminator.
A schematic diagram for explaining an embodiment of projecting the tracked object onto the image coordinate system.
A schematic diagram showing the relationship between the object model and the elements of the object motion information that relate to changes in real space.
A schematic diagram showing an embodiment of acquiring candidate object motion information by sampling with respect to position in real space.
A schematic diagram showing an embodiment of acquiring candidate object motion information by sampling with respect to height in real space.
A schematic diagram showing an embodiment of acquiring candidate object motion information by sampling with respect to tilt in the image coordinate system.
A graph showing an example of the probability density function of the position change in the evaluation function.
A graph showing an example of the probability density function of the height change in the evaluation function.
An image showing an example of a difference image.
A schematic diagram showing an example of converting the appearance of the image region x^t|_yt into a feature vector.

Embodiments of the present invention are described in detail below with reference to the drawings.

[Object tracking system]
FIG. 1 is a schematic diagram showing an embodiment of an object tracking system including the object tracking apparatus according to the present invention.

The object tracking system of this embodiment shown in FIG. 1 comprises:
(a) one or more cameras 2 capable of photographing an object to be tracked and of transmitting information on the captured images in time series via a communication network; and
(b) an object tracking apparatus 1 capable of tracking the object using the time-series group of images acquired from the camera 2 via the communication network.

The objects to be tracked include anything that can be photographed, such as persons, animals, vehicles, and other movable physical objects. In particular, in this embodiment the object may be a person, an animal, or the like whose overall shape can change by standing, sitting, or crouching. The location photographed is not particularly limited either. It may be outdoors, where spectators, commuters, shoppers, pedestrians, runners, and the like may appear, but it is also preferably indoors, such as inside a company, school, home, or store, where people can be expected to sit down or bow.

The communication network serving as the transmission path for the image information can be a wireless LAN (Local Area Network) such as Wi-Fi (registered trademark). Alternatively, the camera 2 and the object tracking apparatus 1 may be communicatively connected via the Internet through a wireless access network such as LTE (Long Term Evolution), WiMAX (Worldwide Interoperability for Microwave Access), or 3G (3rd Generation).

Further, the camera 2 and the object tracking apparatus 1 may be communicatively connected via the Internet through a fixed access network such as an optical fiber network or ADSL (Asymmetric Digital Subscriber Line), or via a private network. As a modification, the camera 2 and the object tracking apparatus 1 may be connected directly by wire. A camera control device (not shown) capable of collecting the image information output from a plurality of cameras 2 and transmitting it to the object tracking apparatus 1 may also be provided.

As also shown in FIG. 1, the object tracking apparatus 1 has
(A1) an object tracking unit 114 that, for each image of the tracking target acquired from moment to moment, acquires at least the moment-to-moment real-space position information of the tracked object by means of a tracking discriminator 114a, which receives image information on the image and outputs at least the position information regarded as correct for the tracked object in real space.

Here, the tracking discriminator 114a is constructed and updated by
(A2) learning from a data set that includes image information on an acquired image and "object motion information" regarded as correct, which includes position information on the position of the tracked object in real space.
As described in detail later, this "object motion information" is a quantity that includes as elements at least the changes Δp_x^t and Δp_y^t from the previous time in the real-space position of the tracked object.

In this way, the object tracking apparatus 1 performs tracking by considering not only the image information on the acquired images but also the "object motion information", which includes position information of the tracked object in real space (the observed space). For example, it estimates the moment-to-moment real-space position of the tracked object by considering not only changes in the image region corresponding to the object within the image but also changes in the region corresponding to the object in real space, that is, by incorporating real-space constraints. This avoids the situation in which, when an error arises in the estimated position in the image, the amount of movement in real space shows an abrupt change that is almost impossible in reality even though the amount of movement in the image is small. In other words, by incorporating "object motion information" that takes real space into account, the apparatus can track an object while maintaining high positional accuracy in real space, even though it works from acquired images.

Incidentally, calculating the changes Δp_x^t, Δp_y^t, and so on from the previous time in the real-space position of the tracked object, which are part of the "object motion information", requires converting position information in the image into position information in real space. The object tracking apparatus 1 uses a coordinate transformation that converts position coordinates (u, v) in the image coordinate system u-v defined in the image into position coordinates (g_x, g_y, g_z) in the world coordinate system G_x-G_y-G_z defined in real space, and thereby calculates position information such as the change in real-space position from the image information showing the tracked object.

For example, when the position of the tracked object in the image changes from (u, v) at the previous time (t-1) to (u', v') at the current time t, the object is estimated to have moved in real space (the observed space) from position (g_x, g_y, g_z) at the previous time (t-1) to position (g_x', g_y', g_z') at the current time t, and the change in real-space position from the previous time (t-1) can be obtained.
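As a rough sketch of this step, the change in real-space position can be computed from two pixel positions once a mapping between the floor plane and the image is known. The example below assumes, purely for illustration, that the tracked point lies on the floor (g_z = 0) and that a 3x3 homography `H` from floor coordinates to image coordinates is available from calibration; `image_to_floor` and `real_space_displacement` are hypothetical names.

```python
import numpy as np

def image_to_floor(H, u, v):
    """Map pixel (u, v) to floor coordinates (g_x, g_y), given H: floor -> image."""
    q = np.linalg.solve(np.asarray(H, dtype=float), np.array([u, v, 1.0]))
    return q[:2] / q[2]

def real_space_displacement(H, prev_px, curr_px):
    """Return the real-space change between the pixel positions at t-1 and t."""
    return image_to_floor(H, *curr_px) - image_to_floor(H, *prev_px)
```

Because the mapping is projective, the same pixel shift corresponds to a larger real-space displacement the farther the point is from the camera, which is why a small image error can produce an implausibly large jump in real space.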

The times used here are set at every elapse of a unit time, taken as 1, so the time immediately preceding time t is t-1. The coordinate transformation from the image coordinate system to the world coordinate system described above can be determined by setting, through calibration performed in advance, the extrinsic parameters relating to the installation position and shooting direction of each camera 2. Even when images are acquired from each of a plurality of cameras 2, these images can be integrated to construct a single image space, and the image coordinate system can be applied to that image space.

さらに、物体追跡装置１では、追跡用識別器１１４ａに対し、カメラ２から取得した時系列の各画像を用いて即座に、即ちオンラインで学習させることができる。その結果、追跡対象物体の位置を即座に把握して追跡を行うことが可能となるのである。さらに、刻々と学習する追跡用識別器１１４ａを用いて追跡を行うので、対象物体の見え方が刻々と変化しても、同一の物体であると認識することができ、例えば当該物体に固有の識別子IDを付与し続けながら、適切な追跡を続行することが容易になる。 Further, in the object tracking device 1, the tracking discriminator 114a can be trained immediately, that is, online, using each time-series image acquired from the camera 2. As a result, the position of the tracking target object can be grasped immediately and the object can be tracked. Furthermore, because tracking uses the tracking discriminator 114a, which keeps learning from moment to moment, the object can be recognized as the same object even if its appearance changes from moment to moment. This makes it easier to continue appropriate tracking while, for example, continuing to assign the object its unique identifier ID.

因みに、上記（Ａ１）に示したように、物体追跡部１１４の追跡用識別器１１４ａが入出力する画像情報、及び実空間での位置情報（「物体動き情報」）は、共に内部に構造をもったデータである。即ち、追跡用識別器１１４ａは構造学習に基づいて実空間での正解とされる情報を出力可能となっている。このように、物体追跡装置１は、実空間とカメラ画像との間の構造関係を考慮した構造学習に基づき物体の識別を行うことによって、例えば後に詳述するように追跡対象物体における高さや形状の変化が起こった場合にも、例えば固有の識別子IDを付与し続けながら、正確な実空間での位置をもって追跡することを可能にするのである。 Incidentally, as shown in (A1) above, the image information and the real-space position information ("object motion information") that the tracking discriminator 114a of the object tracking unit 114 takes as input and output are both data with internal structure. That is, the tracking discriminator 114a can output the information regarded as the correct answer in the real space on the basis of structured learning. In this way, by identifying objects through structured learning that takes into account the structural relationship between the real space and the camera image, the object tracking device 1 can track an object at an accurate real-space position, for example while continuing to assign its unique identifier ID, even when the height or shape of the tracking target object changes, as described in detail later.

［装置機能概要］ [Device Function Overview]
図２は、本発明による物体追跡装置の一実施形態における処理の流れを概略的に示すフローチャートである。 FIG. 2 is a flowchart schematically showing the flow of processing in an embodiment of the object tracking device according to the present invention.

図２によれば、本実施形態の物体追跡装置１は、カメラ２から解析対象の画像を取得した際、追跡対象の物体に対応した学習済みの追跡用識別器１１４ａ（図１）に対して当該画像を入力し、正解としての「物体動き情報」を出力させて当該物体を追跡する。ここで、追跡対象物体の数だけの追跡用識別器１１４ａが使用される。この際、各物体には当該物体固有の識別子IDが継続して付与されることになる。 According to FIG. 2, when the object tracking device 1 of the present embodiment acquires an image to be analyzed from the camera 2, it inputs the image to the trained tracking discriminator 114a (FIG. 1) corresponding to the tracking target object, has the discriminator output the "object motion information" regarded as the correct answer, and thereby tracks the object. Here, as many tracking discriminators 114a are used as there are tracking target objects. Each object continues to be assigned an identifier ID unique to that object.

さらに、物体追跡装置１は、追跡用識別器１１４ａから出力された正解の「物体動き情報」と、取得した画像とを教師データセットとして用い、追跡用識別器１１４ａにオンライン学習を行わせる。 Further, the object tracking device 1 uses the correct “object motion information” output from the tracking discriminator 114a and the acquired image as a teacher data set, and causes the tracking discriminator 114a to perform online learning.

また、物体追跡装置１は、上記の追跡処理と並行して、検出処理を行う。具体的には、新規の又は追跡されていない可能性のある物体が、取得された画像上に写っていないか否かを学習済みの検出用識別器を用いて判定し、このような物体の領域が当該画像上に出現した際に当該物体を検出したとする。 The object tracking device 1 performs a detection process in parallel with the above-described tracking process. Specifically, a new or possibly untracked object is determined by using a learned detection classifier to determine whether or not the object is captured in the acquired image. It is assumed that the object is detected when a region appears on the image.

物体が検出された際、物体追跡装置１は、新規に追跡を開始する前に、検出された物体と過去に追跡していた物体との類似度を算出し、この類似度が所定値以上である場合に、一旦追跡が終了した物体が撮影可能な空間内に復帰したとみなす。この場合、類似度の高い過去の物体と同一の識別子IDを検出物体に付与して、即ち識別子IDを統合して追跡を再開する。一方、算出した類似度が所定値よりも小さい場合、新規の物体が撮影可能な空間内に出現したとみなし、新規の識別子IDを検出物体に付与する。 When an object is detected, the object tracking device 1 calculates, before starting a new track, the similarity between the detected object and objects tracked in the past. If this similarity is equal to or greater than a predetermined value, the device regards the detection as an object whose tracking had ended and which has returned to the space that can be imaged. In this case, the detected object is given the same identifier ID as the highly similar past object, that is, the identifier IDs are integrated, and tracking is resumed. If, on the other hand, the calculated similarity is smaller than the predetermined value, the device regards the detection as a new object that has appeared in the space that can be imaged, and assigns a new identifier ID to the detected object.

物体追跡装置１は、次いで、時系列で再び新たな画像を取得して、図２に示したような処理のサイクルを繰り返す。 Next, the object tracking device 1 acquires a new image again in time series and repeats the processing cycle as shown in FIG.

以上に説明したように、物体追跡装置１は、１つ以上のカメラ２からの時系列画像群を用いて追跡と同時に検出処理も行っているので、物体における様々の状況に合わせて、的確な且つ統合的な追跡を実施することができる。 As described above, the object tracking device 1 performs detection processing at the same time as tracking, using the time-series image group from one or more cameras 2, and can therefore carry out accurate and integrated tracking adapted to the various situations of the objects.

［装置構成、物体追跡方法］ [Device Configuration, Object Tracking Method]
図３は、本発明による物体追跡装置の一実施形態における機能構成を示す機能ブロック図である。 FIG. 3 is a functional block diagram showing the functional configuration of an embodiment of the object tracking device according to the present invention.

図３によれば、物体追跡装置１は、１つ又は複数のカメラ２と通信接続可能な通信インタフェース１０１と、画像蓄積部１０２と、ＩＤ蓄積部１０３と、追跡物体管理部１０４と、プロセッサ・メモリとを有する。ここで、プロセッサ・メモリは、物体追跡装置１のコンピュータを機能させるプログラムを実行することによって、物体追跡機能を実現させる。 According to FIG. 3, the object tracking device 1 includes a communication interface 101 that can be communicatively connected to one or a plurality of cameras 2, an image storage unit 102, an ID storage unit 103, a tracked object management unit 104, and a processor/memory. Here, the processor/memory realizes the object tracking function by executing a program that causes the computer of the object tracking device 1 to function.

さらに、プロセッサ・メモリは、機能構成部として、物体検出部１１１と、ＩＤ（識別子）管理部１１２と、候補情報算出部１１３と、物体追跡部１１４と、物体位置・形状推定部１１５と、通信制御部１２１とを有する。ここで、物体検出部１１１は、検出用識別器１１１ａと、高さ算出部１１１ｂとを有することも好ましい。さらに、ＩＤ管理部１１２は、物体統合部１１２ａと、物体登録部１１２ｂとを有することも好ましい。さらにまた、物体追跡部１１４は、追跡用識別器１１４ａと、教師データセット生成部１１４ｂとを有することも好ましい。尚、図３における物体追跡装置１の機能構成部間を矢印で接続して示した処理の流れは、本発明による物体追跡方法の一実施形態としても理解される。 Further, the processor/memory includes, as functional components, an object detection unit 111, an ID (identifier) management unit 112, a candidate information calculation unit 113, an object tracking unit 114, an object position/shape estimation unit 115, and a communication control unit 121. Here, the object detection unit 111 preferably includes a detection classifier 111a and a height calculation unit 111b. The ID management unit 112 preferably includes an object integration unit 112a and an object registration unit 112b. The object tracking unit 114 preferably includes a tracking discriminator 114a and a teacher data set generation unit 114b. The flow of processing shown in FIG. 3 by the arrows connecting the functional components of the object tracking device 1 can also be understood as an embodiment of the object tracking method according to the present invention.

カメラ２は、例えば、ＣＣＤイメージセンサ、ＣＭＯＳイメージセンサ等の固体撮像素子を備えた可視光、近赤外線又は赤外線対応の撮影デバイスである。また、カメラ２又は（図示していない）カメラ制御装置は、カメラ２で撮影された物体の画像を含む撮影画像データを生成し、当該データを時系列に又はバッチで物体追跡装置１に送信する機能を有する。また、カメラ２は、可動であって設置位置、撮影向きや高さを変更することができ、この変更のための制御信号を受信し処理する機能を有していることも好ましい。 The camera 2 is, for example, an imaging device for visible light, near-infrared light, or infrared light that is equipped with a solid-state image sensor such as a CCD image sensor or a CMOS image sensor. The camera 2 or a camera control device (not shown) has a function of generating captured image data including images of objects captured by the camera 2 and transmitting the data to the object tracking device 1 in time series or in batches. The camera 2 is also preferably movable, so that its installation position, shooting direction, and height can be changed, and preferably has a function of receiving and processing control signals for such changes.

通信インタフェース１０１は、カメラ２又はカメラ制御装置から時系列の画像群である撮影画像データを、通信ネットワークを介して受信する。通信インタフェース１０１を使用した送受信及び通信データ処理の制御は、通信制御部１２１によって行われ、取得された撮影画像データは、画像蓄積部１０２に蓄積される。ここで、この撮影画像データは、カメラ２又はカメラ制御装置から時系列順に呼び出されて取得されたものであってもよく、リアルタイムに一定時間間隔でキャプチャされた画像を順に取得したものであってもよい。 The communication interface 101 receives captured image data, which is a time-series image group, from the camera 2 or the camera control device via a communication network. Transmission and reception through the communication interface 101 and the processing of communication data are controlled by the communication control unit 121, and the acquired captured image data is stored in the image storage unit 102. Here, the captured image data may be data retrieved in chronological order from the camera 2 or the camera control device, or may be images captured in real time at fixed time intervals and acquired in sequence.

物体検出部１１１は、所定の特徴量を用いて学習を行った検出用識別器１１１ａによって、物体識別対象の画像における出現した又は追跡されていない物体を検出可能な機能部である。具体的には、画像蓄積部１０２に蓄積された画像において、追跡対象となる物体に対応する画像領域を検出する。ここで、人物を追跡対象とする場合、学習には人物検出に適した特徴量を用いる。物体検出のための特徴量としては、例えばＨＯＧ特徴量を使用することも好ましい。ＨＯＧ特徴量は、画像の局所領域における輝度の勾配方向をヒストグラム化し、各度数を成分としたベクトル量である。ＨＯＧ特徴量を用いた人物検出技術については、例えば、非特許文献であるDalal. N及びTriggs. B、「Histograms of Oriented Gradients for Human Detection」、proceedings of IEEE Computer Vision and Pattern Recognition (CVPR)、pp.886-893、2005年に記載されている。 The object detection unit 111 is a functional unit that can detect an object that has appeared or has not been tracked in the image of the object identification target by the detection classifier 111a that has learned using a predetermined feature amount. Specifically, an image area corresponding to an object to be tracked is detected in the image stored in the image storage unit 102. Here, when a person is to be tracked, a feature amount suitable for detecting a person is used for learning. It is also preferable to use, for example, a HOG feature amount as the feature amount for object detection. The HOG feature amount is a vector amount in which the gradient direction of the luminance in the local region of the image is converted into a histogram and each frequency is used as a component. About the person detection technology using the HOG feature, for example, non-patent documents Dalal.N and Triggs.B, `` Histograms of Oriented Gradients for Human Detection '', proceedings of IEEE Computer Vision and Pattern Recognition (CVPR), pp .886-893, 2005.
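
The histogramming step at the core of the HOG feature described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the detector used by the object detection unit 111: the function name `cell_orientation_histogram`, the 9-bin unsigned-orientation layout, and the central-difference gradient are choices made here for illustration, and block normalization and the classifier itself are omitted.

```python
import math


def cell_orientation_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    `cell` is a small grayscale patch given as a list of rows of numbers.
    Only interior pixels are used, so no padding is needed.
    """
    rows, cols = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Central differences approximate the luminance gradient.
            gx = cell[r][c + 1] - cell[r][c - 1]
            gy = cell[r + 1][c] - cell[r - 1][c]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            # Fold the direction into [0, 180) so opposite gradients share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            index = min(int(angle // bin_width), bins - 1)
            hist[index] += magnitude
    return hist
```

Concatenating such histograms over all cells of a detection window gives a vector whose components are the bin counts, which is the form of feature the paragraph above describes.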

物体検出部１１１は、また、画像蓄積部１０２から入力した画像から物体を検出した際、新規登録の可能性がある検出した物体の情報をＩＤ管理部１１２へ通知し、登録を依頼する。 Further, when detecting an object from the image input from the image storage unit 102, the object detection unit 111 notifies the ID management unit 112 of information on the detected object that may be newly registered, and requests registration.

さらに、物体検出部１１１は高さ算出部１１１ｂを有する。高さ算出部１１１ｂは、検出された追跡対象物体に係る画像領域の最下位置（例えば最下ピクセル位置）に基づいて、この物体の実空間での位置としての接地位置を算出し、検出された物体に係る画像領域の最上位置（例えば最上ピクセル位置）に基づいて算出された実空間での位置と、算出された接地位置とに基づいて、この物体の高さh₀を算出する。次に、高さh₀の算出の一実施形態を詳しく説明する。 Further, the object detection unit 111 has a height calculation unit 111b. The height calculation unit 111b calculates the ground contact position, as the object's position in the real space, based on the lowest position (for example, the lowest pixel position) of the image area of the detected tracking target object. It then calculates the height h₀ of the object based on this ground contact position and on the real-space position calculated from the highest position (for example, the uppermost pixel position) of the image area of the detected object. An embodiment of the calculation of the height h₀ is described in detail next.

図４は、高さ算出部１１１ｂにおける物体の高さh₀を算出する方法の一実施形態を示す模式図である。尚、本実施形態では、最初に、物体の検出は、物体が標準的な形状であり、且つ画像内において床や地面に接している箇所（接地位置）が明らかな場合に行われる。例えば、人物であれば直立していて足元が映っている人物のみを検出する。画像内で物体を検出した際、実空間における標準形状でのこの物体の高さを推定する。 FIG. 4 is a schematic diagram illustrating an embodiment of a method for calculating the height h ₀ of the object in the height calculation unit 111b. In the present embodiment, first, the detection of the object is performed when the object has a standard shape and a portion ( ground contact position) in contact with the floor or the ground in the image is clear. For example, in the case of a person, only a person standing upright and showing his / her feet is detected. When an object is detected in the image, the height of the object in a standard shape in the real space is estimated.

ここで、図１に示したような画像に張られた画像座標系での座標(u, v)と、実空間（観察対象空間）に張られた世界座標系での座標(g_x, g_y, g_z)との間には、次式

$$ s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = P \begin{pmatrix} g_x \\ g_y \\ g_z \\ 1 \end{pmatrix} \qquad (1) $$

の関係が成立する。上式（１）において、行列Ｐは予め決定された透視投影行列であり、sは未知のスカラ変数である。この際、各カメラ２の内部パラメータ及び外部パラメータをキャリブレーションによって予め設定しておけば、カメラ２の位置・姿勢が変わらない限り、透視投影行列Ｐは当初設定された値をとり続ける。 Here, the coordinates (u, v) in the image coordinate system set on the image as shown in FIG. 1 and the coordinates (g_x, g_y, g_z) in the world coordinate system set in the real space (observation target space) satisfy the relation of equation (1). In equation (1), the matrix P is a predetermined perspective projection matrix, and s is an unknown scalar variable. If the internal and external parameters of each camera 2 are set in advance by calibration, the perspective projection matrix P keeps its initially set value as long as the position and orientation of the camera 2 do not change.

上式（１）を用いて、２次元の画像座標系での座標から３次元の世界座標系での座標を求める際、画像座標系での座標(u, v)及び透視投影行列Ｐが定まっているだけでは、未知パラメータの数（４つ）が観測方程式の数（３つ）よりも多いので、世界座標系での座標(g_x, g_y, g_z)を一意に決定することはできない。 When equation (1) is used to obtain coordinates in the three-dimensional world coordinate system from coordinates in the two-dimensional image coordinate system, knowing only the image coordinates (u, v) and the perspective projection matrix P is not enough: the number of unknown parameters (four) exceeds the number of observation equations (three), so the world coordinates (g_x, g_y, g_z) cannot be uniquely determined.

しかしながら、本実施形態では、図４に示したように、検出された物体について画像内で床や地面に接している接地位置(u_b⁰, v_b⁰)が取得される。従って、この接地位置(u_b⁰, v_b⁰)及びg_z＝０を式（１）に代入することによって、接地位置(u_b⁰, v_b⁰)に対応する実空間上の位置(g_x⁰, g_y⁰, 0)を一意に取得することができる。ここで、実空間での物体の高さをh₀とすると、取得された実空間の床面又は地面での位置座標g_x⁰及びg_y⁰と、画面座標系での物体の最上部の点(u_h⁰, v_h⁰)との間に、次式の関係が成立する。

$$ s \begin{pmatrix} u_h^0 \\ v_h^0 \\ 1 \end{pmatrix} = P \begin{pmatrix} g_x^0 \\ g_y^0 \\ h_0 \\ 1 \end{pmatrix} \qquad (2) $$

上式（２）において、未知のパラメータはs及びh₀の２つのみであり、一方、観測方程式の数は３つであることから、この式を用いて実空間での高さh₀を求めることが可能となる。尚、この際、s及びh₀の値を、最終的に最小二乗法を用いて決定することも好ましい。 However, in the present embodiment, as shown in FIG. 4, the ground contact position (u_b⁰, v_b⁰) at which the detected object touches the floor or the ground in the image is obtained. Therefore, by substituting this ground contact position (u_b⁰, v_b⁰) and g_z = 0 into equation (1), the real-space position (g_x⁰, g_y⁰, 0) corresponding to the ground contact position can be uniquely obtained. Letting h₀ be the height of the object in the real space, the relation of equation (2) holds between the obtained position coordinates g_x⁰ and g_y⁰ on the floor or the ground in the real space and the uppermost point (u_h⁰, v_h⁰) of the object in the image (screen) coordinate system. In equation (2), the only unknown parameters are s and h₀, while there are three observation equations, so the height h₀ in the real space can be obtained from this equation. It is also preferable to determine the final values of s and h₀ by the least squares method.
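
The two steps above, back-projecting the ground contact pixel with g_z = 0 and then solving for h₀ by least squares, can be sketched as follows. This is a minimal sketch that assumes equation (1) has the standard homogeneous form s·(u, v, 1)ᵀ = P·(g_x, g_y, g_z, 1)ᵀ with a 3×4 matrix P given as nested lists. The function names and the sample matrix `P_EXAMPLE` are hypothetical and are not taken from the patent.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 linear system a w = b with Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("singular system: ground plane not observable")
    solution = []
    for col in range(3):
        replaced = [[b[r] if c == col else a[r][c] for c in range(3)]
                    for r in range(3)]
        solution.append(det3(replaced) / d)
    return solution


def backproject_to_ground(P, u, v):
    """Return (g_x, g_y) on the plane g_z = 0 that projects to pixel (u, v)."""
    # With g_z = 0 the third column of P drops out of equation (1):
    # s (u, v, 1)^T = H (g_x, g_y, 1)^T, where H holds columns 0, 1, 3 of P.
    H = [[P[r][0], P[r][1], P[r][3]] for r in range(3)]
    w = solve3(H, [u, v, 1.0])  # w equals (g_x, g_y, 1) / s
    return w[0] / w[2], w[1] / w[2]


def estimate_height(P, gx, gy, u_h, v_h):
    """Least-squares estimate of h_0 from the top pixel (u_h, v_h), eq. (2)."""
    # Equation (2) reads s m = a + h b with m = (u_h, v_h, 1),
    # a = P (gx, gy, 0, 1)^T and b = third column of P.
    m = [u_h, v_h, 1.0]
    a = [P[r][0] * gx + P[r][1] * gy + P[r][3] for r in range(3)]
    b = [P[r][2] for r in range(3)]
    mm = sum(x * x for x in m)
    mb = sum(x * y for x, y in zip(m, b))
    bb = sum(x * x for x in b)
    ma = sum(x * y for x, y in zip(m, a))
    ba = sum(x * y for x, y in zip(b, a))
    # Normal equations of the 3x2 system [m, -b] (s, h)^T = a, solved for h.
    det = mm * bb - mb * mb
    if abs(det) < 1e-12:
        raise ValueError("height not observable from this viewpoint")
    return (mb * ma - mm * ba) / det


# Hypothetical camera: focal length 800 px, principal point (320, 240),
# 3 m above the floor, looking horizontally along the world G_y axis.
P_EXAMPLE = [[800.0, 320.0, 0.0, 1600.0],
             [0.0, 240.0, -800.0, 3600.0],
             [0.0, 1.0, 0.0, 5.0]]
```

With this sample matrix, `backproject_to_ground(P_EXAMPLE, u_b, v_b)` gives the floor position of the feet, and passing that position with the top pixel to `estimate_height` gives h₀. The three observation equations overdetermine the two unknowns, which is why a least-squares solve is used rather than an exact one.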

以上説明したように、物体検出部１１１の高さ算出部１１１ｂは、例えば、１．人物を画像内で検出（抽出）した際、２．人物モデルを実空間に投影して足元の世界座標系での座標を決定し、３．人物の身長、即ち標準的な形状での高さh₀を算出することができるのである。 As described above, the height calculation unit 111b of the object detection unit 111 can, for example, (1) when a person is detected (extracted) in the image, (2) project a person model onto the real space to determine the coordinates of the feet in the world coordinate system, and (3) calculate the person's height, that is, the height h₀ in the standard shape.

図３に戻って、ＩＤ管理部１１２は、物体統合部１１２ａと、物体登録部１１２ｂとを有する。このうち物体統合部１１２ａは、物体検出部１１１から通知のあった検出された物体と、過去に識別子IDを付与された既知物体とを比較し、検出された物体に対し、同一物体であると判定された既知物体に付与された識別子IDを付与する旨を決定する。 Returning to FIG. 3, the ID management unit 112 has an object integration unit 112a and an object registration unit 112b. Among them, the object integration unit 112a compares the detected object notified from the object detection unit 111 with a known object to which an identifier ID has been given in the past, and determines that the detected object is the same object. It is determined that an identifier ID given to the determined known object is given.

物体統合部１１２ａは、具体的に、
（ａ）（例えば複数のカメラ２から取得された）物体識別対象の画像から算出される両物体間の実空間での距離ｄが、既知物体の移動速度ｖを考慮した現時点での両物体間の推定距離未満であって、且つ検出された物体の領域と既知物体の領域とから決定される類似度が所定閾値よりも大きい場合、この既知物体が現在追跡されていないならば、検出された物体に対し、既知物体に付与されたものと同一の識別子IDを付与することを決定する。
（ｂ）一方、上記（ａ）において、この既知物体が現在追跡されているならば、物体検出部１１１からの通知を無視し、新規登録は行わない。
（ｃ）上記（ａ）及び（ｂ）以外の場合、検出された物体に対し、新たな識別子IDを付与することを決定する。
Specifically, the object integration unit 112a operates as follows:
(a) When the real-space distance d between the two objects, calculated from the images subject to object identification (for example, images acquired from a plurality of cameras 2), is less than the estimated current distance between the two objects that takes the moving speed v of the known object into account, and the similarity determined from the region of the detected object and the region of the known object is greater than a predetermined threshold, then, if the known object is not currently being tracked, the unit decides to assign to the detected object the same identifier ID as that assigned to the known object.
(b) If, on the other hand, the known object in (a) is currently being tracked, the unit ignores the notification from the object detection unit 111 and performs no new registration.
(c) In cases other than (a) and (b), the unit decides to assign a new identifier ID to the detected object.
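
The three-way decision in (a) to (c) can be sketched as follows. This is an illustrative sketch only: the `KnownObject` record, the function `decide_id`, the `margin` parameter, and the rule "estimated distance = speed × elapsed time + margin" are assumptions made here, since the text does not specify how the estimated distance is computed. Choosing the most similar candidate when several known objects qualify is likewise an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class KnownObject:
    object_id: int
    position: tuple      # last known real-space position (g_x, g_y)
    speed: float         # moving speed v
    last_seen: float     # time of the last observation
    is_tracked: bool     # True while the object is still being tracked


def decide_id(detected_pos, now, similarity, known_objects, next_id,
              sim_threshold=0.5, margin=0.5):
    """Return ("reuse" | "ignore" | "new", identifier ID) for a detection.

    `similarity` is a callable mapping a KnownObject to a similarity score.
    """
    matches = []
    for known in known_objects:
        d = math.dist(detected_pos, known.position)
        # Assumed rule: how far the known object could have moved by now.
        estimated = known.speed * (now - known.last_seen) + margin
        score = similarity(known)
        if d < estimated and score > sim_threshold:
            matches.append((score, known))
    if not matches:
        return "new", next_id               # case (c)
    _, best = max(matches, key=lambda pair: pair[0])
    if best.is_tracked:
        return "ignore", best.object_id     # case (b)
    return "reuse", best.object_id          # case (a)
```

Returning an action together with the ID keeps case (b) explicit: the caller can drop the detection without touching the registry.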

一方、物体登録部１１２ｂは、識別子ID付与の決定された物体に識別子IDを付与し、当該物体を登録し管理する。ここで、検出された物体の画像領域に係る情報と、付与された識別子IDとが対応付けられてＩＤ蓄積部１０３に保存されることも好ましい。尚、上記（ａ）における類似度は、追跡中に学習された各物体に対応する識別器を用いて算出されてもよい。また、後に詳細に説明するものではあるが、各物体に対応する評価関数Ｆのうち見かけ（appearance）の近さをスコア化する関数Ψ(x^t|_yt)の値を用いて算出されることも好ましい。 The object registration unit 112b, on the other hand, assigns the identifier ID to the object for which assignment has been decided, and registers and manages the object. Here, information on the image area of the detected object is preferably stored in the ID storage unit 103 in association with the assigned identifier ID. The similarity in (a) above may be calculated using the discriminator corresponding to each object that was trained during tracking. As described in detail later, it is also preferable to calculate the similarity from the value of the function Ψ(x^t|_yt), the part of the evaluation function F for each object that scores closeness of appearance.

候補情報算出部１１３は、１つの時刻tでの「物体動き情報」として、少なくとも
（ａ）追跡対象物体の実空間での位置における前時刻(t−1)からの変化分Δp_x^t及びΔp_y^t
を採用し、この１つの時刻tにおける互いに変化分の異なる複数の「候補物体動き情報」を算出する。ここで、追跡用識別器１１４ａを用いてこの複数の「候補物体動き情報」の中から最適な１つを決定することによって、当該１つの時刻tでの追跡対象物体の位置を推定することができるのである。
The candidate information calculation unit 113 adopts, as the "object motion information" at one time t, at least
(a) the changes Δp_x^t and Δp_y^t from the previous time (t−1) in the real-space position of the tracking target object,
and calculates a plurality of pieces of "candidate object motion information" at this time t that differ from one another in these changes. By using the tracking discriminator 114a to determine the optimal one among these pieces of "candidate object motion information", the position of the tracking target object at time t can be estimated.

また、変更態様として、候補情報算出部１１３は、１つの時刻tでの「物体動き情報」として、
（ｂ）上記（ａ）の変化分Δp_x^t及びΔp_y^tと、追跡対象物体の高さにおける前時刻(t−1)からの変化分Δh^tと
を採用して、この１つの時刻tにおける少なくとも変化分の１つが異なる複数の「候補物体動き情報」を算出してもよい。さらに、
（ｃ）上記（ａ）の変化分Δp_x^t及びΔp_y^tと、上記（ｂ）の変化分Δh^tと、追跡対象物体の傾きにおける前時刻(t−1)からの変化分Δa^tと
を採用して、この１つの時刻tにおける少なくとも変化分の１つが異なる複数の「候補物体動き情報」を算出することも好ましい。尚、「物体動き情報」及び「候補物体動き情報」の具体例については、後に図８〜１０を用いて詳細に説明する。
As a modification, the candidate information calculation unit 113 may adopt, as the "object motion information" at one time t,
(b) the changes Δp_x^t and Δp_y^t of (a) above together with the change Δh^t from the previous time (t−1) in the height of the tracking target object,
and calculate a plurality of pieces of "candidate object motion information" at this time t that differ in at least one of these changes. It is also preferable to adopt
(c) the changes Δp_x^t and Δp_y^t of (a) above, the change Δh^t of (b) above, and the change Δa^t from the previous time (t−1) in the inclination of the tracking target object,
and calculate a plurality of pieces of "candidate object motion information" at this time t that differ in at least one of these changes. Specific examples of the "object motion information" and the "candidate object motion information" are described in detail later with reference to FIGS. 8 to 10.
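
One simple way to obtain a set of candidates that differ in at least one change is to enumerate a grid over the sampled values of each component. The sketch below is an assumption made for illustration: the text defers the actual sampling scheme to FIGS. 8 to 10, and the function name `generate_candidates` and the sample values are hypothetical.

```python
from itertools import product


def generate_candidates(dp_values, dh_values=(0.0,), da_values=(0.0,)):
    """Enumerate candidate motion tuples (dp_x, dp_y, dh, da).

    Leaving dh_values and da_values at their defaults corresponds to
    variant (a); supplying dh_values gives (b); supplying both gives (c).
    """
    return [
        (dpx, dpy, dh, da)
        for dpx, dpy, dh, da in product(dp_values, dp_values,
                                        dh_values, da_values)
    ]


# Hypothetical sampling: 10 cm steps on the floor, 5 cm steps in height.
candidates_a = generate_candidates((-0.1, 0.0, 0.1))
candidates_b = generate_candidates((-0.1, 0.0, 0.1), dh_values=(-0.05, 0.0, 0.05))
```

The candidate count is the product of the per-component sample counts, so each added component multiplies the search cost. This is the growth the later discussion of Δa^t tries to limit.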

同じく図３において、物体追跡部１１４は、追跡用識別器１１４ａを用いて、追跡対象物体の実空間での刻々の位置情報を取得する。具体的には、取得された画像中のある領域に追跡対象物体が映っているか否かを例えば２値判定する追跡用識別器１１４ａを用い、未知の画像中に追跡対象物体が映っていると認識される領域を推定していくことで物体追跡を行う。 Similarly, in FIG. 3, the object tracking unit 114 uses the tracking discriminator 114 a to acquire instantaneous position information of the tracking target object in the real space. Specifically, for example, using the tracking discriminator 114a for determining whether or not the tracking target object is reflected in a certain area in the acquired image, if the tracking target object is reflected in an unknown image, Object tracking is performed by estimating the region to be recognized.

ここで、追跡用識別器１１４ａは、
（ａ）取得された画像に係る画像情報と、当該物体の実空間での位置に係る位置情報を含む物体動き情報であって正解とされる情報とを含む教師データセットによってオンライン学習を行い、
（ｂ）物体追跡対象の画像毎に、当該画像に係る画像情報を入力することによって少なくとも追跡対象物体の実空間での正解とされる位置情報を出力する。
尚、上記（ａ）の教師データセットは、教師データセット生成部１１４ｂによって生成される。
Here, the tracking discriminator 114a
(a) performs online learning with a teacher data set that includes image information on the acquired image and the object motion information regarded as the correct answer, which includes position information on the object's position in the real space, and
(b) for each image subject to object tracking, receives the image information on that image as input and outputs at least the position information regarded as the correct answer for the tracking target object in the real space.
The teacher data set in (a) is generated by the teacher data set generation unit 114b.

このように、追跡用識別器１１４ａは、上記（ａ）及び（ｂ）を繰り返し実行することによって、新たに画像が読み込まれる毎に、オンラインで学習しつつこの読み込み時刻での物体の位置情報を出力することを可能にする。 In this way, the tracking discriminator 114a repeatedly executes the above (a) and (b), so that each time a new image is read, the position information of the object at this reading time is learned while learning online. Enable output.

図５は、取得される時系列の画像と追跡用識別器１１４ａでの識別機能との関係を概略的に示す模式図である。 FIG. 5 is a schematic diagram schematically showing the relationship between the acquired time-series images and the identification function of the tracking identification unit 114a.

図５によれば、追跡用識別器１１４ａは、取得される時系列の各画像を用いて刻々に学習を行う。追跡用識別器は、構造データの取り扱いが可能な教師あり機械学習であれば種々のもので構築可能であるが、例えば構造化サポートベクタマシン（ＳＶＭ，Structured Support Vector Machine）のアルゴリズムによって構築されていることも好ましい。 According to FIG. 5, the tracking discriminator 114a performs learning every moment using the acquired time-series images. The tracking discriminator can be constructed with various kinds of supervised machine learning capable of handling structural data. For example, the tracking discriminator is constructed by an algorithm of a structured support vector machine (SVM). Is also preferred.

具体的に学習の内容としては、追跡対象物体の領域に対応付けられる特徴量としての「物体動き情報」に正のラベルを付与し、それ以外の領域に対応付けられる「物体動き情報」に負のラベルを付与して、これらの特徴量を特徴空間内に配置する。次いで、特徴空間内においてラベルの正負を区分けする識別超平面を算出する。このように学習によって取得した識別超平面を基準として、以後、判定を実施することができる。例えば、時刻tにおける画像領域の判定は、時刻ゼロから時刻(t-1)までの間オンライン学習を行ってきた追跡用識別器１１４ａを用いて実施される。 Specifically, the contents of the learning include adding a positive label to “object motion information” as a feature amount associated with the area of the tracking target object, and adding a negative label to “object motion information” associated with the other areas. And the feature amounts are arranged in the feature space. Next, an identification hyperplane that separates the sign of the label in the feature space is calculated. Thereafter, the determination can be performed with reference to the identification hyperplane acquired by the learning. For example, the determination of the image area at time t is performed using the tracking discriminator 114a that has performed online learning from time zero to time (t-1).

ここで、特徴空間内における当該特徴量と識別超平面との距離ｄは、後に詳細に説明する評価関数Ｆの値（スコア）に相当する。次に、上記の「物体動き情報」yについて説明する。 Here, the distance d between the feature amount and the discrimination hyperplane in the feature space corresponds to a value (score) of an evaluation function F described later in detail. Next, the “object motion information” y will be described.

最初に、推定関数y＝f(x)として、

$$ f(x) = \arg\max_{y \in Y} F(x, y) \qquad (3) $$

を採用する。これにより、画像xが与えられると、推定関数fはyを出力する。ここで、F(x, y)は、上述した評価関数であるが、本実施形態におけるその具体的な形は、後に式（６）に示す。 First, equation (3) is adopted as the estimation function y = f(x). Thus, when an image x is given, the estimation function f outputs y. Here, F(x, y) is the evaluation function mentioned above; its specific form in the present embodiment is given later in equation (6).
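
Over a finite candidate set Y, the estimation function of equation (3) reduces to picking the candidate with the highest score. The sketch below shows only that selection step. The evaluation function F of the embodiment is defined later in equation (6) and is not reproduced here, so `toy_score` is a stand-in invented purely for illustration, as are the names `estimate_motion` and `true_motion`.

```python
def estimate_motion(x, candidates, F):
    """Return the candidate y that maximizes F(x, y), as in equation (3)."""
    if not candidates:
        raise ValueError("candidate set Y must not be empty")
    return max(candidates, key=lambda y: F(x, y))


def toy_score(x, y):
    """Stand-in evaluation function (NOT the patent's F).

    Scores a candidate by how close it is to a motion stored in x.
    """
    target = x["true_motion"]
    return -sum((a - b) ** 2 for a, b in zip(y, target))
```

Because `max` returns the first maximal element, ties between equally scored candidates are resolved by their order in `candidates`.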

本実施形態では、時刻tにおける画像をx^tとした際に、この時刻tでの物体動き情報y^tを、

$$ y^t = (\Delta p_x^t, \Delta p_y^t, \Delta h^t, \Delta a^t) \qquad (4) $$

と定義する。上式（４）において、パラメータΔp_x^tは、追跡対象物体における世界座標系のG_x軸方向での前時刻(t−1)からの位置の変化分であり、パラメータΔp_y^tは、追跡対象物体における世界座標系のG_y軸方向での前時刻(t−1)からの位置の変化分である。また、パラメータΔh^tは、追跡対象物体における（世界座標系のG_z軸方向での）高さの変化分である。さらに、パラメータΔa^tは、追跡対象物体の傾きの角度における前時刻(t−1)からの変化分である。この傾きの角度は、例えば人物でいえばお辞儀の際の傾き角に相当し、世界座標系で言えばG_z軸を含む面内での角度となる。 In the present embodiment, letting x^t be the image at time t, the object motion information y^t at this time t is defined by equation (4). In equation (4), the parameter Δp_x^t is the change in the position of the tracking target object from the previous time (t−1) along the G_x axis of the world coordinate system, and the parameter Δp_y^t is the corresponding change along the G_y axis. The parameter Δh^t is the change in the height of the tracking target object (along the G_z axis of the world coordinate system). The parameter Δa^t is the change from the previous time (t−1) in the inclination angle of the tracking target object. For a person, this inclination angle corresponds, for example, to the angle of the body when bowing, and in the world coordinate system it is an angle in a plane containing the G_z axis.

尚、パラメータΔa^tについても、他のパラメータと同じく実空間（世界座標系）での値を用いてもよいが、以下の実施形態では、画像内（画像座標系）での値を用いるものとする。即ち、Δa^tは、追跡対象物体が画像座標系に投影された際の角度値の変化分となる。このように、Δa^tとして画像内（画像座標系）での値を採用することによって、角度変化分を１次元で考えることができるので、物体動き情報y^tの推定の際の候補数が極端に増大するのを回避し、計算量を抑えることが可能となる。また、パラメータΔh^tも、Δa^tと同じく、追跡対象物体が画像座標系に投影された際の画像上での高さの変化分とすることも可能である。 For the parameter Δa^t, a value in the real space (world coordinate system) may be used as for the other parameters, but the following embodiment uses a value in the image (image coordinate system). That is, Δa^t is the change in the angle of the tracking target object as projected onto the image coordinate system. Adopting the in-image value for Δa^t makes the angle change one-dimensional, which avoids an extreme increase in the number of candidates when estimating the object motion information y^t and keeps the amount of calculation down. Like Δa^t, the parameter Δh^t can also be taken as the change in height on the image when the tracking target object is projected onto the image coordinate system.

次に、これらのパラメータ（物体動き情報）と画像座標系での対応する画像領域との関係について説明する。 Next, the relationship between these parameters (object motion information) and the corresponding image area in the image coordinate system will be described.

図６は、追跡対象物体を画像座標系へ投影する一実施形態を説明するための模式図である。 FIG. 6 is a schematic diagram for explaining an embodiment in which a tracking target object is projected on an image coordinate system.

ある時刻tにおける候補となるパラメータベクトルである候補物体動き情報y^tと、取得された画像x^tとは、図６に示したような関係を有する。ここで、前時刻(t−1)において決定（出力）された物体動き情報の最適解をy^t-1*＝(Δp_x^t-1*, Δp_y^t-1*, Δh^t-1*, Δa^t-1*)とする。 The candidate object motion information y^t, which is a candidate parameter vector at a given time t, and the acquired image x^t are related as shown in FIG. 6. Here, the optimal solution of the object motion information determined (output) at the previous time (t−1) is written y^{t-1*} = (Δp_x^{t-1*}, Δp_y^{t-1*}, Δh^{t-1*}, Δa^{t-1*}).

図６に示すように、最初に、追跡対象物体については、床又は地面への接地位置である物体位置に、対応する３次元の物体モデルが存在するものとしている。この物体モデルは、予め定められた標準的なおおよその物体の形を表したものであり、モデル表面を表す３次元空間内の点の集合となっている。この物体モデルの初期の（時刻ゼロでの）高さはh₀であり、この物体の時刻(t−1)における高さh^t-1*は、

$$ h^{t-1*} = h_0 + \sum_{k=1}^{t-2} \Delta h^{k*} \qquad (5) $$

となる。ここで、Σはkについての1からt−2までの総和である。時刻tにおける物体モデルの高さh^tは、h^t-1*からΔh^tだけ変化した値（h^t-1*−Δh^t）となっている。 As shown in FIG. 6, it is first assumed that, for the tracking target object, a corresponding three-dimensional object model exists at the object position, which is the position of contact with the floor or the ground. This object model represents a predetermined, standard, approximate shape of the object and is a set of points in three-dimensional space that represents the model surface. The initial height (at time zero) of this object model is h₀, and the height h^{t-1*} of the object at time (t−1) is given by equation (5), where Σ denotes the sum over k from 1 to t−2. The height h^t of the object model at time t is the value obtained by changing h^{t-1*} by Δh^t, written (h^{t-1*} − Δh^t).

本実施形態では、この物体モデルのうち、上端から長さαh₀の範囲となる上部を画像座標系へ投影する。αは予め定められた１以下の正の（(0, 1]の範囲内の）定数である。ここで、物体モデル（部分）を画像座標系へ投影するとは、物体モデル（の部分）の表面に相当する点集合を画像座標系へ変換することである。さらに、物体モデル（部分）を画像座標系へ投影した結果の画像領域とは、物体モデル（の部分）の表面の点集合に対応する変換された画面上の点集合によって囲われる画像内の領域のことである。 In the present embodiment, the upper part of this object model, extending a length αh₀ down from its upper end, is projected onto the image coordinate system. Here α is a predetermined positive constant not greater than 1, that is, a constant in the range (0, 1]. Projecting (part of) the object model onto the image coordinate system means converting the set of points on the surface of (that part of) the object model into the image coordinate system. The image region resulting from this projection is the region of the image enclosed by the converted on-screen point set that corresponds to the points on the surface of (that part of) the object model.
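
The projection of the upper part of the object model can be sketched as follows. The sketch assumes a 3×4 projection matrix P in the standard homogeneous form and model points given relative to the ground contact position. It also approximates "the region enclosed by the projected point set" by an axis-aligned bounding box, which is a simplification made here and not the patent's definition. The function names are hypothetical.

```python
def project_point(P, g):
    """Project world point g = (g_x, g_y, g_z) to pixel (u, v) with 3x4 P."""
    x, y, z = g
    row = [P[r][0] * x + P[r][1] * y + P[r][2] * z + P[r][3] for r in range(3)]
    return row[0] / row[2], row[1] / row[2]


def upper_part_region(P, model_points, position, h, h0, alpha):
    """Bounding box (u_min, v_min, u_max, v_max) of the projected upper part.

    model_points: (x, y, z) surface points relative to the ground contact
                  position, with z measured upward from the floor.
    position:     real-space ground contact position (g_x, g_y).
    h:            current model height; h0: initial height; alpha in (0, 1].
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    gx, gy = position
    cutoff = h - alpha * h0  # keep points within alpha*h0 of the top
    pixels = [project_point(P, (gx + x, gy + y, z))
              for x, y, z in model_points if z >= cutoff]
    if not pixels:
        raise ValueError("no model point lies in the upper part")
    us = [u for u, _ in pixels]
    vs = [v for _, v in pixels]
    return min(us), min(vs), max(us), max(vs)
```

Because only points near the top are kept, the resulting region does not depend on whether the feet are visible in the image.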

即ち、追跡用識別器１１４ａ（図３）は、学習及び判定の際の追跡対象物体に係る画像領域として、実空間における当該物体の上端から高さh₀の所定割合αだけ下方となる位置までの物体部分を座標変換して算出された画像領域を採用するのである。一般に、撮影画像においては、物体が床や地面に接している箇所は、例えば、机、テーブル、人物や車といった他の物体の背後に回り隠れてしまうことも少なくない。しかしながら、本実施形態によれば、接地位置が隠れて見えない状況でも追跡対象物体の上部を追跡するので、当該物体の位置や高さを継続して認識し続けることが可能となる。 That is, as the image region of the tracking target object used for learning and determination, the tracking discriminator 114a (FIG. 3) adopts the image region calculated by coordinate conversion of the part of the object that extends, in the real space, from the object's upper end down to a position lower by the predetermined fraction α of the height h₀. In captured images, the place where an object touches the floor or the ground is often hidden behind other objects such as a desk, a table, a person, or a car. According to the present embodiment, however, the upper part of the tracking target object is tracked even when the ground contact position is hidden from view, so the position and height of the object can continue to be recognized.

Hereinafter, x^t|_{y^t} denotes the region obtained by taking the image region at time (t−1) that corresponds to the upper part of the object model projected onto the image coordinate system and rotating it on the image by Δa^t about a certain reference point. In other words, x^t|_{y^t} is the image region corresponding to the upper part of the object model appearing in image x^t when the object motion information is y^t = (Δp_x^t, Δp_y^t, Δh^t, Δa^t).

Next, the search method for generating a plurality of pieces of candidate object motion information and determining the optimal solution y^{t*} at time t is described.

FIG. 7 is a schematic diagram showing the relationship between the object model and the elements of the object motion information that represent changes in real space.

As already described, Δp_x^t, Δp_y^t, and Δh^t in the object motion information y^t are the changes in object position and height from the previous time (t−1) to time t. As shown in FIG. 7, they correspond to the change along the G_x axis on the floor or ground, the change along the G_y axis on the floor or ground, and the change along the G_z axis, respectively.

FIG. 8 is a schematic diagram showing an embodiment of obtaining candidate object motion information by sampling positions in real space. Among the pieces of candidate object motion information shown below (FIGS. 8 to 10), the one that satisfies equation (3) above, f(x) = argmax_{y∈Y} F(x, y), is the optimal solution (correct answer).

According to FIG. 8, a plurality of pieces of candidate object motion information y^t that differ from one another in the real-space position changes Δp_x^t and Δp_y^t are obtained by circular grid sampling.

Specifically, a plurality of pairs of Δp_x^t and Δp_y^t are determined as the values corresponding to the grid points that lie within a predetermined range of a circular grid centered on the position on the floor or ground at the previous time (t−1). For example, the candidate object motion information may be the y^t whose pairs (Δp_x^t, Δp_y^t) correspond to the circular grid points (r, θ) where the radius r is 3, 4, or 5 (in predetermined units) and the azimuth θ takes values from 0° to 350° in 10° steps. As a modification, the real-space position change in the candidate object motion information y^t may be expressed in polar coordinates, that is, as Δr^t and Δθ^t. As for the range of values of the radius r, it is also preferable to calculate the moving speed of the object model at the previous time (t−1) and, based on that value, set the range of values that is possible at time t. For example, if the moving speed was zero, the radius r takes a set of values starting from zero.
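As a minimal sketch of the circular grid sampling in this example, the grid points (r, θ) can be converted to Cartesian pairs (Δp_x, Δp_y) as below. The function name and the handling of r = 0 as a single point are assumptions made for illustration.

```python
import math

def circular_grid_candidates(radii=(3, 4, 5), step_deg=10):
    """Return (dp_x, dp_y) pairs for every circular grid point (r, theta)."""
    candidates = []
    for r in radii:
        if r == 0:
            # All azimuths coincide at the centre, so emit it only once.
            candidates.append((0.0, 0.0))
            continue
        for deg in range(0, 360, step_deg):
            theta = math.radians(deg)
            candidates.append((r * math.cos(theta), r * math.sin(theta)))
    return candidates
```

With the defaults this yields 3 radii times 36 azimuths, that is, 108 position candidates; passing radii that start from zero covers the case of an object that was stationary at the previous time.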

As described above, for each person model whose candidate position change has been determined, only the upper part αh₀ is projected onto the image coordinate system, as shown in FIG. 8. The image regions projected in this way become the candidate regions in image x^t.

FIG. 9 is a schematic diagram showing an embodiment of obtaining candidate object motion information by sampling heights in real space.

According to FIG. 9, a plurality of pieces of candidate object motion information y^t that differ from one another in the real-space height change Δh^t are obtained.

Specifically, Δh^t is the change in height from the height h^{t-1*} at the previous time (t−1) and takes a plurality of candidate values representing different height changes. For example, a fixed change Δh can be set in advance, and positive and negative multiples of Δh can be used as the candidates for the height change, with the multiplying coefficient also varied within a predetermined range. For each person model whose candidate height change has been determined, only the upper part αh₀ is projected onto the image coordinate system, as shown in FIG. 9. The image regions projected in this way become the candidate regions in image x^t.
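The "positive and negative multiples of a fixed change" scheme can be sketched in a few lines. The function name, the integer coefficients, and the inclusion of the zero multiple (no change) are assumptions of this sketch rather than details given in the text.

```python
def signed_multiples(step, max_coeff):
    """Candidates k * step for k = -max_coeff, ..., 0, ..., +max_coeff."""
    return [k * step for k in range(-max_coeff, max_coeff + 1)]
```

For instance, `signed_multiples(0.5, 2)` gives five height-change candidates from -1.0 to 1.0; the same helper could serve for any other change that is sampled as signed multiples of a fixed step.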

FIG. 10 is a schematic diagram showing an embodiment of obtaining candidate object motion information by sampling tilts in the image coordinate system.

According to FIG. 10, for the image region x^t|_{y^t} of the upper part of the object model projected onto the image coordinate system, a plurality of pieces of candidate object motion information y^t that differ from one another in the tilt change Δa^t are obtained. Taking such candidate object motion information into account makes it possible, for example when the tracking target object is a person, to keep tracking through shape changes such as bending the body at the waist.

Specifically, Δa^t is the change in tilt from the orientation of the image region x^t|_{y^t} at the previous time (t−1) and takes a plurality of candidate values representing different tilt changes. For example, a fixed change Δa can be set in advance, and positive and negative multiples of Δa can be used as the candidates for the tilt change, with the multiplying coefficient also varied within a predetermined range. For each person model whose candidate tilt change has been determined, only the upper part αh₀ is projected onto the image coordinate system. The image regions projected in this way become the candidate regions in image x^t.

The generation of candidate object motion information y^t has been described above with reference to FIGS. 8 to 10. In the search for the optimal solution y^{t*} that satisfies equation (3) above, f(x) = argmax_{y∈Y} F(x, y), the value of the evaluation function F, that is, the score, is calculated for every y (∈ Y) formed by combining the candidate values of the changes Δp_x^t, Δp_y^t, Δh^t, and Δa^t described above, and the y that yields the largest of the calculated scores is taken as the optimal solution y^{t*}.

In this score calculation, the computational cost grows as the number of candidates increases, so it is also preferable to set a premise in advance that limits the number of combinations of changes and thereby reduces the cost. For example, suppose the situation in the observed space allows the assumption that a person's shape changes only by sitting down. Then, based on the prior knowledge that a person stops walking when sitting down, candidates for the height change and tilt change associated with sitting can be set only for the case where the position on the floor or ground does not change. That is, in this case, multiple candidate values are set for Δh^t and Δa^t only for the candidate (Δp_x^t, Δp_y^t) with no position change, so the total number of candidates y (∈ Y) can be reduced.
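The effect of such a premise on the size of the candidate set can be illustrated with a short sketch. The function and parameter names are hypothetical, and the choice to give moving candidates a zero height and tilt change is an assumption made to keep the example concrete.

```python
from itertools import product

def build_candidates(positions, dhs, das, restrict_to_stationary=False):
    """Combine candidate changes into tuples (dp_x, dp_y, dh, da).

    With restrict_to_stationary=True, height and tilt changes are varied
    only for the candidate with no position change (the "sitting" premise).
    """
    if not restrict_to_stationary:
        return [(px, py, dh, da) for (px, py), dh, da in product(positions, dhs, das)]
    out = []
    for px, py in positions:
        if px == 0 and py == 0:
            out.extend((px, py, dh, da) for dh, da in product(dhs, das))
        else:
            out.append((px, py, 0.0, 0.0))
    return out
```

With 3 position candidates and 3 values each for the height and tilt change, the full cross product has 27 entries, while the restricted set has 9 + 2 = 11.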

The tracking of the tracking target object using the evaluation function F is described below.

Returning to FIG. 3, in one embodiment, the tracking discriminator 114a of the object tracking unit 114 may apply the input pieces of candidate object motion information (Δp_x^t, Δp_y^t) and the image information relating to the image at time t to an evaluation function F having
(a) a term relating to a probability density function P_p whose variables are the changes Δp_x^t and Δp_y^t in the real-space position of the tracking target object, and
(b) a term that evaluates the closeness in appearance of the image region x^t|_{y^t}, calculated from the candidate object motion information, to the image region relating to the tracking target object,
and may output the candidate object motion information that maximizes the score of the evaluation function F as the correct position information Δp_x^{t*} and Δp_y^{t*} on the real-space position of the object at time t.

As a modification, it is also preferable that the candidate object motion information applied to the evaluation function F be (Δp_x^t, Δp_y^t, Δa^t) and that the tracking discriminator 114a output information that also includes the correct information Δa^{t*} on the tilt of the tracking target object at time t.

In another embodiment, the tracking discriminator 114a may apply the input pieces of candidate object motion information (Δp_x^t, Δp_y^t, Δh^t) and the image information relating to the image at time t to an evaluation function F having term (a) above, term (b) above, and in addition
(c) a term relating to a probability density function P_h whose variable is the change Δh^t in the height of the tracking target object,
and may output the candidate object motion information that maximizes the score of the evaluation function F as the correct information Δp_x^{t*}, Δp_y^{t*}, and Δh^{t*} on the real-space position and height of the object at time t.

As a modification, it is also preferable that the candidate object motion information applied to the evaluation function F be (Δp_x^t, Δp_y^t, Δh^t, Δa^t) and that the tracking discriminator 114a output information that also includes the correct information Δa^{t*} on the tilt of the tracking target object at time t.

In yet another embodiment, the tracking discriminator 114a may apply the input pieces of candidate object motion information (Δp_x^t, Δp_y^t, Δh^t) and the image information relating to the image at time t to an evaluation function F having term (a) above, term (b) above, term (c) above, and in addition
(d) a term that evaluates the degree to which the change caused by the motion of the object in the image region relating to the tracking target object matches the change given by the object motion information,
and may output the candidate object motion information that maximizes the score of the evaluation function F as the correct information Δp_x^{t*}, Δp_y^{t*}, and Δh^{t*} on the real-space position and height of the object at time t.

In the embodiment described below, the following equation, which includes all of terms (a) to (d) above, is adopted as the evaluation function F:
(6) F(x^t, y^t) = w_p P_p(Δp^{t-1*}, Δp_x^t, Δp_y^t) + w_h P_h(Δh^{t-1*}, Δh^t) + w_b Φ(x^{t-1}|_{y^{t-1}}, x^t|_{y^t}) + w_s Ψ(x^t|_{y^t})
The coefficients w_p, w_h, w_b, and w_s are weight parameters determined by learning. The larger this function value (score), the more suitable y^t is as a solution (the closer it is to the correct answer). The terms on the right-hand side of equation (6) are described in turn below.
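Equation (6) is a weighted sum of four term values, and the search picks the candidate with the largest sum. A minimal sketch follows, in which the dictionary keys and the callback `term_fn` (which is assumed to return the four term values for a candidate) are hypothetical names introduced only for illustration.

```python
def evaluate(weights, terms):
    """Weighted sum of equation (6); both arguments are dicts keyed by 'p', 'h', 'b', 's'."""
    return sum(weights[k] * terms[k] for k in ("p", "h", "b", "s"))

def best_candidate(candidates, weights, term_fn):
    """Return (y, score) for the candidate y with the highest score F.

    term_fn(y) is assumed to return the values of P_p, P_h, Phi and Psi for
    candidate y as a dict with keys 'p', 'h', 'b', 's'.
    """
    scored = [(evaluate(weights, term_fn(y)), y) for y in candidates]
    best_score, best_y = max(scored, key=lambda s: s[0])
    return best_y, best_score
```

Because the score is linear in the weights, learning only has to determine the four numbers w_p, w_h, w_b, and w_s.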

FIG. 11 is a graph showing an example of the probability density function P_p for the position change in the evaluation function F.

As shown in FIG. 11, P_p(Δp^{t-1*}, Δp_x^t, Δp_y^t) in the first term of the evaluation function F of equation (6) is a probability density function of the position changes Δp_x^t and Δp_y^t at time t, calculated from the position change Δp^{t-1*} = (Δp_x^{t-1*}, Δp_y^{t-1*}) at the previous time (t−1). Specifically, using a predetermined variance-covariance matrix Σ, the probability density function P_p can be defined as the bivariate normal distribution N(Δp^{t-1*}, Σ) whose mean is Δp^{t-1*}.

Adopting in the evaluation function F a probability density function P_p based on such a predetermined probabilistic model makes it possible to estimate, given the amount of movement at the previous time (t−1), an amount of movement at time t that is sufficiently likely to occur.
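For reference, the density of a bivariate normal distribution can be evaluated directly from its 2x2 covariance matrix. The sketch below is a generic implementation of that formula; the function name and the list-of-lists representation of Σ are assumptions, and the mean would be the previous position change in the setting described above.

```python
import math

def bivariate_normal_pdf(x, mean, cov):
    """Density of N(mean, cov) at x; cov is a 2x2 list [[sxx, sxy], [sxy, syy]]."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("cov must be positive definite")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mean)^T cov^{-1} (x - mean), using the 2x2 inverse.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))
```

A candidate whose position change equals the previous one receives the highest density, and the density falls off as the candidate departs from it, at a rate set by Σ.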

FIG. 12 is a graph showing an example of the probability density function P_h for the height change in the evaluation function F.

As shown in FIG. 12, P_h(Δh^{t-1*}, Δh^t) in the second term of the evaluation function F of equation (6) is a probability density function of the height change Δh^t at time t, calculated from the height change Δh^{t-1*} at the previous time (t−1). Specifically, using a predetermined variance σ, the probability density function P_h can be defined as the normal distribution N(Δh^{t-1*}, σ) whose mean is Δh^{t-1*}.

Adopting in the evaluation function F a probability density function P_h based on such a predetermined probabilistic model makes it possible to estimate, given the amount of height change at the previous time (t−1), an amount of height change at time t that is sufficiently likely to occur.

Next, Φ(x^{t-1}|_{y^{t-1}}, x^t|_{y^t}) in the third term of the evaluation function F of equation (6) is a function that evaluates the image region x^t|_{y^t}, which is obtained by projecting the three-dimensional object model defined by the parameters Δp_x^t, Δp_y^t, and Δh^t onto the image coordinate system and rotating the resulting region by Δa^t. It uses a difference image to evaluate movement within the image.

Here, a difference image is an image made up of points whose luminance value corresponds to the difference in luminance between a point in image x^{t-1} and the corresponding point in image x^t. Writing x^t(u, v) for the luminance value of the point (u, v) in the image x^t at time t, the luminance value x_bg^{t-1,t} of each point (u, v) in the difference image of images x^{t-1} and x^t is defined by
(7) x_bg^{t-1,t}(u, v) = |x^{t-1}(u, v) − x^t(u, v)|
Because image luminance values are in many cases defined to lie in the range from zero to 255 ([0, 255]), the luminance value of each point in the difference image may be divided by 255 to normalize it to the range from zero to 1 ([0, 1]).
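Equation (7), together with the optional normalization by 255, can be written as a short sketch. Images are assumed here to be equally sized nested lists of luminance values, which is a representation chosen only for illustration.

```python
def difference_image(prev, curr, normalize=True):
    """Per-pixel |prev - curr| for two equally sized 2D lists of luminance values."""
    scale = 255.0 if normalize else 1.0
    return [[abs(p - c) / scale for p, c in zip(row_p, row_c)]
            for row_p, row_c in zip(prev, curr)]
```

Pixels that did not change between the two frames map to zero, so only regions where something moved carry weight in the later evaluation.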

FIG. 13 is an image showing an example of a difference image. As shown in FIG. 13, the difference image exhibits a luminance distribution that reflects the movement of the object in the acquired images.

Φ(x^{t-1}|_{y^{t-1}}, x^t|_{y^t}), which is based on such a difference image, is defined by equation (8). In equation (8), the region obtained by removing from an image region A the points that also belong to an image region B is written A − B, and the area (number of pixels) of a region C is written |C|.
(8) Φ(x^{t-1}|_{y^{t-1}}, x^t|_{y^t}) = (1 / |A − B|) Σ_{(u, v) ∈ A − B} x_bg^{t-1,t}(u, v), with A = x^{t-1}|_{y^{t-1}} and B = x^t|_{y^t}, when |A − B| ≠ 0
That is, when the area |x^{t-1}|_{y^{t-1}} − x^t|_{y^t}| of the region that belongs to x^{t-1}|_{y^{t-1}} but not to x^t|_{y^t} is not zero, Φ takes the value obtained by dividing the sum of the luminance values in that region by its area (number of pixels), in other words the average luminance value of the difference image over that region. In general, the luminance difference is larger in regions where movement has occurred, so the closer the region x^t|_{y^t} is to the image region that the object actually occupies at time t, the larger the value of Φ. The function Φ can therefore evaluate the degree to which the movement of the object in real space matches the movement of the object in the image.
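The averaging described for Φ can be sketched as follows. Regions are assumed to be Python sets of (u, v) pixel coordinates and the difference image a nested list indexed as `diff[v][u]`; returning 0.0 when the set difference is empty is an assumption of this sketch, since the text here only describes the non-zero case.

```python
def motion_score(diff, region_prev, region_curr):
    """Mean difference-image value over pixels in region_prev but not in region_curr."""
    only_prev = region_prev - region_curr
    if not only_prev:
        return 0.0  # assumed fallback; not specified in the description above
    return sum(diff[v][u] for (u, v) in only_prev) / len(only_prev)
```

A candidate region that leaves behind exactly the pixels the object has vacated averages over high-difference pixels only, which is what drives the score up.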

Finally, Ψ(x^t|_{y^t}) in the fourth term of the evaluation function F of equation (6) is a function that evaluates the image region x^t|_{y^t}, which is obtained by projecting the three-dimensional object model defined by the parameters Δp_x^t, Δp_y^t, and Δh^t onto the image coordinate system and rotating the resulting region by Δa^t. It evaluates how close the appearance of the image region x^t|_{y^t} is to that of the image region relating to the tracking target object.

The appearance calculated from the image region x^t|_{y^t} can be modeled using, for example, a color histogram of the region or Haar-like features. The appearance within the region is converted into a feature vector, and closeness is evaluated on that vector. Haar-like features are described, for example, in the non-patent document P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features", Proceedings of IEEE Computer Vision and Pattern Recognition (CVPR), vol. 1, pp. 511-518, 2001.

FIG. 14 is a schematic diagram showing an example of converting the appearance of the image region x^t|_{y^t} into a feature vector.

According to FIG. 14, a luminance histogram is generated for the pixels in the image region x^t|_{y^t}. The luminance histogram generated in this example divides the range of luminance values 0 to 255 into a plurality of intervals and shows, as a bar chart, the number (frequency) of pixels whose luminance value falls in each interval. FIG. 14 shows an example in which the luminance range is divided into six intervals.

Here, the feature vector is a vector whose components are the frequencies (pixel counts) of the luminance intervals. In the example of FIG. 14, a six-dimensional feature vector is generated. Naturally, the content and dimensionality of the feature vector are not limited to this example. Any quantity that characterizes the image region x^t|_{y^t} can be adopted as a component of the feature vector.
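A luminance-histogram feature vector of this kind can be sketched as below. Equal-width intervals and integer luminance values are assumptions of the sketch; the description only states that the range 0 to 255 is divided into several intervals.

```python
def luminance_histogram(pixels, n_bins=6, max_value=255):
    """Count pixels per luminance interval; returns a list of n_bins counts."""
    hist = [0] * n_bins
    for value in pixels:
        # Map 0..max_value onto bin indices 0..n_bins-1 using integer arithmetic.
        idx = min(value * n_bins // (max_value + 1), n_bins - 1)
        hist[idx] += 1
    return hist
```

The resulting six counts form the feature vector of the region, and two regions can then be compared by any vector distance or similarity measure.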

Next, the learning process in the tracking discriminator 114a of the object tracking unit 114 is described.

Returning to FIG. 3, the tracking discriminator 114a of the object tracking unit 114 learns online using the teacher data sets generated by the teacher data set generation unit 114b. Specifically, taking the detection time as zero, at time t when the image x^t to be tracked is acquired, the tracking discriminator 114a is in a state in which it has been repeatedly updated by learning at each earlier time 1, 2, ..., t−1 using the correct-answer data of that time.

Specifically, a structured learning approach is used for learning. Structured learning is a type of machine learning that learns a function which, for an unknown input, outputs data having an appropriate structural relationship (dependency). The tracking discriminator 114a learns, for the tracking target object, the structural relationship between the position and height information in real space and the image region obtained when the object is projected into the image. In the present embodiment, a structured SVM is used as the algorithm for learning the mapping f: X → Y given by the evaluation function F(x, y) in equation (3) above, f(x) = argmax_{y∈Y} F(x, y). Structured SVMs are described, for example, in the non-patent document Ioannis Tsochantaridis, Thorsten Joachims, Thomas Hofmann, and Yasemin Altun, "Large Margin Methods for Structured and Interdependent Output Variables", Journal of Machine Learning Research 6, pp. 1453-1484, 2005.

As teacher data sets for learning, the following can be used, for example:
(a) a triple (x^j, y^{j*}, 1) consisting of an image x^j, the correct object motion information y^{j*} that maximizes the value of the evaluation function F, and the positive label 1 marking a correct answer; and
(b) a triple (x^k, y^k, −1) consisting of an image x^k, incorrect object motion information y^k that minimizes the value of the evaluation function F, and the negative label −1 marking an incorrect solution.
Here, the total number of triples (x^j, y^{j*}, 1) and (x^k, y^k, −1), that is, the number of teacher data sets, is n, and hereinafter (x^j, y^{j*}) and (x^k, y^k) are written collectively as (x^i, y^i) (i = 1, 2, ..., n). In (b) above, a y^k that does not maximize the value of F can also be used as the incorrect object motion information y^k.

Here, the weight parameter w = (w_p, w_h, w_b, w_s) of the evaluation function F is derived by optimizing the objective function defined by equation (9). In equation (9), L(y) is a loss function, which is defined by equation (10). In equation (10), y^* is the correct-answer data for the input x. The loss function L(y) is zero only when y = y^*; otherwise it takes a positive value that grows with the deviation between y and the correct answer y^*, so that it reflects the structural relationship of y.
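For orientation only, the margin-rescaling formulation of the structured SVM given in the cited paper by Tsochantaridis et al. can be written, in the notation used here, as shown below. This is the generic textbook form and is not claimed to be identical to equation (9); the regularization constant C and the slack variables ξ_i are symbols introduced for this sketch.

```latex
\begin{aligned}
\min_{w,\ \xi}\quad & \frac{1}{2}\lVert w\rVert^{2} + \frac{C}{n}\sum_{i=1}^{n}\xi_i \\
\text{s.t.}\quad & \xi_i \ge 0 && \forall i, \\
& F(x^{i}, y^{i}) - F(x^{i}, y) \;\ge\; L(y) - \xi_i && \forall i,\ \forall y \in Y \setminus \{y^{i}\}.
\end{aligned}
```

The constraints require the correct answer to outscore every other candidate by a margin that grows with the loss L(y), so candidates far from the correct answer must be rejected more decisively than near misses.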

In this way, the tracking discriminator 114a determines the weight parameter w = (w_p, w_h, w_b, w_s) of the terms of the evaluation function F by online structured learning and, using the evaluation function F with the determined weight parameter w, processes the input image x^t to calculate the object motion information y^{t*} to be output. The weight parameter w determined by learning defines the discriminating hyperplane shown in FIG. 5.

To summarize the learning and determination described above, the tracking discriminator 114a of the present embodiment learns at the previous time (t−1) from a data set generated using the object motion information y^{t-1*} output as the correct answer. At time t it takes the acquired image x^t as input, processes it using the parameters w = (w_p, w_h, w_b, w_s) determined by structured learning of the evaluation function F, and outputs the correct object motion information y^{t*} at time t. As a result, even when the place where the tracking target object touches the floor or ground cannot be identified in the image, or when the shape or height of the object changes, the object can be tracked from the acquired images with high positional accuracy in real space while a unique identifier (ID) continues to be assigned to it.

The object position/shape estimation unit 115 determines the change in the real-space position, height, and/or tilt of the tracking target object over a predetermined time range, based on the object motion information y^{t*} input from the object tracking unit 114 or on the moment-by-moment real-space position, height, and/or tilt information of the tracking target object. It is also preferable to determine this information, the flow line of the tracking target object, and events on the flow line such as sitting down or bowing, and to store them in the tracked object management unit 104. It is also preferable that such object position/shape estimation results be transmitted to an external information processing device via the communication control unit 121 and the communication interface 101, for example in response to a request from that device.

As described in detail above, according to the present invention, tracking takes into account not only the image information relating to the acquired images but also "object motion information" that includes constraints in real space. As a result, the object can be tracked from the acquired images while high positional accuracy in real space is maintained.

In addition, by incorporating into the "object motion information" not only the position change of the tracking target object but also its height change and tilt change, the object can be tracked with high positional accuracy in real space even when its shape or height changes. It also becomes possible to estimate not only the moment-by-moment position of the object but also its moment-by-moment height and shape.

The configuration and method of the present invention are applicable to a variety of systems, for example a monitoring system that watches a place where people move about, sit down, or bend over, and a marketing research system that surveys how people enter stores, rest, watch games, take part in events, and move around in shopping streets or commercial and service facilities.

With respect to the various embodiments of the present invention described above, those skilled in the art can readily make various changes, modifications, and omissions within the scope of the technical idea and viewpoint of the present invention. The foregoing description is merely an example and is not intended to be limiting in any way. The present invention is limited only as defined by the claims and their equivalents.

Reference Signs List
1 object tracking device
101 communication interface
102 image storage unit
103 ID storage unit
104 tracking object management unit
111 object detection unit
111a detection discriminator
111b height calculation unit
112 ID management unit
112a object integration unit
112b object registration unit
113 candidate information calculation unit
114 object tracking unit
114a tracking discriminator
114b teacher data set generation unit
115 object position/shape estimation unit
121 communication control unit
2 camera

Claims

An apparatus capable of tracking an object to be tracked using a time-series image group acquired from one or more cameras capable of capturing the object, the apparatus comprising:
candidate information calculation means for adopting, as object motion information that is information including position information on the position of the object in real space at one time point, at least the amount of change from the previous time point in the position of the object in real space, and for calculating a plurality of pieces of candidate object motion information, which are pieces of object motion information at the one time point that differ from one another in the amount of change; and
object tracking means for acquiring moment-by-moment position information of the object in real space by means of a discriminator that has learned from a data set including image information of acquired images and the object motion information regarded as the correct answer, the discriminator applying the input plurality of pieces of candidate object motion information and the image information of the image at the one time point to an evaluation function having a term relating to a probability density function whose variable is the amount of change in the position of the object in real space and a term for evaluating the closeness in appearance of the image area calculated from the candidate object motion information to the image area of the object, and outputting the candidate object motion information that maximizes the score of the evaluation function as the correct position information on the position of the object in real space at the one time point.

The object tracking device according to claim 1, wherein the candidate information calculation means adopts, as the object motion information at the one time point, the amount of change from the previous time point in the position of the object in real space and the amount of change from the previous time point in the height of the object, and calculates a plurality of pieces of candidate object motion information that differ from one another in at least one of these amounts of change at the one time point, and
the discriminator of the object tracking means applies the input plurality of pieces of candidate object motion information and the image information of the image at the one time point to an evaluation function having a term relating to a probability density function whose variable is the amount of change in the position of the object in real space, a term relating to a probability density function whose variable is the amount of change in the height of the object, and a term for evaluating the closeness in appearance of the image area calculated from the candidate object motion information to the image area of the object, and outputs the candidate object motion information that maximizes the score of the evaluation function as information on the correct answer regarding the position of the object in real space and the height of the object at the one time point.

The object tracking device according to claim 1 or 2, wherein the candidate information calculation means adopts, as the object motion information at the one time point, the amount of change from the previous time point in the position of the object in real space and the amount of change from the previous time point in the height of the object, and calculates a plurality of pieces of candidate object motion information that differ from one another in at least one of these amounts of change at the one time point, and
the discriminator of the object tracking means applies the input plurality of pieces of candidate object motion information and the image information of the image at the one time point to an evaluation function having a term relating to a probability density function whose variable is the amount of change in the position of the object in real space, a term relating to a probability density function whose variable is the amount of change in the height of the object, a term for evaluating the degree to which the change in the image area of the object caused by the motion of the object matches the amount of change in the object motion information, and a term for evaluating the closeness in appearance of the image area calculated from the candidate object motion information to the image area of the object, and outputs the candidate object motion information that maximizes the score of the evaluation function as information on the correct answer regarding the position of the object in real space and the height of the object at the one time point.

The object tracking device according to any one of claims 1 to 3, wherein the candidate information calculation means further adopts, as the object motion information at the one time point, the amount of change from the previous time point in the tilt of the object, and
the discriminator of the object tracking means outputs the candidate object motion information that maximizes the score of the evaluation function as information including information on the correct answer regarding the tilt of the object at the one time point.

The object tracking device according to any one of claims 1 to 4, further comprising object detection means for detecting the object based on the acquired image, calculating the ground contact position of the object as the position of the object in real space based on the lowest position of the image area of the detected object, and calculating the height of the object based on the position in real space calculated from the uppermost position of the image area of the detected object and on the calculated ground contact position.

The object tracking device according to any one of claims 1 to 5, wherein the discriminator of the object tracking means determines the weighting coefficient of each term of the evaluation function by learning, and calculates the object motion information to be output by processing the input image information of the image using the evaluation function having the determined weighting coefficients.

The object tracking device according to any one of claims 1 to 6, wherein the discriminator of the object tracking means learns from a data set generated using the object motion information output as the correct answer before the one time point, receives as input the image information of the image at the one time point, processes the image information using the parameters determined by the learning, and outputs the object motion information regarded as the correct answer at the one time point.

The object tracking device according to any one of claims 1 to 7, wherein the discriminator of the object tracking means adopts, as the image area of the object, an image area calculated by applying coordinate conversion to the portion of the object extending from the upper end of the object in real space down to a position lower by a predetermined ratio of the height of the object.

The object tracking device according to any one of claims 1 to 8, wherein the discriminator of the object tracking means is constructed using a structured SVM (Structured Support Vector Machine) algorithm.

An object tracking program that causes a computer mounted on an apparatus capable of tracking an object to be tracked, using a time-series image group acquired from one or more cameras capable of capturing the object, to function as:
candidate information calculation means for adopting, as object motion information that is information including position information on the position of the object in real space at one time point, at least the amount of change from the previous time point in the position of the object in real space, and for calculating a plurality of pieces of candidate object motion information, which are pieces of object motion information at the one time point that differ from one another in the amount of change; and
object tracking means for acquiring moment-by-moment position information of the object in real space by means of a discriminator that has learned from a data set including image information of acquired images and the object motion information regarded as the correct answer, the discriminator applying the input plurality of pieces of candidate object motion information and the image information of the image at the one time point to an evaluation function having a term relating to a probability density function whose variable is the amount of change in the position of the object in real space and a term for evaluating the closeness in appearance of the image area calculated from the candidate object motion information to the image area of the object, and outputting the candidate object motion information that maximizes the score of the evaluation function as the correct position information on the position of the object in real space at the one time point.

An object tracking method performed by a computer for tracking an object to be tracked using a time-series image group acquired from one or more cameras capable of capturing the object, the method comprising:
a step of adopting, as object motion information that is information including position information on the position of the object in real space at one time point, at least the amount of change from the previous time point in the position of the object in real space, and calculating a plurality of pieces of candidate object motion information, which are pieces of object motion information at the one time point that differ from one another in the amount of change; and
a step of determining the position information of the object in real space at the one time point by means of a discriminator that has learned from a data set including image information of acquired images and the object motion information regarded as the correct answer, the discriminator applying the input plurality of pieces of candidate object motion information and the image information of the image at the one time point to an evaluation function having a term relating to a probability density function whose variable is the amount of change in the position of the object in real space and a term for evaluating the closeness in appearance of the image area calculated from the candidate object motion information to the image area of the object, and outputting the candidate object motion information that maximizes the score of the evaluation function as the correct position information on the position of the object in real space at the one time point,
wherein the steps are repeated to acquire moment-by-moment position information of the object in real space.