JP6792195B2

JP6792195B2 - Image processing device

Info

Publication number: JP6792195B2
Application number: JP2016177535A
Authority: JP
Inventors: 亮磨大網
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2016-09-12
Filing date: 2016-09-12
Publication date: 2020-11-25
Anticipated expiration: 2036-09-12
Also published as: JP2018045287A

Description

The present invention relates to an image processing apparatus, and more particularly to an image processing apparatus that tracks objects in images.

The technique of Patent Document 1 is a known method of object tracking with a camera. Patent Document 1 discloses a method that detects moving objects and temporarily stationary objects in an image and tracks each object by associating these detections with the tracking results obtained so far. When predicting the state, the method of Patent Document 1 tracks an object by giving priority to either the stationary state or the moving state, depending on the location.

Patent Document 1: Japanese Unexamined Patent Publication No. 2016-18374

However, the method of Patent Document 1 has the problem that once objects are incorrectly associated with one another, for example when they pass each other, the error cannot be corrected and tracking cannot recover. This is because, although each object has a state, the state of each individual object is fixed to a single value and then updated. In other words, either the moving-object detection result or the stationary-object detection result is selected, and the object's state is updated according to that selection, which causes the problem described above.

Thus, in the technique described above, if the choice between moving-object detection and stationary-object detection is wrong, the state is updated according to that wrong choice and subsequent tracking fails. As a result, object tracking accuracy decreases.

Accordingly, an object of the present invention is to provide an image processing apparatus that solves the above-described problem of reduced object tracking accuracy.

An image processing apparatus according to one aspect of the present invention includes:
a detection unit that detects objects in the images constituting a video;
a prediction unit that predicts the variables of an object to be tracked for each of a plurality of states that the object can take;
an association unit that performs association processing based on the objects detected in an image and the predicted variables of the objects to be tracked, the association processing including associating the objects to be tracked with the objects detected in the image and selecting a state of each object; and
an update unit that tracks the objects to be tracked based on the result of the association processing and updates the variables of each object for each of the plurality of states that the object can take,
wherein the prediction unit further predicts the variables of each object to be tracked for each state, based on the updated variables of the object for each of the plurality of states.

A program according to one aspect of the present invention causes an information processing apparatus to implement:
a detection unit that detects objects in the images constituting a video;
a prediction unit that predicts the variables of an object to be tracked for each of a plurality of states that the object can take;
an association unit that performs association processing based on the objects detected in an image and the predicted variables of the objects to be tracked, the association processing including associating the objects to be tracked with the objects detected in the image and selecting a state of each object; and
an update unit that tracks the objects to be tracked based on the result of the association processing and updates the variables of each object for each of the plurality of states that the object can take,
wherein the prediction unit further predicts the variables of each object to be tracked for each state, based on the updated variables of the object for each of the plurality of states.

An image processing method according to one aspect of the present invention includes:
detecting objects in the images constituting a video;
predicting the variables of an object to be tracked for each of a plurality of states that the object can take;
performing association processing based on the objects detected in an image and the predicted variables of the objects to be tracked, the association processing including associating the objects to be tracked with the objects detected in the image and selecting a state of each object;
tracking the objects to be tracked based on the result of the association processing and updating the variables of each object for each of the plurality of states that the object can take; and
further predicting the variables of each object to be tracked for each state, based on the updated variables of the object for each of the plurality of states.

With the configuration described above, the present invention can improve the accuracy of tracking objects in a video.

FIG. 1 is a block diagram showing the configuration of the information processing apparatus according to the first embodiment of the present invention.
FIG. 2 is a diagram showing the image processing performed by the information processing apparatus disclosed in FIG. 1.
FIG. 3 is a diagram showing the image processing performed by the information processing apparatus disclosed in FIG. 1.
FIG. 4 is a diagram showing the image processing performed by the information processing apparatus disclosed in FIG. 1.
FIG. 5 is a flowchart showing the operation of the information processing apparatus disclosed in FIG. 1.
FIG. 6 is a diagram showing an example of a model used in the image processing performed by the information processing apparatus disclosed in FIG. 1.
FIG. 7 is a diagram showing an example of a model used in the image processing performed by the information processing apparatus disclosed in FIG. 1.
FIG. 8 is a diagram showing an example of a model used in the image processing performed by the information processing apparatus disclosed in FIG. 1.
FIG. 9 is a block diagram showing the configuration of the image processing apparatus according to the second embodiment of the present invention.

[Embodiment 1]
The first embodiment of the present invention will be described with reference to FIGS. 1 to 8. FIG. 1 is a diagram illustrating the configuration of the information processing apparatus according to the first embodiment. FIGS. 2 to 4 are diagrams illustrating the image processing. FIG. 5 is a flowchart showing the image processing. FIGS. 6 to 8 are diagrams illustrating examples of models used in the image processing.

The present invention is used to capture video of a predetermined location with an imaging device such as a camera, and to detect and track objects such as people or cars that appear in the images constituting that video. To this end, the present invention includes an information processing apparatus 100 that serves as an image processing apparatus for processing the images constituting the video.

The information processing apparatus 100 in this embodiment is a general-purpose information processing apparatus including an arithmetic unit and a storage device. It may be a personal computer or server computer separate from the imaging device, or it may be a computer consisting of an arithmetic unit and a storage device built into the imaging device.

The imaging device may also be a so-called IP (Internet Protocol) camera connected to a network. In this case, the IP camera transmits the images constituting the captured video over the network to a computer that performs the image processing. Alternatively, the IP camera may perform the tracking processing on its own built-in computer and output the tracking results over the network to another computer or the like.

As shown in FIG. 1, the information processing apparatus 100 includes the following units, which are constructed by the arithmetic unit executing a program: an object detection unit 101 (detection unit), a per-state object description variable prediction unit 102 (prediction unit), a detection/tracking result association unit 103 (association unit), and a tracking target state/object description variable update unit 104 (update unit).

The object detection unit 101 receives the individual images (frame images) constituting the video, performs object detection on each input image, and outputs the object detection result to the detection/tracking result association unit 103. The per-state object description variable prediction unit 102 predicts, for each state, the current values of the object description variables of each tracked object, based on the per-state object description variable information obtained so far for each tracked object and output by the tracking target state/object description variable update unit 104. It outputs the resulting per-state object description variable predictions to the detection/tracking result association unit 103.

The detection/tracking result association unit 103 uses the per-state object description variable predictions output by the per-state object description variable prediction unit 102 and the object detection result output by the object detection unit 101 to determine the correspondence between the two, and outputs it to the tracking target state/object description variable update unit 104 as the detection/tracking association result. The tracking target state/object description variable update unit 104 obtains the tracking result from the input image and the detection/tracking association result output by the detection/tracking result association unit 103, updates the states and the object description variable values of the tracked objects, and outputs the per-state object description variable information to the per-state object description variable prediction unit 102.
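The data flow among units 101 to 104 can be illustrated with a minimal Python sketch. All class, function, and type names here are hypothetical, and the toy detector, predictor, association rule, and update rule are placeholders chosen only to make the loop runnable; they are not the patent's implementation. The point it shows is that prediction and update operate on every state hypothesis of a track, while association picks a detection and a selected state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Position = Tuple[float, float]
PerState = Dict[int, Position]   # state id -> position under that state hypothesis
Match = Tuple[int, int, int]     # (track index k, detection index m, selected state s)


@dataclass
class TrackingPipeline:
    detect: Callable[[object], List[Position]]                          # role of unit 101
    predict: Callable[[PerState], PerState]                             # role of unit 102
    associate: Callable[[List[PerState], List[Position]], List[Match]]  # role of unit 103
    update: Callable[[PerState, Position, int], PerState]               # role of unit 104
    tracks: List[PerState] = field(default_factory=list)

    def step(self, frame: object) -> List[Match]:
        detections = self.detect(frame)
        # Every state of every track is predicted; unmatched tracks keep their prediction.
        predicted = [self.predict(track) for track in self.tracks]
        matches = self.associate(predicted, detections)
        self.tracks = predicted
        for k, m, s in matches:
            # All per-state variables are refreshed, not only those of the selected state s.
            self.tracks[k] = self.update(predicted[k], detections[m], s)
        return matches


def nearest_state_match(predicted: List[PerState], detections: List[Position]) -> List[Match]:
    """Toy association: for each track, the (detection, state) pair with the smallest L1 distance.
    It is not one-to-one; a real association step would enforce that."""
    matches = []
    for k, per_state in enumerate(predicted):
        candidates = [(abs(x - dx) + abs(y - dy), m, s)
                      for m, (dx, dy) in enumerate(detections)
                      for s, (x, y) in per_state.items()]
        if candidates:
            _, m, s = min(candidates)
            matches.append((k, m, s))
    return matches


pipeline = TrackingPipeline(
    detect=lambda frame: frame,                                # toy: the "frame" already lists detections
    predict=lambda t: {0: t[0], 1: (t[1][0] + 1.0, t[1][1])},  # toy: moving state drifts +1 in x
    associate=nearest_state_match,
    update=lambda t, det, s: {0: det, 1: det},                 # toy: snap both hypotheses to the detection
    tracks=[{0: (0.0, 0.0), 1: (0.0, 0.0)}],
)
result = pipeline.step([(1.0, 0.0)])
print(result)
```

Passing the units in as callables keeps each one replaceable, which mirrors how the embodiment describes them as separate functional blocks.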

Next, the configuration and operation of the information processing apparatus 100 described above will be described in detail.

<Object states and object description variables>
In the following, the terms "state" and "object description variable" are used with distinct meanings. A "state" is one of a finite number of discrete states that each object can take. For example, a stationary state and a moving state are states; they correspond to the motion states described in Patent Document 1. However, states are not limited to these, and other states may also be defined.

In contrast, information such as an object's position, velocity, and brightness, and the likelihood of each state (which usually take continuous values), is referred to as "object description variables" and is kept distinct from the states described above (details are given later).

The term "object" covers not only things such as cars and luggage but also people and objects carrying a person. In at least one area of the screen, an object can take two or more states. The number and types of states an object can take may vary with the type of object and with the region of the screen.

<Object detection unit 101>
The object detection unit 101 performs object detection on each input image and outputs the result as the object detection result. For example, each time a frame image of the video is input, it performs object detection on that frame and outputs the detection result to the detection/tracking result association unit 103.

When the objects are people, person regions are detected using a detector that has learned the image features of people. For example, a detector based on HOG (Histograms of Oriented Gradients) features, or a detector that detects people directly from the image using a CNN (Convolutional Neural Network), may be used. Alternatively, people may be detected with a detector trained on part of a person (for example, the head) rather than the whole body. Cars can likewise be detected with a detector trained on the image features of vehicles. For other specific kinds of objects, a detector trained on the image features of that kind of object can be built and used.

Information on the detected objects is output as the object detection result. The object detection result includes the number of detected objects and information on each detected object.

The information on each detected object includes its detection position. The detection position may be the object's ground-contact position or its center position. The position may be expressed as two-dimensional screen coordinates, or converted into real-world coordinates (for example, a position on the floor) using the camera's pose parameters. The camera pose parameters can be obtained in advance by calibration using a known method.
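One common way to convert an image position into a floor position, assuming the floor is a plane, is to apply a 3×3 homography derived from the calibrated camera parameters. The sketch below assumes such a homography `H` is already available; the function names and the toy matrix are hypothetical, and the bottom center of the bounding box is used as the ground-contact position.

```python
from typing import List, Tuple

Matrix3 = List[List[float]]


def image_to_floor(H: Matrix3, u: float, v: float) -> Tuple[float, float]:
    """Map an image point (u, v) to floor-plane coordinates with a 3x3 homography H."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity (it lies on the horizon line)")
    return x / w, y / w  # divide out the homogeneous coordinate


def foot_point(box: Tuple[float, float, float, float]) -> Tuple[float, float]:
    """Bottom center of a box (left, top, right, bottom), used as the ground-contact position."""
    left, top, right, bottom = box
    return (left + right) / 2.0, bottom


# Toy scale-only homography; a real one comes from calibration.
H = [[0.5, 0.0, 0.0],
     [0.0, 0.25, 0.0],
     [0.0, 0.0, 1.0]]
print(image_to_floor(H, *foot_point((100.0, 50.0, 140.0, 250.0))))
```

Mapping back from floor to image coordinates uses the inverse homography in the same way.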

The object information may also include information indicating the object's size, such as the bounding rectangle of the object region in the image, or the object's width and height. It may also include information representing the object's shape, for example silhouette information representing the object region, or values extracted with an MPEG-7 shape descriptor. Here, silhouette information distinguishes pixels inside the object region from pixels outside it; for example, it is an image in which pixels inside the region are set to 255 and pixels outside are set to 0.
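Given a silhouette image that uses the 255/0 convention described above, the bounding rectangle and the width and height follow directly. This is a small sketch with a hypothetical function name, using nested lists in place of an image array.

```python
from typing import List, Optional, Tuple


def silhouette_bbox(mask: List[List[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (left, top, width, height) of the pixels set to 255, or None if there are none."""
    rows = [y for y, row in enumerate(mask) if any(p == 255 for p in row)]
    if not rows:
        return None
    cols = [x for x in range(len(mask[0])) if any(row[x] == 255 for row in mask)]
    left, top = cols[0], rows[0]
    return left, top, cols[-1] - left + 1, rows[-1] - top + 1


mask = [[0, 0, 0, 0, 0],
        [0, 255, 255, 0, 0],
        [0, 255, 255, 255, 0],
        [0, 0, 0, 0, 0]]
print(silhouette_bbox(mask))
```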

The object information may also include features describing the object's appearance, such as its color, texture pattern, and shape. It may further include likelihood information describing how reliable the detection is. Likelihood information is information used to compute the likelihood, that is, information related to detection reliability such as the detector score and the detected object's distance from the camera and size. Alternatively, the likelihood itself may be computed and used as the likelihood information. The resulting object detection result is output to the detection/tracking result association unit 103.
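The object detection result described above (a count plus per-object position, size, appearance, and likelihood information) maps naturally onto a small record type. The field names below are hypothetical, and the logistic mapping from a raw detector score to a likelihood is only one assumed way of "computing the likelihood itself"; its parameters `a` and `b` would have to be fitted to the detector in use.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DetectedObject:
    position: Tuple[float, float]                              # ground-contact or center position
    bbox: Optional[Tuple[float, float, float, float]] = None   # (left, top, width, height)
    appearance: List[float] = field(default_factory=list)      # e.g. a color histogram
    score: Optional[float] = None                              # raw detector score


@dataclass
class DetectionResult:
    objects: List[DetectedObject] = field(default_factory=list)

    @property
    def count(self) -> int:
        """The "number of objects" is simply the length of the object list."""
        return len(self.objects)


def detection_likelihood(score: float, a: float = 1.0, b: float = 0.0) -> float:
    """Hypothetical logistic calibration of a detector score into a likelihood in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


result = DetectionResult([DetectedObject((3.0, 4.0), bbox=(10, 20, 30, 80), score=2.0),
                          DetectedObject((7.5, 1.0), score=-1.0)])
print(result.count, [round(detection_likelihood(o.score), 3) for o in result.objects])
```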

<Per-state object description variable prediction unit 102>
The per-state object description variable prediction unit 102 predicts, for each state, the current values of the object description variables of each tracked object, based on the per-state object description variable information obtained so far for each tracked object and output by the tracking target state/object description variable update unit 104.

Per-state object description variable information is the object description variable information obtained for each state an object can take. For example, if an object can be in a moving state or a stationary state, the information includes the object description variables obtained for each of these states. If the object description variable is the object's position, the information includes the position assuming that the object is moving and the position assuming that it is stationary.
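A minimal way to hold per-state object description variables is a mapping from state to its own set of variables, as sketched below. The state ids, field names, and the helper `most_likely_state` are assumptions for illustration; the example values reuse the 0.8/0.2 stationary/moving likelihoods mentioned later in this description.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

STATIONARY, MOVING = 0, 1   # state ids assumed for this sketch


@dataclass
class StateVariables:
    position: Tuple[float, float]             # position assuming the object is in this state
    velocity: Tuple[float, float] = (0.0, 0.0)
    likelihood: float = 0.5                   # likelihood that the object is in this state


@dataclass
class TrackedObject:
    track_id: int
    per_state: Dict[int, StateVariables] = field(default_factory=dict)

    def most_likely_state(self) -> int:
        """State whose likelihood is currently highest (every state is still kept)."""
        return max(self.per_state, key=lambda s: self.per_state[s].likelihood)


obj = TrackedObject(7, {
    STATIONARY: StateVariables((2.0, 3.0), likelihood=0.8),
    MOVING: StateVariables((2.4, 3.0), velocity=(0.4, 0.0), likelihood=0.2),
})
print(obj.most_likely_state())
```

Keeping every state's variables, rather than only the most likely one, is what allows a wrong state choice at one time step to be corrected later.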

<Object description variables>
The object description variables are now described in detail. Object description variables describe the behavior of the tracked object, such as its appearance and position, and include its position and the parameters of a motion model. The position may be the object's coordinates on the screen or its coordinates in real space.

For a constant-velocity linear motion model, the motion model parameters include the object's current velocity. This model is commonly used because an object's motion over a short period can often be described as uniform linear motion. A motion model that also includes acceleration may be used instead, in which case acceleration is added as well. Velocity may likewise be expressed in screen coordinates or in real-space coordinates. Alternatively, when objects line up and move one-dimensionally according to a fixed rule, as in a queue, a one-dimensional parameter representing the position along the queue may be used as the motion model parameter. For example, as shown in FIG. 2, a coordinate axis running along the queue from its head may be defined, and a person's position may be expressed as a value on that axis. In this case, the prediction is a position in this queue coordinate system; when the queue advances at a constant speed, the coordinate value is predicted to decrease at that constant speed.
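The one-dimensional queue coordinate can be sketched by modeling the queue as a polyline that starts at its head, projecting a person's position onto the nearest segment, and measuring the arc length from the head. The polyline representation, the function names, and the clamp at the head (coordinate 0) are assumptions made for this sketch.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def queue_coordinate(polyline: List[Point], p: Point) -> float:
    """Arc length from the queue head (polyline[0]) to the projection of p onto the polyline."""
    best_dist, best_s, travelled = math.inf, 0.0, 0.0
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg_len = math.hypot(dx, dy)
        if seg_len == 0.0:
            continue
        # Parameter of the orthogonal projection, clamped to the segment.
        t = ((p[0] - x0) * dx + (p[1] - y0) * dy) / (seg_len ** 2)
        t = max(0.0, min(1.0, t))
        d = math.hypot(p[0] - (x0 + t * dx), p[1] - (y0 + t * dy))
        if d < best_dist:
            best_dist, best_s = d, travelled + t * seg_len
        travelled += seg_len
    return best_s


def predict_queue_coordinate(c: float, queue_speed: float, dt: float) -> float:
    """The queue advances toward its head at a constant speed, so the coordinate decreases."""
    return max(0.0, c - queue_speed * dt)


queue = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0)]   # L-shaped queue, head at (0, 0)
c = queue_coordinate(queue, (4.5, 2.0))
print(c, predict_queue_coordinate(c, queue_speed=0.5, dt=2.0))
```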

The object description variables may also include information representing changes in appearance. For example, if the object is a flying object that blinks as it flies, they may additionally include a variable describing the object's average brightness. Visual features describing the appearance, such as color, texture pattern, and shape, may also be held as object description variables. A quantity representing changes in the object's posture may also be held. For example, when the object is a person, this could be the person's apparent height, the ratio of the apparent height to the person's actual height, or the aspect ratio of the person's bounding rectangle.

The object description variables may also include likelihood information for each state. For example, if the states are a stationary state and a moving state, and at some point the likelihoods that the object is stationary and moving are 0.8 and 0.2, respectively, these values 0.8 and 0.2 may be included as likelihood information.

The values of these object description variables are computed for each state for the previous time (previous image), together with the tracking result, by the tracking target state/object description variable update unit 104 described later. The per-state object description variable prediction unit 102 predicts the current values of the object description variables for each state the object can take, based on the per-state values at the previous time. This is done for each tracked object.

For example, for the position among the object description variables, the current predicted position is computed from the previous position, taking into account the time elapsed since the previous update. This prediction follows the motion model defined for each state. Let [Equation 1] denote the position of the k-th object for the s-th state at time t, and [Equation 2] its velocity. Then the value at time t predicted from the state at time t−Δt, [Equation 3], is given by [Equation 4].

Here, [Equation 5] is the predicted amount of movement of the k-th object for the s-th state from time t−Δt to time t, and is determined by the motion model.
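Because the equation images are not reproduced in this text, the general form of the per-state prediction can only be written down under assumed notation. In the sketch below, $\mathbf{x}_{k,s}(t)$ is the position of the k-th object for the s-th state at time t, $\hat{\mathbf{x}}_{k,s}(t)$ its prediction, and $\Delta\mathbf{x}_{k,s}(t-\Delta t,\,t)$ the predicted movement given by the motion model of state s; these symbols are chosen here for illustration and may differ from the patent's own.

```latex
\hat{\mathbf{x}}_{k,s}(t) = \mathbf{x}_{k,s}(t-\Delta t) + \Delta\mathbf{x}_{k,s}(t-\Delta t,\, t),
\qquad s = 0, 1, \ldots, S_k - 1
```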

For example, suppose there are two possible states, a stationary state (s = 0) and a moving state (s = 1), and that the motion model of the moving state is constant-velocity linear motion. For s = 0 the object does not move, so the movement is given by [Equation 6].

For the moving state, s = 1, the movement is given by [Equation 7].

Here, the term [Equation 8] in [Equation 7] represents the deviation of the actual motion from the motion model; for example, a value obtained by drawing a random number from a Gaussian distribution can be used.
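The two-state example (stationary: no movement; moving: constant velocity plus a Gaussian deviation term) can be sketched as follows. The function name, the state ids, and the use of a single shared velocity and noise standard deviation for both axes are assumptions for this sketch. Setting `noise_sigma` to 0 gives the deterministic prediction; a positive value draws the deviation term, as a particle-style tracker would.

```python
import random
from typing import Dict, Optional, Tuple

Vec = Tuple[float, float]
STATIONARY, MOVING = 0, 1   # s = 0 and s = 1 in the example


def predict_positions(
    positions: Dict[int, Vec],      # position for each state at time t - dt
    velocity: Vec,                  # velocity held for the moving state
    dt: float,
    noise_sigma: float = 0.0,       # std. dev. of the Gaussian deviation term
    rng: Optional[random.Random] = None,
) -> Dict[int, Vec]:
    rng = rng or random.Random()
    predicted = {}
    for s, (x, y) in positions.items():
        if s == STATIONARY:
            dx, dy = 0.0, 0.0       # a stationary object does not move
        elif s == MOVING:
            # constant-velocity movement plus a Gaussian deviation from the model
            dx = velocity[0] * dt + rng.gauss(0.0, noise_sigma)
            dy = velocity[1] * dt + rng.gauss(0.0, noise_sigma)
        else:
            raise ValueError(f"no motion model for state {s}")
        predicted[s] = (x + dx, y + dy)
    return predicted


print(predict_positions({STATIONARY: (2.0, 3.0), MOVING: (2.0, 3.0)}, velocity=(1.0, -0.5), dt=0.5))
```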

When the position is expressed in a real-space coordinate system, the prediction should also be performed in that real-space coordinate system. The result can be converted back to image coordinates using the camera parameters that describe the camera's pose.

When the object description variables include appearance-related information such as brightness, the variation of that information over time may be modeled, and the value at time t may be predicted from the model. For example, when the brightness I varies according to a sine function, it can be expressed as [Equation 9], and this expression may be used to predict the brightness I at time t.
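Since [Equation 9] is not reproduced here, the sketch below assumes one common sinusoidal form, I(t) = mean + amplitude · sin(2πt/period + phase). The parameter names are hypothetical; in practice they would be estimated from the brightness observed while tracking the blinking object.

```python
import math


def predict_brightness(t: float, mean: float, amplitude: float,
                       period: float, phase: float = 0.0) -> float:
    """Assumed sinusoidal brightness model: mean + amplitude * sin(2*pi*t/period + phase)."""
    return mean + amplitude * math.sin(2.0 * math.pi * t / period + phase)


# A light blinking with a 2-second period around an average brightness of 100.
print([round(predict_brightness(t, 100.0, 40.0, 2.0), 1) for t in (0.0, 0.5, 1.0, 1.5)])
```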

When the observed brightness of the object changes not because the object's own brightness changes but because of the ambient lighting, information describing the lighting conditions at each location on the screen may be prepared separately. This information may then be used to find the position to which the object will move and to predict the object's brightness at that position.
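One simple way to represent "lighting conditions at each location" is a coarse grid of multiplicative gains over the screen. The sketch below assumes that representation (a hypothetical choice, not necessarily the patent's): it looks up the gain at the object's predicted position and scales the object's intrinsic brightness by it.

```python
from typing import List, Tuple


def lighting_gain(gain_map: List[List[float]], cell_size: float, pos: Tuple[float, float]) -> float:
    """Gain of the grid cell containing pos = (x, y); positions outside the map are clamped."""
    rows, cols = len(gain_map), len(gain_map[0])
    col = min(max(int(pos[0] // cell_size), 0), cols - 1)
    row = min(max(int(pos[1] // cell_size), 0), rows - 1)
    return gain_map[row][col]


def predict_observed_brightness(intrinsic: float, gain_map: List[List[float]],
                                cell_size: float, predicted_pos: Tuple[float, float]) -> float:
    """Assumed multiplicative model: observed = intrinsic * local lighting gain."""
    return intrinsic * lighting_gain(gain_map, cell_size, predicted_pos)


gains = [[1.0, 0.5],    # the right half of the upper area is in shadow
         [1.2, 0.8]]
print(predict_observed_brightness(120.0, gains, 10.0, (15.0, 3.0)))
```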

Similarly, when features vary with position, their values may be predicted according to position. For example, an object's texture becomes blurrier as the object moves farther from the camera, and its features change accordingly. The change in features may therefore be predicted from the distance to the camera or from the object's size on the screen, and that prediction used.

When the object description variables include posture information, changes in posture are modeled and predicted. For example, if the object is a person, the apparent height is included in the object description variables, and the person's possible states include a crouching/bending state, then for that state the person's apparent height may be predicted to decrease at a constant rate while the person crouches down.
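The crouching example can be sketched as a per-state rule on the apparent height: in the crouching state the height shrinks at a constant rate down to a floor value, and in other states it is carried over unchanged. The state names, the `shrink_rate` (height units per second), and the `min_height` floor are hypothetical parameters for illustration.

```python
def predict_apparent_height(h_prev: float, state: str, dt: float,
                            shrink_rate: float = 40.0, min_height: float = 0.0) -> float:
    """Crouching: height decreases at a constant rate, never below min_height.
    Any other state: the apparent height is assumed unchanged."""
    if state == "crouching":
        return max(min_height, h_prev - shrink_rate * dt)
    return h_prev


for state in ("standing", "crouching"):
    print(state, predict_apparent_height(170.0, state, dt=1.0, min_height=100.0))
```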

The per-state object description variable prediction unit 102 outputs the per-state object description variable predictions obtained as described above to the detection/tracking result association unit 103.

<Detection/tracking result association unit 103>
The detection/tracking result association unit 103 uses the per-state object description variable predictions output by the per-state object description variable prediction unit 102 and the object detection result output by the object detection unit 101 to determine the correspondence between them.

Let K be the number of objects currently being tracked and M the number of detected objects; association is performed between the K tracked objects and the M detected objects. First, the likelihood that the k-th tracked object corresponds to the m-th detected object, denoted q_{k,m}, is computed. Suppose the k-th tracked object can take S_k states, and let [Equation 10] denote the likelihood that the two correspond under the assumption that the k-th tracked object is in state s. The largest of these likelihoods is taken as the association likelihood between the k-th tracked object and the m-th detected object (see FIG. 3); that is, q_{k,m} is obtained by [Equation 11]. In addition, the state that maximizes the likelihood is selected and is called the "selected state".
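Taking the maximum over states for every (track, detection) pair yields a K×M likelihood matrix together with the selected state of each pair. In this sketch the per-state likelihood function is passed in as a parameter, and the example uses exp(−distance) purely as a placeholder; the function and variable names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

Vec = Tuple[float, float]


def association_matrix(
    predictions: List[Dict[int, Vec]],      # predictions[k][s]: predicted position of track k in state s
    detections: List[Vec],
    likelihood: Callable[[Vec, Vec], float],
) -> Tuple[List[List[float]], List[List[int]]]:
    """q[k][m] = max over states s of likelihood(prediction, detection); also return the argmax s."""
    q, selected = [], []
    for per_state in predictions:
        q_row, s_row = [], []
        for det in detections:
            s_best, p_best = max(((s, likelihood(pred, det)) for s, pred in per_state.items()),
                                 key=lambda item: item[1])
            q_row.append(p_best)
            s_row.append(s_best)
        q.append(q_row)
        selected.append(s_row)
    return q, selected


preds = [{0: (0.0, 0.0), 1: (2.0, 0.0)}]       # one track: stationary vs. moving hypothesis
dets = [(2.0, 0.0), (0.0, 0.5)]
q, selected = association_matrix(preds, dets, lambda a, b: math.exp(-math.dist(a, b)))
print(q, selected)
```

Each detection can thus be explained by a different state hypothesis of the same track; the final one-to-one assignment is a separate step applied to the matrix q.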

Here, the likelihood [Equation 10] may be based on how close the two are, becoming larger the closer they are. That is, it takes a larger value the smaller the length of the vector in [Equation 12].

Specifically, the relationship between distance and likelihood may be modeled by a monotonically non-increasing function of the distance, and the likelihood computed from the distance using that model. One possibility is a model based on the negative exponential distribution. Denoting this likelihood by [Equation 13], it is given by [Equation 14].
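[Equation 13] and [Equation 14] are not reproduced in this text. The following is therefore only one plausible form of an exponential distance likelihood, written with hypothetical symbols:

- x̂_k^(s): the predicted position of tracked object k in state s;
- x_m: the position of detected object m;
- λ > 0: a rate parameter.

```latex
d_{k,m}^{(s)} = \bigl\lVert \mathbf{x}_m - \hat{\mathbf{x}}_k^{(s)} \bigr\rVert,
\qquad
p_{k,m}^{(s)} = \lambda \exp\!\bigl(-\lambda\, d_{k,m}^{(s)}\bigr),
\qquad
q_{k,m} = \max_{s \in \{1,\dots,S_k\}} p_{k,m}^{(s)}
```

This likelihood equals λ at zero distance and decreases monotonically as the distance grows, which satisfies the non-increasing requirement. The final maximum mirrors the selection of q_{k,m} by [Equation 11].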

When both the object detection result and the predicted object description variables of the tracked object contain appearance information, such as visual features, their similarity may also be taken into account. The similarity is converted into a likelihood (denoted [Equation 15]) by a monotonically non-decreasing function of the similarity. That value is then multiplied by the distance-based likelihood described above to give the overall likelihood, as in [Equation 16]. Simple multiplication is only one option: various other methods can combine the similarity-based likelihood with the likelihood based on the closeness of the object positions.

Furthermore, q_{k,m} may take two further likelihoods into account:

- a likelihood representing how reliable the detection of detected object m is (denoted [Equation 17]);
- a likelihood representing how reliable the tracking of tracked object k is (denoted [Equation 18]).

In this case, q_{k,m} is the product that also includes these two likelihoods, as in [Equation 19].
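The multiplicative combination described in the preceding paragraphs might look like this in Python. The specific functions are assumptions chosen only to satisfy the stated monotonicity:

- an exponential of distance, which is non-increasing;
- exp(beta * (similarity - 1)) for a similarity in [0, 1], which is non-decreasing.

The parameter names `lam` and `beta` are hypothetical.

```python
import math


def combined_likelihood(distance: float, similarity: float,
                        det_conf: float, track_conf: float,
                        lam: float = 0.05, beta: float = 5.0) -> float:
    """Product of distance, appearance, detection and tracking likelihoods.

    distance   -> lam * exp(-lam * d)           (non-increasing in d)
    similarity -> exp(beta * (sim - 1))         (non-decreasing in sim)
    det_conf, track_conf: reliability of the detection / of the track.
    lam and beta are hypothetical tuning parameters.
    """
    p_dist = lam * math.exp(-lam * distance)
    p_app = math.exp(beta * (similarity - 1.0))
    return p_dist * p_app * det_conf * track_conf
```

Multiplying probabilities treats the four cues as independent. Summing their logarithms is equivalent and numerically safer when many small factors are involved.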

In this way, the association likelihood is obtained for every pair of tracked object k = 1, ..., K and detected object m = 1, ..., M. The correspondence can then be determined from these likelihoods, for example by converting each likelihood into a cost with a monotonically non-increasing function and applying an algorithm such as the Hungarian method.
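Below is a self-contained Python sketch of this step. It uses cost = -log(q), which is one choice of monotonically non-increasing conversion, and a standard potentials-based Kuhn-Munkres (Hungarian) solver. It assumes no more tracked objects than detected objects (rows ≤ columns); the function names are hypothetical.

```python
import math
from typing import List


def hungarian(cost: List[List[float]]) -> List[int]:
    """Minimum-cost assignment (Kuhn-Munkres with potentials, O(n^2 m)).

    cost is n x m with n <= m. Returns assign[i] = column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)  # row potentials (1-indexed)
    v = [0.0] * (m + 1)  # column potentials
    p = [0] * (m + 1)    # p[j]: row matched to column j (0 = unmatched)
    way = [0] * (m + 1)  # predecessor column on the alternating path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:  # grow the alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while True:  # augment along the path back to the root
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    assign = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assign[p[j] - 1] = j - 1
    return assign


def associate(q: List[List[float]], eps: float = 1e-12) -> List[int]:
    """Map likelihoods q[k][m] to costs with -log and solve the assignment."""
    cost = [[-math.log(max(x, eps)) for x in row] for row in q]
    return hungarian(cost)
```

Because -log turns a product of likelihoods into a sum of costs, the minimum-cost assignment is the one that maximizes the joint likelihood of all pairs. SciPy users could use `scipy.optimize.linear_sum_assignment` instead of the hand-written solver.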

Some detected objects may correspond to no tracked object, because they have newly appeared or because they are false detections. Conversely, some tracked objects may correspond to no detected object, because they have left the camera's field of view or because they were not detected. To account for these cases, M dummy nodes may be added on the tracked-object side and K dummy nodes on the detected-object side before the Hungarian method is applied.

Association with dummy nodes added is described with reference to FIG. 4, which shows the case K = 2 tracked objects and M = 3 detected objects. Accordingly, three dummy nodes are added on the tracking side and two on the detection side, after which the correspondence may be obtained with the Hungarian method.

The two kinds of dummy pairing get their likelihoods from different probabilities:

- **Tracked object with a dummy node:** the probability that the tracked object went undetected and the probability that it has disappeared from the screen.
- **Detected object with a dummy node:** the probability that the detection is false and the probability that a new object to be tracked has entered the screen.

Anything associated with a dummy node is treated as having no actual correspondence. A tracked object associated with a dummy node on the detection side is treated as having no corresponding detected object. Conversely, a detected object associated with a dummy node on the tracking side is treated as having no corresponding tracked object.
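The dummy-node construction can be sketched with a (K+M) x (K+M) likelihood matrix:

- rows are K tracked objects followed by M tracking-side dummies;
- columns are M detections followed by K detection-side dummies.

Several things in this sketch are assumptions:

- each real node may only pair with "its own" dummy;
- dummy-to-dummy pairs get a neutral likelihood of 1;
- the optimum is found by brute force over permutations. This is a plainly labeled stand-in for the Hungarian method, acceptable only for tiny examples such as FIG. 4 (K = 2, M = 3, 5! = 120 permutations).

```python
import itertools
import math
from typing import List, Tuple


def solve_with_dummies(q: List[List[float]],
                       p_track_unmatched: List[float],
                       p_det_unmatched: List[float]
                       ) -> Tuple[List[Tuple[int, int]], List[int], List[int]]:
    """Return (matched (k, m) pairs, unmatched tracks, unmatched detections)."""
    K, M = len(q), len(p_det_unmatched)
    n = K + M
    lik = [[0.0] * n for _ in range(n)]  # 0.0 marks a forbidden pairing
    for k in range(K):
        for m in range(M):
            lik[k][m] = q[k][m]
        lik[k][M + k] = p_track_unmatched[k]  # missed or left the view
    for m in range(M):
        lik[K + m][m] = p_det_unmatched[m]    # false detection or new object
        for k in range(K):
            lik[K + m][M + k] = 1.0           # dummy <-> dummy: neutral

    def cost(x: float) -> float:
        return math.inf if x <= 0.0 else -math.log(x)

    best, best_perm = math.inf, None
    for perm in itertools.permutations(range(n)):  # brute force, tiny n only
        c = sum(cost(lik[r][perm[r]]) for r in range(n))
        if c < best:
            best, best_perm = c, perm

    matches = [(k, best_perm[k]) for k in range(K) if best_perm[k] < M]
    lost_tracks = [k for k in range(K) if best_perm[k] >= M]
    new_dets = sorted(best_perm[r] for r in range(K, n) if best_perm[r] < M)
    return matches, lost_tracks, new_dets
```

A track paired with a detection-side dummy is reported as unmatched, and so is a detection paired with a tracking-side dummy, matching the treatment described above.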

The resulting correspondence is output as the detection/tracking association result to the tracking target state/object description variable update unit 104. The result also carries two further items: the information on the objects detected by the object detection unit 101, and the per-state predicted values of the object description variables calculated by the per-state object description variable prediction unit 102.

<Tracking target state/object description variable update unit 104>
The tracking target state/object description variable update unit 104 obtains the tracking result from the input image and from the detection/tracking association result output by the detection/tracking result association unit 103. It also updates the state of each tracked object and the values of its object description variables for each state.

Here, the values of the object description variables are updated according to the detected object associated with each tracked object and the state selected at that time.

The likelihood of each state, i.e., the probability that a tracked object is in that state, is denoted [Equation 20]. The state with the highest likelihood is called the maximum likelihood state (or simply the state) of the object. The state likelihoods may, for example, all be initialized to the same value and then updated according to the "selected state" from the detection/tracking result association unit 103, as described later. They may also be calculated by other methods.

The method of updating the object description variables of each tracked object is switched according to the likelihoods of its states. The state determined collectively from all the state likelihoods is called the object's "overall state". The overall state is determined from the state likelihoods, and the update method is switched according to it.

Two ways of determining the overall state are described:

- **Maximum likelihood state only.** The maximum likelihood state is taken as the overall state, and the update method depends on which state it is.
- **Margin over the other states.** The likelihood difference between the maximum likelihood state and the other states is also used. If that difference is at least a certain threshold, and can thus be regarded as sufficiently large, the maximum likelihood state is taken as the overall state. If the difference is below the threshold, the overall state is regarded as lying between the maximum likelihood state and the state with the second-highest likelihood, and the update also takes that second state into account.

Several formulas expressing how to update the object description variables are prepared and stored in advance, each associated with an overall state.
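The threshold rule for the overall state can be sketched as follows. The hypothetical `margin` parameter plays the role of the "certain threshold". Returning a tuple is an assumed convention that lets the result directly key a table of pre-stored update formulas.

```python
from typing import Dict, Tuple


def overall_state(likelihoods: Dict[str, float], margin: float = 0.2) -> Tuple[str, ...]:
    """Return (best,) if best leads the runner-up by >= margin, else (best, runner_up)."""
    ranked = sorted(likelihoods, key=likelihoods.get, reverse=True)
    if len(ranked) == 1:
        return (ranked[0],)
    best, second = ranked[0], ranked[1]
    if likelihoods[best] - likelihoods[second] >= margin:
        return (best,)
    return (best, second)  # "between" the two most likely states
```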

The following describes the case where the maximum likelihood state is used as the overall state and there are two possible states, stationary and moving. The object description variables are updated for each of the two states based on the object's maximum likelihood state.

When the maximum likelihood state, i.e., the overall state, is the stationary state, the object is considered likely to be stationary, and the object description variables are updated on that basis. The update depends on the state selected in the association:

- **Selected state is stationary.** The stationary-state position is not updated, and the moving-state position is updated to the stationary-state position. For example, when a Kalman filter is used, the Kalman filter is updated with the position of the associated detected object.
- **Selected state is moving.** The association may be wrong, for example because objects passed each other. The stationary-state position is therefore not updated, and only the moving-state position is updated with the position of the detected object.

When the maximum likelihood state, i.e., the overall state, is the moving state, the object is considered likely to be moving, and the update is performed on that basis. The update depends on the state selected in the association:

- **Selected state is stationary.** The association may be wrong, for example because objects passed each other. The stationary-state position is updated with the position of the detected object, but the moving-state position is not; it keeps its predicted value.
- **Selected state is moving.** The stationary-state position is updated with the position of the detected object, although its velocity remains 0 because it is the stationary state. The moving state is likewise updated with the position of the detected object.
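The four update cases of the two-state model can be summarized as a dispatch on (overall state, selected state). This is a minimal sketch of one literal reading of the two paragraphs above. Direct position assignment stands in for the Kalman filter update mentioned in the text. The `TwoStateTrack` class and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


@dataclass
class TwoStateTrack:
    """Hypothetical per-state variables of one track (positions only)."""
    stationary_pos: Point  # stationary-state position (velocity implicitly 0)
    moving_pos: Point      # moving-state position, already predicted for this frame


def update_two_state(track: TwoStateTrack, overall: str, selected: str,
                     det_pos: Point) -> TwoStateTrack:
    if overall == "stationary":
        if selected == "stationary":
            # Stationary position kept; moving-state position set to it.
            track.moving_pos = track.stationary_pos
        else:
            # Possible mis-association (e.g. passing): only the moving state
            # takes the detected position.
            track.moving_pos = det_pos
    elif overall == "moving":
        if selected == "stationary":
            # Stationary follows the detection; moving keeps its prediction.
            track.stationary_pos = det_pos
        else:
            # Both follow the detection; the stationary state's velocity stays 0.
            track.stationary_pos = det_pos
            track.moving_pos = det_pos
    else:
        raise ValueError(f"unknown overall state: {overall}")
    return track
```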

After the object description variables of each tracked object have been updated, the state likelihoods are updated as well. Basically, the likelihood of the selected state is raised and the likelihoods of the other states are lowered.

For example, the likelihood of the selected state may be increased by ΔP, and the likelihoods of the other states decreased slightly so that their total decrease equals ΔP. There are several ways to distribute this decrease among the states:

- It may be distributed evenly.
- If some state's likelihood would become negative, its decrease may be reduced so that it stays non-negative, and the decreases of the other states increased accordingly.

In either case, the likelihoods of all states are kept summing to 1.
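The ΔP update with even distribution and clipping at zero can be sketched as a small water-filling loop. Two assumptions are made here, and the function name is hypothetical:

- the likelihoods are a list that already sums to 1;
- ΔP is capped at the total mass available in the non-selected states, so that no likelihood goes negative.

```python
from typing import List


def update_state_likelihoods(p: List[float], selected: int, delta: float) -> List[float]:
    """Raise p[selected] by up to delta, taking the mass evenly from the other
    states; a state that would go negative is clipped at 0 and the rest of
    its share is redistributed over the remaining states. The sum is preserved."""
    p = list(p)
    others = [i for i in range(len(p)) if i != selected]
    delta = min(delta, sum(p[i] for i in others))  # cannot take more than exists
    remaining = delta
    active = [i for i in others if p[i] > 0.0]
    while remaining > 1e-12 and active:
        share = remaining / len(active)
        still_active = []
        for i in active:
            take = min(share, p[i])
            p[i] -= take
            remaining -= take
            if p[i] > 0.0:
                still_active.append(i)
        active = still_active
    p[selected] += delta - remaining  # add exactly what was removed
    return p
```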

The object's state is regarded as having changed when its maximum likelihood state changes to another state. For example, suppose the maximum likelihood state is stationary and the moving state is then selected several times. If, as a result of these updates, the likelihood of the moving state becomes higher than that of the stationary state, the object is regarded as having changed from the stationary state to the moving state.

With two states, each state can transition to the other. With three or more states, transitions may be permitted only between specific states. In that case, the states that can be chosen next are restricted according to the current maximum likelihood state. That is, the states included in the per-state object description variable information are limited to those to which the object can transition.

For example, consider a person with three states: moving, upright stationary, and crouching/bending. Direct transitions from the moving state to the crouching/bending state may be disallowed, so that the moving state can transition only to the upright stationary state. Likewise, the crouching/bending state may be prevented from transitioning directly to the moving state and allowed to transition only to the upright stationary state. This suits modeling customer behavior in a store, where a customer stops walking and then crouches or bends down to look at an item.
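The transition rules above can be sketched as a lookup table plus a maximum-likelihood check:

- the table restricts which states are kept as hypotheses for the next frame, given the current maximum likelihood state;
- the check detects a state change when the maximum likelihood state changes.

The table contents follow the three-state person example. The names `ALLOWED`, `candidate_states`, `ml_state`, and `transitioned` are hypothetical.

```python
from typing import Dict, Set

# Reachable next states for each current maximum likelihood state
# (three-state example: no direct moving <-> crouching transition).
ALLOWED: Dict[str, Set[str]] = {
    "moving": {"moving", "upright"},
    "upright": {"moving", "upright", "crouching"},
    "crouching": {"upright", "crouching"},
}


def candidate_states(current_ml_state: str) -> Set[str]:
    """States kept in the per-state description variables for the next frame."""
    return ALLOWED[current_ml_state]


def ml_state(likelihoods: Dict[str, float]) -> str:
    return max(likelihoods, key=likelihoods.get)


def transitioned(before: Dict[str, float], after: Dict[str, float]) -> bool:
    """A state change is declared when the maximum likelihood state changes."""
    return ml_state(before) != ml_state(after)
```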

In addition, tracked objects are newly created or deleted in two situations: when some tracked objects do not correspond to any detected object, and when some detected objects do not correspond to any tracked object.

If a detected object does not correspond to any tracked object, it is evaluated whether the object can be regarded as newly appearing. If that is likely, a new tracked object is added. Two example cues are:

- **High score, no nearby track.** When the likelihood (detection score) of the detected object is high and no corresponding tracking result exists nearby, the object may be regarded as newly appeared and a tracked object created.
- **Location.** An object detected near the edge of the screen, or near an area containing an entrance, has probably just entered the camera's field of view. It too may be regarded as newly appearing, and a corresponding tracked object created.

Conversely, if a tracked object does not correspond to any detected object, it is evaluated whether the object can be regarded as having disappeared. If that is likely, the tracked object is deleted. For example, a tracked object may have gone unmatched for several consecutive recent frames. In that case, the likelihood representing the reliability of its tracking may be lowered gradually, and the object deleted when the likelihood falls below a certain value. The decrease may also be made larger for tracked objects that were near the edge of the screen or near an entrance in the previous frame.
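The creation and deletion heuristics can be sketched as follows. Everything numeric here is a hypothetical tuning choice, not taken from the patent: the score and distance thresholds, the border width, the decay steps, and the deletion threshold. Entrance regions are not modeled; an entrance test could be OR-ed into `near_edge` in the same way.

```python
from dataclasses import dataclass


@dataclass
class Track:
    x: float
    y: float
    track_lik: float = 1.0  # reliability of the track (hypothetical scale 0..1)


def near_edge(x: float, y: float, width: float, height: float,
              border: float = 20.0) -> bool:
    return x < border or y < border or x > width - border or y > height - border


def should_spawn(det_score: float, dist_to_nearest_track: float,
                 x: float, y: float, width: float, height: float,
                 score_thr: float = 0.7, dist_thr: float = 50.0) -> bool:
    """New track if the detection is confident and isolated, or appears at the border."""
    confident_and_isolated = det_score >= score_thr and dist_to_nearest_track > dist_thr
    return confident_and_isolated or near_edge(x, y, width, height)


def decay_unmatched(track: Track, width: float, height: float,
                    step: float = 0.1, edge_step: float = 0.3,
                    delete_thr: float = 0.3) -> bool:
    """Lower the likelihood of an unmatched track (faster near the border);
    return True when the track should be deleted."""
    dec = edge_step if near_edge(track.x, track.y, width, height) else step
    track.track_lik = max(0.0, track.track_lik - dec)
    return track.track_lik < delete_thr
```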

From the object description information finally obtained for each tracked object, the information required for the tracking result is extracted or converted and output as the object tracking result. The tracking result contains at least the position of each tracked object, together with the corresponding time. It may also include other information needed to display the result overlaid on the screen, such as the object's bounding rectangle. When the result needs to be plotted on a map, the position of each object may be converted into map coordinates and included in the tracking result.

Meanwhile, the object description variable information of each object, including information not contained in the tracking result, is recorded for each state. This information is also output as per-state object description variable information to the per-state object description variable prediction unit 102, where it is used to predict the per-state object description variables at the next time step.

Next, the tracking part of the above operation is described with reference to a flowchart. This part is executed by the per-state object description variable prediction unit 102, the detection/tracking result association unit 103, and the tracking target state/object description variable update unit 104. Object detection by the object detection unit 101 is performed separately, and its result is input to the detection/tracking result association unit 103 for each frame image.

In step S1, the per-state object description variable prediction unit 102 predicts the values of the object description variables at the current time, for each state of each tracked object computed in the previous frame. In the first frame there are no tracked objects, so nothing is done here. The same applies when no tracked object remains after processing the previous frame.

In step S2, the object detection results calculated by the object detection unit 101 are input to the detection/tracking result association unit 103. The association likelihood between each tracked object and each detected object is then calculated for each state. For each pair of tracked object and detected object, the state giving the highest likelihood and that likelihood are obtained.

In step S3, the detection/tracking result association unit 103 associates the tracked objects with the detected objects.

In step S4, the tracking target state/object description variable update unit 104 updates the state of each tracked object.

In step S5, the tracking target state/object description variable update unit 104 creates and deletes tracked objects based on the unmatched tracked and detected objects, and generates the tracking result. When no tracked object exists, as in the first frame, every detected object is unmatched, so a tracked object is created for each detected object.

In step S6, it is checked whether the processed frame is the last one. If it is, processing ends. Otherwise, processing returns to step S1 and proceeds to the next frame.
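Steps S1 to S6 can be sketched as a frame loop that takes each stage as a callable. The function name `run_tracker` and the callback signatures are hypothetical. Detection is passed in as a separate function, reflecting that unit 101 runs independently of the tracking steps.

```python
from typing import Any, Callable, Iterable, List


def run_tracker(frames: Iterable[Any],
                detect: Callable[[Any], list],
                predict: Callable[[list], list],
                associate: Callable[[list, list], list],
                update_states: Callable[[list, list], list],
                spawn_and_prune: Callable[[list, list, list], list]) -> List[list]:
    """Per-frame loop following steps S1-S6 (hypothetical callback interface)."""
    tracks: list = []
    results: List[list] = []
    for frame in frames:                         # S6: repeat until the last frame
        detections = detect(frame)               # object detection (unit 101)
        predicted = predict(tracks)              # S1: per-state prediction (no-op if empty)
        assoc = associate(predicted, detections)  # S2 + S3: likelihoods and matching
        tracks = update_states(predicted, assoc)  # S4: update states and variables
        tracks = spawn_and_prune(tracks, detections, assoc)  # S5: create/delete tracks
        results.append(list(tracks))
    return results
```

In the first frame `tracks` is empty, so every detection is unmatched and `spawn_and_prune` creates a track for each one, as described for step S5.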

<Effects>
As described above, according to the present invention, an object's variables are updated while taking the likelihood of each of its states into account. Suppose a tracking target is wrongly associated with a detection, for example when objects pass each other. As long as the state has not yet transitioned, the update still preserves the previous state. If the correct association later becomes more likely overall, the correct correspondence can therefore be restored.

Specifically, in the present invention, the variables of the object to be tracked are predicted and updated for each of a plurality of states. Because they are updated and predicted per state, the variables for every state the object might be in remain available for subsequent images. Even if the object to be tracked is wrongly associated with a detected object, the correct association can therefore still be made later. As a result, the accuracy of tracking objects in the images constituting a video can be improved.

Next, examples of state models are described with reference to FIGS. 6 to 8.

<Example of a two-state model>
The following describes the case where the object's states are "moving" and "stationary". In the moving state the object is moving; in the stationary state it is at rest. FIG. 6 shows the state transitions in this case.

Such a state model has several uses:

- tracking people waiting in a queue;
- tracking customers in a store, whose behavior can be roughly divided into two states, moving or stopped;
- tracking cars, using the same moving/stationary model;
- scenes where different types of objects coexist, such as people and cars at an intersection. In that case, the prediction model used in each state may differ from object to object.

<Example of a three-state model>
As an example of a three-state model, the following describes the case where the object is a person and the states are "moving", "upright stationary", and "crouching/bending". The three states are:

- **Moving:** the same as in the two-state model above.
- **Upright stationary:** standing still.
- **Crouching/bending:** the person is stationary but is bending the body forward or crouching down. The position stays fixed, but the posture changes, so quantities such as the apparent height change considerably.

FIG. 7 shows the state transitions in this case.

As shown in FIG. 7, transitions can occur in both directions between the moving and upright stationary states, and between the upright stationary and crouching/bending states. There is no direct transition between the moving and crouching/bending states, because a transition between these two states normally passes through the upright stationary state.

Such a state model can be used, for example, to track customers in a store by dividing their behavior into three states:

- moving;
- standing still to look at products;
- crouching or bending to look at products on lower shelves or to reach for products.

It can also be used, for example, to extract the movement paths of employees who pick products from shelves according to orders in a distribution warehouse.

<Example of a four-state model>
The following describes the case where the object is a person and the states are "moving", "upright stationary", "crouching/bending", and "sitting". Apart from the sitting state, this is the same as the three-state model. The sitting state means sitting on a chair or the like. Unlike in the crouching/bending state, the position and posture remain largely unchanged for long periods. Defining sitting as a separate state, and controlling the prediction model, the tracking likelihood update model, and so on accordingly, therefore improves tracking accuracy. FIG. 8 shows the state transitions in this case.

Here too, there is no direct transition between the sitting and moving states; such transitions pass through the upright stationary state. Transitions are possible, on the other hand, between the sitting state and both the upright stationary and crouching/bending states.

However, as described above, sitting occurs only at specific places, so transitions to the sitting state may be prohibited when the object is elsewhere. The control therefore depends on the object's position:

- where sitting is impossible, the model is controlled in the same way as the three-state model above;
- where sitting is possible, the control includes the sitting state.
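The location-dependent four-state rule can be sketched by removing "sitting" from the allowed next states whenever the object lies outside every seating area. Two things are assumptions: representing seating areas as axis-aligned rectangles in image coordinates, and the names `BASE`, `in_any`, and `next_states`.

```python
from typing import Dict, List, Set, Tuple

Rect = Tuple[float, float, float, float]  # hypothetical seating area (x0, y0, x1, y1)

# Four-state transitions: sitting <-> moving only via the upright stationary state.
BASE: Dict[str, Set[str]] = {
    "moving": {"moving", "upright"},
    "upright": {"moving", "upright", "crouching", "sitting"},
    "crouching": {"upright", "crouching", "sitting"},
    "sitting": {"upright", "crouching", "sitting"},
}


def in_any(x: float, y: float, areas: List[Rect]) -> bool:
    return any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in areas)


def next_states(current: str, x: float, y: float, seating_areas: List[Rect]) -> Set[str]:
    """Allowed next states; outside seating areas this reduces to the three-state model."""
    allowed = set(BASE[current])
    if not in_any(x, y, seating_areas):
        allowed.discard("sitting")
    return allowed
```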

Beyond the uses of the three-state model above, such a state model can be used, for example, in stores with chairs in an eat-in corner or in front of a counter, where customers may sit down.

<Embodiment 2>
Next, a second embodiment of the present invention is described with reference to FIG. 10. FIG. 10 is a block diagram showing the configuration of the information processing device according to the present invention.

As shown in FIG. 10, the image processing device 10 includes a detection unit 11, a prediction unit 12, an association unit 13, and an update unit 14. These units are implemented by the device's arithmetic unit executing a program. The image processing device 10 also includes a storage device (not shown) that temporarily stores information to be processed, or already processed, by the units 11, 12, 13, and 14.

The units operate as follows:

- **Detection unit 11** detects objects from the images constituting a video.
- **Prediction unit 12** predicts the variables of an object to be tracked for each of a plurality of states that the object can take.
- **Association unit 13** performs association processing based on the objects detected from the image and the predicted variables of the objects to be tracked. This processing includes associating the objects to be tracked with the objects detected from the image, and selecting the object's state.
- **Update unit 14** tracks the objects to be tracked based on the result of the association processing, and updates each object's variables for each of the plurality of states it can take.

The prediction unit 12 then predicts the variables of the object to be tracked for each state, based on the updated per-state variables.

With the image processing device 10 configured as above, the variables of a tracked object are predicted for each of multiple states and updated for each of those states. Because the variables are kept per state, the variables for every state the object might plausibly be in remain available for subsequent images. Therefore, even if a tracked object is once incorrectly associated with a detected object, a correct association can still be made afterwards. As a result, the accuracy of tracking objects in the images that make up a video can be improved.

<Appendix>
Part or all of the above embodiments may also be described as in the following appendices. The configurations of the image processing device, program, and image processing method according to the present invention are outlined below. However, the present invention is not limited to the following configurations.

(Appendix 1)
An image processing device comprising:
a detection unit that detects objects in images constituting a video;
a prediction unit that predicts variables of an object to be tracked for each of a plurality of states that the object can take;
an association unit that performs association processing based on the objects detected in the image and the predicted variables of the object to be tracked, the association processing including associating the object to be tracked with an object detected in the image and selecting a state of the object; and
an update unit that, based on a result of the association processing, tracks the object to be tracked and updates the variables of the object for each of the plurality of states that the object can take,
wherein the prediction unit further predicts the variables of the object to be tracked for each state based on the updated variables of the object for each of the plurality of states.

(Appendix 2)
The image processing device according to Appendix 1, wherein
the update unit determines the state of the object to be tracked based on a likelihood for each state of the object, and updates the variables of the object for each of the plurality of states that the object can take, using a method corresponding to the determined state.

(Appendix 3)
The image processing device according to Appendix 2, wherein
the update unit calculates the likelihood for each state of the object to be tracked based on the selected state.

(Appendix 4)
The image processing device according to Appendix 2 or 3, wherein
the update unit determines, as the state of the object to be tracked, the state having the maximum likelihood among the per-state likelihoods of the object.
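Appendices 2 to 4 can be illustrated with a short Python sketch. It assumes that the per-state likelihood is an exponentially smoothed vote for the state selected during association (Appendix 3), that the state with the largest likelihood is taken as the object's state (Appendix 4), and that "a method corresponding to the determined state" is simply a state-dependent update gain applied to the variables of every state (Appendix 2). The smoothing factor, the gains, the 1-D positions, and all function names are hypothetical.

```python
STATES = ("stationary", "moving")
DECAY = 0.7  # hypothetical smoothing factor for the per-state likelihoods

# Hypothetical update methods: the gain used depends on the determined state.
GAIN = {"stationary": 0.2, "moving": 0.8}


def update_state_likelihood(lik, selected):
    """Recompute each state's likelihood from the state selected during association."""
    new = {s: DECAY * lik[s] + (1 - DECAY) * (1.0 if s == selected else 0.0)
           for s in STATES}
    total = sum(new.values())
    return {s: v / total for s, v in new.items()}  # keep the likelihoods normalized


def determine_state(lik):
    """The state with the maximum likelihood becomes the object's state."""
    return max(lik, key=lik.get)


def update_all_states(positions, detection, determined):
    """Update every state's variable with the method tied to the determined state."""
    g = GAIN[determined]
    return {s: p + g * (detection - p) for s, p in positions.items()}


lik = update_state_likelihood({"stationary": 0.5, "moving": 0.5}, "moving")
state = determine_state(lik)
positions = update_all_states({"stationary": 10.0, "moving": 12.0}, 14.0, state)
```

With these inputs `lik` becomes about `{"stationary": 0.35, "moving": 0.65}`, `state` is `"moving"`, and both hypotheses are moved with the "moving" gain to about 13.2 and 13.6.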

(Appendix 5)
The image processing device according to any one of Appendices 1 to 4, wherein
the update unit updates the variables of the object for each of the plurality of states that the object can take, based on the variables of the object detected in the image and the predicted variables of the object.

(Appendix 6)
The image processing device according to any one of Appendices 1 to 5, wherein
the association unit performs the association and the selection of the state based on the variables of the object detected in the image and the predicted variables of the object to be tracked.

(Appendix 7)
The image processing device according to Appendix 6, wherein
the association unit calculates, for each state, a likelihood that the object to be tracked corresponds to the detected object, based on the variables of the object detected in the image and the predicted variables of the object to be tracked, and performs the association and the selection of the state based on the calculated likelihood.
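The joint association and state selection of Appendix 7 can be sketched in Python as an exhaustive search. For every candidate assignment of tracks to detections, each pair is scored under every state, the best state for that pair is kept, and the assignment with the largest product of likelihoods wins. The Gaussian likelihood, `SIGMA`, the input layout, and the function names are hypothetical, and the search assumes at least as many detections as tracks. Because the best state for each pair can be chosen independently, larger problems could instead pass negative log-likelihoods to a Hungarian-algorithm solver such as `scipy.optimize.linear_sum_assignment`.

```python
import itertools
import math

SIGMA = 5.0  # hypothetical position noise


def pair_likelihood(pred_pos, det_pos):
    d2 = sum((a - b) ** 2 for a, b in zip(pred_pos, det_pos))
    return math.exp(-d2 / (2 * SIGMA ** 2))


def associate(tracks, detections):
    """tracks: list of {state: predicted (x, y)}; detections: list of (x, y).

    Returns (assignment, states): assignment[t] is the detection index for
    track t, and states[t] is the state selected for that pair.
    """
    best_score, best = -1.0, None
    for perm in itertools.permutations(range(len(detections)), len(tracks)):
        score, states = 1.0, []
        for t, d in enumerate(perm):
            # Per-state likelihood of this pairing; keep the most likely state.
            s, lik = max(((s, pair_likelihood(p, detections[d]))
                          for s, p in tracks[t].items()),
                         key=lambda item: item[1])
            score *= lik
            states.append(s)
        if score > best_score:
            best_score, best = score, (list(perm), states)
    return best


# Two people passing each other: each "moving" prediction lands on the other
# side of the crossing point.
tracks = [
    {"stationary": (0.0, 0.0), "moving": (6.0, 0.0)},
    {"stationary": (10.0, 0.0), "moving": (4.0, 0.0)},
]
detections = [(4.0, 0.0), (6.0, 0.0)]
assignment, states = associate(tracks, detections)
```

In this example the search returns `assignment == [1, 0]` with both tracks in the "moving" state, whereas scoring only the "stationary" predictions would pair each track with the nearer, wrong detection.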

(Appendix 8)
The image processing device according to any one of Appendices 1 to 7, wherein
the variables of the object include at least one of position information of the object, information representing a change in the appearance of the object, information representing a change in the posture of the object, and likelihood information for each state of the object.
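One possible in-memory layout for the object variables listed in Appendix 8 is sketched below with Python dataclasses. Each field is optional because the appendix requires only "at least one" of them. The field types and the example encodings in the comments are hypothetical, and a tracked object holds one such record per state.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ObjectVariables:
    """Variables of one tracked object under one state (all encodings hypothetical)."""
    position: Optional[Tuple[float, float]] = None   # e.g. foot point in image coordinates
    appearance_change: Optional[List[float]] = None  # e.g. a feature describing appearance change
    posture_change: Optional[float] = None           # e.g. change in bounding-box aspect ratio
    state_likelihood: Dict[str, float] = field(default_factory=dict)


@dataclass
class TrackedObject:
    track_id: int
    per_state: Dict[str, ObjectVariables] = field(default_factory=dict)


obj = TrackedObject(1, {
    "stationary": ObjectVariables(position=(120.0, 340.0)),
    "moving": ObjectVariables(position=(124.0, 338.0)),
})
```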

(Appendix 9)
The image processing device according to any one of Appendices 1 to 8, wherein
the plurality of states that the object can take are set based on the current state of the object.

(Appendix 10)
The image processing device according to any one of Appendices 1 to 9, wherein
the plurality of states that the object can take are set based on the position information of the detected object.

(Appendix 11)
The image processing device according to any one of Appendices 1 to 10, wherein
the states include a stationary state and a moving state.

(Appendix 12)
The image processing device according to any one of Appendices 1 to 11, wherein
the object is a person, and
the states include a stationary upright state, a moving state, and a crouching/bending state.

(Appendix 13)
The image processing device according to Appendix 12, wherein
the states further include a sitting state.

(Appendix 14)
The image processing device according to Appendix 13, wherein
the sitting state can be taken as a state only at a specific location.
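Appendices 9 to 14 can be illustrated by a Python sketch that builds the set of candidate states for a person from the current state and the detected position. The four states follow Appendices 12 and 13. The transition table, the rectangular seating areas, and all names are hypothetical, since the source does not state which transitions are allowed or how the specific locations are defined.

```python
from enum import Enum


class State(Enum):
    STANDING = "stationary upright"
    MOVING = "moving"
    CROUCHING = "crouching/bending"
    SITTING = "sitting"


# Hypothetical transitions: states reachable from the current state in one frame.
NEXT = {
    State.STANDING: {State.STANDING, State.MOVING, State.CROUCHING, State.SITTING},
    State.MOVING: {State.MOVING, State.STANDING},
    State.CROUCHING: {State.CROUCHING, State.STANDING},
    State.SITTING: {State.SITTING, State.STANDING},
}

# Hypothetical seating areas as rectangles (x0, y0, x1, y1) in image coordinates.
SEAT_AREAS = [(100, 300, 200, 400)]


def in_seat_area(pos):
    x, y = pos
    return any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in SEAT_AREAS)


def candidate_states(current, pos):
    """Candidate states from the current state (Appendix 9) and the detected
    position (Appendix 10); sitting is allowed only in a seat area (Appendix 14)."""
    states = set(NEXT[current])
    if not in_seat_area(pos):
        states.discard(State.SITTING)
    return states
```

For example, a standing person detected at (150, 350) may still become `SITTING`, while the same person detected at (10, 10) may not.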

(Appendix A1)
A program for causing an information processing device to implement:
a detection unit that detects objects in images constituting a video;
a prediction unit that predicts variables of an object to be tracked for each of a plurality of states that the object can take;
an association unit that performs association processing based on the objects detected in the image and the predicted variables of the object to be tracked, the association processing including associating the object to be tracked with an object detected in the image and selecting a state of the object; and
an update unit that, based on a result of the association processing, tracks the object to be tracked and updates the variables of the object for each of the plurality of states that the object can take,
wherein the prediction unit further predicts the variables of the object to be tracked for each state based on the updated variables of the object for each of the plurality of states.

(Appendix A2)
The program according to Appendix A1, wherein
the update unit determines the state of the object to be tracked based on a likelihood for each state of the object, and updates the variables of the object for each of the plurality of states that the object can take, using a method corresponding to the determined state.

(Appendix A3)
The program according to Appendix A1 or A2, wherein
the update unit updates the variables of the object for each of the plurality of states that the object can take, based on the variables of the object detected in the image and the predicted variables of the object.

(Appendix A4)
The program according to any one of Appendices A1 to A3, wherein
the association unit performs the association and the selection of the state based on the variables of the object detected in the image and the predicted variables of the object to be tracked.

(Appendix B1)
An image processing method comprising:
detecting objects in images constituting a video;
predicting variables of an object to be tracked for each of a plurality of states that the object can take;
performing association processing based on the objects detected in the image and the predicted variables of the object to be tracked, the association processing including associating the object to be tracked with an object detected in the image and selecting a state of the object;
tracking the object to be tracked based on a result of the association processing, and updating the variables of the object for each of the plurality of states that the object can take; and
further predicting the variables of the object to be tracked for each state based on the updated variables of the object for each of the plurality of states.

(Appendix B2)
The image processing method according to Appendix B1, comprising
determining the state of the object to be tracked based on a likelihood for each state of the object, and updating the variables of the object for each of the plurality of states that the object can take, using a method corresponding to the determined state.

(Appendix B3)
The image processing method according to Appendix B1 or B2, comprising
updating the variables of the object for each of the plurality of states that the object can take, based on the variables of the object detected in the image and the predicted variables of the object.

(Appendix B4)
The image processing method according to any one of Appendices B1 to B3, comprising
performing the association and the selection of the state based on the variables of the object detected in the image and the predicted variables of the object to be tracked.

The program described above is stored in a storage device or recorded on a computer-readable recording medium. The recording medium is, for example, a portable medium such as a flexible disk, an optical disk, a magneto-optical disk, or a semiconductor memory.

Although the invention of the present application has been described above with reference to the embodiments and the like, the invention is not limited to those embodiments. Various changes that those skilled in the art can understand may be made to the configuration and details of the invention within its scope.

According to the present invention, it is possible, for example, to extract the flow line (movement trajectory) of a person using a camera. This can be used, for example, to analyze the behavior of customers moving around a store as basic information for marketing or for changing the store layout, or, for security purposes, to detect a person loitering between areas.

10 Image processing device
11 Detection unit
12 Prediction unit
13 Association unit
14 Update unit
100 Information processing device
101 Object detection unit
102 Per-state object description variable prediction unit
103 Detection/tracking result association unit
104 Tracking target state/object description variable update unit

Claims

An image processing device comprising:
a detection unit that detects objects in images constituting a video;
a prediction unit that predicts variables of an object to be tracked for each of a plurality of states that the object can take;
an association unit that performs association processing based on the objects detected in the image and the predicted variables of the object to be tracked, the association processing including associating the object to be tracked with an object detected in the image and selecting a state of the object; and
an update unit that, based on a result of the association processing, tracks the object to be tracked and updates the variables of the object for each of the plurality of states that the object can take,
wherein the prediction unit further predicts the variables of the object to be tracked for each state based on the updated variables of the object for each of the plurality of states.

The image processing device according to claim 1, wherein
the update unit determines the state of the object to be tracked based on a likelihood for each state of the object, and updates the variables of the object for each of the plurality of states that the object can take, using a method corresponding to the determined state.

The image processing device according to claim 2, wherein
the update unit calculates the likelihood for each state of the object to be tracked based on the selected state.

The image processing device according to claim 2 or 3, wherein
the update unit determines, as the state of the object to be tracked, the state having the maximum likelihood among the per-state likelihoods of the object.

The image processing device according to any one of claims 1 to 4, wherein
the update unit updates the variables of the object for each of the plurality of states that the object can take, based on the variables of the object detected in the image and the predicted variables of the object.

The image processing device according to any one of claims 1 to 5, wherein
the association unit performs the association and the selection of the state based on the variables of the object detected in the image and the predicted variables of the object to be tracked.

The image processing device according to claim 6, wherein
the association unit calculates, for each state, a likelihood that the object to be tracked corresponds to the detected object, based on the variables of the object detected in the image and the predicted variables of the object to be tracked, and performs the association and the selection of the state based on the calculated likelihood.

The image processing device according to any one of claims 1 to 7, wherein
the variables of the object include at least one of position information of the object, information representing a change in the appearance of the object, information representing a change in the posture of the object, and likelihood information for each state of the object.

A program for causing an information processing device to implement:
a detection unit that detects objects in images constituting a video;
a prediction unit that predicts variables of an object to be tracked for each of a plurality of states that the object can take;
an association unit that performs association processing based on the objects detected in the image and the predicted variables of the object to be tracked, the association processing including associating the object to be tracked with an object detected in the image and selecting a state of the object; and
an update unit that, based on a result of the association processing, tracks the object to be tracked and updates the variables of the object for each of the plurality of states that the object can take,
wherein the prediction unit further predicts the variables of the object to be tracked for each state based on the updated variables of the object for each of the plurality of states.

An image processing method comprising:
detecting objects in images constituting a video;
predicting variables of an object to be tracked for each of a plurality of states that the object can take;
performing association processing based on the objects detected in the image and the predicted variables of the object to be tracked, the association processing including associating the object to be tracked with an object detected in the image and selecting a state of the object;
tracking the object to be tracked based on a result of the association processing, and updating the variables of the object for each of the plurality of states that the object can take; and
further predicting the variables of the object to be tracked for each state based on the updated variables of the object for each of the plurality of states.