JP2021117635A

JP2021117635A - Object tracking device and object tracking method

Info

Publication number: JP2021117635A
Application number: JP2020009676A
Authority: JP
Inventors: Hitoshi Nishimura (西村仁志); Kazuyuki Tasaka (田坂和之)
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2020-01-24
Filing date: 2020-01-24
Publication date: 2021-08-10
Anticipated expiration: 2040-01-24
Also published as: JP7229954B2

Abstract

PROBLEM TO BE SOLVED: To track an object precisely.
SOLUTION: An object tracking device 1 includes an object detection unit 122, a behavioral feature extraction unit 124, and an association unit 125. The object detection unit 122 detects objects in a plurality of captured images of a predetermined area taken at a plurality of times. The behavioral feature extraction unit 124 extracts a behavioral feature of each object based on the partial image of the captured image corresponding to the detected object. The association unit 125 associates, based on the behavioral features, the objects detected in captured images taken at different times. For each associated object, the behavioral feature extraction unit 124 re-extracts the behavioral feature based on the partial image corresponding to that object and the partial images corresponding to the other objects associated with it. The association unit 125 then re-associates the objects detected in the captured images taken at different times, based on the re-extracted behavioral features.
SELECTED DRAWING: Figure 3

Description

The present invention relates to an object tracking device and an object tracking method.

Tracking objects, such as persons, that appear in images captured by an imaging device is a known technique. For example, Non-Patent Document 1 discloses tracking an object using a position feature indicating where the object is, an appearance feature indicating how the object looks, and a behavioral feature indicating characteristics of the object's behavior.

Non-Patent Document 1: Gurkirt Singh, Suman Saha, Michael Sapienza, Philip H. S. Torr, and Fabio Cuzzolin, "Online Real-time Multiple Spatiotemporal Action Localisation and Prediction," IEEE International Conference on Computer Vision (ICCV), pp. 3637-3646, 2017.

When Non-Patent Document 1 extracts a behavioral feature from a captured image, it does not consider the relationship between the object in that image and the objects in other captured images. Behavior, however, only becomes apparent through movement over time, so it is difficult to extract an accurate behavioral feature from a single captured image. Consequently, with Non-Patent Document 1, tracking accuracy drops for objects whose positions and appearances are similar.

The present invention has been made in view of these points, and its object is to provide an object tracking device and an object tracking method capable of tracking an object accurately.

An object tracking device according to a first aspect of the present invention includes: an acquisition unit that acquires three or more captured images taken at three or more respective times by an imaging device that images a predetermined area; an object detection unit that detects, in each of the captured images acquired by the acquisition unit, an object appearing in that captured image; a behavioral feature extraction unit that extracts a behavioral feature, indicating a feature of the object's behavior, based on the partial image of the captured image corresponding to the object detected by the object detection unit; and an association unit that associates, based on the behavioral features extracted by the behavioral feature extraction unit, the objects detected by the object detection unit in captured images taken at different times. For each of the objects associated by the association unit, the behavioral feature extraction unit re-extracts the behavioral feature based on the partial image corresponding to that object and the partial images corresponding to each of the one or more other objects associated with it. The association unit then re-associates, based on the behavioral features re-extracted by the behavioral feature extraction unit, the objects detected by the object detection unit in the captured images taken at different times.

The acquisition unit may acquire four or more captured images taken by the imaging device at four or more respective times, and the object tracking device may further include an execution control unit that tracks objects by having the behavioral feature extraction unit re-extract the behavioral features and the association unit re-associate the objects, alternately and repeatedly, until a predetermined condition is satisfied.

The execution control unit may have the behavioral feature extraction unit re-extract the behavioral features and the association unit associate the objects, alternately and repeatedly, until, after an association by the association unit, the ratio of the number of objects associated before that association to the number of objects associated after it reaches or exceeds a predetermined ratio.
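The stopping rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: an association is represented as a set of index pairs, the "number of associated objects" is taken to be the number of detections that appear in at least one pair, and `extract`, `associate`, `min_ratio`, and `max_iters` are hypothetical names and values.

```python
from typing import Any, Callable, Set, Tuple

Link = Tuple[int, int]  # hypothetical: a link between two detection indices


def count_associated(links: Set[Link]) -> int:
    """Number of distinct detections that take part in at least one link."""
    return len({i for link in links for i in link})


def track(
    extract: Callable[[Set[Link]], Any],
    associate: Callable[[Any], Set[Link]],
    min_ratio: float = 0.95,  # hypothetical "predetermined ratio"
    max_iters: int = 10,      # hypothetical safety cap
) -> Set[Link]:
    # First pass: extract features with no associations yet, then associate.
    links = associate(extract(set()))
    for _ in range(max_iters):
        before = count_associated(links)
        # Re-extract using the current links, then re-associate.
        links = associate(extract(links))
        after = count_associated(links)
        # Stop once before/after reaches the threshold, i.e. the latest
        # association linked few detections that were not linked before.
        if after == 0 or before / after >= min_ratio:
            break
    return links
```

The iteration cap `max_iters` is an added safeguard (also an assumption) so that the loop terminates even if the ratio never reaches the threshold.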

For each of the objects associated by the association unit, the behavioral feature extraction unit may vary the number of other objects used to re-extract that object's behavioral feature, according to the behavioral tendency indicated by the behavioral feature corresponding to the object.

The object detection unit may detect the object by identifying an object position indicating where the object appears in the captured image, and the behavioral feature extraction unit may extract the behavioral feature based on the partial image corresponding to the object position identified by the object detection unit.

The association unit may associate the objects detected by the object detection unit in captured images taken at different times further based on the object positions identified by the object detection unit.
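One way the object position could enter the association is as an overlap-based cost between bounding boxes. The sketch below assumes that (x, y) in X^loc is the top-left corner of the box and uses intersection over union (IoU); neither the IoU cost nor the function names come from the patent.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h); (x, y) assumed top-left


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def position_cost(a: Box, b: Box) -> float:
    """Hypothetical position cost: 0 for identical boxes, 1 for disjoint ones."""
    return 1.0 - iou(a, b)
```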

The object tracking device may further include an appearance feature extraction unit that extracts an appearance feature indicating how the object detected by the object detection unit looks in the captured image, and the association unit may associate the objects detected by the object detection unit in captured images taken at different times further based on the appearance features extracted by the appearance feature extraction unit.
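When position, appearance, and behavioral features all contribute, a simple approach is to add weighted per-cue cost matrices and then match detections between two captured images. The sketch below assumes a weighted sum and a greedy one-to-one matching; the weights, the `max_cost` gate, and all names are hypothetical, and an optimal assignment (for example `scipy.optimize.linear_sum_assignment`) is a common alternative to the greedy step.

```python
from typing import Dict, List, Sequence, Set, Tuple

Matrix = List[List[float]]  # rows: detections in image A, cols: image B


def combine_costs(costs: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Weighted sum of per-cue cost matrices (e.g. position, appearance, behavior)."""
    rows, cols = len(costs[0]), len(costs[0][0])
    return [
        [sum(w * c[r][k] for w, c in zip(weights, costs)) for k in range(cols)]
        for r in range(rows)
    ]


def greedy_match(cost: Matrix, max_cost: float) -> Dict[int, int]:
    """Greedy one-to-one matching: cheapest pairs first, each row/col used once."""
    pairs: List[Tuple[float, int, int]] = sorted(
        (cost[r][k], r, k) for r in range(len(cost)) for k in range(len(cost[r]))
    )
    used_r: Set[int] = set()
    used_k: Set[int] = set()
    match: Dict[int, int] = {}
    for c, r, k in pairs:
        if c > max_cost:
            break  # all remaining pairs are at least this expensive
        if r not in used_r and k not in used_k:
            match[r] = k
            used_r.add(r)
            used_k.add(k)
    return match
```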

The behavioral feature extraction unit may extract the behavioral feature based on the difference between the partial image of a captured image corresponding to the object detected by the object detection unit and the corresponding partial image of a captured image taken at a time before or after the time at which that captured image was taken.

The behavioral feature extraction unit may extract the behavioral feature based on a first partial image, which contains the object detected by the object detection unit, and a second partial image, which contains the object and has a larger display area than the first partial image.

The behavioral feature extraction unit may input, into a neural network that outputs a behavioral feature for an input partial image, the partial image corresponding to the object and the partial images corresponding to the one or more other objects associated with it, and re-extract the object's behavioral feature based on the plurality of behavioral features output by the neural network.
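The re-extraction step above can be illustrated with a minimal sketch that averages the network's outputs over an object and the objects associated with it. Averaging is an assumption (the text only says the re-extraction is "based on" the plural outputs), `net` is a stand-in for the neural network, and `reextract`, `patches`, and `links` are hypothetical names.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def reextract(
    net: Callable[[object], Vector],
    patches: Dict[int, object],
    links: Dict[int, Sequence[int]],
) -> Dict[int, Vector]:
    """Re-extract each object's behavioral feature from its own partial image
    plus the partial images of the objects associated with it."""
    out: Dict[int, Vector] = {}
    for i, patch in patches.items():
        group = [patch] + [patches[j] for j in links.get(i, ())]
        feats = [net(p) for p in group]
        # Element-wise mean of the network outputs over the group (assumption).
        out[i] = [sum(vals) / len(feats) for vals in zip(*feats)]
    return out
```

Because `links` may hold a different number of partners per object, the same function also accommodates varying the number of other objects used for re-extraction.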

The behavioral feature extraction unit may input the partial image corresponding to the object into a neural network that outputs a behavioral feature for an input partial image, acquire the features indicated by an intermediate layer of the neural network, and extract the object's behavioral feature based on the acquired features.

The behavioral feature extraction unit may input, into a neural network that outputs a behavioral feature for an input partial image, the partial image corresponding to the object and the partial images corresponding to the one or more other objects associated with it, acquire the features indicated by an intermediate layer of the neural network, and re-extract the object's behavioral feature based on the acquired features.
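The intermediate-layer variant can be sketched with a toy two-layer network written in plain Python. The ReLU hidden layer stands in for an intermediate layer of a real network such as ResNet, and averaging the hidden features over the group is an assumption; `forward` and `reextract_from_hidden` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def forward(w1: Matrix, w2: Matrix, x: Sequence[float]) -> Tuple[Vector, Vector]:
    """Toy two-layer network; returns (intermediate-layer features, output)."""
    hidden = [max(0.0, h) for h in matvec(w1, x)]  # ReLU intermediate layer
    return hidden, matvec(w2, hidden)


def reextract_from_hidden(
    w1: Matrix, w2: Matrix, patches: Sequence[Sequence[float]]
) -> Vector:
    """Average intermediate-layer features of the object's own partial image
    (patches[0]) and those of its associated objects (patches[1:])."""
    hiddens = [forward(w1, w2, p)[0] for p in patches]
    return [sum(col) / len(hiddens) for col in zip(*hiddens)]
```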

An object tracking method according to a second aspect of the present invention includes the following steps, executed by a computer: acquiring three or more captured images taken at three or more respective times by an imaging device that images a predetermined area; detecting, in each of the acquired captured images, an object appearing in that captured image; extracting a behavioral feature, indicating a feature of the object's behavior, based on the partial image of the captured image corresponding to the detected object; associating, based on the extracted behavioral features, the objects detected in captured images taken at different times; re-extracting, for each of the associated objects, the behavioral feature based on the partial image corresponding to that object and the partial images corresponding to each of the one or more other objects associated with it; and re-associating, based on the re-extracted behavioral features, the objects detected in the captured images taken at different times.

The present invention makes it possible to track an object accurately.

FIG. 1 is a diagram showing an outline of the object tracking device according to the present embodiment.
FIG. 2 is a diagram showing an example of object association.
FIG. 3 is a diagram showing the configuration of the object tracking device according to the present embodiment.
FIG. 4 is a diagram showing an example of extracting behavioral feature information when no object association has been made.
FIG. 5 is a diagram showing an example of extracting behavioral feature information when objects have been associated.
FIG. 6 is a flowchart showing the flow of processing in the object tracking device according to the present embodiment.

[Overview of Object Tracking Device 1]
FIG. 1 is a diagram showing an outline of the object tracking device 1 according to the present embodiment. The object tracking device 1 tracks objects by associating one or more objects that appear in a plurality of captured images taken by an imaging device 2 that images a predetermined area, such as the inside of a store. The objects are, for example, store clerks or customers moving around the store.

The object tracking device 1 acquires a time series of captured images taken by the imaging device 2 ((1) in FIG. 1). It then detects, in each acquired captured image, the objects appearing in that image ((2) in FIG. 1).

For each captured image, the object tracking device 1 extracts a behavioral feature indicating the behavioral characteristics of each detected object ((3) in FIG. 1). Based on the extracted behavioral features, it associates the objects detected in the respective captured images ((4) in FIG. 1).

For each object, the object tracking device 1 then re-extracts the behavioral feature based on the captured image corresponding to that object and the other captured images in which the objects associated with it appear. Because the behavioral feature of an object associated across several captured images can be extracted from the behavior shown across those images, it is more accurate than the feature extracted before the association.

The object tracking device 1 then associates the objects detected in the captured images again, based on the re-extracted behavioral features. Because the behavioral features are now more accurate, objects that had not been associated with each other can now be associated.

FIG. 2 is a diagram showing an example of object association. In FIGS. 2A to 2D, the horizontal axis represents the imaging time and the vertical axis represents the object position. Each mark M represents a detected object, and marks M shown at the same time represent the same object detected by the object tracking device 1. No mark M appears at time t_3; this indicates that the object could not be detected at that time, for example because of occlusion. The letter inside each mark M is class information indicating the behavioral class indicated by the behavioral feature.

FIG. 2A shows the state before the object tracking device 1 associates the objects. FIG. 2B shows the state after the object tracking device 1 performs the first association.

FIG. 2C shows the state after the object tracking device 1 has re-extracted the objects' behavioral features following the first association. As a result of re-extracting the behavioral features using the associated objects, the class information of the object at time t_2 has changed from "b" to "a", and that of the object at time t_4 has changed from "c" to "a", as shown in FIG. 2C.

FIG. 2D shows the state after the objects are associated again following the re-extraction shown in FIG. 2C. As shown in FIG. 2D, once the behavioral features of the objects at times t_2 and t_4 have been re-extracted, the object at time t_2 and the object at time t_4 are associated. Thus, even when the object cannot be detected at time t_3, re-extracting the behavioral features at the neighboring times t_2 and t_4 corrects their class information to the same class, so that the objects at t_2 and t_4 can be associated.

Returning to FIG. 1, the object tracking device 1 repeatedly executes processes (3) and (4) of FIG. 1. By alternating re-extraction of behavioral features with association of objects, the object tracking device 1 improves the accuracy of the association and, as a result, can track objects accurately.
The configuration of the object tracking device 1 is described in detail below.

[Configuration of Object Tracking Device 1]
FIG. 3 is a diagram showing the configuration of the object tracking device 1 according to the present embodiment. As shown in FIG. 3, the object tracking device 1 includes a storage unit 11 and a control unit 12.

The storage unit 11 is a storage medium including ROM (Read Only Memory), RAM (Random Access Memory), and the like. It stores the programs executed by the control unit 12, for example an object tracking program that causes the control unit 12 to function as an acquisition unit 121, an object detection unit 122, an appearance feature extraction unit 123, a behavioral feature extraction unit 124, an association unit 125, and an execution control unit 126.

The control unit 12 is, for example, a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit). By executing the object tracking program stored in the storage unit 11, the control unit 12 functions as the acquisition unit 121, the object detection unit 122, the appearance feature extraction unit 123, the behavioral feature extraction unit 124, the association unit 125, and the execution control unit 126.

[Acquisition of Captured Images]
The acquisition unit 121 acquires four or more captured images taken at four or more respective times by the imaging device 2, which images a predetermined area. The imaging device 2 is assumed to image the same imaging range at each of these times.

[Object Detection]
The object detection unit 122 detects, in each of the captured images acquired by the acquisition unit 121, the objects appearing in that image. Specifically, it detects an object by identifying an object position indicating where the object appears in the captured image.

The object detection unit 122 is, for example, an object detector. When a captured image acquired by the acquisition unit 121 is input, the object detection unit 122 detects objects by outputting, for that image, position feature information X^loc indicating the position of each object and a cost C_det indicating the likelihood that the object is present at that position. An SSD (Single Shot MultiBox Detector), for example, can be used as the object detector.

The position feature information X^loc is represented, for example, by a combination of four variables (x, y, w, h): x is the position along the X axis (horizontal direction) of the captured image, y is the position along the Y axis (vertical direction), which is orthogonal to the X axis, w is the object's length (width) along the X axis, and h is its length (height) along the Y axis. The position feature information X^loc thus specifies a bounding box, that is, a rectangular region surrounding the detected object. The object detection unit 122 assigns an index i to each object detected in each of the captured images and identifies the position feature information X_i^loc and the cost C_det(i) corresponding to each object i.
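The per-detection data described above can be held in a small record. This is a hypothetical Python sketch: the `Detection` class and `index_detections` function are illustrative names, and treating (x, y) as the top-left corner of the bounding box is an assumption, since the text does not say whether it is the corner or the center.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Loc = Tuple[float, float, float, float]  # X_i^loc = (x, y, w, h)


@dataclass(frozen=True)
class Detection:
    i: int        # index i, unique across all captured images
    t: int        # which captured image (imaging time) it came from
    loc: Loc      # position feature information X_i^loc
    c_det: float  # cost C_det(i) output by the detector

    def corners(self) -> Loc:
        """(x1, y1, x2, y2), assuming (x, y) is the top-left corner."""
        x, y, w, h = self.loc
        return x, y, x + w, y + h


def index_detections(per_frame: Sequence[Sequence[Tuple[Loc, float]]]) -> List[Detection]:
    """Assign a running index i to every (X^loc, C_det) pair in every frame."""
    dets: List[Detection] = []
    for t, frame in enumerate(per_frame):
        for loc, cost in frame:
            dets.append(Detection(len(dets), t, loc, cost))
    return dets
```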

[Extraction of Appearance Features]
The appearance feature extraction unit 123 extracts an appearance feature indicating how the object detected by the object detection unit 122 looks in the captured image. For example, the appearance feature extraction unit 123 determines the position and size of the bounding box in the captured image from the position feature information X^loc that the object detection unit 122 detected in that image, and extracts from the captured image the partial image enclosed by that bounding box as a partial image showing the object.

The appearance feature extraction unit 123 inputs the extracted partial image into an appearance feature output program, which outputs, for an input image, appearance feature information X^app, a multidimensional vector indicating the features of the object's appearance. By acquiring the X^app output by that program, it extracts the object's appearance feature. The appearance feature output program is, for example, a deep neural network program such as WRNs (Wide Residual Networks).

The appearance feature extraction unit 123 inputs the partial image corresponding to each object i detected by the object detection unit 122 into the appearance feature output program and acquires from it the appearance feature information X_i^app corresponding to each object i.
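Appearance vectors such as X_i^app are typically compared with a distance such as the cosine distance. The sketch below assumes that choice; the text here does not specify which distance the association unit applies to appearance features, and `cosine_distance` is a hypothetical name.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity between two appearance vectors X_i^app and X_j^app."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0  # treat a zero vector as maximally dissimilar
    return 1.0 - dot / (na * nb)
```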

[Extraction of Behavioral Features]
The behavioral feature extraction unit 124 extracts, based on the partial images of the captured images corresponding to the objects detected by the object detection unit 122, a behavioral feature indicating the behavioral characteristics of each object. Specifically, it extracts the behavioral feature of each object i from the partial image corresponding to the position feature information X_i^loc, which indicates the object position of that object i. The behavioral feature is a multidimensional vector whose number of dimensions corresponds, for example, to the number of behavioral classes.

[Extraction of Behavioral Features When No Association Has Been Made]
When the association unit 125 has not yet associated the objects, the behavioral feature extraction unit 124 extracts, as described below, behavioral feature information X_i^paf, a multidimensional vector indicating the behavioral feature of each object i detected by the object detection unit 122. FIG. 4 is a diagram showing an example of extracting the behavioral feature information X_i^paf when no object association has been made.

The behavioral feature extraction unit 124 extracts behavioral feature information X_i^{paf-C} based on the color of the partial image of object i, and behavioral feature information X_i^{paf-F} based on the difference between that partial image and the partial images extracted from the captured images taken before and after the captured image from which it was extracted. It then integrates X_i^{paf-C} and X_i^{paf-F} to obtain the behavioral feature information X_i^paf of object i.
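The text does not say how X_i^{paf-C} and X_i^{paf-F} are integrated. The sketch below shows two plausible options, concatenation and element-wise averaging, both of which are assumptions; `integrate` and its `mode` parameter are hypothetical.

```python
from typing import List, Sequence


def concat(*parts: Sequence[float]) -> List[float]:
    """Concatenate feature vectors end to end."""
    return [v for part in parts for v in part]


def integrate(paf_c: Sequence[float], paf_f: Sequence[float], mode: str = "concat") -> List[float]:
    """Integrate X_i^{paf-C} and X_i^{paf-F} into X_i^paf (both modes are assumptions)."""
    if mode == "concat":
        return concat(paf_c, paf_f)
    if mode == "mean":
        if len(paf_c) != len(paf_f):
            raise ValueError("mean fusion needs equal-length vectors")
        return [(a + b) / 2 for a, b in zip(paf_c, paf_f)]
    raise ValueError(f"unknown mode: {mode}")
```

Concatenation keeps the two streams separable for later layers, while averaging keeps the dimension equal to the number of behavioral classes when both inputs already have that size.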

First, based on the position feature information X_i^loc of object i detected by the object detection unit 122, the behavioral feature extraction unit 124 extracts, from the captured image in which the object was detected, a local cutout image Im_L as the first partial image, which contains the object, and a global cutout image Im_G as the second partial image, which contains the object and has a larger display area than the first partial image.

The local cutout image Im_L is, for example, the image enclosed by the bounding box indicated by the position feature information X_i^loc of object i, and the global cutout image Im_G is an image that contains Im_L and has a display area several times larger than Im_L. The behavioral feature extraction unit 124 extracts the behavioral feature information X_i^paf based on the extracted Im_L and Im_G. In this way, the object tracking device 1 can extract behavioral features more accurately by taking into account information about other objects and things around the object.
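Cutting out Im_L and Im_G from a bounding box can be sketched as below, assuming (x, y) is the top-left corner, that Im_G shares the center of Im_L with each side scaled by a hypothetical factor `scale` (2.0 gives four times the area), and that Im_G is clipped at the image border. Images are plain nested lists to keep the example dependency-free; all function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]          # rows of pixel values (grayscale for simplicity)
Box = Tuple[int, int, int, int]  # (x, y, w, h); (x, y) assumed top-left


def crop(img: Image, box: Box) -> Image:
    """Cut out the rectangle described by box."""
    x, y, w, h = box
    return [row[x:x + w] for row in img[y:y + h]]


def global_box(box: Box, scale: float, img_w: int, img_h: int) -> Box:
    """Box with the same center, each side scaled by `scale`, clipped to the image."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    gw, gh = w * scale, h * scale
    x1 = max(0, int(round(cx - gw / 2)))
    y1 = max(0, int(round(cy - gh / 2)))
    x2 = min(img_w, int(round(cx + gw / 2)))
    y2 = min(img_h, int(round(cy + gh / 2)))
    return x1, y1, x2 - x1, y2 - y1


def local_and_global(img: Image, box: Box, scale: float = 2.0) -> Tuple[Image, Image]:
    """Return (Im_L, Im_G) for one detection."""
    h_img, w_img = len(img), len(img[0])
    return crop(img, box), crop(img, global_box(box, scale, w_img, h_img))
```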

In the present embodiment, the object tracking device 1 extracts X_i^{paf} based on both the local cutout image Im_L and the global cutout image Im_G. This is not a limitation: X_i^{paf} may instead be extracted based on only one of Im_L and Im_G.

In the example shown in FIG. 4, the behavioral feature amount extraction unit 124 inputs the color information of Im_L and the color information of Im_G separately into base model programs of a deep neural network such as ResNet (Residual Networks). It then combines the behavioral feature amount information output from the base model programs to extract the color-based behavioral feature amount information X_i^{paf-C}.

The behavioral feature amount extraction unit 124 also extracts X_i^{paf-F}. It does so from the difference between two sets of partial images:

- the partial images of object i (Im_L and Im_G) extracted from the captured image in which object i was detected;
- the corresponding partial images of a captured image taken at a time before or after that image.

For example, the behavioral feature amount extraction unit 124 proceeds as follows:

1. From the captured image at imaging time t_i, at which object i was detected, it extracts Im_L and Im_G based on X_i^{loc}.
2. From the captured image at imaging time t_{i-1}, immediately preceding t_i, it extracts a local cutout image Im_Lb and a global cutout image Im_Gb, again based on X_i^{loc}.
3. It extracts X_i^{paf-F} based on the difference D_L between the two local cutout images and the difference D_G between the two global cutout images.
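The difference images D_L and D_G can be sketched as below. This minimal NumPy sketch rests on three assumptions:

- "Difference" means the per-pixel absolute difference.
- The same box X_i^{loc} is used to crop both the frame at t_i and the frame at t_{i-1}.
- The global box is the local box enlarged about its centre by a hypothetical factor `scale`.

The helper names are illustrative only.

```python
import numpy as np


def enlarge_box(box, scale, height, width):
    """Hypothetical helper: same-centre box scaled by `scale`, clipped to the image."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    hw, hh = (x2 - x1) * scale / 2.0, (y2 - y1) * scale / 2.0
    return (max(0, int(round(cx - hw))), max(0, int(round(cy - hh))),
            min(width, int(round(cx + hw))), min(height, int(round(cy + hh))))


def crop(image, box):
    x1, y1, x2, y2 = box
    # Cast to a signed type so subtracting uint8 frames cannot wrap around.
    return image[y1:y2, x1:x2].astype(np.int16)


def difference_images(frame_t, frame_prev, box, scale=2.0):
    """Return (D_L, D_G): absolute differences of the local and global crops
    between the frame at t_i and the frame at t_{i-1}."""
    h, w = frame_t.shape[:2]
    gbox = enlarge_box(box, scale, h, w)
    d_l = np.abs(crop(frame_t, box) - crop(frame_prev, box))
    d_g = np.abs(crop(frame_t, gbox) - crop(frame_prev, gbox))
    return d_l, d_g
```

Pixels that do not change between the two frames contribute zero, so D_L and D_G mainly respond to motion of the object and its surroundings.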

In the example shown in FIG. 4, the behavioral feature amount extraction unit 124 inputs the local difference D_L and the global difference D_G separately into base model programs of a deep neural network such as ResNet. It then combines the behavioral feature amount information output from the base model programs to extract X_i^{paf-F}. The base model programs are assumed to have been trained in advance.
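The two-stream structure of FIG. 4 can be sketched as follows: each input goes through its own base model, and the outputs are combined. This is a structural sketch only, under two assumptions:

- "Combining" is taken to mean concatenation, which the source does not specify.
- The backbones are passed in as callables. The `mean_colour` function is a hypothetical toy stand-in for a pretrained ResNet and is not part of the described method.

The same wiring applies to the color inputs (Im_L, Im_G), which give X_i^{paf-C}, and to the difference inputs (D_L, D_G), which give X_i^{paf-F}.

```python
from typing import Callable

import numpy as np

Backbone = Callable[[np.ndarray], np.ndarray]


def two_stream_feature(local_input: np.ndarray, global_input: np.ndarray,
                       local_net: Backbone, global_net: Backbone) -> np.ndarray:
    """Run each input through its own base model and concatenate the outputs."""
    f_local = np.asarray(local_net(local_input), dtype=float).ravel()
    f_global = np.asarray(global_net(global_input), dtype=float).ravel()
    return np.concatenate([f_local, f_global])


def mean_colour(img: np.ndarray) -> np.ndarray:
    """Toy backbone (hypothetical): mean value of each colour channel."""
    return img.reshape(-1, img.shape[-1]).mean(axis=0)
```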

The behavioral feature amount extraction unit 124 then integrates X_i^{paf-C} and X_i^{paf-F} to extract the behavioral feature amount information X_i^{paf} of object i. For example, it uses the average of X_i^{paf-C} and X_i^{paf-F} as X_i^{paf}.

The behavioral feature amount extraction unit 124 may integrate X_i^{paf-C} and X_i^{paf-F} in another way. It compares them element by element and uses, as X_i^{paf}, the vector made up of the larger value at each element. For example, if X_i^{paf-C} = (0.1, 0.2, 0.5, 0.6) and X_i^{paf-F} = (0.9, 0.6, 0.2, 0.9), then X_i^{paf} = (0.9, 0.6, 0.5, 0.9). Hereinafter, this extraction method is referred to as the maximum element extraction method.

After extracting X_i^{paf}, the behavioral feature amount extraction unit 124 also extracts class information. This class information indicates the behavior class corresponding to the highest value among the elements of the vector X_i^{paf}.
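Both integration rules and the class lookup are short element-wise operations. The NumPy sketch below shows them using the numbers from the worked example in the text. The class names are hypothetical placeholders, since the source does not list its behavior classes.

```python
import numpy as np


def fuse_mean(x_c, x_f):
    """Integrate X_i^{paf-C} and X_i^{paf-F} by element-wise averaging."""
    return (np.asarray(x_c, float) + np.asarray(x_f, float)) / 2.0


def fuse_max(x_c, x_f):
    """Maximum element extraction method: keep the larger value at each position."""
    return np.maximum(np.asarray(x_c, float), np.asarray(x_f, float))


def action_class(x_paf, class_names):
    """Class information: the behavior class with the highest value in X_i^{paf}."""
    return class_names[int(np.argmax(x_paf))]


CLASS_NAMES = ["walking", "standing", "reaching", "paying"]  # hypothetical labels

x_c = np.array([0.1, 0.2, 0.5, 0.6])
x_f = np.array([0.9, 0.6, 0.2, 0.9])
print(fuse_max(x_c, x_f))  # [0.9 0.6 0.5 0.9]
```

When two elements tie for the maximum, `np.argmax` returns the first one. A real system would need its own tie-breaking rule.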

The behavioral feature amount extraction unit 124 extracts X_i^{paf} as shown in FIG. 4, but this is not a limitation. It may instead extract X_i^{paf} using a deep neural network program such as TSN (Temporal Segment Networks).

When objects have not yet been associated, the average of X_i^{paf-C} and X_i^{paf-F} does not necessarily reflect the behavioral features of the object. In that situation it may even be preferable to leave some ambiguity in the behavioral feature amount. The behavioral feature amount extraction unit 124 may therefore acquire the feature amounts indicated by the intermediate layer of each neural network. It may then extract, as X_i^{paf}, the result of averaging or otherwise combining the acquired feature amounts. In this way, the object tracking device 1 can keep X_i^{paf} ambiguous while objects have not been associated.

[Extraction of behavioral features when association has been performed]
Once the association unit 125 has associated objects, the behavioral feature amount extraction unit 124 re-extracts the behavioral feature amount of each associated object i. It uses the partial image corresponding to object i and the partial images corresponding to each of the one or more other objects associated with object i. Only some of those other objects are used: those detected in captured images taken within a predetermined time (a predetermined number of frames) of the imaging time of the captured image showing object i. In the following description, this reference range of other objects, determined by the predetermined number of frames, is called the window size.

FIG. 5 is a diagram showing an example of extracting X_i^{paf} when objects have been associated. For example, suppose that an object detected at time t_{i-1} and an object detected at time t_{i+1} are associated with object i detected at time t_i. In this case, as shown in FIG. 5, the behavioral feature amount extraction unit 124 proceeds as follows:

1. It extracts behavioral feature amount information from the partial image of object i detected at t_i.
2. It extracts the behavioral feature amount information of the objects detected at t_{i-1} and t_{i+1}.
3. It calculates the average of these pieces of behavioral feature amount information and applies an activation to the result, thereby re-extracting X_i^{paf}.

Alternatively, the unit may combine these pieces of information with the maximum element extraction method, in the same way as when integrating X_i^{paf-C} and X_i^{paf-F}. It then applies an activation to the result to re-extract X_i^{paf}.
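The re-extraction step can be sketched as pooling the feature vectors of one track inside the window and then applying an activation. This minimal sketch rests on three assumptions:

- The track's features are stored in a dict keyed by frame time.
- The window is measured in frames on either side of t_i.
- The unspecified "activation" is a softmax. It could equally be another function.

`reextract` is a hypothetical name.

```python
import numpy as np


def softmax(z):
    z = np.asarray(z, float)
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def reextract(features_by_time, t, window, fuse="mean"):
    """Re-extract X_i^{paf} for the object at time t from the associated
    objects of the same track that lie within `window` frames of t.

    features_by_time: {frame time: behavioral feature vector} for one track.
    fuse: "mean" (averaging) or "max" (maximum element extraction method).
    """
    times = sorted(s for s in features_by_time if abs(s - t) <= window)
    stack = np.stack([np.asarray(features_by_time[s], float) for s in times])
    pooled = stack.mean(axis=0) if fuse == "mean" else stack.max(axis=0)
    return softmax(pooled)  # assumed activation
```

With the three-frame window of FIG. 5, the vectors B_{t(i-1)}, B_{ti} and B_{t(i+1)} are pooled, so the result can favor a different class than B_{ti} alone.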

In the example shown in FIG. 5, the vectors B_{t(i-1)}, B_{ti} and B_{t(i+1)} indicate the behavioral feature amount information of the objects detected at t_{i-1}, t_i and t_{i+1}. Within these vectors, the behavior classes with relatively large values are shown in a different color from the other classes. Integrating B_{t(i-1)}, B_{ti} and B_{t(i+1)} yields a vector for X_i^{paf} that differs from B_{ti}, the vector for time t_i before re-extraction. The behavioral feature amount extraction unit 124 may use the re-extracted vector X_i^{paf} as the behavioral feature amount information for the objects detected at t_{i-1} and t_{i+1}, as well as for time t_i.

For each object associated by the association unit 125, the behavioral feature amount extraction unit 124 may also vary the window size. The window size follows the behavioral tendency indicated by that object's behavioral feature amount, and so changes how many other objects are used for re-extraction. For example:

- when the behavioral tendency is static, such as standing still, the unit may increase the window size;
- when it is dynamic, such as walking, the unit may decrease the window size.

This matches the window size to the behavioral tendency and improves the accuracy with which behavioral feature amounts are extracted.
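The adaptive window can be sketched as a lookup from the dominant behavior class to a window size. In this sketch the class names, the membership of the "static" set and the two window sizes are all hypothetical values chosen for illustration. The source only gives "standing still" and "walking" as examples of static and dynamic behavior.

```python
import numpy as np

# Hypothetical class labels and static/dynamic split.
CLASS_NAMES = ["standing still", "walking"]
STATIC_CLASSES = {"standing still"}


def window_size(x_paf, small: int = 1, large: int = 4) -> int:
    """Pick a larger frame window for static behavior, a smaller one for dynamic behavior."""
    label = CLASS_NAMES[int(np.argmax(x_paf))]
    return large if label in STATIC_CLASSES else small
```

The intuition is that a stationary person looks much the same across many frames, so pooling over a long window is safe. For a walking person, distant frames are less representative of the current behavior.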

Even when objects have been associated, the behavioral feature amount extraction unit 124 may re-extract an object's behavioral feature amount from the feature amount indicated by an intermediate layer of the neural network. This applies on condition that the number of re-extractions is within a predetermined number. Specifically, the unit proceeds as follows:

1. It inputs into the neural network the partial image corresponding to the detected object i, together with the partial images corresponding to the one or more other objects associated with object i.
2. It acquires the feature amounts indicated by the intermediate layer of the network.
3. It re-extracts X_i^{paf} of object i based on the acquired feature amounts.

For example, the behavioral feature amount extraction unit 124 may take the average of two intermediate-layer feature amounts as X_i^{paf} of the detected object:

- the one obtained when the partial image of the detected object is input into the neural network;
- the one obtained when the partial image of an associated object is input.

In this way, the object tracking device 1 can keep X_i^{paf} ambiguous while objects have not yet been sufficiently associated.

[Object association]
The association unit 125 associates the objects detected by the object detection unit 122 across a plurality of captured images taken at different times. It bases the association on three feature amounts:

- the position feature amount extracted by the object detection unit 122;
- the appearance feature amount extracted by the appearance feature amount extraction unit 123;
- the behavioral feature amount extracted by the behavioral feature amount extraction unit 124.

The association unit 125 treats each detected object i as a node, sets costs between the nodes, and solves a minimum-cost flow problem to associate the objects. Specifically, the association unit 125 first defines a node y_i representing the detected object i, as shown in equation (1) below.

Next, the association unit 125 calculates the costs for each node y_i. There are four costs:

- an observation cost C_obsv(i);
- a transition cost C_tran(i, j);
- an entry (start) cost C_entr(i);
- an exit (end) cost C_exit(i).

The j in C_tran(i, j) is the index of another node y_j. The node y_j is assumed to correspond to an object detected in a captured image taken within a predetermined time of the imaging time of the captured image in which object i was detected.

The entry cost C_entr(i) is a constant given in advance. The smaller C_entr(i) is, the more often a new association (a new track) starts at node y_i. The exit cost C_exit(i) is also a constant given in advance. The smaller C_exit(i) is, the more often node y_i is left without a further association, that is, the more often a track ends at node y_i.

The association unit 125 calculates the observation cost C_obsv(i) based on equations (2) and (3) below.

In equation (2), b is a predetermined constant. The cost C_det(i) represents the output of the object detector, that is, the likelihood that the object is present at the position indicated by X_i^{loc}. α and β are parameters for calculating the observation cost C_obsv(i) that is optimal for C_det(i). They are set by learning from training data that pairs the cost C_det(i) for X_i^{loc} with ground-truth data indicating whether the object's position is correct.

The association unit 125 calculates the transition cost C_tran(i, j) based on equations (4) and (5) below.

The three terms of equation (4) are defined as follows:

- C_iou(i, j) is, for example, the overlap ratio between the bounding box indicated by X_i^{loc} of node y_i and the bounding box indicated by X_j^{loc} of node y_j.
- C_app(i, j) is the cosine distance between the appearance feature amount information X_i^{app} of node y_i and X_j^{app} of node y_j.
- C_paf(i, j) is the cosine distance between the behavioral feature amount information X_i^{paf} of node y_i and X_j^{paf} of node y_j.
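The two pairwise measures can be sketched in plain Python and NumPy. This sketch rests on two assumptions:

- The "overlap ratio" C_iou is implemented here as intersection over union (IoU), a common choice that the source gives only as an example.
- Boxes are (x1, y1, x2, y2) tuples.

The cosine distance is 1 minus the cosine similarity, which matches how the text applies it to X^{app} and X^{paf}.

```python
import numpy as np


def iou(box_a, box_b) -> float:
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))  # overlap width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))  # overlap height
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def cosine_distance(u, v) -> float:
    """1 - cosine similarity; 0 for parallel vectors, 1 for orthogonal ones."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    return 1.0 - float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

These values would then be fed into the learned function g of equation (4), which is not reproduced here.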

The function g is a nonlinear function that takes C_iou(i, j), C_app(i, j) and C_paf(i, j) in equation (4) as inputs and calculates the optimal transition cost C_tran(i, j). It is set by learning from training data that combines a plurality of objects i associated in advance with the nodes y, detected by the object detection unit 122, that correspond to those objects. The function g may be represented, for example, by decision trees. In that case its parameters are learned in advance with, for example, a boosting algorithm.

The association unit 125 associates the objects by obtaining the association result F shown in equation (6) below.

Here, f indicates whether an association is made. Among f_entr(i), f_obsv(i), f_tran(i, j) and f_exit(i), one takes the value 1 and the others take the value 0. A value of f = 0 means the association is not made, and f = 1 means it is made.

The association unit 125 obtains the optimal association result F* for the objects by optimizing the objective function shown in equation (7) below, that is, by finding the F* that minimizes it. To associate the objects, the association unit 125 optimizes the objective function of equation (7) using, for example, a scaling push-relabel algorithm.
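The sketch below shows one way to solve such an association as a minimum-cost flow. It should be read as an illustration, not as the patent's solver.

Assumptions:

- The patent names a scaling push-relabel algorithm; this sketch swaps in the simpler successive-shortest-path method, using Bellman-Ford on the residual graph.
- Equations (1) to (7) are not reproduced here, so the graph layout is assumed from the common network-flow tracking formulation. Each detection i becomes an edge u_i → v_i costing C_obsv(i). The source connects to u_i at cost C_entr(i), v_i connects to the sink at cost C_exit(i), and v_i → u_j costs C_tran(i, j).
- Transitions only point forward in time, so the initial graph has no cycles.
- Each unit of flow is one track. A track is added only while it lowers the total cost.

```python
INF = float("inf")


def associate(c_entr, c_obsv, c_exit, c_tran):
    """Return tracks (lists of detection indices) minimizing total flow cost.

    c_entr, c_obsv, c_exit: per-detection costs; c_tran: {(i, j): cost}.
    """
    n = len(c_obsv)
    S, T, N = 0, 1, 2 + 2 * n
    graph = [[] for _ in range(N)]  # edge: [to, capacity, cost, reverse index, is_forward]

    def add_edge(a, b, cost):
        graph[a].append([b, 1, cost, len(graph[b]), True])
        graph[b].append([a, 0, -cost, len(graph[a]) - 1, False])

    def u(i):
        return 2 + 2 * i

    def v(i):
        return 3 + 2 * i

    for i in range(n):
        add_edge(S, u(i), c_entr[i])
        add_edge(u(i), v(i), c_obsv[i])
        add_edge(v(i), T, c_exit[i])
    for (i, j), cost in c_tran.items():
        add_edge(v(i), u(j), cost)

    while True:
        # Bellman-Ford on the residual graph (costs may be negative, no negative cycles).
        dist, prev = [INF] * N, [None] * N
        dist[S] = 0.0
        for _ in range(N - 1):
            updated = False
            for a in range(N):
                if dist[a] == INF:
                    continue
                for k, (b, cap, cost, _rev, _fwd) in enumerate(graph[a]):
                    if cap > 0 and dist[a] + cost < dist[b]:
                        dist[b], prev[b] = dist[a] + cost, (a, k)
                        updated = True
            if not updated:
                break
        if dist[T] >= 0:  # no path, or another track would not lower the cost
            break
        node = T
        while node != S:  # push one unit of flow along the shortest path
            a, k = prev[node]
            edge = graph[a][k]
            edge[1] -= 1
            graph[edge[0]][edge[3]][1] += 1
            node = a

    def used(a):  # heads of forward edges out of `a` that carry flow
        return [e[0] for e in graph[a] if e[4] and e[1] == 0]

    tracks = []
    for start in used(S):
        i = (start - 2) // 2
        track = [i]
        nxt = [b for b in used(v(i)) if b != T]
        while nxt:
            i = (nxt[0] - 2) // 2
            track.append(i)
            nxt = [b for b in used(v(i)) if b != T]
        tracks.append(track)
    return tracks
```

A detection whose observation cost is not negative enough to pay for its entry and exit costs is left out of every track. This is how the formulation discards likely false positives.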

The association unit 125 then re-associates the objects detected by the object detection unit 122 across the captured images taken at different times. This time it uses the behavioral feature amounts re-extracted by the behavioral feature amount extraction unit 124. When re-associating, the association unit 125 again calculates the costs for each node y_i and optimizes the objective function of equation (7), as described above.

[Repetition of behavioral feature re-extraction and object association]
The execution control unit 126 tracks objects by repeating two steps alternately until a predetermined condition is satisfied:

- the behavioral feature amount extraction unit 124 re-extracts the behavioral feature amounts;
- the association unit 125 re-associates the objects.

The predetermined condition is, for example, that the object association by the association unit 125 has converged.

Specifically, the execution control unit 126 keeps alternating re-extraction by the behavioral feature amount extraction unit 124 and association by the association unit 125 until a ratio reaches a predetermined value or more (for example, 90% or more). The ratio is the number of objects associated before an association step divided by the number associated after it. In other words, the execution control unit 126 repeats the two steps until the change (increase) in the number of associated objects from one association step to the next becomes small. In this way, the object tracking device 1 continues associating objects until the association converges.

Alternatively, the number of times re-extraction and association are alternately repeated may be fixed in advance. The execution control unit 126 then has the two steps repeated alternately that number of times.
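The control loop can be sketched as below. It combines the convergence test and the fixed repetition count as an upper bound. This sketch rests on two assumptions:

- The feature-extraction and association steps are passed in as hypothetical callbacks (`extract_again`, `associate`).
- `associate` returns the set of IDs of objects that are linked to at least one other object. The "number of associated objects" is the size of that set.

```python
from typing import Any, Callable, Set


def alternate_until_converged(features: Any,
                              extract_again: Callable[[Set[int]], Any],
                              associate: Callable[[Any], Set[int]],
                              ratio: float = 0.9,
                              max_iters: int = 10) -> Set[int]:
    """Alternate re-extraction and association until the association converges."""
    linked = associate(features)            # initial association
    for _ in range(max_iters):              # optional fixed cap on repetitions
        features = extract_again(linked)    # re-extract behavioral features
        new_linked = associate(features)    # associate again
        before, after = len(linked), len(new_linked)
        linked = new_linked
        # Converged when before/after >= ratio, i.e. the increase is small.
        if after == 0 or before / after >= ratio:
            break
    return linked
```

With a 90% threshold, going from 8 to 9 associated objects (8/9 ≈ 0.89) triggers another pass. Going from 9 to 9 stops the loop.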

[Processing flow in the object tracking device 1]
Next, the processing flow in the object tracking device 1 is described. FIG. 6 is a flowchart showing the processing flow in the object tracking device 1 according to the present embodiment.

First, the acquisition unit 121 acquires a plurality of captured images taken at each of a plurality of times by the imaging device 2, which images a predetermined area (S1).
Next, the object detection unit 122 extracts the position feature amount corresponding to each object from each of the captured images acquired by the acquisition unit 121, thereby detecting the objects appearing in each image (S2).

Next, the appearance feature amount extraction unit 123 extracts the appearance feature amounts of the objects detected by the object detection unit 122 (S3).
Next, the behavioral feature amount extraction unit 124 extracts the behavioral feature amounts of the objects detected by the object detection unit 122 (S4).

Next, the association unit 125 associates the objects detected by the object detection unit 122 (S5). It uses the position feature amounts extracted by the object detection unit 122, the appearance feature amounts extracted by the appearance feature amount extraction unit 123, and the behavioral feature amounts extracted by the behavioral feature amount extraction unit 124.

Next, the behavioral feature amount extraction unit 124 re-extracts the behavioral feature amount of each object associated by the association unit 125 (S6). It bases each re-extraction on that object and the one or more other objects associated with it.

Next, the association unit 125 associates the objects detected by the object detection unit 122 again (S7). It uses the position feature amounts extracted by the object detection unit 122, the appearance feature amounts extracted by the appearance feature amount extraction unit 123, and the behavioral feature amounts re-extracted by the behavioral feature amount extraction unit 124.

Next, the execution control unit 126 determines whether the object association by the association unit 125 has converged (S8). If the association has converged, the execution control unit 126 ends the processing of this flowchart. If it has not converged, the processing returns to S6.

[Effects of the present embodiment]
As described above, the object tracking device 1 according to the present embodiment works as follows:

1. It detects objects in a plurality of captured images of a predetermined area taken at each of a plurality of times.
2. It extracts the behavioral feature amount of each object from the partial image of the captured image corresponding to the detected object.
3. Based on the extracted behavioral feature amounts, it associates the objects detected in captured images with different imaging times.
4. For each associated object, it re-extracts the behavioral feature amount from the partial image corresponding to that object and the partial images corresponding to the other objects associated with it.
5. Based on the re-extracted behavioral feature amounts, it associates the objects detected in captured images with different imaging times again.

In this way, the object tracking device 1 can improve the accuracy of the behavioral feature amount of each associated object, and can therefore track objects accurately.

Furthermore, the object tracking device 1 tracks objects by alternately and repeatedly re-extracting behavioral feature amounts and re-associating objects until a predetermined condition is satisfied. This lets the device refine each object's behavioral feature amount further using the behavioral feature amounts of the objects associated with it, which further improves tracking accuracy.

The present invention has been described above using an embodiment. However, the technical scope of the present invention is not limited to the scope described in that embodiment, and various modifications and changes are possible within the scope of its gist. For example, in the above embodiment the association unit 125 associates objects based on three feature amounts: the position feature amount extracted by the object detection unit 122, the appearance feature amount extracted by the appearance feature amount extraction unit 123, and the behavioral feature amount extracted by the behavioral feature amount extraction unit 124. This is not a limitation.

The association unit 125 may associate the objects detected by the object detection unit 122 based only on the behavioral feature amount extracted by the behavioral feature amount extraction unit 124. Alternatively, the association unit 125 may associate the objects detected by the object detection unit 122 based on the behavioral feature amount extracted by the behavioral feature amount extraction unit 124 together with either the position feature amount extracted by the object detection unit 122 or the appearance feature amount extracted by the appearance feature amount extraction unit 123.
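
One way to realize "behavioral feature amount alone, or combined with position or appearance" is a weighted association cost in which unused terms get zero weight. This sketch assumes the position feature is a bounding box scored with 1 - IoU, and that appearance and behavioral features are vectors compared by cosine distance; the weights, dictionary keys, and function names are illustrative assumptions, not values from the text.

```python
import math


def iou(a, b):
    """Intersection over union of boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def association_cost(det_a, det_b, w_pos=0.0, w_app=0.0, w_beh=1.0):
    """Weighted cost between two detections; a zero weight drops that term entirely."""
    cost = w_beh * cosine_distance(det_a["behavior"], det_b["behavior"])
    if w_pos:
        cost += w_pos * (1.0 - iou(det_a["box"], det_b["box"]))
    if w_app:
        cost += w_app * cosine_distance(det_a["appearance"], det_b["appearance"])
    return cost


a = {"box": (0, 0, 2, 2), "appearance": [1, 0], "behavior": [1, 0]}
b = {"box": (1, 0, 3, 2), "appearance": [0, 1], "behavior": [1, 0]}
print(association_cost(a, b))                       # behavioral feature only
print(association_cost(a, b, w_pos=1.0))            # behavior + position
print(association_cost(a, b, w_pos=1.0, w_app=1.0)) # all three terms
```

A lower cost means a more likely match; the costs could then be fed to any assignment procedure.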

Further, in the above-described embodiment, the acquisition unit 121 acquires four or more captured images captured by the imaging device 2 at four or more respective times, but this is not limiting. The acquisition unit 121 may acquire three captured images captured by the imaging device 2 at three respective times. In this case, the re-extraction of the behavioral feature amounts by the behavioral feature amount extraction unit 124 and the association, by the association unit 125, of the objects whose behavioral feature amounts have been re-extracted may each be executed only once rather than being repeated multiple times.

Further, the above embodiment has been described taking as an example the case where the objects are persons, such as clerks and customers, moving around in a store, but this is not limiting. For example, when the captured images show a street scene, moving objects such as vehicles, and not only persons, may be treated as objects. In this case, the object tracking device 1 may use feature amounts suited to each type of object.

In addition, all or part of the device can be functionally or physically distributed or integrated in arbitrary units. New embodiments resulting from any combination of the plurality of embodiments are also included in the embodiments of the present invention. A new embodiment produced by such a combination also has the effects of the original embodiments.

1 ... Object tracking device, 11 ... Storage unit, 12 ... Control unit, 121 ... Acquisition unit, 122 ... Object detection unit, 123 ... Appearance feature amount extraction unit, 124 ... Behavioral feature amount extraction unit, 125 ... Association unit, 126 ... Execution control unit, 2 ... Imaging device

Claims

An object tracking device comprising:
an acquisition unit that acquires three or more captured images captured by an imaging device that captures a predetermined area at three or more respective times;
an object detection unit that detects an object appearing in the captured image from each of the plurality of captured images acquired by the acquisition unit;
a behavioral feature amount extraction unit that extracts a behavioral feature amount indicating a behavioral feature of the object based on a partial image of the captured image corresponding to the object detected by the object detection unit; and
an association unit that associates the objects detected by the object detection unit from the plurality of captured images having different imaging times with one another based on the behavioral feature amount extracted by the behavioral feature amount extraction unit,
wherein the behavioral feature amount extraction unit re-extracts, for each of the plurality of objects associated by the association unit, the behavioral feature amount based on the partial image corresponding to the object and the partial images corresponding to each of one or more other objects associated with the object, and
the association unit re-associates the objects detected by the object detection unit from the plurality of captured images having different imaging times with one another based on the behavioral feature amount re-extracted by the behavioral feature amount extraction unit.

The object tracking device according to claim 1, wherein the acquisition unit acquires four or more captured images captured by the imaging device at four or more respective times,
the object tracking device further comprising an execution control unit that tracks the object by causing the re-extraction of the behavioral feature amount by the behavioral feature amount extraction unit and the re-association of the objects by the association unit to be executed alternately and repeatedly until a predetermined condition is satisfied.

The object tracking device according to claim 2, wherein the execution control unit causes the re-extraction of the behavioral feature amount by the behavioral feature amount extraction unit and the association of the objects by the association unit to be executed alternately and repeatedly until, as a result of the association by the association unit, the ratio of the number of objects associated before the association to the number of objects associated after the association becomes equal to or greater than a predetermined ratio.
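
A literal reading of this stopping rule can be sketched as a predicate with a `(previous, current)` signature. The sketch assumes, hypothetically, that "the number of associated objects" means the number of detections belonging to a track with at least two detections, and uses 0.95 as a placeholder threshold. Because the ratio is before/after, it approaches 1 once a round no longer adds associations. The function names are illustrative.

```python
def count_associated(tracks):
    """Count detections that are linked to at least one other detection."""
    return sum(len(track) for track in tracks if len(track) >= 2)


def ratio_reached(tracks_before, tracks_after, threshold=0.95):
    """True once (associated before) / (associated after) >= threshold."""
    before = count_associated(tracks_before)
    after = count_associated(tracks_after)
    if after == 0:
        # Nothing associated after this round: the ratio is undefined or
        # unbounded, so this sketch chooses to stop.
        return True
    return before / after >= threshold


before = [[(0, 0), (1, 0)], [(0, 1)]]
after = [[(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 1)]]
print(ratio_reached(before, after))  # 2 / 5 = 0.4, keep iterating
print(ratio_reached(after, after))   # 5 / 5 = 1.0, stop
```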

The object tracking device according to claim 2 or 3, wherein, for each of the plurality of objects associated by the association unit, the behavioral feature amount extraction unit changes the number of other objects used to re-extract the behavioral feature amount of the object, based on the behavioral tendency indicated by the behavioral feature amount corresponding to the object.

The object tracking device according to any one of claims 1 to 4, wherein the object detection unit detects the object by specifying an object position indicating the position of the object appearing in the captured image, and
the behavioral feature amount extraction unit extracts the behavioral feature amount based on the partial image corresponding to the object position specified by the object detection unit.

The object tracking device according to claim 5, wherein the association unit associates the objects detected by the object detection unit from the plurality of captured images having different imaging times with one another further based on the object position specified by the object detection unit.

The object tracking device according to any one of claims 1 to 6, further comprising an appearance feature amount extraction unit that extracts an appearance feature amount indicating how the object detected by the object detection unit appears in the captured image,
wherein the association unit associates the objects detected by the object detection unit from the plurality of captured images having different imaging times with one another further based on the appearance feature amount extracted by the appearance feature amount extraction unit.

The object tracking device according to any one of claims 1 to 7, wherein the behavioral feature amount extraction unit extracts the behavioral feature amount based on a difference between the partial image of the captured image corresponding to the object detected by the object detection unit and the partial image of a captured image captured at a time before or after the time at which that captured image was captured.
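
The difference-based extraction in the preceding claim can be illustrated with plain Python lists standing in for grayscale frames. This is a minimal sketch, assuming frames are indexed `frame[y][x]`, boxes are integer `(x1, y1, x2, y2)` with exclusive ends, and the partial image at the adjacent time is taken from the same box region. The feature itself (a flattened absolute-difference map plus its mean) and the helper names are hypothetical choices, since the claim does not say how the difference is turned into a feature.

```python
def crop(frame, box):
    """Partial image of a grayscale frame (list of rows); box = (x1, y1, x2, y2), end-exclusive."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in frame[y1:y2]]


def difference_feature(frame_t, frame_adjacent, box):
    """Flattened absolute difference between the object's crop at time t and
    the same box region in a frame captured just before or after."""
    current = crop(frame_t, box)
    adjacent = crop(frame_adjacent, box)
    return [abs(p - q) for row_c, row_a in zip(current, adjacent) for p, q in zip(row_c, row_a)]


def motion_energy(diff):
    """Scalar summary of the difference map: mean absolute difference."""
    return sum(diff) / len(diff) if diff else 0.0


frame_prev = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
frame_t = [[0, 0, 0], [0, 5, 0], [0, 0, 0]]
diff = difference_feature(frame_t, frame_prev, (1, 1, 3, 3))
print(diff, motion_energy(diff))
```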

The object tracking device according to any one of claims 1 to 8, wherein the behavioral feature amount extraction unit extracts the behavioral feature amount based on a first partial image, which is a partial image including the object detected by the object detection unit, and a second partial image, which is a partial image including the object and having a larger display area than the first partial image.
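
The first and second partial images can be obtained by cropping the detection box and a box enlarged about the same centre, so that the second crop also shows some surroundings. The sketch assumes a scale factor of 2 and clipping at the frame border; both are illustrative assumptions, and `expand_box` and `two_scale_crops` are hypothetical names.

```python
def crop(frame, box):
    """Partial image of a frame (list of rows); box = (x1, y1, x2, y2), end-exclusive."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in frame[y1:y2]]


def expand_box(box, scale, width, height):
    """Scale a box about its centre and clip it to the frame."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w, half_h = (x2 - x1) * scale / 2, (y2 - y1) * scale / 2
    return (
        max(0, round(cx - half_w)),
        max(0, round(cy - half_h)),
        min(width, round(cx + half_w)),
        min(height, round(cy + half_h)),
    )


def two_scale_crops(frame, box, scale=2.0):
    """First partial image (tight box) and second partial image (enlarged box)."""
    height, width = len(frame), len(frame[0])
    first = crop(frame, box)
    second = crop(frame, expand_box(box, scale, width, height))
    return first, second


frame = [[y * 6 + x for x in range(6)] for y in range(6)]
first, second = two_scale_crops(frame, (2, 2, 4, 4))
print(first)   # 2 x 2 crop around the object
print(second)  # 4 x 4 crop with surrounding context
```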

The object tracking device according to any one of claims 1 to 9, wherein the behavioral feature amount extraction unit inputs, into a neural network that outputs a behavioral feature amount in response to an input partial image, the partial image corresponding to the object and the partial images corresponding to one or more other objects associated with the object, and re-extracts the behavioral feature amount of the object based on the plurality of behavioral feature amounts output from the neural network.
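
The aggregation step of the preceding claim can be sketched with any callable standing in for the neural network. The sketch assumes, hypothetically, that the plural network outputs are combined by a weighted mean of the object's own feature and the mean of its partners' features. The claim only says the re-extraction is "based on" the plural outputs, so the mixing rule, `self_weight`, and `toy_model` are illustrative placeholders rather than the embodiment's network.

```python
def mean_vector(vectors):
    """Element-wise mean of equally long vectors."""
    return [sum(values) / len(vectors) for values in zip(*vectors)]


def reextract_feature(model, own_crop, partner_crops, self_weight=0.5):
    """Re-extract an object's behavioral feature from its own crop and its partners' crops.

    model: any callable mapping a partial image to a feature vector
    (a stand-in for the network described in the claim).
    """
    own = model(own_crop)
    if not partner_crops:
        return own  # no associated objects: keep the original feature
    partner_mean = mean_vector([model(c) for c in partner_crops])
    return [self_weight * a + (1.0 - self_weight) * b for a, b in zip(own, partner_mean)]


def toy_model(crop):
    """Hypothetical stand-in for a trained network: [mean intensity, max intensity]."""
    pixels = [p for row in crop for p in row]
    return [sum(pixels) / len(pixels), max(pixels)]


print(reextract_feature(toy_model, [[2, 2]], [[[4, 4]], [[0, 8]]]))
```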

The object tracking device according to any one of claims 1 to 10, wherein the behavioral feature amount extraction unit inputs the partial image corresponding to the object into a neural network that outputs a behavioral feature amount in response to an input partial image, acquires the feature amount indicated by an intermediate layer of the neural network, and extracts the behavioral feature amount of the object based on the acquired feature amount.
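
Reading features from an intermediate layer can be illustrated with a tiny two-layer network written in plain Python, whose forward pass returns both the output and the hidden activations. This is a toy under stated assumptions: the weights are hand-picked, the "intermediate layer" is the single ReLU hidden layer, and `TinyNet` and `intermediate_feature` are hypothetical names. A practical implementation would typically use a deep-learning framework's mechanism for capturing hidden activations.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def matvec(W, x):
    """Matrix-vector product with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class TinyNet:
    """Two-layer perceptron: input -> ReLU hidden layer -> linear output."""

    def __init__(self, W1, b1, W2, b2):
        self.W1, self.b1, self.W2, self.b2 = W1, b1, W2, b2

    def forward(self, x):
        hidden = relu([h + b for h, b in zip(matvec(self.W1, x), self.b1)])
        out = [o + b for o, b in zip(matvec(self.W2, hidden), self.b2)]
        return out, hidden  # final output and intermediate-layer activations


def flatten(crop):
    """Turn a partial image (list of rows) into a flat float vector."""
    return [float(p) for row in crop for p in row]


def intermediate_feature(net, crop):
    """Use the hidden-layer activations, not the final output, as the feature."""
    _, hidden = net.forward(flatten(crop))
    return hidden


net = TinyNet(W1=[[1, 0], [0, -1]], b1=[0, 0], W2=[[1, 1]], b2=[0])
print(intermediate_feature(net, [[3, 2]]))
```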

The object tracking device according to any one of claims 1 to 10, wherein the behavioral feature amount extraction unit inputs, into a neural network that outputs a behavioral feature amount in response to an input partial image, the partial image corresponding to the object and the partial images corresponding to one or more other objects associated with the object, acquires the feature amount indicated by an intermediate layer of the neural network, and re-extracts the behavioral feature amount of the object based on the acquired feature amount.

An object tracking method in which a computer executes:
a step of acquiring three or more captured images captured by an imaging device that captures a predetermined area at three or more respective times;
a step of detecting an object appearing in the captured image from each of the acquired captured images;
a step of extracting a behavioral feature amount indicating a behavioral feature of the object based on a partial image of the captured image corresponding to the detected object;
a step of associating the objects detected from the captured images having different imaging times with one another based on the extracted behavioral feature amounts;
a step of re-extracting, for each of the plurality of associated objects, the behavioral feature amount based on the partial image corresponding to the object and the partial images corresponding to each of one or more other objects associated with the object; and
a step of re-associating the objects detected from the captured images having different imaging times with one another based on the re-extracted behavioral feature amounts.