JP2019003329A - Information processor, information processing method, and program

Info

Publication number: JP2019003329A
Application number: JP2017115995A
Authority: JP
Inventors: 健二塚本; Kenji Tsukamoto; 大岳八谷; Hirotaka Hachiya; 克彦森; Katsuhiko Mori
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2017-06-13
Filing date: 2017-06-13
Publication date: 2019-01-10
Anticipated expiration: 2037-06-13
Also published as: JP6976731B2

Abstract

To make it possible to create an identification model using appropriate data containing fewer errors. SOLUTION: A data storage part (104) stores event data containing a feature quantity of an event of an object created in advance. A video acquisition part (101) acquires a video. A feature quantity creation part (103) creates a feature quantity of the event of the object in the acquired video. A data selection part (105) selects event data containing a feature quantity similar to the feature quantity created by the feature quantity creation part (103) from the event data stored in the data storage part (104). An identification model creation part (106) creates an identification model identifying an event of the object in the video by using the feature quantity of the selected event data. SELECTED DRAWING: Figure 1

Description

The present invention relates to an information processing apparatus, an information processing method, and a program for generating an identification model used for identifying an object in video data.

Conventionally, methods have been proposed for creating an identification model that identifies an event of an object (that is, the object and its state) from video data. Creating such an identification model requires learning data, and collecting that data takes time. It is also difficult to confirm whether the collected learning data is sufficient.

To address these problems, Non-Patent Document 1 describes a method in which a model of the detection target is prepared in advance, information such as the camera position and a background image is input as environment information, and learning samples specialized for that environment are generated by CG (computer graphics) and used for additional learning. This reduces the cost of creating learning data tailored to the environment.

Non-Patent Document 1: Masamitsu Tsuchiya, Yuji Yamauchi, Takayoshi Yamashita, Hironobu Fujiyoshi, Improving Learning Efficiency in Object Detection by Hybrid Transfer Learning, IEICE Technical Report, vol. 112, no. 385, PRMU2012-122, pp. 329-334, January 2013.

However, with the technique of Non-Patent Document 1, CG may generate data representing an event of an object (a state of the object) that does not actually occur in the configured scene. Such unnecessary data is then mixed into the learning data and becomes a cause of missed detections and false detections at identification time.

Therefore, an object of the present invention is to make it possible to generate an identification model that can accurately identify an event of an object in a scene.

The present invention is characterized by comprising: storage means for storing a plurality of pieces of event data generated in advance, each including a feature amount of an event of an object; video acquisition means for acquiring a video; feature amount creation means for creating a feature amount of an event of an object in the acquired video; selection means for selecting, from the event data stored in the storage means, event data including a feature amount similar to the feature amount created by the feature amount creation means; and model creation means for creating, using the feature amount of the selected event data, an identification model that identifies an event of an object in a video.

According to the present invention, it is possible to generate an identification model that can accurately identify an event of an object in a scene.

The drawings, in order, are as follows.
A schematic configuration diagram of the information processing apparatus of the first embodiment.
A diagram showing an input example of normal behavior of an object in the first embodiment.
A diagram showing an example of action data of normal behavior.
An explanatory diagram of data search using a hash function group.
A diagram showing an example of compositing collected action data with video.
An explanatory diagram of an example of registration in the data storage unit.
A flowchart of the processing of the information processing apparatus of the first embodiment.
A schematic configuration diagram of the information processing apparatus of the second embodiment.
A diagram showing an example of action data of abnormal behavior.
An explanatory diagram of an example of inputting action data by label selection.
A flowchart of the processing of the information processing apparatus of the second embodiment.
A schematic configuration diagram of the information processing apparatus of the third embodiment.
An explanatory diagram of an example of inputting action data using map information.
A flowchart of the processing of the information processing apparatus of the third embodiment.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.
<First Embodiment>
FIG. 1A shows a schematic configuration example of the information processing apparatus 100 according to the first embodiment. As will be described later, in the information processing apparatus 100 of the present embodiment, an event of an object occurring in a video scene is designated, event data similar to the designated event is selected from a database storing event data of objects created in advance, and an identification model is created from the selected data. In the first embodiment, the information processing apparatus 100 collects event data relating to normal events of the object when creating the identification model. An example of a normal event is the typical behavior of a pedestrian with respect to a crosswalk: when the object is a pedestrian in a video scene of an intersection, the pedestrian walks on the crosswalk. This is of course only an example, and normal events of an object are not limited to the behavior of a pedestrian walking on a crosswalk. The first embodiment is described using video of an outdoor intersection as an example of a video scene, but the video scene may also be, for example, a scene of the interior or surroundings of a public facility such as a commercial facility, a hospital, a nursing care facility, or a station.

The configuration and processing by which the information processing apparatus 100 of the present embodiment generates an identification model when a normal event of an object is designated in a video scene are described below.
The information processing apparatus 100 shown in FIG. 1A includes a video acquisition unit 101, an input unit 102, a feature amount creation unit 103, a data storage unit 104, a data selection unit 105, an identification model creation unit 106, an identification model storage unit 107, and a display unit 108.

The video acquisition unit 101 acquires video data of a monitoring target captured by, for example, a surveillance camera installed at an intersection or in a public facility, and outputs the acquired video data to the display unit 108 and the feature amount creation unit 103.
FIG. 2 shows a display example in which the video data acquired by the video acquisition unit 101 is displayed on the screen of the display unit 108. In this example, n consecutive video frames 201-1 to 201-n from a surveillance camera installed at an intersection are displayed on the screen of the display unit 108. The n video frames 201-1 to 201-n are assumed to show a pedestrian 221 walking on a crosswalk at the intersection. The frames 211-1 to 211-n and the attribute information list 212 on the screen shown in FIG. 2 are described later.

The input unit 102 acquires information such as input instructions from the user via, for example, a GUI (graphical user interface) that uses the screen display of the display unit 108. That is, in the present embodiment, the user can input instructions regarding a normal event of an object via the input unit 102 while viewing the video displayed on the display unit 108. An example of how the user designates a normal event of an object is described below, taking as an example the behavior of the pedestrian 221 walking on the crosswalk at the intersection, as in the video 201-1 to 201-n shown in FIG. 2.

When the behavior of the pedestrian 221 walking on the crosswalk as in FIG. 2 is to be designated as a normal event of the object, the user inputs, via the input unit 102, an instruction designating the pedestrian 221 in the video. On receiving this instruction, the information processing apparatus 100 sets a predetermined frame for the pedestrian 221 in the video based on the instruction. One possible input method is for the user to indicate, via the GUI, the upper-left and lower-right positions of the object (the pedestrian 221) in the video. When the user designates the upper-left and lower-right positions of the pedestrian 221 via the input unit 102, the information processing apparatus 100 sets a rectangular frame whose upper-left and lower-right corners are at the designated positions. The method of designation by the user and the method of setting the frame are not limited to this example, and other methods may be used.

In the present embodiment, a frame is set for the pedestrian 221 in each of the n consecutive video frames 201-1 to 201-n. As a result, frames 211-1 to 211-n are set for the video frames 201-1 to 201-n, respectively. Alternatively, the user may designate the position only in the first of the n consecutive video frames, and for the second to n-th video frames the information processing apparatus 100 may set the frame automatically by tracking the object with the known tracking technique described in Reference 1 below. The tracking method is not limited to that of Reference 1, and other tracking methods may be used. The information processing apparatus 100 of the present embodiment acquires the information on the frames 211-1 to 211-n set for the video frames 201-1 to 201-n as described above (hereinafter referred to as region information) as one piece of information relating to the event of the object.

Reference 1: M. Isard and A. Blake, Condensation - conditional density propagation for visual tracking, International Journal of Computer Vision, vol. 29, no. 1, pp. 5-28, 1998.

As information relating to the normal event of the object, the information processing apparatus 100 of the present embodiment acquires attribute information of the object in addition to the region information described above. Examples of attribute information include category information representing the object, environment information such as the weather, and time information such as the time of day or time period. The information processing apparatus 100 of the present embodiment can also acquire this attribute information through instructions input by the user via the input unit 102.

The attribute information list 212 shown in FIG. 2 is used when the user designates attribute information via the input unit 102. The information processing apparatus 100 of the present embodiment displays the attribute information list 212 on the screen as shown in FIG. 2 and acquires the attribute information that the user designates from it via the input unit 102. The attribute information list 212 illustrated in FIG. 2 consists of pull-down lists from which the user can select category information of the object (for example, category information representing a moving body such as a pedestrian or a bicycle), environment information representing the weather and the like, and time information representing the time of day, time period, and the like. The user can therefore designate attribute information by operating the pull-down lists of the attribute information list 212 via the input unit 102 while viewing the video 201-1 to 201-n in FIG. 2. The attribute information list 212 in FIG. 2 also includes a pull-down list for designating the type of object, which is described later. The attribute information is not limited to the items in the attribute information list 212 of FIG. 2, and other attribute information may also be designated. The information processing apparatus 100 of the present embodiment acquires the attribute information designated by the user from the attribute information list 212 as one piece of information relating to the normal event of the object.

The information processing apparatus 100 of the present embodiment then sends the region information acquired as described above (in the example of FIG. 2, the region information of the frames 211-1 to 211-n) and the attribute information (in the example of FIG. 2, the attribute information designated in the attribute information list 212) to the feature amount creation unit 103.

The feature amount creation unit 103 creates, from the video data, a feature amount of the event of the object in the video. For example, the feature amount creation unit 103 calculates motion vectors of the object from the video data, generates a feature vector whose elements are the average values of those motion vectors, and uses that feature vector as the feature amount. In the example of FIG. 2, the feature amount creation unit 103 calculates motion vectors from the video 201-1 to 201-n within the frames set for the pedestrian 221 and creates, as the feature amount, a feature vector whose elements are the average values of those motion vectors. The feature amount creation unit 103 may instead obtain a feature amount such as HOF (Histogram of Optical Flow) or MHOF (Multi Histogram of Optical Flow) described in Reference 2 below. HOF and MHOF yield a feature amount in which motion vectors are divided by direction and their magnitudes are summed into a histogram. The feature amount creation unit 103 may also obtain HOG (Histogram of Oriented Gradients), described in Reference 3 below, in which the gradient strength of the appearance is formed into a histogram by direction, or some other feature amount. The feature amounts in the present embodiment are not limited to those obtained by the methods described here. The feature amount creation unit 103 then sends the feature amount created as described above, together with the attribute information, to the data selection unit 105 as event data relating to the normal event of the object in the acquired video.
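
The two motion-based feature amounts mentioned above can be sketched roughly as follows. This is a minimal illustration, not the apparatus's actual implementation: it assumes the motion vectors inside the object's frame have already been computed as (dx, dy) pairs, and the function names and the default of 8 direction bins are hypothetical choices made for this example.

```python
import math

def mean_motion_vector(flows):
    """Average the (dx, dy) motion vectors found inside the object's frame."""
    n = len(flows)
    return (sum(dx for dx, _ in flows) / n, sum(dy for _, dy in flows) / n)

def hof(flows, bins=8):
    """HOF-style histogram: sum motion-vector magnitudes per direction bin.

    The bin count is an assumption of this sketch, not a value from the text.
    """
    hist = [0.0] * bins
    width = 2 * math.pi / bins
    for dx, dy in flows:
        magnitude = math.hypot(dx, dy)
        if magnitude == 0.0:
            continue  # a zero vector has no direction
        angle = math.atan2(dy, dx) % (2 * math.pi)  # map direction to [0, 2*pi)
        hist[int(angle / width) % bins] += magnitude
    return hist
```

Either result can serve as the feature vector passed on to the selection step; the averaged vector is compact, while the histogram keeps the distribution of movement directions.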

Reference 2: J. Pers, et al., Histograms of Optical Flow for Efficient Representation of Body Motion, Pattern Recognition Letters, vol. 31, no. 11, pp. 1369-1376, 2010.
Reference 3: N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, In Proceedings of Computer Vision and Pattern Recognition (CVPR), pp. 886-893, 2005.

The data storage unit 104 stores, as event data relating to normal events of objects generated in advance from video data of surveillance cameras and the like, action data 301 such as that shown in FIG. 3. The action data 301 contains (registers) image data of an object extracted from surveillance camera video or the like, a feature amount as described above, and label information representing the attribute information of the object. The label information includes the category of the object (such as pedestrian or bicycle), the weather at the time of shooting (sunny, cloudy, and so on), and the time of day or time period (daytime, evening, and so on). A unique data ID (identification information) is assigned to each set of image data, feature amount, and attribute information in the action data 301. The process of registering the action data 301 stored in the data storage unit 104 is described later together with the configuration of the data registration apparatus 300.

The data selection unit 105 collects, from the action data 301 stored in the data storage unit 104, action data that includes a feature amount similar to the feature amount of the event of the object (for example, the behavior of a pedestrian) created by the feature amount creation unit 103. One collection method is to search the action data 301 in the data storage unit 104 for attribute information that matches the attribute information input for the object in the video and to collect the action data corresponding to the attribute information found. Another is to calculate the Euclidean distance between the feature amount created by the feature amount creation unit 103 and each feature amount of the action data 301 in the data storage unit 104 and to collect the action data whose feature amount has a Euclidean distance equal to or less than a predetermined threshold. The action data collected in this way represents behavior similar to the behavior of the object for which the feature amount creation unit 103 generated the feature amount.
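
The two collection methods described above, attribute matching and a Euclidean distance threshold, might look like the following sketch. The `ActionRecord` class, its field names, and the label keys are hypothetical stand-ins for the entries of the action data 301; the text does not specify a storage layout.

```python
import math
from dataclasses import dataclass

@dataclass
class ActionRecord:
    """Hypothetical stand-in for one entry of the stored action data."""
    data_id: int
    feature: list   # feature vector of the registered behavior
    labels: dict    # e.g. {"category": "pedestrian", "weather": "sunny"}

def select_by_attributes(records, query_labels):
    """Collect records whose labels match every attribute given for the query."""
    return [r for r in records
            if all(r.labels.get(key) == value for key, value in query_labels.items())]

def select_by_distance(records, query_feature, threshold):
    """Collect records whose feature lies within `threshold` of the query feature."""
    return [r for r in records
            if math.dist(r.feature, query_feature) <= threshold]
```

The two filters can also be chained, for example matching attributes first and then applying the distance threshold to the remaining records.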

The data selection unit 105 may also collect action data from the data storage unit 104 by data search processing that uses an approximate nearest neighbor search method such as the p-stable hashing described in Reference 4 below. When performing data search processing with an approximate nearest neighbor search method, the data selection unit 105 first creates a hash function according to the following equation (1). In equation (1), "a" is a vector of per-dimension element values whose number of dimensions equals that of the feature amounts stored in the data storage unit 104, "r" is the width by which the space is divided, and "b" is a real number chosen uniformly from [0, r].
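
Equation (1) itself is not reproduced in this text. For reference, the hash function defined in Reference 4 uses the same symbols a, b, and r and has the form below; it is given here on the assumption that equation (1) follows that definition. The symbol v, denoting the feature vector being hashed, is introduced here for illustration.

```latex
h_{a,b}(v) = \left\lfloor \frac{a \cdot v + b}{r} \right\rfloor
```

In Reference 4 the elements of a are drawn independently from a p-stable distribution (for example, a Gaussian distribution for Euclidean distance), so that nearby feature vectors tend to receive the same hash value.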

Reference 4: M. Datar, N. Immorlica, P. Indyk and V. S. Mirrokni, Locality-sensitive hashing scheme based on p-stable distributions, Proceedings of the 20th Annual Symposium on Computational Geometry, pp. 253-262, 2004.

The data selection unit 105 creates a plurality of such hash functions to form a hash function group. FIG. 4 represents each feature amount of the action data 301 in the data storage unit 104 (the feature amounts 401 in FIG. 4) as a black circle and shows the feature space containing the feature amounts 401 divided linearly by the hash function group 402. In the example of FIG. 4, the feature amount 411 created by the feature amount creation unit 103 is represented by a cross. The data selection unit 105 determines which region divided by the hash functions each feature amount 401 of the action data 301 in the data storage unit 104 belongs to, and likewise determines which divided region the feature amount 411 created by the feature amount creation unit 103 belongs to. The data selection unit 105 then identifies, among the feature amounts 401 of the action data 301 stored in the data storage unit 104, the feature amounts 401 (413) that lie in the divided region 412 to which the feature amount 411 belongs, and collects the action data including the identified feature amounts 401 (413) from the action data 301 stored in the data storage unit 104. The action data collected in this way represents behavior similar to the behavior of the object for which the feature amount creation unit 103 generated the feature amount 411.
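
The search with a hash function group can be sketched as below. This is an illustrative outline only: each hash follows the p-stable form of Reference 4, floor((a . v + b) / r), with Gaussian entries for a (an assumption of this sketch), and all function names, the fixed seed, and the dictionary-based index are hypothetical. The tuple of hash values plays the role of a divided region of the feature space.

```python
import math
import random

def make_hash(dim, r, rng):
    """One hash h(v) = floor((a . v + b) / r); a is Gaussian (assumed), b uniform in [0, r]."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, r)
    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / r)
    return h

def make_hash_group(dim, r, count, seed=0):
    """Create `count` hash functions that together divide the feature space."""
    rng = random.Random(seed)
    return [make_hash(dim, r, rng) for _ in range(count)]

def region_key(hash_group, v):
    """Identify the divided region containing v by its tuple of hash values."""
    return tuple(h(v) for h in hash_group)

def build_index(hash_group, features):
    """Map each region key to the IDs of the stored features that fall in it."""
    index = {}
    for data_id, v in features.items():
        index.setdefault(region_key(hash_group, v), []).append(data_id)
    return index

def query(index, hash_group, v):
    """Return the IDs of stored features in the same region as the query feature."""
    return index.get(region_key(hash_group, v), [])
```

A larger r or fewer hash functions makes the regions coarser and returns more candidates; a smaller r or more hash functions narrows the result.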

Next, the data selection unit 105 selects whether or not each piece of action data collected from the action data 301 stored in the data storage unit 104 as described above is to be used for creating the identification model. One selection method, shown in FIG. 5, is to composite each image 511 corresponding to a piece of collected action data individually onto the video 201, display the result on the screen of the display unit 108, and have the user check it and make the selection. For this compositing, the data selection unit 105 determines the range over which the object moves, based on the input positions of the region information (of the same kind as described above) acquired when the action data in the data storage unit 104 was created. Within the determined range, the data selection unit 105 composites the image 511 onto the video 201 at a different position in each frame in accordance with the movement. Frames are switched, for example, in response to a frame switching instruction from the user via the input unit 102, so that the display unit 108 shows the image 511 moving frame by frame within the video 201.

As shown in FIG. 5, the data selection unit 105 also displays a "Select" button icon 531 and a "Do not select" button icon 532 on the screen of the display unit 108, for example below the video 201 onto which the image 511 has been composited. When the user operates the "Select" button icon 531 via the input unit 102, the data selection unit 105 selects the action data corresponding to the image 511 shown at that time as learning data for creating the identification model. When the user instead operates the "Do not select" button icon 532, the data selection unit 105 does not select that action data as learning data. In the present embodiment, this selection process is repeated for each piece of collected action data, so that a plurality of pieces of action data are selected for learning the identification model.

Alternatively, the data selection unit 105 may calculate the distance between the feature amount of each piece of collected action data and the feature amount from the feature amount creation unit 103, group the action data according to that distance, choose a representative for each distance group, and display the images of the representative action data on the display unit 108. In this case, the action data that the user selects from among the displayed representatives is selected as learning data for creating the identification model.

The data selection unit 105 of the present embodiment then sends the action data selected by the user via the input unit 102, from among the action data collected from the data storage unit 104 as described above, to the identification model creation unit 106.

The identification model creation unit 106 creates an identification model using the action data selected by the data selection unit 105 as described above. One way to create the identification model is to apply k-means clustering to the action data and use the resulting cluster information as the identification model. In this case, the number of clusters may be determined based on the number of pieces of action data input to the identification model creation unit 106. The identification model creation unit 106 creates the centroid position and the cluster range of each cluster as the identification model. When the feature amount of some action data is input to the identification model, the model determines that the behavior is normal if the feature amount lies within the range of the cluster nearest to it in the feature space, and not normal if it lies outside that range. The identification model creation method in the present embodiment is not limited to k-means clustering, and another method may be used.
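
The k-means based model described above can be sketched as follows. This is a simplified illustration under stated assumptions: the cluster count k is passed in directly, the cluster range is taken as the largest distance from the centroid to any member (the text leaves the definition of the range open), and the function names, iteration count, and seed are hypothetical.

```python
import math
import random

def kmeans_centers(points, k, iters=20, seed=0):
    """Plain k-means: returns k centroid positions for the given feature vectors."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster ends up empty
                centers[i] = [sum(column) / len(members) for column in zip(*members)]
    return centers

def build_model(points, k):
    """Model = list of (centroid, range); range is the farthest member distance (assumed)."""
    centers = kmeans_centers(points, k)
    ranges = [0.0] * k
    for p in points:
        i = min(range(k), key=lambda j: math.dist(p, centers[j]))
        ranges[i] = max(ranges[i], math.dist(p, centers[i]))
    return list(zip(centers, ranges))

def is_normal(model, feature):
    """Normal if the feature lies within the range of its nearest cluster."""
    center, cluster_range = min(model, key=lambda cr: math.dist(feature, cr[0]))
    return math.dist(feature, center) <= cluster_range
```

With this definition of the range, every feature used to build the model is judged normal, while features far from all centroids are judged not normal; a variance-based range would give a tighter or looser boundary depending on the chosen multiple.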

The identification model created by the identification model creation unit 106 is sent to the identification model storage unit 107 and stored there, and is also output to the display unit 108. What is output as the identification model is the centroid position of each cluster and the cluster range (for example, the variance of the cluster).

The display unit 108 displays the video acquired by the video acquisition unit 101 on the screen, and also displays the content input via the input unit 102 and a composite image containing the image of the behavior data selected by the data selection unit 105. The identification model created by the identification model creation unit 106 may also be displayed on the display unit 108 as an icon or the like.
The above is the configuration and processing of the information processing apparatus 100 of the present embodiment shown in FIG. 1A.

<Configuration and Processing of Data Registration>
FIG. 1B shows the components extracted from the information processing apparatus 100 of FIG. 1A that perform the data registration process, in which behavior data is registered as a database in the data storage unit 104. The configuration of FIG. 1B may be a device separate from the information processing apparatus 100 of FIG. 1A. Hereinafter, in the present embodiment, the configuration shown in FIG. 1B is referred to as the data registration device 300. As shown in FIG. 1B, the data registration device 300 includes the video acquisition unit 101, the input unit 102, the feature amount creation unit 103, the display unit 108, and the data storage unit 104.

As described above, the video acquisition unit 101 acquires video data from a surveillance camera or the like and sends it to the feature amount creation unit 103 and the display unit 108. FIG. 6 shows an example of the screen of the display unit 108 displaying the video 201 of the video data acquired by the video acquisition unit 101.

As described above, the input unit 102 acquires, via a GUI or the like using the screen of the display unit 108, user input that designates a normal event of an object (for example, the behavior of a pedestrian). FIG. 6 shows an example in which, from the video 201 showing a pedestrian 621 as the object walking on a pedestrian crossing, the behavior of the pedestrian 621 walking on the crossing is designated as a normal event. On the screen of FIG. 6, in the same manner as described with reference to FIG. 2, a frame 601 is set around the pedestrian 621 in the video 201, and an attribute information list 602 is also displayed. When the setting of the frame 601 and the input of attribute information using the attribute information list 602 are complete and the user, for example, operates the "input complete" button icon 631, the input unit 102 outputs the region information, attribute information, and the like to the feature amount creation unit 103, as described above. Until the "input complete" button icon 631 is operated, the state in which a behavior can be designated and attribute information can be set is maintained.

The feature amount creation unit 103 creates a feature amount in the same manner as described above. The feature amount created by the feature amount creation unit 103 and the attribute information corresponding to the input via the input unit 102 are then output to the data storage unit 104.

In the data storage unit 104, the feature amount created by the feature amount creation unit 103 is associated with the attribute information from the input unit 102, and the result is stored as behavior data to which a data ID (identification information) is assigned. For the feature amount, a hash value is also created, for example using the p-stable hashing described above (information indicating which of the regions linearly divided by each hash function the feature amount belongs to), and this data is stored as well.
In the data registration device 300 of FIG. 1B, the learning database is built by performing the data registration process as described above.
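As an illustration of the registration described above, the following sketch stores each feature amount together with its attribute information, a data ID, and a hash value. It assumes the standard p-stable form h(v) = floor((a . v + b) / w) with Gaussian-distributed a; the bucket width, the number of hash functions, and the names DataStore and register are hypothetical choices made for the example.

```python
import math
import random


def make_hash_functions(dim, count, width=4.0, seed=0):
    """Create `count` p-stable hash functions (a, b, w) with Gaussian a (p = 2)."""
    rng = random.Random(seed)
    return [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width), width)
            for _ in range(count)]


def hash_value(feature, hash_functions):
    """One bucket index per hash function: floor((a . v + b) / w)."""
    return tuple(
        math.floor((sum(a_i * x_i for a_i, x_i in zip(a, feature)) + b) / w)
        for a, b, w in hash_functions
    )


class DataStore:
    """Hypothetical stand-in for the data storage unit 104."""

    def __init__(self, hash_functions):
        self.hash_functions = hash_functions
        self.records = {}
        self.next_id = 1

    def register(self, feature, attributes):
        """Assign a data ID and store feature, attributes and hash value together."""
        data_id = self.next_id
        self.next_id += 1
        self.records[data_id] = {
            "feature": feature,
            "attributes": attributes,
            "hash": hash_value(feature, self.hash_functions),
        }
        return data_id
```

Because identical feature amounts always receive identical hash values, a later lookup can collect candidate behavior data by comparing hash values instead of scanning every stored feature amount.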

<Description of Processing Flowcharts>
The flow of processing in the information processing apparatus 100 of the present embodiment is described below with reference to the flowcharts of FIGS. 7A to 7C. FIG. 7A shows the identification model creation process, FIG. 7B the data selection process, and FIG. 7C the data registration process. In these flowcharts, the word "step" is omitted, so that steps S701 to S726 are written simply as S701 to S726. The processing of the flowcharts of FIGS. 7A to 7C may be executed by a hardware configuration or a software configuration, or partly by software and the remainder by hardware. When the processing is executed by software, a program stored in a ROM or the like (not shown) is loaded into a RAM or the like and executed by a CPU or the like. The program according to the present embodiment need not be prepared in the ROM or the like in advance; it may instead be read from a removable semiconductor memory, or downloaded from a network such as the Internet (not shown), and loaded into the RAM or the like. The same applies to the other flowcharts described later.

First, the flowchart of the identification model creation process in FIG. 7A is described.
In S701, the video acquisition unit 101 acquires video data from a surveillance camera or the like and outputs it to the feature amount creation unit 103 and the display unit 108. After S701, the processing of the information processing apparatus 100 proceeds to S702.
In S702, the display unit 108 displays the video sent from the video acquisition unit 101. After S702, the processing of the information processing apparatus 100 proceeds to S703.

In S703, the input unit 102 acquires region information and attribute information as described above, based on the user's input regarding an event of an object in the video displayed on the display unit 108, and outputs the region information and attribute information to the feature amount creation unit 103. After S703, the processing of the information processing apparatus 100 proceeds to S704.

In S704, the feature amount creation unit 103 creates a feature amount as described above, based on the region information and attribute information representing the behavior of the object, and outputs the feature amount to the data selection unit 105. After S704, the processing of the information processing apparatus 100 proceeds to S705.

In S705, as described above, the data selection unit 105 selects, from the database of the data storage unit 104, behavior data having a feature amount similar to the feature amount created by the feature amount creation unit 103. The detailed flow of this data selection process is described with reference to the flowchart of FIG. 7B. After S705, the data selection unit 105 advances the processing to S706.

In S706, the data selection unit 105 determines whether the user has given an input-complete instruction via the input unit 102. If no input-complete instruction has been given and input regarding the behavior of the object continues via the input unit 102 (NO), the data selection unit 105 returns the processing of the information processing apparatus 100 to S703. If an input-complete instruction has been given (YES), the data selection unit 105 outputs the behavior data selected from the data storage unit 104 to the identification model creation unit 106 as the data to be used for creating the identification model, and the processing of the information processing apparatus 100 proceeds to S707.

In S707, the identification model creation unit 106 creates the identification model (that is, trains the identification model) as described above, using the behavior data for identification model creation. The identification model creation unit 106 then stores the created identification model in the identification model storage unit 107. When S707 is complete, the information processing apparatus 100 ends the identification model creation process.
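The control flow of S703 to S707 can be summarized in a short sketch. The four callables are hypothetical placeholders for the input unit, the feature amount creation unit, the data selection unit, and the identification model creation unit, and bundling the input-complete flag with the user input is a simplification made for the example; video acquisition and display (S701, S702) are omitted.

```python
def create_identification_model(get_user_input, make_feature, select_data, train):
    """Run the S703-S707 loop; every argument is a caller-supplied callable."""
    selected = []
    while True:
        region, attributes, input_complete = get_user_input()  # S703
        feature = make_feature(region, attributes)             # S704
        selected.extend(select_data(feature))                  # S705
        if input_complete:                                     # S706: YES leaves the loop
            break
    return train(selected)                                     # S707
```

Each pass through the loop adds the behavior data selected for one designated behavior, and training runs once on the accumulated data after the user signals that input is complete.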

Next, the data selection process (the process of S705) shown in the flowchart of FIG. 7B is described. The following description uses the hash function described above as an example.
In S711, the data selection unit 105 calculates a hash value by applying the hash function, as described above, to the feature amount obtained in S704. The data selection unit 105 then collects the behavior data stored in the data storage unit 104 whose feature amount has the same hash value as the calculated one. If behavior data is collected in S711, the data selection unit 105 advances the processing to S712.

In S712, the data selection unit 105 initializes an index i that identifies the behavior data item to be referenced. The index i is, for example, a number assigned in order to the behavior data collected as described above. When the initialization of the index i is complete, the data selection unit 105 advances the processing to S713.
In S713, the data selection unit 105 determines whether the index i of the behavior data to be referenced exceeds the number I of collected behavior data items (i > I). If the index i is less than or equal to I (i ≤ I) (NO), the data selection unit 105 advances the processing to S714; if it exceeds I (YES), the processing of FIG. 7B ends.

In S714, the data selection unit 105 composites the image of the image data included in the behavior data of index i, among the collected behavior data, onto the video acquired by the video acquisition unit 101, as described with reference to FIG. 5. After S714, the data selection unit 105 advances the processing to S715.

In S715, the data selection unit 105 determines whether the user, having viewed the composite video displayed on the display unit 108, has instructed via the input unit 102 that the behavior data of index i be selected or not selected. If the behavior data of index i is selected, for example by operating the "select" button icon 531 of FIG. 5 (YES), the data selection unit 105 advances the processing to S716. If an instruction not to select is given, for example by operating the "do not select" button icon 532 of FIG. 5 (NO), the data selection unit 105 advances the processing to S717.

In S716, the data selection unit 105 sets the behavior data of index i selected in S715 as identification model creation data. After S716, the data selection unit 105 advances the processing to S717.

In S717, the data selection unit 105 increments the index i so that the next behavior data item is referenced, and then returns the processing to S713. Once S714 to S716 have been performed for all of the collected behavior data and the index i is updated in S717, the index i exceeds the number I of collected behavior data items. Accordingly, S713 then determines that i exceeds I (YES), and the processing of the flowchart of FIG. 7B ends.
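The loop of S711 to S717 can be sketched as below. The record layout (dictionaries with "id" and "hash" keys), the callable user_selects standing in for the select / do not select buttons, and the initial value i = 1 are assumptions made for the example; the compositing of S714 is omitted.

```python
def select_behavior_data(query_hash, database, user_selects):
    """Return the records the user chose from those sharing the query's hash value."""
    # S711: collect stored behavior data whose hash value equals the query's.
    collected = [record for record in database if record["hash"] == query_hash]
    selected = []
    i = 1                                # S712: initialise index i (assumed to start at 1)
    while i <= len(collected):           # S713: finish once i > I
        record = collected[i - 1]
        # S714: the image would be composited onto the video here (omitted).
        if user_selects(record):         # S715: user chose "select"
            selected.append(record)      # S716: use for identification model creation
        i += 1                           # S717: advance to the next behavior data
    return selected
```

If no stored record shares the query's hash value, the loop body never runs and an empty list is returned.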

Next, the data registration process shown in the flowchart of FIG. 7C is described. The processing of the flowchart of FIG. 7C is performed by the data registration device 300 of FIG. 1B.
In S721, the video acquisition unit 101 acquires video data from the surveillance camera and outputs it to the feature amount creation unit 103 and the display unit 108. After S721, the processing of the data registration device 300 proceeds to S722.
In S722, the display unit 108 displays the video sent from the video acquisition unit 101. The displayed frame is changed by user operation via the input unit 102, and the video of the changed frame is displayed. After S722, the processing of the data registration device 300 proceeds to S723.

In S723, the input unit 102 acquires the region information and attribute information described above, based on the user's input regarding a normal event of an object in the video displayed on the display unit 108, and outputs them to the feature amount creation unit 103. As described above, a behavior is input as a normal event by designating, through GUI operation, the region containing the object (a pedestrian, a bicycle, or the like) in each frame of the video. After S723, the processing of the data registration device 300 proceeds to S724.

In S724, the feature amount creation unit 103 creates a feature amount in the same manner as described above. The created feature amount and the attribute information are then sent to the data storage unit 104. After S724, the processing of the data registration device 300 proceeds to S725.

In S725, as described above, the data storage unit 104 assigns a data ID to the feature amount information and the registration information and stores (registers) them. For the feature amount, a hash value, for example, is created as described above, and this information is stored as well. After S725, the processing of the data registration device 300 proceeds to S726.

In S726, the input unit 102 determines whether the user has operated the "input complete" button icon 631 illustrated in FIG. 6. If no input-complete instruction has been given (NO), the input unit 102 returns the processing to S723. If an input-complete instruction has been given (YES), the data registration device 300 ends the processing of the flowchart of FIG. 7C. Although not shown in the example of FIG. 6, a "continue" button icon, for example, may be provided on the screen, and when that icon is operated, S726 may be determined as NO and the processing returned to S723.

As described above, the information processing apparatus 100 of the first embodiment designates the behavior, as an event, and the state of an object in a video scene, and collects behavior data similar to them from a database created and registered in advance. From the behavior data collected from the database, behavior data appropriate to the video scene is then selected, and an identification model is created using the selected behavior data. Thus, according to the information processing apparatus 100 of the present embodiment, an identification model capable of accurately identifying events of objects in a video scene can be created for an installed surveillance camera even when, for example, little training video data is available.

<Second Embodiment>
FIG. 8 shows a schematic configuration example of an information processing apparatus 800 according to the second embodiment.
When creating an identification model, the information processing apparatus 800 of the second embodiment collects, as data for identifying an object and its state, behavior data corresponding to events that differ from normal events of the object, in addition to the normal events described in the first embodiment. Examples of such events, when the object is a pedestrian, a bicycle, or the like, include the pedestrian or bicycle falling over, collapsing, or crossing where crossing is prohibited. The second embodiment is also described using a video scene of an outdoor intersection as an example, but video scenes of other public facilities and the like may be used. In the following description, an event that differs from a normal event is referred to as an "abnormal event", and the behavior of an object as an abnormal event is referred to as "abnormal behavior". In the second embodiment, information on normal and abnormal events is input by selecting label icons, described later.

The following describes the configuration and processing by which the information processing apparatus 800 of the second embodiment shown in FIG. 8 accepts input of information on normal and abnormal events of an object, collects behavior data, and generates an identification model based on the collected behavior data. Components of the information processing apparatus 800 that are the same as those of the information processing apparatus 100 of the first embodiment are given the same reference numerals, and their description is omitted. The information processing apparatus 800 of the second embodiment differs from the information processing apparatus 100 of the first embodiment in its data selection unit 805, data storage unit 804, and identification model creation unit 806; the rest of the configuration is the same as in the first embodiment.

In the second embodiment, input processing for abnormal events of an object is performed in addition to the input processing for normal events described in the first embodiment. In the input processing for abnormal events, label information indicating the type of the event and, when the event is abnormal, label information indicating the meaning of the abnormal event are input. The label information indicating the type of event indicates whether the event belongs to the "normal" or "abnormal" type. The label information indicating the meaning of an abnormal event is, when the object is a pedestrian or the like, information such as "fall", "collapse", or "crossing prohibited". Accordingly, the data storage unit 804 of the second embodiment stores behavior data whose attribute information includes label information indicating the type of event of the object and label information indicating the meaning of the abnormal event.

FIG. 9 shows an example of behavior data 901 for a case where the object is a pedestrian or the like and its behavior is abnormal. In the example of FIG. 9, a unique data ID is assigned to the image data representing the abnormal behavior of the object, its feature amount, and the attribute information consisting of the label information associated with the abnormal behavior. In the attribute information of FIG. 9, "abnormal" is recorded as the label information indicating the type of behavior, "pedestrian" as the label information indicating the object, and "fall", "collapse", and "crossing prohibited" as the label information indicating the meaning of the abnormal behavior. The data storage unit 804 of the second embodiment stores behavior data 901 such as that shown in FIG. 9. Although not shown in FIG. 9, information for cases where the behavior of the object is normal is also recorded in the behavior data 901 as appropriate.
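One way to picture the behavior data 901 is as a record holding a data ID, image data, a feature amount, and labeled attribute information. The class and field names below are hypothetical and chosen only to mirror the items named in the text; the actual storage layout is not specified.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AttributeInfo:
    event_type: str                                     # "normal" or "abnormal"
    target: str                                         # e.g. "pedestrian"
    meanings: List[str] = field(default_factory=list)   # e.g. ["fall"]; empty if normal


@dataclass
class BehaviorData:
    data_id: int               # unique data ID
    image: bytes               # image data showing the behavior
    feature: List[float]       # feature amount created from the image
    attributes: AttributeInfo  # label information attached by the user


example = BehaviorData(
    data_id=1,
    image=b"",
    feature=[0.2, 0.7],
    attributes=AttributeInfo("abnormal", "pedestrian", ["fall", "collapse"]),
)
```

Keeping the event type separate from the meaning labels lets normal and abnormal records share one structure, with the meaning list simply left empty for normal behavior.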

In the second embodiment as well, the data selection unit 805 selects the behavior data used to create the identification model, based on input from the user via the input unit 102.
FIG. 10 shows a display example of the display unit 108 displaying the video 1001 of the video data acquired by the video acquisition unit 101. As described above, the input unit 102 acquires the user's instruction input for an object via a GUI or the like using the screen of the display unit 108. FIG. 10 shows an example of the video 1001 in which a pedestrian 1021 as the object is walking on a pedestrian crossing. In the second embodiment, in addition to the video 1001 and an attribute information list 1012 similar to that described above, the screen of the display unit 108 also displays a label list 1002 in which each item of label information of the behavior data stored in the data storage unit 804 is represented by an icon. That is, the label list 1002 is a list classified according to the label information in the attribute information of each behavior data item stored in the data storage unit 804. In the example of FIG. 10, the label list 1002 consists of a normal behavior label list corresponding to normal behavior, an abnormal behavior label list corresponding to abnormal behavior, and a list of other behavior labels.

In the second embodiment, the user inputs label information for behavior data by designating an icon in the label list 1002 through GUI operation via the input unit 102 while viewing the video 1001. FIG. 10 shows an example in which the user has input, for example, the crossing-prohibited label icon 1003 via the input unit 102. That is, when the crossing-prohibited label icon 1003 is set while the pedestrian 1021 is walking on the pedestrian crossing, as in the example of FIG. 10, behavior data is set in which the pedestrian 1021 walking on the crossing is treated as abnormal behavior.

In the present embodiment, the label list 1002 also provides icons that allow label information to be input for, for example, a traffic signal. For example, by setting the traffic signal 1004 for the road orthogonal to the pedestrian crossing on which the pedestrian 1021 is walking to, say, the red-light state and then inputting information on the behavior of the pedestrian 1021 at that time, the information on the traffic signal 1004 is set together with it. This makes it possible to input behavior data that reflects changes in the state of the traffic signal 1004.
When these inputs are complete, the behavior data set by them is output to the identification model creation unit 806.

The identification model creation unit 806 of the second embodiment trains the identification model using the set behavior data, as described above. In the second embodiment, each behavior data item carries attribute information indicating whether it is of the normal or abnormal type. The identification model creation unit 806 therefore creates the identification model using the SVM (Support Vector Machine) method, treating normal label information as class "+1" and abnormal label information as class "-1". This yields an identification model that can determine whether the behavior data of an input feature amount represents normal or abnormal behavior. A method such as AdaBoost can also be used to create the identification model.
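To make the +1 / -1 class assignment concrete, the following sketch trains a linear SVM by stochastic subgradient descent on the regularized hinge loss. The text names only "SVM" and does not specify a kernel or solver, so the linear form, the training procedure, and all parameter values here are assumptions; note also that this primal form keeps a weight vector and bias rather than support vectors and coefficients.

```python
def train_linear_svm(features, labels, epochs=200, lr=0.1, reg=0.01):
    """Return (w, b) for a linear SVM; labels are +1 (normal) or -1 (abnormal)."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # L2 regularisation shrinks the weights on every step.
            w = [wi - lr * reg * wi for wi in w]
            if margin < 1:
                # Hinge loss is active: move the boundary away from this sample.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def classify(model, x):
    """Return +1 (normal) or -1 (abnormal) for a feature amount x."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
```

Given feature amounts labeled +1 for normal and -1 for abnormal behavior, classify returns the predicted class of a new feature amount from the sign of the decision value.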

In the second embodiment, as described above, information indicating, for example, the state of a traffic signal can also be input as attribute information. It is therefore possible, for example, to create an identification model for the red-light state from information input while the traffic signal is red, and an identification model for the green-light state from information input while the traffic signal is green. By preparing a separate identification model for each lighting state of the traffic signal in this way, the identification model can be switched according to the traffic signal information when identifying an object or its behavior, enabling more accurate determination of normal and abnormal behavior.
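Switching between per-state identification models amounts to a lookup keyed by the traffic signal state. In the sketch below, the state names, the one-dimensional feature, and the two stand-in models are hypothetical; in practice each entry would be a model trained on behavior data input under that signal state.

```python
def judge_behavior(models, signal_state, feature):
    """Apply the identification model prepared for the current signal state."""
    model = models[signal_state]  # switch models by traffic-signal state
    return model(feature)


# Hypothetical stand-in models: the feature is 1.0 when the pedestrian is on the
# crossing and 0.0 otherwise; each model returns +1 (normal) or -1 (abnormal).
models = {
    "red": lambda feature: -1 if feature >= 0.5 else 1,
    "green": lambda feature: 1,
}
```

With these stand-ins, the same feature amount (a pedestrian on the crossing) is judged abnormal under the "red" model and normal under the "green" model.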

The identification model data created by the identification model creation unit 806 as described above is sent to the identification model storage unit 107 and stored there. In this embodiment, since the identification model is created using the SVM method, a plurality of support vectors, the coefficient corresponding to each support vector, and a threshold value are stored in the identification model storage unit 107.
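To illustrate how such a stored model could be applied, the following minimal Python sketch evaluates an SVM-style decision function from support vectors, their coefficients, and a threshold. The RBF kernel, the `gamma` value, the sign convention (score minus threshold), and all names are assumptions for this example; the text only states which quantities are stored.

```python
import math

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel choice)."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)

def svm_decide(x, support_vectors, coefficients, threshold, kernel=rbf_kernel):
    """Return +1 (normal) or -1 (abnormal) from the stored model parameters."""
    score = sum(c * kernel(sv, x) for sv, c in zip(support_vectors, coefficients))
    return 1 if score - threshold >= 0 else -1

# Hypothetical stored model: one support vector per class.
support_vectors = [(0.0, 0.0), (3.0, 3.0)]
coefficients = [1.0, -1.0]  # the sign carries the class of each support vector
threshold = 0.0
```

With these illustrative parameters, a feature close to the first support vector is classified as "+1" and one close to the second as "−1".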

Next, the identification model creation process in the second embodiment will be described in detail with reference to the flowchart of FIG. 11. In FIG. 11, processing steps identical to those in the flowchart of FIG. 7(a) described above are denoted by the same reference numerals, and their description is omitted.
In the flowchart of FIG. 11, after S702, the processing of the information processing apparatus 800 proceeds to S1113.

In S1113, based on the user's input instruction for the event of the object displayed on the display unit 108, the input unit 102 acquires attribute information including each piece of label information related to abnormal behavior of the object, together with the same region information as described above. After S1113, the processing of the input unit 102 proceeds to S1114.

In S1114, the input unit 102 determines whether the label information of the attribute information instructed by the user was input from the label list 1002 described above. Specifically, the input unit 102 determines whether the input came from the label list 1002 according to whether any label icon in the label list 1002 has been selected, for example by the user clicking it. The determination method is not limited to this one. If it is determined in S1114 that the input came from the label list 1002, the information on the instruction input by the user via the input unit 102 is sent to the data selection unit 805, and the processing of the information processing apparatus 800 proceeds to S1115. If, on the other hand, it is determined in S1114 that the input did not come from the label list 1002, the information on the instruction input by the user via the input unit 102 is sent to the feature amount creation unit 103, and the processing of the information processing apparatus 800 proceeds to S704 described above. When the processing proceeds to S704, the information processing apparatus 800 then proceeds to S705 described above, and further to S706 described above.

When the processing proceeds to S1115, the data selection unit 805 collects behavior data from the data storage unit 804 based on the label information corresponding to the label icon input by the user via the input unit 102 in S1113. That is, the data selection unit 805 searches the behavior data in the data storage unit 804 by label information, collects the matching behavior data, and sends it to the identification model creation unit 806. After S1115, the processing of the information processing apparatus 800 proceeds to S706 described above.
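The label-based collection in S1115 amounts to a lookup by label. A minimal Python sketch is shown below; the record layout (`id`, `label`, `kind`, `feature`) and the sample records are hypothetical and only stand in for the behavior data held in the data storage unit 804.

```python
# Hypothetical stored behavior data; the field names are assumptions for this example.
stored_behavior_data = [
    {"id": 1, "label": "walking", "kind": "normal", "feature": [0.5, 0.0]},
    {"id": 2, "label": "falling", "kind": "abnormal", "feature": [0.0, -2.0]},
    {"id": 3, "label": "walking", "kind": "normal", "feature": [0.4, 0.1]},
]

def collect_by_label(records, label):
    """Return every stored record whose label matches the selected label icon."""
    return [record for record in records if record["label"] == label]

selected = collect_by_label(stored_behavior_data, "walking")
```

Because the search uses only the label, no feature amount needs to be computed from the video on this branch.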

If it is determined in S706 that the input has been completed as described above, the processing of the information processing apparatus 800 proceeds to S1117.
In S1117, the identification model creation unit 806 trains the identification model using the behavior data collected for creating the identification model. In the second embodiment, the identification model creation unit 806 uses the attribute information of the input behavior data to divide it into behavior data of normal behavior and behavior data of abnormal behavior. Then, as described above, the identification model creation unit 806 assigns the normal behavior data to class "+1" and the abnormal behavior data to class "−1", and creates the identification model using SVM. The identification model created in this way (a plurality of support vectors, the coefficient corresponding to each support vector, and a threshold value) is output to the identification model storage unit 107 and stored. After completing S1117, the information processing apparatus 800 ends the identification model creation process of the flowchart of FIG. 11.
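The division into the two classes in S1117 can be sketched in Python as follows. The record fields and sample values are hypothetical; the sketch only prepares the feature list and the "+1"/"−1" class list that an SVM trainer would take as input, and does not perform the training itself.

```python
# Hypothetical behavior data with attribute information (field names are assumptions).
behavior_data = [
    {"feature": [0.5, 0.0], "kind": "normal"},
    {"feature": [0.0, -2.0], "kind": "abnormal"},
    {"feature": [0.4, 0.1], "kind": "normal"},
]

def to_training_set(records):
    """Map normal behavior to class +1 and abnormal behavior to class -1."""
    features = [record["feature"] for record in records]
    classes = [1 if record["kind"] == "normal" else -1 for record in records]
    return features, classes

features, classes = to_training_set(behavior_data)
```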

As described above, the information processing apparatus 800 of the second embodiment displays label icons corresponding to the label information of the behavior data stored in the data storage unit 804, and information on the behavior of the object is input when the user selects a label icon. That is, in the second embodiment, selecting label icons makes it possible to create an identification model that can determine whether an event of an object in a video scene is normal or abnormal.

<Third Embodiment>
FIG. 12 shows a schematic configuration example of an information processing apparatus 1200 according to the third embodiment.
The information processing apparatus 1200 of the third embodiment is an example that, in addition to displaying video from a monitoring camera or the like as described in the first and second embodiments, also displays map information of the place where the video is acquired, so that information on the event of the object can be input.

The following describes the configuration and processing by which the information processing apparatus 1200 shown in FIG. 12 collects behavior data as events of objects using map information that includes information on a plurality of scenes, and creates an identification model by learning or the like based on the collected behavior data. Although this embodiment is illustrated with an example in which the monitoring camera or the like is installed in an indoor public facility, the installation is not limited to this; the camera may be installed in facilities such as a hospital, a nursing facility, or a station, or outdoors.

In the information processing apparatus 1200 shown in FIG. 12, the map information storage unit 1201 holds map information on the place where the monitoring camera or the like is installed and its surroundings. When the monitoring camera or the like is installed inside a building, for example, the map information includes the floor plan (zoning map) of that building and is stored as three-dimensional data such as CG. Along with the information on the building, the map information storage unit 1201 also stores installation information of the monitoring camera and, as information on a plurality of scenes, behavior data of objects in each indoor area. The behavior data for each scene can be created by CG. When the monitoring camera or the like is installed outdoors, the map information is map information of the surrounding area. In the outdoor case, the map information may include, as information on the behavior of objects in each scene, positioning information (movement information) from, for example, GPS (Global Positioning System) receivers mounted on mobile phones or vehicles.

FIG. 13 shows an example in which a video 1306 acquired by the video acquisition unit 1208 of the information processing apparatus 1200 of the third embodiment and a map 1301 supplied from the map information storage unit 1201 are displayed on the display unit 1209. In the display example of FIG. 13, the video 1306 is the video acquired by the video acquisition unit 1208, and the map 1301 is a zoning map or the like based on the map information supplied from the map information storage unit 1201. The map information in the map information storage unit 1201 also includes the installation position information of the monitoring camera and the camera information of that monitoring camera. The installation position information includes the installation height and installation angle of the camera, and the camera information includes camera parameters such as the angle of view, focal length, aperture, shutter speed, ISO sensitivity, and pixel count. Accordingly, the map 1301 in FIG. 13 also shows a monitoring camera 1302 based on the installation position information. The area 1305 in the video 1306 is described later.

The map information in the map information storage unit 1201 also registers data on objects exhibiting normal behavior in an area 1303, which is the area in the map 1301 corresponding to the camera installation position information. The object data included in the map information also includes three-dimensional motion data of the behavior of each object. In the example of FIG. 13, data for a pedestrian moving forward, backward, left, and right, a person standing still, and a person in a wheelchair moving forward, backward, left, and right are registered as object data of normal behavior in the area 1303, and icons 1321 to 1323 representing them are displayed. The map information storage unit 1201 may register not only normal behavior data but also object data for abnormal behavior, as described in the second embodiment.

The description now returns to FIG. 12.
The coordinate conversion unit 1202 reads the map information, camera installation information, camera information, and object data registered in the map information storage unit 1201. The coordinate conversion unit 1202 then applies, to the area 1303 registered based on the camera installation position information and to the object data within the area 1303, a coordinate conversion for displaying them in the region of the video 1306. Specifically, taking the installation position of the camera as the reference, the coordinate conversion unit 1202 uses equation (2) to perform a perspective projection of the area 1303 into the video 1306, thereby calculating the area 1305 on the video 1306.

In equation (2), (x, y, z) are coordinates in the video 1306, k is the effective pixel size, o is the center of the video 1306 (image center), f is the focal length of the camera, and (X, Y, Z, 1) are coordinates in the coordinate system referenced to the camera installation position. The coordinate conversion unit 1202 also performs the calculation of equation (3) to convert three-dimensional data into data in the camera coordinate system.

In equation (3), (X, Y, Z) are coordinates in the data coordinate system, t is the installation position of the camera referenced to the data coordinate system, θ is the installation angle of the camera, and (X′, Y′, Z′) are coordinates in the camera coordinate system.
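The bodies of equations (2) and (3) are not reproduced in this text. For orientation only, a standard pinhole-camera form that is consistent with the symbol definitions above might read as follows. The matrix layout, the per-axis components k_x, k_y, o_x, o_y, the reading of k as a pixel size (so that f/k is the focal length in pixels), and the use of a single rotation R(θ) are assumptions, not the formulas of the publication.

```latex
% Assumed form of equation (2): perspective projection into the video 1306
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
=
\begin{pmatrix}
 f/k_x & 0 & o_x & 0 \\
 0 & f/k_y & o_y & 0 \\
 0 & 0 & 1 & 0
\end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix},
\qquad
\text{image point} = \left( \frac{x}{z},\ \frac{y}{z} \right)

% Assumed form of equation (3): data coordinate system to camera coordinate system
\begin{pmatrix} X' \\ Y' \\ Z' \end{pmatrix}
=
R(\theta)
\left[
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix} - t
\right]
```

Under this reading, equation (3) is applied first to express a point relative to the camera, and equation (2) then maps the result onto the image plane.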

By performing this coordinate conversion on the area 1303 of the map 1301 in FIG. 13, the coordinate conversion unit 1202 can set the corresponding region of the area 1305 in the video 1306. In the same way, the coordinate conversion unit 1202 converts the three-dimensional motion vectors of the objects in the area 1303 of the map 1301 (icons 1321, 1322, and 1323) into motion vectors on the video 1306. The information obtained by the coordinate conversion unit 1202 is then output to the feature amount creation unit 1203.
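The conversion of a three-dimensional motion vector into a motion vector on the video can be sketched in Python as below. This is a minimal pinhole-camera illustration under stated assumptions: the camera angle is modeled as a single tilt about the x axis, the pixel size is the same on both axes, and all function and parameter names are hypothetical rather than taken from the publication.

```python
import math

def to_camera_coords(point, camera_position, tilt):
    """Translate by the camera position, then rotate about the x axis by the tilt angle."""
    x, y, z = (p - t for p, t in zip(point, camera_position))
    c, s = math.cos(tilt), math.sin(tilt)
    return (x, c * y - s * z, s * y + c * z)

def project(point_cam, focal_length, pixel_size, center):
    """Pinhole projection of a camera-frame point onto image coordinates."""
    X, Y, Z = point_cam
    if Z <= 0:
        raise ValueError("point is behind the camera")
    scale = focal_length / pixel_size  # focal length expressed in pixels
    return (scale * X / Z + center[0], scale * Y / Z + center[1])

def project_motion(start, end, camera_position, tilt, focal_length, pixel_size, center):
    """Project both ends of a 3D motion vector and return the 2D motion vector in the image."""
    u0, v0 = project(to_camera_coords(start, camera_position, tilt),
                     focal_length, pixel_size, center)
    u1, v1 = project(to_camera_coords(end, camera_position, tilt),
                     focal_length, pixel_size, center)
    return (u1 - u0, v1 - v0)
```

Projecting the corner points of an area with the same two functions would give the outline of the corresponding area in the image.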

The feature amount creation unit 1203 creates a feature amount based on the motion data converted by the coordinate conversion unit 1202. Specifically, from the motion data on the video 1306 calculated by the coordinate conversion unit 1202, the feature amount creation unit 1203 obtains, as the feature amount, a feature vector whose elements are the motions over n frames. Alternatively, the feature amount creation unit 1203 may create HOF feature amounts for n frames and use them as the feature amount. The feature amount creation unit 1203 then outputs the created feature amount to the data selection unit 1205.
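A feature vector built from n frames of motion can be sketched in Python as follows. The input format (one image position per frame) and the choice to concatenate per-frame displacements are assumptions for this example; the text states only that the motions over n frames form the elements of the vector.

```python
def motion_feature(track, n):
    """Concatenate the last n frame-to-frame displacements into one feature vector.

    track: list of (x, y) image positions, one per frame (assumed input format).
    """
    if len(track) < n + 1:
        raise ValueError("need at least n + 1 positions to form n motions")
    recent = track[-(n + 1):]
    feature = []
    for (x0, y0), (x1, y1) in zip(recent, recent[1:]):
        feature.extend((x1 - x0, y1 - y0))
    return feature
```

For n frames the resulting vector has 2n elements, so vectors built with the same n can be compared directly.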

The data storage unit 1204 stores the behavior data. In this embodiment, the data storage unit 1204 stores behavior data similar to that described with reference to FIGS. 3 and 9.
Using the feature amount acquired from the feature amount creation unit 1203, the data selection unit 1205 selects behavior data with similar feature amounts from the data storage unit 1204, as in the embodiments described above. The selected similar behavior data is then sent to the identification model creation unit 1206.
The identification model creation unit 1206 creates an identification model using the behavior data selected by the data selection unit 1205, as in the embodiments described above. When abnormal behavior data is also registered in the map information storage unit 1201, the identification model creation unit 1206 can also create a two-class identification model such as the SVM described above. The created identification model is sent to the identification model storage unit 1207 and stored. The identification model may also be sent to the display unit 1209.
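Selecting stored behavior data with similar feature amounts can be sketched as a nearest-neighbor lookup. The Python example below ranks records by Euclidean distance and keeps the k closest; the distance measure, the value of k, and the record layout are assumptions for illustration, and an approximate nearest neighbor search could take the place of the exhaustive sort.

```python
import math

def select_similar(query_feature, stored_records, k=2):
    """Return the k stored records whose features are closest to the query feature."""
    ranked = sorted(
        stored_records,
        key=lambda record: math.dist(query_feature, record["feature"]),
    )
    return ranked[:k]

# Hypothetical stored behavior data (field names are assumptions).
stored_records = [
    {"id": "a", "feature": [0.0, 0.0]},
    {"id": "b", "feature": [1.0, 1.0]},
    {"id": "c", "feature": [5.0, 5.0]},
]

nearest = select_similar([0.9, 0.9], stored_records, k=2)
```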

The video acquisition unit 1208 of the third embodiment acquires video data captured by the monitoring camera registered in the map information storage unit 1201. This video data is sent to the display unit 1209.
The display unit 1209 displays the video from the video acquisition unit 1208 and the identification model created by the identification model creation unit 1206. In the third embodiment, as shown in FIG. 13, the screen of the display unit 1209 shows icons 1321 to 1323 corresponding to the identification model superimposed on the area 1305 of the video 1306. This allows the user to check the identification result. In the example of FIG. 13, the icons 1321 to 1323 on the area 1305 are the same as those in the area 1303 of the map 1301.

The processing from creation of the identification model to its display in the information processing apparatus 1200 of the third embodiment is described in detail below with reference to the flowchart of FIG. 14.
In S1401, the coordinate conversion unit 1202 reads the map information, camera installation position information, camera information, and object data registered in the map information storage unit 1201, as described above. After S1401, the processing of the coordinate conversion unit 1202 proceeds to S1402.

In S1402, the coordinate conversion unit 1202 performs the coordinate conversion processing described above using the map information, camera installation position information, camera information, and object data acquired from the map information storage unit 1201. The coordinate conversion unit 1202 then outputs the data obtained by the coordinate conversion to the feature amount creation unit 1203. After S1402, the processing of the information processing apparatus 1200 proceeds to S1403.

In S1403, the feature amount creation unit 1203 creates a feature amount as described above using the data converted by the coordinate conversion unit 1202, and sends the created feature amount to the data selection unit 1205. After S1403, the processing of the information processing apparatus 1200 proceeds to S1404.
In S1404, the data selection unit 1205 selects behavior data with similar feature amounts as described above, based on the feature amount acquired from the feature amount creation unit 1203, and sends the selected behavior data to the identification model creation unit 1206. After S1404, the processing of the information processing apparatus 1200 proceeds to S1405.
In S1405, the identification model creation unit 1206 creates an identification model using the selected behavior data as described above, and outputs the created identification model data to the identification model storage unit 1207 and the display unit 1209. After S1405, the processing of the information processing apparatus 1200 proceeds to S1406.

In S1406, the display unit 1209 acquires the video from the video acquisition unit 1208 and acquires the identification model information from the identification model storage unit 1207. When there are a plurality of monitoring cameras, it is assumed that the camera whose video is to be acquired has been selected in advance. After S1406, as the processing of S1407, the display unit 1209 displays on the screen the video acquired from the video acquisition unit 1208 and the identification model acquired from the identification model storage unit 1207. By viewing this display, the user can check what kind of identification model has been created.

As described above, the information processing apparatus 1200 of the third embodiment can automatically create, based on the map information, an identification model suited to the installed monitoring camera. Like the embodiments described above, the information processing apparatus 1200 of the third embodiment can also generate an identification model that accurately identifies events of objects in a scene.

The present invention can also be realized by processing in which a program that implements one or more functions of the embodiments described above is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

The embodiments described above are merely specific examples of carrying out the present invention, and the technical scope of the present invention should not be interpreted restrictively on their basis. That is, the present invention can be implemented in various forms without departing from its technical idea or its main features.

DESCRIPTION OF SYMBOLS: 101 video acquisition unit, 102 input unit, 103 feature amount creation unit, 104 data storage unit, 105 data selection unit, 106 identification model creation unit, 107 identification model storage unit, 108 display unit

Claims

Storage means for storing a plurality of event data including a feature amount of the event of the object generated in advance;
Video acquisition means for acquiring video;
A feature amount creating means for creating a feature amount of an event of an object in the acquired video;
Selecting means for selecting event data including a feature quantity similar to the feature quantity created by the feature quantity creating means from among the event data stored in the storage means;
Model creation means for creating an identification model for identifying an event of an object in a video using the feature amount of the selected event data;
An information processing apparatus comprising:

The information processing apparatus according to claim 1, further comprising information acquisition means for acquiring input information,
wherein the information acquisition means acquires attribute information related to the event of the object in the acquired video, and event data including the feature amount of the object in the video and the acquired attribute information is created.

The information processing apparatus according to claim 1, wherein the selection means performs the selection using a distance between the feature amount created by the feature amount creation means and the feature amount included in the event data stored in the storage means, or using a function calculated by a predetermined approximate nearest neighbor search method.

The information processing apparatus according to claim 2, wherein the event data stored in the storage means includes attribute information regarding the event of the object, and
the selection means selects, from the event data stored in the storage means, event data including attribute information that matches the attribute information included in the event data created for the object in the video, as the event data including the similar feature amount.

The information processing apparatus according to claim 2 or 4, wherein the information acquisition means acquires, as the attribute information related to the event of the object in the acquired video, attribute information corresponding to a label selected from a plurality of labels representing events of the object.

The information processing apparatus according to claim 1, further comprising display means,
wherein the display means displays the video and the created identification model.

The information processing apparatus according to claim 6, wherein the selection means causes the display means to display images of the object based on the event data including the similar feature amount, superimposed on the displayed video, and selects event data corresponding to an image selected from the displayed images as the event data used for creating the identification model.

Storage means for storing a plurality of event data including a feature amount of the event of the object generated in advance;
Information storage means for storing map information including information of a plurality of scenes;
Feature amount creation means for creating, based on the information on the scene stored in the information storage means, a feature amount of the event of the object;
Selecting means for selecting event data including a feature quantity similar to the feature quantity created by the feature quantity creating means from among the event data stored in the storage means;
Model creation means for creating an identification model for identifying an event of an object in a video using the feature amount of the selected event data;
An information processing apparatus comprising:

The information processing apparatus according to claim 8, wherein, in addition to the scene information, the information storage means stores information related to the event of the object and camera information related to a camera specified by the map information.

The information processing apparatus according to claim 9, further comprising:
video acquisition means for acquiring video captured by the camera specified by the map information; and
display means,
wherein the display means displays the scene information stored in the information storage means, the video acquired by the video acquisition means, and the identification model.

The information processing apparatus according to any one of claims 1 to 10, wherein the storage means stores event data of at least one of a normal event of the object and an abnormal event of the object.

The information processing apparatus according to any one of claims 1 to 11, further comprising model storage means for storing the created identification model,
wherein the identification model stored in the model storage means is used to identify an event of the object in the acquired video.

A storing step of storing a plurality of event data including the feature amount of the event of the object generated in advance;
A video acquisition step of acquiring video;
A feature amount creating step of creating a feature amount of an event of an object in the acquired video;
A selection step of selecting event data including a feature amount similar to the feature amount created in the feature amount creation step from the event data saved in the saving step;
A model creation step of creating an identification model for identifying an event of an object in a video using the feature amount of the selected event data;
An information processing method for an information processing apparatus, comprising:

A storing step of storing a plurality of event data including the feature amount of the event of the object generated in advance;
An information storage step for storing map information including information on a plurality of scenes;
Based on the information on the scene stored in the information storage step, a feature amount creation step of creating a feature amount of the event of the object;
A selection step of selecting event data including a feature amount similar to the feature amount created in the feature amount creation step from the event data saved in the saving step;
A model creation step of creating an identification model for identifying an event of an object in a video using the feature amount of the selected event data;
An information processing method for an information processing apparatus, comprising:

A program for causing a computer to function as each means of the information processing apparatus according to any one of claims 1 to 12.