JP2022018808A

JP2022018808A - Information processing device, information processing method, and program

Info

Publication number: JP2022018808A
Application number: JP2020122173A
Authority: JP
Inventors: 将史瀧本; Masafumi Takimoto
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2020-07-16
Filing date: 2020-07-16
Publication date: 2022-01-27

Abstract

To make it possible to suppress mistaken recognition of a person detected in an image as a registered person. SOLUTION: An information processing device includes: detection means for detecting a predetermined object from an image; acquisition means for acquiring a second object, different from a first object registered in advance, on the basis of spatial information representing the space in which the object was detected and the time at which the object was present in that space; and determination means for determining whether the object detected by the detection means is the first object or the second object. SELECTED DRAWING: Figure 2

Description

The present invention relates to an information processing technique for processing images.

In a monitoring system, another person whose characteristics resemble those of a registered person may be erroneously detected as that registered person. This is hereinafter referred to as misrecognition. It is desirable to perform detection and recognition with high accuracy while effectively reducing misrecognition. For example, Patent Document 1 discloses a method in which, when a misrecognition occurs, information on the similar person who caused it is stored in association with the registered person, and the information on that similar person is also collated at the time of collation.

Patent Document 1: Japanese Unexamined Patent Publication No. 2008-257329

In cases such as city surveillance, monitoring is often performed with multiple surveillance cameras (multi-camera monitoring). In such cases, the number of misrecognitions caused by other persons who resemble a registered person is likely to grow as the number of surveillance cameras increases. Moreover, if all similar persons who cause misrecognition are registered and used, as in the method described in Patent Document 1, a person may be erroneously recognized as one of those similar persons.

An object of the present invention is therefore to suppress mistaken recognition of a person detected in an image as a registered person.

An information processing apparatus according to the present invention includes: detection means for detecting a predetermined object from an image; acquisition means for acquiring a second object, different from a first object registered in advance, on the basis of spatial information representing the space in which the object was detected and the time at which the object was present in that space; and determination means for determining whether the object detected by the detection means is the first object or the second object.

According to the present invention, mistaken recognition of a person detected in an image as a registered person can be suppressed.

FIG. 1: Diagram showing an example of the hardware configuration of an information processing system.
FIG. 2: Flowchart showing processing executed by the information processing apparatus.
FIG. 3: Flowchart showing processing executed by the information processing apparatus.
FIG. 4: Diagram showing an example of a surveillance camera installation map.
FIG. 5: Diagram showing an example of the screen displayed when a person is detected.
FIG. 6: Explanatory diagram of the window displayed when similar-person information is added.
FIG. 7: Diagram illustrating the distribution of data in the feature space.
FIG. 8: Diagram showing an example of the screen display according to the second embodiment.
FIG. 9: Flowchart showing processing executed by the information processing apparatus.
FIG. 10: Block diagram showing an example of the functional configuration of the information processing apparatus.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. The configurations shown in the following embodiments are merely examples, and the present invention is not limited to the illustrated configurations. Identical configurations or processes are denoted by the same reference numerals.
<First Embodiment>
The first embodiment describes an information processing system that uses a plurality of surveillance cameras to monitor a wide area such as an urban area. The system is assumed to operate as follows. A predetermined object is registered in advance as a monitoring target. When any of the surveillance cameras captures and detects that registered object, the system issues alert information to notify the user. In the present embodiment, the predetermined object to be monitored is assumed to be a person, for example, but it is not limited to a person. The system acquires a person's face image and gait features as image-derived information about the person. It then registers person information that combines this image-derived information with other information related to the person. Persons registered in advance are assumed to be, for example, wanted criminals, suspicious persons, or important persons. Hereinafter, a person who is a registered object registered in advance is referred to as a registered detection target person.

FIG. 1 is a diagram showing an example of the hardware configuration of an information processing system to which the information processing apparatus of the present embodiment is applied.
The information processing system according to the present embodiment includes a plurality of surveillance cameras 101, 102, 103, and 104, a recording server 11, an information processing apparatus 12, an output device 13, and an input device 14. In the example of FIG. 1, four surveillance cameras are connected to the information processing apparatus 12 and the recording server 11. The number of surveillance cameras need not be four, however: it may be several thousand, tens of thousands, or just one.

Moving-image data (hereinafter referred to as images) captured by each of the surveillance cameras is stored in the recording server 11 in a moving-image format. The same data is also transmitted to the information processing apparatus 12 as input images. Besides the moving images from the surveillance cameras recorded as they are, the recording server 11 also records data processed by the information processing apparatus 12 (described below) and information handled by the information processing apparatus 12. Examples include information on registered objects, time information, spatial information, person information, labels, scores, and lists, each of which is described later.

Next, the hardware configuration on which the information processing apparatus 12 is implemented will be described with reference to FIG. 1. The information processing apparatus 12 includes a CPU 121, a RAM 122, a ROM 123, an output I/F 124, an input I/F 125, a bus 120, and the like. The output device 13 and the input device 14 are connected to the information processing apparatus 12. In FIG. 1, the information processing apparatus 12 is assumed to be a personal computer to which the output device 13 and the input device 14 are connected. It is not limited to this example, however, and may be a device with built-in input/output devices, such as a tablet terminal or a smartphone.

The CPU (central processing unit) 121 performs overall control of the devices connected to the bus 120. The ROM (read-only memory) 123 stores the information processing program (application program) according to the present embodiment, an operating system (OS), device drivers, and the like. The RAM (random access memory) 122 temporarily stores the above-mentioned programs, intermediate data, and the like while the CPU 121 is processing. The CPU 121 reads the information processing program and other programs stored in the ROM 123 and temporarily stores them in the RAM 122. It then executes processing according to the flowcharts described later. As an alternative to software processing by the CPU 121, the processing may be performed by hardware made up of arithmetic units and circuits corresponding to the processing of each functional unit described later.

The input I/F (interface) 125 converts information input from an external operating device or the like into a format that the information processing apparatus 12 can process. The output I/F 124 converts information to be output into a format that an external output device or the like can process.
Although FIG. 1 shows only a minimal configuration, a device such as a GPU (graphics processing unit) may additionally be connected to the bus 120 in environments that require faster processing.

The output device 13 is a display device that displays the results of processing by the information processing apparatus 12. Suppose a registered detection target person is detected and alert information is issued to notify a user such as a monitoring operator. The output device 13 then displays the video from the surveillance camera showing the person detected as the registered detection target person, the registered image of that person, and the like. By viewing the displayed video, the user can confirm whether the person detected by the information processing apparatus 12 really is the registered detection target person. If the registered detection target person is a wanted criminal, for example, the user can take action such as reporting to the police.

However, the information processing apparatus 12 may detect a person other than the registered detection target person and erroneously recognize that person as the registered detection target person. When such misrecognition occurs, the user informs the information processing apparatus 12 that the recognition result is wrong. The user does this through the input device 14, which consists of a keyboard, a mouse, and the like. For the recognition result displayed on the output device 13, the user can input a command correcting the result (a command indicating that the recognition is erroneous), a command for adjustment, and so on.

However, when monitoring is performed with a plurality of surveillance cameras (multi-camera monitoring) as in the present embodiment, more persons who resemble a registered detection target person are likely to be misrecognized. Moreover, if all similar persons who cause misrecognition are registered and used, as in the method described in Patent Document 1, processing takes longer and missed detections also occur.

A mechanism that avoids, as far as possible, comparing persons who do not need to be compared with a registered detection target person would help in two ways. It would reduce processing time, and it would also reduce missed detections, which improves recognition accuracy.
For example, human behavior is generally known to follow patterns: the time periods and routes a person uses for daily outdoor activities do not change much. Reflecting this general tendency should therefore make it possible to avoid comparisons against persons who need not be compared.

As an example, consider an information processing system that processes moving images input from a large number of surveillance cameras installed in cities all over the world. Suppose also that person P, a different person whose features resemble those of registered detection target person A, is registered as a similar person associated with A. With the method described in Patent Document 1, the videos of every surveillance camera worldwide would be collated against both A and P, which requires an enormous amount of processing. Besides person P, there are presumably many other persons worldwide whose features resemble A's, and they would also be collated. Suppose similar-person features were collected from all over the world and N similar persons were registered. Collation would then also run against the features of all N similar persons. In addition, the chance of correctly identifying A would fall, so missed detections would increase.

Now suppose that person P is observed, for example, every night by a specific surveillance camera installed in district D. When the areas and time periods in which P is observed are limited to this extent, collating against P in videos from surveillance cameras in other areas and time periods is almost entirely unnecessary. Furthermore, if P were detected in a surveillance camera's video in an area or time period completely different from those in which P is usually observed, that detection would very likely be erroneous.

Therefore, the information processing apparatus 12 of the present embodiment performs collation that takes into account an appearance likelihood of person P, who resembles registered detection target person A. This likelihood is based on the areas and time periods in which P appears. As a result, wasted processing and processing time on videos from areas and time periods where P rarely appears are greatly reduced. The approach also greatly reduces both the misrecognition of P as A and missed detections of A.

An example of the functional configuration of the information processing apparatus 12 will be described with reference to FIG. 10. The information processing apparatus 12 has an image acquisition unit 1201 and a detection unit 1202. The detection unit 1202 further includes an object detection unit 12020, a spatial information acquisition unit 12021, a similar object information acquisition unit 12022, a collation unit 12023, a generation unit 12024, a reception unit 12025, and an update unit 12026. The information processing apparatus 12 is also connected to a plurality of external devices.

An outline of the processing performed by each functional unit is given here. Details are described together with the flowcharts.
The image acquisition unit 1201 acquires images captured by image pickup devices (surveillance cameras 101 to 104). The detection unit 1202 detects a pre-registered registered object (first object) among the objects (persons) detected in the acquired images. The registered object is the person to be detected in this task, such as a wanted person or a person being searched for (for example, a lost child).

The detection unit 1202 further executes the following processing.
The object detection unit 12020 detects persons in the acquired images.
The spatial information acquisition unit 12021 acquires information indicating the space and time in which the currently detected person was detected.
The similar object information acquisition unit 12022 uses the acquired space and time to retrieve similar object information. This information concerns persons that were misrecognized in the past in images from the surveillance camera corresponding to spatial information s (location), captured during a time period close to time t.
The collation unit 12023 collates the information about the detected person with the information about registered detection target person A (first object) and about person B(n), a similar object.
The generation unit 12024 generates output information indicating whether the detected person was determined to match the first object or the second object.
The reception unit 12025 receives the user's correction instruction for that determination result, that is, information on whether the recognition result for the detected person was a misrecognition.
When the recognition result is wrong, the update unit 12026 updates the threshold used in the collation processing.

FIG. 2A is a flowchart showing the processing executed by the information processing apparatus 12 of the present embodiment. This processing is repeated from image input from the surveillance cameras until monitoring ends. FIG. 2B is a detailed flowchart of the process, performed in step S21 of FIG. 2A, that detects registered detection target persons. FIG. 3 is a detailed flowchart of the recognition result display process performed in step S215 of FIG. 2B. Details of these processes are described later.

The description below assumes cases in which the effect of the present embodiment is especially evident. One is city surveillance, where each of a plurality of surveillance cameras captures a different road. Another is in-hospital surveillance, where each camera captures a different floor. It is also assumed that the people captured by each surveillance camera differ in tendencies such as face images and gait features. As long as the system performs detection targeting persons, it is not limited to particular usage scenes. It is also applicable to, for example, vehicles and animals. In the following, city surveillance is used as the example because it makes the effect easier to understand and explain.

FIG. 4 is a diagram showing an example of how a plurality of surveillance cameras are installed for city surveillance.
In this example, moving images from 14 surveillance cameras, C301 to C314, are processed simultaneously. The surveillance cameras C301 to C314 correspond to the surveillance cameras 101 to 104 in FIG. 1. There may be a single registered detection target person, or several thousand or tens of thousands of them.

First, in step S20 of the flowchart of FIG. 2A, the image acquisition unit 1201 acquires images (surveillance images) of the moving images captured by the surveillance cameras C301 to C314.
Next, in step S21, the detection unit 1202 detects a pre-registered registered object (first object) among the objects (persons) detected in the acquired images.
The information processing apparatus 12 then repeats steps S20 and S21 until the user inputs an instruction to end monitoring in step S22.
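The outer loop of FIG. 2A (steps S20 to S22) can be sketched in Python as below. This is a minimal illustration, not the patent's implementation. The callbacks `acquire_images`, `detect_registered`, and `stop_requested` are hypothetical stand-ins for the image acquisition unit 1201, the detection unit 1202, and the user's end-of-monitoring instruction.

```python
from typing import Callable, List

Frame = object  # hypothetical placeholder type for one camera image


def monitoring_loop(
    acquire_images: Callable[[], List[Frame]],   # step S20 (hypothetical callback)
    detect_registered: Callable[[Frame], None],  # step S21 (hypothetical callback)
    stop_requested: Callable[[], bool],          # step S22 (hypothetical callback)
) -> int:
    """Repeat S20 and S21 until the user asks to stop; return the number of passes."""
    passes = 0
    while True:
        for frame in acquire_images():   # S20: one image per camera, e.g. C301..C314
            detect_registered(frame)     # S21: look for registered objects in the image
        passes += 1
        if stop_requested():             # S22: end-of-monitoring instruction received?
            return passes
```

Checking for the stop instruction after each pass follows the flowchart order S20, S21, S22.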

In step S210 of the flowchart of FIG. 2B, the detection unit 1202 detects persons, as predetermined objects, in the input images sent from the surveillance cameras C301 to C314. For example, the detection unit 1202 detects the position of a person in an image using a trained model that has learned the features of persons. Other methods may also be used.

Next, in step S211, the object detection unit 12020 determines whether the person detection processing in step S210 detected a person. If no person was detected, the information processing apparatus 12 skips step S212 and the subsequent steps and proceeds to step S22 of FIG. 2A. If a person was detected, step S212 and the subsequent steps perform collation to determine whether the detected person corresponds to a registered detection target person.
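Steps S210 and S211 amount to an early exit: collation runs only when at least one person is found. The minimal sketch below assumes a hypothetical `detect_persons` callback that returns a list of detected persons, which may be empty. A hypothetical `collate` callback stands in for step S212 onward.

```python
from typing import Callable, List, Sequence


def detection_step(
    image: object,
    detect_persons: Callable[[object], Sequence[object]],  # S210 (hypothetical detector)
    collate: Callable[[object], str],                       # S212 onward (hypothetical)
) -> List[str]:
    persons = detect_persons(image)  # S210: e.g. a trained person-detection model
    if not persons:                  # S211: no person found, so skip collation (go to S22)
        return []
    # S212 onward: collate each detected person separately
    return [collate(p) for p in persons]
```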

In the collation processing from step S212 onward, the detection unit 1202 does more than compare the person detected in step S210 against the registered detection target persons. It also collates that person against persons who were erroneously recognized as a registered detection target person in the past. Whether to collate against a given past misrecognized person is decided from two pieces of information. One is the space and time in which that past misrecognized person was detected. The other is the space and time in which the currently detected person was detected. In the present embodiment, the space in which a person is detected is the shooting area of the surveillance camera that captured the person. The time at which a person is detected is the time at which the person was present in that space, that is, the time at which the surveillance camera captured the person. The spatial information acquisition unit 12021 therefore acquires the time t at which the currently detected person was captured. It also acquires spatial information s, which represents the shooting area (location) identified by the identification information (camera ID) of the surveillance camera that captured the person.
Spatial information s may include identifiers for more than the shooting area identified by the camera ID. It may also cover the shooting areas of a plurality of surveillance cameras installed near the camera with that ID.
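One way to represent spatial information s is as a set of area identifiers derived from the camera ID, optionally including neighboring shooting areas. The sketch below rests on hypothetical assumptions that do not come from the source: each shooting area is identified by its camera ID, and camera adjacency is given by a hand-written map (`NEIGHBORS`).

```python
from typing import Dict, FrozenSet, Set

# Hypothetical adjacency map: camera ID -> IDs of cameras installed nearby.
NEIGHBORS: Dict[str, Set[str]] = {
    "C301": {"C302", "C305"},
    "C302": {"C301", "C303"},
}


def spatial_info(camera_id: str, include_neighbors: bool = True) -> FrozenSet[str]:
    """Return spatial information s as a set of shooting-area identifiers."""
    areas = {camera_id}  # the capturing camera's own shooting area
    if include_neighbors:
        # Add areas of nearby cameras; unknown cameras simply have no neighbors.
        areas |= NEIGHBORS.get(camera_id, set())
    return frozenset(areas)
```

Using a frozenset makes membership tests such as `area in s` cheap when past misrecognition records are filtered later.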

This section explains why the present embodiment holds person information about other persons who caused misrecognition in the past (hereinafter, similar objects or second objects). Hereinafter, another person who resembles a registered person is referred to as another person B. If B is captured again by a surveillance camera of this system on a different occasion, B is very likely to be detected again as registered detection target person A. Unnecessary alert information would then be issued. A simple way to reduce misrecognition caused by B is therefore to hold person information on persons misrecognized in the past, such as B for A, in association with A.

With this association in place, suppose the system newly detects a person P who is judged likely to be registered detection target person A. P can then be compared with another person B at the same time. P is determined to be whichever of the two has the closer feature quantities. This improves the accuracy of determining whether P truly is A and, at the very least, reduces misrecognition caused by B.
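The comparison above asks whether a newly detected person P is closer to registered detection target person A or to another person B. It can be sketched as a nearest-neighbor choice between two feature vectors. Euclidean distance (`math.dist`, Python 3.8+) is an assumption for illustration; the source does not fix the metric here. Ties go to B, an arbitrary choice that avoids raising an alert on an exact tie.

```python
import math
from typing import Sequence


def closer_identity(
    p: Sequence[float],       # feature vector of the newly detected person P
    feat_a: Sequence[float],  # feature vector of registered detection target person A
    feat_b: Sequence[float],  # feature vector of another person B associated with A
) -> str:
    """Return "A" if P is strictly closer to A than to B, otherwise "B"."""
    return "A" if math.dist(p, feat_a) < math.dist(p, feat_b) else "B"
```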

However, a practical problem arises when a wide area is monitored with many surveillance cameras. Persons like B, whose features resemble those of registered detection target person A, are detected more often, in proportion to the number of cameras. Suppose such other persons are set for every registered detection target person. Then, each time a person resembling a registered detection target person is detected, comparison and collation run against all of those other persons. This is likely to cause the following problems. If the similar persons vary more in the feature space than the registered detection target person inherently does, detection accuracy is more likely to drop. In addition, the larger number of comparisons increases processing time.

Therefore, in step S212, the spatial information acquisition unit 12021 of the information processing apparatus 12 of the present embodiment acquires information indicating the space and time at which the currently detected person was detected. This processing suppresses misrecognition according to the appearance likelihood of another person B. In the present embodiment, the time t and the spatial information s (location) at which another person B was detected are used as the appearance likelihood of B. The spatial information acquisition unit 12021 limits the persons to be collated on the basis of two things: the time and spatial information at which the person was detected, and the time and spatial information at which another person B appeared. To this end, whenever a person is detected, the unit acquires the time t at which the person was photographed and the spatial information s corresponding to the surveillance camera that photographed the person.

Further, in step S213, the similar object information acquisition unit 12022 acquires information about objects misrecognized in the past (information about second objects). This information comes from images acquired by the surveillance camera corresponding to the spatial information s (location) in a time period close to the time t. An object misrecognized in the past is a similar object: it has features similar to those of a registered object but is a different individual.
Then, in step S214, the collation unit 12023 collates the information about the detected person with the information about the registered detection target person A (the first object) and with the information about the similar objects, persons B(n). That is, the collation unit 12023 collates the person detected in step S210 with the registered detection target person and with the persons misrecognized in the past. The collation method is described further below. As one example, the unit calculates the distance from the feature vector of the detected object to the feature vector of the first object and to each feature vector of the second objects. The smaller the distance, the more similar the objects are judged to be.
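The following Python sketch roughly illustrates the distance-based collation in step S214. It compares a detected feature vector with the registered detection target person A and with the persons B(n) that passed the filtering of step S213, then orders them by distance. Several details are assumptions made for illustration and are not taken from the embodiment:

* the function name `collate`;
* plain-list feature vectors;
* the choice of Euclidean distance.

```python
import math

# Hypothetical sketch of step S214: collate a detected person's feature vector
# against registered person A and the pre-filtered similar persons B(n).
def collate(detected, registered, similar):
    """Return (person_id, distance) pairs, nearest first.

    detected:   feature vector of the person detected in step S210
    registered: (person_id, feature vector) of registered target person A
    similar:    dict mapping person_id to feature vector for candidates B(n)
    """
    targets = [registered] + list(similar.items())
    pairs = [(pid, math.dist(detected, feat)) for pid, feat in targets]
    # A smaller distance means the two persons are judged more similar.
    return sorted(pairs, key=lambda pair: pair[1])


ranking = collate([0.2, 0.9], ("A", [0.9, 0.1]), {"B1": [0.25, 0.85], "B2": [0.6, 0.4]})
print(ranking[0][0])  # prints "B1"
```

In this example the detected person lies closest to the past misrecognition B1, so B1 ranks ahead of the registered person A.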

This processing is effective because, when movement patterns are examined, many people tend to repeat similar movements. For example, a person moving from a point R to a point Q could take several routes, yet often uses the same one. Similarly, a person observed in a certain time period tends to be observed in the same time period on other days. Human behavior therefore contains repeated patterns. Predicting behavior does not require searching an unlimited space of possibilities; prediction based on the repeated patterns is effective. A registered detection target person is also likely to repeat a similar routine movement every day, so monitoring that person should be easy. For registered detection target persons, however, there may also be a motive to detect them accurately even when they move randomly, outside their routine. Most of the many other persons, who are not detection targets, repeat similar movements every day, so routine-based detection as described above is expected to work reasonably well for them. Therefore, in the present embodiment, the information acquired about the space in which a person appeared is spatial information s representing the imaging area of the surveillance camera that photographed that person. In other words, the spatial information about a person can be regarded as information indicating that person's routines and periodic habits. This periodicity information (spatial information) is used for each person to extract a sample group of persons to be collated with the detected person. The sample is drawn from a feature space whose population consists of the registered persons and the persons detected in the past.

Specific examples of the temporal and spatial constraints (spatiotemporal constraints) described above are given below.
As an example of a temporal constraint, let t(P) be the time at which person P was detected. A time window Tth containing t(P) is then used as the temporal threshold for judging whether person P is the same person. The time window Tth is, for example, 30 minutes. The time t(P) is placed at the centre of Tth, although it need not be. Tth represents a period close to the time t(P). What counts as close can be defined in advance by the user, for example, or may be predefined in the system.

As an example of a spatial constraint, the spatial information s consists only of the spatial position coordinates defined by the surveillance camera that photographed person P. Alternatively, s may consist of the position coordinates defined for every camera within a specific distance of the camera that detected person P on the map shown in FIG. 4. In the basic case, s is the position coordinates defined by the camera that photographed person P. What spatial range counts as the spatial information can be defined in advance by the user, for example, or may be predefined in the system.

The movement pattern of a person described above does not take into account whether the movement route is plausible. In the simplest case, a method can be used that judges whether the positions of the surveillance cameras are close to each other (for example, a judgment of 0 or 1). In addition, the times and places at which a person appears are probabilistically continuous. A score of how likely a person is to be another person B can therefore be based on more than the feature quantity extracted from the person. The likelihood may also reflect the temporal distance from the time at which B was observed in the past, and the spatial distance from the place at which B was observed. When temporal and spatial distances are considered together, the camera topology and the spatiotemporal transition probabilities should preferably be taken into account.
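The spatial constraint and the simplest 0/1 proximity judgment described above might be sketched as follows. Several details are hypothetical:

* the camera coordinates in `CAMERA_POSITIONS`;
* the use of straight-line map distance;
* the radius values.

The embodiment leaves the range of the spatial information to be defined by the user or the system. A radius of 0 reproduces the basic case, in which only the camera that photographed person P defines the spatial information s.

```python
import math

# Hypothetical map coordinates (in metres) for a few cameras on a FIG. 4-style map.
CAMERA_POSITIONS = {
    "C301": (0.0, 0.0),
    "C302": (40.0, 0.0),
    "C303": (150.0, 30.0),
}


def spatial_information(origin_cam, radius, positions=CAMERA_POSITIONS):
    """Return the set of cameras whose positions make up the spatial information s."""
    ox, oy = positions[origin_cam]
    return {cam for cam, (x, y) in positions.items()
            if math.hypot(x - ox, y - oy) <= radius}


def is_near(cam_a, cam_b, radius, positions=CAMERA_POSITIONS):
    """Simplest movement-pattern check: 1 if the two cameras are close, else 0."""
    return 1 if cam_b in spatial_information(cam_a, radius, positions) else 0
```

A probabilistic version would replace the hard 0/1 answer with a value that decreases with distance, as the paragraph above suggests.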

For example, the camera topology and transition probabilities may be learned incrementally and used while person collation is performed by a method such as that described in Reference 1 below. In person tracking, spatiotemporal transition probabilities are generally used to judge probabilistically whether a candidate who appears to be the tracked person is in fact the same person. The judgment takes into account the average travel time, distance, and route of passers-by. In that setting, two persons forming a pair at nearby times and places are compared and associated. In the present embodiment, however, the two persons compared are the currently observed person P and another person B observed in the past. If several similar persons B observed in the past (n = 1, ..., N) are to be compared with person P, they are distinguished by n and denoted person B(n).

Reference 1: A. Gilbert and R. Bowden, "Incremental, scalable tracking of objects inter camera," Computer Vision and Image Understanding, vol. 111, no. 1, pp. 43-58 (2008).

The spatial distance can be reflected in the probability directly: the greater the distance, the lower the probability of being the same person. Temporal distance, however, must be treated differently in practice. Human behavior tends to repeat a specific pattern at a specific interval, in many cases a 24-hour cycle. Specifically, a particular individual's behavior at time t is closer to that individual's behavior 24 or 72 hours later than to the behavior 3 hours later. Therefore, in the present embodiment, the temporal distance is defined as the difference in time of day only, regardless of the dates on which the two persons were observed.
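The sketch below pairs this date-independent temporal distance with the window Tth centred on t(P) from the temporal constraint example. Measuring times of day in minutes and wrapping around midnight are illustrative choices, not details given in the embodiment.

```python
from datetime import datetime, timedelta

MINUTES_PER_DAY = 24 * 60


def minute_of_day(t):
    """Position of a datetime within its day, in minutes."""
    return t.hour * 60 + t.minute + t.second / 60


def temporal_distance(t1, t2):
    """Difference in time of day only, ignoring the date and wrapping at midnight."""
    d = abs(minute_of_day(t1) - minute_of_day(t2))
    return min(d, MINUTES_PER_DAY - d)


def within_window(t_p, t_b, tth=timedelta(minutes=30)):
    """True if t_b falls inside a window of length tth centred on t_p."""
    return temporal_distance(t_p, t_b) <= tth.total_seconds() / 60 / 2
```

Consider a detection at 08:00:

* another person B observed at 08:10 six days earlier still falls inside the 30-minute window;
* an observation at 08:20 on the same day does not.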

The collation unit 12023 of the present embodiment multiplies the likelihood obtained as described above by the similarity defined by the feature quantities extracted from the two persons being compared. The product is a similarity S(n) that also takes temporal and spatial plausibility into account.
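The product described above can be sketched as follows. Two elements are assumptions for illustration:

* the exponential decay that turns temporal and spatial distances into a likelihood;
* the scale parameters `tau_minutes` and `sigma_metres`.

The embodiment only requires the likelihood to fall as the distances grow. It also suggests considering camera topology and transition probabilities in practice.

```python
import math


def spatiotemporal_likelihood(dt_minutes, ds_metres, tau_minutes=60.0, sigma_metres=200.0):
    """Hypothetical likelihood that decays with temporal and spatial distance."""
    return math.exp(-dt_minutes / tau_minutes) * math.exp(-ds_metres / sigma_metres)


def similarity_s(feature_similarity, dt_minutes, ds_metres):
    """S(n): feature-based similarity weighted by spatiotemporal plausibility."""
    return spatiotemporal_likelihood(dt_minutes, ds_metres) * feature_similarity
```

Here `dt_minutes` is meant to be the date-independent time-of-day difference defined in this embodiment.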

Then, in step S215, the generation unit 12024 generates output information indicating whether the detected person matches the first object or a second object. Step S215 is detailed in the flowchart of FIG. 3. In step S2151 of that flowchart, the generation unit 12024 searches the persons B(n) that remain as final candidates for person P for any person whose score exceeds a threshold. The processing of the flowchart in FIG. 3 is described in detail later.

Among the candidates found in step S2151, the generation unit 12024 takes the person with the highest similarity S(n) as the judgment result for the detected object. This makes it possible to compare two scores: the highest similarity S(n), and the similarity calculated for the registered detection target person A. That is, in step S2152, the generation unit 12024 can determine whether person P is the registered detection target person A or a person B(n).

Any collation method that uses feature quantities of the detected person may be used. As one example, the face recognition method described below can be used. With face recognition, the collation unit 12023 further identifies a face region within the person region detected in step S210. When a face region is detected, the collation unit 12023 identifies the person in the face region image by comparing it with the face images of the registered detection target persons. The generation unit 12024 then generates information indicating the collation result for the detected object.

Face images may be compared and collated in any manner. For example, feature extraction (extracting what characterizes the individual) is applied to both the face region image of the detected person and the face image of the registered detection target person. Comparison and collation are then performed by calculating the distance between the resulting feature vectors. Deep neural networks (DNNs) trained by deep learning are a suitable choice for the feature extraction. The distance between feature vectors may be calculated as the Euclidean distance or the cosine distance.

From the distance obtained as described above, the generation unit 12024 of the present embodiment can calculate a similarity score between the face image of the detected person and either the face image of a registered detection target person or the face image of a person misrecognized in the past.
The similarity score is an evaluation value in which a higher score indicates greater similarity between faces. When the cosine measure is used, the cosine similarity may serve directly as the similarity score. When the Euclidean distance is used, its reciprocal may serve as the similarity score. Collation based only on face images has a limitation: if no face region is detected, no face-recognition similarity can be measured. In that case the collation result never matches the face of a registered detection target person or of a person misrecognized in the past. The collation may therefore use other similarity scores in addition to, or instead of, face recognition:
- a score defined by feature quantities obtained from the human body region, as in Reference 2 below;
- a score defined by feature quantities obtained from the gait observed in a sequence of moving images, using a method such as that in Reference 3 below.
Various scores may also be combined, for example by taking the sum of the similarity scores described above.
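The conversion from distances to similarity scores, and the combination of several scores, could be sketched as follows. The conventions below are illustrative rather than taken from the embodiment:

* cosine similarity is used directly as a score;
* the Euclidean score is the reciprocal of the distance, with a small `eps` added (an assumption) to avoid division by zero;
* a missing score, such as the face score when no face region was detected, is represented by `None` and skipped when the scores are summed.

```python
import math


def cosine_score(u, v):
    """Cosine similarity: higher means more similar."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def euclidean_score(u, v, eps=1e-9):
    """Reciprocal of the Euclidean distance, so that higher means more similar."""
    return 1.0 / (math.dist(u, v) + eps)


def combined_score(face=None, body=None, gait=None):
    """Sum the available scores; a score is None when it could not be measured."""
    return sum(s for s in (face, body, gait) if s is not None)
```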

Reference 2: M. Farenzena et al., "Person Re-Identification by Symmetry-Driven Accumulation of Local Features," CVPR 2010.
Reference 3: Murase, "Moving object recognition in eigenspace representation: gait analysis and lip reading," Pattern Recognition Letters, 1996.

Returning to the description of the flowchart in FIG. 2(B):
The collation in step S214 yields a likelihood that the person detected in step S210 is a registered detection target person or a person misrecognized in the past. This likelihood is defined on the basis of the similarity score described above and is usually the similarity score itself.

Then, in step S215, the generation unit 12024 generates a display image showing the judgment result of the collation unit 12023. This image, which represents the result decided from the similarity described above, is displayed on the output device 13. The flowchart in FIG. 3, which details step S215, is described below.

First, in step S2151, the generation unit 12024 determines whether any of the similarity scores in the collation result exceeds the threshold. If none does, the information processing apparatus 12 ends the processing of FIG. 3 and proceeds to step S216 of FIG. 2. If one or more similarity scores exceed the threshold, the generation unit 12024 proceeds to step S2152.

In step S2152, the generation unit 12024 looks up the collation target with the highest similarity score and determines whether that person is a registered detection target person.

If the highest-scoring person is not a registered detection target person, the generation unit 12024 concludes that the detected person scored highest against a person misrecognized in the past. In this case, the detection unit 1202 outputs information to the output device 13 indicating that no registered person is present. The generation unit 12024 then ends the processing without updating the screen display.

If the highest-scoring person is a registered detection target person, the generation unit 12024 proceeds to step S2153.

In step S2153, the generation unit 12024 causes the output device 13 to display the image of the registered detection target person who obtained the highest similarity score in the previous step, and ends the processing of the flowchart in FIG. 3. The information generated depends on what the detected object resembles:
- If the detected object is similar to the registered person (the similarity is greater than a predetermined value), the generation unit 12024 generates information indicating that the detected object is the registered person (first object).
- If the detected object is similar to a similar person (second object), it generates information indicating that the detected object is that similar person.
- If the detected object resembles neither the registered person nor the similar person (the similarity is smaller than the predetermined value), it generates information indicating that the object is neither.
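The decision flow of steps S2151 to S2153, together with the three kinds of output information described above, might be sketched as follows. The function name `decide`, the dictionary of scores, and the returned labels are hypothetical. Scores are assumed to be similarity scores in which a higher value means more similar, such as S(n).

```python
def decide(scores, registered_ids, threshold):
    """Classify a detected person from its similarity scores.

    scores:         dict mapping person_id to similarity score (higher = more similar)
    registered_ids: ids of registered detection target persons (first objects)
    threshold:      predetermined value that a score must exceed
    Returns ("registered", id), ("similar", id) or ("neither", None).
    """
    above = {pid: s for pid, s in scores.items() if s > threshold}
    if not above:                       # S2151: no score exceeds the threshold
        return ("neither", None)
    best = max(above, key=above.get)    # S2152: highest-scoring collation target
    if best in registered_ids:          # S2153: display the registered person
        return ("registered", best)
    return ("similar", best)            # closest to a past misrecognition: no alert
```

For example, if both A and B1 exceed the threshold but B1 scores higher, the sketch returns `("similar", "B1")`. No alert is raised for A.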

FIG. 5 shows an example of the screen displayed on the output device 13 in step S2153. The screen notifies the user that a registered detection target person has been detected.
- Area 41 displays, for example, an image of the registered detection target person A with the highest similarity to the currently detected person (hereinafter, person P).
- Area 42 displays a map, similar to the map in FIG. 4, showing where person P is relative to the surveillance cameras. The position of a person being tracked, for example after a registered detection target person has been detected, is shown on this map in real time.
- Area 43 displays video of person P, who showed high similarity to the registered detection target person A.

With the screen shown in FIG. 5, the user can compare the image of the registered detection target person in area 41 with the video of person P in area 43 and judge whether they appear to be the same person. If this is hard to judge immediately, the user can select the tracking button 44. The information processing apparatus 12 then tracks and displays person P shown in area 43 for a while.
The user can also select one of the surveillance cameras on the map in area 42 (surveillance cameras C301 to C314 in FIG. 4), for example with a mouse pointer. The information processing apparatus 12 then switches the video displayed in area 43 to the video from the selected camera.

As the tracking method, a method such as that described in Reference 2 above can be used. Such a method tracks the same person and continues tracking even when, for example, person P moves and the camera photographing person P changes. The screen also provides video controls:
- Slider bar 49: to view past tracking information, the user moves the slider bar 49, and the information processing apparatus 12 acquires the past video corresponding to its position from the recording server 11.
- Play button 47: the information processing apparatus 12 plays the video from the point in time set by the slider bar up to the present and displays it on the output device 13.
- Stop button 48: the information processing apparatus 12 pauses on the frame that was shown when the button was pressed.

With these controls, the user can view video from a specific past point in time up to the present. This includes, for example, video from before the person's score as the registered detection target person exceeded the threshold and the alert was issued. The user can also examine in detail, using video paused at any desired point, whether the person is a registered detection target person. In addition, when the play button 47, the stop button 48, or the slider bar 49 is used, the information processing apparatus 12 of the present embodiment moves the tracked person's position on the map in area 42 accordingly.

The present embodiment assumes real-time processing, so area 43 shows the live video currently being captured by the surveillance camera. A similar display may also be used when reviewing past video.
In the flowchart processing described above, the person with the highest similarity is treated as the detected candidate and displayed. Alternatively, all registered detection target persons whose similarity exceeds the threshold may be treated as detected and displayed, with the same processing applied to each thereafter.

Returning to the flowchart in FIG. 2(B):
After step S215, the processing proceeds to step S216, in which the reception unit 12025 accepts from the user a correction of the judgment result for the detected object. That is, the reception unit 12025 accepts the user's instruction as to whether the recognition result of the preceding step S215 is correct or incorrect. Suppose, for example, that the user checks the screen display, confirms that the person in area 41 of FIG. 5 and person P (the person being tracked) in area 43 are the same person, and presses the report button 46. FIG. 5 assumes that the information processing system monitors a city and that the registered detection target person A is a wanted criminal. In that case, pressing the report button 46 makes the information processing apparatus 12 report to, for example, a nearby police station. This is only an example. If the system monitors a store or the like, pressing the report button 46 notifies the store staff instead. The report button 46 can therefore be customized to report to whatever destination suits the purpose of monitoring.

Next, in step S217, the reception unit 12025 receives information on whether the recognition result for the detected person was a misrecognition.
- If the user pressed the report button 46, the user is considered to have confirmed that the detected person is the registered detection target person, so the recognition result can be regarded as correct. No change to subsequent processing is needed. The information processing apparatus 12 ends the processing of the flowchart in FIG. 2(B) and proceeds to step S22 of FIG. 2(A).
- If the user pressed the false detection button 45, the user is considered to have judged that the detected person is someone other than the registered detection target person, that is, that the registered detection target person was misrecognized. The information processing apparatus 12 proceeds to step S218.

In step S218, when a correction of the judgment result has been accepted from the user, the update unit 12026 updates the judgment criterion on the basis of that correction. That is, the information processing apparatus 12 accepts from the user whether the recognition result is correct or incorrect, and updates the judgment criterion when the user corrects the information about the detected person. The update can take two forms:
- for judgment based on similarity, adjusting the threshold to suppress misrecognition;
- retraining the discriminator described above.

If no misrecognized person has yet been registered in association with the registered detection target person A, this person is the first misrecognized person. The update unit 12026 therefore records that person's information (feature quantities, images, time, and spatial information) as it is in the recording server 11.

When the user judges that the detected person is not the registered detection target person but someone else, there is no need to issue alert information indicating that the registered detection target person has been detected. However, even if the person P has been recognized as B(k), one of the other persons observed in the past, on the grounds that B(k) has the highest similarity S(n) among all other persons B(n) observed so far, it remains possible that P is not B(k). This is the case when P is another person C who has never been observed before. If P is such a new person C, the user needs to be prompted to register the person information of C.

To register the person information of a new person C, the criterion used is whether the feature amount extracted from the person P lies at least a preset threshold away, in the feature space, from the feature amounts extracted from every registered other person B(n). Specifically, the new person C is additionally registered using, for example, the window 50 shown in FIG. 6(A). In the window 50, the user compares the person similar to the registered detection target person with the person images of other persons registered in the past, and can indicate whether the person is one of those persons or is a new person because no corresponding registered person exists. In response to the user's instruction via the window 50, the update unit 12026 additionally registers the person information of the new person C as the person information of a person similar to the registered detection target person.
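A minimal sketch of the novelty criterion, assuming feature amounts are plain numeric vectors compared with Euclidean distance (the document does not name a metric) and that the stored other persons are held in a dict keyed by n. `is_new_person` is a hypothetical helper name.

```python
import math


def is_new_person(p_feature, look_alikes, threshold):
    """True if p_feature is at least `threshold` away (Euclidean distance)
    from every stored feature of every registered other person B(n).

    look_alikes maps n to a list of feature vectors. When no B(n) is stored
    yet, any person counts as new.
    """
    return all(
        math.dist(p_feature, feature) >= threshold
        for features in look_alikes.values()
        for feature in features
    )
```

Checking every stored feature, not just a per-person mean, means a widely spread look-alike can still claim P; whether that is desirable depends on how noisy the features are.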

In the window 50 shown in FIG. 6(A), the candidate other persons are displayed in descending order of score. The example in FIG. 6(A) shows images of two candidates, but more person images may be displayed as long as they are ordered by score. In this display example, not only the face image of each person misrecognized in the past but also the time at which that person was captured and the camera ID of the surveillance camera are displayed, making it easier for the user to decide. The face images displayed here are obtained from images captured by the surveillance cameras in the past.
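The ordering used by window 50 can be sketched as a sort on score. `Candidate` and `candidates_for_window` are hypothetical, "score" is assumed to be a similarity where higher means more alike, and the text rows merely stand in for the on-screen face image, capture time, and camera ID.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    person_id: int   # index n of the other person B(n)
    score: float     # similarity to person P (higher = more similar)
    time: str        # when B(n) was captured
    camera_id: str   # which surveillance camera captured B(n)


def candidates_for_window(candidates, k=2):
    """Return up to k candidates in descending score order, as display rows."""
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    return [
        f"B({c.person_id}) score={c.score:.2f} {c.time} {c.camera_id}"
        for c in ranked[:k]
    ]
```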

If the person P is the same as one of the previously detected persons displayed in the window 50, the user selects from among them the person judged to be correct. If no matching person exists, the user selects [Not applicable], provided in the window 50. When [Not applicable] is selected, the update unit 12026 treats the person P as a new misrecognized person and registers P as another person B(n) with n = N + 1. As a result, in the flowchart of FIG. 2(B), the collation processing in step S214 is updated from the next loop onward.
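One way to apply the user's choice to the stored other persons, assuming a dict of feature lists keyed by n (1 to N). `apply_window_choice` is a hypothetical name, and `None` stands for [Not applicable].

```python
def apply_window_choice(look_alikes, p_feature, choice):
    """Update the stored other persons B(n) from the user's choice in window 50.

    look_alikes maps n (1..N) to a list of feature vectors.
    choice is the selected n, or None for [Not applicable].
    Returns the index n under which P's feature was stored.
    """
    if choice is None:
        n = max(look_alikes, default=0) + 1  # register P as new person B(N+1)
        look_alikes[n] = [p_feature]
    else:
        n = choice
        look_alikes[n].append(p_feature)  # P is another sighting of B(n)
    return n
```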

Depending on the rules at the site where the information processing system is operated, it may be desirable not to store images of members of the public other than the registered detection target persons. In this case, the information in the window 50 may be displayed without the person images. In this example, the information processing apparatus 12 displays the window 51 shown in FIG. 6(B). Using the window 51, the user can appropriately judge, even without person images, whether the person P is someone who has been seen repeatedly in the past or someone seen for the first time.

The present embodiment has described, mainly with reference to the screen of FIG. 6(A), a method of operation in which the registered misrecognized other persons B(n) are further labeled correctly as the same individual or as different individuals. Teaching the system correctly whether similar persons are the same person or different persons yields several benefits, even though those persons are not themselves detection targets.

Consider, for example, a case where there are two persons B(r) and B(s), both similar to the registered detection target person A, and the user has correctly entered that they are different people. In this case, the information processing apparatus 12 can obtain correct recognition results by collating separately against the time zones and spatial information in which B(r) is highly likely to appear and those in which B(s) is highly likely to appear.

In this case, it also becomes possible to hold the variation of the feature amounts derived from B(r) in the feature space separately from the variation of those derived from B(s). The information processing apparatus 12 can therefore improve the accuracy with which the registered detection target person A is detected.

FIG. 7 is a conceptual diagram of the feature space. In the example of FIG. 7, both B(r) and B(s) have been observed multiple times in the past, and the feature positions extracted for each of them are shown as black and white circles in the figure. In FIG. 7, the feature amounts derived from B(r) vary widely in the feature space, whereas those derived from B(s) vary relatively little. Furthermore, the distribution of feature amounts derived from B(s) lies within the distribution of those derived from B(r).

With a feature distribution like that of FIG. 7, it is difficult to determine from the feature amounts alone, without the user's judgment, that the two are different people. Without the user's judgment, for example, the same label (denoted v) would be assigned automatically to both. Suppose that B(r) was observed at 9:00 a.m. and B(s) at 7:00 p.m. If both carry the same label v, they are treated indiscriminately as a single person B(v), and the system records B(v) as appearing at both 9:00 a.m. and 7:00 p.m. During operation, if a person P with features close to the registered detection target person A is then detected at 7:00 p.m., and its extracted feature amount lies at the position of the white circle P in FIG. 7, the person P is judged likely to be B(r). If, on the other hand, the user has judged B(r) and B(s) to be different people, there is no need, at 7:00 p.m. when P was detected, to consider the distance to points in the region of the feature space occupied only by B(r).
In this case, it suffices to collate the person P to determine whether it is B(s) or the registered detection target person A.
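The FIG. 7 scenario can be reproduced with a toy collation that skips look-alikes never observed near the current hour. This is a sketch under assumed conditions: two-dimensional features, Euclidean distance, nearest-neighbour matching against a single reference feature for A, and hour-of-day granularity without wrap-around at midnight. None of these choices come from the document, and the coordinates are invented to roughly mimic the figure (B(s) inside the wider spread of B(r)).

```python
import math


def collate(p_feature, p_hour, target_feature, look_alikes, window=1):
    """Return "A" or the look-alike label nearest to p_feature.

    look_alikes maps a label to {"hours": set of hours of day in which that
    person was observed, "features": list of feature vectors}. A look-alike
    never observed within `window` hours of p_hour is skipped entirely.
    """
    best_label, best_dist = "A", math.dist(p_feature, target_feature)
    for label, info in look_alikes.items():
        if not any(abs(h - p_hour) <= window for h in info["hours"]):
            continue  # not expected at this time of day
        for feature in info["features"]:
            d = math.dist(p_feature, feature)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label


A = (0.0, 0.0)  # reference feature of registered person A
# User-labelled: B(r) (wide spread, seen at 9:00) and B(s) (tight, seen at 19:00)
separate = {
    "B(r)": {"hours": {9}, "features": [(0.5, 0.0), (1.5, 0.0), (1.0, 0.5)]},
    "B(s)": {"hours": {19}, "features": [(1.0, 0.0), (1.1, 0.0)]},
}
# Unlabelled: both merged into one look-alike B(v) seen at 9:00 and 19:00
merged = {
    "B(v)": {
        "hours": {9, 19},
        "features": [(0.5, 0.0), (1.5, 0.0), (1.0, 0.5), (1.0, 0.0), (1.1, 0.0)],
    }
}
P = (0.4, 0.0)  # person P detected at 19:00
print(collate(P, 19, A, separate))  # A
print(collate(P, 19, A, merged))    # B(v)
```

With the labels kept apart, B(r) is ignored at 19:00 and P falls closest to A; once merged, B(r)'s 9:00 sighting near P claims it as B(v).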

However, depending on how the system is operated, it may be preferable to reduce the user's cost by cutting the effort of visual confirmation, rather than to pursue strict detection accuracy. In that case, the system may be operated without assigning labels that strictly distinguish B(r) and B(s) as different people, as described above. The display shown in FIG. 6(A) is then either not shown at all, or shown and skipped by the user closing the window 50, and the system automatically adds every misrecognized person to the other persons B(n). In this mode, even if B(r) and B(s) are held as separate person information, the accumulating data come to treat them, in effect, as the same person. Accordingly, the system neither identifies the time zones and spatial information in which B(r) and B(s) are each highly likely to appear nor distinguishes the distributions of their feature amounts in the feature space; all of them are instead treated alike as information on other persons and compared and collated with the registered detection target person A.

As described above, the detection processing for the registered detection target person continues until the user inputs an instruction to end monitoring in step S22 of FIG. 2.
Thus, according to the information processing apparatus 12 of the first embodiment, when persons registered in advance are detected and recognized with surveillance cameras, the misrecognition rate can be reduced while missed detections are kept to a minimum, and any increase in processing cost can be suppressed.

<Second embodiment>
In the first embodiment, spatiotemporal information on the appearance patterns of persons with features similar to a registered detection target person was used so that the specific registered detection target person could be detected efficiently and with high accuracy. Like the first embodiment, the second embodiment exploits the property that a person usually repeats the same routine every day, and further defines a routine repeated every day as behavior in a normal state.

The second embodiment describes processing that makes it possible to detect not only specific registered objects but also unregistered objects at low cost and with high accuracy, and to present the detection results to the user. In the second embodiment as well, a person is used as the example object. In the following description, a specific registered person is called a registered person (first object), and the behavior of a registered person in a normal state, following the usual routine, is called normal behavior. Behavior of a registered person in an abnormal state, deviating from the usual routine, is called abnormal behavior. In the present embodiment, the abnormal state also includes a state in which an unregistered person appears, and the abnormal state caused by the appearance of such an unregistered person is also called abnormal behavior. In the second embodiment, when a person exhibiting abnormal behavior is detected, that person is treated as a person requiring monitoring. The configuration of the information processing system according to the second embodiment is the same as that shown in FIG. 1, and the functional configuration of the information processing apparatus is the same as that shown in FIG. 10.

In an information processing system for detecting abnormal behavior in applications such as city monitoring, feature amounts related to abnormal behavior, among those obtained from surveillance camera images, are for example learned in advance, and alert information indicating that abnormal behavior has occurred is issued during operation when such feature amounts are detected. Alternatively, the information processing system learns a normal model from feature amounts related to the normal behavior of registered persons, and issues alert information indicating that abnormal behavior has been detected when feature amounts deviating from the normal model are obtained. However, issuing alert information only after abnormal behavior has been detected is too late, so there is also demand for detecting signs that precede abnormal behavior. Detecting such signs is harder than detecting the abnormal behavior itself and varies greatly between individuals, so in practice the expected effect is often not obtained.

Therefore, in the second embodiment, registered persons who tend to be observed repeatedly in a specific time zone and space, and their behavior, are regarded as normal behavior. When a person who tends not to be observed in that time zone and space is detected, that person is regarded as exhibiting abnormal behavior, and alert information is issued indicating that the person requires monitoring. In this way, the second embodiment makes it easier to detect signs of abnormal behavior.

FIG. 8 shows an example of the display screen of the output device 13 when the information processing apparatus 12 of the second embodiment detects a person requiring monitoring and issues alert information.
In FIG. 8, area 61 displays the image captured by a surveillance camera. If only registered persons who repeat the same routine every day during the current time zone appear in the video from that surveillance camera, the detection unit 1202 regards the state as normal behavior. If, during that time zone, a person different from those who repeat the same routine every day appears in the video, the detection unit 1202 detects that person as a person requiring monitoring. Likewise, if a registered person's behavior in that time zone differs from the daily routine, the detection unit 1202 also detects that registered person as a person requiring monitoring.

The detection unit 1202 then generates a display in which the image of the person requiring monitoring is enclosed by a frame 62 to draw the user's attention. It also displays an enlarged image of that person in area 63 and outputs a warning sound to notify the user. The user can thus look at the screen and decide whether the person should continue to be monitored and whether the person's behavior warrants continued monitoring. If the user judges that the person is a registered person who repeats the same routine every day and presses the false detection button 65, the information processing apparatus 12 treats the person as having been misrecognized, and the subsequent processing is the same as in the first embodiment. If the user judges that the person requires continued tracking and presses the tracking button 64, the detection unit 1202 tracks the person using the same method as in the first embodiment. Further, if, while the person continues to be monitored, the user judges that the person is likely to engage in abnormal behavior and presses the report button 66, the detection unit 1202 issues a report in the same manner as in the first embodiment.

FIG. 9(A) is a flowchart of the processing executed by the information processing apparatus 12 of the second embodiment, repeated from image input from the surveillance cameras until monitoring ends. FIG. 9(B) is a detailed flowchart of the detection processing for persons requiring monitoring executed by the detection unit 1202 in step S71 of the flowchart of FIG. 9(A). As in the first embodiment, the information processing system of the second embodiment is not limited to any particular use case, but city monitoring is used as the example for ease of understanding and explanation. The second embodiment also assumes that the 14 surveillance cameras C301 to C314 shown in FIG. 4 are processed simultaneously.

First, in step S70, the image acquisition unit 1201 acquires the images captured by each of the surveillance cameras C301 to C314.
Next, in step S71, the detection unit 1202 starts the detection processing for persons requiring monitoring. The detection unit 1202 detects a pre-registered person requiring monitoring (first object) among the objects (persons) detected in the acquired images. The first object includes the registered predetermined objects (as in the first embodiment) and, in addition, any object in an abnormal state. In other words, the monitoring target in the present embodiment is any object exhibiting abnormal behavior, that is, any object that is not in a normal state. An object that has not been registered is detected as a person requiring monitoring, because it cannot be determined whether its behavior is normal or abnormal.
The information processing apparatus 12 then continues the processing of step S71 until the user inputs an instruction to end monitoring in step S72.

Steps S710 to S712 of FIG. 9(B) are substantially the same as steps S210 to S212 of FIG. 2(B) described above, so their description is omitted.
In step S711, if no person is detected, the object detection unit 12020 proceeds to step S716. In step S716, the generation unit 12024 generates a normal screen, that is, an ordinary screen on which no person requiring monitoring has been detected, and causes the output device 13 to display it. If a person is detected, the detection unit 1202 performs collation processing from step S712 onward to determine whether the person detected in step S710 is a person requiring monitoring. Hereinafter, the person detected in step S710 is referred to as person U.

In the collation processing from step S712 onward, the spatial information acquisition unit 12021 acquires the time t at which the person U detected in step S710 was captured, and spatial information s representing the imaging area identified by the camera ID of the surveillance camera that captured the person U. As in the first embodiment, the spatial information s may include not only the imaging area identified by that camera ID but also identifiers representing the imaging areas of multiple surveillance cameras in the vicinity of that camera.
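To make t and s concrete, the sketch below maps a capture time to an hour-bucket time zone and a camera ID to a set of camera IDs. The adjacency table `NEIGHBOURS`, the bucket size, and both function names are hypothetical; the document only states that s may include neighbouring cameras, not how they are chosen.

```python
from datetime import datetime

# Hypothetical camera adjacency table (not specified in the document).
NEIGHBOURS = {"C305": {"C304", "C306"}}


def spatial_info(camera_id, neighbours=NEIGHBOURS):
    """Spatial information s: the capturing camera plus its neighbours."""
    return frozenset({camera_id} | neighbours.get(camera_id, set()))


def time_zone(t: datetime, bucket_hours=1):
    """Map capture time t to a time-zone index (bucket of the hour of day)."""
    return t.hour // bucket_hours
```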

Next, in step S713, the similar object information acquisition unit 12022 acquires a list of registered persons exhibiting normal behavior (referred to as the normal person list) who have been detected in the past, in the time zone defined by the time t, by the group of surveillance cameras defined by the spatial information s. Normal behavior here means the behavior of a person who has been observed at least once over a specific period, for example a fixed past period, has not engaged in abnormal behavior, and has not been reported by pressing the report button 66 in FIG. 8. The normal person list is thus a list of registered persons who have been observed at least once at the specific time and space over a fixed past period, have not engaged in abnormal behavior, and have not been given a label indicating abnormal behavior through a report via the report button 66. The persons in the normal person list acquired in step S713 are registered persons who are usually observed by nearby surveillance cameras in that time zone, and are therefore not persons requiring monitoring.
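A minimal sketch of building the normal person list from an observation log. It assumes the log already stores the identity, day, time zone, and camera of each past observation of a registered person. `Observation`, `normal_person_list`, and the 30-day default period are illustrative assumptions; the document speaks only of "a fixed past period".

```python
from datetime import date, timedelta
from typing import NamedTuple


class Observation(NamedTuple):
    person_id: str   # registered person
    day: date        # day of the observation
    time_zone: int   # e.g. hour of day
    camera_id: str   # camera that observed the person


def normal_person_list(log, tz, cameras, today, reported, period_days=30):
    """Registered persons seen at least once in time zone `tz` by any camera
    in `cameras` during the last `period_days` days, excluding anyone
    labelled as abnormal via the report button."""
    start = today - timedelta(days=period_days)
    return {
        o.person_id
        for o in log
        if o.time_zone == tz
        and o.camera_id in cameras
        and start <= o.day <= today
        and o.person_id not in reported
    }
```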

Next, in step S714, the collation unit 12023 collates the person U detected in step S710 with each registered person in the normal person list. That is, to detect as persons requiring monitoring those who are not usually observed, or whose behavior deviates from the normal behavior defined by the time t and the spatial information s, step S714 collates the person detected in step S710 against the normal person list.

Here, the collation unit 12023 compares, in the feature space, the feature amount extracted from the person U with the feature amount group of each registered person in the normal person list defined by past observations at the time t and spatial information s. The collation unit 12023 then determines whether the person U is a previously detected registered person whose feature amounts lie within a specific threshold distance of those of the person U. When it determines that the person U is a registered person, the collation unit 12023 further determines whether that person's behavior deviates from the normal behavior defined by the time t and the spatial information s. In other words, when the detected object is determined to be in a normal state, the collation unit 12023 further determines whether the detected object is a registered person (first object).
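The threshold test of step S714 can be sketched as a nearest-neighbour search restricted to the normal person list. Euclidean distance and the dict layout (person id to list of stored feature vectors) are assumptions, and `collate_normal` is a hypothetical name.

```python
import math


def collate_normal(u_feature, normal_features, threshold):
    """Return the id of the closest person in the normal person list whose
    stored feature lies within `threshold` of u_feature, or None if none does.

    normal_features maps a registered person id to a list of feature vectors.
    """
    best_id, best_d = None, threshold
    for person_id, features in normal_features.items():
        for f in features:
            d = math.dist(u_feature, f)
            if d <= best_d:  # "within a distance equal to or less than"
                best_id, best_d = person_id, d
    return best_id
```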

Next, in step S715, if the collation processing of step S714 yields the result that the person is a previously detected registered person exhibiting normal behavior, the generation unit 12024 determines that no person requiring monitoring has been detected, and the processing proceeds to step S716. If the generation unit 12024 determines that the detected object's behavior is not the normal behavior of a previously detected registered person, it determines that the object requires monitoring. Likewise, if the collation result indicates that the detected object is an unregistered person (third object), that is, not a registered person, the generation unit 12024 determines that it is a person requiring monitoring. In either case, whether the collation result indicates an unregistered person or behavior that is not the normal behavior of a registered person, the generation unit 12024 treats the detected object as a person requiring monitoring, and the processing proceeds to step S717. When the person U has never before been detected, in the time zone defined by the time t, by the surveillance cameras defined by the spatial information s, or is a registered person not exhibiting normal behavior, the generation unit 12024 generates information indicating the result of determining that the person U requires monitoring.
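The branching of step S715 reduces to a small decision table. In this sketch the matched id is the result of collation against the normal person list, and `behaviour_is_normal` is an assumed upstream judgement, because the document does not say how deviation from the routine is measured. `needs_monitoring` is a hypothetical name.

```python
def needs_monitoring(matched_id, behaviour_is_normal):
    """Return (flag, reason) for step S715.

    matched_id: id from collation against the normal person list, or None.
    behaviour_is_normal: assumed upstream judgement of the matched person's
    behaviour (ignored when there is no match).
    """
    if matched_id is None:
        # Unregistered person, or not usually seen in this time zone and space.
        return True, "not usually seen in this time zone and space"
    if not behaviour_is_normal:
        return True, "deviates from normal behaviour"
    return False, "normal"
```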

In step S717, the generation unit 12024 generates, as the collation result of steps S714 and S715, a screen such as that shown in FIG. 8 described above, and the output device 13 displays it.
Then, in step S718, the reception unit 12025 receives input from the user. When the user indicates that a detected object determined to be in an abnormal state is not a registered person, the reception unit 12025 assigns it a label indicating that it is an unregistered person (third object) not detected in the past. When the user indicates that a detected object determined not to be in a normal state is in fact the first object, the spatial information and time of the detected object are associated with each other and held as information indicating that the detected object is in a normal state of the first object.

In the screen display of FIG. 8, if the user judges that the framed person requiring monitoring is a registered person exhibiting normal behavior and presses the false detection button 65, the information processing apparatus 12 determines that the person U is not a person requiring monitoring. In this case, the detection unit 1202 updates the normal person list with the time t at which the person U was captured and the spatial information s, as new information indicating the normal behavior of the person U.

Further, if the user cannot immediately determine whether the person U is a person to be monitored and presses the tracking button 64, the detection unit 1202 continues tracking the person U.
On the other hand, when the user determines that the person U is behaving abnormally and presses the report button 66, the detection unit 1202 performs reporting processing.
In this way, the detection unit 1202 reflects the user input received in step S718, generates in step S719 a normal person list defined by the time t and the spatial information s that reflects that input, and uses the list to detect the next suspicious behavior.
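The three responses to the screen of FIG. 8 amount to a small dispatch on the pressed button. The sketch below is hypothetical: the `Button` enum, the `MonitorState` container and the return strings are illustrative names, and reporting is reduced to appending to a list.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Button(Enum):
    TRACK = auto()            # tracking button 64
    FALSE_DETECTION = auto()  # false detection button 65
    REPORT = auto()           # report button 66


@dataclass
class MonitorState:
    # person ID -> set of (time t, spatial information s) regarded as normal
    normal_list: dict = field(default_factory=dict)
    tracked: set = field(default_factory=set)    # persons still being tracked
    reports: list = field(default_factory=list)  # issued reports


def handle_button(state: MonitorState, button: Button, person_id: str, t, s) -> str:
    if button is Button.FALSE_DETECTION:
        # The user judged the behavior normal: record (t, s) for this person
        # so that the same time and space no longer flag them.
        state.normal_list.setdefault(person_id, set()).add((t, s))
        return "normal_list_updated"
    if button is Button.TRACK:
        # Undecided: keep following the person.
        state.tracked.add(person_id)
        return "tracking"
    if button is Button.REPORT:
        # Abnormal behavior: hand off to reporting processing.
        state.reports.append((person_id, t, s))
        return "reported"
    raise ValueError(f"unknown button: {button}")
```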

As described above, the information processing according to the second embodiment continues until the user inputs an end of monitoring in step S72. As a result, the user does not need to watch every behavior of every person observed on the screen and only needs to pay attention when a person who has not been observed in the corresponding time period or space in the past appears. This can greatly reduce the cost of monitoring.
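The reduction in monitoring load comes from a lookup that flags a person only when the current time period and space are absent from that person's normal entries. This is a minimal sketch. It assumes, hypothetically, that time is bucketed into fixed-length slots of the day and that an unidentified person always needs attention.

```python
from datetime import datetime
from typing import Dict, Optional, Set, Tuple


def time_slot(ts: datetime, slot_minutes: int = 60) -> int:
    # Bucket a timestamp into a time-of-day slot; the slot length is an
    # assumed parameter, not one given in the description.
    return (ts.hour * 60 + ts.minute) // slot_minutes


def needs_attention(person_id: Optional[str], space: str, ts: datetime,
                    normal_list: Dict[str, Set[Tuple[str, int]]],
                    slot_minutes: int = 60) -> bool:
    if person_id is None:
        # Not matched to anyone seen before: always worth a look.
        return True
    seen = normal_list.get(person_id, set())
    # Flag only if this person was never observed here in this time slot.
    return (space, time_slot(ts, slot_minutes)) not in seen
```

Take `{"staff_a": {("register", 9)}}` as an example. A sighting of `staff_a` at the register at 09:15 is not flagged, but the same person there at 23:05 is.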

The information processing apparatus 12 of the first and second embodiments can be realized with the same hardware configuration shown in FIG. 1. Therefore, the apparatus can simultaneously detect a specific registered detection target person registered in advance, as in the first embodiment, and detect a person to be monitored, as in the second embodiment. Further, a person may be determined not to be a person to be monitored as defined in the second embodiment. Determining in a second stage whether that person is a registered detection target person still enables recognition that better matches the user's intention.
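The two-stage determination can be illustrated by chaining the normal-state check of the second embodiment with a registered-target check in the style of the first embodiment. The sketch is an assumption rather than the patented procedure. Its features are plain vectors compared by cosine similarity, and the threshold of 0.8 is arbitrary. `similar_features` holds previously stored look-alikes (second objects) already filtered to the current space and time.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def matches_registered_target(feature, target_feature, similar_features,
                              threshold: float = 0.8) -> bool:
    # Must resemble the registered target, and resemble it more than any
    # stored look-alike (second object).
    s_target = cosine(feature, target_feature)
    if s_target < threshold:
        return False
    return all(s_target > cosine(feature, f) for f in similar_features)


def classify(feature, in_normal_state: bool, target_feature,
             similar_features, threshold: float = 0.8) -> str:
    # Stage 1 (second embodiment): outside any known normal state -> monitor.
    if not in_normal_state:
        return "monitoring_target"
    # Stage 2 (first embodiment): a normally behaving person may still be a
    # registered detection target.
    if matches_registered_target(feature, target_feature, similar_features, threshold):
        return "registered_target"
    return "no_alert"
```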

Comparison with registered detection target persons is performed in the same manner as in the first embodiment. In the present embodiment, the same processing can also compare a detected person with pre-registered non-detection target persons, for example. This suppresses the number of alerts issued for persons detected as persons to be monitored. Store staff who visit the store at random times during store monitoring are one example of non-detection target persons. Information on such staff is registered in advance as non-detection target persons.
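Alert suppression against pre-registered non-detection target persons such as store staff can be sketched as a final filter on alert candidates. The Euclidean distance via `math.dist` (Python 3.8+) and the `max_distance` cutoff are assumptions chosen for illustration. Any feature comparison consistent with the first embodiment could stand in.

```python
import math
from typing import List, Sequence, Tuple


def should_alert(feature: Sequence[float],
                 staff_features: List[Sequence[float]],
                 max_distance: float = 0.5) -> bool:
    # Suppress the alert if the person is close to any registered
    # non-detection target person (e.g. a staff member).
    return all(math.dist(feature, s) > max_distance for s in staff_features)


def filter_alerts(candidates: List[Tuple[str, Sequence[float]]],
                  staff_features: List[Sequence[float]],
                  max_distance: float = 0.5) -> List[str]:
    # Keep only the IDs of monitoring candidates that do not match staff.
    return [pid for pid, feat in candidates
            if should_alert(feat, staff_features, max_distance)]
```

With no staff registered, `all()` over an empty list is `True`, so every candidate still raises an alert.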

The present invention can also be realized by processing in which a program that realizes one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that realizes one or more functions.
The above-described embodiments are merely examples of embodying the present invention, and the technical scope of the present invention should not be construed as limited by them. That is, the present invention can be implemented in various forms without departing from its technical idea or its main features.

11: Storage device, 12: Information processing device, 13: Output device, 14: Input device, 101: Image pickup device, 1201: Image acquisition unit, 1202: Detection unit, 12020: Object detection unit, 12021: Spatial information acquisition unit, 12022: Similar object information acquisition unit, 12023: Collation unit, 12024: Generation unit, 12025: Reception unit, 12026: Update unit

Claims

A detection means that detects a predetermined object from an image,
An acquisition means for acquiring a second object different from the pre-registered first object based on the spatial information representing the space in which the object is detected and the time when the object existed in the space.
A determination means for determining whether the object detected by the detection means is the first object or the second object.
An information processing device characterized by having.

The information processing device according to claim 1, further comprising an output means for outputting alert information when the determination means determines that the detected object is the first object.

The information processing apparatus according to claim 2, further comprising an updating means for updating, when a correction of the determination result by the determination means is accepted from the user, the determination criteria used in the determination based on the correction.

The information processing apparatus according to any one of claims 1 to 3, further comprising a holding means for holding, when the determination result by the determination means that the detected object is the first object is incorrect, the spatial information and time of the detected object and a feature amount extracted from the detected object as information on the second object.

The information processing apparatus according to claim 4, wherein the determination means determines the detected object to be a candidate for the first object, and further determines, based on the information on the second object held by the holding means for that candidate, whether the detected object is the first object or the second object.

The information processing apparatus according to claim 4 or 5, further comprising a presentation means for presenting information on the feature amount of the second object held by the holding means to the user.

The information processing apparatus according to any one of claims 4 to 6, wherein, when the determination result by the determination means that the detected object is the first object is incorrect, the determination means attaches, to the detected object held by the holding means, a label that distinguishes whether or not the object is the same as the first object.

A detection means that detects a predetermined object from an image,
A specifying means for specifying a pre-registered first object detected in correspondence with spatial information representing the space in which the object is detected and the time when the object existed in the space, and a normal state of the first object,
A determination means for determining whether the object detected by the detection means is in the normal state, based on the first object specified by the specifying means and the normal state of the first object,
An information processing device characterized by having.

The information processing apparatus according to claim 8, wherein, when the determination means determines that the detected object is in the normal state, it further determines whether the detected object is the first object.

The information processing apparatus according to claim 8 or 9, wherein, when the user teaches that a detected object determined not to be in the normal state of the first object is not the first object, the determination means attaches a label indicating that the object is a third object that has not been detected in the past.

The information processing apparatus according to claim 10, further comprising a holding means for holding, when the user teaches that a detected object determined not to be in the normal state is the first object, information indicating the normal state of the first object in association with the spatial information and the time of the detected object.

The information processing apparatus according to claim 11, wherein the determination means excludes from the determination a normal state of the first object that was detected at a specific time and space in the past but has not been detected for a specific period.

An information processing method executed by an information processing device,
A detection step of detecting a predetermined object from an image,
An acquisition step of acquiring a second object different from the pre-registered first object based on the spatial information representing the space in which the object is detected and the time when the object existed in the space.
A determination step of determining whether the detected object is the first object or the second object, and
An information processing method characterized by having.

An information processing method executed by an information processing device,
A detection step of detecting a predetermined object from an image,
A specifying step of specifying a pre-registered first object detected in correspondence with spatial information representing the space in which the object is detected and the time when the object existed in the space, and a normal state of the first object,
A determination step of determining whether or not the detected object is in the normal state based on the specified first object and the normal state of the first object.
An information processing method characterized by having.

A program for making a computer function as each means of the information processing apparatus according to any one of claims 1 to 12.