JP6588413B2

JP6588413B2 - Monitoring device and monitoring method

Info

Publication number: JP6588413B2
Application number: JP2016196373A
Authority: JP
Inventors: 慎也高田; 幸由大田; 竹内　格; 格竹内; 禅石倉
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2016-10-04
Filing date: 2016-10-04
Publication date: 2019-10-09
Anticipated expiration: 2036-10-04
Also published as: JP2018061114A

Description

The present invention relates to a monitoring device and a monitoring method.

Techniques for tracking a person's trajectory from a single video are known. For example, a person appearing in the video is detected, and the pixel position of the person's image and identification information for that image are output in time series (see Non-Patent Document 1).

Techniques are also known that use images captured from multiple directions and, using the relative relationship between the images as a clue, identify a person who appears in only some of the images. A technique is also known that detects the persons appearing in each frame of video captured by multiple cameras and calculates the movement trajectory of each person (see Patent Document 1).

A technique for detecting suspicious behavior of a person using stereo camera video has also been proposed. This technique acquires the movement trajectory of a person to be monitored, identifies the person's behavioral state from that trajectory, and detects suspicious behavior (see Patent Document 2).

Person monitoring techniques using multiple cameras have also been proposed. For example, one proposed technique extracts a moving object from video captured by multiple cameras, matches the video between cameras, obtains the movement path of the moving object, and displays an alarm to a supervisor when the object enters a predetermined area. This technique uses a personal identification means such as card authentication to determine whether the moving object, that is, the person, is authorized to access the predetermined area. It also analyzes the person's movement path and trajectory to determine whether the person in the video is a suspicious person, and displays an alarm.

Patent Document 1: JP 2010-063001 A
Patent Document 2: JP 2012-128877 A

Non-Patent Document 1: "Image Sensing Technology OKAO Vision: Finding and Recognizing People", [online], December 2015, OMRON Corporation, [retrieved April 12, 2016], Internet <URL: http://plus-sensing.omron.co.jp/technology/>

However, conventional techniques for detecting a person in an image involve processing that determines identity by comparing features such as the face in the person's image, or stereo matching that maps video from multiple viewpoints onto three-dimensional space. The processing is therefore complex, and configuring and operating the camera calibration is costly.

Furthermore, the technique for detecting suspicious behavior using stereo camera video requires the behavior patterns of suspicious persons (outsiders) to be registered in advance, which is laborious.

In addition, monitoring a person in video captured by multiple cameras presupposes the combined use of a personal identification means that identifies the person unambiguously, such as card authentication. Consequently, when only techniques that identify a person ambiguously are combined, for example a video tracking system and a beacon receiving system, it is difficult to determine whether a person detected by the video tracking system is authorized to access a predetermined area or whether that person is a suspicious person. For example, the position detection accuracy of a beacon receiving system is on the order of several tens of meters, with large errors. It is therefore also difficult to use a beacon receiving system alongside the video tracking system to determine whether a detected person is authorized to access the predetermined area or is a suspicious person.

The present invention has been made in view of the above, and its object is to easily identify a person in video captured from multiple viewpoints and monitor that person's behavior, in combination with a sensor that identifies persons only ambiguously.

To solve the above problems and achieve the object, a monitoring device according to the present invention comprises: an acquisition unit that acquires, at the same time, a plurality of videos of a predetermined monitoring range captured from different locations and sensor data obtained by identifying and detecting persons within the monitoring range; a training unit that generates training data combining the position of the same person appearing in each of the acquired videos with the sensor data, acquired at the same time, in which that person was detected; an estimation unit that calculates, based on the training data, an estimated value of the sensor data at the position of a person appearing in each of the videos acquired in a continuous time series; a specifying unit that specifies identification information identifying the person in a video based on the time-series change in the similarity between the calculated estimated value and the sensor data acquired at the same time as the video; and a determination unit that determines, as the identification information of a single person whose positions coincide when the positions of the persons appearing in the respective videos are converted into positions within the monitoring range, the identification information that occurs most often among the identification information specified for the persons in the respective videos before the conversion.

According to the present invention, a person in video captured from multiple viewpoints can be easily identified and that person's behavior monitored, in combination with a sensor that identifies persons only ambiguously.

FIG. 1 is a schematic diagram showing the schematic configuration of a system including a monitoring device according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram for explaining how the video sensors are installed.
FIG. 3 is an explanatory diagram for explaining how the video sensors are installed.
FIG. 4 is an explanatory diagram for explaining how the video sensors are installed.
FIG. 5 is an explanatory diagram for explaining the position of a person video.
FIG. 6 is an explanatory diagram for explaining the generation of training data.
FIG. 7 is an explanatory diagram for explaining global pixel coordinates and local pixel coordinates.
FIG. 8 is an explanatory diagram for explaining the processing of the estimation unit.
FIG. 9 is an explanatory diagram for explaining the processing of the determination unit.
FIG. 10 is an explanatory diagram for explaining the processing of the determination unit.
FIG. 11 is a flowchart showing the monitoring processing procedure.
FIG. 12 is a diagram illustrating a computer that executes a monitoring program.

An embodiment of the present invention is described in detail below with reference to the drawings. The present invention is not limited by this embodiment. In the drawings, identical parts are denoted by the same reference signs.

[System configuration]
FIG. 1 is a schematic diagram showing the schematic configuration of a system including the monitoring device according to this embodiment. As illustrated in FIG. 1, a plurality of installed video sensors C each capture the predetermined monitoring range R and generate video. Radio signal transmitters B are installed within the monitoring range R, and the mobile terminal m carried by a person h within the monitoring range R is detected. The monitoring device 1 monitors the behavior of the person h within the monitoring range R by the monitoring process described later.

The radio signal transmitter B is implemented by, for example, a beacon that transmits BLE (Bluetooth (registered trademark) Low Energy) signals, and transmits radio signals toward the monitoring range R. When a mobile terminal m capable of receiving these radio signals receives one within the monitoring range R, it sends the server S sensor data containing its own terminal ID, the identification information of the radio signal transmitter B, and the reception strength of the radio signal.

Here, the terminal ID of the mobile terminal m is associated in advance with identification information identifying the person h who carries it. The server S can then use the sensor data to detect which of the radio signal transmitters B installed at multiple locations within the monitoring range R transmitted the received radio signal and at what reception strength it was received, and thereby detect the identification information and position of the person h carrying the mobile terminal m. The radio signal may be a sound wave or the like instead of an electromagnetic wave.
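The sensor data described above can be pictured as a small record per received signal. The following is a minimal sketch, not the patented implementation: the names SensorReading, TERMINAL_TO_PERSON, and rssi_vectors, the timestamp field, and the dBm unit are all assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorReading:
    """One report sent by a mobile terminal m to the server S (field names are hypothetical)."""
    terminal_id: str   # terminal ID of the mobile terminal m
    beacon_id: str     # identification information of the radio signal transmitter B
    rssi_dbm: float    # reception strength of the radio signal (unit assumed to be dBm)
    timestamp: float   # reception time in seconds (assumed field)


# Hypothetical pre-registered association between terminal IDs and person identification.
TERMINAL_TO_PERSON = {"term-001": "person-A", "term-002": "person-B"}


def rssi_vectors(readings, beacon_ids):
    """Group readings taken at one instant into one reception-strength vector per person.

    Readings from terminals that are not registered are ignored.
    """
    vectors = {}
    for r in readings:
        person = TERMINAL_TO_PERSON.get(r.terminal_id)
        if person is None:
            continue
        vec = vectors.setdefault(person, [None] * len(beacon_ids))
        vec[beacon_ids.index(r.beacon_id)] = r.rssi_dbm
    return vectors
```

Each resulting vector has one entry per transmitter, which matches the later description of sensor data as a combination of reception strengths.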

The video sensor C is implemented by an optical camera, an infrared camera, or the like, and the video sensors are installed so that a person h within the monitoring range R can be captured from multiple locations that differ in shooting direction relative to the person h and in distance to the person h. The installation of the video sensors C is described below with reference to FIGS. 2 to 4.

For example, the video sensors C are installed with different shooting directions relative to the monitoring range R so that their shooting ranges overlap. As illustrated in FIG. 2, a person h who enters an overlapping range (a, b) of the shooting ranges while moving can then be captured by more than one of the video sensors C (C1 to C3). For example, a person h in overlapping range a can be captured by both video sensor C1 and video sensor C2.

The video sensors C are also installed with different shooting directions relative to a person h within the monitoring range R. As illustrated in FIG. 3, even when an obstacle lies between one video sensor C1 and the person h, the person h can be captured by another video sensor C2 whose shooting direction relative to the person h differs.

The video sensors C are also installed at different distances from a person h within the monitoring range R. As illustrated in FIG. 4, when the distance d from one video sensor C1 is so short that the person h moves quickly across the video and the accuracy of detecting the person h in that video drops, the person h can still be captured by another video sensor C2 at a different distance. Conversely, when the distance d from one video sensor C1 is so long that the person h moves slowly across the video and detection accuracy drops, the person h can likewise be captured by another video sensor C2 at a different distance. The positions of the video sensors C set here are fixed in the subsequent processing.

[Configuration of monitoring device]
Returning to FIG. 1, the monitoring device 1 is implemented by a general-purpose computer such as a personal computer and includes an input unit 11, an output unit 12, a communication control unit 13, a storage unit 14, and a control unit 15.

The input unit 11 is implemented using input devices such as a keyboard and a mouse, and inputs various instructions, such as a command to start processing, to the control unit 15 in response to operations by the operator. The output unit 12 is implemented by a display device such as a liquid crystal display, a printing device such as a printer, or the like.

The communication control unit 13 is implemented by a NIC (Network Interface Card) or the like, and controls communication between the control unit 15 and external devices such as the video sensors C and the server S over a telecommunication line such as a LAN (Local Area Network) or the Internet.

The storage unit 14 is implemented by a semiconductor memory element such as a RAM (Random Access Memory) or flash memory, or by a storage device such as a hard disk or an optical disk, and stores the training data 14a generated by the monitoring process described later. The storage unit 14 may be configured to communicate with the control unit 15 via the communication control unit 13.

The control unit 15 is implemented using a CPU (Central Processing Unit) or the like and executes a processing program stored in memory. As illustrated in FIG. 1, the control unit 15 thereby functions as an acquisition unit 15a, a training unit 15b, an estimation unit 15c, a specifying unit 15d, and a determination unit 15e.

The acquisition unit 15a acquires, at the same time, a plurality of videos of the predetermined monitoring range captured from different locations and sensor data obtained by identifying and detecting persons within the monitoring range. Specifically, the acquisition unit 15a continuously acquires, via the communication control unit 13, the videos that the video sensors C each capture of the monitoring range R at the same time and the sensor data that the server S detects at that same time.

The acquisition unit 15a also detects the portion of each acquired same-time video that shows a person (hereinafter called a person video) p, for example by detecting the shape from the head to the shoulders. As the position xp of each person video p, the acquisition unit 15a extracts, for example, the position in pixel coordinates of a representative point such as the centroid of the head area (hereinafter also called the local pixel position). In doing so, the acquisition unit 15a identifies the person video p of the same person across consecutive times in the video captured from a given location and assigns it the same identification value.

The local pixel position of a person video is described here with reference to FIG. 5. FIG. 5 illustrates the local pixel positions xin of the person videos "in" (person i in video n) obtained when persons hi located at positions xhi (i = 1, 2, 3) within the monitoring range R (hereinafter also called global pixel positions) are captured by multiple video sensors Cn (n = 1, 2). For example, in video 1 captured by video sensor C1, person h1 at global pixel position xh1 and person h2 at global pixel position xh2 appear as person video 11 at local pixel position x11 or person video 21 at local pixel position x21. In video 2 captured by video sensor C2, person h3 at global pixel position xh3 appears in addition to persons h1 and h2, as person video 22 at local pixel position x22, person video 12 at local pixel position x12, or person video 32 at local pixel position x32. That is, person h3 at global pixel position xh3 does not appear in video 1 and appears only in video 2.

At the point where the acquisition unit 15a has acquired the videos n and extracted the local pixel position xin of each person video "in", the correspondence between each person video and the person hi at global pixel position xhi is unknown. The monitoring device 1 of this embodiment therefore specifies the identification information of each person video "in" within each video n by the monitoring process described later. Then, when the local pixel positions xin of the person videos in the respective videos n are converted into global pixel positions xhi, it determines that, among the identification information of the person videos that map to the same global pixel position xhi, the identification information occurring most often is that of person hi. In this way the monitoring device 1 identifies each person hi, detects that person's movement trajectory, and monitors behavior.

The acquisition unit 15a acquires video in which the video sensors C, installed as described above, capture a person h within the monitoring range R from multiple locations that differ in shooting direction relative to the person h and in distance to the person h. Because the shooting ranges of the video sensors C overlap, multiple videos of a person h in an overlapping range, captured by multiple video sensors C, can be acquired (see FIG. 2). Even when an obstacle between one video sensor C and the person h makes the person h hard to track in that video, video from another video sensor C can be used to keep the accuracy of detecting the person h from dropping (see FIG. 3). Likewise, when one video sensor C is so close that the person h moves quickly across its video and the accuracy of detecting the person h in that video drops, video from another video sensor C can be used to keep the accuracy of identifying the person h from dropping. Conversely, when one video sensor C is so far away that the person h moves slowly across its video and detection accuracy drops, video from another video sensor C can again be used to keep the accuracy of identifying the person h from dropping (see FIG. 4).

Returning to FIG. 1, the training unit 15b generates training data that combines the position of the same person h appearing in each of the acquired videos with the sensor data, acquired at the same time, in which that person h was detected. The generation of the training data 14a is described with reference to FIG. 6. The training unit 15b generates the training data 14a by combining the local pixel position xn of the person video n in each video n, obtained when multiple video sensors Cn (n = 1, 2) capture at the same time a person h at global pixel position xh within the monitoring range R, with the sensor data in which the same person h was detected.

Here, the sensor data is expressed, for example, as the reception strength of the radio signal that the mobile terminal m carried by the person h receives from a radio signal transmitter B. That is, the sensor data is expressed as the combination of the reception strengths, at the position of each person h, of the radio signals transmitted from each of the radio signal transmitters B installed within the monitoring range R. The training data 14a generated in the example of FIG. 6 contains the local pixel position xn, in each video n, of the person h at global pixel position xh, together with the sensor data.

The training unit 15b then changes the global pixel position xh of the person h and generates training data 14a at each global pixel position xh in the same way. It then stores the generated training data 14a in the storage unit 14.

In this way, the training unit 15b generates the training data 14a by combining the local pixel position xn of the person video n in each video n corresponding to the global pixel position xh of the person h with the sensor data detected at that position. As described later, the monitoring device 1 of this embodiment uses the local pixel positions xn in the videos n that correspond to the same global pixel position xh, together with the sensor data detected at that position, to specify the identification information of the persons appearing in each video n. It is therefore desirable to generate training data 14a for as many global pixel positions xh within the monitoring range R as possible.
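One way to picture a training record is as a pair of per-camera local pixel positions and the reception-strength vector measured at the same instant. The sketch below is illustrative only; TrainingSample and collect_training_data are hypothetical names, and it assumes that a single known person walks through the monitoring range alone while the data is collected, which the text does not state.

```python
from typing import NamedTuple


class TrainingSample(NamedTuple):
    """One training record (hypothetical layout)."""
    local_positions: dict  # camera id -> (u, v) local pixel position xn
    sensor: tuple          # reception strength per radio signal transmitter B


def collect_training_data(walk):
    """Build training records from same-time (detections, sensor) pairs.

    walk: iterable of (detections, sensor), where detections maps each camera
    that saw the person to a local pixel position, and sensor is the
    reception-strength vector measured at the same time.
    Assumption: only one known person is in the monitoring range.
    """
    samples = []
    for detections, sensor in walk:
        if detections:  # skip instants where no camera saw the person
            samples.append(TrainingSample(dict(detections), tuple(sensor)))
    return samples
```

Collecting samples along a dense walk through the monitoring range corresponds to the stated preference for covering as many global pixel positions as possible.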

FIG. 7 is an explanatory diagram for explaining the local pixel coordinates of each video n and the global pixel coordinates. As illustrated in FIG. 7, the training unit 15b generates a conversion formula or conversion table that expresses the correspondence between an arbitrary local pixel position in the local pixel coordinates of each video n from each video sensor Cn and a global pixel position in the global pixel coordinates, and stores it in the storage unit 14. As described later, the determination unit 15e refers to this conversion formula or conversion table to convert the local pixel position xn of a person video in a video n into a global pixel position xh.
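The text does not specify the form of the conversion formula. As one possible form, and purely as an assumption, the sketch below uses a 3x3 planar homography per camera, which maps pixel positions of points on a flat floor to a common plane; the function name to_global and the matrix values are hypothetical.

```python
def to_global(H, u, v):
    """Map a local pixel position (u, v) to a global pixel position.

    H is a 3x3 matrix (nested lists) for one camera. Using a homography is an
    assumption; a lookup table keyed by local pixel cell would serve equally.
    """
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return (x / w, y / w)


# Hypothetical matrix for camera C1: scale by 0.5 and shift by (10, 20).
H_C1 = [[0.5, 0.0, 10.0],
        [0.0, 0.5, 20.0],
        [0.0, 0.0, 1.0]]
```

With H_C1 above, the local pixel position (100, 200) maps to the global pixel position (60.0, 120.0).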

Returning to FIG. 1, the estimation unit 15c calculates, based on the training data 14a, an estimated value of the sensor data at the position of a person appearing in each of the videos acquired in a continuous time series. Specifically, the estimation unit 15c uses the correspondence in the training data 14a between local pixel positions in each video and sensor data to calculate an estimated value of the sensor data at the local pixel position of each person video in the continuously acquired videos.

For example, the estimation unit 15c divides the local pixel coordinates of each video into multiple cells and calculates the mean of the sensor data in the training data 14a belonging to a cell as the estimated value of the sensor data at any local pixel position belonging to that cell. In other words, the estimation unit 15c calculates an estimated value of the sensor data in the neighborhood of each local pixel position. It is desirable for the estimation unit 15c to calculate the estimated value of the sensor data for the cell containing each local pixel position in each video in advance, before the processing of the specifying unit 15d described below, and to store it in the storage unit 14.
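The cell-averaging step can be sketched as follows for a single camera. The cell size CELL, the square grid, and the function names are assumptions; the text only says that the local pixel coordinates are divided into cells and that the training sensor data in each cell is averaged.

```python
from collections import defaultdict

CELL = 40  # hypothetical cell size in pixels


def cell_of(pos, cell=CELL):
    """Return the grid cell containing the local pixel position pos = (u, v)."""
    u, v = pos
    return (int(u // cell), int(v // cell))


def build_estimator(samples, cell=CELL):
    """Average the training sensor vectors per cell for ONE camera.

    samples: iterable of ((u, v), sensor_vector).
    Returns {cell: mean sensor vector}, which can be precomputed and stored.
    """
    sums = {}
    counts = defaultdict(int)
    for pos, sensor in samples:
        key = cell_of(pos, cell)
        if key not in sums:
            sums[key] = [0.0] * len(sensor)
        for i, s in enumerate(sensor):
            sums[key][i] += s
        counts[key] += 1
    return {k: [s / counts[k] for s in v] for k, v in sums.items()}


def estimate(table, pos, cell=CELL):
    """Estimated sensor data at pos, or None if the cell has no training data."""
    return table.get(cell_of(pos, cell))
```

Returning None for cells without training data is a design choice of this sketch; it makes visible why dense training coverage of the monitoring range is desirable.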

The specifying unit 15d specifies identification information identifying a person in a video based on the time-series change in the similarity between the calculated estimated value and the sensor data acquired at the same time as the video. Specifically, for the continuously acquired local pixel positions of a person j, the specifying unit 15d calculates the similarity between the estimated value calculated by the estimation unit 15c and the sensor data of a person k actually measured and acquired at the same time.

For the sensor data from the l radio signal transmitters B installed within the monitoring range R, the specifying unit 15d calculates as the similarity, for example, the distance in l-dimensional space between the measured and estimated values of the sensor data.

The specifying unit 15d then determines, based on how the calculated similarity changes over time, whether person j and person k are the same person and, if so, specifies the identification information of person k as the identification information of person j. For example, the specifying unit 15d determines that they are the same person when the mean similarity over a predetermined period exceeds a predetermined threshold.
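The text uses a distance as the similarity yet treats a high mean similarity as a match. To make the two consistent, the sketch below takes the negative Euclidean distance as the similarity, so that a larger value means a closer match. This sign convention, the function names, and the threshold value are assumptions.

```python
import math


def similarity(estimated, measured):
    """Negative Euclidean distance between two l-dimensional sensor vectors.

    Assumption: negating the distance so that higher means more similar.
    """
    return -math.dist(estimated, measured)


def same_person(est_series, meas_series, threshold):
    """Compare the mean similarity over a period against a threshold.

    est_series: estimated sensor vectors along person j's track in one video.
    meas_series: sensor vectors measured for person k at the same times.
    """
    sims = [similarity(e, m) for e, m in zip(est_series, meas_series)]
    return bool(sims) and sum(sims) / len(sims) > threshold
```

For example, with a hypothetical threshold of -5.0, a track whose estimated vectors stay within about one unit of person k's measurements is judged to be the same person, while one that is tens of units away is not.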

The processing of the specifying unit 15d is described concretely with reference to FIG. 8. In the example of FIG. 8, the change over time of the estimated sensor data for person j resembles the change over time of the measured sensor data for person k1 more closely than that for person k2. Suppose that, over the period shown, the mean similarity between the estimated sensor data for person j and the measured sensor data for person k1 is greater than the mean similarity with the measured sensor data for person k2 and exceeds the predetermined threshold. In this case, person j and person k1 are determined to be the same person, and the identification information of person j is specified as being the same as that of person k1.

The specifying unit 15d performs the above processing for all combinations of person j and person k to specify the identification information of the persons in the video. If there is no person k determined to be the same person as person j, person j is determined to be an outsider. The specifying unit 15d also performs the above processing for every video n to specify the identification information of the persons in each video n.
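The matching rule described above (time-averaged similarity compared with a threshold, with unmatched persons treated as outsiders) can be sketched as follows. The data layout, the function names, and the choice to pick the candidate with the highest average when several exceed the threshold are assumptions for illustration; the text itself only gives the FIG. 8 example.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def identify_people(similarity_series, threshold):
    """Match each person j in a video to a sensor-identified person k.

    similarity_series: {j: {k: [similarity at t0, t1, ...]}} (assumed layout).
    Returns {j: k or None}; None marks person j as an outsider.
    """
    result = {}
    for j, per_candidate in similarity_series.items():
        best_k, best_avg = None, threshold
        for k, series in per_candidate.items():
            if not series:
                continue  # no observations in the period for this pair
            avg = mean(series)
            # Keep k only if its average exceeds the threshold and any
            # previously accepted candidate (tie-breaking is an assumption).
            if avg > best_avg:
                best_k, best_avg = k, avg
        result[j] = best_k
    return result
```

For example, a person whose average similarity to k1 is about 0.8 and to k2 is about 0.3 is matched to k1 under a threshold of 0.5, while a person whose averages all stay below 0.5 is reported as `None`, that is, an outsider.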

Returning to the description of FIG. 1, the determination unit 15e determines the identification information of a single person whose positions in the plurality of videos overlap when converted into positions within the monitoring range R. As that identification information, it selects, from among the identification information specified for the person in each video before the conversion, the identification information having a large number of duplicates.

Specifically, the processing of the determination unit 15e will be described with reference to FIGS. 9 and 10. The determination unit 15e refers to the conversion formula or conversion table in the storage unit 14 and converts the local pixel position xin of each person image in each video n, expressed in local pixel coordinates, into a global pixel position hi in global pixel coordinates within the monitoring range R.

In doing so, the determination unit 15e does not necessarily have to convert every local pixel coordinate point within the angle of view of each video into global pixel coordinates. It is sufficient to convert the local pixel positions of the person images; the other coordinate points may be filled in later.

As described above, the local pixel position of a person image in each video is a representative point such as the centroid of the head area of that person image, so the converted global pixel positions are not necessarily identical even for person images of the same person. Person images of the same person therefore overlap at converted global pixel positions that are either identical or close to one another, within a predetermined threshold distance.
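The conversion and overlap test described above can be sketched as follows. The source only refers to "a conversion formula or conversion table"; modeling it as a 3x3 planar homography per video sensor is an assumption, as are the function names and the simple greedy grouping rule.

```python
import math


def to_global(h, point):
    """Map a local pixel (x, y) to global pixel coordinates.

    h is a 3x3 homography given as nested lists (assumed form of the
    per-camera conversion formula).
    """
    x, y = point
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)


def group_nearby(points, max_dist):
    """Group labeled global positions that overlap.

    points: list of (label, (x, y)). A point joins the first group whose
    first member lies within max_dist; otherwise it starts a new group.
    """
    groups = []
    for label, pos in points:
        for group in groups:
            if math.dist(group[0][1], pos) <= max_dist:
                group.append((label, pos))
                break
        else:
            groups.append([(label, pos)])
    return groups
```

Each resulting group corresponds to one person in the monitoring range R, seen by one or more video sensors. Greedy grouping against the first member is the simplest option; a real system might use a proper clustering step instead.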

In the example shown in FIG. 9, the local pixel position x11 in video 1 of video sensor C1 and the local pixel position x12 in video 2 of video sensor C2 are converted to the vicinity of the global pixel position xh1. Likewise, the local pixel position x21 in video 1, the local pixel position x22 in video 2, and the local pixel position x23 in video 3 of video sensor C3 are converted to the vicinity of the global pixel position xh2.

Next, the determination unit 15e determines, as the identification information of the person at global pixel position xhi, the identification information having a large number of duplicates among the identification information that the specifying unit 15d specified for the person images in each video n. In the example shown in FIG. 9, the person h1 at global pixel position xh1 is specified in both video 1 and video 2 as the person with identification information ID1 who carries mobile terminal m1. In this case, as illustrated in FIG. 10, the determination unit 15e determines that the person h1 at global pixel position xh1 is the person with identification information ID1 who carries mobile terminal m1.

On the other hand, the person h2 at global pixel position xh2 is specified in video 1 and video 2 as the person with identification information ID2 who carries mobile terminal m2, but in video 3 as the person with identification information ID3 who carries mobile terminal m3. In this case, as illustrated in FIG. 10, the determination unit 15e determines that the person h2 at global pixel position xh2 is the person with identification information ID2 who carries mobile terminal m2. In this way, the determination unit 15e can specify the identification information of a person and track that person's movement trajectory.
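The two cases above (ID1 agreed on by both videos; ID2 chosen over ID3 by two videos to one) behave like a vote across videos. A minimal sketch follows, assuming the most frequent identifier wins. The function name is hypothetical, and the handling of ties and of videos that reported an outsider (`None`) is not specified in the text and is an assumption here.

```python
from collections import Counter


def decide_identity(per_video_ids):
    """Pick the identifier reported by the most videos for one global position.

    per_video_ids: identifiers specified in each video for the overlapping
    person images; None stands for "outsider / not identified" (assumed).
    Ties resolve to the identifier seen first, which is an arbitrary choice.
    """
    votes = Counter(i for i in per_video_ids if i is not None)
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

Applied to the FIG. 9 example, the inputs `["ID1", "ID1"]` and `["ID2", "ID2", "ID3"]` yield ID1 for person h1 and ID2 for person h2.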

The determination unit 15e also outputs the determined identification information of the person to the output unit 12. For example, the determination unit 15e performs control so that the identification information of each identified person is displayed on the video from each video sensor C, superimposed at the local pixel position of that person. This improves convenience for users such as an administrator who monitors an area that is off limits to everyone except authorized personnel.

[Monitoring process]
Next, the monitoring process performed by the monitoring device 1 according to the present embodiment will be described with reference to FIG. 11. FIG. 11 is a flowchart showing the monitoring processing procedure. The flowchart in FIG. 11 starts, for example, when an operation input instructing the start of the monitoring process is received.

The acquisition unit 15a acquires a plurality of videos of the predetermined monitoring range R captured from different locations at the same time, together with sensor data in which persons within the monitoring range R are identified and detected (step S1).

Next, the training unit 15b generates training data 14a that combines the position of the same person shown in each of the acquired videos with the sensor data, acquired at the same time, in which that person was detected (step S2).

After that, the acquisition unit 15a acquires, in a continuous time series, a plurality of videos of the predetermined monitoring range R captured from different locations at the same time, together with sensor data in which persons within the monitoring range R are identified and detected (step S3).

The estimation unit 15c then calculates, based on the training data 14a, an estimated value of the sensor data at the position of each person shown in each of the acquired videos. The specifying unit 15d specifies the identification information of the persons in each video based on the time-series change in the similarity between the calculated estimated values and the sensor data acquired at the same time as the video (step S4).
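One way to obtain the estimated values, stated in the claims of this document, is to divide positions in the video into cells and use the average of the training sensor data in each cell as the estimate for any position in that cell. The following is a minimal sketch of that idea; the square cell shape, the `cell_size` parameter, the data layout, and the function names are assumptions for illustration.

```python
from collections import defaultdict


def cell_of(position, cell_size):
    """Index of the square cell (assumed shape) containing a pixel position."""
    x, y = position
    return (int(x // cell_size), int(y // cell_size))


def build_cell_estimates(training_data, cell_size):
    """Average the training sensor vectors per cell.

    training_data: iterable of ((x, y), [value per transmitter]) pairs,
    an assumed layout for training data 14a.
    """
    sums = {}
    counts = defaultdict(int)
    for position, values in training_data:
        cell = cell_of(position, cell_size)
        if cell not in sums:
            sums[cell] = [0.0] * len(values)
        for idx, value in enumerate(values):
            sums[cell][idx] += value
        counts[cell] += 1
    return {cell: [s / counts[cell] for s in total] for cell, total in sums.items()}


def estimate(estimates, position, cell_size):
    """Estimated sensor vector at a position, or None if the cell has no data."""
    return estimates.get(cell_of(position, cell_size))
```

Cells with no training samples return `None` in this sketch; how such cells should be handled is not stated in this section.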

The determination unit 15e then determines the identification information of a single person whose positions in the plurality of videos overlap when converted into global pixel positions within the monitoring range R, selecting, from among the identification information specified for the person in each video before the conversion, the identification information having a large number of duplicates (step S5). The determination unit 15e also superimposes the determined identification information on each video and outputs it to the output unit 12, so that persons within the monitoring range R are identified and their behavior is monitored. This completes the series of monitoring processes.

As described above, in the monitoring device 1 of the present embodiment, the acquisition unit 15a acquires a plurality of videos of the predetermined monitoring range R captured from different locations at the same time, together with sensor data in which persons within the monitoring range R are identified and detected. The training unit 15b generates training data 14a that combines the position of the same person shown in each of the acquired videos with the sensor data, acquired at the same time, in which that person was detected.

The estimation unit 15c calculates, based on the training data 14a, an estimated value of the sensor data at the position of each person shown in each of the videos acquired in a continuous time series. The specifying unit 15d specifies the identification information of the persons in each video based on the time-series change in the similarity between the calculated estimated values and the sensor data acquired at the same time as the video. The determination unit 15e determines the identification information of a single person whose positions in the plurality of videos overlap when converted into global pixel positions within the monitoring range R, selecting, from among the identification information specified for the person in each video before the conversion, the identification information having a large number of duplicates.

As a result, persons in videos captured from multiple viewpoints can be easily identified and their behavior monitored, in combination with a sensor that identifies persons only ambiguously. For example, if the same person were identified based on the combination of each person's positions across the videos of multiple cameras, the amount of computation would be on the order of the "number of detected persons" raised to the power of the "number of cameras". That is, the amount of computation grows exponentially as the number of cameras increases. By contrast, the monitoring process of the present embodiment identifies persons within each video, so persons can be identified with an amount of computation on the order of the product of the "number of detected persons" and the "number of cameras".
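The difference between the two orders of growth can be made concrete with a small calculation. The function names and the example figures (10 persons, 4 cameras) are illustrative assumptions, and the counts are order-of-magnitude stand-ins rather than exact operation counts.

```python
def joint_matching_cost(people, cameras):
    """Order of work when matching combinations of positions across all cameras."""
    return people ** cameras


def per_video_cost(people, cameras):
    """Order of work when persons are identified independently in each video."""
    return people * cameras


# Illustrative figures: 10 detected persons seen by 4 cameras.
joint = joint_matching_cost(10, 4)      # 10 ** 4
independent = per_video_cost(10, 4)     # 10 * 4
```

Adding a fifth camera multiplies the first figure by the number of persons, while the second figure grows only by one more multiple of the number of persons.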

The acquisition unit 15a also acquires videos of a person within the monitoring range R captured from a plurality of locations that differ in shooting direction relative to the person and in distance to the person. As a result, the shooting ranges of the video sensors C overlap, and a plurality of videos of a person in the overlapping range, captured by a plurality of video sensors C, can be acquired. Even when an obstacle between some of the video sensors C and a person makes it difficult to track the person in those videos, the videos captured by the other video sensors C can be used to keep the person detection accuracy from degrading. Similarly, even when a person is so close to some of the video sensors C that the person moves quickly in the video and the detection accuracy for that person in the video drops, the videos captured by the other video sensors C can be used to keep the person identification accuracy from degrading. Conversely, even when a person is so far from some of the video sensors C that the person moves slowly in the video and the detection accuracy for that person in the video drops, the videos captured by the other video sensors C can be used to keep the person identification accuracy from degrading. In this way, persons can be identified with high accuracy.

The determination unit 15e also outputs the identification information of the identified person to the output unit 12. For example, it performs control so that the identification information of the person is displayed superimposed at the position of the identified person. This improves convenience for users such as an administrator who monitors an area that is off limits to everyone except authorized personnel.

[Program]
A program in which the processing executed by the monitoring device 1 according to the above embodiment is written in a computer-executable language can also be created. In one embodiment, the monitoring device 1 can be implemented by installing a monitoring program that executes the above monitoring process on a desired computer as packaged software or online software. For example, an information processing apparatus can be made to function as the monitoring device 1 by causing it to execute the monitoring program. The information processing apparatus referred to here includes desktop and notebook personal computers. It also includes mobile communication terminals such as smartphones, mobile phones, and PHS (Personal Handyphone System) terminals, as well as slate terminals such as PDAs (Personal Digital Assistants).

The monitoring device 1 can also be implemented as a server device that treats a terminal device used by a user as a client and provides the client with services related to the above monitoring process. For example, the monitoring device 1 is implemented as a server device that provides a monitoring processing service which takes as input a plurality of videos of a predetermined monitoring range and the sensor data received by a beacon receiving system, and outputs the positions and identification information of the identified persons. In this case, the monitoring device 1 may be implemented as a Web server, or as a cloud that provides the services related to the above monitoring process by outsourcing. An example of a computer that executes a monitoring program realizing the same functions as the monitoring device 1 is described below.

As shown in FIG. 12, a computer 1000 that executes the monitoring program includes, for example, a memory 1010, a CPU 1020, a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These units are connected by a bus 1080.

The memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1031. The disk drive interface 1040 is connected to a disk drive 1041. A removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1041. A mouse 1051 and a keyboard 1052, for example, are connected to the serial port interface 1050. A display 1061, for example, is connected to the video adapter 1060.

As shown in FIG. 12, the hard disk drive 1031 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. Each table described in the above embodiment is stored, for example, in the hard disk drive 1031 or the memory 1010.

The monitoring program is stored in the hard disk drive 1031, for example, as a program module 1093 in which the instructions to be executed by the computer 1000 are written. Specifically, a program module 1093 describing each process executed by the monitoring device 1 described in the above embodiment is stored in the hard disk drive 1031.

Data used for information processing by the monitoring program is stored as program data 1094, for example, in the hard disk drive 1031. The CPU 1020 reads the program module 1093 and the program data 1094 stored in the hard disk drive 1031 into the RAM 1012 as necessary and executes each of the procedures described above.

Note that the program module 1093 and the program data 1094 relating to the monitoring program are not limited to being stored in the hard disk drive 1031. For example, they may be stored in a removable storage medium and read by the CPU 1020 via the disk drive 1041 or the like. Alternatively, they may be stored in another computer connected via a network such as a LAN or a WAN (Wide Area Network) and read by the CPU 1020 via the network interface 1070.

Although an embodiment to which the invention made by the present inventors is applied has been described above, the present invention is not limited by the description and drawings that form part of the disclosure of the present invention according to this embodiment. That is, other embodiments, examples, operational techniques, and the like made by those skilled in the art on the basis of this embodiment are all included in the scope of the present invention.

DESCRIPTION OF SYMBOLS
1 Monitoring device
11 Input unit
12 Output unit
13 Communication control unit
14 Storage unit
14a Training data
15 Control unit
15a Acquisition unit
15b Training unit
15c Estimation unit
15d Specifying unit
15e Determination unit
B Radio signal transmitter
C Video sensor
R Monitoring range
h Person

Claims

A monitoring device comprising:
an acquisition unit that acquires a plurality of videos of a predetermined monitoring range captured from different locations at the same time, and sensor data in which a person within the monitoring range is identified and detected;
a training unit that generates training data combining the position of the same person shown in each of the acquired videos with the sensor data, acquired at the same time, in which the person was detected;
an estimation unit that calculates, based on the training data, an estimated value of the sensor data at the position of the person shown in each of the videos acquired in a continuous time series;
a specifying unit that specifies identification information for identifying a person in the video based on a time-series change in similarity between the calculated estimated value and the sensor data acquired simultaneously with the acquired video; and
a determination unit that determines, as the identification information of the same person whose positions overlap when the position of the person shown in each of the plurality of videos is converted into a position within the monitoring range, the identification information having a large number of duplicates among the identification information of the person specified in each video before the conversion.

The monitoring device according to claim 1, wherein the acquisition unit acquires videos of a person within the monitoring range captured from a plurality of locations that differ in shooting direction relative to the person and in distance to the person.

The monitoring device according to claim 1, further comprising an output unit that presents the identification information of the person determined by the determination unit.

The monitoring device according to claim 1, wherein the sensor data indicates the strength of a radio signal received from a radio signal transmitter by a mobile terminal carried by a person within the monitoring range.

The monitoring device according to any one of claims 1 to 4, wherein the estimation unit divides positions in the video into a plurality of cells and calculates the average value of the sensor data of the training data at positions belonging to the same cell as the estimated value of the sensor data at a position belonging to that cell.

A monitoring method executed by a monitoring device, the method comprising:
an acquisition step of acquiring, in time series, a plurality of videos of a predetermined monitoring range captured from different locations at the same time, and sensor data in which a person within the monitoring range is identified and detected;
a training step of generating training data combining the position of the same person shown in each of the acquired videos with the sensor data, acquired at the same time, in which the person was detected;
an estimation step of calculating, based on the training data, an estimated value of the sensor data at the position of the person shown in each of the acquired videos;
a specifying step of specifying identification information for identifying a person in the video based on a time-series change in similarity between the calculated estimated value and the sensor data acquired simultaneously with the acquired video; and
a determination step of determining, as the identification information of the same person whose positions overlap when the position of the person shown in each of the plurality of videos is converted into a position within the monitoring range, the identification information having a large number of duplicates among the identification information of the person specified in each video before the conversion.