JP6679349B2

JP6679349B2 - Information processing apparatus, information processing method, and program

Info

Publication number: JP6679349B2
Application number: JP2016044760A
Authority: JP
Inventors: 梅田　一郎; 矢野　光太郎; 内山　寛之
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2016-03-08
Filing date: 2016-03-08
Publication date: 2020-04-15
Anticipated expiration: 2036-03-08
Also published as: JP2017163279A

Description

The present invention relates to an information processing apparatus, an information processing method, and a program for detecting an object from a captured image.

There are techniques for detecting an object in a defined space using a plurality of imaging devices such as cameras. Examples include techniques for monitoring an object with a plurality of imaging devices in a monitored space and techniques for tracking an object with a plurality of imaging devices in a monitored space.
Methods have been proposed in which an object is detected by a plurality of imaging devices and attributes of the object, such as its position in coordinates common to all of the imaging devices (for example, world coordinates), are estimated.
Non-Patent Document 1 discloses a computation in which an object is imaged by cameras whose world coordinates are known, the coordinates of the object on the image captured by each camera are obtained, and the world coordinates of the object are then obtained by the principle of triangulation. Further, Patent Document 1, for example, discloses a method of obtaining a moving-body region with each of a plurality of cameras and estimating three-dimensional coordinates on the assumption that the height of the object is known.
Non-Patent Document 2 discloses a method of obtaining the coordinates of an object on an image captured by a camera. Specifically, it discloses a method of identifying, for every partial image of the captured image, whether or not that partial image shows a human body. The amount of computation needed to obtain the coordinates of a human body on the captured image is therefore proportional to the number of partial images, that is, to the size of the input image. To reduce this amount of computation, Patent Document 2 discloses a method of restricting the region of the image from which partial images are selected.

Patent Document 1: Japanese Patent No. 5263694
Patent Document 2: Japanese Unexamined Patent Application Publication No. 2007-233517

Non-Patent Document 1: Richard Hartley and Andrew Zisserman (2013), Multiple View Geometry in Computer Vision, Second Edition, Cambridge University Press
Non-Patent Document 2: Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005

Conventionally, when objects are detected using a plurality of imaging devices, detection is performed over the entire range of each image captured by each imaging device. The processing load, such as the amount of computation required for object detection, is therefore proportional to the total number of pixels in the captured images of all the cameras. In addition, the number of imaging devices used for object detection is roughly proportional to the size of the space in which objects are to be detected. Consequently, the larger the space to be covered, the greater the processing load of object detection.
An object of the present invention is to reduce the load of object detection processing.

The information processing apparatus of the present invention includes: input means for displaying an image of a two-dimensional map that gives an overhead view of the imaging ranges of a plurality of imaging devices in a common coordinate system common to the plurality of imaging devices, and for inputting, based on a user operation on the image, a region of the two-dimensional map in which an object first appears; initial region determination means for determining, based on the input region in which the object first appears, the imaging device among the plurality of imaging devices that acquires the image in which the object should first be detected, and the region of that image in which the object should first be detected; first detection means for detecting the object, in an image captured by the imaging device determined by the initial region determination means, from the region determined by the initial region determination means as the region in which the object should first be detected; acquisition means for acquiring, based on the captured image in which the object was detected, coordinates representing the position of the object in the common coordinate system; prediction means for predicting, based on the coordinates, a predicted value of the coordinates representing the position of the object in the common coordinate system at a future time point at which a set period has elapsed from the current time point; estimation means for determining, based on the imageable range of each of the plurality of imaging devices and the predicted value, an imaging device among the plurality of imaging devices that can image the object at the future time point, and for estimating a region in which the object may be imaged by the determined imaging device; and second detection means for detecting the object, at the future time point, from the estimated region in an image captured by the imaging device determined by the estimation means.

According to the present invention, the load of object detection processing can be reduced.

A diagram showing an example of the system configuration of the monitoring system.
A diagram showing an example of the hardware configuration of the information processing device.
A diagram showing an example of the functional configuration of the information processing device.
A flowchart showing an example of the object tracking process.
A diagram showing an example of captured images.
A diagram explaining the tracking of a pedestrian.
A diagram showing an example of a captured image.
A diagram explaining a region in which an object is predicted to exist.
A diagram showing an example of a designation screen.
A diagram explaining a region in which an object can appear.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

<Embodiment 1>
This embodiment describes a process in which a monitoring system uses a plurality of cameras with overlapping fields of view to track an object, such as a person, moving in a monitoring space. The monitoring system tracks the object by detecting it in the image captured by each camera, acquiring its coordinates in the world coordinate system, and repeating these steps to obtain the trajectory of the object. The world coordinate system is an example of a common coordinate system shared by all cameras. It is a three-dimensional coordinate system in which one axis is perpendicular to the floor and the remaining two axes are parallel to the floor.
In this embodiment, the floor of the monitoring space is assumed to be flat in order to simplify the description. The object tracked by the monitoring system is assumed to be a walking person (pedestrian), but it is not limited to a pedestrian. It may be a person who has stopped to do some work, a person running, a person in a wheelchair or the like, an animal such as a dog, cat, cow, pig, or chicken, a drone, or the like.

FIG. 1 is a diagram illustrating an example of the system configuration of the monitoring system. The monitoring system of this embodiment includes an information processing device 341 and a plurality of cameras such as cameras 311 to 313. The information processing device 341 and the plurality of cameras are connected to one another via a network 331. The information processing device 341 acquires captured image data from each of the cameras via the network 331.
The information processing device 341 is an information processing device, such as a PC or a server device, that acquires captured images from the plurality of cameras in the monitoring system, detects an object, and tracks the detected object.
Each of the cameras in the monitoring system, such as the cameras 311 to 313, can be connected to a network and can transmit captured image data over the network to an external device such as the information processing device 341. In this embodiment, the cameras in the monitoring system share common time information. Each camera associates the time at which a captured image was taken with that image and transmits both to the information processing device 341. By checking the time information associated with a captured image, the information processing device 341 can determine when the image from each camera was captured. Further, when the time information associated with captured images from different cameras indicates the same time point, the information processing device 341 can determine that those images show the monitoring space 301 at the same time point.

FIG. 1 shows how the cameras in the monitoring system are installed, the monitoring space 301 that is the space to be monitored, and so on. The monitoring space 301 is the space monitored by the monitoring system and is shown in the example of FIG. 1 as the space enclosed by a dotted line. The cameras in the monitoring system are installed so that the union of their fields of view contains the monitoring space 301 and so that their fields of view overlap. Alternatively, the monitoring space 301 may be defined as the union of the imaging ranges of all the cameras in the monitoring system.
A pedestrian entering or leaving the monitoring space 301 necessarily passes through a region through which the monitoring space 301 can be entered or left. The new object appearance spaces 302, 303, and 304 are spaces set in such regions. In the example of FIG. 1, they are shown as spaces enclosed by solid lines inside the monitoring space 301.
The cameras in the monitoring system are installed with overlapping fields of view in order to monitor the monitoring space 301. The field of view of the camera 311 is represented as a fan-shaped region centered on the camera 311 and is shown in the example of FIG. 1 as the fan-shaped region 321 drawn with a broken line. Similarly, the fields of view of the cameras 312 and 313 are shown as regions 322 and 323, respectively.
In this embodiment, the monitoring system tracks the pedestrian 351 by obtaining the trajectory of the coordinates of the pedestrian 351 as the pedestrian moves in the monitoring space 301.
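The fan-shaped fields of view described above lend themselves to a simple membership test on the floor plane. The following is a minimal sketch, not the patented implementation: the `Camera` fields (`heading_deg`, `fov_deg`, `max_range`), the function names, and all numeric values are assumptions introduced here for illustration, and occlusion by walls is ignored.

```python
import math
from dataclasses import dataclass


@dataclass
class Camera:
    """Hypothetical description of a camera on the floor plane (world x, y)."""
    x: float
    y: float
    heading_deg: float  # direction of the optical axis, measured from the +x axis
    fov_deg: float      # full opening angle of the fan-shaped field of view
    max_range: float    # radius of the fan, in the same unit as x and y


def in_field_of_view(cam: Camera, px: float, py: float) -> bool:
    """Return True if the floor point (px, py) lies inside the camera's fan."""
    dx, dy = px - cam.x, py - cam.y
    distance = math.hypot(dx, dy)
    if distance > cam.max_range:
        return False
    if distance == 0.0:
        return True
    bearing = math.degrees(math.atan2(dy, dx))
    # Smallest signed angle between the bearing and the optical axis.
    diff = (bearing - cam.heading_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= cam.fov_deg / 2.0


def cameras_covering(cams: dict, px: float, py: float) -> list:
    """Return the sorted names of all cameras whose fan contains (px, py)."""
    return sorted(name for name, cam in cams.items()
                  if in_field_of_view(cam, px, py))


# Assumed layout, loosely modelled on cameras 311 to 313 of FIG. 1.
cams = {
    "311": Camera(0.0, 0.0, 0.0, 90.0, 10.0),
    "312": Camera(10.0, 0.0, 180.0, 90.0, 10.0),
    "313": Camera(0.0, 10.0, -90.0, 60.0, 5.0),
}
print(cameras_covering(cams, 5.0, 0.0))
```

A point that several fans contain is one that several cameras can observe at once, which is the situation the overlapping installation is meant to create.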

FIG. 2 is a diagram illustrating an example of the hardware configuration of the information processing device 341. The information processing device 341 includes a communication unit 201, a CPU 202, a RAM 203, an auxiliary storage device 204, and a user interface 205. These components are connected to one another via a system bus 206.
The communication unit 201 receives captured image data and the like from the cameras in the monitoring system, such as the cameras 311 to 313, via the network 331. The communication unit 201 can also transmit instructions, such as an imaging instruction or a stop instruction, to each of the cameras.
The CPU 202 is a central processing unit that executes programs, such as a control program, stored in the RAM 203, the auxiliary storage device 204, or the like, and controls the information processing device 341.
The RAM 203 is a storage device that functions as a work area for the CPU 202 and as a temporary save area for data.

The auxiliary storage device 204 is a storage device that stores programs such as a control program, various parameter data, images, various kinds of setting information, and the like.
The user interface 205 is an interface used to connect input devices and output devices, such as a keyboard, a mouse, a display, and a touch panel, that display information to the user and accept information input from the user. Such input and output devices are connected to the user interface 205.
In this embodiment, the functions of the information processing device 341 described later with reference to FIG. 3 and the processing of the flowchart described later with reference to FIG. 4 are realized by the CPU 202 executing processing based on a program stored in the RAM 203 or the auxiliary storage device 204.

FIG. 3 is a diagram illustrating an example of the functional configuration of the information processing device 341. The information processing device 341 includes an image acquisition unit 101, an attribute acquisition unit 102, an attribute update unit 103, a common attribute estimation unit 104, a common attribute prediction unit 105, a device determination unit 106, a region estimation unit 107, a space input unit 110, and a region determination unit 111.
The image acquisition unit 101 acquires captured images from the cameras in the monitoring system, such as the cameras 311 to 313.
The attribute acquisition unit 102 detects, from the captured images acquired by the image acquisition unit 101, an object that has newly appeared in the monitoring space 301. In the following, an object that newly appears in the monitoring space 301 is referred to as a new object. The attribute acquisition unit 102 then acquires the coordinates of the detected object in the coordinate system of the captured image. Coordinates in the coordinate system of a captured image are an example of an attribute that is specific to each camera in the monitoring system.
The common attribute estimation unit 104 estimates the coordinates of the tracked object in the world coordinate system based on the coordinates of that object in the coordinate system of each camera's captured image, as acquired by the attribute acquisition unit 102 or by the attribute update unit 103 described later. Coordinates in the world coordinate system are an example of a common attribute, that is, an attribute common to all the cameras in the monitoring system. The value of a common attribute for an object is referred to as a common attribute value. In other words, the coordinates of an object in the world coordinate system are an example of a common attribute value of the object.

The common attribute prediction unit 105 predicts the coordinates of the tracked object in the world coordinate system at the time point at which a set period has elapsed from the current time point, based on the coordinates of the object in the world coordinate system estimated by the common attribute estimation unit 104. The common attribute prediction unit 105 may also perform this prediction based on both the coordinates estimated by the common attribute estimation unit 104 and past coordinates of the object in the world coordinate system. That is, the common attribute prediction unit 105 may predict the coordinates of the object in the world coordinate system at the time point at which the set period has elapsed from the current time point, based on the estimated current coordinates and the past coordinates of the object in the world coordinate system.
The device determination unit 106 estimates which of the cameras in the monitoring system can image the object, based on the coordinates of the tracked object in the world coordinate system predicted by the common attribute prediction unit 105. Each camera estimated by the device determination unit 106 to be able to image the object captures a new image, and the object is detected from that captured image.
The region estimation unit 107 performs the following processing for each camera estimated by the device determination unit 106 to be able to image the object, based on the coordinates of the tracked object in the world coordinate system predicted by the common attribute prediction unit 105. Namely, the region estimation unit 107 estimates the region of the captured image in which the object may appear. In the following, the region estimated by the region estimation unit 107 is referred to as a tracking region.
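The prediction and region estimation steps above can be illustrated with a short sketch. The text here does not specify a motion model, so the constant-velocity extrapolation below is an assumption, as are the 3x4 pinhole projection matrix `P`, the pixel `margin`, and all function names. It is meant only to show how a predicted world coordinate could be turned into a tracking region on one camera's image.

```python
def predict_position(track, horizon):
    """Linearly extrapolate the latest position `horizon` seconds ahead.

    `track` is a chronological list of (t, (x, y, z)) samples in the
    world coordinate system. Assumes constant velocity between the last
    two samples (an assumption of this sketch).
    """
    t1, p1 = track[-1]
    if len(track) < 2:
        return p1  # no past sample: assume the object stays where it is
    t0, p0 = track[-2]
    dt = t1 - t0
    if dt <= 0:
        return p1
    return tuple(b + (b - a) / dt * horizon for a, b in zip(p0, p1))


def project(P, point):
    """Project a world point with a 3x4 pinhole projection matrix P."""
    X = (*point, 1.0)
    u, v, w = (sum(row[i] * X[i] for i in range(4)) for row in P)
    if w <= 0:
        return None  # the point is behind the camera
    return (u / w, v / w)


def tracking_region(P, point, margin, width, height):
    """Box of +/- margin pixels around the projection, clipped to the image.

    Returns (left, top, right, bottom), or None when this camera cannot
    image the predicted position.
    """
    uv = project(P, point)
    if uv is None:
        return None
    u, v = uv
    left, top = max(0, int(u - margin)), max(0, int(v - margin))
    right, bottom = min(width, int(u + margin)), min(height, int(v + margin))
    if left >= right or top >= bottom:
        return None  # the predicted position falls outside this image
    return (left, top, right, bottom)


# Assumed example values.
track = [(0.0, (0.0, 0.0, 1.7)), (0.5, (0.5, 0.0, 1.7))]
P = [[100.0, 0.0, 320.0, 0.0],
     [0.0, 100.0, 240.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
print(predict_position(track, 0.5))
print(tracking_region(P, (1.0, 0.0, 2.0), 50, 640, 480))
```

A camera for which `tracking_region` returns `None` would simply be skipped at that time point, which corresponds to the role of the device determination unit 106 in this sketch.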

The attribute update unit 103 detects the tracked object within the tracking region estimated by the region estimation unit 107 in the image captured by a camera that the device determination unit 106 estimated to be able to image the object, and acquires the coordinates of the object in the captured image.
The space input unit 110 determines a new object appearance space, which is a space through which an object must pass in order to enter the monitoring space, based on a user operation performed through the input/output devices and the user interface 205. Through the input/output devices and the user interface 205, the user inputs the walls in the monitoring space, the installation coordinates and related parameters of the cameras, the spaces in which objects newly appear, and so on.
Based on the new object appearance space determined by the space input unit 110, the region determination unit 111 determines a camera that can detect an object newly appearing in the monitoring space and the region of the image captured by that camera in which the new object may appear.
The element group 121 in FIG. 3 is the group of functional components whose processing is repeated in a loop in this embodiment.

FIG. 4 is a flowchart showing an example of the object tracking process in this embodiment. The process of FIG. 4 is described using as an example a situation in which the pedestrian 351 enters the monitoring space 301 from outside and moves within it.
In S401, the image acquisition unit 101 acquires captured images taken at the same time point from the cameras in the monitoring system, such as the cameras 311 to 313. Alternatively, the image acquisition unit 101 may acquire captured images taken within a set period (for example, a period 0.1 seconds wide centered on a certain time point) from the cameras in the monitoring system.
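The selection of frames within a set period in S401 can be sketched as follows. This is only an illustration under stated assumptions: the `(camera_id, timestamp, image)` tuple layout, the function name `frames_at`, and the choice to keep the frame closest to the center of the window are not taken from the description.

```python
def frames_at(frames, center, width=0.1):
    """Pick, per camera, the frame closest to `center` within the window.

    `frames` is an iterable of (camera_id, timestamp, image) tuples whose
    timestamps come from the time information shared by the cameras.
    Returns {camera_id: (timestamp, image)} for cameras that have a frame
    inside the window of `width` seconds centered on `center`.
    """
    half = width / 2.0
    best = {}
    for cam, ts, img in frames:
        offset = abs(ts - center)
        if offset > half:
            continue  # outside the set period
        if cam not in best or offset < abs(best[cam][0] - center):
            best[cam] = (ts, img)
    return best


# Assumed example: image payloads are replaced by short labels.
frames = [
    ("311", 10.00, "a"),
    ("311", 10.03, "b"),
    ("312", 10.06, "c"),
    ("313", 10.20, "d"),
]
selected = frames_at(frames, center=10.02)
print(sorted(selected))
```

Frames selected this way can be treated as views of the monitoring space 301 at approximately the same time point.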

In S402, the attribute acquisition unit 102 detects the pedestrian 351, who has newly appeared in the monitoring space 301, from the captured images acquired in S401, and acquires the coordinates of the pedestrian 351 in the captured images.
In this embodiment, the attribute acquisition unit 102 performs the process of detecting a new object appearing in the monitoring space 301 on captured images acquired from preset cameras. Alternatively, the attribute acquisition unit 102 may perform this process on captured images acquired from all the cameras in the monitoring system.
In addition, the attribute acquisition unit 102 performs the process of detecting a new object appearing in the monitoring space 301 on a preset region of the camera's captured image. This preset region is a region of the captured image in which a new object can appear and is hereinafter referred to as a new object attribute acquisition region. The new object attribute acquisition region corresponds to the region of the camera's captured image in which the new object appearance space described above appears. Alternatively, the attribute acquisition unit 102 may perform the process of detecting an object newly appearing in the monitoring space 301 on the entire area of the camera's captured image.
In the example of FIG. 1, the cameras that capture the new object appearance space 302 are the camera 311 and the camera 312. FIG. 5 shows an example of the images captured by the cameras 311, 312, and 313.

FIG. 5 is a diagram showing an example of images captured by the cameras 311, 312, and 313. The captured images 411, 412, and 413 in FIG. 5 are examples of images captured by the cameras 311, 312, and 313, respectively. The new object attribute acquisition regions 421 and 422 are the new object attribute acquisition regions on the captured images 411 and 412, respectively, and are shown as regions enclosed by broken lines. When the pedestrian 351 who has entered the monitoring space 301 is inside the new object appearance space 302, the pedestrian appears in the new object attribute acquisition regions 421 and 422.
The attribute acquisition unit 102 detects the pedestrian 351 in the new object attribute acquisition region 421 and acquires the coordinates of the pedestrian 351 in the captured image 411. The attribute acquisition unit 102 also detects the pedestrian 351 in the new object attribute acquisition region 422 and acquires the coordinates of the pedestrian 351 in the captured image 412.
The camera 313 does not capture any of the new object appearance spaces 302 to 304, so no new object appears in the captured image 413. The attribute acquisition unit 102 therefore does not perform the process of detecting the pedestrian 351 on the captured image 413.
One way for the attribute acquisition unit 102 to acquire the coordinates of the pedestrian 351 from a camera's captured image is the method described in Non-Patent Document 2, which identifies, for a partial image of the captured image, whether or not that partial image shows a human body. The attribute acquisition unit 102 selects the partial images to be classified as human body (pedestrian 351) or not from within the new object attribute acquisition region. In this embodiment, the coordinates of the pedestrian 351 on the captured image are the coordinates of the center of the circle that approximates the head of the pedestrian 351.
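Restricting the candidate partial images to the new object attribute acquisition region can be sketched as a sliding-window scan over that region only. In the sketch below, `classify` is a hypothetical callable standing in for a human/non-human classifier such as the one of Non-Patent Document 2. The window size, stride, image size, and region coordinates are likewise assumptions chosen for illustration.

```python
def windows_in_region(region, win_w, win_h, stride):
    """Yield (left, top, right, bottom) windows lying fully inside `region`."""
    left, top, right, bottom = region
    for y in range(top, bottom - win_h + 1, stride):
        for x in range(left, right - win_w + 1, stride):
            yield (x, y, x + win_w, y + win_h)


def detect_in_region(image, region, classify, win_w=64, win_h=128, stride=8):
    """Run `classify` only on windows inside `region`.

    `classify(image, window)` is a hypothetical stand-in for a classifier
    that returns True when the window shows a human body. Returns the list
    of windows classified as human.
    """
    return [win for win in windows_in_region(region, win_w, win_h, stride)
            if classify(image, win)]


# Assumed sizes: a 640x480 image and a 200x200 acquisition region.
full_image = (0, 0, 640, 480)
acquisition_region = (100, 100, 300, 300)
n_full = sum(1 for _ in windows_in_region(full_image, 64, 128, 8))
n_region = sum(1 for _ in windows_in_region(acquisition_region, 64, 128, 8))
print(n_full, n_region)
```

With these assumed sizes the scan over the acquisition region evaluates far fewer windows than a scan over the whole image, which is the source of the reduction in computation. The head-center coordinates used in this embodiment would then be derived from each window classified as human.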

In the conventional technique, when the trajectory of an object moving in the monitoring space 301 is obtained, the amount of computation needed to detect an object newly appearing in the monitoring space 301 is proportional to the area of the monitoring space 301. In this embodiment, the amount of computation needed for the attribute acquisition unit 102 to detect an object newly appearing in the monitoring space 301 is proportional to the width of the regions through which an object can pass when entering or leaving the monitoring space 301, and is thus reduced compared with the conventional technique.
The reason is as follows. The attribute acquisition unit 102 performs detection only on the new object attribute acquisition region, and the size of the new object attribute acquisition region is proportional to the size of the new object appearance space. The new object appearance space is set so that any object newly appearing in the monitoring space 301 necessarily passes through it. The size of the new object appearance space is therefore proportional to the width of the regions through which an object can pass when entering or leaving the monitoring space 301. In the example of FIG. 1, the sum of the areas of the new object appearance spaces 302, 303, and 304 is proportional to the width of the regions through which an object can pass when entering or leaving the monitoring space 301.

In S403, the common attribute estimation unit 104 estimates the coordinates 362 of the pedestrian 351 in the world coordinate system based on the coordinates of the pedestrian 351 in the captured images acquired in S402 or S408. The coordinates of the pedestrian 351 in the world coordinate system estimated in S403 are an example of a common attribute value of the pedestrian 351 for the common attribute.
One method of obtaining the coordinates of the pedestrian 351 in the world coordinate system from its coordinates in the coordinate systems of the images captured by the plurality of cameras is triangulation based on epipolar geometry, as described in Non-Patent Document 1. The coordinates of each camera in the world coordinate system, the inclination of its optical axis, its angle of view, and similar information are assumed to be known, for example because they were surveyed when the camera was installed. In the present embodiment, the coordinates of the pedestrian 351 in the world coordinate system are the coordinates of the center of the sphere obtained when the head of the pedestrian 351 is approximated by a sphere.
In S404, the common attribute estimation unit 104 registers the coordinates of the pedestrian 351 in the world coordinate system estimated in S403 in the trajectory information of the pedestrian 351. Each time the process reaches S404, the common attribute estimation unit 104 stores the coordinates estimated in S403 in chronological order, thereby generating the trajectory information of the pedestrian 351. Within the trajectory information, everything other than the coordinates added in the most recent execution of S404 represents past coordinates of the pedestrian 351 in the world coordinate system. That is, this information is an example of past attribute values of the pedestrian 351 for the common attribute.
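The estimation in S403 and the registration in S404 can be illustrated with a minimal sketch. It does not reproduce the epipolar-geometry method of Non-Patent Document 1; as a simplifying assumption, it substitutes a least-squares intersection of viewing rays, where each ray is assumed to have already been back-projected from an image coordinate using the known camera pose. The names `solve3`, `triangulate`, `rays`, and `trajectory` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def solve3(a: List[List[float]], b: List[float]) -> Vec3:
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("rays are parallel; the position is not determined")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return (m[0][3] / m[0][0], m[1][3] / m[1][1], m[2][3] / m[2][2])


def triangulate(rays: Sequence[Tuple[Vec3, Vec3]]) -> Vec3:
    """Return the point minimizing the summed squared distance to all rays.

    Each ray is (camera_center, direction) in world coordinates.
    Assumption: this replaces the epipolar method with a simpler one.
    """
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for center, direction in rays:
        norm = sum(d * d for d in direction) ** 0.5
        d = [x / norm for x in direction]
        for i in range(3):
            for j in range(3):
                p = (1.0 if i == j else 0.0) - d[i] * d[j]  # I - d d^T
                a[i][j] += p
                b[i] += p * center[j]
    return solve3(a, b)


# Two hypothetical cameras whose viewing rays meet at (1, 1, 1).
rays = [((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)),
        ((2.0, 0.0, 0.0), (-1.0, 1.0, 1.0))]
# S404: time-stamped coordinates kept in chronological order.
trajectory = [(0.0, triangulate(rays))]
```

With noisy detections the rays no longer intersect exactly, and the least-squares point serves as the estimate.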

In S405, the common attribute prediction unit 105 predicts, based on the trajectory information of the pedestrian 351 in the world coordinate system, the range of coordinates in the world coordinate system in which the pedestrian 351 may exist once a set period has elapsed from the current time. The set period is, for example, the period from the time at which the plurality of cameras in the monitoring system capture one image to the time at which they capture the next image. The range of coordinates predicted by the common attribute prediction unit 105 in S405 is an example of a predicted value, that is, a value predicted for the common attribute value of the pedestrian 351 for the common attribute.
When the trajectory information of the pedestrian 351 contains only the coordinates registered in S404, the common attribute prediction unit 105 predicts the range of coordinates in which the pedestrian 351 may exist after the set period based on the coordinates of the pedestrian 351 in the world coordinate system estimated in S403.
When the trajectory information contains past coordinates of the pedestrian 351 in addition to the coordinates registered in S404, the common attribute prediction unit 105 predicts the range of coordinates in which the pedestrian 351 may exist after the set period based on both the coordinates estimated in S403 and the past coordinates.

For example, in the example of FIG. 1, the common attribute prediction unit 105 obtains the coordinates 363 and the prediction error 371 based on the trajectory information of the pedestrian 351, which includes the coordinates 362. That is, the common attribute prediction unit 105 predicts that, once the set period has elapsed from the current time, the pedestrian 351 will be within a circular region centered on the coordinates 363 with a radius equal to the prediction error 371. The coordinates 363 are the predicted coordinates of the pedestrian 351 at that time. The prediction error 371 is an index of the error from the predicted coordinates. One method of obtaining, from the trajectory information, the range of coordinates in the world coordinate system in which the pedestrian 351 may exist after the set period is linear regression on the trajectory information.
When the pedestrian 351 has newly appeared in a new object appearance space and no past coordinates exist, or when the number of past coordinates is insufficient for linear regression, the following method is used instead. The range within which the pedestrian 351 can move during the set period, centered on the coordinates estimated in S403, is taken as the range of coordinates in which the pedestrian 351 may exist after the set period. For example, the common attribute prediction unit 105 calculates a distance by multiplying a preset speed of the pedestrian 351 by the set period. The common attribute prediction unit 105 then takes the circular range centered on the coordinates estimated in S403, with a radius equal to the calculated distance, as the range of coordinates in the world coordinate system in which the pedestrian 351 may exist after the set period.
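The two prediction cases of S405 can be sketched as follows. This is a minimal illustration rather than the embodiment's implementation: the function names, the default values of `max_speed`, `min_samples`, and `min_radius`, and the use of the root-mean-square regression residual as the prediction error are all assumptions made for the example.

```python
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (time, x, y) in world coordinates


def fit_line(ts: List[float], vs: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit v = a + b * t; returns (a, b).

    Assumes the timestamps in ts are not all identical.
    """
    n = len(ts)
    mean_t = sum(ts) / n
    mean_v = sum(vs) / n
    var_t = sum((t - mean_t) ** 2 for t in ts)
    slope = sum((t - mean_t) * (v - mean_v) for t, v in zip(ts, vs)) / var_t
    return mean_v - slope * mean_t, slope


def predict_range(trajectory: List[Sample], period: float,
                  max_speed: float = 2.0, min_samples: int = 3,
                  min_radius: float = 0.1):
    """Return ((x, y), radius) of the circle expected to contain the object."""
    t_now, x_now, y_now = trajectory[-1]
    if len(trajectory) < min_samples:
        # Too few past coordinates: circle of reachable positions around
        # the latest estimate (preset speed multiplied by the set period).
        return (x_now, y_now), max_speed * period
    ts = [s[0] for s in trajectory]
    ax, bx = fit_line(ts, [s[1] for s in trajectory])
    ay, by = fit_line(ts, [s[2] for s in trajectory])
    t_next = t_now + period
    center = (ax + bx * t_next, ay + by * t_next)
    # Assumption: the RMS residual of the fit serves as the prediction error.
    rms = (sum((ax + bx * t - x) ** 2 + (ay + by * t - y) ** 2
               for t, x, y in trajectory) / len(trajectory)) ** 0.5
    return center, max(rms, min_radius)
```

The hypothetical `min_radius` keeps the region from collapsing to a point when the past coordinates happen to lie exactly on a line.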

In S406, the device determination unit 106 determines, based on the range in the world coordinate system predicted in S405, which cameras' captured images, among those from the plurality of cameras of the monitoring system, are to be used for detecting the pedestrian 351. The region estimation unit 107 determines, based on the same range, which region of the captured image from each camera determined by the device determination unit 106 is to be used for detecting the pedestrian 351.
For example, the device determination unit 106 selects, from the plurality of cameras in the monitoring system, a camera that images a pedestrian 351 located in the range predicted in S405 as a region of at least a set number of pixels. The device determination unit 106 then determines the selected camera as the camera whose captured image is used for detecting the pedestrian 351. Alternatively, the device determination unit 106 may determine, as such a camera, any camera whose field of view includes the range in the world coordinate system predicted in S405.

The auxiliary storage device 204 is assumed to store, for each camera, preset information on the region in the world coordinate system in which the camera can image an object to be detected as a region larger than a set number of pixels. Hereinafter, this region is referred to as a trackable area. That is, the trackable area is the region obtained by projecting, onto the bottom surface of the world coordinate system, the part of a camera's field of view in which the object to be detected appears larger than the preset number of pixels. In the present embodiment, the trackable area is assumed to be identical to the region obtained by projecting onto the bottom surface of the world coordinate system the part of the camera's field of view in which the object appears, that is, identical to the field-of-view region.
The device determination unit 106 acquires the trackable area information preset for each camera from the auxiliary storage device 204. When the coordinates in the world coordinate system of an object's destination lie inside the trackable area of a camera in the monitoring system, that camera can be judged able to track the object. In the example of FIG. 1, the pedestrian 351 is estimated to move to the vicinity of the coordinates 363, so the device determination unit 106 determines the cameras 311 and 313, whose trackable areas include the coordinates 363, as the cameras whose captured images are used for detecting the pedestrian 351.

For example, the device determination unit 106 uses the even-odd rule to determine whether coordinates in the world coordinate system lie inside or outside a trackable area. The even-odd rule considers a half line extending from the coordinates toward a point at infinity and determines whether the coordinates are inside or outside the trackable area from the number of times the half line intersects the boundary of the sector-shaped trackable area, that is, its arc and its two radii. If the number of intersections is odd, the coordinates are determined to be inside the sector; if it is zero or even, they are determined to be outside the sector.
FIG. 6 is a diagram illustrating tracking of the pedestrian 351. In the example of FIG. 6, when the pedestrian 351 is predicted in S405 to move to the coordinates 363, the half line 601 extended from the coordinates 363 intersects the area 323, which is the trackable area of the camera 313, once, which is an odd number. The device determination unit 106 therefore determines that the coordinates 363 lie inside the area 323 and that the camera 313 can track the pedestrian 351. The half line 601 intersects the trackable area 621 of the camera 611 twice, which is an even number, so the device determination unit 106 determines that the camera 611 cannot track the pedestrian 351.
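The even-odd test and the resulting camera selection can be sketched as below. As an assumption made for brevity, the sector-shaped trackable area is approximated by a polygon (the apex plus sampled arc points), and the half line is cast in the positive x direction. The function names and camera identifiers are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def sector_polygon(apex: Point, radius: float, start_deg: float,
                   end_deg: float, steps: int = 32) -> List[Point]:
    """Approximate a sector by a polygon: the apex followed by arc samples."""
    pts = [apex]
    for k in range(steps + 1):
        a = math.radians(start_deg + (end_deg - start_deg) * k / steps)
        pts.append((apex[0] + radius * math.cos(a),
                    apex[1] + radius * math.sin(a)))
    return pts


def inside_even_odd(point: Point, polygon: List[Point]) -> bool:
    """Count boundary crossings of a half line from point toward +x."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the half line's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:       # the crossing lies on the half line
                inside = not inside  # an odd number of crossings means inside
    return inside


def select_cameras(point: Point,
                   trackable_areas: Dict[str, List[Point]]) -> List[str]:
    """Return the cameras whose trackable area contains the predicted point."""
    return [cam for cam, area in trackable_areas.items()
            if inside_even_odd(point, area)]
```

For instance, with a sector of radius 10 spanning 0 to 90 degrees from the origin, the point (3, 3) yields one crossing and is classified as inside, whereas (8, 8) yields none and is classified as outside.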

For each camera determined by the device determination unit 106 as a camera whose captured image is used for detecting the pedestrian 351, the region estimation unit 107 estimates a tracking area, which is the region of the captured image in which the pedestrian 351 may appear.
FIG. 7 is a diagram showing an example of a captured image. The captured image 513 in FIG. 7 is the image captured in the next imaging process performed by the camera 313. The region estimation unit 107 estimates the tracking area of the pedestrian 351 in the captured image 513 as the tracking area 531.
The range in the world coordinate system in which the pedestrian 351 may exist, as predicted in S405, is a sphere whose radius is the prediction error. However, a sphere is difficult to transform by perspective projection. In the present embodiment, the region estimation unit 107 therefore approximates the spherical range predicted in S405 by a polyhedron. Specifically, the region estimation unit 107 approximates the spherical range by a rectangular parallelepiped, obtains the point on the camera's captured image corresponding to each vertex of the rectangular parallelepiped, and takes the convex hull of these points as the tracking area. In the present embodiment, the rectangular parallelepiped rests on the floor, its height is set to a value greater than the height of the head of the pedestrian 351 being tracked, and each of its horizontal sides has a length of twice the prediction error. FIG. 8 is a diagram illustrating the region in which an object is predicted to exist; it shows this rectangular parallelepiped centered on the coordinates 363. The region estimation unit 107 applies the perspective projection transformation to each vertex of the rectangular parallelepiped to obtain a group of vertices on the captured image 513 shown in FIG. 7, and obtains the tracking area 531 as the convex hull of that group.
One method of obtaining the coordinates, in the coordinate system of a camera's captured image, of the point corresponding to a point in the world coordinate system is the perspective projection transformation according to the pinhole camera model described in Non-Patent Document 1. According to Non-Patent Document 1, the coordinates m (u, v) obtained by perspective projection of an object at world coordinates M (x, y, z) onto the captured image are given by Expression 1 below. This requires, for each camera and in advance, the camera intrinsic parameter matrix A, the camera extrinsic parameter matrix [R | t], and the scale s used to project the world coordinate system onto the coordinate system of the captured image. In the present embodiment, the auxiliary storage device 204 is assumed to store this information in advance.

$$ s\,m = A\,[R \mid t]\,M \qquad \text{(Expression 1)} $$
where
$$ m = \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}, \qquad A = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} $$
$$ [R \mid t] = \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix}, \qquad M = \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} $$
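The tracking-area estimation described above (rectangular parallelepiped, perspective projection by the pinhole model, convex hull) can be sketched as follows. The function names are hypothetical, the floor is assumed to lie at world height z = 0, and the convex hull is computed with Andrew's monotone chain, which is one common choice and is not specified by the embodiment.

```python
from itertools import product
from typing import List, Sequence, Tuple

Point2 = Tuple[float, float]


def project(A: Sequence[Sequence[float]], Rt: Sequence[Sequence[float]],
            M: Sequence[float]) -> Point2:
    """Pinhole projection s*m = A [R|t] M; returns image coordinates (u, v)."""
    Mh = (M[0], M[1], M[2], 1.0)
    cam = [sum(Rt[i][j] * Mh[j] for j in range(4)) for i in range(3)]
    sm = [sum(A[i][j] * cam[j] for j in range(3)) for i in range(3)]
    return (sm[0] / sm[2], sm[1] / sm[2])  # divide by the scale s


def cuboid_vertices(center_xy: Point2, half_side: float, height: float):
    """Eight vertices of a box resting on the floor (assumed at z = 0)."""
    cx, cy = center_xy
    return [(cx + dx, cy + dy, z)
            for dx, dy, z in product((-half_side, half_side),
                                     (-half_side, half_side),
                                     (0.0, height))]


def convex_hull(points: List[Point2]) -> List[Point2]:
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def tracking_area(A, Rt, center_xy: Point2, prediction_error: float,
                  height: float) -> List[Point2]:
    """Project the box around the predicted position and take the hull."""
    # Each horizontal side is twice the prediction error, so the half
    # side equals the prediction error itself.
    verts = cuboid_vertices(center_xy, prediction_error, height)
    return convex_hull([project(A, Rt, v) for v in verts])
```

The sketch assumes every vertex lies in front of the camera, so that the scale s is positive for all eight points.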

Alternatively, the device determination unit 106 need not determine which cameras' captured images, among those from the plurality of cameras of the monitoring system, are used for detecting the pedestrian 351. In that case, the region estimation unit 107 may determine, based on the range in the world coordinate system predicted in S405, which region of the captured image from every camera in the monitoring system is used for detecting the pedestrian 351. For any camera whose trackable area does not include the range predicted in S405, the region estimation unit 107 determines that no region of its captured image is used for object detection.
The information, determined by the device determination unit 106, on which cameras' captured images are used for detecting the pedestrian 351 is an example of a detection parameter, that is, a parameter used for detecting an object such as the pedestrian 351. The information, determined by the region estimation unit 107, indicating which region of a camera's captured image is used for detecting the pedestrian 351 is also an example of a detection parameter. Other detection parameters include information indicating which detector is used for object detection and information indicating the size of the partial images extracted from a camera's captured image for object detection.
The processing of the device determination unit 106 and the region estimation unit 107 in S406 is an example of processing that determines detection parameters based on the predicted value of the common attribute predicted in S405. In particular, the processing of the region estimation unit 107 in S406 is an example of processing that obtains, from the predicted value of the common attribute predicted in S405, a predicted value of the attribute for each camera's captured image and determines detection parameters based on those per-image predicted values. The device determination unit 106 and the region estimation unit 107 are an example of a first determination unit that determines, based on the predicted value for the common attribute of an object, the detection parameters used for detecting the object.
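The kinds of detection parameters listed above could be grouped per camera in a small record such as the one below. This is only an illustrative data structure; the class name, field names, and defaults are hypothetical and do not appear in the embodiment.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class DetectionParameters:
    """Per-camera detection parameters (hypothetical field names)."""
    camera_id: str
    use_camera: bool = True                            # camera selected in S406
    region: List[Point] = field(default_factory=list)  # tracking-area polygon
    detector: Optional[str] = None                     # which detector to apply
    window_size: Optional[Tuple[int, int]] = None      # partial-image size


def skipped(camera_id: str) -> DetectionParameters:
    """Parameters for a camera whose captured image is not used at all."""
    return DetectionParameters(camera_id, use_camera=False)
```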

In S407, the image acquisition unit 101 acquires a group of captured images from the plurality of cameras in the monitoring system. In the present embodiment, the image acquisition unit 101 acquires, from each camera, the image captured next after the captured image acquired in the immediately preceding execution of S401 or S407. Alternatively, the image acquisition unit 101 may acquire the images captured by each camera when a set period has elapsed since the immediately preceding execution of S401 or S407. For example, when the pedestrian 351 has moved into the circular range centered on the coordinates 363 with a radius equal to the prediction error 371, the image acquisition unit 101 acquires the captured image 513 shown in FIG. 7 from the camera 313.
In S408, the attribute update unit 103 performs the following processing on the captured images from the cameras determined in S406 by the device determination unit 106 as cameras whose captured images are used for detecting the pedestrian 351. The attribute update unit 103 detects the pedestrian 351 within the tracking area determined by the region estimation unit 107 in S406 and acquires the coordinates of the detected pedestrian 351 in the coordinate system of the captured image. For example, when the captured image 513 is obtained, the attribute update unit 103 detects the pedestrian 351 within the tracking area 531 and obtains the coordinates of the detected pedestrian 351. The methods of detecting the pedestrian 351 and acquiring its coordinates are the same as in S402. The attribute update unit 103 is an example of a first detection unit that detects an object based on the detection parameters determined in S406.
In S409, the attribute update unit 103 determines whether the pedestrian 351 was detected in S408. If the pedestrian 351 was detected, the attribute update unit 103 treats tracking of the pedestrian 351 as continuing and proceeds to S403. If the pedestrian 351 was not detected, the attribute update unit 103 treats tracking as finished because the pedestrian 351 has left the monitoring space 301, and ends the processing of FIG. 4.
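The control flow from S403 to S409 can be summarized in a short loop. This is a sketch of the step ordering only: the function `track` and its callable parameters are hypothetical stand-ins for the units described above, and each is supplied by the caller.

```python
from typing import Any, Callable, List


def track(initial_detections: Any,
          estimate: Callable[[Any], Any],
          predict: Callable[[List[Any]], Any],
          decide: Callable[[Any], Any],
          acquire: Callable[[], Any],
          detect: Callable[[Any, Any], Any]) -> List[Any]:
    """Repeat S403 to S408 until detection fails (S409), then return."""
    trajectory: List[Any] = []
    detections = initial_detections          # result of S402
    while detections:                        # S409: continue while detected
        position = estimate(detections)      # S403: world coordinates
        trajectory.append(position)          # S404: register in trajectory
        predicted = predict(trajectory)      # S405: range after the period
        params = decide(predicted)           # S406: cameras and regions
        images = acquire()                   # S407: next captured images
        detections = detect(images, params)  # S408: detect in tracking area
    return trajectory
```

An empty detection result ends the loop, which corresponds to the case in which the object is judged to have left the monitoring space.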

As described above, through the processing of the present embodiment, the monitoring system estimates the coordinates of the pedestrian 351 in the world coordinate system from the coordinates of the pedestrian 351 in the images captured by the plurality of cameras. Based on the estimated coordinates, the monitoring system predicts the coordinates of the pedestrian 351 in the world coordinate system at the time when the set period has elapsed from the current time. Alternatively, the monitoring system makes this prediction based on both the current coordinates and the past coordinates of the pedestrian 351 in the world coordinate system. Based on the predicted coordinates, the monitoring system determines the cameras whose captured images are used to detect the pedestrian 351 and the region within each captured image in which detection is performed. The monitoring system then detects the pedestrian 351 in the determined region of the captured image of each determined camera, and thereby tracks the pedestrian 351.
As a result, the monitoring system needs to detect the pedestrian 351 only in the region where the pedestrian 351 may exist, within the captured images of the cameras that can image the pedestrian 351. That is, the monitoring system need not run the detection process over the entire area of the captured images of all cameras, which reduces the load of detecting the pedestrian 351.

The monitoring system can track the pedestrian 351 as long as the movement of the pedestrian 351 stays within the range predicted by the common attribute prediction unit 105. In addition, when the attribute acquisition unit 102 detects in S402 an object that has newly appeared in the new object attribute acquisition area and the object is detected again in S408, the monitoring system can obtain the speed of the object as follows. The monitoring system obtains the coordinates of the object in the world coordinate system at its first detection and at its next detection, and calculates the speed from the difference between the coordinates and the difference between the detection times. Furthermore, with the processing of the common attribute prediction unit 105 and the region estimation unit 107, the tracking area of the pedestrian 351 is obtained based on an assumed maximum speed. That is, as long as the speed of the pedestrian 351 is below the assumed maximum speed, the pedestrian 351 is within the tracking area on the captured image and the attribute update unit 103 can detect the pedestrian 351.
When the common attribute prediction unit 105 predicts the range in which the pedestrian 351 exists by linear regression on the trajectory information, the amount of computation of the attribute update unit 103 decreases as the monitoring space 301 becomes larger. This is because the common attribute prediction unit 105 can use the error of the linear regression as the estimation error of the predicted range, and the more coordinates of points passed by the pedestrian 351 are included in the trajectory information, the more reliable the regression becomes and the smaller its error. The region estimation unit 107 accordingly makes the tracking area smaller in line with the estimation error. The region processed by the attribute update unit 103 thus becomes smaller, which reduces its processing load. As a result, the larger the monitoring space 301, the smaller the amount of computation for the tracking processing of the attribute update unit 103.
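The speed calculation from two successive detections amounts to dividing the distance between the two world coordinates by the time between the detections. A minimal sketch, with the hypothetical function name `speed`:

```python
import math
from typing import Sequence


def speed(p_first: Sequence[float], t_first: float,
          p_next: Sequence[float], t_next: float) -> float:
    """Speed from two detections: coordinate difference over time difference."""
    if t_next == t_first:
        raise ValueError("the two detections must have different timestamps")
    return math.dist(p_first, p_next) / abs(t_next - t_first)
```

For example, detections at (0, 0, 0) and (3, 4, 0) taken 2 seconds apart give a speed of 2.5 units per second.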

In the present embodiment, the processing of the flowchart of FIG. 4 is realized by the CPU 202 executing a program in the RAM 203 or the auxiliary storage device 204. However, part or all of the processing of the flowchart of FIG. 4 may be realized by hardware such as an electronic circuit.
Part or all of the processing of the flowchart of FIG. 4 may also be executed by the cameras in the monitoring system, each of which includes an image sensor. In that case, each camera in the monitoring system includes a CPU and a storage device that stores, for example, a program for executing part or all of the processing of the flowchart of FIG. 4. The CPU of each camera executes the program stored in the camera's storage device, thereby realizing the functions of the camera and part or all of the processing of the flowchart of FIG. 4.

<Embodiment 2>
In this embodiment, the processing of the space input unit 110 and the area determination unit 111 will be described. The space input unit 110 inputs a new object appearance space. Based on the new object appearance space designated through the space input unit 110, the area determination unit 111 then performs the following processing. That is, the area determination unit 111 determines which cameras' captured images are subjected, in S402, to the detection processing for an object newly appearing in the monitoring space 301. The area determination unit 111 also determines the area of each such captured image in which the object detection processing is performed in S402. In other words, the space input unit 110 and the area determination unit 111 determine the detection parameters used when detecting an object that may newly appear in the monitoring space. The processing of the space input unit 110 and the area determination unit 111 of this embodiment is performed before the processing of FIG. 4.
The system configuration of the monitoring system of this embodiment, and the hardware configuration and functional configuration of its system components, are the same as in the first embodiment.

The space input unit 110 inputs a new object appearance space, that is, a space through which an object must pass in order to enter the monitoring space 301.
The space input unit 110 displays, on the display unit via the user interface 205, a designation screen used for designating the new object appearance space. The space input unit 110 then receives the designation of the new object appearance space based on the user's operation via the designation screen and the input device. For example, the space input unit 110 displays an overhead view of the monitoring space 301 on the display unit as the designation screen. The user designates the range to be set as the new object appearance space in the monitoring space 301 displayed on the designation screen, for example, by a drag operation with a mouse. Alternatively, the user may select, from ranges displayed as candidates for the new object appearance space in the monitoring space 301 on the designation screen, the range to be adopted as the new object appearance space, for example, by a click operation with a mouse. The space input unit 110 inputs the information of the new object appearance space by receiving the information of the range designated by the user's operation.
The space input unit 110 may also display, on the display unit via the user interface 205, a designation screen used for designating not only the new object appearance space but also walls, camera installation coordinates, related parameters, and the like. For example, the space input unit 110 displays a designation screen that includes an overhead view of the world coordinate system and a setting screen for the related parameters. The user designates the coordinates of walls, cameras, and the like by placing their icons in the coordinate system displayed in the designation screen, and designates the related parameters by entering the desired values on the related parameter setting screen. The space input unit 110 receives the designated information via the designation screen.

FIG. 9 is a diagram showing an example of the designation screen displayed by the space input unit 110. An example of the process of designating the new object appearance space based on the user's operation through the designation screen will be described with reference to FIG. 9. In the designation screen of FIG. 9, solid lines, broken lines, dotted lines, and the like are drawn. The solid lines include relatively thick solid lines (hereinafter, thick solid lines) and relatively thin solid lines (hereinafter, thin solid lines). In the example of FIG. 9, the designation screen displays thick solid lines indicating walls, rectangles indicating cameras, an area surrounded by a thin solid line indicating a space where an object may newly appear, dotted-line sectors indicating the spaces in which the cameras can track an object, and so on. In the designation screen of FIG. 9, the monitoring space 301, the new object appearance space, and the space in which each camera can detect an object are all shown as areas viewed from above and projected onto the bottom surface. Walls, the new object appearance space, and the like are likewise treated as areas projected onto the bottom surface. That is, the designation screen of FIG. 9 represents the monitoring space in the world coordinate system as a two-dimensional map, that is, a planar map.
The space input unit 110 estimates a trackable area for each camera and presents it to the user by displaying it on the designation screen. For example, when the user designates the installation of the camera 311 in the monitoring space 301 via the designation screen, the space input unit 110 estimates the area 321, which is the trackable area, and presents it to the user. The trackable area is a sector. The central angle of the sector is the angle of view of the camera. The radius of the sector is the distance up to which the object to be tracked appears with at least a certain number of pixels, where this number of pixels is the minimum number of pixels with which the attribute update unit 103 can detect the object.
In the present embodiment, the space input unit 110 determines the radius of the sector of the trackable area based on the characteristics of the attribute update unit 103. The device determination unit 106 can therefore determine, as the camera that captures the image used for the detection processing, a camera that can actually track the object rather than one that merely shows it. This reduces the amount of calculation that would otherwise be spent trying to track an object that the attribute update unit 103 cannot detect.
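As an illustration of how the sector radius might follow from the minimum detectable pixel count, the sketch below assumes an ideal pinhole camera with no lens distortion and no tilt, and an upright object of known height. The formula, the function names, and the numeric values are assumptions for illustration; this document does not specify how the radius is computed.

```python
import math

def sector_radius(object_height_m, min_pixels, image_height_px, vertical_fov_deg):
    """Distance at which an upright object shrinks to min_pixels in the image."""
    # Focal length in pixels for an ideal pinhole camera (assumption).
    focal_px = (image_height_px / 2) / math.tan(math.radians(vertical_fov_deg) / 2)
    # Pixel height at distance d is object_height * focal_px / d; solve for d.
    return object_height_m * focal_px / min_pixels

def in_trackable_sector(point, cam_pos, heading_deg, horizontal_fov_deg, radius):
    """True if a floor point lies inside the sector-shaped trackable area."""
    dx, dy = point[0] - cam_pos[0], point[1] - cam_pos[1]
    dist = math.hypot(dx, dy)
    if dist > radius:
        return False
    if dist == 0:
        return True
    bearing = math.degrees(math.atan2(dy, dx))
    # Signed angular difference folded into [-180, 180).
    diff = (bearing - heading_deg + 180) % 360 - 180
    return abs(diff) <= horizontal_fov_deg / 2

# Hypothetical camera: 1080-pixel image height, 90-degree vertical and
# 60-degree horizontal angle of view; detector needs at least 50 pixels.
radius = sector_radius(1.7, 50, 1080, 90)
print(radius, in_trackable_sector((10, 0), (0, 0), 0, 60, radius))
```

Tying the radius to the detector's minimum pixel count, rather than to the camera's visible range, is what lets the sector describe where tracking can actually succeed.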

Based on the new object appearance space input by the space input unit 110, the area determination unit 111 estimates the cameras that can detect an object newly appearing in the monitoring space 301 and the new object attribute acquisition area on each of those cameras' images. For example, the area determination unit 111 obtains the camera 311 and the camera 312, whose trackable areas include the new object appearance space 302 input on the designation screen of FIG. 9. Further, as shown in FIG. 5, the area determination unit 111 estimates the new object attribute acquisition area 421 on the captured image 411 of the camera 311 and the new object attribute acquisition area 422 on the captured image 412 of the camera 312.
To obtain the cameras that can detect a new object, the area determination unit 111 performs, for example, the following processing. It is assumed that the auxiliary storage device 204 stores in advance, for each camera, information on the space in which that camera can detect an object. The area determination unit 111 acquires this information for all cameras from the auxiliary storage device 204. Based on the acquired information, the area determination unit 111 then selects the cameras for which the space where an object can be detected overlaps the new object appearance space. In the present embodiment, spaces such as the space where a camera can detect an object and the new object appearance space are treated as areas projected onto the bottom surface. Further, in the present embodiment, the attribute acquisition unit 102 and the attribute update unit 103 apply the same method described in Non-Patent Document 2, so the area in which a given camera can detect an object is equal to that camera's trackable area. The area determination unit 111 therefore only needs to find the cameras whose trackable areas overlap the new object appearance space. For example, the area determination unit 111 determines whether the new object appearance space 302 overlaps the area 321 or the area 322, each of which is the trackable area of a camera, and selects the cameras determined to overlap.
Alternatively, the area determination unit 111 may determine the tracking area for all cameras without selecting the cameras that can detect a new object. In that case, the area determination unit 111 determines an area of zero size as the tracking area for any camera that cannot detect a new object.

Any method may be used to determine whether two areas on the same plane overlap. One example is to approximate both areas with a plurality of triangles and to judge that the two areas overlap if a vertex of a triangle of one area lies inside a triangle of the other area. For example, since the vertex 801 of the new object appearance space 302 lies inside the area 321, which is a trackable area, the area determination unit 111 selects the camera 311 as a camera that can detect a new object from the new object appearance space 302.
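The triangle-based overlap test can be sketched as follows, assuming each area has already been approximated by a list of triangles given as 2D vertex tuples on the bottom surface. The function names are hypothetical. Note that a vertex-containment test of this kind can miss two triangles that cross without either containing a vertex of the other, so a finer triangulation or an additional edge-intersection test may be needed in practice.

```python
def _cross(o, a, b):
    """Z component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def point_in_triangle(p, tri):
    """True if p is inside or on the boundary of triangle tri."""
    a, b, c = tri
    d1, d2, d3 = _cross(a, b, p), _cross(b, c, p), _cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    # p is inside when it is on the same side of all three edges.
    return not (has_neg and has_pos)

def areas_overlap(tris_a, tris_b):
    """Vertex-in-triangle overlap test between two triangulated areas."""
    for ta in tris_a:
        for tb in tris_b:
            if any(point_in_triangle(v, tb) for v in ta):
                return True
            if any(point_in_triangle(v, ta) for v in tb):
                return True
    return False

# Hypothetical data: a 2 x 2 appearance space split into two triangles,
# one trackable-area triangle that reaches into it, and one that does not.
appearance_space = [[(0, 0), (2, 0), (2, 2)], [(0, 0), (2, 2), (0, 2)]]
near_area = [[(1, 1), (3, 1), (3, 3)]]
far_area = [[(5, 5), (6, 5), (6, 6)]]
print(areas_overlap(appearance_space, near_area),
      areas_overlap(appearance_space, far_area))
```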
Next, the area determination unit 111 determines the new object attribute acquisition area on the image captured by a given camera. To do so, it suffices to calculate, for each vertex of the new object appearance space, the coordinates on the image captured by the camera, and to take the intersection of the convex hull of the calculated coordinates with the captured image. However, since the new object appearance space is treated as an area projected onto the bottom surface, the area determination unit 111 must first add vertices at an appropriate height, chosen according to the height at which an object can appear, before obtaining the coordinates on the camera image. In the present embodiment, it is assumed that the auxiliary storage device 204 stores this height information in advance, and the area determination unit 111 acquires it from the auxiliary storage device 204. One way to obtain the coordinates on the image captured by a camera that correspond to a given coordinate is to use Expression 1. For example, as shown in FIG. 10, the area determination unit 111 adds, above the group of vertices on the bottom surface of the new object appearance space 302, a group of vertices at a height sufficient to include even a very tall pedestrian, thereby forming a cube. The area determination unit 111 obtains the coordinates in the captured image 411 of all vertices of this cube and takes their convex hull, thereby obtaining the new object attribute acquisition area 421.
The area determination unit 111 is an example of a second determination unit that determines, based on the area input by the space input unit 110, the detection parameter used when an object is detected for the first time. Here, the information indicating the new object attribute acquisition area 421 is an example of the detection parameter.
Then, the attribute acquisition unit 102 detects the new object based on the detection parameter determined by the area determination unit 111 for detecting a new object. In the present embodiment, in S402, the attribute acquisition unit 102 detects the pedestrian 351 from the new object attribute acquisition area determined by the area determination unit 111, in the captured image from each camera that the area determination unit 111 has determined can capture a new object. The attribute acquisition unit 102 is an example of a second detection unit that detects an object from the area, set for each of the captured images from the plurality of cameras, in which the object first appears.
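The construction of a new object attribute acquisition area (lifting the bottom-surface vertices to a chosen height, projecting all vertices onto the image, taking the convex hull, and intersecting it with the image) can be sketched as below. The sketch does not reproduce Expression 1. It assumes instead that each camera is described by a 3x4 projection matrix P, a common pinhole formulation that may differ from Expression 1, and that every vertex lies in front of the camera. All names and numeric values are hypothetical.

```python
def project(P, point):
    """Project a world point (x, y, z) with a 3x4 matrix P to pixel (u, v)."""
    x, y, z = point
    h = [row[0] * x + row[1] * y + row[2] * z + row[3] for row in P]
    return (h[0] / h[2], h[1] / h[2])

def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices in order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def clip_to_image(polygon, width, height):
    """Sutherland-Hodgman clipping of a convex polygon to the image rectangle."""
    def clip(poly, inside, intersect):
        out = []
        for i, cur in enumerate(poly):
            prev = poly[i - 1]
            if inside(cur):
                if not inside(prev):
                    out.append(intersect(prev, cur))
                out.append(cur)
            elif inside(prev):
                out.append(intersect(prev, cur))
        return out
    def x_cut(x0):
        return lambda a, b: (x0, a[1] + (b[1] - a[1]) * (x0 - a[0]) / (b[0] - a[0]))
    def y_cut(y0):
        return lambda a, b: (a[0] + (b[0] - a[0]) * (y0 - a[1]) / (b[1] - a[1]), y0)
    poly = clip(polygon, lambda p: p[0] >= 0, x_cut(0.0))
    poly = clip(poly, lambda p: p[0] <= width, x_cut(float(width)))
    poly = clip(poly, lambda p: p[1] >= 0, y_cut(0.0))
    return clip(poly, lambda p: p[1] <= height, y_cut(float(height)))

def acquisition_area(P, floor_vertices, height, image_size):
    """Polygon on the image covering the appearance space up to `height`."""
    box = [(x, y, z) for z in (0.0, height) for x, y in floor_vertices]
    hull = convex_hull([project(P, v) for v in box])
    return clip_to_image(hull, *image_size)

# Hypothetical camera at the origin looking along +y, with z pointing up:
# u = 320 + 100 * x / y, v = 240 - 100 * z / y.
P = [[100, 320, 0, 0], [0, 240, -100, 0], [0, 1, 0, 0]]
floor = [(-1, 5), (1, 5), (1, 10), (-1, 10)]
print(acquisition_area(P, floor, 2.0, (640, 480)))
```

The second call in the test below uses a narrower image to show the hull being cut at the image border.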

As described above, according to the processing of this embodiment, the monitoring system can set the new object appearance space for each camera. In addition, by presenting the user with a designation screen for designating the new object appearance space, the space input unit 110 can set the new object appearance space according to the user's wishes. This reduces the processing load of detecting an object that newly appears in the monitoring space 301.

<Embodiment 3>
The first and second embodiments described the following processing. The attribute acquisition unit 102 and the attribute update unit 103 acquire the attributes of an object for each camera, and the common attribute estimation unit 104 integrates those attributes to estimate an attribute common to the cameras. Further, the common attribute prediction unit 105 predicts the common attribute value of the object in the next image captured by a camera, and the device determination unit 106 and the area estimation unit 107 determine the detection parameters based on the predicted common attribute value. The detection parameters that can be determined in this way, by collecting attributes at each camera, integrating and predicting them as a whole, and using the predicted attributes, are not limited to information on the camera that captures the image subject to detection or information on the tracking area. Likewise, the attributes that can be acquired from an object are not limited to coordinates. Examples are given below.

For example, the height of a pedestrian (the height of the object) is an attribute that can be acquired from an object detected by the attribute acquisition unit 102 or the attribute update unit 103.
In the first embodiment, the coordinates of the pedestrian 351 in the world coordinate system were the coordinates of the center of a sphere approximating the head of the pedestrian 351. If the coordinates of the pedestrian 351 in the world coordinate system are instead taken to be those of the point of this sphere farthest from the floor surface, the common attribute estimation unit 104 in effect estimates the pedestrian's height. The area estimation unit 107 can then calculate the number of pixels in the vertical direction that the pedestrian occupies in a camera's image, based on the pedestrian's height, the angle of view of the camera, the number of pixels of the camera, and the like. The area estimation unit 107 determines this vertical pixel count as a detection parameter.
Then, when selecting partial images for object detection based on the method of Non-Patent Document 2, the attribute update unit 103 selects partial images whose vertical pixel count is close to the vertical pixel count determined as the detection parameter, and performs the object detection processing on the selected partial images. A vertical pixel count may be regarded as close to the detection parameter if, for example, it falls within a range of set width centered on that value (for example, from 0.9 times to 1.1 times the value). As a result, the attribute update unit 103 does not need to perform the object detection processing on partial images whose vertical pixel count is not close to the value determined as the detection parameter. The monitoring system can thereby reduce the load of the detection processing by the attribute update unit 103.
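The selection of partial images by vertical pixel count can be sketched as a simple filter over candidate window heights. The 0.9 to 1.1 range follows the example in the text; the function name, the tolerance parameter, and the candidate heights are hypothetical.

```python
def select_window_heights(candidate_heights, expected_px, tolerance=0.1):
    """Keep only window heights within +/- tolerance of the expected pixel count."""
    low = expected_px * (1 - tolerance)
    high = expected_px * (1 + tolerance)
    return [h for h in candidate_heights if low <= h <= high]

# Hypothetical sliding-window heights (in pixels) that a detector would
# otherwise scan exhaustively, and a predicted pedestrian height of 100 pixels.
candidates = [60, 80, 95, 105, 130, 160]
kept = select_window_heights(candidates, expected_px=100)
print(kept, len(kept) / len(candidates))
```

Only the kept heights are passed on to the detector, so the share of candidate windows that is skipped gives a rough measure of the saved computation.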

Other examples of attributes that can be acquired from an object detected by the attribute acquisition unit 102 or the attribute update unit 103 are the orientation of a pedestrian's torso and face (the orientation of the object, or of a part of the object).
Because the appearance of a pedestrian in a captured image differs greatly depending on the direction from which the pedestrian is imaged, the detection performance of the detection method of Non-Patent Document 2 decreases when the imaging direction changes. To overcome this, the attribute acquisition unit 102 and the attribute update unit 103 are assumed to have a detector prepared for each combination of face orientation and torso orientation, as described in Non-Patent Document 2.
In S402, the attribute acquisition unit 102 obtains the face orientation and torso orientation of a pedestrian. Specifically, the attribute acquisition unit 102 finds, among the prepared detectors, the detector that judges the pedestrian to be present with the highest confidence. The attribute acquisition unit 102 then obtains the pedestrian's face orientation and torso orientation by identifying which face orientation and torso orientation that detector corresponds to. The common attribute estimation unit 104 integrates the face and torso orientations of the pedestrian in the captured images from the respective cameras to obtain the face orientation and torso orientation in world coordinates. The common attribute prediction unit 105 then predicts the pedestrian's torso and face orientations at the time when a set period has elapsed from the current time. For example, when the pedestrian is walking while gazing at a certain point in the monitoring space, the common attribute prediction unit 105 predicts that the face will be oriented toward that point. Based on the predicted face and torso orientations of the pedestrian in the world coordinate system, the area estimation unit 107 estimates the face and torso orientations of the pedestrian on the image captured by each camera, and determines the estimated orientation information as a detection parameter. The attribute update unit 103 then selects, from the plurality of detectors, the detector for detecting the pedestrian according to the face and torso orientations determined as the detection parameter, and applies it. As a result, the attribute update unit 103 does not need to run detectors that do not correspond to the face and torso orientations determined as the detection parameter.
The monitoring system can thereby reduce the load of the object detection processing by the attribute update unit 103.
Further, in S402, the attribute acquisition unit 102 may acquire the face orientation and torso orientation together with the coordinates of the pedestrian.
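The selection of a detector from predicted orientations can be sketched as below. The sketch assumes that orientations are angles in degrees on the bottom surface, that the orientation seen by a camera is the world orientation measured relative to the direction from the pedestrian toward that camera, and that detectors are indexed by eight 45-degree bins for face and torso. These conventions, the bin count, and all names are hypothetical and are not specified in this document.

```python
import math

def relative_orientation(world_deg, pedestrian_xy, camera_xy):
    """Orientation as seen from the camera; 0 means facing the camera."""
    to_camera = math.degrees(math.atan2(camera_xy[1] - pedestrian_xy[1],
                                        camera_xy[0] - pedestrian_xy[0]))
    return (world_deg - to_camera) % 360

def orientation_bin(angle_deg, n_bins=8):
    """Index of the bin centered on multiples of 360 / n_bins degrees."""
    width = 360 / n_bins
    return int(((angle_deg + width / 2) % 360) // width)

def select_detector(detectors, face_deg, torso_deg, n_bins=8):
    """Pick the single detector matching the predicted face and torso bins."""
    return detectors[(orientation_bin(face_deg, n_bins),
                      orientation_bin(torso_deg, n_bins))]

# Hypothetical detector table: one entry per (face bin, torso bin) combination.
detectors = {(f, t): f"face{f}_torso{t}" for f in range(8) for t in range(8)}

pedestrian, camera = (0.0, 0.0), (10.0, 0.0)
face = relative_orientation(0.0, pedestrian, camera)     # looking at the camera
torso = relative_orientation(180.0, pedestrian, camera)  # torso turned away
print(select_detector(detectors, face, torso))
```

Under these assumptions only one of the 64 detectors is applied per camera and pedestrian, instead of all of them.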

The space input unit 110 and the area determination unit 111 determine the detection parameters that the attribute acquisition unit 102 uses to detect an object. The space input unit 110 and the area determination unit 111 can therefore set detection parameters other than the camera that captures the image used for detection and the area of the captured image used for detecting the object. For example, when the space input unit 110 acquires the height and pixel count of a pedestrian as attributes of the object, the area determination unit 111 calculates, from the average height of pedestrians, the number of pixels in the vertical direction that a pedestrian occupies in the camera's image, and determines it as a detection parameter used in S402. Then, in S402, the attribute acquisition unit 102 extracts partial images based on the calculated pixel count from the captured image and detects pedestrians from the extracted partial images.
Further, when the space input unit 110 acquires the orientation of a pedestrian's face or torso as an attribute of the object, it accepts, via the designation screen, the input of a walking direction or a gazing point for each new object appearance space. The area determination unit 111 then estimates the torso and face orientations of the pedestrian from the input walking direction or gazing point, and determines them as detection parameters used in S402. In S402, the attribute acquisition unit 102 can then detect the pedestrian using the detector corresponding to the estimated torso and face orientations.
As a result, the monitoring system can reduce the load of the object detection processing by the attribute acquisition unit 102 in S402.

<Other embodiments>
The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.
Although preferred embodiments of the present invention have been described in detail above, the present invention is not limited to these specific embodiments.
For example, part or all of the functional configuration of the above-described monitoring system may be implemented as hardware in the information processing device 341.

341 Information processing device, 202 CPU

Claims

An information processing apparatus comprising:
input means for displaying an image of a two-dimensional map that represents, in an overhead view and in a common coordinate system shared by a plurality of imaging devices, the imaging ranges of the plurality of imaging devices, and for inputting, based on a user operation on the image, an area of the two-dimensional map in which an object first appears;
initial area determination means for determining, based on the input area in which the object first appears, an imaging device, among the plurality of imaging devices, that acquires an image in which the object is to be detected for the first time, and an area in which the object is to be detected for the first time;
first detection means for detecting the object from the area, determined by the initial area determination means, in which the object is to be detected for the first time, in the image captured by the imaging device determined by the initial area determination means;
acquisition means for acquiring, based on the captured image in which the object was detected, coordinates representing the position of the object in the common coordinate system;
prediction means for predicting, based on the coordinates, a predicted value of the coordinates representing the position of the object in the common coordinate system at a future time point at which a set period has elapsed from the current time point;
estimation means for determining, based on the imageable range of each of the plurality of imaging devices and the predicted value, an imaging device, among the plurality of imaging devices, capable of imaging the object at the future time point, and for estimating an area in which the object can be imaged by the determined imaging device; and
second detection means for detecting the object from the estimated area in the image captured, at the future time point, by the imaging device determined by the estimation means.

The information processing apparatus according to claim 1, wherein the prediction means predicts the predicted value based on the coordinates and past coordinates of the object in the common coordinate system.

The information processing apparatus according to claim 1 or 2, wherein the acquisition means acquires the coordinates of the object in the common coordinate system based on the captured images in which the object was detected, among the group of captured images captured by the plurality of imaging devices whose fields of view overlap, and
the prediction means predicts, as the predicted value, the coordinates of the object in the common coordinate system at the future time point, based on the acquired coordinates of the object in the common coordinate system.

The information processing apparatus according to any one of claims 1 to 3, wherein the initial area determination means determines, as the area in which the object is to be detected for the first time, an area obtained by projecting the area of the two-dimensional map in which the object first appears, input by the input means, onto each of the captured images captured by the plurality of imaging devices.

An information processing method executed by an information processing apparatus, the method comprising:
an input step of displaying an image of a two-dimensional map that overlooks the imaging ranges of a plurality of imaging devices and is expressed in a common coordinate system common to the plurality of imaging devices, and inputting, based on a user operation on the image, an area in the two-dimensional map in which an object first appears;
an initial area determination step of determining, based on the input area in which the object first appears, an imaging device among the plurality of imaging devices that acquires an image in which the object should be detected for the first time, and a region in which the object should be detected for the first time;
a first detection step of detecting the object from the region, determined in the initial area determination step, in which the object should be detected for the first time, in the image captured by the imaging device determined in the initial area determination step;
an acquisition step of acquiring, based on the captured image in which the object is detected, coordinates representing the position of the object in the common coordinate system;
a prediction step of predicting, based on the coordinates, a predicted value of coordinates representing the position of the object in the common coordinate system at a future time point at which a set period has elapsed from the current time point;
an estimation step of determining, based on the imageable range of each of the plurality of imaging devices and on the predicted value, an imaging device among the plurality of imaging devices that is capable of imaging the object at the future time point, and estimating a region in which the object can be imaged in an image captured by the determined imaging device; and
a second detection step of detecting the object from the estimated region in the image captured at the future time point by the imaging device determined in the estimation step.

A program for causing a computer to function as each means of the information processing apparatus according to any one of claims 1 to 4.