JP2019061407A

JP2019061407A - Object detection device

Info

Publication number: JP2019061407A
Application number: JP2017184443A
Authority: JP
Inventors: 知行永橋; Tomoyuki Nagahashi; 秀紀氏家; Hidenori Ujiie; 龍佑野坂; Ryusuke Nosaka
Original assignee: Secom Co Ltd
Current assignee: Secom Co Ltd
Priority date: 2017-09-26
Filing date: 2017-09-26
Publication date: 2019-04-18
Anticipated expiration: 2037-09-26
Also published as: JP6920944B2

Abstract

To detect the position of each individual object with high accuracy, regardless of the congestion state, from an image captured of a space in which congestion of objects may occur. SOLUTION: Density estimation means 50 analyzes an arbitrary region of a captured image to estimate the degree of congestion of the objects captured in that region. Candidate position extraction means 51 uses a single-object classifier, which has learned the features of images showing a single object, to extract candidate positions having those features from the captured image. Group generation means 52 sets a higher lower limit on the degree of proximity between candidate positions where the degree of congestion in the captured image is higher, and generates candidate position groups consisting of candidate positions whose proximity is at or above that lower limit. Object position determination means 53 determines, for each candidate position group, the position of an object based on the candidate positions belonging to that group. SELECTED DRAWING: Figure 2

Description

The present invention relates to an object detection device that detects the positions of individual objects from a captured image of a space in which objects such as people may be present, and more particularly to an object detection device that detects the positions of individual objects from a captured image of a space in which congestion may occur.

In a space such as an event venue where congestion may occur, suspicious persons behaving abnormally must be found early to prevent panic and similar incidents. To meet this need, it is expected, for example, that surveillance cameras placed throughout the venue will be used to estimate the distribution of people from captured images and display the estimated distribution, making it easier for security staff to grasp the congestion situation. If, in addition, the position of each individual is detected and indicated, for example by displaying a model imitating a human shape at each detected position, monitoring efficiency can be expected to improve further.

One method of detecting the position of each person in a captured image showing multiple people is to search the captured image with a classifier that has learned in advance the features of images showing a single person, thereby detecting the positions in the captured image where the image features of a single person appear.

In search processing using a classifier, several candidate positions may be extracted close to one another for each person, and the position of each person is generally determined from the candidate positions extracted close together. For example, the target detection device described in Patent Document 1 below extracts candidate regions whose index value (classifier score) exceeds a first threshold, and generates region groups each consisting of candidate regions that overlap one another by at least a certain ratio. It then detects the highest-scoring candidate region in each region group as the target region (a person's region), or averages the candidate regions forming each region group to obtain the target region.
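The fixed-criterion grouping attributed to Patent Document 1 can be illustrated with a minimal Python sketch. The overlap measure (intersection over union), the single-linkage grouping and the `min_ratio` of 0.5 are assumptions for illustration; the text only says that regions overlapping "by at least a certain ratio" are grouped and that the best-scoring region of each group is kept.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float
    score: float


def overlap_ratio(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes (assumed overlap measure)."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def group_fixed(boxes, min_ratio=0.5):
    """Put boxes overlapping by >= min_ratio into one group (single linkage)."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if overlap_ratio(boxes[i], boxes[j]) >= min_ratio:
                parent[find(i)] = find(j)
    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    return list(groups.values())


def best_per_group(groups):
    """Keep the highest-scoring box of each group as the detected region."""
    return [max(g, key=lambda b: b.score) for g in groups]
```

Because `min_ratio` is the same everywhere, two tightly packed people whose boxes overlap by more than 0.5 collapse into one group, which is exactly the failure the following paragraphs describe.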

Patent Document 1: JP 2016-040705 A

However, in a captured image of an event venue or the like, the candidate regions of adjacent persons in a crowded area may be extracted overlapping one another by at least that ratio. In the related art, a single person's region is then determined from a region group that mixes the candidate regions of several people, which can cause missed detections.

On the other hand, if region groups are generated over a narrow range to prevent missed detections, several region groups may be generated in an uncongested area even though only one person is captured there, which can cause excessive detections.

Thus, when candidate regions (candidate positions) of objects are always integrated by a fixed criterion regardless of the congestion state, differences in the congestion state between regions and changes in the congestion state reduce the accuracy with which object positions are detected.

The present invention has been made in view of the above problem, and an object thereof is to provide an object detection device capable of detecting the positions of individual objects with high accuracy regardless of the congestion state.

(1) An object detection device according to the present invention detects the positions of individual objects from a captured image of a space in which congestion of the objects may occur, and comprises: congestion estimation means that analyzes an arbitrary region of the captured image to estimate the degree of congestion of the objects captured in that region; candidate position extraction means that uses a single-object classifier, which has learned the features of single-object images each showing one object, to extract candidate positions having the features of a single-object image in the captured image; group generation means that sets a higher lower limit on the degree of proximity between candidate positions at positions in the captured image where the degree of congestion is higher, and generates candidate position groups consisting of candidate positions that are at least as close as that lower limit; and object position determination means that determines, for each candidate position group, the position of the object based on the candidate positions belonging to that group.

(2) In the object detection device according to (1), the candidate position extraction means may extract, with reference to each candidate position, a candidate region having the features of a single-object image, and the group generation means may measure the degree of proximity by the proportion of overlap between candidate regions, set a larger lower-limit overlap ratio at positions in the captured image where the degree of congestion is higher, and generate candidate position groups corresponding to candidate regions that overlap by at least the lower-limit ratio.
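Variant (2) can be sketched in Python as overlap-based grouping whose threshold depends on the local density class. The ratio table `MIN_OVERLAP`, the use of intersection over union, and the rule that a pair is judged by the more congested of its two positions are illustrative assumptions, not values or rules taken from the text.

```python
# Hypothetical lower-limit overlap ratios per density class (not from the source).
MIN_OVERLAP = {"low": 0.3, "medium": 0.5, "high": 0.7}
RANK = {"low": 0, "medium": 1, "high": 2}


def iou(a, b):
    """Intersection-over-union of boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def group_by_adaptive_overlap(cands):
    """cands: list of (box, density_class). Returns groups as sorted index lists."""
    parent = list(range(len(cands)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(cands)):
        for j in range(i + 1, len(cands)):
            (box_i, d_i), (box_j, d_j) = cands[i], cands[j]
            # Use the more congested of the two positions, i.e. the stricter limit.
            d = max(d_i, d_j, key=RANK.__getitem__)
            if iou(box_i, box_j) >= MIN_OVERLAP[d]:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(cands)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

For two boxes with an IoU of about 0.67, the pair is merged in a low-density area but kept apart in a high-density area, so neighbours in a crowd are less likely to be fused into one detection.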

(3) In the object detection device according to (1), the group generation means may measure the degree of proximity by the distance between candidate positions, set a smaller upper limit on that distance at positions in the captured image where the degree of congestion is higher, and extract candidate position groups consisting of candidate positions whose distance is at or below that upper limit.
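Variant (3) replaces the overlap ratio with a distance limit that shrinks as density rises. The following Python sketch assumes pixel distances, a hypothetical `MAX_DIST` table, and single-linkage grouping in which each pair uses the smaller (stricter) limit of its two positions.

```python
import math

# Hypothetical upper-limit distances in pixels per density class (not from the source).
MAX_DIST = {"low": 20.0, "medium": 12.0, "high": 6.0}


def group_by_adaptive_distance(cands):
    """cands: list of (x, y, density_class). Returns groups as sorted index lists."""
    parent = list(range(len(cands)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(cands)):
        for j in range(i + 1, len(cands)):
            xi, yi, di = cands[i]
            xj, yj, dj = cands[j]
            # The stricter (smaller) limit of the two positions applies.
            limit = min(MAX_DIST[di], MAX_DIST[dj])
            if math.hypot(xi - xj, yi - yj) <= limit:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(cands)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Single linkage chains candidates: three points 4 px apart end up in one group even though the outer two are 8 px apart. A complete-linkage or seed-based rule would avoid this, at some extra cost.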

(4) In the object detection device according to any one of (1) to (3), the congestion estimation means may use a density estimator that has learned, for each of a set of predetermined densities, the features of density images showing a space in which the objects are present at that density, and estimate, as the degree of congestion, the density of the objects captured in an arbitrary region of the captured image.

(5) In the object detection device according to (4), the group generation means may set, for an arbitrary region of the captured image, an upper limit on the number of candidate positions forming a candidate position group in that region according to the ratio of the density of candidate positions extracted in that region by the candidate position extraction means to the density of objects estimated in that region by the congestion estimation means, and generate candidate position groups each consisting of no more than that upper limit number of candidate positions.
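One possible reading of variant (5) is sketched below in Python: the ratio of candidate density to estimated object density approximates how many candidates each person produces, and that number (rounded up) caps the group size. Both the rounding rule in `group_size_limit` and the greedy, score-seeded grouping in `greedy_capped_groups` are hypothetical choices; the text does not say how groups are formed under the cap.

```python
import math


def group_size_limit(num_candidates, area_m2, est_density):
    """Upper limit on candidates per group in one region (hypothetical rounding rule).

    est_density: estimated persons/m^2 in the region, assumed > 0.
    """
    cand_density = num_candidates / area_m2
    ratio = cand_density / est_density  # roughly, candidates produced per person
    return max(1, math.ceil(ratio))


def greedy_capped_groups(cands, max_dist, limit):
    """cands: list of (x, y, score). Seed each group with the best unassigned
    candidate and add its nearest unassigned neighbours, at most `limit` members."""
    order = sorted(range(len(cands)), key=lambda i: -cands[i][2])
    assigned = set()
    groups = []
    for seed in order:
        if seed in assigned:
            continue
        sx, sy = cands[seed][:2]
        near = [j for j in order
                if j != seed and j not in assigned
                and math.dist((sx, sy), cands[j][:2]) <= max_dist]
        near.sort(key=lambda j: math.dist((sx, sy), cands[j][:2]))
        members = [seed] + near[:limit - 1]
        assigned.update(members)
        groups.append(members)
    return groups
```

For example, 6 candidates in a 2 m² region estimated at 1 person/m² give a limit of 3, so a tight cluster of six candidates is split into two groups rather than being reported as one person.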

According to the present invention, an object detection device capable of detecting the positions of individual objects with high accuracy regardless of the congestion state can be obtained.

FIG. 1 is a block diagram showing the schematic configuration of an image monitoring device according to an embodiment of the present invention.
FIG. 2 is a schematic functional block diagram of the image monitoring device according to the embodiment of the present invention.
FIG. 3 is a diagram schematically showing the single-object classifier information stored in the single-object classifier storage means.
FIG. 4 is a schematic diagram showing examples of candidate positions and candidate position groups for each density class.
FIG. 5 is a flowchart showing the schematic operation of the image monitoring device according to the embodiment of the present invention.
FIG. 6 is a schematic flowchart of the candidate position extraction process.
FIG. 7 is a schematic flowchart of the candidate position integration process.

Hereinafter, as an embodiment of the present invention, an image monitoring device 1 is described that includes an example of an object detection device that detects individual persons from images captured of an event venue, and that displays the detection results to security staff.

[Configuration of Image Monitoring Device 1]
FIG. 1 is a block diagram showing the schematic configuration of the image monitoring device 1. The image monitoring device 1 consists of an imaging unit 2, a communication unit 3, a storage unit 4, an image processing unit 5, and a display unit 6.

The imaging unit 2 is a surveillance camera connected to the image processing unit 5 via the communication unit 3. It is imaging means that captures the monitored space at predetermined time intervals to generate captured images and sequentially inputs them to the image processing unit 5. For example, the imaging unit 2 is mounted on a pole installed at the event venue with a field of view overlooking the monitored space. The field of view may be fixed, or may be changed according to a preset schedule or external instructions received via the communication unit 3. The imaging unit 2 captures the monitored space with, for example, a frame period of 1 second to generate color images; it may generate monochrome images instead.

The communication unit 3 is a communication circuit, one end of which is connected to the image processing unit 5 and the other end to the imaging unit 2 and the display unit 6 via a coaxial cable or a communication network such as a LAN (Local Area Network) or the Internet. The communication unit 3 acquires captured images from the imaging unit 2 and inputs them to the image processing unit 5, and outputs the detection results received from the image processing unit 5 to the display unit 6.

The storage unit 4 is a memory device such as a ROM (Read Only Memory) or RAM (Random Access Memory) and stores various programs and data. The storage unit 4 is connected to the image processing unit 5 and exchanges this information with it.

The image processing unit 5 consists of an arithmetic device such as a CPU (Central Processing Unit), DSP (Digital Signal Processor), or MCU (Micro Control Unit). The image processing unit 5 is connected to the storage unit 4 and operates as various processing and control means by reading and executing programs from the storage unit 4; it stores various data in the storage unit 4 and reads data from it. The image processing unit 5 is also connected to the imaging unit 2 and the display unit 6 via the communication unit 3. It detects individual persons by analyzing the captured images acquired from the imaging unit 2 via the communication unit 3, and outputs the detection results to the display unit 6 via the communication unit 3.

The display unit 6 is a display device such as a liquid crystal display or a CRT (Cathode Ray Tube) display. It is display means, connected to the image processing unit 5 via the communication unit 3, that displays the detection results of the image processing unit 5. Security staff view the displayed detection results to judge whether congestion or the like has occurred and take measures such as reassigning personnel as needed.

In the present embodiment, the image monitoring device 1 has imaging units 2 and image processing units 5 in a one-to-one relationship, but in other embodiments they may be in a many-to-one or many-to-many relationship.

[Functions of Image Monitoring Device 1]
FIG. 2 is a functional block diagram showing the functions of the image monitoring device 1. The communication unit 3 functions as image acquisition means 30, object position output means 31, and so on, and the storage unit 4 functions as density estimator storage means 40, single-object classifier storage means 41, and so on. The image processing unit 5 functions as density estimation means 50, candidate position extraction means 51, group generation means 52, object position determination means 53, and so on.

The image acquisition means 30 sequentially acquires captured images from the imaging unit 2, which serves as the imaging means, and sequentially outputs them to the density estimation means 50 and the candidate position extraction means 51.

The density estimator storage means 40 stores in advance information representing an estimator (density estimator). The density estimator is an estimated-density calculation function that has learned, for each predetermined density, the image features of images (density images) of a space in which objects (persons) are present at that density; given the features of an image, it calculates and outputs an estimate (estimated density) of the density of the objects captured in that image. That is, the density estimator storage means 40 stores in advance parameters such as the coefficients of this estimated-density calculation function as the density estimator information.

The density estimation means 50 is congestion estimation means that analyzes an arbitrary region of the captured image input from the image acquisition means 30 and estimates the density of the objects captured in that region as the degree of congestion of the objects in that region. Specifically, the density estimation means 50 extracts features for density estimation (estimation features) from the captured image of a given region, reads the density estimator from the density estimator storage means 40, and estimates the density by inputting each extracted estimation feature into the density estimator. Performing this estimation at multiple positions in the captured image yields the distribution of estimated density over the captured image (the object density distribution), which the density estimation means 50 outputs to the candidate position extraction means 51.

The density estimation process and the density estimator are described in detail below.

The density estimation means 50 sets a window (estimation extraction window) at each pixel position of the captured image and extracts estimation features from the captured image within each estimation extraction window. The estimation features are GLCM (Gray Level Co-occurrence Matrix) features.
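A GLCM counts how often pairs of quantized gray levels occur at a fixed pixel offset; scalar statistics of that matrix then serve as texture features. The pure-Python sketch below is illustrative only: the 8 gray levels, the single offset `(dx, dy) = (1, 0)`, and the choice of contrast, energy and homogeneity are assumptions, since the text does not specify which GLCM statistics are used.

```python
def glcm(gray, levels=8, dx=1, dy=0):
    """Symmetric, normalized gray-level co-occurrence matrix for offset (dx, dy).

    gray: 2-D list of 8-bit intensities (0..255), quantized to `levels` bins.
    """
    h, w = len(gray), len(gray[0])
    q = [[v * levels // 256 for v in row] for row in gray]
    m = [[0.0] * levels for _ in range(levels)]
    total = 0
    for y in range(h):
        for x in range(w):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < w and 0 <= y2 < h:
                a, b = q[y][x], q[y2][x2]
                m[a][b] += 1  # count the pair in both directions (symmetric GLCM)
                m[b][a] += 1
                total += 2
    return [[c / total for c in row] for row in m] if total else m


def glcm_features(m):
    """Three common Haralick-style statistics of a normalized GLCM."""
    cells = [(i, j, p) for i, row in enumerate(m) for j, p in enumerate(row)]
    contrast = sum(p * (i - j) ** 2 for i, j, p in cells)
    energy = sum(p * p for _, _, p in cells)
    homogeneity = sum(p / (1 + abs(i - j)) for i, j, p in cells)
    return contrast, energy, homogeneity
```

A flat patch gives zero contrast and maximal energy, while the fine, repetitive texture typical of a dense crowd raises contrast; this difference is what a density classifier trained on such features can exploit.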

It is desirable that the region of the monitored space captured in each estimation extraction window be of the same size. That is, the density estimation means 50 preferably reads the camera parameters of the imaging unit 2, stored in advance in camera parameter storage means (not shown), transforms the captured image by a homography based on those camera parameters so that every pixel of the captured image covers a region of the same size in the monitored space, and then extracts the estimation features.
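The homography step can be pictured as mapping each pixel through a 3x3 matrix in homogeneous coordinates. The Python sketch below shows only the point mapping and the inverse used to bring results back to the original image; the example matrix is made up, and deriving the actual matrix from the camera parameters, as well as resampling a whole image, are outside its scope.

```python
def apply_homography(H, x, y):
    """Map pixel (x, y) through the 3x3 homography H (row-major nested lists)."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity")
    return u / w, v / w


def invert3(H):
    """Inverse of a 3x3 matrix via the adjugate, for mapping results back."""
    (a, b, c), (d, e, f), (g, h, i) = H
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("singular homography")
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / det for v in row] for row in adj]
```

With the illustrative matrix `H = [[1, 0, 0], [0, 1, 0], [0, 0.01, 1]]`, the point (10, 100) maps to about (5, 50), so rows further down the image are compressed more, which is the kind of perspective correction that equalizes per-pixel ground area.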

The density estimator can be implemented as a classifier that distinguishes multiple classes of images, for example a discriminant function learned by the multi-class SVM (Support Vector Machine) method.

The density can be defined, for example, as four classes: a "background" class in which no persons are present; a "low density" class above 0 persons/m² and at most 2 persons/m²; a "medium density" class above 2 persons/m² and at most 4 persons/m²; and a "high density" class above 4 persons/m².
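The class boundaries above can be written as a small Python helper. In the embodiment the density estimator outputs a class directly from image features, so this function is only a hypothetical aid, for example for labelling training density images whose head counts per square metre are already known.

```python
def density_class(persons_per_m2: float) -> str:
    """Map a crowd density (persons/m^2) to the four example classes."""
    if persons_per_m2 <= 0:
        return "background"
    if persons_per_m2 <= 2:
        return "low"
    if persons_per_m2 <= 4:
        return "medium"
    return "high"
```

Each interval is open at the bottom and closed at the top, so exactly 2 persons/m² is "low" and exactly 4 persons/m² is "medium".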

The estimated density is a value assigned in advance to each class and is output as the result of distribution estimation. In this embodiment, the values corresponding to the classes are written as "background", "low density", "medium density", and "high density".

That is, the density estimator is a discriminant function for distinguishing the density images of each class from those of the other classes, obtained by applying the multi-class SVM method to the features of a large number of images (density images) belonging to the "background", "low density", "medium density", and "high density" classes. The parameters of the discriminant function derived by this learning are stored as the density estimator. The features of the density images are of the same type as the estimation features, namely GLCM features.

The density estimation means 50 inputs each estimation feature extracted for each pixel into the density estimator and obtains its output value as the estimated density. If the captured image was transformed before the estimation features were extracted, the density estimation means 50 transforms the density distribution back into the shape of the original captured image by a homography using the camera parameters.

The set of estimated densities for the pixels of the captured image obtained in this way is the density distribution. The density distribution output by the density estimation means 50 shows how crowded or sparse people are in each part of the captured image, but it does not reveal the positions of individual persons. The candidate position extraction means 51, group generation means 52, and object position determination means 53, which follow the density estimation means 50, are the means that detect the position of each person appearing in the captured image.

The single-object classifier storage means 41 stores in advance classifiers (single-object classifiers) that have learned the features of images (single-object images) in which a single person (object) is captured.

FIG. 3 schematically shows the single-object classifier information stored in the single-object classifier storage means 41.

A single-object classifier is represented by parameters such as the coefficients of an evaluation value calculation function, which, given the features of an image, calculates and outputs an evaluation value (identification score) representing the likelihood that the image is a single-object image, and a threshold applied to the identification score.

A single-object classifier can be a classifier learned by applying the linear SVM method to the features of training images consisting of a large number of single-object images and a large number of person-free images, each of which shows only things other than persons.

When linear SVM is used as the learning algorithm, the coefficients of the evaluation value calculation function form a weight vector. This weight vector holds one weight for each element of the feature vector, and the inner product of the input image's features and the weight vector is the identification score. Training adjusts the weights so that an image is identified as a person when this inner product is greater than 0 and as a non-person when it is 0 or less. The threshold for deciding whether an input image is a single-object image is therefore 0 in principle, and it can normally be set to 0. However, the threshold may be set below 0 to reduce errors in which a single-object image is identified as not being one.
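The scoring rule above fits in a few lines of Python. The weights and features in the usage below are made up. The text defines the score as the inner product alone; many linear SVM implementations also learn a bias term, which can be folded into this form by appending a constant 1 to every feature vector.

```python
def identification_score(weights, features):
    """Linear-SVM score: inner product of the weight vector and the feature vector."""
    if len(weights) != len(features):
        raise ValueError("dimension mismatch")
    return sum(w * f for w, f in zip(weights, features))


def is_single_object(weights, features, threshold=0.0):
    """Person if the score exceeds the threshold (0 in principle, or slightly below 0)."""
    return identification_score(weights, features) > threshold
```

Lowering `threshold` below 0 accepts borderline windows, trading more false candidates for fewer missed people; the later grouping step is what absorbs the extra candidates.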

The features of the training images are HOG (Histograms of Oriented Gradients) features.

The single-object classifiers stored in the single-object classifier storage means 41 are chosen so that the higher the density, the smaller the part of a single object whose image features the classifier has learned. The single-object classifier storage means 41 stores a whole-body classifier 100, a single-object classifier that has learned the image features of the whole body of a single person, in association with the value representing the low density class; an upper-body classifier 101, a single-object classifier that has learned the image features of the upper 2/3 of a single person, in association with the value representing the medium density class; and a head-region classifier 102, a single-object classifier that has learned the image features of the upper 1/3 of a single person, in association with the value representing the high density class.

The whole-body classifier 100 is a single-object classifier learned from single-object images capturing the whole body of a single person. The upper-body classifier 101 is learned from single-object images capturing the upper 2/3 of a single person (for example, images cut from the upper 2/3 of whole-body single-object images), and the head-region classifier 102 is learned from single-object images capturing the upper 1/3 of a single person (for example, images cut from the upper 1/3 of whole-body single-object images).

In this way, the single-object classifier storage means 41 stores the whole-body classifier 100 associated with the low density class, the upper-body classifier 101 associated with the medium density class, and the head-region classifier 102 associated with the high density class.
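The density-to-classifier association can be expressed as a lookup table. In this Python sketch the classifier names are hypothetical labels for classifiers 100 to 102, the body height in pixels at a position is assumed to be known (for example from the camera parameters), and the fixed width ratio and the `None` result for background positions are illustrative assumptions not stated in the text.

```python
# Density class -> (single-object classifier, fraction of body height it covers),
# following the low/medium/high assignment described above.
CLASSIFIERS = {
    "low": ("whole_body_100", 1.0),
    "medium": ("upper_body_101", 2 / 3),
    "high": ("head_region_102", 1 / 3),
}


def select_classifier(density, body_height_px, width_ratio=0.4):
    """Return (classifier name, (window width, window height)) in pixels.

    width_ratio is a hypothetical body-width/body-height ratio; the window keeps
    the body width and only loses its lower part as density rises.
    Returns None for positions whose density has no associated classifier.
    """
    if density not in CLASSIFIERS:
        return None
    name, fraction = CLASSIFIERS[density]
    return name, (body_height_px * width_ratio, body_height_px * fraction)
```

Keeping the mapping in data rather than in branching code makes it easy to add or retune density classes without touching the extraction loop.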

The candidate position extraction means 51 reads the single-object classifiers from the single-object classifier storage means 41, uses them to extract candidate positions having the features of a single-object image in the captured image, and outputs the extracted candidate positions to the group generation means 52.

Specifically, the candidate position extraction means 51 first sets multiple evaluation positions at predetermined intervals in the captured image and sets an identification window (identification extraction window) with reference to each evaluation position. For example, the candidate position extraction means 51 sets evaluation positions at one-pixel intervals over the entire captured image, treats each pixel position as an evaluation position representing the centroid of a person's head, and sets an identification extraction window with reference to that position.

Having set the identification extraction windows, the candidate position extraction means 51 extracts the features of the image in each identification extraction window and inputs them into a single-object classifier to obtain the identification score of each evaluation position.

At this time, to make the image used for identification as large as possible while accounting for occlusion caused by crowding, the candidate position extraction means 51 sets a smaller identification extraction window where the object density at an evaluation position is higher and a larger one where the density is lower, and uses the single-object classifier corresponding to the size of the identification extraction window.

To this end, the candidate position extraction means 51 sets, at each evaluation position, a window shaped like the upper 1/3 of a single person, refers to the density distribution input from the density estimation means 50, and takes the most frequent estimated density within that window as the density of the evaluation position.
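Taking the most frequent density class inside a window is a mode computation over the density map. The Python sketch below assumes the density map is a 2-D list of class labels and that the window is an axis-aligned rectangle; the text does not say how ties are resolved, so breaking them toward the more congested class is an assumption.

```python
from collections import Counter

# Ordering used only to break ties (assumption: prefer the more congested class).
RANK = {"background": 0, "low": 1, "medium": 2, "high": 3}


def window_density(density_map, x0, y0, x1, y1):
    """Most frequent class in density_map[y][x] over x0 <= x < x1, y0 <= y < y1.

    The window is clipped to the map; returns None if nothing remains.
    """
    h, w = len(density_map), len(density_map[0])
    counts = Counter(
        density_map[y][x]
        for y in range(max(0, y0), min(h, y1))
        for x in range(max(0, x0), min(w, x1))
    )
    if not counts:
        return None
    return max(counts, key=lambda c: (counts[c], RANK[c]))
```

Using the small head-sized window for this vote, rather than the full-body window, keeps the density decision local to the person being evaluated.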

The candidate position extraction unit 51 then proceeds according to the density at each evaluation position:

- Low density: it sets an identification extraction window shaped like the whole body of a single person. It extracts a feature quantity for single-object identification (identification feature quantity) from the captured image within the window and inputs it to the whole-body classifier to obtain an identification score.
- Medium density: it sets a window shaped like the upper two-thirds of a single person, extracts the identification feature quantity from the image within the window, and inputs it to the upper-body classifier to obtain an identification score.
- High density: it sets a window shaped like the upper one-third of a single person, extracts the identification feature quantity, and inputs it to the near-head classifier to obtain an identification score.
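The density-to-window mapping above can be captured in a small lookup table. The sketch below makes several assumptions:

- Windows are axis-aligned rectangles of constant width.
- The top edge sits a fixed `head_offset` above the head centroid.
- Each window covers the stated fraction of a whole-person height.

The classifier keys, `WindowSpec`, and `identification_window` are hypothetical names used for illustration only.

```python
from dataclasses import dataclass

# Hypothetical integer labels for the three non-background density classes.
LOW, MEDIUM, HIGH = 1, 2, 3


@dataclass(frozen=True)
class WindowSpec:
    classifier: str        # hypothetical key of the single-object classifier
    body_fraction: float   # portion of a whole-person window, from the top


WINDOW_BY_DENSITY = {
    LOW: WindowSpec("whole_body", 1.0),
    MEDIUM: WindowSpec("upper_body", 2 / 3),
    HIGH: WindowSpec("near_head", 1 / 3),
}


def identification_window(x, y, density, person_w, person_h, head_offset):
    """Return (classifier key, (left, top, width, height)) for one position.

    (x, y) is the head centroid; head_offset is the assumed vertical distance
    from the top of a whole-person window to the head centroid. Keeping the
    width constant across classes is also an assumption.
    """
    spec = WINDOW_BY_DENSITY[density]
    left = x - person_w / 2
    top = y - head_offset
    return spec.classifier, (left, top, person_w, person_h * spec.body_fraction)


print(identification_window(100, 50, HIGH, 30, 90, 10))
```

With this mapping, a higher density class always yields a shorter window, which matches the rule that the window shrinks as congestion increases.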

Having obtained an identification score for each evaluation position, the candidate position extraction unit 51 compares each score with a predetermined threshold. Evaluation positions whose score is equal to or higher than the threshold are extracted as candidate positions. For example, the SVM described above uses a threshold of 0 to separate persons from non-persons; in that case the candidate position extraction unit 51 extracts the evaluation positions whose identification score is greater than 0. For each candidate position, it then outputs candidate position information to the group generation unit 52. This information associates the candidate position with its density, its identification score, the threshold of the single-object classifier used, and the identification extraction window used.
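A minimal Python sketch of the thresholding step and the candidate position information record follows. The field and function names are hypothetical. The text says "equal to or higher than", while the SVM example and step S404 of FIG. 6 compare strictly; switch `>=` to `>` for the strict reading.

```python
from dataclasses import dataclass


@dataclass
class CandidateInfo:
    position: tuple    # (x, y) evaluation position
    density: int       # density class at the position
    score: float       # identification score from the single-object classifier
    threshold: float   # threshold of the classifier that was used
    window: tuple      # identification extraction window (left, top, width, height)


def select_candidates(evaluations, threshold=0.0):
    """Keep evaluation positions whose score is at or above the threshold.

    evaluations: iterable of (position, density, score, window) tuples.
    """
    return [
        CandidateInfo(pos, density, score, threshold, window)
        for pos, density, score, window in evaluations
        if score >= threshold
    ]


evaluations = [
    ((0, 0), 1, 0.5, (0, 0, 10, 10)),
    ((1, 0), 1, -0.2, (1, 0, 10, 10)),
    ((2, 0), 2, 0.0, (2, 0, 10, 10)),
]
print([c.position for c in select_candidates(evaluations)])  # [(0, 0), (2, 0)]
```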

In many cases, several candidate positions are extracted for each individual person. The group generation unit 52 therefore generates groups (candidate position groups), each consisting of one or more candidate positions considered to belong to the same person. The object position determination unit 53 then merges the candidate positions of each candidate position group into one, which gives the position of each individual person (object position).

FIG. 4 is a schematic diagram showing examples of candidate positions and candidate position groups for each density class. FIG. 4(a) shows a low-density region, FIG. 4(b) a medium-density region, and FIG. 4(c) a high-density region.

The left part of each of FIGS. 4(a) to 4(c) shows the identification extraction windows of the candidate positions extracted near the persons:

- In the low-density region of FIG. 4(a), four candidate positions are extracted near one person 200. They are shown as four identification extraction windows 201 whose shape and size correspond to the whole body.
- In the medium-density region of FIG. 4(b), five candidate positions are extracted near two adjacent persons 210 and 211. They are shown as five identification extraction windows 212 whose shape and size correspond to the upper two-thirds of a person.
- In the high-density region of FIG. 4(c), seven candidate positions are extracted near three adjacent persons 220 to 222. They are shown as seven identification extraction windows 223 whose shape and size correspond to the upper one-third of a person.

The right part of each of FIGS. 4(a) to 4(c) shows the candidate position groups generated from the candidate positions in the corresponding left part.

In a region where a high density is estimated, people are close to one another. Generating candidate position groups over a wide range there would wrongly merge the candidate positions of several people into one group, so a person's position would be missed. Conversely, in a region where a low density is estimated, candidate positions for the same person can be spread over a wider range than in a high-density region. Generating candidate position groups over a narrow range there would wrongly split one person's candidate positions into several groups, so the person would be detected more than once.

The group generation unit 52 therefore refers to the density estimated at each candidate position. It generates each candidate position group from candidate positions lying within a range (integration range) that is narrower where the density in the captured image is higher. For example:

- In a low-density region, the integration range is wide. As shown on the right of FIG. 4(a), one candidate position group 202 can be generated from all four candidate positions indicated by the identification extraction windows 201 near person 200.
- In a medium-density region, the integration range is set narrower. As shown on the right of FIG. 4(b), two candidate position groups 213 and 214, corresponding to persons 210 and 211, can be generated from the five candidate positions indicated by the windows 212.
- In a high-density region, the integration range is narrower still. As shown on the right of FIG. 4(c), three candidate position groups 224 to 226, corresponding to persons 220 to 222, can be generated from the seven candidate positions indicated by the windows 223.

The integration range can be defined with any measure of how close candidate positions are to one another. The group generation unit 52 sets a lower limit on the degree of proximity. Candidate positions that are at least that close are treated as lying within the integration range and are placed in the same candidate position group. The group generation unit 52 sets a higher lower limit on proximity at positions of higher density in the captured image, and a lower one at positions of lower density.

For each generated candidate position group, the group generation unit 52 attaches the group's identifier to the candidate position information of its member candidate positions. It then outputs the information on each candidate position group to the object position determination unit 53.

Specifically, the group generation unit 52 can measure proximity by the overlap ratio between the identification extraction windows (candidate regions) set for the respective candidate positions, and use this ratio to control the integration range. A lower-limit ratio is set for the overlap ratio, and identification extraction windows that overlap at or above the lower-limit ratio are regarded as lying within the integration range. The group generation unit 52 sets a higher lower-limit ratio at positions of higher density in the captured image. It then extracts candidate position groups, each made up of candidate regions that overlap at or above the lower-limit ratio.

For example, the overlap ratio between candidate regions A and B is defined by equation (1), and the lower-limit ratio is set as follows:

- 0.5 for low-density candidate positions
- 0.65 for medium-density candidate positions
- 0.8 for high-density candidate positions

In equation (1), S_A, S_B, and S_{A∩B} denote the area of candidate region A, the area of candidate region B, and the area of the overlap between candidate regions A and B, respectively.
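Equation (1) itself is not reproduced in this text. The Python sketch below therefore substitutes intersection over union, S_{A∩B} / (S_A + S_B - S_{A∩B}), as an assumed overlap ratio. It pairs that ratio with the density-dependent lower-limit ratios listed above. The box format (left, top, width, height) and the integer density labels are also assumptions.

```python
# Hypothetical integer labels for the density classes.
LOW, MEDIUM, HIGH = 1, 2, 3

# Lower-limit overlap ratios from the text.
LOWER_LIMIT_RATIO = {LOW: 0.5, MEDIUM: 0.65, HIGH: 0.8}


def overlap_ratio(a, b):
    """Intersection over union of two boxes given as (left, top, width, height).

    IoU is an assumed stand-in for equation (1).
    """
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h              # S_{A∩B}
    union = aw * ah + bw * bh - inter      # S_A + S_B - S_{A∩B}
    return inter / union if union > 0 else 0.0


def within_integration_range(a, b, density):
    """True if windows a and b overlap at or above the class's lower limit."""
    return overlap_ratio(a, b) >= LOWER_LIMIT_RATIO[density]


a, b = (0, 0, 10, 10), (2, 0, 10, 10)   # overlap 80 / union 120
print([within_integration_range(a, b, d) for d in (LOW, MEDIUM, HIGH)])
# [True, True, False]
```

The same pair of windows is merged in low- and medium-density regions but kept apart in a high-density region, which is the intended effect of raising the lower limit with density.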

The group generation unit 52 also guards against missed detections caused by people being unevenly distributed. For any region in the captured image, it considers two densities: the object density estimated by the density estimation unit 50 for that region, and the density of candidate positions extracted by the candidate position extraction unit 51 in that region. According to the ratio of the second to the first, it sets an upper limit on the number of candidate positions that may make up one candidate position group in that region. It then generates candidate position groups consisting of no more than that number of candidate positions.

Specifically, for each density estimated by the density estimation unit 50, the group generation unit 52 sets the upper limit according to the number of extracted candidate positions and the size of the region for which that density was estimated. For example:

1. Suppose a region is estimated to be in the medium-density class, which was learned from density images of more than 2 persons/m² and at most 4 persons/m², and the region corresponds to 3.5 m².
2. The number of people captured in that region is then estimated at 7 to 14.
3. If 40 candidate positions are extracted from that region, a candidate position group is estimated to contain 2.9 to 5.7 candidate positions on average (40/14 ≈ 2.9, 40/7 ≈ 5.7).
4. Accordingly, the group generation unit 52 sets the upper limit for medium-density candidate position groups to 6.
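The worked example above can be checked with a few lines of Python. The source does not state how 5.7 becomes 6. Rounding the larger average up to the next integer is an assumption that reproduces the example. The function names are hypothetical.

```python
import math


def expected_group_size(num_candidates, area_m2, density_min, density_max):
    """Range of the average number of candidate positions per person."""
    people_min = density_min * area_m2   # fewest people the class allows
    people_max = density_max * area_m2   # most people the class allows
    return num_candidates / people_max, num_candidates / people_min


def upper_limit_count(num_candidates, area_m2, density_min):
    """Upper limit on candidate positions per group, or None if unbounded.

    Rounding the larger average up is an assumption, not a rule from the text.
    """
    people_min = density_min * area_m2
    if people_min <= 0:
        return None  # e.g. a class whose lower density bound is 0
    return math.ceil(num_candidates / people_min)


low, high = expected_group_size(40, 3.5, 2, 4)
print(round(low, 1), round(high, 1))   # 2.9 5.7
print(upper_limit_count(40, 3.5, 2))   # 6
```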

For each candidate position group, the object position determination unit 53 determines the object position, for example as the group member with the highest identification score. It then outputs information on the determined object positions to the object position output unit 31.

The object position output unit 31 sequentially outputs the object position information received from the object position determination unit 53 to the display unit 6, which displays it. For example, the object position information is transmitted over the Internet and displayed on the display unit 6. By viewing the displayed information, monitoring staff can identify points in the monitored space where congestion is occurring. They can then respond, for example by dispatching security guards to those points or increasing their number.

[Operation of Image Monitoring Device 1]
The operation of the image monitoring device 1 will be described with reference to the flowcharts of FIGS. 5, 6, and 7.

When the image monitoring device 1 starts operating, the imaging unit 2 installed at the event venue captures the monitored space at predetermined time intervals. It sequentially transmits the captured images to the image analysis center where the image processing unit 5 is installed. Each time the image processing unit 5 receives a captured image, it repeats the operation shown in the flowchart of FIG. 5.

First, the communication unit 3 operates as the image acquisition unit 30 and waits to receive a captured image from the imaging unit 2. On acquiring a captured image, the image acquisition unit 30 outputs it to the image processing unit 5 (step S1).

On receiving the captured image, the image processing unit 5 operates as the density estimation unit 50 and estimates the density distribution from the captured image (step S2). The density estimation unit 50 proceeds as follows:

1. It extracts an estimation feature quantity at each pixel position of the captured image.
2. It reads the density estimator from the density estimator storage unit 40 of the storage unit 4.
3. It inputs each estimation feature quantity to the density estimator to obtain the estimated density at each pixel, which gives the density distribution of the captured image.

Having estimated the density distribution, the image processing unit 5 also operates as the candidate position extraction unit 51. The candidate position extraction unit 51 receives the captured image from the image acquisition unit 30 and the density distribution from the density estimation unit 50. It then checks whether the density distribution contains any estimated density other than the background class (step S3).

If an estimated density other than the background class is present (YES in step S3), the candidate position extraction unit 51 judges that at least one person has been captured. It then extracts candidate positions of individual objects from the captured image (step S4). If only the background class is present (NO in step S3), it judges that no person has been captured, and steps S4 and S5 are skipped.

The candidate position extraction process of step S4 will be described with reference to the flowchart of FIG. 6.

The candidate position extraction unit 51 sets the position of each pixel in the captured image in turn as the evaluation position (step S400). It then refers to the density distribution input from the density estimation unit 50 to determine the density at the evaluation position (step S401). Specifically, it sets a window shaped like the upper one-third of a single person at the evaluation position. The most frequent estimated density within that window is taken as the density of the evaluation position.

Having determined the density, the candidate position extraction unit 51 reads the single-object classifier for that density from the single-object classifier storage unit 41. It sets the identification extraction window for that density and extracts the identification feature quantity from the captured image within the window (step S402). It then inputs the identification feature quantity to the single-object classifier for that density to calculate the identification score (evaluation value) (step S403).

If the evaluation value at the evaluation position exceeds the predetermined threshold (YES in step S404), the candidate position extraction unit 51 takes the evaluation position as a candidate position of an object and generates candidate position information for it (step S405). If the evaluation value does not exceed the threshold (NO in step S404), the evaluation position is not taken as a candidate position, and step S405 is skipped.

After finishing steps S404 and S405 for a given pixel, the candidate position extraction unit 51 checks whether every pixel position in the captured image has been set as an evaluation position (step S406). If any pixel remains (NO in step S406), processing returns to step S400 for the next pixel position.

Once every pixel position has been set as an evaluation position and candidate extraction is finished (YES in step S406), the candidate position extraction unit 51 outputs the generated candidate position information to the group generation unit 52. Processing then proceeds to step S5 of FIG. 5.
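The control flow of steps S400 to S406 can be sketched in Python as a double loop over pixels. The two callables `density_of` and `score_of` are hypothetical hooks:

- `density_of` stands in for the density look-up of step S401.
- `score_of` stands in for feature extraction plus the density-specific classifier of steps S402 and S403.

The comparison follows the flowchart's "exceeds".

```python
def extract_candidate_positions(width, height, density_of, score_of, threshold=0.0):
    """Scan every pixel as an evaluation position (steps S400-S406).

    density_of(x, y) returns the density class at the position (S401);
    score_of(x, y, density) returns the score of the density-specific
    classifier over its identification window (S402-S403). Both callables
    are hypothetical hooks for the units described in the text.
    """
    candidates = []
    for y in range(height):
        for x in range(width):                  # S400, repeated until S406 says done
            density = density_of(x, y)          # S401
            score = score_of(x, y, density)     # S402-S403
            if score > threshold:               # S404: "exceeds the threshold"
                candidates.append(              # S405: candidate position information
                    {"pos": (x, y), "density": density, "score": score}
                )
    return candidates


found = extract_candidate_positions(
    3, 2,
    density_of=lambda x, y: 1,
    score_of=lambda x, y, d: 1.0 if (x, y) == (1, 1) else -1.0,
)
print(found)  # [{'pos': (1, 1), 'density': 1, 'score': 1.0}]
```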

The candidate position integration process of step S5 will be described with reference to the flowchart of FIG. 7.

The group generation unit 52 generates a list of the candidate position information sorted in descending order of evaluation value (step S500). It sets the lower-limit ratio and the upper-limit number according to the density of the candidate position at the head of the list (step S501). It then initializes the member count of the candidate position group to 1 (step S502).

The group generation unit 52 takes the second and subsequent list entries in turn as comparison position information (step S503). For each, it calculates the overlap ratio between two identification extraction windows: the window of the candidate position at the head of the list, and the window of the candidate position in the comparison position information (comparison position). If the overlap ratio exceeds the lower-limit ratio set in step S501 (YES in step S504), the group generation unit 52 assigns the comparison position to the same candidate position group as the head candidate position. It deletes the comparison position information from the list (step S505) and increments the member count by 1 (step S506).

If the member count incremented in step S506 has not reached the upper-limit number set in step S501 (NO in step S507), the group generation unit 52 determines whether the comparison position information is at the end of the list (step S508). If the overlap ratio is equal to or less than the lower-limit ratio (NO in step S504), steps S505 to S507 are skipped and the determination of step S508 is performed.

If the comparison position information is not at the end of the list (NO in step S508), the group generation unit 52 repeats steps S503 to S508. When the end of the list is reached (YES in step S508), it finishes extracting the candidate position group for the current head of the list.

Likewise, if the member count incremented in step S506 reaches the upper-limit number (YES in step S507), the group generation unit 52 finishes extracting the candidate position group for the current head of the list, even if the end of the list has not been reached.

Once the group generation unit 52 has generated the candidate position group for the candidate position at the head of the list, the object position determination unit 53 determines the object position (step S509). The object position is the candidate position with the highest evaluation value in that group, which is the candidate position at the head of the list.

The group generation unit 52 then deletes the candidate position information at the head of the list, whose group generation (steps S501 to S508) has been completed (step S510). If candidate position information remains in the list after this deletion (NO in step S511), the group generation unit 52 returns to step S501 and generates a candidate position group for the new head of the list. When the list becomes empty (YES in step S511), the candidate position integration process S5 ends and processing proceeds to step S6 of FIG. 5.
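A compact Python sketch of the greedy grouping in steps S500 to S511 follows:

- Sort candidates by score.
- Repeatedly take the list head and absorb later candidates that are close enough, stopping early at the upper-limit group size.
- Report the head as the object position.

The proximity function is passed in so any measure can be plugged in, such as an overlap ratio compared with the lower-limit ratio. The dict keys and function names are illustrative assumptions. The usage example uses a toy 1-D "window" for brevity.

```python
def integrate_candidates(candidates, lower_limit, upper_limit, overlap):
    """Greedy grouping of candidate positions following steps S500-S511.

    candidates: dicts with keys "pos", "density", "score" and "window".
    lower_limit, upper_limit: dicts mapping a density class to the lower-limit
    overlap ratio and the upper-limit group size (None means no limit).
    overlap(window_a, window_b): proximity measure, e.g. an overlap ratio.
    Returns a list of (object_position, group) pairs.
    """
    remaining = sorted(candidates, key=lambda c: c["score"], reverse=True)  # S500
    results = []
    while remaining:                                   # S511: until the list is empty
        head = remaining[0]
        min_ratio = lower_limit[head["density"]]       # S501
        max_members = upper_limit[head["density"]]
        group = [head]                                 # S502: member count starts at 1
        rest = []
        for index, cand in enumerate(remaining[1:], start=1):        # S503
            if overlap(head["window"], cand["window"]) > min_ratio:  # S504
                group.append(cand)                                   # S505-S506
                if max_members is not None and len(group) >= max_members:  # S507
                    rest.extend(remaining[index + 1:])  # unscanned entries stay listed
                    break
            else:
                rest.append(cand)
        results.append((head["pos"], group))   # S509: the head has the top score
        remaining = rest                       # S510: drop the head and its members
    return results


def toy_overlap(a, b):
    """Toy proximity: 1-D windows count as overlapping when closer than 5."""
    return 1.0 if abs(a - b) < 5 else 0.0


cands = [
    {"pos": (0, 0), "density": 1, "score": 0.9, "window": 0},
    {"pos": (1, 0), "density": 1, "score": 0.5, "window": 2},
    {"pos": (20, 0), "density": 1, "score": 0.8, "window": 20},
    {"pos": (3, 0), "density": 1, "score": 0.4, "window": 3},
]
print([p for p, _ in integrate_candidates(cands, {1: 0.5}, {1: None}, toy_overlap)])
# [(0, 0), (20, 0)]
print([p for p, _ in integrate_candidates(cands, {1: 0.5}, {1: 2}, toy_overlap)])
# [(0, 0), (20, 0), (3, 0)]
```

The second call shows the effect of the upper-limit number. Once the first group reaches two members, the candidate at window 3 is left over and becomes its own object position.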

The description now returns to FIG. 5. The object position determination unit 53 outputs information on the object positions determined in step S5 to the communication unit 3 (step S6). The communication unit 3, operating as the object position output unit 31, transmits the object position information to the display unit 6.

When the above processing is complete, processing returns to step S1 and the next captured image is processed.

[Modifications]
(1) In the above embodiment, the group generation unit 52 measures the proximity of candidate positions by the overlap ratio between the identification extraction windows (candidate regions) set for the respective candidate positions. Instead of the overlap ratio, the distance between candidate positions may be used as the measure of proximity. In this configuration, the group generation unit 52 sets an upper limit on the distance between candidate positions. The higher the degree of congestion at a position in the captured image, the shorter the upper-limit distance. Candidate position groups are extracted from candidate positions that lie within the upper-limit distance of one another. For example, the group generation unit 52 sets the upper-limit distance as follows and extracts candidate position groups for each density:

- 60 pixels between low-density candidate positions
- 40 pixels between medium-density candidate positions
- 30 pixels between high-density candidate positions
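A minimal Python sketch of this distance-based proximity test, using the pixel limits given above. Two details are assumptions: Euclidean distance in pixels, and integer density labels (1 = low, 2 = medium, 3 = high). When two candidates have different densities, this sketch uses the density passed in, for example that of the list-head candidate in FIG. 7; that choice is also an assumption.

```python
import math

# Upper-limit distances in pixels from the text, keyed by a hypothetical
# integer density label (1 = low, 2 = medium, 3 = high).
UPPER_DISTANCE_PX = {1: 60, 2: 40, 3: 30}


def within_distance(p, q, density):
    """True if candidate positions p and q are within the upper-limit distance."""
    return math.hypot(p[0] - q[0], p[1] - q[1]) <= UPPER_DISTANCE_PX[density]


print(within_distance((0, 0), (30, 40), 1))  # True  (50 <= 60)
print(within_distance((0, 0), (30, 40), 2))  # False (50 > 40)
```

In a grouping loop like that of FIG. 7, this test would take the place of the overlap comparison. Note the reversed direction: a candidate joins the group when the distance is at or below the upper limit, rather than when the overlap is above a lower limit.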

(2) In the above embodiment and its modifications, the group generation unit 52 sets an upper limit on the number of candidate positions in a candidate position group and extracts only groups that stay within that limit. This processing may be omitted.

(3) In the above embodiment and its modifications, the object position determination unit 53 determines the candidate position with the highest evaluation value as the object position. Instead, the mean or weighted mean of the candidate positions may be used. For each candidate position group, the object position determination unit 53 either:

- determines the mean of the group's candidate positions as the object position, or
- weights the group's candidate positions by their evaluation values and determines the weighted mean as the object position. If evaluation values can be negative, they are first shifted so that all are positive.
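The two alternatives can be sketched in Python as follows. The text says only that evaluation values are shifted so that all become positive. The specific shift used here, moving the minimum to a small `eps` when any score is zero or negative, is an assumption. Candidates are assumed to be dicts with "pos" and "score" keys.

```python
def mean_position(group):
    """Unweighted mean of the candidate positions in one group."""
    n = len(group)
    return (sum(c["pos"][0] for c in group) / n,
            sum(c["pos"][1] for c in group) / n)


def weighted_mean_position(group, eps=1e-6):
    """Score-weighted mean of the candidate positions in one group.

    If any score is zero or negative, all scores are shifted so the smallest
    becomes eps (the shift rule is an assumption).
    """
    min_score = min(c["score"] for c in group)
    shift = eps - min_score if min_score <= 0 else 0.0
    weights = [c["score"] + shift for c in group]
    total = sum(weights)
    x = sum(w * c["pos"][0] for w, c in zip(weights, group)) / total
    y = sum(w * c["pos"][1] for w, c in zip(weights, group)) / total
    return x, y


group = [{"pos": (0, 0), "score": 3.0}, {"pos": (10, 0), "score": 1.0}]
print(mean_position(group))           # (5.0, 0.0)
print(weighted_mean_position(group))  # (2.5, 0.0)
```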

(4) In the above embodiment and its modifications, the object to be detected is a person, but the invention is not limited to this. The object to be detected may instead be a vehicle, furniture such as a chair or desk, or an animal such as a cow or sheep. The detection target also need not be a single type. It may comprise several types, for example the three types person, chair, and desk (detection in a space where several types of objects coexist).

(5) In the above embodiment and its modifications, the single-object classifiers associated with the density classes identify the whole body, the upper two-thirds, and the upper one-third of a person. These parts and sizes are only an example. Other settings may be used depending on the detection target, the characteristics of the monitored space, the feature quantities adopted, the type of evaluation value, and so on.

(6) In the above embodiment and its modifications, a density estimator trained by the multi-class SVM method is used as an example. Various other density estimators may be used instead, such as one trained by a decision-tree random forest, multi-class AdaBoost, or multi-class logistic regression.

Alternatively, a density estimator using a classification-type CNN (Convolutional Neural Network) may be used.

(7) In the above embodiment and its modifications, the density estimator estimates three density classes other than background, but the classes may be divided more finely.

In that case, instead of the three-stage single classifiers (whole body, upper body, and head vicinity), single classifiers with correspondingly finer stages may be used, with each class associated with a single classifier and stored in the single classifier storage means 41. Alternatively, the classes may be associated many-to-one with the three-stage single classifiers and stored in the single classifier storage means 41.
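When the density classes are split more finely as in (7), the class-to-classifier association held in the single classifier storage means 41 can be pictured as a lookup table. The following Python sketch shows the many-to-one variant; the class names and classifier names are hypothetical labels chosen for illustration, not identifiers from the source.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical fine-grained density classes (the source only says the classes
# may be divided more finely); each maps many-to-one onto the three stages.
CLASS_TO_CLASSIFIER: Dict[str, Optional[str]] = {
    "background": None,            # no single classifier for background
    "very_low": "whole_body",
    "low": "whole_body",
    "medium": "upper_body",
    "medium_high": "upper_body",
    "high": "head_vicinity",
    "very_high": "head_vicinity",
}


def classifier_for(density_class: str) -> Optional[str]:
    """Return the name of the single classifier associated with a density class."""
    try:
        return CLASS_TO_CLASSIFIER[density_class]
    except KeyError:
        raise KeyError(f"unknown density class: {density_class!r}") from None


def classifier_map(class_map: Sequence[Sequence[str]]) -> List[List[Optional[str]]]:
    """Translate a per-pixel density-class map into a per-pixel classifier map."""
    return [[classifier_for(c) for c in row] for row in class_map]
```

A one-to-one table with one entry per finer classifier stage is the other option described above; only the values of the dictionary change.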

(8) In the above embodiment and its modifications, the density estimator classifies into multiple classes. Instead, a regression-type density estimator that regresses a density value (estimated density) from the feature quantity may be used; that is, a density estimator that has learned the parameters of a regression function for computing the estimated density from the feature quantity by ridge regression, support vector regression, a regression-tree random forest, Gaussian process regression, or the like.

Alternatively, the density estimator may use a regression-type CNN.

In these cases, instead of density class values, ranges of the estimated density, which is output as a continuous value, are associated with the single classifiers and stored in the single classifier storage means 41.
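For the regression-type estimator of (8), the following sketch fits a ridge regression on a single scalar feature in closed form and then selects a single classifier by looking up which value range the continuous estimate falls into. The one-dimensional feature, the range bounds, and all function names are hypothetical; a practical estimator would regress from a multidimensional feature vector.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple


def fit_ridge_1d(xs: Sequence[float], ys: Sequence[float], lam: float = 1.0) -> Tuple[float, float]:
    """Closed-form ridge regression y ~ w*x + b on one feature.

    The intercept b is not penalized; lam >= 0 shrinks the slope w toward 0.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    w = sxy / (sxx + lam)
    return w, my - w * mx


# Hypothetical exclusive upper bounds of the estimated density for each range.
DENSITY_UPPER_BOUNDS: List[float] = [0.1, 1.0, 2.0]
# One entry per range: below 0.1 is treated as background (no classifier).
RANGE_CLASSIFIERS: List[Optional[str]] = [None, "whole_body", "upper_body", "head_vicinity"]


def classifier_for_density(density: float) -> Optional[str]:
    """Pick the single classifier whose density range contains `density`."""
    return RANGE_CLASSIFIERS[bisect_right(DENSITY_UPPER_BOUNDS, density)]
```

Usage: with `w, b = fit_ridge_1d(train_x, train_y)`, the classifier for a new feature value `x` is `classifier_for_density(w * x + b)`.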

(9) In the above embodiment and its modifications, GLCM (gray-level co-occurrence matrix) features are used as the feature quantity learned by the density estimator and as the feature quantity for estimation. Instead of GLCM features, various other feature quantities may be used, such as local binary pattern (LBP) features, Haar-like features, HOG features, or luminance patterns, or a combination of GLCM features with several of these.
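As an illustration of one alternative feature named in (9), the sketch below computes the basic 3x3 local binary pattern code for each interior pixel and a normalized 256-bin histogram of the codes. The neighbour ordering and the "greater than or equal" comparison are common conventions assumed here; the source does not specify an LBP variant.

```python
from typing import List, Sequence

# Neighbour offsets (dy, dx), clockwise from the top-left; bit k of the code
# corresponds to OFFSETS[k].
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(img: Sequence[Sequence[int]]) -> List[int]:
    """Basic 8-neighbour LBP code for every interior pixel, row by row."""
    h, w = len(img), len(img[0])
    codes = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            center = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(OFFSETS):
                if img[y + dy][x + dx] >= center:  # neighbour not darker: set bit
                    code |= 1 << bit
            codes.append(code)
    return codes


def lbp_histogram(img: Sequence[Sequence[int]]) -> List[float]:
    """Normalised 256-bin histogram of LBP codes, used as the feature vector."""
    codes = lbp_codes(img)
    hist = [0.0] * 256
    for c in codes:
        hist[c] += 1.0
    n = len(codes)
    return [v / n for v in hist] if n else hist
```

Concatenating this histogram with GLCM statistics would give the kind of combined feature quantity mentioned above.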

(10) In each of the above embodiments and modifications, the density estimation means 50, serving as the congestion estimation means, estimates the density of objects as their degree of congestion. The congestion estimation means may instead estimate the degree of congestion by analyzing image complexity. For example, the congestion estimation means divides the captured image into regions of adjacent pixels with similar colors, counts the regions in each predetermined block, and computes a complexity that increases with the count (based on a relationship, obtained through prior experiments, in which a larger count gives a higher complexity). Alternatively, the congestion estimation means performs frequency analysis of the captured image for each predetermined block and computes a complexity that increases with the peak frequency (based on a relationship, obtained through prior experiments, in which a higher peak frequency gives a higher complexity). The congestion estimation means then estimates, for each block, a degree of congestion that increases with the complexity (based on a relationship, obtained through prior experiments, in which a higher complexity gives a higher degree of congestion).
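The region-counting variant of (10) can be sketched as follows. Assumptions: the image is grayscale, "similar color" means an absolute intensity difference of at most `tol` between 4-connected neighbours (so regions grow by chaining), the complexity is taken to be the region count itself, and the hypothetical `COUNT_BOUNDS` table stands in for the experimentally obtained count-to-congestion relationship, which the source does not give.

```python
from bisect import bisect_left
from collections import deque
from typing import List, Sequence


def count_regions(block: Sequence[Sequence[int]], tol: int) -> int:
    """Count 4-connected regions whose adjacent pixels differ by at most tol."""
    h, w = len(block), len(block[0])
    seen = [[False] * w for _ in range(h)]
    regions = 0
    for sy in range(h):
        for sx in range(w):
            if seen[sy][sx]:
                continue
            regions += 1  # new region: flood-fill it with BFS
            seen[sy][sx] = True
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and abs(block[ny][nx] - block[y][x]) <= tol):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions


# Hypothetical monotone relation: count <= 2 -> 0, <= 5 -> 1, <= 10 -> 2, else 3.
COUNT_BOUNDS = [2, 5, 10]


def congestion_level(region_count: int) -> int:
    """Map a region count (used directly as complexity) to a congestion level."""
    return bisect_left(COUNT_BOUNDS, region_count)


def congestion_map(img: Sequence[Sequence[int]], block_size: int, tol: int) -> List[List[int]]:
    """Congestion level for each block_size x block_size block of the image."""
    levels = []
    for by in range(0, len(img), block_size):
        row = []
        for bx in range(0, len(img[0]), block_size):
            block = [r[bx:bx + block_size] for r in img[by:by + block_size]]
            row.append(congestion_level(count_regions(block, tol)))
        levels.append(row)
    return levels
```

A flat block yields one region and the lowest level, while a highly textured block (many small regions, as a crowd tends to produce) yields the highest.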

(11) In the above embodiment and its modifications, the estimated density at a candidate position of interest is determined by setting, at that candidate position, the projection region of a model shaped like the upper 1/3 of a person (or a window with that shape) and aggregating the estimated densities within that region. To reduce the amount of processing, a smaller region may be used instead, such as the pixel at the candidate position itself or its 8-neighborhood or 16-neighborhood. Alternatively, to improve accuracy, a larger region may be used instead, such as the projection region of a model shaped like the upper 2/3 of a single person, or like the whole body of a single person, with the candidate position as its representative position (or a window with that shape).
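The trade-off in (11) between region size and cost can be seen in a sketch that aggregates a per-pixel density-class map over a square window around the candidate position: `radius=0` is the candidate pixel alone and `radius=1` is the pixel plus its 8-neighborhood. Majority voting with ties resolved toward the higher class is an assumption; the source only says that the estimated densities in the region are aggregated. The square window is also a simplification of the model-shaped projection regions described above.

```python
from collections import Counter
from typing import Sequence


def density_at_candidate(density_map: Sequence[Sequence[int]], cx: int, cy: int, radius: int) -> int:
    """Majority density class in the (2*radius+1)^2 window around (cx, cy).

    The window is clipped at the image border; ties go to the higher class.
    """
    h, w = len(density_map), len(density_map[0])
    votes: Counter = Counter()
    for y in range(max(0, cy - radius), min(h, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(w, cx + radius + 1)):
            votes[density_map[y][x]] += 1
    best = max(votes.values())
    return max(cls for cls, n in votes.items() if n == best)
```

A larger window smooths out isolated misestimates (as in the example of a single high-class pixel surrounded by low-class pixels) at the cost of more pixels visited per candidate.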

(12) The threshold against which the identification score is compared in the above embodiment and its modifications may take a different value for each single classifier.

(13) In the above embodiment and its modifications, the single classifiers are trained by the linear SVM method, but single classifiers trained by various other known learning methods, such as AdaBoost, may be used instead. A pattern matcher may also be used in place of a classifier. In that case, the identification score is, for example, the inner product of the average pattern of feature quantities extracted from learning images of people and the feature quantity of the input image, and the identification score calculation function is a function that takes the feature quantity of the input image as input and outputs that score. A discriminative CNN may also be used as the single classifier. In particular, when a method such as R-CNN (Regions with CNN features), which estimates the size of the identification extraction window in addition to performing identification, is used, the identification extraction window serving as the candidate region may have a variable size. The R-CNN method is described, for example, in Ross Girshick et al., "Rich feature hierarchies for accurate object detection and semantic segmentation", CVPR 2014.
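The pattern-matching alternative in (13) can be sketched as below: the average pattern is the element-wise mean of feature vectors extracted from learning images, the identification score is its inner product with the input feature vector, and, following (12), each single classifier has its own threshold. The feature vectors, threshold values, and names are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical per-classifier thresholds on the identification score (item 12).
THRESHOLDS: Dict[str, float] = {"whole_body": 0.5, "upper_body": 0.6, "head_vicinity": 0.7}


def average_pattern(samples: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of the feature vectors from the learning images."""
    n = len(samples)
    return [sum(s[i] for s in samples) / n for i in range(len(samples[0]))]


def matching_score(avg: Sequence[float], feature: Sequence[float]) -> float:
    """Identification score: inner product of the average pattern and the input feature."""
    return sum(a * f for a, f in zip(avg, feature))


def is_single_object(classifier: str, avg: Sequence[float], feature: Sequence[float]) -> bool:
    """Accept the window if the score reaches the threshold for this classifier."""
    return matching_score(avg, feature) >= THRESHOLDS[classifier]
```

In practice each single classifier (whole body, upper body, head vicinity) would have its own average pattern learned from correspondingly cropped images; the sketch reuses one pattern for brevity.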

(14) In the above embodiment and its modifications, HOG features are used as the feature quantity learned by the single classifiers. Instead of HOG features, various other feature quantities may be used, such as local binary pattern features, Haar-like features, or luminance patterns, or a combination of HOG features with several of these.
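Haar-like features, one alternative to HOG in (14), are usually computed with an integral image so that any rectangle sum costs four lookups. A minimal sketch of a horizontal two-rectangle feature (left half minus right half) follows; the specific feature layout is an illustrative choice, not one taken from the source.

```python
from typing import List, Sequence


def integral_image(img: Sequence[Sequence[int]]) -> List[List[int]]:
    """(h+1) x (w+1) table where ii[y][x] is the sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Sequence[Sequence[int]], x: int, y: int, w: int, h: int) -> int:
    """Sum of the w x h rectangle with top-left corner (x, y), in O(1)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect_horizontal(ii: Sequence[Sequence[int]], x: int, y: int, w: int, h: int) -> int:
    """Left-half sum minus right-half sum over a w x h window (w assumed even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

A feature vector for the single classifier would collect many such responses at different positions, sizes, and layouts within the extraction window.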

Reference Signs List: 1 image monitoring apparatus, 2 imaging unit, 3 communication unit, 4 storage unit, 5 image processing unit, 6 display unit, 30 image acquisition means, 31 object position output means, 40 density estimator storage means, 41 single classifier storage means, 50 density estimation means, 51 candidate position extraction means, 52 group generation means, 53 object position determination means, 100 whole-body classifier, 101 upper-body classifier, 102 head-vicinity classifier.

Claims

An object detection apparatus for detecting the position of each object from a captured image of a space in which congestion of the objects may occur, the apparatus comprising:
congestion estimation means for analyzing an arbitrary area in the captured image to estimate the degree of congestion of the objects captured in the area;
candidate position extraction means for extracting, using a single classifier that has learned the features of single images each capturing a single object, candidate positions having the features of the single image in the captured image;
group generation means for setting a lower limit on the proximity of the candidate positions to a higher value where the degree of congestion in the captured image is higher, and generating candidate position groups each consisting of candidate positions whose proximity is at or above the lower limit; and
object position determination means for determining, for each candidate position group, the position of the object based on the candidate positions belonging to that candidate position group.

The object detection apparatus according to claim 1, wherein
the candidate position extraction means extracts, with reference to each candidate position, a candidate region having the features of the single image, and
the group generation means measures the proximity by the ratio of the overlapping portion of the candidate regions, sets the lower limit of that ratio to a larger value where the degree of congestion in the captured image is higher, and generates candidate position groups consisting of the candidate positions corresponding to candidate regions that overlap at a ratio at or above the lower limit.
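Claim 2 can be illustrated with a sketch that links two candidate regions when their overlap ratio reaches a congestion-dependent lower limit and takes connected components as the candidate position groups. Assumptions not stated in the claim: the overlap ratio is intersection over union, each pair uses the limit for the higher of its two congestion levels, groups are connected components, and the object position is the mean of the member region centers. The `IOU_LOWER` values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height) of a candidate region

# Hypothetical lower limits on IoU per congestion level: higher when more crowded.
IOU_LOWER: Dict[int, float] = {0: 0.3, 1: 0.5, 2: 0.7}


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def group_by_overlap(boxes: Sequence[Box], levels: Sequence[int],
                     lower_by_level: Dict[int, float]) -> List[List[int]]:
    """Union-find grouping of candidates whose IoU reaches the congestion-dependent limit."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            limit = lower_by_level[max(levels[i], levels[j])]  # stricter of the pair
            if iou(boxes[i], boxes[j]) >= limit:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def group_centers(boxes: Sequence[Box], groups: Sequence[Sequence[int]]) -> List[Tuple[float, float]]:
    """Object position per group: mean of the member boxes' centers."""
    centers = []
    for g in groups:
        cx = sum(boxes[i][0] + boxes[i][2] / 2 for i in g) / len(g)
        cy = sum(boxes[i][1] + boxes[i][3] / 2 for i in g) / len(g)
        centers.append((cx, cy))
    return centers
```

The same pair of boxes (IoU of about 0.67) merges into one object at low congestion but stays as two objects at high congestion, which is the behavior the claim describes.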

The object detection apparatus according to claim 1, wherein the group generation means measures the proximity by the distance between the candidate positions, sets an upper limit on the distance to a smaller value where the degree of congestion in the captured image is higher, and generates candidate position groups consisting of candidate positions whose mutual distance is at or below the upper limit.
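In claim 3 the proximity is a distance, so the limit moves in the opposite direction: the distance upper limit shrinks as congestion rises. The sketch below links two candidate positions when their Euclidean distance is at or below the limit for the higher of their two congestion levels and takes connected components as groups; these choices and the hypothetical `DIST_UPPER` values (in pixels) are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical distance upper limits (pixels) per congestion level: smaller when crowded.
DIST_UPPER: Dict[int, float] = {0: 12.0, 1: 8.0, 2: 4.0}


def group_by_distance(points: Sequence[Point], levels: Sequence[int],
                      upper_by_level: Dict[int, float]) -> List[List[int]]:
    """Union-find grouping of candidates within the congestion-dependent distance limit."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            limit = upper_by_level[max(levels[i], levels[j])]  # stricter of the pair
            if math.dist(points[i], points[j]) <= limit:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```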

The object detection apparatus according to any one of claims 1 to 3, wherein the congestion estimation means estimates, as the degree of congestion, the density of the objects captured in an arbitrary area in the captured image by using a density estimator that has learned, for each predetermined density, the features of density images each capturing a space in which the objects are present at that density.

The object detection apparatus as described above, wherein the group generation means sets, for an arbitrary area in the captured image, an upper limit on the number of candidate positions constituting a candidate position group in that area according to the ratio of the density, in that area, of the candidate positions extracted by the candidate position extraction means to the density of the objects estimated by the congestion estimation means, and generates candidate position groups each consisting of at most the upper-limit number of candidate positions.
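Claim 5 caps group size by the ratio of candidate density to estimated object density in an area. Because both densities refer to the same area, the ratio equals the number of candidates divided by the estimated number of objects. The sketch below turns that ratio into a cap by rounding up, then forms groups greedily: the highest-scoring unassigned candidate seeds a group and absorbs its nearest unassigned neighbours within `max_dist` until the cap is reached. The rounding rule, the greedy strategy, the scores, and `max_dist` are assumptions, not details from the claim.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def group_size_cap(num_candidates: int, estimated_objects: float) -> int:
    """Upper limit on candidates per group: ceil(candidates / estimated objects), at least 1."""
    if estimated_objects <= 0:
        raise ValueError("estimated_objects must be positive")
    return max(1, math.ceil(num_candidates / estimated_objects))


def capped_groups(points: Sequence[Point], scores: Sequence[float],
                  cap: int, max_dist: float) -> List[List[int]]:
    """Greedy grouping in which no group has more than `cap` candidates."""
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    assigned = set()
    groups = []
    for seed in order:
        if seed in assigned:
            continue
        # Unassigned neighbours of the seed within max_dist, nearest first.
        near = [i for i in order if i not in assigned and i != seed
                and math.dist(points[i], points[seed]) <= max_dist]
        near.sort(key=lambda i: math.dist(points[i], points[seed]))
        members = [seed] + near[: cap - 1]
        assigned.update(members)
        groups.append(members)
    return groups
```

With four closely spaced candidates and an estimate of two objects, the cap is 2, so the candidates split into two groups instead of collapsing into a single detection.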