JP2018165966A

JP2018165966A - Object detection device

Info

Publication number: JP2018165966A
Application number: JP2017063887A
Authority: JP
Inventors: Hidenori Ujiie; Masahiro Maeda; Takaharu Kurokawa
Original assignee: Secom Co Ltd
Current assignee: Secom Co Ltd
Priority date: 2017-03-28
Filing date: 2017-03-28
Publication date: 2018-10-25
Anticipated expiration: 2037-03-28
Also published as: JP6893812B2

Abstract

PROBLEM TO BE SOLVED: To accurately detect individual objects in a captured image, even when the image shows a space in which congestion may occur.
SOLUTION: An object detection device detects individual objects from a captured image of a space in which congestion by predetermined objects may occur. The device includes two means. Density estimation means 50 estimates the distribution of the density of the objects captured in the image. It does so using a density estimator that has learned, for each predetermined density, the image features of density images capturing a space in which the objects are present at that density. Object position determination means 51 sets candidate positions at which individual objects may exist in the captured image. For each candidate position, it calculates an evaluation value indicating the degree to which the image features of a single object appear in the captured image there. It then determines each candidate position whose evaluation value is equal to or greater than a predetermined value to be the position of an object. The object position determination means 51 calculates the evaluation value while changing, according to the density at the candidate position, which of the parts constituting a single object is emphasized.
SELECTED DRAWING: Figure 2

Description

The present invention relates to an object detection device that detects individual objects from a captured image of a space in which predetermined objects such as people may be present. In particular, it relates to an object detection device that detects individual objects from a captured image of a space in which congestion may occur.

In spaces where congestion may occur, such as event venues, measures are needed to prevent accidents, for example deploying more security guards to crowded areas. To support this, surveillance cameras can be placed at various locations in the venue. The distribution of people is then estimated from the captured images and displayed, which makes it easier for monitoring staff to grasp the congestion situation.

Monitoring efficiency can be improved further in two ways. First, the position of each individual person can be detected and a model imitating a human shape displayed at each detected position. Second, the positional relationships between people (for example, forming a queue or surrounding something) can be analyzed and the results reported. Either or both may be used.

There are several methods for detecting the positions of individual people in a captured image showing multiple people. One fits a combination of several human-shaped models to the captured image. Another scans the captured image with a classifier trained in advance on the features of images showing a single person. Both use image features of a single person, prepared in advance, to find the positions in the captured image where those single-person features appear.

For example, the moving object tracking device described in Patent Document 1 first extracts changed pixels by comparing a monitoring image with a background image. At those positions, it fits a combination of moving object models imitating the shapes of the tracked moving objects, one model per tracked object, and thereby detects the position of each moving object. The device is described as using, by way of example, a moving object model that approximates the shape of a person's whole body.

Similarly, the object detection device described in Patent Document 2 detects people in an input image using a classifier trained in advance on a large number of "person" and "non-person" images. The document suggests that this classifier is trained on images of people's whole bodies. The device also detects circles in the input image and uses them as candidate regions for human heads.

Patent Document 1: JP 2012-159958 A
Patent Document 2: JP 2011-186633 A

However, in a captured image of a space where congestion may occur, the degree to which people are occluded changes with the congestion state. If the image features of the same body parts are always used as the features of a single person regardless of the congestion state, it becomes difficult to keep detecting individual people accurately.

Consider first a captured image without congestion, in which many people appear in full. Here, whole-body image features detect each person more accurately than image features around the head alone. This holds both for methods that fit human-shaped models and for methods that use a classifier trained on images of people.

Conversely, consider a captured image with congestion, in which occlusion is frequent. Here, image features around the head alone detect each person more accurately than whole-body image features. Again, this holds for both the model-fitting and the classifier-based methods.

Therefore, always using image features around the head to improve accuracy during congestion lowers accuracy when there is no congestion. Likewise, always using whole-body image features to improve accuracy without congestion lowers accuracy during congestion.
In other words, because occlusion changes with congestion, there is a trade-off between the congestion state and how many of an object's parts are used to detect it.

Moreover, regions with different congestion states may coexist within a single captured image. Detection accuracy then varies from region to region, which complicates the problem further.

Thus, in a captured image of a space where congestion may occur, the occlusion state of the objects to be detected changes with the congestion state. This has made it difficult to detect individual objects accurately in such images.

The present invention has been made in view of the above problem. Its object is to provide an object detection device that can accurately detect individual objects in a captured image, even when the image shows a space in which congestion may occur.

To achieve this object, the present invention provides an object detection device that detects individual objects from a captured image of a space in which congestion by predetermined objects may occur. The device comprises density estimation means and object position determination means. The density estimation means estimates the distribution of the density of the objects captured in the captured image. It uses a density estimator that has learned, for each predetermined density, the image features of density images capturing a space in which the objects are present at that density. The object position determination means sets candidate positions at which individual objects may exist in the captured image. For each candidate position, it calculates an evaluation value representing the degree to which the image features of a single object appear in the captured image there. It determines each candidate position whose evaluation value is equal to or greater than a predetermined value to be the position of an object. The object position determination means calculates the evaluation value while changing, according to the density at the candidate position, which of the parts constituting a single object are emphasized.

Preferably, the higher the density at the candidate position, the more the object position determination means emphasizes the image features of fewer of the parts constituting a single object when calculating the evaluation value.

It is also preferable that the object position determination means calculate an evaluation value representing the degree to which the image features of a subset of the parts constituting a single object appear in the captured image at the candidate position. The higher the density at that candidate position, the fewer parts the subset contains.

Alternatively, the object position determination means preferably calculates partial evaluation values. Each partial evaluation value represents the degree to which the image features of one of the parts constituting a single object appear in the captured image at the candidate position. The evaluation value is then calculated as a weighted sum of these partial evaluation values. The higher the density at the candidate position, the more weight goes to the partial evaluation values of fewer parts.
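The weighted-sum variant can be sketched as follows. This is a minimal illustration under stated assumptions, not the device's implementation. The part names (`head`, `upper_body`, `whole_body`), the density labels, and the values in `PART_WEIGHTS` are all hypothetical. The source says only that the weights shift toward fewer parts as the density rises.

```python
from typing import Dict

# Hypothetical weight table: for each density class, how much each body
# part's partial evaluation value counts toward the total. The higher the
# density, the more the weight is concentrated on fewer parts (the head).
PART_WEIGHTS: Dict[str, Dict[str, float]] = {
    "low":    {"head": 0.2, "upper_body": 0.3, "whole_body": 0.5},
    "medium": {"head": 0.3, "upper_body": 0.5, "whole_body": 0.2},
    "high":   {"head": 0.7, "upper_body": 0.3, "whole_body": 0.0},
}


def evaluation_value(partial_scores: Dict[str, float], density: str) -> float:
    """Weighted sum of per-part scores, weights chosen by the local density class."""
    weights = PART_WEIGHTS[density]
    return sum(weights[part] * partial_scores[part] for part in weights)
```

Take a person whose head is clearly visible but whose lower body is occluded (head 0.9, upper body 0.6, whole body 0.2). The same partial scores give 0.46 under the low-density weights and 0.81 under the high-density weights. The person is therefore more likely to pass a fixed threshold when the surrounding crowd is dense.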

The object position determination means also preferably includes the following four means:
- arrangement generation means, which generates a plurality of mutually different arrangements, each including one or more candidate positions;
- model image generation means, which generates a model image for each arrangement by drawing an object model at each candidate position; the higher the density at that position, the fewer of a single object's parts the model imitates;
- evaluation value calculation means, which calculates, for each arrangement, the evaluation value representing the degree of similarity of the model image to the captured image;
- optimal arrangement determination means, which determines the candidate positions in the arrangement with the largest evaluation value to be the positions of the objects.
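The arrangement-based search can be sketched on tiny binary images as follows. The sketch rests on several assumptions. The two object models in `MODELS` are hypothetical (a three-pixel "whole body" for low density, a one-pixel "head" for high density). Pixel agreement on a binary mask stands in for whatever similarity measure the device actually uses. Arrangements are enumerated exhaustively, because the source does not say how they are generated.

```python
from itertools import combinations

# Hypothetical object models as (d_row, d_col) offsets from the candidate
# position (the head). Fewer parts are modelled where the density is high.
MODELS = {
    "low": [(0, 0), (1, 0), (2, 0)],   # whole body: head, torso, legs
    "high": [(0, 0)],                   # head only
}


def render(arrangement, density_at, shape):
    """Draw the density-dependent object model at every candidate position."""
    rows, cols = shape
    model = [[0] * cols for _ in range(rows)]
    for (r, c) in arrangement:
        for dr, dc in MODELS[density_at[(r, c)]]:
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                model[rr][cc] = 1
    return model


def similarity(model, image):
    """Fraction of pixels on which the model image agrees with the observed mask."""
    total = sum(len(row) for row in image)
    agree = sum(m == p for mr, ir in zip(model, image) for m, p in zip(mr, ir))
    return agree / total


def best_arrangement(image, candidates, density_at):
    """Try every non-empty subset of candidates and keep the most similar one."""
    shape = (len(image), len(image[0]))
    best, best_score = (), -1.0
    for k in range(1, len(candidates) + 1):
        for arrangement in combinations(candidates, k):
            score = similarity(render(arrangement, density_at, shape), image)
            if score > best_score:
                best, best_score = arrangement, score
    return list(best), best_score
```

Exhaustive enumeration grows as 2^n in the number of candidates, so it only suits toy inputs like these.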

Alternatively, the object position determination means preferably includes the following four means:
- arrangement generation means, which generates a plurality of mutually different arrangements, each including one or more candidate positions;
- model image generation means, which generates a model image for each arrangement by drawing an object model imitating a single object at each candidate position;
- evaluation value calculation means, which, for the model image of each arrangement, calculates the similarity of the object model to the captured image for each part constituting the object; it then sums these similarities with weights concentrated on fewer parts the higher the density at the candidate position, giving the evaluation value;
- optimal arrangement determination means, which determines the candidate positions in the arrangement with the largest evaluation value to be the positions of the objects.

The object position determination means also preferably includes the following three means:
- candidate position setting means, which sets a plurality of candidate positions at predetermined intervals in the captured image;
- evaluation value calculation means, which calculates the evaluation value for each candidate position by inputting the image features of the captured image at that position into a classifier; the higher the density at that position, the fewer of a single object's parts the classifier has learned;
- position determination means, which determines the candidate positions whose evaluation values satisfy a predetermined criterion to be the positions of the objects.
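A sliding-window version of this classifier-based variant might look like the sketch below. It is only an illustration under stated assumptions. Candidate positions sit on a regular grid with step `stride`, and each window's top-left pixel serves as its candidate position. The local density class selects which classifier scores the window. The classifiers are arbitrary callables (hypothetical stand-ins for trained whole-body or head-only classifiers). The "predetermined criterion" is taken to be a simple threshold.

```python
from typing import Callable, Dict, List, Tuple

Window = List[List[float]]
Classifier = Callable[[Window], float]  # hypothetical: image window -> score


def scan(image: Window, density_map: List[List[str]],
         classifiers: Dict[str, Classifier],
         win_h: int, win_w: int, stride: int,
         threshold: float) -> List[Tuple[int, int, float]]:
    """Return (row, col, score) for grid windows whose score reaches the threshold.

    The classifier applied at each candidate position is chosen by the
    density class estimated at that position.
    """
    detections = []
    rows, cols = len(image), len(image[0])
    for r in range(0, rows - win_h + 1, stride):
        for c in range(0, cols - win_w + 1, stride):
            classifier = classifiers[density_map[r][c]]
            window = [row[c:c + win_w] for row in image[r:r + win_h]]
            score = classifier(window)
            if score >= threshold:
                detections.append((r, c, score))
    return detections
```

Selecting a whole classifier per density class is the simplest reading of this paragraph. The weighted-partial-score variant in the next paragraph would instead compute every part's score and combine them with density-dependent weights.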

Alternatively, the object position determination means preferably includes the following three means:
- candidate position setting means, which sets a plurality of candidate positions at predetermined intervals in the captured image;
- evaluation value calculation means, which, for each candidate position, inputs the image features of the captured image at that position into a classifier that has learned the image features of the parts constituting a single object, obtaining a partial evaluation value for each part; it then sums the partial evaluation values with weights concentrated on fewer parts the higher the density at the candidate position, giving the evaluation value;
- position determination means, which determines the candidate positions whose evaluation values satisfy a predetermined criterion to be the positions of the objects.

According to the present invention, individual objects can be detected accurately in a captured image of a space in which congestion may occur.

FIG. 1 is a block diagram showing the schematic configuration of the image monitoring device.
FIG. 2 is a functional block diagram showing the functions of the image monitoring device.
FIG. 3 is a functional block diagram showing the functions of the image monitoring device.
FIG. 4 is a diagram schematically showing the object model information stored in the object model storage means.
FIG. 5 is a diagram schematically showing an example of processing by the density estimation means, the arrangement generation means, and the model image generation means.
FIG. 6 is a flowchart showing the operation of the image monitoring device.
FIG. 7 is a flowchart of the object position determination process of the image monitoring device.
FIG. 8 is a flowchart of the object position determination process of the image monitoring device.
FIG. 9 is a functional block diagram showing the functions of the image monitoring device.
FIG. 10 is a diagram schematically showing the object model information stored in the object model storage means and the weight coefficient information stored in the weight coefficient storage means.
FIG. 11 is a diagram schematically showing a model image and a weight image generated by the model image generation means.
FIG. 12 is a flowchart of the object position determination process of the image monitoring device.
FIG. 13 is a flowchart of the object position determination process of the image monitoring device.
FIG. 14 is a functional block diagram showing the functions of the image monitoring device.
FIG. 15 is a diagram schematically showing the single-object classifier information stored in the single-object classifier storage means.
FIG. 16 is a diagram schematically showing the identification extraction windows set by the evaluation value calculation means.
FIG. 17 is a flowchart of the object position determination process of the image monitoring device.
FIG. 18 is a functional block diagram showing the functions of the image monitoring device.
FIG. 19 is a diagram schematically showing the single-object classifier information stored in the single-object classifier storage means and the weight coefficient information stored in the weight coefficient storage means.
FIG. 20 is a diagram schematically showing how the evaluation value calculation means calculates classification scores.
FIG. 21 is a flowchart of the object position determination process of the image monitoring device.

[First embodiment]
As an embodiment of the present invention, an image monitoring device 1 is described below. It includes an object detection device that detects individual people in captured images of an event venue, and it displays the detection results to monitoring staff. In this embodiment, the object detection device detects individual people using an object model imitating a person. It switches the object model according to the density of people.

<Configuration of Image Monitoring Device 1 According to the First Embodiment>
FIG. 1 is a block diagram showing the schematic configuration of the image monitoring device 1. The image monitoring device 1 consists of an imaging unit 2, a communication unit 3, a storage unit 4, an image processing unit 5, and a display unit 6.

The imaging unit 2 is a surveillance camera connected to the image processing unit 5 via the communication unit 3. It captures the monitored space at predetermined time intervals to generate captured images and inputs them in sequence to the image processing unit 5. For example, the imaging unit 2 is mounted on a pole at the event venue, with a field of view looking down over the monitored space. The field of view may be fixed, or it may be changed according to a preset schedule or to instructions received from outside via the communication unit 3. The imaging unit 2 captures the monitored space at, for example, a frame period of 1 second and generates color images. It may generate monochrome images instead.

The communication unit 3 is a communication circuit. One end is connected to the image processing unit 5. The other end is connected to the imaging unit 2 and the display unit 6 via a coaxial cable or a communication network such as a LAN (Local Area Network) or the Internet. The communication unit 3 acquires captured images from the imaging unit 2 and inputs them to the image processing unit 5. It also outputs the detection results received from the image processing unit 5 to the display unit 6.

The storage unit 4 is a memory device such as a ROM (Read Only Memory) or RAM (Random Access Memory) that stores various programs and data. It is connected to the image processing unit 5 and exchanges this information with it.

The image processing unit 5 is an arithmetic device such as a CPU (Central Processing Unit), DSP (Digital Signal Processor), or MCU (Micro Control Unit). It is connected to the storage unit 4 and the display unit 6. It operates as various processing and control means by reading programs from the storage unit 4 and executing them, and it stores data in, and reads data from, the storage unit 4. The image processing unit 5 is also connected to the imaging unit 2 and the display unit 6 via the communication unit 3. It detects individual people by analyzing the captured images acquired from the imaging unit 2 through the communication unit 3. It then displays the detection results on the display unit 6 through the communication unit 3.

The display unit 6 is a display device such as a liquid crystal display or a CRT (Cathode Ray Tube) display. It is connected to the image processing unit 5 via the communication unit 3 and displays the detection results produced by the image processing unit 5. Monitoring staff view the displayed results to judge whether congestion has occurred and take countermeasures as needed, such as reassigning personnel.

This embodiment illustrates an image monitoring device 1 with one imaging unit 2 per image processing unit 5. In other embodiments, the ratio of imaging units 2 to image processing units 5 may be many-to-one or many-to-many.

<Functions of Image Monitoring Device 1 According to the First Embodiment>
FIGS. 2 and 3 are functional block diagrams showing the functions of the image monitoring device 1. The communication unit 3 functions as image acquisition means 30, object position output means 31, and so on. The storage unit 4 functions as density estimator storage means 40, single-object feature storage means 41, and so on. The image processing unit 5 functions as density estimation means 50, object position determination means 51, and so on. The single-object feature storage means 41 includes the function of object model storage means 410a. The object position determination means 51 includes the functions of arrangement generation means 510a, model image generation means 512a, evaluation value calculation means 514a, and optimal arrangement determination means 516a.

The image acquisition means 30 acquires captured images in sequence from the imaging unit 2 and outputs them in sequence to the density estimation means 50 and the object position determination means 51.

The density estimator storage means 40 stores in advance information on an estimator called the density estimator. The density estimator is an estimated-density calculation function that has learned, for each predetermined density, the image features of density images capturing a space in which objects (people) are present at that density. Given the feature values of an image, it calculates and outputs an estimate of the density of the objects captured in that image (the estimated density). In practice, parameters such as the coefficients of this function are stored in advance as the density estimator information.

The density estimation means 50 extracts feature values for density estimation (estimation features) from each location in the captured image received from the image acquisition means 30. It reads the density estimator from the density estimator storage means 40 and inputs each extracted estimation feature into it, which yields the distribution of estimated densities (the density distribution). It then outputs this density distribution to the object position determination means 51.

The density estimation process and the density estimator are described in detail below.

The density estimation means 50 sets a window (estimation extraction window) at each pixel position in the captured image and extracts estimation features from the captured image within each window. The estimation features are GLCM (Gray Level Co-occurrence Matrix) features.
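The source does not specify which GLCM offset, number of gray levels, or summary statistics are used. The sketch below therefore makes assumptions: a single horizontal offset, a symmetric and normalized matrix, and three common Haralick-style statistics (contrast, energy, homogeneity). The window must already be quantized to integers in `range(levels)`. In practice a library routine such as scikit-image's `graycomatrix` and `graycoprops` would typically be used instead.

```python
def glcm(window, levels, dr=0, dc=1):
    """Symmetric, normalized gray-level co-occurrence matrix for one pixel offset.

    `window` is a 2-D list of integer gray levels in range(levels).
    """
    m = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(window), len(window[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = window[r][c], window[r2][c2]
                m[i][j] += 1
                m[j][i] += 1  # count the pair in both directions (symmetric GLCM)
    total = sum(map(sum, m))
    if total:
        m = [[v / total for v in row] for row in m]
    return m


def glcm_features(p):
    """Contrast, energy (angular second moment) and homogeneity of a normalized GLCM."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    contrast = sum(p[i][j] * (i - j) ** 2 for i, j in cells)
    energy = sum(p[i][j] ** 2 for i, j in cells)
    homogeneity = sum(p[i][j] / (1 + abs(i - j)) for i, j in cells)
    return [contrast, energy, homogeneity]
```

A uniform window yields contrast 0 and energy 1. A checkerboard yields the maximum contrast for two gray levels. Texture statistics of this kind can tell empty floor apart from the busy patterns of a crowd.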

Each estimation extraction window should preferably cover a region of the same size in the monitored space. To achieve this, the density estimation means 50 preferably reads the camera parameters of the imaging unit 2, stored in advance in camera parameter storage means (not shown). Using a homography transformation based on those parameters, it warps the captured image so that every pixel covers a region of the same size in the monitored space. Only then does it extract the estimation features.
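The core of a homography transformation is a point mapping. The point (x, y, 1) in homogeneous coordinates is multiplied by a 3x3 matrix H, and the result is divided by its third component. The sketch below assumes H has already been derived from the camera parameters; how that derivation is done is not covered here. Warping a whole image applies this mapping (usually its inverse) to every pixel, which a library routine such as OpenCV's `cv2.warpPerspective` would normally handle.

```python
def apply_homography(h, x, y):
    """Map image point (x, y) through a 3x3 homography h (row-major nested lists)."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the homogeneous coordinate to return to the image plane.
    return u / w, v / w
```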

The density estimator can be implemented as a classifier that distinguishes images of multiple classes, for example a discriminant function learned with a multi-class SVM (Support Vector Machine).
The density can be defined, for example, as four classes:
- a "background" class, with no people present;
- a "low density" class, above 0 and at most 2 persons/m²;
- a "medium density" class, above 2 and at most 4 persons/m²;
- a "high density" class, above 4 persons/m².

The estimated density is a value assigned in advance to each class and is output as the result of the distribution estimation. In this embodiment, the values corresponding to the classes are written "background", "low density", "medium density", and "high density".

That is, the density estimator is a discriminant function that distinguishes the images of each class from those of the other classes. It is learned by applying the multi-class SVM method to the feature values of a large number of images (density images) belonging to the "background", "low density", "medium density", and "high density" classes. The parameters of the discriminant function derived by this learning are stored as the density estimator. The feature values of the density images are of the same type as the estimation features, namely GLCM features.
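The source does not state the SVM kernel or the multi-class strategy. The sketch below assumes linear one-vs-rest discriminants: one function w·x + b per class, with the feature vector assigned to the class whose discriminant is largest. The weights and biases passed in stand for the stored density estimator parameters and are purely hypothetical.

```python
CLASSES = ["background", "low", "medium", "high"]


def predict_density(feature, weights, biases):
    """One-vs-rest linear decision: return the class whose w.x + b is largest.

    `weights` holds one weight vector per class (same order as CLASSES),
    `biases` one scalar per class.
    """
    scores = [sum(w * x for w, x in zip(wc, feature)) + b
              for wc, b in zip(weights, biases)]
    best = max(range(len(CLASSES)), key=lambda k: scores[k])
    return CLASSES[best]
```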

The density estimation means 50 inputs each per-pixel estimation feature into the density estimator and obtains the estimator's output, the estimated density. If the captured image was warped before the features were extracted, the density estimation means 50 then applies a homography transformation using the camera parameters. This warps the density distribution back to the geometry of the original captured image.

The set of estimated densities obtained in this way for every pixel of the captured image is the density distribution.

The density distribution output by the density estimation means 50 shows how crowded each part of the captured image is. However, it does not give the position of each individual person.
The object position determination means 51, which follows the density estimation means 50, determines the position of each person appearing in the captured image.

The object position determination means 51 detects individual objects, and determines their positions, by searching the captured image for places where the image features of a single object (person) appear. That is, the object position determination means 51 first sets candidate positions at which individual objects may exist in the captured image. For each candidate position, it calculates an evaluation value representing the degree to which the image features of a single object (the single-object feature) appear in the captured image at that position. Candidate positions whose evaluation value is equal to or greater than a predetermined value are determined to be object positions. For example, the single-object feature is the shape of a person, stored in advance by the single feature storage means 41. Also for example, the evaluation value is the similarity between the edges of the captured image and a model representing the shape of a person.

In a captured image of a space where congestion can occur, occlusion between people caused by the congestion hides part of the single-object feature. If the evaluation value drops as a result, individual people go undetected. When the imaging unit 2 is installed so as to look down on the scene, occlusion is more likely near the feet and less likely near the head. Taking this into account, restricting the single-object feature to the head alone, to adapt to congestion, reduces missed detections in crowded conditions. However, a head-only feature also yields relatively high evaluation values for shoulders and similar parts, so false detections increase when the scene is not crowded.

By referring to the density distribution, the object position determination means 51 resolves this trade-off between how much of the object is evaluated and the accuracy of detecting individual objects. That is, the object position determination means 51 calculates the evaluation value while changing which of the parts constituting a single object are emphasized, according to the density at the candidate position. In particular, the higher the density at the candidate position, the smaller the portion of the object whose image features are emphasized in the evaluation value. For example, if the estimated density at a candidate position is low density, the evaluation value is calculated by evaluating the whole body evenly. If it is medium density, the upper body is emphasized, and if it is high density, the vicinity of the head is emphasized.

The detection of individual objects and the single-object feature are described below.

The single feature storage means 41 functions as the object model storage means 410a. It stores in advance information on an object model imitating the shape of a single person (object), and holds this object model information as the single-object feature.

FIG. 4 schematically shows the single-object feature stored in the single feature storage means 41, that is, the object model information stored in the object model storage means 410a.

Specifically, the object model stored in the object model storage means 410a is a three-dimensional model 700 consisting of three spheroids corresponding to the head, torso, and legs of a standing person. The center of gravity of the head is taken as the representative position of the person. In addition to the three-dimensional model 700, the object model storage means 410a stores an evaluation range 702 for each density. It also stores the camera parameters 701 of the imaging unit 2, which are used for projecting the three-dimensional model 700 onto the coordinate system of the captured image. The camera parameters 701 include external parameters, such as the installation position and imaging direction of the imaging unit 2 in the actual monitored space. They also include internal parameters, such as the focal length, angle of view, lens distortion and other lens characteristics of the imaging unit 2, and the number of pixels of the image sensor.

The higher the density, the smaller the portion of a single object covered by the evaluation range 702. Specifically, the object model storage means 410a stores the following settings:
- "entire" in association with the value representing the low density class;
- "upper 2/3" in association with the value representing the medium density class;
- "upper 1/3" in association with the value representing the high density class.

Hereinafter, the low-density object model 710, represented by the combination of the evaluation range "entire" and the three-dimensional model 700, is called the whole-body model. The medium-density object model 711, represented by the combination of the evaluation range "upper 2/3" and the three-dimensional model 700, is called the upper-body model. The high-density object model 712, represented by the combination of the evaluation range "upper 1/3" and the three-dimensional model 700, is called the head-vicinity model.
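The density-dependent evaluation ranges can be illustrated on a projected silhouette. This is a sketch under a stated assumption: the patent defines the ranges on the three-dimensional model, whereas this code cuts the projected whole-body pixel mask by image rows. That is only an approximation for an upright person viewed from above. The names are hypothetical.

```python
# Evaluation range per density class as a fraction (numerator, denominator) of the
# model's height, measured from the top. Integer fractions avoid rounding at the cut.
EVALUATION_RANGE = {"low": (1, 1), "medium": (2, 3), "high": (1, 3)}


def restrict_to_evaluation_range(mask_pixels, density):
    """Keep only the upper part of a projected whole-body silhouette.

    mask_pixels: set of (row, col) pixels covered by the projected model.
    Rows grow downwards, so the smallest row is the top of the head.
    """
    if not mask_pixels:
        return set()
    num, den = EVALUATION_RANGE[density]
    top = min(r for r, _ in mask_pixels)
    bottom = max(r for r, _ in mask_pixels)
    height = bottom - top + 1
    # Keep a pixel when its offset from the top is below num/den of the height.
    return {(r, c) for r, c in mask_pixels if (r - top) * den < num * height}
```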

In this way, the object model storage means 410a stores the following as the object model information, together with the camera parameters 701:
- the whole-body model 710, in association with the low density class;
- the upper-body model 711, in association with the medium density class;
- the head-vicinity model 712, in association with the high density class.

The arrangement generation means 510a generates multiple mutually different arrangements, each containing one or more candidate positions, and outputs each generated arrangement to the model image generation means 512a.

To do so, the arrangement generation means 510a uses random numbers to select a number of pixels (the arrangement count), between one and an upper limit. The pixels are chosen from among those of the captured image whose estimated density is low, medium, or high density. The arrangement is formed by taking the position of each selected pixel as a candidate position. The arrangement generation means 510a repeats this generation a predetermined number of times for each arrangement count while incrementing the arrangement count, thereby generating multiple mutually different arrangements. The upper limit of the arrangement count can be set to the maximum number of objects that can exist in the monitored space. For example, it can be calculated as the number of three-dimensional models of standing persons that can be placed without overlap in a virtual space imitating the monitored space.
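The arrangement generator can be sketched as a generator function. Two assumptions apply. First, the density map is a 2-D list of class labels. Second, candidates are drawn without replacement (no two candidates on the same pixel), which the text does not state. A fixed seed is used only to make runs reproducible, and the names are hypothetical.

```python
import random


def foreground_pixels(density_map):
    """All (x, y) pixels whose estimated density is not background."""
    return [(x, y) for y, row in enumerate(density_map)
            for x, label in enumerate(row) if label != "background"]


def generate_arrangements(density_map, max_count, trials_per_count, seed=0):
    """Yield random arrangements, `trials_per_count` for each count 1..max_count.

    Sampling without replacement (no two candidates on the same pixel) is an
    assumption; the description only says the pixels are chosen at random.
    """
    rng = random.Random(seed)
    pixels = foreground_pixels(density_map)
    for count in range(1, min(max_count, len(pixels)) + 1):
        for _ in range(trials_per_count):
            yield rng.sample(pixels, count)
```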

For each of the multiple arrangements input from the arrangement generation means 510a, the model image generation means 512a generates a model image. It does so by drawing an object model at each candidate position; the higher the density there, the smaller the portion of a single object the drawn model imitates. Each generated model image is output to the evaluation value calculation means 514a.

To do so, the model image generation means 512a reads the camera parameters from the object model storage means 410a. For each arrangement, it uses them to back-project each candidate position onto the horizontal plane at the height of the head center of gravity of the three-dimensional model (for example, 1.5 m). This gives the representative position, in a virtual space imitating the monitored space, of the three-dimensional model that projects onto that candidate position.
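The back-projection onto a horizontal plane is a ray-plane intersection. The sketch below rests on three assumptions: an undistorted pinhole camera with focal lengths `fx`, `fy` and principal point `cx`, `cy`; the convention X_cam = R·X_world + t; and a world z-axis pointing up. Lens distortion, which the camera parameters 701 include, is ignored here, and the function name is hypothetical.

```python
def back_project_to_plane(u, v, fx, fy, cx, cy, R, t, plane_height):
    """Intersect the viewing ray through pixel (u, v) with the plane z = plane_height.

    Assumes an undistorted pinhole camera with X_cam = R @ X_world + t, R a 3x3
    rotation (nested lists) and t a 3-vector; world z points up.
    """
    # Ray direction in camera coordinates (inverse of the intrinsic matrix).
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rotate into world coordinates: d = R^T d_cam.
    d = [sum(R[i][j] * d_cam[i] for i in range(3)) for j in range(3)]
    # Camera centre in world coordinates: C = -R^T t.
    C = [-sum(R[i][j] * t[i] for i in range(3)) for j in range(3)]
    if abs(d[2]) < 1e-12:
        raise ValueError("viewing ray is parallel to the plane")
    s = (plane_height - C[2]) / d[2]
    if s <= 0:
        raise ValueError("plane lies behind the camera")
    return tuple(C[j] + s * d[j] for j in range(3))
```

For example, a camera 5 m above the floor looking straight down maps its principal point to (0, 0, 1.5) on the 1.5 m head plane.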

The model image generation means 512a also reads the head-vicinity model from the object model storage means 410a and places it at the representative position in the virtual space corresponding to each candidate position. It then projects the model onto the coordinate system of the captured image using the camera parameters. Next, referring to the density distribution input from the density estimation means 50, it tallies the estimated densities within the projection region of the head-vicinity model for each candidate position. The most frequent estimated density (excluding the background class) is taken as the density of that candidate position.
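The density of a candidate position is a majority vote over the projected head-vicinity region. This minimal sketch assumes the region is already clipped to the image and given as (row, col) pixels. Tie-breaking is not addressed in the text; here `Counter.most_common` keeps the label encountered first. The function name is hypothetical.

```python
from collections import Counter


def candidate_density(density_map, region_pixels):
    """Most frequent non-background density label inside a projected region.

    density_map is indexed [row][col]; region_pixels is an iterable of (row, col)
    pixels assumed to lie inside the map. Returns None if the region covers only
    background. Ties go to the label encountered first (an assumption).
    """
    counts = Counter(
        density_map[r][c]
        for r, c in region_pixels
        if density_map[r][c] != "background"
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```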

The model image generation means 512a then reads, for each candidate position, the object model corresponding to that position's density from the object model storage means 410a. Specifically, it reads the whole-body model for low density, the upper-body model for medium density, and the head-vicinity model for high density. Then, for each arrangement, it places each object model read for a candidate position at the corresponding representative position in the virtual space. It projects the shape of each placed object model onto the coordinate system of the captured image using the camera parameters, thereby generating a model image for each arrangement.
The model image generation means 512a projects the object models in order, starting from the one placed at the representative position farthest from the imaging unit 2, and overwrites the projection regions, so that the generated model image expresses the occlusion between object models.
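Drawing the models far-to-near and overwriting is the painter's algorithm. This sketch assumes each model has already been projected to a pixel set and that its distance to the camera is known. It also records each model's own projected area before overwriting, which the flowchart later uses for the concealment degree. The names are hypothetical.

```python
def render_model_image(height, width, models):
    """Draw projected models far-to-near so nearer models overwrite farther ones.

    models: list of (distance_to_camera, set of (row, col) pixels), one per
    candidate position. Returns a label image holding the index of the visible
    model at each pixel (-1 where no model is drawn) and each model's own
    projected area in pixels, recorded before any overwriting.
    """
    image = [[-1] * width for _ in range(height)]
    areas = [0] * len(models)
    # Farthest first: a larger distance is drawn earlier and may be covered later.
    for i in sorted(range(len(models)), key=lambda k: models[k][0], reverse=True):
        inside = [(r, c) for r, c in models[i][1] if 0 <= r < height and 0 <= c < width]
        areas[i] = len(inside)
        for r, c in inside:
            image[r][c] = i
    return image, areas
```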

For each arrangement, the model image generation means 512a also calculates a concealment degree, which represents the degree of overlap between the object models in the model image, according to the following equation:
Concealment degree = area of the overlapping regions between models / area of the union of the model projection regions   (1)
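Equation (1) can be computed from the per-model pixel sets. The sketch follows the flowchart's recipe: the overlap area is the sum of the individual areas minus the union area. It assumes the masks are sets of (row, col) pixels, and the function name is hypothetical.

```python
def concealment_degree(masks):
    """Equation (1): overlap area / union area of the projected model regions.

    masks: list of sets of (row, col) pixels, one per object model. As in the
    flowchart, the overlap area is the sum of the individual areas minus the
    union area, so a pixel covered by k models contributes k - 1.
    """
    union = set().union(*masks)
    if not union:
        return 0.0
    overlap = sum(len(m) for m in masks) - len(union)
    return overlap / len(union)
```

For example, two three-pixel models sharing one pixel give an overlap of 1 and a union of 5, so the concealment degree is 0.2.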

The model image generation means 512a then outputs each arrangement, its model image, and its concealment degree, associated with one another, to the evaluation value calculation means 514a.

FIG. 5 schematically shows an example of processing by the density estimation means 50, the arrangement generation means 510a, and the model image generation means 512a according to the first embodiment.
Image 720 visualizes the density distribution estimated by the density estimation means 50. In this density distribution, white areas are regions whose estimated density is background, horizontally hatched areas are low density, diagonally hatched areas are medium density, and grid-hatched areas are high density.
Image 721 plots, with × marks in the coordinate system of the captured image, the eight candidate positions included in an arrangement generated by the arrangement generation means 510a.
Three-dimensional model 722 illustrates how the model image generation means 512a places three-dimensional models at the representative positions in the virtual space corresponding to the eight candidate positions shown in image 721.
Image 723 shows the model image created by the model image generation means 512a. It determines the density of each candidate position from the density distribution shown in image 720 and projects, at each candidate position, the three-dimensional model with the evaluation range corresponding to that density.

For each of the multiple arrangements, the evaluation value calculation means 514a calculates an evaluation value representing how similar the model image input from the model image generation means 512a is to the captured image. It outputs the evaluation value for each arrangement to the optimal arrangement determination means 516a.

Specifically, the evaluation value calculation means 514a calculates the similarity between each model image and the captured image according to the following equation:
Similarity = shape matching degree − W_Ha × concealment degree   (2)
Here, W_Ha is a weighting coefficient greater than 0, set in advance based on prior experiments. The concealment degree subtracted from the shape matching degree is a penalty that suppresses excessive overlap of object models. Determining the optimal arrangement from a similarity that includes the concealment degree prevents false detections of object positions. Such false detections occur when more object models are fitted than there are actual objects.

The shape matching degree can be the edge similarity between the model image and the captured image. The evaluation value calculation means 514a extracts edges from each model image and from the captured image. For each model image, it considers every pixel at which a valid edge was extracted from the model image and calculates the absolute difference from the edge at the corresponding pixel of the captured image. It sums these differences, divides the sum by the number of pixels at which edges were extracted from the model image, and inverts the sign. The result is the shape matching degree of that model image.
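The edge-difference shape matching degree and equation (2) can be sketched together. The sketch assumes both edge maps are equally sized 2-D lists of edge magnitudes. A model pixel counts as a "valid edge" when its magnitude exceeds a threshold; this criterion is an assumption, and the edge operator itself is not shown. Ranking an arrangement with no model edges last is also an assumption.

```python
def shape_matching_degree(model_edges, image_edges, edge_threshold=0.0):
    """Negated mean absolute edge difference over the model's valid-edge pixels.

    model_edges / image_edges: equally sized 2-D lists of edge magnitudes. A model
    pixel counts as a valid edge when its magnitude exceeds edge_threshold (an
    assumed criterion; the edge operator itself is not shown).
    """
    total, n = 0.0, 0
    for r, row in enumerate(model_edges):
        for c, m in enumerate(row):
            if m > edge_threshold:
                total += abs(m - image_edges[r][c])
                n += 1
    if n == 0:
        return float("-inf")  # no model edges: rank this arrangement last
    return -total / n


def similarity(shape_match, concealment, w_ha):
    """Equation (2): similarity = shape matching degree - W_Ha * concealment degree."""
    if w_ha <= 0:
        raise ValueError("W_Ha must be greater than 0")
    return shape_match - w_ha * concealment
```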

Alternatively, the evaluation value calculation means 514a generates edge images from each model image and from the captured image. For each model image, it performs chamfer matching between the edge image of the captured image and the edge image of that model image. The sign-inverted chamfer distance obtained is taken as the shape matching degree of that model image.
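A brute-force chamfer distance shows the idea behind this alternative. The sketch measures distances in one direction only, from model edges to captured-image edges, and averages them with Euclidean distances. Both choices are assumptions, since the source does not specify the variant. Practical implementations precompute a distance transform of the captured image's edge map rather than searching all pairs.

```python
import math


def chamfer_distance(model_edge_pixels, image_edge_pixels):
    """Mean distance from each model edge pixel to its nearest image edge pixel.

    Brute force O(N*M) for clarity; practical implementations precompute a
    distance transform of the captured image's edge map instead.
    """
    if not model_edge_pixels or not image_edge_pixels:
        raise ValueError("both edge sets must be non-empty")
    total = 0.0
    for r, c in model_edge_pixels:
        total += min(math.hypot(r - r2, c - c2) for r2, c2 in image_edge_pixels)
    return total / len(model_edge_pixels)


def chamfer_shape_matching_degree(model_edge_pixels, image_edge_pixels):
    """Shape matching degree as the sign-inverted chamfer distance."""
    return -chamfer_distance(model_edge_pixels, image_edge_pixels)
```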

The optimal arrangement determination means 516a refers to the evaluation value of each arrangement input from the evaluation value calculation means 514a. It determines the candidate positions in the arrangement with the maximum evaluation value to be the object positions and outputs information on them to the object position output means 31. That is, each candidate position in the arrangement with the maximum similarity is taken to be the position of a person captured in the captured image.
The object position information can take several forms. For example, to make the result easy for an observer to see, the optimal arrangement determination means 516a can draw an object model at each object position, color-coded according to the density at that position. Alternatively, the information may be the coordinates of the object positions themselves. It may also be the region of each drawn object model that does not overlap other object models, or a combination of two or more of these.

The object position output means 31 sequentially outputs the object position information input from the object position determination means 51 to the display unit 6, which displays it. For example, the object position information is transmitted over the Internet and displayed on the display unit 6. By viewing the displayed information, an observer identifies locations in the monitored space where congestion is occurring. The observer then takes action, such as dispatching security guards to those locations or increasing their number.

<Operation of the image monitoring apparatus 1 according to the first embodiment>
The operation of the image monitoring apparatus 1 is described with reference to the flowcharts of FIGS. 6, 7, and 8.

When the image monitoring apparatus 1 starts operating, the imaging unit 2 installed at the event venue captures the monitored space at predetermined intervals. It sequentially transmits the captured images to the image analysis center where the image processing unit 5 is installed. Each time the image processing unit 5 receives a captured image, it repeats the operation according to the flowchart of FIG. 6.

First, the communication unit 3 operates as the image acquisition means 30 and waits to receive a captured image from the imaging unit 2. Having acquired a captured image, the image acquisition means 30 outputs it to the image processing unit 5 (step S1).

On receiving the captured image, the image processing unit 5 operates as the density estimation means 50 and estimates the density distribution from the captured image (step S2). The density estimation means 50 extracts the estimation features at each pixel position of the captured image and reads the density estimator from the density estimator storage means 40 of the storage unit 4. It then inputs each estimation feature into the density estimator to obtain the estimated density at each pixel, which gives the density distribution.

Having estimated the density distribution, the image processing unit 5 also operates as the object position determination means 51. This means receives the captured image from the image acquisition means 30 and the density distribution from the density estimation means 50. It then checks whether the density distribution contains any estimated density other than the background class (step S3).

If an estimated density other than the background class is present (YES in step S3), at least one person is assumed to have been captured. The object position determination means 51 then determines the position of each object from the captured image (step S4). If only the background class is present (NO in step S3), no person is assumed to have been captured, and steps S4 and S5 are skipped.

The object position determination processing of step S4 is described with reference to the flowcharts of FIGS. 7 and 8. In this processing, the single feature storage means 41 operates as the object model storage means 410a. The object position determination means 51 operates as the arrangement generation means 510a, the model image generation means 512a, the evaluation value calculation means 514a, and the optimal arrangement determination means 516a.

The arrangement generation means 510a sets the arrangement count in turn over the range from 1 to the upper limit (step S100) and controls the loop of steps S100 to S114.

The arrangement generation means 510a also prepares a variable T for counting iterations, initializes T to 0 (step S101), and starts the iterative processing of steps S102 to S113.

Next, the arrangement generation means 510a randomly sets as many candidate positions as the arrangement count set in step S100. The positions lie within the regions of the density distribution (input from the density estimation means 50) whose estimated density is low, medium, or high. This generates the T-th arrangement for that arrangement count, which is output to the model image generation means 512a (step S102).

The model image generation means 512a reads the camera parameters from the object model storage means 410a. Using them, it converts each candidate position in the arrangement generated in step S102 into three-dimensional coordinates in the virtual space (step S103).

Next, the model image generation means 512a prepares and initializes a model image of the same size as the captured image. It calculates the distance from the three-dimensional coordinates of each candidate position to the imaging unit 2. It then sets the candidate positions as processing targets in order of decreasing distance (step S104), executing the loop of steps S104 to S108.

Next, the model image generation means 512a determines the density of the candidate position being processed by referring to the density distribution (step S105). It reads the head-vicinity model from the object model storage means 410a and places it at the three-dimensional coordinates of the candidate position. It then projects the model onto the coordinate system of the captured image using the camera parameters. The most frequent estimated density (other than the background class) within the projection region becomes the density of the candidate position.

The model image generation means 512a then reads the object model corresponding to the density determined in step S105 from the object model storage means 410a (step S106). It places the model at the three-dimensional coordinates of the candidate position being processed and projects it onto the model image using the camera parameters, overwriting whatever is already there (step S107). At this time, the model image generation means 512a also records the projected area of the object model.

The model image generation means 512a then checks whether all candidate positions in the T-th arrangement for the current arrangement count have been processed (step S108). If any candidate position remains unprocessed (NO in step S108), the processing returns to step S104 to process the next candidate position.

When all candidate positions have been processed (YES in step S108), the model image for the T-th arrangement at the current arrangement count is complete. The model image generation means 512a then calculates the concealment degree of the object models in that model image (step S109). This takes four sub-steps:
1. Obtain the area of the projection region on the model image, which is the "area of the union of the model projection regions".
2. Sum the projected areas recorded for each object model in step S107.
3. Subtract the union area from this sum to obtain the "area of the overlapping regions between models".
4. Substitute these values into equation (1) to calculate the concealment degree.

Having calculated the concealment degree, the model image generation means 512a outputs the model image and the concealment degree to the evaluation value calculation means 514a.

On receiving the model image and the concealment degree, the evaluation value calculation means 514a calculates the shape matching degree between the model image and the captured image (step S110). From the shape matching degree and the concealment degree, it then calculates the similarity between the model image and the captured image as the evaluation value for the T-th arrangement at the current arrangement count (step S111). That is, it generates edge images from the model image and from the captured image and calculates the similarity between these edge images as the shape matching degree. It then substitutes the shape matching degree and the concealment degree into equation (2) to calculate the similarity.

Once the evaluation value for the T-th arrangement at the current arrangement count has been calculated, the evaluation value calculation means 514a records the arrangement and its evaluation value in association with each other. The arrangement generation means 510a increments the iteration count T by 1 (step S112) and compares it with the specified count T_MAX (step S113). If T is less than T_MAX (NO in step S113), the processing returns to step S102 and the iteration continues for the current arrangement count.

When the iteration count T reaches the prescribed count T_MAX (YES in step S113), the arrangement generation means 510a ends the iterations for the current arrangement count and checks whether every arrangement count has been set (step S114). If some arrangement count has not yet been set (NO in step S114), the process returns to step S100 to process the next arrangement count.

On the other hand, when every arrangement count has been set (YES in step S114), the evaluation value calculation means 514a inputs the arrangements and evaluation values recorded in step S112 to the optimum arrangement determination means 516a. The optimum arrangement determination means 516a identifies the arrangement with the largest evaluation value among them (step S115) and determines that this arrangement is the information representing the positions of the individual persons captured in the captured image.
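The nested loop of steps S100 to S115 can be sketched as follows. This is a minimal sketch, assuming hypothetical callables `generate` (step S102) and `evaluate` (steps S103 to S111); it keeps a running maximum instead of storing every arrangement, which selects the same arrangement as recording all of them and taking the maximum in step S115.

```python
import random
from typing import Callable, List, Optional, Tuple

Position = Tuple[int, int]

def search_best_arrangement(
    max_count: int,
    t_max: int,
    generate: Callable[[int, random.Random], List[Position]],
    evaluate: Callable[[List[Position]], float],
    seed: int = 0,
) -> Tuple[Optional[List[Position]], float]:
    rng = random.Random(seed)
    best: Optional[List[Position]] = None
    best_score = float("-inf")
    for count in range(1, max_count + 1):       # S100/S114: each arrangement count
        for _ in range(t_max):                   # S101/S112/S113: T = 0 .. T_MAX - 1
            arrangement = generate(count, rng)   # S102: T-th arrangement
            score = evaluate(arrangement)        # S103-S111: evaluation value
            if score > best_score:               # S115: keep the maximum
                best, best_score = arrangement, score
    return best, best_score
```

For example, with a toy evaluator that prefers three candidates, `search_best_arrangement(5, 4, gen, lambda a: -abs(len(a) - 3))` returns a three-position arrangement with score 0.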

The description continues with reference to FIG. 6 again. The object position determination means 51 outputs the information on the positions of the individual persons (object positions) determined in step S4 to the communication unit 3 (step S5). On receiving the object position information, the communication unit 3 operates as the object position output means 31 and transmits the object position information to the display unit 6.

When the above processing is complete, the process returns to step S1 and the next captured image is processed.

[Second Embodiment]
Next, as another preferred embodiment of the present invention, different from the first embodiment, an example of the image monitoring device 1 will be described that includes an object detection device which changes the weighting applied to the object model according to the density of persons.

The image monitoring device according to the second embodiment differs from that of the first embodiment in the details of the single features stored in the single feature storage means 41 and in the details of the processing performed by the object position determination means 51, while its general configuration, general functions, and part of its operation are shared. The general configuration, general functions, and that part of the operation are therefore described with reference again to the block diagram of FIG. 1, the functional block diagram of FIG. 2, and the flowchart of FIG. 6, all referred to in the first embodiment.

<Configuration of Image Monitoring Device 1 according to Second Embodiment>
The general configuration of the image monitoring device 1 according to the second embodiment is described with reference to the block diagram of FIG. 1.
As in the first embodiment, the image monitoring device 1 comprises an imaging unit 2 that captures the monitored space at predetermined intervals and outputs captured images, a display unit 6 that receives object position information and displays it, and an image processing unit 5 that acquires a captured image, detects individual persons (objects) in it, and generates and outputs information on the positions of the detected objects (object positions). These are connected to a communication unit 3 that mediates the input and output of captured images, object position information, and the like, and a storage unit 4 that stores programs and various data and handles their input and output is connected to the image processing unit 5.

<Functions of Image Monitoring Device 1 according to Second Embodiment>
The functions of the image monitoring device 1 according to the second embodiment are described with reference to the functional block diagrams of FIGS. 2 and 9.

As in the first embodiment, the communication unit 3 includes functions such as the image acquisition means 30, which acquires captured images from the imaging unit 2 and outputs them to the density estimation means 50 and the object position determination means 51, and the object position output means 31, which outputs the object position information input from the object position determination means 51 to the display unit 6.

Also as in the first embodiment, the storage unit 4 includes functions such as the density estimator storage means 40, which stores a density estimator that has learned, for each predetermined density, the image features of density images obtained by capturing a space in which objects are present at that density, and the single feature storage means 41, which stores in advance the image features of a single object (single features). The single features stored in the single feature storage means 41 allow an evaluation that, the higher the density, emphasizes the image features of fewer of the parts that make up the object.

Likewise, as in the first embodiment, the image processing unit 5 includes functions such as the density estimation means 50, which estimates the distribution of the density of objects captured in the captured image by scanning the captured image with the density estimator and outputs the estimated density distribution to the object position determination means 51, and the object position determination means 51, which sets candidate positions at which individual objects may be present in the captured image, calculates an evaluation value representing the degree to which the image features of a single object appear in the captured image at those candidate positions, determines candidate positions whose evaluation value is equal to or greater than a predetermined value to be object positions, and outputs the object position information to the object position output means 31. By using single features matched to the density at each candidate position, the object position determination means 51 calculates the evaluation value so that, the higher the density at the candidate position, the more it emphasizes the image features of fewer of the parts that make up the object.

However, as noted above, the details of the processing performed by the object position determination means 51 and of the single features stored in the single feature storage means 41 in the second embodiment differ from those of the image monitoring device 1 according to the first embodiment. These points are described with reference to the functional block diagram of FIG. 9.

The single feature storage means 41 according to the second embodiment functions as an object model storage means 410b, which stores in advance information on an object model imitating the shape of a single person (object), and as a weighting coefficient storage means 412b, which stores in advance the weighting coefficients used to calculate the evaluation value. It stores the object model information and the weighting coefficient information as the single features.

FIG. 10 schematically shows the single features stored in the single feature storage means 41 according to the second embodiment, that is, the object model information stored in the object model storage means 410b and the weighting coefficient information stored in the weighting coefficient storage means 412b.

The object model stored in the object model storage means 410b is a three-dimensional model 750 composed of three spheroids corresponding to the head, torso, and legs of a standing person. Because this three-dimensional model 750 represents the shape of a person's whole body, it is hereinafter called the whole-body model. The center of gravity of the head is taken as the representative position of the person. In addition, the object model storage means 410b stores, together with the whole-body model, the camera parameters 751 of the imaging unit 2, which are used to project the whole-body model onto the coordinate system of the captured image.

The weighting coefficients are set so that, the higher the density, the more the weight is concentrated on fewer of the parts that make up a single object. The weighting coefficient storage means 412b stores, in association with the value representing the low-density class, a weighting coefficient of 0.333 for the upper third, 0.333 for the middle third, and 0.333 for the lower third; in association with the value representing the medium-density class, 0.500 for the upper third, 0.400 for the middle third, and 0.100 for the lower third; and in association with the value representing the high-density class, 0.700 for the upper third, 0.200 for the middle third, and 0.100 for the lower third. Hereinafter, the low-density weighting coefficients 760, which are uniform over the whole body, are called the whole-body uniform weighting coefficients; the medium-density weighting coefficients 761, which emphasize the upper body, are called the upper-body-weighted coefficients; and the high-density weighting coefficients 762, which emphasize the vicinity of the head, are called the near-head-weighted coefficients.
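The coefficient table above can be held as a simple lookup keyed by density class. This is a sketch only: the class names `"low"`, `"medium"`, and `"high"` and the function name are hypothetical labels, while the numeric values are those listed in the text.

```python
from typing import Dict, Tuple

# (upper third, middle third, lower third) weights per density class.
WEIGHTS: Dict[str, Tuple[float, float, float]] = {
    "low": (0.333, 0.333, 0.333),     # whole-body uniform weighting coefficients 760
    "medium": (0.500, 0.400, 0.100),  # upper-body-weighted coefficients 761
    "high": (0.700, 0.200, 0.100),    # near-head-weighted coefficients 762
}

def weights_for(density_class: str) -> Tuple[float, float, float]:
    """Return the per-third weighting coefficients for a density class."""
    return WEIGHTS[density_class]
```

Each row sums to roughly 1, and the share given to the upper third grows with density, which is the "fewer parts at higher density" rule expressed as data.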

In this way, the object model storage means 410b stores the whole-body model and the camera parameters as the object model information, and the weighting coefficient storage means 412b stores the whole-body uniform weighting coefficients 760 in association with the low-density class, the upper-body-weighted coefficients 761 in association with the medium-density class, and the near-head-weighted coefficients 762 in association with the high-density class.

In the same way as the arrangement generation means 510a described in the first embodiment, the arrangement generation means 510b generates a plurality of mutually different arrangements, each containing one or more candidate positions, and outputs each generated arrangement to the model image generation means 512b.

For each of the arrangements input from the arrangement generation means 510b, the model image generation means 512b generates a model image by drawing, at each candidate position, an object model imitating a single object, and outputs each generated model image to the evaluation value calculation means 514b.

To do so, the model image generation means 512b reads the camera parameters from the object model storage means 410b and, for each arrangement, uses them to back-project each candidate position onto the horizontal plane at the height of the head's center of gravity of the three-dimensional model. This gives the representative position, in a virtual space modeling the monitored space, of the three-dimensional model that projects onto that candidate position.
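The back-projection onto the head-height plane can be sketched by intersecting the viewing ray of a pixel with a horizontal plane. This is a minimal sketch, assuming a pinhole camera with intrinsic matrix `K`, extrinsics mapping world points `X` to camera coordinates as `R @ X + t`, and a world frame whose Z axis points up; the patent does not specify the form of the camera parameters 751, and the function name is hypothetical.

```python
import numpy as np

def backproject_to_plane(u, v, K, R, t, height):
    """Intersect the viewing ray of pixel (u, v) with the world plane Z = height."""
    K, R, t = (np.asarray(a, dtype=float) for a in (K, R, t))
    center = -R.T @ t                                          # camera center in world coords
    direction = R.T @ np.linalg.solve(K, np.array([u, v, 1.0]))  # ray direction in world coords
    if abs(direction[2]) < 1e-12:
        raise ValueError("viewing ray is parallel to the plane")
    s = (height - center[2]) / direction[2]
    if s <= 0:
        raise ValueError("plane lies behind the camera")
    return center + s * direction                              # representative 3D position
```

The returned point is where the whole-body model would be placed so that its head centroid projects onto the candidate pixel.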

The model image generation means 512b also reads the whole-body model from the object model storage means 410b, places it at the representative position in the virtual space corresponding to each candidate position, and projects it onto the coordinate system of the captured image using the camera parameters. Referring to the density distribution input from the density estimation means 50, the model image generation means 512b then tallies the estimated densities within the upper third of the projection region of the whole-body model for each candidate position, and takes the most frequent estimated density (excluding the background class) as the density of that candidate position.
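The density lookup for one candidate position amounts to taking the mode of the density classes inside the upper third of the projected silhouette. This is a sketch under stated assumptions: density classes are hypothetical integer labels with 0 as background, "upper third" is taken as the top third of the rows spanned by the projection mask, and ties are broken by first occurrence, which the patent does not specify.

```python
from collections import Counter
import numpy as np

BACKGROUND = 0  # hypothetical label for the background class

def candidate_density(density_map, body_mask, background=BACKGROUND):
    """Most frequent non-background density class in the upper third of body_mask."""
    density_map = np.asarray(density_map)
    body_mask = np.asarray(body_mask, dtype=bool)
    rows = np.flatnonzero(body_mask.any(axis=1))
    if rows.size == 0:
        return None
    top, bottom = int(rows[0]), int(rows[-1]) + 1
    cut = top + max(1, (bottom - top) // 3)            # end of the upper third
    upper = np.zeros_like(body_mask)
    upper[top:cut] = body_mask[top:cut]
    counts = Counter(int(c) for c in density_map[upper] if c != background)
    if not counts:
        return None                                     # only background in the region
    return counts.most_common(1)[0][0]
```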

For each candidate position, the model image generation means 512b then reads the weighting coefficients corresponding to the density of that candidate position from the weighting coefficient storage means 412b. That is, it reads the whole-body uniform weighting coefficients if the density is low, the upper-body-weighted coefficients if the density is medium, and the near-head-weighted coefficients if the density is high.

For each arrangement, the model image generation means 512b places the whole-body model at the representative position in the virtual space corresponding to each candidate position and projects the shape of each whole-body model onto the coordinate system of the captured image using the camera parameters, thereby generating a model image for each arrangement.
The model image generation means 512b projects the object models in order, starting with the one placed at the representative position farthest from the imaging unit 2, and overwrites the projection regions, so that the model image expresses occlusion between the object models.
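The far-to-near overwriting described above is the painter's algorithm. This is a minimal sketch, assuming each projected model is given as a hypothetical `(distance, label, mask)` tuple and that the model image is represented as an integer label image (0 = no model) rather than a rendered appearance.

```python
import numpy as np

def render_model_image(shape, models):
    """Paint masks far to near so nearer models occlude farther ones.

    models: iterable of (distance_to_camera, label, boolean_mask).
    """
    image = np.zeros(shape, dtype=int)
    for _, label, mask in sorted(models, key=lambda m: m[0], reverse=True):
        image[np.asarray(mask, dtype=bool)] = label   # later (nearer) writes win
    return image
```

Sorting by distance before painting means the input order does not matter; only the camera distance decides which model is visible where projections overlap.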

In association with each model image, the model image generation means 512b also generates a weight image in which the projection region of the whole-body model at each candidate position is filled with the weighting coefficients corresponding to the density of that candidate position. In the projection region of a low-density candidate position, the pixels of the upper third, the middle third, and the lower third are each set to 0.333. In the projection region of a medium-density candidate position, the pixels of the upper third are set to 0.500, those of the middle third to 0.400, and those of the lower third to 0.100. In the projection region of a high-density candidate position, the pixels of the upper third are set to 0.700, those of the middle third to 0.200, and those of the lower third to 0.100.
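Filling one projection region of the weight image can be sketched as follows. This assumes, as a simplification, that the upper, middle, and lower thirds are horizontal bands splitting the rows spanned by the projection mask into three equal parts; the patent assigns the thirds to parts of the whole-body model, so the exact split may differ. The function name is hypothetical; calling it far to near reproduces the overwriting of step S207.

```python
import numpy as np

def paint_weights(weight_image, mask, weights):
    """Write (upper, middle, lower) weights into mask pixels, split by rows into thirds."""
    mask = np.asarray(mask, dtype=bool)
    rows = np.flatnonzero(mask.any(axis=1))
    if rows.size == 0:
        return weight_image
    top, height = int(rows[0]), int(rows[-1]) + 1 - int(rows[0])
    bounds = [top, top + round(height / 3), top + round(2 * height / 3), top + height]
    for (start, stop), w in zip(zip(bounds[:-1], bounds[1:]), weights):
        band = np.zeros_like(mask)
        band[start:stop] = mask[start:stop]   # this third of the silhouette only
        weight_image[band] = w
    return weight_image
```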

The model image generation means 512b then outputs, for each arrangement, the arrangement, its model image, and its weight image in association with one another to the evaluation value calculation means 514b.

FIG. 11 schematically shows the model image 770 and the weight image 771 generated by the model image generation means 512b for the density distribution and arrangement illustrated in FIG. 5. For reasons of space, the weighting coefficients in the weight image 771 are shown to one significant digit.

For each of the model images of the arrangements input from the model image generation means 512b, the evaluation value calculation means 514b obtains the similarity of the object model to the captured image for each part making up the object, and calculates the evaluation value by summing these similarities with weights that are concentrated on fewer parts the higher the density at the candidate position. It outputs the evaluation value of each arrangement to the optimum arrangement determination means 516b.

Specifically, the evaluation value calculation means 514b calculates, for each model image, a weighted similarity between that model image and the captured image, weighted according to the weight image corresponding to the model image.

The weighted similarity can be a weighted similarity between the edges of the model image and those of the captured image. The evaluation value calculation means 514b extracts edges from each model image and from the captured image. For each model image, it takes every pixel at which a valid edge was extracted from the model image, calculates the absolute difference between that edge and the edge at the corresponding pixel of the captured image, weights it by the weighting coefficient set at that pixel of the weight image, and sums the results. The sum is divided by the number of pixels at which edges were extracted from the model image, and the sign of the quotient is inverted to give the weighted similarity of that model image.
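The weighted edge similarity can be written directly from this description. This is a sketch, assuming the edge maps are already computed as nonnegative arrays (for example, gradient magnitude; the patent does not name the edge operator), that a "valid edge" means a value above a hypothetical `threshold`, and that a model image with no edges scores minus infinity.

```python
import numpy as np

def weighted_edge_similarity(model_edges, image_edges, weights, threshold=0.0):
    """Negated, weighted mean absolute edge difference over the model's edge pixels."""
    model_edges = np.asarray(model_edges, dtype=float)
    image_edges = np.asarray(image_edges, dtype=float)
    weights = np.asarray(weights, dtype=float)
    valid = model_edges > threshold                 # pixels with a valid model edge
    n = int(np.count_nonzero(valid))
    if n == 0:
        return float("-inf")
    diff = np.abs(model_edges[valid] - image_edges[valid]) * weights[valid]
    return -float(diff.sum()) / n                   # sign inverted: larger is more similar
```

A perfect match gives 0, and mismatches in heavily weighted parts (such as the head region at high density) lower the score more than mismatches in lightly weighted parts.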

Alternatively, the evaluation value calculation means 514b may generate an edge image from each model image and from the captured image and, for each model image, perform chamfer matching between the edge image of the captured image and the edge image of the model image. The per-pixel distances calculated in the process are weighted according to the weight image, and the sign of the resulting chamfer distance is inverted to give the weighted similarity of that model image.
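The chamfer variant can be sketched with binary edge maps. This is a minimal sketch, assuming the chamfer distance is the weighted mean, over model edge pixels, of the Euclidean distance to the nearest captured-image edge pixel (the patent does not state the normalization). It uses a brute-force nearest-neighbor search for clarity; a practical implementation would normally precompute a distance transform of the captured edge image instead.

```python
import numpy as np

def weighted_chamfer_similarity(model_edges, image_edges, weights):
    """Negated weighted chamfer distance from model edge pixels to image edge pixels."""
    model_pts = np.argwhere(np.asarray(model_edges, dtype=bool))
    image_pts = np.argwhere(np.asarray(image_edges, dtype=bool))
    if len(model_pts) == 0 or len(image_pts) == 0:
        return float("-inf")
    # Distance from each model edge pixel to its nearest image edge pixel (brute force).
    d = np.sqrt(((model_pts[:, None, :] - image_pts[None, :, :]) ** 2).sum(axis=2)).min(axis=1)
    w = np.asarray(weights, dtype=float)[model_pts[:, 0], model_pts[:, 1]]
    return -float((w * d).sum() / len(model_pts))
```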

The optimum arrangement determination means 516b refers to the evaluation value of each arrangement input from the evaluation value calculation means 514b, determines the candidate positions in the arrangement with the largest evaluation value to be the object positions, and outputs the information on the determined object positions to the object position output means 31. In other words, the optimum arrangement determination means 516b determines each candidate position in the arrangement for which the largest similarity was calculated to be the position of a person captured in the captured image.

<Operation of Image Monitoring Device 1 according to Second Embodiment>
The operation of the image monitoring device 1 according to the second embodiment is described below with reference to FIGS. 6, 12, and 13.

When the image monitoring device 1 starts operating, as in the first embodiment, the imaging unit 2 transmits captured images one after another, and the image processing unit 5 repeats the operation of the flowchart of FIG. 6 each time it receives a captured image.

The communication unit 3 operates as the image acquisition means 30, receives a captured image, and outputs it to the image processing unit 5 (step S1). The image processing unit 5, operating as the density estimation means 50, reads the density estimator from the density estimator storage means 40 of the storage unit 4 and estimates the density distribution by scanning the captured image with the density estimator (step S2).

Next, the image processing unit 5 operates as the object position determination means 51. Receiving the captured image from the image acquisition means 30 and the density distribution from the density estimation means 50, the object position determination means 51 checks whether the density distribution contains any estimated density other than the background class (step S3).

If it does (YES in step S3), the object position determination means 51 performs the process of determining the positions of the individual objects from the captured image (step S4); if only the background class is present (NO in step S3), steps S4 and S5 are skipped.

The object position determination process of step S4 is described with reference to the flowcharts of FIGS. 12 and 13. The process is executed with the single feature storage means 41 operating as the object model storage means 410b and the weighting coefficient storage means 412b, and with the object position determination means 51 operating as the arrangement generation means 510b, the model image generation means 512b, the evaluation value calculation means 514b, and the optimum arrangement determination means 516b.

The arrangement generation means 510b sets the arrangement count in turn to each value from 1 up to the upper limit (step S200) and controls the loop of steps S200 to S214.

The arrangement generation means 510b also prepares a variable T for counting iterations, initializes T to 0 (step S201), and starts the iterations of steps S202 to S213.

Next, the arrangement generation means 510b generates the T-th arrangement for the current arrangement count by randomly setting as many candidate positions as the arrangement count set in step S200 within the regions of the density distribution input from the density estimation means 50 whose estimated density is low, medium, or high, and outputs the arrangement to the model image generation means 512b (step S202).
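Step S202 can be sketched as sampling pixels from the non-background part of the density map. This is a sketch under stated assumptions: density classes are hypothetical integer labels with 0 as background, candidate positions are returned as `(x, y)` pixel coordinates, and the candidates within one arrangement are assumed to be distinct pixels (sampling without replacement), which the patent does not specify. NumPy raises `ValueError` if more candidates are requested than there are non-background pixels.

```python
import numpy as np

def random_arrangement(density_map, count, rng, background=0):
    """Pick `count` distinct (x, y) pixels whose estimated density is not background."""
    ys, xs = np.nonzero(np.asarray(density_map) != background)
    if ys.size == 0:
        return []
    idx = rng.choice(ys.size, size=count, replace=False)
    return [(int(xs[i]), int(ys[i])) for i in idx]
```

For example, `random_arrangement(dm, 3, np.random.default_rng(0))` returns three candidate positions, all inside the low-, medium-, or high-density regions of `dm`.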

The model image generation means 512b reads the camera parameters from the object model storage means 410b and uses them to convert each candidate position in the arrangement generated in step S202 into three-dimensional coordinates in the virtual space (step S203).

Next, the model image generation means 512b prepares and initializes a model image and a weight image of the same size as the captured image, and calculates the distance from the three-dimensional coordinates of each candidate position to the imaging unit 2. It then sets the candidate positions as processing targets in order, starting from the farthest (step S204), and executes the loop of steps S204 to S208.

The model image generation means 512b then identifies the density of the candidate position being processed by referring to the density distribution (step S205). It reads the whole-body model from the object model storage means 410b, places it at the three-dimensional coordinates of the candidate position, and projects the placed model onto the coordinate system of the captured image using the camera parameters. It then takes the most frequent estimated density within the upper third of the projection region as the density of the candidate position.

Next, the model image generation means 512b reads the weighting coefficients corresponding to the density identified in step S205 from the weighting coefficient storage means 412b (step S206) and projects the whole-body model and the weighting coefficients (step S207). That is, it first projects the whole-body model placed in step S205 onto the model image using the camera parameters, overwriting whatever is already there, and records the projected area of the object model. It then assigns the read weighting coefficients to the corresponding parts of the whole-body model and projects the weighted whole-body model onto the weight image using the camera parameters, again overwriting.

The model image generation means 512b then checks whether every candidate position in the T-th arrangement for the current arrangement count has been processed (step S208). If any candidate position remains (NO in step S208), the process returns to step S204 to process the next candidate position.

On the other hand, when all candidate positions have been processed (YES in step S208), the model image and the weight image for the T-th arrangement at the current arrangement count are complete, and the model image generation means 512b outputs them to the evaluation value calculation means 514b.

Given the model image and the weight image, the evaluation value calculation means 514b calculates the weighted similarity between the model image and the captured image, weighted according to the weight image, as the evaluation value of the T-th arrangement at the current arrangement count (step S210). That is, it generates an edge image from each of the model image input from the model image generation means 512b and the captured image, weights the per-pixel similarity of these edge images by the weighting coefficient of each pixel, and sums the results to obtain the weighted similarity.

Once the evaluation value of the T-th arrangement at the current arrangement count has been calculated, the evaluation value calculation means 514b records the arrangement in association with its evaluation value, and the arrangement generation means 510b increments the iteration count T by 1 (step S212) and compares it with the prescribed count T_MAX (step S213). If T is less than T_MAX (NO in step S213), the process returns to step S202 and the iterations for the current arrangement count continue.

When the iteration count T reaches the prescribed count T_MAX (YES in step S213), the arrangement generation means 510b ends the iterations for the current arrangement count and checks whether every arrangement count has been set (step S214). If some arrangement count has not yet been set (NO in step S214), the process returns to step S200 to process the next arrangement count.

If all arrangement counts have been set (YES in step S214), the evaluation value calculation means 514b passes the arrangements and evaluation values recorded in step S212 to the optimum arrangement determination means 516b. The optimum arrangement determination means 516b identifies the arrangement with the largest evaluation value among them (step S215). It determines that this arrangement represents the positions of the individual persons captured in the captured image.
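The iteration over arrangement counts and trials (steps S200 to S215) reduces to the loop below. It is only a sketch. The generation and evaluation of an individual arrangement (steps S202 to S210) are passed in as the hypothetical callables `propose` and `evaluate`, and the `Arrangement` type is an assumed representation.

```python
from typing import Callable, List, Tuple

Arrangement = List[Tuple[int, int]]  # hypothetical: (x, y) position of each person


def search_best_arrangement(
    arrangement_counts: List[int],
    t_max: int,
    propose: Callable[[int], Arrangement],
    evaluate: Callable[[Arrangement], float],
) -> Tuple[Arrangement, float]:
    """Try t_max arrangements per count and return the best (arrangement, score)."""
    records: List[Tuple[Arrangement, float]] = []
    for n in arrangement_counts:           # S200: set the next arrangement count
        t = 0
        while t < t_max:                   # S213: loop until T reaches T_MAX
            arrangement = propose(n)       # T-th arrangement of n persons
            records.append((arrangement, evaluate(arrangement)))  # record the pair
            t += 1                         # S212: T <- T + 1
    return max(records, key=lambda rec: rec[1])  # S215: largest evaluation value
```

Keeping every (arrangement, evaluation value) pair until the end mirrors the text, where the final selection happens only after all counts have been tried.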

Returning to FIG. 6, the description continues. The object position determination means 51 outputs the object position information determined in step S4 to the communication unit 3 (step S5). The communication unit 3, operating as the object position output means 31, sends this information to the display unit 6.

[Third embodiment]
The following describes another preferred embodiment of the present invention, distinct from the first and second embodiments. In it, an image monitoring apparatus 1 includes an object detection device that detects individual persons using a discriminator trained on the image features of a single person. In particular, the object detection device of this embodiment switches discriminators according to the density of persons.

The image monitoring apparatus according to the third embodiment differs from those of the first and second embodiments in two respects: the single features stored in the single feature storage means 41, and the processing performed by the object position determination means 51. The general configuration, general functions, and part of the operation are shared. Accordingly, these shared aspects are described again with reference to the block diagram of FIG. 1, the functional block diagram of FIG. 2, and the flowchart of FIG. 6 used for the first and second embodiments.

<Configuration of Image Monitoring Apparatus 1 according to Third Embodiment>
The general configuration of the image monitoring apparatus 1 according to the third embodiment is described with reference to the block diagram of FIG. 1.
As in the first and second embodiments, the image monitoring apparatus 1 comprises three units connected to a communication unit 3:
an imaging unit 2 that captures the monitored space at predetermined time intervals and outputs captured images;
a display unit 6 that receives object position information and displays it; and
an image processing unit 5 that acquires a captured image, detects individual persons (objects) in it, and generates and outputs information on the positions of the detected objects (object positions).
The communication unit 3 is a communication circuit that mediates the input and output of captured images, object position information, and the like. A storage unit 4, which stores programs and various data and inputs and outputs them, is connected to the image processing unit 5.

<Functions of Image Monitoring Apparatus 1 according to Third Embodiment>
The functions of the image monitoring apparatus 1 according to the third embodiment are described with reference to the functional block diagrams of FIGS. 2 and 14.

As in the first and second embodiments, the communication unit 3 includes two functions among others:
image acquisition means 30, which acquires captured images from the imaging unit 2 and outputs them to the density estimation means 50 and the object position determination means 51; and
object position output means 31, which outputs the object position information received from the object position determination means 51 to the display unit 6.

As in the first and second embodiments, the storage unit 4 includes two functions among others:
density estimator storage means 40, which stores a density estimator trained, for each predetermined density, on the image features of density images capturing a space in which objects are present at that density; and
single feature storage means 41, which stores image features of a single object (single features) generated by prior learning.
The single features stored in the single feature storage means 41 support an evaluation that emphasizes the image features of fewer of the object's constituent parts as the density increases.

Also as in the first and second embodiments, the image processing unit 5 includes the density estimation means 50 and the object position determination means 51, among other functions.
The density estimation means 50 scans the captured image with the density estimator to estimate the density distribution of the objects captured in it, and outputs that distribution to the object position determination means 51.
The object position determination means 51 sets candidate positions in the captured image at which individual objects may be present. For each candidate position, it calculates an evaluation value representing the degree to which the image features of a single object appear in the captured image there. It determines candidate positions whose evaluation value is equal to or greater than a predetermined value to be object positions, and outputs the object position information to the object position output means 31. Because it uses single features matched to the density at each candidate position, the higher that density, the fewer of the object's parts its evaluation values emphasize.

However, as noted above, the third embodiment differs from the image monitoring apparatus 1 of the first and second embodiments in the processing performed by the object position determination means 51 and in the single features stored in the single feature storage means 41. These points are described with reference to the functional block diagram of FIG. 14.

The single feature storage means 41 according to the third embodiment functions as single discriminator storage means 411c. It stores in advance discriminators trained on the image features of a single person (object), called single discriminators, and keeps the single discriminator information as the single features.

FIG. 15 schematically shows the single features stored in the single feature storage means 41 according to the third embodiment, that is, the single discriminator information stored in the single discriminator storage means 411c.

A single discriminator is represented by parameters such as the following:
the coefficients of an evaluation value calculation function, which, given the feature quantity of an image, calculates and outputs an evaluation value (identification score) indicating the likelihood that the image shows a single person (a single image); and
a threshold applied to the identification score.
The single discriminator can be trained by applying the linear SVM method to the feature quantities of a set of training images. This set consists of many single images and many person-free images, each showing only things other than people.
When linear SVM is used as the learning algorithm, the coefficients of the evaluation value calculation function form a weight vector. The weight vector holds one weight per element of the feature quantity, and the inner product of the input image's feature quantity with the weight vector is the identification score. Training adjusts the weight vector so that an inner product greater than 0 identifies a person and an inner product of 0 or less identifies a non-person. The threshold for deciding whether an input image is a single image is therefore 0 in principle and can normally be set to 0. However, the threshold may be set below 0 to reduce errors in which a single image is identified as not being a single image.
The feature quantity of the training images is the HOG (Histograms of Oriented Gradients) feature quantity.
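The following minimal sketch shows how a linear-SVM single discriminator turns a feature vector into an identification score and a decision. The HOG feature vector is assumed to be already computed. The function names are illustrative and are not taken from the source.

```python
import numpy as np


def identification_score(weight_vector: np.ndarray, feature: np.ndarray) -> float:
    """Linear-SVM score: inner product of the learned weights and the feature."""
    return float(np.dot(weight_vector, feature))


def is_single_image(weight_vector: np.ndarray, feature: np.ndarray,
                    threshold: float = 0.0) -> bool:
    """Score above the threshold -> single person; at or below -> not a person.

    A threshold below 0 trades more false alarms for fewer missed persons.
    """
    return identification_score(weight_vector, feature) > threshold
```

For example, with weights `[1, -1]`, the feature `[1, 1]` scores exactly 0. It is therefore rejected at the default threshold but accepted at a threshold of -0.5.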

The single discriminators stored in the single discriminator storage means 411c are trained on the image features of fewer of a single object's constituent parts as the density increases. The single discriminator storage means 411c stores three single discriminators:
a single discriminator 800, trained on the image features of a single person's whole body and associated with the value representing the low-density class;
a single discriminator 801, trained on the image features of the upper 2/3 of a single person and associated with the value representing the medium-density class; and
a single discriminator 802, trained on the image features of the upper 1/3 of a single person and associated with the value representing the high-density class.
Hereinafter, the single discriminators 800, 801, and 802 are referred to as the whole-body discriminator, the upper-body discriminator, and the head-vicinity discriminator, respectively.

The three discriminators are trained on different single images:
the whole-body discriminator 800 uses single images capturing the whole body of a single person;
the upper-body discriminator 801 uses single images capturing the upper 2/3 of a single person, for example images cropped to the upper 2/3 of a whole-body single image; and
the head-vicinity discriminator 802 uses single images capturing the upper 1/3 of a single person, for example images cropped to the upper 1/3 of a whole-body single image.

In this way, the single discriminator storage means 411c associates each discriminator with a density class:
the whole-body discriminator 800 with the low-density class;
the upper-body discriminator 801 with the medium-density class; and
the head-vicinity discriminator 802 with the high-density class.

The candidate position setting means 511c sets a plurality of candidate positions at predetermined intervals in the captured image and outputs them to the evaluation value calculation means 514c. Specifically, the interval is one pixel, so the candidate position setting means 511c sets the position of each pixel of the captured image as a candidate position in turn. Each candidate position represents the centroid of a person's head.

For each candidate position received from the candidate position setting means 511c, the evaluation value calculation means 514c calculates an evaluation value. It does so by inputting the image features of the captured image at that candidate position into a single discriminator selected by density: the higher the density at the candidate position, the fewer of a single object's parts that discriminator was trained on. It outputs the calculated evaluation values and accompanying information to the position determination means 517c.

To this end, the evaluation value calculation means 514c sets a window at each candidate position shaped like the upper 1/3 of a single person. It refers to the density distribution received from the density estimation means 50 and tallies the estimated densities within that window. This window corresponds to the identification extraction window described below. The evaluation value calculation means 514c then takes the most frequent estimated density within the window as the density of that candidate position.
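The density of a candidate position might be computed as below, taking the most frequent estimated density class inside the upper-1/3 window. Several details are assumptions: the integer encoding of the density classes and the window coordinates supplied by the caller. The text also does not say how the window is anchored to the head centroid or how ties are broken.

```python
import numpy as np

# Hypothetical encoding of density classes in the estimated density map.
BACKGROUND, LOW, MEDIUM, HIGH = 0, 1, 2, 3


def candidate_density(density_map: np.ndarray, top: int, left: int,
                      height: int, width: int) -> int:
    """Most frequent density class inside the window, clipped to the map bounds.

    Assumes the window overlaps the map by at least one pixel.
    """
    rows, cols = density_map.shape
    patch = density_map[max(top, 0):min(top + height, rows),
                        max(left, 0):min(left + width, cols)]
    classes, counts = np.unique(patch, return_counts=True)
    # np.argmax returns the first maximum, so ties go to the lower class value.
    return int(classes[np.argmax(counts)])
```

Using the mode rather than the maximum means that a few high-density pixels at the window edge do not by themselves switch the candidate to the head-vicinity discriminator.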

The evaluation value calculation means 514c also sets, at each candidate position, an identification extraction window corresponding to that position's density. From the captured image within this window, it extracts a feature quantity for single-person identification (identification feature quantity). The identification extraction window has the shape of the single images used to train the single discriminator for the corresponding density (the solid-line rectangles in FIG. 15), enlarged or reduced at each of a plurality of predetermined magnifications. The window is therefore shaped according to the candidate position's density:
low density: a single person's whole body;
medium density: the upper 2/3 of a single person;
high density: the upper 1/3 of a single person.
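The choice of extraction window per density can be sketched like this. The height fractions (whole body, upper 2/3, upper 1/3) come from the text. The following are hypothetical values chosen for illustration: the template size, the magnification argument, and especially `HEAD_OFFSET`, which places the window relative to the head centroid.

```python
from typing import Tuple

# Share of the full-body template height kept at each density (from the text).
BODY_FRACTION = {"low": 1.0, "medium": 2.0 / 3.0, "high": 1.0 / 3.0}
# Hypothetical: head centroid sits this fraction of body height below the top.
HEAD_OFFSET = 0.1


def extraction_window(head_x: float, head_y: float, density: str,
                      body_w: float, body_h: float,
                      magnification: float = 1.0) -> Tuple[int, int, int, int]:
    """Return (left, top, width, height) of the identification extraction window."""
    w = body_w * magnification
    h_full = body_h * magnification
    top = head_y - HEAD_OFFSET * h_full   # window top is anchored above the head
    left = head_x - w / 2.0               # horizontally centred on the head
    height = BODY_FRACTION[density] * h_full
    return round(left), round(top), round(w), round(height)
```

Calling this once per predetermined magnification yields the set of enlarged and reduced windows the text describes.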

FIG. 16 schematically shows the identification extraction windows that the evaluation value calculation means 514c sets at the candidate positions illustrated in FIG. 5, when the density distribution illustrated in FIG. 5 is obtained.

For each candidate position, the evaluation value calculation means 514c also reads the single discriminator corresponding to that position's density from the single discriminator storage means 411c:
low density: the whole-body discriminator;
medium density: the upper-body discriminator;
high density: the head-vicinity discriminator.
For each candidate position, it then inputs the identification feature quantity extracted at that position into the discriminator read out. The output identification score becomes the evaluation value of the candidate position.
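Selecting the discriminator by density and computing its score could look like the following sketch. The `SingleDiscriminator` container and the `score_candidate` function are hypothetical. Each discriminator's weight vector is assumed to match the length of the feature extracted from its own window shape.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

import numpy as np


@dataclass
class SingleDiscriminator:
    name: str
    weights: np.ndarray   # linear-SVM weight vector for this window shape
    threshold: float = 0.0


def score_candidate(discriminators: Dict[str, SingleDiscriminator],
                    density: str,
                    feature: np.ndarray) -> Tuple[str, float, float]:
    """Pick the discriminator for the density; return (name, score, threshold)."""
    disc = discriminators[density]
    score = float(np.dot(disc.weights, feature))
    return disc.name, score, disc.threshold
```

Returning the threshold together with the score matches the text, which passes each candidate's score and the threshold of the discriminator used on to the position determination step.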

For each candidate position, the evaluation value calculation means 514c then outputs information to the position determination means 517c. This information associates the candidate position, its density, the identification score, the threshold of the single discriminator used, and the identification extraction window used.

The position determination means 517c refers to the information received from the evaluation value calculation means 514c. It determines candidate positions whose evaluation values satisfy a predetermined criterion to be object positions.

Specifically, the position determination means 517c extracts the candidate positions whose identification score is equal to or greater than the corresponding threshold. Among these, it merges into one any group of candidate positions that share the same density and lie close to one another. It then determines each merged candidate position to be a position at which a person is captured.

This merging is needed because high identification scores are calculated for the same person not only at the position where the person is actually captured but also in its vicinity. For example, within each density, the position determination means 517c proceeds as follows:
it takes the candidate positions whose identification score is at or above the threshold as the position of interest in turn, in descending order of identification score;
it treats the candidate positions with lower scores than the position of interest as comparison positions; and
it deletes the information of any comparison position whose identification extraction window overlaps the window set at the position of interest by more than a predetermined ratio.
In this way, multiple candidate positions are merged into one.
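The thresholding and per-density merging described above behave like greedy non-maximum suppression. The sketch below assumes intersection over union as the overlap measure and 0.5 as the predetermined ratio; the text fixes neither. The `Candidate` record is a hypothetical structure.

```python
from typing import List, NamedTuple, Tuple

Box = Tuple[float, float, float, float]  # (left, top, width, height)


class Candidate(NamedTuple):
    position: Tuple[int, int]
    density: str
    score: float
    threshold: float
    window: Box


def overlap_ratio(a: Box, b: Box) -> float:
    """Intersection over union of two windows (assumed overlap measure)."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def merge_candidates(cands: List[Candidate],
                     max_overlap: float = 0.5) -> List[Candidate]:
    """Threshold, then greedily keep the best-scoring window within each density."""
    passed = [c for c in cands if c.score >= c.threshold]
    kept: List[Candidate] = []
    for density in sorted({c.density for c in passed}):
        group = sorted((c for c in passed if c.density == density),
                       key=lambda c: c.score, reverse=True)
        while group:
            focus = group.pop(0)  # position of interest: highest remaining score
            kept.append(focus)
            # Delete comparison positions whose windows overlap it too much.
            group = [c for c in group
                     if overlap_ratio(focus.window, c.window) <= max_overlap]
    return kept
```

Because suppression runs separately for each density, a whole-body window and a head-vicinity window may overlap heavily and still both survive.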

The position determination means 517c then outputs the candidate positions determined to be person positions to the object position output means 31 as object position information.

<Operation of Image Monitoring Apparatus 1 according to Third Embodiment>
The operation of the image monitoring apparatus 1 according to the third embodiment is described below with reference to FIGS. 6 and 17.

When the image monitoring apparatus 1 starts operating, it behaves as in the first and second embodiments. The imaging unit 2 sends captured images in sequence, and the image processing unit 5 repeats the operation in the flowchart of FIG. 6 each time it receives one.

If the estimated densities include any class other than the background class (YES in step S3), the object position determination means 51 determines the positions of individual objects from the captured image (step S4). If only the background class is present (NO in step S3), step S4 is skipped.

The object position determination process of step S4 is described with reference to the flowchart of FIG. 17. During this process, the single feature storage means 41 operates as the single discriminator storage means 411c. The object position determination means 51 operates as the candidate position setting means 511c, the evaluation value calculation means 514c, and the position determination means 517c.

The candidate position setting means 511c sets the position of each pixel in the captured image as a candidate position in turn and inputs it to the evaluation value calculation means 514c (step S300). It also controls the loop of steps S300 to S304.

On receiving a candidate position, the evaluation value calculation means 514c identifies its density by referring to the density distribution (step S301). It sets a window at the candidate position shaped like the upper 1/3 of a single person, and takes the most frequent estimated density within that window as the candidate position's density.

Having identified the density, the evaluation value calculation means 514c reads the single discriminator for that density from the single discriminator storage means 411c. It sets the identification extraction window for that density and extracts the identification feature quantity from the captured image within the window (step S302). It then inputs this feature quantity into the single discriminator for that density to calculate the identification score, which serves as the evaluation value (step S303).

The evaluation value calculation means 514c then records the candidate position, identification extraction window, density, and evaluation value in association with one another. Next, it checks whether the positions of all pixels of the captured image have been set as candidate positions (step S304). If any pixel remains (NO in step S304), the process returns to step S300 to process the next pixel position.

When the positions of all pixels have been set as candidate positions (YES in step S304), the position determination means 517c processes the recorded sets, each consisting of a candidate position, identification extraction window, density, and evaluation value:
it deletes the sets recorded in step S304 whose evaluation value is below the threshold (step S305); and
within each density, it merges the remaining sets whose identification extraction windows overlap one another by more than a predetermined ratio into a single set, treating them as the same person (step S306).
The position determination means 517c determines the candidate position of each set after merging to be the position (object position) of an individual person captured in the captured image.

[Fourth embodiment]
The following describes another preferred embodiment of the present invention, distinct from the first, second, and third embodiments. In it, an image monitoring apparatus 1 includes an object detection device that changes the weighting applied to partial evaluation values output by discriminators according to the density of persons.

The image monitoring apparatus according to the fourth embodiment differs from those of the first, second, and third embodiments in two respects: the single features stored in the single feature storage means 41, and the processing performed by the object position determination means 51. The general configuration, general functions, and part of the operation are shared. Accordingly, these shared aspects are described again with reference to the block diagram of FIG. 1, the functional block diagram of FIG. 2, and the flowchart of FIG. 6 used for the first, second, and third embodiments.

<Configuration of Image Monitoring Apparatus 1 according to Fourth Embodiment>
The general configuration of the image monitoring apparatus 1 according to the fourth embodiment is described with reference to the block diagram of FIG. 1.
As in the first, second, and third embodiments, the image monitoring apparatus 1 comprises three units connected to a communication unit 3:
an imaging unit 2 that captures the monitored space at predetermined time intervals and outputs captured images;
a display unit 6 that receives object position information and displays it; and
an image processing unit 5 that acquires a captured image, detects individual persons (objects) in it, and generates and outputs information on the positions of the detected objects (object positions).
The communication unit 3 mediates the input and output of captured images, object position information, and the like. A storage unit 4, which stores programs and various data and inputs and outputs them, is connected to the image processing unit 5.

<Functions of Image Monitoring Apparatus 1 according to Fourth Embodiment>
The functions of the image monitoring apparatus 1 according to the fourth embodiment are described with reference to the functional block diagrams of FIGS. 2 and 18.

As in the first, second, and third embodiments, the communication unit 3 includes two functions among others:
image acquisition means 30, which acquires captured images from the imaging unit 2 and outputs them to the density estimation means 50 and the object position determination means 51; and
object position output means 31, which outputs the object position information received from the object position determination means 51 to the display unit 6.

As in the first, second, and third embodiments, the storage unit 4 includes two functions among others:
density estimator storage means 40, which stores a density estimator trained, for each predetermined density, on the image features of density images capturing a space in which objects are present at that density; and
single feature storage means 41, which stores image features of a single object (single features) generated by prior learning.
The single features stored in the single feature storage means 41 support an evaluation that emphasizes the image features of fewer of the object's constituent parts as the density increases.

Also as in the first, second, and third embodiments, the image processing unit 5 includes the density estimation means 50 and the object position determination means 51, among other functions.
The density estimation means 50 scans the captured image with the density estimator to estimate the density distribution of the objects captured in it, and outputs that distribution to the object position determination means 51.
The object position determination means 51 sets candidate positions in the captured image at which individual objects may be present. For each candidate position, it calculates an evaluation value representing the degree to which the image features of a single object appear in the captured image there. It determines candidate positions whose evaluation value is equal to or greater than a predetermined value to be object positions, and outputs the object position information to the object position output means 31. Because it uses single features matched to the density at each candidate position, the higher that density, the fewer of the object's parts its evaluation values emphasize.

However, as noted above, the fourth embodiment differs from the image monitoring apparatus 1 of the first, second, and third embodiments in the processing performed by the object position determination means 51 and in the single features stored in the single feature storage means 41. These points are described with reference to the functional block diagram of FIG. 18.

The single feature storage means 41 according to the fourth embodiment functions as single classifier storage means 411d, which stores in advance classifiers (single classifiers) that have learned the image features of a single person (object), and as weight coefficient storage means 412d, which stores in advance the weight coefficients used in calculating the evaluation value. The single-classifier information and the weight-coefficient information together constitute the single-object features.

FIG. 19 schematically shows the single-object features stored in the single feature storage means 41 of the fourth embodiment, that is, the single-classifier information stored in the single classifier storage means 411d and the weight-coefficient information stored in the weight coefficient storage means 412d.

As described in the third embodiment, a single classifier is represented by parameters such as the coefficients of an evaluation value calculation function, which takes the feature quantity of an image as input and outputs an evaluation value (identification score) representing the likelihood that the image is a single-person image, and a threshold applied to that identification score. It can be a classifier trained by applying the linear SVM method to the feature quantities of training images consisting of many single-person images and many images containing no person. The feature quantity of the training images can be the HOG feature.
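For a linear SVM, the evaluation value calculation function reduces to a dot product of the learned coefficient vector with the feature vector, plus a bias, compared against a threshold. The sketch below assumes the HOG feature vector has already been extracted as a plain list of floats; the function names and the default threshold of 0.0 are illustrative assumptions, not taken from the patent.

```python
from typing import Sequence


def svm_score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Identification score of a linear SVM: the dot product w . x plus the bias b."""
    if len(features) != len(weights):
        raise ValueError("feature and weight vectors must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + bias


def looks_like_single_object(features: Sequence[float], weights: Sequence[float],
                             bias: float, threshold: float = 0.0) -> bool:
    """Compare the score with the classifier's threshold (0.0 is an assumed default)."""
    return svm_score(features, weights, bias) >= threshold
```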

The single classifiers stored in the single classifier storage means 411d have learned the image features of multiple parts constituting a single object. Specifically, the single classifier storage means 411d stores three single classifiers, each trained on a different part: a single classifier 850 trained on the image features of the upper third of a person, a single classifier 851 trained on the middle third, and a single classifier 852 trained on the lower third. Hereinafter, classifier 850 (upper third) is called the upper classifier, classifier 851 (middle third) the middle classifier, and classifier 852 (lower third) the lower classifier.

The weight coefficients are set so that, the higher the density, the more the weight is concentrated on fewer of the parts constituting a single object. The weight coefficient storage means 412d stores, in association with the value representing each density class, the following weight coefficients for the upper, middle, and lower thirds respectively: 0.333, 0.333, and 0.333 for the low-density class; 0.500, 0.400, and 0.100 for the medium-density class; and 0.700, 0.200, and 0.100 for the high-density class. Hereinafter, the low-density weight coefficients 860, which treat the whole body equally, are called the whole-body equal weight coefficients; the medium-density weight coefficients 861, which emphasize the upper body, are called the upper-body-weighted coefficients; and the high-density weight coefficients 862, which emphasize the vicinity of the head, are called the head-vicinity-weighted coefficients.
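The stored weight table can be pictured as a plain lookup keyed by density class. In this sketch the string keys stand in for the unspecified "value representing each density class" and are hypothetical; the numbers are the ones listed above. The helper checks the stated design rule that the weight shifts toward the upper third as density rises (note that the low-density weights sum to 0.999, not exactly 1).

```python
from typing import Dict, Tuple

# Weight coefficients of the fourth embodiment, keyed by a hypothetical class label.
# Each tuple is (upper third, middle third, lower third).
PART_WEIGHTS: Dict[str, Tuple[float, float, float]] = {
    "low": (0.333, 0.333, 0.333),     # whole-body equal weight coefficients 860
    "medium": (0.500, 0.400, 0.100),  # upper-body-weighted coefficients 861
    "high": (0.700, 0.200, 0.100),    # head-vicinity-weighted coefficients 862
}

DENSITY_ORDER = ("low", "medium", "high")


def weights_for(density_class: str) -> Tuple[float, float, float]:
    """Return the (upper, middle, lower) weights stored for a density class."""
    return PART_WEIGHTS[density_class]


def upper_weight_rises_with_density() -> bool:
    """Check that the weight on the upper third grows as the density class rises."""
    uppers = [PART_WEIGHTS[c][0] for c in DENSITY_ORDER]
    return all(a < b for a, b in zip(uppers, uppers[1:]))
```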

Thus, the single classifier storage means 411d stores the upper classifier 850, the middle classifier 851, and the lower classifier 852 as single-classifier information, and the weight coefficient storage means 412d stores the whole-body equal weight coefficients 860 associated with the low-density class, the upper-body-weighted coefficients 861 associated with the medium-density class, and the head-vicinity-weighted coefficients 862 associated with the high-density class.

The candidate position setting means 511d sets multiple candidate positions at predetermined intervals in the captured image and outputs them to the evaluation value calculation means 514d. Specifically, the predetermined interval is one pixel, and the candidate position setting means 511d sets the position of each pixel of the captured image as a candidate position in turn. Each candidate position represents the centroid of a person's head.

For each candidate position input from the candidate position setting means 511d, the evaluation value calculation means 514d inputs the image features of the captured image at that position into the single classifiers, which have learned the image features of the multiple parts constituting a single object, to obtain a partial evaluation value for each part. It then calculates the evaluation value as a weighted sum of the partial evaluation values, with the weighting concentrated on fewer parts the higher the density at the candidate position, and outputs the evaluation value together with the associated information to the position determination means 517d.

To this end, the evaluation value calculation means 514d sets, at each candidate position, an identification extraction window for each part of a person and extracts a feature quantity for single-object identification (identification feature quantity) from the captured image within each window. Each identification extraction window has the shape of the single-person images used to train the single classifier for that part (the solid-line rectangles in FIG. 19), scaled up or down by each of several predetermined magnifications. In other words, the identification extraction windows are three windows shaped as the upper third, the middle third, and the lower third of a single person.

The evaluation value calculation means 514d also reads the single classifier of each part, that is, the upper, middle, and lower classifiers, from the single classifier storage means 411d. Then, for each candidate position and each part, it inputs the identification feature quantity extracted at that candidate position into the corresponding single classifier and takes the output, a partial identification score, as the partial evaluation value of that part at that candidate position. Thus, for each candidate position, the evaluation value calculation means 514d calculates three partial identification scores: one each from the upper, middle, and lower classifiers.

The evaluation value calculation means 514d also refers to the density distribution input from the density estimation means 50 and tallies the estimated densities within the upper-third window set at each candidate position. It then determines the most frequent estimated density in that window to be the density of the candidate position.
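The tally amounts to taking the mode of the per-pixel density classes inside the upper-third window. The sketch assumes the density distribution is a 2-D grid of class labels indexed as `density_map[y][x]` and that the window is a half-open `(left, top, right, bottom)` rectangle clipped to the image; both representations are assumptions. The patent does not say how ties are broken; here, as documented for `Counter.most_common`, equal counts are ordered by first appearance, so the class met first in raster order wins.

```python
from collections import Counter
from typing import Sequence, Tuple


def candidate_density(density_map: Sequence[Sequence[str]],
                      window: Tuple[int, int, int, int]) -> str:
    """Return the most frequent estimated density class inside the window.

    density_map[y][x] is the class estimated at pixel (x, y); window is
    (left, top, right, bottom), half-open and clipped to the image bounds.
    """
    left, top, right, bottom = window
    height = len(density_map)
    width = len(density_map[0]) if height else 0
    counts: Counter = Counter()
    for y in range(max(top, 0), min(bottom, height)):
        for x in range(max(left, 0), min(right, width)):
            counts[density_map[y][x]] += 1
    if not counts:
        raise ValueError("window lies entirely outside the image")
    # Ties resolve to the class first encountered in raster order (an assumption).
    return counts.most_common(1)[0][0]
```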

For each candidate position, the evaluation value calculation means 514d also reads from the weight coefficient storage means 412d the weight coefficients corresponding to the density of that position: the whole-body equal weight coefficients if the density is low, the upper-body-weighted coefficients if it is medium, and the head-vicinity-weighted coefficients if it is high. It then calculates the evaluation value of each candidate position by weighting the partial evaluation value of each part with the corresponding coefficient and summing the results.

That is, let $S_U$, $S_M$, and $S_L$ denote the partial identification scores from the upper, middle, and lower classifiers, respectively. If the density at the candidate position of interest is low, the evaluation value calculation means 514d calculates the identification score of that position by

$$\text{identification score} = 0.333\,S_U + 0.333\,S_M + 0.333\,S_L \tag{3}$$

If the density at the candidate position of interest is medium, it calculates

$$\text{identification score} = 0.500\,S_U + 0.400\,S_M + 0.100\,S_L \tag{4}$$

If the density at the candidate position of interest is high, it calculates

$$\text{identification score} = 0.700\,S_U + 0.200\,S_M + 0.100\,S_L \tag{5}$$
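Equations (3) to (5) can be evaluated directly, as in the minimal sketch below (the class labels are hypothetical keys). The example values illustrate the intent: a person whose legs are occluded might give a good upper score but a poor lower score; under the low-density weights the sum is negative, while under the high-density weights it is positive.

```python
from typing import Dict, Tuple

# Weights of equations (3)-(5): (upper, middle, lower) per density class.
WEIGHTS: Dict[str, Tuple[float, float, float]] = {
    "low": (0.333, 0.333, 0.333),
    "medium": (0.500, 0.400, 0.100),
    "high": (0.700, 0.200, 0.100),
}


def identification_score(s_u: float, s_m: float, s_l: float, density: str) -> float:
    """Weighted sum of the upper, middle and lower partial identification scores."""
    w_u, w_m, w_l = WEIGHTS[density]
    return w_u * s_u + w_m * s_m + w_l * s_l


# Partial scores for a hypothetical person whose lower body is occluded.
low = identification_score(1.0, 0.2, -2.0, "low")    # about -0.266: rejected
high = identification_score(1.0, 0.2, -2.0, "high")  # about 0.54: accepted
```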

FIG. 20 schematically shows how the evaluation value calculation means 514d calculates the identification score for each candidate position illustrated in FIG. 5 when the density distribution illustrated in FIG. 5 is obtained. Image 870 shows the relationship between the parts and the weight coefficients for the three candidate positions whose density is low, image 871 for the three candidate positions whose density is medium, and image 872 for the two candidate positions whose density is high. For reasons of space, the weight coefficients are shown to one significant digit.

The evaluation value calculation means 514d then outputs to the position determination means 517d, for each candidate position, information associating the candidate position with its density, its identification score, and the identification extraction windows used.

The position determination means 517d refers to the information input from the evaluation value calculation means 514d and determines the candidate positions whose evaluation value satisfies a predetermined criterion to be object positions.

Specifically, the position determination means 517d extracts the candidate positions whose identification score is 0 or more. Among them, it merges into one each group of candidate positions that have the same density and are close to each other (that is, whose identification extraction windows overlap by more than a predetermined ratio), and determines each merged candidate position to be a position at which a person is captured. This merging process and its purpose are the same as those of the position determination means 517c of the third embodiment.
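One way to realize the threshold-and-merge step is a greedy, non-maximum-suppression-style pass restricted to candidates of the same density. The excerpt does not specify the overlap measure or which member of a merged group survives, so this sketch assumes intersection over union and keeps the highest-scoring candidate; the `Candidate` record, the default ratio of 0.5, and the single window per candidate are likewise hypothetical.

```python
from typing import List, NamedTuple, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


class Candidate(NamedTuple):
    position: Tuple[int, int]  # candidate position (head centroid) in pixels
    density: str               # density class of the candidate position
    score: float               # identification score (evaluation value)
    window: Box                # extraction window used for the overlap test


def overlap_ratio(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (the overlap measure is an assumption)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def decide_positions(candidates: List[Candidate], score_threshold: float = 0.0,
                     max_overlap: float = 0.5) -> List[Tuple[int, int]]:
    """Keep candidates scoring >= score_threshold, then merge same-density neighbours.

    Candidates are visited from the highest score down; one is dropped if it
    overlaps an already kept candidate of the same density by more than max_overlap.
    """
    passed = sorted((c for c in candidates if c.score >= score_threshold),
                    key=lambda c: c.score, reverse=True)
    kept: List[Candidate] = []
    for cand in passed:
        duplicate = any(k.density == cand.density
                        and overlap_ratio(k.window, cand.window) > max_overlap
                        for k in kept)
        if not duplicate:
            kept.append(cand)
    return [c.position for c in kept]
```

Restricting the merge to equal densities means two heavily overlapping windows evaluated under different density classes are both kept, matching the condition "the same density" in the text.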

The position determination means 517d then outputs the candidate positions determined to be positions of captured persons to the object position output means 31 as object position information.

<Operation of the Image Monitoring Apparatus 1 According to the Fourth Embodiment>
The operation of the image monitoring apparatus 1 according to the fourth embodiment is described below with reference to FIGS. 6 and 21.

When the image monitoring apparatus 1 starts operating, as in the first, second, and third embodiments, the imaging unit 2 sequentially transmits captured images, and each time the image processing unit 5 receives a captured image, it repeats the operation shown in the flowchart of FIG. 6.

The object position determination process of step S4 is described with reference to the flowchart of FIG. 21. The process is executed with the single feature storage means 41 operating as the single classifier storage means 411d and the weight coefficient storage means 412d, and the object position determination means 51 operating as the candidate position setting means 511d, the evaluation value calculation means 514d, and the position determination means 517d.

The candidate position setting means 511d sets the position of each pixel in the captured image as a candidate position in turn, inputs it to the evaluation value calculation means 514d (step S400), and controls the loop of steps S400 to S405.

On receiving a candidate position, the evaluation value calculation means 514d reads the single classifier of each part (upper, middle, and lower) from the single classifier storage means 411d, sets the identification extraction window for each part, and extracts an identification feature quantity from the captured image in each window (step S401). It then inputs each extracted feature quantity into the single classifier of the corresponding part to calculate a partial identification score (partial evaluation value) (step S402).

Having calculated the partial evaluation values, the evaluation value calculation means 514d identifies the density of the candidate position by referring to the density distribution (step S403): it takes the most frequent estimated density within the upper-third window set at the candidate position as the density of that position.

Having identified the density, the evaluation value calculation means 514d reads the weight coefficients corresponding to that density from the weight coefficient storage means 412d and calculates the evaluation value of the candidate position as the sum of the products of the weight coefficients and the partial evaluation values, using whichever of equations (3), (4), and (5) corresponds to the density (step S404).

The evaluation value calculation means 514d then records the candidate position, the identification extraction windows, the density, and the evaluation value in association with one another, and checks whether the positions of all pixels of the captured image have been set as candidate positions (step S405). If any pixel remains (NO in step S405), the process returns to step S400 to handle the next pixel position.

Once the positions of all pixels have been set as candidate positions (YES in step S405), the position determination means 517d deletes, from the recorded sets of candidate position, identification extraction windows, density, and evaluation value, those whose evaluation value is below the threshold (step S406). Then, for each density, it merges the remaining sets whose identification extraction windows overlap each other by more than a predetermined ratio into a single set, treating them as belonging to the same person (step S407). Finally, the position determination means 517d determines the candidate position of each resulting set to be the position (object position) of an individual person captured in the image.
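The loop of steps S400 to S405 can be sketched as a scan over pixel positions that records one evaluation value per candidate. To keep the sketch self-contained, steps S401 to S402 (window extraction and the three part classifiers) and step S403 (density lookup) are passed in as hypothetical callables, and the extraction windows are left out of each record. A `stride` parameter is included because modification (9) below allows scanning at intervals of two or more pixels; the default of 1 matches this embodiment.

```python
from typing import Callable, Dict, List, Tuple

Weights = Dict[str, Tuple[float, float, float]]
Record = Tuple[int, int, str, float]  # (x, y, density class, evaluation value)


def scan_candidates(width: int, height: int,
                    partial_scores: Callable[[int, int], Tuple[float, float, float]],
                    density_at: Callable[[int, int], str],
                    weights: Weights, stride: int = 1) -> List[Record]:
    """Visit candidate positions on a grid and record each evaluation value.

    partial_scores(x, y) stands in for steps S401-S402 and density_at(x, y)
    for step S403; both are hypothetical callables supplied by the caller.
    """
    records: List[Record] = []
    for y in range(0, height, stride):
        for x in range(0, width, stride):          # S400: next candidate position
            s_u, s_m, s_l = partial_scores(x, y)   # S401-S402: partial scores
            density = density_at(x, y)             # S403: density of the position
            w_u, w_m, w_l = weights[density]       # S404: density-dependent weights
            records.append((x, y, density, w_u * s_u + w_m * s_m + w_l * s_l))
    return records                                 # S405: all positions visited
```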

<Modifications>
(1) In each of the above embodiments and their modifications, the object to be detected is a person, but the present invention is not limited to this; the object to be detected may also be a vehicle, or an animal such as a cow or a sheep.

(2) In each of the above embodiments and their modifications, the single-object features are set in units obtained by dividing the object into thirds, but the division is not limited to this. Depending on the detection target, the characteristics of the monitored space being captured, the type of feature quantity or evaluation value used, and so on, the single-object features may be divided at other ratios suited to each case. The single-object features may also be set so that they overlap between densities.

(3) The weight coefficient values given in the second and fourth embodiments and their modifications are only examples; other values suited to the detection target, the characteristics of the monitored space being captured, the type of feature quantity or evaluation value used, and so on may be used instead.

(4) In each of the above embodiments and their modifications, a density estimator trained by the multi-class SVM method is used as an example, but various other density estimators may be used instead, such as one trained by the decision-tree random forest method, the multi-class AdaBoost method, or multi-class logistic regression.
Alternatively, a density estimator using a classification CNN (Convolutional Neural Network) may be used.

(5) In each of the above embodiments and their modifications, the density estimator estimates three density classes other than background, but the classes may be divided more finely.
In that case, instead of the three levels of single-object features (whole body, upper body, and head vicinity), finer levels of single-object features corresponding to the classes may be used, with each class associated with its single-object feature and stored in the single feature storage means 41. Alternatively, the classes may be associated many-to-one with the three levels of single-object features and stored in the single feature storage means 41.

(6) In each of the above embodiments and their modifications, a density estimator that classifies into multiple classes is used as an example, but a regression density estimator that regresses a density value (estimated density) from the feature quantity may be used instead. That is, the density estimator may be one that has learned the parameters of a regression function for obtaining the estimated density from the feature quantity by ridge regression, support vector regression, the regression-tree random forest method, Gaussian process regression, or the like.
Alternatively, a density estimator using a regression CNN may be used.
In these cases, because the estimated density is output as a continuous value rather than as a density class, ranges of estimated density values are stored in the single feature storage means 41 in association with the single-object features.
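Associating value ranges with single-object features can be done with a sorted list of range boundaries and a binary search. The boundary values and feature labels below are hypothetical placeholders, not values from the patent, and the convention that a density equal to a boundary belongs to the higher range is an assumption of this sketch.

```python
import bisect
from typing import Sequence


def feature_for_density(density: float, boundaries: Sequence[float],
                        features: Sequence[str]) -> str:
    """Map a continuous estimated density to a single-object feature.

    boundaries must be sorted ascending and features must have exactly one
    more entry than boundaries; a density equal to a boundary falls into the
    higher range.
    """
    if len(features) != len(boundaries) + 1:
        raise ValueError("need exactly one more feature than boundaries")
    return features[bisect.bisect_right(boundaries, density)]
```

For example, with hypothetical boundaries `[1.0, 3.0]` and features `["whole_body", "upper_body", "head_vicinity"]`, an estimated density of 2.0 selects `"upper_body"`.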

(7) In the second and fourth embodiments and their modifications, the weight coefficient of each part is a constant, but it may instead be a function. In that case, for example, the weight coefficient storage means 412b and 412d store a function that takes the position of a pixel within each part as input and outputs a larger weight coefficient the higher that position lies within the part, and the evaluation value calculation means 514b and 514d input the position of each pixel within each part into that function so as to weight each pixel individually.
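One possible weight function is a linear ramp over the rows of a part, highest at the top and normalized so that the part's row weights still sum to its constant weight. The linear shape and the normalization are assumptions; the modification only requires that the weight grow with height inside the part. A per-pixel weight could then be, for instance, the row weight divided by the part's width.

```python
from typing import List


def height_weights(num_rows: int, part_weight: float) -> List[float]:
    """Per-row weights for one part: larger toward the top, summing to part_weight.

    Row 0 is the top row of the part. The linear ramp is an assumed shape.
    """
    if num_rows <= 0:
        raise ValueError("num_rows must be positive")
    ramp = [num_rows - r for r in range(num_rows)]  # num_rows, ..., 2, 1
    total = sum(ramp)
    return [part_weight * v / total for v in ramp]
```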

(8) In each of the above embodiments and their modifications, the GLCM feature is used as an example of the feature quantity learned by the density estimator and of the feature quantity for estimation, but various other feature quantities may be used instead, such as local binary pattern (LBP) features, Haar-like features, HOG features, or luminance patterns, or a combination of the GLCM feature with several of these.

(9) In each of the above embodiments and their modifications, the density estimation means 50 and the object position determination means 51 scan at one-pixel intervals, but the scanning may also be performed at intervals of two or more pixels.

(10) In each of the above embodiments and their modifications, candidate positions are selected and set from within regions whose estimated density is low, medium, or high, but the arrangement generation means 510a, the arrangement generation means 510b, the candidate position setting means 511c, and the candidate position setting means 511d may each restrict candidate positions to a change region. In that case, the storage unit 4 includes background image storage means (not shown) that stores a background image of the monitored space, and the image processing unit 5 includes change region extraction means (not shown). The change region extraction means either differences the captured image against the background image and extracts the set of pixels whose difference value is equal to or greater than a predetermined difference threshold as the change region, or correlates the captured image with the background image and extracts the set of pixels whose correlation value is equal to or less than a predetermined correlation threshold as the change region. Each of the arrangement generation means 510a and 510b and the candidate position setting means 511c and 511d then sets candidate positions with reference to the change region extracted by the change region extraction means.
When the region in which candidate positions are set is restricted in this way, the arrangement generation means 510a and 510b may each change the upper limit on the number of placed objects according to the size of the restricted region.
Restricting the region in which candidate positions are set prevents accidental similarity between the captured image and a model image, as well as the accidental calculation of a high identification score for the background, and thus reduces false detections of object positions.
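The background-difference variant of change region extraction can be sketched on grayscale images stored as nested lists of intensities, a representation assumed here for self-containment; the correlation-based variant is not shown. Pixels whose absolute difference is at or above the threshold form the change region, and only those pixels become candidate positions.

```python
from typing import List, Sequence, Tuple


def change_mask(frame: Sequence[Sequence[float]],
                background: Sequence[Sequence[float]],
                diff_threshold: float) -> List[List[bool]]:
    """Mark pixels whose absolute difference from the background is >= diff_threshold."""
    return [[abs(f - b) >= diff_threshold for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]


def candidates_in_change_region(mask: Sequence[Sequence[bool]]) -> List[Tuple[int, int]]:
    """List (x, y) positions inside the change region, in raster order."""
    return [(x, y) for y, row in enumerate(mask) for x, inside in enumerate(row) if inside]
```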

(11) In the first and second embodiments and their modifications, the arrangement generation means 510a and 510b generate a random arrangement at each iteration. Alternatively, from the second iteration onward, an arrangement may be generated by updating each candidate position to one shifted slightly from the candidate position of the previous iteration, or, with reference to the similarity obtained for the previous arrangement, by searching for candidate positions stochastically with the MCMC (Markov chain Monte Carlo) method or by improving them successively with the hill-climbing method.

(12) In each of the above embodiments and their modifications, the estimated density at a candidate position of interest is determined by setting, at that position, the projection region of a model shaped as the upper third of a person, or a window of that shape, and tallying the estimated densities within that region. To reduce the amount of processing, a smaller region may be used instead, such as the pixel at the candidate position itself or its 8-neighbourhood or 16-neighbourhood. Alternatively, to improve accuracy, a larger region may be used instead, such as the projection region of a model, or a window, shaped as the upper two thirds of a single person with the candidate position as its representative position, or shaped as the whole body of a single person with the candidate position as its representative position.

(13) In the third embodiment and its modifications, the threshold against which the identification score is compared may differ for each single classifier.

(14) In the third and fourth embodiments and their modifications, single classifiers trained by the linear SVM method are used as an example, but single classifiers trained by various other known learning methods, such as AdaBoost, may be used instead. A pattern matcher may also be used in place of a classifier; in that case, the identification score is, for example, the inner product of the average pattern of the feature quantities extracted from training images of people and the feature quantity of the input image, and the identification score calculation function is a function that takes the feature quantity of the input image as input and outputs that score. A classification CNN may also be used as a single classifier.

(15) In the third and fourth embodiments and their modifications, HOG features were given as the example of the features learned by the single classifiers. Instead of HOG features, various other features such as local binary pattern (LBP) features, Haar-like features, or luminance patterns may be used, or a feature combining HOG features with several of these may be used.
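Combining feature types usually means computing each descriptor separately and concatenating the vectors. The sketch below computes a basic 8-neighbor LBP histogram in plain Python and concatenates it with another vector. It assumes the HOG vector is computed elsewhere (a short placeholder list stands in for it) and that each block is L2-normalized before concatenation; that normalization is a common choice but an assumption here, not something the source specifies.

```python
import math
from typing import List, Sequence

# Neighbor visiting order (row, col) inside a 3x3 patch, clockwise from top-left.
NEIGHBOR_ORDER = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def lbp_code(patch: Sequence[Sequence[float]]) -> int:
    """Basic 8-neighbor LBP code: bit i is set when neighbor i >= center."""
    center = patch[1][1]
    code = 0
    for bit, (r, c) in enumerate(NEIGHBOR_ORDER):
        if patch[r][c] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image: Sequence[Sequence[float]]) -> List[int]:
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            patch = [row[x - 1:x + 2] for row in image[y - 1:y + 2]]
            hist[lbp_code(patch)] += 1
    return hist


def l2_normalize(vector: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)


def combined_feature(*features: Sequence[float]) -> List[float]:
    """Concatenate feature vectors (e.g. HOG + LBP), normalizing each block first."""
    out: List[float] = []
    for feature in features:
        out.extend(l2_normalize(feature))
    return out


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
hog_like = [3.0, 4.0]  # placeholder for a HOG vector computed elsewhere
feature = combined_feature(hog_like, lbp_histogram(image))
print(len(feature), feature[:2], feature[2 + 120])  # 258 [0.6, 0.8] 1.0
```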

According to each of the above embodiments and their modifications, the object detection device determines the position of each object using, for each candidate position, a single-object feature (an image feature of a single object) suited to the occlusion state that the density at that position can cause. This resolves the trade-off between the change in how objects are occluded as congestion changes and the variation in detection accuracy that depends on how much of each object is used for detection, enabling highly accurate object detection.

In the first embodiment and its modifications, the object detection device uses object models representing the single-object feature and resolves the trade-off by switching the object model according to the density at each candidate position, enabling highly accurate object detection.
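Switching the object model by density can be illustrated with silhouette masks. The sketch assumes three hypothetical density classes and maps them to the whole body, upper two-thirds and upper third (the model shapes mentioned in modification (12)), following the rule in the claims that a smaller part is used at higher density. The class names, the mask representation and `object_model_for` are illustrative assumptions.

```python
from fractions import Fraction
from typing import Dict, List, Sequence

Mask = List[List[int]]

# Hypothetical mapping from density class to the fraction of the body (from the
# top) that the object model keeps: higher density -> smaller, upper portion.
FRACTION_BY_DENSITY: Dict[str, Fraction] = {
    "low": Fraction(1),         # whole body
    "medium": Fraction(2, 3),   # upper two-thirds
    "high": Fraction(1, 3),     # upper third
}


def object_model_for(full_body_mask: Sequence[Sequence[int]], density_class: str) -> Mask:
    """Return the object model (a silhouette mask) to use at a candidate position."""
    rows = len(full_body_mask)
    keep = max(1, round(rows * FRACTION_BY_DENSITY[density_class]))
    return [list(row) for row in full_body_mask[:keep]]


body = [[1, 1]] * 6  # toy 6-row silhouette
for cls in ("low", "medium", "high"):
    print(cls, len(object_model_for(body, cls)))  # low 6, medium 4, high 2
```

Exact fractions avoid floating-point surprises when converting a body fraction into a row count.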

In the second embodiment and its modifications, the object detection device uses an object model representing the single-object feature, together with weighting factors applied when evaluating the similarity of the object model to the captured image, and resolves the trade-off by switching the weighting factors according to the density at each candidate position, enabling highly accurate object detection.
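Density-dependent weighting of the similarity can be sketched as a weighted pixel-agreement score in which each model row belongs to a body part. The part names, the weight table and the choice of "fraction of agreeing pixels" as the similarity are all assumptions for illustration; the source does not specify the similarity measure here.

```python
from typing import Dict, Sequence

Mask = Sequence[Sequence[int]]

# Hypothetical per-part weights for each density class: the higher the density,
# the more the weight shifts toward the upper parts of the body.
WEIGHTS: Dict[str, Dict[str, float]] = {
    "low": {"head": 1.0, "torso": 1.0, "legs": 1.0},
    "medium": {"head": 1.0, "torso": 1.0, "legs": 0.3},
    "high": {"head": 1.0, "torso": 0.2, "legs": 0.0},
}


def weighted_similarity(model: Mask, observed: Mask, part_of_row: Sequence[str],
                        density_class: str) -> float:
    """Weighted fraction of model pixels that agree with the observed mask."""
    weights = WEIGHTS[density_class]
    agree = total = 0.0
    for r, (model_row, observed_row) in enumerate(zip(model, observed)):
        w = weights[part_of_row[r]]
        for m, o in zip(model_row, observed_row):
            agree += w * (m == o)
            total += w
    return agree / total if total > 0 else 0.0


model = [[1, 1], [1, 1], [1, 1]]
observed = [[1, 1], [1, 1], [0, 0]]  # legs not visible, e.g. hidden in a crowd
parts = ["head", "torso", "legs"]
for cls in ("low", "medium", "high"):
    print(cls, round(weighted_similarity(model, observed, parts, cls), 3))
# low 0.667, medium 0.87, high 1.0
```

The same model and observation score higher under the high-density weights because the hidden legs contribute little or nothing.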

In the third embodiment and its modifications, the object detection device uses classifiers that have learned the single-object feature and resolves the trade-off by switching the classifier according to the density at each candidate position, enabling highly accurate object detection.
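Switching classifiers by density can be sketched with linear scorers of the form w·x + b, the kind of decision function a linear SVM produces. The weights, biases, region names and density classes below are hypothetical; in practice each classifier would be trained on features (e.g. HOG) of its own body region, so their feature vectors differ in length.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass
class LinearClassifier:
    """Linear scorer w.x + b (weights assumed to come from prior training)."""
    weights: Sequence[float]
    bias: float

    def score(self, feature: Sequence[float]) -> float:
        if len(feature) != len(self.weights):
            raise ValueError("feature length does not match classifier")
        return sum(w * f for w, f in zip(self.weights, feature)) + self.bias


# Hypothetical classifiers, one per density class, each tied to the body region
# whose features it was trained on.
CLASSIFIERS: Dict[str, Tuple[str, LinearClassifier]] = {
    "low": ("whole_body", LinearClassifier([0.5, 0.5, 0.5], -1.0)),
    "high": ("upper_third", LinearClassifier([1.0], -0.5)),
}


def evaluate_candidate(features_by_region: Dict[str, Sequence[float]],
                       density_class: str) -> float:
    """Pick the classifier for the candidate's density and score the matching feature."""
    region, classifier = CLASSIFIERS[density_class]
    return classifier.score(features_by_region[region])


features = {"whole_body": [1.0, 1.0, 0.0], "upper_third": [1.0]}
print(evaluate_candidate(features, "low"))   # 0.0
print(evaluate_candidate(features, "high"))  # 0.5
```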

In the fourth embodiment and its modifications, the object detection device uses classifiers that have learned the single-object feature part by part, together with weighting factors applied when summing the per-part partial evaluation values produced by those classifiers, and resolves the trade-off by switching the weighting factors according to the density at each candidate position, enabling highly accurate object detection.
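The weighted summation of partial evaluation values can be sketched as follows. The part names, the weight table and the example scores are hypothetical; the only property taken from the source is that weighting shifts toward a smaller part of the body as density rises.

```python
from typing import Dict

# Hypothetical part weights per density class; each row sums to 1 and shifts
# toward the head as density rises.
PART_WEIGHTS: Dict[str, Dict[str, float]] = {
    "low": {"head": 1 / 3, "torso": 1 / 3, "legs": 1 / 3},
    "medium": {"head": 0.5, "torso": 0.4, "legs": 0.1},
    "high": {"head": 0.8, "torso": 0.2, "legs": 0.0},
}


def evaluation_value(partial_scores: Dict[str, float], density_class: str) -> float:
    """Density-dependent weighted sum of per-part classifier scores."""
    weights = PART_WEIGHTS[density_class]
    return sum(weights[part] * score for part, score in partial_scores.items())


# Legs hidden in a crowd give a strongly negative partial score.
partial = {"head": 2.0, "torso": 1.0, "legs": -3.0}
for cls in ("low", "medium", "high"):
    print(cls, evaluation_value(partial, cls))
```

With uniform weights the hidden legs cancel out the evidence from head and torso (roughly 0), whereas the high-density weights yield about 1.8, so the candidate survives a positive threshold.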

DESCRIPTION OF SYMBOLS: 1 image monitoring device, 2 imaging unit, 3 communication unit, 4 storage unit, 5 image processing unit, 6 display unit, 30 image acquisition means, 31 object position output means, 40 density estimator storage means, 41 single-object feature storage means, 410a, 410b object model storage means, 411c, 411d single classifier storage means, 412a, 412b, 412d weighting factor storage means, 50 density estimation means, 51 object position determination means, 510a, 510b arrangement generation means, 511c, 511d candidate position setting means, 512a, 512b model image generation means, 514a, 514b, 514c, 514d evaluation value calculation means, 516a, 516b optimal arrangement determination means, 517c, 517d position determination means

Claims

An object detection device that detects individual objects from a captured image of a space in which congestion by predetermined objects can occur, comprising:
density estimation means for estimating the distribution of the density of the objects captured in the captured image, using a density estimator that has learned, for each predetermined density, the image features of density images obtained by capturing the space with the objects present at that density; and
object position determination means for setting candidate positions at which each of the objects can exist in the captured image, calculating an evaluation value representing the degree to which an image feature of a single object appears in the captured image at each candidate position, and determining a candidate position whose evaluation value is equal to or greater than a predetermined value to be the position of an object;
wherein the object position determination means calculates the evaluation value while changing which of the parts constituting the single object is emphasized, according to the density at the candidate position.

The object detection device according to claim 1, wherein the object position determination means calculates the evaluation value by placing emphasis on the image features of a smaller portion of the parts constituting the single object the higher the density at the candidate position.

The object detection device according to claim 2, wherein the object position determination means calculates, as the evaluation value, a value representing the degree to which the image features of a portion of the parts constituting the single object appear at the candidate position in the captured image, that portion being smaller the higher the density at the candidate position.

The object detection device according to claim 2, wherein the object position determination means calculates partial evaluation values each representing the degree to which the image features of one of a plurality of parts constituting the single object appear at the candidate position in the captured image, and calculates the evaluation value by summing the partial evaluation values with weights that, the higher the density at the candidate position, more heavily favor a smaller portion of the parts constituting the object.

The object detection device according to claim 3, wherein the object position determination means comprises:
arrangement generation means for generating a plurality of mutually different arrangements, each containing one or more candidate positions;
model image generation means for generating, for each of the plurality of arrangements, a model image by drawing at each candidate position an object model that imitates a portion of the parts constituting the single object, that portion being smaller the higher the density at the candidate position;
evaluation value calculation means for calculating, for each of the plurality of arrangements, the evaluation value representing the degree of similarity of the model image to the captured image; and
optimal arrangement determination means for determining the candidate positions in the arrangement with the maximum evaluation value to be the positions of the objects.
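The arrangement search in this claim can be illustrated with a toy example. It is a sketch under several assumptions not stated in the claim: images are small binary foreground masks, the object model is a one-pixel-wide column whose drawn height shrinks at high density (`MODEL_HEIGHT` and the density class names are hypothetical), similarity is the fraction of agreeing pixels, and every non-empty subset of candidates is enumerated exhaustively. Exhaustive enumeration is feasible only for a handful of candidates; a practical system would need a smarter search.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Position = Tuple[int, int]  # (row, col) of the top of an object model
Mask = List[List[int]]

# Hypothetical model sizes: the higher the density, the fewer rows (upper body
# parts) of the object model are drawn.
MODEL_HEIGHT: Dict[str, int] = {"low": 3, "high": 1}
MODEL_WIDTH = 1


def render(arrangement: Sequence[Position], density_at: Dict[Position, str],
           shape: Tuple[int, int]) -> Mask:
    """Model image: draw the density-dependent object model at each candidate position."""
    height, width = shape
    image = [[0] * width for _ in range(height)]
    for row, col in arrangement:
        for dr in range(MODEL_HEIGHT[density_at[(row, col)]]):
            for dc in range(MODEL_WIDTH):
                if 0 <= row + dr < height and 0 <= col + dc < width:
                    image[row + dr][col + dc] = 1
    return image


def similarity(model_image: Mask, observed: Mask) -> float:
    """Fraction of pixels on which the model image and the observed mask agree."""
    total = sum(len(r) for r in observed)
    same = sum(m == o for mr, orow in zip(model_image, observed)
               for m, o in zip(mr, orow))
    return same / total


def best_arrangement(candidates: Sequence[Position], density_at: Dict[Position, str],
                     observed: Mask) -> Tuple[Tuple[Position, ...], float]:
    """Score every non-empty arrangement and keep the one with the maximum evaluation value."""
    shape = (len(observed), len(observed[0]))
    best: Tuple[Position, ...] = ()
    best_score = float("-inf")
    for k in range(1, len(candidates) + 1):
        for arrangement in combinations(candidates, k):
            score = similarity(render(arrangement, density_at, shape), observed)
            if score > best_score:
                best, best_score = arrangement, score
    return best, best_score


observed = [[1, 0, 1],
            [1, 0, 1],
            [1, 0, 1]]
candidates = [(0, 0), (0, 1), (0, 2)]
density_at = {p: "low" for p in candidates}
print(best_arrangement(candidates, density_at, observed))  # (((0, 0), (0, 2)), 1.0)
```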

The object detection device according to claim 4, wherein the object position determination means comprises:
arrangement generation means for generating a plurality of mutually different arrangements, each containing one or more candidate positions;
model image generation means for generating, for each of the plurality of arrangements, a model image by drawing an object model that imitates the single object at each candidate position;
evaluation value calculation means for obtaining, for the model image of each of the plurality of arrangements, the partial evaluation values representing the degree of similarity of the object model to the captured image for each part constituting the object, and calculating the evaluation value by summing the partial evaluation values with weights biased toward a smaller portion the higher the density at the candidate position; and
optimal arrangement determination means for determining the candidate positions in the arrangement with the maximum evaluation value to be the positions of the objects.

The object detection device according to claim 3, wherein the object position determination means comprises:
candidate position setting means for setting a plurality of candidate positions in the captured image;
evaluation value calculation means for calculating, for each of the candidate positions, the evaluation value by inputting the image features of the captured image at that candidate position into a classifier that has learned the image features of a portion of the parts constituting the single object, that portion being smaller the higher the density at the candidate position; and
position determination means for determining, as the position of an object, each candidate position for which an evaluation value satisfying a predetermined criterion has been calculated.

The object detection device according to claim 4, wherein the object position determination means comprises:
candidate position setting means for setting a plurality of candidate positions in the captured image;
evaluation value calculation means for inputting, for each of the candidate positions, the image features of the captured image at that candidate position into a classifier that has learned the image features of a plurality of parts constituting the single object to obtain partial evaluation values for the plurality of parts, and calculating the evaluation value by summing the partial evaluation values with weights biased toward a smaller portion the higher the density at the candidate position; and
position determination means for determining, as the position of an object, each candidate position for which an evaluation value satisfying a predetermined criterion has been calculated.