JP6851246B2 - Object detector

Info

Publication number: JP6851246B2
Application number: JP2017086322A
Authority: JP
Inventors: 昌宏前田; 秀紀氏家; 黒川　高晴; 高晴黒川
Original assignee: Secom Co Ltd
Current assignee: Secom Co Ltd
Priority date: 2017-04-25
Filing date: 2017-04-25
Publication date: 2021-03-31
Anticipated expiration: 2037-04-25
Also published as: JP2018185623A

Description

The present invention relates to an object detection device that detects individual objects from a captured image of a space in which predetermined objects such as people may be present, and in particular to an object detection device that detects individual objects from a captured image of a space in which congestion may occur.

In spaces where congestion may occur, such as event venues, measures such as assigning more security guards to congested areas are required in order to prevent accidents. To that end, surveillance cameras can be placed throughout the venue, the distribution of people can be estimated from the captured images, and the estimated distribution can be displayed so that observers can grasp the congestion situation more easily.

In doing so, monitoring efficiency can be expected to improve further by detecting the position of each person and displaying a model imitating the shape of a person at each detected position, and/or by analyzing the positional relationships among people (for example, forming a queue or surrounding something) and reporting the analysis result.

Methods of detecting the positions of individual people from a captured image showing multiple people include fitting a combination of multiple models imitating a person to the captured image, and scanning the captured image with a classifier that has learned in advance the feature values of images of a single person. Both kinds of method use image features of a single person prepared in advance to detect the positions in the captured image where the image features of a single person appear.

For example, the moving object tracking device described in Patent Document 1 detects the position of each moving object by fitting moving object models, each imitating the shape of a moving object being tracked, to the positions where changed pixels were extracted by comparing a surveillance image with a background image; the models are combined in the same number as the moving objects being tracked. For this moving object tracking device, the use of a moving object model approximating the shape of a person's whole body is given as an example.

Also, for example, the object detection device described in Patent Document 2 detects people in an input image using a classifier trained in advance on a large amount of "person" image data and "non-person" image data. It is suggested that the classifier used by this object detection device was trained on image data of people's whole bodies.

Patent Document 1: JP 2012-159958
Patent Document 2: JP 2011-186633

However, in a captured image of a space where congestion may occur, the way people are occluded changes with the state of congestion. Consequently, if the image features of every part of the object are always evaluated uniformly as the image features of a single person, regardless of the state of congestion, it becomes difficult to keep detecting individual people accurately.

That is, for a captured image without congestion, in which many people are captured in full, using the image features of the whole body detects each person more accurately than using the image features of only a part (for example, only the vicinity of the head). This holds both for the method using a model imitating a person and for the method using a classifier trained on images of people.

On the other hand, for a captured image with congestion and frequent occlusion, using the image features of only the parts that are unlikely to be occluded detects each person more accurately than using the image features of the whole body. This again holds for both methods.

Accordingly, a person captured on the left side of a congested region in the captured image is detected more accurately when that person's left part is weighted more heavily than the other parts in the evaluation. Likewise, the right part should be emphasized for a person on the right side, the upper part for a person above the region, and the lower part for a person below it.

Also, for example, always using the image features of only the vicinity of the head in order to raise detection accuracy under congestion lowers detection accuracy when there is no congestion, while always using the image features of the whole body in order to raise detection accuracy without congestion lowers detection accuracy under congestion.

Thus, in a captured image of a space where congestion may occur, the occlusion state of each object to be detected changes with the state of congestion, so it has been difficult to detect individual objects accurately from the captured image when every part of the object is always evaluated uniformly.

The present invention has been made in view of the above problem, and its object is to provide an object detection device that can accurately detect individual objects in a captured image even when the captured image is of a space where congestion may occur.

To achieve this object, the present invention provides an object detection device that detects individual objects from a captured image of a space in which congestion by predetermined objects may occur, comprising: density estimation means that estimates the distribution of the density of the objects captured in the captured image, using a density estimator that has learned, for each of predetermined densities, the image features of density images capturing a space in which the objects are present at that density; and object position determination means that sets candidate positions at which individual objects may be present in the captured image, sets, with each candidate position as a reference, partial regions in the captured image corresponding to each of a plurality of parts constituting the object, calculates for each partial region a partial evaluation value representing the degree to which the image features of the corresponding part appear in that partial region, and determines as the position of an object any candidate position whose integrated evaluation value satisfies a predetermined criterion, the integrated evaluation value being obtained by integrating the partial evaluation values of the plurality of partial regions set with that candidate position as a reference while giving more weight to a partial region the lower its density.

Preferably, the object position determination means sets for each partial region a weighting coefficient that is higher the lower the density in that partial region and lower the higher the density in that partial region, and calculates the integrated evaluation value by weighting the partial evaluation values of the plurality of partial regions set with the candidate position as a reference by the weighting coefficients of those partial regions and summing them.

Preferably, the higher the density at a candidate position, the fewer of the parts constituting the object the object position determination means sets partial regions for.
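As a minimal sketch of this preference, the following Python table maps a density class at the candidate position to the parts that receive partial regions. The part names, the class labels used as keys, and the specific part sets are illustrative assumptions; the source states only that fewer parts are used as the density rises.

```python
# Hypothetical part sets per density class (assumed for illustration only).
# The only property taken from the text is that the set shrinks as density rises.
PARTS_BY_DENSITY = {
    "background": ("head", "torso", "legs"),
    "low density": ("head", "torso", "legs"),
    "medium density": ("head", "torso"),
    "high density": ("head",),
}


def parts_for_candidate(density_at_candidate):
    """Return the parts for which partial regions are set at a candidate position."""
    return PARTS_BY_DENSITY[density_at_candidate]
```

Keeping the head in every set reflects the observation that, for a camera looking down on the scene, the head is the part least likely to be occluded.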

Preferably, the object position determination means includes: arrangement generation means that generates a plurality of mutually different arrangements, each including one or more candidate positions; model image generation means that, for each of the arrangements, generates a model image by drawing, in the partial region corresponding to each of the plurality of parts referenced to each candidate position, a partial model imitating that part; evaluation value calculation means that, for the model image of each arrangement, calculates for each partial region a partial evaluation value representing the degree of similarity of the partial model to the captured image and integrates the partial evaluation values of the plurality of partial regions to calculate the integrated evaluation value; and optimum arrangement determination means that determines the candidate positions in the arrangement with the largest integrated evaluation value to be the positions of the objects.
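The arrangement search can be sketched as follows, assuming for illustration that arrangements are enumerated exhaustively as subsets of a small candidate list. The function name `best_arrangement`, the `max_objects` limit, and the caller-supplied `score` function (standing in for model image generation and evaluation) are hypothetical; the source does not say how arrangements are generated.

```python
from itertools import combinations


def best_arrangement(candidates, score, max_objects):
    """Return the arrangement with the largest integrated evaluation value.

    candidates: list of candidate positions, e.g. (x, y) tuples.
    score: callable mapping an arrangement (tuple of positions) to its
        integrated evaluation value; a stand-in for the evaluation step.
    max_objects: largest arrangement size to try (assumed limit).
    """
    best, best_score = None, float("-inf")
    for k in range(1, max_objects + 1):
        # Each arrangement contains one or more candidate positions.
        for arrangement in combinations(candidates, k):
            value = score(arrangement)
            if value > best_score:
                best, best_score = arrangement, value
    return best, best_score
```

Exhaustive enumeration grows exponentially with the number of candidates, so a practical system would likely sample or search the arrangement space instead.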

Preferably, the object position determination means includes: candidate position setting means that sets a plurality of candidate positions in the captured image; evaluation value calculation means that calculates the partial evaluation value of each partial region by inputting the image features of the partial region corresponding to each of the plurality of parts, set with each candidate position as a reference, to a classifier that has learned the image features of that part, and that calculates the integrated evaluation value for each candidate position by integrating the partial evaluation values of the plurality of partial regions set with that candidate position as a reference; and position determination means that determines a candidate position for which an integrated evaluation value satisfying the criterion was calculated to be the position of an object.

According to the present invention, individual objects can be accurately detected from a captured image of a space where congestion may occur.

A block diagram showing the schematic configuration of the image monitoring device according to the first and second embodiments.
A functional block diagram showing the functions of the image monitoring device according to the first and second embodiments.
A functional block diagram showing the functions of the image monitoring device according to the first embodiment.
A diagram schematically showing an example of the object model information stored in the object model storage means according to the first embodiment.
A diagram schematically showing an example of the object model information stored in the object model storage means according to the first embodiment.
A diagram schematically showing an example of the weights (ratios of weighting coefficients) stored in the weight storage means according to the first embodiment.
A diagram schematically showing an example of processing by the density estimation means, arrangement generation means, and model image generation means according to the first embodiment.
An image schematically showing the weighting coefficients calculated by the model image generation means for each partial region of the model image 743 according to the first embodiment.
A flowchart showing the operation of the image monitoring device according to the first and second embodiments.
A flowchart of the object position determination processing by the image monitoring device according to the first embodiment.
A flowchart of the object position determination processing by the image monitoring device according to the first embodiment.
A functional block diagram showing the functions of the image monitoring device according to the second embodiment.
A diagram schematically showing the partial classifier information stored in the partial classifier storage means and the weight information stored in the weight storage means according to the second embodiment.
A diagram schematically showing how the evaluation value calculation means according to the second embodiment calculates an integrated score for a candidate position.
A flowchart of the object position determination processing by the image monitoring device according to the second embodiment.

[First Embodiment]
Hereinafter, as an embodiment of the present invention, an example of an image monitoring device 1 is described. It includes an example of an object detection device that detects individual people from captured images of an event venue, and it displays the detection results to an observer. In particular, the image monitoring device 1 according to this embodiment includes an example in which the object detection device detects individual people using an object model imitating a person, and in doing so uses a plurality of partial models that constitute the object model.

<Configuration of the image monitoring device 1 according to the first embodiment>
FIG. 1 is a block diagram showing the schematic configuration of the image monitoring device 1. The image monitoring device 1 consists of a photographing unit 2, a communication unit 3, a storage unit 4, an image processing unit 5, and a display unit 6.

The photographing unit 2 is a surveillance camera. It is connected to the image processing unit 5 via the communication unit 3 and is the photographing means that captures the monitored space at predetermined time intervals to generate captured images and sequentially inputs them to the image processing unit 5. For example, the photographing unit 2 is mounted on a pole installed in the event venue with a field of view looking down on the monitored space. The field of view may be fixed, or may be changed according to a preset schedule or to external instructions received via the communication unit 3. Also, for example, the photographing unit 2 captures the monitored space at a frame period of 1 second to generate color images. Monochrome images may be generated instead of color images.

The communication unit 3 is a communication circuit. One end is connected to the image processing unit 5, and the other end is connected to the photographing unit 2 and the display unit 6 via a coaxial cable or a communication network such as a LAN (Local Area Network) or the Internet. The communication unit 3 acquires captured images from the photographing unit 2 and inputs them to the image processing unit 5, and outputs the detection results input from the image processing unit 5 to the display unit 6.

The storage unit 4 is a memory device such as a ROM (Read Only Memory) or RAM (Random Access Memory) and stores various programs and data. The storage unit 4 is connected to the image processing unit 5 and exchanges this information with it.

The image processing unit 5 consists of an arithmetic device such as a CPU (Central Processing Unit), DSP (Digital Signal Processor), or MCU (Micro Control Unit). The image processing unit 5 is connected to the storage unit 4, operates as various processing and control means by reading programs from the storage unit 4 and executing them, and stores various data in the storage unit 4 and reads it back. The image processing unit 5 is also connected to the photographing unit 2 and the display unit 6 via the communication unit 3. It detects individual people by analyzing the captured images acquired from the photographing unit 2 via the communication unit 3, and causes the display unit 6 to display the detection results via the communication unit 3.

The display unit 6 is a display device such as a liquid crystal display or a CRT (Cathode Ray Tube) display. It is connected to the image processing unit 5 via the communication unit 3 and is the display means that displays the detection results of the image processing unit 5. The observer views the displayed detection results, judges whether congestion has occurred, and takes action such as changing the staffing as necessary.

Although this embodiment illustrates an image monitoring device 1 in which the photographing unit 2 and the image processing unit 5 are in a one-to-one relationship, in other embodiments the photographing units 2 and image processing units 5 may be many-to-one or many-to-many.

<Functions of the image monitoring device 1 according to the first embodiment>
FIGS. 2 and 3 are functional block diagrams showing the functions of the image monitoring device 1. The communication unit 3 functions as image acquisition means 30, object position output means 31, and so on, and the storage unit 4 functions as density estimator storage means 40, single feature storage means 41, and so on. The image processing unit 5 functions as density estimation means 50, object position determination means 51, and so on. The single feature storage means 41 includes the functions of object model storage means 410a and weight storage means 412a, and the object position determination means 51 includes the functions of arrangement generation means 510a, model image generation means 512a, evaluation value calculation means 514a, and optimum arrangement determination means 516a.

The image acquisition means 30 sequentially acquires captured images from the photographing unit 2, which is the photographing means, and sequentially outputs them to the density estimation means 50 and the object position determination means 51.

The density estimator storage means 40 stores in advance the information of an estimator (density estimator). The density estimator is an estimated-density calculation function that has learned, for each of predetermined densities, the image features of density images capturing a space in which objects (people) are present at that density. Given the feature values of an image, it calculates and outputs an estimate (estimated density) of the density of the objects captured in that image. In other words, parameters such as the coefficients of the estimated-density calculation function are stored in advance as the information of the density estimator.

The density estimation means 50 extracts feature values for density estimation (estimation feature values) from each location of the captured image input from the image acquisition means 30 and reads the density estimator from the density estimator storage means 40. It estimates the distribution of estimated densities (density distribution) by inputting each extracted estimation feature value to the density estimator, and outputs the estimated density distribution to the object position determination means 51.

The density estimation processing and the density estimator are described concretely below.

The density estimation means 50 sets a window (estimation extraction window) at the position of each pixel of the captured image and extracts an estimation feature value from the captured image within each estimation extraction window. The estimation feature value is a GLCM (Gray Level Co-occurrence Matrix) feature.
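To illustrate the kind of feature named here, the following pure-Python sketch computes a normalized gray-level co-occurrence matrix for one extraction window, together with one common statistic derived from it. The function names, the pixel offset `(dx, dy)`, the number of gray levels, and the choice of the contrast statistic are assumptions; the source does not specify how the GLCM feature is parameterized.

```python
def glcm(window, levels=4, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix of a 2-D window.

    window: list of rows of integer gray levels in [0, levels).
    (dx, dy): offset between the two pixels of each counted pair (assumed).
    """
    counts = [[0] * levels for _ in range(levels)]
    height, width = len(window), len(window[0])
    total = 0
    for y in range(height):
        for x in range(width):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < width and 0 <= y2 < height:
                counts[window[y][x]][window[y2][x2]] += 1
                total += 1
    if total == 0:
        return counts  # no valid pixel pairs in this window
    return [[c / total for c in row] for row in counts]


def glcm_contrast(matrix):
    """Contrast statistic: sum over (i, j) of p(i, j) * (i - j)^2."""
    return sum(p * (i - j) ** 2
               for i, row in enumerate(matrix)
               for j, p in enumerate(row))
```

A feature vector for the estimator would typically concatenate several such statistics over several offsets.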

The regions of the monitored space captured in the estimation extraction windows are desirably the same size. Preferably, therefore, the density estimation means 50 reads the camera parameters of the photographing unit 2, stored in advance, from camera parameter storage means (not shown). It then deforms the captured image by a homography transformation using the camera parameters, so that the region of the monitored space captured at any pixel of the captured image has the same size, and extracts the estimation feature values from the deformed image.
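The core operation of a homography transformation is mapping a pixel through a 3x3 matrix in homogeneous coordinates, which can be sketched as below. The function name and the matrix `h` are hypothetical; in the device the matrix would be derived from the stored camera parameters, which the source does not give.

```python
def apply_homography(h, x, y):
    """Map pixel (x, y) through a 3x3 homography h (row-major nested lists)."""
    xh = h[0][0] * x + h[0][1] * y + h[0][2]
    yh = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if w == 0:
        raise ValueError("point maps to infinity")
    # Divide by the homogeneous coordinate to return to pixel coordinates.
    return xh / w, yh / w
```

Warping a whole image applies the inverse mapping to every output pixel and samples the source image there; mapping the density distribution back to the original image shape uses the inverse matrix in the same way.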

The density estimator can be realized as a classifier that discriminates among images of multiple classes, and can be a discriminant function learned by the multi-class SVM (Support Vector Machine) method.

The density can be defined, for example, as four classes: a "background" class in which no person is present, a "low density" class above 0 people/m² and at most 2 people/m², a "medium density" class above 2 people/m² and at most 4 people/m², and a "high density" class above 4 people/m².
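The class boundaries above can be written as a small Python function, for example for labeling training images whose true density is known. The function name is an assumption, and this is not the estimator itself, which predicts the class from image features.

```python
def density_class(people_per_m2):
    """Map a known density in people per square meter to its class label."""
    if people_per_m2 < 0:
        raise ValueError("density must be non-negative")
    if people_per_m2 == 0:
        return "background"
    if people_per_m2 <= 2:
        return "low density"
    if people_per_m2 <= 4:
        return "medium density"
    return "high density"
```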

The estimated density is a value assigned in advance to each class and is the value output as the result of distribution estimation. In this embodiment, the values corresponding to the classes are written as "background", "low density", "medium density", and "high density".

That is, the density estimator is a discriminant function for distinguishing the images of each class from the other classes, learned by applying the multi-class SVM method to the feature values of a large number of images (density images) belonging to each of the "background", "low density", "medium density", and "high density" classes. The parameters of the discriminant function derived by this learning are stored as the density estimator. The feature values of the density images are of the same kind as the estimation feature values, namely GLCM features.

The density estimation means 50 obtains the estimated density, which is the output value, by inputting each of the estimation feature values extracted for each pixel to the density estimator. When the estimation feature values were extracted from a deformed captured image, the density estimation means 50 deforms the density distribution back to the shape of the original captured image by a homography transformation using the camera parameters.

The set of estimated densities obtained in this way for each pixel of the captured image is the density distribution.
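The per-pixel procedure can be sketched as a sliding-window loop. Here `estimator` is a hypothetical callable standing in for feature extraction plus the trained density estimator, and the window half-size and border clipping are assumptions made for illustration.

```python
def estimate_density_map(image, estimator, half=1):
    """Apply `estimator` to the window centered on every pixel.

    image: list of rows of gray levels.
    estimator: callable taking a window (list of rows) and returning a
        density label; a stand-in for the trained density estimator.
    half: half-size of the square extraction window (assumed value).
    """
    height, width = len(image), len(image[0])
    density_map = []
    for y in range(height):
        row = []
        for x in range(width):
            # Clip the window at the image border.
            window = [image[yy][max(0, x - half):x + half + 1]
                      for yy in range(max(0, y - half), min(height, y + half + 1))]
            row.append(estimator(window))
        density_map.append(row)
    return density_map
```

The result has one estimated density per pixel, matching the definition of the density distribution given above.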

The density distribution output by the density estimation means 50 shows how sparse or dense people are in each part of the captured image, but it does not show the positions of individual people.
In contrast, the object position determination means 51, which follows the density estimation means 50, is the means that determines the positions of the individual people appearing in the captured image.

The object position determination means 51 detects individual objects and determines their positions by searching the captured image for places where the image features of a single person (object) appear. That is, the object position determination means 51 sets candidate positions at which individual objects may be present in the captured image, calculates an evaluation value (integrated evaluation value) representing the degree to which the image features of a single object (single features) appear in the captured image at each candidate position, and determines a candidate position whose integrated evaluation value is at or above a predetermined value to be the position of an object. For example, the single feature is the shape of a person, and the single feature storage means 41 stores the single features in advance. Also, for example, the integrated evaluation value is the similarity between the edges of the captured image and a model representing the shape of a person. The integrated evaluation value is obtained by integrating the evaluation values (partial evaluation values) for each of the plurality of parts constituting the object.

Here, in a captured image of a space where congestion may occur, occlusion is more likely in regions of higher density, so the evaluation values there are considered less reliable. Conversely, occlusion is less likely in regions of lower density, so the evaluation values there are considered more reliable. Such a captured image can also contain many objects in which high-reliability and low-reliability parts are mixed, such as objects located on a boundary between densities.

Therefore, rather than evaluating the whole object uniformly, the object position determination means 51 refers to the density distribution and weights each part of the object according to the density, thereby improving the detection accuracy.
That is, the object position determination means 51 sets candidate positions at which individual objects may exist in the captured image and, with each candidate position as a reference, sets in the captured image partial regions corresponding to each of the plural parts that make up the object. For each partial region it calculates a partial evaluation value representing the degree to which the image features of the corresponding part appear. It then integrates the partial evaluation values of the partial regions set for a candidate position, giving more emphasis to partial regions of lower density, and determines a candidate position whose integrated evaluation value satisfies a predetermined criterion to be the position of an object. More specifically, the object position determination means 51 sets for each partial region a weighting coefficient that is higher the lower the density in that partial region and lower the higher the density, and calculates the integrated evaluation value as the sum of the partial evaluation values of the partial regions set for the candidate position, each weighted by the weighting coefficient of its partial region.
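The weighted integration described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function names `integrated_evaluation` and `is_object_position`, the argument layout, and the default threshold of 0.5 are hypothetical, and the weighting coefficients are assumed to have been derived from the density of each partial region beforehand.

```python
def integrated_evaluation(partial_values, weight_coefficients):
    """Sum of partial evaluation values, each weighted by its region's coefficient.

    Both arguments are hypothetical inputs: one entry per partial region that
    was set for a single candidate position.
    """
    if len(partial_values) != len(weight_coefficients):
        raise ValueError("need exactly one weighting coefficient per partial region")
    return sum(v * w for v, w in zip(partial_values, weight_coefficients))


def is_object_position(partial_values, weight_coefficients, threshold=0.5):
    """Accept the candidate position when the integrated value meets the criterion.

    The threshold value is an assumption chosen only for illustration.
    """
    return integrated_evaluation(partial_values, weight_coefficients) >= threshold
```

With coefficients of 0.75 for a low-density region and 0.25 for a high-density region, a strong response in the low-density region dominates the result, which is the behaviour the paragraph above describes.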

In addition, the higher the density of a region, the more likely severe concealment becomes, and evaluating the concealed parts as well increases the chance of erroneous evaluation and lowers the detection accuracy. For example, when the photographing unit 2 is installed so as to look down on the scene, concealment is more likely closer to the feet and less likely closer to the head. With this in mind, restricting the single-object feature to the human head in order to adapt to congestion reduces missed detections during congestion. However, a head-only feature also yields a relatively high integrated evaluation value for shoulders and similar shapes, so false detections increase when the scene is not crowded.

By referring to the density distribution, the object position determination means 51 resolves this trade-off between the number of parts evaluated and the detection accuracy for individual objects. That is, the higher the density at a candidate position, the fewer of the parts making up the object have partial regions set for them. In other words, the higher the density at the candidate position, the fewer parts of the object have their image features evaluated when the integrated evaluation value is calculated. For example, the object position determination means 51 evaluates the image features of the whole body if the estimated density at the candidate position is low, of the upper body if it is medium, and of the vicinity of the head if it is high.

The detection of individual objects and the single-object features are described below.

The single feature storage means 41 functions as the object model storage means 410a, which stores in advance information on an object model imitating the shape of a single person (object), and as the weight storage means 412a, which stores in advance information on the weights used in calculating the evaluation value. It stores the object model information and the weight information as the single-object features.

FIGS. 4 to 6 schematically show the single-object features stored in the single feature storage means 41. FIGS. 4 and 5 show an example of the object model information stored in the object model storage means 410a, and FIG. 6 shows an example of the weights (ratios of weighting coefficients) stored in the weight storage means 412a.

Specifically, the object model stored in the object model storage means 410a is a three-dimensional model 700 composed of three spheroids corresponding to the head, torso, and legs of a standing person. The center of gravity of the head is taken as the representative position of the person. Together with the three-dimensional model 700, the object model storage means 410a stores the evaluation range 702 for each density, and also stores the camera parameters 701 of the photographing unit 2, which are used to project the three-dimensional model 700 onto the coordinate system of the captured image. The camera parameters 701 include external parameters, such as the installation position and imaging direction of the photographing unit 2 in the actual monitored space, and internal parameters, such as the focal length, angle of view, lens distortion and other lens characteristics of the photographing unit 2 and the number of pixels of the image sensor.

The evaluation range 702 is divided into plural parts and set for each density; the higher the density, the fewer of the parts making up a single object it contains.
Specifically, the object model storage means 410a stores, in association with the value representing the low-density class, evaluation ranges representing six parts of a person: "upper 1/3, left 1/2", "upper 1/3, right 1/2", "middle 1/3, left 1/2", "middle 1/3, right 1/2", "lower 1/3, left 1/2", and "lower 1/3, right 1/2". Together these six parts make up the "whole" of the person.
The object model storage means 410a also stores, in association with the value representing the medium-density class, evaluation ranges representing four parts of a person: "upper 1/3, left 1/2", "upper 1/3, right 1/2", "middle 1/3, left 1/2", and "middle 1/3, right 1/2". Together these four parts make up the "upper 2/3" of the person.
The object model storage means 410a also stores, in association with the value representing the high-density class, evaluation ranges representing two parts of a person: "upper 1/3, left 1/2" and "upper 1/3, right 1/2". Together these two parts make up the "upper 1/3" of the person.
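The association between density classes and evaluation ranges listed above can be written as a small lookup table. The following is only a sketch: the names `PARTS`, `EVALUATION_RANGE` and `parts_for`, and the string labels for the parts and density classes, are hypothetical and chosen for readability.

```python
# The six parts of a person, ordered from the top of the body downwards.
PARTS = (
    "upper-left", "upper-right",    # upper 1/3
    "middle-left", "middle-right",  # middle 1/3
    "lower-left", "lower-right",    # lower 1/3
)

# Higher density -> fewer parts are evaluated.
EVALUATION_RANGE = {
    "low": PARTS,         # whole body (six parts)
    "medium": PARTS[:4],  # upper 2/3 (four parts)
    "high": PARTS[:2],    # upper 1/3 (two parts)
}


def parts_for(density_class):
    """Return the parts evaluated for a candidate position of the given density."""
    return EVALUATION_RANGE[density_class]
```

Because each range is a prefix of the previous one, the parts evaluated at a higher density are always a subset of those evaluated at a lower density.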

Hereinafter, each partial model is named after its evaluation range, in each case in combination with the three-dimensional model 700 and the camera parameters 701: the partial model 710 for the evaluation range "upper 1/3, left 1/2" is called the upper-left model, the partial model 711 for "upper 1/3, right 1/2" the upper-right model, the partial model 712 for "middle 1/3, left 1/2" the left-middle model, the partial model 713 for "middle 1/3, right 1/2" the right-middle model, the partial model 714 for "lower 1/3, left 1/2" the lower-left model, and the partial model 715 for "lower 1/3, right 1/2" the lower-right model.
Likewise, the object model 720 represented by the combination of the evaluation range "whole", the three-dimensional model 700, and the camera parameters 701 is called the whole-body model, the object model 721 for the evaluation range "upper 2/3" is called the upper-body model, and the object model 722 for the evaluation range "upper 1/3" is called the head-vicinity model.

Thus the object model storage means 410a stores the following as the object model information: in association with the low-density class, the whole-body model 720 consisting of the upper-left model 710, upper-right model 711, left-middle model 712, right-middle model 713, lower-left model 714, and lower-right model 715; in association with the medium-density class, the upper-body model 721 consisting of the upper-left model 710, upper-right model 711, left-middle model 712, and right-middle model 713; and in association with the high-density class, the head-vicinity model 722 consisting of the upper-left model 710 and upper-right model 711.
What is drawn as a partial model is the contour line representing the shape of the object (the solid-line portions in FIG. 5).

The weight is the degree to which each part of the object is emphasized according to the density of the partial region corresponding to that part, and is expressed as a relative ratio between densities. Parts with lower density are emphasized and parts with higher density are de-emphasized, so the weight is set higher for lower density and lower for higher density. For example, the weights for low, medium, and high density can be in the ratio 10:7:5. When determining the density of a partial region, the background class can be treated as the low-density class.

Thus the weight storage means 412a stores a weight for each density that is higher the lower the density and lower the higher the density.

The arrangement generation means 510a generates plural mutually different arrangements, each containing one or more candidate positions, and outputs each generated arrangement to the model image generation means 512a.

To do so, the arrangement generation means 510a uses random numbers to pick, from among the pixels of the captured image whose estimated density is low, medium, or high, a number of pixels (the arrangement count) that is at least one and at most an upper limit, and generates an arrangement by taking the position of each picked pixel as a candidate position. It repeats this generation a predetermined number of times for each arrangement count while increasing the arrangement count step by step, thereby generating plural mutually different arrangements. The upper limit of the arrangement count can be the maximum number of objects that can exist in the monitored space; for example, it can be calculated as the number of three-dimensional models of a standing person that can be placed without overlapping in a virtual space imitating the monitored space.
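The random generation of arrangements can be sketched as below. This is an illustrative reading rather than the implementation of the embodiment: the function name `generate_arrangements`, the representation of the density distribution as a 2-D list of class labels, and the choice to sample distinct pixels within one arrangement are all assumptions. The sketch also does not remove arrangements that happen to be drawn twice.

```python
import random


def generate_arrangements(density_map, max_count, repeats, rng=None):
    """Generate random arrangements of candidate positions.

    density_map: 2-D list of labels ("background", "low", "medium", "high").
    max_count:   upper limit of the arrangement count.
    repeats:     number of arrangements generated per arrangement count.
    """
    rng = rng or random.Random()
    # Only pixels whose estimated density is not background are eligible.
    pixels = [
        (x, y)
        for y, row in enumerate(density_map)
        for x, label in enumerate(row)
        if label != "background"
    ]
    arrangements = []
    # Increase the arrangement count step by step, repeating each count.
    for count in range(1, min(max_count, len(pixels)) + 1):
        for _ in range(repeats):
            arrangements.append(rng.sample(pixels, count))
    return arrangements
```

Passing a seeded `random.Random` instance makes the generated arrangements reproducible, which is convenient when comparing evaluation results between runs.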

For each of the arrangements input from the arrangement generation means 510a, the model image generation means 512a generates a model image by drawing, in the partial region corresponding to each of the plural parts referenced to each candidate position, a partial model imitating that part. In doing so, the higher the density at a candidate position, the fewer of the parts making up a single object are included in the object model drawn there. Each generated model image is output to the evaluation value calculation means 514a.

To do so, the model image generation means 512a reads the camera parameters from the object model storage means 410a and, for each arrangement, uses them to back-project each candidate position onto the horizontal plane at the height of the head center of gravity of the three-dimensional model (for example, 1.5 m). This gives the representative position, in the virtual space imitating the monitored space, of the three-dimensional model that projects onto that candidate position.
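The back-projection onto the head-height plane can be illustrated with a simple ray and plane intersection. The sketch below assumes an ideal pinhole camera without lens distortion, a world coordinate system whose Z axis is vertical, a rotation matrix `R` that maps world coordinates to camera coordinates, and a camera center `C` given in world coordinates. None of these conventions, nor the function name `back_project_to_plane`, come from the source, which only states that the camera parameters are used.

```python
def back_project_to_plane(u, v, fx, fy, cx, cy, R, C, height=1.5):
    """Intersect the viewing ray of pixel (u, v) with the plane Z = height.

    fx, fy, cx, cy: assumed pinhole intrinsics (focal lengths, principal point).
    R: 3x3 nested list, world-to-camera rotation (assumed convention).
    C: camera center in world coordinates.
    """
    # Direction of the viewing ray in camera coordinates.
    ray_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # R maps world to camera, so its transpose maps the ray back to world.
    ray_world = [sum(R[k][i] * ray_cam[k] for k in range(3)) for i in range(3)]
    if abs(ray_world[2]) < 1e-12:
        raise ValueError("viewing ray is parallel to the horizontal plane")
    scale = (height - C[2]) / ray_world[2]
    if scale <= 0:
        raise ValueError("plane is not in front of the camera along this ray")
    return tuple(C[i] + scale * ray_world[i] for i in range(3))
```

For a camera mounted 5 m above the floor and looking straight down, the pixel at the principal point maps to the point directly below the camera at a height of 1.5 m.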

The model image generation means 512a also reads the head-vicinity model from the object model storage means 410a, places it at the representative position in the virtual space corresponding to each candidate position, and projects it onto the coordinate system of the captured image using the camera parameters. Referring to the density distribution input from the density estimation means 50, it then tallies the estimated densities within the projection region of the head-vicinity model for each candidate position, and takes the most frequent estimated density (excluding the background class) as the density of that candidate position.
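Taking the most frequent estimated density inside a projected region amounts to a majority vote over per-pixel labels. The sketch below is an assumption-laden illustration: the function name `modal_density` and the string labels are hypothetical, and the source does not say how ties are resolved, so here a tie simply goes to the label encountered first.

```python
from collections import Counter


def modal_density(labels, background="background", background_as=None):
    """Return the most frequent density label among the given pixel labels.

    If background_as is None, background pixels are excluded from the vote,
    as described for candidate positions. Otherwise background pixels are
    counted as the given class (for example "low").
    """
    if background_as is None:
        votes = [label for label in labels if label != background]
    else:
        votes = [background_as if label == background else label for label in labels]
    if not votes:
        return None  # the region contains only background pixels
    return Counter(votes).most_common(1)[0][0]
```

The `background_as` option covers the variant of the same vote in which the background class is treated as the low-density class.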

The model image generation means 512a also reads, for each candidate position, the object model corresponding to the density of that candidate position from the object model storage means 410a. Specifically, it reads the whole-body model consisting of six partial models if the density of the candidate position is low, the upper-body model consisting of four partial models if it is medium, and the head-vicinity model consisting of two partial models if it is high. Then, for each arrangement, it places the object model read for each candidate position at the representative position in the virtual space corresponding to that candidate position, projects each of the partial models making up each object model onto the coordinate system of the captured image using the camera parameters, and draws the shape (contour line) of the object, thereby generating a model image for each arrangement.
The model image generation means 512a projects the object models in order, starting from the one placed at the representative position farthest from the photographing unit 2, and overwrites the projection regions as it goes, thereby generating a model image that expresses the concealment between object models.
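Drawing from far to near while overwriting is the classic painter's algorithm. The sketch below illustrates only the ordering and overwriting; it fills whole projection regions with a model identifier instead of drawing contour lines, and the function name `render_far_to_near` and the input format are hypothetical.

```python
def render_far_to_near(shape, models):
    """Render projected models so that nearer models hide farther ones.

    shape:  (height, width) of the model image.
    models: list of (distance_from_camera, model_id, pixels), where pixels
            is an iterable of (x, y) positions covered by the projection.
    Returns a 2-D list holding the visible model id per pixel (0 = empty).
    """
    height, width = shape
    image = [[0] * width for _ in range(height)]
    # Farthest model first; each later (nearer) model overwrites earlier ones.
    for _, model_id, pixels in sorted(models, key=lambda m: m[0], reverse=True):
        for x, y in pixels:
            if 0 <= x < width and 0 <= y < height:
                image[y][x] = model_id
    return image
```

Where two projections share a pixel, the identifier of the nearer model remains in the result, which is how concealment between models is expressed.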

The model image generation means 512a also calculates, for each arrangement, a degree of concealment representing how much the object models overlap one another in the model image, according to the following equation.
Degree of concealment = (area of the overlapping regions between object models) / (area of the union of the projection regions of the object models)  (1)
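Equation (1) can be evaluated directly on pixel sets. In the sketch below the overlapping area is interpreted as the number of pixels covered by two or more projection regions, which is an assumption about the intended definition; the function name `concealment_degree` and the set-based representation are likewise hypothetical.

```python
def concealment_degree(regions):
    """Overlap area divided by union area, following equation (1).

    regions: list of sets of (x, y) pixels, one set per projected object model.
    """
    coverage = {}
    for region in regions:
        for pixel in region:
            coverage[pixel] = coverage.get(pixel, 0) + 1
    union_area = len(coverage)
    if union_area == 0:
        return 0.0  # nothing was projected, so nothing is concealed
    # Pixels covered by at least two models form the overlapping area.
    overlap_area = sum(1 for count in coverage.values() if count >= 2)
    return overlap_area / union_area
```

The value is 0 when no models overlap and approaches 1 as the models are stacked on top of one another.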

The model image generation means 512a also calculates, for each model image, a weighting coefficient for each partial region in that model image according to the estimated density of the partial region.

To do so, the model image generation means 512a refers to the density distribution, tallies the estimated densities within each partial region, and takes the most frequent estimated density in each partial region as the density of that partial region. In this tally the background class is counted as the low-density class.
Next, the model image generation means 512a reads the weight (ratio of weighting coefficients) corresponding to the density of each partial region from the weight storage means 412a and, for each arrangement, obtains the sum of the weights of all partial regions.
Then, for each arrangement, the model image generation means 512a divides the weight of each partial region by that sum to obtain the weighting coefficient of the partial region. The weighting coefficients are thus normalized so that they sum to 1 within each arrangement.
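The normalization of the weights within one arrangement can be sketched with exact fractions. The ratio 10:7:5 is the example given in the description; the names `WEIGHT_RATIO` and `weight_coefficients` and the use of `fractions.Fraction` are choices made only for this illustration.

```python
from fractions import Fraction

# Example ratio from the description: low : medium : high = 10 : 7 : 5.
WEIGHT_RATIO = {"low": 10, "medium": 7, "high": 5}


def weight_coefficients(region_densities):
    """Return one normalized weighting coefficient per partial region.

    region_densities: density class of every partial region in one
    arrangement (background is assumed to have been mapped to "low").
    """
    total = sum(WEIGHT_RATIO[d] for d in region_densities)
    return [Fraction(WEIGHT_RATIO[d], total) for d in region_densities]
```

For an arrangement with 23 low-density, 6 medium-density, and 4 high-density partial regions, the sum of the ratios is 292, so the coefficients are 10/292, 7/292, and 5/292 respectively and add up to exactly 1.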

The model image generation means 512a outputs each arrangement, together with its model image, degree of concealment, and weighting coefficients, to the evaluation value calculation means 514a.

FIG. 7 schematically shows an example of the processing by the density estimation means 50, the arrangement generation means 510a, and the model image generation means 512a according to the first embodiment.
Image 740 visualizes the density distribution estimated by the density estimation means 50. In it, white areas are regions whose estimated density is background, horizontally hatched areas are regions of low density, diagonally hatched areas are regions of medium density, and cross-hatched areas are regions of high density.
Image 741 plots, with x marks in the coordinate system of the captured image, the eight candidate positions contained in an arrangement generated by the arrangement generation means 510a.
The three-dimensional model 742 illustrates how the model image generation means 512a places three-dimensional models at the representative positions in the virtual space corresponding to the eight candidate positions shown in image 741.
Image 743 shows the model image that the model image generation means 512a creates by identifying the density of each candidate position from the density distribution shown in image 740 and projecting, at each candidate position, the three-dimensional model restricted to the evaluation range corresponding to that density.

In the arrangement represented by the model image 743, one partial region (the left-middle part of the person at the right of the middle row of the group) is completely concealed, and 33 partial regions are drawn. Of these 33 partial regions, 23 have low density or background, 6 have medium density, and 4 have high density. The sum of the weight ratios over all partial regions of the arrangement is therefore 10 × 23 + 7 × 6 + 5 × 4 = 292. Among the 33 partial regions, partial region 744 has low density, partial region 745 has medium density, and partial region 746 has high density, so their weighting coefficients are 10/292, 7/292, and 5/292 respectively. The weighting coefficients of the other partial regions are calculated in the same way.

Image 750 in FIG. 8 schematically shows the weighting coefficients calculated by the model image generation means 512a for each partial region of the model image 743.

For each of the arrangements, the evaluation value calculation means 514a calculates an integrated evaluation value representing how similar the model image input from the model image generation means 512a is to the captured image, and outputs the integrated evaluation value of each arrangement to the optimum arrangement determination means 516a.

Specifically, the evaluation value calculation means 514a calculates the weighted similarity between each model image and the captured image according to the following equation.
Weighted similarity = weighted shape fit - W_Ha × degree of concealment  (2)
Here W_Ha is a weighting coefficient greater than 0 that is set in advance on the basis of prior experiments. The degree of concealment subtracted from the weighted shape fit is a penalty that suppresses excessive overlap of object models. Determining the optimum arrangement from a similarity that includes the degree of concealment in this way prevents erroneous detection of object positions caused by fitting more object models than there are actual objects.

The evaluation value calculation means 514a calculates the weighted shape fit by weighting the shape fit between the model image and the captured image in each partial region by the weighting coefficient of that partial region and summing the results. The shape fit of each partial region is the partial evaluation value. The shape fit can be an edge similarity. The evaluation value calculation means 514a extracts edges from each model image and from the captured image and, for each model image, calculates and sums the absolute differences between the edge at each pixel where a valid edge was extracted from the model image and the edge at the corresponding pixel of the captured image, divides the sum by the total of the number of pixels where a valid edge was extracted from the captured image and the number of pixels where a valid edge was extracted from the model image, and takes the sign-inverted result as the weighted shape fit.
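One possible reading of the edge-based shape fit is sketched below. The source does not state whether the normalization is applied per partial region or over the whole image, nor what makes an edge "valid"; this sketch assumes a per-region normalization and treats any edge strength above a small threshold `eps` as valid. The function names and the dictionary representation of edge maps are hypothetical.

```python
def region_shape_fit(model_edge, image_edge, region_pixels, eps=1e-6):
    """Negated, normalized sum of absolute edge differences in one region.

    model_edge, image_edge: dicts mapping (x, y) to an edge strength
    (missing pixels are treated as 0). eps is an assumed validity threshold.
    """
    diff_sum = 0.0
    n_model = n_image = 0
    for pixel in region_pixels:
        m = model_edge.get(pixel, 0.0)
        i = image_edge.get(pixel, 0.0)
        if i > eps:
            n_image += 1
        if m > eps:
            # Differences are accumulated only where the model has an edge.
            n_model += 1
            diff_sum += abs(m - i)
    denom = n_model + n_image
    return -diff_sum / denom if denom else 0.0


def weighted_shape_fit(model_edge, image_edge, regions, coefficients):
    """Sum of per-region shape fits weighted by the region coefficients."""
    return sum(
        w * region_shape_fit(model_edge, image_edge, region)
        for region, w in zip(regions, coefficients)
    )
```

A perfect match yields 0, and the value becomes more negative as the model edges and the image edges disagree.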

Alternatively, the evaluation value calculation means 514a can generate an edge image from each model image and from the captured image and, for each model image, perform chamfer matching between the edge image generated from the captured image and the edge image generated from that model image, invert the sign of the resulting distance, and divide it by the total of the number of pixels where a valid edge was extracted from the captured image and the number of pixels where a valid edge was extracted from the model image, using the result as the weighted shape fit of that model image.
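The chamfer-matching variant can be illustrated with a brute-force distance computation. The sketch assumes that the chamfer distance is the sum, over model edge pixels, of the distance to the nearest image edge pixel, which is one common definition and not necessarily the one used in the embodiment. A practical implementation would normally use a distance transform instead of the quadratic search shown here, and the function names are hypothetical.

```python
import math


def chamfer_distance(model_points, image_points):
    """Sum of nearest-neighbour distances from model edges to image edges."""
    if not model_points or not image_points:
        raise ValueError("both edge images must contain at least one edge pixel")
    return sum(
        min(math.dist(m, i) for i in image_points)
        for m in model_points
    )


def chamfer_shape_fit(model_points, image_points):
    """Sign-inverted chamfer distance divided by the total edge pixel count."""
    distance = chamfer_distance(model_points, image_points)
    return -distance / (len(model_points) + len(image_points))
```

As with the difference-based fit, a value of 0 means that every model edge pixel coincides with an image edge pixel.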

The optimum arrangement determination means 516a refers to the integrated evaluation value of each arrangement input from the evaluation value calculation means 514a, determines the candidate positions in the arrangement with the largest integrated evaluation value to be the positions of objects, and outputs information on the determined object positions to the object position output means 31. That is, the optimum arrangement determination means 516a determines each candidate position contained in the arrangement for which the largest similarity was calculated to be the position of a person captured in the image.
For example, so that the observer can easily see the result, the optimum arrangement determination means 516a generates and outputs the object position information by drawing an object model at each object position, color-coded according to the density at that position. Alternatively, the object position information can be the coordinate values of the object positions themselves, or the region of each drawn object model that does not overlap other object models. The object position information may also be data containing two or more of the items above.
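Equation (2) and the selection of the arrangement with the largest value can be sketched together. The function names `weighted_similarity` and `best_arrangement` are hypothetical, and the value of `w_ha` used in the usage note is arbitrary because the source only says that it is set from prior experiments.

```python
def weighted_similarity(weighted_shape_fit, concealment, w_ha):
    """Equation (2): weighted shape fit minus a concealment penalty."""
    if w_ha <= 0:
        raise ValueError("W_Ha must be greater than 0")
    return weighted_shape_fit - w_ha * concealment


def best_arrangement(arrangements, shape_fits, concealments, w_ha):
    """Return the arrangement with the largest weighted similarity and its score."""
    if not arrangements:
        raise ValueError("at least one arrangement is required")
    scores = [
        weighted_similarity(fit, concealment, w_ha)
        for fit, concealment in zip(shape_fits, concealments)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return arrangements[best], scores[best]
```

With shape fits of -0.2 and -0.1, degrees of concealment of 0.0 and 0.5, and `w_ha = 0.4`, the first arrangement wins even though its shape fit is lower, because the second one is penalized for overlapping models.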

The object position output means 31 sequentially outputs the object position information input from the object position determination means 51 to the display unit 6, and the display unit 6 displays the object position information input from the object position output means 31. For example, the object position information is transmitted and received via the Internet and displayed on the display unit 6. By viewing the displayed information, the observer identifies the points in the monitored space where congestion is occurring and responds by, for example, dispatching guards to those points or increasing their number.

<Operation of the image monitoring device 1 according to the first embodiment>
The operation of the image monitoring device 1 is described with reference to the flowcharts of FIGS. 9 to 11.

When the image monitoring device 1 starts operating, the photographing unit 2 installed at the event venue photographs the monitored space at predetermined intervals and sequentially transmits the captured images to the image analysis center where the image processing unit 5 is installed. Each time the image processing unit 5 receives a captured image, it repeats the operation according to the flowchart of FIG. 9.

First, the communication unit 3 operates as the image acquisition means 30 and waits to receive a captured image from the photographing unit 2. On acquiring a captured image, the image acquisition means 30 outputs it to the image processing unit 5 (step S1).

On receiving the captured image, the image processing unit 5 operates as the density estimation means 50 and estimates the density distribution from the captured image (step S2). The density estimation means 50 extracts an estimation feature at the position of each pixel of the captured image, reads the density estimator from the density estimator storage means 40 of the storage unit 4, and inputs each estimation feature to the density estimator to obtain the estimated density at each pixel of the captured image, thereby estimating the density distribution.

Having estimated the density distribution, the image processing unit 5 also operates as the object position determination means 51, which receives the captured image from the image acquisition means 30 and the density distribution from the density estimation means 50. The object position determination means 51 then checks whether the density distribution contains any estimated density other than the background class (step S3).

If an estimated density other than the background class is included (YES in step S3), the object position determination means 51 assumes that at least one person has been photographed and performs the process of determining the positions of individual objects from the captured image (step S4). If, on the other hand, only the background class is present (NO in step S3), it assumes that no person has been photographed and skips steps S4 and S5.
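The branch at step S3 can be sketched as follows. This is a minimal illustration, assuming the density distribution is a 2-D list of per-pixel class labels; the integer encoding of the classes and the names `contains_person` and `process_frame` are hypothetical and do not appear in the source.

```python
# Hypothetical integer labels; the source names the classes but not their encoding.
BACKGROUND, LOW, MEDIUM, HIGH = 0, 1, 2, 3

def contains_person(density_map):
    """Step S3: True if any pixel has an estimated density other than background."""
    return any(label != BACKGROUND for row in density_map for label in row)

def process_frame(density_map, determine_positions):
    """Run the position determination (step S4) only when step S3 answers YES.

    Returns None when only the background class is present, which corresponds
    to skipping steps S4 and S5.
    """
    if not contains_person(density_map):
        return None
    return determine_positions(density_map)
```

Here `determine_positions` stands in for the whole of step S4 and is passed as a callable so that the sketch stays self-contained.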

The object position determination process of step S4 will be described with reference to the flowcharts of FIGS. 10 and 11. In this process, the single feature storage means 41 operates as the object model storage means 410a and the weight storage means 412a, and the object position determination means 51 operates as the arrangement generation means 510a, the model image generation means 512a, the evaluation value calculation means 514a, and the optimum arrangement determination means 516a.

The arrangement generation means 510a sequentially sets the number of arrangements within the range from 1 up to the upper limit number (step S100) and controls the loop processing of steps S100 to S117.

The arrangement generation means 510a also prepares a variable T for counting the number of iterations, initializes T to 0 (step S101), and starts the iterative processing of steps S102 to S116.

Next, the arrangement generation means 510a randomly sets as many candidate positions as the number of arrangements set in step S100 within the regions where the estimated density is low, medium, or high in the density distribution input from the density estimation means 50. It thereby generates the T-th arrangement for that number of arrangements and outputs it to the model image generation means 512a (step S102).
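A minimal sketch of the random placement in step S102 is shown below. It assumes the density distribution is a 2-D list of class labels with a hypothetical `BACKGROUND` value, and it samples positions with replacement; the source does not say whether the candidate positions must be distinct, so that choice is an assumption.

```python
import random

BACKGROUND = 0  # hypothetical label for the background class

def generate_arrangement(density_map, count, rng):
    """Pick `count` random candidate positions (x, y) among non-background pixels.

    Sampling is with replacement (an assumption; the source is silent on this).
    Returns an empty list when every pixel is background.
    """
    eligible = [(x, y)
                for y, row in enumerate(density_map)
                for x, label in enumerate(row)
                if label != BACKGROUND]
    if not eligible:
        return []
    return [rng.choice(eligible) for _ in range(count)]
```

Passing the random generator in as `rng` (for example `random.Random(0)`) keeps a run reproducible, which helps when comparing evaluation values across arrangements.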

The model image generation means 512a reads the camera parameters from the object model storage means 410a and uses them to convert each candidate position included in the arrangement generated in step S102 into three-dimensional coordinates in the virtual space (step S103).

Next, the model image generation means 512a prepares and initializes a model image of the same size as the captured image, calculates the distance from the three-dimensional coordinates of each candidate position to the photographing unit 2, and sets the candidate positions as the processing target one at a time in order from the farthest (step S104), executing the loop processing of steps S104 to S110.
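The far-to-near ordering of step S104 can be sketched as below. Processing the farthest position first means that nearer models, projected later, overwrite farther ones, which presumably is how the overwrite projection reproduces occlusion. The function name and the tuple representation of coordinates are assumptions for illustration.

```python
import math

def order_far_to_near(positions_3d, camera_position):
    """Sort candidate positions by Euclidean distance to the camera, farthest first."""
    return sorted(positions_3d,
                  key=lambda p: math.dist(p, camera_position),
                  reverse=True)
```

`math.dist` requires Python 3.8 or later.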

Subsequently, the model image generation means 512a refers to the density distribution and identifies the density at the candidate position being processed (step S105). Specifically, it reads the head vicinity model from the object model storage means 410a, places it at the three-dimensional coordinates of the candidate position, projects it onto the coordinate system of the captured image using the camera parameters, and identifies the most frequent estimated density within the projection region (excluding the background class) as the density of the candidate position.
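Taking the most frequent non-background density inside a projected region, as in step S105, can be sketched as follows. The label encoding and the representation of the region as a list of pixel coordinates are assumptions; the source also does not state how ties are broken, so this sketch simply keeps the label that `Counter` encounters first.

```python
from collections import Counter

BACKGROUND, LOW, MEDIUM, HIGH = 0, 1, 2, 3  # hypothetical labels

def dominant_density(density_map, region):
    """Most frequent non-background density label among the pixels in `region`.

    `region` is an iterable of (x, y) pixels. Returns None when every pixel
    in the region is background.
    """
    counts = Counter(density_map[y][x] for x, y in region
                     if density_map[y][x] != BACKGROUND)
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```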

Subsequently, the model image generation means 512a reads the object model corresponding to the density identified in step S105 from the object model storage means 410a (step S106), places it at the three-dimensional coordinates of the candidate position being processed, and uses the camera parameters to project the placed object model onto the model image, overwriting any earlier projection (step S107). At this time, the model image generation means 512a also records the projected area of the object model.
The object model is composed of a plurality of partial models, and the object contours in these partial models are overwritten and projected at positions in the model image defined relative to the candidate position. The projection region of each part's partial model is the partial region of that part. The model image generation means 512a assigns an identification number to each combination of candidate position and part, and sets, as the pixel values of the partial region corresponding to that part, the identification number of the projected part together with the intensity value of the object contour in the partial model.

Subsequently, the model image generation means 512a refers to the density distribution and identifies the density of each partial region (step S108). It identifies the most frequent estimated density within each partial region as the density of that partial region, where the background class is treated as the low density class.

Subsequently, the model image generation means 512a refers to the weight storage means 412a and sets the weight (the ratio of the weight coefficients) corresponding to the density of each partial region (step S109). That is, it records the weight of each partial region together with the identification number of the corresponding part. At this time, the weight of any partial region that has been completely hidden on the model image by the overwrite projection in step S107 is deleted from the record.

The model image generation means 512a then checks whether all candidate positions included in the T-th arrangement for the current number of arrangements have been processed (step S110). If an unprocessed candidate position remains (NO in step S110), the process returns to step S104 and the next candidate position is processed.

If, on the other hand, all candidate positions have been processed (YES in step S110), the model image for the T-th arrangement for the current number of arrangements is complete. The model image generation means 512a then calculates the degree of concealment of the object models in that model image (step S111). Specifically, it obtains the area of the projection region on the model image, which is the "area of the union of the model projection regions"; sums the projected areas of the individual object models recorded in step S107; subtracts the union area from this sum to obtain the "area of the overlapping regions between models"; and substitutes these into equation (1) to calculate the degree of concealment.
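The two areas used in step S111 can be computed as in the sketch below, which represents each object model's projection as a set of pixel coordinates (an assumption made for illustration). Equation (1) is not reproduced in this part of the description, so the sketch stops at the combined (union) area and the overlap area and does not attempt the degree of concealment itself.

```python
def projection_areas(masks):
    """Return (union_area, overlap_area) for per-model sets of projected pixels.

    union_area   : number of pixels covered by at least one model
    overlap_area : sum of the individual projected areas minus union_area
    """
    total = sum(len(mask) for mask in masks)   # sum of the areas recorded per model
    union = len(set().union(*masks))           # area of the combined projection
    return union, total - union
```

Note that a pixel covered by three models contributes 2 to the overlap area under this definition, which follows directly from "sum minus union".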

The model image generation means 512a also sums the weights of the partial regions recorded in step S109 and divides the weight of each partial region by this sum to calculate the weight coefficient of each partial region (step S112).
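The normalization in step S112 amounts to dividing each recorded weight by the total, as the following sketch shows. The dictionary keyed by part identification number and the example weight values are hypothetical.

```python
def weight_coefficients(weights):
    """Divide each recorded partial-region weight by the sum of all recorded weights.

    `weights` maps a part identification number to its weight. Fully hidden
    partial regions are expected to have been removed beforehand.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one partial region with positive weight is required")
    return {part_id: w / total for part_id, w in weights.items()}
```

After this step the coefficients sum to 1, so the weighted sum computed later stays on the same scale regardless of how many partial regions remain visible.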

Having calculated the weight coefficients, the model image generation means 512a outputs the model image, the degree of concealment, and the weight coefficient of each partial region to the evaluation value calculation means 514a.

The evaluation value calculation means 514a, having received the model image, the degree of concealment, and the weight coefficients, calculates the shape conformity between the model image and the captured image for each partial region as the partial evaluation value of that partial region (step S113). It then calculates, from the degree of concealment and the weighted shape conformity obtained by summing the products of each partial region's shape conformity and weight coefficient, the weighted similarity between the model image and the captured image as the integrated evaluation value for the T-th arrangement for the current number of arrangements (step S114). Specifically, the evaluation value calculation means 514a generates an edge image from each of the model image input from the model image generation means 512a and the captured image, and calculates the weighted shape conformity as the sum of products of the per-region similarity of these edge images and the per-region weight coefficients. It then substitutes the weighted shape conformity and the degree of concealment into equation (2) to calculate the weighted similarity.
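The sum of products that yields the weighted shape conformity can be sketched as below. The per-region edge similarities are taken as given inputs, since the source does not specify the similarity measure here, and equation (2), which combines the result with the degree of concealment, is not reproduced in this part of the description. The function and parameter names are hypothetical.

```python
def weighted_shape_conformity(similarities, coefficients):
    """Sum over partial regions of (edge similarity x weight coefficient).

    Both arguments map a part identification number to a value. Only regions
    that have a weight coefficient contribute, so fully hidden regions
    (whose weights were deleted) are ignored.
    """
    return sum(similarities[part_id] * coefficients[part_id]
               for part_id in coefficients)
```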

When the weighted similarity for the T-th arrangement for the current number of arrangements has been calculated, the evaluation value calculation means 514a records the arrangement in association with its weighted similarity, and the arrangement generation means 510a increments the iteration count T by 1 (step S115) and compares it with the specified number T_MAX (step S116). If T is less than T_MAX (NO in step S116), the process returns to step S102 and the iterative processing for the current number of arrangements continues.

When the iteration count T has reached the specified number T_MAX (YES in step S116), the arrangement generation means 510a ends the iterative processing for the current number of arrangements and checks whether all numbers of arrangements have been set (step S117). If any number of arrangements has not yet been set (NO in step S117), the process returns to step S100 and the next number of arrangements is processed.

If, on the other hand, all numbers of arrangements have been set (YES in step S117), the evaluation value calculation means 514a inputs the arrangements and weighted similarities recorded in step S115 to the optimum arrangement determination means 516a. The optimum arrangement determination means 516a identifies the arrangement with the maximum weighted similarity among them (step S118) and determines that this arrangement represents the positions of the individual persons captured in the captured image.
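The control flow of steps S100 to S118 can be summarized by the sketch below. It is a simplification under stated assumptions: `generate` and `evaluate` are hypothetical callables standing in for step S102 and steps S103 to S114, and the maximum is tracked inside the loop rather than recorded for every arrangement and selected afterwards as the source describes. Both approaches return the same arrangement when the first maximum is kept on ties.

```python
def find_best_arrangement(max_count, t_max, generate, evaluate):
    """Try t_max random arrangements for each count 1..max_count; keep the best.

    Returns (best_arrangement, best_score).
    """
    best_arrangement, best_score = None, float("-inf")
    for count in range(1, max_count + 1):      # steps S100 and S117
        for _ in range(t_max):                 # steps S101, S115 and S116
            arrangement = generate(count)      # step S102
            score = evaluate(arrangement)      # steps S103 to S114
            if score > best_score:             # step S118, folded into the loop
                best_arrangement, best_score = arrangement, score
    return best_arrangement, best_score
```

The total number of evaluated arrangements is `max_count * t_max`, which is the main cost driver of this search.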

The description continues with reference to FIG. 9 again. The object position determination means 51 outputs the information on the positions of the individual persons (object positions) determined in step S4 to the communication unit 3 (step S5). The communication unit 3, having received the object position information, operates as the object position output means 31 and transmits the object position information to the display unit 6.

When the above processing is complete, the process returns to step S1 and the next captured image is processed.

[Second Embodiment]
Hereinafter, as a preferred embodiment of the present invention different from the first embodiment, an example of the image monitoring device 1 will be described that includes an object detection device which detects individual persons using classifiers trained on the image features of a single person, and in particular using partial classifiers each trained on the image features of one of the plurality of parts that make up a single person.

The image monitoring device according to the second embodiment differs from that of the first embodiment in the details of the single features stored in the single feature storage means 41 and in the details of the processing performed by the object position determination means 51; the schematic configuration, the schematic functions, and part of the operation are the same. The schematic configuration, the schematic functions, and the common part of the operation are therefore described by referring again to the block diagram of FIG. 1, the functional block diagram of FIG. 2, and the flowchart of FIG. 9, respectively, which were referred to in the first embodiment.

<Configuration of the image monitoring device 1 according to the second embodiment>
The schematic configuration of the image monitoring device 1 according to the second embodiment will be described with reference to the block diagram of FIG. 1.
As in the first embodiment, the image monitoring device 1 comprises a photographing unit 2 that photographs the monitored space at predetermined time intervals and outputs captured images, a display unit 6 that receives object position information and displays it, and an image processing unit 5 that acquires a captured image, detects individual persons (objects) from it, and generates and outputs information on the positions of the detected objects (object positions). These are connected to a communication unit 3 that mediates the input and output of captured images, object position information, and the like. A storage unit 4 that stores programs and various data and inputs and outputs them is connected to the image processing unit 5.

<Functions of the image monitoring device 1 according to the second embodiment>
The functions of the image monitoring device 1 according to the second embodiment will be described with reference to the functional block diagrams of FIGS. 2 and 12.

As in the first embodiment, the communication unit 3 includes functions such as the image acquisition means 30, which acquires the captured image from the photographing unit 2 and outputs it to the density estimation means 50 and the object position determination means 51, and the object position output means 31, which outputs the object position information input from the object position determination means 51 to the display unit 6.

As in the first embodiment, the storage unit 4 includes functions such as the density estimator storage means 40, which stores a density estimator trained, for each predetermined density, on the image features of density images obtained by photographing a space in which objects exist at that density, and the single feature storage means 41, which stores the image features of a single object (single features) generated by prior learning. The single features stored in the single feature storage means 41 consist of information on the image features of each part making up an object and information on the weights for the densities.

As in the first embodiment, the image processing unit 5 includes functions such as the density estimation means 50 and the object position determination means 51. The density estimation means 50 estimates the distribution of the density of objects captured in the captured image by scanning the captured image with the density estimator, and outputs the estimated density distribution to the object position determination means 51. The object position determination means 51 sets candidate positions in the captured image at which individual objects may exist, calculates an evaluation value (integrated evaluation value) representing the degree to which the image features of a single object appear in the captured image at each candidate position, determines candidate positions whose integrated evaluation value satisfies the determination criterion to be object positions, and outputs the object position information to the object position output means 31. To calculate the integrated evaluation value, the object position determination means 51 sets, relative to each candidate position, partial regions in the captured image corresponding to the respective parts making up an object, calculates for each partial region a partial evaluation value representing the degree to which the image features of the corresponding part appear, and integrates the partial evaluation values of the partial regions set for the candidate position, giving greater emphasis to partial regions of lower density.

As noted above, however, the details of the processing performed by the object position determination means 51 and the details of the single features stored in the single feature storage means 41 differ from those of the image monitoring device 1 according to the first embodiment. These points will be described with reference to the functional block diagram of FIG. 12.

The single feature storage means 41 according to the second embodiment functions as the partial classifier storage means 411b, which stores in advance a classifier for each part (partial classifier) trained on the image features of the respective parts making up a single person (object), and as the weight storage means 412b, which stores in advance the weight information used in calculating the evaluation values. It stores the partial classifier information and the weight information as the single features.

FIG. 13 schematically shows the single features stored in the single feature storage means 41 according to the second embodiment, that is, the partial classifier information stored in the partial classifier storage means 411b and the weight information stored in the weight storage means 412b.

Each partial classifier for a part of the object is represented by, for example, the coefficients of a score calculation function that, given the feature amount of an image, calculates and outputs a partial score (partial evaluation value) representing the likelihood that the image is an image of that part (partial image). The partial classifier storage means 411b stores parameters such as the coefficients representing each partial classifier, the relative position of each part from a reference position, and the correspondence between each density and the partial classifiers to be used. The reference position is, for example, the centroid of the head.

The partial classifiers can be, for example, classifiers trained by applying the linear SVM method to the feature amounts of training images consisting of partial images of many persons and many images containing no person. When a linear SVM is used as the learning algorithm, the coefficients of the score calculation function form a weight vector. This weight vector holds a weight for each element of the feature amount, and the inner product of the feature amount of the input image and the weight vector gives the partial score. Training adjusts the weight vector so that an image is classified as a partial image of a person when the inner product of the weight vector and the feature amount is 0 or more, and as not a partial image of a person when it is less than 0. The threshold applied to the integrated score to decide whether the input image is an image of a single person is therefore 0 in principle, and can normally be set to 0. However, the threshold may be set to a value smaller than 0 in order to reduce errors in which an image of a single person is classified as not being one.
The feature amount of the training images can be, for example, the HOG (Histograms of Oriented Gradients) feature amount.
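The score calculation of a linear partial classifier, as described above, reduces to an inner product. The sketch below illustrates this with plain Python lists; the function names are hypothetical, the feature vector is taken as already extracted, and no bias term is included because the description mentions only a weight vector.

```python
def partial_score(weight_vector, features):
    """Inner product of the learned weight vector and the feature vector."""
    if len(weight_vector) != len(features):
        raise ValueError("weight vector and feature vector must have equal length")
    return sum(w * f for w, f in zip(weight_vector, features))

def looks_like_part(score):
    """Training convention from the description: a score of 0 or more means 'is the part'."""
    return score >= 0
```

The separate threshold on the integrated score, which may be set below 0 to reduce missed detections, is applied after the partial scores have been combined and is not part of this sketch.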

Specifically, the partial classifier storage means 411b stores, in association with a value representing the low density class, the partial classifiers 800 to 805 trained on the image features of six parts of a person: "upper 1/3, left 1/2", "upper 1/3, right 1/2", "middle 1/3, left 1/2", "middle 1/3, right 1/2", "lower 1/3, left 1/2", and "lower 1/3, right 1/2". Together these six parts form the "whole" of a person.
Hereinafter, "upper 1/3, left 1/2" is referred to as the upper-left part, "upper 1/3, right 1/2" as the upper-right part, "middle 1/3, left 1/2" as the middle-left part, "middle 1/3, right 1/2" as the middle-right part, "lower 1/3, left 1/2" as the lower-left part, and "lower 1/3, right 1/2" as the lower-right part. Likewise, the partial classifier 800 for the upper-left part is referred to as the upper-left classifier, the partial classifier 801 for the upper-right part as the upper-right classifier, the partial classifier 802 for the middle-left part as the middle-left classifier, the partial classifier 803 for the middle-right part as the middle-right classifier, the partial classifier 804 for the lower-left part as the lower-left classifier, and the partial classifier 805 for the lower-right part as the lower-right classifier. The set 810 of the six partial classifiers 800 to 805, whose combined evaluation range is the "whole", is referred to as the whole-body classifier.
The partial classifier storage means 411b also stores, in association with a value representing the medium density class, the upper-left classifier 800, the upper-right classifier 801, the middle-left classifier 802, and the middle-right classifier 803. The combined evaluation range of these four partial classifiers is the "upper 2/3" of a person. The set 811 of the four partial classifiers 800 to 803, whose combined evaluation range is the "upper 2/3", is referred to as the upper-body classifier.
The partial classifier storage means 411b further stores, in association with a value representing the high density class, the upper-left classifier 800 and the upper-right classifier 801. The combined evaluation range of these two partial classifiers is the "upper 1/3" of a person. The set 812 of the two partial classifiers 800 and 801, whose combined evaluation range is the "upper 1/3", is referred to as the head vicinity classifier.

In this way, the partial classifier storage means 411b stores, together with the threshold applied to the integrated evaluation value and other parameters, the whole-body classifier 810 consisting of the upper-left classifier 800, the upper-right classifier 801, the middle-left classifier 802, the middle-right classifier 803, the lower-left classifier 804, and the lower-right classifier 805 in association with the low density class; the upper-body classifier 811 consisting of the upper-left classifier 800, the upper-right classifier 801, the middle-left classifier 802, and the middle-right classifier 803 in association with the medium density class; and the head vicinity classifier 812 consisting of the upper-left classifier 800 and the upper-right classifier 801 in association with the high density class.
Although this example shares the partial classifier for the same part across densities, the division into parts may instead differ from one density to another.
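The association between density classes and classifier sets described above can be written down as a small lookup table. In the sketch below the numbers are the reference numerals from FIG. 13, used purely as identifiers; the string keys and the function name are hypothetical.

```python
# Reference numerals follow FIG. 13; the string keys are hypothetical labels.
CLASSIFIER_SETS = {
    "low": (800, 801, 802, 803, 804, 805),   # set 810: whole person
    "medium": (800, 801, 802, 803),          # set 811: upper 2/3
    "high": (800, 801),                      # set 812: upper 1/3
}

def classifiers_for(density):
    """Return the partial classifiers to apply at a candidate position of this density."""
    return CLASSIFIER_SETS[density]
```

Because each higher-density set is a subset of the lower-density one, a shared partial classifier such as 800 needs to be stored only once, which matches the sharing noted in the description.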

The weight 820 is the degree of emphasis placed on each part of an object according to the density of the partial region corresponding to that part, and is expressed as a relative ratio between densities. Because parts of lower density are emphasized and parts of higher density are de-emphasized, the weight 820 is set higher for lower densities and lower for higher densities. For example, the weights 820 for low, medium, and high density can be in the ratio 10:7:5. When determining the density of a partial region, the background class can be treated as the low density class.

In this way, the weight storage means 412b stores a weight for each density that is higher for lower densities and lower for higher densities.

The candidate position setting means 511b sets a plurality of candidate positions in the captured image at a predetermined interval and outputs them to the evaluation value calculation means 514b. Specifically, the predetermined interval is one pixel, and the candidate position setting means 511b sets the position of each pixel of the captured image as a candidate position in turn. A candidate position represents the centroid of a person's head.

The evaluation value calculation means 514b sets, relative to each candidate position input from the candidate position setting means 511b, partial regions in the captured image corresponding to the respective parts making up a single object, and inputs the image features of each partial region to the partial classifier trained on the image features of the corresponding part to calculate the partial evaluation value of that partial region. For each candidate position, it integrates the partial evaluation values of the partial regions set for that candidate position, giving greater emphasis to partial regions of lower density, to calculate the integrated evaluation value, and outputs the calculated integrated evaluation value and its accompanying information to the position determination means 517b.
In doing so, the evaluation value calculation means 514b sets at each candidate position partial regions corresponding to fewer of the parts making up a single object as the density at that candidate position increases.

To this end, the evaluation value calculation means 514b sets a window with the shape of the upper 1/3 of a person at each candidate position and, referring to the density distribution input from the density estimation means 50, tallies the estimated densities within that window, excluding the background class. It then takes the most frequent estimated density in the window as the density of that candidate position.
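The majority vote described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the density distribution is taken to be a 2D list of per-pixel class labels, the label strings and the function name `candidate_density` are hypothetical, and the window is passed in as `(top, left, height, width)` rather than derived from the head position. The description does not say how ties are broken; here the class encountered first wins.

```python
from collections import Counter

BACKGROUND = "background"  # hypothetical label for the background class


def candidate_density(density_map, window):
    """Return the most frequent non-background class inside the window.

    density_map: 2D list of class labels, one per pixel (assumed layout).
    window: (top, left, height, width) of the upper-1/3 window.
    Returns None when the window contains only background pixels.
    """
    top, left, height, width = window
    rows = range(max(top, 0), min(top + height, len(density_map)))
    cols = range(max(left, 0), min(left + width, len(density_map[0])))
    counts = Counter(
        density_map[y][x]
        for y in rows
        for x in cols
        if density_map[y][x] != BACKGROUND  # background is excluded from the tally
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

A caller would slide this over every candidate position and skip positions for which it returns `None`.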

The evaluation value calculation means 514b also reads, from the partial classifier storage means 411b, the information on the partial classifiers that correspond to the density of each candidate position, sets at each candidate position the windows (partial regions) corresponding to the partial classifiers associated with that density, and extracts an identification feature amount from the captured image within each partial region. Each partial region is a window that has the shape of the partial image used to train the partial classifier for that part (the solid rectangles shown in FIG. 13), enlarged or reduced by a plurality of predetermined magnifications. For a low-density candidate position, six partial regions are set that together cover the whole person; for a medium-density candidate position, four partial regions that together cover the upper 2/3 of the person; and for a high-density candidate position, two partial regions that together cover the upper 1/3 of the person. The identification feature amount is of the same type as the feature amount of the training images, namely the HOG feature amount.

また、評価値算出手段５１４ｂは、各部分の識別用特徴量を当該部分の部分識別器に入力してその出力値である部分スコアを部分評価値として取得する。 Further, the evaluation value calculation means 514b inputs the identification feature amount of each part into the partial classifier of the part and acquires the partial score which is the output value as the partial evaluation value.

また、評価値算出手段５１４ｂは、各部分領域について当該部分領域の推定密度に応じた重み係数を算出する。 In addition, the evaluation value calculation means 514b calculates a weighting coefficient for each subregion according to the estimated density of the subregion.

To this end, the evaluation value calculation means 514b tallies the estimated densities within each partial region with reference to the density distribution, and takes the most frequent estimated density in each partial region as the density of that partial region. In this tally the background class is counted as the low-density class.
Next, the evaluation value calculation means 514b reads from the weight storage means 412b the weight (weighting coefficient ratio) corresponding to the density of each partial region and, for each candidate position, computes the sum of the weights of the partial regions set with reference to that candidate position.
Then, for each candidate position, the evaluation value calculation means 514b divides the weight of each partial region by the sum of the weights of all the partial regions to obtain the weighting coefficient of that partial region. In other words, the weighting coefficients are normalized so that they sum to 1 at each candidate position.
In this way, the evaluation value calculation means 514b sets, for each partial region, a weighting coefficient that is higher the lower the density in that partial region and lower the higher the density in that partial region.
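The per-region density decision and the normalization above can be sketched in Python as follows. The weight ratios 10, 7 and 5 are the ones used in the worked example of this description; the label strings, the part keys and the function names are assumptions made for illustration. `Fraction` is used only so that the normalized coefficients stay exact.

```python
from collections import Counter
from fractions import Fraction

# Weighting coefficient ratios per density class (values from the worked example).
WEIGHT_RATIO = {"low": 10, "medium": 7, "high": 5}


def region_density(labels):
    """Most frequent class in a partial region, counting background as low."""
    counts = Counter("low" if c == "background" else c for c in labels)
    return counts.most_common(1)[0][0]


def weight_coefficients(region_densities):
    """Map {part: density class} to {part: normalized weighting coefficient}."""
    ratios = {part: WEIGHT_RATIO[d] for part, d in region_densities.items()}
    total = sum(ratios.values())
    # Dividing by the total makes the coefficients sum to 1 per candidate position.
    return {part: Fraction(r, total) for part, r in ratios.items()}
```

For a medium-density candidate whose upper two regions are medium and whose middle two regions are low, `weight_coefficients` gives 7/34 for the upper parts and 10/34 for the middle parts.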

また、評価値算出手段５１４ｂは、候補位置ごとに、候補位置を基準に設定した各部分領域の部分評価値を当該部分領域の重み係数にて重み付けて総和して統合評価値を算出する。 Further, the evaluation value calculation means 514b calculates the integrated evaluation value by weighting the partial evaluation values of each sub-region set based on the candidate position with the weighting coefficient of the sub-region for each candidate position and summing them up.

That is, let $S_{UL}$, $S_{UR}$, $S_{ML}$, $S_{MR}$, $S_{LL}$ and $S_{LR}$ be the partial scores given by the upper-left, upper-right, middle-left, middle-right, lower-left and lower-right classifiers, and let $W_{UL}$, $W_{UR}$, $W_{ML}$, $W_{MR}$, $W_{LL}$ and $W_{LR}$ be the weighting coefficients of the upper-left, upper-right, middle-left, middle-right, lower-left and lower-right parts. The evaluation value calculation means 514b then calculates the integrated score as follows.

If the density of the candidate position of interest is low, the evaluation value calculation means 514b calculates the integrated score of that candidate position by the following equation.
$$\text{Integrated score} = W_{UL} S_{UL} + W_{UR} S_{UR} + W_{ML} S_{ML} + W_{MR} S_{MR} + W_{LL} S_{LL} + W_{LR} S_{LR} \tag{3}$$
If the density of the candidate position of interest is medium, the evaluation value calculation means 514b calculates the integrated score of that candidate position by the following equation.
$$\text{Integrated score} = W_{UL} S_{UL} + W_{UR} S_{UR} + W_{ML} S_{ML} + W_{MR} S_{MR} \tag{4}$$
If the density of the candidate position of interest is high, the evaluation value calculation means 514b calculates the integrated score of that candidate position by the following equation.
$$\text{Integrated score} = W_{UL} S_{UL} + W_{UR} S_{UR} \tag{5}$$
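Equations (3) to (5) differ only in which parts enter the weighted sum. A minimal Python sketch, assuming the partial scores and the normalized weighting coefficients are already available as dictionaries keyed by the hypothetical part labels `UL`, `UR`, `ML`, `MR`, `LL` and `LR` (mirroring the subscripts in the equations); the function and table names are illustrative, not taken from the description.

```python
# Parts that enter the sum for each candidate-position density.
PARTS_BY_DENSITY = {
    "low": ("UL", "UR", "ML", "MR", "LL", "LR"),  # equation (3)
    "medium": ("UL", "UR", "ML", "MR"),           # equation (4)
    "high": ("UL", "UR"),                         # equation (5)
}


def integrated_score(density, scores, weights):
    """Weighted sum of partial scores over the parts used for this density.

    scores, weights: dicts keyed by part label; entries for unused parts
    are ignored.
    """
    parts = PARTS_BY_DENSITY[density]
    return sum(weights[p] * scores[p] for p in parts)
```

Keeping the part selection in one table means the three equations share a single code path, so adding a finer density class would only require one more table entry.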

図１４は、図７で例示した密度分布が得られている場合に、図７で例示した各候補位置について評価値算出手段５１４ｂが統合スコアを算出する様子を模式的に示した図である。画像８３０は、これらの候補位置のうち密度が低密度である３つの候補位置について、各部分と重み係数の関係を示している。画像８３１は、密度が中密度である３つの候補位置について、各部分と重み係数の関係を示している。画像８３２は、密度が高密度である２つの候補位置について、各部分と重み係数の関係を示している。 FIG. 14 is a diagram schematically showing how the evaluation value calculation means 514b calculates the integrated score for each candidate position illustrated in FIG. 7 when the density distribution illustrated in FIG. 7 is obtained. Image 830 shows the relationship between each part and the weighting coefficient for three candidate positions having a low density among these candidate positions. Image 831 shows the relationship between each part and the weighting factor for the three candidate positions having medium densities. Image 832 shows the relationship between each part and the weighting factor for the two candidate positions with high densities.

For example, the density of candidate position 840 is low, so six partial regions are set with reference to candidate position 840. All six partial regions have low density, so the weighting coefficient ratio of each partial region is 10 and their sum is 60. The weighting coefficients are therefore $W_{UL} = W_{UR} = W_{ML} = W_{MR} = W_{LL} = W_{LR} = 10/60$, and the integrated score is calculated by substituting these weighting coefficients and the partial score of each part into equation (3).
Likewise, the density of candidate position 841 is medium, so four partial regions are set with reference to candidate position 841. The upper two of the four partial regions have medium density, giving a weighting coefficient ratio of 7, and the lower two have low density, giving a weighting coefficient ratio of 10, so the sum is 34. The weighting coefficients are therefore $W_{UL} = W_{UR} = 7/34$ and $W_{ML} = W_{MR} = 10/34$, and the integrated score is calculated by substituting these weighting coefficients and the partial score of each part into equation (4).
Likewise, the density of candidate position 842 is high, so two partial regions are set with reference to candidate position 842. Both partial regions have high density, so the weighting coefficient ratio of each partial region is 5 and their sum is 10. The weighting coefficients are therefore $W_{UL} = W_{UR} = 5/10$, and the integrated score is calculated by substituting these weighting coefficients and the partial score of each part into equation (5).
The integrated score is calculated in the same manner for the other candidate positions.

そして、評価値算出手段５１４ｂは、候補位置ごとに、候補位置、候補位置の密度、統合スコアおよび使用した部分領域の和領域（統合窓）を対応付けた情報を位置決定手段５１７ｂに出力する。 Then, the evaluation value calculation means 514b outputs information in which the candidate position, the density of the candidate positions, the integrated score, and the sum region (integrated window) of the used partial regions are associated with each candidate position to the position determining means 517b.

位置決定手段５１７ｂは、評価値算出手段５１４ｂから入力された情報を参照し、予め定めた判定基準を満たす統合評価値が算出された候補位置を物体の位置と決定する。 The position determining means 517b refers to the information input from the evaluation value calculating means 514b, and determines the candidate position where the integrated evaluation value satisfying the predetermined determination criteria is calculated as the position of the object.

具体的には、位置決定手段５１７ｂは、統合スコアが予め定めた閾値（例えば０）以上である候補位置を抽出し、抽出した候補位置のうち対応する密度が同一であり且つ互いに近接する複数の候補位置（統合窓同士の重複が予め定めた割合より大きな候補位置）を一つにまとめ、まとめた候補位置を人が撮影されている位置と決定する。 Specifically, the position-determining means 517b extracts candidate positions whose integrated score is equal to or higher than a predetermined threshold value (for example, 0), and among the extracted candidate positions, a plurality of extracted candidate positions having the same density and close to each other. The candidate positions (candidate positions where the overlap between the integrated windows is larger than the predetermined ratio) are combined into one, and the combined candidate positions are determined as the positions where the person is photographed.

This merging of candidate positions is performed because a high integrated score is calculated for the same person not only at the position where the person is actually captured but also in its vicinity. Specifically, for example, for each candidate position density, the position determining means 517b takes the candidate positions whose integrated score is at or above the threshold as the position of interest one at a time, in descending order of integrated score, and sets the candidate positions with a lower integrated score than the position of interest as comparison positions. It then deletes the information of every comparison position whose integrated window overlaps the integrated window of the position of interest by more than a predetermined ratio, thereby merging the plurality of candidate positions into one.
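The thresholding and merging procedure can be sketched in Python as a greedy suppression loop run separately for each density. Several details are assumptions made for illustration: the description does not define how the overlap ratio is measured, so intersection over union is used here; integrated windows are axis-aligned boxes `(x1, y1, x2, y2)`; the default values of `score_threshold` and `max_overlap` and all names are hypothetical.

```python
def overlap_ratio(a, b):
    """Intersection over union of two windows (x1, y1, x2, y2); an assumed measure."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_candidates(candidates, score_threshold=0.0, max_overlap=0.5):
    """candidates: dicts with keys 'position', 'density', 'score', 'window'."""
    kept = []
    for density in {c["density"] for c in candidates}:
        # Only candidates of the same density compete with each other.
        pool = sorted(
            (c for c in candidates
             if c["density"] == density and c["score"] >= score_threshold),
            key=lambda c: c["score"],
            reverse=True,
        )
        while pool:
            focus = pool.pop(0)  # highest remaining score becomes the position of interest
            kept.append(focus)
            # Drop lower-scoring comparison positions that overlap it too much.
            pool = [c for c in pool
                    if overlap_ratio(c["window"], focus["window"]) <= max_overlap]
    return kept
```

Because the loop is run per density, two strongly overlapping candidates with different densities are both kept, which matches the requirement that only candidates of the same density are merged.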

Then, the position determining means 517b outputs the candidate positions determined to be positions where a person is captured to the object position output means 31 as object position information.

<Operation of the image monitoring device 1 according to the second embodiment>
The operation of the image monitoring device 1 according to the second embodiment is described below with reference to FIGS. 9 and 15.

When the image monitoring device 1 starts operating, as in the first embodiment, the photographing unit 2 sequentially transmits captured images, and the image processing unit 5 repeats the operation according to the flowchart of FIG. 9 each time it receives a captured image.

The communication unit 3 operates as the image acquisition means 30, receives the captured image, and outputs it to the image processing unit 5 (step S1). On receiving the captured image, the image processing unit 5 operates as the density estimation means 50, reads the density estimator from the density estimator storage means 40 of the storage unit 4, and estimates the density distribution by scanning the captured image with the density estimator (step S2).

次に、画像処理部５は物体位置判定手段５１として動作し、物体位置判定手段５１は、画像取得手段３０から撮影画像および密度推定手段５０から密度分布を入力されて、密度分布に背景クラス以外の推定密度が含まれているか否かを確認する（ステップＳ３）。 Next, the image processing unit 5 operates as the object position determination means 51, and the object position determination means 51 receives the captured image from the image acquisition means 30 and the density distribution from the density estimation means 50, and the density distribution is other than the background class. It is confirmed whether or not the estimated density of is included (step S3).

If an estimated density other than the background class is included (YES in step S3), the object position determination means 51 performs the process of determining the positions of individual objects from the captured image (step S4); if only the background class is present (NO in step S3), it skips steps S4 and S5.

図１５のフローチャートを参照して、ステップＳ４の物体位置判定処理を説明する。単体特徴記憶手段４１が部分識別器記憶手段４１１ｂおよび重み記憶手段４１２ｂとして動作し、物体位置判定手段５１が候補位置設定手段５１１ｂ、評価値算出手段５１４ｂおよび位置決定手段５１７ｂとして動作して、物体位置判定処理が実行される。 The object position determination process in step S4 will be described with reference to the flowchart of FIG. The single feature storage means 41 operates as the partial classifier storage means 411b and the weight storage means 412b, and the object position determination means 51 operates as the candidate position setting means 511b, the evaluation value calculation means 514b, and the position determination means 517b. Judgment processing is executed.

候補位置設定手段５１１ｂは、撮影画像中の各画素の位置を順次候補位置に設定して評価値算出手段５１４ｂに入力し（ステップＳ２００）、ステップＳ２００〜Ｓ２０６のループ処理を制御する。 The candidate position setting means 511b sequentially sets the position of each pixel in the captured image to the candidate position and inputs it to the evaluation value calculating means 514b (step S200), and controls the loop processing of steps S200 to S206.

On receiving a candidate position, the evaluation value calculation means 514b specifies the density of the candidate position with reference to the density distribution (step S201). That is, it sets at the candidate position a window defined in the shape of the upper 1/3 of a single person and specifies the most frequent estimated density in that window (excluding the background class) as the density of the candidate position.

候補位置の密度を特定した評価値算出手段５１４ｂは、部分識別器記憶手段４１１ｂから当該密度に応じた複数の部分識別器を読み出し、各部分識別器に対応する部分領域を設定して部分領域内の撮影画像から識別用特徴量をそれぞれ抽出し（ステップＳ２０２）、抽出した識別用特徴量を該当する部分識別器に入力して、それぞれの部分スコア（部分評価値）を算出する（ステップＳ２０３）。 The evaluation value calculation means 514b that specifies the density of the candidate position reads out a plurality of partial classifiers according to the density from the partial classifier storage means 411b, sets a partial area corresponding to each partial classifier, and within the partial area. Each of the identification features is extracted from the captured image (step S202), the extracted identification features are input to the corresponding partial classifier, and each partial score (partial evaluation value) is calculated (step S203). ..

Having calculated the partial evaluation values, the evaluation value calculation means 514b specifies the density of each partial region with reference to the density distribution (step S204). That is, it specifies the most frequent estimated density within each partial region (treating the background class as the low-density class) as the density of that partial region.

Having specified the density of each partial region, the evaluation value calculation means 514b reads from the weight storage means 412b the weighting coefficient ratio corresponding to the density of each partial region, divides the ratio of each partial region by the sum of the ratios of all the partial regions to obtain the weighting coefficient of that partial region, and calculates the integrated evaluation value of the candidate position as the sum of products of the calculated weighting coefficients and the partial evaluation values calculated in step S203 (step S205). The sum of products follows equation (3) when the density of the candidate position is low, equation (4) when it is medium, and equation (5) when it is high.

The evaluation value calculation means 514b then records the candidate position, the window formed by the union of the partial regions (integrated window), the density of the candidate position, and the integrated evaluation value in association with one another, and checks whether every pixel position in the captured image has been set as a candidate position (step S206). If any pixel has not yet been set (NO in step S206), the process returns to step S200 to process the next pixel position.

On the other hand, when every pixel position has been set as a candidate position (YES in step S206), the position determining means 517b deletes, from the sets of candidate position, integrated window, candidate position density and integrated evaluation value recorded in step S206, those sets whose integrated evaluation value is below the threshold (step S207). For the sets that remain, it then merges, separately for each candidate position density, the sets whose integrated windows overlap each other by more than the predetermined ratio into a single set, treating them as belonging to the same person (step S208). The position determining means 517b then determines the candidate position of each set after merging to be the position of an individual person captured in the image (object position).

再び図９を参照して説明を続ける。物体位置判定手段５１はステップＳ４にて判定した物体位置の情報を通信部３に出力し（ステップＳ５）、通信部３は物体位置出力手段３１として動作して物体位置の情報を表示部６に送信する。 The description will be continued with reference to FIG. 9 again. The object position determination means 51 outputs the object position information determined in step S4 to the communication unit 3 (step S5), and the communication unit 3 operates as the object position output means 31 to display the object position information on the display unit 6. Send.

<Modifications>
(1) In each of the above embodiments and their modifications, the object to be detected is a person, but the object is not limited to this; it may also be a vehicle, an animal such as a cow or a sheep, or the like.

(2) In each of the above embodiments and their modifications, partial regions are set in units of the parts obtained by dividing the object into three in the height direction and two in the width direction, but the division is not limited to this. Depending on the detection target, the characteristics of the monitored space being captured, the type of feature amount or evaluation value adopted, and so on, partial regions divided at other suitable ratios can be used. Partial regions may also be set so that they overlap.

（３）上記各実施形態およびその各変形例において示した重みは一例であり、検出対象や撮影する監視空間の特性、採用する特徴量や評価値の種類などの違いに応じ、それぞれに適した別の値とすることができる。 (3) The weights shown in each of the above embodiments and their modifications are examples, and are suitable for each of them according to the difference in the detection target, the characteristics of the monitoring space to be photographed, the feature amount to be adopted, the type of evaluation value, and the like. It can be another value.

(4) In each of the above embodiments and their modifications, a density estimator trained by the multi-class SVM method is illustrated, but in its place various density estimators can be used, such as one trained by a decision-tree-type random forest method, the multi-class AdaBoost method, or multi-class logistic regression.
Alternatively, a density estimator using a classification-type CNN (Convolutional Neural Network) can be used.

（５）上記各実施形態およびその各変形例においては、密度推定器が推定する背景以外の密度のクラスを３クラスとしたが、より細かくクラスを分けてもよい。
その場合、３段階の重みに代えて、クラス分けに対応したより細かい段階の重みとし、クラスと重みを対応付けて単体特徴記憶手段４１に記憶させておくことができる。或いは、クラスと３段階の重みを多対一で対応付けて単体特徴記憶手段４１に記憶させておくこともできる。 (5) In each of the above-described embodiments and modifications thereof, the density classes other than the background estimated by the density estimator are set to 3 classes, but the classes may be further divided.
In that case, instead of the three-step weight, a finer-step weight corresponding to the classification can be used, and the class and the weight can be associated and stored in the single feature storage means 41. Alternatively, the class and the weights of the three stages can be associated with each other in a many-to-one manner and stored in the single feature storage means 41.

(6) In each of the above embodiments and their modifications, a density estimator that classifies into multiple classes is illustrated, but a regression-type density estimator that regresses a density value (estimated density) from the feature amount can be used instead. That is, the density estimator can be one that has learned the parameters of a regression function for obtaining the estimated density from the feature amount by ridge regression, support vector regression, a regression-tree-type random forest method, Gaussian process regression, or the like.
Alternatively, a density estimator using a regression-type CNN can be used.
In these cases, value ranges of the estimated density, which is output as a continuous value instead of a density class value, are stored in the single feature storage means 41 in association with the partial models and weights, or with the partial classifiers and weights. Also in these cases, the density of a partial region may be the mean or median of the estimated densities within the partial region instead of the most frequent estimated density.
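With a regression-type estimator, the region density becomes a summary statistic, and a value range takes the place of a class when looking up the classifiers and weights. A minimal Python sketch of both steps follows; the range boundaries 2.0 and 4.0, the labels and the function names are hypothetical placeholders, since the description does not give concrete boundaries in this passage.

```python
from bisect import bisect_right
from statistics import mean, median

# Hypothetical boundaries between value ranges (illustrative units only).
RANGE_BOUNDS = (2.0, 4.0)
RANGE_LABELS = ("low", "medium", "high")


def region_density_value(values, how="median"):
    """Summarize the continuous estimated densities inside a partial region."""
    return median(values) if how == "median" else mean(values)


def density_range(value):
    """Map a continuous density to the value range used as a lookup key."""
    # bisect_right puts a value equal to a boundary into the upper range.
    return RANGE_LABELS[bisect_right(RANGE_BOUNDS, value)]
```

The median is less sensitive than the mean to a few extreme estimates inside a region, which may matter where a partial region straddles the edge of a crowd.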

(7) In each of the above embodiments and their modifications, GLCM features are illustrated as the feature amount learned by the density estimator and the feature amount used for estimation, but in place of GLCM features various feature amounts can be used, such as local binary pattern (LBP) features, Haar-like features, HOG features, or luminance patterns, or a feature amount combining GLCM features with several of these.

（８）上記各実施形態およびその各変形例においては、密度推定手段５０および物体位置判定手段５１が１画素間隔で走査して処理を行う例を示したが、これらの走査を２画素以上の間隔を空けて行うことも可能である。 (8) In each of the above-described embodiments and modifications thereof, an example is shown in which the density estimation means 50 and the object position determination means 51 scan at one pixel interval to perform processing, but these scans are performed by two or more pixels. It is also possible to do it at intervals.
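Scanning at a wider interval only changes which pixel positions are visited. A minimal Python sketch, where the function name `scan_positions` and the `step` parameter are hypothetical:

```python
def scan_positions(width, height, step=1):
    """Yield-order list of (x, y) positions visited when scanning every `step` pixels."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return [(x, y) for y in range(0, height, step) for x in range(0, width, step)]
```

With `step=2` the number of positions drops to roughly a quarter, at the cost of coarser localization of the detected positions.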

（９）上記各実施形態およびその各変形例においては、候補位置を推定密度が低密度、中密度または高密度の領域内から選んで設定する例を示したが、配置生成手段５１０ａおよび候補位置設定手段５１１ｂのそれぞれは、変化領域内に限定して候補位置を設定することもできる。その場合、記憶部４は監視空間の背景画像を記憶する背景画像記憶手段（不図示）を備え、画像処理部５は、撮影画像と背景画像との差分処理を行って差分値が所定の差分閾値以上である画素の集まりを変化領域として抽出する、または撮影画像と背景画像との相関処理を行って相関値が所定の相関閾値以下である画素の集まりを変化領域として抽出する変化領域抽出手段（不図示）を備え、配置生成手段５１０ａおよび候補位置設定手段５１１ｂのそれぞれは、変化領域抽出手段が抽出した変化領域を参照して候補位置を設定する。
なお、候補位置を設定する領域を限定する場合、配置生成手段５１０ａは限定した領域の大きさに応じて配置数の上限個数を変更することができる。
このような候補位置を設定する領域の限定によって、撮影画像とモデル画像の偶発的な類似または背景に対する高い識別スコアの偶発的な算出を防止でき、物体位置の誤検出を低減できる。 (9) In each of the above-described embodiments and modifications thereof, an example in which the candidate position is selected and set from the region where the estimated density is low density, medium density, or high density is shown, but the arrangement generation means 510a and the candidate position are shown. Each of the setting means 511b can also set the candidate position only within the change area. In that case, the storage unit 4 includes a background image storage means (not shown) for storing the background image of the monitoring space, and the image processing unit 5 performs difference processing between the captured image and the background image so that the difference value is a predetermined difference. A change area extraction means for extracting a group of pixels having a threshold value or more as a change area, or performing a correlation process between a captured image and a background image to extract a group of pixels having a correlation value equal to or less than a predetermined correlation threshold value as a change area. (Not shown), each of the arrangement generation means 510a and the candidate position setting means 511b sets a candidate position with reference to the change area extracted by the change area extraction means.
When limiting the area for setting the candidate position, the arrangement generation means 510a can change the upper limit number of arrangements according to the size of the limited area.
By limiting the area for setting the candidate position in this way, it is possible to prevent accidental similarity between the captured image and the model image or accidental calculation of a high discrimination score for the background, and it is possible to reduce erroneous detection of the object position.
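The difference-based variant of change region extraction can be sketched in Python as follows, assuming grayscale images stored as 2D lists of equal size. The default `diff_threshold` of 30 and the function names are hypothetical; the correlation-based variant is not shown.

```python
def change_mask(frame, background, diff_threshold=30):
    """True where the absolute difference from the background is at or above the threshold."""
    return [
        [abs(f - b) >= diff_threshold for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def positions_in_change_region(mask):
    """Candidate positions (x, y) restricted to pixels flagged as changed."""
    return [
        (x, y)
        for y, row in enumerate(mask)
        for x, changed in enumerate(row)
        if changed
    ]
```

Restricting candidates this way also reduces the number of positions that have to be scored, in addition to the reduction in false detections noted above.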

(10) In the first embodiment and its modifications, the arrangement generation means 510a generates an arrangement at random at each iteration. From the second iteration onward, however, an arrangement may instead be generated by updating to candidate positions slightly shifted from the previous ones, or by successively improving the candidate positions with reference to the similarity obtained for the previous arrangement, either by searching for candidate positions stochastically with the MCMC (Markov chain Monte Carlo) method or by hill climbing.

(11) In each of the above embodiments and their modifications, the estimated density at a candidate position of interest is determined by setting, at that position, the projection region of a model defined in the shape of the upper 1/3 of a person, or a window defined in that shape, and tallying the estimated densities within that region. To reduce the amount of processing, a smaller region can be used instead, such as the pixel at the candidate position or its 8-neighborhood or 16-neighborhood. Alternatively, to increase accuracy, a larger region can be used instead, such as the projection region of a model, or a window, defined in the shape of the upper 2/3 of a single person with the candidate position as its representative position, or the projection region of a model, or a window, defined in the shape of the whole body of a single person with the candidate position as its representative position.

(12) The second embodiment and its modifications illustrated partial classifiers trained with the linear SVM method, but the partial classifiers may instead be trained with any of various conventionally known learning methods, such as the AdaBoost method. A pattern matcher may also be used in place of a classifier. In that case, the partial score is, for example, the inner product of the mean pattern of the features extracted from training images of people and the features of the input image, and the evaluation value calculation function can be a function that takes the features of the captured image as its input and outputs that score. A discriminative CNN may also be used as the partial classifier.
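The pattern-matching alternative in modification (12) amounts to an inner product against a mean feature pattern. The sketch below assumes that features are plain lists of numbers of equal length; the function names `mean_pattern` and `partial_score` are hypothetical.

```python
def mean_pattern(training_features):
    """Element-wise mean of the feature vectors extracted from training images."""
    if not training_features:
        raise ValueError("at least one training feature vector is required")
    count = len(training_features)
    return [sum(column) / count for column in zip(*training_features)]


def partial_score(pattern, feature):
    """Inner product of the mean pattern and the feature of a partial region."""
    if len(pattern) != len(feature):
        raise ValueError("pattern and feature must have the same length")
    return sum(p * f for p, f in zip(pattern, feature))
```

For example, `partial_score(mean_pattern([[1.0, 0.0], [3.0, 2.0]]), [1.0, 1.0])` evaluates the input feature `[1.0, 1.0]` against the mean pattern `[2.0, 1.0]`.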

(13) The second embodiment and its modifications illustrated HOG features as the features that the partial classifiers use for training and classification. Instead of HOG features, various other features may be used, such as local binary pattern features, Haar-like features, or luminance patterns, or a feature that combines two or more of HOG features and these features may be used.

(14) The second embodiment and its modifications described an example in which the evaluation value calculation means 514b normalizes the weights so that they sum to 1 for each candidate position. The weights may additionally be normalized so that the weights in the horizontal (left-right) direction sum to 1.
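The per-candidate normalization mentioned in modification (14) can be sketched as below. The names `normalize` and `integrated_value` are hypothetical, the weights are assumed to be non-negative numbers with one entry per partial region, and the additional horizontal normalization is not shown because its exact grouping is not specified here.

```python
def normalize(weights):
    """Scale the weights of one candidate position so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def integrated_value(partial_values, weights):
    """Weighted sum of partial evaluation values using normalized weights."""
    if len(partial_values) != len(weights):
        raise ValueError("one weight is required per partial evaluation value")
    return sum(w * v for w, v in zip(normalize(weights), partial_values))
```

Normalizing per candidate position keeps integrated evaluation values comparable across candidates whose partial regions fall in different density levels and therefore carry different raw weights.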

(15) The second embodiment and its modifications described an example in which the degree of emphasis placed on each part of the object is switched by means of weights, but the threshold applied to the partial evaluation values (the partial threshold) may be switched instead. In that case, for example, the single feature storage means 41 stores predetermined partial thresholds in association with the respective densities, with a lower threshold for a lower density and a higher threshold for a higher density, and the evaluation value calculation means 514b calculates, as the integrated evaluation value, the proportion of partial regions whose partial evaluation value is at or above the partial threshold. The position determination means 517b then determines that a candidate position whose integrated evaluation value is at or above a predetermined threshold (for example, 50%) is an object position.
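The threshold-switching variant in modification (15) can be sketched as follows. The density labels and the numeric values in `PARTIAL_THRESHOLDS` are hypothetical placeholders chosen only to satisfy "lower density, lower threshold"; the 50% decision threshold is the example value given in the text.

```python
# Hypothetical partial thresholds: lower for lower density, higher for higher.
PARTIAL_THRESHOLDS = {"low": 0.2, "medium": 0.5, "high": 0.8}


def integrated_ratio(partial_values, densities, thresholds=PARTIAL_THRESHOLDS):
    """Proportion of partial regions whose value reaches its density's threshold."""
    if not partial_values or len(partial_values) != len(densities):
        raise ValueError("need one density label per partial evaluation value")
    passed = sum(1 for value, density in zip(partial_values, densities)
                 if value >= thresholds[density])
    return passed / len(partial_values)


def is_object_position(partial_values, densities, decision_threshold=0.5):
    """True if the proportion of passing partial regions reaches the threshold."""
    return integrated_ratio(partial_values, densities) >= decision_threshold
```

Because a partial region in a dense area must clear a higher threshold, parts that are likely to be occluded contribute to the decision only when their evidence is strong, which plays the same role as a low weight in the weighted-sum formulation.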

According to the embodiments and modifications described above, the object detection device determines the position of each object while switching, according to the density at each candidate position and in each partial region, the features of the parts and the degree of emphasis placed on them to suit the occlusion state that the density can cause in the object. This enables accurate object detection that adapts to changes in the occlusion state of objects accompanying changes in the congestion state.

Among these, the object detection device according to the first embodiment and its modifications switches, according to the density, the partial models representing the image features of the parts of the object and the degree of emphasis applied when evaluating the partial evaluation values of the partial models against the captured image. This enables accurate object detection that adapts to changes in the occlusion state of objects accompanying changes in the congestion state.

Likewise, the object detection device according to the second embodiment and its modifications switches, according to the density, the partial classifiers that have learned the image features of the parts of the object and the degree of emphasis applied when summing the per-part partial evaluation values output by the partial classifiers. This enables accurate object detection that adapts to changes in the occlusion state of objects accompanying changes in the congestion state.

1 ... Image monitoring device
2 ... Imaging unit
3 ... Communication unit
30 ... Image acquisition means
31 ... Object position output means
4 ... Storage unit
40 ... Density estimator storage means
41 ... Single feature storage means
410a ... Object model storage means
411b ... Partial classifier storage means
412a, 412b ... Weight storage means
5 ... Image processing unit
50 ... Density estimation means
51 ... Object position determination means
510a ... Arrangement generation means
511b ... Candidate position setting means
512a ... Model image generation means
514a, 514b ... Evaluation value calculation means
516a ... Optimal arrangement determination means
517b ... Position determination means
6 ... Display unit

Claims

An object detection device that detects individual objects from a captured image of a space in which congestion of predetermined objects can occur, the object detection device comprising:
a density estimation means that estimates the distribution of the density of the objects captured in the captured image by using a density estimator that has learned, for each of a plurality of predetermined densities, the image features of density images capturing the space in which the objects are present at that density; and
an object position determination means that sets, in the captured image, candidate positions where individual objects may exist, sets in the captured image, with reference to each candidate position, partial regions corresponding to each of a plurality of parts constituting the object, calculates for each partial region a partial evaluation value indicating the degree to which the image feature of the part corresponding to that partial region appears, and determines, as the position of the object, a candidate position for which an integrated evaluation value satisfies a predetermined determination criterion, the integrated evaluation value being obtained by integrating the partial evaluation values of the plurality of partial regions set with reference to that candidate position while placing greater emphasis on a partial region the lower the density in that partial region.

The object detection device according to claim 1, wherein the object position determination means sets, for each partial region, a weight coefficient that is higher the lower the density in that partial region and lower the higher the density in that partial region, and calculates the integrated evaluation value by weighting the partial evaluation values of the plurality of partial regions set with reference to the candidate position by the weight coefficients of the respective partial regions and summing the weighted values.

The object detection device according to claim 1 or 2, wherein the object position determination means sets the partial regions corresponding to a smaller subset of the parts constituting the object the higher the density at the candidate position.

The object detection device according to any one of claims 1 to 3, wherein the object position determination means comprises:
an arrangement generation means that generates a plurality of mutually different arrangements, each including one or more of the candidate positions;
a model image generation means that, for each of the plurality of arrangements, generates a model image by drawing, in the partial region corresponding to each of the plurality of parts set with reference to each candidate position, a partial model imitating that part;
an evaluation value calculation means that, for the model image of each of the plurality of arrangements, calculates for each partial region the partial evaluation value indicating the degree of similarity of the partial model to the captured image, and calculates the integrated evaluation value by integrating the partial evaluation values of the plurality of partial regions; and
an optimal arrangement determination means that determines, as the positions of the objects, the candidate positions in the arrangement for which the integrated evaluation value is maximum.

The object detection device according to any one of claims 1 to 3, wherein the object position determination means comprises:
a candidate position setting means that sets a plurality of the candidate positions in the captured image;
an evaluation value calculation means that calculates the partial evaluation value of each partial region by inputting the image feature of the partial region, which corresponds to one of the plurality of parts and is set with reference to each candidate position, into a classifier that has learned the image feature of that part, and that calculates, for each candidate position, the integrated evaluation value by integrating the partial evaluation values of the plurality of partial regions set with reference to that candidate position; and
a position determination means that determines, as the position of the object, a candidate position for which the calculated integrated evaluation value satisfies the determination criterion.