JP2018156236A

JP2018156236A - Object position estimating apparatus

Info

Publication number: JP2018156236A
Application number: JP2017051077A
Authority: JP
Inventors: Hidenori Ujiie (秀紀氏家); Takaharu Kurokawa (高晴黒川); Masahiro Maeda (昌宏前田); Takumi Munekata (匠宗片)
Original assignee: Secom Co Ltd
Current assignee: Secom Co Ltd
Priority date: 2017-03-16
Filing date: 2017-03-16
Publication date: 2018-10-04
Anticipated expiration: 2037-03-16
Also published as: JP6851233B2

Abstract

PROBLEM TO BE SOLVED: To provide an object position estimating apparatus capable of accurately estimating the positions of individual objects from a captured image of a space where congestion may occur.
SOLUTION: The object position estimating apparatus of the present invention includes: density estimating means 50 for estimating a density distribution of objects in a captured image by using a density estimator which has learned in advance, for each of predetermined densities, features of density images obtained by photographing a space where the objects exist at that density; object model storing means 41 for storing an object model imitating the object; model image generating means 51 for setting an arrangement region in the captured image, assigning positions in the arrangement region to a number of object models corresponding to the distribution to set a plurality of arrangements, and drawing the object models in each of the plurality of arrangements to generate model images; and optimum arrangement estimation means 52 for calculating the similarity between the model image of each of the plurality of arrangements and the captured image and outputting the arrangement having the highest similarity among the plurality of arrangements.
SELECTED DRAWING: Figure 2

Description

The present invention relates to an object position estimation apparatus that estimates the position of a predetermined object, such as a person, from an image, and more particularly to an object position estimation apparatus that estimates the positions of individual objects from an image of a space where congestion can occur.

In spaces where congestion can occur, such as event venues, measures such as assigning more guards to congested areas are required to prevent accidents. Surveillance cameras can therefore be placed at various locations in the venue, the distribution of people estimated from the captured images, and the estimated distribution displayed, making it easier for monitoring staff to grasp the congestion situation.

Estimating the positions of individual people at that time allows a model imitating a person's shape to be displayed at each estimated position, and/or the positional relationships among people (for example, forming a queue or surrounding something) to be analyzed and the analysis result reported, so a further improvement in monitoring efficiency can be expected.

One method of estimating the positions of individual people from a captured image showing multiple people is to combine a plurality of models imitating people and fit them to the captured image.

In the moving object tracking device described in Patent Document 1, the positions of individual moving objects are estimated by fitting, at positions where changed pixels were extracted by comparing a monitoring image with a background image, a combination of as many moving object models (each imitating the shape of a moving object being tracked) as there are moving objects being tracked. The document also describes synthesizing the object regions estimated for the respective object positions, detecting and labeling those changed pixels that lie outside the synthesized region, and, if a label is large enough to be regarded as a moving object, taking the position of the label as the position of a newly appearing moving object.

JP 2012-159958 A

However, for a captured image of a congested space, it is difficult to estimate the number of models to be fitted, and there has been a problem that the positions of people cannot be estimated with high accuracy.

That is, in a method that fits a combination of models to a change region, if combinations with a large proportion of region overlap are allowed, more models than the actual number of people end up being fitted, so the number of models to be combined must be limited to a plausible number.

However, in a captured image of a congested space, many people can newly appear at the same time, with occlusion among them. This means that, depending on the degree of occlusion, the relationship between the area of the change region due to new appearances (a label outside the synthesized region) and the number of newly appearing people can vary widely. It has therefore been difficult to estimate, from the area of a label outside the synthesized region, the number of models that should be fitted to that label.

In addition, in a captured image of a congested space, many people can disappear at the same time as many others newly appear. It has therefore been difficult to estimate the number of models to be fitted not only to change regions due to new appearances but also to change regions that include the other portions.

As a result, for a captured image of a congested space, the number of models to be fitted cannot be limited to a plausible number, and the positions of people cannot be estimated with high accuracy.

The present invention has been made in view of the above problems, and its object is to provide an object position estimation apparatus that can accurately estimate the positions of individual objects from a captured image of a space where congestion can occur.

To solve this problem, the present invention provides an object position estimation apparatus that estimates the position of each object from a captured image of an estimation target space in which congestion by predetermined objects can occur, the apparatus comprising: density estimation means for estimating the distribution of the density of the objects captured in the captured image, using a density estimator that has learned in advance, for each of predetermined densities, the features of density images capturing a space in which the objects exist at that density; object model storage means storing an object model imitating the object; model image generation means for setting an arrangement region in the captured image, assigning positions within the arrangement region to a number of object models corresponding to the distribution so as to set a plurality of arrangements, and drawing the object models in each of the plurality of arrangements to generate model images; and optimum arrangement estimation means for calculating the similarity between the model image of each of the plurality of arrangements and the captured image and outputting the arrangement with the highest similarity among the plurality of arrangements.

In this object position estimation apparatus, it is preferable that the model image generation means sets the plurality of arrangements by assigning a plurality of different numbers of object models within the range of estimation error assumed for the distribution, and that the optimum arrangement estimation means calculates the similarity of each arrangement so that it is lower the larger the estimation error corresponding to the number of models in that arrangement.

In this object position estimation apparatus, it is preferable that the model image generation means calculates, for each arrangement, the degree of overlap between object models in the model image, and that the optimum arrangement estimation means calculates the similarity of each arrangement so that it is lower the greater the degree of overlap in that arrangement.

It is preferable that this object position estimation apparatus further comprises size conversion means for converting the size of a local region in the captured image into its actual size in the estimation target space, that the density estimation means estimates the density for each local region, and that the model image generation means determines the number of models to be assigned to an arrangement region from the actual sizes and densities of the local regions included in that arrangement region.

It is preferable that this object position estimation apparatus further comprises change region extraction means for comparing the captured image with a background image of the estimation target space and extracting a change region in which the captured image differs from the background image by at least a predetermined criterion, and that the model image generation means sets the change region as the arrangement region.

In this object position estimation apparatus, it is preferable that the model image generation means sets, as the arrangement region, a region of the distribution in which a density greater than 0 was estimated.

In this object position estimation apparatus, it is preferable that the object model storage means stores an object model imitating the shape of the object, and that the optimum arrangement estimation means calculates a similarity that is higher the better the shape of the object models drawn in the model image fits the arrangement region.

According to the present invention, the density distribution of objects is estimated from the captured image and the positions of individual objects are estimated using a number of object models corresponding to the density distribution, so the positions of individual objects can be accurately estimated from a captured image of a space where congestion can occur.

FIG. 1 is a block diagram showing the schematic configuration of the image monitoring apparatus 1.
FIG. 2 is a functional block diagram of the image monitoring apparatus 1.
FIG. 3 is a schematic diagram showing the contents stored in the object model storage means 41.
FIG. 4 is a schematic diagram showing the components of the size conversion means 54.
FIG. 5 is a diagram explaining a processing example of the actual size calculation means 540.
FIG. 6 is a diagram schematically showing an example of the process of calculating the shape fitness.
FIG. 7 is a diagram showing how the degree of concealment is calculated for the model image 210.
FIG. 8 is a flowchart explaining the operation of the image processing unit 5.
FIG. 9 is a first diagram explaining optimum arrangement estimation for a change region of interest.
FIG. 10 is a second diagram explaining optimum arrangement estimation for a change region of interest.

Hereinafter, as an example of a preferred embodiment including the object position estimation apparatus of the present invention, an image monitoring apparatus 1 will be described that uses the object position estimation apparatus to estimate the positions of individual people from captured images of an event venue and displays information on the estimated positions.

<Configuration of Image Monitoring Apparatus 1>
FIG. 1 is a block diagram showing the schematic configuration of the image monitoring apparatus 1. The image monitoring apparatus 1 comprises an imaging unit 2, a communication unit 3, a storage unit 4, an image processing unit 5, and an output unit 6.

The imaging unit 2 is a surveillance camera connected to the image processing unit 5 via the communication unit 3. It is imaging means that captures the monitoring space at predetermined time intervals to generate captured images and sequentially inputs them to the image processing unit 5. For example, the imaging unit 2 is installed on a pole erected at the event venue with a predetermined fixed field of view overlooking the monitoring space, and captures the monitoring space at a frame period of 1 second to generate color images. Monochrome images may be generated instead of color images.

The communication unit 3 is a communication circuit. One end is connected to the image processing unit 5, and the other end is connected to the imaging unit 2 and the output unit 6 via a coaxial cable or a communication network such as a LAN (Local Area Network) or the Internet. The communication unit 3 acquires captured images from the imaging unit 2 and inputs them to the image processing unit 5, and outputs the estimation result input from the image processing unit 5 to the output unit 6.

The storage unit 4 is a memory device such as a ROM (Read Only Memory) or RAM (Random Access Memory), and stores various programs and data. The storage unit 4 is connected to the image processing unit 5 and exchanges this information with the image processing unit 5.

The image processing unit 5 is composed of an arithmetic device such as a CPU (Central Processing Unit), DSP (Digital Signal Processor), or MCU (Micro Control Unit). The image processing unit 5 is connected to the storage unit 4 and the output unit 6, operates as various processing means and control means by reading programs from the storage unit 4 and executing them, and writes various data to and reads it from the storage unit 4. The image processing unit 5 is also connected to the imaging unit 2 and the output unit 6 via the communication unit 3; by analyzing the captured images acquired from the imaging unit 2 via the communication unit 3, it causes the output unit 6 to display, via the communication unit 3, the positions and person regions of the people present in the monitoring space.

The output unit 6 is a display device such as a liquid crystal display or a CRT (Cathode Ray Tube) display. It is display means that is connected to the image processing unit 5 via the communication unit 3 and displays the estimation result produced by the image processing unit 5. The monitoring staff view the displayed estimation result, judge whether congestion has occurred, and take action such as changing staff deployment as necessary.

This embodiment illustrates an image monitoring apparatus 1 in which the imaging unit 2 and the image processing unit 5 are in a one-to-one relationship, but in other embodiments the relationship between imaging units 2 and image processing units 5 may be many-to-one or many-to-many.

<Functions of Image Monitoring Apparatus 1>
FIG. 2 is a functional block diagram of the image monitoring apparatus 1. The communication unit 3 functions as image acquisition means 30, object position output means 31, and the like, and the storage unit 4 functions as background image storage means 40, object model storage means 41, and the like. The image processing unit 5 functions as density estimation means 50, model image generation means 51, optimum arrangement estimation means 52, change region extraction means 53, size conversion means 54, and the like.

Each means is described below with reference to FIGS. 2 to 7.

The image acquisition means 30 sequentially acquires captured images from the imaging unit 2, which is the imaging means, and sequentially outputs the acquired captured images to the density estimation means 50 and the change region extraction means 53.

The density estimation means 50 extracts features for density estimation (estimation features) from the captured image input from the image acquisition means 30, estimates the distribution of the density of people (density distribution) using the output values obtained by inputting the extracted estimation features to the density estimator, and outputs the estimated density distribution to the model image generation means 51.

Specifically, the density estimation means 50 sets a window (estimation extraction window) at the position of each pixel of the captured image and calculates the estimation feature of the captured image within each estimation extraction window, thereby extracting an estimation feature for each pixel. The estimation feature is a GLCM (Gray Level Co-occurrence Matrix) feature.
It is desirable that the regions of the monitoring space captured in the estimation extraction windows all have the same size. Accordingly, the density estimation means 50 preferably reads the camera parameters of the imaging unit 2, stored in advance, from camera parameter storage means (not shown), deforms the captured image by a homography transformation using the camera parameters so that the region of the monitoring space captured in any region of the captured image has the same size, and then extracts the estimation features. The camera parameter storage means may be shared with the camera parameter storage means 411 and/or 420 described later.

The density estimation means 50 then obtains, for each pixel, the estimated density that is output when the estimation feature extracted for that pixel is input to the density estimator. These per-pixel estimated densities constitute the density distribution. When the estimation features were extracted from a deformed captured image, the density estimation means 50 deforms the density distribution back to the shape of the original captured image by a homography transformation using the camera parameters.
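As an illustration only, the extraction of a GLCM feature from one estimation extraction window could be sketched as follows. The number of gray levels, the pixel offset, and the three statistics (contrast, energy, homogeneity) are assumptions made for this sketch; the description does not state which GLCM statistics are used, and the function names are hypothetical.

```python
import numpy as np


def glcm(window, levels=8, dx=1, dy=0):
    """Normalized co-occurrence matrix of an 8-bit window for offset (dx, dy), dx, dy >= 0.

    Assumes the window is larger than the offset in both directions.
    """
    q = (window.astype(np.int64) * levels) // 256  # quantize 0..255 to 0..levels-1
    h, w = q.shape
    a = q[:h - dy, :w - dx].ravel()  # reference pixels
    b = q[dy:, dx:].ravel()          # neighbors at the given offset
    m = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(m, (a, b), 1.0)        # count each (reference, neighbor) pair
    return m / m.sum()


def glcm_features(p):
    """Three common GLCM statistics (an assumed choice): contrast, energy, homogeneity."""
    i, j = np.indices(p.shape)
    contrast = float(np.sum((i - j) ** 2 * p))
    energy = float(np.sum(p ** 2))
    homogeneity = float(np.sum(p / (1.0 + np.abs(i - j))))
    return np.array([contrast, energy, homogeneity])
```

Applying `glcm_features(glcm(window))` to the window around every pixel would yield one feature vector per pixel, which is the form of input the density estimator is described as receiving.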

The density estimator is a function that, given the features of an image, estimates the density of the people captured in that image and outputs the estimate (estimated density). This function, including parameters such as its coefficients, is stored in advance as part of the program of the density estimation means 50. The density estimator can be realized as a classifier that discriminates images of multiple classes, and can be a discriminant function trained by the multi-class SVM method.

Density can be defined, for example, in four classes: a “background” class in which no person is present; a “low density” class, higher than 0 people/m^2 and at most 2 people/m^2; a “medium density” class, higher than 2 people/m^2 and at most 4 people/m^2; and a “high density” class, higher than 4 people/m^2.
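The class boundaries above can be written out as a small helper, shown here as a sketch. Such a function could be used, for example, to label training images whose true density is known; it is not the density estimator itself, which classifies image features. The function name and the English label strings are assumptions of this sketch.

```python
def density_class(people_per_m2: float) -> str:
    """Map a known density in people/m^2 to one of the four example classes."""
    if people_per_m2 <= 0:
        return "background"       # no person present
    if people_per_m2 <= 2:
        return "low density"      # (0, 2] people/m^2
    if people_per_m2 <= 4:
        return "medium density"   # (2, 4] people/m^2
    return "high density"         # above 4 people/m^2
```

Note that each upper boundary belongs to the lower class, matching the "higher than ... and at most ..." wording of the definition.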

The estimated density is a value assigned in advance to each class and is the value output as the result of distribution estimation. In this embodiment, the values corresponding to the classes are written as “background”, “low density”, “medium density”, and “high density”.

That is, the density estimator is a discriminant function for distinguishing the images of each class from those of the other classes, trained by applying the multi-class SVM method to the features of a large number of images (density images) belonging to each of the “background”, “low density”, “medium density”, and “high density” classes. The parameters of the discriminant function derived by this training are stored as the density estimator. The features of the density images are of the same kind as the estimation features, namely GLCM features.

The density distribution output by the density estimation means 50 shows how sparse or dense people are in each part of the captured image, but it does not show the positions of individual people.
The model image generation means 51 and the optimum arrangement estimation means 52, which follow the density estimation means 50, estimate the positions of individual people by fitting models imitating people to the captured image. The model to be fitted is stored in the object model storage means 41 and imitates the shape of a person. The optimum arrangement estimation means 52 evaluates how well the models fit on the basis of the similarity between the shapes of the arranged models and the shape features appearing in the captured image.

Specifically, the model image generation means 51 refers to the density distribution input from the density estimation means 50, reads the object model from the object model storage means 41, sets an arrangement region in the captured image, assigns positions within the arrangement region to a number of object models corresponding to the density distribution (the arrangement count) so as to set a plurality of arrangements, draws the object models in each of the set arrangements to generate model images, and outputs model image information including the generated model images and the arrangement count to the optimum arrangement estimation means 52. The arrangement region is a region in which objects are presumed to be captured, such as a change region.

The optimum arrangement estimation means 52 then calculates the similarity between the captured image and the model image of each of the plurality of arrangements input from the model image generation means 51, generates object position information from the positions of the object models in the arrangement with the highest similarity (the optimum arrangement), and outputs it to the object position output means 31.
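The selection of the optimum arrangement can be illustrated with a deliberately simplified sketch. Here each object model is drawn as a filled rectangle, the similarity is taken to be the intersection-over-union between the model image and a binary arrangement region, and all combinations of candidate positions are enumerated. All three choices, as well as the function names, are assumptions made for illustration; the description does not prescribe this similarity measure or this search strategy.

```python
from itertools import combinations

import numpy as np


def render(shape, positions, half_w=1, half_h=2):
    """Draw one filled rectangle per (row, col) position into a boolean model image."""
    img = np.zeros(shape, dtype=bool)
    for r, c in positions:
        img[max(0, r - half_h):r + half_h + 1, max(0, c - half_w):c + half_w + 1] = True
    return img


def iou(a, b):
    """Intersection-over-union of two boolean images (assumed similarity measure)."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0


def best_arrangement(region, candidates, count):
    """Try every way of placing `count` models on the candidate positions; keep the best."""
    best, best_score = None, -1.0
    for positions in combinations(candidates, count):
        score = iou(render(region.shape, positions), region)
        if score > best_score:
            best, best_score = positions, score
    return best, best_score
```

In this sketch `count` plays the role of the arrangement count derived from the density distribution; fixing it in advance is what keeps the search from fitting more models than there are objects.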

By arranging a number of object models corresponding to the density distribution in this way, the number of models to be fitted can be appropriately limited when estimating the positions of individual objects. This prevents more object models than the actual number of objects from being fitted, making it possible to accurately estimate the positions of individual objects from a captured image of a space where congestion can occur.

The object model and the generation of model images are described next.

As shown in FIG. 3(A), the object model storage means 41 comprises three-dimensional model storage means 410, which stores in advance a three-dimensional model of the object to be estimated, and camera parameter storage means 411, which stores in advance the camera parameters of the imaging unit 2. The model image generation means 51 reads the three-dimensional model and the camera parameters from them.

The three-dimensional model is data describing partial models, each representing the three-dimensional shape of one of the components that make up the object to be estimated, together with the positional relationship among those partial models. The object estimated by the image monitoring apparatus 1 is a standing person, and a three-dimensional model is set in which spheroids approximating the three-dimensional shapes of three parts of a person (head, torso, and legs) are stacked in the Z-axis direction. To simplify the description, in this embodiment the height and width of the three-dimensional model are those of a standard person and are common to everyone. The center of the head is taken as the representative position of the person. The three-dimensional model may be simplified further and approximated by a single spheroid.

The camera parameters include information on the projection conditions under which the imaging unit 2 captures an image onto which the monitoring space is projected. For example, the camera parameters include external parameters, such as the installation position and imaging direction of the imaging unit 2 in the actual monitoring space, and internal parameters, such as the focal length, angle of view, lens distortion, and other lens characteristics of the imaging unit 2 and the number of pixels of the image sensor. Rendering the three-dimensional model using these camera parameters produces a virtual image (model image) that imitates the capture of people by the imaging unit 2. In addition, by using the camera parameters to back-project an arbitrary pixel of the captured image onto the horizontal plane at the height of the head center of the three-dimensional model, the representative position (head center), in a virtual space simulating the monitoring space, of the three-dimensional model projected at that pixel position can be calculated.
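The back-projection described above can be sketched for an ideal pinhole camera as follows. The sketch assumes the convention x_cam = R X_world + t with an intrinsic matrix K, a world Z axis pointing up, and no lens distortion, whereas the camera parameters of the embodiment may include distortion. The function name and parameter names are hypothetical.

```python
import numpy as np


def backproject_to_height(u, v, K, R, t, height):
    """Intersect the viewing ray of pixel (u, v) with the horizontal plane Z = height.

    Assumes an ideal pinhole camera x_cam = R @ X_world + t with no lens distortion.
    """
    center = -R.T @ t                                      # camera center in world coordinates
    ray = R.T @ np.linalg.solve(K, np.array([u, v, 1.0]))  # ray direction in world coordinates
    if abs(ray[2]) < 1e-12:
        raise ValueError("viewing ray is parallel to the plane")
    s = (height - center[2]) / ray[2]                      # distance along the ray to the plane
    if s <= 0:
        raise ValueError("plane is behind the camera")
    return center + s * ray
```

With `height` set to the head-center height of the three-dimensional model, the returned point would serve as the representative position at which a model is placed in the virtual space.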

The model image generation means 51 assigns positions on the captured image to as many object models as the arrangement count, uses the camera parameters to obtain the position in the virtual space corresponding to each assigned position, and places a three-dimensional model at each obtained position. The model image generation means 51 then generates a model image by rendering the virtual space in which the three-dimensional models are placed onto the imaging plane of the imaging unit 2 using the camera parameters. Concealment between objects is also represented in a model image generated by rendering.

Instead of the configuration of FIG. 3(A), the object model storage means 41 may be configured, as shown in FIG. 3(B), as model image storage means 412 that stores in advance a two-dimensional model image for each pixel position of the captured image. These model images are generated by projecting the three-dimensional model with the camera parameters beforehand. In this case, when the model image generation means 51 assigns a position on the captured image to each object model, it draws the stored model image corresponding to that position. It represents occlusion between objects by drawing the model images in order from the position farthest from the imaging unit 2, each overwriting those drawn before it.
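The far-to-near overwrite described above is essentially the painter's algorithm. The following is a minimal sketch only, assuming each stored model image is given as a list of pixel coordinates and each placement carries its distance from the imaging unit; the function name `draw_far_to_near` and the data layout are hypothetical and do not come from the patent.

```python
def draw_far_to_near(width, height, placements):
    """Draw model images so that nearer models overwrite farther ones.

    placements: list of (model_id, distance_from_camera, pixels), where
    pixels is an iterable of (x, y). Returns a label image as a 2-D list
    (0 = background). The data layout is an assumption for illustration.
    """
    canvas = [[0] * width for _ in range(height)]
    # Sort by decreasing distance so that the nearest model is drawn last.
    for model_id, _dist, pixels in sorted(placements, key=lambda p: -p[1]):
        for x, y in pixels:
            if 0 <= x < width and 0 <= y < height:
                canvas[y][x] = model_id
    return canvas


# Model 1 is nearer (2.0 m) than model 2 (5.0 m), so it wins at pixel (1, 0).
example_canvas = draw_far_to_near(
    3, 1, [(1, 2.0, [(0, 0), (1, 0)]), (2, 5.0, [(1, 0), (2, 0)])]
)
```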

Alternatively, the object model storage means 41 may be configured as model image storage means 412 that stores in advance representative model images fewer in number than the pixels of the captured image. In this case, the model image for an arbitrary position is generated by applying a transformation such as enlargement or reduction to these representative model images.

As described above, the object model storage means 41 stores an object model imitating the object to be estimated, in particular one imitating its shape. When the model image generation means 51 assigns a position on the captured image to each object model, it reads the object model from the object model storage means 41 and draws it at each assigned position to generate a model image.

Next, the region in which object models are arranged (arrangement region) and the arrangement number will be described.

In a simple example, the model image generation means 51 sets the regions estimated to belong to the low-density, medium-density or high-density class as the arrangement region, assigns a random position within the arrangement region to each of one or more object models, and generates a plurality of model images that differ from one another in position and/or arrangement number by changing the assignment and increasing the arrangement number up to an upper limit (for example, the largest number of three-dimensional models that can be placed in the virtual space without overlapping).
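The simple scheme can be sketched as follows. This is an illustration under stated assumptions: the arrangement region is a set of pixel coordinates, the upper limit is passed in as `max_count`, and positions within one arrangement are drawn without repetition. The name `random_arrangements` and the parameter `trials_per_count` are hypothetical.

```python
import random


def random_arrangements(region_pixels, max_count, trials_per_count, seed=0):
    """Enumerate candidate arrangements with random positions.

    For each arrangement number from 1 to max_count, draw
    trials_per_count random position assignments inside the region.
    Returns a list of arrangements, each a list of (x, y) positions.
    """
    rng = random.Random(seed)
    pixels = sorted(region_pixels)  # fixed order for reproducibility
    arrangements = []
    for count in range(1, min(max_count, len(pixels)) + 1):
        for _ in range(trials_per_count):
            arrangements.append(rng.sample(pixels, count))
    return arrangements


example_region = {(0, 0), (1, 0), (2, 0)}
example_arrangements = random_arrangements(example_region, 2, 3)
```

Even this tiny example shows how quickly the number of candidates grows with the region size and the upper limit, which is the cost the text goes on to address.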

However, in the simple example above the combinations of positions and arrangement numbers are excessive, which increases the processing load and makes erroneous estimation more likely because more object models than the actual number of objects can be fitted.
The model image generation means 51 therefore sets a tighter arrangement region by using the change region in the captured image as the arrangement region, and determines a tighter arrangement number from the actual size and density of the arrangement region. This reduces the processing load and more reliably prevents erroneous estimation caused by fitting more object models than the actual number of objects.

For this purpose, the background image storage means 40 stores a background image of the monitoring space. The change region extraction means 53 compares the captured image input from the image acquisition means 30 with the background image read from the background image storage means 40, extracts a change region in which the captured image differs from the background image by at least a predetermined criterion, and outputs information on the extracted change region to the model image generation means 51 and the optimum arrangement estimation means 52.

Specifically, prior to estimation, the change region extraction means 53 stores a captured image taken when no person is present in the background image storage means 40 as the background image. The change region extraction means 53 also updates the background image as appropriate by averaging the part of each new captured image outside the change region into the background image, or by replacing the corresponding part of the background image with it.
Alternatively, the change region extraction means 53 can generate the background image by rendering an environment model that imitates the unoccupied monitoring space. In this case, it reads a prestored environment model from environment model storage means (not shown), reads the prestored camera parameters of the imaging unit 2 from camera parameter storage means (not shown), and renders the environment model using the camera parameters. This camera parameter storage means may be shared with the camera parameter storage means 411 and/or 420 described later. The environment model includes the three-dimensional shape, material properties and the like of each object constituting the monitoring space, as well as the illumination parameters of the light sources illuminating the monitoring space. The change region extraction means 53 estimates the illumination conditions from the captured image or the like and updates the background image as appropriate by re-rendering with the illumination parameters changed according to the estimated conditions.
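The averaging update of the background image can be sketched as a running average applied only outside the change region. The blending weight `rate` is an assumption, since the text does not specify how the average is weighted, and `update_background` is a hypothetical name.

```python
def update_background(background, image, change_pixels, rate=0.05):
    """Blend the new image into the background outside the change region.

    background, image: 2-D lists of luminance values of equal size.
    change_pixels: set of (x, y) pixels belonging to the change region;
    these keep their previous background value.
    rate: blending weight of the new image (an assumed parameter).
    """
    h, w = len(image), len(image[0])
    return [
        [
            background[y][x]
            if (x, y) in change_pixels
            else (1 - rate) * background[y][x] + rate * image[y][x]
            for x in range(w)
        ]
        for y in range(h)
    ]


# Pixel (0, 0) is inside the change region and is left untouched.
example_bg = update_background([[100, 100]], [[200, 200]], {(0, 0)}, rate=0.5)
```

Setting `rate=1.0` corresponds to the replacement variant mentioned in the text.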

The change region extraction means 53 extracts the change region by background subtraction. That is, for each pixel it calculates the absolute value of the luminance difference between the captured image and the background image (difference value), compares it with a predetermined threshold, detects the pixels whose difference value is equal to or greater than the threshold, and extracts each cluster of spatially connected detected pixels as a change region. Clusters smaller than the assumed size of the estimation target are not extracted.
Alternatively, the change region extraction means 53 can extract the change region by background correlation processing.
The change region extraction means 53 may include the difference value or correlation value of each pixel in the change region information it outputs to the model image generation means 51.
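The background subtraction step can be sketched as thresholding followed by connected-component labeling. This is a minimal sketch, assuming 4-connectivity and a size filter expressed as a minimum pixel count; the text specifies neither, and `extract_change_regions` is a hypothetical name.

```python
from collections import deque


def extract_change_regions(image, background, threshold, min_pixels):
    """Return connected clusters of pixels that differ from the background.

    image, background: 2-D lists of luminance values of equal size.
    A pixel is 'changed' when |image - background| >= threshold.
    Clusters with fewer than min_pixels pixels are discarded.
    Returns a list of sets of (x, y) pixels.
    """
    h, w = len(image), len(image[0])
    changed = [
        [abs(image[y][x] - background[y][x]) >= threshold for x in range(w)]
        for y in range(h)
    ]
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if not changed[y][x] or seen[y][x]:
                continue
            # Breadth-first search over 4-connected changed pixels.
            blob, queue = set(), deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                blob.add((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and changed[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if len(blob) >= min_pixels:
                regions.append(blob)
    return regions


example_background = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
example_image = [[50, 50, 0, 0], [50, 0, 0, 0], [0, 0, 0, 40]]
# The isolated pixel at (3, 2) is smaller than min_pixels and is dropped.
example_regions = extract_change_regions(example_image, example_background, 30, 2)
```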

In this way, the background image storage means 40 stores a background image of the estimation target space, and the change region extraction means 53 compares the captured image with that background image and extracts the change region in which the captured image differs from the background image by at least a predetermined criterion. The model image generation means 51 then sets the change region input from the change region extraction means 53 as the arrangement region.

When the model image generation means 51 designates an arbitrary pixel in the captured image, the size conversion means 54 outputs to the model image generation means 51 the area of the monitoring space projected onto the designated pixel (actual size).
The model image generation means 51 designates each pixel in the change region to the size conversion means 54 to obtain its actual size, multiplies the actual size of each pixel in the change region by the density at that pixel input from the density estimation means 50, and sums the products to determine the number of models to arrange in the change region.
That is, letting R be the set of pixels in the change region, i an arbitrary pixel in R, a_i the actual size of pixel i, and d_i the density at pixel i, the arrangement number P is derived by the following equation.
P = Σ_{i∈R} a_i × d_i (1)

Specifically, as shown in FIG. 4(A), the size conversion means 54 can consist of camera parameter storage means 420, which stores the camera parameters of the imaging unit 2 in advance, and actual size calculation means 540, which uses the camera parameters to convert the region on the captured image represented by the designated pixel into a region in the monitoring space and calculates the area of that region as the actual size of the designated pixel. The camera parameter storage means 420 and the camera parameter storage means 411 may be one and the same.

A processing example of the actual size calculation means 540 is described with reference to FIG. 5. The XYZ space 100 in FIG. 5 is a virtual space imitating the monitoring space. The XY plane represents a horizontal plane such as the ground or floor of the monitoring space. FIG. 5 also shows a horizontal plane 101 at the height of the head center (for example, 1.5 m) of an object model 102 imitating a person of standard height.
Using the camera parameters, the size conversion means 54 calculates the intersection P_0 (X_0, Y_0, Z_0) at which the line-of-sight vector connecting the optical center of the imaging unit 2 and the designated pixel crosses the plane 101. In the same way, it calculates the intersections P_r (X_r, Y_r, Z_r) and P_b (X_b, Y_b, Z_b) for the pixel immediately to the right of the designated pixel and the pixel immediately below it, respectively. The actual size corresponding to the designated pixel can be approximated by the area of the parallelogram spanned by the vector from P_0 to P_r and the vector from P_0 to P_b. The size conversion means 54 calculates and outputs |(X_r − X_0)(Y_b − Y_0) − (Y_r − Y_0)(X_b − X_0)|.
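The back-projection and parallelogram area can be sketched as below. This is an illustration under simplifying assumptions: an ideal pinhole camera without lens distortion, described by a focal length `f` in pixels, a principal point (`cx`, `cy`), a camera-to-world rotation matrix `R` and an optical center `C`. These parameter names and both function names are hypothetical; the patent only states that camera parameters are used.

```python
def backproject_to_plane(u, v, f, cx, cy, R, C, plane_z):
    """Intersect the viewing ray of pixel (u, v) with the plane Z = plane_z.

    R is a 3x3 camera-to-world rotation (list of rows), C the optical
    center in world coordinates. Returns the world point (X, Y, Z).
    """
    d_cam = ((u - cx) / f, (v - cy) / f, 1.0)  # ray direction, camera frame
    d = [sum(R[r][k] * d_cam[k] for k in range(3)) for r in range(3)]
    if d[2] == 0:
        raise ValueError("viewing ray is parallel to the plane")
    t = (plane_z - C[2]) / d[2]
    return tuple(C[k] + t * d[k] for k in range(3))


def pixel_actual_size(u, v, **cam):
    """Area on the plane covered by pixel (u, v), approximated by the
    parallelogram spanned by P_0->P_r and P_0->P_b."""
    x0, y0, _ = backproject_to_plane(u, v, **cam)
    xr, yr, _ = backproject_to_plane(u + 1, v, **cam)  # right neighbor
    xb, yb, _ = backproject_to_plane(u, v + 1, **cam)  # lower neighbor
    return abs((xr - x0) * (yb - y0) - (yr - y0) * (xb - x0))


# Camera 11.5 m above the floor looking straight down at the 1.5 m plane:
# the plane is 10 m away, so one pixel spans 10 / 100 = 0.1 m per side.
example_cam = dict(
    f=100.0, cx=0.0, cy=0.0,
    R=[[1, 0, 0], [0, -1, 0], [0, 0, -1]],
    C=(0.0, 0.0, 11.5), plane_z=1.5,
)
example_area = pixel_actual_size(50, 50, **example_cam)  # about 0.01 m^2
```

For an oblique camera the per-pixel area grows with distance, which is why the arrangement number is computed from per-pixel sizes rather than from a pixel count alone.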

As shown in FIG. 4(B), the size conversion means 54 may instead consist of actual size storage means 421 that stores, for each pixel of the captured image, the actual size of that pixel calculated in advance. In this case, the actual size storage means 421 outputs the actual size stored for the pixel designated by the model image generation means 51.

The density indicated by the density distribution is an estimate and contains error. It is therefore preferable to give the arrangement number a range as well, generate model images for a plurality of arrangement numbers, and estimate the object positions by selecting the model image most similar to the captured image from among them.
The model image generation means 51 therefore sets a plurality of arrangements by assigning a plurality of different numbers of object models within the range of estimation error assumed for the density distribution.

The density distribution output by the density estimation means 50 described above is itself expressed as estimated densities with a range, and thus already incorporates the estimation error. That is, the estimation error ranges of the low-density, medium-density and high-density classes can be taken as more than 0 persons/m² and at most 2 persons/m², more than 2 persons/m² and at most 4 persons/m², and more than 4 persons/m² and at most 8 persons/m², respectively. Here the upper limit of the high-density class is set to 8 persons/m². The model image generation means 51 can set the lower-limit arrangement number to the value of P obtained by applying the lower limit of each class to d_i in equation (1), and the upper-limit arrangement number to the value of P obtained by applying the upper limit of each class to d_i in equation (1).

For example, for a change region consisting of 100 pixels of the medium-density class and 50 pixels of the high-density class, the model image generation means 51 calculates a lower-limit arrangement number of 4 (= 0.01 × 2 × 100 + 0.01 × 4 × 50) and an upper-limit arrangement number of 8 (= 0.01 × 4 × 100 + 0.01 × 8 × 50), and generates a model image with four object models, a model image with five object models, ..., and a model image with eight object models. For simplicity, this example assumes a uniform 0.01 m² per pixel.
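The worked example can be reproduced with a short sketch that sums actual size times density over the pixels of the change region, once with the lower limit and once with the upper limit of each class. The class names, the table `CLASS_RANGES` and the function `arrangement_count_bounds` are hypothetical, and rounding the bounds to the nearest integer is an assumption, since the text does not state a rounding rule.

```python
# Density range of each class in persons per square metre (lower, upper),
# taken from the ranges given in the text.
CLASS_RANGES = {"low": (0.0, 2.0), "medium": (2.0, 4.0), "high": (4.0, 8.0)}


def arrangement_count_bounds(pixels, class_ranges=CLASS_RANGES):
    """pixels: iterable of (actual_size_m2, density_class) pairs.

    Returns (lower, upper): the sum of actual size times density over
    all pixels, using the lower and the upper limit of each class.
    """
    pixels = list(pixels)
    lower = sum(a * class_ranges[c][0] for a, c in pixels)
    upper = sum(a * class_ranges[c][1] for a, c in pixels)
    return lower, upper


# 100 medium-density pixels and 50 high-density pixels, 0.01 m^2 each.
example_pixels = [(0.01, "medium")] * 100 + [(0.01, "high")] * 50
example_lower, example_upper = arrangement_count_bounds(example_pixels)
# Candidate arrangement numbers; rounding to the nearest integer is assumed.
example_counts = list(range(round(example_lower), round(example_upper) + 1))
```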

The estimation error may also be estimated more generously, with the ranges of the low-density, medium-density and high-density classes set to, for example, 0 to 3 persons/m², 2 to 5 persons/m², and 4 to 8 persons/m² (inclusive), respectively.

As described above, the size conversion means 54 converts the size of a local region in the captured image into the actual size in the estimation target space. The model image generation means 51 sets on the captured image an arrangement region consisting of a plurality of local regions, and determines the number of object models to assign to the arrangement region from the actual sizes of the local regions it contains and the density distribution. This reduces the processing load and more reliably prevents erroneous estimation caused by fitting more object models than the actual number of objects.
In doing so, the model image generation means 51 sets a plurality of arrangements by assigning a plurality of different numbers of object models within the range of estimation error assumed for the density distribution. Therefore, even if the density distribution contains estimation error, a plurality of arrangements including one with the actual number of objects can be set, and erroneous estimation of object positions can be prevented.

Next, the similarity will be described.

The optimum arrangement estimation means 52 calculates, according to the following equation, a similarity that reflects how well the shapes of the object models drawn in the model image fit the arrangement region (shape conformity).
Similarity = shape conformity − (W_H × concealment degree + W_N × estimation error of the arrangement number) (2)
Here W_H and W_N are weighting coefficients greater than 0, set in advance on the basis of prior experiments. The value subtracted from the shape conformity corresponds to a penalty.

The shape conformity can be, for example, the degree of overlap between the change region extracted from the captured image and the region in which the object models are drawn in the model image (model region). In that case, subtracting the degree of non-overlap between the arrangement region and the model region from the degree of overlap gives a more reliable shape conformity that accounts for each region protruding beyond the other. That is, the optimum arrangement estimation means 52 calculates the shape conformity according to the following equation.
Shape conformity = degree of overlap − degree of non-overlap
= (area of overlapping region − area of non-overlapping region) / (area of change region + area of model region − area of overlapping region) (3)

FIG. 6 schematically shows an example of the shape conformity calculation.
The optimum arrangement estimation means 52 receives a difference image 200 containing the information on the change region 201 from the change region extraction means 53, and a model image 210 in which images 211 of six models are drawn from the model image generation means 51. The union of the images 211 of the models is the model region. Image 220 shows the optimum arrangement estimation means 52 superimposing the change region 201 and the model region. In the figure, the white part is the overlapping region 221, the part marked with horizontal lines is the non-overlapping region 222 on the model region side, and the hatched part is the non-overlapping region 223 on the change region side. The non-overlapping regions 222 and 223 together form the non-overlapping region of equation (3).
The optimum arrangement estimation means 52 counts the pixels of the overlapping region 221, the non-overlapping region, the change region 201 and the model region to obtain the area of each, and applies these areas to equation (3) to calculate the shape conformity.
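The pixel counting for equation (3) maps directly onto set operations. The following is a minimal sketch, assuming both regions are given as sets of pixel coordinates; `shape_conformity` is a hypothetical name.

```python
def shape_conformity(change_region, model_region):
    """Equation (3): (overlap - non-overlap) / union, with areas as pixel counts.

    change_region, model_region: sets of (x, y) pixels.
    The result ranges from -1.0 (disjoint regions) to 1.0 (identical regions).
    """
    overlap = change_region & model_region
    # Pixels in exactly one of the two regions (regions 222 and 223 in FIG. 6).
    non_overlap = change_region ^ model_region
    union_area = len(change_region) + len(model_region) - len(overlap)
    if union_area == 0:
        return 0.0  # both regions empty; returning 0 is an assumed convention
    return (len(overlap) - len(non_overlap)) / union_area


example_change = {(0, 0), (1, 0), (2, 0)}
example_model = {(1, 0), (2, 0), (3, 0)}
# Overlap 2, non-overlap 2, union 4.
example_conformity = shape_conformity(example_change, example_model)
```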

More preferably, the shape conformity is calculated with the degrees of overlap and non-overlap weighted by the difference value at each pixel. That is, the optimum arrangement estimation means 52 can also calculate the shape conformity according to the following equation.
Shape conformity = degree of overlap − degree of non-overlap
= (sum of difference values in overlapping region − sum of difference values in non-overlapping region) / (area of change region + area of model region − area of overlapping region) (4)
When the change region extraction means 53 has extracted the change region by background correlation processing, the value obtained by subtracting the correlation value from 1.0 is used in place of the difference value.

Alternatively, the shape conformity can be the edge similarity between the captured image and the model image. In that case, the optimum arrangement estimation means 52 receives the captured image from the image acquisition means 30 and extracts edges from both the captured image and the model image. Then, for example, for each pixel at which a valid edge was extracted from the model image, it calculates the absolute value of the difference from the edge extraction result at the corresponding pixel of the captured image, sums these absolute values, and divides the sum by the number of pixels used to obtain the edge similarity.

Furthermore, the shape conformity can be a weighted sum of the edge similarity and the difference between the degrees of overlap and non-overlap.

Next, the concealment degree included in the penalty term of the similarity will be described.

If the object models in a model image overlap one another too much, the shape conformity changes little as the arrangement number increases or decreases, which leads to erroneous estimation of object positions. The concealment degree is the degree to which object models overlap one another in the model image, and serves as a measure for suppressing excessive overlap of object models.

The model image generation means 51 calculates, for each arrangement, the concealment degree representing how much the object models overlap one another in the model image according to the following equation, and the optimum arrangement estimation means 52, as shown in equation (2), calculates a lower similarity for an arrangement the larger its concealment degree.
Concealment degree = area of the overlapping region between models / area of the union of the model regions (5)

FIG. 7 shows how the concealment degree is calculated for the model image 210 of FIG. 6.
Image 300 represents the overlaps among the six model regions. The hatched regions 301, 302 and 303 together form the overlapping region between models.
Image 310 represents the logical OR of the six model regions. The hatched regions 311, 312 and 313 together form the union of the model regions.
The model image generation means 51 counts the pixels of regions 301, 302 and 303 to obtain the area of the overlapping region between models, counts the pixels of regions 311, 312 and 313 to obtain the area of the union of the model regions, and applies these areas to equation (5) to calculate the concealment degree.
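Equation (5) can be sketched by counting how many models cover each pixel. This is an illustrative sketch, assuming each model region is a set of pixel coordinates and that a pixel covered by two or more models belongs to the overlapping region; `concealment_degree` is a hypothetical name.

```python
from collections import Counter


def concealment_degree(model_regions):
    """Equation (5): overlapping area between models / area of their union.

    model_regions: list of sets of (x, y) pixels, one set per object model.
    """
    # Number of models covering each pixel.
    cover = Counter(p for region in model_regions for p in region)
    union_area = len(cover)
    if union_area == 0:
        return 0.0  # no model drawn; returning 0 is an assumed convention
    overlap_area = sum(1 for n in cover.values() if n >= 2)
    return overlap_area / union_area


# Two models sharing one of three covered pixels.
example_concealment = concealment_degree([{(0, 0), (1, 0)}, {(1, 0), (2, 0)}])
```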

In this way, the optimum arrangement can be estimated on the basis of a similarity that includes the concealment degree, which more reliably prevents erroneous estimation caused by fitting more object models than the actual number of objects.

Next, the estimation error of the arrangement number included in the penalty term of the similarity will be described.

The arrangement number in each model image is set on the basis of the estimated density and therefore contains the density estimation error. The estimation error of the arrangement number is a measure of how much of the estimation error of the density distribution is contained in the arrangement number of each model image.

The optimum arrangement estimation means 52 calculates a lower similarity for an arrangement the larger the estimation error corresponding to its arrangement number.

Specifically, the optimum arrangement estimation means 52 calculates the estimation error of the arrangement number according to, for example, the following equation.
Estimation error of the arrangement number = −exp{−α (representative arrangement number − arrangement number)²} (6)
Representative arrangement number = Σ_{i∈R} a_i × c_i
Here R is the set of pixels in the change region, i an arbitrary pixel in R, a_i the actual size of pixel i, and c_i the representative value of the density at pixel i. The representative density can be, for example, 1 person/m² for the low-density class, 3 persons/m² for the medium-density class, and 6 persons/m² for the high-density class. α is a constant greater than 0, determined in advance on the basis of prior experiments.
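Equation (6) can be sketched as follows, assuming the representative arrangement number is the sum over the change region of actual size times the representative density of each pixel's class, as the symbol definitions suggest. The table `REPRESENTATIVE_DENSITY` uses the example values from the text; the function name and the value of `alpha` are hypothetical.

```python
import math

# Example representative densities in persons per square metre.
REPRESENTATIVE_DENSITY = {"low": 1.0, "medium": 3.0, "high": 6.0}


def count_estimation_error(pixels, count, alpha,
                           representatives=REPRESENTATIVE_DENSITY):
    """Equation (6): -exp(-alpha * (representative count - count)^2).

    pixels: iterable of (actual_size_m2, density_class) pairs.
    The value is -1 when count equals the representative count and
    approaches 0 as count moves away from it.
    """
    representative_count = sum(a * representatives[c] for a, c in pixels)
    return -math.exp(-alpha * (representative_count - count) ** 2)


# 100 medium and 50 high pixels of 0.01 m^2: representative count is about 6.
example_px = [(0.01, "medium")] * 100 + [(0.01, "high")] * 50
example_err_match = count_estimation_error(example_px, 6, alpha=0.5)
example_err_far = count_estimation_error(example_px, 4, alpha=0.5)
```

Because the error enters equation (2) with a positive weight inside the subtracted term, an arrangement number close to the representative value raises the similarity and one far from it does not.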

In this way, the optimum arrangement can be estimated on the basis of a similarity that includes the estimation error of the arrangement number, which prevents more object models than the actual number of objects from being fitted in departure from the density distribution and more reliably prevents erroneous estimation of individual object positions.

As described above, the optimum arrangement estimation means 52 outputs object position information based on the positions of the object models in the arrangement with the highest similarity among the plurality of arrangements (optimum arrangement).
For example, as object position information that is easy for monitoring staff to view, the optimum arrangement estimation means 52 generates and outputs a distribution image in which each object model drawn in the model image of the optimum arrangement is color-coded according to its density class.
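Selecting the optimum arrangement amounts to evaluating equation (2) for every candidate and taking the maximum. The following is a minimal sketch, assuming each candidate is a dictionary that already holds its shape conformity, concealment degree and estimation error; the dictionary keys, the function names and the weight values are hypothetical.

```python
def similarity(shape_conformity, concealment, count_error, w_h, w_n):
    """Equation (2): shape conformity minus the weighted penalty."""
    return shape_conformity - (w_h * concealment + w_n * count_error)


def best_arrangement(candidates, w_h, w_n):
    """Return the candidate with the highest similarity.

    candidates: non-empty list of dicts with keys 'positions',
    'shape_conformity', 'concealment' and 'count_error'.
    """
    return max(
        candidates,
        key=lambda c: similarity(
            c["shape_conformity"], c["concealment"], c["count_error"], w_h, w_n
        ),
    )


example_candidates = [
    {"positions": [(10, 20), (12, 22)], "shape_conformity": 0.8,
     "concealment": 0.5, "count_error": -0.2},
    {"positions": [(10, 20), (30, 22)], "shape_conformity": 0.7,
     "concealment": 0.1, "count_error": -1.0},
]
# The second candidate fits slightly worse but overlaps less and matches
# the expected count, so it scores higher with these example weights.
example_best = best_arrangement(example_candidates, w_h=0.5, w_n=0.5)
```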

The object position information may be the object positions themselves indicated by the optimum arrangement. Alternatively, it may be, for each object model in the optimum arrangement, the region that does not overlap other object models. It may also be data containing two or more of the kinds of data described above.

The object position output means 31 sequentially outputs the object position information input from the optimum arrangement estimation means 52 to the output unit 6, and the output unit 6 displays the information input from the object position output means 31. For example, the object position information is sent and received via the Internet and displayed on the output unit 6. By viewing the displayed distribution image, monitoring staff can quickly grasp where in the monitoring space congestion is occurring and what the situation there is, and respond by, for example, dispatching security guards to that location or increasing their number.

<Operation of Image Monitoring Device 1>
The operation of the image monitoring device 1 is described with reference to the flowcharts of FIGS. 8 to 10.

The image monitoring device 1 is started while the event venue is unoccupied. Once it starts operating, the imaging unit 2 installed at the event venue captures the monitoring space at predetermined intervals and sequentially transmits the captured images to the image analysis center where the image processing unit 5 is installed. Each time it receives a captured image, the image processing unit 5 repeats the operation shown in the flowchart of FIG. 8.

First, the communication unit 3 operates as the image acquisition means 30 and waits to receive a captured image from the imaging unit 2. On acquiring a captured image, the image acquisition means 30 outputs it to the image processing unit 5 (step S1).

On receiving the captured image, the image processing unit 5 operates as the change region extraction means 53, reads the background image from the background image storage means 40 of the storage unit 4, compares it with the captured image, and extracts the change region (step S2). Immediately after startup, however, the change region extraction means 53 skips the extraction and stores the captured image in the background image storage means 40 as the background image.

If no change region is extracted (NO in step S3), steps S4 to S8 are skipped. In this case, the change region extraction means 53 replaces the background image in the background image storage means 40 with the captured image from which no change region was extracted.

If a change region is extracted (YES in step S3), the image processing unit 5 also operates as the model image generation means 51 and the optimum arrangement estimation means 52. The change region extraction means 53 passes the change region information to the model image generation means 51 and the difference value information to the optimum arrangement estimation means 52. Both means hold this information, and the process proceeds to step S4. The image processing unit 5 also operates as the density estimation means 50, to which the captured image is input. In addition, the change region extraction means 53 updates the background image using the captured image.
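The branch in steps S2 and S3 can be illustrated with a minimal Python sketch. It assumes grayscale images held as nested lists and a simple per-pixel absolute difference against a threshold; the function names, the threshold value, and the decision to leave the background untouched when a change is found are assumptions made for illustration, since this section does not state the extraction criterion or the update rule.

```python
def extract_change_mask(frame, background, threshold=30):
    """Return a binary mask marking pixels that differ from the background.

    frame, background: 2-D lists of grayscale values with the same shape.
    threshold: hypothetical difference threshold (not given in the text).
    """
    return [
        [1 if abs(f - b) >= threshold else 0 for f, b in zip(frow, brow)]
        for frow, brow in zip(frame, background)
    ]


def process_frame(frame, background, threshold=30):
    """One pass of steps S2-S3: returns (mask, new_background, has_change)."""
    if background is None:
        # Immediately after start-up: skip extraction, store the frame as background.
        return None, frame, False
    mask = extract_change_mask(frame, background, threshold)
    has_change = any(any(row) for row in mask)
    if not has_change:
        # No change region: replace the background with the captured image.
        return mask, frame, False
    # Change region found: the text updates the background from the captured
    # image without saying how here, so this sketch leaves it unchanged.
    return mask, background, True
```

The mask is not split into separate change regions here; a connected-component labelling step would be one way to obtain the individual regions that step S5 iterates over.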

The density estimation means 50 scans the input captured image with the density estimator and estimates the density distribution (step S4). The image processing unit 5 then also operates as the model image generation means 51, and the density distribution is passed from the density estimation means 50 to the model image generation means 51.

Having received the density distribution, the model image generation means 51 sets each held change region in turn as the target change region (step S5) and controls the optimum arrangement estimation for that region (step S6).

The optimum arrangement estimation for the target change region will be described with reference to the flowchart of FIG. 9.

First, the image processing unit 5 also operates as the size conversion means 54. The model image generation means 51 designates each pixel of the target change region to the size conversion means 54 so that the actual size of each pixel is calculated (step S60).

Next, the model image generation means 51 determines the range of the number of arrangements (the number of object models to place) for the target change region (step S61). It calculates the lower limit arrangement number by substituting the actual size of each pixel in the target change region and the lower limit of that pixel's density class into a_i and d_i of equation (1), respectively. Likewise, it calculates the upper limit arrangement number by substituting the actual size of each pixel and the upper limit of that pixel's density class into a_i and d_i of equation (1), respectively.

Furthermore, the model image generation means 51 substitutes the actual size of each pixel in the target change region and the representative value of that pixel's density class into a_i and c_i of equation (7), respectively, to calculate the representative value of the number of arrangements (step S62).
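Steps S61 and S62 can be sketched as follows. Equations (1) and (7) are not reproduced in this section, so the sketch assumes that both sum the product of per-pixel actual size and per-pixel density over the target change region. The representative values follow the example given in modification (5); the class bounds, the rounding rule, and all names are hypothetical.

```python
import math

# Per-class density values in persons per square metre. The representative
# values follow the example in modification (5); the lower/upper bounds are
# hypothetical placeholders.
DENSITY_CLASSES = {
    "background": {"lower": 0.0, "upper": 0.0, "rep": 0.0},
    "low": {"lower": 0.0, "upper": 2.0, "rep": 1.0},
    "medium": {"lower": 2.0, "upper": 4.0, "rep": 3.0},
    "high": {"lower": 4.0, "upper": 8.0, "rep": 6.0},
}


def count_from_density(pixels, key):
    """Sum a_i * density_i over the pixels of the target change region.

    pixels: iterable of (real_area_m2, class_name) pairs, one per pixel.
    key: which per-class value to use ("lower", "upper" or "rep").
    """
    return sum(area * DENSITY_CLASSES[cls][key] for area, cls in pixels)


def arrangement_range(pixels):
    """Return (lower_count, upper_count, representative_count)."""
    pixels = list(pixels)
    # Rounding outward to integers is an assumption of this sketch.
    lower = math.floor(count_from_density(pixels, "lower"))
    upper = math.ceil(count_from_density(pixels, "upper"))
    rep = count_from_density(pixels, "rep")
    return lower, upper, rep
```

For example, four pixels of 0.5 m^2 in the low density class plus four pixels of 0.25 m^2 in the medium density class give a range of 2 to 8 with a representative value of 5 under these placeholder bounds.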

Subsequently, the model image generation means 51 sets each integer within the range determined in step S61 in turn as the number of arrangements (step S63) and performs the loop of steps S63 to S73.

The model image generation means 51 prepares a counter T for the number of iterations, initializes it to 0 (step S64), and starts controlling the iterative process.

The model image generation means 51 assigns positions to the object models by randomly setting, within the target change region, as many positions as the number of arrangements set in step S63 (step S65).

The model image generation means 51 prepares a model image of the same size as the captured image and draws an object model at each position set in step S65 (step S66). To do so, it refers to the object model storage means 41 of the storage unit 4 and reads the camera parameters from the camera parameter storage means 411 and the three-dimensional model from the three-dimensional model storage means 410. Using the camera parameters, it converts each position into a position in the virtual space, places a three-dimensional model at each converted position, and then renders the virtual space containing the three-dimensional models into the model image.

The model image generation means 51 also obtains the area of the overlap region between the drawn object models and the area of the union region of the drawn object models, and substitutes them into equation (5) to calculate the degree of concealment in the model image generated in step S66 (step S67). The image processing unit 5 also operates as the optimum arrangement estimation means 52, and the model image generation means 51 passes to it the model image, the overlap region between object models, the non-overlap region (the union region of the object models minus the overlap region), the number of arrangements, the position of each object model, the representative number of arrangements, and the degree of concealment.
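The overlap and union regions used in step S67 can be illustrated with pixel sets. Equation (5) is not reproduced in this section; the sketch assumes it is the ratio of overlap area to union area, and represents each drawn object model as a set of image pixels. The function names are hypothetical.

```python
from collections import Counter


def overlap_and_non_overlap(model_pixel_sets):
    """Split the drawn pixels into overlapping and non-overlapping parts.

    model_pixel_sets: list of sets of (x, y) pixels, one set per drawn model.
    Returns (overlap, non_overlap) as sets of pixels.
    """
    hits = Counter()
    for pixels in model_pixel_sets:
        hits.update(pixels)  # count how many models cover each pixel
    overlap = {p for p, n in hits.items() if n >= 2}
    non_overlap = {p for p, n in hits.items() if n == 1}
    return overlap, non_overlap


def concealment_degree(model_pixel_sets):
    """Assumed form of equation (5): overlap area divided by union area."""
    overlap, non_overlap = overlap_and_non_overlap(model_pixel_sets)
    union_area = len(overlap) + len(non_overlap)
    return len(overlap) / union_area if union_area else 0.0
```

The same two sets correspond to the overlap region and non-overlap region that are handed to the optimum arrangement estimation means 52 for the shape fitness calculation.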

The optimum arrangement estimation for the target change region will be described further with reference to the flowchart of FIG. 10.

The optimum arrangement estimation means 52 substitutes the input number of arrangements and representative number of arrangements into equation (6) to calculate the estimation error of the number of arrangements (step S68).

The optimum arrangement estimation means 52 also uses the input overlap region, non-overlap region, and difference values to obtain the sum of the difference values in the overlap region and the sum of the difference values in the non-overlap region, and substitutes these into equation (4) to calculate the shape fitness (step S69).

The optimum arrangement estimation means 52 then substitutes the shape fitness calculated in step S69, the estimation error calculated in step S68, and the input degree of concealment into equation (2) to calculate the similarity for the model image generated in step S66, and stores the model image, the position of each object model, and the similarity in the storage unit 4 in association with one another (step S70).

Once the similarity of the model image has been calculated, the model image generation means 51 increments the iteration count T by 1 (step S71) and compares it with the predetermined number of iterations T_MAX (step S72).

If the iteration count T is less than T_MAX (NO in step S72), the model image generation means 51 returns the process to step S65 and repeats the iteration.

If, on the other hand, the iteration count T has reached T_MAX (YES in step S72), the model image generation means 51 checks whether every number of arrangements in the range has been set in step S63 (step S73).

If some number of arrangements has not yet been set (NO in step S73), the model image generation means 51 returns the process to step S63 and processes the next number of arrangements.

If, on the other hand, all numbers of arrangements have been set (YES in step S73), the model image generation means 51 notifies the optimum arrangement estimation means 52 that it has finished generating the plural model images for the target change region.

On receiving this notification, the optimum arrangement estimation means 52 determines the arrangement with the maximum similarity (step S74). It selects the maximum value from the similarities recorded in step S70 and stores the model image and object model positions associated with that similarity in the storage unit 4 as the object positions in the target change region. It then clears the information recorded in step S70 and advances the process to step S7 of FIG. 8.
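The control flow of steps S63 to S74 amounts to a random search over arrangements. The sketch below is an illustration under stated assumptions: the `similarity` callable stands in for rendering plus equation (2), which are not reproduced here, and the function keeps a running maximum rather than storing every candidate and selecting afterwards as the text describes. All names are hypothetical.

```python
import random


def estimate_best_arrangement(region_pixels, count_range, t_max, similarity, rng=None):
    """Random search over arrangements, following steps S63 to S74.

    region_pixels: list of (x, y) positions inside the target change region.
    count_range: (lower, upper) inclusive range of the number of arrangements.
    t_max: number of random arrangements tried per count (T_MAX).
    similarity: callable taking a list of positions and returning a score;
        it stands in for model rendering and equation (2).
    """
    rng = rng or random.Random()
    lower, upper = count_range
    best_score, best_positions = float("-inf"), None
    for count in range(lower, upper + 1):      # step S63
        for _ in range(t_max):                 # steps S64, S71, S72
            # Step S65: assign a random position to each object model.
            positions = [rng.choice(region_pixels) for _ in range(count)]
            score = similarity(positions)      # steps S66 to S70
            if score > best_score:             # step S74, done incrementally
                best_score, best_positions = score, positions
    return best_positions, best_score
```

Keeping only the running maximum avoids holding all T_MAX model images per count in memory; the selected arrangement is the same as long as ties are broken in favour of the earliest candidate.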

The description now returns to FIG. 8. The model image generation means 51 checks whether all the held change regions have been processed (step S7). If an unprocessed change region remains (NO in step S7), it returns the process to step S5 and processes the next change region. When the next change region is processed, the object position information for that region is appended in step S74.

If, on the other hand, all change regions have been processed (YES in step S7), the optimum arrangement estimation means 52 outputs the object position information recorded in step S74 to the communication unit 3 (step S8). The communication unit 3 then operates as the object position output means 31 and transmits the object position information to the output unit 6. The output unit 6 conveys the object position information to the observer, for example by displaying it.

When the above processing is complete, the process returns to step S1 and the next captured image is processed.

<Modifications>
(1) In the above embodiment, the object to be estimated is a person, but it is not limited to this; the object may also be a vehicle or an animal such as a cow or a sheep.

(2) In the above embodiment and its modifications, the imaging unit 2 consists of a single camera, but the imaging unit 2 can also consist of a plurality of cameras having a common field of view. In that case, the background image storage means 40 stores a background image for each camera, the camera parameter storage means (not shown) referred to by the change region extraction means 53 stores the camera parameters of each camera, and the change region extraction means 53 extracts change regions for each camera. Likewise, the camera parameter storage means (not shown) referred to by the density estimation means 50 stores the camera parameters of each camera, and the density estimation means 50 estimates the density distribution for each camera. The camera parameter storage means 411 and 420 also store the camera parameters of each camera; the size conversion means 54 calculates the actual size for each camera, and the model image generation means 51 calculates the range of the number of arrangements for each camera.
The model image generation means 51 then determines the overall range from the smallest lower limit arrangement number and the largest upper limit arrangement number, and places as many object models as each number of arrangements in the virtual space. It renders this virtual space using the camera parameters of each camera to generate a model image for each camera, and calculates the degree of concealment for each camera. The optimum arrangement estimation means 52 calculates the shape fitness, the estimation error of the number of arrangements, and the similarity for each camera, and determines the optimum arrangement from the model images whose similarity summed over all cameras is the largest. In this way, the optimum arrangement can be determined comprehensively from similarities seen from a plurality of viewpoints with different occlusion states, which improves the accuracy of object position estimation.
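The two multi-camera rules in modification (2), merging the per-camera ranges and summing the per-camera similarities, can be sketched briefly. The data layout (one list of similarities per camera, indexed by candidate arrangement) and the function names are assumptions made for illustration.

```python
def combined_count_range(per_camera_ranges):
    """Merge per-camera (lower, upper) ranges: smallest lower, largest upper."""
    lowers, uppers = zip(*per_camera_ranges)
    return min(lowers), max(uppers)


def best_arrangement_multi_camera(per_camera_scores):
    """Pick the arrangement whose similarity summed over all cameras is largest.

    per_camera_scores: dict mapping a camera id to a list of similarities,
        where index k in every list refers to the same candidate arrangement.
    Returns (index_of_best_arrangement, summed_similarity).
    """
    totals = [sum(scores) for scores in zip(*per_camera_scores.values())]
    best = max(range(len(totals)), key=totals.__getitem__)
    return best, totals[best]
```

Because each candidate arrangement lives in the shared virtual space, the same index can be scored from every viewpoint, which is what lets an arrangement occluded in one camera be confirmed by another.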

(3) In the above embodiment and its modifications, an imaging unit 2 with a fixed field of view is used, but an imaging unit 2 with a variable field of view may also be used. In that case, the imaging unit 2 outputs the camera parameters after the field of view changes to the image processing unit 5 via the communication unit 3, and the image processing unit 5 replaces the camera parameters in the camera parameter storage means (not shown) referred to by the density estimation means 50, in the camera parameter storage means (not shown) referred to by the change region extraction means 53, and in the camera parameter storage means 411 and 420 with the received camera parameters. Also in that case, the change region extraction means 53 generates the background image by rendering an environment model.

(4) In the above embodiment and its modifications, the model image generation means 51 sets the change region as the arrangement region, but it can instead set the region other than the background class as the arrangement region. That is, the model image generation means 51 sets, as the arrangement region, the region for which a density greater than 0 is estimated in the density distribution. Specifically, it sets as the arrangement region the region consisting of pixels whose estimated density is the "low density", "medium density", or "high density" class. When the region other than the background class is used as the arrangement region, the background image storage means 40 and the change region extraction means 53 can be omitted.

(5) In the above embodiment and its modifications, the estimation error of the number of arrangements is calculated according to equations (6) and (7), but the degree of difference between the density distribution and the arrangement can also be calculated as the estimation error of the number of arrangements. In that case, the density estimation means 50 also outputs the density distribution to the optimum arrangement estimation means 52. For each arrangement and for each pixel, the optimum arrangement estimation means 52 sets an estimation extraction window and counts the object model positions inside it to obtain the in-window arrangement count. Then, for each arrangement, it calculates as the estimation error of the number of arrangements, according to the following equation, the sum over pixels of the difference between the in-window arrangement count and the representative density value of the corresponding pixel in the density distribution, divided by the total number of pixels.

Here, N is the total number of pixels in the captured image, i is an arbitrary pixel in the captured image, m_i is the in-window arrangement count calculated at pixel i, and c_i is the representative density value at pixel i. The representative density values can be set to, for example, 0 persons/m^2 for the background class, 1 person/m^2 for the low density class, 3 persons/m^2 for the medium density class, and 6 persons/m^2 for the high density class.
Alternatively, the optimum arrangement estimation means 52 may calculate the estimation error by obtaining, for each pixel, the density class value corresponding to the in-window arrangement count. For example, if an estimation extraction window set for a certain pixel in a certain arrangement contains two object model positions, the value of that pixel is the value indicating the "low density" class. The optimum arrangement estimation means 52 compares the per-pixel density class values obtained in this way with the values of the corresponding pixels in the density distribution, and calculates the number of pixels whose values differ, divided by the total number of pixels, as the estimation error of the number of arrangements.
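The first error measure in modification (5) can be written out as a small Python sketch. The equation itself is not reproduced in this text; from the verbal description it is taken here to be (1/N) Σ_i |m_i − c_i|, where the use of an absolute difference is an assumption. The square window, its half-width, and the function names are also hypothetical, and the sketch compares the window count with the density value directly, as the description does.

```python
def window_count(positions, center, half):
    """Count model positions inside the square window centred on `center`."""
    cx, cy = center
    return sum(1 for x, y in positions
               if abs(x - cx) <= half and abs(y - cy) <= half)


def density_estimation_error(positions, rep_density, half=1):
    """Mean absolute difference between window counts m_i and densities c_i.

    positions: list of (x, y) object-model positions in one arrangement.
    rep_density: 2-D list; rep_density[y][x] is the representative value c_i.
    half: hypothetical half-width of the estimation window, in pixels.
    """
    height, width = len(rep_density), len(rep_density[0])
    total = 0.0
    for y in range(height):
        for x in range(width):
            m_i = window_count(positions, (x, y), half)
            total += abs(m_i - rep_density[y][x])
    return total / (width * height)  # divide by the total pixel count N
```

An arrangement whose local counts match the estimated densities everywhere scores 0, and the error grows as models are placed where the density distribution says few objects exist, or are missing where it says many do.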

(6) In the above embodiment and its modifications, the number of arrangements is set based on the density distribution and the actual size, but it can also be set by search, based on the estimation error of the number of arrangements described above. For example, the model image generation means 51 loops over a provisional number of arrangements from 1 up to a predetermined upper limit (for example, the number of three-dimensional models that can be placed without overlapping in the virtual space corresponding to the captured image). For each provisional number, it assigns a random position within the arrangement region to each of that many object models to set T_MAX provisional arrangements, calculates an estimation error such as that of equation (6) or equation (8) for each of the T_MAX provisional arrangements, compares it with a predetermined threshold ε, and generates model images for the provisional arrangements whose estimation error is no greater than ε. In this way too, the model image generation means 51 can assign positions in the arrangement region to a number of object models that accords with the density distribution, set a plurality of arrangements, and draw the object models in those arrangements to generate model images.
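The search in modification (6) differs from the main flow in that the estimation error acts as a filter before any model image is rendered. A minimal sketch, assuming the error is supplied as a callable and that all names are hypothetical:

```python
import random


def candidate_arrangements(region_pixels, max_count, t_max, error_fn, eps, rng=None):
    """Keep only provisional arrangements whose estimation error is <= eps.

    region_pixels: list of (x, y) positions inside the arrangement region.
    max_count: predetermined upper limit on the provisional number.
    t_max: provisional arrangements generated per provisional number (T_MAX).
    error_fn: callable returning the estimation error of a list of positions,
        standing in for equation (6) or equation (8).
    eps: threshold on the estimation error.
    """
    rng = rng or random.Random()
    kept = []
    for count in range(1, max_count + 1):
        for _ in range(t_max):
            positions = [rng.choice(region_pixels) for _ in range(count)]
            if error_fn(positions) <= eps:
                kept.append(positions)  # only these go on to model rendering
    return kept
```

Filtering first means the comparatively expensive rendering and similarity calculation run only on arrangements that already agree with the density distribution.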

(7) In the above embodiment and its modifications, GLCM features are given as an example of the features learned by the density estimator and the estimation features extracted by the density estimation means 50. In place of GLCM features, various other features can be used, such as local binary pattern (LBP) features, Haar-like features, luminance patterns, or HOG (Histograms of Oriented Gradients) features, or a feature combining several of the GLCM features and these.

(8) In the above embodiment and its modifications, a density estimator trained by the multi-class SVM method is given as an example. In place of the multi-class SVM method, various density estimators can be used, such as one trained by a decision-tree-based random forest method, a multi-class AdaBoost method, or a multi-class logistic regression method.
Alternatively, the density estimation means 50 can be realized using a method such as a CNN (Convolutional Neural Network), which expresses the feature extraction and the estimation by the density estimator in a single network.

(9) In the above embodiment and its modifications, the model image generation means 51 calculates the representative value c_i of the number of arrangements from the density estimate of each pixel output by the density estimation means 50. The density estimation means 50 can additionally output the score of each class calculated in the course of estimation, and the model image generation means 51 can correct the representative value c_i based on these scores.
The class scores are as follows: a background score, representing the likelihood that the image from which the estimation features were extracted belongs to the "background" class rather than the other classes; a low density score, representing the likelihood that it belongs to the "low density" class rather than the other classes; a medium density score, representing the likelihood that it belongs to the "medium density" class rather than the other classes; and a high density score, representing the likelihood that it belongs to the "high density" class rather than the other classes. The class with the highest of these scores is the density estimate.
The model image generation means 51 corrects the representative value c_i upward as the score of the class indicated by the density estimate is higher, and downward as it is lower.

(10) In the above embodiment and its modifications, the density estimator estimates four density classes, but the classes may be divided more finely.

(11) In the above embodiment and its modifications, the density estimation means 50 uses a density estimator that classifies into multiple classes. Instead, a regression-type density estimator that regresses the density value from the features can be used. That is, the density estimator can be one that has learned the parameters of a regression function for obtaining density from features by ridge regression, support vector regression, a regression-tree-based random forest method, or the like.
In that case, the model image generation means 51 uses the density value output by the density estimator as the representative value c_i of the number of arrangements. For the lower and upper limit arrangement numbers, it uses the density value output by the density estimator minus and plus a predetermined value; for example, the lower limit arrangement number is (c_i - 2) (or 0 if that would be 0 or less) and the upper limit arrangement number is (c_i + 2).
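The bounds in modification (11) reduce to a clamp around the regressed value. A short sketch, with a hypothetical function name and the margin of 2 taken from the example in the text:

```python
def count_bounds_from_regression(c_i, margin=2):
    """Return (lower, upper) arrangement numbers around the regressed value c_i.

    The lower bound is c_i - margin, clamped to 0 when it would be 0 or less;
    the upper bound is c_i + margin.
    """
    lower = max(c_i - margin, 0)
    upper = c_i + margin
    return lower, upper
```

With a regressed value of 1.5 the range becomes 0 to 3.5, and with a value of 5 it becomes 3 to 7; how non-integer bounds are rounded before looping over counts is not stated in the text.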

(12) In the above embodiment and its modifications, the model image generation means 51 assigns random positions to the object models at each iteration. Instead, it may assign positions shifted slightly from those of the previous iteration, search for positions stochastically by the MCMC (Markov chain Monte Carlo) method with reference to the similarity of the previous arrangement, or improve positions successively by hill climbing.
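Of the alternatives named in modification (12), hill climbing is the simplest to illustrate. The sketch below moves one object model by at most one pixel per step and keeps the move only if the similarity improves. The move rule, the step count, and the `similarity` callable (standing in for rendering plus the similarity calculation) are assumptions made for illustration, and the starting arrangement must contain at least one position.

```python
import random


def hill_climb(positions, region_pixels, similarity, steps, rng=None):
    """Improve an arrangement by accepting only similarity-increasing moves.

    positions: non-empty list of (x, y) starting positions.
    region_pixels: iterable of (x, y) positions inside the arrangement region.
    similarity: callable scoring a list of positions (higher is better).
    steps: number of single-model moves to attempt.
    """
    rng = rng or random.Random()
    region = set(region_pixels)
    best = list(positions)
    best_score = similarity(best)
    for _ in range(steps):
        candidate = list(best)
        k = rng.randrange(len(candidate))
        x, y = candidate[k]
        moved = (x + rng.choice((-1, 0, 1)), y + rng.choice((-1, 0, 1)))
        if moved not in region:
            continue  # keep every model inside the arrangement region
        candidate[k] = moved
        score = similarity(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

An MCMC variant would differ mainly in the acceptance rule, occasionally accepting a move that lowers the similarity so that the search can leave a local maximum.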

DESCRIPTION OF SYMBOLS
1 ... Image monitoring device
2 ... Imaging unit
3 ... Communication unit
4 ... Storage unit
5 ... Image processing unit
6 ... Output unit
30 ... Image acquisition means
31 ... Object position output means
40 ... Background image storage means
41 ... Object model storage means
50 ... Density estimation means
51 ... Model image generation means
52 ... Optimum arrangement estimation means
53 ... Change region extraction means
54 ... Size conversion means
410 ... Three-dimensional model storage means
411 ... Camera parameter storage means
412 ... Model image storage means
420 ... Camera parameter storage means
421 ... Actual size storage means
540 ... Actual size calculation means

Claims

An object position estimation apparatus that estimates the position of each individual object from a captured image of an estimation target space in which congestion of predetermined objects may occur, comprising:
density estimation means for estimating the density distribution of the objects captured in the captured image, using a density estimator that has learned in advance, for each predetermined density, the image features of density images obtained by capturing a space in which the objects are present at that density;
object model storage means storing an object model imitating the object;
model image generation means for setting an arrangement region in the captured image, assigning positions in the arrangement region to a number of the object models that accords with the distribution so as to set a plurality of arrangements, and drawing the object models in each of the plurality of arrangements to generate model images; and
optimum arrangement estimation means for calculating the similarity between the model image of each of the plurality of arrangements and the captured image, and outputting the arrangement with the highest similarity among the plurality of arrangements.

The object position estimation apparatus according to claim 1, wherein
the model image generation means sets the plurality of arrangements by assigning a plurality of different numbers of the object models within the range of estimation error assumed for the distribution, and
the optimum arrangement estimation means calculates the similarity of each arrangement to be lower the larger the estimation error corresponding to the number in that arrangement.

The object position estimation apparatus according to claim 1 or 2, wherein
the model image generation means calculates, for each arrangement, the degree of overlap between the object models in the model image, and
the optimum arrangement estimation means calculates the similarity of each arrangement to be lower the higher the degree of overlap in that arrangement.

The object position estimation apparatus according to claim 1, further comprising
size conversion means for converting the size of a local region in the captured image into its actual size in the estimation target space, wherein
the density estimation means estimates the density for each local region, and
the model image generation means determines the number to be assigned to the arrangement region from the actual size and the density of the local regions included in the arrangement region.

The object position estimation apparatus according to any one of claims 1 to 4, further comprising
change region extraction means for comparing the captured image with a background image of the estimation target space and extracting, from the captured image, a change region that differs from the background image by at least a predetermined criterion, wherein
the model image generation means sets the change region as the arrangement region.

The object position estimation apparatus according to any one of claims 1 to 5, wherein
the model image generation means sets, as the arrangement region, a region for which a density greater than 0 is estimated in the distribution.

The object position estimation apparatus according to any one of claims 1 to 6, wherein
the object model storage means stores the object model imitating the shape of the object, and
the optimum arrangement estimation means calculates the similarity according to how well the shapes of the object models drawn in the model image fit the arrangement region.