JP6851233B2

JP6851233B2 - Object position estimation device

Info

Publication number: JP6851233B2
Application number: JP2017051077A
Authority: JP
Inventors: 氏家 秀紀; 黒川 高晴; 前田 昌宏; 宗片 匠
Original assignee: Secom Co Ltd
Current assignee: Secom Co Ltd
Priority date: 2017-03-16
Filing date: 2017-03-16
Publication date: 2021-03-31
Anticipated expiration: 2037-03-16
Also published as: JP2018156236A

Description

The present invention relates to an object position estimation device that estimates the positions of predetermined objects, such as people, from an image, and more particularly to an object position estimation device that estimates the position of each individual object from an image of a space in which congestion may occur.

In spaces where congestion may occur, such as event venues, measures such as deploying more security guards to congested areas are needed to prevent accidents. To this end, surveillance cameras can be placed at various locations in the venue, the distribution of people can be estimated from the captured images, and the estimated distribution can be displayed, making it easier for observers to grasp the congestion situation.

Monitoring efficiency can be improved further by estimating the position of each individual person and then displaying a model imitating a person's shape at each estimated position, and/or analyzing the positional relationships among people (for example, forming a queue or surrounding something) and reporting the analysis results.

One method for estimating the position of each person from a captured image in which multiple people appear is to combine multiple models imitating a person and fit them to the captured image.

In the moving object tracking device described in Patent Document 1, change pixels are extracted by comparing a monitoring image with a background image, and moving object models imitating the shapes of the objects being tracked, as many as there are tracked objects, are combined and fitted at the positions of the change pixels to estimate the position of each moving object. The document further describes synthesizing the object regions estimated for the respective object positions, detecting the change pixels that lie outside the synthesized region and labeling them, and, if a label is large enough to be regarded as a moving object, taking the position of that label as the position of a newly appearing moving object.

Patent Document 1: Japanese Unexamined Patent Publication No. 2012-159958

However, for a captured image of a crowded space it is difficult to estimate how many models should be fitted, so people's positions cannot be estimated with high accuracy.

That is, in the method of fitting a combination of models to a change region, allowing combinations in which the models overlap heavily causes more models to be fitted than there are actual people, so the number of combined models must be limited to a plausible number.

In a captured image of a crowded space, however, many people may newly appear at the same time while occluding one another. This means that the relationship between the area of the change region caused by new appearances (the labels outside the synthesized region) and the number of newly appearing people can vary widely with the degree of occlusion. It has therefore been difficult to estimate, from the area of a label outside the synthesized region, how many models should be fitted to that label.

In addition, in a captured image of a crowded space, many people may disappear at the same time as many others newly appear. It has therefore been difficult to estimate the number of models to fit not only to the change region caused by new appearances but also to the change region as a whole, including its other parts.

Consequently, for a captured image of a crowded space, the number of fitted models cannot be limited to a plausible number, and people's positions cannot be estimated with high accuracy.

The present invention has been made in view of the above problems, and its object is to provide an object position estimation device that can accurately estimate the position of each individual object from a captured image of a space in which congestion may occur.

To solve this problem, the present invention provides an object position estimation device that estimates the position of each object from a captured image of an estimation target space in which congestion due to predetermined objects may occur. The device comprises: density estimation means that estimates the distribution of the density of the objects captured in the captured image, using a density estimator that has learned in advance, for each predetermined density, the features of density images of a space in which objects are present at that density; object model storage means that stores object models imitating the object; model image generation means that sets an arrangement region in the captured image, sets multiple arrangements by assigning positions within the arrangement region to a number of object models corresponding to the distribution, and draws the object models in each of the arrangements to generate model images; and optimum arrangement estimation means that calculates the similarity between the model image of each arrangement and the captured image and outputs the arrangement with the highest similarity.

In this object position estimation device, it is preferable that the model image generation means sets the multiple arrangements by assigning several different numbers of object models within the range of estimation error expected in the distribution, and that the optimum arrangement estimation means calculates the similarity of each arrangement so that it is lower the larger the estimation error corresponding to the number of models in that arrangement.
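The count-dependent penalty described above can be sketched as follows. This is a minimal illustration, not the patent's formula: it assumes the expected estimation error is expressed as a maximum deviation `max_error` from the density-based count and that the penalty grows linearly with the deviation; `candidate_counts`, `penalized_score` and `weight` are hypothetical names.

```python
def candidate_counts(estimated_count, max_error):
    """Model counts to try: the density-based estimate plus or minus max_error."""
    low = max(0, estimated_count - max_error)  # a negative count is meaningless
    return list(range(low, estimated_count + max_error + 1))


def penalized_score(similarity, count, estimated_count, weight=0.1):
    """Lower the similarity the further `count` strays from the estimate.

    The linear penalty and the default `weight` are illustrative assumptions.
    """
    return similarity - weight * abs(count - estimated_count)
```

With this penalty, an arrangement of 5 models scoring 0.80 loses to an arrangement of 3 models scoring 0.75 when the density map suggests 3 people.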

In this object position estimation device, it is preferable that the model image generation means calculates, for each arrangement, the degree of overlap between the object models in the model image, and that the optimum arrangement estimation means calculates the similarity of each arrangement so that it is lower the greater the degree of overlap in that arrangement.
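One possible way to quantify the overlap degree and apply it as a penalty is sketched below, assuming each placed model is represented by the set of pixel coordinates its silhouette covers in the model image. The overlap measure (the fraction of covered pixels shared by two or more models) and the linear `weight` are illustrative assumptions, not taken from the source.

```python
from collections import Counter


def overlap_degree(silhouettes):
    """Fraction of covered pixels that lie under two or more object models.

    silhouettes: iterable of pixel-coordinate collections, one per placed model.
    """
    coverage = Counter()
    for pixels in silhouettes:
        coverage.update(set(pixels))  # count each model at most once per pixel
    if not coverage:
        return 0.0
    shared = sum(1 for n in coverage.values() if n >= 2)
    return shared / len(coverage)


def overlap_penalized(similarity, silhouettes, weight=0.5):
    """Similarity lowered in proportion to the overlap degree (assumed linear)."""
    return similarity - weight * overlap_degree(silhouettes)
```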

Preferably, this object position estimation device further comprises size conversion means that converts the size of a local region in the captured image into its actual size in the estimation target space; the density estimation means estimates the density for each local region, and the model image generation means determines the number of models to assign to an arrangement region from the actual sizes and densities of the local regions contained in that arrangement region.
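As a rough sketch of how an arrangement count might be derived from local regions, the snippet below sums actual area times density over the local regions in an arrangement region and rounds the result. The per-class density values in `CLASS_DENSITY` are hypothetical: the text states only that each class is assigned a value in advance.

```python
# Hypothetical representative densities (persons per m^2) for each class.
CLASS_DENSITY = {"background": 0.0, "low": 1.0, "medium": 3.0, "high": 5.0}


def count_for_region(local_regions):
    """Expected number of people in an arrangement region.

    local_regions: iterable of (actual_area_m2, density_class) pairs, where the
    actual area comes from the size conversion step.
    """
    expected = sum(area * CLASS_DENSITY[cls] for area, cls in local_regions)
    return round(expected)
```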

Preferably, this object position estimation device further comprises change region extraction means that compares the captured image with a background image of the estimation target space and extracts, as a change region, the part of the captured image that differs from the background image by at least a predetermined criterion; the model image generation means sets the change region as the arrangement region.
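A minimal background-difference sketch of the change region extraction is shown below. It assumes grayscale images given as equal-sized 2-D lists and a fixed absolute-difference `threshold` as the "predetermined criterion"; the actual device works on color images and may use a different comparison.

```python
def extract_change_region(image, background, threshold):
    """Binary mask that is True where |image - background| >= threshold.

    image, background: 2-D lists of grayscale values with the same shape.
    """
    return [
        [abs(p - q) >= threshold for p, q in zip(row_img, row_bg)]
        for row_img, row_bg in zip(image, background)
    ]
```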

In this object position estimation device, it is preferable that the model image generation means sets, as the arrangement region, the region for which the distribution gives an estimated density greater than 0.

In this object position estimation device, it is preferable that the object model storage means stores object models imitating the shape of the object, and that the optimum arrangement estimation means calculates a similarity that increases with how well the shapes of the object models drawn in the model image fit the arrangement region.
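The shape fit degree is not defined numerically in this passage. The sketch below uses the Jaccard index (intersection over union) between the pixels covered by the rendered models and the pixels of the arrangement region as a stand-in; the choice of measure is an assumption.

```python
def shape_fit(model_pixels, region_pixels):
    """Jaccard index between rendered-model pixels and arrangement-region pixels.

    1.0 means the drawn models cover exactly the arrangement region; used here
    as an assumed stand-in for the shape fit degree.
    """
    model, region = set(model_pixels), set(region_pixels)
    union = model | region
    if not union:
        return 0.0  # nothing drawn and nothing to explain: treat as no evidence
    return len(model & region) / len(union)
```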

According to the present invention, the density distribution of objects is estimated from the captured image, and the position of each individual object is estimated using a number of object models corresponding to that density distribution, so the position of each object can be accurately estimated from a captured image of a space in which congestion may occur.

FIG. 1 is a block diagram showing the schematic configuration of the image monitoring device 1.
FIG. 2 is a functional block diagram of the image monitoring device 1.
FIG. 3 is a schematic diagram showing the contents stored in the object model storage means 41.
FIG. 4 is a schematic diagram showing the components of the size conversion means 54.
FIG. 5 is a diagram explaining a processing example of the actual size calculation means 540.
FIG. 6 is a diagram schematically showing an example of the shape fit degree calculation process.
FIG. 7 is a diagram showing how the concealment degree is calculated for a model image 210.
FIG. 8 is a flowchart explaining the operation of the image processing unit 5.
FIG. 9 is a first diagram explaining optimum arrangement estimation for a change region of interest.
FIG. 10 is a second diagram explaining optimum arrangement estimation for a change region of interest.

Hereinafter, as an example of a preferred embodiment including the object position estimation device of the present invention, an image monitoring device 1 is described that uses the object position estimation device to estimate the position of each person from captured images of an event venue and displays information on the estimated positions.

<Configuration of image monitoring device 1>
FIG. 1 is a block diagram showing the schematic configuration of the image monitoring device 1. The image monitoring device 1 comprises a photographing unit 2, a communication unit 3, a storage unit 4, an image processing unit 5, and an output unit 6.

The photographing unit 2 is a surveillance camera serving as photographing means: it is connected to the image processing unit 5 via the communication unit 3, photographs the monitoring space at predetermined time intervals to generate captured images, and sequentially inputs them to the image processing unit 5. For example, the photographing unit 2 is mounted on a pole installed at the event venue with a predetermined fixed field of view overlooking the monitoring space, and photographs the monitoring space with a frame period of 1 second to generate color images. Monochrome images may be generated instead of color images.

The communication unit 3 is a communication circuit, one end of which is connected to the image processing unit 5 and the other end of which is connected to the photographing unit 2 and the output unit 6 via a coaxial cable or a communication network such as a LAN (Local Area Network) or the Internet. The communication unit 3 acquires captured images from the photographing unit 2 and inputs them to the image processing unit 5, and outputs the estimation results received from the image processing unit 5 to the output unit 6.

The storage unit 4 is a memory device such as a ROM (Read Only Memory) or RAM (Random Access Memory) and stores various programs and data. The storage unit 4 is connected to the image processing unit 5 and exchanges this information with it.

The image processing unit 5 is composed of an arithmetic device such as a CPU (Central Processing Unit), DSP (Digital Signal Processor), or MCU (Micro Control Unit). The image processing unit 5 is connected to the storage unit 4 and the output unit 6; it operates as various processing and control means by reading and executing programs from the storage unit 4, and stores data in and reads data from the storage unit 4. The image processing unit 5 is also connected to the photographing unit 2 and the output unit 6 via the communication unit 3; it analyzes the captured images acquired from the photographing unit 2 via the communication unit 3 and causes the output unit 6 to display, via the communication unit 3, the positions and regions of the persons present in the monitoring space.

The output unit 6 is a display device such as a liquid crystal display or CRT (Cathode Ray Tube) display, and serves as display means that is connected to the image processing unit 5 via the communication unit 3 and displays the estimation results of the image processing unit 5. An observer views the displayed estimation results to judge whether congestion has occurred and, if necessary, takes measures such as changing staff deployment.

This embodiment illustrates an image monitoring device 1 with a one-to-one correspondence between photographing units 2 and image processing units 5, but in other embodiments the correspondence may be many-to-one or many-to-many.

<Function of image monitoring device 1>
FIG. 2 is a functional block diagram of the image monitoring device 1. The communication unit 3 functions as image acquisition means 30, object position output means 31, and the like, and the storage unit 4 functions as background image storage means 40, object model storage means 41, and the like. The image processing unit 5 functions as density estimation means 50, model image generation means 51, optimum arrangement estimation means 52, change region extraction means 53, size conversion means 54, and the like.

Each means is described below with reference to FIGS. 2 to 7.

The image acquisition means 30 sequentially acquires captured images from the photographing unit 2, which is the photographing means, and sequentially outputs them to the density estimation means 50 and the change region extraction means 53.

The density estimation means 50 extracts features for density estimation (estimation features) from the captured image input from the image acquisition means 30, inputs the extracted estimation features to a density estimator, estimates the distribution of human density (density distribution) from the resulting output values, and outputs the estimated density distribution to the model image generation means 51.

Specifically, the density estimation means 50 sets a window (estimation extraction window) at the position of each pixel of the captured image and calculates the estimation features of the captured image within each estimation extraction window, thereby extracting estimation features for every pixel. The estimation features are GLCM (Gray Level Co-occurrence Matrix) features.
It is desirable that the regions of the monitoring space captured in the individual estimation extraction windows be of the same size. That is, the density estimation means 50 preferably reads the camera parameters of the photographing unit 2 stored in advance in camera parameter storage means (not shown), transforms the captured image by a homography using the camera parameters so that every region of the captured image covers a region of the same size in the monitoring space, and then extracts the estimation features. The camera parameter storage means may be shared with the camera parameter storage means 411 and/or 420 described later.
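For readers unfamiliar with GLCM features, the following pure-Python sketch builds a normalized gray-level co-occurrence matrix for one window and one pixel displacement, then derives three common Haralick-style statistics. Which displacements and statistics the device actually uses is not stated; `offset`, `levels` and the chosen statistics are assumptions.

```python
def glcm(window, levels, offset=(0, 1)):
    """Normalized gray-level co-occurrence matrix of one window.

    window: 2-D list of ints already quantized to range(levels).
    offset: (dy, dx) displacement; (0, 1) pairs each pixel with its right neighbour.
    """
    dy, dx = offset
    rows, cols = len(window), len(window[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:
                counts[window[y][x]][window[y2][x2]] += 1
                total += 1
    norm = total or 1  # avoid dividing by zero when no pixel pair fits the window
    return [[c / norm for c in row] for row in counts]


def glcm_features(p):
    """Contrast, energy (angular second moment) and homogeneity of a normalized GLCM."""
    contrast = energy = homogeneity = 0.0
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            contrast += (i - j) ** 2 * v
            energy += v * v
            homogeneity += v / (1 + (i - j) ** 2)
    return contrast, energy, homogeneity
```

In the device, such a feature vector would be computed for the window around every pixel and passed to the density estimator.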

Then, for each pixel, the density estimation means 50 inputs the estimation features extracted for that pixel to the density estimator and obtains its output value, the estimated density. The estimated densities of all pixels constitute the density distribution. If the captured image was transformed before the estimation features were extracted, the density estimation means 50 transforms the density distribution back to the shape of the original captured image by a homography using the camera parameters.

The density estimator is a function that, given the features of an image, estimates the density of the people captured in that image and outputs the estimated value (estimated density). The function, including parameters such as its coefficients, is stored in advance as part of the program of the density estimation means 50. The density estimator can be implemented as a classifier that distinguishes images of multiple classes, for example a discriminant function learned by the multi-class SVM method.

The density can be defined, for example, as four classes: a "background" class with no people; a "low density" class of more than 0 and at most 2 persons/m²; a "medium density" class of more than 2 and at most 4 persons/m²; and a "high density" class of more than 4 persons/m².
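The class boundaries above, including which side of each boundary is inclusive, can be expressed as a simple labeling function, for example when assigning training images to classes. This is an illustrative sketch; the function name is hypothetical.

```python
def density_class(persons_per_m2):
    """Map a crowd density (persons per m^2) to the four example classes."""
    if persons_per_m2 <= 0:
        return "background"
    if persons_per_m2 <= 2:
        return "low"
    if persons_per_m2 <= 4:
        return "medium"
    return "high"
```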

The estimated density is a value assigned in advance to each class and is output as the result of distribution estimation. In this embodiment, the values corresponding to the classes are denoted "background", "low density", "medium density", and "high density".

That is, the density estimator is a discriminant function for distinguishing images of each class from those of the other classes, learned by applying the multi-class SVM method to the features of a large number of images (density images) belonging to each of the "background", "low density", "medium density", and "high density" classes. The parameters of the discriminant function derived by this learning are stored as the density estimator. The features of the density images are of the same type as the estimation features, namely GLCM features.
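At estimation time, a one-vs-rest multi-class SVM assigns a feature vector to the class whose discriminant value is largest. The sketch below assumes linear discriminants w·x + b whose weights were learned offline; the patent names only the multi-class SVM method, so the linear kernel and the `classifiers` layout are assumptions.

```python
def classify_density(feature, classifiers):
    """One-vs-rest decision: the class whose discriminant w.x + b is largest.

    classifiers: dict mapping class name -> (weights, bias), assumed to come
    from linear one-vs-rest SVM training done offline.
    """
    def discriminant(params):
        weights, bias = params
        return sum(w * x for w, x in zip(weights, feature)) + bias

    return max(classifiers, key=lambda name: discriminant(classifiers[name]))
```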

The density distribution output by the density estimation means 50 shows how crowded each part of the captured image is, but it does not reveal the position of each individual person.
The model image generation means 51 and the optimum arrangement estimation means 52, which follow the density estimation means 50, estimate the position of each individual person by fitting models imitating a person to the captured image. The models to be fitted are stored in the object model storage means 41 and imitate the shape of a person. The optimum arrangement estimation means 52 evaluates how well the models fit on the basis of the similarity between the shapes of the placed models and the shape features appearing in the captured image.

Specifically, the model image generation means 51 refers to the density distribution input from the density estimation means 50, reads the object models from the object model storage means 41, and sets an arrangement region in the captured image. It then sets multiple arrangements by assigning positions within the arrangement region to a number of object models (the arrangement count) corresponding to the density distribution, draws the object models in each of the set arrangements to generate model images, and outputs model image information, including the generated model images and the arrangement counts, to the optimum arrangement estimation means 52. The arrangement region is a region in which objects are presumed to be captured, such as a change region.

The optimum arrangement estimation means 52 then calculates the similarity between the captured image and the model image of each of the arrangements input from the model image generation means 51, generates object position information from the positions of the object models in the arrangement with the highest similarity (the optimum arrangement), and outputs it to the object position output means 31.
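The generate-and-select loop formed by the model image generation means 51 and the optimum arrangement estimation means 52 can be sketched as a random search. The patent does not specify how candidate arrangements are enumerated; random sampling, the `trials` count and the caller-supplied `score` function (standing in for rendering plus similarity calculation) are assumptions.

```python
import random


def best_arrangement(region, counts, score, trials=100, seed=0):
    """Randomly sample arrangements and keep the highest-scoring one.

    region: list of candidate (x, y) pixel positions (the arrangement region).
    counts: numbers of object models to try (e.g. derived from the density map).
    score:  callable(positions) -> similarity between the rendered model image
            and the captured image (hypothetical, supplied by the caller).
    """
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for n in counts:
        if n > len(region):
            continue  # cannot place more models than there are candidate positions
        for _ in range(trials):
            positions = rng.sample(region, n)
            s = score(positions)
            if s > best_score:
                best, best_score = positions, s
    return best, best_score
```

Restricting `counts` to the density-based estimate (and a small error range around it) is what keeps this search from fitting more models than there are people.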

Placing a number of object models corresponding to the density distribution in this way appropriately limits the number of models fitted when estimating the positions of individual objects. This prevents more object models than there are actual objects from being fitted, making it possible to accurately estimate the position of each object from a captured image of a space in which congestion may occur.

The object models and the generation of model images are described next.

As shown in FIG. 3(A), the object model storage means 41 includes three-dimensional model storage means 410, which stores in advance a three-dimensional model of the object to be estimated, and camera parameter storage means 411, which stores in advance the camera parameters of the photographing unit 2; the model image generation means 51 reads the three-dimensional model and the camera parameters.

The three-dimensional model is data describing partial models, each representing the three-dimensional shape of one of the constituent parts of the object to be estimated, together with the arrangement of those partial models relative to one another. The objects estimated by the image monitoring device 1 are standing people, so the three-dimensional model consists of spheroids approximating the three-dimensional shapes of a person's head, torso, and legs, stacked along the Z axis. To simplify the explanation, in this embodiment the height and width of the three-dimensional model are those of a standard person and are common to everyone. The center of the head is taken as the representative position of a person. The three-dimensional model may be simplified further and approximated by a single spheroid.

The camera parameters include information on the projection conditions under which the photographing unit 2 captures images of the monitoring space. For example, the camera parameters include external parameters, such as the installation position and imaging direction of the photographing unit 2 in the actual monitoring space, and internal parameters, such as the focal length, angle of view, lens distortion and other lens characteristics of the photographing unit 2, and the number of pixels of the image sensor. Rendering the three-dimensional model with these camera parameters generates a virtual image (model image) that simulates the photographing unit 2 capturing a person. In addition, back-projecting any pixel of the captured image onto the horizontal plane at the height of the head center of the three-dimensional model using these camera parameters yields the representative position (head center), in a virtual space modeling the monitoring space, of a three-dimensional model projected at that pixel.
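The back-projection step can be sketched for an ideal pinhole camera as a ray-plane intersection. The sketch ignores lens distortion and assumes extrinsics of the form X_cam = R X_world + t, a world Z axis pointing up, and a focal length `f` in pixels; the head-center height of 1.6 m mentioned below is a hypothetical value.

```python
def backproject_to_height(u, v, f, cx, cy, R, t, height):
    """Intersect the viewing ray of pixel (u, v) with the plane Z = height.

    Assumes an ideal pinhole camera (no lens distortion), focal length f in
    pixels, principal point (cx, cy), and extrinsics X_cam = R @ X_world + t.
    """
    # Ray direction in camera coordinates.
    d_cam = [(u - cx) / f, (v - cy) / f, 1.0]
    # R is a rotation, so its inverse is its transpose.
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    d_world = [sum(Rt[i][k] * d_cam[k] for k in range(3)) for i in range(3)]
    # Camera centre in world coordinates: C = -R^T t.
    center = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    if abs(d_world[2]) < 1e-12:
        raise ValueError("viewing ray is parallel to the plane")
    s = (height - center[2]) / d_world[2]
    if s <= 0:
        raise ValueError("plane lies behind the camera")
    return [center[i] + s * d_world[i] for i in range(3)]
```

For a camera 10 m above the world origin looking straight down, the principal-point pixel maps to (0, 0, 1.6) when the head height is 1.6 m.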

The model image generation means 51 assigns positions in the captured image to as many object models as the arrangement count, uses the camera parameters to find the position in the virtual space corresponding to each of those positions, and places a three-dimensional model at each position found. The model image generation means 51 then generates a model image by rendering the virtual space containing the three-dimensional models onto the imaging plane of the photographing unit 2 using the camera parameters. The rendered model image also represents occlusion between objects.

Instead of the configuration of FIG. 3(A), the object model storage means 41 may be configured, as shown in FIG. 3(B), with model figure storage means 412 that stores in advance two-dimensional model figures, one for each pixel position of the captured image. These model figures are generated in advance by projecting the three-dimensional model using the camera parameters. In this case, when the model image generation means 51 assigns a position in the captured image to each object model, it draws the model figure corresponding to that position. The model image generation means 51 represents occlusion between objects by drawing in order starting from the position farthest from the photographing unit 2, overwriting as it goes.
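Drawing far-to-near with overwriting is the classic painter's algorithm. The sketch below applies it to a label image, assuming each model is given as a label, its distance from the camera, and the pixels its 2-D model figure covers; these input conventions are hypothetical.

```python
def render_labels(shape, models):
    """Draw 2-D model figures far-to-near so nearer models overwrite farther ones.

    shape:  (rows, cols) of the output label image.
    models: list of (label, distance_from_camera, pixels), where pixels is an
            iterable of (row, col) covered by that model's figure.
    Returns a 2-D list holding, per pixel, the label of the visible model (0 = none).
    """
    rows, cols = shape
    canvas = [[0] * cols for _ in range(rows)]
    for label, _distance, pixels in sorted(models, key=lambda m: m[1], reverse=True):
        for r, c in pixels:
            if 0 <= r < rows and 0 <= c < cols:
                canvas[r][c] = label
    return canvas
```

Because the nearest model is drawn last, any pixel shared by two models ends up showing the one closer to the camera, which is how occlusion appears in the model image.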

Alternatively, the object model storage means 41 may be configured with model figure storage means 412 that stores in advance a number of representative model figures smaller than the number of pixels of the captured image. In this case, a model figure for any position is generated by applying transformations such as enlargement or reduction to these representative model figures.

以上のように、物体モデル記憶手段４１は推定対象の物体を模した物体モデルを記憶しており、特にその形状を模した物体モデルを記憶している。そして、モデル画像生成手段５１は、各物体モデルに撮影画像上の位置を割り当てると、物体モデル記憶手段４１から物体モデルを読み出し、割り当てた各位置に物体モデルを描画してモデル画像を生成する。 As described above, the object model storage means 41 stores an object model that imitates the object to be estimated, and particularly stores an object model that imitates the shape. Then, when the model image generation means 51 assigns a position on the captured image to each object model, the model image generation means 51 reads the object model from the object model storage means 41, draws the object model at each assigned position, and generates a model image.

次に、物体モデルを配置する領域（配置領域）および配置数について説明する。 Next, the region in which the object models are placed (placement region) and the number of placements will be described.

単純な例では、モデル画像生成手段５１は、低密度クラス、中密度クラスまたは高密度クラスであると推定された領域を配置領域に設定し、１個以上の物体モデルのそれぞれに配置領域内のランダムな位置を割り当て、割り当てを変更し且つ上限個数（例えば仮想空間中で立体モデルが重ならずに配置できる上限個数）まで配置数を増やしながら、互いに位置および／または配置数が異なる複数通りのモデル画像を生成する。 In a simple example, the model image generation means 51 sets the regions estimated to belong to the low-density, medium-density, or high-density class as the placement region, and assigns each of one or more object models a random position within that region. It keeps changing the assignment and increasing the number of placements up to an upper limit, for example the maximum number of three-dimensional models that fit in the virtual space without overlapping. In this way it generates multiple model images that differ from one another in position and/or number of placements.
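The "simple example" can be sketched as an enumeration of random candidate placements, which makes it easy to see why the candidate set grows quickly. This is a minimal illustration, not the patent's procedure. The function name `random_placements`, the fixed number of trials per count, and the use of the seeded `random.Random.sample` are all assumptions. `max_count` stands in for the upper limit mentioned in the text.

```python
import random
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int]


def random_placements(
    region: Sequence[Pixel], max_count: int, trials_per_count: int, seed: int = 0
) -> List[List[Pixel]]:
    """Enumerate candidate placements for the 'simple example' (hypothetical helper).

    For every count n = 1..max_count, draw `trials_per_count` random sets of n
    distinct positions inside the placement region.
    """
    rng = random.Random(seed)
    pool = list(region)
    placements: List[List[Pixel]] = []
    for n in range(1, min(max_count, len(pool)) + 1):
        for _ in range(trials_per_count):
            placements.append(rng.sample(pool, n))  # n distinct pixels
    return placements
```

Even this toy version produces `max_count × trials_per_count` candidates, and each candidate needs its own model image. That is the processing cost the following paragraphs set out to reduce.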

しかしながら、上記の単純な例では、位置と配置数の組み合わせが過剰であるため処理量が多くなる上に本来の物体数以上の物体モデルが当てはまることによる誤推定を生じやすくなる。
そこで、モデル画像生成手段５１は、撮影画像中の変化領域を配置領域とすることで、より厳密な配置領域を設定し、また配置領域の実サイズと密度からより厳密な配置数を決定することによって、処理量を減じるとともに本来の物体数以上の物体モデルが当てはまることによる誤推定をより的確に防止する。 However, the simple example above produces too many combinations of positions and placement counts. This increases the processing load and makes misestimation likely, because more object models than there are actual objects can end up being fitted.
Therefore, the model image generation means 51 sets a tighter placement region by using the change regions in the captured image as the placement region. It also determines a tighter number of placements from the actual size and density of the placement region. This reduces the processing load and more reliably prevents misestimation caused by fitting more object models than there are actual objects.

そのために、背景画像記憶手段４０は監視空間の背景画像を記憶し、変化領域抽出手段５３は、画像取得手段３０から入力された撮影画像と、背景画像記憶手段４０から読み出した背景画像とを比較して、撮影画像において背景画像と所定基準以上に相違する変化領域を抽出し、抽出した変化領域の情報をモデル画像生成手段５１および最適配置推定手段５２に出力する。 To this end, the background image storage means 40 stores a background image of the monitored space. The change region extraction means 53 compares the captured image input from the image acquisition means 30 with the background image read from the background image storage means 40. It extracts the change regions, meaning the parts of the captured image that differ from the background image by at least a predetermined criterion, and outputs information on them to the model image generation means 51 and the optimum placement estimation means 52.

具体的には、変化領域抽出手段５３は、推定に先立って、人が存在しないときの撮影画像を背景画像として背景画像記憶手段４０に記憶させる。また、変化領域抽出手段５３は、新たな撮影画像のうちの変化領域以外の部分画像を、背景画像に加算平均し、または置換することによって背景画像を適宜更新する。
或いは、変化領域抽出手段５３は無人の監視空間を模した環境モデルをレンダリングすることによって背景画像を生成することもできる。この場合、変化領域抽出手段５３は、不図示の環境モデル記憶手段から予め記憶されている環境モデルを読み出すとともに不図示のカメラパラメータ記憶手段から予め記憶されている撮影部２のカメラパラメータを読み出し、カメラパラメータを用いて環境モデルをレンダリングする。カメラパラメータ記憶手段は後述するカメラパラメータ記憶手段４１１または／および４２０と共用することもできる。環境モデルには監視空間の構成物体それぞれについての三次元形状、マテリアル特性等および監視空間を照明する光源の照明パラメータが含まれ、変化領域抽出手段５３は、撮影画像等から照明条件を推定し、推定した照明条件に応じて照明パラメータを変更してレンダリングすることにより背景画像を適宜更新する。 Specifically, before estimation begins, the change region extraction means 53 stores a captured image taken when no one is present in the background image storage means 40 as the background image. It also updates the background image as appropriate. For each new captured image, it takes the parts that lie outside the change regions and either averages them into the background image or replaces the corresponding parts of the background with them.
Alternatively, the change region extraction means 53 can generate the background image by rendering an environment model of the monitored space with no people present. In this case, the change region extraction means 53 reads a prestored environment model from environment model storage means (not shown). It also reads the prestored camera parameters of the photographing unit 2 from camera parameter storage means (not shown), and renders the environment model using those parameters. This camera parameter storage means may be shared with the camera parameter storage means 411 and/or 420 described later. The environment model includes the three-dimensional shape, material properties, and the like of each object making up the monitored space, together with the lighting parameters of the light sources that illuminate it. The change region extraction means 53 estimates the lighting conditions from the captured image or other data. It then updates the background image as appropriate by re-rendering with lighting parameters adjusted to the estimated conditions.
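The averaging-or-replacement update of the background image described above could look like the following minimal sketch. It assumes grayscale images stored as nested lists and a boolean change-region mask. The blending weight `alpha` is a hypothetical parameter, since the source gives no value. Setting `alpha = 1.0` reproduces the "replace" variant.

```python
from typing import List

Image = List[List[float]]
Mask = List[List[bool]]


def update_background(background: Image, frame: Image, change_mask: Mask, alpha: float = 0.05) -> Image:
    """Blend the new frame into the background outside the change regions.

    alpha is a hypothetical blending weight: 0 < alpha < 1 gives a running
    average, alpha = 1.0 replaces those pixels outright. Pixels inside a change
    region (mask True) keep their old background value.
    """
    return [
        [b if m else (1.0 - alpha) * b + alpha * f for b, f, m in zip(b_row, f_row, m_row)]
        for b_row, f_row, m_row in zip(background, frame, change_mask)
    ]
```

Masking out the change regions keeps people in the scene from being absorbed into the background.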

変化領域抽出手段５３は背景差分処理により変化領域を抽出する。すなわち変化領域抽出手段５３は、画素ごとに撮影画像と背景画像の輝度差の絶対値（差分値）を算出して予め定めた閾値と比較して閾値以上の差分値が算出された画素を検出し、検出した画素が空間的につながっている塊を変化領域として抽出する。なお推定対象の想定サイズよりも小さな塊は抽出対象外とする。
或いは、変化領域抽出手段５３は、背景相関処理によって変化領域を抽出することもできる。
なお、変化領域抽出手段５３は、モデル画像生成手段５１に出力する変化領域の情報に、各画素における差分値あるいは相関値を含ませてもよい。 The change region extraction means 53 extracts change regions by background subtraction. For each pixel, it calculates the absolute value of the luminance difference (difference value) between the captured image and the background image and compares it with a predetermined threshold. It detects the pixels whose difference value is at or above the threshold, and extracts each spatially connected cluster of detected pixels as a change region. Clusters smaller than the expected size of the estimation target are excluded.
Alternatively, the change region extraction means 53 can also extract the change region by background correlation processing.
The change region extraction means 53 may include the per-pixel difference values or correlation values in the change region information it outputs to the model image generation means 51.
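The background-subtraction procedure above can be sketched in three steps: per-pixel absolute difference, thresholding, and connected-component labeling with a minimum cluster size. Several choices here are assumptions. The images are grayscale nested lists, connectivity is 4-neighbour, and a single pixel count `min_size` stands in for "the expected size of the estimation target". The name `extract_change_regions` is hypothetical.

```python
from collections import deque
from typing import List, Set, Tuple

Pixel = Tuple[int, int]


def extract_change_regions(
    frame: List[List[float]],
    background: List[List[float]],
    threshold: float,
    min_size: int,
) -> List[Set[Pixel]]:
    """Background subtraction followed by 4-connected labeling.

    A pixel is a candidate when |frame - background| >= threshold; connected
    clusters with fewer than min_size pixels are discarded.
    """
    h, w = len(frame), len(frame[0])
    fg = [[abs(frame[r][c] - background[r][c]) >= threshold for c in range(w)] for r in range(h)]
    seen = [[False] * w for _ in range(h)]
    regions: List[Set[Pixel]] = []
    for r in range(h):
        for c in range(w):
            if not fg[r][c] or seen[r][c]:
                continue
            blob: Set[Pixel] = set()
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill of one cluster
                y, x = queue.popleft()
                blob.add((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and fg[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(blob) >= min_size:
                regions.append(blob)
    return regions
```

In practice a library routine such as a connected-components function from an image-processing package would replace the hand-written flood fill. It is spelled out here only to keep the sketch self-contained.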

このように、背景画像記憶手段４０は推定対象空間の背景画像を記憶し、変化領域抽出手段５３は撮影画像を推定対象空間の背景画像と比較して撮影画像において背景画像と所定基準以上に相違する変化領域を抽出する。そして、モデル画像生成手段５１は変化領域抽出手段５３から入力された変化領域を配置領域に設定する。 In this way, the background image storage means 40 stores a background image of the estimation target space. The change region extraction means 53 compares the captured image with that background image and extracts the change regions that differ from it by at least a predetermined criterion. The model image generation means 51 then sets the change regions input from the change region extraction means 53 as the placement region.

サイズ換算手段５４は、モデル画像生成手段５１から撮影画像中の任意の画素を指定され、指定された画素に投影されている監視空間の面積（実サイズ）をモデル画像生成手段５１に出力する。
そして、モデル画像生成手段５１は、変化領域中の各画素をサイズ換算手段５４に指定して各画素の実サイズを取得し、変化領域中の各画素の実サイズに密度推定手段５０から入力された当該画素における密度を乗じて積を総和することにより変化領域に対するモデルの配置数を決定する。
すなわち、変化領域に含まれる画素の集合をＲ、集合Ｒの中の任意の一画素をｉ、画素ｉの実サイズをａ_ｉ、画素ｉにおける密度をｄ_ｉとすると、配置数Ｐは次式で導出される。

The size conversion means 54 receives from the model image generation means 51 the designation of an arbitrary pixel in the captured image. It returns to the model image generation means 51 the area of the monitored space projected onto that pixel (its actual size).
The model image generation means 51 designates each pixel in the change region to the size conversion means 54 and obtains that pixel's actual size. It multiplies the actual size of each pixel in the change region by the density at that pixel, as input from the density estimation means 50, and sums the products. The sum determines the number of object models to place in the change region.
That is, let R be the set of pixels in the change region, i any pixel in R, a_i the actual size of pixel i, and d_i the density at pixel i. The number of placements P is then given by:

$$P = \sum_{i \in R} a_i \, d_i \qquad (1)$$

具体的には、サイズ換算手段５４は、図４（Ａ）に示すように、予め撮影部２のカメラパラメータを記憶しているカメラパラメータ記憶手段４２０と、カメラパラメータを用いて、指定された画素が表す撮影画像上の領域を監視空間内における領域に変換して当該領域の面積を指定された画素の実サイズとして算出する実サイズ算出手段５４０によって構成することができる。なお、カメラパラメータ記憶手段４２０とカメラパラメータ記憶手段４１１とは共通化してもよい。 Specifically, as shown in FIG. 4(A), the size conversion means 54 can consist of two components. The first is camera parameter storage means 420, which stores the camera parameters of the photographing unit 2 in advance. The second is actual size calculation means 540. Using the camera parameters, it converts the region of the captured image represented by the designated pixel into a region in the monitored space, and calculates the area of that region as the actual size of the designated pixel. The camera parameter storage means 420 and the camera parameter storage means 411 may be shared.

図５を参照して、実サイズ算出手段５４０の処理例を説明する。図５のＸＹＺ空間１００は監視空間を模した仮想空間である。ＸＹ平面は、監視空間における地面、床面等の水平面を表している。また、図５には標準的な身長の人を模した物体モデル１０２の頭部中心の高さ（例えば１．５ｍ）の水平面１０１を示している。
サイズ換算手段５４は、カメラパラメータを用いて、撮影部２の光学中心と指定された画素とを結ぶ視線ベクトルが平面１０１と交差する交点Ｐ_０（Ｘ_０，Ｙ_０，Ｚ_０）を算出する。同様に、サイズ換算手段５４は、カメラパラメータを用いて、指定された画素の右隣の画素および下の画素のそれぞれの交点Ｐ_ｒ（Ｘ_ｒ，Ｙ_ｒ，Ｚ_ｒ）およびＰ_ｂ（Ｘ_ｂ，Ｙ_ｂ，Ｚ_ｂ）を算出する。指定された画素と対応する実サイズは、交点Ｐ_０から交点Ｐ_ｒへのベクトルと交点Ｐ_０から交点Ｐ_ｂへのベクトルが為す平行四辺形の面積で近似できる。サイズ換算手段５４は、｜（Ｘ_ｒ−Ｘ_０）（Ｙ_ｂ−Ｙ_０）−（Ｙ_ｒ−Ｙ_０）（Ｘ_ｂ−Ｘ_０）｜を算出して出力する。 A processing example of the actual size calculation means 540 will be described with reference to FIG. 5. The XYZ space 100 in FIG. 5 is a virtual space that models the monitored space. The XY plane represents a horizontal surface such as the ground or floor of the monitored space. FIG. 5 also shows a horizontal plane 101 at the height of the head center (for example, 1.5 m) of an object model 102 representing a person of standard height.
Using the camera parameters, the size conversion means 54 calculates the intersection P_0 = (X_0, Y_0, Z_0) of the plane 101 with the line of sight that connects the optical center of the photographing unit 2 to the designated pixel. In the same way, it calculates the intersections P_r = (X_r, Y_r, Z_r) and P_b = (X_b, Y_b, Z_b) for the pixels immediately to the right of and below the designated pixel. The actual size of the designated pixel can be approximated by the area of the parallelogram spanned by the vector from P_0 to P_r and the vector from P_0 to P_b. The size conversion means 54 calculates and outputs

$$\left| (X_r - X_0)(Y_b - Y_0) - (Y_r - Y_0)(X_b - X_0) \right|$$
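The actual-size computation can be sketched as a ray-plane intersection followed by the parallelogram (2D cross-product) area given above. The camera model is an assumption, because the patent does not specify its camera parameters. The sketch uses a pinhole camera with intrinsics `fx, fy, cx, cy`, a world-to-camera rotation `R`, and a camera center `center` in world coordinates. All function names are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def pixel_ray(u: float, v: float, fx: float, fy: float, cx: float, cy: float,
              R: Sequence[Sequence[float]]) -> Vec3:
    """World-frame direction of the viewing ray through pixel (u, v).

    Assumes a pinhole camera with intrinsics (fx, fy, cx, cy) and a
    world-to-camera rotation R (row-major 3x3); the direction is R^T d_cam.
    """
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    return tuple(sum(R[k][j] * d_cam[k] for k in range(3)) for j in range(3))


def intersect_plane(center: Vec3, direction: Vec3, height: float) -> Vec3:
    """Point where the ray center + s * direction meets the plane Z = height."""
    if direction[2] == 0:
        raise ValueError("ray is parallel to the plane")
    s = (height - center[2]) / direction[2]
    if s <= 0:
        raise ValueError("plane is behind the camera")
    return tuple(center[k] + s * direction[k] for k in range(3))


def pixel_actual_size(u, v, fx, fy, cx, cy, R, center, height=1.5) -> float:
    """Area on the plane Z = height covered by pixel (u, v), approximated by the
    parallelogram spanned by P0->Pr (right neighbour) and P0->Pb (pixel below)."""
    p0 = intersect_plane(center, pixel_ray(u, v, fx, fy, cx, cy, R), height)
    pr = intersect_plane(center, pixel_ray(u + 1, v, fx, fy, cx, cy, R), height)
    pb = intersect_plane(center, pixel_ray(u, v + 1, fx, fy, cx, cy, R), height)
    return abs((pr[0] - p0[0]) * (pb[1] - p0[1]) - (pr[1] - p0[1]) * (pb[0] - p0[0]))
```

Consider a camera 10 m above the floor looking straight down with `fx = fy = 850`. The head plane at 1.5 m is 8.5 m away, so each pixel spans about 0.01 m × 0.01 m there, and the function returns roughly 1e-4 m².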

なお、サイズ換算手段５４は、図４（Ｂ）に示すように、撮影画像の各画素と対応づけて、事前に算出された当該画素の実サイズを予め記憶している実サイズ記憶手段４２１によって構成してもよい。この場合、実サイズ記憶手段４２１がモデル画像生成手段５１から指定された画素に対応して記憶している実サイズを出力する。 As shown in FIG. 4(B), the size conversion means 54 may instead consist of actual size storage means 421, which stores in advance the actual size of each pixel of the captured image, calculated beforehand. In this case, the actual size storage means 421 outputs the actual size it stores for the pixel designated by the model image generation means 51.

ここで、密度分布が示す密度は推定値であり誤差を含む。そのため、配置数にも幅を持たせて複数通りの配置数でモデル画像を生成し、それらの中から最も撮影画像に類似するモデル画像を選出することによって物体の位置を推定するのが良い。
そこで、モデル画像生成手段５１は、密度分布において想定される推定誤差の範囲で複数通りの個数の物体モデルを割り当てて複数通りの配置を設定する。 However, the densities in the density distribution are estimates and contain error. It is therefore preferable to allow a range in the number of placements and generate model images for several different numbers of placements. The object positions are then estimated by selecting, from among those images, the model image most similar to the captured image.
Accordingly, the model image generation means 51 assigns several different numbers of object models within the range of estimation error assumed for the density distribution, thereby setting multiple placements.

上述した密度推定手段５０が出力する密度分布は、それ自体が幅を持った推定密度で表されており、推定誤差を含んだ表現となっている。すなわち低密度、中密度、高密度クラスの推定誤差の範囲をそれぞれ０人／ｍ^２より高く２人／ｍ^２以下、２人／ｍ^２より高く４人／ｍ^２以下、４人／ｍ^２より高く８人／ｍ^２以下とすることができる。ただし高密度クラスの上限値を８人／ｍ^２としている。モデル画像生成手段５１は、各クラスの下限値を式（１）のｄ_ｉに適用した場合のＰを下限配置数、各クラスの上限値を式（１）のｄ_ｉに適用した場合のＰを上限配置数に設定することができる。 The density distribution output by the density estimation means 50 described above is itself expressed as ranges of estimated density, so it already incorporates the estimation error. That is, the estimation error ranges can be set as follows:
- low-density class: more than 0 and at most 2 persons/m²
- medium-density class: more than 2 and at most 4 persons/m²
- high-density class: more than 4 and at most 8 persons/m²

The upper limit of the high-density class is set to 8 persons/m². The model image generation means 51 can set the lower-limit number of placements to the value of P obtained by substituting the lower limit of each class for d_i in equation (1). Likewise, it can set the upper-limit number of placements to the value of P obtained by substituting the upper limit of each class for d_i.

例えば、モデル画像生成手段５１は、中密度クラスの１００画素と高密度クラスの５０画素からなる変化領域について、下限配置数が４（＝０．０１×２×１００＋０．０１×４×５０）、上限配置数が８（＝０．０１×４×１００＋０．０１×８×５０）と算出し、４個の物体モデルを配置したモデル画像、５個の物体モデルを配置したモデル画像、…、８個の物体モデルを配置したモデル画像を生成する。ただし、この例では簡単のため１画素あたり一律０．０１ｍ^２としている。 For example, consider a change region consisting of 100 pixels in the medium-density class and 50 pixels in the high-density class. The model image generation means 51 calculates a lower-limit number of placements of 4 (= 0.01 × 2 × 100 + 0.01 × 4 × 50) and an upper-limit number of placements of 8 (= 0.01 × 4 × 100 + 0.01 × 8 × 50). It then generates a model image with 4 object models, a model image with 5 object models, ..., and a model image with 8 object models. For simplicity, this example assumes a uniform actual size of 0.01 m² per pixel.
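Equation (1) and the lower/upper placement limits can be checked with a short sketch that reproduces the worked example above. The class bounds are taken from the text. The dictionaries keyed by pixel and the function names are hypothetical. The text does not say how a fractional P becomes a whole number of models, so rounding to the nearest integer is an assumption.

```python
from typing import Dict, Tuple

# Density-class bounds in persons/m^2 taken from the text: (lower, upper].
CLASS_BOUNDS: Dict[str, Tuple[float, float]] = {
    "low": (0.0, 2.0),
    "medium": (2.0, 4.0),
    "high": (4.0, 8.0),
}


def placement_count(pixels, actual_size, density) -> float:
    """Equation (1): P = sum over pixels i of a_i * d_i."""
    return sum(actual_size[i] * density[i] for i in pixels)


def placement_range(pixels, actual_size, density_class) -> range:
    """Candidate numbers of object models, from the lower- to the upper-limit count."""
    pixels = list(pixels)
    lower = placement_count(pixels, actual_size, {i: CLASS_BOUNDS[density_class[i]][0] for i in pixels})
    upper = placement_count(pixels, actual_size, {i: CLASS_BOUNDS[density_class[i]][1] for i in pixels})
    # Rounding P to a whole number of models is an assumption; the text does not say how.
    return range(round(lower), round(upper) + 1)


# Worked example: 100 medium-density pixels and 50 high-density pixels, 0.01 m^2 each.
pixels = range(150)
sizes = {i: 0.01 for i in pixels}
classes = {i: "medium" if i < 100 else "high" for i in pixels}
print(list(placement_range(pixels, sizes, classes)))  # -> [4, 5, 6, 7, 8]
```

Each value in the returned range corresponds to one family of candidate model images.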

なお、推定誤差を多めに見積もって、低密度、中密度、高密度クラスの推定誤差の範囲をそれぞれ０人／ｍ^２以上３人／ｍ^２以下、２人／ｍ^２以上５人／ｍ^２以下、４人／ｍ^２以上８人／ｍ^２以下などとしてもよい。 Alternatively, the estimation error may be estimated generously. For example, the estimation error ranges of the low-, medium-, and high-density classes may be set to 0 to 3 persons/m², 2 to 5 persons/m², and 4 to 8 persons/m² (inclusive), respectively.

以上のように、サイズ換算手段５４は、撮影画像における局所領域のサイズを推定対象空間における実サイズに換算する。そして、モデル画像生成手段５１は、撮影画像上に複数の局所領域からなる配置領域を設定して、当該配置領域に含まれる局所領域の実サイズと密度分布から当該配置領域に割り当てる個数を決定する。そのため、処理量を減じるとともに本来の物体数以上の物体モデルが当てはまることによる誤推定をより的確に防止できる。
また、その際、モデル画像生成手段５１は、密度分布において想定される推定誤差の範囲で複数通りの個数の物体モデルを割り当てて複数通りの配置を設定する。そのため、密度分布に推定誤差が含まれていても本来の物体数での配置を含めた複数通りの配置を設定でき、物体の位置の誤推定を防ぐことができる。 As described above, the size conversion means 54 converts the size of a local region in the captured image into its actual size in the estimation target space. The model image generation means 51 sets a placement region made up of multiple local regions on the captured image. It determines how many object models to assign to that region from the actual sizes of the local regions it contains and from the density distribution. This reduces the processing load and more reliably prevents misestimation caused by fitting more object models than there are actual objects.
In doing so, the model image generation means 51 assigns several different numbers of object models within the range of estimation error assumed for the density distribution, setting multiple placements. Consequently, even if the density distribution contains estimation error, the placements that are set include one with the true number of objects, which prevents misestimation of object positions.

次に、類似度について説明する。 Next, the similarity will be described.

最適配置推定手段５２は、モデル画像に描画された物体モデルの形状の、配置領域に対する適合度（形状適合度）の高さに応じた類似度を次式に従って算出する。
類似度＝形状適合度 − （Ｗ_Ｈ×隠蔽度＋Ｗ_Ｎ×配置数の推定誤差）（２）
ただし、Ｗ_ＨおよびＷ_Ｎは０より大きな重み係数であり、事前の実験に基づいて予め設定される。形状適合度から減じる値はペナルティ値に相当する。 The optimum placement estimation means 52 calculates a similarity that increases with how well the shapes of the object models drawn in the model image fit the placement region (shape goodness of fit). It uses the following equation:

$$\text{Similarity} = \text{shape goodness of fit} - \left( W_H \times \text{degree of concealment} + W_N \times \text{estimation error of the number of placements} \right) \qquad (2)$$

where W_H and W_N are weighting coefficients greater than 0, preset on the basis of prior experiments. The value subtracted from the shape goodness of fit acts as a penalty.

形状適合度は、例えば、撮影画像から抽出された変化領域とモデル画像において物体モデルが描画された領域（モデル領域）との重複度とすることができる。その場合、重複度から配置領域とモデル領域の非重複度を減じることによって、互いの領域のはみ出しを加味した、より信頼性の高い形状適合度とすることができる。すなわち、最適配置推定手段５２は、次式に従って形状適合度を算出する。
形状適合度＝重複度−非重複度
＝（重複領域の面積−非重複領域の面積）
／（変化領域の面積＋モデル領域の面積−重複領域の面積）（３） The shape goodness of fit can be, for example, the degree of overlap between the change region extracted from the captured image and the region of the model image in which the object models are drawn (model region). In that case, the degree of non-overlap between the placement region and the model region can be subtracted from the degree of overlap. This yields a more reliable shape goodness of fit that accounts for the parts of each region extending beyond the other. That is, the optimum placement estimation means 52 calculates the shape goodness of fit according to the following equation:

$$\text{Shape goodness of fit} = \text{degree of overlap} - \text{degree of non-overlap} = \frac{\text{area of overlapping region} - \text{area of non-overlapping region}}{\text{area of change region} + \text{area of model region} - \text{area of overlapping region}} \qquad (3)$$

図６は形状適合度の算出処理の一例を模式的に示した図である。
最適配置推定手段５２には、変化領域抽出手段５３から変化領域２０１の情報を含んだ差分画像２００が入力され、モデル画像生成手段５１から６つのモデルの像２１１が描画されたモデル画像２１０が入力されている。モデルの像２１１の和領域がモデル領域である。画像２２０は最適配置推定手段５２が変化領域２０１とモデル領域を重ね合せている様子が示されている。図中の白抜き部分が重複領域２２１、横線を記した部分がモデル領域側の非重複領域２２２、斜線を記した部分が変化領域側の非重複領域２２３である。非重複領域２２２と非重複領域２２３の和が式（３）の非重複領域である。
最適配置推定手段５２は、重複領域２２１、非重複領域、変化領域２０１およびモデル領域の画素数を計数して各領域の面積とし、それらの面積を式（３）に適用して形状適合度を算出する。 FIG. 6 is a diagram schematically showing an example of the shape goodness of fit calculation.
The optimum placement estimation means 52 receives two inputs. From the change region extraction means 53 it receives a difference image 200 containing information on the change region 201. From the model image generation means 51 it receives a model image 210 in which images 211 of six models are drawn. The union of the model images 211 is the model region. Image 220 shows the optimum placement estimation means 52 superimposing the change region 201 and the model region. In the figure:
- the white portion is the overlapping region 221;
- the portion marked with horizontal lines is the non-overlapping region 222 on the model-region side;
- the portion marked with diagonal lines is the non-overlapping region 223 on the change-region side.

The union of the non-overlapping regions 222 and 223 is the non-overlapping region in equation (3).
The optimum placement estimation means 52 counts the pixels in the overlapping region 221, the non-overlapping region, the change region 201, and the model region to obtain the area of each. It then applies these areas to equation (3) to calculate the shape goodness of fit.
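The pixel-counting computation of equation (3) maps directly onto set operations when the change region and the model region (the union of all drawn models) are each held as a set of pixel coordinates. This representation and the name `shape_fit` are assumptions made for illustration. The symmetric difference gives regions 222 and 223 together.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]


def shape_fit(change_region: Set[Pixel], model_region: Set[Pixel]) -> float:
    """Equation (3): (overlap area - non-overlap area) / union area, by pixel counts."""
    overlap = len(change_region & model_region)          # region 221
    non_overlap = len(change_region ^ model_region)      # regions 222 + 223
    union = len(change_region) + len(model_region) - overlap
    return (overlap - non_overlap) / union if union else 0.0
```

The score is 1.0 when the two regions coincide and -1.0 when they are disjoint. Model placements that spill outside the change region are therefore penalized as much as parts of the change region left uncovered.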

より好適には、形状適合度は、重複度および非重複度を各画素における差分値により重み付けて算出される。すなわち、最適配置推定手段５２は、次式に従って形状適合度を算出することもできる。
形状適合度＝重複度−非重複度
＝重複領域における差分値の総和−非重複領域における差分値の総和
／（変化領域の面積＋モデル領域の面積−重複領域の面積）（４）
なお、変化領域抽出手段５３が背景相関処理により変化領域を抽出した場合は、差分値に代えて１．０から相関値を減じた値を用いる。 More preferably, the shape goodness of fit is calculated with the degrees of overlap and non-overlap weighted by the per-pixel difference values. That is, the optimum placement estimation means 52 can also calculate the shape goodness of fit according to the following equation:

$$\text{Shape goodness of fit} = \text{degree of overlap} - \text{degree of non-overlap} = \frac{\sum_{\text{overlapping region}} \text{difference value} - \sum_{\text{non-overlapping region}} \text{difference value}}{\text{area of change region} + \text{area of model region} - \text{area of overlapping region}} \qquad (4)$$
When the change area extraction means 53 extracts the change area by the background correlation process, a value obtained by subtracting the correlation value from 1.0 is used instead of the difference value.
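A weighted variant implementing equation (4) might look like the sketch below. It assumes that `diff` holds the difference value for every pixel of the image, or 1.0 minus the correlation value when background correlation is used. It also assumes the values are normalized to [0, 1], which the source does not state. The function name is hypothetical.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]


def weighted_shape_fit(change_region: Set[Pixel], model_region: Set[Pixel], diff: List[List[float]]) -> float:
    """Equation (4): overlap and non-overlap weighted by per-pixel difference values.

    diff[r][c] is the difference value at each pixel of the whole image (or
    1.0 - correlation when background correlation is used); values are assumed
    normalized to [0, 1].
    """
    overlap = change_region & model_region
    non_overlap = change_region ^ model_region
    union_area = len(change_region) + len(model_region) - len(overlap)
    if union_area == 0:
        return 0.0
    weighted_overlap = sum(diff[r][c] for r, c in overlap)
    weighted_non_overlap = sum(diff[r][c] for r, c in non_overlap)
    return (weighted_overlap - weighted_non_overlap) / union_area
```

Weighting by the difference value means that covering strongly changed pixels counts for more than covering borderline ones.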

或いは、形状適合度は、撮影画像とモデル画像とのエッジ類似度とすることができる。その場合、最適配置推定手段５２は、画像取得手段３０から撮影画像を入力され、撮影画像とモデル画像のそれぞれからエッジを抽出する。そして最適配置推定手段５２は、例えば、モデル画像から有効なエッジが抽出された画素ごとに対応する撮影画像の画素のエッジ抽出結果との差の絶対値を算出して総和し、総和値を算出に用いた画素数で除してエッジ類似度を算出する。 Alternatively, the shape goodness of fit can be the edge similarity between the captured image and the model image. In that case, the optimum placement estimation means 52 receives the captured image from the image acquisition means 30 and extracts edges from both the captured image and the model image. Then, for example, it considers each pixel at which a valid edge is extracted from the model image. For each such pixel, it calculates the absolute difference between that pixel's edge extraction result and the result at the corresponding pixel of the captured image. It sums these values and divides the sum by the number of pixels used, giving the edge similarity.
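The edge comparison described above can be sketched as follows. The inputs are assumed to be edge-response maps that some edge detector has already produced for the model image and the captured image. The patent does not name the detector. "Valid edge" is taken here to mean a response greater than 0, which is an assumption, and `edge_score` is a hypothetical name. As literally described, the quantity is a mean absolute difference, so lower values indicate closer agreement.

```python
from typing import List


def edge_score(model_edges: List[List[float]], captured_edges: List[List[float]]) -> float:
    """Mean absolute difference of edge responses over pixels where the model image
    has a valid edge (here: response > 0, an assumption)."""
    total, count = 0.0, 0
    for m_row, c_row in zip(model_edges, captured_edges):
        for m, c in zip(m_row, c_row):
            if m > 0:  # only pixels with a model edge are compared
                total += abs(m - c)
                count += 1
    return total / count if count else 0.0
```

Restricting the comparison to model-edge pixels keeps background clutter in the captured image from dominating the score.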

さらには、重複度と非重複度の差と、エッジ類似度の重みづけ和を形状適合度とすることもできる。 Furthermore, the shape goodness of fit can be a weighted sum of two terms: the difference between the degrees of overlap and non-overlap, and the edge similarity.

次に、類似度のペナルティ項に含まれる隠蔽度について説明する。 Next, the degree of concealment included in the penalty term of similarity will be described.

モデル画像における物体モデルどうしの重なりが大きくなり過ぎると配置数の増減に対する形状適合度の変化が小さくなるため、物体の位置を誤推定する要因となる。隠蔽度は、モデル画像における物体モデルどうしの重なり度合いであり、物体モデルの過剰な重なりを抑制するための尺度である。 If the object models overlap one another too much in the model image, the shape goodness of fit changes little as the number of placements increases or decreases, which leads to misestimation of object positions. The degree of concealment is the degree to which object models overlap one another in the model image. It serves as a measure for suppressing excessive overlap between object models.

モデル画像生成手段５１は配置ごとにモデル画像における物体モデルどうしの重なり度合いを表す隠蔽度を次式に従って算出し、最適配置推定手段５２は式（２）に示したように配置ごとの類似度を当該配置における隠蔽度が大きいほど低めて算出する。
隠蔽度＝モデル間の重複領域の面積／モデル領域の和領域の面積（５） For each placement, the model image generation means 51 calculates the degree of concealment, which represents how much the object models overlap one another in the model image, according to the following equation. As shown in equation (2), the optimum placement estimation means 52 then lowers the similarity of each placement the larger its degree of concealment.

$$\text{Degree of concealment} = \frac{\text{area of overlapping region between models}}{\text{area of the union of the model regions}} \qquad (5)$$

図７は、図６のモデル画像２１０に対して隠蔽度を算出する様子を示した図である。
画像３００は６つのモデル領域の重なりを表す画像である。斜線で示した領域３０１，３０２，３０３を合わせた領域がモデル間の重複領域である。
また、画像３１０は６つのモデル領域の論理和を表す画像である。斜線で示した領域３１１，３１２，３１３を合わせた領域がモデル領域の和領域である。
モデル画像生成手段５１は、領域３０１，３０２および３０３の画素数を計数してモデル間の重複領域の面積とするとともに、領域３１１，３１２および３１３の画素数を計数してモデル領域の和領域の面積とし、それらの面積を式（５）に適用して隠蔽度を算出する。 FIG. 7 is a diagram showing how the degree of concealment is calculated for the model image 210 of FIG. 6.
Image 300 shows where the six model regions overlap. The hatched regions 301, 302, and 303 together form the overlapping region between the models.
Image 310 shows the union (logical OR) of the six model regions. The hatched regions 311, 312, and 313 together form the union of the model regions.
The model image generation means 51 counts the pixels in regions 301, 302, and 303 to obtain the area of the overlapping region between the models. It counts the pixels in regions 311, 312, and 313 to obtain the area of the union of the model regions. It then applies these areas to equation (5) to calculate the degree of concealment.
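Equation (5) can be sketched by counting how many model silhouettes cover each pixel. In this reading, a pixel covered by two or more models belongs to the overlapping region and is counted once, and any covered pixel belongs to the union. This is our interpretation of FIG. 7. The sketch assumes each silhouette is the full, un-occluded footprint of one model, stored as a set of pixels. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Pixel = Tuple[int, int]


def concealment(silhouettes: Iterable[Set[Pixel]]) -> float:
    """Equation (5): pixels covered by two or more models / pixels covered by any model."""
    coverage = Counter(p for s in silhouettes for p in s)  # number of models covering each pixel
    union_area = len(coverage)
    overlap_area = sum(1 for n in coverage.values() if n >= 2)
    return overlap_area / union_area if union_area else 0.0
```

A placement whose models do not touch scores 0. The score rises toward 1 as the models pile on top of one another, which is the behaviour the penalty term in equation (2) relies on.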

このようにすることで、隠蔽度を含めた類似度に基づいて最適配置を推定することができ、本来の物体数以上の物体モデルが当てはまることによる誤推定をより的確に防止できる。 In this way, the optimum placement can be estimated from a similarity that includes the degree of concealment. Misestimation caused by fitting more object models than there are actual objects can thus be prevented more reliably.

次に、類似度のペナルティ項に含まれる、配置数の推定誤差について説明する。 Next, the estimation error of the number of placements, which is included in the penalty term of the similarity, will be described.

各モデル画像における配置数は、密度の推定値に基づいて設定された数であり、密度の推定誤差を含んでいる。配置数の推定誤差は、各モデル画像における配置数が含む、密度分布の推定誤差の程度を表す尺度である。 The number of placements in each model image is set from estimated densities, so it carries the density estimation error. The estimation error of the number of placements measures how much density-distribution estimation error is contained in the number of placements of each model image.

最適配置推定手段５２は、配置ごとの類似度を当該配置における配置数と対応する推定誤差が大きいほど低めて算出する。 The optimum placement estimation means 52 lowers the similarity of each placement the larger the estimation error corresponding to its number of placements.

具体的には、最適配置推定手段５２は、例えば、次式に従って配置数の推定誤差を算出する。
配置数の推定誤差＝ −ｅｘｐ｛−α（配置数の代表値−配置数）^２｝（６）

ただし、変化領域に含まれる画素の集合をＲ、集合Ｒの中の任意の一画素をｉ、画素ｉの実サイズをａ_ｉ、画素ｉにおける密度の代表値をｃ_ｉとしている。低密度クラスの密度の代表値は１人／ｍ^２、中密度クラスの密度の代表値は３人／ｍ^２、高密度は６人／ｍ^２などとすることができる。αは０よりも大きな定数であり、事前実験に基づいて予め定められる。 Specifically, the optimum placement estimation means 52 calculates the estimation error of the number of placements according to, for example, the following equation:

$$\text{Estimation error of the number of placements} = -\exp\left\{ -\alpha \left( \text{representative number of placements} - \text{number of placements} \right)^2 \right\} \qquad (6)$$

Here the representative number of placements is $\sum_{i \in R} a_i \, c_i$. In this expression, R is the set of pixels in the change region, i is any pixel in R, a_i is the actual size of pixel i, and c_i is the representative density value at pixel i. The representative density can be, for example, 1 person/m² for the low-density class, 3 persons/m² for the medium-density class, and 6 persons/m² for the high-density class. α is a constant greater than 0, predetermined on the basis of prior experiments.
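Equation (6) and its representative count can be sketched as below, reusing the worked example of 100 medium-density and 50 high-density pixels at 0.01 m² each. The representative densities are the example values from the text. The function names and per-pixel dictionaries are hypothetical.

```python
import math

# Representative densities (persons/m^2) given as examples in the text.
REPRESENTATIVE_DENSITY = {"low": 1.0, "medium": 3.0, "high": 6.0}


def representative_count(pixels, actual_size, density_class) -> float:
    """Representative number of placements: sum over i in R of a_i * c_i."""
    return sum(actual_size[i] * REPRESENTATIVE_DENSITY[density_class[i]] for i in pixels)


def count_estimation_error(n_placed: int, representative: float, alpha: float) -> float:
    """Equation (6): -exp(-alpha * (representative - n_placed)^2).

    Equals -1 when n_placed matches the representative count and rises toward 0
    as they diverge, so the W_N term in equation (2) penalizes the divergence.
    """
    return -math.exp(-alpha * (representative - n_placed) ** 2)


pixels = range(150)
rep = representative_count(pixels, {i: 0.01 for i in pixels},
                           {i: "medium" if i < 100 else "high" for i in pixels})
print(round(rep, 6))  # -> 6.0
```

With α = 0.5, placing 8 models instead of the representative 6 gives an error of -exp(-2) ≈ -0.135 instead of -1.0. This raises the penalty in equation (2).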

このようにすることで、配置数の推定誤差を含めた類似度に基づいて最適配置を推定することができるので、密度分布から乖離して本来の物体数以上の物体モデルが当てはまってしまうことを防げ、個々の物体の位置の誤推定をより的確に防止できる。 In this way, the optimum placement can be estimated from a similarity that includes the estimation error of the number of placements. This prevents a placement that departs from the density distribution, with more object models than there are actual objects, from being fitted. Misestimation of individual object positions is therefore prevented more reliably.

上述したように、最適配置推定手段５２は、複数通りの配置のうちの類似度が最も高い配置（最適配置）が示す物体モデルの位置から物体位置の情報を出力する。
例えば、最適配置推定手段５２は、監視員が視認し易い物体位置の情報として、最適配置のモデル画像に描画された物体モデルのそれぞれを密度クラスに応じて色分けした分布画像を生成し、出力する。 As described above, the optimum placement estimation means 52 selects, among the candidate placements, the one with the highest similarity (the optimum placement). It outputs object position information derived from the positions of the object models in that placement.
For example, the optimum placement estimation means 52 can output object position information that monitoring staff can read easily. It generates and outputs a distribution image in which each object model drawn in the model image of the optimum placement is color-coded according to its density class.
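Selecting the optimum placement amounts to evaluating equation (2) for every candidate and keeping the maximum. In the sketch below, each candidate carries its precomputed shape goodness of fit, degree of concealment, and count estimation error. The `Candidate` record, the function names, and the numbers in the usage example are illustrative only. W_H and W_N would come from prior experiments, as the text says.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Candidate(NamedTuple):
    positions: List[Tuple[int, int]]
    shape_fit: float     # equation (3) or (4)
    concealment: float   # equation (5)
    count_error: float   # equation (6)


def similarity(c: Candidate, w_h: float, w_n: float) -> float:
    """Equation (2): shape goodness of fit minus the weighted penalties."""
    return c.shape_fit - (w_h * c.concealment + w_n * c.count_error)


def best_placement(candidates: Sequence[Candidate], w_h: float, w_n: float) -> Candidate:
    """Return the candidate placement with the highest similarity."""
    return max(candidates, key=lambda c: similarity(c, w_h, w_n))


a = Candidate([(0, 0)], shape_fit=0.80, concealment=0.30, count_error=-0.2)
b = Candidate([(1, 1)], shape_fit=0.75, concealment=0.05, count_error=-0.9)
print(best_placement([a, b], w_h=1.0, w_n=0.5).positions)  # -> [(1, 1)]
```

Candidate `b` wins despite its slightly lower shape goodness of fit, because it overlaps less and its count is closer to the representative value.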

物体位置の情報は最適配置が示す物体位置そのものであってもよい。または、物体位置の情報は、最適配置における各物体モデルの、他の物体モデルと重複していない領域であってもよい。或いは、物体位置の情報は、上述したデータのうちの２以上を含んだデータであってもよい。 The object position information may be the object positions of the optimum placement themselves. Alternatively, it may be the region of each object model in the optimum placement that does not overlap other object models, or data containing two or more of the above.

物体位置出力手段３１は最適配置推定手段５２から入力された物体位置の情報を出力部６に順次出力し、出力部６は物体位置出力手段３１から入力された情報を表示する。例えば、物体位置の情報は、インターネット経由で送受信され、出力部６に表示される。監視員は、表示された分布画像を視認することによって監視空間に混雑が発生している地点およびその地点の様子を迅速に把握し、当該地点に警備員を派遣し或いは増員するなどの対処を行う。 The object position output means 31 sequentially outputs the object position information input from the optimum placement estimation means 52 to the output unit 6, which displays it. For example, the object position information is transmitted and received via the Internet and displayed on the output unit 6. By viewing the displayed distribution image, monitoring staff can quickly see where congestion is occurring in the monitored space and what the situation is there. They can then respond, for example by dispatching security guards to that location or increasing their number.

＜画像監視装置１の動作＞
図８から図１０のフローチャートを参照して画像監視装置１の動作を説明する。 <Operation of image monitoring device 1>
The operation of the image monitoring device 1 will be described with reference to the flowcharts of FIGS. 8 to 10.

画像監視装置１はイベント会場が無人であるときに起動され、画像監視装置１が動作を開始すると、イベント会場に設置されている撮影部２は所定時間おきに監視空間を撮影して撮影画像を順次画像処理部５が設置されている画像解析センター宛に送信する。画像処理部５は撮影画像を受信するたびに図８のフローチャートに従った動作を繰り返す。 The image monitoring device 1 is started while the event venue is empty. Once it starts operating, the photographing unit 2 installed at the venue captures the monitored space at predetermined intervals. It sends the captured images in sequence to the image analysis center where the image processing unit 5 is installed. Each time the image processing unit 5 receives a captured image, it repeats the operation shown in the flowchart of FIG. 8.

まず、通信部３は画像取得手段３０として動作し、撮影部２からの撮影画像の受信待ち状態となる。そして、撮影画像を取得した画像取得手段３０は当該撮影画像を画像処理部５に出力する（ステップＳ１）。 First, the communication unit 3 operates as the image acquisition means 30 and waits to receive a captured image from the photographing unit 2. On acquiring a captured image, the image acquisition means 30 outputs it to the image processing unit 5 (step S1).

撮影画像を入力された画像処理部５は変化領域抽出手段５３として動作し、記憶部４の背景画像記憶手段４０から背景画像を読み出して撮影画像と比較し、変化領域を抽出する（ステップＳ２）。ただし、起動直後は、変化領域抽出手段５３は変化領域の抽出を省略し、撮影画像を背景画像として背景画像記憶手段４０に記憶させる。 On receiving the captured image, the image processing unit 5 operates as the change region extraction means 53. It reads the background image from the background image storage means 40 of the storage unit 4, compares it with the captured image, and extracts change regions (step S2). Immediately after start-up, however, the change region extraction means 53 skips change region extraction and stores the captured image in the background image storage means 40 as the background image.

変化領域が抽出されなかった場合（ステップＳ３にてＮＯ）、ステップＳ４〜Ｓ８の処理は省略される。このとき、変化領域抽出手段５３は背景画像記憶手段４０の背景画像を変化領域が抽出されなかった撮影画像で置換する。 If the change region is not extracted (NO in step S3), the processes of steps S4 to S8 are omitted. At this time, the change area extraction means 53 replaces the background image of the background image storage means 40 with a captured image in which the change area has not been extracted.

変化領域が抽出された場合（ステップＳ３にてＹＥＳ）、画像処理部５はモデル画像生成手段５１および最適配置推定手段５２としても動作し、変化領域抽出手段５３からモデル画像生成手段５１に変化領域の情報が入力され、変化領域抽出手段５３から最適配置推定手段５２に差分値の情報が入力される。モデル画像生成手段５１および最適配置推定手段５２はこれらの情報を保持し、処理はステップＳ４に進められる。また、画像処理部５は密度推定手段５０として動作し、撮影画像が密度推定手段５０に入力される。また、変化領域抽出手段５３は撮影画像を用いて背景画像を更新する。 If change regions are extracted (YES in step S3), the image processing unit 5 also operates as the model image generation means 51 and the optimum placement estimation means 52. The change region extraction means 53 inputs the change region information to the model image generation means 51 and the difference value information to the optimum placement estimation means 52. The model image generation means 51 and the optimum placement estimation means 52 retain this information, and processing proceeds to step S4. The image processing unit 5 also operates as the density estimation means 50, and the captured image is input to it. In addition, the change region extraction means 53 updates the background image using the captured image.

The density estimation means 50 scans the input captured image with the density estimator and estimates the density distribution (step S4). The image processing unit 5 then also operates as the model image generation means 51, and the density estimation means 50 passes the density distribution to it.

Having received the density distribution, the model image generation means 51 sets each held change area in turn as the change area of interest (step S5) and controls the optimum arrangement estimation for that area (step S6).

The optimum arrangement estimation for the change area of interest is described with reference to the flowchart of FIG. 9.

First, the image processing unit 5 also operates as the size conversion means 54. The model image generation means 51 passes each pixel of the change area of interest to the size conversion means 54, which calculates the actual size of each pixel (step S60).
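The text does not say how the size conversion means 54 uses the camera parameters. One common way to obtain a per-pixel actual size, assumed here purely for illustration, is to project the pixel's four corners onto a ground plane with a homography and take the area of the resulting quadrilateral. `H` is a hypothetical homography, not a stored parameter named in the patent.

```python
# Sketch of step S60 under an assumed planar model: H is a hypothetical 3x3
# homography from image pixel coordinates (u, v) to ground-plane metres (X, Y).


def apply_homography(H, u, v):
    """Map image point (u, v) to the ground plane."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return x / w, y / w


def pixel_actual_size(H, u, v):
    """Actual size a_i (m^2) of pixel (u, v): area of its projected footprint."""
    corners = [(u, v), (u + 1, v), (u + 1, v + 1), (u, v + 1)]
    pts = [apply_homography(H, cu, cv) for cu, cv in corners]
    # Shoelace formula over the projected quadrilateral.
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]))
    return abs(twice_area) / 2.0
```

With a perspective homography, pixels farther from the camera map to larger or smaller footprints depending on the geometry, which is exactly the variation the per-pixel actual size is meant to capture.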

Next, the model image generation means 51 determines the range of arrangement counts for the change area of interest (step S61). It calculates the lower-limit arrangement count by substituting, for each pixel in the change area of interest, the pixel's actual size and the lower bound of the pixel's density class into a_i and d_i of equation (1), respectively. Likewise, it calculates the upper-limit arrangement count by substituting each pixel's actual size and the upper bound of its density class into a_i and d_i of equation (1).

Further, the model image generation means 51 calculates the representative arrangement count by substituting each pixel's actual size and the representative value of its density class into a_i and c_i of equation (7), respectively (step S62).
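Equations (1) and (7) are defined earlier in the patent and are not reproduced in this section. The sketch below assumes both sum, over the pixels of the change area, the actual size a_i (m²) multiplied by a per-pixel density (persons/m²). The class bounds and the outward rounding are hypothetical; the representative values 0, 1, 3 and 6 persons/m² are the examples given in modification (5).

```python
import math

# Hypothetical density classes: (lower bound, upper bound, representative value)
# in persons/m^2. Representative values follow modification (5); bounds are assumed.
DENSITY_CLASSES = {
    "background": (0.0, 0.0, 0.0),
    "low": (0.0, 2.0, 1.0),
    "medium": (2.0, 4.0, 3.0),
    "high": (4.0, 8.0, 6.0),
}


def count_range(sizes, classes):
    """Return (lower, upper, representative) arrangement counts for one area.

    sizes[i] is the actual size a_i (m^2) of pixel i; classes[i] is its class.
    Assumes equations (1) and (7) are sums of a_i times a per-pixel density.
    """
    lo = sum(a * DENSITY_CLASSES[k][0] for a, k in zip(sizes, classes))
    hi = sum(a * DENSITY_CLASSES[k][1] for a, k in zip(sizes, classes))
    rep = sum(a * DENSITY_CLASSES[k][2] for a, k in zip(sizes, classes))
    # Rounding outward keeps the integer search range inclusive (an assumption).
    return math.floor(lo), math.ceil(hi), rep
```

The integers `range(lower, upper + 1)` would then be the candidate arrangement counts visited in step S63.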

The model image generation means 51 then sets each integer in the range determined in step S61 in turn as the arrangement count (step S63) and performs the loop of steps S63 to S73.

The model image generation means 51 prepares a counter T for counting iterations, initializes it to 0 (step S64), and begins controlling the iterative process.

The model image generation means 51 assigns positions to the object models by randomly setting, within the change area of interest, as many positions as the arrangement count set in step S63 (step S65).

The model image generation means 51 prepares a model image of the same size as the captured image and draws an object model at each position set in step S65 (step S66). To do so, it refers to the object model storage means 41 of the storage unit 4 and reads the camera parameters from the camera parameter storage means 411 and the three-dimensional model from the three-dimensional model storage means 410. Using the camera parameters, it converts each position into a position in virtual space and places the three-dimensional model at each converted position; it then renders this virtual space into the model image, again using the camera parameters.
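As a simplified stand-in for steps S65 and S66, each object model below is drawn as an axis-aligned box whose bottom centre lies at its assigned pixel, instead of rendering a three-dimensional model with camera parameters as the patent describes. `BOX_H`, `BOX_W` and the helper names are hypothetical.

```python
import random

# Hypothetical silhouette size (pixels) replacing the rendered 3-D model.
BOX_H, BOX_W = 3, 2


def random_positions(area_pixels, n, rng):
    """Step S65: pick n distinct random pixels of the change area as positions."""
    return rng.sample(area_pixels, n)


def draw_models(positions, height, width, box_h=BOX_H, box_w=BOX_W):
    """Step S66: one boolean mask per model, with its box clipped to the image."""
    masks = []
    for r, c in positions:
        m = [[False] * width for _ in range(height)]
        left = c - box_w // 2
        for y in range(max(0, r - box_h + 1), r + 1):
            for x in range(max(0, left), min(width, left + box_w)):
                m[y][x] = True
        masks.append(m)
    return masks


def model_image(masks, height, width):
    """The model image as the union of all drawn models."""
    return [[any(m[y][x] for m in masks) for x in range(width)]
            for y in range(height)]
```

Keeping one mask per model, rather than only the union, makes it easy to find the overlap and non-overlap regions needed in the following steps.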

The model image generation means 51 also obtains the area of the region where the drawn object models overlap one another and the area of the union of the drawn object models, and substitutes these into equation (5) to calculate the concealment degree of the model image generated in step S66 (step S67). The image processing unit 5 also operates as the optimum arrangement estimation means 52, to which the model image generation means 51 passes the model image, the overlap region between object models, the non-overlap region (the union of the object models minus the overlap region), the arrangement count, the position of each object model, the representative arrangement count, and the concealment degree.
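Assuming equation (5) defines the concealment degree as the area of the overlap region divided by the area of the union of the drawn models (the equation itself is not shown in this section), step S67 can be sketched as follows. The overlap region is taken as pixels covered by two or more models, and the non-overlap region as pixels covered by exactly one.

```python
def split_regions(masks):
    """Return (overlap, non_overlap): pixels covered by >= 2 models and by exactly 1."""
    overlap, non_overlap = set(), set()
    if not masks:
        return overlap, non_overlap
    h, w = len(masks[0]), len(masks[0][0])
    for y in range(h):
        for x in range(w):
            k = sum(1 for m in masks if m[y][x])
            if k >= 2:
                overlap.add((y, x))
            elif k == 1:
                non_overlap.add((y, x))
    return overlap, non_overlap


def concealment_degree(masks):
    """Assumed form of equation (5): overlap area divided by union area."""
    overlap, non_overlap = split_regions(masks)
    union = len(overlap) + len(non_overlap)
    return len(overlap) / union if union else 0.0
```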

The optimum arrangement estimation for the change area of interest continues with reference to the flowchart of FIG. 10.

The optimum arrangement estimation means 52 substitutes the input arrangement count and representative arrangement count into equation (6) to calculate the estimation error of the arrangement count (step S68).

Using the input overlap region, non-overlap region and difference values, the optimum arrangement estimation means 52 also computes the sum of the difference values in the overlap region and the sum of the difference values in the non-overlap region, and substitutes these into equation (4) to calculate the shape fit (step S69).

The optimum arrangement estimation means 52 then calculates the similarity of the model image generated in step S66 by substituting the shape fit calculated in step S69, the estimation error calculated in step S68, and the input concealment degree into equation (2). It stores the model image, the position of each object model, and the similarity in association with one another in the storage unit 4 (step S70).
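The exact forms of equations (2), (4) and (6) are not reproduced in this section, so the functions below are assumptions chosen only to have the qualitative behaviour the claims describe: the similarity rises with the shape fit and falls as the count estimation error and the degree of overlap grow. The weights `W_ERR` and `W_CON` and the normalisation by 255 are hypothetical.

```python
# Hypothetical weights for an assumed linear form of equation (2).
W_ERR, W_CON = 0.1, 0.5


def shape_fit(diff, overlap, non_overlap, max_diff=255.0):
    """Assumed form of equation (4): mean normalised difference under the models.

    It is high when the drawn models sit on pixels that differ from the background.
    """
    n_pixels = len(overlap) + len(non_overlap)
    if n_pixels == 0:
        return 0.0
    total = (sum(diff[y][x] for y, x in non_overlap)
             + sum(diff[y][x] for y, x in overlap))
    return total / (max_diff * n_pixels)


def count_error(n, n_rep):
    """Assumed form of equation (6): absolute gap to the representative count."""
    return abs(n - n_rep)


def similarity(fit, err, concealment, w_err=W_ERR, w_con=W_CON):
    """Assumed form of equation (2): fit raises it; error and overlap lower it."""
    return fit - w_err * err - w_con * concealment
```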

Once the similarity of the model image has been calculated, the model image generation means 51 increments the iteration count T by 1 (step S71) and compares it with a predetermined number T_MAX (step S72).

If the iteration count T is less than T_MAX (NO in step S72), the model image generation means 51 returns to step S65 and repeats the iteration.

If, on the other hand, T has reached T_MAX (YES in step S72), the model image generation means 51 checks whether every arrangement count in the range has been set in step S63 (step S73).

If an arrangement count remains that has not yet been set (NO in step S73), the model image generation means 51 returns to step S63 and processes the next arrangement count.

When all arrangement counts have been set (YES in step S73), the model image generation means 51 notifies the optimum arrangement estimation means 52 that it has finished generating the multiple model images for the change area of interest.

On receiving this notification, the optimum arrangement estimation means 52 determines the arrangement with the highest similarity (step S74). It selects the maximum among the similarities recorded in step S70 and stores the model image and the object model positions associated with that similarity in the storage unit 4 as the object positions in the change area of interest. It then clears the information recorded in step S70, and processing proceeds to step S7 of FIG. 8.
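The control flow of steps S63 to S74 amounts to a random search over arrangement counts and positions. In this sketch, `score` is a hypothetical callback standing in for steps S66 to S70. Instead of storing every similarity and then selecting the maximum as in step S74, it keeps the best arrangement incrementally, which yields the same maximum.

```python
import random


def estimate_best_arrangement(area_pixels, n_lo, n_hi, t_max, score, seed=0):
    """Steps S63-S74: for every count in [n_lo, n_hi], try t_max random
    arrangements and keep the one whose score (similarity) is highest."""
    rng = random.Random(seed)
    best_score, best_positions = float("-inf"), []
    for n in range(n_lo, n_hi + 1):                  # step S63
        if n > len(area_pixels):
            break                                    # cannot place more models than pixels
        for _ in range(t_max):                       # steps S64, S71, S72
            positions = rng.sample(area_pixels, n)   # step S65
            s = score(positions)                     # steps S66-S70
            if s > best_score:                       # step S74, kept incrementally
                best_score, best_positions = s, positions
    return best_positions, best_score
```

Usage: with a toy score that prefers exactly three models, `estimate_best_arrangement(pixels, 1, 5, 2, lambda p: -abs(len(p) - 3))` returns a three-element arrangement.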

Returning to FIG. 8, the model image generation means 51 checks whether all held change areas have been processed (step S7). If an unprocessed change area remains (NO in step S7), it returns to step S5 and processes the next change area; once that area has been processed, the object positions found in it are appended in step S74.

When all change areas have been processed (YES in step S7), the optimum arrangement estimation means 52 outputs the object position information recorded in step S74 to the communication unit 3 (step S8). The communication unit 3 operates as the object position output means 31 and transmits the object position information to the output unit 6, which conveys it to the observer, for example by displaying it.

When the above processing is complete, processing returns to step S1 and the next captured image is processed.

<Modifications>
(1) In the above embodiment, the object to be estimated is a person. The object is not limited to this, however, and may be a vehicle or an animal such as a cow or a sheep.

(2) In the above embodiment and its modification, the imaging unit 2 consists of a single camera, but it may instead consist of multiple cameras with a common field of view. In that case, the background image storage means 40 stores a background image for each camera, the camera parameter storage means (not shown) referred to by the change area extraction means 53 stores each camera's parameters, and the change area extraction means 53 extracts change areas for each camera. Likewise, the camera parameter storage means (not shown) referred to by the density estimation means 50 stores each camera's parameters, and the density estimation means 50 estimates a density distribution for each camera. The camera parameter storage means 411 and 420 also store each camera's parameters; the size conversion means 54 calculates actual sizes for each camera, and the model image generation means 51 calculates a range of arrangement counts for each camera. The model image generation means 51 then determines the overall range from the smallest lower-limit count and the largest upper-limit count, and places the corresponding number of object models in the virtual space for each arrangement count.
The model image generation means 51 renders this virtual space with each camera's parameters to generate a model image for each camera, and calculates the concealment degree for each camera. The optimum arrangement estimation means 52 calculates the shape fit, the arrangement-count estimation error and the similarity for each camera, and determines the optimum arrangement as the one whose similarity summed over all cameras is largest. Because the objects are concealed differently from different viewpoints, combining the similarities from multiple viewpoints lets the optimum arrangement be determined comprehensively and improves the accuracy of the estimated object positions.
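The multi-camera selection rule in modification (2) reduces to summing each arrangement's per-camera similarities and taking the arrangement with the largest total. A minimal sketch, with a hypothetical input layout where `per_camera_similarity[k][j]` is the similarity of arrangement j seen from camera k:

```python
def best_multi_camera(per_camera_similarity):
    """Return (index of the arrangement with the largest summed similarity, totals)."""
    n_arr = len(per_camera_similarity[0])
    totals = [sum(cam[j] for cam in per_camera_similarity) for j in range(n_arr)]
    return max(range(n_arr), key=totals.__getitem__), totals
```

Here arrangement 0 is preferred by the first camera alone, but arrangement 1 wins once both viewpoints are combined: `best_multi_camera([[0.9, 0.5], [0.1, 0.6]])` selects index 1.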

(3) In the above embodiment and its modifications, the imaging unit 2 has a fixed field of view, but an imaging unit 2 with a variable field of view may also be used. In that case, the imaging unit 2 outputs the camera parameters after each change of field of view to the image processing unit 5 via the communication unit 3, and the image processing unit 5 replaces the camera parameters held in the camera parameter storage means (not shown) referred to by the density estimation means 50, the camera parameter storage means (not shown) referred to by the change area extraction means 53, and the camera parameter storage means 411 and 420 with the input camera parameters. In that case, the change area extraction means 53 also generates the background image by rendering an environment model.

(4) In the above embodiment and its modifications, the model image generation means 51 sets the change area as the arrangement area, but it may instead set the area other than the background class as the arrangement area; that is, the region of the density distribution in which a density greater than 0 is estimated. Specifically, the model image generation means 51 sets as the arrangement area the region consisting of pixels whose estimated density is in the "low density", "medium density" or "high density" class. When the non-background area is used as the arrangement area, the background image storage means 40 and the change area extraction means 53 can be omitted.
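Modification (4) replaces the change area with the pixels whose estimated class is not background. A sketch, assuming the density distribution is a per-pixel map of class indices with a hypothetical index 0 for the background class:

```python
BACKGROUND = 0  # hypothetical class index of the background class


def arrangement_area(class_map, background=BACKGROUND):
    """Pixels whose estimated density class is not background (density > 0)."""
    return [(y, x)
            for y, row in enumerate(class_map)
            for x, k in enumerate(row)
            if k != background]
```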

(5) In the above embodiment and its modifications, the arrangement-count estimation error is calculated according to equations (6) and (7), but the dissimilarity between the density distribution and an arrangement may instead be used as the estimation error. In that case, the density estimation means 50 also outputs the density distribution to the optimum arrangement estimation means 52. For each arrangement, the optimum arrangement estimation means 52 sets an estimation extraction window at every pixel and counts the object model positions inside it to obtain the in-window arrangement count. For each arrangement, it then calculates the estimation error according to equation (8): the sum over all pixels of the difference between each pixel's in-window arrangement count and the representative density value of the corresponding pixel in the density distribution, divided by the total number of pixels.
Here N is the total number of pixels in the captured image, i is any pixel of the captured image, m_i is the in-window arrangement count calculated at pixel i, and c_i is the representative density value at pixel i. The representative density values can be set, for example, to 0 persons/m² for the background class, 1 person/m² for the low-density class, 3 persons/m² for the medium-density class and 6 persons/m² for the high-density class.
Alternatively, the optimum arrangement estimation means 52 may calculate the estimation error by converting the in-window arrangement count at each pixel into the corresponding density class value. For example, if two object model positions fall within the estimation extraction window set for a given pixel in a given arrangement, that pixel's value becomes the value representing the "low density" class. The optimum arrangement estimation means 52 compares the per-pixel density class values obtained in this way with the corresponding pixel values of the density distribution, and uses the number of pixels whose values differ, divided by the total number of pixels, as the estimation error.
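Equation (8) itself is not reproduced in the text. Reading the description literally and assuming absolute differences, it would be E = (1/N) Σ_i |m_i - c_i|. The sketch below computes m_i with a square window of hypothetical half-width `r`, then evaluates this assumed form and the class-mismatch alternative. The thresholds in `count_to_class` are hypothetical apart from the text's example that a count of 2 corresponds to the low-density class.

```python
def window_counts(positions, height, width, r):
    """m_i for every pixel: model positions inside the (2r+1) x (2r+1) window."""
    counts = [[0] * width for _ in range(height)]
    for py, px in positions:
        # A position lies in pixel (y, x)'s window iff |y - py| <= r and |x - px| <= r.
        for y in range(max(0, py - r), min(height, py + r + 1)):
            for x in range(max(0, px - r), min(width, px + r + 1)):
                counts[y][x] += 1
    return counts


def density_error(counts, rep_density):
    """Assumed equation (8): mean |m_i - c_i| over all N pixels."""
    n = sum(len(row) for row in counts)
    total = sum(abs(m - c)
                for mrow, crow in zip(counts, rep_density)
                for m, c in zip(mrow, crow))
    return total / n


def count_to_class(m):
    """Hypothetical thresholds: 0 -> background, 1-2 -> low, 3-4 -> medium, else high."""
    if m == 0:
        return 0
    if m <= 2:
        return 1
    if m <= 4:
        return 2
    return 3


def class_mismatch_error(counts, class_map):
    """Alternative: share of pixels whose count-derived class differs from the estimate."""
    n = sum(len(row) for row in counts)
    wrong = sum(count_to_class(m) != k
                for mrow, krow in zip(counts, class_map)
                for m, k in zip(mrow, krow))
    return wrong / n
```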

(6) In the above embodiment and its modifications, the arrangement count is set from the distribution density and the actual size, but it may instead be set exploratively using the arrangement-count estimation error described above. For example, the model image generation means 51 loops over temporary arrangement counts from 1 up to a predetermined upper limit (for example, the number of three-dimensional models that can be placed without overlapping in the virtual space corresponding to the captured image). For each temporary count, it assigns random positions within the arrangement area to that many object models to set T_MAX temporary arrangements, calculates an estimation error such as that of equation (6) or equation (8) for each of them, compares the error with a predetermined threshold ε, and generates model images for the temporary arrangements whose estimation error is at most ε. In this way, too, the model image generation means 51 assigns positions within the arrangement area to a number of object models that accords with the density distribution, sets multiple arrangements, and draws the object models in those arrangements to generate model images.
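A sketch of the exploratory count search of modification (6), assuming `error` is a callback implementing an estimation error such as equation (6) or (8) and `eps` is the threshold ε. The function name and signature are hypothetical.

```python
import random


def exploratory_arrangements(area_pixels, k_max, t_max, error, eps, seed=0):
    """For each temporary count k = 1..k_max, draw t_max random arrangements
    and keep those whose estimation error is at most eps."""
    rng = random.Random(seed)
    accepted = []
    for k in range(1, min(k_max, len(area_pixels)) + 1):
        for _ in range(t_max):
            layout = rng.sample(area_pixels, k)
            if error(layout) <= eps:
                accepted.append(layout)
    return accepted
```

Only the accepted arrangements go on to model-image generation, so the density distribution effectively selects the counts rather than bounding them in advance.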

(7) In the above embodiment and its modifications, GLCM (gray-level co-occurrence matrix) features are given as the features learned by the density estimator and as the estimation features extracted by the density estimation means 50. Instead of GLCM features, various other features may be used, such as local binary pattern (LBP) features, Haar-like features, luminance patterns or HOG (Histograms of Oriented Gradients) features, or a combination of GLCM features and several of these.
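For reference, a gray-level co-occurrence matrix counts how often pairs of gray levels occur at a fixed displacement. The patent does not state which displacements or GLCM statistics the density estimator uses; the sketch below uses one horizontal displacement and the common contrast statistic purely as an illustration, on images already quantised to `levels` gray levels.

```python
def glcm(image, levels, dy=0, dx=1):
    """Normalised gray-level co-occurrence matrix for one displacement (dy, dx)."""
    P = [[0] * levels for _ in range(levels)]
    h, w = len(image), len(image[0])
    pairs = 0
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                P[image[y][x]][image[ny][nx]] += 1
                pairs += 1
    return [[v / pairs for v in row] for row in P] if pairs else P


def glcm_contrast(P):
    """Contrast statistic: co-occurrence probabilities weighted by (i - j)^2."""
    n = len(P)
    return sum(P[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))
```

Crowded regions tend to produce busier textures than empty floor, which is why texture statistics like this are plausible inputs for a density classifier.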

(8) In the above embodiment and its modifications, the density estimator is trained by the multi-class SVM method, but various other density estimators may be used instead, such as ones trained by a decision-tree random forest method, a multi-class AdaBoost method or a multi-class logistic regression method.
Alternatively, the density estimation means 50 can be implemented with a method such as a CNN (Convolutional Neural Network), which expresses feature extraction and estimation by the density estimator in a single network.

(9) In the above embodiment and its modifications, the model image generation means 51 calculates the representative value c_i of the arrangement count from the density estimate of each pixel output by the density estimation means 50. Alternatively, the density estimation means 50 may output, in addition to the density estimate, the score of each class calculated during estimation, and the model image generation means 51 may correct the representative value c_i based on these scores.
The class scores are: a background score, representing the likelihood that the image from which the estimation features were extracted belongs to the "background" class rather than the other classes; a low-density score, representing the likelihood that it belongs to the "low density" class rather than the other classes; a medium-density score, representing the likelihood that it belongs to the "medium density" class rather than the other classes; and a high-density score, representing the likelihood that it belongs to the "high density" class rather than the other classes. The class with the highest of these scores is the density estimate.
The model image generation means 51 corrects the representative value c_i upward the higher the score of the class indicated by the density estimate, and downward the lower that score.
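The text states only the direction of the correction in modification (9). One hypothetical rule with that behaviour scales c_i by how far the winning class score lies above or below a reference score; `s_ref` and `gain` are assumptions, not values from the patent.

```python
def corrected_representative(c, score, s_ref=0.5, gain=1.0):
    """Hypothetical correction of c_i: raise it when the winning class score exceeds
    s_ref, lower it when the score falls below, and never let it go negative."""
    return max(0.0, c * (1.0 + gain * (score - s_ref)))
```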

(10) In the above embodiment and its modifications, the density estimator estimates four density classes, but the classes may be divided more finely.

(11) In the above embodiment and its modifications, the density estimation means 50 uses a density estimator that classifies into multiple classes, but a regression-type density estimator that regresses a density value from the features may be used instead. That is, the density estimator may learn the parameters of a regression function for obtaining density from features by ridge regression, support vector regression, a regression-tree random forest method or the like.
In that case, the model image generation means 51 uses the density value output by the density estimator as the representative value c_i of the arrangement count. The model image generation means 51 also uses lower-limit and upper-limit arrangement counts obtained by subtracting or adding predetermined values to the density value output by the density estimator, for example a lower limit of (c_i - 2), or 0 if this is 0 or less, and an upper limit of (c_i + 2).
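The bound rule of modification (11) can be written directly; the margin of 2 is the example given in the text, and the function name is hypothetical.

```python
def regression_count_bounds(c, margin=2.0):
    """Return (lower, representative, upper) from a regressed density value c.

    The lower bound c - margin is clamped to 0 when it is 0 or less.
    """
    lower = c - margin
    if lower <= 0:
        lower = 0.0
    return lower, c, c + margin
```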

(12) In the above embodiment and its modifications, the model image generation means 51 assigns a random position to each object model at every iteration. Alternatively, it may assign positions slightly shifted from those of the previous iteration, search for positions stochastically by an MCMC (Markov chain Monte Carlo) method with reference to the similarity of the previous arrangement, or improve the positions successively by hill climbing.
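Modification (12) names hill climbing and MCMC as alternatives to fresh random positions. The sketch below perturbs one model position per step by a small shift; with `temperature=None` it accepts only non-worsening moves (hill climbing), and with a positive temperature it uses a Metropolis-style acceptance rule. `score` is a hypothetical callback that should return `float('-inf')` for positions outside the arrangement area.

```python
import math
import random


def refine_positions(positions, score, steps, rng, max_shift=1, temperature=None):
    """Refine an arrangement by small random shifts of one model position per step."""
    current = list(positions)
    cur_s = score(current)
    for _ in range(steps):
        i = rng.randrange(len(current))
        y, x = current[i]
        cand = list(current)
        cand[i] = (y + rng.randint(-max_shift, max_shift),
                   x + rng.randint(-max_shift, max_shift))
        s = score(cand)
        # Always accept non-worsening moves; with a temperature, sometimes accept
        # worse ones with probability exp((s - cur_s) / temperature).
        if s >= cur_s or (temperature and
                          rng.random() < math.exp((s - cur_s) / temperature)):
            current, cur_s = cand, s
    return current, cur_s
```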

1 ... Image monitoring device
2 ... Imaging unit
3 ... Communication unit
4 ... Storage unit
5 ... Image processing unit
6 ... Output unit
30 ... Image acquisition means
31 ... Object position output means
40 ... Background image storage means
41 ... Object model storage means
50 ... Density estimation means
51 ... Model image generation means
52 ... Optimum arrangement estimation means
53 ... Change area extraction means
54 ... Size conversion means
410 ... Three-dimensional model storage means
411 ... Camera parameter storage means
412 ... Model image storage means
420 ... Camera parameter storage means
421 ... Actual size storage means
540 ... Actual size calculation means

Claims

An object position estimation device that estimates the position of each of a plurality of predetermined objects from a captured image of an estimation target space in which congestion of the objects may occur, the device comprising:
density estimation means for estimating the distribution of the density of the objects captured in the captured image, using a density estimator that has learned in advance, for each predetermined density, the features of density images obtained by capturing a space in which the objects are present at that density;
object model storage means for storing an object model imitating the object;
model image generation means for setting an arrangement area in the captured image, setting a plurality of arrangements by assigning positions within the arrangement area to a number of the object models that accords with the distribution, and drawing the object models in each of the plurality of arrangements to generate model images; and
optimum arrangement estimation means for calculating, for each of the plurality of arrangements, the similarity between its model image and the captured image, and outputting the arrangement with the highest similarity among the plurality of arrangements.

The model image generation means sets the plurality of arrangements by using a plurality of numbers of the object models within the range of the estimation error assumed for the distribution.
The optimum arrangement estimation means calculates the similarity of each arrangement to be lower the larger the estimation error corresponding to the number of object models in that arrangement.
The object position estimation device according to claim 1.

The model image generation means calculates, for each arrangement, the degree of overlap between the object models in the model image.
The optimum arrangement estimation means calculates the similarity of each arrangement to be lower the higher the degree of overlap in that arrangement.
The object position estimation device according to claim 1 or 2.

The device further comprises size conversion means for converting the size of a local region in the captured image into the actual size in the estimation target space.
The density estimation means estimates the density for each local region.
The model image generation means determines the number of object models to assign to the arrangement area from the actual size and the density of each local region included in the arrangement area.
The object position estimation device according to any one of claims 1 to 3.

The device further comprises change area extraction means for comparing the captured image with a background image of the estimation target space and extracting, as a change area, a region of the captured image that differs from the background image by at least a predetermined criterion.
The model image generation means sets the change area as the arrangement area.
The object position estimation device according to any one of claims 1 to 4.

The model image generation means sets, as the arrangement area, a region of the distribution in which the density is estimated to be greater than 0.
The object position estimation device according to any one of claims 1 to 5.

The object model storage means stores the object model imitating the shape of the object.
The optimum arrangement estimation means calculates the similarity to be higher the better the shape of the object models drawn in the model image fits the arrangement area.
The object position estimation device according to any one of claims 1 to 6.