JP2019032588A

JP2019032588A - Image analysis apparatus

Info

Publication number: JP2019032588A
Application number: JP2017151720A
Authority: JP
Inventors: 匠宗片; Takumi Munekata
Original assignee: Secom Co Ltd
Current assignee: Secom Co Ltd
Priority date: 2017-08-04
Filing date: 2017-08-04
Publication date: 2019-02-28
Anticipated expiration: 2037-08-04
Also published as: JP7005213B2

Abstract

To solve the problem that, in an apparatus that analyzes the motion of moving objects from captured images, analysis accuracy varies with the degree of congestion. SOLUTION: Image acquisition means 30 acquires captured images, taken at a plurality of times, of a space that can become crowded with predetermined moving objects. Density estimation means 50 estimates the density of the moving objects captured in an arbitrary region of the captured image, using a density estimator that has learned, for each predetermined density, the image features of density images capturing a space in which the moving objects are present at that density. Region division means 51 partitions the captured image, based on the estimated density, into segmented regions, one for each of a plurality of classes set with respect to density, and divides each segmented region into a plurality of local regions in accordance with a division criterion defined for each class. Motion vector calculation means 52 calculates a motion vector in each local region. Attention-requiring behavior detection means 53 analyzes the motion of the moving objects in the space from the motion vectors of the plurality of local regions. SELECTED DRAWING: Figure 2

Description

The present invention relates to an image analysis apparatus that analyzes the motion of moving objects from images of a space in which congestion can arise from moving objects such as people.

Motion vectors are known as one kind of basic information for analyzing the motion of moving objects captured in images.

Patent Document 1 below illustrates an image monitoring apparatus that sets local regions centered on the coordinates of feature points and performs optical flow analysis with the local region as the unit of analysis, thereby calculating motion vectors of people and placed objects and analyzing their motion. The size of these local regions is usually determined in advance.

Patent Document 2 below illustrates a crowd analysis apparatus that applies spatiotemporal segmentation to time-series images and analyzes human motion by calculating a motion vector from each of the resulting spatiotemporal segments. The spatiotemporal segmentation is performed by merging spatiotemporal segments according to the criterion given by Equation (1) in paragraph 0035 of that document. In that equation, the value of α in the relaxation term α/N determines how readily spatiotemporal segments are merged, and α is a preset value.

Thus, in the prior art, the criterion for dividing a captured image for motion vector analysis (hereinafter, the division criterion) was fixed.

Patent Document 1: JP 2013-143068 A
Patent Document 2: JP 2017-068598 A

In the conventional methods, the captured image was always divided into local regions by the same division criterion to calculate motion vectors, regardless of the degree of crowding of moving objects in the imaged space (hereinafter, density). As a result, the accuracy of analyzing the motion of moving objects could drop when the density changed.

That is, the higher the density of people, the more closely their images tend to adjoin one another, and the lower the density, the more likely their images are to be separated. For example, when the density of people is low, it is preferable to analyze detailed motion by setting local regions about the size of a body part (a hand, the head, and so on) or smaller. However, if motion continues to be analyzed with the same setting after congestion arises and the density increases, parts of neighboring people are frequently confused with one another and erroneous motion vectors are frequently calculated.

Thus, always calculating motion vectors with the same division criterion has the problem that fluctuations in congestion increase the number of erroneously calculated motion vectors and lower the accuracy of analyzing the motion of moving objects.

The present invention has been made in view of the above problem, and its object is to provide an image analysis apparatus capable of analyzing, with high accuracy, the motion of moving objects from images of a space in which congestion due to moving objects such as people can arise.

(1) An image analysis apparatus according to the present invention comprises: image acquisition means for acquiring captured images, taken at a plurality of times, of a space that can become crowded with predetermined moving objects; density estimation means for estimating the density of the moving objects captured in an arbitrary region of the captured image, using a density estimator that has learned, for each predetermined density, the image features of density images capturing a space in which the moving objects are present at that density; region division means for dividing each of the segmented regions, obtained by partitioning the captured image based on the estimated density into a plurality of classes set with respect to the density, into a plurality of local regions in accordance with a division criterion defined for each class; motion vector calculation means for calculating a motion vector in each of the local regions; and motion analysis means for analyzing the motion of the moving objects in the space from the motion vectors of the plurality of local regions.

(2) In the image analysis apparatus according to (1) above, the division criterion may take, as the local region, a region having a size predetermined with reference to the size of the moving object, with the size set larger for classes of higher density.

(3) In the image analysis apparatus according to (1) above, the division criterion may take, as the local region, a region made up of mutually similar pixels based on a pixel similarity defined by pixel value and pixel position, and may be defined so that the local regions tend to be larger for classes of higher density.

(4) In the image analysis apparatus according to (3) above, the division criterion may set the number of local regions per unit area smaller for classes of higher density.

(5) In the image analysis apparatus according to (3) above, the division criterion may set the similarity threshold at which pixels are judged to be similar to one another lower for classes of higher density.

According to the present invention, it is possible to provide an image analysis apparatus capable of analyzing, with high accuracy, the motion of moving objects from images of a space in which congestion due to moving objects can arise.

FIG. 1 is a block diagram showing the schematic configuration of an image monitoring apparatus according to an embodiment of the present invention.
FIG. 2 is a functional block diagram showing the functions of the image monitoring apparatus according to the embodiment of the present invention.
FIG. 3 is a schematic diagram showing an example of a captured image and an example of the corresponding segmented regions.
FIG. 4 is a schematic diagram of the local regions and motion vectors in each of the low-congestion, medium-congestion, and high-congestion regions.
FIG. 5 is a schematic processing flow diagram of the monitoring operation of the image monitoring apparatus according to the embodiment of the present invention.
FIG. 6 is a schematic flow diagram of an example of the attention-requiring behavior detection process in the first embodiment of the present invention.
FIG. 7 is a schematic diagram showing examples of captured images consisting of a low-congestion region, a medium-congestion region, and a high-congestion region, respectively, and examples of the corresponding local regions.
FIG. 8 is a schematic flow diagram of an example of the attention-requiring behavior detection process in the second embodiment of the present invention.

An image monitoring apparatus 1 according to an embodiment of the present invention (hereinafter, the embodiment) is described below with reference to the drawings.

[First Embodiment]
FIG. 1 is a block diagram showing the schematic configuration of the image monitoring apparatus 1. The image monitoring apparatus 1 is configured using the image analysis apparatus according to the present invention and comprises an imaging unit 2, a communication unit 3, a storage unit 4, an image processing unit 5, and a display unit 6.

The imaging unit 2 is a surveillance camera. It is connected to the image processing unit 5 via the communication unit 3 and serves as imaging means that captures, at predetermined time intervals, a monitored space that can become crowded with predetermined objects, and outputs the captured images.

For example, the imaging unit 2 is mounted on a pole installed at an event venue, with a field of view that looks down on the monitored space. The field of view may be fixed, or may be changed according to a preset schedule or an external instruction received via the communication unit 3. Also, for example, the imaging unit 2 captures the monitored space at a frame period of one second and generates color images. Monochrome images may be generated instead of color images.

The communication unit 3 is a communication circuit. One end is connected to the image processing unit 5, and the other end is connected to the imaging unit 2 and the display unit 6 via a coaxial cable or a communication network such as a LAN (Local Area Network) or the Internet. The communication unit 3 acquires captured images from the imaging unit 2 and inputs them to the image processing unit 5, and outputs the analysis results received from the image processing unit 5 to the display unit 6.

The storage unit 4 is a memory device such as a ROM (Read Only Memory) or RAM (Random Access Memory) and stores various programs and data. The storage unit 4 is connected to the image processing unit 5 and exchanges this information with it.

The image processing unit 5 is composed of a processing device such as a CPU (Central Processing Unit), DSP (Digital Signal Processor), or MCU (Micro Control Unit). The image processing unit 5 is connected to the storage unit 4; by reading programs from the storage unit 4 and executing them, it operates as various processing and control means, and it writes various data to and reads them from the storage unit 4. The image processing unit 5 is also connected to the imaging unit 2 and the display unit 6 via the communication unit 3. It analyzes the motion of people by analyzing the captured images acquired from the imaging unit 2 via the communication unit 3, and outputs the analysis results and the captured images to the display unit 6 via the communication unit 3.

The display unit 6 is a display device such as a liquid crystal display or a CRT (Cathode Ray Tube) display. It is connected to the image processing unit 5 via the communication unit 3 and serves as display means that displays the analysis results from the image processing unit 5. A monitoring operator views the captured images while referring to the displayed analysis results, judges whether abnormal behavior or the like has occurred, and takes action as needed, such as changing the deployment of personnel.

Although this embodiment illustrates an image monitoring apparatus 1 in which the imaging unit 2 and the image processing unit 5 are in a one-to-one relationship, in other embodiments the relationship between imaging units 2 and image processing units 5 may be many-to-one or many-to-many.

FIG. 2 is a functional block diagram showing the functions of the image monitoring apparatus 1. The communication unit 3 functions as image acquisition means 30, attention-requiring information output means 31, and so on. The storage unit 4 functions as time-series image storage means 40, density estimator storage means 41, detection criterion storage means 42, and so on. The image processing unit 5 functions as density estimation means 50, region division means 51, motion vector calculation means 52, attention-requiring behavior detection means 53 (motion analysis means), and so on.

The image acquisition means 30 sequentially acquires captured images from the imaging unit 2, which is the imaging means, sequentially outputs the acquired images to the density estimation means 50, and sequentially appends them to the time-series image storage means 40.

The time-series image storage means 40 stores the captured images input from the image acquisition means 30 in time series and outputs the time-series images, that is, the captured images arranged in order of capture time, to the motion vector calculation means 52. The time-series image storage means 40 stores at least the captured images of the time interval required by the motion vector calculation means 52, and preferably deletes images once they are no longer needed. For example, the time-series image storage means 40 stores, in a circular manner, the captured images of the current time and of one to four time steps earlier (that is, the latest five frames).
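
The circular storage of the latest five frames can be sketched as a fixed-capacity buffer. This is a minimal illustration only; the class name `FrameBuffer` and its methods are hypothetical and do not appear in the specification.

```python
from collections import deque


class FrameBuffer:
    """Hypothetical helper: keeps only the most recent `capacity` frames."""

    def __init__(self, capacity=5):
        # A deque with maxlen silently drops the oldest frame when full.
        self._frames = deque(maxlen=capacity)

    def push(self, frame):
        """Store the newest captured image."""
        self._frames.append(frame)

    def frame_at(self, steps_back):
        """Return the frame `steps_back` time steps before the newest (0 = newest)."""
        if not 0 <= steps_back < len(self._frames):
            raise IndexError("requested frame is not in the buffer")
        return self._frames[-1 - steps_back]

    def __len__(self):
        return len(self._frames)
```

With a capacity of five, `frame_at(0)` is the current frame and `frame_at(4)` is the frame four time steps earlier, which is the oldest frame any later step of this embodiment needs.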

The density estimator storage means 41 stores in advance information representing an estimator (density estimator). The density estimator is an estimated-density calculation function that has learned, for each predetermined density, the image features of images (density images) capturing a space in which moving objects (people) are present at that density; given the feature quantity of an image, it calculates and outputs an estimate (estimated density) of the density of the moving objects captured in that image. In other words, the density estimator storage means 41 stores in advance parameters such as the coefficients of the estimated-density calculation function as the information on the density estimator.

The density estimation means 50 estimates, for an arbitrary region in the captured image input from the image acquisition means 30, the density of the moving objects captured in that region. Specifically, the density estimation means 50 extracts feature quantities for density estimation (estimation feature quantities) from locations throughout the captured image, reads the density estimator from the density estimator storage means 41, and estimates the density by inputting each extracted estimation feature quantity to the density estimator. This yields the distribution of estimated density within the captured image (the density distribution of the moving objects), and the density estimation means 50 outputs the estimated density distribution to the motion vector calculation means 52.

The density estimation process and the density estimator are described concretely below.

The density estimation means 50 sets a window (estimation extraction window) at the position of each pixel of the captured image and extracts an estimation feature quantity from the captured image within each estimation extraction window. The estimation feature quantity is a GLCM (Gray Level Co-occurrence Matrix) feature.
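
As an illustration of what a GLCM feature involves, the sketch below builds a normalized co-occurrence matrix for one window and derives two common texture statistics from it. The specification does not state which offsets, gray-level quantization, or statistics are used, so the offset `(dx, dy)`, the `levels` parameter, and the choice of contrast and energy are all assumptions made for this example.

```python
def glcm(window, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix of a 2D window.

    window: list of rows of integer gray levels in range(levels).
    (dx, dy): pixel offset of the co-occurring neighbor (assumed here).
    """
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(window), len(window[0])
    total = 0
    for y in range(h):
        for x in range(w):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < w and 0 <= y2 < h:
                counts[window[y][x]][window[y2][x2]] += 1
                total += 1
    if total == 0:
        return counts  # window too small for this offset
    return [[c / total for c in row] for row in counts]


def glcm_contrast(p):
    """Contrast: co-occurrence probabilities weighted by squared level difference."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def glcm_energy(p):
    """Energy (angular second moment): sum of squared probabilities."""
    return sum(v * v for row in p for v in row)
```

A feature vector for one estimation extraction window could then be assembled from such statistics over several offsets; that assembly is likewise left open by the text.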

It is desirable that the regions of the monitored space captured in the estimation extraction windows all be the same size. Preferably, therefore, the density estimation means 50 reads the prestored camera parameters of the imaging unit 2 from camera parameter storage means (not shown), uses a homography transformation based on the camera parameters to warp the captured image so that the region of the monitored space captured at any pixel has the same size, and then extracts the estimation feature quantities.

The density estimator can be implemented as a classifier that discriminates among multiple classes of images, and can be a discriminant function learned by the multi-class SVM (Support Vector Machine) method.

The density can be defined, for example, in four classes: a "background" class in which no people are present, a "low density" class of more than 0 persons/m² and at most 2 persons/m², a "medium density" class of more than 2 persons/m² and at most 4 persons/m², and a "high density" class of more than 4 persons/m².
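
The class boundaries above can be written out as a small labeling function. This is a sketch of how a known ground-truth density might be mapped to a class label, for example when labeling training images; the function name `density_class` is hypothetical. At estimation time the apparatus infers the class from image features, not from a head count.

```python
def density_class(people_per_m2):
    """Map a ground-truth density in persons/m^2 to one of the four classes."""
    if people_per_m2 < 0:
        raise ValueError("density cannot be negative")
    if people_per_m2 == 0:
        return "background"       # no people present
    if people_per_m2 <= 2:
        return "low density"      # more than 0, at most 2
    if people_per_m2 <= 4:
        return "medium density"   # more than 2, at most 4
    return "high density"         # more than 4
```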

The estimated density is a value assigned to each class in advance and is the value output as the result of distribution estimation. In this embodiment, the values corresponding to the classes are written "background", "low density", "medium density", and "high density".

That is, the density estimator is a discriminant function for distinguishing the images of each class from those of the other classes, obtained by applying the multi-class SVM method to the feature quantities of a large number of images (density images) belonging to each of the "background", "low density", "medium density", and "high density" classes. The parameters of the discriminant function derived by this learning are stored as the density estimator. The feature quantity of the density images is of the same kind as the estimation feature quantity, namely a GLCM feature.

The density estimation means 50 obtains the estimated density as the output value by inputting each estimation feature quantity extracted for each pixel to the density estimator. When the estimation feature quantities were extracted from a warped captured image, the density estimation means 50 warps the density distribution back to the shape of the original captured image by a homography transformation using the camera parameters.

The set of estimated densities for each pixel of the captured image obtained in this way is the density distribution.
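
The per-pixel estimation loop can be sketched as follows. The feature extractor and the estimator are passed in as callables because the specification's trained multi-class SVM is not reproduced here; the function name `density_distribution`, the square window, and the clipping of windows at the image border are assumptions of this sketch.

```python
def density_distribution(image, window, extract_feature, estimator):
    """Estimate a density class at every pixel.

    image: 2D list of gray levels.
    window: odd side length of the square extraction window (assumed shape).
    extract_feature: callable mapping a window (2D list) to a feature.
    estimator: callable mapping a feature to an estimated density class.
    """
    half = window // 2
    h, w = len(image), len(image[0])
    distribution = []
    for y in range(h):
        row = []
        for x in range(w):
            # Window centered on (x, y), clipped at the image border.
            patch = [r[max(0, x - half):x + half + 1]
                     for r in image[max(0, y - half):y + half + 1]]
            row.append(estimator(extract_feature(patch)))
        distribution.append(row)
    return distribution
```

In a real system `extract_feature` would compute the GLCM feature and `estimator` would apply the learned discriminant function.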

The region division means 51 refers to the density distribution input from the density estimation means 50, partitions the captured image according to density, divides each partitioned region into a plurality of local regions according to a division criterion defined according to density, and outputs the division result to the motion vector calculation means 52. Hereinafter, the regions partitioned according to density are called segmented regions.

Specifically, the region division means 51 first partitions the captured image, based on the density distribution estimated by the density estimation means 50, into segmented regions, one for each of a plurality of classes set with respect to density. In this embodiment, the classes are defined as follows: of the density classes output by the density estimation means 50, "background" and "low density" are merged into a single class called "low congestion", while "medium density" and "high density" are defined as the classes "medium congestion" and "high congestion", respectively. Corresponding to these three classes, the captured image is partitioned into three kinds of segmented regions: a low-congestion region consisting of the pixels whose estimated density is the "background" class or the "low density" class, a medium-congestion region consisting of the pixels whose estimated density is the "medium density" class, and a high-congestion region consisting of the pixels whose estimated density is the "high density" class.
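
The merge of four density classes into three congestion classes, and the resulting partition of the image, can be sketched as below. The short labels `"low"`, `"medium"`, and `"high"` and the boolean-mask representation of a segmented region are choices made for this example.

```python
# Density class -> congestion class, as described in this embodiment.
CONGESTION_OF = {
    "background": "low",
    "low density": "low",
    "medium density": "medium",
    "high density": "high",
}


def to_congestion_map(density_map):
    """Convert a per-pixel density distribution to per-pixel congestion classes."""
    return [[CONGESTION_OF[d] for d in row] for row in density_map]


def segment_masks(congestion_map):
    """Return one boolean mask per congestion class (the segmented regions)."""
    return {
        level: [[c == level for c in row] for row in congestion_map]
        for level in ("low", "medium", "high")
    }
```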

FIG. 3 is a schematic diagram showing an example of a captured image and an example of the corresponding segmented regions. FIG. 3(a) is an example of a captured image showing images 60 of people. FIG. 3(b) shows the segmented regions: the white portion is the low-congestion region, the hatched portion is the medium-congestion region, and the shaded portion is the high-congestion region.

Next, the region division means 51 divides the segmented regions into a plurality of local regions in accordance with the division criterion defined for each density class, that is, for each congestion level. In other words, each segmented region in the captured image is divided into a plurality of local regions by the division criterion defined for the congestion level of that segmented region.

In this embodiment, the captured image is divided into unit blocks, and local regions are defined in units of these blocks. For example, the captured image can be divided into a grid at intervals estimated at about one eighth of the size of a standing person as captured in the image, and the resulting rectangular regions can be used as unit blocks. The region division means 51 sets, in each segmented region partitioned according to congestion level, local regions consisting of a number of unit blocks predetermined according to the congestion level, thereby dividing the captured image of each segmented region into local regions of a size that depends on the density.

Specifically, with the horizontal direction of the captured image as the X axis and the vertical direction as the Y axis, the captured image is divided along the X-axis and Y-axis directions to define the unit blocks. In the low-congestion region, each unit block is set as a local region. As a result, local regions at least about the size of a person's hand and at most about the size of a person's head, for example, are set in the low-congestion region.

In the medium-congestion region, each integrated block formed by merging two unit blocks is set as a local region. For example, each such integrated block consists of two unit blocks adjacent in the Y-axis direction, and the blocks are arranged within the medium-congestion region at a pitch of one block in the X-axis direction and two blocks in the Y-axis direction. As a result, local regions at least about the size of a person's head and at most about the size of a person's upper body are set in the medium-congestion region.

In the high-congestion region, each integrated block formed by merging four unit blocks is set as a local region. For example, each such integrated block consists of four unit blocks in a 2×2 arrangement, two in the X-axis direction by two in the Y-axis direction, and the blocks are arranged within the high-congestion region at a pitch of two blocks in each of the X-axis and Y-axis directions. As a result, local regions at least about the size of a person's upper body and at most about the size of a person's whole body are set in the high-congestion region.
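
The block-based division for the three congestion levels can be sketched as follows. The sketch assumes one congestion label per unit block and assumes that an integrated block straddling a segment boundary is clipped to the blocks of its own class; the specification does not say how labels are aggregated per block or how boundaries are handled. The names `REGION_SHAPE`, `unit_block_size`, and `local_regions` are hypothetical.

```python
# (width, height) of a local region in unit blocks, per congestion level.
REGION_SHAPE = {"low": (1, 1), "medium": (1, 2), "high": (2, 2)}


def unit_block_size(person_height_px):
    """Grid pitch: about one eighth of a standing person's height in pixels."""
    return max(1, person_height_px // 8)


def local_regions(block_labels):
    """Group unit blocks into local regions according to their congestion level.

    block_labels: 2D list with one congestion label per unit block (assumed).
    Returns a list of (label, [(bx, by), ...]) with block coordinates.
    """
    regions = {}
    for by, row in enumerate(block_labels):
        for bx, label in enumerate(row):
            w, h = REGION_SHAPE[label]
            # Blocks sharing a label and a tile index form one local region.
            key = (label, bx // w, by // h)
            regions.setdefault(key, []).append((bx, by))
    return [(label, blocks) for (label, _, _), blocks in regions.items()]
```

Every unit block ends up in exactly one local region, so the regions of all three levels together tile the image.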

FIG. 4 is a schematic diagram of the local regions and motion vectors at each congestion level; FIGS. 4(a) to 4(c) show the local regions and motion vectors in the low-, medium-, and high-congestion regions, respectively. The local regions in FIG. 4 are an example set based on the unit blocks described above. In the low-congestion region shown in FIG. 4(a), each cell in the captured image 70 is a unit block, and each unit block is a local region 72a. In the medium-congestion region shown in FIG. 4(b), each cell in the captured image 70 is an integrated block consisting of two unit blocks aligned in the Y-axis direction, and each integrated block is a local region 72b. In the high-congestion region shown in FIG. 4(c), each cell in the captured image 70 is an integrated block consisting of four unit blocks, two in the X-axis direction by two in the Y-axis direction, and each integrated block is a local region 72c.

As described above, the region division means 51 divides each of the segmented regions, obtained by partitioning the captured image according to density, into a plurality of local regions according to the division criterion defined according to density. In doing so, the region division means 51 divides each segmented region into local regions whose size is based on the size of the moving object and is predetermined to be larger the higher the density.

The motion vector calculation means 52 calculates a motion vector in each of the local regions set by the region division means 51 and outputs the calculated motion vectors to the attention-requiring behavior detection means 53.

In the low-congestion region, where density estimation has shown that people are not close to one another, motion vectors are calculated for small local regions about one eighth the size of a person. Motion vectors representing detailed motion, such as movement of the hands and feet, can therefore be expected to be calculated with high accuracy, without confusing the local regions of different people.

On the other hand, in the medium and high congestion areas, where the density estimation shows that people are close to one another, motion vectors are calculated for larger local regions about one-quarter to one-half the size of a person. Using a larger local region as the unit of calculation yields a motion vector that represents the movement of a cluster of body parts of several people that may fall within the local region. Because the positional relationship among these people becomes less likely to change over a short time as congestion increases, enlarging the local region does not readily degrade the accuracy of the motion vector. Motion vectors can therefore be calculated accurately regardless of the degree of congestion.

Preferably, the motion vector is calculated from the image motion over a period (analysis time interval) that is set longer as the estimated density is higher. That is, the motion vector calculation means 52 switches the analysis time interval for the motion vector of each local region depending on whether that local region belongs to the low, medium, or high congestion area. For example, the motion vector calculation means 52 calculates motion vectors with an analysis time interval of one time step (one frame interval) in the low congestion area, two time steps (two frame intervals) in the medium congestion area, and four time steps (four frame intervals) in the high congestion area.
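The switching of the analysis time interval can be illustrated with a small lookup. The sketch below assumes frames are held in a sequence ordered from oldest to newest. The intervals of 1, 2, and 4 frames come from the example above, while the names `FRAME_INTERVAL` and `reference_frame` and the class labels are hypothetical.

```python
# Analysis time interval in frames for each congestion class
# (values from the example in the text; labels are illustrative).
FRAME_INTERVAL = {"low": 1, "medium": 2, "high": 4}


def reference_frame(frames, congestion):
    """Return the past frame to compare with the newest frame.

    `frames` is ordered oldest to newest, so frames[-1] is the current
    frame and frames[-1 - k] is the frame k time steps earlier.
    """
    k = FRAME_INTERVAL[congestion]
    if len(frames) <= k:
        raise ValueError("not enough frames stored for this interval")
    return frames[-1 - k]
```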

Specifically, the motion vector calculation means 52 reads the captured image at the current time and the captured image one time step earlier from the time-series image storage means 40, and sets a predetermined search range for each local region belonging to the low congestion area in the current captured image (local region of interest). From the local regions set in the captured image one time step earlier, it detects, for each local region of interest, the corresponding local region that lies within the search range and whose feature value is most similar. It then calculates the vector whose start point is the centroid of the corresponding local region and whose end point is the centroid of the local region of interest as the motion vector at the current time in the low congestion area.

Similarly, the motion vector calculation means 52 reads the captured image at the current time and the captured image two time steps earlier from the time-series image storage means 40, and sets a predetermined search range for each local region of interest belonging to the medium congestion area in the current captured image. From the local regions set in the captured image two time steps earlier, it detects, for each local region of interest, the corresponding local region that lies within the search range and whose feature value is most similar, and calculates the vector connecting the centroids of the corresponding local region and the local region of interest as the motion vector at the current time in the medium congestion area.

Likewise, the motion vector calculation means 52 reads the captured image at the current time and the captured image four time steps earlier from the time-series image storage means 40, and sets a predetermined search range for each local region of interest belonging to the high congestion area in the current captured image. From the local regions set in the captured image four time steps earlier, it detects, for each local region of interest, the corresponding local region that lies within the search range and whose feature value is most similar, and calculates the vector connecting the centroids of the corresponding local region and the local region of interest as the motion vector at the current time in the high congestion area.

Here, the feature value can be, for example, the average pixel value (average color or average intensity). The search range can be set to the extent over which a moving object subject to motion analysis can move. For example, a circle of predetermined radius centered on the centroid of each local region of interest can be set as the search range, and the radius can be predetermined as, for example, the distance a person can cover by running during one time step. An increase in congestion lowers the speed at which people can move. Taking this effect into account, the analysis time interval is set longer as the congestion level is higher. Conversely, because of this effect, the extent over which a moving object can move in the medium and high congestion areas, where the analysis time interval is longer than in the low congestion area, does not grow in proportion to the analysis time interval. From this viewpoint, the size of the search range can be made common to all congestion classes, as described above.
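The matching step described above can be sketched as a nearest-feature search within a circular search range. This is a minimal sketch, assuming each local region is summarized by its centroid and a scalar average pixel value; the dictionary keys and the function name `motion_vector` are hypothetical.

```python
import math


def motion_vector(target, previous_regions, radius):
    """Match one local region of interest against regions of a past frame.

    `target` and each entry of `previous_regions` are dicts with
    'centroid' (x, y) and 'feature' (average pixel value). Among the past
    regions whose centroid lies within `radius` of the target centroid,
    the one with the closest feature is the corresponding local region.
    Returns the vector from its centroid to the target centroid, or None
    when no past region lies inside the search range.
    """
    tx, ty = target["centroid"]
    candidates = [
        r for r in previous_regions
        if math.hypot(r["centroid"][0] - tx, r["centroid"][1] - ty) <= radius
    ]
    if not candidates:
        return None
    best = min(candidates, key=lambda r: abs(r["feature"] - target["feature"]))
    bx, by = best["centroid"]
    return (tx - bx, ty - by)
```

Because the same `radius` is passed for every congestion class, the sketch also reflects the point that the search range can be shared across classes even though the frame interval differs.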

Motion vectors are described with reference to FIG. 4. FIGS. 4A to 4C show captured images 70 at a plurality of times T for the low, medium, and high congestion areas, respectively. In the motion vector calculation for the low congestion area shown in FIG. 4A, the corresponding local region is searched for in the captured image 70 one time step earlier (T = t-1) as the start point of the motion vector 74a of a low-congestion local region of interest in the captured image 70 at the current time (T = t). In the motion vector calculation for the medium congestion area shown in FIG. 4B, the corresponding local region is searched for in the captured image 70 two time steps earlier (T = t-2) as the start point of the motion vector 74b of a medium-congestion local region of interest in the captured image 70 at the current time (T = t). In the motion vector calculation for the high congestion area shown in FIG. 4C, the corresponding local region is searched for in the captured image 70 four time steps earlier (T = t-4) as the start point of the motion vector 74c of a high-congestion local region of interest in the captured image 70 at the current time (T = t).

検出基準記憶手段４２は、要注視行動を検出するために予め定められた検出基準を記憶している。この検出基準は混雑度合いごとに記憶され、各検出基準はそれぞれに対応する混雑度合いの領域において算出された動き分布に基づく要注視行動の検出に用いられる。 The detection criterion storage means 42 stores a predetermined detection criterion for detecting a gaze action requiring attention. This detection criterion is stored for each degree of congestion, and each detection criterion is used for detecting a gaze action requiring attention based on the motion distribution calculated in the corresponding congestion degree region.

要注視行動検出手段５３は、動きベクトル算出手段５２から複数の局所領域の動きベクトルを入力され、それら動きベクトルから撮影空間における移動物体の動きを解析することによって移動物体による要注視行動を検出し、検出した要注視行動の情報（要注視情報）を要注視情報出力手段３１に出力する。 The gaze action detecting means 53 receives motion vectors of a plurality of local areas from the motion vector calculation means 52, and detects the gaze action due to the moving object by analyzing the movement of the moving object in the imaging space from the motion vectors. Then, the detected gaze action information (gaze information) is output to the gaze information output means 31.

For each congestion level, the gaze behavior detection means 53 aggregates the motion vectors calculated in the area of that congestion level to calculate a motion distribution, reads the detection criterion corresponding to that congestion level from the detection criterion storage means 42, and compares the motion distribution with the detection criterion to determine whether behavior requiring attention is occurring in the area of that congestion level. For example, the gaze behavior detection means 53 aggregates the motion vectors for each congestion level to calculate a frequency distribution of movement direction and/or a frequency distribution of speed, and detects behavior requiring attention by comparing these with the detection criterion stored in association with that congestion level.
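One possible way to aggregate motion vectors into the two frequency distributions is shown below. The specification does not give bin counts or bin widths, so the default parameters, the choice to leave zero-length vectors out of the direction histogram, and the function name `motion_distributions` are all assumptions made for illustration.

```python
import math


def motion_distributions(vectors, direction_bins=8,
                         speed_bin_width=1.0, speed_bins=10):
    """Build direction and speed histograms from (dx, dy) motion vectors.

    Directions are binned over [0, 2*pi). Speeds are binned in steps of
    `speed_bin_width`, with the last bin collecting all larger speeds.
    Zero-length vectors count toward the speed histogram only, since
    they have no direction.
    """
    direction_hist = [0] * direction_bins
    speed_hist = [0] * speed_bins
    for dx, dy in vectors:
        speed = math.hypot(dx, dy)
        speed_hist[min(int(speed / speed_bin_width), speed_bins - 1)] += 1
        if speed > 0:
            angle = math.atan2(dy, dx) % (2 * math.pi)
            index = int(angle / (2 * math.pi) * direction_bins)
            direction_hist[index % direction_bins] += 1
    return direction_hist, speed_hist
```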

Here, for example, when the associated detection criterion consists of a gaze pattern, which is a feature of behavior requiring attention, and a threshold, the gaze behavior detection means 53 calculates the similarity between the gaze pattern and the motion distribution and determines that behavior requiring attention is occurring when the similarity is equal to or greater than the threshold. When the associated detection criterion consists of a normal pattern, which is a feature of normal behavior, and a threshold, the gaze behavior detection means 53 calculates the dissimilarity between the normal pattern and the motion distribution and determines that behavior requiring attention is occurring when the dissimilarity is equal to or greater than the threshold.

When the gaze behavior detection means 53 determines that behavior requiring attention is occurring, it generates, as gaze information, a monitoring image on which the region where the motion distribution satisfying the detection criterion was calculated and the event name corresponding to the satisfied detection criterion are superimposed, and outputs the generated gaze information to the gaze information output means 31.

要注視情報出力手段３１は要注視行動検出手段５３から入力された要注視情報を表示部６に順次出力し、表示部６は要注視情報出力手段３１から入力された要注視情報に含まれる情報を表示する。例えば、要注視情報はインターネット経由で送受信され、表示部６に表示される。監視員は、表示された情報を視認することによって要注視行動の対処要否を判断し、対処が必要と判断すると対処員を派遣するなどの対処を行う。 The gaze information output means 31 sequentially outputs the gaze information input from the gaze behavior detection means 53 to the display unit 6, and the display unit 6 includes information included in the gaze information input from the gaze information output means 31. Is displayed. For example, the attention required information is transmitted / received via the Internet and displayed on the display unit 6. The monitoring person determines whether or not the action requiring attention is necessary by visually checking the displayed information, and takes measures such as dispatching a handling person when it is determined that the action is necessary.

次に、画像監視装置１の動作について説明する。図５は画像監視装置１における監視動作の概略の処理フロー図である。 Next, the operation of the image monitoring apparatus 1 will be described. FIG. 5 is a schematic process flow diagram of the monitoring operation in the image monitoring apparatus 1.

The imaging unit 2 images the monitored space and sequentially inputs the captured images to the image processing unit 5. The image processing unit 5 operates as the image acquisition means 30, acquires a captured image from the imaging unit 2 (step S1), and inputs it to the storage unit 4. The storage unit 4 functions as the time-series image storage means 40 and stores and accumulates the input captured images (step S2).

Because calculating the motion vectors used to detect behavior requiring attention requires a predetermined number of captured frames, the image processing unit 5 repeats steps S1 and S2 until that number of captured images has been accumulated in the time-series image storage means 40 ("NO" in step S3). In the present embodiment, the number of frames is five.
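The accumulation check of step S3 can be pictured as a fixed-length buffer. The sketch below is an assumption about one convenient data structure, not the storage scheme of the embodiment. The class name `FrameBuffer` and its methods are hypothetical, and only the frame count of five comes from the text. Five frames are enough to reach back four time steps, which is the longest analysis interval used.

```python
from collections import deque

REQUIRED_FRAMES = 5  # frame count stated for the present embodiment


class FrameBuffer:
    """Keep only the most recent frames and report when analysis can start."""

    def __init__(self, size=REQUIRED_FRAMES):
        self._frames = deque(maxlen=size)

    def push(self, frame):
        # Steps S1 and S2: acquire and store; the oldest frame drops out
        # automatically once the buffer is full.
        self._frames.append(frame)

    def ready(self):
        # Step S3: True once the required number of frames is stored.
        return len(self._frames) == self._frames.maxlen

    def frame_at(self, steps_back):
        # steps_back = 0 is the current frame, 4 is four time steps earlier.
        return self._frames[-1 - steps_back]
```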

時系列画像記憶手段４０に所定フレーム数の撮影画像が蓄積されると（ステップＳ３にて「ＹＥＳ」の場合）、画像処理部５は密度推定手段５０として動作し、密度推定手段５０は撮影画像の各画素の位置に推定用抽出窓を設定し、各推定用抽出窓における撮影画像から抽出した推定用特徴量に基づいて当該画素における移動物体の推定密度を算出する（ステップＳ４）。 When a predetermined number of frames of captured images are accumulated in the time-series image storage means 40 (in the case of “YES” in step S3), the image processing unit 5 operates as the density estimation means 50, and the density estimation means 50 is a captured image. An estimation extraction window is set at the position of each pixel, and the estimated density of the moving object at the pixel is calculated based on the estimation feature amount extracted from the captured image in each estimation extraction window (step S4).

Once the density estimation means 50 has obtained the distribution of estimated density in the captured image, the image processing unit 5 operates as the area dividing means 51 and partitions the captured image into areas by congestion level (step S5). The captured image is thereby partitioned into a low congestion area, which is the group of pixels whose estimated density is "background" or "low density"; a medium congestion area, which is the group of pixels whose estimated density is "medium density"; and a high congestion area, which is the group of pixels whose estimated density is "high density".
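The partitioning of step S5 amounts to mapping each pixel's estimated density class to a congestion level. A minimal sketch follows, assuming the density estimate is available as a 2D list of string labels. The label strings, the dictionary `CONGESTION_OF_DENSITY`, and the function name are hypothetical, while the mapping itself follows the text.

```python
# Mapping from estimated density class to congestion level, as described
# for step S5. The label strings themselves are illustrative.
CONGESTION_OF_DENSITY = {
    "background": "low",
    "low": "low",
    "medium": "medium",
    "high": "high",
}


def partition_by_congestion(density_map):
    """Group pixel coordinates by congestion level.

    `density_map` is a 2D list (rows of pixels) of density labels.
    Returns a dict mapping each congestion level to a list of (x, y).
    """
    areas = {"low": [], "medium": [], "high": []}
    for y, row in enumerate(density_map):
        for x, label in enumerate(row):
            areas[CONGESTION_OF_DENSITY[label]].append((x, y))
    return areas
```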

The image processing unit 5 sequentially sets the area of each congestion level as the processing area (step S6) and performs the gaze behavior detection process (step S7). Steps S6 and S7 are repeated until the gaze behavior detection process has been completed for all of the low, medium, and high congestion areas ("NO" in step S8). When all areas have been processed ("YES" in step S8) and behavior requiring attention has been detected ("YES" in step S9), the gaze information is output to the gaze information output means 31 (step S10), and the process returns to step S1. If no behavior requiring attention has been detected ("NO" in step S9), step S10 is skipped. When returning to step S1, the image processing unit 5 stores the captured image at the current time and the local region information in the storage unit 4.

図６は要注視行動検出処理Ｓ７の概略のフロー図である。図６に示す処理では、撮影画像を格子状に分割して、混雑度が低、中、高の各領域についての局所領域の設定に用いる単位ブロックを設定する（ステップＳ１００）。 FIG. 6 is a schematic flowchart of the gaze action detection process S7 requiring attention. In the process shown in FIG. 6, the captured image is divided into a grid pattern, and unit blocks used for setting local areas for the low, medium, and high congestion areas are set (step S100).

When the set processing area is the low congestion area ("YES" in step S102), the area dividing means 51 sets each unit block as a local region (step S103). The motion vector calculation means 52 then reads from the storage unit 4 the captured image one time step earlier and the local region information set in the processing one time step earlier, calculates a motion vector for each local region set in the low congestion area of the current captured image with an analysis time interval of one time step (one frame interval) (step S104), and aggregates the motion vectors calculated for the low congestion area to calculate a frequency distribution of movement direction and a frequency distribution of speed (step S105).

Once the motion vector calculation means 52 has calculated the motion distribution for the low congestion area, the gaze behavior detection means 53 checks whether the motion distribution satisfies the detection criterion for low congestion (step S106). Specifically, the gaze behavior detection means 53 reads the detection criterion for low congestion from the detection criterion storage means 42. That is, it reads the normal pattern of the motion distribution and the threshold T_L11. It then determines whether each distribution obtained in step S105 satisfies the detection criterion for behavior requiring attention.

For example, the gaze behavior detection means 53 compares each frequency distribution obtained as the motion distribution in step S105 with its corresponding normal pattern to calculate a dissimilarity. As the dissimilarity, the area difference D_L11 between the motion distribution and its normal pattern can be calculated. The area difference D_L11 is then compared with the threshold T_L11. If D_L11 ≥ T_L11, the detection criterion is determined to be satisfied ("YES" in step S106); if D_L11 < T_L11, the detection criterion is determined not to be satisfied ("NO" in step S106).
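The comparison of step S106 can be sketched as follows. The specification names an "area difference" between histograms without defining it, so this sketch assumes it is the sum of absolute bin-wise differences. That interpretation and the function names are assumptions.

```python
def area_difference(hist, normal):
    """Dissimilarity D_L11, assumed here to be the sum of absolute
    bin-wise differences between a histogram and its normal pattern."""
    return sum(abs(a - b) for a, b in zip(hist, normal))


def meets_low_congestion_criterion(hist, normal, t_l11):
    """True when D_L11 >= T_L11, i.e. the distribution deviates enough
    from the normal pattern to be flagged."""
    return area_difference(hist, normal) >= t_l11
```

For example, a histogram [4, 2, 0] compared with a normal pattern [1, 2, 3] gives an area difference of 6 under this definition, which satisfies the criterion for any threshold up to 6.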

If D_L11 ≥ T_L11, there is a local region in the low congestion area that is accelerating or decelerating sharply, and a hand movement during a snatch theft, escape behavior after a snatch theft, or a hand movement or approach behavior before a snatch theft may be occurring. When a distribution satisfying the detection criterion for behavior requiring attention is detected in this way ("YES" in step S106), the gaze behavior detection means 53 generates and records gaze information for that distribution (step S107), and the process proceeds to step S8 in FIG. 5. For example, the gaze behavior detection means 53 generates, as gaze information, the event name "possible snatch theft or the like" corresponding to the detection criterion satisfied by the distribution, together with the coordinates of the local regions that constitute the extraction target area. If the distribution does not satisfy the detection criterion ("NO" in step S106), step S107 is skipped.

When the processing area set for the gaze behavior detection process S7 is the medium congestion area ("NO" in step S102 and "YES" in step S108), the area dividing means 51 sets, as described above, integrated blocks each consisting of two unit blocks arranged in the Y-axis direction as local regions (step S109). The motion vector calculation means 52 then reads from the storage unit 4 the captured image two time steps earlier and the local region information set in the processing two time steps earlier, calculates a motion vector for each local region set in the medium congestion area of the current captured image with an analysis time interval of two time steps (two frame intervals) (step S110), and aggregates the motion vectors calculated for the medium congestion area to calculate a motion distribution (step S111). For example, the motion vector calculation means 52 calculates a frequency distribution of movement direction as the motion distribution of the medium congestion area.

Once the motion vector calculation means 52 has calculated the motion distribution for the medium congestion area, the gaze behavior detection means 53 checks whether the motion distribution satisfies the detection criterion for medium congestion (step S112). Specifically, the gaze behavior detection means 53 reads the detection criterion for medium congestion from the detection criterion storage means 42. That is, it reads a plurality of frequency distributions of movement direction whose frequencies are biased toward a specific direction, together with their threshold T_M11. It also reads a frequency distribution of movement direction with no directional bias, together with its threshold T_M12. These frequency distributions correspond to gaze patterns.

The gaze behavior detection means 53 compares the frequency distribution of movement direction calculated in step S111 with the gaze patterns to calculate similarities. For example, as similarities it calculates the overlap area S_M11 between the frequency distribution calculated in step S111 and the plurality of gaze patterns with biased frequencies, and the overlap area S_M12 between that frequency distribution and the pattern with unbiased frequencies.

The gaze behavior detection means 53 compares the overlap area S_M11 with the threshold T_M11. If S_M11 ≥ T_M11, the people in a group are moving toward a specific position and their movement directions coincide, so the group may be forming a queue.

また、要注視行動検出手段５３は、重複面積Ｓ_Ｍ１２と閾値Ｔ_Ｍ１２と比較する。Ｓ_Ｍ１２≧Ｔ_Ｍ１２であれば、人物グループをなす各人物が特定の位置に向かって移動し、さらに移動方向が均等であることから、当該人物グループは、特定位置に向かって囲い込む行動をとっており、急病人や喧嘩などのトラブルが生じている可能性を示す。 In addition, the gazing behavior detecting means 53 compares the overlapping area S _M12 with the threshold value T _M12 . If S _M12 ≧ T _M12 , each person in the person group moves toward a specific position, and the movement direction is uniform, and therefore the person group takes an action of enclosing toward the specific position. It shows the possibility of troubles such as sudden illness and fighting.
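The two overlap tests for the medium congestion area can be sketched as histogram intersections. The specification does not define the overlap area precisely, so the sketch assumes it is the sum of bin-wise minima, and it assumes that the best match over the several biased patterns is used for S_M11. These choices, the event labels, and the function names are hypothetical.

```python
def overlap_area(hist, pattern):
    """Similarity, assumed here to be the histogram intersection
    (sum of bin-wise minima) of a histogram and a gaze pattern."""
    return sum(min(a, b) for a, b in zip(hist, pattern))


def classify_medium_congestion(hist, biased_patterns, uniform_pattern,
                               t_m11, t_m12):
    """Return the list of events whose criterion the histogram satisfies.

    S_M11 is taken as the largest overlap with any biased pattern
    (an assumption); S_M12 is the overlap with the unbiased pattern.
    """
    events = []
    s_m11 = max(overlap_area(hist, p) for p in biased_patterns)
    if s_m11 >= t_m11:
        events.append("queue")        # directions coincide
    if overlap_area(hist, uniform_pattern) >= t_m12:
        events.append("surrounding")  # directions evenly spread
    return events
```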

このような要注視行動の検出基準を満たす分布が検出された場合（ステップＳ１１２にて「ＹＥＳ」の場合）、要注視行動検出手段５３は、当該分布についての要注視情報を生成し記録し（ステップＳ１０７）、図５のステップＳ８に処理を進める。例えば、要注視行動検出手段５３は、当該分布が満たした検出基準と対応する「囲い込み発生」などの事象名、および抽出対象領域である局所領域の座標を要注視情報として生成する。一方、分布が検出基準を満たさない場合（ステップＳ１１２にて「ＮＯ」の場合）、ステップＳ１０７は省略される。 When a distribution satisfying the detection criteria for such a watch-at-behavior is detected (in the case of “YES” at step S112), the watch-at-behavior detection means 53 generates and records the watch-at-watch information for the distribution ( In step S107), the process proceeds to step S8 in FIG. For example, the gaze-behavior detection means 53 generates an event name such as “enclosed” corresponding to the detection criterion satisfied by the distribution and the coordinates of the local area that is the extraction target area as the gaze information. On the other hand, when the distribution does not satisfy the detection criterion (“NO” in step S112), step S107 is omitted.

When the processing area set for the gaze behavior detection process S7 is the high congestion area ("NO" in steps S102 and S108), the area dividing means 51 sets, as described above, integrated blocks each consisting of four unit blocks in a 2×2 arrangement along the X-axis and Y-axis directions as local regions (step S113). The motion vector calculation means 52 then reads from the storage unit 4 the captured image four time steps earlier and the local region information set in the processing four time steps earlier, calculates a motion vector for each local region set in the high congestion area of the current captured image with an analysis time interval of four time steps (four frame intervals) (step S114), and aggregates the motion vectors calculated for the high congestion area to calculate a motion distribution (step S115). For example, the motion vector calculation means 52 calculates, for each of the local regions, the average (relative motion vector) of the difference vectors between its motion vector and the motion vectors of the surrounding local regions, and calculates a motion distribution that associates the centroid of each local region with its relative motion vector. The local regions adjacent to the local region of interest may be taken as its surrounding local regions, or the local regions whose centroids lie within a circle of predetermined radius around the centroid of the local region of interest may be taken as its surrounding local regions.
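The relative motion vector can be illustrated as below, using the radius-based definition of surrounding local regions. This is a sketch under assumptions: regions are given as dictionaries with hypothetical keys 'centroid' and 'vector', and a region with no neighbor inside the radius is assigned a zero relative motion vector, a case the text does not address.

```python
import math


def relative_motion_vectors(regions, radius):
    """Compute the relative motion vector of each local region.

    For each region, the difference between its motion vector and that of
    every other region whose centroid lies within `radius` is averaged.
    Returns a list of (centroid, relative_vector) pairs.
    """
    result = []
    for i, region in enumerate(regions):
        cx, cy = region["centroid"]
        vx, vy = region["vector"]
        diffs = [
            (vx - other["vector"][0], vy - other["vector"][1])
            for j, other in enumerate(regions)
            if j != i and math.hypot(other["centroid"][0] - cx,
                                     other["centroid"][1] - cy) <= radius
        ]
        if diffs:
            mean = (sum(d[0] for d in diffs) / len(diffs),
                    sum(d[1] for d in diffs) / len(diffs))
        else:
            mean = (0.0, 0.0)  # assumption: isolated regions get zero
        result.append((region["centroid"], mean))
    return result
```

A region moving against the flow of its neighbors receives a large relative motion vector, while regions moving with the crowd receive values near zero.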

Once the motion vector calculation means 52 has calculated the motion distribution for the high congestion area, the gaze behavior detection means 53 checks whether the motion distribution satisfies the detection criterion for high congestion (step S116). Specifically, the gaze behavior detection means 53 reads the detection criterion for high congestion from the detection criterion storage means 42. That is, it reads the normal pattern of the motion distribution in the high congestion area, the threshold T_H11, and the threshold T_H12.

The gaze behavior detection means 53 compares the distribution calculated in step S115 with the normal pattern to calculate a dissimilarity. For example, it compares with the threshold T_H11 the magnitude of the difference vector between the relative motion vectors of corresponding local regions in the motion distribution calculated in step S115 and in its normal pattern, and calculates the total area D_H12 of the local regions whose difference vector magnitude is equal to or greater than T_H11. The local region whose centroid is closest to that of the local region of interest may be taken as the local region corresponding to the local region of interest.

The gaze behavior detection means 53 compares the total area D_H12 with the threshold T_H12. If D_H12 ≥ T_H12, the detection criterion is determined to be satisfied ("YES" in step S116); if D_H12 < T_H12, the detection criterion is determined not to be satisfied ("NO" in step S116).
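The two-threshold test for the high congestion area can be sketched as follows, assuming the observed and normal relative motion vectors have already been paired up region by region and that each region's area is known. The function names and argument layout are hypothetical.

```python
import math


def deviating_area(observed, normal, areas, t_h11):
    """Total area D_H12 of local regions that deviate from the normal pattern.

    `observed` and `normal` are lists of relative motion vectors for
    corresponding local regions, and `areas` holds each region's area.
    A region counts when the magnitude of the difference vector is
    equal to or greater than T_H11.
    """
    total = 0
    for (ox, oy), (nx, ny), area in zip(observed, normal, areas):
        if math.hypot(ox - nx, oy - ny) >= t_h11:
            total += area
    return total


def meets_high_congestion_criterion(observed, normal, areas, t_h11, t_h12):
    """True when D_H12 >= T_H12."""
    return deviating_area(observed, normal, areas, t_h11) >= t_h12
```

Using a per-region threshold first and an area threshold second means that a single noisy region does not trigger detection unless the deviating regions together cover enough of the image.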

Ｄ_Ｈ１２≧Ｔ_Ｈ１２である場合は、高混雑領域中に他の大勢の動きとは異なる動きが生じており、人の集団移動の中での逆行や滞留など、ひったくり後の逃走行動もしくはひったくり前の接近行動等が発生している可能性がある。 When D _H12 ≧ T _H12 , there is a movement different from many other movements in the high-congestion area, such as backward running or staying in a group movement of people, before the runaway movement after snatching or snatching There is a possibility that an approaching action or the like has occurred.

When a distribution satisfying such a detection criterion for behavior requiring attention is detected ("YES" in step S116), the gaze behavior detection means 53 generates and records gaze information for that distribution (step S107), and the process proceeds to step S8 in FIG. 5. For example, the gaze behavior detection means 53 generates, as gaze information, an event name such as "possible snatch theft or the like" corresponding to the detection criterion satisfied by the distribution, together with the centroid coordinates of the local regions in the high congestion area, which is the extraction target area, whose difference vector magnitude was equal to or greater than the threshold T_H11. If the distribution does not satisfy the detection criterion ("NO" in step S116), step S107 is skipped.

As described above, from captured images of a space where congestion can occur, the motion vectors of moving objects in the space can be calculated accurately: in divided regions with a low congestion degree they are calculated in detail, down to the motion of individual parts of a moving object, while in divided regions with a high congestion degree, miscalculation caused by confusion between parts of moving objects is reduced. The motion of moving objects can therefore be analyzed accurately from captured images of a space where congestion can occur.

[Second Embodiment]
The image monitoring apparatus 1 according to the second embodiment of the present invention differs from the first embodiment described above in the processing of the region division means 51 and is otherwise basically the same as the first embodiment. In the following, components that are the same as in the first embodiment are given the same reference numerals and the earlier description is relied on; the explanation focuses on the differences from the first embodiment.

In the first embodiment, the region division means 51 divides each divided region into local regions of a size predetermined according to its density. In the second embodiment, the region division means 51 sets local regions dynamically for each captured image according to its content. Specifically, the region division means 51 divides each divided region into local regions in accordance with a division criterion that forms local regions from pixels whose pixel values (color or intensity) and pixel positions are similar to one another, and that is defined so that local regions tend to be larger the higher the class defined for the density.

The region division means 51 of the second embodiment uses, as the classes relating to density, the congestion degrees defined in the same way as in the first embodiment. The divided regions in the second embodiment are therefore generated as in the first embodiment, and up to three kinds of divided region can be set in a captured image: low-congestion, medium-congestion, and high-congestion regions.

FIG. 7 is a schematic diagram showing examples of captured images made up of regions of each congestion degree and examples of the corresponding local regions. Specifically, the upper row of FIG. 7 shows a captured image consisting only of a low-congestion region and its local regions. Likewise, the middle and lower rows of FIG. 7 show captured images consisting only of a medium-congestion region and only of a high-congestion region, respectively, with their local regions.

For example, the region division means 51 divides the captured image in each divided region into a plurality of clusters by applying the SLIC (Simple Linear Iterative Clustering) method to that divided region. Each of these clusters is a local region.

In the SLIC method, the division number is determined before the division; the same number of cluster centers as the division number are placed on the target image as initial values, and the target image is divided into that number of clusters.

In keeping with this property of the SLIC method, the region division means 51 sets, for each divided region, a division number that is proportionally smaller the higher the congestion degree of that region, so that local regions tend to be larger the higher the density. In other words, each divided region is divided into local regions in accordance with a division criterion that sets fewer local regions per unit area for higher congestion degrees.

For example, the region division means 51 performs the following Steps A1 to A6 for each divided region to divide it into clusters.

(Step A1) The divided region is partitioned into a grid of roughly equal blocks whose area is the split area corresponding to the congestion degree of the region, and the center of each resulting rectangular block is used as the initial value of a cluster center. This processing fixes, as the division criterion, a division number (> 1) equal to the number of cluster centers (the number of blocks).

Specifically, for a low-congestion region, the quotient S_L/U_L of the area S_L of the region and a split area U_L, predetermined to be about 1/8 the size of a person, is rounded to the nearest integer to give the division number k_L of the region. The region is divided into blocks each of area approximately U_L, and the center of each block is used as the initial value of a cluster center in the region.

Similarly, for a medium-congestion region, the quotient S_M/U_M of the area S_M of the region and a split area U_M, predetermined to be about 1/4 the size of a person, is rounded to the nearest integer to give the division number k_M of the region. The region is divided into blocks each of area approximately U_M, and the center of each block is used as the initial value of a cluster center in the region.

For a high-congestion region, the quotient S_H/U_H of the area S_H of the region and a split area U_H, predetermined to be about 1/2 the size of a person, is rounded to the nearest integer to give the division number k_H of the region. The region is divided into blocks each of area approximately U_H, and the center of each block is used as the initial value of a cluster center in the region.

When calculating the division number, the value may be converted to an integer by rounding down or rounding up instead of rounding to the nearest integer; which method is used may be decided in advance.

If the division number is less than 2, control is performed so that neither division into local regions nor motion vector calculation is carried out.
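As a small worked illustration of the division-number rule, the following sketch computes k = round(S/U) for each congestion degree and returns nothing when k is less than 2. The function name, the class labels, and the use of a single person-area parameter are assumptions made for the example.

```python
from typing import Optional

# Split area U as a fraction of one person's image area, per congestion degree.
SPLIT_AREA_FRACTION = {"low": 1 / 8, "medium": 1 / 4, "high": 1 / 2}


def division_count(region_area: float, person_area: float,
                   congestion: str) -> Optional[int]:
    """Return the division number k, or None when k < 2 (region is skipped)."""
    split_area = person_area * SPLIT_AREA_FRACTION[congestion]  # U
    # Round half up; Python's built-in round() rounds halves to even instead.
    k = int(region_area / split_area + 0.5)
    return k if k >= 2 else None
```

For a region of 4000 pixels and a person area of 800 pixels, this gives 40, 20, and 10 divisions for low, medium, and high congestion, so the same area is split into fewer, larger local regions as congestion rises.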

(Step A2) An evaluation value is calculated for each combination of a pixel in the divided region and a cluster center. The evaluation value may be, for example, a weighted sum of the reciprocal of the distance from the pixel to the cluster center and the luminance similarity between the pixel and the cluster center. That is, the evaluation value can be defined as an integrated similarity that combines the similarity of pixel positions and the similarity of pixel values.

(Step A3) Each pixel in the divided region is assigned to the cluster center with which it has the highest evaluation value.

(Step A4) The sum of the evaluation values of all pixels is calculated.

(Step A5) Each cluster center is updated to the weighted average coordinates obtained by averaging the coordinates of the pixels assigned to that cluster center, weighted by their evaluation values.

(Step A6) Steps A2 to A5 are repeated using the updated cluster centers. When the absolute value of the difference between the sum obtained in Step A4 and the sum obtained in the previous Step A4 falls below a predetermined value, so that the cluster update is judged to have converged, or when the number of repetitions reaches a specified count, the processing ends and the most recently obtained clusters are adopted as the local regions.
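Steps A1 to A6 can be sketched in plain Python on a small luminance image as follows. This is a minimal illustration, not the apparatus's implementation: the function name is hypothetical, the position term uses 1/(1 + distance) rather than a bare reciprocal to avoid division by zero, the luminance similarity is taken as 1/(1 + |difference|), the luminance of a cluster center is sampled at the nearest pixel, and the weights, tolerance, and iteration limit are arbitrary.

```python
import math


def slic_like_split(image, block, w_pos=1.0, w_val=1.0, tol=1e-6, max_iter=10):
    """image: 2-D list of luminance values; block: side length of grid blocks."""
    h, w = len(image), len(image[0])
    # Step A1: centers of a regular grid of blocks are the initial cluster centers.
    centers = [((y0 + min(y0 + block, h) - 1) / 2,
                (x0 + min(x0 + block, w) - 1) / 2)
               for y0 in range(0, h, block) for x0 in range(0, w, block)]
    labels = [[0] * w for _ in range(h)]
    prev_total = None
    for _ in range(max_iter):
        total = 0.0
        acc = [[0.0, 0.0, 0.0] for _ in centers]  # sum(e*y), sum(e*x), sum(e)
        for y in range(h):
            for x in range(w):
                best_e, best_k = -1.0, 0
                for k, (cy, cx) in enumerate(centers):
                    # Step A2: integrated similarity (assumed functional forms).
                    c_lum = image[min(h - 1, int(cy + 0.5))][min(w - 1, int(cx + 0.5))]
                    pos_sim = 1.0 / (1.0 + math.hypot(y - cy, x - cx))
                    val_sim = 1.0 / (1.0 + abs(image[y][x] - c_lum))
                    e = w_pos * pos_sim + w_val * val_sim
                    if e > best_e:
                        best_e, best_k = e, k
                labels[y][x] = best_k  # Step A3: assign to the best center
                total += best_e        # Step A4: running sum of evaluation values
                acc[best_k][0] += best_e * y
                acc[best_k][1] += best_e * x
                acc[best_k][2] += best_e
        # Step A5: move each center to the evaluation-weighted mean coordinates.
        centers = [(a[0] / a[2], a[1] / a[2]) if a[2] > 0 else c
                   for a, c in zip(acc, centers)]
        # Step A6: stop when the total changes by less than tol.
        if prev_total is not None and abs(total - prev_total) < tol:
            break
        prev_total = total
    return labels, centers
```

A larger `block` yields fewer initial centers and therefore larger local regions, which is how the congestion-dependent split area enters this sketch.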

The upper, middle, and lower rows of FIG. 7 illustrate how the captured image of a low-congestion region, of a medium-congestion region, and of a high-congestion region is divided into local regions of about one-eighth, one-quarter, and one-half the size of a person, respectively.

As in the first embodiment, the processing by which the region division means 51 of the second embodiment generates local regions from the divided regions is performed in the attention-requiring behavior detection process S7 of the operation shown in FIG. 5. That is, the monitoring operation of the image monitoring apparatus 1 of the second embodiment follows the same processing flow as FIG. 5 described for the first embodiment, but, because the processing of the region division means 51 differs, the details of the attention-requiring behavior detection process S7 differ from the first embodiment.

FIG. 8 is a schematic flowchart of the attention-requiring behavior detection process S7 in the second embodiment.

If the processing region set in step S6 of FIG. 5 is a low-congestion region ("YES" in step S200), the region division means 51 performs the local region generation processing described above using the split area U_L, set to about 1/8 the size of a person, and divides the low-congestion region into a number of local regions such that their average area corresponds to 1/8 of a person (step S201).

For these local regions, the motion vector calculation means 52 calculates motion vectors and the motion distribution in the same way as steps S104 and S105 in FIG. 6 of the first embodiment (steps S202 and S203). The attention-requiring behavior detection means 53 then determines whether the low-congestion detection criterion is satisfied and records attention-requiring information in the same way as steps S106 and S107 in FIG. 6 of the first embodiment (steps S204 and S205).

If the processing region set in step S6 of FIG. 5 is a medium-congestion region ("NO" in step S200 and "YES" in step S206), the region division means 51 performs the local region generation processing described above using the split area U_M, set to about 1/4 the size of a person, and divides the medium-congestion region into a number of local regions such that their average area corresponds to 1/4 of a person (step S207).

For these local regions, the motion vector calculation means 52 calculates motion vectors and the motion distribution in the same way as steps S110 and S111 in FIG. 6 (steps S208 and S209), and the attention-requiring behavior detection means 53 determines whether the medium-congestion detection criterion is satisfied and records attention-requiring information in the same way as steps S112 and S107 in FIG. 6 (steps S210 and S205).

If the processing region set in step S6 of FIG. 5 is a high-congestion region ("NO" in step S200 and "NO" in step S206), the region division means 51 performs the local region generation processing described above using the split area U_H, set to about 1/2 the size of a person, and divides the high-congestion region into a number of local regions such that their average area corresponds to 1/2 of a person (step S211).

For these local regions, the motion vector calculation means 52 calculates motion vectors and the motion distribution in the same way as steps S114 and S115 in FIG. 6 (steps S212 and S213), and the attention-requiring behavior detection means 53 determines whether the high-congestion detection criterion is satisfied and records attention-requiring information in the same way as steps S116 and S107 in FIG. 6 (steps S214 and S205).

In the local region generation processing of this embodiment described above, a smaller division number tends to make each local region larger, and a larger division number tends to make each local region smaller.

Dividing each divided region with a proportionally smaller division number the higher the density of moving objects in the region can likewise be expected to yield local regions, at least those covering moving objects, that are larger where the density is higher and smaller where the density is lower.

Consequently, from captured images of a space where congestion can occur, the motion vectors of moving objects in the space can be calculated accurately, with fewer miscalculations caused by confusion between parts of moving objects. The motion of moving objects can therefore be analyzed accurately from such captured images.

The second embodiment has been described above as a configuration that uses the SLIC method to control the size of local regions according to the congestion degree through the number of divisions used when generating local regions from a divided region.

The size of local regions can also be controlled through the division number when a bottom-up region division method, such as one using the group average method, is adopted instead of the SLIC method. In this case, the region division means 51 performs the following Steps B1 to B5 to divide a divided region into clusters.

(Step B1) The division number (> 1) is determined by dividing the area of the divided region by the split area corresponding to the density of the region. Specifically, for a low-congestion region, the quotient S_L/U_L of the area S_L of the region and a split area U_L, predetermined to be about 1/8 the size of a person, is rounded to the nearest integer to give the division number k_L of the region. For a medium-congestion region, the quotient S_M/U_M of the area S_M of the region and a split area U_M, predetermined to be about 1/4 the size of a person, is rounded to the nearest integer to give the division number k_M of the region. For a high-congestion region, the quotient S_H/U_H of the area S_H of the region and a split area U_H, predetermined to be about 1/2 the size of a person, is rounded to the nearest integer to give the division number k_H of the region.

(Step B2) Each pixel in the captured image is set as an initial cluster.

(Step B3) An evaluation value is calculated for each pair of adjacent clusters. The evaluation value may be, for example, a weighted sum of the reciprocal of the distance between the cluster centers and the similarity of the clusters' average luminance. That is, the evaluation value can be defined as an integrated similarity that combines pixel-position similarity and pixel-value similarity. The pixel-value similarity alone may be used as the evaluation value instead of the integrated similarity; even then, the restriction to adjacent clusters imposes a requirement of pixel-position similarity.

(Step B4) The pair of clusters with the largest evaluation value is merged into one cluster.

(Step B5) Steps B3 and B4 are repeated. Once the number of clusters is equal to or less than the division number determined in Step B1, the processing ends and the most recently obtained clusters are adopted as the local regions. While the number of clusters still exceeds the division number determined in Step B1, Steps B3 and B4 are repeated further.
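Steps B2 to B5 can be sketched as a simple agglomerative merge on a small luminance image, taking the division number k from Step B1 as an input. The function name is hypothetical, adjacency is taken as 4-neighborhood, and the similarity terms use the assumed forms 1/(1 + distance) and 1/(1 + |luminance difference|); the sketch rescans all adjacent pairs on every merge, so it is suitable only for tiny inputs.

```python
import math


def bottom_up_split(image, k, w_pos=1.0, w_val=1.0):
    """Merge adjacent clusters until at most k remain; returns a label map."""
    h, w = len(image), len(image[0])
    # Step B2: every pixel starts as its own cluster.
    label = [[y * w + x for x in range(w)] for y in range(h)]
    members = {y * w + x: [(y, x)] for y in range(h) for x in range(w)}

    def summary(c):
        # Centroid coordinates and mean luminance of cluster c.
        pts = members[c]
        n = len(pts)
        return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n,
                sum(image[py][px] for py, px in pts) / n)

    def evaluate(a, b):
        # Step B3: integrated similarity of two clusters (assumed forms).
        ay, ax, al = summary(a)
        by, bx, bl = summary(b)
        return (w_pos / (1.0 + math.hypot(ay - by, ax - bx))
                + w_val / (1.0 + abs(al - bl)))

    while len(members) > k:  # Step B5: stop once cluster count <= k
        best = None
        for y in range(h):
            for x in range(w):
                for ny, nx in ((y + 1, x), (y, x + 1)):  # 4-neighborhood
                    if ny < h and nx < w and label[y][x] != label[ny][nx]:
                        a, b = label[y][x], label[ny][nx]
                        e = evaluate(a, b)
                        if best is None or e > best[0]:
                            best = (e, a, b)
        if best is None:
            break
        # Step B4: merge the pair with the largest evaluation value.
        _, a, b = best
        for py, px in members[b]:
            label[py][px] = a
        members[a].extend(members.pop(b))
    return label
```

A smaller k forces more merges and so produces larger local regions, mirroring the effect of the division number in the SLIC-based procedure.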

Two methods of controlling the size of local regions through the division number have been described above. Alternatively, the size of local regions can be controlled not through the division number but through the level of a threshold applied to the evaluation value (integrated similarity) described above.

That is, the region division means 51 divides each divided region into local regions consisting of pixels whose integrated similarity exceeds a threshold, in accordance with a division criterion that sets the threshold for the integrated similarity (which combines pixel-value similarity and pixel-position similarity) lower the higher the density. In other words, the higher the congestion degree, the lower the division criterion sets the integrated-similarity threshold at which pixels are judged to be similar to one another.

In this case, the region division means 51 performs the following Steps C1 to C4 for each divided region to divide it into clusters.

(Step C1) Each pixel in the captured image is set as an initial cluster.

(Step C2) An evaluation value is calculated for each pair of adjacent clusters. The integrated similarity described above can be used as the evaluation value, for example.

(Step C3) The evaluation value calculated in Step C2 is compared with a threshold, and pairs of clusters whose evaluation value is equal to or less than the threshold are merged into one cluster. The threshold is a value predetermined for each congestion degree of the divided region and is set lower the higher the congestion degree.

(Step C4) If Step C3 found one or more pairs of clusters whose evaluation value is equal to or less than the threshold, Steps C2 and C3 are repeated. If Step C3 found no such pair, the processing ends and the most recently obtained clusters are adopted as the local regions.
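The threshold-controlled variant can be sketched as below. The sketch follows the division criterion stated before Steps C1 to C4, under which pixels are grouped when their integrated similarity exceeds the threshold, so that a lower threshold yields larger local regions; the steps themselves word the comparison as "equal to or less than the threshold", so the direction used here is an interpretation rather than something the text confirms. The function name and similarity forms are likewise assumptions, and one pair is merged per pass for simplicity.

```python
import math


def threshold_merge(image, threshold, w_pos=1.0, w_val=1.0):
    """Merge adjacent clusters while some pair's similarity exceeds threshold."""
    h, w = len(image), len(image[0])
    # Step C1: every pixel starts as its own cluster.
    label = [[y * w + x for x in range(w)] for y in range(h)]
    members = {y * w + x: [(y, x)] for y in range(h) for x in range(w)}

    def summary(c):
        # Centroid coordinates and mean luminance of cluster c.
        pts = members[c]
        n = len(pts)
        return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n,
                sum(image[py][px] for py, px in pts) / n)

    def similarity(a, b):
        # Integrated similarity of two clusters (assumed functional forms).
        ay, ax, al = summary(a)
        by, bx, bl = summary(b)
        return (w_pos / (1.0 + math.hypot(ay - by, ax - bx))
                + w_val / (1.0 + abs(al - bl)))

    while True:
        best = None
        # Step C2: evaluate every pair of 4-adjacent clusters.
        for y in range(h):
            for x in range(w):
                for ny, nx in ((y + 1, x), (y, x + 1)):
                    if ny < h and nx < w and label[y][x] != label[ny][nx]:
                        a, b = label[y][x], label[ny][nx]
                        e = similarity(a, b)
                        # Assumed merge test: similarity exceeds the threshold.
                        if e > threshold and (best is None or e > best[0]):
                            best = (e, a, b)
        if best is None:  # Step C4: no qualifying pair remains
            return label
        _, a, b = best    # Step C3: merge the qualifying pair
        for py, px in members[b]:
            label[py][px] = a
        members[a].extend(members.pop(b))
```

On a two-tone test image, a threshold of 1.0 leaves the two tones as separate regions, a threshold of 0.0 merges everything into one region, and a threshold of 2.0 merges nothing.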

In the second embodiment, the division number and the similarity threshold have been given as examples of division criteria. Other division criteria can likewise be defined so that local regions tend to be larger for higher-density classes, for example a threshold limiting the range of local region sizes (a wider range for higher-density classes and a narrower range for lower-density classes) or a threshold limiting the number of cluster merges (a higher upper limit for higher-density classes and a lower upper limit for lower-density classes).

(1) In each of the above embodiments, the object to be detected is a person, but this is not a limitation; the object to be detected may also be a vehicle, or an animal such as a cow or a sheep.

(2) In the above embodiments and their modifications, a density estimator trained by the multi-class SVM method was given as an example. Instead of the multi-class SVM method, various density estimators can be used, such as one trained by a decision-tree-based random forest method, a multi-class AdaBoost method, or a multi-class logistic regression method.

Alternatively, a density estimator using a classification-type CNN (Convolutional Neural Network) can be used.

(3) In the above embodiments and their modifications, the density estimator estimates three density classes other than background, but the classes may be divided more finely.

(4) In the above embodiments and their modifications, a density estimator that classifies into multiple classes was given as an example. Instead, a regression-type density estimator that regresses a density value (estimated density) from the features can be used. That is, the density estimator can be one that has learned the parameters of a regression function for obtaining the estimated density from the features by ridge regression, support vector regression, a regression-tree-based random forest method, Gaussian process regression, or the like.

Alternatively, a density estimator using a regression-type CNN can be used.

(5) In the above embodiments and their modifications, GLCM features were given as an example of the features learned by the density estimator and the features used for estimation. Instead of GLCM features, various features can be used, such as local binary pattern (LBP) features, Haar-like features, HOG features, or luminance patterns, or features combining several of GLCM features and these.

DESCRIPTION OF SYMBOLS: 1 image monitoring apparatus, 2 imaging unit, 3 communication unit, 4 storage unit, 5 image processing unit, 6 display unit, 30 image acquisition means, 31 attention-requiring information output means, 40 time-series image storage means, 41 density estimator storage means, 42 detection criterion storage means, 50 density estimation means, 51 region division means, 52 motion vector calculation means, 53 attention-requiring behavior detection means.

Claims

1. An image analysis apparatus comprising:
image acquisition means for acquiring captured images, taken at a plurality of times, of a space that can become crowded with predetermined moving objects;
density estimation means for estimating the density of the moving objects captured in any region of the captured images, using a density estimator that has learned, for each of predetermined densities, the image features of density images capturing the space in which the moving objects are present at that density;
region division means for dividing each of the divided regions, obtained by dividing the captured image according to the estimated density into a plurality of classes set for the density, into a plurality of local regions in accordance with a division criterion defined for each of the classes;
motion vector calculation means for calculating a motion vector in each of the local regions; and
motion analysis means for analyzing the motion of the moving objects in the space from the motion vectors of the plurality of local regions.

2. The image analysis apparatus according to claim 1, wherein the division criterion sets, as the local region, a region of a size predetermined relative to the size of the moving object, the size being set larger the higher the density of the class.

3. The image analysis apparatus according to claim 1, wherein the division criterion sets, as the local region, a region consisting of pixels that are similar to one another under a pixel similarity defined by pixel value and pixel position, and is defined so that the local region tends to be larger the higher the class.

4. The image analysis apparatus according to claim 3, wherein the division criterion sets the number of the local regions per unit area smaller the higher the density.

5. The image analysis apparatus according to claim 3, wherein the division criterion sets a threshold of the similarity, at which the pixels are determined to be similar to one another, lower the higher the density.