JP2014074977A

JP2014074977A - Image processing apparatus, image processing method, and image processing program

Info

Publication number: JP2014074977A
Application number: JP2012221034A
Authority: JP
Inventors: J H Manoj Vincent Pelella; ジェイ．エイチ．マノジヴィンセントペレラ
Original assignee: National Institute of Advanced Industrial Science and Technology AIST
Current assignee: National Institute of Advanced Industrial Science and Technology AIST
Priority date: 2012-10-03
Filing date: 2012-10-03
Publication date: 2014-04-24
Anticipated expiration: 2032-10-03
Also published as: JP6028972B2

Abstract

PROBLEM TO BE SOLVED: To provide an image processing apparatus that detects and tracks objects, such as persons or things, at high speed and with improved detection accuracy. SOLUTION: An image processing apparatus includes: an object detection unit that detects, from an input frame image, object windows containing an object; a clustering integration unit that integrates nearby object windows detected by the object detection unit by clustering to set an object region; and an object tracking unit that treats the object region as an object candidate region and registers, in association with an object region of a past frame image, the object candidate region whose window-width difference is within a predetermined range and which lies at the shortest distance from an object prediction region, that is, the region predicted from the object region of the past frame image.

Description

The present invention relates to an image processing apparatus and program for detecting and tracking objects such as persons or things. By performing detection at high speed and tracking objects reliably with a small number of features, it enables object detection and tracking to run in real time.

Systems that film visitors with a fixed-point camera and then detect and track persons have been studied as a way to count visitors and survey their age and gender.

As a technique for detecting captured objects, persons in particular, a classification method that uses luminance gradient histograms known as HOG (Histograms of Oriented Gradients) as features is widely used.

N. Dalal and B. Triggs: "Histograms of Oriented Gradients for Human Detection", IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 886-893 (2005)
X. Wang, T. X. Han and S. Yan: "An HOG-LBP Human Detector with Partial Occlusion Handling", IEEE 12th International Conference on Computer Vision, pp. 32-39 (2009)

Person detection using the HOG features described above is accurate, but the computation is time-consuming. In person tracking, privacy considerations require each frame image to be processed at high speed in real time and then discarded, so HOG-based detection has been difficult to apply to person tracking. The tracking itself must also run at high speed.

The present invention relates to an image processing apparatus that detects and tracks objects such as persons or things at high speed and with improved detection accuracy.

The present invention provides an image processing apparatus comprising: an object detection unit that detects, from an input frame image, object windows in which an object is present; a clustering integration unit that sets an object region by integrating, through clustering, a plurality of neighboring object windows detected by the object detection unit; and an object tracking unit that takes the object region as an object candidate region, takes the region predicted from the object region of a past frame image as an object prediction region, and registers, in association with the object region of the past frame image, the object candidate region whose window-width difference is within a predetermined range and which is at the shortest distance from the object prediction region.

In the above image processing apparatus, the object tracking unit may extract, from a plurality of object candidate regions, those whose window-width difference is within the predetermined range and which lie within a predetermined distance of the object prediction region as related object candidate regions, and may register the related object candidate region at the shortest distance from the object prediction region in association with the object region of the past frame image.

In the above image processing apparatus, when an object candidate region is associated with two or more object prediction regions, the object tracking unit may register it in association only with the object region of the past frame image that corresponds to the object prediction region at the shortest distance from the object candidate region.

In the above image processing apparatus, the object tracking unit may also perform the association and registration by assigning to the object candidate region the same identifier as that assigned to the object region of the past frame image.

In the above image processing apparatus, the object tracking unit may determine the object prediction region in the input frame image based on the amount of change in coordinates between a first object region in the frame image two frames earlier and a second object region, associated with the first object region, in the frame image one frame earlier.

In the above image processing apparatus, when there is no associated object region in the frame image two frames earlier, the object tracking unit may determine the object prediction region in the input frame image by shifting the coordinates of the object region in the frame image one frame earlier by a predetermined amount.
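The two prediction rules above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the text only says the prediction is "based on the amount of change in coordinates", so the linear extrapolation, the unchanged window width, and the names `predict_region` and `default_shift` are all assumptions.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float]  # (x, y, width) of an object region


def predict_region(prev: Box, prev2: Optional[Box],
                   default_shift: Tuple[float, float] = (0.0, 0.0)) -> Box:
    """Predict the object region in the input frame.

    prev  : object region one frame earlier
    prev2 : associated object region two frames earlier, or None
    """
    x1, y1, w1 = prev
    if prev2 is not None:
        # Assumption: apply the coordinate change seen between the two
        # past frames once more (constant-velocity extrapolation).
        x2, y2, _ = prev2
        dx, dy = x1 - x2, y1 - y2
    else:
        # No associated region two frames back: use a predetermined shift.
        dx, dy = default_shift
    # Assumption: the window width is carried over unchanged.
    return (x1 + dx, y1 + dy, w1)
```

For example, a region that moved from (6, 18) to (10, 20) would be predicted at (14, 22) in the input frame.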

In the above image processing apparatus, the object detection unit may set a window in the input frame image, calculate the feature of the window by overlap-scanning the inside of the window with blocks of a predetermined area, and detect object windows based on the calculated feature.

In the above image processing apparatus, when overlap-scanning, the object detection unit may calculate the feature of the window by reusing, for each scan layer, the features already calculated once for the overlapping regions.

The present invention also provides an image processing method comprising the steps of: detecting, from an input frame image, object windows in which an object is present; setting an object region by integrating, through clustering, a plurality of detected neighboring object windows; and, taking the object region as an object candidate region and the region predicted from the object region of a past frame image as an object prediction region, registering, in association with the object region of the past frame image, the object candidate region whose window-width difference is within a predetermined range and which is at the shortest distance from the object prediction region.

The present invention further provides a program that causes a computer to execute the steps of: detecting, from an input frame image, object windows in which an object is present; setting an object region by integrating, through clustering, a plurality of detected neighboring object windows; and, taking the object region as an object candidate region and the region predicted from the object region of a past frame image as an object prediction region, registering, in association with the object region of the past frame image, the object candidate region whose window-width difference is within a predetermined range and which is at the shortest distance from the object prediction region.

According to the present invention, an object prediction region is set in the input frame image based on a past frame image, and the object candidate region that was detected by the object detection unit as containing an object, satisfies the predetermined conditions, and is at the shortest distance is registered in association with the object region of the past frame image. The same person can therefore be tracked across frame images.

In addition, tracking is performed without comparing histograms, color information, or the like, so the tracking process can run in real time. Moreover, because the object candidate regions are detected based on features such as luminance, tracking can be performed accurately even with few parameters.

Furthermore, the object detection unit sets a window in the input frame image, overlap-scans the inside of the window with blocks of a predetermined area, and, during the overlap scan, reuses the features already calculated once for each scan layer, so object detection can be performed at high speed.

FIG. 1 is an example of a block diagram of an image processing apparatus according to one embodiment of the present invention.
FIG. 2 is an example of a hardware configuration diagram of the image processing apparatus of the present invention.
FIG. 3 is an example of a flowchart of the processing in which the object detection unit 12 detects an object from an input frame image in the image processing apparatus of the present invention.
FIG. 4 is an example of the scan table stored in the scan table storage unit 11.
FIG. 5 is a conceptual diagram showing the relationship between cells and blocks.
FIG. 6 is a conceptual diagram showing the relationship between windows and blocks.
FIG. 7 is an example of the processing in which the object tracking unit 14 associates the same object region between frame images in the image processing apparatus of the present invention.
FIG. 8 is an illustration of the association of object regions between frame images.
FIG. 9 is a diagram showing the relationship between the object prediction region and the object candidate regions in step 703 of FIG. 7.
FIG. 10 is a table showing experimental results of object detection performed with the image processing apparatus of the present invention.

An embodiment in which the present invention is applied to detecting the upper-body region of persons captured in an image and tracking those persons is described below. In this embodiment, a camera is installed at a predetermined position and films persons passing through the shooting location. Persons are detected in the captured images and tracked between frame images, so that each individual is followed through the moving image. Although the upper-body region of a person is used here, the object to be detected is not limited to the upper body; it may be another part of a person, such as the eyes or nose, or a thing or part of a thing, such as a car or a car license plate.

FIG. 1 is an example of a block diagram of an image processing apparatus according to one embodiment of the present invention. The image processing apparatus 100 includes an image storage unit 10, a scan table storage unit 11, an object detection unit 12, a clustering integration unit 13, and an object tracking unit 14. The image storage unit 10 stores images input from a camera connected via a network, a wired cable, or the like. The camera captures a moving image, and the image data, which is time-series data of the moving image, is input sequentially to the image processing apparatus 100, for example via a communication interface. Each piece of input image data is a frame image. The input frame images are stored sequentially in the image storage unit 10, either temporarily or for a long period.

The scan table storage unit 11 holds a table that stores the window size, the scan stride value, the number of windows, and similar parameters used when scanning a frame image. When object detection is performed on a frame image, a window is set in the image and the feature of the window is calculated by scanning the inside of the window with overlap. Various window sizes can be set. Because the scan stride value and the number of windows needed to scan the entire image depend on the window size, these values are stored in the table in association with the window size. The object detection unit 12 reads the scan table from the scan table storage unit 11 and scans with a predetermined window size.

The object detection unit 12 sequentially reads the frame images stored in the image storage unit 10 and detects, in each input frame image, the object windows in which an object is present. For example, the object detection unit 12 reads the scan table from the scan table storage unit 11 and searches the entire input frame image using the window size of a predetermined scan layer. As one example, it performs an overlap scan with a predetermined scan stride value inside a window of the given size, generates the luminance gradient direction histograms in the window, and calculates the HOG feature.

For example, when detecting the upper body of a person as the object, the object detection unit 12 learns the luminance gradient direction histogram features (HOG features) in advance using a support vector machine (SVM) or the like to obtain an SVM classifier. The SVM classifier then evaluates whether the HOG feature extracted from the input frame image corresponds to a person's upper body, and the object windows, that is, the regions containing a person's upper body, are detected. In other words, the SVM classifier detects windows whose features are those of a person's upper body as object windows.

The object detection unit 12 also scans the input frame image successively with windows of various sizes to detect the object regions, that is, the regions in which an upper body appears.

The clustering integration unit 13 integrates the object windows detected by the object detection unit 12 by clustering. When the input frame image is scanned with various windows, a single upper body yields a plurality of neighboring object windows. In other words, several windows that differ in position or size overlap one another and are all detected as object windows. These object windows are therefore integrated and set as a single object region, for example by using mean shift clustering.

When mean shift clustering is used, for example, the clustering integration unit 13 clusters the detected object windows in three dimensions, namely position (x_i, y_i) and scale scale_i, using a Gaussian kernel function (the formula is not reproduced in this text). It calculates shift values for position and scale from the accumulated kernel values, corrects and updates the position and scale by the respective shift values until they converge, and takes the result as one object region.

Here, bandwidth_position and bandwidth_scale are parameters that set the clustering search range in position and in scale, respectively.
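The integration step can be illustrated with a small mean shift sketch over (x, y, scale) triples. Because the patent's kernel formula is not available in this text, the separable Gaussian kernel below is an assumed, conventional form. The function name `mean_shift_merge`, the iteration limit, the tolerance, and the rule for merging converged modes are also hypothetical choices.

```python
import math


def mean_shift_merge(windows, bandwidth_position, bandwidth_scale,
                     iters=50, tol=1e-3):
    """Merge detected windows [(x, y, scale), ...] into object regions."""

    def weight(p, q):
        # Assumed Gaussian kernel with separate bandwidths for position and scale.
        d_pos = ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) / bandwidth_position ** 2
        d_scl = (p[2] - q[2]) ** 2 / bandwidth_scale ** 2
        return math.exp(-0.5 * (d_pos + d_scl))

    modes = []
    for start in windows:
        p = tuple(float(v) for v in start)
        for _ in range(iters):
            ws = [weight(p, q) for q in windows]
            total = sum(ws)
            # Shift p to the kernel-weighted mean of all windows.
            new = tuple(sum(w * q[i] for w, q in zip(ws, windows)) / total
                        for i in range(3))
            shift = max(abs(new[i] - p[i]) for i in range(3))
            p = new
            if shift < tol:
                break
        # Treat modes that fall within one bandwidth of each other as one region.
        for m in modes:
            if (math.hypot(p[0] - m[0], p[1] - m[1]) < bandwidth_position
                    and abs(p[2] - m[2]) < bandwidth_scale):
                break
        else:
            modes.append(p)
    return modes
```

With two well-separated groups of overlapping detections, the function returns one merged (x, y, scale) triple per group.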

The object tracking unit 14 takes each object region in the input frame image as an object candidate region, and takes the region in which an object is predicted to be present in the input frame image, based on an object region in a past frame image, as an object prediction region. It tracks objects by registering the object candidate region that satisfies predetermined conditions in association with the object region of the past frame image that corresponds to the object prediction region, and outputs the result. The past frame image is, for example, the image one frame before the input frame image.

For example, the object tracking unit 14 registers, in association with the object region of the past frame image, the object candidate region whose window-width difference is within a predetermined range and which is closer to the object prediction region than any other object candidate region. When registering the association, the object tracking unit 14 may, for example, assign to the object candidate region the same identifier (person ID) that was assigned to the object region of the past frame image. The tracking method is described in detail later.
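A compact sketch of this association rule is given here for illustration. It assumes regions are (x, y, width) triples, that "window-width difference within a predetermined range" means an absolute difference below a threshold, and that distance is Euclidean. The function name `associate` and both threshold parameters are hypothetical.

```python
import math


def associate(predictions, candidates, max_width_diff, max_distance):
    """Match candidate regions to predicted regions.

    predictions : dict person_id -> (x, y, width) object prediction region
    candidates  : list of (x, y, width) object candidate regions
    returns     : dict candidate_index -> person_id
    """
    best = {}  # candidate index -> (distance, person_id)
    for pid, (px, py, pw) in predictions.items():
        nearest = None
        for i, (cx, cy, cw) in enumerate(candidates):
            if abs(cw - pw) > max_width_diff:
                continue  # window width differs too much
            d = math.hypot(cx - px, cy - py)
            if d > max_distance:
                continue  # too far from the prediction
            if nearest is None or d < nearest[0]:
                nearest = (d, i)
        if nearest is None:
            continue
        d, i = nearest
        # A candidate claimed by several predictions keeps only the closest one.
        if i not in best or d < best[i][0]:
            best[i] = (d, pid)
    return {i: pid for i, (d, pid) in best.items()}
```

Only window width and position are compared, which reflects the text's point that no histogram or color comparison is needed during tracking.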

FIG. 2 is a block diagram showing an example of the hardware configuration of the image processing apparatus 100 according to the embodiment of the present invention. The computer constituting the image processing apparatus 100 can be realized with a conventional general-purpose hardware configuration. That is, as shown in FIG. 2, the computer is configured by connecting to a bus a CPU 101, a ROM 102, a RAM 103, an external storage device 104, a communication interface 105, a mouse 107 and a keyboard 108 connected to an input/output interface 106, and a display 109 provided as a display device.

In one aspect, the image processing apparatus 100 is realized by the cooperation of hardware resources and software. Specifically, the various functions of the image processing apparatus 100 are realized by the CPU 101 executing programs recorded on a recording medium such as the ROM 102 or the external storage device 104. The image processing apparatus 100 may be realized physically by a single device or by a plurality of devices.

FIG. 3 is an example of a flowchart of the processing in which the object detection unit 12 detects an object from an input frame image in the image processing apparatus of the present invention. The object detection unit 12 receives a frame image input from the image storage unit 10 (step 301) and starts detecting the object windows in which an object is present in that frame image.

The object detection unit 12 reads the scan table from the scan table storage unit 11 and determines the window size to use when scanning the input frame image (step 302). The window size is determined, for example, by a predetermined scan layer. The input frame image may be scanned with the window sizes of one or two of the scan layers stored in the scan table, or with all of the scan layers stored in the scan table storage unit 11.

Next, the object detection unit 12 sets a calculation window in the input frame image based on the determined window size (step 303). The number of windows needed to scan the entire input frame image depends on the window size. Based on the given window size, the unit sets the position of the window in which the feature is to be calculated. The position of the calculation window is determined by the window size and by the index of the scan window within that window size.

The object detection unit 12 generates the luminance gradient histograms of the blocks in the window set as the calculation target (step 304). Here, for example, HOG features are used to detect the upper body as the object. To do so, the horizontal and vertical luminance gradients at each pixel of the input frame image are obtained from its four neighboring pixels, and the gradient magnitude and direction are calculated. The gradient magnitude of each pixel is then assigned, according to its gradient direction, to one of nine bin images, one for each 20-degree interval from 0 to 180 degrees. Each bin image is filtered with a kernel; a 7×7 kernel may be used, for example. An integral image is then generated for each bin image. As one example, the accumulated gradient magnitude in each cell of a block is taken as that cell's histogram, the luminance gradient direction histograms of the four cells in the block are concatenated, and the result is normalized per block to generate a 36-dimensional in-block luminance gradient direction histogram. Note that scanning here means analyzing the feature of each pixel of the image.
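The per-bin integral images and the 36-dimensional block histogram can be sketched in plain Python as follows. Several details are assumptions because the text does not fix them: central differences clamped at the image border, L2 normalization, and the cell order inside a block. The kernel filtering of the bin images is omitted, and the function names are hypothetical.

```python
import math

N_BINS = 9  # one bin per 20 degrees over 0..180 degrees


def bin_integral_images(img):
    """img: 2-D list of luminance values.

    Returns N_BINS integral images of size (h+1) x (w+1), each accumulating
    the gradient magnitude of the pixels that fall in that orientation bin.
    """
    h, w = len(img), len(img[0])
    integrals = [[[0.0] * (w + 1) for _ in range(h + 1)] for _ in range(N_BINS)]
    for y in range(h):
        row_sum = [0.0] * N_BINS
        for x in range(w):
            # Gradients from the four neighbours (clamped at the borders).
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned direction
            b = min(int(ang // 20.0), N_BINS - 1)
            row_sum[b] += mag
            for k in range(N_BINS):
                integrals[k][y + 1][x + 1] = integrals[k][y][x + 1] + row_sum[k]
    return integrals


def cell_histogram(integrals, x, y, size):
    """9-bin histogram of the size x size cell with top-left corner (x, y)."""
    return [I[y + size][x + size] - I[y][x + size] - I[y + size][x] + I[y][x]
            for I in integrals]


def block_histogram(integrals, x, y, cell):
    """Concatenate the 2 x 2 cell histograms of a block and normalise (36-D)."""
    hist = []
    for cy in (y, y + cell):          # assumed cell order: row by row
        for cx in (x, x + cell):
            hist += cell_histogram(integrals, cx, cy, cell)
    norm = math.sqrt(sum(v * v for v in hist)) + 1e-6  # assumed L2 norm
    return [v / norm for v in hist]
```

With the integral images in place, each cell histogram costs four lookups per bin regardless of the cell size, which is why the same integral images can serve every window size.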

The object detection unit 12 then overlap-scans the inside of the window with blocks (step 305). The overlap scan is performed based on the scan layer and cell width stored in the scan table: the luminance gradient histogram of each block is created in turn while the block position is shifted inside the window by the specified cell width. In this example, a total of 49 scans are performed in the window, 7 in the horizontal direction by 7 in the vertical direction. For the features of the overlapping regions, the object detection unit 12 uses the features calculated when the previous block was scanned. Reusing these results speeds up the object detection process.

The object detection unit 12 then calculates the feature of the calculation window (step 306). As one example, by performing the 49 scans in the calculation window, the object detection unit 12 obtains a 1764-dimensional HOG feature for the window.
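The figures of 49 scans and 1764 dimensions follow from the layout described for FIG. 6, in which a window is 8 × 8 cells, a block is 2 × 2 cells, and the block moves one cell at a time. The short check below makes that arithmetic explicit. The constant names and the helper `block_origins` are illustrative only.

```python
CELLS_PER_SIDE = 8   # window: 4 blocks per side, 2 cells per block side
BLOCK_CELLS = 2      # block: 2 x 2 cells
BINS = 9             # orientation bins per cell

positions_per_side = CELLS_PER_SIDE - BLOCK_CELLS + 1   # 7
n_blocks = positions_per_side ** 2                      # 49
block_dims = BLOCK_CELLS ** 2 * BINS                    # 36
window_dims = n_blocks * block_dims                     # 1764


def block_origins(window_x, window_y, cell_size):
    """Top-left pixel of every block position, scanning vertically first."""
    return [(window_x + bx * cell_size, window_y + by * cell_size)
            for bx in range(positions_per_side)
            for by in range(positions_per_side)]
```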

The object detection unit 12 classifies the feature of the calculation window with the SVM (step 307). A support vector machine (SVM) is used as the classifier here, but the classifier is not limited to this. HOG features are learned in advance from images that contain an upper body and images that do not, yielding an SVM classifier. Using this classifier, the object detection unit 12 detects windows whose features are those of a person's upper body as object windows. When the calculation window is determined to be an object window, processing moves on to the clustering integration unit 13.
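As a rough illustration of the classification step, the sketch below scores a window descriptor with a linear decision function. The text does not state which SVM kernel is used, so the linear form is an assumption, as are the function names and the zero threshold. The weights and bias would come from training carried out beforehand.

```python
def svm_score(features, weights, bias):
    """Linear SVM decision value w . x + b (linear kernel assumed)."""
    if len(features) != len(weights):
        raise ValueError("feature and weight vectors must have the same length")
    return sum(w * f for w, f in zip(weights, features)) + bias


def is_object_window(features, weights, bias, threshold=0.0):
    """True when the window is classified as containing the object."""
    return svm_score(features, weights, bias) > threshold
```

A linear classifier needs only one dot product per window, here over 1764 values, which is consistent with the emphasis on real-time processing.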

The object detection unit 12 determines whether the feature has been calculated for all windows in the given scan layer (step 308). If not, it sets the next calculation window (step 303) and calculates the feature in the same way. When the feature has been calculated for all windows, the object detection unit 12 ends the object detection process for that scan layer.

The above describes object detection for a single scan layer. When all of the scan layers stored in the scan table, or two or more of them, are to be processed, the flow described with reference to FIG. 3 is repeated the corresponding number of times. Changing the scan layer changes the window size. Therefore, when persons appear in the image at various sizes, it is desirable to perform object detection with a plurality of scan layers.

FIG. 4 is an example of the scan table stored in the scan table storage unit 11. This scan table is an example for an input image resolution of 640 × 480 pixels. A scan layer number is assigned to each window size, and the window size, cell size, scan stride value, and number of windows are stored in association with the scan layer. The window size is the size of the window set when detecting objects in the input image, expressed here in pixels. The cell size is the size of the cells set inside the window, as described above. The scan stride value is the pixel distance by which the window is shifted in order to scan the entire image. The number of windows is the number of windows needed to scan the entire image. The object detection unit 12 scans the input frame image while referring to the scan table stored in the scan table storage unit 11.
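One possible in-memory form of such a scan table is sketched below. The actual values of FIG. 4 are not present in this text, so the two layers shown are invented placeholders. Square windows and the `ScanLayer` class are also assumptions. The number of windows is derived from the window size and stride rather than stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanLayer:
    layer: int        # scan layer number
    window_size: int  # window side length in pixels (square window assumed)
    cell_size: int    # cell side length in pixels
    stride: int       # scan stride in pixels

    def window_positions(self, image_w, image_h):
        """Top-left corners of all windows needed to cover the image."""
        xs = range(0, image_w - self.window_size + 1, self.stride)
        ys = range(0, image_h - self.window_size + 1, self.stride)
        return [(x, y) for y in ys for x in xs]


# Placeholder values for illustration only; not the table of FIG. 4.
SCAN_TABLE = [
    ScanLayer(layer=0, window_size=64, cell_size=8, stride=16),
    ScanLayer(layer=1, window_size=128, cell_size=16, stride=32),
]

for entry in SCAN_TABLE:
    n = len(entry.window_positions(640, 480))
    print(f"layer {entry.layer}: {n} windows")
```

In these placeholder layers the cell size is one eighth of the window size, matching the 8 × 8 cell layout described for FIG. 6.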

FIG. 5 is a conceptual diagram showing the relationship between cells and blocks. In step 304 of FIG. 3, the object detection unit 12 sets blocks within the window and generates luminance gradient histograms. In doing so, it sets cells within each block. For each cell, the horizontal and vertical luminance gradients are obtained, and the histograms of cell 1, cell 2, cell 3, and cell 4 are concatenated in that order to generate a 36-dimensional luminance gradient orientation histogram for the block.
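The construction of a 36-dimensional block histogram from four cells can be sketched as follows. This is a minimal illustration, not the implementation of the object detection unit 12: the 9 orientation bins per cell are inferred from 36 dimensions divided by 4 cells, and the central-difference gradient, the magnitude weighting, the unsigned 0 to 180 degree orientation range, and the row-major ordering of cells 1 to 4 are all assumptions.

```python
import math

BINS = 9  # assumed: 36 dimensions / 4 cells


def cell_histogram(img, x0, y0, size):
    """Magnitude-weighted orientation histogram of one size x size cell."""
    hist = [0.0] * BINS
    h, w = len(img), len(img[0])
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            # Horizontal and vertical central differences, clamped at the border.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle / (180.0 / BINS)) % BINS] += mag
    return hist


def block_histogram(img, x0, y0, cell):
    """Concatenate the histograms of cells 1..4 of a 2 x 2 cell block."""
    # Assumed cell order: top-left, top-right, bottom-left, bottom-right.
    offsets = [(0, 0), (cell, 0), (0, cell), (cell, cell)]
    hist = []
    for dx, dy in offsets:
        hist.extend(cell_histogram(img, x0 + dx, y0 + dy, cell))
    return hist


# An 8 x 8 test image whose luminance increases from left to right.
image = [[x for x in range(8)] for _ in range(8)]
features = block_histogram(image, 0, 0, 4)
print(len(features))  # 36
```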

FIG. 6 is a conceptual diagram showing the relationship between windows and blocks. In one example, each region obtained by dividing the window into four equal parts horizontally and vertically is a block, and the four regions obtained by further dividing a block into two equal parts horizontally and vertically are cells. Because the block is shifted by one cell width at a time for the overlap scan, the block region drawn with a dotted line is the next block whose luminance gradient is computed. When blocks are overlap-scanned in this way, the next block overlaps the previous block by two cells. For those two cells, the object detection unit 12 therefore reuses the cell feature amounts computed for the previous block as they are. Here, scanning proceeds first in the vertical direction and then in the horizontal direction. However, the order is not limited to this; scanning may proceed first horizontally and then vertically.
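The saving from reusing cell features during the overlap scan can be sketched with a simple cache. The example assumes a window of 8 × 8 cells (four blocks per side, two cells per block side) scanned by a 2 × 2 cell block in steps of one cell. The dictionary cache and the names `scan_window` and `cell_feature` are illustrative assumptions; a cache of this kind reuses every previously computed cell, which is broader than reusing only the two cells shared with the immediately preceding block.

```python
def scan_window(cells_per_side=8, block_cells=2):
    """Overlap-scan a window with a block, computing each cell feature only once.

    Returns the list of blocks (each a list of cell features) and the number of
    cell features that actually had to be computed.
    """
    computed = {}  # cache: (cell column, cell row) -> cell feature

    def cell_feature(cx, cy):
        if (cx, cy) not in computed:
            # Placeholder standing in for a real per-cell gradient histogram.
            computed[(cx, cy)] = ("hist", cx, cy)
        return computed[(cx, cy)]

    blocks = []
    last = cells_per_side - block_cells
    for bx in range(last + 1):        # horizontal position, advanced second
        for by in range(last + 1):    # vertical position, advanced first
            blocks.append([cell_feature(bx + dx, by + dy)
                           for dy in range(block_cells)
                           for dx in range(block_cells)])
    return blocks, len(computed)


blocks, n_computed = scan_window()
n_used = sum(len(b) for b in blocks)
print(len(blocks), n_used, n_computed)  # 49 196 64
```

Under these assumptions, 49 overlapping block positions reference 196 cells in total, but only 64 distinct cell features are computed.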

FIG. 7 shows an example of the processing by which the object tracking unit 14 of the image processing apparatus of the present invention associates the same object area across frame images. The object tracking unit 14 sets, as object candidate areas, the areas that the clustering integration unit 13 has set as object areas in the input frame image (step 701).

Next, the object tracking unit 14 sets an object prediction area in the input frame image based on the object area in a past frame image (step 702). For example, the object tracking unit 14 reads past frame images, such as the frame image two frames before and the frame image one frame before the input frame image, from the image storage unit 10. It then calculates the amount of change in coordinates between the object area in the frame image two frames before and the object area in the frame image one frame before that is associated with it. The object tracking unit 14 then calculates and sets the object prediction area in the input frame image based on the coordinates of the object area in the frame image one frame before and the calculated amount of change in coordinates. As another example, a predetermined amount of change may be set in advance, and the object prediction area in the input frame image may be calculated from the object area in the frame image one frame before. In this case, by setting the amount of coordinate change in advance based on the moving speed of the target object, such as human walking speed, the object prediction area can be set even when the object does not appear in the frame two frames before. As yet another example, instead of the frame image one frame before, a frame image several frames before may be used as the past frame image, and the object prediction area may be set based on the object area of that frame image and a predetermined amount of coordinate change. That is, the past frame image is not limited to the frame image one frame before.
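The two ways of setting the object prediction area in step 702 can be sketched as follows. This is a minimal illustration under stated assumptions: areas are represented as `(x, y, width)` tuples with `(x, y)` the upper-left corner, the predicted width is carried over unchanged from the previous frame, and the function name `predict_area` and the `default_shift` parameter are hypothetical.

```python
def predict_area(prev1, prev2=None, default_shift=(0, 0)):
    """Predict the object area in the input frame.

    prev1: (x, y, width) of the object area one frame before.
    prev2: (x, y, width) of the associated object area two frames before,
           or None if the object did not appear in that frame.
    default_shift: preset (dx, dy), e.g. derived from walking speed, used
           when prev2 is not available.
    """
    x1, y1, w1 = prev1
    if prev2 is not None:
        # Amount of change in coordinates between the two past frames.
        dx, dy = x1 - prev2[0], y1 - prev2[1]
    else:
        dx, dy = default_shift
    return (x1 + dx, y1 + dy, w1)  # assumption: width is carried over


print(predict_area((110, 52, 40), (100, 50, 40)))            # (120, 54, 40)
print(predict_area((110, 52, 40), default_shift=(5, 0)))     # (115, 52, 40)
```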

The object tracking unit 14 extracts the object candidate areas whose distance from the object prediction area and whose window width difference are within predetermined ranges (step 703). Specifically, let P(a(x₁), a(y₁)) be the coordinates of the upper-left corner of the rectangular window of the object prediction area and W_a its window width, and let P(b(x₁), b(y₁)) be the coordinates of the upper-left corner of the rectangular window of an object candidate area and W_b its window width. The object tracking unit 14 then extracts the object candidate areas that satisfy the following four conditions.

Here, T is a predetermined threshold value.

Expressions (2) and (4) above extract, from among the object candidate areas, those whose coordinate differences are within the predetermined threshold, that is, the object candidate areas lying within a predetermined distance from the object prediction area. Expressions (3) and (5) extract, from among the object candidate areas, those whose window width differs from that of the object prediction area by no more than a predetermined range. The object tracking unit 14 extracts the object candidate areas that satisfy these four conditions and treats them as related object candidate areas.

The threshold T can be set to any value; as one example, T may be determined based on the following expression, where β is an arbitrary variable.

Next, the object tracking unit 14 extracts, from the extracted object candidate areas, that is, the related object candidate areas, the one at the shortest distance (step 704). In other words, among the related object candidate areas that satisfy the predetermined conditions, it extracts the object candidate area whose coordinates are closest to the object prediction area.
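Steps 703 and 704 can be sketched as a gating test followed by a nearest-neighbor selection. The exact form of the four threshold conditions is not reproduced in this text, so the sketch assumes simple absolute-difference tests on the upper-left coordinates and on the window width against a single threshold T; the actual expressions may differ. Areas are `(x, y, width)` tuples, the distance is assumed to be the Euclidean distance between upper-left corners, and the function names are hypothetical.

```python
import math


def related_candidates(pred, candidates, T):
    """Step 703 (assumed form): keep candidates close to the prediction
    in position and in window width."""
    ax, ay, wa = pred
    return [c for c in candidates
            if abs(c[0] - ax) <= T      # x difference within threshold
            and abs(c[1] - ay) <= T     # y difference within threshold
            and abs(c[2] - wa) <= T]    # window width difference within threshold


def nearest(pred, related):
    """Step 704: pick the related candidate at the shortest distance."""
    if not related:
        return None
    return min(related, key=lambda c: math.hypot(c[0] - pred[0], c[1] - pred[1]))


prediction = (100, 100, 40)
candidates = [(104, 103, 42), (101, 100, 80), (160, 100, 40), (102, 101, 41)]
related = related_candidates(prediction, candidates, T=10)
print(related)                       # [(104, 103, 42), (102, 101, 41)]
print(nearest(prediction, related))  # (102, 101, 41)
```

In this example the candidate at (101, 100) is nearest in position but is rejected because its window width differs too much, which shows why the width test is applied before the distance comparison.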

Further, the object tracking unit 14 checks whether the extracted object candidate area is associated with another object prediction area (step 705), because the extracted object candidate area may already have been associated with another object prediction area. In this check, "object prediction area" may be replaced with "the object area of the past frame image (for example, the frame image one frame before) corresponding to the object prediction area". If the extracted object candidate area is also associated with another object prediction area (Yes in step 705), it is associated only with the object prediction area at the shortest distance from it among those object prediction areas, and its associations with the other object prediction areas are discarded (step 706).

Next, for an object prediction area whose association was discarded, the object candidate area at the shortest distance is extracted from the remaining related object candidate areas that were extracted for that object prediction area (step 707). Because a discarded object prediction area no longer has a corresponding object candidate area, the selection is performed again. The re-extraction is performed by selecting the related object candidate area at the shortest distance from among the related object candidate areas extracted beforehand.
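The conflict handling of steps 705 to 707 can be sketched as a greedy loop: each object prediction area takes its nearest related candidate, and when two prediction areas claim the same candidate, the closer one keeps it and the other re-selects from its remaining related candidates. This is one possible realization under stated assumptions, not necessarily the flow of FIG. 7: areas are `(x, y, width)` tuples, distance is Euclidean between upper-left corners, prediction areas are keyed by the person ID of the corresponding past object area, and the names `dist` and `associate` are hypothetical.

```python
import math


def dist(a, b):
    """Distance between the upper-left corners of two areas."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def associate(predictions, related):
    """predictions: {person_id: (x, y, w)} object prediction areas.
    related: {person_id: [candidate, ...]} related object candidate areas.
    Returns {person_id: candidate}, each candidate used at most once."""
    remaining = {pid: sorted(cands, key=lambda c: dist(predictions[pid], c))
                 for pid, cands in related.items()}
    owner = {}                  # candidate -> person_id currently holding it
    queue = list(predictions)
    while queue:
        pid = queue.pop(0)
        if not remaining.get(pid):
            continue            # no related candidates left for this prediction
        cand = remaining[pid].pop(0)   # nearest remaining candidate (704 / 707)
        rival = owner.get(cand)        # already associated elsewhere? (705)
        if rival is None:
            owner[cand] = pid
        elif dist(predictions[pid], cand) < dist(predictions[rival], cand):
            owner[cand] = pid          # keep only the closest association (706)
            queue.append(rival)        # the displaced prediction re-selects (707)
        else:
            queue.append(pid)
    return {pid: cand for cand, pid in owner.items()}


predictions = {1: (100, 100, 40), 2: (110, 100, 40)}
related = {1: [(104, 100, 40), (90, 100, 40)],
           2: [(104, 100, 40), (125, 100, 40)]}
result = associate(predictions, related)
print(result[1], result[2])  # (104, 100, 40) (125, 100, 40)
```

Both prediction areas initially prefer the candidate at (104, 100); person 1 is closer and keeps it, so person 2 falls back to its next related candidate.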

If the extracted object candidate area is not associated with another object prediction area (No in step 705), the person ID associated with the object prediction area is assigned to it (step 708). That is, it is assigned the person ID, an identifier, that was assigned to the object area in the past frame image (for example, the frame image one frame before) corresponding to the object prediction area. Assigning the same identifier enables object tracking across frames. The assigned person ID may be displayed on the display unit near the object area.

FIG. 8 is a conceptual diagram of the association of object areas between frame images. Frame images I₁, I₂, and I₃ are acquired in time series; for example, if I₃ is the input frame image, I₁ is the frame image two frames before and I₂ is the frame image one frame before. Each contains a detected object area, and when the object is a moving body such as a person, the detected position differs from frame to frame even for the same object, as shown in FIG. 8. To associate these objects, the object tracking unit 14 associates objects between frame images according to the flow shown in FIG. 7, thereby realizing object tracking.

FIG. 9 is a diagram showing the relationship between the object prediction area and an object candidate area in step 703 of FIG. 7. A is the object window that is the object prediction area, and B is an example of the window of an object candidate area. As described above, whether the differences in the position of the upper-left coordinate P and in the window width W relative to the object prediction area are within the threshold is determined using the four expressions, and the object candidate areas that satisfy the conditions are extracted.

FIG. 10 is a table showing experimental results of object detection by the image processing apparatus of the present invention. It shows the time taken to calculate the feature amounts for each scan layer and window size. In the present invention, object detection calculates the window feature amounts by overlap scanning. For each scan layer, the previously calculated results are reused for the feature amounts of the overlap regions, which speeds up the object detection process. As a result, even with a feature amount such as the HOG feature amount, which requires many calculations but enables highly accurate detection, object detection can be performed in a very short time, as shown in the figure. Object detection can therefore be performed in real time even when frame images are input one after another.

Claims

An image processing apparatus comprising: an object detection unit that detects, from an input frame image, an object window in which an object exists;
a clustering integration unit that sets an object area by integrating, through clustering, a plurality of neighboring object windows detected by the object detection unit; and
an object tracking unit that, taking the object area as an object candidate area and an area predicted from an object area of a past frame image as an object prediction area, registers the object candidate area whose window width difference from the object prediction area is within a predetermined range and which is at the shortest distance from the object prediction area, in association with the object area of the past frame image.

The image processing apparatus according to claim 1, wherein the object tracking unit extracts, from among a plurality of the object candidate areas, the object candidate areas whose window width difference is within the predetermined range and which are within a predetermined distance from the object prediction area as related object candidate areas, and registers, among the related object candidate areas, the related object candidate area at the shortest distance from the object prediction area in association with the object area of the past frame image.

The image processing apparatus according to claim 1, wherein, when the object candidate area is associated with two or more object prediction areas, the object tracking unit registers the object candidate area in association with only the object area of the past frame image corresponding to the object prediction area at the shortest distance from the object candidate area.

The image processing apparatus according to claim 1, wherein the object tracking unit performs the registration in association by assigning to the object candidate area an identifier identical to the identifier assigned to the object area of the past frame image.

The image processing apparatus according to claim 1, wherein the object tracking unit determines the object prediction area in the input frame image based on the amount of change in coordinates between a first object area in the frame image two frames before and a second object area, associated with the first object area, in the frame image one frame before.

The image processing apparatus according to claim 5, wherein, when there is no object area associated with the object area in the previous frame image, the object tracking unit determines the object prediction area in the input frame image by shifting the coordinates of the object area in the previous frame image by a predetermined amount of change.

The image processing apparatus according to claim 1, wherein the object detection unit sets a window in the input frame image, calculates a feature amount of the window by overlap-scanning the window with a block of a predetermined area, and detects the object window based on the calculated feature amount.

The image processing apparatus according to claim 7, wherein, when performing the overlap scan, the object detection unit calculates the feature amount of the window by reusing, for each scan layer, a previously calculated feature amount as the feature amount of the overlap region.

An image processing method comprising: a step of detecting, from an input frame image, an object window in which an object exists;
a step of setting an object area by integrating, through clustering, a plurality of detected neighboring object windows; and
a step of, taking the object area as an object candidate area and an area predicted from an object area of a past frame image as an object prediction area, registering the object candidate area whose window width difference from the object prediction area is within a predetermined range and which is at the shortest distance from the object prediction area, in association with the object area of the past frame image.

A program that causes a computer to execute:
a step of detecting, from an input frame image, an object window in which an object exists;
a step of setting an object area by integrating, through clustering, a plurality of detected neighboring object windows; and
a step of, taking the object area as an object candidate area and an area predicted from an object area of a past frame image as an object prediction area, registering the object candidate area whose window width difference from the object prediction area is within a predetermined range and which is at the shortest distance from the object prediction area, in association with the object area of the past frame image.