JP2016071387A

JP2016071387A - Image processing apparatus, image processing method, and program

Info

Publication number: JP2016071387A
Application number: JP2014196417A
Authority: JP
Inventors: Mingbian Zheng (明燮鄭); Akihiro Minagawa (明洋皆川); Yusuke Uehara (上原　祐介)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2014-09-26
Filing date: 2014-09-26
Publication date: 2016-05-09
Anticipated expiration: 2034-09-26
Also published as: JP6372282B2

Abstract

PROBLEM TO BE SOLVED: To improve the accuracy of detecting a specific object in an image processing apparatus that detects the specific object from an input image.
SOLUTION: An image processing apparatus 1 includes: an image acquisition section 11 which acquires an image captured by a camera; a feature quantity extraction section 12 which divides the image into a plurality of cells and extracts a feature quantity of each cell; a storage section which stores a background feature quantity 15 extracted from an image designated as a background image and dictionary data 14 including a feature quantity of a specific object to be detected from an image; a collation feature quantity determination section 16 which determines a collation feature quantity of each cell in the image to be collated with the dictionary data 14, on the basis of a differential value between the feature quantity extracted from that cell and the background feature quantity; and a feature quantity collation section 17 which collates the collation feature quantity of each cell in the image with the feature quantity of the dictionary data 14.
SELECTED DRAWING: Figure 1

Description

The present invention relates to an image processing apparatus, an image processing method, and a program for detecting a specific object in an image.

Some image processing apparatuses detect a specific object, such as a person, from image or video data captured by a camera. One known detection method for this type of apparatus uses Histograms of Oriented Gradients (HOG) feature amounts (see, for example, Non-Patent Document 1).

A HOG feature amount is a histogram of the luminance gradient directions in a local region of an image. In the detection method using HOG feature amounts, a plurality of images in which the specific object exists and a plurality of images in which it does not exist are prepared, and the HOG feature amount of the specific object and the HOG feature amount of non-specific objects, which are regarded as background objects, are learned and registered as dictionary data. Then, when an image that appears to be the specific object (a specific object candidate) exists in the input image, the HOG feature amount in the region containing the candidate (the detection frame) is collated with the HOG feature amounts of the dictionary data. If the collation determines that the candidate is the specific object, then, for example, the detection frame is composited onto the input image and the composite image is displayed on a display device such as a liquid crystal display.

Another method of detecting a specific object compares the input image with a background image in which the specific object does not exist, and detects the specific object from the contour shape of the region of the input image that differs from the background image (the region that has changed) (see, for example, Patent Document 1).

In the detection method that compares the input image with the background image, a difference image, which is the difference between the input image and the background image, is obtained, and the specific object is detected by extracting the contour region of a moving object from a plurality of difference images arranged in time series.

The image processing apparatuses described above are used to detect persons and the like in surveillance systems for crime prevention.

In recent years, use of the image processing apparatuses described above for observing customer behavior in stores and the like has also been studied. In customer behavior observation, using the movement paths of persons detected by such an apparatus from images of a store interior, a shopping street, or the like makes it easier to grasp customer needs, such as which products a customer is interested in.

JP 2004-265292 A

Navneet Dalal and Bill Triggs, "Histograms of oriented gradients for human detection," in IEEE Conf. on Computer Vision and Pattern Recognition, Vol. 1, pp. 886-893, 2005

In the detection method using HOG feature amounts described above, a specific object candidate in the input image is detected as the specific object when its HOG feature amount is more similar to the HOG feature amount of the specific object than to that of a non-specific object. That is, even if the candidate is not actually the specific object, it is detected as the specific object as long as its HOG feature amount is similar to that of the specific object.

However, the input image may contain a non-specific object whose shape can change, such as a flag. The HOG feature amount of such a non-specific object can take a value similar to the HOG feature amount of a specific object such as a person. Therefore, an image processing apparatus that detects persons may erroneously detect a non-specific object such as a flag as a person when such an object exists in the input image.

In the detection method using the difference image between the input image and the background image, when the specific object in the input image overlaps a non-specific object in the background image, the feature amount of the non-specific object is subtracted from the feature amount of the part of the specific object that overlaps it. As a result, the contour region of the specific object in the difference image takes a shape different from its contour region in the input image, which makes it difficult to detect the specific object correctly.

An object according to one aspect of the present invention is to improve the accuracy with which an image processing apparatus that detects a specific object from an input image detects that object.

An image processing apparatus according to one aspect of the present invention includes: an image acquisition unit that acquires an image captured by a camera; a feature amount extraction unit that divides the image into a plurality of cells and extracts a feature amount of each cell; a storage unit that stores a background feature amount extracted from an image designated as a background image among the acquired images, and dictionary data including a feature amount of a specific object to be detected from the image; a matching feature amount determination unit that determines a matching feature amount of each cell of the image to be collated with the dictionary data, based on a difference value between the feature amount extracted from that cell and the background feature amount; and a collation unit that collates the matching feature amount of each cell of the image with the feature amounts of the dictionary data.

According to the above aspect, the accuracy of detecting a specific object in an image processing apparatus that detects the specific object from an input image is improved.

FIG. 1 is a functional block diagram showing the configuration of an image processing apparatus according to a first embodiment of the present invention.
FIG. 2 is a schematic diagram for explaining the HOG feature amount.
FIG. 3 is a flowchart showing the pre-learning procedure.
FIG. 4 is a flowchart showing the procedure of the specific object detection process according to the first embodiment.
FIG. 5 is a diagram showing an example of an input image and HOG feature amounts.
FIG. 6 is a flowchart showing the contents of the collation process of FIG. 4.
FIG. 7 is a flowchart showing the contents of the matching feature amount determination process of FIG. 6.
FIG. 8 is a schematic diagram showing an example of two input images used in the matching feature amount determination process.
FIG. 9 is a schematic diagram showing an example of the feature amount difference value ΔH for the input images of FIG. 8.
FIG. 10 is a schematic diagram showing images after the matching feature amount determination process using the input images of FIG. 8 and of the collation result.
FIG. 11 is a schematic diagram showing another example of two input images used in the matching feature amount determination process.
FIG. 12 is a schematic diagram showing an example of the feature amount difference value ΔH for the input images of FIG. 11.
FIG. 13 is a schematic diagram showing images after the matching feature amount determination process using the input images of FIG. 11 and of the collation result.
FIG. 14 is a flowchart showing the procedure of the specific object detection process according to a second embodiment of the present invention.
FIG. 15 is a flowchart showing the contents of the background feature amount setting process of FIG. 14.
FIG. 16 is a schematic diagram showing an example of the change over time in the presence or absence of a non-specific object in the input image.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

(First embodiment)
FIG. 1 is a functional block diagram showing the configuration of the image processing apparatus according to the first embodiment of the present invention.

As shown in FIG. 1, the image processing apparatus 1 of the present embodiment includes an image acquisition unit 10, a feature amount extraction unit 12, a pre-learning unit 13, a matching feature amount determination unit 16, a feature amount matching unit 17, and an output processing unit 18. The image processing apparatus 1 further includes an input image holding unit 11, feature amount dictionary data 14, and a background feature amount database 15.

The image acquisition unit 10 acquires image or video data captured by the imaging device 2 and stores it in the input image holding unit 11 as an input image.

The feature amount extraction unit 12 extracts, from the input image, the feature amounts used to detect a specific object (detection target) such as a person. The feature amount extraction unit 12 divides one input image (one frame) into a plurality of cells and extracts a feature amount for each cell. In the image processing apparatus 1 of the present embodiment, the feature amount extraction unit 12 extracts, as the feature amount, a HOG feature amount, which can be regarded as invariant to translation within the image.

In addition, when the input image from which the feature amounts were extracted is an image designated as the background image, the feature amount extraction unit 12 registers the extracted feature amounts in the background feature amount database 15.

The pre-learning unit 13 learns the feature amounts of the specific object and of non-specific objects in images. Using HOG feature amounts extracted from a plurality of images in which the specific object (for example, a person) exists and HOG feature amounts extracted from a plurality of images in which it does not exist, the pre-learning unit 13 learns the HOG feature amounts of the specific object and of non-specific objects by machine learning. The pre-learning unit 13 also generates dictionary data 14 including the learned HOG feature amounts and stores it in the storage unit of the image processing apparatus 1.

Note that the pre-learning unit 13 learns the feature amounts before the image processing apparatus 1 starts the specific object detection process. That is, the pre-learning unit 13 does not learn feature amounts while the image processing apparatus 1 is detecting the specific object. The dictionary data 14 may also be data created by another image processing apparatus and read into the image processing apparatus 1 via a portable recording medium or a network. When dictionary data 14 created by another image processing apparatus is read and used, the image processing apparatus 1 need not include the pre-learning unit 13.

The matching feature amount determination unit 16 uses the feature amounts of the input image extracted by the feature amount extraction unit 12 and the background feature amounts in the background feature amount database 15 to determine the matching feature amounts of the input image to be collated with the dictionary data 14. The unit sets, in the input image, a detection frame containing one or more cells, and obtains, for each cell in the detection frame, the difference value between the extracted feature amount and the background feature amount. Then, for each cell, it determines the matching feature amount by comparing the difference value with a predetermined threshold. Specifically, for a cell whose difference value is larger than the threshold, the feature amount extracted by the feature amount extraction unit 12 is used as the matching feature amount; for a cell whose difference value is equal to or less than the threshold, the difference value is used as the matching feature amount. In other words, the matching feature amount determination unit 16 replaces, with the difference value, the feature amount of each cell in the detection frame whose difference value is equal to or less than the threshold.
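The per-cell rule above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `matching_feature` is hypothetical, and the sketch assumes that the difference value is the per-bin difference between the two histograms and that its magnitude (the sum of absolute per-bin differences) is what is compared with the threshold. The text up to this point does not fix either choice.

```python
from typing import List, Sequence


def matching_feature(cell_hog: Sequence[float],
                     background_hog: Sequence[float],
                     threshold: float) -> List[float]:
    """Return the matching feature amount of one cell (hypothetical helper)."""
    # Per-bin difference between the current cell and the background cell.
    diff = [h - b for h, b in zip(cell_hog, background_hog)]
    # ASSUMPTION: the scalar compared with the threshold is the sum of the
    # absolute per-bin differences; the source only says "difference value".
    magnitude = sum(abs(d) for d in diff)
    if magnitude > threshold:
        # The cell changed noticeably: keep the extracted HOG feature amount.
        return list(cell_hog)
    # The cell is close to the background: use the difference value instead.
    return diff
```

Under these assumptions, a cell that is nearly unchanged from the background contributes a near-zero histogram to the collation, while a cell that has changed keeps its original HOG feature amount.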

The feature amount matching unit 17 collates the matching feature amount of each cell in the detection frame with the feature amounts of the dictionary data 14 and determines whether the specific object exists in the detection frame. If there is a detection frame in which the specific object is determined to exist, the unit sends information such as the position and dimensions of that detection frame to the output processing unit 18.

The output processing unit 18 outputs the input images in time-series order to a display device 3 such as a liquid crystal display. When the output processing unit 18 receives, from the feature amount matching unit 17, information on a detection frame in which the specific object is determined to exist, it outputs to the display device 3 an image in which the detection frame is composited onto the input image.

The image processing apparatus 1 of the present embodiment is realized by a computer and a program executed by the computer. In this case, the feature amount extraction unit 12, the pre-learning unit 13, the matching feature amount determination unit 16, the feature amount matching unit 17, and the output processing unit 18 are realized by causing a processor such as a Central Processing Unit (CPU) or a Digital Signal Processor (DSP) to execute a predetermined program. For the input image holding unit 11 and the storage unit that stores the background feature amount database 15 and the dictionary data 14, a storage device such as a hard disk drive or semiconductor memory connected to the processor via a bus, or a cache memory inside the processor, is used.

<Pre-learning process>
Before the image processing apparatus 1 of the present embodiment detects a specific object in an input image, dictionary data 14 of HOG feature amounts used for collation with specific object candidates in the input image is created and registered in advance. Therefore, before describing the method of detecting a specific object, the HOG feature amount and the pre-learning procedure for registering the dictionary data 14 are briefly described.

FIG. 2 is a schematic diagram for explaining the HOG feature amount.
The HOG feature amount is a histogram of the luminance gradient directions in a local region of an image, and is used as a feature amount when detecting a specific object such as a person from an image.

As shown in FIG. 2(a), the HOG feature amount is extracted for each cell 41 after dividing one image 40 into a plurality of cells 41. Each cell 41 consists of a plurality of pixels (5 × 5 pixels in the example shown in FIG. 2(a)).

The HOG feature amount of one cell 41 is obtained by calculating the gradient direction and gradient strength of the luminance at each pixel in the cell 41 and forming a histogram with the gradient direction divided into, for example, nine bins as shown in FIG. 2(b). The HOG feature amount used in the present embodiment may be extracted by any well-known extraction method.

When the specific object is a person, an image obtained by cutting out, from the image 40 shown in FIG. 2(a), a rectangular region 42 of n × m cells (n and m are arbitrary natural numbers) containing the person 50 is used in pre-learning as an image in which the specific object exists. A plurality of images in which the specific object (person 50) exists, like the rectangular region 42, and a plurality of images in which the specific object does not exist are prepared as learning images, and the HOG feature amount of each image is extracted.

Although the HOG feature amount is a histogram over the divided gradient directions, the graphs of HOG feature amounts referred to in the following description represent it schematically as a curve PH based on the gradient strength in each direction, as shown in FIG. 2(b).

FIG. 3 is a flowchart showing the pre-learning procedure.
Pre-learning in the image processing apparatus 1 is performed by a well-known learning method applied to specific object detection using HOG feature amounts. When pre-learning is performed, as shown in FIG. 3, learning images are first input to the image processing apparatus 1 (step S101). The learning images are input via the image acquisition unit 10 or an input interface (not shown) and are held in the input image holding unit 11. A plurality of images in which the specific object (person) exists and a plurality of images in which it does not exist are used as the learning images. The learning images are, for example, images obtained by cutting out a rectangular region of n × m cells from an image that captures the same range as the imaging range of the input images used in the specific object detection process.

Next, the feature amount extraction unit 12 extracts the HOG feature amount of each input learning image (step S102). Step S102 is performed by a well-known extraction method.

Next, the pre-learning unit 13 learns the HOG feature amount of the specific object and the HOG feature amount of non-specific objects (background objects) and creates the dictionary data 14 (step S103). The HOG feature amounts are learned by a well-known machine learning method such as a Support Vector Machine (SVM) or Adaptive Boosting (AdaBoost).

The created dictionary data 14 is then stored in the storage unit of the image processing apparatus 1 (step S104).

<Specific object detection process>
After the dictionary data 14 has been created by the pre-learning described above, the image processing apparatus 1 starts the process of detecting the specific object in input images.

FIG. 4 is a flowchart showing the procedure of the specific object detection process according to the first embodiment. FIG. 5 is a diagram showing an example of an input image and HOG feature amounts.

When the image processing apparatus 1 starts the specific object detection process, the image acquisition unit 10 sequentially acquires the images captured by the imaging device 2 and stores them in the input image holding unit 11 as input images. At this time, the image acquisition unit 10 assigns each acquired input image a unique frame number (serial number) in time-series order, starting from 1.

Once acquisition of input images has started, as shown in FIG. 4, a variable i representing the frame number of the input image is first initialized (step S2). Because the frame numbers are serial numbers starting from 1 as described above, the variable i is set to 1 in step S2.

Next, the feature amount extraction unit 12 extracts the HOG feature amounts from the input image with frame number i (step S3).

Next, the feature amount extraction unit 12 determines whether the frame number i of the input image is greater than 1 (step S4). If i = 1 (step S4; No), the feature amount extraction unit 12 registers the extracted HOG feature amounts in the background feature amount database 15 as background feature amounts (step S5).

On the other hand, if i > 1 (step S4; Yes), the feature amount extraction unit 12 passes the extracted HOG feature amounts to the matching feature amount determination unit 16, and has the matching feature amount determination unit 16 and the feature amount matching unit 17 perform the collation process (step S6). In step S6, the matching feature amount of each cell in the input image is determined based on the difference value between the extracted HOG feature amount and the background feature amount, and the matching feature amounts are collated with the feature amounts of the dictionary data to determine whether the specific object exists in the input image. The specific procedure of the collation process in step S6 is described later. When the collation process in step S6 is finished, the collation result is output (step S7). Step S7 is performed by the output processing unit 18: if the specific object was detected in the input image with frame number i in step S6, a frame line surrounding the detected specific object is composited (added) onto the input image, which is then output to the display device 3. If no specific object was detected in the input image with frame number i, the output processing unit 18 outputs the input image to the display device 3 as is.

After step S5, which registers the background feature amounts, or step S7, which outputs the collation result, it is determined whether to continue the detection process (step S8). If the detection process is to continue (step S8; Yes), the variable i representing the frame number is incremented by 1 (i = i + 1), and the processing from step S3 is repeated. If not (step S8; No), the detection process ends.
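The control flow of steps S2 through S8 might be sketched as below. The callables `extract`, `match`, and `emit` are hypothetical stand-ins for the feature amount extraction unit 12, the collation process of step S6, and the output processing unit 18. Ending the loop when the frame source is exhausted is also an assumption, standing in for the decision in step S8.

```python
from typing import Any, Callable, Iterable


def run_detection(frames: Iterable[Any],
                  extract: Callable[[Any], Any],
                  match: Callable[[Any, Any], Any],
                  emit: Callable[[int, Any, Any], None]) -> Any:
    """Drive the per-frame flow of FIG. 4 (hypothetical sketch)."""
    background = None
    # Frame numbers are serial numbers starting from 1 (step S2).
    for i, frame in enumerate(frames, start=1):
        features = extract(frame)                  # step S3
        if i == 1:                                 # step S4; No
            background = features                  # step S5: register background
            continue
        detections = match(features, background)   # step S6: collation process
        emit(i, frame, detections)                 # step S7: output the result
    # ASSUMPTION: step S8 answers "No" once the frame source is exhausted.
    return background
```

Only the first frame is treated as the background here, matching the first embodiment; the later frames are all compared against that single set of background feature amounts.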

In step S3 of the above detection process, as shown in FIG. 5(a), the input image 40 is divided into a plurality of cells 41, and a HOG feature amount is extracted for each cell 41. Each cell 41 is w × h pixels. Each cell 41 is given a unique cell number and is distinguished by its position, specified by the xy coordinates of its upper-left corner in the input image 40. The extracted HOG feature amount of each cell 41 is then associated with the unique cell number, the cell position, and the cell dimensions, as shown in FIG. 5(b).
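One possible way to build the per-cell records of FIG. 5(b) is sketched below. The `Cell` record and `divide_into_cells` are hypothetical names. Raster-order numbering from 1 and dropping partial cells at the right and bottom edges are assumptions, since the text only requires that each cell have a unique number, a position, and a size.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cell:
    number: int  # unique cell number
    x: int       # x coordinate of the upper-left corner, in pixels
    y: int       # y coordinate of the upper-left corner, in pixels
    w: int       # cell width in pixels
    h: int       # cell height in pixels


def divide_into_cells(image_w: int, image_h: int, w: int, h: int) -> List[Cell]:
    """Divide an image of image_w x image_h pixels into cells of w x h pixels."""
    cells = []
    number = 1
    # ASSUMPTION: cells are numbered in raster order; partial cells are dropped.
    for y in range(0, image_h - h + 1, h):
        for x in range(0, image_w - w + 1, w):
            cells.append(Cell(number, x, y, w, h))
            number += 1
    return cells
```

Keying the background feature amounts by the same `(x, y)` position would allow the lookup by cell position that the matching feature amount determination process relies on.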

H_1 to H_9 in the HOG feature amount shown in FIG. 5(b) are the gradient strengths for the respective gradient directions when the gradient direction is divided into nine bins over the angle range of 0 to 180 degrees and formed into a histogram; the larger the subscript, the larger the angle representing the gradient direction. For example, the gradient strength H_1 is the sum of the gradient strengths of the pixels whose gradient direction θ (degrees) satisfies 0 ≤ θ < 20, and the gradient strength H_2 is the sum for the pixels satisfying 20 ≤ θ < 40.
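As an illustration of the nine-bin histogram, the following sketch computes the HOG feature amount of a single cell in plain Python. The function name `cell_hog`, the central-difference gradient, and the clamping at the cell border are assumptions; the description leaves the extraction method open to any well-known technique.

```python
import math
from typing import List, Sequence


def cell_hog(luma: Sequence[Sequence[float]], bins: int = 9) -> List[float]:
    """Return [H_1, ..., H_bins] for one cell given its luminance rows."""
    hist = [0.0] * bins
    bin_width = 180.0 / bins  # 20 degrees per bin when bins == 9
    rows, cols = len(luma), len(luma[0])
    for y in range(rows):
        for x in range(cols):
            # ASSUMPTION: central differences, clamped at the cell border.
            gx = luma[y][min(x + 1, cols - 1)] - luma[y][max(x - 1, 0)]
            gy = luma[min(y + 1, rows - 1)][x] - luma[max(y - 1, 0)][x]
            strength = math.hypot(gx, gy)
            # Fold the direction into the unsigned range [0, 180) degrees.
            theta = math.degrees(math.atan2(gy, gx)) % 180.0
            # The modulo on the index guards against theta rounding to 180.0.
            hist[int(theta // bin_width) % bins] += strength
    return hist
```

For a cell whose luminance increases only from left to right, every gradient direction is 0 degrees, so all of the strength accumulates in H_1 (index 0 of the returned list).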

In the specific object detection process according to the present embodiment, as described above, the input image with frame number 1, that is, the first input image (at the start of the detection process), is used as the background image, and the HOG feature amounts extracted from it are registered in the background feature amount database 15. The input image 40 used as the background image may contain a non-specific object whose shape can change, such as the flag 51 shown in FIG. 5(a). In the collation process of step S6, the matching feature amount of each cell of the input image is determined based on the difference value between the HOG feature amount of the input image and the background feature amount, and the matching feature amounts are collated with the dictionary data.

FIG. 6 is a flowchart showing the contents of the collation process of FIG. 4. FIG. 7 is a flowchart showing the contents of the matching feature amount determination process of FIG. 6.

In the collation process, as shown in FIG. 6, a detection list representing the detection results for the specific object is first initialized (step S601). The detection list is, for example, data describing the position and size of each frame surrounding an image detected as the specific object in the input image, and is held in a cache memory or the like provided in the image processing apparatus 1.

Next, the matching feature amount determination unit 16 sets a detection frame in the input image (step S602). A detection frame is the outer boundary of a block consisting of a plurality of vertically and horizontally contiguous cells. Detection frames of several sizes are prepared according to the size of the specific object (person) in the input image. In step S602, a detection frame size is selected and applied to the input image, and the cells contained in the detection frame are identified.

Next, the matching feature amount determination unit 16 performs a matching feature amount determination process that determines the matching feature amounts of the cells in the detection frame (step S603). In step S603, the process shown in FIG. 7 is performed.

In step S603, one cell in the detection frame is first selected (step S603a), and the HOG feature amount (background feature amount) of the cell in the background image corresponding to the selected cell is acquired from the background feature amount database (step S603b). In step S603b, the background feature amount is acquired based on the cell position shown in FIG. 5B.

Next, the difference value ΔH between the HOG feature amount of the selected cell and the background feature amount is calculated (step S603c).

Next, it is determined whether the difference value ΔH is greater than a threshold TH1 (step S603d). If ΔH > TH1 (step S603d; Yes), the HOG feature amount extracted in step S3 is used as the matching feature amount of the selected cell (step S603e). If ΔH ≦ TH1 (step S603d; No), the difference value ΔH is used as the matching feature amount of the selected cell (step S603f). The threshold TH1 is set so that, at a minimum, the matching feature amount of a cell containing only a non-specific object that exists in the background image becomes the difference value ΔH. For example, the threshold TH1 is determined by statistically processing the difference values ΔH of the HOG feature amounts of cells containing a non-specific object, using a plurality of images in which a non-specific object such as the flag 51 shown in FIG. 5A is present. Because an HOG feature amount with the gradient direction divided into nine bins consists of nine values, the difference value ΔH also consists of nine values. Consequently, even when ΔH is large when all nine values are compared together (for example, over the entire curve PH in FIG. 2), the values for some gradient directions may be small when viewed individually. The determination criterion in step S603d may therefore be chosen as appropriate; for example, ΔH > TH1 may be deemed to hold when five or more of the nine values are greater than the threshold TH1.
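Steps S603c to S603f can be sketched for a single cell as follows. The description does not say how ΔH is computed per bin, so a per-bin absolute difference is assumed here. The "five or more of nine bins" rule is the example criterion mentioned above. The function name `matching_feature` and all numeric values are hypothetical.

```python
from typing import List, Sequence


def matching_feature(hog: Sequence[float], background: Sequence[float],
                     th1: float, min_bins_over: int = 5) -> List[float]:
    """Return the matching feature amount of one cell (steps S603c to S603f)."""
    # S603c: per-bin difference (assumed to be an absolute difference).
    delta = [abs(a - b) for a, b in zip(hog, background)]
    # S603d: deem "delta > TH1" to hold if enough bins exceed TH1.
    bins_over = sum(1 for d in delta if d > th1)
    if bins_over >= min_bins_over:
        return list(hog)   # S603e: keep the extracted HOG feature amount
    return delta           # S603f: replace it with the difference value


background = [10, 12, 8, 9, 11, 10, 9, 12, 10]   # cell containing only the flag
flag_now = [11, 12, 9, 9, 10, 10, 9, 11, 10]     # same flag, slightly deformed
person_now = [30, 2, 25, 9, 40, 1, 9, 30, 22]    # a person has entered the cell

flag_feature = matching_feature(flag_now, background, th1=3)
person_feature = matching_feature(person_now, background, th1=3)
```

With these values the flag cell yields the small difference vector, so the flag is effectively suppressed, while the cell containing the person keeps its extracted HOG feature amount.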

After the matching feature amount of the selected cell has been determined, it is determined whether any cell in the detection frame still has no matching feature amount (step S603g). If such a cell exists (step S603g; Yes), the process from step S603a is repeated. If no such cell exists (step S603g; No), the matching feature amount determination process (step S603) ends.

As shown in FIG. 6, when the matching feature amount determination process (step S603) ends, a similarity S between the image in the detection frame and the specific object is calculated from the matching feature amounts and the HOG feature amounts of the dictionary data (step S604). The process of step S604 is performed by the feature amount matching unit 17. The similarity S is calculated by a well-known calculation method used in detection methods based on HOG feature amounts.

Next, the feature amount matching unit 17 determines whether the calculated similarity S is greater than a threshold TH2 (step S605). The larger the value of S, the more closely the image in the detection frame resembles the specific object. The feature amount matching unit 17 therefore registers the settings of the detection frame (for example, its position and dimensions) in the detection list only when S > TH2 (step S605; Yes) (step S606). The threshold TH2 is an arbitrary value, and any value adopted in detection methods based on HOG feature amounts may be used.
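Steps S601 and S604 to S606 can be sketched as below. The description only says that the similarity S is calculated by a well-known method, so the linear score used here (a dot product with dictionary weights, as in a linear classifier over HOG features) is an assumption, as are the names `linear_score` and `collate` and all numeric values.

```python
from typing import Dict, List, Sequence, Tuple

Frame = Tuple[int, int, int, int]  # (x, y, width, height) of a detection frame


def linear_score(features: Sequence[float], weights: Sequence[float],
                 bias: float = 0.0) -> float:
    """Assumed similarity S: dot product of features and dictionary weights."""
    return sum(f * w for f, w in zip(features, weights)) + bias


def collate(frames: Dict[Frame, Sequence[float]], weights: Sequence[float],
            th2: float, bias: float = 0.0) -> List[Frame]:
    """Register every detection frame whose similarity exceeds TH2."""
    detection_list: List[Frame] = []                 # S601: initialize the list
    for frame, features in frames.items():
        s = linear_score(features, weights, bias)    # S604: similarity S
        if s > th2:                                  # S605: S > TH2 ?
            detection_list.append(frame)             # S606: register the frame
    return detection_list


# Hypothetical matching feature amounts for two detection frames.
frames = {
    (0, 0, 16, 32): [0.0, 0.1, 0.0],    # flag suppressed to near-zero differences
    (40, 0, 16, 32): [0.9, 0.2, 0.8],   # person keeps the extracted features
}
detections = collate(frames, weights=[1.0, 0.0, 1.0], th2=1.0)
```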

Next, it is determined whether any unprocessed detection frame remains (step S607). In step S607, for example, it is first determined whether the processing of steps S602 to S606 with the detection frame of the selected size has been performed on all regions of the input image. If some region has not been processed, it is determined that an unprocessed detection frame remains (step S607; Yes), and the processing of steps S602 to S606 is repeated for that region. When the processing of steps S602 to S606 with a detection frame of one size has been performed on all regions of the input image, it is determined whether the same processing has been performed with the detection frames of all other sizes. If a detection frame size has not yet been processed, it is determined that an unprocessed detection frame remains (step S607; Yes), and the processing of steps S602 to S606 is repeated until the same processing has been completed for the detection frames of all sizes.

When the processing of steps S602 to S606 has been repeated and it is determined that no unprocessed detection frame remains (step S607; No), the collation process ends.

Note that when the processing of steps S602 to S606 is repeated with one detection frame in the collation process of step S6, the position at which the detection frame is set in step S602 is shifted so that, for example, the entire input image is raster scanned.
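The raster scan over detection frame positions and sizes (steps S602 and S607) can be sketched as a generator. Positions and sizes are expressed in cell units here, and the one-cell step, the frame sizes, and the name `scan_frames` are assumptions made for illustration.

```python
from typing import Iterator, List, Tuple


def scan_frames(cols: int, rows: int,
                frame_sizes: List[Tuple[int, int]],
                step: int = 1) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (col, row, frame_cols, frame_rows) in raster order, one size at a time."""
    for frame_cols, frame_rows in frame_sizes:
        # Left to right, then top to bottom, keeping the frame inside the grid.
        for row in range(0, rows - frame_rows + 1, step):
            for col in range(0, cols - frame_cols + 1, step):
                yield (col, row, frame_cols, frame_rows)


# Hypothetical 8 x 4 cell grid scanned with two frame sizes.
scanned = list(scan_frames(cols=8, rows=4, frame_sizes=[(2, 4), (4, 4)]))
```

Exhausting the generator corresponds to the "No" branch of step S607: every position has been visited for every frame size.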

In the specific object detection process of the image processing apparatus 1, when the collation process of step S6 ends, the collation result is output as shown in FIG. 4 (step S7).

In step S7, for example, based on the position and dimensions of the detection frames registered in the detection list in step S606, a frame line surrounding each detected specific object is composited onto (added to) the input image with frame number i, and the result is output to the display device 3.

<Specific object detection example>
An example of detecting a specific object by the detection process described above will be described with reference to FIGS. 8 to 10.

FIG. 8 is a schematic diagram showing an example of two input images used in the matching feature amount determination process. FIG. 9 is a schematic diagram showing an example of the feature amount difference values ΔH for the input images of FIG. 8. FIG. 10 is a schematic diagram showing the image after the matching feature amount determination process using the input images of FIG. 8 and the image of the collation result.

In the specific object detection process of this embodiment, separately from the HOG feature amounts learned in advance, the HOG feature amounts extracted from the first input image (frame number 1) are registered in the background feature amount database 15 as background feature amounts. When a specific object is to be detected in an input image with frame number 2 or later, the matching feature amount of each cell in the input image is first determined by the matching feature amount determination process within the collation process (step S6).

The input images in which a specific object is to be detected (detection target images) include an image 40b in which a flag 51, which is a non-specific object, and a person 50, which is a specific object, are both present, as shown in FIG. 8A. In contrast, as shown in FIG. 8B, only the flag 51, which is a non-specific object, is present in the first input image (background image) 40a.

When the collation process (step S6) is performed on the detection target image 40b, the matching feature amount determination process determines the matching feature amounts of, among others, the cells in the detection frame 43 containing the flag 51 and the cells in the detection frame 44 containing the person 50 in the detection target image 40b.

The matching feature amount of the left cell 41a in the top row of the detection frame 43 of the detection target image 40b is determined based on the difference value ΔH between the HOG feature amount extracted from the cell 41a and the HOG feature amount (background feature amount) of the cell 41A of the background image 40a corresponding to the cell 41a. Similarly, the matching feature amount of the right cell 41b in the third row from the top of the detection frame 43 of the detection target image 40b is determined based on the difference value ΔH between the HOG feature amount extracted from the cell 41b and the HOG feature amount (background feature amount) of the cell 41B of the background image 40a corresponding to the cell 41b.

When the flag 51 in the detection frame 43 is installed on a road surface or the like, the flag 51 in the two images 40a and 40b may differ in contour (shape), but its position hardly changes. The HOG feature amount is a histogram of the luminance gradient directions in a local region and is invariant to translation of an object in the image. Therefore, as shown in FIG. 9A, the HOG feature amount PH1 of the cell 41b of the detection target image 40b and the HOG feature amount PH2 of the cell 41B of the background image 40a have almost the same values. Consequently, in the difference value ΔH between the HOG feature amounts of the cell 41b of the detection target image 40b and the cell 41B of the background image 40a, the gradient strengths for all gradient directions are very small, as shown in FIG. 9A. The difference values ΔH of the HOG feature amounts of the other cells in the detection frame 43 likewise have very small gradient strengths for all gradient directions.

In contrast, the person 50 is not present in the detection frame 44 of the background image 40a corresponding to the detection frame 44 of the detection target image 40b. Therefore, as shown in FIG. 9B, the HOG feature amount PH3 of the left cell 41c in the top row of the detection frame 44 of the detection target image 40b differs from the HOG feature amount PH4 of the cell 41C of the background image 40a corresponding to the cell 41c. Consequently, in the difference value ΔH between the HOG feature amounts of the cell 41c of the detection target image 40b and the cell 41C of the background image 40a, the gradient strengths for some gradient directions are large, as shown in FIG. 9B. The difference values ΔH of the HOG feature amounts of the other cells in the detection frame 44 likewise have large gradient strengths for some gradient directions.

As described above, the difference value ΔH between the HOG feature amounts of a cell of the detection target image 40b and the corresponding cell of the background image 40a differs greatly depending on whether the object present in the cell of the detection target image 40b is also present in the cell of the background image 40a. This embodiment exploits this property of the HOG feature amount: when the object present in a cell of the detection target image 40b is also present in the corresponding cell of the background image 40a, the difference value ΔH is used as the matching feature amount, and when it is not, the extracted HOG feature amount is used as the matching feature amount. Specifically, a threshold TH1 as shown in FIGS. 9A and 9B is set; for a cell whose difference value ΔH is equal to or less than the threshold TH1, the difference value ΔH is used as the matching feature amount, and for a cell whose difference value ΔH is greater than the threshold TH1, the extracted HOG feature amount is used as the matching feature amount. As a result, the matching feature amounts of the cells in the detection frame 43 of the detection target image 40b are all difference values ΔH, whereas the matching feature amounts of the cells in the detection frame 44 of the detection target image 40b are all extracted HOG feature amounts.

In other words, when the matching feature amount determination process is performed on the detection target image 40b, the processed image is substantially equivalent to an image in which the flag 51 is absent from the detection frame 43, such as the image 40c shown in FIG. 10A. Therefore, when the image 40c is collated with the dictionary data 14, the similarity S between the feature amounts in the detection frame 43 and a person (the specific object) is low, and the similarity S between the feature amounts in the detection frame 44 and a person is high. As a result, only the person 50 in the detection frame 44 is detected as the specific object in the detection target image 40b, and only the person 50 is surrounded by a frame 55 in the collation result image 40d, as shown in FIG. 10B.

As described above, in the specific object detection method of this embodiment, for each cell of the detection target image 40b whose feature amount difference value ΔH is smaller than the threshold TH1, the extracted feature amount is replaced with the difference value ΔH, which is then used as the matching feature amount. Therefore, even when the HOG feature amount of the flag 51 in the detection frame 43 of the detection target image 40b resembles the HOG feature amount of a person, the matching feature amount collated with the dictionary data differs from the HOG feature amount of a person. This prevents a non-specific object whose HOG feature amount can take values similar to those of a person (the specific object) from being erroneously detected as a person.

Next, another example of detecting a specific object by the detection process described above will be described with reference to FIGS. 11 to 13.

FIG. 11 is a schematic diagram showing another example of two input images used in the matching feature amount determination process. FIG. 12 is a schematic diagram showing an example of the feature amount difference values ΔH for the input images of FIG. 11. FIG. 13 is a schematic diagram showing the image after the matching feature amount determination process using the input images of FIG. 11 and the image of the collation result.

When the image processing apparatus 1 performs the specific object detection process, the input images in which a specific object is to be detected (detection target images) also include an image 40e in which a person 50, located closer to the imaging device than the flag 51 (a non-specific object), overlaps the flag 51, as shown in FIG. 11A. In contrast, as shown in FIG. 11B, only the flag 51, which is a non-specific object, is present in the first input image (background image) 40a.

When the collation process (step S6) is performed on the detection target image 40e, the matching feature amount determination process determines the matching feature amounts of, among others, the cells in the detection frame 45 containing the flag 51 and the person 50 in the detection target image 40e.

The matching feature amount of the left cell 41e in the top row of the detection frame 45 of the detection target image 40e is determined based on the difference value ΔH between the HOG feature amount extracted from the cell 41e and the HOG feature amount (background feature amount) of the cell 41E of the background image 40a corresponding to the cell 41e. Similarly, the matching feature amount of the center cell 41f in the third row from the top of the detection frame 45 of the detection target image 40e is determined based on the difference value ΔH between the HOG feature amount extracted from the cell 41f and the HOG feature amount (background feature amount) of the cell 41F of the background image 40a corresponding to the cell 41f.

When the flag 51 in the detection frame 45 is installed on a road surface or the like, the flag 51 in the two images 40a and 40e may differ in contour (shape), but its position hardly changes. Therefore, as shown in FIG. 12A, the HOG feature amount PH5 of the cell 41e of the detection target image 40e, in which only the flag 51 is present, and the HOG feature amount PH6 of the cell 41E of the background image 40a have almost the same values. Consequently, in the difference value ΔH between the HOG feature amounts of the cell 41e of the detection target image 40e and the cell 41E of the background image 40a, the gradient strengths for all gradient directions are smaller than the threshold TH1, as shown in FIG. 12A. The matching feature amount of the cell 41e of the detection target image 40e is therefore the difference value ΔH.

In contrast, as shown in FIGS. 11A and 11B, only the flag 51 is present in the cell 41F of the background image 40a corresponding to the cell 41f of the detection target image 40e, in which both the flag 51 and the person 50 are present. Therefore, as shown in FIG. 12B, the HOG feature amount PH7 of the cell 41f of the detection target image 40e differs from the HOG feature amount PH8 of the cell 41F of the background image 40a. Consequently, in the difference value ΔH between the HOG feature amounts of the cell 41f of the detection target image 40e and the cell 41F of the background image 40a, the gradient strengths for some gradient directions are greater than the threshold TH1, as shown in FIG. 12B. The matching feature amount of the cell 41f of the detection target image 40e is therefore the HOG feature amount extracted from the cell 41f.

In other words, when the matching feature amount determination process is performed on the detection target image 40e, the processed image is substantially equivalent to an image in which part of the flag 51 in the detection frame 45 is missing, such as the image 40f shown in FIG. 13A. As a result, the feature amounts (shape) of the image in the detection frame 45 approach the feature amounts of the person 50. Therefore, when the image 40f is collated with the dictionary data 14, the similarity S between the feature amounts in the detection frame 45 and a person is high. The person 50 in the detection frame 45 is thus detected as the specific object in the detection target image 40e, and the person 50 is surrounded by the frame 55 in the collation result image 40g, as shown in FIG. 13B.

As described above, in the specific object detection method of this embodiment, even when the specific object overlaps a non-specific object in the input image, the feature amounts of the non-specific object can be subtracted while changes to the feature amounts (contour region) of the specific object are suppressed, so that the specific object can be detected correctly.

As described above, in the specific object detection method according to the first embodiment, for each cell in which the difference value ΔH between the HOG feature amount extracted from the input image (detection target image) and the HOG feature amount extracted from the background image (background feature amount) is small, the difference value ΔH is used as the matching feature amount. This makes it possible to subtract, from the detection target image, the feature amounts of a non-specific object that is present in both the detection target image and the background image. It is therefore possible to prevent a non-specific object from being erroneously detected as the specific object when its feature amounts resemble those of the specific object.

In addition, in the specific object detection method according to the first embodiment, when the specific object overlaps a non-specific object in the input image, the feature amounts of the non-specific object can be subtracted while changes to the feature amounts (contour region) of the specific object are suppressed. The specific object can therefore be detected correctly even when the specific object and the non-specific object overlap.

Therefore, the image processing apparatus 1 that implements the specific object detection method described above can improve the accuracy of detecting the specific object.

In the first embodiment, a detection method that detects a specific object using HOG feature amounts has been described. However, the feature amount used for detecting the specific object is not limited to the HOG feature amount and may be any feature amount that is invariant, or can be regarded as invariant, to translation of an object in the image. The feature amount used for detecting the specific object may therefore be, for example, a color invariant such as hue. Other feature amounts may also be used if, for example, their variance or standard deviation is small when feature amounts extracted from a plurality of images are statistically processed, so that they can be regarded as invariant to translation.

In the matching feature amount determination process described in the first embodiment, as shown in FIG. 7, a detection frame is set and the matching feature amount is determined for each cell in the detection frame. However, the matching feature amounts need not be determined per cell within a detection frame; they may instead be determined by, for example, raster scanning the cells of the input image. Accordingly, instead of step S603 shown in FIG. 6, a process for determining the matching feature amounts may be provided between step S4 and step S6 of the process shown in FIG. 4.

The input image used as the background image is preferably an image in which no specific object such as a person is present and in which a non-specific object whose HOG feature amount can take values similar to those of the specific object is present, as shown in FIG. 5A and FIG. 8B. However, in the specific object detection method according to the first embodiment, a specific object may be present in the input image used as the background image.

When the specific object is a person, a person present in the background image is likely to be absent from input images after a certain time has elapsed. In that case, the region of the input image corresponding to the person in the background image shows a background object, whose HOG feature amount differs from that of a person. The difference value ΔH between the HOG feature amount of the background object and the HOG feature amount of the person therefore exceeds the threshold TH1, and the HOG feature amount of the background object becomes the matching feature amount. Accordingly, the specific object detection method according to the first embodiment can detect a specific object in the input image even when a specific object is present in the input image used as the background image.

As described above, in the specific object detection method according to the first embodiment, a specific object (person) may be present in the input image used as the background image, so the first input image can automatically be used as the background image when the detection process starts. Therefore, apart from the pre-learning process, there is no need to acquire an image containing no specific object (person) and register it in the background feature amount database before starting the detection process. This avoids additional work before detection starts and makes the detection process easy to carry out.

The image processing apparatus 1 shown in FIG. 1 includes a pre-learning unit 13 for creating the dictionary data 14. However, the dictionary data 14 may instead be created by another image processing apparatus and read into the image processing apparatus 1 via a portable recording medium or a network. The image processing apparatus 1 according to the first embodiment therefore need not include the pre-learning unit 13.

(Second Embodiment)
FIG. 14 is a flowchart showing the procedure of the specific object detection process according to the second embodiment of the present invention.

This embodiment describes a procedure for the specific object detection process that differs from that of the first embodiment. The detection process of this embodiment is performed by the image processing apparatus 1 shown in FIG. 1.

In the specific object detection process of this embodiment, as shown in FIG. 14, a condition for background image candidates is set first (step S10). In step S10, the frame number, imaging time, or the like of the input images to be treated as background image candidates is set. When the frame number is used as the condition, a condition such as the frame number i satisfying i = 1000 × n (n is a natural number) is set in addition to frame number 1. When the imaging time is used as the condition, a condition such as every u hours or s o'clock every day is set in addition to the start time of the detection process (T = 0).

After step S10, the image processing apparatus 1 starts acquiring images captured by the imaging device 2 and stores them in the input image holding unit 11 as input images. At this time, the image acquisition unit 10 assigns unique frame numbers (serial numbers) to the acquired input images in chronological order, starting from 1.

Once acquisition of input images has started, the variable i representing the frame number of the input image is initialized (step S2). In step S2, the variable i is set to 1.

Next, the feature amount extraction unit 12 determines whether the input image with frame number i is a background image candidate (step S11). In step S11, the condition set in step S10 is compared with the frame number i, imaging time, or the like of the input image. If the input image is a background image candidate (step S11; Yes), the feature amount extraction unit 12 then performs the background feature amount setting process (step S12).
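The frame-number form of the condition set in step S10 and checked in step S11 can be illustrated with a short sketch. The function name `is_background_candidate` and the `interval` parameter are hypothetical; the default of 1000 simply follows the example i = 1000 × n given for step S10.

```python
def is_background_candidate(frame_no, interval=1000):
    """Frame-number condition: frame 1, or any positive multiple of interval."""
    # Frame numbers are assigned from 1, so frame_no % interval == 0
    # corresponds to frame_no = interval * n for a natural number n.
    return frame_no == 1 or frame_no % interval == 0
```

A condition based on imaging time would follow the same pattern, comparing the imaging time of the input image with the configured times instead of comparing frame numbers.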

On the other hand, if the input image is not a background image candidate (step S11; No), the matching process described in the first embodiment is performed (step S6), and the matching result is output (step S7).

After the background feature amount setting process (step S12) or the process of outputting the matching result (step S7), it is determined whether to continue the detection process (step S8). If the detection process is to continue (step S8; Yes), the variable i representing the frame number is incremented by 1 (i = i + 1), and the processing from step S3 is repeated. If the detection process is not to continue (step S8; No), the detection process ends.
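The overall flow of FIG. 14 can be summarized in a short sketch. It is only an outline under stated assumptions: the function `run_detection` and its callable parameters `extract`, `is_candidate`, `set_background`, and `match` are hypothetical placeholders for the feature extraction, the step S11 check, the background feature amount setting process, and the matching process, and the end of the frame sequence stands in for the decision of step S8.

```python
def run_detection(frames, extract, is_candidate, set_background, match):
    """Process frames in order; return (frame number, matching result) pairs."""
    results = []
    # Frame numbers start at 1 (step S2) and increase by 1 per frame.
    for i, image in enumerate(frames, start=1):
        features = extract(image)                 # step S3
        if is_candidate(i):                       # step S11
            set_background(features)              # step S12
        else:
            results.append((i, match(features)))  # steps S6 and S7
    # Running out of frames plays the role of "step S8; No" here.
    return results
```

Note that a frame treated as a background image candidate is not matched against the dictionary data in this flow; only the other frames produce a matching result.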

FIG. 15 is a flowchart showing the details of the background feature amount setting process of FIG. 14.
In the background feature amount setting process performed in the detection process of this embodiment, as shown in FIG. 15, it is first determined whether a background feature amount is registered in the background feature amount database 15 (step S1201). If the input image is the first input image (frame number 1) and no background feature amount is registered (step S1201; No), the HOG feature amount extracted in step S3 is registered in the database as the background feature amount (step S1202), and the background feature amount setting process ends.

On the other hand, if the input image is not the first input image and a background feature amount is registered (step S1201; Yes), the background feature amount in the database is compared with the HOG feature amount extracted from the input image (step S1203), and it is determined whether the difference between the feature amounts satisfies a predetermined update condition (step S1204). The update condition is, for example, that at least a predetermined number of cells have a feature amount difference value ΔH larger than the threshold TH1.

If the difference between the feature amounts does not satisfy the predetermined update condition (step S1204; No), the background feature amount setting process ends without updating the background feature amount in the database. If the update condition is satisfied (step S1204; Yes), the background feature amount in the database is updated to the HOG feature amount extracted from the input image (step S1205), and the background feature amount setting process ends.
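Steps S1201 to S1205 can be sketched as a single function. This is an illustrative sketch only: the function name `set_background_features`, the use of a dictionary with a `"background"` key as the database, the parameter `min_changed_cells`, and the computation of ΔH as the sum of absolute bin differences are assumptions. The update condition follows the example given above (at least a given number of cells with ΔH larger than TH1).

```python
def set_background_features(db, feats, th1, min_changed_cells):
    """Register or update the background feature amount; return True if stored."""
    if db.get("background") is None:                 # step S1201: not registered
        db["background"] = [list(c) for c in feats]  # step S1202
        return True
    # Step S1203: compare the stored background with the new features.
    changed = 0
    for cell, bg in zip(feats, db["background"]):
        # Assumption: delta_h is the sum of absolute bin differences.
        delta_h = sum(abs(a - b) for a, b in zip(cell, bg))
        if delta_h > th1:
            changed += 1
    if changed >= min_changed_cells:                 # step S1204
        db["background"] = [list(c) for c in feats]  # step S1205
        return True
    return False
```

For example, starting from an empty `db`, the first call always registers the features, while later calls replace them only when enough cells have changed.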

The processing of steps S6 to S8 in the detection process of this embodiment may be the same as described in the first embodiment.

That is, whereas the detection process of the first embodiment continues to use the HOG feature amount extracted from the first input image (at the start of the detection process) as the background feature amount, the detection process of this embodiment updates the background feature amount as needed.

FIG. 16 is a schematic diagram showing an example of how the presence or absence of a non-specific object in the input image changes over time.
In the first embodiment, an image containing a non-specific object (flag 51) whose HOG feature amount can take values similar to the HOG feature amount of the specific object (person), as in (a) of FIG. 5 and (b) of FIG. 8, was given as an example of the background image. However, the specific object detection process using the image processing apparatus 1 may be started at any time. The first input image (frame number 1) may therefore be an image without the flag 51, such as the image 40h shown in FIG. 16. In FIG. 16, time t0 is the start time of the detection process, and the time axis runs from top to bottom.

In this case, the image processing apparatus 1 performs the matching process of step S6 using, as the background feature amount, the HOG feature amount extracted from the image 40h, which does not contain the flag 51.

If the flag 51 is then installed in the imaging area between times t1 and t2, the image 40k containing the flag 51 is input to the image processing apparatus 1, as shown in FIG. 16. In the detection process described in the first embodiment, the image 40k is a detection target image: after its HOG feature amount is extracted, the matching feature amount of the image 40k is determined based on the difference value from the background feature amount. Because that background feature amount was extracted from the image 40h, which does not contain the flag 51, the detection process described in the first embodiment may erroneously detect the flag 51 in the image 40k as a specific object (person).

In contrast, the detection method of this embodiment updates the background feature amount as needed, based on the condition set in step S10 and the update condition used in step S12. Therefore, from time t2 onward, as soon as an image containing the flag 51, such as the image 40k, becomes a background image candidate, the background image is changed from the image 40h to the image 40k, and the HOG feature amount extracted from the image 40k becomes the background feature amount. This prevents a situation in which the matching process with the image 40k as the input image detects the flag 51 as a specific object and the frame 57 surrounding the flag 51 is displayed on the display device 3.

Although a detailed description is omitted, if the flag 51 is removed as in the image 40h, or another non-specific object is installed, while the detection process is using the HOG feature amount extracted from the image 40k as the background feature amount, the background feature amount is updated at a predetermined timing.

As described above, the specific object detection method according to the second embodiment updates the background feature amount as needed according to predetermined conditions. Even when the presence or absence of a non-specific object in the input image changes over time, erroneous detection of the non-specific object can be prevented and high detection accuracy can be maintained. For example, when observing a place such as a shopping street, where the presence or absence of non-specific objects changes with store business hours and the like, the detection is less affected by the presence or absence of non-specific objects and stable detection accuracy can be maintained.

In the second embodiment, the feature amount extraction unit 12 of the image processing apparatus 1 shown in FIG. 1 performs the background feature amount setting process. However, the image processing apparatus that performs the detection process of this embodiment is not limited to this configuration and may be an apparatus including a processing unit that performs the background feature amount setting process. Alternatively, the image processing apparatus may, for example, have the feature amount extraction unit 12 register the HOG feature amount extracted from the first input image (frame number 1) in the background feature amount database 15, and have a processing unit other than the feature amount extraction unit 12 perform the subsequent updates of the background feature amount.

The following appendices are further disclosed with respect to the embodiments, including the examples, described above.
(Appendix 1)
An image processing apparatus comprising:
an image acquisition unit that acquires an image captured by a camera;
a feature amount extraction unit that divides the image into a plurality of cells and extracts a feature amount of each cell;
a storage unit that stores a background feature amount extracted from an image designated as a background image among the acquired images, and dictionary data including a feature amount of a specific object to be detected from the image;
a matching feature amount determination unit that determines a matching feature amount of each cell of the image to be matched against the dictionary data, based on a difference value between the feature amount extracted from each cell of the image and the background feature amount; and
a feature amount matching unit that matches the matching feature amount of each cell of the image against the feature amount of the dictionary data.
(Appendix 2)
The image processing apparatus according to appendix 1, wherein the matching feature amount determination unit sets the matching feature amount of a cell of the image to the difference value when the difference value between the feature amount extracted from the cell and the background feature amount is equal to or less than a predetermined threshold, and sets the matching feature amount of a cell to the feature amount extracted from the cell when the difference value is larger than the threshold.
(Appendix 3)
The image processing apparatus according to appendix 1, further comprising a background feature amount updating unit that updates the background feature amount according to a predetermined condition.
(Appendix 4)
The image processing apparatus according to appendix 1, wherein the feature amount extraction unit extracts a feature amount that is invariant to translation within the image.
(Appendix 5)
The image processing apparatus according to appendix 1, wherein the dictionary data includes a Histograms of Oriented Gradients (HOG) feature amount of the specific object, and the feature amount extraction unit extracts the HOG feature amount.
(Appendix 6)
The image processing apparatus according to appendix 1, wherein the feature amount matching unit matches the matching feature amounts within a detection frame including a plurality of cells against the feature amount of the dictionary data, and determines that the specific object is present in the detection frame when the similarity between the matching feature amounts in the detection frame and the feature amount of the specific object in the dictionary data is high.
(Appendix 7)
The image processing apparatus according to appendix 6, further comprising an output processing unit that, based on the position and size of the detection frame determined to contain the specific object, combines a frame surrounding the specific object with the image and outputs the result.
(Appendix 8)
An image processing method for detecting a specific object from an image captured by a camera by matching the image against dictionary data including a feature amount of the specific object to be detected from the image, the method comprising:
a first process of dividing the image into a plurality of cells and extracting a feature amount of each cell;
a second process of registering, in a database, a background feature amount extracted from an image designated as a background image among the images;
a third process of determining a matching feature amount of each cell of the image to be matched against the dictionary data, based on a difference value between the feature amount extracted from each cell of the image and the background feature amount; and
a fourth process of matching the matching feature amounts of the image against the feature amount of the dictionary data.
(Appendix 9)
An image processing method, wherein the third process sets the matching feature amount of a cell of the image to be matched against the dictionary data to the difference value when the difference value between the feature amount extracted from the cell and the background feature amount is equal to or less than a predetermined threshold, and sets the matching feature amount of a cell to the feature amount extracted from the cell when the difference value is larger than the threshold.
(Appendix 10)
The image processing method according to appendix 8, wherein the second process designates, as the background image, the image acquired first when the specific object detection process is started.
(Appendix 11)
The image processing method according to appendix 8, wherein the second process includes a process of designating a background image candidate according to a predetermined condition and, when the difference between the feature amount extracted from the background image candidate and the background feature amount registered in the database satisfies a predetermined condition, updating the background feature amount to the feature amount extracted from the background image candidate.
(Appendix 12)
The image processing method according to appendix 11, wherein the second process designates the background image candidate each time a predetermined period elapses after the specific object detection process is started.
(Appendix 13)
The image processing method according to appendix 11, wherein the second process designates the background image candidate each time a predetermined time arrives.
(Appendix 14)
The image processing method according to appendix 8, wherein the first process extracts a feature amount that is invariant to translation within the image.
(Appendix 15)
The image processing method according to appendix 8, wherein the first process extracts a Histograms of Oriented Gradients (HOG) feature amount.
(Appendix 16)
A program that causes a computer to execute a process comprising:
registering in advance dictionary data including a feature amount of a specific object to be detected from an image captured by a camera;
dividing the image into a plurality of cells and extracting a feature amount of each cell;
registering, in a database, a background feature amount extracted from an image designated as a background image among the images;
setting the matching feature amount of each cell of the image to be matched against the dictionary data to either the feature amount extracted from the cell or the difference value between the extracted feature amount and the background feature amount; and
matching the matching feature amounts of the image against the feature amount of the dictionary data.

Description of Symbols
1 Image processing apparatus
10 Image acquisition unit
11 Input image holding unit
12 Feature amount extraction unit
13 Pre-learning unit
14 Dictionary data
15 Background feature amount database
16 Matching feature amount determination unit
17 Feature amount matching unit
18 Output processing unit
2 Imaging device
3 Display device
40, 40a to 40k Image
41, 41a to 41f, 41A to 41F Cell
50 Person
51 Flag

Claims

An image processing apparatus comprising:
an image acquisition unit that acquires an image captured by a camera;
a feature amount extraction unit that divides the image into a plurality of cells and extracts a feature amount of each cell;
a storage unit that stores a background feature amount extracted from an image designated as a background image among the acquired images, and dictionary data including a feature amount of a specific object to be detected from the image;
a matching feature amount determination unit that determines a matching feature amount of each cell of the image to be matched against the dictionary data, based on a difference value between the feature amount extracted from each cell of the image and the background feature amount; and
a feature amount matching unit that matches the matching feature amount of each cell of the image against the feature amount of the dictionary data.

The image processing apparatus according to claim 1, wherein the matching feature amount determination unit sets the matching feature amount of a cell of the image to the difference value when the difference value between the feature amount extracted from the cell and the background feature amount is equal to or less than a predetermined threshold, and sets the matching feature amount of a cell to the feature amount extracted from the cell when the difference value is larger than the threshold.

The image processing apparatus according to claim 1, wherein the feature amount extraction unit extracts a feature amount that is invariant to translation within the image.

The image processing apparatus according to claim 1, wherein the dictionary data includes a Histograms of Oriented Gradients (HOG) feature amount of the specific object, and the feature amount extraction unit extracts the HOG feature amount.

An image processing method for detecting a specific object from an image captured by a camera by matching the image against dictionary data including a feature amount of the specific object to be detected from the image, the method comprising:
a first process of dividing the image into a plurality of cells and extracting a feature amount of each cell;
a second process of registering, in a database, a background feature amount extracted from an image designated as a background image among the images;
a third process of determining a matching feature amount of each cell of the image to be matched against the dictionary data, based on a difference value between the feature amount extracted from each cell of the image and the background feature amount; and
a fourth process of matching the matching feature amounts of the image against the feature amount of the dictionary data.

The image processing method according to claim 5, wherein the third process sets the matching feature amount of a cell of the image to the difference value when the difference value between the feature amount extracted from the cell and the background feature amount is equal to or less than a predetermined threshold, and sets the matching feature amount of a cell to the feature amount extracted from the cell when the difference value is larger than the threshold.

The image processing method according to claim 5, wherein the second process includes a process of designating a background image candidate according to a predetermined condition and, when the difference between the feature amount extracted from the background image candidate and the background feature amount registered in the database satisfies a predetermined condition, updating the background feature amount to the feature amount extracted from the background image candidate.

A program that causes a computer to execute a process comprising:
registering in advance dictionary data including a feature amount of a specific object to be detected from an image captured by a camera;
dividing the image into a plurality of cells and extracting a feature amount of each cell;
registering, in a database, a background feature amount extracted from an image designated as a background image among the images;
setting the matching feature amount of each cell of the image to be matched against the dictionary data to either the feature amount extracted from the cell or the difference value between the extracted feature amount and the background feature amount; and
matching the matching feature amounts of the image against the feature amount of the dictionary data.