JP2011210054A - Object detection device and learning device for the same

Info

Publication number: JP2011210054A
Application number: JP2010077798A
Authority: JP
Inventors: Takaharu Kurokawa; 高晴黒川
Original assignee: Secom Co Ltd
Current assignee: Secom Co Ltd
Priority date: 2010-03-30
Filing date: 2010-03-30
Publication date: 2011-10-20
Anticipated expiration: 2030-03-30
Also published as: JP5290227B2

Abstract

PROBLEM TO BE SOLVED: In an object detection device that detects an object from the result of voting by a plurality of feature points appearing in an image of the object, high-accuracy detection is difficult because the votes are unequal owing to differences in how readily the feature points vary in position. SOLUTION: A detection storage part 12 stores, as feature point information 120, information about a feature point distribution in an object image and the relative position of a reference point from a representative position of the feature point distribution. A vote part 141 obtains, for each position in the input image, a voting value for a feature point according to the feature point distribution moved so that its representative position coincides with the relative reference point, which is located at the relative position from the detection position of the feature point. An object decision part 142 determines the position of the reference point in the input image based on a point at which the total of the voting values of the respective feature points exceeds a prescribed detection threshold and is locally maximal, and determines that the object image is present at that position.

Description

The present invention relates to an object detection device that detects an object appearing in an input image, and to a learning device used to train it.

In recent years, there has been active research on detecting people, faces, and the like in images from surveillance cameras and digital still cameras. Detection uses a search-based approach built on a pattern matching device or a classifier: windows are set at various locations in the image, each window image is input to the pattern matching device or classifier, the detection results they output are aggregated, and an object is detected at positions where the aggregate value is high.

An object in an image is not always captured in its entirety; part of it may be hidden by another object. To detect a partially occluded object, the conventional approach is to detect a plurality of feature points of the object and to make an integrated decision from the detection results of those feature points.

For example, in the prior art described in Patent Document 1, a plurality of feature points and, for each, the position of the reference point of the sample image relative to the feature point (the relative position) are learned in advance from a sample image of the object. A vote is cast at the relative position as seen from each feature point detected in the input image, and the object is determined to be present when the vote total exceeds a threshold in the input image. In other words, when a plurality of feature points are detected in the same positional relationship as in the sample image, the votes from those feature points converge on one location in the input image, the vote total exceeds the threshold, and the object is detected.

Patent Document 1: JP-A-9-21610

Some feature points vary readily in position relative to the reference point, and others do not. For example, when the object is a person, the head has a small range of motion, so the relative positions of feature points around the head vary comparatively little, whereas the legs have a large range of motion, so the relative positions of feature points around the legs vary widely.

This means that votes from feature points with small variation tend to converge on one location, whereas votes from feature points with large variation do not. Consequently, voting with all feature points treated uniformly makes missed detections more likely.

The present invention was made to solve the above problem. Its object is to provide an object detection device that corrects the disparity in voting caused by differences in how readily the relative positions of feature points and the reference point vary, and can therefore detect an object with high accuracy, and to provide a learning device used to construct that object detection device.

An object detection device according to the present invention detects an object appearing in an input image and comprises: a storage unit that stores, for each of a plurality of preset feature points having image features characteristic of a target object image obtained by photographing the object, feature point information including the relative position between a predetermined reference point in the target object image and the feature point, and the degree of variation of that relative position; a feature point detection unit that detects the feature points in the input image; a voting unit that, for each feature point detected by the feature point detection unit, calculates voting values with a distance characteristic that depends on the degree of variation, centered on the point located at the relative position from the position of that feature point in the input image; and an object determination unit that totals the voting values at each position in the input image to determine whether the object is present.

In another object detection device according to the present invention, the distance characteristic of the voting unit widens the distance range of voting positions from the center as the degree of variation increases.

In yet another object detection device according to the present invention, the distance characteristic of the voting unit attenuates the voting value more gradually with distance from the center as the degree of variation increases.

A learning device according to the present invention generates the feature point information used by the above object detection device and comprises: a sample image storage unit that stores a plurality of sample images in which the object is captured; a sample feature point extraction unit that extracts sample feature points having predetermined image features from each sample image; a clustering unit that generates clusters of sample feature points whose positions and image features are similar across the sample images; and a feature point information generation unit that, for each cluster, obtains by statistical analysis information on the sample distribution (the distribution of the positions of the sample feature points), further obtains the sample relative position (the position of a predetermined reference point in the sample image relative to a predetermined representative position of the sample distribution), and generates the feature point information using the per-cluster sample distribution information and sample relative position as the feature point distribution information and the relative position.

The object detection device according to the present invention corrects the disparity in voting caused by differences in how readily feature points vary, so that an object can be detected with high accuracy. The learning device according to the present invention makes it possible to construct that object detection device.

A block diagram showing the schematic configuration of the object detection device according to an embodiment of the present invention.
A schematic diagram showing an example of an object sample image.
A schematic diagram that depicts, for example feature points, some of the parameters of the feature point information on a two-dimensional region corresponding to the object sample image.
A schematic diagram showing, in table form, the parameters that make up the feature point information.
A schematic diagram showing an example of feature points detected in an input image.
A schematic diagram illustrating voting for a plurality of feature points detected on the same object.
A diagram for explaining the object determination process.
A flowchart showing the general operation of the object detection device according to the embodiment of the present invention.
A general flowchart of the feature point detection process and the voting process.
A general flowchart of the object determination process.
A block diagram showing the schematic configuration of the learning device according to the embodiment of the present invention.
A schematic diagram showing the relationship between sample points and clusters.
A flowchart showing the general operation of the learning device according to the embodiment of the present invention.

The object detection device 1 and the learning device 2, which are embodiments of the present invention (hereinafter, "the embodiment"), are described below with reference to the drawings. The object detection device 1 takes as its input image, for example, a monitoring image obtained from a monitored space, and detects an object appearing in that input image. In this embodiment the object is a person: the device detects an intruder by detecting feature points of a person in the monitoring image obtained from the monitored space, and outputs an abnormality signal when an intruder is detected. The learning device 2 generates, by learning, the feature point information used by the object detection device 1.

[Object detection device]
FIG. 1 is a block diagram showing the schematic configuration of the object detection device 1 according to the embodiment. The object detection device 1 comprises an imaging unit 10, an image acquisition unit 11, a detection storage unit 12, a feature point information setting unit 13, a detection control unit 14, and a detection output unit 15. The image acquisition unit 11 is connected to the imaging unit 10, and the image acquisition unit 11, the detection storage unit 12, the feature point information setting unit 13, and the detection output unit 15 are connected to the detection control unit 14.

The imaging unit 10 is a surveillance camera installed in the monitored space, for example on the ceiling so that it looks down on the space. The camera captures the monitored space at a predetermined time interval (for example, 1 second) and sequentially outputs monitoring images in which each pixel is represented by a multi-level pixel value.

The image acquisition unit 11 is an interface circuit that acquires the monitoring image captured by the imaging unit 10 and passes it to the detection control unit 14. Hereinafter, an image input from the image acquisition unit 11 to the detection control unit 14 is referred to as an input image.

The detection storage unit 12 is a storage device such as a ROM (Read Only Memory), a RAM (Random Access Memory), or a hard disk, and stores the programs and data used by the detection control unit 14. The detection storage unit 12 exchanges these programs and data with the detection control unit 14. The data stored in the detection storage unit 12 include feature point information 120 and a voting image 121.

The feature point information 120 comprises, for each of a plurality of feature points having image features of the object: a feature point number identifying the feature point; the relative position between a reference point position in the object image, such as the centroid, and the feature point position; a detection criterion for detecting the feature point in the input image; and the degree of variation of the relative position.

The feature point information 120 is generated in advance by the learning device 2, described later, from a large number of object sample images in which the object is captured.

FIG. 2 is a schematic diagram showing an example of an object sample image. In this embodiment the detection target is a person, and each object sample image is a full-body image of a person. Each object sample image is normalized to a vertically long rectangle of 64 pixels in the width (horizontal) direction by 128 pixels in the height (vertical) direction, matching the shape of a person, and its centroid coordinates (32, 64) are defined as the object reference point B.

FIG. 3 is a schematic diagram that depicts, for example feature points, some of the parameters of the feature point information 120 on a two-dimensional region corresponding to the object sample image; FIGS. 3(a) to 3(c) correspond to different feature points. FIG. 4 is a schematic diagram showing, in table form, the parameters that make up the feature point information 120. Each parameter of the feature point information 120 is described with reference to FIGS. 3 and 4. The number of feature points set is M (> 1), and the feature points are numbered consecutively with feature point numbers (feature point #) 1 to M. The M feature points are at different positions (marked "×" in FIG. 3), and for feature points #1 to #M the vectors R_1 to R_M from each position to the object reference point B are stored in the feature point information 120 as the relative positions of the feature points. The feature point information 120 also stores, as the detection criteria for the M feature points, N-dimensional vectors A_1 to A_M representing the feature values of the object sample images at the positions of feature points #1 to #M.
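As an illustration of the record kept for each feature point, the following is a minimal sketch, not the device's actual storage layout. The class name, field names, and example values are hypothetical; only the four kinds of parameter (feature point number, relative position R_j, detection criterion A_j, degree of variation V_j) and the reference point (32, 64) come from the text.

```python
from dataclasses import dataclass
from typing import Tuple

# Object reference point B of the normalized 64x128 sample image (from the text).
REFERENCE_POINT_B = (32.0, 64.0)


@dataclass(frozen=True)
class FeaturePointInfo:
    """One row of feature point information 120 (field names are hypothetical)."""
    number: int                             # feature point number, 1..M
    relative_position: Tuple[float, float]  # R_j: vector from the feature point to B
    criterion: Tuple[float, ...]            # A_j: N-dimensional feature vector
    variation: Tuple[float, float]          # V_j = (vx, vy): variances in x and y


def relative_position(feature_xy, reference_xy=REFERENCE_POINT_B):
    """Return the vector from a feature point position to the reference point."""
    return (reference_xy[0] - feature_xy[0], reference_xy[1] - feature_xy[1])


# Illustrative values only: a feature point near the head at (32, 10).
head = FeaturePointInfo(
    number=1,
    relative_position=relative_position((32.0, 10.0)),
    criterion=(0.1, 0.4, 0.2),
    variation=(4.0, 9.0),
)
```

Storing R_j as "feature point to reference point" means a detected position plus R_j points directly at the candidate reference point.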

The feature value is, for example, the well-known Shape Context or Histograms of Oriented Gradients (HOG; Navneet Dalal and Bill Triggs, "Histograms of Oriented Gradients for Human Detection", In Proceedings of IEEE Conference Computer Vision and Pattern Recognition 2005).

The shape context is a feature value representing the distribution of edges around a feature point, and is expressed as a vector. It is calculated by analyzing the image within an analysis window centered on the feature point. The index of each vector element corresponds to a combination of one of the subregions into which the analysis window is divided and a quantized edge direction, and the value of each element corresponds to the sum of the strengths of the edges that have the indexed edge direction within the indexed subregion. HOG is likewise a vector quantity, representing the distribution of luminance derivative values around the feature point. Both the shape context and HOG represent the distribution of luminance gradients around the feature point and are robust to illumination changes, which makes them suitable for object detection.

The degree of variation is also a quantity that can differ from one feature point to another, and the degrees of variation V_1 to V_M of feature points #1 to #M are stored in the feature point information 120. In this embodiment, the degree of variation is the variance, over the plurality of sample images used in learning, of the relative position between the feature point position and the reference point position. It has a component vx in the x (horizontal) direction and a component vy in the y (vertical) direction. In FIG. 3, the solid ellipse centered on each feature point represents the range corresponding to a confidence interval at a predetermined confidence level, and is obtained from the variance. The square roots of vx and vy (that is, the standard deviations) marked against the x-direction and y-direction radii of the ellipse indicate that the ratio of the radii is given by the ratio of the standard deviations in the x and y directions.
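The per-axis variance described here can be sketched as follows. This is an illustrative calculation under stated assumptions: the function name is hypothetical, the population variance is used because the text says only "variance", and taking the mean as the representative relative position is likewise an assumption.

```python
from statistics import fmean, pvariance


def variation_degree(relative_positions):
    """Return ((mean_x, mean_y), (vx, vy)) for a list of (x, y) relative positions.

    Assumption: population variance; the source does not state which
    variance estimator the learning device uses.
    """
    xs = [float(r[0]) for r in relative_positions]
    ys = [float(r[1]) for r in relative_positions]
    return (fmean(xs), fmean(ys)), (pvariance(xs), pvariance(ys))


# Made-up relative positions of one feature point in three sample images.
samples = [(0, 50), (2, 54), (-2, 58)]
mean_r, (vx, vy) = variation_degree(samples)
```

A feature point on a limb with a wide range of motion would show larger vx and vy here than one on the head.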

Alternatively, the distribution of the relative positions between the feature point position and the reference point position may be analyzed to find the principal axis of the distribution, and the degree of variation may be expressed along two axes: the principal axis and the axis orthogonal to it.

Variation in the relative position between the feature point position and the reference point position arises from changes in posture, individual differences in body proportions, and the like. Feature point #1 shown in FIG. 3(a) is a feature point near the head, and feature point #M shown in FIG. 3(c) is a feature point near the legs. Because the legs have a larger range of motion than the head, the degree of variation of feature point #M is larger than that of feature point #1 (vx_1 < vx_M, vy_1 < vy_M).

The voting image 121 is information used for object determination. It is an image showing the result of voting, for each of the feature points detected in the input image, at the relative position on a frame of the same size as the input image, taking into account the degree of variation of each feature point. Because object detection is performed at a plurality of magnifications (detection magnifications), as described later, a voting image 121 is prepared for each detection magnification.

The feature point information setting unit 13 comprises interface circuits for inputting the feature point information 120 from outside, such as a USB port, a CD drive, or a network adapter, together with their driver programs, and a program that stores the input feature point information 120 in the detection storage unit 12. The feature point information 212 generated by the learning device 2 is input through the feature point information setting unit 13 and stored in the detection storage unit 12 as the feature point information 120.

The detection control unit 14 is built from a processing device such as a DSP (Digital Signal Processor) or an MCU (Micro Control Unit). The detection control unit 14 processes the input image from the image acquisition unit 11 to determine whether a person is present, and outputs an abnormality signal to the detection output unit 15 when a person is detected. Specifically, the detection control unit 14 reads a program from the detection storage unit 12 and executes it, thereby functioning as the feature point detection unit 140, the voting unit 141, the object determination unit 142, and the abnormality determination unit 143 described later.

The feature point detection unit 140 detects each feature point in the input image and outputs to the voting unit 141 feature point detection information that associates the feature point number of the detected feature point, the position in the input image at which it was detected (the detection position), the degree of detection of the feature point, and the detection magnification at which it was detected.

The feature point detection unit 140 sets an analysis window centered on each position in the input image, extracts the feature value within the analysis window, and compares it with the detection criterion of each feature point to calculate a degree of detection. If the calculated degree of detection exceeds a preset feature point detection threshold Tp, the feature point is detected at that position; otherwise it is not. In this embodiment the feature value is the shape context described above, but HOG can also be used. The feature value must be of the same type as the one used when the detection criteria were learned by the learning device 2, described later.

In this embodiment, feature values are stored as the detection criteria, and the feature point detection unit 140 detects feature points by pattern matching. That is, it calculates the Euclidean distance d between the extracted feature value and the feature value of the detection criterion as the degree of detection, and detects the feature point if d is less than or equal to Tp.
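The pattern-matching test can be sketched as below. This is a minimal illustration, not the device's implementation: the function name and the example vectors are hypothetical, and only the rule "detect when the Euclidean distance d is at most Tp" is taken from the text.

```python
import math


def detect_feature_point(extracted, criterion, tp):
    """Return (detected, d), where d is the Euclidean distance between the
    extracted feature vector and the stored detection criterion."""
    if len(extracted) != len(criterion):
        raise ValueError("feature vectors must have the same dimension")
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(extracted, criterion)))
    return d <= tp, d


# Illustrative 3-dimensional vectors; real shape-context vectors are longer.
detected, d = detect_feature_point((1.0, 2.0, 2.0), (0.0, 0.0, 0.0), tp=3.0)
```

Because d is a distance, a smaller value means a better match, which is why the comparison is "less than or equal to" here.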

In another embodiment, in which a discriminant function is stored as the detection criterion, the feature point detection unit 140 inputs the extracted feature value to the discriminant function, takes the likelihood it outputs as the degree of detection, and detects the feature point if the likelihood is greater than Tp. In other words, the feature point detection unit 140 is configured to operate as a classifier.

Because the size of the object captured in the input image varies, the feature point detection unit 140 adjusts the detection magnification when detecting feature points in order to accommodate the range of object sizes. The detection magnification α is the ratio of the size of the object captured in the input image to the size of the object captured in the object sample image. Specifically, to match the size of the object in the input image to the size of the object sample image, the input image is enlarged or reduced according to a preset set of detection magnifications. After this enlargement or reduction, the input image is 1/α of its original size. The detection magnification α is set, for example, to seven levels: (1.05)^3, (1.05)^2, 1.05, 1.0, 1/1.05, 1/(1.05)^2, and 1/(1.05)^3. The enlargement or reduction can be performed by a well-known method such as bilinear interpolation.
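The seven example magnifications and the resulting image sizes can be computed as in this short sketch. The function names and the 640x480 input size are hypothetical, rounding to the nearest pixel is an assumption, and the interpolation itself is not shown.

```python
def detection_magnifications(base=1.05, steps=3):
    """Return base**steps, ..., base**1, 1.0, base**-1, ..., base**-steps."""
    return [base ** k for k in range(steps, -steps - 1, -1)]


def scaled_size(width, height, alpha):
    """Size of the input image after scaling it to 1/alpha of its original size.

    Assumption: dimensions are rounded to the nearest pixel.
    """
    return (max(1, round(width / alpha)), max(1, round(height / alpha)))


levels = detection_magnifications()
sizes = [scaled_size(640, 480, alpha) for alpha in levels]
```

Feature point detection would then be run once on each scaled image, with a separate voting image kept for each magnification.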

The voting unit 141 obtains voting values for each feature point detected in the input image. Specifically, referring to the feature point information 120 and the feature point detection information from the feature point detection unit 140, the voting unit 141 calculates, for each feature point detected by the feature point detection unit 140, the relative reference point Q obtained by shifting the detection position P of the feature point by its relative position R, and sets voting values centered on Q with a distance characteristic that depends on the degree of variation of the feature point. It then outputs the relationship between the voting positions and voting values thus set to the object determination unit 142. With α the detection magnification at the time of detection, the relative reference point Q of feature point #j is calculated by the following equation.
Q = (P + R_j) / α   (1)
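Equation (1) can be evaluated directly, as in the following sketch. The function name and the numeric values are hypothetical; the arithmetic follows equation (1) exactly as written in the text.

```python
def relative_reference_point(p, r_j, alpha):
    """Evaluate equation (1): Q = (P + R_j) / alpha, component by component."""
    if alpha <= 0:
        raise ValueError("detection magnification must be positive")
    return ((p[0] + r_j[0]) / alpha, (p[1] + r_j[1]) / alpha)


# Illustrative values: detection position P, relative position R_j.
q_unit = relative_reference_point((100.0, 200.0), (0.0, 54.0), alpha=1.0)
q_double = relative_reference_point((100.0, 200.0), (0.0, 54.0), alpha=2.0)
```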

If a detected feature point is a true detection, its relative reference point Q indicates the position of the object reference point B. That is, ideally, the relative reference points Q of feature points detected on the same object coincide. In practice, because of its variation, each detected feature point is displaced from the representative position of the feature point that was taken as the start point of the relative position vector R, and the relative reference points Q of feature points detected on the same object are separated by a corresponding distance.

When the detection position P of feature point #j is written (p_x, p_y) and the relative reference point Q is written (q_x, q_y), the voting value f(x, y) of each pixel (x, y) in the input image can be defined by equation (2).
[Equation (2) is not reproduced in this text.]

The f(x, y) given by equation (2) is a two-dimensional normal distribution whose mean is the relative reference point Q and whose variance is the degree of variation V_j. The voting value defined by equation (2) has a distance attenuation characteristic: it takes its maximum at the relative reference point Q and decreases with distance from Q. The larger the degree of variation V_j, the more gradual the attenuation; the smaller V_j, the steeper the attenuation. In other words, the voting unit 141 reduces the voting value according to the distance from the relative reference point, so the closer the relative reference point is to the object reference point B, the higher the voting value set at B.
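Since the body of equation (2) is not available in this text, the following is only a sketch of a voting function consistent with the description: a two-dimensional normal distribution with mean Q and variance V_j = (vx, vy), scaled by a weight w. The axis-aligned form, the normalization constant, and all function names are assumptions, not the patent's equation.

```python
import math


def vote_value(x, y, q, v, w=1.0):
    """Assumed form: w times an axis-aligned 2-D normal density.

    q = (qx, qy) is the relative reference point; v = (vx, vy) are variances.
    """
    qx, qy = q
    vx, vy = v
    norm = 1.0 / (2.0 * math.pi * math.sqrt(vx * vy))
    exponent = -0.5 * ((x - qx) ** 2 / vx + (y - qy) ** 2 / vy)
    return w * norm * math.exp(exponent)


def cast_votes(voting_image, q, v, w=1.0):
    """Add one feature point's votes to a voting image stored as rows of floats."""
    for y, row in enumerate(voting_image):
        for x in range(len(row)):
            row[x] += vote_value(x, y, q, v, w)


# A tiny 5x5 voting image with one vote centered on pixel (2, 2).
image = [[0.0] * 5 for _ in range(5)]
cast_votes(image, q=(2.0, 2.0), v=(1.0, 1.0))
```

With this form, a feature point with a large variance spreads a lower, wider vote, while one with a small variance casts a tall, narrow vote, so both contribute comparable total mass near the true reference point.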

w is a function of the detection degree and is set so that it becomes larger the further the detection degree exceeds the feature point detection threshold Tp. The vote value f(x, y) is thus weighted more heavily the further the detection degree exceeds Tp; that is, feature points detected with higher reliability receive higher vote values. For example, for a Euclidean distance d, w is set to w = exp(−kd), where k is a preset positive constant. In another embodiment in which the detection degree is a likelihood L, w can be set to w = L.
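A minimal Python sketch of a weighted vote with the properties described above. The normalized isotropic Gaussian form, the function names, and the default value of `k` are assumptions for illustration, not the patent's exact equation (2).

```python
import math


def detection_weight(d, k=1.0):
    """Weight w = exp(-k * d) for a Euclidean distance d (k is a preset positive constant)."""
    return math.exp(-k * d)


def gaussian_vote(x, y, q, variance, w=1.0):
    """Vote value at pixel (x, y) for a feature point whose relative reference
    point is q = (q_x, q_y) and whose variation degree is `variance`.

    Assumed form: an isotropic 2-D normal density scaled by the weight w.
    """
    squared_distance = (x - q[0]) ** 2 + (y - q[1]) ** 2
    density = math.exp(-squared_distance / (2.0 * variance)) / (2.0 * math.pi * variance)
    return w * density
```

For instance, `gaussian_vote(8, 5, (5, 5), 9.0)` retains a larger fraction of its peak value than `gaussian_vote(8, 5, (5, 5), 1.0)` does, which is the gradual-versus-steep attenuation the text describes.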

The vote value function f can also be defined by something other than equation (2). For example, it can be the square-pyramid function given by equation (3) or the cone function given by equation (4).
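Equations (3) and (4) are not reproduced in this text, so the following is only a sketch of what pyramid-shaped and cone-shaped vote functions could look like. The linear fall-off, the use of the Chebyshev distance for the pyramid, and the `extent` factor that ties the support radius to the square root of the variation degree are all assumptions.

```python
import math


def pyramid_vote(x, y, q, variance, w=1.0, extent=3.0):
    """Square-pyramid vote: w at q, falling linearly to zero at a
    Chebyshev distance of extent * sqrt(variance) from q."""
    radius = extent * math.sqrt(variance)
    distance = max(abs(x - q[0]), abs(y - q[1]))
    return w * max(0.0, 1.0 - distance / radius)


def cone_vote(x, y, q, variance, w=1.0, extent=3.0):
    """Cone vote: w at q, falling linearly to zero at a
    Euclidean distance of extent * sqrt(variance) from q."""
    radius = extent * math.sqrt(variance)
    distance = math.hypot(x - q[0], y - q[1])
    return w * max(0.0, 1.0 - distance / radius)
```

Both functions are exactly zero outside a support whose size grows with the variation degree, which is the range-limiting behavior attributed to equations (3) and (4).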

With the functions of equations (3) and (4), in addition to the distance attenuation characteristic described above, the distance range over which vote values are set is limited to an extent that depends on the variation degree. The voting function of equation (2) can likewise be limited, for example, to the distance range that satisfies equation (5). Equation (5) represents the 3σ confidence interval of a normal distribution. Limiting the range in this way sets vote values only where the reference point of the true object is likely to exist, which improves the reliability of object detection.

また、さらに別の実施形態として、（５）式の範囲にｆ（ｘ，ｙ）＝ｗの投票値を設定する構成としてもよい。 As still another embodiment, a configuration may be adopted in which a voting value of f (x, y) = w is set in the range of equation (5).
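The range limit and the constant-vote variant can be sketched as follows. The sketch assumes that equation (5) amounts to "squared distance from Q is at most (3σ)²" with σ² equal to the variation degree; the function names are hypothetical.

```python
import math


def within_three_sigma(x, y, q, variance):
    """True if (x, y) lies within 3 sigma of q, taking sigma**2 = variance."""
    return (x - q[0]) ** 2 + (y - q[1]) ** 2 <= 9.0 * variance


def truncated_gaussian_vote(x, y, q, variance, w=1.0):
    """Gaussian vote that is set only inside the 3-sigma range."""
    if not within_three_sigma(x, y, q, variance):
        return 0.0
    squared_distance = (x - q[0]) ** 2 + (y - q[1]) ** 2
    return w * math.exp(-squared_distance / (2.0 * variance)) / (2.0 * math.pi * variance)


def uniform_vote(x, y, q, variance, w=1.0):
    """Variant that sets the constant value f(x, y) = w inside the 3-sigma range."""
    return w if within_three_sigma(x, y, q, variance) else 0.0
```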

FIG. 5 is a schematic diagram showing an example of feature points detected in an input image 400. In FIG. 5, the × marks indicate the positions of detected feature points. For example, feature points 401, 402, and 403 are detected at the shoulder, left foot, and right foot of the same person.

FIG. 6 is a schematic diagram illustrating voting for the feature points 401 to 403 detected on the same object. The true position of the center of gravity of the object captured in the input image 400 is assumed to be (x0, y0). The image 430 shown in FIG. 6 is a part of the input image 400. Graph 450 shows the voting according to this embodiment at positions along the straight line y = y0 in the image 430; the horizontal axis is position and the vertical axis is the vote value f(x, y). Regions 411 to 413 are the voting ranges set around the relative reference points 421 to 423 for the feature points 401 to 403, respectively. Graphs 451 to 453 show the vote values within those voting ranges for the feature points 401 to 403, respectively.

The difference in position between the center of gravity (x0, y0), which is the object reference point B, and each of the relative reference points 421 to 423 corresponds to the deviation of the detected position of the respective feature point. At (x0, y0), vote values are set not only for the feature points 401 and 402, which were detected with relatively small deviations in the x direction, but also for the feature point 403, which was detected with a relatively large deviation.

Graph 470, by contrast, is provided for comparison with graph 450 of this embodiment and shows the voting that would result if the variation degree V were a constant common to all feature points. Graphs 471 to 473 in graph 470 correspond to graphs 451 to 453 in graph 450, respectively. Graph 470 differs fundamentally from graph 450 in that the vote value of the feature point 403 (graph 473), whose deviation is large, is not set at (x0, y0).
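The contrast between graphs 450 and 470 can be reproduced with a small one-dimensional sketch along the line y = y0. All coordinates and variances below are made-up illustrative values, not figures from the patent, and the vote is assumed to be a Gaussian truncated at 3σ.

```python
import math


def vote_at(x, q_x, variance, w=1.0):
    """Slice along y = y0 of a Gaussian vote truncated at 3 sigma."""
    offset_squared = (x - q_x) ** 2
    if offset_squared > 9.0 * variance:
        return 0.0
    return w * math.exp(-offset_squared / (2.0 * variance)) / (2.0 * math.pi * variance)


def votes_reaching(x0, reference_points, variances):
    """Count the feature points whose vote is non-zero at x0."""
    return sum(
        1 for q_x, v in zip(reference_points, variances) if vote_at(x0, q_x, v) > 0.0
    )


x0 = 50.0                               # hypothetical true reference point
reference_points = [51.0, 48.0, 58.0]   # hypothetical counterparts of 421 to 423
per_feature = [1.0, 4.0, 16.0]          # variation degree learned per feature point
common = [4.0, 4.0, 4.0]                # one constant shared by all feature points

reached_per_feature = votes_reaching(x0, reference_points, per_feature)
reached_common = votes_reaching(x0, reference_points, common)
```

With these values all three feature points contribute at x0 when each uses its own variation degree, whereas the third one (offset 8, 3σ = 6) drops out when a common variance is used.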

A large variation degree means that the feature point may be detected over a wide range; conversely, a small variation degree means that the feature point is detected within a narrow range. As described above, the object detection device 1 sets vote values with a distance attenuation characteristic that depends on the variation degree of the feature point, or sets vote values over a distance range that depends on the variation degree. As a result, even feature points that vary widely cast effective votes for the object reference point where the object truly exists. This avoids voting that is biased toward feature points with little variation, and makes it possible to prevent false detections and missed detections of the object.

また投票値に距離減衰特性を与えることによって特徴点の存在確率に相応した投票が行なわれるため、ばらつきやすい特徴点からの不当に高い投票を防いだ精度の高い投票が可能となる。 In addition, since voting according to the existence probability of feature points is performed by giving a distance attenuation characteristic to the voting value, it is possible to perform voting with high accuracy while preventing unreasonably high voting from feature points that tend to vary.

投票部１４１は、上述のように特徴点ごとに各位置（ｘ，ｙ）への投票値ｆ（ｘ，ｙ）を設定する。対象物判定部１４２は、各位置に設定された投票値は投票画像１２１における当該位置の画素値に累積加算され集計される（一次集計）。この各特徴点についての投票値の設定は検知倍率ごとに行われ、また投票画像への加算も、当該検知倍率に対応した投票画像を用いて検知倍率ごとに行われる。 As described above, the voting unit 141 sets the voting value f (x, y) for each position (x, y) for each feature point. The object determination unit 142 accumulates and adds the vote values set at the respective positions to the pixel values at the positions in the vote image 121 (primary aggregation). The setting of the vote value for each feature point is performed for each detection magnification, and addition to the vote image is also performed for each detection magnification using a vote image corresponding to the detection magnification.
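The primary aggregation can be sketched as an in-place accumulation into a vote image. The nested-list image representation, the 3σ window, and the normalized Gaussian are assumptions for illustration.

```python
import math


def new_vote_image(width, height):
    """Vote image with every pixel value initialized to 0."""
    return [[0.0] * width for _ in range(height)]


def cast_vote(vote_image, q, variance, w=1.0):
    """Add one feature point's vote values to vote_image in place."""
    height, width = len(vote_image), len(vote_image[0])
    radius = int(math.ceil(3.0 * math.sqrt(variance)))
    centre_x, centre_y = int(round(q[0])), int(round(q[1]))
    for y in range(max(0, centre_y - radius), min(height, centre_y + radius + 1)):
        for x in range(max(0, centre_x - radius), min(width, centre_x + radius + 1)):
            squared_distance = (x - q[0]) ** 2 + (y - q[1]) ** 2
            if squared_distance <= 9.0 * variance:  # stay inside the 3-sigma range
                vote_image[y][x] += (
                    w * math.exp(-squared_distance / (2.0 * variance))
                    / (2.0 * math.pi * variance)
                )
```

Calling `cast_vote` once per detected feature point on the vote image of the matching detection magnification leaves the cumulative sum of all votes in each pixel.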

Based on the vote images in which the voting unit 141 has set vote values, the object determination unit 142 performs a further aggregation (secondary aggregation) of the vote values at each position of the input image. Specifically, because of the imaging conditions of the object or individual differences in proportions, the vote values of a single object may be spread across several detection magnifications. The object determination unit 142 therefore adds together the vote values at the same position in vote images of adjacent detection magnifications (secondary aggregation). This absorbs differences in the relative sizes of parts caused by imaging conditions and individual differences, and prevents missed detections. The object determination unit 142 then determines that an object exists at each position where the aggregate value exceeds a preset object detection threshold To, determines that no object exists in the input image if no position has an aggregate value exceeding To, and outputs the determination result. The determination result is input to the abnormality determination unit 143.
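A sketch of the secondary aggregation under two assumptions: the vote images of all detection magnifications share one pixel grid (as in the size-aligned arrangement of FIG. 7(b)), and each image is summed with its immediate neighbors in magnification order. The sums are computed from the primary values into new images so that one magnification's result does not leak into the next.

```python
def secondary_aggregate(vote_images):
    """vote_images: list of equally sized 2-D lists, ordered by detection magnification.

    Returns new images in which each pixel is the sum of the primary aggregate
    values at that position in the image itself and in its adjacent magnifications.
    """
    count = len(vote_images)
    result = []
    for index, image in enumerate(vote_images):
        neighbours = [vote_images[i] for i in (index - 1, index + 1) if 0 <= i < count]
        summed = [
            [value + sum(n[y][x] for n in neighbours) for x, value in enumerate(row)]
            for y, row in enumerate(image)
        ]
        result.append(summed)
    return result
```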

The aggregate value may exceed To at several positions in the vicinity of the position where the object truly exists. The object determination unit 142 therefore divides the vote image 121 into a plurality of blocks, detects in each block the position at which the aggregate value is maximal (the peak point), and compares only the aggregate values of the peak points with the object detection threshold To. The block size is set smaller than the size of the object in the input image enlarged or reduced according to the detection magnification. This prevents false detections of the object.
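The block-wise peak search and thresholding might look like the following sketch. The square blocks, the tuple layout of a detection, and the tie-breaking rule (whatever `max` picks among equal values) are assumptions.

```python
def detect_objects(aggregated, block_size, threshold_to):
    """Return (x, y, value) for each block whose peak aggregate value exceeds threshold_to."""
    height, width = len(aggregated), len(aggregated[0])
    detections = []
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            # Peak point of this block: the pixel with the largest aggregate value.
            peak_value, peak_x, peak_y = max(
                (aggregated[y][x], x, y)
                for y in range(top, min(top + block_size, height))
                for x in range(left, min(left + block_size, width))
            )
            if peak_value > threshold_to:
                detections.append((peak_x, peak_y, peak_value))
    return detections
```

Because only one pixel per block is compared with the threshold, several above-threshold pixels clustered around one object yield a single detection.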

対象物判定部１４２は、判定結果として対象物が存在すると判定された入力画像内の位置、当該位置における集計値、当該集計値が算出された検知倍率を対応付けた対象物検知情報を生成する。 The target object determination unit 142 generates target object detection information that associates the position in the input image determined to have the target object as a determination result, the total value at the position, and the detection magnification at which the total value is calculated. .

FIG. 7 illustrates the object determination process. FIG. 7(a) is a schematic diagram showing an input image 500 and the positions (x1, y1) and (x2, y2) at which objects are captured. FIG. 7(b) is a schematic diagram showing how vote values for the feature points detected from the input image 500 of FIG. 7(a) are set in vote images 510 to 516, which have mutually different detection magnifications. FIG. 7(b) depicts an xyα three-dimensional space in which the vote images 510 to 516 are scaled to the same size in the x and y directions and arranged in order of detection magnification α along a direction orthogonal to the x and y axes. In FIG. 7(b), the circles represent voting ranges, and the votes can be seen to concentrate at the positions (x1, y1) and (x2, y2) where the objects are captured. When these votes are aggregated, peaks exceeding the object detection threshold To are detected at (x1, y1) and (x2, y2), and an object is determined to exist at each of these positions.

異常判定部１４３は対象物判定部１４２により対象物の存在が判定されると侵入異常が検知されたとして侵入異常信号を検知出力部１５へ出力する。 When the object determination unit 142 determines the presence of the object, the abnormality determination unit 143 outputs an intrusion abnormality signal to the detection output unit 15 assuming that an intrusion abnormality is detected.

検知出力部１５は外部装置と接続され、当該外部装置へ侵入異常信号を出力するインターフェース回路である。外部装置は、侵入者の存在を警報するスピーカー、ブザー又はランプ等の警報表示手段や、通信網を介して接続される遠隔地のセンタ装置等である。 The detection output unit 15 is an interface circuit that is connected to an external device and outputs an intrusion abnormality signal to the external device. The external device is an alarm display means such as a speaker, a buzzer, or a lamp for alarming the presence of an intruder, a remote center device connected via a communication network, and the like.

Next, the operation of the object detection device 1 is described. FIG. 8 is a flowchart outlining the operation of the object detection device 1. For example, when the administrator of the device turns on the power, each unit starts operating. The image acquisition unit 11 inputs images captured at predetermined time intervals to the detection control unit 14. Each time an image is input, the detection control unit 14 repeats the processing of steps S10 to S18.

When an image is input (S10), the feature point detection unit 140 of the detection control unit 14 detects feature points from the input image, and the voting unit 141 of the detection control unit 14 casts votes in the vote image 121 according to the detection results (S12).

図９は、特徴点検出処理及び投票処理（Ｓ１２）の概略のフロー図である。図９を参照して特徴点検出処理及び投票処理を説明する。 FIG. 9 is a schematic flowchart of the feature point detection process and the voting process (S12). The feature point detection process and the voting process will be described with reference to FIG.

特徴点検出部１４０は、７段階の検知倍率を順次、注目倍率に設定し（Ｓ１２０）、全ての検知倍率に対してステップＳ１２１〜Ｓ１３２の処理を繰り返すループ処理を実行する。 The feature point detection unit 140 sequentially sets the seven detection magnifications to the attention magnification (S120), and executes a loop process that repeats the processes of steps S121 to S132 for all the detection magnifications.

検知倍率のループ処理において、まず特徴点検出部１４０は、注目倍率が１以外である場合には、拡大又は縮小を行うことで注目倍率に応じたサイズの入力画像を生成する（Ｓ１２１）。特徴点検出部１４０は、当該入力画像の全ての画素位置を順次、分析窓の中心に設定し、設定した各位置での当該分析窓内の特徴量を抽出する（Ｓ１２２）。抽出された特徴量はその抽出位置と対応付けられ、特徴量情報として検知記憶部１２に一時記憶される。この段階で特徴量を算出し保存しておき、後の処理で随時利用可能とすることで、無駄な重複算出を省くことができる。また、投票部１４１は注目倍率の投票画像１２１の各画素値を０に初期化する（Ｓ１２３）。 In the detection magnification loop processing, first, when the attention magnification is other than 1, the feature point detection unit 140 generates an input image having a size corresponding to the attention magnification by performing enlargement or reduction (S121). The feature point detection unit 140 sequentially sets all the pixel positions of the input image at the center of the analysis window, and extracts the feature amount in the analysis window at each set position (S122). The extracted feature amount is associated with the extraction position and temporarily stored in the detection storage unit 12 as feature amount information. By calculating and storing the feature amount at this stage and making it available at any time in later processing, it is possible to eliminate unnecessary duplication calculation. In addition, the voting unit 141 initializes each pixel value of the voting image 121 with the attention magnification to 0 (S123).

次に、特徴点検出部１４０は、検知倍率のループ処理内において、特徴点情報１２０に記憶されているＭ個の特徴点＃ｍ（１≦ｍ≦Ｍ）を順次、注目特徴点に設定し（Ｓ１２４）、さらに入力画像内の各画素位置を順次、注目位置に設定し（Ｓ１２５）、特徴点と画素位置の全組み合わせに対してステップＳ１２６〜Ｓ１３１の処理を繰り返すループ処理を実行する。 Next, the feature point detection unit 140 sequentially sets M feature points #m (1 ≦ m ≦ M) stored in the feature point information 120 as feature points of interest in the loop processing of the detection magnification. (S124) Further, each pixel position in the input image is sequentially set as a target position (S125), and a loop process for repeating the processes in steps S126 to S131 is executed for all combinations of feature points and pixel positions.

In the loop over feature points and pixel positions, the feature point detection unit 140 reads the detection criterion of the feature point of interest from the feature point information 120, reads the feature amount at the position of interest from the feature amount information generated in step S122, calculates the detection degree by comparing that feature amount with the detection criterion (S126), and compares the calculated detection degree with the feature point detection threshold Tp (S127).

If the detection degree exceeds the feature point detection threshold Tp ("YES" in S127), the feature point of interest is regarded as detected at the position of interest, and the feature point detection unit 140 notifies the voting unit 141 of the magnification of interest, the feature point number of the feature point of interest, the position of interest, and the detection degree. The voting unit 141 reads the variation degree V and the relative position R of the feature point of interest from the feature point information 120, and substitutes the notified magnification of interest α, position of interest P, and detection degree d, together with the read variation degree V and relative position R, into equations (1) and (2) to calculate the vote value f(x, y) for each pixel position (x, y) in the input image (S128). It then adds the calculated vote value f(x, y) of each pixel position (x, y) to the pixel value at the corresponding pixel position (x, y) in the vote image 121 of the magnification of interest (S129). The addition in step S129 corresponds to the primary aggregation. If the detection degree is Tp or less ("NO" in S127), the feature point of interest is regarded as not detected at the position of interest, and steps S128 and S129 are skipped.

When the entire input image has been scanned in this way for all feature points and all magnifications ("YES" in S130, "YES" in S131, and "YES" in S132), the feature point detection process and the voting process end.
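The nested loops of steps S120 to S132 can be outlined as below. This is a structural sketch only: the dictionary layout of the precomputed feature amounts, the `detection_degree` formula, and the `cast_vote` callback signature are hypothetical stand-ins, and image rescaling and feature extraction (S121 to S123) are assumed to have been done beforehand.

```python
import math


def detection_degree(feature, criterion):
    """Hypothetical detection degree: larger when the feature is closer to the criterion."""
    return math.exp(-math.dist(feature, criterion))


def detect_and_vote(feature_maps, feature_points, threshold_tp, cast_vote):
    """feature_maps: {magnification: {(x, y): feature_vector}} precomputed per magnification.
    feature_points: list of dicts, each with a "criterion" vector.
    cast_vote: callback(magnification, feature_number, position, degree).
    """
    detections = []
    for magnification, feature_map in feature_maps.items():          # S120
        for number, feature_point in enumerate(feature_points):       # S124
            for position, feature in feature_map.items():             # S125
                degree = detection_degree(feature, feature_point["criterion"])  # S126
                if degree > threshold_tp:                              # S127
                    cast_vote(magnification, number, position, degree)  # S128, S129
                    detections.append((magnification, number, position))
    return detections
```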

When the feature point detection process and the voting process end, the processing of the object detection device 1 proceeds to the object determination process S14, as shown in FIG. 8. In the object determination process S14, the object determination unit 142 of the detection control unit 14 determines, as described below, whether an object exists in the input image based on the vote images created in step S12.

図１０は、対象物判定処理（Ｓ１４）の概略のフロー図である。図１０を参照して対象物判定処理（Ｓ１４）を説明する。 FIG. 10 is a schematic flowchart of the object determination process (S14). The object determination process (S14) will be described with reference to FIG.

For each detection magnification, the object determination unit 142 adds together the corresponding pixel values, that is, the primary aggregate values, of the vote images of adjacent detection magnifications (S140). It then divides the vote image of each detection magnification into blocks and detects the peak pixel in each block (S141). The addition in step S140 corresponds to the secondary aggregation.

次に対象物判定部１４２は、各ピーク画素を順次、注目ピーク画素に設定し（Ｓ１４２）、全てのピーク画素に対してステップＳ１４３〜Ｓ１４５の処理を繰り返すループ処理を実行する。 Next, the object determination unit 142 sequentially sets each peak pixel as a target peak pixel (S142), and executes a loop process that repeats the processes of steps S143 to S145 for all peak pixels.

In the loop over peak pixels, the object determination unit 142 compares the pixel value (secondary aggregate value) of the peak pixel of interest with the object detection threshold To (S143). If the aggregate value is greater than To, an object is regarded as detected at the position of the peak pixel of interest ("YES" in S143), and the object determination unit 142 generates object detection information that associates the position of the peak pixel of interest, its pixel value, and the detection magnification of the vote image to which it belongs, and stores this information in the detection storage unit 12 (S144). If the aggregate value is To or less ("NO" in S143), step S144 is skipped.

こうして全てのピーク画素について処理し終えると（Ｓ１４５にて「ＹＥＳ」）、対象物判定処理Ｓ１４は終了する。 When all the peak pixels have been processed in this way (“YES” in S145), the object determination process S14 ends.

When the object determination unit 142 finishes its processing, the abnormality determination unit 143 of the detection control unit 14 refers to the detection storage unit 12 to check whether any object detection information exists (S16). If at least one item of object detection information is stored, an object is regarded as detected ("YES" in S16), and the abnormality determination unit 143 outputs an intrusion abnormality signal to the detection output unit 15 and causes the detection output unit 15 to output an alarm (S18).

以上の処理を終えると、処理は再びステップＳ１０へ戻される。 When the above process is completed, the process returns to step S10 again.

上記実施形態では、画像取得部１１は撮像部１０と接続され、検知制御部１４はオンライン処理で対象物を検知した。しかし、画像取得部１１が録画装置と接続され、検知制御部１４がオフライン処理で対象物を検知する構成としてもよい。 In the above embodiment, the image acquisition unit 11 is connected to the imaging unit 10, and the detection control unit 14 detects the object by online processing. However, the image acquisition unit 11 may be connected to the recording device, and the detection control unit 14 may detect an object by offline processing.

In the embodiment described above, the feature point detection unit 140 scans all pixel positions in the input image. Alternatively, it may preliminarily detect blobs or corners in the input image and scan only the preliminarily detected positions and their surroundings. In this case, if the sample point setting unit 220 detects blobs, the feature point detection unit 140 also preliminarily detects blobs, and if the sample point setting unit 220 detects corners, the feature point detection unit 140 also preliminarily detects corners.

In the embodiment described above, the variation degree calculated by the learning device 2 is input from the feature point information setting unit 13 as part of the feature point information 212 and stored in the detection storage unit 12. In another embodiment, the feature point information setting unit 13 may further include an operation input device such as a keyboard or mouse, and the administrator of the object detection device 1 may operate the feature point information setting unit 13 to input the variation degree.

[Learning device]
FIG. 11 is a block diagram illustrating a schematic configuration of the learning device 2 according to the embodiment. The learning device 2 includes a learning operation unit 20, a learning storage unit 21, a learning control unit 22, and a learning output unit 23. The learning operation unit 20, the learning storage unit 21, and the learning output unit 23 are connected to the learning control unit 22.

学習操作部２０はキーボード、マウス等のユーザインターフェース装置であり、装置の管理者により操作され、学習の開始指示や特徴点の情報の出力指示を学習制御部２２に与える。 The learning operation unit 20 is a user interface device such as a keyboard and a mouse, and is operated by an administrator of the device to give a learning start instruction and an instruction to output feature point information to the learning control unit 22.

学習記憶部２１はＲＯＭ、ＲＡＭ、ハードディスク等の記憶装置であり、学習制御部２２で使用されるプログラムやデータを記憶する。学習記憶部２１はこれらプログラム、データを学習制御部２２との間で入出力する。学習記憶部２１に記憶されるデータには、標本画像２１０、標本点情報２１１、特徴点情報２１２が含まれる。 The learning storage unit 21 is a storage device such as a ROM, a RAM, and a hard disk, and stores programs and data used by the learning control unit 22. The learning storage unit 21 inputs and outputs these programs and data to and from the learning control unit 22. The data stored in the learning storage unit 21 includes a sample image 210, sample point information 211, and feature point information 212.

標本画像２１０は特徴点情報２１２を作成する基礎となる画像であり、当該学習に先立って予め記憶される。標本画像２１０は、対象物が撮像された多数（数千〜数万枚程度）の対象物標本画像、及び対象物が撮像されていない多数（数千〜数万枚程度）の非対象物標本画像とからなる。標本画像２１０のそれぞれには当該画像を識別する標本番号が付与されている。対象物標本画像は６４×１２８画素の基準サイズに予め揃えられている。 The sample image 210 is an image serving as a basis for creating the feature point information 212, and is stored in advance prior to the learning. The sample image 210 includes a large number (several thousands to several tens of thousands) of target object images in which the target is imaged, and a large number (several thousands to tens of thousands) of the non-target samples in which the target is not captured. It consists of an image. Each of the sample images 210 is given a sample number for identifying the image. The object specimen image is preliminarily aligned to a reference size of 64 × 128 pixels.

標本点情報２１１は各対象物標本画像内に設定された標本点（標本特徴点）の情報である。標本点情報２１１は、各標本点の位置、特徴量及び、当該標本点が設定された対象物標本画像を特定する標本番号を含む。 The sample point information 211 is information on sample points (sample feature points) set in each object sample image. The sample point information 211 includes a position of each sample point, a feature amount, and a sample number for specifying an object sample image in which the sample point is set.

特徴点情報２１２は標本点情報２１１を基に作成された特徴点の情報である。その内容は上述した特徴点情報１２０と同じであり、各特徴点の特徴点番号、相対位置、特徴量、ばらつき度といったパラメータ群である。 The feature point information 212 is feature point information created based on the sample point information 211. The content is the same as the feature point information 120 described above, and is a parameter group such as a feature point number, a relative position, a feature amount, and a variation degree of each feature point.

The learning output unit 23 consists of interface circuits, such as a USB terminal, a CD drive, or a network adapter, that output the generated feature point information 212 to the outside of the learning device 2, together with their driver programs. The externally output data is input to the object detection device 1.

学習制御部２２は、ＤＳＰ、ＭＣＵ等の演算装置を用いて構成される。学習制御部２２は、標本画像２１０から特徴点情報２１２を生成して、生成した特徴点情報２１２を学習出力部２３へ出力する処理を行う。具体的には、学習制御部２２は、学習記憶部２１からプログラムを読み出して実行し、後述する標本点設定部２２０、クラスタリング部２２１、特徴点情報生成部２２２として機能する。 The learning control unit 22 is configured using an arithmetic device such as a DSP or MCU. The learning control unit 22 performs processing for generating feature point information 212 from the sample image 210 and outputting the generated feature point information 212 to the learning output unit 23. Specifically, the learning control unit 22 reads out and executes a program from the learning storage unit 21 and functions as a sample point setting unit 220, a clustering unit 221, and a feature point information generation unit 222, which will be described later.

標本点設定部２２０は、標本画像から所定の画像特徴を有する標本点を抽出する標本特徴点抽出部として機能する。具体的には、各対象物標本画像内に複数の標本点を設定して標本点における特徴量を抽出し、各標本点の位置と特徴量と当該標本点が設定された対象物標本画像の標本番号とを対応付けた標本点情報２１１を学習記憶部２１に記憶させる。 The sample point setting unit 220 functions as a sample feature point extracting unit that extracts a sample point having a predetermined image feature from the sample image. Specifically, a plurality of sample points are set in each object sample image, the feature amount at the sample point is extracted, and the position and feature amount of each sample point and the object sample image in which the sample point is set are extracted. Sample point information 211 associated with the sample number is stored in the learning storage unit 21.

In this embodiment, intersections of edges, called corners, or luminance maxima, called blobs, are used as sample points. Specifically, corners are detected in each object sample image by a known corner detection method such as the Harris-Laplace method and set as feature points, or blobs are detected in each object sample image by a known blob detection method such as SIFT (Scale-Invariant Feature Transform) and set as feature points. Setting sample points that are distinctive in luminance makes it possible to efficiently generate feature point information that is effective for object detection.

Sample points may also be set by randomly placing a preset number of sample points over the whole object sample image, or by placing sample points at equal intervals in a grid within the object sample image.

標本点設定部２２０は、各対象物標本画像の標本点それぞれに分析窓を設定して分析窓内の特徴量を抽出する。特徴量は、本実施形態では上述したシェイプコンテキストとするが、ＨＯＧとすることもできる。 The sample point setting unit 220 sets an analysis window for each sample point of each object sample image and extracts a feature amount in the analysis window. The feature amount is the shape context described above in the present embodiment, but may be a HOG.

クラスタリング部２２１は、標本画像相互間にて位置及び画像特徴が類似する標本点からなるクラスタ（cluster）を生成する。具体的には、クラスタリング部２２１は標本点情報２１１を参照し、位置及び特徴量に着目して標本点をクラスタリングすることによって、位置及び特徴量が類似する標本点のクラスタを生成する。これにより、多数の対象物標本画像間で対象物の同じ部位を表す標本点が１つのクラスタにまとめられる。生成されたクラスタの情報は、特徴点情報生成部２２２へ出力される。 The clustering unit 221 generates a cluster composed of sample points having similar positions and image features between the sample images. Specifically, the clustering unit 221 generates a cluster of sample points having similar positions and feature amounts by referring to the sample point information 211 and clustering the sample points by paying attention to the positions and feature amounts. As a result, the sample points representing the same part of the object among a large number of object sample images are collected into one cluster. The generated cluster information is output to the feature point information generation unit 222.

Known methods such as k-means clustering or various agglomerative clustering methods (for example, the group average method) can be used for the clustering.

またこのとき、位置及び特徴量に同時に着目してクラスタリングを行っても良いし、まず特徴量に着目したクラスタリングを行い、次いで位置に着目したクラスタリングを行っても良い。 At this time, the clustering may be performed while paying attention to the position and the feature amount at the same time, or the clustering focusing on the feature amount may be performed first, and then the clustering focusing on the position may be performed.
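As an illustration of clustering on position and feature amount simultaneously, the following is a minimal k-means sketch over vectors that concatenate a sample point's position with its feature amount. The deterministic initialization from the first k points, the fixed iteration count, and the unweighted concatenation (no scaling between position and feature dimensions) are simplifying assumptions.

```python
def squared_distance(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(points, k, iterations=20):
    """points: list of equal-length tuples, e.g. (x, y, feature_1, ..., feature_n).

    Returns (labels, centers), where labels[i] is the cluster index of points[i].
    """
    centers = [list(p) for p in points[:k]]  # assumed deterministic initialization
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each sample point to its nearest center.
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: squared_distance(p, centers[c]))
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(column) / len(members) for column in zip(*members)]
    return labels, centers
```

The two-stage alternative would run the same routine first on the feature part of each vector and then, within each resulting group, on the position part.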

特徴点情報生成部２２２は、クラスタごとに、統計分析により標本点の位置の分布（標本分布）の情報を求め、さらに、標本画像における所定の基準点の、標本分布における所定の代表位置からの相対位置である標本相対位置を求める。そして、クラスタごとの標本分布の情報及び標本相対位置に基づく特徴点分布の情報及び相対位置を含む特徴点情報を生成する。具体的には特徴点情報生成部２２２は、クラスタごとに、標本点の位置のばらつき度を統計分析すると共に、標本点の特徴量を用いて検出基準を学習し、当該ばらつき度と当該検出基準と当該クラスタの対象物基準点に対する相対位置とを対応付けた特徴点情報２１２を生成し、生成された特徴点情報２１２を学習記憶部２１に記憶させる。特徴点情報２１２は１つのクラスタから１つ生成される。本実施形態では、クラスタ＃ｍに属する標本点の位置の分散値を特徴点＃ｍのばらつき度Ｖとして算出する。また、クラスタ＃ｍに属する標本点の位置の平均値を当該クラスタの代表位置として算出し、算出した代表位置の座標から対象物基準点Ｂの座標を引いたベクトルを特徴点＃ｍの相対位置として算出する。さらに、クラスタ＃ｍに属する標本点の特徴量の平均（平均ベクトル）を特徴点＃ｍの検出基準として算出する。 For each cluster, the feature point information generation unit 222 obtains, by statistical analysis, information on the distribution of the sample point positions (the sample distribution), and further obtains a sample relative position, which is the relative position of a predetermined reference point in the sample image from a predetermined representative position in the sample distribution. It then generates feature point information that includes feature point distribution information and a relative position based on the sample distribution information and the sample relative position of each cluster. Specifically, for each cluster, the feature point information generation unit 222 statistically analyzes the degree of variation in the positions of the sample points, learns a detection criterion using the feature amounts of the sample points, generates feature point information 212 that associates the degree of variation, the detection criterion, and the relative position of the cluster with respect to the object reference point, and stores the generated feature point information 212 in the learning storage unit 21. One item of feature point information 212 is generated from each cluster. In the present embodiment, the variance of the positions of the sample points belonging to cluster #m is calculated as the degree of variation V of feature point #m. The mean of the positions of the sample points belonging to cluster #m is calculated as the representative position of the cluster, and the vector obtained by subtracting the coordinates of the object reference point B from the coordinates of the calculated representative position is calculated as the relative position of feature point #m. Further, the mean (mean vector) of the feature amounts of the sample points belonging to cluster #m is calculated as the detection criterion for feature point #m.
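
The per-cluster quantities described above can be sketched as follows. The function name and the dictionary layout are hypothetical, and treating the variance of two-dimensional positions as the mean squared Euclidean distance from the mean position is an assumption; the text only states that a variance value is used.

```python
def feature_point_info(positions, features, reference_point):
    """Feature point information for one cluster (illustrative sketch).

    positions       : (x, y) positions of the sample points in the cluster
    features        : feature amount vectors of the same sample points
    reference_point : object reference point B as (x, y)
    """
    n = len(positions)
    # Representative position: mean of the sample point positions.
    rep_x = sum(x for x, _ in positions) / n
    rep_y = sum(y for _, y in positions) / n
    # Relative position: representative position minus reference point B.
    relative = (rep_x - reference_point[0], rep_y - reference_point[1])
    # Degree of variation V: assumed here to be the mean squared distance
    # of the sample points from the representative position.
    variation = sum((x - rep_x) ** 2 + (y - rep_y) ** 2 for x, y in positions) / n
    # Detection criterion: mean vector of the feature amounts.
    criterion = [sum(col) / n for col in zip(*features)]
    return {"relative_position": relative,
            "variation": variation,
            "detection_criterion": criterion}

info = feature_point_info(
    positions=[(10, 20), (12, 22), (14, 18), (12, 20)],
    features=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]],
    reference_point=(32, 64),
)
```

For this toy cluster the representative position is (12, 20), so the relative position with respect to B = (32, 64) is (-20, -44).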

対象物検知装置１の特徴点検出部１４０が識別器として動作する構成では、特徴点情報生成部２２２は、クラスタ＃ｍの代表位置における特徴量を非対象物標本画像のそれぞれからも抽出し、クラスタ＃ｍに属する標本点の特徴量と非対象物標本画像から抽出された特徴量とに公知のブースティング（Boosting）又はサポートベクターマシーン（Support Vector Machine）等の学習アルゴリズムを適用して検出基準である識別関数を学習する。 In a configuration in which the feature point detection unit 140 of the object detection device 1 operates as a discriminator, the feature point information generation unit 222 also extracts the feature amount at the representative position of cluster #m from each non-object sample image, and learns a discriminant function, which serves as the detection criterion, by applying a known learning algorithm such as boosting or a support vector machine to the feature amounts of the sample points belonging to cluster #m and the feature amounts extracted from the non-object sample images.

なお、少数の対象物標本画像から生成されたクラスタからは、対象物を検知する十分な精度を有した特徴点情報を生成できる可能性が低い。そこで特徴点情報生成部２２２はクラスタに属する標本点の設定元となった対象物標本画像の数を計数して、計数値が所定値以下のクラスタの情報からは特徴点情報２１２を生成しないように構成することができる。これにより十分な対象物検知精度を有した特徴点情報を生成できる。また、これにより不当に小さいばらつき度や不当に大きなばらつき度が算出されることを防止できるため、高い対象物検知精度を有した特徴点情報を生成できる。 Note that a cluster generated from only a small number of object sample images is unlikely to yield feature point information accurate enough for detecting the object. The feature point information generation unit 222 can therefore be configured to count the number of object sample images from which the sample points belonging to a cluster were set, and not to generate feature point information 212 from a cluster whose count is equal to or less than a predetermined value. This makes it possible to generate feature point information with sufficient object detection accuracy. It also prevents unduly small or unduly large degrees of variation from being calculated, so that feature point information with high object detection accuracy can be generated.
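
The cluster filtering described above can be sketched as follows, assuming each sample point is recorded as a pair of sample number and cluster number. The function name, the record layout, and the threshold value are assumptions for illustration. Distinct sample images are counted, so several sample points from the same image count once.

```python
from collections import defaultdict

def clusters_with_enough_images(sample_points, threshold):
    """Return cluster numbers backed by more than `threshold` distinct images.

    sample_points : iterable of (sample_number, cluster_number) pairs
    threshold     : clusters whose image count is at or below this are dropped
    """
    images_per_cluster = defaultdict(set)
    for sample_number, cluster_number in sample_points:
        images_per_cluster[cluster_number].add(sample_number)
    return sorted(cluster for cluster, images in images_per_cluster.items()
                  if len(images) > threshold)

# Cluster 0 draws on four images; cluster 1 on only two (image 0 twice).
points = [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (0, 1), (1, 1)]
kept = clusters_with_enough_images(points, threshold=2)
```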

図１２は、標本点とクラスタとの関係を示す模式図である。図１２は、各対象物標本画像６００〜６０３にて互いに対応する特徴を有する標本点であるコーナー６１０〜６１３（“×”印）が、点線で示す領域内にばらついて検出された様子を示している。コーナーにおいてクラスタリング用の特徴量を抽出する領域６２０〜６２３を、コーナーを示す“×”印を囲む実線の円で表している。この場合、コーナー６１０〜６１３は点線の領域に対応するクラスタ６５０にまとめられる。また、クラスタの代表位置６３０〜６３３（黒丸）は、当該クラスタにまとめられるコーナーの平均位置に設定されている。当該代表位置が当該クラスタに対応する特徴点となり、当該特徴点の特徴量を抽出する領域である局所領域６４０〜６４３を当該特徴点を囲む実線の円で表している。 FIG. 12 is a schematic diagram showing the relationship between sample points and clusters. It shows corners 610 to 613 (marked "x"), which are sample points having mutually corresponding features in the object sample images 600 to 603, detected at positions scattered within the region indicated by the dotted line. Regions 620 to 623, from which the feature amounts for clustering are extracted at the corners, are drawn as solid circles surrounding the "x" marks. In this case, corners 610 to 613 are grouped into a cluster 650 corresponding to the dotted region. The representative positions 630 to 633 of the cluster (black dots) are set to the mean position of the corners grouped into the cluster. The representative position becomes the feature point corresponding to the cluster, and local regions 640 to 643, from which the feature amount of the feature point is extracted, are drawn as solid circles surrounding the feature point.

次に、学習装置２の動作を説明する。図１３は、学習装置２の概略の動作を示すフロー図である。例えば、管理者が学習装置２の電源を投入し学習操作部２０を操作して学習の開始を指示すると、学習装置２は学習処理を行う。以下、図１３を参照して学習処理を説明する。 Next, the operation of the learning device 2 will be described. FIG. 13 is a flowchart showing the general operation of the learning device 2. For example, when the administrator turns on the power of the learning device 2 and operates the learning operation unit 20 to instruct the start of learning, the learning device 2 performs the learning process. The learning process is described below with reference to FIG. 13.

学習制御部２２は標本点設定部２２０により、各対象物標本画像内に複数の標本点を設定する（Ｓ２０）。そして、各対象物標本画像において当該画像から抽出された標本点それぞれの位置に分析窓を設定して当該分析窓内から特徴量を抽出し（Ｓ２１）、抽出された特徴量に抽出元の対象物標本画像の標本番号及び抽出元の標本点の位置を対応付けて標本点情報２１１を生成し学習記憶部２１に記憶させる。 The learning control unit 22 causes the sample point setting unit 220 to set a plurality of sample points in each object sample image (S20). Then, in each object sample image, an analysis window is set at the position of each sample point extracted from that image and a feature amount is extracted from within the analysis window (S21); sample point information 211 is generated by associating each extracted feature amount with the sample number of the object sample image and the position of the sample point from which it was extracted, and is stored in the learning storage unit 21.

次に、学習制御部２２はクラスタリング部２２１により、標本点情報２１１に対して位置と特徴量に着目したクラスタリング処理を行い、対象物標本画像間で位置及び特徴量が類似する標本点同士がまとめられたクラスタを生成する（Ｓ２２）。クラスタリング部２２１は、クラスタリングの結果として、各標本点の標本点情報２１１に当該標本点が属するクラスタを識別するクラスタ番号を追記する。 Next, the learning control unit 22 causes the clustering unit 221 to perform clustering on the sample point information 211 with respect to position and feature amount, generating clusters that group sample points whose positions and feature amounts are similar across the object sample images (S22). As the result of clustering, the clustering unit 221 appends, to the sample point information 211 of each sample point, a cluster number identifying the cluster to which that sample point belongs.

学習制御部２２は特徴点情報生成部２２２により、ステップＳ２２にて生成された各クラスタを順次、注目クラスタに設定し（Ｓ２３）、全てのクラスタに対してステップＳ２４〜Ｓ２９の処理を繰り返すループ処理を実行する。 The learning control unit 22 causes the feature point information generation unit 222 to set each cluster generated in step S22, in turn, as the cluster of interest (S23), and executes loop processing that repeats steps S24 to S29 for all clusters.

クラスタのループ処理において、特徴点情報生成部２２２は、標本点情報２１１を参照して、注目クラスタに属する標本点の位置を代表する代表位置を算出し（Ｓ２４）、また、算出された代表位置の対象物基準点Ｂに対する相対位置Ｒを算出し（Ｓ２５）、さらに、注目クラスタに属する標本点の位置のばらつき度Ｖを算出する（Ｓ２６）。 In the loop over clusters, the feature point information generation unit 222 refers to the sample point information 211 and calculates a representative position that represents the positions of the sample points belonging to the cluster of interest (S24), calculates the relative position R of the calculated representative position with respect to the object reference point B (S25), and further calculates the degree of variation V of the positions of the sample points belonging to the cluster of interest (S26).

さらに、特徴点情報生成部２２２は、注目クラスタに属する標本点の特徴量を用いて検出基準の学習を行う（Ｓ２７）。 Further, the feature point information generation unit 222 learns detection criteria using the feature amounts of the sample points belonging to the cluster of interest (S27).

対象物検知装置１の特徴点検出部１４０がパターンマッチングにより特徴点を検出する場合、特徴点情報生成部２２２は注目クラスタに属する標本点の特徴量の平均特徴量を検出基準として学習する。この場合、検出基準は対象物標本画像の画像情報のみを用いて学習されることになる。 When the feature point detection unit 140 of the object detection device 1 detects feature points by pattern matching, the feature point information generation unit 222 learns, as the detection criterion, the mean of the feature amounts of the sample points belonging to the cluster of interest. In this case, the detection criterion is learned using only the image information of the object sample images.

対象物検知装置１の特徴点検出部１４０が識別器として動作する別の実施形態の場合、特徴点情報生成部２２２は、注目クラスタに属する標本点の特徴量すなわち対象物標本画像の画像情報に加え、非対象物標本画像の画像情報を用いて学習を行なう。すなわち、特徴点情報生成部２２２は、ステップＳ２４において算出された代表位置における特徴量を非対象物標本画像のそれぞれからも抽出し、注目クラスタに属する標本点の特徴量と非対象物標本画像から抽出された特徴量とにブースティング又はサポートベクターマシーンを適用して検出基準を学習する。 In another embodiment in which the feature point detection unit 140 of the object detection device 1 operates as a discriminator, the feature point information generation unit 222 uses the feature amount of the sample point belonging to the cluster of interest, that is, the image information of the object sample image. In addition, learning is performed using the image information of the non-object specimen image. That is, the feature point information generation unit 222 also extracts the feature amount at the representative position calculated in step S24 from each of the non-object sample images, and uses the feature amount of the sample point belonging to the cluster of interest and the non-object sample image. A detection criterion is learned by applying a boosting or support vector machine to the extracted feature quantity.

特徴点情報生成部２２２は、以上ステップＳ２４〜Ｓ２７にて注目クラスタに対応する特徴点の情報を求めると、新たな特徴点番号、注目クラスタの相対位置Ｒ、注目クラスタのばらつき度Ｖ及び、注目クラスタの検出基準を対応付けて新たな特徴点情報２１２を生成し学習記憶部２１に記憶させる（Ｓ２８）。 Having obtained the information on the feature point corresponding to the cluster of interest in steps S24 to S27, the feature point information generation unit 222 generates new feature point information 212 by associating a new feature point number, the relative position R of the cluster of interest, the degree of variation V of the cluster of interest, and the detection criterion of the cluster of interest, and stores it in the learning storage unit 21 (S28).

全てのクラスタについて処理し終えると（Ｓ２９にて「ＹＥＳ」）、学習処理は終了する。 When all the clusters have been processed (“YES” in S29), the learning process ends.

学習処理の終了後、管理者が学習操作部２０を操作して特徴点情報２１２の出力を指示すると、学習制御部２２は学習記憶部２１から特徴点情報２１２を読み出して学習出力部２３に出力させる。 After the learning process ends, when the administrator operates the learning operation unit 20 to instruct output of the feature point information 212, the learning control unit 22 reads the feature point information 212 from the learning storage unit 21 and causes the learning output unit 23 to output it.

なお、上記実施形態においては基準点Ｂを標本画像２１０の重心位置に定めたが、特徴点間で共通していれば基準点Ｂは標本画像２１０の左上端、右下端など任意の位置でもよい。 In the above embodiment, the reference point B is set at the centroid of the sample image 210; however, as long as it is common to all feature points, the reference point B may be at any position, such as the upper left or lower right corner of the sample image 210.

また上記実施形態では、特徴点情報生成部２２２は特徴点のばらつき度として分散値を算出した。別の実施形態として、特徴点情報生成部２２２は、クラスタごとに標本点の平均位置からの差の絶対値の平均値を当該クラスタのばらつき度として算出したり、クラスタごとに標本点の平均位置からの距離の最大値を当該クラスタのばらつき度として算出する構成としてもよい。 In the above embodiment, the feature point information generation unit 222 calculates the variance as the degree of variation of a feature point. In another embodiment, the feature point information generation unit 222 may calculate, for each cluster, the mean of the absolute differences of the sample points from their mean position as the degree of variation of the cluster, or may calculate, for each cluster, the maximum distance of the sample points from their mean position as the degree of variation of the cluster.
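
The two alternative degrees of variation can be sketched as follows. The function names are hypothetical, and reading "the absolute value of the difference from the mean position" as the Euclidean distance to the mean position is an assumption.

```python
import math

def mean_position(positions):
    """Mean (x, y) of the sample point positions in one cluster."""
    n = len(positions)
    return (sum(x for x, _ in positions) / n, sum(y for _, y in positions) / n)

def mean_absolute_deviation(positions):
    """Mean distance of the sample points from their mean position."""
    cx, cy = mean_position(positions)
    return sum(math.hypot(x - cx, y - cy) for x, y in positions) / len(positions)

def max_distance(positions):
    """Largest distance of any sample point from the mean position."""
    cx, cy = mean_position(positions)
    return max(math.hypot(x - cx, y - cy) for x, y in positions)

positions = [(10, 20), (14, 20), (12, 23), (12, 17)]
mad = mean_absolute_deviation(positions)
dmax = max_distance(positions)
```

Both measures are in pixel units, unlike the variance, which is in squared pixels, so a voting range derived from them would need to be scaled accordingly. The maximum distance is also more sensitive to a single outlying sample point.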

１対象物検知装置、２学習装置、１０撮像部、１１画像取得部、１２検知記憶部、１３特徴点情報設定部、１４検知制御部、１５検知出力部、２０学習操作部、２１学習記憶部、２２学習制御部、２３学習出力部、１２０特徴点情報、１２１投票画像、１４０特徴点検出部、１４１投票部、１４２対象物判定部、１４３異常判定部、２１０標本画像、２１１標本点情報、２１２特徴点情報、２２０標本点設定部、２２１クラスタリング部、２２２特徴点情報生成部、４００，５００入力画像、４０１〜４０３特徴点、５１０〜５１６投票画像、６００〜６０３対象物標本画像、６１０〜６１３コーナー、６５０クラスタ、６３０〜６３３代表位置。 DESCRIPTION OF SYMBOLS 1 Object detection apparatus, 2 Learning apparatus, 10 Imaging part, 11 Image acquisition part, 12 Detection storage part, 13 Feature point information setting part, 14 Detection control part, 15 Detection output part, 20 Learning operation part, 21 Learning storage part , 22 Learning control unit, 23 Learning output unit, 120 Feature point information, 121 Vote image, 140 Feature point detection unit, 141 Voting unit, 142 Object determination unit, 143 Abnormality determination unit, 210 Sample image, 211 Sample point information, 212 feature point information, 220 sample point setting unit, 221 clustering unit, 222 feature point information generation unit, 400,500 input image, 401 to 403 feature point, 510 to 516 voting image, 600 to 603 target object sample image, 610 613 corner, 650 clusters, 630-633 representative positions.

Claims

An object detection device for detecting an object appearing in an input image, comprising:
a storage unit that stores, for each of a plurality of preset feature points having image features characteristic of an object image obtained by photographing the object, feature point information including a relative position between a predetermined reference point in the object image and the feature point, and a degree of variation of the relative position;
a feature point detection unit that detects the feature points from the input image;
a voting unit that, for each feature point detected by the feature point detection unit, calculates voting values with a distance characteristic according to the degree of variation, centered on the relative position with respect to the position of the feature point in the input image; and
an object determination unit that determines the presence of the object by totaling the voting values at each position in the input image.

The object detection device according to claim 1, wherein the distance characteristic of the voting unit widens the distance range of voting positions from the center as the degree of variation increases.

The object detection device according to claim 1 or 2, wherein the distance characteristic of the voting unit attenuates the voting value more gradually with distance from the center as the degree of variation increases.

A learning device that generates the feature point information used in the object detection device according to any one of claims 1 to 3, comprising:
a sample image storage unit that stores a plurality of sample images obtained by photographing the object;
a sample feature point extraction unit that extracts sample feature points having predetermined image features from each sample image;
a clustering unit that generates clusters of the sample feature points whose positions and image features are similar across the sample images; and
a feature point information generation unit that, for each cluster, obtains by statistical analysis sample distribution information on the distribution of the positions of the sample feature points, further obtains a sample relative position that is the relative position of a predetermined reference point in the sample image from a predetermined representative position in the sample distribution, and generates the feature point information using the sample distribution information and the sample relative position of each cluster as the feature point distribution information and the relative position.