JP2010140315A - Object detection device

Info

Publication number: JP2010140315A
Application number: JP2008316904A
Authority: JP
Inventors: Hidenori Ujiie; 秀紀氏家
Original assignee: Secom Co Ltd
Current assignee: Secom Co Ltd
Priority date: 2008-12-12
Filing date: 2008-12-12
Publication date: 2010-06-24
Anticipated expiration: 2028-12-12
Also published as: JP5264457B2

Abstract

PROBLEM TO BE SOLVED: To reduce the processing load and prevent missed detections when detecting an object in an image.

SOLUTION: An object detection device includes: an image acquisition unit 2 that acquires an image; a discriminator 34 that takes in a detection window region image, in which a detection window is set, from the image input from the image acquisition unit 2, and outputs a determination of whether a detection target is present in the detection window region image together with a reliability indicating the degree of certainty that the detection target is present; detection window control means 32 that sets the subsequent scanning interval of the detection window shorter as the reliability becomes higher; and determination means 36 that determines whether the detection target is present based on the output of the discriminator 34.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to an object detection device that detects a detection target in an image.

Object detection devices that detect whether an image contains an image region corresponding to a detection target are known.

Patent Document 1 discloses a template matching technique in which a template of the object to be identified is matched while being shifted step by step across a scanning region of the image. The detection window region is first scanned at a coarse scanning interval. When the similarity to the template is high, matching is then performed on the eight neighboring regions at a level that uses more pixels in the computation than the current level. This is intended to speed up the matching process and improve its accuracy.

Patent Document 2 discloses an object detection device that detects an object by sequentially scanning an input image with a rectangular detection window region. The similarity to the object is calculated in the detection window region; if the similarity is sufficiently high the scanning interval of the detection window region is widened, and if it is not the scanning interval is shortened.

Object detection devices have also been proposed that input an image to a classifier trained in advance on the presence or absence of the target object and detect the target object in the image. For example, Patent Document 3 discloses a person tracking device that tracks a person using a particle filter, in which a cascade classifier is used and the number of cascade stages passed serves as an indicator of how person-like a region is.

JP-A-2-159682
Japanese Patent No. 3517518
JP 2008-26974 A

To locate a detection target in an image with higher accuracy, matching must be performed while shifting the detection window region at a fine scanning interval. A finer scanning interval, however, increases the processing load and the time needed to detect the target. Conversely, matching while shifting the detection window region at a coarse scanning interval causes so-called missed detections, in which a detection target that is present is not detected.

The present invention has been made in view of the above problems, and its object is to provide an object detection device that reduces the processing load and eliminates missed detections.

One aspect of the present invention is an object detection device that detects a detection target in an image, comprising: an image acquisition unit that acquires the image; a discriminator that takes in a detection window region image, in which a detection window is set, from the image input from the image acquisition unit, and outputs a determination of whether the detection target is present in the detection window region image together with a reliability indicating the degree of certainty, in that determination, that the detection target is present; detection window control means that sets the subsequent scanning interval of the detection window shorter as the reliability becomes higher; and determination means that determines whether the detection target is present based on the output of the discriminator.

Here, the detection window control means preferably sets the scanning interval within a range that does not exceed the size of the detection window.

Preferably, the discriminator also outputs a similarity indicating how closely the detection window region image resembles the detection target, and the determination means determines, from among mutually overlapping detection window region images whose reliability is equal to or greater than a predetermined value, a detection window region image with a high similarity to be a detection target image containing an image of the detection target.

Preferably, the discriminator includes a cascade classifier in which a plurality of strong classifiers are connected in series; each of the serially connected strong classifiers outputs an identification result indicating whether the detection target is present in the detection window region image; and the reliability is the stage number, counted from the first stage, of the last strong classifier in the cascade that output an identification result that the detection target is present. When the identification result that the detection target is absent is used instead, the reliability may equivalently be taken as the stage number of the stage preceding the strong classifier that output that result.

According to the present invention, the processing load of an object detection device can be reduced and missed detections can be eliminated.

<First Embodiment>
As shown in FIG. 1, the object detection device 1 according to the embodiment of the present invention includes an image acquisition unit 2, a signal processing unit 3, a storage unit 4, and an output unit 5. The object detection device 1 acquires a monitoring image of a monitored space and detects detection targets, such as people or objects, captured in the image. The image acquisition unit 2, the signal processing unit 3, the storage unit 4, and the output unit 5 are connected so that they can exchange information with one another.

This embodiment describes an example in which a person appearing in the image is the detection target. The invention is not limited to this, however, and can also be applied to detecting articles such as goods in distribution, passing vehicles, and the like.

The image acquisition unit 2 includes a so-called surveillance camera made up of an imaging element such as a CCD or C-MOS element, optical components, an analog/digital converter, and the like. The image acquisition unit 2 may also acquire images via the Internet. The image acquisition unit 2 transmits the captured image to the signal processing unit 3. Images need not be acquired at fixed time intervals, and the acquired images need not show a fixed location.

The image is, for example, a color image 320 pixels wide and 240 pixels high in which each pixel represents R (red), G (green), and B (blue) with 256 gradations each.

The signal processing unit 3 includes an arithmetic circuit such as a CPU, DSP, MCU, or IC. The signal processing unit 3 is connected to the image acquisition unit 2, the storage unit 4, and the output unit 5 so that it can exchange information with them. The signal processing unit 3 reads from the storage unit 4 and executes a program describing the processing of each means, such as the detection window region selection means 30, the detection window control means 32, the discriminator 34, and the determination means 36, thereby causing the computer to function as each of those means.

As shown in the upper part of FIG. 2, the signal processing unit 3 detects detection targets in the input image 200 supplied by the image acquisition unit 2. For convenience of explanation, the upper-left corner of the image 200 is taken as the origin (X = 0, Y = 0), with the X axis horizontal and the Y axis vertical; X increases to the right and Y increases downward. In the input image 200, detection targets (people) appear at regions 203 and 204, and rectangles 201 and 202 indicate detection window regions.

The object detection device 1 scans by shifting the detection window region little by little and determines whether a person appears in the detection window region. The arrows indicate the upper-left coordinates of the detection window region as it is shifted; the detection window region is scanned so that the entire image 200 is searched without omission. The rectangles in regions 203 and 204 are detection window regions that the detection process determined to contain a person (person candidate regions). Near an image region where a person appears, several detection window regions may be determined to contain a person; these are merged to obtain the final detection window region (thick line in FIG. 2: person region).

The detection window region selection means 30 determines the width and height of the detection window region. To handle people who appear at various sizes in the image, the detection window region is scanned while its width and height are changed, and whether a person appears in the image is determined. One or more width and height values, chosen with the on-image size of the detection target in mind, are stored in advance in the storage unit 4. Alternatively, the width and height need not be stored in advance and may be determined according to a predetermined rule.

The detection window control means 32 determines the interval by which the detection window region is shifted (the scanning interval) and shifts the detection window region according to the determined scanning interval.

The discriminator 34 calculates a similarity indicating how closely the image in the detection window region resembles a person. The similarity is calculated by the procedure shown in FIG. 3. First, detection window extraction means 310 cuts the detection window region out of the input image and inputs it to the cascade classifier 320. The cascade classifier 320 is a classifier in which a plurality of strong classifiers, such as strong classifiers 321, 322, and 323, are arranged in series (the example shown has N classifiers in series).

Each classifier is trained in advance with AdaBoost using Histograms of Oriented Gradients (HOG) features (Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, In Proceedings of IEEE Conference Computer Vision and Pattern Recognition 2005). Specifically, a large number of images of people (the detection target) in various standing poses and of images containing no people are prepared, each image is labeled with the correct answer as to whether it is an image of the detection target, and AdaBoost is trained on this data so that the two classes can be distinguished. To calculate the similarity, each classifier computes HOG features from the input image and calculates the similarity from the feature values selected by AdaBoost.

The order in which the strong classifiers 321 to 323 are evaluated is fixed in advance. Each strong classifier is assigned a number equal to its position in that order: 1 for the strong classifier 321 evaluated first, 2 for the strong classifier 322 evaluated next, and so on.

Each of the classifiers 321 to 323 takes the image cut out as the detection window region as input and calculates a similarity. The classifiers 322, 323, and so on, other than the first classifier 321, calculate a similarity only when the similarity calculated in the preceding stage is greater than a threshold (arrow T in the figure). When the similarity is equal to or less than the threshold (arrow F in the figure), the similarity and the reliability are output to similarity/reliability management means 330 and recorded as determination information history 40 together with the size (width, height) and center coordinates of the detection window region.

The threshold is set to 0, for example: a similarity greater than 0 is judged to resemble a person, and a similarity of 0 or less is judged not to. The reliability is the degree of certainty that the detection target is present in the image cut out as the detection window region, and is used when setting the scanning interval of the detection window region. For example, it is the number (stage count from the first stage) of the last strong classifier in the cascade classifier 320 that output an identification result that the detection target is present (T in the figure). When the final-stage strong classifier 323 makes its determination, the similarity, the reliability, and the size (width, height) and center coordinates of the detection window region are output to the similarity/reliability management means 330 and recorded as the determination information history 40 both when the similarity is greater than the threshold (arrow T in the figure) and when it is equal to or less than the threshold (arrow F in the figure).
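The relationship between the per-stage similarity, the threshold, and the reliability can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: `run_cascade` and the callable-based stage interface are hypothetical, and returning a reliability of 0 when even the first stage rejects the window is an assumption, since the text does not state that case.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical interface: a strong classifier is any callable that maps a
# window image to a similarity score. In the text these are trained with
# AdaBoost on HOG features; here they are just stand-in functions.
StrongClassifier = Callable[[Sequence[float]], float]


def run_cascade(
    stages: Sequence[StrongClassifier],
    window: Sequence[float],
    threshold: float = 0.0,
) -> Tuple[float, int]:
    """Return (similarity, reliability) for one detection window.

    similarity:  score of the last stage that was evaluated.
    reliability: 1-based number of the last stage whose score exceeded the
                 threshold. Assumption: 0 if the first stage already rejects.
    """
    similarity = 0.0
    reliability = 0
    for stage_number, stage in enumerate(stages, start=1):
        similarity = stage(window)
        if similarity <= threshold:
            break  # arrow F: stop early, later stages are not evaluated
        reliability = stage_number  # arrow T: this stage accepted the window
    return similarity, reliability
```

With three toy stages that subtract increasing offsets from the pixel sum, a window that passes two stages and fails the third yields a reliability of 2, while a window that passes all three yields 3 together with the final-stage similarity.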

The determination means 36 uses the size (width, height) and center coordinates of the detection window regions, the similarity, and the reliability (the number of the strong classifier) stored as the determination information history 40 to make the final decision on where in the image people appear.

From the data in the determination information history 40, the entries whose reliability equals the number of the final-stage strong classifier in the cascade classifier 320 and whose similarity is greater than the threshold are extracted as person candidate regions. If there are no person candidate regions, the input image contains no person region and processing ends.

When person candidate regions exist, those that overlap by at least a given amount (for example, at least half the area of the detection window region) are grouped using the size and center coordinates of the detection window regions. Within each group, the detection window region with the highest similarity is selected as the person region, and information on the selected detection window regions is output to the output unit 5.

The storage unit 4 consists of a memory device such as a ROM or RAM and is connected so that it can be accessed from the signal processing unit 3. The storage unit 4 can store various programs and data, and reads and writes this information in response to requests from the signal processing unit 3. The storage unit 4 functions as the similarity/reliability management means 330 and stores, as the determination information history 40, four associated pieces of information: the size (width, height) of the detection window region, its center coordinates, the similarity, and the reliability. The determination information history 40 is read and written as needed by the detection window control means 32, the discriminator 34, and the determination means 36.

The output unit 5 can include sound output means that outputs a notification sound and display means that displays the input image. When the determination means 36 detects a person region, an alarm is sounded through sound output means such as a speaker or buzzer, or the input image is shown on an external display device. The output unit 5 may also include an interface for connecting the computer to a network or telephone line. In that case, the output unit 5 sends the input image and the person region information to a center device (not shown) via information transmission means such as a telephone line or the Internet. The center device is a host computer installed at a center or similar facility that monitors detection targets in images.

Processing in the object detection device 1 is described below with reference to the functional block diagram of FIG. 1 and the flowcharts of FIGS. 5 to 8.

In step S10, the image acquisition unit 2 acquires an image and transmits it to the signal processing unit 3. Images may be acquired at fixed time intervals or whenever some external request is received.

In step S20, the size (width and height) of the detection window region is determined. The size of the detection window region is changed in turn to each of a plurality of preset sizes, and the entire image is scanned with the detection window region at each size. Because the detection window region is rectangular in this embodiment, only the width and height need to be determined as its size; the detection window region may, however, have any shape, in which case both the shape and the size are determined. This processing is performed by the detection window region selection means 30.

Steps S30 to S50 are repeated until the entire image has been scanned with the detection window region at the size set in step S20.

In step S30, the scanning interval of the detection window region is determined from the reliability obtained when the previous detection window region was evaluated. As described above, the reliability is the number (stage count from the first stage) assigned to the last strong classifier in the cascade classifier 320 that output an identification result that the detection target is present (T in the figure). This processing is performed by the detection window control means 32.

Specifically, the reliability determined for the previous detection window region is read from the determination information history 40, and the corresponding scanning interval is selected from a correspondence table prepared in advance. FIG. 4 shows an example of the correspondence table 400.

The lower part of FIG. 2 shows, as graph 210, how the determined reliability changes when the detection window region is shifted one pixel at a time in the X direction at Y = y1 in the input image 200. As the window approaches a region where a person appears, the number (reliability) of the last stage in the cascade classifier 320 that outputs a result that the detection target is present increases, and as the window moves away from that region the number decreases. A low reliability therefore means that the image in the detection window region does not resemble a person, and a high reliability means that it does. Accordingly, the correspondence table 400 is set so that the scanning interval is larger for lower reliability and smaller for higher reliability. The scanning interval is greater than 0 pixels and smaller than the width or height of the detection window region.

For example, when the input image is 320 pixels wide and 240 pixels high, the detection window region is 64 pixels wide and 128 pixels high, and the cascade classifier 320 has 13 strong classifier stages, the values are set to V1 = 8, V2 = 8, V3 = 7, V4 = 7, V5 = 6, V6 = 6, V7 = 6, V8 = 5, V9 = 5, V10 = 5, V11 = 4, V12 = 4, and V13 = 4.
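A lookup along the lines of the correspondence table 400 might look like the sketch below. It assumes that Vk is the scanning interval in pixels for reliability k. The names `SCAN_INTERVALS` and `scan_interval` are hypothetical, and both the handling of reliabilities outside 1 to 13 and the clamping against the window width are assumptions added so that the function is defined for every input.

```python
# Scanning intervals V1..V13 from the example in the text, keyed by
# reliability. Assumption: Vk is the interval in pixels for reliability k.
SCAN_INTERVALS = {
    1: 8, 2: 8, 3: 7, 4: 7, 5: 6, 6: 6, 7: 6,
    8: 5, 9: 5, 10: 5, 11: 4, 12: 4, 13: 4,
}


def scan_interval(reliability: int, window_width: int, table=SCAN_INTERVALS) -> int:
    """Map a reliability to a scanning interval in pixels.

    Higher reliability gives a shorter interval. Assumption: reliabilities
    outside the table range are clamped to the nearest table entry.
    """
    key = min(max(reliability, min(table)), max(table))
    interval = table[key]
    # Keep the interval greater than 0 and smaller than the window width,
    # as required by the text.
    return max(1, min(interval, window_width - 1))
```

For the 64-pixel-wide window of the example, a window rejected early advances by 8 pixels, whereas a window that passes all 13 stages advances by only 4 pixels, so the scan is densest where a person is most likely.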

Immediately after the size of the detection window region is changed in step S20, that is, when scanning of the image with a new detection window region starts, the scanning interval is set to a preset initial value.

In step S40, the position of the detection window region is determined. This processing is performed by the detection window control means 32. The method of determining the position is described with reference to the flowchart of FIG. 6. The upper-left coordinates of the detection window region are called the start point; the start point of the detection window region to be determined is (SX, SY), and the start point of the previous detection window region is (BX, BY).

In step S401, it is determined whether the size of the detection window region has been changed. If the size has just been changed, the start point (SX, SY) of the detection window region is set to (0, 0) in step S405 and the process proceeds to step S50. If this is not the first pass after the size change, the process proceeds to step S402.

In step S402, the start point (BX, BY) of the previous detection window region is read. In step S403, it is determined whether scanning from the start point (BX, BY) read in step S402 has reached the right edge of the image. If the previous detection window region has reached the right edge of the image, that is, if the right edge BX + W of the detection window region (where W is the width of the detection window region) coincides with the right edge of the image, the start point of the detection window region is set to (SX, SY) = (0, BY + q) in step S406 and the process proceeds to step S50. Here q is a predetermined constant; for example, q = 4 pixels when the input image is 320 pixels wide and 240 pixels high. If the right edge has not been reached, the process proceeds to step S404.

In step S404, the start point (SX, SY) of the detection window region is determined by shifting the detection window region in the X direction by the scanning interval determined in step S30, without shifting it in the Y direction. That is, the start point is set to (SX, SY) = (BX + scanning interval determined in step S30, BY). The process then proceeds to step S50.
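Steps S401 to S406 can be sketched as a single function that returns the next start point. The function name `next_window_start` and its parameters are hypothetical. The text only covers the case where BX + W coincides exactly with the right edge; clamping the last window of a row to the right edge is an assumption added here so that a variable scanning interval cannot step past the image.

```python
from typing import Optional, Tuple


def next_window_start(
    prev_start: Optional[Tuple[int, int]],
    scan_interval: int,
    window_width: int,
    image_width: int,
    row_step: int = 4,           # q in the text (4 px for a 320x240 image)
    size_changed: bool = False,
) -> Tuple[int, int]:
    """Return the start point (SX, SY) of the next detection window."""
    # S401 / S405: first window after a size change starts at the origin.
    if size_changed or prev_start is None:
        return (0, 0)

    bx, by = prev_start          # S402: previous start point (BX, BY)

    # S403 / S406: previous window touched the right edge, so move to the
    # start of the next row, q pixels further down.
    if bx + window_width >= image_width:
        return (0, by + row_step)

    # S404: shift in X only. Assumption: clamp so the window stays inside
    # the image and the last window of the row ends at the right edge.
    sx = min(bx + scan_interval, image_width - window_width)
    return (sx, by)
```

Checking whether SY plus the window height still fits inside the image, which ends the scan at this window size, is left to the caller because the passage above does not describe it.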

In step S50, a similarity indicating how closely the image in the detection window region set in step S40 resembles a person is obtained. This is the identification processing of the cascade classifier 320 shown in FIG. 3 and is performed by the discriminator 34. The identification processing is described with reference to the flowchart of FIG. 7.

In step S501, HOG features are calculated from the current detection window region. This calculation can be made faster by computing, between step S10 and step S20, the edge strength and angle of every pixel of the input image and creating an integral image for each edge angle.
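The speed-up from per-angle integral images can be illustrated with a small pure-Python sketch. It is a sketch under assumptions, not the method of the source: the function names are hypothetical, and the nine orientation bins, the central-difference gradient, and the unsigned orientation range are choices the text does not specify. Once the integral images exist, the summed edge strength of any rectangle for a given bin costs four lookups, regardless of the rectangle size.

```python
import math
from typing import List

Image = List[List[float]]


def orientation_integral_images(gray: Image, n_bins: int = 9) -> List[Image]:
    """Build one integral image of edge strength per orientation bin.

    Each returned image has size (h + 1) x (w + 1) with a zero first row and
    column, so integral[y][x] is the sum over all pixels above and left.
    """
    h, w = len(gray), len(gray[0])
    integrals = [[[0.0] * (w + 1) for _ in range(h + 1)] for _ in range(n_bins)]
    for y in range(h):
        row_sums = [0.0] * n_bins  # running sums of the current row per bin
        for x in range(w):
            # Central differences, clamped at the image border (assumption).
            gx = gray[y][min(x + 1, w - 1)] - gray[y][max(x - 1, 0)]
            gy = gray[min(y + 1, h - 1)][x] - gray[max(y - 1, 0)][x]
            strength = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % math.pi  # unsigned orientation
            b = min(int(angle / math.pi * n_bins), n_bins - 1)
            row_sums[b] += strength
            for k in range(n_bins):
                integrals[k][y + 1][x + 1] = integrals[k][y][x + 1] + row_sums[k]
    return integrals


def bin_sum(integral: Image, x0: int, y0: int, x1: int, y1: int) -> float:
    """Sum of one bin over the rectangle [x0, x1) x [y0, y1)."""
    return integral[y1][x1] - integral[y0][x1] - integral[y1][x0] + integral[y0][x0]
```

A histogram for a cell of a detection window is then the list of `bin_sum` values over that cell for each bin; block normalization and the remaining HOG steps are outside this sketch.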

In step S502, it is determined whether all strong classifiers have been evaluated. If all have been evaluated, the process proceeds to step S505; if not, the process proceeds to step S503, where the next strong classifier is evaluated. In step S503, the similarity used to determine whether the image in the detection window region resembles a person is calculated.

In step S504, it is determined whether the similarity calculated in step S503 is greater than a preset threshold. If it is greater than the threshold, the process returns to step S502; if it is equal to or less than the threshold, the process proceeds to step S505. In step S505, the size (width and height) and center coordinates of the detection window region, the similarity, and the reliability are associated with one another and recorded as the determination information history 40, and the process proceeds to step S60.

In step S60, it is determined whether detection processing has finished for detection window regions of all sizes. If all sizes have been processed, the process proceeds to step S70; otherwise, the process returns to step S20.

In step S70, the positions at which a person appears are finally determined from the size (width and height), center coordinates, similarity, and reliability of each detection window region stored in the determination information history 40 obtained in step S50. This process is performed by the determination means 36.

Steps S701 and S702 are performed for every detection window region stored in the determination information history 40 obtained in step S50. In step S701, it is determined whether the reliability matches the identification number of the final-stage strong classifier in the cascade classifier 320 and the similarity is greater than the threshold. If both conditions are satisfied, the process proceeds to step S702; otherwise, the determination is made for the next detection window region. In step S702, the detection window region that satisfied the conditions in step S701 is added to the person candidate regions.
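Steps S701 and S702 amount to filtering the determination information history. The following sketch assumes a record layout matching the fields listed for step S505; the class name `WindowRecord`, the function name, and the parameter names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class WindowRecord:
    """One entry of the determination information history (assumed layout)."""
    cx: float
    cy: float
    width: float
    height: float
    similarity: float
    reliability: int

def select_candidates(history, final_stage, threshold):
    """Keep windows that reached the final stage (S701) with a similarity
    above the threshold, i.e. the person candidate regions of S702."""
    return [r for r in history
            if r.reliability == final_stage and r.similarity > threshold]
```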

As shown in FIG. 2, repeating steps S701 and S702 on the information stored in the determination information history 40 may yield several person candidate regions near the region where a person appears, such as regions 203 and 204. In steps S703 to S705, the person region is finally selected from these person candidate regions.

In step S703, using the size and center coordinates of the detection window regions extracted as person candidate regions, person candidate regions that overlap by a certain amount or more (for example, half or more of the area of the detection window region) are collected into groups. In step S704, for each group created in step S703, the detection window region with the highest similarity among the person candidate regions in the group is selected. In step S705, the detection window region selected in step S704 is taken as a person region, and the process proceeds to step S80.
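One way to realize steps S703 to S705 is sketched below. It is a simplification under stated assumptions: each candidate is a tuple (cx, cy, width, height, similarity), the overlap is measured against the smaller of the two window areas, and grouping is done greedily around the highest-similarity window of each group. The specification does not fix these details, and the function names are hypothetical.

```python
def overlap_ratio(a, b):
    """Intersection area divided by the smaller window area (an assumption)."""
    ax0, ay0, ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx0, by0, bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    return (iw * ih) / min(a[2] * a[3], b[2] * b[3])

def pick_person_regions(candidates, min_overlap=0.5):
    """Group overlapping candidates (S703) and keep the most similar
    window of each group as the person region (S704, S705)."""
    groups = []
    # Visiting candidates by descending similarity makes the first member
    # of every group the one with the highest similarity.
    for cand in sorted(candidates, key=lambda c: c[4], reverse=True):
        for group in groups:
            if overlap_ratio(group[0], cand) >= min_overlap:
                group.append(cand)
                break
        else:
            groups.append([cand])
    return [group[0] for group in groups]
```

For two heavily overlapping candidates around one person and a third candidate elsewhere, the function returns two person regions: the more similar of the overlapping pair, and the isolated one.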

In step S80, if step S70 has determined at least one person region, the detection window information of each determined person region is output together with the image as an abnormality signal. This process is performed by the output unit 5.

In this embodiment, the scanning interval in the X direction is changed according to the reliability determined by the decisions of the strong classifiers in the cascade classifier 320, but the scanning interval in the Y direction may be changed in the same way. For example, the reliabilities output by the cascade classifier 320 for the detection window regions on the previous scan line may be averaged, and the value of q used to determine the start point (SX, SY) of the next scan line may be changed according to that average.
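The Y-direction variant can be sketched as follows. The sketch only illustrates the idea of averaging the previous line's reliabilities and choosing a vertical step from that average. The lookup list `steps_by_level`, the default step, and the function name are hypothetical, and the returned step merely stands in for whatever quantity q controls.

```python
def vertical_step(prev_line_reliabilities, steps_by_level, default_step):
    """Choose the step to the next scan line from the previous line's
    average reliability.

    steps_by_level: list of (min_average_reliability, step) pairs
    (hypothetical table); a higher average maps to a shorter step.
    """
    if not prev_line_reliabilities:
        return default_step
    average = sum(prev_line_reliabilities) / len(prev_line_reliabilities)
    step = default_step
    for min_average, level_step in sorted(steps_by_level):
        if average >= min_average:
            step = level_step   # keep the entry with the highest level reached
    return step
```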

<Second Embodiment>
The second embodiment differs from the first embodiment only in the configuration of the classifier 34. As shown in FIG. 9, the classifier 34 of this embodiment has strong classifiers connected in parallel and determines a person region from the number of strong classifiers that judge a person to be present. That is, in the first embodiment the cascade classifier 320 terminates processing as soon as any strong classifier determines that the window does not contain a person, whereas in this embodiment every strong classifier performs identification. When the number of strong classifiers that determine that the detection window region contains a person exceeds the number that determine otherwise, the number of strong classifiers that determined that a person is present is output to the determination means 36 as the reliability.
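The parallel arrangement of the second embodiment can be sketched as a majority vote. In this sketch each strong classifier is assumed to be a function returning True for "person"; returning `None` as the reliability when the vote fails is an assumption, since the text specifies the output only for the majority case. The function name is hypothetical.

```python
def vote(features, classifiers):
    """Run every strong classifier and count the "person" decisions.

    Returns (is_person, reliability). The reliability is the number of
    positive votes when they form a majority, otherwise None (assumption).
    """
    positives = sum(1 for classify in classifiers if classify(features))
    negatives = len(classifiers) - positives
    if positives > negatives:
        return True, positives
    return False, None
```

Unlike the cascade, no classifier is skipped, so the cost per window is constant and the reliability can take any value above half the number of classifiers.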

<Third Embodiment>
The third embodiment differs from the first embodiment only in the configuration of the classifier 34. The classifier 34 of this embodiment stores in the storage unit 4, in advance and as a template, feature information including parameters such as the shape, luminance, and edge information of a person in an image. It calculates the similarity between the image of the detection window region and the feature information, and outputs that similarity to the determination means 36 as the reliability of the person-likeness of the detection window region.
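The third embodiment does not state how the similarity to the template is computed. The sketch below uses zero-mean normalized cross-correlation purely as a stand-in measure, and assumes the window has already been resized to the template size. The function name is hypothetical.

```python
import numpy as np

def template_similarity(window, template):
    """Zero-mean normalized cross-correlation in [-1, 1].

    Used here as an assumed similarity measure; the specification does
    not name one. Returns 0.0 when either input has no variation.
    """
    w = np.asarray(window, dtype=np.float64).ravel()
    t = np.asarray(template, dtype=np.float64).ravel()
    if w.shape != t.shape:
        raise ValueError("window and template must have the same size")
    w = w - w.mean()
    t = t - t.mean()
    denom = np.linalg.norm(w) * np.linalg.norm(t)
    return float(w @ t / denom) if denom > 0 else 0.0
```

Under this assumption the returned value would be passed on directly as the reliability, so a threshold or a lookup table on it could drive the scanning interval in the same way as the stage count does in the first embodiment.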

As described above, each of the above embodiments makes it possible to reduce the processing load of object detection from an image while preventing missed detections.

Although the embodiments have been described as realizing the functions of the parts of the object detection device 1 on a single computer, the invention is not limited to this. The functions of the parts of the object detection device 1 can be realized by controlling a general-purpose computer with a program. These functions may be combined as appropriate and processed by one computer, or distributed across a plurality of computers connected via a network or the like.

FIG. 1 is a functional block diagram showing the configuration of the object detection device according to an embodiment of the present invention.
FIG. 2 is a diagram explaining the object detection process.
FIG. 3 is a diagram showing the configuration of the classifier according to an embodiment of the present invention.
FIG. 4 is a diagram showing an example of the correspondence table according to an embodiment of the present invention.
FIG. 5 is a flowchart showing the processing in the object detection device according to an embodiment of the present invention.
FIG. 6 is a flowchart showing the position setting process for the detection window region in the object detection device according to an embodiment of the present invention.
FIG. 7 is a flowchart showing the identification process in the object detection device according to an embodiment of the present invention.
FIG. 8 is a flowchart showing the determination process in the object detection device according to an embodiment of the present invention.
FIG. 9 is a diagram showing the configuration of the classifier according to an embodiment of the present invention.

Explanation of symbols

1 object detection device, 2 image acquisition unit, 3 signal processing unit, 4 storage unit, 5 output unit, 30 detection window region selection means, 32 detection window control means, 34 classifier, 36 determination means, 40 determination information history, 200 input image, 201, 202 detection window regions, 203, 204 regions, 310 detection window extraction means, 320 cascade classifier, 321, 322, 323 strong classifiers, 330 similarity and reliability management means, 400 correspondence table.

Claims

An object detection device for detecting a detection target from an image, comprising:
an image acquisition unit that acquires the image;
a classifier that takes in, from the image input from the image acquisition unit, a detection window region image on which a detection window is set, and outputs a determination as to whether or not the detection target is present in the detection window region image together with a reliability indicating the degree of certainty, in that determination, that the detection target is present;
detection window control means that sets the subsequent scanning interval of the detection window to be shorter as the reliability is higher; and
determination means that determines whether or not the detection target is present based on the output from the classifier.

The object detection device according to claim 1, wherein the detection window control means sets the scanning interval within a range that does not exceed the size of the detection window.

The object detection device according to claim 1, wherein the classifier outputs a similarity indicating the degree to which the detection window region image resembles the detection target, and
the determination means determines, from among detection window region images that have a reliability equal to or higher than a predetermined value and that overlap one another, a detection window region image having a high similarity to be a detection target image containing an image of the detection target.

The classifier includes a cascade classifier in which a plurality of strong classifiers are connected in series, each of the strong classifiers outputting an identification result indicating whether or not the detection target is present in the detection window region image, and
the reliability is defined as the number of stages, counted from the first stage, of the last strong classifier in the cascade classifier that output an identification result that the detection target is present. The object detection apparatus according to one.