JP6266461B2

JP6266461B2 - Object detection device

Info

Publication number: JP6266461B2
Application number: JP2014155853A
Authority: JP
Inventors: 叶秋李; 秀紀氏家; 正則小野塚; 佐藤　昌宏; 昌宏佐藤; 陽介村井
Original assignee: Secom Co Ltd
Current assignee: Secom Co Ltd
Priority date: 2014-07-31
Filing date: 2014-07-31
Publication date: 2018-01-24
Anticipated expiration: 2034-07-31
Also published as: JP2016033717A

Description

The present invention relates to an object detection device that detects a predetermined object from an input image.

To detect the image of an object such as a person in an input image captured by a surveillance camera or the like, a search is performed using a classifier or similar means. In this search, a window region is set at each position in the input image, and the image within the window region is input to the classifier. Window regions whose score output by the classifier exceeds a threshold are extracted as candidate regions for the object. Because candidate regions also tend to be extracted around the true object, overlapping candidate regions are grouped, and the candidate region with the highest score in each group is selected as the object region.

Conventionally, a predetermined common value has been used for every input image as the score threshold for extracting candidate regions.

JP 2010-160640 A

However, the score tends to vary with the shooting environment of the input image (background complexity, lighting conditions, resolution, and so on), yet a common threshold was used for every input image. Depending on the shooting environment, candidate regions that erroneously capture part of the background can therefore appear frequently, which lowers the detection accuracy for the object.

Specifically, depending on the shooting environment, a group may consist solely of candidate regions that contain only background, so that the background is ultimately taken as an object region, or an erroneously extracted background candidate region may span several objects, so that they are detected as a single object. Such cases become more frequent, and detection accuracy falls.

However, if the threshold is tightened in advance to suit a shooting environment prone to erroneous extraction, missed detections of the object increase in other shooting environments.

This problem is described using the examples shown in FIGS. 8 and 9. FIG. 8 is a schematic diagram of an image 900 captured in a relatively simple environment, and FIG. 9 is a schematic diagram of an image 910 captured in a complex environment. The rectangles in each image indicate candidate regions. FIGS. 8(a) and 9(a) show candidate regions extracted with a relatively low threshold. In the image 900 of FIG. 8(a), candidate regions 901a, 901b, and 901c are extracted around the small (distant) person on the left, around the desk and chair in the center, and around the large (nearby) person on the right, respectively. In the image 910 of FIG. 9(a), as in the image 900, candidate regions 911a, 911b, and 911c are extracted around the small person on the left, the desk and chair in the center, and the large person on the right, respectively. The candidate regions 901b and 911b are erroneous extractions in which the area around the chair, which is background, was taken as a person. In the image 910 of the complex environment, the persons on the left and right are partly occluded by desks and other background objects, and their scores are correspondingly lower.

In this example, one might raise the extraction threshold to remove the erroneously extracted candidate regions 901b and 911b. FIGS. 8(b) and 9(b) show candidate regions extracted with a common threshold higher than that of FIGS. 8(a) and 9(a). Because erroneously extracted candidate regions generally have low scores, raising the threshold can remove the candidate regions 901b and 911b. As a result, the candidate regions 901a and 901c, which contain persons, remain in the image 900 shown in FIG. 8(b). In the image 910 shown in FIG. 9(b), however, the heavily occluded candidate region 911a, which corresponds to a person, is also removed, and that person goes undetected.

Thus, it has been difficult to detect objects accurately in input images captured in various environments using a common threshold.

The present invention was made in view of the above problem. It is based on the finding that the scores of the extracted candidate regions vary widely when erroneous extractions are mixed in among them, and its object is to provide an object detection device that uses this finding to detect objects accurately in input images captured in various environments.

An object detection device according to the present invention detects an object region in which a predetermined object appears in an input image, and comprises: a storage unit that stores in advance an index value calculation function for calculating, from feature amounts extracted at various locations in the input image, an index value representing the likelihood that the object is present in a region of interest set in the input image; an index value calculation unit that sets the region of interest at a plurality of positions in the input image and calculates the index value for each region of interest using the index value calculation function; a candidate region extraction unit that extracts, as target candidate regions, the regions of interest whose index value exceeds a predetermined first threshold; and an object region determination unit that determines the object region using the target candidate regions extracted by the candidate region extraction unit. When the degree of variation of the index values that exceed the first threshold is equal to or greater than a predetermined erroneous extraction estimation threshold, the candidate region extraction unit sets a second threshold larger than the first threshold and removes from the target candidate regions those whose index value is equal to or less than the second threshold.

In another object detection device according to the present invention, the candidate region extraction unit sets, as the second threshold, a value that is smaller than a representative value of the index values exceeding the first threshold and whose difference from the first threshold increases with the representative value.

In yet another object detection device according to the present invention, the candidate region extraction unit sets the second threshold by adding a value that depends on the degree of variation to the first threshold.

In a further object detection device according to the present invention, when the degree of variation of the index values is equal to or less than an extraction omission estimation threshold, which is set in advance lower than the erroneous extraction estimation threshold, the candidate region extraction unit lowers the first threshold by a predetermined amount and re-extracts the target candidate regions.

According to the present invention, the extraction threshold for candidate regions can be set adaptively for each input image, so objects can be detected accurately in input images captured in various environments.

FIG. 1 is a schematic block diagram of a person detection device according to an embodiment of the present invention.
FIG. 2 is a schematic diagram showing examples of an input image and reduced images.
FIG. 3 is a schematic graph plotting candidate region scores on the horizontal axis and their frequency on the vertical axis.
FIG. 4 is a flowchart showing the general operation of the person detection device according to the embodiment of the present invention.
FIG. 5 is a schematic image illustrating the subsequent processing applied to the candidate regions extracted by the score calculation unit.
FIG. 6 is a schematic processing flowchart of the candidate region deletion unit.
FIG. 7 is a schematic diagram illustrating the downward adjustment of the first threshold.
FIG. 8 is a schematic diagram of an input image captured in a relatively simple environment.
FIG. 9 is a schematic diagram of an input image captured in a complex environment.

Embodiments of the present invention (hereinafter, the embodiment) are described below with reference to the drawings. The object detection device according to this embodiment is a person detection device 1 whose detection target is a person appearing in an image.

[Configuration example]
FIG. 1 is a schematic block diagram of the person detection device 1 according to the embodiment. The person detection device 1 comprises an image input unit 2, a control unit 3, a storage unit 4, and an output unit 5. The image input unit 2, the storage unit 4, and the output unit 5 are connected to the control unit 3.

The image input unit 2 is, for example, an imaging device such as a surveillance camera, or a recording device such as a digital video recorder holding recorded video, and it outputs images to the control unit 3. An image input from the image input unit 2 to the control unit 3 is hereinafter called an input image. As described later, the control unit 3 detects the target in each input image individually, so the image input unit 2 need not supply a sequence of frames to the control unit 3. Even when images are input sequentially, they need not come from a camera installed at a fixed location; they may come from a camera mounted on a moving body. The input image may also be captured by a camera capable of pan, tilt, and zoom, such as a PTZ camera.

The control unit 3 is built from an arithmetic device such as a CPU (Central Processing Unit) or a DSP (Digital Signal Processor). It processes the input image from the image input unit 2, determines whether a person is present, and outputs the determination result and related information to the output unit 5. To do so, the control unit 3 reads a program from the storage unit 4, executes it, and functions as an image reduction unit 30, a feature amount calculation unit 31, a score calculation unit 32, a candidate region deletion unit 33, a region group generation unit 34, and an object region calculation unit 35.

Because persons appear in the input image at various sizes, the image reduction unit 30 reduces the input image at a plurality of preset scale factors. This makes it possible to detect persons of various sizes without changing the size of the window region used for detection. For example, the image reduction unit 30 reduces the input image successively at fixed intervals until a predetermined minimum width or height is reached, generating a reduced image at each step. The scale factors are set, for example, in 10 steps per halving of the width and height. In FIG. 2, the image 100 shown in (a) is the input image at its original size, and the images 110 and 120 shown in (b) and (c) are examples of reduced versions of the image 100.
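
The reduction schedule described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `pyramid_scales`, the geometric spacing of the steps, and the example image and minimum sizes are all assumptions; the text only states that there are, for example, 10 steps per halving and that reduction stops at a minimum width or height.

```python
def pyramid_scales(width, height, min_width, min_height, steps_per_halving=10):
    """Return scale factors (1.0 first) for successive reductions.

    Assumption: steps are spaced geometrically, so that after
    `steps_per_halving` steps the width and height are halved.
    Reduction stops before either side falls below its minimum.
    """
    ratio = 0.5 ** (1.0 / steps_per_halving)
    scales = []
    scale = 1.0
    while width * scale >= min_width and height * scale >= min_height:
        scales.append(scale)
        scale *= ratio
    return scales


# Hypothetical 640x480 input with a 64x128 minimum size.
scales = pyramid_scales(640, 480, 64, 128)
```

Each scale factor would be applied to the original input image, so a fixed-size window covers a progressively larger part of the scene.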

The feature amount calculation unit 31 divides the original-size input image and each reduced input image into blocks of a predetermined size and calculates a feature amount for the image in each block. Known feature amounts such as Histograms of Oriented Gradients (HOG), Local Binary Pattern (LBP), and Haar-like features can be used, alone or in combination.

The score calculation unit 32 is an index value calculation unit. At each position in the original-size and reduced input images, it sets a window region (region of interest) of a predetermined person size as a frame for detecting a person, and it calculates a score, a multi-valued index value representing the likelihood that the target is present in that window region, by applying a pre-learned index value calculation function to feature amounts obtained from the input image. For example, the score calculation unit 32 inputs the feature amounts inside each window region to the index value calculation function to calculate the score for that window region. Alternatively, to allow for posture variations in which a person's arm or other part extends beyond the window region, it inputs feature amounts from inside and outside the window region, covering a predetermined range around it, to the index value calculation function.

In FIG. 2, the dotted lines show examples of the rectangular window region 101 set in the images 100, 110, and 120. The score calculation unit 32 sets the window region 101 repeatedly, shifting it a little each time, so as to scan the entire image. For example, scanning starts at the upper left of the image and proceeds horizontally, and the horizontal scan is repeated while the vertical position is shifted a little at a time.
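
The raster scan of the window region can be sketched as below. The function name `window_positions` and the `stride` parameter are illustrative assumptions; the text only says that the window is shifted "a little at a time", horizontally first and then vertically.

```python
def window_positions(img_w, img_h, win_w, win_h, stride):
    """Yield the top-left (x, y) of each window position.

    The scan starts at the upper left, moves horizontally, then
    repeats one stride further down, as in the description above.
    `stride` is a hypothetical shift amount in pixels.
    """
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield x, y


# Hypothetical 10x8 image scanned with a 4x4 window and a stride of 2.
positions = list(window_positions(10, 8, 4, 4, 2))
```

Running the same scan over every reduced image lets one fixed window size cover persons of different apparent sizes.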

In this embodiment, the index value calculation function is a classifier that distinguishes the detection target, a person, from everything else. The classifier is trained in advance on many images that show a person and many images that do not, and is stored in a classifier storage unit 40 described later. The score calculation unit 32 calculates the score by supplying the classifier with the feature amounts corresponding to the position of the window region.

Together with the candidate region deletion unit 33 described later, the score calculation unit 32 also serves as a candidate region extraction unit that obtains candidate regions (target candidate regions) in which a person may be present in the input image. Specifically, the score calculation unit 32 stores the rectangle information (position, width, and height in the input image, and score) of each window region whose score exceeds a predetermined first threshold T₁ as a candidate region in a candidate region storage unit 41 described later.

Based on the candidate region information stored in the candidate region storage unit 41, the candidate region deletion unit 33 sets a second score threshold T₂ larger than the first threshold T₁ and uses T₂ to delete candidate regions and narrow them down.

Candidate region scores tend to vary with the shooting environment, such as background complexity, lighting conditions, and resolution, and this variation can increase erroneous extractions. The spread of the scores also tends to differ significantly between cases with many erroneous extractions and cases with few. The candidate region deletion unit 33 therefore sets the second threshold T₂ dynamically according to the score distribution of the candidate regions obtained for each input image.

FIG. 3 illustrates how the score distribution of the candidate regions differs between environments. It is a schematic graph with the candidate region score on the horizontal axis and its frequency on the vertical axis. FIG. 3(a) shows the score distribution for an input image with a person against a simple background, while FIGS. 3(b) and 3(c) show distributions for input images with a person against a complex background. FIG. 3(b) is, for example, a case where the person is partly occluded, and FIG. 3(c) is a case where the complex background contains regions with person-like features.

With a simple background, the score frequency distribution concentrates at high scores (peak 300a in FIG. 3(a)). With a complex background in which background is erroneously extracted as a person (FIG. 3(c)), the distribution spreads toward scores lower than the peak 300a of FIG. 3(a). Experiments confirmed, for example, that a lower-frequency peak 301c appears at lower scores than the peak 300c, which corresponds to the peak 300a of FIG. 3(a).

In other words, when the shooting environment causes many erroneous extractions, the score distribution spreads widely. The candidate region deletion unit 33 therefore judges from the score distribution whether many erroneous extractions are included, and if so uses a second threshold T₂ higher than the first threshold T₁ to remove the erroneously extracted candidate regions.

To this end, the candidate region deletion unit 33 calculates the degree of variation of the score distribution over the candidate regions that the score calculation unit 32 stored in the candidate region storage unit 41, that is, over the window regions whose score exceeds the first threshold T₁, and compares it with an erroneous extraction estimation threshold. If the degree of variation is equal to or greater than the erroneous extraction estimation threshold, the unit assumes that many erroneous extractions are included, sets the second score threshold T₂, and deletes from the candidate region storage unit 41 the candidate regions whose score is T₂ or less. If the degree of variation is less than the erroneous extraction estimation threshold, no deletion is performed.

The candidate region deletion unit 33 calculates, for example, the variance σ² as the degree of variation. The erroneous extraction estimation threshold V_H for the variance can be set to, for example, 0.04.
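
The variance gate can be sketched as follows, under stated assumptions. The function name `filter_candidates` is hypothetical, the second threshold `t2` is passed in rather than derived, and the population variance is used because the text does not say whether the variance is the population or the sample form.

```python
from statistics import pvariance


def filter_candidates(scores, t2, v_h=0.04):
    """Apply the second threshold only when the scores are widely spread.

    scores: scores of the candidate regions (all already above T1).
    t2:     second threshold, supplied by the caller in this sketch.
    v_h:    erroneous extraction estimation threshold on the variance.
    """
    if len(scores) < 2 or pvariance(scores) < v_h:
        # Spread is small: assume few erroneous extractions, delete nothing.
        return list(scores)
    # Spread is large: delete candidates whose score is T2 or less.
    return [s for s in scores if s > t2]


tight = filter_candidates([1.0, 1.1, 0.9], t2=0.5)
spread = filter_candidates([0.1, 0.2, 1.0, 1.1, 1.2], t2=0.5)
```

In the tight case all three candidates survive even though a threshold was supplied, which is the behaviour that protects simple scenes from unnecessary deletion.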

When part of a person is occluded against a complex background, that person's score drops, and a low peak such as 301b in FIG. 3(b) can appear. The degree of variation can therefore reach the erroneous extraction estimation threshold even when erroneous extractions are few, so it is desirable to control the second threshold T₂ so that the low peak 301b is not deleted by mistake. The peak 301b of the person with the lowered score appears near the peak of the score distribution, and that peak can be approximated by the mean of the score distribution. That is, although a large variance σ² means that the candidate regions include a relatively large number of erroneous extractions, every candidate region has a score exceeding at least the first threshold T₁, and correctly extracted candidate regions dominate. The mean score m₁ of all candidate regions therefore lies near the peak of the frequency distribution formed by the scores of the correctly extracted candidate regions (the peak 300 in FIG. 3). The candidate region deletion unit 33 accordingly sets the second threshold T₂ according to the mean m₁ of the score distribution. For example, with a preset coefficient κ₀, it can set T₂ = κ₀ m₁. κ₀ is greater than 0 and less than 1, and can be set to 0.4, for example.
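
A small numeric sketch of the rule T₂ = κ₀ m₁ follows. The function name and the score values are invented for illustration; they mimic a main peak near 1.0, an occluded person near 0.6, and background false extractions near 0.1.

```python
from statistics import mean


def second_threshold_from_mean(scores, kappa0=0.4):
    """Return T2 = kappa0 * m1, where m1 is the mean candidate score."""
    return kappa0 * mean(scores)


# Hypothetical candidate scores, all above a first threshold of 0.
scores = [1.0, 1.1, 0.9, 0.6, 0.65, 0.1, 0.15]
t2 = second_threshold_from_mean(scores)
kept = [s for s in scores if s > t2]
```

With these values T₂ is roughly 0.26, so the two low-scoring background candidates are deleted while the occluded-person scores around 0.6 are kept.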

Furthermore, the second threshold T₂ that depends on the mean m₁ can be set relative to the first threshold T₁. Specifically, the candidate region deletion unit 33 sets T₂ to a value that is smaller than the mean m₁ of the window-region scores exceeding T₁ and whose difference from T₁ grows with m₁. For example, let m₂ be the mean score of the candidate regions whose scores fall in the lowest ξ%. In a shooting environment with many erroneous extractions, the candidate scores vary more widely and m₂ becomes lower. T₂ can therefore be determined from the difference between m₁ and m₂, which varies with the shooting environment, by the following equation:
T₂ = T₁ + κ₁ (m₁ − m₂)   (1)

κ₁ is a positive coefficient and can be set to 2/3, for example. The coefficient κ₁ should be chosen through prior experiments to be low enough that T₂ at least does not remove the highest-scoring candidate of a region group containing a true person region. ξ can be set to 10 (%), for example. According to equation (1), in a shooting environment with many erroneous extractions, the second threshold T₂ is set higher, so candidate regions that falsely detect the background are removed more readily. In a shooting environment with few erroneous extractions, T₂ is set lower, so erroneously extracted background candidates can be removed while missed detections of person regions are kept down.
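
Equation (1) can be sketched as below. The function name, the rounding rule for the "lowest ξ%" subset (round up, with at least one candidate), and the example scores are assumptions; the defaults κ₁ = 2/3 and ξ = 10 follow the example values in the text.

```python
import math
from statistics import mean


def second_threshold_eq1(scores, t1=0.0, kappa1=2 / 3, xi=10.0):
    """Return T2 = T1 + kappa1 * (m1 - m2) as in equation (1).

    m1 is the mean of all candidate scores; m2 is the mean of the
    lowest xi percent. Assumption: the subset size is rounded up and
    contains at least one candidate.
    """
    ordered = sorted(scores)
    n_low = max(1, math.ceil(len(ordered) * xi / 100.0))
    m1 = mean(ordered)
    m2 = mean(ordered[:n_low])
    return t1 + kappa1 * (m1 - m2)


# One low outlier among ten candidates raises T2 above T1.
t2_spread = second_threshold_eq1([0.1] + [1.0] * 9)
# Identical scores give m1 == m2, so T2 stays at T1.
t2_tight = second_threshold_eq1([1.0] * 10)
```

The two calls illustrate the behaviour described above: a wide spread between m₁ and m₂ pushes T₂ up, while a tight distribution leaves it at the first threshold.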

Alternatively, the candidate region deletion unit 33 can set the second threshold T₂ by adding a value that depends on the degree of variation to the first threshold T₁. In this case, for example, the candidate region deletion unit 33 determines T₂ from the variance σ² by the following equation:
T₂ = T₁ + κ₂ · SQRT(σ²)   (2)

Here, SQRT() denotes the square root. κ₂ is a positive coefficient, and it should be set through experiments low enough that T₂ at least does not remove the highest-scoring candidate of a region group containing a true person region.
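
The variance-based rule of equation (2) can be sketched as follows. The function name is hypothetical, the default `kappa2=0.5` is a placeholder because the text gives no example value for this coefficient, and the population variance is again an assumption.

```python
import math
from statistics import pvariance


def second_threshold_eq2(scores, t1=0.0, kappa2=0.5):
    """Return T2 = T1 + kappa2 * sqrt(variance) as in equation (2).

    kappa2 = 0.5 is a placeholder, not a value from the source.
    """
    return t1 + kappa2 * math.sqrt(pvariance(scores))


# Two hypothetical scores with a population variance of exactly 1.0.
t2 = second_threshold_eq2([0.0, 2.0])
```

Because the added term is the standard deviation scaled by κ₂, T₂ rises in proportion to the spread of the candidate scores.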

As the degree of score variation, the standard deviation of the candidate region scores or the difference between their maximum and minimum values can be used instead of the variance described above. Other representative values, namely the median or the mode, can also be used in place of the means m₁ and m₂.

The region group generation unit 34 and the object region calculation unit 35 together form an object region determination unit that determines person regions (object regions) from the candidate regions left by the candidate region deletion unit 33.

The region group generation unit 34 groups the candidate regions stored in the candidate region storage unit 41 that originate from the same person. Specifically, it generates region group information by assigning the same label to candidate regions that overlap one another by a predetermined amount or more. The label assigned to each candidate region is stored in the candidate region storage unit 41 together with its rectangle information and score.
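
One way to sketch the labelling step is shown below. The overlap measure (intersection over union), the 0.5 threshold, the transitive merging with a union-find structure, and the function names are all assumptions; the text only requires that candidates overlapping by a predetermined amount or more share a label.

```python
def iou(a, b):
    """Intersection over union of two (x, y, width, height) rectangles."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)


def label_groups(rects, min_overlap=0.5):
    """Assign one label per group of sufficiently overlapping rectangles."""
    parent = list(range(len(rects)))

    def find(i):
        # Follow parent links to the group root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if iou(rects[i], rects[j]) >= min_overlap:
                parent[find(j)] = find(i)

    # Map each root to a consecutive label in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(rects))]


labels = label_groups([(0, 0, 10, 10), (1, 1, 10, 10), (100, 100, 10, 10)])
```

Here the first two rectangles overlap heavily and receive the same label, while the distant third rectangle forms its own group.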

The object region calculation unit 35 derives the final person regions from the candidate regions stored in the candidate region storage unit 41. It determines one person region for each group produced by the region group generation unit 34 and stores the region information of that person region, together with its score, in an object region storage unit 42. For example, it selects the candidate region with the highest score in each region group as the final person region. Alternatively, it calculates the final person region for each region group by averaging the candidate regions that make up the group.
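
The first of the two selection rules, taking the highest-scoring candidate per group, can be sketched as follows. The function name and the `(label, rect, score)` tuple layout are illustrative assumptions.

```python
def select_person_regions(candidates):
    """Pick the highest-scoring candidate in each labelled group.

    candidates: iterable of (label, rect, score) tuples, where rect is
    (x, y, width, height). Returns {label: (rect, score)}.
    """
    best = {}
    for label, rect, score in candidates:
        if label not in best or score > best[label][1]:
            best[label] = (rect, score)
    return best


# Hypothetical labelled candidates: two in group 0, one in group 1.
regions = select_person_regions([
    (0, (0, 0, 10, 10), 0.8),
    (0, (1, 1, 10, 10), 1.2),
    (1, (100, 100, 10, 10), 0.5),
])
```

The averaging alternative mentioned in the text would instead combine the rectangles of each group; it is not shown here.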

If at least one final person region is detected in the input image, the control unit 3 outputs that information to the output unit 5.

記憶部４はＲＯＭ（Read Only Memory）、ＲＡＭ（Random Access Memory）、ハードディスク等の記憶装置であり、制御部３で使用されるプログラムやデータを記憶する。記憶部４はこれらプログラム、データを制御部３との間で入出力する。記憶部４は識別器格納部４０、候補領域格納部４１及び対象物領域格納部４２としての機能を有する。 The storage unit 4 is a storage device such as a ROM (Read Only Memory), a RAM (Random Access Memory), and a hard disk, and stores programs and data used by the control unit 3. The storage unit 4 inputs and outputs these programs and data to and from the control unit 3. The storage unit 4 has functions as a discriminator storage unit 40, a candidate region storage unit 41, and an object region storage unit 42.

The discriminator storage unit 40 stores in advance an index value calculation function and the first threshold T_1. The index value calculation function calculates a score representing the likelihood that the target is present in a window area set in the input image, using the feature amounts extracted in each block of the input image. As already described, the index value calculation function is a discriminator; specifically, the discriminator storage unit 40 stores discriminator parameters obtained by applying a support vector machine (SVM) to previously collected training images of people and training images of non-people. When a linear SVM is used as the learning algorithm, the discriminator parameter is a weight vector generated from the training images, with one weight for each element of the feature amount. The weight vector is adjusted during learning so that a training image is classified as a person when the inner product of the weight vector and the feature amount extracted from that image is greater than 0, and as a non-person when it is 0 or less. The inner product of the weight vector and the feature amount of the input image is the score. In principle, therefore, the threshold that separates person scores from non-person scores is 0, and the first threshold T_1 can normally be set to 0. However, T_1 may be set to a value smaller than 0 in order to reduce errors in which a person is classified as a non-person.
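
As a rough illustration of the linear-SVM case, the score of one window area is the inner product of the learned weight vector and the window's feature amount, and the window becomes a candidate when the score exceeds T_1. The function names and the plain-list representation below are assumptions for illustration; the text mentions no bias term, so none is included here.

```python
def linear_score(weights, features):
    """Inner product of the weight vector and the feature amount."""
    if len(weights) != len(features):
        raise ValueError("weight vector and feature amount must have equal length")
    return sum(w * f for w, f in zip(weights, features))


def is_candidate(weights, features, t1=0.0):
    """A window area is kept as a candidate when its score exceeds T_1."""
    return linear_score(weights, features) > t1
```

Setting `t1` below 0 admits windows with slightly negative scores, which trades more background candidates for fewer missed persons.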

識別器の学習アルゴリズムにはＳＶＭの他、アダブースト（AdaBoost）法など、従来知られた各種のものを用いることができる。 As the learning algorithm of the discriminator, various conventionally known ones such as the AdaBoost method can be used in addition to the SVM.

A pattern matching device can also be used instead of the discriminator. In that case the score is, for example, the reciprocal of the distance between the average pattern of the feature amounts extracted from training images of people and the feature amount of the input image, and the index value calculation function can be a function that takes the feature amount of the input image as its input and returns that score as its output.
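
A pattern-matching score of this kind might look like the sketch below. The Euclidean distance and the small constant `eps` that avoids division by zero are assumptions; the text only states that the score is something like the reciprocal of a distance.

```python
import math


def pattern_score(mean_pattern, features, eps=1e-9):
    """Reciprocal of the distance between the average person pattern
    and the feature amount of a window area (larger means more person-like).

    eps is a hypothetical guard against a zero distance.
    """
    distance = math.dist(mean_pattern, features)  # Euclidean distance (assumed metric)
    return 1.0 / (distance + eps)
```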

候補領域格納部４１は、スコア算出部３２により得られた人物の候補領域の情報、及び領域グループ生成部３４により得られた領域グループを示すラベル情報を格納する。なお、候補領域の情報は上述したように窓領域の位置・寸法、及びスコアである。例えば、入力画像における窓領域の位置として窓領域をなす矩形の左上の座標が格納される。 The candidate area storage unit 41 stores information on the candidate area of the person obtained by the score calculation unit 32 and label information indicating the area group obtained by the area group generation unit 34. The candidate area information includes the position / size and score of the window area as described above. For example, the upper left coordinates of the rectangle forming the window area are stored as the position of the window area in the input image.

対象物領域格納部４２は、対象物領域算出部３５により最終的に人物がいると判定された人物領域の情報を格納する。人物領域の情報は、候補領域の情報と同様、入力画像における人物領域の矩形情報（矩形の左上の座標、及び寸法）とスコアである。 The object area storage unit 42 stores information on a person area that is finally determined by the object area calculation unit 35 to have a person. The information on the person area is the rectangular information (the coordinates and size of the upper left corner of the rectangle) and the score of the person area in the input image, similarly to the information on the candidate area.

出力部５は対象物領域算出部３５の結果を受けて、ディスプレイなどの外部表示装置に入力画像を表示したり、異常信号をセンタ装置へ送出したりする。 The output unit 5 receives the result of the object area calculation unit 35 and displays an input image on an external display device such as a display or sends an abnormal signal to the center device.

[Example of operation]
Next, the operation of the person detection device 1 will be described. FIG. 4 is a flowchart showing the schematic operation of the person detection device 1. When an image is input from the image input unit 2 (step S10), the control unit 3 has the image reduction unit 30 reduce the input image at each of a plurality of magnifications to create reduced images (step S20). For example, as illustrated in FIG. 2, reduced images 110 and 120 are generated from the input image 100.

特徴量算出部３１は入力画像及び複数の縮小画像それぞれについて、画像内の各所における特徴量を計算する（ステップＳ３０）。 The feature quantity calculation unit 31 calculates the feature quantity at each location in the image for each of the input image and the plurality of reduced images (step S30).

The score calculation unit 32 uses the feature amounts calculated by the feature amount calculation unit 31 and the discriminator stored in the discriminator storage unit 40 to calculate a score for each window area set at each location in the image, and stores those window areas whose score exceeds the first threshold T_1 in the candidate area storage unit 41 as candidate areas for a person (step S40).

図２では、窓領域１０１を点線の矩形で示し、候補領域の例を窓領域に応じた大きさの実線の矩形で示している。画像１００では左側の小さな（遠くの）人物像の辺りに候補領域１０２ａ，１０２ｂが抽出されている。また、画像１１０では中央の机・椅子の辺りに候補領域１１２が検出され、画像１２０では右側の大きな（近くの）人物像の辺りに候補領域１２２ａ，１２２ｂが抽出されている。なお、図２に示すように、人物などの１つの像に対し、重複した複数の候補領域が抽出され得る。 In FIG. 2, the window area 101 is indicated by a dotted rectangle, and an example of a candidate area is indicated by a solid rectangle having a size corresponding to the window area. In the image 100, candidate regions 102a and 102b are extracted around a small (far) person image on the left side. In the image 110, a candidate area 112 is detected around a central desk / chair, and in the image 120, candidate areas 122a and 122b are extracted around a large (near) person image on the right side. As shown in FIG. 2, a plurality of overlapping candidate regions can be extracted for one image such as a person.

FIG. 5 shows schematic images that explain the processing applied after the score calculation unit 32 has extracted the candidate areas. The images in FIG. 5 show the same scene as FIG. 2, and the image 130 in FIG. 5(a) displays the candidate areas of the images 100, 110, and 120 together on one image. The image 130 is the same size as the input image 100, so the candidate areas 102a and 102b of the image 100 become the candidate areas 131a and 131b on the image 130 at unchanged magnification. The candidate areas 112, 122a, and 122b found in the reduced images, on the other hand, become the candidate areas 132, 133a, and 133b after being normalized to the magnification of the input image 100.
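
The normalization to the magnification of the input image can be sketched as a division by the reduction ratio. The tuple layout `(x, y, w, h)` with the upper-left corner as the position, and the convention that `scale` is the reduced size divided by the input size, are assumptions made for this example.

```python
def to_input_scale(rect, scale):
    """Map a rectangle found in a reduced image back to input-image coordinates.

    rect:  (x, y, w, h) in the reduced image, (x, y) being the upper-left corner.
    scale: reduction ratio of that image, e.g. 0.5 for a half-size image;
           1.0 leaves the rectangle unchanged.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    x, y, w, h = rect
    return (x / scale, y / scale, w / scale, h / scale)
```

A fixed-size window on a half-size image therefore corresponds to a window twice as large on the input image, which is how one window size can cover both near and far persons.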

The candidate area deletion unit 33 determines, from the scores stored in the candidate area storage unit 41, whether to delete candidate areas. When it does, it sets a second threshold T_2 and deletes the candidate areas whose score is T_2 or less from the candidate area storage unit 41 (step S50).

FIG. 6 is a schematic process flow diagram of the candidate area deletion unit 33, whose operation is described below with reference to it. The candidate area deletion unit 33 calculates the variance σ^2 of the scores of the candidate areas stored in the candidate area storage unit 41 as their degree of variation (step S501). If σ^2 is larger than the predetermined extraction omission estimation threshold v_L ("NO" in step S502), the candidate area deletion unit 33 further compares σ^2 with the predetermined erroneous extraction estimation threshold v_H (step S503).

If the variance σ^2 is equal to or greater than the erroneous extraction estimation threshold v_H ("YES" in step S503), a second threshold T_2 larger than the first threshold T_1 is set (step S504).

On the other hand, if the variance σ^2 is less than the erroneous extraction estimation threshold v_H ("NO" in step S503), the candidate area deletion unit 33 ends its processing without setting the second threshold T_2, and processing proceeds to step S60 in FIG. 4. That is, in this case the control unit 3 does not delete any candidate area.

When the variance σ^2 is equal to or greater than v_H, the second threshold T_2 is set as described above, and the candidate area deletion unit 33 uses T_2 to delete candidate areas in steps S505 to S508. This is a loop that processes every candidate area stored in the candidate area storage unit 41, one at a time. Specifically, a candidate area not yet processed by the loop is selected (step S505), and it is determined whether the score of the selected candidate area is equal to or less than T_2 (step S506). If it is ("YES" in step S506), the candidate area is deleted from the candidate area storage unit 41 (step S507). If the score is greater than T_2 ("NO" in step S506), the candidate area is kept and the next candidate area is examined (the flow returns to step S505). When all candidate areas have been processed, that is, when no unprocessed candidate area remains to be selected in step S505 ("NO" in step S508), processing proceeds to step S60 in FIG. 4.
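
Steps S501 and S503 to S508 can be sketched as below. Several points are assumptions rather than statements from the text: the population variance is used for σ^2, and T_2 is formed by adding a multiple `k` of the standard deviation to T_1, which is only one way to realize "adding a value corresponding to the degree of variation to the first threshold" as worded in the claims. The function name and dictionary layout are hypothetical, and the v_L check of step S502 is left out here.

```python
from statistics import pvariance


def delete_by_second_threshold(candidates, t1, v_h, k=1.0):
    """Return (remaining candidates, T_2 or None).

    candidates: list of dicts that each carry a "score" key (hypothetical layout).
    v_h: erroneous extraction estimation threshold on the score variance.
    k:   hypothetical factor applied to the standard deviation to form T_2.
    """
    scores = [c["score"] for c in candidates]
    if not scores:
        return [], None
    variance = pvariance(scores)            # S501 (population variance assumed)
    if variance < v_h:                      # S503 "NO": no deletion
        return list(candidates), None
    t2 = t1 + k * variance ** 0.5           # S504: T_2 > T_1 (assumed rule)
    kept = [c for c in candidates if c["score"] > t2]  # S505 to S508
    return kept, t2
```

Scores of true persons and of background false positives tend to form two clusters, so a large variance is taken as a hint that the low-score cluster should be cut away.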

When the variance σ^2 is equal to or less than the extraction omission estimation threshold v_L ("YES" in step S502), the candidate area deletion unit 33 revises the first threshold T_1 downward; specifically, it lowers T_1 by a predetermined amount. The control unit 3 then restarts processing from step S40. That is, the score calculation unit 32 performs step S40 with the lowered T_1, re-extracts the candidate areas and stores them in the candidate area storage unit 41, and the candidate area deletion unit 33 performs step S50 described above on the re-extracted candidate areas.
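
The re-extraction loop can be sketched as follows, working directly on the list of window scores. The retry limit `max_retries` and the early return when fewer than two scores remain are assumptions added so that the sketch always terminates; the text only says that T_1 is lowered by a predetermined amount (`step` here) and that step S40 is repeated.

```python
from statistics import pvariance


def extract_with_retry(window_scores, t1, v_l, step, max_retries=10):
    """Return (scores kept as candidates, final T_1).

    Lowers T_1 by `step` and re-extracts while the variance of the kept
    scores is at or below the extraction omission estimation threshold v_L.
    """
    kept = [s for s in window_scores if s > t1]      # S40
    for _ in range(max_retries):
        if len(kept) < 2:                            # variance not meaningful (assumption)
            break
        if pvariance(kept) > v_l:                    # S502 "NO": T_1 is acceptable
            break
        t1 -= step                                   # lower T_1 by the predetermined amount
        kept = [s for s in window_scores if s > t1]  # redo S40
    return kept, t1
```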

The extraction omission estimation threshold v_L is set based on the score variation expected for a person image, which in this embodiment is the variance. As in the distribution shown in FIG. 3(a), the scores obtained by scanning the window area over a person image that is not occluded inherently vary to some degree even when there is no erroneous extraction. The threshold v_L is set in advance, based on experiments, to a value below this inherent minimum degree of variation. Specifically, when the first threshold T_1 falls within the peak 300 of the score distribution caused by a person image, as in FIG. 7, the part of the peak 300 at or below T_1 is no longer extracted as candidate areas, so the degree of variation of the part above T_1 is smaller than that of the whole peak. Lowering T_1 corrects a threshold set at such an inappropriate position so that the whole of the peak 300 can be extracted.

If the number of scores exceeding the first threshold T_1 is too small to infer the presence of the peak 300 caused by a person image, it is preferable to omit the determination in step S502 and the downward revision of T_1 in step S510. For example, they are omitted when only a few scores exceed T_1. Although not shown in FIG. 3, a peak caused by the background exists in the low-score range. When few scores lie in the range above this background peak, the scores exceeding T_1 are few and their variance can be small even if T_1 is set appropriately. Repeatedly lowering T_1 in such a case can bring T_1 down to a position that includes the background peak. When there is such a risk, the downward revision of T_1 is therefore omitted. In that case, steps S503 to S508 are basically also omitted, and processing proceeds to step S60.
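
The guard described here reduces to a count check before step S502. The concrete limit `min_count` is a hypothetical parameter; the text only indicates that "a few" scores above T_1 are not enough.

```python
def enough_scores_for_variance_check(window_scores, t1, min_count=5):
    """True when enough scores exceed T_1 to infer a person-image peak.

    min_count is an assumed value; when this returns False, the sketch
    expects the caller to skip S502 to S508 and go straight to step S60.
    """
    return sum(1 for s in window_scores if s > t1) >= min_count
```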

図５（ｂ）の画像１４０は画像１３０に対する候補領域削除部３３の処理結果を示しており、背景に起因する候補領域１３２が削除されている。 An image 140 in FIG. 5B shows the processing result of the candidate area deletion unit 33 for the image 130, and the candidate area 132 caused by the background is deleted.

上述した候補領域削除部３３の処理が終わると、領域グループ生成部３４は、候補領域格納部４１に格納されている候補領域同士の重複度を算出し、候補領域相互の重複度が予め定められたグループ判定閾値以上である候補領域からなるグループ（領域グループ）を生成し、当該グループを示すラベル番号を候補領域の情報に追加し候補領域格納部４１に格納する（図４のステップＳ６０）。 When the processing of the candidate area deletion unit 33 described above is completed, the area group generation unit 34 calculates the degree of overlap between candidate areas stored in the candidate area storage unit 41, and the degree of overlap between candidate areas is determined in advance. A group (region group) composed of candidate regions equal to or greater than the group determination threshold is generated, and a label number indicating the group is added to the candidate region information and stored in the candidate region storage unit 41 (step S60 in FIG. 4).

The degree of overlap is calculated as, for example, (area of the region common to candidate area A and candidate area B in the input image) / (the smaller of the areas of candidate area A and candidate area B in the input image). Alternatively, it can be calculated as (area of the region common to candidate area A and candidate area B in the input image) / (area of the union of candidate area A and candidate area B in the input image). The group determination threshold can be set to, for example, 0.5.
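
Both definitions of the degree of overlap can be written compactly for axis-aligned rectangles. The `(x, y, w, h)` tuple layout with the upper-left corner as the position, and the assumption that every rectangle has positive area, are choices made for this sketch.

```python
def intersection_area(a, b):
    """Area of the region common to rectangles a and b, each (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    return max(0, iw) * max(0, ih)


def overlap_over_smaller(a, b):
    """Common area divided by the smaller of the two areas."""
    return intersection_area(a, b) / min(a[2] * a[3], b[2] * b[3])


def overlap_over_union(a, b):
    """Common area divided by the area of the union of the two rectangles."""
    inter = intersection_area(a, b)
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)
```

The first measure reaches 1.0 whenever one rectangle lies inside the other, so it groups candidates of different sizes more readily than the union-based measure.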

To prevent candidate areas belonging to several nearby persons from being merged into one group, it is desirable to set candidate areas with high scores as group cores preferentially, and to group each core with the other candidate areas whose degree of overlap with that core is equal to or greater than the group determination threshold.

For this purpose, for example, the area group generation unit 34 first initializes an unassigned list by sorting the candidate areas stored in the candidate area storage unit 41 in descending order of score, and sets the candidate area at the head of the unassigned list as the core of a group. Next, the area group generation unit 34 calculates the degree of overlap between the head candidate area and each of the second and subsequent candidate areas in the unassigned list, assigns the same label number to the head candidate area and to every candidate area whose degree of overlap is equal to or greater than the group determination threshold, stores the labels in the candidate area storage unit 41, and deletes these candidate areas from the unassigned list. Thereafter, the candidate area at the head of the updated unassigned list is set as the next core in turn, and label assignment is repeated until the unassigned list is empty.
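
The score-ordered labeling procedure can be sketched as below. The input layout (a list of `(rect, score)` pairs with `rect = (x, y, w, h)`), the function names, and the use of the "common area over smaller area" overlap measure are assumptions for illustration; label numbers start at 0 as in the example of FIG. 5(c).

```python
def overlap(a, b):
    """Common area divided by the smaller area, for (x, y, w, h) rectangles."""
    iw = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    ih = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    inter = max(0, iw) * max(0, ih)
    return inter / min(a[2] * a[3], b[2] * b[3])


def group_candidates(candidates, threshold=0.5):
    """Return one label number per candidate, in the input order.

    candidates: list of (rect, score) pairs.
    threshold:  group determination threshold.
    """
    # Unassigned list: candidate indices sorted by descending score.
    unassigned = sorted(range(len(candidates)),
                        key=lambda i: candidates[i][1], reverse=True)
    labels = [None] * len(candidates)
    label = 0
    while unassigned:
        core = unassigned[0]              # head of the list becomes the group core
        labels[core] = label
        remaining = []
        for i in unassigned[1:]:
            if overlap(candidates[core][0], candidates[i][0]) >= threshold:
                labels[i] = label         # same group as the core
            else:
                remaining.append(i)       # stays in the unassigned list
        unassigned = remaining
        label += 1
    return labels
```

Because overlap is measured only against the core, a chain of pairwise-overlapping candidates that stretches across two neighboring persons is not merged into one group.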

上述した領域グループ生成部３４の処理が終わると、対象物領域算出部３５は最終的な人物領域を求めて対象物領域格納部４２に格納する（図４のステップＳ７０）。 When the processing of the area group generation unit 34 described above is completed, the object area calculation unit 35 obtains a final person area and stores it in the object area storage unit 42 (step S70 in FIG. 4).

図５（ｃ）の画像１５０は画像１４０に対する対象物領域算出部３５の処理結果を示しており、ラベル番号“０”のグループを構成する候補領域１３３ａ，１３３ｂのうちスコアが最大となる候補領域１３３ａが人物領域として選択され、ラベル番号“１”のグループを構成する候補領域１３１ａ，１３１ｂのうちスコアが最大となる候補領域１３１ａが人物領域として選択されている。 An image 150 in FIG. 5C shows the processing result of the object area calculation unit 35 for the image 140, and the candidate area having the maximum score among the candidate areas 133a and 133b constituting the group with the label number “0”. 133a is selected as the person area, and the candidate area 131a having the maximum score is selected as the person area among the candidate areas 131a and 131b constituting the group with the label number “1”.

After the person areas are calculated in step S70, if at least one person is present in the image ("YES" in step S80), the output unit 5, for example, sends to the center device an abnormal signal that contains the information on the detected person areas and the input image in which they were detected (step S90).

以上、実施形態を用いて説明した本発明では、入力画像ごとに候補領域のスコアから推定される環境に応じて候補領域を削除するスコアの第二閾値を設定することで、最終的な検出結果の精度を向上させることができる。 As described above, in the present invention described using the embodiment, the final detection result is obtained by setting the second threshold value of the score for deleting the candidate area according to the environment estimated from the score of the candidate area for each input image. Accuracy can be improved.

DESCRIPTION OF SYMBOLS: 1 person detection device, 2 image input unit, 3 control unit, 4 storage unit, 5 output unit, 30 image reduction unit, 31 feature amount calculation unit, 32 score calculation unit, 33 candidate area deletion unit, 34 area group generation unit, 35 object area calculation unit, 40 discriminator storage unit, 41 candidate area storage unit, 42 object area storage unit.

Claims

A target detection device for detecting a target region where a predetermined target appears in an input image, comprising:
a storage unit that stores in advance an index value calculation function for calculating, using feature amounts extracted at various points in the input image, an index value representing the likelihood that the target exists in a region of interest set in the input image;
an index value calculation unit that sets the region of interest at a plurality of positions in the input image and calculates the index value for each region of interest with the index value calculation function;
a candidate area extraction unit that extracts, as target candidate areas, those regions of interest whose index value exceeds a predetermined first threshold; and
a target region determination unit that determines the target region using the target candidate areas extracted by the candidate area extraction unit,
wherein, when the degree of variation of the index values of the regions of interest exceeding the first threshold is equal to or greater than a predetermined erroneous extraction estimation threshold, the candidate area extraction unit sets a second threshold greater than the first threshold and deletes from the target candidate areas those whose index value is equal to or less than the second threshold.

The target detection device according to claim 1, wherein the candidate area extraction unit sets, as the second threshold, a value that is smaller than a representative value of the index values exceeding the first threshold and whose difference from the first threshold increases with the representative value.

The target detection device according to claim 1, wherein the candidate area extraction unit sets the second threshold by adding a value corresponding to the degree of variation to the first threshold.

The target detection device according to claim 1, wherein, when the degree of variation of the index values is below the erroneous extraction estimation threshold and is equal to or less than a preset extraction omission estimation threshold, the candidate area extraction unit lowers the first threshold by a predetermined amount and re-extracts the target candidate areas.