JP2014085795A

JP2014085795A - Learning image collection device, learning device and object detection device

Info

Publication number: JP2014085795A
Application number: JP2012233694A
Authority: JP
Inventors: Kentaro Yokoi; 謙太朗横井; Tomokazu Kawahara; 智一河原; Yuki Watanabe; 友樹渡辺
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2012-10-23
Filing date: 2012-10-23
Publication date: 2014-05-12

Abstract

PROBLEM TO BE SOLVED: To provide a learning image collection device that easily collects images captured in an environment different from that of the images used when a detector was trained, a learning device using the collected images, and an object detection device.

SOLUTION: A learning image collection device comprises an acquisition part, an extraction part, a calculation part and a selection part. The acquisition part acquires an image including an object. The extraction part extracts, from the image, a plurality of candidate areas that are candidates for the object. The calculation part calculates any one of a first similarity between a candidate area and a prescribed area, a second similarity between the size of the candidate area and a prescribed size of the object, and a third similarity between the candidate area and each of the other candidate areas. The selection part selects a candidate area as an object area including the object when any one of the first, second, and third similarities is larger than a prescribed threshold.

Description

Embodiments of the present invention relate to a learning image collection device, a learning device, and an object detection device.

A technique has been disclosed for a recognition apparatus that collects learning patterns more effectively by identifying whether a region of a captured image is the target object and then tracking the object. To collect patterns efficiently, this technique gathers images of objects that were misidentified.

However, when it cannot be determined in advance whether a region is the target object, the object cannot be tracked, and new learning images cannot be collected. Moreover, even when a detector has been trained on a large amount of learning data, its detection rate drops on images captured in an environment different from that at the time of training.

Japanese Unexamined Patent Application Publication No. 2007-310480 (JP 2007-310480 A)

The problem to be solved by the present invention is to provide a learning image collection device that easily collects images captured in an environment different from that of the images used to train a detector, a learning device using the collected images, and an object detection device.

The learning image collection device of the embodiment includes an acquisition unit, an extraction unit, a calculation unit, and a selection unit. The acquisition unit acquires an image including an object. The extraction unit extracts, from the image, a plurality of candidate regions that are candidates for the object. The calculation unit calculates any one of a first similarity between a candidate region and a predetermined region, a second similarity between the candidate region and a predetermined size of the object, and a third similarity between the candidate region and each of the plurality of candidate regions. The selection unit selects the candidate region as an object region including the object when any one of the first similarity, the second similarity, and the third similarity is larger than a predetermined threshold.

The learning device of the embodiment includes an acquisition unit, an extraction unit, a calculation unit, a selection unit, and a learning unit.

The acquisition unit acquires an image including an object. The extraction unit extracts, from the image, a plurality of candidate regions that are candidates for the object. The calculation unit calculates any one of a first similarity between a candidate region and a predetermined region, a second similarity between the candidate region and a predetermined size of the object, and a third similarity between the candidate region and each of the plurality of candidate regions. The selection unit selects the candidate region as an object region including the object when any one of the first similarity, the second similarity, and the third similarity is larger than a predetermined threshold. The learning unit trains a discriminator that identifies the object, using the object region as teacher data.

The object detection device of the embodiment includes an imaging unit, an extraction unit, a calculation unit, a selection unit, a learning unit, and a detection unit. The imaging unit captures an image including an object. The extraction unit extracts, from the image, a plurality of candidate regions that are candidates for the object. The calculation unit calculates any one of a first similarity between a candidate region and a predetermined region, a second similarity between the candidate region and a predetermined size of the object, and a third similarity between the candidate region and each of the plurality of candidate regions. The selection unit selects the candidate region as an object region including the object when any one of the first similarity, the second similarity, and the third similarity is larger than a predetermined threshold. The learning unit trains a discriminator that identifies the object, using the object region as teacher data. The detection unit detects the object from the image.

FIG. 1 is a configuration diagram showing an example of the learning image collection device of the first embodiment.
FIG. 2 is a flowchart showing an example of processing by the learning image collection device of the first embodiment.
FIG. 3 is an explanatory diagram showing an example in which a person is the detection target.
FIG. 4 is an explanatory diagram showing an example of the images required for learning that takes environmental conditions into account.
FIG. 5 is a diagram showing an example of a screen in a modification of the first embodiment.
FIG. 6 is a diagram showing an example of a screen in a modification of the first embodiment.
FIG. 7 is a diagram showing an example of a screen in a modification of the first embodiment.
FIG. 8 is a configuration diagram showing an example of the learning device of the second embodiment.
FIG. 9 is a configuration diagram showing an example of the object detection device of the second embodiment.
FIG. 10 is a flowchart showing an example of processing by the learning device of the second embodiment.
FIG. 11 is a configuration diagram showing an example of the object detection device of the third embodiment.
FIG. 12 is a flowchart showing an example of processing by the object detection device of the third embodiment.
FIG. 13 is a configuration diagram showing a modification of the third embodiment.
FIG. 14 is a configuration diagram showing a modification of the third embodiment.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings.

(First Embodiment)
FIG. 1 is a configuration diagram showing an example of a learning image collection device 1 according to the first embodiment. As shown in FIG. 1, the device includes an acquisition unit 10, an extraction unit 11, a calculation unit 12, and a selection unit 13. The learning image collection device may be realized in software, that is, by causing a processing device such as a CPU (Central Processing Unit) to execute a program; in hardware such as an IC (Integrated Circuit); or by a combination of software and hardware. The image acquired by the acquisition unit 10 may be one stored in a storage device.

The storage device can be realized by at least one of magnetically, optically, and electrically storable devices such as a hard disk drive (HDD), a solid state drive (SSD), a read only memory (ROM), and a memory card.

FIG. 2 is a flowchart showing an example of the processing flow of the learning image collection device 1 according to the first embodiment.

First, the learning image collection device 1 acquires an image captured by an imaging device or an image stored in the storage device (step S11). Next, the extraction unit 11 extracts regions that may be candidates for the target object as candidate regions (step S12). The candidate regions may be extracted using inter-frame difference processing or background difference processing, for example the method of Wallflower: Principles and Practice of Background Maintenance (Kentaro Toyama, John Krumm, Barry Brumitt, Brian Meyers, Proceedings of the 7th IEEE International Conference on Computer Vision, ICCV 1999, pp. 255-261, September 1999). The candidate regions include, for example, people and vehicles as well as changing background such as swaying trees and doors that open and close.
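The extraction in step S12 can be illustrated with a minimal inter-frame difference sketch. This is not the Wallflower method cited above; it is a simplified stand-in that assumes grayscale frames given as nested lists of intensities, and the function names, the `threshold` value, and the use of 4-connected grouping are all illustrative assumptions.

```python
from collections import deque


def frame_difference_mask(prev, curr, threshold=30):
    """Mark pixels whose absolute intensity change exceeds `threshold` (assumed value)."""
    return [[abs(c - p) > threshold for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def candidate_regions(mask):
    """Return bounding boxes (x, y, width, height) of 4-connected changed regions."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first search over one connected group of changed pixels.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            min_x = max_x = x0
            min_y = max_y = y0
            while queue:
                x, y = queue.popleft()
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return boxes
```

Every changed region becomes a candidate, whether it is a person, a vehicle, or a swaying tree, which is why the later similarity steps are needed to filter the candidates.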

FIG. 3 shows an example in which a person is the detection target. As the figure shows, inter-frame difference and background difference processing cannot select only the targets that should be collected as learning data.

FIG. 4 shows an example of the images required for learning that takes environmental conditions into account. Ideally, object detection processing should succeed even when the environmental conditions differ from those at the time of learning (X1). In practice, however, objects often have to be detected in environments where sunlight conditions (the light and its direction) and other factors differ. Until now, detection has been difficult when the environmental (ambient) conditions differ from those at the time of learning (X2). Difference detection such as inter-frame difference or background difference processing makes it possible to collect learning images for object detection from images with different environmental conditions. However, non-target objects that produce a difference (for example, a swaying tree or a door opening and closing) are also detected (X3), so the collected images are not necessarily limited to those suitable for learning.

When detecting people, it is desirable to obtain learning images of a single person under different environmental conditions, because when the detection target is a person, being able to detect a single person also makes it possible to detect multiple people. The proposed method can select single persons from the acquired images with different environmental conditions (X4), and additional learning with these as teacher data yields a device that can detect the object even when the environmental conditions differ (X5).

To collect objects under different environmental conditions efficiently, individual conditions are applied to images from different environments, as in FIG. 3: (a) a comparison of the aspect ratios of the rectangles and (b) the change in position on the image. The conditions are detailed below with reference to the steps in FIG. 2.

Returning to FIG. 2: when the extraction unit 11 has extracted a plurality of candidate regions, the user selects one image needed for learning (step S13). To detect people, the user selects only a region that contains a person; to detect vehicles, only a region that contains a vehicle. To detect an object other than a person or a vehicle, the same processing is performed with that object as the target.

The calculation unit 12 compares each target candidate region with the selected or designated target region and calculates a similarity S1 (step S14). The similarity is calculated using, for example, normalized correlation between the target candidate region and the designated target region.
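As an illustration of the normalized correlation used for S1, the following sketch computes a zero-mean normalized correlation between two patches. The text does not specify the exact variant, so the zero-mean form, the flattened equal-length pixel lists, and the handling of flat patches are assumptions made here for the example.

```python
import math


def normalized_correlation(a, b):
    """Zero-mean normalized correlation of two equal-length pixel sequences, in [-1, 1]."""
    if len(a) != len(b) or not a:
        raise ValueError("patches must be non-empty and the same size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0.0:
        # A flat patch has no variance, so the correlation is undefined;
        # returning 0 is an assumption of this sketch.
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom
```

In practice the candidate region would first be resized to the size of the designated region so that the two pixel sequences have the same length; that resizing step is outside this sketch.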

The calculation unit 12 also calculates a similarity S2 between the size information of the target candidate region and size information designated in advance (step S15). The size information is, for example, the vertical and horizontal size of the target and its aspect ratio.

Let width_A and height_A be the horizontal and vertical lengths of a candidate region A, and width_B and height_B those of the target region B designated in advance. The size similarity S2 can then be calculated from the areas, for example, as
(Formula 1)
1 / {|(width_A * height_A) - (width_B * height_B)| / (width_B * height_B)}
or from the aspect ratios as
(Formula 2)
1 / {|(width_A / height_A) - (width_B / height_B)| / (width_B / height_B)}
The product of Formula 1 and Formula 2 may also be used.
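Formula 1 and Formula 2 can be sketched directly as functions. Note that both formulas divide by zero when the candidate matches the designated size exactly; returning infinity in that case is an assumption of this sketch, since the text does not say how the case is handled. The function names are illustrative.

```python
import math


def area_similarity(width_a, height_a, width_b, height_b):
    """Formula 1: reciprocal of the area difference relative to region B."""
    ref = width_b * height_b
    rel = abs(width_a * height_a - ref) / ref
    return math.inf if rel == 0 else 1.0 / rel


def aspect_similarity(width_a, height_a, width_b, height_b):
    """Formula 2: reciprocal of the aspect-ratio difference relative to region B."""
    ref = width_b / height_b
    rel = abs(width_a / height_a - ref) / ref
    return math.inf if rel == 0 else 1.0 / rel
```

Because the values are unbounded, an implementation that combines S2 with other scores might clip it to a maximum first; that choice is left open here.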

A candidate region with a small calculated S2 differs greatly from the size information designated in advance; conversely, a candidate region with a large S2 is likely to be a person. When a person is designated as the detection target, candidate regions with a small S2 correspond to, for example, vehicle regions. During image collection, candidate regions with a small S2 are excluded, which is preferable because it improves the accuracy of the detection device.

For example, when a person is designated as the detection target, a region in which several overlapping people have been extracted as one group is undesirable as learning data. The aspect ratio of such a group differs greatly from that of a single person, so its similarity S2 is small. Such regions can therefore be excluded from the learning data, which is preferable because it improves the accuracy of the detection device.

The calculation unit 12 also compares the obtained target candidate regions with one another and calculates a similarity S3 (step S16). The similarity may be calculated using, for example, ordinary normalized correlation. Given a candidate region X and N other candidate regions X_i (i = 1 to N), S3 may be calculated as
(Formula 3)
S3 = 1/N * Σ_{i=1 to N} Sim(X, X_i) (where Sim(A, B) is the normalized correlation value of A and B)
A small S3 means that the pattern differs greatly from the other candidate regions; it is likely, for example, a piece of background that was extracted sporadically. Excluding such regions from the learning data is preferable because it improves the accuracy of the detection device.
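Formula 3 is an average of pairwise similarities, which the following sketch makes explicit. The similarity function is passed in as the parameter `sim` so that any implementation of Sim(A, B) can be used; the function name `mean_similarity` and this calling convention are assumptions of the example.

```python
def mean_similarity(index, regions, sim):
    """Formula 3: average Sim(X, X_i) over the N regions other than X = regions[index]."""
    x = regions[index]
    others = [r for i, r in enumerate(regions) if i != index]
    if not others:
        raise ValueError("need at least one other candidate region")
    return sum(sim(x, r) for r in others) / len(others)
```

A region whose S3 is low relative to the others, such as a one-off background fragment, stands out as an outlier among the candidates.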

Furthermore, the calculation unit 12 calculates a positional dissimilarity S4 among the target candidate regions (step S17). Given a candidate region X and M candidate regions X_i (i = 1 to M) that are sufficiently similar to it, the positional dissimilarity can be expressed by Formula 4.

(Formula 4)
S4 = 1/M * Σ_{i=1 to M} Dist(X, X_i) (where Dist(A, B) is the Euclidean distance between A and B)
A small S4 suggests a pattern that appears only in one specific, closely confined location, as when a background region that is not the person to be detected, such as an automatic door opening and closing, is extracted repeatedly. Conversely, a candidate region with a large S4 is likely to be a person. When collecting learning images, excluding candidate regions with a small dissimilarity S4 is preferable because it improves the accuracy of the detection device.
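Formula 4 can be sketched as a mean of Euclidean distances. The text does not state which point represents a region's position, so using the region's center coordinates is an assumption here, as is the function name.

```python
import math


def position_dissimilarity(center, similar_centers):
    """Formula 4: mean Euclidean distance from X to the M regions similar to it."""
    if not similar_centers:
        raise ValueError("need at least one similar candidate region")
    return sum(math.dist(center, c) for c in similar_centers) / len(similar_centers)
```

An automatic door that is extracted repeatedly at the same spot yields a value near zero, whereas a person walking through the scene yields a larger value.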

The selection unit 13 determines whether the similarities or dissimilarity (S1 to S4) calculated by the calculation unit 12 are larger than predetermined thresholds (step S18) and selects learning images from the candidate regions. The image regions are selected based on any one of S1 to S4 or on the results of two or more of them (step S19).

For example, weights w1 to w4 may be assigned to the calculated values S1 to S4 and the following weighted evaluation performed.

(Formula 5)
(w1 * S1) + (w2 * S2) + (w3 * S3) + (w4 * S4)
The selection unit 13 selects regions whose evaluation value is equal to or greater than a predetermined threshold as learning images. Regions whose evaluation value is below the threshold are excluded from the learning data, which is preferable because it improves the detection accuracy of the detection device.
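The weighted evaluation of Formula 5 and the threshold test can be sketched as follows. The candidate representation (a name paired with a tuple of scores), the weights, and the threshold are hypothetical values chosen only for illustration.

```python
def evaluation_value(scores, weights):
    """Formula 5: weighted sum of the scores S1..Sn with weights w1..wn."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(w * s for w, s in zip(weights, scores))


def select_learning_regions(candidates, weights, threshold):
    """Keep the names of candidates whose evaluation value is >= threshold."""
    return [name for name, scores in candidates
            if evaluation_value(scores, weights) >= threshold]


# Hypothetical scores (S1, S2, S3, S4) for two candidate regions.
candidates = [
    ("person", (0.9, 5.0, 0.8, 40.0)),
    ("door", (0.2, 1.0, 0.1, 2.0)),
]
selected = select_learning_regions(candidates, (1.0, 0.1, 1.0, 0.01), threshold=1.0)
```

Since S1 to S4 have different ranges (a correlation, an unbounded reciprocal, and a pixel distance), the weights also act as scale factors; how they are tuned is not specified in the text.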

The calculation unit 12 also calculates a movement amount S5 of each candidate region (step S21). For example, a candidate may be tracked across a plurality of images in time-series order and its movement amount calculated. The following method, for example, may be used for tracking.

Dorin Comaniciu, Visvanathan Ramesh, Peter Meer: Real-Time Tracking of Non-Rigid Objects using Mean Shift, Proceedings of the 2000 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2000), pp. 142-149
Whether the calculated movement amount S5 is larger than a predetermined threshold is determined (step S22), and learning images are selected from the candidate regions. The image regions are selected based on any one of S1 to S5 or on the results of two or more of them (step S19).
For example, as with Formula 5 above, weights w1 to w5 may be assigned to the calculated values S1 to S5 and the following weighted evaluation performed.

(Formula 6)
(w1 * S1) + (w2 * S2) + (w3 * S3) + (w4 * S4) + (w5 * S5)
The selection unit 13 selects regions whose evaluation value is equal to or greater than a predetermined threshold as learning images. Regions whose evaluation value is below the threshold are excluded from the learning data, which is preferable because it improves the detection accuracy of the detection device.
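The movement amount S5 that enters this evaluation could be computed in several ways. The sketch below assumes that a tracker (not shown) has already produced the center position of one candidate in each time-ordered frame, and it takes S5 to be the total path length; both the input format and the path-length definition are assumptions, since the text only says that the movement amount is calculated from the tracking result.

```python
import math


def movement_amount(track):
    """Total path length of a tracked region's centers across time-ordered frames."""
    return sum(math.dist(p, q) for p, q in zip(track, track[1:]))
```

A stationary background fragment gives a value near zero, while a person crossing the scene accumulates a large value, so a threshold on S5 plays a role similar to the threshold on S4.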

The selected candidate regions may be presented on a display device or the like so that the user can choose among them (step S20).

As described above, the learning image collection device 1 according to the first embodiment makes it easy to collect images whose environment differs from that of the images used to train the detector. In particular, when an object is to be detected in images from an environment different from that at the time of training, tracking and additional learning may not be possible; the learning image collection device according to the present embodiment nevertheless makes it possible to obtain detection target images suitable for training the detector.

(Modification)
The learning image collection device according to a modification of the first embodiment may include a manual selection unit in place of the selection unit 13 of the first embodiment. Steps S11 to S18, S21, and S22 of the first embodiment are performed in the same way in this modification.

In step S19, the manual selection unit sorts the image regions based on the calculated results of any one, or two or more, of the similarities S1 to S3, the dissimilarity S4, and the movement amount S5. The weighted evaluation may be performed as in step S19 of the first embodiment and the regions sorted in descending order of evaluation value.

FIG. 5 shows an example in which the regions are sorted in descending order of evaluation value. The manual selection unit displays the sorted images on a display unit. The user decides which images to select as learning images, and the device accepts the user's designation.

The candidate image regions for learning are presented to the user sorted in the order in which they are estimated to be suitable as learning images, so the user can check regions with high evaluation values first. In FIG. 5, regions further to the left have higher evaluation values, so the user only has to check the candidates from the left, which makes selecting learning images easy. This confirmation by the user prevents inappropriate image regions from being selected. The user can also dismiss regions with low evaluation values after only a brief check, which reduces the effort of confirmation.

The surroundings of an image to be selected may also be displayed, so that the user can judge a region after checking what lies around it. FIG. 6, for example, shows the image regions to be checked superimposed on the whole image. When the target is a person, a frame may be drawn around the region, and the evaluation value may be reflected in the frame's color, thickness, line type, and so on. For example, frames may be drawn in red with deeper shades indicating larger evaluation values, or red may indicate a high evaluation value and blue a low one. A thicker frame may indicate a larger evaluation value, and a chain line may indicate a small one. The region with the highest evaluation value may be marked with a symbol such as an arrow. Displaying the surrounding image during selection makes it easier for the user to see which part of which image has been selected.

With this display, however, the relative size of the evaluation values is hard to see, so a separate mode switch, such as sorting by evaluation value, may be offered. FIG. 7 shows an example that combines FIG. 5 and FIG. 6. Sorting in descending order of evaluation value makes the list easier to scan, and showing the actual image alongside it makes the overall scene easier to grasp, which is preferable. In one mode, only the candidates present in the upper screen are listed below. In another mode, offered as a switchable alternative, the lower screen is primary: it lists candidates from the entire video in order of evaluation value, and designating a candidate there displays the corresponding whole image above.

(Second Embodiment)
FIG. 8 is a block diagram showing a learning device 2 according to the second embodiment of the present invention. The learning device according to the present embodiment includes an acquisition unit 10, an extraction unit 11, a calculation unit 12, a selection unit 13, and a learning unit 24. It differs from the first embodiment in that it includes the learning unit 24.

FIG. 9 shows an object detection device 3 that uses the learning device according to the second embodiment of the present invention. It differs from the first embodiment in that it includes an imaging unit 30, the learning unit 24, and a detection unit 35, and from FIG. 8 in that it includes the imaging unit 30.

Hereinafter, configurations and processes that are the same as in the first embodiment are given the same reference numerals, and their description is omitted.

Next, the operation of the object detection device according to the second embodiment of the present invention will be described with reference to FIG. 10, which is a flowchart of that operation.

First, the imaging unit 30 captures an image. In the case of the learning device of FIG. 8, the captured image is acquired by the acquisition unit 10. Steps S12 to S18, S21, and S22 of the first embodiment are then performed in the same way in the second embodiment.

In step S23, detection target images suitable for training the detector are selected, and the learning unit 24 uses those images to calculate and learn the information needed for detection (step S24). For this calculation, additional learning may be performed as in, for example, G. Cauwenberghs and T. Poggio, "Incremental and Decremental Support Vector Machine Learning," in Adv. Neural Information Processing Systems, pp. 409-415 (NIPS*2000). The result of learning is the information that serves as a clue for detection, generally called a dictionary in the field of image recognition. Based on this result, the detection unit 35 detects the object in the designated image (step S25).

Here, in step S24, another detector that operates alongside the existing detector may be newly trained and used as an additional detection unit. In that case, in step S25, the results common to the existing detector and the newly added detector are detected. Alternatively, the results of both the existing detector and the newly added detector may be used as the detection result. For example, when the detection parameters used by the existing detector and the newly added detector differ, detection performance can be improved by using the result of comparing both detection results.

As described above, the learning device 2 according to the second embodiment makes it possible to easily collect images whose environment differs from that of the images used when the detector was trained. Moreover, the object detection device 3 according to the second embodiment can collect images from different environments and improve its object detection performance. In particular, when an object is to be detected from an image captured in an environment different from that at the time the detector was trained, tracking and additional learning may not be possible. Using the learning device according to this embodiment makes it possible to obtain detection target images suitable for training the detector, and therefore to improve object detection performance.
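The two ways of combining an existing detector with an additionally trained detector in step S25 (keeping only the common results, or using the results of both) can be sketched as follows. This is an illustrative sketch only: the embodiment does not specify how two detections are judged to be the same, so matching by intersection-over-union (IoU) with a threshold `min_iou` is an assumption, as are all function names.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def common_detections(existing: List[Box], added: List[Box],
                      min_iou: float = 0.5) -> List[Box]:
    """Keep only boxes of the existing detector confirmed by the added one."""
    return [a for a in existing if any(iou(a, b) >= min_iou for b in added)]


def all_detections(existing: List[Box], added: List[Box],
                   min_iou: float = 0.5) -> List[Box]:
    """Use the results of both detectors, dropping near-duplicate boxes."""
    merged = list(existing)
    for b in added:
        if all(iou(a, b) < min_iou for a in merged):
            merged.append(b)
    return merged
```

Keeping only the common results tends to suppress false detections, whereas using both sets tends to reduce missed detections; which is preferable depends on the application.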

(Third embodiment)
FIG. 11 is a configuration diagram illustrating an example of the object detection device 3 according to the third embodiment. It differs from the second embodiment in that a data distribution determination unit 46 (46A and 46B) is provided. Hereinafter, the same configurations and processes as those of the first and second embodiments are denoted by the same reference numerals, and their description is omitted.

Next, the operation of the object detection device according to the third embodiment of the present invention will be described with reference to FIG. 11 and FIG. 12. FIG. 12 is a flowchart showing the operation of the object detection device according to the third embodiment of the present invention.

The following describes a case where the learning image collection device 1A collects images and distributes the image information to an object detection device that includes the learning image collection device 1B. First, steps S11 through S18, S21, and S22 of the first embodiment are performed in the same way here.

In step S23, detection target images suitable for training the detector are selected, and the data distribution determination unit 46 determines the destination to which the image information of the selected images is distributed (step S26). As the distribution destination, for example, an object detection device installed at a geographically close position may be selected. Alternatively, the image information distributed in the past from each distribution destination candidate may be compared with the information of the detection target images obtained in step S23, and an object detection device whose image information has a similarity within a range specified in advance may be designated as the distribution destination.
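The first criterion of step S26, selecting geographically close object detection devices, can be sketched as follows. This is a sketch under stated assumptions: the embodiment does not say how positions are represented or how closeness is measured, so the latitude and longitude fields, the great-circle (haversine) distance, and the radius `max_km` are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Device:
    name: str
    lat: float  # degrees
    lon: float  # degrees


def haversine_km(a: Device, b: Device) -> float:
    """Great-circle distance between two devices in kilometres."""
    earth_radius_km = 6371.0
    phi_a, phi_b = math.radians(a.lat), math.radians(b.lat)
    d_phi = phi_b - phi_a
    d_lambda = math.radians(b.lon - a.lon)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi_a) * math.cos(phi_b) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(h))


def nearby_destinations(source: Device, candidates: List[Device],
                        max_km: float) -> List[Device]:
    """Return candidate devices within max_km of the source device."""
    return [d for d in candidates
            if d.name != source.name and haversine_km(source, d) <= max_km]
```

For instance, a device in central Tokyo with a 50 km radius would select a candidate in Yokohama but not one in Osaka.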

The learning unit calculates the information necessary for detection using the obtained image information (step S27). This calculation may be performed in the same manner as step S25 in the second embodiment. Based on the calculation result, the detection unit 35 performs detection processing (step S28).

Through the above processing, it becomes possible to obtain not only the detection target images collected by the learning image collection device 1A but also the detection target images collected by the other learning image collection device 1B. There may be two or more object detection devices.

For example, suppose that in step S26 an object detection device having image information with a high degree of similarity is designated as the distribution destination. An object detection device having image information with a high degree of similarity is one whose image information is similar in depression angle, illuminance, and the like. In this case, images from an object detection device that captures images at a large depression angle are distributed to other object detection devices that also capture images at a large depression angle, and images from an object detection device that captures low-illuminance images are distributed to other object detection devices that also capture low-illuminance images. As a result, a larger number of detection target images can be collected and learned, enabling higher-performance object detection.

Conversely, when an object detection device having image information with a low degree of similarity is designated as the distribution destination in step S26, images from an object detection device that captures images at a large depression angle are distributed, for example, to other object detection devices that capture images at a small depression angle, and images from an object detection device that captures low-illuminance images are distributed to other object detection devices that capture high-illuminance images. As a result, a more diverse set of detection target images can be collected and learned, enabling more stable object detection.
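The two distribution policies described for step S26 (distributing to devices with high similarity, or to devices with low similarity) can both be expressed as a choice of similarity range. The following is a minimal sketch, not the method of the embodiment: the similarity measure is not defined in the text, so the formula below, the use of mean image brightness (0 to 255) as a stand-in for illuminance, and all names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Conditions:
    depression_deg: float  # camera depression angle, 0 to 90 degrees
    brightness: float      # mean image brightness, 0 to 255 (illuminance proxy)


def similarity(a: Conditions, b: Conditions) -> float:
    """Hypothetical similarity in [0, 1]; 1 means identical conditions."""
    d_angle = abs(a.depression_deg - b.depression_deg) / 90.0
    d_light = abs(a.brightness - b.brightness) / 255.0
    return 1.0 - (d_angle + d_light) / 2.0


def select_destinations(source: Conditions,
                        candidates: Dict[str, Conditions],
                        lo: float, hi: float) -> List[str]:
    """Return names of candidates whose similarity to the source is in [lo, hi]."""
    return sorted(name for name, cond in candidates.items()
                  if lo <= similarity(source, cond) <= hi)
```

With a range such as [0.8, 1.0], images go to devices operating under similar conditions, which increases the number of comparable training images; with a range such as [0.0, 0.5], they go to devices operating under different conditions, which increases the diversity of the training images.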

(Modification 1)
In FIG. 11, the object detection devices distribute image information to each other, but as illustrated in FIG. 13, the data distribution determination unit 46 may be an independent component. In this case, for example, the data distribution determination unit resides on a central server, and the image information collected by the learning image collection device 1A or 1B is first transmitted to the data distribution determination unit 46 of the central server. The data distribution determination unit 46 determines the distribution destination of the image information, which is then distributed to the individual object detection devices. There may be two or more object detection devices.

(Modification 2)
In FIG. 11, the object detection devices distribute image information to each other, but as illustrated in FIG. 14, the selection unit, the learning unit, and the data distribution determination unit 46 may reside on a central server. In this case, the selection unit 53 compares the image information obtained from the plurality of learning image collection devices (1A, 1B) and makes a selection. For example, it selects the images of learning image collection devices installed at geographically close positions. Alternatively, it compares the image information obtained from the plurality of learning image collection devices and selects the images of learning image collection devices whose image information has a similarity within a range specified in advance.

The learning unit 24 calculates the information necessary for detection using the obtained image information. The data distribution determination unit 46 of the central server distributes the calculation result to the detection unit 35A or 35B determined as the distribution destination, and the detection unit 35A or 35B performs detection based on the distributed information.

(Hardware configuration)
The learning image collection device, the learning device, and the object detection device of the above embodiments each include a control device such as a CPU (Central Processing Unit), storage devices such as ROM and RAM, an external storage device such as an HDD or SSD, a display device such as a display, input devices such as a mouse and keyboard, and an imaging device such as a camera, and can be realized with a hardware configuration using an ordinary computer.

The program executed by the devices of the above embodiments is provided by being incorporated in advance in a ROM or the like.

The program executed by the devices of the above embodiments may also be provided as a file in an installable or executable format stored on a computer-readable storage medium such as a CD-ROM, CD-R, memory card, DVD, or flexible disk (FD).

The program executed by the devices of the above embodiments may also be stored on a computer connected to a network such as the Internet and provided by being downloaded via the network. The program may also be provided or distributed via a network such as the Internet.

The program executed by the devices of the above embodiments has a module configuration for realizing the above-described units on a computer. In terms of actual hardware, for example, the control device reads the program from the external storage device into the storage device and executes it, whereby the above-described units are realized on the computer.

Note that the present invention is not limited to the above embodiments as they are, and at the implementation stage the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the plurality of constituent elements disclosed in the above embodiments. For example, some constituent elements may be deleted from all the constituent elements shown in an embodiment. Furthermore, constituent elements from different embodiments may be combined as appropriate.

For example, unless it is contrary to their nature, the steps in the flowcharts of the above embodiments may be executed in a different order, several of them may be executed simultaneously, or they may be executed in a different order each time.

As described above, according to the above embodiments, images whose environment differs from that of the images used when the detector was trained can be collected easily, and the detection performance for the object can be improved through learning.

DESCRIPTION OF SYMBOLS: 1 ... learning image collection device, 2 ... learning device, 3 ... object detection device, 10 ... acquisition unit, 11 ... extraction unit, 12 ... calculation unit, 13, 53 ... selection unit, 24 ... learning unit, 30 ... imaging unit, 35 ... detection unit, 46 ... data distribution determination unit

Claims

A learning image collection device comprising: an acquisition unit that acquires an image including an object; an extraction unit that extracts, from the image, a plurality of candidate areas that are candidates for the object; a calculation unit that calculates any one of a first similarity between a candidate area and a predetermined area, a second similarity between the size of the candidate area and a predetermined size of the object, and a third similarity among the plurality of candidate areas; and
a selection unit that selects the candidate area as an object area including the object when any one of the first similarity, the second similarity, and the third similarity is greater than a predetermined threshold.

The acquisition unit acquires a plurality of time-series images including the object,
The extraction unit further extracts a plurality of candidate region positions,
The calculation unit further calculates a change in position of the plurality of candidate regions in time series as a fourth similarity,
The selection unit selects the candidate region as the target region when any one of the first similarity, the second similarity, the third similarity, and the fourth similarity is greater than a predetermined threshold. The learning image collecting apparatus according to claim 1, wherein:

The calculation unit further calculates the amount of movement of the candidate area between time series images using the similarity of each of the candidate areas of the time series image,
The learning image collecting apparatus according to claim 2, wherein the selection unit sets the candidate region as the target region when the movement amount is larger than a predetermined threshold.

The object is a person,
The learning image collecting apparatus according to claim 1, wherein the size used for the second similarity is an aspect ratio of the candidate area.

The learning image collecting apparatus according to claim 4, wherein the selection unit selects the object region using the second similarity and the fourth similarity, and,
among the candidate areas where the second similarity is larger than a predetermined threshold, a candidate area is excluded from the object area when the fourth similarity is smaller than a predetermined threshold.

The learning image collection device, wherein the selection unit selects the object region using the second similarity and the movement amount, and,
among the candidate areas having the second similarity greater than a predetermined threshold, a candidate area is excluded from the object area when the movement amount is smaller than a predetermined threshold.

A learning device comprising: an acquisition unit that acquires an image including an object; an extraction unit that extracts, from the image, a plurality of candidate areas that are candidates for the object; a calculation unit that calculates any one of a first similarity between a candidate area and a predetermined area, a second similarity between the size of the candidate area and a predetermined size of the object, and a third similarity among the plurality of candidate areas;
a selection unit that selects the candidate area as an object area including the object when any one of the first similarity, the second similarity, and the third similarity is greater than a predetermined threshold; and
a learning unit that trains a discriminator that identifies the object using the object area as teacher data.

The object is a person,
The learning apparatus according to claim 7, wherein the size used for the second similarity is an aspect ratio of the candidate area.

The learning device according to claim 8, wherein the selection unit selects the object region using the second similarity and the fourth similarity, and,
among the candidate regions with the second similarity greater than a predetermined threshold, selects the object region by excluding a candidate region from the object region when the fourth similarity is smaller than a predetermined threshold.

The learning device according to claim 8, wherein the selection unit selects the object region using the second similarity and the movement amount, and,
among the candidate areas having the second similarity greater than a predetermined threshold, selects the object region by excluding a candidate area from the object area when the movement amount is smaller than a predetermined threshold.

An object detection device comprising: an imaging unit that captures an image including an object;
an extraction unit that extracts, from the image, a plurality of candidate areas that are candidates for the object; a calculation unit that calculates any one of a first similarity between a candidate area and a predetermined area, a second similarity between the size of the candidate area and a predetermined size of the object, and a third similarity among the plurality of candidate areas;
a selection unit that selects the candidate area as an object area including the object when any one of the first similarity, the second similarity, and the third similarity is greater than a predetermined threshold; a learning unit that trains a classifier that identifies the object; and a detection unit that detects the object from the image.

The object is a person,
The object detection apparatus according to claim 11, wherein the size used for the second similarity is an aspect ratio of the candidate area.

The learning device according to claim 12, wherein the selection unit selects the object region using the second similarity and the fourth similarity, and,
among the candidate areas where the second similarity is larger than a predetermined threshold, a candidate area is excluded from the object area when the fourth similarity is smaller than a predetermined threshold.

The learning apparatus according to claim 12, wherein the selection unit selects the object region using the second similarity and the movement amount, and,
among the candidate areas having the second similarity greater than a predetermined threshold, a candidate area is excluded from the object area when the movement amount is smaller than a predetermined threshold.