JP2010211468A

JP2010211468A - Device for creating learning model, object detection system, and program

Info

Publication number: JP2010211468A
Application number: JP2009056350A
Authority: JP
Inventors: Miyako Baba; 美也子馬場; Koichiro Yamaguchi; 晃一郎山口; Yoshiki Ninomiya; 芳樹二宮; Toshiyasu Katsuno; 歳康勝野; Yoshimasa Hara; 祥雅原
Original assignee: Denso Corp; Toyota Central R&D Labs Inc
Current assignee: Denso Corp; Toyota Central R&D Labs Inc
Priority date: 2009-03-10
Filing date: 2009-03-10
Publication date: 2010-09-24
Anticipated expiration: 2029-03-10
Also published as: JP5063632B2

Abstract

PROBLEM TO BE SOLVED: To create a learning model with a reduced number of features while ensuring detection performance substantially equal to that of the related art.

SOLUTION: A plurality of learning images are classified in advance into ranks 1 to 4 according to how easy or difficult the detection object is to detect, from rank 1 (easiest to detect) to rank 4 (most difficult to detect). From these, learning images are selected so that the ratio of the number of learning images classified into ranks 1 and 4 to the number N1 of learning images classified into the intermediate ranks 2 and 3 is a predetermined value or less, and a learning model including a set of features, the number of which depends on the selected learning images, is created based on the selected learning images.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a learning model generation device, an object detection system, and a program.

Conventionally, an object detection system is known that includes a learning model generation device and an object detection device (see, for example, Patent Document 1). The learning model generation device generates a learning model consisting of a set of Haar-like features from two types of learning images, namely images of an object (for example, a person; also called a detection object) and images of non-objects (non-detection objects), using, for example, the method of Viola and Jones (the cascaded boosting algorithm described in Paul Viola and Michael Jones, "Rapid Object Detection using a Boosted Cascade of Simple Features", IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2001). The object detection device includes detection means for detecting the object from an input image based on the input image and the learning model generated by the learning model generation device.

Patent Document 1: JP 2008-165496 A
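For readers unfamiliar with the Haar-like features mentioned above, the following is a minimal sketch of how one such feature can be computed with an integral image, as in the Viola and Jones method. The function names and the choice of a two-rectangle (left minus right) feature are illustrative assumptions, not taken from this publication.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with an extra zero row and column at the top/left."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def two_rect_feature(ii, top, left, height, width):
    """Hypothetical two-rectangle feature: left half minus right half.

    Assumes `width` is even so both halves have the same area.
    """
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum
```

Because every rectangle sum costs four lookups regardless of its size, the cost of evaluating a learning model grows with the number of features it contains, which is the quantity this publication aims to reduce.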

However, the number of Haar-like features constituting the generated learning model (the number of features) varies greatly with the quality of the learning images. Here, the quality of a learning image is determined by, for example, its resolution, the size of the portion of the detection object that is hidden by other objects, the degree of blur, and the level of background noise. The quality improves as the resolution increases, as the hidden portion of the detection object becomes smaller, as the degree of blur decreases, and as the background noise decreases. Accordingly, the number of features increases as the learning images include more images with low resolution, more images in which a large portion of the detection object is hidden by other objects, more images with a large degree of blur, or more images with high background noise.

The learning model generation device described in Patent Document 1 does not take the quality of the learning images into account and generates the learning model from randomly chosen learning images. The number of features may therefore become large, which increases the size of the learning model and, in turn, the capacity required of the storage means that stores it. In addition, when the number of features is large, the detection processing (identification processing) of the detection object in the object detection device takes a long time.

To reduce the number of features, one could generate the learning model using only high-quality learning images (for example, learning images with high resolution, little occlusion of the detection object by other objects, little blur, and low background noise). However, a learning model generated only from such images is not based on objects of varied quality, so it cannot cope with the variety of detection objects to be detected, and the detection performance of the object detection device is degraded.

The present invention has been made to solve the problems described above, and its object is to provide a learning model generation device, an object detection system, and a program capable of generating a learning model that achieves detection performance comparable to that of the conventional technique with a reduced number of features.

To achieve the above object, the learning model generation device according to the first invention includes: selection means for selecting a plurality of learning images from a plurality of learning images that have been classified in advance into a plurality of ranks according to the degree of ease or difficulty of detecting the detection object, ranging from the rank at which the detection object is easiest to detect to the rank at which it is most difficult to detect, such that the ratio of the number of learning images classified into the easiest-to-detect rank and the most-difficult-to-detect rank to the number of learning images classified into the ranks between them is equal to or less than a predetermined value; and generation means for generating, based on the learning images selected by the selection means, a learning model including a set of features whose number depends on the selected learning images.

According to the learning model generation device of the first invention, a plurality of learning images are selected such that the ratio of the number of learning images classified into the easiest-to-detect and most-difficult-to-detect ranks to the number of learning images classified into the ranks between them is equal to or less than the predetermined value, and a learning model including a set of features whose number depends on the selected learning images is generated based on the selected learning images. Compared with the conventional technique, a learning model generated in this way has comparable detection performance and a reduced number of features.

Therefore, the learning model generation device according to the first invention can generate a learning model that has detection performance comparable to that of the conventional technique and a reduced number of features.
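The selection condition of the first invention can be illustrated with a short sketch. The approach below (keep every intermediate-rank image, then randomly sample extreme-rank images up to the allowed count) is only one possible way to satisfy the ratio condition; the function name `select_training_images`, the dictionary layout, and the `seed` parameter are assumptions introduced for illustration.

```python
import random

def select_training_images(images_by_rank, max_ratio, seed=0):
    """Select images so that (#extreme-rank) / (#intermediate-rank) <= max_ratio.

    images_by_rank maps a rank number (lowest = easiest to detect,
    highest = most difficult to detect) to a list of images.
    Returns (intermediate_images, extreme_images).
    """
    ranks = sorted(images_by_rank)
    easiest, hardest = ranks[0], ranks[-1]

    # All images in the ranks strictly between the easiest and hardest ranks.
    intermediate = [img for r in ranks[1:-1] for img in images_by_rank[r]]

    # Candidates from the two extreme ranks.
    extreme = list(images_by_rank[easiest]) + list(images_by_rank[hardest])

    # Largest number of extreme-rank images that keeps the ratio within bounds.
    limit = int(max_ratio * len(intermediate))

    rng = random.Random(seed)
    rng.shuffle(extreme)
    return intermediate, extreme[:limit]
```

With four ranks, the intermediate ranks are 2 and 3 and the extreme ranks are 1 and 4, which matches the rank layout given in the abstract.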

To achieve the above object, the learning model generation device according to the second invention includes: classification means for classifying each of a plurality of learning images into one of a plurality of ranks, based on a plurality of reference images that have been classified in advance into the plurality of ranks according to the degree of ease or difficulty of detecting the detection object, ranging from the rank at which the detection object is easiest to detect to the rank at which it is most difficult to detect; selection means for selecting a plurality of learning images from the learning images classified by the classification means, such that the ratio of the number of learning images classified into the easiest-to-detect rank and the most-difficult-to-detect rank to the number of learning images classified into the ranks between them is equal to or less than a predetermined value; and generation means for generating, based on the learning images selected by the selection means, a learning model including a set of features whose number depends on the selected learning images.

According to the learning model generation device of the second invention, each of a plurality of learning images is classified into one of the plurality of ranks based on the reference images classified in advance from the easiest-to-detect rank to the most-difficult-to-detect rank; a plurality of learning images are selected such that the ratio of the number of learning images classified into the easiest-to-detect and most-difficult-to-detect ranks to the number of learning images classified into the ranks between them is equal to or less than the predetermined value; and a learning model including a set of features whose number depends on the selected learning images is generated based on the selected learning images. Compared with the conventional technique, a learning model generated in this way has comparable detection performance and a reduced number of features.

Therefore, the learning model generation device according to the second invention can generate a learning model that has detection performance comparable to that of the conventional technique and a reduced number of features.
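The classification step of the second invention assigns each learning image to a rank with reference to pre-classified reference images. The text above does not state how the comparison is made, so the sketch below simply assumes a nearest-reference rule using mean absolute pixel difference; both the rule and the name `classify_by_reference` are hypothetical and stand in for whatever comparison is actually used.

```python
import numpy as np

def classify_by_reference(image, references_by_rank):
    """Return the rank of the reference image closest to `image`.

    ASSUMPTION: closeness is the mean absolute pixel difference between
    equally sized grayscale arrays. The publication does not specify this.
    """
    best_rank, best_dist = None, float("inf")
    for rank, refs in references_by_rank.items():
        for ref in refs:
            diff = image.astype(float) - ref.astype(float)
            dist = float(np.mean(np.abs(diff)))
            if dist < best_dist:
                best_rank, best_dist = rank, dist
    return best_rank
```

The output of such a classifier, grouped by rank, is the input that the selection means needs in order to apply the ratio condition.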

The degree of ease of detecting the detection object may be defined so that it increases as the resolution of the learning image increases, decreases as the portion of the detection object hidden by other objects in the learning image becomes larger, decreases as the degree of blur in the learning image increases, or decreases as the background noise in the learning image increases.

Likewise, the degree of difficulty of detecting the detection object may be defined so that it decreases as the resolution of the learning image increases, increases as the portion of the detection object hidden by other objects in the learning image becomes larger, increases as the degree of blur in the learning image increases, or increases as the background noise in the learning image increases.
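The monotonic relationships described in the two preceding paragraphs can be made concrete with a small sketch. The linear form, the weights, the assumption that all four inputs are normalized to the range 0 to 1, and the thresholds that map a score to ranks 1 to 4 are all hypothetical; the text only requires the direction of each dependence.

```python
def ease_of_detection(resolution, occluded_fraction, blur, noise,
                      weights=(1.0, 1.0, 1.0, 1.0)):
    """Score that rises with resolution and falls with occlusion, blur, noise.

    ASSUMPTION: inputs are normalized to [0, 1] and combined linearly.
    """
    w_res, w_occ, w_blur, w_noise = weights
    return (w_res * resolution
            - w_occ * occluded_fraction
            - w_blur * blur
            - w_noise * noise)

def difficulty_of_detection(resolution, occluded_fraction, blur, noise):
    """Difficulty is modeled here as the negated ease score."""
    return -ease_of_detection(resolution, occluded_fraction, blur, noise)

def to_rank(score, thresholds=(0.5, 0.0, -0.5)):
    """Map an ease score to a rank; rank 1 is the easiest to detect.

    ASSUMPTION: the threshold values are placeholders for illustration.
    """
    for rank, limit in enumerate(thresholds, start=1):
        if score >= limit:
            return rank
    return len(thresholds) + 1
```

Any other function with the same monotonic behavior would fit the description equally well.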

To achieve the above object, the object detection system according to the third invention includes the learning model generation device described above and an object detection device having detection means for detecting an object from an input image based on the input image and the learning model generated by the learning model generation device.

To achieve the above object, the program according to the fourth invention causes a computer to function as: selection means for selecting a plurality of learning images from a plurality of learning images that have been classified in advance into a plurality of ranks according to the degree of ease or difficulty of detecting the detection object, ranging from the rank at which the detection object is easiest to detect to the rank at which it is most difficult to detect, such that the ratio of the number of learning images classified into the easiest-to-detect rank and the most-difficult-to-detect rank to the number of learning images classified into the ranks between them is equal to or less than a predetermined value; and generation means for generating, based on the learning images selected by the selection means, a learning model including features whose number depends on the selected learning images.

According to the program of the fourth invention, the computer selects a plurality of learning images such that the ratio of the number of learning images classified into the easiest-to-detect and most-difficult-to-detect ranks to the number of learning images classified into the ranks between them is equal to or less than the predetermined value, and generates, based on the selected learning images, a learning model including a set of features whose number depends on the selected learning images. Compared with the conventional technique, a learning model generated in this way has comparable detection performance and a reduced number of features.

Therefore, the program according to the fourth invention can generate a learning model that has detection performance comparable to that of the conventional technique and a reduced number of features.

To achieve the above object, the program according to the fifth invention causes a computer to function as: classification means for classifying each of a plurality of learning images into one of a plurality of ranks, based on a plurality of reference images that have been classified in advance into the plurality of ranks according to the degree of ease or difficulty of detecting the detection object, ranging from the rank at which the detection object is easiest to detect to the rank at which it is most difficult to detect; selection means for selecting a plurality of learning images from the learning images classified by the classification means, such that the ratio of the number of learning images classified into the easiest-to-detect rank and the most-difficult-to-detect rank to the number of learning images classified into the ranks between them is equal to or less than a predetermined value; and generation means for generating, based on the learning images selected by the selection means, a learning model including features whose number depends on the selected learning images.

According to the program of the fifth invention, the computer classifies each of a plurality of learning images into one of the plurality of ranks based on the reference images classified in advance from the easiest-to-detect rank to the most-difficult-to-detect rank; selects a plurality of learning images such that the ratio of the number of learning images classified into the easiest-to-detect and most-difficult-to-detect ranks to the number of learning images classified into the ranks between them is equal to or less than the predetermined value; and generates, based on the selected learning images, a learning model including a set of features whose number depends on the selected learning images. Compared with the conventional technique, a learning model generated in this way has comparable detection performance and a reduced number of features.

Therefore, the program according to the fifth invention can generate a learning model that has detection performance comparable to that of the conventional technique and a reduced number of features.

As described above, the learning model generation device, the object detection system, and the program according to the present invention provide the effect that a learning model can be generated that has detection performance comparable to that of the conventional technique and a reduced number of features.

A diagram showing the object detection system according to the present embodiment.

A diagram showing an example of the reference images according to the present embodiment.

A flowchart of the processing routine of the learning model generation process executed by the learning model generation device 20 according to the present embodiment.

A flowchart of the processing routine of the object detection process executed by the object detection device 30 according to the present embodiment.

A graph comparing experimental results for the pedestrian detection performance obtained when the learning model used for pedestrian detection is generated from randomly selected learning images (that is, the conventional technique) with the pedestrian detection performance of the object detection system of the present embodiment.

A graph comparing experimental results for the number of features obtained when the learning model used for pedestrian detection is generated from randomly selected learning images with the number of features of the object detection system of the present embodiment.

Hereinafter, with reference to the drawings, an embodiment will be described in detail in which the present invention is applied to an object detection system including a learning model generation device that generates a learning model for detecting a person (more specifically, a pedestrian) as the detection object, and an object detection device having detection means for detecting a person as the detection object from an input image based on the input image and the learning model generated by the learning model generation device.

FIG. 1 is a diagram showing the configuration of an object detection system 10 according to an embodiment of the present invention. The object detection system 10 according to the present embodiment includes an object detection device 30 that detects the detection object and a learning model generation device 20 that generates the learning model used by the object detection device 30.

The object detection device 30 includes a computer having a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and a bus connecting them (none of which are shown). The CPU controls the object detection device 30 as a whole by reading programs from the ROM and executing them. The ROM stores basic programs such as an OS and a program for executing the processing routine of the object detection process described later. The RAM serves as a work area in which data is temporarily stored.

Described in terms of functional blocks, one for each function realization means determined by hardware and software, the object detection device 30 can be represented as including a window image extraction unit 32, an identification unit 34, and a result output unit 36, as shown in FIG. 1.

The window image extraction unit 32 extracts window images (images of regions defined by a window) from an input image supplied from outside. In the present embodiment, when the detection object is detected from an input image, a window of a predetermined size (called a search window) is moved over the input image by a predetermined amount (called a search pitch) at a time, an image is cut out (extracted) at each position, and the detection object is searched for in the cut-out image. Here, the cut-out image is called a window image, and the size of the window image (that is, the size of the search window) is called the window size. A plurality of window sizes are set, and the window image extraction unit 32 extracts window images using all of the search windows that have been set. The window image extraction unit 32 also converts each extracted window image into an image with a preset number of pixels (for example, an image of 16 × 32 pixels).
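The window extraction just described can be sketched as a generator that slides search windows of several sizes across the image at a fixed search pitch and rescales each window image to a common size. This is an illustration only: the nearest-neighbor rescaling, the interpretation of 16 × 32 as 16 pixels wide by 32 pixels high, and the name `extract_windows` are assumptions.

```python
import numpy as np

def extract_windows(image, window_sizes, pitch, out_size=(32, 16)):
    """Yield (top, left, (win_h, win_w), patch) for every search position.

    image        : 2-D grayscale array
    window_sizes : iterable of (height, width) search-window sizes
    pitch        : search pitch in pixels, used in both directions
    out_size     : (height, width) of the rescaled window image (assumed)
    """
    out_h, out_w = out_size
    img_h, img_w = image.shape
    for win_h, win_w in window_sizes:
        # Nearest-neighbor index maps from output pixels to window pixels.
        rows = (np.arange(out_h) * win_h) // out_h
        cols = (np.arange(out_w) * win_w) // out_w
        for top in range(0, img_h - win_h + 1, pitch):
            for left in range(0, img_w - win_w + 1, pitch):
                window = image[top:top + win_h, left:left + win_w]
                yield top, left, (win_h, win_w), window[np.ix_(rows, cols)]
```

Each yielded patch has the same shape, so the same learning model can be applied to windows of every size.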

The identification unit (evaluation unit) 34 uses the window image and the learning model described later to calculate an evaluation value indicating how likely the window image is to show the detection object, and detects the object. That is, the identification unit 34 compares the window image with the learning model to determine whether the window image shows the detection object (a person in the present embodiment), and when the result is affirmative, treats the window image as an image of a person. In this way, the detection object is detected from the input image.
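As a rough illustration of how an evaluation value might be computed from a learning model made of boosted features, the sketch below evaluates a cascade of stages in the style of Viola and Jones. The class names, the thresholded weak-classifier form, and the early-rejection rule are assumptions drawn from that general method, not details stated in this publication.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class WeakClassifier:
    feature: Callable      # maps a window image to a scalar feature value
    threshold: float
    polarity: int          # +1 or -1, direction of the inequality
    weight: float          # contribution to the stage score

    def vote(self, window):
        value = self.feature(window)
        return 1 if self.polarity * value < self.polarity * self.threshold else 0

@dataclass
class Stage:
    weak: List[WeakClassifier]
    stage_threshold: float

def evaluate(window, stages):
    """Return (is_object, score of the last stage evaluated).

    A window is rejected as soon as one stage score falls below that
    stage's threshold; it is accepted only if it passes every stage.
    """
    score = 0.0
    for stage in stages:
        score = sum(w.weight * w.vote(window) for w in stage.weak)
        if score < stage.stage_threshold:
            return False, score
    return True, score
```

The total number of `WeakClassifier` entries corresponds to the number of features, so fewer features means less work per window image.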

The result output unit 36 outputs, as the detection result for the detection object, object presence information indicating whether the detection object is present, image data representing the image of the detection object, position information indicating the position of the detection object, and the like to notification means such as a display device 38, thereby controlling the display device 38 so that, for example, the detection result is displayed superimposed on the input image.

In the object detection device 30, window images are extracted while the search window of each preset size is moved over the input image by the search pitch, and the detection process is executed for every extracted window image. The processing in the window image extraction unit 32 and the identification unit (evaluation unit) 34 is therefore repeated as many times as there are window images to be extracted.

The learning model generation device 20 includes a computer having a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), an HDD (Hard Disk Drive), and a bus connecting them (none of which are shown). The CPU reads programs from the ROM and the HDD and executes them. The ROM stores basic programs such as an OS. The RAM serves as a work area in which data is temporarily stored. The HDD stores a program for executing the processing routine of the learning model generation process described later.

The HDD also stores a plurality of reference images classified in advance into ranks 1 to 4, shown in Table 1 below, according to the degree of ease or difficulty of detecting the detection object (a person in the present embodiment, more specifically a pedestrian), ranging from the rank at which the detection object is easiest to detect to the rank at which it is most difficult to detect.

The ranks are explained here. In the present embodiment, the better the quality of an image, the lower the number of the rank to which it belongs. The quality of an image is determined by, for example, its resolution, the size of the portion of the detection object hidden by other objects, the degree of blur, and the level of background noise. The quality improves as the resolution increases, as the hidden portion becomes smaller, as the degree of blur decreases, and as the background noise decreases. That is, the degree of ease of detecting the detection object is defined so that it increases as the resolution of the learning image increases, decreases as the portion of the detection object hidden by other objects becomes larger, decreases as the degree of blur increases, or decreases as the background noise increases. Likewise, the degree of difficulty of detecting the detection object is defined so that it decreases as the resolution of the learning image increases, increases as the hidden portion becomes larger, increases as the degree of blur increases, or increases as the background noise increases.

本実施の形態では、本検出対象システム１０の設計者等によって、予め、ランク１〜４の各々のランクに対応する複数の基準画像が分類されて、ＨＤＤに記憶されている。例えば、図２に示すように、設計者等の手作業によって、「ボケや隠れ、背景雑音も少なく、画質が良好な歩行者」の基準画像６０ａがランク１に分類され、「多少のボケや隠れ、背景雑音があるが、比較的画質が良好な歩行者」の基準画像６０ｂがランク２に分類され、「ボケや隠れ、背景雑音があり、画質が悪く、検出は難しいと思われる歩行者」の基準画像６０ｃがランク３に分類され、「ボケや隠れ、背景雑音が大きく、検出は不可能と思われる歩行者」の基準画像６０ｄがランク４に分類されて、基準画像６０ａ〜ｄがＨＤＤに記憶される。 In the present embodiment, a designer of the object detection system 10 or the like classifies a plurality of reference images into ranks 1 to 4 in advance and stores them in the HDD. For example, as shown in FIG. 2, the designer or the like manually classifies a reference image 60a of "a pedestrian with little blur, hiding, or background noise and good image quality" into rank 1, a reference image 60b of "a pedestrian with some blur, hiding, or background noise but relatively good image quality" into rank 2, a reference image 60c of "a pedestrian with blur, hiding, and background noise, poor image quality, and likely difficult to detect" into rank 3, and a reference image 60d of "a pedestrian with heavy blur, hiding, and background noise, likely impossible to detect" into rank 4, and the reference images 60a to 60d are stored in the HDD.

この学習モデル生成装置２０を、ハードウェアとソフトウェアとに基づいて定まる機能実現手段毎に分割した機能ブロックで説明すると、図１に示すように、学習用画像分類部４２と、学習用画像選択部４４と、学習部４６と、学習モデルＤＢ（識別用モデルＤＢ）４８と、を含んだ構成で表すことができる。 When the learning model generation device 20 is described in terms of functional blocks, divided by function realization means determined by hardware and software, it can be represented, as shown in FIG. 1, by a configuration including a learning image classification unit 42, a learning image selection unit 44, a learning unit 46, and a learning model DB (identification model DB) 48.

学習用画像分類部４２には、予め設定されたウインドウサイズの学習用画像が複数枚（例えば約３万枚）入力される。学習用画像としては、検出対象物を含む画像（ポジティブデータ）、および検出対象物を含まない非対象物の画像（ネガティブデータ）が考えられる。学習用画像分類部４２は、検出対象物を含む学習用画像が１枚入力されると、ＨＤＤに記憶された基準画像６０ａ〜ｄに基づいて、例えば、ＮｅａｒｅｓｔＮｅｉｇｈｂｏｒ（最近傍決定則）により、入力された検出対象物を含む学習用画像を、複数のランク１〜４のうちの何れかのランクに分類する。そして、学習用画像分類部４２は、入力された検出対象物を含む画像の枚数分だけ、この分類を行う。なお、学習用画像分類部４２は、非対象物の学習用画像が入力されると、その学習用画像については上記の分類を行わない。 A plurality of learning images (for example, about 30,000) of a preset window size are input to the learning image classification unit 42. The learning images are images that contain the detection object (positive data) and images of non-objects that do not contain the detection object (negative data). When one learning image containing the detection object is input, the learning image classification unit 42 classifies it into one of ranks 1 to 4 based on the reference images 60a to 60d stored in the HDD, for example by the nearest neighbor rule. The learning image classification unit 42 repeats this classification for every input image that contains the detection object. When a learning image of a non-object is input, the learning image classification unit 42 does not classify it.

学習用画像選択部４４は、下記の式（１）に従って、学習用画像分類部４２によって対応するランクに分類された複数の学習用画像に基づいて、最も検出し易いランク１と最も検出し難いランク４との間のランク２及びランク３に分類された複数の学習用画像の数Ｎ_１に対する、ランク１とランク４とに分類された複数の学習用画像の数Ｎ_２の割合が所定値Ｔ以下となるように、複数Ｍ_１（Ｎ_１＋Ｎ_２）枚の検出対象物の学習用画像を選択する。 In accordance with equation (1) below, the learning image selection unit 44 selects M₁ (= N₁ + N₂) learning images of the detection object from the learning images classified into ranks by the learning image classification unit 42. The selection is made so that the ratio of the number N₂ of learning images classified into rank 1 and rank 4 to the number N₁ of learning images classified into rank 2 and rank 3, which lie between the easiest rank 1 and the hardest rank 4, is at most a predetermined value T.
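
Equation (1) itself is not reproduced in this text. Based only on the sentence above, the selection condition can be read as the following inequality; this is a reconstruction from the surrounding description, and the original layout of the equation may differ.

```latex
\frac{N_2}{N_1} \le T, \qquad M_1 = N_1 + N_2
```

Here $N_1$ counts the selected images of ranks 2 and 3, $N_2$ counts the selected images of ranks 1 and 4, and $M_1$ is the total number of selected images of the detection object.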

例えば、このＴの値を、例えば１．０に設定した場合には、学習用画像選択部４４は、例えば、ランク１に分類された学習用画像５０００枚、ランク２に分類された学習用画像５０００枚、ランク３に分類された学習用画像０枚、ランク４に分類された学習用画像０枚を選択する。なお、このとき、ランク２及びランク３に分類され、かつ選択される学習用画像の数Ｎ_１は５０００（５０００＋０）であり、ランク１とランク４とに分類され、かつ選択される学習用画像の数Ｎ_２は５０００（５０００＋０）であり、「Ｎ_２／Ｎ_１」の値は、１（５０００／５０００）となる。 For example, when T is set to 1.0, the learning image selection unit 44 selects, for example, 5000 learning images classified into rank 1, 5000 learning images classified into rank 2, 0 learning images classified into rank 3, and 0 learning images classified into rank 4. In this case, the number N₁ of selected learning images classified into rank 2 or rank 3 is 5000 (5000 + 0), the number N₂ of selected learning images classified into rank 1 or rank 4 is 5000 (5000 + 0), and the value of "N₂ / N₁" is 1 (5000 / 5000).

同様に、Ｔの値を、例えば０．２に設定した場合には、学習用画像選択部４４は、例えば、ランク１に分類された学習用画像１０００枚、ランク２に分類された学習用画像５０００枚、ランク３に分類された学習用画像１０００枚、ランク４に分類された学習用画像０枚を選択する。なお、このとき、ランク２及びランク３に分類され、かつ選択される学習用画像の数Ｎ_１は６０００（１０００＋５０００）であり、ランク１とランク４とに分類され、かつ選択される学習用画像の数Ｎ_２は１０００（１０００＋０）であり、「Ｎ_２／Ｎ_１」の値は、０．１６・・・（１０００／６０００）となる。 Similarly, when T is set to 0.2, the learning image selection unit 44 selects, for example, 1000 learning images classified into rank 1, 5000 learning images classified into rank 2, 1000 learning images classified into rank 3, and 0 learning images classified into rank 4. In this case, the number N₁ of selected learning images classified into rank 2 or rank 3 is 6000 (1000 + 5000), the number N₂ of selected learning images classified into rank 1 or rank 4 is 1000 (1000 + 0), and the value of "N₂ / N₁" is 0.16... (1000 / 6000).
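
The two numerical examples above can be checked with a short sketch. The function and variable names below are hypothetical and chosen only for illustration; the description does not specify how the selection unit is implemented, and the handling of N₁ = 0 is an assumption.

```python
from typing import Dict


def extreme_to_middle_ratio(selected: Dict[int, int]) -> float:
    """Return N2 / N1 for per-rank counts of selected positive images.

    `selected` maps a rank (1 to 4) to the number of selected images.
    """
    n1 = selected.get(2, 0) + selected.get(3, 0)  # middle ranks 2 and 3
    n2 = selected.get(1, 0) + selected.get(4, 0)  # extreme ranks 1 and 4
    if n1 == 0:
        # Assumption: the ratio is treated as undefined when N1 is zero.
        raise ValueError("N1 must be positive for the ratio to be defined")
    return n2 / n1


def satisfies_threshold(selected: Dict[int, int], t: float) -> bool:
    """True when the selection keeps N2 / N1 at or below T."""
    return extreme_to_middle_ratio(selected) <= t


# Per-rank counts taken from the two examples in the text.
example_t_1_0 = {1: 5000, 2: 5000, 3: 0, 4: 0}
example_t_0_2 = {1: 1000, 2: 5000, 3: 1000, 4: 0}

print(extreme_to_middle_ratio(example_t_1_0), satisfies_threshold(example_t_1_0, 1.0))
print(extreme_to_middle_ratio(example_t_0_2), satisfies_threshold(example_t_0_2, 0.2))
```

Note that the first selection meets T = 1.0 but would not meet T = 0.2, which is why the second example draws fewer rank 1 images.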

また、学習用画像選択部４４は、学習用画像分類部４２によって分類されなかった非対象物の学習用画像を複数Ｍ_２枚選択する。 Further, the learning image selection unit 44 selects a plurality of M ₂ learning images of non-objects that have not been classified by the learning image classification unit 42.

学習部４６は、学習用画像選択部４４によって選択されたＭ_１枚の検出対象物の学習用画像、及び学習用画像選択部４４によって選択されたＭ_２枚の非対象物の学習用画像から検出対象物の特徴を示す学習モデルを生成する。 The learning unit 46 generates a learning model representing the features of the detection object from the M₁ learning images of the detection object and the M₂ learning images of non-objects selected by the learning image selection unit 44.

学習部４６で生成された学習モデルは、学習モデルＤＢ４８に記憶される。なお、本実施の形態では、学習モデルＤＢ４８を、記憶手段としてのＨＤＤを用いて実現する例について説明するが、本発明はこれに限られない。例えば、学習モデルＤＢ４８は、ＣＤ−ＲＯＭ等のように、外付けの記憶手段であって、学習モデルを記憶できる媒体により構成されていれば、特に限定されない。学習モデルＤＢ４８に記憶された学習モデルは、対象物検出装置３０の識別部３４で対象物の検出に利用される。 The learning model generated by the learning unit 46 is stored in the learning model DB 48. In the present embodiment, an example in which the learning model DB 48 is realized using an HDD as a storage unit will be described, but the present invention is not limited to this. For example, the learning model DB 48 is not particularly limited as long as the learning model DB 48 is an external storage unit such as a CD-ROM and includes a medium that can store the learning model. The learning model stored in the learning model DB 48 is used for detection of an object by the identification unit 34 of the object detection device 30.

なお、本実施の形態では、学習モデル生成装置２０と対象物検出装置３０とを互いに独立したコンピュータで実現する例について説明するが、これらを同一コンピュータ上で実現するようにしてもよい。 In the present embodiment, an example in which the learning model generation device 20 and the object detection device 30 are realized by independent computers will be described, but these may be realized on the same computer.

以上のように構成された本実施の形態の対象物検出システム１０では、上述したように、Ｖｉｏｌａ＆Ｊｏｎｅｓの手法（以下、Ｖ＆Ｊと記す）を用いて、入力画像から歩行者等の検出対象物を検出する。 In the object detection system 10 of the present embodiment configured as described above, detection objects such as pedestrians are detected from an input image using the Viola & Jones method (hereinafter referred to as V&J), as described above.

Ｖ＆Ｊ手法は２クラス識別を行う手法で、Ｈａａｒ−ｌｉｋｅ特徴の利用とそのインテグラルイメージによる高速演算法、カスケード接続のＡｄａｂｏｏｓｔ型の識別器により、高速な識別処理を実現している（Paul Viola and Michael Jones :"Rapid Object Detection using a Boosted Cascade of Simple Features", IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2001.を参照。）。 The V&J method is a two-class classification method. It achieves high-speed classification by using Haar-like features, computing them quickly with an integral image, and applying a cascade of AdaBoost classifiers (see Paul Viola and Michael Jones: "Rapid Object Detection using a Boosted Cascade of Simple Features", IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2001).
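
The integral image mentioned above lets the sum of any rectangle be read with four table lookups, which is what makes Haar-like features cheap to evaluate. The following is a minimal pure-Python sketch of that idea, not the implementation used in the embodiment; the function names and the particular two-rectangle feature are illustrative assumptions.

```python
from typing import List

Image = List[List[int]]


def integral_image(img: Image) -> Image:
    """Build a summed-area table with an extra zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Image, top: int, left: int, height: int, width: int) -> int:
    """Sum of pixels in a rectangle, using four lookups in the table."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_horizontal(ii: Image, top: int, left: int, height: int, width: int) -> int:
    """Illustrative Haar-like feature: left half minus right half (even width)."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum


sample = [[1, 2, 3],
          [4, 5, 6]]
table = integral_image(sample)
print(rect_sum(table, 0, 0, 2, 3))             # sum of all six pixels
print(two_rect_horizontal(table, 0, 0, 2, 2))  # (1 + 4) - (2 + 5)
```

Because the table is built once per image, each feature costs a constant number of lookups regardless of the rectangle size.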

Ｖ＆Ｊ手法に従い、本実施の形態の学習モデル生成装置２０は、検出対象物の画像と、非対象の画像の２種類の学習用画像から、Ｈａａｒ−ｌｉｋｅ特徴の集合からなる学習モデル（対象物か否かを識別するために用いる辞書の役割をするもの）を生成する。 Following the V&J method, the learning model generation device 20 of the present embodiment generates, from two types of learning images (images of the detection object and images of non-objects), a learning model consisting of a set of Haar-like features. The learning model serves as a dictionary used to decide whether or not an image shows the object.

また、対象物検出装置３０では、前述したように、入力画像に対して探索ウインドウと呼ばれる複数の大きさのウインドウを縦横に少しずつずらしながらウインドウ画像を抽出し、該ウインドウ画像と上記生成された学習モデルとの比較を行うことにより、対象物を抽出する。 As described above, the object detection device 30 extracts window images by shifting windows of several sizes, called search windows, little by little vertically and horizontally over the input image, and extracts the object by comparing each window image with the generated learning model.

次に、学習モデル生成装置２０のＣＰＵが実行する学習モデル生成処理の処理ルーチンについて図３を用いて説明する。なお、この学習モデル生成処理は、本実施の形態では、例えば、図示しない指示受付手段（例えばタッチパネルやキーボード等）が学習モデル生成装置２０に設けられており、この指示受付手段をユーザが操作することにより、学習モデルを生成する指示を、学習モデル生成装置２０のＣＰＵが受け付けた場合に実行される。また、ここでは、学習の手法として、上述したＶ＆Ｊ手法を用いることとする。また、検出対象物の画像と非対象物の画像との２種類の学習用画像を所定枚数（例えば、各１５０００枚）予め用意しておく。また、入力される学習用画像のサイズは、予め設定されたサイズ（例えば１６×３２画素）とする。 Next, the processing routine of the learning model generation process executed by the CPU of the learning model generation device 20 is described with reference to FIG. 3. In the present embodiment, for example, the learning model generation device 20 is provided with instruction receiving means (not shown; for example, a touch panel or a keyboard), and the process is executed when the CPU of the learning model generation device 20 receives an instruction to generate a learning model, issued by the user operating the instruction receiving means. The V&J method described above is used as the learning method. A predetermined number (for example, 15000 each) of the two types of learning images, images of the detection object and images of non-objects, are prepared in advance. The input learning images have a preset size (for example, 16 × 32 pixels).

まず、ステップ１００で、学習用画像を１枚読み込む。 First, at step 100, one learning image is read.

次のステップ１０２では、入力された学習用画像が検出対象物を含む画像であるか否かを判定し、入力された学習用画像が検出対象物を含む画像であると判定された場合には、ＨＤＤに記憶された基準画像６０ａ〜ｄに基づいて、例えば、ＮｅａｒｅｓｔＮｅｉｇｈｂｏｒ（最近傍決定則）により、入力された検出対象物を含む学習用画像を、複数のランク１〜４のうちの何れかのランクに分類する。これにより、入力された検出対象物を含む学習用画像を、基準画像６０ａ〜ｄ中の最も類似した基準画像と同じランクに分類することができる。なお、画像間の類似度として、画素値そのものや、４方向面特徴などの特徴量の正規化相関を用いてもよいし、類似性を表す指標として、ユークリッド距離や、市街地距離等を用いてもよい。また、入力された学習用画像が検出対象物を含む画像であるか否かを判定する方法としては、例えば、学習用画像に含まれているラベル（検出対象物を含む画像であるか否かを示すラベル）に基づいて判定する方法が考えられる。 In the next step 102, it is determined whether the input learning image contains the detection object. If it does, the image is classified into one of ranks 1 to 4 based on the reference images 60a to 60d stored in the HDD, for example by the nearest neighbor rule. The input learning image is thereby assigned the same rank as the most similar of the reference images 60a to 60d. As the similarity between images, a normalized correlation of the pixel values themselves or of feature quantities such as four-directional plane features may be used; alternatively, the Euclidean distance, the city-block distance, or the like may be used as an index of similarity. Whether the input learning image contains the detection object can be determined, for example, from a label included in the learning image that indicates whether it contains the detection object.
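
The nearest neighbor classification in step 102 can be sketched as follows, using the normalized correlation of raw pixel values as the similarity. This is only an illustration under stated assumptions: images are represented as flat lists of pixel values, the names are hypothetical, and the embodiment may use other features or distance measures.

```python
import math
from typing import Dict, List, Sequence


def normalized_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-mean normalized correlation of two equally sized pixel vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # assumption: a flat image is treated as uncorrelated
    return sum(x * y for x, y in zip(da, db)) / denom


def classify_rank(image: Sequence[float],
                  references: Dict[int, List[Sequence[float]]]) -> int:
    """Return the rank of the reference image most similar to `image`."""
    best_rank, best_score = None, -math.inf
    for rank, refs in references.items():
        for ref in refs:
            score = normalized_correlation(image, ref)
            if score > best_score:
                best_rank, best_score = rank, score
    if best_rank is None:
        raise ValueError("no reference images were given")
    return best_rank


# Toy 4-pixel "images" standing in for the reference images of two ranks.
toy_references = {1: [[0, 1, 2, 3]], 4: [[3, 1, 2, 0]]}
print(classify_rank([0, 2, 4, 6], toy_references))
print(classify_rank([6, 2, 4, 0], toy_references))
```

Swapping `normalized_correlation` for a negated Euclidean or city-block distance would give the alternative similarity indices mentioned in the text.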

次のステップ１０４では、用意しておいた全ての学習用画像について、上記ステップ１０２の処理が行われたか否かを判定する。ステップ１０４で、用意しておいた全ての学習用画像について上記ステップ１０２の処理が行われていないと判定された場合には、ステップ１００に戻り、ステップ１００で次の学習用画像を読み込む。一方、ステップ１０４で、用意しておいた全ての学習用画像について上記ステップ１０２の処理が行われたと判定された場合には、次のステップ１０６へ進む。 In the next step 104, it is determined whether or not the processing in step 102 has been performed for all prepared learning images. If it is determined in step 104 that the processing in step 102 has not been performed for all of the prepared learning images, the process returns to step 100, and the next learning image is read in step 100. On the other hand, if it is determined in step 104 that the processing in step 102 has been performed on all prepared learning images, the process proceeds to the next step 106.

ステップ１０６では、下記の式（２）に従って、上記ステップ１０２によって、対応するランクに分類された複数の学習用画像（検出対象物を含む学習用画像）に基づいて、最も検出し易いランク１と最も検出し難いランク４との間のランク２及びランク３に分類された複数の学習用画像の数Ｎ_１に対する、ランク１とランク４とに分類された複数の学習用画像の数Ｎ_２の割合が所定値Ｔ以下となるように、複数Ｍ_１（Ｎ_１＋Ｎ_２）枚の検出対象物の学習用画像を選択する。 In step 106, in accordance with equation (2) below, M₁ (= N₁ + N₂) learning images of the detection object are selected from the learning images (those containing the detection object) classified into ranks in step 102. The selection is made so that the ratio of the number N₂ of learning images classified into rank 1 and rank 4 to the number N₁ of learning images classified into rank 2 and rank 3, which lie between the easiest rank 1 and the hardest rank 4, is at most a predetermined value T.

上述したように、例えば、このＴの値を、例えば１．０に設定した場合には、ステップ１０６では、例えば、ランク１に分類された学習用画像５０００枚、ランク２に分類された学習用画像５０００枚、ランク３に分類された学習用画像０枚、ランク４に分類された学習用画像０枚を選択することができる。なお、このとき、ランク２及びランク３に分類され、かつ選択された学習用画像の数Ｎ_１は５０００（５０００＋０）であり、ランク１とランク４とに分類され、かつ選択された学習用画像の数Ｎ_２は５０００（５０００＋０）であり、「Ｎ_２／Ｎ_１」の値は、１（５０００／５０００）となる。 As described above, for example, when the value of T is set to 1.0, for example, in step 106, for example, 5000 learning images classified into rank 1 and learning images classified into rank 2 are used. It is possible to select 5000 images, 0 learning images classified into rank 3, and 0 learning images classified into rank 4. At this time, the number N ₁ of learning images classified into rank 2 and rank 3 and selected is 5000 (5000 + 0), and is classified into rank 1 and rank 4 and selected learning images. The number N ₂ is 5000 (5000 + 0), and the value of “N ₂ / N ₁ ” is 1 (5000/5000).

また、上述したように、Ｔの値を、例えば０．２に設定した場合には、ステップ１０６では、例えば、ランク１に分類された学習用画像１０００枚、ランク２に分類された学習用画像５０００枚、ランク３に分類された学習用画像１０００枚、ランク４に分類された学習用画像０枚を選択することができる。なお、このとき、ランク２及びランク３に分類され、かつ選択される学習用画像の数Ｎ_１は６０００（１０００＋５０００）であり、ランク１とランク４とに分類され、かつ選択される学習用画像の数Ｎ_２は１０００（１０００＋０）であり、「Ｎ_２／Ｎ_１」の値は、０．１６・・（１０００／６０００）となる。 Also as described above, when T is set to 0.2, for example, step 106 can select 1000 learning images classified into rank 1, 5000 learning images classified into rank 2, 1000 learning images classified into rank 3, and 0 learning images classified into rank 4. In this case, the number N₁ of selected learning images classified into rank 2 or rank 3 is 6000 (1000 + 5000), the number N₂ of selected learning images classified into rank 1 or rank 4 is 1000 (1000 + 0), and the value of "N₂ / N₁" is 0.16... (1000 / 6000).

また、ステップ１０６では、上記ステップ１０２で分類されなかった非対象物の学習用画像を複数Ｍ_２枚選択する。 In step 106, a plurality of M ₂ learning images of non-objects not classified in step 102 are selected.

次のステップ１０８では、上記ステップ１０６で選択されたＭ_１枚の検出対象物の学習用画像、及び上記ステップ１０６で選択されたＭ_２枚の非対象物の学習用画像から、Ｖ＆Ｊ手法に従い、Ｈａａｒ−ｌｉｋｅ特徴の集合からなる学習モデル（検出対象物の特徴を示す学習モデル）を生成して、生成された学習モデルを学習モデルＤＢ４８に記憶する。そして、学習モデル生成処理を終了する。 In the next step 108, from the learning image of the M ₁ detection object selected in step 106 and the learning image of the M ₂ non-object selected in step 106, according to the V & J method, A learning model composed of a set of Haar-like features (a learning model indicating features of the detection target) is generated, and the generated learning model is stored in the learning model DB 48. Then, the learning model generation process ends.

なお、ステップ１００、１０２、１０４は学習用画像分類部４２で実行され、ステップ１０６は学習用画像選択部４４で実行され、ステップ１０８は学習部４６で実行される。 Steps 100, 102, and 104 are executed by the learning image classification unit 42, step 106 is executed by the learning image selection unit 44, and step 108 is executed by the learning unit 46.

また、上記では、ＣＰＵの処理によって、複数の学習用画像の各々を複数のランクのうちの何れかのランクに分類する例について説明したが、設計者等によって、複数の学習用画像の各々を複数のランクのうちの何れかのランクに予め分類しておいてもよい。 The above describes an example in which each of the learning images is classified into one of the ranks by CPU processing; however, a designer or the like may instead classify each of the learning images into one of the ranks in advance.

次に、対象物検出装置３０のＣＰＵが実行する対象物検出処理の処理ルーチンについて図４を用いて説明する。なお、この対象物検出処理は、本実施の形態では、例えば、図示しない指示受付手段（例えばタッチパネルやキーボード等）が、対象物検出装置３０に設けられており、この指示受付手段をユーザが操作することにより、検出対象物を検出する指示を、対象物検出装置３０のＣＰＵが受け付けた場合に実行開始され、当該指示を受け付けた以降は、所定時間間隔（例えば、１００ｍｓ間隔）で実行される。 Next, the processing routine of the object detection process executed by the CPU of the object detection device 30 is described with reference to FIG. 4. In the present embodiment, for example, the object detection device 30 is provided with instruction receiving means (not shown; for example, a touch panel or a keyboard). Execution starts when the CPU of the object detection device 30 receives an instruction to detect the detection object, issued by the user operating the instruction receiving means, and after the instruction is received the process is executed at predetermined time intervals (for example, every 100 ms).

まず、ステップ２００で、検出対象物を検出する対象となる画像（本実施の形態ではこの画像を入力画像と称する）を１枚読み込む。なお、この入力画像は、例えば、カメラ等の撮影手段（図示しない）によって撮影された画像であり、ステップ２００では、この撮影手段から入力画像を読み込む。 First, in step 200, one image in which the detection object is to be detected (referred to as the input image in the present embodiment) is read. The input image is, for example, an image captured by imaging means such as a camera (not shown), and in step 200 the input image is read from the imaging means.

次のステップ２０２では、入力画像に対して、所定の初期サイズ、例えば、１６×３２画素の探索ウインドウを初期位置（例えば、入力画像の左角の領域）に設定する。 In the next step 202, a search window having a predetermined initial size, for example, 16 × 32 pixels, is set to the initial position (for example, the left corner area of the input image) for the input image.

次のステップ２０４では、設定された探索ウインドウを用いて、入力画像から所定サイズ、例えば１６×３２画素のウインドウ画像を抽出する。なお、設定された探索ウインドウが１６×３２画素を超えるサイズのウインドウであった場合には、抽出したウインドウ画像を所定サイズ、例えば１６×３２画素に変換する。 In the next step 204, a window image of a predetermined size, for example, 16 × 32 pixels is extracted from the input image using the set search window. If the set search window is a window having a size exceeding 16 × 32 pixels, the extracted window image is converted to a predetermined size, for example, 16 × 32 pixels.

次のステップ２０６では、上記ステップ２０４で抽出されたウインドウ画像と、学習モデルＤＢ４８から読み出した学習モデルとを比較して、検出対象物の確からしさを評価値として算出する。 In the next step 206, the window image extracted in step 204 is compared with the learning model read from the learning model DB 48, and the probability of the detection target is calculated as an evaluation value.

次のステップ２０８では、算出された評価値に基づいてウインドウ画像が検出対象物（本実施の形態では、例えば、人物）であるか否かを判定することにより、検出対象物を検出する。 In the next step 208, the detection target is detected by determining whether or not the window image is a detection target (for example, a person in the present embodiment) based on the calculated evaluation value.

なお、ウインドウ画像と学習モデルとを用いた検出対象物の検出の方法は様々あるが、前述のＶ＆Ｊの手法で採用されているＡｄａＢｏｏｓｔアルゴリズム等を用いて検出することができる。 There are various methods for detecting the detection object using the window image and the learning model, but the detection can be performed using the AdaBoost algorithm or the like employed in the above-described V & J method.
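
As one way to picture steps 206 and 208, the sketch below evaluates a window against a cascade of boosted stages and rejects it at the first stage whose score falls below the stage threshold. All class names, fields, and the thresholding convention are assumptions made for illustration; the text only states that AdaBoost as used in the V&J method may be applied, and the weights and thresholds would come from training, which is not shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Window = Sequence[Sequence[int]]


@dataclass
class WeakClassifier:
    feature: Callable[[Window], float]  # hypothetical feature extractor
    threshold: float
    polarity: int   # +1 or -1, direction of the inequality
    alpha: float    # vote weight learned by boosting

    def vote(self, window: Window) -> float:
        fires = self.polarity * self.feature(window) < self.polarity * self.threshold
        return self.alpha if fires else 0.0


@dataclass
class Stage:
    weak: List[WeakClassifier]
    stage_threshold: float

    def score(self, window: Window) -> float:
        """Weighted vote of the weak classifiers (the evaluation value)."""
        return sum(w.vote(window) for w in self.weak)


def cascade_accepts(stages: List[Stage], window: Window) -> bool:
    """Accept only if every stage passes; all() stops at the first failure."""
    return all(stage.score(window) >= stage.stage_threshold for stage in stages)


# Toy feature: difference between the first two pixels of the first row.
def left_minus_right(window: Window) -> float:
    return window[0][0] - window[0][1]


toy_cascade = [Stage(
    weak=[WeakClassifier(left_minus_right, threshold=0.0, polarity=-1, alpha=1.0)],
    stage_threshold=0.5,
)]
print(cascade_accepts(toy_cascade, [[5, 1]]), cascade_accepts(toy_cascade, [[1, 5]]))
```

Early rejection matters because most windows in an image do not contain the detection object, so most of them are discarded after only a few feature evaluations.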

ステップ２０８で、ウインドウ画像が検出対象物であると判定された場合には、検出対象物が検出されたと判断し、次のステップ２１０へ進む。一方、ステップ２０８で、ウインドウ画像が検出対象物でないと判定された場合には、次のステップ２１２へ進む。 If it is determined in step 208 that the window image is a detection target, it is determined that the detection target has been detected, and the process proceeds to the next step 210. On the other hand, if it is determined in step 208 that the window image is not a detection object, the process proceeds to the next step 212.

ステップ２１０では、上記ステップ２０８で、ウインドウ画像が検出対象物であると判定されたときの探索ウインドウの位置及び大きさ等の情報を、検出対象物の検出結果としてＲＡＭに記憶する。 In step 210, information such as the position and size of the search window when the window image is determined to be a detection target in step 208 is stored in the RAM as a detection result of the detection target.

次のステップ２１２では、入力画像全体について探索ウインドウをスキャンして探索が終了したか否かを判定する。肯定判定のとき（入力画像全体について探索ウインドウをスキャンして探索が終了したと判定された場合）はステップ２１６に進み、否定判定のとき（入力画像全体について探索ウインドウをスキャンして探索が終了していないと判定された場合）はステップ２１４に進む。 In the next step 212, it is determined whether the search window has scanned the entire input image and the search is complete. If the determination is affirmative (the search window has scanned the entire input image and the search is complete), the process proceeds to step 216; if it is negative (the scan of the entire input image is not yet complete), the process proceeds to step 214.

ステップ２１４では、探索ウインドウの位置を予め定められた探索ピッチだけ移動させて、ステップ２０４に戻る。そして、再びステップ２０４からステップ２１２までの処理が繰り返し実行される。探索ウインドウが画像全体を探索すると、ステップ２１６に進む。 In step 214, the position of the search window is moved by a predetermined search pitch, and the process returns to step 204. Then, the processing from step 204 to step 212 is repeated. When the search window searches the entire image, the process proceeds to step 216.

ステップ２１６では、全てのサイズの探索ウインドウでの探索が終了したか否かを判定する。ここで、探索ウインドウは検出対象物を検出するためのウインドウ画像を抽出するためのフレームとして用いられているが、探索ウインドウのサイズが異なれば、様々なサイズの検出対象物を検出することができる。本実施の形態では、様々なサイズの探索ウインドウが予め用意されており、各々の探索ウインドウで画像全体を探索する必要がある。そこで、ステップ２１６で否定判定された場合（全てのサイズの探索ウインドウでの探索が終了していないと判定された場合）には、ステップ２１８へ進み、当該ステップ２１８で、異なるウインドウサイズの探索ウインドウを設定する。例えば、ステップ２１８では、サイズがそれまでの探索ウインドウより１ステップ拡大された探索ウインドウ（例えば、サイズが１．２倍の探索ウインドウ）を初期位置（例えば、入力画像の左角の領域）に設定する。そして、ステップ２０４に戻り、上記と同様にステップ２０４以降の処理を繰り返し実行する。 In step 216, it is determined whether the search has been completed for search windows of all sizes. The search window serves as a frame for extracting the window image used to detect the detection object, and using search windows of different sizes allows detection objects of various sizes to be detected. In the present embodiment, search windows of various sizes are prepared in advance, and the entire image must be searched with each of them. Therefore, if the determination in step 216 is negative (the search has not been completed for search windows of all sizes), the process proceeds to step 218, where a search window of a different size is set. For example, in step 218, a search window one step larger than the previous one (for example, a search window 1.2 times the size) is set at the initial position (for example, the left corner area of the input image). The process then returns to step 204, and the processing from step 204 onward is repeated as above.
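
The scan over positions and window sizes in steps 202 to 218 can be sketched as a generator of window rectangles. The 16 × 32 base size and the 1.2 scale step come from the examples in the text; the search pitch of 4 pixels, the rounding of scaled sizes, and the rule of stopping when the window no longer fits are assumptions, since the text only speaks of a predetermined pitch and of sizes prepared in advance.

```python
from typing import Iterator, Tuple


def search_windows(image_w: int, image_h: int,
                   base_w: int = 16, base_h: int = 32,
                   scale_step: float = 1.2,
                   pitch: int = 4) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (left, top, width, height) for every search window position."""
    scale = 1.0
    while True:
        w, h = round(base_w * scale), round(base_h * scale)
        if w > image_w or h > image_h:
            break  # assumption: stop once the window no longer fits
        # Steps 204 to 214: move the window by the search pitch.
        for top in range(0, image_h - h + 1, pitch):
            for left in range(0, image_w - w + 1, pitch):
                yield left, top, w, h
        # Step 218: enlarge the window by one step and start over.
        scale *= scale_step


toy_windows = list(search_windows(32, 64, pitch=16))
print(len(toy_windows), toy_windows[0])
```

Each yielded rectangle would be cropped from the input image and, when larger than the base size, resized to 16 × 32 pixels before being compared with the learning model, as described for step 204.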

一方、ステップ２１６で肯定判定された場合（全てのサイズの探索ウインドウでの探索が終了したと判定された場合）には、次のステップ２２０へ進む。ステップ２２０では、ＲＡＭに記憶された検出対象物の検出結果に基づいて、入力画像に対して、検出された検出対象物がウインドウで囲まれて表示されるように、表示装置３８を制御する。そして、対象物検出処理を終了する。 On the other hand, when an affirmative determination is made at step 216 (when it is determined that the search in the search windows of all sizes has been completed), the routine proceeds to the next step 220. In step 220, based on the detection result of the detection object stored in the RAM, the display device 38 is controlled so that the detected detection object is displayed surrounded by a window with respect to the input image. Then, the object detection process ends.

なお、ステップ２００、２０２、２０４、２１２、２１４、２１６、２１８はウインドウ画像抽出部３２で実行され、ステップ２０６、２０８、２１０は識別部３４で実行され、ステップ２２０は結果出力部３６で実行される。 Steps 200, 202, 204, 212, 214, 216, and 218 are executed by the window image extraction unit 32, steps 206, 208, and 210 are executed by the identification unit 34, and step 220 is executed by the result output unit 36.

次に、図５及び図６を参照して、本実施の形態の効果について説明する。 Next, the effect of the present embodiment will be described with reference to FIGS.

図５には、ランダムに学習用画像を選択し、歩行者検出を行う際に用いられる学習モデルを生成した場合の歩行者検出性能（すなわち、従来の技術）と、本実施形態の対象物検出システム１０の歩行者検出性能の実験結果の比較例を表すグラフが示されている。このグラフは、識別器のステージ毎に検出率と１フレームあたりの誤検出数をプロットしたＲＯＣ（receiver operating characteristic）曲線であり、縦軸が検出率、横軸が１フレームあたりの誤検出数である。なお、検出率は、「検出された歩行者／検出対象となる歩行者」で表され、１フレームあたりの誤検出数は、「検出対象以外に対する検出数／評価フレーム数」で表される。また、ＲＯＣ曲線ではグラフが左上方にあるほど性能が高い。すなわち横軸の値が同じ（誤検出数が等しい）場合は、縦軸の値が大きいほど検出率が高く、縦軸の値が同じ（検出率が等しい）場合は、横軸の値が小さいほど誤検出が少ないことを意味する。 FIG. 5 shows a graph comparing experimental results for the pedestrian detection performance obtained when the learning model used for pedestrian detection is generated from randomly selected learning images (that is, the conventional technique) with the pedestrian detection performance of the object detection system 10 of the present embodiment. The graph is an ROC (receiver operating characteristic) curve in which the detection rate and the number of false detections per frame are plotted for each stage of the classifier; the vertical axis is the detection rate and the horizontal axis is the number of false detections per frame. The detection rate is expressed as "detected pedestrians / pedestrians to be detected", and the number of false detections per frame is expressed as "number of detections of anything other than a detection target / number of evaluated frames". On an ROC curve, the closer the graph lies to the upper left, the higher the performance. That is, for the same horizontal-axis value (equal number of false detections), a larger vertical-axis value means a higher detection rate, and for the same vertical-axis value (equal detection rate), a smaller horizontal-axis value means fewer false detections.
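
The two quantities plotted in FIG. 5 follow directly from the definitions quoted above. The sketch below computes one point of such a curve; the function names are hypothetical and the counts are made-up illustration values, not results from the experiment.

```python
def detection_rate(detected_pedestrians: int, pedestrians_to_detect: int) -> float:
    """Detected pedestrians divided by pedestrians that should be detected."""
    if pedestrians_to_detect <= 0:
        raise ValueError("there must be at least one pedestrian to detect")
    return detected_pedestrians / pedestrians_to_detect


def false_detections_per_frame(false_detections: int, evaluated_frames: int) -> float:
    """Detections of non-targets divided by the number of evaluated frames."""
    if evaluated_frames <= 0:
        raise ValueError("there must be at least one evaluated frame")
    return false_detections / evaluated_frames


# Illustration values only (not taken from the experiment in FIG. 5).
point = (false_detections_per_frame(30, 100), detection_rate(45, 50))
print(point)  # (horizontal-axis value, vertical-axis value)
```

Repeating this calculation after each stage of the classifier gives the sequence of points that forms one curve.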

実験では、ランク毎に分類された学習用画像を用いて、３つのデータセットを作成した。１つめのデータセット（データセット１）は、ランク１に分類された学習用画像０枚、ランク２に分類された学習用画像５０００枚、ランク３に分類された学習用画像０枚、ランク４に分類された学習用画像０枚を用いて作成した。２つめのデータセット（データセット２）は、ランク１に分類された学習用画像５０００枚、ランク２に分類された学習用画像５０００枚、ランク３に分類された学習用画像０枚、ランク４に分類された学習用画像０枚を用いて作成した。３つめのデータセット（データセット３）は、ランク１に分類された学習用画像１０００枚、ランク２に分類された学習用画像５０００枚、ランク３に分類された学習用画像１０００枚、ランク４に分類された学習用画像０枚を用いて作成した。なお、これらのデータセットは、いずれも、最も検出し易いランク１と最も検出し難いランク４との間のランク２及びランク３に分類された複数の学習用画像の数Ｎ_１に対する、ランク１とランク４とに分類された複数の学習用画像の数Ｎ_２の割合が所定値Ｔ以下となる、複数Ｍ_１（Ｎ_１＋Ｎ_２）枚の検出対象物の学習用画像が用いられている。図５には、これらのデータセット１、２、３の各々に対応するＲＯＣ曲線７０、７２、７４が示されている。 In the experiment, three data sets were created using learning images classified by rank. The first data set (data set 1) has 0 learning images classified into rank 1, 5000 learning images classified into rank 2, 0 learning images classified into rank 3, rank 4 It was created using 0 learning images classified into the above. The second data set (data set 2) has 5000 learning images classified into rank 1, 5000 learning images classified into rank 2, 0 learning images classified into rank 3, and rank 4 It was created using 0 learning images classified into the above. The third data set (data set 3) has 1000 learning images classified into rank 1, 5000 learning images classified into rank 2, 1000 learning images classified into rank 3, rank 4 It was created using 0 learning images classified into the above. Each of these data sets is rank 1 with respect to the number N ₁ of a plurality of learning images classified into rank 2 and rank 3 between rank 1 that is most easily detected and rank 4 that is most difficult to detect. And a plurality of learning images of M ₁ (N ₁ + N ₂ ) detection objects in which the ratio of the number N ₂ of the plurality of learning images classified into rank 4 is equal to or less than a predetermined value T is used. . FIG. 5 shows ROC curves 70, 72, and 74 corresponding to the data sets 1, 2, and 3, respectively.

また、図５には、比較として、同じ学習用画像の集合から質を考慮せずにランダムに５０００枚選択し、学習し、歩行者検出を行う際に用いられる学習モデルを生成した場合（ランダムサンプリング）における最も性能が高かったときのＲＯＣ曲線７６が示されている。 Also, in FIG. 5, for comparison, a case where 5000 images are randomly selected from the same set of learning images without considering quality, learned, and a learning model used for pedestrian detection is generated (random) The ROC curve 76 at the highest performance in (sampling) is shown.

図５のＲＯＣ曲線７０、７２、７４、７６が示すように、どの場合も検出性能は同程度であることがわかる。 As shown by the ROC curves 70, 72, 74, 76 in FIG. 5, it can be seen that the detection performance is almost the same in any case.

図６には、ランダムに学習用画像を選択し、歩行者検出を行う際に用いられる学習モデルを生成した場合の特徴数と、本実施形態の対象物検出システム１０の特徴数の実験結果の比較例を表すグラフが示されている。このグラフは、縦軸が特徴数、横軸が１フレームあたりの誤検出数である。 FIG. 6 shows experimental results of the number of features when a learning model is generated by randomly selecting an image for learning and used for pedestrian detection, and the number of features of the object detection system 10 of the present embodiment. A graph representing a comparative example is shown. In this graph, the vertical axis represents the number of features, and the horizontal axis represents the number of erroneous detections per frame.

図６には、これらのデータセット１、２、３の各々に対応するグラフ８０、８２、８４が示されている。また、図６には、ランダムサンプリングの場合におけるグラフ８６が示されている。 FIG. 6 shows graphs 80, 82, and 84 corresponding to the data sets 1, 2, and 3, respectively. FIG. 6 shows a graph 86 in the case of random sampling.

図６に示すように、ランダムサンプリングの場合と比較して、データセット１〜３を用いた場合には、特徴数が減少（削減）していることが分かる。 As shown in FIG. 6, it can be seen that the number of features is reduced (reduced) when using data sets 1 to 3 as compared to the case of random sampling.

このように本実施の形態の学習モデル生成装置２０によって生成された学習モデルは、図５及び図６の実験結果が示すように、従来の技術と比較して、検出性能が同程度となり、かつ特徴数が削減された学習モデルである。 As described above, the learning model generated by the learning model generation apparatus 20 according to the present embodiment has the same detection performance as that of the conventional technique, as shown in the experimental results of FIGS. This learning model has a reduced number of features.

以上、説明したように、本実施の形態の学習モデル生成装置２０によれば、検出対象物の検出し易さまたは検出し難さの度合いに応じた複数のランク１〜４の各々毎に、検出対象物の最も検出し易いランク１から最も検出し難いランク４にわたって予め分類された複数の学習用画像に基づいて、最も検出し易いランク１と最も検出し難いランク４との間のランク２、３に分類された複数の学習用画像の数Ｎ_１に対する、ランク１とランク４とに分類された複数の学習用画像の数Ｎ_２の割合が所定値Ｔ以下となるように、複数の学習用画像が選択され、選択された複数の学習用画像に基づいて、この複数の学習用画像の各々に応じた数の特徴（本実施の形態ではＨａａｒ−ｌｉｋｅ特徴）の集合を含む学習モデルが生成される。 As described above, according to the learning model generation device 20 of the present embodiment, a plurality of learning images are selected from learning images classified in advance into ranks 1 to 4, which correspond to the degree of ease or difficulty of detecting the detection object and run from rank 1, the easiest to detect, to rank 4, the hardest to detect. The selection is made so that the ratio of the number N₂ of learning images classified into rank 1 and rank 4 to the number N₁ of learning images classified into the intermediate ranks 2 and 3 is at most a predetermined value T. Based on the selected learning images, a learning model is generated that includes a set of features (Haar-like features in the present embodiment) whose number depends on the selected learning images.

Furthermore, according to the learning model generation device 20 of the present embodiment, each of a plurality of learning images is classified into one of ranks 1 to 4 on the basis of a plurality of reference images 60a to 60d, which are classified in advance into ranks 1 to 4 corresponding to the degree of ease or difficulty of detecting the detection object, from rank 1 (easiest to detect) to rank 4 (hardest to detect). From the classified learning images, a plurality of learning images is selected such that the ratio of N₂, the number of learning images classified into rank 1 and rank 4, to N₁, the number of learning images classified into ranks 2 and 3 (the ranks between the easiest rank 1 and the hardest rank 4), is equal to or less than the predetermined value T. On the basis of the selected learning images, a learning model is generated that includes a set of features (Haar-like features in the present embodiment) in a number corresponding to each of the plurality of learning images.

As the experimental results in FIGS. 5 and 6 show, the learning model generated by the learning model generation device 20 of the present embodiment has detection performance comparable to that of the conventional technique and a reduced number of features.

Therefore, the learning model generation device 20 according to the present embodiment can generate a learning model whose detection performance is comparable to that of the conventional technique and whose number of features is reduced.

Although the present embodiment uses four ranks, 1 to 4, the present invention is not limited to this. For example, the number of ranks may be any plural number other than four.

In addition, an example has been described in which, in step 106, M₁ (= N₁ + N₂) learning images of the detection object are selected in accordance with equation (2) above, such that the ratio of N₂, the number of learning images classified into rank 1 and rank 4, to N₁, the number of learning images classified into ranks 2 and 3 (the ranks between the easiest rank 1 and the hardest rank 4), is equal to or less than the predetermined value T. The present invention is not limited to this, however. For example, in step 106, the M₁ (= N₁ + N₂) learning images of the detection object may instead be selected in accordance with equation (3) below, such that the ratio of N₂ to N₁ is equal to a predetermined value T'.

Alternatively, in step 106, the M₁ (= N₁ + N₂) learning images of the detection object may be selected in accordance with equation (4) below, such that the ratio of N₂, the number of learning images classified into rank 1 and rank 4, to N₁, the number of learning images classified into ranks 2 and 3, falls within the range from a predetermined value T'' to a predetermined value T'''.
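Written out from the wording above, the three selection conditions would take the following form. This is a reconstruction from the prose rather than a copy of the published formulas; N₁, N₂, T, T', T'', and T''' are as defined in the text, and T'' ≤ T''' is assumed.

```latex
\begin{align}
\frac{N_2}{N_1} &\le T \tag{2}\\
\frac{N_2}{N_1} &= T' \tag{3}\\
T'' \le \frac{N_2}{N_1} &\le T''' \tag{4}
\end{align}
```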

The predetermined values T, T', T'', and T''' described above may be set, for example, as follows. Values of T, of T', or of T'' and T''' for which the generated learning model has detection performance comparable to that of the conventional technique and a reduced number of features are determined in advance by experiment, and the values determined in this way are then set.
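One way such an experiment could be organized is sketched below for the single threshold T. Everything here is hypothetical: the function `choose_threshold`, the caller-supplied `train_and_evaluate` callable (which is assumed to train a model for a given T and return its detection rate and feature count), the `tolerance` used to decide that performance is "comparable", and the tie-breaking rule of preferring the fewest features are assumptions of this example, not part of the embodiment.

```python
def choose_threshold(candidates, train_and_evaluate, baseline, tolerance=0.01):
    """Return a candidate T whose model is comparable to the baseline with fewer features.

    candidates: iterable of candidate values of T.
    train_and_evaluate: hypothetical callable, T -> (detection_rate, num_features).
    baseline: (detection_rate, num_features) of the conventional technique.
    Returns the acceptable T with the fewest features, or None if none qualifies.
    """
    base_rate, base_features = baseline
    best = None  # (T, num_features) of the best acceptable candidate so far
    for t in candidates:
        rate, num_features = train_and_evaluate(t)
        comparable = rate >= base_rate - tolerance   # assumed meaning of "comparable"
        reduced = num_features < base_features
        if comparable and reduced and (best is None or num_features < best[1]):
            best = (t, num_features)
    return None if best is None else best[0]
```

The same loop could be run over pairs (T'', T''') by passing tuples as candidates and letting `train_and_evaluate` unpack them.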

Description of Symbols

10 Object detection system
20 Learning model generation device
30 Object detection device
42 Learning image classification unit
44 Learning image selection unit
46 Learning unit
48 Learning model DB

Claims

A learning model generation device comprising:
selection means for selecting a plurality of learning images, on the basis of a plurality of learning images classified in advance into a plurality of ranks that correspond to the degree of ease or difficulty of detecting a detection object and that range from the rank easiest to detect to the rank hardest to detect, such that the ratio of the number of learning images classified into the easiest-to-detect rank and the hardest-to-detect rank to the number of learning images classified into a rank between the easiest-to-detect rank and the hardest-to-detect rank is equal to or less than a predetermined value; and
generation means for generating, on the basis of the plurality of learning images selected by the selection means, a learning model including a set of features in a number corresponding to each of the plurality of learning images.

A learning model generation device comprising:
classification means for classifying each of a plurality of learning images into one of a plurality of ranks, on the basis of a plurality of reference images classified in advance into the plurality of ranks, which correspond to the degree of ease or difficulty of detecting a detection object and range from the rank easiest to detect to the rank hardest to detect;
selection means for selecting a plurality of learning images, on the basis of the plurality of learning images classified by the classification means, such that the ratio of the number of learning images classified into the easiest-to-detect rank and the hardest-to-detect rank to the number of learning images classified into a rank between the easiest-to-detect rank and the hardest-to-detect rank is equal to or less than a predetermined value; and
generation means for generating, on the basis of the plurality of learning images selected by the selection means, a learning model including a set of features in a number corresponding to each of the plurality of learning images.

The learning model generation device according to claim 1, wherein the degree of ease of detecting the detection object is determined so as to increase as the resolution of the learning image increases, to decrease as the portion of the learning image hidden by an object other than the detection object increases, to decrease as the degree of blur in the learning image increases, or to decrease as the background noise in the learning image increases.

The learning model generation device according to claim 1 or 2, wherein the degree of difficulty of detecting the detection object is determined so as to decrease as the resolution of the learning image increases, to increase as the portion of the learning image hidden by an object other than the detection object increases, to increase as the degree of blur in the learning image increases, or to increase as the background noise in the learning image increases.

An object detection system comprising:
the learning model generation device according to any one of claims 1 to 4; and
an object detection device comprising detection means for detecting an object from an input image on the basis of the input image and the learning model generated by the learning model generation device.

A program for causing a computer to function as:
selection means for selecting a plurality of learning images, on the basis of a plurality of learning images classified in advance into a plurality of ranks that correspond to the degree of ease or difficulty of detecting a detection object and that range from the rank easiest to detect to the rank hardest to detect, such that the ratio of the number of learning images classified into the easiest-to-detect rank and the hardest-to-detect rank to the number of learning images classified into a rank between the easiest-to-detect rank and the hardest-to-detect rank is equal to or less than a predetermined value; and
generation means for generating, on the basis of the plurality of learning images selected by the selection means, a learning model including a set of features in a number corresponding to each of the plurality of learning images.

A program for causing a computer to function as:
classification means for classifying each of a plurality of learning images into one of a plurality of ranks, on the basis of a plurality of reference images classified in advance into the plurality of ranks, which correspond to the degree of ease or difficulty of detecting a detection object and range from the rank easiest to detect to the rank hardest to detect;
selection means for selecting a plurality of learning images, on the basis of the plurality of learning images classified by the classification means, such that the ratio of the number of learning images classified into the easiest-to-detect rank and the hardest-to-detect rank to the number of learning images classified into a rank between the easiest-to-detect rank and the hardest-to-detect rank is equal to or less than a predetermined value; and
generation means for generating, on the basis of the plurality of learning images selected by the selection means, a learning model including a set of features in a number corresponding to each of the plurality of learning images.