JP2016099668A - Learning method, learning device, image recognition method, image recognition device and program

Info

Publication number: JP2016099668A
Application number: JP2014233800A
Authority: JP
Inventors: Takayuki Saruta; Masakazu Matsugi
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2014-11-18
Filing date: 2014-11-18
Publication date: 2016-05-30

Abstract

PROBLEM TO BE SOLVED: To address erroneous detections that occur despite high reliability in a method that identifies the class of each region in an image on the basis of feature amounts extracted from small regions of the image.
SOLUTION: A first discriminator is learned using learning images, the identification result of the first discriminator for a learning evaluation image is evaluated, and small regions that are difficult for the first discriminator to identify are selected. Next, a second discriminator that identifies regions including the selected small regions is learned. Finally, an integrated discriminator is learned that integrates the identification results of the first and second discriminators.
SELECTED DRAWING: Figure 10

Description

The present invention relates to a technique for detecting subjects in an input image and dividing the image into regions, one for each subject.

Conventionally, processing that divides an image into regions for each subject and identifies a class related to the classification of each subject has been known as a basis for subsequent processing such as image scene recognition and subject-dependent image quality correction. In the method described in Non-Patent Document 1, an input image is first divided into small regions called superpixels (SPs) on the basis of color and texture information. The class of each small region is then identified using a discriminator called Recursive Neural Networks (RNNs).

Non-Patent Document 1: R. Socher, "Parsing Natural Scenes and Natural Language with Recursive Neural Networks", International Conference on Machine Learning, 2011.
Non-Patent Document 2: P. Krahenbuhl, "Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials", Neural Information Processing Systems, 2011.
Non-Patent Document 3: J. Tighe and S. Lazebnik, "Finding Things: Image Parsing with Regions and Per-Exemplar Detectors", CVPR, 2013.
Non-Patent Document 4: A. Oliva and A. Torralba, "Modeling the shape of the scene: a holistic representation of the spatial envelope", International Journal of Computer Vision, 2001.

However, a method that identifies the class of each region in an image solely on the basis of feature amounts extracted from small regions, as in Non-Patent Document 1, may produce erroneous detections even when the reliability (identification score or identification likelihood) is high. For example, it is difficult for a discriminator to distinguish small regions whose feature amounts are similar, such as a small region cut out of the sky and a small region cut out of a blue wall.

To solve the above problem, a learning method according to the present invention comprises: a first learning step of learning, using learning images, a first discriminator for identifying a class for each region of an image; a learning-time identification step of identifying a class for each region of a learning evaluation image with the learned first discriminator; a selection step of selecting a misidentified region, that is, a region for which the class identification result of the first discriminator for the learning evaluation image is incorrect; a generation step of generating learning data using a region that includes the selected misidentified region; and a second learning step of learning a second discriminator that identifies the class of the generated learning data.

With the above configuration, when an image recognition device recognizes a recognition target image, erroneous detection can be reduced and the image can be recognized accurately even if the recognition target image contains small regions whose classes are difficult to identify.

FIG. 1 is a configuration diagram of an image recognition system according to the first embodiment.
FIG. 2 is a diagram illustrating a recognition target image according to the first embodiment.
FIG. 3 is a diagram conceptually illustrating the processing applied to the recognition target image in the first embodiment.
FIG. 4 is a diagram showing the hardware configuration of the image recognition device according to the first embodiment.
FIG. 5 is a diagram showing the functional configuration of the image recognition device according to the first embodiment.
FIG. 6 is a flowchart showing the image recognition processing according to each embodiment.
FIG. 7 is a diagram illustrating the case detection method according to the first embodiment.
FIG. 8 is a diagram conceptually illustrating the image recognition processing in the first embodiment.
FIG. 9 is a diagram illustrating the integration processing according to the first embodiment.
FIG. 10 is a diagram showing the functional configuration of the learning device according to each embodiment.
FIG. 11 is a flowchart showing the learning processing according to each embodiment.
FIG. 12 is a diagram showing an example of the images used when learning the first discriminator in the first embodiment.
FIG. 13 is a flowchart of the first discriminator learning step in the first embodiment.
FIG. 14 is a diagram illustrating the processing of the misidentified region selection step according to the first embodiment.
FIG. 15 is a flowchart illustrating the processing that generates negative cases in the first embodiment.
FIG. 16 is a diagram showing the learning data learned by the second discriminator in the first embodiment.
FIG. 17 is a diagram illustrating the processing that generates negative cases in the first embodiment.
FIG. 18 is a flowchart of the processing in the integrated discriminator learning step in the first embodiment.
FIG. 19 is a flowchart of the processing in the integrated discriminator evaluation step in the third embodiment.

[First Embodiment]
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a configuration diagram of an image recognition system according to the present embodiment. In this system, a camera 10 and an image recognition device 20 are connected via a network. The camera 10 and the image recognition device 20 may also be configured as a single unit. An image captured by the camera 10 is output to the image recognition device 20, which acquires it. The present embodiment describes the case where the camera 10 captures a scene 30 as shown in FIG. 1 and the image recognition device 20 processes the resulting recognition target image.

FIG. 2 illustrates a recognition target image of the present embodiment. FIG. 2(a) shows a recognition target image 100 obtained by capturing the scene 30 with the camera 10. To recognize the image, the image recognition device 20 identifies the class of each small region of the acquired recognition target image and divides the image into regions. In the present embodiment, a class is the name of a category used to classify subjects, such as sky, tree, or car, as shown in FIG. 2(b), and a class is assigned to each pixel of the recognition target image.

FIG. 3 conceptually illustrates the processing that identifies the class of each region of the recognition target image in the present embodiment. As shown in FIG. 3(a), the regions obtained by dividing the recognition target image vertically and horizontally serve as the unit regions of processing, and a class is identified for each. In the present embodiment, this unit of class identification is a single pixel. FIG. 3(b) is an enlarged view of the upper-left part of FIG. 3(a) and shows the sky category assigned to each pixel 103. Region division is thus achieved by assigning a class to each pixel 103 of the recognition target image 100.

FIG. 4 is a block diagram showing the hardware configuration of the image recognition device 20. The CPU 401 controls the image recognition device 20 as a whole. The functional configuration of the image recognition device 20 and the flowchart processing described later are realized by the CPU 401 executing programs stored in the ROM 403, the HD 404, or the like. The RAM 402 has a storage area that serves as a work area into which the CPU 401 loads programs for execution. The ROM 403 has a storage area for the programs executed by the CPU 401. The HD 404 has a storage area for the various programs the CPU 401 needs to execute processing and for various data, including threshold data. The operation unit 405 accepts input operations from the user. The display unit 406 displays information about the image recognition device 20. The network I/F 407 connects the image recognition device 20 to external equipment.

FIG. 5 shows the functional configuration of the image recognition device 20 in the present embodiment. As described above, the image recognition device 20 is connected to the camera 10 via a network. The image recognition device 20 has an acquisition unit 501, an identification unit 502, and a detection unit 504. As means for storing and holding the necessary information, it also has a first discriminator holding unit 503, a second discriminator holding unit 505, and a first integrated discriminator holding unit 507. These three holding units may instead be provided in a non-volatile storage device separate from the image recognition device 20. The functions of the image recognition device 20 are described in detail later with reference to FIG. 6 and other figures.

FIG. 6(a) is a flowchart showing the image recognition processing applied to a recognition target image in the present embodiment. An outline of each step is given first.

In the acquisition step S110, the acquisition unit 501 receives, as input data, the recognition target image captured by the camera 10. In the detection step S120, the detection unit 504 uses the second discriminator stored in the second discriminator holding unit 505 to detect cases that the second discriminator has learned in advance. The methods for detecting and learning cases are described later. When there are multiple second discriminators, each is applied in turn to detect cases. The detection results are sent to the integrated identification unit 506.

In the identification step S130, the identification unit 502 uses the first discriminator stored in the first discriminator holding unit 503 to identify the class of each region of the recognition target image. The class identification result for each region is sent to the integrated identification unit 506. In the integrated identification step S140, the integrated identification unit 506 integrates the detection result of the second discriminator with the identification result of the first discriminator and performs class identification for each region of the recognition target image.

Next, each step is described in more detail, following the flowchart shown in FIG. 6(a).

In the acquisition step S110, the acquisition unit 501 receives, as input data, the recognition target image captured by the camera 10. The recognition target image may instead have been captured in advance and stored in an external device, in which case the acquisition unit 501 acquires it from that external device.

Next, in the detection step S120, the detection unit 504 uses the second discriminator stored in the second discriminator holding unit 505 to detect cases that the second discriminator has learned in advance. How the cases are learned is covered in the description of the learning-time processing.

FIG. 7 illustrates the case detection method of the present embodiment. As shown in FIG. 7(a), cases are detected by scanning a detection window 110 over the recognition target image 100. FIG. 7(b) shows a case to be detected being found in the detection window 110.
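The window scan described above can be sketched as follows. This is only an illustrative sketch: the names `scan_windows` and `mean_brightness`, the stride, the threshold applied during the scan, and the use of mean brightness as a stand-in for the second discriminator are all assumptions made for the example, not details taken from the patent.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # row-major grayscale image, a stand-in for image 100


def scan_windows(
    image: Image,
    win_h: int,
    win_w: int,
    stride: int,
    score_fn: Callable[[Image], float],
    threshold: float,
) -> List[Tuple[int, int, float]]:
    """Slide a win_h x win_w window over the image and keep confident hits.

    Returns (x0, y0, score) for every window whose score reaches the
    threshold, where (x0, y0) is the top-left corner of the window.
    """
    height, width = len(image), len(image[0])
    hits = []
    for y0 in range(0, height - win_h + 1, stride):
        for x0 in range(0, width - win_w + 1, stride):
            # Cut the current window out of the image.
            patch = [row[x0:x0 + win_w] for row in image[y0:y0 + win_h]]
            score = score_fn(patch)
            if score >= threshold:
                hits.append((x0, y0, score))
    return hits


def mean_brightness(patch: Image) -> float:
    """Hypothetical scorer standing in for the second discriminator."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)
```

Any callable that maps a window to a score in the range 0 to 1 could be passed as `score_fn`; the toy scorer only keeps the example self-contained.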

A mask 111 indicating where the case is located is stored at learning time. As shown in FIG. 7(c), the region corresponding to the case is extracted by superimposing the mask 111 on the detection position in the recognition target image 100. The mask 111 records 1 for pixels where the case is present and 0 for pixels where it is not. Alternatively, a real value from 0 to 1 may be recorded as an existence probability.

The result of the detection processing in the detection window is output as a detection score (or likelihood, a real value from 0 to 1) for each pixel. Specifically, the detection result (score) at each pixel is the score (likelihood) output by the detector multiplied by the existence probability of the case recorded in the mask 111. Denoting the detection result (score) at each pixel by S_D(x, y), Equation 1 is obtained:
$$S_D(x, y) = S(x_0, y_0) \cdot \mathrm{Mask}(x - x_0,\ y - y_0) \tag{1}$$
Here, S(x_0, y_0) is the score (likelihood) output by the detector (second discriminator) at (x_0, y_0), and Mask(x - x_0, y - y_0) is the existence probability of the case at (x, y). When there are multiple detection results (that is, when one detector detects many positions), the detection results are averaged at each pixel. A real value from 0 to 1 is thereby assigned to each pixel of the recognition target image. This is hereinafter referred to as the detection score map.
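A minimal sketch of how a detection score map could be built from Equation 1 is shown below. The function name `detection_score_map` and the data layout are hypothetical. The text says only that multiple detection results are averaged at each pixel; this sketch assumes the average is taken over the detections whose window covers the pixel, and that uncovered pixels receive 0.

```python
from typing import List, Tuple

Grid = List[List[float]]


def detection_score_map(
    height: int,
    width: int,
    detections: List[Tuple[int, int, float]],
    mask: Grid,
) -> Grid:
    """Apply Equation 1 to each detection and average the results per pixel.

    detections holds (x0, y0, score) with (x0, y0) the top-left corner of a
    detection window; mask holds existence probabilities in [0, 1].
    Assumption: the average runs over the detections covering a pixel.
    """
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for x0, y0, score in detections:
        for dy, mask_row in enumerate(mask):
            for dx, prob in enumerate(mask_row):
                x, y = x0 + dx, y0 + dy
                if 0 <= x < width and 0 <= y < height:
                    # S_D(x, y) = S(x0, y0) * Mask(x - x0, y - y0)
                    total[y][x] += score * prob
                    count[y][x] += 1
    return [
        [total[y][x] / count[y][x] if count[y][x] else 0.0 for x in range(width)]
        for y in range(height)
    ]
```

Because both the score and the mask value lie between 0 and 1, every entry of the resulting map also lies between 0 and 1, as the text requires.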

When detecting a case, the contour may be refined using graph cut or a similar technique after the mask 111 has been superimposed on the detection position in the recognition target image 100. If the contour is refined, the detection score for the recognition target image 100 is then calculated as in Equation 1. Although the example here has a detector detect only one case, multiple cases may be detected, in which case a detection score map is held for each case.

Next, in the identification step S130, the identification unit 502 uses the first discriminator stored in the first discriminator holding unit 503 to perform class identification on the recognition target image. FIG. 8 shows the processing of the identification step S130 for the recognition target image. In the present embodiment, as shown in FIGS. 8(a) and 8(b), a class is identified for each small region 101 obtained by dividing the captured recognition target image 100. Here, a small region means a region of the image consisting of at least one pixel and no more than a predetermined number of pixels. In the present embodiment, the image is divided into small regions called superpixels (SPs), as described in Non-Patent Document 3. Other approaches, such as block division, may also be used.

In the present embodiment, the first discriminator extracts feature amounts from a small region 101 and takes those feature amounts as input. For example, the Recursive Neural Networks (RNNs) of Non-Patent Document 1 can be used as such a discriminator. Alternatively, any discriminator that takes feature amounts as input and outputs an identification result, such as Support Vector Machines (SVMs), may be used. The identification result in the present embodiment takes a value from 0 to 1 for each predefined class, and a higher value indicates higher reliability. How the first discriminator is learned is covered in the description of the learning-time processing.
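As an illustration of how per-region scores relate to the per-pixel classes described earlier, the sketch below assigns each pixel the highest-scoring class of the small region it belongs to. The function name `label_pixels`, the region-id map, and the choice of the highest score as the decision rule are assumptions made for the example; the text states only that each class receives a score from 0 to 1.

```python
from typing import Dict, List


def label_pixels(
    region_ids: List[List[int]],
    region_scores: Dict[int, Dict[str, float]],
) -> List[List[str]]:
    """Give every pixel the best-scoring class of its small region.

    region_ids maps each pixel to the id of its small region (superpixel);
    region_scores maps a region id to its per-class scores in [0, 1].
    """
    best = {
        rid: max(scores, key=lambda name: scores[name])
        for rid, scores in region_scores.items()
    }
    return [[best[rid] for rid in row] for row in region_ids]
```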

The present embodiment describes a method in which the recognition target image is divided into small regions in advance and the first discriminator performs class identification for each small region, but the invention is not limited to this. For example, region division and class identification may be performed simultaneously using a conditional random field (CRF) as shown in Non-Patent Document 2. Also, although the small regions in the present embodiment are arranged on the recognition target image without overlapping, as shown in FIG. 8(b), they may overlap. The integration method for that case is described later.

Next, in the integrated identification step S140, the integrated identification unit 506 integrates the detection result of the detection step S120 with the class identification result of the identification step S130 and outputs the final result. FIG. 9 illustrates the integration processing of the present embodiment. FIG. 9(a) shows the recognition target image, FIG. 9(b) the detection result of the second discriminator, FIG. 9(c) the identification result of the first discriminator, and FIG. 9(d) the final class identification result of the image after integration. The detection result here corresponds to the mask 111 superimposed on the recognition target image at the detection position of the case found by the detection window 110, as shown in FIG. 9(b). In practice, it is the detection score calculated for each pixel according to Equation 1 above.

Either of the following two methods can be used for integration in the present embodiment.

In the first method, among the scores (likelihoods) output by the detector (second discriminator) in the detection step S120, the masks corresponding to detection results at or above a predetermined threshold are superimposed, and the identification result of the first discriminator is adopted for all other regions. The mask 111 corresponding to a detected case, shown in FIG. 9(b), is superimposed as shown in FIG. 9(d). After the masks of all detected cases have been superimposed, the identification result of the first discriminator shown in FIG. 9(c) is applied to the remaining regions to give the final class identification result. When small regions overlap one another, the class is first identified for each small region, and the class of each pixel is then determined by, for example, averaging or voting over the class identification results of the small regions to which the pixel belongs. When a mask obtained by the detection processing overlaps a small region used by the first discriminator, either result may be given priority, or the result with the higher reliability may be adopted for the overlapping region only.
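The first integration method can be sketched roughly as below. This is an illustration under stated assumptions: the function name `integrate_by_overlay` is hypothetical, each detection is assumed to carry the class label of its case, a single binary mask is shared by all detections, and the detection result always takes priority where a mask and a small region overlap (one of the options the text allows).

```python
from typing import List, Tuple

LabelMap = List[List[str]]


def integrate_by_overlay(
    first_labels: LabelMap,
    detections: List[Tuple[int, int, float, str]],
    mask: List[List[int]],
    threshold: float,
) -> LabelMap:
    """Overlay masks of confident detections on the first discriminator's labels.

    detections holds (x0, y0, score, label). Detections scoring below the
    threshold are ignored, so the first discriminator's label is kept there.
    """
    result = [row[:] for row in first_labels]  # start from the first result
    height, width = len(result), len(result[0])
    for x0, y0, score, label in detections:
        if score < threshold:
            continue
        for dy, mask_row in enumerate(mask):
            for dx, present in enumerate(mask_row):
                x, y = x0 + dx, y0 + dy
                if present and 0 <= x < width and 0 <= y < height:
                    result[y][x] = label  # assumption: detection has priority
    return result
```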

The second method uses an integrated discriminator learned in advance. How the integrated discriminator is learned is described later; this section explains how it is used at identification time. The present embodiment uses an integrated discriminator that identifies the class of each pixel from two inputs: the detection result of the detection step S120 at that pixel and the class identification result of the identification step S130 at that pixel. The identification result at a pixel is the class identification result for the small region that contains the pixel.

If the number of second discriminators that detect cases is Nd and the number of classes identified by the first discriminator is C, the input vector given to the integrated discriminator at each pixel has Nd + C dimensions. For this input vector, the integrated discriminator outputs a C-dimensional value corresponding to the number C of classes in the final output. The number of classes identified by the first discriminator is assumed here to equal the number of classes in the final output, but the two may differ.
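The (Nd + C)-dimensional input described above could be assembled as in the following sketch. The function name `integrated_input` and the representation of each score as a 2-D grid are assumptions for illustration; the integrated discriminator itself is not modeled here.

```python
from typing import List

Grid = List[List[float]]


def integrated_input(
    detection_maps: List[Grid],
    class_scores: List[Grid],
    x: int,
    y: int,
) -> List[float]:
    """Concatenate Nd detection scores and C class scores at pixel (x, y).

    detection_maps holds one detection score map per second discriminator
    (Nd maps); class_scores holds one per-pixel score grid per class
    (C grids). The result has Nd + C entries.
    """
    detection_part = [score_map[y][x] for score_map in detection_maps]
    class_part = [score_grid[y][x] for score_grid in class_scores]
    return detection_part + class_part
```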

Although integrated identification is performed here for each pixel, it may instead be performed for each small region used in the identification step S130, or small regions or blocks may be defined specifically for integrated identification and the integration performed for each of them. In that case, the detection score map and the identification result of the first discriminator are averaged over each small region or block before being input to the integrated discriminator.
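The per-region averaging mentioned above might look like the following sketch, which averages one score map over each small region or block. The function name `average_per_region` and the region-id map are hypothetical; the same routine would be applied to each detection score map and each class score grid.

```python
from collections import defaultdict
from typing import Dict, List


def average_per_region(
    score_map: List[List[float]],
    region_ids: List[List[int]],
) -> Dict[int, float]:
    """Average a per-pixel score map over each small region or block.

    region_ids assigns every pixel the id of the region it belongs to and
    must have the same shape as score_map.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for score_row, id_row in zip(score_map, region_ids):
        for score, rid in zip(score_row, id_row):
            sums[rid] += score
            counts[rid] += 1
    return {rid: sums[rid] / counts[rid] for rid in sums}
```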

With the method described above, the detection result of the detection step and the identification result of the identification step are integrated, and the result of FIG. 9(d) is finally obtained. The class of each pixel may then be re-estimated using a conditional random field (CRF) such as that disclosed in Non-Patent Document 2.

Next, the method of learning the first and second discriminators used in the identification step S130 and the detection step S120 of the present embodiment is described.

FIG. 10(a) shows the functional configuration of the learning device 300 in the present embodiment. The hardware configuration of the learning device 300 is the same as that of the image recognition device 20 shown in FIG. 4. The learning device 300 is described here as being separate from the image recognition device 20 of FIG. 5, but it may instead be integrated with the image recognition device 20 so that the image recognition device 20 includes the functional units of the learning device 300. That is, the CPU 401 of the image recognition device 20 may execute programs stored in the ROM 403, the HD 404, or the like to realize the functional configuration of the learning device 300 and the processing of the flowcharts relating to the learning device 300.

The learning device 300 has a first discriminator learning unit 301, a learning-time identification unit 302, a misidentified region selection unit 303, a second discriminator learning data generation unit 304, a second discriminator learning unit 305, and an integrated discriminator learning unit 306. As means for storing and holding the necessary data, it also has a learning image holding unit 351, a learning evaluation image holding unit 352, a first discriminator holding unit 353, a second discriminator learning data holding unit 354, a second discriminator holding unit 355, and a second integrated discriminator holding unit 356. The functions of the learning device 300 are described in detail later with reference to FIG. 11(a) and other figures.

FIG. 11(a) is a flowchart showing the learning-related processing of the present embodiment.

First, in the first discriminator learning step T110, the first discriminator learning unit 301 learns the first discriminator using the learning images held in the learning image holding unit 351. FIG. 12 illustrates the learning images used to learn the first discriminator. In the present embodiment, the learning images consist of, for example, an image 50 as shown in FIG. 12(a) and correct-answer data (hereinafter GT, for ground truth) that defines the class name of each pixel of the image 50, as shown in FIG. 12(b). Multiple discriminators may be learned at this point, but for simplicity a single discriminator is assumed here. The first discriminator learned by the first discriminator learning unit 301 is sent to the first discriminator holding unit 353.

Next, in the learning-time identification step T120, the learning-time identification unit 302 uses the first discriminator learned in the first discriminator learning step T110 to identify the class of each region of the learning evaluation images held in the learning evaluation image holding unit 352. The learning evaluation images are described here as distinct from the learning images such as the image 50, but the learning evaluation images and the learning images may be the same data. The class identification results produced by the learning-time identification unit 302 are sent to the misidentification region selection unit 303.

Next, in the misidentification region selection step T130, the misidentification region selection unit 303 selects misidentified regions from the class identification results obtained in the learning-time identification step T120. It does so by comparing the class identification results with the GT of the learning evaluation images held in the learning evaluation image holding unit 352. The method of selecting misidentified regions is described in detail later.

Next, in the second discriminator learning data generation step T140, the second discriminator learning data generation unit 304 generates the learning data for the second discriminator on the basis of the misidentified regions selected in the misidentification region selection step T130. The generation method is described in detail later. The learning data generated by the second discriminator learning data generation unit 304 is sent to the second discriminator learning data holding unit 354.

Next, in the second discriminator learning step T150, the second discriminator learning unit 305 learns the second discriminator using the learning data generated in the second discriminator learning data generation step T140.

Finally, in the integrated discriminator learning step T160, the integrated discriminator learning unit 306 learns an integrated discriminator, or parameters, for integrating the identification result of the first discriminator learned in the first discriminator learning step T110 with the identification result of the second discriminator learned in the second discriminator learning step T150.

Next, the specific processing of each step is described, following the flowchart shown in FIG. 11(a).

First, in the first discriminator learning step T110, the first discriminator learning unit 301 learns the first discriminator. As explained above, the first discriminator may be any discriminator capable of identifying the class of each pixel. The present embodiment is described using Recursive Neural Networks (RNNs), one type of discriminator that takes as input feature amounts extracted from small regions. RNNs are described in detail in Non-Patent Document 1.

FIG. 13 shows the detailed flow of the first discriminator learning step executed by the first discriminator learning unit 301. M in the figure denotes the number of learning evaluation images used to learn the first discriminator.

First, in T1201, the list of learning images used to learn the first discriminator is set.

Next, in T1202, each learning image used to learn the first discriminator is divided into small regions according to the learning image list set in T1201. For example, each image is divided into small regions called SPs (superpixels), as described for the identification step S130 of the image recognition process.

Next, in T1203, the feature amount of each small region obtained in T1202 is extracted. Alternatively, the feature amounts of all learning images may be extracted in advance and simply loaded in this step according to the learning image list. The processing of T1202 and T1203 is performed for every small region of every learning image. Statistics of the color features and texture features within each small region can be used as the feature amounts: for example, the components of the RGB, HSV, Lab, and YCbCr color spaces, and the responses of Gabor filters and LoG (Laplacian of Gaussian) filters. The color features have 4 (color spaces) × 3 (components) = 12 dimensions, and the filter responses have as many dimensions as there are Gabor and LoG filters.

Furthermore, to characterize each small region, statistics are computed from the feature amounts obtained for the individual pixels in that region. Four statistics are used: mean, standard deviation, skewness, and kurtosis. Skewness indicates the degree of asymmetry of a distribution, and kurtosis indicates how tightly the distribution is concentrated around the mean. The color features therefore have 4 (color spaces) × 3 (components) × 4 (statistics) = 48 dimensions, and the texture features have (number of filter responses) × 4 (statistics) dimensions. The centroid coordinates and the area of the small region may also be used as feature amounts.
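The per-region statistics described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function names `region_statistics` and `region_feature` are hypothetical, moments are computed with population (1/n) normalization, and a constant-valued region is given zero skewness and kurtosis by convention.

```python
import math

def region_statistics(values):
    """Return (mean, std, skewness, kurtosis) of one feature channel in a small region."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        # Assumption: higher moments of a constant region are reported as 0.
        return (mean, 0.0, 0.0, 0.0)
    skew = sum(((v - mean) / std) ** 3 for v in values) / n
    kurt = sum(((v - mean) / std) ** 4 for v in values) / n
    return (mean, std, skew, kurt)

def region_feature(channels):
    """Concatenate the four statistics of every channel into one feature vector."""
    feature = []
    for values in channels:
        feature.extend(region_statistics(values))
    return feature
```

With 12 color channels (4 color spaces × 3 components) this yields the 48-dimensional color feature mentioned in the text.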

Next, in T1204, the class definitions and the number of classes of the regions to be learned by the first discriminator are set. The number of classes may be any number of two or more. For example, the classes sky, building, tree, road, and body are defined in the learning image of FIG. 12(b). In this case the number of classes may be five, or building, tree, road, and body may be merged into a single class so that a discriminator distinguishing that merged class from sky is learned as a two-class problem.

Next, in T1205, the first discriminator that identifies the classes defined in T1204 is learned. The learned first discriminator is stored in the first discriminator holding unit 353.

In the learning-time identification step T120, the learning-time identification unit 302 identifies classes in the learning evaluation images using the first discriminator learned in the first discriminator learning step T110. Here there is one first discriminator and N learning evaluation images, so class identification is performed N times in total. For this identification, each learning evaluation image is divided into small regions as defined in the first discriminator learning step T110, and the class of each small region is identified from its feature amount. The Recursive Neural Networks (RNNs) used as the first discriminator in the present embodiment output a likelihood for each of the classes defined above.

Let C be the number of classes identified by the first discriminator. The identification result S_R of the first discriminator for each small region is then expressed by Equation 2:

S_R = \{S_1, S_2, \ldots, S_C\}  (Equation 2)

Here each S_c (c = 1, 2, ..., C) is the likelihood for class c. To identify the class, the class with the highest likelihood in Equation 2 is assigned to each small region. The class identification result for each learning evaluation image is sent to the misidentification region selection unit 303.
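The assignment rule based on Equation 2 amounts to an argmax over the class likelihoods of a small region. A minimal sketch follows; the function name `assign_class` and the example class list are illustrative assumptions.

```python
def assign_class(likelihoods, class_names):
    """Return the class whose likelihood S_c is highest for one small region."""
    best = max(range(len(likelihoods)), key=lambda c: likelihoods[c])
    return class_names[best]
```

For example, likelihoods `[0.1, 0.6, 0.3]` over `["sky", "building", "tree"]` assign the region to `building`.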

Next, in the misidentification region selection step T130, the misidentification region selection unit 303 selects misidentified regions by comparing the identification results for the learning evaluation images with the GT (ground truth). FIG. 14 is a diagram for explaining this step. FIG. 14(a) shows the small regions of a learning evaluation image 120 whose classes were identified by the first discriminator, together with the identification results. FIG. 14(b) shows the GT 130 of the learning evaluation image. The misidentification region selection unit 303 compares the identification results of FIG. 14(a) with the GT of FIG. 14(b) and selects a misidentified region 121. It obtains the misidentified regions for every learning evaluation image.
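The comparison with the GT can be sketched as below. The sketch assumes that a single GT class has already been derived for each small region (for instance the most frequent GT pixel label in the region); that reduction, the dictionary layout, and the name `select_misidentified` are assumptions made for illustration.

```python
def select_misidentified(predicted, ground_truth):
    """Return the IDs of small regions whose predicted class differs from the GT class.

    predicted, ground_truth: dicts mapping a region ID to a class name.
    """
    return [rid for rid in sorted(predicted) if predicted[rid] != ground_truth[rid]]
```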

Next, in the second discriminator learning data generation step T140, the second discriminator learning data generation unit 304 generates the learning data for the second discriminator on the basis of the misidentified regions selected in the misidentification region selection step T130. FIG. 15 is a flowchart of the detailed processing of the second discriminator learning data generation step T140 executed by the second discriminator learning data generation unit 304. L in the figure is the number of misidentified regions to be selected and corresponds to the number of learning data items generated.

In T1401, the misidentified regions selected in the misidentification region selection step T130 are sorted. Learning data for the second discriminator could be generated for all misidentified regions, but here the regions are sorted so that learning data is generated from only some of them. The sort key may be, for example, the likelihood obtained when the class of each small region of the learning evaluation image was identified, or the area of the small region. Selecting regions with high likelihood, for instance, picks out small regions whose class was identified incorrectly despite a high likelihood, that is, small regions that are hard to identify from their own feature amounts alone.
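Sorting by likelihood and keeping the first L regions might look like the following sketch. The record layout (a dict with a `likelihood` key) and the function name `select_confident_errors` are hypothetical; sorting by area would only change the sort key.

```python
def select_confident_errors(misidentified, max_regions):
    """Return up to max_regions misidentified regions, most confident errors first.

    misidentified: list of dicts, each with the 'likelihood' the first
    discriminator assigned to its (wrong) predicted class.
    """
    ranked = sorted(misidentified, key=lambda r: r["likelihood"], reverse=True)
    return ranked[:max_regions]
```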

In T1402, one of the misidentified regions sorted in T1401 is selected. Then, in T1403, learning data (a positive case) for the second discriminator is generated. FIG. 16 shows learning data for the second discriminator. FIG. 16(a) shows an example in which learning data 122 is generated from the learning evaluation image 120 by connecting several small regions close to the misidentified region 121. Any number of small regions may be connected as long as they are close to the misidentified region and belong to the same class. FIG. 16(b) shows an example in which still more small regions are connected. In FIG. 16(c), a region of the same class that contains the misidentified region (the car region in the figure) is selected as learning data for the second discriminator on the basis of the GT 130 shown in FIG. 16(d). As long as the misidentified region is included, the learning data may also be cut out as a rectangle or similar shape instead of being formed by connecting small regions.

In T1404, negative cases are generated for the learning data (positive case) generated in T1403. FIG. 17 is a diagram for explaining the processing of T1404. FIG. 17(a) shows the learning data 122 generated for the learning evaluation image 120, and FIG. 17(b) shows a negative case 123 being generated from another learning evaluation image 150. A negative case can be generated by taking a mask that represents the shape of the learning data (the misidentified region with several small regions connected to it) and using it to cut out part of a learning evaluation image other than the one in which the misidentified region was detected. In this way, regions cut out from a plurality of learning evaluation images in the same shape as the learning data are created as negative cases. It is more effective to generate the negative cases from regions of the same class as the class that was wrongly assigned to the misidentified region.
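Cutting a negative case out of another image with the shape mask of the positive case can be sketched as follows. Images and masks are plain nested lists here, and the function name `cut_with_mask` and the placement arguments `top` and `left` are assumptions for illustration; how the placement is chosen is not specified in the text.

```python
def cut_with_mask(image, mask, top, left):
    """Return the pixels of `image` covered by `mask` when its corner is at (top, left).

    image: 2D list of pixel values from a different learning evaluation image.
    mask:  2D list of 0/1 flags describing the shape of the positive learning data.
    """
    return [
        image[top + dy][left + dx]
        for dy, row in enumerate(mask)
        for dx, inside in enumerate(row)
        if inside
    ]
```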

In T1405, it is evaluated whether learning the second discriminator from the learning data (positive case) generated in T1403 improves identification performance. Specifically, feature amounts are obtained from the learning data (positive case) generated in T1403 and from the negative case data generated in T1404, and the distance between them is calculated. The same feature amounts as those used in learning the first discriminator are used here, but other feature amounts may be used. Let S_Posi be the generated learning data and S_{Nega_i} the i-th negative case. S_Posi is then set so as to satisfy Equation 3:

\min_i \mathrm{dist}\bigl(f(S_{Posi}), f(S_{Nega_i})\bigr) > l_1  (Equation 3)

In Equation 3, dist computes the distance between two feature amounts; a histogram distance, the Euclidean distance, or the like can be used. f denotes the feature amount within a region. As Equation 3 expresses, the learning data is adopted if its distance to the nearest negative case exceeds the predetermined distance l_1; otherwise the learning data is generated again. l_1 may be any value of 0 or more. For example, the distances between small regions of the learning evaluation images that were not misidentified may be calculated, and the smallest of these distances used.
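The acceptance test of Equation 3 can be sketched as below, using the Euclidean distance as one of the permitted choices for dist. The names `euclidean` and `accept_positive` are illustrative assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def accept_positive(pos_feature, neg_features, l1, dist=euclidean):
    """Equation 3: adopt the positive case only if its nearest negative is farther than l1."""
    nearest = min(dist(pos_feature, neg) for neg in neg_features)
    return nearest > l1
```

When `accept_positive` returns `False`, the text calls for regenerating the learning data, for instance by connecting a different set of small regions.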

In addition to comparing the learning data with the negative case data, the learning data may also be compared with other regions of the same class as the learning data (car regions in this case), so that the learning data is kept far from the negative case data while not being far from the other regions of the same class. This prevents the region represented by the learning data from becoming too large. Let S(c = c(S_Posi)) denote another region of the same class. S_Posi is then set so as to satisfy Equation 4:

\mathrm{dist}\bigl(f(S_{Posi}), f(S_{Nega})\bigr) > l_1, \quad \mathrm{dist}\bigl(f(S_{Posi}), f(S(c = c(S_{Posi})))\bigr) < l_2  (Equation 4)

Here l_2 in Equation 4 is a predetermined value. For example, the distances between small regions of the learning evaluation images that were not misidentified are calculated, and the smallest of them is used as l_1 and the largest as l_2. The above processing is performed for all of the selected misidentified regions.
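A sketch of Equation 4 together with the suggested way of choosing l_1 and l_2 follows. Equation 4 does not state how several negative cases or several same-class regions are aggregated; the sketch assumes the condition must hold for all of them. The function names are hypothetical and the Euclidean distance is used as an example metric.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_thresholds(correct_features, dist=euclidean):
    """Return (l1, l2): the smallest and largest pairwise distance between
    features of small regions that were NOT misidentified."""
    distances = [dist(a, b) for a, b in combinations(correct_features, 2)]
    return min(distances), max(distances)

def satisfies_equation4(pos, negatives, same_class, l1, l2, dist=euclidean):
    """Equation 4, assuming it must hold for every negative and every same-class region."""
    far_from_negatives = all(dist(pos, n) > l1 for n in negatives)
    near_same_class = all(dist(pos, s) < l2 for s in same_class)
    return far_from_negatives and near_same_class
```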

Next, in the second discriminator learning step T150, the second discriminator learning unit 305 learns a discriminator using the learning data generated in the second discriminator learning data generation step T140. For example, Support Vector Machines (SVMs) may be trained using the negative cases generated for each item of learning data. Each learned discriminator is held in the second discriminator learning data holding unit 354 and used in the identification process.

Finally, in the integrated discriminator learning step T160, the integrated discriminator learning unit 306 learns an integrated discriminator, or parameters, for integrating the identification result of the first discriminator learned in the first discriminator learning step T110 with the identification result of the second discriminator learned in the second discriminator learning step T150.

As described above, there are two methods of integrating the identification result of the first discriminator with that of the second discriminator. Here the second method, which uses an integrated discriminator, is adopted, and the method of learning this integrated discriminator is described. When the first method is used, the threshold for deciding, from the scores (likelihoods) output by the detector (second discriminator), which score to adopt as the final result is determined on the learning evaluation images. In that case, several candidate thresholds are set, and the one that gives the highest region segmentation accuracy on the learning evaluation images is adopted.
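The threshold selection for the first method can be sketched as a search over candidates. How segmentation accuracy is measured is not specified in this passage, so the sketch takes it as a caller-supplied function; `pick_threshold` and the toy accuracy table are illustrative stand-ins, not measured results.

```python
def pick_threshold(candidates, segmentation_accuracy):
    """Return the candidate threshold with the highest accuracy on the evaluation images.

    segmentation_accuracy: hypothetical callable mapping a threshold to an accuracy.
    """
    return max(candidates, key=segmentation_accuracy)

# Toy accuracy table standing in for a real evaluation run (values are made up).
toy_accuracy = {0.3: 0.71, 0.5: 0.78, 0.7: 0.74}
best = pick_threshold(list(toy_accuracy), toy_accuracy.get)
```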

FIG. 18 is a detailed flowchart of the processing performed by the integrated discriminator learning unit 306 in the integrated discriminator learning step T160. N in the figure denotes the number of learning evaluation images. Here the classes of the learning evaluation images are identified again using the first discriminator, but the identification results obtained in the learning-time identification step T120 may instead be stored and loaded.

In T2601, the list of learning evaluation images used to learn the integrated discriminator is obtained. In the following, the results for all pixels of the learning evaluation images are used to learn the integrated discriminator, but the pixels may be subsampled to shorten the learning time. Although an example using the learning evaluation images is described here, other learning images may be prepared, or the learning images used when learning the first discriminator may be used.

In T2602, the learning evaluation images are loaded one at a time according to the list. Next, in T2603, the second discriminator learned by the second discriminator learning means is loaded, and detection processing is performed at the target position of the learning evaluation image. The target position is the position in the learning evaluation image at which detection processing is performed; normally the image is raster-scanned from the upper left to the lower right. In the present embodiment as well, detection processing is performed by raster-scanning the learning evaluation image from the upper left to the lower right. The present embodiment is described with a single second discriminator; when there are several, detection processing is performed with each of them. The method of learning the integrated discriminator when a plurality of discriminators is used is described later.

After detection processing has been performed at all positions, the process proceeds to T2605. In T2605, the classes of the learning evaluation image are identified using the first discriminator.

In T2606, the results of the detection processing performed in T2603 and of the class identification performed in T2605 are combined. Let S(x, y) be the combined result for each pixel. It is given by Equation 5:

S(x, y) = \{S_{(x,y) \in R}, S_D(x, y)\}  (Equation 5)

Here S_{(x,y) \in R} is the identification result of the first discriminator for the small region R that contains the target pixel, and S_D(x, y) is the identification score of the second discriminator for the target pixel (x, y), as expressed by Equation 1. For example, if the first discriminator identifies 10 classes and there are 5 second discriminators, S(x, y) has 15 dimensions. This processing is performed for every pixel of every learning evaluation image in the list.
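The per-pixel combination of Equation 5 is a concatenation of the region-level likelihoods of the first discriminator with the pixel-level scores of the second discriminators. A minimal sketch follows; the data layout (a region-ID map, a dict of region likelihoods, and one score map per second discriminator) and the name `combine_per_pixel` are assumptions made for illustration.

```python
def combine_per_pixel(region_map, region_likelihoods, detector_maps):
    """Build S(x, y) for every pixel.

    region_map:         2D list giving the small-region ID of each pixel.
    region_likelihoods: dict mapping region ID to the first discriminator's
                        class likelihoods for that region.
    detector_maps:      list of 2D score maps, one per second discriminator.
    """
    combined = []
    for y, row in enumerate(region_map):
        out_row = []
        for x, region_id in enumerate(row):
            first = list(region_likelihoods[region_id])
            second = [score_map[y][x] for score_map in detector_maps]
            out_row.append(first + second)  # e.g. 10 + 5 = 15 dimensions
        combined.append(out_row)
    return combined
```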

In T2607, a discriminator that identifies the class of each pixel is learned, taking as input the result combined in T2606 and expressed by Equation 5. In the case of FIG. 12(b), the classes sky, building, tree, road, and body are defined, so a discriminator that outputs class likelihoods for these five classes must be learned. A multi-class discriminator that decides among the five classes may be learned. Alternatively, five two-class discriminators may be learned, each deciding whether a pixel belongs to one class or to any of the others, and the pixel assigned to the class with the highest class likelihood among the five. In the case of FIG. 12(b), one two-class discriminator decides between sky and the remaining four classes (building, tree, road, and body) taken together. The same is done for each of the other four classes, and at identification time the class of the two-class discriminator that outputs the highest likelihood is assigned.
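The assignment rule for the five two-class discriminators is a one-versus-rest argmax. The sketch below assumes each two-class discriminator is available as a function returning a likelihood for its own class; the name `one_vs_rest_predict` and the toy scorers are hypothetical.

```python
def one_vs_rest_predict(binary_scorers, combined_vector):
    """Return the class whose two-class discriminator outputs the highest likelihood.

    binary_scorers: dict mapping a class name to a callable that scores the
    combined vector S(x, y) for "this class versus all other classes".
    """
    return max(binary_scorers, key=lambda name: binary_scorers[name](combined_vector))
```

A toy call with scorers that simply read one component each, for example `{"sky": lambda s: s[0], "road": lambda s: s[1]}` applied to `[0.2, 0.7]`, assigns the pixel to `road`.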

This completes the integrated discriminator learning step T160. The integrated discriminator learned in this step is held in the second integrated discriminator holding unit 356 and used in the integrated identification step S140 of the identification process. In the above description the first integrated discriminator holding unit 507 and the second integrated discriminator holding unit 356 are treated as separate components, but when the image recognition device 20 and the learning device 300 are configured as a single unit, the first integrated discriminator holding unit 507 and the second integrated discriminator holding unit 356 may be a single holding unit.

The present embodiment has described an example in which integrated identification is performed for each pixel, but the per-pixel combined results expressed by Equation 5 may instead be averaged over each small region or block before being input. In that case, the teacher value given when learning the integrating discriminator may be the class with the largest number of pixels in the small region, or support vector regression may be learned using the number of pixels of the class relative to the area of the small region as the regression target.
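The region-level variant can be sketched with two small helpers: one averages the combined vectors of the pixels in a small region, the other derives the majority-class teacher value. Both function names are assumptions, and ties in the majority vote are resolved by first occurrence, which the text does not specify.

```python
from collections import Counter

def region_average(vectors):
    """Average the per-pixel combined vectors S(x, y) of one small region."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def region_teacher_label(pixel_labels):
    """Return the GT class covering the most pixels in the small region."""
    return Counter(pixel_labels).most_common(1)[0][0]
```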

The present embodiment has also described an example in which the integrated discriminator is learned using all of the learning evaluation images. Instead of using all of them, learning evaluation images may be selected at random, or pixels within the learning evaluation images may be selected at random.

As described above, the image recognition device 20 of the present embodiment first detects, with the second discriminator, the regions whose class is to be identified, and identifies the class of each region of the recognition target image with the first discriminator. It then integrates the detection result of the second discriminator with the class identification of the first discriminator to identify the class of each region of the recognition target image. With this configuration, the present embodiment reduces false detections in small regions whose class is difficult to identify and recognizes images with high accuracy.

In addition, the learning-time identification unit 302 evaluates the identification results of the first discriminator on the learning evaluation images, and the misidentification region selection unit 303 selects the misidentified regions as small regions that are difficult for the first discriminator to identify. The second discriminator learning unit 305 then learns a second discriminator for identifying a region made up of a plurality of small regions including the selected misidentified region. Finally, the identification results of the first discriminator and the second discriminator are integrated to learn the integrated discriminator. The image recognition device 20, which uses the first and second discriminators learned in this way, can identify images with higher accuracy.

[Second Embodiment]
Next, a second embodiment is described in which supplementary information such as position information is associated with each second discriminator at learning time. At recognition time, the supplementary information is acquired and used to select the necessary second discriminators from among the plurality of second discriminators for identification.

Supplementary information is any of the various kinds of information that can be attached to an image at the time of shooting. Examples include position information such as the GPS position where the image was shot; information obtained from the camera that shot the recognition target image, such as the color temperature and the parameters set on the camera; autofocus information obtained for each block or pixel; and scene feature amounts obtained from the captured image. A scene feature amount is a feature amount obtained uniquely for an image; the Spatial Pyramid Matching Kernel described in Non-Patent Document 5 or the GIST feature described in Non-Patent Document 4 can be used. Alternatively, the image may be divided into a plurality of blocks and the color distribution of each block converted into a histogram to form the feature amount. Any other feature amount that represents the whole image, or that aggregates the feature amounts obtained from each part of the image as statistics, can also be used.
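The block-wise color histogram mentioned above can be sketched as follows. For brevity the sketch handles a single channel of integer pixel values; the function name `block_histograms`, the grid arguments, and the bin quantization are assumptions, and a color image would repeat the same computation per channel.

```python
def block_histograms(image, blocks_y, blocks_x, bins, max_value=256):
    """Divide a single-channel image into blocks and concatenate per-block histograms.

    image: 2D list of integer pixel values in [0, max_value).
    """
    height, width = len(image), len(image[0])
    feature = []
    for by in range(blocks_y):
        for bx in range(blocks_x):
            hist = [0] * bins
            for y in range(by * height // blocks_y, (by + 1) * height // blocks_y):
                for x in range(bx * width // blocks_x, (bx + 1) * width // blocks_x):
                    hist[image[y][x] * bins // max_value] += 1
            feature.extend(hist)
    return feature
```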

The functional configurations of the image recognition device 20 and the learning device 300 in the present embodiment are the same as those shown in FIG. 5 and FIG. 10(a) for the first embodiment, but the processing performed by some of the functional units differs from the first embodiment. Specifically, in the image recognition processing, the acquisition unit 501 of the image recognition device 20 acquires incidental information together with the recognition target image. In addition, among the learning images and learning evaluation images used by the learning device 300, incidental information is required at least for the data used to learn the second discriminators.

Next, the image recognition processing in the present embodiment will be described. FIG. 6(b) is a flowchart showing the recognition processing performed on a recognition target image in the present embodiment. The acquisition step S210 is the same as S110 in the first embodiment, so its description is omitted.

In the incidental information acquisition step S220, the acquisition unit 501 acquires the incidental information of the recognition target image acquired in the acquisition step S210 and sends it to the detection unit 504. As described above, the incidental information may be, for example, position information if the camera 10 is equipped with GPS, or the parameters that the camera 10 used for capturing the image.

In the detection step S230, the detection unit 504 loads predetermined second discriminators from among the plurality of second discriminators on the basis of the incidental information obtained in the incidental information acquisition step S220, and performs detection processing. In the present embodiment, incidental information is stored together with each second discriminator in the second discriminator learning step of the learning processing, and at recognition time a predetermined number of second discriminators whose incidental information is close to that of the recognition target image are loaded. Alternatively, a distance between pieces of incidental information may be defined, and second discriminators may be loaded according to that distance.
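The selection of a predetermined number of second discriminators with close incidental information can be sketched as a nearest-neighbor lookup. This is only an illustrative sketch: the function name `select_second_discriminators`, the representation of incidental information as a numeric vector, and the use of Euclidean distance are assumptions that the description does not fix.

```python
import math

def select_second_discriminators(stored, query_info, k=2):
    """Return the names of the k stored discriminators with the closest incidental info.

    stored:     list of (name, info_vector) pairs saved at learning time (assumed layout).
    query_info: incidental information vector of the recognition target image.
    k:          number of second discriminators to load (assumed default).
    """
    # Rank by Euclidean distance; any other distance could be substituted here.
    ranked = sorted(stored, key=lambda item: math.dist(item[1], query_info))
    return [name for name, _ in ranked[:k]]
```

Only the returned discriminators would then be loaded, which is what allows the remaining ones to be skipped at recognition time.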

Furthermore, the result of each detector (second discriminator) may be weighted according to the distance between pieces of incidental information. For example, in the case of a scene feature amount, the distance between the scene feature amount of the learning evaluation image from which the learning data of the second discriminator was obtained and the scene feature amount of the recognition target image is calculated, and the result of each detector is weighted according to that distance. A histogram distance, for example, can be used as this distance. When information that corresponds uniquely to an image, such as shooting parameters or color temperature, is used as incidental information, the numerical values can be compared directly. For autofocus information associated with each block or pixel, the obtained values can be vectorized or converted into a histogram and then compared. Alternatively, all of these values may be concatenated into a single vector and compared with the vector obtained from the learning evaluation image from which the learning data of the second discriminator was obtained.
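The distance-based weighting described above can be sketched as follows. The description states only that results are weighted according to the distance, so the histogram intersection distance, the exponential weighting function, and the names `histogram_distance`, `weighted_detector_scores`, and `scale` are all assumptions made for illustration.

```python
import math

def histogram_distance(h1, h2):
    """Histogram intersection distance for two normalized histograms (0 means identical)."""
    return 1.0 - sum(min(a, b) for a, b in zip(h1, h2))

def weighted_detector_scores(detectors, query_hist, scale=1.0):
    """Weight each detector's raw score by exp(-distance / scale).

    detectors:  list of (raw_score, scene_hist) pairs, where scene_hist is the
                scene feature stored with the second discriminator (assumed layout).
    query_hist: scene feature of the recognition target image.
    scale:      assumed tuning parameter controlling how fast the weight decays.
    """
    weighted = []
    for raw_score, scene_hist in detectors:
        weight = math.exp(-histogram_distance(scene_hist, query_hist) / scale)
        weighted.append(raw_score * weight)
    return weighted
```

A detector learned from a scene similar to the recognition target image keeps nearly its full score, while one learned from a dissimilar scene is attenuated.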

The processing in the detection step S230 after the second discriminators are loaded is the same as in the first embodiment, so its description is omitted.

The identification step S240 and the integrated identification step S250 are the same as S130 and S140 in the first embodiment, so their description is omitted.

As described above, in the present embodiment, the learning device 300 stores incidental information together with each second discriminator when learning it, and at recognition time the second discriminators are loaded on the basis of the incidental information of the recognition target image. This makes it possible to skip detection by unnecessary second discriminators, so that the image recognition device 20 can identify classes more accurately and faster while using less memory.

[Third Embodiment]
Next, as a third embodiment, a mode will be described in which the identification result of the integrated discriminator learned in the learning processing is evaluated using a learning evaluation image, thereby improving the accuracy of the identification results of the second discriminator and the integrated discriminator. In the present embodiment, the learning evaluation image used to evaluate the integrated discriminator may be the same as the learning evaluation image used in the other steps, or may be prepared separately.

The image recognition device 20 and its processing flow in the present embodiment are the same as in the first embodiment, so their description is omitted. The learning device 300 and its processing flow in the present embodiment are described next. FIG. 10(b) shows the functional configuration of the learning device 300 in the present embodiment. In addition to the functional units of the learning device 300 shown in FIG. 10(a) for the first embodiment, the learning device 300 of the present embodiment has an integrated discriminator evaluation unit 307. The integrated discriminator evaluation unit 307 is described in detail later with reference to FIG. 11(b) and other figures. The other functional units are the same as in FIG. 10(a), so their description is omitted.

Next, the learning processing in the present embodiment will be described. FIG. 11(b) is a flowchart showing the learning processing executed by the learning device 300 in the present embodiment. The processing from the first discriminator learning step T310 to the integrated discriminator learning step T360 is the same as that from the first discriminator learning step T110 to the integrated discriminator learning step T160 in the first embodiment, so its description is omitted.

In the integrated discriminator evaluation step T370, the integrated discriminator evaluation unit 307 evaluates the integrated discriminator learned in the integrated discriminator learning step T360. FIG. 19 is a flowchart showing the details of the integrated discriminator evaluation step T370 executed by the integrated discriminator evaluation unit 307.

In T3701, the second discriminator is applied to the learning evaluation image to detect the cases it has learned. The detection method is the same as in the detection step S120 of the recognition processing, so its description is omitted.

In T3702, the first discriminator identifies classes in the learning evaluation image. This processing is the same as in the identification step S130 of the recognition processing, so its description is omitted.

In T3703, the detection result of T3701 and the identification result of T3702 are integrated, and the learning evaluation image is identified using the integrated discriminator stored in the second integrated discriminator holding unit 356.

In T3704, the accuracy of the identification result of the first discriminator obtained in T3702 is evaluated by comparing it with the ground truth (GT) of the learning evaluation image. Pixel Accuracy, for example, is used for this evaluation. Pixel Accuracy is an evaluation value commonly used to evaluate region segmentation, as in Non-Patent Document 1, and is obtained by aggregating whether the class identification result of each pixel is correct.
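A minimal sketch of the Pixel Accuracy computation is shown below. The description says only that per-pixel correctness is aggregated; expressing the aggregate as the fraction of correctly labeled pixels is the common convention and is an assumption here, as is the function name `pixel_accuracy`.

```python
def pixel_accuracy(predicted, ground_truth):
    """Fraction of pixels whose predicted class equals the ground-truth class.

    predicted, ground_truth: 2D lists of class labels with identical shape.
    """
    correct = total = 0
    for pred_row, gt_row in zip(predicted, ground_truth):
        for pred, gt in zip(pred_row, gt_row):
            total += 1
            correct += pred == gt  # True counts as 1, False as 0
    return correct / total if total else 0.0
```

The same function can be applied to the output of the first discriminator and to the output of the integrated discriminator, which yields the two values compared later in the evaluation.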

In T3705, the accuracy of the integrated identification result produced by the integrated discriminator in T3703 is evaluated by comparing it with the GT of the learning evaluation image. As in T3704, Pixel Accuracy is used for this evaluation.

In T3706, the identification accuracies evaluated in T3704 and T3705 are compared. If the accuracy of the integrated identification result is not higher than the identification accuracy of the first discriminator by at least a predetermined value, the second discriminator is learned again. That is, if the Pixel Accuracy of the integrated discriminator on the learning evaluation image does not exceed the Pixel Accuracy of the first discriminator by the predetermined value or more, the second discriminator is learned again.

When the second discriminator is learned again, the processing from the misidentification region selection step T330 to the integrated discriminator evaluation step T370 is repeated until the Pixel Accuracy of the integrated discriminator exceeds that of the first discriminator by the predetermined value or more. In the second and subsequent executions of the misidentification region selection step T330, misidentification regions are selected from the integrated identification result of the integrated discriminator rather than from the identification result of the first discriminator. If a selected misidentification region has been selected before, either the size of the learning data for the second discriminator or the feature amount is changed. To change the size of the learning data, more adjacent small regions are merged than in the previous learning data. If the misidentification region has not been selected before, a detector (second discriminator) for identifying that region is added. After the second discriminator has been relearned, the integrated discriminator is learned again and the integrated identification result is reevaluated.
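The repetition described above can be sketched as a simple control loop. This is an illustrative sketch under stated assumptions: `retrain_and_evaluate` is a hypothetical callback standing in for steps T330 to T370, `margin` stands for the predetermined value, and `max_rounds` is an added safeguard that the description does not mention.

```python
def refine_until_improved(first_accuracy, retrain_and_evaluate, margin=0.02, max_rounds=5):
    """Repeat second-discriminator relearning until the integrated result beats
    the first discriminator by at least `margin`.

    first_accuracy:       Pixel Accuracy of the first discriminator.
    retrain_and_evaluate: hypothetical callback; given the round index, it reruns
                          the relearning steps and returns the integrated Pixel Accuracy.
    margin:               the predetermined value (assumed default).
    max_rounds:           assumed upper bound so the loop always terminates.
    Returns (rounds_used, integrated_accuracy, improved).
    """
    integrated_accuracy = None
    for round_index in range(1, max_rounds + 1):
        integrated_accuracy = retrain_and_evaluate(round_index)
        if integrated_accuracy - first_accuracy >= margin:
            return round_index, integrated_accuracy, True
    return max_rounds, integrated_accuracy, False
```

Inside the callback, a region selected in an earlier round would lead to a change of learning data size or feature amount, and a newly selected region would lead to an additional detector, as described above.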

As described above, according to the present embodiment, the integrated discriminator, which integrates the identification result of the first discriminator and the detection result of the second discriminator, is evaluated using a learning evaluation image, and this improves the identification accuracy of the integrated discriminator.

[Fourth Embodiment]
Next, as a fourth embodiment, a configuration will be described in which the user defines the class of a region that was misidentified in the learning evaluation image, and a second discriminator corresponding to the case in that region is learned. In a preferred configuration, the first discriminator identifies the classes of a learning evaluation image registered by the user, and the user views the result and selects the relevant portions as misidentification regions. The second discriminator is then learned on the basis of the selected misidentification regions.

The image recognition device 20 and its processing flow in the present embodiment are the same as in the first embodiment, so their description is omitted. The learning device 300 and its processing flow in the present embodiment are described next. FIG. 10(c) shows the functional configuration of the learning device 300 in the present embodiment. In addition to the functional units of the learning device 300 shown in FIG. 10(a) for the first embodiment, the learning device 300 of the present embodiment has a display control unit 308 and a learning evaluation image acquisition unit 309. The display control unit 308 and the learning evaluation image acquisition unit 309 are described in detail later with reference to FIG. 11(c) and other figures. The rest of the configuration is the same as in FIG. 10(a), so its description is omitted.

Next, the learning processing in the present embodiment will be described. FIG. 11(c) is a flowchart showing the learning processing executed by the learning device 300 in the present embodiment. The first discriminator learning step T410 is the same as the first discriminator learning step T110 in the first embodiment, so its description is omitted.

In the learning evaluation image acquisition step T420, the learning evaluation image acquisition unit 309 acquires the learning evaluation image registered in the learning device 300 by the user. The acquired learning evaluation image is sent to the learning evaluation image holding unit 352 and stored there.

In the learning-time identification step T430, the first discriminator learning unit 301 uses the first discriminator to identify classes in the learning evaluation image acquired in the learning evaluation image acquisition step T420. The specific processing is the same as in the learning-time identification step T120 of the first embodiment, so its description is omitted.

In the misidentification region selection step T440, the display control unit 308 causes the display unit 406 to display the image used in the learning-time identification step T430. The image displayed on the display unit 406 can be selected in units of the small regions into which it was divided for class identification in the learning-time identification step T430, and the user selects misidentification regions by operating the operation unit 405 (a mouse or the like). The misidentification region selection unit 303 selects the misidentification regions by acquiring information on the regions that the user selected and indicated. A class is also defined for each selected misidentification region. The class may be selected by the misidentification region selection unit 303 from among predetermined classes, or it may be selected by the user, in which case the misidentification region selection unit 303 acquires the selected information.
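Selection in units of small regions can be sketched as a lookup from the clicked pixel to the small region that contains it. This is a hedged illustration only: the per-pixel label map, the click coordinate format, and the function name `select_regions_by_clicks` are assumptions, and the actual user interface is not specified in the description.

```python
def select_regions_by_clicks(label_map, clicks):
    """Return the set of small-region ids that the user clicked on.

    label_map: 2D list where label_map[y][x] is the small-region id of pixel (x, y),
               as produced by the segmentation used for class identification (assumed layout).
    clicks:    iterable of (x, y) pixel coordinates from the pointing device.
    """
    selected = set()
    for x, y in clicks:
        # Ignore clicks that fall outside the displayed image.
        if 0 <= y < len(label_map) and 0 <= x < len(label_map[y]):
            selected.add(label_map[y][x])
    return selected
```

The returned ids identify the misidentification regions, and the class chosen for each of them would be stored alongside.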

The second discriminator learning data generation step T450 and the second discriminator learning step T460 are the same as the second discriminator learning data generation step T140 and the second discriminator learning step T150 in the first embodiment, so their description is omitted.

As described above, in the present embodiment, the first discriminator learning unit 301 learns the first discriminator using the learning evaluation image registered by the user, and the misidentification region selection unit 303 selects misidentification regions by acquiring information on the regions that the user selected and indicated. This makes it possible to extract small regions that are difficult for the first discriminator to identify. In particular, because misidentification regions are selected on the basis of the user's instructions, learning evaluation images without GT can be used, and a discriminator with high identification accuracy for the images that the user wants recognized can be learned.

The present invention also covers processing in which software (a program) that implements the functions of the above embodiments is supplied to a system or apparatus via a network or any of various storage media, and a computer (or a CPU, MPU, or the like) of the system or apparatus reads and executes the program. The present invention may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device. The present invention is not limited to the above embodiments. Various modifications (including organic combinations of the embodiments) are possible on the basis of the spirit of the present invention, and these are not excluded from the scope of the present invention. That is, all configurations that combine the above embodiments and their modifications are also included in the present invention.

Description of Symbols
300 Learning device
301 First discriminator learning unit
302 Learning-time identification unit
303 Misidentification region selection unit
304 Second discriminator learning data generation unit
305 Second discriminator learning unit
306 Integrated discriminator learning unit

Claims

A learning method comprising:
a first learning step of learning, using a learning image, a first discriminator for identifying a class for each region of an image;
a learning-time identification step of identifying a class for each region of a learning evaluation image with the learned first discriminator;
a selection step of selecting a misidentification region for which the class identification result of the first discriminator on the learning evaluation image is incorrect;
a generation step of generating learning data using a region including the selected misidentification region; and
a second learning step of learning a second discriminator for identifying the class of the generated learning data.

The learning method according to claim 1, wherein the generation step selects the region including the misidentification region on the basis of correct-answer data of the learning evaluation image, and generates the learning data using the selected region.

The learning method according to claim 1, wherein the generation step selects the region including the misidentification region on the basis of an instruction from a user, and generates the learning data using the selected region.

The learning method according to any one of claims 1 to 3, wherein the generation step generates the learning data with the region including the misidentification region as a positive example, and with a region of the learning evaluation image that has the same shape as the region including the misidentification region and has the class erroneously identified by the first discriminator as a negative example.

The learning method according to claim 4, wherein the generation step sets the size of the region including the misidentification region on the basis of a distance between a feature amount extracted from the positive example region and a feature amount extracted from the negative example region.

The learning method according to claim 1, wherein the learning image and the learning evaluation image are data of the same image.

The learning method according to claim 1, further comprising a third learning step of learning an integrated discriminator that integrates the identification result of the first discriminator and the identification result of the second discriminator.

The learning method according to claim 7, further comprising an evaluation step of obtaining an evaluation value of the result of identifying a class for each region of the learning evaluation image with the integrated discriminator and an evaluation value of the result of identifying a class for each region of the learning evaluation image with the first discriminator,
wherein the second learning step is repeated, with the size of the region including the misidentification region varied, until the evaluation value of the identification result of the integrated discriminator exceeds the evaluation value of the identification result of the first discriminator by a predetermined value or more.

An image recognition method comprising:
a detection step of detecting, from a recognition target image, a region whose class is to be identified by a second discriminator;
an identification step of identifying a class for each region of the recognition target image using a first discriminator; and
an integrated identification step of identifying a class for each region of the recognition target image on the basis of the detection result of the detection step and the identification result of the identification step.

The image recognition method according to claim 9, wherein a first discriminator and a second discriminator learned by the learning method according to any one of claims 1 to 6 are used as the first discriminator and the second discriminator.

The image recognition method according to claim 9 or 10, further comprising an acquisition step of acquiring incidental information attached to the recognition target image,
wherein the detection step selects, on the basis of the acquired incidental information, the second discriminator to be used in the detection step from among a plurality of second discriminators.

The image recognition method according to claim 11, wherein the incidental information is any one of position information at the time the recognition target image was captured, information obtained by the camera that captured the recognition target image, parameters used by the camera when the recognition target image was captured, autofocus information of the camera at the time the recognition target image was captured, and a scene feature amount of the recognition target image.

A learning apparatus comprising:
first learning means for learning, using a learning image, a first discriminator for identifying a class for each region of an image;
learning-time identification means for identifying a class for each region of a learning evaluation image with the learned first discriminator;
selection means for selecting a misidentification region for which the class identification result of the first discriminator on the learning evaluation image is incorrect;
generation means for generating learning data using a region including the selected misidentification region; and
second learning means for learning a second discriminator for identifying the class of the generated learning data.

An image recognition apparatus comprising:
detection means for detecting, from a recognition target image, a region whose class is to be identified by a second discriminator;
identification means for identifying a class for each region of the recognition target image using a first discriminator; and
integrated identification means for identifying a class for each region of the recognition target image on the basis of the detection result of the detection means and the identification result of the identification means.

A program for causing a computer to execute the learning method according to any one of claims 1 to 8.

A program for causing a computer to execute the image recognition method according to any one of claims 9 to 12.