JP7300027B2

JP7300027B2 - Image processing device, image processing method, learning device, learning method, and program

Info

Publication number: JP7300027B2
Application number: JP2022022303A
Authority: JP
Inventors: 雅人青葉; 浩一馬養
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2017-07-05
Filing date: 2022-02-16
Publication date: 2023-06-28
Anticipated expiration: 2037-07-05
Also published as: JP7350208B2; JP2022068282A; JP2023115104A

Description

The present invention relates to an image processing device, a learning device, a focus control device, an exposure control device, an image processing method, a learning method, and a program.

In recent years, research on segmenting images into regions has been widely conducted. For example, a person region, an automobile region, a road region, a building region, a sky region, or the like can be cut out from an image. This is called semantic segmentation, and the segmentation result can be applied to image correction according to the type of subject, to scene interpretation, and so on.

One semantic segmentation method divides an image into several regions in advance and then classifies each of the divided regions. For example, an image can be divided into a plurality of rectangular blocks, and each block can be classified. As a method for classifying images, classification using deep learning, as described in Non-Patent Document 1, has been widely studied. Alternatively, an image can be divided into irregularly shaped small regions (superpixels) using, for example, the method described in Non-Patent Document 2, and each region can be classified using the feature amount of the region and the context feature amount around the region. For the classification, an estimator trained using learning images can be used.

In recent years, region segmentation using deep learning has also been studied. Non-Patent Document 3 uses intermediate layer outputs of a CNN (Convolutional Neural Network) as feature amounts and integrates the per-pixel class determination results obtained from a plurality of intermediate layer features. With this method, class determination can be performed directly for each pixel without using the result of segmentation into small regions.

Non-Patent Document 1: A. Krizhevsky et al., "ImageNet Classification with Deep Convolutional Neural Networks", Proc. Advances in Neural Information Processing Systems 25 (NIPS 2012).
Non-Patent Document 2: R. Achanta et al., "SLIC Superpixels", EPFL Technical Report 149300, 2010.
Non-Patent Document 3: J. Long et al., "Fully Convolutional Networks for Semantic Segmentation", The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2015.

According to the conventional methods, each small region of an image could be classified based on the type of subject. For example, based on the feature amount of each region, it was possible to determine whether the region represents the sky or the foreground (anything other than the sky). On the other hand, it has been difficult to make an appropriate determination for a region in which different types of subjects are mixed. For example, when a determination is made for a plurality of regions in which the sky is visible through the gaps between many tree branches, the textures of the regions are similar, so there was a high possibility that either all of the regions would be determined to be foreground or all of the regions would be determined to be sky.

An object of the present invention is to classify each region of an image in such a way that the accuracy of processing that uses the classification results can be improved.

In order to achieve the object of the present invention, for example, an image processing device of the present invention has the following configuration. That is, the image processing device comprises:
an acquisition means for acquiring an input image;
an extraction means for extracting a feature from the input image; and
an output means for outputting, from an estimator to which the feature extracted from the input image has been input, information corresponding to the area of a region belonging to a specific class in the input image,
wherein the estimator has parameters learned using learning images.

According to the present invention, each region of an image can be classified in such a way that the accuracy of processing that uses the classification results can be improved.

FIG. 1 is a block diagram showing a configuration example of a device according to each embodiment.
FIG. 2 is a flowchart explaining processing according to each embodiment.
FIG. 3 is a diagram explaining learning images and class labels.
FIG. 4 is a diagram explaining class labels and a mixed state in two-class identification.
FIG. 5 is a diagram explaining class labels and a mixed state in three-class identification.
FIG. 6 is a diagram explaining a mixed state in three-class identification.
FIG. 7 is an explanatory diagram of mixed-state mapping.
FIG. 8 is a diagram showing a configuration example of a learning unit or an estimation unit according to each embodiment.
FIG. 9 is a diagram explaining class label refinement in a mixed region.
FIG. 10 is a diagram showing a configuration example of a computer that can implement each embodiment.

According to one embodiment of the present invention, it is possible to estimate how a plurality of classes are mixed (hereinafter referred to as a mixed state) in a predetermined region that serves as an identification unit on an input image. Hereinafter, the image within the region to be estimated may be referred to as the target image. More specifically, according to one embodiment of the present invention, the mixed state of regions having mutually different attributes in the target image is determined. The region of each attribute is a region occupied by subjects belonging to the same class. That is, one of the attribute regions is the region of subjects belonging to a specific class, and another of the attribute regions is the region of subjects belonging to a class different from the specific class.

According to one embodiment, for a region in which, for example, the sky is visible through the gaps between many tree branches (foreground), the mixed state of the foreground portion and the sky portion (for example, the area ratio, the edge area, or the arrangement pattern) can be estimated. By using not only the class information of each region as obtained by conventional methods (for example, information indicating whether the region is a foreground region or a sky region) but also such information indicating the mixed state, the accuracy of processing subsequently performed on the image can be improved. Specific examples will be described in detail in each embodiment.

Embodiments of the present invention will be described below with reference to the drawings. However, the scope of the present invention is not limited to the following embodiments. In the following embodiments, each processing unit shown in FIG. 1 and elsewhere may be implemented by a computer or by dedicated hardware.

FIG. 10 is a diagram showing the basic configuration of a computer that can implement each embodiment. In FIG. 10, a processor 101 is, for example, a CPU and controls the operation of the entire computer. A memory 102 is, for example, a RAM and temporarily stores programs, data, and the like. A computer-readable storage medium 103 is, for example, a hard disk or a CD-ROM and stores programs, data, and the like over the long term. In the present embodiment, a program that implements the function of each unit and is stored in the storage medium 103 is read out to the memory 102. The processor 101 then operates in accordance with the program in the memory 102, whereby the function of each unit is implemented.

In FIG. 10, an input interface 104 is an interface for acquiring information from an external device. An output interface 105 is an interface for outputting information to an external device. A bus 106 connects the above units and enables them to exchange data.

[Embodiment 1]
The basic configurations of the image processing device and the learning device according to Embodiment 1 will be described with reference to FIGS. 1(A) and 1(B).

First, an overview of the configuration of the learning device will be described with reference to FIG. 1(A). In the present embodiment, the learning device generates, from learning images prepared in advance, an estimator that the image processing device described later uses when performing processing for recognizing a mixed state. Details of the learning processing will be described later. A learning data storage unit 5100 stores learning data prepared in advance. The learning data includes learning images and teacher information. A data acquisition unit 2100 acquires the learning images and the teacher information from the learning data storage unit 5100. A learning unit 2200 uses a feature extraction unit 610 to extract the feature amount of an identification image, which is located in a predetermined region of a learning image and is used for training the estimator. The learning unit 2200 also trains an estimator that estimates a mixed state from a feature amount, using combinations of the feature amount of an identification image and the teacher information. For example, the learning unit 2200 can train an estimator that outputs information indicating a mixed state when a feature amount is input. Here, the teacher information is information indicating the mixed state between regions of mutually different attributes in the identification image. The estimator obtained by learning is stored in an estimator storage unit 5200. Specifically, the estimator storage unit 5200 can store the estimator parameters determined by learning.

Next, an overview of the configuration of the image processing device will be described with reference to FIG. 1(B). In the present embodiment, the image processing device performs processing for estimating a mixed state in an unknown input image. Details of the processing will be described later. An image acquisition unit 1100 acquires an input image. An estimation unit 1200 uses the feature extraction unit 610 to extract a feature amount from a target image, which is located in a predetermined region of the input image and is the subject of mixed-state identification. The feature extraction unit 610 used by the estimation unit 1200 can operate in the same manner as the feature extraction unit 610 used by the learning unit 2200. The estimation unit 1200 also estimates, based on the feature amount, the mixed state of regions having mutually different attributes in the target image. For example, the estimation unit 1200 reads a pre-trained estimator 620 from the estimator storage unit 5200, inputs the feature amount to the estimator, and outputs to an output unit 1300 the resulting information indicating the mixed state between regions of mutually different attributes in the target image. The estimator 620 can be one obtained through learning by the learning unit 2200. The output unit 1300 outputs the result of estimation by the estimation unit 1200.

The data acquisition unit 2100 and the learning unit 2200 of the learning device may be implemented on the same computer, may be configured as independent modules, or may be implemented as programs running on a computer. The learning data storage unit 5100 and the estimator storage unit 5200 of the learning device can be implemented using storage inside or outside the computer.

The image acquisition unit 1100 and the estimation unit 1200 of the image processing device may be implemented on the same computer, may be configured as independent modules, or may be implemented as programs running on a computer. They may also be implemented as circuits or programs inside an imaging device such as a camera.

The image processing device may be implemented on the same computer as the learning device or on a separate computer. The estimator storage units 5200 of the learning device and the image processing device may be the same storage or different storages. When different storages are used, the estimator stored in the estimator storage unit 5200 by the learning device can be copied or moved to the estimator storage unit 5200 of the image processing device.

The processing according to the present embodiment will be described in detail below. First, the processing performed by the learning device during learning will be described following the flow of FIG. 2(A). In S2100, the data acquisition unit 2100 acquires learning images and mixed-state teacher information from the learning data storage unit 5100 as learning data.

The learning data storage unit 5100 stores a plurality of learning images and mixed-state teacher information in advance. A learning image is an image used for training the estimator. A learning image can be, for example, image data captured by a digital camera or the like. The format of the image data is not particularly limited and can be, for example, JPEG, PNG, or BMP. In the following, the number of prepared learning images is N, and the n-th learning image is denoted I_n (n = 1, ..., N).

The mixed-state teacher information indicates the mixed state in a predetermined region of a learning image. This teacher information is prepared in advance and can be created, for example, by a person looking at the learning image. In the present embodiment, a plurality of regions serving as identification units are set in each learning image, and teacher information is prepared for each region. Hereinafter, the image of a predetermined region in a learning image that serves as one identification unit is referred to as an identification image.

The method of setting the regions is not particularly limited. For example, a plurality of regions can be set in the input image according to a predetermined region setting pattern. As a specific example, a learning image can be divided into a plurality of rectangular regions of a predetermined size (for example, 16×16 pixels), and each rectangular region can be treated as an identification unit. The small regions obtained by the method described in Non-Patent Document 2 can also be treated as identification-unit regions. Alternatively, a rectangular region of a predetermined size may be set in only part of the learning image. Note that the learning data storage unit 5100 may store identification images of a predetermined size as learning data.
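The rectangular division described above can be sketched as follows. This is a minimal illustration only: the function name `split_into_blocks` is hypothetical, and dropping incomplete border tiles is an assumption, since the text does not say how image borders are handled.

```python
import numpy as np

def split_into_blocks(image, block=16):
    """Split an H x W (x C) array into non-overlapping block x block tiles.

    Rows and columns that do not fill a whole tile are dropped
    (an assumption made for this sketch).
    Returns a list of ((top, left), tile) pairs.
    """
    h, w = image.shape[:2]
    tiles = []
    for top in range(0, h - block + 1, block):
        for left in range(0, w - block + 1, block):
            tiles.append(((top, left), image[top:top + block, left:left + block]))
    return tiles

# A dummy 48 x 64 RGB image yields 3 x 4 tiles of 16 x 16 pixels.
image = np.zeros((48, 64, 3), dtype=np.uint8)
tiles = split_into_blocks(image)
```

Each tile would then play the role of one identification image, paired with its own teacher information.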

The mixed state indicated by the teacher information will be described below. The subjects in an image can be classified into a plurality of classes. FIG. 3 shows an example of such classification. FIG. 3(A) shows an example of a learning image 500. The learning image 500 contains sky, a person, and plants, each of which can be classified into a different class. That is, as shown in FIG. 3(B), the class label "sky" can be assigned to the pixels included in a region 541, the class label "person" to the pixels included in a region 542, and the class label "plant" to the pixels included in a region 543.

Classes and class labels can be defined in various ways, and the classification method is not particularly limited. In the example of FIG. 3, classification is performed according to the type of subject. Other examples of class labels include skin regions or hair regions, animals such as dogs or cats, and man-made objects such as automobiles or buildings. Class labels indicating specific objects, such as part A or part B used in a factory, can also be used. Alternatively, each pixel may be classified as belonging to a main subject region or a background region. Classification may also be performed according to differences in surface properties, such as glossy or matte surfaces, or differences in material, such as metal or plastic surfaces. In the following, it is assumed that there are M classes in total.

The mixed state of classes is the mixed state between regions of mutually different attributes in the target image. The region of each attribute is a region occupied by subjects belonging to the same specific class. One of the regions of mutually different attributes is the region of subjects belonging to a specific class, and the other is the region of subjects belonging to a class different from the specific class. Hereinafter, the attribute region occupied by subjects belonging to a certain class may be referred to simply as the region belonging to that class. In addition, the class of the subject captured at each pixel may be referred to below as the attribute or class of the pixel.

The mixed state can be defined in various ways. In the present embodiment, the mixed state is expressed numerically as follows. In one embodiment, the information indicating the mixed state is information determined by the distribution of the attribute regions in the target image. For example, the information indicating the mixed state is information representing the proportion of each attribute region in the target image. As a specific example, the information indicating the mixed state can be the area ratio of the regions belonging to each class in the target image. The case of two classes, "sky" and "non-sky", will be described with reference to the example in FIG. 4. FIG. 4 shows a learning image 510 and its class label 520. The class label 520 represents pixels whose class is "sky" in white and pixels whose class is "non-sky" in black. FIG. 4 also shows an enlarged view (526) of a region 525 on the class label 520 that corresponds to an identification image 515 in the learning image 510. The enlarged view (526) shows a non-sky region 511 and a sky region 522. The mixed state of the identification image 515 can then be represented by the area ratio r between the sky region and the non-sky region in the corresponding region 525, expressed here as the fraction of the region occupied by sky. For example, if a 16×16 pixel rectangular region contains 192 sky pixels and 64 non-sky pixels, then r = 192/256 = 0.75.
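The two-class area ratio can be computed directly from a per-pixel label patch. The sketch below assumes a label encoding of 1 for sky and 0 for non-sky, which is an illustrative choice and not something the text specifies; the function name is likewise hypothetical.

```python
import numpy as np

def sky_area_ratio(labels, sky_label=1):
    """Fraction of pixels in a label patch that carry the sky label."""
    labels = np.asarray(labels)
    return float(np.count_nonzero(labels == sky_label)) / labels.size

# Reproduce the worked example: 192 sky pixels in a 16 x 16 region.
patch = np.zeros((16, 16), dtype=np.uint8)
patch[:12, :] = 1  # 12 rows x 16 columns = 192 sky pixels
r = sky_area_ratio(patch)  # 192 / 256 = 0.75
```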

The above example describes the area ratio of two classes, but the area ratio of three or more classes can also be represented. FIG. 5 shows a learning image 530, an identification image 535 in the learning image 530, and a class label 536 for the identification image 535. In this example, each pixel of the learning image 530 is classified into one of three classes: "sky", "plant", and "artifact". The mixed state in this case can be determined according to the area ratio of a plant region 531, a sky region 532, and an artifact region 533. As an example, a point 545 indicating the area ratio can be plotted, according to the area ratio of each class, on a simplex 540 (a triangle in the case of FIG. 5) in the coordinate space shown in FIG. 6(A). Since the point 545 can be expressed uniquely using the internal division ratios t_1 and t_2 that internally divide two sides of the simplex, the area ratio in this case can be represented by the vector r = (t_1, t_2). The same holds in the general M-dimensional case, so the area ratio for M classes can be represented uniquely by the (M−1)-dimensional vector r = (t_1, t_2, ..., t_{M−1}). Note that the area ratio for two classes described above is equivalent to setting M = 2 in this generalized form.
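As a rough illustration of why M − 1 numbers suffice, the sketch below computes the per-class area fractions of a label patch. The fractions sum to 1, so the last one is redundant. Note that this simply drops the last fraction; it is not the internal-division parameterization (t_1, t_2) of FIG. 6(A), whose exact construction the text does not spell out. The class indices 0, 1, 2 and the function name are assumptions.

```python
import numpy as np

def class_fractions(labels, num_classes):
    """Area fraction of each class in an integer label patch."""
    counts = np.bincount(np.asarray(labels).ravel(), minlength=num_classes)
    return counts / counts.sum()

patch = np.zeros((16, 16), dtype=np.int64)  # class 0 everywhere
patch[:4, :] = 1    # 64 pixels of class 1
patch[4:8, :8] = 2  # 32 pixels of class 2

fractions = class_fractions(patch, num_classes=3)  # one value per class
reduced = fractions[:-1]  # M - 1 free values; the last is 1 - sum(reduced)
```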

The M-class area ratio expressed as above may also be handled after mapping it to a lower-dimensional space. For example, the area ratios of identification images can be plotted in an M-dimensional space and mapped to a lower-dimensional space using an SOM (Self-Organizing Map) or LLE (Locally Linear Embedding). FIG. 6(B) shows an example in which the space of the three-class mixing ratios described above is quantized with a one-dimensional SOM 550. Reference numeral 551 denotes the start node of the SOM 550, and 552 denotes its end node. FIG. 6(C) is obtained by mapping these onto a one-dimensional space. If the position of the start node is 0, the position of the end node is 1, and the nodes are arranged at equal intervals, the point 545 indicating the area ratio can be expressed by a scalar value representing a position on the map. For example, the point 545 indicating the area ratio in FIG. 6(B) can be approximated as a point 546 in FIG. 6(C), and the area ratio can be represented by the position of the point 546 on the map (r = 0.37 in this example). The number of dimensions after mapping is not limited to one as in FIG. 6(C). In general, when M is large, the space can be approximated by an SOM of R dimensions (R being 1 or more), in which case r can be represented by an R-dimensional vector. For example, when the number of classes is M = 5 and the area ratio space is quantized with a two-dimensional SOM, r can be represented by a two-dimensional vector.
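The lookup step of such a one-dimensional map can be sketched as follows, assuming the SOM has already been trained. The node weights below are hand-picked for illustration and are not from the source; the sketch also snaps to the nearest node rather than interpolating between nodes, so it can only return the evenly spaced node positions.

```python
import numpy as np

def som_position(point, nodes):
    """Position in [0, 1] of the node nearest to `point` on a 1-D map.

    `nodes` is an ordered sequence of node weight vectors; the first node
    sits at position 0 and the last at position 1, evenly spaced.
    """
    nodes = np.asarray(nodes, dtype=float)
    distances = np.linalg.norm(nodes - np.asarray(point, dtype=float), axis=1)
    best = int(np.argmin(distances))
    return best / (len(nodes) - 1)

# Hypothetical trained nodes tracing a path through 3-class ratio space.
nodes = [
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]
r = som_position([0.1, 0.8, 0.1], nodes)  # nearest node is the third of five
```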

The M-class area ratio may also be expressed as a composite of a plurality of basis vectors. For example, the class area ratios obtained from various identification images can be decomposed into a plurality of basis vectors using principal component analysis, sparse coding, or the like, and approximated by a small number of vectors with large contributions. In this case, an area ratio in the area ratio space can be expressed as a composite vector of these basis vectors, and the area ratio can be represented using the weighting coefficients for the respective basis vectors.
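One way to obtain such basis vectors and weighting coefficients is principal component analysis via a singular value decomposition. The sketch below uses randomly generated area ratios as stand-in data, and the function name `pca_weights` is hypothetical; sparse coding, which the text also mentions, is not shown.

```python
import numpy as np

def pca_weights(ratios, num_basis):
    """Approximate area-ratio vectors by a few principal basis vectors.

    Returns the mean vector, the basis (num_basis x M), and the weighting
    coefficients (num_samples x num_basis) for each input vector.
    """
    ratios = np.asarray(ratios, dtype=float)
    mean = ratios.mean(axis=0)
    centered = ratios - mean
    # Rows of vt are orthonormal directions ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:num_basis]
    weights = centered @ basis.T
    return mean, basis, weights

# Stand-in data: 200 random 5-class area ratios (each row sums to 1).
rng = np.random.default_rng(0)
ratios = rng.dirichlet(np.ones(5), size=200)
mean, basis, weights = pca_weights(ratios, num_basis=2)
approx = mean + weights @ basis  # reconstruction from two coefficients
```

Here each row of `weights` would serve as the reduced representation of one identification image's area ratio.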

As another example, the information indicating the mixed state can be information relating to the boundary between regions of mutually different attributes in the target image, for example, information representing the proportion of pixels in the target image that represent this boundary. As an example, edge detection is performed on a binary image indicating the class of each pixel (for example, sky region or non-sky region), the resulting edge pixels are counted, and the mixed state is represented by the ratio e of the number of edge pixels to the number of pixels in the predetermined region. FIG. 4 shows an edge detection result 527 for the class label 526, in which the detected edge pixels 523 are shown. If the count of edge pixels in a 16×16 pixel rectangular region is 64, the edge pixel ratio can be expressed as e = 64/256 = 0.25.
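The edge pixel ratio can be sketched as below. The text does not name an edge detector, so this sketch assumes a simple rule: a pixel is an edge pixel if any of its 4-neighbours has a different class. Under this rule pixels on both sides of a boundary are counted, so the value of e will differ from what another detector would give.

```python
import numpy as np

def edge_pixel_ratio(labels):
    """Ratio of boundary pixels to all pixels in a label patch.

    A pixel counts as an edge pixel when a vertical or horizontal
    neighbour carries a different class label (assumed rule).
    """
    labels = np.asarray(labels)
    edge = np.zeros(labels.shape, dtype=bool)
    diff_v = labels[:-1, :] != labels[1:, :]  # vertical neighbour differs
    edge[:-1, :] |= diff_v
    edge[1:, :] |= diff_v
    diff_h = labels[:, :-1] != labels[:, 1:]  # horizontal neighbour differs
    edge[:, :-1] |= diff_h
    edge[:, 1:] |= diff_h
    return np.count_nonzero(edge) / labels.size

# Top half sky (1), bottom half non-sky (0): rows 7 and 8 form the boundary.
patch = np.zeros((16, 16), dtype=np.uint8)
patch[:8, :] = 1
e = edge_pixel_ratio(patch)  # 32 edge pixels / 256 pixels
```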

As yet another example, the information indicating the mixed state can be information representing the arrangement of the attribute regions in the target image. For example, the mixed state can be represented according to the arrangement pattern of the pixels of each class within the predetermined region. If the number of classes is M and the number of pixels in the predetermined region is K, the class of each pixel in the predetermined region can be represented by an (M×K)-dimensional binary vector. For example, if the two classes "sky" and "non-sky" are defined and the size of the predetermined region is 16×16 pixels, the class label arrangement pattern in the predetermined region can be expressed as a binary vector of 2×16×16 = 512 dimensions. By plotting the various binary vectors obtained in this way from identification images in a vector space and quantizing them using an SOM, LLE, or the like, the class label arrangement pattern in the predetermined region can be expressed as a vector p. It is also possible to express the various binary vectors obtained from identification images with basis vectors using principal component analysis, sparse coding, or the like.
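The (M×K)-dimensional binary vector can be built as a flattened one-hot encoding of the label patch. The ordering of the elements (all pixels of class 0 first, then class 1, and so on) and the class indices are assumptions made for this sketch; the text fixes only the dimensionality.

```python
import numpy as np

def placement_vector(labels, num_classes):
    """Flattened one-hot encoding of a label patch (length M * K)."""
    flat = np.asarray(labels).ravel()
    one_hot = np.zeros((num_classes, flat.size), dtype=np.uint8)
    # Set exactly one class row to 1 for every pixel column.
    one_hot[flat, np.arange(flat.size)] = 1
    return one_hot.ravel()

# Two classes on a 16 x 16 patch give a 2 * 256 = 512-dimensional vector.
patch = np.zeros((16, 16), dtype=np.int64)
patch[:12, :] = 1
v = placement_vector(patch, num_classes=2)
```

Vectors of this kind, collected over many identification images, would be the input to the SOM or basis-vector step described above.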

FIG. 7 shows a map 900 obtained by mapping the class label arrangement patterns in the predetermined region using a two-dimensional SOM. In the map 900, each square is a node of the SOM and represents a quantized class label arrangement pattern. Owing to the nature of the SOM, similar patterns are placed close to each other on the map. The class label arrangement pattern in an identification image can be represented by position coordinates p on the map, based on its closeness to the pattern of each node. For example, in the two-dimensional SOM of FIG. 7, a position on the map can be represented by the two-dimensional vector p = (p_1, p_2).

このように、混合状態はさまざまな方法で表現することができる。混合状態は、これらの表現のうちいずれか一つを用いて表してもよい。例えば、混合状態を面積比だけで表すのであれば、Ｃ＝ｒと定義すればよく、エッジ画素率だけで表わすのであればＣ＝ｅと表わせばよく、クラスラベル配置パターンだけで表すのであればＣ＝ｐと定義すればよい。また、混合状態は複数の表現を組み合わせて表してもよい。例えば、混合状態を面積比とエッジ画素率との組み合わせＣ＝（ｒ，ｅ）と定義してもよいし、面積比、エッジ画素率、及びクラスラベル配置パターンの組み合わせＣ＝（ｒ，ｅ，ｐ）と定義してもよい。本発明において、混合状態の表現方法は特に限定されない。 Thus, the mixed state can be represented in various ways. Mixed states may be represented using any one of these representations. For example, if the mixed state is expressed only by the area ratio, it can be defined as C=r, if it is expressed only by the edge pixel rate, it can be expressed as C=e, and if it is expressed only by the class label arrangement pattern, It suffices to define C=p. Also, the mixed state may be expressed by combining multiple expressions. For example, the mixed state may be defined as a combination of area ratio and edge pixel ratio C=(r, e), or a combination of area ratio, edge pixel ratio, and class label arrangement pattern C=(r, e, p). In the present invention, the method of expressing the mixed state is not particularly limited.

上記のように、混合状態Ｃは１つ以上の数値で表されるベクトルとして表現することができる。すなわち、一実施形態において得られる混合状態を表す情報は、所定領域における混合状態を表す特徴量であるといえる。混合状態Ｃを表すベクトルの次元数をＬとする。以下では、画像Ｉ_ｎ上の所定領域ｉにおける混合状態ベクトルをＣ_ｎｉと表し、混合状態ベクトルＣ_ｎｉのｌ（ｌ＝１，……，Ｌ）番目の要素をｃ（ｎ，ｉ，ｌ）と表す。なお、混合状態は、所定領域において各クラスの画素がどのように混合されているかだけでなく、所定領域が特定の１つのクラスの画素で構成されることを示してもよい。 As described above, the mixed state C can be expressed as a vector of one or more numerical values. That is, the information representing the mixed state obtained in one embodiment can be said to be a feature quantity representing the mixed state in the predetermined region. Let L be the number of dimensions of the vector representing the mixed state C. In the following, the mixed state vector in a predetermined region i on the image I_n is denoted C_ni, and the l-th (l = 1, ..., L) element of the mixed state vector C_ni is denoted c(n, i, l). Note that the mixed state may indicate not only how the pixels of each class are mixed in the predetermined region, but also that the predetermined region is composed of pixels of one specific class.

本実施形態においては、各学習画像の各画素について図３（Ｂ）に示すようにクラスラベルが与えられているものとする。そして、このクラスラベルに基づいて、学習画像から得られるそれぞれの識別画像について、上記のようにスカラ値又はベクトル値として表される混合状態Ｃが教師情報として予め算出され、学習データ記憶部５１００に予め格納されているものとする。しかしながら、データ取得部２１００は、識別画像の各画素の属性を示す情報を取得し、各画素の属性を示す情報を用いて混合状態を示す情報を生成することにより、教師情報を取得してもよい。例えば、データ取得部２１００は、学習データ記憶部５１００に格納されている学習画像の各画素のクラスラベルに基づいて、上記のようにそれぞれの識別画像の混合状態Ｃを算出することができる。さらに、各学習画像の各画素について図３（Ｂ）に示すようにクラスラベルが与えられていることは必須ではない。例えば、学習画像から得られる識別画像を見ながら作業者が入力したこの識別画像の混合状態、又は作業者が入力した情報（エッジ情報など）に基づいて自動的に算出されたこの識別画像の混合状態が、学習データ記憶部５１００に予め格納されていてもよい。 In this embodiment, it is assumed that each pixel of each learning image is given a class label as shown in FIG. 3(B). Based on these class labels, the mixed state C, expressed as a scalar or vector value as described above, is assumed to be calculated in advance as teacher information for each identification image obtained from the learning images, and stored in advance in the learning data storage unit 5100. However, the data acquisition unit 2100 may instead obtain the teacher information by acquiring information indicating the attribute of each pixel of the identification image and using it to generate the information indicating the mixed state. For example, the data acquisition unit 2100 can calculate the mixed state C of each identification image as described above, based on the class label of each pixel of the learning image stored in the learning data storage unit 5100. Furthermore, it is not essential that each pixel of each learning image be given a class label as shown in FIG. 3(B). For example, the learning data storage unit 5100 may store in advance a mixed state of an identification image that a worker entered while viewing that identification image, or a mixed state of the identification image that was calculated automatically from information entered by the worker (edge information or the like).

ステップＳ２２００で、学習部２２００は、データ取得部２１００から識別画像及び混合状態の教師情報を取得して、混合状態を推定する推定器の学習を行う。以下では、推定器としてＣＮＮ（ＣｏｎｖｏｌｕｔｉｏｎａｌＮｅｕｒａｌＮｅｔｗｏｒｋ）を利用する場合について説明する。ＣＮＮの構成としては従来既知のものを用いることができる。典型的には、ＣＮＮは、畳み込み層とプーリング層とを繰り返すことにより入力信号の局所的な特徴を次第にまとめていき、変形や位置ずれに対してロバストな特徴を得ることにより、認識タスクを行うニューラルネットワークである。 In step S2200, the learning unit 2200 acquires the identification image and the teacher information of the mixture state from the data acquisition unit 2100, and learns an estimator for estimating the mixture state. A case of using a CNN (Convolutional Neural Network) as an estimator will be described below. A conventionally known configuration can be used as the configuration of the CNN. Typically, CNNs perform recognition tasks by progressively summarizing local features of the input signal by repeating convolutional and pooling layers to obtain features that are robust against deformation and misalignment. It is a neural network.

ＣＮＮを用いた推定処理の例を、図８（Ａ）を参照しながら説明する。学習部２２００は、特徴抽出部６１０を用いて、推定器の学習に用いる識別画像の特徴量を抽出する。また、学習部２２００は、特徴量を入力されると混合状態を示す情報を出力する推定器の学習を、識別画像の特徴量と教師情報との組み合わせを用いて行う。図８（Ａ）は、学習部２２００が処理のために用いることができるＣＮＮの一例を示す。図８（Ａ）には、特徴抽出部６１０が行う処理に相当する部分が示されており、これは特徴抽出を行うＣＮＮの畳み込み層に相当する。また、図８（Ａ）には、学習を行う推定器６２０に相当する部分が示されており、これはパターン推定を行うＣＮＮの完全結合層に相当する。 An example of estimation processing using a CNN will be described with reference to FIG. 8(A). The learning unit 2200 uses the feature extraction unit 610 to extract the feature quantity of the identification image used for training the estimator. The learning unit 2200 also trains an estimator that outputs information indicating the mixed state when a feature quantity is input, using combinations of the feature quantity of an identification image and the teacher information. FIG. 8(A) shows an example of a CNN that the learning unit 2200 can use for processing. FIG. 8(A) shows a portion corresponding to the processing performed by the feature extraction unit 610, which corresponds to the convolutional layers of the CNN that perform feature extraction. FIG. 8(A) also shows a portion corresponding to the estimator 620 to be trained, which corresponds to the fully connected layers of the CNN that perform pattern estimation.

畳み込み層は、学習画像の部分画像である識別画像６３０の各位置における畳み込み演算結果を信号として受け取る入力層６１１を有する。入力層６１１からの信号は、畳み込み層とプーリング層とが配置され、畳み込み演算とプーリングによる信号の選択とが繰り返される複数の中間層６１２，６１３を介して、最終層６１５へと送られる。特徴抽出部６１０の最終層６１５からの出力信号は、推定器６２０へと送られる。以下では、特徴抽出部６１０の出力信号をＸとする。完全結合層では、各層の素子が前後の層と全結合しており、特徴抽出部６１０から入力された信号は、重み係数を用いた積和演算を介して出力層６４０へと送られる。出力層６４０は、混合状態ベクトルＣの次元数Ｌと同数の出力素子を有している。 The convolutional layers include an input layer 611 that receives, as a signal, the result of the convolution operation at each position of the identification image 630, which is a partial image of the learning image. The signal from the input layer 611 is sent to the final layer 615 through a plurality of intermediate layers 612 and 613, in which convolutional layers and pooling layers are arranged and the convolution operation and signal selection by pooling are repeated. The output signal from the final layer 615 of the feature extraction unit 610 is sent to the estimator 620. In the following, the output signal of the feature extraction unit 610 is denoted X. In the fully connected layers, the elements of each layer are fully connected to the preceding and following layers, and the signal input from the feature extraction unit 610 is sent to the output layer 640 through sum-of-products operations using weighting coefficients. The output layer 640 has the same number of output elements as the number of dimensions L of the mixed state vector C.

学習部２２００は、推定器の学習を行う際に、学習画像Ｉ_ｎの所定領域ｉから得られた識別画像をＣＮＮに入力した際に、出力層６４０で得られる出力信号の値を、教師情報と比較する。ここで、学習画像Ｉ_ｎの所定領域ｉを特徴抽出部６１０に入力して得られた特徴量をＸ^ｎ_ｉ、これを推定器６２０に入力した結果得られた出力層６４０におけるｌ番目の素子の出力信号をｙ_ｌ（Ｘ^ｎ_ｉ）とする。また、出力層６４０のうちｌ番目の出力素子における教師信号は、混合状態Ｃ_ｎｉのｌ番目の要素ｃ（ｎ，ｉ，ｌ）で表される。この場合、出力信号と教師情報との誤差は下記のように計算される。 When training the estimator, the learning unit 2200 compares the value of the output signal obtained at the output layer 640, when the identification image obtained from the predetermined region i of the learning image I_n is input to the CNN, with the teacher information. Here, let X^n_i be the feature quantity obtained by inputting the predetermined region i of the learning image I_n to the feature extraction unit 610, and let y_l(X^n_i) be the output signal of the l-th element of the output layer 640 obtained by inputting this feature quantity to the estimator 620. The teacher signal for the l-th output element of the output layer 640 is the l-th element c(n, i, l) of the mixed state C_ni. In this case, the error between the output signal and the teacher information is calculated as follows.
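The equation referred to above does not appear in this text. One form consistent with the symbols defined in the preceding paragraph would be a sum of squared errors over learning images n, predetermined regions i, and output elements l. The squared-error form and the absence of any normalizing factor are assumptions and are not taken from the source:

```latex
E = \sum_{n} \sum_{i} \sum_{l=1}^{L} \left( y_l\!\left(X^{n}_{i}\right) - c(n, i, l) \right)^{2}
```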

誤差逆伝搬法を用いて、このように得られた誤差を出力層から入力層へと順次逆伝搬することにより、ＣＮＮの学習を行うことができる。例えば、確率的勾配降下法等を用いてＣＮＮにおける各層の重み係数を更新することができる。ＣＮＮの重み係数の初期値としては、ランダムな値を用いることもできるし、何らかのタスクに関する学習により得られた重み係数を用いてもよい。例えば、画像分類タスクにおいては画像ごとにクラスラベルが与えられた学習画像を用いるが、領域分割タスクにおいては画素ごとにクラスラベルが与えられた学習画像を用いるため、領域分割タスク用の学習画像を人間が用意するための負荷は大きい。一方、画像分類タスク用の学習画像は一般に公開されており、簡単に入手することができる。例えばＩＬＳＶＲＣ（ＩｍａｇｅＮｅｔＬａｒｇｅ－ｓｃａｌｅＶｉｓｕａｌＲｅｃｏｇｎｉｔｉｏｎＣｈａｌｌｅｎｇｅ）では１２０万枚の画像分類タスク用の学習画像が公開されている。よって、このような画像分類タスクのためにＣＮＮの学習を行い、この学習により得られた重み係数を初期値として用いて、本実施形態のような混合状態推定タスクのための学習を行ってもよい。 The CNN can be trained by using the error backpropagation method to propagate the error obtained in this way backward, layer by layer, from the output layer to the input layer. For example, the weighting coefficients of each layer of the CNN can be updated using stochastic gradient descent or the like. Random values can be used as the initial values of the CNN weighting coefficients, or weighting coefficients obtained by training on some other task may be used. For example, an image classification task uses learning images with one class label per image, whereas a region segmentation task uses learning images with a class label for every pixel, so preparing learning images for a region segmentation task places a heavy burden on people. In contrast, learning images for image classification tasks are publicly available and easy to obtain. For example, ILSVRC (ImageNet Large-scale Visual Recognition Challenge) has published 1.2 million learning images for an image classification task. Therefore, the CNN may first be trained for such an image classification task, and the weighting coefficients obtained by that training may be used as initial values when training for the mixed state estimation task of this embodiment.

ここではＣＮＮを用いた推定器について説明したが、推定器の構成は特に限定されない。図８（Ｂ）には、推定部１２００が処理のために用いることができる構成の別の一例を示す。図８（Ｂ）には、特徴抽出部６１０が行う処理に相当する部分、及び推定器６５０が行う処理に相当する部分が示されている。推定器６５０は、特徴抽出部６１０における各層の出力信号を連結して得られた一つの特徴量に対する回帰値を与える。推定器６５０が用いる手法としては、例えばＳＶＲ（ＳｕｐｐｏｒｔＶｅｃｔｏｒＲｅｇｒｅｓｓｉｏｎ）やロジスティック回帰等が挙げられるが、特に限定されない。そして、学習画像を用いて、この推定器６５０が用いる回帰関数の学習を行うことができる。例えば、上記の出力信号と教師情報との誤差に基づく誤差関数を最小化するように、回帰関数の学習を行うことができる。また、図８（Ａ）のような構成を用いてＣＮＮの学習をあらかじめ行っておき、その後にＣＮＮの各層の出力信号に基づく特徴量を使って推定器６５０のみの学習を行ってもよい。ここで、推定器６５０を完全結合の多層ニューラルネットワークで構成すれば、図８（Ａ）の構成と同様に、誤差逆伝搬法を用いてＣＮＮと推定器６５０との学習を同時に行うこともできる。 Although an estimator using a CNN has been described here, the configuration of the estimator is not particularly limited. FIG. 8(B) shows another example of a configuration that the estimation unit 1200 can use for processing. FIG. 8(B) shows a portion corresponding to the processing performed by the feature extraction unit 610 and a portion corresponding to the processing performed by the estimator 650. The estimator 650 gives a regression value for a single feature quantity obtained by concatenating the output signals of the layers of the feature extraction unit 610. Methods that the estimator 650 can use include, for example, SVR (Support Vector Regression) and logistic regression, but are not particularly limited. The regression function used by the estimator 650 can then be trained using learning images. For example, the regression function can be trained so as to minimize an error function based on the error between the output signal and the teacher information described above. Alternatively, the CNN may be trained in advance using the configuration shown in FIG. 8(A), after which only the estimator 650 is trained using feature quantities based on the output signals of the layers of the CNN. If the estimator 650 is configured as a fully connected multilayer neural network, the CNN and the estimator 650 can also be trained simultaneously using the error backpropagation method, as in the configuration of FIG. 8(A).

また、特徴抽出部６１０は、ＨＯＧ又はＳＩＦＴのような別の特徴抽出手法を用いて特徴量を抽出することができる。また、推定器は、ＳＶＲ、ロジスティック回帰、又は多層ニューラルネットワーク等の識別関数を用いて、混合状態の推定を行うことができる。このように、一実施形態においては、特徴抽出手法と推定手法との任意の組み合わせを用いることができる。このような場合にも、従来の方法に従って推定器の学習を行うことができる。ステップＳ２２００における学習により得られた推定器のパラメータは、推定器記憶部５２００に記憶される。 Also, the feature extraction unit 610 can extract feature amounts using another feature extraction method such as HOG or SIFT. The estimator can also use discriminant functions such as SVR, logistic regression, or multi-layer neural networks to estimate the mixture state. Thus, in one embodiment, any combination of feature extraction and estimation techniques can be used. Even in such a case, the estimator can be trained according to the conventional method. The estimator parameters obtained by learning in step S 2200 are stored in estimator storage section 5200 .

このようにして学習が行われた推定器を用いて、入力画像の混合状態を識別する方法について、図２（Ｂ）のフローチャートを参照して説明する。Ｓ１１００において、画像取得部１１００は、混合状態の識別対象となる入力画像を取得する。画像取得部１１００は、撮像装置から得られた現像前の画像データを取得することもできる。 A method of identifying the mixed state of the input image using the estimator trained in this way will be described with reference to the flowchart of FIG. 2(B). In S1100, the image acquisition unit 1100 acquires an input image to be mixed state identification target. The image acquisition unit 1100 can also acquire image data before development obtained from an imaging device.

以下で、入力画像の所定領域にある混合状態の推定対象となる画像を対象画像と呼ぶ。画像取得部１１００は、所定の領域設定パターンに従って、入力画像中に複数の領域を設定することができる。設定された領域のそれぞれに含まれる入力画像の部分画像が、対象画像となる。対象画像は、識別単位に従う所定サイズの部分画像であり、その設定方法は特に限定されない。例えば、学習時と同様、入力画像を所定サイズ（例えば１６×１６ピクセル）の複数の矩形領域に分割し、それぞれの矩形領域にある複数の対象画像について判定を行うことができる。一方、入力画像の一部領域にある対象画像に対して判定を行ってもよい。 Hereinafter, an image that is an object for estimating a mixed state in a predetermined region of an input image is called a target image. The image acquisition section 1100 can set a plurality of areas in the input image according to a predetermined area setting pattern. A partial image of the input image included in each of the set regions is the target image. The target image is a partial image of a predetermined size according to the identification unit, and its setting method is not particularly limited. For example, as in learning, the input image can be divided into a plurality of rectangular regions of a predetermined size (eg, 16×16 pixels), and determination can be made for a plurality of target images in each rectangular region. On the other hand, the determination may be performed on the target image in a partial area of the input image.
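The division of an input image into rectangular regions of a predetermined size can be sketched as follows. The function name is hypothetical, and skipping tiles that would cross the image border is an assumption, since the text does not say how partial tiles are handled.

```python
def tile_regions(height, width, size=16):
    """Yield (top, left, bottom, right) boxes covering the image in size x size tiles.

    Tiles that would extend past the image border are skipped (an assumption).
    """
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            yield (top, left, top + size, left + size)


# A 48x64 image yields 3 rows x 4 columns of 16x16 target regions.
boxes = list(tile_regions(48, 64))
print(len(boxes), boxes[0], boxes[-1])
```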

ステップＳ１２００において、推定部１２００は、特徴抽出部６１０を用いて、Ｓ１１００で得られた入力画像の所定領域にある対象画像から特徴量を抽出する。また、推定部１２００は、学習された推定器６２０を推定器記憶部５２００から読み込み、推定器６２０に特徴量を入力することにより、対象画像における互いに異なる属性の領域間の混合状態を示す情報を生成する。こうして、推定部１２００は、ステップＳ１１００で取得した入力画像中の対象画像について混合状態を推定する。図８（Ａ）は、推定部１２００が処理のために用いることができるＣＮＮの一例を示す。図８（Ａ）には、特徴抽出部６１０が行う処理に相当する部分が示されており、ここでは入力画像中の所定領域における信号が各層に順伝搬され、対象画像の特徴量Ｘ_ｉが抽出される。また、図８（Ａ）には、推定器６２０に相当する部分が示されており、ここでは得られた特徴量Ｘ_ｉから、混合状態ベクトルの各要素に割り当てられた出力素子６２１における出力信号が生成される。各素子ｌの出力信号の値は、混合状態ベクトルの各要素ｙ_ｌ（Ｘ_ｉ）の値となる。 In step S1200, estimation section 1200 uses feature extraction section 610 to extract a feature amount from the target image in a predetermined region of the input image obtained in S1100. In addition, the estimating unit 1200 reads the learned estimator 620 from the estimator storage unit 5200 and inputs the feature amount to the estimator 620, thereby obtaining information indicating the state of mixture between regions of different attributes in the target image. Generate. Thus, estimation section 1200 estimates the mixed state of the target image in the input image acquired in step S1100. FIG. 8A shows an example of CNN that can be used for processing by the estimator 1200 . FIG. 8A shows a portion corresponding to the processing performed by the feature extraction unit 610. Here, the signal in a predetermined region in the input image is forward propagated to each layer, and the feature amount X _i of the target image is extracted. Also, FIG. 8A shows a portion corresponding to the estimator 620, where the output signal at the output element 621 assigned to each element of the mixture state vector is determined from the obtained feature quantity X _i is generated. The value of the output signal of each element l is the value of each element y _l (X _i ) of the mixed state vector.

ステップＳ１３００において、出力部１３００は、ステップＳ１２００で得られた推定結果を出力する。出力部１３００が行う処理は、識別結果の利用方法に依存し、特に限定されない。混合状態を示す情報を用いた処理例を以下に挙げる。 In step S1300, output section 1300 outputs the estimation result obtained in step S1200. The processing performed by the output unit 1300 depends on how the identification result is used, and is not particularly limited. An example of processing using information indicating a mixed state is given below.

例えば、入力画像の各領域に対する画像処理を、その領域における混合状態に応じて変更することができる。この場合、出力部１３００は画像補正アプリケーションに対して各領域の混合状態を出力することができる。 For example, the image processing for each region of the input image can be changed according to the blending state in that region. In this case, the output unit 1300 can output the mixed state of each region to the image correction application.

また、別の例として、混合状態に応じたカメラのフォーカス制御を行うこともできる。例えば、複数の測距点を備える撮像装置のためのフォーカス制御装置は、取得部と、制御部とを備えることができる。取得部は、撮像装置により得られた画像のうち複数の測距点のそれぞれに対応する領域について、領域に占める特定の属性の領域の面積比を示す情報を取得する。そして、制御部は、面積比に応じて複数の測距点を重み付けし、撮像装置のフォーカス制御を行う。より具体的には、多点測距ＡＦを行う場合に、フォーカスを合わせる対象となる被写体成分がより多い測距点の重みを大きくすることができる。例えば、前景に重点を置くフォーカス制御を行う場合、前景成分がより多い測距点の重みを大きくすることができ、特定の被写体に重点を置くフォーカス制御を行う場合、特定の被写体成分がより多い測距点の重みを大きくすることができる。このようなフォーカス制御装置は、上記の情報処理装置から混合状態を示す情報を取得してもよいし、上記の情報処理装置が備える上記の各構成を有していてもよいし、本実施形態とは異なる方法で生成された混合状態を示す情報を取得してもよい。 As another example, camera focus control can be performed according to the mixed state. For example, a focus control device for an imaging device with multiple ranging points can include an acquisition unit and a control unit. The acquisition unit acquires information indicating an area ratio of a region having a specific attribute in an image obtained by the imaging device, with respect to each region corresponding to each of the plurality of ranging points. Then, the control unit weights the plurality of distance measuring points according to the area ratio, and performs focus control of the imaging device. More specifically, when performing multi-point ranging AF, it is possible to increase the weight of ranging points having more subject components to be focused. For example, when performing focus control that emphasizes the foreground, it is possible to increase the weight of AF points with more foreground components. The weight of the distance measurement point can be increased. Such a focus control device may acquire information indicating a mixed state from the above information processing device, may have the above configurations provided in the above information processing device, or may have the above configuration. Information indicating the mixed state generated by a method different from that may be obtained.
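One way to picture the weighting of ranging points is the sketch below, which combines per-point distance estimates into a single focus target. The use of a weighted mean, the function name, and the sample values are all assumptions; the text states only that ranging points containing more of the subject of interest receive larger weights.

```python
def weighted_focus_distance(distances, subject_ratios):
    """Combine per-point distance estimates, weighting each ranging point by the
    area ratio of the subject of interest in its corresponding region.

    A weighted mean is an assumed combination rule, not taken from the source.
    """
    total = sum(subject_ratios)
    if total == 0:
        return None  # no ranging point contains the subject
    return sum(d * r for d, r in zip(distances, subject_ratios)) / total


# Three ranging points; the third sees only background and gets zero weight.
distances = [2.0, 2.2, 10.0]
ratios = [0.9, 0.6, 0.0]
d = weighted_focus_distance(distances, ratios)
print(d)
```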

さらなる別の例として、混合状態に応じたカメラの露出制御を行うこともできる。例えば、撮像装置のための露出制御装置は、取得部と、算出部と、選択部と、制御部とを備えることができる。取得部は、撮像装置により得られた画像と、画像の各領域について、領域に占める特定の属性の領域の面積比を示す情報を取得することができる。算出部は、画像全体に占める特定の属性の領域の面積比を算出することができる。選択部は、算出された面積比に応じて、露出制御アルゴリズムを選択することができる。制御部は、選択された露出制御アルゴリズムを用いて、撮像装置の露出制御を行うことができる。より具体的には、視野における空の面積に応じて異なる露出制御を行う場合に、混合状態に基づいて空の面積を算出することができる。この場合、従来技術のように、空と枝が混ざっている領域について、ほとんどの領域を前景であると判定したり、ほとんどの領域について空であると判定したりすることにより、空の面積が実際の値と大きく異なってしまう可能性を減らせることが期待できる。 As yet another example, camera exposure control can be performed according to the mixed state. For example, an exposure control device for an imaging device can include an acquisition unit, a calculation unit, a selection unit, and a control unit. The acquisition unit can acquire an image obtained by the imaging device and, for each region of the image, information indicating the area ratio of the region of a specific attribute within that region. The calculation unit can calculate the area ratio of the region of the specific attribute within the entire image. The selection unit can select an exposure control algorithm according to the calculated area ratio. The control unit can control the exposure of the imaging device using the selected exposure control algorithm. More specifically, when different exposure control is performed depending on the area of sky in the field of view, the area of sky can be calculated based on the mixed state. In the conventional technique, a region where sky and branches are mixed tends to be judged either mostly foreground or mostly sky, so the calculated sky area can differ greatly from the actual value. Calculating the area from the mixed state can be expected to reduce that possibility.
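The calculation unit and selection unit described above could be sketched as follows. The threshold of 0.5 and the algorithm names are hypothetical placeholders; the text does not name specific exposure control algorithms or selection criteria.

```python
def overall_area_ratio(region_ratios, region_areas):
    """Area-weighted mean of per-region ratios, i.e. the ratio over the whole image."""
    return sum(r * a for r, a in zip(region_ratios, region_areas)) / sum(region_areas)


def select_exposure_algorithm(sky_ratio, threshold=0.5):
    """Pick an algorithm name by sky coverage. Threshold and names are hypothetical."""
    return "sky_weighted" if sky_ratio >= threshold else "subject_weighted"


# Four 16x16 regions: one all sky, two with mixed sky and branches, one with no sky.
ratios = [1.0, 0.4, 0.4, 0.0]
areas = [256, 256, 256, 256]
sky = overall_area_ratio(ratios, areas)
algo = select_exposure_algorithm(sky)
print(sky, algo)
```

Because the mixed regions contribute their estimated ratio of 0.4 rather than being rounded to 0 or 1, the overall sky area is not skewed toward either extreme.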

ここでは静止画像を学習画像及び入力画像として用いる場合について説明したが、動画像を学習画像及び入力画像として用いることもできる。この場合、混合状態の定義は時間方向に拡張される。例えば、１６×１６ｐｉｘｅｌの所定領域及び５フレームを識別単位とする場合、１６×１６×５のボクセルに関して混合状態を定義することができる。例えば、面積比を用いて混合状態を表す上記の例を拡張することにより、体積比を用いて混合状態を表すことが可能である。 Although the case where still images are used as learning images and input images has been described here, moving images can also be used as learning images and input images. In this case, the definition of mixed state is extended in the temporal direction. For example, if a predetermined region of 16×16 pixels and 5 frames are used as identification units, the mixed state can be defined with respect to 16×16×5 voxels. For example, by extending the above example of expressing mixed states using area ratios, it is possible to express mixed states using volume ratios.
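The extension from an area ratio to a volume ratio over a block of frames can be illustrated with the sketch below, which counts labelled voxels in a 16×16×5 block. The nested-list representation and the function name are assumptions made for illustration.

```python
def volume_ratio(label_frames, target_class):
    """Fraction of voxels labelled `target_class` over frames x rows x columns."""
    total = 0
    hits = 0
    for frame in label_frames:
        for row in frame:
            total += len(row)
            hits += sum(1 for c in row if c == target_class)
    return hits / total


# 5 frames of 16x16 labels: 3 frames all "sky" (0), 2 frames half "sky", half "non-sky" (1).
sky_frame = [[0] * 16 for _ in range(16)]
half_frame = [[0] * 8 + [1] * 8 for _ in range(16)]
clip = [sky_frame] * 3 + [half_frame] * 2
r = volume_ratio(clip, target_class=0)
print(r)
```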

本実施形態では、それぞれが複数の画素を含む複数の領域へと入力画像（及び学習画像）が分割され、この領域内の混合状態が推定された。このような処理によれば、全ての画素のそれぞれについてクラスを推定する場合と比較して、推定処理の回数が少なくなるため、処理の高速化が期待できる。一方、入力画像のそれぞれの画素について混合状態を推定することもできる。すなわち、１つの画素に異なるクラスに属する複数の被写体が写っていることがあり、この１つの画素に対応する被写体領域における、それぞれのクラスの被写体の混合状態を推定することもできる。 In this embodiment, the input image (and training image) was divided into multiple regions, each containing multiple pixels, and the mixture state within the regions was estimated. According to such processing, compared to the case of estimating the class for each of all pixels, the number of times of estimation processing is reduced, and speeding up of processing can be expected. On the other hand, the mixture state can also be estimated for each pixel of the input image. That is, a plurality of subjects belonging to different classes may appear in one pixel, and it is possible to estimate the mixed state of the subjects of each class in the subject area corresponding to this one pixel.

本実施形態では、混合状態を示す情報はスカラ値又は複数のスカラ値で構成されるベクトルとして得られた。一方で、混合状態を示す情報は、３つ以上の値から選択される情報でありうる。例えば、所定領域におけるクラス「空」及び「非空」の混合状態を示す情報は、所定領域が「空」で構成されることを示す値、所定領域が「非空」で構成されることを示す値、又は所定領域において「空」及び「非空」が混合されていることを示す値でありうる。このような混合状態を示す情報も、上述の処理例及び後述する実施形態４，５において利用可能である。 In this embodiment, the information indicating the mixed state was obtained as a scalar value or as a vector composed of a plurality of scalar values. Alternatively, the information indicating the mixed state may be information selected from three or more values. For example, the information indicating the mixed state of the classes "sky" and "non-sky" in a predetermined region may be a value indicating that the predetermined region consists of "sky", a value indicating that the predetermined region consists of "non-sky", or a value indicating that "sky" and "non-sky" are mixed in the predetermined region. Information indicating such a mixed state can also be used in the processing examples described above and in Embodiments 4 and 5 described later.

［実施形態２］ [Embodiment 2]
実施形態１では、学習画像の各画素に対してクラスラベルが設定されていることを前提にして説明した。しかしながら、画素毎にクラスラベルを設定するには時間がかかる。実施形態２では、学習画像に対してクラスラベルを入力するユーザ作業を軽減する方法を説明する。本実施形態においては、学習画像の各領域に対して入力されたクラスラベルに基づいて、データ取得部２１００は、画素ごとのクラスラベルを自動的に算出する。 Embodiment 1 was described on the assumption that a class label is set for each pixel of the learning image. However, setting a class label for every pixel takes time. Embodiment 2 describes a method of reducing the user's work of entering class labels for learning images. In this embodiment, the data acquisition unit 2100 automatically calculates the class label of each pixel based on the class labels entered for each region of the learning image.

以下、図１（Ｃ）を参照して、本実施形態における学習装置の基本的な構成を説明する。本実施形態における画像処理装置の構成は実施形態１と同様であり、説明を省略する。本実施形態において学習データ記憶部５１００には、識別画像の他に、識別画像における第１の属性の領域、第２の属性の領域、及び第１の属性の領域と第２の属性の領域とが混在している混在領域を示す情報が格納されている。例えば、学習データ記憶部５１００は、学習画像と、学習画像上の各領域に対して付与されたクラスラベルと、を含む学習データが記憶する。ここで、複数のクラスが混在している領域には、混在領域であることを示すクラスラベルが与えられている。 The basic configuration of the learning device according to this embodiment will be described below with reference to FIG. 1(C). The configuration of the image processing device according to this embodiment is the same as in Embodiment 1, and its description is omitted. In this embodiment, the learning data storage unit 5100 stores, in addition to the identification image, information indicating a region of a first attribute, a region of a second attribute, and a mixed region in which the region of the first attribute and the region of the second attribute are mixed, within the identification image. For example, the learning data storage unit 5100 stores learning data that includes a learning image and a class label assigned to each region of the learning image. Here, a region in which a plurality of classes are mixed is given a class label indicating that it is a mixed region.

データ取得部２１００は、学習データ記憶部５１００から学習データを読み込む。すなわち、データ取得部２１００は、識別画像の他に、識別画像における第１の属性の領域、第２の属性の領域、及び第１の属性の領域と第２の属性の領域とが混在している混在領域を示す情報を取得する。 The data acquisition unit 2100 reads the learning data from the learning data storage unit 5100. That is, in addition to the identification image, the data acquisition unit 2100 acquires information indicating a region of the first attribute, a region of the second attribute, and a mixed region in which the region of the first attribute and the region of the second attribute are mixed, within the identification image.

詳細化部２３００は、第１の属性の領域に含まれる画素の画素値、及び前記第２の属性の領域に含まれる画素の画素値に基づいて、混在領域の各画素の属性を判定する。例えば、詳細化部２３００は、混在領域であることを示すクラスラベルが与えられている領域について、混合状態を示す教師情報を算出する。詳細については後述する。学習部２２００は、学習画像と混合状態の教師情報とを用いて、実施形態１と同様に推定器の学習を行う。 The refinement unit 2300 determines the attribute of each pixel in the mixed area based on the pixel value of the pixel included in the area of the first attribute and the pixel value of the pixel included in the area of the second attribute. For example, the detailing unit 2300 calculates supervised information indicating a mixed state for an area given a class label indicating a mixed area. Details will be described later. The learning unit 2200 performs learning of the estimator in the same manner as in the first embodiment using the learning image and the teacher information of the mixed state.

本実施形態において学習装置が行う処理のフローを、図２（Ｃ）に従って説明する。ステップＳ２１００においてデータ取得部２１００は、学習データ記憶部５１００から、学習画像とクラスラベルデータとを学習データとして読み込む。学習データ記憶部５１００には、あらかじめ複数の学習画像とそれぞれについてのクラスラベルデータとが用意されている。 The flow of processing performed by the learning device in this embodiment will be described with reference to FIG. 2(C). In step S2100, the data acquisition unit 2100 reads the learning image and the class label data from the learning data storage unit 5100 as learning data. A plurality of learning images and class label data for each are prepared in advance in the learning data storage unit 5100 .

ここで、本実施形態におけるクラスラベルデータについて説明する。図９（Ａ）には学習画像５００が示されており、図９（Ｂ）には学習画像５００についてのクラスラベルデータ４００が示されている。この例では、学習画像５００は空領域４１０、非空領域４２０、及び混在領域４３０から構成されており、それぞれの領域の画素には「空」、「非空」、及び「混在」がそれぞれクラスラベルとして付されている。このように、学習画像５００には、単一クラスの領域と、複数クラスが混在している領域と、が設定されている。 Here, the class label data in this embodiment will be explained. A learning image 500 is shown in FIG. 9A, and class label data 400 for the learning image 500 is shown in FIG. 9B. In this example, the training image 500 is composed of a sky region 410, a non-sky region 420, and a mixed region 430. Pixels in each region are assigned classes of “sky,” “non-sky,” and “mixed.” attached as a label. Thus, in the learning image 500, a single-class area and a multi-class area are set.

これらのクラスラベルは、ツール等を介して予め人間が入力することができる。例えば作業者は、学習画像の空領域及び非空領域を決定することができる。その際、前景の木の枝が細かく複雑になっている箇所においては、空領域と非空領域とを正確に切り分けることは、作業者に対する大きな作業負荷を要求する。そこで、作業者は、このように複数のクラスが混在している領域に対しては、「混在」というクラスラベルを与えることができる。 These class labels can be entered in advance by a human using a tool or the like. For example, the operator can determine sky and non-sky regions of the training images. At that time, in a place where the branches of the foreground tree are fine and complicated, accurately separating the sky area and the non-sky area requires a heavy workload for the operator. Therefore, the operator can assign a class label of "mixed" to an area in which a plurality of classes are mixed in this way.

ここでは、「空」と「非空」が混在している領域について説明したが、実施形態１で説明したように、クラス定義はこのようなものに限定されない。また、クラスが３クラス以上ある場合には、クラスの組み合わせの数だけ混在領域の種類を定義することができる。例えば、図５に示すように「空」、「植物」、「人工物」の３クラスが定義されている場合には、「空と植物の混在領域」、「空と人工物の混在領域」、「植物と人工物の混在領域」、「空と植物と人工物の混在領域」、の４種類の混在領域クラスを定義できる。以下では、「空」と「非空」の２クラスが定義されている場合を例にして説明する。 A region in which "sky" and "non-sky" are mixed has been described here, but as described in Embodiment 1, the class definition is not limited to this. When there are three or more classes, as many types of mixed region can be defined as there are combinations of classes. For example, when the three classes "sky", "plant", and "artifact" are defined as shown in FIG. 5, four mixed region classes can be defined: "mixed region of sky and plant", "mixed region of sky and artifact", "mixed region of plant and artifact", and "mixed region of sky, plant, and artifact". The following description uses the case where the two classes "sky" and "non-sky" are defined as an example.
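The count of four mixed region classes for three base classes follows from enumerating every combination of two or more classes, which the short sketch below reproduces. The function name is hypothetical.

```python
from itertools import combinations


def mixed_region_classes(classes):
    """List every combination of two or more classes; each is one mixed-region type."""
    return [
        combo
        for k in range(2, len(classes) + 1)
        for combo in combinations(classes, k)
    ]


mixed = mixed_region_classes(["sky", "plant", "artifact"])
for combo in mixed:
    print(" + ".join(combo))
```

For M classes this enumeration yields 2^M − M − 1 mixed region types, which gives 4 for M = 3 and 1 for M = 2.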

ステップＳ２３００において、詳細化部２３００は、混在領域に関してクラスラベルの詳細化を行う。具体的には、詳細化部２３００は、混在領域の各画素についてクラスラベルを設定する。ここで、詳細化部２３００は、第１の属性の領域に含まれる画素の画素値、及び第２の属性の領域に含まれる画素の画素値に基づいて、混在領域の各画素の属性を判定する。例えば、詳細化部２３００は、各クラスの色情報を参考に、混在領域のクラスラベルを判定することができる。具体例として、詳細化部２３００は、学習画像Ｉ_ｎにおける、空領域と非空領域とのそれぞれについて、各画素のＲＧＢ値を抽出してＲＧＢ色空間にプロットする。混在領域以外の空領域及び非空領域は、学習データに示されている。そして、詳細化部２３００は、空領域と非空領域とのそれぞれについて、混合ガウス分布を推定する。すると、混合領域の各画素について、そのＲＧＢ値及び空領域の混合ガウス分布に基づいて空領域の尤度を求めることができ、またそのＲＧＢ値及び非空領域の混合ガウス分布に基づいて非空領域にある尤度を推定することができる。詳細化部２３００は、そして、「空」「非空」のうち尤度が高い方のクラスラベルを画素に割り当てることができる。こうして、詳細化部２３００は、混在領域内のクラスラベルを詳細化することができる。図９（Ｃ）は、このようにして詳細化されたクラスラベルデータ４５０を示し、ここには詳細化された空領域４６０及び非空領域４７０が表されている。 In step S2300, the refinement unit 2300 refines the class labels for the mixed region. Specifically, the refinement unit 2300 sets a class label for each pixel in the mixed region. Here, the refinement unit 2300 determines the attribute of each pixel in the mixed region based on the pixel values of the pixels included in the region of the first attribute and the pixel values of the pixels included in the region of the second attribute. For example, the refinement unit 2300 can determine the class labels of the mixed region with reference to the color information of each class. As a specific example, the refinement unit 2300 extracts the RGB value of each pixel in the sky region and in the non-sky region of the learning image I_n and plots them in the RGB color space. The sky region and non-sky region other than the mixed region are indicated in the learning data. The refinement unit 2300 then estimates a Gaussian mixture distribution for each of the sky region and the non-sky region. For each pixel in the mixed region, the likelihood of belonging to the sky region can then be obtained from its RGB value and the Gaussian mixture distribution of the sky region, and the likelihood of belonging to the non-sky region can be obtained from its RGB value and the Gaussian mixture distribution of the non-sky region. The refinement unit 2300 can then assign to the pixel whichever of the class labels "sky" and "non-sky" has the higher likelihood. In this way, the refinement unit 2300 can refine the class labels in the mixed region. FIG. 9(C) shows the class label data 450 refined in this way, in which the refined sky region 460 and non-sky region 470 are shown.
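The per-pixel likelihood comparison can be sketched as follows. To keep the example short and dependency-free, each class is modelled by a single Gaussian with diagonal covariance, which is a simplification of the Gaussian mixture distribution described in the text. All function names and the sample RGB values are hypothetical.

```python
import math


def fit_diag_gaussian(pixels):
    """Per-channel mean and variance of a list of (R, G, B) tuples."""
    n = len(pixels)
    means = [sum(p[ch] for p in pixels) / n for ch in range(3)]
    variances = [
        max(sum((p[ch] - means[ch]) ** 2 for p in pixels) / n, 1e-6)  # floor avoids division by zero
        for ch in range(3)
    ]
    return means, variances


def log_likelihood(pixel, model):
    """Log density of one RGB pixel under a diagonal Gaussian model."""
    means, variances = model
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        for x, m, v in zip(pixel, means, variances)
    )


def refine_labels(mixed_pixels, sky_pixels, non_sky_pixels):
    """Assign each mixed-region pixel the class whose model gives the higher likelihood."""
    sky_model = fit_diag_gaussian(sky_pixels)
    non_sky_model = fit_diag_gaussian(non_sky_pixels)
    return [
        "sky" if log_likelihood(p, sky_model) >= log_likelihood(p, non_sky_model) else "non-sky"
        for p in mixed_pixels
    ]


# Illustrative pixel samples from the labelled sky and non-sky regions.
sky = [(120, 170, 240), (130, 180, 250), (125, 175, 245)]
non_sky = [(40, 60, 30), (50, 70, 35), (45, 65, 25)]
labels = refine_labels([(128, 176, 244), (48, 62, 33)], sky, non_sky)
print(labels)
```

Replacing `fit_diag_gaussian` and `log_likelihood` with a fitted Gaussian mixture model per class would bring the sketch closer to the procedure in the text without changing the comparison step.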

Based on the class label data refined in this way, the refinement unit 2300 calculates teacher information representing the mixed state for each identification area serving as an identification unit. The definitions of the identification area and of the teacher information indicating the mixed state, and the method of calculating them, are as described in detail in the first embodiment, so a detailed description is omitted here. Note that it is not essential for the refinement unit 2300 to refine the class labels. For example, the mixed state in the mixed region can be estimated from the RGB value distribution of the pixels in the mixed region within the identification area and the Gaussian mixture distributions of the sky region and the non-sky region, and the teacher information representing the mixed state in the identification area may be calculated on that basis.

As a modification, the learning data may specify a mixed state for a region in which a plurality of classes are mixed. For example, an operator can input information indicating the area ratio of the classes for a specific region, such as "the proportion of the non-sky region is 30%". In this case, the refinement unit 2300 can calculate the teacher information representing the mixed state for the identification area serving as the identification unit without estimating a class label for each pixel. Alternatively, the refinement unit 2300 can estimate the class label of each pixel of the input image with reference to the mixed state. In this case, as in the fifth embodiment described later, the estimation can use an evaluation value that becomes higher as the similarity increases between the information representing the mixed state that can be calculated from the learning data and the information indicating the mixed state calculated from the estimated attribute of each pixel.

[Embodiment 3]
In the first and second embodiments, the description assumed that the identification area serving as the identification unit is set in advance as a rectangular region or a small region. However, the size of the identification area and the way it is cut out can be changed based on various kinds of shooting information. For example, in a strongly blurred region, fine texture information is lost, so estimating over a wider identification area may improve the accuracy of mixed-state estimation.

The shooting information includes information specific to the imaging device and information specific to the captured image. Information specific to the imaging device includes the sensor size or the diameter of the permissible circle of confusion, and the brightness or focal length of the optical system. Information specific to the captured image includes the aperture value, focus distance, Bv value, RAW image, exposure time, gain (ISO sensitivity), white balance coefficients, distance information, position information from GPS or the like, and time information such as the date and time. Other information specific to the captured image includes the gravity sensor value, acceleration, geomagnetic direction, temperature, humidity, atmospheric pressure, and altitude at the time of shooting. Some imaging systems can also obtain infrared or ultraviolet information in addition to visible light. The shooting information that can be obtained differs depending on the specifications of the imaging device. The shooting information may be information attached in association with the input image when the input image was captured, information indicating the state of the imaging device when the input image was captured, or information measured by the imaging device when the input image was captured. The shooting information may also be information representing characteristics of the input image detected by the imaging device when the input image was captured. The shooting information is distinct from the data of the input image itself.

The basic configuration of the learning device according to the third embodiment will be described with reference to FIG. 1(D). Learning data is stored in advance in the learning data storage unit 5100. In this embodiment, the learning data includes learning images, shooting information corresponding to each learning image, and teacher information on the mixed state given for regions of various sizes on the learning images. The data acquisition unit 2100 reads the learning images, shooting information, and teacher information from the learning data storage unit 5100. The learning unit 2200 uses the learning images and the teacher information on the mixed state to train estimators that estimate the mixed state, and stores the resulting estimators in the estimator storage unit 5200. Here, the learning unit 2200 generates a first estimator by learning with identification images in predetermined regions set according to a first area setting pattern, and generates a second estimator by learning with identification images in predetermined regions set according to a second area setting pattern. The evaluation unit 2400 uses confirmation data read from the confirmation data storage unit 5400 to evaluate the estimation accuracy of each estimator obtained by learning. The evaluation unit 2400 then generates an area setter based on the shooting information and the estimation accuracy, and stores it in the setter storage unit 5300.

Next, an overview of the configuration of the image processing device will be described with reference to FIG. 1(E). The image acquisition unit 1100 acquires an input image and shooting information. The area setting unit 1400 selects, from among a plurality of area setting patterns and according to the shooting information, the area setting pattern used to set the target image. In this embodiment, the area setting unit 1400 reads the area setter from the setter storage unit 5300 and sets the region serving as the identification unit according to the shooting information. The estimation unit 1200 reads the estimator from the estimator storage unit 5200 and uses it to estimate the mixed state of the target image in the predetermined region set according to the chosen identification unit.

The processing in this embodiment is described in detail below. First, the processing during learning will be described with reference to the flowchart of FIG. 2(D). In step S2100, the data acquisition unit 2100 reads the learning images, the shooting information, and the teacher information on the mixed state from the learning data storage unit 5100 as learning data.

ステップＳ２２００において、学習部２２００は、データ取得部２１００が取得した学習画像と混合状態の教師情報とを用いて、混合状態を推定する推定器の学習を行う。上述のように、本実施形態においては複数種類の領域設定パターンのそれぞれに従って識別単位が設定される。すなわち、識別単位となる領域としては、さまざまなものが用意されている。例えば、３×３、９×９、及び１５×１５の矩形領域など、サイズの異なる複数パターンの識別単位を用意することができる。実施形態１でも説明したように、識別単位は矩形領域には限られない。例えば、実施形態１で説明したように、複数の領域設定パターンとして、領域分割により小領域を設定する際に用いるパラメータを複数用意することができる。 In step S2200, the learning unit 2200 uses the learning image acquired by the data acquisition unit 2100 and the teacher information of the mixed state to learn an estimator for estimating the mixed state. As described above, in the present embodiment, identification units are set according to each of a plurality of types of area setting patterns. That is, various areas are prepared as identification units. For example, it is possible to prepare a plurality of patterns of identification units having different sizes, such as rectangular areas of 3×3, 9×9, and 15×15. As described in the first embodiment, identification units are not limited to rectangular areas. For example, as described in the first embodiment, it is possible to prepare a plurality of parameters used when setting small regions by region division as a plurality of region setting patterns.

Even at the same position on the image, the teacher information on the mixed state can change depending on the area setting pattern. FIG. 3(C) shows rectangular regions 551, 552, and 553 of various sizes at the same position in a learning image. In the smallest rectangular region 551, the sky:non-sky area ratio is r = 1. The rectangular regions 552 and 553, on the other hand, each include a non-sky region, so their area ratios are r = 0.9 and r = 0.8, respectively.
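The dependence of the teacher value on window size can be illustrated with a small sketch. The label map, the window sizes, and the function name `sky_ratio` are assumptions made for this example (the map is not the image of FIG. 3(C)), and the window is assumed to lie entirely inside the image.

```python
def sky_ratio(label_map, center_row, center_col, size):
    """Fraction of sky pixels (label 1) in a size x size window centred on a pixel."""
    half = size // 2
    window = [
        label_map[row][col]
        for row in range(center_row - half, center_row + half + 1)
        for col in range(center_col - half, center_col + half + 1)
    ]
    return sum(window) / len(window)

# Hypothetical 15 x 15 label map: rows 0-10 are sky (1), rows 11-14 are non-sky (0).
label_map = [[1 if row < 11 else 0 for _ in range(15)] for row in range(15)]

# Teacher values for three window sizes centred on the same pixel (7, 7).
ratios = {size: sky_ratio(label_map, 7, 7, size) for size in (3, 9, 15)}
```

Here the 3×3 window contains only sky, while the larger windows reach the non-sky rows, so the ratio falls as the window grows.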

The learning unit 2200 trains an estimator corresponding to each area setting pattern. That is, the learning unit 2200 trains the estimator corresponding to an area setting pattern of interest based on the identification areas set according to that pattern and the teacher information given for those identification areas. As a result, the learning unit 2200 generates an estimator for each of the plurality of area setting patterns. For example, if q is the index of the area setting pattern and Q is the total number of area setting patterns, learning yields Q estimators y_q. The estimators can be trained in the same manner as in the first embodiment. As an example, each estimator y_q can estimate the mixed state according to a regression function f_q(X) (q = 1, ..., Q). The estimators obtained by learning are stored in the estimator storage unit 5200.

In step S2300, the evaluation unit 2400 evaluates the identification accuracy of the estimators obtained in step S2200 together with the shooting information, and generates an area setter. For example, the evaluation unit 2400 can evaluate the identification accuracy of each estimator using verification images with which teacher information and shooting information are associated. The evaluation unit 2400 can then generate information indicating the estimator that corresponds to specific shooting information, so that good identification accuracy is obtained when an identification image associated with given shooting information is judged.

The shooting information includes information obtained for each pixel of the learning image. New shooting information can also be derived by combining items of shooting information. For example, if the distance Z(p) from the lens surface to the subject at pixel position p and the focal length f of the optical system are obtained as shooting information, the image magnification S(p) can be calculated.
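The formula for S(p) is not reproduced in this text. As an assumption based on the standard thin-lens model, and not taken from the publication, the magnification for a subject at distance Z(p) is commonly written as:

```latex
S(p) = \frac{f}{Z(p) - f} \approx \frac{f}{Z(p)} \qquad \text{for } Z(p) \gg f
```

Under this assumption a more distant subject yields a smaller S(p), which is why the magnification can serve as a per-pixel cue for how large an identification area should be.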

Furthermore, if the F-number of the optical system, the focal length f, the focus distance Z_f at the time of shooting, and the distance Z(p) to the subject at pixel position p are obtained as shooting information, the blur amount B(p) at each pixel position can be obtained.
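The formula for B(p) is likewise not reproduced in this text. The sketch below uses the standard thin-lens blur-circle diameter as a stand-in; this choice, the function name `blur_diameter`, and the example values are assumptions and may differ from the definition used in the publication.

```python
def blur_diameter(f_number, focal_length, focus_distance, subject_distance):
    """Blur-circle diameter on the sensor under a thin-lens model.

    All lengths share one unit (e.g. millimetres); the result is in that unit.
    """
    aperture_diameter = focal_length / f_number
    magnification_at_focus = focal_length / (focus_distance - focal_length)
    defocus = abs(subject_distance - focus_distance) / subject_distance
    return aperture_diameter * magnification_at_focus * defocus

# Hypothetical 50 mm lens at F2 focused at 2 m.
in_focus = blur_diameter(2.0, 50.0, 2000.0, 2000.0)   # subject on the focal plane
behind = blur_diameter(2.0, 50.0, 2000.0, 4000.0)     # subject at 4 m
```

Dividing the result by the pixel pitch of the sensor would express the blur in pixels, which is a convenient unit for binning.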

In addition, if the values r(p), g(p), and b(p) at each pixel position p of the RAW image, the exposure time T, the gain G, and the aperture value F are obtained as shooting information, the absolute value of the incident light amount BV(p) at pixel position p can be obtained.

以下、撮影情報として画素位置ｐにおけるボケ量Ｂ（ｐ）を用いて、領域設定器を生成する場合について説明する。もっとも、用いる撮影情報はこれには限定されず、像倍率Ｓ（ｐ）又は入射光量ＢＶ（ｐ）等の他の撮影情報を用いてもよい。また、複数の撮影情報を組み合わせてもよく、例えばボケ量Ｂ（ｐ）と入射光量ＢＶ（ｐ）を組み合わせて用いてもよい。 A case of generating an area setter using the blur amount B(p) at the pixel position p as the imaging information will be described below. However, the photographic information to be used is not limited to this, and other photographic information such as the image magnification S(p) or the incident light amount BV(p) may be used. Also, a plurality of shooting information may be combined, for example, the amount of blur B(p) and the amount of incident light BV(p) may be combined and used.

まず、評価部２４００は、ボケ量Ｂを複数のビンに区切り、領域設定パターンｑに関するテーブルを生成する。この例では、ボケ量Ｂが２未満、２以上３未満、３以上４未満、４以上、の４つのビンに区切られている。また、領域設定パターンｑとしては３×３、９×９、及び１５×１５の３種類が用いられており、３×４のテーブルが得られる。 First, the evaluation unit 2400 divides the blur amount B into a plurality of bins and generates a table regarding the region setting pattern q. In this example, the blur amount B is divided into four bins: less than 2, 2 or more and less than 3, 3 or more and less than 4, and 4 or more. Three types of 3×3, 9×9, and 15×15 are used as the area setting pattern q, and a 3×4 table is obtained.

Next, the evaluation unit 2400 reads confirmation data from the confirmation data storage unit 5400. Like the learning data, the confirmation data includes a plurality of confirmation images, class label data for each confirmation image, and shooting information. Here, the total number of confirmation images is denoted N_ν, and the ν-th confirmation image is denoted I_ν (ν = 1, ..., N_ν).

According to each area setting pattern q, the evaluation unit 2400 extracts the feature amount of each region i serving as an identification unit in a confirmation image and inputs it to the corresponding estimator. This yields the estimate y_q(X_i^ν) of the mixed state of region i in confirmation image I_ν when area setting pattern q is used. The squared error with respect to the mixed-state teacher information c_q(ν, i) is then

$$\left( y_q(X_i^{\nu}) - c_q(\nu, i) \right)^2 .$$

The mean squared error MSE(B, q) in the bin (B, q) for the combination of blur amount B and area setting pattern q is expressed as

$$\mathrm{MSE}(B, q) = \frac{\sum_{\nu}\sum_{i} \delta_B(\nu, i)\,\left( y_q(X_i^{\nu}) - c_q(\nu, i) \right)^2}{\sum_{\nu}\sum_{i} \delta_B(\nu, i)} .$$

Here, δ_B(ν, i) returns 1 when the blur amount at the center position of region i of confirmation image I_ν falls within the range of bin B, and 0 otherwise.

The reliability T(B, q) for the bin (B, q) can then be defined as 1 minus the root mean squared error:

$$T(B, q) = 1 - \sqrt{\mathrm{MSE}(B, q)} .$$

In this way, the evaluation unit 2400 can obtain a table of the reliability T(B, q) for each bin (B, q). An example of the table thus obtained is shown below. The evaluation unit 2400 stores the table in the setter storage unit 5300 as the area setter.
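How such a table might be assembled from confirmation data can be sketched as follows. The bin edges follow the four-bin example given earlier; the sample values, the string pattern names, and the helpers `blur_bin` and `reliability_table` are hypothetical and only illustrate the 1 minus root-mean-squared-error definition.

```python
import math

BIN_EDGES = [2.0, 3.0, 4.0]  # bins: B < 2, 2 <= B < 3, 3 <= B < 4, B >= 4

def blur_bin(blur):
    """Index of the blur bin that contains the given blur amount."""
    return sum(blur >= edge for edge in BIN_EDGES)

def reliability_table(samples):
    """samples: iterable of (blur, pattern, estimate, teacher), one per evaluated region.

    Returns {(bin_index, pattern): T} with T = 1 - sqrt(MSE) for that bin.
    Bins that received no samples are left out of the result.
    """
    sums, counts = {}, {}
    for blur, pattern, estimate, teacher in samples:
        key = (blur_bin(blur), pattern)
        sums[key] = sums.get(key, 0.0) + (estimate - teacher) ** 2
        counts[key] = counts.get(key, 0) + 1
    return {key: 1.0 - math.sqrt(sums[key] / counts[key]) for key in sums}

# Invented confirmation results: (blur at region centre, pattern, estimate, teacher).
samples = [
    (1.2, "3x3", 0.9, 1.0), (1.5, "3x3", 0.7, 0.8),
    (1.2, "15x15", 0.5, 0.8), (4.6, "3x3", 0.3, 0.7),
    (4.6, "15x15", 0.65, 0.7),
]
table = reliability_table(samples)
```

In this made-up data the small window is more reliable in the sharp bin and the large window in the most blurred bin, which is the kind of pattern the table is meant to capture.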

In this embodiment, the obtained table is stored in the setter storage unit 5300 as the area setter. Alternatively, using the values of the reliability T(B, q) as teacher information, the evaluation unit 2400 may generate, for each area setting pattern q, a regression function g_q(B) that outputs the reliability T for a blur amount B as a regression value, and use these functions as the area setter.

このように得られた混合状態推定器及び領域設定器を用いて、入力画像の混合状態を推定する処理について、図２（Ｅ）のフローチャートを用いて説明する。ステップＳ１１００において画像取得部１１００は、撮像装置により得られた画像データと撮影情報とを取得する。 Processing for estimating the mixed state of the input image using the mixed state estimator and region setter thus obtained will be described with reference to the flowchart of FIG. 2(E). In step S1100, the image acquisition unit 1100 acquires image data and shooting information obtained by the imaging device.

In step S1400, the area setting unit 1400 reads the area setter from the setter storage unit 5300 and determines the area setting pattern to be used according to the shooting information. For example, for each region i of the input image I, the area setting unit 1400 can select the area setting pattern q_win that maximizes the reliability T obtained from the blur amount B(i) given as shooting information:

$$q_{\mathrm{win}} = \operatorname*{arg\,max}_{q} \; T(B(i), q) .$$

The blur amount B(i) refers to the blur amount at the center position of region i of the input image I. The specific processing is not particularly limited. For example, the input image I can be divided into a plurality of regions according to one area setting pattern, and a region can be subdivided according to another area setting pattern when doing so gives higher reliability. As another example, regions with similar blur amounts can be connected, and each connected region can be divided using the area setting pattern that corresponds to its blur amount.
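Selecting the pattern with the highest reliability for a given blur amount amounts to a table lookup followed by an argmax. In the sketch below the table values are invented for illustration, and `select_pattern`, `PATTERNS`, and `RELIABILITY` are hypothetical names.

```python
BIN_EDGES = [2.0, 3.0, 4.0]  # same four blur bins as in the example above

# Hypothetical reliability table T(B, q): one row per blur bin, one column per pattern.
PATTERNS = ["3x3", "9x9", "15x15"]
RELIABILITY = [
    [0.92, 0.88, 0.80],   # B < 2
    [0.85, 0.90, 0.84],   # 2 <= B < 3
    [0.78, 0.86, 0.87],   # 3 <= B < 4
    [0.70, 0.80, 0.89],   # B >= 4
]

def select_pattern(blur):
    """Return the pattern q_win with the highest reliability for this blur amount."""
    bin_index = sum(blur >= edge for edge in BIN_EDGES)
    row = RELIABILITY[bin_index]
    best = max(range(len(PATTERNS)), key=lambda q: row[q])
    return PATTERNS[best]

choices = [select_pattern(b) for b in (0.5, 2.4, 3.1, 6.0)]
```

If the regression functions g_q(B) were used as the area setter instead of a table, the lookup of `row` would be replaced by evaluating each g_q at the blur amount.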

In step S1200, the estimation unit 1200 reads the estimators from the estimator storage unit 5200 and estimates the mixed state at each position of the input image. Specifically, the estimation unit 1200 extracts the feature amount of the image of the predetermined region set at each position p and inputs the extracted feature amount to an estimator, thereby estimating the mixed state at position p. Here, the predetermined region for each position p is set according to the area setting pattern q_win determined in step S1400. As described above, in this embodiment an estimator is generated for each of the plurality of area setting patterns. The estimation unit 1200 can therefore use the estimator selected from the plurality of estimators according to the area setting pattern determined in step S1400. For example, y_{q_win} is selected as the estimator at position p, and the estimate of the mixed state of the predetermined region at position p is obtained as y_{q_win}(X_i). FIG. 3(D) shows an example in which the area setting method varies with the position on the image; each rectangle represents one region input to an estimator.

ステップＳ１３００に係る処理は実施形態１と同様であるため、説明は省略する。本実施形態のように、撮影情報を利用して混合状態を推定する識別単位となる領域の設定方法を変ることにより、より誤差の少ない混合状態の推定を行うことができる。 Since the processing related to step S1300 is the same as that of the first embodiment, description thereof is omitted. As in the present embodiment, by changing the method of setting areas that serve as identification units for estimating a mixed state using photographing information, it is possible to estimate a mixed state with less error.

[Embodiment 4]
In the first to third embodiments, the mixed state in a predetermined region serving as the identification unit was estimated. The fourth embodiment describes a method of obtaining a detailed region segmentation result by subdividing regions using the obtained mixed-state estimates. The basic configurations of the learning device and the image processing device are the same as in the first embodiment, and their description is omitted.

The processing during learning is described below with reference to the flowchart of FIG. 2(A). In step S2100, the data acquisition unit 2100 reads the learning images and the teacher information on the mixed state from the learning data storage unit 5100 as learning data.

In step S2200, the learning unit 2200 performs the same processing as in the third embodiment. That is, regions of various sizes are prepared as identification units. For example, rectangular regions of different sizes, such as 1×1, 3×3, 9×9, and 15×15, can be prepared according to the respective area setting patterns. The learning unit 2200 can then train the estimator corresponding to each region size using the teacher information on the mixed state obtained for that region size, as in the third embodiment. That is, if q is the index of the region size and Q is the total number of region sizes, learning yields Q estimators y_q (q = 1, ..., Q). As an example, each estimator y_q can estimate the mixed state according to a regression function f_q(X). The estimators y_q obtained by learning are written to the estimator storage unit 5200.

Next, the processing at the time of determination is described with reference to the flowchart of FIG. 2(F). In step S1100, the image acquisition unit 1100 acquires an input image. In step S1200, the estimation unit 1200 uses an estimator to estimate the mixed state in a predetermined region of the input image. Here, the estimation unit 1200 sets regions using a first area setting pattern among the plurality of area setting patterns. That is, the estimation unit 1200 determines the mixed state for a first target image whose size follows the first area setting pattern. In this embodiment, the largest of the Q region sizes is used as the identification unit. In the example above, 15×15 pixels is selected as the identification unit, and the estimator corresponding to 15×15 pixels is used.

そして、推定部１２００は、入力画像の第１の部分にある第１の対象画像について推定された混合状態を示す情報に従って、第１の部分の混合状態を再判定するか否かを判定する。例えば、推定部１２００は、混合状態の推定を行った所定領域について混合状態の再判定を行うか否かを判定する。例えば、推定部１２００は、クラス純度が閾値以上である領域については、このクラス推定結果を採用する。 Then, the estimation unit 1200 determines whether or not to redetermine the mixture state of the first portion according to the information indicating the mixture state estimated for the first target image in the first portion of the input image. For example, the estimating unit 1200 determines whether or not to re-determine the mixed state for a predetermined region in which the mixed state has been estimated. For example, the estimating unit 1200 adopts this class estimation result for regions where the class purity is equal to or greater than the threshold.

For a region whose class purity is lower than the threshold, on the other hand, the estimation unit 1200 re-determines the mixed state. In response to the decision to re-determine, the estimation unit 1200 outputs information indicating the mixed state of a second target image that lies in the first portion and whose size follows a second area setting pattern. Here, the second target image is smaller than the first target image. That is, the estimation unit 1200 re-divides a region whose class purity is lower than the threshold into smaller identification units and estimates the mixed state of each re-divided region with an estimator again. The estimation unit 1200 can, for example, re-divide using the region size one step smaller. As described above, in this embodiment an estimator is generated for each of the plurality of area setting patterns. The estimation unit 1200 can therefore use the estimator selected from the plurality of estimators according to the area setting pattern used for the re-division.

Here, class purity indicates the proportion of pixels in a region to which the same class label is assigned. For example, the class purity can be defined as high when the value of the area ratio r shown in the first embodiment is 0.8 or more, or 0.2 or less. When the map shown in FIG. 7 is used, the class purity can also be defined as high when p1 ≥ 0.9 and p2 ≤ 0.8.

このように、クラス純度が低い領域については細分化及び混合状態の再推定を行うことにより、詳細な領域分割を行うことができる。領域が細分化できなくなるか、すべての領域のクラス純度が閾値以上になると、処理はステップＳ１３００へと進むことができる。ステップＳ１３００における処理は実施形態１と同様であるため、説明は省略する。このようにして得られた詳細な領域分割結果は、領域別のトーンマッピング又はホワイトバランス調整等の高画質化処理に利用することができる。 In this way, by subdividing and re-estimating the mixed state for regions with low class purity, detailed region division can be performed. If the region can no longer be subdivided, or if the class purity of all regions is greater than or equal to the threshold, processing may proceed to step S1300. Since the processing in step S1300 is the same as that of the first embodiment, the description is omitted. Detailed region division results obtained in this way can be used for image quality enhancement processing such as tone mapping or white balance adjustment for each region.
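The coarse-to-fine procedure of this embodiment can be sketched as a recursion. Several things here are assumptions made for illustration: the sizes 9, 3, and 1 are chosen so that each block splits evenly (the publication does not say how a 15×15 region is re-divided), the learned estimators y_q are replaced by a stand-in that reads the true ratio from a label map, and the names `SIZES`, `estimate_ratio`, `is_pure`, and `subdivide` are hypothetical.

```python
SIZES = [9, 3, 1]  # assumed nested sizes; each one divides the previous evenly

def estimate_ratio(label_map, top, left, size):
    """Stand-in for the learned estimator y_q: the true sky ratio of the block."""
    total = sum(label_map[r][c]
                for r in range(top, top + size)
                for c in range(left, left + size))
    return total / (size * size)

def is_pure(ratio, high=0.8, low=0.2):
    """Class purity is treated as high when r >= 0.8 or r <= 0.2."""
    return ratio >= high or ratio <= low

def subdivide(label_map, top, left, level=0):
    """Return leaf regions as (top, left, size, ratio) tuples."""
    size = SIZES[level]
    ratio = estimate_ratio(label_map, top, left, size)
    if is_pure(ratio) or level == len(SIZES) - 1:
        return [(top, left, size, ratio)]
    child = SIZES[level + 1]
    leaves = []
    for r in range(top, top + size, child):
        for c in range(left, left + size, child):
            leaves.extend(subdivide(label_map, r, c, level + 1))
    return leaves

# Hypothetical 9 x 9 label map: columns 0-3 are sky (1), columns 4-8 are non-sky (0).
label_map = [[1 if col < 4 else 0 for col in range(9)] for _ in range(9)]
leaves = subdivide(label_map, 0, 0)
```

Only the blocks straddling the sky boundary are split down to single pixels; uniform blocks stop at 3×3, so the number of estimator calls stays well below one per pixel.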

[Embodiment 5]
In the fourth embodiment, a detailed region segmentation result was calculated by progressively subdividing the identification units, but the segmentation method is not limited to this. The fifth embodiment describes a method of obtaining a detailed region segmentation result by performing pixel-level class determination using the mixed-state estimates for each region.

FIG. 1(F) shows the basic configuration of the image processing device according to this embodiment. The functions of the image acquisition unit 1100 and the estimation unit 1200 are the same as in the first embodiment, and their description is omitted. The determination unit 1500 determines the attribute of each pixel of the target image. The determination unit 1500 decides the attribute of each pixel based on an evaluation value; the evaluation indicated by this value becomes higher as the similarity increases between the information indicating the mixed state calculated from the attributes of the pixels and the information indicating the mixed state obtained by the estimation unit 1200. In this embodiment, the determination unit 1500 estimates the class label of each pixel of the input image based on the mixed-state estimation result and the image information. The output unit 1300 outputs information indicating the class label estimated for each pixel of the input image.

The determination processing according to this embodiment is described in detail with reference to FIG. 2(G). The processing in steps S1100 and S1200 is the same as in the first embodiment, and its description is omitted. In step S1500, the determination unit 1500 estimates the class of each pixel of the input image using the mixed state of each region estimated in step S1200. For example, the class of each pixel can be estimated so that the mixed state derived from the estimated pixel classes is close to the mixed state estimated in step S1200. The estimation can additionally use the color information of each pixel, for example so that pixels belonging to the same class have similar colors, or so that pixels belonging to different classes do not have similar colors.
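The idea of choosing pixel classes so that their proportions reproduce the estimated mixed state can be illustrated with a greedy stand-in. This is not the iterative method of the publication: it simply ranks the pixels of one region by a hypothetical color score and labels the top fraction as sky. The names `label_region` and `blueness` and the pixel values are assumptions.

```python
def label_region(pixels, sky_ratio, sky_score):
    """Label the pixels of one region so that the sky fraction matches sky_ratio.

    pixels: list of RGB tuples.
    sky_score: function giving higher values to more sky-like colours
               (hypothetical; a real system might use a learned colour model).
    """
    n_sky = round(sky_ratio * len(pixels))
    ranked = sorted(range(len(pixels)),
                    key=lambda i: sky_score(pixels[i]), reverse=True)
    labels = ["non-sky"] * len(pixels)
    for i in ranked[:n_sky]:
        labels[i] = "sky"
    return labels

def blueness(rgb):
    """Crude sky score: how much blue exceeds the mean of red and green."""
    r, g, b = rgb
    return b - (r + g) / 2

# Four pixels of one region whose estimated sky ratio is 0.5.
region = [(110, 160, 230), (60, 70, 50), (140, 180, 240), (90, 80, 70)]
labels = label_region(region, 0.5, blueness)
```

Unlike an iterative formulation, this sketch ignores interactions between neighboring pixels; it only shows how the region-level ratio constrains the pixel-level labels.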

入力画像の各画素のクラスを推定する方法の一例として、ＣＲＦ（ＣｏｎｄｉｔｉｏｎａｌＲａｎｄｏｍＦｉｅｌｄ、条件付き確率場）のような繰り返し処理を利用する場合について以下説明する。ＣＲＦは、複数のノードからなるグラフに対して、対となるノード間の類似度によるｐａｉｒｗｉｓｅｐｏｔｅｎｔｉａｌと、各ノードの持つｕｎａｒｙｐｏｔｅｎｔｉａｌを考慮して、各ノードの状態を安定した状態まで逐次推移させていく方法である。画像の画素判別にＣＲＦを使う場合は、各ノードが画像の各画素に対応するＣＲＦモデルを使うことができる。 As one example of a method for estimating the class of each pixel of the input image, the use of an iterative process such as a CRF (Conditional Random Field) is described below. For a graph consisting of multiple nodes, a CRF considers the pairwise potential, which is based on the similarity between paired nodes, and the unary potential of each node, and successively transitions the state of each node until a stable state is reached. When a CRF is used for pixel-level classification of an image, a CRF model in which each node corresponds to one pixel of the image can be used.

入力画像Ｉ上の画素ｉのクラスラベルｃｉの条件付き確率は、下式で表すことができる。 The conditional probability of the class label c_i of pixel i on the input image I can be expressed by the following equation.

ここで、右辺第一項のψはｕｎａｒｙｐｏｔｅｎｔｉａｌ、右辺第二項のφはｐａｉｒｗｉｓｅｐｏｔｅｎｔｉａｌを表す。θ_ψ及びθ_φはそれぞれのパラメータであり、後述する学習処理において算出される。ε_ｉは、画素ｉの近傍画素の集合である。ｇ_ｉｊは画素ｉと画素ｊとの相互関係を表す関数であり、Ｚは正規化項である。判定部１５００は、このモデル式に従って各画素のクラスラベルを更新していくことによって、画像全体のポテンシャルが高い状態へと判定結果を収束させていく。 Here, ψ in the first term on the right-hand side is the unary potential, and φ in the second term is the pairwise potential. θ_ψ and θ_φ are their respective parameters, which are calculated in the learning process described later. ε_i is the set of pixels neighboring pixel i, g_ij is a function representing the relationship between pixel i and pixel j, and Z is a normalization term. The determination unit 1500 updates the class label of each pixel according to this model, so that the determination result converges to a state in which the potential of the entire image is high.
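
The equation image itself is not reproduced in this text. One form consistent with the symbols defined here, written as an assumption that follows the standard log-linear CRF and not as the patent's exact expression, is:

```latex
\log P(\mathbf{c} \mid I;\, \theta_\psi, \theta_\phi)
  = \sum_{i} \psi(c_i, I;\, \theta_\psi)
  + \sum_{i} \sum_{j \in \varepsilon_i} \phi\bigl(c_i, c_j, g_{ij}(I);\, \theta_\phi\bigr)
  - \log Z(I;\, \theta_\psi, \theta_\phi)
```

Under this reading, the first sum is the unary term, the second sum runs over each pixel's neighborhood ε_i and is the pairwise term, and Z normalizes the distribution over all label assignments c.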

ｐａｉｒｗｉｓｅｐｏｔｅｎｔｉａｌは下式で表すことができる。 The pairwise potential can be expressed by the following formula.

ここで、ｘ_ｉ及びｘ_ｊはそれぞれ画素ｉ及び画素ｊの色情報であり、ＲＧＢ値を持つ３次元ベクトルで表わされる。βはユーザの定義するハイパーパラメータであって、β＝１などと設定することができる。このように、ｐａｉｒｗｉｓｅｐｏｔｅｎｔｉａｌは、時刻ｔにおいて違うクラスに属する画素の色が類似する場合に評価が低くなるように設定することができる。 Here, x_i and x_j are the color information of pixel i and pixel j, respectively, each represented as a three-dimensional vector of RGB values. β is a user-defined hyperparameter and can be set to, for example, β = 1. In this way, the pairwise potential can be set so that the evaluation is low when pixels belonging to different classes at time t have similar colors.
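
The formula for the pairwise potential is not reproduced in this text. The following is a minimal Python sketch of one term with the stated property, assuming a contrast-sensitive form exp(-β‖x_i - x_j‖²) applied only when the two labels differ. The function name `pairwise_potential`, the weight `theta_phi`, and the scaling of RGB values to [0, 1] are illustrative assumptions, not taken from the patent.

```python
import math

def pairwise_potential(x_i, x_j, c_i, c_j, beta=1.0, theta_phi=1.0):
    """Assumed contrast-sensitive pairwise term (not the patent's exact formula).

    x_i, x_j: RGB colors as 3-element sequences (here scaled to [0, 1]).
    c_i, c_j: class labels currently assigned to the two pixels.
    """
    if c_i == c_j:
        return 0.0  # same class: no penalty
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    # Different classes: the more similar the colors, the lower the evaluation.
    return -theta_phi * math.exp(-beta * sq_dist)

sky = (0.40, 0.60, 0.90)
sky_like = (0.42, 0.61, 0.88)
leaf = (0.10, 0.50, 0.10)

similar_split = pairwise_potential(sky, sky_like, c_i=0, c_j=1)
distinct_split = pairwise_potential(sky, leaf, c_i=0, c_j=1)
```

With these values, splitting two nearly identical sky-colored pixels into different classes scores lower than splitting a sky-colored pixel from a leaf-colored one, which is the behavior the paragraph above describes.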

ｕｎａｒｙｐｏｔｅｎｔｉａｌは下記で表すことができる。 The unary potential can be expressed as follows.

ここでｙ_ｃ（Ｘ_ｉ）は、画素位置ｉにおけるクラスｃに関する混合状態推定値である。ｙ_ｃ（Ｘ_ｉ）は、画素位置ｉが含まれる所定領域について推定部１２００が推定した混合状態に基づいて算出することができ、例えば所定領域内における、クラスｃの面積比、エッジ画素率、又はクラスラベル配置パターン等でありうる。このように、ｕｎａｒｙｐｏｔｅｎｔｉａｌは、時刻ｔにおいて各画素の属性に基づいて計算される混合状態と、推定部１２００により得られた混合状態と、の類似度が大きいほど評価が高くなる。 Here, y_c(X_i) is the mixed-state estimate for class c at pixel position i. y_c(X_i) can be calculated based on the mixed state estimated by the estimation unit 1200 for the predetermined region containing pixel position i, and can be, for example, the area ratio of class c, the edge pixel ratio, or the class label arrangement pattern within that region. In this way, the unary potential is evaluated higher as the similarity increases between the mixed state calculated from the attributes of the pixels at time t and the mixed state obtained by the estimation unit 1200.

Ｌ^ｉ _ｃ（ｔ）は、ＣＲＦに従って画素単位のクラスラベルが遷移していったときの、時刻ｔにおける、画素ｉが含まれる所定領域のクラスｃの混合状態である。Ｌ^ｉ _ｃ（ｔ）は、推定部１２００が推定する混合状態と同種の情報であり、推定部１２００が時刻ｔにおいて所定領域内の各画素について推定されているクラスを参照して算出できる。実施形態１で説明した混合状態の例に従って、以下に具体的な例を挙げる。例えば、遷移途中の時刻ｔにおいて、画素ｉが含まれる所定領域内におけるクラスラベルｃが割り当てられている画素をカウントすることにより、クラスｃの面積比ｒ（ｔ）を求めることができる。また、所定領域内におけるクラスラベルの配置に従ってエッジ画素を抽出してカウントすることにより、エッジ画素率ｅ（ｔ）を求めることができる。さらに、所定領域内におけるクラスラベルの配置を、図７に示されるマップのどれに最も近いかを判定することにより、クラスラベル配置パターンｐ（ｔ）を求めることができる。実施形態１で説明したように、Ｌ^ｉ _ｃ（ｔ）は、これら時刻ｔにおける混合状態の組み合わせによって表すこともできる。 L^i_c(t) is the mixed state of class c in the predetermined region containing pixel i at time t, as the per-pixel class labels transition according to the CRF. L^i_c(t) is the same kind of information as the mixed state estimated by the estimation unit 1200, and the estimation unit 1200 can calculate it by referring to the class estimated for each pixel in the predetermined region at time t. Specific examples follow, based on the examples of the mixed state described in Embodiment 1. For example, the area ratio r(t) of class c can be obtained by counting, at time t during the transition, the pixels assigned class label c in the predetermined region containing pixel i. The edge pixel ratio e(t) can be obtained by extracting and counting edge pixels according to the arrangement of class labels in the predetermined region. Further, the class label arrangement pattern p(t) can be obtained by determining which of the maps shown in FIG. 7 the class label arrangement in the predetermined region is closest to. As described in Embodiment 1, L^i_c(t) can also be expressed by a combination of these mixed states at time t.
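
As a rough illustration of how r(t) and e(t) could be computed from the current label map of one region, the sketch below counts labeled pixels and label boundaries in plain Python. The edge definition used here (a pixel is an edge pixel if its right or lower neighbor carries a different label) is an assumption made for the example; the definition from Embodiment 1 is not reproduced in this section. The names `area_ratio` and `edge_pixel_ratio` are hypothetical.

```python
def area_ratio(labels, c):
    """r(t): fraction of pixels in the region currently assigned class c."""
    total = sum(len(row) for row in labels)
    hits = sum(1 for row in labels for v in row if v == c)
    return hits / total

def edge_pixel_ratio(labels):
    """e(t): fraction of pixels whose right or lower neighbor has another label.

    This 2-neighbor edge test is an assumed definition for illustration.
    """
    h, w = len(labels), len(labels[0])
    edges = 0
    for y in range(h):
        for x in range(w):
            right = x + 1 < w and labels[y][x + 1] != labels[y][x]
            down = y + 1 < h and labels[y + 1][x] != labels[y][x]
            if right or down:
                edges += 1
    return edges / (h * w)

# Two 2x4 regions with the same area ratio but different contour complexity.
simple_region = [[0, 0, 1, 1],
                 [0, 0, 1, 1]]
complex_region = [[0, 1, 0, 1],
                  [1, 0, 1, 0]]
```

Both regions give an area ratio of 0.5 for class 1, while the edge pixel ratio is much higher for the interleaved region, so the two measures carry different information about the same label map.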

このように、推移中の時刻ｔにおける画素レベルでのクラスラベル配置に基づいて決定された時刻ｔにおける所定領域内の混合状態と、所定領域についての混合状態推定値と、の類似度を、ｕｎａｒｙｐｏｔｅｎｔｉａｌとして表現することができる。具体的には、時刻ｔにおける所定領域内の混合状態と、所定領域についての混合状態推定値と、の類似度が大きいほど評価が高くなるように、ｕｎａｒｙｐｏｔｅｎｔｉａｌを表現することができる。 In this way, the unary potential can express the similarity between the mixed state in the predetermined region at time t, which is determined from the pixel-level class label arrangement at time t during the transition, and the mixed-state estimate for the predetermined region. Specifically, the unary potential can be expressed so that the evaluation becomes higher as the similarity increases between the mixed state in the predetermined region at time t and the mixed-state estimate for that region.
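
The exact similarity measure is not given in this text. A minimal sketch, assuming the mixed state is a vector such as (area ratio, edge pixel ratio) and that similarity is scored as a negative squared distance weighted by a hypothetical parameter `theta_psi`, could look like this:

```python
def unary_potential(current_state, estimated_state, theta_psi=1.0):
    """Assumed unary term: higher when L^i_c(t) is closer to y_c(X_i).

    current_state:   mixed state computed from the labels at time t.
    estimated_state: mixed state estimated for the same region.
    Both are sequences of equal length, e.g. (area ratio, edge pixel ratio).
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(current_state, estimated_state))
    return -theta_psi * sq_dist

estimated = (0.5, 0.25)        # hypothetical estimate for one region
far_labels = (1.0, 0.0)        # region labeled entirely as class c, no edges
near_labels = (0.5, 0.375)     # right area ratio, slightly too many edges

score_far = unary_potential(far_labels, estimated)
score_near = unary_potential(near_labels, estimated)
```

A labeling whose region statistics match the estimate exactly scores 0, the maximum under this assumed form, and `score_near` is higher than `score_far`.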

本実施形態における学習処理について、図２（Ａ）に従って説明する。ステップＳ２１００でデータ取得部２１００は、学習画像と教師データとを取得する。ステップＳ２２００で学習部２２００は、実施形態１と同様に推定器の学習を行う。また、学習部２２００は、入力画像の各画素のクラスを推定する際に用いるパラメータ（例えば上述のθ_ψ及びθ_φ）の値を決定する。学習部２２００は、複数の学習画像と、学習画像の各画素のクラスを示すクラスラベルのデータを用いて、この処理を行うことができる。クラスラベルのデータとしては、例えば、実施形態２に従って図９（Ｃ）のように作成されたものを用いることができる。本実施形態において学習部２２００は、全学習画像に対するポテンシャルが最大となるようにθ_ψ及びθ_φの値を算出することができる。すなわち、下式をそれぞれ最大化するθ_ψとθ_φの値を、勾配法等によって求めることができる。 The learning process in this embodiment is described with reference to FIG. 2A. In step S2100, the data acquisition unit 2100 acquires learning images and teacher data. In step S2200, the learning unit 2200 trains the estimator as in Embodiment 1. The learning unit 2200 also determines the values of the parameters (for example, θ_ψ and θ_φ described above) used when estimating the class of each pixel of the input image. The learning unit 2200 can perform this process using a plurality of learning images and class label data indicating the class of each pixel in the learning images. As the class label data, for example, data created as shown in FIG. 9C according to Embodiment 2 can be used. In this embodiment, the learning unit 2200 can calculate the values of θ_ψ and θ_φ so that the potential over all learning images is maximized. That is, the values of θ_ψ and θ_φ that respectively maximize the following expressions can be obtained by a gradient method or the like.
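
The expressions to be maximized are not reproduced in this text. Because each parameter is said to maximize its own expression, one reading, offered only as an assumption, is a piecewise objective in which each potential is summed over all N learning images with their ground-truth labels c^(n):

```latex
\theta_\psi^{*} = \arg\max_{\theta_\psi}
  \sum_{n=1}^{N} \sum_{i} \psi\bigl(c_i^{(n)}, I^{(n)};\, \theta_\psi\bigr),
\qquad
\theta_\phi^{*} = \arg\max_{\theta_\phi}
  \sum_{n=1}^{N} \sum_{i} \sum_{j \in \varepsilon_i}
  \phi\bigl(c_i^{(n)}, c_j^{(n)}, g_{ij}(I^{(n)});\, \theta_\phi\bigr)
```

The index n and the superscript (n) are notation introduced here for illustration; the symbols ψ, φ, ε_i, and g_ij are those defined for the CRF model.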

学習部２２００は、得られたパラメータを、推定器とともに推定器記憶部５２００に格納する。本実施形態ではθ_ψ及びθ_φの値が推定器記憶部５２００に記憶され、判定部１５００によって上述のように用いられる。こうして得られた画素ごとのクラスラベルのデータは、実施形態４と同様にして、領域ごとに高画質化処理を行う場合等に利用することができる。 The learning unit 2200 stores the obtained parameters in the estimator storage unit 5200 together with the estimator. In this embodiment, the values of θ_ψ and θ_φ are stored in the estimator storage unit 5200 and used by the determination unit 1500 as described above. The per-pixel class label data obtained in this way can be used, as in Embodiment 4, for purposes such as performing image quality enhancement for each region.

混合状態の推定結果を利用して画素単位のクラス判定を行う方法は、上記の方法には限定されない。例えば、実施形態２と同様に、クラスが確定している領域を用いて求められた各クラスの混合ガウス分布と、上記の混合状態の類似度と、に基づいて、画素単位のクラス判定を行うこともできる。 The method of performing pixel-level class determination using the mixed-state estimation result is not limited to the method above. For example, as in Embodiment 2, pixel-level class determination can also be performed based on the Gaussian mixture distribution of each class, obtained using regions whose class has been determined, and on the mixed-state similarity described above.

本実施形態に係る処理は、混合状態として、面積比、エッジ画素率、及びクラスラベル配置パターンのいずれを用いても可能であるし、利用可能な混合状態がこれらに限られるわけでもない。また、複数の表現を組み合わせて表現された混合状態を用いることにより、判定精度を向上させることができる。例えば、面積比に加えてエッジ画素率を用いることにより、建物と空との境界のように輪郭が単純な場合と、枝と空との境界のように輪郭が複雑な場合と、を区別することが可能となる。 The processing according to this embodiment can use any of the area ratio, the edge pixel ratio, and the class label arrangement pattern as the mixed state, and the usable mixed states are not limited to these. Using a mixed state expressed as a combination of several representations can also improve the determination accuracy. For example, using the edge pixel ratio in addition to the area ratio makes it possible to distinguish a case where the contour is simple, such as the boundary between a building and the sky, from a case where the contour is complex, such as the boundary between branches and the sky.

本実施形態において、混合状態を示す情報は推定部１２００の処理により得られた。しかしながら、判定部１５００は、異なる方法により得られた混合状態を示す情報を取得し、同様の方法で各画素の属性を判定することもできる。 In this embodiment, the information indicating the mixed state is obtained by the processing of the estimation unit 1200. However, the determination unit 1500 can also acquire information indicating the mixed state obtained by a different method and determine the attribute of each pixel in the same way.

（その他の実施例） (Other examples)
本発明は、上述の実施形態の１以上の機能を実現するプログラムを、ネットワーク又は記憶媒体を介してシステム又は装置に供給し、そのシステム又は装置のコンピュータにおける１つ以上のプロセッサーがプログラムを読出し実行する処理でも実現可能である。また、１以上の機能を実現する回路（例えば、ＡＳＩＣ）によっても実現可能である。 The present invention can also be realized by a process in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

１１００：画像取得部、１２００：推定部、１３００：出力部、１４００：領域設定部、１５００：判定部、２１００：データ取得部、２２００：学習部、２３００：詳細化部 1100: image acquisition unit, 1200: estimation unit, 1300: output unit, 1400: area setting unit, 1500: determination unit, 2100: data acquisition unit, 2200: learning unit, 2300: detailing unit

Claims

1. An image processing apparatus comprising:
acquisition means for acquiring an input image;
extracting means for extracting features from the input image; and
output means for outputting information corresponding to the area of a region belonging to a specific class in the input image from an estimator to which the features extracted from the input image are input,
wherein the estimator has parameters learned using learning images.

2. The image processing apparatus according to claim 1, wherein said output means outputs information specifying the ratio estimated by said estimator.

3. The image processing apparatus according to claim 1, further comprising control means for controlling imaging by imaging means based on the result of estimation by said estimator.

4. The image processing apparatus according to claim 3, wherein said control means controls said imaging by said imaging means with respect to exposure, focus, or white balance based on the result of estimation by said estimator.

5. The ratio estimated by the estimator is a ratio of (1) the number of pixels in the region belonging to the specific class to (2) the total number of pixels in the input image. The image processing apparatus according to .

6. The image processing apparatus according to claim 1, wherein said estimator estimates a ratio of regions belonging to a sky class, a plant class, or a skin class to said input image.

7. An image processing method comprising:
obtaining an input image;
extracting features from the input image; and
outputting information corresponding to the area of a region belonging to a specific class in the input image from an estimator to which the features extracted from the input image are input,
wherein the estimator has parameters learned using training images.

8. A program for operating a computer as the image processing apparatus according to any one of claims 1 to 6.

9. The image processing apparatus according to any one of claims 1 to 6, wherein said input image is a divided image obtained by dividing an entire image.

10. The image processing apparatus according to any one of claims 1 to 6, wherein said input image has a plurality of areas belonging to each class.

11. The image processing apparatus according to any one of claims 1 to 6,
wherein the estimator has parameters learned based on (a) features extracted from a learning image and (b) teacher information representing a quantity corresponding to the area of a region belonging to a specific class in the learning image, and
wherein said teacher information represents a numerical value corresponding to the total area of a plurality of regions belonging to said specific class in said learning image.

12. The image processing apparatus according to any one of claims 1 to 6, wherein the estimator is trained so as to output, for an input image having a plurality of regions belonging to the specific class, a numerical value corresponding to the total area of the plurality of regions belonging to the specific class in the input image.

13. The image processing apparatus according to any one of claims 1 to 6, wherein the information corresponding to the area of the region belonging to the specific class in the input image is a ratio of the area of the region belonging to the specific class in the input image to the area of the input image.

14. A learning device comprising:
first acquisition means for acquiring training images for training an estimator;
extracting means for extracting features from the training images;
second acquisition means for acquiring, as teacher information, information corresponding to the area of a region belonging to a specific class in the training images; and
learning means for training the estimator using a combination of the features and the teacher information.

15. A learning method comprising:
a step of obtaining training images for training an estimator;
a step of extracting features from the training images;
a step of acquiring, as teacher information, information corresponding to the area of a region belonging to a specific class in the training images; and
a step of training the estimator using a combination of the features and the teacher information.

16. An image processing apparatus comprising:
acquisition means for acquiring divided images obtained by dividing an input image;
extraction means for extracting features from the divided images; and
output means for outputting information corresponding to the area ratio belonging to a specific class in the divided image from an estimator to which the features extracted from the divided image are input,
wherein the estimator has parameters learned using learning images.

17. The image processing apparatus according to claim 16, further comprising control means for controlling imaging by imaging means based on the result of estimation by said estimator.

18. The image processing apparatus according to claim 17, wherein said control means controls said imaging by said imaging means with respect to exposure, focus, or white balance based on the result of estimation by said estimator.

19. An image processing method comprising:
an obtaining step of obtaining divided images obtained by dividing an input image;
an extraction step of extracting features from the divided images; and
an output step of outputting information corresponding to the area ratio belonging to a specific class in the divided image from an estimator to which the features extracted from the divided image are input,
wherein the estimator has parameters learned using training images.

20. A program for operating a computer as the image processing apparatus according to any one of claims 16 to 18.