JP2018097807A

JP2018097807A - Learning device

Info

Publication number: JP2018097807A
Application number: JP2016244688A
Authority: JP
Inventors: Yusuke Sekikawa; Kosuke Hara; Koichiro Suzuki
Original assignee: Denso IT Laboratory Inc
Current assignee: Denso IT Laboratory Inc
Priority date: 2016-12-16
Filing date: 2016-12-16
Publication date: 2018-06-21

Abstract

PROBLEM TO BE SOLVED: To provide a technique for efficiently performing labeling by semi-supervised learning.
SOLUTION: A learning device 1 includes: an input unit 10 for inputting a plurality of labeled images L and a plurality of unlabeled images U; a CNN processing unit 11 for generating a plurality of feature maps by subjecting the images to CNN processing; an evaluation value calculation unit 12 that sums the entropy obtained for each pixel of the feature maps generated by the CNN processing unit 11, further sums, for the feature maps generated from the labeled images L, the cross entropy with the correct label attached to each pixel, subtracts the cross entropy from the entropy for the plurality of labeled images and the plurality of unlabeled images, and calculates an evaluation value by summing the obtained values; and a learning unit 13 that learns a parameter Q used in the CNN processing so as to minimize the evaluation value.
SELECTED DRAWING: Figure 1

Description

The present invention relates to a technique for labeling unknown images, and more particularly to a technique for learning the parameters used to classify unknown images for labeling.

Semantic segmentation, which assigns a label to every pixel of an image, is an important technique in applications such as automated driving. Achieving high performance generally requires a large amount of labeled data. Because semantic segmentation needs a label for every pixel, the labeling cost per image is particularly high, and ways to reduce that effort are needed. Weakly supervised learning (Non-Patent Documents 1 and 2), micro-annotation (Non-Patent Document 3), and similar methods have been proposed to reduce labeling cost.

Weakly supervised learning uses only a small amount of data labeled pixel by pixel together with image-level labels of the objects appearing in each image (which state, for the image as a whole, whether each object is present or absent) and are comparatively cheap to produce. Its advantage is that labels do not have to be assigned pixel by pixel. Micro-annotation labels only whether each segmentation proposal output by a trained model is correct.

Non-Patent Document 1: Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network (CVPR 2016)
Non-Patent Document 2: Decoupled Deep Neural Network for Semi-supervised Semantic Segmentation (NIPS 2015)
Non-Patent Document 3: Improving Weakly-Supervised Object Localization By Micro-Annotation (BMVC 2016)

However, no method has been proposed for semantic segmentation that exploits unlabeled data through semi-supervised learning without relying on weakly supervised data or micro-annotation information as described above. The present invention proposes a technique for labeling efficiently by semi-supervised learning.

The learning device of the present invention includes: an input unit that inputs a plurality of labeled images and a plurality of unlabeled images; a CNN processing unit that generates a plurality of feature maps by applying CNN processing to each image; an evaluation value calculation unit that sums the entropy obtained for each pixel of the feature maps generated by the CNN processing unit, further sums, for the feature maps generated from the labeled images, the cross entropy with the correct label attached to each pixel, subtracts the cross entropy from the entropy for the plurality of labeled images and the plurality of unlabeled images, and calculates an evaluation value by summing the obtained values; and a learning unit that learns the parameters used in the CNN processing so as to minimize the evaluation value.

By finding parameters that reduce the entropy of the feature maps while increasing the cross entropy with the correct labels of the labeled images, the accuracy of the parameters is supplemented from the viewpoint of classifying the unlabeled images well, even when labeled images are not abundant. Parameters that appropriately classify unknown images can thus be obtained.

In the learning device of the present invention, instead of obtaining the entropy for each pixel, the evaluation value calculation unit may take a predetermined region as the unit and calculate the entropy based on the average value of the pixels within that region. Alternatively, the evaluation value calculation unit may take a predetermined region as the unit and calculate the entropy based on a weighted sum of the pixels within that region.

In an image, a region made up of several neighboring pixels generally shares the same label. Performing the processing based on the average value or weighted sum of the pixels within a predetermined region therefore allows the parameters to be learned appropriately.

In the learning device of the present invention, instead of obtaining the entropy for each pixel, the evaluation value calculation unit may calculate the entropy based on a representative value of each superpixel.

Performing the processing with superpixels, each of which is a group of pixels having similar features, allows the parameters to be learned appropriately.

The learning device of the present invention may include a GAN (Generative Adversarial Networks) device having a generator that generates a new image from arbitrary information and a discriminator that identifies whether the new image and a predetermined image are the same image, the GAN device training the generator so that the discriminator identifies the new image as being the same as the predetermined image. Training may be performed so that the GAN device generates the predetermined image from a semantic image, in which each pixel of the predetermined image has been labeled using the parameters learned by the learning unit, and the resulting error of the semantic image may be back-propagated to learn the parameters.

Updating the parameters so that the image restored with the semantic image as a condition becomes similar to the original predetermined image further improves the accuracy of the parameters.

The learning device of the present invention may include a cross-entropy calculation unit that obtains the cross entropy between the plurality of feature maps obtained by processing the predetermined image in a CNN processing unit and the plurality of feature maps obtained by processing, in the CNN processing unit, the new image generated from the semantic image. The generator may also use this cross-entropy information when generating the new image.

The discriminator of a GAN determines whether two images match by measuring the distance between the manifold spanned by the predetermined images and the new image, so discrimination alone cannot make the new image correspond to the predetermined image pixel by pixel. With the configuration of the present invention, using the cross-entropy information between the new image generated from the semantic image and the predetermined image makes it possible to generate an image that is semantically close to the predetermined image.

The learning method of the present invention is a method by which a learning device learns parameters for classifying unlabeled images based on a plurality of labeled images and a plurality of unlabeled images. The method includes: a step in which the learning device inputs a plurality of labeled images and a plurality of unlabeled images; a step in which the learning device generates a plurality of feature maps by applying CNN processing to each image; a step in which the learning device sums the entropy obtained for each pixel of the generated feature maps, further sums, for the feature maps generated from the labeled images, the cross entropy with the correct label attached to each pixel, subtracts the cross entropy from the entropy for the plurality of labeled images and the plurality of unlabeled images, and calculates an evaluation value by summing the obtained values; and a step in which the learning device learns the parameters used in the CNN processing so as to minimize the evaluation value.

The program of the present invention is a program for learning parameters for classifying unlabeled images using a plurality of labeled images and a plurality of unlabeled images. The program causes a computer to execute: a step of generating a plurality of feature maps by applying CNN processing to each image; a step of summing the entropy obtained for each pixel of the generated feature maps, further summing, for the feature maps generated from the labeled images, the cross entropy with the correct label attached to each pixel, subtracting the cross entropy from the entropy for the plurality of labeled images and the plurality of unlabeled images, and calculating an evaluation value by summing the obtained values; and a step of learning the parameters used in the CNN processing so as to minimize the evaluation value.

According to the present invention, parameters are found that reduce the entropy of the feature maps of the input images while increasing the cross entropy with the correct labels of the labeled images. Even when labeled images are not abundant, the accuracy of the parameters is supplemented from the viewpoint of classifying the unlabeled images well, so parameters that appropriately classify unknown images can be obtained.

FIG. 1 is a diagram showing the configuration of the learning device of the first embodiment.
FIG. 2 is a diagram showing the operation of the learning device of the first embodiment.
FIG. 3 is a diagram showing the configuration of the learning device of the second embodiment.
FIG. 4 is a diagram showing the configuration of the GAN device used in the second embodiment.
FIG. 5 is a diagram showing the operation of the learning device of the second embodiment.

A learning device according to embodiments of the present invention is described below. The learning device of the present embodiment learns the parameters of a convolutional neural network for performing semantic segmentation on images (the values of the elements of each kernel in the convolution layers, the connection weights of each unit in the fully connected layers, biases, and so on).

FIG. 1 is a diagram showing the configuration of the learning device 1. The learning device 1 is connected to a database 14 that stores a large amount of image data serving as training data. Each image is either a labeled image L, to which a label indicating what the image shows has been assigned, or an unlabeled image U, to which no label has been assigned.

Because labeling images is laborious, not all images are labeled. Labeled images L are far fewer than unlabeled images U; for example, there may be 10,000 unlabeled images U but only 1,000 labeled images L. The learning device 1 of the present embodiment performs semi-supervised learning using the labeled images L and the unlabeled images U, and learns parameters that allow unknown unlabeled images to be labeled automatically.

The learning device 1 has an input unit 10 that reads images from the database 14 and inputs them, and a CNN processing unit 11 that applies convolutional neural network (CNN) processing to the input images. The CNN processing unit 11 applies CNN processing to each image and generates a plurality of feature maps. The number of feature maps generated equals the number of classes into which the image is to be classified. The CNN processing unit 11 generates feature maps for both the labeled images and the unlabeled images input by the input unit 10.

The learning device 1 also has an evaluation value calculation unit 12 that calculates, from the generated feature maps, an evaluation value used for updating the parameters, and a learning unit 13 that updates the parameters so as to minimize the evaluation value. The evaluation value calculation unit 12 calculates the entropy of the feature maps of all input images and, for the labeled images, also calculates the cross entropy between the feature maps and the correct labels. The evaluation value is the entropy of all images minus the cross entropy. The evaluation value calculation unit 12 calculates the entropy pixel by pixel.
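The evaluation value described above can be sketched as follows. This is a minimal illustration rather than the patented implementation. It assumes that the feature maps are turned into per-pixel class probabilities with a softmax, and that the text's "cross entropy" denotes the log-probability of the correct class (a non-positive quantity), which is the reading consistent with the statement that a larger cross entropy means a better match. The function names and the flattened data layout are hypothetical.

```python
import math

def softmax(scores):
    """Turn one pixel's per-class scores into class probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pixel_entropy(probs, eps=1e-12):
    """Shannon entropy of one pixel's class distribution."""
    return -sum(p * math.log(p + eps) for p in probs)

def pixel_log_likelihood(probs, label, eps=1e-12):
    """Assumed reading of the text's 'cross entropy': log q[label] (<= 0)."""
    return math.log(probs[label] + eps)

def evaluation_value(images):
    """images: list of (feature_maps, labels) pairs.

    feature_maps[k][i] is the score of class k at pixel i (pixels flattened);
    labels is a list of class indices, or None for an unlabeled image.
    """
    total = 0.0
    for feature_maps, labels in images:
        n_pixels = len(feature_maps[0])
        for i in range(n_pixels):
            probs = softmax([fm[i] for fm in feature_maps])
            total += pixel_entropy(probs)        # every image contributes entropy
            if labels is not None:               # labeled images only
                total -= pixel_log_likelihood(probs, labels[i])
    return total
```

Under these assumptions, a confident and correct prediction on a labeled pixel gives a value near zero, while an uncertain or wrong prediction increases it, so minimizing the value favors confident, correct outputs.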

The learning unit 13 updates the parameters used in the CNN processing unit 11 so as to minimize the evaluation value. The following formula expresses the parameter update performed by the learning unit 13.

In the above formula, Q denotes the parameters used in the CNN processing unit 11, and C denotes the correct labels given to the labeled images. L denotes the labeled images and U denotes the unlabeled images. The first term in the parentheses represents the cross entropy between the labeled images and the correct labels, and the second term represents the entropy of all images, both labeled and unlabeled.
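The formula itself is not reproduced in this text. One form that is consistent with the surrounding description is sketched below; it is a reconstruction under stated assumptions, not the formula as published. The notation q_Q(k | x, i) for the probability the CNN assigns to class k at pixel i of image x, and C_{x,i} for the correct label of that pixel, is introduced here for illustration. The "cross entropy" is again read as the log-probability of the correct class.

```latex
\hat{Q} = \arg\min_{Q} \Biggl(
  -\sum_{x \in L} \sum_{i} \log q_Q\bigl(C_{x,i} \mid x, i\bigr)
  \;-\; \sum_{x \in L \cup U} \sum_{i} \sum_{k}
        q_Q(k \mid x, i)\, \log q_Q(k \mid x, i)
\Biggr)
```

The first term is the negated cross entropy over the labeled images, and the second term is the entropy over all images, so the bracketed quantity equals "entropy minus cross entropy" as the evaluation value is described.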

Qualitatively, the first term is the cross entropy between the feature maps of the labeled images L and the correct labels, so the larger this cross entropy, the closer the feature maps generated by the CNN processing unit 11 are to the correct labels. The second term is the entropy of the feature maps of all images. The smaller this entropy, the more pronounced some feature is, even though the label itself is unknown. In other words, the smaller the entropy in the second term, the better the feature maps separate into classes.

By finding the parameters that minimize the evaluation value given by the above formula, the learning unit 13 obtains parameters that can classify unknown images and assign a label to each resulting class.

FIG. 2 is a flowchart showing the learning operation of the learning device 1. First, a large number of labeled and unlabeled images are read from the database 14 and input to the learning device 1 (S10). Next, the learning device 1 applies CNN processing to the input images to obtain the feature maps of each image (S12). It then obtains, pixel by pixel, the entropy of the feature maps and, for the feature maps generated from labeled images, the cross entropy with the correct labels. The evaluation value calculation unit 12 subtracts the cross entropy from the entropy to obtain the evaluation value (S14).

Next, the learning device 1 determines whether an end condition is satisfied (S16). The end condition may be, for example, that the evaluation value has fallen to or below a predetermined value, or that the number of parameter updates has reached a predetermined count. If the end condition is satisfied (YES in S16), the learning device 1 adopts the parameter values obtained at that point as the parameters of the CNN processing unit 11 (S18). If the end condition is not satisfied (NO in S16), the learning unit 13 updates the parameters in the direction that decreases the evaluation value (S20), and the process returns to the CNN processing step (S12). This concludes the description of the configuration and operation of the learning device 1 of the first embodiment.
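The loop of steps S12 to S20 can be sketched as below. This is a toy illustration under several assumptions that do not come from the text: a two-parameter logistic model on a single pixel intensity stands in for the CNN, the gradient is estimated by finite differences rather than back-propagation, and the learning rate, threshold, and update limit are arbitrary. As in the description, the evaluation value is entropy minus the log-probability of the correct class on labeled pixels.

```python
import math

def class_probs(q, x):
    """Toy stand-in for the CNN: two-class probabilities for intensity x, q = [w, b]."""
    z = q[0] * x + q[1]
    p1 = 1.0 / (1.0 + math.exp(-z))
    return [1.0 - p1, p1]

def evaluation_value(q, labeled, unlabeled, eps=1e-12):
    """Entropy over all pixels minus log-likelihood of the labeled pixels."""
    value = 0.0
    for x, c in labeled:
        p = class_probs(q, x)
        value -= sum(pk * math.log(pk + eps) for pk in p)
        value -= math.log(p[c] + eps)
    for x in unlabeled:
        p = class_probs(q, x)
        value -= sum(pk * math.log(pk + eps) for pk in p)
    return value

def train(labeled, unlabeled, lr=0.1, threshold=0.05, max_updates=2000, h=1e-5):
    q = [0.0, 0.0]
    updates = 0
    while True:
        value = evaluation_value(q, labeled, unlabeled)    # S12 and S14
        if value <= threshold or updates >= max_updates:   # S16: end condition
            return q, value, updates                       # S18: adopt parameters
        grad = []
        for j in range(len(q)):                            # S20: estimate the gradient
            q_plus = list(q)
            q_plus[j] += h
            q_minus = list(q)
            q_minus[j] -= h
            grad.append((evaluation_value(q_plus, labeled, unlabeled)
                         - evaluation_value(q_minus, labeled, unlabeled)) / (2 * h))
        q = [qj - lr * gj for qj, gj in zip(q, grad)]      # step toward a smaller value
        updates += 1
```

For example, `train([(-1.0, 0), (1.0, 1)], [-1.5, -0.8, 0.9, 1.2])` returns the learned parameters, the final evaluation value, and the number of updates performed. Both end conditions named in the text appear in the single check at S16.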

The learning device 1 of the present embodiment obtains parameters for classifying images by updating the parameters in the direction that maximizes the cross entropy between the feature maps of the labeled images and the correct labels. At the same time, it updates the parameters in the direction that minimizes the entropy of the feature maps of the unlabeled images, which yields parameters that make particular features pronounced. As a result, even when the number of labeled images is small, the information in the unlabeled images can be exploited to obtain parameters that classify images appropriately.

(Second Embodiment)
FIG. 3 is a diagram showing the configuration of the learning device 2 of the second embodiment. In addition to the configuration of the learning device 1 of the first embodiment, the learning device 2 of the second embodiment includes a GAN device 20. The learning device 2 differs from the first embodiment in that the discrimination result of the discriminator 23 in the GAN device 20 is fed back to the learning unit 13 for parameter learning.

FIG. 4 is a diagram showing the detailed configuration of the GAN device 20. The semantic segmentation unit 21 applies CNN processing to the input image, classifies the input image pixel by pixel, and generates a semantic image. This CNN processing uses the parameters learned by the learning unit 13. For convenience, the image input to the semantic segmentation unit 21 is referred to below as "image A".

An overview of the GAN device 20 follows. The GAN device 20 verifies whether the semantic image generated from the input image A is close to the correct answer. The closer the semantic image is to the correct answer, the closer the image A' restored from it is expected to be to the input image A. The GAN device 20 therefore calculates the error between the input image A and the restored image A' and back-propagates it to the learning unit 13, thereby assisting the learning performed by the learning unit 13.

The GAN device 20 has a generator 22 and a discriminator 23. The generator 22 generates a new image from arbitrary information, and the discriminator 23 identifies whether the generated new image and a predetermined image are the same image. The generator 22 tries to generate an image that the discriminator 23 judges to be the same image, that is, an image that fools the discriminator 23. The discriminator 23 feeds the result of discriminating between the generated image and the predetermined image back to the generator 22, and the generator 22 uses that information to generate an image still closer to the predetermined image and passes it to the discriminator 23. By repeating generation and discrimination in this way, the generator 22 learns to generate an image identical to the predetermined image. This is the principle of GAN (Generative Adversarial Networks).
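The adversarial principle can be illustrated with the loss functions commonly used for GANs. The text does not state which losses the generator 22 and discriminator 23 use, so the standard binary cross-entropy formulation below is an assumption. Here `d_real` and `d_fake` are hypothetical names for the discriminator's estimated probability that the predetermined image and the generated image, respectively, are genuine.

```python
import math

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Small when the discriminator accepts the real image and rejects the generated one."""
    return -(math.log(d_real + eps) + math.log(1.0 - d_fake + eps))

def generator_loss(d_fake, eps=1e-12):
    """Small when the discriminator is fooled into accepting the generated image."""
    return -math.log(d_fake + eps)
```

Each side lowers its own loss in turn. As the generator improves, `d_fake` rises, which lowers the generator loss and raises the discriminator loss. This corresponds to the repeated generation and discrimination described above.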

In the present embodiment, the generator 22 of the GAN device 20 generates a new image A' from the semantic image of image A produced by the semantic segmentation unit 21. The discriminator 23 then identifies whether the new image A' is the same as image A, from which the semantic image was derived. In other words, it determines how close to the original image A an image A' could be generated from the semantic image, and the result is fed back to the learning unit 13. If an image A' close to image A could be generated, the semantic image was probably correct. If only an image A' far from image A could be generated, the semantic image was far from the correct answer.

The GAN device 20 includes a CNN processing unit 24 that applies CNN processing to image A and image A', and a cross-entropy calculation unit 25 that calculates the cross entropy between the feature maps of image A and the feature maps of image A' obtained by the CNN processing unit 24. The cross-entropy calculation unit 25 inputs the calculated cross-entropy information to the generator 22. Because the generator 22 can then generate a new image A' based on the cross-entropy information between image A and image A', it can generate an image A' that is semantically close to image A.
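A possible form of the calculation in the cross-entropy calculation unit 25 is sketched below. It assumes that both sets of feature maps are converted to per-pixel class probabilities with a softmax and that the conventional definition of cross entropy is used, for which a smaller value means the two images are classified more alike. The text specifies neither the normalization nor the sign convention, and the function names are hypothetical.

```python
import math

def softmax(scores):
    """Turn one pixel's per-class scores into class probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def feature_map_cross_entropy(maps_a, maps_a_prime, eps=1e-12):
    """maps_x[k][i]: score of class k at pixel i (pixels flattened).

    Sums, over all pixels, the cross entropy between the class distribution
    of image A and that of the regenerated image A'.
    """
    total = 0.0
    for i in range(len(maps_a[0])):
        p = softmax([fm[i] for fm in maps_a])
        q = softmax([fm[i] for fm in maps_a_prime])
        total -= sum(pk * math.log(qk + eps) for pk, qk in zip(p, q))
    return total
```

Because the comparison is made pixel by pixel on class distributions, it supplies the per-pixel semantic correspondence that the discriminator's real-or-generated judgment alone does not.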

FIG. 5 is a flowchart showing the learning operation of the learning device 2 of the second embodiment. The operation up to determining the parameters of the CNN processing unit 11 from the evaluation value based on entropy and cross entropy is the same as in the learning device 1 of the first embodiment. In the second embodiment, however, the parameters obtained from the evaluation value are treated as provisional parameters (S18).

The learning device 2 of the second embodiment further updates the parameters by having the GAN device 20 verify the semantic image generated with the provisional parameters (S24). That is, the learning device 2 applies CNN processing with the provisional parameters to generate feature maps, classifies each pixel of the image, and generates a semantic image from the per-pixel classification. The generator 22 generates an image A' from the semantic image such that the discriminator 23 identifies image A' as being the same as the original image A. Based on the discrimination results from the discriminator 23, the generator 22 repeatedly generates image A' and brings it closer to image A. As described with reference to FIG. 4, the generator 22 also uses the cross-entropy information between the original image A and the restored image A' when generating the image. The GAN device 20 then determines whether the discrimination result between the generated image A' and the original image A satisfies an end condition (S26).

If the end condition is satisfied (YES in S26), the provisional parameters are adopted as the final parameters (S28). If the end condition is not satisfied (NO in S26), the discrimination error is back-propagated to the learning unit 13 to update the parameters, and the parameters are then updated again based on the evaluation value (S20). This concludes the description of the configuration and operation of the learning device 2 of the second embodiment.

As in the first embodiment, the learning device 2 of the second embodiment can exploit the information in unlabeled images to obtain parameters that classify images appropriately even when the number of labeled images is small. In addition, because the learning device 2 uses the GAN device 20 to verify the provisionally determined parameters, it can obtain the parameters with high accuracy.

(Modification)
The embodiments above describe an example in which the entropy is obtained for each pixel. Alternatively, a predetermined region may be taken as the unit, and the entropy may be calculated using the average value of the pixels within that region, or based on a weighted sum of the pixels within that region. In an image, a region made up of several neighboring pixels generally shares the same label, so performing the processing based on the average value or weighted sum of neighboring pixels allows the parameters to be learned appropriately.
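The region-based variant can be sketched as follows, assuming square, non-overlapping blocks and assuming that the quantity averaged within each block is the per-class probability. The text fixes neither the region shape nor the quantity that is averaged, and the function name is hypothetical. A weighted sum would replace the uniform average with normalized weights.

```python
import math

def block_mean_entropy(prob_maps, block, eps=1e-12):
    """prob_maps[k][r][c]: probability of class k at pixel (r, c).

    Averages the class probabilities inside each block-by-block region and
    returns the sum of the entropies of those averaged distributions.
    """
    rows, cols = len(prob_maps[0]), len(prob_maps[0][0])
    total = 0.0
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            cells = [(r, c)
                     for r in range(r0, min(r0 + block, rows))
                     for c in range(c0, min(c0 + block, cols))]
            mean = [sum(pm[r][c] for r, c in cells) / len(cells)
                    for pm in prob_maps]
            total -= sum(p * math.log(p + eps) for p in mean)
    return total
```

With this formulation, a block whose pixels disagree about the class has high entropy even if every individual pixel is confident, so minimizing it encourages consistent labels within a region.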

Instead of obtaining the entropy for each pixel, the entropy may also be calculated based on a representative value of each superpixel. Processing with superpixels, each of which is a group of pixels having similar features, allows the parameters to be learned appropriately and also reduces the cost of the entropy calculation. The representative value may be the average of all pixels in the superpixel, the maximum, or some other value.
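A minimal sketch of the superpixel variant follows. It uses the mean as the representative value and assumes that a superpixel assignment (one segment id per pixel) is already available from some segmentation algorithm, which the text does not name. The function name and the flattened layout are hypothetical.

```python
import math
from collections import defaultdict

def superpixel_entropy(pixel_probs, segment_ids, eps=1e-12):
    """pixel_probs[i]: class distribution at pixel i; segment_ids[i]: its superpixel id.

    Uses the mean distribution of each superpixel as its representative value
    and returns the sum of the entropies of those representatives.
    """
    sums = {}
    counts = defaultdict(int)
    for probs, seg in zip(pixel_probs, segment_ids):
        if seg not in sums:
            sums[seg] = [0.0] * len(probs)
        sums[seg] = [s + p for s, p in zip(sums[seg], probs)]
        counts[seg] += 1
    total = 0.0
    for seg, s in sums.items():
        mean = [v / counts[seg] for v in s]   # representative value: the mean
        total -= sum(p * math.log(p + eps) for p in mean)
    return total
```

Only one entropy term is evaluated per superpixel rather than per pixel, which is where the reduction in calculation cost mentioned above comes from.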

When entropy is calculated using superpixels to update the parameters, the parameters may be calculated using superpixels generated at different resolutions. That is, the CNN processing unit divides the labeled images and unlabeled images into superpixels at several different resolutions, and the evaluation value calculation unit and the learning unit perform learning using the resulting multiple patterns of superpixels.

Specifically, the CNN processing unit divides each image input from the input unit into superpixels at three resolutions, for example 10,000 segments, 1,000 segments, and 100 segments, and performs CNN processing on the image at each resolution to generate feature maps. Next, for the image at each resolution, the evaluation value calculation unit performs the learning described in the above embodiments to obtain parameters for classifying the superpixels. Because the label of each class can be determined from the information in the labeled images, the label parameters are weighted according to the content of each label, and the results obtained at the different resolutions are blended.

For example, small, thin objects such as poles and traffic signals tend to be buried in the background in low-resolution superpixels. Therefore, when the output label shows a high value for such objects, the label histogram obtained from high-resolution superpixels, in which they are less likely to be buried, is used with a high weight. Conversely, for things that are not easily buried, such as roads, the label histogram obtained from low-resolution superpixels is used with a high weight so that the result is more consistent with the surrounding labels. The blending factor for each label is obtained in advance by learning, for example from the rate at which the label changes when labeled data is segmented at various resolutions.
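
The per-label blending described above might look like the following sketch. The specification does not give a formula, so the weighted sum with renormalization used here is an assumption, as are the function name `blend_label_histograms`, the resolution names, and the numeric blending factors (which the specification says are learned in advance).

```python
def blend_label_histograms(histograms, blend_factors):
    """Blend label histograms from several resolutions, label by label.

    histograms[res][c]    : score of label c at resolution `res`.
    blend_factors[res][c] : weight given to resolution `res` for label c
                            (hypothetical values; learned in advance
                            according to the specification).
    Returns a blended histogram normalized to sum to 1.
    """
    num_labels = len(next(iter(histograms.values())))
    blended = [0.0] * num_labels
    for res, hist in histograms.items():
        for c in range(num_labels):
            blended[c] += blend_factors[res][c] * hist[c]
    total = sum(blended)
    return [b / total for b in blended]


# Labels: index 0 = road, index 1 = pole (illustrative only).
histograms = {
    "high_res": [0.4, 0.6],  # a thin pole survives at high resolution
    "low_res": [0.9, 0.1],   # the pole is buried in the road at low resolution
}
blend_factors = {
    "high_res": [0.2, 0.9],  # trust high resolution for the pole label
    "low_res": [0.8, 0.1],   # trust low resolution for the road label
}
blended = blend_label_histograms(histograms, blend_factors)
```

In this example the pole label keeps a larger share of the blended histogram than it would under a plain average of the two resolutions, which is the behavior the paragraph above aims for.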

By learning with superpixels divided at multiple resolutions in this way and blending the learning results label by label, things that are not easily buried, such as roads, can be made consistent with the surrounding labels, while appropriate parameters can be obtained with which small objects such as poles and traffic signals are not buried in their surroundings.

The present invention is applicable to techniques for classifying images and is particularly useful as, for example, a technique for labeling unknown images.

1, 2 Learning device
10 Input unit
11 CNN processing unit
12 Evaluation value calculation unit
13 Learning unit
20 GAN device
21 Semantic segmentation unit
22 Generator
23 Discriminator
24 CNN processing unit
25 Cross entropy calculation unit

Claims

A learning device comprising:
an input unit that inputs a plurality of labeled images and a plurality of unlabeled images;
a CNN processing unit that performs CNN processing on an image to generate a plurality of feature maps;
an evaluation value calculation unit that sums the entropy obtained for each pixel of the plurality of feature maps generated by the CNN processing unit, further sums, for the plurality of feature maps generated from the labeled images, the cross entropy with the correct label assigned to each pixel, performs a process of subtracting the cross entropy from the entropy for the plurality of labeled images and the plurality of unlabeled images, and adds the obtained values to calculate an evaluation value; and
a learning unit that learns the parameters used in the CNN processing so as to minimize the evaluation value.

The learning device according to claim 1, wherein, instead of obtaining entropy for each pixel, the evaluation value calculation unit calculates entropy per predetermined region based on the average value of the pixels in that region.

The learning device according to claim 1, wherein, instead of obtaining entropy for each pixel, the evaluation value calculation unit calculates entropy per predetermined region based on a weighted sum of the pixels in that region.

The learning device according to claim 1, wherein, instead of obtaining entropy for each pixel, the evaluation value calculation unit calculates entropy based on a representative value of a superpixel.

The learning device according to claim 1, further comprising a GAN (Generative Adversarial Networks) device that has a generator for generating a new image from arbitrary information and a discriminator for identifying whether the new image and a predetermined image are the same image, and that trains the generator so that the new image is identified by the discriminator as being the same as the predetermined image,
wherein, using the parameters learned by the learning unit, learning is performed so that the GAN device generates the predetermined image from a semantic image in which each pixel of the predetermined image is labeled, and the parameters are learned by back-propagating the error of the semantic image obtained as a result.

The learning device according to claim 5, further comprising a cross entropy calculation unit that calculates the cross entropy between a plurality of feature maps obtained by processing the predetermined image with the CNN processing unit and a plurality of feature maps obtained by processing, with the CNN processing unit, the new image generated from the semantic image,
wherein the generator generates the new image using the cross entropy information.

A learning method in which a learning device learns parameters for classifying unlabeled images based on a plurality of labeled images and a plurality of unlabeled images, the method comprising:
a step in which the learning device inputs a plurality of labeled images and a plurality of unlabeled images;
a step in which the learning device performs CNN processing on the images to generate a plurality of feature maps;
a step in which the learning device sums the entropy obtained for each pixel of the plurality of generated feature maps, further sums, for the plurality of feature maps generated from the labeled images, the cross entropy with the correct label attached to each pixel, performs the subtraction of the cross entropy from the entropy for the plurality of labeled images and the plurality of unlabeled images, and adds the obtained values to calculate an evaluation value; and
a step in which the learning device learns the parameters used in the CNN processing so as to minimize the evaluation value.

A program for learning parameters for classifying unlabeled images using a plurality of labeled images and a plurality of unlabeled images, the program executing:
a step of performing CNN processing on the images to generate a plurality of feature maps;
a step of summing the entropy obtained for each pixel of the plurality of generated feature maps, further summing, for the plurality of feature maps generated from the labeled images, the cross entropy with the correct label attached to each pixel, performing the subtraction of the cross entropy from the entropy for the plurality of labeled images and the plurality of unlabeled images, and adding the obtained values to calculate an evaluation value; and
a step of learning the parameters used in the CNN processing so as to minimize the evaluation value.