JP2021068141A - Region dividing device, dividing method, and dividing program

Info

Publication number: JP2021068141A
Application number: JP2019192523A
Authority: JP
Inventors: 智之吉山; Tomoyuki Yoshiyama
Original assignee: Secom Co Ltd
Current assignee: Secom Co Ltd
Priority date: 2019-10-23
Filing date: 2019-10-23
Publication date: 2021-04-30
Anticipated expiration: 2039-10-23
Also published as: JP7292178B2

Abstract

PROBLEM TO BE SOLVED: To solve the problem that, in region division in which data groups distributed in a space are classified into a plurality of classes and the space is divided into label regions, division results tend to vary due to factors such as the diversity of the learning data of the classifier.

SOLUTION: Model storage means 42 stores, as a classifier that receives a data group and an attention class group, performs classification processing, and outputs label regions for the attention class group, a trained model trained using a learning data group, its correct classes, and a learning attention class group given as a subset of the correct classes. Attention class setting means 54 sets the attention class group for the data group in a plurality of ways. For each of these settings, region dividing means 55 obtains label regions using the classifier and, among the resulting region division results of the space, selects as the region division result for the data group one that satisfies a predetermined condition on the degree of matching between the label regions composing the result and the space.

SELECTED DRAWING: Figure 6

Description

The present invention relates to a technique for classifying a data group such as an image into classes such as subjects and dividing the data group into label regions.

For purposes such as automatically recognizing the scene captured in an image, techniques have been researched and developed that divide an image into regions, one for each of the objects or parts captured in it, and recognize the object or part captured in each region. Hereinafter, an object or part captured in an image is referred to as a subject. Region division accompanied by recognition of subjects is called semantic segmentation or the like.

In particular, techniques that perform the above division and recognition based on learning have been actively studied in recent years. For example, Non-Patent Document 1 below describes preparing a large number of learning images in which each pixel of regions divided in advance by subject is assigned a class representing the subject, and having a computer perform machine learning on these learning images. The information assigned in advance is called an annotation or the like. When an arbitrary image is input to the trained model generated by this learning, a class is output for each pixel of the input image. That is, the input image is divided, subject by subject, into regions labeled with classes (label regions).

Further, in recent years, data sets consisting of learning images and annotations have been published and made available. In general, a trained model that has learned from more diverse data can perform more accurate region division, so a larger data set is desirable for learning.

Non-Patent Document 1: Jonathan Long, Evan Shelhamer, and Trevor Darrell, “Fully Convolutional Networks for Semantic Segmentation”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

However, the diversity of learning data and the mixing of annotations assigned under different criteria make region division results prone to fluctuation. The mixing of annotations assigned under different criteria has also been a cause of reduced learning accuracy.

For example, if an image of a black carpet and an image of similar-looking asphalt are both used for learning, a floor region covered with black carpet is sometimes correctly divided as a floor region, but part or all of it is sometimes erroneously divided as a road region. This is an example in which the diversity of learning makes the region division result prone to fluctuation.

Further, for example, when an image of a baseball stadium is input, the turf region in the image may be divided as a grass region in some cases and as part of a playing field region in others. This is an example in which the mixing of annotations assigned under different criteria makes the region division result prone to fluctuation. In publicly available data sets, different assignment criteria are sometimes mixed: in one learning image of a baseball stadium, the turf region is labeled "grass" and the dirt region is labeled "soil", while in another learning image of a baseball stadium, the combined region of turf and dirt is labeled "playing field". In other words, both grass and playing field are correct answers for the turf region. As a result, the output tends to fluctuate with differences between input images.

From another perspective, the existence of multiple correct answers, as in the turf region example, makes learning harder to converge. The mixing of annotations assigned under different criteria is therefore also a factor in reduced learning accuracy.

Note that the above problem can arise not only in two-dimensional images but also in spatiotemporal data formed from time-series images, three-dimensional data such as point clouds, and the like.

The present invention has been made in view of the above problem, and an object thereof is to provide a region division technique capable of suppressing fluctuation of region division results.

(1) A region dividing device according to the present invention performs classification processing that classifies a data group distributed in a predetermined space into a plurality of classes, and divides the space into label regions identified by the classes. The device includes: model storage means that stores, as a classifier that receives the data group and an attention class group, performs the classification processing on the data group, and outputs the label regions for the attention class group, a trained model trained using a learning data group, correct classes given in advance for the learning data group, and a learning attention class group given as a subset of the correct classes; attention class setting means that sets the attention class group for the data group in a plurality of ways; and region dividing means that, for each of the plurality of settings of the attention class group made by the attention class setting means, obtains the label regions using the classifier and, among the region division results of the space based on those label regions, selects as the region division result for the data group one that satisfies a predetermined condition on the degree of matching between the label regions constituting the region division result and the space.

(2) In the region dividing device according to (1) above, the attention class setting means may set the attention class group in a plurality of ways sequentially, by adding a supplementary class to the attention class group to set a new attention class group, and the region dividing means may select, as the region division result for the data group, the label regions output by the classifier for the plurality of attention class groups whose size is equal to or larger than a predetermined reference.

(3) In the region dividing device according to (2) above, the trained model may further be trained, as an estimator that receives the data group and the attention class group and estimates the supplementary class, using the correct supplementary class for the learning data group.

(4) A region dividing method according to the present invention performs classification processing that classifies a data group distributed in a space into a plurality of classes, and divides the space into label regions identified by the classes. The method includes: a step of preparing, as a classifier that receives the data group and an attention class group, performs the classification processing on the data group, and outputs the label regions for the attention class group, a trained model trained using a learning data group, correct classes given in advance for the learning data group, and a learning attention class group given as a subset of the correct classes; an attention class setting step of setting the attention class group for the data group in a plurality of ways; and a region dividing step of, for each of the plurality of settings of the attention class group made in the attention class setting step, obtaining the label regions using the classifier and, among the region division results of the space based on those label regions, selecting as the region division result for the data group one that satisfies a predetermined condition on the degree of matching between the label regions constituting the region division result and the space.

(5) A region dividing program according to the present invention causes a computer to perform classification processing that classifies a data group distributed in a space into a plurality of classes and to divide the space into label regions identified by the classes. The program causes the computer to function as: model storage means that stores, as a classifier that receives the data group and an attention class group, performs the classification processing on the data group, and outputs the label regions for the attention class group, a trained model trained using a learning data group, correct classes given in advance for the learning data group, and a learning attention class group given as a subset of the correct classes; attention class setting means that sets the attention class group for the data group in a plurality of ways; and region dividing means that, for each of the plurality of settings of the attention class group made by the attention class setting means, obtains the label regions using the classifier and, among the region division results of the space based on those label regions, selects as the region division result for the data group one that satisfies a predetermined condition on the degree of matching between the label regions constituting the region division result and the space.
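The sequential procedure of (1) and (2) can be sketched as a short loop. This is only an illustration under stated assumptions, not the patented implementation: the trained model is replaced by a hypothetical callable `model(image, attention)` returning a label map and a supplementary class (or `None`), the "degree of matching" is taken to be the fraction of pixels covered by label regions, and `OTHER`, `coverage`, `divide`, `min_coverage`, and `max_steps` are names invented for this sketch.

```python
from typing import Callable, FrozenSet, Iterable, List, Optional, Tuple

OTHER = -1  # hypothetical id for pixels that belong to no attention class

LabelMap = List[List[int]]
# Hypothetical stand-in for the trained classifier/estimator:
# (image, attention class group) -> (label map, supplementary class or None)
Model = Callable[[object, FrozenSet[int]], Tuple[LabelMap, Optional[int]]]


def coverage(label_map: LabelMap) -> float:
    """Fraction of pixels assigned to some attention class (assumed matching measure)."""
    total = sum(len(row) for row in label_map)
    labeled = sum(1 for row in label_map for value in row if value != OTHER)
    return labeled / total if total else 0.0


def divide(
    image: object,
    model: Model,
    initial: Iterable[int],
    min_coverage: float = 0.95,
    max_steps: int = 20,
) -> Tuple[FrozenSet[int], LabelMap]:
    """Grow the attention class group until the label regions cover enough of the image."""
    attention = frozenset(initial)
    label_map, supplement = model(image, attention)
    for _ in range(max_steps):
        # Stop when the label regions are large enough or nothing can be added.
        if coverage(label_map) >= min_coverage or supplement is None:
            break
        attention = attention | {supplement}
        label_map, supplement = model(image, attention)
    return attention, label_map
```

The loop returns the first attention class group whose label regions meet the size criterion, which corresponds to selecting one of several candidate division results rather than accepting a single unconditional output.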

According to the present invention, fluctuation of region division results can be suppressed.

FIG. 1 is a block diagram showing the schematic configuration of an image processing system according to an embodiment of the present invention.
FIG. 2 is a schematic functional block diagram of the classifier in the first embodiment of the present invention.
FIG. 3 is a schematic diagram explaining the generation of composite feature amounts.
FIG. 4 is a schematic functional block diagram of the image processing system according to the first embodiment as a learning device.
FIG. 5 is a schematic flow chart of the operation of the image processing system according to the first embodiment during learning.
FIG. 6 is a schematic functional block diagram of the image processing system according to the first embodiment as a region dividing device.
FIG. 7 is a schematic flow chart of the operation of the image processing system according to the first embodiment in region division processing.
FIG. 8 is a schematic diagram explaining a processing example of the region division processing of the image processing system according to the first embodiment.
FIG. 9 is a schematic flow chart of the operation of the image processing system according to the second embodiment in region division processing.

Hereinafter, embodiments of the present invention (hereinafter referred to as embodiments) will be described with reference to the drawings.

<< First Embodiment >>
The present embodiment is an image processing system 1 in which a photographing unit and a display unit are connected to a computer, and the image processing system 1 operates as a region dividing device and as its learning device.

The region dividing device according to the present invention performs classification processing that classifies target data distributed in a predetermined space into a plurality of classes and divides the space into label regions identified by the classes. The region dividing device shown as an example in the present embodiment divides an image of a monitoring space into regions. That is, in the present embodiment, the target data to be classified are the pixels constituting a two-dimensional image, and the space to be divided is a two-dimensional space of finite size corresponding to the image. In addition to the classifier that performs the classification processing, the region dividing device includes an estimator concerning the classes contained in the target data. The learning device trains the classifier and the estimator used in the region dividing device. In the present embodiment, the estimator is configured to share part of the classifier, and hereinafter the term classifier is basically used in a broad sense that includes the estimator. That is, unless otherwise noted, the classifier hereinafter means the integrated configuration of the narrow-sense classifier described above and the estimator.

[Configuration of image processing system 1]
FIG. 1 is a block diagram showing the schematic configuration of the image processing system 1. The image processing system 1 includes a photographing unit 2, a communication unit 3, a storage unit 4, an image processing unit 5, and a display unit 6.

The photographing unit 2 is a camera that acquires images, each of which is a collection of target data; in the present embodiment it is a surveillance camera. The photographing unit 2 is connected to the image processing unit 5 via the communication unit 3, photographs the monitoring space at predetermined time intervals to generate images, and sequentially inputs the generated images to the image processing unit 5. For example, the photographing unit 2 is installed on a wall of an indoor monitoring space with a predetermined fixed field of view overlooking the space, and photographs the monitoring space at a frame period of 1 second to generate color images. The photographing unit 2 may generate monochrome images instead of color images.

The communication unit 3 is a communication circuit, one end of which is connected to the image processing unit 5 and the other end of which is connected to the photographing unit 2 and the display unit 6. The communication unit 3 acquires images from the photographing unit 2 and inputs them to the image processing unit 5. The communication unit 3 also receives the results of classification into classes and of division into label regions from the image processing unit 5 and outputs them to the display unit 6.

The photographing unit 2, the communication unit 3, the storage unit 4, the image processing unit 5, and the display unit 6 are connected as appropriate in a form suited to where each unit is installed. For example, when the photographing unit 2 is installed remotely from the communication unit 3 and the image processing unit 5, the photographing unit 2 and the communication unit 3 can be connected via an internet line. The communication unit 3 and the image processing unit 5 can be connected by a bus. Other connection means, such as a LAN (Local Area Network) or various cables, can also be used.

The storage unit 4 is a memory device such as a ROM (Read Only Memory) or a RAM (Random Access Memory), and stores various programs and various data. For example, the storage unit 4 stores learning data and information on the classifier, which is a trained model, and exchanges this information with the image processing unit 5. That is, information used for training the classifier, information required for the classification processing, information generated in the course of that processing, and the like are input and output between the storage unit 4 and the image processing unit 5.

The image processing unit 5 is composed of arithmetic units such as a CPU (Central Processing Unit), a DSP (Digital Signal Processor), an MCU (Micro Control Unit), and a GPU (Graphics Processing Unit). The image processing unit 5 operates as various processing means and control means by reading programs from the storage unit 4 and executing them; as necessary, it reads various data from the storage unit 4 and stores generated data in the storage unit 4. For example, the image processing unit 5 trains and generates the classifier, and stores the generated classifier in the storage unit 4 via the communication unit 3. The image processing unit 5 also uses the classifier to classify the pixels constituting an image from the photographing unit 2 into classes and divides the image into regions.

The display unit 6 is a liquid crystal display, an organic EL (Electro-Luminescence) display, or the like, and displays the classification results and division results input from the image processing unit 5 via the communication unit 3.

[Classifier configuration]
FIG. 2 is a schematic functional block diagram of the classifier in the broad sense described above. In this description of the classifier configuration, to distinguish the narrow-sense classifier from the estimator, the narrow-sense classifier is referred to simply as the classifier and the broad-sense classifier as the classifier/estimator. The classifier/estimator receives an image and attention class information. As the classifier, it classifies each pixel of the image into a class and outputs the result; as the estimator, it estimates a supplementary class and outputs the result. The attention class information is information that specifies the attention class group.

Here, the attention class group is the set of classes for which the classifier obtains label regions, and basically consists of one or more classes. That is, attention classes are set from among a plurality of classes predetermined as classification targets, and the set whose elements are the one or more classes set as attention classes is the attention class group. The classifier identifies, in the image to be divided, the regions corresponding to attention classes as label regions, whereas it does not identify regions corresponding to classes other than the attention classes as label regions of any specific class. As a result, regions that do not correspond to the attention class group are treated, for example, as regions of an "other" class.
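The effect of an attention class group on the output can be illustrated with a small sketch. It assumes a label map stored as nested lists of integer class ids and a hypothetical `OTHER` id for the "other" class; the function name `restrict_to_attention` is invented here and is not taken from the patent. It shows only the intended output semantics, not how the trained network computes it.

```python
OTHER = -1  # hypothetical id standing for the "other" class


def restrict_to_attention(label_map, attention):
    """Keep class ids that are attention classes; map every other pixel to OTHER."""
    return [[c if c in attention else OTHER for c in row] for row in label_map]


# A 2x3 map with classes 0 (floor), 1 (wall), 2 (door); the class names are illustrative.
full_map = [[0, 1, 1],
            [0, 0, 2]]
print(restrict_to_attention(full_map, {0, 1}))
```

With the attention class group {0, 1}, the single pixel of class 2 is reported as the "other" class, while the label regions of classes 0 and 1 are unchanged.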

The supplementary class is a class to be added to the current attention class group as a new attention class. That is, a new attention class group is set by adding the supplementary class to the current attention class group. Updating the attention class group by adding a supplementary class basically enlarges the label regions corresponding to the attention class group, bringing them closer to the entire image; the estimator estimates a supplementary class such that the label regions corresponding to the new attention class group suitably approach the entire image. For example, the supplementary class can be the one class, among the classes other than the attention classes, whose addition to the attention class group most reduces the size of the "other" class region that is not assigned to a label region.

In the present embodiment, the classifier/estimator is composed of a multilayer network such as those used in deep learning, and can be modeled, for example, by a convolutional neural network (CNN). The network constituting the classifier/estimator of the present embodiment includes a feature amount extraction unit 400, an attention class information compression unit 401, a feature amount synthesis unit 402, a class classification unit 403, and a supplementary class estimation unit 404. Of these, the feature amount extraction unit 400, the attention class information compression unit 401, and the feature amount synthesis unit 402 are shared by the classifier and the estimator. The shared portion with the class classification unit 403 connected to it forms the classifier, and the shared portion with the supplementary class estimation unit 404 connected to it forms the estimator.

In the configuration of the classifier, the feature amount extraction unit 400, the feature amount synthesis unit 402, and the class classification unit 403 form a network structure of multiple layers connected in series; this portion is referred to as the classifier main part. Similarly, in the configuration of the estimator, the feature amount extraction unit 400, the feature amount synthesis unit 402, and the supplementary class estimation unit 404 form a network structure of multiple layers connected in series; this portion is referred to as the estimator main part.

The feature amount extraction unit 400, the class classification unit 403, and the supplementary class estimation unit 404 are composed of convolution layers, activation functions, pooling layers, and the like. For example, the classifier main part repeatedly computes feature maps by convolving the feature amounts of neighboring pixels, thereby aggregating each pixel's relationship with its surroundings, and then identifies a class for each pixel of the original image. In the present embodiment, the feature amount synthesis unit 402 is inserted partway through the network structure of the classifier main part and of the estimator main part, dividing each into the part before the feature amount synthesis unit 402 and the part after it. The part before is the feature amount extraction unit 400, which is shared by the classifier main part and the estimator main part. The part after is the class classification unit 403 in the classifier main part and the supplementary class estimation unit 404 in the estimator main part.
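The data flow among units 400 to 404 can be sketched as a shared front end with two heads. This is a structural illustration only: each unit is represented by an arbitrary callable supplied by the caller, and the class name `ClassifierEstimator` and its field names are assumptions made for this sketch rather than names from the patent.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class ClassifierEstimator:
    """Shared front end (400, 401, 402) feeding two heads (403, 404)."""
    extract: Callable[[Any], Any]          # feature amount extraction unit 400
    compress: Callable[[Any], Any]         # attention class information compression unit 401
    synthesize: Callable[[Any, Any], Any]  # feature amount synthesis unit 402
    classify: Callable[[Any], Any]         # class classification unit 403
    estimate: Callable[[Any], Any]         # supplementary class estimation unit 404

    def __call__(self, image: Any, attention_info: Any) -> Tuple[Any, Any]:
        features = self.extract(image)
        code = self.compress(attention_info)
        # The composite feature is computed once and passed to both heads.
        combined = self.synthesize(features, code)
        return self.classify(combined), self.estimate(combined)
```

Because the composite feature is computed once per input, the classification result and the supplementary class estimate are derived from the same intermediate representation.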

The feature amount extraction unit 400 receives an image and calculates feature amounts from it. The feature amount calculation performed by the feature amount extraction unit 400 may cover only up to an intermediate layer of the feature maps generated over multiple layers, and the processing performed by the class classification unit 403 and the supplementary class estimation unit 404 may include generating the feature maps from that intermediate layer onward.

The class classification unit 403 classifies pixels into classes based on the composite feature amounts generated by the feature amount synthesis unit 402, and divides the image into regions. This region division outputs the label regions corresponding to the attention classes. That is, the class classification unit 403 classifies each pixel and, for pixels classified into an attention class, outputs that class. This yields, within the image, label regions each consisting of the group of pixels belonging to a class, and the image is divided into label regions. Specifically, adjacent pixels classified into the same class form one section of the label region of that class. For the parts of the image that are not classified into any attention class, the class classification unit 403 outputs, for example, the "other" class as described above.
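Grouping adjacent pixels of the same class into sections can be sketched as connected-component labeling. The source does not say which adjacency is used, so 4-connectivity is an assumption here, and the function name `label_sections` is hypothetical.

```python
from collections import deque


def label_sections(label_map):
    """Split a class label map into sections of 4-connected same-class pixels.

    Returns (section_map, section_classes): section_map holds a section id per
    pixel, and section_classes[i] is the class id of section i.
    """
    height = len(label_map)
    width = len(label_map[0]) if height else 0
    section_map = [[-1] * width for _ in range(height)]
    section_classes = []
    for y in range(height):
        for x in range(width):
            if section_map[y][x] != -1:
                continue  # pixel already belongs to a section
            cls = label_map[y][x]
            sid = len(section_classes)
            section_classes.append(cls)
            section_map[y][x] = sid
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill over same-class neighbors
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and section_map[ny][nx] == -1
                            and label_map[ny][nx] == cls):
                        section_map[ny][nx] = sid
                        queue.append((ny, nx))
    return section_map, section_classes
```

Two separated patches of the same class receive different section ids, so one label region may consist of several sections.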

The supplementary class estimation unit 404 estimates the supplementary class based on the composite feature amounts generated by the feature amount synthesis unit 402. For example, based on the composite feature amounts, the supplementary class estimation unit 404 estimates as the supplementary class the class with the largest area among the classes that are contained in the image but not in the attention class group. Alternatively, the supplementary class estimation unit 404 can be configured to output a vector storing, for each class, a score indicating how likely the class is to become an attention class, and to select the supplementary class based on those scores. When the attention class group already contains all the classes present in the image, it returns the estimation result "no supplementary class".
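The two selection rules described above can be sketched as follows. The first function derives the largest missing class from a known label map, which is one conceivable way to obtain a correct supplementary class from annotations; the second picks from a score vector. Both function names, the tie-breaking by smallest class id, and the `threshold` parameter are assumptions of this sketch, and `None` stands for "no supplementary class".

```python
from collections import Counter


def largest_missing_class(label_map, attention):
    """Class with the largest pixel count among classes in the map but not in attention."""
    counts = Counter(c for row in label_map for c in row if c not in attention)
    if not counts:
        return None  # every class in the image is already an attention class
    # Largest area first; ties broken by the smallest class id (assumption).
    return min(counts, key=lambda c: (-counts[c], c))


def pick_from_scores(scores, attention, threshold=0.5):
    """Highest-scoring class outside attention whose score reaches the threshold."""
    candidates = [(s, c) for c, s in enumerate(scores)
                  if c not in attention and s >= threshold]
    if not candidates:
        return None
    # Highest score first; ties broken by the smallest class id (assumption).
    _, best_class = max(candidates, key=lambda sc: (sc[0], -sc[1]))
    return best_class
```

In the score-based variant the threshold determines when "no supplementary class" is returned; the source does not specify how that decision is made.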

The attention class information compression unit 401 is composed of a fully connected layer or the like. It obtains a low-dimensional representation of the attention class information and outputs it to the feature amount synthesis unit 402. The attention class information is set based on what appears in the image and on its scene. The number of classes appearing in an input image is often far smaller than the total number of classes the classifier can classify, and classes also exhibit co-occurrence: for example, indoor classes rarely appear in outdoor images, and walls and floors tend to appear together indoors. The attention class information can therefore be expressed with relatively low-dimensional information, and the attention class information compression unit 401 performs this dimensionality-reducing conversion. For example, the attention class information compression unit 401 receives attention class information expressed with as many variables as there are predefined classes, compresses its dimensionality, and outputs attention class information expressed with fewer variables.

The feature amount synthesis unit 402 combines the feature amount extracted by the feature amount extraction unit 400 with the attention class information compressed by the attention class information compression unit 401 to generate a composite feature amount, and inputs it to the class classification unit 403 and the supplementary class estimation unit 404.

FIG. 3 is a schematic diagram illustrating the process of generating the composite feature amount. It schematically shows the data inside the classifier/estimator shown in FIG. 2. On the left side of the figure, following the order of the feature amount extraction unit 400, the feature amount synthesis unit 402, and the class classification unit 403 of FIG. 2, which form the main part of the classifier, are arranged the image 100 input to the classifier, the composite feature amount 110 generated by the feature amount synthesis unit 402, and the class classification result 140 output from the classifier. On the right side of the figure are shown the input nodes 120 of the attention class information compression unit 401, the attention class information 121 input to those nodes, and the output nodes 130 of the attention class information compression unit 401.

Regarding the data of the main part of the classifier arranged on the left side of FIG. 3, the x-axis is taken along the width of the image 100 and the y-axis along its height, and the dimension corresponding to the feature channels is represented by the c-axis. The image 100 is W_I pixels in the x direction and H_I pixels in the y direction. The feature amount map generated by the feature amount extraction unit 400 is W_F pixels in the x direction and H_F pixels in the y direction, and its size in the c direction, that is, the number of channels, is C. Note that the x and y sizes of the feature amount map generally do not match those of the image 100; usually W_F < W_I and H_F < H_I.

The attention class information 121 illustrated in FIG. 3 indicates, for each of N predetermined classes, whether or not that class is an attention class. For example, all classes that the classifier can classify are set as these N classes.

Specifically, the attention class information 121 is an N-dimensional vector in which each attention class is represented by the value "1", and each class that is not an attention class, and is therefore replaced with the other class in the class classification result, is represented by "0". The attention class information 121 in the figure is a concrete example generated for an image taken indoors. For example, the "person" and "floor" classes are included in the image and are therefore attention classes, so "1" is set in the corresponding elements of the vector. The "road" class, in contrast, is not included in the image and is therefore not an attention class, so "0" is set in the corresponding element.
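
As an illustration of the N-dimensional attention class information described above, the following sketch builds the 0/1 vector from a list of class names. The class list `ALL_CLASSES` and the function name `make_attention_class_info` are hypothetical; the text only fixes that there are N predefined classes.

```python
# Hypothetical class list; the text only fixes that there are N predefined classes.
ALL_CLASSES = ["person", "floor", "wall", "window", "road", "grass"]


def make_attention_class_info(attention_classes, all_classes=ALL_CLASSES):
    """Return an N-dimensional 0/1 vector: 1 for attention classes, 0 otherwise."""
    attention = set(attention_classes)
    unknown = attention - set(all_classes)
    if unknown:
        raise ValueError(f"unknown classes: {sorted(unknown)}")
    return [1 if name in attention else 0 for name in all_classes]


# Indoor scene: person, floor, wall and window are present; road and grass are not.
indoor_info = make_attention_class_info(["person", "floor", "wall", "window"])
print(indoor_info)  # [1, 1, 1, 1, 0, 0]
```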

注目クラス情報圧縮部４０１の入力ノード１２０は注目クラス情報１２１の要素と一対一に対応しており、その数はＮであり、一方、出力ノード１３０の数ＤはＮ未満である。注目クラス情報圧縮部４０１は、入力ノード１２０に入力された注目クラス情報１２１を次元圧縮して、出力ノード１３０から圧縮された注目クラス情報を出力する。つまり、注目クラス情報１２１はＮ次元のベクトルからＤ次元のベクトルに圧縮される。ちなみに、図３では、注目クラス情報圧縮部４０１として、入力ノード１２０と出力ノード１３０とが全結合された構成を示している。 The input node 120 of the attention class information compression unit 401 has a one-to-one correspondence with the elements of the attention class information 121, and the number thereof is N, while the number D of the output nodes 130 is less than N. The attention class information compression unit 401 dimensionally compresses the attention class information 121 input to the input node 120, and outputs the compressed attention class information from the output node 130. That is, the attention class information 121 is compressed from the N-dimensional vector to the D-dimensional vector. Incidentally, FIG. 3 shows a configuration in which the input node 120 and the output node 130 are fully connected as the attention class information compression unit 401.
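
The N-to-D compression through fully connected nodes can be sketched as a single matrix-vector product. This is a minimal illustration only: in the described system the weights are learned, whereas here random weights stand in for them, no activation function is applied (the text does not specify one), and the names `make_fully_connected` and `compress` are hypothetical.

```python
import random


def make_fully_connected(n_in, n_out, seed=0):
    """Create stand-in weights for a fully connected layer with n_in inputs and n_out outputs."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def compress(info, weights, biases):
    """Map an N-dimensional attention class vector to a D-dimensional vector."""
    return [sum(w * x for w, x in zip(row, info)) + b
            for row, b in zip(weights, biases)]


N, D = 6, 3  # D must be smaller than N
weights, biases = make_fully_connected(N, D)
compressed = compress([1, 1, 1, 1, 0, 0], weights, biases)
```

Every input node contributes to every output node, which matches the fully connected configuration shown for the input nodes 120 and output nodes 130.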

The feature amount synthesis unit 402 receives the compressed attention class information from the output nodes 130 of the attention class information compression unit 401 and combines it with the feature amount map input from the feature amount extraction unit 400 to generate the composite feature amount 110. The composite feature amount 110 is obtained by concatenating the attention class information, expressed as a D-dimensional vector, to each C-dimensional feature vector specified by an (x, y) coordinate pair in the feature amount map before synthesis. It has the same width and height as the feature amount map before synthesis and (C + D) channels. For example, channels 1 to C of the composite feature amount 110 hold the feature amount map before synthesis, and channels (C + 1) to (C + D) are set to the output values of the 1st to D-th output nodes 130 of the attention class information compression unit 401.

In the present embodiment, the same attention class information is set for every (x, y) coordinate. The composite feature amount 110 is therefore structured as follows: each of the D elements of the attention class information is replicated in the x and y directions to the same W_F × H_F size as the output of the feature amount extraction unit 400, and the result is stacked onto the feature amount map before synthesis. That is, while the feature amounts in channels 1 to C may differ depending on the coordinates (x, y), in the present embodiment each of channels (C + 1) to (C + D) holds a value common to all coordinates (x, y).
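
The stacking described above can be sketched with plain nested lists: the D-dimensional compressed vector is appended to the C-dimensional feature vector at every (x, y) position, giving C + D channels. The function name `synthesize` and the toy sizes are hypothetical, and the `[y][x][c]` memory layout is an assumption made for illustration.

```python
def synthesize(feature_map, compressed_info):
    """feature_map[y][x] is a C-dim list; the result holds C + D channels per position."""
    return [[list(vec) + list(compressed_info) for vec in row]
            for row in feature_map]


H_F, W_F, C = 2, 3, 4
# Toy feature map whose values depend on (x, y, c).
feature_map = [[[float(y * 10 + x) + c / 10 for c in range(C)]
                for x in range(W_F)]
               for y in range(H_F)]
compressed_info = [0.5, -1.0]  # D = 2 values from the compression unit

composite = synthesize(feature_map, compressed_info)
```

Channels 1 to C keep their position-dependent values, while the appended D channels are identical at every position.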

以下、画像処理システム１の構成について、先ず、学習装置としての構成および動作について説明し、次いで、領域分割装置としての構成および動作について説明する。 Hereinafter, regarding the configuration of the image processing system 1, first, the configuration and operation as a learning device will be described, and then the configuration and operation as a region dividing device will be described.

[Configuration as a learning device]
FIG. 4 is a schematic functional block diagram of the image processing system 1 according to the first embodiment as a learning device. The storage unit 4 functions as the learning data storage means 40 and the learning model storage means 41, and the image processing unit 5 functions as the correct label replacement means 50, the learning attention class generating means 51, the learning supplementary class generating means 52, and the learning means 53.

学習用データ記憶手段４０は、学習用対象データである多数の画像および当該画像に対し予め与えられた正解のクラスを記憶する。学習用画像と当該画像それぞれに対応する正解のクラスとは、学習処理に先立って予め学習用データ記憶手段４０に記憶される。 The learning data storage means 40 stores a large number of images that are learning target data and a class of correct answers given in advance for the images. The learning image and the correct answer class corresponding to each of the images are stored in the learning data storage means 40 in advance prior to the learning process.

学習モデル記憶手段４１は分類器についての学習モデルを記憶する。学習手段５３による学習処理に伴い、学習モデル記憶手段４１に記憶される学習モデルは更新される。そして、学習が完了すると、学習モデル記憶手段４１は分類器の学習済みモデルを記憶し、後述するモデル記憶手段４２として機能する。上述したように本実施形態では、分類器は例えば、ＣＮＮでモデル化されるネットワークで構成され、学習モデル記憶手段４１は、ＣＮＮなどのネットワークを構成するフィルタのフィルタ係数やネットワーク構造などを含めた情報を分類器として記憶する。 The learning model storage means 41 stores a learning model for the classifier. Along with the learning process by the learning means 53, the learning model stored in the learning model storage means 41 is updated. Then, when the learning is completed, the learning model storage means 41 stores the learned model of the classifier and functions as the model storage means 42 described later. As described above, in the present embodiment, the classifier is composed of, for example, a network modeled by CNN, and the learning model storage means 41 includes the filter coefficients and the network structure of the filters constituting the network such as CNN. Store information as a classifier.

The learning means 53 trains the learning model stored in the learning model storage means 41. In this learning, the learning image and the learning attention class information are input to the feature amount extraction unit 400 and the attention class information compression unit 401, respectively, of the classifier's learning model. The learning model is then updated based on two errors: the error of the class classification result output by the class classification unit 403 with respect to its correct answer, and the error of the supplementary class output by the supplementary class estimation unit 404 with respect to its correct answer. The error used in training the classifier can be a value that integrates these two errors, for example by addition, and the learning can be controlled based on the integrated error.
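
One way to integrate the two errors by addition is sketched below. The text does not name a loss function, so cross-entropy is an assumption here, as are the optional `weight` factor and the function names `cross_entropy` and `integrated_error`.

```python
import math


def cross_entropy(probs, one_hot, eps=1e-12):
    """Cross-entropy between predicted probabilities and a one-hot target (assumed loss)."""
    return -sum(t * math.log(max(p, eps)) for p, t in zip(probs, one_hot))


def integrated_error(pixel_probs, pixel_targets, supp_probs, supp_target, weight=1.0):
    """Add the mean per-pixel classification error and the supplementary class error."""
    seg_error = sum(cross_entropy(p, t)
                    for p, t in zip(pixel_probs, pixel_targets)) / len(pixel_probs)
    supp_error = cross_entropy(supp_probs, supp_target)
    return seg_error + weight * supp_error


# Two pixels with two classes each, plus a two-way supplementary class output.
good = integrated_error([[0.9, 0.1], [0.2, 0.8]], [[1, 0], [0, 1]], [0.7, 0.3], [1, 0])
bad = integrated_error([[0.1, 0.9], [0.8, 0.2]], [[1, 0], [0, 1]], [0.3, 0.7], [1, 0])
```

Predictions that agree with both correct answers give a smaller integrated error than predictions that contradict them, which is what the parameter update then drives down.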

学習手段５３には、当該学習に用いられる学習用画像、学習用注目クラス情報、並びにクラス分類および補足クラスそれぞれの正解とが入力される。これらのうち学習用画像は学習用データ記憶手段４０から学習手段５３に入力される。また、学習用注目クラス生成手段５１、正解ラベル置換手段５０および学習用補足クラス生成手段５２がそれぞれ、学習用注目クラス情報、クラス分類の正解および補足クラスの正解を学習手段５３に入力する。 In the learning means 53, a learning image used for the learning, learning attention class information, and correct answers for each of the classification and the supplementary class are input. Of these, the learning image is input from the learning data storage means 40 to the learning means 53. Further, the learning attention class generating means 51, the correct answer label replacing means 50, and the learning supplementary class generating means 52 input the learning attention class information, the correct answer of the classification, and the correct answer of the supplementary class into the learning means 53, respectively.

正解ラベル置換手段５０は学習用データ記憶手段４０に記憶されている正解のクラスを読み出し、正解のクラスに対する置換処理を行う。当該置換処理は、正解のクラス、つまり学習用画像に存在するクラスの一部を存在しないものとする。例えば、置換するクラスは、各学習用画像に対応した正解のクラスの中でランダムに設定することができる。この際、各クラスを一定の確率でランダムに置換するのではなく、０以上で、正解ラベルに含まれるクラス数以下の乱数を生成し、その乱数の数だけのクラスをランダムに選択し置換することで、置換されるクラスの数が均一に分布するようにするとよい。或いは、ランダムに置換する代わりに、各クラスの置換回数を計数しながら、各学習用画像に対応した正解のクラスの中で置換回数が少ないクラスを優先して選択し置換してもよい。 The correct answer label replacement means 50 reads out the correct answer class stored in the learning data storage means 40, and performs replacement processing for the correct answer class. In the replacement process, it is assumed that the correct answer class, that is, a part of the class existing in the learning image does not exist. For example, the class to be replaced can be randomly set in the correct class corresponding to each learning image. At this time, instead of randomly replacing each class with a certain probability, a random number of 0 or more and less than or equal to the number of classes included in the correct answer label is generated, and as many classes as the number of the random numbers are randomly selected and replaced. Therefore, it is preferable that the number of classes to be replaced is evenly distributed. Alternatively, instead of randomly replacing, the number of replacements of each class may be counted, and the class with the smaller number of replacements may be preferentially selected and replaced among the correct answer classes corresponding to each learning image.
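
The uniform-count selection described above can be sketched as follows: first draw how many classes to replace, uniformly from 0 to the number of classes in the correct label, then pick that many classes at random. The function name `choose_classes_to_replace` is hypothetical, and the count-based alternative from the last sentence is not shown.

```python
import random


def choose_classes_to_replace(present_classes, rng):
    """Pick k classes to replace, with k drawn uniformly from 0..len(present_classes)."""
    present = sorted(set(present_classes))
    k = rng.randint(0, len(present))  # randint includes both endpoints
    return set(rng.sample(present, k))


rng = random.Random(0)
present = ["person", "floor", "wall", "window"]
draws = [choose_classes_to_replace(present, rng) for _ in range(500)]
sizes = sorted({len(d) for d in draws})
```

Replacing each class independently with a fixed probability would instead make the number of replaced classes binomially distributed, so very small and very large counts would be rare.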

The processing of the correct label replacement means 50 is described using an image showing a person, a floor, a wall, and a window as an example. Class information can be represented by a vector whose elements correspond one-to-one to all the classes; this vector is called a class vector. Letting N be the number of all classes predefined as classification targets, the correct class of each pixel can be represented by an N-dimensional class vector in which the element corresponding to that class is set to "1" and all other elements are set to "0". For example, the correct class of a pixel showing a person is represented by an N-dimensional vector in which the person element is "1" and the rest are "0", the correct class of a floor pixel by an N-dimensional vector in which the floor element is "1" and the rest are "0", and the correct classes of wall and window pixels are represented likewise. The correct label replacement means 50 receives the N-dimensional vector representing the correct class for each pixel and outputs, for each pixel, an (N + 1)-dimensional class vector obtained by appending an element corresponding to the other class. For example, suppose the correct label replacement means 50 treats everything except the floor, that is, the person, wall, and window, as not of interest. Then, in the (N + 1)-dimensional class vector of each pixel containing a person, wall, or window, the person, wall, and window elements are replaced with "0" and the element corresponding to the other class is set to "1". In the (N + 1)-dimensional class vector of each pixel containing the floor, the element corresponding to the other class is set to "0".

置換処理により、注目クラスについては置換前の正解のクラスに基づくオリジナルの正解ラベル領域、注目クラス以外についてはその他クラスに置き換えられた正解ラベル領域が得られ、この置換済みの正解ラベル領域が正解ラベル置換手段５０からクラス分類の正解として学習手段５３に与えられる。 By the replacement process, the original correct label area based on the correct answer class before replacement is obtained for the attention class, and the correct answer label area replaced with other classes except for the attention class is obtained, and this replaced correct answer label area is the correct answer label. The replacement means 50 gives the learning means 53 as the correct answer for the classification.
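
The replacement into (N + 1)-dimensional class vectors can be sketched as below, using the person/floor/wall/window example with everything except the floor replaced. The index assignment, the function name `replace_labels`, and the toy label map are hypothetical; class indices are used in place of the per-pixel N-dimensional input vectors for brevity.

```python
def replace_labels(label_map, replaced, n_classes):
    """Turn per-pixel class indices (0..N-1) into (N+1)-dim one-hot vectors.

    Pixels whose class is in `replaced` are moved to the extra "other" element at index N.
    """
    other = n_classes
    result = []
    for row in label_map:
        out_row = []
        for cls in row:
            target = other if cls in replaced else cls
            vector = [0] * (n_classes + 1)
            vector[target] = 1
            out_row.append(vector)
        result.append(out_row)
    return result


PERSON, FLOOR, WALL, WINDOW = 0, 1, 2, 3
N = 4
label_map = [
    [WALL, WINDOW, WALL],
    [PERSON, FLOOR, FLOOR],
]
# Keep only the floor as an attention class; person, wall and window become "other".
replaced_map = replace_labels(label_map, {PERSON, WALL, WINDOW}, N)
```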

学習用注目クラス生成手段５１は、正解ラベル置換手段５０から置換済みの正解ラベル領域を入力され、それに基づいて学習用注目クラス情報を生成する。学習用注目クラス生成手段５１は、置換済みの正解ラベル領域に対応するクラス、つまり、正解ラベル置換手段５０での置換処理後に残る正解のクラスを学習用注目クラス群とし、それを表す学習用注目クラス情報を生成する。ちなみに、学習用注目クラス情報として、置換済みの正解ラベル画像ごとに１つのＮ次元クラスベクトルが生成される。例えば、学習用注目クラス情報は、注目クラスに対応する要素が値“１”でそれ以外は“０”であるクラスベクトルで表現される。 The learning attention class generating means 51 inputs the replaced correct answer label area from the correct answer label replacing means 50, and generates learning attention class information based on the input correct answer label area. The learning attention class generating means 51 sets the class corresponding to the replaced correct label area, that is, the correct answer class remaining after the replacement process by the correct label replacement means 50 as the learning attention class group, and represents the learning attention class group. Generate class information. Incidentally, as learning attention class information, one N-dimensional class vector is generated for each replaced correct label image. For example, the learning attention class information is represented by a class vector in which the element corresponding to the attention class has a value of "1" and the other elements have a value of "0".

The learning supplementary class generating means 52 reads the original correct label area from the learning data storage means 40, receives the learning attention class information from the learning attention class generating means 51, and based on them generates the learning supplementary class, which is the correct answer for the supplementary class. Comparing the correct classes before replacement, known from the original correct label area, with the attention classes indicated by the learning attention class information reveals which of the correct classes were replaced by the correct label replacement means 50 and are therefore not included in the attention classes. Of those classes, the learning supplementary class generating means 52 inputs the one with the largest area as a correct label area to the learning means 53 as the learning supplementary class. When there is no supplementary class, the learning supplementary class generating means 52 outputs that fact. Its output can be an (N + 1)-dimensional class vector whose (N + 1)-th element serves as a "no supplementary class" flag.
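
The selection of the learning supplementary class can be sketched as an area count over the original correct labels, restricted to classes missing from the attention class information. The function name `learning_supplementary_class` is hypothetical, and breaking area ties by the lowest class index is an assumption the text does not address.

```python
from collections import Counter


def learning_supplementary_class(label_map, attention_info):
    """Return an (N+1)-dim one-hot vector; the last element flags "no supplementary class".

    label_map holds original per-pixel class indices; attention_info is the N-dim 0/1 vector.
    """
    n_classes = len(attention_info)
    areas = Counter(cls for row in label_map for cls in row)
    candidates = {cls: area for cls, area in areas.items() if attention_info[cls] == 0}
    vector = [0] * (n_classes + 1)
    if not candidates:
        vector[n_classes] = 1  # every class in the image is already an attention class
    else:
        # Largest area wins; ties go to the lowest class index (assumption).
        best = max(candidates, key=lambda cls: (candidates[cls], -cls))
        vector[best] = 1
    return vector


# Classes: 0 person, 1 floor, 2 wall, 3 window, 4 road (road is absent from the image).
original_labels = [
    [0, 0, 1, 1],
    [0, 2, 1, 1],
    [3, 2, 1, 1],
]
floor_only = learning_supplementary_class(original_labels, [0, 1, 0, 0, 0])
all_present = learning_supplementary_class(original_labels, [1, 1, 1, 1, 0])
```

With only the floor as an attention class, the person class (3 pixels) is the largest replaced class; when all four present classes are attention classes, the flag element is set instead.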

[Operation as a learning device]
Before dividing input images into regions, the image processing system 1 trains the classifier. This training is described below. It uses the learning image and the learning attention class information, together with the replaced correct label area, which is the correct answer data for class classification, and the learning supplementary class, which is the correct answer data for the supplementary class. Based on the integrated error described above, the parameters of the learning model are updated repeatedly until the error converges, using a known optimization method such as error backpropagation. This training yields both a classifier (in the narrow sense) capable of the classification processing that finds the label areas corresponding to the attention classes, and an estimator that estimates a supplementary class so as to shrink the "other class" area, that is, the area not covered by the label areas of the attention classes in the region division result. In addition to training the feature amount extraction unit 400, the class classification unit 403, and the supplementary class estimation unit 404, the training of the classifier includes training the attention class information compression unit 401 using the learning attention class information.

図５は画像処理システム１の学習時の動作の概略のフロー図である。 FIG. 5 is a schematic flow chart of the operation of the image processing system 1 during learning.

When the start of the learning operation is instructed, the image processing unit 5 reads the initial parameter values of the classifier's learning model from the learning model storage means 41 (step S1) and starts the learning operation for that model (steps S2 to S10).

画像処理部５は、学習用データ記憶手段４０から学習用画像および当該画像の正解ラベルを取得する（ステップＳ２）。画像処理部５は正解ラベル置換手段５０として機能し、正解ラベルに含まれるクラスをランダムに選択し、「その他クラス」に置換する（ステップＳ３）。 The image processing unit 5 acquires the learning image and the correct label of the image from the learning data storage means 40 (step S2). The image processing unit 5 functions as the correct label replacement means 50, randomly selects a class included in the correct label, and replaces it with the “other class” (step S3).

画像処理部５は学習用注目クラス生成手段５１として機能し、正解ラベル置換手段５０で生成されたラベルをもとに学習用注目クラス情報を生成する（ステップＳ４）。例えば、正解ラベル置換手段５０で生成される置換済みの正解ラベル領域は、「その他クラス」を含めてＮ＋１クラスで構成され得るが、学習用注目クラス生成手段５１は「その他クラス」を除いたＮクラスのクラスベクトルを出力する。 The image processing unit 5 functions as a learning attention class generating means 51, and generates learning attention class information based on the label generated by the correct label replacement means 50 (step S4). For example, the replaced correct label area generated by the correct label replacing means 50 may be composed of N + 1 classes including the “other class”, but the learning attention class generating means 51 is N excluding the “other class”. Output the class vector of the class.

Next, the image processing unit 5 functions as the learning supplementary class generating means 52. Based on the original correct label and the learning attention class information, it sets as the learning supplementary class the class with the largest area in the correct label among the classes that are included in the original correct label but not in the learning attention class information (step S5). If the correct label replacement means 50 replaced no class, so that the learning attention class information contains every class in the original correct label, a special class meaning "no learning supplementary class" is set. That is, the learning supplementary class generating means 52 generates a class vector with N + 1 elements and outputs a vector in which the element corresponding to the class that should be added to the attention classes is "1" and all other elements are "0".

画像処理部５は学習手段５３として機能し、学習用画像、置換済み正解ラベル領域、学習用注目クラス情報、および学習用補足クラスに基づいて、学習モデルのパラメータを更新する。学習手段５３はまず学習モデルに学習用画像と学習用注目クラス情報を入力し、入力時のパラメータで領域分割と補足クラスの推定を行う（ステップＳ６）。その後、得られた領域分割結果と正解ラベルを比較して誤差を求め（ステップＳ７）、さらに推定した補足クラスと学習用補足クラスとの誤差を求める（ステップＳ８）。学習手段５３はこれらの誤差が小さくなるように確率的勾配降下法などで学習モデルのパラメータを更新する（ステップＳ９）。 The image processing unit 5 functions as a learning means 53, and updates the parameters of the learning model based on the learning image, the replaced correct answer label area, the learning attention class information, and the learning supplementary class. First, the learning means 53 inputs a learning image and learning attention class information into the learning model, and performs area division and estimation of the supplementary class according to the parameters at the time of input (step S6). After that, the obtained area division result is compared with the correct label to obtain an error (step S7), and further, the error between the estimated supplementary class and the learning supplementary class is obtained (step S8). The learning means 53 updates the parameters of the learning model by a stochastic gradient descent method or the like so that these errors become small (step S9).

In the learning operation, the image processing system 1 repeats steps S2 to S9 with different learning data until the error converges ("NO" in step S10). When the error satisfies a predetermined convergence condition ("YES" in step S10), it stores the parameters of the learned model (that is, the classifier) in the learning model storage means 41 and ends the learning operation (step S11).

The classifier (in the narrow sense) generated by the above learning receives, together with an image (data group), attention class information (an attention class group) instructing it to restrict the class classification result to the attention classes, and can thereby suppress the classification of pixels (data) into classes other than the attention classes. Moreover, because the attention class group used in learning is a subset of the correct classes, the classifier can perform highly accurate class classification in which variation due to the diversity of the learning data is suppressed (for example, there is no room for misclassifying a floor as a road). In addition or alternatively, even when the learning data mixes annotations made under different labeling criteria, it can perform highly accurate class classification in which variation due to that mixture is suppressed (for example, there is no room for classifying turf as a playground, so it is classified as grass). Learning also converges more easily. The classifier can therefore perform highly accurate class classification (region division) in which variation due to the diversity of the learning data and to mixed labeling criteria is suppressed.

The estimator generated by the above learning can estimate with high accuracy a class that is present in the input image (data group) but absent from the input attention class group, that is, a supplementary class that should be added to the attention class group.

[Configuration as a region dividing device]
FIG. 6 is a schematic functional block diagram of the image processing system 1 according to the first embodiment as a region dividing device. The storage unit 4 functions as the model storage means 42, and the image processing unit 5 functions as the attention class setting means 54 and the area dividing means 55. The communication unit 3 cooperates with the image processing unit 5 to function as the image input means 30 and the area information output means 31. Because the main processing in the attention class setting means 54 and the area dividing means 55 is performed using the classifier, FIG. 6 shows them together, for convenience, as the classifier 56.

モデル記憶手段４２は学習により生成された分類器を記憶している。本実施形態においてモデル記憶手段４２は学習装置の構成として上述した学習モデル記憶手段４１と同一であり、分類器は上述した学習済みモデルである。 The model storage means 42 stores the classifier generated by learning. In the present embodiment, the model storage means 42 is the same as the learning model storage means 41 described above as the configuration of the learning device, and the classifier is the learned model described above.

画像入力手段３０は撮影部２から画像を順次取得して分類器５６に入力する。 The image input means 30 sequentially acquires images from the photographing unit 2 and inputs them to the classifier 56.

The area dividing means 55 receives an image (input image) from the image input means 30 and attention class information from the attention class setting means 54, performs class classification on each pixel of the input image, and outputs the label areas of the attention classes obtained from the result. Specifically, the input image and the attention class information are input to the feature amount extraction unit 400 and the attention class information compression unit 401 of the classifier, respectively, and the division into label areas is obtained from the class classification result output by the class classification unit 403. For each of the plurality of attention class group settings made by the attention class setting means 54, the area dividing means 55 obtains label areas with the classifier. Among those settings, it selects as the region division result for the input image the label areas of the setting that satisfies a predetermined condition on the degree of matching between the label areas and the two-dimensional space corresponding to the input image.

注目クラス設定手段５４は、画像入力手段３０から画像を入力され、領域分割手段５５に入力する複数通りの注目クラス群を設定する。本実施形態では、注目クラス設定手段５４は、注目クラス群に補足クラスを加えて新たな注目クラス群を設定する処理により、複数通りの注目クラス群を逐次的に設定する。注目クラス設定手段５４は、領域分割手段５５でのクラス分類処理が所定の条件を満たすまで、逐次的な設定を繰り返す。設定された注目クラス群は注目クラス情報として領域分割手段５５へ与えられる。補足クラスは分類器を用いて推定される。具体的には、分類器の特徴量抽出部４００、注目クラス情報圧縮部４０１にそれぞれ入力画像および注目クラス情報を入力し、補足クラス推定部４０４から補足クラスの推定結果を得る。 The attention class setting means 54 receives an image from the image input means 30 and sets the plural attention class groups to be input to the area dividing means 55. In the present embodiment, the attention class setting means 54 sets the plural attention class groups sequentially, by repeatedly adding a supplementary class to an attention class group to form a new attention class group. It repeats this sequential setting until the class classification processing in the area dividing means 55 satisfies a predetermined condition. Each attention class group that is set is given to the area dividing means 55 as attention class information. The supplementary class is estimated using the classifier. Specifically, the input image and the attention class information are input to the feature amount extraction unit 400 and the attention class information compression unit 401 of the classifier, respectively, and the estimation result of the supplementary class is obtained from the supplementary class estimation unit 404.

領域情報出力手段３１は、領域分割手段５５が求めたラベル領域を表示部６に出力する。例えば、領域情報出力手段３１は、ラベル領域ごとに色分けされた画像を生成して表示部６に出力する。 The area information output means 31 outputs the label area obtained by the area dividing means 55 to the display unit 6. For example, the area information output means 31 generates a color-coded image for each label area and outputs the image to the display unit 6.

［領域分割装置としての動作］ [Operation as an area division device]
図７は画像処理システム１の領域分割処理での動作に関する概略のフロー図である。 FIG. 7 is a schematic flow chart relating to the operation of the image processing system 1 in the area division processing.

画像処理システム１が領域分割処理を開始すると、撮影部２は所定時間おきに監視空間を撮影した画像を順次出力する。画像処理部５は通信部３と協働して、撮影部２から画像を受信するたびに図７のフロー図に示す動作を繰り返す。 When the image processing system 1 starts the area division processing, the photographing unit 2 sequentially outputs the images captured in the monitoring space at predetermined time intervals. The image processing unit 5 cooperates with the communication unit 3 to repeat the operation shown in the flow chart of FIG. 7 every time an image is received from the photographing unit 2.

当該動作にてまず通信部３が画像入力手段３０として機能し、画像を受信すると当該画像を画像処理部５に入力する（ステップＳ２０）。 In this operation, the communication unit 3 first functions as an image input means 30, and when an image is received, the image is input to the image processing unit 5 (step S20).

画像処理部５は入力された画像（入力画像）を分類器の特徴量抽出部４００に入力し、入力画像の特徴量を計算する（ステップＳ２１）。この先の処理では、１つの入力画像に対して領域分割処理が注目クラスを変化させながら複数回行われるが、その際にここで計算した特徴量を繰り返し利用する。このように、領域分割処理の都度、特徴量を計算するのではなく再利用することで、画像処理部５の計算量を削減することができる。 The image processing unit 5 inputs the input image (input image) to the feature amount extraction unit 400 of the classifier, and calculates the feature amount of the input image (step S21). In the subsequent processing, the area division processing is performed a plurality of times for one input image while changing the class of interest, and the feature amount calculated here is repeatedly used at that time. In this way, the calculation amount of the image processing unit 5 can be reduced by reusing the feature amount instead of calculating it each time the area division processing is performed.

一方、分類器の注目クラス情報圧縮部４０１には、注目クラス設定手段５４により設定される注目クラス群の初期値が入力される（ステップＳ２２）。注目クラス設定手段５４は当該初期値として例えば、１クラスだけからなる注目クラス群を設定する。注目クラス設定手段５４は例えばＰ通り（１≦Ｐ）の注目クラス群を設定する。ここでＰは予め定めておく。例えばＰ＝３とすることができる。例えば、注目クラスに何も設定しなかった際に補足クラス推定部４０４の出力に得られる補足クラスの上位Ｐ個のクラスのそれぞれを初期注目クラス群とすることができる。あるいは、Ｎ個の全クラスそれぞれを１個ずつ注目クラス群として領域分割を行い、それらＮ通りの注目クラス群のうち、領域分割結果における「その他クラス」の面積が小さい順にＰクラスをそれぞれ初期注目クラス群とすることもできる。 Meanwhile, the initial value of the attention class group set by the attention class setting means 54 is input to the attention class information compression unit 401 of the classifier (step S22). As the initial value, the attention class setting means 54 sets, for example, an attention class group consisting of only one class. The attention class setting means 54 sets, for example, P attention class groups (1 ≦ P), where P is determined in advance; for example, P = 3. For example, each of the top P classes in the supplementary class output of the supplementary class estimation unit 404 when no attention class is set can be used as an initial attention class group. Alternatively, area division can be performed with each of the N classes in turn as a single-class attention class group, and, of those N attention class groups, the P classes with the smallest "other class" area in the area division result can each be used as an initial attention class group.

注目クラス設定手段５４と領域分割手段５５は推定器と一体の分類器を共用し、好適な領域分割結果が得られるように、注目クラス群の更新と領域分割処理とを繰り返す（ステップＳ２３〜Ｓ２９）。 The attention class setting means 54 and the area dividing means 55 share the classifier, which is integrated with the estimator, and repeat the update of the attention class groups and the area division processing so that a suitable area division result is obtained (steps S23 to S29).

注目クラス設定手段５４は、補足クラスの推定結果に多少の誤りが含まれていても良いように、注目クラス群を複数通り保持しながら、好適な注目クラス群の探索を行う。 So as to tolerate some errors in the estimation result of the supplementary class, the attention class setting means 54 searches for a suitable attention class group while holding a plurality of attention class groups.

補足クラスは補足クラス推定部４０４で算出される。ここでは、補足クラス推定部４０４は補足クラス情報として、クラスごとに補足クラスらしさを表すスコアが格納されたベクトルを出力する構成であるとする。注目クラス設定手段５４は、保持されているＰ通りの注目クラス群を順次、分類器に入力して当該注目クラス群について補足クラス情報を求める。そして、補足クラス情報にてスコアが上位のＱ（１≦Ｑ＜Ｎ）個のクラスそれぞれを当該注目クラス群に対する補足クラスとして選定する（ステップＳ２３）。ここでＱは予め定めておく。例えばＱ＝３とすることができる。 The supplementary class is calculated by the supplementary class estimation unit 404. Here, it is assumed that the supplementary class estimation unit 404 outputs, as supplementary class information, a vector that stores for each class a score representing how likely that class is to be a supplementary class. The attention class setting means 54 inputs each of the P attention class groups it holds into the classifier in turn and obtains the supplementary class information for that attention class group. It then selects each of the Q (1 ≦ Q < N) classes with the highest scores in the supplementary class information as a supplementary class for that attention class group (step S23). Here, Q is determined in advance; for example, Q = 3.
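
The top-Q selection of step S23 can be sketched as follows. This is a minimal Python illustration, assuming the supplementary class information is a plain list with one score per class; the function name `top_q_classes` and the sample scores are hypothetical and do not come from the specification.

```python
def top_q_classes(scores, q):
    """Return the indices of the q classes with the highest supplementary-class score."""
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return ranked[:q]


# Hypothetical scores for N = 5 classes; Q = 3 as in the example value above.
scores = [0.10, 0.70, 0.05, 0.60, 0.30]
supplementary = top_q_classes(scores, 3)
```

Each selected index would then be added, one at a time, to the attention class group for which the scores were computed.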

注目クラス設定手段５４がステップＳ２３におけるＱ通りの補足クラスから順次１つ選択して現在の注目クラス群に追加することで試行注目クラス群を作成し（ステップＳ２４）、画像処理部５が作成された試行注目クラス群を分類器に入力して領域分割処理を行う（ステップＳ２５）という一連の処理が、現在の注目クラス群に対してＱ個の補足クラスについて処理し終えるまで繰り返される（ステップＳ２６にて「ＮＯ」の場合）。なお、ステップＳ２５にて領域分割処理とともに補足クラスの推定処理を行ってもよく、その推定結果を、後のステップＳ２８で更新する注目クラス群に対するステップＳ２３の結果として利用することができる。 A series of processes, in which the attention class setting means 54 selects one of the Q supplementary classes from step S23 in turn and adds it to the current attention class group to create a trial attention class group (step S24), and the image processing unit 5 inputs the created trial attention class group to the classifier and performs the area division processing (step S25), is repeated until all Q supplementary classes have been processed for the current attention class group ("NO" in step S26). Note that the supplementary class estimation may be performed together with the area division processing in step S25, and its result can be used as the result of step S23 for the attention class groups updated in the later step S28.

Ｑ個の補足クラス全てについて処理が完了すると（ステップＳ２６にて「ＹＥＳ」の場合）、現在の注目クラス群の１つについて試行注目クラス群がＱ通り作成される。 When the processing has been completed for all Q supplementary classes ("YES" in step S26), Q trial attention class groups have been created for one of the current attention class groups.

画像処理部５はＰ通りの現在の注目クラス群に対してステップＳ２４〜Ｓ２６の処理を順次行う（ステップＳ２７にて「ＮＯ」の場合）。Ｐ通りの現在の注目クラス群全てについて当該処理が完了すると（ステップＳ２７にて「ＹＥＳ」の場合）、試行注目クラス群とその領域分割結果がＰ×Ｑ通り得られている。注目クラス設定手段５４はそれら試行注目クラス群のうち、領域分割結果における「その他クラス」の面積が小さい順における第１位〜第Ｐ位のものでＰ通り保持している注目クラス群を置換することによって注目クラス群を更新する（ステップＳ２８）。この更新によってＰ通りの注目クラス群は、現在の注目クラス群に補足クラスを加えたものとなる。また、この処理において注目クラス群に対応するラベル領域は基本的に拡大していき画像の全体領域に近づく。 The image processing unit 5 performs the processing of steps S24 to S26 on each of the P current attention class groups in turn ("NO" in step S27). When the processing has been completed for all P current attention class groups ("YES" in step S27), P × Q trial attention class groups and their area division results have been obtained. The attention class setting means 54 updates the attention class groups by replacing the P attention class groups it holds with the trial attention class groups ranked first to P-th in ascending order of "other class" area in the area division result (step S28). After this update, each of the P attention class groups is a current attention class group with a supplementary class added. In this process, the label areas corresponding to the attention class group basically expand and approach the entire image area.

画像処理部５は、１つの補足クラスが追加されたＰ×Ｑ通りの試行注目クラス群を生成し、それらの中からＰ通りの注目クラス群を選択するというステップＳ２３〜Ｓ２８の操作を終了条件が満たされるまで繰り返す（ステップＳ２９にて「ＮＯ」の場合）。 The image processing unit 5 repeats the operation of steps S23 to S28, that is, generating P × Q trial attention class groups each with one supplementary class added and selecting P attention class groups from among them, until the end condition is satisfied ("NO" in step S29).
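
One pass of steps S23 to S28 can be read as a width-P beam search step, sketched below in Python. The sketch makes several assumptions that are not in the specification: the classifier is represented by two hypothetical callables (`propose` returns the Q supplementary classes for a group, `other_area` returns the "other class" area of the division result for a group), attention class groups are frozensets of class ids, and identical trial groups are scored only once.

```python
def update_attention_groups(groups, propose, other_area, p):
    """One pass of steps S23 to S28 over the P attention class groups currently held.

    groups     -- list of frozensets of class ids (the current attention class groups)
    propose    -- hypothetical callable: group -> list of Q supplementary class ids
    other_area -- hypothetical callable: group -> "other class" area of its division result
    """
    trials = {}
    for group in groups:
        for cls in propose(group):                 # step S23: Q supplementary classes
            trial = frozenset(group | {cls})       # step S24: trial attention class group
            if trial not in trials:                # assumption: identical trials scored once
                trials[trial] = other_area(trial)  # step S25: area division
    # Step S28: keep the P trials with the smallest "other class" area.
    return sorted(trials, key=trials.get)[:p]
```

Calling this function repeatedly, and feeding its output back in as `groups`, corresponds to the loop closed by step S29.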

終了条件は、注目クラス群に対応する当該ラベル領域と入力画像に対応する二次元空間との整合の度合いについてのものであり、例えば、当該度合いを表す指標の繰り返しに伴う変化が無い又は所定値以下であれば終了と判定される。具体的には、終了条件は、画像中の「その他クラス」の面積が減少しなくなることや、当該減少が所定値以下となることや、保持している注目クラス群全てにおいて「補足クラスなし」と推定されることや、注目クラス群に既に含まれるクラスが補足クラスとして推定され、新たな注目クラス群が得られなくなることなどとすることができる。また、例えば、整合の度合いを表す指標が所定基準を超える又は下回れば終了と判定される。具体的には、終了条件は、「その他クラス」のラベル領域の面積が入力画像の面積の所定割合以下となること、注目クラス群に既に含まれるクラス数が予め設定した最大個数を超えることなどとすることができる。或いは、終了条件を、上記条件のうちの２以上のいずれかを満たすこととしてもよい。 The end condition concerns the degree of matching between the label areas corresponding to the attention class group and the two-dimensional space corresponding to the input image. For example, the processing is determined to end if an index representing that degree no longer changes across iterations, or changes by no more than a predetermined value. Specifically, the end condition can be that the area of the "other class" in the image no longer decreases, that the decrease is equal to or less than a predetermined value, that "no supplementary class" is estimated for all of the attention class groups held, or that a class already included in the attention class group is estimated as the supplementary class so that no new attention class group is obtained. Also, for example, the processing is determined to end if the index representing the degree of matching exceeds or falls below a predetermined criterion. Specifically, the end condition can be that the area of the "other class" label area is equal to or less than a predetermined ratio of the area of the input image, or that the number of classes included in the attention class group exceeds a preset maximum number. Alternatively, the end condition may be that any of two or more of the above conditions is satisfied.
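
The area-based end conditions above can be combined into a single check, sketched here in Python. The function name `should_stop`, its parameters, and all default thresholds are hypothetical; the specification only says that such values are predetermined. The conditions that depend on the supplementary class estimate ("no supplementary class", or an already included class being estimated) are left out of this sketch.

```python
def should_stop(other_areas, image_area, group_size,
                min_decrease=0, max_other_ratio=0.01, max_classes=10):
    """Decide whether the search of steps S23 to S28 should end (step S29).

    other_areas -- non-empty history of the smallest "other class" area, one per iteration
    image_area  -- area of the input image in pixels
    group_size  -- number of classes in the attention class group
    """
    latest = other_areas[-1]
    if len(other_areas) >= 2 and other_areas[-2] - latest <= min_decrease:
        return True  # the "other class" area no longer decreases enough
    if latest <= max_other_ratio * image_area:
        return True  # the "other class" area is a small enough share of the image
    return group_size > max_classes  # the attention class group has grown too large
```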

画像処理部５は終了条件が満たされた場合（ステップＳ２９にて「ＹＥＳ」の場合）、注目クラス群の探索が終了したと判定し、その結果得られたＰ通りの注目クラス群のうちで上述の整合度合いについて所定の条件を満たすものを最終結果として選択する。本実施形態では、Ｐ通りの注目クラス群のうちで、「その他クラス」の面積が最小であるものに対応するラベル領域を、入力画像の領域分割結果として領域情報出力手段３１により出力する（ステップＳ３０）。 When the end condition is satisfied ("YES" in step S29), the image processing unit 5 determines that the search for attention class groups has finished, and selects as the final result, from among the resulting P attention class groups, the one that satisfies a predetermined condition on the above-mentioned degree of matching. In the present embodiment, the label areas corresponding to the attention class group with the smallest "other class" area among the P attention class groups are output by the area information output means 31 as the area division result of the input image (step S30).

図８は、画像処理システム１の領域分割処理の処理例を説明するための模式図である。図８（ａ）の画像２００は入力画像を示しており、入力画像２００には、壁２０１、窓２０２、人２０３と共に、黒い絨毯が敷かれた床２０４が撮影されている。 FIG. 8 is a schematic diagram for explaining a processing example of the area division processing of the image processing system 1. The image 200 of FIG. 8A shows an input image, in which the wall 201, the window 202, the person 203, and the floor 204 on which the black carpet is laid are photographed.

図８（ｂ）の画像２１０は入力画像２００に対して従来技術により得られるラベル領域を表している。一方、図８（ｃ）の画像２２０は入力画像２００に対して本実施形態の画像処理システム１により得られるラベル領域、また画像２２０ａ〜２２０ｃは当該ラベル領域を得る際の領域分割処理の過程を表している。 The image 210 of FIG. 8B represents the label areas obtained for the input image 200 by the prior art. On the other hand, the image 220 of FIG. 8C represents the label areas obtained for the input image 200 by the image processing system 1 of the present embodiment, and the images 220a to 220c show the course of the area division processing by which those label areas are obtained.

図８（ｂ）に示す従来技術の処理結果では、壁２０１、窓２０２、人２０３が撮影された領域はそれぞれ正しく壁のクラスのラベル領域２１１、窓のクラスのラベル領域２１２、人のクラスのラベル領域２１３として分割されているが、床２０４が撮影された領域は正しく床のクラスとして分割されたラベル領域２１４と、誤って道路のクラスとして分割されたラベル領域２１５とに分かれてしまっている。 In the result of the prior art shown in FIG. 8B, the areas in which the wall 201, the window 202, and the person 203 appear are correctly divided as the label area 211 of the wall class, the label area 212 of the window class, and the label area 213 of the person class, respectively, but the area in which the floor 204 appears is split into the label area 214, correctly assigned to the floor class, and the label area 215, mistakenly assigned to the road class.

図８（ｃ）に示す処理例では、説明を簡単にするために、上述の注目クラス群の保持数Ｐを１とする。その処理過程において、画像２２０ａは注目クラス情報として“床”のクラスを値“１”とし、それ以外のクラスを値“０”としたクラスベクトルを入力したときの分類器のクラス分類部４０３の出力を表しており、ラベル領域として床のクラスのラベル領域２２４が得られ、それ以外の領域は斜線で示す「その他クラス」となっている。補足クラス推定部４０４は補足クラス情報のスコアが最上位の１クラスを補足クラスとし、それにより、この段階では補足クラスが“壁”とされ、“床”、“壁”のクラスが値“１”である新たな注目クラス情報が設定される。 In the processing example shown in FIG. 8C, the number P of attention class groups held is set to 1 for simplicity. In the course of the processing, the image 220a represents the output of the class classification unit 403 of the classifier when a class vector in which the "floor" class has the value "1" and the other classes have the value "0" is input as the attention class information; the label area 224 of the floor class is obtained as a label area, and the remaining area is the "other class" indicated by hatching. The supplementary class estimation unit 404 takes the single class with the highest score in the supplementary class information as the supplementary class, so that at this stage the supplementary class is "wall", and new attention class information in which the "floor" and "wall" classes have the value "1" is set.

画像２２０ｂは当該注目クラス情報を入力したときのクラス分類部４０３の出力を表しており、ラベル領域として床のクラスのラベル領域２２４と壁のクラスのラベル領域２２１とが得られ、それ以外の領域は斜線で示す「その他クラス」となっている。このときの補足クラス推定部４０４は補足クラスを“窓”と推定し、“床”、“壁”、“窓”のクラスが値“１”である新たな注目クラス情報が設定される。 The image 220b shows the output of the class classification unit 403 when the attention class information is input, and the label area 224 of the floor class and the label area 221 of the wall class are obtained as the label areas, and the other areas are obtained. Is the "other class" indicated by the diagonal line. At this time, the supplementary class estimation unit 404 estimates the supplementary class as a “window”, and sets new attention class information in which the “floor”, “wall”, and “window” classes have a value of “1”.

画像２２０ｃは当該注目クラス情報を入力したときのクラス分類部４０３の出力を表しており、ラベル領域として床のクラスのラベル領域２２４、壁のクラスのラベル領域２２１に加え窓のクラスのラベル領域２２２が得られ、それ以外の領域は斜線で示す「その他クラス」となっている。このときの補足クラス推定部４０４は補足クラスを“人”とし、“床”、“壁”、“窓”、“人”のクラスが値“１”である新たな注目クラス情報が設定される。 The image 220c shows the output of the class classification unit 403 when the attention class information is input, and as the label area, the label area 224 of the floor class, the label area 221 of the wall class, and the label area 222 of the window class are shown. Is obtained, and the other areas are "other classes" indicated by diagonal lines. At this time, the supplementary class estimation unit 404 sets the supplementary class to "person", and sets new attention class information in which the classes of "floor", "wall", "window", and "person" have a value of "1". ..

画像２２０は当該注目クラス情報を入力したときのクラス分類部４０３の出力を表しており、入力画像２００は壁のクラスのラベル領域２２１、窓のクラスのラベル領域２２２、人のクラスのラベル領域２２３および床のクラスのラベル領域２２４に分割される。この段階にて、「その他クラス」の領域がなくなることや、補足クラス推定部４０４の出力にて「追加なし」が設定されることといった終了条件が満たされ、領域分割処理が終了する。 The image 220 shows the output of the class classification unit 403 when the attention class information is input, and the input image 200 shows the label area 221 of the wall class, the label area 222 of the window class, and the label area 223 of the person class. And divided into floor class label areas 224. At this stage, the end conditions such as the disappearance of the "other class" area and the setting of "no addition" in the output of the supplementary class estimation unit 404 are satisfied, and the area division process ends.
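
The sequence of class vectors used in the FIG. 8C walk-through can be reproduced with a short Python sketch. The class list and its order are hypothetical (the specification does not fix an ordering), and `class_vector` is an illustrative helper name.

```python
CLASSES = ["floor", "wall", "window", "person", "road"]  # hypothetical class order


def class_vector(attention_group):
    """Encode an attention class group as a 0/1 vector over CLASSES."""
    return [1 if name in attention_group else 0 for name in CLASSES]


group = {"floor"}                      # initial attention class group (image 220a)
vectors = [class_vector(group)]
for supplementary in ["wall", "window", "person"]:  # estimates from the walk-through
    group = group | {supplementary}
    vectors.append(class_vector(group))
```

The four entries of `vectors` correspond to the attention class information behind images 220a, 220b, 220c, and 220, in that order.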

この処理では、入力画像に出現し得るクラスを注目クラス情報で与えることで、例えば、床を注目クラスとしたときに、画像２１０にて床と似た画像特徴を有する道路のラベル領域２１５とされた床の部分が正しく床に誘導されやすくなり誤分類が抑制される。 In this process, by giving the class that can appear in the input image with the attention class information, for example, when the floor is the attention class, the label area 215 of the road having the image feature similar to the floor is set in the image 210. The part of the floor is easily guided to the floor correctly, and misclassification is suppressed.

《第２の実施形態》 << Second Embodiment >>
本発明の第２の実施形態に係る画像処理システム１Ｂは第１の実施形態の図１と共通の構成であり、また第１の実施形態と同様、入力画像に対する領域分割処理を行う領域分割装置、およびその学習装置として動作する。以下、第２の実施形態について、第１の実施形態と同様の構成要素については同一の符号を付して第１の実施形態での説明を援用し、また、説明上、第１の実施形態の構成要素との混同を避ける場合には第１の実施形態の符号の後ろに“Ｂ”を付した符号を用いることとし、主に第１の実施形態との相違点を説明する。 The image processing system 1B according to the second embodiment of the present invention has the same configuration as that shown in FIG. 1 for the first embodiment and, as in the first embodiment, operates as an area division device that performs area division processing on an input image and as its learning device. In the following description of the second embodiment, components similar to those of the first embodiment are given the same reference numerals and the description in the first embodiment is incorporated; where confusion with components of the first embodiment is to be avoided, reference numerals formed by appending "B" to those of the first embodiment are used. The differences from the first embodiment are mainly described.

画像処理システム１Ｂが第１の実施形態の画像処理システム１と基本的に異なる点は、分類器が第１の実施形態で定義した広義のものではなく狭義のものであり、補足クラスの推定器を要しないという点にある。つまり、第１の実施形態の分類器は広義のものとして、クラス分類を行う狭義の分類器と補足クラスを推定する推定器とが合わさったものであったが、第２の実施形態の分類器は当該推定器の機能を含まない。具体的には、第２の実施形態の分類器は図２において補足クラス推定部４０４を省略した構成である。 The image processing system 1B basically differs from the image processing system 1 of the first embodiment in that its classifier is a classifier in the narrow sense rather than in the broad sense defined in the first embodiment, and does not require a supplementary class estimator. That is, the classifier of the first embodiment was, in the broad sense, a combination of a narrow-sense classifier that performs class classification and an estimator that estimates supplementary classes, whereas the classifier of the second embodiment does not include the function of that estimator. Specifically, the classifier of the second embodiment has the configuration of FIG. 2 with the supplementary class estimation unit 404 omitted.

この分類器の構成に対応して、当該分類器の学習装置としての画像処理システム１Ｂでは、推定器の出力である補足クラスに対する正解データとする学習用補足クラスが不要となる。具体的には、画像処理システム１Ｂは学習装置として、図４にて学習用補足クラス生成手段５２を省略した構成である。 Corresponding to the configuration of this classifier, the image processing system 1B as the learning device of the classifier does not require the supplementary class for learning which is the correct answer data for the supplementary class which is the output of the estimator. Specifically, the image processing system 1B has a configuration in which the learning supplementary class generation means 52 is omitted in FIG. 4 as a learning device.

画像処理システム１Ｂにおける注目クラス設定手段５４Ｂは、注目クラス群を複数通り設定する。第１の実施形態では、或る注目クラス情報を分類器に入力しその出力に得られる補足クラスを用いて注目クラス情報を更新することで、複数通りの注目クラス群を逐次的に設定しているのに対し、本実施形態では複数通りの注目クラス群は逐次更新という形態に依らずに設定される。具体的には、注目クラス設定手段５４ＢはＮ個の全クラスを１個ずつ注目クラスとし当該１つの注目クラスからなる注目クラス群をＮ通り設定する。 The attention class setting means 54B in the image processing system 1B sets a plurality of attention class groups. In the first embodiment, a plurality of attention class groups are sequentially set by inputting certain attention class information into the classifier and updating the attention class information using the supplementary class obtained in the output. On the other hand, in the present embodiment, a plurality of types of attention class groups are set regardless of the form of sequential update. Specifically, the attention class setting means 54B sets all N classes as attention classes one by one, and sets N ways of attention class groups including the one attention class.

画像処理システム１Ｂにおける領域分割手段５５Ｂは、注目クラス設定手段５４Ｂによる注目クラス群の複数通りの設定それぞれについて、分類器によりラベル領域を求め、当該ラベル領域に基づいて入力画像に対応する二次元空間を領域分割する。そして、当該領域分割処理で得られる複数通りの領域分割結果のうち、当該領域分割結果を構成するラベル領域と入力画像に対応する二次元空間との整合の度合いについて所定の条件を満たすものを入力画像についての領域分割結果として選択する。 The area dividing means 55B in the image processing system 1B obtains a label area by a classifier for each of a plurality of settings of the attention class group by the attention class setting means 54B, and is a two-dimensional space corresponding to the input image based on the label area. Is divided into areas. Then, among the plurality of types of area division results obtained by the area division process, those that satisfy a predetermined condition regarding the degree of matching between the label area constituting the area division result and the two-dimensional space corresponding to the input image are input. Select as the area division result for the image.

図９は画像処理システム１Ｂの領域分割処理での動作に関する概略のフロー図である。 FIG. 9 is a schematic flow chart relating to the operation of the image processing system 1B in the area division processing.

画像処理システム１Ｂが領域分割処理を開始すると、撮影部２は所定時間おきに監視空間を撮影した画像を順次出力する。画像処理部５は通信部３と協働して、撮影部２から画像を受信するたびに図９のフロー図に示す動作を繰り返す。 When the image processing system 1B starts the area division processing, the photographing unit 2 sequentially outputs the images captured in the monitoring space at predetermined time intervals. The image processing unit 5 cooperates with the communication unit 3 to repeat the operation shown in the flow chart of FIG. 9 every time an image is received from the photographing unit 2.

当該動作にてまず通信部３が画像入力手段３０として機能し、画像を受信すると当該画像を画像処理部５に入力する（ステップＳ４０）。 In this operation, the communication unit 3 first functions as an image input means 30, and when an image is received, the image is input to the image processing unit 5 (step S40).

画像処理部５は入力された画像（入力画像）を分類器の特徴量抽出部４００に入力し、入力画像の特徴量を計算する（ステップＳ４１）。第１の実施形態と同様、ここで計算した特徴量は、１つの入力画像に対する注目クラスを変えた複数回の領域分割処理にて繰り返し利用され、これにより画像処理部５の計算量が削減される。 The image processing unit 5 inputs the input image to the feature amount extraction unit 400 of the classifier and calculates the feature amount of the input image (step S41). As in the first embodiment, the feature amount calculated here is reused across the multiple area division processes performed on one input image with different attention classes, whereby the calculation amount of the image processing unit 5 is reduced.

注目クラス設定手段５４Ｂは、注目クラス情報として、クラスベクトルにて１クラスだけを注目クラス群に設定したものを生成し、これを注目クラス情報圧縮部４０１に入力する（ステップＳ４２）。領域分割手段５５Ｂは、ステップＳ４１で計算した特徴量を用いて分類器によりクラス分類処理を行い、ステップＳ４２にて設定された注目クラス群についてのラベル領域を求める（ステップＳ４３）。 The attention class setting means 54B generates the attention class information in which only one class is set as the attention class group by the class vector, and inputs this to the attention class information compression unit 401 (step S42). The area dividing means 55B performs a class classification process by a classifier using the feature amount calculated in step S41, and obtains a label area for the class of interest set in step S42 (step S43).

画像処理部５はＮ個の全クラスについてステップＳ４２，Ｓ４３を繰り返す（ステップＳ４４にて「ＮＯ」の場合）。これにより、注目クラス設定手段５４ＢはＮ通りの注目クラス群を設定し、領域分割手段５５Ｂは注目クラス群の当該Ｎ通りの設定それぞれについて分類器でラベル領域を求める。 The image processing unit 5 repeats steps S42 and S43 for all N classes (when “NO” in step S44). As a result, the attention class setting means 54B sets N kinds of attention class groups, and the area dividing means 55B obtains a label area with a classifier for each of the N ways of setting the attention class group.

画像処理部５は、全クラスについてそれぞれを注目クラス群とした領域分割処理を終えると（ステップＳ４４にて「ＹＥＳ」の場合）、その処理結果のうち「その他クラス」の部分しかないとされた注目クラス群の設定に対応するものを削除する（ステップＳ４５）。 When the image processing unit 5 finishes the area division processing in which each of the classes is set as the group of interest (in the case of "YES" in step S44), it is determined that there is only the "other class" part of the processing result. The one corresponding to the setting of the attention class group is deleted (step S45).
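
Steps S42 to S45 can be sketched in Python as a loop over all N classes followed by a filter. The sketch assumes that a label area is represented as a set of pixel coordinates and that the classifier is available as a hypothetical callable `segment`, which takes a single-class attention class group and returns the label area found for that class; neither representation is prescribed by the specification.

```python
def single_class_results(n_classes, segment):
    """Run the area division once per class and keep only non-empty label areas.

    segment -- hypothetical callable: frozenset({class_id}) -> set of (row, col) pixels
    """
    results = {}
    for cls in range(n_classes):            # steps S42 to S44: one class at a time
        region = segment(frozenset({cls}))  # step S43: label area for this class
        if region:                          # step S45: drop "other class only" results
            results[cls] = region
    return results
```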

ステップＳ４５で残された領域分割結果には注目クラスについてラベル領域が存在する。領域分割手段５５Ｂは残った領域分割結果を組み合わせ、なるべくラベル領域間の重複がなく、かつ画像全体を埋め尽くすことができるようなラベル領域の組み合わせを作成する。この処理は、例えば次のような手法で行うことができる。 The area division result left in step S45 has a label area for the class of interest. The area division means 55B combines the remaining area division results to create a combination of label areas so that there is as little overlap between the label areas and the entire image can be filled. This process can be performed by, for example, the following method.

画像処理部５はステップＳ４５で残された各領域分割結果をオリジナルの領域分割結果（オリジナル結果）として保持する一方、当該各領域分割結果の複製を作成し探索用の領域分割結果（探索用結果）として保持する（ステップＳ４６）。 The image processing unit 5 holds each area division result left in step S45 as the original area division result (original result), and creates a duplicate of each area division result to search the area division result (search result). ) (Step S46).

ステップＳ４７〜Ｓ５３の処理は終了条件を満たすまで繰り返すループ処理である。その中で行われるステップＳ４７〜Ｓ５２のループ処理は基本的に複数作成される探索用結果についてのものであり、探索用結果を例えばクラスの識別番号順に処理対象として順次設定して実行される。さらにその中で行われるステップＳ４７〜Ｓ４９のループ処理は基本的に複数存在するオリジナル結果についてのものである。 The processing of steps S47 to S53 is a loop processing that is repeated until the end condition is satisfied. The loop processing of steps S47 to S52 performed therein is basically for a plurality of search results to be created, and the search results are sequentially set and executed as processing targets in the order of, for example, the identification number of the class. Further, the loop processing of steps S47 to S49 performed therein is basically for a plurality of original results.

オリジナル結果についてのループ処理は、画像処理部５が、オリジナル結果を、それらが含む注目クラスのラベル領域の面積が大きいものから順に１つずつ選択して（ステップＳ４７）、繰り返される。画像処理部５は選択したラベル領域と処理対象に設定した１つの探索用結果に含まれているラベル領域との重なり度合いを示す指標を計算する（ステップＳ４８）。ここでは当該指標としてＩｏＵ（Intersection over Union）を用いる。ＩｏＵは２つの領域の重複部分（Intersection）の面積をＩ、２つの領域の和領域（Union）の面積をＵとして、ＩｏＵ＝Ｉ／Ｕで与えられ、０〜１の値を取り、０に近いほど２つの領域の重なり度合いが低いことを表す。 The loop processing for the original result is repeated by the image processing unit 5 selecting the original results one by one in order from the one having the largest area of the label area of the attention class included in them (step S47). The image processing unit 5 calculates an index indicating the degree of overlap between the selected label area and the label area included in one search result set as the processing target (step S48). Here, IoU (Intersection over Union) is used as the index. IoU is given by IoU = I / U, where I is the area of the overlapping part (Intersection) of the two regions and U is the area of the sum region (Union) of the two regions. The closer they are, the lower the degree of overlap between the two regions.
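
The IoU of step S48 can be computed directly from its definition. The Python sketch below assumes each label area is a set of pixel coordinates; returning 0.0 for two empty areas is a convention chosen here, not something the specification states.

```python
def iou(region_a, region_b):
    """Intersection over Union of two label areas given as sets of pixels."""
    union = len(region_a | region_b)
    if union == 0:
        return 0.0  # convention for two empty areas (assumption)
    return len(region_a & region_b) / union


a = {(0, 0), (0, 1)}
b = {(0, 1), (1, 1)}
overlap = iou(a, b)  # one shared pixel out of three in the union
```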

画像処理部５は、ＩｏＵが予め定めた閾値Ｔより大きく、且つ、処理対象としている探索用結果とのＩｏＵを計算していないオリジナル結果がある場合は、他のオリジナル結果についてステップＳ４７，Ｓ４８を繰り返す（ステップＳ４９にて「ＮＯ」の場合）。一方、ＩｏＵが閾値Ｔ以下であった場合、または、全てのオリジナル結果についてＩｏＵを計算し終えた場合は（ステップＳ４９にて「ＹＥＳ」の場合）、ステップＳ５０に処理を進める。 If the IoU is greater than a predetermined threshold T and there remain original results whose IoU with the search result being processed has not yet been calculated, the image processing unit 5 repeats steps S47 and S48 for another original result ("NO" in step S49). On the other hand, if the IoU is equal to or less than the threshold T, or if the IoU has been calculated for all original results ("YES" in step S49), the processing proceeds to step S50.

ステップＳ５０では画像処理部５は、処理対象に設定されている探索用結果とのＩｏＵが閾値Ｔ以下のオリジナル結果が存在するか否かを調べ、存在する場合には（ステップＳ５０にて「ＹＥＳ」の場合）、当該オリジナル結果に含まれている注目クラスのラベル領域を探索用結果に含まれているラベル領域とマージし（ステップＳ５１）、一方、存在しない場合には（ステップＳ５０にて「ＮＯ」の場合）、ステップＳ５１は省略される。ステップＳ５１にて画像処理部５は、マージした結果で探索用結果のラベル領域を更新する。なお、ラベル領域が重なった部分は、探索用結果のラベル領域を優先し残す。 In step S50, the image processing unit 5 checks whether there is an original result whose IoU with the search result being processed is equal to or less than the threshold T. If there is ("YES" in step S50), it merges the label area of the attention class included in that original result with the label areas included in the search result (step S51); if there is not ("NO" in step S50), step S51 is skipped. In step S51, the image processing unit 5 updates the label areas of the search result with the merged result. Where label areas overlap, the label area of the search result takes priority and is kept.
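
The merge of step S51, including the rule that the search result wins where label areas overlap, can be sketched in Python as follows. The sketch assumes a search result is stored as a mapping from pixel to class and an original result as a class name plus a set of pixels; `merge_into` is a hypothetical helper name.

```python
def merge_into(search_labels, cls, region):
    """Return a copy of the search result with the original label area merged in.

    search_labels -- dict mapping pixel -> class already assigned in the search result
    cls, region   -- class and label area (set of pixels) of the original result
    """
    merged = dict(search_labels)
    for pixel in region:
        merged.setdefault(pixel, cls)  # pixels already labeled keep their label
    return merged


search = {(0, 0): "floor", (0, 1): "floor"}
updated = merge_into(search, "wall", {(0, 1), (1, 1)})
```

Here pixel (0, 1) stays "floor" because the search result has priority, while (1, 1) is newly labeled "wall".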

画像処理部５はステップＳ４７〜Ｓ５１の処理を処理対象の探索用結果を変えて反復する（ステップＳ５２にて「ＮＯ」の場合）。全ての探索用結果を処理対象とし終えた場合は（ステップＳ５２にて「ＹＥＳ」の場合）、終了判定を行う（ステップＳ５３）。終了判定にて、所定の終了条件が満たされていない場合は（ステップＳ５３にて「ＮＯ」の場合）、ステップＳ４７〜Ｓ５２の処理を繰り返す。ここで、ステップＳ５１のマージ処理により探索用結果におけるラベル領域は拡大し、ステップＳ４７〜Ｓ５２の処理を繰り返すことで、探索用結果におけるラベル領域以外である「その他クラス」の領域は基本的に徐々に減少する。 The image processing unit 5 repeats the processing of steps S47 to S51 while changing the search result being processed ("NO" in step S52). When all search results have been processed ("YES" in step S52), it makes an end determination (step S53). If the predetermined end condition is not satisfied ("NO" in step S53), the processing of steps S47 to S52 is repeated. Here, the merge processing of step S51 expands the label areas in a search result, and as steps S47 to S52 are repeated, the "other class" area of the search result, that is, the area outside its label areas, basically decreases gradually.

ステップＳ５３の終了判定における終了条件は、探索用分類結果として得られている領域分割結果を構成するラベル領域と入力画像に対応する二次元空間との整合の度合いについてのものであり、例えば、当該度合いを表す指標の繰り返しに伴う変化が無い又は所定値以下であれば終了と判定される。具体的には、終了条件は、追加できるオリジナルの分類結果のラベル領域が無くなることや、そのことによって探索用分類結果における「その他クラス」の領域が減少しなくなることや、当該減少が所定値以下となることとすることができる。また、例えば、整合の度合いを表す指標が所定基準を超える又は下回れば終了と判定される。具体的には、終了条件は、探索用分類結果のラベル領域に含まれるクラス数が予め設定した最大個数を超えることや、「その他クラス」のラベル領域の面積が入力画像の面積の所定割合以下となることとすることができる。或いは、終了条件を、上記条件のうちの２以上のいずれかを満たすこととしてもよい。 The end condition in the end determination of step S53 concerns the degree of matching between the label areas constituting the area division result obtained as a search result and the two-dimensional space corresponding to the input image. For example, the processing is determined to end if an index representing that degree no longer changes across iterations, or changes by no more than a predetermined value. Specifically, the end condition can be that there is no longer any label area of an original result that can be added, that the "other class" area in the search result consequently no longer decreases, or that the decrease is equal to or less than a predetermined value. Also, for example, the processing is determined to end if the index representing the degree of matching exceeds or falls below a predetermined criterion. Specifically, the end condition can be that the number of classes included in the label areas of the search result exceeds a preset maximum number, or that the area of the "other class" label area is equal to or less than a predetermined ratio of the area of the input image. Alternatively, the end condition may be that any of two or more of the above conditions is satisfied.

When the end condition is satisfied, the image processing unit 5 ends the search process ("YES" in step S53) and outputs, via the area information output means 31, the label areas of the search result whose "other class" area is the smallest as the area division result of the input image (step S54).
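The selection in step S54 amounts to taking the minimum over the search results, keyed on the "other class" area. The sketch below assumes, purely for illustration, that each search result is a two-dimensional list of class labels and that the label value -1 marks "other class" pixels; neither representation is specified in the source.

```python
OTHER = -1  # hypothetical label value marking "other class" pixels


def other_area(label_map):
    """Count the pixels of a label map that belong to the "other class"."""
    return sum(row.count(OTHER) for row in label_map)


def select_result(search_results):
    """Return the search result whose "other class" area is the smallest."""
    return min(search_results, key=other_area)
```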

<< Third Embodiment >>
The image processing system 1C according to the third embodiment of the present invention has the same configuration as FIG. 1 of the first embodiment and, as in the first embodiment, operates as an area division device that performs area division processing on an input image, and as its learning device. In the following, components of the third embodiment that are the same as in the first embodiment are given the same reference numerals and the description given in the first embodiment is relied on. Where confusion with a component of the first embodiment is to be avoided, the reference numeral of the first embodiment followed by "C" is used. The description focuses on the differences from the first embodiment.

As in the second embodiment, the classifier of the image processing system 1C does not include a supplementary class estimator; that is, it is the classifier in the narrow sense described above and has no supplementary class estimation unit 404. Accordingly, as in the second embodiment, the learning device does not require the learning supplementary class generating means 52.

In the first and second embodiments, the class classification unit 403 could be configured to output the portions of an image not classified into an attention class as the "other class" without identifying a specific class. In this embodiment, by contrast, the class classification unit 403 outputs label areas for classes other than the attention classes as well.

The attention class information in this embodiment serves as bias information given in order to bias the class classification process. Specifically, the attention class information input to the classifier is defined as an N-dimensional class vector in which the element of each class to be made more likely to appear in the classification result is set to "1" and the element of each class to be made less likely to appear is set to "0".
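As a small illustration of this N-dimensional class vector, the following Python sketch builds one from a set of attention class indices. The function name and the use of zero-based integer class indices are assumptions for illustration only.

```python
def make_class_vector(num_classes, attention_classes):
    """Build an N-dimensional 0/1 vector: 1 for attention classes, 0 otherwise.

    Class indices are assumed to run from 0 to num_classes - 1 (hypothetical).
    """
    attention = set(attention_classes)
    return [1 if c in attention else 0 for c in range(num_classes)]
```

For example, with five classes and attention classes 1 and 3, the vector is `[0, 1, 0, 1, 0]`.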

In the learning operation of the classifier, the original correct label areas, not the replaced correct label areas, are input to the learning means 53, and the learning attention class generating means 51 inputs the correct classes for the learning image to the learning means 53 as the learning attention class information. The learning device of this embodiment therefore does not require the correct label replacement means 50 either.

The attention class setting means 54C of the image processing system 1C determines a supplementary class based on the output of the class classification unit 403 and sets a new attention class group by adding that supplementary class to the current attention class group. The attention class setting means 54C thus differs from the first embodiment in that it determines the supplementary class without using the supplementary class estimation unit 404. Like the attention class setting means 54 of the first embodiment, however, it sets a plurality of attention class groups one after another by adding a supplementary class to the attention class group to form a new attention class group.

Specifically, the classes that the output of the class classification unit 403 indicates as appearing in the input image and that are not included in the current attention class group are taken as supplementary class candidates, and the supplementary class is selected from among them. For example, the attention class setting means 54C selects the supplementary class candidate with the largest label area as the supplementary class and updates the attention class group. In the updated attention class information, the elements corresponding to the pre-update attention class group and to the supplementary class are set to "1". The elements of all other classes, namely the supplementary class candidates not selected as the supplementary class and the classes judged not to appear in the image, are set to "0".

The attention class setting means 54C, for example, sets a class vector in which all N classes have the value "0" as the initial value of the attention class information, and repeats the process of adding supplementary classes one at a time based on the classification result. The image processing unit 5 then outputs, via the area information output means 31, the label areas given by the output of the class classification unit 403 at the time the end condition is satisfied as the area division result of the input image.
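The iterative procedure of the third embodiment can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: `classify` is a hypothetical callable standing in for the trained classifier, taking an image and a class vector and returning a two-dimensional list of class indices. The only end conditions modeled are "no supplementary class candidate remains" and a hypothetical iteration cap `max_iters`.

```python
from collections import Counter


def segment_with_growing_attention(classify, image, num_classes, max_iters=20):
    """Grow the attention class group one supplementary class at a time.

    classify(image, class_vector) is a hypothetical stand-in for the trained
    classifier; it must return a 2D list of class indices (a label map).
    """
    attention = []  # attention classes, in the order they were added
    label_map = None
    for _ in range(max_iters):
        # Start from the all-zero vector; set 1 for every current attention class.
        vector = [1 if c in attention else 0 for c in range(num_classes)]
        label_map = classify(image, vector)
        # Area (pixel count) of each class that appears in the output.
        areas = Counter(label for row in label_map for label in row)
        # Candidates: classes that appear but are not yet attention classes.
        candidates = {c: a for c, a in areas.items() if c not in attention}
        if not candidates:
            break  # the attention class information can no longer be updated
        # Add the candidate with the largest label area as the supplementary class.
        attention.append(max(candidates, key=candidates.get))
    return label_map, attention
```

Returning the order in which classes were added makes it easy to inspect how the attention class group grew from the empty set.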

As the end condition, a condition on the degree of matching between the label areas corresponding to the attention class group and the two-dimensional space corresponding to the input image can be set. For example, the process is judged to end when an index representing this degree shows no change across repetitions, or a change no greater than a predetermined value. Specifically, the end condition can be that the attention class information is no longer updated, that the area of the label areas corresponding to the attention class group stops increasing, or that the increase is no greater than a predetermined value. As another example, the process is judged to end when the index representing the degree of matching exceeds or falls below a predetermined criterion. Specifically, the end condition can be that the area of the label areas corresponding to the attention class group has reached at least a predetermined fraction of the area of the input image, or that the area of the label areas not corresponding to the attention class group has fallen to no more than a predetermined fraction of the area of the input image. The end condition can also be, for example, that the number of repetitions of the class classification process has exceeded a preset maximum number. Alternatively, the end condition may be that any one of two or more of the above conditions is satisfied.
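A few of these end conditions can be expressed as a short Python sketch. The helper names, parameters, and default thresholds are hypothetical; the label map is assumed to be a two-dimensional list of class indices and the attention class information a 0/1 class vector.

```python
def attention_area(label_map, class_vector):
    """Count pixels whose class has the value 1 in the attention class vector."""
    return sum(1 for row in label_map for label in row if class_vector[label] == 1)


def iteration_should_stop(prev_area, curr_area, image_area, iteration,
                          min_increase=0, min_ratio=0.95, max_iterations=20):
    """Return True when any of the example end conditions holds (thresholds hypothetical)."""
    # The attention-class label area stopped growing, or grew by no more than min_increase.
    if curr_area - prev_area <= min_increase:
        return True
    # The attention-class label area covers at least min_ratio of the input image.
    if curr_area >= min_ratio * image_area:
        return True
    # The classification process has been repeated more than max_iterations times.
    return iteration > max_iterations
```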

[Modifications]
(1) Each of the above embodiments uses a two-dimensional image as the data group, but the data group is not limited to this. For example, the data group can be a time series of two-dimensional images, in which case the space is a spatiotemporal space and each datum is a pixel. The data group can also be a distance image, with the space being a two-dimensional space and each datum a pixel (distance value); in that case the photographing unit 2 is a distance image sensor. The data group can also be three-dimensional measurement data such as a point cloud, with the space being a three-dimensional space and each datum a measurement point; in that case a three-dimensional measuring instrument is used in place of the photographing unit 2.

(2) In the first embodiment and its modifications above, the learning model of the narrow-sense classifier and the learning model of the estimator share the feature amount extraction unit 400, but the two learning models may instead be separate models with no common part. In that case, the classifier and the estimator may be trained on common learning data or on separate learning data.

(3) In each of the above embodiments and modifications, the attention class information compression unit 401 is generated by training it simultaneously and in parallel with the feature amount extraction unit 400 and the class classification unit 403. Alternatively, the attention class information compression unit 401 may be generated separately, for example by a principal component analysis performed in advance on the basis of the appearance tendencies of the classes in the learning data.

(4) Each of the above embodiments and modifications describes an example in which the attention class information compression unit 401 of the classifier reduces the dimensionality of the attention class information. However, the attention class information compression unit 401 may be omitted, and the input attention class information may be combined as it is with the image feature amount from the feature amount extraction unit 400 in the feature amount synthesis unit 402.
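One possible way to combine an uncompressed class vector with image features is sketched below in Python. The source does not state how the feature amount synthesis unit 402 performs the combination; channel-wise concatenation at every spatial position is an assumption chosen for illustration, as are the function name and the nested-list representation of the feature map (height x width x channels).

```python
def combine_features(feature_map, class_vector):
    """Append the class vector to the feature vector at every spatial position.

    Concatenation is an assumed combination method, not one confirmed by the source.
    feature_map is a nested list indexed as [row][column][channel].
    """
    return [[list(pixel) + list(class_vector) for pixel in row]
            for row in feature_map]
```

With C feature channels and N classes, each position then carries C + N values, so the layers that follow would need to accept the wider input.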

According to the area division device, method, and program described above, area division is performed by a classifier (in the narrow sense) that has been trained, by receiving attention class information (an attention class group) together with an image (a data group), to perform highly accurate class classification in which variation caused by the diversity of the learning data and by mixed labeling criteria in the learning data is suppressed. Highly accurate area division with suppressed variation therefore becomes possible.

In particular, area division is performed with the attention class group set in a plurality of ways for a single data group, and the area division result whose label areas match the space in which the data group is distributed to a degree satisfying a predetermined condition is selected. This raises the accuracy of the attention class group, so that highly accurate area division with suppressed variation can be executed reliably.

Also, as illustrated in the first embodiment and its modifications, repeating area division while adding supplementary classes to the attention class group until the condition is satisfied makes it possible to reliably execute highly accurate area division with suppressed variation. That is, the smaller the subset that forms the attention class group, the stronger the effect of suppressing variation caused by diversity and mixed labeling criteria. Performing area division while growing the attention class group from a small subset to progressively larger subsets therefore makes reliable execution of highly accurate area division with suppressed variation possible.

Also, as illustrated in the first embodiment and its modifications, the accuracy of the attention class group can be raised further by repeating area division while estimating the supplementary class with an estimator that has been trained to estimate, with high accuracy, the supplementary class that should be added to bring the attention class group closer to the correct answer.

1 image processing system, 2 photographing unit, 3 communication unit, 4 storage unit, 5 image processing unit, 6 display unit, 40 learning data storage means, 41 learning model storage means, 42 model storage means, 50 correct label replacement means, 51 learning attention class generating means, 52 learning supplementary class generating means, 53 learning means, 54 attention class setting means, 55 area division means, 56 classifier, 400 feature amount extraction unit, 401 attention class information compression unit, 402 feature amount synthesis unit, 403 class classification unit, 404 supplementary class estimation unit.

Claims

An area division device that performs classification processing to classify a data group distributed in a predetermined space into a plurality of classes and divides the space into label areas identified by the classes, characterized by having:
a model storage means that stores, as a classifier that receives the data group and an attention class group as input, performs the classification processing on the data group, and outputs the label areas for the attention class group, a trained model trained in advance using a learning data group, correct classes given for the learning data group, and a learning attention class group given as a subset of the correct classes;
an attention class setting means that sets the attention class group for the data group in a plurality of ways; and
an area division means that obtains the label areas with the classifier for each of the plurality of settings of the attention class group made by the attention class setting means, and selects, from among the area division results of the space based on the label areas, as the area division result for the data group, one for which the degree of matching between the label areas constituting the area division result and the space satisfies a predetermined condition.

The area division device according to claim 1, characterized in that:
the attention class setting means sets the plurality of attention class groups one after another by adding a supplementary class to the attention class group to form a new attention class group; and the area division means selects, as the area division result for the data group, a label area, among those output by the classifier for the plurality of attention class groups, whose size is equal to or larger than a predetermined criterion.

The area division device according to claim 2, characterized in that:
the trained model has further been trained, as an estimator that receives the data group and the attention class group as input and estimates the supplementary class, using correct supplementary classes for the learning data group.

An area division method that classifies a data group distributed in a space into a plurality of classes and divides the space into label areas identified by the classes, characterized by having:
a step of preparing, as a classifier that receives the data group and an attention class group as input, performs the classification processing on the data group, and outputs the label areas for the attention class group, a trained model trained in advance using a learning data group, correct classes given for the learning data group, and a learning attention class group given as a subset of the correct classes;
an attention class setting step of setting the attention class group for the data group in a plurality of ways; and
an area division step of obtaining the label areas with the classifier for each of the plurality of settings of the attention class group made in the attention class setting step, and selecting, from among the area division results of the space based on the label areas, as the area division result for the data group, one for which the degree of matching between the label areas constituting the area division result and the space satisfies a predetermined condition.

An area division program that causes a computer to perform classification processing to classify a data group distributed in a space into a plurality of classes and to divide the space into label areas identified by the classes, characterized by causing the computer to function as:
a model storage means that stores, as a classifier that receives the data group and an attention class group as input, performs the classification processing on the data group, and outputs the label areas for the attention class group, a trained model trained in advance using a learning data group, correct classes given for the learning data group, and a learning attention class group given as a subset of the correct classes;
an attention class setting means that sets the attention class group for the data group in a plurality of ways; and
an area division means that obtains the label areas with the classifier for each of the plurality of settings of the attention class group made by the attention class setting means, and selects, from among the area division results of the space based on the label areas, as the area division result for the data group, one for which the degree of matching between the label areas constituting the area division result and the space satisfies a predetermined condition.