JP7386006B2

JP7386006B2 - Region division device, region division method, region division program, learning device, learning method, and learning program

Info

Publication number: JP7386006B2
Application number: JP2019121964A
Authority: JP
Inventors: 智之吉山
Original assignee: Secom Co Ltd
Current assignee: Secom Co Ltd
Priority date: 2019-06-28
Filing date: 2019-06-28
Publication date: 2023-11-24
Anticipated expiration: 2039-06-28
Also published as: JP2021009484A

Description

The present invention relates to a technique for classifying a data group, such as an image, into classes, such as subjects, and dividing the data group into label regions, and to a technique for performing learning related to that classification.

For purposes such as automatically recognizing the scene captured in an image, techniques have been researched and developed that divide an image into regions for each of the multiple objects or parts captured in it and recognize the object or part captured in each region. Hereinafter, an object or part being photographed is referred to as a subject. Region division accompanied by subject recognition is called semantic segmentation.

In particular, techniques that perform this division and recognition based on learning have been actively researched in recent years. For example, Non-Patent Document 1 below describes preparing a large number of learning images in which a class representing the subject is assigned to each pixel of regions divided in advance by subject, and having a computer perform machine learning on these images. The information assigned in advance is called an annotation. When an arbitrary image is input to the trained model generated by this learning, a class is output for each pixel of the input image. In other words, the input image is divided, subject by subject, into regions labeled with classes (label regions).

Furthermore, datasets consisting of learning images and annotations have recently been published and made available. In principle, a trained model that has learned from more diverse data can divide regions more accurately, so a larger dataset is desirable for learning.

"Fully Convolutional Networks for Semantic Segmentation", Jonathan Long, Evan Shelhamer, and Trevor Darrell (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015)

However, the diversity of learning data and the mixing of annotations assigned under different criteria make region division results prone to fluctuation. The mixing of annotations with different assignment criteria also lowers learning accuracy.

For example, if images of black carpet and similar-looking images of asphalt are both used for learning, a floor region covered with black carpet is sometimes correctly divided off as a floor region, but part or all of it is sometimes erroneously divided off as a road region. This is an example of the diversity of learning making region division results prone to fluctuation.

Likewise, when an image of a baseball field is input, the turf region in the image is sometimes divided off as a grass region and sometimes as part of a playing-field region. This is an example of mixed annotation criteria making region division results prone to fluctuation. In published datasets, different assignment criteria can coexist: in one learning image of a baseball field, the turf region is given a label representing "grass" and the dirt region a label representing "soil", whereas in another learning image of a baseball field, the combined turf and dirt region is given a label representing "playing field". In other words, both grass and playing field are correct answers for the turf region. As a result, the output tends to vary with differences between input images.

From another perspective, the existence of multiple correct answers, as in the turf example, makes it difficult for learning to converge. The mixing of annotations with different assignment criteria is therefore also a cause of reduced learning accuracy.

Note that the above problems can arise not only in two-dimensional images but also in spatio-temporal data formed from time-series images, in three-dimensional data such as point clouds, and so on.

The present invention has been made in view of the above problems, and one object is to provide a region division technique that can suppress fluctuation in region division results. Another object is to provide a learning technique that can prevent a decrease in learning accuracy even when annotations with different assignment criteria are mixed in the learning data used to learn region division processing.

(1) A region dividing device according to the present invention is a device that performs classification processing for classifying a data group distributed in a space into a plurality of classes and divides the space into label regions identified by the classes, the device comprising: classifier storage means that stores, as a classifier that receives the data group and bias information for biasing the classification processing and performs the classification processing on the data group, a trained model trained using a learning data group, a correct class given in advance for the learning data group, and learning bias information, which is the bias information derived from the correct class; and region dividing means that inputs the data group and the bias information for the data group to the classifier and obtains the label regions based on the class classification result that the classifier outputs.

(2) The region dividing device according to (1) above may further comprise: bias input means that inputs the bias information, which has elements in one-to-one correspondence with each of the predefined classes; and bias information compression means that dimensionally compresses the bias information from the bias input means and supplies it to the classification processing.

(3) In the region dividing device according to (2) above, the classifier may include: a feature synthesis unit that generates a composite feature combining the dimensionally compressed bias information with a feature of the data group; and a class classification unit that performs the classification processing based on the composite feature.

(4) In the region dividing device according to any of (1) to (3) above, the bias information input to the classifier may specify classes to be made more likely to appear, or classes to be made less likely to appear, in the class classification result.

(5) In the region dividing device according to (4) above, the bias information input to the classifier may further specify the degree to which a class is made more or less likely to appear in the class classification result.

(6) A learning device according to the present invention is a device that trains a classifier that performs classification processing for classifying a data group distributed in a space into a plurality of classes, the device comprising: learning model storage means that stores, as the classifier, a learning model that receives the data group and bias information for biasing the classification processing and outputs a class classification result for the data group; learning data storage means that stores a learning data group, a correct class given in advance for the learning data group, and learning bias information, which is the bias information derived from the correct class; and learning means that inputs the learning data group and the learning bias information to the learning model and performs learning that updates the learning model based on the error of the output class classification result with respect to the correct answer.

(7) The learning device according to (6) above may further comprise learning bias generation means that generates, for each learning data group, the learning bias information having elements in one-to-one correspondence with each of the predefined classes, in which the correct class given to the learning data group is designated as a class to be made more likely to appear in the class classification result and every class other than the correct class is designated as a class to be made less likely to appear.

(8) A region dividing method according to the present invention is a method that performs classification processing for classifying a data group distributed in a space into a plurality of classes and divides the space into label regions identified by the classes, the method comprising: a step of preparing, as a classifier that receives the data group and bias information for biasing the classification processing and performs the classification processing on the data group, a trained model trained using a learning data group, a correct class given in advance for the learning data group, and learning bias information, which is the bias information derived from the correct class; and a region dividing step of inputting the data group and the bias information for the data group to the classifier and obtaining the label regions based on the class classification result that the classifier outputs.

(9) A learning method according to the present invention is a method of training a classifier that performs classification processing for classifying a data group distributed in a space into a plurality of classes, the method comprising: a step of preparing, as the classifier, a learning model that receives the data group and bias information for biasing the classification processing and outputs a class classification result for the data group; a step of preparing a learning data group, a correct class given in advance for the learning data group, and learning bias information, which is the bias information derived from the correct class; and a learning step of inputting the learning data group and the learning bias information to the learning model and performing learning that updates the learning model based on the error of the output class classification result with respect to the correct answer.

(10) A region dividing program according to the present invention is a program that causes a computer to perform classification processing for classifying a data group distributed in a space into a plurality of classes and to divide the space into label regions identified by the classes, the program causing the computer to function as: classifier storage means that stores, as a classifier that receives the data group and bias information for biasing the classification processing and performs the classification processing on the data group, a trained model trained using a learning data group, a correct class given in advance for the learning data group, and learning bias information, which is the bias information derived from the correct class; and region dividing means that inputs the data group and the bias information for the data group to the classifier and obtains the label regions based on the class classification result that the classifier outputs.

(11) A learning program according to the present invention is a program that causes a computer to train a classifier that performs classification processing for classifying a data group distributed in a space into a plurality of classes, the program causing the computer to function as: learning model storage means that stores, as the classifier, a learning model that receives the data group and bias information for biasing the classification processing and outputs a class classification result for the data group; learning data storage means that stores a learning data group, a correct class given in advance for the learning data group, and learning bias information, which is the bias information derived from the correct class; and learning means that inputs the learning data group and the learning bias information to the learning model and performs learning that updates the learning model based on the error of the output class classification result with respect to the correct answer.
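As an illustration of the learning bias generation described in (7), the following is a minimal sketch, not the patented implementation. It assumes that the annotation is a two-dimensional grid of integer class indices and that "more likely" and "less likely" are encoded as 1 and 0; the function name `make_learning_bias` is hypothetical.

```python
from typing import List


def make_learning_bias(annotation: List[List[int]], num_classes: int) -> List[float]:
    """Derive a learning bias vector from a per-pixel annotation.

    Assumption: annotation[y][x] is the index of the correct class of a pixel.
    Classes present in the annotation get 1.0 (more likely to appear);
    all other classes get 0.0 (less likely to appear).
    """
    present = {class_index for row in annotation for class_index in row}
    return [1.0 if c in present else 0.0 for c in range(num_classes)]


# Hypothetical 2x3 annotation that uses classes 0 and 3 out of 5 classes.
annotation = [[0, 0, 3],
              [0, 3, 3]]
learning_bias = make_learning_bias(annotation, num_classes=5)
```

Under this sketch, two images of the same scene annotated under different criteria yield different learning bias vectors, so the model can be told which criterion applies to each image.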

According to the present invention, fluctuation in region division results can be suppressed. In addition, according to the present invention, a decrease in learning accuracy can be prevented even when annotations with different assignment criteria are mixed in the learning data used to learn region division processing.

FIG. 1 is a block diagram showing the schematic configuration of a region dividing device according to an embodiment of the present invention.
FIG. 2 is a schematic functional block diagram of the region dividing device according to the embodiment of the present invention when performing segmentation.
FIG. 3 is a schematic functional block diagram of the classifier used in the region dividing device according to the embodiment of the present invention.
FIG. 4 is a schematic functional block diagram of the region dividing device according to the embodiment of the present invention as a learning device for the classifier.
FIG. 5 is a schematic flow diagram of the operation of the region dividing device according to the embodiment of the present invention in region division processing.
FIG. 6 is a schematic diagram illustrating the process of generating a composite feature.
FIG. 7 is a schematic diagram for explaining a processing example of the region division processing of the region dividing device according to the embodiment of the present invention.
FIG. 8 is a schematic flow diagram of the operation of the region dividing device according to the embodiment of the present invention in learning processing.

A region dividing device 1 that is an embodiment of the present invention (hereinafter, the embodiment) is described below with reference to the drawings. The region dividing device according to the present invention performs classification processing for classifying a data group distributed in a space into a plurality of classes and divides the space into label regions identified by the classes. The region dividing device 1 shown as an example in this embodiment divides an image of a monitored space into regions. That is, in this embodiment, the data group to be classified is a two-dimensional image, the data constituting it are pixels, and the space to be divided is the two-dimensional space corresponding to the image.

The region dividing device 1 includes a classifier that performs the above classification processing. The region dividing device 1 also includes a learning device that trains the classifier.

[Configuration of the region dividing device 1]
FIG. 1 is a block diagram showing the schematic configuration of the region dividing device 1. The region dividing device 1 consists of an imaging unit 2, a communication unit 3, a storage unit 4, an image processing unit 5, a display unit 6, and an operation input unit 7.

The imaging unit 2 is a camera that acquires images as the data group to be classified; in this embodiment it is a surveillance camera. The imaging unit 2 is connected to the image processing unit 5 via the communication unit 3, photographs the monitored space at predetermined time intervals to generate images, and sequentially inputs the generated images to the image processing unit 5. For example, the imaging unit 2 is installed in a corner of a room that is the monitored space, with a predetermined fixed field of view overlooking that space, and photographs the monitored space at a frame period of 1 second to generate color images. The imaging unit 2 may generate monochrome images instead of color images.

The communication unit 3 is a communication circuit, one end of which is connected to the image processing unit 5 and the other end of which is connected to the imaging unit 2, the display unit 6, and the operation input unit 7. The communication unit 3 acquires images from the imaging unit 2 and inputs them to the image processing unit 5, and acquires user instructions and the like from the operation input unit 7 and inputs them to the image processing unit 5. The communication unit 3 also receives the results of classification into classes and of segmentation into label regions from the image processing unit 5 and outputs them to the display unit 6.

The imaging unit 2, communication unit 3, storage unit 4, image processing unit 5, display unit 6, and operation input unit 7 are connected to one another in whatever form suits the installation location of each unit. For example, when the imaging unit 2 is installed remotely from the communication unit 3 and the image processing unit 5, the imaging unit 2 and the communication unit 3 can be connected via an Internet line. The communication unit 3 and the image processing unit 5 can be connected by a bus. A LAN (Local Area Network), various cables, and the like can also be used as connection means.

The storage unit 4 is a memory device such as a ROM (Read Only Memory) or RAM (Random Access Memory) and stores various programs and data. For example, the storage unit 4 stores learning data and information on the classifier, which is a trained model, and exchanges this information with the image processing unit 5. That is, information used for training the classifier, information needed for classification processing, and information generated in the course of that processing are input and output between the storage unit 4 and the image processing unit 5.

The image processing unit 5 is composed of arithmetic devices such as a CPU (Central Processing Unit), DSP (Digital Signal Processor), MCU (Micro Control Unit), or GPU (Graphics Processing Unit). By reading programs from the storage unit 4 and executing them, the image processing unit 5 operates as various processing means and control means; as needed, it reads various data from the storage unit 4 and stores the data it generates in the storage unit 4. For example, the image processing unit 5 trains and generates the classifier and stores the generated classifier in the storage unit 4 via the communication unit 3. The image processing unit 5 also segments captured images using the classifier.

The display unit 6 is a liquid crystal display, an organic EL (Electro-Luminescence) display, or the like, and displays the results of classification processing and segmentation input from the image processing unit 5 via the communication unit 3.

The operation input unit 7 is an input device for the image processing unit 5 and consists of a keyboard, a mouse, and the like.

The region dividing device 1 is a device that classifies each pixel of an image using a classifier and divides the image into label regions, and it also functions as a learning device that performs the learning operation for constructing that classifier. The configuration of the region dividing device 1 is described below, first the configuration related to segmentation processing, that is, the configuration as a region dividing device, and then the configuration as a learning device.

[Configuration as a region dividing device]
FIG. 2 is a schematic functional block diagram of the region dividing device 1 when performing segmentation. The storage unit 4 functions as classifier storage means 40, and the image processing unit 5 functions as region dividing means 50. The communication unit 3 cooperates with the image processing unit 5 to function as image input means 30 and region information output means 31. The operation input unit 7 and the communication unit 3 function as bias input means 70.

The image input means 30 sequentially acquires the images captured by the imaging unit 2 and inputs them to the region dividing means 50.

The bias input means 70 passes to the region dividing means 50 the bias information (input bias information) that the user enters by operating the operation input unit 7. Bias information is information supplied in order to bias the classification processing. Input bias information, in particular, is bias information expressed in a form the user can understand. The input bias information can be a vector having elements in one-to-one correspondence with each of the predefined classes (an N-dimensional vector, where N is the total number of classes). For example, the user sets input bias information in which the element of each class that should appear more readily in the class classification result is set to the value "1" and the element of each class that should appear less readily is set to the value "0". The bias input means 70 may also be means that reads and inputs a file in which bias information is recorded in a user-understandable form.
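The N-dimensional input bias vector described above can be sketched as follows. This is only an illustrative sketch: the class list `CLASSES` and the helper `make_input_bias` are hypothetical names, and the five classes are chosen arbitrarily for the example.

```python
from typing import Iterable, List, Sequence


def make_input_bias(all_classes: Sequence[str], likely: Iterable[str]) -> List[float]:
    """Build a vector with one element per predefined class.

    Classes listed in `likely` are set to 1.0 (should appear readily);
    every other class is set to 0.0 (should appear less readily).
    """
    likely_set = set(likely)
    unknown = likely_set - set(all_classes)
    if unknown:
        raise ValueError(f"unknown class names: {sorted(unknown)}")
    return [1.0 if name in likely_set else 0.0 for name in all_classes]


# Hypothetical class list (N = 5) for an indoor monitored space.
CLASSES = ["floor", "wall", "road", "grass", "soil"]
bias = make_input_bias(CLASSES, ["floor", "wall"])
```

With this input, a black-carpeted floor would be steered toward the "floor" class rather than "road", which is the kind of control the bias information is intended to give.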

The classifier storage means 40 stores the classifier generated by learning. In this embodiment, the classifier is a multilayer network of the kind used in deep learning and can be modeled by, for example, a convolutional neural network (CNN). The classifier storage means 40 stores, as the classifier, information including the filter coefficients of the filters that make up a network such as a CNN and the network structure.

The region dividing means 50 uses the classifier stored in the classifier storage means 40, takes as input the image from the image input means 30 (the input image) and the input bias information for that image from the bias input means 70, and performs classification processing that estimates, for each pixel, which of the predefined classes it belongs to. The region dividing means 50 then obtains the label regions based on the class classification result output from the classifier.
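One way to obtain a label map from the class classification result can be sketched as follows. The sketch assumes, as is common for segmentation networks but not stated in the text, that the classifier outputs one score per class for each pixel and that each pixel is assigned the highest-scoring class; `label_map_from_scores` is a hypothetical name.

```python
from typing import List


def label_map_from_scores(scores: List[List[List[float]]]) -> List[List[int]]:
    """Turn per-pixel class scores into a per-pixel label map.

    Assumption: scores[y][x][c] is the score of class c at pixel (x, y).
    Each pixel receives the index of its highest-scoring class
    (the first such index when scores tie).
    """
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


# Hypothetical 1x2 image with two classes.
scores = [[[0.1, 0.9], [0.8, 0.2]]]
labels = label_map_from_scores(scores)
```

Pixels sharing the same label then form the label regions of the image.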

The region information output means 31 outputs the label regions obtained by the region dividing means 50 to the display unit 6. For example, the region information output means 31 generates an image color-coded by label region and outputs it to the display unit 6.
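The color-coded output can be sketched as a simple lookup from class index to RGB color. The palette values and the name `colorize` are hypothetical; the text does not specify how colors are assigned.

```python
from typing import Dict, List, Tuple

Color = Tuple[int, int, int]


def colorize(label_map: List[List[int]], palette: Dict[int, Color]) -> List[List[Color]]:
    """Replace each class index in the label map with its RGB color."""
    return [[palette[label] for label in row] for row in label_map]


# Hypothetical palette: class 0 shown in gray, class 1 in green.
PALETTE: Dict[int, Color] = {0: (128, 128, 128), 1: (0, 200, 0)}
color_image = colorize([[0, 1], [1, 1]], PALETTE)
```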

FIG. 3 is a schematic functional block diagram of the classifier. The classifier receives an image and bias information, classifies each pixel of the image, and outputs the result.

As already mentioned, bias information is information supplied in order to bias the classification processing; for example, it expresses which classes are expected to appear in the image and which are not. Supplying bias information as an input to the classifier makes it possible to control which classes are included in the segmentation result.

The network that constitutes the classifier of this embodiment includes a feature extraction unit 400, a bias information compression unit 401, a feature synthesis unit 402, and a class classification unit 403. Of these, the feature extraction unit 400, the feature synthesis unit 402, and the class classification unit 403 form a network structure of multiple layers connected in series; this part receives an image and outputs a class classification result. Hereinafter, this part is referred to as the network main part.

The feature extraction unit 400 and the class classification unit 403 are composed of convolution layers, activation functions, pooling layers, and the like. The network main part, for example, repeatedly computes feature maps by convolving the features of neighboring pixels, thereby aggregating relationships with surrounding pixels, and then identifies the class of each pixel of the original image. In this embodiment, the feature synthesis unit 402 is inserted partway through the network main part, dividing it into a portion before and a portion after the feature synthesis unit 402. These two portions are the feature extraction unit 400 and the class classification unit 403. The feature extraction unit 400 receives an image and computes features from it, while the class classification unit 403 classifies pixels based on the composite feature generated by the feature synthesis unit 402 and thereby divides the image into regions. Note that the feature computation performed by the feature extraction unit 400 may extend only to an intermediate level of the feature maps generated over multiple levels, and the processing performed by the class classification unit 403 may include generating the feature maps from that intermediate level onward.

The bias information compression unit 401 is composed of fully connected layers and the like; it obtains a low-dimensional representation of the bias information and outputs it to the feature synthesis unit 402. Bias information is set based on what appears in the image and on its scene. The number of classes appearing in an input image is often much smaller than the total number of classes the classifier can distinguish, and classes exhibit co-occurrence: for example, indoor classes rarely appear in outdoor images, and walls and floors tend to appear together indoors. Bias information can therefore be represented with relatively few dimensions, and the bias information compression unit 401 performs this dimension-reducing transformation. For example, the bias information compression unit 401 receives bias information represented by as many variables as there are predefined classes, compresses its dimensionality, and outputs bias information represented by fewer variables.

Because the input bias information from the bias input means 70 is dimensionally compressed by the bias information compression unit 401, the input bias information can be set in a form the user understands and then converted into a form the computer can use efficiently for region division. Control that suppresses fluctuation in region division results can thus be performed easily and efficiently.

The feature synthesis unit 402 combines the features extracted by the feature extraction unit 400 with the bias information compressed by the bias information compression unit 401 to generate a composite feature, and inputs the composite feature to the class classification unit 403.

[Configuration as a learning device]
FIG. 4 is a schematic functional block diagram of the region division device 1 as a learning device that trains the classifier. The storage unit 4 functions as the learning data storage means 41 and the learning model storage means 42, and the image processing unit 5 functions as the learning bias generation means 52 and the learning means 53.

The learning data storage means 41 stores a large number of images constituting the learning data groups, the correct classes given in advance for those images, and learning bias information, which is bias information derived from the correct classes. The learning images and the correct classes corresponding to each of them are stored in the learning data storage means 41 before the learning process. The learning bias information, on the other hand, is generated by the learning bias generation means 52 and then stored in the learning data storage means 41.

For each learning image stored in the learning data storage means 41, the learning bias generation means 52 generates learning bias information from its correct classes and stores it in the learning data storage means 41. The learning bias information has the same format as the input bias information. That is, for each learning image (learning data group), the learning bias generation means 52 generates learning bias information that has elements in one-to-one correspondence with all predefined classes, in which the correct classes given to the learning image are designated as classes to be made more likely to appear in the class classification result and all other classes are designated as classes to be made less likely to appear.

Using this learning bias information for training makes it possible to train a classifier for which input bias information can be set in a form the user understands, so that control suppressing fluctuation in region division results can be performed easily and efficiently.

The learning means 53 takes the learning images, the correct classes, and the learning bias information as input, and performs learning that updates the learning model based on the error between the output class classification result and the correct answer.

The learning model storage means 42 stores the learning model for the classifier described above. The stored learning model is updated as the learning means 53 carries out the learning process. When learning is complete, the learning model storage means 42 holds the trained model of the classifier and functions as the classifier storage means 40.

[Operation of the region division device 1]
Next, the operation of the region division device 1 is described separately for region division processing and learning processing.

[Operation in region division processing]
FIG. 5 is a schematic flow diagram of the operation of the region division device 1 in region division processing.

When the region division device 1 starts region division processing, the imaging unit 2 sequentially outputs images of the monitored space captured at predetermined intervals. The image processing unit 5, in cooperation with the communication unit 3, repeats the operation shown in the flow diagram of FIG. 5 each time an image is received from the imaging unit 2.

The communication unit 3 functions as the image input means 30 and, on receiving an image, inputs it to the image processing unit 5 (step S100).

The image processing unit 5 sets, for the input image, bias information for steering the segmentation result (input bias information). For example, the user can decide which classes should be included in or excluded from the segmentation result and express that decision as bias information. In this case, the image processing unit 5 displays the input image on the display unit 6, and the user enters input bias information for that image through the operation input unit 7. The operation input unit 7 functions as the bias input means 70 and inputs the input bias information to the region division means 50 of the image processing unit 5 (step S101). Alternatively, when the scene in the captured images is known, such as outdoor or indoor, and the classes that can appear are therefore limited, the input bias information can be determined from that knowledge. In that case, the input bias information is, for example, input to and set in the region division means 50 in advance at the start of region division processing.

On receiving the input image and the input bias information, the region division means 50 divides the image into regions using the classifier read from the classifier storage means 40. The input bias information from step S101 is compressed by the bias information compression unit 401 of the classifier (step S102), while the input image from step S100 is input to the feature extraction unit 400 of the classifier, which computes features from the input image (step S103).

The feature synthesis unit 402 of the classifier combines the features output from the feature extraction unit 400 with the compressed input bias information output from the bias information compression unit 401 to generate a composite feature (step S104).

FIG. 6 is a schematic diagram explaining the generation of the composite feature. It schematically represents the data inside the classifier shown in FIG. 3. On the left side of the figure, corresponding to the sequence of the feature extraction unit 400, the feature synthesis unit 402, and the class classification unit 403 of FIG. 3 that make up the network main part, are the image 100 input to the classifier, the composite feature 110 generated by the feature synthesis unit 402, and the class classification result 140 output from the classifier. On the right side of the figure are the input nodes 120 of the bias information compression unit 401, the bias information 121 input to those nodes, and the output nodes 130 of the bias information compression unit 401.

For the network main part data on the left side of FIG. 6, the x axis is taken along the width of the image 100 and the y axis along its height, and the dimension corresponding to the feature channels is represented by the c axis. The image 100 measures W_I pixels in the x direction and H_I pixels in the y direction. The feature map generated by the feature extraction unit 400 measures W_F pixels in the x direction and H_F pixels in the y direction, and its size in the c direction, that is, its number of channels, is C. Note that the x and y sizes of the feature map generally do not match those of the image 100; usually W_F < W_I and H_F < H_I.

The bias information 121 illustrated in FIG. 6 indicates, for each of N predetermined classes, whether the class is expected to be included in the image. For example, all classes that the classifier is to distinguish are set as the N classes.

Specifically, the bias information 121 is for indoor use: it is an N-dimensional vector in which classes expected to appear indoors are represented by the value "1" and classes not expected to appear indoors by the value "0". In the example shown, classes of objects that can exist indoors, such as "person" and "floor", are treated as included in the image and the corresponding vector elements are set to "1", whereas classes of objects that do not exist indoors, such as "road", are treated as not included and the corresponding elements are set to "0".

If one starts from bias information whose elements are all "0" and changes to "1" the elements corresponding to classes expected to appear indoors, the result can be regarded as bias information 121 that designates the classes to be made more likely to appear in the class classification result. Conversely, if one starts from bias information whose elements are all "1" and changes to "0" the elements corresponding to classes not expected to appear indoors, the result can be regarded as bias information 121 that designates the classes to be made less likely to appear in the class classification result.
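
The two ways of arriving at the same indoor bias vector can be sketched as follows. This is only an illustration: the class list, its ordering, and the function names are hypothetical (the source names only "person", "floor", and "road" for this example).

```python
# Hypothetical class list and ordering; N = 6 here.
CLASSES = ["person", "floor", "wall", "window", "road", "grass"]

def bias_from_likely(classes, likely):
    """Start from all 0s and set 1 for classes expected to appear."""
    likely = set(likely)
    return [1 if name in likely else 0 for name in classes]

def bias_from_unlikely(classes, unlikely):
    """Start from all 1s and set 0 for classes not expected to appear."""
    unlikely = set(unlikely)
    return [0 if name in unlikely else 1 for name in classes]

# Both routes describe the same indoor scene.
indoor_a = bias_from_likely(CLASSES, ["person", "floor", "wall", "window"])
indoor_b = bias_from_unlikely(CLASSES, ["road", "grass"])
```

Either function yields an N-dimensional 1/0 vector in the format of the bias information 121; which one is more convenient depends on whether the user finds it easier to list the expected classes or the excluded ones.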

The input nodes 120 of the bias information compression unit 401 correspond one-to-one with the elements of the bias information 121 and number N, while the number D of output nodes 130 is less than N. The bias information compression unit 401 dimensionally compresses the bias information 121 input to the input nodes 120 and outputs the compressed bias information from the output nodes 130. That is, the bias information 121 is compressed from an N-dimensional vector to a D-dimensional vector. FIG. 6 shows, as the bias information compression unit 401, a configuration in which the input nodes 120 and the output nodes 130 are fully connected.
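
A single fully connected layer mapping N inputs to D outputs can be sketched as below. This is a minimal illustration under stated assumptions: the weights and offsets are hand-picked placeholders (in the embodiment they would be learned), no activation function is applied because the source does not specify one, and the names `compress_bias`, `weights`, and `offsets` are hypothetical.

```python
def compress_bias(bias, weights, offsets):
    """Fully connected layer: N-dimensional bias -> D-dimensional vector (D < N).

    weights is a D x N matrix (one row per output node), offsets has D entries.
    """
    if any(len(row) != len(bias) for row in weights):
        raise ValueError("each weight row needs one entry per bias element")
    if len(weights) != len(offsets):
        raise ValueError("one offset is needed per output node")
    return [sum(w * b for w, b in zip(row, bias)) + offset
            for row, offset in zip(weights, offsets)]

# N = 4 input nodes, D = 2 output nodes; placeholder parameters.
weights = [[0.5, 0.5, 0.0, 0.0],
           [0.0, 0.0, 1.0, -1.0]]
offsets = [0.0, 0.0]
compressed = compress_bias([1, 1, 0, 0], weights, offsets)
```

Every output node sees every input element, which is what lets the layer exploit co-occurrence between classes when its parameters are trained.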

The feature synthesis unit 402 receives the compressed bias information from the output nodes 130 of the bias information compression unit 401 and combines it with the feature map input from the feature extraction unit 400 to generate the composite feature 110. The composite feature 110 is obtained by concatenating the D-dimensional bias vector to each C-dimensional feature vector specified by an (x, y) coordinate pair in the feature map before synthesis; it has the same width and height as that feature map and (C+D) channels. For example, channels 1 to C of the composite feature 110 are the feature map before synthesis, and channels (C+1) to (C+D) are set to the output values of the 1st to D-th output nodes 130 of the bias information compression unit 401.

In this embodiment the bias information is common to all (x, y) coordinates, so the composite feature 110 has the structure obtained by replicating each of the D elements of the bias information in the x and y directions to the same W_F × H_F size as the output of the feature extraction unit 400 and stacking the result on the feature map before synthesis. In other words, whereas the features in channels 1 to C can differ with the coordinates (x, y), in this embodiment each of channels (C+1) to (C+D) holds a value common to all coordinates (x, y).
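
The replicate-and-stack structure can be sketched with nested lists. This is an illustrative sketch only: the feature values are made up, and the function name `combine` is hypothetical.

```python
def combine(feature_map, compressed_bias):
    """Append the same D bias values to the C features at every (x, y).

    feature_map is H_F x W_F x C (nested lists); the result is H_F x W_F x (C + D).
    """
    return [[list(pixel) + list(compressed_bias) for pixel in row]
            for row in feature_map]

# H_F = 2, W_F = 2, C = 3 feature map and a D = 2 compressed bias vector.
features = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
            [[0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]]
composite = combine(features, [0.25, -0.5])
```

The first C channels keep their position-dependent values, while the last D channels are identical at every position, matching the description of channels (C+1) to (C+D).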

The class classification unit 403 classifies each pixel of the input image 100 based on the composite feature 110 and outputs the class classification result 140 (step S105). That is, the class classification result 140 consists of a classification result for each pixel of the input image 100. For example, each pixel is associated with N values, one for each class to be distinguished. In this case, as shown in FIG. 6, the class classification result 140 is data measuring W_I pixels in the x direction, H_I pixels in the y direction, and N channels in the c direction. The channels of the class classification result 140 correspond one-to-one with the N classes; for example, each channel of each pixel is given a larger value the more likely the pixel is to belong to the class corresponding to that channel. The region division means 50 can, for example, classify the pixel at coordinates (x, y) of the input image 100 into the class corresponding to the channel with the maximum value at those coordinates in the class classification result 140. Classifying each pixel of the input image 100 in this way divides the input image 100 into regions and defines the label regions, and the region division means 50 outputs the resulting label region information to the region information output means 31 (step S106).
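
Selecting, for each pixel, the channel with the maximum value can be sketched as follows. The score values and the function name `to_label_map` are hypothetical; ties are resolved here in favour of the lowest class index, which is an assumption not stated in the source.

```python
def to_label_map(scores):
    """Map H x W x N per-class scores to an H x W grid of class indices."""
    return [[max(range(len(pixel)), key=lambda k: pixel[k]) for pixel in row]
            for row in scores]

# H = 2, W = 2, N = 3 scores; larger means the class is more likely.
scores = [[[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]],
          [[0.2, 0.2, 0.6], [0.05, 0.9, 0.05]]]
label_map = to_label_map(scores)
```

Connected pixels that share a class index then form the label regions.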

After outputting the label region information for the image input in step S100, the region division device 1 returns to step S100 and repeats steps S100 to S106 for the next input image.

FIG. 7 is a schematic diagram explaining an example of the region division processing of the region division device 1. The image 200 in FIG. 7(a) is an input image in which a wall 201, a window 202, and a person 203 are captured together with a floor 204 covered with a black carpet.

The label regions obtained for the input image 200 are shown in images 210 and 220 of FIGS. 7(b) and 7(c). The image 210 in FIG. 7(b) shows the label regions obtained by the conventional technique, and the image 220 in FIG. 7(c) shows those obtained by the region division device 1 of this embodiment.

In the conventional result shown in FIG. 7(b), the regions where the wall 201, the window 202, and the person 203 are captured are correctly divided as the wall-class label region 211, the window-class label region 212, and the person-class label region 213, respectively. The region where the floor 204 is captured, however, is split into a label region 214 correctly assigned to the floor class and a label region 215 incorrectly assigned to the road class.

FIG. 7(c), by contrast, shows the result obtained by inputting to the region division device 1 of this embodiment the input image 200 together with the indoor bias information illustrated in FIG. 6 as the input bias information 121. In this example of the input bias information 121, the "person" and "floor" classes have the value "1" while the "road" class is set to "0", so using this input bias information 121 suppresses the road class in the classification processing. As a result, in FIG. 7(c) the regions where the wall 201, the window 202, and the person 203 are captured are again correctly divided as the wall-class label region 211, the window-class label region 212, and the person-class label region 213, and, because the road class is suppressed, the region where the floor 204 is captured is also correctly divided as the floor class.

That is, in the example of FIG. 7, the input image of the room (a data group distributed in space) is divided into regions using input bias information that makes the road class, which is assumed not to appear in the room (the space being measured), less likely to appear, and misclassification into the road class is thereby suppressed. A classifier trained on diverse data that includes the road class can thus be used while the fluctuation of misclassifying floor as road is suppressed.

Thus, the region division device of the present invention makes it possible to perform high-accuracy region division with suppressed fluctuation while using a classifier trained on diverse data. Being able to use such a classifier means there is no need to prepare a classifier specialized for each space to be measured. Note also that a class whose input bias value is set to 0 is not excluded from the classification result entirely but only suppressed, so it can still appear in the result where a pixel is highly likely to belong to it. This is a further advantage of being able to use a classifier trained on diverse data.

Next, consider the previously mentioned example of a classifier trained on learning data in which turf is annotated inconsistently, sometimes as the playing field class and sometimes as the grass class. Suppose that, for input images from an imaging unit 2 installed inside a baseball stadium, it is desirable to classify turf as the grass class, since the entire field of view is the stadium in the first place, whereas for input images from an imaging unit 2 mounted on a helicopter, it is desirable to classify the stadium, turf included, as the playing field class, since the aim is to obtain information on facilities such as the stadium. In this case, the desired region division result is obtained for each by setting the input bias information for the former images so that the grass class is more likely and the playing field class less likely to appear, and setting the input bias information for the latter images so that the grass class is less likely and the playing field class more likely to appear.

Thus, the region division device of the present invention makes it possible to perform high-accuracy region division with suppressed fluctuation while using a classifier trained on learning data in which different annotation criteria are mixed. Being able to use such a classifier means there is no need to prepare, for each space to be measured, a classifier trained on learning data re-created with annotation criteria suited to that space.

[Operation in learning processing]
Before dividing input images into regions, the region division device 1 trains the classifier. This training is described below. Training of the classifier in the region division device 1 uses learning images, the correct classes that are the corresponding ground-truth region division data, and bias information created from the correct classes (learning bias information). Based on the error between the result of classifying a learning image with the classifier's learning model and the ground-truth data, the parameters of the learning model are updated repeatedly, using a known optimization method such as error backpropagation, until the error converges. This training yields a classifier whose classification processing can be deliberately biased. In addition to training the feature extraction unit 400 and the class classification unit 403, the training of the classifier includes training the bias information compression unit 401 using the learning bias information.

FIG. 8 is a schematic flow diagram of the operation of the region division device 1 in learning processing.

The learning processing uses learning images, correct classes, and learning bias information as learning data. When the start of the learning operation is instructed, the image processing unit 5 therefore functions as the learning bias generation means 52 and generates learning bias information for each learning image stored in the learning data storage means 41. Specifically, the learning bias generation means 52 generates learning bias information from the correct classes stored in the learning data storage means 41 in association with a learning image, and stores it in the learning data storage means 41 in association with that learning image (step S200).

The learning bias information has a format matching the input bias information 121 described above; in this embodiment it is an N-dimensional vector whose elements correspond to the N classes. Denote this vector {B_i} (1 ≤ i ≤ N). When the correct classes give the set L of classes included in the corresponding learning image, the value of each element B_i can, as one example, be set according to whether the class corresponding to that element is in the set L. That is, in this example, letting the classifier distinguish N classes in total and denoting the i-th class (1 ≤ i ≤ N) by C_i, the learning bias generation means 52 sets the element B_i corresponding to class C_i by the following equation.
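
The equation referred to above is not reproduced in this text. A minimal sketch, assuming the same 1/0 convention as the bias information 121 (B_i = 1 when class C_i is in the set L, and B_i = 0 otherwise), might look like the following. The function name `learning_bias` is hypothetical, and classes are indexed from 0 here rather than from 1.

```python
def learning_bias(label_map, num_classes):
    """Build the learning bias vector from a ground-truth label map.

    Assumed rule (not confirmed by the source): element i is 1 if class i
    occurs anywhere in the correct labels, and 0 otherwise.
    """
    present = {class_index for row in label_map for class_index in row}  # the set L
    return [1 if i in present else 0 for i in range(num_classes)]

# Ground truth for a 2 x 3 learning image containing classes 0, 1 and 2; N = 5.
ground_truth = [[0, 0, 2],
                [1, 1, 2]]
bias = learning_bias(ground_truth, num_classes=5)
```

Because the vector is derived directly from the annotation, no extra manual labelling is needed to obtain bias information for training.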

Once the learning data is complete with the learning bias information generated in step S200, the image processing unit 5 functions as the learning means 53 and reads the learning model of the classifier from the learning model storage means 42 (step S201). At this stage the parameters of the learning model are at their initial values.

Next, the learning means 53 reads learning data, each set consisting of a learning image, its correct class, and its learning bias information, from the learning data storage means 41 (step S202) and performs the processing for updating the learning model (steps S203 to S207). The learning data read in step S202 is not all of the sets stored in the learning data storage means 41 but a subset; the learning means 53 reads the learning data a portion at a time and repeats the model update. In this embodiment, multiple sets of learning data are read in step S202, for example the sets corresponding to 10 learning images.

The learning means 53 sets the read learning data as the processing target one set at a time (step S203), inputs the target learning image and its learning bias information into the learning model, and has the model classify each pixel of the target learning image (step S204). In step S204, the learning bias information is compressed using the parameters of the bias information compression unit 401 at that point, and the feature quantity of the learning image is computed using the parameters of the feature quantity extraction unit 400 at that point. Otherwise the processing in step S204 is basically the same as steps S102 to S106 of the region division processing in FIG. 5: the feature quantity synthesis unit 402 creates a composite feature quantity from the compressed learning bias information and the feature quantity extracted by the feature quantity extraction unit 400, and the class classification unit 403 classifies the class to which each pixel belongs. Arranging the resulting per-pixel classes in the coordinate system of the learning image yields the region division result for the learning image.

Steps S203 and S204 are repeated until all the learning data read in step S202 has been processed (that is, while the result of step S205 is "NO").

When all the learning data has been processed ("YES" in step S205), the learning means 53 compares the label regions obtained as the region division result with the label regions based on the correct classes, computes the error of the classification result (step S206), and updates the learning model based on that error (step S207). For example, in step S207 the learning means 53 updates the parameters of the feature quantity extraction unit 400, the class classification unit 403, and the bias information compression unit 401 using error backpropagation or a similar method.

If the predetermined iteration termination condition is not satisfied ("NO" in step S208), the learning means 53 repeats steps S202 to S208. For example, the termination condition is satisfied when either the error obtained in step S206 has converged or the number of iterations has reached a predetermined upper limit.

When the iteration termination condition is satisfied ("YES" in step S208), the learning means 53 saves the learning model updated in step S207 to the learning model storage means 42 as the trained model (step S209). Specifically, the parameters updated in step S207 are saved. This ends the learning process and, as described above, the learning model storage means 42 becomes the classifier storage means 40, and the trained model is used as the classifier in the region division processing of the region division device 1.
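
The flow of steps S202 to S209 can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's network: the feature quantity extraction unit 400 is replaced by per-pixel features supplied directly, the bias information compression unit 401 by a single linear map `comp`, and the class classification unit 403 by a per-pixel softmax layer `clf`. The function name `train` and all hyperparameters are hypothetical.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def train(images, labels, biases, n_classes, k=2, batch=10,
          lr=0.2, max_iters=500, tol=1e-6, seed=0):
    """images: list of (P, F) per-pixel features; labels: list of (P,) class ids;
    biases: list of (N,) learning bias vectors. Returns (parameters, loss history)."""
    rng = np.random.default_rng(seed)
    n_feat = images[0].shape[1]
    comp = rng.normal(0.0, 0.1, (n_classes, k))          # stand-in for unit 401
    clf = rng.normal(0.0, 0.1, (n_feat + k, n_classes))  # stand-in for unit 403
    bs = min(batch, len(images))
    history = []
    for it in range(max_iters):                          # upper limit of S208
        # S202: read a portion of the learning data, in order
        idx = [(it * bs + t) % len(images) for t in range(bs)]
        g_comp = np.zeros_like(comp)
        g_clf = np.zeros_like(clf)
        loss, n_pix = 0.0, 0
        for j in idx:                                    # S203 to S205
            x, y, b = images[j], labels[j], biases[j]
            z = b @ comp                                 # compressed bias, shape (k,)
            h = np.hstack([x, np.tile(z, (len(x), 1))])  # S204: concatenation
            p = softmax(h @ clf)
            rows = np.arange(len(y))
            loss += -np.log(p[rows, y] + 1e-12).sum()    # cross-entropy error
            d = p.copy()
            d[rows, y] -= 1.0                            # gradient w.r.t. logits
            g_clf += h.T @ d
            g_comp += np.outer(b, (d @ clf[n_feat:].T).sum(axis=0))
            n_pix += len(y)
        loss /= n_pix                                    # S206: mean error
        clf -= lr * g_clf / n_pix                        # S207: update by backprop
        comp -= lr * g_comp / n_pix
        history.append(loss)
        if it > 0 and abs(history[-2] - loss) < tol:     # S208: convergence
            break
    return {"comp": comp, "clf": clf}, history           # S209: parameters to save
```

Using cross-entropy as the error of step S206 is an assumption; the text only requires an error between the classification result and the correct classes.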

The learning means 53 of this embodiment generates learning bias information for each learning image and performs learning. The significance of this is explained using the example given above of learning data in which lawn is annotated with the playground class in some images and with the grass class in others. In the conventional technology, which has no concept of learning bias information, classifying the lawn pixels into the playground class and classifying them into the grass class were both permitted, both for learning images in which lawn is given the playground class and for learning images in which lawn is given the grass class. As a result, some learning images retained an error against the correct class that never became small, learning did not converge, and learning accuracy could fall. In contrast, for a learning image in which lawn is given the playground class, the learning means 53 of this embodiment generates and uses learning bias information that makes the grass class less likely to appear and the playground class more likely to appear, thereby restricting the classification of the lawn pixels in that image into the grass class and steering them toward the correct playground class. Likewise, for a learning image in which lawn is given the grass class, it generates and uses learning bias information that makes the playground class less likely to appear and the grass class more likely to appear, thereby restricting the classification of the lawn pixels in that image into the playground class and steering them toward the correct grass class.

Therefore, according to the learning device of the present invention, because the learning means 53 generates learning bias information for each learning image and performs learning, classification into classes other than the correct class can be restricted on a per-image basis, which makes learning converge more easily. The learning accuracy of the classifier can thus be improved even with learning data in which annotations made under different criteria are mixed.
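
The lawn example can be made concrete with a short sketch of per-image learning bias generation (step S200). It assumes the binary convention of modification (1), with 1 for a class present in the correct label map and 0 otherwise; the class list and the helper name `make_learning_bias` are hypothetical.

```python
import numpy as np

def make_learning_bias(label_map, n_classes):
    """Return an N-dimensional vector: 1 for classes present in the
    correct label map of one learning image, 0 for all other classes."""
    bias = np.zeros(n_classes, dtype=np.float32)
    bias[np.unique(label_map)] = 1.0
    return bias

CLASSES = ["sky", "grass", "playground"]   # hypothetical class list

# Two tiny correct label maps showing the same scene (sky above lawn).
img_a = np.array([[0, 0], [2, 2]])   # lawn annotated as playground
img_b = np.array([[0, 0], [1, 1]])   # lawn annotated as grass

bias_a = make_learning_bias(img_a, len(CLASSES))   # grass suppressed
bias_b = make_learning_bias(img_b, len(CLASSES))   # playground suppressed
```

The two images look alike but receive different bias vectors, which is what lets the classifier separate the two annotation criteria during learning.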

[Modifications]
(1) In the above embodiment, the bias information 121, which designates classes to be made more likely or less likely to appear in the class classification result, uses the two values 1 and 0 to set one of two states for each class, namely whether or not the class is expected to be contained in the image. The bias information may instead be expressed using three or more values.

For example, the bias information can specify the degree to which a class is made more likely or less likely to appear in the class classification result. This degree can be expressed, for example, as a continuous value from 0 to 1. The value set for each class can be determined, for example, from the proportion of the image area occupied by that class. In processing that segments time-series images, the bias information can be created with reference to the processing result at the previous time. The expected prior probability of each class may also be used as the value set for that class in the bias information.
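
A continuous-valued bias of this kind might be computed as below. The sketch assumes integer label maps; `area_ratio_bias` follows the area-proportion option in the text, while `bias_from_previous` and its `floor` parameter are hypothetical additions that keep a class absent from the previous frame from being fully suppressed.

```python
import numpy as np

def area_ratio_bias(label_map, n_classes):
    """Bias value per class = fraction of the image area the class occupies."""
    counts = np.bincount(label_map.ravel(), minlength=n_classes)
    return counts / label_map.size

def bias_from_previous(prev_result, n_classes, floor=0.05):
    """Time-series case: derive the bias from the previous frame's result.
    The floor is an assumption, not something the text specifies."""
    return np.maximum(area_ratio_bias(prev_result, n_classes), floor)

prev = np.array([[0, 0], [1, 2]])        # previous segmentation result
ratios = area_ratio_bias(prev, 4)        # class 3 does not occur
bias_next = bias_from_previous(prev, 4)  # class 3 gets the floor value
```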

(2) In the above embodiment and modification, the bias information specifies a condition common to the whole of one image. Alternatively, the classifier may be configured so that different bias information is given to each of a plurality of regions set in the image, with the plural pieces of bias information specifying different conditions for different regions. This allows, for example, region division in which the upper part of the image is biased so that the sky class appears more readily and the lower part is biased so that the ground class appears more readily.
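
One way to represent region-dependent bias is a per-pixel bias map, sketched here. Horizontal bands are an assumption made for simplicity (the text only speaks of regions set in the image), and `regional_bias_map` is a hypothetical helper.

```python
import numpy as np

def regional_bias_map(height, width, region_biases):
    """region_biases: list of (row_start, row_end, bias_vector).
    Returns an (H, W, N) array holding one bias vector per pixel."""
    n = len(region_biases[0][2])
    bias_map = np.zeros((height, width, n), dtype=np.float32)
    for r0, r1, vec in region_biases:
        # The (N,) vector is broadcast over every pixel in the band.
        bias_map[r0:r1, :, :] = np.asarray(vec, dtype=np.float32)
    return bias_map

# Classes: index 0 = sky, index 1 = ground (hypothetical ordering).
bias_map = regional_bias_map(4, 3, [(0, 2, [1, 0]),   # upper half favors sky
                                    (2, 4, [0, 1])])  # lower half favors ground
```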

(3) The above embodiment described an example in which the classifier uses the bias information compression unit 401 to reduce the dimensionality of the input bias information. However, the bias information compression unit may be omitted, and the bias information of the above embodiment and each modification may be combined, as input, with the image feature quantity from the feature quantity extraction unit 400 in the feature quantity synthesis unit 402.

(4) In the learning process of the classifier in the above embodiment, the bias information compression unit 401 is trained simultaneously with the feature quantity extraction unit 400 and the class classification unit 403. Alternatively, a means for compressing bias information (bias information compression means) can be prepared in advance, for example by principal component analysis based on the occurrence tendencies of the classes in the learning data, and used as the bias information compression unit 401. In that case, the bias information compression unit 401 need not be trained when the feature quantity extraction unit 400 and the class classification unit 403 are trained.

When the bias information compression means is prepared in advance, it can also be connected between the bias input means 70 and the region division means 50 instead of providing the bias information compression unit 401 inside the classifier. In that case, the region division means 50 inputs the bias information from the bias information compression means into the classifier, and the feature quantity synthesis unit 402 of the classifier combines that bias information with the feature quantity from the feature quantity extraction unit 400.
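
A bias information compression means prepared in advance by principal component analysis, as in modification (4), could look like the following sketch. It fits PCA via singular value decomposition on the learning bias vectors; the function names are hypothetical, and the choice of the learning bias vectors themselves as the PCA input is an assumption about what "occurrence tendencies of the classes" means in practice.

```python
import numpy as np

def fit_pca_compressor(bias_vectors, k):
    """Fit a k-dimensional PCA projection on (M, N) learning bias vectors."""
    X = np.asarray(bias_vectors, dtype=np.float64)
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def compress(bias, mean, components):
    """Project one N-dimensional bias vector onto the k principal directions."""
    return (np.asarray(bias, dtype=np.float64) - mean) @ components.T

train_biases = [[1, 0, 1], [1, 1, 0], [1, 0, 1], [1, 1, 0]]
mean, components = fit_pca_compressor(train_biases, k=1)
z_a = compress([1, 0, 1], mean, components)
z_b = compress([1, 1, 0], mean, components)
```

Because the projection is fixed before training, it involves no learnable parameters, which matches the remark that the bias information compression unit 401 then needs no training.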

(5) In the above embodiment and modifications, the feature quantity synthesis unit 402 performs synthesis by concatenating the bias information to the feature quantity from the feature quantity extraction unit 400. In another embodiment, the feature quantity synthesis unit 402 can perform synthesis by computing the product of the feature quantity from the feature quantity extraction unit 400 and the bias information as the composite feature quantity. In that case, the bias information compression unit 401 or the bias information compression means compresses the bias information from the bias input means 70 to C dimensions, where C is the number of channels of the feature quantity from the feature quantity extraction unit 400.
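
The two synthesis options can be compared in a short sketch. It assumes a feature map laid out as (channels, height, width) and reads "product" as a channel-wise multiplication broadcast over all pixel positions, which is one natural interpretation given the requirement that the bias be compressed to C dimensions; the function names are hypothetical.

```python
import numpy as np

def synthesize_concat(features, bias_z):
    """features: (C, H, W); bias_z: (K,). Returns (C + K, H, W)."""
    _, h, w = features.shape
    # Copy the same bias vector to every pixel position, then stack channels.
    tiled = np.broadcast_to(bias_z[:, None, None], (len(bias_z), h, w))
    return np.concatenate([features, tiled], axis=0)

def synthesize_product(features, bias_z):
    """features: (C, H, W); bias_z: (C,). Returns (C, H, W)."""
    if bias_z.shape[0] != features.shape[0]:
        raise ValueError("bias must be compressed to the channel count C")
    return features * bias_z[:, None, None]

feats = np.ones((3, 2, 2))
cat = synthesize_concat(feats, np.array([0.5, 2.0]))
prod = synthesize_product(feats, np.array([1.0, 2.0, 3.0]))
```

Concatenation grows the channel count and leaves the weighting to the class classification unit, whereas the product keeps the channel count fixed and lets the bias gate each feature channel directly.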

(6) In the above embodiment and each modification the classifier has a multilayer network structure, but it is not limited to this. For example, the feature quantity extraction unit 400 may extract HOG (Histogram of Oriented Gradients) features, color histograms, or the like from the image, or a combination of these.

(7) The above embodiment and each modification showed an example in which the data group is a two-dimensional image, but the invention is not limited to this. For example, the data group can be a time series of two-dimensional images, in which case the space is a spatio-temporal space and the data are pixels. The data group may also be a range image, with the space being a two-dimensional space and the data being pixels (distance values); in that case the imaging unit 2 is a range image sensor. The data group may also be three-dimensional measurement data such as a point cloud, with the space being a three-dimensional space and the data being measurement points; in that case a three-dimensional measuring instrument is used in place of the imaging unit 2.

Reference Signs List: 1 region division device, 2 imaging unit, 3 communication unit, 4 storage unit, 5 image processing unit, 6 display unit, 7 operation input unit, 30 image input means, 31 region information output means, 40 classifier storage means, 41 learning data storage means, 42 learning model storage means, 50 region division means, 52 learning bias generation means, 53 learning means, 70 bias input means, 100 image, 110 composite feature quantity, 120 input node, 121 bias information, 130 output node, 140 class classification result, 400 feature quantity extraction unit, 401 bias information compression unit, 402 feature quantity synthesis unit, 403 class classification unit.

Claims

A region division device that performs a classification process of classifying a data group distributed in a space into a plurality of classes and divides the space into label regions identified by the classes, the region division device comprising:
classifier storage means storing, as a classifier that receives the data group and bias information for biasing the classification process and performs the classification process on the data group, a trained model trained using a learning data group, a correct class given in advance for the learning data group, and learning bias information, which is the bias information derived from the correct class; and
region division means for inputting the data group and the bias information for the data group into the classifier and calculating the label regions based on the output class classification result.

The region division device according to claim 1, further comprising:
bias input means for inputting the bias information having elements in one-to-one correspondence with predefined classes; and
bias information compression means for dimensionally compressing the bias information from the bias input means and supplying it to the classification process.

The region division device according to claim 2, wherein the classifier comprises:
a feature quantity synthesis unit that generates a composite feature quantity combining the dimensionally compressed bias information and a feature quantity of the data group; and
a class classification unit that performs the classification process based on the composite feature quantity.

The region division device according to any one of claims 1 to 3, wherein the bias information input to the classifier designates a class to be made more likely to appear or a class to be made less likely to appear in the class classification result.

The region division device according to claim 4, wherein the bias information input to the classifier further designates the degree to which the class is made more likely or less likely to appear in the class classification result.

A learning device that trains a classifier that performs, for each measured space, a classification process of classifying each of a plurality of data items included in a data group obtained by measuring that space into a plurality of classes, the learning device comprising:
learning model storage means storing, as the classifier, a learning model that receives the data group and bias information for biasing the classification process and outputs a class classification result for each of the plurality of data items included in the data group;
learning data storage means storing a learning data group, a correct class given in advance to each of a plurality of learning data items included in the learning data group, and learning bias information, which is the bias information derived from the correct classes of the plurality of learning data items; and
learning means for inputting the learning data group and the learning bias information into the learning model and performing learning that updates the learning model based on the error of the output class classification result with respect to the correct classes.

The learning device according to claim 6, further comprising learning bias generation means for generating, for each learning data group, the learning bias information that has elements in one-to-one correspondence with predefined classes, designates the correct class given to the learning data group as a class to be made more likely to appear in the class classification result, and designates classes other than the correct class as classes to be made less likely to appear in the class classification result.

A region division method that performs a classification process of classifying a data group distributed in a space into a plurality of classes and divides the space into label regions identified by the classes, the method comprising:
a step of preparing, as a classifier that receives the data group and bias information for biasing the classification process and performs the classification process on the data group, a trained model trained using a learning data group, a correct class given in advance for the learning data group, and learning bias information, which is the bias information derived from the correct class; and
a region division step of inputting the data group and the bias information for the data group into the classifier and calculating the label regions based on the output class classification result.

A learning method for training a classifier that performs, for each measured space, a classification process of classifying each of a plurality of data items included in a data group obtained by measuring that space into a plurality of classes, the method comprising:
a step of preparing, as the classifier, a learning model that receives the data group and bias information for biasing the classification process and outputs a class classification result for each of the plurality of data items included in the data group;
a step of preparing a learning data group, a correct class given in advance to each of a plurality of learning data items included in the learning data group, and learning bias information, which is the bias information derived from the correct classes of the plurality of learning data items; and
a learning step of inputting the learning data group and the learning bias information into the learning model and performing learning that updates the learning model based on the error of the output class classification result with respect to the correct classes.

A region division program that causes a computer to perform a classification process of classifying a data group distributed in a space into a plurality of classes and to divide the space into label regions identified by the classes, the program causing the computer to function as:
classifier storage means storing, as a classifier that receives the data group and bias information for biasing the classification process and performs the classification process on the data group, a trained model trained using a learning data group, a correct class given in advance for the learning data group, and learning bias information, which is the bias information derived from the correct class; and
region division means for inputting the data group and the bias information for the data group into the classifier and calculating the label regions based on the output class classification result.

A learning program that causes a computer to perform processing for training a classifier that performs, for each measured space, a classification process of classifying each of a plurality of data items included in a data group obtained by measuring that space into a plurality of classes, the program causing the computer to function as:
learning model storage means storing, as the classifier, a learning model that receives the data group and bias information for biasing the classification process and outputs a class classification result for each of the plurality of data items included in the data group;
learning data storage means storing a learning data group, a correct class given in advance to each of a plurality of learning data items included in the learning data group, and learning bias information, which is the bias information derived from the correct classes of the plurality of learning data items; and
learning means for inputting the learning data group and the learning bias information into the learning model and performing learning that updates the learning model based on the error of the output class classification result with respect to the correct classes.