JP2020194446A

JP2020194446A - Program, information processing method, and information processor

Info

Publication number: JP2020194446A
Application number: JP2019100664A
Authority: JP
Inventors: 享祐市川; Kyosuke Ichikawa; 拓海杉浦; Takumi Sugiura; 大地横川; Daichi Yokokawa; 匡仁熊田; Masahito Kumada
Original assignee: ARK JOHO SYSTEMS KK
Current assignee: ARK JOHO SYSTEMS KK
Priority date: 2019-05-29
Filing date: 2019-05-29
Publication date: 2020-12-03
Anticipated expiration: 2039-05-29
Also published as: JP7333029B2

Abstract

To provide a program or the like capable of reducing the burden of creating teacher data used for semantic segmentation.
SOLUTION: A computer acquires an image and label information indicating an object in the image. The computer inputs the acquired image to a trained model that has been trained to output, when an image is input, object information indicating the object in the image. The computer then generates a heat map indicating the area of the input image on which the trained model based its output of the object information. Finally, the computer associates the label information with each pixel in the area of the input image indicated by the heat map.
SELECTED DRAWING: Figure 1

Description

The present disclosure relates to a program, an information processing method, and an information processing device.

When structures such as roads, buildings, tunnels, and dams are inspected for cracks and similar defects on their surfaces, a neural network is used to extract linear figures such as cracks from images of the structure surface (see, for example, Patent Document 1). In recent years, a technique called semantic segmentation has also come into use, in which a neural network classifies each pixel of an image into the region of a specific object. For example, a neural network trained on dog images and cat images classifies each pixel of an image containing a dog and a cat into a dog region, a cat region, or another region.

Patent Document 1: Japanese Unexamined Patent Publication No. 2018-195001

Training a neural network for semantic segmentation requires teacher data: images in which a human has manually classified every pixel into its region beforehand. As a result, a great deal of labor and budget in today's machine-learning field is spent on creating such teacher data.

The present disclosure has been made in view of these circumstances. Its object is to provide a program or the like capable of reducing the burden of creating teacher data used for semantic segmentation.

A program according to one aspect of the present disclosure causes a computer to execute the following processing:
(1) acquiring an image and label information indicating an object in the image;
(2) inputting the acquired image to a trained model that has been trained to output, when an image is input, object information indicating the object in the image;
(3) generating a heat map indicating the region of the image on which the trained model based its output of the object information; and
(4) associating the label information with each pixel in the region of the image indicated by the heat map.
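The final step, associating the label information with the pixels in the heat-map region, can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the applicant's implementation:
- The heat map is a 2-D array of the same size as the image.
- The "region" is the set of pixels whose normalized value reaches a hypothetical `threshold`.
- The ids in `CLASS_IDS`, with 0 for background, are illustrative.

```python
import numpy as np

# Hypothetical class ids for the road-surface example; 0 marks pixels
# outside the heat-map region ("other"). Not taken from the source.
CLASS_IDS = {"crack": 1, "dent": 2, "missing_marking": 3}


def heatmap_to_label_image(heatmap: np.ndarray, label: str,
                           threshold: float = 0.5) -> np.ndarray:
    """Give every pixel inside the heat-map region the image-level label.

    heatmap:   (H, W) relevance scores aligned with the input image.
    label:     label information attached to the image (a key of CLASS_IDS).
    threshold: fraction of the heat map's maximum at or above which a pixel
               counts as part of the region (hypothetical parameter).
    """
    label_image = np.zeros(heatmap.shape, dtype=np.uint8)
    peak = heatmap.max()
    if peak <= 0:
        return label_image              # no supporting region: all "other"
    region = heatmap / peak >= threshold
    label_image[region] = CLASS_IDS[label]
    return label_image


hm = np.array([[0.0, 0.2, 0.9],
               [0.1, 0.8, 1.0],
               [0.0, 0.1, 0.3]])
print(heatmap_to_label_image(hm, "crack"))
# [[0 0 1]
#  [0 1 1]
#  [0 0 0]]
```

A fixed relative threshold is the simplest choice. It means the label image depends on how peaked the heat map is, so in practice the threshold would need tuning per heat-map method.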

According to the present disclosure, the burden of creating teacher data used for semantic segmentation can be reduced.

FIG. 1 is a block diagram showing a configuration example of an information processing device.
FIG. 2 is a schematic diagram showing a configuration example of a classification model.
FIG. 3 is a schematic diagram for explaining the operation of a heat map generation application.
FIG. 4 is a schematic diagram for explaining a segmentation DNN.
FIG. 5 is a flowchart showing an example of a teacher data generation procedure performed by the information processing device.
FIG. 6 is a schematic diagram showing an example screen.
FIG. 7 is a schematic diagram showing an example screen.
FIG. 8 is a flowchart showing an example of a teacher data generation procedure performed by the information processing device of Embodiment 2.

The program, information processing method, and information processing device of the present disclosure are described in detail below with reference to the drawings. The embodiments apply them to a device that uses a deep neural network (DNN) trained to perform classification to generate teacher data for training a DNN used for semantic segmentation.

(Embodiment 1)
This embodiment describes an information processing device that uses a classification DNN to generate teacher data for training a segmentation DNN used for semantic segmentation.

The classification DNN is trained to determine, from a road surface image obtained by photographing the surface (road surface) of an asphalt-paved road, whether the road surface has cracks, dents (ruts, potholes, etc.), missing road markings such as white and yellow lines, and the like. The generated teacher data is used to train a segmentation DNN that classifies each pixel of a road surface image into a crack region, a dent region, a missing-road-marking region, and so on.

FIG. 1 is a block diagram showing a configuration example of the information processing device. The information processing device 10 is implemented with a personal computer, a server computer, or the like. It includes a control unit 11, a storage unit 12, a communication unit 13, an input unit 14, a display unit 15, a reading unit 16, and the like, which are interconnected via a bus. The control unit 11 includes one or more processors such as a CPU (Central Processing Unit), an MPU (Micro-Processing Unit), or a GPU (Graphics Processing Unit). By executing the control program 12P stored in the storage unit 12 as appropriate, the control unit 11 causes the information processing device 10 to perform the information processing and control processing of the present disclosure.

The storage unit 12 includes a RAM (Random Access Memory), a flash memory, a hard disk, an SSD (Solid State Drive), and the like. It stores in advance the control program 12P executed by the control unit 11 and the data needed to execute it, and it temporarily stores data generated while the control unit 11 executes the control program 12P.

The storage unit 12 also stores a classification model 12a, which is a DNN constructed by, for example, deep learning. The classification model 12a is a trained model. When image data of a photographed road surface (a road surface image) is input, it outputs information (object information) indicating whether the road surface has cracks, dents, missing road markings, or the like (objects). A DNN (trained model) performs predetermined operations on input values and outputs the results. The storage unit 12 stores, as the DNN (classification model 12a), data such as the coefficients and thresholds of the functions that define these operations.

The storage unit 12 further stores a teacher data DB (database) 12b holding the teacher data used to train the classification model 12a. Each item of teacher data pairs a road surface image with a correct label (label information). The label indicates either a crack, dent, or missing road marking on the road surface in the image, or that the road surface is in some other state. The teacher data DB 12b stores a large number of such items, and the classification model 12a has been trained with them.

The storage unit 12 also stores a heat map generation application 12c (heat map generation app). This application program visualizes the parts of the input data that a DNN focused on when classifying it, that is, the parts that contributed to the classification result. The teacher data DB 12b may instead be stored in an external storage device connected to the information processing device 10, or in a storage device that the information processing device 10 can reach over a network.

The communication unit 13 is an interface for connecting to a network such as the Internet or a LAN (Local Area Network) by wired or wireless communication. It sends and receives information to and from external devices over the network. The input unit 14 includes a mouse, a keyboard, and the like; it accepts operation inputs from the user and sends corresponding control signals to the control unit 11. The display unit 15 is a liquid crystal display, an organic EL display, or the like, and displays various information according to instructions from the control unit 11. The input unit 14 and the display unit 15 may be integrated as a touch panel.

The reading unit 16 reads information stored on a portable storage medium 1a, including a CD (Compact Disc)-ROM, a DVD (Digital Versatile Disc)-ROM, or a USB (Universal Serial Bus) memory. The programs and data in the storage unit 12 can be loaded in several ways:
- the control unit 11 reads them from the portable storage medium 1a via the reading unit 16 and stores them in the storage unit 12;
- the control unit 11 downloads them from an external device over the network via the communication unit 13 and stores them in the storage unit 12; or
- they are stored in a semiconductor memory 1b, from which the control unit 11 reads them.

FIG. 2 is a schematic diagram showing a configuration example of the classification model 12a. In the present embodiment, the classification model 12a is a CNN (Convolutional Neural Network) model, as shown in FIG. 2 for example. Instead of a CNN model, it may be an RNN (Recurrent Neural Network) model, an LSTM (Long Short-Term Memory) model, or the like.

The classification model 12a shown in FIG. 2 consists of an input layer, intermediate layers, and an output layer. The intermediate layers include convolutional layers, pooling layers, and fully connected layers. Processing proceeds as follows:
- A road surface image (image data) enters through the input layer. Each pixel is fed to a node of the input layer and passed on to the intermediate layers.
- The convolutional layers extract image features by filtering and the like to generate feature maps. The pooling layers then compress the feature maps to reduce the amount of information.
- Several convolutional and pooling layers are stacked in alternation, and the resulting feature map is input to the fully connected layers.
- There are several fully connected layers (two in FIG. 2). Each computes its nodes' output values from its input using various functions, thresholds, and the like, and passes them to the next layer. In this way each node of the output layer finally receives its output value.

The numbers of convolutional, pooling, and fully connected layers are not limited to the example shown in FIG. 2.
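A small NumPy forward pass can trace the layer sequence described for FIG. 2: convolution, pooling, two fully connected layers, and a four-node output. This is only a sketch with random stand-in weights and a single-channel 10x10 input. The layer sizes, the single convolution/pooling stage, and the use of ReLU and softmax are assumptions for illustration; the description does not give them.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Single-channel 'valid' convolution (cross-correlation, as CNN layers compute)."""
    kh, kw = kernel.shape
    out = np.empty((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool2x2(fmap):
    """2x2 max pooling, stride 2; a trailing odd row/column is dropped."""
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    return fmap[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


def softmax(z):
    e = np.exp(z - z.max())              # subtract max for numerical stability
    return e / e.sum()


def forward(image, kernel, w1, b1, w2, b2):
    """conv -> ReLU -> pool -> FC -> ReLU -> FC -> softmax over 4 classes."""
    fmap = np.maximum(conv2d_valid(image, kernel), 0.0)   # feature map
    pooled = max_pool2x2(fmap)                            # compressed feature map
    hidden = np.maximum(pooled.ravel() @ w1 + b1, 0.0)    # fully connected layer 1
    return softmax(hidden @ w2 + b2)                      # fully connected layer 2


rng = np.random.default_rng(0)
image = rng.random((10, 10))             # stand-in for a grayscale road image
kernel = rng.standard_normal((3, 3))     # 10x10 -> 8x8 after conv -> 4x4 after pool
w1, b1 = rng.standard_normal((16, 8)), np.zeros(8)
w2, b2 = rng.standard_normal((8, 4)), np.zeros(4)
probs = forward(image, kernel, w1, b1, w2, b2)
print(probs.shape, float(round(probs.sum(), 6)))   # (4,) 1.0
```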

In the classification model 12a of the present embodiment, the output layer has four nodes. Each node outputs the probability that the road surface in the input image should be judged to be in a given state, for example:
- node 0: a crack is present;
- node 1: a dent is present;
- node 2: a road marking is missing;
- node 3: some other state.

The other states include a road surface with no crack, dent, or missing road marking, and a road surface with damage other than these. The output value of each node is, for example, a value from 0 to 1.0, and the probabilities output by the four nodes sum to 1.0 (100%).
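The sketch below reads the four output nodes. The node-to-class order follows the description above; the helper name `interpret` and the example probabilities are hypothetical.

```python
import numpy as np

# Output-node order of classification model 12a as described above.
NODE_CLASSES = ["crack", "dent", "missing road marking", "other"]


def interpret(probs):
    """Return (class name, probability) for the node with the highest output."""
    probs = np.asarray(probs, dtype=float)
    if probs.shape != (4,) or not np.isclose(probs.sum(), 1.0):
        raise ValueError("expected four probabilities summing to 1.0")
    node = int(np.argmax(probs))
    return NODE_CLASSES[node], float(probs[node])


print(interpret([0.82, 0.05, 0.03, 0.10]))   # ('crack', 0.82)
```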

The classification model 12a is trained with teacher data consisting of road surface images and correct labels. Each label indicates a crack, dent, or missing road marking on the road surface in the image, or that the road surface is in some other state. When a road surface image from the teacher data is input, the model learns so that the output node corresponding to the correct label approaches 1.0 and the other output nodes approach 0. During training, the model optimizes data such as the coefficients and thresholds of the functions that define the operations performed on input values. The result is a trained classification model 12a that, for an input road surface image, indicates whether the road surface has a crack, a dent, or a missing road marking, or is in some other state.

The classification model 12a may be trained on the information processing device 10 or on another learning device; in either case, its training is assumed to be complete. If it is trained on another learning device, the information processing device 10 may obtain it from that device, for example over a network or via the portable storage medium 1a. Because the information processing device 10 of the present embodiment keeps teacher data (the teacher data DB 12b) in the storage unit 12, it also obtains the teacher data used to train the model in that case and stores the data in the teacher data DB 12b.
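A softmax cross-entropy loss is a common way to realize the training target described above (the correct node's output approaching 1.0 and the others approaching 0). Its gradient with respect to the output-layer inputs (logits) is p - y. The description does not name a loss function, so this sketch is an assumption. To keep it short, it updates the four logits directly instead of a network's coefficients.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def cross_entropy_step(logits, target, lr=0.5):
    """One gradient-descent step of softmax cross-entropy on the logits.

    Loss L = -log p[target]; dL/dlogits = p - onehot(target), so each step
    pushes the correct node's output toward 1.0 and the others toward 0.
    """
    p = softmax(logits)
    onehot = np.eye(len(logits))[target]
    loss = float(-np.log(p[target]))
    return logits - lr * (p - onehot), loss


logits = np.zeros(4)                     # untrained: every node outputs 0.25
losses = []
for _ in range(50):
    logits, loss = cross_entropy_step(logits, target=1)   # correct label: "dent"
    losses.append(loss)
print(softmax(logits).round(3))
```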

FIG. 3 is a schematic diagram for explaining the operation of the heat map generation app 12c. When input data is fed to a DNN and the DNN produces output data, the heat map generation app 12c generates a heat map showing which parts of the input data the DNN based its output on.

In the information processing device 10 of the present embodiment, a road surface image is input to the classification model 12a and the model outputs a value from each node of its output layer. The heat map generation app 12c then generates a heat map showing which regions of the road surface image the model based those output values on. For example, if node 0 outputs a value close to 1.0, the heat map shows the region of the image on which the model based its determination that the road surface has a crack (the class corresponding to node 0).

The heat map indicates not only the region that influenced the DNN's output value but also how strongly each pixel in that region influenced it. In other words, at the position of each pixel of the input image, the heat map holds a value indicating the degree (level) of that pixel's influence on the output value. Each value is one of a predetermined number of levels, for example a five-level scale. In FIG. 3, the degree of influence of each region (each pixel) of the road surface image is shown by different colors (different shades). Because the DNN has multiple intermediate layers, the heat map generation app 12c can generate a heat map for each of them.
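Expressing each pixel's influence as one of a predetermined number of levels (five in the example above) can be sketched as a simple quantization of a continuous relevance map. The scaling rule here is an assumption, since the description does not specify how levels are assigned: negatives are clipped, the map is divided by its maximum, and the range is split into equal-width bins.

```python
import numpy as np


def quantize_heatmap(relevance, levels=5):
    """Map per-pixel relevance to integer levels 0..levels-1 (0 = least influence).

    Negative relevance is clipped to 0 and the map is scaled by its maximum,
    so the most influential pixel always falls in the top level.
    """
    r = np.clip(np.asarray(relevance, dtype=float), 0.0, None)
    peak = r.max()
    if peak == 0:
        return np.zeros(r.shape, dtype=int)
    # Equal-width bins over [0, peak]; r == peak is folded into the top bin.
    return np.minimum((r / peak * levels).astype(int), levels - 1)


print(quantize_heatmap([[0.0, 0.1, 0.45], [0.65, 0.85, 1.0]]))
# [[0 0 2]
#  [3 4 4]]
```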

The heat map generated by the heat map generation app 12c marks, in the input image, the region of the object belonging to the class that the DNN (classification model 12a) assigned. The information processing device 10 of the present embodiment therefore uses these heat maps to generate teacher data for training a segmentation DNN (learning model) used for semantic segmentation.

The heat map generation app 12c may use, for example, Grad-CAM (Gradient-weighted Class Activation Mapping), Guided Grad-CAM, CAM, Grad-CAM++, Guided Grad-CAM++, Saliency Map, Guided Backpropagation, Layer-wise Relevance Propagation, Integrated Gradients, SmoothGrad, DeepLIFT (Deep Learning Important FeaTures), or the like. It is not limited to these applications; any program that generates a heat map by any of various calculation methods can be used.
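Grad-CAM is among the simplest of the listed methods to state:
1. Weight each feature map A^k of a chosen convolutional layer by the spatial average of the class score's gradient with respect to that map.
2. Sum the weighted maps.
3. Remove negative values with ReLU.

The NumPy sketch below assumes the activations and gradients have already been obtained from the model, for example with a deep-learning framework's autograd. It normalizes the result to [0, 1]. For brevity it upsamples by nearest-neighbour copying, whereas implementations typically use bilinear interpolation.

```python
import numpy as np


def grad_cam(activations, gradients, out_size):
    """Grad-CAM heat map for one class from one convolutional layer.

    activations: (K, h, w) feature maps A^k of the chosen layer.
    gradients:   (K, h, w) gradients of the class score with respect to A^k.
    out_size:    (H, W) of the input image; assumed here to be integer
                 multiples of (h, w) because nearest-neighbour upsampling is used.
    """
    weights = gradients.mean(axis=(1, 2))                 # alpha_k: averaged gradients
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)  # ReLU(sum_k alpha_k A^k)
    if cam.max() > 0:
        cam = cam / cam.max()                             # scale to [0, 1]
    sy, sx = out_size[0] // cam.shape[0], out_size[1] // cam.shape[1]
    return cam.repeat(sy, axis=0).repeat(sx, axis=1)      # back to image resolution


rng = np.random.default_rng(0)
A = rng.random((8, 4, 4))               # stand-in activations from the model
G = rng.standard_normal((8, 4, 4))      # stand-in gradients for, e.g., the crack node
heatmap = grad_cam(A, G, (16, 16))
print(heatmap.shape)                    # (16, 16)
```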

The segmentation DNN is described next. The example used is a trained model that classifies each pixel of an input image into an apple region, a banana region, or another region. FIG. 4 is a schematic diagram for explaining the segmentation DNN: FIG. 4A shows a configuration example, and FIG. 4B shows an example of the teacher data used to train it. The segmentation DNN is built from, for example, a SegNet model as shown in FIG. 4, an FCN (Fully Convolutional Network) model, a U-Net model, or the like.

The segmentation DNN shown in FIG. 4 has an input layer, intermediate layers, and an output layer. The first half of the intermediate layers contains convolutional and pooling layers, and the second half contains upsampling layers (unpooling layers) and convolutional layers.

An image entering through the input layer passes through the network as follows:
- In the first half, the convolutional layers extract image features to generate a feature map, and the pooling layers compress it. As in the classification model, multiple convolutional and pooling layers are provided.
- In the second half, the upsampling layers enlarge the resulting feature map by interpolating pixels, and the convolutional layers smooth it. Multiple upsampling and convolutional layers are likewise provided.
- The enlarged feature map is output from the output layer. As shown in FIG. 4, it is an image in which the regions classified as apple, the regions classified as banana, and the remaining regions of the input image are each given a different color.

The numbers of convolutional, pooling, and upsampling layers in the segmentation DNN are not limited to the example shown in FIG. 4.
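The NumPy sketch below tracks the shapes in such an encoder-decoder. Each 2x2 pooling step halves the height and width, each upsampling step doubles them, and a convolution after upsampling smooths the enlarged map. Two simplifying assumptions apply: nearest-neighbour copying stands in for the upsampling layer, and a 3x3 box blur stands in for a learned convolution. SegNet, for instance, instead unpools using the stored pooling indices.

```python
import numpy as np


def max_pool2x2(x):
    """Encoder step: halve height and width, keeping each 2x2 block's maximum."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


def upsample2x(x):
    """Decoder step: double height and width by copying each pixel into a 2x2 block."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def smooth3x3(x):
    """Stand-in for a decoder convolution: 3x3 mean filter with edge padding."""
    h, w = x.shape
    p = np.pad(x, 1, mode="edge")
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0


x = np.arange(64, dtype=float).reshape(8, 8)        # even sizes assumed throughout
code = max_pool2x2(max_pool2x2(x))                  # 8x8 -> 4x4 -> 2x2
restored = smooth3x3(upsample2x(upsample2x(code)))  # 2x2 -> 4x4 -> 8x8
print(code.shape, restored.shape)                   # (2, 2) (8, 8)
```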

The segmentation DNN configured as above is trained with teacher data made up of input images and label images. As shown in FIG. 4B, a label image associates each pixel of the input image with information (a class label) indicating the object to be identified, here an apple or a banana. When an input image from the teacher data is input, the segmentation DNN learns to output the corresponding label image. During training, it optimizes data such as the coefficients and thresholds of the functions that define the operations performed on input values. The result is a segmentation DNN that classifies each pixel of an input image into an apple region, a banana region, or another region (a region of each class).

The teacher data may take either of two forms: a separate label image for each class, as shown in FIG. 4B, or a single label image that associates the class labels of several classes (objects) with one input image.
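The two teacher-data layouts mentioned above can be converted into each other: one binary label image per class, or a single label image carrying several class labels. A minimal NumPy sketch follows. The helper names and class ids are hypothetical, and the rule for overlapping masks is illustrative.

```python
import numpy as np


def merge_class_masks(masks, class_ids):
    """Combine per-class boolean masks into one label image (0 = other).

    Where masks overlap, a class merged later overwrites an earlier one;
    this tie-break is an illustrative choice, not something the source specifies.
    """
    shape = next(iter(masks.values())).shape
    label_image = np.zeros(shape, dtype=np.uint8)
    for name, mask in masks.items():
        label_image[mask] = class_ids[name]
    return label_image


def split_label_image(label_image, class_ids):
    """Inverse direction: one boolean mask per class from a multi-class label image."""
    return {name: label_image == cid for name, cid in class_ids.items()}


ids = {"apple": 1, "banana": 2}
masks = {"apple": np.array([[True, False], [False, False]]),
         "banana": np.array([[False, False], [True, True]])}
merged = merge_class_masks(masks, ids)
print(merged)
# [[1 0]
#  [2 2]]
```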

The information processing device 10 of the present embodiment uses heat maps generated by the heat map generation app 12c to produce label images (teacher data) for training a segmentation DNN, like those shown in FIG. 4B. In this embodiment, the target segmentation DNN classifies each pixel of an input road surface image into a crack region, a dent region, a missing-road-marking region, or another region. The teacher data for it is generated using the classification model 12a, the teacher data DB 12b, and the heat map generation app 12c.

The processing that the information processing device 10 performs to generate teacher data for training the segmentation DNN is described below. FIG. 5 is a flowchart showing an example of the teacher data generation procedure, and FIGS. 6 and 7 are schematic diagrams showing example screens. The control unit 11 executes the following processing according to the programs stored in the storage unit 12 of the information processing device 10. In the present embodiment, all of this processing is implemented by the control unit 11 executing programs, but part of it may instead be implemented by dedicated hardware circuits.

The information processing device 10 may receive, for example via the input unit 14, an instruction to generate teacher data for training the segmentation DNN. When it does, the control unit 11 (acquisition unit) reads the teacher data stored in the teacher data DB 12b (S11). Each item of teacher data includes a road surface image and label information (object information) indicating a crack, dent, or missing road marking (object) on the road surface in the image. Based on this teacher data, the control unit 11 generates an image designation screen and displays it on the display unit 15 (S12). The screen lets the user designate the image (road surface image) from which teacher data for the segmentation DNN will be generated.

FIG. 6 shows an example of the image designation screen. The screen displays the road surface images contained in the read teacher data as selectable images A2, and displays one road surface image selected from the selectable images A2 as the processed image A1. The display area of the selectable images A2 is configured, for example, to scroll so that all road surface images (teacher data) read from the teacher data DB 12b can be displayed. The image designation screen also has an input field A3 for selecting the application program used to generate a heat map from the processed image A1. The input field A3 provides a pull-down menu for selecting any one of the available heat map generation applications 12c. In the pull-down menu shown in FIG. 6, Grad-CAM, Guided Grad-CAM, and CAM are offered as heat map generation applications 12c, but the pull-down menu is configured so that all applications executable on the information processing device 10 can be selected. Furthermore, the image designation screen has an execute button for instructing execution of the process that generates teacher data for training the segmentation DNN from the selected processed image A1, using the application program selected in the input field A3.

On the image designation screen, the control unit 11 accepts, via the input unit 14, designation of one of the road surface images among the selectable images A2 (S13) and, upon accepting the designation, displays the designated (selected) road surface image on the screen as the processed image A1. When the control unit 11 accepts, via the input unit 14, selection of one of the heat map generation applications 12c, it displays the name of the selected application in the input field A3. When the execute button is operated via the input unit 14, the control unit 11 accepts an instruction to execute the teacher data generation process. The control unit 11 therefore determines whether the execute button has been operated and an execution instruction has been accepted (S14); if not (S14: NO), the process returns to step S13. If it determines that an execution instruction has been accepted (S14: YES), the control unit 11 inputs the road surface image selected as the processed image A1 to the classification model 12a.
Then, using the heat map generation application 12c, the control unit 11 (generation unit) generates a heat map indicating which regions of the processed image A1 the classification model 12a relied on to calculate its output values when outputting the label information of the processed image A1 (S15). Specifically, when the classification model 12a outputs an output value from each node of its output layer based on the processed image A1, the control unit 11 generates a heat map indicating the regions in the processed image A1 that served as the basis for calculating each output value and the degree of their influence on that value.
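Grad-CAM is one of the heat map generation applications 12c offered in the pull-down menu. The following is a minimal NumPy sketch of its numeric core for one layer and one class, offered as an illustration rather than the implementation used by the device. It assumes that the feature maps of an intermediate layer of the classification model 12a, and the gradients of one class score with respect to those feature maps, have already been extracted (for example, with hooks in a deep learning framework). The function name `grad_cam` is hypothetical, and CAM and Guided Grad-CAM are not shown.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heat map for one layer and one class (hypothetical helper).

    activations: feature maps A of shape (K, H, W) from one intermediate layer.
    gradients:   d(y_c)/dA of the same shape, for the class score y_c.
    Returns an (H, W) map scaled to [0, 1].
    """
    if activations.shape != gradients.shape:
        raise ValueError("activations and gradients must have the same shape")
    # Channel weights: global-average-pool the gradients over H and W.
    weights = gradients.mean(axis=(1, 2))  # shape (K,)
    # Weighted sum of the K feature maps; ReLU keeps only positive evidence.
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)  # (H, W)
    peak = cam.max()
    # Scale to [0, 1] so maps from different layers are comparable.
    return cam / peak if peak > 0 else cam
```

In practice, a map taken from a low-resolution layer would be upsampled to the size of the processed image A1 before it is displayed or combined with other layers; that step is omitted here.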

The heat map generation application 12c generates a heat map for each of the intermediate layers of the classification model 12a, and the control unit 11 combines the generated heat maps into a composite heat map (S16). For example, a composition ratio is set in advance for each intermediate layer; the control unit 11 multiplies the value at each position in each layer's heat map by that layer's composition ratio and sums the products over the layers, thereby combining the heat maps into one. The composition ratios are set, for example, so that layers closer to the output layer have larger values. With this setting, heat maps generated in layers closer to the output layer receive greater weight.
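The weighted synthesis of step S16 can be sketched as follows. This is an illustration under stated assumptions: the per-layer heat maps are taken to be already resized to a common shape (the text combines values "at the same position" but does not say how maps of different resolutions are aligned), the ratios are used as given without normalization, the default ratios 1, 2, ..., L are only one way of making layers near the output layer weigh more, and the function name is hypothetical.

```python
import numpy as np


def synthesize_heat_maps(layer_maps, ratios=None):
    """Combine per-layer heat maps into one composite heat map (hypothetical helper).

    layer_maps: list of (H, W) arrays ordered from the input side to the
                output side; assumption: all already resized to one shape.
    ratios:     per-layer composition ratios. Assumption: if omitted, use
                1, 2, ..., L so layers closer to the output weigh more.
    """
    maps = np.stack([np.asarray(m, dtype=float) for m in layer_maps])  # (L, H, W)
    if ratios is None:
        ratios = np.arange(1, len(layer_maps) + 1, dtype=float)
    ratios = np.asarray(ratios, dtype=float)
    if ratios.shape != (len(layer_maps),):
        raise ValueError("need exactly one ratio per layer")
    # Multiply each layer's map by its ratio, then sum position by position.
    return np.tensordot(ratios, maps, axes=1)
```

Because the composite is recomputed from the stored per-layer maps, changing a ratio on the edit screen only requires calling the function again with the new ratios.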

The control unit 11 generates an edit screen for editing the generated composite heat map and displays it on the display unit 15 (S17). FIG. 7 shows an example of the edit screen. Like the image designation screen, the edit screen displays the processed image A1, the selectable images A2, the input field A3, and the execute button. Below the processed image A1, the edit screen also displays the label information (class information) of the processed image A1 and the heat maps M1 of the individual layers generated by the heat map generation application 12c. In the example shown in FIG. 7, the processed image A1 has two pieces of label information: class 1, a defect in a road marking (pedestrian crossing), and class 2, a defect in a road marking (roadside strip). For each class, the heat maps generated for the five layers Layer 1 to Layer 5 are displayed.
To the right of each layer's heat map M1, the edit screen displays that layer's composition ratio, which can be changed via the input unit 14. The edit screen also displays check boxes (radio buttons) for selecting one of the classes, and the composition ratios of the layers can be changed for the class selected with the check box. The edit screen further displays the composite heat map M2 in a large area, with a pencil button B1 and an eraser button B2 for editing it displayed above. The pencil button B1 instructs execution of a process that adds areas (pixels) that should be classified into the displayed class to the composite heat map M2. The eraser button B2 instructs execution of a process that erases areas (pixels) that should not be classified into the displayed class from the composite heat map M2. The edit screen also has an input field B3 for specifying the storage destination (folder name and file name) where the edited composite heat map M2 is stored as teacher data, and a save button for instructing that the edited composite heat map M2 be stored at the destination entered in the input field B3. The edit screen configured as described above may also allow part of the composite heat map M2 to be enlarged for display.

On the edit screen, the control unit 11 determines whether an instruction to change the composition ratio for the heat map M1 of any layer has been accepted via the input unit 14 (S18). If it determines that such an instruction has been accepted (S18: YES), the control unit 11 changes the displayed composition ratio to the instructed value, updates the edit screen, and returns to step S16. The control unit 11 then generates a composite heat map from the heat maps generated in step S15 using the changed composition ratios (S16), replaces the displayed composite heat map M2 with the newly generated one, and updates the edit screen (S17). Each time it accepts an instruction to change the composition ratio of a layer, the control unit 11 repeats steps S16 to S17, generating and displaying a composite heat map synthesized with the changed ratios.

If it determines that no instruction to change a composition ratio has been accepted (S18: NO), the control unit 11 determines whether an instruction to change the classification class of pixels in the processed image A1 has been accepted through operation of the pencil button B1 or the eraser button B2 via the input unit 14 (S19). To add pixels to the displayed composite heat map M2, that is, to add pixels not classified into the displayed class to that class, the user operates the pencil button B1 via the input unit 14 and then paints (colors) the locations (pixels) to be added to the composite heat map M2. To erase pixels from the displayed composite heat map M2, that is, to exclude pixels classified into the displayed class from that class, the user operates the eraser button B2 via the input unit 14 and then erases the locations (pixels) to be removed from the composite heat map M2. In this way, the control unit 11 accepts an instruction to add or delete pixels in the displayed composite heat map M2 (an instruction to change the classification class).
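One way to model the pencil and eraser operations, and the resulting update of the composite heat map M2 in step S20, is sketched below. The document does not say which value a painted pixel receives in the composite heat map; this sketch assumes a fixed `paint_value` for pencil strokes and 0 for eraser strokes. The function name `apply_brush` and the mode strings are hypothetical.

```python
import numpy as np


def apply_brush(heat_map, pixels, mode, paint_value=1.0):
    """Return a copy of heat_map with brush edits applied (hypothetical helper).

    pixels: iterable of (row, col) positions touched by the brush.
    mode:   "pencil" adds the pixels to the displayed class region,
            "eraser" removes them from it.
    Assumption: painted pixels get paint_value, erased pixels get 0.
    """
    edited = np.array(heat_map, dtype=float, copy=True)  # leave the input untouched
    if mode == "pencil":
        value = paint_value
    elif mode == "eraser":
        value = 0.0
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    for r, c in pixels:
        edited[r, c] = value
    return edited
```

Returning a copy rather than editing in place keeps the previous composite available, which would make an undo function straightforward.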

If it determines that a classification class change instruction has been accepted (S19: YES), the control unit 11 changes the composite heat map M2 by adding or erasing, in the displayed composite heat map M2, the pixels for which the change was instructed (S20), and updates the displayed edit screen. If it determines that no classification class change instruction has been accepted (S19: NO), or after step S20, the control unit 11 accepts an instruction to save the teacher data when storage destination information has been entered in the input field B3 via the input unit 14 and the save button is operated. The control unit 11 therefore determines whether the save button has been operated on the edit screen and a save instruction has been accepted (S21); if not (S21: NO), the process returns to step S18. If it determines that a save instruction has been accepted (S21: YES), the control unit 11 generates teacher data based on the composite heat map M2 (S22).
Specifically, the control unit 11 (associating unit) generates the teacher data (a label image) by associating the label information (class label) of the processed image A1 with each pixel in the region of the processed image A1 indicated by the composite heat map M2. This produces teacher data for training a segmentation DNN that, given a road surface image, classifies each pixel of the image into a crack region, a dent region, a road marking defect region, or another region. The control unit 11 then stores the generated teacher data at the storage destination accepted via the edit screen (S23) and ends the process.
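Step S22 can be sketched as turning per-class composite heat maps into a single label image. The document does not define exactly which pixels count as "the region indicated by" a heat map, nor how overlapping class regions are resolved. This sketch assumes a pixel belongs to a class region when its value exceeds `threshold`, resolves overlaps in favor of the class with the larger heat map value, and reserves label 0 for "other" regions. All names are hypothetical.

```python
import numpy as np


def make_label_image(class_maps, threshold=0.0):
    """Build a label image from per-class composite heat maps (hypothetical helper).

    class_maps: dict mapping a class label (int >= 1) to its (H, W) composite
                heat map; label 0 is reserved for "other" regions.
    threshold:  assumption: a pixel is in a class region when its value
                exceeds this threshold.
    Assumption: where class regions overlap, the larger heat map value wins.
    """
    labels = sorted(class_maps)
    if not labels or min(labels) < 1:
        raise ValueError("class labels must be integers >= 1")
    stack = np.stack([np.asarray(class_maps[k], dtype=float) for k in labels])
    best = stack.argmax(axis=0)             # index of the strongest class per pixel
    inside = stack.max(axis=0) > threshold  # pixel lies in some class region
    label_image = np.zeros(stack.shape[1:], dtype=np.int32)
    label_image[inside] = np.asarray(labels)[best[inside]]
    return label_image
```

For the FIG. 7 example, class 1 (pedestrian crossing defect) and class 2 (roadside strip defect) would each contribute one composite heat map, and the result is an integer image the same size as the processed image A1.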

Through the processing described above, the information processing device 10 of the present embodiment can use the classification model 12a, which outputs information indicating objects present in an input image, together with the teacher data used to train the classification model 12a, to generate teacher data for training a segmentation DNN that classifies each pixel of an input image into the region of an object present in the image. In other words, teacher data for training (generating) the segmentation DNN can be generated from the classification model 12a and the teacher data (teacher data DB 12b) that were already needed to produce the classification model 12a. In addition, because the teacher data for training the segmentation DNN is generated automatically from the heat maps produced by the heat map generation application 12c, the burden of creating teacher data (label images), which has conventionally been done by hand, can be reduced. Furthermore, manually created teacher data (label images) may vary from worker to worker, whereas automatic creation by the processing described above yields teacher data in which each object is classified objectively.

Semantic segmentation has in recent years become an important technology in many fields, including autonomous driving and medicine, and is being researched and developed across a range of industries and companies. Automatically creating the teacher data used to train segmentation DNNs can reduce the budget and labor spent on creating teacher data, and is expected to contribute to the further development of semantic segmentation technology.

In the present embodiment, the composite heat map generated automatically from the heat maps produced by the heat map generation application 12c can be edited (changed) via the edit screen. Errors in the automatically generated composite heat map can therefore be corrected, enabling highly accurate teacher data to be generated. The processed image A1 can also be reselected on the edit screen shown in FIG. 7, so teacher data (label images) for a plurality of processed images A1 can be created one after another on an edit screen once it is displayed. The present embodiment may also be configured so that the number of heat maps used to create the composite heat map can be changed (specified), for example via the edit screen. That is, the device may be configured so that the user can change which of the heat maps generated by the heat map generation application 12c for the respective intermediate layers of the classification model 12a are used to generate the composite heat map. For example, although the edit screen in FIG. 7 shows a composite heat map generated from five heat maps, the composite heat map may instead be generated using only the heat maps of layers close to the output layer (for example, the three layers Layer 3 to Layer 5).

The composite heat map may also be generated using only those per-layer heat maps whose maximum value (maximum level) is equal to or greater than a predetermined threshold; that is, heat maps whose maximum value is less than the threshold may be excluded from the composite. Because a larger heat map value indicates a greater contribution to the classification result, generating the composite heat map only from heat maps that indicate a high contribution yields more accurate teacher data. The threshold used to decide whether a heat map is used for the composite may be made configurable, for example via the input unit 14 on the edit screen. In this case, the edit screen may display only the heat maps determined, based on the set threshold, to be used for generating the composite heat map. When the heat maps used for the composite are switched according to their maximum values, the information processing device 10 may also be configured to select automatically, from the heat maps generated by the heat map generation application 12c, those to be used for generating the composite heat map.
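Both selection variants described for Embodiment 1, restricting the composite to a subset of layers (such as Layer 3 to Layer 5) and excluding maps whose maximum value falls below a threshold, can be sketched as a filter applied before step S16. The function name and the dictionary keyed by layer name are assumptions made for illustration.

```python
import numpy as np


def select_layer_maps(layer_maps, min_peak, layers=None):
    """Choose which per-layer heat maps feed the composite (hypothetical helper).

    layer_maps: dict mapping a layer name (e.g. "Layer3") to its (H, W) map.
    min_peak:   a map is kept only if its maximum value is >= min_peak.
    layers:     optional subset of layer names to consider, e.g. the layers
                closest to the output layer; None means all layers.
    """
    names = list(layer_maps) if layers is None else list(layers)
    # Keep the caller's layer order so ratios can be matched up afterwards.
    return {
        name: layer_maps[name]
        for name in names
        if float(np.max(layer_maps[name])) >= min_peak
    }
```

The surviving maps, in order, can then be passed to the synthesis step together with their composition ratios.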

(Embodiment 2)
This embodiment describes an information processing device 10 that generates the composite heat map after performing noise removal on the heat maps generated by the heat map generation application 12c. The information processing device of this embodiment has the same configuration as the information processing device 10 of Embodiment 1, so a description of the configuration is omitted.

FIG. 8 is a flowchart showing an example of the teacher data generation procedure performed by the information processing device 10 of Embodiment 2. The process in FIG. 8 is the process in FIG. 5 with step S31 added between steps S15 and S16. Descriptions of steps identical to those in FIG. 5 are omitted, and FIG. 8 does not show steps S17 to S23 of FIG. 5.

In the information processing device 10 of this embodiment, the control unit 11 performs steps S11 to S15 as in Embodiment 1. As a result, when the processed image A1 designated via the image designation screen is input to the classification model 12a and the classification model 12a outputs the label information of the processed image A1, the heat map generation application 12c generates heat maps indicating the regions in the processed image A1 that the classification model 12a relied on when calculating its output values.

In the information processing device 10 of this embodiment, the control unit 11 performs noise removal on each of the generated heat maps (S31). For example, the control unit 11 erases (removes) from each heat map the pixels whose values are less than a predetermined threshold. Erasing the locations (pixels) that are likely to be noise in this way yields heat maps showing only the regions that contribute strongly to the classification result of the classification model 12a. The control unit 11 generates a composite heat map from the noise-removed heat maps (S16) and performs step S17 and the subsequent steps. In this embodiment, therefore, synthesizing heat maps that show only the regions contributing strongly to the classification result produces a more accurate composite heat map (teacher data).

This embodiment provides the same effects as Embodiment 1. In addition, because the composite heat map is generated after noise has been removed from the heat maps generated by the heat map generation application 12c, a more accurate composite heat map (teacher data) can be generated. The predetermined threshold used to decide whether a heat map value is removed as noise may be made configurable, for example changeable to a value specified via the input unit 14. The threshold may also be changed automatically according to, for example, the maximum or average value of the heat map.
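The noise removal of step S31, with either a fixed threshold or one derived automatically from the map's maximum or mean, can be sketched as follows. The document says only that the threshold may follow the maximum or average value; scaling that value by a `fraction` is an assumption, as is the function name.

```python
import numpy as np


def remove_noise(heat_map, threshold=None, fraction=None, reference="max"):
    """Zero out heat map pixels below a threshold (hypothetical helper).

    Pass either an absolute threshold, or a fraction so that the threshold
    is derived from the map itself. Assumption: the derived threshold is
    fraction * max (reference="max") or fraction * mean (reference="mean").
    """
    hm = np.asarray(heat_map, dtype=float)
    if (threshold is None) == (fraction is None):
        raise ValueError("give exactly one of threshold or fraction")
    if threshold is None:
        if reference == "max":
            base = hm.max()
        elif reference == "mean":
            base = hm.mean()
        else:
            raise ValueError(f"unknown reference: {reference!r}")
        threshold = fraction * base
    # Pixels with values below the threshold are treated as noise and erased.
    return np.where(hm < threshold, 0.0, hm)
```

Deriving the threshold from each map's own statistics lets a single setting adapt to layers whose heat maps have very different value ranges.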

In each of the embodiments described above, the segmentation DNN can be trained using the training data for the segmentation DNN generated from the classification model 12a and the teacher data DB 12b. This makes it possible to realize a segmentation DNN that, given an input image, accurately classifies each pixel of the image into the region of each object to be discriminated (the region of each class). The present embodiment has described a configuration that creates teacher data for training a segmentation DNN that classifies each pixel of a road surface image into a crack region, a dent region, a road marking defect region, or another region. However, the present disclosure is also applicable to segmentation DNNs that perform other classifications. For example, from surface images of structures such as buildings, tunnels, and dams, teacher data can be generated for training a segmentation DNN that classifies each pixel of the surface image into the respective defect regions, such as cracks. The disclosure is also applicable to segmentation DNNs used in various fields, such as autonomous driving and medicine, and can be used to generate teacher data for training various segmentation DNNs.
The disclosure is applicable not only to segmentation DNNs that discriminate objects or states present in an image but also to segmentation DNNs that discriminate photogenic or Instagram-worthy regions in an image. That is, by using a classification model that determines whether an input image contains a photogenic or Instagram-worthy region, together with the teacher data used to train that classification model, teacher data can be generated for training a segmentation DNN that classifies each pixel in an image into a photogenic or Instagram-worthy region.

The embodiments disclosed herein are illustrative in all respects and should not be considered restrictive. The scope of the present disclosure is defined not by the meanings described above but by the claims, and is intended to include all modifications within the meaning and scope equivalent to the claims.

10 Information processing device
11 Control unit
12 Storage unit
14 Input unit
15 Display unit
12a Classification model
12b Teacher data DB
12c Heat map generation application

Claims

A program causing a computer to execute a process comprising:
acquiring an image and label information indicating an object in the image;
inputting the acquired image to a trained model that has been trained to output, when an image is input, object information indicating an object in the image;
generating a heat map indicating the region in the image that served as the basis when the trained model, with the image input, output the object information; and
associating the label information with each pixel in the region of the image indicated by the heat map.

The program according to claim 1, causing the computer to execute a process of removing, from the generated heat map, noise based on the degree to which each region served as the basis when the trained model, with the image input, output the object information.

The program according to claim 1 or 2, wherein the trained model is a neural network having a plurality of layers, the program causing the computer to execute a process comprising:
generating the heat map for each of the plurality of layers; and
synthesizing the heat maps generated for the respective layers.

The program according to any one of claims 1 to 3, causing the computer to execute a process comprising:
accepting a change instruction for the generated heat map; and
changing the heat map based on the accepted change instruction.

The program according to any one of claims 1 to 4, causing the computer to execute a process of displaying on a display unit, in association with one another, the acquired image, the label information corresponding to the image, and the heat map reflecting the degree to which each region served as the basis when the trained model, with the image input, output the object information.

The program according to any one of claims 1 to 5, causing the computer to execute a process of training, using teacher data that includes each pixel in the region of the image indicated by the heat map and the label information associated with each such pixel, a learning model that, when an image is input, classifies each pixel in the image into one of a plurality of regions including the region of the object.

An information processing method in which a computer executes a process comprising:
acquiring an image and label information indicating an object in the image;
inputting the acquired image to a trained model that has been trained to output, when an image is input, object information indicating an object in the image;
generating a heat map indicating the region in the image that served as the basis when the trained model, with the image input, output the object information; and
associating the label information with each pixel in the region of the image indicated by the heat map.

An information processing device comprising:
an acquisition unit that acquires an image and label information indicating an object in the image;
a generation unit that inputs the acquired image to a trained model that has been trained to output, when an image is input, object information indicating an object in the image, and generates a heat map indicating the region in the image that served as the basis when the trained model, with the image input, output the object information; and
an associating unit that associates the label information with each pixel in the region of the image indicated by the heat map.