JP2020135438A

JP2020135438A - Basis presentation device, basis presentation method and basis presentation program

Info

Publication number: JP2020135438A
Application number: JP2019028316A
Authority: JP
Inventors: Yasushi Kunisada; Kohei Yamamoto; Kurato Maeno; Motoko Kagaya
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2019-02-20
Filing date: 2019-02-20
Publication date: 2020-08-31

Abstract

To create, at low cost, a saliency map indicating the basis of a determination, even for a neural network that handles a variety of input data. SOLUTION: A basis presentation device 100 comprises: a synthesis unit 103 that synthesizes input data "x" and a saliency map 122 "m" to create synthetic data "y"; an inference unit 104 that creates inference data "z" by inference using a first NN 112 from each of the input data "x" and the synthetic data "y"; and an evaluation unit 105 that calculates, by means of a loss function "E1(·)", an error between the inference data "z" of the input data "x" and the inference data "z" of the synthetic data "y", and uses the calculation result as a loss score. A map generation unit 102 updates the parameters of a second NN 121 by reflecting the loss score in the second NN 121 used in the previous inference. SELECTED DRAWING: Figure 1

Description

The present invention relates to a rationale presentation device, a rationale presentation method, and a rationale presentation program.

In general, the multi-layer neural networks that achieve high performance in today's image recognition and similar tasks consist of an enormous number of parameters and a complex model. While machine learning systems of this kind perform very well, it is difficult to interpret the basis on which the neural network arrives at its output. Several methods for interpreting the judgment basis of a neural network have been proposed to address this problem.

One method for interpreting the judgment basis is to create a saliency map showing how much each part of the input contributes to the output of the neural network.
The method described in Non-Patent Document 1 is one of the methods that calculate the gradient between the input and the output of the neural network and treat the parts of the input with a large gradient as contributing strongly to the output. Methods that use the gradient between output and input suffer from a large amount of gradient-induced noise in the saliency map. This method therefore prepares many copies of the input with noise added and removes the noise by averaging their gradients.
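The averaging idea described above can be sketched in a few lines. This is only an illustration of the general idea, not code from Non-Patent Document 1: `toy_model`, the finite-difference gradient, and the values of `sigma` and `n_samples` are all assumptions chosen for the example.

```python
import random

def numerical_gradient(f, x, eps=1e-5):
    """Central-difference estimate of df/dx_i for every input element."""
    grad = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad

def averaged_noisy_gradient(f, x, sigma=0.1, n_samples=200, seed=0):
    """Average the gradient over many noisy copies of the input."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        g = numerical_gradient(f, noisy)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / n_samples for t in total]

def toy_model(x):
    # Hypothetical stand-in for a network output (assumption for this sketch).
    return x[0] ** 2 + 3.0 * x[1]

saliency = averaged_noisy_gradient(toy_model, [1.0, 0.5])
```

The noise level and the number of samples are exactly the kind of parameters that have to be chosen per network and per data set, which is the tuning cost the present document sets out to avoid.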

The method described in Patent Document 1 is another method that uses the gradient between the input and output of the neural network; it calculates the degree of contribution using a Taylor approximation of the neural network.

The method described in Non-Patent Document 2 obtains the saliency map by learning. An input with part of it masked is fed to a trained neural network, and the saliency map is learned so that the output value moves away from the correct answer. This makes it possible to create a mask that hides the parts of the input with a high contribution.

Patent Document 1: Japanese Translation of PCT International Application Publication No. 2018-513507

Non-Patent Document 1: Daniel Smilkov et al., "SmoothGrad: removing noise by adding noise", [online], 2017, [retrieved February 4, 2019], Internet <URL: https://arxiv.org/abs/1706.03825>
Non-Patent Document 2: Ruth Fong et al., "Interpretable Explanations of Black Boxes by Meaningful Perturbation", [online], January 10, 2018, [retrieved February 4, 2019], Internet <URL: https://arxiv.org/abs/1704.03296>

Methods for creating a saliency map that shows the judgment basis of a given neural network require parameters to be adjusted according to the characteristics of the neural network and the input data. Consequently, the more neural networks and input data there are to handle, the higher the cost of having a person prepare the parameters to be adjusted.

The main object of the present invention is therefore to create, at low cost, a saliency map showing the judgment basis, even for a neural network that handles diverse input data.

To solve the above problem, the rationale presentation device of the present invention has the following features.
The present invention comprises:
a map generation unit that creates, as a result of inference using a second neural network from input data of the data set used for training a first neural network, a saliency map showing the features of the input data, and outputs it as the rationale for the first neural network;
a synthesis unit that synthesizes the input data and the saliency map to create synthetic data;
an inference unit that creates inference data as a result of inference using the first neural network from each of the input data and the synthetic data; and
an evaluation unit that calculates the error between the inference data of the input data and the inference data of the synthetic data by a loss function and uses the calculation result as a loss score,
wherein the map generation unit updates the parameters of the second neural network by reflecting the loss score in the second neural network used in the previous inference.

The present invention also provides a rationale presentation method executed by the rationale presentation device, and a rationale presentation program for executing that method.

According to the present invention, a saliency map showing the judgment basis can be created at low cost, even for a neural network that handles diverse input data.

FIG. 1 is a configuration diagram showing a neural network system according to Example 1 of the present invention.
FIG. 2 shows an example of the processing of the synthesis unit that synthesizes input image data and a saliency map according to Example 1 of the present invention.
FIG. 3 shows an example of a saliency map image according to Example 1 of the present invention in which the region of interest is too wide.
FIG. 4 shows an example of a saliency map image according to Example 1 of the present invention in which the region of interest is too narrow.
FIG. 5 is a flowchart showing the processing of the rationale presentation device according to Example 1 of the present invention.
FIG. 6 is a configuration diagram showing a neural network system according to Example 2 of the present invention.
FIG. 7 is an explanatory diagram showing image data of a first region of interest according to Example 2 of the present invention.
FIG. 8 is an explanatory diagram showing image data of a second region of interest according to Example 2 of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings. Each figure is only a schematic illustration, drawn in enough detail for the present invention to be fully understood. The present invention is therefore not limited to the illustrated examples. In the drawings referred to, the dimensions of the members constituting the present invention may be exaggerated for clarity. In each figure, common or similar components are given the same reference numerals, and duplicate descriptions of them are omitted.

[Example 1]
FIG. 1 is a configuration diagram showing the neural network system of Example 1. The neural network system comprises a machine learning device 110 and a rationale presentation device 100. Hereinafter, "neural network" is abbreviated as NN.
In the machine learning device 110, a learning unit 111 trains the first NN 112 by machine learning with reference to a data set 113. The rationale presentation device 100 presents the judgment basis for inferences made by the first NN 112 of the machine learning device 110. Hereinafter, the first NN 112 is assumed to be used for the inference processing represented by the function "f(·)".

The control unit of the machine learning device 110 and the control unit of the rationale presentation device 100 are each built around a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or the like (not shown), and perform various kinds of neural network processing by reading a predetermined program from a ROM (Read Only Memory) or the like (not shown) and executing it. Each control unit also has an internal storage unit made up of a RAM (Random Access Memory), a hard disk drive, a flash memory, or the like, in which various information is stored.

The rationale presentation device 100 has, as processing units implemented by the control unit, an input unit 101, a map generation unit 102, a synthesis unit 103, an inference unit 104, and an evaluation unit 105.
The rationale presentation device 100 has, as data stored in the storage unit, a second NN 121 and a saliency map 122. Hereinafter, the second NN 121 is assumed to be used for the inference processing represented by the function "g(·)".
The following first describes the data of the rationale presentation device 100 (the second NN 121 and the saliency map 122), and then describes each processing unit in order with reference to FIG. 5.

FIG. 2 shows an example of the processing of the synthesis unit 103, which synthesizes input image data of the data set 113 with the saliency map 122.
As an example of the data set 113, the following uses teacher data consisting of pairs of a dog image 301 input to the input layer of the first NN 112 and the determination result "a dog is shown" output from the output layer of the first NN 112.
One saliency map 122 "m" is generated for each input data "x" of the data set 113, and "m" and "x" have the same size and the same dimensions. FIG. 2 uses, as an example of the saliency map 122, a saliency map image 302 that visualizes the region where the dog's face is located.
The region of interest 302a, where the dog's face is located, is the region of the saliency map image 302 that is important for obtaining the determination result "a dog is shown" from the dog image 301. The blacked-out region of the saliency map image 302 outside the region of interest is referred to as the "mask region".

The synthesis unit 103 creates a composite image 303 by superimposing the saliency map image 302 on the dog image 301. In the composite image 303, the dog's face shown in the region of interest 303a and the mask region outside the region of interest 303a are visualized so that they can be distinguished.
Although FIG. 2 illustrates a blacked-out region as the mask region of the saliency map image 302, the mask region may instead be set so as to blur the dog image 301 or to mix noise into it.

The main purpose of the rationale presentation device 100 is to create an appropriate saliency map 122 as judgment basis data for the output of the first NN 112. For example, the saliency map image 302 in FIG. 2 is an appropriate saliency map 122: it reliably extracts the region of interest 302a, where the dog's face that contributes to the determination result is located, while setting a sufficiently wide mask region over the other parts that do not contribute to the determination result.

In the saliency map image 311 of FIG. 3, by contrast, the region of interest 311a is too wide. The dog's face is certainly extracted reliably (reference numeral 312), but because the mask region is too narrow, parts that do not contribute to the determination result, such as the dog's legs, are extracted as well (reference numeral 312a). The saliency map image 311 is therefore an inappropriate saliency map 122.

In the saliency map image 321 of FIG. 4, the region of interest 321a is too narrow, so only part of the dog's face (the nose and mouth) is extracted (reference numeral 322). With only part of the face as the region of interest 322a, the dog would be misclassified as another similar animal, such as a raccoon dog. The saliency map image 321 is therefore also an inappropriate saliency map 122.
This concludes the description of various examples of the saliency map 122.

Returning to FIG. 1, the second NN 121 is a neural network for outputting the saliency map 122 from the input data of the data set 113. Inference using the second NN 121 is expressed as "m = g(x)". The second NN 121 is designed so that m lies in the range 0 < m < 1. For example, a normalization process may be inserted immediately before the output layer of the second NN 121.
Where it helps the learning of the saliency map 122, some rule may also be imposed on the distribution of m, to the extent that backpropagation remains applicable. For example, an activation function that sets values of m at or below a certain threshold to 0 may be added before the output of the second NN 121.
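As a minimal sketch of the two output-side options just described: the text does not name a specific normalization, so the sigmoid used here is an assumption, and `to_unit_interval`, `zero_below_threshold`, and the threshold value are hypothetical names and values chosen for illustration.

```python
import math

def to_unit_interval(logits):
    """Squash raw second-NN outputs into (0, 1); sigmoid is an assumed choice."""
    return [1.0 / (1.0 + math.exp(-v)) for v in logits]

def zero_below_threshold(m, threshold):
    """Set map values at or below the threshold to 0 and keep the rest."""
    return [v if v > threshold else 0.0 for v in m]

m = to_unit_interval([-2.0, 0.0, 2.0])
m_sparse = zero_below_threshold(m, threshold=0.2)
```

As with ReLU, elements that are zeroed by the threshold receive no gradient, while the remaining elements are still trainable by backpropagation.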

The internal structure of the second NN 121 is not particularly limited, and a structure suited to the task can be chosen freely. For example, a neural network of stacked fully connected layers may be used, or a convolutional neural network may be used. A convolutional neural network is disclosed, for example, in the paper "U-Net: Convolutional Networks for Biomedical Image Segmentation" by Olaf Ronneberger et al.

FIG. 5 is a flowchart showing the processing of the rationale presentation device 100. In this flowchart, a preparation step (S101), the learning steps of the second NN 121 (S102 to S107), and a step that determines whether learning of the second NN 121 has finished (S108) are executed in order. The learning steps of the second NN 121 are repeated until the end-of-learning determination is satisfied, so the parameters of the second NN 121 gradually improve.
Each step of the flowchart is described in detail below.

At the start of learning of the second NN 121, the inference unit 104 acquires the first NN 112 (the trained model) as preparation (S101). Any neural network can be used as the first NN 112, regardless of its internal structure.
At the start of learning of the second NN 121, the map generation unit 102 initializes the parameters of the second NN 121 (S102). The parameters of the second NN 121 are, for example, the synapse weights "w" of the neural network. Because the parameters of the second NN 121 are variable rather than fixed, the second and subsequent passes through S102 perform a parameter update instead.

The input unit 101 acquires input data "x" (the dog image 301 in FIG. 2) from the data set 113 (S103). The acquired input data "x" is some or all of the input data of the data set 113 used for training the first NN 112.
The format of the data set 113 is not particularly limited. Image data may be used, or time-series data may be used. The parameters of the trained first NN 112 are fixed values; no update is performed on them, and the partial derivative values are calculated by backpropagation.
The map generation unit 102 generates the saliency map 122 (the saliency map image 302 in FIG. 2) by inference in which the input data "x" is input to the second NN 121 (S104). The map generation unit 102 outputs the generated saliency map 122 to the user as the rationale for the first NN 112.

The synthesis unit 103 creates synthetic data "y" (the composite image 303 in FIG. 2) by synthesizing the input data "x" and the saliency map 122 (S105). This creation process is expressed as "y = x·m = x·g(x)". In other words, the synthesis unit 103 weights the input data "x" by the saliency map 122 "m".
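The weighting in S105 can be sketched as an element-wise product, assuming "x" and "m" are flattened to lists of equal length. The function name `synthesize` and the sample values are hypothetical.

```python
def synthesize(x, m):
    """Weight each input element by the matching saliency value: y = x * m."""
    if len(x) != len(m):
        raise ValueError("x and m must have the same size")
    return [xi * mi for xi, mi in zip(x, m)]

x = [0.8, 0.2, 0.6, 0.4]   # toy input data (assumed values)
m = [1.0, 0.0, 0.5, 0.0]   # toy saliency map: 0 masks an element, 1 keeps it
y = synthesize(x, m)
```

With pixel intensities as the input, a map value of 0 produces the blacked-out mask region and a value of 1 leaves the pixel unchanged.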

The inference unit 104 obtains inference data "z" (for example, "a dog is shown") by inference in which the synthetic data "y" is input to the first NN 112 (the trained model), and notifies the evaluation unit 105 of it (S106). The inference processing by the inference unit 104 is expressed as "z = f(y) = f(x·m)".
Because the processing of the map generation unit 102 in S102 and S104 is placed before the processing of the inference unit 104 in S106, an already built first NN 112 can be used regardless of the internal structure of the neural network.

The evaluation unit 105 compares the inference data "z" with the correct value of the inference data using a loss function "E1(·)", which is an evaluation function, and calculates a loss score as the comparison result (S107). The correct value of the inference data is the inference result obtained when the input data "x" is input to the first NN 112. The loss function "E1(·)" can be expressed, for example, by the following (Formula 1).

Here, λ denotes a coefficient.

The first term on the right-hand side of Formula 1 represents the result of comparing the inference data "z" with the correct value of the inference data. The mean squared error is used as the comparison function here, but another loss function, such as the mean absolute error, may be used.
The second term on the right-hand side of Formula 1 is a regularization term that keeps the feature portion (region of interest) of the saliency map 122 appropriately small. If the saliency map 122 is all ones (for example, when there is almost no mask region, as in FIG. 3), the loss score from the first term is always 0, because the input is still correctly determined to be a dog. With only the first term on the right-hand side of Formula 1, learning of the saliency map 122 may therefore fail to progress.
A regularization term that drives the values of the saliency map 122 "m" down is therefore needed. With this second term, the overly large region of interest in FIG. 3 can be made to converge to the appropriately sized region of interest in FIG. 2. The square of m is used here, but another regularization expression may be used.
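The body of Formula 1 is not reproduced in this text, so the following is only a sketch consistent with the description of its two terms: a mean squared error between "z" and the correct value f(x), plus λ times the sum of squared map values. The exact normalization of each term and the name `loss_e1` are assumptions.

```python
def loss_e1(z, z_ref, m, lam):
    """Assumed form: mean squared error plus lam * sum of squared map values."""
    mse = sum((ref - out) ** 2 for ref, out in zip(z_ref, z)) / len(z)
    regularizer = lam * sum(v * v for v in m)
    return mse + regularizer

z_ref = [1.0, 0.0]   # f(x): output of the first NN for the unmasked input
z = [1.0, 0.0]       # f(x*m): here the masked input gives the same output

# An all-ones map reproduces the output but is penalized by the second term.
loss_all_ones = loss_e1(z, z_ref, m=[1.0, 1.0, 1.0, 1.0], lam=0.1)
# A smaller map that still reproduces the output gets a lower loss score.
loss_small_map = loss_e1(z, z_ref, m=[1.0, 0.0, 0.0, 0.0], lam=0.1)
```

The comparison illustrates why the second term is needed: both maps make the first term zero, and only the regularizer distinguishes the overly wide map from the compact one.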

When a learning end condition (a predetermined condition) is satisfied, such as the loss score falling below a certain threshold (Yes in S108), the evaluation unit 105 ends the learning. When the end condition is not satisfied (No in S108), the process returns to S102 and learning continues. Completing a fixed number of learning iterations may be used as an alternative end condition.

In the next pass through S102, the evaluation unit 105 instructs the map generation unit 102 to update the parameters of the second NN 121 based on the loss score, using backpropagation as in ordinary neural network training. This carries out the learning of the parameters of the second NN 121. The instruction to the map generation unit 102 includes, for example, as the gradient described in Non-Patent Document 1, the value obtained by partially differentiating the loss score with respect to the weights of the second NN 121.
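The loop S102 to S108 can be illustrated with a deliberately small toy, under several loudly stated assumptions: the frozen first NN is replaced by a fixed linear scorer, the second NN is reduced to one trainable logit per input element (so the map does not depend on "x" here), and the loss is a squared error plus λ times the sum of squared map values. All names and hyperparameters are hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def first_nn(x, w):
    """Frozen toy 'first NN': a fixed linear scorer (assumption)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def loss_and_grad(theta, x, w, lam):
    """Loss score and its partial derivatives with respect to the map logits."""
    m = [sigmoid(t) for t in theta]                  # S104: generate the map
    y = [xi * mi for xi, mi in zip(x, m)]            # S105: y = x * m
    err = first_nn(x, w) - first_nn(y, w)            # S106: f(x) versus f(y)
    loss = err ** 2 + lam * sum(mi * mi for mi in m) # S107: loss score
    grad = []
    for xi, wi, mi in zip(x, w, m):
        dloss_dm = -2.0 * err * wi * xi + 2.0 * lam * mi
        grad.append(dloss_dm * mi * (1.0 - mi))      # chain rule through sigmoid
    return loss, grad

def train(x, w, lam=0.05, lr=0.5, max_steps=300, loss_threshold=1e-3):
    theta = [0.0] * len(x)                           # S102 (first pass): initialize
    history = []
    for _ in range(max_steps):                       # end condition: step budget
        loss, grad = loss_and_grad(theta, x, w, lam)
        history.append(loss)
        if loss < loss_threshold:                    # S108: end condition on loss
            break
        theta = [t - lr * g for t, g in zip(theta, grad)]  # S102 (later passes)
    return [sigmoid(t) for t in theta], history

# Only the first input element influences the frozen scorer.
m_final, history = train(x=[1.0, 1.0, 1.0], w=[2.0, 0.0, 0.0])
```

In this toy, the map value for the element the scorer depends on rises toward 1, while the values for the irrelevant elements are pushed down by the regularization term, and the weights of the frozen scorer are never modified.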

According to Example 1 described above, the map generation unit 102 creates the saliency map 122 using the second NN 121. This saliency map 122 can handle diverse input data "x" at once, and the influence of gradient noise on it is reduced.
Because the loss score is fed back (reflected) from the evaluation unit 105 to the map generation unit 102, machine learning of the second NN 121 is performed automatically, without the user manually adjusting the parameters of the second NN 121. The saliency map 122 and the second NN 121 can therefore be created at low cost.

[Example 2]
FIG. 6 is a configuration diagram showing the neural network system of Example 2.
The machine learning device 110 of Example 1 shown in FIG. 1 and the machine learning device 110 of Example 2 shown in FIG. 6 have the same configuration. Between the rationale presentation device 100 of Example 1 shown in FIG. 1 and the rationale presentation device 200 of Example 2 shown in FIG. 6, components whose reference numerals share the same last two digits correspond to each other. For example, the input unit 101 of FIG. 1 and the input unit 201 of FIG. 6 correspond and have the same configuration.

The difference between Example 1 and Example 2 is the number of saliency maps. Example 1 creates a single saliency map 122 "m" corresponding to the region of interest of the whole image. Example 2 extends this to create a saliency map 222 with multiple (N) elements, one for each of multiple regions of interest. The saliency map 222 is written as the set "M = (m[1], m[2], ..., m[N])", where m[1] denotes the first element of M.
The saliency map 222 "M" is created for each input data "x" of the data set 113.

FIG. 7 is an explanatory diagram showing image data of the first of the multiple regions of interest.
As indicated by reference numeral 411, if the first region of interest is, for example, the ears of the photographed dog, the map generation unit 202 generates an image 412 including a region of interest 412a as the first saliency map 222. By synthesizing this image 412 with the image 301 of FIG. 2, which is the input data, the synthesis unit 203 obtains a composite image 413 including a region of interest 413a.
The verbal description at reference numeral 411, "the ears are the region of interest", is a simplification intended to make it easy to see that the saliency map 222 has N elements. When a user actually specifies a region of interest, the specification is made not in words like these but through the feature specification parameter 223 described later.

FIG. 8 is an explanatory diagram showing image data of the second of the multiple regions of interest.
As indicated by reference numeral 421, if the second region of interest is, for example, the eyes of the photographed dog, the map generation unit 202 generates an image 422 including a region of interest 422a as the second saliency map 222. By synthesizing this image 422 with the image 301 of FIG. 2, which is the input data, the synthesis unit 203 obtains a composite image 423 including a region of interest 423a.
This concludes the example of generating two saliency maps 222 from two regions of interest. Note that the first NN 112 does not directly identify "the dog's ears" or "the dog's eyes" and treat them as regions of interest. Easily visualized features such as "the dog's ears" and "the dog's eyes" are used here only to explain regions of interest that the first NN 112 determines by some criterion of its own.

Returning to FIG. 6, the storage unit of the rationale presentation device 200 stores a feature specification parameter 223, which lets the user specify what kind of N-element saliency map 222 "M" is to be created. The feature specification parameter 223 includes, for example, the following data.
- The number of an intermediate layer of the first NN 112 (the first intermediate-layer neuron, the second intermediate-layer neuron, and so on). This number lets the user specify which neuron's corresponding region of interest is adopted as an element of the saliency map 222.
- The number of channels of the intermediate layer of the first NN 112 (the N above). The map generation unit 202 sets this number of channels as the number of outputs of the second NN 221.

The processing of Example 2 is described below, following the flowchart of FIG. 5 as in Example 1.
The inference unit 204 acquires the first NN 112 at the start of learning (S101). The inference unit 204 also obtains the intermediate layer number specified by the user by referring to the feature specification parameter 223.
At the start of learning of the second NN 221, the map generation unit 202 initializes the parameters of the second NN 221 according to the number of intermediate-layer channels in the feature specification parameter 223 (S102). The second and subsequent passes through S102 perform a parameter update instead.
As in Example 1, the input unit 201 acquires the input data "x" from the data set 113 (S103).

The map generation unit 202 generates the saliency map 222 "M" by inputting the input data "x" to the second NN 221 (S104). This generation process is expressed by the formula "M = g(x)". Each element (i = 1 to N) of the saliency map 222 "M" is generated for each intermediate layer number that the inference unit 204 acquired in S101.

The synthesis unit 203 combines the input data "x" with each element of the saliency map 222 "M" to obtain the composite data "Y = (y[1], y[2], ..., y[N], y[N+1])" (S105). This synthesis process is expressed by the formulas "y[i] = x · m[i]" and "y[N+1] = x". That is, the [N+1]-th element of the composite data "Y" is the input data "x" itself, without synthesis. This [N+1]-th element is used by the evaluation unit 205 to calculate the correct answer value.
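The synthesis step S105 can be sketched as follows. This is an illustration only: it assumes that "·" denotes an element-wise product and that the input data and each map element are flattened into equal-length lists of floats. The function name `synthesize` is hypothetical.

```python
from typing import List

Vector = List[float]


def synthesize(x: Vector, M: List[Vector]) -> List[Vector]:
    """Return Y = (y[1], ..., y[N], y[N+1]).

    y[i] = x * m[i] (element-wise, an assumption) for i = 1..N,
    and y[N+1] = x, the unmodified input data.
    """
    for m in M:
        if len(m) != len(x):
            raise ValueError("each map element must match the input size")
    Y = [[xv * mv for xv, mv in zip(x, m)] for m in M]
    Y.append(list(x))  # the (N+1)-th element carries the raw input
    return Y


x = [1.0, 2.0, 3.0]
M = [[1.0, 0.0, 0.0], [0.0, 0.5, 1.0]]  # N = 2 map elements
Y = synthesize(x, M)
```

With these toy values, `Y` holds two masked copies of `x` followed by `x` itself, so its length is N + 1 = 3.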

The inference unit 204 inputs the composite data "Y" to the first NN 112 to perform inference, and takes the resulting feature vector of the intermediate layer specified by the feature designation parameter 223 as the feature-specific inference data "Z" (S106). Let the function "F(·)" denote the process of inferring the feature-specific inference data "Z" with the first NN 112. This inference process is expressed by the formula "Z[i] = F(y[i])". The inference unit 204 then notifies the evaluation unit 205 of the following two types of data, Z[i] = (z[i][1], ..., z[i][N+1]), as the feature-specific inference data "Z".
- [Inference result data] When i = 1 to N, the inference data "z[i][i]" corresponding to the i-th element of the saliency map 222 "M" is output.
- [Correct answer value data] When i = N+1, the inference data "Z[N+1] = (z[N+1][1], ..., z[N+1][N])" corresponding to the input data "x" is output.
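The selection of the two kinds of data in S106 can be sketched as below. The function `toy_F` is a made-up stand-in for the intermediate layer of the first NN 112 (it is not the network of the embodiment), and `collect_inference_data` is a hypothetical helper name. Indices are 0-based in the code, so z[i][i] in the text corresponds to `Z[i][i]` for i = 0 to N-1.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def toy_F(y: Vector) -> Vector:
    # Stand-in feature extractor with N = 2 channels (assumption).
    return [sum(y), max(y)]


def collect_inference_data(
    F: Callable[[Vector], Vector], Y: List[Vector]
) -> Tuple[List[float], List[float]]:
    """Split Z[i] = F(y[i]) into inference results and correct values."""
    N = len(Y) - 1
    Z = [F(y) for y in Y]
    # Inference result data: the i-th channel of the i-th masked input.
    inferred = [Z[i][i] for i in range(N)]
    # Correct answer value data: all N channels of the unmasked input.
    correct = list(Z[N][:N])
    return inferred, correct


Y = [[1.0, 0.0, 0.0], [0.0, 1.0, 3.0], [1.0, 2.0, 3.0]]
inferred, correct = collect_inference_data(toy_F, Y)
```

Each entry of `inferred` is then compared with the entry of `correct` at the same channel index.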

The evaluation unit 205 compares the inference result data and the correct answer value data of the feature-specific inference data "Z" using the loss function "E2(·)" of Equation 2 below, and calculates the loss score (S107).

As in the first embodiment, the first term on the right side of Equation 2 represents the comparison between the inference result data and the correct answer value data, and the second term on the right side is a regularization term that keeps the saliency map 222 appropriately small.
As in the first embodiment, the evaluation unit 205 determines, based on the value of the loss score, whether the learning end condition (a predetermined condition) is satisfied (S108), and executes the learning process (S102) when the end condition is not satisfied.
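Equation 2 itself is not reproduced in this text, so the following is only a sketch of a loss with the two-term structure described above. The squared error for the first term, the L1 norm for the second term, the weight `lam`, and the function name `loss_e2_sketch` are all assumptions, not the formula of the embodiment.

```python
from typing import List


def loss_e2_sketch(
    inferred: List[float],
    correct: List[float],
    M: List[List[float]],
    lam: float = 0.01,
) -> float:
    """Two-term loss: comparison term plus map-size regularization."""
    if len(inferred) != len(correct):
        raise ValueError("inferred and correct must have the same length")
    # First term (assumed squared error): grows with the mismatch between
    # the inference result data and the correct answer value data.
    fit = sum((z - t) ** 2 for z, t in zip(inferred, correct))
    # Second term (assumed L1 norm): grows with the size of the saliency map.
    size = sum(abs(v) for m in M for v in m)
    return fit + lam * size


score = loss_e2_sketch(
    inferred=[1.0, 3.0],
    correct=[6.0, 3.0],
    M=[[1.0, 0.0, 0.0], [0.0, 0.5, 1.0]],
    lam=0.1,
)
```

A smaller map lowers the second term, while a map that hides the features the first NN 112 relies on raises the first term, so the two terms pull in opposite directions.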

In the second embodiment described above, the map generation unit 202 individually generates each element of the saliency map 222 "M", which has a plurality of elements according to the feature designation parameter 223 specified by the user. As a result, a saliency map 222 "M" can be presented for each feature (region of interest) desired by the user.
Further, as in the first embodiment, the second NN 221 used to generate the saliency map 222 "M" is trained by feeding back the loss score, which is the evaluation result of the evaluation unit 205.
The user can therefore analyze the high-precision saliency map 222 "M" created by the second NN 221 improved through learning, and can interpret the intermediate layer of the first NN 112 in detail.
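The control flow of steps S102 to S108 can be sketched as a generic loop. The callables `init_params`, `compute_loss`, and `update_params`, the threshold-based end condition, and the iteration cap are hypothetical. The embodiment states only that a predetermined condition is evaluated from the loss score. The toy usage replaces the second NN 221 with a single float purely to make the loop runnable.

```python
from typing import Callable, List, Tuple


def train_map_generator(
    init_params: Callable[[], float],
    compute_loss: Callable[[float, float], float],
    update_params: Callable[[float, float, float], float],
    dataset: List[float],
    threshold: float,
    max_iters: int = 100,
) -> Tuple[float, float]:
    params = init_params()                       # S102 (first pass: initialize)
    loss = float("inf")
    for step in range(max_iters):
        x = dataset[step % len(dataset)]         # S103: fetch input data
        loss = compute_loss(params, x)           # S104 to S107: map, synthesis, inference, loss
        if loss <= threshold:                    # S108: assumed end condition
            break
        params = update_params(params, x, loss)  # S102 (later passes: update)
    return params, loss


# Toy stand-ins: the "parameter" is one float pulled toward the input value.
params, loss = train_map_generator(
    init_params=lambda: 0.0,
    compute_loss=lambda p, x: (p - x) ** 2,
    update_params=lambda p, x, loss: p + 0.5 * (x - p),
    dataset=[2.0],
    threshold=1e-3,
)
```

In an actual implementation the update step would presumably backpropagate the loss score through the second NN 221, which this toy does not attempt.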

Although embodiments of the present invention have been described above, the present invention is not limited to them and can be carried out within a range that does not depart from the gist of the claims. For example, the following modifications are conceivable.

In each embodiment, the saliency maps 122 and 222 output by the second NN 121 and 221 have the same dimensions as the input data "x" of the data set 113. Alternatively, the second NN 121, 221 may output a single saliency map 122, 222, which is then duplicated along the dimension direction.
In each embodiment, the saliency maps 122 and 222 output by the second NN 121 and 221 have the same size as the input data "x" of the data set 113. Alternatively, the second NN 121, 221 may output saliency maps 122, 222 smaller than the input data "x", and the output may be enlarged to the same size as the input data "x".
In each embodiment, the correct answer value data used by the evaluation units 105 and 205 is the output of the first NN 112, but the output data (correct answer labels) corresponding to the input data "x" of the data set 113 may be used instead.
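The first two modifications can be sketched as follows. Nearest-neighbor enlargement is one possible way to expand a smaller map and is an assumption here, since the text does not name an interpolation method. Reading "the dimension direction" as the channel dimension of the input is also an assumption. Both function names are hypothetical.

```python
from typing import List

Map2D = List[List[float]]


def upsample_nearest(m: Map2D, out_h: int, out_w: int) -> Map2D:
    """Enlarge a small 2-D map to out_h x out_w by nearest-neighbor lookup."""
    in_h, in_w = len(m), len(m[0])
    return [
        [m[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def replicate_channels(m: Map2D, channels: int) -> List[Map2D]:
    """Copy one 2-D map once per channel (for example, 3 for RGB input)."""
    return [[row[:] for row in m] for _ in range(channels)]


small = [[0.0, 1.0], [0.5, 0.2]]
full = upsample_nearest(small, 4, 4)   # same size as a 4 x 4 input
rgb = replicate_channels(full, 3)      # one copy per input channel
```

Producing a smaller, single-channel map reduces the number of outputs the second NN has to generate, at the cost of a coarser region of interest.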

In the second embodiment, the input unit 201 uses the data set 113 used for training the first NN 112 as the input data "x". Alternatively, the input unit 201 may use, as the input data "x", intermediate layer data of a first NN 112 whose internal structure has been partially modified so that error backpropagation can be performed. The map generation unit 202 can also create a saliency map 222 for the intermediate layer data of the modified first NN 112.
In the second embodiment, the intermediate layer number of the first NN 112 is used as the feature designation parameter 223, but an output layer number may be used instead.
To improve the interpretability of the features of each channel, a third term (regularization term) that increases the variance of the saliency map 222 may be added to the right side of the loss function "E2(·)" of the second embodiment.
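One possible form of such a third term is sketched below. The text does not define how the variance is taken, so the sketch assumes the variance across the N map elements at each position, averaged over positions and subtracted from the loss with a weight `mu`. The names `variance_term` and `loss_with_variance` and the weight are hypothetical.

```python
from typing import List


def variance_term(M: List[List[float]]) -> float:
    """Mean over positions of the variance across the N map elements."""
    n, size = len(M), len(M[0])
    total = 0.0
    for j in range(size):
        values = [m[j] for m in M]
        mean = sum(values) / n
        total += sum((v - mean) ** 2 for v in values) / n
    return total / size


def loss_with_variance(base_loss: float, M: List[List[float]], mu: float = 0.1) -> float:
    # Subtracting the variance rewards maps that differ from channel to channel.
    return base_loss - mu * variance_term(M)


distinct = [[1.0, 0.0], [0.0, 1.0]]   # each channel highlights a different position
identical = [[1.0, 0.0], [1.0, 0.0]]  # every channel highlights the same position
```

Under this assumption, `distinct` yields a lower loss than `identical` for the same base loss, which pushes each channel toward its own region of interest.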

100, 200 Rationale presentation device
101, 201 Input unit
102, 202 Map generation unit
103, 203 Synthesis unit
104, 204 Inference unit
105, 205 Evaluation unit
110 Machine learning device
111 Learning unit
112 First NN
113 Data set
121, 221 Second NN
122, 222 Saliency map
223 Feature designation parameter

Claims

A rationale presentation device comprising:
a map generation unit that creates, as a result of inference using a second neural network from input data of a data set used for training a first neural network, a saliency map showing features of the input data, and outputs the saliency map as the basis of the first neural network;
a synthesis unit that combines the input data and the saliency map to create composite data;
an inference unit that creates inference data as a result of inference using the first neural network from each of the input data and the composite data; and
an evaluation unit that calculates an error between the inference data of the input data and the inference data of the composite data by a loss function and takes the calculation result as a loss score,
wherein the map generation unit updates parameters of the second neural network by reflecting the loss score in the second neural network used in the previous inference.

The rationale presentation device according to claim 1, wherein the evaluation unit calculates the loss score using the loss function having a first term in which the loss score increases as the error increases and a second term in which the loss score increases as the feature portion of the saliency map becomes larger.

The rationale presentation device according to claim 1, wherein the map generation unit outputs one saliency map corresponding to the input data.

The rationale presentation device according to claim 1, further comprising a storage unit that stores a feature designation parameter indicating elements designated from among a plurality of elements that exist in an intermediate layer of the first neural network and show features of the input data,
wherein the map generation unit individually creates and outputs the saliency map for each element indicating a designated feature according to the feature designation parameter.

A rationale presentation method performed by a rationale presentation device having a map generation unit, a synthesis unit, an inference unit, and an evaluation unit, wherein:
the map generation unit creates, as a result of inference using a second neural network from input data of a data set used for training a first neural network, a saliency map showing features of the input data, and outputs the saliency map as the basis of the first neural network;
the synthesis unit combines the input data and the saliency map to create composite data;
the inference unit creates inference data as a result of inference using the first neural network from each of the input data and the composite data;
the evaluation unit calculates an error between the inference data of the input data and the inference data of the composite data by a loss function and takes the calculation result as a loss score; and
the map generation unit updates parameters of the second neural network by reflecting the loss score in the second neural network used in the previous inference.

A rationale presentation program for causing the rationale presentation device according to claim 1 to execute:
a procedure for creating, as a result of inference using the second neural network from the input data of the data set used for training the first neural network, a saliency map showing features of the input data, and outputting the saliency map as the basis of the first neural network;
a procedure for combining the input data and the saliency map to create composite data;
a procedure for creating inference data as a result of inference using the first neural network from each of the input data and the composite data;
a procedure for calculating an error between the inference data of the input data and the inference data of the composite data by a loss function and taking the calculation result as a loss score; and
a procedure for updating the parameters of the second neural network by reflecting the loss score in the second neural network used in the previous inference.