JP2021111299A

JP2021111299A - Learning device, learning method, learning program, identification device, identification method, and identification program

Info

Publication number: JP2021111299A
Application number: JP2020004837A
Authority: JP
Inventors: 恭史国定; Yasushi Kunisada; 素子加賀谷; Motoko Kagaya; 蔵人前野; Kurato Maeno
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2020-01-16
Filing date: 2020-01-16
Publication date: 2021-08-02
Anticipated expiration: 2040-01-16
Also published as: JP7210489B2

Abstract

To present more useful explanatory materials for the judgment basis of neural networks. SOLUTION: A learning device includes an input unit configured to acquire learning data and a correct answer value, an important area estimation unit configured to estimate one or more important areas based on the learning data, a trimming processing unit configured to trim the one or more important areas based on the learning data and information indicating each of the one or more important areas and output the one or more important areas, a feature quantity extraction unit configured to extract a feature quantity based on the one or more important areas and a first neural network, a similarity calculation unit configured to calculate and output a similarity between the feature quantity and a prototype, an inference unit configured to output an inference value based on the similarity, an evaluation unit configured to evaluate the inference value based on the correct answer value and obtain an evaluation result, and an update unit configured to update a weight parameter of the first neural network and the prototype based on the evaluation result. SELECTED DRAWING: Figure 1

Description

The present invention relates to a learning device, a learning method, a learning program, an identification device, an identification method, and an identification program.

In general, the multilayer neural networks that achieve high performance in tasks such as image recognition today consist of a huge number of parameters and a complex model. While machine learning systems of this kind perform well, the basis for a neural network's judgment is difficult to interpret. To address this problem, several methods have been proposed for presenting explanatory material for the basis of a neural network's judgment. For example, a known method presents examples similar to the input data as such explanatory material.

The method described in Patent Document 1, used mainly in the diagnosis of medical images, searches a case database for cases (similar cases) that correspond to image features extracted from an image by machine learning, and presents the similar cases.

The method described in Non-Patent Document 1 learns, for image classification, a typical example (prototype) of each of a plurality of classification classes and, at inference time, presents the prototype most similar to the feature amount of the identification data as an example of explanatory material for the basis of the neural network's judgment. In this method, a layer that calculates the similarity between the feature amount of the identification data and the prototypes is embedded in the model, so the neural network learns to classify on the basis of similarity.

Like the method described in Non-Patent Document 1, the method described in Patent Document 2 learns a typical example (prototype) of each of a plurality of classification classes. However, rather than presenting the learned prototype as is, the method of Patent Document 2 searches the learning data for the data whose feature amount is closest to the prototype and presents the learning data that is found. In doing so, a part of the found learning data (for example, an image), rather than the whole, may be presented as a similar part.

JP-A-2019-125240

Oscar Li and 3 others, "Deep Learning for Case-Based Reasoning through Prototypes: A Neural Network that Explains Its Predictions", [online], [retrieved December 26, 2019], Internet <https://arxiv.org/abs/1710.04806>

Chaofan Chen and 5 others, "This Looks Like That: Deep Learning for Interpretable Image Recognition", [online], [retrieved December 26, 2019], Internet <https://arxiv.org/abs/1806.10574>

Wei Liu and 6 others, "SSD: Single Shot MultiBox Detector", [online], [retrieved December 26, 2019], Internet <https://arxiv.org/abs/1512.02325>

However, in the method described in Patent Document 1 and the method described in Non-Patent Document 1, an example similar to the data input to the neural network is presented as a whole image. With such methods, it is therefore difficult to interpret which part of the whole image presented as a similar example is similar to the data input to the neural network.

In the method described in Non-Patent Document 2, the range similar to the input image as calculated in the feature space (the similar range) is upsampled to the same size as the input image, which makes it possible to present the part of the input image that corresponds to the similar range. However, the relationship between a position in the feature space and a position in the input image is not a simple enlargement or reduction. The neural network may therefore be treating locations other than the presented part as similar parts. That is, the method described in Non-Patent Document 2 does not necessarily present the correct similar part.

It is therefore desirable to provide a technique that makes it possible to present more useful explanatory material for the basis of a neural network's judgment.

To solve the above problem, according to one aspect of the present invention, there is provided a learning device comprising: an input unit that acquires learning data and a correct answer value; an important region estimation unit that estimates one or more important regions based on the learning data; a trimming processing unit that trims the one or more important regions based on the learning data and information indicating each of the one or more important regions, and outputs the one or more important regions; a feature extraction unit that extracts a feature amount based on the one or more important regions and a first neural network; a similarity calculation unit that calculates and outputs a similarity between the feature amount and a prototype; an inference unit that outputs an inference value based on the similarity; an evaluation unit that evaluates the inference value based on the correct answer value to obtain an evaluation result; and an update unit that updates a weight parameter of the first neural network and the prototype based on the evaluation result.

The important region estimation unit may estimate the one or more important regions based on the learning data and a second neural network, and the update unit may update a weight parameter of the second neural network based on the evaluation result.

The inference unit may output the inference value based on the similarity and a third neural network, and the update unit may update a weight parameter of the third neural network based on the evaluation result.

The size of each of the one or more important regions may be variable.

A predetermined constraint may be imposed on the size of each of the one or more important regions.

The size of the feature amount may be variable.

The number of channels of the feature amount may be the same as the number of channels of the prototype, and the similarity calculation unit may output to the inference unit, as the similarity corresponding to a channel, the highest of the similarities between each of one or more locations in the channel data of the feature amount and the channel data of the prototype.
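The per-channel selection described above can be sketched as follows. This is a minimal illustration only: the similarity measure (negative squared Euclidean distance), the nested-list data layout, and the names `similarity` and `channel_similarities` are assumptions made for the example and are not taken from the specification.

```python
from typing import List, Sequence

Vector = Sequence[float]


def similarity(a: Vector, b: Vector) -> float:
    """Assumed similarity: negative squared Euclidean distance (higher means more similar)."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def channel_similarities(features: List[List[Vector]],
                         prototypes: List[Vector]) -> List[float]:
    """Return one similarity per channel.

    features[c] holds the feature vectors found at each location of channel c
    (hypothetical layout); prototypes[c] is the prototype vector of channel c.
    For each channel, only the highest similarity over all locations is kept.
    """
    if len(features) != len(prototypes):
        raise ValueError("feature and prototype channel counts must match")
    return [max(similarity(vec, proto) for vec in locations)
            for locations, proto in zip(features, prototypes)]
```

With this layout, a channel whose feature contains a location identical to the prototype yields the maximum possible similarity of zero under the assumed measure.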

The similarity calculation unit may store, for each channel and for each of some or all of a plurality of pieces of learning data, the similarity output to the inference unit and the feature amount corresponding to that similarity as stored data. The update unit may detect from the stored data, for each channel, the feature amount most similar to the prototype as a similar feature amount, and may associate with the prototype, for each channel, the region data that corresponds to the similar feature amount in the learning data from which the similar feature amount was extracted.
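As a rough sketch of this lookup, the stored data can be modeled as a per-channel list of records, and the similar feature amount found by scanning that list. The record fields (`data_id`, `region`), the similarity measure, and the choice to recompute similarity against the current prototype are all assumptions for illustration; the specification does not fix them in this passage.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class StoredEntry:
    similarity: float                   # similarity that was output to the inference unit
    feature: Sequence[float]            # feature amount that produced that similarity
    data_id: str                        # hypothetical identifier of the learning data
    region: Tuple[int, int, int, int]   # hypothetical region descriptor in that data


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity: negative squared Euclidean distance."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def find_similar_features(stored: Dict[int, List[StoredEntry]],
                          prototypes: Dict[int, Sequence[float]]) -> Dict[int, StoredEntry]:
    """For each channel, pick the stored entry whose feature is most similar to
    the channel's current prototype. The returned entry carries the region data
    that can then be associated with the prototype."""
    result: Dict[int, StoredEntry] = {}
    for channel, entries in stored.items():
        if not entries:
            continue  # nothing stored for this channel yet
        proto = prototypes[channel]
        result[channel] = max(entries, key=lambda e: similarity(e.feature, proto))
    return result
```

Recomputing the similarity, rather than reusing the stored value, is one possible design choice: it stays correct even if the prototype has changed since the entry was stored.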

The update unit may overwrite the prototype with the similar feature amount for each channel.

When the update unit overwrites the prototype with the similar feature amount in the middle of learning, it may stop updating the prototype thereafter.
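The overwrite-then-stop behavior can be illustrated with a small container that ignores further updates once an overwrite has occurred. The class name `PrototypeStore`, the gradient-descent update rule, and the learning rate parameter are hypothetical; the passage above only states that the prototype is updated based on the evaluation result and may be frozen after being overwritten.

```python
from typing import Dict, List, Sequence


class PrototypeStore:
    """Holds one prototype vector per channel (illustrative sketch)."""

    def __init__(self, prototypes: Sequence[Sequence[float]]) -> None:
        self.prototypes: List[List[float]] = [list(p) for p in prototypes]
        self.frozen = False  # becomes True once prototypes have been overwritten

    def gradient_step(self, grads: Sequence[Sequence[float]], lr: float) -> None:
        """Assumed update rule: plain gradient descent. Does nothing when frozen."""
        if self.frozen:
            return
        self.prototypes = [[p - lr * g for p, g in zip(proto, grad)]
                           for proto, grad in zip(self.prototypes, grads)]

    def overwrite(self, similar_features: Dict[int, Sequence[float]]) -> None:
        """Replace each listed channel's prototype with its similar feature
        amount, then stop all further prototype updates."""
        for channel, feature in similar_features.items():
            self.prototypes[channel] = list(feature)
        self.frozen = True
```

Freezing after the overwrite keeps each prototype equal to a feature amount that was actually extracted from learning data, so the region data associated with it remains an exact match.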

According to another aspect of the present invention, there is provided a learning method including: acquiring learning data and a correct answer value; estimating one or more important regions based on the learning data; trimming the one or more important regions based on the learning data and information indicating each of the one or more important regions, and outputting the one or more important regions; extracting a feature amount based on the one or more important regions and a first neural network; calculating and outputting a similarity between the feature amount and a prototype; outputting an inference value based on the similarity; evaluating the inference value based on the correct answer value to obtain an evaluation result; and updating a weight parameter of the first neural network and the prototype based on the evaluation result.

According to another aspect of the present invention, there is provided a learning program for causing a computer to function as a learning device comprising: an input unit that acquires learning data and a correct answer value; an important region estimation unit that estimates one or more important regions based on the learning data; a trimming processing unit that trims the one or more important regions based on the learning data and information indicating each of the one or more important regions, and outputs the one or more important regions; a feature extraction unit that extracts a feature amount based on the one or more important regions and a first neural network; a similarity calculation unit that calculates and outputs a similarity between the feature amount and a prototype; an inference unit that outputs an inference value based on the similarity; an evaluation unit that evaluates the inference value based on the correct answer value to obtain an evaluation result; and an update unit that updates a weight parameter of the first neural network and the prototype based on the evaluation result.

According to another aspect of the present invention, there is provided an identification device comprising: an input unit that acquires identification data and a correct answer value; an important region estimation unit that estimates one or more important regions based on the identification data; a trimming processing unit that trims the one or more important regions based on the identification data and the one or more important regions, and outputs the one or more important regions; a feature extraction unit that extracts a feature amount based on the one or more important regions and a first neural network; a similarity calculation unit that calculates and outputs a similarity between the feature amount and a prototype; an inference unit that outputs an inference value based on the similarity; and a display control unit that performs control so that the region data of learning data corresponding to the prototype is displayed for each channel.

The display control unit may perform control so that information on the region of the identification data corresponding to the similarity output to the inference unit is displayed for each channel.

The display control unit may perform control so that the similarity output to the inference unit, or a value corresponding to that similarity, is displayed as a score for each channel.

The display control unit may perform control so that a predetermined number of pieces of the region data are displayed in descending order of the similarity output to the inference unit.
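Selecting which channels to display can be sketched as a simple ranking. The function name `top_k_channels` and the representation of the per-channel similarities as a flat list indexed by channel are assumptions for the example.

```python
from typing import List, Tuple


def top_k_channels(similarities: List[float], k: int) -> List[Tuple[int, float]]:
    """Return up to k (channel index, similarity) pairs in descending order of
    similarity. The similarity can double as the displayed score."""
    ranked = sorted(enumerate(similarities), key=lambda item: item[1], reverse=True)
    return ranked[:max(k, 0)]
```

For instance, with per-channel similarities `[0.2, 0.9, 0.5]` and a predetermined number of two, channels 1 and 2 would be shown in that order, and the region data associated with each of those channels' prototypes would be displayed.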

According to another aspect of the present invention, there is provided an identification method including: acquiring identification data and a correct answer value; estimating one or more important regions based on the identification data; trimming the one or more important regions based on the identification data and the one or more important regions, and outputting the one or more important regions; extracting a feature amount based on the one or more important regions and a first neural network; calculating and outputting a similarity between the feature amount and a prototype; outputting an inference value based on the similarity; and performing control so that the region data of learning data corresponding to the prototype is displayed for each channel.

According to another aspect of the present invention, there is provided an identification program for causing a computer to function as an identification device comprising: an input unit that acquires identification data and a correct answer value; an important region estimation unit that estimates one or more important regions based on the identification data; a trimming processing unit that trims the one or more important regions based on the identification data and the one or more important regions, and outputs the one or more important regions; a feature extraction unit that extracts a feature amount based on the one or more important regions and a first neural network; a similarity calculation unit that calculates and outputs a similarity between the feature amount and a prototype; an inference unit that outputs an inference value based on the similarity; and a display control unit that performs control so that the region data of learning data corresponding to the prototype is displayed for each channel.

As described above, the present invention provides a technique that makes it possible to present more useful explanatory material for the basis of a neural network's judgment.

FIG. 1 is a diagram showing an example of the functional configuration of the learning device according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining the function of the important region estimation unit in detail.
FIG. 3 is a diagram for explaining the function of the trimming processing unit in detail.
FIG. 4 is a diagram for explaining the function of the feature extraction unit in detail.
FIG. 5 is a diagram for explaining the function of the similarity calculation unit in detail.
FIG. 6 is a diagram for explaining the function of the inference unit in detail.
FIG. 7 is a diagram for explaining the function of the evaluation unit in detail.
FIG. 8 is a diagram for explaining the function of the update unit in detail.
FIG. 9 is a flowchart showing an example of the operation of the learning device according to the embodiment.
FIG. 10 is a diagram showing an example of the functional configuration of the identification device according to the embodiment.
FIG. 11 is a diagram showing an example of the similar part presentation screen.
FIG. 12 is a flowchart showing an example of the operation of the identification device according to the embodiment.
FIG. 13 is a diagram showing the hardware configuration of an information processing device as an example of the learning device according to the embodiment.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. In this specification and the drawings, components having substantially the same functional configuration are given the same reference numeral, and duplicate description of them is omitted.

In this specification and the drawings, a plurality of components having substantially the same functional configuration may be distinguished by appending different numerals to the same reference numeral. However, when there is no particular need to distinguish among such components, only the same reference numeral is used. Similar components in different embodiments may be distinguished by appending different letters to the same reference numeral. However, when there is no particular need to distinguish among similar components in different embodiments, only the same reference numeral is used.

(1. Details of the embodiment)
Next, the details of an embodiment of the present invention are described. The description first covers the learning device 10 (FIG. 1), which trains a neural network based on combinations of learning data and correct answer values, and then covers the identification device 20 (FIG. 10), which outputs an inference value based on the trained neural network and identification data (test data).

The following description mainly assumes that the learning device 10 and the identification device 20 are realized by the same computer. However, the learning device 10 and the identification device 20 may be realized by different computers. In that case, the trained neural network generated by the learning device 10 is provided to the identification device 20. For example, the trained neural network may be provided from the learning device 10 to the identification device 20 via a recording medium or via communication.

(1-1. Configuration of the learning device)
First, a configuration example of the learning device 10 according to the embodiment of the present invention is described. FIG. 1 is a diagram showing an example of the functional configuration of the learning device 10 according to the embodiment of the present invention. As shown in FIG. 1, the learning device 10 includes an input unit 121, an important region estimation unit 122, a trimming processing unit 123, a feature extraction unit 124, a similarity calculation unit 125, an inference unit 126, an evaluation unit 140, and an update unit 150.
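The order in which these units hand data to one another during one learning step can be sketched as below. This is only an outline of the data flow: every unit is passed in as a plain callable, and the function name `training_step` and all parameter names are hypothetical stand-ins rather than the actual implementation of the units.

```python
from typing import Any, Callable, Tuple


def training_step(sample: Any,
                  label: Any,
                  estimate_regions: Callable[[Any], Any],     # important region estimation unit
                  trim: Callable[[Any, Any], Any],            # trimming processing unit
                  extract_features: Callable[[Any], Any],     # feature extraction unit
                  compute_similarity: Callable[[Any], Any],   # similarity calculation unit
                  infer: Callable[[Any], Any],                # inference unit
                  evaluate: Callable[[Any, Any], Any],        # evaluation unit
                  update: Callable[[Any], None]               # update unit
                  ) -> Tuple[Any, Any]:
    """Run one sample through the units in order and return (inference value, evaluation result)."""
    regions = estimate_regions(sample)          # region information for the sample
    crops = trim(sample, regions)               # trimmed important regions
    features = extract_features(crops)          # feature amount
    similarities = compute_similarity(features) # similarity to the prototype
    prediction = infer(similarities)            # inference value
    result = evaluate(prediction, label)        # evaluation against the correct answer value
    update(result)                              # weight parameter and prototype update
    return prediction, result
```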

The embodiment of the present invention mainly assumes that the input unit 121, the important region estimation unit 122, the trimming processing unit 123, the feature extraction unit 124, the similarity calculation unit 125, and the inference unit 126 are constituted by a neural network 120. Hereinafter, a neural network is also referred to as an "NN".

More specifically, the feature extraction unit 124 includes a first neural network (hereinafter also referred to as the "feature extraction NN"), the important region estimation unit 122 includes a second neural network (hereinafter also referred to as the "important region estimation NN"), and the inference unit 126 includes a third neural network (hereinafter also referred to as the "inference NN"). However, the input unit 121, the important region estimation unit 122, the trimming processing unit 123, the feature extraction unit 124, the similarity calculation unit 125, and the inference unit 126 may have any specific configuration.

These blocks include an arithmetic unit such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit), and their functions can be realized by the arithmetic unit loading a program stored in a ROM (Read Only Memory) into a RAM and executing it. A computer-readable recording medium on which the program is recorded may also be provided. Alternatively, these blocks may be constituted by dedicated hardware or by a combination of multiple pieces of hardware. Data required for computation by the arithmetic unit is stored as appropriate in a storage unit (not shown).

The data set 110, the weight parameter 131 of the important region estimation NN, the weight parameter 132 of the feature extraction NN, the weight parameter 133 of the inference NN, the prototype 134, and the stored data 160 are stored in a storage unit (not shown). The storage unit may be constituted by memory such as a RAM (Random Access Memory), a hard disk drive, or flash memory.

In the initial state, initial values are set for the weight parameter 131 of the important region estimation NN, the weight parameter 132 of the feature extraction NN, the weight parameter 133 of the inference NN, and the prototype 134. These initial values may be random values, for example, but any values may be used. For example, they may be trained values obtained in advance by learning. The stored data 160, on the other hand, need not contain anything in the initial state.

(Data set 110)
The data set 110 includes a plurality of pieces of learning data (input data) and the correct answer value of each piece of learning data. The embodiment of the present invention mainly assumes that the learning data is image data (in particular, still image data). However, the type of learning data is not particularly limited. For example, the learning data may be moving image data containing a plurality of frames, or may be acoustic data.

(Input unit 121)
The input unit 121 sequentially acquires combinations of learning data and correct answer values from the data set 110 and sequentially outputs each combination to both the important region estimation unit 122 and the trimming processing unit 123. Each block downstream of the input unit 121 repeatedly executes its own processing in sequence based on the input from the preceding block.

For example, once the input unit 121 has acquired all combinations of learning data and correct answer values from the data set 110, it may repeat, a predetermined number of times, the operation of acquiring the combinations again from the beginning and outputting them again. In that case, each block downstream of the input unit 121 may also repeatedly execute its own processing in sequence based on the renewed input from the preceding block.
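The repeated traversal of the data set can be sketched as a generator. The name `iterate_dataset` and the parameter `passes` (the total number of times the data set is traversed, including the first) are assumptions introduced for this illustration.

```python
from typing import Iterator, Sequence, Tuple, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def iterate_dataset(dataset: Sequence[Tuple[X, Y]], passes: int) -> Iterator[Tuple[X, Y]]:
    """Yield (learning data, correct answer value) pairs in order, restarting
    from the beginning of the data set until `passes` traversals are done."""
    for _ in range(passes):
        for sample, label in dataset:
            yield sample, label
```

Each yielded pair would be handed to the downstream blocks, which process it before the next pair is produced.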

(Important region estimation unit 122)
The important region estimation unit 122 estimates one or more important regions from the learning data based on the learning data output from the input unit 121 and the important region estimation NN. More specifically, the important region estimation unit 122 inputs the learning data to the important region estimation NN and obtains the data output from the important region estimation NN as information indicating each of the one or more important regions (the position and size of each of the one or more important regions). The important region estimation unit 122 outputs the information indicating each of the one or more important regions to the trimming processing unit 123. The function of the important region estimation unit 122 is described in more detail below with reference to FIG. 2.

FIG. 2 is a diagram for explaining the function of the important region estimation unit 122 in detail. FIG. 2 shows learning data G1 output from the input unit 121, in which a "dog" appears as an example of a subject. In this case, the correct answer value of the learning data G1 is assumed to be "dog". However, the subject appearing in the learning data G1 is not limited to a "dog". The important region estimation unit 122 inputs the learning data G1 to the important region estimation NN and, using the weight parameter 131, obtains the information indicating each of the important regions R1 to R4 output from the important region estimation NN (the position and size of each of the important regions R1 to R4).

例えば、重要領域推定部１２２は、学習用データＧ１からあらかじめ指定された数の重要領域を推定する。なお、重要領域の数は限定されないが、後に説明するプロトタイプの精度を高めるためには、プロトタイプのチャネル数以上であるのが望ましい。しかし、重要領域の数は、プロトタイプのチャネル数よりも少なくてもよい。なお、一般的にチャネルとは、１つの入力データに対してニューラルネットワークが抽出する特徴量の先頭の次元のことであるが、本明細書においては、１または複数の重要領域に対して、特徴抽出部１２４が抽出する特徴量の先頭の次元をチャネルと呼ぶ。そのため、チャネル数とは１または複数の重要領域の数と一致するものである。 For example, the important area estimation unit 122 estimates a predetermined number of important areas from the learning data G1. The number of important areas is not limited, but to improve the accuracy of the prototype described later, it is desirable that the number be equal to or greater than the number of channels of the prototype. However, the number of important areas may be smaller than the number of channels of the prototype. In general, a channel is the leading dimension of the feature amount that a neural network extracts from one piece of input data. In this specification, however, the leading dimension of the feature amount that the feature extraction unit 124 extracts from the one or more important areas is called a channel. The number of channels therefore matches the number of important areas.

図２には、重要領域推定部１２２によって、学習用データＧ１から、重要領域Ｒ１（耳）、重要領域Ｒ２（目）、重要領域Ｒ３（口）、重要領域Ｒ４（脚）が推定された例が示されている。すなわち、重要領域推定部１２２によって４つの重要領域が推定された例が示されている。しかし、重要領域推定部１２２によって推定される重要領域の数は限定されない。重要領域推定部１２２によって推定される重要領域の種類も、耳、目、口および脚に限定されない。 FIG. 2 shows an example in which the important area estimation unit 122 estimates the important area R1 (ear), the important area R2 (eye), the important area R3 (mouth), and the important area R4 (leg) from the learning data G1. That is, FIG. 2 shows an example in which four important areas are estimated. However, the number of important areas estimated by the important area estimation unit 122 is not limited, and the types of important areas are not limited to the ears, eyes, mouth, and legs either.

ここで、学習用データをｘとし、学習用データにおける重要領域の位置をｔとし、重要領域のサイズをｓとし、重要領域推定ＮＮの処理を関数ｇ（）とすると、重要領域の位置ｔと、重要領域のサイズｓと、重要領域推定ＮＮの処理を示す関数ｇ（）との関係は、下記の数式（１）によって表現され得る。 Here, let x be the learning data, t the position of an important area in the learning data, s the size of the important area, and g() the function representing the processing of the important area estimation NN. The relationship among the position t, the size s, and the function g() can then be expressed by the following formula (1).

ｓ，ｔ＝ｇ（ｘ）・・・（１） s, t = g (x) ... (1)

例えば、学習用データが画像データのように２次元データである場合、かつ、重要領域の形状が長方形である場合には、重要領域の位置ｔは、２次元データにおける長方形の所定点（例えば、長方形の左上の頂点など）の縦軸座標と横軸座標との組み合わせによって表現され得る。重要領域のサイズｓは、２次元データにおける長方形の縦横それぞれの長さによって表現される。しかし、重要領域の形状は、長方形に限定されず、他の形状（例えば、円形など）であってもよい。 For example, when the learning data is two-dimensional data such as image data and the shape of the important area is a rectangle, the position t of the important area can be expressed as the combination of the vertical-axis and horizontal-axis coordinates of a predetermined point of the rectangle in the two-dimensional data (for example, its upper-left vertex). The size s of the important area is expressed by the vertical and horizontal lengths of the rectangle in the two-dimensional data. However, the shape of the important area is not limited to a rectangle and may be another shape (for example, a circle).

重要領域のサイズｓは、固定されていてもよいが、固定されていなくてもよい（可変であってもよい）。重要領域のサイズｓが可変である場合には、重要領域に基づいて後に提示される類似部位の柔軟性が高まることが期待される。一方、重要領域のサイズｓが学習用データと近すぎる場合には、後に重要領域がトリミング処理部１２３によってトリミングされる意味が薄れてしまう。そこで、重要領域のトリミングが有意義に行われるように、重要領域のサイズに対しては、所定の制約が課されていてもよい。 The size s of the important region may be fixed or may not be fixed (may be variable). When the size s of the important region is variable, it is expected that the flexibility of the similar portion presented later based on the important region is increased. On the other hand, if the size s of the important region is too close to the learning data, the meaning of trimming the important region later by the trimming processing unit 123 is diminished. Therefore, a predetermined constraint may be imposed on the size of the important area so that the important area can be trimmed meaningfully.

例えば、重要領域のサイズｓが所定の範囲に収まるよう、ｇ（ｘ）に対して値域が所定の範囲に限定される関数（例えば、シグモイド関数など）が乗じられてもよい。例えば、ｇ（ｘ）に定数とシグモイド関数とが乗じられれば、重要領域のサイズｓは、０から定数までに収まるようになる。あるいは、ｇ（ｘ）と定数とのいずれか小さい値が重要領域のサイズｓとして採用されれば、重要領域のサイズｓは、定数以下に収まるようになる。例えば、これらの定数が、学習用データのサイズに対して所定の割合（例えば、半分など）に設定されれば、重要領域のサイズｓは、学習用データのサイズの所定の割合以下のサイズになるように制約を受ける。 For example, so that the size s of the important area falls within a predetermined range, g(x) may be multiplied by a function whose range is limited to a predetermined range (for example, a sigmoid function). If g(x) is multiplied by a constant and a sigmoid function, the size s falls between 0 and the constant. Alternatively, if the smaller of g(x) and a constant is adopted as the size s, the size s is at most the constant. If these constants are set to a predetermined ratio (for example, half) of the size of the learning data, the size s is constrained to be no larger than that ratio of the size of the learning data.
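
The two size constraints described above can be sketched as follows. This is only an illustration under stated assumptions: the phrase "multiplied by a constant and a sigmoid function" is read here as applying the sigmoid to the raw network output and scaling the result by the constant, and the names size_via_sigmoid, size_via_min, image_size and max_size are hypothetical, not taken from the specification.

```python
import numpy as np

def size_via_sigmoid(raw, max_size):
    """Squash a raw network output into the range from 0 to max_size."""
    raw = np.asarray(raw, dtype=float)
    return max_size / (1.0 + np.exp(-raw))

def size_via_min(raw, max_size):
    """Adopt the smaller of the raw output and max_size as the size."""
    return np.minimum(np.asarray(raw, dtype=float), max_size)

image_size = 224.0            # hypothetical side length of the learning data
max_size = 0.5 * image_size   # "half" is the example ratio given in the text
raw = np.array([-3.0, 0.0, 5.0, 400.0])   # hypothetical raw outputs of g(x)

print(size_via_sigmoid(raw, max_size))
print(size_via_min(raw, max_size))
```

The sigmoid variant also keeps the size non-negative, whereas the minimum variant only bounds it from above.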

あるいは、重要領域のサイズｓ（または重要領域のサイズｓの二乗）が損失関数に足し合わされた上で、更新部１５０によって重要領域推定ＮＮの重みパラメータ１３１が更新されれば、重要領域のサイズｓが小さくなるように学習が行われるようになる。なお、重要領域推定ＮＮの具体的な構成は、特に限定されない。例えば、重要領域推定ＮＮとしては、畳み込みニューラルネットワーク（例えば、上記した非特許文献３に記載されている畳み込みニューラルネットワークなど）が用いられてもよい。 Alternatively, if the size s of the important area (or the square of the size s) is added to the loss function before the update unit 150 updates the weight parameter 131 of the important area estimation NN, learning proceeds so that the size s of the important area becomes smaller. The specific configuration of the important area estimation NN is not particularly limited. For example, a convolutional neural network (such as the convolutional neural network described in Non-Patent Document 3 above) may be used as the important area estimation NN.
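
Adding the size (or its square) to the loss can be sketched as below. The weighting factor `weight` is an assumption introduced for illustration; the text only says that the size term is added to the loss function and does not mention a coefficient.

```python
import numpy as np

def loss_with_size_penalty(task_loss, sizes, weight=0.01, squared=True):
    """Return the task loss plus a penalty on the important-area sizes.

    weight is a hypothetical coefficient; squared selects s**2 instead of s.
    """
    sizes = np.asarray(sizes, dtype=float)
    penalty = np.sum(sizes ** 2) if squared else np.sum(sizes)
    return float(task_loss + weight * penalty)

# Two hypothetical important areas with sizes 2 and 3.
print(loss_with_size_penalty(1.0, [2.0, 3.0], weight=0.1))
print(loss_with_size_penalty(1.0, [2.0, 3.0], weight=0.1, squared=False))
```

Because larger areas increase the total loss, minimizing it pushes the estimated sizes down, which is the effect the paragraph describes.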

（トリミング処理部１２３）
図１に戻って説明を続ける。トリミング処理部１２３は、入力部１２１から出力された学習用データＧ１と、重要領域推定部１２２から出力された重要領域Ｒ１〜Ｒ４それぞれを示す情報とに基づいて、学習用データＧ１の重要領域Ｒ１〜Ｒ４に対してトリミングを行って重要領域Ｒ１〜Ｒ４を特徴抽出部１２４に出力する。ここで、図３を参照しながら、トリミング処理部１２３の機能についてより詳細に説明する。 (Trimming processing unit 123)
Returning to FIG. 1, the description continues. Based on the learning data G1 output from the input unit 121 and the information indicating each of the important areas R1 to R4 output from the important area estimation unit 122, the trimming processing unit 123 trims the important areas R1 to R4 from the learning data G1 and outputs the important areas R1 to R4 to the feature extraction unit 124. The function of the trimming processing unit 123 is described in more detail below with reference to FIG. 3.

図３は、トリミング処理部１２３の機能の詳細を説明するための図である。図３を参照すると、入力部１２１から出力された学習用データＧ１が示され、重要領域推定部１２２によって推定された重要領域Ｒ１〜Ｒ４それぞれを示す情報（重要領域Ｒ１〜Ｒ４それぞれの位置およびサイズ）が示されている。トリミング処理部１２３は、学習用データＧ１から、重要領域Ｒ１〜Ｒ４に対してトリミングを行う。なお、図３に示されたように、重要領域Ｒ１〜Ｒ４に対するトリミングは、学習用データＧ１のうち重要領域Ｒ１〜Ｒ４以外の領域を除外することを意味し得る。 FIG. 3 is a diagram for explaining the details of the function of the trimming processing unit 123. FIG. 3 shows the learning data G1 output from the input unit 121 and the information indicating each of the important areas R1 to R4 estimated by the important area estimation unit 122 (the position and size of each of the important areas R1 to R4). The trimming processing unit 123 trims the important areas R1 to R4 from the learning data G1. As shown in FIG. 3, trimming the important areas R1 to R4 can mean excluding from the learning data G1 everything other than the important areas R1 to R4.
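
For rectangular areas on two-dimensional data, the trimming step can be sketched as array cropping. The sketch assumes that each position is the (row, column) of the upper-left vertex and each size is (height, width) in whole pixels; the function name trim_regions and the toy 8 × 8 array are hypothetical. Plain integer slicing is not differentiable with respect to position and size, and this section does not say how the trimming is made compatible with backpropagation, so that aspect is left out.

```python
import numpy as np

def trim_regions(data, positions, sizes):
    """Crop rectangular important areas out of a 2-D array.

    positions: (top, left) of each area's upper-left vertex
    sizes:     (height, width) of each area
    """
    regions = []
    for (top, left), (height, width) in zip(positions, sizes):
        regions.append(data[top:top + height, left:left + width].copy())
    return regions

g1 = np.arange(64).reshape(8, 8)               # toy stand-in for learning data G1
positions = [(0, 0), (2, 3), (5, 1), (4, 4)]   # hypothetical positions t
sizes = [(2, 2), (3, 3), (2, 4), (4, 4)]       # hypothetical sizes s
r1, r2, r3, r4 = trim_regions(g1, positions, sizes)
print(r2)
```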

（特徴抽出部１２４）
図１に戻って説明を続ける。特徴抽出部１２４は、トリミング処理部１２３から出力された重要領域Ｒ１〜Ｒ４と特徴抽出ＮＮとに基づいて特徴量を抽出する。より詳細に、特徴抽出部１２４は、特徴抽出ＮＮに重要領域Ｒ１〜Ｒ４を入力させたことに基づいて、特徴抽出ＮＮから出力されるデータを特徴量として得る。特徴抽出部１２４は、特徴量を類似度算出部１２５に出力する。ここで、図４を参照しながら、特徴抽出部１２４の機能についてより詳細に説明する。 (Feature Extraction Unit 124)
The explanation will be continued by returning to FIG. The feature extraction unit 124 extracts a feature amount based on the important regions R1 to R4 output from the trimming processing unit 123 and the feature extraction NN. More specifically, the feature extraction unit 124 obtains the data output from the feature extraction NN as a feature amount based on the fact that the feature extraction NN is input with the important regions R1 to R4. The feature extraction unit 124 outputs the feature amount to the similarity calculation unit 125. Here, the function of the feature extraction unit 124 will be described in more detail with reference to FIG.

図４は、特徴抽出部１２４の機能の詳細を説明するための図である。図４を参照すると、トリミング処理部１２３から出力された重要領域Ｒ１〜Ｒ４が示されている。特徴抽出部１２４は、特徴抽出ＮＮに重要領域Ｒ１〜Ｒ４を入力させ、重みパラメータ１３２を用いて特徴抽出ＮＮから出力される特徴量Ｆ１〜Ｆ４を得る。 FIG. 4 is a diagram for explaining the details of the function of the feature extraction unit 124. With reference to FIG. 4, important regions R1 to R4 output from the trimming processing unit 123 are shown. The feature extraction unit 124 causes the feature extraction NN to input the important regions R1 to R4, and obtains the feature quantities F1 to F4 output from the feature extraction NN using the weight parameter 132.

特徴量Ｆ１〜Ｆ４それぞれのサイズは、固定されていてもよいが、固定されていなくてもよい（可変であってもよい）。特徴量Ｆ１〜Ｆ４それぞれのサイズが可変である場合には、特徴量Ｆ１〜Ｆ４に基づいて後に提示される類似部位の柔軟性が高まることが期待される。なお、特徴量Ｆ１〜Ｆ４それぞれは、特徴量のチャネルデータに相当する。すなわち、本発明の実施形態では、特徴量のチャネル数が４である場合を主に想定するが、特徴量のチャネル数は限定されない。 The size of each of the feature quantities F1 to F4 may be fixed, but may not be fixed (may be variable). When the size of each of the feature amounts F1 to F4 is variable, it is expected that the flexibility of the similar portion presented later based on the feature amounts F1 to F4 will be increased. Each of the feature amounts F1 to F4 corresponds to the channel data of the feature amount. That is, in the embodiment of the present invention, it is mainly assumed that the number of channels of the feature amount is 4, but the number of channels of the feature amount is not limited.

また、特徴抽出ＮＮの具体的な構成は限定されない。例えば、特徴抽出ＮＮとしては、複数の畳み込み層を含んで構成されたニューラルネットワークが用いられてもよい。 Moreover, the specific configuration of the feature extraction NN is not limited. For example, as the feature extraction NN, a neural network composed of a plurality of convolutional layers may be used.

（類似度算出部１２５）
図１に戻って説明を続ける。類似度算出部１２５は、特徴抽出部１２４から出力された特徴量Ｆ１〜Ｆ４とプロトタイプ１３４との類似度を算出する。特徴抽出部１２４から出力される特徴量Ｆ１〜Ｆ４の次元とプロトタイプ１３４の次元とは同じに設定されている。ここでは、特徴量Ｆ１〜Ｆ４とプロトタイプ１３４それぞれが、複数チャネルに分かれた二次元データ（すなわち、三次元データ）である場合を想定するが、次元数は限定されない。そして、類似度算出部１２５は、算出した特徴量Ｆ１〜Ｆ４とプロトタイプ１３４との類似度を推論部１２６に出力する。ここで、図５を参照しながら、類似度算出部１２５の機能についてより詳細に説明する。 (Similarity calculation unit 125)
The explanation will be continued by returning to FIG. The similarity calculation unit 125 calculates the similarity between the feature amounts F1 to F4 output from the feature extraction unit 124 and the prototype 134. The dimensions of the feature quantities F1 to F4 output from the feature extraction unit 124 and the dimensions of the prototype 134 are set to be the same. Here, it is assumed that the feature quantities F1 to F4 and the prototype 134 are two-dimensional data (that is, three-dimensional data) divided into a plurality of channels, but the number of dimensions is not limited. Then, the similarity calculation unit 125 outputs the calculated similarity between the feature quantities F1 to F4 and the prototype 134 to the inference unit 126. Here, the function of the similarity calculation unit 125 will be described in more detail with reference to FIG.

図５は、類似度算出部１２５の機能の詳細を説明するための図である。図５を参照すると、特徴量Ｆ１〜Ｆ４とプロトタイプＰ１〜Ｐ４とが示されている。プロトタイプＰ１〜Ｐ４それぞれは、プロトタイプのチャネルデータに相当する。すなわち、本発明の実施形態では、プロトタイプのチャネル数が４である場合を主に想定するが、プロトタイプのチャネル数は限定されない。 FIG. 5 is a diagram for explaining the details of the function of the similarity calculation unit 125. With reference to FIG. 5, the feature quantities F1 to F4 and the prototypes P1 to P4 are shown. Each of the prototypes P1 to P4 corresponds to the channel data of the prototype. That is, in the embodiment of the present invention, it is mainly assumed that the number of channels of the prototype is 4, but the number of channels of the prototype is not limited.

特徴量Ｆ１〜Ｆ４のチャネル数とプロトタイプＰ１〜Ｐ４のチャネル数とは、同じに設定されている。これによって、類似度算出部１２５によって、特徴量Ｆ１〜Ｆ４とプロトタイプＰ１〜Ｐ４との類似度がチャネルごとに算出され得る。図５に示された例では、類似度算出部１２５によって、特徴量Ｆ１とプロトタイプＰ１との類似度Ｍ１が算出され、特徴量Ｆ２とプロトタイプＰ２との類似度Ｍ２が算出され、特徴量Ｆ３とプロトタイプＰ３との類似度Ｍ３が算出され、特徴量Ｆ４とプロトタイプＰ４との類似度Ｍ４が算出される。 The number of channels of the feature quantities F1 to F4 and the number of channels of the prototypes P1 to P4 are set to be the same. As a result, the similarity calculation unit 125 can calculate the similarity between the feature quantities F1 to F4 and the prototypes P1 to P4 for each channel. In the example shown in FIG. 5, the similarity calculation unit 125 calculates the similarity M1 between the feature amount F1 and the prototype P1, calculates the similarity degree M2 between the feature amount F2 and the prototype P2, and sets the feature amount F3. The degree of similarity M3 with the prototype P3 is calculated, and the degree of similarity M4 between the feature amount F4 and the prototype P4 is calculated.

対応するチャネルにおける特徴量とプロトタイプとの類似度は、どのように算出されてもよい。例えば、類似度算出部１２５は、対応するチャネルにおける特徴量のサイズとプロトタイプのサイズとが同じ場合には、対応するチャネルにおける特徴量とプロトタイプとの類似度を、特徴量とプロトタイプとにおいて対応する要素同士の差分の二乗和（Ｌ２ノルム）を用いて算出することができる。例えば、Ｌ２ノルムの逆数は、要素同士の差分の二乗和が小さいほど（特徴量とプロトタイプとの距離が近いほど）大きくなるため、類似度として好適に用いられ得る。 The similarity between the feature amount and the prototype in corresponding channels may be calculated in any way. For example, when the feature amount and the prototype in corresponding channels have the same size, the similarity calculation unit 125 can calculate the similarity between them using the sum of squares of the differences between corresponding elements (L2 norm). The reciprocal of the L2 norm becomes larger as the sum of squares of the element-wise differences becomes smaller (that is, as the feature amount and the prototype become closer), so it can suitably be used as the similarity.
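
A minimal sketch of the per-channel similarity for the equal-size case follows. It uses the reciprocal of the sum of squared element-wise differences, as the paragraph describes; the small constant `eps` that avoids division by zero and the function name channel_similarity are assumptions added for illustration.

```python
import numpy as np

def channel_similarity(features, prototypes, eps=1e-8):
    """Per-channel similarity between arrays of shape (C, H, W).

    Returns an array of C similarities, one per channel.
    eps is a hypothetical safeguard against division by zero.
    """
    diff = np.asarray(features, dtype=float) - np.asarray(prototypes, dtype=float)
    squared_distance = np.sum(diff ** 2, axis=(1, 2))
    return 1.0 / (squared_distance + eps)

features = np.zeros((2, 2, 2))
prototypes = np.stack([np.zeros((2, 2)), np.ones((2, 2))])
m = channel_similarity(features, prototypes)
print(m)   # channel 0 matches exactly, channel 1 is farther away
```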

あるいは、対応するチャネルにおける特徴量のサイズは、プロトタイプのサイズよりも大きくてもよい。かかる場合、特徴抽出部１２４から出力される特徴量のデータ型は、Ｃ（チャネル）×Ｈ（高さ）×Ｗ（幅）と表現され、プロトタイプのデータ型は、Ｃ（チャネル）×Ｈ’（高さ）×Ｗ’（幅）（ただし、Ｈ＞Ｈ’かつＷ＞Ｗ’）と表現される。 Alternatively, the size of the feature amount in a corresponding channel may be larger than the size of the prototype. In that case, the data shape of the feature amount output from the feature extraction unit 124 is expressed as C (channels) × H (height) × W (width), and the data shape of the prototype is expressed as C (channels) × H' (height) × W' (width), where H > H' and W > W'.

このとき、特徴量をｚとし、特徴量ｚからプロトタイプｐのサイズと同じサイズの切り出し可能な部分的な特徴量をｚ’とする。そして、類似度算出部１２５は、特徴量ｚの１または複数個所それぞれの特徴量ｚ’（すなわち、特徴量ｚから切り出し可能な部分的な特徴量ｚ’の全部または一部）とプロトタイプｐとの類似度の中で最も高い類似度Ｍを、当該チャネルに対応する類似度として推論部１２６に出力すればよい。すなわち、類似度Ｍは、類似度算出部１２５によって以下の数式（２）に示されるように算出されてよい。 In this case, let z be the feature amount, and let z' be a partial feature amount of the same size as the prototype p that can be cut out from z. The similarity calculation unit 125 may then output to the inference unit 126, as the similarity for that channel, the highest similarity M among the similarities between the prototype p and the partial feature amounts z' at one or more locations of z (that is, all or some of the partial feature amounts z' that can be cut out from z). That is, the similarity M may be calculated by the similarity calculation unit 125 as shown in the following formula (2).
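
Formula (2) is not reproduced in this text. From the description, it presumably takes the maximum, over the candidate partial feature amounts z', of the similarity between z' and p. The sketch below illustrates that reading for a single channel, assuming every possible placement is examined with a stride of 1 and the reciprocal of the squared distance is used as the similarity. The function name max_patch_similarity and the constant `eps` are hypothetical.

```python
import numpy as np

def max_patch_similarity(z, p, eps=1e-8):
    """Slide prototype p (H', W') over feature z (H, W) for one channel.

    Returns the highest similarity and the (row, col) of the best patch.
    """
    H, W = z.shape
    h, w = p.shape
    best, best_pos = -np.inf, None
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            patch = z[i:i + h, j:j + w]                       # candidate z'
            sim = 1.0 / (np.sum((patch - p) ** 2) + eps)      # similarity to p
            if sim > best:
                best, best_pos = sim, (i, j)
    return best, best_pos

z = np.zeros((4, 4))
z[1:3, 2:4] = 1.0          # the only location that matches the prototype exactly
p = np.ones((2, 2))
M, location = max_patch_similarity(z, p)
print(M, location)
```

Keeping the location of the best patch alongside M is convenient later, when the area of the learning data that corresponds to the most similar feature has to be identified.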

なお、類似度の算出方法は、かかる例に限定されない。例えば、類似度を算出する関数としては、ニューラルネットワークにおける誤差逆伝播法が適用可能な関数であれば、ニューラルネットワークが解決すべき問題に応じて自由に設定されてよい。類似度算出部１２５によって更新される保存データ１６０については後に説明する。 The method of calculating the degree of similarity is not limited to such an example. For example, the function for calculating the similarity may be freely set according to the problem to be solved by the neural network as long as it is a function to which the backpropagation method in the neural network can be applied. The stored data 160 updated by the similarity calculation unit 125 will be described later.

（推論部１２６）
図１に戻って説明を続ける。推論部１２６は、類似度算出部１２５から出力された類似度Ｍ１〜Ｍ４に基づいて推論を行って推論値を得る。そして、推論部１２６は、推論値を評価部１４０に出力する。ここで、図６を参照しながら、推論部１２６の機能についてより詳細に説明する。 (Inference unit 126)
The explanation will be continued by returning to FIG. The inference unit 126 makes an inference based on the similarity degrees M1 to M4 output from the similarity calculation unit 125, and obtains an inference value. Then, the inference unit 126 outputs the inference value to the evaluation unit 140. Here, the function of the inference unit 126 will be described in more detail with reference to FIG.

図６は、推論部１２６の機能の詳細を説明するための図である。図６を参照すると、類似度算出部１２５から出力された類似度Ｍ１〜Ｍ４が示されている。推論部１２６は、推論ＮＮに類似度Ｍ１〜Ｍ４を入力させ、重みパラメータ１３３を用いて推論ＮＮから出力される推論値を得る。なお、本明細書においては、ニューラルネットワークへのデータの入力に基づいてニューラルネットワークから出力されるデータを得ることを広く「推論」と言う。そのため、学習段階においても「推論」という用語が使用される。 FIG. 6 is a diagram for explaining the details of the function of the inference unit 126. With reference to FIG. 6, the similarity degrees M1 to M4 output from the similarity degree calculation unit 125 are shown. The inference unit 126 causes the inference NN to input the similarity M1 to M4, and obtains the inference value output from the inference NN using the weight parameter 133. In this specification, obtaining data output from a neural network based on input of data to the neural network is broadly referred to as "inference". Therefore, the term "inference" is also used in the learning stage.

推論ＮＮの具体的な構成は、特に限定されない。しかし、推論ＮＮの出力の形式は、学習用データに対応する正解値の形式と合わせて設定されているのがよい。例えば、正解値が分類問題のクラスである場合、推論ＮＮの出力は、クラス数分の長さを有するｏｎｅ−ｈｏｔベクトルであるとよい。 The specific configuration of the inference NN is not particularly limited. However, the output format of the inference NN should be set in accordance with the format of the correct answer value corresponding to the learning data. For example, when the correct answer value is a class of the classification problem, the output of the inference NN may be a one-hot vector having a length corresponding to the number of classes.

（評価部１４０）
図１に戻って説明を続ける。評価部１４０は、入力部１２１によって取得された正解値に基づいて、推論部１２６から出力された推論値を評価して評価結果を得る。そして、評価部１４０は、評価結果を更新部１５０に出力する。ここで、図７を参照しながら、評価部１４０の機能についてより詳細に説明する。 (Evaluation unit 140)
The explanation will be continued by returning to FIG. The evaluation unit 140 evaluates the inference value output from the inference unit 126 based on the correct answer value acquired by the input unit 121, and obtains an evaluation result. Then, the evaluation unit 140 outputs the evaluation result to the update unit 150. Here, the function of the evaluation unit 140 will be described in more detail with reference to FIG. 7.

図７は、評価部１４０の機能の詳細を説明するための図である。図７を参照すると、推論部１２６から出力された推論値が示されている。また、図７を参照すると、入力部１２１によって取得された正解値が示されている。本発明の実施形態では、評価部１４０が、正解値と推論値とに応じた損失関数を評価結果として算出する場合を想定する。ここで、本発明の実施形態において用いられる損失関数は特定の関数に限定されず、一般的なニューラルネットワークにおいて用いられる損失関数と同様の損失関数が用いられてよい。例えば、損失関数は、正解値と推論値との差分に基づく平均二乗誤差であってもよい。 FIG. 7 is a diagram for explaining the details of the function of the evaluation unit 140. With reference to FIG. 7, the inferred value output from the inference unit 126 is shown. Further, referring to FIG. 7, the correct answer value acquired by the input unit 121 is shown. In the embodiment of the present invention, it is assumed that the evaluation unit 140 calculates the loss function corresponding to the correct answer value and the inferred value as the evaluation result. Here, the loss function used in the embodiment of the present invention is not limited to a specific function, and a loss function similar to the loss function used in a general neural network may be used. For example, the loss function may be a mean square error based on the difference between the correct value and the inferred value.
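
As a small illustration of the evaluation step, the sketch below computes a mean squared error between a one-hot correct answer value and an inference value. The three-class setup and the numbers are made up for the example; the text allows any loss function used in ordinary neural networks.

```python
import numpy as np

def mean_squared_error(correct, inferred):
    """Mean of the squared differences between correct and inferred values."""
    correct = np.asarray(correct, dtype=float)
    inferred = np.asarray(inferred, dtype=float)
    return float(np.mean((correct - inferred) ** 2))

correct = np.array([0.0, 1.0, 0.0])     # hypothetical one-hot answer, e.g. "dog"
inferred = np.array([0.1, 0.7, 0.2])    # hypothetical output of the inference NN
loss = mean_squared_error(correct, inferred)
print(loss)
```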

（更新部１５０）
図１に戻って説明を続ける。更新部１５０は、評価部１４０から出力された評価結果に基づいて、重要領域推定ＮＮの重みパラメータ１３１と、特徴抽出ＮＮの重みパラメータ１３２と、推論ＮＮの重みパラメータ１３３と、プロトタイプ１３４との更新を行う。これによって、推論部１２６から出力される推論値が正解値に近づくように、重要領域推定ＮＮの重みパラメータ１３１と、特徴抽出ＮＮの重みパラメータ１３２と、推論ＮＮの重みパラメータ１３３と、プロトタイプ１３４とが訓練され得る。ここで、図８を参照しながら、更新部１５０の機能についてより詳細に説明する。 (Update part 150)
Returning to FIG. 1, the description continues. Based on the evaluation result output from the evaluation unit 140, the update unit 150 updates the weight parameter 131 of the important area estimation NN, the weight parameter 132 of the feature extraction NN, the weight parameter 133 of the inference NN, and the prototype 134. In this way, the weight parameters 131 to 133 and the prototype 134 can be trained so that the inference value output from the inference unit 126 approaches the correct answer value. The function of the update unit 150 is described in more detail below with reference to FIG. 8.

図８は、更新部１５０の機能の詳細を説明するための図である。図８を参照すると、評価部１４０から出力された評価結果が示されている。プロトタイプＰ１〜Ｐ４それぞれは、プロトタイプのチャネルデータに相当する。例えば、更新部１５０は、評価部１４０から出力された評価結果に基づく誤差逆伝播法（バックプロパゲーション）によって、重要領域推定ＮＮの重みパラメータ１３１と、特徴抽出ＮＮの重みパラメータ１３２と、推論ＮＮの重みパラメータ１３３と、プロトタイプ１３４とを更新してよい。 FIG. 8 is a diagram for explaining the details of the function of the update unit 150. FIG. 8 shows the evaluation result output from the evaluation unit 140. Each of the prototypes P1 to P4 corresponds to the channel data of the prototype. For example, the update unit 150 may update the weight parameter 131 of the important area estimation NN, the weight parameter 132 of the feature extraction NN, the weight parameter 133 of the inference NN, and the prototype 134 by error backpropagation based on the evaluation result output from the evaluation unit 140.

本発明の実施形態では、学習装置１０によって、プロトタイプと類似する特徴量（類似特徴量）が抽出された学習用データが検出（探索）される場合を想定する。そして、識別装置２０において、学習装置１０によって検出された学習用データの類似特徴量に対応する領域データが、類似部位として提示される場合を想定する。これによって、類似例全体が提示されるよりも、類似例とテストデータとの類似性が部位ごとに容易に理解されやすくなる。 In the embodiment of the present invention, it is assumed that the learning device 10 detects (searches) the learning data in which the feature amount (similar feature amount) similar to the prototype is extracted. Then, it is assumed that the identification device 20 presents the region data corresponding to the similar feature amount of the learning data detected by the learning device 10 as a similar part. This makes it easier to understand the similarity between the similar example and the test data site by site, rather than presenting the entire similar example.

より詳細に、類似度算出部１２５は、上記したように、複数の学習用データそれぞれに対して特徴量とプロトタイプとの類似度をチャネルごとに算出し、推論部１２６に出力する。そこで、類似度算出部１２５は、所定のタイミングで、推論部１２６に出力した類似度と、当該類似度に対応する当該特徴量とを、チャネルごとに保存データ１６０として保存する。図８には一例として、プロトタイプＰ１に対応するチャネルについて、推論部１２６に出力された複数の類似度（類似度：５０、類似度：１０、・・・、類似度：２０）と、複数の類似度それぞれに対応する特徴量とが保存データ１６０として保存されている例が示されている。しかし、プロトタイプＰ２〜Ｐ４それぞれに対応するチャネルの特徴量と類似度も同様に保存される。 More specifically, as described above, the similarity calculation unit 125 calculates the similarity between the feature amount and the prototype for each of the plurality of learning data for each channel and outputs the similarity to the inference unit 126. Therefore, the similarity calculation unit 125 stores the similarity output to the inference unit 126 and the feature amount corresponding to the similarity as storage data 160 for each channel at a predetermined timing. As an example, FIG. 8 shows a plurality of similarities (similarity: 50, similarity: 10, ..., Similarity: 20) output to the inference unit 126 and a plurality of similarities for the channel corresponding to the prototype P1. An example is shown in which the feature amount corresponding to each degree of similarity is stored as the stored data 160. However, the features and similarity of the channels corresponding to each of the prototypes P2 to P4 are also preserved.

なお、本発明の実施形態では、データセット１１０を用いた学習装置１０による学習が何巡か繰り返し実行された後に（例えば、４回繰り返し実行された後など）、次の巡目（例えば、５巡目など）の類似度と特徴量とが保存される場合を想定する。しかし、類似度と特徴量とが保存されるタイミングは限定されない。類似度算出部１２５は、学習装置１０による学習に用いられた複数の学習用データの一部または全部それぞれにおいて、類似度と特徴量とを保存すればよい。 In the embodiment of the present invention, it is assumed that after learning by the learning device 10 using the data set 110 has been repeated for several rounds (for example, four rounds), the similarities and feature amounts of the next round (for example, the fifth round) are stored. However, the timing at which the similarities and feature amounts are stored is not limited. The similarity calculation unit 125 may store the similarity and the feature amount for each of some or all of the plurality of pieces of learning data used for learning by the learning device 10.

類似度算出部１２５による類似度と特徴量との保存が終わると、更新部１５０は、プロトタイプＰ１と最も類似度が高い特徴量を類似特徴量として保存データ１６０から検出する。図８に示された例では、類似度が「５０」の特徴量が類似特徴量として検出される。更新部１５０は、類似特徴量が抽出された学習用データの当該類似特徴量に対応する領域データを類似部位としてプロトタイプＰ１に対応付ける。なお、更新部１５０は、同様に、領域データを類似部位としてプロトタイプＰ２〜Ｐ４にチャネルごとに対応付ける。 When the similarity calculation unit 125 finishes storing the similarity and the feature amount, the update unit 150 detects the feature amount having the highest similarity with the prototype P1 as the similar feature amount from the stored data 160. In the example shown in FIG. 8, a feature having a similarity of "50" is detected as a similar feature. The update unit 150 associates the region data corresponding to the similar feature amount of the learning data from which the similar feature amount has been extracted with the prototype P1 as a similar part. Similarly, the update unit 150 associates the region data with the prototypes P2 to P4 as similar parts for each channel.

なお、プロトタイプ１３４は、学習が終了するまで、重みパラメータ１３１〜１３３とともに誤差逆伝播法（バックプロパゲーション）によって更新され続けてもよい。しかし、本発明の実施形態では、更新部１５０が、検出した類似特徴量によってプロトタイプ１３４をチャネルごとに上書きする場合を想定する。これによって、類似度の算出に用いられるプロトタイプ１３４と、提示される類似部位との間の整合性が向上し得る。かかる整合性の観点から、更新部１５０は、学習の途中において、類似特徴量によってプロトタイプ１３４を上書きした場合、プロトタイプ１３４の更新を停止するのがよい。 The prototype 134 may be continuously updated by the error backpropagation method (backpropagation) together with the weight parameters 131 to 133 until the learning is completed. However, in the embodiment of the present invention, it is assumed that the update unit 150 overwrites the prototype 134 for each channel by the detected similar feature amount. This can improve the consistency between the prototype 134 used to calculate the similarity and the presented similarity sites. From the viewpoint of such consistency, it is preferable that the update unit 150 stops updating the prototype 134 when the prototype 134 is overwritten by a similar feature amount in the middle of learning.
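
The detection of the similar feature amount and the overwriting of the prototype can be sketched for one channel as follows. The record layout (keys `similarity`, `feature`, `source`), the file names, and the `prototype_frozen` flag are hypothetical; the similarity values 50, 10 and 20 follow the example in FIG. 8.

```python
import numpy as np

def pick_similar_feature(stored):
    """Return the stored record with the highest similarity for one channel."""
    return max(stored, key=lambda record: record["similarity"])

# Hypothetical stored data 160 for the channel of prototype P1.
# "source" stands for the learning data and the area the feature came from.
stored_p1 = [
    {"similarity": 50.0, "feature": np.full((2, 2), 0.9), "source": "image_a, area R1"},
    {"similarity": 10.0, "feature": np.full((2, 2), 0.1), "source": "image_b, area R1"},
    {"similarity": 20.0, "feature": np.full((2, 2), 0.4), "source": "image_c, area R1"},
]

best = pick_similar_feature(stored_p1)
similar_part_p1 = best["source"]          # area data associated with P1
prototype_p1 = best["feature"].copy()     # overwrite P1 with the similar feature
prototype_frozen = True                   # stop further updates of the prototype
print(best["similarity"], similar_part_p1)
```

After the overwrite, the prototype used for the similarity calculation is exactly the feature extracted from the area that will be presented, which is the consistency the paragraph refers to.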

なお、更新部１５０は、学習用データに基づく更新が終わるたびに、学習の終了条件が満たされたか否かを判断する。学習の終了条件が満たされていないと判断した場合には、入力部１２１によって次の学習用データが取得され、重要領域推定部１２２、トリミング処理部１２３、特徴抽出部１２４、類似度算出部１２５、推論部１２６、評価部１４０および更新部１５０それぞれによって、当該次の入力データに基づく各自の処理が再度実行される。一方、更新部１５０によって、学習の終了条件が満たされたと判断された場合には、学習が終了される。 The update unit 150 determines whether or not the learning end condition is satisfied each time the update based on the learning data is completed. When it is determined that the learning end condition is not satisfied, the input unit 121 acquires the next learning data, and the important area estimation unit 122, the trimming processing unit 123, the feature extraction unit 124, and the similarity calculation unit 125. , The inference unit 126, the evaluation unit 140, and the update unit 150, respectively, re-execute their own processing based on the next input data. On the other hand, when the update unit 150 determines that the learning end condition is satisfied, the learning is ended.

なお、学習の終了条件は特に限定されず、ニューラルネットワーク１２０の学習がある程度行われたことを示す条件であればよい。具体的に、学習の終了条件は、損失関数の値が閾値よりも小さいという条件を含んでもよい。あるいは、学習の終了条件は、損失関数の値の変化が閾値よりも小さいという条件（損失関数の値が収束状態になったという条件）を含んでもよい。あるいは、学習の終了条件は、重みパラメータの更新が所定の回数行われたという条件を含んでもよい。あるいは、評価部１４０によって正解値と推論値とに基づいて精度が算出される場合、学習の終了条件は、精度が所定の割合（例えば、９０％など）を超えるという条件を含んでもよい。 The learning end condition is not particularly limited as long as it indicates that the neural network 120 has been trained to some extent. Specifically, the learning end condition may include a condition that the value of the loss function is smaller than a threshold. Alternatively, the learning end condition may include a condition that the change in the value of the loss function is smaller than a threshold (that is, that the value of the loss function has converged). Alternatively, the learning end condition may include a condition that the weight parameters have been updated a predetermined number of times. Alternatively, when the evaluation unit 140 calculates an accuracy based on the correct answer value and the inference value, the learning end condition may include a condition that the accuracy exceeds a predetermined ratio (for example, 90%).
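
The end conditions listed above can be combined in a single check, sketched below. Treating them as alternatives (any one is sufficient) is an assumption, as are the function name should_stop and all default thresholds except the 90% accuracy example from the text.

```python
def should_stop(loss_history, num_updates, accuracy=None, *,
                loss_threshold=1e-3, delta_threshold=1e-5,
                max_updates=10000, accuracy_target=0.90):
    """Return True if any of the example end conditions is met."""
    # Loss value below a threshold.
    if loss_history and loss_history[-1] < loss_threshold:
        return True
    # Change in the loss below a threshold (converged).
    if len(loss_history) >= 2 and abs(loss_history[-1] - loss_history[-2]) < delta_threshold:
        return True
    # Predetermined number of weight updates reached.
    if num_updates >= max_updates:
        return True
    # Accuracy above a predetermined ratio, if an accuracy is available.
    if accuracy is not None and accuracy > accuracy_target:
        return True
    return False

print(should_stop([0.5, 0.4], num_updates=10))
print(should_stop([0.5, 0.4], num_updates=10, accuracy=0.95))
```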

以上、本発明の実施形態に係る学習装置１０の構成例について説明した。 The configuration example of the learning device 10 according to the embodiment of the present invention has been described above.

（１−２．学習装置の動作）
続いて、本発明の実施形態に係る学習装置１０の動作例について説明する。図９は、本発明の実施形態に係る学習装置１０の動作例を示すフローチャートである。まず、図９に示されたように、入力部１２１は、データセット１１０から学習用データおよび正解値の組み合わせを取得する。また、重要領域推定部１２２は、重みパラメータ１３１を取得し、特徴抽出部１２４は、重みパラメータ１３２を取得し、推論部１２６は、重みパラメータ１３３を取得し、類似度算出部１２５は、プロトタイプ１３４を取得する（Ｓ１１）。 (1-2. Operation of learning device)
Subsequently, an operation example of the learning device 10 according to the embodiment of the present invention will be described. FIG. 9 is a flowchart showing an operation example of the learning device 10 according to the embodiment of the present invention. First, as shown in FIG. 9, the input unit 121 acquires a combination of learning data and a correct answer value from the data set 110. Further, the important region estimation unit 122 acquires the weight parameter 131, the feature extraction unit 124 acquires the weight parameter 132, the inference unit 126 acquires the weight parameter 133, and the similarity calculation unit 125 acquires the prototype 134. (S11).

重要領域推定部１２２は、入力部１２１から出力された学習用データと重要領域推定ＮＮとに基づいて学習用データから１または複数の重要領域を推定する（Ｓ１２）。より詳細に、重要領域推定部１２２は、重要領域推定ＮＮに学習用データを入力させ、重みパラメータ１３１を用いて重要領域推定ＮＮから出力されるデータを１または複数の重要領域それぞれを示す情報（１または複数の重要領域それぞれの位置およびサイズ）として得る。重要領域推定部１２２は、１または複数の重要領域それぞれを示す情報をトリミング処理部１２３に出力する。 The important area estimation unit 122 estimates one or more important areas from the learning data based on the learning data output from the input unit 121 and the important area estimation NN (S12). More specifically, the important area estimation unit 122 inputs the learning data to the important area estimation NN and, using the weight parameter 131, obtains the data output from the important area estimation NN as information indicating each of the one or more important areas (the position and size of each important area). The important area estimation unit 122 outputs the information indicating each of the one or more important areas to the trimming processing unit 123.

トリミング処理部１２３は、入力部１２１から出力された学習用データと、重要領域推定部１２２から出力された１または複数の重要領域それぞれを示す情報とに基づいて、学習用データの１または複数の重要領域に対してトリミングを行う（Ｓ１３）。そして、トリミング処理部１２３は、１または複数の重要領域を特徴抽出部１２４に出力する。 Based on the learning data output from the input unit 121 and the information indicating each of the one or more important areas output from the important area estimation unit 122, the trimming processing unit 123 trims the one or more important areas from the learning data (S13). The trimming processing unit 123 then outputs the one or more important areas to the feature extraction unit 124.

特徴抽出部１２４は、トリミング処理部１２３から出力された１または複数の重要領域と特徴抽出ＮＮとに基づいて特徴量を抽出する（Ｓ１４）。より詳細に、特徴抽出部１２４は、特徴抽出ＮＮに重要領域を入力させ、重みパラメータ１３２を用いて特徴抽出ＮＮから出力される特徴量を得る。特徴抽出部１２４は、特徴量を類似度算出部１２５に出力する。 The feature extraction unit 124 extracts a feature amount based on one or a plurality of important regions output from the trimming processing unit 123 and the feature extraction NN (S14). More specifically, the feature extraction unit 124 causes the feature extraction NN to input an important region, and obtains a feature amount output from the feature extraction NN using the weight parameter 132. The feature extraction unit 124 outputs the feature amount to the similarity calculation unit 125.

The similarity calculation unit 125 calculates the similarity between the feature amount output from the feature extraction unit 124 and the prototype 134 (S15). The inference unit 126 performs inference based on the similarity output from the similarity calculation unit 125 and obtains an inference value (S16). More specifically, the inference unit 126 inputs the similarity to the inference NN and, using the weight parameter 133, obtains the inference value output from the inference NN. The inference unit 126 then outputs the inference value to the evaluation unit 140.

The evaluation unit 140 evaluates the inference value output from the inference unit 126 based on the correct answer value acquired by the input unit 121, and obtains an evaluation result (S17). More specifically, the evaluation unit 140 calculates, as the evaluation result, a loss function corresponding to the correct answer value and the inference value. The evaluation unit 140 then outputs the evaluation result to the update unit 150. Based on the evaluation result output from the evaluation unit 140, the update unit 150 updates the weight parameter 131 of the important area estimation NN, the weight parameter 132 of the feature extraction NN, the weight parameter 133 of the inference NN, and the prototype 134 (S18).

Each time an update based on the learning data is completed, the update unit 150 determines whether or not the learning end condition is satisfied (S19). When it determines that the learning end condition is not satisfied ("NO" in S19), the operation returns to S11: the input unit 121 acquires the next learning data, and the important area estimation unit 122, the trimming processing unit 123, the feature extraction unit 124, the similarity calculation unit 125, the inference unit 126, the evaluation unit 140, and the update unit 150 each execute their processing again based on that next data. On the other hand, when the update unit 150 determines that the learning end condition is satisfied ("YES" in S19), the learning ends.
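The control flow of S11 through S19 can be sketched as a loop that performs one update per piece of learning data and checks the end condition after each update. The sketch below is illustrative only: the one-parameter toy model stands in for the three networks and the prototype, and the end condition (loss below a tolerance, or an update budget) is an assumption, since this passage does not define it. The names `train`, `toy_step`, and `params` are hypothetical.

```python
from itertools import cycle
from typing import Callable, Iterable, Tuple

Sample = Tuple[float, float]  # (learning data, correct answer value)


def train(dataset: Iterable[Sample],
          step: Callable[[float, float], float],
          should_stop: Callable[[int, float], bool]) -> int:
    """Run one update per sample until the end condition holds; return the update count."""
    updates = 0
    for data, correct in dataset:        # S11: acquire the next learning data
        loss = step(data, correct)       # S12 to S18: forward pass, evaluation, update
        updates += 1
        if should_stop(updates, loss):   # S19: check the learning end condition
            break
    return updates


# Toy stand-in for the trainable state (weight parameters and prototype).
params = {"w": 0.0}


def toy_step(data: float, correct: float, lr: float = 0.1) -> float:
    """Squared-error loss with a plain gradient-descent update of a single weight."""
    inferred = params["w"] * data
    error = inferred - correct
    params["w"] -= lr * 2.0 * error * data
    return error * error


# Assumed end condition: small loss, or at most 1000 updates.
n_updates = train(cycle([(1.0, 2.0)]), toy_step,
                  lambda n, loss: loss < 1e-4 or n >= 1000)
```

Passing the end condition in as a callable keeps the loop itself independent of whichever criterion an implementation actually uses.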

The operation example of the learning device 10 according to the embodiment of the present invention has been described above.

(1-3. Configuration of identification device)
Next, a configuration example of the identification device 20 according to the embodiment of the present invention will be described. FIG. 10 is a diagram showing a functional configuration example of the identification device 20 according to the embodiment of the present invention. As shown in FIG. 10, the identification device 20 includes the trained neural network 120 that has been trained by the learning device 10. In addition, the identification device 20 includes a display control unit 220 and a display unit 230.

The display control unit 220 includes an arithmetic unit, and its function can be realized when a program stored in the ROM is loaded into the RAM and executed by the arithmetic unit. A computer-readable recording medium on which the program is recorded may also be provided. Alternatively, these blocks may be configured with dedicated hardware or with a combination of a plurality of pieces of hardware. Data required for calculation by the arithmetic unit is stored as appropriate in a storage unit (not shown).

The display unit 230 is configured with a display. The test data 210, the weight parameter 131 of the important area estimation NN, the weight parameter 132 of the feature extraction NN, the weight parameter 133 of the inference NN, and the prototype 134 are stored in a storage unit (not shown). The storage unit may be configured with a memory such as a RAM, a hard disk drive, or a flash memory.

(Test data 210)
The test data 210 corresponds to identification data. In the embodiment of the present invention, it is mainly assumed that the test data 210 is image data (in particular, still image data), like the learning data. However, the type of the test data 210 is not particularly limited. For example, like the learning data, the test data 210 may be moving image data including a plurality of frames, or may be acoustic data.

(Input unit 121 to inference unit 126)
The input unit 121 acquires the test data 210 and outputs it to both the important area estimation unit 122 and the trimming processing unit 123. The important area estimation unit 122 estimates one or more important areas from the test data 210 based on the test data 210 output from the input unit 121 and the important area estimation NN. The method of estimating the one or more important areas from the test data 210 is the same as the method by which the important area estimation unit 122 in the learning device 10 estimates important areas from the learning data.

The trimming processing unit 123 trims the one or more important areas of the test data based on the test data output from the input unit 121 and the information indicating each of the one or more important areas output from the important area estimation unit 122, and outputs the one or more important areas to the feature extraction unit 124. The method of trimming the one or more important areas of the test data 210 is the same as the method by which the trimming processing unit 123 in the learning device 10 trims the one or more important areas of the learning data.

The feature extraction unit 124 extracts a feature amount based on the one or more important areas output from the trimming processing unit 123 and the feature extraction NN. The method of extracting the feature amount is the same as that used by the feature extraction unit 124 in the learning device 10. The similarity calculation unit 125 calculates the similarity between the feature amount output from the feature extraction unit 124 and the prototype 134. The method of calculating the similarity is the same as that used by the similarity calculation unit 125 in the learning device 10.

The inference unit 126 performs inference based on the similarity output from the similarity calculation unit 125 and obtains an inference value. The method of inference is the same as that used by the inference unit 126 in the learning device 10. For example, when a "dog" appears as the subject in the test data, "dog" may be output as the inference value corresponding to the test data. In that case, one conceivable approach is to present an entire "dog" image as a similar example.

In the embodiment of the present invention, however, it is assumed that the identification device 20 presents the region data corresponding to a similar feature amount as a similar part. This makes the similarity between the similar example and the test data easier to understand part by part than when an entire "dog" image is presented as a similar example. The functions of the display control unit 220 and the display unit 230, which are the blocks for presenting similar parts, are described below.

(Display control unit 220 to display unit 230)
As described above, for each channel, the region data corresponding to the similar feature amount (the feature amount having the highest similarity to the prototype), taken from the learning data from which that similar feature amount was extracted, is associated with the prototype as a similar part. The display control unit 220 therefore controls the display unit 230 so that the similar part of the learning data corresponding to the prototype is displayed for each channel. The presentation of similar parts is described in detail below with reference to FIG. 11.

FIG. 11 is a diagram showing an example of a similar part presentation screen. FIG. 11 shows a similar part presentation screen D1. The display control unit 220 controls the display unit 230 so that the similar part presentation screen D1 is displayed by the display unit 230. In addition to test data G2, the similar part presentation screen D1 includes a similar part R21 (ear), a similar part R22 (eye), a similar part R23 (mouth), and a similar part R24 (leg), each corresponding to a channel.

Presenting the similar parts R21 to R24 in this way makes the similarity between the "dog" serving as the similar example and the test data G2 easier to understand part by part. For example, by focusing on the similar parts R21 to R24, it becomes easier to understand why the similar example and the test data G2 are similar. In the example shown in FIG. 11, all of the similar parts R21 to R24 are displayed, but the display control unit 220 may cause the display unit 230 to display only some of them (for example, a predetermined number in descending order of similarity). This allows the prototypes that contribute most to the inference to be identified. Alternatively, the display control unit 220 may cause the display unit 230 to display a predetermined number of similar parts in ascending order of similarity. This allows the prototypes that contribute least to the inference to be identified.

Further, the display control unit 220 controls the display unit 230 so that a value corresponding to the similarity output to the inference unit 126 for the test data G2 is displayed as a score for each channel. So that the score falls within a predetermined range, the score may be obtained by multiplying the similarity (for example, formula (1) above) by a function whose range is limited to a predetermined range (for example, a sigmoid function). Alternatively, so that the similarity does not diverge to infinity, the score may be obtained by adding a nonzero constant to the denominator of the similarity. Alternatively, the score may be the similarity itself.
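The two bounding options can be sketched as follows. Formula (1) is not reproduced in this passage, so the inverse-distance form of the similarity is purely an assumption chosen because it has a denominator that can reach zero; the sketch also reads the bounded-function option as passing the similarity through a shifted sigmoid, which is one interpretation among several. The names `similarity_from_distance`, `bounded_score`, `eps`, and `scale` are hypothetical.

```python
import math


def similarity_from_distance(distance: float, eps: float = 1e-6) -> float:
    """Assumed inverse-distance similarity; eps in the denominator keeps it finite at distance 0."""
    return 1.0 / (distance + eps)


def bounded_score(similarity: float, scale: float = 100.0) -> float:
    """Map a non-negative similarity into the range 0 to scale with a shifted sigmoid."""
    return scale * (2.0 / (1.0 + math.exp(-similarity)) - 1.0)


# A feature identical to the prototype (distance 0) still yields a finite similarity,
# and the score saturates at the top of the range instead of diverging.
perfect = similarity_from_distance(0.0)
score = bounded_score(perfect)
```

The shift `2 * sigmoid(x) - 1` is used only so that a similarity of zero maps to a score of zero; an unshifted sigmoid would start at half of the range.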

On the similar part presentation screen D1, "50" is displayed as the score corresponding to the similar part R21, "20" as the score corresponding to the similar part R22, "30" as the score corresponding to the similar part R23, and "70" as the score corresponding to the similar part R24. Displaying the scores in this way makes it possible to understand how similar the regions of the test data G2 are to the similar parts R21 to R24.
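Displaying only a predetermined number of similar parts, in descending or ascending order of similarity, amounts to a sort followed by a cut. The sketch below uses the four scores shown on screen D1 as the ranking values; the function name `select_parts` and the dictionary layout are illustrative assumptions.

```python
from typing import Dict, List


def select_parts(similarities: Dict[str, float], k: int, highest: bool = True) -> List[str]:
    """Return the k part labels with the highest (or lowest) similarity."""
    ranked = sorted(similarities, key=lambda label: similarities[label], reverse=highest)
    return ranked[:k]


# Scores from the similar part presentation screen D1.
scores = {"R21": 50.0, "R22": 20.0, "R23": 30.0, "R24": 70.0}
most_similar = select_parts(scores, 2)                  # largest contribution
least_similar = select_parts(scores, 2, highest=False)  # smallest contribution
```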

Further, as shown in FIG. 11, the display control unit 220 preferably controls the display unit 230 so that information on the region of the test data G2 corresponding to each similarity output to the inference unit 126 (the position and size of the region) is displayed. This makes it easier to understand which region of the test data G2 corresponds to each similar part. In the example shown in FIG. 11, information R31 to R34 on the regions of the test data G2 corresponding to the four similarities output to the inference unit 126 is displayed.

The configuration example of the identification device 20 according to the embodiment of the present invention has been described above.

(1-4. Operation of identification device)
Next, an operation example of the identification device 20 according to the embodiment of the present invention will be described. FIG. 12 is a flowchart showing an operation example of the identification device 20 according to the embodiment of the present invention. First, as shown in FIG. 12, the input unit 121 acquires the test data. In addition, the important area estimation unit 122 acquires the weight parameter 131, the feature extraction unit 124 acquires the weight parameter 132, the inference unit 126 acquires the weight parameter 133, and the similarity calculation unit 125 acquires the prototype 134 (S31).

The important area estimation unit 122 estimates one or more important areas from the test data based on the test data output from the input unit 121 and the important area estimation NN (S32). More specifically, the important area estimation unit 122 inputs the test data to the important area estimation NN and, using the weight parameter 131, obtains the data output from the important area estimation NN as information indicating each of the one or more important areas (the position and size of each important area). The important area estimation unit 122 outputs the information indicating each of the one or more important areas to the trimming processing unit 123.

The trimming processing unit 123 trims the one or more important areas of the test data based on the test data output from the input unit 121 and the information indicating each of the one or more important areas output from the important area estimation unit 122 (S33). The trimming processing unit 123 then outputs the one or more important areas to the feature extraction unit 124.

The feature extraction unit 124 extracts a feature amount based on the one or more important areas output from the trimming processing unit 123 and the feature extraction NN (S34). More specifically, the feature extraction unit 124 inputs the important areas to the feature extraction NN and, using the weight parameter 132, obtains the feature amount output from the feature extraction NN. The feature extraction unit 124 outputs the feature amount to the similarity calculation unit 125.

The similarity calculation unit 125 calculates the similarity between the feature amount output from the feature extraction unit 124 and the prototype 134 (S35). The inference unit 126 performs inference based on the similarity output from the similarity calculation unit 125 and obtains an inference value (S36). More specifically, the inference unit 126 inputs the similarity to the inference NN and, using the weight parameter 133, obtains the inference value output from the inference NN. The inference unit 126 then outputs the inference value.
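Steps S32 through S36 form a fixed chain in which each unit consumes the previous unit's output. The sketch below shows only that data flow on a one-dimensional toy signal. Everything concrete in it is an assumption: the `(start, length)` region encoding, the mean used as a stand-in feature extractor, the `1 / (1 + |f - p|)` similarity, the one-to-one pairing of features with prototypes, and the arg-max inference all replace the trained networks and formula (1), which this passage does not reproduce.

```python
from typing import Callable, List, Sequence, Tuple

Region = Tuple[int, int]  # (start, length) in a 1-D signal: hypothetical encoding


def identify(test_data: Sequence[float],
             estimate_regions: Callable[[Sequence[float]], List[Region]],
             extract: Callable[[Sequence[float]], float],
             prototypes: Sequence[float],
             infer: Callable[[List[float]], int]):
    """Run the identification chain and return (inference value, similarities, regions)."""
    regions = estimate_regions(test_data)                        # S32: important areas
    crops = [test_data[s:s + n] for s, n in regions]             # S33: trimming
    features = [extract(c) for c in crops]                       # S34: feature extraction
    sims = [1.0 / (1.0 + abs(f - p))                             # S35: similarity to prototypes
            for f, p in zip(features, prototypes)]
    return infer(sims), sims, regions                            # S36: inference


# Toy stand-ins for the trained components.
label, sims, regions = identify(
    test_data=[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    estimate_regions=lambda data: [(0, 2), (3, 3)],
    extract=lambda crop: sum(crop) / len(crop),
    prototypes=[1.5, 4.0],
    infer=lambda s: max(range(len(s)), key=lambda i: s[i]),
)
```

Returning the similarities and regions alongside the inference value reflects that the display control unit later needs both to show scores and region information.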

Further, the display control unit 220 controls the display unit 230 so that the similar part of the learning data corresponding to the prototype is displayed for each channel. The display control unit 220 also controls the display unit 230 so that a value corresponding to the similarity output to the inference unit 126 for the test data is displayed as a score for each channel (S37). In addition, the display control unit 220 controls the display unit 230 so that information on the region of the test data corresponding to each similarity output to the inference unit 126 (the position and size of the region) is displayed.

The operation example of the identification device 20 according to the embodiment of the present invention has been described above.

(2. Hardware configuration example)
Next, a hardware configuration example of the learning device 10 according to the embodiment of the present invention will be described. The hardware configuration of the identification device 20 according to the embodiment of the present invention can be realized in the same manner.

In the following, a hardware configuration example of an information processing device 900 is described as a hardware configuration example of the learning device 10 according to the embodiment of the present invention. The hardware configuration of the information processing device 900 described below is only one example of the hardware configuration of the learning device 10. Accordingly, unnecessary components may be removed from, or new components may be added to, the hardware configuration of the information processing device 900 described below to form the hardware configuration of the learning device 10.

FIG. 13 is a diagram showing the hardware configuration of the information processing device 900 as an example of the learning device 10 according to the embodiment of the present invention. The information processing device 900 includes a CPU (Central Processing Unit) 901, a ROM (Read Only Memory) 902, a RAM (Random Access Memory) 903, a host bus 904, a bridge 905, an external bus 906, an interface 907, an input device 908, an output device 909, a storage device 910, and a communication device 911.

The CPU 901 functions as an arithmetic processing device and a control device, and controls the overall operation of the information processing device 900 in accordance with various programs. The CPU 901 may be a microprocessor. The ROM 902 stores programs, calculation parameters, and the like used by the CPU 901. The RAM 903 temporarily stores programs used in execution by the CPU 901, parameters that change as appropriate during that execution, and the like. These components are connected to one another by the host bus 904, which is configured with a CPU bus or the like.

The host bus 904 is connected via the bridge 905 to the external bus 906, such as a PCI (Peripheral Component Interconnect/Interface) bus. The host bus 904, the bridge 905, and the external bus 906 do not necessarily have to be configured separately; their functions may be implemented on a single bus.

The input device 908 includes input means with which a user inputs information, such as a mouse, a keyboard, a touch panel, buttons, a microphone, switches, and levers, and an input control circuit that generates an input signal based on the user's input and outputs it to the CPU 901. By operating the input device 908, a user of the information processing device 900 can input various data to the information processing device 900 and instruct it to perform processing operations.

The output device 909 includes, for example, display devices such as a CRT (Cathode Ray Tube) display device, a liquid crystal display (LCD) device, an OLED (Organic Light Emitting Diode) device, and a lamp, and audio output devices such as a speaker.

The storage device 910 is a device for storing data. The storage device 910 may include a storage medium, a recording device that records data on the storage medium, a reading device that reads data from the storage medium, a deletion device that deletes data recorded on the storage medium, and the like. The storage device 910 is configured with, for example, an HDD (Hard Disk Drive). The storage device 910 drives a hard disk and stores programs executed by the CPU 901 and various data.

The communication device 911 is, for example, a communication interface configured with a communication device or the like for connecting to a network. The communication device 911 may support either wireless communication or wired communication.

The hardware configuration example of the learning device 10 according to the embodiment of the present invention has been described above.

(3. Summary)
As described above, the embodiment of the present invention provides a technique that makes it possible to present more useful explanatory material for the judgment basis of a neural network. More specifically, according to the embodiment of the present invention, the important areas are trimmed so that regions other than the important areas are excluded, the feature amount is then extracted, and the similarity between the feature amount and the prototype is calculated. As a result, a similar part can be presented, namely the part of the learning data corresponding to the similar feature amount (the feature amount similar to the prototype) that was extracted from that learning data. The similarity between the similar example and the test data is therefore easier to understand part by part than when the entire similar example is presented.

Although preferred embodiments of the present invention have been described in detail above with reference to the accompanying drawings, the present invention is not limited to these examples. It is clear that a person having ordinary knowledge in the technical field to which the present invention belongs could conceive of various changes or modifications within the scope of the technical idea described in the claims, and these are naturally understood to belong to the technical scope of the present invention.

For example, the above description has mainly covered the case where the display control unit 220 in the identification device 20 causes the display unit 230 to display the similar parts, the scores, and the information on the regions of the test data. However, like the identification device 20, the learning device 10 may cause the display unit 230 to display, for each channel, the similar part of the learning data corresponding to the prototype, the similarity output to the inference unit 126 or a value corresponding to that similarity (a score), and the information on the region of the learning data corresponding to the similarity output to the inference unit 126.

10 Learning device
110 Data set
120 Neural network
121 Input unit
122 Important area estimation unit
123 Trimming processing unit
124 Feature extraction unit
125 Similarity calculation unit
126 Inference unit
131 to 133 Weight parameters
134 Prototype
140 Evaluation unit
150 Update unit
160 Stored data
20 Identification device
220 Display control unit
230 Display unit

Claims

A learning device comprising:
an input unit that acquires learning data and a correct answer value;
an important area estimation unit that estimates one or more important areas based on the learning data;
a trimming processing unit that trims the one or more important areas based on the learning data and information indicating each of the one or more important areas, and outputs the one or more important areas;
a feature extraction unit that extracts a feature amount based on the one or more important areas and a first neural network;
a similarity calculation unit that calculates and outputs a similarity between the feature amount and a prototype;
an inference unit that outputs an inference value based on the similarity;
an evaluation unit that evaluates the inference value based on the correct answer value and obtains an evaluation result; and
an update unit that updates a weight parameter of the first neural network and the prototype based on the evaluation result.

The important region estimation unit estimates the one or a plurality of important regions based on the learning data and the second neural network.
The update unit updates the weight parameter of the second neural network based on the evaluation result.
The learning device according to claim 1.

The inference unit outputs the inference value based on the similarity and the third neural network.
The update unit updates the weight parameter of the third neural network based on the evaluation result.
The learning device according to claim 1 or 2.

The size of each of the one or more important regions is variable.
The learning device according to any one of claims 1 to 3.

Predetermined constraints are imposed on the size of each of the one or more important areas.
The learning device according to any one of claims 1 to 4.

The size of the feature amount is variable.
The learning device according to any one of claims 1 to 5.

The number of channels of the feature amount is the same as the number of channels of the prototype, and
the similarity calculation unit outputs to the inference unit, as the similarity corresponding to each channel, the highest of the similarities between one or more pieces of channel data of the feature amount and the channel data of the prototype.
The learning device according to any one of claims 1 to 6.

The similarity calculation unit saves the similarity output to the inference unit and the feature amount corresponding to the similarity in a part or all of the plurality of learning data as storage data for each channel.
The update unit detects a feature amount having the highest degree of similarity to the prototype as a similar feature amount from the stored data for each channel, and the region corresponding to the similar feature amount of the learning data from which the similar feature amount is extracted. Associate data with the prototype for each channel,
The learning device according to any one of claims 1 to 7.
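The storing and lookup described here can be sketched as below. The record layout (`StoredEntry`), the class `ChannelStore`, and the use of a sample index plus a rectangular region to identify the region data are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Region = Tuple[int, int, int, int]  # assumed layout: (x, y, width, height)


@dataclass
class StoredEntry:
    similarity: float      # similarity that was output to the inference unit
    feature: List[float]   # feature amount corresponding to that similarity
    sample_id: int         # index of the learning data it was extracted from
    region: Region         # region of that learning data


class ChannelStore:
    """Keeps stored data separately for each channel."""

    def __init__(self, num_channels: int) -> None:
        self.entries: List[List[StoredEntry]] = [[] for _ in range(num_channels)]

    def save(self, channel: int, entry: StoredEntry) -> None:
        self.entries[channel].append(entry)

    def most_similar(self, channel: int) -> StoredEntry:
        """Return the stored entry with the highest similarity for one channel."""
        return max(self.entries[channel], key=lambda e: e.similarity)


def associate_regions(store: ChannelStore) -> Dict[int, Tuple[int, Region]]:
    """Map each channel to the (sample_id, region) of its similar feature amount."""
    result: Dict[int, Tuple[int, Region]] = {}
    for channel, entries in enumerate(store.entries):
        if entries:  # skip channels for which nothing was stored
            best = store.most_similar(channel)
            result[channel] = (best.sample_id, best.region)
    return result


store = ChannelStore(num_channels=2)
store.save(0, StoredEntry(0.3, [0.1, 0.2], sample_id=5, region=(0, 0, 8, 8)))
store.save(0, StoredEntry(0.8, [0.4, 0.1], sample_id=7, region=(4, 4, 8, 8)))
store.save(1, StoredEntry(0.6, [0.9, 0.9], sample_id=2, region=(2, 0, 6, 6)))
associations = associate_regions(store)
```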

The update unit overwrites the prototype for each channel with the similar feature amount.
The learning device according to claim 8.

When the prototype is overwritten by the similar feature amount in the middle of learning, the update unit stops updating the prototype.
The learning device according to claim 9.
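One way to picture overwriting the prototype and then stopping its updates is the sketch below. The class `PrototypeBank`, the plain gradient-descent update, and the `frozen` flag are hypothetical; the claims state only that the prototype is overwritten with the similar feature amount and that updating stops once this happens partway through learning.

```python
from typing import List


class PrototypeBank:
    """Holds one prototype vector per channel."""

    def __init__(self, prototypes: List[List[float]]) -> None:
        self.prototypes = [list(p) for p in prototypes]
        self.frozen = False  # becomes True once the prototypes are overwritten

    def gradient_update(self, grads: List[List[float]], lr: float) -> None:
        """Ordinary learning update (assumed here to be gradient descent)."""
        if self.frozen:
            return  # updating has been stopped
        self.prototypes = [[v - lr * g for v, g in zip(proto, grad)]
                           for proto, grad in zip(self.prototypes, grads)]

    def overwrite(self, similar_features: List[List[float]]) -> None:
        """Replace each channel's prototype with its similar feature amount
        and stop further prototype updates."""
        if len(similar_features) != len(self.prototypes):
            raise ValueError("one similar feature amount is required per channel")
        self.prototypes = [list(f) for f in similar_features]
        self.frozen = True


bank = PrototypeBank([[0.0, 0.0]])
bank.gradient_update([[1.0, -1.0]], lr=0.5)   # prototype becomes [-0.5, 0.5]
before_overwrite = [list(p) for p in bank.prototypes]
bank.overwrite([[2.0, 3.0]])
bank.gradient_update([[1.0, 1.0]], lr=0.5)    # ignored: updating has stopped
```

Freezing after the overwrite keeps each prototype equal to a feature amount actually extracted from learning data, so the associated region data remains a faithful illustration of the prototype.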

Acquiring learning data and a correct answer value,
Estimating one or more important regions based on the learning data,
Trimming the one or more important regions based on the learning data and information indicating each of the one or more important regions, and outputting the one or more important regions,
Extracting a feature amount based on the one or more important regions and a first neural network,
Calculating and outputting a similarity between the feature amount and a prototype,
Outputting an inference value based on the similarity,
Obtaining an evaluation result by evaluating the inference value based on the correct answer value, and
Updating a weight parameter of the first neural network and the prototype based on the evaluation result.
A learning method including the above steps.
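The order of the steps above can be illustrated with a toy example. Everything concrete in this sketch is an assumption: the data is a 1-D signal, the important region is the fixed-width window with the largest absolute sum, the "first neural network" is reduced to a single scalar weight, the inference value is the similarity itself, the evaluation is a squared error, and gradients are approximated by central differences rather than backpropagation.

```python
from typing import List, Tuple


def estimate_region(signal: List[float], width: int) -> Tuple[int, int]:
    """Stand-in region estimator: the window with the largest absolute sum."""
    starts = range(len(signal) - width + 1)
    start = max(starts, key=lambda s: sum(abs(v) for v in signal[s:s + width]))
    return start, width


def trim(signal: List[float], region: Tuple[int, int]) -> List[float]:
    start, width = region
    return signal[start:start + width]


def extract_feature(patch: List[float], weight: float) -> List[float]:
    """Stand-in for the first neural network: one scalar weight."""
    return [weight * v for v in patch]


def similarity(feature: List[float], prototype: List[float]) -> float:
    d2 = sum((f - p) ** 2 for f, p in zip(feature, prototype))
    return 1.0 / (1.0 + d2)


def loss(signal: List[float], label: float, weight: float,
         prototype: List[float]) -> float:
    """Forward pass followed by evaluation against the correct answer value."""
    patch = trim(signal, estimate_region(signal, len(prototype)))
    inferred = similarity(extract_feature(patch, weight), prototype)
    return (inferred - label) ** 2


def train_step(signal: List[float], label: float, weight: float,
               prototype: List[float], lr: float = 0.1,
               eps: float = 1e-6) -> Tuple[float, List[float], float]:
    """Update the weight and the prototype; return them with the pre-update loss."""
    current = loss(signal, label, weight, prototype)
    grad_w = (loss(signal, label, weight + eps, prototype)
              - loss(signal, label, weight - eps, prototype)) / (2 * eps)
    grad_p = []
    for i in range(len(prototype)):
        up = prototype[:i] + [prototype[i] + eps] + prototype[i + 1:]
        down = prototype[:i] + [prototype[i] - eps] + prototype[i + 1:]
        grad_p.append((loss(signal, label, weight, up)
                       - loss(signal, label, weight, down)) / (2 * eps))
    new_weight = weight - lr * grad_w
    new_prototype = [p - lr * g for p, g in zip(prototype, grad_p)]
    return new_weight, new_prototype, current


signal = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
weight, prototype = 0.5, [0.0, 0.0, 0.0]
losses = []
for _ in range(5):
    weight, prototype, step_loss = train_step(signal, 1.0, weight, prototype)
    losses.append(step_loss)
```

In this toy setting both the weight and the prototype move so that the extracted feature amount and the prototype approach each other, which lowers the evaluation error over the iterations.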

A learning program for causing a computer to function as a learning device comprising:
An input unit that acquires learning data and a correct answer value,
An important region estimation unit that estimates one or more important regions based on the learning data,
A trimming processing unit that trims the one or more important regions based on the learning data and information indicating each of the one or more important regions, and outputs the one or more important regions,
A feature amount extraction unit that extracts a feature amount based on the one or more important regions and a first neural network,
A similarity calculation unit that calculates and outputs a similarity between the feature amount and a prototype,
An inference unit that outputs an inference value based on the similarity,
An evaluation unit that evaluates the inference value based on the correct answer value and obtains an evaluation result, and
An update unit that updates a weight parameter of the first neural network and the prototype based on the evaluation result.

An input unit that acquires identification data and a correct answer value,
An important region estimation unit that estimates one or more important regions based on the identification data,
A trimming processing unit that trims the one or more important regions based on the identification data and the one or more important regions, and outputs the one or more important regions,
A feature amount extraction unit that extracts a feature amount based on the one or more important regions and a first neural network,
A similarity calculation unit that calculates and outputs a similarity between the feature amount and a prototype,
An inference unit that outputs an inference value based on the similarity, and
A display control unit that performs control so that region data of the learning data corresponding to the prototype is displayed for each channel.
An identification device comprising the above units.

The display control unit performs control so that information about the region of the identification data corresponding to the similarity output to the inference unit is displayed for each channel.
The identification device according to claim 13.

The display control unit performs control so that the similarity output to the inference unit, or a value corresponding to the similarity, is displayed as a score for each channel.
The identification device according to claim 13 or 14.

The display control unit performs control so that a predetermined number of pieces of the region data are displayed in descending order of the similarity output to the inference unit.
The identification device according to any one of claims 13 to 15.
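Selecting a predetermined number of entries in descending order of similarity, with the similarity shown as a per-channel score, might look like the following sketch. The `ChannelResult` record, the text output format, and the function names are hypothetical; an actual display control unit would render the region data itself rather than text.

```python
from typing import List, NamedTuple, Tuple


class ChannelResult(NamedTuple):
    channel: int
    similarity: float                   # similarity output to the inference unit
    region: Tuple[int, int, int, int]   # assumed layout: (x, y, width, height)


def select_for_display(results: List[ChannelResult], count: int) -> List[ChannelResult]:
    """Keep the `count` entries with the highest similarity, highest first."""
    return sorted(results, key=lambda r: r.similarity, reverse=True)[:count]


def format_rows(results: List[ChannelResult]) -> List[str]:
    """Render each entry as one text line with its score."""
    return [f"channel {r.channel}: score {r.similarity:.2f}, region {r.region}"
            for r in results]


results = [
    ChannelResult(0, 0.20, (0, 0, 2, 2)),
    ChannelResult(1, 0.90, (1, 1, 2, 2)),
    ChannelResult(2, 0.50, (3, 0, 2, 2)),
]
shown = select_for_display(results, count=2)
rows = format_rows(shown)
```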

Acquiring identification data and a correct answer value,
Estimating one or more important regions based on the identification data,
Trimming the one or more important regions based on the identification data and the one or more important regions, and outputting the one or more important regions,
Extracting a feature amount based on the one or more important regions and a first neural network,
Calculating and outputting a similarity between the feature amount and a prototype,
Outputting an inference value based on the similarity, and
Performing control so that region data of the learning data corresponding to the prototype is displayed for each channel.
An identification method including the above steps.

An identification program for causing a computer to function as an identification device comprising:
An input unit that acquires identification data and a correct answer value,
An important region estimation unit that estimates one or more important regions based on the identification data,
A trimming processing unit that trims the one or more important regions based on the identification data and the one or more important regions, and outputs the one or more important regions,
A feature amount extraction unit that extracts a feature amount based on the one or more important regions and a first neural network,
A similarity calculation unit that calculates and outputs a similarity between the feature amount and a prototype,
An inference unit that outputs an inference value based on the similarity, and
A display control unit that performs control so that region data of the learning data corresponding to the prototype is displayed for each channel.