JP7085605B2 - Model evaluation system, model evaluation method and model evaluation program

Info

Publication number: JP7085605B2
Application number: JP2020182500A
Authority: JP
Inventors: 裕也根本
Original assignee: Mizuho Research and Technologies Ltd
Current assignee: Mizuho Research and Technologies Ltd
Priority date: 2020-10-30
Filing date: 2020-10-30
Publication date: 2022-06-16
Anticipated expiration: 2040-10-30
Also published as: JP2022072841A

Description

The present invention relates to a model evaluation system, a model evaluation method, and a model evaluation program that support the evaluation of a trained model generated by machine learning.

In recent years, techniques have come into use in which a computer recognizes content such as images by means of a trained model generated through deep learning. However, because a trained model makes its judgments as a black box, it is difficult to explain the basis on which the computer recognizes the content. As a result, operating such a model can be difficult when the basis for its judgments is unknown.

Techniques for explaining the basis of a judgment have therefore been studied (Non-Patent Documents 1 and 2). The technique described in Non-Patent Document 1 creates "visual explanations" for decisions from a large class of CNN-based models to make them more transparent. It uses the gradients of an arbitrary target concept flowing into the final convolutional layer to generate a coarse localization map that highlights the regions of the image that are important for predicting the concept.

The technique described in Non-Patent Document 2 uses LIME (Local Interpretable Model-agnostic Explanations), which explains the predictions of a classifier in an interpretable manner by locally learning an interpretable model around the prediction.

Non-Patent Document 1: Ramprasaath R. et al., "Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization", Cornell University, October 7, 2016, [online], arxiv.org site, [retrieved September 22, 2020], Internet <https://arxiv.org/pdf/1610.02391.pdf>
Non-Patent Document 2: Marco Tulio Ribeiro et al., ""Why Should I Trust You": Explaining the Predictions of Any Classifier", Cornell University, February 16, 2016, [online], arxiv.org site, [retrieved September 22, 2020], Internet <https://arxiv.org/pdf/1602.04938.pdf>

However, because the technique described in Non-Patent Document 1 modifies the internals of the deep learning network, it cannot be applied to a model that has already been trained. In addition, the techniques described in Non-Patent Documents 1 and 2 reveal only the region on which the model focuses. Furthermore, with either technique the images are prepared by a person, so arbitrariness cannot be ruled out.

A model evaluation system that solves the above problems includes an evaluation target storage unit that records a trained model, and a control unit that outputs a recognition result using the trained model. The control unit generates a plurality of pieces of sample content, inputs each piece of sample content into the trained model recorded in the evaluation target storage unit to acquire the certainty of the recognition result for that sample content, and outputs an evaluation result regarding feature content of the trained model using the sample content selected according to the certainty.

According to the present invention, a trained model generated by machine learning can be evaluated.

FIG. 1 is an explanatory diagram of the model evaluation system of the first embodiment.
FIG. 2 is an explanatory diagram of the hardware configuration of the first embodiment.
FIG. 3 is an explanatory diagram of the processing procedure of the first embodiment.
FIG. 4 is an explanatory diagram of the black-and-white images of the first embodiment.
FIG. 5 is an explanatory diagram of the processing procedure of the second embodiment.
FIG. 6 is an explanatory diagram of the masking of the second embodiment.
FIG. 7 is an explanatory diagram of the clustering of the second embodiment.

(First Embodiment)
A first embodiment of the model evaluation system, the model evaluation method, and the model evaluation program will be described with reference to FIGS. 1 to 4. This embodiment evaluates a trained model that was generated by machine learning using teacher information and that takes predetermined content (an image) as input and outputs a recognition result (text).
As shown in FIG. 1, the model evaluation system of the present embodiment uses a user terminal 10 and a support server 20 connected via a network.

(Hardware configuration example)
FIG. 2 shows an example hardware configuration of an information processing device H10 that functions as the user terminal 10, the support server 20, and the like.

The information processing device H10 includes a communication device H11, an input device H12, a display device H13, a storage device H14, and a processor H15. This hardware configuration is only an example, and the device may include other hardware.

The communication device H11 is an interface that establishes a communication path with other devices and transmits and receives data. It is, for example, a network interface or a wireless interface.

The input device H12 is a device that accepts input from a user or the like, for example a mouse or a keyboard. The display device H13 is a display, a touch panel, or the like that displays various kinds of information.

The storage device H14 stores the data and programs used to execute the functions of the user terminal 10 and the support server 20. Examples of the storage device H14 include a ROM, a RAM, and a hard disk.

The processor H15 uses the programs and data stored in the storage device H14 to control each process in the user terminal 10 and the support server 20 (for example, the processing in the control unit 21 described later). Examples of the processor H15 include a CPU and an MPU. The processor H15 loads a program stored in the ROM or the like into the RAM and executes the processes corresponding to the various kinds of processing. For example, when an application program of the user terminal 10 or the support server 20 is started, the processor H15 runs the processes that execute each of the operations described later.

The processor H15 is not limited to one that performs software processing for all of the processing it executes. For example, the processor H15 may include a dedicated hardware circuit (for example, an application-specific integrated circuit: ASIC) that performs hardware processing for at least part of the processing it executes. That is, the processor H15 can be configured as circuitry that includes (1) one or more processors that operate according to a computer program (software), (2) one or more dedicated hardware circuits that execute at least part of the various kinds of processing, or (3) a combination thereof. The processor includes a CPU and memories such as a RAM and a ROM, and the memory stores program code or instructions configured to cause the CPU to execute processing. The memory, that is, a computer-readable medium, includes any available medium that can be accessed by a general-purpose or dedicated computer.

(Functions of each information processing device)
The user terminal 10 in FIG. 1 is a computer terminal used by the person in charge of evaluating the trained model.

The support server 20 is a computer system that supports the evaluation of the trained model. The support server 20 includes a control unit 21, an evaluation target storage unit 22, and a feature information storage unit 23.

The control unit 21 performs the processing described later (processing that includes an image processing stage, a prediction stage, an evaluation stage, a cluster analysis stage, and so on). By executing the model evaluation program for this purpose, the control unit 21 functions as an image processing unit 211, a prediction unit 212, an evaluation unit 213, a cluster analysis unit 214, and the like.

The image processing unit 211 adjusts the images used for evaluation. It holds data on the end condition for terminating the feature content generation process. The end condition may be, for example, that the number of feature images recorded in the feature information storage unit 23 has reached a predetermined number.

The prediction unit 212 uses the trained model to output a prediction result.
The evaluation unit 213 evaluates the trained model. It holds data on the reference value that is compared with the certainty output by the trained model.

The cluster analysis unit 214 groups the feature images by means of a clustering process. This clustering process can use, for example, the k-means method applied to the recognition results and the feature images, but it is not limited to the k-means method.

The evaluation target storage unit 22 records the trained model to be evaluated. The trained model is recorded when it is acquired from the user terminal 10. In the present embodiment, the trained model to be evaluated is a prediction model (network) generated by deep learning, specifically a character recognition model that recognizes text contained in an image. The text contained in the image may be, for example, the digit "5".

The feature information storage unit 23 records feature image management records. A feature image management record is recorded when the feature content generation process is executed. It holds data on a feature image and a recognition result.

The feature image data area records data on a feature image (feature content) for which the certainty of the recognized character was equal to or higher than the reference value.
The recognition result data area records data on the character that was recognized in the feature image with a certainty equal to or higher than the reference value. For example, when "5" is recognized in an image with a certainty equal to or higher than the reference value, the digit "5" is recorded as the recognition result.

Next, the processing procedure for evaluating the trained model in the system configured as described above will be explained.
(Feature content generation process)
First, the feature content generation process will be described with reference to FIG. 3.

Here, the control unit 21 of the support server 20 executes a black-and-white image generation process (step S101). Specifically, the image processing unit 211 of the control unit 21 acquires the trained model from the user terminal 10 and records it in the evaluation target storage unit 22. The image processing unit 211 then generates an arbitrary black-and-white image (sample content). For example, it generates a black-and-white image in which white pixels and black pixels are placed at random in a bitmap of a predetermined size.
For example, assume that a black-and-white image 500 is generated as shown in FIG. 4.

Next, the control unit 21 of the support server 20 executes a random pixel selection process (step S102). Specifically, the image processing unit 211 of the control unit 21 selects a pixel at random in the generated black-and-white image. In the present embodiment, one pixel is selected in the bitmap of the black-and-white image generated in step S101.
For example, in FIG. 4, a pixel 501 of the black-and-white image 500 is selected.

Next, the control unit 21 of the support server 20 executes a pixel inversion process (step S103). Specifically, the image processing unit 211 of the control unit 21 inverts the selected pixel between black and white. This generates a black-and-white image (sample content) in which the selected pixel has been changed to black if it was white, or to white if it was black.
Here, in FIG. 4, a black-and-white image 510 is generated by inverting the black pixel 501 to white.
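Steps S101 to S103 can be sketched roughly as follows. This is only an illustration under stated assumptions: the bitmap is represented as a list of rows with 1 for white and 0 for black, and the function names are hypothetical and do not appear in the specification.

```python
import random


def generate_random_bitmap(height, width, rng):
    """Step S101: bitmap with randomly placed white (1) and black (0) pixels."""
    return [[rng.randint(0, 1) for _ in range(width)] for _ in range(height)]


def select_random_pixel(image, rng):
    """Step S102: choose one pixel position at random."""
    row = rng.randrange(len(image))
    col = rng.randrange(len(image[0]))
    return row, col


def invert_pixel(image, row, col):
    """Step S103: return a copy of the image with the selected pixel flipped."""
    flipped = [line[:] for line in image]
    flipped[row][col] = 1 - flipped[row][col]
    return flipped


rng = random.Random(0)                              # seeded only for repeatability
image_500 = generate_random_bitmap(8, 8, rng)       # corresponds to image 500
row, col = select_random_pixel(image_500, rng)      # corresponds to pixel 501
image_510 = invert_pixel(image_500, row, col)       # corresponds to image 510
```

Copying the bitmap before flipping keeps each generated sample intact, which matters once samples are stored as feature images.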

Next, the control unit 21 of the support server 20 executes a prediction process (step S104). Specifically, the prediction unit 212 of the control unit 21 inputs the generated black-and-white image 510 into the trained model in the evaluation target storage unit 22. The prediction unit 212 then acquires the recognition result and the certainty output by the trained model.

Next, the control unit 21 of the support server 20 determines whether the certainty is equal to or higher than the reference value (step S105). Specifically, the prediction unit 212 of the control unit 21 compares the certainty output by the trained model with the reference value.

When the certainty is determined to be equal to or higher than the reference value ("YES" in step S105), the control unit 21 of the support server 20 executes a feature image registration process (step S106). Specifically, the image processing unit 211 of the control unit 21 treats the black-and-white image that was input to the trained model as a feature image, generates a feature image management record that associates it with the recognition result, and records it in the feature information storage unit 23.

On the other hand, when the certainty is determined to be less than the reference value ("NO" in step S105), the control unit 21 of the support server 20 skips the feature image registration process (step S106).
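The branch in steps S104 to S106 might look like the sketch below. The function `toy_model` is a hypothetical stand-in for the trained model (it simply reports the share of white pixels as the certainty of "5"), and the record layout is an assumption made for illustration.

```python
def toy_model(image):
    """Hypothetical stand-in for the trained model: returns (label, certainty)."""
    pixels = [p for row in image for p in row]
    return "5", sum(pixels) / len(pixels)


def register_if_confident(image, model, reference_value, records):
    """Steps S104 to S106: predict, compare with the reference value, record."""
    label, certainty = model(image)                 # S104: prediction
    if certainty >= reference_value:                # S105: threshold check
        records.append({                            # S106: registration
            "feature_image": image,
            "recognition_result": label,
        })
        return True
    return False                                    # S106 is skipped
```

The list `records` plays the role of the feature information storage unit 23 in this sketch.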

Next, the control unit 21 of the support server 20 determines whether to terminate (step S107). Specifically, the image processing unit 211 of the control unit 21 counts the number of feature image management records that have the same recognition result. When the number of records satisfies the end condition, it determines that the process is to end.

When the number of records does not satisfy the end condition and the process is determined not to end ("NO" in step S107), the control unit 21 of the support server 20 repeats the processing from the random pixel selection process (step S102) onward.
For example, as shown in FIG. 4, a black-and-white image 520 is generated by selecting a pixel 502 in the black-and-white image 510 and inverting it.
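Putting steps S101 to S107 together, the generation loop could be sketched as below. The model is again a hypothetical stand-in, and `max_steps` is a safety cap added for this sketch only; the specification describes no such limit.

```python
import random
from collections import Counter


def toy_model(image):
    """Hypothetical stand-in for the trained model: returns (label, certainty)."""
    pixels = [p for row in image for p in row]
    return "5", sum(pixels) / len(pixels)


def generate_feature_images(model, size, reference_value, target_count,
                            rng, max_steps=10_000):
    # S101: start from a random black-and-white bitmap (1 = white, 0 = black).
    image = [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]
    records = []            # stands in for the feature image management records
    per_label = Counter()   # number of records per recognition result
    for _ in range(max_steps):  # safety cap: an assumption, not in the text
        row, col = rng.randrange(size), rng.randrange(size)   # S102
        image = [line[:] for line in image]                   # keep old samples
        image[row][col] = 1 - image[row][col]                 # S103
        label, certainty = model(image)                       # S104
        if certainty >= reference_value:                      # S105
            records.append((image, label))                    # S106
            per_label[label] += 1
        if per_label[label] >= target_count:                  # S107
            break
    return records
```

Each iteration flips one pixel of the previous image, so successive samples form a random walk through image space rather than independent draws.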

On the other hand, when the process is determined to end ("YES" in step S107), the control unit 21 of the support server 20 executes a feature image acquisition process (step S108). Specifically, the cluster analysis unit 214 of the control unit 21 extracts all feature image management records from the feature information storage unit 23 and acquires the feature images recorded in them.

Next, the control unit 21 of the support server 20 executes a feature image clustering process (step S109). Specifically, the cluster analysis unit 214 of the control unit 21 groups the feature images by means of the clustering process.
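Since the k-means method is named as one possible clustering process, a minimal pure-Python version over flattened bitmaps is sketched here. The choice of squared Euclidean distance, the random initialization, and the fixed iteration count are assumptions of this sketch and are not specified in the text.

```python
import random


def flatten(image):
    """Turn a bitmap (list of rows) into a flat feature vector."""
    return [float(p) for row in image for p in row]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(vectors, k, rng, iterations=20):
    """Basic k-means (Lloyd's algorithm); returns (assignment, centroids)."""
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids
```

In use, the feature images that share a recognition result would be flattened with `flatten` and passed to `k_means`, and the returned assignment gives the group of each image.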

Next, the control unit 21 of the support server 20 executes a clustering result output process (step S110). Specifically, the evaluation unit 213 of the control unit 21 outputs the feature images of each group generated by the clustering to the user terminal 10.

According to the present embodiment, the following effects can be obtained.
(1-1) In the present embodiment, the control unit 21 of the support server 20 executes the random pixel selection process (step S102), the pixel inversion process (step S103), and the prediction process (step S104). This makes it possible to evaluate the trained model by calculating the certainty while changing the image a little at a time.

(1-2) In the present embodiment, the control unit 21 of the support server 20 determines whether the certainty is equal to or higher than the reference value (step S105). When the certainty is determined to be equal to or higher than the reference value ("YES" in step S105), the control unit 21 of the support server 20 executes the feature image registration process (step S106). This makes it possible to search, on the basis of the certainty, for characteristic images that produce a given recognition result.

(1-3) In the present embodiment, the control unit 21 of the support server 20 executes the feature image clustering process (step S109). As a result, even when a plurality of feature images are acquired for a recognition result, the features can be output grouped by cluster.

(Second Embodiment)
Next, a second embodiment of the model evaluation system, the model evaluation method, and the model evaluation program will be described. In the first embodiment, clustering is performed on the feature images recorded in the feature information storage unit 23. The second embodiment instead executes a feature area evaluation process, which identifies a characteristic area (feature area) within each feature image and performs clustering on those areas. Parts that are the same as in the first embodiment are given the same reference numerals, and their detailed description is omitted.

In this case, the image processing unit 211 of the control unit 21 masks part of a feature image. The image processing unit 211 holds data on a necessity determination condition for deciding whether the masking process is required. The necessity determination condition may be, for example, a variance value that evaluates the similarity of the feature images. In this case, when the variance value is within the necessity reference value, the masking process is determined to be unnecessary.

In addition, the feature information storage unit 23 records feature area management records. A feature area management record is recorded when the feature area evaluation process is executed. It holds data on a feature area image and a recognition result.

The feature area image data area records data on the image of the area within the feature image that affects the recognition result.
The recognition result data area records data on the text (here, a digit) recognized from the feature area.

(Feature area evaluation process)
Next, the feature area evaluation process will be described with reference to FIG. 5.
First, the control unit 21 of the support server 20 executes a feature image acquisition process (step S201). Specifically, the image processing unit 211 of the control unit 21 extracts all feature image management records from the feature information storage unit 23 and acquires the feature images recorded in them.

Next, the control unit 21 of the support server 20 determines whether the masking process is necessary (step S202). Specifically, the image processing unit 211 of the control unit 21 acquires the clustering result of the feature content generation process. The image processing unit 211 then compares the feature quantities of the feature images and calculates the variance value of their similarity. Finally, the image processing unit 211 compares the variance value with the necessity reference value.
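The determination in step S202 can be illustrated as follows. The specification does not say how similarity is measured, so this sketch assumes the pairwise share of matching pixels and takes the population variance of those values; both choices, and the function names, are assumptions.

```python
from itertools import combinations
from statistics import pvariance


def pixel_agreement(a, b):
    """Assumed similarity measure: share of pixels that match in two bitmaps."""
    flat_a = [p for row in a for p in row]
    flat_b = [p for row in b for p in row]
    return sum(x == y for x, y in zip(flat_a, flat_b)) / len(flat_a)


def masking_required(feature_images, necessity_reference_value):
    """Step S202: masking is needed when the similarity variance is too large."""
    similarities = [
        pixel_agreement(a, b) for a, b in combinations(feature_images, 2)
    ]
    if len(similarities) < 2:
        return False  # assumption: too few pairs to measure any spread
    return pvariance(similarities) > necessity_reference_value
```

A low variance means the feature images already resemble one another to a similar degree, in which case clustering the whole images is treated as sufficient.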

When the variance value exceeds the necessity reference value and masking is determined to be necessary ("YES" in step S202), the control unit 21 of the support server 20 repeats the following processing for each feature image.

Here, the control unit 21 of the support server 20 first executes a partial masking process (step S203). Specifically, the image processing unit 211 of the control unit 21 generates a mask image (mask content) by masking the feature image with a black mask one quarter the size of the entire bitmap of the feature image. For example, it generates a mask image in which the black mask is placed at the upper left of the feature image.
As shown in FIG. 6, a mask image 610 in which a black mask M1 is placed is generated for a feature image 600.

Next, the control unit 21 of the support server 20 executes a certainty calculation process (step S204). Specifically, the prediction unit 212 of the control unit 21 inputs the mask image into the trained model recorded in the evaluation target storage unit 22. The prediction unit 212 then outputs the recognition result and the certainty for the mask image.

Next, the control unit 21 of the support server 20 temporarily stores the decrease in certainty (step S205). Specifically, the evaluation unit 213 of the control unit 21 acquires the certainty that the prediction unit 212 output for the mask image with respect to the recognition result of the feature image. The evaluation unit 213 then calculates the difference between the certainty of the feature image and the certainty of the mask image. Finally, the evaluation unit 213 temporarily stores this difference value in memory in association with the image of the area of the feature image that was covered by the black mask (the mask area).

Next, the control unit 21 of the support server 20 determines whether masking has finished (step S206). Specifically, the image processing unit 211 of the control unit 21 determines that masking has finished when masking has been performed for every placement in the feature image. For example, when masking started from the upper left of the feature image, masking is determined to have finished once the black mask reaches the lower right of the feature image.

When masking is determined not to have finished ("NO" in step S206), the control unit 21 of the support server 20 executes the processing from the partial masking process (step S203) onward. In this case, a new mask image is generated by moving the black mask by one pixel (one row or one column).

As shown in FIG. 6, a mask image 620 is generated by moving the black mask M1 from its position in the mask image 610. After completing steps S203 to S206 for the mask image 620, the control unit 21 of the support server 20 goes on to generate a mask image 630 and subsequent mask images in sequence. Masking is determined to have finished at a mask image 640, in which the black mask has reached the lower right of the feature image 600.
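The sweep of the black mask across the feature image can be sketched as below. Interpreting "one quarter of the bitmap" as half the height by half the width is an assumption of this sketch, as are the function names; black is represented as 0.

```python
def mask_positions(height, width):
    """Yield (top, left, mask_h, mask_w) from upper left to lower right."""
    mask_h, mask_w = height // 2, width // 2   # assumed quarter-size mask
    for top in range(height - mask_h + 1):     # move down one row at a time
        for left in range(width - mask_w + 1): # move right one column at a time
            yield top, left, mask_h, mask_w


def apply_black_mask(image, top, left, mask_h, mask_w):
    """Step S203: return a copy of the image with the mask area set to black."""
    masked = [row[:] for row in image]
    for r in range(top, top + mask_h):
        for c in range(left, left + mask_w):
            masked[r][c] = 0
    return masked
```

For a 4 x 4 bitmap this produces nine placements, the first at the upper left and the last at the lower right, which matches the end condition checked in step S206.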

When masking is determined to have finished ("YES" in step S206), the control unit 21 of the support server 20 identifies the feature area with the largest decrease in certainty (step S207). Specifically, the evaluation unit 213 of the control unit 21 identifies, as the feature area, the mask area with the largest difference value among those temporarily stored in memory. The evaluation unit 213 then generates a feature area management record in which the image of the feature area is associated with the recognition result, and records it in the feature information storage unit 23.
The control unit 21 of the support server 20 repeats the above processing until all feature images have been processed.
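Steps S203 to S207 for a single feature image might be combined as in the sketch below. It assumes a half-height by half-width mask, assumes that the model callable returns the certainty for the same recognition result on masked input, and uses a hypothetical `quadrant_model` whose certainty depends only on the lower-right 2 x 2 pixels of a 4 x 4 bitmap.

```python
def quadrant_model(image):
    """Hypothetical model: certainty of "5" is the share of white pixels
    in the lower-right 2 x 2 block of a 4 x 4 bitmap."""
    return "5", sum(image[r][c] for r in (2, 3) for c in (2, 3)) / 4


def find_feature_area(feature_image, model):
    """Return the mask area whose masking lowers the certainty the most."""
    height, width = len(feature_image), len(feature_image[0])
    mask_h, mask_w = height // 2, width // 2          # assumed mask size
    label, base_certainty = model(feature_image)
    best = None                                       # (drop, top, left)
    for top in range(height - mask_h + 1):
        for left in range(width - mask_w + 1):
            masked = [row[:] for row in feature_image]
            for r in range(top, top + mask_h):
                for c in range(left, left + mask_w):
                    masked[r][c] = 0                  # S203: black mask
            _, masked_certainty = model(masked)       # S204
            drop = base_certainty - masked_certainty  # S205: difference value
            if best is None or drop > best[0]:
                best = (drop, top, left)
    drop, top, left = best                            # S207: largest decrease
    area = [row[left:left + mask_w] for row in feature_image[top:top + mask_h]]
    return {"feature_area_image": area, "recognition_result": label,
            "top": top, "left": left, "drop": drop}
```

The returned dictionary stands in for a feature area management record; when several placements tie, this sketch keeps the first one encountered.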

When the repeated processing has finished for all feature images, the control unit 21 of the support server 20 executes a feature area acquisition process (step S208). Specifically, the cluster analysis unit 214 of the control unit 21 extracts all feature area management records from the feature information storage unit 23 and acquires the feature area images recorded in them.

Next, the control unit 21 of the support server 20 executes a clustering process on the feature areas (step S209). Specifically, the cluster analysis unit 214 of the control unit 21 groups the feature area images by clustering. This makes it possible to identify similar feature area images that share a common recognition result.
Here, as shown in FIG. 7, clustering the feature areas produces the groups G1 to G3.
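The description does not name a clustering algorithm. As one possible illustration, the sketch below groups flattened feature area images with a plain k-means loop. The choice of k-means, the naive initialization from the first `k` points, and all function names are assumptions made for this example.

```python
from typing import List, Sequence

Vector = Sequence[float]


def flatten(region: List[List[int]]) -> List[int]:
    """Turn a 2-D feature area image into a 1-D pixel vector."""
    return [pixel for row in region for pixel in row]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Return a group index for each point (requires k <= len(points))."""
    # Naive deterministic initialization: the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels


regions = [
    [[0, 0], [0, 0]],
    [[1, 1], [1, 1]],
    [[0, 0], [0, 1]],
    [[1, 1], [1, 0]],
]
labels = kmeans([flatten(r) for r in regions], k=2)
```

In practice the number of groups and the distance measure would need to be chosen for the images at hand; the value `k=2` here is only for the toy data.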

Next, the control unit 21 of the support server 20 executes a process for outputting the clustering result (step S211). Specifically, the cluster analysis unit 214 of the control unit 21 generates an average image of the feature area images for each group produced by clustering. The cluster analysis unit 214 then outputs the average images, associated with the recognition result, to the user terminal 10.
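The per-group average image of step S211 can be illustrated as a pixel-wise mean. The sketch assumes that all feature area images have the same dimensions and that group labels are integers; the function name `average_images` is hypothetical.

```python
from typing import Dict, List

Image = List[List[float]]


def average_images(regions: List[Image],
                   labels: List[int]) -> Dict[int, Image]:
    """Return, for each group label, the pixel-wise mean of its images."""
    groups: Dict[int, List[Image]] = {}
    for region, label in zip(regions, labels):
        groups.setdefault(label, []).append(region)

    averages: Dict[int, Image] = {}
    for label, members in groups.items():
        height, width = len(members[0]), len(members[0][0])
        averages[label] = [
            [sum(m[y][x] for m in members) / len(members)
             for x in range(width)]
            for y in range(height)
        ]
    return averages


example_regions = [[[0, 1]], [[1, 1]], [[0, 0]]]
example_labels = [0, 0, 1]
group_averages = average_images(example_regions, example_labels)
```

For binary images, each pixel of the average is the share of group members in which that pixel is white, so it can be displayed as a grayscale image.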

Here, as shown in FIG. 7, average images 701 to 703 of the feature area images belonging to the groups G1 to G3 are generated and output to the user terminal 10.
On the other hand, when it determines that masking is unnecessary ("NO" in step S202), the control unit 21 of the support server 20 clusters the feature images recorded in the feature information storage unit 23, in the same way as in step S109 (step S210).

According to this embodiment, the following effects can be obtained in addition to effects (1-1) to (1-3) above.
(2-1) In this embodiment, the control unit 21 of the support server 20 determines whether masking is necessary (step S202). This makes it possible to decide, from how the feature images were generated, whether the feature area evaluation process needs to be executed.

(2-2) In this embodiment, the control unit 21 of the support server 20 executes a partial masking process (step S203), a certainty calculation process (step S204), and a process for temporarily storing the decrease in certainty (step S205). This makes it possible to evaluate the influence that regions within a feature image have on the recognition result.

(2-3) In this embodiment, the control unit 21 of the support server 20 executes a process for identifying the feature area with the largest decrease in certainty (step S207). This makes it possible to identify the region of the feature image that most affects the output of the trained model.

(2-4) In this embodiment, the control unit 21 of the support server 20 executes a clustering process on the feature areas (step S209). As a result, even when multiple feature area images are obtained for a recognition result, the features can be output summarized by cluster group.

This embodiment can be modified as follows. The embodiment and the modifications below can be combined with one another as long as no technical contradiction arises.
- In the first embodiment, the trained model under evaluation recognizes characters contained in an image serving as the content. The invention is not limited to image recognition. For example, it may be applied to speech recognition that converts a sound signal into text, or to a trained model that recognizes emotions from written text.

- In the first embodiment, the control unit 21 of the support server 20 executes a black-and-white image generation process (step S101). The initial image is not limited to a black-and-white image. For example, an image that is entirely white or entirely black may be used. An image whose recognition result has high certainty may also be used as the initial image.

- In the first embodiment, the control unit 21 of the support server 20 selects pixels at random (step S102). Pixel selection need not be random as long as the sample content can be varied exhaustively. Nor is the selection limited to a single pixel; multiple pixels may be selected at the same time.

- In the first embodiment, the control unit 21 of the support server 20 executes a pixel inversion process (step S103). The change is not limited to pixel inversion as long as changes can be applied exhaustively. For a trained model that performs image recognition on color images, the control unit 21 of the support server 20 may, for example, change the RGB value of each pixel in turn.

For a trained model that converts an audio signal into text, the control unit 21 of the support server 20 may, for example, transform the audio signal into the frequency domain and randomly change the coefficient of each frequency.
For a trained model that performs natural language processing to obtain some recognition result from a sentence, the words contained in the sentence may, for example, be changed. In this case, to generate multiple sample contents, replacement words for the words in the sentence are obtained from a dictionary storage unit in which words are recorded.
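The word replacement variation for natural language models might look like the sketch below. It assumes whitespace-separated words (Japanese text would need a tokenizer), replaces one randomly chosen word per sample, and represents the dictionary storage unit as a plain list; these choices and the function name are illustrative assumptions only.

```python
import random
from typing import List


def generate_text_samples(sentence: str, dictionary: List[str],
                          n_samples: int, rng: random.Random) -> List[str]:
    """Create sample sentences by swapping one word for a dictionary word."""
    words = sentence.split()
    samples = []
    for _ in range(n_samples):
        changed = words[:]
        index = rng.randrange(len(changed))      # word to replace
        changed[index] = rng.choice(dictionary)  # replacement word
        samples.append(" ".join(changed))
    return samples


original = "the cat sat on the mat"
word_dictionary = ["dog", "ran", "under"]  # stand-in for the dictionary storage unit
text_samples = generate_text_samples(original, word_dictionary,
                                     n_samples=5, rng=random.Random(0))
```

Each generated sentence would then be passed to the trained model in the same way as the image samples, and its certainty recorded.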

- In the first embodiment, the control unit 21 of the support server 20 determines whether to terminate (step S107), and terminates when the number of records satisfies the end condition. The end condition is not limited to this. For example, the control unit 21 of the support server 20 may repeat the black-and-white image generation process (step S101) and use the number of repetitions as the end condition. In this case, the control unit 21 compares the certainty of the previously generated sample content with that of the subsequent sample content and, if the certainty has decreased, determines the preceding sample content to be a feature image. It then repeats the processing from the black-and-white image generation process (step S101) and terminates when the number of repetitions reaches a predetermined number.
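The repetition-count end condition can be sketched as below. This is an illustration under assumptions: `confidence` stands in for the trained model, one pixel is inverted per step, and `max_steps` is a safety cap added here (not described in the source) so that a run that never sees a decrease still ends.

```python
import random
from typing import Callable, List

Image = List[List[int]]  # assumed encoding: 0 = black, 1 = white


def collect_feature_images(confidence: Callable[[Image], float],
                           height: int, width: int,
                           repetitions: int, max_steps: int,
                           rng: random.Random) -> List[Image]:
    """Restart from a fresh image `repetitions` times; in each run keep
    the last sample before the certainty decreased."""
    features = []
    for _ in range(repetitions):
        # Step S101: generate a random black-and-white image.
        current = [[rng.randint(0, 1) for _ in range(width)]
                   for _ in range(height)]
        current_conf = confidence(current)
        for _ in range(max_steps):
            # Steps S102 and S103: invert one randomly chosen pixel.
            candidate = [row[:] for row in current]
            y, x = rng.randrange(height), rng.randrange(width)
            candidate[y][x] = 1 - candidate[y][x]
            candidate_conf = confidence(candidate)
            if candidate_conf < current_conf:
                break  # certainty decreased: keep the preceding sample
            current, current_conf = candidate, candidate_conf
        features.append(current)
    return features


# Toy stand-in for the trained model (hypothetical): share of white pixels.
def white_ratio(img: Image) -> float:
    return sum(map(sum, img)) / (len(img) * len(img[0]))


feature_images = collect_feature_images(white_ratio, height=3, width=3,
                                        repetitions=4, max_steps=50,
                                        rng=random.Random(0))
```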

- In the second embodiment, the control unit 21 of the support server 20 determines whether masking is necessary (step S202). This determination is not limited to using the necessity reference value. For example, the need for masking may be decided from a judgment entered by the person in charge at the user terminal 10. In this case, the image processing unit 211 outputs to the user terminal 10 a confirmation screen asking whether masking is needed.

- In the second embodiment, the control unit 21 of the support server 20 temporarily stores the decrease in certainty (step S205). Instead, feature areas whose decrease in certainty is equal to or higher than a reference value may be recorded in the feature information storage unit 23. In this case, the evaluation unit 213 holds data on a decrease reference value used to judge feature areas. The evaluation unit 213 compares the difference value between the certainty of the feature image and the certainty of the mask image with the decrease reference value, and records the feature image for which the difference value is equal to or higher than the decrease reference value.
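The threshold-based variation keeps every mask area whose difference value reaches the decrease reference value, rather than only the maximum. A minimal sketch, assuming the certainties of the masked images have already been computed and are keyed by mask position (the data layout and names are hypothetical):

```python
from typing import Dict, Tuple

Position = Tuple[int, int]


def regions_above_threshold(base_confidence: float,
                            masked_confidences: Dict[Position, float],
                            drop_threshold: float) -> Dict[Position, float]:
    """Return mask positions whose decrease in certainty is at or above
    the threshold, mapped to that decrease."""
    selected = {}
    for position, masked_confidence in masked_confidences.items():
        drop = base_confidence - masked_confidence  # difference value
        if drop >= drop_threshold:
            selected[position] = drop
    return selected


example_masked = {(0, 0): 0.85, (0, 1): 0.4, (1, 0): 0.6}
selected_regions = regions_above_threshold(0.9, example_masked,
                                           drop_threshold=0.25)
```

Unlike the maximum-only rule of step S207, this can yield several feature areas, or none, for a single feature image.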

- In the second embodiment, the control unit 21 of the support server 20 executes a partial masking process (step S203). Specifically, it uses a black mask one quarter the size of the entire bitmap of the feature image. The mask size is not limited to this. For example, the mask size may be changed according to how the black and white pixels are dispersed in the feature image. In this case, the mask size is increased when the index value indicating the degree of dispersion is equal to or less than a reference value.
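The source does not define the dispersion index. As one hypothetical choice, the sketch below uses the share of horizontally adjacent pixel pairs that differ in color, and picks the larger mask when that index is at or below the reference value. The index definition, function names, and size values are assumptions for illustration.

```python
from typing import List

Image = List[List[int]]  # assumed encoding: 0 = black, 1 = white


def dispersion_index(image: Image) -> float:
    """Share of horizontally adjacent pixel pairs with different colors
    (a hypothetical index; the source does not specify one)."""
    pairs = 0
    changes = 0
    for row in image:
        for a, b in zip(row, row[1:]):
            pairs += 1
            changes += int(a != b)
    return changes / pairs if pairs else 0.0


def choose_mask_size(image: Image, reference: float,
                     small: int, large: int) -> int:
    """Use the larger mask when the dispersion index is at or below
    the reference value."""
    return large if dispersion_index(image) <= reference else small


blocky = [[0, 0, 0, 0], [1, 1, 1, 1]]
checkered = [[0, 1, 0, 1], [1, 0, 1, 0]]
blocky_size = choose_mask_size(blocky, reference=0.5, small=2, large=4)
checkered_size = choose_mask_size(checkered, reference=0.5, small=2, large=4)
```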

10: user terminal; 20: support server; 21: control unit; 211: image processing unit; 212: prediction unit; 213: evaluation unit; 214: cluster analysis unit; 22: evaluation target storage unit; 23: feature information storage unit.

Claims

A model evaluation system that evaluates a trained model, comprising an evaluation target storage unit that records the trained model
and a control unit that outputs a recognition result using the trained model, wherein
the control unit:
generates a plurality of sample contents by randomly changing only a part of content of a predetermined size;
inputs each of the sample contents to the trained model recorded in the evaluation target storage unit and acquires the certainty of the recognition result for that sample content; and
identifies, as feature content, the sample content whose certainty is equal to or higher than a reference value, and outputs the result of clustering the feature content as an evaluation result regarding the feature content.

The model evaluation system according to claim 1, wherein the control unit further:
generates mask content by masking a part of the feature content with a mask area;
applies the mask content to the trained model recorded in the evaluation target storage unit to calculate its certainty;
calculates the difference value between the certainty of the feature content and the certainty of the mask content;
identifies a mask area according to the magnitude of the difference value; and
outputs the result of clustering the identified mask areas.

A model evaluation method for evaluating a trained model using a model evaluation system that comprises an evaluation target storage unit that records the trained model
and a control unit that outputs a recognition result using the trained model, wherein
the control unit:
generates a plurality of sample contents by randomly changing only a part of content of a predetermined size;
inputs each of the sample contents to the trained model recorded in the evaluation target storage unit and acquires the certainty of the recognition result for that sample content; and
identifies, as feature content, the sample content whose certainty is equal to or higher than a reference value, and outputs the result of clustering the feature content as an evaluation result regarding the feature content.

A model evaluation program for evaluating a trained model using a model evaluation system that comprises an evaluation target storage unit that records the trained model
and a control unit that outputs a recognition result using the trained model, the program causing
the control unit to function as means for:
generating a plurality of sample contents by randomly changing only a part of content of a predetermined size;
inputting each of the sample contents to the trained model recorded in the evaluation target storage unit and acquiring the certainty of the recognition result for that sample content; and
identifying, as feature content, the sample content whose certainty is equal to or higher than a reference value, and outputting the result of clustering the feature content as an evaluation result regarding the feature content.