JP2021189889A

JP2021189889A - Evaluation apparatus, evaluation method, and evaluation program

Info

Publication number: JP2021189889A
Application number: JP2020096167A
Authority: JP
Inventors: バネッサブラカモンテ; Bracamonte Vanessa; 清良披田野; Seira Hidano
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2020-06-02
Filing date: 2020-06-02
Publication date: 2021-12-13
Anticipated expiration: 2040-06-02
Also published as: JP7282715B2

Abstract

To provide an evaluation apparatus, an evaluation method, and an evaluation program which present the tendency of the influence of a specific feature in a dataset on the prediction results of a machine learning model. SOLUTION: An evaluation apparatus 1 includes: a dataset input unit 11 which receives a dataset including multiple pieces of text data and the correct category of each piece of text data; a prediction value input unit 12 which receives prediction values of classification by a machine learning model that takes the text data as input; an explanatory value input unit 13 which receives, for each word constituting the text data, an explanatory value that explains the degree of influence on the prediction value; and a distribution output unit 14 which outputs, for a designated word, distribution data of the explanatory values at each place the word appears in the dataset. SELECTED DRAWING: Figure 1

Description

The present invention relates to an evaluation device, an evaluation method, and an evaluation program for evaluating a machine learning model.

Conventionally, automatic classification methods using machine learning models have been provided for tasks such as classifying whether the content of a text, for example a movie review, is positive or negative, detecting spam e-mail, or clustering images.

In such cases, it is important whether the classification accuracy of the machine learning model in use can be trusted, and methods for evaluating machine learning models have been proposed. For example, Non-Patent Documents 1 and 2 propose methods of visually presenting the prediction results of a machine learning model with respect to features.
In addition, Non-Patent Documents 3 and 4, for example, propose methods of calculating values that explain the degree of influence of each feature on the prediction result of a machine learning model.

Non-Patent Document 1: Google AI Blog, “The What-If Tool: Code-Free Probing of Machine Learning Models”, Sep. 11, 2018, <https://pair-code.github.io/what-if-tool>.
Non-Patent Document 2: Error terrain analysis for machine learning: Tool and visualizations. Rick Barraza, Russell Eames, Yan Esteve Balducci, Josh Hinds, Scott Hoogerwerf, Eric Horvitz, Ece Kamar, Jacquelyn Krones, Josh Lovejoy, Parham Mohadjer, Ben Noah and Besmira Nushi (Contributed talk), Presented at ICLR 2019 Debugging Machine Learning Models Workshop, May 6, 2019, <https://debug-ml-iclr2019.github.io/>.
Non-Patent Document 3: Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. 2016. “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, New York, NY, USA: ACM, 1135-1144. Code and visualizations at <https://github.com/marcotcr/lime>.
Non-Patent Document 4: Lundberg, Scott M, and Su-In Lee. 2017. “A Unified Approach to Interpreting Model Predictions.” In Advances in Neural Information Processing Systems 30, eds. I. Guyon et al. Curran Associates, Inc., 4765-4774. Code and visualizations at <https://github.com/slundberg/shap>.

However, with conventional evaluation methods, even if a person can determine which features affect the result for a specific piece of input data, it has been difficult to judge whether a specific feature (word) that appears at various places in various input data (text data) is handled appropriately by the machine learning model.

An object of the present invention is to provide an evaluation device, an evaluation method, and an evaluation program that, in order to evaluate the reliability of a machine learning model, can present the tendency of how a specific feature in a dataset influences the prediction results of the machine learning model.

The evaluation device according to the present invention includes: a dataset input unit that receives a dataset including a plurality of pieces of text data and the correct classification of each piece of text data; a predicted value input unit that receives predicted values of the classification by a machine learning model that takes the text data as input; an explanatory value input unit that receives, for each word or phrase constituting the text data, an explanatory value explaining its degree of influence on the predicted value; and a distribution output unit that outputs, for a designated word or phrase, distribution data of the explanatory values at each place it appears in the dataset.

When a plurality of words or phrases are designated, the distribution output unit may compare and output the distribution data of the explanatory values for those words or phrases.

The distribution output unit may output a graph in which the explanatory values are plotted.

The distribution output unit may output statistics of the distribution data.

The evaluation device may include a list output unit that outputs a list of the text data containing the designated word or phrase, together with the explanatory values, the predicted values, and the correct classifications.

The list output unit may output statistics on the correctness of the predicted values in the list.

The list output unit may output the full text of text data selected from the list, with emphasis on the designated word or phrase and on a predetermined number of words or phrases having the largest explanatory values in that text data.

The evaluation device may include a word/phrase output unit that outputs, among the words or phrases contained in the dataset, a predetermined number of those having the largest explanatory values.

The evaluation device may include a filter unit that selects the text data to be processed from the dataset based on the correctness category of the predicted value.

The filter unit may further select the text data to be processed based on the length of the text data.

In the evaluation method according to the present invention, a computer executes: a dataset input step of receiving a dataset including a plurality of pieces of text data and the correct classification of each piece of text data; a predicted value input step of receiving predicted values of the classification by a machine learning model that takes the text data as input; an explanatory value input step of receiving, for each word or phrase constituting the text data, an explanatory value explaining its degree of influence on the predicted value; and a distribution output step of outputting, for a designated word or phrase, distribution data of the explanatory values at each place it appears in the dataset.

The evaluation program according to the present invention causes a computer to function as the evaluation device.

According to the present invention, in order to evaluate the reliability of a machine learning model, it is possible to present the tendency of how a specific feature in a dataset influences the prediction results of the machine learning model.

FIG. 1 is a diagram showing the functional configuration of the evaluation device in the embodiment.
FIG. 2 is a diagram showing an example of the screen configuration of the visualization tool in the embodiment.
FIG. 3 is a diagram showing an example of a screen of search results by keyword list in the embodiment.
FIG. 4 is a diagram showing an example of the screen when a row is selected in the list display of text data in the embodiment.
FIG. 5 is a diagram showing an example of a screen in which the distribution of explanatory values is displayed as a graph in the embodiment.
FIG. 6 is a diagram showing a display example of statistical information on the distribution of explanatory values in the embodiment.
FIG. 7 is a diagram showing an example of a screen in which the distributions of explanatory values are compared between words in the embodiment.

An example of an embodiment of the present invention is described below.
FIG. 1 is a diagram showing the functional configuration of the evaluation device 1 in the present embodiment.
The evaluation device 1 is an information processing device (computer) such as a server device or a personal computer, and includes a control unit 10 and a storage unit 20, as well as input/output devices for various data, communication devices, and the like.

The control unit 10 controls the evaluation device 1 as a whole, and realizes each function of the present embodiment by reading and executing, as appropriate, the various programs stored in the storage unit 20. The control unit 10 may be a CPU.

The storage unit 20 is a storage area for the various programs and data that make the hardware function as the evaluation device 1, and may be a ROM, a RAM, a flash memory, a hard disk drive (HDD), or the like. Specifically, the storage unit 20 stores a program (evaluation program) that causes the control unit 10 to execute each function of the present embodiment, the machine learning model to be evaluated, the dataset to be input to the machine learning model, and the like.

Here, the machine learning model performs two-class classification with text data as input. For example, the probability that a text (text data) such as a movie review has positive content and the probability that it has negative content are output as predicted values.

The control unit 10 includes a dataset input unit 11, a predicted value input unit 12, an explanatory value input unit 13, a distribution output unit 14, a list output unit 15, a word output unit 16 (word/phrase output unit), and a filter unit 17.

The dataset input unit 11 receives a dataset including a plurality of pieces of text data (Datapoints) and, for each piece of text data, the correct classification (for example, positive or negative) as judged by a user.
The words that make up the text data are the elements input to the machine learning model as features (Features).

The predicted value input unit 12 receives the predicted values of classification by the machine learning model that takes the text data as input.

The explanatory value input unit 13 receives, for each word constituting the text data, an explanatory value (Explanation values) that explains the degree of influence on the predicted value of the machine learning model.
In the present embodiment, a positive explanatory value contributes to a positive prediction result, and a negative explanatory value contributes to a negative prediction result. The larger the absolute value of the explanatory value, the more important the word and the greater its influence on the prediction result.
The method of calculating the explanatory values is not limited; for example, the existing explanation methods proposed in Non-Patent Document 3 or 4 can be applied.

The distribution output unit 14 outputs, for a word designated by the user, distribution data of the explanatory values at each place the word appears in the dataset. When a plurality of words are designated, the distribution output unit 14 compares and outputs the distribution data of the explanatory values for those words.
Specifically, the distribution output unit 14 displays on a display a graph in which the explanatory values are plotted.
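
The collection step performed by the distribution output unit 14 could be sketched roughly as follows. This is only an illustration: the data layout (each datapoint as a list of word and explanatory value pairs) and the function names `explanation_distribution` and `compare_distributions` are assumptions and do not appear in the source.

```python
from typing import Dict, List, Tuple

# Assumed layout: one datapoint is a list of (word, explanatory value) pairs,
# one pair per word occurrence in the text.
Datapoint = List[Tuple[str, float]]


def explanation_distribution(dataset: List[Datapoint], word: str) -> List[float]:
    """Collect the explanatory value at every place `word` occurs."""
    target = word.lower()
    return [value
            for datapoint in dataset
            for token, value in datapoint
            if token.lower() == target]


def compare_distributions(dataset: List[Datapoint],
                          words: List[str]) -> Dict[str, List[float]]:
    """Return one distribution per designated word so they can be compared."""
    return {w: explanation_distribution(dataset, w) for w in words}


# Hypothetical explanatory values, for illustration only.
dataset = [
    [("a", 0.0), ("good", 0.31), ("movie", 0.02)],
    [("not", -0.40), ("good", -0.05)],
    [("good", 0.12), ("but", -0.01), ("bad", -0.27), ("good", 0.0)],
]
distributions = compare_distributions(dataset, ["good", "bad"])
```

Each list holds one value per occurrence rather than one per datapoint, so a word that appears twice in the same text contributes two points to the plot.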

The distribution output unit 14 may also output statistics of this distribution data. The statistics are, for example, the total number of occurrences, the mean, the median, the minimum, the maximum, the number of occurrences influencing a positive prediction, the number influencing a negative prediction, and the number of neutral occurrences that do not influence the prediction.

The list output unit 15 outputs a list of the text data containing the designated word, together with the explanatory values, the predicted values, and the correct classifications.
At this time, the list output unit 15 may output statistics on the correctness of the predicted values in the list. The statistics are, for example, the number and ratio of correct predictions, the number and ratio of false positives (False Positive), and the number and ratio of missed detections (False Negative).
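
A minimal sketch of these correctness statistics is shown below. It assumes that the predicted label is the class with the higher predicted probability and that "positive" is the class regarded as detected; neither assumption is stated in the source, and the names `outcome` and `correctness_stats` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def outcome(predicted: str, truth: str) -> str:
    """Classify one prediction, treating 'positive' as the detected class."""
    if predicted == truth:
        return "correct"
    return "false_positive" if predicted == "positive" else "false_negative"


def correctness_stats(rows: List[Tuple[str, str]]) -> Dict[str, Tuple[int, float]]:
    """Return (count, ratio) per outcome for (predicted, truth) label pairs."""
    counts = Counter(outcome(p, t) for p, t in rows)
    total = len(rows)
    return {key: (counts[key], counts[key] / total if total else 0.0)
            for key in ("correct", "false_positive", "false_negative")}


# Hypothetical rows of the list: (predicted label, correct classification).
rows = [("positive", "positive"), ("positive", "negative"),
        ("negative", "negative"), ("negative", "positive")]
summary = correctness_stats(rows)
```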

The list output unit 15 also outputs the full text of text data selected from the list, with emphasis on the designated word and on a predetermined number (for example, 10) of words whose explanatory values have the largest magnitude (absolute value) in that text data.
That is, regardless of the prediction result, the words with a large influence toward either the positive or the negative classification are emphasized.
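
One way to picture this emphasis is the following sketch, which marks the keyword with brackets and the most influential words with `+` or `-` instead of colors. The marking scheme, the function names, and the choice to let the keyword occupy one of the top-n slots are all assumptions made for illustration.

```python
from typing import List, Set, Tuple


def top_influential(tokens: List[Tuple[str, float]], n: int = 10) -> Set[int]:
    """Return indices of the n tokens with the largest |explanatory value|."""
    ranked = sorted(range(len(tokens)),
                    key=lambda i: abs(tokens[i][1]), reverse=True)
    return set(ranked[:n])


def render(tokens: List[Tuple[str, float]], keyword: str, n: int = 10) -> str:
    """Mark the keyword as [word] and influential words as +word+ or -word-."""
    top = top_influential(tokens, n)
    out = []
    for i, (word, value) in enumerate(tokens):
        if word.lower() == keyword.lower():
            out.append(f"[{word}]")
        elif i in top and value != 0:
            mark = "+" if value > 0 else "-"
            out.append(f"{mark}{word}{mark}")
        else:
            out.append(word)
    return " ".join(out)


# Hypothetical explanatory values for one short text.
tokens = [("a", 0.0), ("good", 0.10), ("story", 0.02), ("but", -0.01),
          ("terrible", -0.45), ("acting", 0.03)]
text = render(tokens, keyword="good", n=2)
```

Ranking by absolute value means that a strongly negative word is emphasized even in a text that was predicted positive, which matches the statement that emphasis is independent of the prediction result.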

The word output unit 16 outputs, among the words contained in the dataset, a predetermined number (for example, 10) of words whose explanatory values have the largest magnitude (absolute value). At this time, the top predetermined number of words may be extracted separately for the positive and negative sides and listed, for example, in order of frequency of appearance.
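
The source does not say how the explanatory values of a word that occurs many times are combined before ranking. The sketch below assumes the mean explanatory value per word as the ranking key, which is purely an illustrative choice, and then lists the selected words by frequency. The name `top_explanation_keywords` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Datapoint = List[Tuple[str, float]]


def top_explanation_keywords(dataset: List[Datapoint],
                             n: int = 10) -> Dict[str, List[Tuple[str, int]]]:
    """Pick the n strongest positive and negative words, listed by frequency.

    Strength is the mean explanatory value per word (an assumption; the
    aggregation is not specified in the source).
    """
    values: Dict[str, List[float]] = defaultdict(list)
    for datapoint in dataset:
        for word, value in datapoint:
            values[word.lower()].append(value)
    means = {w: sum(v) / len(v) for w, v in values.items()}

    def pick(sign: int) -> List[Tuple[str, int]]:
        side = [w for w, m in means.items() if m * sign > 0]
        strongest = sorted(side, key=lambda w: abs(means[w]), reverse=True)[:n]
        return sorted(((w, len(values[w])) for w in strongest),
                      key=lambda wc: wc[1], reverse=True)

    return {"positive": pick(+1), "negative": pick(-1)}


# Hypothetical explanatory values, for illustration only.
dataset = [
    [("great", 0.5), ("film", 0.01)],
    [("great", 0.3), ("boring", -0.4)],
    [("awful", -0.6), ("film", 0.03), ("boring", -0.2)],
    [("fine", 0.1)],
]
keywords = top_explanation_keywords(dataset, n=2)
```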

The filter unit 17 selects the text data to be processed from the dataset based on an input that designates the correctness category of the predicted value, that is, a category such as correct, false positive, or missed detection.
The filter unit 17 may further select the text data to be processed based on the length of the text data.
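
A filter of this kind might look like the sketch below. The `Datapoint` class, the whitespace-based word count, and the treatment of "positive" as the detected class are assumptions for illustration, not details taken from the source.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Datapoint:
    text: str
    predicted: str  # class with the higher predicted probability (assumed)
    truth: str      # correct classification supplied with the dataset


def outcome(dp: Datapoint) -> str:
    """Return 'correct', 'false_positive' or 'false_negative'."""
    if dp.predicted == dp.truth:
        return "correct"
    return "false_positive" if dp.predicted == "positive" else "false_negative"


def filter_datapoints(data: List[Datapoint], outcomes: Set[str],
                      min_words: int = 0,
                      max_words: Optional[int] = None) -> List[Datapoint]:
    """Keep datapoints whose outcome is selected and whose length is in range."""
    kept = []
    for dp in data:
        n_words = len(dp.text.split())
        in_range = n_words >= min_words and (max_words is None or n_words <= max_words)
        if outcome(dp) in outcomes and in_range:
            kept.append(dp)
    return kept


data = [
    Datapoint("a good movie", "positive", "positive"),
    Datapoint("good idea but a very dull result", "positive", "negative"),
    Datapoint("bad", "negative", "positive"),
]
selected = filter_datapoints(data, {"false_positive", "false_negative"}, min_words=2)
```

Passing several outcomes or a bounded range corresponds to ticking more than one option in a filter field; leaving `max_words` as `None` models an open-ended upper bucket.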

Next, examples of screens displayed by the evaluation program (visualization tool) that operates the evaluation device 1 as the functional units described above are explained.

FIG. 2 is a diagram showing an example of the screen configuration of the visualization tool in the present embodiment.
This screen provides a search function that supports both a search by a designated word (keyword) and a batch search by importing a keyword list.

In area 21, filter conditions for the dataset are entered. In this example, the "Result" field requires selection of at least one of correct (Correct), false positive (False Positive), and missed detection (False Negative) as the correctness category of the predicted value.
In addition, the "Word Length" field requires selection of at least one of 700 words or fewer, 701 to 1000 words, 1001 to 1400 words, and 1401 words or more as the length of the text data. The length of the text data may instead be specified by, for example, directly entering a numerical range.

In area 22, the words with the largest explanatory values output by the word output unit 16 (Top Explanation Keywords) are displayed as histograms, 10 each for positive and negative, in order of frequency of appearance. This display is the output for the dataset after selection by the filter unit 17.

Area 23 is an area in which the data or graphs of the results of a search by keyword or keyword list are displayed.
Area 24 displays the classification accuracy of the machine learning model and the total number of pieces of text data (Datapoints).

FIG. 3 is a diagram showing an example of a screen of search results by keyword list in the present embodiment.
When a keyword list is imported from a CSV file or the like, the text data in which each of the keywords appears is searched for, and the keywords are listed in area 31 in order of the number of pieces of text data found. If a keyword is not contained in the filtered dataset, the number of pieces of text data in which that keyword appears is 0.
The number found may instead be the number of occurrences of the keyword.
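
The batch search by keyword list could be sketched as follows. The CSV layout (one keyword in the first column of each row), the whitespace tokenization, and the function names are assumptions; the source only states that the list is imported from a CSV file or the like.

```python
import csv
import io
from typing import List, Tuple


def load_keywords(csv_text: str) -> List[str]:
    """Read keywords from the first column of CSV text (layout assumed)."""
    return [row[0].strip()
            for row in csv.reader(io.StringIO(csv_text))
            if row and row[0].strip()]


def search_keywords(texts: List[str], keywords: List[str]) -> List[Tuple[str, int]]:
    """Count the datapoints containing each keyword, most frequent first."""
    tokenized = [set(t.lower().split()) for t in texts]
    counts = [(k, sum(k.lower() in words for words in tokenized))
              for k in keywords]
    return sorted(counts, key=lambda kc: kc[1], reverse=True)


texts = ["a good movie", "not good at all", "bad plot and bad acting"]
result = search_keywords(texts, load_keywords("bad\ngood\nboring\n"))
```

Here "bad" counts once even though it occurs twice in the same text, because the count is per datapoint; counting occurrences instead would be the alternative mentioned above.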

Further, when one keyword is selected from the keyword list, the list output unit 15 lists in area 32 the search results for the text data containing the selected keyword.
In this example, the search results are displayed as a concordance list: each row of the list shows at least part of the text data, arranged so that the selected keyword is in the center with the preceding and following words on either side.
This allows the user to easily see how the designated keyword is used in the dataset.
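
A concordance list of this kind (keyword in context) can be approximated with a few lines of code. The sketch below is illustrative only: the window size, the whitespace tokenization, and the use of the list index as the datapoint ID are assumptions.

```python
from typing import List, Tuple


def concordance(texts: List[str], keyword: str,
                window: int = 3) -> List[Tuple[int, str, str, str]]:
    """Return (datapoint id, left context, keyword, right context) per occurrence."""
    rows = []
    for doc_id, text in enumerate(texts):
        words = text.split()
        for i, word in enumerate(words):
            if word.lower() == keyword.lower():
                left = " ".join(words[max(0, i - window):i])
                right = " ".join(words[i + 1:i + 1 + window])
                rows.append((doc_id, left, word, right))
    return rows


texts = ["the plot was good and the cast was good too", "nothing good here"]
rows = concordance(texts, "good", window=2)
for doc_id, left, word, right in rows:
    # Right-align the left context so the keyword lines up in one column.
    print(f"{doc_id:>3} {left:>12} | {word} | {right}")
```

Each occurrence yields its own row, so a text that contains the keyword twice appears twice in the list.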

Each row of the list displays the ID of the text data (datapoint) and the explanatory value of the keyword in that text data. In addition, the predicted value of the machine learning model is displayed as a probability (%) for each class (positive and negative), together with the correct classification (Ground Truth) given by the user.
When a keyword appears more than once in one piece of text data, a separate row is output for each occurrence.

Area 33 displays the keyword and filter conditions used as search conditions and, as statistical information on the search results, the number of occurrences of the keyword (Occurences), the number of pieces of text data containing the keyword (Datapoints), the number and ratio of correct predictions, the number and ratio of false positives, and the number and ratio of missed detections.

FIG. 4 is a diagram showing an example of the screen when a row is selected in the list display of text data in the present embodiment.
The full text of the selected text data is displayed in area 41. Among the words contained in the text data, the designated keyword is highlighted together with the 10 important words that have the largest absolute explanatory values and thus the highest influence.
Specifically, the keyword "good" is shown in a different text color, and important words with a positive influence and important words with a negative influence are displayed with different background colors.

FIG. 5 is a diagram showing an example of a screen in which the distribution of explanatory values in the present embodiment is displayed as a graph.
When the graph option is selected, the distribution output unit 14 displays the distribution of the explanatory values of the searched keyword as a graph in area 51.

In this example, explanatory values indicating how the designated keyword "good" is handled by the machine learning model, that is, toward which classification (positive or negative) and to what degree it exerts influence, are plotted for each place "good" appears in the dataset.

The horizontal axis represents the explanatory value, and the plotted points are color-coded by positive and negative value.
In this case, it can be seen visually that many occurrences are evaluated as neutral (explanatory value ≈ 0), but that occurrences evaluated as positive (explanatory value > 0) outnumber those evaluated as negative (explanatory value < 0).

Area 52 displays statistical information representing the characteristics of the distribution of the explanatory values plotted on the graph.
The distribution graph of the explanatory values and the statistical information can also be output in a form that compares a plurality of keywords.

FIG. 6 is a diagram showing a display example of statistical information on the distribution of explanatory values in the present embodiment.
In this example, the two words "better" and "worst" are compared, and their statistics are displayed side by side.

Specifically, the display shows, for example, the number of occurrences of the keyword (Occurences), the mean (Mean), the median (Median), the minimum (Min), the maximum (Max), the number of occurrences influencing a positive prediction (explanatory value > 0), the number influencing a negative prediction (explanatory value < 0), and the number of neutral occurrences that do not influence the prediction (explanatory value ≈ 0).
The thresholds on the explanatory value that separate positive, neutral, and negative may be set in advance.
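
The statistics listed above can be computed from the per-occurrence explanatory values of a word as in the following sketch. The symmetric `threshold` of 0.01 used to separate neutral values is a hypothetical preset, and the function name `distribution_stats` is likewise an assumption.

```python
from statistics import mean, median
from typing import Dict, List


def distribution_stats(values: List[float],
                       threshold: float = 0.01) -> Dict[str, float]:
    """Summarize the explanatory values of one word.

    Values with |v| <= threshold are counted as neutral; the threshold is a
    hypothetical preset, not a value given in the source.
    """
    if not values:
        raise ValueError("the word does not occur in the dataset")
    return {
        "occurrences": len(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
        "positive": sum(v > threshold for v in values),
        "negative": sum(v < -threshold for v in values),
        "neutral": sum(abs(v) <= threshold for v in values),
    }


# Hypothetical explanatory values for one keyword.
stats = distribution_stats([0.31, -0.05, 0.12, 0.0, 0.005])
```

Calling the function once per keyword and placing the resulting dictionaries side by side gives a comparison of the kind shown for "better" and "worst".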

From this statistical information, the following characteristics, for example, can be seen.
- "better" appears more frequently than "worst".
- The mean and median of "better" are near 0 and neutral, whereas those of "worst" lean toward the negative side.
- For both "better" and "worst", the minimum and maximum are of comparable magnitude and the values spread to both the positive and negative sides, but "worst" has values of larger magnitude and therefore a greater influence on the prediction results.
- "better" has roughly equal numbers of positive and negative evaluations, whereas "worst" is evaluated almost entirely as negative.

FIG. 7 is a diagram showing an example of a screen in which the distributions of explanatory values in the present embodiment are compared between words.
In this example, the distributions of explanatory values for "good" and "bad" are compared.
It can be seen that the explanatory values of "good" are skewed toward the positive side, while those of "bad" are conversely skewed toward the negative side.

According to the present embodiment, the evaluation device 1 takes as input a dataset including a plurality of pieces of text data, the predicted values of classification by a machine learning model that takes the text data as input, and an explanatory value for each word constituting the text data, and outputs, for a designated word, distribution data of the explanatory values at each place the word appears in the dataset.
As a result, the evaluation device 1 can visualize and present the tendency of how a specific word (feature) in the dataset influences the prediction results of the machine learning model.

Consequently, the user can check the plausibility of the distribution of explanatory values for each word. For example, within the dataset held by the user, the user can easily find unnatural distributions, such as the explanatory values of a word thought to be neutral being skewed to the positive or negative side or, conversely, the explanatory values of a word thought to be important being concentrated near 0. This allows the user to appropriately evaluate the reliability of the machine learning model in use.

When a plurality of words are designated, the evaluation device 1 compares and outputs the distribution data of the explanatory values for those words, so the user can easily grasp differences in distribution between words and check the plausibility of the distributions.

Because the evaluation device 1 outputs a graph in which the explanatory values are plotted, the user can easily grasp the distribution of the explanatory values visually and check its plausibility.
Because the evaluation device 1 outputs statistics of the distribution data, the user can check the plausibility of the distribution of the explanatory values based on objective numerical data.

評価装置１は、指定された単語が含まれるテキストデータの一覧と共に、説明値、予測値及び正しい分類を出力する。
これにより、ユーザは、指定した単語がデータセットの中でどのように使われているかを確認しつつ、各出現箇所において、機械学習モデルの予測値にどの程度影響しているかを詳細に調査できる。 The evaluation device 1 outputs an explanatory value, a predicted value, and a correct classification together with a list of text data including the designated word.
This allows the user to see how the specified word is used in the dataset and to investigate in detail how much each occurrence affects the predicted value of the machine learning model. ..

評価装置１は、テキストデータの一覧における予測値の正誤に関する統計値を出力する。これにより、ユーザは、指定した単語を含むテキストデータについて、機械学習モデルの予測精度を確認し、この単語による影響を把握することができる。 The evaluation device 1 outputs a statistical value regarding the correctness of the predicted value in the list of text data. As a result, the user can confirm the prediction accuracy of the machine learning model for the text data including the specified word and understand the influence of this word.

The evaluation device 1 outputs the full text of the text data selected from the list, with emphasis placed on the designated word and on a predetermined number of words having the largest explanatory-value magnitudes in that text data.
This allows the user to review the entire text data containing the designated word while easily identifying other words with a high degree of influence, and thus to comprehensively evaluate the reliability of the machine learning model.
Further, by outputting, from among the words included in the dataset, a predetermined number of words having the largest explanatory-value magnitudes, the evaluation device 1 can appropriately present feature quantities to the user.
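The emphasis of the designated word and of the most influential words in a selected text can be sketched as below. The marker strings (`[[...]]` and `**...**`), the function names, and the default of two emphasized words are assumptions for illustration; an actual display would more likely use color or font weight.

```python
def top_indices(explanations, k):
    """Indices of the k explanatory values with the largest absolute value."""
    order = sorted(
        range(len(explanations)),
        key=lambda i: abs(explanations[i]),
        reverse=True,
    )
    return set(order[:k])


def highlight(tokens, explanations, word, k=2):
    """Render the full text, marking the designated word and the top-k words."""
    target = word.lower()
    top = top_indices(explanations, k)
    rendered = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            rendered.append(f"[[{token}]]")   # designated word
        elif i in top:
            rendered.append(f"**{token}**")   # among the k most influential words
        else:
            rendered.append(token)
    return " ".join(rendered)


# Toy text; explanatory values are made up for the example.
tokens = ["great", "cast", "bad", "movie"]
explanations = [0.4, 0.0, -0.7, -0.1]
print(highlight(tokens, explanations, "cast", k=2))
```

Ranking by absolute value means that strongly negative words such as "bad" are emphasized alongside strongly positive ones.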

The evaluation device 1 selects, from the dataset, the text data to be processed on the basis of the correctness category of the predicted value or on the basis of the length of the text data.
This allows the user to narrow down the kinds of text data and to evaluate in detail the degree of influence of a specific word within that range.
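A minimal sketch of such a selection step follows. It assumes a simple two-way correct/incorrect category and measures length in tokens; both choices, like the field and parameter names, are hypothetical simplifications rather than the filter unit 17 itself.

```python
def filter_records(dataset, correctness=None, min_len=None, max_len=None):
    """Select the records to be processed.

    correctness: None (no restriction), "correct", or "incorrect".
    min_len / max_len: optional inclusive bounds on the number of tokens.
    """
    if correctness not in (None, "correct", "incorrect"):
        raise ValueError("correctness must be None, 'correct' or 'incorrect'")
    selected = []
    for record in dataset:
        is_correct = record["predicted"] == record["label"]
        if correctness == "correct" and not is_correct:
            continue
        if correctness == "incorrect" and is_correct:
            continue
        length = len(record["tokens"])
        if min_len is not None and length < min_len:
            continue
        if max_len is not None and length > max_len:
            continue
        selected.append(record)
    return selected


# Toy data; labels are made up for the example.
dataset = [
    {"tokens": ["a", "great", "movie"], "predicted": "positive", "label": "positive"},
    {"tokens": ["not", "great"], "predicted": "positive", "label": "negative"},
    {"tokens": ["bad", "movie"], "predicted": "negative", "label": "negative"},
]

# Restrict the analysis to misclassified texts only.
misclassified = filter_records(dataset, correctness="incorrect")
print([" ".join(r["tokens"]) for r in misclassified])
```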

Although embodiments of the present invention have been described above, the present invention is not limited to those embodiments. Moreover, the effects described in the embodiments merely list the most suitable effects resulting from the present invention, and the effects of the present invention are not limited to those described in the embodiments.

The embodiments described above illustrate the case where the text data is in English, but the present invention is not limited to this and is applicable to text data in any language.

Further, the embodiments described above show an example in which two keywords are designated when the distribution data of the explanatory values is compared between a plurality of words. However, the number of words to be compared is not limited to two, and three or more words may be compared and shown by means of graphs and statistical information.

Further, the embodiments described above take the words constituting the text data as an example of the feature quantity of the text data and describe a mode in which the distribution of their explanatory values is output. In the present invention, the feature quantity of the text data is not limited to a word and may be a phrase composed of one or more words.
In this case, one or more consecutive words are entered as the keyword for the search, and the presentation of important words by the list output unit 15, the word output unit 16, and the like becomes the presentation of a predetermined number of phrases. To identify phrases in the text data, a dictionary in which phrases are defined in advance may be provided, or phrase boundaries may be determined automatically by an existing syntactic analysis technique.
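Searching for a phrase of consecutive words might be sketched as follows. The description does not state how an explanatory value is obtained for a multi-word phrase; summing the explanatory values of its component words is purely an assumption of this sketch, as are the function name and the whitespace-based splitting of the keyword.

```python
def phrase_occurrences(tokens, explanations, phrase):
    """Find each occurrence of `phrase` (one or more consecutive words).

    Returns (start_index, phrase_value) pairs. The phrase value is the sum
    of the component words' explanatory values, which is an assumption of
    this sketch and not something the description specifies.
    """
    words = [w.lower() for w in phrase.split()]
    if not words:
        return []
    lowered = [t.lower() for t in tokens]
    n = len(words)
    hits = []
    for start in range(len(lowered) - n + 1):
        if lowered[start:start + n] == words:
            hits.append((start, sum(explanations[start:start + n])))
    return hits


# Toy text; explanatory values are made up for the example.
tokens = ["this", "is", "not", "good", "not", "bad"]
explanations = [0.0, 0.0, -0.5, 0.25, -0.25, -0.5]

print(phrase_occurrences(tokens, explanations, "not good"))
print(phrase_occurrences(tokens, explanations, "not"))
```

A single-word keyword is handled as a phrase of length one, so the same routine covers both the word-based and the phrase-based modes.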

The evaluation method performed by the evaluation device 1 is realized by software. When it is realized by software, the programs constituting the software are installed in an information processing device (computer). These programs may be recorded on a removable medium such as a CD-ROM and distributed to users, or may be distributed by being downloaded to a user's computer via a network. Furthermore, these programs may be provided to a user's computer as a Web service via a network without being downloaded.

1 Evaluation device
10 Control unit
11 Dataset input unit
12 Predicted value input unit
13 Explanatory value input unit
14 Distribution output unit
15 List output unit
16 Word output unit (phrase output unit)
17 Filter unit
20 Storage unit

Claims

An evaluation device comprising:
a dataset input unit that accepts a dataset containing a plurality of text data and the correct classification of each text data;
a predicted value input unit that accepts predicted values of the classification by a machine learning model that takes the text data as input;
an explanatory value input unit that accepts, for each word constituting the text data, an explanatory value that explains the degree of influence on the predicted value; and
a distribution output unit that outputs, for a designated phrase, distribution data of the explanatory values at each location where the phrase appears in the dataset.

The evaluation device according to claim 1, wherein, in response to designation of a plurality of phrases, the distribution output unit compares and outputs the distribution data of the explanatory values for the plurality of phrases.

The evaluation device according to claim 1 or 2, wherein the distribution output unit outputs a graph in which the explanatory values are plotted.

The evaluation device according to any one of claims 1 to 3, wherein the distribution output unit outputs statistical values of the distribution data.

The evaluation device according to any one of claims 1 to 4, further comprising a list output unit that outputs the explanatory value, the predicted value, and the correct classification together with a list of the text data containing the designated phrase.

The evaluation device according to claim 5, wherein the list output unit outputs statistical values relating to correctness of the predicted values in the list.

The evaluation device according to claim 5 or 6, wherein the list output unit outputs the full text of the text data selected from the list, with emphasis placed on the designated phrase and on a predetermined number of phrases having the largest explanatory-value magnitudes in the text data.

The evaluation device according to any one of claims 1 to 7, further comprising a phrase output unit that outputs, from among the phrases included in the dataset, a predetermined number of phrases having the largest explanatory-value magnitudes.

The evaluation device according to any one of claims 1 to 8, further comprising a filter unit for selecting text data to be processed from the data set based on the classification of correctness of the predicted value.

The evaluation device according to claim 9, wherein the filter unit further selects the text data to be processed based on the length of the text data.

An evaluation method in which a computer executes:
a dataset input step of accepting a dataset containing a plurality of text data and the correct classification of each text data;
a predicted value input step of accepting predicted values of the classification by a machine learning model that takes the text data as input;
an explanatory value input step of accepting, for each word constituting the text data, an explanatory value that explains the degree of influence on the predicted value; and
a distribution output step of outputting, for a designated phrase, distribution data of the explanatory values at each location where the phrase appears in the dataset.

An evaluation program for operating a computer as the evaluation device according to any one of claims 1 to 10.