JP2020115292A - Evaluation support program, evaluation support method, and information processing device

Info

Publication number: JP2020115292A
Application number: JP2019006433A
Authority: JP
Inventors: Kenji Oki (大木 憲二); Hideo Tanida (谷田 英生)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2019-01-17
Filing date: 2019-01-17
Publication date: 2020-07-30
Anticipated expiration: 2039-01-17
Also published as: JP7275591B2

Abstract

To provide an evaluation support program, an evaluation support method, and an information processing device capable of supporting evaluation of the validity of a determination result derived through machine learning. SOLUTION: A prediction result screen 800 displays the predicted type "hardware" in association with the input item name "AP service." The input item name corresponds to the input data to be determined. The predicted type corresponds to the category into which the input data to be determined is classified using a learning model MD. On the prediction result screen 800, the item names "P server" and "P server" are displayed in association with the item name "AP service." These two item names are the top two learned input data (item names) that belong to the category "hardware" and have the highest similarity to the input data "AP service" to be determined, and they serve as the basis of the prediction result derived using the learning model MD. SELECTED DRAWING: Figure 8

Description

The present invention relates to an evaluation support program, an evaluation support method, and an information processing device.

In recent years, AI (Artificial Intelligence) has been used to automate business decisions. In many cases, the criteria for a business decision are not codified as rules, and the decision is made based on human experience and know-how. For such business decisions, labeling (classification) of new input data is automated by, for example, expressing past character-string input data as feature amounts and performing supervised learning with the decision results as labels.

As prior art, there is, for example, a technique that determines the degree of exception of input data, selects a learning model from a model storage unit based on the determination result, and updates the learned model using data stored in a model execution history storage unit and in inter-model dependency relationships.

Japanese Laid-open Patent Publication No. 10-074188

However, with the conventional technology, it is difficult to evaluate the validity of a judgment result obtained by a method based on machine learning. For example, unless the basis on which machine learning determined the label is shown, a human may be unable to evaluate the validity of the judgment result.

In one aspect, an object of the present invention is to support evaluation of the validity of a judgment result obtained by machine learning.

In one embodiment, an evaluation support program is provided that determines the category to which input data to be determined belongs by using a learning model that determines, from the feature amount of input data, the category to which that input data belongs; extracts, from the learned input data used when generating the learning model, other input data that belongs to the determined category and differs from the input data to be determined; and outputs the determined category and the extracted other input data in association with the input data to be determined.

According to one aspect of the present invention, evaluation of the validity of a judgment result obtained by machine learning can be supported.

FIG. 1 is an explanatory diagram showing an example of the evaluation support method according to the embodiment.
FIG. 2 is an explanatory diagram showing a system configuration example of the information processing system 200.
FIG. 3 is a block diagram showing a hardware configuration example of the information processing device 101.
FIG. 4 is an explanatory diagram showing an example of the storage contents of the learning data DB 220.
FIG. 5 is a block diagram showing a functional configuration example of the information processing device 101.
FIG. 6 is an explanatory diagram showing an example of calculating the feature vector.
FIG. 7 is an explanatory diagram showing an example of the storage contents of the similarity table 700.
FIG. 8 is an explanatory diagram (No. 1) showing a screen example of the prediction result screen.
FIG. 9 is an explanatory diagram (No. 2) showing a screen example of the prediction result screen.
FIG. 10 is an explanatory diagram (No. 3) showing a screen example of the prediction result screen.
FIG. 11 is a flowchart showing an example of a learning processing procedure of the information processing device 101.
FIG. 12 is a flowchart (No. 1) showing an example of a first prediction processing procedure of the information processing device 101.
FIG. 13 is a flowchart (No. 2) showing an example of the first prediction processing procedure of the information processing device 101.
FIG. 14 is a flowchart (No. 1) showing an example of a second prediction processing procedure of the information processing device 101.
FIG. 15 is a flowchart (No. 2) showing an example of the second prediction processing procedure of the information processing device 101.

Embodiments of an evaluation support program, an evaluation support method, and an information processing device according to the present invention will be described in detail below with reference to the drawings.

(Embodiment)
FIG. 1 is an explanatory diagram showing an example of the evaluation support method according to the embodiment. In FIG. 1, the information processing device 101 is a computer that supports evaluation of the validity of a judgment result obtained by a method based on machine learning. Machine learning creates a learning model (prediction model) from various data and uses it to predict results.

Cases in which AI is used to automate business decisions are increasing. Examples of business decisions include an accounting clerk assigning an expense code Y1 to a requested purchase item X1, and a purchasing receptionist assigning a person in charge Y2 to a purchase case X2. For such business decisions, labeling of new input data can be automated by expressing past input data as feature amounts and performing supervised learning with the decision results as labels. In the above examples, the requested purchase item X1 and the purchase case X2 correspond to "input data", and the expense code Y1 and the person in charge Y2 correspond to "labels".

It is difficult to achieve a 100% accuracy rate through automation by machine learning. However, adding a step in which a human confirms whether the label obtained by the judgment is correct makes application to actual work possible, and work can be expected to become more efficient than when a human makes every judgment from scratch. If the person judges during confirmation that the label is wrong, the label must be corrected.

However, unless the information that served as the basis for the machine learning judgment of the label is presented, it may be difficult for a human to evaluate the validity of the judgment result. If a human cannot properly evaluate the validity of the judgment result, a correct label may be wrongly changed, or a wrong label may be overlooked.

Note that classification using a decision tree is conceivable as a machine learning algorithm that can present the basis of a judgment result. In classification using a decision tree, the learning model can be expressed as a tree structure in which each node holds a rule (logical expression), so the basis can be explained as a sequence of rules.

However, the rules are created solely for the machine to perform classification, and individual rules are often not easy for humans to understand. In addition, up to as many rules as the depth of the tree are applied to a given piece of input data, so the number of rules that must be grasped to understand the basis becomes enormous.

Furthermore, since the decision-tree method is itself one implementation of a machine learning algorithm, if this method does not achieve sufficient accuracy, the rules serving as the basis are themselves unreliable and the method cannot be applied. Therefore, a technique is desired that enables humans to evaluate the validity of a judgment result while ensuring the accuracy of the judgment result obtained by machine learning.

Therefore, the present embodiment describes an evaluation support method that supports evaluation of the validity of a judgment result obtained by machine learning: when outputting the category obtained by machine learning for data input as a determination target, the method also shows other cases judged to belong to that category. A processing example of the information processing device 101 is described below.

(1) The information processing device 101 uses a learning model 110 to determine the category to which the input data to be determined belongs. Here, the learning model 110 is a prediction model that determines, from the feature amount of input data, the category to which that input data belongs. The learning model 110 is represented by, for example, a mathematical formula or the tree-structure data of a decision tree.

The input data is data input as the target of the determination of which category it belongs to, for example the requested purchase item X1 or the purchase case X2 described above. The category is a type into which input data is classified, for example the expense code Y1 or the person in charge Y2 described above. The feature amount of input data is a numerical representation of the characteristics of the input data. For example, the feature amount of input data is expressed as an N-gram feature vector.

In the example of FIG. 1, the input data to be determined is "input data X", and it is assumed that "category Y" has been determined as the category to which the input data X belongs.

(2) The information processing device 101 extracts, from learned data 120, input data that belongs to the determined category and differs from the input data to be determined. Here, the learned data 120 includes the learned input data used when the learning model 110 was generated.

Specifically, the learned data 120 is the set of teacher data used to generate the learning model 110. Teacher data is data on the "example" and the "answer" given in supervised learning, and is a pair of input data (example) and the category to which the input data belongs (answer).

In the example of FIG. 1, it is assumed that "input data X'", which belongs to category Y and differs from the input data X to be determined, has been extracted from the learned data 120.

(3) The information processing device 101 outputs the determined category and the extracted input data in association with the input data to be determined. Specifically, for example, the information processing device 101 outputs the determined category Y and the extracted input data X' in association with the input data X to be determined.

In this way, when outputting the category obtained by machine learning for the input data to be determined, the information processing device 101 can show other cases (learned input data) judged to belong to that category. This makes it possible to present information that serves as the basis for the category determination, and to support evaluation of the validity of the judgment result obtained by machine learning.

In the example of FIG. 1, the category Y and the input data X' are output in association with the input data X to be determined. As a result, even without knowing the exact categories of the input data X and X', a user who can tell that the input data X and the input data X' are not of the same type can notice that the judgment result (category Y) for the input data X may be wrong. The user can then check the judgment result carefully, which prevents a label (category) error from being overlooked. In the following description, "category" may be written as "label".
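Steps (1) to (3) above can be illustrated with a minimal sketch. The names `LearnedExample`, `judge_with_evidence`, and `toy_model`, as well as the sample records, are hypothetical and are not taken from the embodiment; in particular, `toy_model` is only a stand-in for the learning model 110.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class LearnedExample:
    """One record of the learned data 120: input data and its category."""
    input_data: str
    category: str


def judge_with_evidence(
    target: str,
    model: Callable[[str], str],
    learned_data: List[LearnedExample],
) -> Tuple[str, str, List[str]]:
    # (1) Determine the category of the target input with the learning model.
    category = model(target)
    # (2) Extract learned inputs of that category, other than the target itself.
    others = [
        ex.input_data
        for ex in learned_data
        if ex.category == category and ex.input_data != target
    ]
    # (3) Output the category and the extracted inputs, associated with the target.
    return target, category, others


def toy_model(text: str) -> str:
    # Hypothetical stand-in for learning model 110 (assumption, not the real model).
    return "hardware" if "server" in text else "software"


learned = [
    LearnedExample("P server", "hardware"),
    LearnedExample("Q software", "software"),
    LearnedExample("R server rack", "hardware"),
]
result = judge_with_evidence("AP server", toy_model, learned)
print(result)
```

Returning the extracted inputs together with the category lets the caller display both side by side, which is the point of step (3): the user compares the target with the learned examples rather than inspecting the model itself.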

(Example of system configuration of information processing system 200)
Next, a system configuration example of the information processing system 200 including the information processing device 101 shown in FIG. 1 will be described. The information processing system 200 is applied to, for example, a computer system for automating various business decisions in a company.

FIG. 2 is an explanatory diagram showing a system configuration example of the information processing system 200. In FIG. 2, the information processing system 200 includes the information processing device 101 and a client device 201. In the information processing system 200, the information processing device 101 and the client device 201 are connected via a wired or wireless network 210. The network 210 is, for example, a LAN (Local Area Network), a WAN (Wide Area Network), or the Internet.

Here, the information processing device 101 has a learning data DB (Database) 220. The information processing device 101 is, for example, a server. The storage contents of the learning data DB 220 will be described later with reference to FIG. 4.

The client device 201 is a computer used by a user of the information processing system 200. The user of the information processing system 200 is a person who makes various business decisions, for example an accounting clerk or a purchasing receptionist in a company. The client device 201 is, for example, a PC (Personal Computer) or a tablet PC.

In the above description, the information processing device 101 and the client device 201 are provided separately, but the configuration is not limited to this. For example, the information processing device 101 may be realized by the client device 201. Also, although only one client device 201 is shown in the example of FIG. 2, the configuration is not limited to this. For example, the information processing system 200 may include a client device 201 for each user.

(Example of hardware configuration of information processing device 101)
FIG. 3 is a block diagram showing a hardware configuration example of the information processing device 101. In FIG. 3, the information processing device 101 includes a processor 301, a memory 302, a disk drive 303, a disk 304, a communication I/F (Interface) 305, a portable recording medium I/F 306, and a portable recording medium 307. The components are connected to one another by a bus 300.

Here, the processor 301 controls the entire information processing device 101. The processor 301 may have a plurality of cores. The processor 301 is, for example, a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit).

The memory 302 includes, for example, a ROM (Read Only Memory), a RAM (Random Access Memory), and a flash ROM. Specifically, for example, the flash ROM stores an OS (Operating System) program, the ROM stores application programs, and the RAM is used as a work area of the processor 301. A program stored in the memory 302 is loaded into the processor 301 and causes the processor 301 to execute the coded processing.

The disk drive 303 controls reading and writing of data on the disk 304 under the control of the processor 301. The disk 304 stores the data written under the control of the disk drive 303. Examples of the disk 304 include a magnetic disk and an optical disk.

The communication I/F 305 is connected to the network 210 through a communication line, and is connected to an external computer (for example, the client device 201 shown in FIG. 2) via the network 210. The communication I/F 305 serves as the interface between the network 210 and the inside of the device, and controls the input and output of data from and to the external computer. For example, a modem or a LAN adapter can be used as the communication I/F 305.

The portable recording medium I/F 306 controls reading and writing of data on the portable recording medium 307 under the control of the processor 301. The portable recording medium 307 stores the data written under the control of the portable recording medium I/F 306. Examples of the portable recording medium 307 include a CD (Compact Disc)-ROM, a DVD (Digital Versatile Disk), and a USB (Universal Serial Bus) memory.

In addition to the components described above, the information processing device 101 may include, for example, an SSD (Solid State Drive), an input device, and a display. The information processing device 101 may also omit some of the components described above, for example the disk drive 303, the disk 304, the portable recording medium I/F 306, and the portable recording medium 307. The client device 201 shown in FIG. 2 can be realized by the same hardware configuration as the information processing device 101. However, in addition to the components described above, the client device 201 has an input device, a display (for example, the display 810 shown in FIG. 8 described later), and the like.

(Storage contents of learning data DB 220)
Next, the storage contents of the learning data DB 220 of the information processing device 101 will be described with reference to FIG. 4. The learning data DB 220 is realized by, for example, a storage device such as the memory 302 or the disk 304 shown in FIG. 3.

FIG. 4 is an explanatory diagram showing an example of the storage contents of the learning data DB 220. In FIG. 4, the learning data DB 220 has product name and type fields, and stores learning data (for example, learning data 400-1 to 400-3) as records by setting information in each field.

Here, the product name is the name of an item and corresponds to the input data (example) used for supervised learning. The type is the category to which the product name belongs, that is, the category to which the item with that product name belongs, and corresponds to the label (answer) used for supervised learning. For example, the learning data 400-1 includes the product name "P server" and the type "hardware", and indicates that the type (category) to which "P server" belongs is "hardware".
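As a rough illustration of the record layout described above, the learning data DB 220 could be modeled as a list of two-field records. The class name `LearningData`, the field name `item_type`, and the helper `lookup_type` are hypothetical; only the first record (learning data 400-1) follows the text, and the other two records are assumed for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LearningData:
    """One record of the learning data DB 220."""
    product_name: str  # input data (example) for supervised learning
    item_type: str     # category, i.e. the label (answer)


# Record 400-1 follows the text; the remaining records are assumptions.
learning_data_db: List[LearningData] = [
    LearningData("P server", "hardware"),
    LearningData("Q software", "software"),
    LearningData("R monitor", "hardware"),
]


def lookup_type(db: List[LearningData], product_name: str) -> Optional[str]:
    """Return the type registered for a product name, or None if absent."""
    for record in db:
        if record.product_name == product_name:
            return record.item_type
    return None


print(lookup_type(learning_data_db, "P server"))
```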

(Example of functional configuration of information processing device 101)
FIG. 5 is a block diagram showing a functional configuration example of the information processing device 101. In FIG. 5, the information processing device 101 includes an acquisition unit 501, a learning processing unit 502, a reception unit 503, a prediction processing unit 504, an extraction unit 505, an output unit 506, an update unit 507, and a storage unit 510. Specifically, for example, the acquisition unit 501 through the update unit 507 realize their functions by causing the processor 301 to execute a program stored in a storage device such as the memory 302, the disk 304, or the portable recording medium 307 shown in FIG. 3, or by means of the communication I/F 305. The processing result of each functional unit is stored in a storage device such as the memory 302 or the disk 304. The storage unit 510 is realized by, for example, a storage device such as the memory 302 or the disk 304.

The acquisition unit 501 acquires learning data. Here, the learning data is information used for machine learning (supervised learning), and includes input data and the category to which the input data belongs. For example, the learning data is a pair of a product name (input data) and the type (category) to which the product name belongs.

Specifically, for example, the acquisition unit 501 acquires learning data (product name, type) by receiving it from the client device 201. The acquisition unit 501 may also acquire learning data (product name, type) through a user's operation input using an input device (not shown) of the information processing device 101.

The acquired learning data (product name, type) is stored in, for example, the learning data DB 220 shown in FIG. 4.

The learning processing unit 502 generates a learning model MD based on the acquired learning data. Here, the learning model MD is a prediction model that determines, from the feature amount of input data, the category to which that input data belongs. That is, the learning model MD labels input data (multi-class classification). The learning model MD may be expressed as a mathematical formula or as the tree-structure data of a decision tree. The learning model 110 shown in FIG. 1 corresponds to, for example, the learning model MD.

Specifically, for example, the learning processing unit 502 first acquires a base learning model MD. The base learning model MD is, for example, created in advance and stored in a storage device such as the memory 302 or the disk 304. Next, the learning processing unit 502 acquires learning data from the learning data DB 220. The learning processing unit 502 then calculates the feature vector of the product name (input data) of the acquired learning data.

Next, the learning processing unit 502 expresses the acquired learning data as a pair of a feature vector and a label, and stores the pair in input information IN_D. At this time, the learning processing unit 502 stores the pair of the feature vector and the label of each product name (input data) in the input information IN_D in association with that product name (input data), for example.

ここで、図６を用いて、品名（入力データ）の特徴量ベクトルの算出例について説明する。ここでは、品名（入力データ）の特徴量ベクトルとして、Ｔｒｉ−ｇｒａｍの特徴量ベクトルを算出する場合を例に挙げて説明する。 Here, an example of calculating the feature amount vector of the product name (input data) will be described with reference to FIG. Here, a case where a Tri-gram feature amount vector is calculated as the feature amount vector of the product name (input data) will be described as an example.

図６は、特徴量ベクトルの算出例を示す説明図である。図６において、品名（入力データ）の例として、「Ｐサーバ」、「Ｑソフト」および「Ｐサーバー」が示されている。ここでは、学習データＤＢ２２０内の品名（入力データ）が、「Ｐサーバ」、「Ｑソフト」および「Ｐサーバー」の３つである場合を想定する。 FIG. 6 is an explanatory diagram showing an example of calculating the feature amount vector. In FIG. 6, as examples of product names (input data), “P server”, “Q software”, and “P server” are shown. Here, it is assumed that there are three product names (input data) in the learning data DB 220: “P server”, “Q software”, and “P server”.

この場合、学習処理部５０２は、各品名（入力データ）における部分文字列の存在の有無に応じて「０」または「１」を取ることで、各品名（入力データ）の特徴量ベクトルを算出する。各部分文字列は、各品名（入力データ）を３文字区切りで分割したものである。ただし、＄は、空白文字を示す。 In this case, the learning processing unit 502 calculates the feature amount vector of each product name (input data) by taking “0” or “1” depending on the presence or absence of the partial character string in each product name (input data). To do. Each partial character string is a product name (input data) divided into three character segments. However, $ indicates a blank character.

「Ｐサーバ」を例に挙げると、「＄＄Ｐ，＄Ｐサ，Ｐサー，サーバ，ーバ＄，バ＄＄，＄＄Ｑ，＄Ｑソ，Ｑソフ，ソフト，フト＄，ト＄＄，＄＄Ｐ，＄Ｐサ，Ｐサー，サーバ，ーバー，バー＄，ー＄＄」の各部分文字列の存在の有無に応じて特徴量ベクトルを算出する。なお、図６では、一部の部分文字列を省略している。 Taking "P server" as an example, "P$P, $P server, P server, server, server $, server $$, $$Q, $Q software, Q software, software, software $, server $" A feature amount vector is calculated according to the presence or absence of each partial character string of "$, $$P, $P server, P server, server, overbar, bar $, -$$". Note that some of the partial character strings are omitted in FIG.

例えば、部分文字列「＄＄Ｐ」は、「Ｐサーバ」に含まれる。このため、「Ｐサーバ」の特徴量ベクトルのうち、部分文字列「＄＄Ｐ」に対応する値は「１」となる。また、部分文字列「＄＄Ｑ」は、「Ｐサーバ」に含まれない。このため、「Ｐサーバ」の特徴量ベクトルのうち、部分文字列「＄＄Ｑ」に対応する値は「０」となる。 For example, the partial character string “$$P” is included in “P server”. Therefore, the value corresponding to the partial character string “$$P” in the feature amount vector of “P server” is “1”. In addition, the partial character string “$$Q” is not included in “P server”. Therefore, the value corresponding to the partial character string “$$Q” in the feature value vector of “P server” is “0”.

In this way, by assigning "1" or "0" according to the presence or absence of each partial character string, the feature amount vector "1, 1, 1, 1, 1, 1, 0, …" of "Ｐサーバ" can be calculated. Similarly, the feature amount vector "0, 0, 0, 0, 0, 0, 1, …" of "Ｑソフト" and the feature amount vector "1, 1, 1, 1, 0, 1, 0, …" of "Ｐサーバー" can be calculated.
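As a non-limiting sketch, the calculation described above can be written as follows. The function names `trigrams`, `build_vocabulary`, and `feature_vector` are hypothetical and do not appear in the embodiment. The sketch also assumes that duplicate partial character strings are collapsed into a single column, whereas the list in the text repeats them.

```python
def trigrams(name, pad="$"):
    """Return every 3-character window of the name padded with two blanks per side."""
    padded = pad * 2 + name + pad * 2
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def build_vocabulary(names):
    """Collect the partial character strings of all names, in first-seen order."""
    vocabulary = []
    for name in names:
        for gram in trigrams(name):
            if gram not in vocabulary:  # deduplication is an assumption of this sketch
                vocabulary.append(gram)
    return vocabulary


def feature_vector(name, vocabulary):
    """Return 1 for each vocabulary entry present in the name, otherwise 0."""
    present = set(trigrams(name))
    return [1 if gram in present else 0 for gram in vocabulary]


names = ["Ｐサーバ", "Ｑソフト", "Ｐサーバー"]
vocabulary = build_vocabulary(names)
for name in names:
    print(name, feature_vector(name, vocabulary))
```

With the padding character "$" standing in for the blank, `trigrams("Ｐサーバ")` yields the six partial character strings "$$Ｐ" through "バ$$" listed in the text.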

The learning processing unit 502 may also express the feature amount vector of each product name (input data) as a sparse vector. The feature amount vector of a product name tends to contain many more "0" elements than "1" elements. Therefore, when the feature amount vectors of all product names (input data) are arranged as a matrix (see FIG. 6), the feature amount vector of each product name may be expressed as information indicating the row and column at which each "1" is located.

For example, the feature amount vector of "Ｐサーバ" may be represented by the list structure "(1, 1), (1, 2), (1, 3), …, (1, 6)". Compared with storing every element of the feature amount vector of each product name (input data), this reduces the memory used to store the feature amount vectors.
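The sparse representation can be illustrated with the following sketch, which converts a 0/1 matrix to a list of 1-indexed (row, column) pairs and back. The helper names `to_sparse` and `to_dense` and the two-row matrix are assumptions made for illustration only.

```python
def to_sparse(matrix):
    """List the 1-indexed (row, column) position of every element equal to 1."""
    return [
        (r, c)
        for r, row in enumerate(matrix, start=1)
        for c, value in enumerate(row, start=1)
        if value == 1
    ]


def to_dense(coordinates, n_rows, n_cols):
    """Rebuild the full 0/1 matrix from the coordinate list."""
    matrix = [[0] * n_cols for _ in range(n_rows)]
    for r, c in coordinates:
        matrix[r - 1][c - 1] = 1
    return matrix


# Row 1 follows the "Ｐサーバ" vector in the text; row 2 is a shortened illustrative row.
matrix = [
    [1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 1],
]
coordinates = to_sparse(matrix)
print(coordinates)
```

Only the positions of the "1" elements are kept, so the storage grows with the number of partial character strings actually present rather than with the size of the whole vocabulary.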

Returning to the description of FIG. 5, the learning processing unit 502 performs supervised learning based on the sets of feature amount vector and label (teacher data) stored in the input information IN_D, and updates the learning model MD. More specifically, for example, the learning processing unit 502 updates the learning model MD by adjusting the parameters of a mathematical expression through multiple regression analysis, logistic regression, or the like, or by updating (correcting, adding, deleting, and so on) the rules held by each node of tree-structured data.

受付部５０３は、判断対象の入力データを受け付ける。ここで、判断対象の入力データは、どのカテゴリ（ラベル）に属するかを判断する対象となるデータである。判断対象の入力データは、例えば、品名である。カテゴリは、例えば、品名が属する種別である。 The receiving unit 503 receives input data to be determined. Here, the input data to be judged is the data to be judged to which category (label) it belongs. The input data to be judged is, for example, a product name. The category is, for example, the type to which the product name belongs.

Specifically, for example, the receiving unit 503 accepts the input data (product name) to be determined by receiving it from the client device 201. The acquisition unit 501 may also accept the input data (product name) to be determined through a user's operation input using an input device (not shown) of the information processing apparatus 101.

予測処理部５０４は、学習モデルＭＤを用いて、判断対象の入力データが属するカテゴリを判断する。具体的には、例えば、予測処理部５０４は、受け付けた判断対象の入力データの特徴量ベクトルを算出する。より具体的には、例えば、予測処理部５０４は、判断対象の入力データの特徴量ベクトルとして、Ｔｒｉ−ｇｒａｍの特徴量ベクトルを算出する。 The prediction processing unit 504 determines the category to which the input data to be determined belongs, using the learning model MD. Specifically, for example, the prediction processing unit 504 calculates a feature amount vector of the received determination target input data. More specifically, for example, the prediction processing unit 504 calculates a Tri-gram feature amount vector as the feature amount vector of the input data to be determined.

As an example, let the input data to be determined be "ＡＰサービス" (AP service). In this case, the prediction processing unit 504 calculates the feature amount vector of the input data "ＡＰサービス" according to the presence or absence of each of the partial character strings "$$Ｐ, $Ｐサ, Ｐサー, サーバ, ーバ$, バ$$, $$Ｑ, …, ー$$". These are the partial character strings of the learned input data described above (Ｐサーバ, Ｑソフト, Ｐサーバー).

For example, the partial character string "$$Ｐ" is not included in "ＡＰサービス". Therefore, the value corresponding to the partial character string "$$Ｐ" in the feature amount vector of "ＡＰサービス" is "0". The partial character string "Ｐサー", on the other hand, is included in "ＡＰサービス". Therefore, the value corresponding to the partial character string "Ｐサー" in the feature amount vector of "ＡＰサービス" is "1". In this way, by assigning "1" or "0" according to the presence or absence of each partial character string, the feature amount vector "0, 0, 1, 0, 0, 0, 0, …" of the input data "ＡＰサービス" to be determined can be calculated.
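The following sketch illustrates the point that the input data to be determined is vectorized only against partial character strings already obtained from the learned input data. The list `learned` holds the first seven entries quoted in the text; the helper `trigrams` and the variable names are hypothetical.

```python
def trigrams(name, pad="$"):
    """Return every 3-character window of the name padded with two blanks per side."""
    padded = pad * 2 + name + pad * 2
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


# First seven learned partial character strings, as listed in the text.
learned = ["$$Ｐ", "$Ｐサ", "Ｐサー", "サーバ", "ーバ$", "バ$$", "$$Ｑ"]

target = "ＡＰサービス"
target_grams = set(trigrams(target))

# Only learned partial character strings receive a position in the vector.
vector = [1 if gram in target_grams else 0 for gram in learned]

# Partial character strings of the target that were never learned are dropped.
unseen = sorted(target_grams - set(learned))

print(vector)
print(unseen)
```

Under these assumptions, only "Ｐサー" is shared with the learned partial character strings, so partial character strings such as "$$Ａ" have no influence on the resulting vector.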

そして、予測処理部５０４は、学習処理部５０２によって更新された学習モデルＭＤを用いて、算出した判断対象の入力データの特徴量ベクトルに対するラベルを予測する。すなわち、予測処理部５０４は、判断対象の入力データ（品名）の特徴量ベクトルを学習モデルＭＤに入力することで、判断対象の入力データが属するカテゴリ（ラベル）を判断する。 Then, the prediction processing unit 504 uses the learning model MD updated by the learning processing unit 502 to predict the label for the calculated feature amount vector of the input data to be determined. That is, the prediction processing unit 504 determines the category (label) to which the input data to be determined belongs by inputting the feature amount vector of the input data (product name) to be determined to the learning model MD.

以下の説明では、判断対象の入力データが属するカテゴリを「カテゴリ＃」と表記する場合がある。 In the following description, the category to which the determination target input data belongs may be referred to as “category #”.

抽出部５０５は、学習モデルＭＤを生成する際に用いた学習済みの入力データから、判断されたカテゴリ＃に属する、判断対象の入力データとは異なる他の入力データを抽出する。具体的には、例えば、抽出部５０５は、学習済みの入力データのうちのカテゴリ＃に属する入力データの中から、Ｋ個の他の入力データをランダムに抽出することにしてもよい。Ｋは、任意に設定可能であり、例えば、１〜１０程度の値に設定される。 The extraction unit 505 extracts, from the learned input data used when generating the learning model MD, other input data belonging to the determined category # and different from the determination target input data. Specifically, for example, the extraction unit 505 may randomly extract K other input data from the input data belonging to the category # of the learned input data. K can be set arbitrarily, and is set to a value of about 1 to 10, for example.

また、抽出部５０５は、判断対象の入力データと学習済みの入力データそれぞれとの類似度を算出することにしてもよい。そして、抽出部５０５は、算出した類似度に基づいて、学習済みの入力データから、カテゴリ＃に属する他の入力データを抽出することにしてもよい。 The extraction unit 505 may also calculate the degree of similarity between the input data to be determined and the learned input data. Then, the extraction unit 505 may extract other input data belonging to the category # from the learned input data based on the calculated similarity.

ここで、類似度とは、入力データ同士の類似度合いを示す指標値である。類似度としては、例えば、判断対象の入力データの特徴量ベクトルと、学習済みの入力データの特徴量ベクトルとのコサイン類似度を用いることができる。コサイン類似度は、データとデータとのベクトルの向きの近さ（角度）により、データ同士の類似度合いを評価するものである。 Here, the degree of similarity is an index value indicating the degree of similarity between input data. As the similarity, for example, the cosine similarity between the feature amount vector of the input data to be determined and the feature amount vector of the learned input data can be used. The cosine similarity is used to evaluate the degree of similarity between data based on the closeness (angle) of the vector directions of the data.

More specifically, for example, the extraction unit 505 counts the number a of elements of the feature amount vector for which both input data have the value 1. The extraction unit 505 also counts the number b of elements for which either one of the input data has the value 1. The extraction unit 505 then divides a by b to obtain the cosine similarity (a/b) between the feature amount vector of the input data to be determined and the feature amount vector of the learned input data. In this case, the maximum value of the similarity is "1" and the minimum value is "0".

The feature amount vector of the learned input data is specified, for example, from the input information IN_D. When the feature amount vector of the learned input data is expressed as a sparse vector, the extraction unit 505 first restores the sparse vector to a feature amount vector and then calculates the cosine similarity between the feature amount vector of the input data to be determined and the feature amount vector of the learned input data.
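The a/b calculation can be sketched as follows, under the assumption that b counts the elements for which at least one of the two vectors has the value 1 (this reading gives the stated range of 0 to 1). For 0/1 vectors this ratio is the quantity commonly called the Jaccard index, whereas the textbook cosine similarity divides a by the product of the vector norms. Both are shown for comparison; the function names are hypothetical.

```python
import math


def ratio_a_over_b(x, y):
    """a: elements where both vectors are 1; b: elements where at least one is 1."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 or yi == 1)
    return a / b if b else 0.0


def textbook_cosine(x, y):
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


# Shortened vectors taken from the examples in the text.
target = [0, 0, 1, 0, 0, 0, 0]   # "ＡＰサービス"
learned = [1, 1, 1, 1, 1, 1, 0]  # "Ｐサーバ"

print(ratio_a_over_b(target, learned))
print(textbook_cosine(target, learned))
```

Both measures are 1 for identical non-zero vectors and 0 for vectors that share no "1" element, so either can serve as the bounded similarity that the later threshold comparison relies on.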

算出された類似度は、例えば、図７に示すような類似度テーブル７００に記憶される。類似度テーブル７００は、例えば、メモリ３０２、ディスク３０４などの記憶装置により実現される。ここで、類似度テーブル７００の記憶内容について説明する。 The calculated similarity is stored in the similarity table 700 as shown in FIG. 7, for example. The similarity table 700 is realized by a storage device such as the memory 302 and the disk 304, for example. Here, the storage contents of the similarity table 700 will be described.

FIG. 7 is an explanatory diagram showing an example of the stored contents of the similarity table 700. In FIG. 7, the similarity table 700 has fields for product name, type, cosine similarity, and similarity rank. By setting information in each field, similarity information (for example, similarity information 700-1 to 700-3) is stored as records.

ここで、品名は、学習済みの入力データである。種別は、品名（学習済みの入力データ）が属するカテゴリである。コサイン類似度は、判断対象の入力データの特徴量ベクトルと、学習済みの入力データの特徴量ベクトルとのコサイン類似度である。図７の例では、判断対象の入力データを、品名「Ａサーバ」とする。類似順位は、各学習済みの入力データを、判断対象の入力データとのコサイン類似度が降順となるように並べたときの順位である。 Here, the product name is the learned input data. The type is a category to which the product name (learned input data) belongs. The cosine similarity is the cosine similarity between the feature amount vector of the input data to be judged and the feature amount vector of the learned input data. In the example of FIG. 7, the input data to be determined is the product name “A server”. The similarity rank is a rank when the learned input data are arranged in descending order of the cosine similarity with the input data to be determined.

例えば、類似度情報７００−１は、学習済みの入力データ「Ｐサーバ」の種別「ハードウェア」、コサイン類似度「０．５００」および類似順位「１」を示す。 For example, the similarity information 700-1 indicates the type “hardware” of the learned input data “P server”, the cosine similarity “0.500”, and the similarity order “1”.

Returning to the description of FIG. 5, the extraction unit 505 may, for example, extract from the learned input data the top N (N: a natural number) other input data that belong to the determined category # and have the highest similarity to the input data to be determined. N can be set arbitrarily and is set, for example, to a value of about 1 to 5.

Specifically, for example, the extraction unit 505 refers to the similarity table 700 shown in FIG. 7 and extracts, from the learned input data (product names), the top N other input data (product names) that belong to the category # and have the highest cosine similarity to the input data to be determined. For example, when N = 1, the extraction unit 505 extracts, from the input data belonging to the category #, the other input data having the maximum cosine similarity to the input data to be determined.
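As a non-limiting sketch, the extraction of the top N entries of a category from a similarity table can be written as follows. The `SimilarityRecord` class, the function `top_n_in_category`, and the English category names are hypothetical. Only the first record reflects the similarity information 700-1 described in the text; the other two records carry illustrative values.

```python
from dataclasses import dataclass


@dataclass
class SimilarityRecord:
    name: str          # learned input data (product name)
    category: str      # type to which the product name belongs
    similarity: float  # similarity to the input data to be determined


def top_n_in_category(records, category, n):
    """Return the n records of the given category with the highest similarity."""
    same_category = [r for r in records if r.category == category]
    return sorted(same_category, key=lambda r: r.similarity, reverse=True)[:n]


table = [
    SimilarityRecord("Ｐサーバ", "hardware", 0.500),
    SimilarityRecord("Ｑソフト", "software", 0.000),   # illustrative value
    SimilarityRecord("Ｐサーバー", "hardware", 0.154),  # illustrative value
]

for record in top_n_in_category(table, "hardware", n=2):
    print(record.name, record.similarity)
```

Setting `n=1` corresponds to the case N = 1, in which only the most similar entry of the category is extracted.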

出力部５０６は、判断対象の入力データと対応付けて、判断されたカテゴリ＃と、抽出された他の入力データとを出力する。また、出力部５０６は、判断対象の入力データと対応付けて、さらに、抽出された他の入力データと判断対象の入力データとの類似度を出力することにしてもよい。 The output unit 506 outputs the determined category # and the other extracted input data in association with the input data to be determined. Further, the output unit 506 may output the similarity between the other extracted input data and the determination target input data in association with the determination target input data.

出力部５０６の出力形式としては、例えば、通信Ｉ／Ｆ３０５による他のコンピュータ（例えば、クライアント装置２０１）への送信、不図示のディスプレイへの表示、不図示のプリンタへの印刷出力などがある。 The output format of the output unit 506 includes, for example, transmission by the communication I/F 305 to another computer (for example, the client device 201), display on a display not shown, print output to a printer not shown, and the like.

具体的には、例えば、クライアント装置２０１から判断対象の入力データ（品名）を受け付けたとする。この場合、出力部５０６は、クライアント装置２０１に予測結果画面を表示することにしてもよい。ここで、予測結果画面は、判断対象の入力データと対応付けて、予測処理部５０４によって判断されたカテゴリ＃と、抽出部５０５によって抽出された他の入力データとを表示する画面である。 Specifically, for example, it is assumed that the input data (product name) to be determined is received from the client device 201. In this case, the output unit 506 may display the prediction result screen on the client device 201. Here, the prediction result screen is a screen that displays the category # judged by the prediction processing unit 504 and other input data extracted by the extraction unit 505 in association with the input data to be judged.

予測結果画面の画面例については、図８を用いて後述する。 A screen example of the prediction result screen will be described later with reference to FIG.

Further, when the similarity between the extracted other input data and the input data to be determined is equal to or less than a threshold α, the output unit 506 may additionally output a predetermined alert AL in association with the input data to be determined. The threshold α can be set arbitrarily. For example, when the similarity is the cosine similarity (0 or more and 1 or less), the threshold α is set to a value of about 0.3.

Specifically, for example, the output unit 506 refers to the similarity table 700 and identifies, among the input data belonging to the category #, the cosine similarity of the other input data that has the maximum cosine similarity to the input data to be determined. Then, when the identified cosine similarity is equal to or less than the threshold α, the output unit 506 additionally outputs a predetermined alert AL in association with the input data to be determined.

アラートＡＬは、判断対象の入力データが属するカテゴリの判断結果が誤っている可能性があることを伝えて、注意を促すものである。例えば、アラートＡＬは、警告メッセージであってもよいし、警告画像であってもよい。すなわち、判断対象の入力データとの類似度が最大の他の入力データの類似度が低いほど、判断結果が誤っている可能性が高くなる傾向があるため、ユーザに注意を促す。 The alert AL conveys that the determination result of the category to which the input data to be determined belongs may be incorrect, and calls attention. For example, the alert AL may be a warning message or a warning image. That is, the lower the degree of similarity of the other input data having the highest degree of similarity with the input data to be determined, the higher the possibility that the determination result will be erroneous, and the user is warned.
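The threshold check for the alert AL can be sketched as follows. The function name `needs_similarity_alert` is hypothetical, and treating an empty category as a case that raises the alert is an assumption of this sketch, since the text does not address it.

```python
def needs_similarity_alert(similarities_in_category, alpha=0.3):
    """Return True when the best similarity within the category is at most alpha."""
    if not similarities_in_category:
        return True  # assumption: no supporting example is treated as a reason to warn
    return max(similarities_in_category) <= alpha


print(needs_similarity_alert([0.500, 0.154]))  # best similarity exceeds alpha
print(needs_similarity_alert([0.144]))         # best similarity is at most alpha
```

The default `alpha=0.3` follows the example value given for the threshold α.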

アラートＡＬの出力例については、図９を用いて後述する。 An output example of the alert AL will be described later with reference to FIG.

また、出力部５０６は、抽出された他の入力データが、学習済みの入力データのうち、判断対象の入力データとの類似度が大きい上位Ｍ個（Ｍ：自然数）の入力データに含まれない場合、判断対象データと対応付けて、さらに、所定のアラートＡＬを出力することにしてもよい。Ｍは、例えば、１〜５程度の値に設定される。 Further, the output unit 506 does not include other extracted input data in the upper M (M: natural number) input data having a high degree of similarity with the input data to be determined among the learned input data. In this case, a predetermined alert AL may be further output in association with the determination target data. M is set to a value of about 1 to 5, for example.

Specifically, for example, the output unit 506 refers to the similarity table 700 and identifies, among the input data belonging to the category #, the similarity rank of the other input data that has the maximum cosine similarity to the input data to be determined. Then, when the identified similarity rank falls below the threshold β (where β = M), that is, when that input data is not within the top M, the output unit 506 additionally outputs a predetermined alert AL in association with the input data to be determined.

すなわち、判断対象の入力データとの類似度が最大の他の入力データの類似順位が低いほど、判断結果が誤っている可能性が高くなる傾向があるため、ユーザに注意を促す。類似順位が閾値β以下の場合に出力されるアラートＡＬは、例えば、「類似順位が低いため注意してください」といった警告メッセージである。 That is, the lower the similarity rank of the other input data having the highest similarity to the input data to be determined, the higher the possibility that the determination result is incorrect, and the user is warned. The alert AL that is output when the similarity rank is less than or equal to the threshold value β is, for example, a warning message such as “Please note that the similarity rank is low”.
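The rank-based check can be sketched as follows, assuming that the alert is raised when the most similar entry of the category # is not among the top M of all learned input data. The function name `needs_rank_alert`, the tuple layout, and the sample records are hypothetical.

```python
def needs_rank_alert(records, category, m):
    """records: list of (name, category, similarity) tuples for all learned input data."""
    ranked = sorted(records, key=lambda r: r[2], reverse=True)
    in_category = [r for r in ranked if r[1] == category]
    if not in_category:
        return True  # assumption: no example in the category is treated as a reason to warn
    best_rank = ranked.index(in_category[0]) + 1  # 1 is the most similar overall
    return best_rank > m


records = [
    ("Ｘソフト", "software", 0.600),  # illustrative record
    ("Ｐサーバ", "hardware", 0.144),
]
print(needs_rank_alert(records, "hardware", m=1))
print(needs_rank_alert(records, "hardware", m=2))
```

In this example the best "hardware" entry ranks second overall, so the alert is raised for M = 1 but not for M = 2.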

また、受付部５０３は、予測処理部５０４によって判断されたカテゴリ＃が判断対象の入力データと対応付けて出力された結果、カテゴリ＃が正しいか否かを示す正誤情報を受け付けることにしてもよい。具体的には、例えば、受付部５０３は、クライアント装置２０１から正誤情報を受信することにより、受信した正誤情報を受け付ける。また、取得部５０１は、情報処理装置１０１の不図示の入力装置を用いたユーザの操作入力により、正誤情報を受け付けることにしてもよい。 Further, the receiving unit 503 may receive correctness information indicating whether or not the category # is correct as a result of the category # determined by the prediction processing unit 504 being output in association with the input data to be determined. .. Specifically, for example, the reception unit 503 receives the correct/incorrect information by receiving the correct/incorrect information from the client device 201. Further, the acquisition unit 501 may accept the correctness information by a user's operation input using an input device (not shown) of the information processing device 101.

そして、受付部５０３は、抽出された他の入力データと判断対象の入力データとの類似度を、受け付けた正誤情報と対応付けて記憶部５１０に記録することにしてもよい。他の入力データは、例えば、カテゴリ＃に属する入力データのうち、判断対象の入力データとの類似度が最大の他の入力データである。 Then, the reception unit 503 may record the similarity between the other input data extracted and the input data to be determined in the storage unit 510 in association with the received correctness information. The other input data is, for example, other input data having the maximum degree of similarity with the input data to be determined among the input data belonging to the category #.

更新部５０７は、記憶部５１０に記録された類似度と正誤情報とのペアに基づいて、閾値αを更新する。具体的には、例えば、更新部５０７は、記憶部５１０に記録された類似度と正誤情報とのペアを教師データとして、教師あり学習（機械学習）を行うことにより、閾値αを更新する。 The updating unit 507 updates the threshold value α based on the pair of similarity and correctness information recorded in the storage unit 510. Specifically, for example, the updating unit 507 updates the threshold value α by performing supervised learning (machine learning) using the pair of similarity and correctness information recorded in the storage unit 510 as teacher data.

As an example, suppose that the threshold α is 0.3 and that the pairs <0.144, incorrect>, <0.188, correct>, and <0.8, correct> have been recorded as pairs of similarity and correctness information. In this case, the updating unit 507 performs supervised learning using each of the pairs <0.144, incorrect>, <0.188, correct>, and <0.8, correct> as teacher data and, for example, changes the threshold α to 0.18.
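The text does not specify which learning method is used to update the threshold α. The following sketch uses a simple stand-in: it tries the midpoints between adjacent recorded similarities and keeps the one that misclassifies the fewest pairs. The function name `update_alpha` and this selection rule are assumptions, not the method of the embodiment.

```python
def update_alpha(pairs, current_alpha):
    """pairs: list of (similarity, is_correct). Return a threshold separating the pairs."""
    similarities = sorted({s for s, _ in pairs})
    candidates = [(lo + hi) / 2 for lo, hi in zip(similarities, similarities[1:])]
    if not candidates:
        return current_alpha  # too little feedback to move the threshold

    def errors(alpha):
        # An error is an alert on a correct prediction or no alert on an incorrect one.
        return sum(1 for s, is_correct in pairs if (s <= alpha) == is_correct)

    return min(candidates, key=errors)


pairs = [(0.144, False), (0.188, True), (0.8, True)]
new_alpha = update_alpha(pairs, current_alpha=0.3)
print(new_alpha)
```

For the three recorded pairs, any threshold between 0.144 and 0.188 separates the incorrect prediction from the correct ones, which is consistent with the example value 0.18; the sketch picks the midpoint of that interval.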

(Screen examples of the prediction result screen)
Next, screen examples of the prediction result screen displayed on the client device 201 will be described with reference to FIGS. 8 to 10. The prediction result screen is displayed on the display 810 of the client device 201, for example, under the control of the information processing apparatus 101, in response to the input data to be determined that is received from the client device 201.

ディスプレイ８１０は、カーソル、アイコンあるいはツールボックスをはじめ、文書、画像、機能情報などのデータを表示する表示装置である。ディスプレイ８１０としては、例えば、液晶ディスプレイ、有機ＥＬ（Ｅｌｅｃｔｒｏｌｕｍｉｎｅｓｃｅｎｃｅ）ディスプレイなどを採用することができる。 The display 810 is a display device that displays data such as a document, an image, and functional information in addition to a cursor, an icon, or a tool box. As the display 810, for example, a liquid crystal display, an organic EL (Electroluminescence) display, or the like can be adopted.

図８は、予測結果画面の画面例を示す説明図（その１）である。図８において、予測結果画面８００は、入力品名「ＡＰサービス」と対応付けて、当該入力品名「ＡＰサービス」の予測種別「ハードウェア」を表示する画面である。ここで、入力品名は、判断対象の入力データに対応する。予測種別は、判断対象の入力データが属すると判断されたカテゴリに対応する。 FIG. 8 is an explanatory diagram (No. 1) showing a screen example of the prediction result screen. In FIG. 8, the prediction result screen 800 is a screen that displays the prediction type “hardware” of the input product name “AP service” in association with the input product name “AP service”. Here, the input product name corresponds to the input data to be judged. The prediction type corresponds to the category determined to belong to the input data to be determined.

On the prediction result screen 800, the product name "Ｐサーバ" and the similarity "0.500" are displayed in association with the input product name "ＡＰサービス" (AP service). The product name "Ｐサーバー" and the similarity "0.154" are also displayed in association with the input product name "ＡＰサービス".

Here, the product names "Ｐサーバ" and "Ｐサーバー" are the two learned input data (product names) belonging to the category "hardware" that have the highest similarity to the input data "ＡＰサービス" to be determined. The similarities "0.500" and "0.154" are the cosine similarities between the input data "ＡＰサービス" to be determined and the product names "Ｐサーバ" and "Ｐサーバー", respectively.

なお、予測結果画面８００内の判定結果（予測種別、予測根拠）は、クライアント装置２０１の不図示の入力装置を用いたユーザの操作入力により、入力ボックス８０１に判断対象の入力データを入力し、判定ボタン８０２を選択することにより表示される。 The judgment result (prediction type, prediction basis) in the prediction result screen 800 is input by the user's operation using an input device (not shown) of the client device 201 to input the input data to be judged in the input box 801. It is displayed by selecting the determination button 802.

According to the prediction result screen 800, when the category "hardware" obtained by machine learning is output for the input data "ＡＰサービス" to be determined, other cases belonging to that category (Ｐサーバ, Ｐサーバー) can be shown. This presents the information on which the category determination is based and supports evaluation of the validity of the determination result obtained by machine learning.

For example, if the user recognizes that the input product name "ＡＰサービス" is of a different type from the product names "Ｐサーバ" and "Ｐサーバー", the user can notice that the predicted type "hardware" may be incorrect.

図９は、予測結果画面の画面例を示す説明図（その２）である。図９において、予測結果画面９００は、入力品名「ＡＰサービス」と対応付けて、当該入力品名「ＡＰサービス」の予測種別「ハードウェア」を表示する画面である。 FIG. 9 is an explanatory diagram (part 2) showing a screen example of the prediction result screen. In FIG. 9, the prediction result screen 900 is a screen for displaying the prediction type “hardware” of the input product name “AP service” in association with the input product name “AP service”.

また、予測結果画面９００には、入力品名「ＡＰサービス」と対応付けて、品名「Ｐサーバ」、類似度「０．１４４」および類似順位「１」が表示されている。ここで、品名「Ｐサーバ」は、カテゴリ「ハードウェア」に属する学習済みの入力データ（品名）のうち、判断対象の入力データ「ＡＰサービス」との類似度が最大の入力データである。 Further, on the prediction result screen 900, the product name “P server”, the similarity “0.144”, and the similarity order “1” are displayed in association with the input product name “AP service”. Here, the product name “P server” is the input data having the maximum degree of similarity with the input data “AP service” to be determined, among the learned input data (product name) belonging to the category “hardware”.

類似度「０．１４４」は、判断対象の入力データ「ＡＰサービス」と品名「Ｐサーバ」とのコサイン類似度である。類似順位「１」は、品名「Ｐサーバ」が、学習済みの入力データのうち、判断対象の入力データ「ＡＰサービス」との類似度が最大であることを示す。なお、ここでは説明のため、判断対象の入力データ「ＡＰサービス」と品名「Ｐサーバ」との類似度として、図８に示した例とは異なる値を用いている。 The similarity “0.144” is the cosine similarity between the input data “AP service” to be determined and the product name “P server”. The similarity order “1” indicates that the product name “P server” has the highest degree of similarity with the input data “AP service” to be determined, among the learned input data. For the sake of explanation, a value different from the example shown in FIG. 8 is used as the similarity between the input data “AP service” to be judged and the product name “P server”.

また、予測結果画面９００には、アラートＡＬ１が表示されている。アラートＡＬ１は、入力品名と最も類似する過去のデータ（Ｐサーバ）の類似度が低いため、予測種別が誤っている可能性があることを伝えて、ユーザに注意を促すものである。 In addition, an alert AL1 is displayed on the prediction result screen 900. The alert AL1 is to inform the user that the prediction type may be incorrect because the past data (P server) that is most similar to the input product name has a low similarity.

予測結果画面９００によれば、入力品名との類似度が一定の水準を満たさない場合に、ユーザに対する警告を出して、機械学習により得られた判断結果が誤っている可能性があることを示唆することができる。これにより、ユーザが機械学習により得られた判断結果を注意して確認することができ、ラベル（カテゴリ）の間違いが見逃されるのを防ぐことができる。 According to the prediction result screen 900, when the degree of similarity with the input product name does not satisfy a certain level, a warning is issued to the user and the judgment result obtained by machine learning may be incorrect. can do. This allows the user to carefully check the determination result obtained by machine learning, and prevent a mistake in the label (category) from being overlooked.

図１０は、予測結果画面の画面例を示す説明図（その３）である。図１０において、予測結果画面１０００は、入力品名「ＡＰサービス」と対応付けて、当該入力品名「ＡＰサービス」の予測種別「ハードウェア」を表示する画面である。 FIG. 10 is an explanatory diagram (part 3) showing an example of the prediction result screen. In FIG. 10, the prediction result screen 1000 is a screen for displaying the prediction type “hardware” of the input product name “AP service” in association with the input product name “AP service”.

予測結果画面１０００には、入力品名「ＡＰサービス」と対応付けて、品名「Ｐサーバ」、類似度「０．１４４」、類似順位「１」およびアラートＡＬ１が表示されている。 On the prediction result screen 1000, the product name “P server”, the similarity “0.144”, the similarity order “1”, and the alert AL1 are displayed in association with the input product name “AP service”.

予測結果画面１０００によれば、入力品名との類似度が一定の水準を満たさない場合に、ユーザに対する警告を出して、機械学習により得られた判断結果が誤っている可能性があることを示唆することができる。例えば、ユーザは、品名「Ｐサーバ」、類似度「０．１４４」、類似順位「１」を参照しながら、予測種別「ハードウェア」が正しいか否かを確認することができる。 According to the prediction result screen 1000, when the degree of similarity with the input product name does not satisfy a certain level, a warning is issued to the user and the judgment result obtained by machine learning may be incorrect. can do. For example, the user can confirm whether or not the prediction type “hardware” is correct by referring to the product name “P server”, the similarity “0.144”, and the similarity order “1”.

予測結果画面１０００において、ユーザの操作入力により、正ボタン１００１を選択すると、予測種別が正しいことを示す正誤情報を、クライアント装置２０１から情報処理装置１０１に送信することができる。また、予測結果画面１０００において、ユーザの操作入力により、誤ボタン１００２を選択すると、予測種別が誤っていることを示す正誤情報を、クライアント装置２０１から情報処理装置１０１に送信することができる。 When the correct button 1001 is selected by the user's operation input on the prediction result screen 1000, the correctness information indicating that the prediction type is correct can be transmitted from the client device 201 to the information processing device 101. Further, when the wrong button 1002 is selected by the user's operation input on the prediction result screen 1000, the correctness information indicating that the prediction type is wrong can be transmitted from the client device 201 to the information processing device 101.

これにより、入力品名（判断対象の入力データ）に対する予測種別（カテゴリ＃）の正誤を、情報処理装置１０１に通知することができ、ユーザからのフィードバックにより閾値αを調整可能となる。なお、正ボタン１００１、誤ボタン１００２は、図８および図９に示した予測結果画面８００，９００に含まれていてもよい。 As a result, it is possible to notify the information processing apparatus 101 of the correctness of the prediction type (category #) for the input product name (input data to be determined), and the threshold value α can be adjusted by feedback from the user. The correct button 1001 and the erroneous button 1002 may be included in the prediction result screens 800 and 900 shown in FIGS. 8 and 9.

(Various processing procedures of the information processing apparatus 101)
Next, various processing procedures of the information processing apparatus 101 will be described. First, the learning processing procedure of the information processing apparatus 101 will be described with reference to FIG. 11.

FIG. 11 is a flowchart showing an example of the learning processing procedure of the information processing apparatus 101. In the flowchart of FIG. 11, the information processing apparatus 101 first acquires a base learning model MD (step S1101). Next, the information processing apparatus 101 acquires, from the learning data DB 220, learning data that has not yet been acquired (step S1102).

Then, the information processing apparatus 101 calculates the feature amount vector of the product name (input data) of the acquired learning data (step S1103). Next, the information processing apparatus 101 stores the pair of the calculated feature amount vector and the label (type) of the learning data in the input information IN_D, in association with the product name (input data) (step S1104).

Then, the information processing apparatus 101 performs supervised learning based on the pairs of feature amount vectors and labels stored in the input information IN_D, and updates the learning model MD (step S1105). Next, the information processing apparatus 101 determines whether any learning data in the learning data DB 220 has not yet been acquired (step S1106).

If unacquired learning data remains (step S1106: Yes), the information processing apparatus 101 returns to step S1102. If no unacquired learning data remains (step S1106: No), the information processing apparatus 101 ends the series of processes in this flowchart.

This makes it possible to generate a learning model MD that determines, from the feature amount of input data, the category to which that input data belongs.
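As a rough illustration of the loop in steps S1102 to S1106, the following Python sketch builds the input information IN_D and updates a toy model one record at a time. The character-bigram featurizer `feature_vector`, the per-label count table that stands in for the learning model MD, and the sample records are all assumptions made for illustration; the description does not specify the feature extraction or the learning algorithm.

```python
from collections import Counter, defaultdict


def feature_vector(name: str) -> Counter:
    """Hypothetical featurizer: character-bigram counts of the product name."""
    padded = f" {name.lower()} "
    return Counter(padded[k:k + 2] for k in range(len(padded) - 1))


def learn(learning_data):
    """Follow steps S1102-S1106 for each (product name, label) record."""
    in_d = []                     # stands in for the input information IN_D
    model = defaultdict(Counter)  # toy stand-in for the learning model MD
    for name, label in learning_data:    # S1102, repeated until S1106: No
        vec = feature_vector(name)       # S1103
        in_d.append((name, vec, label))  # S1104
        model[label].update(vec)         # S1105 (placeholder for supervised learning)
    return model, in_d


# Made-up sample records, for illustration only.
learning_data = [("P server", "hardware"), ("AP license", "software")]
model, in_d = learn(learning_data)
```

Each entry of `in_d` keeps the product name, its vector, and its label together, which is what the prediction procedures later rely on when they look up the label of a similar learned input.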

Next, the first prediction processing procedure of the information processing apparatus 101 is described with reference to FIGS. 12 and 13. The first prediction processing procedure outputs, in association with the input data to be determined, the category to which that input data is determined to belong, the learned input data most similar to that input data, a predetermined alert AL, and the like.

FIGS. 12 and 13 are flowcharts showing an example of the first prediction processing procedure of the information processing apparatus 101. In the flowchart of FIG. 12, the information processing apparatus 101 first determines whether input data to be determined has been received (step S1201). The information processing apparatus 101 waits until input data to be determined is received (step S1201: No).

When input data to be determined has been received (step S1201: Yes), the information processing apparatus 101 calculates the feature amount vector of the received input data (step S1202). Next, the information processing apparatus 101 uses the updated learning model MD to determine the label (category) for the calculated feature amount vector of the input data to be determined (step S1203).

Then, the information processing apparatus 101 calculates the similarity between the input data to be determined and each piece of learned input data (step S1204). Specifically, for example, the information processing apparatus 101 calculates the cosine similarity between the feature amount vector of the input data to be determined and each feature amount vector stored in the input information IN_D.
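The cosine similarity used in step S1204 is the dot product of two vectors divided by the product of their lengths. A minimal Python sketch follows; the function names, the list-of-numbers vector representation, and the (name, vector, label) layout assumed for IN_D are illustrative choices, not taken from the description.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length numeric vectors.

    Returns 0.0 when either vector is all zeros (a convention chosen here).
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarities(query_vec, in_d):
    """Step S1204: score the query against every (name, vector, label) entry."""
    return [(name, label, cosine_similarity(query_vec, vec))
            for name, vec, label in in_d]
```

Vectors pointing in the same direction score 1, and orthogonal vectors score 0, so a higher value means the learned input data is closer to the input data to be determined.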

Next, the information processing apparatus 101 sets "i" to "i=1" (step S1205) and extracts, from the learned input data, the learned input data with the i-th highest similarity (step S1206). Then, the information processing apparatus 101 determines whether the determined label matches the label of the extracted learned input data (step S1207).

If the labels match (step S1207: Yes), the information processing apparatus 101 proceeds to step S1301 shown in FIG. 13. If the labels do not match (step S1207: No), the information processing apparatus 101 determines whether any learned input data has not yet been extracted (step S1208).

If unextracted learned input data remains (step S1208: Yes), the information processing apparatus 101 increments "i" (step S1209) and returns to step S1206. If no unextracted learned input data remains (step S1208: No), the information processing apparatus 101 outputs the prediction result screen (step S1210) and ends the series of processes in this flowchart.

The prediction result screen output in step S1210 displays the label (category #) determined in step S1203 in association with the input data to be determined.

In the flowchart of FIG. 13, the information processing apparatus 101 first determines whether the similarity of the learned input data extracted in step S1206 is equal to or greater than the threshold α (step S1301). If the similarity is less than the threshold α (step S1301: No), the information processing apparatus 101 proceeds to step S1304.

If the similarity is equal to or greater than the threshold α (step S1301: Yes), the information processing apparatus 101 determines whether "i" is equal to or less than the threshold β (step S1302). If "i" is equal to or less than the threshold β (step S1302: Yes), the information processing apparatus 101 sets the extracted learned input data as a normal value (step S1303) and proceeds to step S1305.

If "i" is greater than the threshold β (step S1302: No), the information processing apparatus 101 sets the extracted learned input data as an abnormal value (step S1304). Then, the information processing apparatus 101 outputs the prediction result screen (step S1305) and ends the series of processes in this flowchart.

The prediction result screen output in step S1305 displays, in association with the input data to be determined, the label (category #) determined in step S1203 and the learned input data extracted in step S1206. If the learned input data has been set as an abnormal value, the alert AL is also displayed on the prediction result screen.

This makes it possible, when outputting the category obtained by machine learning for the input data to be determined, to show another case that is determined to belong to that category. In addition, when the similarity to the input data to be determined, or the similarity rank, does not meet a certain level, a warning (alert AL) can be issued to the user to suggest that the determination result obtained by machine learning may be wrong.
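The branch structure of FIGS. 12 and 13 can be sketched in Python as follows. This is only an illustration under assumptions: the function name `first_prediction`, the (name, label, similarity) tuple layout, and the return value are hypothetical, and the similarities are taken as already computed in step S1204.

```python
def first_prediction(predicted_label, scored, alpha, beta):
    """Pick the most similar learned input with the predicted label.

    scored: list of (name, label, similarity) for the learned input data.
    Returns (name, similarity, rank, alert), or None when no learned input
    data carries the predicted label (the step S1208: No branch).
    """
    ranked = sorted(scored, key=lambda t: t[2], reverse=True)
    for i, (name, label, sim) in enumerate(ranked, start=1):  # S1205-S1209
        if label != predicted_label:                          # S1207: No
            continue
        # S1301/S1302: normal only if similarity >= alpha and rank i <= beta;
        # otherwise the evidence is set as an abnormal value and the alert fires.
        alert = not (sim >= alpha and i <= beta)
        return name, sim, i, alert
    return None
```

With `scored = [("X", "software", 0.9), ("P server", "hardware", 0.8)]` and a predicted label of "hardware", the evidence is "P server" at rank 2, so the alert depends on whether 0.8 reaches `alpha` and whether rank 2 is within `beta`.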

Next, the second prediction processing procedure of the information processing apparatus 101 is described with reference to FIGS. 14 and 15. The second prediction processing procedure outputs, in association with the input data to be determined, the category to which that input data is determined to belong and the top N pieces of learned input data with the highest similarity to that input data.

FIGS. 14 and 15 are flowcharts showing an example of the second prediction processing procedure of the information processing apparatus 101. In the flowchart of FIG. 14, the information processing apparatus 101 first determines whether input data to be determined has been received (step S1401). The information processing apparatus 101 waits until input data to be determined is received (step S1401: No).

When input data to be determined has been received (step S1401: Yes), the information processing apparatus 101 calculates the feature amount vector of the received input data (step S1402). Next, the information processing apparatus 101 uses the updated learning model MD to determine the label (category) for the calculated feature amount vector of the input data to be determined (step S1403).

Then, the information processing apparatus 101 calculates the similarity between the input data to be determined and each piece of learned input data (step S1404), and proceeds to step S1501 shown in FIG. 15. Specifically, for example, the information processing apparatus 101 calculates the cosine similarity between the feature amount vector of the input data to be determined and each feature amount vector stored in the input information IN_D.

In the flowchart of FIG. 15, the information processing apparatus 101 first sets "i" to "i=1" (step S1501) and extracts, from the learned input data, the learned input data with the i-th highest similarity (step S1502). Then, the information processing apparatus 101 determines whether the label determined in step S1403 matches the label of the extracted learned input data (step S1503).

If the labels do not match (step S1503: No), the information processing apparatus 101 proceeds to step S1506. If the labels match (step S1503: Yes), the information processing apparatus 101 adds the extracted learned input data to a list (step S1504). Then, the information processing apparatus 101 determines whether the number of data items in the list has reached "N" (step S1505).

If the number of data items in the list has reached "N" (step S1505: Yes), the information processing apparatus 101 proceeds to step S1508. If the number of data items in the list has not reached "N" (step S1505: No), the information processing apparatus 101 determines whether any learned input data has not yet been extracted (step S1506).

If unextracted learned input data remains (step S1506: Yes), the information processing apparatus 101 increments "i" (step S1507) and returns to step S1502. If no unextracted learned input data remains (step S1506: No), the information processing apparatus 101 outputs the prediction result screen (step S1508) and ends the series of processes in this flowchart.

The prediction result screen output in step S1508 displays, in association with the input data to be determined, the label determined in step S1403 and the learned input data registered in the list (at most N pieces).

This makes it possible, when outputting the category (label) obtained by machine learning for the input data to be determined, to show other cases that are determined to belong to that category (at most N pieces of learned input data, in descending order of similarity).
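The list-building loop of FIG. 15 can be sketched in Python as follows. The function name `top_n_same_label` and the (name, label, similarity) tuple layout are hypothetical, and the similarities are assumed to have been computed already in step S1404.

```python
def top_n_same_label(predicted_label, scored, n):
    """Collect up to n learned inputs that share the predicted label.

    scored: list of (name, label, similarity) for the learned input data.
    Returns (name, similarity) pairs in descending order of similarity.
    """
    ranked = sorted(scored, key=lambda t: t[2], reverse=True)
    result = []
    for name, label, sim in ranked:       # S1501, S1502, S1507
        if label == predicted_label:      # S1503: Yes
            result.append((name, sim))    # S1504
            if len(result) == n:          # S1505: Yes
                break
    return result                         # shown on the screen in the final step
```

If fewer than `n` learned inputs carry the predicted label, the loop simply runs out of candidates and the shorter list is returned, which matches the "at most N" wording above.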

As described above, the information processing apparatus 101 according to the embodiment can determine the category to which the input data to be determined belongs by using the learning model MD. The learning model MD is a prediction model that determines, from the feature amount of input data, the category to which that input data belongs. The information processing apparatus 101 can also extract, from the learned input data used to generate the learning model MD, other input data that belongs to the determined category and differs from the input data to be determined. The information processing apparatus 101 can then output the determined category and the extracted other input data in association with the input data to be determined.

This makes it possible, when outputting the category (label) obtained by machine learning for the input data to be determined, to show another case that is determined to belong to that category. The user can therefore be presented with information that serves as the basis for the determined category, which supports evaluation of the validity of the determination result obtained by machine learning.

The information processing apparatus 101 can also calculate the similarity between the input data to be determined and each piece of learned input data and, based on the calculated similarities, extract from the learned input data other input data that belongs to the determined category.

This makes it possible to select the learned input data that serves as the basis for the determined category while taking into account its degree of similarity to the input data to be determined.

The information processing apparatus 101 can also extract, from the learned input data, the top N pieces of other input data with the highest similarity to the input data to be determined among the input data belonging to the determined category.

This makes it possible to select input data that is highly similar to the input data to be determined as the learned input data that serves as the basis for the determined category. A basis that is intuitively understandable to humans can therefore be presented, which makes it easier to evaluate the validity of the determination result obtained by machine learning.

The information processing apparatus 101 can also extract, from the learned input data, the other input data with the highest similarity to the input data to be determined among the input data belonging to the determined category. When the similarity between the extracted other input data and the input data to be determined is equal to or less than the threshold α, the information processing apparatus 101 can further output a predetermined alert AL in association with the input data to be determined.

This makes it possible, when outputting the category obtained by machine learning for the input data to be determined, to present the learned input data that belongs to the same category and is most similar to the input data to be determined. In addition, when the similarity to the input data to be determined does not meet a certain level, a warning can be issued to the user to suggest that the determination result obtained by machine learning may be wrong. The user can therefore check the determination result obtained by machine learning with care, which prevents label (category) errors from being overlooked.

The information processing apparatus 101 can also extract, from the learned input data, the other input data with the highest similarity to the input data to be determined among the input data belonging to the determined category. When the extracted other input data is not among the top M pieces of learned input data with the highest similarity to the input data to be determined, the information processing apparatus 101 can further output a predetermined alert AL in association with the input data to be determined.

This makes it possible, when outputting the category obtained by machine learning for the input data to be determined, to present the learned input data that belongs to the same category and is most similar to the input data to be determined. In addition, when the similarity rank of the presented learned input data does not meet a certain level, a warning can be issued to the user to suggest that the determination result obtained by machine learning may be wrong. The user can therefore check the determination result obtained by machine learning with care, which prevents label (category) errors from being overlooked.

The information processing apparatus 101 can also, after outputting the determined category in association with the input data to be determined, receive correctness information indicating whether the determined category is correct, and record the received correctness information in the storage unit 510 in association with the similarity between the extracted other input data and the input data to be determined. The information processing apparatus 101 can then update the threshold α based on the pairs of similarity and correctness information recorded in the storage unit 510.

This makes it possible to adjust the threshold α based on the user's evaluation of the validity of the determination result (category) for the input data to be determined.
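This passage does not state how the threshold α is recomputed from the recorded pairs. One possible rule, shown below purely as an assumption, is to pick the recorded similarity value that minimizes the number of misplaced alerts, treating an alert as firing when the similarity is below α as in step S1301. The function name `update_alpha`, the default value, and the tie-breaking toward the lowest candidate are all hypothetical.

```python
def update_alpha(records, default=0.5):
    """Choose a new alpha from (similarity, is_correct) pairs.

    Hypothetical rule, not taken from the description: minimize the count of
    correct determinations that would be alerted (similarity < alpha) plus
    incorrect determinations that would not be alerted (similarity >= alpha).
    """
    if not records:
        return default
    candidates = sorted({sim for sim, _ in records})

    def errors(alpha):
        # A record is an error when "alerted" and "correct" have the same value.
        return sum((sim < alpha) == ok for sim, ok in records)

    return min(candidates, key=errors)  # ties resolve to the lowest candidate
```

For example, with records `[(0.9, True), (0.8, True), (0.4, False), (0.3, False)]`, the candidate 0.8 separates the correct and incorrect determinations without error, so it is returned.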

The information processing apparatus 101 can also output, in association with the input data to be determined, the similarity between the extracted other input data and the input data to be determined.

This makes it possible to judge the reliability of the basis from the similarity between the learned input data presented as the basis and the input data to be determined. For example, the user can judge that the higher the similarity to the input data to be determined, the more reliable the basis is.

For these reasons, the information processing apparatus 101 according to the embodiment allows any machine learning algorithm to be applied, which preserves the accuracy of the determination results obtained by machine learning, while also enabling humans to evaluate the validity of those results. This reduces the user's workload for business decisions in companies and the like, while preventing labels from being corrected wrongly and label errors from being overlooked.

The evaluation support method described in this embodiment can be realized by executing a program prepared in advance on a computer such as a personal computer or a workstation. The evaluation support program is recorded on a computer-readable recording medium such as a hard disk, a flexible disk, a CD-ROM, a DVD, or a USB memory, and is executed by being read from the recording medium by the computer. The evaluation support program may also be distributed via a network such as the Internet.

The information processing apparatus 101 described in this embodiment can also be realized by an application-specific IC such as a standard cell or a structured ASIC (Application Specific Integrated Circuit), or by a PLD (Programmable Logic Device) such as an FPGA.

The following supplementary notes are further disclosed with respect to the embodiment described above.

(Supplementary Note 1) An evaluation support program that causes a computer to execute a process comprising:
determining the category to which input data to be determined belongs, using a learning model that determines, from the feature amount of input data, the category to which that input data belongs;
extracting, from the learned input data used to generate the learning model, other input data that belongs to the determined category and differs from the input data to be determined; and
outputting the determined category and the extracted other input data in association with the input data to be determined.

(Supplementary Note 2) The evaluation support program according to Supplementary Note 1, further causing the computer to execute a process of calculating the similarity between the input data to be determined and each piece of the learned input data,
wherein the extracting extracts, based on the calculated similarities, from the learned input data, other input data that belongs to the determined category and differs from the input data to be determined.

(Supplementary Note 3) The evaluation support program according to Supplementary Note 2,
wherein the extracting extracts, from the learned input data, the top N (N: a natural number) pieces of other input data with the highest similarity to the input data to be determined among the input data belonging to the determined category.

(Supplementary Note 4) The evaluation support program according to Supplementary Note 2 or 3,
wherein the extracting extracts, from the learned input data, the other input data with the highest similarity to the input data to be determined among the input data belonging to the determined category, and
the outputting further outputs a predetermined alert in association with the input data to be determined when the similarity between the extracted other input data and the input data to be determined is equal to or less than a threshold.

(Supplementary Note 5) The evaluation support program according to any one of Supplementary Notes 2 to 4,
wherein the extracting extracts, from the learned input data, the other input data with the highest similarity to the input data to be determined among the input data belonging to the determined category, and
the outputting further outputs a predetermined alert in association with the input data to be determined when the extracted other input data is not among the top M (M: a natural number) pieces of the learned input data with the highest similarity to the input data to be determined.

(Supplementary Note 6) The evaluation support program according to Supplementary Note 4, further causing the computer to execute a process comprising:
receiving, after the determined category has been output in association with the input data to be determined, correctness information indicating whether the determined category is correct;
recording the received correctness information in a storage unit in association with the similarity between the extracted other input data and the input data to be determined; and
updating the threshold based on the pairs of the similarity and the correctness information recorded in the storage unit.

(Supplementary Note 7) The evaluation support program according to any one of Supplementary Notes 2 to 6,
wherein the outputting further outputs, in association with the input data to be determined, the similarity between the extracted other input data and the input data to be determined.

(Supplementary Note 8) The evaluation support program according to any one of Supplementary Notes 2 to 7, wherein the similarity between the input data to be determined and each piece of the learned input data is expressed as a cosine similarity.

(Supplementary Note 9) An evaluation support method in which a computer executes a process comprising:
determining the category to which input data to be determined belongs, using a learning model that determines, from the feature amount of input data, the category to which that input data belongs;
extracting, from the learned input data used to generate the learning model, other input data that belongs to the determined category and differs from the input data to be determined; and
outputting the determined category and the extracted other input data in association with the input data to be determined.

(Supplementary Note 10) An information processing apparatus comprising:
a prediction processing unit that determines the category to which input data to be determined belongs, using a learning model that determines, from the feature amount of input data, the category to which that input data belongs;
an extraction unit that extracts, from the learned input data used to generate the learning model, other input data that belongs to the category determined by the prediction processing unit and differs from the input data to be determined; and
an output unit that outputs, in association with the input data to be determined, the category determined by the prediction processing unit and the other input data extracted by the extraction unit.

101 Information processing apparatus
110, MD Learning model
120 Learned data
200 Information processing system
201 Client device
210 Network
220 Learning data DB
300 Bus
301 Processor
302 Memory
303 Disk drive
304 Disk
305 Communication I/F
306 Portable recording medium I/F
307 Portable recording medium
501 Acquisition unit
502 Learning processing unit
503 Reception unit
504 Prediction processing unit
505 Extraction unit
506 Output unit
507 Update unit
510 Storage unit
700 Similarity table
800, 900, 1000 Prediction result screen

Claims

An evaluation support program that causes a computer to execute a process comprising:
determining the category to which input data to be determined belongs, using a learning model that determines, from the feature amount of input data, the category to which that input data belongs;
extracting, from the learned input data used to generate the learning model, other input data that belongs to the determined category and differs from the input data to be determined; and
outputting the determined category and the extracted other input data in association with the input data to be determined.

The evaluation support program according to claim 1, further causing the computer to execute a process of calculating a degree of similarity between the input data to be determined and each piece of the learned input data,
wherein the extracting extracts, from the learned input data and based on the calculated degrees of similarity, the other input data that belongs to the determined category and differs from the input data to be determined.

The evaluation support program according to claim 2, wherein the extracting extracts, from the learned input data, the top N (N: a natural number) pieces of other input data having the highest degrees of similarity to the input data to be determined among the input data belonging to the determined category.
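
The top-N extraction recited here can be illustrated with a minimal sketch. It assumes that each piece of input data is represented by a numeric feature vector and that similarity is cosine similarity; the claim fixes neither. The names `cosine_similarity` and `extract_top_n` and the sample item names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    # Assumption: similarity is the cosine of two feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def extract_top_n(target_vec, predicted_category, learned, n):
    """Return the n learned items of the predicted category that are most similar to the target.

    learned: list of (item_name, category, feature_vector) tuples (hypothetical layout).
    """
    scored = [
        (cosine_similarity(target_vec, vec), name)
        for name, category, vec in learned
        if category == predicted_category
    ]
    # Highest similarity first; keep only the first n entries.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(name, sim) for sim, name in scored[:n]]

# Illustrative data only.
learned = [
    ("P server", "hardware", [1.0, 0.0]),
    ("Q server", "hardware", [0.6, 0.8]),
    ("OS patch", "software", [0.0, 1.0]),
]
top = extract_top_n([1.0, 0.1], "hardware", learned, n=2)
```

Filtering by category before ranking keeps learned data from other categories out of the displayed basis, even when such data is more similar to the target.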

The evaluation support program according to claim 2 or 3, wherein
the extracting extracts, from the learned input data, the other input data having the highest degree of similarity to the input data to be determined among the input data belonging to the determined category, and
the outputting further outputs a predetermined alert in association with the input data to be determined when the degree of similarity between the extracted other input data and the input data to be determined is less than or equal to a threshold value.
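
The threshold-based alert can be sketched as follows. The sketch assumes that the similarities to the learned input data of the determined category have already been calculated and are held in a dictionary; the function name `check_basis` and the sample values are hypothetical.

```python
def check_basis(similarities_in_category, threshold):
    """Pick the most similar learned item of the determined category and decide on an alert.

    similarities_in_category: dict mapping item name -> similarity to the target
    (hypothetical layout; assumed to be non-empty).
    """
    best_name = max(similarities_in_category, key=similarities_in_category.get)
    best_sim = similarities_in_category[best_name]
    # The alert is raised when even the best match is at or below the threshold.
    alert = best_sim <= threshold
    return best_name, best_sim, alert

result = check_basis({"P server": 0.35, "Q server": 0.2}, threshold=0.5)
```

An alert in this situation signals that the learned data offers only a weak basis for the determined category.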

The evaluation support program according to any one of claims 2 to 4, wherein
the extracting extracts, from the learned input data, the other input data having the highest degree of similarity to the input data to be determined among the input data belonging to the determined category, and
the outputting further outputs a predetermined alert in association with the input data to be determined when the extracted other input data is not included in the top M (M: a natural number) pieces of input data having the highest degrees of similarity to the input data to be determined among the learned input data.
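
The rank-based alert can be sketched in the same way. The sketch assumes a list of precomputed similarities over all learned input data; the function name `needs_rank_alert`, the tuple layout, and the choice to alert when the determined category has no learned data at all are assumptions made for illustration.

```python
def needs_rank_alert(scored, predicted_category, m):
    """Return True when the best match of the determined category is outside the overall top m.

    scored: list of (item_name, category, similarity) tuples covering all learned
    input data (hypothetical layout).
    """
    ranked = sorted(scored, key=lambda row: row[2], reverse=True)
    for rank, (_name, category, _sim) in enumerate(ranked, start=1):
        if category == predicted_category:
            # The first hit in ranked order is the category's most similar item.
            return rank > m
    # Assumption: no learned data in the category is also treated as alert-worthy.
    return True

rows = [
    ("A service", "software", 0.9),
    ("B service", "software", 0.8),
    ("P server", "hardware", 0.4),
]
alert = needs_rank_alert(rows, "hardware", m=2)
```

Here the two most similar learned items belong to another category, so the determination of "hardware" is flagged for review.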

The evaluation support program according to claim 4, further causing the computer to execute a process comprising:
accepting, as a result of outputting the determined category in association with the input data to be determined, correctness information indicating whether the determined category is correct;
recording the accepted correctness information in a storage unit in association with the degree of similarity between the extracted other input data and the input data to be determined; and
updating the threshold value based on the pairs of degree of similarity and correctness information recorded in the storage unit.
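
The claim does not state how the threshold value is derived from the recorded pairs. One possible rule, shown purely as an assumption, is to pick the candidate threshold under which the alert condition (similarity at or below the threshold) agrees most often with the recorded incorrect determinations. The function name `update_threshold` and the sample records are hypothetical.

```python
def update_threshold(records, current_threshold):
    """Choose a threshold from recorded (similarity, is_correct) pairs.

    Assumed rule, not taken from the source: maximize the number of records
    for which "similarity <= threshold" coincides with "determination was incorrect".
    """
    if not records:
        return current_threshold

    def agreement(threshold):
        return sum((sim <= threshold) == (not ok) for sim, ok in records)

    # Candidates are the recorded similarities plus the current threshold;
    # on ties, max() keeps the lowest candidate because the list is sorted.
    candidates = sorted({sim for sim, _ in records} | {current_threshold})
    return max(candidates, key=agreement)

records = [(0.2, False), (0.3, False), (0.6, True), (0.8, True)]
new_threshold = update_threshold(records, current_threshold=0.5)
```

With these records, any threshold from 0.3 up to but not including 0.6 separates the incorrect determinations from the correct ones, and the tie-breaking rule selects 0.3.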

An evaluation support method in which a computer executes a process comprising:
determining, using a learning model that determines a category to which input data belongs from a feature amount of the input data, a category to which input data to be determined belongs;
extracting, from learned input data used when generating the learning model, other input data that belongs to the determined category and differs from the input data to be determined; and
outputting the determined category and the extracted other input data in association with the input data to be determined.

An information processing device comprising:
a prediction processing unit that determines, using a learning model that determines a category to which input data belongs from a feature amount of the input data, a category to which input data to be determined belongs;
an extraction unit that extracts, from learned input data used when generating the learning model, other input data that belongs to the category determined by the prediction processing unit and differs from the input data to be determined; and
an output unit that outputs the category determined by the prediction processing unit and the other input data extracted by the extraction unit in association with the input data to be determined.