JP2024043181A

JP2024043181A - Information processing device, information processing method and program

Info

Publication number: JP2024043181A
Application number: JP2022148221A
Authority: JP
Inventors: 英司光田; 健史大西; 浩司西口; あすか中川; 瑞樹酒井
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2022-09-16
Filing date: 2022-09-16
Publication date: 2024-03-29

Abstract

[Problem] To enable accurate evaluation of the predicted values of a prediction model, and to provide an evaluation method applicable to learning models built with various machine learning algorithms, not only neural networks. [Solution] An information processing device includes: an identification unit that identifies important explanatory variables from among a plurality of explanatory variables included in the learning data used to generate a prediction model; an extraction unit that extracts, from a plurality of pieces of learning data, similar learning data that resembles the prediction target data, based on the values of the important explanatory variables in the learning data and in the prediction target data; and an evaluation unit that evaluates, based on the similar learning data extracted by the extraction unit, the reliability of the predicted value that the prediction model outputs when given the prediction target data as input. [Selected Figure] Figure 3

Description

The present invention relates to an information processing device, an information processing method, and a program.

Conventionally, techniques are known that use a learning model, built by machine learning from learning data, to output a predicted value for new data. Patent Document 1 describes a technique for evaluating the prediction result that a trained neural network, built from learning cases, produces for a new input case. In the technique described in Patent Document 1, similar cases are extracted from the learning cases based on the degree of similarity between the learning cases and the input case, and the prediction result of the neural network is evaluated using these similar cases.

Patent Document 1: Japanese Patent Application Publication No. 2006-236367

However, in the technique described in Patent Document 1, all explanatory variables of the input case and the learning cases are used to evaluate the prediction result. Although the technique gives more important explanatory variables a larger contribution to the evaluation, a large number of unimportant explanatory variables can still influence the evaluation. For this reason, the technique described in Patent Document 1 could not always evaluate prediction results accurately.

The present invention has been made in view of these circumstances, and one of its exemplary objects is to provide an information processing device, an information processing method, and a program that make it possible to accurately evaluate the predicted values of a prediction model.

In order to solve the above problem, an information processing device according to one aspect of the present invention includes: an identification unit that identifies important explanatory variables from among a plurality of explanatory variables included in the learning data used to generate a prediction model; an extraction unit that extracts, from a plurality of pieces of learning data, similar learning data that resembles the prediction target data, based on the values of the important explanatory variables in the learning data and the values of the important explanatory variables in the prediction target data; and an evaluation unit that evaluates, based on the similar learning data extracted by the extraction unit, the reliability of the predicted value output from the prediction model with the prediction target data as input.

Yet another aspect of the present invention is an information processing method. This information processing method includes: identifying important explanatory variables from among a plurality of explanatory variables included in the learning data used to generate a prediction model; extracting, from a plurality of pieces of learning data, similar learning data that resembles the prediction target data, based on the values of the important explanatory variables in the learning data and the values of the important explanatory variables in the prediction target data; and evaluating, based on the similar learning data, the reliability of the predicted value output from the prediction model with the prediction target data as input.

Yet another aspect of the present invention is a program. This program causes a computer to execute: identifying important explanatory variables from among a plurality of explanatory variables included in the learning data used to generate a prediction model; extracting, from a plurality of pieces of learning data, similar learning data that resembles the prediction target data, based on the values of the important explanatory variables in the learning data and the values of the important explanatory variables in the prediction target data; and evaluating, based on the similar learning data, the reliability of the predicted value output from the prediction model with the prediction target data as input.

Note that any combination of the above components, and any conversion of the expression of the present invention between a method, a device, a system, a recording medium, a computer program, and the like, are also valid as aspects of the present invention.

According to the present invention, it is possible to provide an information processing device, an information processing method, and a program that enable accurate evaluation of the predicted values of a prediction model.

FIG. 1 is a diagram showing a hardware configuration of an information processing device according to an embodiment of the present invention.
FIG. 2 is a functional block diagram of the information processing device according to the embodiment.
FIG. 3 is a functional block diagram of a processing unit according to the embodiment.
FIG. 4 is a diagram for explaining an example of a method by which the extraction unit calculates the similarity of learning data to prediction target data.
FIG. 5 is a flowchart showing an example of an operation of the information processing device according to an embodiment of the present invention.
FIG. 6 is a flowchart showing an example of an operation of the information processing device according to an embodiment of the present invention.

[Embodiment]
Hereinafter, embodiments for carrying out the present invention will be described in detail with reference to the drawings. In the description of the drawings, the same elements are given the same reference numerals, and redundant description is omitted as appropriate. The configurations described below are examples and do not limit the scope of the present invention in any way.

FIG. 1 is a diagram showing a hardware configuration of an information processing device 1 according to an embodiment of the present invention. The information processing device 1 includes a processor 10, a storage device 12, an input device 14 that receives input operations, and an output device 16 that outputs information. The processor 10 includes a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), and the like. The storage device 12 includes a memory, an HDD (Hard Disk Drive), an SSD (Solid State Drive), and the like. The input device 14 includes, for example, a keyboard, a touch panel, a mouse, and a microphone. The output device 16 includes, for example, a display, a touch panel, and a speaker.

FIG. 2 is a functional block diagram of the information processing device 1 according to an embodiment of the present invention. The information processing device 1 according to this embodiment includes an input unit 20, a storage unit 22, a processing unit 24, and an output unit 26.

The input unit 20 receives various types of information and passes that information to the processing unit 24. The input unit 20 is realized by the input device 14 included in the information processing device 1.

The storage unit 22 stores various types of information. The storage unit 22 may store, for example, programs with which the processing unit 24 executes various kinds of information processing, information on the prediction model, the learning data used to generate the prediction model, and the prediction target data. The storage unit 22 is realized by the storage device 12 included in the information processing device 1.

The prediction model according to this embodiment is generated from learning data using any of various known machine learning algorithms. The prediction model may be composed of, for example, a neural network or a regression tree.

The learning data according to this embodiment includes a plurality of explanatory variables and an objective variable. The explanatory variables according to this embodiment include variables related to a vehicle and may indicate, for example, the vehicle type, model, body color, number of driven wheels (two-wheel drive or four-wheel drive), mileage, and years of use. The objective variable according to this embodiment indicates the price of the vehicle. Note that the explanatory variables and the objective variable of the learning data are not limited to variables related to vehicles.

A prediction model generated from such learning data takes the prediction target data as input and outputs the price of the vehicle as a predicted value. The prediction target data according to this embodiment includes the same set of explanatory variables as the learning data. Accordingly, the prediction model according to this embodiment takes information about a vehicle as input and outputs the price of that vehicle as a predicted value.

The processing unit 24 executes various kinds of information processing and passes the results to the storage unit 22 and the output unit 26. The functions of the processing unit 24 are realized by the processor 10 of the information processing device 1 executing a program stored in the storage device 12.

The output unit 26 outputs various types of information. The output unit 26 may display, for example, the results of information processing by the processing unit 24, and specifically may display the result of evaluating the reliability of the predicted value output from the prediction model. The output unit 26 is realized by the output device 16 included in the information processing device 1.

FIG. 3 is a functional block diagram of the processing unit 24 according to this embodiment. The processing unit 24 according to this embodiment includes a generation processing unit 240, an evaluation processing unit 260, and a prediction processing unit 280.

The generation processing unit 240 generates a prediction model from the learning data and, based on the result, determines the importance of the explanatory variables that make up the learning data. The functions of the generation processing unit 240 are realized by a model generation unit 242 and a determination unit 244 working together.

The model generation unit 242 generates a prediction model from the learning data using any of various known machine learning algorithms. Information on the generated prediction model is passed to the determination unit 244 and the prediction processing unit 280. The configuration of the prediction model generated by the model generation unit 242 is not particularly limited; this embodiment describes an example in which the prediction model is a regression tree.

The determination unit 244 determines the importance of each explanatory variable of the learning data. The determination unit 244 may determine the importance of each explanatory variable using the feature importance calculated by any of various known machine learning algorithms. More specifically, the determination unit 244 may determine the importance of each explanatory variable based on the feature importance in the regression tree.

The determination unit 244 may determine the importance of an explanatory variable based on, for example, the feature importance defined by CatBoost (registered trademark) or the like. For how the feature importance is obtained, see the following CatBoost page (Feature importance - Model analysis | CatBoost). Note that the feature importance may be calculated by the model generation unit 242.
https://catboost.ai/en/docs/concepts/fstr#fstr__regular-feature-importance

When the learning data is split by an explanatory variable in the regression tree of the prediction model, the determination unit 244 may determine the importance of that explanatory variable based on how close the objective variable values are within each resulting subset of the learning data (more precisely, on the Gini impurity based on the distribution of the objective variable). Specifically, the determination unit 244 may determine the importance of the explanatory variable based on the difference in Gini impurity before and after the split.

For example, suppose that the difference in Gini impurity before and after splitting the learning data by an explanatory variable A is difference A, and that the difference in Gini impurity before and after splitting the learning data by an explanatory variable B is difference B. The determination unit 244 can determine the importance of explanatory variables A and B by comparing difference A with difference B. Specifically, the determination unit 244 may determine that the larger the decrease in Gini impurity across the split, the higher the importance of the corresponding explanatory variable.

Note that the learning data may be split more than once by the same explanatory variable. For example, the learning data may be split under a first condition on explanatory variable A, then split by explanatory variable B, and then split again under a second condition on explanatory variable A. In this case, the determination unit 244 takes into account, at least for explanatory variable A, the differences in Gini impurity for the splits under both the first and the second conditions.
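The impurity-based importance described above can be sketched as follows. This is a minimal illustration rather than the method of the embodiment or of CatBoost: the function names, the use of discrete price bands as a stand-in for the objective variable, and the choice to simply sum the decrease over every split made on the same variable are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of discrete labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_decrease(parent, left, right):
    """Impurity before the split minus the size-weighted impurity after it."""
    n = len(parent)
    after = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - after

def variable_importance(splits):
    """Sum the impurity decrease of every split made on each variable.

    `splits` is a list of (variable_name, parent, left, right) tuples, so a
    variable used in several splits accumulates the decrease of each one.
    """
    totals = defaultdict(float)
    for name, parent, left, right in splits:
        totals[name] += impurity_decrease(parent, left, right)
    return dict(totals)

# Hypothetical price bands standing in for a binned objective variable.
root = ["low", "low", "high", "high", "mid", "mid"]
rest = ["high", "high", "mid", "mid"]
splits = [
    ("A", root, ["low", "low"], rest),           # split on variable A
    ("B", rest, ["high", "high"], ["mid", "mid"]),  # later split on variable B
]
importance = variable_importance(splits)
```

A variable with a larger accumulated decrease would be treated as more important. Practical libraries typically also weight each split by the share of samples reaching the node, which this sketch omits.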

Note that when the prediction model is a neural network, the determination unit 244 may determine the importance of each explanatory variable based on the connection state of the neural network.

The evaluation processing unit 260 evaluates the reliability of the predicted value output from the prediction model, based on the prediction model generated by the generation processing unit 240 and on the importance determined for the explanatory variables. The functions of the evaluation processing unit 260 are realized by an identification unit 262, an extraction unit 264, a counting unit 266, a calculation unit 268, and an evaluation unit 270 working together.

The identification unit 262 identifies important explanatory variables from among the plurality of explanatory variables included in the learning data used to generate the prediction model. Specifically, the identification unit 262 may identify the important explanatory variables based on the importance of the explanatory variables determined by the determination unit 244. For example, the identification unit 262 may identify an explanatory variable whose importance is equal to or greater than a predetermined threshold as an important explanatory variable. The identification unit 262 passes information on the important explanatory variables to the extraction unit 264.
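The threshold-based identification can be illustrated with a short sketch. The variable names, importance values, and threshold below are hypothetical and chosen only for the example.

```python
def select_important_variables(importances, threshold):
    """Keep the explanatory variables whose importance is >= threshold."""
    return {name: value for name, value in importances.items()
            if value >= threshold}

# Hypothetical importance values for vehicle-related explanatory variables.
importances = {
    "mileage": 0.45,
    "years_of_use": 0.30,
    "model": 0.20,
    "body_color": 0.05,
}
important = select_important_variables(importances, threshold=0.1)
```

With these assumed values, `body_color` falls below the threshold and is excluded from the later similarity calculation.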

The extraction unit 264 extracts, from the plurality of pieces of learning data, similar learning data that resembles the prediction target data, based on the values of the important explanatory variables in the learning data and the values of the important explanatory variables in the prediction target data. Specifically, the extraction unit 264 may calculate the similarity of each piece of learning data to the prediction target data from these values and use that similarity to extract the similar learning data. For example, when the similarity of a piece of learning data exceeds a predetermined threshold, the extraction unit 264 may extract that piece of learning data as similar learning data. The extraction unit 264 passes information on the extracted similar learning data to the counting unit 266, the calculation unit 268, and the prediction processing unit 280.

An example of a method by which the extraction unit 264 calculates the similarity of learning data to the prediction target data will be described with reference to FIG. 4. FIG. 4 shows the values of three important explanatory variables (X1, X2, and X3) for the prediction target data and for one piece of learning data. Here, the importance values of X1, X2, and X3 are 0.7, 0.2, and 0.1, respectively.

As shown in FIG. 4, the values of the important explanatory variables X1, X2, and X3 of the prediction target data are 0.7, 0.5, and 1, respectively. The values of the important explanatory variables X1, X2, and X3 of the learning data are 0.2, 0.1, and 0, respectively. In this example, the similarity is calculated from these values by performing the following processes (1) to (3).

(1) Calculating the absolute value of the difference between the values of the important explanatory variables
First, the absolute value of the difference between the value of each important explanatory variable in the learning data and its value in the prediction target data is calculated. As shown in FIG. 4, the results for the important explanatory variables X1, X2, and X3 are 0.5, 0.4, and 1, respectively. This example extracts similar learning data using the absolute value of the difference between the values of the important explanatory variables in the learning data and in the prediction target data. Alternatively, the product of the value of the important explanatory variable in the learning data and its value in the prediction target data (also referred to as "cosine similarity") may be calculated, and similar learning data may be extracted based on this product.

When an important explanatory variable is a quantitative variable, min-max normalization may be applied before the absolute value of the difference is calculated, so that the minimum value of that variable over all the learning data becomes 0 and its maximum value over all the learning data becomes 1.

When an important explanatory variable is a qualitative variable, its value may be set to 1 when a predetermined condition is satisfied and to 0 when the condition is not satisfied. For example, if the predetermined condition is that the vehicle color is black, the value of the important explanatory variable may be set to 1 when the vehicle color it indicates is black, and to 0 when the vehicle color it indicates is any color other than black.
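The two preprocessing steps just described, min-max normalization of quantitative variables and 0/1 encoding of qualitative variables, might look like the following sketch. The function names and sample values are hypothetical, and returning 0 for a constant column is an assumption made to avoid division by zero.

```python
def min_max_normalize(values):
    """Scale values so the minimum maps to 0 and the maximum maps to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def encode_condition(values, condition):
    """Encode a qualitative variable as 1 where the condition holds, else 0."""
    return [1 if condition(v) else 0 for v in values]

# Hypothetical learning data columns.
mileage_km = [20000, 50000, 80000]
body_colors = ["black", "white", "black"]

normalized_mileage = min_max_normalize(mileage_km)
is_black = encode_condition(body_colors, lambda color: color == "black")
```

In practice the prediction target data would be scaled with the same minimum and maximum taken from the learning data, so that both are on a common scale before their difference is computed.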

(2) Multiplying the absolute values by the importance
The absolute value of the difference calculated in (1) above for each important explanatory variable is multiplied by the importance of that variable. As shown in FIG. 4, the products for the important explanatory variables X1, X2, and X3 are 0.35, 0.08, and 0.1, respectively.

(3) Calculating the similarity
The similarity of the learning data is calculated using the sum of the products obtained in (2) above. Specifically, 0.47, the value obtained by subtracting the sum of the products (0.35 + 0.08 + 0.1 = 0.53) from 1, is calculated as the similarity of the learning data. The extraction unit 264 can calculate the similarity of every piece of learning data in the same way, and similar learning data is extracted from all the learning data based on the similarities thus calculated.
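Processes (1) to (3) can be reproduced with the values of FIG. 4 in a short sketch. The function names and the extraction threshold are hypothetical; the importance values and variable values are the ones given in the example above.

```python
def similarity(target, sample, importance):
    """1 minus the importance-weighted sum of absolute differences."""
    weighted_sum = sum(
        importance[name] * abs(target[name] - sample[name])
        for name in importance
    )
    return 1.0 - weighted_sum

def extract_similar(target, learning_data, importance, threshold):
    """Return the learning data whose similarity exceeds the threshold."""
    return [row for row in learning_data
            if similarity(target, row, importance) > threshold]

# Values taken from the FIG. 4 example.
importance = {"X1": 0.7, "X2": 0.2, "X3": 0.1}
target = {"X1": 0.7, "X2": 0.5, "X3": 1.0}   # prediction target data
sample = {"X1": 0.2, "X2": 0.1, "X3": 0.0}   # one piece of learning data

score = similarity(target, sample, importance)  # approximately 0.47

# The threshold 0.4 is an assumed value for illustration only.
similar = extract_similar(target, [sample], importance, threshold=0.4)
```

Because the importance values here sum to 1 and each variable lies between 0 and 1, the similarity also stays between 0 and 1, which makes a fixed threshold straightforward to interpret.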

The method for calculating the similarity of learning data is not limited to the example described here. When the important explanatory variables are quantitative variables, the extraction unit 264 may calculate the similarity based on the distance between the learning data and the prediction target data. Specifically, the extraction unit 264 may calculate the similarity based on the square root of the sum of the squared differences between the values of the important explanatory variables. When the important explanatory variables are qualitative variables, the extraction unit 264 may calculate the similarity based on the number of important explanatory variables whose values match between the learning data and the prediction target data. For example, the extraction unit 264 may calculate the similarity so that it increases as the number of matching important explanatory variables increases, or as the ratio of the number of matching important explanatory variables to the total number of important explanatory variables increases.
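The two alternatives can be sketched as follows. The text states only that the similarity is based on the distance or on the number of matches, so the mapping 1 / (1 + distance) and the function names below are assumptions made for illustration.

```python
import math

def distance_similarity(target, sample, names):
    """Similarity from the Euclidean distance over quantitative variables.

    Assumption: the distance d is mapped to 1 / (1 + d), so identical data
    has similarity 1 and the similarity falls as the distance grows.
    """
    distance = math.sqrt(sum((target[n] - sample[n]) ** 2 for n in names))
    return 1.0 / (1.0 + distance)

def match_ratio(target, sample, names):
    """Share of qualitative variables whose values match exactly."""
    matches = sum(1 for n in names if target[n] == sample[n])
    return matches / len(names)

# Hypothetical data for illustration.
quantitative = distance_similarity(
    {"mileage": 0.0, "years_of_use": 0.0},
    {"mileage": 3.0, "years_of_use": 4.0},
    ["mileage", "years_of_use"],
)
qualitative = match_ratio(
    {"body_color": "black", "drive": "4WD"},
    {"body_color": "black", "drive": "2WD"},
    ["body_color", "drive"],
)
```

Any monotonically decreasing function of the distance would serve the same purpose; the reciprocal form is used here only because it keeps the result between 0 and 1.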

Returning to FIG. 3, the functions of the evaluation processing unit 260 will be described further. The counting unit 266 of the evaluation processing unit 260 counts the number of pieces of similar learning data extracted by the extraction unit 264 and passes the count to the evaluation unit 270. The calculation unit 268 calculates the standard deviation of the objective variable over the plurality of pieces of similar learning data extracted by the extraction unit 264 and passes the calculated standard deviation to the evaluation unit 270.

The evaluation unit 270 evaluates, based on the similar learning data extracted by the extraction unit 264, the reliability of the predicted value that the prediction model outputs when given the prediction target data as input. The evaluation unit 270 may pass the evaluation result to the output unit 26, which then displays the evaluation result.

In this embodiment, the evaluation unit 270 evaluates the reliability of the predicted value based on the number of pieces of similar learning data counted by the counting unit 266. Specifically, the evaluation unit 270 may rate the reliability of the predicted value higher as the number of pieces of similar learning data increases. Alternatively, the evaluation unit 270 may rate the reliability of the predicted value higher as the ratio of the number of pieces of similar learning data to the total number of pieces of learning data increases. The more similar learning data was used to generate the prediction model, the more reliable the predicted value output from the prediction model is considered to be. Using the number of pieces of similar learning data in the evaluation therefore allows the reliability of the predicted value to be evaluated more accurately.

The evaluation unit 270 also evaluates the reliability of the predicted value based on the standard deviation of the objective variable of the similar learning data calculated by the calculation unit 268. Specifically, the evaluation unit 270 may rate the reliability of the predicted value higher as the standard deviation of the objective variable of the similar learning data becomes smaller, and lower as that standard deviation becomes larger. The evaluation unit 270 may calculate the standard deviation, or a value based on the standard deviation, as the degree of reliability.

The smaller the standard deviation of the objective variable, the smaller the variation in its values. A prediction model generated from similar learning data whose objective variable values vary little can be expected to output highly accurate predicted values. Conversely, a prediction model generated from similar learning data whose objective variable values vary widely is considered to output less accurate predicted values. Using the standard deviation of the objective variable of the similar learning data in the evaluation therefore allows the reliability of the predicted value to be evaluated more accurately.
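The description allows the reliability to be the standard deviation itself or "a value based on the standard deviation". The sketch below shows one possible such value. The mapping 1 / (1 + standard deviation / |mean|), the use of the population standard deviation, and the name `std_based_reliability` are all assumptions chosen for illustration, not details taken from the specification.

```python
from statistics import mean, pstdev


def std_based_reliability(prices):
    """Map the spread of similar learning data prices to a score in (0, 1]."""
    if len(prices) < 2:
        raise ValueError("at least two similar learning data are needed")
    avg = mean(prices)
    if avg == 0:
        raise ValueError("mean price must be non-zero for this normalisation")
    spread = pstdev(prices)               # population standard deviation (assumption)
    relative_spread = spread / abs(avg)   # makes the score independent of price scale
    return 1.0 / (1.0 + relative_spread)  # smaller spread -> score closer to 1


# Hypothetical prices of similar vehicles: a tight cluster and a wide one.
tight = std_based_reliability([100.0, 101.0, 99.0])
wide = std_based_reliability([60.0, 100.0, 140.0])
print(tight > wide)
```

Dividing by the mean is a design choice that keeps scores comparable between cheap and expensive vehicles; using the raw standard deviation would equally fit the description.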

The prediction processing unit 280 generates a predicted value using the prediction model generated by the model generation unit 242, and corrects that predicted value. The functions of the prediction processing unit 280 are realized by the prediction generation unit 282 and the correction unit 284 working together.

The prediction generation unit 282 generates a predicted value for the prediction target data using the prediction model, and passes the predicted value to the correction unit 284. Specifically, the prediction generation unit 282 inputs the explanatory variables of the prediction target data into the prediction model and obtains the predicted value.

The correction unit 284 corrects the predicted value generated by the prediction generation unit 282 and passes the corrected result to the output unit 26. The predicted value is thereby displayed on the output unit 26. Specifically, the correction unit 284 may correct the predicted value based on the plurality of similar learning data extracted by the extraction unit 264. Using the similar learning data makes it possible to correct the predicted value more accurately.

Even for vehicles that match in vehicle type, model, mileage, and other conditions, the value of a vehicle often changes from year to year. Consequently, when there are a plurality of similar learning data, the values of their objective variable (the vehicle price) may differ greatly depending on the year to which each similar learning data corresponds. In this embodiment, the correction unit 284 corrects the predicted value in a way that takes this change of the objective variable over time into account.

In this embodiment, each of the plurality of learning data is associated with time information. The time information may be, for example, information indicating the year and date when the vehicle price corresponding to the value of the objective variable was assigned by appraisal, and when the vehicle was sold at the price corresponding to the value of the objective variable. The correction unit 284 can correct the predicted value based on the relationship between the value of the objective variable and the time information.

For example, suppose that the value of the objective variable in the learning data tends to rise over time. In this case, the price of a vehicle whose conditions are similar to the explanatory variable values of learning data based on past information is presumed to have become higher than the vehicle price predicted from the learning data. However, because the prediction model is generated from the learning data, this time dependence of the objective variable is not reflected in the prediction model. Consequently, when the prediction target data is input to the prediction model, the model may output a predicted value that is lower than the accurate vehicle price.

The correction unit 284 may therefore correct the predicted value output from the prediction model based on the time dependence of the value of the objective variable. Specifically, if the value of the objective variable is rising over time, an amount corresponding to its rate of increase may be added to the predicted value. If the value of the objective variable is falling over time, an amount corresponding to its rate of decrease may be subtracted from the predicted value. Correcting the predicted value with the time dependence of the objective variable in this way makes the correction more accurate when the objective variable is something whose price fluctuates from year to year, such as the price of a vehicle.
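The time-dependent correction can be sketched as follows. This example assumes that the trend is estimated by an ordinary least-squares line through the (time, price) pairs of the similar learning data, and that the correction is the slope multiplied by the time elapsed since the mean time of those data. Both choices, and the names `trend_slope` and `correct_for_trend`, are illustrative assumptions; the description only requires an amount that depends on the rate of increase or decrease.

```python
def trend_slope(times, values):
    """Least-squares slope of values against times (price change per unit time)."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (time, value) pairs of equal length")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all times are identical; no trend can be estimated")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def correct_for_trend(predicted, times, values, target_time):
    """Shift the predicted value by slope * (elapsed time since the data's mean time)."""
    slope = trend_slope(times, values)
    reference_time = sum(times) / len(times)  # assumption: model reflects the mean time
    return predicted + slope * (target_time - reference_time)


# Hypothetical similar learning data: prices rising by 10 per year.
years = [2019, 2020, 2021]
prices = [100.0, 110.0, 120.0]
print(correct_for_trend(110.0, years, prices, target_time=2022))
```

A positive slope adds to the predicted value and a negative slope subtracts from it, which matches the two cases in the preceding paragraph with a single formula.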

FIG. 5 is a flowchart illustrating an example of the operation of the information processing device 1 according to an embodiment of the present invention. The flow of operations according to this embodiment is described below with reference to the flowchart shown in FIG. 5.

First, the model generation unit 242 generates a prediction model using a plurality of learning data (S101). At this time, feature importance may be calculated along with the generation of the prediction model. Next, the determination unit 244 determines the importance of the explanatory variables of the learning data (S103). Here, the determination unit 244 determines the importance of each of the plurality of explanatory variables included in the learning data used to generate the prediction model in S101. Next, the identification unit 262 identifies the important explanatory variables based on the importance of the explanatory variables determined in S103 (S105).
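Step S105 can be illustrated with a small sketch that picks the important explanatory variables from importance scores such as those determined in S103. The selection rule used here (keep the `top_k` highest-scoring variables), the function name `select_important_variables`, and the example scores are assumptions for illustration; the description does not fix a particular rule at this point.

```python
from typing import List, Mapping


def select_important_variables(importances: Mapping[str, float], top_k: int) -> List[str]:
    """Return the names of the top_k explanatory variables, highest importance first."""
    if top_k <= 0:
        raise ValueError("top_k must be positive")
    ranked = sorted(importances, key=lambda name: importances[name], reverse=True)
    return ranked[:top_k]


# Hypothetical importance scores for vehicle-related explanatory variables.
scores = {"model": 0.50, "mileage": 0.30, "year": 0.15, "color": 0.05}
print(select_important_variables(scores, top_k=2))
```

A threshold on the importance score would be an equally plausible rule; the top-k form is used only because it keeps the example short.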

Next, the extraction unit 264 extracts similar learning data from the plurality of learning data used to generate the prediction model in S101, based on the values of the important explanatory variables of the learning data and the values of the important explanatory variables of the prediction target data (S107). The important explanatory variables used here are those identified in S105. Next, the counting unit 266 counts the number of similar learning data extracted in S107 (S109). Next, the calculation unit 268 calculates the standard deviation of the objective variable of the similar learning data extracted in S107 (S111).
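Steps S107 to S111 can be sketched end to end as follows. The similarity rule is an assumption made for this example: a numeric important explanatory variable matches when it lies within a per-variable tolerance, and any other variable must match exactly. The names `is_similar`, `extract_similar`, and the record layout are likewise hypothetical.

```python
from statistics import pstdev


def is_similar(record, target, important, tolerances):
    """Compare only the important explanatory variables of a record and the target."""
    for name in important:
        if name in tolerances:
            if abs(record[name] - target[name]) > tolerances[name]:
                return False      # numeric variable outside the allowed tolerance
        elif record[name] != target[name]:
            return False          # categorical variable must match exactly
    return True


def extract_similar(learning_data, target, important, tolerances):
    """S107: keep the learning data that resemble the prediction target data."""
    return [r for r in learning_data if is_similar(r, target, important, tolerances)]


# Hypothetical learning data; "color" is treated as unimportant and is ignored.
learning_data = [
    {"model": "A", "mileage": 45000, "color": "blue", "price": 100.0},
    {"model": "A", "mileage": 58000, "color": "red", "price": 120.0},
    {"model": "A", "mileage": 90000, "color": "red", "price": 60.0},
    {"model": "B", "mileage": 50000, "color": "red", "price": 80.0},
]
target = {"model": "A", "mileage": 50000, "color": "red"}

similar = extract_similar(learning_data, target, ["model", "mileage"], {"mileage": 10000})
count = len(similar)                             # S109: number of similar learning data
spread = pstdev(r["price"] for r in similar)     # S111: population std (assumption)
print(count, spread)
```

Because only the important explanatory variables are compared, the first record is extracted even though its color differs from the target.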

Next, the evaluation unit 270 evaluates, based on the similar learning data, the predicted value that the prediction model outputs when the prediction target data is input (S113). Here, the evaluation unit 270 evaluates the reliability of the predicted value based on the number of similar learning data counted in S109 and the standard deviation of the objective variable calculated in S111. When the prediction unit 252 has evaluated the reliability of the predicted value, the evaluation process shown in FIG. 5 ends.

FIG. 6 is a flowchart illustrating an example of the operation of the information processing device 1 according to an embodiment of the present invention. The flow of the correction process according to this embodiment is described below with reference to the flowchart shown in FIG. 6.

First, the processes of S201 to S207 are performed. Because these are substantially the same as the processes of S101 to S107 described with reference to FIG. 5, their description is omitted here. Once the similar learning data have been extracted in S207, the prediction generation unit 282 uses the prediction model to generate a predicted value with the prediction target data as input (S209). Next, the correction unit 284 corrects the predicted value generated in S209 based on the relationship between the value of the objective variable of the similar learning data extracted in S207 and the time information (S211). When the correction unit 284 has corrected the predicted value, the correction process shown in FIG. 6 ends.

The configuration and operation of the information processing device 1 according to an embodiment of the present invention have been described above. The information processing device 1 according to this embodiment extracts similar learning data that resemble the prediction target data based on the important explanatory variables identified from the plurality of explanatory variables, and uses those similar learning data to evaluate the reliability of the predicted value output by the prediction model. Therefore, even if the learning data include unimportant explanatory variables, those variables are not used in evaluating the reliability of the predicted value, so the predicted value of the prediction model can be evaluated accurately.

In addition, the information processing device 1 according to this embodiment can evaluate the reliability of predicted values not only for neural networks but also for prediction models built with various other kinds of models, such as regression trees. The information processing device 1 according to this embodiment can therefore provide a highly versatile method for evaluating the reliability of predicted values.

In addition, in the information processing device 1 according to this embodiment, the evaluation unit 270 evaluates the reliability of the predicted value based on the number of similar learning data. The evaluation unit 270 can therefore rate the reliability higher as the number of similar learning data increases, which allows the reliability to be evaluated more accurately.

In addition, in the information processing device 1 according to this embodiment, the evaluation unit 270 evaluates the reliability of the predicted value based on the standard deviation of the objective variable of the similar learning data. The evaluation unit 270 can therefore rate the reliability of the predicted value higher as the variation in the objective variable of the similar learning data becomes smaller, which allows the reliability to be evaluated more accurately.

The information processing device 1 according to this embodiment can also be used to evaluate the reliability of the predicted value of a prediction model that predicts the winning bid price of a used vehicle put up for auction. The sales price of a used vehicle is generally assessed by an appraiser based on his or her own intuition, experience, and familiarity with prices. According to this embodiment, using the prediction model generated by the information processing device 1 makes it possible to predict the price of a vehicle without relying on an appraiser's intuition, and also to evaluate the reliability of that predicted value.

[Supplement]
The present invention has been described above based on an embodiment. Those skilled in the art will understand that this embodiment is an example, that various modifications are possible in the combinations of its constituent elements and processing steps, and that such modifications are also within the scope of the present invention.

The steps performed by the information processing device 1 and described with reference to the above flowcharts do not necessarily have to be executed in the illustrated order. As long as no logical contradiction arises, the order of the steps may be changed as appropriate, and a plurality of steps may be executed in parallel. For example, S109 (counting the number of similar learning data) and S111 (calculating the standard deviation of the objective variable of the similar learning data) may be executed in parallel.
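Because S109 and S111 both only read the extracted similar learning data and do not depend on each other's result, they can be submitted as independent tasks. The sketch below uses Python's standard `concurrent.futures.ThreadPoolExecutor` to show this independence; the function names and sample values are hypothetical, and for data this small the threads illustrate the structure rather than give any speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import pstdev


def count_similar(prices):
    """S109: count the similar learning data."""
    return len(prices)


def std_of_objective(prices):
    """S111: standard deviation of the objective variable (population form assumed)."""
    return pstdev(prices)


similar_prices = [100.0, 110.0, 120.0]  # hypothetical objective-variable values

# Both tasks read the same list and write nothing shared, so order does not matter.
with ThreadPoolExecutor(max_workers=2) as pool:
    count_future = pool.submit(count_similar, similar_prices)
    std_future = pool.submit(std_of_objective, similar_prices)
    count, std = count_future.result(), std_future.result()

print(count, std)
```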

1 information processing device, 240 generation processing unit, 242 model generation unit, 244 determination unit, 252 prediction unit, 260 evaluation processing unit, 262 identification unit, 264 extraction unit, 266 counting unit, 268 calculation unit, 270 evaluation unit, 282 prediction generation unit, 284 correction unit, 30 regression tree.

Claims

An information processing device comprising:
an identification unit that identifies important explanatory variables from a plurality of explanatory variables included in learning data used to generate a prediction model;
an extraction unit that extracts, from a plurality of the learning data, similar learning data similar to prediction target data, based on values of the important explanatory variables of the learning data and values of the important explanatory variables of the prediction target data; and
an evaluation unit that evaluates, based on the similar learning data extracted by the extraction unit, the reliability of a predicted value output from the prediction model with the prediction target data as input.

The information processing device according to claim 1, wherein
the evaluation unit evaluates the reliability of the predicted value based on the number of similar learning data extracted by the extraction unit.

The information processing device according to claim 1, wherein
the extraction unit extracts, from the plurality of learning data, a plurality of similar learning data each similar to the prediction target data, and
the evaluation unit evaluates the reliability of the predicted value based on the standard deviation of the objective variable of the plurality of similar learning data.

The information processing device according to claim 1, wherein
the explanatory variables include variables related to a vehicle, and
the objective variable of the learning data indicates the price of the vehicle.

The information processing device according to claim 4, wherein
the prediction model includes a regression tree, and
the identification unit identifies the important explanatory variables based on feature importance in the regression tree.

The information processing device according to claim 4 or 5, further comprising:
a prediction generation unit that uses the prediction model to generate the predicted value with the prediction target data as input; and
a correction unit that corrects, based on the similar learning data, the predicted value generated by the prediction generation unit, wherein
the extraction unit extracts, from the plurality of learning data, a plurality of similar learning data each similar to the prediction target data,
each of the plurality of similar learning data is associated with time information, and
the correction unit corrects the predicted value based on the relationship between the value of the objective variable and the time information.

An information processing method comprising:
identifying important explanatory variables from a plurality of explanatory variables included in learning data used to generate a prediction model;
extracting, from a plurality of the learning data, similar learning data similar to prediction target data, based on values of the important explanatory variables of the learning data and values of the important explanatory variables of the prediction target data; and
evaluating, based on the similar learning data, the reliability of a predicted value output from the prediction model with the prediction target data as input.

A program that causes a computer to execute:
identifying important explanatory variables from a plurality of explanatory variables included in learning data used to generate a prediction model;
extracting, from a plurality of the learning data, similar learning data similar to prediction target data, based on values of the important explanatory variables of the learning data and values of the important explanatory variables of the prediction target data; and
evaluating, based on the similar learning data, the reliability of a predicted value output from the prediction model with the prediction target data as input.