WO2023223533A1

WO2023223533A1 - Estimation device, estimation method, and program

Info

Publication number: WO2023223533A1
Application number: PCT/JP2022/020927
Authority: WO
Inventors: 竜太松野; 智哉坂井; 啓太佐久間; 義男亀田
Original assignee: 日本電気株式会社
Priority date: 2022-05-20
Filing date: 2022-05-20
Publication date: 2023-11-23
Also published as: JPWO2023223533A1

Abstract

An estimation device 100 according to the present invention comprises: a data selection means 121 for selecting, from first explanatory variable data prepared according to a preset prediction model and first objective variable data corresponding to the first explanatory variable data, first explanatory variable data and first objective variable data corresponding to the first explanatory variable data, on the basis of second explanatory variable data that is not associated with an objective variable; and a performance estimation means 122 for estimating the performance of a prediction model's prediction about the second explanatory variable data, on the basis of the comparison between prediction data obtained by inputting the first explanatory variable data selected according to the prediction model and the first objective variable data corresponding to the first explanatory variable data.

Description

Estimation device, estimation method and program

The present disclosure relates to an estimation device, an estimation method, and a program.

Methods have been studied for creating a prediction model that predicts objective variable data from explanatory variable data, based on explanatory variable data and the objective variable data corresponding to it, and for putting the created prediction model into actual operation. For example, Patent Document 1 discloses a technique that creates a plurality of prediction model candidates from explanatory variable data and new data generated from the explanatory variable data, and selects a prediction model by evaluating the prediction accuracy of those candidates.

Patent Document 1: International Publication No. 2021/229648

However, depending on the circumstances in which a prediction model is operated, it may take time to obtain the data of the objective variable that the prediction model predicts. In such cases, the value of a performance index for the model's predictions cannot be calculated until the objective variable data is obtained.

In view of the above problem, an object of the present disclosure is to provide an estimation device, an estimation method, and a program that can estimate the prediction performance of a prediction model without using objective variable data.

An estimation device according to one aspect of the present invention includes:
data selection means for selecting, from first explanatory variable data prepared according to a preset prediction model and first objective variable data corresponding to the first explanatory variable data, the first explanatory variable data and the first objective variable data corresponding to it, based on second explanatory variable data with which no objective variable is associated; and
performance estimation means for estimating the performance of the prediction model's prediction for the second explanatory variable data, based on a comparison between prediction data obtained by inputting the selected first explanatory variable data into the prediction model and the first objective variable data corresponding to that first explanatory variable data.

An estimation method according to one aspect of the present invention includes:
selecting, from first explanatory variable data prepared according to a preset prediction model and first objective variable data corresponding to the first explanatory variable data, the first explanatory variable data and the first objective variable data corresponding to it, based on second explanatory variable data with which no objective variable is associated; and
estimating the performance of the prediction model's prediction for the second explanatory variable data, based on a comparison between prediction data obtained by inputting the selected first explanatory variable data into the prediction model and the first objective variable data corresponding to that first explanatory variable data.

A program according to one aspect of the present invention causes a computer to execute processing of:
selecting, from first explanatory variable data prepared according to a preset prediction model and first objective variable data corresponding to the first explanatory variable data, the first explanatory variable data and the first objective variable data corresponding to it, based on second explanatory variable data with which no objective variable is associated; and
estimating the performance of the prediction model's prediction for the second explanatory variable data, based on a comparison between prediction data obtained by inputting the selected first explanatory variable data into the prediction model and the first objective variable data corresponding to that first explanatory variable data.

According to the present disclosure, the prediction performance of a prediction model during operation can be estimated without using objective variable data.

FIG. 1 is a block diagram showing an example of the hardware configuration of the estimation device according to the first embodiment of the present disclosure.
FIG. 2 is a block diagram showing an example of the configuration of the estimation device according to the first embodiment of the present disclosure.
FIG. 3 is a flowchart showing an example of the operation of the estimation device according to the first embodiment of the present disclosure.
FIG. 4 is a block diagram showing an example of the configuration of the estimation device according to the second embodiment of the present disclosure.
FIG. 5 is a flowchart showing an example of the operation of the estimation device according to the second embodiment of the present disclosure.
FIG. 6 is a diagram showing an example of estimation processing performed by the estimation device according to the second embodiment of the present disclosure.

Preferred embodiments of the present disclosure will be described in detail with reference to the drawings.

<First embodiment>
First, an example of the first embodiment of the present disclosure will be described. FIG. 1 is a block diagram showing an example of the hardware configuration of the estimation device according to this embodiment, and FIG. 2 is a block diagram showing an example of the configuration of the estimation device. FIG. 3 is a flowchart showing an example of the operation of the estimation device. This embodiment outlines the configuration of the estimation device and estimation method that are described in the second embodiment below.

First, the hardware configuration of the estimation device 100 in this embodiment will be described with reference to FIG. 1. The estimation device 100 is constituted by a general information processing device and, as an example, is equipped with the following hardware configuration.
- CPU (Central Processing Unit) 101 (arithmetic unit)
- ROM (Read Only Memory) 102 (storage device)
- RAM (Random Access Memory) 103 (storage device)
- Program group 104 loaded into the RAM 103
- Storage device 105 that stores the program group 104
- Drive device 106 that reads from and writes to a storage medium 110 external to the information processing device
- Communication interface 107 that connects to a communication network 111 external to the information processing device
- Input/output interface 108 that inputs and outputs data
- Bus 109 that connects the components

Note that FIG. 1 shows one example of the hardware configuration of the information processing device serving as the estimation device 100, and the hardware configuration is not limited to the one described above. For example, the information processing device may consist of only part of the above configuration, for instance omitting the drive device 106. In place of the CPU described above, the information processing device may use a GPU (Graphics Processing Unit), DSP (Digital Signal Processor), MPU (Micro Processing Unit), FPU (Floating point number Processing Unit), PPU (Physics Processing Unit), TPU (Tensor Processing Unit), quantum processor, microcontroller, or a combination of these.

The estimation device 100 can construct and be equipped with the data selection means 121 and the performance estimation means 122 shown in FIG. 2 when the CPU 101 acquires and executes the program group 104. The program group 104 is, for example, stored in advance in the storage device 105 or the ROM 102, and the CPU 101 loads it into the RAM 103 and executes it as needed. The program group 104 may also be supplied to the CPU 101 via the communication network 111, or may be stored in advance in the storage medium 110, from which the drive device 106 reads the program and supplies it to the CPU 101. However, the data selection means 121 and the performance estimation means 122 may instead be constructed from dedicated electronic circuits for realizing these means.

First, the estimation device 100 has a function of acquiring a prediction model, first explanatory variable data and first objective variable data corresponding to the first explanatory variable data, and second explanatory variable data. The prediction model, when executed in the estimation device 100, outputs a prediction of the corresponding objective variable data when explanatory variable data is input. The data output in this way is called prediction data. The first explanatory variable data and the first objective variable data are data prepared according to the prediction model. They include, for example, first explanatory variable data and first objective variable data serving as training data for generating the prediction model by machine learning, and first explanatory variable data and first objective variable data for validating the prediction model. The second explanatory variable data is explanatory variable data acquired when the prediction model is operated on another task; no objective variable data exists for the second explanatory variable data, and none is associated with it.

Note that the estimation device 100 may acquire the prediction model and each set of data from outside the estimation device 100. For example, if the estimation device 100 is connected to other devices via a network, it may acquire the prediction model and the data from another device on the same network. In this case, the estimation device 100 may include a communication device for communicating with other devices on the network. If the estimation device 100 includes a storage device (not shown), it may acquire the prediction model and the data from that storage device.

The data selection means 121 selects, from the first explanatory variable data acquired by the estimation device 100 and the first objective variable data corresponding to it, some or all of the first explanatory variable data and the corresponding first objective variable data, based on the second explanatory variable data. In doing so, the data selection means 121 selects, based on the content of the second explanatory variable data, all or part of the first explanatory variable data that is better suited to estimating the prediction performance of the prediction model for the second explanatory variable data. More specifically, it makes this selection so that the estimate, produced by the performance estimation means 122 described later, of the performance index of the prediction model's predictions for the second explanatory variable data comes closer to the actual value.
For example, when estimating the prediction performance of a prediction model, it is generally desirable to use explanatory variable data that the model did not use during training. The data selection means 121 may therefore select, as all or part of the first explanatory variable data, explanatory variable data that was not used to train the prediction model. As another example, when estimating the prediction performance for the second explanatory variable data, it is preferable to use explanatory variable data that is close to the second explanatory variable data in explanatory variable space, together with the corresponding objective variable data. The data selection means 121 may therefore select data so that the distribution of the selected explanatory variable data is close to the distribution of the second explanatory variable data in explanatory variable space (hereinafter simply called the distribution). The method of selecting the reference explanatory variable data may also be chosen to suit the performance estimation method used by the performance estimation means 122 described later. The explanatory variable data selected by the data selection means 121 is called the reference explanatory variable data. The data selection means 121 also selects the objective variable data corresponding to the selected reference explanatory variable data; this is called the reference objective variable data. The reference explanatory variable data and the corresponding reference objective variable data are together called the reference data.
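The distribution-matching selection described above could be sketched as follows. This is only a minimal illustration, assuming that "close in explanatory variable space" is measured by Euclidean distance and that each second-data point contributes its k nearest first-data points; the function name `select_reference_data` and the parameter `k` are hypothetical and do not come from the disclosure.

```python
import math

def select_reference_data(first_x, first_y, second_x, k=1):
    """Pick the first-data records nearest to the second explanatory data.

    Assumption: Euclidean distance stands in for closeness in
    explanatory variable space; the disclosure does not fix a metric.
    """
    selected = set()
    for q in second_x:
        # Rank first-data indices by distance to this second-data point.
        ranked = sorted(range(len(first_x)),
                        key=lambda i: math.dist(first_x[i], q))
        selected.update(ranked[:k])
    idx = sorted(selected)
    # Return reference explanatory data and the paired objective data.
    return [first_x[i] for i in idx], [first_y[i] for i in idx]

# Toy first data (with objective values) and unlabeled second data.
first_x = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
first_y = [0.1, 1.9, 10.2, 11.0]
second_x = [(5.2, 4.9), (5.9, 5.1)]

ref_x, ref_y = select_reference_data(first_x, first_y, second_x, k=1)
print(ref_x, ref_y)
```

Only the two first-data records near the second data are kept, so the reference data resembles the operational inputs more closely than the full first data does.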

The performance estimation means 122 estimates the performance of the prediction model's predictions for the second explanatory variable data based on a comparison between the prediction data obtained by inputting the reference explanatory variable data into the prediction model and the reference objective variable data corresponding to the reference explanatory variable data. More specifically, for example, the performance estimation means 122 may use the value of a performance index, calculated from the prediction data obtained by inputting the reference explanatory variable data into the prediction model and from the reference objective variable data, as the estimate of the performance index of the prediction model for the second explanatory variable data. In this case, the performance estimation means 122 may further take into account the difference in tendency between the second explanatory variable data and the reference explanatory variable data, and adjust the estimate so that the larger the difference in tendency, the worse the estimated value of the performance index. Here, the difference in tendency is information based on the result of comparing the contents of two sets of data, for example a non-negative numerical value representing how much the contents of the two sets of data differ. As an example of the difference in tendency, a distance or index defined between the distributions of the two sets of data may be used.
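One way to picture this adjustment is the sketch below. It assumes the performance index is the mean absolute error (lower is better), that the difference in tendency is the Euclidean distance between the feature means of the two data sets, and that the adjustment is a simple multiplicative penalty with a hypothetical coefficient `alpha`. None of these specific choices is stated in the disclosure; they only illustrate "larger difference, worse estimate".

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature tuples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def trend_difference(ref_x, second_x):
    """Non-negative gap between two data sets (assumed: distance of means)."""
    return math.dist(mean_vector(ref_x), mean_vector(second_x))

def estimate_mae(model, ref_x, ref_y, second_x, alpha=0.5):
    """Estimate MAE on second data from reference data, penalized by the gap."""
    preds = [model(x) for x in ref_x]
    base = sum(abs(p - y) for p, y in zip(preds, ref_y)) / len(ref_y)
    # Hypothetical adjustment: the estimate grows with the trend difference.
    return base * (1.0 + alpha * trend_difference(ref_x, second_x))

model = lambda x: 2.0 * x[0]          # stand-in prediction model
ref_x = [(1.0,), (2.0,)]
ref_y = [2.5, 3.5]
second_x = [(3.0,), (4.0,)]

print(estimate_mae(model, ref_x, ref_y, ref_x))     # no gap: plain MAE
print(estimate_mae(model, ref_x, ref_y, second_x))  # gap of 2.0: larger MAE
```

When the second data coincides with the reference data the penalty vanishes and the estimate equals the reference MAE; as the two drift apart, the estimated error increases.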

The performance index used by the performance estimation means 122 quantitatively evaluates how good or bad the prediction model's predictions are. The actual value of the performance index is calculated by comparing the prediction data obtained by inputting the second explanatory variable data into the prediction model with the objective variable data corresponding to the second explanatory variable data. Therefore, if the objective variable data is not available, the actual value of the performance index cannot be calculated. Specific examples of performance indices include the mean absolute error, mean squared error, and coefficient of determination for regression, and accuracy, F1 score, and cross entropy for classification. The performance indices usable by the performance estimation means 122 are of course not limited to these, and any evaluation index can be the target of estimation.
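For concreteness, several of the indices named above can be written out as in the following sketch. These are the standard textbook definitions implemented in plain Python; the function names are chosen here for illustration and are not part of the disclosure.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Coefficient of determination; undefined if y_true is constant."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def accuracy(labels_true, labels_pred):
    """Fraction of labels predicted exactly right (classification)."""
    hits = sum(1 for t, p in zip(labels_true, labels_pred) if t == p)
    return hits / len(labels_true)

y_true, y_pred = [1.0, 2.0, 3.0], [1.0, 2.0, 4.0]
print(mean_absolute_error(y_true, y_pred))
print(mean_squared_error(y_true, y_pred))
print(r2_score(y_true, y_pred))
print(accuracy(["a", "b", "a", "a"], ["a", "b", "b", "a"]))
```

Every one of these functions needs the true objective values, which is why the actual index cannot be computed for the second explanatory variable data and must be estimated from the reference data instead.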

With the functions of the data selection means 121 and the performance estimation means 122 described above, the estimation device 100 executes the estimation method shown in the flowchart of FIG. 3.

As shown in FIG. 3, the estimation device 100:
selects, from first explanatory variable data prepared according to a preset prediction model and first objective variable data corresponding to the first explanatory variable data, the first explanatory variable data and the first objective variable data corresponding to it, based on second explanatory variable data with which no objective variable is associated (step S101); and
estimates the performance of the prediction model's prediction for the second explanatory variable data, based on a comparison between prediction data obtained by inputting the selected first explanatory variable data into the prediction model and the first objective variable data corresponding to that first explanatory variable data (step S102).

As described above, in the present disclosure, the prediction performance for the explanatory variable data used in operating a prediction model can be estimated by using, for example: the prediction model in operation; as the first explanatory variable data, the explanatory variable data used to create the prediction model (that is, the explanatory variable data used for training and for validation); as the objective variable data corresponding to the first explanatory variable data, the objective variable data used to create the prediction model (that is, the objective variable data used for training and for validation); and, as the second explanatory variable data, the explanatory variable data used in operating the prediction model. In other words, during operation of the prediction model, the estimation device 100 can estimate the model's operational prediction performance from the explanatory variable data obtained during operation, without waiting for the operational objective variable data to be obtained.

Note that the configuration shown in FIG. 2 is only an example and does not necessarily limit the configuration of the estimation device according to this embodiment. For example, some functions of the estimation device 100 may be realized by a plurality of devices working together. As a specific example, the estimation device may be composed of a first estimation device that acquires the first explanatory variable data, the objective variable data corresponding to the first explanatory variable data, and the second explanatory variable data, and a second estimation device that acquires the prediction model. To estimate the value of the performance index from multiple angles, the estimation device 100 may include a plurality of data selection means and a plurality of performance estimation means. It may also include a device for calculating the final estimate of the estimation device 100 based on the estimates from the plurality of performance estimation means. If the estimation device 100 also has functions other than estimating the value of the performance index of the prediction model, it may include devices not shown in FIG. 2.

<Second embodiment>
Next, an example of the second embodiment of the present disclosure will be described with reference to FIGS. 4 to 6. FIG. 4 is a block diagram for explaining an example of the configuration of the estimation device according to this embodiment, and FIG. 5 is a flowchart for explaining an example of the operation of the estimation device. FIG. 6 is a diagram for explaining the estimation device in actual operation.

In this embodiment, the explanatory variable data used to create the prediction model is used as the first explanatory variable data. The objective variable data corresponding to the explanatory variable data used to create the prediction model is used as the first objective variable data corresponding to the first explanatory variable data. The explanatory variable data obtained during operation of the prediction model is used as the second explanatory variable data. However, these are merely examples within this embodiment, which is itself one example of an embodiment of the present disclosure, and they do not limit the embodiments of the present disclosure. As an example that uses different data, a prediction model created from the explanatory variable data and objective variable data obtained in one task may be applied to another task, and the prediction performance of the model for the explanatory variable data of that other task may be estimated. In this case, the explanatory variable data of the other task, rather than the explanatory variable data during operation of the prediction model, can be used as the second explanatory variable data, and the estimation device according to the present disclosure can be applied.
In addition, data other than the data used to create the prediction model may be used as the first explanatory variable data and the first objective variable data corresponding to it. For example, explanatory variable data from operation of the prediction model for which the corresponding objective variable data has already been obtained may be used together with that objective variable data.

The estimation device 1 in this embodiment is composed of one or more information processing devices each including an arithmetic unit and a storage device. As shown in FIG. 4, the estimation device 1 includes a second data acquisition unit 10, an output unit 20, a control unit 30, and a storage unit 40. The control unit 30 further includes a reference data selection unit 3 and a performance estimation unit 4. The functions of the second data acquisition unit 10, the output unit 20, and the reference data selection unit 3 and performance estimation unit 4 of the control unit 30 can be realized by the arithmetic unit executing programs, stored in the storage device, that implement each function. The estimation device 1 also has, in the storage device, the storage unit 40, which stores first data 41, a prediction model 42, and parameter information 43. The control unit 30 performs data communication with each of the second data acquisition unit 10, the output unit 20, and the storage unit 40 via a communication network or the like.

With the components described above, the estimation device 1 has a function of estimating the prediction performance of the prediction model 42 stored in the storage unit 40 for the second explanatory variable data acquired by the second data acquisition unit 10, that is, for the explanatory variable data obtained during operation of the prediction model 42. Each component is described in detail below.

The second data acquisition unit 10 acquires, as the second explanatory variable data, the explanatory variable data obtained during operation of the prediction model. The explanatory variable data during operation (that is, the second explanatory variable data) is data obtained during operation, for which the prediction model predicts the corresponding objective variable data. For example, when a user of the estimation device 1 inputs the explanatory variable data during operation, the second data acquisition unit 10 may include an interface that accepts user input, such as a touch panel, buttons, or a voice input device. If the estimation device 1 itself has a function of collecting the second explanatory variable data, the second data acquisition unit 10 corresponds to, for example, a sensor device such as a camera, a device that processes the information acquired from the sensor device, and a device that stores the processed information as the second explanatory variable data. When the second explanatory variable data is acquired from a device other than the estimation device 1, a communication device or the like for communicating with that device corresponds to the second data acquisition unit 10.

The output unit 20 outputs the estimated value of the performance index calculated by the control unit 30. When the estimated value is output to a user of the estimation device 1, the output unit 20 corresponds to a display for showing the estimated value, a speaker for outputting sound, or the like. When the estimated value is used by a device other than the estimation device 1, the output unit 20 corresponds to a communication device or the like that communicates with that device. The estimation device 1 may also have an alert function that reports deterioration in prediction performance, for example when the estimated value falls below a certain value; in that case, the device that reports the deterioration corresponds to the output unit 20.

The configuration of the storage unit 40 will be described. As shown in FIG. 4, the storage unit 40 stores the first data 41, the prediction model 42, and the parameter information 43 in advance. The storage unit 40 sends the first data 41, the prediction model 42, and the parameter information 43 to the control unit 30 as needed.

The first data 41 includes the explanatory variable data used to create the prediction model 42 (first explanatory variable data) and the objective variable data corresponding to that explanatory variable data (first objective variable data). The first data 41 includes training data used to train the prediction model 42 (that is, the explanatory variable data used for training and the corresponding objective variable data) and validation data used to verify the generalization performance of the prediction model 42 (that is, the explanatory variable data used for validation and the corresponding objective variable data).

The prediction model 42 is a prediction model created using the first data 41. Specifically, the prediction model 42 may be created by so-called supervised machine learning so that, when the explanatory variable data of the training data in the first data 41 is input, it predicts the objective variable data of the training data. The generalization performance of the prediction model 42 may also have been verified using the validation data in the first data 41. Examples of algorithms that implement the prediction model 42 include linear regression, decision trees, random forests, and neural networks. These are only examples; the algorithm that implements the prediction model 42 is not limited as long as it can predict objective variable data from explanatory variable data.

The parameter information 43 is information on the parameters needed to estimate the value of the performance index. For example, it may include information on the performance index to be estimated, on the method of acquiring the reference data described later, and on the method of estimating the performance index.

In this embodiment, the prediction model, the first explanatory variable data, and the first objective variable data corresponding to the first explanatory variable data do not change from one estimation to the next, so storing them in the storage unit 40 reduces the processing required to acquire them. In contrast, the second explanatory variable data, that is, the explanatory variable data used during operation of the prediction model, changes with each estimation; acquiring it through the second data acquisition unit 10 makes it possible to estimate the value of the performance index for the latest explanatory variable data during operation.

Next, the configuration of the control unit 30 will be described. As shown in FIG. 4, the control unit 30 includes a reference data selection unit 3 and a performance estimation unit 4. The control unit 30 also includes a CPU, a ROM, a RAM, and the like (not shown), and performs various kinds of control and computation for the reference data selection unit 3 and the performance estimation unit 4.

The reference data selection unit 3 (data selection means) selects all or part of the explanatory variable data of the first data 41 on the basis of the second explanatory variable data acquired by the second data acquisition unit 10 and of the prediction model 42, the first data 41, and the parameter information 43 acquired from the storage unit 40. At this time, the reference data selection unit 3 also selects the objective variable data corresponding to the selected explanatory variable data of the first data 41. The selected explanatory variable data is called the reference explanatory variable data, and the objective variable data of the first data 41 corresponding to it is called the reference objective variable data. The reference explanatory variable data and the reference objective variable data are together called the reference data.

As one specific example of how the reference data selection unit 3 selects the reference data, only the validation data in the first data 41 may be used as the reference data. Estimation formula (1) for the performance index, described later, uses the value of the performance index of the prediction model 42 on the reference data as the baseline for the estimated value of the performance index of the prediction model 42 on the second explanatory variable data. Excluding the training data of the first data 41 from the reference data and using only the validation data avoids an overly favorable performance index value caused by the prediction model 42 overfitting the training data, so the estimated value of the performance index can be calculated more suitably. Of course, all or part of the training data may be included in the reference data, and in some cases increasing the number of samples in the reference data by including training data helps estimate the value of the performance index well. For example, when the reference data selection unit 3 cannot select related first explanatory variable data as described later, such as when the attributes or characteristics of the second explanatory variable data cannot be classified from its content, it may select only the validation data or all of the data including the training data.

As another example of how the reference data selection unit 3 selects the reference data, all or part of the first data 41 that is highly related to the prediction performance of the prediction model 42 on the second explanatory variable data may be selected as the reference data. Here, "highly related" means that a positive correlation exists between the prediction performance of the prediction model 42 on the second explanatory variable data and its prediction performance on the reference explanatory variable data, so that the two are expected to be comparable. In other words, in this case it means that, as a result of comparing the second explanatory variable data with the first explanatory variable data in the first data 41, the data content is judged to be highly related to the second explanatory variable data according to a preset criterion. Estimation formula (1) for the performance index, described later, uses the value of the performance index of the prediction model 42 on the reference data as the baseline for the estimated value of the performance index of the prediction model 42 on the second explanatory variable data. Therefore, using the part of the data that is highly related to the second explanatory variable data as the reference data allows the estimated value of the performance index of the prediction model 42 to be calculated more suitably. When data highly related to the second explanatory variable data is selected from the first data 41, empirical knowledge about the data (generally called domain knowledge) or the characteristics of the prediction model 42 may be used. As a specific way of obtaining highly related data, when the explanatory variable data includes date and time information, for example, data from the same month and day, the same day of the week, or the same season as any sample in the second explanatory variable data may be acquired from the first data 41. As another example, for an explanatory variable that the prediction model 42 weights particularly heavily when predicting, data whose values are close to those of the second explanatory variable data may be acquired from the first data 41.
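The date-based selection described above can be sketched as follows. This is only an illustration under stated assumptions: the record layout (a dict with a "date" key) and the function name select_by_weekday are hypothetical and do not appear in the specification, and only the "same day of the week" criterion is shown.

```python
from datetime import date


def select_by_weekday(first_data, second_dates):
    """Keep first-data samples that fall on a weekday seen in the second data.

    first_data: list of dicts with a "date" key (hypothetical layout).
    second_dates: dates attached to the second explanatory variable data.
    """
    weekdays = {d.weekday() for d in second_dates}
    return [row for row in first_data if row["date"].weekday() in weekdays]


# Hypothetical first data: two Mondays and one Tuesday.
first_data = [
    {"date": date(2022, 5, 16), "x": [1.0], "y": 2.0},  # Monday
    {"date": date(2022, 5, 17), "x": [3.0], "y": 4.0},  # Tuesday
    {"date": date(2022, 5, 23), "x": [5.0], "y": 6.0},  # Monday
]
# Operation-time data observed on a Monday.
reference = select_by_weekday(first_data, [date(2022, 5, 30)])
```

The same pattern would apply to the "same month and day" or "same season" criteria by swapping the key extracted from each date.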

As yet another example of how the reference data selection unit 3 selects the reference data, the part of the first data 41 whose difference in tendency from the second explanatory variable data, that is, whose difference in data content under a preset criterion, is smaller than that of the rest may be selected as the reference data. Estimation formula (1) for the performance index, described later, computes the estimated value so that the larger the difference in tendency between the second explanatory variable data and the reference explanatory variable data, the worse the value of the performance index of the prediction model 42 on the second explanatory variable data. Compared with the first data 41, the second explanatory variable data often contains only a limited amount of data covering a limited region of the explanatory variable space. In that situation, if all of the explanatory variable data of the first data 41 is used as the reference explanatory variable data, the difference in tendency between the second explanatory variable data and the reference explanatory variable data becomes large, and the performance index of the prediction model 42 on the second explanatory variable data is estimated to be poor. However, the explanatory variable data of the first data 41 may contain samples that are close, in the explanatory variable space, to each sample of the second explanatory variable data. In other words, the prediction model 42 may already have been trained or validated, through the data used to create it (the first data 41), on data close to the explanatory variable data during operation (the second explanatory variable data). In such a case, the actual value of the performance index of the prediction model 42 on the second explanatory variable data is good. Therefore, by selecting, from the explanatory variable data of the first data 41, the part whose difference in tendency from the second explanatory variable data is small and using it as the reference explanatory variable data, a value close to the actual value of the performance index of the prediction model 42 on the second explanatory variable data can be estimated. The index of the difference in tendency used here may be the same as the one used in estimation formula (1), described later, or a different one. A combination of several different indices may also be used.

As one specific example of how the reference data selection unit 3 selects, from the explanatory variable data of the first data 41, data whose difference in tendency from the second explanatory variable data is small, a greedy selection method may be used. First, one or more samples closest to the second explanatory variable data are taken from the explanatory variable data of the first data 41 and used as the provisional reference explanatory variable data. Next, from the samples of the explanatory variable data of the first data 41 that are not yet in the provisional reference explanatory variable data, one or more samples whose addition most reduces the difference in tendency between the provisional reference explanatory variable data and the second explanatory variable data are taken and added to the provisional reference explanatory variable data. This addition step is repeated, for example, until the number of samples in the provisional reference explanatory variable data exceeds a fixed number, and the provisional reference explanatory variable data is finally adopted as the reference explanatory variable data. In this way, data whose difference in tendency from the second explanatory variable data is small can be selected from the explanatory variable data of the first data 41.
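The greedy procedure above can be sketched as follows. The specification does not fix the index of the difference in tendency, so this sketch assumes a simple one, the Euclidean distance between mean vectors. It also assumes one sample is added per step and that the loop stops once a target size is reached. The names mean_vector, mean_gap, and greedy_select are hypothetical.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def mean_gap(a_rows, b_rows):
    """Assumed tendency-difference index: distance between the two mean vectors."""
    return math.dist(mean_vector(a_rows), mean_vector(b_rows))


def greedy_select(first_x, second_x, size):
    """Return indices of first_x chosen greedily to track second_x."""
    target = mean_vector(second_x)
    remaining = list(range(len(first_x)))
    # Seed: the first-data sample closest to the centre of the second data.
    seed = min(remaining, key=lambda i: math.dist(first_x[i], target))
    chosen = [seed]
    remaining.remove(seed)
    while len(chosen) < size and remaining:
        # Add the sample that leaves the smallest gap after being added.
        best = min(
            remaining,
            key=lambda i: mean_gap([first_x[k] for k in chosen + [i]], second_x),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


first_x = [[0.0], [1.0], [2.0], [10.0]]
second_x = [[1.0], [2.0]]
picked = greedy_select(first_x, second_x, size=2)
```

Each step rescans all remaining samples, so the cost grows roughly with the product of the target size and the size of the first data; for large data a subsample of candidates per step may be preferable.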

The reference data selection unit 3 may use a selection method that combines two or more of the reference data selection methods described above. For example, the reference data selection unit 3 may first select the validation data from the first data 41, then select from it the part that is highly related to the second explanatory variable data, and then further select the part whose difference in tendency from the second explanatory variable data is small. The reference data selection method may also be chosen to suit the specific estimation method used by the performance estimation unit 4, described later. That is, when it is considered empirically or theoretically that the estimation method of the performance estimation unit 4 can calculate the performance estimate more accurately with a particular reference data selection method, that selection method may be used.

Note that when the tendency of the explanatory variable data of the first data 41 differs greatly from that of the second explanatory variable data, and every sample of the second explanatory variable data has low relevance to the samples in the explanatory variable data of the first data 41, the difference in tendency from the second explanatory variable data never becomes 0 no matter how all or part of the explanatory variable data of the first data 41 is selected. Therefore, when the two tendencies differ greatly, the deterioration in prediction performance of the prediction model 42 on the second explanatory variable data can still be estimated even if all or part of the explanatory variable data of the first data 41 is selected as described above.

The performance estimation unit 4 (performance estimation means) calculates the estimated value P of the performance index of the prediction model 42 on the second explanatory variable data, using for example formula (1) below, on the basis of the reference data, the second explanatory variable data, and the prediction model 42 and parameter information 43 acquired from the storage unit 40. The estimated value of the performance index is output to the output unit 20.
<Formula for calculating the estimated value of the performance index>
P = B + D × A … (1)
where
P: the estimated value of the performance index of the prediction model 42 on the second explanatory variable data;
B: the value of the performance index of the prediction model on the reference explanatory variable data and the reference objective variable data;
D: a non-negative value indicating the difference in tendency between the second explanatory variable data and the reference explanatory variable data;
A: the rate of change of the performance index value with respect to D, calculated on the basis of the reference explanatory variable data and of a comparison between the prediction data obtained by inputting the reference explanatory variable data into the prediction model 42 and the reference objective variable data.

B in formula (1) is the value of the performance index of the prediction model on the reference explanatory variable data and the reference objective variable data. In particular, when there is no difference at all in tendency between the reference explanatory variable data and the second explanatory variable data (that is, when D in formula (1) is 0), the estimated value P of the performance index equals B. Indeed, when there is no difference in tendency, the second explanatory variable data and the reference explanatory variable data can be regarded as identical, so unless the relationship between the explanatory variable data and the objective variable data changes (that is, unless concept drift occurs), the prediction performance of the prediction model 42 on the second explanatory variable data is almost the same as its prediction performance on the reference explanatory variable data. Formula (1) makes it possible to estimate the value of the performance index on the basis of this fact. In other words, when no concept drift has occurred and the reference explanatory variable data can be selected so that it has the same tendency as the second explanatory variable data, formula (1) can accurately estimate the value of the performance index from the reference explanatory variable data and the reference objective variable data.

When there is a difference in tendency between the reference explanatory variable data and the second explanatory variable data (that is, when D in formula (1) is greater than 0), formula (1) takes the value of B as the best value and estimates the deterioration in performance from the values of D and A. In general, the prediction performance of a prediction model deteriorates on explanatory variable data whose tendency differs from that of the explanatory variable data in its training data; on this basis, formula (1) can estimate the deterioration in prediction performance of the prediction model 42 on the second explanatory variable data.
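As a small numeric illustration of formula (1), the sketch below evaluates P = B + D × A for a lower-is-better index such as mean absolute error. The function name estimate_performance and the numbers are hypothetical and chosen only to show the two cases discussed above (D = 0 and D > 0).

```python
def estimate_performance(b, d, a):
    """Formula (1): P = B + D * A.

    b: performance index on the reference data (B).
    d: non-negative tendency difference (D).
    a: rate of change of the index per unit of D (A).
    """
    if d < 0:
        raise ValueError("D must be non-negative")
    return b + d * a


# No difference in tendency: the estimate equals the reference value B.
p_same = estimate_performance(b=0.20, d=0.0, a=0.3)

# Some difference in tendency: for an error-type index (A > 0),
# the estimate is worse (larger) than B.
p_shifted = estimate_performance(b=0.20, d=0.5, a=0.3)
```

For an index where higher is better, the same function would be called with a negative A, so that a larger D lowers the estimate.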

D in formula (1) is a non-negative value that quantitatively indicates the difference in tendency between the second explanatory variable data and the reference explanatory variable data. D is 0 when there is no difference in tendency at all between the two, and becomes large when the difference is large. As one specific example, the difference between the distribution of the second explanatory variable data and the distribution of the reference explanatory variable data can be calculated and used as D. Examples of indices that measure the difference between data distributions include the Kullback-Leibler divergence, the Jensen-Shannon divergence, the Wasserstein distance, and the Maximum Mean Discrepancy (hereinafter, MMD). These indices are 0 when the data distributions do not differ and take larger values as the difference grows. As another example, statistics such as the mean and variance of the explanatory variables may be calculated for the second explanatory variable data and for the reference explanatory variable data, and D may be calculated from the differences between them. As yet another example, the proportion of samples in the second explanatory variable data that have no sample of the reference explanatory variable data within a fixed distance may be calculated and used as D.
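Two of the candidate definitions of D listed above can be sketched as follows. The sketch assumes a Laplace kernel of the form exp(-gamma × L1 distance) and a biased MMD estimator. The specification does not state the kernel's exact form or the estimator, and the names laplace_kernel, mmd, and uncovered_fraction are hypothetical.

```python
import math


def laplace_kernel(x, y, gamma=1.0):
    """Assumed Laplace kernel: exp(-gamma * L1 distance)."""
    return math.exp(-gamma * sum(abs(a - b) for a, b in zip(x, y)))


def mmd(xs, ys, gamma=1.0):
    """Biased estimate of the Maximum Mean Discrepancy between two samples."""
    def mean_k(us, vs):
        total = sum(laplace_kernel(u, v, gamma) for u in us for v in vs)
        return total / (len(us) * len(vs))

    squared = mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)
    return math.sqrt(max(squared, 0.0))  # guard against rounding below zero


def uncovered_fraction(second_x, reference_x, radius):
    """Share of second-data samples with no reference sample within radius."""
    far = sum(
        1 for s in second_x
        if all(math.dist(s, r) > radius for r in reference_x)
    )
    return far / len(second_x)


reference_x = [[0.0, 0.0], [1.0, 1.0]]
d_same = mmd(reference_x, reference_x)              # identical data
d_far = mmd([[5.0, 5.0], [6.0, 6.0]], reference_x)  # shifted data
d_cover = uncovered_fraction([[0.1, 0.0], [9.0, 9.0]], reference_x, radius=1.0)
```

Both functions return 0 when the second data coincides with the reference data and grow as the two drift apart, which is the behaviour required of D.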

In the examples above, the value of D may be calculated using all of the explanatory variables in the second explanatory variable data and the reference explanatory variable data, or only some of them. For example, only the explanatory variables that have a particularly strong influence on the predictions of the prediction model 42 may be used. Likewise, the value of D may be calculated using all of the samples in the second explanatory variable data and the reference explanatory variable data, or only some of them. For example, when the second explanatory variable data consists of a very large number of samples, D may be calculated from a sample drawn from the second explanatory variable data. This shortens the time needed to calculate D in formula (1) and enables fast estimation.

In addition, when calculating the value of D in formula (1), the values of each explanatory variable in the second explanatory variable data and the reference explanatory variable data may be transformed before D is calculated, in order to adjust how strongly each explanatory variable influences D. For example, to make the influence of each explanatory variable on D roughly equal, each explanatory variable may be transformed by Min-Max normalization or Z-score normalization. As another example, to strengthen the influence on D of explanatory variables that have a particularly strong effect on the predictions of the prediction model 42, the values of those explanatory variables in the second explanatory variable data and the reference explanatory variable data may be multiplied by, for example, 2.
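The pre-scaling step can be sketched as follows. The sketch assumes that the Min-Max range is taken over both data sets combined, so that both are mapped with the same transformation, and that emphasis is applied as a per-variable multiplier after scaling. The function name min_max_scale and the weights parameter are hypothetical.

```python
def min_max_scale(second_x, reference_x, weights=None):
    """Scale each explanatory variable to [0, 1], then apply optional weights.

    The range of each variable is computed over both data sets together,
    so the two sets stay comparable after scaling.
    """
    combined = second_x + reference_x
    lows = [min(col) for col in zip(*combined)]
    highs = [max(col) for col in zip(*combined)]
    if weights is None:
        weights = [1.0] * len(lows)

    def scale(row):
        return [
            w * ((v - lo) / (hi - lo) if hi > lo else 0.0)
            for v, lo, hi, w in zip(row, lows, highs, weights)
        ]

    return [scale(r) for r in second_x], [scale(r) for r in reference_x]


second_x = [[0.0, 100.0]]
reference_x = [[10.0, 300.0], [5.0, 200.0]]
# Equal influence for both variables.
s_plain, r_plain = min_max_scale(second_x, reference_x)
# Double the influence of the first variable on D.
s_weighted, r_weighted = min_max_scale(second_x, reference_x, weights=[2.0, 1.0])
```

The scaled lists would then be passed to whichever tendency-difference index is used for D.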

A in formula (1) is the rate of change of the performance index value with respect to D, calculated on the basis of the reference explanatory variable data and of a comparison between the prediction data obtained by inputting the reference explanatory variable data into the prediction model 42 and the reference objective variable data. A is set to a negative value when a higher performance index value indicates better model performance (for example, when the coefficient of determination, accuracy, or F1 score is used as the performance index), and to a positive value when a lower performance index value indicates better model performance (for example, when the mean absolute error, mean squared error, or cross entropy is used as the performance index). In this way, the deterioration in prediction performance of the prediction model 42 on the second explanatory variable data can be expressed as a function of the increase in D in formula (1).

The value of A in formula (1) may be, for example, a constant. More specifically, when explanatory variable data and corresponding objective variable data other than the first data are available, for example, the value of A may be set so that the estimate P given by formula (1), with that explanatory variable data used as the second explanatory variable data, equals the actual value of the performance index of the prediction model 42 on that explanatory variable data and objective variable data. The value of A may also be set suitably on the basis of, for example, theoretical analysis or experimental results.
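Setting A from an additional labeled data set amounts to solving formula (1) for A, that is, A = (P − B) / D with P replaced by the actually measured index. The sketch below assumes a single calibration data set with D > 0; the function name calibrate_a and the numbers are hypothetical.

```python
def calibrate_a(actual_p, b, d):
    """Solve P = B + D * A for A, given a measured P on labeled extra data."""
    if d <= 0:
        raise ValueError("D must be positive to calibrate A")
    return (actual_p - b) / d


# Hypothetical values: error 0.25 on the reference data, measured error 0.5
# on the extra labeled data, tendency difference 0.5 between the two.
a_const = calibrate_a(actual_p=0.5, b=0.25, d=0.5)
```

With several calibration data sets, one option would be to fit a single A to all of them, for example by least squares; the specification does not prescribe this.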

The value of A in formula (1) may also be calculated on the basis of the reference explanatory variable data and of a comparison between the prediction data obtained by inputting the reference explanatory variable data into the prediction model 42 and the reference objective variable data. As one specific calculation method, formula (2) below, which gives the rate of deterioration in prediction performance based on distributionally robust optimization, may be used. This formula derives the value of A mathematically from the values of the performance index of the predictions of the prediction model 42 on the reference explanatory variable data and from the distribution of the reference explanatory variable data. Note that because A is positive when formula (2) is used, the prediction performance index must be one for which a lower value indicates better model performance.

<Formula for the prediction performance deterioration rate based on distributionally robust optimization>
A = (Σij Kij × Li × Lj)^(1/2) … (2)
where
A: the value of A in equation (1);
Σij: summation with i and j each running from 1 to the number of samples in the reference explanatory variable data;
Kij: the value of the Laplace kernel computed for the i-th and j-th samples of the reference explanatory variable data;
Li: the value of the prediction performance index calculated from the i-th value of the reference objective variable data and the output of the prediction model when the i-th sample of the reference explanatory variable data is input;
Lj: the value of the prediction performance index calculated from the j-th value of the reference objective variable data and the output of the prediction model when the j-th sample of the reference explanatory variable data is input.
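Equation (2) can be sketched in code as follows. This is only an illustrative sketch: the kernel form exp(-‖x - y‖₁ / σ), the function names, and the plain double loop are assumptions, not the implementation described in this disclosure.

```python
import math
from typing import Sequence


def laplace_kernel(x: Sequence[float], y: Sequence[float], sigma: float = 1.0) -> float:
    # Assumed kernel form: exp(-(L1 distance) / sigma).
    return math.exp(-sum(abs(a - b) for a, b in zip(x, y)) / sigma)


def deterioration_rate(xs: Sequence[Sequence[float]],
                       losses: Sequence[float],
                       sigma: float = 1.0) -> float:
    """A = sqrt(sum over i, j of K_ij * L_i * L_j), following equation (2).

    xs:     reference explanatory variable samples
    losses: per-sample performance index values L_i (lower is better)
    """
    total = 0.0
    for xi, li in zip(xs, losses):
        for xj, lj in zip(xs, losses):
            total += laplace_kernel(xi, xj, sigma) * li * lj
    return math.sqrt(total)
```

With a single reference sample the kernel value is 1, so A reduces to that sample's loss.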

When the MMD value based on the Laplace kernel is used for D in equation (1) and the value calculated by equation (2) is used for A in equation (1), the value of the performance index in the worst case of prediction performance can be estimated from the MMD distance between the reference explanatory variable data and the second explanatory variable data.
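The following sketch shows one way the MMD-based distance D and the resulting worst-case estimate might be computed. The biased MMD estimator, the kernel form exp(-‖x - y‖₁ / σ), the assumed form P = B + D × A of equation (1), and all function names are assumptions made for illustration.

```python
import math


def laplace_kernel(x, y, sigma=1.0):
    # Assumed kernel form: exp(-(L1 distance) / sigma).
    return math.exp(-sum(abs(a - b) for a, b in zip(x, y)) / sigma)


def mmd(xs, ys, sigma=1.0):
    """Biased MMD estimate between two samples (lists of feature vectors)."""

    def mean_k(a, b):
        return sum(laplace_kernel(p, q, sigma) for p in a for q in b) / (len(a) * len(b))

    squared = mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)
    return math.sqrt(max(squared, 0.0))  # guard against tiny negative rounding


def worst_case_index(b, d, a):
    """Assumed form of equation (1): P = B + D * A."""
    return b + d * a
```

Identical samples give D = 0, in which case the estimate equals the performance index B on the reference data.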

<Explanation of operation>
Next, an example of the overall operation according to this embodiment will be described in detail with reference to the block diagram of FIG. 4 and the flowchart of FIG. 5. FIG. 5 is an example of a flowchart showing the processing procedure of the estimation device 1 in the first embodiment.

First, when processing by the estimation device 1 starts, the second data acquisition unit 10 acquires the second explanatory variable data (step S1).
Next, the control unit 30 acquires the first data 41, the prediction model 42, and the parameter information 43 from the storage unit 40 (step S2).
The reference data selection unit 3 selects reference data based on the second explanatory variable data, the first data 41, and the parameter information 43 (step S3).
The performance estimation unit 4 calculates the estimated value of the performance index by equation (1), based on the reference data, the second explanatory variable data, the prediction model 42, and the parameter information 43 (step S4).
Finally, the estimation device 1 outputs the estimated value of the performance index obtained by the performance estimation unit 4 through the output unit 20 (step S5).

The operation flow shown in FIG. 5 is merely one example of this embodiment and does not necessarily limit the operation flow according to this embodiment. As a specific example, the acquisition of the second explanatory variable data (step S1) may be executed after the acquisition of the first data 41, the prediction model 42, and the parameter information 43 (step S2). Step S2 may also be divided into a plurality of steps; for example, the acquisition of the prediction model 42 within step S2 may be executed after the selection of the reference data (step S3).
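Steps S1 to S5 can be sketched as a single function, shown here in the order of FIG. 5. All names (`run_estimation`, the dictionary keys, and the callables passed in) are hypothetical placeholders for the units described above, not an interface defined by this disclosure.

```python
def run_estimation(acquire_second_data, storage, select_reference,
                   estimate_performance, output):
    """Run steps S1 to S5 once and return the estimated performance index."""
    second_x = acquire_second_data()                                   # step S1
    first_data = storage["first_data"]                                 # step S2
    model = storage["prediction_model"]
    params = storage["parameter_info"]
    reference = select_reference(second_x, first_data, params)         # step S3
    p_hat = estimate_performance(reference, second_x, model, params)   # step S4
    output(p_hat)                                                      # step S5
    return p_hat
```

Because each step is passed in as a callable, the reordering described above (for example, loading the prediction model only after step S3) would amount to moving the corresponding lookup.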

<Examples>
Next, as examples of the estimation device according to an embodiment of the present disclosure, examples of operating the system under specific use cases will be described.

(Example 1: Forecasting ice cream demand one month ahead at a supermarket)
First, as Example 1, consider a physical supermarket store in which the current month, day, day of the week, season, weather, temperature, humidity, number of customers, sales of related products, and so on are used as explanatory variables to predict, as the objective variable, the demand for ice cream one month later. In this case, the actual number of sales cannot be known until one month has passed and the ice cream has been sold. In other words, until one month has passed since a prediction was made, the value of the objective variable (the demand for ice cream one month later) is unknown, and so the current prediction performance of the prediction model, for example the mean squared error of the model's predictions over the most recent week, cannot be known. The present disclosure is therefore used to estimate the mean squared error of the model's predictions over the most recent week.

The prediction model is created using data from a fixed past period, for example three years of previously acquired data (that is, daily explanatory variable data for three years together with the daily objective variable data, namely the number of ice creams sold one month after each day), as the data for creating the prediction model. Here, the three years of data serve as the first explanatory variable data and the objective variable data corresponding to the first explanatory variable data, and are collectively called the first data. The created prediction model and the first data are held in a predetermined storage area.

Next, during actual operation, the mean squared error of the prediction model over the most recent week is estimated. That is, the explanatory variable data for the most recent week is used as the second explanatory variable data. Estimating the prediction performance requires a method for selecting the reference data, a method for calculating the value of D in equation (1), and a method for calculating the value of A in equation (1).

First, as the reference data, the data in the first data whose month and day match a month and day contained in the second explanatory variable data is selected. This is based on the idea that the prediction performance of the prediction model is roughly the same in the same period of the year, that is, that performance on such data is closely related to the prediction performance on the second explanatory variable data. Such an idea can be obtained, for example, from experience with supermarket sales or from detailed analysis of the first data (that is, from domain knowledge). As a result, the mean squared error for the most recent week, which is the estimation target, can be estimated more suitably, based on data closely related to the second explanatory variable data.
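The month-and-day matching described above might look like the following sketch. The row layout `(date, explanatory_values, objective_value)` and the function name `select_reference` are assumptions made for illustration.

```python
from datetime import date


def select_reference(first_data, second_dates):
    """Keep rows of the first data whose (month, day) occurs in the second data.

    first_data:   list of (date, explanatory_values, objective_value) tuples
    second_dates: dates covered by the second explanatory variable data
    """
    wanted = {(d.month, d.day) for d in second_dates}
    return [row for row in first_data if (row[0].month, row[0].day) in wanted]


# Hypothetical first data covering several years.
rows = [
    (date(2020, 7, 1), [28.5], 120.0),
    (date(2021, 7, 1), [30.1], 135.0),
    (date(2021, 8, 1), [31.0], 150.0),
]
reference = select_reference(rows, [date(2023, 7, 1)])
print(len(reference))  # 2
```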

Next, as the value of D in equation (1), the MMD between the reference explanatory variable data and the second explanatory variable data is used. More specifically, the MMD is calculated with a Laplace kernel whose sigma parameter is set to 1, and Z-score normalization is applied before D is calculated. Combined with the calculation of A in equation (1) by equation (3) described below, this MMD-based calculation of D constitutes a method of estimating the mean squared error based on distributionally robust optimization, that is, an estimation method grounded in theoretical analysis.
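The Z-score normalization step might be sketched as follows. The text does not say which data the mean and standard deviation are taken from; this sketch assumes they are fitted on the reference data and then applied to both data sets, and it replaces a zero standard deviation with 1 to avoid division by zero. The function names are hypothetical.

```python
import statistics


def zscore_fit(rows):
    """Return per-column means and population standard deviations."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # A constant column has standard deviation 0; fall back to 1.0 (assumption).
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return means, stds


def zscore_apply(rows, means, stds):
    """Normalize each value as (value - mean) / std, column by column."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]
```

The normalized reference and second data would then be passed to the MMD calculation with sigma set to 1.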

The value of A in equation (1) is the value calculated by equation (3) below, which instantiates equation (2) above for the reference data and the prediction model. Calculating A by equation (3) is a method derived from analysis based on distributionally robust optimization.

<Formula for the rate of change A>
A = (Σij Kij × (Yi - Pi)^2 × (Yj - Pj)^2)^(1/2) … (3)
where
A: the value of A in equation (1);
Σij: summation with i and j each running from 1 to the number of samples in the reference explanatory variable data;
Kij: the value of the Laplace kernel, with its sigma parameter set to 1, computed for the i-th and j-th samples of the Z-score-normalized reference explanatory variable data;
Yi: the i-th value of the reference objective variable data;
Yj: the j-th value of the reference objective variable data;
Pi: the output of the prediction model when the i-th sample of the reference explanatory variable data is input;
Pj: the output of the prediction model when the j-th sample of the reference explanatory variable data is input.

With the reference data selection method specified above, the method for calculating D in equation (1), and the method for calculating A in equation (1) by equation (3), the mean squared error of the prediction model over the most recent week can be estimated during actual operation.
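Putting these pieces together, a minimal sketch of the estimate for Example 1 is shown below. It assumes that equation (1) has the form P = B + D × A, that B is the mean squared error on the reference data, that the kernel is exp(-‖x - y‖₁ / σ), and that `d` has already been computed as the MMD on Z-score-normalized data; the function name is hypothetical.

```python
import math


def estimate_mse(ref_x, ref_y, ref_pred, d, sigma=1.0):
    """Return (P, B, A) for the assumed form P = B + D * A.

    ref_x:    Z-score-normalized reference explanatory variable samples
    ref_y:    reference objective variable values Y_i
    ref_pred: model outputs P_i for the reference samples
    d:        MMD between the reference data and the second data
    """
    sq_err = [(y - p) ** 2 for y, p in zip(ref_y, ref_pred)]
    b = sum(sq_err) / len(sq_err)  # assumed B: MSE on the reference data
    total = 0.0
    for xi, ei in zip(ref_x, sq_err):
        for xj, ej in zip(ref_x, sq_err):
            k = math.exp(-sum(abs(u - v) for u, v in zip(xi, xj)) / sigma)
            total += k * ei * ej  # K_ij * (Y_i - P_i)^2 * (Y_j - P_j)^2
    a = math.sqrt(total)  # equation (3)
    return b + d * a, b, a
```

Returning B and A alongside P makes it straightforward to display the two terms B and D × A separately.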

FIG. 6 is a schematic diagram of a user interface that presents the mean squared error estimates of this example to the user as a graph. The graph in FIG. 6 plots the date on the horizontal axis and the mean squared error on the vertical axis, and shows the measured mean squared error and the value estimated by this example as line graphs. FIG. 6 also shows, as a bar graph, the breakdown of the estimated mean squared error P in equation (1) into the value of B, the first term on the right side of equation (1), and the value of D×A, the second term. Because it takes one month for the value of the objective variable (the number of ice creams sold) to become known, the measured mean squared error can be calculated and plotted only for predictions made up to one month ago. In contrast, the performance estimate according to the present disclosure can be calculated up to and including today's predictions. Displaying the breakdown at the same time also makes it easier for the user to understand the estimated prediction performance and how the estimate changes.

(Example 2: Disease diagnosis prediction based on initial symptoms)
Next, as Example 2, a case of predicting a disease diagnosis from initial symptoms will be described. More specifically, symptoms such as the patient's body temperature today, yesterday, and the day before yesterday, and the presence or absence of a sore throat, nasal congestion, cough, and fatigue are used as explanatory variables, and the patient's disease (for example, a cold, influenza, allergic rhinitis, streptococcal infection, acute bronchitis, or measles) is predicted by the prediction model as the objective variable. Determining the actual disease requires diagnosis by a doctor, and it can take time until the diagnosis is made, until the symptoms needed for diagnosis appear, and until the test results needed for diagnosis are obtained. Consequently, the prediction accuracy of the prediction model for, say, the most recent 30 patients cannot be known until diagnostic results are available for all of them. The estimation device according to the present disclosure is therefore used to estimate the prediction accuracy for the most recent 30 patients.

First, the prediction model is created using, as the data for creating the prediction model, data for a fixed number of past patients, for example 1000 previously acquired patients: their symptoms as the explanatory variable data and their disease determinations as the objective variable data. Here, the data for the past 1000 patients serve as the first explanatory variable data and the objective variable data corresponding to the first explanatory variable data, and are collectively called the first data. The created prediction model and the first data are held in a predetermined storage area.

Next, during actual operation, the prediction accuracy of the prediction model for the most recent 30 patients is estimated. That is, the explanatory variable data for the most recent 30 patients is used as the second explanatory variable data. Estimating the prediction accuracy requires a method for selecting the reference data, a method for calculating the value of D in equation (1), and a method for calculating the value of A in equation (1).

In this example, all of the first data is selected as the reference data. As described below, the value of D in equation (1) is calculated in this example based on whether each sample's symptoms are unknown in comparison with the symptoms in the reference data. The symptoms that are actually known are those in all of the first data; selecting all of the first data as the reference data therefore treats all of it as known symptoms and enables a more suitable estimate of the prediction accuracy. Depending on how D is calculated, the amount of first data, the availability of domain knowledge about symptoms, and so on, the reference data may be selected by a more elaborate method, but in this example all of the data is selected for the reason given above.

The value of D in equation (1) is the proportion of samples in the Min-Max-normalized second explanatory variable data for which no sample of the reference explanatory variable data exists within a fixed distance, for example within a Euclidean distance of 1.0. This measures the proportion of samples in the second explanatory variable data whose explanatory variables are unknown to the prediction model, that is, not covered by the first explanatory variable data, on which the model has been neither trained nor validated. If every sample of the second explanatory variable data is contained in the first explanatory variable data, the value of D is 0.
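This proportion can be sketched as follows. The sketch assumes that the Min-Max scaling is fitted on the reference data and applied to both data sets, and that "within 1.0" includes a distance of exactly 1.0; the function names are hypothetical.

```python
import math


def minmax_fit(rows):
    """Return per-column minimum and maximum values."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def minmax_apply(rows, lows, highs):
    """Scale each column to [0, 1]; constant columns map to 0.0 (assumption)."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def novelty_fraction(reference, second, radius=1.0):
    """Fraction of second samples with no reference sample within `radius`."""

    def is_novel(sample):
        return all(math.dist(sample, ref) > radius for ref in reference)

    return sum(is_novel(s) for s in second) / len(second)
```

If every second sample coincides with some reference sample, `novelty_fraction` returns 0.0, matching the statement that D is 0 in that case.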

Finally, the value of A in equation (1) is set to the constant -1. Together with the method for calculating D, this is an estimation method based on the assumption that the prediction model's predictions for samples on which it has been neither trained nor validated are mostly wrong.

With the above reference data selection method and the above settings for D and A in equation (1), the prediction accuracy of the prediction model for the most recent 30 patients can be estimated by taking the model's accuracy on the model creation data as the upper limit and treating the model's predictions for samples on which it has been neither trained nor validated as wrong.
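The resulting accuracy estimate for Example 2 can be sketched in a few lines. It assumes that equation (1) has the form P = B + D × A and that B is the accuracy of the prediction model on the reference data; the function names and the sample labels are hypothetical.

```python
def accuracy(labels, predictions):
    """Fraction of reference samples whose predicted disease matches the diagnosis."""
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)


def estimate_accuracy(b, d, a=-1.0):
    """Assumed form of equation (1), P = B + D * A, with A fixed at -1."""
    return b + d * a


# Hypothetical reference data: 3 of 4 predictions are correct, so B = 0.75.
b = accuracy(["cold", "flu", "cold", "flu"], ["cold", "flu", "flu", "flu"])
# Hypothetical D: a quarter of the recent patients show unknown symptoms.
print(estimate_accuracy(b, d=0.25))  # 0.5
```

With D = 0 the estimate equals B, which is why the accuracy on the reference data acts as the upper limit.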

The present disclosure has been described above with reference to the embodiments and examples, but it is not limited to them. Various changes that those skilled in the art can understand may be made to the configuration and details of the present disclosure within its scope. In addition, each of the means and functions described above may be executed by an information processing device installed and connected at any location on a network, that is, by so-called cloud computing.

The program described above can be stored on various types of non-transitory computer-readable media and supplied to a computer. Non-transitory computer-readable media include various types of tangible storage media. Examples of non-transitory computer-readable media include magnetic recording media (for example, flexible disks, magnetic tapes, and hard disk drives), magneto-optical recording media (for example, magneto-optical disks), CD-ROM (Read Only Memory), CD-R, CD-R/W, and semiconductor memory (for example, mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, and RAM (Random Access Memory)). The program may also be supplied to a computer by various types of transitory computer-readable media. Examples of transitory computer-readable media include electrical signals, optical signals, and electromagnetic waves. A transitory computer-readable medium can supply the program to a computer via a wired communication path, such as an electric wire or optical fiber, or via a wireless communication path.

<Supplementary notes>
Part or all of the above embodiments may also be described as in the following supplementary notes. The configurations of the estimation device, estimation method, and program of the present invention are outlined below. However, the present invention is not limited to the following configurations.
(Supplementary note 1)
An estimation device comprising:
data selection means for selecting, from first explanatory variable data prepared according to a preset prediction model and first objective variable data corresponding to the first explanatory variable data, first explanatory variable data and the first objective variable data corresponding to that first explanatory variable data, based on second explanatory variable data with which no objective variable is associated; and
performance estimation means for estimating the prediction performance of the prediction model on the second explanatory variable data, based on a comparison between prediction data obtained by inputting the selected first explanatory variable data into the prediction model and the first objective variable data corresponding to that first explanatory variable data.
(Supplementary note 2)
The estimation device according to supplementary note 1, wherein
the performance estimation means estimates the prediction performance of the prediction model on the second explanatory variable data, based on the comparison between the prediction data and the first objective variable data and on a comparison between the second explanatory variable data and the selected first explanatory variable data.
(Supplementary note 3)
The estimation device according to supplementary note 2, wherein
the performance estimation means estimates the prediction performance of the prediction model on the second explanatory variable data, based on the prediction performance of the prediction model on the selected first explanatory variable data, which is based on the comparison between the prediction data and the first objective variable data, and on a difference in data content, according to a preset criterion, between the second explanatory variable data and the selected first explanatory variable data.
(Supplementary note 4)
The estimation device according to supplementary note 2, wherein
the performance estimation means estimates that the performance of the prediction model on the second explanatory variable data deteriorates as the difference in data content, according to a preset criterion, between the second explanatory variable data and the selected first explanatory variable data increases.
(Supplementary note 5)
The estimation device according to supplementary note 4, wherein
the performance estimation means performs the estimation such that the value of the prediction performance of the prediction model on the selected first explanatory variable data, which is based on the comparison between the prediction data and the first objective variable data, deteriorates as the difference in data content, according to a preset criterion, between the second explanatory variable data and the selected first explanatory variable data increases.
(Supplementary note 6)
The estimation device according to supplementary note 4, wherein
the performance estimation means calculates a deterioration rate of the performance of the prediction model on the second explanatory variable data, the rate corresponding to the difference between the distribution of the second explanatory variable data and the distribution of the selected first explanatory variable data, by using a formula for the prediction performance deterioration rate based on distributionally robust optimization that computes the rate from the selected first explanatory variable data and from the comparison between the prediction data and the selected first objective variable data, and estimates the performance of the prediction model on the second explanatory variable data based on the deterioration rate.
(Supplementary note 7)
The estimation device according to supplementary note 1, wherein
the data selection means selects, from the first explanatory variable data, data different from the data used when training the prediction model.
(Supplementary note 8)
The estimation device according to supplementary note 1, wherein
the data selection means selects, from the first explanatory variable data, data that has a preset relationship with the second explanatory variable data with respect to the prediction performance of the prediction model.
(Supplementary note 9)
The estimation device according to supplementary note 1, wherein
the data selection means selects, from the first explanatory variable data, first explanatory variable data whose difference in data content from the second explanatory variable data, according to a preset criterion, is smaller than that of the other data.
(Supplementary note 10)
An estimation method comprising:
selecting, from first explanatory variable data prepared according to a preset prediction model and first objective variable data corresponding to the first explanatory variable data, first explanatory variable data and the first objective variable data corresponding to that first explanatory variable data, based on second explanatory variable data with which no objective variable is associated; and
estimating the prediction performance of the prediction model on the second explanatory variable data, based on a comparison between prediction data obtained by inputting the selected first explanatory variable data into the prediction model and the first objective variable data corresponding to that first explanatory variable data.
(Supplementary note 11)
A computer-readable storage medium storing a program for causing a computer to execute processing comprising:
selecting, from first explanatory variable data prepared according to a preset prediction model and first objective variable data corresponding to the first explanatory variable data, first explanatory variable data and the first objective variable data corresponding to that first explanatory variable data, based on second explanatory variable data with which no objective variable is associated; and
estimating the prediction performance of the prediction model on the second explanatory variable data, based on a comparison between prediction data obtained by inputting the selected first explanatory variable data into the prediction model and the first objective variable data corresponding to that first explanatory variable data.

1 Estimation device
3 Reference data selection unit
4 Performance estimation unit
10 Second data acquisition unit
20 Output unit
30 Control unit
40 Storage unit
41 First data
42 Prediction model
43 Parameter information
100 Estimation device
101 CPU
102 ROM
103 RAM
104 Program group
105 Storage device
106 Drive device
107 Communication interface
108 Input/output interface
109 Bus
110 Storage medium
111 Communication network
121 Data selection means
122 Performance estimation means

Claims

An estimation device comprising:
data selection means for selecting, from first explanatory variable data prepared according to a preset prediction model and first objective variable data corresponding to the first explanatory variable data, first explanatory variable data and the first objective variable data corresponding to that first explanatory variable data, based on second explanatory variable data with which no objective variable is associated; and
performance estimation means for estimating the prediction performance of the prediction model on the second explanatory variable data, based on a comparison between prediction data obtained by inputting the selected first explanatory variable data into the prediction model and the first objective variable data corresponding to that first explanatory variable data.

The estimation device according to claim 1, wherein
the performance estimation means estimates the prediction performance of the prediction model for the second explanatory variable data based on a comparison between the prediction data and the first objective variable data and on a comparison between the second explanatory variable data and the selected first explanatory variable data.

The estimation device according to claim 2, wherein
the performance estimation means estimates the prediction performance of the prediction model for the second explanatory variable data based on (i) the prediction performance of the prediction model for the selected first explanatory variable data, obtained from a comparison between the prediction data and the first objective variable data, and (ii) a difference in data content, measured by a preset standard, between the second explanatory variable data and the selected first explanatory variable data.

The estimation device according to claim 2, wherein
the performance estimation means estimates that the performance of the prediction model for the second explanatory variable data deteriorates as the difference in data content, measured by a preset standard, between the second explanatory variable data and the selected first explanatory variable data increases.

The estimation device according to claim 4, wherein
the performance estimation means performs the estimation such that the value of the prediction performance of the prediction model for the selected first explanatory variable data, obtained from a comparison between the prediction data and the first objective variable data, becomes worse as the difference in data content, measured by a preset standard, between the second explanatory variable data and the selected first explanatory variable data increases.

The estimation device according to claim 4, wherein
the performance estimation means calculates, from the selected first explanatory variable data, a deterioration rate of the performance of the prediction model for the second explanatory variable data according to the difference between the distribution of the second explanatory variable data and the distribution of the selected first explanatory variable data, using a formula for the prediction performance deterioration rate based on distributionally robust optimization, and estimates the performance of the prediction model for the second explanatory variable data based on the comparison between the prediction data and the selected first objective variable data and on the deterioration rate.

The estimation device according to claim 1, wherein
the data selection means selects, from among the first explanatory variable data, data different from the data used in training the prediction model.

The estimation device according to claim 1, wherein
the data selection means selects, from among the first explanatory variable data, data that has a preset relationship with the second explanatory variable data with respect to the prediction performance of the prediction model.

The estimation device according to claim 1, wherein
the data selection means selects, from among the first explanatory variable data, first explanatory variable data whose difference in data content from the second explanatory variable data, measured by a preset standard, is smaller than that of the other first explanatory variable data.

An estimation method comprising:
selecting, based on second explanatory variable data with which no objective variable is associated, first explanatory variable data and the first objective variable data corresponding to it, from among first explanatory variable data prepared according to a preset prediction model and first objective variable data corresponding to the first explanatory variable data; and
estimating the prediction performance of the prediction model for the second explanatory variable data, based on a comparison between prediction data obtained by inputting the selected first explanatory variable data into the prediction model and the first objective variable data corresponding to the selected first explanatory variable data.

A computer-readable storage medium storing a program for causing a computer to execute processing comprising:
selecting, based on second explanatory variable data with which no objective variable is associated, first explanatory variable data and the first objective variable data corresponding to it, from among first explanatory variable data prepared according to a preset prediction model and first objective variable data corresponding to the first explanatory variable data; and
estimating the prediction performance of the prediction model for the second explanatory variable data, based on a comparison between prediction data obtained by inputting the selected first explanatory variable data into the prediction model and the first objective variable data corresponding to the selected first explanatory variable data.
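
As an illustration of the dependent claims in which the estimate worsens as the difference between the second explanatory variable data and the selected first explanatory variable data grows, the following Python sketch inflates a reference error by a gap between the two data sets. Everything specific here is an assumption: the gap is taken as the Euclidean distance between feature means, and the linear rule with a hypothetical `sensitivity` parameter is only a placeholder. It is not the deterioration-rate formula based on distributionally robust optimization referred to in the claims, which is not reproduced here.

```python
import math
from statistics import fmean

def mean_vector(rows):
    """Per-feature mean of a list of equal-length feature tuples."""
    return [fmean(column) for column in zip(*rows)]

def distribution_gap(second_x, ref_x):
    """Assumed difference measure: distance between the feature means."""
    return math.dist(mean_vector(second_x), mean_vector(ref_x))

def adjusted_error(ref_error, gap, sensitivity=1.0):
    """Placeholder rule: the estimated error grows linearly with the gap.

    ref_error is an error measured on the selected first data (lower is
    better), so a larger result means a worse estimated performance.
    """
    return ref_error * (1.0 + sensitivity * gap)

# Toy data (hypothetical): one reference set and two unlabeled sets.
ref_x = [(1.0,), (2.0,)]
ref_error = 0.25             # e.g. mean absolute error on ref_x
near_x = [(1.0,), (2.0,)]    # same mean as the reference data
far_x = [(4.0,), (5.0,)]     # mean shifted by 3.0

near_estimate = adjusted_error(ref_error, distribution_gap(near_x, ref_x))
far_estimate = adjusted_error(ref_error, distribution_gap(far_x, ref_x))
print(near_estimate, far_estimate)
```

With no gap the estimate equals the reference error, and the estimate for the shifted data is strictly worse, which is the monotonic behaviour the claims describe.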