WO2023223533A1 - Estimation device, estimation method, and program - Google Patents
Estimation device, estimation method, and program
- Publication number
- WO2023223533A1 (application PCT/JP2022/020927)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- explanatory variable
- variable data
- performance
- prediction
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/17—Function evaluation by approximation methods, e.g. inter- or extrapolation, smoothing, least mean square method
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Definitions
- The present disclosure relates to an estimation device, an estimation method, and a program.
- An object of the present disclosure is to provide an estimation device, an estimation method, and a program that can estimate the prediction performance of a prediction model without using data of the objective variable.
- An estimation device according to one aspect of the present invention includes:
- data selection means for selecting, from first explanatory variable data prepared for a preset prediction model and first objective variable data corresponding to the first explanatory variable data, first explanatory variable data and the corresponding first objective variable data, based on second explanatory variable data with which no objective variable is associated; and
- performance estimating means for estimating the prediction performance of the prediction model for the second explanatory variable data, based on a comparison between prediction data obtained by inputting the selected first explanatory variable data into the prediction model and the first objective variable data corresponding to the selected first explanatory variable data.
- An estimation method according to one aspect of the present invention includes: selecting, from first explanatory variable data prepared for a preset prediction model and first objective variable data corresponding to the first explanatory variable data, first explanatory variable data and the corresponding first objective variable data, based on second explanatory variable data with which no objective variable is associated; and estimating the prediction performance of the prediction model for the second explanatory variable data, based on a comparison between prediction data obtained by inputting the selected first explanatory variable data into the prediction model and the first objective variable data corresponding to the selected first explanatory variable data.
- A program according to one aspect of the present invention causes a computer to execute a process of: selecting, from first explanatory variable data prepared for a preset prediction model and first objective variable data corresponding to the first explanatory variable data, first explanatory variable data and the corresponding first objective variable data, based on second explanatory variable data with which no objective variable is associated; and estimating the prediction performance of the prediction model for the second explanatory variable data, based on a comparison between prediction data obtained by inputting the selected first explanatory variable data into the prediction model and the first objective variable data corresponding to the selected first explanatory variable data.
- With the configuration described above, the prediction performance of the prediction model during operation can be estimated without using data of the objective variable.
- FIG. 1 is a block diagram illustrating an example of the hardware configuration of an estimation device according to a first embodiment of the present disclosure.
- FIG. 2 is a block diagram illustrating an example of the configuration of the estimation device according to the first embodiment of the present disclosure.
- FIG. 3 is a flowchart illustrating an example of the operation of the estimation device according to the first embodiment of the present disclosure.
- FIG. 4 is a block diagram illustrating an example of the configuration of an estimation device according to a second embodiment of the present disclosure.
- FIG. 5 is a flowchart illustrating an example of the operation of the estimation device according to the second embodiment of the present disclosure.
- FIG. 6 is a diagram illustrating an example of an estimation process performed by the estimation device according to the second embodiment of the present disclosure.
- FIG. 1 is a block diagram showing an example of the hardware configuration of an estimation device according to the present embodiment.
- FIG. 2 is a block diagram showing an example of the configuration of the estimation device.
- FIG. 3 is a flowchart showing an example of the operation of the estimation device. Note that this embodiment outlines the configuration of the estimation device and the estimation method that are explained in the second embodiment described later.
- The estimation device 100 is constituted by a general information processing device and has, as an example, the following hardware configuration:
- CPU (Central Processing Unit) 101
- ROM (Read Only Memory) 102
- RAM (Random Access Memory) 103
- Program group 104 loaded into the RAM 103
- Storage device 105 that stores the program group 104
- Drive device 106 that reads from and writes to a storage medium 110 external to the information processing device
- Communication interface 107 that connects to a communication network 111 outside the information processing device
- Input/output interface 108 that inputs and outputs data
- Bus 109 that connects the components
- FIG. 1 shows an example of the hardware configuration of the information processing device that serves as the estimation device 100; the hardware configuration of the information processing device is not limited to this example.
- For example, the information processing device may consist of only part of the configuration described above, such as a configuration without the drive device 106.
- Instead of the CPU described above, the information processing device may use a GPU (Graphics Processing Unit), a DSP (Digital Signal Processor), an MPU (Micro Processing Unit), an FPU (Floating-point number Processing Unit), a PPU (Physics Processing Unit), a TPU (Tensor Processing Unit), a quantum processor, a microcontroller, or a combination thereof.
- The estimation device 100 can be equipped with the data selection means 121 and the performance estimating means 122 shown in FIG. 2, which are constructed when the CPU 101 acquires and executes the program group 104.
- The program group 104 is, for example, stored in advance in the storage device 105 or the ROM 102, and is loaded into the RAM 103 and executed by the CPU 101 as needed.
- The program group 104 may instead be supplied to the CPU 101 via the communication network 111, or may be stored in advance in the storage medium 110, from which the drive device 106 reads the program and supplies it to the CPU 101.
- Alternatively, the data selection means 121 and the performance estimating means 122 may be constructed from dedicated electronic circuits that realize these means.
- The estimation device 100 has a function of acquiring a prediction model, first explanatory variable data, first objective variable data corresponding to the first explanatory variable data, and second explanatory variable data.
- When executed in the estimation device 100, the prediction model outputs a prediction of the corresponding objective variable data for the explanatory variable data that is input.
- The output data is called prediction data.
- The first explanatory variable data and the first objective variable data are data prepared for the prediction model.
- For example, they include first explanatory variable data and first objective variable data used as training data for generating the prediction model by machine learning, and first explanatory variable data and first objective variable data used for verifying the prediction model.
- The second explanatory variable data is explanatory variable data obtained during operation of the prediction model or in another task; no objective variable data exists for the second explanatory variable data, and none is associated with it.
- The estimation device 100 may acquire the prediction model and each set of data from outside the estimation device 100.
- In that case, the estimation device 100 may include a communication device for communicating with other devices on the network.
- If the estimation device 100 includes a storage device (not shown), the prediction model and each set of data may be acquired from the storage device.
- The data selection means 121 selects all or part of the first explanatory variable data, together with the corresponding first objective variable data, based on the second explanatory variable data. In doing so, the data selection means 121 selects, based on the content of the second explanatory variable data, all or part of the first explanatory variable data that is more suitable for estimating the prediction performance of the prediction model for the second explanatory variable data. More specifically, the selection is made so that the estimated value of the prediction performance index of the prediction model for the second explanatory variable data, obtained by the performance estimating means 122 (described later), comes closer to the actual value.
- As a method for selecting all or part of the first explanatory variable data, the data selection means 121 may select explanatory variable data that was not used for training the prediction model.
- It is also preferable to use explanatory variable data that is close to the second explanatory variable data in the explanatory variable space, together with the corresponding objective variable data.
- Alternatively, explanatory variable data may be selected so that its distribution in the explanatory variable space (hereinafter simply referred to as its distribution) is close to the distribution of the second explanatory variable data.
- The selection method may be chosen to suit the performance estimation method used by the performance estimating means 122, which will be described later.
- The explanatory variable data selected by the data selection means 121 is referred to as reference explanatory variable data.
- The data selection means 121 also selects the objective variable data corresponding to the selected reference explanatory variable data; this is referred to as reference objective variable data.
- The reference explanatory variable data and the reference objective variable data are collectively referred to as reference data.
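- As an illustration of selecting data that is close in the explanatory variable space, the following is a minimal sketch and not the method defined in this disclosure. It assumes Euclidean distance and a nearest-neighbour rule; the function name `select_reference_data` and the parameter `k` are hypothetical.

```python
import math


def select_reference_data(first_x, first_y, second_x, k=1):
    """Pick, for every second-data sample, its k nearest first-data samples.

    Assumption: closeness is measured by Euclidean distance in the
    explanatory variable space. Returns the selected explanatory variable
    data and the corresponding objective variable data (the reference data).
    """
    selected = set()
    for query in second_x:
        # Rank first-data indices by distance to this second-data sample.
        order = sorted(range(len(first_x)),
                       key=lambda i: math.dist(first_x[i], query))
        selected.update(order[:k])
    indices = sorted(selected)
    return [first_x[i] for i in indices], [first_y[i] for i in indices]


# Toy data: two clusters in the first data, second data near the upper one.
first_x = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
first_y = [0.1, 1.2, 5.3, 6.1]
second_x = [[5.2, 5.1], [5.9, 4.8]]
ref_x, ref_y = select_reference_data(first_x, first_y, second_x, k=1)
print(ref_x, ref_y)
```

Taking the union of neighbours keeps each first-data sample at most once, so the reference data never contains duplicates even when several second-data samples share a neighbour.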
- The performance estimating means 122 estimates the prediction performance of the prediction model for the second explanatory variable data, based on a comparison between the prediction data obtained by inputting the reference explanatory variable data into the prediction model and the reference objective variable data corresponding to the reference explanatory variable data. More specifically, for example, the performance estimating means 122 may calculate a performance index from the prediction data obtained by inputting the reference explanatory variable data into the prediction model and the reference objective variable data, and use that value as the estimated value of the performance index of the prediction model for the second explanatory variable data.
- The performance estimating means 122 may further take into account the difference in tendency between the second explanatory variable data and the reference explanatory variable data, so that the larger the difference in tendency, the worse the estimated value of the performance index becomes.
- Here, the difference in tendency is information based on the result of comparing the contents of two sets of data, for example, a non-negative numerical value that represents the extent to which the contents of the two sets of data differ.
- For example, a distance or an index defined between the two data distributions may be used.
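- One simple non-negative index of this kind can be sketched as follows. This is an assumed choice for illustration only: it compares the per-variable means of the two data sets, and the names `mean_vector` and `tendency_difference` are hypothetical. The disclosure itself does not fix a particular distance.

```python
import math


def mean_vector(samples):
    """Per-variable mean of a list of equally long sample vectors."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def tendency_difference(data_a, data_b):
    """Euclidean distance between the mean vectors of two data sets.

    Assumed index: it is non-negative and equals 0 when the means coincide.
    It ignores spread and shape, so richer distribution distances may be
    preferable in practice.
    """
    return math.dist(mean_vector(data_a), mean_vector(data_b))


print(tendency_difference([[0, 0], [2, 2]], [[1, 1]]))
print(tendency_difference([[0, 0]], [[3, 4]]))
```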
- The performance index used by the performance estimating means 122 quantitatively evaluates how good or bad the prediction performance of the prediction model is.
- The actual value of the performance index is calculated based on a comparison between the prediction data obtained by inputting the second explanatory variable data into the prediction model and the objective variable data corresponding to the second explanatory variable data. Therefore, if the objective variable data is not available, the actual value of the performance index cannot be calculated.
- Specific examples of performance indices include the mean absolute error, mean squared error, and coefficient of determination for regression, and the accuracy, F1 score, and cross entropy for classification.
- The performance indices that can be used by the performance estimating means 122 are not limited to these; any evaluation index can be used as the estimation target.
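- The indices named above, and their use on reference data, can be sketched as follows. This is a minimal illustration: the helper `estimate_on_reference` is a hypothetical name, and it only implements the basic case in which the index value on the reference data is used directly as the estimate, without any correction for the difference in tendency.

```python
def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def coefficient_of_determination(y_true, y_pred):
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def estimate_on_reference(model, ref_x, ref_y, index=mean_absolute_error):
    """Evaluate `index` on the reference data and return it as the estimate."""
    predictions = [model(x) for x in ref_x]
    return index(ref_y, predictions)


# Toy prediction model (assumption): doubles the single explanatory variable.
def model(x):
    return 2.0 * x[0]


ref_x = [[1.0], [2.0], [3.0]]
ref_y = [2.0, 5.0, 6.0]
print(estimate_on_reference(model, ref_x, ref_y))
print(estimate_on_reference(model, ref_x, ref_y, index=mean_squared_error))
```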
- The estimation device 100 configured as described above executes the estimation method shown in the flowchart of FIG. 3 by the functions of the data selection means 121 and the performance estimating means 122.
- That is, the estimation device 100 selects, from first explanatory variable data prepared for a preset prediction model and first objective variable data corresponding to the first explanatory variable data, first explanatory variable data and the corresponding first objective variable data, based on second explanatory variable data with which no objective variable is associated (step S101). The estimation device 100 then estimates the prediction performance of the prediction model for the second explanatory variable data, based on a comparison between the prediction data obtained by inputting the selected first explanatory variable data into the prediction model and the first objective variable data corresponding to the selected first explanatory variable data (step S102).
- As an example, consider the case where the prediction model is a prediction model in operation, the explanatory variable data used to create the prediction model is used as the first explanatory variable data, the objective variable data used to create the prediction model (i.e., the objective variable data used to train the prediction model) is used as the objective variable data corresponding to the first explanatory variable data, and the explanatory variable data during operation is used as the second explanatory variable data.
- In this case, the estimation device 100 can estimate the prediction performance of the prediction model during operation using the explanatory variable data during operation, without waiting for the objective variable data during operation to be obtained.
- The configuration shown in FIG. 2 is just an example and does not limit the configuration of the estimation device according to this embodiment.
- For example, some functions of the estimation device 100 may be realized by a plurality of devices working together.
- For example, the estimation device 100 may be configured by a first estimation device that obtains the first explanatory variable data, the objective variable data corresponding to the first explanatory variable data, and the second explanatory variable data, and a second estimation device that obtains the prediction model.
- The estimation device 100 may include a plurality of data selection means and a plurality of performance estimating means.
- In that case, the estimation device 100 may include a device that calculates the final estimated value of the estimation device 100 based on the estimated values obtained by the plurality of performance estimating means. If the estimation device 100 has a function other than estimating the value of the performance index of the prediction model, it may include a device not shown in FIG. 2.
- FIG. 4 is a block diagram for explaining an example of the configuration of the estimation device according to the present embodiment.
- FIG. 5 is a flowchart for explaining an example of the operation of the estimation device.
- FIG. 6 is a diagram for explaining the situation when the estimation device is actually operated.
- In this embodiment, the explanatory variable data used to create the prediction model is used as the first explanatory variable data. The objective variable data corresponding to the explanatory variable data used to create the prediction model is used as the first objective variable data corresponding to the first explanatory variable data. The explanatory variable data during operation of the prediction model is used as the second explanatory variable data.
- There are also cases where a prediction model created using explanatory variable data and objective variable data obtained in one task is applied to another task, and the prediction performance of the prediction model for the explanatory variable data of the other task is to be estimated.
- In such cases, the explanatory variable data of the other task can be used instead of the explanatory variable data during operation of the prediction model, and the estimation device according to the present disclosure can still be applied.
- Data other than the data used to create the prediction model may also be used as the first explanatory variable data and the corresponding first objective variable data. For example, explanatory variable data from the operation of the prediction model for which objective variable data has already been obtained may be used together with that objective variable data.
- The estimation device 1 in this embodiment is composed of one or more information processing devices, each including an arithmetic device and a storage device.
- The estimation device 1 includes a second data acquisition unit 10, an output unit 20, a control unit 30, and a storage unit 40.
- The control unit 30 further includes a reference data selection unit 3 and a performance estimation unit 4.
- The functions of the second data acquisition unit 10, the output unit 20, and the reference data selection unit 3 and performance estimation unit 4 included in the control unit 30 are realized by the arithmetic device executing programs, stored in the storage device, for realizing each function.
- The storage unit 40 of the estimation device 1 stores first data 41, a prediction model 42, and parameter information 43 in the storage device. Note that the control unit 30 performs data communication with each of the second data acquisition unit 10, the output unit 20, and the storage unit 40 via a communication network or the like.
- With the configuration described above, the estimation device 1 has the function of estimating the prediction performance of the prediction model 42 stored in the storage unit 40 for the second explanatory variable data acquired by the second data acquisition unit 10, that is, for the explanatory variable data during operation of the prediction model 42.
- Each component is explained in detail below.
- The second data acquisition unit 10 acquires the explanatory variable data during operation of the prediction model as the second explanatory variable data.
- The explanatory variable data during operation (that is, the second explanatory variable data) is data obtained during operation, for which the prediction model predicts the corresponding objective variable data.
- The second data acquisition unit 10 may include an interface that accepts user input; specifically, this includes input devices such as a touch panel, buttons, and voice input devices.
- If the estimation device 1 itself has a function of collecting the second explanatory variable data, the second data acquisition unit 10 corresponds to, for example, a sensor device such as a camera, a device that processes the information acquired from the sensor device, and a device that stores the processed information as second explanatory variable data.
- If the second explanatory variable data is acquired from another device, a communication device for communicating with the other device corresponds to the second data acquisition unit 10.
- The output unit 20 outputs the estimated value of the performance index calculated by the control unit 30.
- For example, a display for displaying the estimated value, a speaker for outputting sound, and the like correspond to the output unit 20.
- If the estimated value is output to another device, a communication device for communicating with the other device corresponds to the output unit 20.
- The estimation device 1 may have an alert function that gives notice of deterioration of the prediction performance when the estimated value falls below a certain value; in that case, the device that issues the notification corresponds to the output unit 20.
- The storage unit 40 stores the first data 41, the prediction model 42, and the parameter information 43 in advance.
- The storage unit 40 transmits the first data 41, the prediction model 42, and the parameter information 43 to the control unit 30 as necessary.
- The first data 41 includes the explanatory variable data used to create the prediction model 42 (first explanatory variable data) and the objective variable data corresponding to that explanatory variable data (first objective variable data).
- The first data 41 includes training data used for training the prediction model 42 (that is, explanatory variable data used for training the prediction model 42 and the objective variable data corresponding to that explanatory variable data) and verification data used to verify the generalization performance of the prediction model 42 (that is, explanatory variable data used to verify the prediction model 42 and the objective variable data corresponding to that explanatory variable data).
- The prediction model 42 is a prediction model created using the first data 41.
- For example, the prediction model 42 may be created by so-called supervised machine learning so that, when the explanatory variable data of the training data in the first data 41 is input, it predicts the objective variable data of the training data.
- The generalization performance of the prediction model 42 may be verified using the verification data of the first data 41.
- Algorithms that implement the prediction model 42 include linear regression, decision trees, random forests, neural networks, and the like. These are just examples; the algorithm for realizing the prediction model 42 is not limited as long as it can predict the objective variable data based on the explanatory variable data.
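- The relationship between training data, verification data, and the prediction model can be sketched with the simplest of the listed algorithms. The example below is a minimal illustration assuming a single explanatory variable and ordinary least squares; `fit_linear_regression` and the toy data are hypothetical and are not taken from this disclosure.

```python
def fit_linear_regression(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Training data: used to create the prediction model.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 3.0, 5.0, 7.0]
model = fit_linear_regression(train_x, train_y)

# Verification data: held out to check generalization performance.
valid_x = [4.0, 5.0]
valid_y = [9.0, 11.5]
valid_mae = sum(abs(model(x) - y) for x, y in zip(valid_x, valid_y)) / len(valid_x)
print(valid_mae)
```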
- The parameter information 43 is information on the parameters necessary for estimating the value of the performance index. For example, it may include information regarding the performance index to be estimated, the reference data acquisition method described later, and the performance index estimation method.
- The prediction model, the first explanatory variable data, and the first objective variable data corresponding to the first explanatory variable data do not change from one estimation to the next, so they are stored in the storage unit 40.
- This reduces the processing required to acquire them.
- In contrast, the second explanatory variable data, that is, the explanatory variable data used during operation of the prediction model, changes with each estimation. Acquiring it at each estimation makes it possible to estimate the value of the performance index for that data.
- The control unit 30 includes the reference data selection unit 3 and the performance estimation unit 4. The control unit 30 also includes a CPU, ROM, RAM, and the like (not shown), and performs various controls and calculations for the reference data selection unit 3 and the performance estimation unit 4.
- The reference data selection unit 3 selects all or part of the explanatory variable data of the first data 41, based on the second explanatory variable data acquired by the second data acquisition unit 10 and on the prediction model 42, the first data 41, and the parameter information 43 acquired from the storage unit 40. At this time, the reference data selection unit 3 also selects the objective variable data corresponding to the selected explanatory variable data of the first data 41.
- The selected explanatory variable data is referred to as reference explanatory variable data.
- The objective variable data of the first data 41 corresponding to the reference explanatory variable data is referred to as reference objective variable data.
- The reference explanatory variable data and the reference objective variable data are collectively referred to as reference data.
- As a method for selecting the reference data, the reference data selection unit 3 may use only the verification data of the first data 41 as reference data.
- Performance index estimation formula (1), which will be described later, uses the value of the performance index of the prediction model 42 on the reference data as the baseline for the estimated value of the performance index of the prediction model 42 on the second explanatory variable data.
- The training data may also be included in the reference data; including the training data increases the number of samples in the reference data, which may help in estimating the value of the performance index appropriately.
- As will be described later, the reference data selection unit 3 may be unable to select relevant first explanatory variable data, for example when the attributes or characteristics of the data cannot be classified from the content of the second explanatory variable data. In such cases, only the verification data, or all of the data including the training data, may be selected.
- Alternatively, a method may be used in which all or part of the first data 41 that is highly relevant to the prediction performance of the prediction model 42 for the second explanatory variable data is selected as reference data.
- Here, "highly relevant" means that there is a positive correlation between the prediction performance of the prediction model 42 for the second explanatory variable data and its prediction performance for the reference explanatory variable data, so that the two prediction performances are considered comparable.
- The data that is highly relevant to the second explanatory variable data is determined based on a preset criterion.
- Performance index estimation formula (1), which will be described later, uses the value of the performance index of the prediction model 42 on the reference data as the baseline for the estimated value of the performance index of the prediction model 42 on the second explanatory variable data. Therefore, by using data that is highly relevant to the second explanatory variable data as reference data, the estimated value of the performance index of the prediction model 42 can be calculated more suitably.
- When selecting data highly relevant to the second explanatory variable data from the first data 41, empirical knowledge about the data (generally called domain knowledge) or the characteristics of the prediction model 42 may be used. As a specific method, if the explanatory variable data includes date and time information, data for the same month and day, the same day of the week, or the same season as any sample in the second explanatory variable data may be acquired from the first data 41. As another example, for an explanatory variable that the prediction model 42 particularly emphasizes during prediction, data whose value of that variable is close to that of the second explanatory variable data may be acquired from the first data 41.
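- The date-based selection described above can be sketched as follows. This is a minimal illustration that assumes each first-data sample is stored as a `(date, explanatory values, objective value)` tuple; the function name `select_by_calendar` and the `key` parameter are hypothetical.

```python
from datetime import date


def select_by_calendar(first_samples, second_dates, key="weekday"):
    """Keep first-data samples whose calendar attribute matches any second-data date.

    first_samples: list of (date, x, y) tuples (assumed layout).
    key: "weekday" matches the day of the week, "month" matches the month.
    """
    if key == "weekday":
        extract = lambda d: d.weekday()
    elif key == "month":
        extract = lambda d: d.month
    else:
        raise ValueError("key must be 'weekday' or 'month'")
    wanted = {extract(d) for d in second_dates}
    return [sample for sample in first_samples if extract(sample[0]) in wanted]


first_samples = [
    (date(2022, 5, 2), [1.0], 10.0),
    (date(2022, 5, 3), [2.0], 20.0),
    (date(2022, 5, 9), [3.0], 30.0),
]
# Operation-time data observed on 2022-05-16, exactly 7 and 14 days after two samples.
reference = select_by_calendar(first_samples, [date(2022, 5, 16)], key="weekday")
print([y for _, _, y in reference])
```

A season-based rule could be added in the same way by mapping months to seasons; which attribute is appropriate depends on domain knowledge about the task.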
- Equation (1) for estimating the performance index described below shows that the larger the difference in tendency between the data of the second explanatory variable and the data of the reference explanatory variable, the greater the performance index of the predictive model 42 with respect to the data of the second explanatory variable. Calculate the estimated value so that the value becomes worse.
- the second explanatory variable data often includes only a limited amount of data and a limited range of data in the explanatory variable space compared to the first data 41.
- the explanatory variable data of the first data 41 may include samples that are spatially close to each sample of the second explanatory variable data.
- the predictive model 42 has already been trained or verified, through the data for creating the predictive model 42 (first data 41), on data close to the explanatory variable data seen during operation (second explanatory variable data).
- the actual value of the performance index of the prediction of the prediction model 42 for the data of the second explanatory variable becomes better. Therefore, by selecting, from the explanatory variable data of the first data 41, some data that has a small difference in tendency from the data of the second explanatory variable and using it as the data of the standard explanatory variable, it is possible to estimate a value close to the actual value of the prediction performance index of the predictive model 42 for the data of the second explanatory variable.
- the index of difference in tendency may be calculated using the same index as the index of difference in tendency used in performance index estimation formula (1) described later, or may be calculated using a different index. Further, calculation may be performed by combining a plurality of different indicators.
- a selection method using a greedy method may be used. First, one or more samples closest to the second explanatory variable data are obtained from the explanatory variable data of the first data 41 and used as temporary standard explanatory variable data. Next, from the samples of the explanatory variable data of the first data 41 that are not yet included in the temporary standard explanatory variable data, one or more samples are obtained that, when added to the temporary standard explanatory variable data, best reduce the difference in tendency between the temporary standard explanatory variable data and the second explanatory variable data, and these samples are added to the temporary standard explanatory variable data.
- the above addition process is repeated, for example, until the number of samples included in the temporary standard explanatory variable data exceeds a certain number, and finally the temporary standard explanatory variable data is used as the standard explanatory variable data.
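The greedy procedure above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the trend-difference index `mean_gap` (distance between feature means), the seeding rule (the single sample nearest the centre of the second data), and the stopping rule (stop once `n_select` samples are chosen) are all assumptions introduced here.

```python
import numpy as np

def mean_gap(ref, target):
    """Hypothetical trend-difference index: distance between feature means."""
    return float(np.linalg.norm(ref.mean(axis=0) - target.mean(axis=0)))

def greedy_select(first_x, second_x, n_select, gap=mean_gap):
    """Greedily pick row indices of first_x whose trend approaches second_x."""
    first_x = np.asarray(first_x, dtype=float)
    second_x = np.asarray(second_x, dtype=float)
    # Seed the temporary reference set with the sample nearest to the
    # centre of the second explanatory variable data (assumed seeding rule).
    centre = second_x.mean(axis=0)
    seed = int(np.argmin(np.linalg.norm(first_x - centre, axis=1)))
    chosen = [seed]
    remaining = [i for i in range(len(first_x)) if i != seed]
    while len(chosen) < n_select and remaining:
        # Add the candidate that leaves the smallest trend difference.
        best = min(remaining, key=lambda i: gap(first_x[chosen + [i]], second_x))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Usage: select two reference samples for a small second data set.
first_x = [[0, 0], [10, 10], [1, 1], [0.9, 1.1], [-5, -5]]
second_x = [[1, 1], [1.2, 0.8]]
reference_idx = greedy_select(first_x, second_x, n_select=2)
```

Any other index of difference in tendency (for example MMD) could be passed as `gap`; each iteration re-evaluates every remaining candidate, so the cost grows with the size of the first data.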
- the reference data selection unit 3 may use a selection method that combines two or more of the reference data selection methods described above. For example, the reference data selection unit 3 may first select the verification data from the first data 41, then select from it some data that is highly related to the data of the second explanatory variable, and then select from that some data that has a small difference in tendency from the data of the second explanatory variable. Further, as the reference data selection method, a more suitable selection method may be used in consideration of the specific estimation method used by the performance estimating section 4, which will be described later. In other words, if it is empirically or theoretically believed that the estimation method of the performance estimating section 4 can calculate the performance estimate more accurately with a specific reference data selection method, that selection method may be used.
- if the tendencies of the explanatory variable data of the first data 41 and the data of the second explanatory variable differ significantly, and every sample of the data of the second explanatory variable has low relevance to the samples contained in the explanatory variable data of the first data 41, then no matter how all or part of the explanatory variable data of the first data 41 is selected, the difference in tendency from the second explanatory variable data never becomes 0. Therefore, even when the tendencies differ significantly, selecting all or part of the explanatory variable data of the first data 41 as described above still makes it possible to estimate the deterioration in the prediction performance of the prediction model 42 for the data of the second explanatory variable.
- the performance estimating unit 4 calculates the estimated value P of the performance index of the prediction model 42 for the data of the second explanatory variable, based on the reference data, the data of the second explanatory variable, and the predictive model 42 and parameter information 43 acquired from the storage unit 40, using, for example, equation (1) described later.
- the value of the performance index estimated by the calculation is output to the output unit 20.
- B in equation (1) is the value of the performance index of the predictive model for the reference explanatory variable data and the reference objective variable data.
- the estimated value P of the performance index is value.
- when the data of the second explanatory variable and the data of the standard explanatory variable can be considered the same, the prediction performance is almost the same unless the relationship between the explanatory variable data and the objective variable data changes. Equation (1) allows estimation of the value of the performance index based on this fact.
- Equation (1) can accurately estimate the value of the performance index from the data of the standard explanatory variables and the data of the standard objective variables when no concept drift occurs and the data of the standard explanatory variable can be selected so that it has the same tendency as the data of the second explanatory variable.
- equation (1) is replaced by equation (1).
- the value of B in equation (1) is treated as the best value, and the performance deterioration is estimated based on the values of D and A in equation (1).
- the predictive performance of a predictive model deteriorates for explanatory variable data whose tendency differs from that of the explanatory variable data in the training data of the predictive model; based on this, equation (1) makes it possible to estimate the deterioration of the prediction performance of the prediction model 42 with respect to the data of the second explanatory variable.
- D in equation (1) is a non-negative value that quantitatively indicates the difference in tendency between the data of the second explanatory variable and the data of the reference explanatory variable.
- D becomes 0 when there is no difference in tendency between the data of the second explanatory variable and the data of the standard explanatory variable, and takes a large value when the difference is large.
- the difference between the distribution of data of the second explanatory variable and the distribution of data of the standard explanatory variable can be calculated, and the calculated value can be used as D.
- indicators for measuring differences in data distribution include Kullback-Leibler information amount, Jensen-Shannon information amount, Wasserstein distance, Maximum Mean Discrepancy (hereinafter referred to as MMD), and the like.
- These indicators are 0 when there is no difference in data distribution, and take on larger values as the difference in data distribution becomes larger.
- statistics such as the mean and variance of the explanatory variables may be calculated for each of the data of the second explanatory variable and the data of the standard explanatory variable, and D may be calculated based on the difference between them.
- the proportion of samples of the second explanatory variable data for which no sample of the standard explanatory variable data exists within a certain distance may be calculated, and this value may be used as D.
- all explanatory variables of the second explanatory variable data and the standard explanatory variable data may be used, or only some of them may be used. For example, only those explanatory variables that have a particularly strong influence on predictions made by the prediction model 42 may be used.
- in order to adjust the degree of influence of each explanatory variable on D, the value of each explanatory variable in the data of the second explanatory variable and the data of the standard explanatory variable may be converted before D is calculated.
- in order to equalize the degree of influence of each explanatory variable on D, each explanatory variable may be converted by Min-Max normalization or Z-Score normalization.
- as another example of such a conversion, the value of a given explanatory variable in the data of the second explanatory variable and the data of the reference explanatory variable may be doubled.
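The conversions above can be sketched as follows. This is a minimal illustration; the function name `scale_features`, the choice to fit the normalization parameters on the reference data and apply them to both data sets, and the `weights` argument (for example `[2, 1]` to double the first variable) are assumptions made here, not details given in the text.

```python
import numpy as np

def scale_features(ref_x, second_x, method="zscore", weights=None):
    """Normalize both data sets with parameters fitted on ref_x (assumed),
    then optionally re-weight individual explanatory variables."""
    ref_x = np.asarray(ref_x, dtype=float)
    second_x = np.asarray(second_x, dtype=float)
    if method == "zscore":
        shift = ref_x.mean(axis=0)
        scale = ref_x.std(axis=0)
    elif method == "minmax":
        shift = ref_x.min(axis=0)
        scale = ref_x.max(axis=0) - shift
    else:
        raise ValueError(f"unknown method: {method}")
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    w = np.ones(ref_x.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    return (ref_x - shift) / scale * w, (second_x - shift) / scale * w

# Usage: Min-Max normalize, then double the influence of the first variable.
ref_s, second_s = scale_features([[0, 10], [2, 30]], [[1, 20]],
                                 method="minmax", weights=[2, 1])
```

Fitting on the reference data keeps both data sets on the same scale, so a distance-based D compares like with like.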
- A in formula (1) is the rate of change in the value of the performance index according to D. If a higher value of the performance index means better performance of the prediction model (for example, when the coefficient of determination, accuracy, F1 score, etc. are used as the performance index), A takes a negative value; if a lower value of the performance index means better performance (for example, when mean absolute error, mean squared error, cross entropy, etc. are used as the performance index), A takes a positive value. Thereby, it is possible to formulate the deterioration of the prediction performance of the prediction model 42 for the data of the second explanatory variable in accordance with the increase in the value of D in equation (1).
- the value of A in formula (1) may be a constant, for example. More specifically, for example, if explanatory variable data and corresponding objective variable data are available in addition to the first data, the value of A may be set so that the estimated value of P according to equation (1), obtained when that explanatory variable data is used as the second explanatory variable data, approaches the value of the performance index actually measured with the corresponding objective variable data. Further, the value of A may be suitably set based on, for example, theoretical analysis or empirical experimental results.
- the value of A in equation (1) may instead be calculated based on the standard explanatory variable data and on a comparison between the predicted data obtained by inputting the standard explanatory variable data into the prediction model 42 and the standard objective variable data.
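Equation (1) itself does not appear in this part of the text. Judging from the later description of FIG. 6, which names B as the first term on the right side and D × A as the second term, it presumably has the following form; this reconstruction is an assumption, not a quotation.

```latex
P = B + D \times A \qquad \cdots (1)
```

Under this reading, D = 0 gives P = B, and a positive D moves P away from B in the direction set by the sign of A: downward for indices where higher is better (A < 0) and upward for error-type indices (A > 0).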
- the following formula (2) for calculating the predicted performance deterioration rate based on distribution robust optimization may be used.
- the calculation formula for the prediction performance deterioration rate based on this distribution robust optimization is calculated mathematically from the value of the prediction performance index for the data of the standard explanatory variable of the prediction model 42 and the distribution of the data of the standard explanatory variable.
- $A = \left( \sum_{i,j} K_{ij} \, L_i \, L_j \right)^{1/2}$ ... (2), where A is the value of A in formula (1); $\sum_{i,j}$ is the summation with i and j each running from 1 to the number of samples of the standard explanatory variable data; $K_{ij}$ is the value of the Laplace kernel computed for the i-th and j-th samples of the standard explanatory variable data; $L_i$ is the value of the prediction performance index calculated from the i-th value of the standard objective variable data and the output value obtained by inputting the i-th sample of the standard explanatory variable data into the prediction model; and $L_j$ is the value of the prediction performance index calculated in the same way for the j-th sample.
- the value of the performance index when the prediction performance is the worst can be estimated based on the MMD distance between the data of the standard explanatory variable and the data of the second explanatory variable.
- FIG. 5 is an example of a flowchart showing the processing procedure of the estimation device 1 in the first embodiment.
- the second data acquisition unit 10 acquires data of the second explanatory variable (step S1).
- the control unit 30 acquires the first data 41, the prediction model 42, and the parameter information 43 from the storage unit 40 (step S2).
- the reference data selection unit 3 selects reference data based on the second explanatory variable data, the first data 41, and the parameter information 43 (step S3).
- the performance estimating unit 4 calculates the estimated value of the performance index using equation (1) based on the reference data, the data of the second explanatory variable, the prediction model 42, and the parameter information 43 (step S4).
- the estimation device 1 outputs the estimated value of the performance index obtained by the performance estimation section 4 through the output section 20 (step S5).
- the process of acquiring the second explanatory variable data (step S1) may be executed after the process of acquiring the first data 41, the prediction model 42, and the parameter information 43 (step S2).
- step S2 may be divided into a plurality of steps, and for example, the process of acquiring the prediction model 42 in step S2 may be executed after the process of selecting reference data (step S3).
- Example 1 Forecasting the demand for ice cream in a supermarket one month later
- Example 1 describes a case in which, in a physical store of a certain supermarket, the current month, day, day of the week, season, weather, temperature, humidity, number of visitors, number of sales of related products, etc. are used as explanatory variables, and the demand for ice cream one month later is predicted as the objective variable.
- the actual number of sales, that is, the value of the objective variable (the number of ice creams in demand one month from now), cannot be known until one month has passed and the ice cream is sold.
- the mean square error of the prediction of the prediction model for the most recent week is estimated.
- the prediction model is created using data from a certain past period, for example three years of previously acquired data (i.e., three years of daily explanatory variable data and, as the objective variable data, the number of ice cream sales one month after each day), as the data for creating the prediction model.
- the three years' worth of data is referred to as first explanatory variable data and objective variable data corresponding to the first explanatory variable data, and is collectively referred to as first data.
- the created prediction model and first data are held in a predetermined storage area.
- Estimating prediction performance requires a method for selecting reference data, a method for calculating the value of D in equation (1), and a method for calculating the value of A in equation (1).
- the MMD between the reference explanatory variable data and the second explanatory variable data is used as the value of D in equation (1). More specifically, MMD is calculated using a Laplace kernel with a sigma (parameter) of 1. Also, before calculating D, Z-Score normalization is performed.
- the method of calculating the value of D using MMD, together with the method of calculating the value of A in equation (1) using equation (3) described later, constitutes a method for estimating the mean square error based on distribution robust optimization, and is an estimation method grounded in theoretical analysis.
- the value of A in equation (1) is calculated by the following equation (3), which makes equation (2) above concrete using the reference data and the prediction model. Calculating the value of A using equation (3) is an arithmetic method derived from analysis based on distribution robust optimization.
- $A = \left( \sum_{i,j} K_{ij} \, (Y_i - P_i)^2 \, (Y_j - P_j)^2 \right)^{1/2}$ ... (3), where A is the value of A in formula (1); $\sum_{i,j}$ is the summation with i and j each running from 1 to the number of samples of the standard explanatory variable data; $K_{ij}$ is the value of the Laplace kernel, with a sigma (parameter) of 1, computed for the i-th and j-th samples of the Z-Score normalized standard explanatory variable data; $Y_i$ and $Y_j$ are the i-th and j-th values of the standard objective variable data; and $P_i$ and $P_j$ are the output values obtained when the i-th and j-th samples of the standard explanatory variable data are input into the prediction model.
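The calculation of Example 1 can be sketched as follows. This is a minimal illustration under stated assumptions: the Laplace kernel is taken to be exp(-||a - b||_1 / sigma), MMD is the biased sample estimate, Z-Score parameters are fitted on the reference data and applied to both data sets, and equation (1) is taken to be P = B + D × A as suggested by the description of FIG. 6. None of these details, nor the function names, are specified in the text. The double sum of equation (3) is implemented as written, without normalizing by the number of samples.

```python
import numpy as np

def laplace_kernel(a, b, sigma=1.0):
    """Assumed Laplace kernel: exp(-L1 distance / sigma) for all row pairs."""
    dist = np.abs(a[:, None, :] - b[None, :, :]).sum(axis=2)
    return np.exp(-dist / sigma)

def mmd(x, y, sigma=1.0):
    """Biased sample estimate of MMD between the rows of x and y."""
    m2 = (laplace_kernel(x, x, sigma).mean()
          + laplace_kernel(y, y, sigma).mean()
          - 2.0 * laplace_kernel(x, y, sigma).mean())
    return float(np.sqrt(max(m2, 0.0)))

def rate_a(ref_xz, ref_y, pred, sigma=1.0):
    """Equation (3): sqrt(sum_ij K_ij (Y_i-P_i)^2 (Y_j-P_j)^2)."""
    loss = (ref_y - pred) ** 2
    return float(np.sqrt(loss @ laplace_kernel(ref_xz, ref_xz, sigma) @ loss))

def estimate_mse(ref_x, ref_y, pred, second_x, sigma=1.0):
    """Return (P, B, D, A) with P = B + D * A (assumed form of equation (1))."""
    ref_x, second_x = np.asarray(ref_x, float), np.asarray(second_x, float)
    ref_y, pred = np.asarray(ref_y, float), np.asarray(pred, float)
    mu, sd = ref_x.mean(axis=0), ref_x.std(axis=0)
    sd = np.where(sd == 0, 1.0, sd)               # guard constant columns
    ref_z, sec_z = (ref_x - mu) / sd, (second_x - mu) / sd
    b = float(np.mean((ref_y - pred) ** 2))       # MSE on the reference data
    d = mmd(ref_z, sec_z, sigma)
    a = rate_a(ref_z, ref_y, pred, sigma)
    return b + d * a, b, d, a

# Usage: pred holds the model outputs for ref_x (computed elsewhere).
ref_x = [[0.0], [1.0], [2.0], [3.0]]
p, b, d, a = estimate_mse(ref_x, [0, 1, 2, 3], [0.5, 1, 2.5, 3], [[5.0], [6.0]])
```

Returning B, D, and A alongside P makes it straightforward to display the breakdown of the estimate into its two terms.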
- FIG. 6 is a schematic diagram of a user interface that shows the estimation result of the mean square error according to the present example to the user in a graph.
- the graph in FIG. 6 shows the measured value of the mean square error and the predicted value according to the present example as a line graph, with the horizontal axis representing the date and the vertical axis representing the mean square error.
- FIG. 6 also shows, as a bar graph, the breakdown of the predicted value P of the mean square error according to the present example into the value of B, the first term on the right side of equation (1), and the value of D × A, the second term.
- the prediction performance estimation according to the present disclosure can be calculated including today's prediction. Also, by displaying the breakdown at the same time, it is possible for the user to more easily understand the estimated value of the predicted performance and the change in the estimated value.
- Example 2 Disease diagnosis prediction based on initial symptoms
- Example 2 describes a case in which symptoms such as today's body temperature, yesterday's body temperature, the day before yesterday's temperature, the presence or absence of a sore throat, the presence or absence of nasal congestion, the presence or absence of a cough, and the presence or absence of malaise are used to predict the patient's illness, e.g., cold, influenza, allergic rhinitis, streptococcal infection, acute bronchitis, measles, etc.
- Determining the actual disease requires diagnosis by a doctor, and it may take time to make the diagnosis, develop the symptoms necessary for diagnosis, and obtain the test results necessary for diagnosis. It is not possible to know the predictive accuracy of the predictive model for, for example, the most recent 30 patients until the diagnostic results for all patients are available. Therefore, the prediction accuracy of the prediction for the most recent 30 people is estimated using the estimation device according to the present disclosure.
- the predictive model is created using, as the data for creating the predictive model, the symptoms (explanatory variable data) and the disease judgment results (objective variable data) of a certain number of past patients, for example 1000 people.
- the data for the past 1000 people are defined as the data of the first explanatory variable and the data of the objective variable corresponding to the data of the first explanatory variable, and are collectively referred to as the first data.
- the created prediction model and first data are held in a predetermined storage area.
- Estimating prediction accuracy requires a method for selecting reference data, a method for calculating the value of D in equation (1), and a method for calculating the value of A in equation (1).
- as the reference data, all of the first data is used. In other words, all of the first data is selected as the reference data.
- the value of D in equation (1) is calculated based on whether the symptom is unknown compared to each symptom of the reference data, as will be described later.
- the known symptoms are all of the first data. Therefore, by selecting all of the first data as reference data, all of the first data can be treated as known symptoms, and a more suitable estimation of prediction accuracy becomes possible. Note that, depending on the method of calculating the value of D, the number of samples in the first data, and the presence or absence of domain knowledge regarding symptoms, the reference data may be selected using a more detailed method, but in this example all of the data is selected for the reasons mentioned above.
- the value of D in equation (1) is the proportion of samples of the Min-Max normalized second explanatory variable data for which no sample of the reference explanatory variable data exists within a certain distance, for example within 1.0 in Euclidean distance. This calculates the proportion of samples in the second explanatory variable data whose explanatory variables are unknown to the predictive model, that is, samples that are not included in the data of the first explanatory variable and on which the predictive model has been neither trained nor verified. If all samples of the data of the second explanatory variable are included in the data of the first explanatory variable, the value of D becomes 0.
- the value of A in equation (1) is a constant of -1. This is an estimation method based on the assumption that, together with the calculation method of D, predictions made by a prediction model for samples on which the prediction model has not been trained or verified will generally be incorrect.
- with the accuracy of the prediction model on the data for model creation set as the upper limit, and assuming that predictions made by the prediction model for samples on which it has been neither trained nor verified are incorrect, it is possible to estimate the prediction accuracy of the prediction model for the most recent 30 people.
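The estimate of Example 2 can be sketched as follows. This is a minimal illustration: the function names, the choice to fit the Min-Max parameters on the reference data and apply them to both data sets, and the form P = B + D × A for equation (1) are assumptions made here; the radius of 1.0 and the constant A = -1 follow the example.

```python
import numpy as np

def unknown_ratio(ref_x, second_x, radius=1.0):
    """D: share of second-data samples with no reference sample
    within `radius` (Euclidean) after Min-Max normalization."""
    ref_x = np.asarray(ref_x, dtype=float)
    second_x = np.asarray(second_x, dtype=float)
    lo = ref_x.min(axis=0)
    span = ref_x.max(axis=0) - lo
    span = np.where(span == 0, 1.0, span)         # guard constant columns
    ref_n, sec_n = (ref_x - lo) / span, (second_x - lo) / span
    # Pairwise distances: rows are second-data samples, columns are references.
    dists = np.linalg.norm(sec_n[:, None, :] - ref_n[None, :, :], axis=2)
    return float(np.mean(dists.min(axis=1) > radius))

def estimate_accuracy(ref_accuracy, d, a=-1.0):
    """P = B + D * A, with B the accuracy on the reference data."""
    return ref_accuracy + d * a

# Usage: two of the four recent patients have no nearby known symptoms.
ref_x = [[0, 0], [1, 1]]
second_x = [[0, 0], [5, 5], [1, 0.5], [-4, 0]]
d = unknown_ratio(ref_x, second_x)
p = estimate_accuracy(0.9, d)
```

With A = -1, every unknown sample is counted as a wrong prediction, so the estimate acts as a pessimistic bound anchored at the reference accuracy.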
- Non-transitory computer-readable media include various types of tangible storage media.
- Examples of non-transitory computer-readable media include magnetic recording media (e.g., flexible disks, magnetic tapes, hard disk drives), magneto-optical recording media (e.g., magneto-optical disks), CD-ROMs (Read Only Memory), CD-Rs, CD-R/W, semiconductor memory (eg, mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, RAM (Random Access Memory)).
- the program may also be supplied to the computer via various types of transitory computer readable media. Examples of transitory computer-readable media include electrical signals, optical signals, and electromagnetic waves.
- the transitory computer-readable medium can provide the program to the computer via wired communication channels, such as electrical wires and optical fibers, or wireless communication channels.
- the data selection means for selecting the first explanatory variable data and the first objective variable data corresponding to the first explanatory variable data based on the data; Based on a comparison between the prediction data obtained by inputting the first explanatory variable data selected for the prediction model and the first objective variable data corresponding to the first explanatory variable data, the performance estimating means for estimating the prediction performance of the prediction model for the second explanatory variable data;
- the performance estimating means calculates the prediction performance of the prediction model for the selected first explanatory variable data based on a comparison between the prediction data and the first objective variable data, and the prediction performance of the prediction model for the selected first explanatory variable data and the second explanatory variable data. estimating the prediction performance of the prediction model for the second explanatory variable data based on a difference between the selected first explanatory variable data and the data content based on a preset standard; Estimation device.
- the estimation device estimates that the performance of the prediction model for the second explanatory variable data deteriorates as the difference, based on a preset standard of data content, between the second explanatory variable data and the selected first explanatory variable data increases. (Appendix 5) The estimation device according to appendix 4, wherein the performance estimating means estimates that the prediction performance value of the prediction model for the selected first explanatory variable data, obtained from a comparison between the prediction data and the first objective variable data, becomes worse as the difference, based on a preset standard of data content, between the second explanatory variable data and the selected first explanatory variable data becomes larger.
- the estimation device calculates a deterioration rate of performance of the prediction model with respect to the second explanatory variable data according to a difference between a distribution of the second explanatory variable data and a distribution of the selected first explanatory variable data. , using a calculation formula for a predicted performance deterioration rate based on distribution robust optimization, which is calculated from the selected first explanatory variable data, a comparison between the predicted data and the selected first objective variable data. and estimating the performance of the prediction model with respect to the second explanatory variable data based on the deterioration rate. Estimation device.
- the estimation device selects data different from data used during training of the prediction model from among the first explanatory variable data.
- Estimation device. (Appendix 8) The estimation device according to supplementary note 1, The data selection means selects, from among the first explanatory variable data, data that has a preset relationship with the second explanatory variable data with respect to the prediction performance of the prediction model.
- Estimation device. (Appendix 9) The estimation device according to supplementary note 1, The data selection means selects, from among the first explanatory variable data, the first explanatory variable data that has a smaller difference between the second explanatory variable data and the data content based on a preset standard compared to other explanatory variable data.
- Estimation device (Appendix 10) A second explanatory variable to which an objective variable is not associated from first explanatory variable data prepared according to a preset prediction model and first objective variable data corresponding to the first explanatory variable data. selecting the first explanatory variable data and the first objective variable data corresponding to the first explanatory variable data based on the data; Based on a comparison between the prediction data obtained by inputting the first explanatory variable data selected for the prediction model and the first objective variable data corresponding to the first explanatory variable data, the estimating the prediction performance of the prediction model for the second explanatory variable data; Estimation method.
- a computer-readable storage medium that stores a program for causing a computer to execute processing.
- Estimation device 3 Reference data selection section 4 Performance estimation section 10 Second data acquisition section 20 Output section 30 Control section 40 Storage section 41 First data 42 Prediction model 43 Parameter information 100 Estimation device 101 CPU 102 ROM 103 RAM 104 Program group 105 Storage device 106 Drive device 107 Communication interface 108 Input/output interface 109 Bus 110 Storage medium 111 Communication network 121 Data selection means 122 Performance estimation means
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Databases & Information Systems (AREA)
- Algebra (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computing Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
An estimation device 100 according to the present invention comprises: a data selection means 121 for selecting, from first explanatory variable data prepared according to a preset prediction model and first objective variable data corresponding to the first explanatory variable data, first explanatory variable data and first objective variable data corresponding to the first explanatory variable data, on the basis of second explanatory variable data that is not associated with an objective variable; and a performance estimation means 122 for estimating the performance of a prediction model's prediction about the second explanatory variable data, on the basis of the comparison between prediction data obtained by inputting the first explanatory variable data selected according to the prediction model and the first objective variable data corresponding to the first explanatory variable data.
Description
The present disclosure relates to an estimation device, an estimation method, and a program.
Methods have been studied for creating a prediction model that predicts objective variable data from explanatory variable data, based on explanatory variable data and the objective variable data corresponding to it, and for putting the created prediction model into actual operation. For example, Patent Document 1 discloses a technique in which a plurality of predictive model candidates are created from explanatory variable data and new data generated from the explanatory variable data, and a predictive model is selected by evaluating the prediction accuracy of the candidates.
However, depending on the situation in which the prediction model is operated, it may take time to obtain data on the objective variable that is the target of the prediction by the prediction model. In such a case, a problem arises in that the value of the predictive performance index of the predictive model cannot be calculated until the target variable is obtained.
In view of the above-mentioned problems, an object of the present disclosure is to provide an estimation device, an estimation method, and a program that can estimate the prediction performance of a prediction model without using data of a target variable.
An estimation device according to one aspect of the present invention includes:
data selection means for selecting, from first explanatory variable data prepared according to a preset prediction model and first objective variable data corresponding to the first explanatory variable data, the first explanatory variable data and the first objective variable data corresponding to it, on the basis of second explanatory variable data with which no objective variable is associated; and
performance estimation means for estimating the prediction performance of the prediction model for the second explanatory variable data, on the basis of a comparison between prediction data obtained by inputting the selected first explanatory variable data into the prediction model and the first objective variable data corresponding to that first explanatory variable data.
An estimation method according to one aspect of the present invention includes:
selecting, from first explanatory variable data prepared according to a preset prediction model and first objective variable data corresponding to the first explanatory variable data, the first explanatory variable data and the first objective variable data corresponding to it, on the basis of second explanatory variable data with which no objective variable is associated; and
estimating the prediction performance of the prediction model for the second explanatory variable data, on the basis of a comparison between prediction data obtained by inputting the selected first explanatory variable data into the prediction model and the first objective variable data corresponding to that first explanatory variable data.
A program according to one aspect of the present invention causes a computer to execute processing of:
selecting, from first explanatory variable data prepared according to a preset prediction model and first objective variable data corresponding to the first explanatory variable data, the first explanatory variable data and the first objective variable data corresponding to it, on the basis of second explanatory variable data with which no objective variable is associated; and
estimating the prediction performance of the prediction model for the second explanatory variable data, on the basis of a comparison between prediction data obtained by inputting the selected first explanatory variable data into the prediction model and the first objective variable data corresponding to that first explanatory variable data.
According to the present disclosure, the prediction performance of a prediction model during operation can be estimated without using objective variable data.
Preferred embodiments of the present disclosure will be described in detail with reference to the drawings.
<First embodiment>
First, an example of the first embodiment of the present disclosure will be described. FIG. 1 is a block diagram showing an example of the hardware configuration of an estimation device according to the present embodiment, and FIG. 2 is a block diagram showing an example of the configuration of the estimation device. FIG. 3 is a flowchart showing an example of the operation of the estimation device. Note that this embodiment outlines the configuration of the estimation device and the estimation method that are described in the second embodiment below.
First, with reference to FIG. 1, the hardware configuration of the estimation device 100 in this embodiment will be described. The estimation device 100 is constituted by a general information processing device and, as an example, is equipped with the following hardware configuration.
・CPU (Central Processing Unit) 101 (arithmetic unit)
・ROM (Read Only Memory) 102 (storage device)
・RAM (Random Access Memory) 103 (storage device)
・Program group 104 loaded into the RAM 103
・Storage device 105 that stores the program group 104
・Drive device 106 that reads from and writes to a storage medium 110 external to the information processing device
・Communication interface 107 that connects to a communication network 111 external to the information processing device
・Input/output interface 108 that inputs and outputs data
・Bus 109 that connects the components
Note that FIG. 1 shows one example of the hardware configuration of the information processing device serving as the estimation device 100, and the hardware configuration is not limited to the one described above. For example, the information processing device may consist of only part of the configuration described above, such as omitting the drive device 106. In addition, instead of the CPU described above, the information processing device may use a GPU (Graphic Processing Unit), a DSP (Digital Signal Processor), an MPU (Micro Processing Unit), an FPU (Floating point number Processing Unit), a PPU (Physics Processing Unit), a TPU (Tensor Processing Unit), a quantum processor, a microcontroller, or a combination of these.
The estimation device 100 can construct and be equipped with the data selection means 121 and the performance estimation means 122 shown in FIG. 2 when the CPU 101 acquires and executes the program group 104. The program group 104 is, for example, stored in advance in the storage device 105 or the ROM 102, and the CPU 101 loads it into the RAM 103 and executes it as needed. The program group 104 may also be supplied to the CPU 101 via the communication network 111, or may be stored in advance in the storage medium 110, from which the drive device 106 reads the program and supplies it to the CPU 101. However, the data selection means 121 and the performance estimation means 122 may instead be built from dedicated electronic circuits that realize these means.
First, the estimation device 100 has a function of acquiring a prediction model, first explanatory variable data and first objective variable data corresponding to the first explanatory variable data, and second explanatory variable data. When executed in the estimation device 100, the prediction model receives explanatory variable data as input and outputs a prediction of the corresponding objective variable data. The data output in this way is called prediction data. The first explanatory variable data and the first objective variable data are data prepared according to the prediction model. They include, for example, first explanatory variable data and first objective variable data used as training data for generating the prediction model by machine learning, and first explanatory variable data and first objective variable data used for verifying the prediction model. The second explanatory variable data is explanatory variable data acquired when the prediction model is operated on another task; no objective variable data exists for the second explanatory variable data, and none is associated with it.
Note that the estimation device 100 may acquire the prediction model and each piece of data from outside the estimation device 100. For example, if the estimation device 100 is connected to other devices via a network, it may acquire the prediction model and the data from another device on the same network. In this case, the estimation device 100 may include a communication device for communicating with other devices on the network. If the estimation device 100 includes a storage device (not shown), it may acquire the prediction model and the data from that storage device.
The data selection means 121 selects, from the first explanatory variable data acquired by the estimation device 100 and the first objective variable data corresponding to it, part or all of the first explanatory variable data and the corresponding first objective variable data, on the basis of the second explanatory variable data. In doing so, the data selection means 121 selects, based on the content of the second explanatory variable data, all or part of the first explanatory variable data that is better suited to estimating the prediction performance of the prediction model for the second explanatory variable data. More specifically, it makes this selection so that the estimate of the prediction performance index for the second explanatory variable data produced by the performance estimation means 122 (described later) comes closer to the actual value. For example, when estimating the prediction performance of a prediction model, it is generally desirable to use explanatory variable data that the model did not use during training. Therefore, the data selection means 121 may select explanatory variable data that was not used to train the prediction model. As another example, when estimating the prediction performance for the second explanatory variable data, it is preferable to use explanatory variable data that lies close to the second explanatory variable data in the explanatory variable space, together with the corresponding objective variable data. Therefore, the data selection means 121 may select the data so that the distribution of the selected explanatory variable data is close to the distribution of the second explanatory variable data in the explanatory variable space (hereinafter simply called the distribution). The selection method may also be chosen to suit the performance estimation method used by the performance estimation means 122, described later. The explanatory variable data selected by the data selection means 121 is called the reference explanatory variable data. The data selection means 121 also selects the objective variable data corresponding to the selected reference explanatory variable data, which is called the reference objective variable data. The reference explanatory variable data and the corresponding reference objective variable data are together called the reference data.
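As a non-limiting illustration, the distribution-based selection described above might be sketched as follows. This is a minimal Python sketch under stated assumptions, not the method of the present disclosure: the function name `select_reference_data`, the parameters `k` and `exclude`, and the use of Euclidean nearest neighbours as the notion of closeness in the explanatory variable space are all hypothetical choices made for illustration.

```python
import math


def select_reference_data(first_x, first_y, second_x, k=1, exclude=None):
    """Pick first-data rows that lie close to the second explanatory data.

    first_x:  list of feature vectors whose objective values first_y are known.
    second_x: list of feature vectors that have no objective values.
    k:        number of nearest first-data rows kept per second-data row
              (hypothetical parameter).
    exclude:  optional set of row indices to skip, e.g. rows used for training
              (hypothetical parameter).
    Returns (reference_x, reference_y).
    """
    exclude = exclude or set()
    candidates = [i for i in range(len(first_x)) if i not in exclude]
    chosen = set()
    for query in second_x:
        # Rank candidate rows by Euclidean distance to this second-data row.
        ranked = sorted(candidates, key=lambda i: math.dist(first_x[i], query))
        chosen.update(ranked[:k])
    indices = sorted(chosen)
    return [first_x[i] for i in indices], [first_y[i] for i in indices]
```

Passing the indices of the training rows as `exclude` combines the two selection ideas mentioned above: rows the model has already seen are skipped, and the remaining rows are chosen by proximity to the second explanatory variable data.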
The performance estimation means 122 estimates the prediction performance of the prediction model for the second explanatory variable data on the basis of a comparison between prediction data obtained by inputting the reference explanatory variable data into the prediction model and the reference objective variable data corresponding to the reference explanatory variable data. More specifically, for example, the performance estimation means 122 may use the value of a performance index calculated from that prediction data and the reference objective variable data as the estimate of the performance index of the prediction model for the second explanatory variable data. In doing so, the performance estimation means 122 may further take into account the difference in tendency between the second explanatory variable data and the reference explanatory variable data, adjusting the estimate so that it becomes worse as the difference in tendency grows. Here, the difference in tendency is information based on the result of comparing the contents of two data sets, for example a non-negative numerical value that expresses how much the contents of the two data sets differ. As one example of the difference in tendency, a distance or index defined between the distributions of the two data sets may be used.
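The estimation with an adjustment for the difference in tendency might look like the following minimal Python sketch. The disclosure does not fix a formula at this point, so everything specific here is an assumption made for illustration: the mean absolute error as the performance index, the Euclidean distance between feature means as the difference in tendency, the multiplicative penalty `1 + alpha * gap`, and the names `trend_difference`, `estimate_mae`, and `alpha`.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def trend_difference(a_rows, b_rows):
    """Non-negative gap between two data sets (assumed: distance between means)."""
    return math.dist(mean_vector(a_rows), mean_vector(b_rows))


def estimate_mae(model, ref_x, ref_y, second_x, alpha=0.0):
    """Estimate the mean absolute error of `model` on second_x without labels.

    The error is measured on the reference data and then inflated as the
    reference data drifts away from second_x. alpha=0 disables the penalty.
    """
    predictions = [model(x) for x in ref_x]
    base = sum(abs(p - y) for p, y in zip(predictions, ref_y)) / len(ref_y)
    return base * (1.0 + alpha * trend_difference(ref_x, second_x))
```

Because the mean absolute error is an index where larger means worse, inflating it makes the estimate more pessimistic as the gap grows; for an index where larger means better, such as accuracy, the adjustment would have to shrink the value instead.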
The performance index used by the performance estimation means 122 quantitatively evaluates how good or bad the prediction performance of the prediction model is. The actual value of the performance index is calculated from a comparison between prediction data obtained by inputting the second explanatory variable data into the prediction model and the objective variable data corresponding to the second explanatory variable data. Therefore, when the objective variable data cannot be obtained, the actual value of the performance index cannot be calculated. Specific examples of performance indices include the mean absolute error, the mean squared error, and the coefficient of determination for regression, and accuracy, the F1 score, and cross entropy for classification. Of course, the performance indices that the performance estimation means 122 can use are not limited to these, and any evaluation index can be the target of estimation.
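For reference, several of the performance indices named above can be written out directly from their standard definitions. The following Python sketch is illustrative only; the function names are chosen here and are not taken from the present disclosure.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| (regression; smaller is better)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of (true - predicted)^2 (regression; smaller is better)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def coefficient_of_determination(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot (regression; larger is better).

    Undefined when all true values are identical, because SS_tot is zero.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def accuracy(labels_true, labels_pred):
    """Fraction of labels predicted exactly (classification; larger is better)."""
    hits = sum(1 for t, p in zip(labels_true, labels_pred) if t == p)
    return hits / len(labels_true)
```

Every one of these functions takes the true objective values as an argument, which is why the actual value cannot be computed for the second explanatory variable data and has to be estimated from the reference data instead.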
The estimation device 100 configured as described above executes the estimation method shown in the flowchart of FIG. 3 through the functions of the data selection means 121 and the performance estimation means 122.
As shown in FIG. 3, the estimation device 100
selects, from first explanatory variable data prepared according to a preset prediction model and first objective variable data corresponding to the first explanatory variable data, the first explanatory variable data and the first objective variable data corresponding to it, on the basis of second explanatory variable data with which no objective variable is associated (step S101), and
estimates the prediction performance of the prediction model for the second explanatory variable data, on the basis of a comparison between prediction data obtained by inputting the selected first explanatory variable data into the prediction model and the first objective variable data corresponding to that first explanatory variable data (step S102).
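Steps S101 and S102 can also be read together as a single routine. The following Python sketch shows one possible end-to-end reading under assumptions that are not part of the disclosure: the selection rule keeps first-data rows that fall inside the per-feature minimum and maximum of the second explanatory variable data, the performance index is the mean squared error, and the names `in_range` and `estimate_operational_mse` are hypothetical.

```python
def in_range(x, lows, highs):
    """True if every feature of x lies within [low, high] for that feature."""
    return all(lo <= v <= hi for v, lo, hi in zip(x, lows, highs))


def estimate_operational_mse(model, first_x, first_y, second_x):
    """Estimate the model's mean squared error on second_x without its labels."""
    # Step S101: keep first-data rows inside the bounding box of second_x.
    lows = [min(col) for col in zip(*second_x)]
    highs = [max(col) for col in zip(*second_x)]
    reference = [(x, y) for x, y in zip(first_x, first_y)
                 if in_range(x, lows, highs)]
    if not reference:
        raise ValueError("no first data overlaps the second explanatory data")
    # Step S102: compare predictions with known objective values on those rows.
    return sum((model(x) - y) ** 2 for x, y in reference) / len(reference)
```

Raising an error when no rows are selected is a design choice of this sketch: with an empty reference set there is nothing to compare against, so returning a number would be misleading.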
As described above, the present disclosure uses, for example: a prediction model in operation; as the first explanatory variable data, the explanatory variable data used to create the prediction model (that is, the explanatory variable data used for training the prediction model and the explanatory variable data used for its verification); as the objective variable data corresponding to the first explanatory variable data, the objective variable data used to create the prediction model (that is, the objective variable data used for training and the objective variable data used for verification); and, as the second explanatory variable data, the explanatory variable data used in operating the prediction model. With these, the prediction performance for the explanatory variable data used in operation can be estimated. In other words, while the prediction model is in operation, the estimation device 100 can estimate the model's operational prediction performance from the operational explanatory variable data, without waiting for the operational objective variable data to become available.
Note that the configuration shown in FIG. 2 is only an example and does not necessarily limit the configuration of the estimation device according to this embodiment. For example, some functions of the estimation device 100 may be realized by a plurality of devices working together. As a specific example, the estimation device may be made up of a first estimation device that acquires the first explanatory variable data, the objective variable data corresponding to the first explanatory variable data, and the second explanatory variable data, and a second estimation device that acquires the prediction model. In order to estimate the value of the performance index from multiple angles, the estimation device 100 may include a plurality of data selection means and a plurality of performance estimation means. It may also include a device for calculating the final estimate of the estimation device 100 based on the estimates from the plurality of performance estimation means. If the estimation device 100 also has functions other than estimating the value of the performance index of the prediction model, it may include devices not shown in FIG. 2.
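How a final estimate is derived from several performance estimation means is not specified here. As one hedged possibility, the sketch below combines estimates of an error-type index either by averaging them or by taking the most pessimistic one; the function name `combine_estimates` and the two modes are assumptions introduced purely for illustration.

```python
from statistics import mean


def combine_estimates(estimates, mode="mean"):
    """Combine several estimates of an error-type index (larger is worse).

    mode="mean"  averages the estimates.
    mode="worst" returns the largest, i.e. the most pessimistic, estimate.
    """
    if not estimates:
        raise ValueError("at least one estimate is required")
    if mode == "mean":
        return mean(estimates)
    if mode == "worst":
        return max(estimates)
    raise ValueError(f"unknown mode: {mode}")
```

For an index where larger values are better, the pessimistic choice would be the minimum rather than the maximum.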
<Second embodiment>
Next, an example of a second embodiment of the present disclosure will be described with reference to FIGS. 4 to 6. FIG. 4 is a block diagram for explaining an example of the configuration of the estimation device according to the present embodiment, and FIG. 5 is a flowchart for explaining an example of the operation of the estimation device. FIG. 6 is a diagram for explaining how the estimation device behaves in actual operation.
In this embodiment, the explanatory variable data used to create the prediction model is used as the first explanatory variable data. The objective variable data corresponding to the explanatory variable data used to create the prediction model is used as the first objective variable data corresponding to the first explanatory variable data. The explanatory variable data obtained during operation of the prediction model is used as the second explanatory variable data. However, these choices are merely one example within this embodiment, which is itself one example of the embodiments of the present disclosure, and they do not limit the embodiments of the present disclosure. As an example that goes beyond these particular data, a prediction model created from explanatory variable data and objective variable data obtained for one task may be applied to another task, and the prediction performance of the model for the explanatory variable data of that other task may be estimated. In that case, the explanatory variable data of the other task can be used as the second explanatory variable data in place of the explanatory variable data obtained during operation of the prediction model, and the estimation device according to the present disclosure can be applied. In addition, data other than the data used to create the prediction model may be used as the first explanatory variable data and the corresponding first objective variable data; for example, explanatory variable data obtained during operation of the prediction model for which the corresponding objective variable data has already been obtained may be used together with that objective variable data.
The estimation device 1 in this embodiment is configured with one or more information processing devices, each including an arithmetic device and a storage device. As shown in FIG. 4, the estimation device 1 includes a second data acquisition unit 10, an output unit 20, a control unit 30, and a storage unit 40. The control unit 30 further includes a reference data selection unit 3 and a performance estimation unit 4. The functions of the second data acquisition unit 10, the output unit 20, and the reference data selection unit 3 and performance estimation unit 4 of the control unit 30 can be realized by the arithmetic device executing programs, stored in the storage device, for realizing each function. The estimation device 1 also has, in the storage device, the storage unit 40, which stores first data 41, a prediction model 42, and parameter information 43. Note that the control unit 30 performs data communication with each of the second data acquisition unit 10, the output unit 20, and the storage unit 40 via a communication network or the like.
With the configuration described above, the estimation device 1 has a function of estimating the prediction performance of the prediction model 42 stored in the storage unit 40 for the second explanatory variable data acquired by the second data acquisition unit 10, that is, for the explanatory variable data during operation of the prediction model 42. Each component is described in detail below.
The second data acquisition unit 10 acquires explanatory variable data during operation of the prediction model as the second explanatory variable data. The explanatory variable data during operation (that is, the second explanatory variable data) is data obtained during operation, for which the prediction model predicts the corresponding objective variable data. When, for example, a user of the estimation device 1 inputs the explanatory variable data during operation, the second data acquisition unit 10 may include an interface that accepts user input, such as a touch panel, buttons, or a voice input device. When the estimation device 1 itself has a function of collecting the second explanatory variable data, the second data acquisition unit 10 corresponds to, for example, a sensor device such as a camera, a device that processes the information acquired from the sensor device, and a device that stores the processed information as the second explanatory variable data. When the second explanatory variable data is acquired from a device other than the estimation device 1, a communication device or the like for communicating with that device corresponds to the second data acquisition unit 10.
The output unit 20 outputs the estimated value of the performance index calculated by the control unit 30. When the estimated value is output to a user of the estimation device 1, the output unit 20 corresponds to, for example, a display for showing the estimated value or a speaker for outputting sound. When the estimated value is used by a device other than the estimation device 1, a communication device or the like for communicating with that device corresponds to the output unit 20. The estimation device 1 may also have an alert function that reports deterioration of the prediction performance when, for example, the estimated value falls below a certain value. In that case, the device that reports the deterioration of the prediction performance corresponds to the output unit 20.
The configuration of the storage unit 40 is described next. As shown in FIG. 4, the storage unit 40 stores the first data 41, the prediction model 42, and the parameter information 43 in advance. The storage unit 40 transmits the first data 41, the prediction model 42, and the parameter information 43 to the control unit 30 as necessary.
The first data 41 includes the explanatory variable data used to create the prediction model 42 (first explanatory variable data) and the objective variable data corresponding to that explanatory variable data (first objective variable data). The first data 41 includes training data used to train the prediction model 42 (that is, the explanatory variable data used for training and the corresponding objective variable data) and verification data used to verify the generalization performance of the prediction model 42 (that is, the explanatory variable data used for verification and the corresponding objective variable data).
The prediction model 42 is a prediction model created using the first data 41. Specifically, the prediction model 42 may be created by so-called supervised machine learning so that, when the explanatory variable data of the training data in the first data 41 is input, it predicts the objective variable data of the training data. The generalization performance of the prediction model 42 may have been verified using the verification data in the first data 41. Examples of algorithms that implement the prediction model 42 include linear regression, decision trees, random forests, and neural networks. These are only examples. The algorithm that implements the prediction model 42 is not limited as long as it can predict objective variable data from explanatory variable data.
The parameter information 43 is information on the parameters necessary for estimating the value of the performance index. For example, it may include information on the performance index to be estimated, the method of obtaining the reference data described later, and the method of estimating the performance index.
In this embodiment, the prediction model, the first explanatory variable data, and the first objective variable data corresponding to the first explanatory variable data do not change from one estimation to the next, so storing them in the storage unit 40 reduces the processing required to acquire them. The second explanatory variable data, that is, the explanatory variable data used during operation of the prediction model, changes with each estimation. Acquiring it through the second data acquisition unit 10 therefore makes it possible to estimate the value of the performance index for the latest explanatory variable data during operation.
Next, the configuration of the control unit 30 is described. As shown in FIG. 4, the control unit 30 includes the reference data selection unit 3 and the performance estimation unit 4. The control unit 30 also includes a CPU, a ROM, a RAM, and the like (not shown), and performs various kinds of control and computation for the reference data selection unit 3 and the performance estimation unit 4.
The reference data selection unit 3 (data selection means) selects all or part of the explanatory variable data of the first data 41 based on the second explanatory variable data acquired by the second data acquisition unit 10 and on the prediction model 42, the first data 41, and the parameter information 43 acquired from the storage unit 40. At this time, the reference data selection unit 3 also selects the objective variable data corresponding to the selected explanatory variable data of the first data 41. The selected explanatory variable data is referred to as the reference explanatory variable data. The objective variable data of the first data 41 corresponding to the reference explanatory variable data is referred to as the reference objective variable data. The reference explanatory variable data and the reference objective variable data are collectively referred to as the reference data.
As one specific example of how the reference data selection unit 3 selects the reference data, only the verification data in the first data 41 may be used as the reference data. Estimation formula (1) for the performance index, described later, uses the value of the performance index of the prediction model 42 for the reference data as the baseline for the estimated value of the performance index of the prediction model 42 for the second explanatory variable data. Excluding the training data in the first data 41 from the reference data and using only the verification data avoids an overly favorable performance index value caused by overfitting of the prediction model 42 to the training data, so the estimated value of the performance index of the prediction model 42 can be calculated more suitably. Of course, all or part of the training data may be included in the reference data, and increasing the number of samples in the reference data by including training data may help estimate the performance index value well. For example, when related first explanatory variable data cannot be selected as described later, such as when the attributes or characteristics of the second explanatory variable data cannot be classified from its content, the reference data selection unit 3 may select only the verification data or may select all data including the training data.
As another example of how the reference data selection unit 3 selects the reference data, all or part of the first data 41 that is highly relevant to the prediction performance of the prediction model 42 for the second explanatory variable data may be selected as the reference data. Here, "highly relevant" means that a positive correlation exists between the prediction performance of the prediction model 42 for the second explanatory variable data and its prediction performance for the reference explanatory variable data, so that the two are expected to be comparable. In other words, it means that, as a result of comparing the second explanatory variable data with the first explanatory variable data in the first data 41, the data content is judged to be highly relevant to the second explanatory variable data according to a preset criterion. Estimation formula (1) for the performance index, described later, uses the value of the performance index of the prediction model 42 for the reference data as the baseline for the estimated value of the performance index of the prediction model 42 for the second explanatory variable data. Therefore, using the part of the data that is highly relevant to the second explanatory variable data as the reference data allows the estimated value of the performance index of the prediction model 42 to be calculated more suitably. When data highly relevant to the second explanatory variable data is selected from the first data 41, empirical knowledge about the data (generally called domain knowledge) or the characteristics of the prediction model 42 may be used. As a specific way of obtaining highly relevant data, when the explanatory variable data includes date and time information, for example, data from the same month and day, the same day of the week, or the same season as any sample in the second explanatory variable data may be obtained from the first data 41. As another example, for an explanatory variable that the prediction model 42 weights particularly heavily when predicting, data whose values are close to those of the second explanatory variable data may be obtained from the first data 41.
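As a non-limiting illustration, the same-day-of-the-week selection described above might be sketched as follows. The record layout (dictionaries with a `date` key) and the function name `select_same_weekday` are assumptions made for this sketch and do not appear in the disclosure.

```python
from datetime import date

def select_same_weekday(first_data, second_dates):
    """Keep the rows of first_data whose date falls on a weekday
    that also occurs among the dates of the second data."""
    weekdays = {d.weekday() for d in second_dates}
    return [row for row in first_data if row["date"].weekday() in weekdays]

# Hypothetical first data: explanatory variable x and objective variable y.
first_data = [
    {"date": date(2022, 5, 2), "x": 1.0, "y": 10.0},  # Monday
    {"date": date(2022, 5, 3), "x": 2.0, "y": 12.0},  # Tuesday
    {"date": date(2022, 5, 9), "x": 1.5, "y": 11.0},  # Monday
]
# Hypothetical second data observed on a Monday.
second_dates = [date(2022, 5, 16)]

reference = select_same_weekday(first_data, second_dates)
print([row["x"] for row in reference])
```

Because each selected row keeps its objective variable value, the reference objective variable data is selected together with the reference explanatory variable data.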
As yet another example of how the reference data selection unit 3 selects the reference data, a part of the first data 41 whose difference in tendency from the second explanatory variable data, that is, whose difference in data content according to a preset criterion, is smaller than that of the rest may be selected as the reference data. Estimation formula (1) for the performance index, described later, computes the estimate so that the larger the difference in tendency between the second explanatory variable data and the reference explanatory variable data, the worse the performance index value of the prediction model 42 for the second explanatory variable data. The second explanatory variable data is often limited in amount compared with the first data 41 and covers only a limited region of the explanatory variable space. Under these conditions, if all the explanatory variable data of the first data 41 is used as the reference explanatory variable data, the difference in tendency between the second explanatory variable data and the reference explanatory variable data becomes large, and the performance index value of the prediction model 42 for the second explanatory variable data is estimated to be poor. However, the explanatory variable data of the first data 41 may contain samples that are close, in the explanatory variable space, to each sample of the second explanatory variable data. In other words, the prediction model 42 may already have been trained or verified, through the data used to create it (first data 41), on data close to the explanatory variable data during operation (second explanatory variable data). In such a case, the actual value of the performance index of the prediction model 42 for the second explanatory variable data is good. Therefore, selecting from the explanatory variable data of the first data 41 the part with a small difference in tendency from the second explanatory variable data, and using it as the reference explanatory variable data, makes it possible to estimate a value close to the actual performance index value of the prediction model 42 for the second explanatory variable data. The index of the difference in tendency may be the same as the index of the difference in tendency used in estimation formula (1), described later, or may be a different index. It may also be calculated by combining several different indices.
As one specific example of how the reference data selection unit 3 selects, from the explanatory variable data of the first data 41, data with a small difference in tendency from the second explanatory variable data, a greedy selection method may be used. First, one or more samples closest to the second explanatory variable data are taken from the explanatory variable data of the first data 41 and used as provisional reference explanatory variable data. Next, from the samples of the explanatory variable data of the first data 41 that are not yet included in the provisional reference explanatory variable data, one or more samples whose addition best reduces the difference in tendency between the provisional reference explanatory variable data and the second explanatory variable data are taken and added to the provisional reference explanatory variable data. This addition step is repeated, for example, until the number of samples in the provisional reference explanatory variable data exceeds a certain number, and the provisional reference explanatory variable data is finally used as the reference explanatory variable data. In this way, data with a small difference in tendency from the second explanatory variable data can be selected from the explanatory variable data of the first data 41.
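The greedy procedure above might be sketched as follows, as one possible illustration only. The difference in tendency is approximated here by the Euclidean distance between feature means, which is an assumption of this sketch (the disclosure leaves the index open), and the names `tendency_gap` and `greedy_select` are hypothetical. One sample is added per step, and the loop stops once `max_samples` samples have been chosen.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def tendency_gap(a_rows, b_rows):
    """Stand-in index for the difference in tendency:
    distance between the feature means of the two data sets."""
    return math.dist(mean_vector(a_rows), mean_vector(b_rows))

def greedy_select(first_x, second_x, max_samples):
    """Return indices of first_x chosen greedily so that the chosen
    set stays as close in tendency to second_x as possible."""
    remaining = list(range(len(first_x)))
    chosen = []
    while remaining and len(chosen) < max_samples:
        # Try each unused sample and keep the one giving the smallest gap.
        best = min(
            remaining,
            key=lambda i: tendency_gap(
                [first_x[j] for j in chosen] + [first_x[i]], second_x
            ),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Hypothetical two-dimensional explanatory variable data.
first_x = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [1.2, 0.8]]
second_x = [[1.0, 1.0], [1.1, 0.9]]
print(greedy_select(first_x, second_x, max_samples=2))
```

In the first iteration the chosen set is empty, so the step reduces to picking the sample nearest to the second data, which corresponds to the initial step described above.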
The reference data selection unit 3 may use a selection method that combines two or more of the reference data selection methods described above. For example, the reference data selection unit 3 may first select the verification data in the first data 41, then select from it the part that is highly relevant to the second explanatory variable data, and then further select the part with a small difference in tendency from the second explanatory variable data. The reference data selection method may also be chosen to suit the specific estimation method used by the performance estimation unit 4, described later. That is, when it is considered, empirically or theoretically, that a particular reference data selection method allows the estimation method of the performance estimation unit 4 to calculate the performance estimate more accurately, that selection method among those described above may be used.
When the tendency of the explanatory variable data of the first data 41 differs greatly from that of the second explanatory variable data, and none of the samples in the second explanatory variable data is closely related to the samples in the explanatory variable data of the first data 41, the difference in tendency from the second explanatory variable data never becomes 0 no matter how all or part of the explanatory variable data of the first data 41 is selected. Therefore, when the two tendencies differ greatly, deterioration of the prediction performance of the prediction model 42 for the second explanatory variable data can still be estimated even if all or part of the explanatory variable data of the first data 41 is selected as described above.
The performance estimation unit 4 (performance estimation means) calculates the estimated value P of the performance index of the prediction model 42 for the second explanatory variable data based on the reference data, the second explanatory variable data, and the prediction model 42 and parameter information 43 acquired from the storage unit 40, using, for example, formula (1) below. The performance index value estimated by this calculation is output to the output unit 20.
<Formula for calculating the estimated value of the performance index>
P = B + D × A … (1)
where
P: the estimated value of the performance index of the prediction model 42 for the second explanatory variable data;
B: the value of the performance index of the prediction model for the reference explanatory variable data and the reference objective variable data;
D: a non-negative value indicating the difference in tendency between the second explanatory variable data and the reference explanatory variable data;
A: the rate of change of the performance index value with respect to D, calculated from the reference explanatory variable data and from a comparison between the reference objective variable data and the prediction data obtained by inputting the reference explanatory variable data into the prediction model 42.
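As a simple illustration, formula (1) can be evaluated as in the following sketch. The function name `estimate_performance` and the numerical values are assumptions chosen for the example and are not taken from the disclosure.

```python
def estimate_performance(b: float, d: float, a: float) -> float:
    """Evaluate formula (1): P = B + D x A.

    b: performance index value on the reference data (B)
    d: non-negative difference in tendency (D)
    a: rate of change of the performance index with respect to D (A)
    """
    if d < 0:
        raise ValueError("D must be non-negative")
    return b + d * a

# Hypothetical values: accuracy of 0.90 on the reference data,
# a tendency difference of 0.5, and a rate of change of -0.2.
p = estimate_performance(b=0.90, d=0.5, a=-0.2)
print(round(p, 3))
```

When `d` is 0 the function returns `b` unchanged, which matches the case in which the reference explanatory variable data and the second explanatory variable data show no difference in tendency.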
B in formula (1) is the value of the performance index of the prediction model for the reference explanatory variable data and the reference objective variable data. In particular, when there is no difference at all in tendency between the reference explanatory variable data and the second explanatory variable data (that is, when D in formula (1) is 0), the estimated value P of the performance index equals B. Indeed, when there is no difference in tendency, the second explanatory variable data and the reference explanatory variable data can be regarded as identical. Unless the relationship between the explanatory variable data and the objective variable data changes (that is, unless concept drift occurs), the prediction performance of the prediction model 42 for the second explanatory variable data is then almost the same as its prediction performance for the reference explanatory variable data. Formula (1) makes it possible to estimate the performance index value based on this fact. In other words, when no concept drift has occurred and the reference explanatory variable data can be selected so that it has the same tendency as the second explanatory variable data, formula (1) can accurately estimate the performance index value from the reference explanatory variable data and the reference objective variable data.
When there is a difference in tendency between the reference explanatory variable data and the second explanatory variable data (that is, when D in formula (1) is greater than 0), formula (1) takes B as the best attainable value and estimates the deterioration in performance based on the values of D and A. In general, the prediction performance of a prediction model deteriorates for explanatory variable data whose tendency differs from that of the explanatory variable data in its training data. On this basis, formula (1) can estimate the deterioration of the prediction performance of the prediction model 42 for the second explanatory variable data.
D in formula (1) is a non-negative value that quantitatively indicates the difference in tendency between the second explanatory variable data and the reference explanatory variable data. D is 0 when there is no difference in tendency at all and takes a large value when the difference is large. As one specific example, the difference between the distribution of the second explanatory variable data and the distribution of the reference explanatory variable data may be calculated and used as D. Examples of indices that measure the difference between data distributions include the Kullback-Leibler divergence, the Jensen-Shannon divergence, the Wasserstein distance, and the Maximum Mean Discrepancy (hereinafter, MMD). These indices are 0 when the distributions do not differ and take larger values as the difference grows. As another example, statistics such as the mean and variance of the explanatory variables may be calculated for each of the second explanatory variable data and the reference explanatory variable data, and D may be calculated from their differences. As yet another example, the proportion of samples in the second explanatory variable data for which no sample of the reference explanatory variable data exists within a certain distance may be calculated and used as D.
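The last example above, in which D is the proportion of second-data samples with no reference sample nearby, might be computed as in the following sketch. The use of Euclidean distance, the function name `uncovered_fraction`, and the `radius` parameter are assumptions made for illustration.

```python
import math

def uncovered_fraction(second_x, reference_x, radius):
    """Fraction of samples in second_x that have no sample of
    reference_x within the given Euclidean distance."""
    uncovered = sum(
        1
        for s in second_x
        if all(math.dist(s, r) > radius for r in reference_x)
    )
    return uncovered / len(second_x)

# Hypothetical data: one second-data sample lies near the reference
# data and the other lies far from it.
reference_x = [[0.0, 0.0], [1.0, 1.0]]
second_x = [[0.1, 0.0], [5.0, 5.0]]
d = uncovered_fraction(second_x, reference_x, radius=0.5)
print(d)
```

The result is non-negative, equals 0 when every second-data sample has a reference sample within `radius`, and increases toward 1 as more samples fall outside the region covered by the reference data.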
In the examples above, the value of D may be calculated using all the explanatory variables of the second explanatory variable data and the reference explanatory variable data, or using only some of them. For example, only the explanatory variables that have a particularly strong influence on the predictions of the prediction model 42 may be used. Likewise, the value of D may be calculated using all the samples of the second explanatory variable data and of the reference explanatory variable data, or using only part of the data. For example, when the second explanatory variable data consists of a very large number of samples, the value of D may be calculated on a sample drawn from it. This shortens the time required to calculate D in formula (1) and enables fast estimation.
In addition, when calculating D in equation (1), the values of the explanatory variables in the second explanatory variable data and the reference explanatory variable data may be transformed before D is calculated, in order to adjust how strongly each explanatory variable influences D. For example, Min-Max normalization or Z-score normalization may be applied to each explanatory variable so that all explanatory variables influence D to a similar degree. As another example, to strengthen the influence on D of those explanatory variables that particularly affect the prediction results of the prediction model 42, the values of those explanatory variables in both the second explanatory variable data and the reference explanatory variable data may be multiplied by a factor, for example doubled.
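The transformation step can be sketched as follows. This is only an illustration: fitting the normalization statistics on the reference data alone, and the names `zscore_fit`, `transform`, and `weights`, are assumptions that the description does not specify.

```python
from statistics import fmean, pstdev

def zscore_fit(reference):
    """Per-variable mean and standard deviation of the reference data."""
    cols = list(zip(*reference))
    means = [fmean(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [pstdev(c) or 1.0 for c in cols]
    return means, stds

def transform(data, means, stds, weights=None):
    """Z-score each variable, then scale it by an optional influence weight."""
    if weights is None:
        weights = [1.0] * len(means)
    return [
        [w * (x - m) / s for x, m, s, w in zip(row, means, stds, weights)]
        for row in data
    ]
```

Applying the same `means`, `stds`, and `weights` to both data sets keeps them on a common scale; a weight of 2.0 on a variable corresponds to the doubling mentioned above.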
A in equation (1) is the rate of change of the performance index value with respect to D. It is calculated based on the reference explanatory variable data and on a comparison between the prediction data, obtained by inputting the reference explanatory variable data into the prediction model 42, and the reference objective variable data. A takes a negative value when a higher performance index value indicates better model performance (for example, when the coefficient of determination, accuracy, or F1 score is used as the performance index), and a positive value when a lower performance index value indicates better model performance (for example, when the mean absolute error, mean squared error, or cross entropy is used). In this way, the deterioration of the prediction performance of the prediction model 42 on the second explanatory variable data can be formulated as a function of the increase in D in equation (1).
The value of A in equation (1) may, for example, be a constant. More specifically, when explanatory variable data and the corresponding objective variable data are available in addition to the first data, the value of A may be set so that the estimate of P given by equation (1), with that explanatory variable data used as the second explanatory variable data, equals the actual value of the performance index of the prediction model 42 on that explanatory variable data and objective variable data. The value of A may also be set appropriately based on, for example, theoretical analysis or empirical experimental results.
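A minimal sketch of this calibration follows, assuming that equation (1) has the form P = B + D × A, where B is the performance index value on the reference data (the later description of the first and second terms of its right side suggests this form). The function name `calibrate_a` and its arguments are hypothetical.

```python
def calibrate_a(p_actual, b_reference, d):
    """Solve P = B + D * A for A, given an actual performance value P measured
    on extra labelled data, the reference performance B, and the difference D."""
    if d == 0:
        raise ValueError("D must be non-zero to determine A from a single observation")
    return (p_actual - b_reference) / d
```

For an accuracy-type index, an actual value of 0.7 against a reference value of 0.9 at D = 0.5 yields a negative A of about -0.4, consistent with the sign convention described above.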
The value of A in equation (1) may also be calculated based on the reference explanatory variable data and on a comparison between the prediction data, obtained by inputting the reference explanatory variable data into the prediction model 42, and the reference objective variable data. As one specific calculation method, the formula for the prediction performance deterioration rate based on distributionally robust optimization, shown as equation (2) below, may be used. This formula derives the value of A mathematically from the values of the prediction performance index of the prediction model 42 on the reference explanatory variable data and from the distribution of the reference explanatory variable data. Note that because equation (2) always yields a positive value of A, the prediction performance index used with it must be one for which a lower value indicates better model performance.
<Formula for calculating the prediction performance deterioration rate based on distributionally robust optimization>
A = (Σij Kij × Li × Lj)^(1/2) ... (2)
where
A: the value of A in equation (1),
Σij: the sum taken with i and j each running from 1 to the number of samples in the reference explanatory variable data,
Kij: the value of the Laplace kernel computed between the i-th and j-th samples of the reference explanatory variable data,
Li: the value of the prediction performance index calculated from the i-th value of the reference objective variable data and the output obtained when the i-th sample of the reference explanatory variable data is input into the prediction model,
Lj: the value of the prediction performance index calculated from the j-th value of the reference objective variable data and the output obtained when the j-th sample of the reference explanatory variable data is input into the prediction model.
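Equation (2) can be evaluated directly with a double loop, as in the sketch below. The kernel form exp(-||x - y||_1 / sigma) is an assumption: the description names a Laplace kernel with a sigma parameter but does not spell out its formula. The function names are likewise illustrative.

```python
import math

def laplace_kernel(x, y, sigma=1.0):
    """Assumed Laplace kernel: exp(-L1 distance / sigma)."""
    return math.exp(-sum(abs(a - b) for a, b in zip(x, y)) / sigma)

def deterioration_rate(ref_x, losses, sigma=1.0):
    """A = (sum_ij K_ij * L_i * L_j)^(1/2), as written in equation (2).

    ref_x:  samples of the reference explanatory variable data
    losses: per-sample performance index values L_i (lower is better)
    """
    total = 0.0
    for xi, li in zip(ref_x, losses):
        for xj, lj in zip(ref_x, losses):
            total += laplace_kernel(xi, xj, sigma) * li * lj
    return math.sqrt(total)
```

The double loop costs time quadratic in the number of reference samples, so subsampling the reference data may be worthwhile for large data sets.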
By using the MMD value based on the Laplace kernel as D in equation (1) and the value calculated by equation (2) as A in equation (1), the value of the performance index in the worst case of prediction performance can be estimated from the MMD distance between the reference explanatory variable data and the second explanatory variable data.
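The following sketch shows one way to compute an MMD with a Laplace kernel and combine it with A. It assumes the biased (V-statistic) MMD estimator, a kernel of the form exp(-||x - y||_1 / sigma), and the form P = B + D × A for equation (1); none of these details is fixed by the text, and all function names are hypothetical.

```python
import math

def laplace_kernel(x, y, sigma=1.0):
    """Assumed Laplace kernel: exp(-L1 distance / sigma)."""
    return math.exp(-sum(abs(a - b) for a, b in zip(x, y)) / sigma)

def mean_kernel(xs, ys, sigma):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(laplace_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def mmd(second_x, ref_x, sigma=1.0):
    """Biased MMD estimate between the two sets of explanatory variable samples."""
    squared = (mean_kernel(second_x, second_x, sigma)
               + mean_kernel(ref_x, ref_x, sigma)
               - 2.0 * mean_kernel(second_x, ref_x, sigma))
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(squared, 0.0))

def worst_case_index(b, d, a):
    """Assumed form of equation (1): P = B + D * A."""
    return b + d * a
```

Here `b` would be the performance index on the reference data, `d` the value returned by `mmd`, and `a` the value obtained from equation (2).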
<Explanation of operation>
Next, an example of the overall operation according to this embodiment will be described in detail with reference to the block diagram of FIG. 4 and the flowchart of FIG. 5. FIG. 5 is an example of a flowchart showing the processing procedure of the estimation device 1 in the first embodiment.
First, when the processing of the estimation device 1 starts, the second data acquisition unit 10 acquires the second explanatory variable data (step S1).
Next, the control unit 30 acquires the first data 41, the prediction model 42, and the parameter information 43 from the storage unit 40 (step S2).
The reference data selection unit 3 selects the reference data based on the second explanatory variable data, the first data 41, and the parameter information 43 (step S3).
The performance estimation unit 4 calculates the estimated value of the performance index using equation (1), based on the reference data, the second explanatory variable data, the prediction model 42, and the parameter information 43 (step S4).
Finally, the estimation device 1 outputs the estimated value of the performance index obtained by the performance estimation unit 4 through the output unit 20 (step S5).
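Steps S3 and S4 can be outlined as a single function, as in the sketch below. It assumes that equation (1) has the form P = B + D × A and that the parameter information supplies the selection, metric, distance, and rate functions; the dictionary keys `"select"`, `"metric"`, `"distance"`, and `"rate"` are hypothetical names, not part of the described device.

```python
def estimate_performance(second_x, first_data, model, params):
    """Outline of steps S3 and S4 (hypothetical interface).

    second_x:   second explanatory variable data (acquired in S1)
    first_data: (explanatory data, objective data) pair (acquired in S2)
    model:      callable mapping one sample to a prediction
    params:     dict of callables standing in for the parameter information
    """
    # S3: select the reference data using the second explanatory variable data.
    ref_x, ref_y = params["select"](second_x, first_data)
    # S4: predict on the reference data, then evaluate B, D, and A.
    preds = [model(x) for x in ref_x]
    b = params["metric"](ref_y, preds)        # performance on reference data
    d = params["distance"](second_x, ref_x)   # difference in tendency
    a = params["rate"](ref_x, ref_y, preds)   # rate of change with respect to D
    return b + d * a                          # assumed form of equation (1)
```

The value returned corresponds to what step S5 would pass to the output unit.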
Note that the flow of operations shown in FIG. 5 is merely an example of this embodiment and does not necessarily limit the flow of operations according to this embodiment. As a specific example, the process of acquiring the second explanatory variable data (step S1) may be executed after the process of acquiring the first data 41, the prediction model 42, and the parameter information 43 (step S2). Step S2 may also be divided into a plurality of steps; for example, the part of step S2 that acquires the prediction model 42 may be executed after the process of selecting the reference data (step S3).
<Example>
Next, as examples of the estimation device according to an embodiment of the present disclosure, the operation of the system in specific assumed use cases will be described.
(Example 1: Forecasting the demand for ice cream at a supermarket one month ahead)
First, as Example 1, consider a physical supermarket store in which the demand for ice cream one month ahead is predicted as the objective variable, using as explanatory variables the current month, day, day of the week, season, weather, temperature, humidity, number of customers, number of related products sold, and so on. In this case, the actual number sold cannot be known until one month has passed and the ice cream has been sold. In other words, until one month has passed since the prediction was made, the value of the objective variable, namely the demand for ice cream one month ahead, cannot be known, and therefore neither can the prediction performance of the current prediction model, for example the mean squared error of the model's predictions over the most recent week. The present disclosure is therefore used to estimate the mean squared error of the prediction model's predictions over the most recent week.
The prediction model is assumed to be created using data from a certain past period as the model-creation data, for example three years of previously acquired data (that is, three years of daily explanatory variable data together with the daily objective variable data, namely the number of ice creams sold one month after each day). Here, the three years of data are taken as the first explanatory variable data and the objective variable data corresponding to the first explanatory variable data, and are collectively called the first data. The created prediction model and the first data are held in a predetermined storage area.
Next, during actual operation, the mean squared error of the prediction model over the most recent week is estimated. That is, the explanatory variable data for the most recent week is used as the second explanatory variable data. Estimating the prediction performance requires a method for selecting the reference data, a method for calculating D in equation (1), and a method for calculating A in equation (1).
First, as the reference data, the data in the first data whose month and day match a month and day contained in the second explanatory variable data is selected. This is based on the idea that the prediction performance of the prediction model in the same month of the same year is roughly the same, that is, that such data is highly related to the prediction performance of the prediction model on the second explanatory variable data. Such an idea can be obtained from domain knowledge, for example from experience with supermarket sales or from detailed analysis of the first data. This makes it possible to estimate the mean squared error for the most recent week, which is the estimation target, more suitably, based on data that is highly related to the second explanatory variable data.
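The month-and-day matching rule can be sketched as below. The record layout (a dictionary with a `"date"` key) and the function name `select_by_month_day` are assumptions for illustration only.

```python
from datetime import date

def select_by_month_day(first_rows, second_dates):
    """Keep the rows of the first data whose (month, day) also appears
    among the dates of the second explanatory variable data."""
    wanted = {(d.month, d.day) for d in second_dates}
    return [
        row for row in first_rows
        if (row["date"].month, row["date"].day) in wanted
    ]
```

With three years of daily first data and one week of second data, this would select up to 21 rows, seven per past year.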
Next, the MMD between the reference explanatory variable data and the second explanatory variable data is used as D in equation (1). More specifically, the MMD is computed with a Laplace kernel whose sigma parameter is set to 1, and Z-score normalization is applied before D is calculated. This method of calculating D by MMD, combined with the method of calculating A in equation (1) by equation (3) described later, constitutes a method for estimating the mean squared error based on distributionally robust optimization, and is therefore an estimation method grounded in theoretical analysis.
For A in equation (1), the value calculated by the following equation (3) is used, which is equation (2) above made concrete for the reference data and the prediction model. Calculating A by equation (3) is a method derived from analysis based on distributionally robust optimization.
<Formula for calculating the rate of change A>
A = (Σij Kij × (Yi - Pi)^2 × (Yj - Pj)^2)^(1/2) ... (3)
where
A: the value of A in equation (1),
Σij: the sum taken with i and j each running from 1 to the number of samples in the reference explanatory variable data,
Kij: the value of the Laplace kernel, with its sigma parameter set to 1, computed between the i-th and j-th samples of the Z-score-normalized reference explanatory variable data,
Yi: the i-th value of the reference objective variable data,
Yj: the j-th value of the reference objective variable data,
Pi: the output obtained when the i-th sample of the reference explanatory variable data is input into the prediction model,
Pj: the output obtained when the j-th sample of the reference explanatory variable data is input into the prediction model.
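For the mean squared error case, equation (3) and the baseline error on the reference data might be computed as in this sketch. The kernel form exp(-||x - y||_1 / sigma) and the function names are assumptions, and `ref_x` is taken to be already Z-score normalized.

```python
import math

def mse(y, p):
    """Mean squared error between objective values y and predictions p."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)

def rate_of_change(ref_x, y, p, sigma=1.0):
    """A = (sum_ij K_ij * (Y_i - P_i)^2 * (Y_j - P_j)^2)^(1/2), as in equation (3)."""
    sq_err = [(yi - pi) ** 2 for yi, pi in zip(y, p)]
    total = 0.0
    for xi, li in zip(ref_x, sq_err):
        for xj, lj in zip(ref_x, sq_err):
            # Assumed Laplace kernel on the normalized explanatory variables.
            k = math.exp(-sum(abs(u - v) for u, v in zip(xi, xj)) / sigma)
            total += k * li * lj
    return math.sqrt(total)
```

As a small check by hand: two identical reference samples with squared errors 1 and 4 give K = 1 for every pair, so the sum is 1 + 4 + 4 + 16 = 25 and A = 5.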
With the reference data selection method, the method of calculating D in equation (1), and the method of calculating A in equation (1) by equation (3), all made concrete as described above, the mean squared error of the prediction model over the most recent week can be estimated during actual operation.
FIG. 6 is a schematic diagram of a user interface that presents the mean squared error estimates of this example to the user as a graph. The graph in FIG. 6 plots the date on the horizontal axis and the mean squared error on the vertical axis, and shows the measured mean squared error and the value estimated by this example as line graphs. FIG. 6 also shows, as a bar graph, the breakdown of the estimated mean squared error P of equation (1) into the value of B, the first term on the right side of equation (1), and the value of D×A, the second term. Because it takes one month to learn the value of the objective variable, that is, the number of ice creams sold, the measured mean squared error can be calculated and plotted only for predictions made up to one month ago. In contrast, the estimate of prediction performance according to the present disclosure can be calculated up to and including today's prediction. Displaying the breakdown at the same time also makes it easier for the user to understand the estimated prediction performance and how it changes.
(Example 2: Disease diagnosis prediction based on initial symptoms)
Next, as Example 2, a case of predicting a disease diagnosis from initial symptoms will be described. More specifically, a prediction model predicts the patient's disease (for example, a cold, influenza, allergic rhinitis, streptococcal infection, acute bronchitis, or measles) as the objective variable, using as explanatory variables symptoms such as the patient's body temperature today, yesterday, and the day before yesterday, and the presence or absence of a sore throat, nasal congestion, a cough, and fatigue. Determining the actual disease requires a diagnosis by a doctor, and time may pass before the diagnosis is made, before the symptoms needed for the diagnosis appear, and before the test results needed for the diagnosis are obtained. Until the diagnoses of all the patients concerned are available, the prediction accuracy of the prediction model for, say, the most recent 30 patients cannot be known. The estimation device according to the present disclosure is therefore used to estimate the prediction accuracy for the most recent 30 patients.
First, the prediction model is assumed to be created using data from a certain number of past patients as the model-creation data, for example the symptoms (explanatory variable data) and disease determination results (objective variable data) of 1000 previously recorded patients. Here, the data for the past 1000 patients is taken as the first explanatory variable data and the objective variable data corresponding to the first explanatory variable data, and is collectively called the first data. The created prediction model and the first data are held in a predetermined storage area.
Next, during actual operation, the prediction accuracy of the prediction model for the most recent 30 patients is estimated. That is, the explanatory variable data for the most recent 30 patients is used as the second explanatory variable data. Estimating the prediction accuracy requires a method for selecting the reference data, a method for calculating D in equation (1), and a method for calculating A in equation (1).
In this example, all of the first data is used as the reference data. In other words, all of the first data is selected as the reference data. As described later, D in equation (1) is calculated in this example based on whether each set of symptoms is unknown in comparison with the symptoms in the reference data. The symptoms that are actually known are those in all of the first data. Selecting all of the first data as the reference data therefore allows all of it to be treated as known symptoms, which enables a more suitable estimate of the prediction accuracy. Depending on the method of calculating D, the amount of first data, the availability of domain knowledge about the symptoms, and so on, the reference data may be selected by a more elaborate method, but in this example all of the data is selected for the reason given above.
D in equation (1) is set to the proportion of samples in the Min-Max-normalized second explanatory variable data for which no sample of the reference explanatory variable data exists within a certain distance, for example within a Euclidean distance of 1.0. This gives the proportion of samples in the second explanatory variable data that are not covered by the first explanatory variable data, that is, samples with explanatory variables unknown to the prediction model, on which the model has been neither trained nor validated. If every sample of the second explanatory variable data is covered by the first explanatory variable data, D is 0.
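A sketch of this coverage-based D follows. Fitting the Min-Max range on the reference data and the function names are assumptions; the description only states that the data is Min-Max normalized and that a Euclidean distance threshold such as 1.0 is used.

```python
import math

def min_max_fit(reference):
    """Per-variable minimum and maximum of the reference data."""
    cols = list(zip(*reference))
    return [min(c) for c in cols], [max(c) for c in cols]

def min_max_apply(data, lows, highs):
    """Scale each variable to [0, 1] over the fitted range (0.0 if the range is empty)."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, lows, highs)]
        for row in data
    ]

def unknown_fraction(second, reference, radius=1.0):
    """Proportion of second-data samples with no reference sample within radius."""
    unknown = sum(
        1 for s in second
        if all(math.dist(s, r) > radius for r in reference)
    )
    return unknown / len(second)
```

Both data sets would be passed through `min_max_apply` with the same fitted range before calling `unknown_fraction`, so that second-data samples outside the reference range map outside [0, 1] and can register as unknown.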
Finally, A in equation (1) is set to the constant -1. Together with the method of calculating D, this is an estimation method based on the assumption that the prediction model's predictions are generally wrong for samples on which it has been neither trained nor validated.
With the above method of selecting the reference data and the above settings of D and A in equation (1), the prediction accuracy of the prediction model for the most recent 30 patients can be estimated by taking the accuracy of the prediction model on the model-creation data as an upper bound and assuming that the model's predictions are wrong for samples on which it has been neither trained nor validated.
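Under the assumption that equation (1) reads P = B + D × A, with B taken as the accuracy on the reference data, the estimate for this example reduces to B minus the unknown-sample proportion. The sketch below illustrates this; the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def estimate_accuracy(ref_y, ref_pred, d, a=-1.0):
    """Assumed equation (1): P = B + D * A, with A fixed to -1 as in this example."""
    return accuracy(ref_y, ref_pred) + d * a
```

For instance, a reference accuracy of 0.75 with a quarter of the recent patients showing unknown symptoms (D = 0.25) gives an estimated accuracy of 0.5.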
Although the present disclosure has been described above with reference to the embodiments and examples, the present disclosure is not limited to the embodiments described above. Various changes that can be understood by those skilled in the art may be made to the configuration and details of the present disclosure within its scope. Each of the means and functions described above may be executed by an information processing device installed at and connected to any location on a network, that is, by so-called cloud computing.
Note that the program described above can be stored on and supplied to a computer using various types of non-transitory computer readable media. Non-transitory computer readable media include various types of tangible storage media. Examples of non-transitory computer readable media include magnetic recording media (e.g., flexible disks, magnetic tapes, hard disk drives), magneto-optical recording media (e.g., magneto-optical disks), CD-ROM (Read Only Memory), CD-R, CD-R/W, and semiconductor memories (e.g., mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, RAM (Random Access Memory)). The program may also be supplied to a computer via various types of transitory computer readable media. Examples of transitory computer readable media include electrical signals, optical signals, and electromagnetic waves. A transitory computer readable medium can supply the program to a computer via a wired communication path, such as an electric wire or optical fiber, or via a wireless communication path.
<Supplementary notes>
Part or all of the above embodiments may also be described as in the following supplementary notes. The following outlines the configurations of the estimation device, estimation method, and program according to the present invention. However, the present invention is not limited to the following configurations.
(Supplementary note 1)
An estimation device comprising:
data selection means for selecting, from first explanatory variable data prepared according to a preset prediction model and first objective variable data corresponding to the first explanatory variable data, the first explanatory variable data and the corresponding first objective variable data, based on second explanatory variable data with which no objective variable is associated; and
performance estimation means for estimating the prediction performance of the prediction model on the second explanatory variable data, based on a comparison between prediction data obtained by inputting the selected first explanatory variable data into the prediction model and the first objective variable data corresponding to that first explanatory variable data.
(Supplementary note 2)
The estimation device according to supplementary note 1, wherein
the performance estimation means estimates the prediction performance of the prediction model on the second explanatory variable data based on the comparison between the prediction data and the first objective variable data and on a comparison between the second explanatory variable data and the selected first explanatory variable data.
(Supplementary note 3)
The estimation device according to supplementary note 2, wherein
the performance estimation means estimates the prediction performance of the prediction model on the second explanatory variable data based on the prediction performance of the prediction model on the selected first explanatory variable data, which is obtained from the comparison between the prediction data and the first objective variable data, and on a difference in data content, measured by a preset criterion, between the second explanatory variable data and the selected first explanatory variable data.
(Supplementary note 4)
The estimation device according to supplementary note 2, wherein
the performance estimation means performs the estimation such that the performance of the prediction model on the second explanatory variable data deteriorates as the difference in data content, measured by a preset criterion, between the second explanatory variable data and the selected first explanatory variable data increases.
(Supplementary note 5)
The estimation device according to supplementary note 4, wherein
the performance estimation means performs the estimation such that the value of the prediction performance of the prediction model on the selected first explanatory variable data, which is obtained from the comparison between the prediction data and the first objective variable data, worsens as the difference in data content, measured by a preset criterion, between the second explanatory variable data and the selected first explanatory variable data increases.
(Supplementary note 6)
The estimation device according to supplementary note 4, wherein
the performance estimation means calculates a deterioration rate of the performance of the prediction model on the second explanatory variable data, corresponding to the difference between the distribution of the second explanatory variable data and the distribution of the selected first explanatory variable data, by using a formula for the prediction performance deterioration rate that is based on distributionally robust optimization and is computed from the selected first explanatory variable data and from the comparison between the prediction data and the selected first objective variable data, and estimates the performance of the prediction model on the second explanatory variable data based on the deterioration rate.
(Supplementary note 7)
The estimation device according to supplementary note 1, wherein
the data selection means selects, from among the first explanatory variable data, data different from the data used when training the prediction model.
(Supplementary note 8)
The estimation device according to supplementary note 1, wherein
the data selection means selects, from among the first explanatory variable data, data that has a preset relationship with the second explanatory variable data with respect to the prediction performance of the prediction model.
(Supplementary note 9)
The estimation device according to supplementary note 1, wherein
the data selection means selects, from among the first explanatory variable data, the first explanatory variable data whose difference in data content from the second explanatory variable data, measured by a preset criterion, is smaller than that of the other data.
(Supplementary note 10)
An estimation method comprising:
selecting, from first explanatory variable data prepared according to a preset prediction model and first objective variable data corresponding to the first explanatory variable data, the first explanatory variable data and the corresponding first objective variable data, based on second explanatory variable data with which no objective variable is associated; and
estimating the prediction performance of the prediction model on the second explanatory variable data, based on a comparison between prediction data obtained by inputting the selected first explanatory variable data into the prediction model and the first objective variable data corresponding to that first explanatory variable data.
(Supplementary note 11)
A computer-readable storage medium storing a program for causing a computer to execute processing of:
selecting, from first explanatory variable data prepared according to a preset prediction model and first objective variable data corresponding to the first explanatory variable data, the first explanatory variable data and the corresponding first objective variable data, based on second explanatory variable data with which no objective variable is associated; and
estimating the prediction performance of the prediction model on the second explanatory variable data, based on a comparison between prediction data obtained by inputting the selected first explanatory variable data into the prediction model and the first objective variable data corresponding to that first explanatory variable data.
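As an informal illustration of supplementary notes 1 and 9, the following minimal Python sketch selects labeled reference data close to the unlabeled data and evaluates the model only on that selection. It is not the disclosed implementation: the function names `select_reference` and `estimate_performance`, the use of Euclidean distance to the centroid of the second explanatory variable data as the "preset criterion", and mean absolute error as the performance measure are all assumptions made for this example.

```python
import math
from statistics import fmean


def select_reference(first_x, first_y, second_x, k):
    """Pick the k labeled rows closest to the centroid of the unlabeled rows.

    Assumption: "difference in data content" is Euclidean distance to the centroid.
    """
    dims = len(second_x[0])
    centroid = [fmean(row[d] for row in second_x) for d in range(dims)]
    ranked = sorted(range(len(first_x)),
                    key=lambda i: math.dist(first_x[i], centroid))
    chosen = ranked[:k]
    return [first_x[i] for i in chosen], [first_y[i] for i in chosen]


def estimate_performance(model, ref_x, ref_y):
    """Mean absolute error of the model on the selected reference data."""
    return fmean(abs(model(x) - y) for x, y in zip(ref_x, ref_y))


def model(x):
    """Hypothetical prediction model used only for this toy example."""
    return 2.0 * x[0]


# First data: labeled explanatory/objective pairs (toy values).
first_x = [[0.0], [1.0], [3.0], [10.0]]
first_y = [0.0, 2.5, 6.0, 30.0]
# Second data: explanatory variables only, no objective variable.
second_x = [[0.5], [1.5]]

ref_x, ref_y = select_reference(first_x, first_y, second_x, k=2)
estimated_mae = estimate_performance(model, ref_x, ref_y)
print(ref_x, estimated_mae)
```

In this toy setup the labeled point at 10.0 lies far from the unlabeled data and is excluded from the evaluation, so the estimate reflects the region in which the model will actually be used.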
1 Estimation device
3 Reference data selection section
4 Performance estimation section
10 Second data acquisition section
20 Output section
30 Control section
40 Storage section
41 First data
42 Prediction model
43 Parameter information
100 Estimation device
101 CPU
102 ROM
103 RAM
104 Program group
105 Storage device
106 Drive device
107 Communication interface
108 Input/output interface
109 Bus
110 Storage medium
111 Communication network
121 Data selection means
122 Performance estimation means
Claims (11)
1. An estimation device comprising:
   data selection means for selecting, from first explanatory variable data prepared according to a preset prediction model and first objective variable data corresponding to the first explanatory variable data, the first explanatory variable data and the corresponding first objective variable data, based on second explanatory variable data with which no objective variable is associated; and
   performance estimation means for estimating the prediction performance of the prediction model on the second explanatory variable data, based on a comparison between prediction data obtained by inputting the selected first explanatory variable data into the prediction model and the first objective variable data corresponding to that first explanatory variable data.
2. The estimation device according to claim 1, wherein
   the performance estimation means estimates the prediction performance of the prediction model on the second explanatory variable data based on the comparison between the prediction data and the first objective variable data and on a comparison between the second explanatory variable data and the selected first explanatory variable data.
3. The estimation device according to claim 2, wherein
   the performance estimation means estimates the prediction performance of the prediction model on the second explanatory variable data based on the prediction performance of the prediction model on the selected first explanatory variable data, which is obtained from the comparison between the prediction data and the first objective variable data, and on a difference in data content, measured by a preset criterion, between the second explanatory variable data and the selected first explanatory variable data.
4. The estimation device according to claim 2, wherein
   the performance estimation means performs the estimation such that the performance of the prediction model on the second explanatory variable data deteriorates as the difference in data content, measured by a preset criterion, between the second explanatory variable data and the selected first explanatory variable data increases.
5. The estimation device according to claim 4, wherein
   the performance estimation means performs the estimation such that the value of the prediction performance of the prediction model on the selected first explanatory variable data, which is obtained from the comparison between the prediction data and the first objective variable data, worsens as the difference in data content, measured by a preset criterion, between the second explanatory variable data and the selected first explanatory variable data increases.
6. The estimation device according to claim 4, wherein
   the performance estimation means calculates a deterioration rate of the performance of the prediction model on the second explanatory variable data, corresponding to the difference between the distribution of the second explanatory variable data and the distribution of the selected first explanatory variable data, by using a formula for the prediction performance deterioration rate that is based on distributionally robust optimization and is computed from the selected first explanatory variable data and from the comparison between the prediction data and the selected first objective variable data, and estimates the performance of the prediction model on the second explanatory variable data based on the deterioration rate.
7. The estimation device according to claim 1, wherein
   the data selection means selects, from among the first explanatory variable data, data different from the data used when training the prediction model.
8. The estimation device according to claim 1, wherein
   the data selection means selects, from among the first explanatory variable data, data that has a preset relationship with the second explanatory variable data with respect to the prediction performance of the prediction model.
9. The estimation device according to claim 1, wherein
   the data selection means selects, from among the first explanatory variable data, the first explanatory variable data whose difference in data content from the second explanatory variable data, measured by a preset criterion, is smaller than that of the other data.
10. An estimation method comprising:
    selecting, from first explanatory variable data prepared according to a preset prediction model and first objective variable data corresponding to the first explanatory variable data, the first explanatory variable data and the corresponding first objective variable data, based on second explanatory variable data with which no objective variable is associated; and
    estimating the prediction performance of the prediction model on the second explanatory variable data, based on a comparison between prediction data obtained by inputting the selected first explanatory variable data into the prediction model and the first objective variable data corresponding to that first explanatory variable data.
11. A computer-readable storage medium storing a program for causing a computer to execute processing of:
    selecting, from first explanatory variable data prepared according to a preset prediction model and first objective variable data corresponding to the first explanatory variable data, the first explanatory variable data and the corresponding first objective variable data, based on second explanatory variable data with which no objective variable is associated; and
    estimating the prediction performance of the prediction model on the second explanatory variable data, based on a comparison between prediction data obtained by inputting the selected first explanatory variable data into the prediction model and the first objective variable data corresponding to that first explanatory variable data.
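The monotone relationship in claims 4 to 6 (a larger distribution difference yields a worse estimated performance) can be illustrated with a generic bound from distributionally robust optimization. This is a sketch under stated assumptions, not the formula used in the disclosure. If Q is the distribution of the second data, P that of the selected first data, and Q is absolutely continuous with respect to P, the Cauchy-Schwarz inequality gives E_Q[loss] <= E_P[loss] + sqrt(chi2(Q||P) * Var_P[loss]), where chi2 is the Pearson chi-square divergence. The names `worst_case_loss` and `deterioration_rate`, the shift budget `rho` standing in for chi2(Q||P), and the use of sample mean and variance in place of the population values are all assumptions made for this example.

```python
import math
from statistics import fmean, pvariance


def worst_case_loss(losses, rho):
    """Upper bound on the mean loss under a shift with chi-square divergence <= rho.

    losses: per-sample losses of the model on the selected first data.
    rho:    assumed (hypothetical) shift budget, rho >= 0.
    """
    if rho < 0:
        raise ValueError("rho must be non-negative")
    return fmean(losses) + math.sqrt(rho * pvariance(losses))


def deterioration_rate(losses, rho):
    """Ratio of the worst-case loss to the loss on the reference data.

    Requires a strictly positive mean loss on the reference data.
    """
    base = fmean(losses)
    if base <= 0:
        raise ValueError("mean loss must be positive")
    return worst_case_loss(losses, rho) / base


# Toy 0/1 losses on the selected reference data (50 % error rate).
losses = [0.0, 1.0, 0.0, 1.0]
for rho in (0.0, 0.04, 0.16):
    print(rho, worst_case_loss(losses, rho), deterioration_rate(losses, rho))
```

With `rho = 0` the estimate equals the error measured on the reference data, and it grows with the square root of `rho`, so the estimated performance worsens as the assumed distribution difference increases.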
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2022/020927 WO2023223533A1 (en) | 2022-05-20 | 2022-05-20 | Estimation device, estimation method, and program |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2022/020927 WO2023223533A1 (en) | 2022-05-20 | 2022-05-20 | Estimation device, estimation method, and program |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023223533A1 (en) | 2023-11-23 |
Family
ID=88835021
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2022/020927 WO2023223533A1 (en) | 2022-05-20 | 2022-05-20 | Estimation device, estimation method, and program |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2023223533A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070260563A1 (en) * | 2006-04-17 | 2007-11-08 | International Business Machines Corporation | Method to continuously diagnose and model changes of real-valued streaming variables |
WO2019229977A1 (en) * | 2018-06-01 | 2019-12-05 | 株式会社 東芝 | Estimation system, estimation method, and estimation program |
JP2022034697A (en) * | 2020-08-19 | 2022-03-04 | 株式会社日立製作所 | Work support system, and work support method |
Similar Documents
Publication | Title |
---|---|
US11023806B2 (en) | Learning apparatus, identifying apparatus, learning and identifying system, and recording medium |
JP7140410B2 (en) | Forecasting system, forecasting method and forecasting program |
JP2022037241A (en) | Abnormality detection system, abnormality detection method, abnormality detection program, and method for generating learned model |
JP6833642B2 (en) | Factor analyzers, factor analysis methods, and programs |
JP5768834B2 (en) | Plant model management apparatus and method |
US20210012244A1 (en) | Prediction system, model generation system, method, and program |
JP5867349B2 (en) | Quality prediction apparatus, operation condition determination method, quality prediction method, computer program, and computer-readable storage medium |
JP2018092445A5 (en) | |
JP5251217B2 (en) | Sales number prediction system, operation method of sales number prediction system, and sales number prediction program |
JP2019128904A (en) | Prediction system, simulation system, method and program |
US20210248293A1 (en) | Optimization device and optimization method |
JP7481902B2 (en) | Management computer, management program, and management method |
Yan et al. | Functional principal components analysis on moving time windows of longitudinal data: dynamic prediction of times to event |
JP2018125897A (en) | Device and method for system operation decision-making support |
JP6930195B2 (en) | Model identification device, prediction device, monitoring system, model identification method and prediction method |
WO2023223533A1 (en) | Estimation device, estimation method, and program |
JP2021022051A (en) | Machine learning program, machine learning method, and machine learning apparatus |
KR20200051343A (en) | Method and apparatus for estimating a predicted time series data |
JP2024017791A (en) | Neural network training method, feature selection device, feature selection method, and computer program |
JP2021174330A (en) | Prediction device by ensemble learning of heterogeneous machine learning |
JP6932467B2 (en) | State change detection device, state change detection system and state change detection program |
JP5826893B1 (en) | Change point prediction apparatus, change point prediction method, and computer program |
WO2023181244A1 (en) | Model analysis device, model analysis method, and recording medium |
WO2024180789A1 (en) | Information processing device, information processing method, program |
WO2024070169A1 (en) | Trial production condition proposal system and trial production condition proposal method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: The EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22942737; Country of ref document: EP; Kind code of ref document: A1 |