JP7360016B2

JP7360016B2 - Data processing method, data processing device, and program

Info

Publication number: JP7360016B2
Application number: JP2019139631A
Authority: JP
Inventors: 直哉古渡; 正隆小石
Original assignee: Yokohama Rubber Co Ltd
Current assignee: Yokohama Rubber Co Ltd
Priority date: 2019-07-30
Filing date: 2019-07-30
Publication date: 2023-10-12
Anticipated expiration: 2039-07-30
Also published as: JP2021022276A

Description

The present invention relates to a data processing method, a data processing device, and a program for forming, by computer, a prediction module that receives the values of a plurality of explanatory variables as input and predicts and outputs a value related to a predetermined feature quantity.

In recent years, techniques in which a computer is trained by machine learning to make various predictions from input data have been actively proposed. Meanwhile, vulcanized rubber compositions have conventionally been prototyped by blending a plurality of rubber materials, fillers, oils, and the like through trial and error, and their physical property data have been measured experimentally. As a result, a large amount of data linking the formulation information of vulcanized rubber compositions to physical property values has accumulated. This accumulated data can be used as a learning data set so that a computer, through machine learning, predicts physical property values from input data.

For example, a technique is known that uses a neural network to learn the mapping between a group of factors, which are explanatory variables such as design and formulation, and a group of characteristics, which are feature quantities. The technique estimates the values of the feature quantities from the values of the explanatory variables and, for any given feature quantity value, efficiently and easily finds the optimal values of the explanatory variables that produce it (Patent Document 1).

Patent Document 1: Japanese Patent Application Publication No. 2003-58582

In the neural network learning of this technique, all of the prepared original data is read uniformly and used as training data. The original data often includes data obtained from past experiments and simulations. That is, the original data often holds a plurality of both experimental data, in which the feature quantity value is an experimental value of the measurement object, and simulation data, in which the feature quantity value is a simulation calculation value computed by running a simulation with a simulation model of the measurement object.
A simulation is computed with a simulation model so as to reproduce the experiment. When the factors (explanatory variables) are varied, the change in the simulated feature quantity value corresponds to the change in the experimentally obtained feature quantity value. However, the simulated value itself often does not match the experimental value, and a deviation often exists between them.
For this reason, it is difficult to create a prediction module that has machine-learned the relationship between the explanatory variables and the feature quantity from original data in which experimental data and simulation data are simply lumped together.

Accordingly, an object of the present invention is to provide a data processing method, a data processing device, and a program causing a computer to execute the data processing method that, when a computer determines a prediction module that receives the values of a plurality of explanatory variables and predicts and outputs a value related to a predetermined feature quantity, can efficiently create a prediction module with high prediction accuracy that has machine-learned the relationship between the explanatory variables and the feature quantity using an original data set containing experimental data and simulation data.

One aspect of the present invention is a data processing method for forming, by computer, a prediction module that receives the values of a plurality of explanatory variables as input and predicts and outputs a value related to a predetermined feature quantity. The data processing method comprises:
a step in which a computer uses an original data set to create a corrected simulation data set, wherein the original data set holds a plurality of data items each pairing the values of a plurality of explanatory variables with a feature quantity value to be associated with those values, the data items including experimental data, in which the feature quantity value is an experimental value of a measurement object, and simulation data, in which the feature quantity value is a simulation calculation value computed by running a simulation with a simulation model of the measurement object, and wherein the corrected simulation data set is composed of corrected simulation data obtained by correcting the feature quantity values in the simulation data based on the correspondence between the feature quantity values in the simulation data and the feature quantity values in the experimental data;
a step in which the computer separates each of the corrected simulation data set and an experimental data set composed of the experimental data into a learning data set and a verification data set, thereby generating a learning corrected simulation data set, a learning experimental data set, a verification corrected simulation data set, and a verification experimental data set;
a step in which the computer creates a plurality of prediction module candidates, each of which has machine-learned the relationship between the explanatory variables and the feature quantity, using each of the learning corrected simulation data set, the learning experimental data set, and a learning integrated data set that integrates the learning corrected simulation data set and the learning experimental data set;
a step in which the computer evaluates the prediction accuracy of each of the machine-learned prediction module candidates using the verification corrected simulation data set, the verification experimental data set, and a verification integrated data set that integrates the verification corrected simulation data set and the verification experimental data set; and
a step in which the computer determines the prediction module from among the plurality of prediction module candidates based on the evaluation results of the prediction accuracy.
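The five steps above can be sketched end to end as follows. This is only an illustrative sketch on synthetic data: the linear least-squares learner, the linear correction fitted with `np.polyfit`, the 80/20 split ratio, the RMSE metric, and the rule "choose the candidate with the lowest error on the integrated verification set" are all assumptions made for the example, not details given in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit(X, y):
    """Least-squares linear model; a stand-in for any learner (assumption)."""
    A = np.column_stack([X, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def rmse(w, X, y):
    """Root-mean-square prediction error of weights w on (X, y)."""
    pred = np.column_stack([X, np.ones(len(X))]) @ w
    return float(np.sqrt(np.mean((pred - y) ** 2)))

def split(X, y, ratio=0.8):
    """Separate one data set into a learning part and a verification part."""
    idx = rng.permutation(len(X))
    k = int(ratio * len(X))
    return (X[idx[:k]], y[idx[:k]]), (X[idx[k:]], y[idx[k:]])

def merge(p, q):
    """Integrate two (X, y) data sets into one."""
    return np.vstack([p[0], q[0]]), np.concatenate([p[1], q[1]])

def truth(X):
    """Synthetic ground truth, used only to generate toy data."""
    return 2.0 * X[:, 0] - X[:, 1] + 5.0

# Toy data: the simulation follows the experimental trend but with a deviation.
X_exp = rng.uniform(0, 1, (40, 2))
y_exp = truth(X_exp) + rng.normal(0, 0.05, 40)
X_sim = rng.uniform(0, 1, (200, 2))
y_sim = 0.8 * truth(X_sim) + 1.0

# Step 1: correct the simulated values (assumed linear correspondence, fitted on
# simulations run at the experimental conditions).
y_sim_at_exp = 0.8 * truth(X_exp) + 1.0
a, b = np.polyfit(y_sim_at_exp, y_exp, 1)
y_cor = a * y_sim + b

# Step 2: split each data set into learning and verification parts.
exp_tr, exp_va = split(X_exp, y_exp)
sim_tr, sim_va = split(X_sim, y_cor)
train = {"sim": sim_tr, "exp": exp_tr, "integrated": merge(sim_tr, exp_tr)}
valid = {"sim": sim_va, "exp": exp_va, "integrated": merge(sim_va, exp_va)}

# Step 3: one prediction module candidate per learning data set.
candidates = {name: fit(*data) for name, data in train.items()}

# Step 4: evaluate every candidate on every verification data set.
scores = {c: {v: rmse(w, *valid[v]) for v in valid} for c, w in candidates.items()}

# Step 5: pick the candidate with the lowest error on the integrated
# verification set (this selection rule is an assumption).
best = min(scores, key=lambda c: scores[c]["integrated"])
print(best, scores[best])
```

Keeping the full candidate-by-verification-set score table, rather than a single number per candidate, leaves room for other selection rules than the one assumed here.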

Preferably, the simulation data includes simulation calculation values computed by running the simulation with the simulation model using the explanatory variable values that realize the maximum value and the minimum value, respectively, of the feature quantity in the plurality of experimental data, and
in correcting the feature quantity values, the feature quantity values of the learning corrected simulation data set are corrected using (i) the correspondence between the maximum and minimum values and the simulation calculation values corresponding to each of them, and (ii) the correspondence between simulation calculation values and feature quantity values in the experimental data for which the experimental feature quantity value lies between the maximum and minimum values and the explanatory variable values agree within an allowable range.
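One way to read this correction is as a lookup table built from matched pairs of simulated and experimental values, anchored at the experimental minimum and maximum. The sketch below assumes a maximum-absolute-difference tolerance for "agree within an allowable range" and piecewise-linear interpolation between the matched pairs. The function names, the tolerance `tol`, and the toy numbers are all hypothetical.

```python
import numpy as np

def matched_pairs(X_exp, y_exp, X_sim, y_sim, tol=1e-3):
    """Pair each experimental record with the nearest simulation record whose
    explanatory variables agree within `tol`; return rows of (sim, exp) values."""
    pairs = []
    for x, ye in zip(X_exp, y_exp):
        d = np.max(np.abs(X_sim - x), axis=1)   # worst-case variable mismatch
        j = int(np.argmin(d))
        if d[j] <= tol:
            pairs.append((y_sim[j], ye))
    return np.array(pairs)

def make_correction(pairs):
    """Piecewise-linear map from simulated to experimental feature values."""
    order = np.argsort(pairs[:, 0])
    xs, ys = pairs[order, 0], pairs[order, 1]
    return lambda y: np.interp(y, xs, ys)

# Toy data: the experimental minimum (10) and maximum (20) occur at x = 0 and
# x = 1, and simulations were also run at those two conditions.
X_exp = np.array([[0.0], [0.5], [1.0]])
y_exp = np.array([10.0, 14.0, 20.0])
X_sim = np.array([[0.0], [0.25], [0.5], [0.75], [1.0]])
y_sim = np.array([8.0, 9.0, 11.0, 13.0, 16.0])

pairs = matched_pairs(X_exp, y_exp, X_sim, y_sim)
correct = make_correction(pairs)
y_corrected = correct(y_sim)   # simulated values expressed on the experimental scale
print(y_corrected)
```

Note that `np.interp` clamps inputs outside the tabulated range to the end values, which is why having simulation runs at the conditions of the experimental minimum and maximum matters in this sketch.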

Preferably, when the prediction accuracy is evaluated, the prediction module candidate machine-learned with the learning integrated data set is evaluated for prediction accuracy on each of the verification corrected simulation data set, the verification experimental data set, and the verification integrated data set.

Preferably, when the prediction accuracy is evaluated, for the prediction module candidate machine-learned with the learning integrated data set,
(1) its prediction accuracy on the verification experimental data set is compared with the prediction accuracy, on the verification experimental data set, of the prediction module candidate machine-learned with the learning experimental data set,
(2) its prediction accuracy on the verification corrected simulation data set is compared with the prediction accuracy, on the verification corrected simulation data set, of the prediction module candidate machine-learned with the learning corrected simulation data set, and
the prediction module candidate machine-learned with the learning integrated data set is evaluated based on the comparison results.
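Comparisons (1) and (2) can be sketched as a small function over a table of error scores. The text does not say how the comparison results are turned into an evaluation, so the rule used here (the integrated candidate may be worse than each single-source candidate by at most `tolerance`) is an assumption, and the error values are placeholders rather than results from the patent.

```python
def compare_integrated(scores, tolerance):
    """scores[candidate][verification_set] holds an error (lower is better)."""
    # (1) integrated vs. experiment-only candidate on the verification experimental set
    gap_exp = scores["integrated"]["exp"] - scores["exp"]["exp"]
    # (2) integrated vs. simulation-only candidate on the verification corrected simulation set
    gap_sim = scores["integrated"]["sim"] - scores["sim"]["sim"]
    return {
        "gap_exp": gap_exp,
        "gap_sim": gap_sim,
        "acceptable": gap_exp <= tolerance and gap_sim <= tolerance,
    }

# Placeholder error values, for illustration only.
scores = {
    "exp": {"exp": 0.10},
    "sim": {"sim": 0.08},
    "integrated": {"exp": 0.12, "sim": 0.09},
}
result = compare_integrated(scores, tolerance=0.05)
print(result)
```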

Preferably, the simulation data includes first simulation data and second simulation data that differ in at least one of the configuration of the simulation model and the simulation method, and
creating the corrected simulation data set, generating the learning corrected simulation data set and the verification corrected simulation data set, creating the prediction module candidates, and evaluating the prediction accuracy are each performed using each of the first simulation data and the second simulation data.

Preferably, the feature quantity is a physical quantity acting on a tire, and
the values of the explanatory variables are values that define the tire.

Preferably, the method further comprises a step in which, in response to input of a target value for the feature quantity, the computer uses the prediction module to calculate optimal values of the explanatory variables that reproduce the target value, and
in the step of calculating the optimal values, the optimal values of the explanatory variables are calculated based on the feature quantity values that the prediction module predicts in response to the explanatory variable values input to it.
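The text does not name a search algorithm for this step. As a minimal sketch, assuming a plain random search over box-bounded explanatory variables, the prediction module can be queried many times and the input whose prediction is closest to the target kept. The `predict` function below is a hypothetical stand-in for a trained prediction module, and `find_optimum`, the bounds, and the sample count are likewise assumptions.

```python
import numpy as np

def find_optimum(predict, target, lower, upper, n_samples=20000, seed=0):
    """Random search: sample explanatory variable values within the bounds and
    return the sample whose predicted feature value is closest to `target`."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lower, upper, size=(n_samples, len(lower)))
    err = np.abs(predict(X) - target)
    i = int(np.argmin(err))
    return X[i], float(err[i])

def predict(X):
    """Hypothetical stand-in for the trained prediction module."""
    return 3.0 * X[:, 0] + X[:, 1] ** 2

x_best, err = find_optimum(predict, target=2.0, lower=[0.0, 0.0], upper=[1.0, 1.0])
print(x_best, err)
```

Because several combinations of explanatory variables can reproduce the same target, a search like this returns one of possibly many solutions; a gradient-based or evolutionary optimizer could be substituted without changing the interface.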

Preferably, the method further comprises a step of visualizing the relationship between the values of the explanatory variables and the value of the feature quantity.

Another aspect of the present invention is a data processing device, configured by a computer, that forms a prediction module that receives the values of a plurality of explanatory variables as input and predicts and outputs the value of a predetermined feature quantity. The data processing device comprises:
a data correction unit that uses an original data set holding a plurality of data items each pairing the values of a plurality of explanatory variables with a feature quantity value to be associated with those values, the data items including experimental data, in which the feature quantity value is an experimental value of a measurement object obtained with a testing machine, and simulation data, in which the feature quantity value is a simulation calculation value computed by running a simulation with a simulation model of the measurement object, and that corrects the feature quantity values in the simulation data based on the correspondence between the feature quantity values in the simulation data and the feature quantity values in the experimental data, to produce a corrected simulation data set composed of corrected simulation data;
a data set generation unit that separates each of the corrected simulation data set and an experimental data set composed of the experimental data into a learning data set and a verification data set, thereby generating a learning corrected simulation data set, a learning experimental data set, a verification corrected simulation data set, and a verification experimental data set;
a prediction module candidate creation unit that creates a plurality of prediction module candidates, each of which has machine-learned the relationship between the explanatory variables and the feature quantity, using each of the learning corrected simulation data set, the learning experimental data set, and a learning integrated data set that integrates the learning corrected simulation data set and the learning experimental data set;
a prediction module candidate evaluation unit that evaluates the prediction accuracy of each of the machine-learned prediction module candidates using the verification corrected simulation data set, the verification experimental data set, and a verification integrated data set that integrates the verification corrected simulation data set and the verification experimental data set; and
a prediction module determination unit that determines the prediction module from among the plurality of prediction module candidates based on the evaluation results of the prediction accuracy.

Yet another aspect of the present invention is a program that causes a computer to execute a data processing method for forming a prediction module that receives the values of a plurality of explanatory variables as input and predicts and outputs the value of a predetermined feature quantity. The program comprises:
a procedure of causing the computer to use an original data set holding a plurality of data items each pairing the values of a plurality of explanatory variables with a feature quantity value to be associated with those values, the data items including experimental data, in which the feature quantity value is an experimental value of a measurement object obtained with a testing machine, and simulation data, in which the feature quantity value is a simulation calculation value computed by running a simulation with a simulation model of the measurement object, to correct the feature quantity values in the simulation data based on the correspondence between the feature quantity values in the simulation data and the feature quantity values in the experimental data, and to generate a corrected simulation data set composed of corrected simulation data;
a procedure of causing the computer to separate each of the corrected simulation data set and an experimental data set composed of the experimental data into a learning data set and a verification data set, thereby generating a learning corrected simulation data set, a learning experimental data set, a verification corrected simulation data set, and a verification experimental data set;
a procedure of causing the computer to create a plurality of prediction module candidates, each of which has machine-learned the relationship between the explanatory variables and the feature quantity, using each of the learning corrected simulation data set, the learning experimental data set, and a learning integrated data set that integrates the learning corrected simulation data set and the learning experimental data set;
a procedure of causing the computer to evaluate the prediction accuracy of each of the machine-learned prediction module candidates using the verification corrected simulation data set, the verification experimental data set, and a verification integrated data set that integrates the verification corrected simulation data set and the verification experimental data set; and
a procedure of causing the computer to determine the prediction module from among the plurality of prediction module candidates based on the evaluation results of the prediction accuracy.

According to the data processing method, data processing device, and program described above, when a computer determines a prediction module that receives the values of a plurality of explanatory variables as input and predicts and outputs a value related to a predetermined feature quantity, a prediction module with high prediction accuracy that has machine-learned the relationship between the explanatory variables and the feature quantity can be created efficiently using an original data set containing experimental data and simulation data.

FIG. 1 is a diagram schematically illustrating an example of the flow of a data processing method according to an embodiment.
FIG. 2 is a diagram illustrating an example of the configuration of a data processing device according to an embodiment.
FIGS. 3(a) to 3(c) are diagrams illustrating an example of the correction of feature quantity values in simulation data performed by the data processing method of an embodiment.
FIG. 4 is a simplified diagram illustrating an example of the learning original data set used in the data processing method of an embodiment.
FIGS. 5(a) to 5(c) are diagrams showing examples of data sets created from the original data set in the data processing method of an embodiment.
FIG. 6 is a diagram showing an example of a data set created from the original data set in the data processing method of an embodiment.
FIG. 7 is a diagram illustrating an example of the creation of prediction module candidates and the use of verification data sets in the data processing method of an embodiment.
FIG. 8 is a diagram illustrating the correspondence between feature quantity values in simulation data and feature quantity values in experimental data used in the data processing method of an embodiment.

Hereinafter, a data processing method, a data processing device, and a program according to an embodiment will be described with reference to the attached figures.
FIG. 1 is a diagram schematically illustrating an example of the flow of the data processing method of the embodiment. FIG. 2 is a diagram illustrating an example of the configuration of the data processing device of the embodiment.
The data processing method of the embodiment is executed by a computer and creates a prediction module that receives the values of a plurality of explanatory variables as input and predicts and outputs a value related to a predetermined feature quantity.
The prediction module is determined from among a plurality of prediction module candidates, which are created using a plurality of learning data sets derived from the original data set, based on the results of evaluating those candidates with a plurality of verification data sets created separately from the original data set.

The data processing device 10 shown in FIG. 2 is configured by a computer including a CPU 12 and a memory 14. A display 30 and an input operation device 32, which includes a mouse and a keyboard for entering information and instructions, are connected to the data processing device 10.
The input operation device 32 is used by an operator to enter desired instructions into the data processing device 10. For example, the operator enters information from the input operation device 32 to set the conditions for creating prediction module candidates.
The display 30 is used to display the configured information. For example, it displays the numerical values of the data in the original data set, the learning original data set, the verification original data set, and the various learning and verification sub-data sets used in the data processing method; the explanatory variables; missing explanatory variables; a condition setting screen for creating prediction module candidates; and the evaluation results of the prediction accuracy of the prediction module candidates.

A program is stored in the memory 14. When the CPU 12 reads and executes the program, a simulation data correction unit 15, a sub-data set creation unit 16, a prediction module candidate creation unit 18, a prediction module candidate evaluation unit 20, a prediction module determination unit 22, and a prediction unit 24 function as software modules. The functions of these units are described below together with the flow of the data processing method of the embodiment shown in FIG. 1.

The computer holds in advance a prediction model that can become a prediction module through machine learning. This prediction model becomes a prediction module candidate by machine learning on a plurality of learning sub-data sets created from the original data set, and at least one of the prediction module candidates becomes the prediction module. Prediction models include models using neural networks, of which deep learning is a well-known example; models using the well-known random forest method, which performs classification or regression with a plurality of decision trees; and models using LASSO regression. A polynomial, kriging, or a nonlinear function using RBFs (radial basis functions) can also be used as the prediction model.

The original data set is a group of data holding many sets (for example, tens of thousands) that each pair the values of a plurality of explanatory variables with the feature quantity values to be associated with them. The explanatory variables include, for example, the design dimensions of a product, the structure and physical property values of the constituent materials used in the product, and the manufacturing conditions under which the product is made. The feature quantities include, for example, characteristic values of the product and its sales volume in the market. For example, when the original data set includes the design dimensions of a structure and the structure of its constituent materials as explanatory variables and a characteristic value of the structure as the feature quantity, each data item consists of the design dimension and material structure information for one variation of those dimensions and structures, together with the resulting characteristic value. In this case, the original data set therefore contains many data items that pair such design dimension and structure information with characteristic values.

The original data set is often a vast amount of data accumulated in the past. It holds a large number of data items, each pairing the values of a plurality of explanatory variables with the feature quantity value to be associated with them. These data items include many experimental data and many simulation data. Experimental data are data whose feature quantity value is an experimental value of the measurement object. Simulation data are data whose feature quantity value is a simulation calculation value computed by running a simulation with a simulation model of the measurement object. The simulation data shown in FIG. 1 includes simulation data 1 and simulation data 2, whose simulation calculation values (feature quantity values) were computed with different simulation models or by different simulation methods.
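As a rough illustration of how such an original data set might be represented, the sketch below tags each record with its origin so that the experimental data and each kind of simulation data can be separated later. The `Record` class, its field names, the source labels, and the numeric values are all hypothetical placeholders, not a format specified in the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    explanatory: tuple   # values of the explanatory variables
    feature: float       # associated feature quantity value
    source: str          # "experiment", "simulation1", or "simulation2" (assumed labels)

# Placeholder records; the numbers carry no physical meaning.
original = [
    Record((0.30, 1.5), 12.1, "experiment"),
    Record((0.30, 1.5), 10.4, "simulation1"),
    Record((0.30, 1.5), 11.0, "simulation2"),
    Record((0.45, 1.2), 13.0, "simulation1"),
]

def subset(records, source):
    """Select the records that came from one source."""
    return [r for r in records if r.source == source]

experimental = subset(original, "experiment")
simulation_1 = subset(original, "simulation1")
simulation_2 = subset(original, "simulation2")
print(len(experimental), len(simulation_1), len(simulation_2))
```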

The simulation data correction unit 15 uses the experimental data to correct the simulation calculation values (feature quantity values) in the simulation data of the original data set (ST10 in FIG. 1). This creates a corrected simulation data set composed of a plurality of corrected simulation data, each pairing explanatory variable values with a corrected feature quantity value. A data set composed of the experimental data is called an experimental data set.

FIGS. 3(a) to 3(c) illustrate an example of correcting the feature values in the simulation data. FIG. 3(a) shows an example of the feature values of the experimental data and the simulation data when the explanatory variables of the two agree within a permissible range. FIG. 3(b) plots the feature values of FIG. 3(a) on a graph whose horizontal axis is the feature quantity of the simulation data and whose vertical axis is the feature quantity of the experimental data.
As FIG. 3(b) shows, the feature values of the simulation data and the experimental data are in one-to-one correspondence. Using this relationship, each feature value in the simulation data is converted into the corresponding experimental-data value, as shown in FIG. 3(c), and the converted value is taken as the value of the corrected simulation data.
When the simulation data include simulation data 1 and simulation data 2, a relationship like that of FIG. 3(b) is obtained separately for each of them, and the feature values in simulation data 1 and 2 are converted into experimental-data values to obtain the corrected simulation data values.
When the feature values of the experimental data and of the simulation data are each arranged in order of magnitude and the resulting ranking differs between the two, the average of the two feature values whose ranks differ may be used as the new feature value in place of those two values, for both the experimental data and the simulation data.
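The conversion described for FIGS. 3(b) and 3(c) can be sketched as a piecewise-linear mapping built from matched pairs of simulated and experimental feature values. This is only a minimal illustration under stated assumptions: the function names, the sample pairs, and the choice to clamp values outside the calibrated range are hypothetical, and the rank-averaging step mentioned above is not implemented (non-monotone input is simply rejected).

```python
from bisect import bisect_right

def build_calibration(pairs):
    """Return a function mapping a simulated value onto the experimental scale.

    pairs: (simulated value, experimental value) for conditions whose
    explanatory variables agree within the permissible range.
    """
    points = sorted(pairs)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    if any(b <= a for a, b in zip(xs, xs[1:])):
        raise ValueError("simulated values must be distinct")
    if any(b < a for a, b in zip(ys, ys[1:])):
        raise ValueError("rank order differs; average the offending values first")

    def to_experimental_scale(sim_value):
        # Assumption: clamp outside the calibrated range instead of extrapolating.
        if sim_value <= xs[0]:
            return ys[0]
        if sim_value >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, sim_value) - 1
        t = (sim_value - xs[i]) / (xs[i + 1] - xs[i])
        return ys[i] + t * (ys[i + 1] - ys[i])

    return to_experimental_scale

# Made-up matched pairs, for illustration only.
calibrate = build_calibration([(10.0, 12.0), (20.0, 21.0), (30.0, 33.0)])
corrected = [calibrate(v) for v in (15.0, 20.0, 25.0)]
```

With two kinds of simulation data, one such mapping would be built per kind, each from its own matched pairs.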

Next, the sub-data set creation unit 16 separates each of the corrected simulation data sets 1 and 2 (composed of corrected simulation data) and the experimental data set (composed of experimental data) into a learning data set and a verification data set. This yields the learning corrected simulation data sets 1 and 2 and the learning experimental data set as learning sub-data sets, and the verification corrected simulation data sets 1 and 2 and the verification experimental data set as verification sub-data sets. The unit further creates, as an additional learning sub-data set, a learning integrated data set that combines the learning corrected simulation data sets 1 and 2 with the learning experimental data set, and, as an additional verification sub-data set, a verification integrated data set that combines the verification corrected simulation data sets 1 and 2 with the verification experimental data set (ST12 and ST14 in FIG. 1).
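The separation and integration steps (ST12, ST14) might look like the following sketch. The 80/20 split ratio, the random shuffling, the seed, and all names are assumptions made for illustration; the source does not state how the separation is performed.

```python
import random

def split(rows, validation_fraction=0.2, seed=0):
    """Shuffle a copy of rows and return (learning, verification)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]

# Toy stand-ins for the three source data sets (record ids only).
sources = {
    "experimental": [f"exp-{i}" for i in range(10)],
    "corrected_sim_1": [f"sim1-{i}" for i in range(5)],
    "corrected_sim_2": [f"sim2-{i}" for i in range(5)],
}

learning, verification = {}, {}
for name, rows in sources.items():
    learning[name], verification[name] = split(rows)

# The integrated sets are built from the already-split parts, so no record
# can end up on both the learning side and the verification side.
learning["integrated"] = [r for name in sources for r in learning[name]]
verification["integrated"] = [r for name in sources for r in verification[name]]
```

Building the integrated sets after splitting, rather than splitting a pooled set, keeps every verification record unseen by every candidate.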

FIG. 4 is a simplified illustration of an example of the learning original data set obtained when the original data set is separated into a learning original data set and a verification original data set. FIGS. 5(a) to 5(c) and FIG. 6 show examples of the learning sub-data sets created from the learning original data set in the data processing method of one embodiment.
The learning original data set shown in FIG. 4 includes explanatory variables X_1 to X_n (n is a natural number) and, as data for those explanatory variables, data 1 to data 9. The number of data in FIG. 4 is nine, but in practice it is several thousand to several tens of thousands.
The learning original data set shown in FIG. 4 has one feature quantity, but it may have more than one.
Here, "..." in FIG. 4 indicates that a numerical value is actually present. For the "simulation index" in the figure, "0" indicates non-simulation data, "1" indicates data obtained by simulation 1, and "2" indicates data obtained by simulation 2, which differs from simulation 1 in its simulation model or simulation method. The "test machine index" in the figure indicates the type of test machine for non-simulation data: "1" indicates experimental data obtained with test machine 1, and "0" indicates that the data are not experimental data obtained with a test machine. In FIG. 4, the feature values of data 1 to 9 are denoted V1 to V9.

FIG. 5(a) shows the learning experimental data set, composed of the data in the learning original data whose "simulation index" is "0". FIG. 5(b) shows the learning corrected simulation data set 1, composed of the data whose "simulation index" is "1", and FIG. 5(c) shows the learning corrected simulation data set 2, composed of the data whose "simulation index" is "2". Because the feature values in the learning corrected simulation data sets 1 and 2 are corrected values, the feature values shown in FIGS. 5(b) and 5(c) are V4' to V9'.
FIG. 6 shows the learning integrated data set, which is composed of corrected simulation data and experimental data.

In the original data set, the "simulation index" and the "test machine index" are explanatory variables in addition to X_1 to X_n, so the number of explanatory variables is n+2. In the learning experimental data set, the learning corrected simulation data sets 1 and 2, and the learning integrated data set, the explanatory variables are the n variables X_1 to X_n.
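The difference between the n+2 inputs of the original data set and the n inputs of the sub-data sets can be illustrated with a small sketch modeled on FIGS. 4 to 6. The record layout, the field names, and the placeholder correction factors are hypothetical; in practice the corrections would come from the relationship of FIG. 3(b).

```python
# Hypothetical records modeled on FIG. 4, with n = 2 explanatory variables.
original = [
    {"x": [1.0, 2.0], "sim_index": 0, "tester_index": 1, "value": 10.0},
    {"x": [1.5, 2.5], "sim_index": 0, "tester_index": 1, "value": 12.0},
    {"x": [1.0, 2.0], "sim_index": 1, "tester_index": 0, "value": 9.0},
    {"x": [2.0, 3.0], "sim_index": 2, "tester_index": 0, "value": 15.0},
]

def original_inputs(record):
    """Inputs for a candidate trained on the original data set: n + 2 values."""
    return record["x"] + [record["sim_index"], record["tester_index"]]

def sub_dataset(records, sim_index, correct=lambda v: v):
    """Rows of one source, keeping only X_1..X_n.

    `correct` maps a simulated value onto the experimental scale;
    the default (identity) is meant for experimental data.
    """
    return [
        {"x": list(r["x"]), "value": correct(r["value"])}
        for r in records
        if r["sim_index"] == sim_index
    ]

experimental = sub_dataset(original, 0)
# Placeholder corrections, chosen arbitrarily for illustration.
corrected_sim_1 = sub_dataset(original, 1, correct=lambda v: 1.1 * v)
corrected_sim_2 = sub_dataset(original, 2, correct=lambda v: 0.9 * v)
integrated = experimental + corrected_sim_1 + corrected_sim_2
```

Once the simulated values are on the experimental scale, the index columns carry no further information for the sub-data sets, which is why they can be dropped there.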

The prediction module candidate creation unit 18 trains a prediction model by machine learning on the learning original data set and on each of the created learning sub-data sets, thereby creating prediction module candidates 1 to 5 (ST16 in FIG. 1). Deep learning is used for the machine learning of the prediction model; for example, prediction module candidates are created with a layer configuration based on the input conditions, such as a configuration of one to seven layers.

FIG. 7 illustrates an example of how the prediction module candidates are created and how the verification data sets, described later, are used.
Prediction module candidates 1 to 5 are created by having the prediction model learn, by machine learning, the relationship between the explanatory variables and the feature quantity, using respectively the learning original data set, the learning corrected simulation data sets 1 and 2, the learning experimental data set, and the learning integrated data set described above. A transfer learning method can also be used in the machine learning of the prediction module.

Accordingly, in prediction module candidate 1, created from the learning original data set, X_1 to X_n, the "simulation index", and the "test machine index" are defined as explanatory variables, so there are n+2 explanatory variables. In prediction module candidates 2 to 5, created from the learning corrected simulation data sets 1 and 2, the learning experimental data set, and the learning integrated data set, respectively, X_1 to X_n are defined as explanatory variables, so there are n explanatory variables.

The prediction module candidate evaluation unit 20 evaluates the prediction accuracy of each of the machine-learned prediction module candidates 1 to 5 using the verification corrected simulation data sets 1 and 2, the verification experimental data set, and the verification integrated data set, which combines the verification corrected simulation data sets 1 and 2 with the verification experimental data set (ST18 in FIG. 1).

Like the learning corrected simulation data sets 1 and 2, the learning experimental data set, and the learning integrated data set, the verification sub-data sets (the verification corrected simulation data sets 1 and 2, the verification experimental data set, and the verification integrated data set) have X_1 to X_n as explanatory variables. Each of them can therefore be used as a verification sub-data set for each of prediction module candidates 2 to 5. For example, predicted feature values can be calculated by applying prediction module candidate 2 to each of the verification corrected simulation data sets 1 and 2, the verification experimental data set, and the verification integrated data set, and those predictions can then be compared with the feature values held in the respective data sets. Likewise, for prediction module candidates 3 to 5, the predicted feature values can be compared with the feature values of those verification data sets, which are treated as the correct values.

In evaluating the prediction accuracy of prediction module candidates 1 to 5, how closely the feature value predicted by each candidate approximates the correct value is evaluated. The evaluation method is not particularly limited; for example, the ratio of the predicted value to the correct value is used as the evaluation value. When a plurality of feature quantities are set, the evaluation value is either the average of these ratios over the feature quantities or the ratio that is farthest from 1. Alternatively, because there are many pairs of actual feature values and values predicted by a prediction module candidate, the correlation coefficient R or the coefficient of determination R^2 between the actual and predicted feature values can be used as the evaluation value.
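The evaluation values named above can be computed as in the following sketch. The function names and sample numbers are illustrative assumptions. The coefficient of determination is computed here as 1 - SS_res / SS_tot, which is one common definition; the source does not say which definition is intended.

```python
from math import sqrt

def ratios(correct, predicted):
    """Ratio of predicted value to correct value, one per pair."""
    return [p / c for c, p in zip(correct, predicted)]

def mean_ratio(rs):
    return sum(rs) / len(rs)

def farthest_from_one(rs):
    """The ratio that deviates most from 1 (the worst case)."""
    return max(rs, key=lambda r: abs(r - 1.0))

def correlation(correct, predicted):
    """Pearson correlation coefficient R."""
    n = len(correct)
    mc, mp = sum(correct) / n, sum(predicted) / n
    cov = sum((c - mc) * (p - mp) for c, p in zip(correct, predicted))
    var_c = sum((c - mc) ** 2 for c in correct)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / sqrt(var_c * var_p)

def r_squared(correct, predicted):
    """Coefficient of determination, defined as 1 - SS_res / SS_tot."""
    mc = sum(correct) / len(correct)
    ss_res = sum((c - p) ** 2 for c, p in zip(correct, predicted))
    ss_tot = sum((c - mc) ** 2 for c in correct)
    return 1.0 - ss_res / ss_tot

# Made-up values, for illustration only.
correct = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
rs = ratios(correct, predicted)
```

Ratio-based values require nonzero correct values, whereas R and R^2 require that the correct values are not all identical.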

The prediction module determination unit 22 determines a prediction module with high prediction accuracy based on the evaluation results (evaluation values) obtained by the prediction module candidate evaluation unit 20 (ST20 in FIG. 1). The prediction module may be the single candidate with the highest prediction accuracy among the prediction module candidates, or every candidate whose prediction accuracy exceeds a threshold may be determined to be a prediction module. Prediction module candidate 1, which has the most explanatory variables, does not necessarily have the highest prediction accuracy: some explanatory variables do not contribute to the feature quantity, and such variables can act as noise components that lower the prediction accuracy.
The prediction accuracy evaluation results are preferably displayed on the screen of the display 30.
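The two selection rules described above (the single most accurate candidate, or every candidate above a threshold) can be sketched as follows. The candidate names and scores are made-up placeholders, not results from the source, and a higher score is assumed to mean higher accuracy.

```python
def best_candidate(scores):
    """Name of the candidate with the highest accuracy score."""
    return max(scores, key=scores.get)

def candidates_above(scores, threshold):
    """Names of all candidates whose score strictly exceeds the threshold."""
    return sorted(name for name, s in scores.items() if s > threshold)

# Placeholder scores, for illustration only.
scores = {
    "candidate_1": 0.90,
    "candidate_2": 0.75,
    "candidate_3": 0.80,
    "candidate_4": 0.93,
    "candidate_5": 0.95,
}
chosen = best_candidate(scores)
shortlist = candidates_above(scores, 0.9)
```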

The prediction unit 24 sets the determined prediction module and predicts a value related to the feature quantity from input values of the explanatory variables. The predicted value is output to the display 30.

Thus, in the data processing method described above, when the data include both simulation data and experimental data, correcting the feature values of the simulation data allows the corrected simulation data and the experimental data to be used together as the same learning data. The prediction model can also be trained on a learning corrected simulation data set for each different type of corrected simulation data, such as corrected simulation data sets 1 and 2, so a plurality of prediction module candidates can be created. Furthermore, the prediction accuracy of the prediction module candidates can be evaluated using each of the verification corrected simulation data sets, the verification experimental data set, and the verification integrated data set as verification sub-data sets, that is, by making efficient use of the verification original data set. Therefore, even when the original data set contains both experimental data and simulation data, a prediction module with high prediction accuracy that has learned the relationship between the explanatory variables and the feature quantity can be created efficiently.

According to one embodiment, when the original data are divided into a learning data set and a verification data set, it is preferable to perform the division a plurality of times, each time taking the verification data set from a different part of the original data set and using the remainder as the learning data set; to evaluate, for each division, the prediction accuracy of the prediction module candidates created with that learning data set; and to determine the prediction module from the candidates based on the average of the evaluation results. In this way, learning data sets are drawn from a wide range of the original data set without bias, verification data sets are likewise used over a wide range without bias, and the prediction accuracy can be evaluated accurately.
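The repeated division described here resembles what is commonly called k-fold cross-validation. A minimal sketch, assuming equal-sized interleaved folds and a trivial stand-in "model" (the mean of the learning values), neither of which is specified in the source:

```python
def k_fold_splits(rows, k):
    """Yield (learning, verification) pairs; each row is verified exactly once."""
    folds = [rows[i::k] for i in range(k)]
    for i, verification in enumerate(folds):
        learning = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield learning, verification

def mean_accuracy(rows, k, fit, score):
    """Average the score of a freshly fitted model over all k divisions."""
    scores = [score(fit(lrn), ver) for lrn, ver in k_fold_splits(rows, k)]
    return sum(scores) / len(scores)

# Stand-in model and score (negative mean absolute error), for illustration.
def fit_mean(values):
    return sum(values) / len(values)

def neg_mae(model, values):
    return -sum(abs(v - model) for v in values) / len(values)

values = [float(v) for v in range(10)]
average = mean_accuracy(values, 5, fit_mean, neg_mae)
```

Each candidate would be scored this way, and the candidate with the best averaged score would be chosen.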

According to one embodiment, it is preferable that the simulation data include simulation calculation values obtained by running the simulation model with the explanatory variable values that realize the maximum and the minimum of the feature quantity in the plurality of experimental data. It is also preferable that the feature values of the learning simulation data set be corrected using both the correspondence between those maximum and minimum values and the simulation calculation values corresponding to each of them, and the correspondence between experimental feature values lying between the maximum and the minimum and the simulation calculation values whose explanatory variable values agree with them within a permissible range. The simulation is not particularly limited; one example is a simulation using a well-known finite element model.
FIG. 8 illustrates the correspondence between the feature values in the simulation data, that is, the simulation calculation values, and the feature values in the experimental data. If the simulation data contain feature values (simulation calculation values) for the explanatory variable values that realize the maximum and minimum of the feature quantity in the experimental data, calculated values lying between those two simulation calculation values can be corrected with high accuracy by interpolation. Accordingly, if the simulation data lack feature values corresponding to the maximum and minimum of the experimental feature quantity, the corresponding simulation calculation values can easily be obtained by running the simulation model.

In addition, the feature values of the learning simulation data set can be corrected with high accuracy by interpolation using the correspondence between experimental feature values lying between the maximum and the minimum and the simulation calculation values whose explanatory variable values agree within the permissible range. Here too, if there are no simulation data whose explanatory variable values agree with those of the experimental data within the permissible range, a simulation calculation value corresponding to the experimental feature value can easily be obtained by running the simulation model.
In this way, as shown in FIG. 8, feature values can be associated between the simulation data and the experimental data for cases in which the explanatory variables agree within the permissible range, so the values can be corrected accurately by interpolation.
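Pairing experimental and simulated records whose explanatory variables agree within a permissible range, and checking that the experimental maximum and minimum are covered, might be sketched as follows. The 1% relative tolerance, the data, and the function names are assumptions; the source does not quantify the permissible range.

```python
from math import isclose

def same_conditions(x_exp, x_sim, rel_tol=0.01):
    """True if every explanatory variable agrees within the relative tolerance."""
    return len(x_exp) == len(x_sim) and all(
        isclose(a, b, rel_tol=rel_tol) for a, b in zip(x_exp, x_sim)
    )

def pair_feature_values(experimental, simulated, rel_tol=0.01):
    """Return (pairs, unmatched): (sim value, exp value) pairs, plus the
    experimental conditions for which a simulation still has to be run."""
    pairs, unmatched = [], []
    for x_exp, v_exp in experimental:
        match = next(
            (v for x_sim, v in simulated if same_conditions(x_exp, x_sim, rel_tol)),
            None,
        )
        if match is None:
            unmatched.append(x_exp)
        else:
            pairs.append((match, v_exp))
    return pairs, unmatched

def extremes_covered(experimental, pairs):
    """True if both the experimental minimum and maximum have a simulated partner."""
    exp_values = [v for _, v in experimental]
    paired = {v_exp for _, v_exp in pairs}
    return min(exp_values) in paired and max(exp_values) in paired

# Made-up records: (explanatory variable values, feature value).
experimental = [([1.0, 2.0], 10.0), ([3.0, 4.0], 20.0), ([5.0, 6.0], 30.0)]
simulated = [([1.0, 2.01], 9.0), ([5.0, 6.0], 33.0)]
pairs, unmatched = pair_feature_values(experimental, simulated)
```

The `unmatched` list corresponds to the conditions for which, per the text, additional simulations would be run; a purely relative tolerance does not suit variables whose value is zero, so a real implementation would likely add an absolute tolerance.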

According to one embodiment, when the prediction accuracy is evaluated, it is preferable, as shown in FIG. 7, to evaluate the prediction module candidate trained on the learning integrated data set against each of the verification corrected simulation data sets 1 and 2, the verification experimental data set, and the verification integrated data set. A candidate trained on the learning integrated data set is generally expected to be more accurate than any other candidate, but this is not always the case. For this candidate it is therefore preferable to evaluate the prediction accuracy using as many as possible of the verification sub-data sets created from the verification original data set.

According to one embodiment, when the prediction accuracy is evaluated, it is preferable, for the prediction module candidate trained on the learning integrated data set, to
(1) compare its prediction accuracy on the verification experimental data set with the prediction accuracy, on the same verification experimental data set, of the prediction module candidate trained on the learning experimental data set,
(2) compare its prediction accuracy on the verification corrected simulation data set with the prediction accuracy, on the same verification corrected simulation data set, of the prediction module candidate trained on the learning corrected simulation data set,
and to evaluate the candidate trained on the learning integrated data set based on the results of these comparisons. The candidate trained on the learning integrated data set is generally expected to predict the feature value more accurately than the candidate trained on the learning experimental data set does on the verification experimental data set, and more accurately than the candidate trained on the learning corrected simulation data set does on the verification corrected simulation data set, but this is not always the case. For the candidate trained on the learning integrated data set, it is therefore particularly preferable to compare its accuracy with that of the candidate created from the experimental data set when verified on experimental data, and with that of the candidate created from the corrected simulation data set when verified on corrected simulation data.
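Comparisons (1) and (2) can be expressed as a small check over a table of accuracy scores. The keys, the scores, and the rule that "at least as good" counts as passing are illustrative assumptions.

```python
def assess_integrated(accuracy):
    """accuracy[(trained_on, verified_on)] is a higher-is-better score.

    Returns, per verification data set, whether the candidate trained on
    the integrated data is at least as accurate as the specialist candidate.
    """
    return {
        kind: accuracy[("integrated", kind)] >= accuracy[(kind, kind)]
        for kind in ("experimental", "corrected_simulation")
    }

# Placeholder scores, for illustration only.
accuracy = {
    ("integrated", "experimental"): 0.94,
    ("experimental", "experimental"): 0.91,
    ("integrated", "corrected_simulation"): 0.88,
    ("corrected_simulation", "corrected_simulation"): 0.90,
}
result = assess_integrated(accuracy)
```

In this made-up case the integrated candidate passes comparison (1) but fails comparison (2), which is the kind of outcome the text warns can occur.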

The simulation data may be of a single type, sharing the same simulation model configuration and simulation method, but, as shown in FIG. 1, they preferably include simulation data 1 (first simulation data) and simulation data 2 (second simulation data), which differ in at least one of the simulation model configuration and the simulation method. In that case, the processing of ST10 to ST20 shown in FIG. 1 is preferably performed with each of simulation data 1 and simulation data 2. This makes it possible to evaluate the prediction accuracy of prediction module candidates that differ in the underlying simulation, and thus to determine a prediction module with high prediction accuracy.

According to one embodiment, the feature quantity is preferably a physical quantity acting on a tire, for example a characteristic value of the tire, and the explanatory variable values are preferably values that define the tire. This makes it possible to predict the physical quantity acting on the tire with high accuracy from the values that define the tire. The values that define the tire include, for example, the size of the rim on which the tire is mounted, the aspect ratio of the tire, the tire width, the cross-sectional area of the bead filler, the angle and rigidity of the first steel cord, the angle and rigidity of the second steel cord, the angle and rigidity of the first carcass cord, and the angle and rigidity of the second carcass cord.

According to one embodiment, the prediction module can also be used in an optimization process that, in response to the input of a target value for the feature quantity, calculates optimal values of the explanatory variables that reproduce the target value. That is, the data processing method of one embodiment preferably includes an optimization process in which the data processing device 10, in response to the input of a target value for the feature quantity, uses the prediction module to calculate the optimal values of the explanatory variables that reproduce the target value. In this case, the optimal values are preferably calculated based on the feature value that the prediction module predicts for the explanatory variable values input to it. An evolutionary algorithm, for example, is preferably used to calculate the optimal values. Evolutionary algorithms include the genetic algorithm, differential evolution, particle swarm optimization, ant colony optimization, and the like. It is also preferable to use design of experiments or the Latin hypercube method.
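As one illustration of this optimization process, the sketch below uses a basic differential evolution loop (one of the algorithms named above) to search for explanatory variable values whose predicted feature value matches a target. The linear `predict` function is a stand-in for a trained prediction module, and the bounds, target, and control parameters are arbitrary assumptions.

```python
import random

def differential_evolution(objective, bounds, pop_size=20, generations=200,
                           f=0.8, cr=0.9, seed=0):
    """Minimize objective over box bounds with a rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct members other than the current one.
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            forced = rng.randrange(dim)  # at least one mutated coordinate
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < cr:
                    v = a[d] + f * (b[d] - c[d])
                else:
                    v = pop[i][d]
                trial.append(min(max(v, lo), hi))  # keep within bounds
            trial_cost = objective(trial)
            if trial_cost <= cost[i]:
                pop[i], cost[i] = trial, trial_cost
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]

# Stand-in for a trained prediction module (hypothetical).
def predict(x):
    return 2.0 * x[0] + x[1]

target = 7.0
bounds = [(0.0, 5.0), (0.0, 5.0)]
best_x, error = differential_evolution(lambda x: abs(predict(x) - target), bounds)
```

Because many combinations of explanatory variables can give the same predicted value, the result is one solution among many; a practical setup would likely add further objectives or constraints.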

According to one embodiment, it is preferable to visualize the relationship between the explanatory variable values and the feature values.
The relationship between the explanatory variable values and the feature values is displayed on the display 30. It is represented, for example, by a self-organizing map. Alternatively, a scatter plot may be used instead of the self-organizing map to visualize the relationship between the explanatory variables and the feature values.

Such a data processing method can be achieved by reading a program from the memory 14 and having the computer execute it. The program therefore comprises:
(1) a procedure for causing the computer, using an original data set that holds a plurality of experimental data and simulation data, to correct the feature quantity values in the simulation data based on the correspondence between the feature quantity values in the simulation data and the feature quantity values in the experimental data, and to generate a corrected simulation data set composed of the corrected simulation data;
(2) a procedure for causing the computer to separate each of the corrected simulation data set and the experimental data set composed of the experimental data into a learning data set and a verification data set, thereby generating a learning corrected simulation data set, a learning experimental data set, a verification corrected simulation data set, and a verification experimental data set;
(3) a procedure for causing the computer to create, using each of the learning corrected simulation data set, the learning experimental data set, and the learning integrated data set, a plurality of prediction module candidates that have machine-learned the relationship between the explanatory variables and the feature quantity;
(4) a procedure for causing the computer to evaluate the prediction accuracy of each of the machine-learned prediction module candidates using the verification corrected simulation data set, the verification experimental data set, and the verification integrated data set; and
(5) a procedure for causing the computer to determine the prediction module from the plurality of prediction module candidates based on the evaluation results of the prediction accuracy.
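Procedures (1) to (5) can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the implementation described in this document: each data item has a single explanatory variable, the correction in (1) is a least-squares line fitted to simulation and experimental values that share the same explanatory value, each prediction module candidate is a straight-line fit rather than a deep learning model, and the candidate with the best worst-case R² over the three verification data sets is selected. All function and variable names are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b (stand-in for machine learning)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def r_squared(y_true, y_pred):
    """Coefficient of determination (common definition, assumed here)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def split(rows, ratio, rng):
    """Shuffle a copy of rows and split it into (learning, verification)."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]

def build_prediction_module(experiment, simulation, seed=0):
    rng = random.Random(seed)
    # (1) Correct simulation values using pairs with identical explanatory
    #     values (assumption: real data would need a matching tolerance).
    exp_by_x = dict(experiment)
    pairs = [(y_sim, exp_by_x[x]) for x, y_sim in simulation if x in exp_by_x]
    ca, cb = fit_line([p[0] for p in pairs], [p[1] for p in pairs])
    corrected = [(x, ca * y_sim + cb) for x, y_sim in simulation]
    # (2) Separate each data set into learning and verification parts.
    exp_train, exp_val = split(experiment, 0.8, rng)
    sim_train, sim_val = split(corrected, 0.8, rng)
    train = {"simulation": sim_train, "experiment": exp_train,
             "integrated": sim_train + exp_train}
    val = {"simulation": sim_val, "experiment": exp_val,
           "integrated": sim_val + exp_val}
    # (3) Create one prediction module candidate per learning data set.
    candidates = {name: fit_line(*zip(*rows)) for name, rows in train.items()}
    # (4) Evaluate every candidate on every verification data set and keep
    #     its worst R² (assumed selection criterion).
    scores = {}
    for name, (a, b) in candidates.items():
        scores[name] = min(
            r_squared([y for _, y in rows], [a * x + b for x, _ in rows])
            for rows in val.values()
        )
    # (5) Determine the prediction module from the evaluation results.
    best = max(scores, key=scores.get)
    return best, candidates[best], scores

# Synthetic demo data (hypothetical): the simulation is a biased, rescaled
# version of the experimental trend and covers a wider input range.
experiment = [(x, 2.0 * x + 1.0 + 0.1 * ((7 * x) % 5 - 2)) for x in range(20)]
simulation = [(x, 0.5 * (2.0 * x + 1.0) - 3.0) for x in range(0, 40, 2)]
best_name, best_model, all_scores = build_prediction_module(experiment, simulation)
```

Scoring each candidate by its worst verification result is only one possible way to turn the evaluation in (4) into the decision in (5); any other rule would slot into the last two steps without changing the rest.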

(Example, comparative example)
To confirm the effect of the data processing method described above, 10,881 pieces of experimental data and 3,739 pieces of simulation data were prepared. The explanatory variables included, as information, the tire dimensions, the dimensions of the constituent materials of the tire, physical property values, and the form of the tire structure. Rolling resistance was used as the feature quantity.

In the example, an original data set including experimental data and simulation data was prepared, and this original data set was processed by the data processing method described above to determine a prediction module. There were three prediction module candidates: one created from the learning original data set, one created from the learning integrated data set, and one created from the learning experimental data.

In the comparative example, on the other hand, a prediction module was determined using an original data set that included experimental data but no simulation data. In this case, only one prediction module candidate is created, from the learning experimental data set, so this candidate automatically becomes the prediction module of the comparative example.

The evaluation results of the prediction module candidates in the example were as follows.
The prediction module candidates were created by training a prediction model with a deep learning method. The deep learning network had three layers.
For the prediction module candidate created from the learning original data set, the coefficient of determination R² between the values predicted using the verification original data set and the feature quantity values in the verification original data set was low, at 0.71.
For the prediction module candidate created from the learning experimental data set, the coefficient of determination R² between the feature quantity values predicted using the verification experimental data set, the verification corrected simulation data set, and the verification integrated data set and the corresponding feature quantity values in those data sets was 0.77.
For the prediction module candidate created from the learning integrated data set, the coefficient of determination R² between the feature quantity values predicted using the verification experimental data set, the verification corrected simulation data set, and the verification integrated data set and the corresponding feature quantity values in those data sets was 0.88.
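The text reports the coefficient of determination R² without giving its formula. Assuming the commonly used definition (an assumption, since other variants exist), with measured feature quantity values y_i, predicted values ŷ_i, the mean ȳ of the measured values, and n verification data items, it can be written as:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
               {\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2},
\qquad
\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i
```

Under this definition, a value of 1 means the predictions match the verification data exactly, and a value of 0 means the predictions are no better than always predicting the mean ȳ.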

In the comparative example, on the other hand, the only prediction module candidate created is the one created from the learning experimental data set described above, so its coefficient of determination R² is 0.77.

Therefore, the coefficient of determination R² of the prediction module determined in the example is 0.88, whereas that of the prediction module determined in the comparative example is 0.77.
From this, it can be said that the prediction module of the example has higher prediction accuracy than that of the comparative example.

Although the data processing method, data processing device, and program of the present invention have been described in detail above, the present invention is not limited to the above embodiments, and various improvements and changes may of course be made without departing from the gist of the present invention.

10 Data processing device
12 CPU
14 Memory
15 Simulation data modification section
16 Sub-data set creation section
18 Prediction module candidate creation section
20 Prediction module candidate creation section
22 Prediction module determination section
24 Prediction section

Claims

A data processing method for forming a prediction module in which a computer predicts and outputs a value related to a predetermined feature amount by inputting values of a plurality of explanatory variables, the method comprising:
Using an original data set that holds a plurality of experimental data and simulation data, each being data held as a set of the values of a plurality of explanatory variables and the value of a feature amount so as to associate the values of the explanatory variables with the value of the feature amount, the value of the feature amount in the experimental data being an experimental value of a measurement target and the value of the feature amount in the simulation data being a simulation calculation value calculated by performing a simulation using a simulation model of the measurement target, the computer correcting the value of the feature amount in the simulation data based on the correspondence between the value of the feature amount in the simulation data and the value of the feature amount in the experimental data, and creating a corrected simulation data set composed of the corrected simulation data;
The computer separates each of the experimental data sets composed of the corrected simulation data set and the experimental data into a learning data set and a verification data set, thereby creating a corrected learning simulation data set. , generating a learning experiment data set, a verification modified simulation data set, and a verification experiment data set;
The computer uses each of the corrected learning simulation data set, the learning experiment data set, and the learning integrated data set in which the learning corrected simulation data set and the learning experiment data set are integrated, a step in which a computer creates a plurality of prediction module candidates by machine learning the relationship between the explanatory variable and the feature amount;
The computer performs machine learning using the modified verification simulation data set, the verification experiment data set, and the verification integrated data set that integrates the verification modified simulation data set and the verification experiment data set. Evaluating prediction accuracy for each of the plurality of prediction module candidates;
A data processing method comprising: the computer determining the prediction module from the plurality of prediction module candidates based on the evaluation result of the prediction accuracy.

The simulation data contains simulation calculation values calculated by performing the simulation with the simulation model using the values of the explanatory variables that realize the maximum value and the minimum value of the feature amount in the plurality of experimental data,
In the correction of the value of the feature amount, the value of the feature amount of the simulation data is corrected using a correspondence relationship between the maximum value and the minimum value and the simulation calculation values corresponding to the maximum value and the minimum value, respectively, and a correspondence relationship between the value of the feature amount of experimental data whose feature amount value lies between the maximum value and the minimum value and the simulation calculation value whose explanatory variable values match those of that experimental data within a permissible range. The data processing method according to claim 1.
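The correction described in this claim can be pictured with the sketch below. It assumes that the correspondence is applied as piecewise-linear interpolation through anchor pairs of (simulation calculation value, experimental value), and that simulation values and experimental values rise together; neither assumption is stated in the claim. The function name `make_correction` and the anchor numbers are hypothetical.

```python
from bisect import bisect_right

def make_correction(anchors):
    """Build a correction function from (simulation value, experimental value)
    anchor pairs: the pairs at the experimental minimum and maximum, plus any
    intermediate pairs whose explanatory values matched within tolerance."""
    pts = sorted(anchors)
    sims = [s for s, _ in pts]
    if len(pts) < 2 or len(set(sims)) != len(sims):
        raise ValueError("need at least two anchors with distinct simulation values")

    def correct(sim_value):
        # Pick the segment containing sim_value; values outside the anchor
        # range are extrapolated along the first or last segment.
        i = min(max(bisect_right(sims, sim_value), 1), len(pts) - 1)
        (s0, e0), (s1, e1) = pts[i - 1], pts[i]
        return e0 + (e1 - e0) * (sim_value - s0) / (s1 - s0)

    return correct

# Hypothetical anchors: minimum, one intermediate match, maximum.
correct = make_correction([(8.0, 10.0), (10.0, 16.0), (12.0, 20.0)])
corrected_values = [correct(v) for v in (8.0, 9.0, 11.0, 12.0)]
```

With only the minimum and maximum anchors, the same function reduces to a single linear rescaling of the simulation values onto the experimental range.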

The data processing method according to claim 1 or 2, wherein, when evaluating the prediction accuracy, the prediction accuracy of the prediction module candidate machine-learned using the integrated learning data set is evaluated for each of the corrected verification simulation data set, the verification experiment data set, and the verification integrated data set.

The data processing method according to any one of claims 1 to 3, wherein, when evaluating the prediction accuracy, the prediction module candidate machine-learned using the integrated learning data set is evaluated based on the results of:
(1) comparing its prediction accuracy when using the verification experiment data set with the prediction accuracy, when using the verification experiment data set, of the prediction module candidate machine-learned using the learning experiment data set; and
(2) comparing its prediction accuracy when using the corrected verification simulation data set with the prediction accuracy, when using the corrected verification simulation data set, of the prediction module candidate machine-learned using the corrected learning simulation data set.

The simulation data includes first simulation data and second simulation data that differ in at least one of the configuration of the simulation model and the method of the simulation,
The data processing method according to claim 1, wherein the creating of the modified simulation data set, the generating of the modified simulation data set for learning and the modified simulation data set for verification, the creating of the prediction module candidates, and the evaluating of the prediction accuracy are performed using each of the first simulation data and the second simulation data.

The feature quantity is a physical quantity that acts on the tire,
The data processing method according to claim 1, wherein the value of the explanatory variable is a value that defines the tire.

Furthermore, in response to the input of the target value regarding the feature amount, the computer calculates an optimal value regarding the explanatory variable that reproduces the target value using the prediction module,
In the step of calculating the optimal value, the optimal value for the explanatory variable is calculated based on the value of the feature quantity predicted by the prediction module according to the value of the explanatory variable input to the prediction module. The data processing method according to any one of claims 1 to 6.
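One simple way to picture the optimal value calculation in this claim is an exhaustive search over candidate explanatory variable values, keeping the combination whose predicted feature amount is closest to the target. The search strategy, the function name `find_optimal_inputs`, and the stand-in `predict` function are all assumptions for illustration; the claim does not specify how the optimal value is searched for.

```python
from itertools import product

def find_optimal_inputs(predict, grids, target):
    """Return the combination of explanatory variable values (one value taken
    from each grid) whose predicted feature amount is closest to target,
    together with the remaining absolute error."""
    best_x, best_err = None, float("inf")
    for x in product(*grids):
        err = abs(predict(x) - target)
        if err < best_err:  # strict '<' keeps the first of any ties
            best_x, best_err = x, err
    return best_x, best_err

# Stand-in for a trained prediction module (hypothetical linear response).
def predict(x):
    return 2.0 * x[0] + 0.5 * x[1]

best_x, best_err = find_optimal_inputs(predict, [[0, 1, 2, 3], [0, 2, 4]], target=5.0)
```

For more than a few explanatory variables a grid becomes impractically large, and an iterative optimizer driven by the same `predict` function would be the natural replacement.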

The data processing method according to any one of claims 1 to 7, further comprising the step of visualizing the relationship between the value of the explanatory variable and the value of the feature amount.

A data processing device configured with a computer that forms a prediction module that predicts and outputs the value of a predetermined feature quantity by inputting the values of a plurality of explanatory variables,
Data that is held as a set of the values of each of a plurality of explanatory variables and the value of a feature amount for associating the value of the explanatory variable with the value of the feature amount, the value of the feature amount being obtained using a testing machine. holding a plurality of experimental data, which are experimental values of the measured object, and simulation data, in which the value of the feature amount is a simulation calculated value calculated by performing a simulation using a simulation model of the measured object. Correcting the value of the feature amount in the simulation data based on the correspondence between the value of the feature amount in the simulation data and the value of the feature amount in the experimental data using the original data set. and a data correction unit that creates a corrected simulation data set composed of the corrected simulation data;
By separating each of the experimental data sets composed of the modified simulation data set and the experimental data into a training data set and a verification data set, the modified simulation data set for learning and the experiment for learning can be separated. a dataset generation unit that generates a dataset, a modified simulation dataset for verification, and an experimental dataset for validation;
The computer uses each of the modified learning simulation data set, the learning experiment data set, and the learning integrated data set that integrates the learning modified simulation data set and the learning experiment data set. a prediction module candidate creation unit that creates a plurality of prediction module candidates by machine learning the relationship between the explanatory variable and the feature amount;
The computer performs machine learning using the modified verification simulation data set, the verification experiment data set, and the verification integrated data set that integrates the verification modified simulation data set and the verification experiment data set. a prediction module candidate evaluation unit that evaluates prediction accuracy for each of the plurality of prediction module candidates;
The data processing device is characterized in that the computer includes a prediction module determining unit that determines the prediction module from the plurality of prediction module candidates based on the evaluation result of the prediction accuracy.

A program that causes a computer to execute a data processing method for forming a prediction module that predicts and outputs the value of a predetermined feature quantity by inputting the values of a plurality of explanatory variables, the program comprising:
Data that is held as a set of the values of each of a plurality of explanatory variables and the value of a feature amount for associating the value of the explanatory variable with the value of the feature amount, the value of the feature amount being obtained using a testing machine. holding a plurality of experimental data, which are experimental values of the measured object, and simulation data, in which the value of the feature amount is a simulation calculated value calculated by performing a simulation using a simulation model of the measured object. Using the original data set, a computer calculates the value of the feature in the simulation data based on the correspondence between the value of the feature in the simulation data and the value of the feature in the experimental data. a step of correcting the values to generate a corrected simulation data set composed of the corrected simulation data;
By causing the computer to separate each of the experimental data sets composed of the corrected simulation data set and the experimental data into a learning data set and a verification data set, a corrected simulation data set for learning is created. , a procedure for generating a learning experiment data set, a verification modified simulation data set, and a verification experiment data set;
Using each of the corrected learning simulation data set, the learning experiment data set, and the learning integrated data set that integrates the learning corrected simulation data set and the learning experiment data set in the computer, A step in which a computer creates a plurality of prediction module candidates by machine learning the relationship between the explanatory variable and the feature amount;
The computer performs machine learning using the modified verification simulation data set, the verification experiment data set, and the verification integrated data set that integrates the verification modified simulation data set and the verification experiment data set. a step of evaluating prediction accuracy for each of the plurality of prediction module candidates;
A program comprising: causing the computer to determine the prediction module from the plurality of prediction module candidates based on the evaluation result of the prediction accuracy.