WO2021229648A1 - Mathematical model generation system, mathematical model generation method, and mathematical model generation program - Google Patents
- Publication number
- WO2021229648A1 (application PCT/JP2020/018844)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- explanatory variable
- model
- explanatory
- mathematical model
- generated
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Definitions
- the present invention relates to a mathematical model generation system, a mathematical model generation method, and a mathematical model generation program that generate a mathematical model driven by data.
- A regression model obtained by machine learning is a typical way to build a model from data.
- Deep learning offers excellent prediction performance, but it is a so-called black box: the inside of the model is not easily understood or expressed mathematically, so its interpretability is low.
- A linear regression model such as LASSO is a so-called white box whose inside is expressed mathematically as a linear model, but its prediction performance is low.
- Non-Patent Document 1 describes a method for extracting a free-form natural law from experimental data.
- A genetic algorithm is used to search for a mathematical formula expressing a non-linear phenomenon while changing arithmetic symbols (+, −, ×, ÷, etc.).
- In the method described in Non-Patent Document 1, a candidate solution (that is, a mathematical formula) is not uniquely determined, so a human must manually select the best solution from a plurality of candidate solutions. It is therefore preferable that the solution can be uniquely determined and that a mathematical model with high interpretability can be generated.
- an object of the present invention is to provide a mathematical model generation system, a mathematical model generation method, and a mathematical model generation program that can uniquely determine a solution and generate a highly interpretable mathematical model.
- The mathematical model generation system includes: an explanatory variable generation means that generates new explanatory variables by combining the underlying explanatory variables and generates explanatory variable candidates including the underlying explanatory variables and the generated new explanatory variables; an explanatory variable selection means that selects, from the explanatory variable candidates, more preferable explanatory variables as the explanatory variables used for a mathematical model, which is a model expressing a nonlinear phenomenon by a mathematical formula, and generates a candidate for the mathematical model; a model evaluation means that evaluates the goodness of the generated mathematical model candidate; and a model selection means that selects the mathematical model with the highest evaluation from among the plurality of generated mathematical model candidates.
- The system is characterized in that the explanatory variable generation means generates new explanatory variables by combining the explanatory variables selected as preferable and generates new explanatory variable candidates including the selected explanatory variables and the newly generated explanatory variables, the explanatory variable selection means selects explanatory variables from the newly generated candidates to generate a mathematical model candidate, and the model evaluation means evaluates the goodness of each generated mathematical model candidate.
- In the mathematical model generation method, new explanatory variables are generated by combining the underlying explanatory variables, and explanatory variable candidates including the underlying explanatory variables and the generated new explanatory variables are generated. From the explanatory variable candidates, more preferable explanatory variables are selected as the explanatory variables used for a mathematical model, which is a model expressing a nonlinear phenomenon by a mathematical formula, and a candidate for the mathematical model is generated.
- The method is characterized in that new explanatory variables are generated by combining the explanatory variables selected as preferable, new explanatory variable candidates including the selected explanatory variables and the newly generated explanatory variables are generated, explanatory variables are selected from the newly generated candidates to generate a mathematical model candidate, the goodness of each generated mathematical model candidate is evaluated, and the mathematical model with the highest evaluation is selected from among the plurality of generated mathematical model candidates.
- The mathematical model generation program causes a computer to execute: an explanatory variable generation process that generates new explanatory variables by combining the underlying explanatory variables and generates explanatory variable candidates including the underlying explanatory variables and the generated new explanatory variables; an explanatory variable selection process that selects, from the explanatory variable candidates, more preferable explanatory variables as the explanatory variables used for a mathematical model, which is a model expressing a nonlinear phenomenon by a mathematical formula, and generates a candidate for the mathematical model; a model evaluation process that evaluates the goodness of the generated mathematical model candidate; and a model selection process that selects the mathematical model with the highest evaluation from among the plurality of generated mathematical model candidates.
- The program is characterized in that, in the explanatory variable generation process, new explanatory variables are generated by combining the explanatory variables selected as preferable and new explanatory variable candidates including the selected explanatory variables and the newly generated explanatory variables are generated; in the explanatory variable selection process, explanatory variables are selected from the newly generated candidates to generate a mathematical model candidate; and in the model evaluation process, the goodness of each generated mathematical model candidate is evaluated.
- a solution can be uniquely determined and a highly interpretable mathematical model can be generated.
- FIG. 1 is a block diagram showing a configuration example of the first embodiment of the mathematical model generation system according to the present invention.
- the mathematical model assumed in this embodiment is a model expressing a nonlinear phenomenon by a mathematical expression. That is, the mathematical model generation system 100 of the present embodiment models a non-linear phenomenon with a mathematical formula.
- The mathematical model generation system 100 of the present embodiment includes a storage unit 10, an input unit 20, an explanatory variable generation unit 30, an explanatory variable selection unit 40, a model evaluation unit 60, a model selection unit 70, and an output unit 80.
- the storage unit 10 stores various information such as parameters and settings used by the mathematical formula model generation system 100 of the present embodiment for each process. Further, the storage unit 10 may store the learning data used for generating the mathematical formula model and the generated mathematical formula model. The contents of the learning data will be described later.
- the mathematical formula model generation system 100 may be configured to acquire various information from another device (for example, a storage server) via a communication network.
- the storage unit 10 does not have to store the above-mentioned information.
- the storage unit 10 is realized by, for example, a magnetic disk or the like.
- the input unit 20 accepts input of learning data used for generating a mathematical model.
- the training data includes an objective variable and one or more explanatory variables, and the content thereof is determined according to the content of the mathematical model to be generated.
- the objective variable is the terminal velocity of the particle
- the explanatory variables are the particle size, particle density, fluid density, gravitational acceleration, fluid viscosity, and the like.
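- For orientation, the terminal-velocity example corresponds to Stokes' law, which FIG. 5 later uses as a derivation target. Its well-known form for a small sphere is shown below. The symbol names (v_t for terminal velocity, d for particle diameter, ρ_p and ρ_f for particle and fluid density, g for gravitational acceleration, μ for fluid viscosity) are chosen here for illustration and do not appear in the source.

```latex
v_t = \frac{(\rho_p - \rho_f)\, g\, d^{2}}{18\, \mu}
```

- Every factor in this expression is a difference, product, quotient, or power of the listed explanatory variables, which is the kind of term that is built by combining explanatory variables.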
- the explanatory variable may be a feature amount that can be measured as an actual result, or may be a feature amount generated by combining these feature amounts (such as sum or product).
- the input unit 20 divides the input learning data into an objective variable and an explanatory variable.
- the objective variable is represented by y
- the explanatory variable is represented by x (and x with a subscript).
- the input explanatory variable (that is, the explanatory variable given in the initial state) may be referred to as the original explanatory variable.
- The input unit 20 may also accept inputs such as the upper limit values, the threshold value, and the number of repetitions described later (for example, K, L, N, and γ'). These values may instead be stored in the storage unit 10 in advance.
- the explanatory variable generation unit 30 combines the underlying explanatory variables to generate a new explanatory variable, and generates an explanatory variable candidate including the basic explanatory variable and the newly generated explanatory variable.
- combining the explanatory variables means performing an operation on each explanatory variable
- the generated new explanatory variable means an explanatory variable obtained as a result of performing the operation.
- the types of operations are preferably simple operations, such as four arithmetic operations (sum, difference, product, quotient), exponentiation, exponent, logarithm, trigonometric function, and the like. Further, it is preferable to limit the number of explanatory variables to be combined to two in order to prevent the amount of calculation from becoming enormous.
- The explanatory variable generation unit 30 comprehensively generates simple non-linear terms (sum, difference, product, quotient, exponent, trigonometric function, etc.) from the explanatory variables and, together with the underlying explanatory variables, sets them as new explanatory variable candidates x n-1'.
- FIG. 2 is an explanatory diagram showing an example of explanatory variable candidates obtained as a result of generation.
- FIG. 2 shows an example in which four arithmetic operations and exponentiation operations are performed by combining one or two explanatory variables.
- Table T1 illustrated in FIG. 2 shows the list of explanatory variable candidates (21 in total) generated when the explanatory variables to be combined are x 1, x 2, and x 3.
- Table T2 illustrated in FIG. 2 shows the list of explanatory variable candidates (60 in total) generated when the explanatory variables to be combined are x 1, x 2, x 3, x 1 x 2, and x 2 + x 3.
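- The candidate generation described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `generate_candidates` and the exact operation set (the variables themselves, squares, unordered sums, differences and products, and ordered quotients) are assumptions, chosen because they reproduce the counts of 21 and 60 stated for tables T1 and T2.

```python
from itertools import combinations, permutations


def generate_candidates(base):
    """Return candidate terms built from one or two base variables (as strings)."""
    candidates = list(base)                       # the base variables themselves
    candidates += [f"({a})^2" for a in base]      # power of a single variable
    for a, b in combinations(base, 2):            # unordered pairs
        candidates += [f"({a})+({b})", f"({a})-({b})", f"({a})*({b})"]
    for a, b in permutations(base, 2):            # ordered pairs: a/b differs from b/a
        candidates.append(f"({a})/({b})")
    return candidates


t1 = generate_candidates(["x1", "x2", "x3"])
t2 = generate_candidates(["x1", "x2", "x3", "x1*x2", "x2+x3"])
print(len(t1), len(t2))
```

- Limiting each combination to two variables keeps the candidate count quadratic in the number of base variables, which matches the stated aim of preventing the amount of calculation from becoming enormous.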
- The explanatory variable selection unit 40 selects, from the explanatory variable candidates, more preferable explanatory variables as the explanatory variables used in the mathematical model. That is, it selects only the important explanatory variables from the list of candidates. In the following description, the selected explanatory variables are denoted x n.
- The explanatory variable selection unit 40 may select explanatory variables (feature selection) by machine learning using an algorithm such as LASSO. Specifically, it may learn a model of a natural phenomenon by machine learning using data representing that phenomenon (learning data) and select the explanatory variables (features) that finally remain.
- the machine learning method used here is arbitrary as long as the explanatory variables (features) can be selected.
- The explanatory variable selection unit 40 may select explanatory variables based on a constraint that a candidate becomes harder to select as the number of underlying explanatory variables (that is, original explanatory variables) it contains increases. Under this constraint, for example, the explanatory variable "x 1 x 2 / x 3" contains three original explanatory variables, whereas "x 1 x 2 x 3 (x 1 + x 2)" contains five (counting repeated occurrences).
- The explanatory variable selection unit 40 may select explanatory variables using, for example, an evaluation function whose penalty increases when the number of original explanatory variables in a term exceeds a predetermined threshold γ'.
- The evaluation function can be defined as in Equation 1.
- In Equation 1, y is the objective variable vector, X g is the design matrix for the nonlinear explanatory variables x g, β is the regression coefficient vector, and λ is the regularization parameter. Further, M is the number of nonlinear explanatory variables x g, and γ is the number of original explanatory variables in each nonlinear explanatory variable.
- A m (γ) in Equation 1 is a function that controls the magnitude of regularization for each nonlinear explanatory variable according to γ.
- A m (γ) may be defined, for example, as a step function as in Equation 2.
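- Equations 1 and 2 are not reproduced in this text. One form consistent with the definitions above is a LASSO-type objective with a per-term weight, written here with β for the regression coefficient vector, λ for the regularization parameter, γ_m for the number of original explanatory variables in the m-th term, and γ' for the threshold. The squared-error loss, the L1 norm, the subscript on γ, and the penalty constant c are all assumptions made for illustration, not the patent's stated equations.

```latex
% Assumed form of the evaluation function (cf. Equation 1)
\hat{\beta} = \arg\min_{\beta}\; \lVert y - X_g \beta \rVert_2^{2}
  + \lambda \sum_{m=1}^{M} A_m(\gamma)\, \lvert \beta_m \rvert

% Assumed step-function weight (cf. Equation 2), with c > 1
A_m(\gamma) =
\begin{cases}
  1, & \gamma_m \le \gamma' \\
  c, & \gamma_m > \gamma'
\end{cases}
```

- With a large c, terms built from more than γ' original explanatory variables are regularized much more strongly and are therefore less likely to keep a nonzero coefficient.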
- Alternatively, the explanatory variable selection unit 40 may judge that an explanatory variable is more important the larger its standardized linear regression coefficient is, or the larger its importance parameter calculated by a random forest is, and select explanatory variables accordingly.
- a method that can incorporate the above-mentioned constraint that the larger the number of original explanatory variables included in each term is, the more difficult it is to be selected (that is, the less important it is) is preferable.
- the importance can be similarly determined by increasing the coefficient of the regularization term of each term according to the number of original explanatory variables included in each term.
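- The idea of raising the regularization coefficient of a term according to the number of original explanatory variables it contains can be sketched with a small weighted LASSO solved by coordinate descent. This is an illustrative sketch only: the function names, the objective 0.5·||y − Xβ||² + λ·Σ w_m·|β_m|, the penalty value `big`, and the toy data are assumptions and are not taken from the source.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t (the proximal step for an L1 penalty)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def penalty_weights(n_original, threshold, big=1e6):
    """Step-function weights: 1 up to the threshold, a large value above it."""
    return [1.0 if n <= threshold else big for n in n_original]


def weighted_lasso(X, y, weights, lam, n_iter=100):
    """Coordinate descent for 0.5*||y - X b||^2 + lam * sum_m weights[m]*|b_m|."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)                               # residual y - X beta for beta = 0
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            norm = sum(c * c for c in col)
            if norm == 0.0:
                continue
            # correlation of column j with the residual that excludes term j
            rho = sum(col[i] * (resid[i] + col[i] * beta[j]) for i in range(n))
            new = soft_threshold(rho, lam * weights[j]) / norm
            for i in range(n):
                resid[i] += col[i] * (beta[j] - new)
            beta[j] = new
    return beta


# Toy data: y depends only on the first column; the second column stands for
# a term built from 4 original variables, above the threshold of 3.
X = [[1.0, 1.0], [2.0, 1.0], [3.0, 2.0], [4.0, 3.0]]
y = [2.0, 4.0, 6.0, 8.0]
weights = penalty_weights(n_original=[1, 4], threshold=3)
beta = weighted_lasso(X, y, weights, lam=0.1)
print(beta)  # the heavily penalized second coefficient stays at zero
```

- Terms whose coefficient remains nonzero are the selected explanatory variables, and the fitted coefficients themselves form a linear regression model candidate.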
- the upper limit K of the number of explanatory variables to be selected may be set, and the explanatory variable selection unit 40 may select the explanatory variables with the upper limit K as a constraint condition.
- the upper limit K of the explanatory variable is set by the user or the like according to the computing power of the computer to be used. The larger the value of the upper limit K, the higher the accuracy of the generated mathematical model, but the amount of calculation will increase accordingly.
- the explanatory variable selection unit 40 generates a candidate for a mathematical model using the selected explanatory variable. Specifically, the explanatory variable selection unit 40 generates candidates for a linear regression model including the selected explanatory variables in each term.
- the candidate of the generated mathematical model is referred to as Mn .
- When feature selection is performed using the above-mentioned LASSO, the explanatory variable selection unit 40 generates candidates for a linear regression model in the process of feature selection.
- the model evaluation unit 60 evaluates the goodness of the generated mathematical model candidate Mn.
- the method in which the model evaluation unit 60 evaluates the goodness of the candidate Mn of the mathematical model is arbitrary.
- The model evaluation unit 60 may evaluate, for example, how high the generalization performance of a mathematical model candidate is, because higher generalization performance means a better ability to predict unknown test data.
- Methods for evaluating model performance include a method using an information criterion such as AIC (Akaike's Information Criterion) or BIC (Bayesian Information Criterion), a method using CV-Error (cross-validation error), and a method using MIC (Maximal Information Coefficient). The evaluation method used by the model evaluation unit 60 may be chosen from these in advance.
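- As one illustration of the information-criterion option, the sketch below computes AIC and BIC for a least-squares model from its residual sum of squares, using the standard Gaussian-error forms n·ln(RSS/n) + 2k and n·ln(RSS/n) + k·ln(n) with constant terms dropped. The function names and the sample numbers are hypothetical; the source does not specify how the criteria are computed.

```python
import math


def aic(rss, n, k):
    """AIC for a least-squares fit with n samples and k fitted parameters."""
    return n * math.log(rss / n) + 2 * k


def bic(rss, n, k):
    """BIC for the same fit; the complexity penalty grows with ln(n)."""
    return n * math.log(rss / n) + k * math.log(n)


# Hypothetical candidates: name -> (residual sum of squares, number of terms)
n_samples = 50
candidates = {"M1": (12.0, 2), "M2": (11.5, 6)}

# Lower is better for both criteria.
best = min(candidates, key=lambda m: aic(candidates[m][0], n_samples, candidates[m][1]))
print(best)
```

- Both criteria trade goodness of fit against the number of terms, so a slightly better fit obtained with many more terms does not automatically win.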
- the mathematical model generation system 100 repeats the processes of the explanatory variable generation unit 30, the explanatory variable selection unit 40, and the model evaluation unit 60 a predetermined number of times to generate a plurality of mathematical model candidates.
- the upper limit of the number of repetitions is N.
- the upper limit N may be set by the user or the like according to the computing power of the computer to be used and the complexity of the phenomenon.
- In this repetition, the explanatory variable generation unit 30 generates new explanatory variables by combining only the explanatory variables selected by the explanatory variable selection unit 40, and generates new explanatory variable candidates including the selected explanatory variables and the newly generated ones. That is, the mathematical model generation system 100 repeats two steps: the explanatory variable generation unit 30 adds candidates by generating non-linear terms, and the explanatory variable selection unit 40 excludes the candidates that were not selected. Such recursive processing makes it possible to generate multiple mathematical model candidates of differing complexity.
- The model selection unit 70 selects the mathematical model M F with the highest evaluation from among the plurality of generated mathematical model candidates. Furthermore, the model selection unit 70 may select a predetermined number of explanatory variables from those contained in the selected mathematical model M F, in descending order of contribution rate to the model, and generate a new mathematical model (specifically, a linear regression model) containing the selected explanatory variables in its terms. Specifically, if the number of explanatory variables x F' of the selected mathematical model M F is greater than a predetermined number L, the model selection unit 70 may select the top L explanatory variables by contribution rate and generate a new linear regression model containing them in its terms.
- the method of calculating the contribution of the explanatory variables included in the mathematical model is arbitrary.
- the model selection unit 70 may determine that the explanatory variable having a higher linear regression standardization coefficient has a higher contribution rate.
- the model selection unit 70 may determine that the explanatory variable having a larger parameter value calculated in the random forest has a higher contribution.
- the number L of the selected explanatory variables may be set by the user or the like according to the complexity of the phenomenon targeted by the mathematical model.
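- Ranking by standardized regression coefficient and keeping the top L terms can be sketched as follows. The function name, the use of population standard deviations, and the sample coefficients and data are assumptions for illustration; the source leaves the contribution calculation open.

```python
from statistics import pstdev


def top_l_by_standardized_coef(coefs, columns, y, l):
    """Return the l term names with the largest |standardized coefficient|."""
    sd_y = pstdev(y)
    scores = {
        # standardized coefficient = raw coefficient * sd(term) / sd(target)
        name: abs(b) * pstdev(columns[name]) / sd_y
        for name, b in coefs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:l]


# Hypothetical fitted model with three terms
columns = {
    "x1": [1.0, 2.0, 3.0, 4.0],
    "x1*x2": [10.0, 20.0, 10.0, 20.0],
    "x3": [0.1, 0.2, 0.1, 0.3],
}
coefs = {"x1": 2.0, "x1*x2": 0.1, "x3": 5.0}
y = [3.0, 5.0, 7.0, 10.0]

selected = top_l_by_standardized_coef(coefs, columns, y, l=2)
print(selected)
```

- Standardizing matters here because a raw coefficient depends on the scale of its term: "x3" has the largest raw coefficient but the smallest spread, so it ranks last.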
- the output unit 80 outputs the generated mathematical model.
- The input unit 20, the explanatory variable generation unit 30, the explanatory variable selection unit 40, the model evaluation unit 60, the model selection unit 70, and the output unit 80 are realized by a computer processor (for example, a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit)) that operates according to a program (the mathematical model generation program).
- For example, the program may be stored in the storage unit 10, and the processor may read the program and, according to it, operate as the input unit 20, the explanatory variable generation unit 30, the explanatory variable selection unit 40, the model evaluation unit 60, the model selection unit 70, and the output unit 80. The functions of these units may also be provided in SaaS (Software as a Service) format.
- The input unit 20, the explanatory variable generation unit 30, the explanatory variable selection unit 40, the model evaluation unit 60, the model selection unit 70, and the output unit 80 may each be realized by dedicated hardware. Further, a part or all of each component of each device may be realized by a general-purpose or dedicated circuit (circuitry), a processor, or a combination thereof. These may be composed of a single chip or of a plurality of chips connected via a bus. A part or all of each component of each device may also be realized by a combination of the above-described circuitry and a program.
- When a part or all of the components of the input unit 20, the explanatory variable generation unit 30, the explanatory variable selection unit 40, the model evaluation unit 60, the model selection unit 70, and the output unit 80 are realized by a plurality of information processing devices, circuits, or the like, these may be centrally arranged or distributed.
- the information processing device, the circuit, and the like may be realized as a form in which each is connected via a communication network, such as a client-server system and a cloud computing system.
- FIG. 3 is a flowchart showing an operation example of the mathematical model generation system 100 of the present embodiment.
- The input unit 20 accepts the input of data including the objective variable y and the explanatory variable x (step S11).
- the processes from step S13 to step S17 are repeated up to a predetermined number of times N.
- The explanatory variable generation unit 30 comprehensively generates the non-linear terms that can be created from the underlying explanatory variables x n-1 (step S14). Specifically, the explanatory variable generation unit 30 comprehensively creates the nonlinear terms (sum, difference, product, quotient, exponent, trigonometric function, etc.) that can be created from one or two of the explanatory variables x n-1. The underlying explanatory variables and the created nonlinear terms are then set as new explanatory variable candidates x n-1'.
- the explanatory variable selection unit 40 may select the explanatory variables based on the constraint that the larger the number of types of the original explanatory variables x included in the selected explanatory variables, the more difficult it is to be selected.
- The model evaluation unit 60 records the generalization performance G n of the linear regression model candidate M n in the storage unit 10 (step S16).
- The model selection unit 70 selects the mathematical model M F with the best generalization performance G n (step S18). The model selection unit 70 then determines whether the number of explanatory variables contained in the selected mathematical model M F is L or less (step S19). When the number of explanatory variables is not L or less (No in step S19), the model selection unit 70 selects the top L explanatory variables with the highest contribution to the model (step S20) and regenerates the linear regression model M F from the selected explanatory variables (step S21).
- If the number of explanatory variables is L or less (Yes in step S19), or after the processing in step S21, the output unit 80 outputs M F as the best model (step S22).
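- The repeated generate, select, and evaluate flow described above can be sketched as a small driver function. This is only a skeleton under stated assumptions: the name `build_best_model`, the callable parameters `generate`, `select`, and `evaluate`, and the convention that a lower score is better (as with AIC or cross-validation error) are hypothetical and not taken from the source.

```python
def build_best_model(x0, generate, select, evaluate, n_iter):
    """Run the generate -> select -> evaluate loop n_iter times; return the best candidate.

    generate(terms)    -> list of candidate terms (base terms plus non-linear terms)
    select(candidates) -> (selected_terms, model) for this iteration
    evaluate(model)    -> score, where lower is assumed to be better
    """
    current = list(x0)
    history = []                                   # one (score, model) pair per iteration
    for _ in range(n_iter):
        candidates = generate(current)             # expand with non-linear terms (step S14)
        current, model = select(candidates)        # keep important terms and fit a model
        history.append((evaluate(model), model))   # record the score (step S16)
    best_score, best_model = min(history, key=lambda pair: pair[0])  # step S18
    return best_model, best_score
```

- Because only the selected terms are fed back into `generate`, each pass builds more complex terms from a pruned set, which is what keeps the search tractable while still producing candidates of differing complexity.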
- the explanatory variable generation unit 30 generates a new explanatory variable by combining the underlying explanatory variables, and the explanatory variable including the basic explanatory variable and the generated new explanatory variable. Generate candidates. Further, the explanatory variable selection unit 40 selects a more preferable explanatory variable as the explanatory variable used for the mathematical model from the explanatory variable candidates, and generates a candidate for the mathematical model. When the process is repeated to generate mathematical model candidates, the model evaluation unit 60 evaluates the goodness of each generated mathematical model candidate, and the model selection unit 70 selects the highest evaluation mathematical model. Therefore, a solution can be uniquely determined and a highly interpretable mathematical model can be generated.
- Embodiment 2
- Next, a second embodiment of the mathematical model generation system according to the present invention will be described. The second embodiment describes how the user inputs the condition settings for model generation and how various results are output.
- FIG. 4 is a block diagram showing a configuration example of a second embodiment of the mathematical model generation system according to the present invention.
- The mathematical model generation system 200 of the present embodiment includes a storage unit 10, an input unit 120, an explanatory variable generation unit 30, an explanatory variable selection unit 40, a model evaluation unit 60, a model selection unit 70, and an output unit 180. That is, the mathematical model generation system 200 of the present embodiment differs from the mathematical model generation system 100 of the first embodiment in that it has an input unit 120 instead of the input unit 20 and an output unit 180 instead of the output unit 80. Otherwise, the configuration is the same as that of the first embodiment.
- the input unit 120 accepts data and parameter inputs so that the user can set conditions for model generation.
- The input unit 120 includes a learning data input unit 121, an explanatory variable selection unit 122, a term selection unit 123, a function type selection unit 124, an evaluation index selection unit 125, an evaluation method selection unit 126, and a parameter setting unit 127.
- the input unit 120 may include all the configurations described above, or may include only a part of the configurations.
- the learning data input unit 121 accepts input of training data used for generating a mathematical model, as in the input unit 20 of the first embodiment.
- The explanatory variable selection unit 122 accepts from the user the selection of one or more explanatory variables to be used for model generation. Specifically, the explanatory variable selection unit 122 may accept the selection of the explanatory variables from which the explanatory variable generation unit 30 generates new explanatory variables, or the selection of the explanatory variable candidates that the explanatory variable selection unit 40 uses as explanatory variables of the mathematical model. The explanatory variable selection unit 122 may, for example, output the available explanatory variables to a display device (not shown) and let the user select among them, or it may select the target explanatory variables from those contained in a table-format file or in records of a database.
- the term selection unit 123 accepts from the user the selection of the type of term for which the explanatory variable generation unit 30 generates a new explanatory variable.
- the term selection unit 123 may accept options such as sum, difference, quotient, product, power, exponent, logarithm, and trigonometric function as the types of terms to be generated when the non-linear term is automatically generated.
- The function type selection unit 124 accepts the selection of the type of the function A m (γ) that characterizes the regularization term used by the explanatory variable selection unit 40.
- The function type selection unit 124 may accept the selection of a function type from, for example, a step function as shown in Equation 2 above and regularization types such as L 1, L 0, and L 0.5.
- the evaluation index selection unit 125 accepts from the user the selection of an index for which the model evaluation unit 60 evaluates the generalization performance of the candidate of the mathematical model.
- the evaluation index selection unit 125 may accept selection of an index to be used for evaluation from, for example, the evaluation methods described above (AIC, BIC, CV-Error, MIC, etc. described above).
- the evaluation method selection unit 126 accepts from the user the selection of the evaluation method used by the model selection unit 70 when determining the contribution of the explanatory variables.
- the evaluation method selection unit 126 may accept selection from evaluation methods such as, for example, a linear regression standardization coefficient, a random forest, and the like.
- The parameter setting unit 127 accepts inputs of various parameters used for processing by the mathematical model generation system 200.
- For example, the parameter setting unit 127 may accept input of at least one of the upper limit K of the number of explanatory variables selected by the explanatory variable selection unit 40, the number L of explanatory variables selected by the model selection unit 70, and the upper limit N of the number of repetitions. The parameter setting unit 127 then configures each process to use the received parameters.
- the parameter setting unit 127 may accept inputs such as upper limit values and threshold values.
- the output unit 180 outputs various information obtained when the mathematical expression model is generated, in addition to the generated mathematical expression model. Specifically, the output unit 180 may output the generated nonlinear term, explanatory variable candidate, mathematical model candidate, generalization performance, and the like. The mode of output is arbitrary. The output unit 180 may display various information on a display device (not shown), or may output various information to a log file or the like.
- FIG. 5 is an explanatory diagram showing an example of the result of deriving the Stokes' equation driven by data.
- FIG. 6 is an explanatory diagram showing an example of the result of deriving the equation of the energy conservation law driven by data.
- For comparison with the mathematical model generated by the above-mentioned mathematical model generation system 100, FIGS. 5 and 6 also show the prediction results of a model generated by LASSO and of a model generated by a neural network.
- Graphs G41 and G51 show the prediction results of the model generated by LASSO, graphs G42 and G52 show those of the model generated by a neural network, and graphs G43 and G53 show those of the generated mathematical model.
- LASSO can express a phenomenon by a mathematical formula, but since it assumes linearity, it cannot derive a non-linear phenomenon (Stokes' equation or energy conservation law equation).
- neural networks can appropriately express non-linearity, but cannot express models by mathematical formulas.
- FIG. 7 is a block diagram showing an outline of the mathematical model generation system according to the present invention.
- The mathematical model generation system 90 (for example, the mathematical model generation system 100) according to the present invention includes: an explanatory variable generation means 91 (for example, the explanatory variable generation unit 30) that generates new explanatory variables by combining the underlying explanatory variables (for example, the original explanatory variables x) and generates explanatory variable candidates including the underlying explanatory variables and the generated new explanatory variables; an explanatory variable selection means 92 (for example, the explanatory variable selection unit 40) that selects, from among the explanatory variable candidates, more preferable explanatory variables as the explanatory variables to be used for a mathematical model, which is a model expressing a nonlinear phenomenon by a mathematical formula, and generates a candidate for the mathematical model; a model evaluation means 93 (for example, the model evaluation unit 60) that evaluates the goodness of the generated mathematical model candidate; and a model selection means 94 (for example, the model selection unit 70) that selects the mathematical model with the highest evaluation from among the generated mathematical model candidates.
- the explanatory variable generation means 91 then generates new explanatory variables by combining the explanatory variables selected as preferable, and generates new explanatory variable candidates including the selected explanatory variables and the newly generated explanatory variables.
- the explanatory variable selection means 92 selects explanatory variables from the newly generated explanatory variable candidates to generate candidates for the mathematical model, and the model evaluation means 93 evaluates the goodness of each generated candidate.
- the explanatory variable selection means 92 may select explanatory variables under a constraint that makes an explanatory variable harder to select the more types of base explanatory variables it is built from. With such a configuration, it is possible to generate a model that is close to the expected behavior while suppressing the amount of calculation.
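One way such a constraint could be written, assuming selection is performed by a LASSO-type regression, is a weighted L1 penalty. This is only an illustrative sketch: the source does not give a formula, and the symbols (candidate variables z_j, coefficients β_j, regularization strength λ, weights w_j, and d_j for the number of distinct base explanatory variables in z_j) as well as the choice w_j = d_j are assumptions.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \frac{1}{2n}\Bigl\| y - \sum_{j=1}^{p} \beta_j z_j \Bigr\|_2^2
  \;+\; \lambda \sum_{j=1}^{p} w_j \,\lvert \beta_j \rvert ,
\qquad w_j = d_j
```

Under this assumed weighting, a candidate such as z = x_1 x_2 (d = 2) is penalized twice as strongly as z = x_1^2 (d = 1), so it needs to explain more of the target before it is selected.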
- the model selection means 94 may select, from the explanatory variables included in the selected mathematical model, a predetermined number of explanatory variables in descending order of their contribution to the mathematical model, and may generate a new mathematical model that includes the selected explanatory variables as its terms.
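As a rough illustration of keeping only the highest-contributing terms, the sketch below ranks the variables of a fitted linear model and keeps the top L. The contribution measure (absolute coefficient multiplied by the variable's standard deviation), the function name `top_terms`, and the sample numbers are all assumptions made for illustration; the source does not define how the contribution is computed.

```python
def top_terms(coefficients, scales, limit):
    """Return the `limit` variable names with the largest assumed contribution.

    coefficients: fitted coefficient per variable name
    scales:       standard deviation of each variable (hypothetical input)
    limit:        upper limit L on the number of terms to keep
    """
    # Assumed contribution measure: |coefficient| * standard deviation.
    contribution = {name: abs(c) * scales[name] for name, c in coefficients.items()}
    ranked = sorted(contribution, key=contribution.get, reverse=True)
    return ranked[:limit]


# Hypothetical fitted model with four terms.
coefficients = {"x1": 0.02, "x1*x2": 2.0, "x2*x2": -0.5, "x2": 0.001}
scales = {"x1": 1.0, "x1*x2": 0.4, "x2*x2": 0.3, "x2": 1.0}

kept = top_terms(coefficients, scales, limit=2)
print(kept)
```

A new model restricted to the names in `kept` would then be refitted, which is one plausible reading of "a new mathematical model that includes the selected explanatory variables as its terms".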
- explanatory variable selection means 92 may select an explanatory variable by machine learning (for example, LASSO) and generate a candidate for a mathematical model using the selected explanatory variable.
- a plurality of mathematical model candidates may be generated by the explanatory variable generation means 91 generating explanatory variable candidates, the explanatory variable selection means 92 selecting explanatory variables and generating mathematical model candidates, and the model evaluation means 93 evaluating the goodness of those candidates.
- the model selection means 94 may then select the most highly evaluated mathematical model from among the generated candidates.
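The generate, select, evaluate, and pick-the-best cycle described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the implementation of the system: variables are combined by pairwise multiplication only, a greedy forward selection stands in for LASSO, "goodness" is taken to be mean squared error on held-out rows, and all names (`combine`, `select`, `generate_model`, `k`, `rounds`) are hypothetical. The target is a toy case, y = 2·x1·x2.

```python
import itertools
import numpy as np


def combine(variables):
    """Current variables plus all pairwise products (assumed combination rule)."""
    out = dict(variables)
    pairs = itertools.combinations_with_replacement(list(variables.items()), 2)
    for (name_a, a), (name_b, b) in pairs:
        out[f"{name_a}*{name_b}"] = a * b
    return out


def design(variables, names, rows):
    """Design matrix: one column per chosen variable plus an intercept column."""
    return np.column_stack([variables[n][rows] for n in names] + [np.ones(len(rows))])


def fit(variables, names, y, rows):
    """Ordinary least-squares coefficients for the chosen variables."""
    coef, *_ = np.linalg.lstsq(design(variables, names, rows), y[rows], rcond=None)
    return coef


def mse(variables, names, coef, y, rows):
    """Mean squared error on the given rows (assumed goodness measure)."""
    return float(np.mean((design(variables, names, rows) @ coef - y[rows]) ** 2))


def select(variables, y, rows, k):
    """Greedy forward selection of at most k variables (a stand-in for LASSO)."""
    chosen = []
    residual = y[rows] - y[rows].mean()
    while len(chosen) < k and np.linalg.norm(residual) > 1e-9:
        def score(name):
            f = variables[name][rows] - variables[name][rows].mean()
            norm = np.linalg.norm(f)
            return 0.0 if norm < 1e-12 else abs(f @ residual) / norm
        remaining = [n for n in variables if n not in chosen]
        if not remaining:
            break
        chosen.append(max(remaining, key=score))
        coef = fit(variables, chosen, y, rows)
        residual = y[rows] - design(variables, chosen, rows) @ coef
    return chosen


def generate_model(base, y, train, test, k=2, rounds=2):
    """Return the best (held-out MSE, variable names, coefficients) candidate."""
    candidates = []
    current = combine(base)  # base variables plus their combinations
    for _ in range(rounds):
        names = select(current, y, train, k)
        coef = fit(current, names, y, train)
        candidates.append((mse(current, names, coef, y, test), names, coef))
        # Next round: combine only the variables that were just selected.
        current = combine({n: current[n] for n in names})
    return min(candidates, key=lambda c: c[0])


# Toy data on a regular grid; even rows are used for fitting, odd rows for evaluation.
g1, g2 = np.meshgrid(np.linspace(-1, 1, 21), np.linspace(-1, 1, 21))
x1, x2 = g1.ravel(), g2.ravel()
y = 2.0 * x1 * x2
train, test = np.arange(0, y.size, 2), np.arange(1, y.size, 2)

best_mse, best_names, best_coef = generate_model({"x1": x1, "x2": x2}, y, train, test)
print(best_names, best_coef, best_mse)
```

Because each round only combines the variables that survived the previous selection, the candidate set stays small, and every candidate model remains a linear combination of named terms that can be read off as a formula.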
- the mathematical model generation system 90 may include a parameter setting means that accepts input of at least one of the upper limit of the number of explanatory variables selected by the explanatory variable selection means 92 (for example, the upper limit K) and the upper limit of the number of explanatory variables selected by the model selection means 94 (for example, L).
- FIG. 8 is a schematic block diagram showing a configuration of a computer according to at least one embodiment.
- the computer 1000 includes a processor 1001, a main storage device 1002, an auxiliary storage device 1003, and an interface 1004.
- the above-mentioned mathematical model generation system 90 is implemented in the computer 1000.
- the operation of each of the above-mentioned processing units is stored in the auxiliary storage device 1003 in the form of a program (mathematical model generation program).
- the processor 1001 reads a program from the auxiliary storage device 1003, expands it to the main storage device 1002, and executes the above processing according to the program.
- the auxiliary storage device 1003 is an example of a non-transitory tangible medium.
- other examples of non-transitory tangible media include magnetic disks, magneto-optical disks, CD-ROMs (Compact Disc Read-Only Memory), DVD-ROMs (Digital Versatile Disc Read-Only Memory), and semiconductor memories connected via the interface 1004.
- the computer 1000 may be connected to the input device 1005 and the output device 1006 via the interface 1004, the input device 1005 may receive an input by a user or the like, and the output device 1006 may display the processing result.
- the program may be for realizing a part of the above-mentioned functions. Further, the program may be a so-called difference file (difference program) that realizes the above-mentioned function in combination with another program already stored in the auxiliary storage device 1003.
- Appendix 1 A mathematical model generation system comprising: an explanatory variable generation means that generates new explanatory variables by combining base explanatory variables and generates explanatory variable candidates including the base explanatory variables and the newly generated explanatory variables; an explanatory variable selection means that selects, from among the explanatory variable candidates, the more preferable explanatory variables to be used in a mathematical model, which is a model expressing a nonlinear phenomenon by a mathematical formula, and generates candidates for the mathematical model; a model evaluation means that evaluates the goodness of the generated mathematical model candidates; and a model selection means that selects the most highly evaluated mathematical model from among the plurality of generated mathematical model candidates, wherein the explanatory variable generation means generates new explanatory variables by combining the explanatory variables selected as preferable and generates new explanatory variable candidates including the selected explanatory variables and the newly generated explanatory variables, the explanatory variable selection means selects explanatory variables from the newly generated explanatory variable candidates to generate mathematical model candidates, and the model evaluation means evaluates the goodness of each generated mathematical model candidate.
- Appendix 2 The mathematical model generation system according to Appendix 1, wherein the explanatory variable selection means generates a linear regression model as a candidate for the mathematical model.
- Appendix 3 The mathematical model generation system according to Appendix 1 or Appendix 2, wherein the explanatory variable selection means selects an explanatory variable based on a constraint that the larger the number of types of explanatory variables on which it is based, the more difficult it is to select.
- The mathematical model generation system according to any one of Appendices 1 to 4, wherein the explanatory variable selection means selects explanatory variables by machine learning and generates candidates for the mathematical model using the selected explanatory variables.
- Appendix 8 A mathematical model generation method comprising: generating new explanatory variables by combining base explanatory variables, and generating explanatory variable candidates including the base explanatory variables and the newly generated explanatory variables; selecting, from among the explanatory variable candidates, the more preferable explanatory variables to be used in a mathematical model, which is a model expressing a nonlinear phenomenon by a mathematical formula, and generating candidates for the mathematical model; generating new explanatory variables by combining the explanatory variables selected as preferable, and generating new explanatory variable candidates including the selected explanatory variables and the newly generated explanatory variables; selecting explanatory variables from the newly generated explanatory variable candidates to generate mathematical model candidates; evaluating the goodness of each generated mathematical model candidate; and selecting the most highly evaluated mathematical model from among the plurality of generated mathematical model candidates.
- Appendix 9 The mathematical model generation method according to Appendix 8, which generates a linear regression model as a candidate for a mathematical model.
- Appendix 10 A program storage medium storing a mathematical model generation program that causes a computer to execute: an explanatory variable generation process of generating new explanatory variables by combining base explanatory variables and generating explanatory variable candidates including the base explanatory variables and the newly generated explanatory variables; an explanatory variable selection process of selecting, from among the explanatory variable candidates, the more preferable explanatory variables to be used in a mathematical model, which is a model expressing a nonlinear phenomenon by a mathematical formula, and generating candidates for the mathematical model; a model evaluation process of evaluating the goodness of the generated mathematical model candidates; and a model selection process of selecting the most highly evaluated mathematical model from among the plurality of generated mathematical model candidates, wherein, in the explanatory variable generation process, new explanatory variables are generated by combining the explanatory variables selected as preferable and new explanatory variable candidates including the selected explanatory variables and the newly generated explanatory variables are generated, in the explanatory variable selection process, explanatory variables are selected from the newly generated explanatory variable candidates to generate mathematical model candidates, and in the model evaluation process, the goodness of each generated mathematical model candidate is evaluated.
- Appendix 11 The program storage medium according to Appendix 10, storing a mathematical model generation program that causes the computer to generate a linear regression model as a candidate for the mathematical model in the explanatory variable selection process.
- Appendix 12 A mathematical model generation program that causes a computer to execute: an explanatory variable generation process of generating new explanatory variables by combining base explanatory variables and generating explanatory variable candidates including the base explanatory variables and the newly generated explanatory variables; an explanatory variable selection process of selecting, from among the explanatory variable candidates, the more preferable explanatory variables to be used in a mathematical model, which is a model expressing a nonlinear phenomenon by a mathematical formula, and generating candidates for the mathematical model; a model evaluation process of evaluating the goodness of the generated mathematical model candidates; and a model selection process of selecting the most highly evaluated mathematical model from among the plurality of generated mathematical model candidates, wherein, in the explanatory variable generation process, new explanatory variables are generated by combining the explanatory variables selected as preferable and new explanatory variable candidates including the selected explanatory variables and the newly generated explanatory variables are generated, in the explanatory variable selection process, explanatory variables are selected from the newly generated explanatory variable candidates to generate mathematical model candidates, and in the model evaluation process, the goodness of each generated mathematical model candidate is evaluated.
- Appendix 13 The mathematical model generation program according to Appendix 12, which causes a computer to generate a linear regression model as a candidate for a mathematical model in an explanatory variable selection process.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
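| PCT/JP2020/018844 WO2021229648A1 (ja) | 2020-05-11 | 2020-05-11 | Mathematical model generation system, mathematical model generation method, and mathematical model generation program |
| JP2022522107A JP7491371B2 (ja) | 2020-05-11 | 2020-05-11 | Mathematical model generation system, mathematical model generation method, and mathematical model generation program |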
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2020/018844 WO2021229648A1 (ja) | 2020-05-11 | 2020-05-11 | Mathematical model generation system, mathematical model generation method, and mathematical model generation program |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021229648A1 (ja) | 2021-11-18 |
Family
ID=78525435
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2020/018844 Ceased WO2021229648A1 (ja) | 2020-05-11 | 2020-05-11 | Mathematical model generation system, mathematical model generation method, and mathematical model generation program |
Country Status (2)
| Country | Link |
|---|---|
| JP (1) | JP7491371B2 |
| WO (1) | WO2021229648A1 |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
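| JP2000020504A (ja) * | 1998-06-30 | 2000-01-21 | Toshiba Corp | Method for explaining or predicting an objective variable, and recording medium storing a program for explaining or predicting an objective variable |
| WO2019053828A1 (ja) * | 2017-09-13 | 2019-03-21 | NEC Corporation | Information analysis device, information analysis method, and information analysis program |
| JP2020057261A (ja) * | 2018-10-03 | 2020-04-09 | Toyota Motor Corporation | Multiple regression analysis device and multiple regression analysis method |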
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP6947108B2 (ja) | 2018-04-03 | 2021-10-13 | Nippon Telegraph and Telephone Corporation | Data prediction device, method, and program |
2020
- 2020-05-11 WO PCT/JP2020/018844 patent/WO2021229648A1/ja not_active Ceased
- 2020-05-11 JP JP2022522107A patent/JP7491371B2/ja active Active
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20220076148A1 (en) * | 2020-09-07 | 2022-03-10 | Kioxia Corporation | Information processing device and information processing method |
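| JPWO2023139718A1 * | 2022-01-20 | 2023-07-27 | | |
| WO2023139718A1 (ja) * | 2022-01-20 | 2023-07-27 | NEC Corporation | Feature quantity selection device, feature quantity selection method, physical condition estimation system, and recording medium |
| JP7711777B2 (ja) | 2022-01-20 | 2025-07-23 | NEC Corporation | Feature quantity selection device, feature quantity selection method, and program |
| JPWO2024029020A1 * | 2022-08-04 | 2024-02-08 | | |
| WO2024029020A1 (ja) * | 2022-08-04 | 2024-02-08 | Nippon Telegraph and Telephone Corporation | Data analysis device, data analysis method, and data analysis program |
| JP7750420B2 (ja) | 2022-08-04 | 2025-10-07 | NTT, Inc. | Data analysis device, data analysis method, and data analysis program |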
Also Published As
| Publication number | Publication date |
|---|---|
| JP7491371B2 (ja) | 2024-05-28 |
| JPWO2021229648A1 | 2021-11-18 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| JP6451895B2 (ja) | Prediction model selection system, prediction model selection method, and prediction model selection program | |
| JP7491371B2 (ja) | Mathematical model generation system, mathematical model generation method, and mathematical model generation program | |
| US12271797B2 (en) | Feature selection for model training | |
| JP2018097612A (ja) | Information processing device, program, and information processing method | |
| JP7421475B2 (ja) | Learning method, mixing ratio prediction method, and learning device | |
| JP7339924B2 (ja) | System for estimating characteristic values of materials | |
| EP4133416A1 (en) | Automatic selection of quantization and filter removal optimization under energy constraints | |
| KR101976689B1 (ko) | Method and apparatus for automatically generating variables for data modeling | |
| US20210026853A1 (en) | Combination search system, information processing device, method, and program | |
| Azzali et al. | A vectorial approach to genetic programming | |
| US20230401361A1 (en) | Generating and analyzing material structures based on neural networks | |
| US20230273771A1 (en) | Secret decision tree test apparatus, secret decision tree test system, secret decision tree test method, and program | |
| JP2016103126A (ja) | Method for determining conditions for category division of key performance indicators, and computer and computer program therefor | |
| CN119515155A (zh) | Method and system for automatic evaluation of personnel ability based on an attention neural network | |
| JP7323669B1 (ja) | Ontology generation method and learning method | |
| WO2020181032A1 (en) | Methods of explaining an individual predictions made by predictive processes and/or predictive models | |
| CN106599899A (zh) | Data processing method and device | |
| Mandli et al. | Selection of most relevant features from high dimensional data using ig-ga hybrid approach | |
| US20230325304A1 (en) | Secret decision tree test apparatus, secret decision tree test system, secret decision tree test method, and program | |
| JP7310884B2 (ja) | Parameter estimation device, parameter estimation method, and parameter estimation program | |
| JP2023069932A (ja) | Model construction device, model construction method, and program | |
| US20250259111A1 (en) | Learning apparatus, physical property prediction apparatus, learning program, and physical property prediction program | |
| JP6721036B2 (ja) | Inference system, inference method, and program | |
| US20250342222A1 (en) | Parameter generation device, system, method, and program | |
| JP7207423B2 (ja) | Working set selection device, working set selection method, and working set selection program | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20935504; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2022522107; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 20935504; Country of ref document: EP; Kind code of ref document: A1 |