CN112966447A - Chemical material adsorption performance prediction method and device based on automatic machine learning - Google Patents

Publication number
CN112966447A
Authority
CN
China
Prior art keywords
model
module
machine learning
chemical material
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110318374.8A
Other languages
Chinese (zh)
Inventor
王坤峰
杨培松
张欢
赖欣
阳庆元
俞度立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Chemical Technology
Original Assignee
Beijing University of Chemical Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Chemical Technology
Priority to CN202110318374.8A
Publication of CN112966447A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00: Computer-aided design [CAD]
    • G06F30/20: Design optimisation, verification or simulation
    • G06F30/27: Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00: Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/18: Manufacturability analysis or optimisation for manufacturability

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of machine learning. In one aspect it provides a chemical material adsorption performance prediction method based on automatic machine learning, comprising the following steps: acquiring structural characteristics of a chemical material to construct an original data set; preprocessing the original data set and generating initial models containing hyper-parameters by machine learning; iteratively training the initial models by a pipeline method to generate an optimal prediction model; and inputting a test data set into the optimal prediction model to predict the adsorption performance of the chemical material. The method predicts the adsorption performance of the material quickly and accurately. The invention also provides a chemical material adsorption performance prediction device based on automatic machine learning, comprising a data set construction module, a model pre-training module, a model construction module and a test module; the method is applied in the device to predict the adsorption performance of the chemical material.

Description

Chemical material adsorption performance prediction method and device based on automatic machine learning
Technical Field
The invention relates to the technical field of machine learning, in particular to a chemical material adsorption performance prediction method and device based on automatic machine learning.
Background
The screening and design of chemical materials are important for the storage and transportation of important chemical gases. However, synthesized gas storage materials are numerous and varied, and studying their gas adsorption working capacity requires molecular dynamics simulation, which is accurate but time-consuming; traditional calculation methods are therefore not feasible for finding suitable storage materials among a very large number of candidates.
Machine learning greatly facilitates the calculation of material properties, but it has problems of its own. As algorithms grow in variety and complexity, engineers must choose a model architecture, a training procedure, a regularization method, hyper-parameters and so on, all of which strongly affect algorithm performance. Building accurate and powerful learning models requires advanced data science skills, and selecting an appropriate method for a problem and configuring optimal parameter values for a particular model are difficult tasks. How to calculate the adsorption performance of materials quickly and effectively, and so screen out suitable gas storage materials, is therefore a problem in urgent need of a solution.
The prior art has the following problems. First, traditional material calculation methods are slow and inefficient and cannot meet current requirements. Second, tuning the parameters of common machine learning algorithms is complex, which sets a high barrier for non-professionals. Third, no pipeline model has been designed for material property prediction.
Disclosure of Invention
To this end, one aspect of the present invention provides a method for predicting material adsorption performance based on automatic machine learning. The method obtains characteristics of a chemical material and establishes original data through a data set construction module, establishes initial models through a model pre-training module, generates an optimal prediction model through a model construction module, and predicts the adsorption performance of the chemical material through a test module. The method addresses the problem that, because the prior art has no pipeline model designed for material property prediction, prediction of material adsorption performance is slow and therefore inefficient.
In order to achieve the above object, the present invention provides a method for predicting chemical material adsorption performance based on automatic machine learning, comprising:
acquiring various characteristics related to the adsorption performance of the chemical material, establishing an original data set by combining different types of characteristics, and preprocessing the original data set;
performing feature processing on the preprocessed original data set, and using machine learning to generate a plurality of initial models containing hyper-parameters from the feature-processed data;
iteratively training a plurality of the initial models by a pipeline method to generate an optimal prediction model;
inputting the test data set into the optimal prediction model to predict the adsorption performance of the chemical material.
Further, the method for preprocessing the original data comprises one or more of data sampling, data cleaning, feature compression, feature conversion and feature extraction;
the method for generating the initial models containing the hyper-parameters through machine learning comprises performing feature processing on the original data to ensure that the data are reasonable, and selecting different machine learning algorithms according to prior knowledge to generate the plurality of initial models containing the hyper-parameters.
Further, the method for iteratively training a plurality of initial models by a pipeline method to obtain an optimal prediction model comprises: performing data screening and feature processing on the feature-processed data set according to feature importance, and tuning the parameters of the initial models through a genetic algorithm and an iterative method.
Further, the data screening includes selecting the best top n% of the feature information by using a SelectKBest method and removing the feature information that does not meet a minimum variance threshold, wherein the top n% of features are scored by combining the chi-square test and mutual information, with the formulas:
$$I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}$$
$$\chi^2 = \sum_{i}\frac{(F_i - E_i)^2}{E_i}$$
wherein p(x, y) is the joint distribution function of x and y, p(x) and p(y) are the marginal probability density functions of x and y, respectively, $F_i$ is the observed value of the ith feature, and $E_i$ is the expected value of the ith feature.
Further, the feature importance includes the correlation between each feature and the target variable and the correlation between features. It is generated by analyzing both correlations, retaining features that are strongly correlated with the target variable, and deleting features that are strongly correlated with other features. The correlation is calculated by the formula:
$$r(x,y) = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$
where r(x, y) represents the correlation coefficient between two variables x and y, and $\bar{x}$ and $\bar{y}$ represent the means of x and y, respectively.
Further, the method for tuning the parameters of the initial models through the genetic algorithm comprises: optimizing the hyper-parameters of each initial model separately to generate a plurality of best-performing initial models, and generating an optimal prediction model by selecting the optimal parameters.
Further, the method for generating the optimal prediction model comprises: integrating the plurality of best-performing initial models into an optimal prediction model set through superposition combination, wherein the integration formula is:
$$A^{*},\ \lambda^{*} = \underset{A_j \in A,\ \lambda \in \Lambda_j}{\arg\min}\ \frac{1}{k}\sum_{i=1}^{k} L\left(A^{j}_{\lambda},\ D^{(i)}_{\mathrm{train}},\ D^{(i)}_{\mathrm{valid}}\right)$$
wherein $A = \{A_1, \ldots, A_n\}$ is the set of machine learning methods, each element representing a data processing and machine learning algorithm, and each $A_j \in A$ (j = 1, ..., n) has a corresponding hyper-parameter space $\Lambda_j$.
At model selection, k-fold cross-validation is performed on the data set, partitioning it into k training sets $D^{(i)}_{\mathrm{train}}$ and k validation sets $D^{(i)}_{\mathrm{valid}}$. $L(A^{j}_{\lambda}, D^{(i)}_{\mathrm{train}}, D^{(i)}_{\mathrm{valid}})$ is the loss on the validation set $D^{(i)}_{\mathrm{valid}}$ of algorithm $A_j$ with hyper-parameter $\lambda \in \Lambda_j$ after training on the training set $D^{(i)}_{\mathrm{train}}$. The optimal combination of prediction model and hyper-parameters is the one that minimizes this loss.
Further, the optimal prediction model is evaluated by the goodness of fit $R^2$ and the RMSE:
$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}$$
where n represents the total number of data points, $\hat{y}_i$ and $y_i$ are respectively the value predicted by the optimal model and the actual value for the ith data point, and $\bar{y}$ is the mean of the actual values $y_i$.
Another aspect of the present invention provides an automatic machine learning-based chemical material adsorption performance prediction apparatus for performing the automatic machine learning-based chemical material adsorption performance prediction method according to any one of claims 1 to 8, comprising:
the data set construction module is used for acquiring physical and chemical structural characteristics of the chemical material, judging and filtering invalid data and null values, and establishing an original data set for the filtered characteristics;
the model pre-training module is connected with the data set construction module and used for generating a plurality of initial models containing hyper-parameters according to different algorithms;
a model construction module connected with the model pre-training module and used for iteratively training a plurality of initial models through a pipeline method to generate an optimal prediction model;
and the test module is connected with the model construction module and used for inputting a test data set to the optimal prediction model to predict the adsorption performance of the chemical material.
Further, the model construction module comprises:
the feature engineering module is connected with the model pre-training module and is used for performing feature processing on the original data set and performing feature selection, compression and extraction on the original data according to feature importance;
the model selection module is connected with the feature engineering module and used for selecting algorithm models for the feature-processed original data set in combination with feature importance, and building initial models with different algorithms;
the parameter optimizing module is connected with the model selection module and used for optimizing the hyper-parameters of the plurality of initial models separately by a genetic algorithm to generate a plurality of best-performing initial models;
the pipeline module is connected with the parameter optimizing module and is used for integrating the plurality of best-performing initial models into an optimal prediction model set through a superposition combination method;
and the model evaluation module is connected with the pipeline module and the test module respectively, and is used for evaluating the performance of the models in the optimal prediction model set formed by the pipeline module and selecting an optimal prediction model.
Compared with the prior art, the method has the advantages that the original data set is constructed by obtaining the structural characteristics of the chemical material, the original data set is preprocessed, the initial model containing the hyper-parameters is generated according to machine learning, the optimal prediction model is generated by iterative training of the initial model of the pipeline method, and the test data set is input to the optimal prediction model to predict the adsorption performance of the chemical material.
Furthermore, the original data are preprocessed by sampling and/or data cleaning and/or feature compression and/or feature extraction, feature processing is performed on the original data, and different machine learning algorithms are selected according to prior knowledge to generate a plurality of initial models containing hyper-parameters; this ensures that the data are reasonable and simplifies the parameter tuning process, further improving prediction efficiency.
And further, data screening and feature processing are carried out on the data set subjected to feature processing according to feature importance, and parameter adjustment is carried out on the initial model in an iterative manner through a genetic algorithm, so that the parameter adjustment process is further simplified, and the prediction efficiency is further improved.
Further, the optimal top n% of feature information is selected by a SelectKBest method, and features which do not accord with a minimum variance threshold are removed, so that the feature selection of the data is improved, and the prediction efficiency is further improved.
Further, the feature importance is generated by retaining the features with strong correlation with the target variable and deleting the features with strong correlation between the features, so that the feature selection of the data is improved, and the prediction efficiency is further improved.
Furthermore, the hyper-parameters of the initial models are optimized through a genetic algorithm to generate the initial models with the optimal performance, and the optimal parameters are selected to generate the optimal prediction model, so that the automatic processing of the models is realized, and the prediction efficiency is further improved.
Further, the best-performing initial models are integrated into an optimal prediction model set through superposition combination; k-fold cross-validation is performed during model selection, dividing the data set into k training sets and k validation sets, and each algorithm is validated after training to generate the optimal combination of prediction model and hyper-parameters, so that the models are processed automatically and prediction efficiency is further improved.
Further, the optimal prediction model is evaluated by the goodness of fit $R^2$ and the RMSE, which verifies the optimal prediction model and further improves prediction efficiency.
Furthermore, the chemical material adsorption performance prediction method based on automatic machine learning is implemented in the chemical material adsorption performance prediction device based on automatic machine learning, so that the adsorption performance of chemical materials is predicted automatically and prediction efficiency is further improved.
Drawings
FIG. 1 is a flow chart of a method for predicting the adsorption performance of a chemical material based on automatic machine learning according to the present invention;
FIG. 2 is a detailed flowchart of step S300;
FIG. 3 is a block diagram of the structure of the chemical material adsorption performance prediction device based on automatic machine learning according to the present invention;
FIG. 4 is a schematic diagram of model performance evaluation for the adsorption of methane gas by organic molecules according to the present invention.
Detailed Description
In order that the objects and advantages of the invention will be more clearly understood, the invention is further described below with reference to examples; it should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and do not limit the scope of the present invention.
It should be noted that in the description of the present invention, the terms of direction or positional relationship indicated by the terms "upper", "lower", "left", "right", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, which are only for convenience of description, and do not indicate or imply that the device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and thus, should not be construed as limiting the present invention.
Furthermore, it should be noted that, in the description of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
Fig. 1 is a flow chart of a method for predicting the adsorption performance of a chemical material based on automatic machine learning according to the present invention.
The method for predicting the adsorption performance of the chemical material based on automatic machine learning comprises the following steps:
step S100, acquiring various characteristics of chemical materials, and constructing an original data set by combining different types of characteristics;
step S200, inputting an original data set to perform feature processing, and obtaining a plurality of initial models containing hyper-parameters by using a base learner;
step S300, performing iterative training on a plurality of initial models by using a pipeline method to obtain an optimal prediction model: performing data screening and feature processing on the input data set; adjusting parameters of the model through a genetic algorithm, and obtaining an optimal prediction model through a step-by-step iteration method;
step S400, inputting a test data set to the optimal prediction model to predict the adsorption performance of the chemical material.
In step S100, the physical and chemical characteristics of the material are calculated by a conventional molecular simulation method, including physical characteristics such as pore size, volume, density, surface area and element content percentage, and chemical characteristics such as heat of adsorption; invalid data and null values are identified and filtered out, and an original data set is established from the remaining characteristics.
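The filtering in step S100 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the feature names and the numeric values are made up for the example, and "invalid" is assumed to mean missing, non-numeric or non-finite.

```python
import math

# Hypothetical feature names; the source lists the kinds of features
# (pore size, density, surface area, heat of adsorption) but no column names.
FEATURES = ["pore_size", "density", "surface_area", "heat_of_adsorption"]


def is_valid(value):
    """Accept finite numbers only; reject None, NaN, inf and non-numeric values."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return math.isfinite(value)


def build_dataset(records, features=FEATURES):
    """Keep only the records in which every feature is present and valid."""
    rows = []
    for record in records:
        values = [record.get(name) for name in features]
        if all(is_valid(v) for v in values):
            rows.append(values)
    return rows


# Made-up records: the second has a null value, the third a NaN.
raw_records = [
    {"pore_size": 8.1, "density": 0.62, "surface_area": 3100.0, "heat_of_adsorption": 14.2},
    {"pore_size": None, "density": 0.71, "surface_area": 2800.0, "heat_of_adsorption": 15.0},
    {"pore_size": 6.4, "density": float("nan"), "surface_area": 2500.0, "heat_of_adsorption": 13.1},
    {"pore_size": 7.3, "density": 0.55, "surface_area": 3400.0, "heat_of_adsorption": 16.8},
]
dataset = build_dataset(raw_records)
print(len(dataset))  # 2 records survive the filter
```

Dropping whole records is the simplest policy; imputing missing values instead would be an equally plausible reading of the text.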
In step S200, feature processing is performed on the original data set, and a plurality of initial models containing hyper-parameters are generated through machine learning, with the initial parameters of each model set according to prior knowledge.
In step S300, data screening and feature processing are performed on the input data set; the parameters of the models are tuned through a genetic algorithm; and the optimal prediction model is obtained by a step-by-step iteration method.
In this embodiment, a plurality of initial models containing hyper-parameters are obtained according to prior knowledge, and the initial models are iteratively trained by a pipeline method to generate an optimal prediction model.
Specifically, data screening and feature processing are performed on the input data set, the parameters of the initial models are tuned through a genetic algorithm, an optimal prediction model is generated by a step-by-step iteration method, and a test data set is input into the optimal prediction model to predict the adsorption performance of the chemical material. The pipeline method performs data preprocessing, feature engineering, model selection, model evaluation and the like end to end, so that the prediction model can be optimized without manual intervention. Applying automatic machine learning to chemical material performance prediction in this way can minimize time and labor cost while maintaining prediction accuracy.
Fig. 2 is a flowchart of step S300 of the method for predicting the chemical material adsorption performance based on automatic machine learning according to the present invention.
The embodiment of the invention provides a chemical material adsorption performance prediction method based on automatic machine learning, in which step S300 comprises the following steps:
Step 310, feature engineering: performing feature processing on the original data and preprocessing the original data set, wherein the preprocessing comprises one or more of data sampling, data cleaning, feature compression, feature conversion and feature extraction, and performing feature screening according to feature importance.
Specifically, the SelectKBest method is used to select the best top n% of the feature information, and features that do not meet the minimum variance threshold are removed, wherein the top n% of features are scored by combining the chi-square test and mutual information:
$$I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}$$
$$\chi^2 = \sum_{i}\frac{(F_i - E_i)^2}{E_i}$$
wherein p(x, y) is the joint distribution function of x and y, p(x) and p(y) are the marginal probability density functions of x and y, $F_i$ is the observed value of the ith feature, and $E_i$ is the expected value of the ith feature. The feature importance includes the correlation between each feature and the target variable and the correlation between features: features strongly correlated with the target variable are retained, and features strongly correlated with other features are deleted to avoid multicollinearity in the model.
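A minimal sketch of this screening step, written in plain Python for discrete (binned) features. The source does not say how the chi-square and mutual information scores are combined, so summing the two rank positions is an assumption here, as are the feature names and the toy data. The source names SelectKBest; this sketch reimplements the idea rather than calling any library.

```python
import math
from collections import Counter


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def mutual_information(xs, ys):
    """I(X;Y) = sum over (x, y) of p(x,y) * log(p(x,y) / (p(x) p(y)))."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def chi_square(xs, ys):
    """chi^2 = sum over contingency cells of (observed - expected)^2 / expected."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    total = 0.0
    for x in px:
        for y in py:
            expected = px[x] * py[y] / n
            total += (joint[(x, y)] - expected) ** 2 / expected
    return total


def select_top_percent(columns, target, top_percent, min_variance=0.0):
    """Drop low-variance features, then keep the top n% by combined rank."""
    kept = [name for name, col in columns.items() if variance(col) > min_variance]
    mi_order = sorted(kept, key=lambda f: mutual_information(columns[f], target), reverse=True)
    chi_order = sorted(kept, key=lambda f: chi_square(columns[f], target), reverse=True)
    # Assumed combination rule: a lower sum of the two rank positions is better.
    combined = sorted(kept, key=lambda f: mi_order.index(f) + chi_order.index(f))
    k = max(1, math.ceil(len(kept) * top_percent / 100))
    return combined[:k]


# Toy binned data with hypothetical feature names.
target = [0, 0, 1, 1, 0, 1, 1, 0]
columns = {
    "pore_bin": [0, 0, 1, 1, 0, 1, 1, 0],       # identical to the target
    "constant_flag": [1, 1, 1, 1, 1, 1, 1, 1],  # zero variance, removed first
    "noise_bin": [0, 1, 0, 1, 0, 1, 0, 1],      # independent of the target
    "density_bin": [0, 0, 1, 1, 0, 1, 0, 1],    # partially informative
}
selected = select_top_percent(columns, target, top_percent=50)
print(selected)  # ['pore_bin', 'density_bin']
```

For continuous features or a continuous target, the chi-square score as written does not apply directly; binning the values first, as assumed above, is one way around that.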
Specifically, the correlation coefficient of two feature variables is calculated by the following formula; if it is greater than 0.9, the two feature variables are determined to be strongly correlated:
$$r(x,y) = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$
where r(x, y) represents the correlation coefficient between two variables x and y, and $\bar{x}$ and $\bar{y}$ represent the means of x and y, respectively.
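The correlation filter can be sketched as below. The 0.9 threshold comes from the text; the greedy rule for deciding which member of a strongly correlated pair to keep (the one more correlated with the target), the use of the absolute value of r, the feature names and the data are all assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient r(x, y)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def drop_collinear(columns, target, threshold=0.9):
    """Greedy filter: visit features from most to least correlated with the
    target and keep a feature only if |r| <= threshold against all kept ones."""
    order = sorted(columns, key=lambda f: abs(pearson(columns[f], target)), reverse=True)
    kept = []
    for name in order:
        if all(abs(pearson(columns[name], columns[other])) <= threshold for other in kept):
            kept.append(name)
    return kept


# Made-up data: pore_volume is exactly twice surface_area, so the two are collinear.
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
columns = {
    "surface_area": [1.1, 2.0, 2.9, 4.2, 5.1, 5.9],
    "pore_volume": [2.2, 4.0, 5.8, 8.4, 10.2, 11.8],
    "density": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
}
kept = drop_collinear(columns, target)
print(kept)  # two features remain: density plus one of the collinear pair
```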
Step 320, model selection: selecting, among the initialized machine learning models, those with excellent performance.
Step 330, model parameter tuning: optimizing the hyper-parameters of the plurality of initial models through the genetic algorithm to generate a plurality of best-performing initial models, selecting for each a set of optimal parameters so that model performance is optimal.
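The source does not describe the genetic algorithm in detail, so the following is only a sketch of one common variant: truncation selection, uniform crossover and small random mutations. The hyper-parameter names (`max_depth`, `learning_rate`), their ranges and the `validation_error` function are hypothetical stand-ins; in practice the fitness would be a cross-validated error of the model being tuned.

```python
import random


def validation_error(params):
    """Hypothetical stand-in for a cross-validated error, lowest at
    max_depth = 6 and learning_rate = 0.1."""
    return (params["max_depth"] - 6) ** 2 + 100 * (params["learning_rate"] - 0.1) ** 2


def random_params(rng):
    return {"max_depth": rng.randint(1, 12), "learning_rate": rng.uniform(0.01, 0.5)}


def crossover(a, b, rng):
    # Uniform crossover: each hyper-parameter is copied from one of the parents.
    return {key: (a if rng.random() < 0.5 else b)[key] for key in a}


def mutate(params, rng, rate=0.2):
    child = dict(params)
    if rng.random() < rate:
        child["max_depth"] = min(12, max(1, child["max_depth"] + rng.choice([-1, 1])))
    if rng.random() < rate:
        child["learning_rate"] = min(0.5, max(0.01, child["learning_rate"] + rng.gauss(0, 0.05)))
    return child


def genetic_search(fitness, generations=30, population_size=20, seed=0):
    rng = random.Random(seed)
    population = [random_params(rng) for _ in range(population_size)]
    history = []  # best error seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness)
        history.append(fitness(population[0]))
        parents = population[: population_size // 2]  # truncation selection
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        # Parents survive unchanged, so the best error never gets worse.
        population = parents + children
    population.sort(key=fitness)
    return population[0], history


best, history = genetic_search(validation_error)
print(best, validation_error(best))
```

Running one such search per initial model gives the "plurality of best-performing initial models" that the next step combines.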
Step 340, combining the optimal models obtained in step 330 by superposition in a pipeline: the plurality of best-performing initial models generated by the genetic algorithm are integrated into an optimal prediction model set by the superposition combination method:
$$A^{*},\ \lambda^{*} = \underset{A_j \in A,\ \lambda \in \Lambda_j}{\arg\min}\ \frac{1}{k}\sum_{i=1}^{k} L\left(A^{j}_{\lambda},\ D^{(i)}_{\mathrm{train}},\ D^{(i)}_{\mathrm{valid}}\right)$$
wherein $A = \{A_1, \ldots, A_n\}$ represents the set of base learners, each element representing a data processing and machine learning algorithm, and each $A_j \in A$ (j = 1, ..., n) has a corresponding hyper-parameter space $\Lambda_j$.
In particular, k-fold cross-validation is performed during model selection, dividing the data set into k training sets $D^{(i)}_{\mathrm{train}}$ and k validation sets $D^{(i)}_{\mathrm{valid}}$. $L(A^{j}_{\lambda}, D^{(i)}_{\mathrm{train}}, D^{(i)}_{\mathrm{valid}})$ is the loss on the validation set $D^{(i)}_{\mathrm{valid}}$ of algorithm $A_j$ with hyper-parameter $\lambda \in \Lambda_j$ trained on the training set $D^{(i)}_{\mathrm{train}}$; the optimal combination of model and hyper-parameters is found by minimizing this loss.
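The joint search over algorithms and hyper-parameters with k-fold cross-validation can be sketched as an exhaustive loop. The two base learners (a one-dimensional ridge regression and a nearest-neighbour average), their hyper-parameter grids, the mean squared error loss and the synthetic data are assumptions chosen to keep the example self-contained; the source does not name the learners it uses.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, valid_indices) for k contiguous folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        valid = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, valid
        start += size


def fit_ridge(xs, ys, lam):
    """One-dimensional ridge regression with an unpenalised intercept."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + lam)
    return lambda x: mean_y + slope * (x - mean_x)


def fit_knn(xs, ys, k):
    """Predict the mean target of the k nearest training points."""
    def predict(x):
        nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:k]
        return sum(ys[i] for i in nearest) / len(nearest)
    return predict


# A = {A_1, ..., A_n}; each algorithm A_j carries its own grid Lambda_j.
SEARCH_SPACE = {
    "ridge": (fit_ridge, [0.0, 1.0, 10.0]),
    "knn": (fit_knn, [1, 3, 5]),
}


def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_model(xs, ys, k=4):
    """Return (mean validation loss, algorithm name, hyper-parameter) of the best pair."""
    best = None
    for name, (fit, grid) in SEARCH_SPACE.items():
        for lam in grid:
            losses = []
            for train, valid in k_fold_indices(len(xs), k):
                model = fit([xs[i] for i in train], [ys[i] for i in train], lam)
                losses.append(mean_squared_error(model, [xs[i] for i in valid], [ys[i] for i in valid]))
            score = sum(losses) / k
            if best is None or score < best[0]:
                best = (score, name, lam)
    return best


xs = [float(i) for i in range(12)]
ys = [2.0 * x + 1.0 for x in xs]  # synthetic, exactly linear data
score, name, lam = select_model(xs, ys)
print(name, lam)  # ridge 0.0
```

A grid is used here for brevity; in the method described above the hyper-parameter candidates would come from the genetic search rather than a fixed list.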
Specifically, the optimal pipeline model is obtained by integration through the superposition combination method. This avoids discarding high-performing models, forms a more complex model structure with stronger prediction capability, avoids both oversimplified hyper-parameters and overfitting of the data, and makes the resulting model more robust. The pipeline method is iterative: each step is executed repeatedly and the accuracy of the model improves continuously until a successful algorithm is obtained. The methods and models of all processing stages are combined into a single pipeline prediction model, finally yielding a complete pipeline prediction model comprising data preprocessing, feature engineering and model prediction.
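If "superposition combination" is read as stacking (an interpretation, since the source gives no further detail), a minimal version looks like this: each base learner produces out-of-fold predictions, and a meta-learner is fitted on those predictions. The two base learners, the convex-weight meta-learner and the data are all illustrative assumptions.

```python
def fit_line(xs, ys):
    """Least-squares line through the training points."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def fit_mean(xs, ys):
    """Baseline learner that always predicts the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def out_of_fold(fit, xs, ys, k):
    """Predict every sample with a model that was trained without it."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(n):
            if i % k == fold:
                preds[i] = model(xs[i])
    return preds


def fit_stack(xs, ys, k=3):
    """Blend two base learners with a weight w learned on out-of-fold predictions."""
    p_line = out_of_fold(fit_line, xs, ys, k)
    p_mean = out_of_fold(fit_mean, xs, ys, k)
    # Meta-learner: w minimising sum (y - (w * p_line + (1 - w) * p_mean))^2,
    # clipped to [0, 1] so the blend stays a convex combination.
    num = sum((y - b) * (a - b) for y, a, b in zip(ys, p_line, p_mean))
    den = sum((a - b) ** 2 for a, b in zip(p_line, p_mean))
    w = min(1.0, max(0.0, num / den)) if den > 0 else 0.5
    # Refit the base learners on all data for the final stacked model.
    line, mean = fit_line(xs, ys), fit_mean(xs, ys)
    return (lambda x: w * line(x) + (1 - w) * mean(x)), w


# Made-up data lying close to y = 3x + 2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 4.9, 8.2, 10.8, 14.1, 17.0, 19.9, 23.2, 25.8]
stacked, weight = fit_stack(xs, ys)
print(weight, stacked(10.0))
```

Fitting the meta-learner on out-of-fold predictions rather than on in-sample predictions is what keeps the blend from simply rewarding the base learner that overfits most.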
Step 350, pipeline model evaluation: the pipeline model formed by superposing the optimal models is evaluated with the objective functions goodness of fit $R^2$ and RMSE:
$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}$$
where n represents the total number of data points, $\hat{y}_i$ and $y_i$ are respectively the value predicted by the optimal model and the actual value for the ith data point, and $\bar{y}$ is the mean of the actual values $y_i$.
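Both evaluation metrics are short enough to compute directly. The sketch below assumes the standard definition of $R^2$, in which the denominator uses the mean of the actual values; the numbers are made up.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Goodness of fit: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]
print(rmse(y_true, y_pred))       # 1.0
print(r_squared(y_true, y_pred))  # approximately 0.2
```

A perfect model gives RMSE 0 and $R^2$ equal to 1; a model no better than predicting the mean gives $R^2$ near 0.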
Fig. 3 is a block diagram of a chemical material adsorption performance prediction apparatus based on automatic machine learning according to the present invention.
The embodiment of the invention provides a chemical material adsorption performance prediction device based on automatic machine learning, which comprises:
the data set construction module 10 is used for acquiring physical and chemical structural characteristics of chemical materials, judging and filtering invalid data and null values, and building an original data set by combining the characteristics;
the model pre-training module 20 is used for obtaining a plurality of initial models containing hyper-parameters according to different algorithms, and for setting a group of initial hyper-parameter values for each model according to prior knowledge to facilitate the subsequent iterative optimization;
the model construction module 30 is used for iteratively training a plurality of initial models by a pipeline method to obtain an optimal prediction model, and comprises:
the feature engineering module 31 is used for performing feature processing on the original data set, and performing feature selection, compression, extraction and the like on the original data set according to feature importance;
the model selection module 32 is used for selecting an algorithm model for the original data set after the feature processing by combining the feature importance, and selecting different algorithms to build an initial model;
the parameter optimizing module 33 is configured to optimize the hyper-parameters of the plurality of initial models separately by a genetic algorithm to obtain a plurality of best-performing initial models;
the pipeline module 34 is configured to integrate the best-performing initial models into an optimal prediction model set by a superposition combination method;
a model evaluation module 35 for evaluating the performance of the models in the formed pipeline modules to select an optimal prediction model;
the test module 36 is used for inputting the test data set to the optimal prediction model to predict the chemical material adsorption performance.
Fig. 4 is a schematic diagram of model performance evaluation, by the chemical material adsorption performance prediction method based on automatic machine learning according to the present invention, for the adsorption of methane gas by organic molecules.
To evaluate model performance for the adsorption of methane gas by organic molecules, the values predicted by the pipeline model for the methane adsorption performance of covalent organic compounds are compared with the real observed values; the method is 2 to 3 orders of magnitude more efficient than the traditional molecular simulation calculation method.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention; various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A chemical material adsorption performance prediction method based on automatic machine learning is characterized by comprising the following steps:
acquiring various characteristics related to the adsorption performance of the chemical material, establishing an original data set by combining different types of characteristics, and preprocessing the original data set;
performing feature processing on the preprocessed raw data set and utilizing machine learning to generate a plurality of initial models containing hyper-parameters according to the raw data subjected to the feature processing;
iteratively training a plurality of the initial models by a pipeline method to generate an optimal prediction model;
inputting the test data set into the optimal prediction model to predict the adsorption performance of the chemical material.
2. The method of predicting the adsorption performance of a chemical material based on automatic machine learning of claim 1, wherein the method of preprocessing the raw data comprises: one or more of data sampling, data cleaning, feature compression, feature conversion and feature extraction;
the method for generating a plurality of initial models containing hyper-parameters through machine learning comprises: performing feature processing on the original data to ensure that the data are reasonable, and selecting different machine learning algorithms according to prior knowledge to generate a plurality of initial models containing hyper-parameters.
3. The method of predicting the adsorption performance of a chemical material based on automatic machine learning according to claim 1, wherein the method of iteratively training a plurality of initial models by a pipeline method to obtain an optimal prediction model comprises: performing data screening and feature processing on the feature-processed data set according to the feature importance, and tuning the parameters of the initial models through a genetic algorithm and an iteration method.
4. The automated machine learning-based prediction method of chemical material adsorption performance of claim 3, wherein the data screening comprises: selecting the top n% of features by using the SelectKBest method and removing features that do not meet the minimum variance threshold, wherein the top n% of features are selected by combining the chi-square test and mutual information, with the formulas:
$$I(X;Y)=\sum_{x}\sum_{y}p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}$$
$$\chi^{2}=\sum_{i}\frac{(F_i-E_i)^{2}}{E_i}$$
wherein $p(x,y)$ is the joint distribution function of x and y, $p(x)$ and $p(y)$ are the marginal probability density functions of x and y, respectively, $F_i$ is the observed value of the i-th feature, and $E_i$ is the expected value of the i-th feature.
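The screening step of claim 4 can be illustrated with a small standard-library sketch. It assumes discrete (or pre-binned) feature values, the usual definitions of mutual information and the chi-square statistic, and a minimum-variance filter; the function names and the example feature names are hypothetical and are not taken from the patent.

```python
import math
import statistics
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    total = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        total += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return total


def chi_square(observed, expected):
    """Chi-square statistic from observed counts F_i and expected counts E_i."""
    return sum((f - e) ** 2 / e for f, e in zip(observed, expected))


def drop_low_variance(features, threshold):
    """Keep only the feature columns whose population variance exceeds the threshold."""
    return {name: col for name, col in features.items()
            if statistics.pvariance(col) > threshold}


def select_top_percent(scores, percent):
    """Return the names of the top `percent`% of features, ranked by score."""
    k = max(1, math.ceil(len(scores) * percent / 100))
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical scores for four structural descriptors.
scores = {"pore_size": 0.9, "density": 0.1, "surface_area": 0.5, "void_fraction": 0.3}
print(select_top_percent(scores, 50))
```

Continuous descriptors would need to be binned, or scored with a density-based estimator, before the discrete mutual-information formula applies.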
5. The method of predicting adsorption performance of chemical material based on automatic machine learning according to claim 4, wherein the feature importance includes the correlation between the features and the target variable and the correlation among the features; the feature importance is generated by analyzing these two correlations, retaining features that are strongly correlated with the target variable and deleting features that are strongly correlated with other features; two feature variables are determined to be strongly correlated if their correlation coefficient is greater than 0.9, and the calculation formula is:
$$r(x,y)=\frac{\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i}(x_i-\bar{x})^{2}}\,\sqrt{\sum_{i}(y_i-\bar{y})^{2}}}$$
where $r(x,y)$ represents the correlation coefficient between two variables x and y, and $\bar{x}$ and $\bar{y}$ represent the mean values of x and y, respectively.
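A minimal sketch of the correlation filter described in claim 5, assuming the Pearson correlation coefficient and a greedy rule (features are visited in order of their correlation with the target, and a feature is kept only if it is not strongly correlated with one already kept). The greedy order and the names `pearson` and `drop_redundant` are assumptions made for illustration; the claim only fixes the 0.9 threshold.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        # A constant column has no defined correlation; treated as 0 here (assumption).
        return 0.0
    return cov / (norm_x * norm_y)


def drop_redundant(features, target, threshold=0.9):
    """Keep features related to the target, dropping ones that duplicate a kept feature."""
    ranked = sorted(features,
                    key=lambda name: abs(pearson(features[name], target)),
                    reverse=True)
    kept = []
    for name in ranked:
        if all(abs(pearson(features[name], features[other])) <= threshold
               for other in kept):
            kept.append(name)
    return kept


# Hypothetical data: "b" is an exact multiple of "a", so only one of them survives.
features = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 1, 4, 2, 3],
}
target = [1.1, 2.0, 2.9, 4.2, 5.0]
kept = drop_redundant(features, target)
print(kept)
```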
6. The method of predicting the adsorption performance of a chemical material based on automatic machine learning according to claim 3, wherein the method of tuning the parameters of the initial models by a genetic algorithm comprises: optimizing the hyper-parameters of each of the plurality of initial models to generate a plurality of best-performing initial models, and generating the optimal prediction model by selecting the optimal parameters.
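The genetic-algorithm tuning of claim 6 can be sketched as follows. The patent text here does not give the operators, so truncation selection, uniform crossover, random-reset mutation and elitism are all assumptions, as are the function name `genetic_search`, the hyper-parameter names and the toy fitness function. In practice the fitness would be a cross-validated score of the model being tuned.

```python
import random


def genetic_search(space, fitness, pop_size=20, generations=15,
                   mutation_rate=0.2, seed=0):
    """Search a discrete hyper-parameter space; returns (best, best-fitness history)."""
    rng = random.Random(seed)
    names = list(space)
    population = [{n: rng.choice(space[n]) for n in names} for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: max(2, pop_size // 2)]   # truncation selection
        children = [dict(ranked[0])]                # elitism: carry the best over
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from either parent.
            child = {n: rng.choice((mother[n], father[n])) for n in names}
            # Mutation: occasionally reset a gene to a random allowed value.
            for n in names:
                if rng.random() < mutation_rate:
                    child[n] = rng.choice(space[n])
            children.append(child)
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


# Hypothetical search space and a toy fitness that peaks at depth 6, rate 0.1.
space = {"max_depth": [2, 4, 6, 8, 10], "learning_rate": [0.01, 0.05, 0.1, 0.3]}


def toy_fitness(params):
    return -((params["max_depth"] - 6) ** 2) - 100 * (params["learning_rate"] - 0.1) ** 2


best, history = genetic_search(space, toy_fitness)
print(best, history[-1])
```

Because the best individual is always copied into the next generation, the recorded best fitness never decreases from one generation to the next.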
7. The automated machine learning-based prediction method of chemical material adsorption performance of claim 6, wherein the method of generating an optimal prediction model comprises: integrating a plurality of initial models with the best performance into an optimal prediction model set through superposition combination, wherein the integration formula is as follows:
$$A^{*},\lambda^{*}\in\underset{A_j\in A,\ \lambda\in\Lambda_j}{\arg\min}\ \frac{1}{k}\sum_{i=1}^{k}\mathcal{L}\left(A_{j,\lambda},\,D_{\mathrm{train}}^{(i)},\,D_{\mathrm{valid}}^{(i)}\right)$$
wherein $A=\{A_1,\ldots,A_n\}$ is the set of machine learning algorithms, each element representing a data processing and machine learning algorithm, and each $A_j$ ($j=1,\ldots,n$) has a corresponding hyper-parameter space $\Lambda_j$.
When model selection is performed, k-fold cross-validation is performed on the data set, which is divided into k training sets $D_{\mathrm{train}}^{(1)},\ldots,D_{\mathrm{train}}^{(k)}$ and k verification sets $D_{\mathrm{valid}}^{(1)},\ldots,D_{\mathrm{valid}}^{(k)}$; $\mathcal{L}\left(A_{j,\lambda},D_{\mathrm{train}}^{(i)},D_{\mathrm{valid}}^{(i)}\right)$ is the loss, on the verification set $D_{\mathrm{valid}}^{(i)}$, of algorithm $A_j$ with hyper-parameter $\lambda\in\Lambda_j$ after training on the training set $D_{\mathrm{train}}^{(i)}$, so as to generate the optimal combination of prediction model and hyper-parameters.
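The selection rule of claim 7, which picks the algorithm and hyper-parameter pair with the lowest average validation loss over k folds, can be sketched with two toy one-dimensional regressors. The regressors, the mean-squared-error loss, the interleaved fold layout and every name below are assumptions chosen to keep the example self-contained.

```python
def k_fold_indices(n, k):
    """Yield (train, valid) index lists for k interleaved folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, valid in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, valid


def fit_ridge_1d(xs, ys, alpha):
    """Fit y = w*x + b with an L2 penalty alpha on the slope w."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / (sxx + alpha)
    b = mean_y - w * mean_x
    return lambda x: w * x + b


def fit_knn_1d(xs, ys, k):
    """k-nearest-neighbour regressor: average the targets of the k closest points."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_by_cross_validation(xs, ys, candidates, k=5):
    """Return (mean validation loss, algorithm name, hyper-parameter) of the best pair."""
    best = None
    for name, (fit, grid) in candidates.items():
        for lam in grid:
            losses = []
            for train, valid in k_fold_indices(len(xs), k):
                model = fit([xs[i] for i in train], [ys[i] for i in train], lam)
                losses.append(mse(model, [xs[i] for i in valid], [ys[i] for i in valid]))
            score = sum(losses) / k
            if best is None or score < best[0]:
                best = (score, name, lam)
    return best


# Hypothetical, exactly linear data: unpenalised ridge should win.
xs = [float(i) for i in range(20)]
ys = [2 * x + 1 for x in xs]
candidates = {"ridge": (fit_ridge_1d, [0.0, 1.0, 10.0]), "knn": (fit_knn_1d, [1, 3, 5])}
score, name, lam = select_by_cross_validation(xs, ys, candidates)
print(name, lam, score)
```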
8. The method for predicting the adsorption performance of a chemical material based on automatic machine learning of claim 7, wherein the optimal prediction model is evaluated using the goodness of fit $R^{2}$ and the RMSE, which are calculated as follows:
$$R^{2}=1-\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^{2}}{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}$$
$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^{2}}$$
where n represents the total number of samples in the data set, $\hat{y}_i$ and $y_i$ are respectively the value predicted by the optimal model and the actual value for the i-th sample, and $\bar{y}$ is the mean of the actual values $y_i$.
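For reference, the two evaluation metrics named in claim 8 can be computed as below. The sketch assumes the standard definitions of the coefficient of determination and the root-mean-square error, with the mean taken over the observed values; the function names and the sample numbers are illustrative only.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_true) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical observed and predicted values.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 2.0, 3.0, 3.0]
print(r2_score(observed, predicted), rmse(observed, predicted))
```

A model that always predicts the mean of the observed values scores $R^{2}=0$, and a perfect model scores $R^{2}=1$ with an RMSE of 0.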
9. An automatic machine learning-based chemical material adsorption performance prediction apparatus for performing the automatic machine learning-based chemical material adsorption performance prediction method according to any one of claims 1 to 8, comprising:
the data set construction module is used for acquiring physical and chemical structural characteristics of the chemical material, judging and filtering invalid data and null values, and establishing an original data set for the filtered characteristics;
the model pre-training module is connected with the data set construction module and used for generating a plurality of initial models containing hyper-parameters according to different algorithms;
a model construction module connected with the model pre-training module and used for iteratively training a plurality of initial models through a pipeline method to generate an optimal prediction model;
and the test module is connected with the model construction module and used for inputting a test data set to the optimal prediction model to predict the adsorption performance of the chemical material.
10. The apparatus of claim 9 for predicting the adsorption performance of a chemical material based on automatic machine learning, wherein the model building module comprises:
the feature engineering module is connected with the model pre-training module and is used for performing feature processing on the original data set and performing feature selection, compression and extraction on the original data in the original data set according to the feature importance;
the model selection module is connected with the feature engineering module and is used for selecting algorithm models for the feature-processed original data set in light of the feature importance and building initial models with different algorithms;
the parameter optimizing module is connected with the model selection module and is used for optimizing the hyper-parameters of each of the plurality of initial models by a genetic algorithm optimization method to generate a plurality of initial models with the best performance;
the pipeline module is connected with the parameter optimizing module and is used for integrating a plurality of initial models with the best performance into an optimal prediction model set through a superposition combination method;
and the model evaluation module is connected with the pipeline module and the test module respectively and is used for evaluating the performance of the models of the optimal prediction model set in the formed pipeline module and selecting an optimal prediction model.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110318374.8A CN112966447A (en) 2021-03-25 2021-03-25 Chemical material adsorption performance prediction method and device based on automatic machine learning

Publications (1)

Publication Number Publication Date
CN112966447A true CN112966447A (en) 2021-06-15

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325285A (en) * 2020-03-10 2020-06-23 五邑大学 Fatigue driving prediction method and device based on automatic machine learning and storage medium
WO2020249125A1 (en) * 2019-06-14 2020-12-17 第四范式(北京)技术有限公司 Method and system for automatically training machine learning model
WO2020253055A1 (en) * 2019-06-19 2020-12-24 山东大学 Parallel analog circuit optimization method based on genetic algorithm and machine learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
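Ioannis Tsamardinos et al.: "An Automated Machine Learning architecture for the accelerated prediction of Metal-Organic Frameworks performance in energy and environmental applications", Microporous and Mesoporous Materials, pages 1-13 *
涂同珩: "Research on radar signal recognition based on automatic machine learning", China Master's Theses Full-text Database, Information Science and Technology, no. 10, pages 140-71 *
袁慎: "Research on the application of an attribute-weighted clustering algorithm in bank customer segmentation", China Master's Theses Full-text Database, Information Science and Technology, no. 2, pages 138-704 *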

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505527A (en) * 2021-06-24 2021-10-15 中国科学院计算机网络信息中心 Material property prediction method and system based on data driving
CN113761802A (en) * 2021-09-10 2021-12-07 成都材智科技有限公司 Nuclear power structural material data performance prediction model and model construction method
CN114530217A (en) * 2022-02-16 2022-05-24 西安建筑科技大学 Semi-coke-based porous carbon heavy metal adsorption efficiency prediction method and related device
CN115080752A (en) * 2022-08-18 2022-09-20 湖南大学 Numerical value feature discovery method and system based on automatic acquisition of feature field knowledge
CN115080752B (en) * 2022-08-18 2022-12-02 湖南大学 Numerical value feature discovery method and system based on automatic acquisition of feature field knowledge
CN115366281A (en) * 2022-08-22 2022-11-22 青岛科技大学 Mold temperature controller temperature control method and device based on machine learning and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination