CN112132183A - Method for rapidly predicting specific surface area of perovskite material based on XGboost algorithm - Google Patents
- Publication number
- CN112132183A
- Authority
- CN
- China
- Prior art keywords
- specific surface
- surface area
- perovskite material
- sample
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/70—Machine learning, data mining or chemometrics
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C60/00—Computational materials science, i.e. ICT specially adapted for investigating the physical or chemical properties of materials or phenomena associated with their design, synthesis, processing, characterisation or utilisation
Abstract
The invention relates to a method for rapidly predicting the specific surface area of a perovskite material based on the XGBoost machine learning algorithm. Specific surface area data and chemical formulas of ABO3-type perovskite materials are collected from the literature, and the preprocessed samples are taken as the data set; characteristic variables are generated from the sample set; the data set samples are randomly divided into a training set and a test set; with the specific surface area of the ABO3-type perovskite material samples as the target variable and part of the obtained characteristic variables as independent variables, a rapid prediction model of the specific surface area of ABO3-type perovskite materials is established by the XGBoost algorithm; the characteristic variables are screened and a forecasting model is established; and the specific surface area of the test set samples is forecast according to the rapid prediction model. Based on reliable, real literature data and a recent modeling method, the established model for predicting the specific surface area of ABO3-type perovskite materials has the advantages of simplicity, speed, low cost, no pollution, simple testing and high efficiency.
Description
Technical Field
The invention relates to a method for evaluating the catalytic performance of perovskite materials, in particular to a method for rapidly predicting the specific surface area (SSA) of a perovskite material based on the XGBoost algorithm.
Background
In 1839, perovskite was discovered by the German mineralogist Gustav Rose in the Ural Mountains of Russia. There are many kinds of perovskites, such as bulk perovskite oxides, porous perovskite oxides and hollow perovskite oxides. In recent years, perovskites have attracted considerable attention from researchers because of their unique crystal structures. In ABO3-type perovskite compounds, the A site is generally occupied by a rare earth element and the B site by a transition metal ion. Both the A site and the B site can be doped with other metal ions to improve the performance of the material. ABO3-type perovskites are widely used in fields that exploit their magnetic, ferroelectric, photoelectric and catalytic properties.
With social and economic development, mankind has in recent years faced the problem of energy shortage, and the development and design of clean energy has become a research hotspot; perovskite materials are used for clean energy applications in catalysis and batteries. The specific surface area (SSA) is one of the important characteristics describing a perovskite material, and the specific surface area of an ABO3-type perovskite can serve as one of the important parameters for evaluating its catalytic performance. Preliminary research has found that perovskite materials with a higher specific surface area have very good application prospects in the field of catalysis. Establishing a quantitative model relating the specific surface area of ABO3-type perovskite materials to their atomic properties and preparation conditions, and thereby finding ABO3-type perovskite materials with a higher specific surface area, is therefore of considerable research significance.
XGBoost is short for eXtreme Gradient Boosting and is a gradient boosting decision tree algorithm based on iterative accumulation. XGBoost is an improvement on, and a specific implementation of, the gradient boosted regression tree (GBRT); it was first proposed in 2014 by Tianqi Chen of the University of Washington. Compared with GBRT: (1) the algorithm improves the objective function and introduces a decision tree complexity regularization term to control the complexity of the model; (2) the algorithm introduces a sparsity-aware algorithm, which makes parallel computation possible; (3) the algorithm is improved in its details, such as column sampling and support for linear classifiers.
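The regularized objective mentioned in point (1) is commonly written as shown below. The symbols are the usual notation from the XGBoost literature and are not used elsewhere in this document: l is a per-sample loss, f_k is the k-th tree, T is the number of leaves of a tree, w is its vector of leaf weights, and γ and λ are regularization parameters.

```latex
\mathrm{Obj} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\,\lambda\,\lVert w \rVert^{2}
```

The first sum measures the fit to the training samples, and the second sum penalizes trees with many leaves or large leaf weights.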
Disclosure of Invention
In order to solve the problems in the prior art, the invention aims to overcome the shortcomings of existing techniques and to provide a method for rapidly predicting the specific surface area of a perovskite material based on the XGBoost algorithm. The method is low in cost, simple and efficient, uses complete and accurate data, and requires no experiments, no complex calculation process and causes no pollution. It is a machine learning method for rapidly predicting the specific surface area of a perovskite material based on the XGBoost algorithm.
In order to achieve the purpose, the invention can be realized by the following technical scheme:
A method for rapidly predicting the specific surface area of a perovskite material based on the XGBoost algorithm comprises the following steps:
1) collecting specific surface area (SSA) data and chemical formulas of ABO3-type perovskite materials from the literature, preprocessing the data, and taking the preprocessed samples as the data set for subsequent modeling;
2) generating the corresponding characteristic variables, such as atomic properties and structural parameters, from the molecular formulas of the sample set, according to the collected process characteristics, structural parameters and atomic properties;
3) randomly dividing the data set samples obtained in step 1) into a training set and a test set;
4) taking the specific surface area of the ABO3-type perovskite material samples collected in step 1) as the target variable and part of the characteristic variables obtained in step 2) as independent variables, and establishing a rapid prediction model of the specific surface area of ABO3-type perovskite materials with the XGBoost algorithm;
5) screening the characteristic variables, selecting 10 characteristic variables from the original 21, and establishing a forecasting model;
6) forecasting the specific surface area of the test set samples obtained in step 3) according to the rapid prediction model of the specific surface area of ABO3-type perovskite materials established in step 5).
Preferably, the preprocessing of the data samples in step 1) includes sorting the molecular formula of the samples and finding out sample data with specific surface area.
Preferably, the data preprocessing in step 1) includes screening samples with specific surface area and determining the number of sample data sets.
Preferably, the XGBoost algorithm in step 4) is a gradient boosting decision tree algorithm based on iterative accumulation.
Preferably, the XGBoost algorithm in step 6) is the algorithm proposed in 2014 by Tianqi Chen of the University of Washington. Starting from an initial naive model, the algorithm fits a new model to the errors of the current model on the observed values in the sample set, adds the new model to the existing model in additive form, and repeats this process iteratively to form an ensemble model.
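The additive procedure described above can be illustrated with a minimal sketch. It is not the XGBoost implementation: it assumes squared-error loss, a single feature and one-split regression stumps, and it omits the regularization and second-order terms that XGBoost adds. The function names (`fit_stump`, `boost`) and the toy data are hypothetical.

```python
from statistics import mean

def fit_stump(x, residuals):
    """Fit a one-split regression stump to the residuals by least squares."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # candidate split points
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        lv, rv = mean(left), mean(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lv, rv)
    _, threshold, lv, rv = best
    return lambda xi: lv if xi <= threshold else rv

def boost(x, y, n_rounds=50, learning_rate=0.3):
    """Start from a naive model (the mean) and repeatedly add stumps fitted to the errors."""
    base = mean(y)
    stumps = []
    pred = [base] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # errors of the current model
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)

# Hypothetical toy data (not taken from the patent); x values must be distinct enough to split.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 3.1, 2.9, 5.0, 5.2]
model = boost(x, y)
```

Each round reduces the training error of the ensemble, which is the "iterative accumulation" the text refers to.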
Compared with the prior art, the invention has the following prominent substantive features and notable advantages:
1. Forecasting the specific surface area of ABO3-type perovskite materials is simple and fast. Using a computer system, sample data of ABO3-type perovskite materials are collected from the literature, and the atomic properties and structural parameters of all samples can be obtained within a few seconds; with the converted data as independent variables and the specific surface area as the dependent variable, modeling with XGBoost yields a calculation result within a few seconds; the calculation is convenient and efficient and can be completed by one person; by using the forecasting model to judge in advance whether the specific surface area of an ABO3-type perovskite material meets requirements, and then selecting samples for experimental verification, experimental efficiency can be improved and blindness avoided;
2. The method overcomes the shortcomings of the traditional trial-and-error approach and avoids repeated trial and error; the specific surface area of ABO3-type perovskite materials is forecast through theory and calculation at low cost, and the method is simple, easy to implement and suitable for popularization and application;
3. The method involves no experiments and uses no chemicals at any stage, causes no environmental pollution, and is consistent with the concept of environmental protection.
Drawings
Fig. 1 is a graph of the modeling result of the XGBoost regression model for the specific surface area of perovskite materials in Example three of the present invention.
Fig. 2 is a graph of the 5-fold cross validation result of the XGBoost regression model for the specific surface area of perovskite materials in Example four of the present invention.
Fig. 3 is a graph of the independent test set result of the XGBoost regression model for the specific surface area of perovskite materials in Example five of the present invention.
Fig. 4 is a schematic diagram of the principle of the XGBoost algorithm according to the embodiments of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments.
Example one:
A method for rapidly predicting the specific surface area of a perovskite material based on the XGBoost algorithm comprises the following steps:
1) collecting specific surface area (SSA) data and chemical formulas of ABO3-type perovskite materials from the literature, preprocessing the data, and taking the preprocessed samples as the data set for subsequent modeling;
2) generating the corresponding characteristic variables, such as atomic properties and structural parameters, from the molecular formulas of the sample set, according to the collected process characteristics, structural parameters and atomic properties;
3) randomly dividing the data set samples obtained in step 1) into a training set and a test set;
4) taking the specific surface area of the ABO3-type perovskite material samples collected in step 1) as the target variable and part of the characteristic variables obtained in step 2) as independent variables, and establishing a rapid prediction model of the specific surface area of ABO3-type perovskite materials with the XGBoost algorithm;
5) screening the characteristic variables, selecting 10 characteristic variables from the original 21, and establishing a forecasting model;
6) forecasting the specific surface area of the test set samples obtained in step 3) according to the rapid prediction model of the specific surface area of ABO3-type perovskite materials established in step 5).
The method of this example overcomes the shortcomings of the traditional trial-and-error approach and avoids repeated trial and error; the specific surface area of ABO3-type perovskite materials is forecast through theory and calculation at low cost, and the method is simple, easy to implement and suitable for popularization and application.
Example two:
this embodiment is substantially the same as the above embodiment, and is characterized in that:
In this example, the preprocessing of the data samples in step 1) includes sorting the molecular formulas of the samples and identifying the samples for which specific surface area data are available. The data preprocessing in step 1) also includes screening the samples that have specific surface area values and determining the number of samples in the data set.
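A minimal sketch of this preprocessing is given below. It assumes flat formulas without parentheses, such as those listed in Table 1, and that a missing specific surface area is represented as `None`; the helper names and the third record are hypothetical and only illustrate the filtering.

```python
import re

# Element symbol followed by an optional (possibly fractional) stoichiometric amount.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*\.?\d*)")

def parse_formula(formula):
    """Return {element: amount} for a flat formula such as 'BaZr0.8Sn0.2O3'."""
    composition = {}
    for element, amount in _TOKEN.findall(formula):
        value = float(amount) if amount else 1.0  # no number means one atom
        composition[element] = composition.get(element, 0.0) + value
    return composition

def samples_with_ssa(records):
    """Keep only the (formula, ssa) records that actually carry an SSA value."""
    return [(formula, ssa) for formula, ssa in records if ssa is not None]

records = [
    ("BaZr0.8Sn0.2O3", 3.8),   # value from Table 1
    ("LiTaO3", 1.96),          # value from Table 1
    ("SrTiO3", None),          # hypothetical record with no reported SSA
]
dataset = samples_with_ssa(records)
compositions = [parse_formula(formula) for formula, _ in dataset]
```

The parsed compositions are the natural starting point for looking up per-element properties when the characteristic variables are generated.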
In this embodiment, the XGBoost algorithm in step 4) is a gradient boosting decision tree algorithm based on iterative accumulation.
The method of this example for rapidly predicting the specific surface area of a perovskite material based on the XGBoost algorithm is fast, low in cost, simple and efficient, uses complete and accurate data, and requires no experiments and no complex calculation process and causes no pollution.
Example three:
this embodiment is substantially the same as the above embodiment, and is characterized in that:
In this example, a method for rapidly predicting the specific surface area of ABO3-type perovskite materials based on the XGBoost machine learning algorithm comprises the following steps:
1) Collect the specific surface area (SSA) values and molecular formulas of ABO3-type perovskite materials from the literature and databases as the data set samples for subsequent modeling. The chemical formulas and specific surface area values of some of the ABO3-type perovskite materials are shown in Table 1;
TABLE 1 Chemical formulas and specific surface area values of part of the perovskite data samples
Chemical formula | SSA (m²/g) | Chemical formula | SSA (m²/g) |
BaZr0.899Sn0.101O3 | 5.7 | SrTi0.99Al0.01O3 | 2.7 |
BaZr0.8Sn0.2O3 | 3.8 | Na0.95Sr0.05TaO3 | 2.6 |
BaZr0.701Sn0.299O3 | 4 | Na0.999La0.011TaO3 | 2.5 |
Ca0.95La0.05Ti0.951Cr0.049O3 | 14.69 | Na0.99La0.02TaO3 | 3.2 |
Ca0.9La0.1Ti0.899Cr0.101O3 | 9.72 | Na0.99La0.02TaO3 | 3.5 |
Ca0.801La0.199Ti0.802Cr0.198O3 | 9.51 | Na0.965La0.035TaO3 | 4.9 |
BaNb0.665Co0.335O3 | 19.7 | LiTaO3 | 1.96 |
2) Generate the corresponding atomic properties and structural features from the chemical formulas of the sample set, obtaining 21 characteristic variables, including atomic radius, electronegativity, melting point and so on; part of the characteristic variable data is shown in Table 2;
TABLE 2 partial characteristic variable table
3) Randomly divide the 98 data set samples obtained in step 1) into a training set and a test set at a ratio of approximately 4:1; the training set and the test set contain 78 and 20 samples, respectively;
4) Take the specific surface area of the ABO3-type perovskite material samples collected in step 1) as the target variable, and substitute the atomic property data of the samples into the following conversion equations; the resulting converted data are shown in Table 3; the 10 converted characteristic variables are then selected as the optimal independent variable subset for modeling; finally, the XGBoost algorithm is applied to the training set samples obtained in step 3) to establish the rapid prediction model of the specific surface area of ABO3-type perovskite materials;
P1=-0.004353[Radius_A]-0.01642[Radius_B]-0.2117[Ea]+1.330[Eb]-0.7311[TF]-0.007174[aO3]-0.01084[rc]-0.4550[Za]+0.7123[Zb]-0.1503[R_a/R_b]-0.003704[Mass]+0.004129[A_aff]-0.002251[B_aff]-0.0007124[A_Tm]+0.0001092[B_Tm]+3.91E-05[A_Tb]-2.623E-06[B_Tb]-0.002683[A_Hfus]-8.249E-05[B_Hfus]-0.03654[A_Density]+0.03631[B_Density]-0.001472[Calcination temperature]-0.06144[Calcination time]+2.057
P2=+0.001240[Radius_A]+0.002322[Radius_B]-0.5748[Ea]+0.01522[Eb]+0.1594[TF]+0.0009699[aO3]+0.003461[rc]-0.5994[Za]-0.06822[Zb]-0.01718[R_a/R_b]-0.01233[Mass]-0.007368[A_aff]-0.005196[B_aff]-0.001109[A_Tm]+6.676E-05[B_Tm]-0.0003223[A_Tb]+3.147E-05[B_Tb]-0.005600[A_Hfus]+0.0007792[B_Hfus]-0.1268[A_Density]-0.005220[B_Density]-0.001280[Calcination temperature]-0.03864[Calcination time]+10.604
P3=+0.008214[Radius_A]-0.02080[Radius_B]+1.375[Ea]+1.154[Eb]+1.829[TF]-0.008959[aO3]+0.006274[rc]-0.1867[Za]+0.1670[Zb]+0.3138[R_a/R_b]-0.01238[Mass]-0.009529[A_aff]-0.006347[B_aff]-0.0007486[A_Tm]-0.0002556[B_Tm]-0.0002856[A_Tb]-0.0001464[B_Tb]-0.009453[A_Hfus]-0.00069[B_Hfus]-0.009726[A_Density]-0.04398[B_Density]-0.001678[Calcination temperature]-0.05267[Calcination time]+5.596
P4=+0.001616[Radius_A]+0.01250[Radius_B]+3.183[Ea]+0.7320[Eb]+0.2143[TF]+0.005042[aO3]+0.01930[rc]+0.1732[Za]+0.4250[Zb]-0.01136[R_a/R_b]-0.008406[Mass]-0.004047[A_aff]-0.004583[B_aff]-0.001311[A_Tm]+3.624E-05[B_Tm]-0.0005064[A_Tb]+5.77E-05[B_Tb]-0.007909[A_Hfus]-0.005566[B_Hfus]+0.01509[A_Density]+0.02002[B_Density]-0.001275[Calcination temperature]-0.04217[Calcination time]-4.061
P5=+1.287E-05[Radius_A]+0.007085[Radius_B]+3.136[Ea]-2.923[Eb]-0.02134[TF]+0.002615[aO3]+0.01909[rc]-0.6690[Za]-0.2010[Zb]-0.01659[R_a/R_b]-0.008292[Mass]-0.003902[A_aff]+0.003641[B_aff]+0.0003066[A_Tm]+6.962E-05[B_Tm]+0.0002947[A_Tb]+9.823E-05[B_Tb]+0.0002871[A_Hfus]+0.002680[B_Hfus]+0.07634[A_Density]+0.01498[B_Density]-0.001640[Calcination temperature]-0.05509[Calcination time]+6.599
P6=+0.001780[Radius_A]+0.02312[Radius_B]+1.732[Ea]+1.105[Eb]+0.2420[TF]+0.009168[aO3]-0.01115[rc]-0.3194[Za]+0.2506[Zb]+0.05211[R_a/R_b]-0.009452[Mass]-0.01915[A_aff]+0.001699[B_aff]+0.001546[A_Tm]+0.0003963[B_Tm]+3.099E-05[A_Tb]+0.000257[B_Tb]+0.001348[A_Hfus]-0.002316[B_Hfus]+0.04421[A_Density]+0.1053[B_Density]+1.287E-05[Calcination temperature]-0.03217[Calcination time]-7.118
P7=+0.002563[Radius_A]+0.04203[Radius_B]+1.955[Ea]+2.925[Eb]+0.2192[TF]+0.01723[aO3]-0.003154[rc]-1.139[Za]+0.4292[Zb]+0.04602[R_a/R_b]-0.01006[Mass]-0.007689[A_aff]-0.002913[B_aff]+0.001966[A_Tm]+0.0004187[B_Tm]-3.815E-05[A_Tb]+0.0001841[B_Tb]-0.0008564[A_Hfus]+0.002528[B_Hfus]+0.09231[A_Density]+0.07596[B_Density]+0.0009823[Calcination temperature]+0.007110[Calcination time]-11.904
P8=+0.001358[Radius_A]+0.03356[Radius_B]+2.253[Ea]+1.688[Eb]-0.04910[TF]+0.01403[aO3]-0.01932[rc]-1.255[Za]-0.4454[Zb]+0.04821[R_a/R_b]-0.0005658[Mass]-0.003682[A_aff]-0.004948[B_aff]+0.001214[A_Tm]-3.886E-05[B_Tm]-0.0004656[A_Tb]+0.0001209[B_Tb]+0.001690[A_Hfus]-0.003008[B_Hfus]+0.1074[A_Density]-0.006633[B_Density]+0.0001445[Calcination temperature]-0.001449[Calcination time]+1.813
P9=-0.004053[Radius_A]-0.03830[Radius_B]+2.623[Ea]+1.718[Eb]-0.4856[TF]-0.01664[aO3]-0.01983[rc]-1.189[Za]-0.9871[Zb]+0.04117[R_a/R_b]-0.002149[Mass]-0.003582[A_aff]-0.003981[B_aff]+0.0008678[A_Tm]+0.0003424[B_Tm]-0.0005741[A_Tb]+0.0004025[B_Tb]+0.0007[A_Hfus]+0.0006933[B_Hfus]+0.1652[A_Density]+0.01008[B_Density]+0.0004473[Calcination temperature]+0.03664[Calcination time]+12.352
P10=-0.02304[Radius_A]-0.03334[Radius_B]-0.6108[Ea]+0.06693[Eb]-4.440[TF]-0.01432[aO3]+0.006226[rc]-0.7707[Za]+0.3130[Zb]-0.4503[R_a/R_b]+0.009910[Mass]-0.01894[A_aff]-0.002579[B_aff]+0.001971[A_Tm]-0.0004959[B_Tm]-0.0009604[A_Tb]+1.717E-05[B_Tb]-0.001678[A_Hfus]+0.007276[B_Hfus]+0.3947[A_Density]+0.002961[B_Density]-0.0001974[Calcination temperature]-0.004212[Calcination time]+9.822;
In each of the above equations:
Radius_A: radius of the A-site element;
Radius_B: radius of the B-site element;
Ea: electronegativity of the A-site element;
Eb: electronegativity of the B-site element;
TF: tolerance factor;
aO3: unit cell edge length;
rc: critical radius;
Za: ionization energy of the A-site element;
Zb: ionization energy of the B-site element;
R_a/R_b: ratio of the radii of the A-site and B-site atoms;
Mass: molecular mass;
A_aff: electron affinity of the A-site element;
B_aff: electron affinity of the B-site element;
A_Tm: melting point of the A-site element;
B_Tm: melting point of the B-site element;
A_Tb: boiling point of the A-site element;
B_Tb: boiling point of the B-site element;
A_Hfus: enthalpy of fusion of the A-site element;
B_Hfus: enthalpy of fusion of the B-site element;
A_Density: density of the A-site element;
B_Density: density of the B-site element;
Calcination temperature: the calcination temperature;
Calcination time: the calcination time.
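Each conversion equation is a linear combination of the characteristic variables plus a constant, so it can be evaluated with a small helper. The sketch below copies the coefficients of P1 from the equation above; the function name `linear_component` and the all-zero feature dictionary are hypothetical and serve only to show the calculation, not real sample data.

```python
# Coefficients of P1, copied term by term from the P1 equation.
P1_COEFFS = {
    "Radius_A": -0.004353, "Radius_B": -0.01642, "Ea": -0.2117, "Eb": 1.330,
    "TF": -0.7311, "aO3": -0.007174, "rc": -0.01084, "Za": -0.4550,
    "Zb": 0.7123, "R_a/R_b": -0.1503, "Mass": -0.003704, "A_aff": 0.004129,
    "B_aff": -0.002251, "A_Tm": -0.0007124, "B_Tm": 0.0001092, "A_Tb": 3.91e-05,
    "B_Tb": -2.623e-06, "A_Hfus": -0.002683, "B_Hfus": -8.249e-05,
    "A_Density": -0.03654, "B_Density": 0.03631,
    "Calcination temperature": -0.001472, "Calcination time": -0.06144,
}
P1_INTERCEPT = 2.057

def linear_component(coeffs, intercept, features):
    """Evaluate intercept + sum(coefficient * feature value) for one sample."""
    missing = set(coeffs) - set(features)
    if missing:
        raise KeyError(f"missing features: {sorted(missing)}")
    return intercept + sum(c * features[name] for name, c in coeffs.items())

# Hypothetical input: every feature set to zero, so P1 reduces to its constant term.
zero_features = {name: 0.0 for name in P1_COEFFS}
p1_at_zero = linear_component(P1_COEFFS, P1_INTERCEPT, zero_features)
```

The same helper applies to P2 through P10 once their coefficients are entered in the same way.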
TABLE 3 partial conversion data
P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 |
-2.06 | 1.20 | 1.67 | 0.15 | 0.20 | 0.19 | 0.56 | 0.02 | -0.32 | 0.11 |
1.06 | 2.50 | -0.28 | 0.73 | 1.04 | -0.42 | -0.14 | 0.07 | 0.28 | 0.03 |
2.07 | 1.92 | -0.96 | 1.71 | 0.55 | 0.32 | -0.40 | 0.18 | -0.29 | 0.02 |
0.07 | 0.64 | -2.61 | 0.78 | -0.78 | -0.23 | 0.37 | -0.19 | -0.09 | -0.05 |
-1.00 | -0.28 | -0.49 | 0.00 | -0.57 | -0.15 | 0.33 | -0.05 | 0.13 | 0.01 |
-1.99 | 0.17 | 0.93 | -0.78 | 0.17 | -0.17 | -0.91 | -0.06 | 0.24 | 0.03 |
1.91 | 1.82 | -1.09 | 1.63 | 0.44 | 0.27 | -0.35 | 0.16 | -0.26 | 0.01 |
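The random division of the 98 samples into 78 training and 20 test samples described in step 3) can be sketched as follows. The function name `split_dataset` and the fixed seed are assumptions made for reproducibility; the document does not state how the random division was performed.

```python
import random

def split_dataset(samples, n_test, seed=0):
    """Shuffle the samples and hold out n_test of them as the test set."""
    rng = random.Random(seed)      # fixed seed so the split can be repeated
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]

# Sample indices 0..97 stand in for the 98 literature samples.
train, test = split_dataset(range(98), n_test=20)
```

Holding out the test samples before any modeling keeps them independent of the training and screening steps.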
The specific surface area of the test set samples is rapidly forecast according to the established rapid prediction model of the perovskite specific surface area. The modeling result of the quantitative specific surface area prediction model established with XGBoost on the 78 perovskite samples is shown in Fig. 1.
In this example, regression modeling is performed on the data of the 78 perovskite samples with the XGBoost regression algorithm, and a quantitative XGBoost regression model of the perovskite specific surface area is established. The correlation coefficient between the model-predicted perovskite specific surface area and the true literature values is 0.996. The method establishes an efficient and rapid forecasting model from sample data taken from the literature and databases; it is fast, convenient, low in cost and environmentally friendly, and it can also guide actual experimental work and avoid blindness.
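The correlation coefficient between predicted and literature values can be computed as sketched below. This assumes the Pearson correlation coefficient is meant, which the document does not state explicitly; the function name `pearson_r` and the example numbers are hypothetical.

```python
from math import sqrt

def pearson_r(predicted, observed):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / sqrt(var_p * var_o)

# Hypothetical values: predictions that are an exact linear function of the observations.
r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

A value close to 1 indicates that the predictions rise and fall with the literature values; it does not by itself show that the absolute errors are small.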
Example four:
this embodiment is substantially the same as the above embodiment, and is characterized in that:
In this example, the 78 samples in the training set are divided into 5 groups, denoted D1, D2, D3, D4 and D5. In the first step, D1, D2, D3 and D4 are used as the training set, model 1 is established using the same optimal independent variable subset as in Example one, and model 1 is used to forecast the specific surface area of D5. In the second step, D1, D2, D3 and D5 are used as the training set, model 2 is established using the same optimal independent variable subset as in Example one, and model 2 is used to forecast the specific surface area of D4. By analogy, after 5 models have been established, the stability and reliability of the data modeling method are judged from the errors between the predicted values and the true values.
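The grouping described above corresponds to 5-fold cross validation. A minimal sketch of how the 78 training samples could be divided and rotated is given below; the function name `k_fold_indices`, the shuffling and the seed are assumptions, since the document does not say how the groups D1 to D5 were formed.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (training indices, held-out indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # k groups of nearly equal size
    for held_out in range(k):
        validation = folds[held_out]
        training = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield training, validation

# One (training, validation) pair per model; each sample is held out exactly once.
splits = list(k_fold_indices(78, k=5))
```

A model is fitted on each training part and evaluated on the corresponding held-out group, and the pooled held-out predictions are then compared with the true values.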
The specific surface area of the training set samples is rapidly forecast according to the established rapid prediction model of the perovskite specific surface area. The 5-fold internal cross validation result of the quantitative perovskite specific surface area prediction model established with XGBoost on the 78 perovskite samples is shown in Fig. 2.
In this example, 5-fold cross validation is performed on the quantitative XGBoost prediction model of the perovskite specific surface area established from the 78 samples; the correlation coefficient between the model-predicted perovskite specific surface area in the 5-fold cross validation and the true literature values is 0.843. The method establishes the 5-fold cross validation forecasting models for the training set from sample data taken from the literature and databases; it is fast, convenient, low in cost and environmentally friendly, and it also allows the stability and reliability of the data modeling method to be evaluated.
Example five:
this embodiment is substantially the same as the previous embodiment, and is characterized in that:
In this example, the specific surface area of the test set samples is rapidly forecast according to the established rapid prediction model of the perovskite specific surface area. The prediction result of the quantitative perovskite prediction model established with XGBoost on the 20 samples of the independent test set is shown in Fig. 3.
The established quantitative XGBoost prediction model of the perovskite specific surface area is used to forecast the 20 samples in the independent test set, and a good result is obtained. The correlation coefficient between the model-predicted perovskite specific surface area and the true literature values is 0.863. The method establishes an efficient and rapid forecasting model from sample data taken from the literature and databases; it is fast, convenient, low in cost and environmentally friendly, and it can also guide actual experimental work and avoid blindness.
In summary, the above examples describe a method for rapidly predicting the specific surface area (SSA) of a perovskite material based on the XGBoost machine learning algorithm. Using a computer system, specific surface area (SSA) data and chemical formulas of ABO3-type perovskite materials are collected from the literature, and the preprocessed samples are used as the data set for subsequent modeling; the corresponding characteristic variables, such as atomic properties and structural parameters, are generated from the molecular formulas of the sample set according to the collected process characteristics, structural parameters and atomic properties; the resulting data set samples are randomly divided into a training set and a test set; the specific surface area of the collected ABO3-type perovskite material samples is taken as the target variable and part of the obtained characteristic variables as independent variables; a rapid prediction model of the specific surface area of ABO3-type perovskite materials is established with the XGBoost algorithm; the characteristic variables are screened, 10 characteristic variables are selected from the original 21, and a forecasting model is established; and the specific surface area of the test set samples is forecast according to the established rapid prediction model. Based on reliable, real and recent literature data and a recent modeling method, the model established by the invention for predicting the specific surface area of ABO3-type perovskite materials has the advantages of simplicity, speed, low cost, no pollution, simple testing and high efficiency.
While the embodiments of the present invention have been described with reference to the drawings, the present invention is not limited to the above embodiments, and various changes and modifications may be made according to the purpose of the invention. Any change, modification, substitution, combination or simplification made according to the spirit and principle of the technical solution of the present invention shall be regarded as an equivalent substitution and, as long as it is consistent with the purpose of the present invention and does not depart from the technical principle and inventive concept of the XGBoost-based method for rapidly predicting the perovskite specific surface area, shall fall within the protection scope of the present invention.
Claims (4)
1. A method for rapidly predicting the specific surface area of a perovskite material based on the XGBoost algorithm, comprising the following steps:
1) collecting specific surface area data and chemical formulas of ABO₃ perovskite materials from the literature, carrying out data preprocessing, and taking the preprocessed data set samples as the data set samples for subsequent modeling;
2) generating the corresponding atomic-property and structural-parameter characteristic variables from the molecular formulas of the sample set, according to the collected process characteristics, structural parameters and atomic properties;
3) randomly dividing the data set samples obtained in the step 1) into a training set and a test set;
4) taking the specific surface area of the ABO₃ perovskite material samples collected in the step 1) as the target variable and a part of the characteristic variables obtained in the step 2) as independent variables, and establishing a rapid prediction model of the specific surface area of ABO₃ perovskite materials by using the XGBoost algorithm;
5) screening the characteristic variables, selecting 10 characteristic variables from the original 21 characteristic variables, and establishing a forecasting model;
6) predicting the specific surface area of the test set samples obtained in the step 5) according to the rapid prediction model of the specific surface area of ABO₃ perovskite materials established in the step 5).
2. The XGBoost algorithm-based method for rapidly predicting the specific surface area of a perovskite material according to claim 1, characterized in that: the data preprocessing in the step 1) comprises sorting the molecular formulas of the samples and identifying the samples that have specific surface area data.
3. The XGBoost algorithm-based method for rapidly predicting the specific surface area of a perovskite material according to claim 1, characterized in that: the data preprocessing in the step 1) comprises screening the samples that have specific surface area data and determining the number of samples in the data set.
4. The XGBoost algorithm-based method for rapidly predicting the specific surface area of a perovskite material according to claim 1, characterized in that: the XGBoost algorithm in the step 4) is a gradient boosting decision tree algorithm based on iterative accumulation.
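The idea of a gradient boosting decision tree built by iterative accumulation can be illustrated with a minimal sketch: each round fits a small tree to the current residuals and adds its scaled output to the running prediction. This is plain gradient boosting with one-split trees (stumps) and squared-error loss, not the XGBoost library itself, which additionally uses second-order gradient information and regularization. The function names, the number of rounds, the learning rate and the toy data are all assumptions made for illustration.

```python
def fit_stump(rows, residuals):
    """Return (feature, threshold, left_value, right_value) minimising squared error."""
    overall = sum(residuals) / len(residuals)
    best_sse = sum((r - overall) ** 2 for r in residuals)
    best = (0, float("inf"), overall, overall)  # fallback: no split, constant output
    for j in range(len(rows[0])):
        values = sorted({row[j] for row in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(rows, residuals) if row[j] <= thr]
            right = [r for row, r in zip(rows, residuals) if row[j] > thr]
            lval = sum(left) / len(left)
            rval = sum(right) / len(right)
            sse = sum((r - lval) ** 2 for r in left) + sum((r - rval) ** 2 for r in right)
            if sse < best_sse:
                best_sse, best = sse, (j, thr, lval, rval)
    return best


def boost(rows, targets, n_rounds=50, learning_rate=0.3):
    """Accumulate stumps: prediction_t = prediction_(t-1) + learning_rate * stump_t."""
    base = sum(targets) / len(targets)
    preds = [base] * len(targets)
    stumps = []
    for _ in range(n_rounds):
        # For squared-error loss the negative gradient is simply the residual.
        residuals = [t - p for t, p in zip(targets, preds)]
        j, thr, lval, rval = fit_stump(rows, residuals)
        stumps.append((j, thr, lval, rval))
        preds = [p + learning_rate * (lval if row[j] <= thr else rval)
                 for row, p in zip(rows, preds)]
    return base, learning_rate, stumps


def predict(model, row):
    """Sum the base value and the scaled output of every stump for one sample."""
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (lval if row[j] <= thr else rval)
                      for j, thr, lval, rval in stumps)


# Toy data only: one feature, quadratic target.
rows = [[float(x)] for x in range(10)]
targets = [row[0] ** 2 for row in rows]
model = boost(rows, targets, n_rounds=50, learning_rate=0.3)

baseline = sum(targets) / len(targets)
mse_baseline = sum((t - baseline) ** 2 for t in targets) / len(targets)
mse_boosted = sum((t - predict(model, r)) ** 2 for r, t in zip(rows, targets)) / len(targets)
```

On this toy data the accumulated model fits the training targets more closely than the constant baseline, which is the behaviour the term "iterative accumulation" refers to.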
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010845701.0A CN112132183A (en) | 2020-08-20 | 2020-08-20 | Method for rapidly predicting specific surface area of perovskite material based on XGboost algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010845701.0A CN112132183A (en) | 2020-08-20 | 2020-08-20 | Method for rapidly predicting specific surface area of perovskite material based on XGboost algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112132183A (en) | 2020-12-25 |
Family
ID=73850500
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010845701.0A Pending CN112132183A (en) | 2020-08-20 | 2020-08-20 | Method for rapidly predicting specific surface area of perovskite material based on XGboost algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112132183A (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111091878A (en) * | 2019-11-07 | 2020-05-01 | 上海大学 | Method for rapidly predicting perovskite dielectric constant |
CN111242302A (en) * | 2019-12-27 | 2020-06-05 | 冶金自动化研究设计院 | XGboost prediction method of intelligent parameter optimization module |
Non-Patent Citations (1)
Title |
---|
LI SHI ET AL.: "Using Data Mining To Search for Perovskite Materials with Higher Specific Surface Area", 《JOURNAL OF CHEMICAL INFORMATION AND MODELING》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112992290A (en) * | 2021-03-17 | 2021-06-18 | 华北电力大学 | Perovskite band gap prediction method based on machine learning and cluster model |
CN112992290B (en) * | 2021-03-17 | 2024-02-23 | 华北电力大学 | Perovskite band gap prediction method based on machine learning and cluster model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20201225 |