CN112132183A - Method for rapidly predicting specific surface area of perovskite material based on XGboost algorithm - Google Patents


Info

Publication number
CN112132183A
Authority
CN
China
Prior art keywords
specific surface
surface area
perovskite material
sample
data
Legal status
Pending
Application number
CN202010845701.0A
Other languages
Chinese (zh)
Inventor
刘秀娟
陆文聪
李龙
陶秋伶
赵娟娟
王向东
张诗琳
杨晨
Current Assignee
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date
2020-08-20
Filing date
2020-08-20
Publication date
2020-12-25
Application filed by University of Shanghai for Science and Technology
Priority to CN202010845701.0A
Publication of CN112132183A
Pending legal status


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C60/00 Computational materials science, i.e. ICT specially adapted for investigating the physical or chemical properties of materials or phenomena associated with their design, synthesis, processing, characterisation or utilisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Chemical & Material Sciences (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Evolutionary Biology (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

The invention relates to a method for rapidly predicting the specific surface area of perovskite materials based on the XGBoost machine learning algorithm. The method proceeds as follows:
1. Specific surface area data and chemical formulas of ABO3 perovskite materials are collected from the literature and preprocessed. The preprocessed samples form the data set.
2. Feature variables are generated from the sample set.
3. The data set is randomly divided into a training set and a test set.
4. A rapid prediction model of ABO3 perovskite specific surface area is built with the XGBoost algorithm. The target variable is the specific surface area of the ABO3 perovskite samples. Part of the feature variables serve as independent variables.
5. The feature variables are screened and the prediction model is established.
6. The specific surface area of the test-set samples is predicted with the rapid prediction model.

The model is built on reliable, real literature data and recent modeling methods. It is simple, convenient, fast, low-cost, pollution-free, easy to test, and efficient.

Description

Method for rapidly predicting specific surface area of perovskite material based on XGboost algorithm
Technical Field
The invention relates to methods for evaluating the catalytic performance of perovskite materials, and in particular to a method for rapidly predicting the specific surface area (SSA) of a perovskite material based on the XGBoost algorithm.
Background
Perovskite (CaTiO3) was discovered in 1839 by the German mineralogist Gustav Rose in the Ural Mountains of Russia. Perovskites come in many forms, such as bulk, porous, and hollow perovskite oxides. In recent years, perovskites have attracted considerable attention from researchers because of their unique crystal structure. In ABO3 perovskite compounds, the A site is generally occupied by a rare-earth element and the B site by a transition-metal ion. Both the A and B sites can be doped with other metal ions to improve the performance of the material. ABO3 perovskites are widely used for their magnetic, ferroelectric, photoelectric, and catalytic properties.
With socioeconomic development, humanity faces growing energy shortages, and the development and design of clean energy has become a research hotspot. Perovskite materials are used in clean-energy applications such as catalysis and batteries. The specific surface area (SSA) is one of the important features describing perovskite materials, and the SSA of ABO3 perovskites can serve as one of the key parameters for evaluating catalytic performance. Preliminary research indicates that perovskite materials with higher SSA have very good prospects in catalysis. It is therefore of great research significance to establish a quantitative model relating the SSA of ABO3 perovskites to atomic properties and preparation conditions, and thereby to find ABO3 perovskites with higher SSA.
XGBoost (eXtreme Gradient Boosting) is a gradient boosting decision tree algorithm based on iterative accumulation. It was proposed by Tianqi Chen of the University of Washington, first in 2014, as an improved, concrete implementation of the gradient boosted regression tree (GBRT). It differs from GBRT in three ways:
(1) it modifies the objective function by adding a regularization term on decision-tree complexity, which controls model complexity;
(2) it introduces a sparsity-aware algorithm and supports parallel computation;
(3) it adds further refinements such as column subsampling and support for linear base learners.
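Point (1), the regularized objective, can be made concrete with a short calculation. The sketch assumes the squared-error loss. It uses the standard second-order formulas from the XGBoost paper:
- the optimal leaf weight is w* = -G / (H + λ);
- the split gain includes a per-leaf complexity penalty γ.

Here G and H are the sums of the gradients and Hessians in a node. The function names are illustrative only and are not part of the xgboost library API.

```python
def squared_error_grad_hess(y_true, y_pred):
    """First and second derivatives of 0.5 * (y_pred - y_true)**2 for each sample."""
    grad = [p - t for t, p in zip(y_true, y_pred)]
    hess = [1.0] * len(y_true)
    return grad, hess


def leaf_weight(G, H, lam):
    """Optimal leaf value w* = -G / (H + lambda) under L2 regularization lambda."""
    return -G / (H + lam)


def split_gain(G_L, H_L, G_R, H_R, lam, gamma):
    """Reduction of the regularized objective when a node is split into L and R.

    gamma is the complexity penalty per added leaf; a split pays off only if gain > 0.
    """
    def score(G, H):
        return G * G / (H + lam)

    return 0.5 * (score(G_L, H_L) + score(G_R, H_R) - score(G_L + G_R, H_L + H_R)) - gamma


# Usage: four targets, all current predictions 0, candidate split {first two} | {last two}
y = [2.0, 3.0, 10.0, 12.0]
g, h = squared_error_grad_hess(y, [0.0] * 4)
G_L, H_L = sum(g[:2]), sum(h[:2])
G_R, H_R = sum(g[2:]), sum(h[2:])
print(leaf_weight(G_L, H_L, lam=1.0), leaf_weight(G_R, H_R, lam=1.0))
print(split_gain(G_L, H_L, G_R, H_R, lam=1.0, gamma=0.0))
```

Raising λ shrinks the leaf weights toward zero. Raising γ makes marginal splits unprofitable. Both act as the complexity control described in (1).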
Disclosure of Invention
To solve the problems of the prior art, the invention aims to overcome its shortcomings by providing a machine learning method, based on the XGBoost algorithm, for rapidly predicting the specific surface area of perovskite materials. The method is low-cost, simple, and efficient. It relies on complete and accurate data, requires no experiments or complex calculations, and causes no pollution.
To achieve this aim, the invention adopts the following technical scheme:
A method for rapidly predicting the specific surface area of a perovskite material based on the XGBoost algorithm comprises the following steps:
1) Search the literature for the specific surface area (SSA) data and chemical formulas of ABO3 perovskite materials, preprocess the data, and use the preprocessed samples as the data set for subsequent modeling;
2) From the molecular formulas of the sample set, generate the corresponding feature variables, such as atomic properties and structural parameters, according to the collected process characteristics, structural parameters, and atomic properties;
3) Randomly divide the data set samples obtained in step 1) into a training set and a test set;
4) Take the SSA of the ABO3 perovskite samples collected in step 1) as the target variable and part of the feature variables obtained in step 2) as independent variables, and build a rapid prediction model of ABO3 perovskite SSA with the XGBoost algorithm;
5) Screen the feature variables, selecting 10 of the original 21, and establish the prediction model;
6) Using the rapid prediction model of ABO3 perovskite SSA established in step 5), predict the SSA of the test-set samples obtained in step 3).
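Step 2) turns each chemical formula into numeric descriptors. The patent does not say how formulas are parsed or how elements are assigned to the A and B sites. The sketch below is therefore a hypothetical illustration, not the authors' code. It parses a flat formula such as Ca0.95La0.05Ti0.951Cr0.049O3 into element amounts, then computes a composition-weighted site average of an elemental property. The function names, the caller-supplied site assignment, and the placeholder property values are all assumptions.

```python
import re

# One element symbol followed by an optional (possibly fractional) amount
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")
_WHOLE = re.compile(r"(?:[A-Z][a-z]?(?:\d+(?:\.\d+)?)?)+")


def parse_formula(formula):
    """Parse a flat formula such as 'BaZr0.899Sn0.101O3' into {element: amount}."""
    if not _WHOLE.fullmatch(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    comp = {}
    for element, amount in _TOKEN.findall(formula):
        # A missing amount means 1; repeated elements are summed
        comp[element] = comp.get(element, 0.0) + (float(amount) if amount else 1.0)
    return comp


def site_weighted_mean(comp, site_elements, prop):
    """Composition-weighted mean of an elemental property over one site.

    site_elements: elements the caller assigns to this site (hypothetical assignment).
    prop: mapping element -> property value (e.g. a radius table supplied by the caller).
    """
    total = sum(comp[e] for e in site_elements)
    return sum(comp[e] * prop[e] for e in site_elements) / total


comp = parse_formula("Ca0.95La0.05Ti0.951Cr0.049O3")
print(comp)
# Placeholder property values, NOT real radii; substitute a reference table
print(site_weighted_mean(comp, ["Ca", "La"], {"Ca": 1.0, "La": 2.0}))
```

Averaging over each site by composition is one common way to handle doped compositions. Whether the patent's descriptors were computed this way is not stated.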
Preferably, the preprocessing of the data samples in step 1) includes sorting the molecular formulas of the samples and identifying the samples for which SSA data are available.
Preferably, the data preprocessing in step 1) includes screening the samples that have SSA values and determining the number of samples in the data set.
Preferably, the XGBoost algorithm in step 4) is a gradient boosting decision tree algorithm based on iterative accumulation.
Preferably, the XGBoost algorithm in step 6) is the one proposed by Tianqi Chen of the University of Washington in 2014. It starts from an initial naive model. In each iteration it fits a new model to the errors between the observed values in the sample set and the current predictions, then adds the new model to the existing one. Repeating this process yields an ensemble model.
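The additive procedure described above can be illustrated with a stripped-down sketch. It starts from a naive model (the mean), repeatedly fits a new model to the current errors, and adds that model with a shrinkage factor. This is plain gradient boosting with squared error and depth-1 regression trees (stumps) on a single feature, not XGBoost itself. It omits the regularization term, second-order gain, column sampling, and parallelism. The function names and the toy data are assumptions.

```python
import math


def fit_stump(x, residuals):
    """Fit a depth-1 regression tree to residuals on one feature x."""
    pairs = sorted(zip(x, residuals))
    best = None  # (sse, threshold, left_value, right_value)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal feature values
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, lv, rv)
    if best is None:  # all x equal: fall back to a constant correction
        mean = sum(residuals) / len(residuals)
        return math.inf, mean, mean
    return best[1], best[2], best[3]


def fit_boosting(x, y, n_rounds=50, learning_rate=0.1):
    base = sum(y) / len(y)  # initial naive model: predict the mean
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # errors of current ensemble
        thr, lv, rv = fit_stump(x, residuals)
        stumps.append((thr, lv, rv))
        # Add the new model to the ensemble, shrunk by the learning rate
        pred = [pi + learning_rate * (lv if xi <= thr else rv) for pi, xi in zip(pred, x)]
    return {"base": base, "stumps": stumps, "learning_rate": learning_rate}


def predict(model, xi):
    out = model["base"]
    for thr, lv, rv in model["stumps"]:
        out += model["learning_rate"] * (lv if xi <= thr else rv)
    return out


model = fit_boosting([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5])
print(predict(model, 2.0), predict(model, 5.0))
```

Each round removes a fixed fraction of the remaining error. In the toy data the predictions approach 1 and 5 geometrically as rounds are added.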
Compared with the prior art, the invention has the following prominent substantive features and notable advantages:
1. Predicting the SSA of ABO3 perovskite materials is simple and fast. Once ABO3 perovskite sample data are collected from the literature with a computer system, the atomic properties and structural parameters of all samples can be obtained within a few seconds. Modeling with XGBoost, with the converted data as independent variables and the SSA as the dependent variable, then yields results in a few more seconds. The calculation is convenient and efficient and can be carried out by a single person. The prediction model judges in advance whether the SSA of an ABO3 perovskite meets requirements, so that only suitable samples are selected for experimental verification; this improves experimental efficiency and avoids blind trials;
2. The method overcomes the shortcomings of the traditional trial-and-error ("cook-and-look") approach. It avoids repeated trial and error by predicting the SSA of ABO3 perovskite materials through theory and computation. It is low-cost, simple, easy to implement, and suitable for wide application;
3. The method involves no experiments and uses no chemicals at any stage, so it causes no environmental pollution and is consistent with the principle of environmental protection.
Drawings
Fig. 1 is a graph of the modeling results of the XGBoost regression model for the specific surface area of perovskite materials in Example three of the invention.
Fig. 2 is a graph of the 5-fold cross-validation results of the XGBoost regression model for the specific surface area of perovskite materials in Example four of the invention.
Fig. 3 is a graph of the independent test-set results of the XGBoost regression model for the specific surface area of perovskite materials in Example five of the invention.
Fig. 4 is a schematic diagram of the principle of the XGBoost algorithm according to the embodiments of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments.
Example one:
A method for rapidly predicting the specific surface area of a perovskite material based on the XGBoost algorithm comprises the following steps:
1) Search the literature for the specific surface area (SSA) data and chemical formulas of ABO3 perovskite materials, preprocess the data, and use the preprocessed samples as the data set for subsequent modeling;
2) From the molecular formulas of the sample set, generate the corresponding feature variables, such as atomic properties and structural parameters, according to the collected process characteristics, structural parameters, and atomic properties;
3) Randomly divide the data set samples obtained in step 1) into a training set and a test set;
4) Take the SSA of the ABO3 perovskite samples collected in step 1) as the target variable and part of the feature variables obtained in step 2) as independent variables, and build a rapid prediction model of ABO3 perovskite SSA with the XGBoost algorithm;
5) Screen the feature variables, selecting 10 of the original 21, and establish the prediction model;
6) Using the rapid prediction model of ABO3 perovskite SSA established in step 5), predict the SSA of the test-set samples obtained in step 3).
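Step 3) is a random partition of the sample indices. The sketch below uses the sample counts given in Example three: 98 samples split into 78 for training and 20 for testing. The patent does not state the random seed or the tool used. The function name and the seed are assumptions; the fixed seed only makes the split reproducible.

```python
import random


def random_train_test_split(n_samples, n_test, seed=0):
    """Randomly partition sample indices 0..n_samples-1 into (train, test) index lists."""
    if not 0 < n_test < n_samples:
        raise ValueError("n_test must be between 1 and n_samples - 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    test = sorted(indices[:n_test])
    train = sorted(indices[n_test:])
    return train, test


# 98 samples split into 78 training and 20 test samples, as in Example three
train_idx, test_idx = random_train_test_split(98, 20, seed=42)
print(len(train_idx), len(test_idx))
```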
The method of this embodiment overcomes the shortcomings of the traditional trial-and-error ("cook-and-look") approach. It avoids repeated trial and error by predicting the SSA of ABO3 perovskite materials through theory and computation. It is low-cost, simple, easy to implement, and suitable for wide application.
Example two:
This embodiment is substantially the same as the embodiment above, with the following distinguishing features:
In this embodiment, the preprocessing of the data samples in step 1) includes sorting the molecular formulas of the samples and identifying the samples for which SSA data are available. The data preprocessing in step 1) also includes screening the samples that have SSA values and determining the number of samples in the data set.
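The preprocessing described above can be sketched in a few lines: keep samples that report an SSA value, sort them by formula, and count them. This is a sketch under stated assumptions. The record layout (formula, SSA) and the rule "positive numeric SSA only" are assumptions, and the third input record is a hypothetical placeholder standing in for a sample without SSA data. Duplicate formulas with different reported SSA values are kept, because the patent does not say how such duplicates are handled.

```python
def preprocess(records):
    """Keep samples with a usable SSA value and sort them by formula.

    records: iterable of (formula, ssa) pairs; ssa may be a number, a numeric
    string, or missing (None / non-numeric text).
    Returns (clean_records, number_of_samples).
    """
    clean = []
    for formula, ssa in records:
        try:
            value = float(ssa)
        except (TypeError, ValueError):
            continue  # no SSA reported for this sample: drop it
        if value > 0:  # also rejects NaN, since NaN > 0 is False
            clean.append((formula.strip(), value))
    clean.sort(key=lambda rec: rec[0])  # sort by chemical formula
    return clean, len(clean)


raw = [
    ("LiTaO3", "1.96"),
    ("BaZr0.8Sn0.2O3", 3.8),
    ("hypothetical_sample_without_ssa", None),
]
samples, n = preprocess(raw)
print(n, samples)
```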
In this embodiment, the XGBoost algorithm in step 4) is a gradient boosting decision tree algorithm based on iterative accumulation.
The XGBoost-based method for rapidly predicting the SSA of perovskite materials is fast, low-cost, simple, and efficient. It relies on complete and accurate data, requires no experiments or complex calculations, and causes no pollution.
Example three:
This embodiment is substantially the same as the embodiment above, with the following distinguishing features:
In this embodiment, the method for rapidly predicting the SSA of ABO3 perovskite materials based on the XGBoost machine learning algorithm comprises the following steps:
1) Search the literature and databases for the SSA and molecular formulas of ABO3 perovskite materials, and use them as the data set samples for subsequent modeling. The chemical formulas and SSA values of some of the ABO3 perovskite materials are shown in Table 1;
TABLE 1 Partial data set: perovskite chemical formulas and SSA values
Chemical formula | SSA (m²/g) | Chemical formula | SSA (m²/g)
BaZr0.899Sn0.101O3 | 5.7 | SrTi0.99Al0.01O3 | 2.7
BaZr0.8Sn0.2O3 | 3.8 | Na0.95Sr0.05TaO3 | 2.6
BaZr0.701Sn0.299O3 | 4 | Na0.999La0.011TaO3 | 2.5
Ca0.95La0.05Ti0.951Cr0.049O3 | 14.69 | Na0.99La0.02TaO3 | 3.2
Ca0.9La0.1Ti0.899Cr0.101O3 | 9.72 | Na0.99La0.02TaO3 | 3.5
Ca0.801La0.199Ti0.802Cr0.198O3 | 9.51 | Na0.965La0.035TaO3 | 4.9
BaNb0.665Co0.335O3 | 19.7 | LiTaO3 | 1.96
2) Generate the corresponding atomic properties and structural features from the chemical formulas of the sample set. This gives 21 feature variables, including atomic radius, electronegativity, and melting point. Part of the feature-variable data is shown in Table 2;
TABLE 2 Partial feature variables
(Table 2 is available only as an image in the original publication.)
3) Randomly divide the 98 data set samples obtained in step 1) into a training set and a test set at a test-to-training ratio of about 1:4. This gives 78 training samples and 20 test samples;
4) Take the SSA of the ABO3 perovskite samples collected in step 1) as the target variable. Substitute the atomic-property data of the samples into the following conversion equations; the resulting converted data are shown in Table 3. Select the 10 converted feature variables as the optimal independent-variable subset for modeling. Finally, build a rapid prediction model of ABO3 perovskite SSA with the XGBoost algorithm on the training-set samples obtained in step 3);
P1=-0.004353[Radius_A]-0.01642[Radius_B]-0.2117[Ea]+1.330[Eb]-0.7311[TF]-0.007174[aO3]-0.01084[rc]-0.4550[Za]+0.7123[Zb]-0.1503[R_a/R_b]-0.003704[Mass]+0.004129[A_aff]-0.002251[B_aff]-0.0007124[A_Tm]+0.0001092[B_Tm]+3.91E-05[A_Tb]-2.623E-06[B_Tb]-0.002683[A_Hfus]-8.249E-05[B_Hfus]-0.03654[A_Density]+0.03631[B_Density]-0.001472[Calcination temperature]-0.06144[Calcination time]+2.057
P2=+0.001240[Radius_A]+0.002322[Radius_B]-0.5748[Ea]+0.01522[Eb]+0.1594[TF]+0.0009699[aO3]+0.003461[rc]-0.5994[Za]-0.06822[Zb]-0.01718[R_a/R_b]-0.01233[Mass]-0.007368[A_aff]-0.005196[B_aff]-0.001109[A_Tm]+6.676E-05[B_Tm]-0.0003223[A_Tb]+3.147E-05[B_Tb]-0.005600[A_Hfus]+0.0007792[B_Hfus]-0.1268[A_Density]-0.005220[B_Density]-0.001280[Calcination temperature]-0.03864[Calcination time]+10.604
P3=+0.008214[Radius_A]-0.02080[Radius_B]+1.375[Ea]+1.154[Eb]+1.829[TF]-0.008959[aO3]+0.006274[rc]-0.1867[Za]+0.1670[Zb]+0.3138[R_a/R_b]-0.01238[Mass]-0.009529[A_aff]-0.006347[B_aff]-0.0007486[A_Tm]-0.0002556[B_Tm]-0.0002856[A_Tb]-0.0001464[B_Tb]-0.009453[A_Hfus]-0.00069[B_Hfus]-0.009726[A_Density]-0.04398[B_Density]-0.001678[Calcination temperature]-0.05267[Calcination time]+5.596
P4=+0.001616[Radius_A]+0.01250[Radius_B]+3.183[Ea]+0.7320[Eb]+0.2143[TF]+0.005042[aO3]+0.01930[rc]+0.1732[Za]+0.4250[Zb]-0.01136[R_a/R_b]-0.008406[Mass]-0.004047[A_aff]-0.004583[B_aff]-0.001311[A_Tm]+3.624E-05[B_Tm]-0.0005064[A_Tb]+5.77E-05[B_Tb]-0.007909[A_Hfus]-0.005566[B_Hfus]+0.01509[A_Density]+0.02002[B_Density]-0.001275[Calcination temperature]-0.04217[Calcination time]-4.061
P5=+1.287E-05[Radius_A]+0.007085[Radius_B]+3.136[Ea]-2.923[Eb]-0.02134[TF]+0.002615[aO3]+0.01909[rc]-0.6690[Za]-0.2010[Zb]-0.01659[R_a/R_b]-0.008292[Mass]-0.003902[A_aff]+0.003641[B_aff]+0.0003066[A_Tm]+6.962E-05[B_Tm]+0.0002947[A_Tb]+9.823E-05[B_Tb]+0.0002871[A_Hfus]+0.002680[B_Hfus]+0.07634[A_Density]+0.01498[B_Density]-0.001640[Calcination temperature]-0.05509[Calcination time]+6.599
P6=+0.001780[Radius_A]+0.02312[Radius_B]+1.732[Ea]+1.105[Eb]+0.2420[TF]+0.009168[aO3]-0.01115[rc]-0.3194[Za]+0.2506[Zb]+0.05211[R_a/R_b]-0.009452[Mass]-0.01915[A_aff]+0.001699[B_aff]+0.001546[A_Tm]+0.0003963[B_Tm]+3.099E-05[A_Tb]+0.000257[B_Tb]+0.001348[A_Hfus]-0.002316[B_Hfus]+0.04421[A_Density]+0.1053[B_Density]+1.287E-05[Calcination temperature]-0.03217[Calcination time]-7.118
P7=+0.002563[Radius_A]+0.04203[Radius_B]+1.955[Ea]+2.925[Eb]+0.2192[TF]+0.01723[aO3]-0.003154[rc]-1.139[Za]+0.4292[Zb]+0.04602[R_a/R_b]-0.01006[Mass]-0.007689[A_aff]-0.002913[B_aff]+0.001966[A_Tm]+0.0004187[B_Tm]-3.815E-05[A_Tb]+0.0001841[B_Tb]-0.0008564[A_Hfus]+0.002528[B_Hfus]+0.09231[A_Density]+0.07596[B_Density]+0.0009823[Calcination temperature]+0.007110[Calcination time]-11.904
P8=+0.001358[Radius_A]+0.03356[Radius_B]+2.253[Ea]+1.688[Eb]-0.04910[TF]+0.01403[aO3]-0.01932[rc]-1.255[Za]-0.4454[Zb]+0.04821[R_a/R_b]-0.0005658[Mass]-0.003682[A_aff]-0.004948[B_aff]+0.001214[A_Tm]-3.886E-05[B_Tm]-0.0004656[A_Tb]+0.0001209[B_Tb]+0.001690[A_Hfus]-0.003008[B_Hfus]+0.1074[A_Density]-0.006633[B_Density]+0.0001445[Calcination temperature]-0.001449[Calcination time]+1.813
P9=-0.004053[Radius_A]-0.03830[Radius_B]+2.623[Ea]+1.718[Eb]-0.4856[TF]-0.01664[aO3]-0.01983[rc]-1.189[Za]-0.9871[Zb]+0.04117[R_a/R_b]-0.002149[Mass]-0.003582[A_aff]-0.003981[B_aff]+0.0008678[A_Tm]+0.0003424[B_Tm]-0.0005741[A_Tb]+0.0004025[B_Tb]+0.0007[A_Hfus]+0.0006933[B_Hfus]+0.1652[A_Density]+0.01008[B_Density]+0.0004473[Calcination temperature]+0.03664[Calcination time]+12.352
P10=-0.02304[Radius_A]-0.03334[Radius_B]-0.6108[Ea]+0.06693[Eb]-4.440[TF]-0.01432[aO3]+0.006226[rc]-0.7707[Za]+0.3130[Zb]-0.4503[R_a/R_b]+0.009910[Mass]-0.01894[A_aff]-0.002579[B_aff]+0.001971[A_Tm]-0.0004959[B_Tm]-0.0009604[A_Tb]+1.717E-05[B_Tb]-0.001678[A_Hfus]+0.007276[B_Hfus]+0.3947[A_Density]+0.002961[B_Density]-0.0001974[Calcination temperature]-0.004212[Calcination time]+9.822;
In each of the above equations:
Radius_A: radius of the A-site element;
Radius_B: radius of the B-site element;
Ea: electronegativity of the A-site element;
Eb: electronegativity of the B-site element;
TF: tolerance factor;
aO3: unit-cell edge length of ABO3;
rc: critical radius;
Za: ionization energy of the A-site element;
Zb: ionization energy of the B-site element;
R_a/R_b: ratio of the atomic radii of the A-site and B-site elements;
Mass: molecular mass;
A_aff: electron affinity of the A-site element;
B_aff: electron affinity of the B-site element;
A_Tm: melting point of the A-site element;
B_Tm: melting point of the B-site element;
A_Tb: boiling point of the A-site element;
B_Tb: boiling point of the B-site element;
A_Hfus: enthalpy of fusion of the A-site element;
B_Hfus: enthalpy of fusion of the B-site element;
A_Density: density of the A-site element;
B_Density: density of the B-site element;
Calcination temperature: calcination temperature;
Calcination time: calcination time.
TABLE 3 partial conversion data
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10
-2.06 1.20 1.67 0.15 0.20 0.19 0.56 0.02 -0.32 0.11
1.06 2.50 -0.28 0.73 1.04 -0.42 -0.14 0.07 0.28 0.03
2.07 1.92 -0.96 1.71 0.55 0.32 -0.40 0.18 -0.29 0.02
0.07 0.64 -2.61 0.78 -0.78 -0.23 0.37 -0.19 -0.09 -0.05
-1.00 -0.28 -0.49 0.00 -0.57 -0.15 0.33 -0.05 0.13 0.01
-1.99 0.17 0.93 -0.78 0.17 -0.17 -0.91 -0.06 0.24 0.03
1.91 1.82 -1.09 1.63 0.44 0.27 -0.35 0.16 -0.26 0.01
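Each conversion equation above is a linear combination of the descriptors plus an intercept. The sketch below evaluates P1 using the coefficients copied from the P1 equation. It also computes the TF descriptor.

The patent does not define how TF is calculated. The sketch assumes the standard Goldschmidt tolerance factor, t = (r_A + r_O) / (√2 (r_B + r_O)), with all radii in one consistent unit. The descriptor values in the usage lines are placeholders, not a real sample from the data set. The function names are hypothetical.

```python
import math

# Coefficients of the P1 conversion equation, copied from Example three
P1_COEFFS = {
    "Radius_A": -0.004353, "Radius_B": -0.01642, "Ea": -0.2117, "Eb": 1.330,
    "TF": -0.7311, "aO3": -0.007174, "rc": -0.01084, "Za": -0.4550, "Zb": 0.7123,
    "R_a/R_b": -0.1503, "Mass": -0.003704, "A_aff": 0.004129, "B_aff": -0.002251,
    "A_Tm": -0.0007124, "B_Tm": 0.0001092, "A_Tb": 3.91e-05, "B_Tb": -2.623e-06,
    "A_Hfus": -0.002683, "B_Hfus": -8.249e-05, "A_Density": -0.03654,
    "B_Density": 0.03631, "Calcination temperature": -0.001472,
    "Calcination time": -0.06144,
}
P1_INTERCEPT = 2.057


def goldschmidt_tolerance_factor(r_a, r_b, r_o):
    """Assumed TF definition: (r_A + r_O) / (sqrt(2) * (r_B + r_O)), same units for all radii."""
    return (r_a + r_o) / (math.sqrt(2.0) * (r_b + r_o))


def linear_score(descriptors, coeffs, intercept):
    """Evaluate intercept + sum(coeff * descriptor) for one sample."""
    missing = sorted(set(coeffs) - set(descriptors))
    if missing:
        raise KeyError(f"missing descriptors: {missing}")
    return intercept + sum(c * descriptors[name] for name, c in coeffs.items())


# Shannon ionic radii in angstrom: Sr2+ (XII) 1.44, Ti4+ (VI) 0.605, O2- 1.40
tf = goldschmidt_tolerance_factor(1.44, 0.605, 1.40)
descriptors = {name: 0.0 for name in P1_COEFFS}  # placeholder values, not a real sample
descriptors["TF"] = tf
print(tf, linear_score(descriptors, P1_COEFFS, P1_INTERCEPT))
```

For SrTiO3 these radii give a tolerance factor of about 1.00, as expected for a nearly ideal cubic perovskite. The same `linear_score` helper evaluates P2 to P10 when given their coefficient tables.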
The SSA of the test-set samples is then rapidly predicted with the established rapid prediction model of perovskite SSA. Fig. 1 shows the modeling results of the quantitative SSA prediction model built with XGBoost on 78 perovskite samples.
In this embodiment, regression modeling is performed on the data of 78 perovskite samples with the XGBoost regression algorithm, yielding an XGBoost quantitative regression model of perovskite SSA. The correlation coefficient between the model's predicted SSA values and the literature values is 0.996. Because the model is built from sample data in the literature and databases, the method is fast, convenient, low-cost, and environmentally friendly. It can also guide practical experiments and avoid blind trials.
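The patent reports model quality as a correlation coefficient between predicted and literature SSA values. It does not state which coefficient is meant. The sketch below assumes the Pearson correlation coefficient r, computed from scratch. The function name is illustrative.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between literature values and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    s_tp = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    s_tt = sum((t - mean_t) ** 2 for t in y_true)
    s_pp = sum((p - mean_p) ** 2 for p in y_pred)
    return s_tp / math.sqrt(s_tt * s_pp)


print(pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
```

r measures linear agreement only. A model whose predictions are all off by a constant factor still scores r = 1, so an error metric such as RMSE is a useful complement.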
Example four:
This embodiment is substantially the same as the embodiment above, with the following distinguishing features:
In this embodiment, the 78 training-set samples are divided into 5 groups, denoted D1, D2, D3, D4, and D5. The cross-validation proceeds as follows:
- First, D1, D2, D3, and D4 form the training set. Model 1 is built with the same optimal independent-variable subset as in Example one and is used to predict the SSA of D5.
- Second, D1, D2, D3, and D5 form the training set. Model 2 is built with the same optimal independent-variable subset and is used to predict the SSA of D4.
- The remaining groups are handled in the same way until 5 models have been built.

The stability and reliability of the modeling method are then judged from the errors between the predicted and true values.
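The fold scheme described above can be sketched as follows. The samples are shuffled once and dealt into 5 groups (D1 to D5). Each group is held out exactly once, while the other four form the training set; D5 is held out first, matching the text. The shuffling method, the seed, and the function name are assumptions; the patent does not specify how the 78 samples were assigned to groups.

```python
import random


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, held_out_indices) for k-fold cross-validation.

    Samples are shuffled once and dealt into k groups (D1..Dk); each group is
    held out exactly once while the remaining k-1 groups form the training set.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in reversed(range(k)):  # D5 first, then D4, ..., as in the text
        train = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train, folds[held_out]


for train, held in k_fold_splits(78, k=5, seed=1):
    print(len(train), len(held))
```

With 78 samples, the groups hold 16, 16, 16, 15, and 15 samples. Pooling the five held-out predictions gives one out-of-fold prediction per sample, from which a cross-validation correlation coefficient can be computed.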
The SSA of the training-set samples is then rapidly predicted with the established rapid prediction model of perovskite SSA. Fig. 2 shows the 5-fold internal cross-validation results of the quantitative SSA prediction model built with XGBoost on 78 perovskite samples.
In this embodiment, 5-fold cross-validation is applied to the XGBoost quantitative prediction model of perovskite SSA built from the 78 samples. The correlation coefficient between the predicted SSA values in 5-fold cross-validation and the literature values is 0.843. Because the 5-fold cross-validated prediction model of the training set is built from sample data in the literature and databases, the method is fast, convenient, low-cost, and environmentally friendly. It also allows the stability and reliability of the modeling method to be evaluated.
Example five:
This embodiment is substantially the same as the previous embodiment, with the following distinguishing features:
In this embodiment, the SSA of the test-set samples is rapidly predicted with the established rapid prediction model of perovskite SSA. Fig. 3 shows the independent test-set prediction results of the XGBoost quantitative perovskite prediction model for the 20 test samples.
The established XGBoost quantitative prediction model of perovskite SSA was used to predict the 20 samples in the independent test set, with good results. The correlation coefficient between the predicted SSA values and the literature values is 0.863. Because the model is built from sample data in the literature and databases, the method is fast, convenient, low-cost, and environmentally friendly. It can also guide practical experiments and avoid blind trials.
In summary, the above embodiments describe a method for rapidly predicting the specific surface area (SSA) of perovskite materials based on the XGBoost machine learning algorithm, comprising the following steps:
1. A computer system collects SSA data and chemical formulas of ABO3 perovskite materials from the literature. The preprocessed samples serve as the data set for subsequent modeling.
2. Feature variables such as atomic properties and structural parameters are generated from the molecular formulas of the sample set, according to the collected process characteristics, structural parameters, and atomic properties.
3. The data set is randomly divided into a training set and a test set.
4. A rapid prediction model of ABO3 perovskite SSA is built with the XGBoost algorithm. The SSA of the collected ABO3 perovskite samples is the target variable, and part of the feature variables are the independent variables.
5. 10 of the original 21 feature variables are selected and the prediction model is established.
6. The SSA of the test-set samples is predicted with the established model.

The prediction model for the SSA of ABO3 perovskites is based on reliable, real, recent literature data and up-to-date modeling methods. It is simple, convenient, fast, low-cost, pollution-free, easy to test, and efficient.
While the embodiments of the invention have been described with reference to the drawings, the invention is not limited to these embodiments, and various changes and modifications may be made in keeping with its purpose. Any change, modification, substitution, combination, or simplification made according to the spirit and principle of the technical solution of the invention shall be regarded as an equivalent substitution. Such a substitution falls within the protection scope of the invention provided that it is consistent with the purpose of the invention and does not depart from the technical principle and inventive concept of the XGBoost-based method for rapidly predicting perovskite SSA.

Claims (4)

1. A method for rapidly predicting the specific surface area of a perovskite material based on an XGBoost algorithm, comprising the following steps:
1) searching the literature for specific surface area data and chemical formulas of ABO3 perovskite materials, preprocessing the data, and taking the preprocessed samples as the data set for subsequent modeling;
2) generating feature variables of atomic properties and structural parameters from the molecular formulas of the sample set according to the collected process characteristics, structural parameters, and atomic properties;
3) randomly dividing the data set samples obtained in step 1) into a training set and a test set;
4) taking the specific surface area of the ABO3 perovskite samples collected in step 1) as the target variable and part of the feature variables obtained in step 2) as independent variables, and building a rapid prediction model of the specific surface area of ABO3 perovskite materials with the XGBoost algorithm;
5) screening the feature variables, selecting 10 of the original 21 feature variables, and establishing a prediction model;
6) using the rapid prediction model of the specific surface area of ABO3 perovskite materials established in step 5), predicting the specific surface area of the test-set samples obtained in step 3).
2. The XGboost algorithm-based method for rapidly predicting the specific surface area of a perovskite material according to claim 1, wherein the XGboost algorithm is used for: the preprocessing of the data sample in the step 1) comprises the steps of sorting the molecular formula of the sample and finding out sample data with specific surface area.
3. The XGboost algorithm-based method for rapidly predicting the specific surface area of a perovskite material according to claim 1, wherein the XGboost algorithm is used for: the data preprocessing in the step 1) comprises screening samples with specific surface areas and determining the number of sample data sets.
4. The XGboost algorithm-based method for rapidly predicting the specific surface area of a perovskite material according to claim 1, wherein the XGboost algorithm is used for: the XGboost algorithm in the step 4) is a gradient lifting decision tree algorithm based on iterative accumulation.
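The workflow of claim 1, together with claims 2 to 4, can be illustrated with a minimal, self-contained Python sketch. It is not the applicant's implementation: the record layout (`formula`, `ssa`, `features`), the 80/20 split ratio, all helper names and the gain-based feature ranking are hypothetical assumptions, and step 2) (deriving atomic-property and structural-parameter features from each molecular formula) is assumed to have been done upstream. In place of the XGBoost library it uses plain gradient boosting with squared loss and depth-1 regression trees, which shows the "iterative accumulation" of claim 4, F_m(x) = F_{m-1}(x) + η·f_m(x), but omits XGBoost's second-order gradient statistics and regularized leaf weights.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout (assumption): {"formula": str, "ssa": float or None, "features": [float, ...]}
Sample = Dict[str, object]


def preprocess(samples: Sequence[Sample]) -> List[Sample]:
    """Claims 2-3: keep only samples that report a specific surface area."""
    return [s for s in samples if s.get("ssa") is not None]


def split(samples: Sequence[Sample], test_fraction: float = 0.2, seed: int = 0):
    """Step 3): random division into training and test sets (the ratio is an assumption)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


class Stump:
    """Depth-1 regression tree: x[feature] <= threshold goes left, otherwise right."""

    def __init__(self, feature: int, threshold: float, left: float, right: float, gain: float):
        self.feature, self.threshold = feature, threshold
        self.left, self.right, self.gain = left, right, gain

    def predict(self, x: Sequence[float]) -> float:
        return self.left if x[self.feature] <= self.threshold else self.right


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Pick the split that most reduces the squared error of the residuals r."""
    n, total = len(X), sum(r)
    best = Stump(0, float("inf"), total / n, total / n, 0.0)  # fallback: no useful split
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for pos in range(n - 1):
            i = order[pos]
            left_sum += r[i]
            if X[i][j] == X[order[pos + 1]][j]:
                continue  # only split between distinct feature values
            n_left, right_sum = pos + 1, total - left_sum
            # SSE reduction with mean-valued leaves: S_L^2/n_L + S_R^2/n_R - S^2/n
            gain = left_sum ** 2 / n_left + right_sum ** 2 / (n - n_left) - total ** 2 / n
            if gain > best.gain:
                best = Stump(j, X[i][j], left_sum / n_left, right_sum / (n - n_left), gain)
    return best


class GradientBoostedStumps:
    """Claim 4 analogue: F_m(x) = F_{m-1}(x) + eta * f_m(x), each f_m fitted to the residuals."""

    def __init__(self, n_rounds: int = 100, learning_rate: float = 0.1):
        self.n_rounds, self.learning_rate = n_rounds, learning_rate

    def fit(self, X: List[List[float]], y: List[float]) -> "GradientBoostedStumps":
        self.base = sum(y) / len(y)
        self.n_features = len(X[0])
        self.stumps: List[Stump] = []
        pred = [self.base] * len(y)
        for _ in range(self.n_rounds):
            residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of half squared error
            stump = fit_stump(X, residuals)
            self.stumps.append(stump)
            pred = [p + self.learning_rate * stump.predict(x) for p, x in zip(pred, X)]
        return self

    def predict(self, X: List[List[float]]) -> List[float]:
        return [self.base + self.learning_rate * sum(s.predict(x) for s in self.stumps) for x in X]

    def feature_importance(self) -> List[float]:
        """Total split gain accumulated by each feature."""
        imp = [0.0] * self.n_features
        for s in self.stumps:
            imp[s.feature] += s.gain
        return imp


def run_pipeline(samples: Sequence[Sample], k: int = 10,
                 seed: int = 0) -> Tuple[List[int], List[float], List[float]]:
    """Steps 1) and 3)-6) of claim 1; step 2) is assumed to be done upstream."""
    train, test = split(preprocess(samples), seed=seed)
    X_tr = [list(s["features"]) for s in train]
    y_tr = [float(s["ssa"]) for s in train]
    full = GradientBoostedStumps().fit(X_tr, y_tr)  # step 4): model on all features
    imp = full.feature_importance()
    keep = sorted(range(len(imp)), key=lambda j: imp[j], reverse=True)[:k]  # step 5)

    def project(rows: List[List[float]]) -> List[List[float]]:
        return [[row[j] for j in keep] for row in rows]

    reduced = GradientBoostedStumps().fit(project(X_tr), y_tr)
    X_te = [list(s["features"]) for s in test]
    return keep, reduced.predict(project(X_te)), [float(s["ssa"]) for s in test]  # step 6)
```

With real data, each `features` vector would hold the 21 descriptors and `k=10` would mirror step 5); in practice the `xgboost` package's `XGBRegressor` (with its `feature_importances_` attribute) would replace the toy booster. Ranking features by accumulated split gain is only one common screening criterion; the claims do not state which criterion was used.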

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010845701.0A CN112132183A 2020-08-20 2020-08-20 Method for rapidly predicting specific surface area of perovskite material based on XGboost algorithm

Publications (1)

Publication Number Publication Date
CN112132183A 2020-12-25

Family

ID=73850500

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010845701.0A Pending CN112132183A 2020-08-20 2020-08-20 Method for rapidly predicting specific surface area of perovskite material based on XGboost algorithm

Country Status (1)

Country Link
CN (1) CN112132183A

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111091878A (en) * 2019-11-07 2020-05-01 上海大学 Method for rapidly predicting perovskite dielectric constant
CN111242302A (en) * 2019-12-27 2020-06-05 冶金自动化研究设计院 XGboost prediction method of intelligent parameter optimization module

Non-Patent Citations (1)

Title
Li Shi et al., "Using Data Mining To Search for Perovskite Materials with Higher Specific Surface Area", Journal of Chemical Information and Modeling *

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN112992290A (en) * 2021-03-17 2021-06-18 华北电力大学 Perovskite band gap prediction method based on machine learning and cluster model
CN112992290B (en) * 2021-03-17 2024-02-23 华北电力大学 Perovskite band gap prediction method based on machine learning and cluster model

Similar Documents

Publication Publication Date Title
Chueh et al. Electrochemistry of mixed oxygen ion and electron conducting electrodes in solid electrolyte cells
Niu et al. Towards the digitalisation of porous energy materials: evolution of digital approaches for microstructural design
Döll et al. Global optimization in LEED structure determination using genetic algorithms
CN111460382A (en) Fuel vehicle harmful gas emission prediction method and system based on Gaussian process regression
CN112132177B (en) On-line forecasting method for rapid machine-learning-based prediction of the ABO3 perovskite band gap
CN113052367A (en) Method for efficiently predicting stability of perovskite based on integrated machine learning
CN108985381A (en) The determination method, device and equipment of nitrogen oxide emission prediction model
CN112132183A (en) Method for rapidly predicting specific surface area of perovskite material based on XGboost algorithm
Kiyohara et al. Radial Distribution Function from X-ray Absorption near Edge Structure with an Artificial Neural Network
CN114252463A (en) Urban atmospheric particulate source analysis method
Qi et al. Chemical signatures to identify the origin of solid ashes for efficient recycling using machine learning
CN113808681A (en) ABO (abnormal noise) rapid prediction based on SHAP-Catboost3Method and system for specific surface area of perovskite material
Pop et al. A fuzzy classification of the chemical elements
Fei et al. Discrimination of excessive exhaust emissions of vehicles based on Catboost algorithm
Albuthbahak et al. Prediction of concrete compressive strength using supervised machine learning models through ultrasonic pulse velocity and mix parameters
CN116307068A (en) Multi-city multi-atmosphere pollutant prediction method based on four-dimensional directed GCN-LSTM model
CN115050434A (en) High-flux screening method for ultrahigh-temperature thermal barrier coating material
CN112132185B (en) Method for rapidly predicting double perovskite oxide band gap based on data mining
CN113223639A (en) Method for exploring structure, composition and property of perovskite oxide
CN112132182A (en) Method for rapidly predicting resistivity of ternary gold alloy based on machine learning
Harris et al. Applications of evolutionary computation in structure determination from diffraction data
Vaišnys et al. An alternative method of estimating the cranial capacity of Olduvai Hominid 7
Wang et al. Research on coal mine gas concentration prediction based on cloud computing technology under the background of internet
Li et al. Fault detection of liquid-propellant rocket engines based on LSSVM
CN112133383B (en) Method for predicting perovskite specific surface area based on genetic symbolic regression

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201225