CN110516701A

CN110516701A - Method for rapidly predicting the Curie temperature of perovskites based on data mining

Info

Publication number: CN110516701A
Application number: CN201910648969.2A
Authority: CN
Inventors: 田璐敏 (Tian Lumin); 陆文聪 (Lu Wencong)
Original assignee: University of Shanghai for Science and Technology
Current assignee: University of Shanghai for Science and Technology
Priority date: 2019-07-12
Filing date: 2019-07-18
Publication date: 2019-11-29

Abstract

The invention discloses a method for rapidly predicting the Curie temperature of perovskites based on data mining, comprising the following steps: 1) collecting, from the literature and databases, the Curie temperatures and chemical formulas of ABO₃-type inorganic hybrid perovskite materials; 2) generating the corresponding descriptors from the chemical formulas; 3) dividing the data set into a training set and a test set using the Euclidean-distance discrimination method; 4) screening the independent variables by forward selection (the "progressive method") combined with support vector machine (SVM) leave-one-out cross-validation; 5) building a prediction model of perovskite Curie temperature from the training-set samples with an SVM, using the target variable and the screened independent variables; and 6) predicting the Curie temperatures of the test-set samples with the established model. Because the prediction model is built from sample data taken from the literature and databases, the method is fast, convenient, low-cost and environmentally friendly, and it can also guide experimental work and reduce blind trial and error.

Description

Method for rapidly predicting the Curie temperature of perovskites based on data mining

Technical field

The present invention relates to a method for testing the electromagnetic properties of perovskites, and more particularly to a method for determining the Curie temperature of perovskites. It applies to the field of perovskite performance characterization, analysis and testing.

Background art

Perovskites have become a research hotspot owing to their stable crystal structure and unique physicochemical properties. They can be used as catalysts and as dye sensitizers in dye-sensitized solar cells. Some perovskites exhibit colossal magnetoresistance and can therefore serve as giant magnetoresistive materials, which have good application prospects in magnetic refrigeration, magnetic storage and magnetic sensing.

The Curie temperature, symbol Tc, is the temperature at which the spontaneous magnetization of a magnetic material drops to zero; it is the critical point at which a ferromagnetic or ferrimagnetic substance becomes paramagnetic. Below the Curie point the substance is ferromagnetic, and its magnetization is difficult to change. Above the Curie point it becomes paramagnetic, and its magnetization readily follows changes in the surrounding magnetic field. Because the Curie temperature is the upper operating temperature limit of many magnetic materials, studying it is of great significance.

Forward selection (the "progressive method") is a classical method of independent-variable screening whose principle is simple but very effective. It starts from a single retained variable and then adds the other variables one at a time, observing each variable's contribution to the model, keeping the variables that contribute and rejecting those that contribute little, until the model is optimal.

The support vector machine (SVM) is a machine-learning method developed by the mathematician Vladimir N. Vapnik and co-workers on the basis of statistical learning theory (SLT). It comprises the support vector classification (SVC) and support vector regression (SVR) algorithms. SVMs can build models from small numbers of samples and still achieve good predictive ability. At present, the Curie temperature of perovskites usually has to be determined experimentally: the chemicals involved may cause chemical pollution, measuring perovskites with many different atomic and structural parameters is laborious and inefficient, and some of the experiments are carried out blindly. This cannot meet the current need for a comprehensive understanding of the Curie temperatures of perovskites across series of compositions.
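For reference, SVR as usually formulated (Vapnik's ε-insensitive regression) solves the optimization problem below, where φ is the feature map induced by the kernel and N is the number of training samples. The patent does not state the kernel, the penalty C or the tube width ε, so they are left symbolic here.

```latex
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi},\,\boldsymbol{\xi}^{*}}\;
  \frac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{N}\left(\xi_i + \xi_i^{*}\right)
\quad\text{s.t.}\quad
\begin{cases}
  y_i - \mathbf{w}^{\top}\varphi(\mathbf{x}_i) - b \le \varepsilon + \xi_i,\\
  \mathbf{w}^{\top}\varphi(\mathbf{x}_i) + b - y_i \le \varepsilon + \xi_i^{*},\\
  \xi_i,\ \xi_i^{*} \ge 0,\qquad i = 1,\dots,N.
\end{cases}
```

Deviations smaller than ε cost nothing, so only the samples lying on or outside the ε-tube (the support vectors) determine the fitted function.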

Summary of the invention

To solve the problems of the prior art, the object of the present invention is to overcome its deficiencies and to provide a method for rapidly predicting the Curie temperature of perovskites based on data mining. The Curie temperatures of ABO₃-type inorganic hybrid perovskite materials are predicted by theory and computation, using the Euclidean-distance discrimination method and forward selection combined with SVM leave-one-out cross-validation. With this data-mining approach a result is obtained in only a few seconds, which is convenient and fast, saves labor and is environmentally friendly.

To achieve the above object, the present invention adopts the following technical scheme:

A method for rapidly predicting the Curie temperature of perovskites based on data mining comprises the following steps:

1) Collect, from the literature and databases, the Curie temperature values and chemical formulas of ABO₃-type inorganic hybrid perovskite materials as the data-set samples;

2) Using the collected atomic and structural parameters, generate the corresponding atomic-parameter and structural-parameter descriptors from the chemical formulas, deleting samples with missing values during descriptor generation;

3) Using the Euclidean-distance discrimination method, divide the data-set samples obtained in step 1) into a training set and a test set;

4) Taking the Curie temperature collected in step 1) as the target variable and the atomic-parameter and structural-parameter descriptors generated in step 2) as the independent variables, screen the independent variables of the training set by forward selection combined with SVM leave-one-out cross-validation, and select the optimal subset of independent variables for modeling;

5) Using the target variable and the independent variables screened in step 4), build a prediction model of perovskite Curie temperature with a support vector machine from the training-set samples obtained in step 3);

6) Using the prediction model established in step 5), predict the Curie temperatures of the test-set samples obtained in step 3).

As a preferred technical solution of the present invention, in step 3) the Euclidean-distance discrimination method comprises the following steps:

3-1) Taking the atomic-parameter and structural-parameter descriptors generated in step 2) as the independent variables, and using the independent variables as the coordinates of each sample, construct a high-dimensional space;

3-2) Select the sample with the largest band gap;

3-3) Add the selected sample to the modeling set;

3-4) Taking this sample as the centre, construct a hypersphere of radius R in the high-dimensional space, where R is defined as:

$$R = c\left(\frac{V}{N}\right)^{1/K}$$

where c is a user-defined dissimilarity level, set to 0.5; V is the product of the ranges (maximum minus minimum) of the independent variables; N is the number of samples; and K is the dimensionality of the space;

3-5) Assign to the test set every sample whose distance d from the centre sample is less than R, where the distance between sample i and sample i+1 is defined as:

$$d = \sqrt{\sum_{n=1}^{K}\left(x_{i,n} - x_{i+1,n}\right)^{2}}$$

where $x_{i,n}$ is the n-th independent variable of sample i and $x_{i+1,n}$ is the n-th independent variable of sample i+1;

3-6) Select the sample with the largest band gap among the remaining samples and repeat steps 3-2) to 3-5) until every sample has been assigned to either the modeling set or the test set.
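The sphere-exclusion procedure in steps 3-1) to 3-6) can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not the patent's implementation: distances are measured from the current sphere centre (the "sample i and sample i+1" wording is read that way), descriptors are used unscaled, and the per-sample ranking value is passed in as a generic `score` list, because the source ranks by band gap while this data set records Curie temperature. All function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Distance d between two samples in descriptor space (step 3-5)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def sphere_radius(X, c=0.5):
    """R = c * (V / N) ** (1 / K), V = product of per-variable ranges (step 3-4)."""
    n, k = len(X), len(X[0])
    v = 1.0
    for j in range(k):
        column = [row[j] for row in X]
        v *= max(column) - min(column)
    return c * (v / n) ** (1.0 / k)


def sphere_exclusion_split(X, score, c=0.5):
    """Return (train_indices, test_indices).

    score: hypothetical per-sample ranking value used to choose each
    sphere centre (the source uses "band gap").
    """
    r = sphere_radius(X, c)
    remaining = set(range(len(X)))
    train, test = [], []
    while remaining:
        # Steps 3-2 / 3-6: highest-scoring unassigned sample becomes the centre.
        centre = max(remaining, key=lambda i: score[i])
        remaining.remove(centre)
        train.append(centre)  # step 3-3: centre goes to the modeling set
        # Steps 3-4 / 3-5: samples inside the sphere go to the test set.
        inside = [i for i in remaining if euclidean(X[centre], X[i]) < r]
        for i in inside:
            remaining.remove(i)
            test.append(i)
    return sorted(train), sorted(test)
```

Because R shrinks or grows with c, the training/test ratio is controlled indirectly through the dissimilarity level rather than fixed in advance.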

As a preferred technical solution of the present invention, in step 4) the independent variables are screened by forward selection as follows:

Each time, one feature is chosen from the features not yet selected such that the separability criterion J of its combination with the already-selected features is maximal, until the number of selected features reaches the specified dimension D.

Suppose k features have been selected, forming the set $X_k$. Each of the m-k unselected features $x_j$ (j = 1, 2, …, m-k) is combined in turn with $X_k$ and the value of J is computed. If the features are ordered so that

$$J(X_k + x_1) \ge J(X_k + x_2) \ge \dots \ge J(X_k + x_{m-k}),$$

then $x_1$ is selected and the feature set for the next step is $X_{k+1} = X_k + x_1$. The process starts from k = 0 and is repeated until k = D. In this forward selection, the number of selected features is D = 8.
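The search above can be sketched as a generic greedy loop. In the embodiment J would be a model-quality score from SVR leave-one-out cross-validation on the candidate subset; here J is any user-supplied callable (an assumption), and the additive toy criterion in the usage lines is purely illustrative.

```python
from typing import Callable, List, Sequence


def forward_select(features: Sequence[str],
                   J: Callable[[List[str]], float],
                   D: int) -> List[str]:
    """Greedy forward selection: grow X_k one feature at a time until k = D."""
    selected: List[str] = []       # X_k, empty at the start (k = 0)
    candidates = list(features)    # the m - k features not yet selected
    while len(selected) < D and candidates:
        # Evaluate J(X_k + x_j) for every unselected x_j and keep the best.
        best = max(candidates, key=lambda f: J(selected + [f]))
        selected.append(best)      # X_{k+1} = X_k + x_1
        candidates.remove(best)
    return selected


# Toy usage with a hypothetical additive criterion (not the patent's J):
toy_weights = {"A_Radius": 3.0, "B_ionic": 1.0, "TF": 2.0, "Mass": 0.5}
print(forward_select(list(toy_weights),
                     lambda s: sum(toy_weights[f] for f in s), D=2))
# ['A_Radius', 'TF']
```

Forward selection evaluates J only O(mD) times instead of over all subsets, at the cost of never revisiting a feature once it has been added.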

Compared with the prior art, the present invention has the following prominent substantive features and notable advantages:

1. The method overcomes the shortcomings of the traditional trial-and-error ("cooking") approach and avoids repeated trial and error by predicting the Curie temperature of ABO₃-type inorganic hybrid perovskite materials through theory and computation. The Curie temperature is predicted with a support vector machine and the results are cross-validated; descriptors are generated from the collected atomic and structural parameters and fed into the model, and a result is obtained within a few seconds, conveniently and quickly, by a single person;

2. The method involves no experiments or chemicals at any stage and therefore causes no chemical pollution, in line with green, environmentally friendly principles; it is simple, easy to implement and suitable for wide application;

3. By predicting the Curie temperature of ABO₃-type inorganic hybrid perovskite materials in advance, the model allows samples that meet the requirements to be selected for experimental verification, which improves experimental efficiency, guides experimental work and avoids blind trials.

Brief description of the drawings

Fig. 1 shows the modeling results of the support vector regression model of perovskite Curie temperature in Embodiment 1 of the present invention.

Fig. 2 shows the leave-one-out cross-validation results of the support vector regression model of perovskite Curie temperature in Embodiment 2 of the present invention.

Fig. 3 shows the independent test-set results of the support vector regression model of perovskite Curie temperature in Embodiment 3 of the present invention.

Detailed description of the embodiments

The above scheme is further described below with reference to specific embodiments; the preferred embodiments of the present invention are described in detail as follows:

Embodiment one:

In this embodiment, a method for rapidly predicting the Curie temperature of perovskites based on data mining comprises the following steps:

1) Collect, from the literature and databases, the Curie temperature values and chemical formulas of ABO₃-type inorganic hybrid perovskite materials as the data-set samples. The chemical formulas and Curie temperatures are listed in Table 1:

Table 1. Data set of perovskite chemical formulas and Curie temperatures

Chemical formula	Tc (K)	Chemical formula	Tc (K)
La_0.7Sr_0.3Mn_0.5Cr_0.5O₃	226	La_0.9Pb_0.1MnO₃	235
La_0.7Sr_0.3Mn_0.8Cr_0.2O₃	286	La_0.8Pb_0.2MnO₃	310
La_0.7Sr_0.3Mn_0.9Cu_0.1O₃	350	La_0.7Pb_0.3MnO₃	358
La_0.75Sr_0.25MnO₃	340	La_0.6Pb_0.4MnO₃	360
La_0.7Sr_0.3Mn_0.6Cr_0.4O₃	242	La_0.5Pb_0.5MnO₃	355
La_0.7Sr_0.25Ag_0.05MnO₃	303	La_0.65Sr_0.35MnO₃	377
La_0.7Sr_0.05Ag_0.25MnO₃	363	La_0.55Pr_0.1Sr_0.35MnO₃	353
La_0.75Ba_0.1Ag_0.15MnO₃	315	La_0.45Pr_0.2Sr_0.35MnO₃	344
La_0.7Ca_0.3MnO₃	250	La_0.35Pr_0.3Sr_0.35MnO₃	334
La_0.7Ag_0.3MnO₃	270	La_0.7Sr_0.1Ag_0.2MnO₃	286.5
La_0.89Sr_0.11MnO₃	195	La_0.67Sr_0.33MnO₃	372.5
La_0.88Sr_0.12MnO₃	170	La_0.7Sr_0.3MnO₃	370
La_0.875Sr_0.125MnO₃	188	La_0.7Sr_0.3Mn_0.95Fe_0.05O₃	330
La_0.865Sr_0.135MnO₃	214	La_0.7Sr_0.3Mn_0.9Cr_0.1O₃	326
La_0.855Sr_0.145MnO₃	230.5	La_0.7Sr_0.3Mn_0.85Cr_0.15O₃	304
La_0.845Sr_0.155MnO₃	242	La_0.7Sr_0.3Mn_0.85Fe_0.15O₃	175
La_0.835Sr_0.165MnO₃	260.5	La_0.68Nd_0.02Ba_0.3Mn_0.9Cr_0.1O₃	300
La_0.83Sr_0.17MnO₃	265	La_0.7Ba_0.3Mn_0.9Cr_0.1O₃	298
La_0.825Sr_0.175MnO₃	283	La_0.7Sr_0.3Mn_0.9Fe_0.1O₃	261
La_0.72Sr_0.28MnO₃	375	La_0.7Ba_0.3Mn_0.9Fe_0.1O₃	215
La_0.69Sr_0.31MnO₃	380	La_0.67Ca_0.33MnO₃	275
La_0.64Sr_0.36MnO₃	372	La_0.6Sr_0.1Cu_0.3MnO₃	232
La_0.52Sr_0.48MnO₃	330	La_0.65Ca_0.18Sr_0.17MnO₃	323
La_0.50Sr_0.50MnO₃	310	La_0.67Ba_0.33Mn_0.98Ti_0.02O₃	314
La_0.48Sr_0.52MnO₃	290	La_0.7Ca_0.2Sr_0.1MnO₃	315
La_0.45Sr_0.55MnO₃	260	La_0.65Nd_0.05Ca_0.3MnO₃	250
La_0.4Sm_0.3Sr_0.3MnO₃	256	La_0.6Sr_0.2Ba_0.2MnO₃	354
Ba_0.95Sr_0.05MnO₃	353	La_0.8Ba_0.1Ca_0.1Mn_0.97Fe_0.03O₃	281
La_0.7Sr_0.3Mn_0.93Fe_0.07O₃	296	La_0.9Mg_0.1MnO₃	160
La_0.7Sr_0.3Mn_0.9Al_0.1O₃	310	La_0.8Ba_0.2MnO₃	295
La_0.67Ca_0.33Mn_0.85V_0.15O₃	287.2	La_0.67Ba_0.23Ca_0.1MnO₃	350
La_0.6Nd_0.1Ca_0.15Sr_0.15Mn_0.9Fe_0.1O₃	298	La_0.7Ba_0.3MnO₃	328
La_0.6Nd_0.1Ca_0.15Sr_0.15Mn_0.95Fe_0.05O₃	306	La_0.57Dy_0.1Sr_0.33MnO₃	358
La_0.6Nd_0.1Ca_0.15Sr_0.15MnO₃	326

2) Using the collected atomic and structural parameters, generate the corresponding atomic-parameter and structural-parameter descriptors from the chemical formulas, deleting samples with missing values during descriptor generation. The complete data set contains 67 samples, whose chemical formulas and Curie temperatures are listed in Table 1 of step 1). A total of 147 descriptors are generated from the collected atomic and structural parameters; some of them are listed in Table 2:

Table 2. Partial list of descriptors

A_aff	A_Radius	A_Tm	A_Tb	A_work function (eV)
B_aff	B_Radius	B_Tm	B_Tb	B_work function (eV)
TF	A_modulus bulk	A_Density	A_ionic	A_quantum number
rc	B_modulus bulk	B_Density	B_ionic	B_quantum number
Za	A_group number	A_Hfus	Mass	A_atomic weight (10⁻³ kg)
Zb	B_group number	B_Hfus	R_a/R_b	B_atomic weight (10⁻³ kg)
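The source does not say how the site descriptors in Table 2 are computed. A common approach, assumed here, is to take the composition-weighted mean of an elemental property over each lattice site, and to read "TF" as the Goldschmidt tolerance factor t = (r_A + r_O) / (√2 (r_B + r_O)). The radii below are illustrative Shannon-type values chosen for this sketch (B-site Mn is given a single radius, ignoring its mixed valence); the tabulated values actually used should be substituted.

```python
import math

# Illustrative ionic radii in angstrom (assumed values for this sketch only).
RADII = {"La": 1.36, "Sr": 1.44, "Ca": 1.34, "Ba": 1.61,
         "Mn": 0.645, "Cr": 0.615}
R_O = 1.40  # assumed O2- radius in angstrom


def site_average(fractions, prop):
    """Composition-weighted mean of an elemental property over one site."""
    total = sum(fractions.values())
    return sum(x * prop[el] for el, x in fractions.items()) / total


def tolerance_factor(r_a, r_b, r_o=R_O):
    """Goldschmidt tolerance factor t = (rA + rO) / (sqrt(2) * (rB + rO))."""
    return (r_a + r_o) / (math.sqrt(2) * (r_b + r_o))


# La0.7Sr0.3MnO3 from Table 1, split into A-site and B-site fractions.
r_a = site_average({"La": 0.7, "Sr": 0.3}, RADII)
r_b = site_average({"Mn": 1.0}, RADII)
print(round(r_a, 3), round(tolerance_factor(r_a, r_b), 3))
```

Dividing by the summed fractions keeps the average correct even when the site occupancies in a formula do not add up exactly to 1.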

3) Using the Euclidean-distance discrimination method, divide the 67 data-set samples obtained in step 2) into a training set and a test set at a ratio of 4:1, giving 54 training samples and 13 test samples;

The Euclidean-distance discrimination method proceeds as follows:

3-1) Taking the descriptors generated in step 2) as the independent variables and as the coordinates of each sample, construct a high-dimensional space;

3-2) Select the sample with the largest band gap;

3-3) Add the selected sample to the modeling set;

3-4) Taking the selected sample as the centre, construct a hypersphere of radius R in the high-dimensional space, with $R = c\left(\frac{V}{N}\right)^{1/K}$ and c = 0.5;

3-5) Assign every sample whose distance d from the centre sample is less than R to the test set;

3-6) Select the sample with the largest band gap among the remaining samples and repeat steps 3-2) to 3-5) until every sample has been assigned to the modeling set or the test set;

4) Taking the Curie temperature collected in step 1) as the target variable and the atomic-parameter and structural-parameter descriptors generated in step 2) as the independent variables, screen the independent variables of the training set by forward selection combined with SVM leave-one-out cross-validation, selecting the 8 best independent variables as the optimal subset for modeling;

The independent variables are screened by forward selection as follows:

Forward selection is a simple bottom-up search: each time, one feature is chosen from the features not yet selected such that the separability criterion J of its combination with the already-selected features is maximal, until the number of selected features reaches the specified dimension D. With k features selected as $X_k$ and each of the m-k unselected features $x_j$ evaluated in turn, if

$$J(X_k + x_1) \ge J(X_k + x_2) \ge \dots \ge J(X_k + x_{m-k}),$$

then $x_1$ is selected and the feature set for the next step is $X_{k+1} = X_k + x_1$. The process starts from k = 0 and is repeated until k = D; in this method the number of selected features is D = 8;

A fast prediction model of perovskite Curie temperature is built with a support vector machine; the selected optimal variables are listed in Table 3:

Table 3. Optimal descriptors selected by forward selection combined with SVM leave-one-out cross-validation

A_enthalpy vacancies Miedema (kJ·mol⁻¹)	A_modulus rigidity (GPa)
A_modulus Young (GPa)	A_distance core electron (Schubert) (Å)
B_nWS^1/3 Miedema (a.u.^-1/3)	B_enthalpy surface Miedema (kJ·mol⁻¹)
B_enthalpy vacancies Miedema (kJ·mol⁻¹)	B_ionic

In this step, variables with high noise and high redundancy are eliminated and the optimal variable subset for modeling is selected, which reduces noisy data and improves screening precision;

5) Using the target variable and the independent variables screened in step 4), build a prediction model of perovskite Curie temperature with a support vector machine from the training-set samples obtained in step 3);

Using the established fast prediction model, this embodiment rapidly predicts the Curie temperatures of the samples. The modeling results of the quantitative Curie-temperature prediction model built with an SVM on the 54 perovskite samples are shown in Fig. 1.

In this embodiment, regression modeling of the 54 perovskite samples is carried out with the support vector regression algorithm, giving a quantitative SVR model of the Curie temperature of inorganic hybrid perovskites. The correlation coefficient between the model's predicted Curie temperatures and the literature values is 0.9076. By building an efficient prediction model from sample data taken from the literature and databases, the method of this embodiment is fast, convenient, low-cost and environmentally friendly, and it can also guide experimental work and avoid blind trials.
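The source reports a "correlation coefficient" without defining it. Assuming it is Pearson's r between predicted and literature values, it can be computed with the standard library as follows (an illustrative sketch; the function name is hypothetical).

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((a - mean_t) * (b - mean_p) for a, b in zip(y_true, y_pred))
    sd_t = math.sqrt(sum((a - mean_t) ** 2 for a in y_true))
    sd_p = math.sqrt(sum((b - mean_p) ** 2 for b in y_pred))
    return cov / (sd_t * sd_p)


# Toy example: r is about 0.8 for these made-up values.
print(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]))
```

Note that r measures linear agreement only; a model with a constant offset can still reach r = 1, so an error measure such as RMSE is often reported alongside it.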

Embodiment two:

This embodiment is basically the same as Embodiment 1, with the following particulars:

In this embodiment, the 54 training-set samples are numbered A₁, A₂, …, A₅₄. In the first round, samples A₁, A₂, …, A₅₃ form the training set, model 1 is built with the same optimal independent-variable subset as in Embodiment 1, and model 1 is used to predict the Curie temperature of A₅₄. In the second round, A₁, A₂, …, A₅₂, A₅₄ form the training set, model 2 is built with the same subset, and model 2 predicts the Curie temperature of A₅₃. Proceeding in this way, 54 models are built, and the stability and reliability of the modeling method are judged from the errors between the predicted and true values.
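The rotation described above is leave-one-out cross-validation. A minimal sketch with a pluggable model follows; in the embodiment `fit` would train an SVR on the fixed 8-descriptor subset, whereas the hypothetical `fit_mean` stand-in used here simply predicts the training mean so that the example runs without external libraries.

```python
from typing import Callable, List, Sequence

# A "fit" function takes training data and returns a predict function.
Fit = Callable[[Sequence[Sequence[float]], Sequence[float]],
               Callable[[Sequence[float]], float]]


def leave_one_out(X: Sequence[Sequence[float]], y: Sequence[float],
                  fit: Fit) -> List[float]:
    """Return the held-out prediction for each sample (one model per sample)."""
    predictions = []
    for i in range(len(X)):
        X_train = [row for j, row in enumerate(X) if j != i]
        y_train = [v for j, v in enumerate(y) if j != i]
        predict = fit(X_train, y_train)   # model i never sees sample i
        predictions.append(predict(X[i]))
    return predictions


def fit_mean(X_train, y_train):
    """Hypothetical stand-in model: always predicts the training-set mean."""
    mean = sum(y_train) / len(y_train)
    return lambda x: mean


print(leave_one_out([[0.0]] * 4, [1.0, 2.0, 3.0, 6.0], fit_mean))
```

For 54 training samples this builds 54 models, matching the procedure above; the held-out predictions can then be compared with the literature values.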

Using the established fast prediction model, the Curie temperatures of the training-set samples are predicted. The internal leave-one-out cross-validation results of the quantitative Curie-temperature prediction model built with an SVM on the 54 perovskite samples are shown in Fig. 2.

In this embodiment, leave-one-out cross-validation is applied to the SVM quantitative prediction model of perovskite Curie temperature built on the 54 samples; the correlation coefficient between the leave-one-out predictions and the literature values is 0.8485. By building a leave-one-out cross-validated prediction model of the training set from sample data taken from the literature and databases, the method of this embodiment is fast, convenient, low-cost and environmentally friendly, and it also allows the stability and reliability of the modeling method to be assessed.

Embodiment three:

This embodiment is basically the same as the previous embodiments, with the following particulars:

In this embodiment, the established fast prediction model is used to predict the Curie temperatures of the test-set samples. The independent test-set predictions of the quantitative Curie-temperature prediction model built with an SVM on the 54 samples are shown in Fig. 3.

In this embodiment, the established SVM quantitative prediction model of perovskite Curie temperature is used to predict the 13 samples of the independent test set, with good results: the correlation coefficient between the predicted Curie temperatures and the literature values is 0.7938. By building an efficient prediction model from sample data taken from the literature and databases, the method of this embodiment is fast, convenient, low-cost and environmentally friendly, and it can also guide experimental work and avoid blind trials.

The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to these embodiments, and various changes can be made according to the purpose of the invention. Any change, modification, substitution, combination or simplification made according to the spirit and principle of the technical solution of the present invention shall be an equivalent replacement and falls within the protection scope of the present invention, provided that it meets the object of the invention and does not depart from the technical principle and inventive concept of the method for rapidly predicting the Curie temperature of perovskites based on data mining.

Claims

1. A method for rapidly predicting the Curie temperature of perovskites based on data mining, characterized by comprising the following steps:

1) collecting, from the literature and databases, the Curie temperature values and chemical formulas of ABO₃-type inorganic hybrid perovskite materials as the data-set samples;

2) using the collected atomic and structural parameters, generating the corresponding atomic-parameter and structural-parameter descriptors from the chemical formulas, and deleting samples with missing values during descriptor generation;

3) using the Euclidean-distance discrimination method, dividing the data-set samples obtained in step 1) into a training set and a test set;

4) taking the Curie temperature collected in step 1) as the target variable and the atomic-parameter and structural-parameter descriptors generated in step 2) as the independent variables, screening the independent variables of the training set by forward selection combined with SVM leave-one-out cross-validation, and selecting the optimal subset of independent variables for modeling;

5) using the target variable and the independent variables screened in step 4), building a prediction model of perovskite Curie temperature with a support vector machine from the training-set samples obtained in step 3);

6) using the prediction model established in step 5), predicting the Curie temperatures of the test-set samples obtained in step 3).

2. The method for rapidly predicting the Curie temperature of perovskites based on data mining according to claim 1, characterized in that, in step 3), the Euclidean-distance discrimination method comprises the following steps:

3-1) taking the atomic-parameter and structural-parameter descriptors generated in step 2) as the independent variables, and using the independent variables as the coordinates of each sample, constructing a high-dimensional space;

3-2) selecting the sample with the largest band gap;

3-3) adding the selected sample to the modeling set;

3-4) taking the sample as the centre, constructing a hypersphere of radius R in the high-dimensional space, R being defined as $R = c\left(\frac{V}{N}\right)^{1/K}$,

wherein c is a user-defined dissimilarity level set to 0.5, V is the product of the ranges of the independent variables, N is the number of samples, and K is the dimensionality of the space;

3-5) assigning to the test set the samples whose distance d from the centre sample is less than R, the distance between sample i and sample i+1 being $d = \sqrt{\sum_{n=1}^{K}\left(x_{i,n} - x_{i+1,n}\right)^{2}}$, wherein $x_{i,n}$ and $x_{i+1,n}$ are the n-th independent variables of samples i and i+1;

3-6) selecting the sample with the largest band gap among the remaining samples and repeating steps 3-2) to 3-5) until all samples have been assigned to the modeling set or the test set.

3. The method for rapidly predicting the Curie temperature of perovskites based on data mining according to claim 1, characterized in that, in step 4), the independent variables are screened by forward selection as follows:

each time, one feature is chosen from the features not yet selected such that the separability criterion J of its combination with the already-selected features is maximal, until the number of selected features reaches the specified dimension D;

if k features have been selected, forming the set $X_k$, each of the m-k unselected features $x_j$ (j = 1, 2, …, m-k) is combined in turn with $X_k$ and the value of J is computed; if

$$J(X_k + x_1) \ge J(X_k + x_2) \ge \dots \ge J(X_k + x_{m-k}),$$

then $x_1$ is selected and the feature set for the next step is $X_{k+1} = X_k + x_1$; the process starts from k = 0 and is repeated until k = D; in this forward selection, the number of selected features is D = 8.