CN113223639A - Method for exploring structure, composition and property of perovskite oxide - Google Patents


Info

Publication number
CN113223639A
Authority
CN
China
Legal status
Pending
Application number
CN202110274280.5A
Other languages
Chinese (zh)
Inventor
林彬
邓钦
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
2021-03-15
Filing date
2021-03-15
Publication date
2021-08-06

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C60/00 Computational materials science, i.e. ICT specially adapted for investigating the physical or chemical properties of materials or phenomena associated with their design, synthesis, processing, characterisation or utilisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for exploring the structure-composition-property relationship of perovskite oxides, comprising the following steps: 1) collecting a data set of perovskite oxides as the sample set; 2) preprocessing the data set and analyzing the sample values with the Pearson correlation coefficient to find the features strongly correlated with the target variable; 3) expanding the features selected in step 2) to obtain new descriptors; 4) performing feature selection on the new descriptors to find an optimal descriptor subset; 5) performing linear regression fitting on the optimal descriptor subset to find the optimal descriptor; 6) obtaining the corresponding relationship among structure, composition and properties through linear regression. The method does not depend on prior knowledge or a predefined model. It can generate a large number of descriptors, identify the optimal descriptor among them, and derive the structure-composition-property relationship of perovskite oxides, providing a new approach to materials exploration and research.

Description

Method for exploring structure, composition and property of perovskite oxide
Technical Field
The invention relates to the field of designing the structure, composition and properties of perovskite materials, and in particular to a method for exploring the structure-composition-property relationship of perovskite oxides. The method combines feature engineering with a linear regression algorithm: a large number of descriptors are created by feature engineering, an important descriptor subset is obtained by feature selection, the optimal descriptor is then found with the linear regression algorithm, and from it the relationship among the structure, composition and properties of the perovskite oxide is obtained.
Background Art
In materials discovery, it is critical to understand the relationship between the structure, composition and properties of a material. The enormous compositional space of material compounds makes this relationship difficult to establish. Traditionally it is derived from large numbers of laboratory experiments, which require a wide variety of equipment and considerable time and effort. This makes the exploration of structure-composition-property relationships very difficult.
Over the past five years, machine learning has been widely used in materials discovery, where it can greatly reduce computational cost and shorten the development cycle. It is therefore regarded as the most effective alternative to laboratory experiments. However, the complexity of the machine learning process and the lack of model interpretability make it difficult to derive straightforward formulas describing how material structure and composition relate to properties. It is therefore important to find a reasonable method for investigating the relationship among the structure, composition and properties of materials.
Perovskite oxides have received much attention for electrocatalysts and fuel cells because of their excellent electrocatalytic properties. A perovskite oxide is a compound of the general formula ABO3 with the perovskite structure, in which the A site is occupied by a large-radius cation and the B site by a small-radius cation. The most important characteristic of the perovskite structure is that ions with very different radii can coexist stably in the same structure. Because the A and B sites can accommodate a very wide range of elements in various amounts, the number of compounds with the perovskite structure is very large. This makes exploring the structure-composition-property relationships of ABO3 perovskite oxides a significant challenge.
To date, several methods for finding valid descriptors, such as symbolic regression algorithms, have been reported. These methods aim to find a few important descriptors of the target property within a given feature space, or to uncover a hidden mathematical formula, and then use these descriptors to predict the target variable. Although effective, they depend on several conditions, such as large amounts of data and a suitable algorithm, which are difficult to satisfy for materials scientists unfamiliar with computer algorithms. In practice, their efficiency is therefore very low.
Disclosure of Invention
To solve the problems of the prior art, the present invention aims to overcome its shortcomings and provide a method for exploring the structure-composition-property relationships of perovskite oxides. The method helps explore the relationship among the structure, composition and properties of perovskite oxides without prior knowledge, finds expressions describing the correlation among them, saves experimental time and resources, and accelerates the design and optimization of materials. The invention is realized by the following technical scheme:
A method for exploring the structure-composition-property relationships of perovskite oxides, comprising the steps of:
1) collecting a data set of perovskite oxides from the literature as the sample set for the method;
2) preprocessing the data set, deleting samples with missing values, and analyzing the Pearson correlation coefficients of the complete samples to find the features strongly correlated with the target variable;
3) constructing new descriptors by feature engineering from the features selected in step 2);
4) performing feature selection on the generated descriptors to find an optimal descriptor subset;
5) performing linear regression fitting on the optimal descriptor subset to find the optimal descriptor;
6) using the optimal descriptor as the input variable of a linear regression algorithm to obtain the corresponding structure-composition-property relationship.
The implementation steps of the above technical solution will now be further described.
In the feature engineering of step 3), new descriptors are constructed through transformation functions and feature combinations. A transformation function helps rescale a feature or convert a nonlinear relationship between the feature and the target variable into a linear one. Feature combination, one of the most useful techniques in feature engineering, merges two or more features into a single feature. It takes the cross-products of the feature values, so each combined feature represents the synergy of the information it joins. Nine transformation functions are used: $x$, $x^{-1}$, [a function shown only as an image in the original], $x^{2}$, $x^{3}$, $e^{x}$, $\ln|x|$, $\ln(1+|x|)$ and $\log|x|$. Their outputs are further combined nonlinearly to obtain more descriptors.
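The descriptor-generation step can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. It applies the eight transformation functions that are legible in the text to each selected feature; the ninth appears only as an image and is omitted. It assumes `log|x|` means the base-10 logarithm, and it realizes feature combination as pairwise products of transformed columns from different base features. The helper name `expand_features` and the dictionary output are hypothetical.

```python
import itertools
import math

# Eight of the nine transformation functions named in the text, keyed by a label template.
# The ninth function is only shown as an image in the source and is omitted here.
# Assumption: "log|x|" is taken to be the base-10 logarithm.
TRANSFORMS = {
    "{}": lambda x: x,
    "({})^-1": lambda x: 1.0 / x,
    "({})^2": lambda x: x ** 2,
    "({})^3": lambda x: x ** 3,
    "exp({})": math.exp,
    "ln|{}|": lambda x: math.log(abs(x)),
    "ln(1+|{}|)": lambda x: math.log(1.0 + abs(x)),
    "log10|{}|": lambda x: math.log10(abs(x)),
}


def expand_features(rows, names):
    """Build candidate descriptors (hypothetical helper).

    rows  -- list of samples, each a list of feature values aligned with `names`
    names -- feature names
    Returns a dict mapping descriptor label -> list of values (one per sample).
    """
    unary = []  # (index of base feature, label, column values)
    for j, name in enumerate(names):
        for template, func in TRANSFORMS.items():
            try:
                values = [func(float(row[j])) for row in rows]
            except (ValueError, ZeroDivisionError, OverflowError):
                continue  # transform is undefined for at least one sample; skip it
            unary.append((j, template.format(name), values))

    descriptors = {label: values for _, label, values in unary}
    # Feature combination: products of transformed columns from different base features.
    for (j1, l1, v1), (j2, l2, v2) in itertools.combinations(unary, 2):
        if j1 != j2:
            descriptors[f"{l1}*{l2}"] = [a * b for a, b in zip(v1, v2)]
    return descriptors
```

When every transform is defined, n selected features yield 8n single-feature descriptors plus 32n(n-1) products. The candidate pool therefore grows quickly, which is why a feature-selection step follows.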
Further, feature selection on the descriptors in step 4) uses sequential forward feature selection, as follows:
a feature subset is selected and used to train a model. The model here is the underlying estimator used by sequential forward selection, in this case a gradient boosting regression algorithm. The model is evaluated on a validation data set. This is repeated for different feature subsets according to the sequential forward search, and the optimal feature subset is selected from the evaluation results.
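A minimal sketch of sequential forward selection with a held-out validation set follows. The patent uses gradient boosting regression as the underlying model. To keep the example short and self-contained, this sketch swaps in ordinary least squares scored by validation mean squared error. Any other estimator can be supplied through the hypothetical `score` parameter. All function names are illustrative assumptions.

```python
def solve(a, b):
    """Solve the square linear system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (m[i][n] - sum(m[i][k] * w[k] for k in range(i + 1, n))) / m[i][i]
    return w


def ols_validation_mse(X_train, y_train, X_val, y_val, ridge=1e-9):
    """Stand-in model: least squares with intercept on the training set, scored by validation MSE."""
    A = [[1.0] + list(row) for row in X_train]
    p = len(A[0])
    # Normal equations (A^T A + ridge*I) w = A^T y; the tiny ridge guards against singular matrices.
    ata = [[sum(r[i] * r[j] for r in A) + (ridge if i == j else 0.0) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * y for r, y in zip(A, y_train)) for i in range(p)]
    w = solve(ata, aty)
    preds = [w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)) for row in X_val]
    return sum((yp - y) ** 2 for yp, y in zip(preds, y_val)) / len(y_val)


def sequential_forward_selection(X_train, y_train, X_val, y_val, k, score=ols_validation_mse):
    """Greedily add the feature that most lowers the validation error, up to k features.

    Returns (best_subset, history), where history lists (subset, validation error) per step.
    """
    def evaluate(cols):
        pick = lambda X: [[row[c] for c in cols] for row in X]
        return score(pick(X_train), y_train, pick(X_val), y_val)

    selected, remaining, history = [], list(range(len(X_train[0]))), []
    while remaining and len(selected) < k:
        errors = {j: evaluate(selected + [j]) for j in remaining}
        best = min(errors, key=errors.get)
        selected.append(best)
        remaining.remove(best)
        history.append((list(selected), errors[best]))
    # The optimal subset is the one with the lowest validation error seen during the search.
    best_subset, _ = min(history, key=lambda item: item[1])
    return best_subset, history
```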
Further, the sequential forward feature selection of step 4) uses a gradient boosting regression algorithm as its underlying algorithm, which works as follows:
in each iteration, gradient boosting regression adds a new regression tree to minimize the objective function. Each new tree is fitted to the residuals of the current ensemble, that is, it is trained along the negative gradient of the loss function. Multiple weak learners are trained over successive iterations and finally combined linearly into a strong learner.
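The boosting loop described above can be illustrated with the following sketch. It assumes squared-error loss, for which the negative gradient is exactly the residual. For brevity it uses depth-1 regression trees (stumps) as the weak learners instead of the deeper trees a library implementation would normally use. The function names, `n_trees` and `learning_rate` are hypothetical.

```python
def fit_stump(X, residuals):
    """Fit a depth-1 regression tree (stump) to the residuals by minimizing squared error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X})[:-1]:  # candidate split thresholds
            left = [r for row, r in zip(X, residuals) if row[f] <= t]
            right = [r for row, r in zip(X, residuals) if row[f] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lv, rv)
    if best is None:  # every feature is constant: fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        return lambda row: mean
    _, f, t, lv, rv = best
    return lambda row, f=f, t=t, lv=lv, rv=rv: lv if row[f] <= t else rv


def gradient_boosting(X, y, n_trees=50, learning_rate=0.1):
    """Gradient boosting for squared-error loss L = (y - F)^2 / 2.

    The negative gradient of L with respect to F is the residual y - F, so each new
    stump is trained on the current residuals and added with a shrinkage factor.
    """
    base = sum(y) / len(y)  # initial model: the mean of the target
    pred = [base] * len(y)
    trees = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_stump(X, residuals)
        trees.append(tree)
        pred = [pi + learning_rate * tree(row) for pi, row in zip(pred, X)]
    # The strong learner is a linear combination of the weak learners.
    return lambda row: base + learning_rate * sum(tree(row) for tree in trees)
```

The shrinkage factor means each tree corrects only part of the remaining residual. With `learning_rate=0.1` the residual on a perfectly separable split shrinks by a factor of 0.9 per iteration.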
Further, the regression fitting of the optimal descriptor subset in step 5) uses a linear regression algorithm, as follows:
linear regression finds the optimal model coefficients for a given data set by the least squares method or by gradient descent. The least squares method minimizes the sum of squared errors between the fitted and actual values over all observations in the training data. Gradient descent updates the coefficients in each iteration so that, with a suitable learning rate, this sum of squared errors decreases. By repeatedly moving in the direction of the negative gradient, it converges to a minimum after a number of iterations (for the convex least-squares loss this is also the global minimum) and thus finds the optimal model coefficients for the data set.
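The two fitting routes can be compared on a single descriptor with the sketch below. It is illustrative only: the function names, the learning rate and the iteration count are assumptions, not values given in the patent.

```python
def least_squares(x, y):
    """Closed-form least squares for y ≈ w*x + b (one descriptor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    w = sxy / sxx
    return w, my - w * mx


def gradient_descent(x, y, lr=0.01, n_iter=5000):
    """Minimize the mean squared error by stepping along the negative gradient."""
    n = len(x)
    w = b = 0.0
    for _ in range(n_iter):
        err = [w * xi + b - yi for xi, yi in zip(x, y)]  # fitted minus actual
        grad_w = 2.0 / n * sum(e * xi for e, xi in zip(err, x))
        grad_b = 2.0 / n * sum(err)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

On data generated by y = 2x + 1, both routines recover slope 2 and intercept 1. The closed form is exact, while gradient descent only approaches the solution and needs a learning rate small enough for the loss to decrease at every step.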
Compared with the prior art, the invention has the following notable substantive features and advantages:
1. The method makes it possible to explore the structure-composition-property relationships of ABO3 perovskite oxides without prior knowledge and to find expressions describing these relationships. This guides exploration and avoids blind trial and error, accelerates the design and optimization of materials, and provides a new approach to materials exploration and research;
2. The whole process involves no experiments or chemical reagents, produces no chemical pollution, and is consistent with the concept of green environmental protection;
3. The method is simple and easy to implement, saves experimental time and resources, has low cost, and is suitable for wide adoption.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
Fig. 1 is a heat map of the Pearson correlation coefficient analysis of the data sample values in Embodiment 1 of the present invention; the fill of each cell corresponds to the value of the associated Pearson correlation coefficient.
Fig. 2 is a diagram of the feature-engineering-based descriptor generation process in Embodiment 1 of the present invention.
Fig. 3 is a scatter plot of the optimal descriptor against the target variable, stability, for the perovskite oxides in Embodiment 1 of the present invention; the line represents the predicted stability and the points represent the actual stability.
Detailed Description
To further clarify the objects, technical solutions and advantages of the present invention, the following detailed description of the present invention is given with reference to specific examples and drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not intended to limit the present invention.
Embodiment 1:
In this embodiment, referring to Fig. 1, a method for recursively and automatically acquiring descriptors to explore the structure-composition-property relationships of perovskite oxides comprises the following steps:
1) a data set of 4912 perovskite oxide entries is collected from the literature as the sample set for the method;
The perovskite oxide data set is described in Table 1:
Table 1 Description of the perovskite oxide material data set
[Table 1 is provided only as images in the original and is not reproduced here.]
2) The data set is preprocessed: samples with missing values are deleted, and the Pearson correlation coefficients of the complete samples are analyzed to find the features strongly correlated with the target variable. Fig. 1 shows the correlation of several perovskite features with the target variable, stability. All of these features are strongly correlated with stability, so all of them are retained as effective features.
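Step 2) can be sketched as follows: drop incomplete samples, then keep the features whose Pearson coefficient with the target clears a cutoff. The patent does not state a numeric cutoff (in this embodiment every feature was retained), so `threshold=0.5` is an assumption. The dict-per-sample layout and the function names are also hypothetical, and the sketch assumes no feature column is constant.

```python
import math


def drop_missing(rows):
    """Step 2 preprocessing: keep only samples in which no value is missing (None)."""
    return [row for row in rows if all(v is not None for v in row.values())]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def select_correlated(rows, target, threshold=0.5):
    """Return {feature: r} for features whose |r| with the target is at least `threshold`."""
    clean = drop_missing(rows)
    y = [row[target] for row in clean]
    scores = {
        name: pearson([row[name] for row in clean], y)
        for name in clean[0]
        if name != target
    }
    return {name: r for name, r in scores.items() if abs(r) >= threshold}
```

Computing `pearson` for every pair of columns, not only against the target, gives the matrix that a heat map such as Fig. 1 visualizes.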
3) New descriptors are constructed by feature engineering from the features selected in step 2). The feature engineering combines transformation functions and feature combinations. The transformation functions apply arithmetic and aggregation operators to the given features to generate new descriptors, which helps rescale a feature or convert a nonlinear relationship between the feature and the target variable into a linear one. Feature combination, one of the most useful techniques in feature engineering, merges two or more features into a single feature by taking the cross-products of the feature values, so each combined feature represents the synergy of the information it joins. Fig. 2 shows the feature-engineering-based descriptor generation, where $x_i$ ($i = 1, 2, \ldots, n$) denotes a selected feature and the expressions following each arrow denote the generated descriptors. Nine transformation functions are used: $x$, $x^{-1}$, [a function shown only as an image in the original], $x^{2}$, $x^{3}$, $e^{x}$, $\ln|x|$, $\ln(1+|x|)$ and $\log|x|$. Their outputs are further combined nonlinearly to obtain more descriptors.
4) Feature selection is performed on the generated descriptors to find an optimal descriptor subset.
Specifically, feature selection in step 4) uses sequential forward feature selection, as follows:
a feature subset is selected and used to train a model. The model here is the underlying estimator used by sequential forward selection, in this case a gradient boosting regression algorithm. The model is evaluated on a validation data set. This is repeated for different feature subsets according to the sequential forward search, and the optimal feature subset is selected from the evaluation results.
The sequential forward feature selection in step 4) uses a gradient boosting regression algorithm as its underlying algorithm, as follows:
in each iteration, gradient boosting regression adds a new regression tree to minimize the objective function. Each new tree is fitted to the residuals of the current ensemble, that is, it is trained along the negative gradient of the loss function. Multiple weak learners are trained over successive iterations and finally combined linearly into a strong learner.
5) Linear regression fitting is performed on the optimal descriptor subset to find the optimal descriptor. The regression fitting in step 5) uses a linear regression algorithm, as follows:
linear regression finds the optimal model coefficients for a given data set by the least squares method or by gradient descent. The least squares method minimizes the sum of squared errors between the fitted and actual values over all observations in the training data. Gradient descent updates the coefficients in each iteration so that, with a suitable learning rate, this sum of squared errors decreases. By repeatedly moving in the direction of the negative gradient, it converges to a minimum after a number of iterations (for the convex least-squares loss this is also the global minimum) and thus finds the optimal model coefficients for the data set.
The optimal descriptors selected by the linear regression model and the corresponding evaluation metrics are shown in Table 2:
Table 2 Optimal descriptors selected by the linear regression model and the corresponding evaluation metrics
[Table 2 is provided only as images in the original and is not reproduced here.]
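Because the contents of Table 2, including its evaluation metrics, are not reproduced, the sketch below assumes the coefficient of determination (R²) as the selection criterion. Under that assumption, it fits each candidate descriptor alone with a univariate linear regression and keeps the best one. The names `fit_and_score` and `best_descriptor` are hypothetical.

```python
def fit_and_score(x, y):
    """Univariate least squares y ≈ w*x + b; returns (w, b, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    w = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b = my - w * mx
    ss_res = sum((yi - (w * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return w, b, 1.0 - ss_res / ss_tot


def best_descriptor(descriptors, y):
    """Fit each candidate descriptor alone and return the one with the highest R^2."""
    results = {
        name: fit_and_score(col, y)
        for name, col in descriptors.items()
        if len(set(col)) > 1  # a constant column cannot be fitted
    }
    name = max(results, key=lambda k: results[k][2])
    return name, results[name]
```

The returned slope and intercept give the linear relationship between the chosen descriptor and the target property, which is the form of result described in step 6).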
6) Using the optimal descriptor [formula image not reproduced] as the input variable of the linear regression algorithm, the corresponding structure-composition-property relationship is obtained: [formula image not reproduced].
Fig. 3 is a scatter plot of the optimal descriptor [formula image not reproduced] against the target variable, stability; the line represents the predicted stability and the points represent the actual stability.
The embodiments above are described in further detail with reference to the accompanying drawings. It should be understood that the present invention is not limited to these embodiments. Various changes and modifications can be made according to the purpose of the invention, and any modification, substitution, combination or simplification made within the spirit and principle of the technical solution of the present invention shall fall within the scope of protection of the present invention.

Claims (5)

1. A method for exploring the structure-composition-property relationships of perovskite oxides, characterized by comprising the following steps:
1) collecting a data set of perovskite materials from the literature as the sample set for the method;
2) preprocessing the collected data set, deleting samples with missing values, and analyzing the Pearson correlation coefficients of the complete samples to find the features strongly correlated with the target variable;
3) constructing new descriptors by feature engineering from the features selected in step 2);
4) performing feature selection on the generated descriptors to find an optimal descriptor subset;
5) performing linear regression fitting on the optimal descriptor subset to find the optimal descriptor;
6) using the optimal descriptor as the input variable of a linear regression algorithm to obtain the corresponding structure-composition-property relationship.
2. The method for exploring the structure-composition-property relationships of perovskite oxides as claimed in claim 1, wherein the feature engineering in step 3) constructs new descriptors through transformation functions and feature combinations; a transformation function helps rescale a feature or convert a nonlinear relationship between the feature and the target variable into a linear one; feature combination, one of the most useful techniques in feature engineering, merges two or more features into a single feature by taking the cross-products of the feature values, so that each combined feature represents the synergy of the information it joins; nine transformation functions are used: $x$, $x^{-1}$, [a function shown only as an image in the original], $x^{2}$, $x^{3}$, $e^{x}$, $\ln|x|$, $\ln(1+|x|)$ and $\log|x|$, and their outputs are further combined nonlinearly to obtain more descriptors.
3. The method for exploring the structure-composition-property relationships of perovskite oxides as claimed in claim 1, wherein the feature selection on the descriptors in step 4) uses sequential forward feature selection, as follows:
a feature subset is selected and used to train a model, the model being the underlying algorithm of sequential forward selection, namely a gradient boosting regression algorithm; the model is evaluated on a validation data set; these operations are repeated for different feature subsets according to the sequential forward search; and the optimal feature subset is selected from the evaluation results.
4. The method for exploring the structure-composition-property relationships of perovskite oxides as claimed in claim 1, wherein the feature selection on the descriptors in step 4) uses sequential forward feature selection with a gradient boosting regression algorithm as its underlying algorithm, as follows: in each iteration, gradient boosting regression adds a new regression tree to minimize the objective function; each new tree is fitted to the residuals of the current ensemble, that is, it is trained along the negative gradient of the loss function; multiple weak learners are trained over successive iterations and finally combined linearly into a strong learner.
5. The method for exploring the structure-composition-property relationships of perovskite oxides as claimed in claim 1, wherein the regression fitting of the optimal descriptor subset in step 5) uses a linear regression algorithm, as follows:
linear regression finds the optimal model coefficients for a given data set by the least squares method or by gradient descent; the least squares method minimizes the sum of squared errors between the fitted and actual values over all observations in the training data; gradient descent updates the coefficients in each iteration so that, with a suitable learning rate, this sum of squared errors decreases; by repeatedly moving in the direction of the negative gradient, it converges to a minimum after a number of iterations and thus finds the optimal model coefficients for the data set.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110274280.5A CN113223639A (en) 2021-03-15 2021-03-15 Method for exploring structure, composition and property of perovskite oxide

Publications (1)

Publication Number Publication Date
CN113223639A true CN113223639A (en) 2021-08-06

Family

ID=77083718

Country Status (1)

Country Link
CN (1) CN113223639A (en)

Legal Events

Date Code Title Description
PB01 Publication